Panel
The Dynamic Nature of Parallel Computing Curricula

Nobody questions the fact that parallel computing is a constantly changing discipline within computer science. This constant change is driven primarily by rapid hardware advances in serial machines, parallel architectures, and communication networks for both parallel processors and clusters of high-performance workstations. Furthermore, among the many parallel computational models proposed by researchers (e.g., SIMD, MIMD, dataflow, data parallel, graph reduction), no clear winner has emerged.

As a consequence, faculty who teach parallel computing courses are running ragged to keep up with current developments. Also, faculty who are interested in developing parallel computing courses in their schools are overwhelmed by the sheer size and variety of the parallel processing jungle.

To help alleviate this problem, this panel will focus on three key pedagogic questions central to teaching parallel computing:

Each panelist will present his opinion on these three key questions.
Below are opening position statements.


P. Takis Metaxas
Assistant Professor of Computer Science
Wellesley College

Fundamental Issues:

Data parallel programming will become the dominant way of programming, independently of platform - SIMDs, MIMDs, COWs... Clusters of workstations will see an increased usage as distributed computing platforms, but will not contribute to parallel processing unless a data parallel language that hides them appears. The real issues are: parallel languages with transparent communication and network embeddings, and really parallel I/O.

Facilities:

Anything that does the job: COWs (with the requirements mentioned), powerful supercomputers, bundled transputers (if, like COWs, they become transparent), special-purpose machines. If scalability is of any importance (and we think it is), grid-connected processors with a shortcutting network planted on top (like mesh-of-trees or butterflies) will be the winning combination eventually.

Users:

Eventually everyone, from multimedia designers and MSWord 7.0 users (:-) to computational scientists and virtual reality surfers. Personally, I am looking forward to my feather-weight laptop which will divide almost equally its 32 processors to recognize my accent and handwriting, synthesize voice to respond, search and fetch video from the WWW, and run the applications I am currently using. As educators, we prepare our students best by teaching them ideas and paradigms, not programming tricks - those that belong to the OS should be given to the OS. (See my talk for more on this.) We will need to change our curriculum several times, but we will not complain, because every time there will be a fascinating reason to do that!


Rodney Tosten
Assistant Professor of Computer Science
Gettysburg College

The future of parallel processing will be influenced by the following four points:

  1. Fact: Funding is decreasing for supercomputers.

    Result: This will decrease the number of supercomputing facilities.

  2. Fact: Networks are overloaded with World Wide Web and multimedia information.

    Result: Distributed processing will be hindered by slow networks.

  3. Fact: Cheaper, smaller, and especially faster microprocessors are developing at a quick pace.

    Result: Everyone will have personal access to powerful workstations.

  4. Fact: The popularity of the internet is increasing at an extremely high rate.

    Result: To increase the speed of network interfacing, ethernet units will be integrated into processor designs and placed on-chip, similar to memory management units.

All of these points will yield small microprocessors that will be networked in a LAN fashion inside a small box. In each box, there will be several types of processors grouped by task. For example, one possible arrangement is: two processors for graphics imaging, three processors for database processing, two processors for general data processing, and one processor for external communication.

With this type of hardware arrangement, parallel computing courses will generally teach data parallel programming and multiple program multiple data (MPMD) programming. Besides these parallel concepts, more networking topics will be integrated into the parallel discussions. These topics will involve such concepts as data security and resource access control.
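
As a rough illustration of the two styles named above, the following Python sketch contrasts data parallel programming (one operation applied across partitions of the data) with MPMD programming (different programs running concurrently on different data). The function names, the sample tasks, and the use of a thread pool are assumptions made purely for illustration; the threads merely stand in for the separate processors in the box described above.

```python
from concurrent.futures import ThreadPoolExecutor


def square_chunk(chunk):
    """Data parallel kernel: the same operation applied to every chunk."""
    return [x * x for x in chunk]


def data_parallel_squares(data, workers=4):
    """Split the data into chunks and apply one kernel to all of them."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of the input chunks.
        results = list(pool.map(square_chunk, chunks))
    return [y for part in results for y in part]


def render_task(pixels):
    """Hypothetical 'graphics' program: brighten pixels, capped at 255."""
    return [min(255, p + 10) for p in pixels]


def query_task(records, key):
    """Hypothetical 'database' program: select records by department."""
    return [r for r in records if r.get("dept") == key]


def mpmd_run(pixels, records, key):
    """MPMD: two different programs run concurrently on different data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        graphics = pool.submit(render_task, pixels)
        database = pool.submit(query_task, records, key)
        return graphics.result(), database.result()
```

In the data parallel case the programmer writes one kernel and leaves the partitioning to a helper; in the MPMD case each worker is given its own program, which mirrors processors grouped by task.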


Paul Tymann
Assistant Professor of Computer Science
SUNY at Oswego

Recent advances in computer technology have made distributed computing an economical alternative to using dedicated parallel computers. An undergraduate course in parallel computing should focus on the technology that the students are most likely to work with once they graduate. I believe that courses in parallel computing should shift their focus away from dedicated parallel hardware to that of distributed computing.


Daniel C. Hyde
Associate Professor of Computer Science
Bucknell University

Fundamental Issues:

A. Certainly the two main themes of parallelism, replication of agents and pipelining (assembly line), are fundamental and will be taught.

B. I currently see two broad classes of parallel programs:

(1) transformational - a program that transforms the input to the output.

(2) reactive - a program that reacts to its environment.

The majority of programs written are transformational. The key characteristic is the lack of non-determinism, i.e., if one runs the program more than once with the same input, one receives the same output. Most if not all programs written by undergraduate students are transformational. The majority of the scientific computing community views computing this way. The dataflow model and demand flow model (i.e., the basis for reduction machines and functional languages) are based on this view.

Reactive programs are event-driven such as operating systems or control programs for a robot. A primary focus of reactive programs is to deal with non-determinism. Tony Hoare's CSP model aims to capture the essence of reactive programs. Inmos's Occam/Transputer pair is a noble attempt at a software/hardware platform for reactive programs. Designers of real time systems and embedded systems also write reactive programs. Fault tolerance is a critical issue in reactive programs.

In the next five years, I see healthy growth in both transformational and reactive programming. Because it shifts responsibility for scheduling, partitioning and synchronization away from the end user to the compiler, I predict that data parallelism will be the mainstay of parallel transformational programming. I see FORTRAN 90 and data parallel versions of C++ having a strong hold. A data parallel APL has promise.

The future is less clear for reactive programs. The programming languages C, C++ and Ada will probably dominate.

C. I see more ``home grown'' cluster computing where users add a high speed switch, e.g., ATM, to a cluster of high speed workstations. However, such clusters will not replace the need for high end supercomputers. Linda and PVM are still too primitive to use clusters effectively. I hope for an effective data parallel C++ which runs equally well on the big supercomputers and workstation clusters.

D. As users compute more and faster, there will be a need for a focus on large distributed databases.
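
The distinction between transformational and reactive programs drawn above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the sensor events, and the use of threads with a queue are assumptions chosen for the example and are not drawn from CSP, Occam, or any particular platform.

```python
import queue
import threading


def transform(values):
    """Transformational: the output depends only on the input."""
    return sorted(v * 2 for v in values)


def sensor(name, count, events):
    """Hypothetical event source posting numbered events."""
    for i in range(count):
        events.put((name, i))


def controller(events, log):
    """Reactive: handle events as they arrive until a stop sentinel."""
    while True:
        event = events.get()
        if event is None:  # sentinel: shut down
            break
        log.append(event)


def run_reactive(sources):
    """Run several sensors concurrently against one controller.

    The interleaving of events from different sensors is decided by
    the thread scheduler, so the log order may differ between runs.
    """
    events = queue.Queue()
    log = []
    consumer = threading.Thread(target=controller, args=(events, log))
    consumer.start()
    producers = [
        threading.Thread(target=sensor, args=(name, count, events))
        for name, count in sources
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    events.put(None)  # all sensors are done; stop the controller
    consumer.join()
    return log
```

Calling `transform` twice with the same list always gives the same result, while two calls to `run_reactive` may record the same events in different interleavings. Only the order of events from a single sensor is guaranteed.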

Facilities:

To teach the transformational class of parallel programming, any fast computer that supports the data parallel approach will do, perhaps even a cluster of workstations. The workstations would also support the distributed databases.

To teach reactive programming, a ``visual'' hands-on platform, e.g., Rod Tosten's model railroad concurrency lab, will be needed. A high performance machine is not needed.

Users:

Future computer designers will need to understand parallel computing for the architectural issues.

Computational scientists will be extensive users of parallel transformational programs.

Operating and real time systems designers will need to learn parallel reactive programming.


David E. Keyes
Associate Professor of Computer Science
Old Dominion University &
Senior Research Associate
Inst. for Computer Applications in Science and Engineering, NASA Langley Research Center

Can Clusters Take Over for Tightly Coupled MPPs?

Now that the processor performance of desktop workstations has come within a small factor (even unity) of the individual processor performance of the best so-called "massively parallel processors" (MPPs), the question of the suitability of networked clusters of workstations as substitutes for MPPs is being nearly universally addressed. Corporations such as Boeing, United Technologies, and AT&T and federal agencies such as NASA and DOE are moving towards clusters, along with nearly every university dependent for computational resource support on sources such as these.

The answers obtained vary with the flavor of the question, and distinctions are important. An oversimplified answer based on processor price-performance ratio alone could doom the NSF supercomputer centers, and, indeed, the MPP industry. While properly software-equipped clusters have advantages over MPPs in throughput, they are still fundamentally and severely limited in single job elapsed time for large-scale parallel computations, in which the granularity is fine because the memory requirement is large.

A particular flavor of the question relevant to this panel is the degree to which clusters suffice for educational purposes. From personal experience, almost everything important about parallel algorithms can be taught on clusters, using one of the portable interfaces such as MPI, PVM or Chameleon. However, not everything taught will be believed unless a department maintains a dedicated cluster, separate from the interactive environment, on which high-performance applications may be run (part of the week, at least) in a relatively contention-free environment.

It is certainly incumbent upon parallel algorithms instructors to highlight the sensitivity of many popular algorithms, such as conjugate gradients, or any implicit PDE solver, to synchronization, and also to highlight the sensitivity of visualization environments, or of algorithms that exploit computational heterogeneity by passing essentially all of the data of the problem around between phases, to network contention.
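
To make the synchronization sensitivity of conjugate gradients concrete, here is a minimal serial sketch of the standard textbook method in Python, with each inner product counted. On a distributed-memory machine every such inner product requires a global reduction across all processors, so it acts as a synchronization point. The counter and the function names are assumptions added for illustration; this is not a parallel implementation and is not taken from the talk.

```python
def dot(u, v, counter):
    """Inner product; in a distributed run this is a global reduction."""
    counter["reductions"] += 1
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product (needs only neighbor data if A is sparse)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A.

    Returns the solution, the iteration count, and the number of
    inner products (would-be global synchronization points).
    """
    counter = {"reductions": 0}
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)                      # initial search direction
    rs_old = dot(r, r, counter)
    iterations = 0
    while iterations < max_iter and rs_old > tol * tol:
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap, counter)     # synchronization point 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = dot(r, r, counter)              # synchronization point 2
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
        iterations += 1
    return x, iterations, counter["reductions"]
```

Each iteration performs two inner products, and neither can begin until every processor has finished the preceding vector update. This is why a slow or contended network delays every processor on every iteration.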

Time permitting, some quantitative illustrations will be presented to highlight the degradation caused when a high-performance, dedicated-link, richly interconnected network is replaced with ethernet.