As a consequence, faculty who teach parallel computing courses are running themselves ragged trying to keep up with current developments. Likewise, faculty interested in developing parallel computing courses at their schools are overwhelmed by the sheer size and variety of the parallel processing jungle.
To help alleviate this problem, this panel will focus on three key pedagogic questions central to teaching parallel computing:
The future of parallel processing will be influenced by the following four points:
Result: This will decrease the number of supercomputing facilities.
Result: Distributed processing will be hindered by slow networks.
Result: Everyone will have personal access to powerful workstations.
Result: To speed up network interfacing, Ethernet units will be integrated into processor designs, placing them on-chip much as memory management units are.
Together, these trends will yield small microprocessors networked, LAN-style, inside a small box. Each box will contain several types of processors grouped by task. For example, one possible arrangement is two processors for graphics imaging, three for database processing, two for general data processing, and one for external communication.
With this type of hardware arrangement, parallel computing courses will generally teach data parallel programming and multiple program multiple data (MPMD) programming. Beyond these parallel concepts, more networking topics, such as data security and resource access control, will be integrated into the discussion of parallelism.
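As an illustration of how the two programming styles could coexist in such a box, here is a minimal Python sketch. It borrows the processor counts from the example arrangement above; the per-group programs (`PROGRAMS`) and the names `run_group` and `run_box` are hypothetical, and threads stand in for processors, so it shows program structure rather than real speedup. Across groups the box runs MPMD (each group runs a different program); within a group it runs data parallel (one program over many items).

```python
from concurrent.futures import ThreadPoolExecutor

# Example arrangement from the text: number of processors per task group.
GROUPS = {"graphics": 2, "database": 3, "general": 2, "communication": 1}

# Hypothetical per-group programs: under MPMD each group runs a different one.
PROGRAMS = {
    "graphics": lambda item: f"rendered {item}",
    "database": lambda item: f"queried {item}",
    "general": lambda item: item * 2,
    "communication": lambda item: f"sent {item}",
}

def run_group(name, work):
    """Data parallel within a group: the same program applied to every work
    item, spread over that group's workers (threads stand in for processors)."""
    with ThreadPoolExecutor(max_workers=GROUPS[name]) as pool:
        return list(pool.map(PROGRAMS[name], work))  # map keeps input order

def run_box(workload):
    """MPMD across groups: each group runs its own program concurrently."""
    with ThreadPoolExecutor(max_workers=len(workload)) as pool:
        futures = {name: pool.submit(run_group, name, work)
                   for name, work in workload.items()}
        return {name: f.result() for name, f in futures.items()}

results = run_box({
    "graphics": ["frame0", "frame1"],
    "database": ["orders", "parts", "staff"],
    "general": [1, 2, 3, 4],
    "communication": ["ack"],
})
print(results["general"])  # [2, 4, 6, 8]
```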
Recent advances in computer technology have made distributed computing an economical alternative to dedicated parallel computers. An undergraduate course in parallel computing should focus on the technology that students are most likely to work with once they graduate. I believe that courses in parallel computing should shift their focus away from dedicated parallel hardware and toward distributed computing.
B) I currently see two broad classes of parallel programs:
(1) transformational: a program that transforms its input into its output.
(2) reactive: a program that reacts to its environment.
The majority of programs written are transformational. Their key characteristic is the absence of non-determinism, i.e., running the program more than once on the same input yields the same output. Most if not all programs written by undergraduate students are transformational. The majority of the scientific computing community views computing this way. The dataflow model and the demand flow model (i.e., the basis for reduction machines and functional languages) are built on this view.
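A small Python sketch of a parallel transformational program follows; the function name `sum_of_squares` and the chunking scheme are illustrative choices, not taken from the text. However the threads happen to be scheduled, the output depends only on the input: `pool.map` returns partial results in chunk order, and they are combined in that fixed order, which also keeps floating-point results reproducible.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(values, workers=4):
    """Transformational: the output depends only on the input."""
    # Partition the input into contiguous chunks, roughly one per worker.
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map yields partial results in chunk order, regardless of
        # which thread finishes first.
        partials = list(pool.map(lambda c: sum(v * v for v in c), chunks))
    # Combine left to right in a fixed order.
    return sum(partials)

data = list(range(1, 101))
first = sum_of_squares(data)
second = sum_of_squares(data)
print(first, first == second)  # 338350 True
```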
Reactive programs, such as operating systems or control programs for a robot, are event-driven. A primary concern of reactive programs is dealing with non-determinism. Tony Hoare's CSP model aims to capture the essence of reactive programs, and Inmos's Occam/Transputer pair is a noble attempt at a software/hardware platform for them. Designers of real-time and embedded systems also write reactive programs. Fault tolerance is a critical issue in reactive programs.
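For contrast, here is a CSP-flavored reactive sketch in Python: two hypothetical sensor processes send events over a channel to a controller that reacts to whatever arrives next. The channel is approximated with `queue.Queue`, which is buffered, whereas CSP and Occam channels are unbuffered rendezvous; the sensor names and reaction rules are invented for illustration. The interleaving of events from the two sensors can change from run to run, although each sensor's own events stay in order.

```python
import queue
import threading

# A channel approximated by a thread-safe FIFO queue (a simplification:
# CSP/Occam channels are unbuffered rendezvous).
events = queue.Queue()
STOP = object()  # sentinel a sensor sends when it has finished

def sensor(name, readings):
    """A process that emits events into the channel."""
    for r in readings:
        events.put((name, r))
    events.put(STOP)

def controller(n_sensors, log):
    """A reactive process: responds to each event as it arrives,
    until every sensor has finished."""
    finished = 0
    while finished < n_sensors:
        msg = events.get()
        if msg is STOP:
            finished += 1
            continue
        name, value = msg
        # Hypothetical reaction rules for a robot controller.
        if name == "bumper" and value:
            action = "brake"
        elif name == "camera" and value > 0.8:
            action = "slow"
        else:
            action = "continue"
        log.append((name, value, action))

log = []
sensors = [
    threading.Thread(target=sensor, args=("bumper", [False, True])),
    threading.Thread(target=sensor, args=("camera", [0.2, 0.9])),
]
ctrl = threading.Thread(target=controller, args=(len(sensors), log))
for t in sensors + [ctrl]:
    t.start()
for t in sensors + [ctrl]:
    t.join()
print(log)  # interleaving of bumper and camera events may vary between runs
```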
In the next five years, I see healthy growth in both transformational and reactive programming. Because it shifts responsibility for scheduling, partitioning, and synchronization away from the end user and onto the compiler, I predict that data parallelism will be the mainstay of parallel transformational programming. I see FORTRAN 90 and data parallel versions of C++ taking a strong hold. A data parallel APL also shows promise.
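To make concrete what a data parallel compiler takes off the programmer's hands, the Python sketch below spells out one plausible lowering of a whole-array statement such as FORTRAN 90's `C = A + B`: block partitioning of the index space, scheduling one block per worker, and waiting for all workers before the result is used. The helpers `block_ranges` and `array_add` are hypothetical; real compilers may choose other distributions (cyclic, block-cyclic).

```python
from concurrent.futures import ThreadPoolExecutor

def block_ranges(n, p):
    """Block partitioning: split indices 0..n-1 into p nearly equal ranges."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for k in range(p):
        stop = start + base + (1 if k < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def array_add(a, b, p=4):
    """What the programmer writes as one statement (C = A + B) becomes
    partitioned, scheduled, and synchronized work."""
    assert len(a) == len(b)
    c = [0] * len(a)

    def work(rng):  # each worker owns one block of indices
        lo, hi = rng
        for i in range(lo, hi):
            c[i] = a[i] + b[i]

    with ThreadPoolExecutor(max_workers=p) as pool:
        # Collecting the results waits for every worker: an implicit barrier.
        list(pool.map(work, block_ranges(len(a), p)))
    return c

print(block_ranges(10, 4))                  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(array_add([1, 2, 3], [10, 20, 30]))   # [11, 22, 33]
```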
The future is less clear for reactive programs. The programming languages C, C++ and Ada will probably dominate.
C) I see more "home-grown" cluster computing, where users add a high-speed switch, e.g., ATM, to a cluster of high-speed workstations. However, such clusters will not eliminate the need for high-end supercomputers. Linda and PVM are still too primitive to use clusters effectively. I hope for an effective data parallel C++ that runs equally well on the big supercomputers and on workstation clusters.
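For readers unfamiliar with Linda, its coordination model centers on a shared tuple space with operations such as `out` (deposit a tuple), `in` (withdraw a matching tuple, blocking until one exists), and `rd` (read without withdrawing). The Python class below is a toy, single-process sketch of that model, not the Linda system itself: `None` serves as a wildcard in templates (real Linda uses typed formal parameters), and `in` is renamed `in_` because `in` is a Python keyword. The master/worker usage is a hypothetical example.

```python
import threading

class TupleSpace:
    """A toy Linda-style tuple space; None in a template is a wildcard."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Deposit a tuple and wake any blocked readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, template):
        for tup in self._tuples:
            if len(tup) == len(template) and all(
                    t is None or t == v for t, v in zip(template, tup)):
                return tup
        return None

    def rd(self, *template):
        """Block until a matching tuple exists; return it without removing it."""
        with self._cond:
            while (tup := self._match(template)) is None:
                self._cond.wait()
            return tup

    def in_(self, *template):
        """Like rd, but withdraw the matching tuple."""
        with self._cond:
            while (tup := self._match(template)) is None:
                self._cond.wait()
            self._tuples.remove(tup)
            return tup

# Hypothetical master/worker use: workers withdraw tasks, deposit results.
ts = TupleSpace()

def worker():
    while True:
        _, n = ts.in_("task", None)
        if n < 0:  # poison pill ends the worker
            return
        ts.out("result", n, n * n)

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for n in range(5):
    ts.out("task", n)
results = sorted(ts.in_("result", None, None)[2] for _ in range(5))
for _ in workers:
    ts.out("task", -1)
for w in workers:
    w.join()
print(results)  # [0, 1, 4, 9, 16]
```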
D) As users compute more, and faster, there will be a need to focus on large distributed databases.
To teach reactive programming, a "visual" hands-on platform, e.g., Rod Tosten's model railroad concurrency lab, will be needed. A high-performance machine is not.
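The kind of problem such a lab makes visible can be sketched in a few lines of Python: two trains share one segment of track, and a lock plays the role of the block signal that admits one train at a time. The layout, the `TrackSegment` class, and the train names are hypothetical and are not a description of the lab mentioned above; the collision counter simply records whether two trains were ever on the segment at once.

```python
import threading
import time

class TrackSegment:
    """A shared block of track; the lock acts as its block signal."""

    def __init__(self):
        self.signal = threading.Lock()
        self.occupant = None
        self.collisions = 0

    def enter(self, train):
        self.signal.acquire()          # wait for a green signal
        if self.occupant is not None:  # cannot happen while the lock is held
            self.collisions += 1
        self.occupant = train

    def leave(self, train):
        self.occupant = None
        self.signal.release()          # signal turns green for the next train

def run_train(train, segment, laps, log):
    for lap in range(laps):
        time.sleep(0.001)              # running on the train's own loop
        segment.enter(train)
        log.append((train, lap))
        time.sleep(0.001)              # crossing the shared segment
        segment.leave(train)

crossing = TrackSegment()
log = []
trains = [threading.Thread(target=run_train, args=(name, crossing, 5, log))
          for name in ("express", "freight")]
for t in trains:
    t.start()
for t in trains:
    t.join()
print(crossing.collisions, len(log))  # 0 10
```

Removing the `acquire`/`release` calls turns the same program into a demonstration of a race, which is exactly the sort of behavior students can watch on a physical layout.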
Computational scientists will be extensive users of parallel transformational programs.
Operating and real time systems designers will need to learn parallel reactive programming.
The answers obtained vary with the flavor of the question, and the distinctions are important. An oversimplified answer based on processor price-performance ratio alone could doom the NSF supercomputer centers and, indeed, the MPP industry. While properly software-equipped clusters have throughput advantages over MPPs, they are still fundamentally and severely limited in single-job elapsed time for large-scale parallel computations in which the granularity is fine because the memory requirement is large.
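One way to see why single-job elapsed time suffers is a simple latency-bandwidth model: computation shrinks as $W/p$, but each step still pays a per-message startup cost $\alpha$ plus $m/\beta$ for an $m$-byte message. The Python sketch below evaluates that model; it assumes a fixed number of fixed-size messages per step, and every parameter value is illustrative only, not a measurement of any real machine or network.

```python
def elapsed(p, work, msgs_per_step, msg_bytes, latency, bandwidth, steps):
    """Toy model of one parallel job's elapsed time on p processors.

    work      : total computation time on one processor (seconds)
    latency   : per-message startup cost alpha (seconds)
    bandwidth : beta, in bytes per second
    Each of `steps` steps exchanges msgs_per_step messages of msg_bytes each.
    """
    compute = work / p
    if p == 1:
        return compute  # a single processor sends no messages
    communicate = steps * msgs_per_step * (latency + msg_bytes / bandwidth)
    return compute + communicate

# Illustrative parameters only (not measurements of any real machine).
JOB = dict(work=100.0, msgs_per_step=4, msg_bytes=8_000, steps=1_000)
TIGHT = dict(latency=1e-5, bandwidth=1e8)  # hypothetical dedicated-link network
LOOSE = dict(latency=1e-3, bandwidth=1e6)  # hypothetical shared, slower network

for p in (1, 16, 256):
    print(p, elapsed(p, **JOB, **TIGHT), elapsed(p, **JOB, **LOOSE))
```

In this model the communication term does not shrink as $p$ grows, so it sets a floor on elapsed time; the slower network raises that floor, and finer granularity (less work per message) makes the floor dominate sooner.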
A particular flavor of the question relevant to this panel is the degree to which clusters suffice for educational purposes. From personal experience, almost everything important about parallel algorithms can be taught on clusters, using one of the portable interfaces such as MPI, PVM or Chameleon. However, not everything taught will be believed unless a department maintains a dedicated cluster, separate from the interactive environment, on which high-performance applications may be run (part of the week, at least) in a relatively contention-free environment.
It is certainly incumbent upon parallel algorithms instructors to highlight two sensitivities. First, many popular algorithms, such as conjugate gradients or any implicit PDE solver, are sensitive to synchronization. Second, visualization environments, and algorithms that exploit computational heterogeneity by passing essentially all of the problem's data around between phases, are sensitive to network contention.
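The synchronization sensitivity of conjugate gradients is easy to point out in code. The Python sketch below is the textbook unpreconditioned CG iteration, instrumented with a hypothetical counter: each inner product would be a global reduction in a distributed implementation (every processor must contribute before any can proceed), and each iteration needs two of them, while the matrix-vector product needs only neighbor exchanges when the matrix is sparse. The function names and the 2x2 test system are illustrative.

```python
def dot(u, v, counter):
    """Inner product. Distributed across processors, this is a global
    reduction: every processor must contribute before any can continue."""
    counter["reductions"] += 1
    return sum(x * y for x, y in zip(u, v))

def matvec(A, x):
    # Distributed, a sparse A needs only neighbor exchanges here.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned CG for a symmetric positive definite A."""
    counter = {"reductions": 0}
    x = [0.0] * len(b)
    r = b[:]                 # residual b - A x, with x = 0
    p = r[:]
    rr = dot(r, r, counter)
    iterations = 0
    while iterations < max_iter and rr ** 0.5 > tol:
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap, counter)   # synchronization point 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r, counter)        # synchronization point 2
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return x, iterations, counter["reductions"]

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters, reductions = conjugate_gradient(A, b)
print(x)  # close to the exact solution [1/11, 7/11]
```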
Time permitting, some quantitative illustrations will be presented to highlight the degradation caused when a high-performance, dedicated-link, richly interconnected network is replaced with Ethernet.