PARALLEL COMPUTING: WHAT WE DID WRONG AND WHAT WE DID RIGHT
Mar. 17 FEATURE
By Ken Kennedy, Director, CRPC
HPCwire
In December, I attended a supercomputing conference in Taiwan. In addition to giving a lecture, I participated in a panel entitled "Parallel Computing: What Have We Done Wrong?" The panel, which included Kuo-Wei Wo of Taiwan, Convex's Steve Wallach, Purdue's H.J. Siegel, and Hans Zima of the University of Vienna, generated a stimulating discussion. In preparing for it, I did a lot of thinking about the issue and I would like to share some of my thoughts.
The first question is: "Did we do anything wrong?" Clearly many people are disappointed with the slow progress of parallel computing in the marketplace. I have even heard prominent researchers predict that history will view the High Performance Computing and Communications Program as a failure. Clearly we made some mistakes, but what were they? I believe that the principal mistake was excessive optimism and, correspondingly, excessive hype. This led to unrealistic expectations on the part of users -- expectations that turned to frustration in the cold light of reality. This problem has several dimensions, which I will explore in the following paragraphs.
One aspect of this over-optimism was the naive belief that if we built and deployed large parallel machines, scientific users would flock to them in droves. To be sure, there was widespread interest in parallel computing in research labs and universities, but the average scientist remained somewhat skeptical. This skepticism was reinforced by the reports out of research labs that parallel machines were not even coming close to their anticipated peak performance. The vendors contributed to this problem by repeatedly overstating the potential, usually quoting the theoretical peak performance rather than performance on real benchmarks.
A related problem was our failure early on to identify software as the central problem. The fact is, the average user will not move from an environment where programming is relatively easy to one where it is relatively hard unless the performance gains are truly remarkable and unachievable by any other method. So long as vector supercomputers remained competitive in performance, there was no rush to convert.
Programming was hard on parallel machines for two reasons. First, the standard sequential algorithms familiar to most researchers were often ill-suited for parallel computation. With 45 years of algorithm development for sequential machines, it should not be surprising that finding parallel algorithms for the same purpose would take time.
A second factor that made programming hard was the typical parallel programming interface, which made the architecture of the underlying machine visible to the user. Scientific programmers soon discovered how tedious it was to write parallel programs in a dialect that made the user responsible for creating and managing parallel computations and for explicit communication between the processors.
Tedious programming was not the only problem caused by machine-specific programming interfaces. A deeper problem was that the programs written for a parallel machine were architecture-specific. As a result, the programmer had to reprogram each time a new architecture emerged. Even worse, if he or she wished to run a program on different parallel architectures, multiple versions of the source were required. This was particularly problematic for corporate users contemplating a transition to parallelism for a production code. They knew the conversion would be expensive in terms of programming manpower and their investment would ultimately not be protected -- they might have to redevelop the program for their next machine or even the next generation of the same machine. The independent software vendors, who produce science and engineering applications that are widely used in the commercial sector, resisted moving to parallelism for just this reason. Without these applications, parallelism would be doomed to be an interesting but unsuccessful experimental technology.
Even when software was widely recognized as the problem, we underestimated it. Many of us thought that a breakthrough was just around the corner. We forgot that compiler and system development take a long time. When vector computers were introduced, it took ten years for vectorizing compilers to catch up, a time span that duplicated the experience with machines like the CDC 6600 and 7600, which had multiple pipelined functional units. Parallelization is clearly a much harder problem, so we should not have expected the problem of parallel programming to be solved in less than ten years from the initial availability of parallel machines in the mid-eighties.
Obviously, these were significant errors, which were compounded when HPCC became a political football, but can we really rate the program a failure? My answer is an unqualified "no!" The HPCC program is now beginning to produce solutions to these difficult challenges. For example, standard interfaces like High Performance Fortran and the Message Passing Interface and portable systems like Parallel Virtual Machine are beginning to turn the tide -- even some of the independent software vendors are planning parallel versions of their software. A notable example is J.S. Nolan, who is working with a number of energy companies to produce a parallel implementation of VIP, their industry-standard reservoir simulation code.
So did we make some mistakes? Of course we did, but that does not mean the HPCC program has been a failure -- it is just taking longer than we expected to achieve its goals. Even with the emergence of powerful superworkstations, there is still a need to solve scientific problems that are too big for any one of these machines. What we have learned on parallel machines will also be useful on networks of high-performance workstations. Parallelism is now, and will remain for the foreseeable future, a critical technology for attacking large science and engineering problems. The Federal HPCC Program is doing what we asked of it -- let's not give up on it just yet.