PARALLEL COMPUTING:
WHAT WE DID WRONG AND WHAT WE DID RIGHT
Ken Kennedy, Director, CRPC
In December, I attended a supercomputing conference in Taiwan. In
addition to giving a lecture, I participated in a panel entitled
"Parallel Computing: What Have We Done Wrong?" The panel, which
included Kuo-Wei Wo of Taiwan, Convex's Steve Wallach, Purdue's
H.J. Siegel, and Hans Zima of the University of Vienna, generated a
stimulating discussion. In preparing for it, I did a lot of thinking
about the issue and I would like to share some of my thoughts.
The first question is: "Did we do anything wrong?" Clearly many people
are disappointed with the slow progress of parallel computing in the
marketplace. I have even heard prominent researchers predict that
history will view the High Performance Computing and Communications
(HPCC) Program as a failure. Clearly we made some mistakes, but
what were they? I believe that the principal mistake was excessive
optimism and, correspondingly, excessive hype. This led to unrealistic
expectations on the part of users--expectations that turned to
frustration in the cold light of reality. This problem has several
dimensions, which I will explore in the following paragraphs.
One aspect of this over-optimism was the naive belief that if we built
and deployed large parallel machines, scientific users would flock to
them in droves. To be sure, there was widespread interest in parallel
computing in research labs and universities, but the average scientist
remained somewhat skeptical. This skepticism was reinforced by the
reports out of research labs that parallel machines were not even
coming close to their anticipated peak performance. The vendors
contributed to this problem by repeatedly overstating the potential,
usually quoting the theoretical peak performance rather than
performance on real benchmarks.
A related problem was our failure, early on, to identify software as
the central problem. The fact is, the average user will not move from
an environment where programming is relatively easy to one where it is
relatively hard unless the performance gains are truly remarkable and
unachievable by any other method. So long as vector supercomputers
remained competitive in performance, there was no rush to convert.
Programming was hard on parallel machines for two reasons. First, the
standard sequential algorithms familiar to most researchers were often
ill-suited for parallel computation. With 45 years of algorithm
development for sequential machines, it should not be surprising that
finding parallel algorithms for the same purpose would take time.
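As a small illustration (in modern Python rather than the Fortran of the period, and with names of my own choosing), consider the difference between a recurrence and a reduction. A running sum computed as a prefix carries a dependence from each iteration to the next, so the familiar sequential loop resists naive parallelization; a plain sum, by contrast, can be split into independent partial sums and combined:

```python
from concurrent.futures import ThreadPoolExecutor

def serial_prefix(xs):
    # Loop-carried dependence: each element needs the one before it,
    # so the iterations cannot simply be handed to separate processors.
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out

def parallel_sum(xs, workers=4):
    # A reduction carries no such dependence: partial sums over chunks
    # can be computed independently and combined at the end. (Threads
    # keep this sketch portable; the machines of the era used separate
    # processors with their own memories.)
    chunk = max(1, len(xs) // workers)
    pieces = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, pieces)
    return sum(partials)

data = list(range(1, 101))
print(serial_prefix(data)[-1], parallel_sum(data))  # both 5050
```

Finding the parallel-friendly formulation, as in the second function, is exactly the algorithmic rethinking that took time.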
A second factor that made programming hard was the typical parallel
programming interface, which made the architecture of the underlying
machines visible to the user. Scientific programmers soon discovered
how tedious it was to write parallel programs in a dialect that made
the user responsible for creating and managing parallel computations
and for explicit communication between the processors.
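To see why this was tedious, here is a sketch of what even a trivial computation required under such an interface; I use Python threads and a queue as a stand-in for the vendor message-passing libraries of the time, and the worker decomposition is my own invented example. The programmer spawns every worker, decides who sends what to whom, and must match every send with a receive by hand:

```python
import queue
import threading

def worker(rank, outbox):
    # Each worker computes its piece of the problem, then must
    # explicitly ship the result back; nothing is shared automatically.
    local = sum(range(rank * 50, (rank + 1) * 50))
    outbox.put((rank, local))

def gather_total(nworkers=2):
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(r, outbox))
               for r in range(nworkers)]
    for t in threads:
        t.start()
    # The "master" must know exactly how many messages to expect;
    # forget one receive and the program hangs forever.
    total = sum(outbox.get()[1] for _ in range(nworkers))
    for t in threads:
        t.join()
    return total

print(gather_total())  # 4950, i.e. sum(range(100))
```

All of this bookkeeping exists only to add up some numbers; scaling it to a real simulation, with its boundary exchanges and global reductions, is where the tedium became crushing.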
Tedious programming was not the only problem caused by
machine-specific programming interfaces. A deeper problem was that the
programs written for a parallel machine were architecture-specific. As
a result, the programmer had to reprogram each time a new architecture
emerged. Even worse, if he or she wished to run a program on different
parallel architectures, multiple versions of the source were
required. This was particularly problematic for corporate users
contemplating a transition to parallelism for a production code. They
knew the conversion would be expensive in terms of programming
manpower and their investment would ultimately not be
protected--they might have to redevelop the program for their next
machine or even the next generation of the same machine. The
independent software vendors, who produce science and engineering
applications that are widely used in the commercial sector, resisted
moving to parallelism for just this reason. Without these
applications, parallelism would be doomed to be an interesting but
unsuccessful experimental technology.
Even when software was widely recognized as the problem, we
underestimated it. Many of us thought that a breakthrough was just
around the corner. We forgot that compiler and system development take
a long time. When vector computers were introduced, it took ten years
for vectorizing compilers to catch up, a time span that duplicated the
experience with machines like the CDC 6600 and 7600, which had
multiple pipelined functional units. Parallelization is clearly a much
harder problem, so we should not have expected parallel programming
to be solved within ten years of these machines' initial
availability in the mid-eighties.
Obviously, these were significant errors, which were compounded when
HPCC became a political football, but can we really rate the program a
failure? My answer is an unqualified "no!" The HPCC program is now
beginning to produce solutions to these difficult challenges. For
example, standard interfaces like High Performance Fortran and Message
Passing Interface and portable systems like Parallel Virtual Machine
are beginning to turn the tide--even some of the independent
software vendors are planning parallel versions of their software. A
notable example is J.S. Nolan (mentioned in my column in this
newsletter's October issue), which is working with a number of energy
companies to produce a parallel implementation of VIP, their
industry-standard reservoir simulation code.
So did we make some mistakes? Of course we did, but that does not mean
the HPCC program has been a failure--it is just taking longer than we
expected to achieve its goals. Even with the emergence of powerful
superworkstations, there is still a need to solve scientific problems
that are too big for any one of these machines. What we have learned
on parallel machines will also be useful on networks of
high-performance workstations. Parallelism is now, and will
remain for the foreseeable future, a critical technology for attacking
large science and engineering problems. The Federal HPCC Program is
doing what we asked of it--let's not give up on it just yet.