LEADING HPC FIGURES CONSIDER PETAFLOPS COMPUTING July 1
FEATURE By Paul Messina, California Institute of Technology; Thomas Sterling, University Space Research Association; Jarrett S. Cohen, Hughes STX Corp./NASA Goddard Space Flight Center; and Paul H. Smith, NASA HPCC HPCwire
Building a computer ten times more powerful than all the networked computing capability in the U.S. is the subject of a new report by leading figures in the high performance computing community.
On June 29, "Enabling Technologies for Peta(FL)OPS Computing" was presented to the High Performance Computing, Communications, and Information Technology (HPCCIT) Subcommittee of the National Science and Technology Council. HPCCIT reviewed the findings and recommendations of the workshop and considered alternative strategies for addressing this important challenge. The report is the culmination of a February 1994 workshop hosted by the Jet Propulsion Laboratory and the California Institute of Technology in Pasadena, CA. Participating were more than 60 invited contributors from industry, academia, and government.
The Federal High Performance Computing and Communications (HPCC) Program's goal of sustained teraflops computing by 1997 is well on its way to being met. A peta(fl)ops machine (hereafter designated petaflops), however, would perform a million billion operations per second (including, but not restricted to, floating point operations). That is so far beyond anything within contemporary experience that it may require entirely new paradigms in architecture, technology, and programming.
"No goal envisioned will be more challenging, demand greater coordination and collaboration among all sectors of the high performance community, or more strongly promote ultimate US leadership in all facets of computing into the next century," the report states. With such a goal in hand, planning and even early research into petaflops system design and methodologies may be essential now, a process the workshop was devised to initiate.
THE FIRST COMPREHENSIVE ASSESSMENT
The broad purpose of the Workshop on Enabling Technologies for Peta(FL)OPS Computing was to conduct and produce the first comprehensive assessment of such systems and to establish a baseline understanding of opportunities, challenges, and critical elements. Specifically, the major objectives of the workshop were to:
-- identify applications of scientific, economic, and societal importance requiring petaflops scale computing;
-- determine the technical barriers to achieving effective petaflops computing systems;
-- reveal enabling technologies that may facilitate implementation of petaflops computers and determine their respective roles in contributing to this objective;
-- delineate the research issues that define the boundary between today's state-of-the-art understanding and key advanced concepts for tomorrow's petaflops computing systems; and
-- establish a research agenda for near-term work focused on immediate questions contributing to our uncertainty and imposing the greatest risks to launching a major long-term research initiative.
The workshop was jointly sponsored by the National Aeronautics and Space Administration, the Department of Energy, the National Science Foundation, the Advanced Research Projects Agency, the National Security Agency, and the Ballistic Missile Defense Organization. Paul H. Smith (NASA HPCC) and Paul Messina (Caltech Concurrent Supercomputing Facilities) were the organizational and program chairs, respectively. Participation was by invitation, and selection was made to achieve the highest quality and coverage of the driving technical areas, as well as representation from all elements of the HPC community.
Seymour Cray (Cray Computer Corp.) and Konstantin Likharev (State University of New York at Stony Brook) delivered opening talks to set the direction and tone of the workshop. Cray spoke on the necessity of miniaturizing devices to the nanometer scale to achieve petaflops speed at reasonable size and cost. He also detailed some possibilities for using biological devices to construct such devices. Likharev discussed what he considers the "unparalleled advantages" of superconductors, including their picosecond switching times and their virtually negligible power requirements. Problems include the high costs of helium refrigeration and of large memories.
For most of the three-day meeting, attendees broke into four working groups reflecting the pace-setting disciplines that are both enabling and limiting progress towards practical petaflops computing systems.
The Applications Working Group, chaired by Geoffrey Fox (Syracuse University), considered the classes of applications and algorithms -- which cut across every conceivable field -- that are both important to national needs and capable of exploiting this scale of processing. Through these discussions, group members reached some understanding of the resource requirements for such applications.
The Device Technology Working Group, chaired by Carl Kukkonen (Jet Propulsion Laboratory), explored the three technologies most likely to contribute to achieving petaflops performance: semiconductor, cryogenic superconducting, and optical. This group established projections of the capabilities of each technology family and distinguished them in terms of their strengths and weaknesses in supporting petaflops computing.
The Architecture Working Group, chaired by Harold Stone (IBM Research), examined three alternative structures comprising processor, communication, and memory subunits enabled by future technologies and scaled to petaflops performance. They investigated the most likely organizations and mixes of functional elements at different levels of technology capability to reveal a spectrum of possible systems.
The Software Technology Working Group, chaired by Bert Halstead (DEC Cambridge Research Laboratory), took on the challenging task of identifying the principal obstacles to effective application of future petaflops computing systems imposed by conventional software environments. They also pursued the implications of alternative environments and functionality that might substantively contribute to enhanced usefulness.
The topic area and issues were far enough beyond today's design considerations that few of the participants had direct experience with the realm of operation implied by petaflops scale computing. This constituted new and fertile ground for exploration, with the pathfinders carrying few preconceived conclusions. The following findings therefore represent a truly fresh look at high performance computing systems four orders of magnitude beyond the present. The most significant of these findings are presented below:
-- Construction of an effective petaflops computing system will be feasible in approximately 20 years based on current technology trend projections.
-- There are and will be a wide range of applications in science, engineering, economics, and societal information infrastructure and management that will demand petaflops capability in the near future.
-- Cost, more than any other single aspect, will dominate the ultimate viability and likely time frame in which petaflops systems will come into practical use.
-- Reliability of petaflops computer systems will be manageable, but only because cost considerations will preclude systems with many more components than contemporary massively parallel processing systems contain.
-- No fundamental paradigm shift in system architecture is required to achieve petaflops capable systems. Advanced variations on the non-uniform memory access MIMD (and possibly SIMD) architecture model should suffice, although specific details will vary significantly from today's implementations.
-- It is likely that a petaflops computer will exhibit a wide diameter, i.e., a large propagation delay across the system measured in system clock cycles. Aggressive latency management techniques and million-fold concurrency will be key facets of system operation at this scale.
-- The petaflops computer will be dominated by its memory. However, at least for science and engineering applications, memory requirements will scale less than linearly with performance. A petaflops system will require on the order of 30 terabytes of main memory.
-- A mix of disparate technologies will yield a better performance-to-cost ratio than would be possible with any single technology alone. This will permit earlier delivery of petaflops capacity systems than would otherwise be possible. Semiconductor technology will dominate memory and supply some logic, and progress toward this goal will be tied to advances in the semiconductor industry. Optics will provide high bandwidth inter-module communication at all levels and mass storage but little or no logic. Superconducting technology operating at cryogenic temperatures may yield very high performance logic and exceptionally low power consumption.
-- Major advances in software methodologies for programming and resource management will be necessary if such systems are to be practical for end user applications.
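The memory and concurrency findings above can be made concrete with a little back-of-envelope arithmetic. The Python sketch below is illustrative only and is not taken from the report. The function names are hypothetical, and every number in it is an assumption chosen for the example: a 0.75 scaling exponent for memory, a baseline of one gigabyte per gigaflops, an operation latency of one nanosecond, a three-meter machine span, a one-gigahertz clock, and a signal speed of two-thirds the speed of light.

```python
import math

PETAFLOPS = 1e15  # a million billion operations per second


def memory_bytes(target_ops, base_ops=1e9, base_bytes=1e9, exponent=0.75):
    """Memory needed if memory grows as performance**exponent.

    Assumption: the 0.75 exponent and the 1 GB per gigaflops baseline are
    illustrative values, not figures stated in the article.
    """
    return base_bytes * (target_ops / base_ops) ** exponent


def required_concurrency(target_ops, latency_s):
    """Operations that must be in flight at once (Little's law).

    Concurrency = throughput x latency. The latency passed in is an
    assumed per-operation completion time.
    """
    return target_ops * latency_s


def cycles_across(span_m, clock_hz, signal_speed=2e8):
    """Clock cycles for a signal to cross the machine once.

    Assumption: signals travel at about 2e8 m/s, and the span and clock
    rate are hypothetical design values.
    """
    return span_m / signal_speed * clock_hz


if __name__ == "__main__":
    print(f"memory:      {memory_bytes(PETAFLOPS) / 1e12:.1f} TB")
    print(f"concurrency: {required_concurrency(PETAFLOPS, 1e-9):.0f} operations")
    print(f"diameter:    {cycles_across(3.0, 1e9):.0f} cycles")
```

Under these assumptions, sublinear scaling gives roughly 32 terabytes, the same order as the 30 terabytes cited, whereas linear scaling from the same baseline would call for a petabyte. Likewise, a one-nanosecond operation time implies about a million operations in flight to sustain a petaflops rate, which is one way to read "million-fold concurrency."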
ISSUES TO EXPLORE, OBSTACLES TO CONQUER
During the course of workshop deliberations, many issues were brought to light, clarifying the space of opportunities and obstacles but leaving many questions unanswered. For example, assumptions about semiconductor technology in 20 years were derived from Semiconductor Industry Association projections that extend only to the year 2007, requiring extrapolation beyond that point. Attendees questioned the economics of specialty hardware, leaving unresolved the degree to which any future petaflops computer design must rely on commodity parts developed for more general commercial application.
The nature of the user base for petaflops computers was highly contested as well. The possibilities included classical science and engineering problems, total immersion virtual reality human interfacing, and massive information management and retrieval. The difficulty of programming even today's massively parallel processing systems left open the prospect that significant resources would be committed to achieving ease of use at the cost of sustained performance. How such systems would ultimately be programmed remains uncertain.
Although only a narrow set of architectures was examined, it was still very broad with respect to the technology issues it posed. For each of the three architectures, latency was seen as driving system architecture decisions, but the space of alternatives was left too wide to recommend a specific approach over all others. And beyond those explicitly examined, there remains the possibility of completely untried architectures that might greatly accelerate the pace to petaflops computing. These and other issues, while revealed as important at this workshop, remained unresolved at its close.
The workshop concluded with key recommendations for near-term initiatives to reduce uncertainty and advance US capability towards the achievement of petaflops computing. In the area of device technology, it is imperative that better projections for semiconductor evolution be derived and that the true potential of superconducting technology be better understood. With regard to applications, specific candidates for petaflops execution should be studied in depth to determine the required balance of resources and validate the appropriateness of the primary candidate architectures. Such a study should include at least one application for which there is little current use but that is potentially important to the future.
The Architecture Working Group covered many facets of petaflops architecture and produced a meaningful overview of a tenable petaflops computer structure, but many details had to be left unspecified. Participants recommended that a near-term study be initiated to fill in the gaps, determining the requirements of the constituent elements of such a future machine. These specifications are essential to validating the approach and determining the requirements for all of the technologies used in its implementation.
In conclusion, this first comprehensive review of petaflops computing systems brought together a remarkable set of experts and focused their talents on a question of great future importance to the nation's strength in science and engineering as well as its economic leadership in the next century. Ideas, both conservative and controversial, were explored. An initial set of findings derived from these in-depth deliberations will set the course toward the ultimate achievement of a petaflops computer. Recommendations resulting from these findings have been made available to technical policy makers and the high performance computing research community to establish near-term initiatives furthering these goals.
OBTAINING THE REPORT
"Enabling Technologies for Peta(FL)OPS Computing," by Thomas Sterling, Paul C. Messina, and Paul H. Smith, is available from Caltech in preprint form. Send email requests to: email@example.com or write to: CCSF Techpubs, California Institute of Technology, Mail Code 158-79, Pasadena, CA 91125. Include the title, authors, and report number (in this case, CCSF-45) to expedite your request.