
For projects ranging from aircraft manufacturing to oil and gas production, companies increasingly rely on high-performance technical servers to analyze structures, calculate fluid flow, design products, and simulate hard-to-test conditions. The servers also help scientists and engineers visualize that which can't be seen, from cancer-causing molecules to buried geologic faults.

Increasingly, parallel computers are the platforms of choice for running high-performance applications. New, flexible system architectures like global shared memory, developed by Hewlett-Packard for Exemplar S- and X-Class technical servers, are giving a wider range of users access to parallel processing power.

Other key factors include the increasing portability and performance of large-scale parallel application software - thanks partly to a decade-long initiative funded by the National Science Foundation (NSF). This initiative is spearheaded by the Center for Research on Parallel Computation (CRPC), an NSF Science and Technology Center.

For the CRPC, "Center" may be a misnomer. The organization consists of research groups at Rice University, the headquarters, as well as at Argonne National Laboratory, the California Institute of Technology, Los Alamos National Laboratory, Syracuse University, the University of Tennessee at Knoxville, and the University of Texas at Austin.


The CRPC was launched in 1989 to make large high-performance parallel computers as usable as conventional systems. This mission includes making parallel applications more portable and improving their performance on large technical servers. Since March 1997, the CRPC has been using a 24-processor Exemplar X-Class server as one of its primary platforms to pursue these and related research goals. The organization has been developing compiler software and optimization techniques that enable parallel programmers to efficiently generate high-quality parallel code for applications. By generating usable code more quickly, software firms reduce application development costs and time-to-market while producing better applications.

The CRPC also works directly with companies to refine parallel application programming techniques.


Hewlett-Packard became an early supporter of the Center for Research on Parallel Computation and also a collaborator, clarifying for CRPC research groups the technical issues associated with making high-performance parallel computing more useful to commercial enterprises.

Caltech CRPC researcher K. Mani Chandy (seated) demonstrates air pollution modeling in the South Coast Air Basin of California. Graduate student Adam Rifkin and programmer Zuzana Dobes look on. The demo shows a problem-solving environment designed to help make parallel computers easier to use.


"We have a strong relationship with HP's Advanced Technology Center," says Ken Kennedy, Ann and John Doerr Professor of Computer Science at Rice University in Houston, Texas, and CRPC director. Kennedy's vision drove the CRPC's creation. "Researchers here have collaborated with HP for 17 years, since well before we founded the CRPC," Kennedy adds. "It has given us the opportunity to work with the people who actually produce the HP machine software," he says. "This sharpens our research by focusing us on real-world issues - so we can better develop improvements that turn parallel computers into more usable and effective production environments for the commercial world." The collaboration continues. HP has adapted CRPC-developed compiler technology and optimization techniques, making it easier for developers to write software for HP Exemplar high-performance servers and to port the software to and from other types of computer systems.


"From our perspective," says Kennedy, "the Exemplar server gives us a great architecture and a great configuration for doing fundamental research into high-performance parallel computing. It's a very interesting machine, particularly easy to work with.

"For example," he says, "we can test-run experimental software on 16 processors or run it on eight processors in one node and eight in another. By measuring any differences in performance, we gain a fresh understanding of the memory hierarchy and its impact on application performance."

This understanding gives CRPC researchers the information they need to write software that delivers higher performance by taking full advantage of the system's architecture.

"The Exemplar server is a logical choice for the CRPC because of some very useful features," adds Rice University computer scientist and CRPC researcher Vikram Adve. "There's the global shared-memory architecture, the clustered nature of the machine, and the fact that it has multiple levels of shared memory within each node and across the nodes."

An Exemplar node contains a cluster of two to 16 tightly integrated, high-performance HP PA-RISC processors. An Exemplar server, in turn, can contain any number of nodes.

An Exemplar server's memory, input/output (I/O), bandwidth, and storage can all be balanced and scaled to the number of processors, creating the first shared-memory parallel processing architecture that is both truly scalable and commercially successful - a major benefit to companies and other organizations, such as the CRPC, that acquire and use high-performance technical servers.

Scalability lets users tailor a high-performance technical server to the enterprise's growth, changing needs, and strategic objectives - instead of junking the server and acquiring a costly new one or, worse, making do with a system that no longer fits the need.


The greatest benefit, perhaps, comes from the Exemplar global shared-memory (GSM) architecture. GSM enables any Exemplar processor to rapidly access any part of the system's memory. This capability helps make widespread parallel application development practical and affordable. Instead of forcing application developers to learn difficult, esoteric programming techniques, GSM enables them to write parallel software by applying familiar, efficient, intuitive shared-memory programming methods - essentially the same methods developers use when writing program code for conventional computers. It's as if a Parisian in Beijing were suddenly free to converse with everyone in French rather than having to fluently speak, write, and even think in Chinese.
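To make the shared-memory style concrete, here is a minimal sketch using Python threads - purely an illustration of the programming model, not HP or CRPC code. Every worker reads and writes one common array directly; no data is explicitly shipped between workers.

```python
import threading

# Illustrative sketch of shared-memory parallelism (not HP/CRPC code):
# all workers operate directly on the same arrays in one address space.
data = list(range(8))
results = [0] * len(data)

def worker(lo, hi):
    # Each thread updates its own slice of the shared results array;
    # no explicit messages are exchanged between workers.
    for i in range(lo, hi):
        results[i] = data[i] * data[i]

threads = [threading.Thread(target=worker, args=(0, 4)),
           threading.Thread(target=worker, args=(4, 8))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The point of the analogy: the parallel code looks almost identical to the sequential loop a developer would write for a conventional machine.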

Because the Exemplar shared-memory architecture lets developers use programming languages and techniques they already know, many more developers are writing parallel applications - and are doing so faster and at lower cost, often with better results.

All of these benefits make HP's technology and objectives complementary to the CRPC's efforts: to give more scientists and engineers at a greater number of enterprises ready access to high-performance parallel applications and servers.


The CRPC's research may help spark an explosion in the multibillion-dollar parallel computing marketplace. For decades, that market faced a classic chicken-and-egg dilemma. Not many users invested in machines for which few applications existed, and not many developers wrote applications that could run on only a few machines.

Technology advances from the CRPC plus flexible system architectures like HP's global shared memory are together breaking that historic deadlock. These advances are unleashing the same dynamic that drove the unprecedented market success of personal computers: once applications could run on any PC, software developers and users knew their respective investments were protected. Market growth soared.

For users of parallel systems like the Exemplar server, resolving the dilemma has meant gaining access to a wider variety of parallel applications that deliver more performance and offer more capabilities and ease of use while costing less to buy and maintain - all of which make acquiring a parallel platform more attractive.

These capabilities attract more application developers to the parallel market, further improving the selection and value available to a burgeoning spectrum of users.


Just a few years ago, nearly all large parallel systems employed distributed memory architecture, a factor that slowed the spread of parallel computing. This architecture requires a type of programming called message passing, which is difficult and tedious to work with, making applications more costly to develop and maintain.



As a result, Kennedy recalls, "Writing programs for parallel machines was nowhere near as easy as writing them for symmetric multiprocessing (SMP) systems." PCs and multiprocessor workstations are all SMP-type machines in which processors readily access a single pool of system memory.

HP addressed the issue by creating the GSM architecture - combining parallel processing power with shared memory's ease of programming. At the same time, CRPC researchers contributed their own breakthroughs. They developed the Message Passing Interface (MPI), a way to make programs that use message passing run on almost any parallel system, regardless of its architecture.


A major CRPC achievement to date is the development of High Performance Fortran, a parallel version of Fortran, the highly portable computer language long favored by programmers for writing scientific and engineering applications. An industrywide team, under the auspices of the High Performance Fortran Forum, is finalizing work on High Performance Fortran 2.0, offering extensions requested by early users of the language, higher application performance, and other improvements.

High Performance Fortran and MPI - along with other techniques originated by the CRPC and further developed by HP - enable software developers to use a single programming language and set of tools to write shared-memory program code, message-passing code, or a mixture of the two - an extremely efficient use of software development resources. This reduces the total cost of owning and using shared-memory parallel systems like the Exemplar server, making them much more attractive to business and industry.

Further, software developers writing in High Performance Fortran need create only one version of their source code, the basic application program. It can then be readily ported - adapted - to any large-scale parallel computing system, giving developers a potent incentive to write parallel applications. As a result they can sell copies to run on any number of parallel systems and architectures.

High Performance Fortran has become a de facto standard language for parallel programming. "More than 40 major parallel applications are already written in High Performance Fortran," Kennedy says, "and independent developers have marketed their own implementations of the language" - including a High Performance Fortran release that HP makes available to Exemplar users and software developers.


Two CRPC research groups based at Rice have begun using the Exemplar system.

The Fortran Parallel Programming Systems Group aims to perfect High Performance Fortran. The group's current focus: developing High Performance Fortran compiler tools that will generate code that rivals the performance of hand-coded programs - a difficult challenge.

Besides portability, users of technical servers demand high performance so they can run iterations of their applications as quickly as possible. Doing so can increase product design quality, seismic imaging accuracy for petroleum exploration, and the realism of simulations.

"Looking ahead," says computer scientist Vikram Adve, "we see compiler techniques that radically restructure the data to be processed. They will also restructure the processing. This will make the most efficient use of a multilevel memory hierarchy such as the Exemplar server's, and also reduce computational overhead by managing synchronization more efficiently." Synchronization determines how processors cooperate in parallel to solve a numeric problem.
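One classic example of this kind of restructuring is loop tiling, shown here as a generic Python sketch - an illustration of the technique, not actual CRPC compiler output. The loop is reorganized into small blocks so that each block of data is reused while it still sits in a fast level of the memory hierarchy.

```python
# Generic illustration of loop tiling (not CRPC compiler output).
N, TILE = 8, 4
a = [[i * N + j for j in range(N)] for i in range(N)]
b = [[0] * N for _ in range(N)]

# A naive transpose walks b in large strides; this tiled version works
# on TILE x TILE blocks small enough to stay in fast memory while in use.
for ii in range(0, N, TILE):
    for jj in range(0, N, TILE):
        for i in range(ii, min(ii + TILE, N)):
            for j in range(jj, min(jj + TILE, N)):
                b[j][i] = a[i][j]

# b now holds the transpose of a, computed block by block.
```

At these toy sizes the benefit is invisible, but on real matrices the blocked order cuts cache and memory-system traffic - precisely the multilevel-hierarchy effect Adve describes.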

The Parallel Optimization Group works on techniques for multidisciplinary design optimization. Adve explains, "An aerospace firm, for instance, must solve in a timely manner huge numeric problems that mix computational fluid dynamics, structures modeling, and manufacturability - while using a finite amount of processing resources."

Two non-CRPC computer science teams at Rice are also using the new Exemplar server.

One group seeks to boost the performance of shared-memory parallel systems by developing techniques to measure how a program runs and to identify excess synchronization. This will let developers improve application performance by reducing computational overhead.
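The measurement idea can be sketched as follows, in Python and with hypothetical instrumentation (the actual Rice tools are not described here): wrap each lock acquisition in a timer, so the time threads spend waiting to synchronize becomes a visible, reducible cost.

```python
import threading
import time

# Sketch of synchronization profiling (hypothetical instrumentation,
# not the Rice tools): record how long threads wait to acquire a lock.
lock = threading.Lock()
wait_time = 0.0
counter = 0

def worker(iters):
    global wait_time, counter
    for _ in range(iters):
        t0 = time.perf_counter()
        with lock:  # every iteration synchronizes...
            # ...and the elapsed time includes any wait for the lock
            wait_time += time.perf_counter() - t0
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"lock acquisitions: {counter}, total wait: {wait_time:.4f}s")
```

If the measured wait dwarfs the useful work inside the lock, the synchronization is excess overhead - a candidate for the kind of reduction the group pursues.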

Another group is developing ways to simulate parallel computer architectures - an important step to designing better computer systems. "Simulators tend to be extremely slow," says Adve. "Parallelizing them will help systems engineers refine computer designs more quickly and thoroughly, resulting in better designs and a shorter time-to-market."


Other institutions in the computer science research community also use Exemplar systems. Two CRPC research groups based at California Institute of Technology (Caltech) use the world's largest and most powerful Exemplar server, a 256-processor system. One group, which also includes Los Alamos scientists, is working on ways to parallelize the processing of differential equations - important to speeding the solution of numeric problems in many science and engineering disciplines.

A second Caltech group, collaborating with scientists at Argonne National Laboratory, addresses parallel program integration. This team is developing modular parallel extensions to common languages, such as Fortran and C, again to make parallel computation accessible to a greater number of engineers. The group also develops programming templates that help scientists and engineers parallelize their numeric problems.

The CRPC's seven core sites have computer science research affiliations with more than a dozen other universities and institutions, including the National Center for Supercomputing Applications at the University of Illinois, where teams use a 64-processor Exemplar X-Class server to tackle challenges facing users of information systems.

Together, Caltech and HP continuously collaborate with NASA's Jet Propulsion Laboratory to improve parallel system software and tools.


"We will continue to address issues critical to the long-term success of high-performance parallel computing," says Ken Kennedy. "Some of our people are developing a set of parallel extensions to C++, one of the most widely used conventional programming languages.

"Soon we hope to test-run large numeric problems by networking our Exemplar server here at Rice with the ones at Caltech and NCSA at the University of Illinois. Then we may try networking the Exemplar server with other high-performance platforms."

Kennedy adds, "There will always be a need to better understand the fundamentals of how parallel computing and specific parallel platforms work, to deliver better application performance and usability to commercial enterprises. The CRPC will be pioneering that work. And I expect we will continue our long collaboration with HP to meet those challenges."


    For more information, contact any of our worldwide sales offices or HP Channel Partners (in the U.S., call 1-800-637-7740; in Canada, call 1-800-387-3867). Look for HP on the World Wide Web.

    The information contained in this document is subject to change without notice.

    Copyright Hewlett-Packard Co., 1998
    All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited except as allowed under the copyright laws.

    Printed in USA 01/98