For projects ranging from aircraft
manufacturing to oil and gas production,
companies increasingly rely on
high-performance technical servers to
analyze structures, calculate fluid
flow, design products, and simulate
hard-to-test conditions. The servers
also help scientists and engineers
visualize that which can't be seen,
from cancer-causing molecules to
buried geologic faults.
Increasingly, parallel computers are
the platforms of choice for running
high-performance applications. New,
flexible system architectures like
global shared memory, developed by
Hewlett-Packard for Exemplar S- and
X-Class technical servers, are giving a
wider range of users access to parallel computing.
Other key factors include the increasing
portability and performance of
large-scale parallel application software - thanks partly to a decade-long
initiative funded by the National
Science Foundation (NSF). This initiative is spearheaded by the Center for
Research on Parallel Computation
(CRPC), an NSF Science and Technology Center.
For the CRPC, "Center" may be a misnomer. The organization consists of
research groups at Rice University,
the headquarters, as well as at
Argonne National Laboratory, the
California Institute of Technology, Los
Alamos National Laboratory, Syracuse
University, the University of
Tennessee at Knoxville, and the
University of Texas at Austin.
The CRPC was launched in 1989 to
make large high-performance parallel
computers as usable as conventional
systems. This mission includes making
parallel applications more portable
and improving their performance on
large technical servers. Since March
1997 the CRPC has been using a 24-processor Exemplar X-Class server as
one of its primary platforms to pursue
these and related research goals.
The organization has been developing
compiler software and optimization
techniques that enable parallel programmers to efficiently generate
high-quality parallel code for applications.
By generating usable code more quickly, software firms reduce application
development costs and time-to-market
while producing better applications.
The CRPC also works directly with
companies to refine parallel application programming techniques.
Hewlett-Packard became an early
supporter of the Center for Research
on Parallel Computation and also a
collaborator, clarifying for CRPC
research groups the technical issues
associated with making high-performance parallel computing more useful
to commercial enterprises.
Caltech CRPC researcher K. Mani Chandy (seated) demonstrates air pollution
modeling in the South Coast Air Basin of California. Graduate student Adam
Rifkin and programmer Zuzana Dobes look on. The demo shows a problem-solving
environment designed to help make parallel computers easier to use.
"We have a strong relationship with
HP's Advanced Technology Center,"
says Ken Kennedy, Ann and John
Doerr Professor of Computer Science
at Rice University in Houston, Texas,
and CRPC director. Kennedy's vision
drove the CRPC's creation. "Researchers here have collaborated with HP for
17 years, since well before we founded
the CRPC," Kennedy adds.
"It has given us the opportunity to
work with the people who actually
produce the HP machine software," he
says. "This sharpens our research by
focusing us on real-world issues - so
we can better develop improvements
that turn parallel computers into more
usable and effective production environments for the commercial world."
The collaboration continues. HP has
adapted CRPC-developed compiler
technology and optimization techniques, making it easier for developers
to write software for HP Exemplar
high-performance servers and to port
the software to and from other types
of computer systems.
"From our perspective," says Kennedy,
"the Exemplar server gives us a great
architecture and a great configuration
for doing fundamental research into
high-performance parallel computing.
It's a very interesting machine, particularly easy to work with.
"For example," he says, "we can test-run
experimental software on 16
processors or run it on eight processors in one node and eight in another.
By measuring any differences in performance, we gain a fresh understanding of
the memory hierarchy and its
impact on application performance."
This understanding gives CRPC
researchers the information they need
to write software that delivers higher
performance by taking full advantage
of the system's architecture.
"The Exemplar server is a logical
choice for the CRPC because of some
very useful features," adds Rice
University computer scientist and
CRPC researcher Vikram Adve.
"There's the global shared-memory
architecture, the clustered nature of
the machine, and the fact that it has
multiple levels of shared memory
within each node and across the
An Exemplar node contains a cluster
of two to 16 tightly integrated,
high-performance HP PA-RISC processors.
An Exemplar server, in turn, can contain any number of nodes.
An Exemplar server's memory,
input/output (I/O), bandwidth and
storage can all be balanced and scaled
to the number of processors, creating
the first shared-memory parallel processing architecture that is both truly
scalable and commercially successful - a major benefit to companies and
other organizations, such as the
CRPC, that acquire and use
high-performance technical servers.
Scalability lets users tailor a
high-performance technical server to the
enterprise's growth, changing needs
and strategic objectives - instead of
junking the server and acquiring a
costly new one or, worse, making
do with a system that no longer fits
The greatest benefit, perhaps, comes
from the Exemplar global
shared-memory (GSM) architecture. GSM
enables any Exemplar processor to
rapidly access any part of the system's
memory. This capability helps make
widespread parallel application development practical and affordable.
Instead of forcing application developers to learn difficult, esoteric
programming techniques, GSM enables them to
write parallel software by applying
familiar, efficient, intuitive
shared-memory programming methods -
essentially the same that developers
use when writing program code for
conventional computers. It's as if a
Parisian in Beijing were suddenly free
to converse with everyone in French
rather than having to fluently speak,
write and even think in Chinese.
Because the Exemplar shared-memory
architecture lets developers use programming languages and techniques
they already know, many more developers are writing parallel applications -
and are doing so faster and at
lower cost, often with better results.
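To make the shared-memory idea concrete, here is a minimal sketch in Python. It is not Exemplar code and does not come from HP or the CRPC: Python threads stand in for processors, and the function name scale_shared and its parameters are hypothetical. What it illustrates is only the programming style described above, in which every worker reads and writes the same data structure directly, with no copying or sending of data.

```python
import threading


def scale_shared(data, factor, workers=4):
    """Scale every element of `data` in place using several threads.

    All threads see the same list object, so nothing is copied or sent:
    each worker simply updates its own contiguous range of indices.
    (Hypothetical illustration; threads stand in for processors.)
    """
    n = len(data)
    chunk = (n + workers - 1) // workers  # ceiling division

    def work(start, stop):
        for i in range(start, stop):
            data[i] *= factor  # direct access to the shared list

    threads = []
    for w in range(workers):
        start, stop = w * chunk, min((w + 1) * chunk, n)
        t = threading.Thread(target=work, args=(start, stop))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return data
```

The loop inside each worker looks the same as it would in a sequential program. The only parallel-specific code is the division of the index range among the workers.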
All of these benefits make HP's technology and objectives complementary
to the CRPC's goal: giving more scientists and engineers at a greater number
of enterprises ready access to
high-performance parallel applications and servers.
MULTIBILLION-DOLLAR MARKET
The CRPC's research may help spark
an explosion in the multibillion-dollar
parallel computing marketplace.
For decades, that market faced a
classic chicken-and-egg dilemma.
Not many users invested in machines
for which few applications existed,
and not many developers wrote
applications that could run on only
a few machines.
Technology advances from the CRPC
plus flexible system architectures like
HP's global shared memory are
together breaking that historic deadlock. These advances are unleashing
the same dynamic that drove the
unprecedented market success of personal computers: once applications
could run on any PC, software developers and users knew their respective
investments were protected. Market
For users of parallel systems like
the Exemplar server, resolving the
dilemma has meant gaining access to
a wider variety of parallel applications
that deliver more performance and
offer more capabilities and ease of use
while costing less to buy and maintain - all of which make acquiring a
parallel platform more attractive.
These capabilities attract more application developers to the parallel market,
further improving the selection
and value available to a burgeoning
spectrum of users.
Just a few years ago, nearly all large
parallel systems employed distributed
memory architecture, a factor that
slowed the spread of parallel computing. This architecture requires a type
of programming called message passing, which is difficult and tedious to
work with, making applications more
costly to develop and maintain.
As a result, Kennedy recalls, "Writing
programs for parallel machines was
nowhere as easy as writing them for
shared-memory processing (SMP) systems." PCs and multiprocessor workstations
are all SMP-type machines in
which processors readily access a
single pool of system memory.
HP addressed the issue by creating
the GSM architecture - combining
parallel processing power with shared
memory's ease of programming.
At the same time, CRPC researchers
contributed their own breakthroughs.
They developed the Message Passing
Interface (MPI), a way to make programs that use message passing run
on almost any parallel system, regardless of its architecture.
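For readers unfamiliar with message passing, the following Python sketch suggests why the style takes more effort than shared-memory programming. It does not use MPI or any MPI binding. As an assumption made purely for illustration, standard-library queues play the role of send and receive operations, and the name scale_by_messages is hypothetical. Each worker owns a private copy of its block, so the coordinator must explicitly scatter the data and then gather and reassemble the results.

```python
import queue
import threading


def scale_by_messages(data, factor, workers=4):
    """Scale `data` with workers that share no data and talk only via queues.

    Queues stand in for the send/receive calls of a message-passing
    library; this is an illustration of the style, not MPI itself.
    """
    n = len(data)
    chunk = (n + workers - 1) // workers  # ceiling division
    inboxes = [queue.Queue() for _ in range(workers)]
    results = queue.Queue()

    def work(rank):
        start, block = inboxes[rank].get()    # receive my block
        scaled = [x * factor for x in block]  # compute on a private copy
        results.put((start, scaled))          # send the result back

    threads = [threading.Thread(target=work, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()

    # The coordinator explicitly scatters the data to the workers...
    for r in range(workers):
        start = r * chunk
        inboxes[r].put((start, list(data[start:start + chunk])))

    # ...and explicitly gathers the pieces and puts them back in order.
    out = [None] * n
    for _ in range(workers):
        start, scaled = results.get()
        out[start:start + len(scaled)] = scaled
    for t in threads:
        t.join()
    return out
```

The arithmetic is a single line. Nearly everything else is bookkeeping for moving data between workers, which is the overhead that shared-memory programming avoids.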
A major CRPC achievement to date is
the development of High Performance
Fortran, a parallel version of Fortran,
the highly portable computer language
long favored by programmers for writing scientific and engineering
applications. An industrywide team, under
the auspices of the High Performance
Fortran Forum, is finalizing work
on High Performance Fortran 2.0,
offering extensions requested by
early users of the language, higher
application performance, and other
High Performance Fortran and MPI -
along with other techniques originated
by the CRPC and further developed by
HP - enable software developers to use
a single programming language and
set of tools to write shared-memory
program code, message-passing code,
or a mixture of the two, providing an
extremely efficient use of software
development resources. This reduces
the total cost of owning and using
shared-memory parallel systems like
the Exemplar server, making them
much more attractive to business and
Further, software developers writing
in High Performance Fortran need
create only one version of their source
code, the basic application program. It
can then be readily ported - adapted - to
any large-scale parallel computing system, giving developers a potent
incentive to write parallel applications. As a
result they can sell copies to run on
any number of parallel systems and
High Performance Fortran has
become a de facto standard language
for parallel programming. "More than
40 major parallel applications are
already written in High Performance
Fortran," Kennedy says, "and independent developers have marketed their
own implementations of the language" - including a High Performance
Fortran release that HP makes available to Exemplar users and software
Two CRPC research groups based at
Rice have begun using the Exemplar
The Fortran Parallel Programming Systems
Group aims to perfect High
Performance Fortran. The group's
current focus: developing High
Performance Fortran compiler tools
that will generate code that rivals the
performance of hand-coded programs - a difficult challenge.
In addition to portability, users of technical servers demand high
performance so they can run iterations of their
applications as quickly as possible.
Doing so can increase product design
quality, seismic imaging accuracy for
petroleum exploration, and the realism of simulations.
"Looking ahead," says computer scientist Vikram Adve, "we see compiler
techniques that radically restructure
the data to be processed. They will
also restructure the processing. This
will make the most efficient use of a
multilevel memory hierarchy such as
the Exemplar server's, and also reduce
computational overhead by managing
synchronization more efficiently."
Synchronization determines how
processors cooperate in parallel to
solve a numeric problem.
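The cost of synchronization can be illustrated with a small Python sketch. It is a hypothetical example, not a CRPC compiler technique: the function parallel_sum and its fine_grained flag are invented for illustration, and threads again stand in for processors. The function adds up a list in parallel and reports how many times the workers had to acquire a shared lock, first when they synchronize on every element and then when each worker combines its own block privately and synchronizes only once.

```python
import threading


def parallel_sum(values, workers=4, fine_grained=False):
    """Return (total, lock_acquisitions) for a threaded sum of `values`.

    fine_grained=True  -> every element is added under the lock.
    fine_grained=False -> each worker sums its block privately and
                          takes the lock once to add its partial sum.
    (Hypothetical illustration of synchronization overhead.)
    """
    lock = threading.Lock()
    state = {"total": 0, "acquisitions": 0}
    chunk = (len(values) + workers - 1) // workers  # ceiling division

    def add(amount):
        with lock:  # one synchronization event
            state["total"] += amount
            state["acquisitions"] += 1

    def work(block):
        if fine_grained:
            for v in block:  # synchronize on every element
                add(v)
        else:
            add(sum(block))  # synchronize once per worker

    threads = [
        threading.Thread(target=work, args=(values[w * chunk:(w + 1) * chunk],))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["total"], state["acquisitions"]
```

Both modes return the same total. For 100 values on four workers, the fine-grained version acquires the lock 100 times and the coarse version four times, which is the kind of excess synchronization that restructuring the computation can remove.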
The Parallel Optimization Group works on
techniques to optimize multidisciplinary programming design. Adve
explains, "An aerospace firm, for
instance, must solve in a timely manner huge numeric problems that mix
computational fluid dynamics, structures modeling, and manufacturability -
while using a finite amount of
Two non-CRPC computer science
teams at Rice are also using the new
One group seeks to boost the performance of shared-memory parallel
systems by developing techniques to
measure how a program runs and to
identify excess synchronization. This
will let developers improve application performance by reducing computational overhead.
Another group is developing ways to
simulate parallel computer architectures - an important step to designing
better computer systems. "Simulators
tend to be extremely slow," says Adve.
"Parallelizing them will help systems
engineers refine computer designs
more quickly and thoroughly, resulting
in better designs and a shorter time-to-market."
Other institutions in the computer
science research community also
use Exemplar systems. Two CRPC
research groups based at California
Institute of Technology (Caltech) use
the world's largest and most powerful
Exemplar server, a 256-processor system. One group, which also includes
Los Alamos scientists, is working on
ways to parallelize the processing of
differential equations - important to
speeding the solution of numeric
problems in many science and engineering disciplines.
A second Caltech group, collaborating with scientists at Argonne National
Laboratory, addresses parallel program integration. This team is developing
modular parallel extensions to
common languages, such as Fortran
and C, again to make parallel computation accessible to a greater number
of engineers. The group also develops
programming templates that help scientists and engineers parallelize their
The CRPC's seven core sites have
computer science research affiliations
with more than a dozen other universities and institutions, including the
National Center for Supercomputing
Applications at the University of
Illinois, where teams use a 64-processor Exemplar X-Class server to tackle
challenges facing users of information
Together, Caltech and HP continuously collaborate with NASA's Jet
Propulsion Laboratory to improve
parallel system software and tools.
"We will continue to address issues
critical to the long-term success of
high-performance parallel computing,"
says Ken Kennedy. "Some of our
people are developing a set of parallel
extensions to C++, one of the most
widely used conventional programming languages.
"Soon we hope to test-run large
numeric problems by networking our
Exemplar server here at Rice with the
ones at Caltech and NCSA at the
University of Illinois. Then we may try
networking the Exemplar server with
other high-performance platforms."
Kennedy adds, "There will always be a
need to better understand the fundamentals of how parallel computing
and specific parallel platforms work,
to deliver better application performance and usability to commercial
enterprises. The CRPC will be pioneering that work. And I expect we
will continue our long collaboration with HP to meet those challenges."
The information contained in this document is subject
to change without notice.
Copyright © Hewlett-Packard Co., 1998
All Rights Reserved. Reproduction, adaptation,
or translation without prior written permission
is prohibited except as allowed under the
Printed in USA 01/98