Running Simulations with the Best Codes on the Best Machines for the Job
From: NPACI & SDSC Envision, April-June 1998

Jack Dongarra, Distinguished Professor, Computer Science, University of Tennessee; Distinguished Scientist, Oak Ridge National Laboratory Mathematical Science Section

James Demmel, Professor, Computer Science and Mathematics Department, UC Berkeley

ALL THE COMPUTERS ON THE INTERNET TOGETHER perform hundreds of trillions of operations every second. Researchers would like their programs to take advantage of even a fraction of those computers at the same time, but today such a program must include code to handle the details of hardware, operating systems, and network connections. Long-time collaborators Jack Dongarra of the University of Tennessee and Oak Ridge National Laboratory and James Demmel of UC Berkeley are now working with NPACI to turn the computing resources on the Internet into a unified problem-solving environment through improved code libraries, code generators, and the NetSolve software system.

Figure 1. A client application makes a request to a NetSolve agent. The agent searches its database of methods, platforms, and other NetSolve agents to choose an appropriate high-performance computing server on the network. When the server finishes the computation, the result is returned to the client application via the NetSolve agent.

Demmel and Dongarra have long worked together on programming libraries, which provide sets of related subroutines, for high-performance computers. Two key libraries solve linear algebra problems: LAPACK for serial and shared-memory computers, and ScaLAPACK for distributed-memory parallel machines. Under NPACI, the pair is making sure the libraries are available on NPACI machines.

"We have gotten performance of 750 gigaflops on ASCI Red using ScaLAPACK on a generalized symmetric eigenvalue problem of size 40,000," Dongarra said. "We should see similarly impressive results on NPACI machines." The ASCI Red work, a collaboration with Greg Henry of Intel and Ken Stanley of UC Berkeley, ran on 4,500 processors.

The next release of LAPACK will feature much better implementations of the singular value decomposition (SVD) method and a least squares solver using SVD. A future release will have even faster code for SVD and the symmetric eigenvalue problem. The next release of ScaLAPACK will also have optimal code for SVD and the symmetric eigenvalue problem.

Demmel and colleagues are also creating versions of other libraries, including SuperLU, for distributed-memory machines. The code has achieved 7.9 gigaflops on 480 processors of a CRAY T3E at the National Energy Research Scientific Computing Center. "These codes promise to be highly scalable," Demmel said. "One of their first uses will be in the earthquake simulations of Greg Fenves." Fenves is professor of civil engineering at UC Berkeley and an assistant director of the Pacific Earthquake Engineering Research Center, headquartered at UC Berkeley.

Dongarra is also working on an NPACI software repository based on the Netlib collection of code libraries for mathematics users. "The repository will provide the community with 'one-stop shopping' for NPACI software," Dongarra said. "All software developed by NPACI will be put into this library."

A second area of research for Dongarra and Demmel is developing techniques for generating optimized code automatically. While code libraries can be ported and tuned for a particular piece of hardware, other parts of applications may not be so general. Users often have a standard set of application kernels that they use, but when they move to new platforms, those kernels are no longer optimal.

The Automatically Tuned Linear Algebra Software (ATLAS) package generates and optimizes numerical software for processors with deep memory hierarchies and pipelined functional units. ATLAS, being developed by Dongarra and colleagues at Tennessee, can produce optimized code for machines ranging from desktop workstations to embedded processors, normally a tedious and time-consuming task. Thus far, they have concentrated on the widely used Basic Linear Algebra Subprograms (BLAS).

Demmel and researchers at UC Berkeley are working on a similar project, called PHiPAC. PHiPAC has so far been applied to matrix multiplication codes. "PHiPAC generates many reasonable code implementations, benchmarks them all, and 'intelligently' picks the best version for a particular platform," Demmel said. "In the future, we aspire to tackle more complicated routines."
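The generate, benchmark, and select approach Demmel describes can be illustrated with a small sketch. This is not PHiPAC or ATLAS code: the function names, the choice of loop blocking as the only tuning parameter, and the matrix size are all assumptions made for illustration. The sketch builds several blocked matrix-multiply variants, times each one on the current machine, and keeps the fastest.

```python
import time


def make_blocked_matmul(block):
    """Return a matrix-multiply function whose loops are tiled by `block`."""
    def matmul(a, b):
        n, m, p = len(a), len(b), len(b[0])
        c = [[0.0] * p for _ in range(n)]
        for ii in range(0, n, block):
            for kk in range(0, m, block):
                for jj in range(0, p, block):
                    # Multiply one tile of a by one tile of b.
                    for i in range(ii, min(ii + block, n)):
                        row_a, row_c = a[i], c[i]
                        for k in range(kk, min(kk + block, m)):
                            aik, row_b = row_a[k], b[k]
                            for j in range(jj, min(jj + block, p)):
                                row_c[j] += aik * row_b[j]
        return c
    return matmul


def autotune(candidates, a, b, repeats=3):
    """Time every candidate on this machine; return (best_name, timings)."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(a, b)
            best = min(best, time.perf_counter() - start)
        timings[name] = best  # keep the fastest of the repeated runs
    return min(timings, key=timings.get), timings


# Hypothetical test problem: two 32 x 32 matrices.
n = 32
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

# "Generate" the candidate implementations, one per block size.
candidates = {f"block={s}": make_blocked_matmul(s) for s in (4, 8, 16, 32)}
best_name, timings = autotune(candidates, a, b)
print("fastest variant on this machine:", best_name)
```

The winning variant depends on the machine, which is the point of the technique: the same generator can be rerun on each new platform instead of retuning the kernel by hand. Real generators search a much larger space, such as register blocking and loop unrolling, and emit compiled code rather than interpreted loops.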

Figure 2. MCell, a neuron synapse simulator by Tom Bartol of the Salk Institute for Biological Studies and Joel Stiles at Cornell University, will use NetSolve to conduct larger simulations.

Code libraries and optimized code have their greatest impact when they run on high-performance hardware, and the Internet has many such resources. NetSolve, a software system that Dongarra is bringing to NPACI, transforms disparate, loosely connected computers and software libraries into a unified, easy-to-access computational service. This service can make enormous amounts of computational power available transparently to users on ordinary network platforms.

Even from a laptop or a hand-held device, NetSolve can tap supercomputing power without the user's having to know anything about the network or the pool of hardware and software resources. An engineer, for example, might formulate the computational problems involved in a large model or simulation and leave it to NetSolve to determine, in light of all the resources available across the network, how to get the required computations done in the fastest and most efficient way possible.

From the user's point of view, NetSolve appears to be able to dynamically acquire remote resources in a way that can increase the instantaneous performance at the desktop by several orders of magnitude. NetSolve hides the complexity of the underlying system through client interfaces ranging from the sophistication of MATLAB and Java to run-of-the-mill Fortran subroutine calls (Figure 1). As Java made it possible for applets to download automatically from the Web, NetSolve in effect delivers supercomputing services directly to any computer on the network.

In an early application, Dongarra is collaborating with Tom Bartol, a postdoctoral researcher at the Salk Institute for Biological Studies and a participant in NPACI's Neuroscience thrust area, to move routines from the neuron simulator MCell into NetSolve. MCell, a project of Bartol and Joel Stiles at Cornell University, is a 3-D Monte Carlo simulation for neurotransmitter release across synapses. MCell can simulate several classes of transmitters and receptors, along with complex 3-D arrangements of multiple cell or organelle membranes (Figure 2).

Without NetSolve, an application such as MCell would make calls to library routines, which are executed in serial on the user's machine. With NetSolve, on the other hand, the application calls NetSolve with a description of the library routine that needs to be run. The NetSolve agent then finds the best resource available to run the routine. The user is not aware of the networking or the details of where the routine is executed. In fact, an application may send off many NetSolve requests simultaneously, all of which may be executed in parallel on different machines.
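The request flow described here can be sketched in a few lines. This is not the NetSolve API: the `Server` and `Agent` classes, the cost estimate (queue length divided by relative speed), and the two toy routines are hypothetical stand-ins, and the "servers" are simulated in one process rather than reached over a network. The sketch shows an agent choosing a server for each named routine while the client issues several requests at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    routines: dict    # routine name -> callable (stands in for an installed library)
    speed: float      # relative speed; higher is faster (assumed metric)
    pending: int = 0  # requests currently assigned to this server


class Agent:
    """Toy agent: tracks servers and picks one for each request."""

    def __init__(self, servers):
        self.servers = servers
        self._lock = threading.Lock()

    def choose(self, routine):
        """Pick the server offering `routine` with the lowest estimated cost."""
        offering = [s for s in self.servers if routine in s.routines]
        if not offering:
            raise LookupError(f"no server offers routine {routine!r}")
        return min(offering, key=lambda s: (s.pending + 1) / s.speed)

    def solve(self, routine, *args):
        """Run `routine` on the chosen server; return (server name, result)."""
        with self._lock:
            server = self.choose(routine)
            server.pending += 1
        try:
            return server.name, server.routines[routine](*args)
        finally:
            with self._lock:
                server.pending -= 1


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm1(x):
    return sum(abs(v) for v in x)


agent = Agent([
    Server("workstation", {"dot": dot}, speed=1.0),
    Server("cluster", {"dot": dot, "norm1": norm1}, speed=8.0),
])

# The client names the routine and passes data; it never names a machine.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(agent.solve, "dot", [1, 2, 3], [4, 5, 6]),
        pool.submit(agent.solve, "norm1", [-1, 2, -3]),
    ]
    results = [f.result() for f in futures]

for server_name, value in results:
    print(server_name, value)
```

The client code contains no host names or network details; swapping in a different pool of servers changes only the agent's list. A real system would also have to account for data transfer time and server failures, which this sketch ignores.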

"We have the software available, and we are always looking for users with interesting applications that we can deploy under NetSolve," said Henri Casanova, a research associate on the NetSolve team who recently received his Ph.D. from Tennessee. "It's fairly easy to add to the existing routines." Current plans call for increasing the number of routines and NetSolve servers. At the moment, NetSolve runs on machines at Tennessee, at UC Berkeley, and in Denmark.

Under NPACI, NetSolve will be integrated into the computing infrastructure. One possible route may be to make NetSolve an allocable resource on NPACI machines. Dongarra is also working with researchers with similar interests from the Metasystems thrust area, including Fran Berman at UC San Diego, for scheduling software, and the Globus team, for its toolkit for overlapping tasks.

"NetSolve is experiencing a tremendous popularity increase and is bound to generate a tremendous demand on the network as it is used for more complex and diverse applications," Dongarra said. "NetSolve is an example of a good marriage between NPACI applications and software infrastructure." -DH