CALTECH'S NEW EXEMPLAR: AN INTERVIEW WITH PAUL MESSINA
Source: HPCwire, June 13, 1997
By Alan Beck
Editor in chief
Pasadena, Calif. -- On June 9 a 256-CPU Hewlett-Packard Exemplar was dedicated at Caltech's Center for Advanced Computing Research. (See HPCwire article 11323, "HP'S 256-CPU SYSTEM AT CALTECH/JPL TO MERCEDIZE IN FUTURE," 06.06.97.) To learn more about the system and how it will be used, HPCwire interviewed Paul Messina, director of California Institute of Technology's Center for Advanced Computing Research and assistant vice-president for Scientific Computing at Caltech. Following are selected excerpts from that discussion.
HPCwire: What will be the principal application areas for the new 256-CPU Exemplar?
MESSINA: "I can only give some examples. I must emphasize that since we've had large parallel computers at Caltech and JPL for about 15 years, there are many applications. Also, the NSF's (National Science Foundation) academic research community will have access to it through the NPACI (National Partnership for Advanced Computational Infrastructure) grant.
"Some of the items that have been transitioned early are ocean modeling and climate studies. This is part of a long-term effort of JPL where we utilized fully parallel programs. One of those is already running on 64 processors of the HP (Hewlett-Packard) system. We currently have 128 of the 256 processors in.
"We have a variety of applications that deal with very large sets of observational data. Some are synthetic aperture radar data of the Earth or other planets such as Venus. A number of NSF grant plan applications address this too; for example, the Digital Sky project will collect several existing databases and provide convenient ways to access the data (in multiple wave lengths) as well as provide access to large computers for the compute-intensive types of uses. Typically, these are sky surveys of up to a couple of billion items and therefore represent terabytes of data. The project seeks not only to allow easy access to optical wavelength images for areas of the sky but also to allow compute-intensive correlations among galaxies or structures.
"There are also classical scientific and engineering applications. One is geophysical: studying the internal structure of the earth, including convection in its mantle. Another is turbulent flow computations that are often employed in aircraft design. John Seinfeld, professor of Chemical Engineering at Caltech and Donald Dabdub, professor at UC Irvine, are doing exciting work modeling air quality in the Los Angeles Basin. Some of their models have led to policy implementations of environmental quality standards on particulate matter and chemical reactions.
"In addition, other interesting applications pertain to the fabrication and design of microprocessors."
HPCwire: How soon will the entire 256-processor machine be operational?
MESSINA: "It will be fully delivered by June 17 and fully operational by mid-summer. It has been our experience that we can integrate 64 processors with memory, test them for uptime and reliability, and run fundamental performance, accuracy and speed tests in ten days to two weeks.
"This has happened twice before. We received the first four 16-processor hypernodes, tied them together and tested them. Then we did it again more recently, so we now have 128 -- the process tends to take a couple of weeks. So I will guess that the whole system will be available by late July."
HPCwire: What key challenges will be encountered in scaling the operating system up to 256 processors?
MESSINA: "It's a bit early for me to be very specific, but I can draw on previous experience and some expectations. I should note that the current OS is SP-UX; this is similar but not identical to HP-UX. There is a firm plan to go to HP-UX rather soon.
"Some of the important issues that arise have to do with single system image. What does it mean for one OS to be running on all 256 processors? Past systems we've used such as the Intel Paragon or Cray T3D have a single system image: you have one file system so that any processor can access any file anywhere online. You only have to issue one command to boot or shut down the machine. No matter where a processor is in the network interface, all the system resources are available to it. One job scheduler can handle all the jobs for the machine. That's what I would categorize as a single system image.
"While no breakthroughs or radically new concepts will have to be invented for the HP system, it will be a matter of judicious detail design so that we don't have the kind of bottlenecks we've experienced not on the HP but, say, on the Intel machine. For example, the name server for a disk file system may generate a bottleneck. Someone writing code 10 or 15 years ago never envisioned that 500 processors might try to access a name server at once. Things like that must be made scalable. And typically you do it by setting up a hierarchy, so you don't have a single processor handling all the name serving for all the I/O requests by all the applications processors.
"What I anticipate is that -- since to my knowledge HP-UX was not designed for many hundreds of processors simultaneously accessing OS services such as name servers -- this is an area where we will perhaps do some redesigning. I suspect it will not have to be a fundamental redesign: it will be more of a recrafting of the implementation. We will have to decide exactly how we want to define single system image for this system.
"With respect to scalability issues, we will almost certainly run into some areas where the design and implementation of the detailed software for the operating system services never took into account that there might be hundreds of processes simultaneously trying to do something. Therefore we'll have to distribute those tasks to avoid the bottleneck of waiting for one processor to do everything."
HPCwire: Is actual networking planned between the Caltech Exemplar and machines at Rice and NRL (Naval Research Lab)?
MESSINA: "Caltech will be under the NSF's vBNS, as will Rice. Although NRL is not on vBNS, it is on networks connected to it. However, we have no special plans to network with the Exemplars at those locations. Undoubtedly, we'll have much contact with them -- as we will with NCSA. We do hope to effect a much faster network with certain other sites, such as the University of California at San Diego with its teraflops system and petabyte data archive.
"Such projects are important. But to enable computers to communicate effectively at high speeds, two things are necessary. First, networks much faster than OC-3 or OC-12 must be put in place. Although technology is available for this, it hasn't yet been done to any great extent. Second, even if someone were to come to me today saying, 'Here's your 10-gigabit per second network,' I still don't have a place to plug it in. There are no computers today that can take an external network of 10-gigabits or more per second -- yet those are the kinds of speed we'll need. So I'm hoping we'll find a way to experiment with those technologies, at least, say, between San Diego and Caltech, sometime in the near future.
"What does this mean for the Exemplar? Our configuration has 16 HIPPI interfaces; each has a speed of 800 megabits per second. Sixteen times that exceeds 10 gigabits per second. If I had a suitable router that could take 16 HIPPIs on one side and squeeze them into a fiber to another location, going at 10-20 gigabits per second, then I'd have an environment where I could start doing research on what software technology is needed to keep the data flowing at those rates -- i.e. to aggregate 16 streams into one and then spread them back out into 16. We could also research applications; we have some where we'd be very happy if they could suck up data at those rates from remote sites. For example -- to get data from the petabyte archive at San Diego, then process it in chunks of a terabyte or two at a time would be very appealing."
HPCwire: NCSA will be gearing up a machine similar to Caltech's for NT 5.0. Do you plan any similar experiments? What is your general feeling about the very-high-end use of NT?
MESSINA: "I don't plan any NT experiments on our Exemplar in the foreseeable future. There are two reasons for this. First, it's my understanding that NCSA will look closely at it, and I have my plate full with other research. Second, I have no particular expertise with NT. However, I am intrigued with the possibility that NT could become a viable OS for high-end scientific and engineering computing. Also, I'm delighted that systems such as the Exemplar can be largely built from commodity parts.
"We can finally get the payoff going to the same kinds of servers that companies buy for their commercial applications. That's the hardware area. But in the software area there's still a big split. So if NT could mature to really provide scalable services to very large systems, that would be excellent. I haven't looked at the insides of NT, and I don't know to what extent scalability is designed into it. Many mutithreaded systems seem to bog down once you get to 4 or 8 processors, each with a few threads. But perhaps they'll soon be able to handle large numbers of threads and efficient data transfers. I/O is yet another area that operating systems designed for PCs and workstations seldom can handle at a significant rate -- in contrast for the operating systems for Crays, IBMs, etc.
"So I'll happily follow those NT experiments and developments but don't anticipate being a player -- not because it's a silly idea at all, but because we have a large independent agenda of our own."
Alan Beck is editor in chief of HPCwire. Comments are always welcome and should be directed to firstname.lastname@example.org