RESEARCH AND KNOWLEDGE TRANSFER FOCUS: NATIONAL HPCC SOFTWARE EXCHANGE (NHSE URL: http://www.nhse.org/)

Shirley Browne, Jack Dongarra, Stan Green, Keith Moore, Tom Rowan, Reed Wade, University of Tennessee; Geoffrey Fox, Ken Hawick, Syracuse University; Ken Kennedy, Rice University; Jim Pool, Caltech; Rick Stevens, Bob Olsen, Terry Disz, Argonne National Laboratory

The National HPCC Software Exchange (NHSE) is an Internet-accessible software and information distribution system whose goals are 1) to facilitate an active exchange of software and enabling technologies among members of the HPCC community and 2) to promote these contributions and their use among Grand Challenge users and other HPCC users. This system effectively shows how the evolving National Information Infrastructure (NII) can be used to facilitate the sharing of software and information within the HPCC community.

NHSE Advantages to CRPC Knowledge Transfer

NHSE embodies several qualities that the CRPC strives to develop in all of its work. For instance, as center technologies are developing and maturing, the CRPC is making a greater push toward outreach and knowledge transfer to HPCC users, so that this work will find individuals and organizations who will naturally benefit from it. NHSE is central to this outreach effort. The promise of the NII as a means of mass communication and exchange is a reality today for academic and industrial researchers around the nation and across the world. As a result, much of the CRPC's main audience can be reached effectively through the Internet, making it an ideal medium for knowledge transfer.

NHSE also offers good examples of how the CRPC leverages its own resources and those of other HPCC organizations. One of the biggest benefits of the NII is that it makes the physical location of information resources largely irrelevant, which makes it easier to combine existing resources for HPCC software, methods, and information. Users do not have to access different HPCC software and database repositories separately, because NHSE provides a virtual repository built on top of these resources. NHSE is also able to leverage CRPC experience because the distribution system itself is based on the Netlib software system (built by CRPC researchers at the University of Tennessee) and on the widely used World Wide Web.

Another CRPC goal that NHSE is helping to achieve is a focus on the user's perspective in accessing HPCC information. Through a research effort to improve searching and browsing capabilities, the system aims to help people locate and understand the HPCC resources relevant to their field. With myriad World Wide Web sites to access and peruse, a user can no longer count on a browser alone to find valuable information. This effort is discussed in further detail later in this article.

NHSE Prototype Components

HPCC software-related algorithms, specifications, designs, documentation, and reports can be found in the current prototype NHSE system, created in 1994. NHSE points to more than 200 modules and packages, including:

  • Parallel system and software tools such as compilers, message-passing communication subsystems, parallel monitors, and debuggers.
  • Building blocks for common computational tasks. (These tools are portable across platforms and targeted to Grand Challenge users to speed the implementation and increase the reliability of their work.)
  • Research codes for difficult computational problems. (These codes serve as proof-of-concept models for programs solving larger classes of related problems.)

NHSE provides easy access to all of this material while giving a good degree of control to a decentralized group of maintainers. Each repository is maintained and given technical support by a discipline-related group, with a central administration handling interoperation between repositories of different disciplines and meeting common needs such as indexing and searching. This setup helps keep material up to date. Although there are three levels of review for software linked through the NHSE, updates to software already linked to the system can be added without review. Additional flexibility will be gained through unique naming and digital signatures, which will improve authentication, integrity, and version control for the maintainers.

The established repositories that NHSE points to include CRPC-related systems (Netlib, Softlib, and CITLIB), ASSET (Asset Source for SW Engineering Tech.), CARDS (Comprehensive Approach to Reusable Defense SW), ELSA (Electronic Library Services and Appl.), GAMS (Virtual Software Repository), STARS (SW Technology for Adaptable, Reliable Systems), and others. The repositories that NHSE ties together offer a vast array of computational tools. Netlib, for instance, is a moderated collection of mathematical software and parallel programming tools, such as ScaLAPACK, MPI, P4, PICL, PRESTO, and PVM.

The use of current World Wide Web browser and server technologies ensures that repository maintainers can devote less time and resources to developing distribution methods. Use of familiar, widely available information technologies also allows better interoperability with outside repositories and gives schools, libraries, museums, and other institutions greater access to the NHSE.

Intended Audience

So far, response to NHSE has been supportive. Three different communities stand to benefit from the NHSE:

  • HPCC Application and Computer Science Community--This group of users needs to develop very specific, highly optimized code that they usually develop from scratch. Although they may not need generic, reusable software components, these users may find value in higher-level artifacts like design documents, algorithms, and templates. Domain analysis techniques can also be of value to identify similarities between applications.
  • Users of NASA, NSF, DOE, and Other Government and Non-profit Supercomputing Centers--Since many of these users are involved in research, teaching, or industry, many of the NHSE's libraries of reusable components can be used to solve common computational problems. These users may also find parallel compilers and code restructurers useful for porting existing code to parallel machines. Targeting these audiences helps to leverage the outreach work of the nation's supercomputing centers.
  • Industry Users Interested in Technology Transfer--NHSE promotes software capitalization by giving industry users access to extensive documentation on research prototypes developed by academic researchers. The InfoMall Project started by CRPC researchers at Syracuse is a good example of the success of providing online documentation to facilitate technology transfer to industry.

Research on Improving the NHSE System

As part of the research in enabling technologies for the NHSE, CRPC researchers at Argonne National Laboratory are building a toolkit for exploring advanced web resource management technologies. The toolkit will support the "hunting and gathering" of web pages and will have features for compression, indexing, transaction monitoring, and parallel searches. The toolkit will also have a rich language environment for developing agents. Since it is being developed for use in the NHSE, the toolkit focuses more on discovery, distribution, and management of software code as opposed to information sources (text, images, or video). Here is a detailed list of specific tools in this toolkit.

  • Modular Web Robot: CRPC researchers are developing a modular, programmable web robot designed to efficiently cache web pages on a local server based on programmable starting locations, keywords, file types, and other search criteria. For instance, users can develop their own search strings and search for files written in Fortran, C, C++, Perl, or Matlab, as well as scripts, PostScript, HTML, and TeX files. The robot is designed to run in parallel to allow high-performance gathering of web pages and can be rapidly modified for experimental purposes. So far, the robot has collected data for 52,032 URLs and 37,700 HTML pages from 13,000 sites. The robot provides the NHSE with the raw WWW pages needed for queries of various types.
  • Parallel Web Indexing Engine: The group has developed a parallel extension of the Glimpse (University of Arizona) system for rapidly indexing web pages and for providing rapid regular expression-based parallel searches of web page caches, such as those generated by the group's web robot. A five-million URL test run is planned for the near future.
  • DNS/Geographical Database and Mapping Software: Monitoring transactions on the web can be difficult. Simple tools that capture summary data are very useful but give no information on the geographical distribution of users. The problem is complicated by the fact that Internet domain names have little relationship to physical location. To address it, the group is developing a database to support the mapping of Internet site domain names to geographical places for display on a variety of geographic information systems (GIS). The system uses two databases: one that maps Internet domains to geographical place names based on InterNIC registration data, and another that maps geographical place names to latitude and longitude. Together, these two databases allow the system to automatically parse WWW or FTP site logs and generate maps of usage. This tool is valuable to NHSE contributors and maintainers, providing an instant overview of the number and location of sites that have downloaded their software and data.
  • Autonomous Agents: CRPC researchers have begun work on the design and implementation of several types of search agents (software that, given instructions from the user, can automatically use various Internet mechanisms to locate or monitor data for the user). The first type of agent builds up a comprehensive database of network-available data, information, and software based on a keyword list and provides the user with daily updates on changes to this database (e.g., new Internet sites that contain data matching the keywords, or changes to existing sites). This agent would allow software providers to monitor the redistribution of software via FTP or WWW and would flag the propagation of incorrect versions.
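
As a rough illustration of the keyword-monitoring agent described above, the daily update can be thought of as a comparison between two snapshots of gathered pages. The sketch below is only an assumption about how such a comparison might look, not the CRPC implementation; the function names, the snapshot layout (a mapping from URL to page text), and the example URLs are all hypothetical.

```python
def matches(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def daily_update(previous, current, keywords):
    """Compare two snapshots, each a dict mapping URL -> page text.

    Returns (new_urls, changed_urls), restricted to pages in the current
    snapshot that match at least one keyword. The snapshot layout is an
    assumption made for this sketch.
    """
    relevant = {url: text for url, text in current.items()
                if matches(text, keywords)}
    new_urls = sorted(url for url in relevant if url not in previous)
    changed_urls = sorted(url for url in relevant
                          if url in previous and previous[url] != relevant[url])
    return new_urls, changed_urls


# Hypothetical snapshots from two consecutive days.
yesterday = {
    "http://a.example/": "PVM release 3.2",
    "http://b.example/": "cooking",
}
today = {
    "http://a.example/": "PVM release 3.3",
    "http://b.example/": "cooking tips",
    "http://c.example/": "ScaLAPACK mirror",
}
new_urls, changed_urls = daily_update(yesterday, today, ["pvm", "scalapack"])
```

In this example the changed page at a.example could point to a redistributed copy whose version differs from the provider's, which is the kind of signal the agent is meant to surface.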

The second type of agent will monitor a set of WWW sites and detect significant changes in the link structure of these sites. For example, there may be a set of sites that are developing linear algebra software, all of which link to a set of other Web sites relevant to work in linear algebra. The user may like to be notified when something new is referenced by, for example, more than four of the chosen sites, giving some indication that multiple sites view this new site as worthy of attention.
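
The notification rule in this example can be sketched as a threshold on reference counts. The code below is a minimal sketch under stated assumptions, not the agent's actual design: it assumes the monitored sites' outbound links have already been gathered into a mapping from site to set of URLs, and it interprets "something new" as a URL whose reference count has just crossed the threshold. The names `reference_counts` and `newly_notable` are hypothetical.

```python
from collections import Counter


def reference_counts(links):
    """Count how many monitored sites reference each URL.

    `links` maps a monitored site to the collection of URLs it links to.
    """
    counts = Counter()
    for targets in links.values():
        counts.update(set(targets))  # each site counts at most once per URL
    return counts


def newly_notable(old_links, new_links, threshold=4):
    """Return URLs now referenced by more than `threshold` monitored sites
    that were not above the threshold in the earlier snapshot."""
    before = reference_counts(old_links)
    after = reference_counts(new_links)
    return sorted(url for url, count in after.items()
                  if count > threshold and before[url] <= threshold)


# Hypothetical monitored sites and link sets.
sites = ["s1", "s2", "s3", "s4", "s5"]
old = {site: {"http://known.example/"} for site in sites}
new = {site: {"http://known.example/", "http://fresh.example/"} for site in sites}
new["s1"].add("http://minor.example/")
notable = newly_notable(old, new)
```

Comparing against the earlier snapshot keeps the agent from repeating a notification every day for a URL that is already widely referenced.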

Future of the NHSE System

NHSE will build upon the experience gained from the current prototype, which its developers hope will demonstrate to others the effectiveness of distributing HPCC software through the NII. Development will also continue on integrating links to other repositories so that their software can be accessed through the NHSE interface. Browsing and searching mechanisms will continue to be improved, while taking advantage of popular information retrieval technologies like WWW browsers.

Finally, CRPC researchers involved in the NHSE project are currently distributing a survey to users regarding improvements that can be made to the system. Input is especially needed to generate terms for a thesaurus-type roadmap on which to base the NHSE's browsing and navigation tools. For more information on this survey, access the NHSE home page (http://www.nhse.org/).

