Jack Dongarra Notes Views on Proprietary Benchmarks in NSF Bids

Source: HPCwire, December 13, 1996
By Alan Beck

San Diego, Calif. -- Concern about certain aspects of the recently completed bid to provide NCAR with supercomputing resources led HPCwire to request clarification of UCAR policies regarding proprietary benchmarks and the dissemination of benchmark performance information following procurement decisions.

UCAR's Steve Hammond, a member of the technical team for the procurement who helped craft system specifications, assemble the benchmark suite, lead the live test demonstrations, evaluate the proposals and benchmark results, and give testimony to the ITC in August, tendered the following information concerning UCAR policy:

"With respect to UCAR RFP No. B-10-95P, UCAR is not publishing information that is covered by non-disclosure agreements. UCAR signed such agreements with all vendors participating in the procurement. The agreements covered many aspects of the procurement including company product futures, equipment proposed, and benchmarking results. NEC is the only vendor that has agreed to release the details of their bid and their benchmark results.

"As I stated in my presentation at SC96, other vendors have not agreed to release the details of their bid or their benchmark results and thus we cannot publish anything concerning these vendors. The benchmark results from NEC are highlighted in the presentation I made at SC96 and in the paper included in the conference proceedings."

When asked if it is appropriate for decisions on major -- and possibly controversial -- bids to be made solely on the basis of in-house benchmarks, Hammond observed: "In general, I believe that an organization is best served by choosing metrics most appropriate for their needs. That could mean all internally written benchmarks on one extreme or all externally developed codes on the other. The NCAR Benchmark Suite in particular is composed of a mixture of internally and externally developed codes. In Appendix B of the RFP, the Objectives of the benchmark suite are clearly stated:

"'The purpose of these benchmarks is to gain insight into the performance of machines that will be proposed by Vendors in response to this procurement. A primary application for the ACE will be the four component models: an atmospheric model (CCM2), an ocean model (MOM or POP), a land surface model, and a sea ice model.' and

"'The benchmark suite consists of thirteen kernels and three complete geophysical simulation codes. The kernels measure specific aspects of system operation such as accuracy of intrinsics, memory to memory bandwidth, processor speed, memory to disk I/O rates, and HiPPI transfer rates. The three applications measure combinations of these and are to be run at multiple resolutions as specified. Together these codes give a comprehensive measure of the capabilities of a computer system with respect to NCAR's computing environment as well as a computer system's performance under the anticipated computational load in ACE.'"

To expand its insight on these points, HPCwire submitted questions to Jack Dongarra, distinguished professor in the Department of Computer Science at the University of Tennessee and distinguished scientist at Oak Ridge National Laboratory (ORNL), about those facets of the bid process that had elicited concern. Following are the queries and the full text of Dongarra's answers.

HPCwire: In a paper, "The Performance of the NEC SX-4 on the NCAR Benchmark Suite," by Steven W. Hammond, Richard D. Loft and Philip D. Tannenbaum, posted to the UCAR Web site at http://www.scd.ucar.edu/css/sc96/sc96.html, the authors state that the "NCAR Benchmark Suite is consistent with the recommendations of Dongarra et al...who suggest that an effective benchmark suite must accurately characterize the anticipated workload of the system."

Do you feel that NSF-sponsored organizations such as NCAR should base major procurement decisions solely on proprietary benchmark suites? Does such a practice pose any danger that the basis for significant procurements may not be fully appreciated or understood, even by technically-knowledgeable personnel?

DONGARRA: "The value of a computer is dependent on the context in which it is used, where the context varies by application, by workload, and by time. To put it another way, there is no universal metric of value for computers. An evaluation that is valid for one site may not be valid for another site, and an evaluation that is valid at one time may not be valid just a short time later. Thus, each site should conduct its own evaluation activities, of which benchmarking is an essential element.

"It is rare that the objective of benchmarking is to find the performance of a computer on a single application; rather, the objective is to find the performance range on a set of applications. No computer has a constant performance on all applications or even on a single application for a variety of problem sizes and solution methods, hence it is the comparative analysis of the range of performance that is of interest. In addition, high-performance computers are inherently difficult to evaluate, primarily because of the range of architectural types available and the correspondingly wide range of specific application performance as compared to other kinds of computers.

"The performance of a computer is a complicated issue and a function of many interrelated quantities. These quantities include the application, the algorithm, the size of the problem, the high-level language, the implementation, the level of human effort used to optimize the program, the compiler's ability to optimize, the age of the compiler, the operating system, the architecture of the computer, and the hardware characteristics. The task of evaluation in this multifaceted domain becomes a balancing act -- weighing in all of the qualitative and quantitative issues, understanding the nature and scope of the problem, and navigating through the myriad paths available while avoiding the many pitfalls along the way."

HPCwire: It appears that on major bids it is NCAR practice to post benchmark figures for the winner only, while losing vendors can veto publication of their benchmark results indefinitely. Is this healthy in the long run? Would full disclosure of benchmark results be damaging or enlightening to losing vendors, the HPC community and/or the public?

DONGARRA: "I don't see a problem with this. The benchmark should be tailored to the customer's applications and needs and may not have meaning to others. There are more general benchmarks that expose features of the machine; the NAS benchmarks (http://www.nas.nasa.gov/NAS/NPB/) and the ParkBench suite (http://www.netlib.org/parkbench/) are examples of such general benchmarks."

HPCwire: How do you feel publicly-funded supercomputing resource centers could better conduct the bidding process?

DONGARRA: "I understand the bidding process is complicated and don't feel qualified to comment on improving the process."

Alan Beck is editor in chief of HPCwire. Comments are always welcome and should be directed to editor@hpcwire.tgc.com

(For a free 4-week trial subscription to HPCwire, e-mail trial@hpcwire.tgc.com)
