Automatically Tuned Linear Algebra Software (ATLAS)

Jack Dongarra and R. Clint Whaley, University of Tennessee and Oak Ridge National Laboratory

The production of numerical software for processors with deep memory hierarchies and pipelined functional units can be a tedious and time-consuming process. To solve this problem, CRPC researchers are developing Automatically Tuned Linear Algebra Software (ATLAS), an approach that automatically generates and optimizes such software for machines that range from desktop workstations to embedded processors.

ATLAS is a package that generates efficient code for basic linear algebra operations. The package contains code generators, sophisticated timers, and robust search routines. It adapts itself to differing architectures by generating candidate code, timing it on the target machine, and using the results to guide the search for the fastest variant.
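The idea of coupling code variants with timing can be illustrated with a minimal sketch. It is not ATLAS's actual generator or search routine: the function names (blocked_matmul, time_candidate, search_block_size), the candidate block sizes, and the choice of a single tuning parameter are all assumptions made for illustration. The sketch times a blocked matrix multiply for several tile sizes on the machine at hand and keeps the fastest.

```python
import time


def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square tiles of size `block`."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one tile of A by one tile of B, accumulating into C.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def time_candidate(block, a, b, n, repeats=3):
    """Return the best wall-clock time over `repeats` runs for one block size."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        blocked_matmul(a, b, n, block)
        best = min(best, time.perf_counter() - start)
    return best


def search_block_size(n=48, candidates=(4, 8, 16, 24, 48)):
    """Time every candidate block size (hypothetical values) and pick the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {blk: time_candidate(blk, a, b, n) for blk in candidates}
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    best, timings = search_block_size()
    for blk, seconds in sorted(timings.items()):
        print(f"block={blk:3d}  time={seconds:.4f}s")
    print("selected block size:", best)
```

The selected block size depends on the machine that runs the search, which is the point of the approach: the choice is measured rather than derived from a hand-written model of the hardware.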

Figure 1: ATLAS/Vendor Performance Preview of ATLAS

The researchers are concentrating their efforts on widely used linear algebra kernels called the Basic Linear Algebra Subprograms (BLAS). The BLAS are building blocks for constructing efficient and portable linear algebra software for high-performance computers. They allow exploitation of parallelism in a way that is transparent to the software that calls them. There are three levels of BLAS: Level 1 for vector-vector operations, Level 2 for matrix-vector operations, and Level 3 for matrix-matrix operations. The ATLAS project involves the Level 3 BLAS, which are targeted at computers with a memory hierarchy and at true parallel-processing computers.
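The three levels can be illustrated with one representative operation each. The sketch below uses plain Python lists and simplified signatures; the names axpy, gemv, and gemm follow the BLAS naming convention, but the argument lists are assumptions made for illustration and do not match the real BLAS interfaces, which also take dimensions, strides, and transpose flags.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha, a, x, beta, y):
    """Level 2 (matrix-vector): return alpha*A*x + beta*y."""
    return [
        alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
        for row, yi in zip(a, y)
    ]


def gemm(alpha, a, b, beta, c):
    """Level 3 (matrix-matrix): return alpha*A*B + beta*C."""
    cols_b = list(zip(*b))  # columns of B, so each dot product walks a row and a column
    return [
        [
            alpha * sum(aik * bkj for aik, bkj in zip(row_a, col_b)) + beta * cij
            for col_b, cij in zip(cols_b, row_c)
        ]
        for row_a, row_c in zip(a, c)
    ]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    print(axpy(2.0, [1.0, 2.0], [10.0, 20.0]))
    print(gemv(1.0, A, [1.0, 1.0], 0.0, [0.0, 0.0]))
    print(gemm(1.0, A, B, 0.0, zero))
```

For n x n operands, the Level 3 operation performs on the order of n^3 arithmetic operations on the order of n^2 data, whereas Levels 1 and 2 perform roughly one operation per data element. That higher ratio of computation to data movement is what lets Level 3 routines reuse data held in cache or registers.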

Figure 2: Performance Preview of LU with ATLAS

The initial work is focused on general matrix multiply and matrix-vector operations. However, much of the technology and approach developed with ATLAS can be applied to all levels of BLAS and beyond. The strategy should benefit basic linear algebra operations broadly, and it may be extended to other important kernel operations, such as sparse matrix operations and FFTs.

Figure 3: Performance Preview of BLAS routines using Superscalar GEMM-based BLAS with ATLAS

To date, ATLAS has been installed on the following architectures: DEC AlphaStation 255, DEC AlphaStation 600 5/266, HP 9000/715/50, HP 9000/735/125, IBM Power2 (thin node), IBM PowerPC 604 (high node), Pentium, Pentium MMX, Pentium Pro, Pentium II, SGI R4600, SGI R5000, SGI R10000, Sun MicroSPARC II 70, Sun Ultra1 Model 140, Sun Ultra2 Model 2170, and Sun Ultra2 Model 2200.

For more information, see or email
