Volume 7, Issue 1
Multigrid Methods on SIMD Machines
Joel Dendy, Jr., Los Alamos National Laboratory
Joel Dendy and his associates have considered several multigrid methods on SIMD (single instruction, multiple data) architectures. In particular, they are interested in multigrid methods that are robust for the numerical solution of certain discretizations arising in petroleum reservoir engineering, in the modeling of contaminant transport, and in nuclear reactor safety studies.
On the CM-200 parallel machine, explicit iterative methods such as point Jacobi run many times faster than on the CRAY-YMP. Unfortunately, for some problems the convergence factor for such methods can be so abysmally bad that the superior gigaflop rate is irrelevant. On machines like the CRAY-YMP, methods that use an incomplete factorization as a preconditioner and conjugate gradients or GMRES as an accelerator compete very effectively with multigrid methods. On the CM-200, however, the recursion inherent in the incomplete factorization renders such methods ineffective.
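The appeal of point Jacobi on a SIMD machine is that every grid point is updated from old values by the same instruction, so one sweep is a single data-parallel array operation. A minimal NumPy sketch (not the Laboratory's code) for the 5-point Laplacian illustrates both the data parallelism and the slow error decay noted above:

```python
import numpy as np

def jacobi_step(u, f, h):
    """One point Jacobi sweep for the 5-point Laplacian with zero
    Dirichlet boundaries.  Every interior point is updated from the
    old values, so the sweep is a single data-parallel array
    operation -- exactly the pattern that maps well onto SIMD."""
    u_new = u.copy()
    u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:] +
                                h * h * f[1:-1, 1:-1])
    return u_new

# Damp a random initial error for the homogeneous problem (f = 0);
# the error norm shrinks, but only slowly -- the poor convergence
# factor mentioned in the text.
n = 65
h = 1.0 / (n - 1)
rng = np.random.default_rng(0)
u = rng.random((n, n))
u[0, :] = u[-1, :] = u[:, 0] = u[:, -1] = 0.0   # Dirichlet boundary
f = np.zeros((n, n))
e0 = np.linalg.norm(u)
for _ in range(10):
    u = jacobi_step(u, f, h)
e10 = np.linalg.norm(u)
```

Each sweep touches every point once with no sequential dependence between points, which is precisely what an incomplete-factorization solve, with its forward and backward recursions, cannot offer.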
The first multigrid method considered uses semicoarsening in y and relaxation by lines in x. Variational coarsening is used: the coarse-grid operator is derived from the fine-grid operator by forming I_f^c L_f I_c^f, where L_f is the fine-grid operator, I_c^f is an operator-induced interpolation operator from the coarse grid to the fine grid, and I_f^c, at least for symmetric problems, is (I_c^f)^T. Since the coarse-grid arrays become progressively more rectangular, an early problem was to take sufficient control of the layout of the arrays to permit efficient communication between the coarse grids and the fine grids. This problem was finally solved with the introduction of version 2.0 of the CMF compiler: given a fine-grid array A and a coarse-grid array B, one can use intraprocessor moves (via a routine of Ralph Brickner) to communicate efficiently through a temporary array C compatible with A. The other great advance in computing efficiency came from the improvement of the CMSSL library, namely the development of a routine that solves a collection of tridiagonal systems and permits saving the factors between calls. The present result is a two-dimensional nonsymmetric multigrid solver that runs as fast as the original version on the CRAY-YMP. A nonsymmetric three-dimensional solver has also been written, and both the two- and three-dimensional solvers also run on the CM-5.
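The variational coarse-grid operator I_f^c L_f I_c^f is a sparse triple product. A minimal SciPy sketch, substituting simple geometric linear interpolation in y for the operator-induced interpolation the actual solver uses, and taking the 5-point Laplacian as L_f:

```python
import numpy as np
import scipy.sparse as sp

def prolong_1d(nc):
    """Linear interpolation from nc coarse interior points to
    nf = 2*nc + 1 fine interior points (zero Dirichlet boundaries).
    Coarse point j coincides with fine point 2j + 1."""
    nf = 2 * nc + 1
    P = sp.lil_matrix((nf, nc))
    for j in range(nc):
        P[2 * j + 1, j] = 1.0      # injection at coincident points
        P[2 * j,     j] = 0.5      # average to in-between points
        P[2 * j + 2, j] = 0.5
    return P.tocsr()

def trid(n):
    """1-D operator tridiag(-1, 2, -1)."""
    return sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))

nx, nc = 6, 3
nf = 2 * nc + 1                    # 7 fine interior points in y
# Fine-grid 5-point Laplacian, unknowns ordered x-outer, y-inner.
Lf = sp.kron(trid(nx), sp.identity(nf)) + sp.kron(sp.identity(nx), trid(nf))
# Semicoarsening in y only: interpolate in y, identity in x.
P = sp.kron(sp.identity(nx), prolong_1d(nc))   # I_c^f
R = P.T                                        # I_f^c = (I_c^f)^T
Lc = (R @ Lf @ P).tocsr()                      # variational coarse operator
```

The coarse operator inherits symmetry from L_f because the restriction is the transpose of the interpolation; the coarse grid keeps all nx points in x but only nc points in y, which is why the arrays grow progressively more rectangular as coarsening proceeds.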
Two other methods have been considered. The first is semicoarsening multigrid with multiple semicoarsened grids. In this method each grid has two offspring, one obtained by semicoarsening in x and the other by semicoarsening in y. The advantage of this method is that point relaxation alone suffices for good convergence on anisotropic problems; line relaxation is not needed. The algorithm was mapped to the CM-200 by observing that if the fine grid is m x n, then the fine grid and all the coarse grids fit into a rectangle of size 2m x 2n. The key to parallel computation is then to use concurrent multigrid: one relaxes on all the grids simultaneously, projects the resulting residuals simultaneously, and interpolates the corrections simultaneously.

The second method is the frequency decomposition multigrid method, a variant derived by Hackbusch of a method originally developed by Brandt and Ta'asan. In standard coarsening multigrid, to coarsen a given grid one uses only one of four possible coarse grids. The idea in this method is to use all four possible coarse grids, each having associated with it an interpolation operator that annihilates a different Fourier frequency. Thus it is again possible to get good convergence with point relaxation on anisotropic problems, although some changes to the algorithm are needed for robustness with discontinuous coefficients. At present, the simple semicoarsening multigrid method is the fastest of the three on the CM-200, as shown in Fig. 1, and it is likely to remain the fastest, since work to achieve further speedups for it is still ongoing.
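The 2m x 2n claim for the multiple-semicoarsening hierarchy can be checked arithmetically: with power-of-two dimensions, the x-extents of the grids sum to 2m - 1 and the y-extents to 2n - 1, so the total area of all grids is (2m - 1)(2n - 1) < 4mn. A small sketch, assuming exact halving at each level (this verifies only the necessary area bound, not the actual rectangle layout used on the machine):

```python
def total_area(m, n):
    """Sum the areas of all semicoarsened grids: one grid of size
    (m / 2^i) x (n / 2^j) for every pair of coarsening levels (i, j),
    assuming m and n are powers of two."""
    area, i = 0, m
    while i >= 1:
        j = n
        while j >= 1:
            area += i * j          # area of the (i, j) grid
            j //= 2                # semicoarsen in y
        i //= 2                    # semicoarsen in x
    return area

m, n = 256, 128
print(total_area(m, n), 4 * m * n)
```

Because the per-direction extents form geometric series summing to just under 2m and 2n, the whole family of grids occupies less area than a single 2m x 2n rectangle, which is what makes concurrent relaxation on all grids at once feasible in a fixed-size array.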