|Volume 7, Issue 1 -
CHAOS++: RUNTIME SUPPORT FOR DISTRIBUTED DYNAMIC DATA STRUCTURES IN C++
Chialin Chang, Alan Sussman, Joel Saltz, University of Maryland
CHAOS++ is a runtime library targeted at object-oriented applications with dynamic communication patterns. It is implemented as a C++ class library, and can be used directly by application programmers to parallelize applications with adaptive and/or irregular data access patterns. It subsumes CHAOS, which is a runtime library developed to efficiently support applications with irregular access patterns to distributed arrays.
A large class of applications execute on distributed memory parallel computers in single-program multiple-data (SPMD) mode in a loosely synchronous manner. Collections of data objects are partitioned among processors, and the program executes alternating phases of computation and communication. In many of these applications (including image processing, geographic information systems, and data mining) pointers are used to synthesize complex composite data types, and build dynamic complex data structures. In many of these applications, hierarchies of data types are defined, such that ones at the higher levels serve as containers for the ones at lower levels, and pointers are often used by container objects to point to the objects they contain. Objects of data types at the top of the hierarchy (e.g., objects of the outermost container class) can further be connected through pointers, forming pointer-based data structures, such as linked lists, trees, and graphs. Such data structures are dynamic in the sense that data elements in these structures are often created and/or deleted during program execution, and accesses to these data elements are done through pointer dereferences. As a consequence, access patterns to such data structures cannot be determined until runtime, so runtime optimization techniques are required.
However, many existing runtime systems for parallelizing applications with complex data access patterns on distributed memory parallel machines fail to handle pointers. These runtime systems therefore only allow data transfers of primitive data types, such as integers and floating point numbers, and of simple objects that contain no references to other objects. Most of these runtime systems also rely on the existence of global indices, which make them applicable only to distributed arrays.
In addition to providing support for distributed arrays through the features of the underlying CHAOS library, CHAOS++ also provides support for distributed pointer-based data structures, and allows flexible and efficient data exchange of complex data objects among processors. It deals with complex data types and pointer-based data structures by providing two abstract base classes: mobile objects (Mobject) and globally addressable objects (Gobject). Mobject is designed as a base class for all objects whose contents may be transferred between processors. It allows users to provide functions to pack an object into, and unpack an object from, a message buffer to support a so-called deep copy for a container object that has pointers to objects that it contains. Gobjects are objects that have ownership assigned to one processor, but allow copies to reside on other processors. Remote copies of a Gobject (all those not on the owner processor) are used to cache the contents of the Gobject, and CHAOS++ supports data transfers between the owner of the Gobject and remote copies. A partitioned pointer-based data structure is thus viewed by CHAOS++ as a collection of Gobjects interconnected by pointers. Pointers between two Gobjects residing on the same processor are represented as local pointers, while those that conceptually go across processor boundaries will point to local copies of the real Gobjects owned by other processors. Preprocessing techniques are used by the CHAOS++ runtime library to analyze communication patterns, and CHAOS++ data exchange primitives are provided to carry out efficient data transfers between processors.
To demonstrate the wide applicability of the runtime library, CHAOS++ has been applied to several applications, including a direct simulation Monte Carlo method, spatial database processing, and image segmentation. Experimental results have shown that good performance can be achieved with the library. The research group is in the process of integrating CHAOS++ into the runtime software being developed by the Parallel Compiler Runtime Consortium. It is also currently working to make a more robust version of the CHAOS++ runtime library, so that it can be more effectively used by compilers and application developers.
Table of Contents