A Data Blade Architecture for Generalized Out-of-Core Reduction Operations

Joel Saltz and Alan Sussman

Presented at the 1997 CRPC Annual Meeting Poster Session

We will describe Titan, a prototype database engine that carries out generalized reductions on large spatial or scientific multi-dimensional data sets. Titan is designed to perform these reductions efficiently when the data sets reside on parallel architectures that have been configured to support high I/O rates. Titan can be used to generate composite images from low-level satellite sensor data, to support exploration and processing of large quantities of data from high-power light or electron microscopy, and to support analysis of output data from scientific calculations. We will also show how the resulting data sets can be efficiently transmitted to other (sequential or parallel) programs for further processing, using the Meta-Chaos library developed at Maryland. In particular, we will show how Titan, running on an IBM SP2, can be used to produce data sets for an HPF image classification program running on a farm of Digital Alpha SMPs.

Our database engine will be accessible as a stand-alone tool (Titan) designed to carry out optimized operations on data repositories, as a Data Blade module associated with a relational database, and as compiler or user-level runtime support. When it is used for compiler or user-level runtime support, the engine can carry out generalized reductions associated with out-of-core scientific applications.
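
As a rough illustration of what a generalized reduction over out-of-core data looks like, the following minimal Python sketch maps each input item to an output cell and folds it in with an associative, commutative operator, reading one chunk at a time so that only the accumulator stays in memory. The record layout `(x, y, value)` and the names `generalized_reduction`, `cell_of`, and `combine` are assumptions made for this example; they are not taken from Titan.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical record layout: (x, y, value), e.g. one sensor reading.
Record = Tuple[float, float, float]


def generalized_reduction(
    chunks: Iterable[Iterable[Record]],
    cell_of: Callable[[float, float], Hashable],
    combine: Callable[[float, float], float],
) -> Dict[Hashable, float]:
    """Fold every record into the output cell it maps to.

    `combine` is assumed to be associative and commutative, so chunks
    may be processed in any order (or on different processors, with the
    partial accumulators merged afterwards using the same operator).
    """
    acc: Dict[Hashable, float] = {}
    for chunk in chunks:  # only one chunk is resident at a time
        for x, y, value in chunk:
            key = cell_of(x, y)
            acc[key] = combine(acc[key], value) if key in acc else value
    return acc


def unit_cell(x: float, y: float) -> Tuple[int, int]:
    """Map a coordinate to the unit grid cell that contains it."""
    return (int(x), int(y))


# Example: a max-value composite on a unit grid, fed chunk by chunk.
example_chunks = [
    [(0.2, 0.3, 5.0), (1.5, 0.1, 2.0)],
    [(0.9, 0.9, 7.0), (1.1, 0.4, 1.0)],
]
composite = generalized_reduction(iter(example_chunks), unit_cell, max)
```

Swapping `max` for addition turns the same loop into a per-cell sum; the structure of the reduction does not change, only the operator.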

The engine generates and uses optimized schedules: data access patterns are preprocessed in order to schedule processing, communication, local I/O, and non-local I/O. The engine also makes use of combined clustering/declustering methods that we have developed for embedding data sets into parallel disk farms in ways that optimize the performance of multi-dimensional range queries.
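
The interplay of declustering and scheduling can be sketched as follows. This example does not reproduce the clustering/declustering methods referred to above; it substitutes the classic disk-modulo placement, in which chunk `(i, j)` of a 2-D grid is stored on disk `(i + j) mod D`, purely as a stand-in. The function names, the chunk grid, and the query format are all assumptions made for illustration. Given a range query, the sketch preprocesses the access pattern into a per-disk list of chunks to read.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Chunk = Tuple[int, int]  # hypothetical chunk index (i, j) in a 2-D grid


def disk_of(i: int, j: int, num_disks: int) -> int:
    """Disk-modulo declustering (a stand-in, not Titan's own method)."""
    return (i + j) % num_disks


def build_schedule(
    query: Tuple[int, int, int, int], chunk_size: int, num_disks: int
) -> Dict[int, List[Chunk]]:
    """Group the chunks touched by a range query by the disk holding them.

    `query` is (x_lo, y_lo, x_hi, y_hi) in element coordinates, with
    inclusive lower bounds and exclusive upper bounds.
    """
    x_lo, y_lo, x_hi, y_hi = query
    schedule: Dict[int, List[Chunk]] = defaultdict(list)
    for i in range(x_lo // chunk_size, (x_hi - 1) // chunk_size + 1):
        for j in range(y_lo // chunk_size, (y_hi - 1) // chunk_size + 1):
            schedule[disk_of(i, j, num_disks)].append((i, j))
    return dict(schedule)


# A 400 x 100 element query over 100 x 100 chunks on 4 disks touches
# four chunks, and each one lands on a different disk.
row_query = build_schedule((0, 0, 400, 100), chunk_size=100, num_disks=4)
```

The point of the placement is that neighbouring chunks fall on different disks, so a range query can be served by several disks reading in parallel rather than by one disk reading everything.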