Object Based Concurrency for Data Parallel Applications: Programmability and Effectiveness
MetadataShow full item record
Increased programmability for concurrent applications in distributed systems requires automatic support for some of the concurrent computing aspects. These are: the decomposition of a program into parallel threads, the mapping of threads to processors, the communication between threads, and synchronization among threads. Thus, a highly usable programming environment for data parallel applications strives to conceal data decomposition, data mapping, data communication, and data access synchronization. This work investigates the problem of programmability and effectiveness for scientific, data parallel applications with irregular data layout. The complicating factor for such applications is the recursive, or indirection data structure representation. That is, an efficient parallel execution requires a data distribution and mapping that ensure data locality. However, the recursive and indirect representations yield poor physical data locality. We examine the techniques for efficient, load-balanced data partitioning and mapping for irregular data layouts. Moreover, in the presence of non-trivial parallelism and data dependences, a general data partitioning procedure complicates arbitrary locating distributed data across address spaces. We formulate the general data partitioning and mapping problems and show how a general data layout can be used to access data across address spaces in a location transparent manner. Traditional data parallel models promote instruction level, or loop-level parallelism. Compiler transformations and optimizations for discovering and/or increasing parallelism for Fortran programs apply to regular applications. However, many data intensive applications are irregular (sparse matrix problems, applications that use general meshes, etc.). Discovering and exploiting fine-grain parallelism for applications that use indirection structures (e.g. indirection arrays, pointers) is very hard, or even impossible. The work in this thesis explores a concurrent programming model that enables coarse-grain parallelism in a highly usable, efficient manner. Hence, it explores the issues of implicit parallelism in the context of objects as a means for encapsulating distributed data. The computation model results in a trivial SPMD (Single Program Multiple Data), where the non-trivial parallelism aspects are solved automatically. This thesis makes the following contributions: - It formulates the general data partitioning and mapping problems for data parallel applications. Based on these formulations, it describes an efficient distributed data consistency algorithm. - It describes a data parallel object model suitable for regular and irregular data parallel applications. Moreover, it describes an original technique to map data to processors such as to preserve locality. It also presents an inter-object consistency scheme that tries to minimize communication. - It brings evidence on the efficiency of the data partitioning and consistency schemes. It describes a prototype implementation of a system supporting implicit data parallelism through distributed objects. Finally, it presents results showing that the approach is scalable on various architectures (e.g. Linux clusters, SGI Origin 3800).