Dynamic Selection of MPI Intra-copy Routines Based on Program Characteristics

The Message Passing Interface (MPI) has become a de facto standard for parallel programming. The ultimate goal of parallel processing is high performance, which motivates a highly optimized MPI implementation. When an application calls an MPI communication routine, data is copied between user memory and the memory areas managed by the MPI library. The speed of this transfer depends on a multitude of factors, including the architecture, the amount of data, the data layout, and whether the data is referenced right before or after a transfer. There are numerous ways to copy data from one location to another, and their characteristics, combined with the data properties, yield different efficiencies. The information needed to select the best way to copy data is only available during application execution. In this master's thesis, we present and implement a method to improve the performance of parallel applications by dynamically performing a close-to-optimal selection of intra-copy routines within an MPI implementation. Our method detects loops of MPI calls and exploits loop predictability to time their performance while varying the routine selections. To obtain a good routine selection reasonably fast, a global optimization heuristic, simulated annealing, is used. In particular, our method is employed within Scali MPI Connect (SMC), an MPI implementation providing 35 different intra-copy routines. Various benchmarks show that our method introduces low overhead and finds a good selection quickly, thus reducing the execution time of the given benchmark. In benchmarks where the difference between an optimal routine selection and the standard selection within SMC allows it, a bandwidth improvement of 40% is observed.
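The selection step described above can be illustrated with a minimal sketch of simulated annealing over routine choices. This is not the thesis implementation: the function names, the cooling schedule, the step count, and the synthetic cost table are all assumptions made for illustration. In particular, `measure` stands in for timing one iteration of a detected loop of MPI calls, and each "site" is a hypothetical MPI call within that loop that can be assigned one of the 35 routines.

```python
import math
import random

NUM_ROUTINES = 35  # number of intra-copy routines in SMC, as stated in the abstract


def make_synthetic_measure(num_sites, num_routines=NUM_ROUTINES, seed=1):
    """Build a hypothetical cost function standing in for loop timing.

    Each (site, routine) pair gets a fixed random cost; a real system
    would time an iteration of the detected MPI loop instead.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(1.0, 2.0) for _ in range(num_routines)]
             for _ in range(num_sites)]

    def measure(selection):
        return sum(table[site][routine]
                   for site, routine in enumerate(selection))

    return measure


def anneal_selection(measure, num_sites, num_routines=NUM_ROUTINES,
                     steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a low-cost routine selection with simulated annealing."""
    rng = random.Random(seed)
    current = [0] * num_sites  # assumed default: routine 0 at every site
    current_cost = measure(current)
    best, best_cost = list(current), current_cost

    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))

        # Neighbour move: change the routine at one randomly chosen site.
        candidate = list(current)
        candidate[rng.randrange(num_sites)] = rng.randrange(num_routines)
        candidate_cost = measure(candidate)

        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature drops.
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost

    return best, best_cost


if __name__ == "__main__":
    measure = make_synthetic_measure(num_sites=4)
    selection, cost = anneal_selection(measure, num_sites=4)
    print("default cost:", measure([0] * 4))
    print("selected routines:", selection, "cost:", cost)
```

Accepting occasional regressions early on lets the search escape poor local choices, while the falling temperature makes it settle on a good selection after a bounded number of timed iterations.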

Publisher

Institutt for datateknikk og informasjonsvitenskap