Automatic Optimization of MPI Applications - Turning Synchronous Calls Into Asynchronous

Natvig, Thorvald

dc.contributor.advisor	Elster, Anne Cathrine
dc.contributor.author	Natvig, Thorvald
dc.date.accessioned	2018-11-05T15:01:01Z
dc.date.available	2018-11-05T15:01:01Z
dc.date.created	2006-01-31
dc.date.issued	2006
dc.identifier	ntnudaim:1483
dc.identifier.uri	http://hdl.handle.net/11250/2571080
dc.description.abstract	The availability of cheap computers with outstanding single-processor performance, coupled with Ethernet and the development of open MPI implementations, has led to a drastic increase in the number of HPC clusters. This, in turn, has led to many new HPC users. Ideally, all users would be proficient programmers who always optimize their programs for the specific architecture they are running on. In practice, users invest only enough effort to make their programs run correctly. While we would like to teach all HPC users to be better programmers, we realize that most users consider HPC a tool and would rather focus on their application problem. To this end, we present a new method for automatically optimizing any application's communication. By protecting the memory associated with MPI_Send, MPI_Recv and MPI_Sendrecv requests, we can let the request continue in the background as MPI_Isend or MPI_Irecv while the application continues in the belief that the request has finished. Once the application accesses the data, our protection ensures that we wait for the background transfer to finish before allowing the application to continue. We also present an alternative method with less overhead, based on recognizing series of requests made between computation phases. We allow the requests in such a chain to overlap with each other, and once the end of the chain is reached, we wait for all of its requests to complete. All of this is done without any user intervention. The method can be dynamically injected at runtime, which makes it applicable to any MPI program in binary form. We have implemented a 2D parallel red-black SOR PDE solver which, due to its alternating red and black cell transfers, represents a "worst case" communication pattern for MPI programs with 2D data domain decomposition. We show that our new method greatly improves the efficiency of this application on a cluster, yielding performance close to that of manual optimization.
dc.language	eng
dc.publisher	NTNU
dc.subject	Computer technology, Complex computer systems
dc.title	Automatic Optimization of MPI Applications - Turning Synchronous Calls Into Asynchronous
dc.type	Master thesis
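
The memory-protection idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes Linux with POSIX `mmap`, `mprotect` and `sigaction`, it replaces the background MPI_Irecv with a plain thread, and it delivers the data through a separate staging buffer. All names (`lazy_recv_start`, `fake_transfer`, `on_fault`, `guarded_buf`, `staging`) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names throughout; no real MPI call is made in this sketch. */
static char *guarded_buf;          /* page-aligned buffer the application will read */
static size_t guarded_len;         /* size of the protected region (one page)       */
static char staging[64];           /* where the simulated transfer delivers data    */
static atomic_int transfer_done;   /* set once the simulated transfer has finished  */
static atomic_int faults_seen;     /* counts how often the guard was triggered      */
static pthread_t worker;

/* Stand-in for a background MPI_Irecv: the data arrives after a short delay. */
static void *fake_transfer(void *arg) {
    (void)arg;
    usleep(50000);
    strcpy(staging, "payload");
    atomic_store(&transfer_done, 1);
    return NULL;
}

/* The application's first access to the protected buffer lands here. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < guarded_buf || addr >= guarded_buf + guarded_len)
        _exit(1);                              /* unrelated fault: give up */
    atomic_fetch_add(&faults_seen, 1);
    while (!atomic_load(&transfer_done))       /* plays the role of MPI_Wait */
        ;
    mprotect(guarded_buf, guarded_len, PROT_READ | PROT_WRITE);
    memcpy(guarded_buf, staging, sizeof staging);
    /* On return, the faulting instruction is retried and now succeeds. */
}

/* Returns immediately, like a blocking receive turned into an asynchronous one. */
static int lazy_recv_start(void) {
    guarded_len = (size_t)sysconf(_SC_PAGESIZE);
    assert(guarded_len >= sizeof staging);
    guarded_buf = (char *)mmap(NULL, guarded_len, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if ((void *)guarded_buf == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    return pthread_create(&worker, NULL, fake_transfer, NULL);
}
```

A caller would invoke `lazy_recv_start()`, carry on with unrelated computation, and only block when it first reads `guarded_buf`. Calling `mprotect` and `memcpy` from a signal handler is common practice for this kind of guard but is not guaranteed by POSIX to be async-signal-safe, so the sketch should be read as an illustration of the control flow only. On older glibc versions it may need to be compiled with `-pthread`.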

Associated file(s)

Filename: 1483_FULLTEXT.pdf
Size: 917.7 KB
Format: PDF

Filename: 1483_ATTACHMENT.zip
Size: 13.01 KB
Format: application/zip

Filename: 1483_COVER.pdf
Size: 47.58 KB
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]