
dc.contributor.advisor: Elster, Anne Cathrine
dc.contributor.author: Natvig, Thorvald
dc.date.accessioned: 2014-12-19T13:31:32Z
dc.date.available: 2014-12-19T13:31:32Z
dc.date.created: 2010-09-02
dc.date.issued: 2006
dc.identifier: 347398
dc.identifier: ntnudaim:1483
dc.identifier.uri: http://hdl.handle.net/11250/250388
dc.description.abstract: The availability of cheap computers with outstanding single-processor performance, coupled with Ethernet and the development of open-source MPI implementations, has led to a drastic increase in the number of HPC clusters. This, in turn, has led to many new HPC users. Ideally, all users would be proficient programmers who always optimize their programs for the specific architecture they run on. In practice, users invest only enough effort to make their programs run correctly. While we would like to teach all HPC users how to be better programmers, we realize that most users consider HPC a tool and would rather focus on their application problem. To this end, we present a new method for automatically optimizing any application's communication. By protecting the memory associated with MPI_Send, MPI_Recv and MPI_Sendrecv requests, we can let the request continue in the background as MPI_Isend or MPI_Irecv while the application continues in the belief that the request has finished. Once the application accesses the data, our protection ensures that we wait for the background transfer to finish before allowing the application to continue. Also presented is an alternative method with less overhead, based on recognizing series of requests made between computation phases. We allow the requests in such a chain to overlap with each other, and once the end of such a chain is reached, we wait for all the requests to complete. All of this is done without any user intervention. The method can be dynamically injected at runtime, which makes it applicable to any MPI program in binary form. We have implemented a 2D parallel red-black SOR PDE solver, whose alternating red and black cell transfers represent a "worst case" communication pattern for MPI programs with 2D data domain decomposition. We show that our new method greatly improves the efficiency of this application on a cluster, yielding performance close to that of manual optimization. (A minimal sketch of the interception idea described here follows the record below.)
dc.language: eng
dc.publisher: Institutt for datateknikk og informasjonsvitenskap
dc.subject: ntnudaim
dc.subject: SIF2 datateknikk
dc.subject: Komplekse datasystemer
dc.title: Automatic Optimization of MPI Applications: Turning Synchronous Calls Into Asynchronous
dc.type: Master thesis
dc.source.pagenumber: 126
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap
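
The abstract above outlines the core mechanism: a blocking MPI call is started as a non-blocking transfer, the associated buffer is memory-protected, and the application is forced to wait only if it touches that buffer before the transfer completes. The C sketch below illustrates this general idea for the send path only, using the standard PMPI profiling interface together with mprotect and a SIGSEGV handler. It is a minimal illustration under simplifying assumptions, not the thesis's actual implementation: the helper names (pending_req, pending_page, pending_len) are placeholders, only one outstanding request is tracked, receives, MPI_Sendrecv, request chains and async-signal-safety are ignored, and whole pages are protected even if they hold unrelated data.

/*
 * Illustrative sketch only: intercept blocking MPI_Send via the PMPI
 * profiling interface, start it as a non-blocking PMPI_Isend, and
 * write-protect the buffer's pages. If the application modifies the
 * buffer before the transfer completes, the SIGSEGV handler waits for
 * the request and lifts the protection. Assumes MPI-3 const-correct
 * signatures and a single outstanding request.
 */
#define _POSIX_C_SOURCE 200809L
#include <mpi.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static MPI_Request pending_req  = MPI_REQUEST_NULL;  /* hypothetical bookkeeping */
static void       *pending_page = NULL;
static size_t      pending_len  = 0;

/* Fault handler: the application touched a protected buffer, so finish the
 * background transfer and restore access before the access is retried. */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    if (pending_page != NULL &&
        addr >= (uintptr_t)pending_page &&
        addr <  (uintptr_t)pending_page + pending_len) {
        PMPI_Wait(&pending_req, MPI_STATUS_IGNORE);  /* not async-signal-safe; sketch only */
        mprotect(pending_page, pending_len, PROT_READ | PROT_WRITE);
        pending_page = NULL;
        return;                                      /* faulting access is retried */
    }
    signal(SIGSEGV, SIG_DFL);                        /* not our page: genuine fault */
    raise(SIGSEGV);
}

/* Intercepted MPI_Send: returns immediately; the transfer proceeds in the
 * background while the buffer pages stay read-only. */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    int err = PMPI_Isend(buf, count, type, dest, tag, comm, &pending_req);

    /* Round the buffer out to whole pages and make it read-only so that a
     * premature write traps into segv_handler above. */
    long pagesz = sysconf(_SC_PAGESIZE);
    int  elem;
    PMPI_Type_size(type, &elem);
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(pagesz - 1);
    size_t    span  = ((uintptr_t)buf + (size_t)count * (size_t)elem) - start;
    pending_len  = (span + (size_t)pagesz - 1) & ~(size_t)(pagesz - 1);
    pending_page = (void *)start;
    mprotect(pending_page, pending_len, PROT_READ);
    return err;
}

Built as a shared library, a wrapper like this could be preloaded (for example via LD_PRELOAD) into an unmodified, dynamically linked MPI binary, which is one conventional way to approximate the runtime injection the abstract describes; whether it matches the thesis's own injection mechanism is not specified in this record.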

