
dc.contributor.advisor: Elster, Anne Cathrine
dc.contributor.author: Natvig, Thorvald
dc.date.accessioned: 2014-12-19T13:31:32Z
dc.date.available: 2014-12-19T13:31:32Z
dc.date.created: 2010-09-02
dc.date.issued: 2006
dc.identifier: 347398
dc.identifier: ntnudaim:1483
dc.identifier.uri: http://hdl.handle.net/11250/250388
dc.description.abstract: The availability of cheap computers with outstanding single-processor performance, coupled with Ethernet and the development of open-source MPI implementations, has led to a drastic increase in the number of HPC clusters. This, in turn, has led to many new HPC users. Ideally, all users would be proficient programmers who always optimize their programs for the specific architecture they run on. In practice, users invest only enough effort to make their programs run correctly. While we would like to teach all HPC users how to be better programmers, we realize that most users consider HPC a tool and would rather focus on their application problem. To this end, we present a new method for automatically optimizing any application's communication. By protecting the memory associated with MPI_Send, MPI_Recv and MPI_Sendrecv requests, we can let the request continue in the background as MPI_Isend or MPI_Irecv while the application continues in the belief that the request has finished. Once the application accesses the data, our protection ensures that we wait for the background transfer to finish before allowing the application to continue. Also presented is an alternative method with less overhead, based on recognizing series of requests made between computation phases. We allow the requests in such a chain to overlap with each other, and once the end of such a chain is reached, we wait for all the requests to complete. All of this is done without any user intervention. The method can be dynamically injected at runtime, which makes it applicable to any MPI program in binary form. We have implemented a 2D parallel red-black SOR PDE solver, whose alternating red and black cell transfers represent a "worst case" communication pattern for MPI programs with 2D data domain decomposition. We show that our new method greatly improves the efficiency of this application on a cluster, yielding performance close to that of manual optimization. (A minimal sketch of the interception idea described here follows the record below.)
dc.language: eng
dc.publisher: Institutt for datateknikk og informasjonsvitenskap
dc.subject: ntnudaim
dc.subject: SIF2 datateknikk
dc.subject: Komplekse datasystemer
dc.title: Automatic Optimization of MPI Applications: Turning Synchronous Calls Into Asynchronous
dc.type: Master thesis
dc.source.pagenumber: 126
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap
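
The abstract above outlines the core mechanism: a blocking MPI call is started as a non-blocking transfer, the associated buffer is memory-protected, and the application is forced to wait only if it touches that buffer before the transfer completes. The C sketch below illustrates this general idea for the send path only, using the standard PMPI profiling interface together with mprotect and a SIGSEGV handler. It is a minimal illustration under simplifying assumptions, not the thesis's actual implementation: the helper names (pending_req, pending_page, pending_len) are placeholders, only one outstanding request is tracked, receives, MPI_Sendrecv, request chains and async-signal-safety are ignored, and whole pages are protected even if they hold unrelated data.

/*
 * Illustrative sketch only: intercept blocking MPI_Send via the PMPI
 * profiling interface, start it as a non-blocking PMPI_Isend, and
 * write-protect the buffer's pages. If the application modifies the
 * buffer before the transfer completes, the SIGSEGV handler waits for
 * the request and lifts the protection. Assumes MPI-3 const-correct
 * signatures and a single outstanding request.
 */
#define _POSIX_C_SOURCE 200809L
#include <mpi.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static MPI_Request pending_req  = MPI_REQUEST_NULL;  /* hypothetical bookkeeping */
static void       *pending_page = NULL;
static size_t      pending_len  = 0;

/* Fault handler: the application touched a protected buffer, so finish the
 * background transfer and restore access before the access is retried. */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    if (pending_page != NULL &&
        addr >= (uintptr_t)pending_page &&
        addr <  (uintptr_t)pending_page + pending_len) {
        PMPI_Wait(&pending_req, MPI_STATUS_IGNORE);  /* not async-signal-safe; sketch only */
        mprotect(pending_page, pending_len, PROT_READ | PROT_WRITE);
        pending_page = NULL;
        return;                                      /* faulting access is retried */
    }
    signal(SIGSEGV, SIG_DFL);                        /* not our page: genuine fault */
    raise(SIGSEGV);
}

/* Intercepted MPI_Send: returns immediately; the transfer proceeds in the
 * background while the buffer pages stay read-only. */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    int err = PMPI_Isend(buf, count, type, dest, tag, comm, &pending_req);

    /* Round the buffer out to whole pages and make it read-only so that a
     * premature write traps into segv_handler above. */
    long pagesz = sysconf(_SC_PAGESIZE);
    int  elem;
    PMPI_Type_size(type, &elem);
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(pagesz - 1);
    size_t    span  = ((uintptr_t)buf + (size_t)count * (size_t)elem) - start;
    pending_len  = (span + (size_t)pagesz - 1) & ~(size_t)(pagesz - 1);
    pending_page = (void *)start;
    mprotect(pending_page, pending_len, PROT_READ);
    return err;
}

Built as a shared library, a wrapper like this could be preloaded (for example via LD_PRELOAD) into an unmodified, dynamically linked MPI binary, which is one conventional way to approximate the runtime injection the abstract describes; whether it matches the thesis's own injection mechanism is not specified in this record.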

