Show simple item record

dc.contributor.advisor: Elster, Anne Cathrine
dc.contributor.author: Spampinato, Daniele
dc.date.accessioned: 2014-12-19T13:32:39Z
dc.date.available: 2014-12-19T13:32:39Z
dc.date.created: 2010-09-03
dc.date.issued: 2009
dc.identifier: 347878
dc.identifier: ntnudaim:4697
dc.identifier.uri: http://hdl.handle.net/11250/250796
dc.description.abstract: Coupling commodity CPUs with modern GPUs yields heterogeneous systems that are cheap and offer high performance with impressive FLOPS counts. The recent evolution of GPGPU programming models and technologies makes these systems even more appealing as compute devices for a range of HPC applications, including image processing, seismic processing and other physical modeling, as well as linear programming. In fact, graphics vendors such as NVIDIA and AMD are now targeting HPC with some of their products. Due to the power and frequency walls, the trend is now to use multiple GPUs in a single system, much as multiple cores are found in CPU-based systems. However, a deeper resource hierarchy widens the spectrum of factors that may affect system performance. The lack of good models for GPU-based, heterogeneous systems also makes it harder to understand which factors impact performance the most. The goal of this thesis is to analyze such factors by investigating and benchmarking NVIDIA's multi-GPU solution, the recent NVIDIA Tesla S1070 Computing System. This system combines four T10 GPUs, offering up to 4 TFLOPS of computational power. Based on a comparative study of fundamental parallel computing models and on the specific heterogeneous features exposed by the system, we define a test space for performance analysis. As a case study, we develop a red-black SOR PDE solver for Laplace equations with Dirichlet boundaries, a method well known for requiring constant communication to exchange neighboring data. To aid both design and analysis, we propose a model for multi-GPU systems targeting communication between the GPUs. The main variables exposed by the benchmark application are: domain size and shape, the kind of data partitioning, the number of GPUs, the width of the borders to exchange, the kernels to use, and the kind of synchronization between the GPU contexts. Among other results, the framework is able to point out the most critical bounds of the S1070 system when running applications like the one in our case study. We show that the multi-GPU system benefits greatly from using all four of its GPUs on very large data volumes: the four GPUs are almost four times faster than a single GPU, and twice as fast as two. The outcomes of our analysis also allow us to refine our static communication model, enriching it with regression-based predictions.
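For illustration only (this is not code from the thesis), the following is a minimal single-GPU CUDA sketch of the red-black SOR update described in the abstract. It assumes a row-major N x N grid in single precision with a fixed Dirichlet boundary ring; the kernel name sor_color_update, the grid size, the relaxation factor, and the iteration count are all hypothetical choices made for the example.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Update one color (0 = red, 1 = black) of a checkerboard-ordered SOR sweep
    // for the 2D Laplace equation. Within one launch, every updated point reads
    // only neighbors of the other color, so the sweep has no data races.
    __global__ void sor_color_update(float *u, int n, float omega, int color)
    {
        // Interior coordinates; the outermost ring holds the Dirichlet boundary.
        int i = blockIdx.y * blockDim.y + threadIdx.y + 1;  // row
        int j = blockIdx.x * blockDim.x + threadIdx.x + 1;  // column
        if (i >= n - 1 || j >= n - 1) return;
        if (((i + j) & 1) != color) return;

        int idx = i * n + j;
        float nb = u[idx - n] + u[idx + n] + u[idx - 1] + u[idx + 1];
        // Successive over-relaxation: blend the old value with the Jacobi update.
        u[idx] = (1.0f - omega) * u[idx] + omega * 0.25f * nb;
    }

    int main()
    {
        const int   n     = 512;   // grid size including the boundary ring (assumed)
        const float omega = 1.9f;  // relaxation factor (illustrative value)
        const int   iters = 1000;

        // Host grid: interior initialized to 0, top boundary row held at 1.
        std::vector<float> h(n * n, 0.0f);
        for (int j = 0; j < n; ++j) h[j] = 1.0f;

        float *d = nullptr;
        cudaMalloc((void **)&d, n * n * sizeof(float));
        cudaMemcpy(d, h.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);

        for (int it = 0; it < iters; ++it) {
            sor_color_update<<<grid, block>>>(d, n, omega, 0);  // red sweep
            sor_color_update<<<grid, block>>>(d, n, omega, 1);  // black sweep
        }
        cudaMemcpy(h.data(), d, n * n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("u[center] = %f\n", h[(n / 2) * n + n / 2]);
        cudaFree(d);
        return 0;
    }

When the domain is partitioned across several GPUs, as in the thesis, the outermost interior rows or columns of each sub-domain would additionally have to be exchanged with the neighboring GPU after each color sweep; this border exchange is the communication that the proposed model targets.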
dc.language: eng
dc.publisher: Institutt for datateknikk og informasjonsvitenskap
dc.subject: ntnudaim
dc.subject: SIF2 datateknikk
dc.subject: Komplekse datasystemer
dc.title: Modeling Communication on Multi-GPU Systems
dc.type: Master thesis
dc.source.pagenumber: 154
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap

