  • Fakultet for informasjonsteknologi og elektroteknikk (IE)
  • Institutt for datateknologi og informatikk

OpenACC-based Snow Simulation

Mikalsen, Magnus Alvestad
Master thesis
View/Open
655634_FULLTEXT01.pdf (7.714Mb)
655634_COVER01.pdf (184.2Kb)
URI
http://hdl.handle.net/11250/253383
Date
2013
Collections
  • Institutt for datateknologi og informatikk [7361]
Abstract
In recent years, the GPU platform has risen in popularity in high performance computing due to its cost effectiveness and the high computing power offered through its many parallel cores. The GPU's computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control of a GPU's resources, they are both generally considered difficult to use and can potentially lead to complicated software design. To simplify GPGPU programming and gain more mainstream usage of GPUs, there is an increased interest in moving the complexity of GPGPU programming over to the compiler. This has led to the development of the directive-based standard for heterogeneous computing called OpenACC, supported by NVIDIA, Cray, PGI, CAPS and others.

In this thesis, we explore using OpenACC on a high performance snow simulator code developed by the HPC-Lab at NTNU. The snow simulator consists of two main simulation components: the simulation of wind, and the simulation of snow particle movement. The OpenACC version of the snow simulator is made by first updating the current CUDA version, porting it to a sequential CPU implementation, and applying OpenACC directives to accelerate compute intensive regions in the code. The OpenACC port is also optimized by reducing data movement between host and device using OpenACC library routines.

Due to the heterogeneous nature of OpenACC, we show that the inability to explicitly use shared memory as temporary storage, and not being able to use texture memory for hardware-based interpolation and 3D caching, are the largest performance bottlenecks when comparing to the CUDA version. This is supported by the benchmarks of the OpenACC implementation, which is shown to give only 40.6% of the performance of the CUDA version, with an average speedup of 3.2x, when scaling the number of snow particles simulated and using a balanced windfield dimension. When scaling the windfield with a constant number of snow particles, 58% of the CUDA performance is reached with an average speedup of 4.84x. The best real-time performance is found at about 1.5M snow particles when using a balanced windfield with about 524K grid cells.

Using OpenACC for accelerating high performance graphical simulations can be a viable option if the goal is high code portability. However, when the goal is to achieve the best possible performance, our experience shows that it is still better to use the more low-level alternatives CUDA or OpenCL.
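As a rough illustration of the approach the abstract describes (annotating a sequential CPU loop with OpenACC directives and keeping data resident on the device across time steps), a minimal sketch in C might look like the following. It is not taken from the thesis: the function name `step_particles`, the one-dimensional arrays `pos` and `wind`, and the simple update rule are all hypothetical. The sketch also uses the `data` directive to limit host-device transfers, whereas the abstract states that the thesis uses OpenACC library routines for that purpose.

```c
/* Hypothetical sketch; names and the update rule are assumptions,
 * not code from the thesis. */
#include <assert.h>
#include <stddef.h>

/* Advance n particles by `steps` time steps of length dt, moving each
 * particle with its own (precomputed) wind velocity.
 * A compiler without OpenACC support ignores the pragmas and runs the
 * loops sequentially on the CPU. */
void step_particles(size_t n, float *restrict pos,
                    const float *restrict wind, float dt, int steps)
{
    assert(pos != NULL && wind != NULL);

    /* Data region: pos is copied to the device once and back once,
     * wind is only copied in. Without this region, each parallel loop
     * could trigger its own transfers on every time step. */
    #pragma acc data copy(pos[0:n]) copyin(wind[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            /* Compute-intensive region offloaded to the accelerator;
             * iterations are independent, so the loop is parallel. */
            #pragma acc parallel loop present(pos[0:n], wind[0:n])
            for (size_t i = 0; i < n; ++i) {
                pos[i] += dt * wind[i];
            }
        }
    }
}
```

Because the directives are only hints, the same source can be built as plain sequential C or, with an OpenACC-capable compiler, offloaded to a GPU. That is the portability trade-off the abstract weighs against the finer control offered by CUDA and OpenCL.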
Publisher
Institutt for datateknikk og informasjonsvitenskap

DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit
 

 
