ImageCL 3D Extensions Targeting Adaptive Mesh Refinement Proxy Applications on GPUs

As parallel and heterogeneous systems become more widely adopted, programming them also becomes increasingly complex. Frameworks like CUDA and OpenCL provide functional portability across their supported devices. However, they do not ensure that the same code runs optimally across devices with different architectures, or that code can be ported seamlessly and efficiently to other GPU architectures. This challenge, known as performance portability, is significant because GPU architectures tend to be updated more often, and to vary more, than CPU architectures.

By transforming optimizations into tuning parameters that the compiler can apply statically, an auto-tuner can be used to pick the best combination of optimizations for each architecture. This strategy has previously been explored using the ImageCL language and compiler, which relieves the programmer of much of the complexity by abstracting away many optimizations that would otherwise have to be applied manually.
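
The idea of treating optimizations as tuning parameters can be illustrated with a minimal sketch. It is not the ImageCL auto-tuner: it uses plain exhaustive search as the simplest possible strategy, and the parameter names (`work_group_x`, `work_group_y`, `use_local_memory`), their value ranges, and the cost function `toy_cost` are hypothetical stand-ins for a real compile-and-time step on a given device.

```python
import itertools

# Hypothetical tuning parameters; names and value ranges are illustrative only
# and are not taken from ImageCL.
SEARCH_SPACE = {
    "work_group_x": [8, 16, 32],
    "work_group_y": [1, 2, 4],
    "use_local_memory": [False, True],
}


def autotune(search_space, cost):
    """Evaluate every configuration and return the cheapest one with its cost."""
    names = list(search_space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        current = cost(config)
        # Strict comparison: on ties, the first configuration found is kept.
        if current < best_cost:
            best_config, best_cost = config, current
    return best_config, best_cost


def toy_cost(config):
    """Made-up cost model standing in for a measured kernel runtime."""
    threads = config["work_group_x"] * config["work_group_y"]
    penalty = abs(threads - 64)  # assumption: this device prefers 64 threads
    return penalty + (0 if config["use_local_memory"] else 10)


best_config, best_cost = autotune(SEARCH_SPACE, toy_cost)
```

Swapping `toy_cost` for a different model, or for real timings from another device, changes which configuration wins without any change to the kernel source, which is the point of exposing optimizations as parameters.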

In this thesis, we extend the ImageCL language and compiler to support a broader range of applications. These extensions are guided by miniAMR, a proxy application with the performance characteristics of an Adaptive Mesh Refinement (AMR) application. AMR is a computational method for adapting the accuracy within certain regions of a domain, and is often used in scientific and engineering applications. We generate multiple GPU stencil kernels from ImageCL code and integrate them into the miniAMR application. We show a considerable speedup (up to 6.78x) over the reference implementation for many of the generated stencil kernels in miniAMR.

Publisher

NTNU