Accelerating Adaptive Mesh Refinement through Multiscale Dataflow Computing
As we observe diminishing returns for multi-core CPUs, especially when considering power budgets, FPGAs are becoming increasingly important in the HPC world. To push the limits of performance and energy efficiency, more general-purpose hardware, i.e. multi-core and GPU systems, is often not sufficient, and we need to create specialised hardware systems. FPGAs are reconfigurable hardware devices that allow us to create systems that are optimised for a given application. However, development and integration are generally difficult and time-consuming. In this thesis we explore harnessing the power of FPGA acceleration through Maxeler's FPGA-based Multiscale Dataflow Computing system by accelerating miniAMR, a proxy application for adaptive mesh refinement developed by the Mantevo project. Proxy applications are minimal applications that mimic the performance characteristics of full applications and are meant for testing and benchmarking. They are easier to work with than full applications, but because they are meant to test both hardware and software, they expose many options and runtime arguments, which makes FPGA acceleration challenging. Using the Maxeler system, we create arithmetic kernels for the core 7-point and 27-point stencil computations of miniAMR. By rearranging the data used, properly managing memory, and moving the core 3D stencil calculations onto dataflow engines, we achieve a maximum speedup of 2.52x while maintaining the functionality of miniAMR. Because of the flexibility of the Maxeler system, and because the application mimics the characteristics of full applications, our kernels can potentially be used to accelerate full adaptive mesh refinement or other stencil-driven applications with minimal effort.
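To make the two stencil shapes concrete, the following is a minimal CPU-side sketch in Python. It is not the thesis's dataflow kernel and not miniAMR's source. It assumes each interior cell is replaced by the unweighted mean of its neighbourhood (itself plus six face neighbours for the 7-point case, the full 3x3x3 cube for the 27-point case) and that boundary cells act as ghost cells and are copied unchanged. The function names `stencil_7pt` and `stencil_27pt` are hypothetical.

```python
def _copy_grid(grid):
    """Deep-copy a 3D nested-list grid."""
    return [[list(row) for row in plane] for plane in grid]


def stencil_7pt(grid):
    """Assumed 7-point stencil: mean of a cell and its six face neighbours.

    Boundary cells are treated as ghost cells and left unchanged.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = _copy_grid(grid)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                total = (grid[i][j][k]
                         + grid[i - 1][j][k] + grid[i + 1][j][k]
                         + grid[i][j - 1][k] + grid[i][j + 1][k]
                         + grid[i][j][k - 1] + grid[i][j][k + 1])
                out[i][j][k] = total / 7.0
    return out


def stencil_27pt(grid):
    """Assumed 27-point stencil: mean over the full 3x3x3 neighbourhood.

    Boundary cells are treated as ghost cells and left unchanged.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = _copy_grid(grid)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                total = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        for dk in (-1, 0, 1):
                            total += grid[i + di][j + dj][k + dk]
                out[i][j][k] = total / 27.0
    return out
```

Both functions read only from the input grid and write to a separate output grid, so every interior cell can be computed independently of the others. That independence is the property that makes stencil sweeps a natural fit for streaming through a pipelined dataflow engine.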