BAT: A Benchmark suite for AutoTuners
Peer reviewed, Journal article
An autotuner takes a parameterized code as input and tries to optimize it by finding the best possible parameter values for a given architecture. To our knowledge, there are currently no standardized benchmark suites for comparing and testing autotuners, so developers of autotuners make their own when presenting and comparing their tools. We therefore present BAT, a Benchmark suite for AutoTuners with HPC-based parameterized GPU programs. CUDA programs and kernels from "The Scalable Heterogeneous Computing (SHOC) Benchmark" are parameterized. BAT contains a varied selection of benchmarks of different complexity that can utilize multiple GPUs on one system, either by running the same program and computations on multiple nodes, or by splitting the work between nodes. BAT contains 9 different HPC benchmarks that provide a large search space of autotuning parameters and are modified to suit many different autotuners. BAT also includes a CLI that facilitates autotuning with the benchmarks. Our benchmark suite is tested with four different autotuners: OpenTuner, Kernel Tuner, CLTune and KTT. They differ in setup and in how they tune. The impact of the different benchmark parameters on the running time across architectures is analyzed. Test systems used include a DGX-2, an IBM Power System AC922 with Tesla V100-SXM2 32 GB GPUs, an RTX Titan, a GeForce GTX 980 and a server with 20 Tesla T4 GPUs.
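To make the idea of searching a parameter space concrete, the following is a minimal sketch of an exhaustive autotuner in Python. It is not taken from BAT or from any of the autotuners named above: the parameter names (`block_size`, `unroll`), the function `exhaustive_tune` and the cost function `synthetic_runtime` are hypothetical, and the cost function is a stand-in for compiling and timing a real GPU kernel.

```python
import itertools


def exhaustive_tune(search_space, measure):
    """Evaluate every configuration and return the cheapest one with its cost."""
    names = list(search_space)
    best_config, best_cost = None, float("inf")
    # The Cartesian product of all value lists is the full search space.
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        cost = measure(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


# Hypothetical tuning parameters; a real benchmark would define its own.
SEARCH_SPACE = {
    "block_size": [32, 64, 128, 256, 512],
    "unroll": [1, 2, 4, 8],
}


def synthetic_runtime(config):
    # Assumption: a made-up cost model replacing an actual kernel timing.
    # Its minimum lies at block_size=128, unroll=4.
    return abs(config["block_size"] - 128) / 32 + abs(config["unroll"] - 4) + 1.0


best_config, best_cost = exhaustive_tune(SEARCH_SPACE, synthetic_runtime)
print(best_config, best_cost)
```

Exhaustive search only scales to small spaces such as the 20 configurations here. Practical autotuners typically rely on sampling or guided search strategies when the space grows large.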