BAT: A Benchmark suite for AutoTuners

Sund, Ingunn; Kirkhorn, Knut Aasgaard; Tørring, Jacob Odgård; Elster, Anne C.

Peer reviewed, Journal article

Accepted version

NIK_2021_paper_24.pdf (555.5Kb)

URI

https://hdl.handle.net/11250/3003394

Date

2021

Collections

Institutt for datateknologi og informatikk [6552]
Publikasjoner fra CRIStin - NTNU [37220]

Abstract

An autotuner takes parameterized code as input and tries to optimize it by finding the best possible parameter values for a given architecture. To our knowledge, there are currently no standardized benchmark suites for comparing and testing autotuners, so developers of autotuners make their own when presenting and comparing them. We therefore present BAT, a Benchmark suite for AutoTuners with HPC-based parameterized GPU programs. CUDA programs and kernels from “The Scalable Heterogeneous Computing (SHOC) Benchmark” are parameterized. BAT contains a varied selection of benchmarks of different complexity that can utilize multiple GPUs on one system, either by running the same program and computations on multiple nodes, or by splitting the work between nodes. BAT contains 9 different HPC benchmarks that provide a large search space of autotuning parameters and are modified to suit many different autotuners. BAT also includes a CLI that facilitates autotuning with the benchmarks. Our benchmark suite is tested with four different autotuners: OpenTuner, Kernel Tuner, CLTune and KTT. They differ in setup and in how they tune. The impact of the different benchmark parameters on the running time across architectures is analyzed. Test systems used include a DGX-2, an IBM Power System AC922 with Tesla V100-SXM2 32 GB GPUs, an RTX Titan, a GeForce GTX 980 and a server with 20 Tesla T4 GPUs.

Publisher

NTNU

Journal

NIKT: Norsk IKT-konferanse for forskning og utdanning