Autotuning Benchmarking Techniques: A Roofline Model Case Study

Tørring, Jacob; Meyer, Jan Christian; Elster, Anne C.

dc.contributor.author	Tørring, Jacob
dc.contributor.author	Meyer, Jan Christian
dc.contributor.author	Elster, Anne C.
dc.date.accessioned	2022-07-07T08:09:12Z
dc.date.available	2022-07-07T08:09:12Z
dc.date.created	2021-07-19T17:55:29Z
dc.date.issued	2021
dc.identifier.citation	Proceedings, International Parallel and Distributed Processing Symposium (IPDPS). 2021, 806-815.	en_US
dc.identifier.issn	1530-2075
dc.identifier.uri	https://hdl.handle.net/11250/3003388
dc.description.abstract	Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to benchmark core applications and library routines extensively. Since DGEMM is one of the most used routines in compute-intensive numerical codes, it is typically highly vendor-optimized and of great interest for empirical benchmarks. In this paper we show how to build a novel tool that autotunes the benchmarking process for the Roofline model. Our approach can efficiently and reliably find optimal configurations for any target hardware. Results of our tool on a range of hardware architectures and comparisons to theoretical peak performance are included. Our tool autotunes the benchmarks for the target architecture by deciding the optimal parameters through state space reductions and exhaustive search. Our core idea is to calculate a confidence interval from the variance and mean of the samples and compare it against the current optimum solution. We can then terminate the evaluation process early if the upper bound of the confidence interval is lower than the current optimum solution. This dynamic approach yields a search time improvement of up to 116.33x for the DGEMM benchmarking process compared to a traditional fixed sample-size methodology. Our tool produces the same benchmarking result with an error of less than 2% for each of the optimization techniques we apply, while greatly reducing search time. We compare these results against hand-tuned benchmarking parameters. Results from the memory-intensive TRIAD benchmark and some ideas for future directions are also included.	en_US
dc.language.iso	eng	en_US
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_US
dc.title	Autotuning Benchmarking Techniques: A Roofline Model Case Study	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	© IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.source.pagenumber	806-815	en_US
dc.source.journal	Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)	en_US
dc.identifier.doi	10.1109/IPDPSW52791.2021.00119
dc.identifier.cristin	1922127
cristin.ispublished	true
cristin.fulltext	preprint
cristin.fulltext	postprint
cristin.qualitycode	1
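
The early-termination idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' tool: the function names, the normal-approximation factor z = 1.96, the minimum sample count, and the assumption that a higher measurement (e.g. GFLOP/s) is better are all hypothetical choices made for the example.

```python
import math
import statistics


def evaluate_with_early_stop(sample_fn, best_so_far,
                             max_samples=100, min_samples=5, z=1.96):
    """Sample one configuration, stopping early if it cannot beat the optimum.

    sample_fn   -- hypothetical callable returning one performance measurement
    best_so_far -- mean performance of the best configuration found so far
    z           -- assumed normal-approximation factor for the interval
    Returns (mean, samples_used, stopped_early).
    """
    min_samples = max(min_samples, 2)  # stdev needs at least two samples
    samples = []
    for _ in range(max_samples):
        samples.append(sample_fn())
        n = len(samples)
        if n < min_samples:
            continue
        mean = statistics.fmean(samples)
        half_width = z * statistics.stdev(samples) / math.sqrt(n)
        # Upper bound of the confidence interval is below the current
        # optimum: further samples are unlikely to change the outcome.
        if mean + half_width < best_so_far:
            return mean, n, True
    return statistics.fmean(samples), len(samples), False


def search(configs, make_sampler, **kwargs):
    """Exhaustive search over configs with per-configuration early stopping."""
    best_cfg, best_perf, total_samples = None, float("-inf"), 0
    for cfg in configs:
        mean, used, stopped = evaluate_with_early_stop(
            make_sampler(cfg), best_perf, **kwargs)
        total_samples += used
        if not stopped and mean > best_perf:
            best_cfg, best_perf = cfg, mean
    return best_cfg, best_perf, total_samples
```

Compared with a fixed sample size, clearly inferior configurations consume only `min_samples` measurements here, which is where a search-time saving of this kind would come from; how large it is depends on the noise level and the order in which configurations are visited.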

Files in this item

Name: iWAPT2021_paper_6.pdf
Size: 636.1 KB
Format: PDF

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6778]
Publikasjoner fra CRIStin - NTNU [38127]
