dc.contributor.author: Tørring, Jacob
dc.contributor.author: Meyer, Jan Christian
dc.contributor.author: Elster, Anne C.
dc.date.accessioned: 2022-07-07T08:09:12Z
dc.date.available: 2022-07-07T08:09:12Z
dc.date.created: 2021-07-19T17:55:29Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings, International Parallel and Distributed Processing Symposium (IPDPS). 2021, 806-815. [en_US]
dc.identifier.issn: 1530-2075
dc.identifier.uri: https://hdl.handle.net/11250/3003388
dc.description.abstract: Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the most used routines in compute-intensive numerical codes, it is typically highly vendor optimized and of great interest for empirical benchmarks. In this paper we show how to build a novel tool that autotunes the benchmarking process for the Roofline model. Our novel approach can efficiently and reliably find optimal configurations for any target hardware. Results of our tool on a range of hardware architectures and comparisons to theoretical peak performance are included. Our tool autotunes the benchmarks for the target architecture by deciding the optimal parameters through state space reductions and exhaustive search. Our core idea includes calculating the confidence interval using the variance and mean and comparing it against the current optimum solution. We can then terminate the evaluation process early if the confidence interval's maximum is lower than the current optimum solution. This dynamic approach yields a search time improvement of up to 116.33x for the DGEMM benchmarking process compared to a traditional fixed sample-size methodology. Our tool produces the same benchmarking result with an error of less than 2% for each of the optimization techniques we apply, while providing a great reduction in search time. We compare these results against hand-tuned benchmarking parameters. Results from the memory-intensive TRIAD benchmark, and some ideas for future directions are also included. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.title: Autotuning Benchmarking Techniques: A Roofline Model Case Study [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: acceptedVersion [en_US]
dc.rights.holder: © IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. [en_US]
dc.source.pagenumber: 806-815 [en_US]
dc.source.journal: Proceedings, International Parallel and Distributed Processing Symposium (IPDPS) [en_US]
dc.identifier.doi: 10.1109/IPDPSW52791.2021.00119
dc.identifier.cristin: 1922127
cristin.ispublished: true
cristin.fulltext: preprint
cristin.fulltext: postprint
cristin.qualitycode: 1
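
The abstract describes stopping the evaluation of a configuration early once the upper end of its confidence interval falls below the current optimum. The following is a minimal sketch of that idea, not the authors' implementation. The function name, the parameters (`min_samples`, `max_samples`, `z`), and the normal-approximation interval (mean plus `z` standard errors) are all assumptions made for illustration; the abstract does not state which interval formula the tool uses. Higher measured values (for example GFLOP/s) are assumed to be better.

```python
import math
import statistics
from typing import Callable, Optional, Tuple


def evaluate_with_early_stop(
    sample: Callable[[], float],
    best_so_far: Optional[float],
    max_samples: int = 30,
    min_samples: int = 3,
    z: float = 1.96,
) -> Tuple[float, int, bool]:
    """Benchmark one configuration, stopping early if it cannot win.

    sample       -- hypothetical callable returning one measurement
                    (higher is better, e.g. GFLOP/s)
    best_so_far  -- mean of the best configuration seen so far, or None
    z            -- assumed normal-approximation multiplier (1.96 ~ 95%)

    Returns (mean, number_of_samples_taken, stopped_early).
    """
    values = []
    for _ in range(max_samples):
        values.append(sample())
        n = len(values)
        if best_so_far is not None and n >= min_samples:
            mean = statistics.fmean(values)
            # Half-width of the confidence interval from the sample
            # standard deviation (square root of the sample variance).
            half_width = z * statistics.stdev(values) / math.sqrt(n)
            # If even the interval's maximum is below the current
            # optimum, further samples are unlikely to change the verdict.
            if mean + half_width < best_so_far:
                return mean, n, True
    # Fixed sample-size fallback: all max_samples measurements were taken.
    return statistics.fmean(values), len(values), False
```

In a search loop, a caller would keep `best_so_far` updated with the highest mean returned by a configuration that was not stopped early. Clearly inferior configurations then cost only `min_samples` measurements instead of `max_samples`, which is the source of the search-time savings the abstract reports.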

