Show simple item record

dc.contributor.author: Bjertnes, Lars
dc.contributor.author: Tørring, Jacob Odgård
dc.contributor.author: Elster, Anne C.
dc.date.accessioned: 2022-07-07T08:11:05Z
dc.date.available: 2022-07-07T08:11:05Z
dc.date.created: 2021-12-07T23:31:36Z
dc.date.issued: 2021
dc.identifier.citation: NIKT: Norsk IKT-konferanse for forskning og utdanning. 2021, 1, 72-85.
dc.identifier.issn: 1892-0713
dc.identifier.uri: https://hdl.handle.net/11250/3003392
dc.description.abstract: The abstract relation between hardware parameters and program performance makes setting program parameters a difficult task. Without autotuning, software can miss low-level optimizations, resulting in lower performance. Traditionally, time-consuming trial-and-error search methods have been the staple of autotuning. Applying natural language processing (NLP) based machine learning (ML) methods to source code as a means to perform autotuning-oriented tasks is a growing topic. Earlier research has successfully performed a range of different autotuning tasks using multiple source code languages. However, most of the source code data is CPU-oriented, with very little GPU code. The LS-CAT (Large-Scale CUDA AutoTuning) dataset [BTE21] uses CUDA GPU-based kernels and generates a dataset to perform thread-coarsening. This paper implements several custom NLP-ML pipelines to evaluate ML-based thread-coarsening using the LS-CAT dataset, and a custom scoring function to find the performance impact of any choice. Several model configurations were able to beat both random choice, 0.9400, and always selecting the largest thread-block (1024), 0.9437. Finally, the best model achieves a score of 0.9483, giving an average performance increase and speedup of 0.49 percent over the largest thread-block. Implementing self-attention mechanisms proved to counteract overfitting, while a multi-label based learning task outperformed other approaches. Compared to previous datasets [Cum+17], the LS-CAT dataset's higher thread-coarsening precision gives a more precise evaluation of the model's performance ...
dc.language.iso: eng
dc.publisher: NTNU
dc.relation.uri: https://ojs.bibsys.no/index.php/NIK/article/view/917
dc.title: Autotuning CUDA: Applying NLP Techniques to LS-CAT
dc.type: Peer reviewed
dc.type: Journal article
dc.description.version: acceptedVersion
dc.source.pagenumber: 72-85
dc.source.volume: 1
dc.source.journal: NIKT: Norsk IKT-konferanse for forskning og utdanning
dc.identifier.cristin: 1965848
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1
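For readers unfamiliar with the transformation the abstract evaluates, the sketch below illustrates thread coarsening in general: one CUDA thread takes over the work of several logical threads, so fewer threads are launched overall. This is an illustrative example only, not code from the paper; the kernel name, the scaling workload, and the coarsening factor of 2 are assumptions.

// Illustrative sketch of thread coarsening (not from the paper).
// With a coarsening factor of 2, each thread processes two elements,
// so the launch needs half as many threads as elements.
#include <cuda_runtime.h>

__global__ void scale_coarsened(float *data, float factor, int n)
{
    const int COARSEN = 2;  // coarsening factor (assumed, for illustration)
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * COARSEN;
    for (int i = 0; i < COARSEN; ++i) {  // one thread does COARSEN threads' work
        int idx = base + i;
        if (idx < n)
            data[idx] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    int threads = 256;
    // Launch half as many threads as elements because of the factor-2 coarsening.
    int blocks = (n / 2 + threads - 1) / threads;
    scale_coarsened<<<blocks, threads>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}

Picking a good coarsening factor per kernel is exactly the kind of decision the paper's NLP-ML pipelines learn to make from CUDA source code.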


Associated file(s)


This item appears in the following collection(s)
