Show simple item record

dc.contributor.author: Bjertnes, Lars
dc.contributor.author: Tørring, Jacob Odgård
dc.contributor.author: Elster, Anne C.
dc.date.accessioned: 2022-07-07T08:11:05Z
dc.date.available: 2022-07-07T08:11:05Z
dc.date.created: 2021-12-07T23:31:36Z
dc.date.issued: 2021
dc.identifier.citation: NIKT: Norsk IKT-konferanse for forskning og utdanning. 2021, 1, 72-85.
dc.identifier.issn: 1892-0713
dc.identifier.uri: https://hdl.handle.net/11250/3003392
dc.description.abstract: The abstract relation between hardware parameters and program performance makes setting program parameters a difficult task. Without autotuning, software can miss low-level optimizations, resulting in lower performance. Traditionally, time-consuming trial-and-error search methods have been the staple of autotuning. Applying natural language processing (NLP) based machine learning (ML) methods to source code as a means to perform autotuning-oriented tasks is a growing topic. Earlier research has successfully performed a range of different autotuning tasks using multiple source code languages. However, most of the source code data is CPU-oriented, with very little GPU code. The LS-CAT (Large-Scale CUDA AutoTuning) dataset [BTE21] uses CUDA GPU-based kernels and generates a dataset to perform thread-coarsening. This paper implements several custom NLP-ML pipelines to evaluate ML-based thread-coarsening using the LS-CAT dataset, and a custom scoring function to find the performance impact of any choice. Several model configurations were able to beat both random choice, 0.9400, and always selecting the largest thread-block (1024), 0.9437. Finally, the best model achieves a score of 0.9483, giving an average performance increase and speedup of 0.49 percent over the largest thread-block. Implementing self-attention mechanisms proved to counteract overfitting, while a multi-label based learning task outperformed other approaches. Compared to previous datasets [Cum+17], the LS-CAT dataset's higher thread-coarsening precision gives a more precise evaluation of the model's performance ...
dc.language.iso: eng
dc.publisher: NTNU
dc.relation.uri: https://ojs.bibsys.no/index.php/NIK/article/view/917
dc.title: Autotuning CUDA: Applying NLP Techniques to LS-CAT
dc.type: Peer reviewed
dc.type: Journal article
dc.description.version: acceptedVersion
dc.source.pagenumber: 72-85
dc.source.volume: 1
dc.source.journal: NIKT: Norsk IKT-konferanse for forskning og utdanning
dc.identifier.cristin: 1965848
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1
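For readers unfamiliar with the transformation the abstract evaluates, the sketch below illustrates thread coarsening in general: one CUDA thread takes over the work of several logical threads, so fewer threads are launched overall. This is an illustrative example only, not code from the paper; the kernel name, the scaling workload, and the coarsening factor of 2 are assumptions.

// Illustrative sketch of thread coarsening (not from the paper).
// With a coarsening factor of 2, each thread processes two elements,
// so the launch needs half as many threads as elements.
#include <cuda_runtime.h>

__global__ void scale_coarsened(float *data, float factor, int n)
{
    const int COARSEN = 2;  // coarsening factor (assumed, for illustration)
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * COARSEN;
    for (int i = 0; i < COARSEN; ++i) {  // one thread does COARSEN threads' work
        int idx = base + i;
        if (idx < n)
            data[idx] *= factor;
    }
}

int main()
{
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    int threads = 256;
    // Launch half as many threads as elements because of the factor-2 coarsening.
    int blocks = (n / 2 + threads - 1) / threads;
    scale_coarsened<<<blocks, threads>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}

Picking a good coarsening factor per kernel is exactly the kind of decision the paper's NLP-ML pipelines learn to make from CUDA source code.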


Associated file(s)


This item appears in the following collection(s)
