
Autotuning CUDA: Applying NLP Techniques to LS-CAT

Bjertnes, Lars; Tørring, Jacob Odgård; Elster, Anne C.
Peer reviewed, Journal article
Accepted version
View/Open
NIK_2021_paper_22.pdf (544.6Kb)
URI
https://hdl.handle.net/11250/3003392
Date
2021
Collections
  • Institutt for datateknologi og informatikk [5025]
  • Publikasjoner fra CRIStin - NTNU [26751]
Original version
NIKT: Norsk IKT-konferanse for forskning og utdanning. 2021, 1, 72-85.
Abstract
The abstract relation between hardware parameters and program performance makes setting program parameters a difficult task. Without autotuning, software can miss low-level optimizations, resulting in lower performance. Traditionally, time-consuming trial-and-error search methods have been the staple of autotuning. Applying natural language processing (NLP) based machine learning (ML) methods to source code as a means to perform autotuning-oriented tasks is a growing topic. Earlier research has, with success, performed a range of different autotuning tasks using multiple source code languages. However, most of the source code data is CPU-oriented, with very little GPU code. The LS-CAT (Large-Scale CUDA AutoTuning) dataset [BTE21] uses CUDA GPU-based kernels and generates a dataset to perform thread-coarsening. This paper implements several custom NLP-ML pipelines to evaluate ML-based thread-coarsening using the LS-CAT dataset, and a custom scoring function to find the performance impact for any choice. Several model configurations were able to beat both random choice, 0.9400, and only selecting the largest thread-block (1024), 0.9437. Finally, the best model achieves a score of 0.9483, giving an average performance increase and speedup of 0.49 percent over the largest thread-block. Implementing self-attention mechanisms proved to counteract overfitting, while a multi-label based learning task outperformed other approaches. Compared to previous datasets [Cum+17], the LS-CAT dataset's higher thread-coarsening precision gives a more precise evaluation of the model's performance ...
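The abstract does not define its custom scoring function on this page. A minimal sketch of one plausible interpretation, the mean of each kernel's achieved performance relative to its best-known thread-coarsening configuration, might look like the following (the function name and inputs are assumptions for illustration, not the paper's actual code):

```python
# Hypothetical sketch of a relative-performance scoring function in the
# spirit of the abstract: the score is the mean, over all kernels, of
# (performance with the chosen configuration) / (performance with the
# best configuration for that kernel). All names here are assumptions.

def coarsening_score(chosen_perf, best_perf):
    """Average relative performance of chosen configs vs. per-kernel best.

    chosen_perf: performance of the model's chosen config, one value per kernel.
    best_perf:   performance of the best-known config for each kernel.
    """
    if len(chosen_perf) != len(best_perf) or not best_perf:
        raise ValueError("need one (chosen, best) pair per kernel")
    return sum(c / b for c, b in zip(chosen_perf, best_perf)) / len(best_perf)
```

Under such a metric, a model that always picks the best configuration scores 1.0, which is consistent with the abstract's reported scores (0.9400 for random choice, 0.9483 for the best model) lying just below 1.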
Publisher
NTNU
Journal
NIKT: Norsk IKT-konferanse for forskning og utdanning

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from Unit