ML-based profile analysis of CUDA programs' compiler flag impact
Abstract
With the recent successes of and interest in machine learning, this project investigates whether machine learning methods can be used to improve the selection of compiler optimizations. Selecting compiler optimizations is hard because the process is time-consuming. At present, we cannot tell how multiple compiler optimizations used together will affect the performance of a program without actually compiling and running that program, and there exist far too many compiler optimization settings for anyone to test them all empirically. A good dataset is necessary if we want to train a machine learning model to predict how compiler optimization settings will affect the execution speed of untested programs. To facilitate our machine learning experiment, we therefore create a dataset that contains information on how compiler optimization settings affect the execution time of different GPU-accelerated programs. It is important that the variance of the timing measurements is small compared to the execution time of the programs, so that timing variance is not mistaken for a change in performance. In addition, the programs must gain a substantial performance increase from at least some of the compiler optimization settings in use.
Since we decided to focus on compiler optimization settings for GPU computations, CUDA programs from the CUDA Software Development Kit are used to create a dataset with profiling information that contains hardware performance counters. The dataset is used to train a neural network to predict how compiler optimization settings affect the execution speed of new, untested programs. We implement a system that uses the predictions of the trained neural network and empirically tests the best predictions in order to select good compiler optimization settings. Our system was set to evaluate the 10 best predictions from the neural network for the compiler optimization settings.
In this configuration, we achieved an average speedup of 1.004 for the programs we tested, compared to the default setting of the compiler we used. Our system thus does not perform better than random search. We believe that the programs and compiler options we chose were not able to provide useful data for the learning process of a neural network. However, the method should be tested further, and we include several suggestions for future work.
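The selection step described above (rank candidate settings by predicted execution time, then measure only the best few) can be illustrated with a minimal Python sketch. It is not the implementation used in this project: the names select_best_setting, predict, measure and speedup are hypothetical, the predictor is assumed to return an estimated execution time where lower is better, and speedup is assumed to mean baseline time divided by tuned time.

```python
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a compiler optimization setting, e.g. a tuple of flags


def select_best_setting(
    candidates: Sequence[S],
    predict: Callable[[S], float],   # hypothetical: predicted execution time
    measure: Callable[[S], float],   # hypothetical: measured execution time
    k: int = 10,
) -> Tuple[S, float]:
    """Measure only the k settings with the best predictions; return the fastest."""
    if not candidates or k < 1:
        raise ValueError("need at least one candidate and k >= 1")
    # Rank all candidates by predicted time (lower is better) and keep the top k.
    shortlist = sorted(candidates, key=predict)[:k]
    # Only the shortlisted settings are compiled and timed empirically.
    timed = [(measure(setting), setting) for setting in shortlist]
    best_time, best_setting = min(timed, key=lambda pair: pair[0])
    return best_setting, best_time


def speedup(baseline_time: float, tuned_time: float) -> float:
    """Assumed definition: values above 1.0 mean the tuned build is faster."""
    return baseline_time / tuned_time
```

With k = 10 this corresponds to the configuration evaluated here. Measuring the compiler's default setting alongside the shortlist would be one way to make sure the selected setting is never slower than the default.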