|dc.description.abstract||Because of physical constraints, the performance gains of single-core processors have come to a halt. Computer architects have responded by adding multiple processor cores to their designs. However, for continued performance gains, multi-core designs require multithreaded applications. Manually managing individual threads becomes burdensome for large applications, and programmers therefore opt to use interfaces that abstract some of this complexity. OpenMP is one such interface. It is an industry standard for parallel shared-memory programming.
There is an ongoing effort to add a profiling interface, the OpenMP Tools (OMPT) API, to the upcoming OpenMP 5.0 specification. OMPT will allow the creation of portable, high-quality performance analysis tools for OpenMP programs.
Grain graphs are a recent visualization method that simplifies OpenMP performance analysis. It has previously been found that the instrumentation callbacks of OMPT are almost sufficient to generate the data needed by grain graphs. However, OMPT does not describe events for measuring the time spent creating tasks or for tracing the execution of parallel for-loop chunks.
In this thesis, I propose extensions that provide the necessary descriptions, and evaluate the performance impact of these extensions in the LLVM/Clang toolchain. My evaluation shows that the overheads are low. Benchmarks from the EPCC OpenMP micro-benchmark suite show up to 3% additional overhead in the most important scenarios. Most HPC workloads from the BOTS and SPEC OMP2012 application suites show no change in execution time. While the proposed extensions are motivated by grain graphs, they can be used by other profiling methods as well.||