
dc.contributor.author: Gottschall, Björn
dc.date.accessioned: 2024-02-19T10:08:20Z
dc.date.available: 2024-02-19T10:08:20Z
dc.date.issued: 2024
dc.identifier.isbn: 978-82-326-7605-7
dc.identifier.issn: 2703-8084
dc.identifier.uri: https://hdl.handle.net/11250/3118391
dc.description.abstract: The quest for an increase in processor performance has become difficult due to the inherent power limitations of today’s chips. As processors become more complex with deeper and wider pipelines, out-of-order execution, and the integration of heterogeneous accelerators, software developers face increasing challenges in utilizing these resources efficiently. Therefore, understanding the performance characteristics of our workloads running on these complex architectures has never been more important to enable optimization that increases efficiency and performance. In this thesis, we present three contributions that collectively answer the two fundamental questions of performance analysis for out-of-order processors by explaining what an application spends time on and why. Our first contribution is TIP: Time-Proportional Instruction Profiling, which establishes the time-proportional principle of performance analysis and identifies the four states of commitment that a time-proportional performance analyzer must be able to differentiate. Time-proportional instruction profiling is able to accurately attribute execution time to instructions, unlike contemporary performance profilers, which are not time-proportional. Our second contribution is TEA: Time-Proportional Event Analysis. TEA combines time-proportional instruction profiling with accurate performance event attribution, thereby explaining why certain instructions are performance-critical. The evaluation of the accuracy of performance profilers was made possible through TraceDoctor, which is our third key contribution. TraceDoctor is a high-performance tracing framework that enabled the creation of a golden performance reference and proved its flexibility by enabling the evaluation of accuracy and overhead in sampled simulations.
To demonstrate the potential of time-proportional performance analysis, we used it to optimize the industry-standard SPEC CPU2017 benchmarks Imagick, lbm, and nab, and achieved speedups of 1.93, 1.28, and 2.45 times, respectively. Contemporary performance profilers, such as Intel PEBS, AMD IBS, Arm SPE, and IBM RIS, are not time-proportional and hence do not clearly identify these optimization opportunities. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: NTNU [en_US]
dc.relation.ispartofseries: Doctoral theses at NTNU;2024:52
dc.relation.haspart: Paper A: Gottschall, Björn; Eeckhout, Lieven; Jahre, Magnus. TIP: Time-Proportional Instruction Profiling. In: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. https://doi.org/10.1145/3466752.3480058 - © ACM 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. [en_US]
dc.relation.haspart: Paper B: Gottschall, Björn; Eeckhout, Lieven; Jahre, Magnus. TEA: Time-Proportional Event Analysis. In: Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA'23). https://doi.org/10.1145/3579371.3589058 - This work is licensed under a Creative Commons Attribution International 4.0 License. [en_US]
dc.relation.haspart: Paper C: Gottschall, Björn; Campelo de Santana, Silvio Heverton; Jahre, Magnus. Balancing Accuracy and Evaluation Overhead in Simulation Point Selection. In: 2023 IEEE International Symposium on Workload Characterization (IISWC), pp. 44-53. https://doi.org/10.1109/IISWC59245.2023.00019 - In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of NTNU’s products or services. Internal or personal use of this material is permitted. [en_US]
dc.title: Time-Proportional Performance Analysis for Out-of-Order Processors [en_US]
dc.type: Doctoral thesis [en_US]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550 [en_US]

