Per-Instruction Cycle Stacks Through Time-Proportional Event Analysis
Journal article, Peer reviewed
Accepted version
Date
2024
Abstract
Understanding what applications spend time on and why is critical for effective performance optimization. Unfortunately, current state-of-the-art performance analysis tools are generally unable to provide this information. The fundamental reason is that they lack time proportionality; i.e., in many cases, they do not attribute execution time to the instructions and performance events that the architecture is exposing the latency of. Time-proportional event analysis (TEA) creates per-instruction cycle stacks, which clearly and accurately explain what the application spends time on and why at the level of individual static instructions. TEA requires executing the application only once; it is accurate (with an average error of 2.1%); and its hardware implementation incurs negligible runtime, power, and area overheads of 1.1%, 0.1%, and 249 bits per core, respectively.