MDM: The GPU Memory Divergence Model
Chapter
Accepted version
Åpne
Permanent lenke
https://hdl.handle.net/11250/2734404Utgivelsesdato
2020Metadata
Vis full innførselSamlinger
Originalversjon
http://dx.doi.org/10.1109/MICRO50266.2020.00085Sammendrag
Analytical models enable architects to carry out early-stage design space exploration several orders of magnitude faster than cycle-accurate simulation by capturing first-order performance phenomena with a set of mathematical equations. However, this speed advantage is void if the conclusions obtained through the model are misleading due to model inaccuracies. Therefore, a practical analytical model needs to be sufficiently accurate to capture key performance trends across a broad range of applications and architectural configurations. In this work, we focus on analytically modeling the performance of emerging memory-divergent GPU-compute applications which are common in domains such as machine learning and data analytics. The poor spatial locality of these applications leads to frequent L1 cache blocking due to the application issuing significantly more concurrent cache misses than the cache can support, which cripples the GPU's ability to use Thread-Level Parallelism (TLP) to hide memory latencies. We propose the GPU Memory Divergence Model (MDM) which faithfully captures the key performance characteristics of memory-divergent applications, including memory request batching and excessive NoC/DRAM queueing delays. We validate MDM against detailed simulation and real hardware, and report substantial improvements in (1) scope: the ability to model prevalent memory-divergent applications in addition to non-memory divergent applications; (2) practicality: 6.1× faster by computing model inputs using binary instrumentation as opposed to functional simulation; and (3) accuracy: 13.9% average prediction error versus 162% for the state-of-the-art GPUMech model.