MDM: The GPU Memory Divergence Model

Wang, Lu; Jahre, Magnus; Adileh, Almutaz; Eeckhout, Lieven
Chapter
Accepted version
URI
https://hdl.handle.net/11250/2734404
Date
2020
Collections
  • Institutt for datateknologi og informatikk [7357]
  • Publikasjoner fra CRIStin - NTNU [41869]
Original version
http://dx.doi.org/10.1109/MICRO50266.2020.00085
Abstract
Analytical models enable architects to carry out early-stage design space exploration several orders of magnitude faster than cycle-accurate simulation by capturing first-order performance phenomena with a set of mathematical equations. However, this speed advantage is void if the conclusions obtained through the model are misleading due to model inaccuracies. Therefore, a practical analytical model needs to be sufficiently accurate to capture key performance trends across a broad range of applications and architectural configurations. In this work, we focus on analytically modeling the performance of emerging memory-divergent GPU-compute applications, which are common in domains such as machine learning and data analytics. The poor spatial locality of these applications leads to frequent L1 cache blocking because they issue significantly more concurrent cache misses than the cache can support, which cripples the GPU's ability to use Thread-Level Parallelism (TLP) to hide memory latencies. We propose the GPU Memory Divergence Model (MDM), which faithfully captures the key performance characteristics of memory-divergent applications, including memory request batching and excessive NoC/DRAM queueing delays. We validate MDM against detailed simulation and real hardware, and report substantial improvements in (1) scope: the ability to model prevalent memory-divergent applications in addition to non-memory-divergent applications; (2) practicality: 6.1× faster by computing model inputs using binary instrumentation as opposed to functional simulation; and (3) accuracy: 13.9% average prediction error versus 162% for the state-of-the-art GPUMech model.
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
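
The abstract above describes MDM as a set of closed-form equations that capture how MSHR-limited miss batching and NoC/DRAM queueing dominate memory-divergent kernels. As a loose illustration of that style of analytical modeling, the Python sketch below estimates per-SM cycles under a deliberately simplified batching assumption; the function name, parameters, and formula are hypothetical and are not the MDM equations from the paper.

# Toy analytical-model sketch (illustrative only, not the MDM equations).
# Assumption: when concurrent L1 misses exceed the available MSHR entries,
# the cache blocks and misses are serviced in batches of `mshr_entries`,
# so thread-level parallelism can no longer hide the full memory latency.

def estimate_sm_cycles(num_warps, misses_per_warp, mshr_entries,
                       avg_mem_latency, compute_cycles_per_warp):
    total_misses = num_warps * misses_per_warp
    batches = -(-total_misses // mshr_entries)   # ceiling division
    memory_cycles = batches * avg_mem_latency    # serialized miss batches
    compute_cycles = num_warps * compute_cycles_per_warp
    # Memory-divergent kernels are dominated by the memory term.
    return max(memory_cycles, compute_cycles)

# Hypothetical parameters chosen only to show the trend: far more
# concurrent misses than MSHRs pushes execution into the memory term.
print(estimate_sm_cycles(num_warps=48, misses_per_warp=8,
                         mshr_entries=32, avg_mem_latency=400,
                         compute_cycles_per_warp=60))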
