dc.contributor.author | Zhao, Xia | |
dc.contributor.author | Jahre, Magnus | |
dc.contributor.author | Eeckhout, Lieven | |
dc.date.accessioned | 2020-08-25T07:04:00Z | |
dc.date.available | 2020-08-25T07:04:00Z | |
dc.date.created | 2020-04-07T19:53:03Z | |
dc.date.issued | 2020 | |
dc.identifier.isbn | 9781450371025 | |
dc.identifier.uri | https://hdl.handle.net/11250/2673753 | |
dc.description.abstract | Graphics Processing Units (GPUs) are increasingly used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource utilization when a single GPU is used to run a single application. One solution is to use the GPU in a multitasking fashion to improve utilization. Unfortunately, multitasking leads to destructive interference between co-running applications, which causes fairness issues and Quality-of-Service (QoS) violations.
We propose the Hybrid Slowdown Model (HSM) to dynamically and accurately predict application slowdown due to interference. HSM takes a hybrid approach that overcomes both the low accuracy of prior white-box models and the training and implementation overheads of pure black-box models. More specifically, the white-box component of HSM builds upon the fundamental insight that effective bandwidth utilization is proportional to DRAM row buffer hit rate, and the black-box component of HSM uses linear regression to relate row buffer hit rate to performance. HSM accurately predicts application slowdown with an average error of 6.8%, a significant improvement over the current state-of-the-art. In addition, we use HSM to guide various resource management schemes in multitasking GPUs: HSM-Fair significantly improves fairness (by 1.59x on average) compared to even partitioning, whereas HSM-QoS improves system throughput (by 18.9% on average) compared to proportional SM partitioning while maintaining the QoS target for the high-priority application in challenging mixed memory/compute-bound multi-program workloads. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Association for Computing Machinery (ACM) | en_US |
dc.relation.ispartof | ASPLOS'20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems | |
dc.title | HSM: A Hybrid Slowdown Model for Multitasking GPUs | en_US |
dc.type | Chapter | en_US |
dc.description.version | acceptedVersion | en_US |
dc.source.pagenumber | 1371-1385 | en_US |
dc.identifier.doi | 10.1145/3373376.3378457 | |
dc.identifier.cristin | 1805600 | |
dc.relation.project | Norges forskningsråd: 286596 | en_US |
dc.description.localcode | © ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published here, https://doi.org/10.1145/3373376.3378457 | en_US |
cristin.ispublished | true | |
cristin.fulltext | postprint | |
cristin.qualitycode | 1 | |