HSM: A Hybrid Slowdown Model for Multitasking GPUs

Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven

dc.contributor.author	Zhao, Xia
dc.contributor.author	Jahre, Magnus
dc.contributor.author	Eeckhout, Lieven
dc.date.accessioned	2020-08-25T07:04:00Z
dc.date.available	2020-08-25T07:04:00Z
dc.date.created	2020-04-07T19:53:03Z
dc.date.issued	2020
dc.identifier.isbn	9781450371025
dc.identifier.uri	https://hdl.handle.net/11250/2673753
dc.description.abstract	Graphics Processing Units (GPUs) are increasingly used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource utilization when a single GPU is used to run a single application. One solution is to use the GPU in a multitasking fashion to improve utilization. Unfortunately, multitasking leads to destructive interference between co-running applications, which causes fairness issues and Quality-of-Service (QoS) violations. We propose the Hybrid Slowdown Model (HSM) to dynamically and accurately predict application slowdown due to interference. HSM overcomes the low accuracy of prior white-box models, and the training and implementation overheads of pure black-box models, with a hybrid approach. More specifically, the white-box component of HSM builds upon the fundamental insight that effective bandwidth utilization is proportional to DRAM row buffer hit rate, and the black-box component of HSM uses linear regression to relate row buffer hit rate to performance. HSM accurately predicts application slowdown with an average error of 6.8%, a significant improvement over the current state-of-the-art. In addition, we use HSM to guide various resource management schemes in multitasking GPUs: HSM-Fair significantly improves fairness (by 1.59x on average) compared to even partitioning, whereas HSM-QoS improves system throughput (by 18.9% on average) compared to proportional SM partitioning while maintaining the QoS target for the high-priority application in challenging mixed memory/compute-bound multi-program workloads.	en_US
dc.language.iso	eng	en_US
dc.publisher	Association for Computing Machinery (ACM)	en_US
dc.relation.ispartof	ASPLOS'20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
dc.title	HSM: A Hybrid Slowdown Model for Multitasking GPUs	en_US
dc.type	Chapter	en_US
dc.description.version	acceptedVersion	en_US
dc.source.pagenumber	1371-1385	en_US
dc.identifier.doi	10.1145/3373376.3378457
dc.identifier.cristin	1805600
dc.relation.project	Norges forskningsråd: 286596	en_US
dc.description.localcode	© ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published here, https://doi.org/10.1145/3373376.3378457	en_US
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	1

Associated file(s)

Filename: hsm-asplos20-preprint.pdf
Size: 1.771Mb
Format: PDF
Description: Zhao

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]
Publikasjoner fra CRIStin - NTNU [37177]