
dc.contributor.author: Zhao, Xia
dc.contributor.author: Jahre, Magnus
dc.contributor.author: Eeckhout, Lieven
dc.date.accessioned: 2020-08-25T07:04:00Z
dc.date.available: 2020-08-25T07:04:00Z
dc.date.created: 2020-04-07T19:53:03Z
dc.date.issued: 2020
dc.identifier.isbn: 9781450371025
dc.identifier.uri: https://hdl.handle.net/11250/2673753
dc.description.abstract: Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource utilization when a single GPU is used to run a single application. One solution is to use the GPU in a multitasking fashion to improve utilization. Unfortunately, multitasking leads to destructive interference between co-running applications, which causes fairness issues and Quality-of-Service (QoS) violations. We propose the Hybrid Slowdown Model (HSM) to dynamically and accurately predict application slowdown due to interference. HSM overcomes the low accuracy of prior white-box models, and the training and implementation overheads of pure black-box models, with a hybrid approach. More specifically, the white-box component of HSM builds upon the fundamental insight that effective bandwidth utilization is proportional to DRAM row buffer hit rate, and the black-box component of HSM uses linear regression to relate row buffer hit rate to performance. HSM accurately predicts application slowdown with an average error of 6.8%, a significant improvement over the current state-of-the-art. In addition, we use HSM to guide various resource management schemes in multitasking GPUs: HSM-Fair significantly improves fairness (by 1.59x on average) compared to even partitioning, whereas HSM-QoS improves system throughput (by 18.9% on average) compared to proportional SM partitioning while maintaining the QoS target for the high-priority application in challenging mixed memory/compute-bound multi-program workloads. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.ispartof: ASPLOS'20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
dc.title: HSM: A Hybrid Slowdown Model for Multitasking GPUs [en_US]
dc.type: Chapter [en_US]
dc.description.version: acceptedVersion [en_US]
dc.source.pagenumber: 1371-1385 [en_US]
dc.identifier.doi: 10.1145/3373376.3378457
dc.identifier.cristin: 1805600
dc.relation.project: Norges forskningsråd: 286596 [en_US]
dc.description.localcode: © ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published here, https://doi.org/10.1145/3373376.3378457 [en_US]
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 1
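
The abstract says the black-box component of HSM uses linear regression to relate DRAM row buffer hit rate to performance. The sketch below only illustrates that general idea with a one-variable least-squares fit. It is not the authors' implementation: the function names, the sample values, and the definition of slowdown as isolated performance divided by shared performance are all assumptions made for illustration and do not come from this record.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x; zero means the slope is undefined.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, hit_rate: float) -> float:
    """Predicted performance at a given row buffer hit rate."""
    return intercept + slope * hit_rate


def slowdown(perf_alone: float, perf_shared: float) -> float:
    """Assumed definition: performance when running alone over performance when sharing."""
    return perf_alone / perf_shared


# Hypothetical samples (not measurements): row buffer hit rate vs. performance.
hit_rates = [0.2, 0.4, 0.6, 0.8]
performance = [1.0, 1.5, 2.0, 2.5]

intercept, slope = fit_line(hit_rates, performance)
predicted = predict(intercept, slope, 0.5)
```

With a fitted line, a performance estimate for an unseen hit rate is a single multiply and add, which hints at why a regression-based component can have low implementation overhead compared with a heavier learned model.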

