Show simple item record

dc.contributor.author: Zhao, Xia
dc.contributor.author: Eeckhout, Lieven
dc.contributor.author: Jahre, Magnus
dc.date.accessioned: 2022-11-24T09:23:24Z
dc.date.available: 2022-11-24T09:23:24Z
dc.date.created: 2022-04-05T15:31:58Z
dc.date.issued: 2022
dc.identifier.citation: IEEE Symposium on High-Performance Computer Architecture (HPCA). 2022, 1014-1028. [en_US]
dc.identifier.issn: 1530-0897
dc.identifier.uri: https://hdl.handle.net/11250/3033800
dc.description.abstract: Heterogeneous architectures with latency-sensitive CPU cores and bandwidth-intensive accelerators are attractive as they deliver high performance at favorable cost. These architectures typically have significantly more compute cores than memory nodes. The many bandwidth-intensive accelerators hence overwhelm the few memory nodes, resulting in suboptimal accelerator performance (as their bandwidth needs are not met) and poor CPU performance (because memory node blocking creates high latencies). We call this phenomenon network clogging. Since network clogging is a widespread issue in heterogeneous architectures, we first investigate if existing state-of-the-art approaches can address it. We find that the most effective prior approach, called Realistic Probing (RP), is suboptimal because it searches the local caches of other cores for missing data. We propose Delegated Replies which lets memory nodes speculatively delegate the responsibility of replying to last-level cache hits to the private cache that last accessed the requested cache block, hence avoiding the search that fundamentally limits RP. Moreover, Delegated Replies uses the (typically) under-utilized request network for delegation; it is the reply network links of the memory nodes that commonly clog because replies include complete cache blocks in addition to metadata. We evaluate Delegated Replies in the context of heterogeneous architectures with latency-sensitive CPU cores and bandwidth-intensive GPU cores and find that it improves GPU (CPU) performance by 14.2% (5.2%) and 25.7% (8.8%) on average compared to RP and our baseline, respectively. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.title: Delegated Replies: Alleviating Network Clogging in Heterogeneous Architectures [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: acceptedVersion [en_US]
dc.rights.holder: © IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. [en_US]
dc.subject.nsi: VDP::Datateknologi: 551 [en_US]
dc.subject.nsi: VDP::Computer technology: 551 [en_US]
dc.source.pagenumber: 1014-1028 [en_US]
dc.source.journal: IEEE Symposium on High-Performance Computer Architecture (HPCA) [en_US]
dc.identifier.doi: 10.1109/HPCA53966.2022.00078
dc.identifier.cristin: 2015457
dc.relation.project: Norges forskningsråd: 286596 [en_US]
cristin.ispublished: true
cristin.fulltext: postprint
cristin.qualitycode: 2
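
Illustrative note on the abstract: the delegation decision it describes can be pictured with a toy model. The sketch below is not taken from the paper. The class `MemoryNode`, the method `handle_request`, the message names, and the rule for updating the last accessor are all hypothetical, chosen only to show a memory node handing a last-level cache hit to the private cache that last accessed the block, with the small delegation message travelling on the request network.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Toy last-level cache (LLC) slice that can delegate replies (illustrative only)."""

    llc: dict = field(default_factory=dict)            # block address -> data
    last_accessor: dict = field(default_factory=dict)  # block address -> core id

    def handle_request(self, requester, addr):
        """Return a (message_kind, network, destination) tuple for one read request."""
        if addr not in self.llc:
            # LLC miss: the block comes from memory and becomes resident.
            self.llc[addr] = f"data@{addr:#x}"
            decision = ("reply_from_memory", "reply_network", requester)
        else:
            owner = self.last_accessor.get(addr)
            if owner is not None and owner != requester:
                # LLC hit: speculatively ask the last accessor to send the block,
                # using a small message on the request network.
                decision = ("delegate", "request_network", owner)
            else:
                # No other recent accessor is known, so reply directly.
                decision = ("reply_from_llc", "reply_network", requester)
        # Assumption: the requester becomes the most recent accessor.
        self.last_accessor[addr] = requester
        return decision


node = MemoryNode()
first = node.handle_request(requester=0, addr=0x40)   # miss, served from memory
second = node.handle_request(requester=1, addr=0x40)  # hit, delegated to core 0
```

The model leaves out what happens when the delegated private cache no longer holds the block; the abstract calls the delegation speculative but gives no detail on that case.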

