Delegated Replies: Alleviating Network Clogging in Heterogeneous Architectures

Heterogeneous architectures with latency-sensitive CPU cores and bandwidth-intensive accelerators are attractive as they deliver high performance at favorable cost. These architectures typically have significantly more compute cores than memory nodes. The many bandwidth-intensive accelerators hence overwhelm the few memory nodes, resulting in suboptimal accelerator performance (their bandwidth needs are not met) and poor CPU performance (memory node blocking creates high latencies). We call this phenomenon network clogging. Since network clogging is a widespread issue in heterogeneous architectures, we first investigate whether existing state-of-the-art approaches can address it. We find that the most effective prior approach, called Realistic Probing (RP), is suboptimal because it searches the local caches of other cores for missing data. We propose Delegated Replies, which lets memory nodes speculatively delegate the responsibility of replying to last-level cache hits to the private cache that last accessed the requested cache block, hence avoiding the search that fundamentally limits RP. Moreover, Delegated Replies uses the (typically) under-utilized request network for delegation; it is the reply network links of the memory nodes that commonly clog, because replies include complete cache blocks in addition to metadata. We evaluate Delegated Replies in the context of heterogeneous architectures with latency-sensitive CPU cores and bandwidth-intensive GPU cores and find that it improves GPU (CPU) performance by 14.2% (5.2%) and 25.7% (8.8%) on average compared to RP and our baseline, respectively.
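The delegation idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the class names, the message sizes (`HEADER_BYTES`, `BLOCK_BYTES`), and the fallback taken when the delegated core no longer holds the block are all assumptions made for illustration. It only shows why delegating a last-level cache hit moves traffic off the memory node's reply link: the memory node sends a small metadata-only message on the request network instead of a full cache block on the reply network.

```python
from dataclasses import dataclass, field

HEADER_BYTES = 8    # assumed metadata size per message (hypothetical)
BLOCK_BYTES = 128   # assumed cache block size (hypothetical)


@dataclass
class Core:
    core_id: int
    private_cache: set = field(default_factory=set)


@dataclass
class MemoryNode:
    llc: set = field(default_factory=set)               # blocks present in the LLC slice
    last_accessor: dict = field(default_factory=dict)   # block -> id of last requesting core
    reply_link_bytes: int = 0     # bytes this node sent on the reply network
    request_link_bytes: int = 0   # bytes this node sent on the request network

    def handle_read(self, block, requester, cores, delegate=True):
        """Serve a read; return 'mem' or the id of the core that supplied the data."""
        owner_id = self.last_accessor.get(block)
        self.last_accessor[block] = requester.core_id
        llc_hit = block in self.llc
        if delegate and llc_hit and owner_id is not None and owner_id != requester.core_id:
            # Speculatively delegate: a metadata-only message on the request network.
            self.request_link_bytes += HEADER_BYTES
            owner = cores[owner_id]
            if block in owner.private_cache:
                # The delegated core replies, so the memory node's reply link is unused.
                requester.private_cache.add(block)
                return owner_id
            # Assumed fallback: speculation failed, so the memory node replies itself.
        # LLC miss, no delegation, or failed delegation: full block on the reply link.
        self.llc.add(block)
        self.reply_link_bytes += HEADER_BYTES + BLOCK_BYTES
        requester.private_cache.add(block)
        return "mem"


def run(delegate):
    """Two cores read the same block in turn; return the memory node afterwards."""
    cores = [Core(0), Core(1)]
    mem = MemoryNode()
    mem.handle_read(0x40, cores[0], cores, delegate)  # LLC miss, memory node replies
    mem.handle_read(0x40, cores[1], cores, delegate)  # LLC hit, may be delegated
    return mem
```

Calling `run(True)` and `run(False)` and comparing `reply_link_bytes` shows the effect under these assumptions: with delegation the second read costs the memory node one header on the request network rather than a header plus a full block on the reply network.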

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Journal

IEEE Symposium on High-Performance Computer Architecture (HPCA)

Copyright

© IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.