SAC: Sharing-Aware Caching in Multi-Chip GPUs
DOI: 10.1145/3579371.3589078
Abstract
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for their last-level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local memory partition while being accessible by all chips, an SM-side LLC is private to a chip while caching data from all memory partitions. We find that some workloads prefer a memory-side LLC while others prefer an SM-side LLC, and that this preference depends solely on which organization maximizes the effective LLC bandwidth. In contrast to prior work, which optimizes bandwidth beyond the LLC, we observe that the effective bandwidth ahead of the LLC is critical to end-to-end application performance. We propose Sharing-Aware Caching (SAC), which adopts either a memory-side or an SM-side LLC organization by dynamically reconfiguring the routing policies in the intra-chip interconnection network and the LLC controllers. SAC is driven by a simple, lightweight analytical model that predicts the impact of data sharing across chips on the effective LLC bandwidth. SAC improves average performance by 76% and 12% (and by up to 157% and 49%) compared to a memory-side and an SM-side LLC, respectively. We demonstrate significant performance improvements across the design space and across workloads.
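The central argument above is that the better LLC organization is whichever delivers more effective LLC bandwidth, and that data sharing shifts this balance. The sketch below is a toy bottleneck model built on illustrative assumptions; it is not SAC's analytical model. It assumes symmetric traffic across N chips, uniform address interleaving (so a fraction (N-1)/N of accesses have a remote home), and LLC miss rates supplied as inputs, with the SM-side miss rate expected to be higher when shared data is replicated in several private LLCs. All names (ChipParams, memory_side_throughput, sm_side_throughput, choose_llc) and parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ChipParams:
    """Hypothetical per-chip parameters for a symmetric multi-chip GPU."""
    num_chips: int   # N: number of chips, each owning one memory partition
    llc_bw: float    # LLC bandwidth of one chip (GB/s)
    link_bw: float   # inter-chip bandwidth available to one chip (GB/s)
    dram_bw: float   # DRAM bandwidth of one memory partition (GB/s)


def _bound(capacity: float, demand_per_unit: float) -> float:
    """Throughput ceiling imposed by one resource (inf if the resource is unused)."""
    return math.inf if demand_per_unit == 0 else capacity / demand_per_unit


def memory_side_throughput(p: ChipParams, miss_rate: float) -> float:
    """Per-chip LLC access throughput (GB/s) with a memory-side LLC (toy model)."""
    remote = (p.num_chips - 1) / p.num_chips  # fraction of accesses homed on another chip
    return min(
        _bound(p.llc_bw, 1.0),         # by symmetry, each chip's LLC serves T accesses
        _bound(p.link_bw, remote),     # remote accesses cross the links *before* the LLC
        _bound(p.dram_bw, miss_rate),  # misses go to the local memory partition
    )


def sm_side_throughput(p: ChipParams, miss_rate: float) -> float:
    """Per-chip LLC access throughput (GB/s) with an SM-side (private) LLC (toy model)."""
    remote = (p.num_chips - 1) / p.num_chips
    return min(
        _bound(p.llc_bw, 1.0),                  # every access looks up the local LLC
        _bound(p.link_bw, remote * miss_rate),  # only misses to remote homes cross links
        _bound(p.dram_bw, miss_rate),           # misses spread evenly over all partitions
    )


def choose_llc(p: ChipParams, miss_mem: float, miss_sm: float) -> tuple[str, float]:
    """Pick the organization with the higher modeled effective LLC bandwidth."""
    mem = memory_side_throughput(p, miss_mem)
    sm = sm_side_throughput(p, miss_sm)
    return ("memory-side", mem) if mem >= sm else ("SM-side", sm)


if __name__ == "__main__":
    # Hypothetical 4-chip system with scarce inter-chip bandwidth.
    p = ChipParams(num_chips=4, llc_bw=2000.0, link_bw=500.0, dram_bw=1000.0)
    print(choose_llc(p, miss_mem=0.3, miss_sm=0.5))
```

With these made-up numbers, the memory-side LLC is capped by the links at about 667 GB/s per chip, while the SM-side LLC sends only misses across the links and reaches about 1333 GB/s, so the sketch picks SM-side. Raising the link bandwidth or the SM-side miss rate (more replication of shared data) tips the choice back toward a memory-side LLC, which mirrors the trade-off the abstract describes.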