Show simple item record

| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Kumar, Rakesh | |
| dc.contributor.author | Grot, Boris | |
| dc.date.accessioned | 2023-03-07T16:07:12Z | |
| dc.date.available | 2023-03-07T16:07:12Z | |
| dc.date.created | 2022-01-20T12:38:23Z | |
| dc.date.issued | 2022 | |
| dc.identifier.citation | ACM Transactions on Computer Systems. 2022, 38 (3-4), 1-30. | en_US |
| dc.identifier.issn | 0734-2071 | |
| dc.identifier.uri | https://hdl.handle.net/11250/3056538 | |
| dc.description.abstract | The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction footprints. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art techniques force a trade-off between metadata storage cost and performance. Temporal Stream prefetchers deliver high performance but require a prohibitive amount of metadata to accommodate the temporal history. Meanwhile, BTB-directed prefetchers incur low cost by using the existing in-core branch prediction structures but fall short on performance due to BTB’s inability to capture the massive control flow working set of server applications. This work overcomes the fundamental limitation of BTB-directed prefetchers, which is capturing a large control flow working set within an affordable BTB storage budget. We re-envision the BTB organization to maximize its control flow coverage by observing that an application’s instruction footprint can be mapped as a combination of its unconditional branch working set and, for each unconditional branch, a spatial encoding of the cache blocks around the branch target. Effectively capturing a map of the application’s instruction footprint in the BTB enables highly effective BTB-directed prefetching that outperforms the state-of-the-art prefetchers by up to 10% for equivalent storage budget. | en_US |
| dc.language.iso | eng | en_US |
| dc.publisher | Association for Computing Machinery (ACM) | en_US |
| dc.title | Shooting Down the Server Front-End Bottleneck | en_US |
| dc.title.alternative | Shooting Down the Server Front-End Bottleneck | en_US |
| dc.type | Peer reviewed | en_US |
| dc.type | Journal article | en_US |
| dc.description.version | acceptedVersion | en_US |
| dc.source.pagenumber | 1-30 | en_US |
| dc.source.volume | 38 | en_US |
| dc.source.journal | ACM Transactions on Computer Systems | en_US |
| dc.source.issue | 3-4 | en_US |
| dc.identifier.doi | 10.1145/3484492 | |
| dc.identifier.cristin | 1986135 | |
| dc.relation.project | Norges forskningsråd: 302279 | en_US |
| cristin.ispublished | true | |
| cristin.fulltext | postprint | |
| cristin.qualitycode | 2 | |


