Evaluating Shared Last Level Cache Partitioning Algorithms

aan de Wiel, Thomas Alexander

dc.contributor.advisor	Jahre, Magnus
dc.contributor.author	aan de Wiel, Thomas Alexander
dc.date.accessioned	2017-09-12T14:00:28Z
dc.date.available	2017-09-12T14:00:28Z
dc.date.created	2017-06-11
dc.date.issued	2017
dc.identifier	ntnudaim:16708
dc.identifier.uri	http://hdl.handle.net/11250/2454354
dc.description.abstract	Over the past few decades, the development of Dynamic Random-Access Memory (DRAM) has mainly focused on increasing capacity and lowering costs. Microprocessor development, by contrast, has achieved enormous improvements in latency. This has led to a widening memory latency gap that, if left unaddressed, can cause significant underutilization of available microprocessor resources. To bridge this gap, memory hierarchies including several levels of cache memories have been introduced. Chip Multiprocessors (CMPs), or multi-core architectures, commonly share the Last Level Cache (LLC). Sharing allows for destructive interference, as several cores can start to compete for cache space. With CMPs becoming commonplace and their core counts increasing, scalable algorithms that partition the LLC among the cores of a CMP are becoming increasingly important. This thesis describes the implementation of the zcache and the cache partitioning algorithm Vantage in a simulation framework based on Sniper, a parallel multi-core simulator. We use this simulation framework to establish the performance improvements of Vantage and the cache partitioning algorithms Thread-Aware Dynamic Insertion Policy (TADIP), Dynamic Re-Reference Interval Prediction (DRRIP), Promotion/Insertion Pseudo Partitioning (PIPP), Utility-Based Cache Partitioning (UCP) and Probabilistic Shared Cache Management (PriSM) over a range of architectural configurations, with the conventional Least Recently Used (LRU) algorithm as the baseline. Moreover, we identify several root causes that lead to the observed performance differences. We find that Vantage, by using the highly associative zcache, attains the highest performance improvements and is the most scalable cache partitioning algorithm in our evaluation.
The scalability and performance of cache partitioning algorithms utilizing the conventional set-associative cache are mainly limited by the restricted associativity that the set-associative cache provides. We further find that although individual improvements in the System Throughput (STP) can reach up to approximately 20% in our evaluation, the overall impact of cache partitioning is minor, improving the STP and the Harmonic Mean of Speedups (HMS) by a maximum of 3% with respect to LRU.
dc.language	eng
dc.publisher	NTNU
dc.subject	Embedded Computing Systems
dc.title	Evaluating Shared Last Level Cache Partitioning Algorithms
dc.type	Master thesis

Associated file(s)

Filename: 16708_FULLTEXT.pdf
Size: 1.096Mb
Format: PDF

Filename: 16708_ATTACHMENT.zip
Size: 1.483Mb
Format: application/zip

Filename: 16708_COVER.pdf
Size: 1.556Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]
