Reducing Memory Latency by Improving Resource Utilization
Abstract
Integrated circuits have been in constant progression since the first prototype in 1958, with the semiconductor industry maintaining a constant rate of miniaturisation of transistors and wires. Up until about the year 2002, processor performance increased by about 55% per year. Since then, limitations on power, instruction-level parallelism (ILP) and memory latency have slowed the increase in uniprocessor performance to about 20% per year. Although the capacity of DRAM increases by about 40% per year, the latency only decreases by about 6 to 7% per year. This performance gap between the processor and DRAM leads to a problem known as the memory wall.
This thesis aims to improve system memory latency by leveraging available resources with excess capacity. This has been achieved through multiple techniques, but mainly by using excess bandwidth and improving scheduling policies.
The first approach presented, destructive read DRAM, changes the underlying assumptions about the contents of a DRAM cell being unchanged after a read. The latency of a read is reduced, but the rest of the memory system requires changes to conserve data.
Prefetching predicts what data is needed in the future and fetches that data into the cache before it is referenced. This dissertation presents Delta Correlating Prediction Tables (DCPT), a technique for generating highly accurate prefetches with good timeliness. DCPT uses a table indexed by the load's address to store the delta history of individual loads. Delta correlation is then used to predict future misses. Delta Correlating Prediction Tables with Partial Matching (DCPT-P) extends DCPT by introducing L1 hoisting, which moves data from the L2 to the L1 to further increase performance. In addition, DCPT-P leverages partial matching, which reduces the spatial resolution of deltas to expose more patterns.
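The delta-correlation idea can be illustrated with a minimal sketch. It is not the thesis implementation: the class name `DeltaTable`, the history length, the unbounded dictionary used as the table, and the choice to match the two most recent deltas against an earlier, non-overlapping pair are all assumptions made for illustration. L1 hoisting and partial matching are not modelled.

```python
from collections import deque


class DeltaTable:
    """Toy delta-correlating table: one delta history per load address (PC)."""

    def __init__(self, history_len=8):
        # history_len is an assumed parameter, not a value from the thesis.
        self.history_len = history_len
        self.entries = {}  # pc -> [last_addr, deque of recent deltas]

    def access(self, pc, addr):
        """Record a miss at `addr` by the load at `pc`; return predicted addresses."""
        if pc not in self.entries:
            self.entries[pc] = [addr, deque(maxlen=self.history_len)]
            return []
        entry = self.entries[pc]
        delta = addr - entry[0]
        entry[0] = addr
        if delta == 0:
            return []
        entry[1].append(delta)
        return self._correlate(addr, list(entry[1]))

    @staticmethod
    def _correlate(addr, deltas):
        """Find the last two deltas earlier in the history and replay what followed."""
        if len(deltas) < 4:
            return []
        pair = deltas[-2:]
        # i is the index of the second delta of a candidate earlier pair;
        # starting at len - 3 keeps it from overlapping the newest pair.
        for i in range(len(deltas) - 3, 0, -1):
            if deltas[i - 1:i + 1] == pair:
                predictions = []
                for d in deltas[i + 1:]:
                    addr += d
                    predictions.append(addr)
                return predictions
        return []
```

For a load whose misses follow the repeating delta pattern 1, 2, 1, 2, the fifth miss is enough to find a match:

```python
table = DeltaTable()
for a in (100, 101, 103, 104):
    table.access(0x400, a)
print(table.access(0x400, 106))  # [107, 109]
```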
The interaction between the memory controller and the prefetcher is especially important, because of the complex 3D structure of modern DRAM. Utilizing open pages can increase the performance of the system significantly. Memory controllers can increase bandwidth utilization and reduce latency at the same time by scheduling prefetches such that the number of page hits is maximized. The interaction between the program, prefetcher and the memory controller is explored.
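As a rough illustration of scheduling prefetches to favour open pages, the sketch below reorders pending prefetches so that those targeting a currently open row are issued first. The `Request` type, the `open_rows` mapping from bank to open row, and the simple "hits first, otherwise oldest first" policy are hypothetical and are not taken from the thesis.

```python
from typing import NamedTuple


class Request(NamedTuple):
    bank: int  # DRAM bank the prefetch maps to (assumed address mapping)
    row: int   # DRAM row (page) within that bank


def order_prefetches(pending, open_rows):
    """Return pending prefetches with open-page hits first.

    `pending` is ordered oldest first. `open_rows` maps a bank to the row
    currently held open in it. Python's sort is stable, so requests keep
    their age order within the hit group and within the miss group.
    """
    return sorted(pending, key=lambda r: open_rows.get(r.bank) != r.row)


def count_page_hits(requests, open_rows):
    """Count requests that target a row that is already open."""
    return sum(open_rows.get(r.bank) == r.row for r in requests)
```

With rows 3 and 7 open in banks 0 and 1, a prefetch to bank 0, row 5 is deferred behind the two prefetches that hit open pages:

```python
pending = [Request(0, 5), Request(1, 7), Request(0, 3)]
print(order_prefetches(pending, {0: 3, 1: 7}))
```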
This thesis examines the impact of having a shared memory system in a chip multiprocessor (CMP). When resources are shared, one core might interfere with another core's execution by delaying memory requests or displacing useful data in the cache. This effect is quantified, and the components most prone to interference between cores are identified. Finally, we present a framework for measuring interference at runtime.
Has parts
Dybdahl, H; Grannaes, M; Natvig, L. Cache write-back schemes for embedded destructive-read DRAM. Lecture Notes in Computer Science - Architecture of Computing Systems - ARCS 2006, Proceedings: 145-159, 2006.
Dybdahl, Haakon; Kjeldsberg, Per Gunnar; Grannæs, Marius; Natvig, Lasse. Destructive-Read in Embedded DRAM, Impact on Power Consumption. Journal of Embedded Computing. (ISSN 1740-4460). 2(2): 249-260, 2006.
Grannæs, Marius; Natvig, Lasse. Hardware Prefetching Using Shadow Tagging. In CMP-MSI: 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.
Grannæs, Marius; Jahre, Magnus; Natvig, Lasse. Low-Cost Open-Page Prefetch Scheduling in Chip Multiprocessors. XXVI IEEE International Conference on Computer Design (ICCD) 2008: 390-396, 2008. 10.1109/ICCD.2008.4751890.
Grannæs, Marius; Jahre, Magnus; Natvig, Lasse. Storage Efficient Hardware Prefetching using Delta Correlating Prediction Tables. Data Prefetching Championship - 1, 2009.
Jahre, M; Grannæs, M; Natvig, L. A Quantitative Study of Memory System Interference in Chip Multiprocessor Architectures. 11th IEEE International Conference on High Performance Computing and Communications (HPCC) 2009: 622-629, 2009. 10.1109/HPCC.2009.77.
Grannæs, Marius; Jahre, Magnus; Natvig, Lasse. Multi-Level Hardware Prefetching using Low Complexity Delta Correlating Prediction Tables with Partial Matching. Lecture Notes in Computer Science = Lecture notes in artificial intelligence: 247-261, 2010. 10.1007/978-3-642-11515-8_19.
Jahre, Magnus; Grannaes, Marius; Natvig, Lasse. DIEF: An Accurate Interference Feedback Mechanism for Chip Multiprocessor Memory Systems. Lecture Notes in Computer Science. (ISSN 0302-9743). 5952: 292-306, 2010. 10.1007/978-3-642-11515-8_22.
Grannæs, Marius; Jahre, Magnus; Natvig, Lasse. Exploring the Prefetcher/Memory Controller Design Space: An Opportunistic Prefetch Scheduling Strategy.