Implementing Data Cache Access Memoization (DCAM) in hardware to measure L1 DC and DTLB energy efficiency

Vedvik, Edgar

dc.contributor.advisor	Själander, Magnus
dc.contributor.author	Vedvik, Edgar
dc.date.accessioned	2020-02-19T15:00:22Z
dc.date.available	2020-02-19T15:00:22Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/11250/2642680
dc.description.abstract	The level-1 data cache (L1 DC) and data translation lookaside buffer (DTLB) are essential in today's memory hierarchy, giving faster access to data and reducing the number of stall cycles. These structures are accessed frequently and use significantly more energy than processor registers. A large part of the processor's energy budget is spent servicing data through the L1 DC and DTLB. Stokes et al. (2019) recently proposed data cache access memoisation (DCAM), a technique for reducing the energy consumption of these structures. We explore the performance, energy consumption and critical path of the DCAM technique and how it compares with a standard implementation. The DCAM technique identifies the last instruction that updates a register which is later used by a memory instruction. By performing the tag check together with the instruction that last updated the register, we can access only a single data array in a set-associative cache. By memoising this information between instructions, we are able to reduce the number of DTLB accesses and L1 DC tag checks. We show that an implementation of this technique does not lengthen the critical path and uses significantly less power than a standard implementation.
dc.description.abstract	The level-1 data cache (L1 DC) and data translation lookaside buffer (DTLB) are essential in contemporary memory hierarchies: they provide faster data access and reduce the number of stall cycles in processors. These structures are accessed frequently and use significantly more energy than registers. A large portion of a processor's energy budget is spent servicing data through the L1 DC and DTLB. Stokes et al. (2019) recently proposed the data cache access memoisation (DCAM) technique to reduce the energy used by the L1 DC and DTLB. We implement this technique in VHDL and test it on an FPGA, and we investigate its performance, energy usage and critical path. DCAM identifies the last instruction to update a register before that register is referenced by a memory instruction. By performing the tag check along with this prepare-to-access-memory (PAM) instruction, the memory instruction can access a single data array in a set-associative cache. By memoising this information between instructions, we reduce the number of DTLB accesses and L1 DC tag checks. We present an implementation of the DCAM technique that does not increase the critical path and uses significantly less power than a standard pipeline.
dc.language	eng
dc.publisher	NTNU
dc.title	Implementing Data Cache Access Memoization (DCAM) in hardware to measure L1 DC and DTLB energy efficiency
dc.type	Master thesis
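
The mechanism described in the abstract can be illustrated with a small counting model. The sketch below is not the thesis's VHDL design; it is a toy Python model written for this record, and every name and parameter in it (`ToyL1`, `write_register`, the line size, the number of sets and ways, round-robin replacement, and the rule that drops memoised entries when their line is evicted) is an assumption made for illustration. It only counts how many DTLB lookups, tag reads and data array reads occur when eight words of one cache line are loaded, first conventionally and then with the tag check moved to the instruction that writes the address register.

```python
from collections import Counter

# All sizes below are illustrative assumptions, not values from the thesis.
LINE_BYTES = 32
NUM_SETS = 4
NUM_WAYS = 4


class ToyL1:
    """Toy set-associative L1 DC that only counts structure accesses."""

    def __init__(self):
        # tags[set][way] holds the line number cached there, or None.
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.victim = [0] * NUM_SETS      # round-robin replacement pointer
        self.memo = {}                    # register name -> (line, way)
        self.count = Counter()

    def _translate_and_check(self, addr):
        """One DTLB lookup plus a tag check of every way; fills on a miss."""
        line = addr // LINE_BYTES
        s = line % NUM_SETS
        self.count["dtlb"] += 1
        self.count["tag"] += NUM_WAYS
        if line in self.tags[s]:
            return line, self.tags[s].index(line)
        way = self.victim[s]
        self.victim[s] = (way + 1) % NUM_WAYS
        self.tags[s][way] = line
        # Assumed policy: forget memoised ways that pointed at the evicted line.
        self.memo = {r: m for r, m in self.memo.items()
                     if (m[0] % NUM_SETS, m[1]) != (s, way)}
        return line, way

    def load_conventional(self, addr):
        """Tag check and all data arrays are accessed on every load."""
        self._translate_and_check(addr)
        self.count["data"] += NUM_WAYS

    def write_register(self, reg, value):
        """PAM-like step: the register write also does translation + tag check."""
        self.memo[reg] = self._translate_and_check(value)

    def load_memoised(self, reg, addr):
        """Use the memoised way if the load stays within the memoised line."""
        memo = self.memo.get(reg)
        if memo is not None and memo[0] == addr // LINE_BYTES:
            self.count["data"] += 1       # single data array, no tag or DTLB
        else:
            self.load_conventional(addr)  # fall back to a normal access


BASE = 0x100  # hypothetical base address; all eight words share one line

conv = ToyL1()
for i in range(8):
    conv.load_conventional(BASE + 4 * i)

dcam = ToyL1()
dcam.write_register("r1", BASE)
for i in range(8):
    dcam.load_memoised("r1", BASE + 4 * i)

print("conventional:", dict(conv.count))
print("memoised:    ", dict(dcam.count))
```

The model says nothing about timing, the critical path or actual energy figures, which are what the thesis measures on an FPGA; it only shows where the reduction in DTLB accesses and tag checks would come from under the stated assumptions.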

Associated file(s)

File name: no.ntnu:inspera:2531113.pdf
Size: 1.291 MB
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk