Implementing Data Cache Access Memoization (DCAM) in hardware to measure L1 DC and DTLB energy efficiency

Vedvik, Edgar


Master's thesis


no.ntnu:inspera:2531113.pdf (1.291Mb)

Permanent link

http://hdl.handle.net/11250/2642680

Publication date

2019


Collections

Institutt for datateknologi og informatikk

Abstract

The level-1 data cache (L1 DC) and the data translation lookaside buffer (DTLB) are essential in today's memory hierarchy, providing faster access to data and reducing the number of stall cycles. These structures are accessed frequently and use significantly more energy than processor registers. A large share of the processor's energy budget goes to servicing data through the level-1 data cache and the data translation lookaside buffer. Stokes et al. (2019) recently proposed data cache access memoisation (DCAM), a technique for reducing the energy consumption of these structures. We will explore the performance, energy consumption and critical path of the DCAM technique and see how it compares with a standard implementation. The DCAM technique identifies the last instruction that updates a register which is later used by a memory instruction. By performing the tag check together with the instruction that last updated the register, we can access only one data array in a set-associative cache. By memoising this information between instructions, we are able to reduce the number of DTLB accesses and L1 DC tag checks. We show that an implementation of this technique does not lengthen the critical path and uses significantly less power than a standard implementation.
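DCAM's first step, finding the last instruction to update a register before a memory instruction uses it as its base, can be illustrated on a straight-line instruction trace. The sketch below is a simplified model under stated assumptions, not the thesis's VHDL logic or Stokes et al.'s exact mechanism: the `Instr` record and the `find_pam_instructions` helper are hypothetical, control flow is ignored, and each instruction writes at most one register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """Hypothetical, simplified instruction record (not a real ISA)."""
    op: str
    dest: Optional[str] = None  # register written by this instruction, if any
    base: Optional[str] = None  # base register of a memory access, if any

    @property
    def is_mem(self) -> bool:
        return self.op in ("load", "store")


def find_pam_instructions(trace: list[Instr]) -> dict[int, int]:
    """Map each memory instruction's index to the index of the last earlier
    instruction that wrote its base register (its PAM instruction).
    Memory instructions whose base was never written in the trace are omitted."""
    last_writer: dict[str, int] = {}
    pam_of: dict[int, int] = {}
    for i, ins in enumerate(trace):
        # Read the base register before this instruction's own write, so a
        # load like "ld r1, 0(r1)" is attributed to the previous writer of r1.
        if ins.is_mem and ins.base in last_writer:
            pam_of[i] = last_writer[ins.base]
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return pam_of


# Toy trace: two address computations followed by three memory accesses.
trace = [
    Instr("addi", dest="r1"),             # 0: computes base r1
    Instr("addi", dest="r2"),             # 1: computes base r2
    Instr("load", dest="r3", base="r1"),  # 2
    Instr("load", dest="r4", base="r1"),  # 3
    Instr("store", base="r2"),            # 4
]
pam_of = find_pam_instructions(trace)  # {2: 0, 3: 0, 4: 1}
```

In this toy trace both loads share instruction 0 as their PAM instruction, which is where memoisation can pay off: one early tag check can, under the assumptions above, serve several later accesses through the same base register.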

The level-1 data cache (L1 DC) and data translation lookaside buffer (DTLB) are essential in contemporary memory hierarchies, providing faster data access and reducing the number of stall cycles in processors. Accesses to these structures are frequent, and each uses significantly more energy than a register access. A large portion of a processor's energy budget is spent servicing data through the L1 DC and DTLB. Stokes et al. (2019) recently proposed the data cache access memoisation (DCAM) technique to reduce the energy used by the L1 DC and DTLB. We will implement this technique in VHDL and test it on an FPGA. We will also investigate the performance, energy usage and critical path of the technique. DCAM identifies the last instruction to update a register before it is referenced by a memory instruction. By performing the tag check along with this prepare-to-access-memory (PAM) instruction, we are able to access a single data array in a set-associative cache. By memoising this information between instructions, we are able to reduce the number of DTLB accesses and L1 DC tag checks. We show an implementation of the DCAM technique that does not increase the critical path and uses significantly less power than a standard pipeline.
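The effect of memoising tag-check results can be approximated by counting events in a toy cache model. This is a minimal sketch under loudly stated assumptions, not the thesis's design or the exact mechanism of Stokes et al.: the memo is keyed by base register and holds a (cache line, way) pair; it is valid only while the access stays in the memoised line (so the page, and therefore the DTLB translation, is unchanged); it is cleared when the register is rewritten or the line is evicted. The PAM tag check does not fill on a miss, and the conventional baseline reads every way's data array in parallel. The cache geometry and the names `simulate`, `Cache`, `LINE_BYTES`, `SETS` and `WAYS` are hypothetical, and the counts are event counts, not energy or power figures.

```python
LINE_BYTES = 64  # assumed cache-line size in bytes
SETS = 64        # assumed number of sets
WAYS = 4         # assumed associativity


class Cache:
    """Tiny set-associative cache with LRU replacement (tags only)."""

    def __init__(self):
        self.lines = [[None] * WAYS for _ in range(SETS)]      # line per way
        self.order = [list(range(WAYS)) for _ in range(SETS)]  # LRU way first

    def probe(self, line):
        """Tag check without filling: return the way holding `line`, or None."""
        ways = self.lines[line % SETS]
        return ways.index(line) if line in ways else None

    def touch(self, line, way):
        """Mark `way` in the set of `line` as most recently used."""
        order = self.order[line % SETS]
        order.remove(way)
        order.append(way)

    def fill(self, line):
        """Insert `line` into its set's LRU way; return (way, evicted line)."""
        s = line % SETS
        way = self.order[s][0]
        evicted, self.lines[s][way] = self.lines[s][way], line
        self.touch(line, way)
        return way, evicted


def simulate(events, dcam):
    """Count DTLB lookups, tag checks and data-array reads for a toy trace.

    events: ("pam", reg, value) sets base register `reg` to `value`;
            ("mem", reg, offset) accesses address regs[reg] + offset.
    """
    cache, regs, memo = Cache(), {}, {}
    stats = {"dtlb": 0, "tag": 0, "data": 0}
    for kind, reg, val in events:
        if kind == "pam":
            regs[reg] = val
            memo.pop(reg, None)  # register rewritten: old memo is stale
            if dcam:
                # Translate and tag-check early, remembering the way on a hit
                stats["dtlb"] += 1
                stats["tag"] += 1
                line = val // LINE_BYTES
                way = cache.probe(line)
                if way is not None:
                    memo[reg] = (line, way)
            continue
        line = (regs[reg] + val) // LINE_BYTES
        hit = memo.get(reg)
        if dcam and hit is not None and hit[0] == line:
            # Memo hit: skip DTLB and tag check, read a single data array
            stats["data"] += 1
            cache.touch(line, hit[1])
            continue
        # Conventional access: DTLB lookup, tag check, read all ways
        stats["dtlb"] += 1
        stats["tag"] += 1
        stats["data"] += WAYS
        way = cache.probe(line)
        if way is None:
            way, evicted = cache.fill(line)
            # Drop memo entries that point at the evicted line
            for r in [r for r, (ln, _) in memo.items() if ln == evicted]:
                del memo[r]
        else:
            cache.touch(line, way)
        if dcam:
            memo[reg] = (line, way)
    return stats


# Two passes over eight 8-byte accesses within one line, each pass
# preceded by a PAM instruction that sets base register r1.
events = ([("pam", "r1", 0x1000)] + [("mem", "r1", 8 * i) for i in range(8)]) * 2
baseline = simulate(events, dcam=False)  # {'dtlb': 16, 'tag': 16, 'data': 64}
with_dcam = simulate(events, dcam=True)  # {'dtlb': 3, 'tag': 3, 'data': 19}
```

In this toy trace the first PAM check is wasted because the cache is still cold. In the second pass the early tag check hits, so all eight accesses skip the DTLB and tag check and read one data array each. Real savings depend on the workload and on the hardware's actual energy per event, which this model does not capture.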

Publisher

NTNU