Evaluating FIFO-based Instruction Scheduling Techniques using FPGAs

Metz, David Christoph; Jellum, Erling Rennemo

dc.contributor.advisor	Själander, Magnus
dc.contributor.advisor	Hendseth, Sverre
dc.contributor.author	Metz, David Christoph
dc.contributor.author	Jellum, Erling Rennemo
dc.date.accessioned	2021-09-23T18:05:56Z
dc.date.available	2021-09-23T18:05:56Z
dc.date.issued	2020
dc.identifier	no.ntnu:inspera:56990118:15069951
dc.identifier.uri	https://hdl.handle.net/11250/2780915
dc.description.abstract	The performance advantage of out-of-order processors stems from their ability to extract more instruction-level parallelism (ILP) and memory-level parallelism (MLP) than in-order cores. This is largely the benefit of the dynamic out-of-order schedules they create. The downside of out-of-order scheduling is its high energy and die-area cost. We evaluate three recently proposed FIFO-based scheduling techniques found in Load Slice Core (LSC), Delay and Bypass (DnB), and CASINO. They all promise a large part of the performance gain of out-of-order scheduling, but at a much lower cost. DnB and LSC focus on extracting MLP by iteratively building load slices and giving them precedence in the execution order. The dependency analysis technique they employ is called Iterative Backward Dependency Analysis (IBDA). We evaluate the implementability, performance, and area of the proposed IBDA and propose three improved implementations of IBDA that require less area and power while providing essentially the same performance. DnB, LSC, and CASINO, the third technique, are all based on the idea of replacing the expensive, content-addressable issue queue with cheaper FIFOs. We implement all these techniques based on BOOM, an open-source RTL implementation of an out-of-order RISC-V core. We synthesize our designs and evaluate them on a Xilinx ZC707 FPGA. By instantiating our cores as part of a system on chip, we are also able to boot Linux on them. Our evaluation, using parts of the SPEC CPU2006 benchmark suite, confirms the claims that these techniques come close to the performance of a fully-fledged out-of-order core. LSC and CASINO do this while consuming noticeably fewer resources. DnB comes closest to the performance of out-of-order cores, but it fails to show area benefits in our implementation. Additionally, we provide insights into the overheads that the BOOM core has over its smaller sibling, the in-order processor Rocket.
As this form of implementation is much closer to real silicon tape-outs than simulators are, it forces us to consider and analyze implementation specifics that can be ignored in high-level simulation. This provides insights into the implementability of the different techniques. Our work is a big step towards accurate measurements, instead of estimates, of performance, power, and area usage for LSC, DnB, and CASINO.
dc.language
dc.publisher	NTNU
dc.title	Evaluating FIFO-based Instruction Scheduling Techniques using FPGAs
dc.type	Master thesis

Associated file(s)

Filename: no.ntnu:inspera:56990118:15069 ...
Size: 8.224Mb
Format: PDF

This item appears in the following collection(s)

Institutt for teknisk kybernetikk [3739]