Towards Efficiently Utilizing Coarse-Grained Reconfigurable Accelerators

Eggen, Lasse Agentoft

dc.contributor.advisor	Jahre, Magnus
dc.contributor.author	Eggen, Lasse Agentoft
dc.date.accessioned	2021-09-15T16:31:49Z
dc.date.available	2021-09-15T16:31:49Z
dc.date.issued	2020
dc.identifier	no.ntnu:inspera:57320302:22809678
dc.identifier.uri	https://hdl.handle.net/11250/2777940
dc.description.abstract	The end of Dennard scaling and the imminent end of Moore's law are causing disruptive changes to the way computers are designed. There is agreement in the computer-architecture community that we probably need specialized hardware to further improve performance and increase power and energy efficiency. The industry standard is domain-specific accelerators that accelerate a common set of applications, at the cost of generality. An alternative is reconfigurable architectures, which can accelerate multiple domains and thereby achieve better utilization. In this thesis, we focus on Coarse-Grained Reconfigurable Architectures (CGRAs), as they have higher theoretical performance than fine-grained alternatives, such as Field-Programmable Gate Arrays (FPGAs). We studied CGRA accelerators in the context of the state-of-the-art Stream-Dataflow Architecture (SDA) and the Reconfigurable Vector Lanes (REVEL) accelerator. First, we model a set of different CGRA sizes to explore performance-scaling opportunities. Then, by varying the ratio of static-to-dynamic scheduling, we assess the performance impact of a dynamic-dataflow region and to what extent the mapping algorithm is able to exploit it. Finally, we explore how performance can be improved by factoring in an active developer. We find that attaining high performance requires that the program formulation, the mapping and scheduling algorithms, and the CGRA accelerator architecture all align favorably. Due to the high complexity of the mapping and scheduling problem, we are not able to gain efficiency when we try to express more parallelism than the original REVEL workloads. We have shown empirically that increasing the CGRA size alone does not contribute to execution scalability, nor does introducing dynamic dataflow in isolation. Software-hardware cooperation is hence key to maximizing the performance of CGRA accelerators. 
Unfortunately, we were not able to qualitatively evaluate such approaches within the constraints of a master's thesis due to limitations of our chosen compiler and simulator framework (even though it is the current state of the art).
dc.language
dc.publisher	NTNU
dc.title	Towards Efficiently Utilizing Coarse-Grained Reconfigurable Accelerators
dc.type	Master thesis

Associated file(s)

Filename: no.ntnu:inspera:57320302:22809 ...
Size: 7.488Mb
Format: Unknown

This item appears in the following collection(s)

Institutt for datateknologi og informatikk (Department of Computer Science)
