dc.contributor.advisor: Jahre, Magnus
dc.contributor.author: Eggen, Lasse Agentoft
dc.date.accessioned: 2021-09-15T16:31:49Z
dc.date.available: 2021-09-15T16:31:49Z
dc.date.issued: 2020
dc.identifier: no.ntnu:inspera:57320302:22809678
dc.identifier.uri: https://hdl.handle.net/11250/2777940
dc.description.abstract: The end of Dennard scaling and the imminent end of Moore's law are causing disruptive changes to the way computers are designed. There is agreement in the computer-architecture community that we probably need specialized hardware to further improve performance and increase power and energy efficiency. The industry standard is domain-specific accelerators, which accelerate a common set of applications at the cost of generality. An alternative is reconfigurable architectures, which can accelerate multiple domains and thereby achieve better utilization. In this thesis, we focus on Coarse-Grained Reconfigurable Architectures (CGRAs), as they have higher theoretical performance than fine-grained alternatives such as Field-Programmable Gate Arrays (FPGAs). We study CGRA accelerators in the context of the state-of-the-art Stream-Dataflow Architecture (SDA) and the Reconfigurable Vector Lanes (REVEL) accelerator. First, we model a set of different CGRA sizes to explore performance-scaling opportunities. Then, by varying the ratio of static to dynamic scheduling, we assess the performance impact of a dynamic-dataflow region and the extent to which the mapping algorithm is able to exploit it. Finally, we explore how performance can be improved by factoring in an active developer. We find that attaining high performance requires that the program formulation, the mapping and scheduling algorithms, and the CGRA accelerator architecture all align favorably. Because of the high complexity of the mapping and scheduling problem, we were not able to gain efficiency when we tried to express more parallelism than the original REVEL workloads. We show empirically that increasing the CGRA size alone does not contribute to execution scalability, and neither does introducing dynamic dataflow in isolation. Software-hardware cooperation is hence key to maximizing the performance of CGRA accelerators.
Unfortunately, within the constraints of a master's thesis, we were not able to qualitatively evaluate such approaches because of limitations in our chosen compiler and simulator framework (even though it is the current state of the art).
dc.language:
dc.publisher: NTNU
dc.title: Towards Efficiently Utilizing Coarse-Grained Reconfigurable Accelerators
dc.type: Master thesis