SHMACsim: A Cycle-accurate Simulation Infrastructure for the Heterogeneous SHMAC Multi-Core Prototype
Abstract
The fast-paced development trend in microprocessor performance characterized by Moore's Law can no longer continue unperturbed. Shrinking semiconductor node size still translates into increasing transistor count but not directly into performance, since thermal and power constraints limit the number of transistors that can be used simultaneously. One way of exploiting this "dark silicon" is to build heterogeneous systems containing specialized accelerators and cores. The SHMAC project aims to provide a platform for research on heterogeneous systems. An FPGA prototype has been constructed for the SHMAC, but a rapid implement-evaluate cycle for system policies requires software simulation.
This thesis covers the design and implementation of a cycle-accurate simulation infrastructure for the SHMAC. Additionally, the current state of the architecture is evaluated with a set of micro-benchmarks, and several improvements are proposed. The constructed infrastructure offers a highly configurable, cycle-accurate simulation of the SHMAC FPGA prototype. A micro-benchmark-based analysis of the current state of the architecture exposes router hop latency and throughput as the greatest bottlenecks. To address this, a dual-port RAM slave with router bypass is implemented, resulting in a 3.5× instruction fetch speedup and improving overall system performance. Improvements that add traffic-independent clock counting and bootstrapping functionality, along with a method for instrumenting network packet lifetimes, are also described.