• Fast and Accurate Edge Computing Energy Modeling and DVFS Implementation in GEM5 Using System Call Emulation Mode 

      Yassin, Yahya Hussain; Jahre, Magnus; Kjeldsberg, Per Gunnar; Aunet, Snorre; Catthoor, Francky (Peer reviewed; Journal article, 2021)
      Stringent power budgets in battery-powered platforms have led to the development of energy saving techniques such as Dynamic Voltage and Frequency Scaling (DVFS). For embedded system designers to be able to reap the benefits ...
    • Fast Call Graph Profiling 

      Smithsen, Eirik (Master thesis, 2020)
      CPUs are non-specialized chips that can perform any computation, and they are not optimized to perform some computations much faster than others. It is possible to build chips that are optimized to perform a limited ...
    • FINN: A Framework for Fast, Scalable Binarized Neural Network Inference 

      Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio; Blott, Michaela; Leong, Philip W.; Jahre, Magnus; Vissers, Kees (Chapter, 2017)
      Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In ... (see the binarized dot-product sketch after this listing)
    • GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime 

      Jahre, Magnus; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      Multi-core memory systems commonly share resources between processors. Resource sharing improves utilization at the cost of increased inter-application interference which may lead to priority inversion, missed deadlines ...
    • Generating Grain Graphs Using the OpenMP Tools API 

      Langdal, Peder Voldnes (Research report, 2017)
      Computers are becoming increasingly parallel. Many applications rely on OpenMP to divide units of work between a set of worker threads. Typically, this is done using parallel for-loops or tasking (see the OpenMP sketch after this listing). Grain graphs is a recent ...
    • Get Out of the Valley: Power-Efficient Address Mapping for GPUs 

      Liu, Yuxi; Zhao, Xia; Jahre, Magnus; Wang, Zhenlin; Wang, Xiaolin; Luo, Yingwei; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional ...
    • HSM: A Hybrid Slowdown Model for Multitasking GPUs 

      Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven (Chapter, 2020)
      Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource ...
    • Implementing a Bare-Metal Threading Library for SHMAC 

      Wikene, Håkon Opsvik (Master thesis, 2014)
      For decades, Moore's Law has stood as a symbol of the continued performance increases achieved through technology scaling. While Moore's observation has remained true for far longer than Moore himself predicted, it now seems ...
    • Implementing a Heterogeneous Multi-Core Prototype in an FPGA 

      Rusten, Leif Tore; Sortland, Gunnar Inge (Master thesis, 2012)
      Since the mid-1980s processor performance growth has been remarkable, with an annual growth of about 52 %. Methods such as architectural enhancements exploiting ILP and frequency scaling have been effective at increasing ...
    • Improving Energy Efficiency with Special-Purpose Accelerators 

      Fiodorov, Alexandru (Master thesis, 2013)
      The number of transistors per chip and their speed grow exponentially, but the power dissipation per transistor decreases only slightly with each process generation. This leads to increased power density and heat generation, ...
    • Improving Fetch and Issue Bandwidth in the Vortex GPU 

      Aurud, Lars Murud (Master thesis, 2023)
      Software simulation is a widely used method for computer architecture research. Unfortunately, it is slow, especially for large parallel architectures such as GPUs. A detailed simulation of a GPU can take up to several days. ...
    • Improving the first-level cache bandwidth in the Berkeley Out-of-Order Machine 

      Nesset, Erling Feet (Master thesis, 2023)
      As modern processors have run into the memory gap over the last decades, they have relied on memory-level parallelism (MLP) to hide the performance difference between the processor and memory. To exploit MLP, processors need enough ...
    • Improving the Performance of Parallel Applications in Chip Multiprocessors with Architectural Techniques 

      Jahre, Magnus (Master thesis, 2007)
      Chip Multiprocessors (CMPs) or multi-core architectures are a new class of processor architectures. Here, multiple processing cores are placed on the same physical chip. To reach the performance potential of these architectures ...
    • Improving the Performance of Processor Core Simulation in the M5 Simulator 

      Bertheussen, Håkon (Master thesis, 2008)
      Simulators are often used to evaluate new ideas in computer architecture research. Unfortunately, detailed simulation is computationally expensive, leading to long simulation turn-around times. This is particularly true ...
    • Investigating Performance Variability on Multi-core Processors 

      Bru, Christer Emil Haga (Master thesis, 2014)
      Performance variability is important because it implies that performance is not always as good as it could have been. Running the same benchmark multiple times will give you different running times. A variable total runtime ...
    • Investigating the Performance Scalability of the Vortex GPU 

      Rekdal, Markus (Master thesis, 2022)
      Software-simulated computer architecture evaluation is slow, especially for large multi-core architectures. FPGA-accelerated evaluation can narrow the gap between simulation and prototyping, so that one can ...
    • LMT: Accurate and Resource-Scalable Slowdown Prediction 

      Salvesen, Peter; Jahre, Magnus (Peer reviewed; Journal article, 2022)
      Multi-core processors suffer from inter-application interference which makes the performance of an application depend on the behavior of the applications it happens to be co-scheduled with. This results in performance ...
    • Managing Shared Resources in Chip Multiprocessor Memory Systems 

      Jahre, Magnus (Doktoravhandlinger ved NTNU, 1503-8181; 2010:159, Doctoral thesis, 2010)
      Chip Multiprocessors (CMPs) have become the architecture of choice for high-performance general-purpose processors. CMPs often share memory system units between processes. This may result in independent processes competing ...
    • MDM: The GPU Memory Divergence Model 

      Wang, Lu; Jahre, Magnus; Adileh, Almutaz; Eeckhout, Lieven (Chapter, 2020)
      Analytical models enable architects to carry out early-stage design space exploration several orders of magnitude faster than cycle-accurate simulation by capturing first-order performance phenomena with a set of mathematical ...
    • Minimizing the Energy Consumption of Soft Real-Time Applications on a Multi-Core Ultra-Low-Power Device 

      Aase, Eirik Vale (Master thesis, 2020)
      The Internet of Things (IoT) is leading to billions of ultra-low-power (ULP) devices being deployed in all parts of our society. Such devices perform a range of functions, periodically ...
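
The entry "Generating Grain Graphs Using the OpenMP Tools API" above notes that OpenMP work is typically divided between worker threads with parallel for-loops or tasking. The sketch below is a minimal, generic C illustration of those two constructs; it is not code from that work, and the file name, array size, and chunking scheme are arbitrary assumptions.

    /* Minimal illustration of the two OpenMP work-division styles named in the
     * grain-graphs entry: a work-sharing parallel for-loop and explicit tasks.
     * Hypothetical example; compile with: cc -fopenmp openmp_sketch.c */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N];

    int main(void) {
        double sum = 0.0;

        /* Parallel for-loop: iterations are split among the worker threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 0.5 * (double)i;

        /* Tasking: one thread creates independent work units; any available
         * worker thread may execute them. */
        #pragma omp parallel
        #pragma omp single
        for (int chunk = 0; chunk < N; chunk += N / 4) {
            #pragma omp task firstprivate(chunk) shared(sum)
            {
                double local = 0.0;
                for (int i = chunk; i < chunk + N / 4; i++)
                    local += a[i];
                #pragma omp atomic
                sum += local;   /* safely accumulate each task's partial sum */
            }
        }

        printf("sum = %f, max threads = %d\n", sum, omp_get_max_threads());
        return 0;
    }

As a rule of thumb, parallel for-loops fit regular, countable iteration spaces, while tasking fits irregular or recursive work decomposition.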
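
The FINN entry above states that high classification accuracy can be retained when weights and activations are reduced to binary values. The sketch below illustrates, in C, why such binarization is attractive at inference time: with values constrained to {-1, +1} and packed as bits, a dot product reduces to an XNOR followed by a population count. This is a generic illustration under those assumptions, not the FINN implementation; the function name binary_dot64 is hypothetical, and __builtin_popcountll assumes a GCC/Clang-style compiler.

    /* Sketch: dot product of two 64-element {-1, +1} vectors, each packed into
     * one 64-bit word (bit = 1 encodes +1, bit = 0 encodes -1).
     * Not FINN code; a generic XNOR-popcount illustration. */
    #include <stdint.h>
    #include <stdio.h>

    static int binary_dot64(uint64_t w, uint64_t x) {
        uint64_t agree = ~(w ^ x);                 /* XNOR: 1 where signs match */
        int matches = __builtin_popcountll(agree); /* number of matching signs  */
        return 2 * matches - 64;                   /* matches minus mismatches  */
    }

    int main(void) {
        uint64_t w = 0xF0F0F0F0F0F0F0F0ULL;        /* arbitrary packed weights     */
        uint64_t x = 0xFFFF0000FFFF0000ULL;        /* arbitrary packed activations */
        printf("binary dot product = %d\n", binary_dot64(w, x));
        return 0;
    }

In hardware, the same idea maps to XNOR gates feeding a popcount tree, which is what makes binarized inference cheap in both logic and memory.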