    • A Comparative Analysis of Shared Cache Management Techniques for Chip Multiprocessors 

      Grøvdal, Christian Vik (Master thesis, 2013)
      In this thesis we present a comparative analysis of shared cache management techniques for chip multiprocessors. When sharing an unmanaged cache between multiple cores, destructive interference can reduce the performance of ...
    • Accelerating Sparse Linear Algebra and Deep Neural Networks on Reconfigurable Platforms 

      Umuroglu, Yaman (Doctoral theses at NTNU;2018:1, Doctoral thesis, 2018)
      Regardless of whether the chosen figure of merit is execution time, throughput, battery life for an embedded system or total cost of ownership for a datacenter, today’s computers are fundamentally limited by their energy ...
    • Challenges of Reducing Cycle-Accurate Simulation Time for TBP Applications 

      Iordan, Alexandru Ciprian; Jahre, Magnus; Natvig, Lasse (Journal article; Peer reviewed, 2013)
      Cycle-accurate simulation is an important tool that depends on the computational power of supercomputers. Unfortunately, simulations of modern multi-core platforms can take weeks or months. In this paper, we look into the ...
    • Computing in Unstructured Matter 

      Lykkebø, Odd Rune S. (Doctoral theses at NTNU;2017:90, Doctoral thesis, 2017)
    • DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs 

      Koraei, Mostafa; Fatemi, Omid; Jahre, Magnus (Journal article; Peer reviewed, 2019)
      Iterative Stencil Loops (ISLs) are the key kernel within a range of compute-intensive applications. To accelerate ISLs with Field Programmable Gate Arrays, it is critical to exploit parallelism (1) among elements within ...
    • Designing a Virtual Memory System for the SHMAC Research Infrastructure 

      Sutterud, Audun (Master thesis, 2017)
      The Single-ISA Heterogeneous MAny-core Computer (SHMAC) is an infrastructure for realizing heterogeneous computing systems. The current SHMAC prototype does not have a Memory Management Unit (MMU). An MMU would simplify ...
    • DTP: Enabling Exhaustive Exploration of FPGA Temporal Partitions for Streaming HPC Applications 

      Koraei, Mostafa; Jahre, Magnus; Fatemi, S. Omid (Chapter; Peer reviewed, 2017)
      Reconfigurable computing systems show great promise for accelerating streaming HPC applications because of their low power consumption and high performance. However, mapping an HPC application to a reconfigurable system ...
    • Evaluating Shared Last Level Cache Partitioning Algorithms 

      aan de Wiel, Thomas Alexander (Master thesis, 2017)
      Over the past few decades, the development of Dynamic Random-Access Memory (DRAM) has mainly focused on increasing capacity and lowering costs. However, microprocessor development has experienced enormous improvements in ...
    • Evaluation of Cache Management Algorithms for Shared Last Level Caches 

      Olsen, Runar Bergheim (Master thesis, 2015)
      The performance gap between processors and main memory has been growing over the last decades. Fast memory structures known as caches were introduced to mitigate some of the effects of this gap. After processor manufacturers ...
    • Evolution in Materio: A Chaotic Approach 

      Flogard, Eirik Lund (Master thesis, 2015)
      This thesis addresses a concept called Evolution in Materio, in which computer-controlled evolution is used in an attempt to exploit a material's natural properties to solve tasks or perform computations. The motivation ...
    • Extending OMPT to Support Grain Graph Visualization 

      Langdal, Peder Voldnes (Master thesis, 2017)
      Because of physical constraints, performance gains of single-core processors have come to a halt. Computer architects have responded by adding multiple processor cores to their designs. However, for continued performance ...
    • Extending OMPT to Support Grain Graphs 

      Langdal, Peder Voldnes; Jahre, Magnus; Muddukrishna, Ananya (Journal article, 2017)
      The upcoming profiling API standard OMPT can describe almost all profiling events required to construct grain graphs, a recent visualization that simplifies OpenMP performance analysis. We propose OMPT extensions that ...
    • FINN: A Framework for Fast, Scalable Binarized Neural Network Inference 

      Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio; Blott, Michaela; Leong, Philip W.; Jahre, Magnus; Vissers, Kees (Chapter, 2017)
      Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In ...
    • GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime 

      Jahre, Magnus; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      Multi-core memory systems commonly share resources between processors. Resource sharing improves utilization at the cost of increased inter-application interference which may lead to priority inversion, missed deadlines ...
    • Generating Grain Graphs Using the OpenMP Tools API 

      Langdal, Peder Voldnes (Research report, 2017)
      Computers are becoming increasingly parallel. Many applications rely on OpenMP to divide units of work between a set of worker threads. Typically, this is done using parallel for-loops or tasking. Grain graphs is a recent ...
    • Get Out of the Valley: Power-Efficient Address Mapping for GPUs 

      Yuxi, Liu; Zhao, Xia; Jahre, Magnus; Wang, Zhenlin; Wang, Xiaolin; Lou, Yingwei; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional ...
    • Implementing a Bare-Metal Threading Library for SHMAC 

      Wikene, Håkon Opsvik (Master thesis, 2014)
      For decades, Moore's Law has stood as a symbol of the continued performance increases achieved through technology scaling. While Moore's observation has remained true for far longer than Moore himself predicted, it now seems ...
    • Implementing a Heterogeneous Multi-Core Prototype in an FPGA 

      Rusten, Leif Tore; Sortland, Gunnar Inge (Master thesis, 2012)
      Since the mid-1980s processor performance growth has been remarkable, with an annual growth of about 52 %. Methods such as architectural enhancements exploiting ILP and frequency scaling have been effective at increasing ...
    • Improving Energy Efficiency with Special-Purpose Accelerators 

      Fiodorov, Alexandru (Master thesis, 2013)
      The number of transistors per chip and their speed grow exponentially, but the power dissipation per transistor decreases only slightly with each process generation. This leads to increased power density and heat generation, ...
    • Improving the Performance of Parallel Applications in Chip Multiprocessors with Architectural Techniques 

      Jahre, Magnus (Master thesis, 2007)
      Chip Multiprocessors (CMPs) or multi-core architectures are a new class of processor architectures. Here, multiple processing cores are placed on the same physical chip. To reach the performance potential of these architectures ...