• A Comparative Analysis of Shared Cache Management Techniques for Chip Multiprocessors 

      Grøvdal, Christian Vik (Master thesis, 2013)
      In this thesis we present a comparative analysis of shared cache management techniques for chip multiprocessors. When sharing an unmanaged cache between multiple cores, destructive interference can reduce the performance of ...
    • Accelerating LBM on a Tightly-Coupled Field Programmable Gate Array 

      Vázquez Maceiras, Mateo (Master thesis, 2021)
      It is no longer possible to apply Dennard's principles to scale integrated circuits, and Moore's law is expected to come to an end soon. This has led to an enormous interest in new methods for achieving performance improvements ...
    • Accelerating Object Detection for Agricultural Robotics 

      Boganes, Jørgen (Master thesis, 2020)
      Within agricultural technology, or agritech, harvesting fruit is an expensive and time-consuming process. It is usually performed by human labor, which makes agritech a field where automation has great ...
    • Accelerating Sparse Linear Algebra and Deep Neural Networks on Reconfigurable Platforms 

      Umuroglu, Yaman (Doctoral theses at NTNU;2018:1, Doctoral thesis, 2018)
      Regardless of whether the chosen figure of merit is execution time, throughput, battery life for an embedded system or total cost of ownership for a datacenter, today’s computers are fundamentally limited by their energy ...
    • Challenges of Reducing Cycle-Accurate Simulation Time for TBP Applications 

      Iordan, Alexandru Ciprian; Jahre, Magnus; Natvig, Lasse (Journal article; Peer reviewed, 2013)
      Cycle-accurate simulation is an important tool that depends on the computational power of supercomputers. Unfortunately, simulations of modern multi-core platforms can take weeks or months. In this paper, we look into the ...
    • Computing in Unstructured Matter 

      Lykkebø, Odd Rune S. (Doctoral theses at NTNU;2017:90, Doctoral thesis, 2017)
    • DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs 

      Koraei, Mostafa; Fatemi, Omid; Jahre, Magnus (Journal article; Peer reviewed, 2019)
      Iterative Stencil Loops (ISLs) are the key kernel within a range of compute-intensive applications. To accelerate ISLs with Field Programmable Gate Arrays, it is critical to exploit parallelism (1) among elements within ...
    • Designing a Virtual Memory System for the SHMAC Research Infrastructure 

      Sutterud, Audun (Master thesis, 2017)
      The Single-ISA Heterogeneous MAny-core Computer (SHMAC) is an infrastructure for realizing heterogeneous computing systems. The current SHMAC prototype does not have a Memory Management Unit (MMU). An MMU would simplify ...
    • DTP: Enabling Exhaustive Exploration of FPGA Temporal Partitions for Streaming HPC Applications 

      Koraei, Mostafa; Jahre, Magnus; Fatemi, S. Omid (Chapter; Peer reviewed, 2017)
      Reconfigurable computing systems show great promise for accelerating streaming HPC applications because of their low power consumption and high performance. However, mapping an HPC application to a reconfigurable system ...
    • Evaluating Shared Last Level Cache Partitioning Algorithms 

      aan de Wiel, Thomas Alexander (Master thesis, 2017)
      Over the past few decades, the development of Dynamic Random-Access Memory (DRAM) has mainly focused on increasing capacity and lowering costs. However, microprocessor development has experienced enormous improvements in ...
    • Evaluation of Cache Management Algorithms for Shared Last Level Caches 

      Olsen, Runar Bergheim (Master thesis, 2015)
      The performance gap between processors and main memory has been growing over the last decades. Fast memory structures known as caches were introduced to mitigate some of the effects of this gap. After processor manufacturers ...
    • Evolution in Materio: - En Kaotisk Tilnærming 

      Flogard, Eirik Lund (Master thesis, 2015)
      This thesis deals with a concept called Evolution in Materio, in which computer-controlled evolution is used to try to exploit the natural properties of a material to solve tasks or perform computations. The motivation ...
    • Extending OMPT to Support Grain Graph Visualization 

      Langdal, Peder Voldnes (Master thesis, 2017)
      Because of physical constraints, performance gains of single-core processors have come to a halt. Computer architects have responded by adding multiple processor cores to their designs. However, for continued performance ...
    • Extending OMPT to Support Grain Graphs 

      Langdal, Peder Voldnes; Jahre, Magnus; Muddukrishna, Ananya (Journal article, 2017)
      The upcoming profiling API standard OMPT can describe almost all profiling events required to construct grain graphs, a recent visualization that simplifies OpenMP performance analysis. We propose OMPT extensions that ...
    • Fast Call Graph Profiling 

      Smithsen, Eirik (Master thesis, 2020)
      CPUs are general-purpose chips that can perform any computation, and they are not optimized to perform any computation much faster than others. It is possible to build chips that are optimized to perform a limited ...
    • FINN: A Framework for Fast, Scalable Binarized Neural Network Inference 

      Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio; Blott, Michaela; Leong, Philip W.; Jahre, Magnus; Vissers, Kees (Chapter, 2017)
      Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In ...
    • GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime 

      Jahre, Magnus; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      Multi-core memory systems commonly share resources between processors. Resource sharing improves utilization at the cost of increased inter-application interference which may lead to priority inversion, missed deadlines ...
    • Generating Grain Graphs Using the OpenMP Tools API 

      Langdal, Peder Voldnes (Research report, 2017)
      Computers are becoming increasingly parallel. Many applications rely on OpenMP to divide units of work between a set of worker threads. Typically, this is done using parallel for-loops or tasking. Grain graphs is a recent ...
    • Get Out of the Valley: Power-Efficient Address Mapping for GPUs 

      Liu, Yuxi; Zhao, Xia; Jahre, Magnus; Wang, Zhenlin; Wang, Xiaolin; Luo, Yingwei; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional ...
    • HSM: A Hybrid Slowdown Model for Multitasking GPUs 

      Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven (Chapter, 2020)
      Graphics Processing Units (GPUs) are increasingly used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways, leading to suboptimal resource ...