Accelerating Sparse Linear Algebra and Deep Neural Networks on Reconfigurable Platforms
Abstract
Regardless of whether the chosen figure of merit is execution time, throughput, battery life for an embedded system or total cost of ownership for a datacenter, today’s computers are fundamentally limited by their energy efficiency. Using specialized hardware-software solutions for particular applications or domains is a well-known approach to increase energy efficiency of computing systems. Reconfigurable logic in the form of Field-Programmable Gate Arrays (FPGAs) is a particularly promising substrate for hardware specialization, owing to its runtime reconfigurability, vastly parallel compute fabric and widespread availability. However, mapping computation to reconfigurable logic in a way which provides performance and efficiency benefits is a significant challenge due to the vast design space. In this thesis, we study how two particular domains can benefit from specialized architectures on reconfigurable logic. We focus on sparse linear algebra and deep neural network inference, whose execution is known to be particularly problematic on today’s general-purpose computers.
For sparse linear algebra, the lack of spatial and temporal locality in memory accesses poses a fundamental problem. We address this problem by taking advantage of the flexibility of reconfigurable logic to construct specialized memory systems. We propose a hardware-software caching scheme which uses lightweight preprocessing to extract key access pattern information from sparse matrices, offering greatly increased random access efficiency with minimal on-chip memory usage. Furthermore, we demonstrate the broader applicability of specialization for sparse linear algebra to graph analytics with an accelerator for breadth-first search that uses off-chip memory bandwidth more efficiently than prior work.
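To illustrate the kind of lightweight preprocessing referred to above, the following sketch profiles the dense-vector access pattern of a sparse matrix to identify which vector elements are reused most and are therefore the best candidates for a small on-chip buffer. This is a minimal software illustration, not the hardware-software scheme proposed in the thesis; the SciPy CSR representation, the function name column_access_profile and the cache_lines parameter are assumptions made for this example.

```python
# Illustrative sketch: profile which dense-vector entries an SpMV kernel
# touches most often, as a proxy for deciding what to keep in a small
# on-chip vector buffer. Not the thesis's actual caching scheme.
import numpy as np
import scipy.sparse as sp

def column_access_profile(A_csr, cache_lines):
    """Return the columns whose x-vector entries are accessed most often,
    and the fraction of all random x accesses they would cover."""
    # Each stored nonzero in column j triggers one random access to x[j].
    counts = np.bincount(A_csr.indices, minlength=A_csr.shape[1])
    hot = np.argsort(counts)[::-1][:cache_lines]   # most frequently touched columns
    hit_fraction = counts[hot].sum() / max(A_csr.nnz, 1)
    return hot, hit_fraction

if __name__ == "__main__":
    A = sp.random(4096, 4096, density=0.001, format="csr", random_state=0)
    hot_cols, hits = column_access_profile(A, cache_lines=256)
    print(f"Buffering {len(hot_cols)} columns would capture {hits:.1%} of x-vector accesses")
```

Such a profile can be computed in a single pass over the matrix, which is why a preprocessing step of this flavor can pay for itself when the matrix is reused across many SpMV iterations.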
For deep neural network inference, the sheer energy and hardware resource cost of floating point computation is a fundamental limitation on energy efficiency. Exploiting recent advances in training highly quantized neural networks (QNNs), we demonstrate how FPGAs can be leveraged for accurate, energy-efficient and high-performance neural network inference.We propose the FINN framework to generate customized architectures with compute resources tailored to user-specified performance requirements while exploiting multiple levels of parallelism for high energy efficiency. We also describe mathematical simplifications for making QNN inference more resourceefficient, and show how binary matrix operators can be used as bit-serial building blocks for higher-precision computation.
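To make the bit-serial idea concrete, the sketch below composes a higher-precision integer dot product from binary dot products over bit planes. It is a minimal numerical illustration under assumed conditions (unsigned 4-bit operands, {0,1} bit planes combined with AND and popcount); in a {-1,+1} binarized network the same binary building block would instead be realized as XNOR plus popcount. The function names binary_dot and bitserial_dot are hypothetical.

```python
# Illustrative sketch: a higher-precision integer dot product built from
# binary (single-bit) dot products over bit planes. Assumes unsigned
# operands; precision and names are chosen for this example only.
import numpy as np

def binary_dot(a_bits, b_bits):
    """Dot product of two {0,1} vectors; in hardware this is an AND + popcount."""
    return int(np.sum(a_bits & b_bits))

def bitserial_dot(a, b, a_bits=4, b_bits=4):
    """Compute dot(a, b) for unsigned integer vectors by summing weighted
    binary dot products over their bit planes."""
    acc = 0
    for i in range(a_bits):
        a_plane = (a >> i) & 1
        for j in range(b_bits):
            b_plane = (b >> j) & 1
            acc += (1 << (i + j)) * binary_dot(a_plane, b_plane)
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.integers(0, 16, size=64)   # 4-bit unsigned activations
    b = rng.integers(0, 16, size=64)   # 4-bit unsigned weights
    assert bitserial_dot(a, b) == int(np.dot(a, b))
    print("Bit-serial result matches the direct integer dot product")
```

The appeal of this decomposition on an FPGA is that the same compact binary compute unit can be time-multiplexed over bit planes, trading throughput for precision without changing the datapath.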
Has parts
Paper 1: Umuroglu, Yaman; Jahre, Magnus. An Energy Efficient Column-Major Backend for FPGA SpMV Accelerators. In: 2014 32nd IEEE International Conference on Computer Design (ICCD). IEEE 2014, pp. 432-439. https://doi.org/10.1109/ICCD.2014.6974716 - © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Paper 2: Umuroglu, Yaman; Jahre, Magnus. A Vector Caching Scheme for Streaming FPGA SpMV Accelerators. In: Applied Reconfigurable Computing. Springer 2015, pp. 15-26. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-16214-0_2
Paper 3: Umuroglu, Yaman; Jahre, Magnus. Random access schemes for efficient FPGA SpMV acceleration. Microprocessors and Microsystems 2016; Volume 47B, pp. 321-332. https://doi.org/10.1016/j.micpro.2016.02.015
Paper 4: Umuroglu, Yaman; Morrison, Donn; Jahre, Magnus. Hybrid Breadth-First Search on a Single-Chip FPGA-CPU Heterogeneous Platform. In: 25th International Conference on Field Programmable Logic and Applications (FPL 2015). IEEE 2015. https://doi.org/10.1109/FPL.2015.7293939 - © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Paper 5: Umuroglu, Yaman; Fraser, Nicholas J.; Gambardella, Giulio; Blott, Michaela; Leong, Philip W.; Jahre, Magnus; Vissers, Kees. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Association for Computing Machinery (ACM) 2017, pp. 65-74. https://doi.org/10.1145/3020078.3021744
Paper 6: Fraser, Nicholas J.; Umuroglu, Yaman; Gambardella, Giulio; Blott, Michaela; Leong, Philip W.; Vissers, Kees; Jahre, Magnus. Scaling Binarized Neural Networks on Reconfigurable Logic. In: Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms. Association for Computing Machinery (ACM) 2017, pp. 25-30. https://doi.org/10.1145/3029580.3029586
Paper 7: Umuroglu, Yaman; Jahre, Magnus. Streamlined Deployment for Quantized Neural Networks. International Workshop on Highly Efficient Neural Networks Design (HENND), part of Embedded Systems Week (ESWEEK) 2017. arXiv:1709.04060v1