Improving Memory Access Locality for Vectorized Bit-Serial Matrix Multiplication in Reconfigurable Computing

Sasnayake Mudiyanselage, Lahiru Kasun Rasnayake; Själander, Magnus

Chapter

Accepted version

Open

Sasnayake Mudiyanselage (326.7Kb)

Permanent link

https://hdl.handle.net/11250/2677748

Publication date

2019

Abstract

Low-precision matrix multiplication has gained significant interest in the research community due to its applicability in the quantized neural network domain. As a result, a multitude of variable-precision hardware designs have been proposed, since fixed-precision hardware under-utilizes its resources when the precision is low and varies across such applications. Bit-serial hardware takes advantage of the frugal nature of bit-serial computation, which operates on only as many bits as necessary. A bit-serial matrix multiplication consists of a summation of weighted binary matrix multiplications. In this work, we study the inherent locality of bit-serial matrix multiplications and propose a locality-aware scheduling algorithm that eliminates redundant data fetches from memory. The proposed schedule achieves an improvement of up to 76% over a schedule that computes each binary matrix multiplication in sequence.
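
To make the decomposition mentioned in the abstract concrete, the following is a minimal Python sketch of a bit-serial matrix multiplication written as a sum of weighted binary matrix multiplications. It assumes unsigned integer operands and corresponds to the baseline that computes each binary matrix multiplication in sequence; it is not the locality-aware schedule proposed in the chapter. The function names (`bit_plane`, `binary_matmul`, `bit_serial_matmul`) and the bit-width parameters are hypothetical and chosen only for illustration.

```python
def bit_plane(matrix, bit):
    """Extract one binary matrix (bit plane) from an unsigned integer matrix."""
    return [[(value >> bit) & 1 for value in row] for row in matrix]


def binary_matmul(a, b):
    """Multiply two 0/1 matrices: AND the bits, then count the ones."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a_row[t] & b[t][j] for t in range(inner)) for j in range(cols)]
        for a_row in a
    ]


def bit_serial_matmul(lhs, rhs, lhs_bits, rhs_bits):
    """Assumed baseline: one binary matmul per pair of bit positions, in sequence.

    Only valid for unsigned operands that fit in lhs_bits / rhs_bits bits.
    """
    rows, cols = len(lhs), len(rhs[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(lhs_bits):
        lhs_plane = bit_plane(lhs, i)
        for j in range(rhs_bits):
            partial = binary_matmul(lhs_plane, bit_plane(rhs, j))
            weight = 1 << (i + j)  # significance of this pair of bit positions
            for r in range(rows):
                for c in range(cols):
                    result[r][c] += weight * partial[r][c]
    return result


if __name__ == "__main__":
    lhs = [[1, 2], [3, 0]]  # 2-bit unsigned values
    rhs = [[2, 1], [1, 3]]  # 2-bit unsigned values
    print(bit_serial_matmul(lhs, rhs, lhs_bits=2, rhs_bits=2))
```

With `lhs_bits` and `rhs_bits` bits of precision, this sketch performs `lhs_bits * rhs_bits` binary matrix multiplications, and each one reads a full bit plane of both operands. That repeated reading of the same bit planes is the kind of redundancy a scheduling algorithm can target.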

Publisher

Institute of Electrical and Electronics Engineers (IEEE)