dc.contributor.author | Huai, Shuo | |
dc.contributor.author | Liu, Di | |
dc.contributor.author | Kong, Hao | |
dc.contributor.author | Liu, Weichen | |
dc.contributor.author | Subramaniam, Ravi | |
dc.contributor.author | Makaya, Christian | |
dc.contributor.author | Lin, Qian | |
dc.date.accessioned | 2023-03-07T09:21:54Z | |
dc.date.available | 2023-03-07T09:21:54Z | |
dc.date.created | 2023-01-03T14:15:54Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Future Generation Computer Systems. 2022, 142, 314-327. | en_US |
dc.identifier.issn | 0167-739X | |
dc.identifier.uri | https://hdl.handle.net/11250/3056327 | |
dc.description.abstract | Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Choosing the number of neurons when designing a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and impose a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method that optimizes models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor that allows this procedure to learn a model satisfying the latency constraint with only a one-shot training process. The experimental results reveal that, compared to state-of-the-art methods, our approach fits the ‘hard’ latency constraint well and achieves high accuracy. Under the same training settings as the original model and a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet’s latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, our method further reduces the drop to only 0.04% for GoogLeNet. On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms while even improving its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms, achieving 0.78% higher accuracy. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Elsevier | en_US |
dc.title | Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization | en_US |
dc.title.alternative | Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization | en_US |
dc.type | Peer reviewed | en_US |
dc.type | Journal article | en_US |
dc.description.version | publishedVersion | en_US |
dc.source.pagenumber | 314-327 | en_US |
dc.source.volume | 142 | en_US |
dc.source.journal | Future Generation Computer Systems | en_US |
dc.identifier.doi | 10.1016/j.future.2022.12.021 | |
dc.identifier.cristin | 2099780 | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 2 | |
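The abstract describes pruning guided by zeroized batch-normalization scale factors under a latency budget. As a rough illustration only (this is a hypothetical sketch, not the authors' ZeroBN implementation; the function name, the per-channel latency figure, and the linear latency model are all assumptions), one can zero the smallest-magnitude BN gammas, whose channels then contribute nothing to the layer output, until an estimated latency fits the constraint:

```python
# Hypothetical sketch of latency-constrained channel selection via
# batch-normalization scale factors. NOT the ZeroBN implementation:
# the linear latency model and all parameter names are assumptions.

def zeroize_channels(gammas, latency_per_channel, latency_budget):
    """Zero the smallest-|gamma| channels until an (assumed linear)
    latency estimate fits the budget; return the keep-mask."""
    # Visit channels from least to most important (by |gamma|).
    order = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    keep = [True] * len(gammas)
    latency = len(gammas) * latency_per_channel
    for i in order:
        if latency <= latency_budget:
            break
        keep[i] = False                    # zeroize this channel's BN scale
        latency -= latency_per_channel     # assumed linear latency model
    return keep

# Example: 8 channels at an assumed 5 ms each, 25 ms budget ->
# the three smallest-|gamma| channels are zeroized, 5 are kept.
gammas = [0.9, 0.01, 0.5, 0.002, 0.7, 0.03, 0.8, 0.6]
mask = zeroize_channels(gammas, latency_per_channel=5.0, latency_budget=25.0)
```

In the paper's actual framework, the latency estimate comes from a hardware-customized predictor rather than a per-channel constant, and the zeroized scales are learned during a one-shot training process; see the linked repository for the real implementation.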