Show simple item record

dc.contributor.author: Huai, Shuo
dc.contributor.author: Liu, Di
dc.contributor.author: Kong, Hao
dc.contributor.author: Liu, Weichen
dc.contributor.author: Subramaniam, Ravi
dc.contributor.author: Makaya, Christian
dc.contributor.author: Lin, Qian
dc.date.accessioned: 2023-03-07T09:21:54Z
dc.date.available: 2023-03-07T09:21:54Z
dc.date.created: 2023-01-03T14:15:54Z
dc.date.issued: 2022
dc.identifier.citation: Future Generation Computer Systems. 2022, 142, 314-327. [en_US]
dc.identifier.issn: 0167-739X
dc.identifier.uri: https://hdl.handle.net/11250/3056327
dc.description.abstract: Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Deciding the number of neurons during the design of a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and impose a strict latency constraint, while conventional neural network optimization methods do not directly control the inference latency of the model on latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method that optimizes models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor that streamlines this procedure, so that a model satisfying the latency constraint is learned in a single one-shot training process. The experimental results show that, compared to state-of-the-art methods, our approach fits the ‘hard’ latency constraint well while achieving high accuracy. Under the same training settings as the original model and a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet’s latency from 40.32 ms to 34 ms on the NVIDIA Jetson Nano with only a 0.14% accuracy reduction. When coupled with quantization, the accuracy drop for GoogLeNet is further reduced to only 0.04%. On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms and even improve its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms with a 0.78% accuracy gain. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Elsevier [en_US]
dc.title: Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization [en_US]
dc.title.alternative: Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: publishedVersion [en_US]
dc.source.pagenumber: 314-327 [en_US]
dc.source.volume: 142 [en_US]
dc.source.journal: Future Generation Computer Systems [en_US]
dc.identifier.doi: 10.1016/j.future.2022.12.021
dc.identifier.cristin: 2099780
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 2
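
The abstract above describes the approach only at a high level; the full implementation is available in the linked ZeroBN repository. As a rough, non-authoritative sketch of the idea suggested by the method's name, the Python/PyTorch snippet below zeroes the batch-normalization scale (gamma) and shift (beta) of the least important channels, which makes those channels produce all-zero outputs so they can later be removed. The function zeroize_bn_channels and the keep_ratio knob are hypothetical names introduced here for illustration; in the paper, the amount of pruning would instead be driven by the hardware-customized latency predictor until the target latency (e.g. 34 ms) is met.

# Minimal sketch (an assumption, not the authors' ZeroBN implementation) of
# pruning via zeroed batch normalization: a BatchNorm2d channel with
# weight (gamma) = 0 and bias (beta) = 0 outputs zeros for every input,
# so the corresponding convolution channel can be removed after training.
import torch
import torch.nn as nn


def zeroize_bn_channels(model: nn.Module, keep_ratio: float) -> None:
    """Zero the BatchNorm channels with the smallest |gamma|.

    `keep_ratio` is a hypothetical knob used here for illustration; the actual
    framework would instead choose how much to prune using its latency predictor.
    """
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            gamma = module.weight.data.abs()
            n_total = gamma.numel()
            n_drop = n_total - max(1, int(keep_ratio * n_total))
            if n_drop <= 0:
                continue
            # Indices of the channels with the smallest scale factors.
            _, drop_idx = torch.topk(gamma, n_drop, largest=False)
            module.weight.data[drop_idx] = 0.0
            module.bias.data[drop_idx] = 0.0


# Usage sketch on a small stand-in network; any model containing BatchNorm2d
# layers (e.g. GoogLeNet or a batch-normalized VGG) could be used instead.
net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),
    nn.BatchNorm2d(128),
    nn.ReLU(),
)
zeroize_bn_channels(net, keep_ratio=0.8)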

