dc.contributor.author | Huai, Shuo | |
dc.contributor.author | Liu, Di | |
dc.contributor.author | Kong, Hao | |
dc.contributor.author | Liu, Weichen | |
dc.contributor.author | Subramaniam, Ravi | |
dc.contributor.author | Makaya, Christian | |
dc.contributor.author | Lin, Qian | |
dc.date.accessioned | 2023-03-07T09:21:54Z | |
dc.date.available | 2023-03-07T09:21:54Z | |
dc.date.created | 2023-01-03T14:15:54Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Future Generation Computer Systems. 2022, 142, 314-327. | en_US |
dc.identifier.issn | 0167-739X | |
dc.identifier.uri | https://hdl.handle.net/11250/3056327 | |
dc.description.abstract | Deep learning applications have been widely adopted on edge devices to mitigate the privacy and latency issues of accessing cloud servers. Choosing the number of neurons when designing a deep neural network to maximize performance is not intuitive. In particular, many application scenarios are real-time and impose a strict latency constraint, while conventional neural network optimization methods do not directly change the temporal cost of model inference for latency-critical edge systems. In this work, we propose a latency-oriented neural network learning method that optimizes models for high accuracy while fulfilling the latency constraint. For efficiency, we also introduce a universal hardware-customized latency predictor that allows this procedure to learn a model satisfying the latency constraint with only a one-shot training process. The experimental results reveal that, compared to state-of-the-art methods, our approach fits the ‘hard’ latency constraint well and achieves high accuracy. Under the same training settings as the original model and a 34 ms latency constraint on the ImageNet-100 dataset, we reduce GoogLeNet’s latency from 40.32 ms to 34 ms with a 0.14% accuracy reduction on the NVIDIA Jetson Nano. When coupled with quantization, our method further reduces the drop to only 0.04% for GoogLeNet. On the NVIDIA Jetson TX2, we compress VGG-19 from 119.98 ms to 34 ms while even improving its accuracy by 0.5%, and we scale GoogLeNet up from 20.27 ms to 34 ms, achieving 0.78% higher accuracy. We also open-source this framework at https://github.com/ntuliuteam/ZeroBN. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Elsevier | en_US |
dc.title | Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization | en_US |
dc.title.alternative | Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization | en_US |
dc.type | Peer reviewed | en_US |
dc.type | Journal article | en_US |
dc.description.version | publishedVersion | en_US |
dc.source.pagenumber | 314-327 | en_US |
dc.source.volume | 142 | en_US |
dc.source.journal | Future Generation Computer Systems | en_US |
dc.identifier.doi | 10.1016/j.future.2022.12.021 | |
dc.identifier.cristin | 2099780 | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 2 | |
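The abstract describes pruning guided by zeroized batch-normalization scale factors under a latency budget. As a rough illustration only (this is a hypothetical sketch, not the authors' ZeroBN implementation; the function name, the per-channel latency figure, and the linear latency model are all assumptions), one can zero the smallest-magnitude BN gammas, whose channels then contribute nothing to the layer output, until an estimated latency fits the constraint:

```python
# Hypothetical sketch of latency-constrained channel selection via
# batch-normalization scale factors. NOT the ZeroBN implementation:
# the linear latency model and all parameter names are assumptions.

def zeroize_channels(gammas, latency_per_channel, latency_budget):
    """Zero the smallest-|gamma| channels until an (assumed linear)
    latency estimate fits the budget; return the keep-mask."""
    # Visit channels from least to most important (by |gamma|).
    order = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    keep = [True] * len(gammas)
    latency = len(gammas) * latency_per_channel
    for i in order:
        if latency <= latency_budget:
            break
        keep[i] = False                    # zeroize this channel's BN scale
        latency -= latency_per_channel     # assumed linear latency model
    return keep

# Example: 8 channels at an assumed 5 ms each, 25 ms budget ->
# the three smallest-|gamma| channels are zeroized, 5 are kept.
gammas = [0.9, 0.01, 0.5, 0.002, 0.7, 0.03, 0.8, 0.6]
mask = zeroize_channels(gammas, latency_per_channel=5.0, latency_budget=25.0)
```

In the paper's actual framework, the latency estimate comes from a hardware-customized predictor rather than a per-channel constant, and the zeroized scales are learned during a one-shot training process; see the linked repository for the real implementation.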