People Detection using Transfer learning on Deep Convolutional Neural Networks

Benjamin Ramberg Møklegård

dc.contributor.advisor	Aunet, Snorre
dc.contributor.author	Benjamin Ramberg Møklegård
dc.date.accessioned	2021-09-15T16:56:34Z
dc.date.available	2021-09-15T16:56:34Z
dc.date.issued	2020
dc.identifier	no.ntnu:inspera:54579301:34126264
dc.identifier.uri	https://hdl.handle.net/11250/2778102
dc.description.abstract	Convolutional neural networks have been established as the most effective method for applying deep learning to computer vision. This thesis aims to explore how the concept of transfer learning can be used to increase the detection performance of pretrained neural networks, and then to analyze the trained models by deploying them on resource-constrained hardware. The task the trained networks are applied to is person detection, that is, detecting and estimating how many people are present in an image. The networks used in the thesis are trained on a subset of a larger dataset created by Google, called Open Image Database. In the thesis, 20000 images belonging to the class "Person" were used to fine-tune the pretrained models. The models explored in this thesis are Mobilenet V2 + SSD (Non-quantized and Quantized) for Tensorflow, and YOLOv3 and YOLOv3-Tiny for Darknet. Using transfer learning, one can observe an improvement in the models' mean Average Precision (mAP) and Average Recall (AR). For Mobilenet V2 + SSD, mAP increases from 0.49 to 0.62. Mobilenet V2 + SSD Quantized sees an increase from 0.004 to 0.61. YOLOv3 sees a small reduction from 0.66 to 0.65, and YOLOv3-Tiny increases from 0.25 to 0.51. The models have further been tested on the Google Coral Dev Board, which has a built-in accelerator for neural networks. All models have been tested on the development board's CPU, and Mobilenet V2 + SSD (Quantized) has also been tested on the TPU. The tests showed that Mobilenet V2 + SSD runs at 1.35 frames per second (FPS). The quantized model runs faster, at 3.61 FPS on the CPU and 131.82 FPS on the TPU. YOLOv3 and YOLOv3-Tiny run at 0.02 and 0.23 FPS, respectively. Furthermore, the energy per operation for Quantized Mobilenet V2 + SSD was estimated at 8pJ/FLOPS when running on the TPU and 266pJ/FLOPS for inference on the CPU. In contrast, the ordinary Mobilenet model uses 474pJ/FLOPS (CPU).
YOLOv3's energy per operation was estimated at 1210pJ/FLOPS (CPU), and YOLOv3-Tiny uses approximately 1397pJ/FLOPS (CPU). In terms of detection, all models except YOLOv3-Tiny manage to detect the people in an example image containing four people; YOLOv3-Tiny detects only three. The conclusion of the thesis is that transfer learning can help by increasing the models' detection accuracy on custom datasets. The recommendation is furthermore to use the quantized model. It can run on the Coral TPU, which is shown to be the most energy-efficient way of running the models, and this matters if the model is to run on resource-constrained hardware such as the Google Coral Dev Board.
dc.description.abstract	Convolutional neural networks have been established as one of the most efficient ways of applying machine learning to computer vision. The purpose of this thesis has been to investigate the concept of transfer learning and how it can be used to retrain pretrained neural network models to increase detection accuracy. The fine-tuned networks in this thesis have been trained for the task of "People Detection", that is, detecting and estimating how many people are present in an image. A subset of images from Open Image Database, a database curated by Google, has been used to train the custom detectors. In this thesis, a set of 20000 images is used for the training phase and 4000 images for the test phase. The images belong to the object class "Person". The neural networks explored in this thesis are Mobilenet V2 + SSD (Non-quantized and Quantized), YOLOv3 and YOLOv3-Tiny. Applying transfer learning increases the mean Average Precision (mAP) and Average Recall (AR) scores for most of the models. mAP for Mobilenet V2 + SSD increases from 0.49 to 0.62. Mobilenet V2 + SSD Quantized increases from 0.004 to 0.61. YOLOv3 suffers a slight performance reduction, with mAP dropping from 0.66 to 0.65. YOLOv3-Tiny sees an increase from 0.25 to 0.51. The models have undergone further testing by being deployed on the Google Coral Dev Board, which features an accelerator. Every model has been tested on the Dev Board CPU, while the Quantized version of the Mobilenet V2 + SSD model has also been tested on the TPU accelerator. Results from the testing show that the Mobilenet V2 + SSD (Non-quantized) model runs at 1.35 frames per second (FPS). The quantized model performs better, at 3.61 FPS on the CPU and 131.82 FPS on the TPU. YOLOv3 and YOLOv3-Tiny perform poorly, with 0.02 and 0.23 FPS, respectively.
The energy consumption per operation has been estimated to give a better overview of the energy efficiency of each model. Since the models are to be deployed in a people-detection system that is likely to run on battery power, the energy consumed by each network is vital in determining which one should be deployed to ensure the longevity of such a system. In this thesis it was found that the Quantized Mobilenet V2 + SSD consumes approximately 8pJ/FLOPS when running on the TPU, increasing to 266pJ/FLOPS when running on the CPU. The non-quantized Mobilenet model consumes 474pJ/FLOPS (CPU). YOLOv3 consumes 1210pJ/FLOPS (CPU), and YOLOv3-Tiny uses 1397pJ/FLOPS (CPU). In terms of detection, the models have been applied to an example image containing four people. Both Mobilenet models and YOLOv3 manage to detect all four people, while YOLOv3-Tiny detects only three. The conclusion reached in this thesis is that transfer learning can help boost a pretrained model's performance and fine-tune the models for custom tasks, such as "People Detection". The recommendation based on the results of deploying the neural networks is to use the Quantized Mobilenet V2 + SSD model. This model is shown to be the most energy-efficient when deployed on the Edge TPU, which is vital when deploying such a system on resource-constrained devices such as the Google Coral Dev Board.
dc.language	eng
dc.publisher	NTNU
dc.title	People Detection using Transfer learning on Deep Convolutional Neural Networks
dc.type	Master thesis
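
The figures reported in the abstract can be compared side by side with a few lines of arithmetic. The following is a minimal sketch in Python, not code from the thesis: the numbers are copied from the abstract, while the dictionary layout and the variable names (map_before_after, fps, energy_pj, and so on) are assumptions made purely for illustration.

```python
# Figures copied from the abstract; names and layout are illustrative assumptions.

# mAP before and after transfer learning, per model.
map_before_after = {
    "Mobilenet V2 + SSD": (0.49, 0.62),
    "Mobilenet V2 + SSD Quantized": (0.004, 0.61),
    "YOLOv3": (0.66, 0.65),
    "YOLOv3-Tiny": (0.25, 0.51),
}

# Frames per second measured on the Google Coral Dev Board.
fps = {
    "Mobilenet V2 + SSD (CPU)": 1.35,
    "Mobilenet V2 + SSD Quantized (CPU)": 3.61,
    "Mobilenet V2 + SSD Quantized (TPU)": 131.82,
    "YOLOv3 (CPU)": 0.02,
    "YOLOv3-Tiny (CPU)": 0.23,
}

# Estimated energy per operation, in the abstract's unit (pJ/FLOPS).
energy_pj = {
    "Mobilenet V2 + SSD (CPU)": 474,
    "Mobilenet V2 + SSD Quantized (CPU)": 266,
    "Mobilenet V2 + SSD Quantized (TPU)": 8,
    "YOLOv3 (CPU)": 1210,
    "YOLOv3-Tiny (CPU)": 1397,
}

# Absolute change in mAP after fine-tuning (negative means a reduction).
map_gains = {
    name: round(after - before, 3)
    for name, (before, after) in map_before_after.items()
}

# How much faster the quantized model runs on the TPU than on the CPU.
tpu_speedup = (
    fps["Mobilenet V2 + SSD Quantized (TPU)"]
    / fps["Mobilenet V2 + SSD Quantized (CPU)"]
)

# How much less energy per operation the TPU uses than the CPU for that model.
energy_ratio = (
    energy_pj["Mobilenet V2 + SSD Quantized (CPU)"]
    / energy_pj["Mobilenet V2 + SSD Quantized (TPU)"]
)

# Configuration with the lowest estimated energy per operation.
most_efficient = min(energy_pj, key=energy_pj.get)

if __name__ == "__main__":
    for name, gain in map_gains.items():
        print(f"{name}: mAP change {gain:+.3f}")
    print(f"TPU speedup over CPU (quantized): {tpu_speedup:.1f}x")
    print(f"CPU/TPU energy ratio (quantized): {energy_ratio:.2f}x")
    print(f"Lowest energy per operation: {most_efficient}")
```

Read this way, the numbers are consistent with the abstract's recommendation: the quantized Mobilenet V2 + SSD model on the TPU has both the highest reported frame rate and the lowest estimated energy per operation.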

Associated file(s)

Filename: no.ntnu:inspera:54579301:34126 ...
Size: 16.18Mb
Format: PDF


Filename: no.ntnu:inspera:54579301:34126 ...
Size: 463.6Mb
Format: application/zip


This item appears in the following collection(s)

Institutt for elektroniske systemer [2289]
