Computer Vision and Deep Learning on Mobile Devices
Deep learning has advanced computer-based image classification across a range of fields. These algorithms are computationally demanding and require a minimum level of computational power. The performance of mobile hardware and integrated mobile cameras has increased significantly in the last decade, making mobile devices an interesting platform for deploying deep learning algorithms for computer vision. This thesis explored three objectives: the viability of running deep learning based image recognition and object detection locally on a mobile device; a comparison of locally performed recognition with different cloud-based image recognition services; and a specific use case of creating image classifiers and object detectors for a small, imbalanced dataset containing images of eight different shell fossils from Svalbard. A mobile application was developed to test the image recognition and object detection approaches explored in the thesis. The application was built with the cross-platform development framework React Native. For the image classifiers, locally trained models were tested that used transfer learning from pre-trained MobileNet and Inception v3 models; for the object detectors, Single Shot MultiBox Detector, You Only Look Once, and Region-based Convolutional Neural Network were explored. Three cloud services were evaluated: Clarifai, Google Vision, and Microsoft Azure Vision. The local and cloud-based modules were found to span a wide range of inference latencies. The inference latency of the local image classifiers was under 1 second in all cases, with some MobileNet versions implemented in TensorFlow, and SqueezeNet implemented in Caffe2, running in under 0.2 seconds. The implemented object detectors ranged from under 0.4 seconds to over 5 seconds, while the cloud services had latencies between 3.1 and 9 seconds.
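Inference latency of the kind reported here can be estimated by timing repeated calls to a model or service. The following is a minimal sketch, not the measurement code used in the thesis: `measure_latency` and `fake_classifier` are hypothetical names, and `fake_classifier` merely stands in for a real local inference call or cloud request.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(predict: Callable[[], object],
                    runs: int = 10,
                    warmup: int = 2) -> Dict[str, float]:
    """Time repeated calls to `predict` and summarise latency in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls are not timed, so one-off costs such as model
    # loading or cache filling do not skew the reported numbers.
    for _ in range(warmup):
        predict()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_classifier() -> str:
    # Hypothetical stand-in for a real inference call (assumption).
    time.sleep(0.01)
    return "fossil"


if __name__ == "__main__":
    print(measure_latency(fake_classifier, runs=5))
```

Reporting the median alongside the minimum and maximum keeps a single slow call, such as a network hiccup during a cloud request, from dominating the result.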
Computational power and memory usage were both found to be relatively low. The trained fossil models achieved acceptable accuracy and precision scores. The best image classifier reached 74.3% accuracy when retrained from a MobileNet v1 model on a dataset created by cropping the fossils out of the original images and augmenting the result with rotation. On the same dataset, a trained object detector achieved a mean average precision of 70.28%. On a more realistic object detection dataset, in which the fossils did not occupy nearly the entire image, the score was 63.61%. Performing deep learning based image recognition and object detection locally on mobile devices showed great potential. Whether to use cloud-based solutions or to perform the computation locally depends heavily on the specific use case, as each approach has advantages in different respects. Support for deep learning and computer vision components in cross-platform development frameworks is still lacking, but the infrastructure is in place.
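Because the fossil dataset is small and imbalanced, an overall accuracy figure can hide weak performance on rare classes. The sketch below shows one common way to check this by computing per-class recall next to overall accuracy. It is illustrative only: the labels are made up, the function names are hypothetical, and it is not the evaluation code used in the thesis.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: Sequence[Hashable],
                     y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """For each true class, the fraction of its samples predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Illustrative labels only: class "a" is common, class "b" is rare.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]

print(accuracy(y_true, y_pred))          # 0.9
print(per_class_recall(y_true, y_pred))  # {'a': 1.0, 'b': 0.5}
```

In this toy case the overall accuracy is 90%, yet only half of the rare class is recognised, which is the kind of gap a single accuracy number does not reveal.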