Sensor Fusion of Ultrasound and Motion Data for Gesture Recognition on Smartphones

Advances in smartphone computing power in recent years have made it feasible to run computationally heavy systems with strict demands for fast processing on these devices. Machine learning is evolving rapidly and continues to influence a growing number of fields. This thesis assesses how these methods work and how they can be applied to human activity recognition problems. The current ultrasound-based proximity classifier is assessed and used as a benchmark for the other models. Drawing on the expressive power of neural networks, the thesis attempts to combine ultrasound and inertial data for classification.

Different approaches to classifying the gesture of bringing the phone up to the ear are presented, analyzed, tested, and discussed. These include a sliding window approach with a multilayer perceptron network and a recurrent neural network applied to the sequences. Important characteristics used to compare different machine learning models are presented. The process of recognizing gestures from inertial sensor data is explained and tested. Using inertial data alone, the neural network showed promising results and was subsequently used in combination with ultrasound. A heuristic based on known transition patterns between output classes is proposed, and the reasons why it did not improve the model's performance are discussed.
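
As an illustration of the sliding window approach mentioned above, the following is a minimal sketch of how a stream of inertial samples might be cut into fixed-length, flattened feature vectors suitable as input to a multilayer perceptron. The function name `sliding_windows`, the window size, the step, and the synthetic three-axis stream are all assumptions made for illustration; they are not taken from the thesis, which may segment and preprocess its data differently.

```python
from typing import List, Sequence


def sliding_windows(
    samples: Sequence[Sequence[float]], window_size: int, step: int
) -> List[List[float]]:
    """Cut a time series of sensor samples into flattened, fixed-length windows.

    Each sample is one time step with one value per sensor channel.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    for start in range(0, len(samples) - window_size + 1, step):
        window = samples[start:start + window_size]
        # Flatten (window_size x channels) into a single feature vector,
        # the shape a dense input layer expects.
        windows.append([value for sample in window for value in sample])
    return windows


# Hypothetical 3-axis accelerometer stream with 6 time steps (synthetic values).
stream = [[float(t), float(t) + 0.1, float(t) + 0.2] for t in range(6)]
features = sliding_windows(stream, window_size=4, step=1)
```

With these assumed settings, each window covers 4 consecutive time steps of 3 channels, so every feature vector has 12 values, and consecutive windows overlap by 3 time steps. A recurrent network would instead consume the unflattened sequence one time step at a time.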

State-of-the-art methods for optimizing hyperparameters in neural networks are presented, and tools such as Hyperas are introduced to apply these methods to parameters such as dropout rate, activation function, and layer size. The possibility of training models with the Keras framework and deploying them to Android devices is introduced. The performance of the models is evaluated empirically while they run in an app, through experiments on different gesture scenarios. The final sensor fusion model was also tested by multiple users to check its user-independent performance. The model's ability to distinguish between the approach and retract motions was quantified.

Publisher

NTNU