Monocular Action Classification

Birkeland, Vetle Gustav

dc.contributor.advisor	Bach, Kerstin
dc.contributor.author	Birkeland, Vetle Gustav
dc.date.accessioned	2022-09-28T17:42:28Z
dc.date.available	2022-09-28T17:42:28Z
dc.date.issued	2022
dc.identifier	no.ntnu:inspera:102935593:33599723
dc.identifier.uri	https://hdl.handle.net/11250/3022381
dc.description.abstract	Exergames are games that use elements such as rewards or challenges from video games to get people to exercise. Exergames have shown promise in physical rehabilitation by helping patients do their exercises without the constant presence of a physician. In physical rehabilitation, it is important that exercises are executed correctly in order to achieve a faster and better recovery. An exergame that uses a camera to monitor patients undergoing treatment and give them real-time feedback would allow more flexibility for both patient and physician, in terms of both time commitment and location. A vital part of such an exergame is the ability to recognize and classify human actions from video. In this thesis, three models are created that classify movements: two XGBoost models, with and without feature selection, and a CNN model. A data set of videos containing examples of physical therapy exercises is used to train the models. The video data is first converted into a data set of joint positions using the freely available, state-of-the-art pose estimator OpenPose. The resulting data set is used to train the three models. Experimental results show that all models manage to classify the different exercises correctly. Both XGBoost models score higher and are more consistent than the CNN-based model, and the XGBoost model with feature selection trains significantly faster than the other two. All models show similar difficulties separating exercises that closely resemble one another.
dc.language	eng
dc.publisher	NTNU
dc.title	Monocular Action Classification
dc.type	Master thesis
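
The abstract describes converting video into joint positions with OpenPose before training the classifiers. The following is a minimal sketch of that conversion step, not the thesis's actual code. It assumes per-frame OpenPose JSON output containing a "people" list with a flat "pose_keypoints_2d" array of (x, y, confidence) triples. The function names, the confidence cutoff, and the variance-based column filter are hypothetical; the filter is only a stand-in, since the record does not say which feature selection method the thesis uses.

```python
import json


def keypoints_to_features(frame_json, min_confidence=0.1):
    """Flatten the first detected person's keypoints into [x0, y0, x1, y1, ...].

    Assumes OpenPose-style JSON for one frame. Joints whose confidence is
    below min_confidence (a hypothetical cutoff) are zeroed out.
    Returns None when no person was detected in the frame.
    """
    frame = json.loads(frame_json)
    people = frame.get("people", [])
    if not people:
        return None
    flat = people[0]["pose_keypoints_2d"]
    features = []
    for i in range(0, len(flat), 3):
        x, y, confidence = flat[i:i + 3]
        if confidence < min_confidence:
            x, y = 0.0, 0.0
        features.extend([x, y])
    return features


def select_features(rows, min_variance=1e-6):
    """Return the indices of columns whose variance exceeds min_variance.

    A simple stand-in for feature selection: columns that barely change
    across frames carry little information for a classifier.
    """
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        if variance > min_variance:
            keep.append(j)
    return keep
```

The resulting feature rows, restricted to the selected columns, could then be passed to a gradient-boosted or convolutional classifier of the kind the abstract names.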

Associated file(s)

File name:: no.ntnu:inspera:102935593:3359 ...
Size:: 14.68 MB
Format:: PDF


This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6558]
