Meal type classification using a neural network model trained on chewing audio data

Klavins, Davis Guntars

dc.contributor.advisor	Hjelme, Dag Roar
dc.contributor.advisor	Ijaz, Salman Siddiqui
dc.contributor.author	Klavins, Davis Guntars
dc.date.accessioned	2022-09-28T17:42:39Z
dc.date.available	2022-09-28T17:42:39Z
dc.date.issued	2022
dc.identifier	no.ntnu:inspera:104140281:37563565
dc.identifier.uri	https://hdl.handle.net/11250/3022385
dc.description.abstract	Diabetes is one of the most widespread diseases in the world. Millions of patients depend on invasive and inconvenient treatments to maintain optimal daily function. Research into alternative treatments has led to the concept of the "artificial pancreas", a fully autonomous system that replaces the faulty insulin production of the patient's pancreas. One of the requirements for realizing such a system is autonomous dietary monitoring. Regulating insulin requires early information about meal onset, and accurate dosing depends on information about the nutritional content of the food. As one of several ongoing research projects at Artificial Pancreas Trondheim, this study covers the investigation and implementation of a meal type classification system. In this study, the systems are based on neural network models trained on data derived from audio recordings. Using 20 audio recordings that capture the chewing sounds of salad and oat porridge meals, a dataset is produced based on power spectral density (PSD) and Mel spectrograms. The system in the first approach uses a fully connected neural network model, trained on the PSD data to classify the two meal types. Although this system manages to identify the correct meal region in the recordings, it does not classify the meal type reliably enough. The second approach implements 3 systems that use a convolutional neural network (CNN), trained on Mel spectrogram segments of individual chewing events. Two of the systems, the chew detector and the Mel meal type classifier, are combined to provide meal type classification. These two systems classify the correct meal type for all 20 test recordings and achieve an average prediction ratio of 90.5%. The third system, which combines the two preceding ones, performs in the same way and achieves a prediction ratio of 90.3%. 
Further testing of the dataset configurations indicates that the optimal segment length for the dataset is around 350 to 450 ms. Although the test conditions were controlled and classification was limited to only two meal types, this study demonstrates the feasibility of meal type classification and provides a working implementation with good performance.
dc.description.abstract	Diabetes is one of the most prevalent conditions in the world. Millions of patients depend on invasive and inconvenient treatments to maintain daily function. Research into alternative treatments has led to the concept of the artificial pancreas, a fully autonomous system that replaces the faulty insulin production of the patient's pancreas. One of the numerous considerations in realizing such a system is autonomous dietary monitoring. Appropriate injection of insulin requires early information about meal onset, and accurate dosage depends on information about the nutritional content of the food. This study is one of the numerous ongoing research projects of the Artificial Pancreas Trondheim research group, and it investigates approaches to meal type classification. The meal type classification systems developed in this study are based on neural network models trained on data extracted from audio recordings. A dataset of 20 audio recordings, capturing the chewing sounds during consumption of salad and oat meals, is used to extract power spectral density (PSD) and Mel spectrogram features. The initial approach uses a fully connected neural network model, trained on the PSD features, to classify the two meal types. Although this approach correctly identifies the meal region in the recordings, it fails to reliably identify the meal type. The second approach implements 3 systems using a convolutional neural network, trained on Mel spectrogram chewing segments. Two of the systems, the chew detector and the Mel meal type classifier, are combined to provide meal type classification. This approach correctly classifies the meal type for all 20 testing recordings and achieves an average prediction ratio of 90.5%. The third system, which combines the previous two, performs similarly, achieving a prediction ratio of 90.3%. 
Additional testing of the dataset configurations indicates that the optimal segment length for the dataset is around 350 to 450 ms. Although the testing conditions were controlled and the recordings were confined to only two meal types, this study demonstrates the feasibility of meal type classification and provides a working implementation with good performance.
dc.language	eng
dc.publisher	NTNU
dc.title	Meal type classification using a neural network model trained on chewing audio data
dc.type	Master thesis
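
The abstract describes combining per-chew predictions into one meal type per recording and reports a "prediction ratio", but it does not define that term. The sketch below is a minimal illustration under two assumptions that are not taken from the thesis: that a recording's meal type is chosen by majority vote over its chew segments, and that the prediction ratio is the share of segments predicted as the true meal type. The function names and the example labels are hypothetical.

```python
from collections import Counter


def classify_recording(segment_labels):
    """Majority vote over per-segment labels for one recording.

    Returns (winning_label, vote_share), where vote_share is the
    fraction of segments that voted for the winning label.
    """
    if not segment_labels:
        raise ValueError("need at least one segment prediction")
    counts = Counter(segment_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(segment_labels)


def prediction_ratio(segment_labels, true_label):
    """Assumed definition: share of segments predicted as the true meal type."""
    if not segment_labels:
        raise ValueError("need at least one segment prediction")
    hits = sum(1 for label in segment_labels if label == true_label)
    return hits / len(segment_labels)


# Hypothetical per-segment outputs for one salad recording (not thesis data).
example = ["salad"] * 9 + ["oat"]
```

With these assumptions, a recording is classified correctly whenever more than half of its segments carry the true label, so the per-recording result can be correct even when the ratio is well below 100%.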

Associated file(s)

Filename:: no.ntnu:inspera:104140281:3756 ...
Size:: 11.88Mb
Format:: PDF

Filename:: no.ntnu:inspera:104140281:3756 ...
Size:: 222.1Mb
Format:: application/zip

This item appears in the following collection(s)

Institutt for elektroniske systemer [2286]