ANN for classification of Jack snipe

Stava, Maja Sofie

dc.contributor.advisor	Dutilleux, Guillaume
dc.contributor.author	Stava, Maja Sofie
dc.date.accessioned	2021-09-15T16:59:29Z
dc.date.available	2021-09-15T16:59:29Z
dc.date.issued	2020
dc.identifier	no.ntnu:inspera:53184405:23698298
dc.identifier.uri	https://hdl.handle.net/11250/2778130
dc.description.abstract	Machine learning applied to biodiversity conservation is a developing field, as it offers non-intrusive monitoring of events in nature. For the purpose of this thesis, we are interested in classifying jack snipe in recordings several hours long. Binary CNN and LSTM models were built and trained on a labeled dataset of 400 clips, each lasting 4 seconds. Of these, 130 clips were of jack snipe and 270 were of non-jack snipe. The models are tested on an unlabeled audio recording of 1 h 49 min that contains 22 jack snipe vocalizations. The recording contains clear vocalizations with little disturbance as well as vocalizations polluted by ambient sounds such as wind and river noise. The best result was obtained with a CNN model with input shape (90, 126, 1) and a bandpass filter with frequency range 400-2000 Hz applied to the audio clips. This gave a true positive rate (TPR) of 0.88 and a false positive rate (FPR) of 0.086. The total length of the clips predicted positive is 13 min and 4 s. When tests were run on full recordings of 9 and 15 hours, the predictions proved less stable, depending on how much wind the recordings contained.
dc.description.abstract	Machine learning used in biodiversity conservation is an expanding field of study, as it provides non-intrusive monitoring of events in the wild. For the purpose of this thesis, the task is to classify jack snipe in recordings several hours long. Binary CNN and LSTM models were built and trained on an annotated dataset of 400 clips. Each clip has a duration of 4 seconds; 130 of the clips contain jack snipe vocalizations, and 270 contain non-jack snipe sounds such as wind, grouse, and crow. The models are then tested on a sound recording of 1 h and 49 min containing 22 jack snipe vocalizations. The recording includes clean vocalizations and vocalizations polluted with environmental sounds such as wind and river noise. The best result came from the CNN model with an input shape of (90, 126, 1), with DTW and a bandpass filter with frequency range 400-2000 Hz applied to the clips to be predicted. This gave a true positive rate (TPR) of 0.88 and a false positive rate (FPR) of 0.086. The total length of the 196 clips predicted positive is 13 min and 4 s. When testing on full recordings of 9 and 15 hours, the predictions proved less stable, depending on how windy the recording was.
dc.language	eng
dc.publisher	NTNU
dc.title	ANN for classification of Jack snipe
dc.type	Master thesis
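
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the thesis: the function names are hypothetical, the clips are assumed to be non-overlapping, and the TPR and FPR helpers use the standard textbook definitions, which may differ from how the thesis counts hits.

```python
CLIP_SECONDS = 4        # clip duration stated in the abstract
POSITIVE_CLIPS = 196    # clips predicted positive, per the abstract


def total_duration(n_clips, clip_seconds=CLIP_SECONDS):
    """Return (minutes, seconds) covered by n_clips clips.

    Assumes the clips do not overlap (an assumption, not stated in the source).
    """
    return divmod(n_clips * clip_seconds, 60)


def true_positive_rate(tp, fn):
    """Standard definition: TPR = TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Standard definition: FPR = FP / (FP + TN)."""
    return fp / (fp + tn)


# The training set sizes in the abstract add up: 130 + 270 = 400 clips.
assert 130 + 270 == 400

minutes, seconds = total_duration(POSITIVE_CLIPS)
print(f"{POSITIVE_CLIPS} clips x {CLIP_SECONDS} s = {minutes} min {seconds} s")
```

Under the non-overlap assumption, 196 clips of 4 seconds each give 784 seconds, which matches the 13 min and 4 s reported in the abstract.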

Files in this item

Name: no.ntnu:inspera:53184405:23698 ...
Size: 17.81Mb
Format: PDF

Name: no.ntnu:inspera:53184405:23698 ...
Size: 9.766Kb
Format: application/zip

This item appears in the following Collection(s)

Institutt for elektroniske systemer [2338]
