Noise in the Sea - Deep Learning for Marine Acoustic Data

Moan, Martin

Master thesis

Permanent link

https://hdl.handle.net/11250/3050167

Publication date

2022

Collections

Institutt for datateknologi og informatikk

Abstract

Increased human activity in the oceans has transformed pre-industrial marine acoustic environments through significant anthropogenic sound pollution from activities such as shipping, the use of seismic survey equipment and pile driving. Many marine animals are acoustic specialists, using sound as their main means of perceiving their environment and communicating with conspecific individuals. Anthropogenic noise has been found to affect marine wildlife in a number of ways, including auditory masking, increased stress, behavioural changes and physiological damage. It is therefore vital to monitor the acoustic environment of our oceans, to ensure that conservation efforts are effective in preserving the health of marine ecosystems. Unmanned glider-based Autonomous Underwater Vehicles (AUVs) provide a promising platform for such monitoring: they are relatively low-cost, can be deployed for long periods while traversing the marine environment, and introduce minimal noise compared to mobile ship-based recording platforms. In this thesis, three deep-learning-based models are applied and evaluated for multi-label classification of glider-based acoustic data, to detect and classify anthropogenic and biological sounds in marine environments. We train the ResNet18 convolutional architecture and the Audio Spectrogram Transformer (AST) using supervised techniques. The Audio Spectrogram Transformer is also pretrained using a self-supervised framework (SSAST) and then fine-tuned for the same multi-label classification task. The supervised and self-supervised models are compared to investigate the effect of both increased model complexity and self-supervised pretraining on performance in the final multi-label classification task.
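
As a rough illustration of what multi-label classification means here, the sketch below encodes the labels of a clip as a multi-hot vector and decodes model outputs with one independent sigmoid per label, so a clip can carry several labels at once. The label names, the 0.5 threshold and the helper functions are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical label set; the abstract only distinguishes
# anthropogenic and biological sounds.
LABELS = ["anthropogenic", "biological"]


def sigmoid(x):
    """Map a raw model output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def encode_multi_hot(present, labels=LABELS):
    """Turn a set of label names into a 0/1 vector, one slot per label."""
    return [1 if name in present else 0 for name in labels]


def predict_labels(logits, threshold=0.5, labels=LABELS):
    """Score each label independently and keep those at or above the threshold."""
    probs = [sigmoid(z) for z in logits]
    return {name for name, p in zip(labels, probs) if p >= threshold}


# A clip containing both a ship and an animal call gets two active labels.
target = encode_multi_hot({"anthropogenic", "biological"})

# Hypothetical logits from some classifier for one clip.
predicted = predict_labels([2.0, -1.0])
```

Unlike a softmax over classes, the per-label sigmoid does not force the labels to compete, which is why a single clip can be assigned zero, one or several labels.
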
The models are evaluated using several metrics common in the audio classification domain, including F1 score, mean Average Precision (mAP) and Area Under the Receiver Operating Characteristic curve (AUROC). The results presented in this thesis show that the supervised Audio Spectrogram Transformer is the most capable of the three models in the classification task, achieving mAP and AUROC of approximately 95.7 and 99.1 respectively. The results also show that, despite its lower complexity compared to the Audio Spectrogram Transformer, the ResNet18 model achieves good results, with mAP and AUROC of approximately 92.8 and 99.2 respectively. The self-supervised Audio Spectrogram Transformer, however, performs substantially worse on the final multi-label classification task, achieving mAP and AUROC of approximately 43.2 and 82.5 respectively. This indicates that the supervised frameworks can achieve better results on the final prediction task than the self-supervised pretrained model, provided enough labeled data is available for training. The results of the self-supervised model indicate that models trained in this framework can still be used for sound event detection and classification of marine acoustic data when labeled data is severely limited or missing, although they are outperformed by the purely supervised models. The results of this thesis also show that transformer-based audio classification models, which have achieved promising results in the terrestrial domain, are also capable of classifying glider-based marine acoustic data.
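
The three metrics named above can be sketched in plain Python as follows. This is a minimal illustration that assumes macro-averaging over labels, untied scores for average precision, and a 0.5 threshold for F1; the toy data is made up, and the thesis may compute its metrics differently. Values here are fractions in [0, 1], whereas the abstract reports them scaled to percentages.

```python
def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, truth) in enumerate(ranked, start=1):
        if truth == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")


def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true, y_score, threshold=0.5):
    """F1 score after turning scores into 0/1 predictions."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_average(metric, Y_true, Y_score):
    """Apply a per-label metric to each column and average the results."""
    values = [metric(t, s) for t, s in zip(zip(*Y_true), zip(*Y_score))]
    return sum(values) / len(values)


# Toy data: 4 clips (rows) x 2 labels (columns), invented for illustration.
Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.5], [0.4, 0.8], [0.6, 0.3], [0.1, 0.1]]

mean_ap = macro_average(average_precision, Y_true, Y_score)
mean_auroc = macro_average(auroc, Y_true, Y_score)
mean_f1 = macro_average(f1_at_threshold, Y_true, Y_score)
```

mAP and AUROC are computed from the ranking of scores and need no decision threshold, while F1 depends on the threshold chosen, which is one reason the three metrics can rank models differently.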

Publisher

NTNU