Uncertainty Estimation in Image Classification as a Step Toward Autonomous Inspection

Lille, Hanna Berggrav

dc.contributor.advisor	Liu, Yi Edward
dc.contributor.advisor	Johansen, Tor Arne
dc.contributor.advisor	Mester, Rudolf
dc.contributor.author	Lille, Hanna Berggrav
dc.date.accessioned	2023-10-03T17:23:14Z
dc.date.available	2023-10-03T17:23:14Z
dc.date.issued	2023
dc.identifier	no.ntnu:inspera:140443607:34407973
dc.identifier.uri	https://hdl.handle.net/11250/3093930
dc.description	Full text not available
dc.description.abstract	DNV's REDHUS project aims to develop autonomous detection and classification of defects inside ship ballast tanks using machine learning algorithms. If detection and classification algorithms are to be trusted, it is crucial to understand how certain their decisions are. This means that observations with low uncertainty can be trusted, while observations with high uncertainty need a second review. Uncertainty in machine learning systems can arise both from the input data and from how well the system fits the data, called aleatoric and epistemic uncertainty respectively. Both types of uncertainty can be estimated. Estimating the latter has traditionally required algorithms with knowledge of the neural network. The aim of this thesis is to estimate both types of uncertainty and to investigate whether they can be estimated using methods that do not require insight into the machine learning system. Two existing methods for estimating uncertainty are implemented, Monte-Carlo (MC) Dropout and Test-Time Augmentation (TTA), which estimate epistemic and aleatoric uncertainty respectively. In addition, a new method called Test-Time (TT) Pixel Variation is proposed and tested as a potential replacement for Monte-Carlo Dropout as an estimator of epistemic uncertainty, especially in cases where the neural network is not available for modification. The methods are applied to different crack classification systems to allow an adequate comparison. The MNIST dataset is used to train a simple classifier, the REDHUS crack dataset is used to train a DeepLabV3 and a U-Net classifier, and a Kaggle crack dataset is used to train a further DeepLabV3 classifier so that the effect of uncertainty estimation on the two crack datasets can be compared.
Some key findings of the thesis are that the uncertainty estimation methods mainly improve the performance of the classification systems, with some exceptions, and that the predictive entropy computed by the methods can be used to interpret and visualize the uncertainty of the crack predictions. Bayesian decision theory is used to determine the limit of acceptable uncertainty, which in turn can be used to discard observations or mark pixels as uncertain. The MNIST classification system discards observations whose uncertainty exceeds the computed entropy limit, because these observations would in practice be reviewed again. The segmentation systems produce uncertainty maps that directly present the uncertainty of each image pixel. Integrating uncertainty estimation into autonomous inspection would make it possible to flag when the classification system is uncertain beyond a given entropy limit and to request a second review when needed. The predictions of the classification systems can then be trusted more when their uncertainty is low, instead of being approved as correct without further checks. The new method TT Pixel Variation is compared with MC Dropout to evaluate whether it could potentially replace MC Dropout, with the conclusion that this varies with the classifier. The methods mostly show similar trends in their entropy distributions, which strengthens the argument for replacement. In most cases MC Dropout improves the performance of the classifiers more than TT Pixel Variation, which is a drawback of replacement. Experiments with alternative versions of TT Pixel Variation yield interesting observations that challenge the mathematical theory behind the method and the type of uncertainty the method actually estimates. This is discussed and leaves an open question for further research.
dc.description.abstract	The DNV REDHUS project aims to deploy autonomous detection and classification of defects inside ship ballast tanks, a task traditionally performed by human surveyors. If machine learning (ML) detection and classification systems are to be trusted to replace manual inspection, it is crucial to understand how certain their decisions are. This entails that observations with low uncertainty are trusted to be true, while observations with high uncertainty need a second review. Uncertainty in ML systems can arise from both the input data and the system's fit to the data, called aleatoric and epistemic uncertainty respectively. Both types of uncertainty can be estimated. Estimating epistemic uncertainty has traditionally required algorithms that need knowledge of the neural network. The aim of this master's thesis is to estimate both types of uncertainty and to investigate whether both can be estimated by test-time methods that do not require altering the ML system. Two existing methods of estimating uncertainty are implemented, Monte-Carlo (MC) Dropout and Test-Time Augmentation (TTA), which estimate epistemic and aleatoric uncertainty respectively. Additionally, the novel method Test-Time (TT) Pixel Variation is proposed and tested to investigate whether it can replace MC Dropout as an estimator of epistemic uncertainty, especially in cases where the neural network cannot be accessed or altered. The methods are applied to several image crack classification systems to test their effect on different classifiers and datasets. The MNIST dataset is used to train a simple classifier, the REDHUS crack dataset is used to train a DeepLabV3 and a U-Net classifier, and a Kaggle crack dataset is used to train another DeepLabV3 classifier in order to compare the effect of uncertainty estimation on the two crack datasets.
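The test-time idea described above can be illustrated with a minimal sketch, which is not the thesis implementation. It assumes a hypothetical stand-in `toy_classifier`, and the flips and small brightness shift in `augment` are illustrative choices only. Test-Time Augmentation is sketched as running the unchanged classifier on several augmented copies of one image and averaging the resulting class probabilities. MC Dropout would instead obtain its samples by keeping dropout active inside the network.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_classifier(image):
    """Hypothetical stand-in for a trained network: a two-class softmax
    driven by the mean intensity of the left half of the image."""
    half = len(image[0]) // 2
    left = [px for row in image for px in row[:half]]
    score = sum(left) / len(left)
    return softmax([score, 1.0 - score])


def augment(image, rng):
    """Randomly flip the image and shift its brightness (illustrative)."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1]  # vertical flip
    shift = rng.uniform(-0.05, 0.05)
    return [[min(1.0, max(0.0, px + shift)) for px in row] for row in out]


def tta_predict(model, image, n_samples=20, seed=0):
    """Run the unchanged model on augmented copies of one image and
    return the mean class probabilities plus the individual samples."""
    rng = random.Random(seed)
    samples = [model(augment(image, rng)) for _ in range(n_samples)]
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / n_samples for c in range(n_classes)]
    return mean, samples


image = [[0.9, 0.8, 0.1, 0.0],
         [0.7, 0.9, 0.2, 0.1]]
mean_probs, samples = tta_predict(toy_classifier, image)
print(mean_probs)
```

The spread of the individual samples around `mean_probs` is what an uncertainty measure would then summarize. Only the model's inputs are varied here, so the sketch needs no access to the network's internals.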
Some key findings of the thesis are that the uncertainty estimation methods mostly improve the classifiers' performance, with some exceptions, and that the predictive entropy computed by the methods can be used to interpret and visualize the uncertainty of crack predictions. Bayesian decision theory is used to determine the limit of acceptable uncertainty, which can then be used to discard observations or mark pixels as uncertain. The MNIST classifier successfully discards samples whose uncertainty exceeds the computed limit, because these observations would, in practice, be subject to a second review. The segmentation classifiers produce uncertainty maps that directly present the uncertainty of each image pixel. Integrating uncertainty estimation into autonomous inspection will flag when the classifier is uncertain beyond a set limit and prompt a second review. The classifier's low-uncertainty predictions can then be trusted with better justification, instead of being blindly approved as correct. The novel method TT Pixel Variation is compared to MC Dropout to evaluate whether it could replace MC Dropout, with the conclusion that this varies with the classifier. The methods mostly show similar trends in their entropy distributions, strengthening the argument for replacement. MC Dropout, in most cases, improves the classifiers' performance more than TT Pixel Variation does, which counts against replacement. Experiments with alternative versions of TT Pixel Variation yield interesting observations that challenge the mathematical theory behind the method and the type of uncertainty the method actually estimates. This is discussed and leaves an open question for further research.
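As a rough illustration of how predictive entropy and an uncertainty limit could drive the second-review decision described above, the following sketch computes the entropy of the mean class probabilities over several sampled predictions. The value `ENTROPY_LIMIT` is a hypothetical placeholder, not the limit derived with Bayesian decision theory in the thesis, and the sample predictions are made up for illustration.

```python
import math


def predictive_entropy(samples):
    """Entropy (in nats) of the mean class probabilities over samples."""
    n = len(samples)
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / n for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def review_decision(samples, entropy_limit):
    """Accept a prediction if its entropy is within the limit,
    otherwise flag it for a second review."""
    h = predictive_entropy(samples)
    label = "second review" if h > entropy_limit else "accept"
    return label, h


# Made-up sampled predictions for two observations (two classes each).
confident = [[0.97, 0.03], [0.95, 0.05], [0.99, 0.01]]
ambiguous = [[0.60, 0.40], [0.35, 0.65], [0.55, 0.45]]

ENTROPY_LIMIT = 0.5  # hypothetical limit, chosen only for this example

print(review_decision(confident, ENTROPY_LIMIT))
print(review_decision(ambiguous, ENTROPY_LIMIT))
```

For a segmentation model, the same entropy could be computed per pixel to build an uncertainty map, with pixels above the limit marked as uncertain.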
dc.language	eng
dc.publisher	NTNU
dc.title	Uncertainty Estimation in Image Classification as a Step Toward Autonomous Inspection
dc.type	Master thesis

This item appears in the following Collection(s)

Institutt for teknisk kybernetikk [3714]