Inductive Bias And The Information Bottleneck Method

Landsverk, Marius Mario Cervera

dc.contributor.advisor	Riemer-Sørensen, Signe
dc.contributor.advisor	Lie, Knut-Andreas
dc.contributor.author	Landsverk, Marius Mario Cervera
dc.date.accessioned	2021-10-22T17:20:36Z
dc.date.available	2021-10-22T17:20:36Z
dc.date.issued	2021
dc.identifier	no.ntnu:inspera:75366163:20915981
dc.identifier.uri	https://hdl.handle.net/11250/2825098
dc.description.abstract	Inductive bias refers to the various architectural choices made when designing deep learning models. In particular, it concerns the assumptions made about the input data, which in turn influence the choice of architecture. Examples of neural architectures are convolutional neural networks for image data, graph-convolutional neural networks for graph data, and recurrent neural networks for sequential data. The information bottleneck method seeks to quantify an optimal balance between compression and precision when describing a random variable $X$. For neural networks, one considers the successive representations $Z^i, Z^{i+1}, \dots$ as functions of the input data $X$, and can thus compute the mutual information $I(X,Z^i)$, or compute $I(Y,Z^i)$ for the mutual information between the representation $Z^i$ and the target variable $Y$. The main idea is that the deeper one goes into the network, the less information the representations $Z^i$ carry about the input data $X$ and the more they carry about the target variable $Y$. This can be interpreted as the network being able to remove unnecessary information in the input variable $X$ and to generalize by keeping only information that is relevant for predicting $Y$. By using the information bottleneck method, we wish to shed light on the training procedure and learning capability of different neural architectures. Previous work has mainly considered synthetic datasets and neural structures that are not used in practical applications. In this work, we use the information bottleneck method to compare three different neural architectures with their fully-connected alternatives, together with comparisons of their performance. We begin by comparing a graph-convolutional neural network with a fully-connected network trained on the Cora dataset. 
We then compare a recurrent neural network with a fully-connected network on a dataset containing names from different languages, where the task is to classify each name into the correct language. Finally, we compare a convolutional neural network with a fully-connected network on the MNIST dataset.
dc.description.abstract	Inductive bias refers to the architectural choices made when designing a deep learning model so that it learns more easily from a particular kind of data. In particular, one makes assumptions about the structure of the data and designs a suitable model accordingly. Examples of such architectures are convolutional neural networks for image data, graph-convolutional neural networks for graph data, and recurrent neural networks for sequential data. The information bottleneck method was introduced to quantify the optimal trade-off between compression and accuracy when summarizing a random variable $X$. Applied to neural networks, one considers each successive representation $Z^i, Z^{i+1}, \dots$ as a function of the input $X$, and can thus compute the mutual information $I(X, Z^i)$ between the representation and the input, or the mutual information $I(Y, Z^i)$ between the representation and the output. The main idea is that as the input is processed deeper in the network, the representations lose information about the input $X$ and gain information about the output $Y$. This means that the network is able to generalize away unnecessary variance in $X$ and retain only the parts relevant for predicting $Y$. Using the information bottleneck, we seek to elucidate the training procedure and learning capabilities of different neural architectures. Previous works have mainly applied the method to synthetic datasets and to architectures not commonly found in practical applications. In this work, we compare three different neural network architectures with their fully-connected counterparts, in terms of both performance and information bottleneck behavior. First, we compare a graph neural network and a fully-connected network trained on the Cora citation dataset. 
Then we compare a recurrent neural network and a fully-connected network on a dataset consisting of names from different languages, where the task is to identify the language of each name. Finally, we compare a convolutional neural network with a fully-connected network on the MNIST dataset.
dc.language	eng
dc.publisher	NTNU
dc.title	Inductive Bias And The Information Bottleneck Method
dc.type	Master thesis

Associated file(s)

Filename: no.ntnu:inspera:75366163:20915 ...
Size: 14.54 MB
Format: PDF

This item appears in the following collection(s)

Institutt for matematiske fag
