Word Discovery from Unsegmented Speech

Aune, Astrid

dc.contributor.advisor	Salvi, Giampiero
dc.contributor.author	Aune, Astrid
dc.date.accessioned	2021-09-15T16:56:36Z
dc.date.available	2021-09-15T16:56:36Z
dc.date.issued	2020
dc.identifier	no.ntnu:inspera:53184405:45146495
dc.identifier.uri	https://hdl.handle.net/11250/2778104
dc.description.abstract	The goal of this thesis is to discover words in unsegmented speech using unsupervised machine learning. We experimented with two latent factor analysis methods: Non-Negative Matrix Factorization (NNMF) and Beta Process Factor Analysis (BPFA). By examining the transitions between subword units (letters or phones) and detecting recurring patterns, both methods can estimate the words present in the utterances. The main difference between the two algorithms is that NNMF needs to know the vocabulary size in advance, whereas BPFA infers it from the data set along with the word estimates. We tested both methods with four data representations based on subword-unit transitions of differing complexity. The results show that both NNMF and BPFA perform well as long as the vocabulary is small enough. For larger vocabularies, the most complex representation outperforms the simpler ones. For small vocabularies, however, the simplest representation, which uses only 1st-order transitions, is often sufficient.
dc.language	eng
dc.publisher	NTNU
dc.title	Word Discovery from Unsegmented Speech
dc.type	Master thesis
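
The idea summarized in the abstract (represent each utterance by its subword-unit transitions, then factorize to recover recurring patterns) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy vocabulary, the helper names `transition_features` and `nmf`, and the choice of Lee-Seung multiplicative updates with a Frobenius loss are all assumptions made for illustration, and only the 1st-order transition representation over letters is shown.

```python
import numpy as np

def transition_features(utterance, alphabet):
    """Count 1st-order transitions (adjacent unit pairs) in one utterance."""
    index = {u: i for i, u in enumerate(alphabet)}
    counts = np.zeros((len(alphabet), len(alphabet)))
    for a, b in zip(utterance, utterance[1:]):
        counts[index[a], index[b]] += 1
    return counts.ravel()  # flatten the pair-count matrix into one feature vector

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Approximate V by W @ H with non-negative factors.

    Uses multiplicative updates for the Frobenius loss (an assumed choice).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: unsegmented letter strings built from three words.
words = ["cat", "dog", "sun"]
utterances = ["catdog", "dogsun", "suncat", "catsun",
              "dogcat", "sundog", "catcat", "dogdog"]
alphabet = sorted(set("".join(utterances)))

# One column per utterance, one row per possible unit-to-unit transition.
V = np.stack([transition_features(u, alphabet) for u in utterances], axis=1)

# NNMF needs the vocabulary size up front; here it is given as the rank.
W, H = nmf(V, rank=len(words))
error = np.linalg.norm(V - W @ H)
```

Under these assumptions, each column of `W` is a non-negative pattern over transitions (a candidate word estimate) and each column of `H` says how strongly those patterns are active in one utterance. The rank has to be supplied by hand, which mirrors the abstract's point that NNMF needs the vocabulary size in advance, whereas BPFA infers it.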

Files in this item

Name: no.ntnu:inspera:53184405:45146 ...
Size: 9.757 MB
Format: PDF

This item appears in the following Collection(s)

Institutt for elektroniske systemer [2286]