Talegjenkjenning av barnestemmer (Speech recognition of children's voices)

Thorsrud, Ole Petter

dc.contributor.advisor	Svendsen, Torbjørn
dc.contributor.author	Thorsrud, Ole Petter
dc.date.accessioned	2017-11-17T15:02:16Z
dc.date.available	2017-11-17T15:02:16Z
dc.date.created	2017-06-08
dc.date.issued	2017
dc.identifier	ntnudaim:16844
dc.identifier.uri	http://hdl.handle.net/11250/2467027
dc.description.abstract	Today's speech recognition systems are based on adult speech corpora. When applied to children's speech, such systems do not perform as well as they do on adult speech, because the characteristics of adult and child speech differ widely. Few databases of children's speech are available, and producing them would be expensive and time-consuming. This master's thesis therefore builds a speech recognition system for children based on existing adult speech corpora. In the spring of 2016, a speech recognition system for children was created at NTNU as part of a master's thesis. That system was implemented with the Hidden Markov Model Toolkit (HTK) and used training techniques such as Vocal Tract Length Normalization (VTLN) and Speaker Adaptive Training (SAT). It performed well on children's speech corpora, achieving a word error rate (WER) of 11.7%. HTK is out of date, and the goal of this master's thesis is to replace it with a newer toolkit, Kaldi. Kaldi is built differently from HTK, so the recognition system created in this work is independent of the earlier HTK implementation. Similar training methods (VTLN and SAT) are used, and a language model and grammar file are created for recognition. The two systems are evaluated with the same methods, and the system implemented in this work achieves a word error rate of 36.1% when VTLN and SAT are used in both training and recognition. The difference in word error rate between the two systems is 24.4 percentage points, which is too high. The expected result was a word error rate about the same as the 11.7% of the HTK system, or better. A possible source of error is the generated language model and its grammar file.
There are many ways to create a language model, and the word weights may have been generated incorrectly, resulting in a poor word error rate. In further testing of a speech recognition system for children in Kaldi, it would be wise to replace the TIMIT corpus with another adult database. TIMIT is best suited to phonetic training and contains many complex sentences. A replacement adult database should support word-level training and recognition, which would simplify the system.
dc.language	nob
dc.publisher	NTNU
dc.subject	Electronics (2-year), Signal Processing
dc.title	Talegjenkjenning av barnestemmer
dc.type	Master thesis

Associated file(s)

File name: 16844_FULLTEXT.pdf
Size: 2.338 MB
Format: PDF

File name: 16844_ATTACHMENT.zip
Size: 3.209 MB
Format: application/zip

File name: 16844_COVER.pdf
Size: 179.3 KB
Format: PDF

This item appears in the following collection(s)

Department of Electronic Systems
