Speech Analysis for Automatic Speech Recognition

Alcaraz Meseguer, Noelia

dc.contributor.advisor	Svendsen, Torbjørn	nb_NO
dc.contributor.author	Alcaraz Meseguer, Noelia	nb_NO
dc.date.accessioned	2014-12-19T13:43:49Z
dc.date.accessioned	2015-12-22T11:41:31Z
dc.date.available	2014-12-19T13:43:49Z
dc.date.available	2015-12-22T11:41:31Z
dc.date.created	2010-09-03	nb_NO
dc.date.issued	2009	nb_NO
dc.identifier	347957	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/2369233
dc.description.abstract	The classical front-end analysis in speech recognition is a spectral analysis that parametrizes the speech signal into feature vectors, the most popular of which are the Mel Frequency Cepstral Coefficients (MFCC). They are based on a standard power spectrum estimate that is first subjected to a log-based transform of the frequency axis (the mel-frequency scale) and then decorrelated using a modified discrete cosine transform. Following a focused introduction to speech production, perception and analysis, this thesis studies the implementation of a speech generative model, whereby speech is synthesized and recovered from its MFCC representation. The work was carried out in two steps: first, the computation of the MFCC vectors from the source speech files using the HTK software; and second, the implementation of the generative model itself, which is the conversion chain from HTK-generated MFCC vectors back to reconstructed speech. To assess how well the speech is encoded in the feature vectors, and to evaluate the generative model, the spectral distance between the original speech signal and the signal reconstructed from the MFCC vectors was computed using spectral models based on Linear Predictive Coding (LPC) analysis. Results are reported for the reconstruction of the spectral representation and for the quality of the synthesized speech.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for elektronikk og telekommunikasjon	nb_NO
dc.subject	ntnudaim	no_NO
dc.title	Speech Analysis for Automatic Speech Recognition	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	87	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon	nb_NO
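
The MFCC front end summarized in the abstract (power spectrum, mel-scale warping of the frequency axis, logarithm, cosine transform) and the LPC-based spectral comparison can be illustrated with a minimal sketch. It is not the thesis's implementation, which relied on HTK; the function names, the 16 kHz sampling rate, the 512-point DFT, the 26 mel filters, the 13 coefficients, the LPC order of 12 and the choice of an RMS log-spectral distance in dB are all illustrative assumptions. The mel formula 1127 ln(1 + f/700) and the sqrt(2/N)-scaled DCT-II follow the conventions described for HTK, but pre-emphasis, cepstral liftering, energy terms and delta features are omitted, and a plain DFT replaces the FFT for readability. The inverse mapping from MFCC vectors back to speech is not sketched.

```python
import cmath
import math

# Illustrative sketch only: all parameter values are assumptions, not taken from the thesis.


def hz_to_mel(f):
    # Mel scale in the form used by HTK: 1127 * ln(1 + f / 700)
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def hamming(n):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame, n_fft):
    # Hamming-windowed, zero-padded frame; plain O(N^2) DFT for bins 0..n_fft/2
    x = [s * w for s, w in zip(frame, hamming(len(frame)))]
    x += [0.0] * (n_fft - len(x))
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n_fft)
                    for t in range(n_fft))) ** 2
            for k in range(n_fft // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale (0 Hz .. Nyquist)
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    return [[(f - lo) / (mid - lo) if lo < f <= mid
             else (hi - f) / (hi - mid) if mid < f < hi
             else 0.0 for f in bins]
            for lo, mid, hi in zip(edges, edges[1:], edges[2:])]


def mfcc(frame, sample_rate=16000, n_fft=512, n_filters=26, n_ceps=13):
    spec = power_spectrum(frame, n_fft)
    # Log mel filterbank energies; the floor avoids log(0) for empty bands
    log_e = [math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10))
             for filt in mel_filterbank(n_filters, n_fft, sample_rate)]
    # DCT-II decorrelates the log energies: c_i = sqrt(2/N) * sum_j m_j cos(pi*i*(j+0.5)/N)
    scale = math.sqrt(2.0 / n_filters)
    return [scale * sum(m * math.cos(math.pi * i * (j + 0.5) / n_filters)
                        for j, m in enumerate(log_e))
            for i in range(n_ceps)]


def lpc(frame, order):
    # Autocorrelation method + Levinson-Durbin; A(z) = 1 + a1 z^-1 + ... + ap z^-p
    n = len(frame)
    r = [sum(frame[t] * frame[t + k] for t in range(n - k)) for k in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        # The comprehension reads the previous-order coefficients before rebinding a
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a, err  # err is the prediction-error power, used as the squared gain


def lpc_log_spectrum(a, gain2, n_points=128):
    # 10*log10(G^2 / |A(e^{jw})|^2) sampled at n_points frequencies in [0, pi)
    return [10.0 * math.log10(gain2 / abs(sum(
                c * cmath.exp(-1j * math.pi * m / n_points * j)
                for j, c in enumerate(a))) ** 2)
            for m in range(n_points)]


def lpc_spectral_distance(x, y, order=12):
    # RMS difference in dB between the LPC log spectra of two Hamming-windowed frames
    def windowed(s):
        return [v * h for v, h in zip(s, hamming(len(s)))]
    sx = lpc_log_spectrum(*lpc(windowed(x), order))
    sy = lpc_log_spectrum(*lpc(windowed(y), order))
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(sx, sy)) / len(sx))
```

As a sanity check, scaling a frame by a constant shifts every log filterbank energy by the same amount, so only c0 changes; likewise, the two LPC spectra differ by the constant 20 log10 of the scale factor, so doubling a frame should give a distance of about 6.02 dB.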

Associated file(s)

Filename: 347957_ATTACHMENT01.zip
Size: 683.8Kb
Format: Unknown

Filename: 347957_COVER01.pdf
Size: 46.43Kb
Format: PDF

Filename: 347957_FULLTEXT01.pdf
Size: 944.7Kb
Format: PDF

This item appears in the following collection(s)

Institutt for elektroniske systemer [2288]
