Feature Extraction for Automatic Speech Recognition in Noisy Acoustic Environments

Gajic, Bojana

dc.contributor.author	Gajic, Bojana	nb_NO
dc.date.accessioned	2014-12-19T13:29:40Z
dc.date.available	2014-12-19T13:29:40Z
dc.date.created	2002-06-21	nb_NO
dc.date.issued	2002	nb_NO
dc.identifier	125221	nb_NO
dc.identifier.isbn	82-471-5457-9	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/249782
dc.description.abstract	This thesis presents a study of alternative speech feature extraction methods aimed at increasing the robustness of automatic speech recognition (ASR) against additive background noise. Spectral peak positions of speech signals remain practically unchanged in the presence of additive background noise. Thus, it was expected that emphasizing spectral peak positions in speech feature extraction would result in improved noise robustness of ASR systems. If frequency subbands are properly chosen, dominant subband frequencies can serve as reasonable estimates of spectral peak positions. Thus, different methods for incorporating dominant subband frequencies into speech feature vectors were investigated in this study. To begin with, two previously proposed feature extraction methods that utilize dominant subband frequency information were examined. The first one uses zero-crossing statistics of the subband signals to estimate dominant subband frequencies, while the second one uses subband spectral centroids. The methods were compared with the standard MFCC feature extraction method on two different recognition tasks in various background conditions. The first method was shown to improve ASR performance on both recognition tasks at sufficiently high noise levels. The improvement was, however, smaller on the more complex recognition task. The second method, on the other hand, led to some reduction in ASR performance in all testing conditions. Next, a new method for incorporating subband spectral centroids into speech feature vectors was proposed, and was shown to be considerably more robust than the standard MFCC method on both ASR tasks. The main difference between the proposed method and the zero-crossing based method is in the way they utilize dominant subband frequency information.
It was shown that the performance improvement due to the use of dominant subband frequency information was considerably larger for the proposed method than for the zero-crossing based (ZCPA) method, especially on the more complex recognition task. Finally, the computational complexity of the proposed method is two orders of magnitude lower than that of the zero-crossing based method, and of the same order of magnitude as that of the standard MFCC method.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Fakultet for informasjonsteknologi, matematikk og elektroteknikk	nb_NO
dc.relation.ispartofseries	Dr. ingeniøravhandling; 2002:60	nb_NO
dc.subject	speech recognition	en_GB
dc.subject	feature extraction	en_GB
dc.subject	noise robustness	en_GB
dc.title	Feature Extraction for Automatic Speech Recognition in Noisy Acoustic Environments	nb_NO
dc.title.alternative	Parameteruttrekning for automatisk talegjenkjenning i støyende omgivelser	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.source.pagenumber	111	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk	nb_NO
dc.description.degree	dr.ing.	nb_NO
dc.description.degree	dr.ing.	en_GB
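
The abstract refers to subband spectral centroids as estimates of dominant subband frequencies. As a rough illustration only, the sketch below computes a power-weighted mean frequency per subband for a single frame. It is not the feature extraction method proposed in the thesis: the rectangular subbands, the Hamming window, the band edges, and the function name `subband_spectral_centroids` are all assumptions made for this example.

```python
import numpy as np


def subband_spectral_centroids(frame, sample_rate, band_edges_hz):
    """Return one power-weighted mean frequency (Hz) per subband.

    Assumes rectangular, non-overlapping subbands given by consecutive
    entries of band_edges_hz (an illustrative choice, not the thesis setup).
    """
    # Power spectrum of the Hamming-windowed frame.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    centroids = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        band_power = power[in_band]
        total = band_power.sum()
        if total > 0:
            # Centroid: frequencies weighted by the power they carry.
            centroids.append(float((freqs[in_band] * band_power).sum() / total))
        else:
            # No energy in the band: fall back to the band midpoint.
            centroids.append(0.5 * (lo + hi))
    return np.array(centroids)


# Usage: a 1 kHz tone sampled at 8 kHz, split into three hypothetical subbands.
sample_rate = 8000
t = np.arange(256) / sample_rate
frame = np.sin(2 * np.pi * 1000.0 * t)
edges = [0, 500, 1500, 4000]
centroids = subband_spectral_centroids(frame, sample_rate, edges)
```

In this toy case the centroid of the middle subband lands close to the tone frequency, which is the sense in which a centroid can stand in for a spectral peak position when a subband contains one dominant peak.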

Associated file(s)

File name: 125221_FULLTEXT01.pdf
Size: 1.128Mb
Format: PDF

This item appears in the following collection(s)

Fakultet for informasjonsteknologi og elektroteknikk (Uspesifisert) [120]
