Robust Speech Recognition in the Presence of Additive Noise

Pettersen, Svein Gunnar

dc.contributor.author	Pettersen, Svein Gunnar	nb_NO
dc.date.accessioned	2014-12-19T13:42:42Z
dc.date.accessioned	2015-12-22T11:39:57Z
dc.date.available	2014-12-19T13:42:42Z
dc.date.available	2015-12-22T11:39:57Z
dc.date.created	2008-12-30	nb_NO
dc.date.issued	2009	nb_NO
dc.identifier	132735	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/2368843
dc.description.abstract	It is well known that additive noise can cause a significant decrease in performance for an automatic speech recognition (ASR) system. For an ASR system to maintain an acceptable level of performance in noisy conditions, measures must be taken to make it robust. Since prior information about the noise is usually not available, this information typically has to be obtained from the observed noisy utterance that is to be recognized. Model compensation is one way of achieving robustness towards noise. One of the main problems with model compensation is how to approximate the non-linear relationship between speech, noise, and noisy speech in the log-spectral domain. In an effort to investigate the effects of approximation accuracy, a comparative study of two existing and one new method for approximating this relationship is presented. The study shows that, although the approximation methods differ in accuracy on a one-dimensional example, the recognition results on Aurora2 are almost equal in practice. Due to several factors, the noisy speech parameter estimates obtained when performing model compensation will normally be uncertain, limiting the attainable performance. We propose a new model compensation approach, in which a robust decision rule is combined with traditional parallel model combination (PMC) to compensate for uncertainty. Experiments show that the proposed approach is effective in increasing performance at low signal-to-noise ratios (SNRs) for most noise types compared to PMC. Another way of improving ASR performance in noisy conditions is by applying a feature enhancement algorithm prior to recognition. Many existing feature enhancement techniques rely on probabilistic models of speech and noise. Thus, the performance is influenced by the quality of these models. Traditionally, the probabilistic models have been trained using maximum likelihood estimation. This dissertation investigates the use of an alternative estimation method for prior speech models, namely Bayesian learning. It is shown that, within the chosen experimental setup, Bayesian learning can be used for model selection, and that the recognition performance is comparable to the performance obtained with maximum likelihood in most cases. A good probabilistic model for the noise can be difficult to obtain, since it usually has to be estimated directly from the utterance at hand. In order to improve the quality of the noise model used by the feature enhancement algorithm, we investigate the use of voice activity detection (VAD) to obtain information about the noise. An advantage of the proposed VAD approach is that it works in the same domain as the speech recognizer. Experiments show that the VAD approach on average obtains a 10.8% error rate reduction compared to simply using a speech-free segment from the beginning of the utterance for noise modeling.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon	nb_NO
dc.relation.ispartofseries	Doctoral Theses at NTNU, 1503-8181; 2009:14	nb_NO
dc.title	Robust Speech Recognition in the Presence of Additive Noise	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.source.pagenumber	154	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon	nb_NO
dc.description.degree	PhD i elektronteknikk	nb_NO
dc.description.degree	PhD in Electrical Engineering

Files in this item

Name: 132735_FULLTEXT01.pdf
Size: 902.9Kb
Format: PDF

This item appears in the following Collection(s)

Institutt for elektroniske systemer
