
dc.contributor.author: Pettersen, Svein Gunnar [nb_NO]
dc.date.accessioned: 2014-12-19T13:42:42Z
dc.date.accessioned: 2015-12-22T11:39:57Z
dc.date.available: 2014-12-19T13:42:42Z
dc.date.available: 2015-12-22T11:39:57Z
dc.date.created: 2008-12-30 [nb_NO]
dc.date.issued: 2009 [nb_NO]
dc.identifier: 132735 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/2368843
dc.description.abstract: It is well known that additive noise can cause a significant decrease in performance for an automatic speech recognition (ASR) system. For an ASR system to maintain an acceptable level of performance in noisy conditions, measures must be taken to make it robust. Since prior information about the noise is usually not available, this information typically has to be obtained from the observed noisy utterance that is to be recognized. Model compensation is one way of achieving robustness towards noise. One of the main problems with model compensation is how to approximate the non-linear relationship between speech, noise, and noisy speech in the log-spectral domain. In an effort to investigate the effects of approximation accuracy, a comparative study of two existing and one new method for approximating this relationship is presented. The study shows that, although the approximation methods differ in accuracy on a one-dimensional example, the recognition results on Aurora2 are almost equal in practice. Due to several factors, the noisy speech parameter estimates obtained when performing model compensation will normally be uncertain, limiting the attainable performance. We propose a new model compensation approach, in which a robust decision rule is combined with traditional parallel model combination (PMC) to compensate for uncertainty. Experiments show that the proposed approach is effective in increasing performance at low signal-to-noise ratios (SNRs) for most noise types compared to PMC. Another way of improving ASR performance in noisy conditions is by applying a feature enhancement algorithm prior to recognition. Many existing feature enhancement techniques rely on probabilistic models of speech and noise. Thus, the performance is influenced by the quality of these models. Traditionally, the probabilistic models have been trained using maximum likelihood estimation.
This dissertation investigates the use of an alternative estimation method for prior speech models, namely Bayesian learning. It is shown that, within the chosen experimental setup, Bayesian learning can be used for model selection, and that the recognition performance is comparable to the performance obtained with maximum likelihood in most cases. A good probabilistic model for the noise can be difficult to obtain, since it usually has to be estimated directly from the utterance at hand. In order to improve the quality of the noise model used by the feature enhancement algorithm, we investigate the use of voice activity detection (VAD) to obtain information about the noise. An advantage of the proposed VAD approach is that it works in the same domain as the speech recognizer. Experiments show that the VAD approach on average obtains a 10.8% error rate reduction compared to simply using a speech-free segment from the beginning of the utterance for noise modeling. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon [nb_NO]
dc.relation.ispartofseries: Doctoral Theses at NTNU, 1503-8181; 2009:14 [nb_NO]
dc.title: Robust Speech Recognition in the Presence of Additive Noise [nb_NO]
dc.type: Doctoral thesis [nb_NO]
dc.source.pagenumber: 154 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon [nb_NO]
dc.description.degree: PhD i elektronteknikk [nb_NO]
dc.description.degree: PhD in Electrical Engineering

