
dc.contributor.author: Martínez del Hoyo Canterla, Alfonso [nb_NO]
dc.date.accessioned: 2014-12-19T13:47:11Z
dc.date.accessioned: 2015-12-22T11:46:11Z
dc.date.available: 2014-12-19T13:47:11Z
dc.date.available: 2015-12-22T11:46:11Z
dc.date.created: 2012-05-28 [nb_NO]
dc.date.issued: 2012 [nb_NO]
dc.identifier: 528775 [nb_NO]
dc.identifier.isbn: 978-82-471-3336-1 (printed ver.) [nb_NO]
dc.identifier.isbn: 978-82-471-3337-8 (electronic ver.)
dc.identifier.uri: http://hdl.handle.net/11250/2370409
dc.description.abstract: This thesis presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful in areas such as detection-based ASR, pronunciation training, phonetic analysis, and word spotting. Firstly, we propose a structure suitable for subword detection. This structure is based on the standard HMM framework, but in each detector the MFCC feature extractor and the models are trained for the specific detection problem. Our experiments on the TIMIT database validate the effectiveness of this structure for detection of phones and articulatory features. Secondly, two discriminative training techniques are proposed for detector training. The first one is a modification of Minimum Classification Error (MCE) training. The second one, Minimum Detection Error (MDE) training, is the adaptation of Minimum Phone Error to the detection problem. Both methods are used to train the HMMs and filterbanks in the detectors, separately or jointly. MDE has the advantage that any detection performance criterion can be optimized directly. F-score and class accuracy optimization experiments show that MDE training is superior to the MCE-based method. The optimized filterbanks reflect some acoustical properties of the detection classes. Moreover, some changes are consistent over classes with similar acoustical properties. In addition, MDE training of filterbanks results in filters significantly different from those in the standard filterbank. In fact, some filters extract information from different critical bands. Finally, we propose a detection-based automatic speech recognition system. Detectors are built with the proposed HMM-based detection structure and trained discriminatively. The linguistic merger is based on an MLP/Viterbi decoder. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: NTNU [nb_NO]
dc.relation.ispartofseries: Doctoral Theses at NTNU, 1503-8181; 2012:36 [nb_NO]
dc.subject: Speech Recognition Detector Filterbank [en_GB]
dc.title: Design of Detectors for Automatic Speech Recognition [nb_NO]
dc.type: Doctoral thesis [nb_NO]
dc.source.pagenumber: 135 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon [nb_NO]
dc.description.degree: PhD i elektronikk og telekommunikasjon [nb_NO]
dc.description.degree: PhD in Electronics and Telecommunication

