dc.contributor.author | Soufifar, Mehdi | nb_NO
dc.date.accessioned | 2014-12-19T13:50:21Z
dc.date.accessioned | 2015-12-22T11:51:29Z
dc.date.available | 2014-12-19T13:50:21Z
dc.date.available | 2015-12-22T11:51:29Z
dc.date.created | 2014-11-21 | nb_NO
dc.date.issued | 2014 | nb_NO
dc.identifier | 765169 | nb_NO
dc.identifier.isbn | 978-82-326-0496-8 (printed ver.) | nb_NO
dc.identifier.isbn | 978-82-326-0497-5 (electronic ver.)
dc.identifier.issn | ISSN 1503-8181
dc.identifier.uri | http://hdl.handle.net/11250/2371169
dc.description.abstract | This thesis addresses the language recognition problem, with a special focus on phonotactic language recognition. A full description of the different steps in a language recognition system is provided. We study state-of-the-art speech modeling techniques in language recognition, comprising phonotactic, acoustic and prosodic language modeling. A brief overview of the state-of-the-art subspace modeling technique known as the iVector model for continuous features is given. Using recent proposals on training the iVector model for continuous features, we explain our recipe for extracting iVectors for acoustic and prosodic features, which yields language recognition performance similar to the state-of-the-art results reported in the recent literature. In the next step, inspired by the intuition behind the iVector model for continuous features, we propose our iVector model for discrete features. After a general explanation of the model, we describe its adaptation to the n-gram model that is used to extract iVectors representing the language phonotactics. Finally, a regularized iVector extraction model for discrete features that is robust to model overfitting is proposed. The full theoretical derivation of the proposed iVector model for discrete features is given. We also explain the use of discriminative and generative classifiers for training language models based on the different extracted iVectors. The effects of iVector normalization on the binary and multi-class formulations of these classifiers are also studied. We report the performance of our iVector model on the NIST language recognition evaluations LRE2009 and LRE2011 and on RATS language recognition, the most recent and challenging language recognition task. Using our phonotactic iVector model, we obtain a significant improvement over our phonotactic baseline system, which was a state-of-the-art system at the time this thesis was started. Our results on NIST LRE2009, NIST LRE2011 and RATS confirm the advantage of our iVector model for discrete features over the other state-of-the-art phonotactic system. | nb_NO
dc.language | eng | nb_NO
dc.publisher | NTNU | nb_NO
dc.relation.ispartofseries | Doctoral theses at NTNU, 2014:292 | nb_NO
dc.title | Subspace Modeling of Discrete Features for Language Recognition | nb_NO
dc.type | Doctoral thesis | nb_NO
dc.contributor.department | Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon | nb_NO
dc.description.degree | PhD i elektronikk og telekommunikasjon | nb_NO
dc.description.degree | PhD in Electronics and Telecommunication

