
Subspace Modeling of Discrete Features for Language Recognition

Soufifar, Mehdi
Doctoral thesis
765169_FULLTEXT01.pdf (1.141 MB)
URI
http://hdl.handle.net/11250/2371169
Date
2014
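Collections
  • Institutt for elektroniske systemer (Department of Electronic Systems), Fakultet for informasjonsteknologi og elektroteknikk (IE) (Faculty of Information Technology and Electrical Engineering) [2215]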
Abstract
This thesis addresses the language recognition problem, with a special focus on phonotactic language recognition. A full description of the different steps in a language recognition system is provided. We study state-of-the-art speech modeling techniques for language recognition, comprising phonotactic, acoustic and prosodic language modeling. A brief overview of the state-of-the-art subspace modeling technique for continuous features, known as the iVector model, is given. Building on recent proposals for training the iVector model for continuous features, we explain our recipe for extracting iVectors from acoustic and prosodic features; this recipe yields language recognition performance comparable to the state-of-the-art results reported in the recent literature.

In the next step, inspired by the intuition behind the iVector model for continuous features, we propose an iVector model for discrete features. After a general explanation of the model, we describe its adaptation to the n-gram model, which is used to extract iVectors representing the language phonotactics. Finally, we propose a regularized iVector extraction model for discrete features that is robust to overfitting. The full theoretical derivation of the proposed iVector model for discrete features is given. We also explain the use of discriminative and generative classifiers for training language models on the different extracted iVectors, and we study the effects of iVector normalization on the binary and multi-class formulations of these classifiers.
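The abstract does not spell out the iVector model for discrete features, so the following is only a hypothetical sketch under a stated assumption: a subspace multinomial formulation in which an utterance's event counts γ_c (for example phone n-gram counts) follow probabilities φ_c(w) = exp(m_c + t_c·w) / Σ_j exp(m_j + t_j·w). Here the bias vector `m`, the subspace matrix `T` (with rows t_c), the weight `lam` and the function names are illustrative placeholders, not the thesis's parameters or code. Regularization is assumed to be an L2 penalty, so the extracted iVector is the maximizer of Σ_c γ_c log φ_c(w) − (λ/2)‖w‖². Training `T`, the n-gram-specific adaptation and the exact regularizer used in the thesis are not shown.

```python
# Hypothetical sketch: iVector extraction for discrete (count) features under an
# ASSUMED subspace multinomial model with an L2 penalty on w. m, T and lam are
# placeholders assumed to be already trained/chosen; this is not the thesis's code.
import math
from typing import List, Sequence

Vector = List[float]


def _scores(w: Sequence[float], m: Sequence[float], T: Sequence[Sequence[float]]) -> Vector:
    """Unnormalized log-probabilities z_c = m_c + t_c . w for every discrete event."""
    return [m_c + sum(t * w_r for t, w_r in zip(T_c, w)) for m_c, T_c in zip(m, T)]


def _log_softmax(z: Sequence[float]) -> Vector:
    """Numerically stable log-softmax."""
    z_max = max(z)
    log_norm = z_max + math.log(sum(math.exp(v - z_max) for v in z))
    return [v - log_norm for v in z]


def objective(w: Sequence[float], counts: Sequence[float], m: Sequence[float],
              T: Sequence[Sequence[float]], lam: float) -> float:
    """Penalized log-likelihood: sum_c gamma_c * log phi_c(w) - lam/2 * ||w||^2."""
    log_phi = _log_softmax(_scores(w, m, T))
    ll = sum(g * lp for g, lp in zip(counts, log_phi))
    return ll - 0.5 * lam * sum(x * x for x in w)


def gradient(w: Sequence[float], counts: Sequence[float], m: Sequence[float],
             T: Sequence[Sequence[float]], lam: float) -> Vector:
    """Gradient T^T (gamma - N * phi(w)) - lam * w, where N is the total count."""
    log_phi = _log_softmax(_scores(w, m, T))
    total = sum(counts)
    resid = [g - total * math.exp(lp) for g, lp in zip(counts, log_phi)]
    return [
        sum(T[c][r] * resid[c] for c in range(len(resid))) - lam * w[r]
        for r in range(len(w))
    ]


def extract_ivector(counts: Sequence[float], m: Sequence[float],
                    T: Sequence[Sequence[float]], lam: float = 1.0,
                    max_iter: int = 2000, tol: float = 1e-10) -> Vector:
    """Point estimate of w by gradient ascent with Armijo backtracking.

    For lam > 0 the objective is strictly concave, so the ascent converges to
    its unique maximizer.
    """
    w = [0.0] * len(T[0])
    f = objective(w, counts, m, T, lam)
    for _ in range(max_iter):
        g = gradient(w, counts, m, T, lam)
        g_sq = sum(x * x for x in g)
        if g_sq < tol:
            break  # (near-)stationary point reached
        step = 1.0
        while step > 1e-12:
            cand = [w_r + step * g_r for w_r, g_r in zip(w, g)]
            f_cand = objective(cand, counts, m, T, lam)
            if f_cand >= f + 0.5 * step * g_sq:  # sufficient-increase condition
                break
            step *= 0.5
        else:
            break  # no acceptable step found; treat as converged
        w, f = cand, f_cand
    return w
```

For example, `extract_ivector([8, 1, 1], [0.0, 0.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])` returns a two-dimensional vector whose first coordinate is positive and second negative, because the first event is over-represented and the second under-represented relative to the uniform prior given by `m`. Increasing `lam` shrinks the extracted vector toward zero, which is one way a regularizer of this kind counters overfitting on utterances with few counts.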

We report the performance of our iVector model on the NIST Language Recognition Evaluations LRE2009 and LRE2011, and on the RATS language recognition task, the most recent and challenging of these tasks. Using our phonotactic iVector model, we obtain a significant improvement over our phonotactic baseline system, which was a state-of-the-art system when this thesis was started. Our results on NIST LRE2009, NIST LRE2011 and RATS confirm the advantage of our iVector model for discrete features over other state-of-the-art phonotactic systems.
Publisher
NTNU
Series
Doctoral theses at NTNU, 2014:292

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit