dc.contributor.author: Hamar, Jarle Bauck [nb_NO]
dc.date.accessioned: 2014-12-19T13:48:25Z
dc.date.accessioned: 2015-12-22T11:48:01Z
dc.date.available: 2014-12-19T13:48:25Z
dc.date.available: 2015-12-22T11:48:01Z
dc.date.created: 2013-06-21 [nb_NO]
dc.date.issued: 2013 [nb_NO]
dc.identifier: 631569 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/2370720
dc.description.abstract: A common way to construct a large vocabulary continuous speech recogniser (LVCSR) is to use three-state hidden Markov models (HMMs) to model phonemic units. This dissertation focuses on improving this standard phone model. To this end, three alternative phone recognition systems are proposed. Central to the first two systems is a set of Acoustic SubWord Units (ASWUs), which are used to train phone models with an extended state topology. This extended topology contains several parallel paths and allows the model to vary the number of states employed for each realisation of a phone. In the first system the topology is fixed, with four parallel paths containing one, two, three and four states, respectively. A novel training algorithm is developed in order to train each of the states properly. In the second system the number of paths and the number of states in each path are derived in a data-driven manner using an algorithm for pronunciation variation modelling (PVM). This algorithm is applied to the set of ASWUs to find variations for each phone, and these variations are used to decide the topologies. The final system is a hybrid system that employs non-negative matrix factorisation (NMF), an algorithm capable of extracting latent units in a data-driven manner, to model the acoustic observations. This hybrid was previously proposed in the literature for modelling audio mixtures. In this dissertation, modifications to the original hybrid, the non-negative HMM (N-HMM), are suggested so that it can be used for the speech recognition task. The main contribution is to make the output probability distribution functions depend on state duration. This modified structure is referred to as the non-negative durational HMM (NdHMM). [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon [nb_NO]
dc.relation.ispartofseries: Doktoravhandlinger ved NTNU, 1503-8181; 2013:185 [nb_NO]
dc.title: Using Sub-Phonemic Units for HMM Based Phone Recognition [nb_NO]
dc.type: Doctoral thesis [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon [nb_NO]
dc.description.degree: PhD i elektronikk og telekommunikasjon [nb_NO]
dc.description.degree: PhD in Electronics and Telecommunication
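
The fixed topology described in the abstract (four parallel paths with one to four states) can be illustrated with a small sketch. This is only an illustration under stated assumptions and not the thesis's implementation: the non-emitting entry and exit states, the uniform path-entry probabilities, the strict left-to-right structure within each path, and the `self_loop` value are all hypothetical choices made here, as is the function name `parallel_path_topology`.

```python
def parallel_path_topology(path_lengths=(1, 2, 3, 4), self_loop=0.5):
    """Build a transition matrix for a phone model with parallel paths.

    Assumptions (not taken from the thesis): state 0 is a non-emitting
    entry state, the last state is a non-emitting exit state, every path
    is strictly left-to-right, and all paths are equally likely a priori.
    """
    n = sum(path_lengths) + 2            # emitting states + entry + exit
    entry, exit_state = 0, n - 1
    trans = [[0.0] * n for _ in range(n)]
    paths = []
    next_state = 1
    for length in path_lengths:
        # Allocate consecutive state indices to this path.
        states = list(range(next_state, next_state + length))
        next_state += length
        paths.append(states)
        # The entry state branches uniformly into the first state of each path.
        trans[entry][states[0]] = 1.0 / len(path_lengths)
        for i, s in enumerate(states):
            # Each emitting state either repeats or moves one step forward;
            # the last state of a path moves to the shared exit state.
            trans[s][s] = self_loop
            target = states[i + 1] if i + 1 < length else exit_state
            trans[s][target] = 1.0 - self_loop
    trans[exit_state][exit_state] = 1.0  # absorbing exit, keeps rows stochastic
    return trans, paths


trans, paths = parallel_path_topology()
print(paths)
```

Because the paths share only the entry and exit states, a short realisation of a phone can pass through the one-state path while a longer one can use up to four states, which is the flexibility the abstract attributes to the extended topology.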

