
dc.contributor.advisor: Svendsen, Torbjørn [nb_NO]
dc.contributor.author: Sandsmark, Håkon [nb_NO]
dc.date.accessioned: 2014-12-19T13:47:52Z
dc.date.accessioned: 2015-12-22T11:47:14Z
dc.date.available: 2014-12-19T13:47:52Z
dc.date.available: 2015-12-22T11:47:14Z
dc.date.created: 2012-11-08 [nb_NO]
dc.date.issued: 2012 [nb_NO]
dc.identifier: 566519 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/2370579
dc.description.abstract: Two systems for spoken document classification are implemented by combining an automatic speech recognizer with the two classification algorithms naive Bayes and logistic regression. The focus is on how to handle the inherent uncertainty in the output of the speech recognizer. Feature extraction is performed by computing expected word counts from speech recognition lattices, and subsequently removing words that are found to carry little or noisy information about the topic label, as determined by the information gain metric. The systems are evaluated by performing cross-validation on broadcast news stories, and the classification accuracy is measured with different configurations and on recognition output with different word error rates. The results show that a relatively high classification accuracy can be obtained with word error rates around 50%, and that the benefit of extracting features from lattices instead of 1-best transcripts increases with increasing word error rates. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Institutt for elektronikk og telekommunikasjon [nb_NO]
dc.subject: ntnudaim:7911 [no_NO]
dc.title: Spoken Document Classification of Broadcast News [nb_NO]
dc.type: Master thesis [nb_NO]
dc.source.pagenumber: 51 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon [nb_NO]
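
The feature selection step described in the abstract (dropping words that carry little information about the topic label, ranked by information gain) could look roughly like the sketch below. This is an illustration under stated assumptions and not the thesis implementation: the record does not say how information gain is computed from fractional expected counts, so the sketch assumes a word counts as "present" in a document when its expected count is at least a hypothetical `threshold`. The toy documents, the labels, and the names `entropy` and `information_gain` are likewise invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, word, threshold=0.5):
    """Information gain of the event 'expected count of word >= threshold'.

    ASSUMPTION: binarizing expected counts with a threshold is a choice made
    for this sketch; the thesis may handle fractional counts differently.
    """
    present = [y for d, y in zip(docs, labels) if d.get(word, 0.0) >= threshold]
    absent = [y for d, y in zip(docs, labels) if d.get(word, 0.0) < threshold]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Toy expected word counts, as they might come out of recognition lattices
# (values are made up for illustration).
docs = [
    {"goal": 1.8, "the": 3.1},
    {"goal": 0.9, "the": 2.7},
    {"vote": 1.6, "the": 2.9},
    {"vote": 0.7, "the": 3.3},
]
labels = ["sports", "sports", "politics", "politics"]

vocabulary = sorted({w for d in docs for w in d})
scores = {w: information_gain(docs, labels, w) for w in vocabulary}

# Keep only words that tell us something about the topic label.
selected = [w for w in vocabulary if scores[w] > 0.0]
```

In this toy data, "the" occurs in every document and so separates no topics, while "goal" and "vote" each split the two topics cleanly; only the latter two would survive the filter and be passed on to a classifier such as naive Bayes or logistic regression.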

