Spoken Document Classification of Broadcast News

Sandsmark, Håkon

dc.contributor.advisor	Svendsen, Torbjørn	nb_NO
dc.contributor.author	Sandsmark, Håkon	nb_NO
dc.date.accessioned	2014-12-19T13:47:52Z
dc.date.accessioned	2015-12-22T11:47:14Z
dc.date.available	2014-12-19T13:47:52Z
dc.date.available	2015-12-22T11:47:14Z
dc.date.created	2012-11-08	nb_NO
dc.date.issued	2012	nb_NO
dc.identifier	566519	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/2370579
dc.description.abstract	Two systems for spoken document classification are implemented by combining an automatic speech recognizer with two classification algorithms: naive Bayes and logistic regression. The focus is on how to handle the inherent uncertainty in the speech recognizer's output. Features are extracted by computing expected word counts from speech recognition lattices and then removing words that carry little or noisy information about the topic label, as measured by information gain. The systems are evaluated by cross-validation on broadcast news stories, and classification accuracy is measured for different configurations and on recognition output with different word error rates. The results show that relatively high classification accuracy can be obtained at word error rates around 50%, and that the benefit of extracting features from lattices instead of 1-best transcripts grows as the word error rate increases.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for elektronikk og telekommunikasjon	nb_NO
dc.subject	ntnudaim:7911	no_NO
dc.title	Spoken Document Classification of Broadcast News	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	51	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon	nb_NO
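The feature extraction step described in the abstract (expected word counts from lattices, followed by information-gain filtering) can be illustrated with a minimal sketch. It is not the thesis implementation and rests on several assumptions. Each lattice is taken to be a DAG whose nodes are numbered in topological order, with node 0 as the start and the last node as the end. Arcs are hypothetical (source, target, word, score) tuples with non-negative scores. The function names and the 0.5 presence threshold are illustrative. Expected counts are arc posteriors from a forward-backward pass, summed per word. Information gain is computed here over binary word presence, which is one common formulation and may differ from the one used in the thesis.

```python
import math
from collections import Counter, defaultdict

# Hypothetical arc format: (source_node, target_node, word, score).
# Assumes nodes are numbered 0..n-1 in topological order, node 0 is the
# lattice start, and node n-1 is the lattice end.


def expected_word_counts(num_nodes, arcs):
    """Sum arc posteriors per word using a forward-backward pass."""
    alpha = [0.0] * num_nodes  # total score of all paths from start to node
    beta = [0.0] * num_nodes   # total score of all paths from node to end
    alpha[0] = 1.0
    beta[num_nodes - 1] = 1.0
    # Forward pass: visiting arcs by ascending source node ensures alpha[src]
    # is complete before it is propagated along the arc.
    for src, dst, _, score in sorted(arcs, key=lambda a: a[0]):
        alpha[dst] += alpha[src] * score
    # Backward pass: visiting arcs by descending target node ensures beta[dst]
    # is complete before it is propagated back along the arc.
    for src, dst, _, score in sorted(arcs, key=lambda a: a[1], reverse=True):
        beta[src] += score * beta[dst]
    total = alpha[num_nodes - 1]  # summed score of every complete path
    counts = defaultdict(float)
    for src, dst, word, score in arcs:
        # Arc posterior = (paths through this arc) / (all paths).
        counts[word] += alpha[src] * score * beta[dst] / total
    return dict(counts)


def entropy(label_counts):
    """Shannon entropy in bits of a Counter of class labels."""
    total = sum(label_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in label_counts.values() if c > 0)


def information_gain(docs, labels, threshold=0.5):
    """Information gain of binary word presence for every word in docs.

    docs is a list of {word: expected_count} dicts. A word counts as present
    in a document when its expected count reaches the (hypothetical) threshold.
    """
    n = len(docs)
    prior = entropy(Counter(labels))
    vocab = {w for d in docs for w in d}
    gains = {}
    for w in vocab:
        present = Counter(y for d, y in zip(docs, labels)
                          if d.get(w, 0.0) >= threshold)
        absent = Counter(y for d, y in zip(docs, labels)
                         if d.get(w, 0.0) < threshold)
        n_present = sum(present.values())
        n_absent = n - n_present
        conditional = 0.0
        if n_present:
            conditional += n_present / n * entropy(present)
        if n_absent:
            conditional += n_absent / n * entropy(absent)
        gains[w] = prior - conditional
    return gains


if __name__ == "__main__":
    lattice = [(0, 1, "oil", 3.0), (0, 1, "all", 1.0), (1, 2, "prices", 2.0)]
    print(expected_word_counts(3, lattice))
    # → {'oil': 0.75, 'all': 0.25, 'prices': 1.0}
```

With these made-up inputs, a word such as "oil" that appears only in one class's documents receives the maximum gain (1 bit for two balanced classes). A word whose expected count never reaches the threshold receives zero gain and would be a candidate for removal. Keeping the words with the highest gains is one simple way to apply such a filter.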

Files in this item

Name: 566519_FULLTEXT01.pdf
Size: 1.198Mb
Format: PDF

Name: 566519_COVER01.pdf
Size: 184.1Kb
Format: PDF

This item appears in the following Collection(s)

Institutt for elektroniske systemer [2285]
