A Framework for Speech Recognition using Logistic Regression

Birkenes, Øystein

dc.contributor.advisor	Johnsen, Magne Hallstein	nb_NO
dc.contributor.advisor	Myrvoll, Tor Andre	nb_NO
dc.contributor.author	Birkenes, Øystein	nb_NO
dc.date.accessioned	2014-12-19T13:29:27Z
dc.date.available	2014-12-19T13:29:27Z
dc.date.created	2007-08-06	nb_NO
dc.date.issued	2007	nb_NO
dc.identifier	122563	nb_NO
dc.identifier.isbn	978-82-471-3621-8	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/249711
dc.description.abstract	Although discriminative approaches like the support vector machine or logistic regression have had great success in many pattern recognition applications, they have achieved only limited success in speech recognition. Two difficulties are often encountered: 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones. In this thesis, we present a framework for automatic speech recognition using logistic regression. We handle variable-length speech signals by including in the logistic regression framework a mapping that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly, with a set of hidden Markov models (HMMs), for use in penalized logistic regression (PLR), or implicitly, through a sequence kernel, for use with kernel logistic regression (KLR). Unlike previous work that has combined HMMs with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion. Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results. A two-step approach is used for the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list.
A garbage class is introduced in the logistic regression framework to obtain reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Fakultet for informasjonsteknologi, matematikk og elektroteknikk	nb_NO
dc.relation.ispartofseries	Doktoravhandlinger ved NTNU, 1503-8181; 2007:165	nb_NO
dc.subject	Automatic speech recognition	en_GB
dc.subject	TECHNOLOGY: Information technology: Signal processing	en_GB
dc.title	A Framework for Speech Recognition using Logistic Regression	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.source.pagenumber	114	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk	nb_NO
dc.description.degree	PhD i elektronteknikk	nb_NO
dc.description.degree	PhD in Electrical Engineering	en_GB
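
The abstract states that the sequence kernel is motivated by the DTW distance but sums the contributions from all alignment paths instead of keeping only the optimal one. The sketch below contrasts the two recursions. It is an illustration under assumptions of our own, not the kernel as defined in the thesis: the Gaussian local similarity, the bandwidth `sigma`, the step pattern (horizontal, vertical and diagonal moves) and the function names `local_kernel`, `dtw_min_distance` and `all_paths_kernel` are all hypothetical.

```python
import math


def local_kernel(x, y, sigma=1.0):
    """Gaussian similarity between two feature vectors (assumed choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def dtw_min_distance(X, Y):
    """Classic DTW: accumulated distance of the single best alignment path."""
    n, m = len(X), len(Y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(X[i - 1], Y[j - 1])
            # Keep only the cheapest predecessor (min over paths).
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def all_paths_kernel(X, Y, sigma=1.0):
    """Sum over all monotone alignment paths of the product of local similarities.

    Same grid and step pattern as dtw_min_distance, but min() is replaced
    by a sum and distances by similarities (hypothetical sketch).
    """
    n, m = len(X), len(Y)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = local_kernel(X[i - 1], Y[j - 1], sigma)
            # Accumulate every predecessor instead of choosing the best one.
            M[i][j] = k * (M[i - 1][j] + M[i][j - 1] + M[i - 1][j - 1])
    return M[n][m]


if __name__ == "__main__":
    # With identical frames every local similarity is 1, so the kernel
    # simply counts the alignment paths through a 3 x 3 grid.
    print(all_paths_kernel([[0.0]] * 3, [[0.0]] * 3))  # 13.0
    X = [[0.1, 0.2], [0.4, 0.3], [0.9, 1.0]]
    Y = [[0.0, 0.1], [1.0, 0.9]]
    print(dtw_min_distance(X, Y), all_paths_kernel(X, Y))
```

Summing over paths makes the score depend on every alignment rather than on a single hard decision. For realistic utterance lengths, the products along long paths underflow quickly, so a practical version would carry out the recursion in the log domain.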

Associated file(s)

Filename: 122563_FULLTEXT01.pdf
Size: 1.844 MB
Format: PDF

This item appears in the following collection(s)

Fakultet for informasjonsteknologi og elektroteknikk (Unspecified) [120]