Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Ali Humayun, Mohammad; Hameed, Ibrahim A.; Muslim Shah, Syed; Hassan Khan, Sohaib; Zafar, Irfan; Bin Ahmed, Saad; Shuja, Junaid

dc.contributor.author	Ali Humayun, Mohammad
dc.contributor.author	Hameed, Ibrahim A.
dc.contributor.author	Muslim Shah, Syed
dc.contributor.author	Hassan Khan, Sohaib
dc.contributor.author	Zafar, Irfan
dc.contributor.author	Bin Ahmed, Saad
dc.contributor.author	Shuja, Junaid
dc.date.accessioned	2019-07-01T09:47:12Z
dc.date.available	2019-07-01T09:47:12Z
dc.date.created	2019-06-30T10:30:18Z
dc.date.issued	2019
dc.identifier.citation	Applied Sciences. 2019, 9 (9).	nb_NO
dc.identifier.issn	2076-3417
dc.identifier.uri	http://hdl.handle.net/11250/2602970
dc.description.abstract	Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, neural-network-based supervised models. These supervised models need large amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages like Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembles are averaging techniques over multiple neural network models, while Maxout units are neural network units that adapt their activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are transformed into a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE). The transformed data, along with the higher-dimensional features, is used to train neural networks. The proposed model also uses label propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The decrease in WER after incorporating SSL is more significant with an increased validation data size.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	MDPI	nb_NO
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	publishedVersion	nb_NO
dc.source.pagenumber	15	nb_NO
dc.source.volume	9	nb_NO
dc.source.journal	Applied Sciences	nb_NO
dc.source.issue	9	nb_NO
dc.identifier.doi	https://doi.org/10.3390/app9091956
dc.identifier.cristin	1708849
dc.description.localcode	© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).	nb_NO
cristin.unitcode	194,63,55,0
cristin.unitname	Institutt for IKT og realfag
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1
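
The abstract above reports its result as a Word Error Rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help when reading the reported figure. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; they are not taken from the paper, and the abstract does not say whether the 4% reduction is absolute or relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative helper only: tokenizes on whitespace, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, comparing the reference `"a b c d"` with the hypothesis `"a x c"` involves one substitution and one deletion over four reference words.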

Associated file(s)

Filename: applsci-09-01956-v2.pdf
Size: 3.041Mb
Format: PDF
Description: Ali Humayun

This item appears in the following collection(s)

Institutt for IKT og realfag [602]
Publikasjoner fra CRIStin - NTNU [38672]

Unless otherwise noted, this item is licensed under Attribution 4.0 International