Implementation of a System for Automatic Speech Recognition of Child Speech

Walsøe, Andre Utne

dc.contributor.advisor	Svendsen, Torbjørn
dc.contributor.author	Walsøe, Andre Utne
dc.date.accessioned	2019-09-11T11:07:51Z
dc.date.created	2016-06-14
dc.date.issued	2016
dc.identifier	ntnudaim:14729
dc.identifier.uri	http://hdl.handle.net/11250/2615922
dc.description.abstract	Most speech recognizers today are trained on adult speech corpora. Child speech differs from adult speech in several respects, and these differences severely degrade the performance of an ASR system developed for adult speakers when it is used to recognize child speech. Recording sufficiently large child speech corpora is expensive. NTNU has therefore developed a method that trains a child speech recognizer by transforming a database of adult speech so that it corresponds better to child speech. For this purpose, speaker adaptation techniques such as vocal tract length normalization (VTLN) and speaker adaptive training (SAT) are applied. ChildSR has enormous potential for use in computer tools for speech and language development. While it may not be able to replace teacher-pupil interaction, it will greatly increase the assistance a child receives. It also has the potential to help make computer technology available to new populations that have not been able to use computers to their full extent because of physical disabilities or similar barriers. Interactive entertainment and talking toys are examples of other applications. This work is intended to further develop and replace a system developed by NTNU for automatic speech recognition of child speech. The new system is developed with a focus on cross-platform compatibility, performance and efficiency. The original system was developed by D. R. Sanand at NTNU for the article "Synthetic Speaker Models Using VTLN to Improve the Performance of Children in Mismatched Speaker Conditions for ASR". It was written mainly in Bash and Perl and used the Hidden Markov Model Toolkit (HTK). Because the original system was written for research purposes, it contained many redundant modules. The first step was to strip the old system down to its bare necessities. Following that, to ensure cross-platform portability, the system was rewritten in Python. The Python system achieved WER = 11.67%, the same WER as the original system, which confirms that the Python implementation is correct. The adult speech corpus TIMIT is used for training, and testing is performed with the child speech corpus CMUKids. To increase ChildSR performance for the Python implementation, experiments were carried out to optimize decoding with HVite in the recognition of adapted test data. The choice of Word Insertion Penalty (PEN) and Grammar Scale Factor (SCALE) has a significant impact on recognition performance. Tests were therefore run with 450 different combinations of PEN and SCALE to find the combination that minimized WER. The optimization resulted in WER = 7.03%, i.e., a 39.7% improvement relative to the original system. To analyze the software further, run-time measurements were performed on the scripts to determine the duration of the different processes.	en
dc.language	eng
dc.publisher	NTNU
dc.subject	Electronics, Signal Processing	en
dc.title	Implementation of a System for Automatic Speech Recognition of Child Speech	en
dc.type	Master thesis	en
dc.source.pagenumber	72
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi og elektroteknikk, Institutt for elektroniske systemer	nb_NO
dc.date.embargoenddate	10000-01-01
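
The PEN/SCALE search described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names `grid_search_wer` and `relative_improvement`, the `evaluate_wer` callback, and the toy grid below are all assumptions made for illustration. In a real setup the callback would presumably decode the test set with HVite using the given penalty and scale and then score the output to obtain a WER.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def grid_search_wer(
    penalties: Iterable[float],
    scales: Iterable[float],
    evaluate_wer: Callable[[float, float], float],
) -> Tuple[float, float, float]:
    """Return (pen, scale, wer) for the combination with the lowest WER.

    evaluate_wer is a hypothetical callback that decodes and scores the
    test set for one (penalty, scale) pair and returns the WER in percent.
    """
    best = None
    for pen, scale in product(penalties, scales):
        wer = evaluate_wer(pen, scale)
        # Keep the first combination seen with the strictly lowest WER.
        if best is None or wer < best[2]:
            best = (pen, scale, wer)
    if best is None:
        raise ValueError("empty search grid")
    return best


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy stand-in for a real decode-and-score run (assumption, not real data).
    def toy_wer(pen: float, scale: float) -> float:
        return (pen + 10) ** 2 + (scale - 5) ** 2

    print(grid_search_wer(range(-20, 1, 2), range(1, 11), toy_wer))
    # WER figures quoted in the abstract: 11.67% baseline, 7.03% optimized.
    print(relative_improvement(11.67, 7.03))
```

Passing the scoring step in as a callback keeps the search loop independent of how decoding is invoked, so the same loop works with a subprocess call, a cached result table, or a toy function as above.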

Associated file(s)

File name: 14729_FULLTEXT.pdf
Size: 1.014Mb
Format: PDF

Locked

File name: 14729_COVER.pdf
Size: 1.556Mb
Format: PDF

Locked

File name: 14729_ATTACHMENT.zip
Size: 1.401Gb
Format: application/zip

Locked

This item appears in the following collection(s)

Institutt for elektroniske systemer [2286]
