Voice Transformation based on Gaussian mixture models

Gundersen, Terje

dc.contributor.advisor	Svendsen, Torbjørn	nb_NO
dc.contributor.author	Gundersen, Terje	nb_NO
dc.date.accessioned	2014-12-19T13:45:45Z
dc.date.accessioned	2015-12-22T11:44:08Z
dc.date.available	2014-12-19T13:45:45Z
dc.date.available	2015-12-22T11:44:08Z
dc.date.created	2010-09-22	nb_NO
dc.date.issued	2010	nb_NO
dc.identifier	352717	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/2369982
dc.description.abstract	In this thesis, a probabilistic model for transforming a voice to sound like another specific voice is tested. The model is fully automatic and requires only about 100 training sentences with the same acoustic content from both speakers. The classical source-filter decomposition allows prosodic and spectral transformation to be performed independently. The transformations are based on a Gaussian mixture model (GMM) and a transformation function suggested by Y. Stylianou. Feature vectors of the same content from the source and target speakers, aligned in time by dynamic time warping, are fitted to a GMM. The short-time spectra, represented as cepstral coefficients derived from LPC, and the pitch periods, represented as fundamental frequency estimated with the RAPT algorithm, are transformed with the same probabilistic transformation function. Several techniques for spectrum and pitch transformation were assessed, in addition to some novel techniques for smoothing the fundamental frequency contour. The pitch transform was applied to the excitation signal obtained from inverse LP filtering, using time-domain PSOLA. The transformed spectrum parameters were used in the synthesis filter, with the transformed excitation as input, to yield the transformed voice. A listening test was performed with the best setup from the objective tests, and the results indicate that it is possible to recognise the transformed voice as the target speaker with a 72 % probability. However, the synthesised voice was affected by a muffling effect due to incorrect frequency transformation, and the prosody sounded somewhat robotic.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for elektronikk og telekommunikasjon	nb_NO
dc.subject	ntnudaim	no_NO
dc.title	Voice Transformation based on Gaussian mixture models	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	55	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjon	nb_NO

Associated file(s)

Filename: 352717_ATTACHMENT01.zip
Size: 1.558 MB
Format: Unknown

Filename: 352717_FULLTEXT01.pdf
Size: 1.246 MB
Format: PDF

Filename: 352717_COVER01.pdf
Size: 48.10 KB
Format: PDF

This item appears in the following collection(s)

Institutt for elektroniske systemer [2288]
