Vis enkel innførsel

dc.contributor.advisorSvendsen, Torbjørnnb_NO
dc.contributor.authorGundersen, Terjenb_NO
dc.date.accessioned2014-12-19T13:45:45Z
dc.date.accessioned2015-12-22T11:44:08Z
dc.date.available2014-12-19T13:45:45Z
dc.date.available2015-12-22T11:44:08Z
dc.date.created2010-09-22nb_NO
dc.date.issued2010nb_NO
dc.identifier352717nb_NO
dc.identifier.urihttp://hdl.handle.net/11250/2369982
dc.description.abstractIn this thesis, a probabilistic model for transforming a voice to sound like another specific voice is tested. The model is fully automatic and only requires some 100 training sentences from both speakers with the same acoustic content. The classical source-filter decomposition allows prosodic and spectral transformation to be performed independently. The transformations are based on a Gaussian mixture model and a transformation function suggested by Y. Stylianou. Feature vectors of the same content from the source and target speaker, aligned in time by dynamic time warping, are fitted to a GMM. The short time spectra, represented as cepstral coefficients and derived from LPC, and the pitch periods, represented as fundamental frequency estimated from the RAPT algorithm, are transformed with the same probabilistic transformation function. Several techniques of spectrum and pitch transformation were assessed in addition to some novel smoothing techniques of the fundamental frequency contour. The pitch transform was implemented on the excitation signal from the inverse LP filtering by time domain PSOLA. The transformed spectrum parameters were used in the synthesis filter with the transformed excitation as input to yield the transformed voice. A listening test was performed with the best setup from objective tests and the results indicate that it is possible to recognise the transformed voice as the target speaker with a 72 % probability. However, the synthesised voice was affected by a muffling effect due to incorrect frequency transformation and the prosody sounded somewhat robotic.nb_NO
dc.languageengnb_NO
dc.publisherInstitutt for elektronikk og telekommunikasjonnb_NO
dc.subjectntnudaimno_NO
dc.titleVoice Transformation based on Gaussian mixture modelsnb_NO
dc.typeMaster thesisnb_NO
dc.source.pagenumber55nb_NO
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for elektronikk og telekommunikasjonnb_NO


Tilhørende fil(er)

Thumbnail
Thumbnail
Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel