• norsk
    • English
  • English 
    • norsk
    • English
  • Login
View Item 
  •   Home
  • Fakultet for informasjonsteknologi og elektroteknikk (IE)
  • Institutt for elektroniske systemer
  • View Item
  •   Home
  • Fakultet for informasjonsteknologi og elektroteknikk (IE)
  • Institutt for elektroniske systemer
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Voice Transformation based on Gaussian mixture models

Gundersen, Terje
Master thesis
Thumbnail
View/Open
352717_ATTACHMENT01.zip (1.558Mb)
352717_FULLTEXT01.pdf (1.246Mb)
352717_COVER01.pdf (48.10Kb)
URI
http://hdl.handle.net/11250/2369982
Date
2010
Metadata
Show full item record
Collections
  • Institutt for elektroniske systemer [1575]
Abstract
In this thesis, a probabilistic model for transforming a voice to sound like another specific voice is tested. The model is fully automatic and only requires some 100 training sentences from both speakers with the same acoustic content. The classical source-filter decomposition allows prosodic and spectral transformation to be performed independently. The transformations are based on a Gaussian mixture model and a transformation function suggested by Y. Stylianou. Feature vectors of the same content from the source and target speaker, aligned in time by dynamic time warping, are fitted to a GMM. The short time spectra, represented as cepstral coefficients and derived from LPC, and the pitch periods, represented as fundamental frequency estimated from the RAPT algorithm, are transformed with the same probabilistic transformation function. Several techniques of spectrum and pitch transformation were assessed in addition to some novel smoothing techniques of the fundamental frequency contour. The pitch transform was implemented on the excitation signal from the inverse LP filtering by time domain PSOLA. The transformed spectrum parameters were used in the synthesis filter with the transformed excitation as input to yield the transformed voice. A listening test was performed with the best setup from objective tests and the results indicate that it is possible to recognise the transformed voice as the target speaker with a 72 % probability. However, the synthesised voice was affected by a muffling effect due to incorrect frequency transformation and the prosody sounded somewhat robotic.
Publisher
Institutt for elektronikk og telekommunikasjon

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit
 

 

Browse

ArchiveCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsDocument TypesJournalsThis CollectionBy Issue DateAuthorsTitlesSubjectsDocument TypesJournals

My Account

Login

Statistics

View Usage Statistics

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit