
dc.contributor.advisor: Svendsen, Torbjørn
dc.contributor.author: Haug, Jon Magnus Momrak
dc.date.accessioned: 2017-10-23T14:02:46Z
dc.date.available: 2017-10-23T14:02:46Z
dc.date.created: 2017-07-06
dc.date.issued: 2017
dc.identifier: ntnudaim:17261
dc.identifier.uri: http://hdl.handle.net/11250/2461496
dc.description.abstract: This thesis aims to implement a voice conversion system that transforms one person's voice into another person's voice. Mel-frequency cepstral coefficients are used as features for one set of tests, while the STRAIGHT spectrogram is tried as a second feature set. The system maps features from the source speaker to the target speaker with an artificial neural network. Training begins with around 300 sentences from 6 speakers that are not used for testing; these build a speaker-independent stacked autoencoder that serves as pre-training for the complete network. The encoder and decoder parts of the stacked autoencoder are then separated by a shallow artificial neural network mapping layer, which maps features from the source speaker to the target speaker and is trained using only 2 or 70 sentences from each of these 2 speakers. Finally, the complete network, combining the stacked autoencoder with the shallow mapping network, is trained, also on 2 or 70 sentences. The performance of the individual autoencoders, the complete stacked autoencoder, and the complete network has been evaluated using mel cepstral distortion. The complete network, once everything was put together, was unable to train properly.
dc.language: eng
dc.publisher: NTNU
dc.subject: Kommunikasjonsteknologi, Signalbehandling
dc.title: Voice Conversion using Deep Learning
dc.type: Master thesis
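The abstract describes the architecture but the record contains no code. Below is a minimal, hypothetical sketch of the described pipeline, assuming PyTorch; the feature dimensionality (40), bottleneck size (64), hidden width (256), tanh activations, and learning rate are all illustrative assumptions, not values taken from the thesis.

    import torch
    import torch.nn as nn

    FEAT_DIM = 40   # assumed spectral feature dimensionality (e.g. MFCC order)
    CODE_DIM = 64   # assumed autoencoder bottleneck size

    # Speaker-independent stacked autoencoder, pre-trained on roughly
    # 300 sentences from the 6 speakers held out from testing.
    encoder = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.Tanh(),
                            nn.Linear(256, CODE_DIM), nn.Tanh())
    decoder = nn.Sequential(nn.Linear(CODE_DIM, 256), nn.Tanh(),
                            nn.Linear(256, FEAT_DIM))
    autoencoder = nn.Sequential(encoder, decoder)  # pre-training: reconstruct the input

    # Shallow mapping layer inserted between the pre-trained encoder and
    # decoder; trained on the 2 or 70 parallel sentences from the
    # source/target speaker pair.
    mapping = nn.Sequential(nn.Linear(CODE_DIM, CODE_DIM), nn.Tanh())
    conversion_net = nn.Sequential(encoder, mapping, decoder)

    # Final stage: fine-tune the complete network end to end, minimizing
    # the mean squared error between converted source features and
    # time-aligned target features.
    optimizer = torch.optim.Adam(conversion_net.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()

    def train_step(src_feats, tgt_feats):
        optimizer.zero_grad()
        loss = loss_fn(conversion_net(src_feats), tgt_feats)
        loss.backward()
        optimizer.step()
        return loss.item()

For the evaluation metric named in the abstract, mel cepstral distortion between aligned target and converted frames is conventionally computed (in dB, usually excluding the 0th coefficient) as $\mathrm{MCD} = \frac{10}{\ln 10}\sqrt{2\sum_{d=1}^{D}\left(c_d - \hat{c}_d\right)^2}$, where $c_d$ and $\hat{c}_d$ are the target and converted mel cepstral coefficients.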

