Show simple item record

dc.contributor.advisor    Gambäck, Björn
dc.contributor.author     McAllister, Tyler
dc.date.accessioned       2021-09-15T16:01:18Z
dc.date.available         2021-09-15T16:01:18Z
dc.date.issued            2020
dc.identifier             no.ntnu:inspera:57393545:34556466
dc.identifier.uri         https://hdl.handle.net/11250/2777489
dc.description.abstract   Selective remixing refers to altering an existing musical composition to create something new. Remixing audio commonly requires a fundamental understanding of music or of music production software, such as digital audio workstations. As research into the roles machine learning can play in audio transformation and generation tasks continues, there is an indication that systems able to remix all types of music, without requiring prior musical knowledge from the user, could be an effective means of creating content. Existing machine learning research on music generation and transformation commonly targets single-instrument or single-melody music. In contrast, five genres of music are used throughout this thesis, with the goal of achieving selective remixing by applying image-based domain transfer methods to spectrogram images of music. To this end, a system with a pipeline architecture comprising two independent generative adversarial network models was created. The first model in the pipeline, CycleGAN (Zhu et al. 2017), performs style transfer on constant-Q transform spectrogram images: it applies features from one of the five genres to the spectrogram and passes its result to the next stage in the pipeline, CQTGAN, a modified MelGAN (Kumar et al. 2019) model. The spectrogram output by CycleGAN is converted into a real-valued tensor and approximately reconstructed back into audio (a minimal sketch of this spectrogram round-trip follows the record below). The system outputs four seconds of music in WAV format, and these segments can be concatenated to recreate a full-length music track. To evaluate the system, a number of experiments and a survey were conducted, each concerning the intelligibility of the music and the sufficiency of the style transfer performed. In both cases the audio output by the system was considered to be of low quality, which was attributed to the increased complexity of processing high-sample-rate music with homophonic or polyphonic audio textures. Despite the low-quality results, the style transfer performed by the system did appear to achieve noticeable selective remixing on most of the music tracks used for evaluation. Twenty-five unique examples are provided at https://mcallistertyler95.github.io/music-comparison.html; it is recommended to listen to them before reading the rest of this report. Additionally, the code for the implemented system is hosted at https://github.com/mcallistertyler95/genre-transfer-pipeline along with run and training instructions.
dc.language
dc.publisher              NTNU
dc.title                  Generating Remixed Music via Style Transfer
dc.type                   Master thesis
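
A minimal sketch of the spectrogram round-trip the abstract describes, assuming a Python environment with librosa and soundfile (neither is confirmed by this record; the thesis's own code lives in the genre-transfer-pipeline repository above). It computes a constant-Q transform (CQT) magnitude spectrogram of a four-second clip and approximately inverts it back to audio. The CycleGAN genre-transfer step is omitted, Griffin-Lim stands in for the learned CQTGAN vocoder, and the sample rate and filenames are illustrative assumptions.

import numpy as np
import librosa
import soundfile as sf

SR = 22050            # assumed sample rate for this sketch
CLIP_SECONDS = 4      # the system operates on four-second segments

# Load one four-second clip of a music track (hypothetical input file).
y, _ = librosa.load("input.wav", sr=SR, duration=CLIP_SECONDS)

# Forward CQT; its magnitude is the "spectrogram image" that CycleGAN
# would modify with features of the target genre (transfer step omitted).
C = librosa.cqt(y, sr=SR)
magnitude = np.abs(C)

# Approximate inversion from magnitude alone: phase must be estimated, so
# the reconstruction is lossy, matching the abstract's "approximately
# reconstructed" wording. The thesis replaces this step with CQTGAN.
y_hat = librosa.griffinlim_cqt(magnitude, sr=SR)
sf.write("reconstructed.wav", y_hat, SR)

A full track would then be recreated by concatenating consecutive four-second reconstructions, as the abstract describes.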


Associated file(s)


This item appears in the following collection(s)
