Subjective evaluation of a noise-reduced training target for deep neural network-based speech enhancement

Gelderblom, Femke B.; Tronstad, Tron Vedul; Viggen, Erlend Magnus

dc.contributor.author	Gelderblom, Femke B.
dc.contributor.author	Tronstad, Tron Vedul
dc.contributor.author	Viggen, Erlend Magnus
dc.date.accessioned	2019-03-14T11:28:00Z
dc.date.available	2019-03-14T11:28:00Z
dc.date.created	2018-11-19T11:32:42Z
dc.date.issued	2018
dc.identifier.issn	2329-9290
dc.identifier.uri	http://hdl.handle.net/11250/2590000
dc.description.abstract	Speech enhancement systems aim to improve the quality and intelligibility of noisy speech. In this study, we compare two speech enhancement systems based on deep neural networks. The speech intelligibility and quality of both systems were evaluated subjectively, by a Speech Recognition Test based on Hagerman sentences and a translation of the ITU-T P.835 recommendation, respectively. Results were compared with the objective measures STOI and POLQA. Neither STOI nor POLQA reliably predicted subjective results. While STOI anticipated improvement, subjective results for both models showed degradation of speech intelligibility. POLQA results were overall hardly affected, while the subjective results showed significant changes in overall quality, both positive and negative, in many of the tests. One of the systems was trained to remove all noise, a strategy that is common in speech enhancement systems found in the literature. The other system was trained to only reduce the noise such that the signal-to-noise ratio increased by 10 dB. The latter system subjectively outperformed the system that attempted to remove noise completely. From this, we conclude that objective evaluation cannot replace subjective evaluation until a measure that reliably predicts intelligibility and quality for deep neural network-based systems has been identified. Results further indicate that it may be beneficial to move away from more aggressive noise removal strategies towards noise reduction strategies that cause less speech distortion.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	nb_NO
dc.subject	Artificial Neural Networks	nb_NO
dc.subject	Talepersepsjon	nb_NO
dc.subject	Speech perception	nb_NO
dc.title	Subjective evaluation of a noise-reduced training target for deep neural network-based speech enhancement	nb_NO
dc.title.alternative	Subjective evaluation of a noise-reduced training target for deep neural network-based speech enhancement	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.subject.nsi	VDP::Telekommunikasjon: 552	nb_NO
dc.subject.nsi	VDP::Telecommunication: 552	nb_NO
dc.source.journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing	nb_NO
dc.identifier.doi	10.1109/TASLP.2018.2882738
dc.identifier.cristin	1632077
dc.relation.project	Norges forskningsråd: 237887	nb_NO
dc.description.localcode	© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	nb_NO
cristin.unitcode	194,65,25,0
cristin.unitname	Institutt for sirkulasjon og bildediagnostikk
cristin.ispublished	false
cristin.fulltext	postprint
cristin.qualitycode	2

Associated file(s)

Filename: Gelderblom+et+al.%2C+2018.pdf
Size: 777.2Kb
Format: PDF
Description: Gelderblom

This item appears in the following collection(s)

Institutt for sirkulasjon og bildediagnostikk [1930]
Publikasjoner fra CRIStin - NTNU [38525]
