Evaluating Performance Metrics for Deep Neural Network-based Speech Enhancement Systems

Gelderblom, Femke B.

Doctoral thesis

Full text not available (locked)

Permanent link

https://hdl.handle.net/11250/3059489

Publication date

2023

Collections

Department of Electronic Systems (Institutt for elektroniske systemer)

Abstract

A recurring challenge for speech enhancement (SE) systems is that removing or reducing noise and reverberation does not necessarily increase the intelligibility or quality of the speech for human listeners.

Deep neural networks (DNNs) are promising models for speech enhancement because of their highly adaptive, non-linear nature. Although these models can be trained with standard deep learning (DL) techniques to perform a wide variety of tasks, their real-life performance depends on the predictive power of the evaluation tools that guide the development of speech enhancement systems.

This thesis evaluates the reliability of popular objective performance metrics for DNN-based speech enhancement systems. For this purpose, a variety of single-channel and multichannel SE systems were developed and evaluated subjectively in listening tests.

None of the tested metrics proved to be a reliable indicator of subjective changes in performance. This lack of reliable indicators critically impedes progress in the field of speech enhancement for human listeners.
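The abstract does not say how the predictive power of a metric was quantified. As a purely hypothetical illustration, not the method used in the thesis, one generic check compares per-condition changes in an objective metric with per-condition changes in listening-test scores: a Pearson correlation across conditions, plus the fraction of conditions in which both changes have the same sign. The function names `pearson_r` and `sign_agreement` and the numbers in the usage lines are invented for this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for zero variance")
    return sxy / sqrt(sxx * syy)


def sign_agreement(objective_delta, subjective_delta):
    """Fraction of conditions where the objective and subjective changes
    share a strictly positive or strictly negative sign (zeros never agree)."""
    pairs = list(zip(objective_delta, subjective_delta))
    if not pairs:
        raise ValueError("need at least one condition")
    return sum(1 for o, s in pairs if o * s > 0) / len(pairs)


if __name__ == "__main__":
    # Hypothetical, made-up changes relative to unprocessed noisy speech,
    # one value per test condition (not data from the thesis).
    objective = [0.05, 0.08, 0.10, 0.03]  # change in some objective metric
    subjective = [-2.0, 1.0, -4.0, 0.5]   # change in a listening-test score
    print(pearson_r(objective, subjective))
    print(sign_agreement(objective, subjective))  # 0.5
```

In this toy case the objective metric improves in every condition while the subjective score drops in half of them, so a metric can look uniformly positive yet fail to indicate the direction of subjective change.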

Consists of

Paper 1: Gelderblom, Femke B.; Tronstad, Tron Vedul; Viggen, Erlend Magnus. Subjective intelligibility of deep neural network-based speech enhancement. Interspeech 2017, vol. 2017-August, pp. 1968-1972. Copyright © 2017 ISCA. DOI: http://dx.doi.org/10.21437/Interspeech.2017-1041

Paper 2: Gelderblom, Femke B.; Tronstad, Tron Vedul; Viggen, Erlend Magnus. Subjective evaluation of a noise-reduced training target for deep neural network-based speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) 2018, vol. 27, no. 3, pp. 583-594. Copyright © 2018 IEEE. DOI: http://dx.doi.org/10.1109/TASLP.2018.2882738

Paper 3: Gelderblom, Femke B.; Liu, Yi; Kvam, Johannes; Myrvoll, Tor Andre. Synthetic Data for DNN-Based DOA Estimation of Indoor Speech. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (Institute of Electrical and Electronics Engineers) 2021, ISBN 978-1-7281-7606-2, pp. 4390-4394. Copyright © 2021 IEEE. DOI: http://dx.doi.org/10.1109/ICASSP39728.2021.9414415

Paper 4: Gelderblom, Femke B.; Myrvoll, Tor Andre. Deep Complex Convolutional Recurrent Network for Multi-Channel Speech Enhancement and Dereverberation. In: 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). IEEE 2021, ISBN 978-1-7281-6338-3. Copyright © 2021 IEEE. DOI: http://dx.doi.org/10.1109/MLSP52302.2021.9596086

Paper 5: Gelderblom, Femke B.; Tronstad, Tron V.; Svendsen, Torbjørn; Myrvoll, Tor Andre. On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks. This paper has been submitted for publication and is therefore not included.

Publisher

NTNU

Series

Doctoral theses at NTNU; 2023:53