Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models

Sabzi Shahrebabaki, Abdolreza; Salvi, Giampiero; Svendsen, Torbjørn Karl; Siniscalchi, Sabato Marco

dc.contributor.author	Sabzi Shahrebabaki, Abdolreza
dc.contributor.author	Salvi, Giampiero
dc.contributor.author	Svendsen, Torbjørn Karl
dc.contributor.author	Siniscalchi, Sabato Marco
dc.date.accessioned	2022-09-29T08:50:13Z
dc.date.available	2022-09-29T08:50:13Z
dc.date.created	2022-01-18T14:38:01Z
dc.date.issued	2021
dc.identifier.issn	2329-9290
dc.identifier.uri	https://hdl.handle.net/11250/3022485
dc.description.abstract	We investigate the problem of speaker-independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue that a DNN vector-to-vector regression front-end for speech enhancement (DNN-SE) can play a key role in AAI when used to enhance spectral features prior to AAI back-end processing. We experimented with single- and multi-task training strategies for the DNN-SE block, finding the latter to be beneficial to AAI. Furthermore, we show that coupling a DNN-SE producing enhanced speech features with an AAI trained on clean speech outperforms a multi-condition AAI (AAI-MC) when tested on noisy speech. We observe a 15% relative improvement in the Pearson’s correlation coefficient (PCC) between our system and AAI-MC at 0 dB signal-to-noise ratio on the Haskins corpus. Our approach also compares favourably against using a conventional DSP approach to speech enhancement (MMSE with IMCRA) in the front-end. Finally, we demonstrate the utility of articulatory inversion in a downstream speech application. We report significant WER improvements on an automatic speech recognition task in mismatched conditions based on the Wall Street Journal (WSJ) corpus when leveraging articulatory information estimated by the AAI-MC system over spectral-alone speech features.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US
dc.description.version	publishedVersion	en_US
dc.source.volume	30	en_US
dc.source.journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)	en_US
dc.identifier.doi	10.1109/TASLP.2021.3133218
dc.identifier.cristin	1983737
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	2

Associated file(s)

Name: Acoustic-to-Articulatory_Mappi ...
Size: 2.510Mb
Format: PDF
Description: IEEE/ACM (TASLP)

This item appears in the following collection(s)

Department of Electronic Systems [2334]
Publications from CRIStin - NTNU [38070]

Unless otherwise stated, this item is licensed under Attribution 4.0 International