Vis enkel innførsel

dc.contributor.authorSabzi Shahrebabaki, Abdolreza
dc.contributor.authorSalvi, Giampiero
dc.contributor.authorSvendsen, Torbjørn Karl
dc.contributor.authorSiniscalchi, Sabato Marco
dc.date.accessioned2022-09-29T08:50:13Z
dc.date.available2022-09-29T08:50:13Z
dc.date.created2022-01-18T14:38:01Z
dc.date.issued2021
dc.identifier.issn2329-9290
dc.identifier.urihttps://hdl.handle.net/11250/3022485
dc.description.abstractWe investigate the problem of speaker independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue that a DNN vector-to-vector regression front-end for speech enhancement (DNN-SE) can play a key role in AAI when used to enhance spectral features prior to AAI back-end processing. We experimented with single- and multi-task training strategies for the DNN-SE block finding the latter to be beneficial to AAI. Furthermore, we show that coupling DNN-SE producing enhanced speech features with an AAI trained on clean speech outperforms a multi-condition AAI (AAI-MC) when tested on noisy speech. We observe a 15% relative improvement in the Pearson’s correlation coefficient (PCC) between our system and AAI-MC at 0 dB signal-to-noise ratio on the Haskins corpus. Our approach also compares favourably against using a conventional DSP approach to speech enhancement (MMSE with IMCRA) in the front-end. Finally, we demonstrate the utility of articulatory inversion in a downstream speech application. We report significant WER improvements on an automatic speech recognition task in mismatched conditions based on the Wall Street Journal corpus (WSJ) when leveraging articulatory information estimated by AAI-MC system over spectral-alone speech features.en_US
dc.language.isoengen_US
dc.publisherIEEEen_US
dc.rightsNavngivelse 4.0 Internasjonal*
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/deed.no*
dc.titleAcoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Modelsen_US
dc.typeJournal articleen_US
dc.typePeer revieweden_US
dc.description.versionpublishedVersionen_US
dc.source.volume30en_US
dc.source.journalIEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)en_US
dc.identifier.doi10.1109/TASLP.2021.3133218
dc.identifier.cristin1983737
cristin.ispublishedtrue
cristin.fulltextoriginal
cristin.qualitycode2


Tilhørende fil(er)

Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel

Navngivelse 4.0 Internasjonal
Med mindre annet er angitt, så er denne innførselen lisensiert som Navngivelse 4.0 Internasjonal