A Two-Stage Deep Modeling Approach to Articulatory Inversion

Sabzi Shahrebabaki, Abdolreza; Olfati, Negar; Imran, Ali Shariq; Johnsen, Magne Hallstein; Siniscalchi, Sabato Marco; Svendsen, Torbjørn Karl

dc.contributor.author	Sabzi Shahrebabaki, Abdolreza
dc.contributor.author	Olfati, Negar
dc.contributor.author	Imran, Ali Shariq
dc.contributor.author	Johnsen, Magne Hallstein
dc.contributor.author	Siniscalchi, Sabato Marco
dc.contributor.author	Svendsen, Torbjørn Karl
dc.date.accessioned	2022-10-11T11:30:44Z
dc.date.available	2022-10-11T11:30:44Z
dc.date.created	2022-03-11T13:09:55Z
dc.date.issued	2021
dc.identifier.citation	2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)	en_US
dc.identifier.isbn	978-1-7281-7606-2
dc.identifier.uri	https://hdl.handle.net/11250/3025349
dc.description.abstract	This paper proposes a two-stage deep feed-forward neural network (DNN) to tackle the acoustic-to-articulatory inversion (AAI) problem. DNNs are a viable solution for the AAI task, but the temporal continuity of the estimated articulatory values has not been exploited properly when a DNN is employed. In this work, we address the lack of temporal constraints, while keeping the solution parameter-parsimonious, by deploying a two-stage solution based only on DNNs: (i) articulatory trajectories are estimated in a first stage using a DNN, and (ii) a temporal window of the estimated trajectories is used in a follow-up DNN stage as a refinement. The first-stage estimate can be thought of as auxiliary information that constrains the inversion process. Experimental evidence demonstrates an average error reduction of 7.51% in terms of RMSE compared to the baseline, and an improvement of 2.39% in Pearson correlation. Finally, we point out that AAI is still a highly challenging problem, mainly because the acoustic-to-articulatory mapping is non-linear and one-to-many. It is thus promising that a significant improvement was attained with our simple yet elegant solution.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartof	ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
dc.title	A Two-Stage Deep Modeling Approach to Articulatory Inversion	en_US
dc.type	Chapter	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.identifier.doi	10.1109/ICASSP39728.2021.9413742
dc.identifier.cristin	2009124
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	1
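
The two-stage scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `stage1` and `stage2` are hypothetical callables standing in for trained DNNs, `half_width` is an assumed window parameter, and edge frames are padded by repetition (the abstract does not specify any of these). Whether the second stage also receives acoustic features is not stated in this record; the sketch feeds it only the windowed first-stage estimates. The `rmse` and `pearson` helpers show the two evaluation metrics named in the abstract.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Frame], half_width: int) -> List[Frame]:
    """Concatenate each frame with its neighbours (edge frames are repeated)."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window: Frame = []
        for k in range(t - half_width, t + half_width + 1):
            # Clamp the index so the window stays inside the utterance.
            window.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(window)
    return stacked


def two_stage_inversion(
    acoustic: Sequence[Frame],
    stage1: Callable[[Frame], Frame],  # hypothetical first-stage model
    stage2: Callable[[Frame], Frame],  # hypothetical refinement model
    half_width: int,                   # assumed context size, in frames
) -> List[Frame]:
    """Frame-wise first pass, then refinement over a temporal window."""
    first_pass = [stage1(x) for x in acoustic]
    return [stage2(w) for w in stack_context(first_pass, half_width)]


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length trajectories."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def pearson(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length trajectories."""
    mp = sum(pred) / len(pred)
    mt = sum(target) / len(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)
```

With toy stand-ins such as `stage1 = lambda x: [2 * x[0]]` and `stage2 = lambda w: [sum(w) / len(w)]`, the second stage acts as a moving average over the first-stage output, which illustrates how a temporal window can smooth frame-wise estimates.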

Associated file(s)

Filename: tandem_dnn.pdf
Size: 328.1 KB
Format: PDF
Description: Sabzi

This item appears in the following collection(s)

Institutt for elektroniske systemer [2289]
Publikasjoner fra CRIStin - NTNU [37304]