
Field | Value | Language
dc.contributor.author | Sabzi Shahrebabaki, Abdolreza |
dc.contributor.author | Siniscalchi, Sabato Marco |
dc.contributor.author | Salvi, Giampiero |
dc.contributor.author | Svendsen, Torbjørn Karl |
dc.date.accessioned | 2021-03-29T11:45:41Z |
dc.date.available | 2021-03-29T11:45:41Z |
dc.date.created | 2020-10-26T10:35:34Z |
dc.date.issued | 2020 |
dc.identifier.issn | 2308-457X |
dc.identifier.uri | https://hdl.handle.net/11250/2735981 |
dc.description.abstract | We propose a new acoustic-to-articulatory inversion (AAI) sequence-to-sequence neural architecture in which spectral sub-bands are processed independently in time by one-dimensional (1-D) convolutional filters of different sizes. The learned feature maps are then combined and processed by a recurrent block with bidirectional long short-term memory (BLSTM) gates, which preserves the smoothly varying nature of the articulatory trajectories. Our experimental evidence shows that, on a speaker-dependent AAI task, our model achieves better root mean squared error (RMSE) and Pearson's correlation coefficient (PCC) than both a BLSTM model and an FC-BLSTM model (in which the first stages are fully connected layers), despite its reduced number of parameters. In particular, the average RMSE goes from 1.401 when feeding the filterbank features directly into the BLSTM, to 1.328 with the FC-BLSTM model, and to 1.216 with the proposed method. Similarly, the average PCC increases from 0.859 to 0.877 and 0.895, respectively. On a speaker-independent AAI task, we show that our convolutional features outperform the original filterbank features and can be combined with phonetic features, which bring independent information to the solution of the problem. To the best of the authors' knowledge, we report the best results on the given task and data. | en_US
dc.language.iso | eng | en_US
dc.publisher | International Speech Communication Association - ISCA | en_US
dc.title | Sequence-to-sequence articulatory inversion through time convolution of sub-band frequency signals | en_US
dc.type | Peer reviewed | en_US
dc.type | Journal article | en_US
dc.description.version | publishedVersion | en_US
dc.source.journal | Interspeech | en_US
dc.identifier.cristin | 1842190 |
dc.description.localcode | This article will not be available due to copyright restrictions. © 2020 by ISCA. | en_US
cristin.ispublished | false |
cristin.fulltext | original |
cristin.qualitycode | 1 |
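
The sub-band front end described in the abstract can be sketched in plain Python (no deep-learning framework) to show the data flow: the T x F filterbank matrix is split into sub-bands along frequency, each sub-band is convolved along time only, with its own filters of different lengths, and the resulting feature maps are concatenated frame by frame before the recurrent stage. This is a hypothetical illustration, not the authors' implementation. The function names, the sub-band width, the kernel lengths (3 and 5), and the fixed averaging kernels are assumptions chosen for readability (the paper's filters are learned), and the BLSTM block is omitted entirely.

```python
from typing import List, Sequence

# Hypothetical types for this sketch: a feature matrix is indexed [frame][channel],
# a kernel is indexed [tap][channel within one sub-band].
Matrix = List[List[float]]
Kernel = Sequence[Sequence[float]]


def split_subbands(features: Matrix, band_size: int) -> List[Matrix]:
    """Split a T x F filterbank matrix into F // band_size consecutive sub-bands."""
    n_channels = len(features[0])
    if n_channels % band_size != 0:
        raise ValueError("channel count must be a multiple of band_size")
    return [
        [frame[start:start + band_size] for frame in features]
        for start in range(0, n_channels, band_size)
    ]


def conv1d_time(band: Matrix, kernel: Kernel) -> List[float]:
    """Slide a (taps x band_size) kernel along the time axis of one sub-band.

    Uses zero 'same' padding, so there is one output value per input frame.
    Like most deep-learning toolkits, this computes cross-correlation.
    """
    taps = len(kernel)
    if taps % 2 == 0:
        raise ValueError("this sketch assumes odd kernel lengths")
    half = taps // 2
    n_frames = len(band)
    out = []
    for t in range(n_frames):
        acc = 0.0
        for j, weights in enumerate(kernel):
            src = t + j - half
            if 0 <= src < n_frames:  # frames outside the utterance count as zeros
                acc += sum(w * x for w, x in zip(weights, band[src]))
        out.append(acc)
    return out


def subband_conv_features(features: Matrix, band_size: int,
                          kernel_banks: List[List[Kernel]]) -> Matrix:
    """Filter each sub-band independently with its own bank of kernels
    (possibly of different lengths), then concatenate all feature maps per frame.

    The T x (total number of kernels) result is what a BLSTM block would consume.
    """
    bands = split_subbands(features, band_size)
    if len(kernel_banks) != len(bands):
        raise ValueError("need exactly one kernel bank per sub-band")
    feature_maps = [
        conv1d_time(band, kernel)
        for band, bank in zip(bands, kernel_banks)
        for kernel in bank
    ]
    # Transpose from [feature map][frame] to [frame][feature map].
    return [list(frame) for frame in zip(*feature_maps)]


def averaging_kernel(taps: int, band_size: int) -> List[List[float]]:
    """Hypothetical fixed kernel: uniform average over taps x band_size cells."""
    value = 1.0 / (taps * band_size)
    return [[value] * band_size for _ in range(taps)]


if __name__ == "__main__":
    # 6 frames x 4 filterbank channels, split into two sub-bands of 2 channels,
    # each filtered by a 3-tap and a 5-tap kernel -> 4 feature maps per frame.
    feats = [[float(t + c) for c in range(4)] for t in range(6)]
    banks = [[averaging_kernel(3, 2), averaging_kernel(5, 2)] for _ in range(2)]
    out = subband_conv_features(feats, 2, banks)
    print(len(out), len(out[0]))  # 6 4
```

In this sketch each filter spans every channel of its own sub-band but only a few frames in time, so it summarizes local temporal context within one frequency region; the concatenated output still has one vector per frame, which keeps the sequence aligned for a frame-by-frame recurrent stage.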
