dc.contributor.author: Sabzi Shahrebabaki, Abdolreza
dc.contributor.author: Imran, Ali Shariq
dc.contributor.author: Olfati, Negar
dc.contributor.author: Svendsen, Torbjørn Karl
dc.date.accessioned: 2019-05-27T05:43:34Z
dc.date.available: 2019-05-27T05:43:34Z
dc.date.created: 2018-07-12T14:31:59Z
dc.date.issued: 2018
dc.identifier.isbn: 978-3-319-91250-9
dc.identifier.uri: http://hdl.handle.net/11250/2598838
dc.description.abstract: This paper investigates the effect of speaking rate variation on the task of frame classification. This task is indicative of the performance on phoneme and word recognition and is a first step towards designing voice-controlled interfaces. Different speaking rates cause different dynamics. For example, speaking rate variations will cause changes both in formant frequencies and in their transition tracks. A word spoken at normal speed gets recognized more often than the same word spoken by the same speaker at a much faster or slower pace, or vice-versa. It is thus imperative to design interfaces which take into account different speaking variabilities. To better incorporate speaker variability into digital devices, we study the effect of a) feature selection and b) the choice of network architecture on variable speaking rates. Four different features are evaluated on multiple configurations of Deep Neural Network (DNN) architectures. The findings show that log Filter-Bank Energies (FBE) outperformed the other acoustic features not only on normal speaking rate but for slow and fast speaking rates as well.
dc.language.iso: eng
dc.publisher: Springer Verlag
dc.relation.ispartof: Human-Computer Interaction. Interaction Technologies
dc.title: Acoustic Feature Comparison for Different Speaking Rates
dc.type: Chapter
dc.description.version: acceptedVersion
dc.source.pagenumber: 176-189
dc.identifier.doi: 10.1007/978-3-319-91250-9_14
dc.identifier.cristin: 1596955
dc.relation.project: Norges forskningsråd: 240282
dc.description.localcode: This is a post-peer-review, pre-copyedit version of an article published in [Lecture Notes in Computer Science]. Locked until 1.6.2019 due to copyright restrictions. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-91250-9_14
cristin.unitcode: 194,63,35,0
cristin.unitname: Institutt for elektroniske systemer
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1

