Show simple item record

dc.contributor.author: Sabzi Shahrebabaki, Abdolreza
dc.contributor.author: Imran, Ali Shariq
dc.contributor.author: Olfati, Negar
dc.contributor.author: Svendsen, Torbjørn Karl
dc.date.accessioned: 2019-11-29T09:27:07Z
dc.date.available: 2019-11-29T09:27:07Z
dc.date.created: 2019-05-02T13:29:29Z
dc.date.issued: 2019
dc.identifier.citation: Circuits, systems, and signal processing. 2019, 34 (1130), 1-20. [nb_NO]
dc.identifier.issn: 0278-081X
dc.identifier.uri: http://hdl.handle.net/11250/2630983
dc.description.abstract: This paper provides a comprehensive analysis of the effect of speaking rate on frame classification accuracy. Different speaking rates may affect the performance of an automatic speech recognition system, yielding poor recognition accuracy. A model trained on a normal speaking rate is better able to recognize speech at a normal pace but fails to achieve similar performance when tested on slow or fast speaking rates. Our recent study has shown that a drop of almost ten percentage points in classification accuracy is observed when a deep feed-forward network is trained on the normal speaking rate and evaluated on slow and fast speaking rates. In this paper, we extend our work to convolutional neural networks (CNN) to see whether this model can reduce the accuracy gap between different speaking rates. Filter bank energies (FBE) and Mel frequency cepstral coefficients are evaluated on multiple configurations of the CNN, where the networks are trained on the normal speaking rate and evaluated on slow and fast speaking rates. The results are compared to those obtained by a deep neural network. A breakdown of phoneme-level classification results and of the confusion between vowels and consonants is also presented. The experiments show that the CNN architecture, when used with FBE features, performs better on both slow and fast speaking rates. An improvement of nearly 2% for fast and 3% for slow speaking rates is observed. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: Springer Verlag [nb_NO]
dc.title: A Comparative Study of Deep Learning Techniques on Frame-Level Speech Data Classification [nb_NO]
dc.type: Journal article [nb_NO]
dc.type: Peer reviewed [nb_NO]
dc.description.version: acceptedVersion [nb_NO]
dc.source.pagenumber: 1-20 [nb_NO]
dc.source.volume: 34 [nb_NO]
dc.source.journal: Circuits, systems, and signal processing [nb_NO]
dc.source.issue: 1130 [nb_NO]
dc.identifier.doi: 10.1007/s00034-019-01130-0
dc.identifier.cristin: 1695153
dc.description.localcode: This is a post-peer-review, pre-copyedit version of the article. Locked until 3.5.2020 due to copyright restrictions. [nb_NO]
cristin.unitcode: 194,63,35,0
cristin.unitcode: 194,63,1,0
cristin.unitname: Institutt for elektroniske systemer
cristin.unitname: IE fakultetsadministrasjon
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
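The abstract evaluates filter bank energy (FBE) features as frame-level input to the classifiers. The record does not give the paper's exact analysis settings, so the following is only a minimal NumPy sketch of log-FBE extraction, assuming a common configuration (16 kHz audio, 25 ms Hamming-windowed frames, 10 ms hop, 512-point FFT, 40 triangular mel filters); the function names and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def log_fbe(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    # Slice the signal into overlapping frames: 400 samples = 25 ms,
    # 160-sample hop = 10 ms at 16 kHz.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Apply the mel filterbank; the floor avoids log(0) on silent frames.
    fbe = power @ mel_filterbank(n_filters, n_fft, sr).T
    return np.log(np.maximum(fbe, 1e-10))
```

For one second of 16 kHz audio this yields a (98, 40) matrix of log filter bank energies, one 40-dimensional feature vector per frame, which is the kind of frame-level input a CNN or DNN classifier would consume.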


Associated file(s)


This item appears in the following collection(s)
