Acoustic Feature Comparison for Different Speaking Rates
This paper investigates the effect of speaking-rate variation on the task of frame classification. This task is indicative of performance on phoneme and word recognition and is a first step towards designing voice-controlled interfaces. Different speaking rates produce different speech dynamics: for example, speaking-rate variation changes both the formant frequencies and their transition tracks. A word spoken at normal speed is recognized more often than the same word spoken by the same speaker at a much faster or slower pace. It is therefore imperative to design interfaces that take variability in speaking rate into account. To make digital devices more robust to speaker variability, we study the effect of a) feature selection and b) the choice of network architecture under variable speaking rates. Four different acoustic features are evaluated on multiple configurations of Deep Neural Network (DNN) architectures. The findings show that log Filter-Bank Energies (FBE) outperformed the other acoustic features not only at the normal speaking rate but also at slow and fast speaking rates.
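For readers unfamiliar with log filter-bank energies, the sketch below shows one common way to compute them per frame with NumPy. The abstract does not give the authors' front-end settings, so every parameter here is an assumption: 16 kHz audio, 25 ms Hamming-windowed frames with a 10 ms hop, a 512-point FFT, and 40 triangular filters on the HTK-style mel scale. The function names `mel_filterbank` and `log_fbe` are hypothetical. Treat this as an illustrative recipe, not the paper's exact feature extraction.

```python
import numpy as np

def hz_to_mel(hz):
    """HTK-style mel scale (assumed; other mel formulas exist)."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    # Snap each filter edge/centre frequency down to an FFT bin index.
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):   # rising slope of the triangle
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling slope of the triangle
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def log_fbe(signal, sample_rate=16000, frame_len=0.025, hop=0.010, n_fft=512, n_filters=40):
    """Return log filter-bank energies, one row per frame (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    frame_size = int(round(frame_len * sample_rate))  # 400 samples at 16 kHz
    hop_size = int(round(hop * sample_rate))          # 160 samples at 16 kHz
    if len(signal) < frame_size:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    # Index matrix: row i holds the sample indices of frame i.
    idx = np.arange(frame_size)[None, :] + hop_size * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_size)
    # Power spectrum of each zero-padded frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Floor the energies before the log to avoid log(0) on silent frames.
    return np.log(np.maximum(energies, 1e-10))

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
tone = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
feats = log_fbe(tone)
print(feats.shape)  # (98, 40): 98 frames, 40 log energies per frame
```

Each row of `feats` would be one input vector for a frame classifier; a DNN front end would typically also normalize these features and may stack neighbouring frames, but those steps are not described in the abstract.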