Language Identification Based on Detection of Phonetic Characteristics

This thesis has taken a closer look at the implementation of the back-end of a language recognition system. The front-end of the system is a Universal Attribute Recognizer (UAR), which is used to detect phonetic characteristics in an utterance. When a speech signal is sent through the UAR, it is decoded into a sequence of attributes which is used to generate a vector of term-count. Vector Space Modeling (VSM) have been used for training the language classifiers in the back-end. The main principle of VSM is that term-count vectors from the same language will position themselves close to eachother when they are mapped into a vector space, and this property can be exploited for recognizing languages. The implemented back-end has trained vectors space classifiers for 12 different languages, and a NIST recognition task has been performed for evaluating the recognition rate of the system. The NIST task was a verification task and the system achived a equal error rate (EER) of $6.73 %$. Tools like Support Vector Machines (SVM) and Gaussian Mixture Models (GMM) have been used in the implementation of the back-end. Thus, are quite a few parameters which can be varied and tweaked, and different experiments were conducted to investigate how these parameters would affect EER of the language recognizer. As a part test the robustness of the system, the language recognizer were exposed to a so-called out-of-set language, which is a language that the system has not been trained to handle. The system showed a poor performance at rejecting these speech segments correctly.

Utgiver

Institutt for elektronikk og telekommunikasjon