Detection of phonetic features for automatic classification of Norwegian dialects

This project explores extending an existing language recognition system to the Norwegian data set NB tale / NAFTA, with dialect classification as the ultimate goal. Acoustic events are tokenized into phonetic features that are universal across languages. In the frontend, a recognizer is trained with a deep/artificial neural network (DNN/ANN) connected to a Hidden Markov Model (HMM), and test data is decoded with this system. The decoded features are collected into a high-dimensional document vector, which can be used for language identification. In the backend, a one-versus-all Support Vector Machine (SVM) is trained for each language to discriminate between these language-labeled documents. A target and an anti-target Gaussian Mixture Model (GMM) are then trained and used for the final language identification (LID) decision. The frontend was trained with six languages from the OGI database and tested with the OGI CV set, the English TIMIT database, and part 1 of the NAFTA database. The best frontend system proved to be a context-independent tristate configuration. The backend was trained with the Callfriend database of spontaneous speech and tested on the LID 2003 evaluation set and part 3 of NAFTA. The best LID performance was achieved with Norwegian and Japanese data.
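As a rough illustration of the document-vector step, the sketch below maps one decoded sequence of phonetic feature labels to a length-normalised vector of unigram and bigram counts. The abstract does not specify how the vector is built, so the n-gram order, the normalisation, the label names, and the function name `document_vector` are all assumptions made for this example, not the method used in the thesis.

```python
from collections import Counter
from math import sqrt


def document_vector(tokens, vocabulary):
    """Map a decoded token sequence to a unit-length vector of n-gram counts.

    Assumption: unigram and bigram counts with L2 normalisation; the thesis
    may use a different representation.
    """
    counts = Counter(tokens)                 # unigram counts
    counts.update(zip(tokens, tokens[1:]))   # bigram counts, keyed by tuple
    raw = [counts[term] for term in vocabulary]
    norm = sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw] if norm > 0 else raw


# Hypothetical feature labels; the actual inventory is not given in the abstract.
decoded = ["vowel", "nasal", "vowel", "stop", "vowel", "nasal"]
vocabulary = [
    "vowel", "nasal", "stop",
    ("vowel", "nasal"), ("nasal", "vowel"),
    ("vowel", "stop"), ("stop", "vowel"),
]

vec = document_vector(decoded, vocabulary)
```

Each utterance would yield one such vector, labeled with its language, and a one-versus-all classifier per language could then be trained on these vectors.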

Publisher

NTNU