Open Set Speaker Identification

Digital assistants that communicate through speech are among the new technologies that have emerged this decade. Progress in the field of speaker recognition has opened up the possibility of digital assistants for groups of people, where the assistant can offer personalized assistance and accept commands from multiple users. This master's thesis investigates techniques for speaker identification in a group-meeting scenario, where little speech data is often available for training the system. Speaker identification and verification experiments on the RSR2015 database have been conducted with several GMM-UBM- and i-vector-based systems. The tz-normalized GMM-UBM system gave the best performance, with a recognition rate of 81.3\% and an equal error rate (EER) of 7.8\%. Overall, the GMM-UBM system performed better than the i-vector system.
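The score normalization and the EER metric mentioned above can be illustrated with a minimal NumPy sketch. It is not taken from the thesis or from SIDEKIT: the function names are hypothetical, Z-norm and T-norm are shown in one common formulation (how the two are chained in a combined tz-norm varies between implementations), and the EER is located with a simple threshold sweep rather than an interpolated DET curve.

```python
import numpy as np


def z_norm(score, model_impostor_scores):
    """Z-norm (hypothetical helper): standardize a trial score using the
    scores of impostor utterances against the same target model."""
    scores = np.asarray(model_impostor_scores, dtype=float)
    return (score - scores.mean()) / scores.std()


def t_norm(score, cohort_scores):
    """T-norm (hypothetical helper): standardize a trial score using the
    scores of the same test utterance against a cohort of impostor models."""
    scores = np.asarray(cohort_scores, dtype=float)
    return (score - scores.mean()) / scores.std()


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a decision threshold over all observed
    scores and returning the point where false acceptance and false
    rejection rates are closest (averaged at that point)."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(nontarget >= t)  # impostor trials wrongly accepted
        frr = np.mean(target < t)      # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    genuine = [0.9, 0.8, 0.7, 0.3]
    impostor = [0.1, 0.2, 0.6, 0.4]
    print(equal_error_rate(genuine, impostor))
```

With these toy scores one genuine trial falls below and one impostor trial lies above the threshold 0.6, so the sketch reports an EER of 0.25. A lower EER means the verification system separates target and impostor trials better; a reported value of 7.8\% corresponds to 0.078 on this scale.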

Recent research proposes using deep learning techniques for speaker identification, and a framework for bottleneck feature extraction has been included in the thesis, with experiments on bottleneck features left for future work. In addition to the experiments, the thesis contains a short guide to setting up a speaker identification system in SIDEKIT, the main toolkit used in this work. The full implementation of the scripts used in the experiments can be found in the Appendix.

Publisher

NTNU