A Discriminative Approach to Pronunciation Variation Modeling in Speech Recognition
Doctoral thesis
Permanent link
http://hdl.handle.net/11250/2370679
Publication date
2013
Abstract
Put in the most general terms, this dissertation addresses the problem of automatic recognition of non-native proper names. Proper names in themselves tend to pose a severe challenge to speech recognition engines, as these names can typically be pronounced in a variety of ways, and do not necessarily follow generally governing pronunciation conventions. Non-native proper names add still further levels of complication, caused by such variables as the speaker’s familiarity with the foreign name, proficiency in the foreign language, and tendency to adapt pronunciation of the name to the native language or, obversely, to adopt foreign speech characteristics in order to pronounce the name as faithfully as possible. When confronted with non-native proper names, it is therefore particularly important for an automatic speech recognition system to be able to handle a considerable amount of pronunciation variety. Traditionally, the more or less self-evident approach to coping with this variety has been simply to add pronunciation variants to the recognition lexicon. However, introducing such variants typically entails the risk of increasing confusability between different lexicon entries, as new variants of previously more distinct units are likely to augment phonetic similarities within the lexicon. It would seem crucial for recognition success, then, to optimize the balance between lexical coverage and confusability. In this work, we strive to attain such a balance by submitting pronunciation variants to selection procedures rather than adding variants to the recognition lexicon indiscriminately. The selective addition of pronunciation variants to a recognition lexicon has a clear intuitive appeal. It is the objective of this dissertation to confirm that intuition experimentally by measuring the improvements in recognition accuracy yielded by various selection methods.
In particular, we propose a new pronunciation variant selection criterion that is directly related to the effective recognition error rate. To estimate the number of errors corrected by a particular variant, scores based on the Minimum Classification Error framework are calculated before and after the addition of the variant to the lexicon. Using this criterion, three different variant selection procedures are proposed in this work: a single-pass approach, an iterative approach and a tree-search approach. These selection methods aim to optimize the recognition lexicon in terms of size and recognition performance by adding to the lexicon only those pronunciation variants that effect an actual decrease in the error rate. We contrast these selection methods with more traditional approaches to populating the recognition lexicon, such as using all available variants indiscriminately, and selecting on the basis of the probabilities obtained during the generation of possible new pronunciation variants. Our experiments show that we can significantly reduce the error rate and the required number of variants per name by applying our proposed selection approaches.
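
The before-and-after comparison behind the selection criterion can be illustrated with a small sketch. This is not the dissertation's implementation: it assumes the commonly used Minimum Classification Error formulation, in which a misclassification measure (best rival score minus correct score) is passed through a sigmoid to give a smoothed error count. All names (`mce_loss`, `total_loss`, `single_pass_select`), the smoothing parameter `gamma`, and the toy acoustic scores are hypothetical and chosen only for illustration.

```python
import math

# Toy data (hypothetical): each utterance is (scores, true_name), where
# scores maps (name, variant) to an acoustic score; higher is better.
UTTERANCES = [
    ({("Anna", "a n a"): -10, ("Anna", "ae n a"): -4,
      ("Hanna", "h a n a"): -6, ("Hanna", "a n a"): -9}, "Anna"),
    ({("Anna", "a n a"): -5, ("Anna", "ae n a"): -8,
      ("Hanna", "h a n a"): -9, ("Hanna", "a n a"): -4}, "Anna"),
    ({("Anna", "a n a"): -8, ("Anna", "ae n a"): -9,
      ("Hanna", "h a n a"): -5, ("Hanna", "a n a"): -7}, "Hanna"),
]
BASE_LEXICON = {"Anna": {"a n a"}, "Hanna": {"h a n a"}}
CANDIDATES = [("Anna", "ae n a"), ("Hanna", "a n a")]


def mce_loss(scores, lexicon, true_name, gamma=1.0):
    """Smoothed 0/1 error for one utterance under a given lexicon."""
    # Each name is scored by its best-matching variant in the lexicon.
    best = {
        name: max(scores[(name, v)] for v in variants)
        for name, variants in lexicon.items()
    }
    correct = best[true_name]
    rival = max(s for name, s in best.items() if name != true_name)
    d = rival - correct  # misclassification measure: > 0 means an error
    return 1.0 / (1.0 + math.exp(-gamma * d))


def total_loss(utterances, lexicon, gamma=1.0):
    """Sum of smoothed errors, an estimate of the number of errors."""
    return sum(mce_loss(s, lexicon, t, gamma) for s, t in utterances)


def single_pass_select(utterances, base_lexicon, candidates, gamma=1.0):
    """Keep each candidate that, added alone, lowers the smoothed error count."""
    baseline = total_loss(utterances, base_lexicon, gamma)
    selected = []
    for name, variant in candidates:
        # Copy the lexicon so the baseline is never modified.
        trial = {n: set(v) for n, v in base_lexicon.items()}
        trial[name].add(variant)
        if total_loss(utterances, trial, gamma) < baseline:
            selected.append((name, variant))
    return selected


if __name__ == "__main__":
    print(single_pass_select(UTTERANCES, BASE_LEXICON, CANDIDATES))
```

On this toy data the variant `ae n a` for "Anna" is kept because it corrects the first utterance, while the variant `a n a` for "Hanna" is rejected because it makes the second utterance confusable with "Anna". This mirrors the trade-off between lexical coverage and confusability described in the abstract, under the stated assumptions.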