Show simple item record

dc.contributor.advisor: Svendsen, Torbjørn Karl
dc.contributor.author: Rugayan, Janine
dc.date.accessioned: 2021-09-15T17:11:33Z
dc.date.available: 2021-09-15T17:11:33Z
dc.date.issued: 2021
dc.identifier: no.ntnu:inspera:77038608:48242214
dc.identifier.uri: https://hdl.handle.net/11250/2778188
dc.description.abstract: The process of human spoken language acquisition is still being studied today. The most popular theory, from B.F. Skinner, describes the language learning of infants as verbal behavior controlled by consequences. This thesis explores the possibility of applying the same principle to machines by creating a system that simulates spoken language acquisition using reinforcement learning. The developed system consists mainly of two parts: unsupervised word segmentation and language learning. A Vector-Quantized Autoregressive Predictive Coding (VQ-APC) model is used to implement unsupervised word segmentation, while the language learning part is implemented using the reinforcement learning method called deep Q-network (DQN). The input to the system is a combined sound file consisting of randomly shuffled utterances of the digits "zero" to "nine" and various background noises. It is akin to what an infant would hear during the early stages of learning a language. The virtual agent learns the meanings of the discovered spoken digits by accomplishing the task of "reciting" them in ascending order. Several experiments were run to test the system. The best results for word segmentation were achieved using the VQ-APC model with the WordSeg Adaptor Grammar (AG) algorithm. Increasing the recognition rate of the word segmentation was observed to improve the reinforcement learning results only to a certain degree. Finally, it was found that large action space sizes can hinder DQN model convergence. In summary, the thesis achieved spoken language acquisition in machines in line with Skinner's theory by performing unsupervised word segmentation on a long speech clip and employing reinforcement learning to ground the discovered spoken words. It also showed that VQ-APC can be used for unsupervised word segmentation and identified factors that can affect reinforcement learning performance.
dc.language: eng
dc.publisher: NTNU
dc.title: A Deep Learning Approach to Spoken Language Acquisition
dc.type: Master thesis


