A Normative Study on Applying Deep Learning to Native Language Identification

This thesis is a normative study of various approaches to native language identification (NLI), intended to highlight the strengths and shortcomings of applying deep neural networks to this task. NLI is the task of identifying a person's first language (L1) based solely on written or spoken output produced in a learned language (L2). The research centres mainly on the NLI shared tasks, workshops in which participating teams develop solutions aimed at improving NLI performance. The dataset TOEFL11: A Corpus of Non-Native English, which was distributed in the context of these tasks, is also used in this thesis. Deep neural networks, commonly referred to as deep learning, have proven useful in many applications, including related areas of natural language processing (NLP). The most recent NLI shared task left many questions unanswered regarding the usefulness of deep neural networks in the field and how to make better use of the available data. Through experiments and a study of related work, this thesis aims to shed light on these questions, using variants of recurrent neural networks as classification models, specifically long short-term memory (LSTM) networks and gated recurrent units (GRUs).

Publisher

NTNU