Estimating Software Development Task Effort Using Word Embeddings and Recurrent Neural Networks

Tubelis, Mariss

Master thesis

Open

18491_FULLTEXT.pdf (2.283Mb)

18491_COVER.pdf (1.600Mb)

Permanent link

http://hdl.handle.net/11250/2584651

Publication date

2018

Collections

Institutt for datateknologi og informatikk [6704]

Abstract

This cross-discipline project tests a state-of-the-art neural network model on a problem with high impact in software engineering, namely task estimation. The majority of research conducted in the software estimation field focuses on early, prior-project estimation, while considerably less effort has been spent on task estimation, which has become more important since agile practices became widely adopted.

In this paper, a dataset of more than 63,000 tasks with textual descriptions and reported time spent was created from 32 publicly available JIRA issue tracking system instances. Five architectures of LSTM and highway neural networks were then parameter-tuned on 19 subsets of the main dataset by running 18,000 evaluation rounds in total. The use of general English word embeddings was compared with learning word embeddings from a corpus of more than 2,000,000 publicly available software task descriptions. The results were validated on two commercial datasets of 9,000 and 30,000 labeled datapoints respectively.

Although the results were not gratifying, as the model accuracy was nowhere near human expert accuracy, this project provides a solid contribution to further research in the field by describing the methods applied in the attempt to solve the problem, as well as several observations regarding transfer learning effects and optimal model configurations. The main dataset and the results, together with well-documented data gathering, preprocessing, model training and visualization scripts, were published on GitHub.

Publisher

NTNU