Using Information Extraction and Text Classification in an Effort to Support Systematic Literature Reviews

Systematic literature reviews are an important tool in Evidence-based Software Engineering, but they require a large amount of time and effort from researchers. Data extraction is a key step in these reviews, and current practice requires researchers to extract large amounts of data manually. This thesis investigates whether a prototype for automatic extraction can be developed to reduce the time spent on manual data extraction. After reviewing related research and experimenting with different features and machine learning models, two models were implemented in the prototype: Conditional Random Fields for information extraction and Maximum Entropy for text classification. The models achieved average F1 scores of 67.02% and 73.82%, respectively. These results are encouraging and show that it is possible to automate the data extraction process by annotating a small part of the dataset and training machine learning models to perform the extraction.

Publisher

Institutt for datateknikk og informasjonsvitenskap