Classification of Maintenance Reports - Statistical NLP meets the Oil & Gas Industry
Master thesis
Permanent link
http://hdl.handle.net/11250/2615793
Publication date
2018
Abstract
Several problematic data characteristics were revealed, such as multilingual reports and significant class imbalances. While no consistent scheme for data preparation was found, several techniques recurred in the most promising experiments. Of the three classifiers tested (Naive Bayes, Support Vector Machines, and Random Forest), Support Vector Machines was the best overall choice, being the only classifier to generalize well beyond the observed data. The various resampling techniques decreased overall performance, which seems to indicate that they introduced additional noise rather than useful signal.
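
To make the resampling idea concrete, the following is a minimal sketch of random oversampling, one common way to counter class imbalance. The abstract does not say which resampling techniques were tested, so this technique, the function name `random_oversample`, and the toy reports and labels are all assumptions for illustration only, not the thesis's method or data.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class is as large as the largest one (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the reports by their class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates with replacement to fill the gap.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: short, partly multilingual maintenance reports.
texts = ["pump leaking oil", "valve stuck", "pump seal leak",
         "pumpe lekker", "sensor drift"]
labels = ["leak", "mechanical", "leak", "leak", "instrument"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind only repeats existing reports and adds no new information, so a classifier may overfit to the duplicated examples. That is one plausible way resampling could hurt rather than help, although the abstract does not state the cause.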