
dc.contributor.advisor: Gulla, Jon Atle
dc.contributor.author: Fosse, Eirik Nilsen
dc.contributor.author: Hjetland, Sondre Sæterli
dc.date.accessioned: 2018-09-27T14:00:25Z
dc.date.available: 2018-09-27T14:00:25Z
dc.date.created: 2018-06-01
dc.date.issued: 2018
dc.identifier: ntnudaim:18043
dc.identifier.uri: http://hdl.handle.net/11250/2565103
dc.description.abstract: Finding good features for supervised learning on high-dimensional industrial datasets can be challenging, as the feature set typically consists of hundreds to thousands of features. Specific features might follow protocols or custom coding standards that, unless decoded, are unusable by machine learning algorithms. This is often the case in industrial environments, where domain knowledge is needed to interpret the semantics of the data. The objective of this research is to enable classification of industrial work orders into a predefined set of failure mode codes. The main focus of the study is analyzing the effect of incorporating domain knowledge in the preprocessing phase of the supervised learning process. A thorough analysis is conducted to assess multiple supervised learning algorithms, to find fitting evaluation metrics, and to appraise the effect of extracting features from both structured and unstructured fields. Our experiments show that incorporating domain knowledge in the preprocessing phase improves the performance of the classifiers substantially. By utilizing domain knowledge we were able to increase the performance of the classifiers by approximately 0.07 as measured by Cohen's Kappa, an average relative improvement of 25.2%. An assessment of the feature importance in one of the final classifiers showed that the sum of the importance of features extracted using domain knowledge was 38.97%. This implies that applying domain knowledge during feature extraction is crucial in order to avoid erroneous pruning of important encoded features, and to be able to extract more information from the dataset. The best classifier is currently not accurate enough to automatically label work orders with a failure mode code, but it is accurate enough to suggest failure mode codes when an operator submits new work orders.
dc.language: eng
dc.publisher: NTNU
dc.subject: Informatics, Databases and Search
dc.title: Using Domain Knowledge in Classifying Industrial Data from the Oil and Gas Sector
dc.type: Master thesis


