Extending Decision Support for the Norwegian Labour Inspection Authority Through Open and Unstructured Data Sources - Methods for detecting relevant information based on external data
Master thesis
Permanent link
http://hdl.handle.net/11250/2562773
Publication date
2018
Abstract
The Norwegian Labour Inspection Authority supervises thousands of enterprises every year and works to safeguard employees' health and safety. Being able to detect work-related events in open sources is highly desirable, and a system that identifies such information would supplement the internal systems the Authority already has.
The World Wide Web contains large amounts of unstructured data spread across many platforms, which makes relevant information hard to detect. Harvesting data from these sources can be challenging, since not all platforms offer open APIs. In this thesis we focus on how to detect data that is relevant to the Norwegian Labour Inspection Authority. The most important steps in our approach are extracting information from open sources and using Artificial Neural Networks and Deep Learning to classify each item as relevant or irrelevant. Next, we cluster the data and extract a topic from each cluster to give an overview, and finally we present the relevant information.
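The classify, cluster and label steps can be sketched in Python under several assumptions; this is an illustration, not the thesis's implementation. The headlines, the stop-word list and every name (`tokenize`, `train_neuron`, `kmeans`, `top_terms`, `pipeline`) are hypothetical. A single sigmoid neuron (logistic regression trained with stochastic gradient descent, bias omitted) stands in for the deeper neural networks used in the thesis, k-means with farthest-first initialisation stands in for the clustering method, which the abstract does not name, and a cluster's "topic" is assumed to be its most frequent terms. Only items classified as relevant are clustered.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny Norwegian stop-word list.
STOP_WORDS = {"i", "på", "med", "til", "og"}

def tokenize(text):
    """Lower-case, keep a-z plus æ/ø/å, and drop stop words."""
    return [t for t in re.findall(r"[a-zæøå]+", text.lower()) if t not in STOP_WORDS]

def vectorize(tokens, vocab):
    """Binary bag-of-words vector over the training vocabulary; unknown words are ignored."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in vocab]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    # Output of a single sigmoid neuron (no bias term, for brevity).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train_neuron(vectors, labels, epochs=200, lr=0.5):
    """Logistic regression, i.e. one sigmoid neuron, trained by SGD on log-loss."""
    w = [0.0] * len(vectors[0])
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = score(w, x)
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """k-means with deterministic farthest-first initialisation; returns cluster ids."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = max(range(len(vectors)), key=lambda i: min(sq_dist(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    for _ in range(iterations):
        assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        groups = [[v for v, a in zip(vectors, assign) if a == j] for j in range(k)]
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return assign

def top_terms(token_lists, n=2):
    """Topic label: the n most frequent terms, ties broken alphabetically."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]

def pipeline(train_texts, train_labels, new_texts, k=2):
    """Classify new texts, cluster the ones judged relevant, and label each cluster."""
    train_tokens = [tokenize(t) for t in train_texts]
    vocab = list(dict.fromkeys(t for tokens in train_tokens for t in tokens))
    w = train_neuron([vectorize(t, vocab) for t in train_tokens], train_labels)
    new_tokens = [tokenize(t) for t in new_texts]
    relevant = [i for i, toks in enumerate(new_tokens) if score(w, vectorize(toks, vocab)) >= 0.5]
    if not relevant:
        return []
    assign = kmeans([vectorize(new_tokens[i], vocab) for i in relevant], k)
    groups = [[relevant[m] for m, a in enumerate(assign) if a == j] for j in range(k)]
    return [{"articles": g, "topic": top_terms([new_tokens[i] for i in g])} for g in groups]

# Hypothetical toy headlines (1 = relevant, 0 = irrelevant); not data from the thesis.
TRAIN = [
    "arbeider skadet i arbeidsulykke på byggeplass",
    "arbeidsulykke på fabrikk to arbeidere skadet",
    "ansatte jobbet uten lønn på restaurant",
    "restaurant betalte ikke lønn til ansatte",
    "fotballkamp endte uavgjort i helgen",
    "ny film får gode anmeldelser",
    "været blir fint i helgen",
    "konsert med kjent artist i oslo",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
NEW = [
    "to arbeidere skadet på byggeplass",
    "arbeider skadet på fabrikk",
    "restaurant betalte ikke lønn",
    "ansatte uten lønn på restaurant",
    "ny film med kjent artist",
]

if __name__ == "__main__":
    for cluster in pipeline(TRAIN, LABELS, NEW):
        print(cluster)
    # {'articles': [0, 1], 'topic': ['skadet', 'arbeider']}
    # {'articles': [2, 3], 'topic': ['lønn', 'restaurant']}
```

In this toy set every training word occurs in only one class, so the four workplace headlines are classified as relevant and the film headline is filtered out; k-means then separates the injury reports from the unpaid-wage reports. Real news text would likely need a far larger vocabulary, a bias term and a deeper model.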
As no benchmark dataset existed for evaluating the proposed system, we gathered data from online newspapers and assessed the system on this retrieved dataset. Applying neural networks to the data gathered from open news articles, we achieve good results for the Norwegian Labour Inspection Authority. To the best of our knowledge, no previous work has addressed the Norwegian Labour Inspection Authority's needs using Norwegian text, and our proposed system is therefore state-of-the-art for this task.
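The abstract does not state which metrics were used to assess the system. One common choice for a binary relevant/irrelevant task is precision, recall and F1 on the relevant class, computed against a hand-labelled sample of the retrieved articles. The sketch below assumes that setup; `evaluate` and the example labels are hypothetical and are not results from the thesis.

```python
def evaluate(gold, predicted):
    """Precision, recall and F1 for the positive ('relevant') class.

    gold and predicted are equal-length sequences of booleans, where True
    means 'relevant to the Labour Inspection Authority'.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # relevant and flagged
    fp = sum(1 for g, p in pairs if not g and p)    # irrelevant but flagged
    fn = sum(1 for g, p in pairs if g and not p)    # relevant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels for five articles.
    gold = [True, True, False, False, True]
    predicted = [True, False, False, True, True]
    print(evaluate(gold, predicted))  # each metric is 2/3 (about 0.667) here
```

Precision penalises irrelevant articles passed on to inspectors, while recall penalises work-related events that are missed; which one matters more depends on how the output is used.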