Ontology-based Data Extraction in the Scholarship-Related Content
This master's thesis, "Ontology-based Data Extraction in the Scholarship-Related Content", focuses on ontologies and on methods for recognizing ontological concepts in text, thereby enriching its semantic meaning. Ontologies can significantly improve the semantic richness of texts presented on the Web, but to exploit their full capabilities, specific XML-based annotations must be written to describe each and every resource. This usually requires a large amount of manual work, and the thesis looks for ways to reduce that effort by suggesting automatic or semi-automatic approaches to ontology-based information retrieval. In experiments conducted in the scholarship domain, an ontology for scholarships was thoroughly evaluated, and the names of disciplines were chosen as the target for further information retrieval research. Ontological concepts were discovered in the text by first scraping the webpage for the target section and then applying a Boolean search method, both with and without prior preprocessing. This approach gave very good results: with preprocessing, roughly 70% of the disciplines were retrieved. Furthermore, an extension of the ontology is proposed as a way to increase the extraction rate by a further 10 percentage points. Overall, 80% of the disciplines can be retrieved by our method.
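The Boolean search step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the discipline labels, the sample text, and the function names `preprocess` and `boolean_match` are all hypothetical, the preprocessing is assumed to be simple lowercasing and punctuation removal, and the webpage scraping step is left out (the target section is given as a plain string).

```python
import re

# Hypothetical discipline labels; the actual scholarship ontology is not reproduced here.
DISCIPLINES = ["Computer Science", "Mathematics", "Civil Engineering"]


def preprocess(text):
    """Assumed preprocessing: lowercase and replace non-alphanumeric runs with spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def boolean_match(text, labels, use_preprocessing=True):
    """Return the labels whose words all occur in the text (Boolean AND over words)."""
    if use_preprocessing:
        tokens = set(preprocess(text).split())
        return [label for label in labels
                if all(word in tokens for word in preprocess(label).split())]
    # Without preprocessing, tokens keep their original case and punctuation.
    tokens = set(text.split())
    return [label for label in labels
            if all(word in tokens for word in label.split())]


# Hypothetical target section, as it might look after scraping.
section = "Eligible fields: computer science, MATHEMATICS, and civil-engineering."

raw_hits = boolean_match(section, DISCIPLINES, use_preprocessing=False)
clean_hits = boolean_match(section, DISCIPLINES, use_preprocessing=True)
print(len(raw_hits), "of", len(DISCIPLINES), "found without preprocessing")
print(len(clean_hits), "of", len(DISCIPLINES), "found with preprocessing")
```

In this toy input, exact matching fails on case and attached punctuation, while the normalized text lets every label match. That is one plausible reason why preprocessing would raise the retrieval rate, though the figures in the abstract come from the thesis experiments, not from this sketch.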