Ontology-based Data Extraction in the Scholarship-Related Content

2013

This Master's thesis, "Ontology-based Data Extraction in the Scholarship-Related Content",
focuses on ontologies and on methods by which ontological concepts can be recognized in text,
thereby enriching its semantic meaning.

The use of ontologies can significantly improve the semantic richness of texts published on
the Web, but to exploit their full capabilities, specific XML-based notations must be written
to describe each and every resource. This usually requires a considerable amount of manual work,
and the thesis seeks ways to reduce that effort by proposing automatic or semi-automatic
approaches to ontology-based information retrieval.

In the experiments, conducted in the domain of scholarships, an ontology for scholarships was
thoroughly evaluated, and the names of disciplines were chosen as the target for further
information retrieval research.

Ontological concepts were discovered in the text by first scraping the webpage for the target
section and then applying a Boolean search method, both with and without prior preprocessing.
This approach produced very good results: with preprocessing, roughly 70% of the disciplines
were retrieved. Furthermore, an extension of the ontology is proposed as a way to increase the
extraction rate by a further 10 percentage points. Overall, 80% of the disciplines can be
retrieved by our method.
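
The Boolean search step can be illustrated with a minimal Python sketch. It is not the
implementation used in the thesis: the discipline list, the sample text, and the function names
`preprocess` and `boolean_search` are hypothetical, and the preprocessing shown (lowercasing and
punctuation removal) is only an assumed example of what such a step might include. The target
section is assumed to have been scraped from the webpage already.

```python
import re

# Hypothetical discipline names standing in for concepts from the scholarship ontology.
DISCIPLINES = ["Computer Science", "Mechanical Engineering", "Economics"]


def preprocess(text):
    """Lowercase the text and replace every run of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def boolean_search(section, concepts, use_preprocessing=True):
    """Return the concepts whose name is present in the section (a yes/no match)."""
    if use_preprocessing:
        # Pad with spaces so that only whole-word sequences match.
        haystack = f" {preprocess(section)} "
        return [c for c in concepts if f" {preprocess(c)} " in haystack]
    # Without preprocessing, fall back to an exact, case-sensitive substring test.
    return [c for c in concepts if c in section]


# Assumed example of an already scraped target section.
section = "Eligible fields: computer science, Economics and mechanical-engineering."

raw_hits = boolean_search(section, DISCIPLINES, use_preprocessing=False)
clean_hits = boolean_search(section, DISCIPLINES, use_preprocessing=True)
print(raw_hits)
print(clean_hits)
```

On this sample, the search without preprocessing finds only the discipline whose spelling and
casing match exactly, while the preprocessed search finds all three, which mirrors the kind of
improvement the preprocessing step is meant to provide.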