A Framework for Ontology Based Semantic Search
Abstract
Publicly accessible open transport data is provided by the public sector in an effort to create new opportunities, stimulate innovation and enable new solutions that benefit society. The number of available datasets is, however, limited. This is partially due to the necessary, but labor-intensive, preparation process of each dataset. The datasets need to be annotated with descriptions that explain their purpose and content. The search and retrieval functionality of current publishing platforms is limited to classical keyword-based search, which is much more restricted than the search technology used for finding information on the World Wide Web. This is because information in most cases cannot be retrieved directly from the data itself, but depends on the dataset descriptions. Open datasets are encoded in a rich variety of formats, which makes it difficult to reuse them directly in software applications. This study investigates how a transport domain knowledge model, namely an ontology of the transport domain, can enable data to be identified in terms of its meaning in a given context, i.e. semantics, and not by keywords and tags alone. The study further investigates how semantic technology can be applied to improve discoverability and reuse of datasets. This was done by initially developing a prototype framework for ontology-based semantic classification. The framework works as a test bed that allows different algorithms to be tested and compared against different ontologies. The framework also includes the development of an online search engine that is used to measure the efficiency of the data discovery method. This study further includes a conceptual design for a software system that allows transport-related software applications to utilize datasets from heterogeneous sources. The study finds that automated classification based on natural language processing of dataset descriptions is possible and shows promising results.
This approach appears to improve the search and retrieval functionality of limited datasets; however, it is currently sensitive to the quality of the description text and needs to be developed further.
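
To make the idea of classifying datasets from their descriptions concrete, the following is a minimal sketch and not the framework developed in this study. It assumes a toy ontology represented as a mapping from concept names to label terms, and a simple term-overlap score; the concept names, label terms, and the functions `tokenize`, `classify` and `search` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical toy ontology (an assumption for illustration only):
# each concept name maps to a set of label terms.
ONTOLOGY = {
    "PublicTransport": {"bus", "tram", "timetable", "route", "stop"},
    "RoadTraffic": {"road", "traffic", "congestion", "vehicle", "speed"},
    "Parking": {"parking", "garage", "space", "occupancy"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(description, ontology=ONTOLOGY):
    """Score each concept by the fraction of description tokens that
    match one of its label terms; return matches, best score first."""
    tokens = tokenize(description)
    if not tokens:
        return []
    counts = Counter(tokens)
    scores = {}
    for concept, labels in ontology.items():
        hits = sum(counts[term] for term in labels)
        if hits:
            scores[concept] = hits / len(tokens)
    # Sort by descending score, then by concept name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def search(query, datasets, ontology=ONTOLOGY):
    """Return the names of datasets that share at least one concept
    with the query, instead of requiring a shared keyword."""
    query_concepts = {concept for concept, _ in classify(query, ontology)}
    results = []
    for name, description in datasets.items():
        dataset_concepts = {c for c, _ in classify(description, ontology)}
        if query_concepts & dataset_concepts:
            results.append(name)
    return results
```

In this sketch, a query such as "tram timetable" would retrieve a dataset described as "Bus stop locations" because both map to the same hypothetical concept, even though they share no keyword. The sketch also shows the sensitivity noted above: a description that uses none of the label terms is not classified at all.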