dc.description.abstract | Publicly accessible open transport data is provided by the public sector in an effort
to create new opportunities, stimulate innovation and enable new solutions that
benefit society. The number of datasets available is, however, limited. This
is partly due to the necessary but labor-intensive preparation of each
dataset. The datasets need to be annotated with descriptions that explain their purpose
and content. The search and retrieval functionality of current publishing platforms
is limited to classical keyword-based search, which is much more restricted
than the search technology used for finding information on the World Wide Web.
This is because, in most cases, information cannot be retrieved directly
from the data itself but depends on the dataset descriptions. Open datasets are encoded
in a wide variety of formats, which makes it difficult to reuse them directly in
software applications. This study investigates how a transport domain knowledge
model, namely an ontology of the transport domain, can enable data to be identified
in terms of its meaning in a given context, i.e. its semantics, and not by keywords
and tags alone. The study further investigates how semantic technology can be
applied to improve discoverability and reuse of datasets. This was done by first
developing a prototype framework for ontology-based semantic classification. The
framework works as a test bed that allows for different algorithms to be tested and
compared against different ontologies. The framework also includes an online
search engine that is used to measure the efficiency of the data
discovery method. This study further includes a conceptual design for a software
system that allows transport-related software applications to utilize datasets from
heterogeneous sources. The study finds that automated classification based on natural
language processing of dataset descriptions is possible and shows promising
results. This approach appears to improve the search and retrieval functionality of
limited datasets; however, it is currently sensitive to the quality of the description
text and needs to be developed further. | |