Gauging triple stores with actual biological data
Mironov, Vladimir; Seethappan, Nirmala; Blondé, Ward; Antezana, Erick; Splendiani, Andrea; Kuiper, Martin
Journal article, Peer reviewed
Date
2012
Collections
- Institutt for biologi [2615]
- Publikasjoner fra CRIStin - NTNU [38688]
Abstract
Background: Semantic Web technologies have been developed to overcome the limitations of the current Web
and conventional data integration solutions. The Semantic Web is expected to link all the data present on the
Internet instead of linking just documents. One of the foundations of the Semantic Web technologies is the
knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is
typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a
language designed for querying RDF-based models. The Semantic Web technologies should allow federated
queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries
as applied to a number of different triple store implementations.
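To make the triple and query model concrete, the following is a minimal sketch of how subject-predicate-object triples can be matched against patterns containing variables, similar in spirit to a SPARQL basic graph pattern. It is a toy in-memory illustration, not how any of the benchmarked stores works, and every identifier in it (the `ex:` terms, `match`, `query`) is a hypothetical name chosen for this example rather than something taken from the paper or the Cell Cycle Ontology.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy data: illustrative identifiers only, not real ontology terms.
TRIPLES: List[Triple] = [
    ("ex:CDK1", "rdf:type", "ex:Protein"),
    ("ex:CDK1", "ex:participates_in", "ex:mitotic_cell_cycle"),
    ("ex:CCNB1", "rdf:type", "ex:Protein"),
    ("ex:CCNB1", "ex:interacts_with", "ex:CDK1"),
]


def match(pattern: Triple, triples: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield candidate


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Join several triple patterns by extending each partial solution in turn."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, triples, s)]
    return solutions
```

For example, `query([("?p", "rdf:type", "ex:Protein"), ("?p", "ex:interacts_with", "?q")], TRIPLES)` asks for proteins that interact with something. A real triple store answers such patterns through indexes and a query optimiser rather than by scanning a list, which is one reason performance can differ so much between implementations.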
Results: Previously we developed a library of queries to guide the use of our knowledge base Cell Cycle Ontology
implemented as a triple store. We have now compared the performance of these queries on five non-commercial
triple stores: OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We examined
three performance aspects: the data uploading time, the query execution time and the scalability. The queries we
had chosen addressed diverse ontological or biological questions, and we found that individual store performance
was quite query-specific. We identified three groups of queries displaying similar behaviour across the different
stores: 1) relatively short response time queries, 2) moderate response time queries and 3) relatively long response
time queries. SwiftOWLIM proved to be a winner in the first group, 4Store in the second one and Virtuoso in the
third one.
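The kind of measurement described above can be sketched as a small timing harness. This is an illustration under stated assumptions, not the authors' benchmarking procedure: the function names (`time_runs`, `summarize`, `fastest_store`), the number of repeats, and the use of the median and the min-max spread are all choices made for this example. The `run_query` callable stands in for whatever client call submits one SPARQL query to one store.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Return the wall-clock seconds taken by each of `repeats` executions."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Median as the typical response time; spread as a rough
    indicator of run-to-run reproducibility (assumed metrics)."""
    return {
        "median": statistics.median(timings),
        "spread": max(timings) - min(timings),
    }


def fastest_store(medians: Dict[str, float]) -> str:
    """Name of the store with the lowest median response time for one query."""
    return min(medians, key=medians.get)
```

Applying `fastest_store` query by query, rather than to one aggregate figure, would reflect the observation that store performance was quite query-specific.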
Conclusions: Our analysis showed that some queries behaved idiosyncratically, in a triple-store-specific manner,
mainly with SwiftOWLIM and 4Store. Virtuoso, as expected, displayed a very balanced performance: its load time
and its response time for all the tested queries were better than average among the selected stores, and it showed
very good scalability and reasonable run-to-run reproducibility. Jena SDB and Jena TDB were consistently slower
than the other three implementations. Our analysis demonstrated that most queries developed for Virtuoso could
be successfully used for other implementations.