Gauging triple stores with actual biological data

Mironov, Vladimir; Seethappan, Nirmala; Blondé, Ward; Antezana, Erick; Splendiani, Andrea; Kuiper, Martin

dc.contributor.author	Mironov, Vladimir
dc.contributor.author	Seethappan, Nirmala
dc.contributor.author	Blondé, Ward
dc.contributor.author	Antezana, Erick
dc.contributor.author	Splendiani, Andrea
dc.contributor.author	Kuiper, Martin
dc.date.accessioned	2015-09-21T11:56:50Z
dc.date.accessioned	2015-12-03T12:09:03Z
dc.date.available	2015-09-21T11:56:50Z
dc.date.available	2015-12-03T12:09:03Z
dc.date.issued	2012
dc.identifier.citation	BMC Bioinformatics 2012, 13	nb_NO
dc.identifier.issn	1471-2105
dc.identifier.uri	http://hdl.handle.net/11250/2366703
dc.description.abstract	Background: Semantic Web technologies have been developed to overcome the limitations of the current Web and conventional data integration solutions. The Semantic Web is expected to link all the data present on the Internet instead of linking just documents. One of the foundations of the Semantic Web technologies is the knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a language designed for querying RDF-based models. The Semantic Web technologies should allow federated queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries as applied to a number of different triple store implementations. Results: Previously we developed a library of queries to guide the use of our knowledge base Cell Cycle Ontology implemented as a triple store. We have now compared the performance of these queries on five non-commercial triple stores: OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We examined three performance aspects: the data uploading time, the query execution time and the scalability. The queries we had chosen addressed diverse ontological or biological questions, and we found that individual store performance was quite query-specific. We identified three groups of queries displaying similar behaviour across the different stores: 1) relatively short response time queries, 2) moderate response time queries and 3) relatively long response time queries. SwiftOWLIM proved to be a winner in the first group, 4Store in the second one and Virtuoso in the third one. Conclusions: Our analysis showed that some queries behaved idiosyncratically, in a triple store specific manner, mainly with SwiftOWLIM and 4Store. Virtuoso, as expected, displayed a very balanced performance - its load time and its response time for all the tested queries were better than average among the selected stores; it showed a very good scalability and a reasonable run-to-run reproducibility. Jena SDB and Jena TDB were consistently slower than the other three implementations. Our analysis demonstrated that most queries developed for Virtuoso could be successfully used for other implementations.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	BioMed Central	nb_NO
dc.title	Gauging triple stores with actual biological data	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	en_GB
dc.date.updated	2015-09-21T11:56:50Z
dc.source.volume	13	nb_NO
dc.source.journal	BMC Bioinformatics	nb_NO
dc.identifier.doi	10.1186/1471-2105-13-S1-S3
dc.identifier.cristin	948425
dc.description.localcode	© 2012 Mironov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.	nb_NO

Associated file(s)

Filename: 1471-2105-13-S1-S3.pdf
Size: 262.9 KB
Format: PDF

This item appears in the following collection(s)

Department of Biology (Institutt for biologi) [2514]
Publications from CRIStin - NTNU [37221]
