The use of graph databases in file retrieval

Fostvedt, Fredrik Persen; Eriksen, Stephan Nordnes

dc.contributor.advisor	Bratsberg, Svein Erik	nb_NO
dc.contributor.author	Fostvedt, Fredrik Persen	nb_NO
dc.contributor.author	Eriksen, Stephan Nordnes	nb_NO
dc.date.accessioned	2014-12-19T13:41:26Z
dc.date.available	2014-12-19T13:41:26Z
dc.date.created	2014-09-30	nb_NO
dc.date.issued	2014	nb_NO
dc.identifier	751071	nb_NO
dc.identifier	ntnudaim:11372	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/253739
dc.description.abstract	Files and the unique terms they contain can be modeled as a graph in which the vertices are files and terms and the edges describe containment. Can a graph database be used for search and retrieval of local files? What problems arise, and which optimizations can be made? How does such a method compare to today's file retrieval methods? This project approaches the problem as a potentially commercializable software application. The intent is to create an environment where graph-based file retrieval algorithms can easily be created, explored, tested and put into production. A highly modifiable Ruby-based client-server file retrieval application using Titan Aurelius Graph Database and rexpro was created. The server side consists of a Ruby on Rails back end with a rexpro connection to the graph database. The server can manage connections from several clients. The client side allows users to index their files in the graph database on the server and run search queries for strings. Algorithms written in Groovy for Titan Aurelius can easily be implemented and tested on the server. Although the application is well suited for testing graph database file retrieval algorithms, only one was designed, implemented and tested, owing to the time constraints on the project. That algorithm was run on the indexed files of one of the project members, using a handful of subjectively chosen search terms. It was a relatively simple algorithm that did not exploit the full potential of a graph-based file retrieval solution. The test was done to get an initial feel for the precision and recall of the algorithm and to compare it with OSX Spotlight, which is the most highly developed local file retrieval service. The framework proved simple enough for running and testing algorithms. 
Because little test-driven development was involved, some uncertainty remains about what results the tested algorithms actually produced. The one algorithm that was designed and tested was pitted against OSX Spotlight. It showed significantly lower performance than OSX Spotlight in terms of average precision and recall. Many reasons for this were identifiable; for instance, file types that were very unlikely to be a match were not filtered out. In a few cases, the application performed better than OSX Spotlight. It is too soon to determine for certain whether a graph-based file retrieval solution can compete with today's solutions. It does, however, achieve some precision and recall, and it has the potential to be significantly improved from its current state.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.title	The use of graph databases in file retrieval	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	135	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO

Associated file(s)

Filename: 751071_FULLTEXT01.pdf
Size: 3.272 MB
Format: PDF

Locked

Filename: 751071_COVER01.pdf
Size: 184.3 kB
Format: PDF

Locked

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6778]
