Show simple item record

dc.contributor.advisorOlstad, Bjørnnb_NO
dc.contributor.advisorTorbjørnsen, Øysteinnb_NO
dc.contributor.authorGrimsmo, Nilsnb_NO
dc.date.accessioned2014-12-19T13:33:04Z
dc.date.available2014-12-19T13:33:04Z
dc.date.created2010-09-03nb_NO
dc.date.issued2005nb_NO
dc.identifier348092nb_NO
dc.identifierntnudaim:1034nb_NO
dc.identifier.urihttp://hdl.handle.net/11250/250960
dc.description.abstractThis report explores the problem of substring search in a dynamic document set. The operations supported are document inclusion, document removal and queries. This is a well explored field for word indexes, but not for substring indexes. The contributions of this report is the exploration of a multi-document dynamic suffix tree (MDST), which is compared with using a hierarchy of static indexes using suffix arrays. Only memory resident data structures are explored. The concept of a ``generalised suffix tree'', indexing a static set of strings, is used in bioinformatics. The implemented data structure adds online document inclusion, update and removal, linear on the single document size. Various models for the hierarchy of static indexes is explored, some which of give faster update, and some faster search. For the static suffix arrays, the BPR cite{SS05} construction algorithm is used, which is the fastest known. This algorithm is about 3-4 times faster than the implemented suffix tree construction. Two tricks for speeding up search and hit reporting in the suffix array are also explored: Using a start index for the binary search, and a direct map of global addresses to document IDs and local addresses. The tests show that the MDST is much faster than the hierarchic indexes when the index freshness requirement is absolute, and the documents are small. The tree uses about three times as much memory as the suffix arrays. When there is a large number of hits, the suffix arrays are slightly faster on reporting hits, as there they have better memory locality. If you have enough primary memory, the MDST seems to be the best choice in general.nb_NO
dc.languageengnb_NO
dc.publisherInstitutt for datateknikk og informasjonsvitenskapnb_NO
dc.subjectntnudaimno_NO
dc.subjectSIF2 datateknikkno_NO
dc.subjectProgram- og informasjonssystemerno_NO
dc.titleDynamic indexes vs. static hierarchies for substring searchnb_NO
dc.typeMaster thesisnb_NO
dc.source.pagenumber78nb_NO
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskapnb_NO


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record