Vis enkel innførsel

dc.contributor.advisorBratsberg, Svein Eriknb_NO
dc.contributor.advisorTorbjørnsen, Øysteinnb_NO
dc.contributor.authorNatvig, Olanb_NO
dc.date.accessioned2014-12-19T13:32:43Z
dc.date.available2014-12-19T13:32:43Z
dc.date.created2010-09-03nb_NO
dc.date.issued2010nb_NO
dc.identifier347984nb_NO
dc.identifierntnudaim:5488nb_NO
dc.identifier.urihttp://hdl.handle.net/11250/250820
dc.description.abstractThe structure of XML documents can be used by search engines to answer structured queries or to provide better relevancy. Several index structures exist for search in XML data. This study focuses on inverted lists with dictionary coded path types and dewey coded path instances. The dewey coded path index is large, but could be compressed. This study examines query processing with indexes encoded using well known integer coding methods VByte and PFor(delta) and methods tailored for the dewey index. Intersection queries and structural queries are evaluated. In addition to standard document level skipping, skip operations for path types are implemented and evaluated. Four extensions over plain PFor methods are proposed and tested. Path type sorting sorts dewey codes on their path types and store all deweys from one path type together. Column wise dewey storage stores the deweys in columns instead of rows. Prefix coding a well known method, is adapted to the column wise dewey storage, and dynamic column wise method chooses between row wise and column wise storage based on the compressed data. Experiments are performed on a XML collection based on Wikipedia. Queries are generated from the TREC 06 efficiency task query trace. Several different types of structural queries have been executed. Experiments show that column wise methods perform good on both intersection and structural queries. The dynamic column wise scheme is in most cases the best, and creates the smallest index. Special purpose skipping for path types makes some queries extremely fast and can be implemented with only limited storage footprint. The performance of in-memory search with multi-threaded query execution is limited by memory bandwidth.nb_NO
dc.languageengnb_NO
dc.publisherInstitutt for datateknikk og informasjonsvitenskapnb_NO
dc.subjectntnudaimno_NO
dc.subjectSIF2 datateknikkno_NO
dc.subjectKomplekse datasystemerno_NO
dc.titleCompression in XML search enginesnb_NO
dc.typeMaster thesisnb_NO
dc.source.pagenumber143nb_NO
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskapnb_NO


Tilhørende fil(er)

Thumbnail
Thumbnail
Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel