
dc.contributor.advisor: Bratsberg, Svein Erik [nb_NO]
dc.contributor.advisor: Torbjørnsen, Øystein [nb_NO]
dc.contributor.author: Fellinghaug, Asbjørn Alexander [nb_NO]
dc.date.accessioned: 2014-12-19T13:32:15Z
dc.date.available: 2014-12-19T13:32:15Z
dc.date.created: 2010-09-03 [nb_NO]
dc.date.issued: 2008 [nb_NO]
dc.identifier: 347620 [nb_NO]
dc.identifier: ntnudaim:3429 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/250656
dc.description.abstract: Phrase searching in text indexes: compare different approaches to phrase searching, and consider a new approach in which bigrams are used as index terms. This master's thesis focuses on the challenges of phrase searching in large text indexes and assesses alternative approaches for coping with such indexes. This goal was achieved by performing an experiment based on the theory of using bigrams containing stopwords as additional index terms. Recognizing the characteristics of inverted index structures, we used stopwords as indicators of severely long posting lists. The characteristics of stopwords proved valuable, and the stopwords were collected from an already established index for a subset of the TREC GOV2 collection. As alternative approaches, we outlined two state-of-the-art index structures specifically designed to cope with the challenges of phrase searching. The first structure, the nextword index, is a modification of the inverted index structure. The second structure, the phrase index, uses the inverted structure with complete phrases as index terms. Our bigram index relies on the same manipulation of the inverted index structure as the phrase index, using bigrams of words to drastically cut posting list lengths. This was one of our main goals, as we identified the posting list lengths of stopwords as one of the primary challenges of phrase searching in inverted index structures. Using stopwords to create and select bigrams proved successful in enhancing phrase searching, as response times improved substantially. We conclude that our bigram index provides a significant performance increase in terms of query evaluation time, and outperforms the standard inverted index for phrase searching. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.subject: ntnudaim [no_NO]
dc.subject: MIT informatikk [no_NO]
dc.subject: Informasjonsforvaltning [no_NO]
dc.title: Phrase searching in text indexes [nb_NO]
dc.type: Master thesis [nb_NO]
dc.source.pagenumber: 137 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]
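
The idea summarized in the abstract, adding bigrams that contain stopwords as extra index terms so that phrase queries can avoid the long posting lists of single stopwords, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the stopword list, the whitespace tokenization, the rule that a bigram is indexed when either of its words is a stopword, and the function names `build_indexes` and `phrase_search`.

```python
from collections import defaultdict

# Hypothetical stopword list; the thesis collected its stopwords from an existing index.
STOPWORDS = {"the", "of", "to", "in", "a"}


def build_indexes(docs):
    """Return (word_index, bigram_index); each maps term -> {doc_id: [positions]}."""
    word_index = defaultdict(lambda: defaultdict(list))
    bigram_index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for pos, tok in enumerate(tokens):
            word_index[tok][doc_id].append(pos)
        for pos in range(len(tokens) - 1):
            first, second = tokens[pos], tokens[pos + 1]
            # Assumed selection rule: index a bigram only if it contains a stopword.
            if first in STOPWORDS or second in STOPWORDS:
                bigram_index[(first, second)][doc_id].append(pos)
    return word_index, bigram_index


def phrase_search(phrase, word_index, bigram_index):
    """Return the sorted ids of documents that contain the exact phrase."""
    tokens = phrase.lower().split()
    # Cover the phrase with (offset, postings) parts, preferring bigram terms
    # because their posting lists are shorter than those of single stopwords.
    parts = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in bigram_index:
            parts.append((i, bigram_index[pair]))
            i += 2
        else:
            parts.append((i, word_index.get(tokens[i], {})))
            i += 1
    if not parts:
        return []
    hits = []
    _, first_postings = parts[0]  # the first part always starts at offset 0
    for doc_id, positions in first_postings.items():
        for start in positions:
            # Every later part must occur at its offset relative to the start.
            if all(start + off in postings.get(doc_id, ())
                   for off, postings in parts[1:]):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    1: "the state of the art",
    2: "art of the state",
    3: "state of art",
}
word_index, bigram_index = build_indexes(docs)
print(phrase_search("state of the art", word_index, bigram_index))
```

In this toy collection the query is answered from the two bigram terms ("state", "of") and ("the", "art") alone, so the posting list of the single word "the" is never read. A real index would merge sorted posting lists rather than test list membership, which is kept here only for brevity.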

