Implementation and evaluation of Norwegian Analyzer for use with DotLucene

Olsen, Bjørn Harald

dc.contributor.advisor	Amble, Tore
dc.contributor.advisor	Gran, Stein Joar
dc.contributor.author	Olsen, Bjørn Harald
dc.date.accessioned	2018-11-05T15:00:59Z
dc.date.available	2018-11-05T15:00:59Z
dc.date.created	2006-06-15
dc.date.issued	2006
dc.identifier	ntnudaim:1219
dc.identifier.uri	http://hdl.handle.net/11250/2571077
dc.description.abstract	This work has focused on improving the retrieval performance of search in Norwegian document collections. The initiator of the thesis, InfoFinder Norge, wanted a Norwegian analyzer for DotLucene, because the standard analyzer used before did not support stopword elimination and stemming for the Norwegian language. The Norwegian Analyzer and the standard analyzer were used in turn on the same document collection before indexing and querying, and the respective results were compared to discover efficiency improvements. An evaluation method based on Term Relevance Sets was investigated and applied to DotLucene with each of the two analyzer approaches. The Term Relevance Sets methodology was also compared with common measures for relevance judgment and found useful for evaluating IR systems. The evaluation results for the Norwegian Analyzer and the standard analyzer gave clear indications that stopword elimination and stemming for Norwegian documents improve retrieval efficiency. Term Relevance Set-based evaluation was found reliable by comparing its results with precision measurements. Precision increased by 16% with the Norwegian Analyzer compared to a standard analyzer with no content preprocessing support for Norwegian. Term Relevance Set evaluation using 10 on-topic terms and 10 off-topic terms gave a 44% increase in tScore. The results show that counting term occurrences in the content of retrieved documents can be used to gain confidence that documents are either relevant or not relevant.
dc.language	eng
dc.publisher	NTNU
dc.subject	Computer Science, Intelligent Systems
dc.title	Implementation and evaluation of Norwegian Analyzer for use with DotLucene
dc.type	Master thesis
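The abstract attributes the retrieval gains to stopword elimination and stemming. As a rough illustration of what those two preprocessing steps do to Norwegian text, here is a minimal Python sketch. It is not the thesis's Norwegian Analyzer, which was built for DotLucene: the stopword sample, the suffix list, and the names `analyze` and `stem` are hypothetical, and the crude longest-suffix stripping stands in for a proper Norwegian stemmer such as the Snowball one.

```python
import re

# Hypothetical, abbreviated stopword sample: a few very common Norwegian
# Bokmål function words. A real analyzer would use a complete list.
STOPWORDS = {
    "og", "i", "jeg", "det", "at", "en", "et", "den", "til", "er",
    "som", "på", "de", "med", "han", "av", "ikke", "der", "så", "var",
    "for", "om", "har", "seg", "kan",
}

# Hypothetical inflectional suffixes, ordered longest first so that
# "-ene" is tried before "-en" and "-e".
SUFFIXES = ("ende", "ene", "ane", "ede", "er", "en", "et", "e")

# Tokens are runs of lowercase Latin letters (including æ, ø, å) or digits.
TOKEN_RE = re.compile(r"[a-zæøå0-9]+")


def stem(token: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, but only if at least min_stem chars remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase and tokenize the text, drop stopwords, then stem each remaining token."""
    tokens = TOKEN_RE.findall(text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(analyze("Bilene og husene i Oslo er gamle"))
    print(analyze("bil"))
```

With this sketch, the document text "Bilene og husene i Oslo er gamle" is reduced to `['bil', 'hus', 'oslo', 'gaml']`, so a query for "bil" now matches a document that only contains the inflected form "bilene", while the function words "og", "i" and "er" no longer take up index space or influence scoring. An analyzer without these steps would index "bilene" verbatim and miss that match.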

Associated file(s)

Filename: 1219_FULLTEXT.pdf
Size: 1.438 MB
Format: PDF


Filename: 1219_ATTACHMENT.zip
Size: 30.35 MB
Format: application/zip


Filename: 1219_COVER.pdf
Size: 47.55 KB
Format: PDF


This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6623]
