Show simple item record

dc.contributor.advisorAmble, Torenb_NO
dc.contributor.advisorGran, Stein Joarnb_NO
dc.contributor.authorOlsen, Bjørn Haraldnb_NO
dc.date.accessioned2014-12-19T13:34:18Z
dc.date.available2014-12-19T13:34:18Z
dc.date.created2010-09-05nb_NO
dc.date.issued2006nb_NO
dc.identifier349025nb_NO
dc.identifierntnudaim:1219nb_NO
dc.identifier.urihttp://hdl.handle.net/11250/251478
dc.description.abstractThis work has focused on improving retrieval performance of search in Norwegian document collections. The initiator of the thesis, InfoFinder Norge, desired an Norwegian analyzer for DotLucene. The standard analyzer used before did not support stopword elimination and stemming for Norwegian language. Norwegian Analyzer and standard analyzer were used in turns on the same document collection before indexing and querying, then the respective results were compared to discover efficiency improvements. An evaluation method based on Term Relevance Sets were investigated and used on DotLucene with use of the two analyzer approaches. Term Relevance Sets methodology were also compared with common measurements for relevance judging, and found useful for evaluation of IR systems. The evaluation results of Norwegian analyzer and standard analyzer gave clear indications that use of stopword elimination and stemming for Norwegian documents improves retrieval efficiency. Term Relevance Set-based evaluation was found reliable by comparing the results with precision measurements. Precision was increased with 16% with use of Norwegian Analyzer compared to use an standard analyzer with no content preprocessing support for Norwegian. Term Relevance Set evaluation with use of 10 ontopic terms and 10 offtopic terms gave an increased $tScore$ of 44%. The results show that counting term occurrences in the content of retrieved documents can be used to gain confidence that documents are either relevant or not relevant.nb_NO
dc.languageengnb_NO
dc.publisherInstitutt for datateknikk og informasjonsvitenskapnb_NO
dc.subjectntnudaimno_NO
dc.subjectSIF2 datateknikkno_NO
dc.subjectIntelligente systemerno_NO
dc.titleImplementation and evaluation of Norwegian Analyzer for use with DotLucenenb_NO
dc.typeMaster thesisnb_NO
dc.source.pagenumber114nb_NO
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskapnb_NO


Files in this item

Thumbnail
Thumbnail
Thumbnail

This item appears in the following Collection(s)

Show simple item record