
dc.contributor.advisor [nb_NO]: Gamback, Bjørn
dc.contributor.author [nb_NO]: Neergaard, Morten Minde
dc.date.accessioned: 2014-12-19T13:38:35Z
dc.date.available: 2014-12-19T13:38:35Z
dc.date.created [nb_NO]: 2012-11-08
dc.date.issued [nb_NO]: 2012
dc.identifier [nb_NO]: 565861
dc.identifier [nb_NO]: ntnudaim:5665
dc.identifier.uri: http://hdl.handle.net/11250/252891
dc.description.abstract [nb_NO]:
CLIR (Cross-Lingual Information Retrieval) is a field of research that can be highly useful in web search and for several other applications. Extensive research has been done on possible CLIR implementations, but as yet there are no open source frameworks or applications readily available. The thesis focuses on building such a framework and evaluating it for use on the Norwegian/Spanish language pair.

The implemented framework uses query translation to submit queries to existing information retrieval (IR) implementations; the framework itself contains no low-level IR algorithms. Experiments were performed on a small parallel corpus of Norwegian and Spanish texts, using the Xapian and PostgreSQL IR implementations. A comprehensive comparison of possible configurations was done, and certain measures were shown to be effective when searching for documents in either language.

The framework has a modular architecture, allowing the suggested additions and amendments to be implemented as add-on components. This is the main intent of the framework, and it also eases the process of building support for additional languages. To ease adoption of the framework, additional components and data may be beneficial.

Some improvements are also possible for the tested language pair, through obtaining larger data sets or implementing certain language-specific algorithms. Of particular interest are effective decompounding of Norwegian compound words and support for phrase translation. Suggestions are also made for how the system can be used to perform CLIR tasks in other languages.
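dc.language [nb_NO]: eng
dc.publisher [nb_NO]: Institutt for datateknikk og informasjonsvitenskap (Department of Computer and Information Science)
dc.subject [no_NO]: ntnudaim:5665
dc.subject [no_NO]: MIT informatikk (Informatics)
dc.subject [no_NO]: Kunstig intelligens og læring (Artificial Intelligence and Learning)
dc.title [nb_NO]: CLIRch, an extensible open source framework for query translation: evaluated for use on the Norwegian/Spanish language pair.
dc.type [nb_NO]: Master thesis
dc.source.pagenumber [nb_NO]: 64
dc.contributor.department [nb_NO]: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap (Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science)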

