CLIRch, an extensible open source framework for query translation: evaluated for use on the Norwegian/Spanish language pair.
CLIR, Cross-Lingual Information Retrieval, is a field of research that can be highly useful in web search and for several other applications. Extensive research has been done on possible CLIR implementations, but as of yet there are no open source frameworks or applications readily available. The thesis focuses on building such a framework and evaluating it for use on the Norwegian/Spanish language pair.

The framework implemented uses query translation to submit queries to existing information retrieval (IR) implementations, and the framework itself holds no low-level IR algorithms. Experiments were performed on a small parallel corpus of Norwegian and Spanish texts, using the Xapian and PostgreSQL IR implementations. A comprehensive comparison of possible configurations was done, and certain measures were shown to be effective when searching for documents in either language.

The framework is implemented in a modular architecture, allowing the suggested additions and amendments to be implemented as add-on components. This is the main intent of the framework, and eases the process of building support for additional languages as well. For easing the adoption of the framework, additional components and data may be beneficial.

Some improvements are also possible for the tested language pair, through obtaining larger data sets or implementing certain language-specific algorithms. Of particular interest is implementing effective decompounding of Norwegian compound words and phrase translation support. Suggestions are also made for how the system can be used to perform CLIR tasks in other languages.
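
To illustrate the general idea of modular query translation described above, the following is a minimal sketch in Python. It is not taken from the thesis and does not reflect the actual CLIRch API: the names `make_dictionary_translator`, `lowercase`, `translate_query` and `TOY_LEXICON` are hypothetical, and the tiny Norwegian-to-Spanish lexicon is invented for the example. Each component is a plain function over a list of query terms, so further steps (for instance a decompounder) could be added to the chain without changing the rest.

```python
from typing import Callable, Dict, List

# Hypothetical component type: takes query terms, returns transformed terms.
Component = Callable[[List[str]], List[str]]


def lowercase(terms: List[str]) -> List[str]:
    """Normalise case so that lexicon lookups are case-insensitive."""
    return [term.lower() for term in terms]


def make_dictionary_translator(lexicon: Dict[str, List[str]]) -> Component:
    """Build a component that replaces each term with its known translations.

    Terms missing from the lexicon are passed through unchanged, which is a
    common fallback for names and other out-of-vocabulary words.
    """
    def translate(terms: List[str]) -> List[str]:
        translated: List[str] = []
        for term in terms:
            translated.extend(lexicon.get(term, [term]))
        return translated
    return translate


def translate_query(query: str, components: List[Component]) -> str:
    """Run a whitespace-tokenised query through each component in order."""
    terms = query.split()
    for component in components:
        terms = component(terms)
    return " ".join(terms)


# Toy Norwegian -> Spanish lexicon, assumed for illustration only.
TOY_LEXICON = {"hus": ["casa"], "bil": ["coche", "automóvil"]}

pipeline = [lowercase, make_dictionary_translator(TOY_LEXICON)]
print(translate_query("Hus bil Oslo", pipeline))
```

The translated string would then be handed to an existing IR backend as an ordinary query; that step is omitted here because it depends on the backend in use.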