Open-Domain Word-Level Interpretation of Norwegian: Towards a General Encyclopedic Question-Answering System for Norwegian

Ranang, Martin Thorsen

dc.contributor.advisor	Amble, Tore	nb_NO
dc.contributor.advisor	Nordgård, Torbjørn	nb_NO
dc.contributor.advisor	Gambäck, Björn	nb_NO
dc.contributor.author	Ranang, Martin Thorsen	nb_NO
dc.date.accessioned	2014-12-19T13:30:31Z
dc.date.available	2014-12-19T13:30:31Z
dc.date.created	2010-01-11	nb_NO
dc.date.issued	2010	nb_NO
dc.identifier	293836	nb_NO
dc.identifier.isbn	978-82-471-1973-0	nb_NO
dc.identifier.isbn	978-82-471-1974-7	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/250007
dc.description.abstract	No large-scale, open-domain semantic resource for Norwegian with a rich set of semantic relations currently exists. The existing semantic resources for Norwegian are limited in size, incompatible with the de facto standard resources used for Natural Language Processing for English, or both. The current and future cultural, technological, economic, and educational consequences of the scarcity of advanced Norwegian language-technological solutions and resources have been widely acknowledged (Simonsen 2005; Norwegian Language Council 2005; Norwegian Ministry of Culture and Church Affairs 2008). This dissertation presents (1) a novel method, consisting of a model and several algorithms, for automatically mapping content words from a non-English source language to (a power set of) WordNet (Miller 1995; Fellbaum 1998) senses with average precision of up to 92.1 % and recall of up to 36.5 %. Because an important feature of the method is its ability to handle compounds correctly, this dissertation also presents (2) a practical implementation, including algorithms and a grammar, of a program for automatically analyzing Norwegian compounds. This work also shows (3) how Verto, an implementation of the model and algorithms, is used to create Ordnett, a large-scale, open-domain lexical-semantic resource for Norwegian with a rich set of semantic relations. Finally, this work argues that the new method and the automatically generated resource make it possible to build large-scale, open-domain Natural Language Understanding systems for Norwegian texts that offer both wide coverage and deep analyses. This is done by showing (4) how Ordnett can be used in an open-domain question answering system that automatically extracts and acquires knowledge from Norwegian encyclopedic articles and uses the acquired knowledge to answer questions formulated in natural language by its users.
The open-domain question answering system, named TUClopedia, is based on The Understanding Computer (Amble 2003), which has previously been applied successfully to narrow domains.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Norges teknisk-naturvitenskapelige universitet	nb_NO
dc.relation.ispartofseries	Doctoral Theses at NTNU, 1503-8181; 2010:11	nb_NO
dc.subject	ontology	en_GB
dc.subject	natural language understanding	en_GB
dc.subject	mapping	en_GB
dc.subject	knowledge extraction	en_GB
dc.subject	question answering	en_GB
dc.subject	Norwegian	en_GB
dc.subject	WordNet	en_GB
dc.title	Open-Domain Word-Level Interpretation of Norwegian: Towards a General Encyclopedic Question-Answering System for Norwegian	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.source.pagenumber	233	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.description.degree	PhD i informasjons- og kommunikasjonsteknologi	nb_NO
dc.description.degree	PhD in Information and Communications Technology	en_GB

Associated file(s)

Filename: 293836_FULLTEXT01.pdf
Size: 2.328Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6778]
