Automatic template generation

Rogstad, Erik; Ulseth, Øystein

dc.contributor.advisor	Amble, Tore	nb_NO
dc.contributor.advisor	Lech, Till Christopher	nb_NO
dc.contributor.author	Rogstad, Erik	nb_NO
dc.contributor.author	Ulseth, Øystein	nb_NO
dc.date.accessioned	2014-12-19T13:30:36Z
dc.date.available	2014-12-19T13:30:36Z
dc.date.created	2010-09-02	nb_NO
dc.date.issued	2006	nb_NO
dc.identifier	346668	nb_NO
dc.identifier	ntnudaim:1392	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/250037
dc.description.abstract	In natural language processing (NLP), templates define events and actions in text documents and are particularly useful for information extraction (IE). Traditionally, template generation is a manual process that is time-consuming and tedious, and the resulting templates are restricted to a limited number of knowledge domains. Automatic generation of templates from unstructured text is therefore useful for a wide range of applications. This thesis proposes a method for automatic generation of templates from unstructured text. The method learns templates from training sets of text documents and returns templates that capture stereotyped behavior in the document collections. In addition, the report proposes a method that uses the template sets to classify text documents and to extract information from them. To arrive at a set of templates that captures stereotyped behavior, predicate-argument structures (PA-structures) are first extracted from the documents. Next, all the PA-structures are transformed into a template representation. Finally, the templates are merged and the resulting template set is returned. Each template is given a shared information value (SI-value), which indicates the level of shared information captured in the template, in other words, to what extent the template describes stereotyped behavior in the domain. As an integral part of the system, a parser that extracts predicate-argument structures has been implemented. Precision and recall of the extractor are 89.7% and 79.1%, respectively. The generated template sets have proven to be very useful both for classifying text documents and for extracting information from them.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.subject	ntnudaim	no_NO
dc.subject	SIF2 datateknikk	no_NO
dc.subject	Intelligente systemer	no_NO
dc.title	Automatic template generation	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	124	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO

Associated file(s)

Filename: 346668_COVER01.pdf
Size: 47.50Kb
Format: PDF

Locked

Filename: 346668_FULLTEXT01.pdf
Size: 1.607Mb
Format: PDF

Locked

Filename: 346668_ATTACHMENT01.zip
Size: 402.0Kb
Format: Unknown

Locked

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]
