dc.contributor.advisor: Amble, Tore (nb_NO)
dc.contributor.advisor: Lech, Till Christopher (nb_NO)
dc.contributor.author: Rogstad, Erik (nb_NO)
dc.contributor.author: Ulseth, Øystein (nb_NO)
dc.date.accessioned: 2014-12-19T13:30:36Z
dc.date.available: 2014-12-19T13:30:36Z
dc.date.created: 2010-09-02 (nb_NO)
dc.date.issued: 2006 (nb_NO)
dc.identifier: 346668 (nb_NO)
dc.identifier: ntnudaim:1392 (nb_NO)
dc.identifier.uri: http://hdl.handle.net/11250/250037
dc.description.abstract: In natural language processing (NLP), templates define events and actions in text documents. In particular, templates are useful for information extraction (IE). Traditionally, template generation is a manual process, which is time-consuming and tedious. Additionally, such templates are restricted to a limited number of knowledge domains. With these considerations in mind, automatic generation of templates from unstructured text is useful for a wide range of applications. This thesis proposes a method for automatic generation of templates from unstructured text. The method learns templates from training sets of text documents and returns templates that capture stereotyped behavior in the document collections. In addition, the thesis proposes a method that uses the template sets to classify text documents and extract information from them. To arrive at a set of templates that captures stereotyped behavior, predicate-argument structures (PA-structures) are first extracted from the documents. Next, all the PA-structures are transformed into a template representation. Finally, the templates are merged and the resulting template set is returned. Every template is given a shared information value (SI-value). SI-values indicate the level of shared information captured in the templates, in other words, to what extent the templates describe stereotyped behavior in the domain. As an integral part of the system, a parser that extracts predicate-argument structures has been implemented. The precision and recall of the extractor are 89.7% and 79.1%, respectively. The generated template sets have proven useful both for classifying text documents and for extracting information from them. (nb_NO)
dc.language: eng (nb_NO)
dc.publisher: Institutt for datateknikk og informasjonsvitenskap (nb_NO)
dc.subject: ntnudaim (no_NO)
dc.subject: SIF2 datateknikk (no_NO)
dc.subject: Intelligente systemer (no_NO)
dc.title: Automatic template generation (nb_NO)
dc.type: Master thesis (nb_NO)
dc.source.pagenumber: 124 (nb_NO)
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap (nb_NO)

