dc.contributor.advisor: Gamback, Bjørn (nb_NO)
dc.contributor.advisor: Marsi, Erwin (nb_NO)
dc.contributor.author: Røkenes, Håkon Drolsum (nb_NO)
dc.date.accessioned: 2014-12-19T13:39:32Z
dc.date.available: 2014-12-19T13:39:32Z
dc.date.created: 2013-04-28 (nb_NO)
dc.date.issued: 2012 (nb_NO)
dc.identifier: 618487 (nb_NO)
dc.identifier: ntnudaim:6700 (nb_NO)
dc.identifier.uri: http://hdl.handle.net/11250/253133
dc.description.abstract: The focus of this thesis is the exploration of graph-based similarity in the context of natural language processing. The work is motivated by a need for richer representations of text. A graph edit distance algorithm that calculates the difference between graphs was implemented. Sentences were represented by means of dependency graphs, which consist of words connected by dependencies. A dependency graph captures the syntactic structure of a sentence. The graph-based similarity approach was applied to the problem of detecting plagiarism and was compared against state-of-the-art systems. The key advantages of graph-based textual representations are word order indifference and the ability to capture similarity between words based on the sentence structure. The approach was compared against contributions made to the PAN plagiarism detection challenge at the CLEF 2011 conference, where it would have placed 5th out of 10 contestants. The evaluation results suggest that the approach is applicable to the task of detecting plagiarism but requires some fine-tuning of its input parameters. The evaluation results demonstrated that dependency graphs are best represented by directed edges. The graph edit distance algorithm scored best with a combination of node and edge label matching. Applying different edit weights increased performance. Keywords: Graph Edit Distance, Natural Language Processing, Dependency Graphs, Plagiarism Detection (nb_NO)
dc.language: eng (nb_NO)
dc.publisher: Institutt for datateknikk og informasjonsvitenskap (nb_NO)
dc.title: Graph-based Natural Language Processing: Graph edit distance applied to the task of detecting plagiarism (nb_NO)
dc.type: Master thesis (nb_NO)
dc.source.pagenumber: 61 (nb_NO)
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap (nb_NO)
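
The abstract describes comparing sentences by the edit cost between their dependency graphs, using directed edges, node and edge label matching, and adjustable edit weights. The following is a minimal illustrative sketch of that idea, not the algorithm implemented in the thesis. It assumes each graph is given as a list of (head, relation, dependent) triples, matches nodes and edges purely by label, and simply counts the unmatched ones, which is a simplification of a full graph edit distance search. The function name, the relation labels, and the weight parameters are all hypothetical.

```python
def approx_edit_distance(g1, g2, node_cost=1.0, edge_cost=1.0):
    """Label-matching approximation of the edit distance between two
    dependency graphs (an assumption-laden sketch, not a full GED search).

    Each graph is a list of (head, relation, dependent) triples, read as a
    directed edge head -> dependent labeled with the relation.
    """
    # Node sets: every word that appears as a head or a dependent.
    nodes1 = {word for head, _, dep in g1 for word in (head, dep)}
    nodes2 = {word for head, _, dep in g2 for word in (head, dep)}

    # Edge sets: the triples themselves, so direction and label both matter.
    edges1 = set(g1)
    edges2 = set(g2)

    # Every node or edge present in only one graph costs one insertion
    # or deletion, scaled by the corresponding weight.
    unmatched_nodes = len(nodes1 ^ nodes2)
    unmatched_edges = len(edges1 ^ edges2)
    return node_cost * unmatched_nodes + edge_cost * unmatched_edges


# Hypothetical dependency triples for two short sentences.
sentence_a = [("chased", "nsubj", "dog"), ("chased", "dobj", "cat")]
sentence_b = [("chased", "nsubj", "dog"), ("chased", "dobj", "mouse")]

distance = approx_edit_distance(sentence_a, sentence_b)
weighted = approx_edit_distance(sentence_a, sentence_b, edge_cost=0.5)
```

Because edges are compared as directed triples, reversing an edge counts as a difference even when both graphs contain the same words, and lowering `edge_cost` relative to `node_cost` makes the measure more tolerant of structural changes than of vocabulary changes.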

