Graph-based Natural Language Processing: Graph edit distance applied to the task of detecting plagiarism

Røkenes, Håkon Drolsum
Master thesis
View/Open
618487_FULLTEXT01.pdf (726.3Kb)
618487_ATTACHMENT01.zip (48.58Mb)
618487_COVER01.pdf (184.3Kb)
URI
http://hdl.handle.net/11250/253133
Date
2012
Collections
  • Institutt for datateknologi og informatikk [3881]
Abstract
This thesis explores graph-based similarity in the context of natural language processing. The work is motivated by a need for richer representations of text. A graph edit distance algorithm, which calculates the difference between two graphs, was implemented. Sentences were represented as dependency graphs, which consist of words connected by dependencies. A dependency graph captures the syntactic structure of a sentence. The graph-based similarity approach was applied to the problem of detecting plagiarism and compared against state-of-the-art systems. The key advantages of graph-based textual representations are indifference to word order and the ability to capture similarity between words based on sentence structure. The approach was compared against contributions made to the PAN plagiarism detection challenge at the CLEF 2011 conference, where it would have placed 5th out of 10 contestants. The evaluation results suggest that the approach is applicable to the task of detecting plagiarism, but requires some fine-tuning of its input parameters. The results also demonstrated that dependency graphs are best represented with directed edges. The graph edit distance algorithm scored best with a combination of node and edge label matching. Applying different edit weights increased performance.

Keywords: Graph Edit Distance, Natural Language Processing, Dependency Graphs, Plagiarism Detection
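To make the idea in the abstract concrete, the following is a minimal sketch of an edit distance between two dependency graphs. It is not the algorithm implemented in the thesis: it assumes that nodes are matched purely by identical word labels (so a repeated word collapses into one node and no search over node mappings is needed), and the names `DepGraph`, `make_graph`, `edit_distance`, `node_cost`, and `edge_cost`, as well as the dependency labels, are hypothetical choices made for illustration.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

# (head word, dependency label, dependent word); labels are illustrative only.
Edge = Tuple[str, str, str]


class DepGraph(NamedTuple):
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


def make_graph(edges: Iterable[Edge]) -> DepGraph:
    """Build a dependency graph from (head, label, dependent) triples."""
    edge_set = frozenset(edges)
    nodes = frozenset(w for head, _, dep in edge_set for w in (head, dep))
    return DepGraph(nodes, edge_set)


def edit_distance(a: DepGraph, b: DepGraph, node_cost: float = 1.0,
                  edge_cost: float = 1.0, directed: bool = True) -> float:
    """Weighted count of node/edge insertions and deletions turning a into b.

    Simplifying assumption: nodes with the same word label are the same node.
    """
    def norm(edge: Edge) -> Edge:
        head, label, dep = edge
        if directed:
            return edge
        first, second = sorted((head, dep))  # ignore edge direction
        return (first, label, second)

    edges_a = {norm(e) for e in a.edges}
    edges_b = {norm(e) for e in b.edges}
    node_edits = len(a.nodes ^ b.nodes)   # nodes present in only one graph
    edge_edits = len(edges_a ^ edges_b)   # edges present in only one graph
    return node_cost * node_edits + edge_cost * edge_edits


s1 = make_graph([("chased", "nsubj", "dog"), ("chased", "dobj", "cat")])
s1_reordered = make_graph([("chased", "dobj", "cat"), ("chased", "nsubj", "dog")])
s2 = make_graph([("chased", "nsubj", "dog"), ("chased", "dobj", "mouse")])
flipped = make_graph([("dog", "nsubj", "chased"), ("cat", "dobj", "chased")])

print(edit_distance(s1, s1_reordered))             # 0.0: listing order is irrelevant
print(edit_distance(s1, s2))                       # 4.0: 2 node edits + 2 edge edits
print(edit_distance(s1, s2, edge_cost=0.5))        # 3.0: edge edits weighted lower
print(edit_distance(s1, flipped))                  # 4.0: direction matters
print(edit_distance(s1, flipped, directed=False))  # 0.0: direction ignored
```

In this sketch the graph stores no word positions, which is one way to see the word order indifference mentioned above, and the `directed` flag and cost parameters loosely mirror the directed-versus-undirected and edit-weight settings the abstract says were evaluated.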
Publisher
Institutt for datateknikk og informasjonsvitenskap