Authorship Identification of Research Papers

Authorship identification is a technique for determining who wrote an anonymous document by extracting and analyzing an author's stylometric features. This thesis applies one authorship identification technique, classification, to a set of research papers to determine their authorship. We review the theory and previous work on authorship identification before presenting the implemented system. Finally, we perform two separate experiments and discuss their results.

The experiments show good results in specific cases, with an accuracy of 100% in the best case. The algorithms used are support vector machines, artificial neural networks, decision trees, random forests, and k-nearest neighbors. In our experiments, support vector machines and artificial neural networks performed best, while decision trees performed worst.
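
To make the idea of classifying texts by stylometric features concrete, the following is a minimal sketch and not the system implemented in the thesis. It assumes a deliberately tiny feature set (average word length, average sentence length, and the relative frequency of a few function words) and uses a plain k-nearest neighbor vote, one of the algorithms listed above. The names `FUNCTION_WORDS`, `stylometric_features`, and `knn_predict` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common function words.
# A real system would likely use far more features and scale them.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "we", "that", "is")


def stylometric_features(text):
    """Map a text to a small vector of style features (illustrative only)."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        # No usable content: return an all-zero vector of the same length.
        return [0.0] * (2 + len(FUNCTION_WORDS))
    counts = Counter(words)
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    # Relative frequency of each function word in the text.
    freqs = [counts[w] / len(words) for w in FUNCTION_WORDS]
    return [avg_word_len, avg_sentence_len] + freqs


def knn_predict(train, query_text, k=1):
    """Predict an author by majority vote among the k nearest training texts.

    `train` is a list of (text, author) pairs; distance is Euclidean
    over the unscaled feature vectors.
    """
    query = stylometric_features(query_text)
    scored = sorted(
        (math.dist(stylometric_features(text), query), author)
        for text, author in train
    )
    votes = Counter(author for _, author in scored[:k])
    return votes.most_common(1)[0][0]
```

Given a list of `(text, author)` pairs, `knn_predict(train, anonymous_text)` returns the author whose known writing lies closest in feature space. Because the features here are unscaled, sentence length tends to dominate the distance, which is one reason a practical system would normalize features before classifying.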

Based on our results, we advise caution when applying authorship identification before or after a double-blind review, and when an author uses authorship identification to obtain an unbiased review of a research paper. Although it should be used with caution, authorship identification is still a valuable tool that gives a general indication of the authorship of an anonymous document.

Publisher

NTNU