A deep network model for paraphrase detection in short text messages

Agarwal, Basant; Ramampiaro, Heri; Langseth, Helge; Ruocco, Massimiliano

dc.contributor.author	Agarwal, Basant
dc.contributor.author	Ramampiaro, Heri
dc.contributor.author	Langseth, Helge
dc.contributor.author	Ruocco, Massimiliano
dc.date.accessioned	2019-04-30T07:55:50Z
dc.date.available	2019-04-30T07:55:50Z
dc.date.created	2018-06-30T15:34:51Z
dc.date.issued	2018
dc.identifier.citation	Information Processing & Management. 2018, 54, 922-937.	nb_NO
dc.identifier.issn	0306-4573
dc.identifier.uri	http://hdl.handle.net/11250/2596042
dc.description.abstract	This paper is concerned with paraphrase detection, i.e., identifying sentences that are semantically identical. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Recognizing this importance, we study in particular how to address the challenges of detecting paraphrases in user-generated short texts, such as Twitter messages, which often contain language irregularities and noise, and do not necessarily carry as much semantic information as longer, clean texts. We propose a novel deep neural network-based approach that relies on coarse-grained sentence modelling using a convolutional neural network (CNN) and a recurrent neural network (RNN) model, combined with a specific fine-grained word-level similarity matching model. More specifically, we develop a new architecture, called DeepParaphrase, which creates an informative semantic representation of each sentence by (1) using a CNN to extract local region information in the form of important n-grams from the sentence, and (2) applying an RNN to capture long-term dependency information. In addition, we perform a comparative study of state-of-the-art approaches to paraphrase detection. An important insight from this study is that existing paraphrase approaches perform well when applied to clean texts but do not necessarily deliver good performance on noisy texts, and vice versa. In contrast, our evaluation shows that the proposed DeepParaphrase-based approach achieves good results on both types of text, making it more robust and generic than the existing approaches.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Elsevier	nb_NO
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.title	A deep network model for paraphrase detection in short text messages	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.source.pagenumber	922-937	nb_NO
dc.source.volume	54	nb_NO
dc.source.journal	Information Processing & Management	nb_NO
dc.identifier.doi	10.1016/j.ipm.2018.06.005
dc.identifier.cristin	1594948
dc.description.localcode	© 2018. This is the authors’ accepted and refereed manuscript of the article. Locked until 30.6.2020 due to copyright restrictions. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/	nb_NO
cristin.unitcode	194,63,10,0
cristin.unitname	Institutt for datateknologi og informatikk
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1
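The abstract mentions a fine-grained word-level similarity matching model without detailing it. A minimal sketch, assuming that word-level matching means comparing every word of one message with every word of the other by cosine similarity of word vectors and keeping each word's best match in both directions. The toy vectors and the names `similarity_matrix` and `alignment_score` are hypothetical; this is not the authors' DeepParaphrase implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical type alias: word -> dense vector (e.g. pretrained embeddings).
Embeddings = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(s1: List[str], s2: List[str], emb: Embeddings) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity of word i in s1 and word j in s2."""
    dim = len(next(iter(emb.values()), []))
    # Assumption: out-of-vocabulary tokens (frequent in noisy tweets) map to a zero vector.
    zero = [0.0] * dim
    return [[cosine(emb.get(w1, zero), emb.get(w2, zero)) for w2 in s2] for w1 in s1]


def alignment_score(s1: List[str], s2: List[str], emb: Embeddings) -> float:
    """Average best-match similarity in both directions, in [-1, 1]."""
    if not s1 or not s2:
        return 0.0
    m = similarity_matrix(s1, s2, emb)
    row_best = [max(row) for row in m]        # best partner in s2 for each word of s1
    col_best = [max(col) for col in zip(*m)]  # best partner in s1 for each word of s2
    return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented for illustration only.
    toy = {"good": [1.0, 0.0], "great": [0.9, 0.1], "movie": [0.0, 1.0], "film": [0.1, 0.9]}
    print(round(alignment_score(["good", "movie"], ["great", "film"], toy), 3))
```

Running the script prints 0.994 for the toy pair "good movie" / "great film". In a learned model such a similarity matrix would more likely be passed to further layers than simply averaged; the averaging here only keeps the sketch self-contained.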

Associated file(s)

Filename: Paraphrase paper IPM_revision2.pdf
Size: 644.7Kb
Format: PDF
Description: Agarwal


This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6560]
Publikasjoner fra CRIStin - NTNU [37325]


Except where otherwise noted, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International