Lex-pos feature-based grammar error detection system for the English language

Agarwal, Nancy; Wani, Mudasir Ahmad; Bours, Patrick

dc.contributor.author	Agarwal, Nancy
dc.contributor.author	Wani, Mudasir Ahmad
dc.contributor.author	Bours, Patrick
dc.date.accessioned	2022-05-04T08:52:22Z
dc.date.available	2022-05-04T08:52:22Z
dc.date.created	2020-11-04T11:15:31Z
dc.date.issued	2020
dc.identifier.citation	Electronics. 2020, 9 (10), .	en_US
dc.identifier.issn	2079-9292
dc.identifier.uri	https://hdl.handle.net/11250/2994073
dc.description.abstract	This work designs a grammar error detection system that uses both the structural and the contextual information of English sentences to judge whether they are grammatically correct. Most existing systems model a grammar detector by translating sentences into sequences of either the words appearing in them or the syntactic tags that hold their grammatical knowledge. In this paper, we show that both sequencing approaches have limitations: the former is overly specific, whereas the latter is overly general, and each in turn degrades the performance of the grammar classifier. The paper therefore proposes a new sequencing approach that carries both the linguistic and the syntactic information of a sentence. We call this sequence a Lex-Pos sequence. The main objective of the paper is to demonstrate that the proposed Lex-Pos sequence can capture both the specific nature of the linguistic words (i.e., lexicals) and the generic structural characteristics of a sentence via Part-Of-Speech (POS) tags, and can therefore lead to a significant improvement in detecting grammar errors. Furthermore, the paper proposes a new vector representation technique, Word Embedding One-Hot Encoding (WEOE), to transform the Lex-Pos sequence into numerical values. The paper also introduces a new error induction technique that artificially generates POS-tag-specific incorrect sentences for training. The classifier is trained on two corpora of incorrect sentences, one with general errors and another with POS-tag-specific errors. A Long Short-Term Memory (LSTM) neural network architecture is used to build the grammar classifier. The study conducts nine experiments to validate the strength of the Lex-Pos sequences. The Lex-Pos-based models prove superior in two ways: (1) they give more accurate predictions; and (2) they are more stable, as smaller accuracy drops are recorded from training to testing. To further demonstrate the potential of the proposed Lex-Pos-based model, we compare it with some well-known existing studies.	en_US
dc.language.iso	eng	en_US
dc.publisher	MDPI	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Lex-pos feature-based grammar error detection system for the English language	en_US
dc.title.alternative	Lex-pos feature-based grammar error detection system for the English language	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.source.pagenumber	17	en_US
dc.source.volume	9	en_US
dc.source.journal	Electronics	en_US
dc.source.issue	10	en_US
dc.identifier.doi	10.3390/electronics9101686
dc.identifier.cristin	1844827
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1
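
The abstract contrasts three ways of sequencing a sentence: by words, by POS tags, and by the mixed Lex-Pos form. The following Python sketch illustrates that contrast on a toy scale. It is not the construction defined in the paper: the tiny tag lexicon `TOY_TAGS`, the set `KEEP_LEXICAL`, and the rule "keep the word for some tag classes, emit the tag otherwise" are all assumptions made purely for illustration.

```python
# Toy illustration only. TOY_TAGS, KEEP_LEXICAL, and the mixing rule are
# hypothetical; they are not taken from the paper.

# Hand-written lexicon using Penn Treebank-style tags.
TOY_TAGS = {
    "he": "PRP", "she": "PRP", "they": "PRP",
    "eats": "VBZ", "eat": "VBP",
    "apples": "NNS", "pears": "NNS",
}

# Assumption: words with these tags stay lexical, all others become tags.
KEEP_LEXICAL = {"PRP"}


def word_sequence(sentence):
    """Sequence of lowercased words (the lexical representation)."""
    return sentence.lower().split()


def pos_sequence(sentence):
    """Sequence of POS tags (the purely syntactic representation)."""
    return [TOY_TAGS.get(w, "UNK") for w in word_sequence(sentence)]


def lex_pos_sequence(sentence, keep=KEEP_LEXICAL):
    """Mixed sequence: keep the word for selected tags, else emit the tag."""
    mixed = []
    for w in word_sequence(sentence):
        tag = TOY_TAGS.get(w, "UNK")
        mixed.append(w if tag in keep else tag)
    return mixed


if __name__ == "__main__":
    for s in ("he eats apples", "they eats apples", "he eats pears"):
        print(s, "|", pos_sequence(s), "|", lex_pos_sequence(s))
```

Under these assumptions, "he eats apples" and the incorrect "they eats apples" share one POS sequence, so a tag-only model cannot tell them apart, while their mixed sequences differ. Conversely, "he eats apples" and "he eats pears" differ as word sequences but share one mixed sequence, which is the kind of generalization a word-only model lacks.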

Associated file(s)

File name: Agarwal.pdf
Size: 327.3Kb
Format: PDF

This item appears in the following collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2521]
Publikasjoner fra CRIStin - NTNU [37215]

Unless otherwise stated, this item is licensed under Attribution 4.0 International