Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages

Jamatia, Anupam; Gambäck, Björn; Das, Amitava

Chapter

Accepted version

Open

R15-1033.pdf (159.6Kb)

Permanent link

http://hdl.handle.net/11250/2391290

Publication date

2015

Collections

Department of Computer Science (Institutt for datateknologi og informatikk) [6828]
Publications from CRIStin - NTNU (Publikasjoner fra CRIStin - NTNU) [38672]

Original version

Angelova, Galia; Bontcheva, Kalina; Mitkov, Ruslan [Eds.] Proceedings of the International Conference Recent Advances in Natural Language Processing p. 239-248 International conference: Recent advances in natural language processing, Association for Computational Linguistics, 2015

Abstract

The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-of-speech tag set. We compare the performance of a combination of language specific taggers to that of applying four machine learning algorithms to the task (Conditional Random Fields, Sequential Minimal Optimization, Naïve Bayes and Random Forests), using a range of different features based on word context and word-internal information.

Publisher

Association for Computational Linguistics

Series

Proceedings of the International Conference Recent Advances in Natural Language Processing;33