Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages
Chapter
Accepted version
Permanent link
http://hdl.handle.net/11250/2391290
Publication date
2015
Original version
Angelova, Galia; Bontcheva, Kalina; Mitkov, Ruslan [Eds.] Proceedings of the International Conference Recent Advances in Natural Language Processing, p. 239-248. International conference: Recent Advances in Natural Language Processing, Association for Computational Linguistics, 2015
Abstract
The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-of-speech tag set. We compare the performance of a combination of language-specific taggers to that of applying four machine learning algorithms to the task (Conditional Random Fields, Sequential Minimal Optimization, Naïve Bayes and Random Forests), using a range of different features based on word context and word-internal information.
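
As a rough illustration of what "word context" and "word-internal" features can look like for a single token, the following is a minimal sketch only. The feature names, the `token_features` helper, and the example message are all hypothetical and are not taken from the paper, which does not list its feature set in this abstract.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i].

    Hypothetical feature set for illustration; not the paper's actual features.
    """
    word = tokens[i]
    return {
        # Word-internal information: properties of the token itself.
        "lower": word.lower(),
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        "is_hashtag": word.startswith("#"),
        "is_mention": word.startswith("@"),
        "has_digit": any(ch.isdigit() for ch in word),
        "is_title": word.istitle(),
        # Word context: the neighbouring tokens, with boundary markers.
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


# Invented code-mixed example message, already tokenized.
tokens = ["@user", "yeh", "movie", "bahut", "awesome", "thi", "#Bollywood"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Feature dictionaries of this kind could then be passed to a sequence labeller or a per-token classifier; which toolkit to use and how to encode the features is left open here.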