Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text

Das, Amitava; Gambäck, Björn

dc.contributor.author	Das, Amitava
dc.contributor.author	Gambäck, Björn
dc.date.accessioned	2016-01-23T23:45:14Z
dc.date.accessioned	2016-04-13T14:46:27Z
dc.date.available	2016-01-23T23:45:14Z
dc.date.available	2016-04-13T14:46:27Z
dc.date.issued	2014
dc.identifier.citation	Sangal, Rajeev [Eds.] Proceedings of the 11th International Conference on Natural Language Processing, International Institute of Information Technology, 2014	nb_NO
dc.identifier.isbn	9788177649604
dc.identifier.uri	http://hdl.handle.net/11250/2385477
dc.description.abstract	Language identification at the document level has been considered an almost solved problem in some application areas, but language detectors fail in the social media context due to phenomena such as utterance internal code-switching, lexical borrowings, and phonetic typing; all implying that language identification in social media has to be carried out at the word level. The paper reports a study to detect language boundaries at the word level in chat message corpora in mixed English-Bengali and English-Hindi. We introduce a code-mixing index to evaluate the level of blending in the corpora and describe the performance of a system developed to separate multiple languages.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	International Institute of Information Technology Goa, India	nb_NO
dc.relation.ispartofseries	Proceedings of the 11th International Conference on Natural Language Processing;52
dc.title	Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text	nb_NO
dc.type	Chapter	nb_NO
dc.date.updated	2016-01-23T23:45:14Z
dc.description.version	publishedVersion
dc.identifier.cristin	1320947
dc.description.localcode	Forlagets publiserte versjon	nb_NO

Files in this item

Name: File52-p169.pdf
Size: 220.7Kb
Format: PDF

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk
Publikasjoner fra CRIStin - NTNU
