dc.contributor.author: Das, Amitava
dc.contributor.author: Gambäck, Björn
dc.date.accessioned: 2016-01-23T23:45:14Z
dc.date.accessioned: 2016-04-13T14:46:27Z
dc.date.available: 2016-01-23T23:45:14Z
dc.date.available: 2016-04-13T14:46:27Z
dc.date.issued: 2014
dc.identifier.citation: Sangal, Rajeev [Eds.] Proceedings of the 11th International Conference on Natural Language Processing, International Institute of Information Technology, 2014 [nb_NO]
dc.identifier.isbn: 9788177649604
dc.identifier.uri: http://hdl.handle.net/11250/2385477
dc.description.abstract: Language identification at the document level has been considered an almost solved problem in some application areas, but language detectors fail in the social media context due to phenomena such as utterance internal code-switching, lexical borrowings, and phonetic typing; all implying that language identification in social media has to be carried out at the word level. The paper reports a study to detect language boundaries at the word level in chat message corpora in mixed English-Bengali and English-Hindi. We introduce a code-mixing index to evaluate the level of blending in the corpora and describe the performance of a system developed to separate multiple languages. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: International Institute of Information Technology Goa, India [nb_NO]
dc.relation.ispartofseries: Proceedings of the 11th International Conference on Natural Language Processing;52
dc.title: Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text [nb_NO]
dc.type: Chapter [nb_NO]
dc.date.updated: 2016-01-23T23:45:14Z
dc.description.version: publishedVersion
dc.identifier.cristin: 1320947
dc.description.localcode: Publisher's published version [nb_NO]

