Augmenting sentiment prediction capabilities for code-mixed tweets with multilingual transformers

Hashmi, Ehtesham; Yildirim-Yayilgan, Sule; Shaikh, Sarang

dc.contributor.author	Hashmi, Ehtesham
dc.contributor.author	Yildirim-Yayilgan, Sule
dc.contributor.author	Shaikh, Sarang
dc.date.accessioned	2024-04-23T07:46:12Z
dc.date.available	2024-04-23T07:46:12Z
dc.date.created	2024-04-19T19:49:44Z
dc.date.issued	2024
dc.identifier.issn	1869-5450
dc.identifier.uri	https://hdl.handle.net/11250/3127710
dc.description.abstract	People in the modern digital era increasingly use social media platforms to express their concerns and emotions in the form of reviews or comments. While positive interactions within diverse communities can considerably enhance confidence, negative comments can hurt people’s reputations and well-being. Individuals now tend to express their thoughts on these platforms in their native languages, which is challenging to analyze because of potential syntactic ambiguity in these languages. Most research has been conducted on resource-rich languages like English. Low-resource languages such as Urdu, Arabic, and Hindi, however, have limited linguistic resources, which makes information extraction labor-intensive. This study concentrates on code-mixed text of three types: English, Roman Urdu, and their combination. It introduces robust transformer-based algorithms to enhance sentiment prediction in code-mixed text that combines Roman Urdu and English in the same context. Unlike conventional deep learning-based models, transformers are adept at handling syntactic ambiguity, facilitating the interpretation of semantics across various languages. We used state-of-the-art transformer-based models such as Electra, code-mixed BERT (cm-BERT), and Multilingual Bidirectional and Auto-Regressive Transformers (mBART) to address sentiment prediction challenges in code-mixed tweets. The results reveal that mBART outperformed the Electra and cm-BERT models for sentiment prediction in code-mixed text, with an overall F1-score of 0.73. We also perform topic modeling to uncover shared characteristics within the corpus and reveal patterns and commonalities across different classes.	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Augmenting sentiment prediction capabilities for code-mixed tweets with multilingual transformers	en_US
dc.title.alternative	Augmenting sentiment prediction capabilities for code-mixed tweets with multilingual transformers	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US
dc.description.version	publishedVersion	en_US
dc.source.journal	Social Network Analysis and Mining	en_US
dc.identifier.doi	10.1007/s13278-024-01245-6
dc.identifier.cristin	2263139
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1

Associated file(s)

Filename: s13278-024-01245-6.pdf
Size: 1.222Mb
Format: PDF

This item appears in the following collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2525]
Publikasjoner fra CRIStin - NTNU [37304]

Unless otherwise stated, this item is licensed as Attribution 4.0 International