
dc.contributor.advisor: Gambäck, Björn
dc.contributor.author: Trelease, Hanne Marie
dc.date.accessioned: 2019-09-11T10:56:18Z
dc.date.created: 2017-06-01
dc.date.issued: 2017
dc.identifier: ntnudaim:15882
dc.identifier.uri: http://hdl.handle.net/11250/2615855
dc.description.abstract: Twitter is a very popular microblogging platform with vast amounts of available data, which has created interest in extracting information from tweets, for example through sentiment analysis. Natural Language Processing (NLP) generally interprets the literal meaning of a text, which makes sarcasm a disruptive factor in sentiment analysis and other NLP tasks: the intended meaning of a sarcastic sentence is often the opposite of its literal meaning, so the polarity of the sentence flips. Because of this challenge, researchers have shown interest in automatic sarcasm detection on social media and Twitter data. This Master's thesis introduces a sarcasm detection system for Twitter messages, known as tweets, in Norwegian and English. The system detects sarcasm with a supervised machine learning approach, and evaluations of three different machine learning classifiers are presented for both languages. The thesis also explores how hashtag splitting and emojis affect sarcasm detection and the different feature groups used. Datasets of automatically annotated Norwegian and English tweets have been created, taking advantage of the fact that many Twitter users mark their messages as sarcastic with sarcasm hashtags (e.g., "#sarcasm"). However, not every tweet containing such a hashtag can be interpreted as sarcastic. To include in the datasets the sarcasm hashtags with the highest share of tweets considered sarcastic, a small review of possible Norwegian and English sarcasm hashtags was carried out. The resulting English corpus is included in a comparison of datasets collected in different years. This comparison shows that a classifier trained on a dataset that includes tweets from several years generally performs better at classifying new, unseen tweets than a classifier trained on tweets from one specific year.
The same comparison also shows that an English classifier predicting sarcasm in translated Norwegian tweets does not outperform a Norwegian classifier trained on original Norwegian data. [en]
dc.language: eng
dc.publisher: NTNU
dc.subject: Informatics, Artificial Intelligence
dc.title: Identifying Sarcasm in English and Norwegian Twitter Messages [en]
dc.type: Master thesis [en]
dc.source.pagenumber: 100
dc.contributor.department: Norwegian University of Science and Technology, Faculty of Information Technology and Electrical Engineering, Department of Computer Science
dc.date.embargoenddate: 10000-01-01
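The abstract describes building the datasets automatically by treating sarcasm hashtags as labels. A minimal Python sketch of that distant-labelling step follows, under stated assumptions: the hashtag set is hypothetical (only "#sarcasm" comes from the abstract; "#sarkasme" is an assumed Norwegian tag, not the list produced by the thesis's hashtag review), and stripping the label hashtags from the text is an assumption, made so that a classifier cannot read the label directly off the tweet. It illustrates the idea only and is not the thesis's actual pipeline.

```python
from __future__ import annotations

import re

# Hypothetical label hashtags (lowercase, without "#"). "sarcasm" is the example
# given in the abstract; "sarkasme" is an assumed Norwegian counterpart.
SARCASM_HASHTAGS = {"sarcasm", "sarkasme"}

# Matches a hashtag and captures its body; \w is Unicode-aware, so æ, ø, å work.
HASHTAG_RE = re.compile(r"#(\w+)")


def label_tweet(text: str) -> tuple[str, int]:
    """Return (cleaned_text, label), where label is 1 if the tweet carries a sarcasm hashtag.

    Assumption: the sarcasm hashtags themselves are removed from the text,
    while all other hashtags are kept unchanged.
    """
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    label = int(bool(tags & SARCASM_HASHTAGS))

    # Drop only the label hashtags; keep every other hashtag as written.
    cleaned = HASHTAG_RE.sub(
        lambda m: "" if m.group(1).lower() in SARCASM_HASHTAGS else m.group(0),
        text,
    )
    # Collapse the whitespace left behind by removed hashtags.
    cleaned = " ".join(cleaned.split())
    return cleaned, label


if __name__ == "__main__":
    print(label_tweet("Love waiting in line for hours #sarcasm"))
    print(label_tweet("Så gøy med regn #Sarkasme #mandag"))
```

Because, as the abstract notes, not every tweet with a sarcasm hashtag is actually sarcastic, labels produced this way are noisy; limiting the set to hashtags whose tweets are most often genuinely sarcastic is what the hashtag review addresses.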
