Automatic Classification of Pro-Eating Disorder Twitter Accounts with Personality as a Feature

Gran, Martine Alvilde; Nornes, Andrea Hollung

dc.contributor.advisor	Gambäck, Björn
dc.contributor.author	Gran, Martine Alvilde
dc.contributor.author	Nornes, Andrea Hollung
dc.date.accessioned	2019-12-10T15:00:16Z
dc.date.available	2019-12-10T15:00:16Z
dc.date.issued	2019
dc.identifier.uri	http://hdl.handle.net/11250/2632545
dc.description.abstract	A person or group that regards eating disorders as a lifestyle rather than a deadly mental illness is described as being pro-eating disorder, or pro-ED. Eating disorders are the group of mental illnesses with the highest mortality rate. Large pro-ED communities have emerged since the launch of the internet. These online communities share content focused on maintaining eating disorders, as well as inspiration and motivation. It has been proven that viewing this type of content leads to lower self-esteem and a desire to eat less. Many microblogging services, including Tumblr, Instagram and Pinterest, have taken steps to reduce the amount of pro-ED content. Twitter, by contrast, had not taken steps to remove such content at the time this thesis was written, which means that a great deal of pro-ED content is available on the platform. The goal of this thesis was to improve automatic classification of pro-ED accounts on Twitter by using personality traits from the Big 5 model as a feature. A total of four datasets were collected, of which two were used to train a Big 5 personality detection model and one was used to train a pro-ED classification model. The last dataset was excluded because it turned out to affect the results negatively. The two datasets used to train the personality detection model were merged into one large dataset containing 169 Twitter accounts and 2 467 essays. These accounts and essays had all been labeled with values representing the Big 5 personality traits. The dataset used for classifying pro-ED accounts contained 6 824 Twitter accounts, each labeled as pro-ED, pro-recovery, or unrelated. After testing a range of features and machine learning algorithms, a new state-of-the-art model for classifying pro-ED accounts on Twitter was created. This model takes the results from the personality detection model as a feature, together with unigrams, bigrams and topic models. The algorithm used for personality detection was Support Vector Regression with Global Vectors as the feature. Both Support Vector Machine and Multilayer Perceptron were tested as candidate algorithms for the pro-ED classification model of Twitter accounts. The best F1 score was 0.99, obtained using Multilayer Perceptron with Big 5 personality included in the feature set.
dc.description.abstract	A person or group of people who considers eating disorders a lifestyle, rather than a deadly mental disease, is called pro-eating disorder (abbreviated pro-ED). Eating disorders are the deadliest group of mental disorders, and since the introduction of the internet, large online pro-ED communities have sprung up. These communities share content focusing on eating disorder maintenance, inspiration, and motivation. Viewing this kind of content has been proven to be damaging, resulting in lower self-esteem and the desire to eat less. Multiple microblogging services, such as Tumblr, Instagram and Pinterest, have taken measures to limit the amount of pro-ED content. Twitter had not taken any such measures at the time this thesis was written, which means that a lot of pro-ED content is available on the site. The goal of this thesis was to improve automatic classification of pro-ED Twitter accounts by using the Big 5 personality model to calculate personality traits and adding them to the list of features. A total of four datasets were collected, of which two were used to train a Big 5 personality detection model and one was used to train a pro-ED classification model. The last dataset was found to significantly reduce the performance of the personality detection model and was therefore discarded. The two datasets used to train the personality detection model were combined into a single dataset of 2 636 items: 169 Twitter accounts and 2 467 essays. These accounts and essays were all labeled with Big 5 personality trait scores. The dataset used for the pro-ED classification model contained 6 824 Twitter accounts, each annotated as pro-ED, pro-recovery, or unrelated. After testing a number of features and machine learning algorithms, a new, state-of-the-art pro-ED classification model was created. This model takes the predictions from the personality detection model as a feature, in combination with unigrams, bigrams, and topic models. The personality detection model was built with Support Vector Regression, using Global Vectors (GloVe) as the only feature. Both Support Vector Machine and Multilayer Perceptron were tested as the pro-ED classification algorithm. The best F1 score was 0.99, achieved by the Multilayer Perceptron with the personality feature included in the feature set.
dc.language	eng
dc.publisher	NTNU
dc.title	Automatic Classification of Pro-Eating Disorder Twitter Accounts with Personality as a Feature
dc.type	Master thesis

Associated file(s)

File name: no.ntnu:inspera:2424015.pdf
Size: 9.776 MB
Format: PDF

File name: no.ntnu:inspera:2424015.zip
Size: 55.65 KB
Format: application/zip

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6772]
