Classification of Pro-Eating Disorder Users on Twitter
Abstract
For this study, 7096 users and 10.7M tweets were collected from Twitter and manually annotated. The data set included users taking part in pro-eating disorder (pro-ED) communities and users whose tweets were either recovery-oriented or unrelated to eating disorders. Analysis of the data set revealed differentiating characteristics in the users' tweets and profile information with respect to emoji use, the presence of URLs and user mentions, and references to eating disorders and related topics.
Based on these differences, groups of features, such as tweet n-grams and emojis, were extracted and used to train a series of supervised classifiers. Four machine learning models were explored: a Support Vector Machine (SVM), a Naïve Bayes model, a Logistic Regression model, and a Random Forest. The highest F1-score (0.98) was achieved both when using an SVM and when using an ensemble approach trained on weighted feature groups, with emphasis on unigrams from tweets.
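
As a rough illustration of the feature groups and weighted combination described above, the following minimal sketch uses only the Python standard library. It is not the study's implementation: the function names, the regular expressions, the emoji code-point ranges, and the weighted-average scheme are all assumptions made for illustration, and the per-group probabilities would in practice come from trained classifiers such as those listed above.

```python
import re
from collections import Counter

# Hypothetical patterns; the study's actual tokenisation is not specified.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9#']+")


def is_emoji(ch):
    """Rough heuristic (assumption): treat common emoji blocks as emojis."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def extract_feature_groups(tweets):
    """Build per-user feature groups: unigram counts, emoji counts,
    and the fraction of tweets containing a URL or a user mention."""
    unigrams, emojis = Counter(), Counter()
    n_url = n_mention = 0
    for tweet in tweets:
        if URL_RE.search(tweet):
            n_url += 1
        if MENTION_RE.search(tweet):
            n_mention += 1
        # Strip URLs and mentions before counting word unigrams.
        text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet)).lower()
        unigrams.update(TOKEN_RE.findall(text))
        emojis.update(ch for ch in tweet if is_emoji(ch))
    n = max(len(tweets), 1)
    return {
        "unigrams": unigrams,
        "emojis": emojis,
        "meta": {"url_rate": n_url / n, "mention_rate": n_mention / n},
    }


def weighted_vote(group_probs, weights):
    """Combine per-group class probabilities by a weighted average
    (one possible way to weight feature groups; an assumption)."""
    total = sum(weights[g] for g in group_probs)
    return sum(weights[g] * p for g, p in group_probs.items()) / total


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Giving the `"unigrams"` group a larger weight than the others in `weighted_vote` mirrors the emphasis on tweet unigrams mentioned above; the specific weight values would need to be tuned on held-out data.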