Gender prediction on Norwegian Twitter accounts
Abstract
In this thesis, methods for predicting the gender of Norwegian Twitter accounts were investigated. Various account information is available through Twitter's public APIs. Tweets (text), personal descriptions, friend networks, and profile images were the main fields investigated. First, separate classifiers were fitted to features from the different fields; later, the individual classifiers' posterior probability estimates were combined to achieve increased accuracy. The datasets were labeled through comparison of the accounts' names with names in the Norwegian population. Subsets of accounts with highly gender-specific names were used for training and testing. The highest balanced accuracy obtained was around 0.89. This, however, required access to the accounts' profile images (85% of the data). Without images, the accuracy dropped to around 0.85.
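The abstract does not say how the per-field posterior estimates were combined, so the following is only an illustrative sketch that assumes a plain average over whichever fields are available for an account. The field names, the toy probabilities, the 0.5 threshold, and the helper names `combine_posteriors` and `balanced_accuracy` are all hypothetical. Balanced accuracy is computed here as the mean of the per-class recalls, which is the usual definition.

```python
from statistics import mean


def combine_posteriors(per_field):
    """Average P(female) over the fields that are present for one account.

    Assumption: simple averaging; the thesis may use a different rule.
    A missing field (e.g. no profile image) is passed as None and skipped.
    """
    available = [p for p in per_field.values() if p is not None]
    if not available:
        return 0.5  # no evidence at all: stay undecided
    return mean(available)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so both classes count equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return mean(recalls)


# Toy data (made up): per-field P(female) and a name-derived label.
accounts = [
    ({"tweets": 0.8, "description": 0.6, "friends": 0.7, "image": 0.9}, "F"),
    ({"tweets": 0.3, "description": 0.4, "friends": 0.2, "image": None}, "M"),
    ({"tweets": 0.6, "description": 0.4, "friends": 0.3, "image": 0.2}, "M"),
    ({"tweets": 0.4, "description": 0.5, "friends": 0.3, "image": None}, "F"),
]

y_true = [label for _, label in accounts]
y_pred = ["F" if combine_posteriors(p) >= 0.5 else "M" for p, _ in accounts]
score = balanced_accuracy(y_true, y_pred)
print(y_pred, score)  # ['F', 'M', 'M', 'M'] 0.75
```

In this toy run the last account is misclassified, so recall is 0.5 for one class and 1.0 for the other, giving a balanced accuracy of 0.75. Skipping missing fields mirrors the situation described above, where profile images are not available for every account.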