dc.description.abstract | In this thesis, methods for predicting the gender of Norwegian Twitter
accounts were investigated. Through Twitter's public APIs, various kinds of
account information are available. Tweets (text), personal descriptions,
friend networks, and profile images were the main fields investigated.
First, separate classifiers were fitted to features from each field;
then the individual classifiers' posterior probability estimates were
combined to increase accuracy. The datasets were labeled
through comparison of the accounts' names with names in the Norwegian
population. Subsets of accounts with strongly gender-specific names were
used for training and testing. The highest balanced accuracy obtained
was around 0.89. This, however, required access to the accounts' profile
images (85% of the data). Without images, the balanced accuracy dropped to
around 0.85. | |
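The abstract does not say how the per-field posterior estimates are combined or how balanced accuracy is computed in detail. The sketch below is one possible reading, not the thesis's method. It assumes simple averaging of each classifier's P(female), skips fields with no estimate (for example, an account without a profile image), and computes balanced accuracy as the mean per-class recall. The function names, the 0.5 threshold, and the example probabilities are hypothetical.

```python
from typing import Optional, Sequence


def combine_posteriors(probs: Sequence[Optional[float]]) -> float:
    """Average the P(female) estimates of the available per-field classifiers.

    Assumption (not from the thesis): plain averaging is the combiner.
    A field without an estimate, e.g. a missing profile image, is passed
    as None and skipped.
    """
    available = [p for p in probs if p is not None]
    if not available:
        raise ValueError("no classifier produced an estimate")
    return sum(available) / len(available)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall, so both genders weigh equally."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical per-field P(female): tweets, description, friends, image.
accounts = [
    ("female", [0.8, 0.6, 0.7, 0.9]),
    ("female", [0.4, 0.5, 0.3, None]),  # no profile image
    ("male", [0.2, 0.3, 0.1, 0.2]),
    ("male", [0.3, 0.4, 0.2, None]),    # no profile image
    ("male", [0.1, 0.2, 0.3, 0.2]),
]

y_true = [label for label, _ in accounts]
y_pred = ["female" if combine_posteriors(p) >= 0.5 else "male" for _, p in accounts]

print(balanced_accuracy(y_true, y_pred))  # → 0.75
```

In this toy example plain accuracy is 4/5 = 0.8, while balanced accuracy is 0.75 because only half of the female accounts are recovered; this is why balanced accuracy is the more informative figure when the two classes differ in size.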