Knowledge of the exact location of a tweet is valuable for several reasons. However, the proportion of geotagged tweets is reported to be low, ranging from 0.42% to 3% [8; 5; 17]. Estimating tweet location from tweet text has therefore emerged as an active field of research. Previous efforts have also included metadata, but only to predict coarse-grained locations such as cities [10; 6] or countries.
In addition to tweet text, this thesis evaluates seven metadata attributes associated with a tweet for fine-grained location estimation. The importance of time is also analyzed. A Naive Bayes classifier is used to evaluate the attributes both individually and in combination.
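As a rough illustration of evaluating attributes individually and in combination, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing for every subset of attributes and reports the accuracy of each. It is a minimal sketch under stated assumptions, not the thesis implementation: the attribute name `source`, the `cell` label (standing in for a fine-grained location such as a grid cell), the helper names, and the toy tweets are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def features(tweet, attrs):
    """Prefix each token with its attribute name so attributes do not collide."""
    return [f"{a}={tok}" for a in attrs for tok in str(tweet[a]).lower().split()]


def train(tweets, attrs):
    """Collect per-cell counts for a multinomial Naive Bayes model."""
    cell_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for t in tweets:
        cell_counts[t["cell"]] += 1
        for f in features(t, attrs):
            feat_counts[t["cell"]][f] += 1
            vocab.add(f)
    return cell_counts, feat_counts, vocab


def predict(model, tweet, attrs):
    """Return the cell with the highest log-posterior (add-one smoothing)."""
    cell_counts, feat_counts, vocab = model
    total = sum(cell_counts.values())
    best, best_score = None, -math.inf
    for cell, n in cell_counts.items():
        score = math.log(n / total)
        denom = sum(feat_counts[cell].values()) + len(vocab)
        for f in features(tweet, attrs):
            if f in vocab:  # features never seen in training are skipped
                score += math.log((feat_counts[cell][f] + 1) / denom)
        if score > best_score:
            best, best_score = cell, score
    return best


def accuracy_by_combination(train_set, test_set, all_attrs):
    """Train and score one model per non-empty subset of attributes."""
    results = {}
    for k in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, k):
            model = train(train_set, attrs)
            hits = sum(predict(model, t, attrs) == t["cell"] for t in test_set)
            results[attrs] = hits / len(test_set)
    return results


# Hypothetical toy data; "source" and "cell" are illustrative names only.
TRAIN = [
    {"text": "coffee at the harbor", "source": "mobile", "cell": "A"},
    {"text": "harbor sunset walk", "source": "mobile", "cell": "A"},
    {"text": "lecture at campus", "source": "web", "cell": "B"},
    {"text": "campus library study", "source": "web", "cell": "B"},
]
TEST = [
    {"text": "walk by the harbor", "source": "mobile", "cell": "A"},
    {"text": "study at campus", "source": "web", "cell": "B"},
]

results = accuracy_by_combination(TRAIN, TEST, ["text", "source"])
for attrs, acc in results.items():
    print(attrs, acc)
```

Prefixing each token with its attribute name keeps the same word in two different attributes from being counted as one feature, which is one simple way to combine attributes in a single Naive Bayes model.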
Experiments on tweets show that combinations of attributes outperform a classifier trained on tweet text alone. They also indicate that attributes that have been reported to be valuable for country and city location estimation are of less importance for fine-grained location estimation. Time influences the best-performing model, and performance suffers when it is tested on new tweets. There are also indications that it is harder to estimate location accurately in the morning than at other times of the day.