Locality-adapted kernel densities of term co-occurrences for location prediction of tweets

Özdikis, Özer; Ramampiaro, Heri; Nørvåg, Kjetil

dc.contributor.author	Özdikis, Özer
dc.contributor.author	Ramampiaro, Heri
dc.contributor.author	Nørvåg, Kjetil
dc.date.accessioned	2019-11-04T08:44:12Z
dc.date.available	2019-11-04T08:44:12Z
dc.date.created	2019-06-06T13:44:36Z
dc.date.issued	2019
dc.identifier.citation	Information Processing & Management. 2019, 56 (4), 1280-1299.	nb_NO
dc.identifier.issn	0306-4573
dc.identifier.uri	http://hdl.handle.net/11250/2626278
dc.description.abstract	While geographical metadata referring to the originating locations of tweets provides valuable information to perform effective spatial analysis in social networks, scarcity of such geotagged tweets imposes limitations on their usability. In this work, we propose a content-based location prediction method for tweets by analyzing the geographical distribution of tweet texts using Kernel Density Estimation (KDE). The primary novelty of our work is to determine different settings of kernel functions for every term in tweets based on the location indicativeness of these terms. Our proposed method, which we call locality-adapted KDE, uses information-theoretic metrics and does not require any parameter tuning for these settings. As a further enhancement on the term-level distribution model, we describe an analysis of spatial point patterns in tweet texts in order to identify bigrams that exhibit significant deviation from the underlying unigram patterns. We present an expansion of feature space using the selected bigrams and show that it eventually yields further improvement in prediction accuracy of our locality-adapted KDE. We demonstrate that our expansion results in a limited increase in the size of feature space and it does not hinder online localization of tweets. The methods we propose rely purely on statistical approaches without requiring any language-specific setting. Experiments conducted on three tweet sets from different countries show that our proposed solution outperforms existing state-of-the-art techniques, yielding significantly more accurate predictions.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Elsevier	nb_NO
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.title	Locality-adapted kernel densities of term co-occurrences for location prediction of tweets	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.source.pagenumber	1280-1299	nb_NO
dc.source.volume	56	nb_NO
dc.source.journal	Information Processing & Management	nb_NO
dc.source.issue	4	nb_NO
dc.identifier.doi	10.1016/j.ipm.2019.02.013
dc.identifier.cristin	1703185
dc.description.localcode	© 2019. This is the authors’ accepted and refereed manuscript of the article. Locked until 22.3.2021 due to copyright restrictions. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/	nb_NO
cristin.unitcode	194,63,10,0
cristin.unitname	Institutt for datateknologi og informatikk
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	2

Associated file(s)

Filename: IPM2019GeoLoc.pdf
Size: 3.983 MB
Format: PDF
Description: Özdikis

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6828]
Publikasjoner fra CRIStin - NTNU [38672]

Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International