Efficient top-k recently-frequent term querying over spatio-temporal textual streams
Peer reviewed, Journal article
Original version: Information Systems. 2021, 97. https://doi.org/10.1016/j.is.2020.101687
Massive amounts of data with spatio-temporal-textual information are being generated due to the proliferation of GPS-equipped mobile devices. Much of this data consists of social media posts, which are often used to share and spread personal updates and news. Extracting valuable information from a dynamic collection of social posts is of great interest and has attracted many studies. However, because the volume of data is huge, existing methods mostly rely on the time window model, in which old data is discarded. In this work, we introduce the task of efficiently discovering the top-k most popular terms within a user-specified bounded region over a stream of social posts, where recent posts are more important than old ones. To make this feasible, we propose a hybrid index structure and algorithms to efficiently answer such top-k queries. Our index employs a spatial index augmented with top-k time-weighted term lists, together with a bulk updating technique to support fast ingestion of social post streams. Further, these top-k term lists are used in the aggregation step to produce the final results, so that incoming queries can be processed efficiently. An extensive experimental study with a large collection of social posts shows that the proposed methods are capable of both online aggregation and accurate query processing.
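The abstract does not specify the index layout or the time-weighting function, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a uniform grid as the spatial index, exponential decay with a configurable half-life as the time weighting, and full per-cell term lists rather than truncated top-k lists. The class name `DecayedGridIndex` and all parameter names are hypothetical.

```python
import math
from collections import defaultdict
from heapq import nlargest


class DecayedGridIndex:
    """Illustrative grid index with exponentially time-decayed term scores.

    Assumptions (not from the paper): uniform grid cells, exponential decay,
    and complete per-cell term lists instead of truncated top-k lists.
    """

    def __init__(self, cell_size=1.0, half_life=3600.0, landmark=0.0):
        self.cell_size = cell_size
        self.rate = math.log(2) / half_life  # decay rate derived from half-life
        self.landmark = landmark             # reference time for stored weights
        self.cells = defaultdict(lambda: defaultdict(float))

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert_batch(self, posts):
        """Bulk-insert posts given as (x, y, timestamp, terms) tuples."""
        for x, y, t, terms in posts:
            # Store weights relative to the landmark so that old entries never
            # need rewriting; newer posts simply receive larger weights.
            # (Very large t - landmark would overflow; a real system would
            # periodically move the landmark and rescale.)
            weight = math.exp(self.rate * (t - self.landmark))
            cell = self.cells[self._cell(x, y)]
            for term in set(terms):  # count each term once per post
                cell[term] += weight

    def top_k(self, x_min, y_min, x_max, y_max, now, k):
        """Return the k highest-scoring (term, score) pairs for a rectangle.

        Cells that only partially overlap the rectangle are counted in full,
        so results near the query boundary are approximate.
        """
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        totals = defaultdict(float)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                cell = self.cells.get((cx, cy))
                if cell:
                    for term, score in cell.items():
                        totals[term] += score
        # Scale once at query time so scores reflect decay up to `now`.
        scale = math.exp(-self.rate * (now - self.landmark))
        best = nlargest(k, totals.items(), key=lambda kv: kv[1])
        return [(term, score * scale) for term, score in best]


index = DecayedGridIndex(cell_size=1.0, half_life=10.0)
index.insert_batch([
    (0.5, 0.5, 0.0, ["rain", "storm"]),
    (0.6, 0.4, 0.0, ["rain"]),
    (0.2, 0.2, 20.0, ["sun"]),
    (5.5, 5.5, 20.0, ["snow"]),
])
result = index.top_k(0.0, 0.0, 0.9, 0.9, now=20.0, k=2)
```

In this toy run, a single recent post mentioning "sun" outranks two older posts mentioning "rain", which illustrates how time weighting favors recent content without discarding old data outright. Because the decay factor is applied once at query time, ingestion only adds to stored scores, which is one plausible way to keep bulk updates cheap.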