|dc.description.abstract||In recent years, there has been an increased focus on supporting large spatial and spatiotemporal point datasets by using locality-preserving linearization techniques to transform multi-dimensional data into one-dimensional data and store the transformed data in scalable key-value NoSQL stores. Spatial and spatiotemporal region queries are then executed by decomposing the query into multiple linear range scans over the linearized data and processing the range scans in a filter-and-refine manner. Decomposing a multi-dimensional region query into a number of smaller linear range scans while reducing the number of false hits is essential for achieving efficient query processing in these systems. However, there is a trade-off between the overhead of issuing additional range scans and the overhead of filtering out false positives.
In this thesis, we have studied various existing methods for decomposing spatial and spatiotemporal region queries into linear range scans. Based on this study, we have proposed a cost-driven approach that decomposes a query according to the data distribution, which is estimated by a histogram. We have implemented the proposed method in a simple database prototype, evaluated it with real-world test data, and compared it against a popular, more space-driven decomposition approach.
The experiments show that our method outperforms the baseline approach for both spatial and spatiotemporal point data, with the greatest improvement for the latter. One of the main strengths of the proposed method over the baseline is its ability to produce an accurate decomposition with relatively few range scans when the data is skewed. The downside is having to maintain a histogram over the data, which can be prohibitively expensive for large datasets. However, measures such as sampling were found to reduce this cost significantly with minimal impact on query performance.||