Continuously Adapting Continuous Queries for Data Streams in Raincoat
In the last decade, the World Wide Web has grown from a platform where users passively viewed content into an active platform where users themselves contribute new content. With this came an explosion of available data that businesses could use to gain a market advantage. Not only did the amount of available data grow massively, but newly produced data also started to arrive at immense speed. This spawned a new field of specialized computational frameworks able to handle the change in the data paradigm. Now, one must be able to process the massive amount of incoming data within a reasonable response time, as well as handle its high velocity. This spurred several ideas for processing fast data. One of these ideas uses SQL-like languages for processing fast data, taking advantage of years of work on query optimization theory.

In the fall of 2012, we proposed and implemented the prototype of Raincoat. Raincoat was developed to help developers without any experience in distributed programming by providing a familiar interface through which they could deploy stream filtering jobs to a Storm cluster. Because the prototype did not include any query optimization techniques, it does not meet the expected performance requirements. In this thesis we research optimization techniques for scaling Raincoat. We explore optimization techniques from different fields, including traditional, distributed, parallel, streaming and adaptive query optimization. We propose an adaptive query optimizer, inspired by existing adaptive query optimizers. The focus of the optimizer lies in detecting when an optimization is needed and which optimization techniques should be applied.
In this thesis we explore the possibility of adaptively achieving better performance and scalability by carefully selecting the join order and the selection order, merging selection operators, and applying intra-operator parallelism to operators. Based on the results of our experiments with the different implemented optimizers, we demonstrate their applicability and their significant contribution to increasing the performance of a Raincoat query.