Supporting the Join Operation in a NoSQL System - Mastering the internals of Cassandra
The join operation is one of the most valuable operations found in traditional database management systems. With this operation, it is possible to combine data from multiple tables. Today, most NoSQL systems do not support the join operation. One reason is that a join is too time-consuming when the data is replicated across multiple nodes. However, the same result can be achieved in two other ways: denormalizing the data or joining at the application level. Denormalizing results in more redundant data, and both options involve the user more in the execution of the join. Support for the join operation in the query language of a NoSQL system may ease the change of database system for users who only want to use a NoSQL system in which data can be joined. This thesis presents an implementation of the equijoin in Cassandra, since the two other options above are already covered by others. Cassandra is a NoSQL system classified as an extensible record store, with a data model quite similar to the relational model used by, for example, MySQL. The implementation shows how the parsing, preparation and execution of the query are performed. Support for join queries written in Cassandra Query Language (CQL) is enabled in the parsing step. A way of finding a join order that allows each table to be read from memory or disk only once is also implemented. This join order is also slightly optimized: selections in a WHERE clause are executed early in the execution step. During execution, the nested loop join is used to join the tables. The implementation of join in Cassandra shows a significantly worse execution time than MySQL. One of the problems is that Cassandra's underlying architecture is not designed for joining data from multiple tables.
However, this thesis shows that it is possible to support the join operation in Cassandra, although further work is needed before it executes within a reasonable time.
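
The execution strategy described above, a nested loop equijoin with selections applied before the join, can be illustrated with a minimal sketch. This is not the thesis's code, which lives inside Cassandra itself; it is a standalone Python illustration over in-memory lists of rows. The function name, parameter names, and sample tables are hypothetical, and the sketch assumes the two tables have no column names in common.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def nested_loop_equijoin(
    outer: List[Row],
    inner: List[Row],
    outer_key: str,
    inner_key: str,
    outer_filter: Optional[Predicate] = None,
    inner_filter: Optional[Predicate] = None,
) -> List[Row]:
    """Join two tables on outer[outer_key] == inner[inner_key].

    Hypothetical helper for illustration only. Each input table is
    scanned once to apply its selection; the filtered rows are then
    joined with a plain nested loop.
    """
    # Apply selections early so the nested loop sees fewer rows.
    outer_rows = [r for r in outer if outer_filter is None or outer_filter(r)]
    inner_rows = [r for r in inner if inner_filter is None or inner_filter(r)]

    result: List[Row] = []
    for o in outer_rows:
        for i in inner_rows:
            if o[outer_key] == i[inner_key]:
                # Assumes disjoint column names, so merging loses nothing.
                result.append({**o, **i})
    return result


# Hypothetical sample data.
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Bo"},
]
orders = [
    {"order_id": 10, "buyer_id": 1, "total": 30},
    {"order_id": 11, "buyer_id": 2, "total": 5},
    {"order_id": 12, "buyer_id": 1, "total": 7},
]

# Roughly: users JOIN orders ON user_id = buyer_id WHERE total > 6
joined = nested_loop_equijoin(
    users,
    orders,
    outer_key="user_id",
    inner_key="buyer_id",
    inner_filter=lambda row: row["total"] > 6,
)
for row in joined:
    print(row["name"], row["order_id"], row["total"])
```

The nested loop compares every remaining outer row with every remaining inner row, so its cost grows with the product of the two filtered table sizes. Filtering first reduces that product, which is the motivation for executing selections early.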