Evaluating the potential of LSM-trees to supersede B-trees in databases
MetadataShow full item record
In recent years, the amount of data recorded and stored has increased substantially, causing a shift towards more write-intensive workloads for many database systems. In the same time period, storage technology has made big leaps. Solid state drives (SSDs), whose read and write performance is significantly better than hard disk drives' (HDDs), have increased in capacity while becoming more affordable. This has led to a widespread adoption of SSDs not only in consumer computers but also in enterprise data centers that handle petabytes of data. The combination of increasingly more write-intensive workloads and SSDs as primary storage means that data structures optimized for older hardware and read-intensive workloads no longer necessarily are the best choice for today's database systems. This has triggered development and research of new alternatives that enable database systems to write data more efficiently to disk. The LSM-tree, in particular, is one data structure that is gaining steadily more attention. It has repeatedly shown to perform better than the B-tree, a well-known and robust data structure used for decades by many database systems, for write-intensive workloads. However, the LSM-tree is often associated with having worse read performance than the B-tree and therefore less suitable for read-intensive workloads. Yet recently much work has been done to improve read performance of LSM-trees in order to make them a more attractive alternative for workloads in which also reads constitute a significant part. To assess how these two data structures compare against each other and whether the LSM-tree should be considered a replacement of the B-tree, this work has benchmarked different databases implementing the two tree structures and evaluated their properties and performance. Statistics on throughput, I/O, and CPU utilization have been recorded and analyzed to gain insight into the strengths and limits of each database. The results show that LSM-tree databases did not only achieve comparable read performance to B-tree databases, but were in some cases even superior. In accordance with their own claims, the LSM-tree databases did achieve far better write performance. Furthermore, the excellent compression ratios of LSM-trees allowed them to store data more efficiently. And many operations in an LSM-tree can run in parallel, an important property for current as well as future workload as processors gain increasingly more cores. However, for in-memory workloads, the B-tree still attained the best read performance. The B-tree should therefore, at this point, not be considered obsolete or inferior to the LSM-tree in all areas. Yet for workloads involving disk access, an LSM-tree brings enough advantages that it is likely to be the better choice---both for read and write-intensive workloads.