Durability in a data-flow storage system

Ek, Lars Martin Bævre

Ek, Lars Martin Bævre

Master thesis

Åpne

18715_FULLTEXT.pdf (4.776Mb)

18715_COVER.pdf (1.556Mb)

Permanent lenke

http://hdl.handle.net/11250/2562277

Utgivelsesdato

2018

Metadata

Vis full innførsel

Samlinger

Institutt for datateknologi og informatikk [6787]

Sammendrag

Traditional database systems do not meet the throughput demands of today's web applications. Mitigation strategies on the form of intricate cache hierarchies and manual view materialization solve parts of the performance equation, at the cost of increasing complexity. Soup is a new structured storage system that scales to millions of reads per second on a single machine, without the architectural complexity seen in other storage deployments. By propagating updates through a data-flow graph, where pre-computed state is incrementally maintained at materialized nodes throughout the graph, Soup moves the majority of the processing work from reads to writes.

Soup stores all data in volatile main-memory and relies on a write-ahead log for durability. While fast memory access is crucial for frequently accessed materialized views, it is a cumbersome requirement for Soup's base tables, which only serve read requests at rare occasions. By implementing a disk-resident index structure on top of the RocksDB storage engine, this thesis moves Soup from a pure main-memory database to a structured storage system capable of handling datasets larger than its memory size, with only a small decrease in overall write throughput.

With its base tables stored safely on durable storage, Soup can recover from fatal failures by gradually building up its partially materialized views as needed. Similar to a system that recovers with an empty cache, this reduces initial performance while live requests slowly build up Soup's memory-based state. To completely remove the performance degradation of recovery, this thesis also implements a method of performing a global checkpoint of Soup's materialized state. By approaching the data-flow graph as a distributed system, the method performs a coordinated snapshot of all local state, allowing Soup to recover in about a tenth of the time.

Utgiver

NTNU