dc.contributor.advisor: Nørvåg, Kjetil
dc.contributor.author: Ek, Lars Martin Bævre
dc.date.accessioned: 2018-09-13T07:16:26Z
dc.date.available: 2018-09-13T07:16:26Z
dc.date.created: 2018-06-07
dc.date.issued: 2018
dc.identifier: ntnudaim:18715
dc.identifier.uri: http://hdl.handle.net/11250/2562277
dc.description.abstract: Traditional database systems do not meet the throughput demands of today's web applications. Mitigation strategies in the form of intricate cache hierarchies and manual view materialization solve parts of the performance equation, at the cost of increased complexity. Soup is a new structured storage system that scales to millions of reads per second on a single machine, without the architectural complexity seen in other storage deployments. By propagating updates through a data-flow graph, where pre-computed state is incrementally maintained at materialized nodes throughout the graph, Soup moves the majority of the processing work from reads to writes. Soup stores all data in volatile main memory and relies on a write-ahead log for durability. While fast memory access is crucial for frequently accessed materialized views, it is a cumbersome requirement for Soup's base tables, which serve read requests only on rare occasions. By implementing a disk-resident index structure on top of the RocksDB storage engine, this thesis moves Soup from a pure main-memory database to a structured storage system capable of handling datasets larger than its memory size, with only a small decrease in overall write throughput. With its base tables stored safely on durable storage, Soup can recover from fatal failures by gradually rebuilding its partially materialized views as needed. As in a system that recovers with an empty cache, performance is initially reduced while live requests slowly build up Soup's memory-based state. To completely remove the performance degradation of recovery, this thesis also implements a method of performing a global checkpoint of Soup's materialized state. By approaching the data-flow graph as a distributed system, the method performs a coordinated snapshot of all local state, allowing Soup to recover in about a tenth of the time.
dc.language: eng
dc.publisher: NTNU
dc.subject: Computer Science, Databases and Search
dc.title: Durability in a data-flow storage system
dc.type: Master thesis