Durability in a data-flow storage system

Ek, Lars Martin Bævre

dc.contributor.advisor	Nørvåg, Kjetil
dc.contributor.author	Ek, Lars Martin Bævre
dc.date.accessioned	2018-09-13T07:16:26Z
dc.date.available	2018-09-13T07:16:26Z
dc.date.created	2018-06-07
dc.date.issued	2018
dc.identifier	ntnudaim:18715
dc.identifier.uri	http://hdl.handle.net/11250/2562277
dc.description.abstract	Traditional database systems do not meet the throughput demands of today's web applications. Mitigation strategies on the form of intricate cache hierarchies and manual view materialization solve parts of the performance equation, at the cost of increasing complexity. Soup is a new structured storage system that scales to millions of reads per second on a single machine, without the architectural complexity seen in other storage deployments. By propagating updates through a data-flow graph, where pre-computed state is incrementally maintained at materialized nodes throughout the graph, Soup moves the majority of the processing work from reads to writes. Soup stores all data in volatile main-memory and relies on a write-ahead log for durability. While fast memory access is crucial for frequently accessed materialized views, it is a cumbersome requirement for Soup's base tables, which only serve read requests at rare occasions. By implementing a disk-resident index structure on top of the RocksDB storage engine, this thesis moves Soup from a pure main-memory database to a structured storage system capable of handling datasets larger than its memory size, with only a small decrease in overall write throughput. With its base tables stored safely on durable storage, Soup can recover from fatal failures by gradually building up its partially materialized views as needed. Similar to a system that recovers with an empty cache, this reduces initial performance while live requests slowly build up Soup's memory-based state. To completely remove the performance degradation of recovery, this thesis also implements a method of performing a global checkpoint of Soup's materialized state. By approaching the data-flow graph as a distributed system, the method performs a coordinated snapshot of all local state, allowing Soup to recover in about a tenth of the time.
dc.language	eng
dc.publisher	NTNU
dc.subject	Datateknologi, Databaser og søk
dc.title	Durability in a data-flow storage system
dc.type	Master thesis

Tilhørende fil(er)

Filnavn:: 18715_FULLTEXT.pdf
Størrelse:: 4.776Mb
Format:: PDF

Åpne

Filnavn:: 18715_COVER.pdf
Størrelse:: 1.556Mb
Format:: PDF

Åpne

Denne innførselen finnes i følgende samling(er)

Institutt for datateknologi og informatikk [6831]

Vis enkel innførsel