System Recovery in Large-Scale Distributed Storage Systems

Aga, Svein

Master thesis

Open

348644_FULLTEXT01.pdf (1.530Mb)

348644_COVER01.pdf (46.42Kb)

Permanent link

http://hdl.handle.net/11250/251274

Publication date

2008

Collections

Institutt for datateknologi og informatikk [6778]

Abstract

This report aims to describe and improve the system recovery process in large-scale storage systems. Inevitably, a recovery process loads the system with internal replication of data and makes extensive use of several storage nodes. Such internal load can be categorized and generalized into a maintenance workload class. A storage system also has external clients, which introduce load of their own, for example users altering their data or uploading new content. Load generated by clients can be generalized into a production workload class. When both workload classes are active in a system, i.e. the system is recovering while users are simultaneously accessing their data, the workload classes compete for system resources. The storage system must ensure Quality of Service (QoS) for each workload class so that both are guaranteed system resources. We have created Dynamic Tree with Observed Metrics (DTOM), an algorithm designed to gracefully throttle resources between multiple workload classes. DTOM can be used to enforce and ensure QoS for the variety of workloads in a system. Experimental results demonstrate that DTOM outperforms another well-known scheduling algorithm. In addition, we have designed a recovery model that aims to improve the handling of critical maintenance workload. Although the model is primarily intended for system recovery, it can also be applied in many other contexts.
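
The idea of guaranteeing resources to both workload classes can be illustrated with a minimal sketch. This is not DTOM, whose details are not given in this record; it is a generic guaranteed-share allocator, and the function name `allocate`, the class names, and the share values are all hypothetical, chosen only for illustration.

```python
def allocate(capacity, demand, min_share):
    """Split `capacity` between workload classes (illustrative only, not DTOM).

    demand:    requested resource units per class
    min_share: guaranteed fraction of capacity per class (should sum to <= 1)
    """
    # Phase 1: each class receives its demand, capped at its guaranteed share.
    grant = {c: min(demand[c], min_share[c] * capacity) for c in demand}
    spare = capacity - sum(grant.values())

    # Phase 2: hand unused capacity to classes that still want more,
    # in proportion to their guaranteed shares.
    while spare > 1e-9:
        hungry = [c for c in demand if demand[c] - grant[c] > 1e-9]
        if not hungry:
            break
        weights = {c: min_share[c] for c in hungry}
        total = sum(weights.values())
        if total == 0:
            # No guarantees among the remaining classes: split equally.
            weights = {c: 1.0 for c in hungry}
            total = float(len(hungry))
        handed = 0.0
        for c in hungry:
            extra = min(demand[c] - grant[c], spare * weights[c] / total)
            grant[c] += extra
            handed += extra
        spare -= handed
    return grant


# Hypothetical scenario: recovery traffic competes with client traffic.
shares = {"production": 0.7, "maintenance": 0.3}
busy = allocate(100, {"production": 90, "maintenance": 50}, shares)
quiet = allocate(100, {"production": 40, "maintenance": 100}, shares)
```

Under these assumed numbers, when both classes are saturated each is held to its guaranteed share (about 70 and 30 units), and when production demand drops to 40 units the maintenance class absorbs the unused capacity (about 60 units). A scheduler that adapts to observed metrics, as the name DTOM suggests, would presumably adjust such shares at run time, but that behaviour is not modelled here.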

Publisher

Institutt for datateknikk og informasjonsvitenskap