
Hadoop, and Its Mechanisms for Reliable Storage

Midthaug, Ingvild Hovdelien
Master thesis
Open
19764_FULLTEXT.pdf (1.773Mb)
19764_COVER.pdf (1.556Mb)
Permanent link
http://hdl.handle.net/11250/2576513
Issue date
2018
Metadata
Show full item record
Collections
  • Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2777]
Abstract
The global amount of digital data is increasing rapidly. Internet-connected devices generate massive amounts of data through interactions such as digital communication and file sharing, and together these everyday interactions produce Big Data sets. These datasets can be analyzed and used for purposes such as personalized marketing and health research. To be analyzed and utilized, however, the data must be transferred and stored reliably. Failures in storage systems happen frequently, so mechanisms for reliable data storage are needed. The Hadoop software provides a distributed file system that achieves reliable data storage through different coding techniques.

This thesis presents different mechanisms for reliable data storage in Hadoop and describes a practical implementation of an experimental Hadoop environment. The mechanisms include erasure coding (Reed-Solomon codes) and triple replication. The performance of the two mechanisms is then tested and compared. The performance parameters considered are file recovery time and the amount of network traffic generated during recovery. Factors affecting performance, such as file size and block size, are also considered. The test setup consists of a wired Ethernet connection, a configured multi-node Hadoop cluster, a managed network switch, and a network analysis tool.
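
For concreteness, the following sketch compares the raw storage consumed by the two mechanisms. It is a minimal illustration, assuming the standard Hadoop 3.x RS-6-3 erasure coding policy and the default 128 MB block size; the exact policies and settings used in the thesis are not reproduced here.

    import math

    BLOCK_MB = 128  # assumed HDFS block size (Hadoop's default)

    def replication_storage(file_mb, replicas=3):
        """Raw storage when every block is kept in `replicas` full copies."""
        blocks = math.ceil(file_mb / BLOCK_MB)
        return blocks * BLOCK_MB * replicas

    def erasure_coded_storage(file_mb, k=6, m=3):
        """Raw storage when each stripe of k data blocks gets m parity blocks."""
        data_blocks = math.ceil(file_mb / BLOCK_MB)
        stripes = math.ceil(data_blocks / k)  # a partial stripe still needs m parity blocks
        return (data_blocks + stripes * m) * BLOCK_MB

    for size_mb in (128, 768, 3072):  # illustrative file sizes in MB
        print(size_mb, replication_storage(size_mb), erasure_coded_storage(size_mb))

For a 768 MB file this gives 2304 MB under triple replication versus 1152 MB under RS(6,3), i.e. 3.0x versus 1.5x overhead. Note that a file smaller than one stripe still pays for all m parity blocks, which is one reason file size affects the comparison.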

The obtained results show the impact of the different factors on Hadoop cluster performance during node failure. In general, the results agree with theory. Both the recovery time and the network traffic during recovery increase with the file size. For erasure coding, the recovery time also increases with the code length, and a block size of 128 MB gives the best overall performance. Finally, optimized erasure coding variants from related work are presented and suggested as future work for improving cluster performance.
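
The scaling of recovery traffic can be illustrated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a reproduction of the thesis's measurements: under replication a lost block is re-copied from one surviving replica, while under Reed-Solomon RS(k, m) each lost block is reconstructed from k surviving blocks, so recovery cost grows with the code length as well as with the number of lost blocks.

    BLOCK_MB = 128  # assumed block size, as above

    def recovery_traffic_replication(lost_blocks):
        # One surviving replica is copied per lost block.
        return lost_blocks * BLOCK_MB

    def recovery_traffic_rs(lost_blocks, k=6):
        # k surviving blocks are read to reconstruct each lost block.
        return lost_blocks * k * BLOCK_MB

    for lost in (1, 4, 16):  # larger files leave more blocks on a failed node
        print(lost, recovery_traffic_replication(lost), recovery_traffic_rs(lost))

Doubling the file size roughly doubles the blocks lost per failed node, and hence the recovery traffic, under both schemes, while a longer code multiplies the erasure-coded traffic further; this matches the trends reported above.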
Publisher
NTNU
