Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model

Khan, Akif Quddus; Nikolov, Nikolay Vladimirov; Matskin, Mihhail; Prodan, Radu; Song, Hui; Roman, Dumitru; Soylu, Ahmet

Chapter

Accepted version

Open

INTEL4EC___SmartDataPlacement.pdf (248.3Kb)

Permanent link

https://hdl.handle.net/11250/3062532

Publication date

2022

Collections

Department of Computer Science (Institutt for datateknologi og informatikk) [6788]
Publications from CRIStin - NTNU [38294]

Abstract

The development of big data pipelines is a challenging task, especially when data storage is considered as part of the data pipelines. Local storage is expensive, hard to maintain, and comes with several challenges (e.g., data availability, data security, and backup). The use of cloud storage, i.e., Storage-as-a-Service (StaaS), instead of local storage has the potential to provide more flexibility in terms of scalability, fault tolerance, and availability. In this paper, we propose a generic approach to integrating StaaS with data pipelines, in which computation runs on an on-premise server or on a specific cloud while data storage is integrated through StaaS, and develop a ranking method for available storage options based on five key parameters: cost, proximity, network performance, the impact of server-side encryption, and user weights. The evaluation demonstrates the effectiveness of the proposed approach in terms of data transfer performance and the feasibility of dynamically selecting a storage option based on four primary user scenarios.
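
The abstract names the ranking parameters but not the scoring formula, so the following is only an illustrative sketch of how such a ranking could work, not the authors' method. It assumes a plain weighted sum over min-max normalized criteria. The class StorageOption, the field names, the units, and the functions normalize and rank are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    """One candidate StaaS location (all fields and units are assumptions)."""
    name: str
    cost: float                 # storage price, lower is better
    distance_km: float          # proximity to the compute site, lower is better
    throughput_mbps: float      # measured network performance, higher is better
    encryption_overhead: float  # slowdown from server-side encryption, lower is better


# Criterion name -> True when a larger raw value is better.
CRITERIA = {
    "cost": False,
    "distance_km": False,
    "throughput_mbps": True,
    "encryption_overhead": False,
}


def normalize(values, higher_is_better):
    """Min-max scale values to [0, 1] so that 1.0 is always the best option."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All options tie on this criterion.
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank(options, weights):
    """Return (name, score) pairs sorted best first.

    `weights` maps criterion names to non-negative user weights; criteria
    that are missing get weight 0. Weights are rescaled to sum to 1.
    """
    total = sum(weights.get(c, 0.0) for c in CRITERIA)
    if total <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    scores = [0.0] * len(options)
    for criterion, higher_is_better in CRITERIA.items():
        column = normalize([getattr(o, criterion) for o in options],
                           higher_is_better)
        w = weights.get(criterion, 0.0) / total
        for i, s in enumerate(column):
            scores[i] += w * s
    pairs = zip((o.name for o in options), scores)
    return sorted(pairs, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    candidates = [
        StorageOption("region-a", 0.023, 120.0, 480.0, 0.04),
        StorageOption("region-b", 0.018, 1900.0, 150.0, 0.06),
    ]
    # A cost-sensitive user scenario: cost weighted three times as much.
    user_weights = {"cost": 3, "distance_km": 1,
                    "throughput_mbps": 1, "encryption_overhead": 1}
    for name, score in rank(candidates, user_weights):
        print(f"{name}: {score:.3f}")
```

Changing only the weights dictionary re-ranks the same candidates for a different user scenario, which is the kind of dynamic selection the abstract describes. A real deployment would need measured values for each criterion rather than the placeholder numbers used here.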

Publisher

IEEE

Unless otherwise stated, this item is licensed under Attribution 4.0 International