Smart Data Placement for Big Data Pipelines: An Approach based on the Storage-as-a-Service Model
Khan, Akif Quddus; Nikolov, Nikolay Vladimirov; Matskin, Mihhail; Prodan, Radu; Song, Hui; Roman, Dumitru; Soylu, Ahmet
Chapter
Accepted version
Permanent link
https://hdl.handle.net/11250/3062532
Publication date
2022
Abstract
The development of big data pipelines is a challenging task, especially when data storage is considered part of the pipelines. Local storage is expensive, hard to maintain, and comes with several challenges (e.g., data availability, data security, and backup). Using cloud storage, i.e., Storage-as-a-Service (StaaS), instead of local storage has the potential to provide more flexibility in terms of scalability, fault tolerance, and availability. In this paper, we propose a generic approach to integrating StaaS with data pipelines, in which computation runs on an on-premise server or on a specific cloud while data storage is provided through StaaS, and we develop a ranking method for the available storage options based on five key parameters: cost, proximity, network performance, the impact of server-side encryption, and user weights. The evaluation demonstrates the effectiveness of the proposed approach in terms of data transfer performance and the feasibility of dynamically selecting a storage option based on four primary user scenarios.
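The abstract does not state how the five parameters are combined into a single ranking. Purely as an illustration, the sketch below applies one common multi-criteria technique: a weighted sum over min-max normalized criteria, where the user weights play the role of the fifth parameter. It is not the authors' method. The `StorageOption` fields, their units, the option names, and the weight values are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    # Hypothetical attributes; units are illustrative assumptions only.
    name: str
    cost_per_gb: float          # price per GB-month (lower is better)
    distance_km: float          # proximity proxy to the compute site (lower is better)
    throughput_mbps: float      # measured network performance (higher is better)
    encryption_overhead: float  # fractional slowdown from server-side encryption (lower is better)


# Each criterion: (attribute name, whether a higher raw value is better).
CRITERIA = [
    ("cost_per_gb", False),
    ("distance_km", False),
    ("throughput_mbps", True),
    ("encryption_overhead", False),
]


def normalize(values, higher_is_better):
    """Min-max normalize to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All options are equal on this criterion, so it cannot discriminate.
        return [1.0] * len(values)
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def rank(options, weights):
    """Return (name, score) pairs sorted from best to worst by weighted sum."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = [0.0] * len(options)
    for attr, higher_is_better in CRITERIA:
        column = [getattr(o, attr) for o in options]
        w = weights.get(attr, 0.0) / total  # rescale user weights to sum to 1
        for i, s in enumerate(normalize(column, higher_is_better)):
            scores[i] += w * s
    names = [o.name for o in options]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical example data and user weights.
options = [
    StorageOption("bucket-a", 0.023, 500, 800, 0.05),
    StorageOption("bucket-b", 0.018, 3000, 400, 0.02),
    StorageOption("bucket-c", 0.030, 100, 950, 0.10),
]
weights = {
    "cost_per_gb": 0.35,
    "distance_km": 0.2,
    "throughput_mbps": 0.35,
    "encryption_overhead": 0.1,
}
ranking = rank(options, weights)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these made-up inputs, `bucket-a` ranks first because it scores reasonably on every criterion, even though it is best on none. A weighted sum is easy to explain to users, but it lets a strong score on one criterion offset a weak one. Other multi-criteria schemes may be better suited when that trade-off is undesirable.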