
dc.contributor.author: Mostafaei, Habib
dc.contributor.author: Smaragdakis, Georgios
dc.contributor.author: Zinner, Thomas Erich
dc.contributor.author: Feldmann, Anja
dc.date.accessioned: 2023-01-04T11:52:30Z
dc.date.available: 2023-01-04T11:52:30Z
dc.date.created: 2022-08-02T14:28:24Z
dc.date.issued: 2022
dc.identifier.issn: 1932-4537
dc.identifier.uri: https://hdl.handle.net/11250/3040912
dc.description.abstract: Big data analytics platforms have played a critical role in the unprecedented success of data-driven applications. However, real-time and streaming data applications, and recent legislation, e.g., GDPR in Europe, have posed constraints on exchanging and analyzing data, especially personal data, across geographic regions. To address such constraints, data has to be processed and analyzed in situ, and aggregated results have to be exchanged among the different sites for further processing. This introduces additional network delays due to the geographic distribution of the sites and potentially affects the performance of analytics platforms that are designed to operate in datacenters with low network delays. In this paper, we show that the three most popular big data analytics systems (Apache Storm, Apache Spark, and Apache Flink) fail to tolerate round-trip times of more than 30 milliseconds, even when the input data rate is low. The execution time of distributed big data analytics tasks degrades substantially beyond this threshold, and some of the systems are more sensitive than others. A closer examination of the design of these systems shows that there is no winner in all wide-area settings. However, we show that it is possible to significantly improve the performance of all these popular big data analytics systems even under transcontinental delays (where the inter-node delay is more than 30 milliseconds) and achieve performance comparable to that within a datacenter for the same load. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: IEEE [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no [*]
dc.title: Delay-Resistant Geo-Distributed Analytics [en_US]
dc.title.alternative: Delay-Resistant Geo-Distributed Analytics [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: publishedVersion [en_US]
dc.source.journal: IEEE Transactions on Network and Service Management [en_US]
dc.identifier.doi: 10.1109/TNSM.2022.3192710
dc.identifier.cristin: 2040699
cristin.ispublished: true
cristin.fulltext: preprint
cristin.qualitycode: 1




Except where otherwise noted, this item is licensed under Attribution 4.0 International