Non-Asymptotic Delay Bounds for Multi-Server Systems with Synchronization Constraints
Journal article, Peer reviewed
MetadataShow full item record
Original versionIEEE Transactions on Parallel and Distributed Systems. 2018, 29 (7), 1545-1559. 10.1109/TPDS.2017.2779872
Parallel computing has become a standard tool with architectures such as Google MapReduce, Hadoop, and Spark being broadly used in applications such as data processing and machine learning. Common to these systems are a fork operation, where jobs are first divided into tasks that are processed in parallel, and a join operation where completed tasks wait for the other tasks of the job before leaving the system. The synchronization constraint of the join operation makes the analysis of fork-join systems challenging, and few explicit results are known. In this work, we formulate a max-plus server model for parallel systems which allows us to derive performance bounds for a variety of systems in the GII GI and G I G cases. We contribute end-to-end delay bounds for multi-stage fork-join networks. We perform a detailed comparison of different multi-server configurations, including an analysis of single-queue fork-join systems that achieve a fundamental performance gain. We compare these results to both simulation and a live Spark system.