Efficient routing and reconfiguration in virtualized HPC environments with vSwitch-enabled lossless networks
Tasoulas, Evangelos; Zahid, Feroz; Gran, Ernst Gunnar; Begnum, Kyrre; Johnsen, Bjørn Dag; Skeie, Tor
Journal article, Peer reviewed
Accepted version
Date
2019
Abstract
To meet the demands of communication‐intensive workloads in the cloud, virtual machines (VMs) should utilize low‐overhead network communication paradigms. In general, such paradigms enable VMs to directly communicate with the hardware by means of a passthrough technology like Single‐Root I/O Virtualization (SR‐IOV). However, when passthrough‐based virtualization is coupled with lossless interconnection networks, live migrations introduce scalability challenges due to the substantial network reconfiguration overhead. With these challenges in mind, we proposed a virtual switch (vSwitch) SR‐IOV architecture for InfiniBand in our previous work titled “Towards the InfiniBand SR‐IOV vSwitch Architecture”. In this paper, we first suggest solutions to rectify the space‐domain scalability issues that are present in vSwitch‐enabled subnets as a result of the VMs using dedicated layer‐two addresses. Then, we discuss routing strategies for virtualized environments using vSwitches and present a routing algorithm for Fat‐Trees. We also present a reconfiguration method that minimizes the reconfiguration overhead imposed on Fat‐Trees. We perform an extensive evaluation of our prototype algorithms, and as vSwitch‐enabled hardware does not yet exist, we base our findings on empirical observations obtained by emulating vSwitches with existing hardware, as well as on large‐scale simulations. Our results show a significant reduction in reconfiguration times as route recalculations can be eliminated, and for certain scenarios, the number of reconfiguration subnet management packets sent to switches is reduced from several hundred thousand down to a single one without degrading the routing quality.