Load Balancing of Pseudo-random Workloads on Heterogeneous Systems

Heterogeneous computing systems that use one or more graphics processing units (GPUs) as accelerators present unique load-balancing challenges due to the GPU architecture. Assigning the GPU a share of the workload proportional to its theoretical throughput is unlikely to achieve that peak performance in practice, partly because of branch divergence. Moreover, for workloads that depend on pseudo-random numbers, the branch divergence may appear unpredictable, which makes it hard to work around.
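A short calculation shows why pseudo-random branches are so costly on a GPU. Assume, for illustration, that each of the $W$ threads in a warp independently takes a given branch with probability $p$. The warp avoids divergence only if all its threads agree, so the probability that it diverges is

```latex
P_{\text{div}} = 1 - p^{W} - (1 - p)^{W}
```

With $W = 32$ (the NVIDIA warp size) and $p = 0.5$, this gives $P_{\text{div}} = 1 - 2^{-31} \approx 1$. Even with $p = 0.05$, $P_{\text{div}} \approx 1 - 0.95^{32} \approx 0.81$, so most warps still execute both paths.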

In this thesis we present an approach for reorganizing pseudo-random workloads before they are executed on the GPU, with the goal of reducing branch divergence. In our experiments, the method achieves a kernel execution-time speedup of up to 1.45 on a real application. We also show that the method can yield a net speedup even when its own overhead is included. Additionally, we develop a method for estimating the resulting reduction in execution time, which can be used to decide whether or not to apply the reorganization.
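The reorganization idea can be illustrated with a small Python sketch. It assumes, hypothetically, that each work item draws its pseudo-random numbers from its own seeded stream, so the branch it will take can be predicted on the host by replaying that stream. It also uses a deliberately simple cost model in which a warp pays once for every distinct branch taken by any of its threads. The names `predict_branch`, `warp_cost`, `reorganize`, `estimated_speedup` and `should_reorganize` are illustrative and are not taken from the thesis.

```python
import random
from typing import Callable, List, Sequence

WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs (assumption)


def predict_branch(item: int, seed: int = 42, p: float = 0.3) -> int:
    """Hypothetical kernel branch: replay the item's seeded PRNG stream on the
    host to learn whether its GPU thread will take branch 0 or branch 1."""
    rng = random.Random(seed * 1_000_003 + item)
    return 0 if rng.random() < p else 1


def warp_cost(branches: Sequence[int], branch_cost: Sequence[float],
              warp_size: int = WARP_SIZE) -> float:
    """Simplified divergence model: each warp serially executes every branch
    taken by at least one of its threads and pays that branch's cost once."""
    total = 0.0
    for start in range(0, len(branches), warp_size):
        taken = set(branches[start:start + warp_size])
        total += sum(branch_cost[b] for b in taken)
    return total


def reorganize(items: List[int], branch_of: Callable[[int], int]) -> List[int]:
    """Group items by predicted branch so that most warps become uniform."""
    return sorted(items, key=branch_of)


def estimated_speedup(items: List[int], branch_of: Callable[[int], int],
                      branch_cost: Sequence[float]) -> float:
    """Ratio of modelled kernel cost before and after reorganization."""
    before = warp_cost([branch_of(i) for i in items], branch_cost)
    after = warp_cost([branch_of(i) for i in reorganize(items, branch_of)],
                      branch_cost)
    return before / after


def should_reorganize(kernel_time: float, speedup: float, overhead: float) -> bool:
    """Apply the reorganization only if the predicted time saved exceeds its cost."""
    return kernel_time - kernel_time / speedup > overhead


items = list(range(4096))
speedup = estimated_speedup(items, predict_branch, branch_cost=[10.0, 4.0])
print(f"estimated speedup: {speedup:.2f}")
print("reorganize:", should_reorganize(kernel_time=5.0, speedup=speedup, overhead=1.0))
```

Sorting by predicted branch is the simplest possible grouping. Its cost (a host-side pass to predict branches plus the reordering itself) is exactly the overhead that the decision rule weighs against the predicted saving.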

A graph-based method for task balancing is also presented, which selects the optimal task sequence in over 96% of the tested cases. The task graph doubles as a model of the GPU's throughput, and a load balancer uses its throughput estimates to partition the workload between the central processing unit (CPU) and the GPU.
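One common way to turn throughput estimates into a CPU/GPU split is to choose the partition that makes both devices finish at the same time. The sketch below assumes, hypothetically, that the throughput model supplies steady-state rates (items per second) for each device and a fixed GPU overhead (in seconds) for kernel launch and data transfer. The functions `partition` and `makespan` and their parameters are illustrative, not the thesis's load balancer.

```python
from typing import Tuple


def partition(n_items: int, cpu_tput: float, gpu_tput: float,
              gpu_overhead: float = 0.0) -> Tuple[int, int]:
    """Split n_items between CPU and GPU so that both finish at about the
    same time, given throughputs in items/s and a fixed GPU overhead in s.

    Balance condition: (n - n_gpu) / cpu_tput == gpu_overhead + n_gpu / gpu_tput
    """
    n_gpu = (n_items / cpu_tput - gpu_overhead) / (1.0 / cpu_tput + 1.0 / gpu_tput)
    n_gpu = min(max(round(n_gpu), 0), n_items)  # clamp to a valid item count
    return n_items - n_gpu, n_gpu


def makespan(n_cpu: int, n_gpu: int, cpu_tput: float, gpu_tput: float,
             gpu_overhead: float = 0.0) -> float:
    """Completion time of the slower device; the GPU overhead is paid only if
    the GPU receives any work."""
    gpu_time = gpu_overhead + n_gpu / gpu_tput if n_gpu > 0 else 0.0
    return max(n_cpu / cpu_tput, gpu_time)


n_cpu, n_gpu = partition(1000, cpu_tput=100.0, gpu_tput=300.0, gpu_overhead=1.0)
print(n_cpu, n_gpu, makespan(n_cpu, n_gpu, 100.0, 300.0, 1.0))
```

Because GPU throughput on pseudo-random workloads depends on branch divergence, `gpu_tput` would in practice come from a workload-aware model (in this thesis, the task graph) rather than from the device's peak rating.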

Publisher

NTNU