Energy Efficiency and Performance Evaluation of Register Level Bitonic Sort: on ARM Mali Powered Exynos 5 Processor
Energy is one of the most important factors affecting the feasibility of reaching exascale computing capabilities. To build supercomputers with this computing power, new hardware needs to be considered in their design. One possibility is using hardware designed for mobile and embedded systems. In this project, a sorting approach developed for AVX-512 by Xiaochen et al. is implemented using both ARM NEON vectorization and OpenCL. OpenMP is also used. These implementations are profiled on the Arndale development board, which houses dual Cortex-A15 ARM processor cores and an ARM Mali T604 GPU on its Exynos 5 System on Chip. They are compared to other sorting algorithm implementations and measured with regard to performance and energy efficiency. It is found that the NEON vectorization offers a slight increase in performance compared to a merge sort algorithm without such vectorization. The OpenCL implementation has the poorest performance overall. The approach imposes requirements on the input data size, which overall make it unfavorable on current mobile hardware. The SIMD vector length is deemed an important factor in the performance increase being small. Future hardware with potentially larger SIMD vector lengths could see the method applied with more success.
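
To illustrate the kind of sorting network the abstract refers to, the following is a minimal scalar sketch of a bitonic sort in C. It is not the implementation evaluated in this work: the function names `compare_exchange`, `is_power_of_two`, and `bitonic_sort` are hypothetical, and the code assumes 32-bit integer keys and a power-of-two input length, which is one common form of the input size requirement that bitonic networks impose. On NEON, the compare-exchange step could be expressed with the `vminq_s32` and `vmaxq_s32` intrinsics, which operate on four 32-bit lanes of a 128-bit register at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare-exchange: after the call, *lo <= *hi.
   This min/max pair is the step a SIMD version would vectorize. */
static void compare_exchange(int32_t *lo, int32_t *hi) {
    int32_t a = *lo, b = *hi;
    *lo = a < b ? a : b;
    *hi = a < b ? b : a;
}

/* Returns 1 if n is a nonzero power of two, 0 otherwise. */
static int is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* In-place ascending bitonic sort (illustrative sketch).
   Returns 0 on success, -1 if n is not a power of two. */
int bitonic_sort(int32_t *data, size_t n) {
    if (!is_power_of_two(n)) {
        return -1; /* the network is only defined for power-of-two sizes */
    }
    /* k: size of the bitonic sequences being merged in this stage. */
    for (size_t k = 2; k <= n; k <<= 1) {
        /* j: distance between the two elements of each comparison. */
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    if ((i & k) == 0) {
                        /* ascending block: smaller value goes to index i */
                        compare_exchange(&data[i], &data[partner]);
                    } else {
                        /* descending block: larger value goes to index i */
                        compare_exchange(&data[partner], &data[i]);
                    }
                }
            }
        }
    }
    return 0;
}
```

The sequence of comparisons depends only on the input length and not on the data, which is what makes the network a candidate for SIMD execution. With only four 32-bit lanes per 128-bit register, few comparisons are performed per instruction, which is consistent with the abstract's remark on SIMD vector length.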