Energy Efficient Computing on Multi-core Processors: Vectorization and Compression Techniques
Over the past few years, energy consumption has become the main limiting factor for computing in general. This has led CPU vendors to aggressively promote parallel computing using multiple cores without significantly increasing the thermal design power of the processor. However, achieving maximum performance and energy efficiency from the resources available on multi-core and many-core platforms requires applications that efficiently exploit existing and emerging architectural features. This thesis studies a selection of these technologies to identify their potential for achieving high performance and energy efficiency for a set of Smart Grid applications on Intel multi-core and many-core platforms.
The first part of this thesis explores how different multi-core programming techniques affect energy efficiency for a selected set of benchmarks and Smart Grid applications on Intel Sandy Bridge and Haswell multi-core processors. These techniques include thread-level parallelism using OpenMP, task-based parallelism using OmpSs, data parallelism using SIMD (Single Instruction Multiple Data) instruction sets, code optimizations, and the use of existing optimized math libraries. In our initial case studies, SIMD vectorization proved very effective in providing both high performance and energy efficiency. However, for some applications, such as the Powel time-series kernel, SIMD vectorization can also put pressure on the available memory bandwidth, leaving the computing resources under-utilized and thus making execution energy-inefficient.
In the second part of this research, we investigate opportunities to improve the performance of SIMD vectorization for memory-bound applications using SIMD data compression, SIMD software prefetching, SIMD shuffling, code blocking, and other code transformation techniques.
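A minimal sketch, in C, of how the thread-level (OpenMP) and SIMD data parallelism described above are commonly combined in a single loop. The kernel `series_mean` is hypothetical and is not the Powel time-series kernel; it only illustrates why such streaming loops can become memory-bound once vectorized: each 8-byte `double` loaded feeds a single floating-point add, i.e. about 0.125 flop per byte of memory traffic, so the vector units can consume data faster than memory can supply it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical streaming kernel (not the Powel kernel): mean of a series.
 * `parallel for` splits the iterations across OpenMP threads and `simd`
 * asks the compiler to vectorize each thread's chunk; the reduction clause
 * gives every thread/lane a private partial sum that is combined at the end.
 * Compile with -fopenmp (GCC/Clang); without it the pragma is ignored and
 * the loop simply runs serially.
 * Each iteration loads one 8-byte double and performs one add, so for arrays
 * much larger than the caches the loop is typically limited by memory
 * bandwidth rather than by the floating-point units. */
double series_mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}
```

Because the reduction may add elements in a different order across threads and SIMD lanes, the result for general floating-point data can differ from a strictly serial sum in the last bits.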
The key idea is to use idle CPU time to reduce data movement across the memory hierarchy. We show that integrating data compression is feasible on Intel multi-core platforms, provided that compression and decompression are fast enough not to cancel out the savings in data transfer. We present a comprehensive discussion of the SIMD compression techniques and the code transformations required for efficient SIMD computation in memory- and cache-bound applications, using the Powel time-series kernel as a demonstrator application. Finally, we perform a feasibility study of the SIMD optimization and compression techniques in other application domains using the k-means clustering algorithm and a full-search motion estimation algorithm. We also extend our experiments to Intel's many-core architecture using the Intel Xeon Phi coprocessor.
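To make the idea of spending idle CPU time to move fewer bytes concrete, here is a simplified scalar sketch of delta encoding with fixed-width bit packing, the general family that PFORDelta-style schemes build on. It is not V-PFORDelta: it uses no SIMD, no patched exceptions, and no blocking. The names `delta_encode`, `delta_decode`, and `delta_header` are hypothetical, and the sketch assumes samples are stored as `uint32_t` (for example, quantized meter readings). Consecutive differences are mapped to small unsigned codes with zigzag encoding so that negative steps also pack into few bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header for one encoded block of samples. */
typedef struct {
    uint32_t first;  /* first sample, stored verbatim */
    unsigned width;  /* bits per packed delta code (0..32) */
    size_t   n;      /* number of samples in the block */
} delta_header;

/* Zigzag: map a wrapped (two's-complement) delta to an unsigned code so
 * that small positive and small negative deltas both get small codes. */
static uint32_t zigzag(uint32_t d)   { return (d << 1) ^ (0u - (d >> 31)); }
static uint32_t unzigzag(uint32_t z) { return (z >> 1) ^ (0u - (z & 1u)); }

/* Number of significant bits in v (0 for v == 0). */
static unsigned bits_needed(uint32_t v)
{
    unsigned b = 0;
    while (v) { b++; v >>= 1; }
    return b;
}

/* OR code number i (w bits wide) into a zero-initialised bit stream. */
static void put_bits(uint64_t *out, size_t i, unsigned w, uint32_t code)
{
    if (w == 0) return;
    size_t bit = i * w;
    size_t word = bit / 64;
    unsigned off = (unsigned)(bit % 64);
    out[word] |= (uint64_t)code << off;
    if (off + w > 64)                      /* code straddles two words */
        out[word + 1] |= (uint64_t)code >> (64 - off);
}

/* Read code number i (w bits wide) back from the bit stream. */
static uint32_t get_bits(const uint64_t *in, size_t i, unsigned w)
{
    if (w == 0) return 0;
    size_t bit = i * w;
    size_t word = bit / 64;
    unsigned off = (unsigned)(bit % 64);
    uint64_t v = in[word] >> off;
    if (off + w > 64)
        v |= in[word + 1] << (64 - off);
    return (uint32_t)(v & ((UINT64_C(1) << w) - 1));
}

/* Encode x[0..n-1]; out must hold at least ((n-1)*32 + 63)/64 words. */
static delta_header delta_encode(const uint32_t *x, size_t n, uint64_t *out)
{
    assert(n > 0);
    delta_header h = { x[0], 0, n };
    for (size_t i = 1; i < n; i++) {           /* pass 1: widest code */
        unsigned b = bits_needed(zigzag(x[i] - x[i - 1]));
        if (b > h.width) h.width = b;
    }
    size_t words = ((n - 1) * h.width + 63) / 64;
    for (size_t k = 0; k < words; k++) out[k] = 0;
    for (size_t i = 1; i < n; i++)             /* pass 2: pack the codes */
        put_bits(out, i - 1, h.width, zigzag(x[i] - x[i - 1]));
    return h;
}

/* Rebuild the samples by accumulating decoded deltas (mod 2^32). */
static void delta_decode(delta_header h, const uint64_t *in, uint32_t *x)
{
    x[0] = h.first;
    for (size_t i = 1; i < h.n; i++)
        x[i] = x[i - 1] + unzigzag(get_bits(in, i - 1, h.width));
}
```

For the series {1000, 1002, 1001, 1005}, the deltas 2, -1, 4 become zigzag codes 4, 1, 8, so every code fits in 4 bits and the three deltas occupy 12 bits of a single 64-bit word, versus 16 bytes for the four raw samples. A single large delta forces a wide width for the whole block; patched (PFOR-style) schemes address this by storing such outliers separately as exceptions.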
Has parts:
Paper 1: Al Hasib, Abdullah. Case Studies of Multi-core Energy Efficiency in Task Based Programs. In: 2nd International Conference on ICT as Key Technology against Global Warming (ICT-GLOW'12). https://doi.org/10.1007/978-3-642-32606-6_4
Paper 2: Al Hasib, Abdullah. Performance and Power Efficiency Analysis of Data Reuse Transformation Methodology on Multicore Processor. In: First International Workshop on On-chip Memory Hierarchies and Interconnects: organization, management and implementation (OMHI'12). https://doi.org/10.1007/978-3-642-36949-0_37
Paper 3: Al Hasib, Abdullah; Natvig, Lasse. Performance Optimization and Evaluation of a Data Cleansing Algorithm on Multicore Processors. In: Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems. Academia Press, 2013, pp. 21-24. Not available due to copyright.
Paper 4: Al Hasib, Abdullah; Cebrian, Juan; Natvig, Lasse. V-PFORDelta: Data Compression for Energy Efficient Computation of Time Series. In: IEEE International Conference on High Performance Computing (HiPC), 2015, pp. 416-425. https://doi.org/10.1109/HiPC.2015.11 © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Paper 5: Al Hasib, Abdullah; Natvig, Lasse; Kjeldsberg, Per Gunnar; Cebrian, Juan Manuel. Energy Efficiency Effects of Vectorization in Data Reuse Transformations for Many-Core Processors: A Case Study. Journal of Low Power Electronics and Applications, 2017, Volume 7(1). https://doi.org/10.3390/jlpea7010005. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
Paper 6: Al Hasib, Abdullah; Cebrian, Juan Manuel; Natvig, Lasse. A Vectorized K-means Algorithm for Compressed Datasets: Design and Experimental Analysis. The Journal of Supercomputing. https://doi.org/10.1007/s11227-018-2310-0