
dc.contributor.advisor: Natvig, Lasse [nb_NO]
dc.contributor.author: Lillesand, Trond Inge [nb_NO]
dc.date.accessioned: 2014-12-19T13:40:17Z
dc.date.available: 2014-12-19T13:40:17Z
dc.date.created: 2013-10-12 [nb_NO]
dc.date.issued: 2013 [nb_NO]
dc.identifier: 655637 [nb_NO]
dc.identifier: ntnudaim:9105 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/253392
dc.description.abstract: The Mont-Blanc Project, led by the Barcelona Supercomputing Center, aims to achieve exascale computing performance with a new global standard in energy efficiency by integrating low-power ARM-based technology into a system of energy-efficient compute nodes. Application kernel research plays a crucial role in understanding the interplay between performance and energy efficiency in a High Performance Computing (HPC) system. In this thesis, the Mont-Blanc targeted application kernels 2D-Convolution and Merge Sort are explored and implemented in OmpSs, NEON, and OpenCL on an Arndale development board containing an Exynos 5 System-on-Chip (SoC). The SoC, which contains a dual-core ARM Cortex-A15 processor and a Mali-T604 GPU, serves as a compute node in the Mont-Blanc project. Because the Exynos 5 provides no access to energy counting registers, a scheme for measuring whole-board energy consumption was created. Performance and energy efficiency metrics were used to evaluate the various implementations. The frequency was also scaled for the different CPU implementations to see how it affects these metrics. NEON vectorization was exploited by using vector extractions on the 2D-Convolution kernel to improve locality. For Merge Sort, NEON was exploited by implementing in-register sorting with a bitonic sorting network, similar to the approach taken by Chhugani et al. (2008) with SSE, but applied to NEON instead. Implementations of sorting networks and convolution kernels in OpenCL were also explored. Several scheduling policies were tried in the OmpSs implementations to gauge their effect on performance. The in-register merge sort scheme with NEON gave higher performance and energy efficiency than the OpenCL implementations, although a direct comparison may not be entirely appropriate, as the quality and circumstances of the implementations likely differ.
However, vectorization with NEON delivered high performance at the expense of high power consumption while still achieving high energy efficiency, which demonstrates the power of locality improvement combined with vector operations. The OpenCL implementation of 2D-Convolution demonstrated high performance and low power consumption, and achieved the highest energy efficiency in this particular case. For the OmpSs implementations, the choice of scheduling policy proved to affect performance. Scaling the frequency of the applications shows that there is a balance point between frequency and energy efficiency: an excessively high frequency tends to increase power more than performance, and an excessively low frequency decreases performance more than power. The results indicate that this differential effect increases with the number of cores.
dc.language: eng [nb_NO]
dc.publisher: Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.title: Acceleration with OmpSs and Neon/OpenCL on ARM Processor [nb_NO]
dc.title.alternative: Acceleration with OmpSs and Neon/OpenCL on ARM Processor [nb_NO]
dc.type: Master thesis [nb_NO]
dc.source.pagenumber: 160 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]

