dc.contributor.author: Cebrian, Juan Manuel
dc.contributor.author: Natvig, Lasse
dc.contributor.author: Jahre, Magnus
dc.date.accessioned: 2019-11-05T09:23:29Z
dc.date.available: 2019-11-05T09:23:29Z
dc.date.created: 2019-07-09T10:55:17Z
dc.date.issued: 2019
dc.identifier.citation: Journal of Supercomputing. 2019, 1-16.
dc.identifier.issn: 0920-8542
dc.identifier.uri: http://hdl.handle.net/11250/2626529
dc.description.abstract: Energy efficiency below a specific thermal design power (TDP) has become the main design goal for microprocessors across all market segments. Optimizing the use of the available transistors within the TDP remains an open problem. Parallelism is the basic foundation for reaching the exascale level. While instruction-level and thread-level parallelism are embraced by developers, data-level parallelism is usually underutilized, despite its huge potential (e.g., single-instruction multiple-data execution). Companies are pushing the size of vector registers to double every 4 years; Intel's AVX-512 (512-bit registers) and ARM's SVE (up to 2048-bit registers) are examples of this trend. In this paper, we perform a scalability and energy efficiency analysis of AVX-512 using the ParVec benchmark suite. ParVec is extended to support AVX-512 as well as the newest versions of the GCC compiler. We use Intel's Top-Down model to show the main bottlenecks of the architecture for each studied benchmark. Results show that the performance and energy improvements depend greatly on the fraction of code that can be vectorized. Energy improvements over scalar code in a single-thread environment range from 2× for Streamcluster (worst) to 8× for Blackscholes (best).
dc.language.iso: eng
dc.publisher: Springer Verlag
dc.title: Scalability analysis of AVX-512 extensions
dc.type: Journal article
dc.type: Peer reviewed
dc.description.version: acceptedVersion
dc.source.pagenumber: 1-16
dc.source.journal: Journal of Supercomputing
dc.identifier.doi: 10.1007/s11227-019-02840-7
dc.identifier.cristin: 1710771
dc.description.localcode: This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. Locked until 23 April 2020 due to copyright restrictions. The final authenticated version is available online at: https://doi.org/10.1007/s11227-019-02840-7
cristin.unitcode: 194,63,10,0
cristin.unitname: Institutt for datateknologi og informatikk (Department of Computer Science)
cristin.ispublished: true
cristin.qualitycode: 1
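As a rough illustration of the data-level parallelism the abstract refers to, the C sketch below adds two float arrays once with a scalar loop and once with AVX-512F intrinsics. It is a hypothetical example, not code from the paper or from ParVec, and the function names (add_scalar, add_avx512, add_arrays) are made up for illustration. A 512-bit register holds 16 single-precision floats, so each vector addition does the work of 16 scalar ones, and a scalar tail loop handles any leftover elements. It assumes an x86-64 target and GCC or Clang, which provide `__attribute__((target("avx512f")))` and `__builtin_cpu_supports`.

```c
/* Hypothetical example, not from the paper or ParVec.
   Assumes x86-64 and GCC or Clang. */
#include <immintrin.h>
#include <stddef.h>

/* Scalar baseline: one float addition per loop iteration. */
static void add_scalar(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* AVX-512F version: a 512-bit register holds 16 floats, so one
   _mm512_add_ps performs 16 additions. The target attribute lets this
   function use AVX-512 intrinsics without compiling the whole file
   with -mavx512f. */
__attribute__((target("avx512f")))
static void add_avx512(const float *a, const float *b, float *c, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);   /* unaligned 16-float load */
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(c + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; i++)                        /* scalar tail for n % 16 */
        c[i] = a[i] + b[i];
}

/* Use the vector path only if the running CPU reports AVX-512F. */
void add_arrays(const float *a, const float *b, float *c, size_t n) {
    if (__builtin_cpu_supports("avx512f"))
        add_avx512(a, b, c, n);
    else
        add_scalar(a, b, c, n);
}
```

On a CPU without AVX-512F, add_arrays falls back to the scalar loop, so the results are identical either way; only the number of instructions executed changes. As the abstract notes, how much this matters for a whole application depends on the fraction of its code that can be vectorized in this way.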