dc.contributor.author: Chen, Jing
dc.contributor.author: Fang, Jianbin
dc.contributor.author: Liu, Weifeng
dc.contributor.author: Tang, Tao
dc.contributor.author: Yang, Canqun
dc.date.accessioned: 2019-02-11T13:09:52Z
dc.date.available: 2019-02-11T13:09:52Z
dc.date.created: 2018-09-24T10:06:54Z
dc.date.issued: 2018
dc.identifier.citation: Future generations computer systems. 2018, 1-32. [nb_NO]
dc.identifier.issn: 0167-739X
dc.identifier.uri: http://hdl.handle.net/11250/2584807
dc.description.abstract: Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-cores and many-cores. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS solver (clMF) for recommender systems. On one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique, the fine-grained tiling technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to efficiently map it to the underlying hardware. The experimental results show that our implementation performs 2.8–15.7× faster on an Intel 16-core CPU, 23.9–87.9× faster on an NVIDIA K20C GPU and 34.6–97.1× faster on an AMD Fury X GPU than the baseline implementation. On the K20C GPU, our implementation also outperforms cuMF over different latent features ranging from 10 to 100 with various real-world recommendation datasets. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: Elsevier [nb_NO]
dc.title: clMF: A Fine-Grained and Portable Alternating Least Squares Algorithm for Parallel Matrix Factorization [nb_NO]
dc.title.alternative: clMF: A Fine-Grained and Portable Alternating Least Squares Algorithm for Parallel Matrix Factorization [nb_NO]
dc.type: Journal article [nb_NO]
dc.description.version: submittedVersion [nb_NO]
dc.source.pagenumber: 1-32 [nb_NO]
dc.source.journal: Future generations computer systems [nb_NO]
dc.identifier.doi: 10.1016/j.future.2018.04.071
dc.identifier.cristin: 1612783
dc.description.localcode: This is a submitted manuscript of an article published by Elsevier Ltd in Future generations computer systems, 10 May 2018. [nb_NO]
cristin.unitcode: 194,63,10,0
cristin.unitname: Institutt for datateknologi og informatikk
cristin.ispublished: true
cristin.fulltext: preprint
cristin.qualitycode: 1
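
For readers unfamiliar with the method named in the abstract, the following is a minimal serial sketch of alternating least squares for matrix factorization, written in plain Python. It is not the clMF implementation and does not reproduce its OpenCL kernels, thread batching, or tiling. All names (`solve`, `update_factors`, `objective`, `als`), the regularization weight `lam`, and the toy data layout are assumptions made for illustration. Each half-step fixes one factor matrix and solves a small regularized least-squares problem per row of the other.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def update_factors(obs_per_row, fixed, k, lam):
    """Recompute one factor matrix while the other (`fixed`) is held constant.

    For each row, solve (sum_j f_j f_j^T + lam * I) x = sum_j r_j f_j,
    where j runs over that row's observed ratings. Rows are independent.
    """
    updated = []
    for obs in obs_per_row:  # obs: list of (index into fixed, rating)
        A = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for idx, r in obs:
            f = fixed[idx]
            for i in range(k):
                b[i] += r * f[i]
                for j in range(k):
                    A[i][j] += f[i] * f[j]
        updated.append(solve(A, b))
    return updated


def objective(ratings, X, Y, lam):
    """Squared error on observed ratings plus L2 regularization."""
    sq_err = sum(
        (r - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2 for u, i, r in ratings
    )
    reg = lam * sum(v * v for row in X + Y for v in row)
    return sq_err + reg


def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    """Factorize sparse ratings [(user, item, rating), ...] as X @ Y^T."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    history = []
    for _ in range(iters):
        X = update_factors(by_user, Y, k, lam)  # fix Y, solve for X
        Y = update_factors(by_item, X, k, lam)  # fix X, solve for Y
        history.append(objective(ratings, X, Y, lam))
    return X, Y, history
```

Calling `als(ratings, n_users, n_items)` with a list of `(user, item, rating)` triples returns the two factor matrices and the regularized objective after each iteration, which should not increase because every half-step is an exact minimization. The per-row solves inside `update_factors` do not depend on one another, which is the kind of independence a parallel ALS solver can exploit.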

