Scaling up Bayesian variational inference using distributed computing clusters

Masegosa, Andres; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmeron, Antonio; Ramos-López, Dario

dc.contributor.author	Masegosa, Andres
dc.contributor.author	Martinez, Ana M.
dc.contributor.author	Langseth, Helge
dc.contributor.author	Nielsen, Thomas D.
dc.contributor.author	Salmeron, Antonio
dc.contributor.author	Ramos-López, Dario
dc.date.accessioned	2017-11-15T08:44:21Z
dc.date.available	2017-11-15T08:44:21Z
dc.date.created	2017-07-26T00:57:42Z
dc.date.issued	2017
dc.identifier.citation	International Journal of Approximate Reasoning. 2017, 88, 435-451.	nb_NO
dc.identifier.issn	0888-613X
dc.identifier.uri	http://hdl.handle.net/11250/2466330
dc.description.abstract	In this paper we present an approach for scaling up Bayesian learning using variational methods by exploiting distributed computing clusters managed by modern big data processing tools like Apache Spark or Apache Flink, which efficiently support iterative map-reduce operations. Our approach is defined as a distributed projected natural gradient ascent algorithm, has excellent convergence properties, and covers a wide range of conjugate exponential family models. We evaluate the proposed algorithm on three real-world datasets from different domains (the PubMed abstracts dataset, a GPS trajectory dataset, and a financial dataset) and using several models (LDA, factor analysis, mixture of Gaussians and linear regression models). Our approach compares favourably to stochastic variational inference and streaming variational Bayes, two of the main current proposals for scaling up variational methods. For the scalability analysis, we evaluate our approach over a network with more than one billion nodes and approximately 75% latent variables using a computer cluster with 128 processing units (AWS). The proposed methods are released as part of an open-source toolbox for scalable probabilistic machine learning (http://www.amidsttoolbox.com; Masegosa et al., 2017).	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Elsevier	nb_NO
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.title	Scaling up Bayesian variational inference using distributed computing clusters	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.source.pagenumber	435-451	nb_NO
dc.source.volume	88	nb_NO
dc.source.journal	International Journal of Approximate Reasoning	nb_NO
dc.identifier.doi	10.1016/j.ijar.2017.06.010
dc.identifier.cristin	1483087
dc.description.localcode	© 2017 Elsevier Ltd. This is the authors' accepted and refereed manuscript of the article, locked until 2019-06-28 due to copyright restrictions. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/	nb_NO
cristin.unitcode	194,63,10,0
cristin.unitname	Institutt for datateknikk og informasjonsvitenskap
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	2

Files in this item

Name:: Masegosa_et_al_texmain--final.pdf
Size:: 1.018 MB
Format:: PDF

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6544]
Publikasjoner fra CRIStin - NTNU [37177]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International