SIMD parallel MCMC sampling with applications for big-data Bayesian analytics

Author

Mahani, Alireza S.
Sharabiani, Mansour T.A.

Abstract

The computational intensity and sequential nature of estimation techniques for Bayesian methods in statistics and machine learning, combined with their increasing application to big-data analytics, necessitate both the identification of opportunities to parallelize techniques such as Markov Chain Monte Carlo (MCMC) sampling and the development of general strategies for mapping such parallel algorithms to modern CPUs, so that performance approaches the compute-based and/or memory-based hardware limits. Two opportunities for Single-Instruction Multiple-Data (SIMD) parallelization of MCMC sampling for probabilistic graphical models are presented. In exchangeable models with many observations, such as Bayesian Generalized Linear Models (GLMs), child-node contributions to the conditional posterior of each node can be calculated concurrently. In undirected graphs with discrete-value nodes, concurrent sampling of conditionally independent nodes can be transformed into a SIMD form. High-performance libraries with multi-threading and vectorization capabilities can be readily applied to such SIMD opportunities to gain a decent speedup, while a series of high-level source-code and runtime modifications provides a further performance boost by reducing parallelization overhead and increasing data locality on Non-Uniform Memory Access (NUMA) architectures. For big-data Bayesian GLM graphs, the end result is a routine for evaluating the conditional posterior and its gradient vector that is 5 times faster than a naive implementation using (built-in) multi-threaded Intel MKL BLAS, and that comes within striking distance of the memory-bandwidth-induced hardware limit. Using multi-threading for cache-friendly, fine-grained parallelization can outperform coarse-grained alternatives, which are often less cache-friendly; this is a likely scenario in modern predictive analytics workflows such as Hierarchical Bayesian GLM, variable selection, and ensemble regression and classification.
The proposed optimization strategies improve the scaling of performance with the number of cores and the width of vector units (applicable to many-core SIMD processors such as Intel Xeon Phi and Graphics Processing Units), resulting in cost-effectiveness, energy efficiency (‘green computing’), and higher speed on multi-core x86 processors.
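The first SIMD opportunity rests on exchangeability: in a Bayesian GLM the log-likelihood is a sum of independent per-observation (child-node) terms, so all N terms and their gradient contributions can be evaluated concurrently. A minimal NumPy sketch follows, assuming a logistic-regression likelihood and an independent Gaussian prior with a hypothetical scale `prior_sd`; the function names are illustrative, and this is not the authors' implementation (the abstract describes a baseline built on Intel MKL BLAS plus further source-level and runtime optimizations). With linear predictor $\eta = X\beta$ and fitted probabilities $\mu_i = 1/(1+e^{-\eta_i})$, it computes $\log p(\beta \mid y) = \sum_i [y_i\eta_i - \log(1+e^{\eta_i})] - \lVert\beta\rVert^2/(2\sigma^2) + \text{const}$ and its gradient $X^\top(y-\mu) - \beta/\sigma^2$.

```python
import numpy as np


def log_posterior_and_grad(beta, X, y, prior_sd=10.0):
    """Unnormalized log-posterior and gradient for Bayesian logistic regression.

    Assumed (illustrative) model: y_i ~ Bernoulli(sigmoid(x_i . beta)),
    beta_k ~ N(0, prior_sd**2) independently.
    """
    eta = X @ beta                                     # all N linear predictors in one matrix-vector product
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # logaddexp(0, eta) = log(1 + exp(eta)), overflow-safe
    logprior = -0.5 * np.dot(beta, beta) / prior_sd**2
    mu = 1.0 / (1.0 + np.exp(-eta))                    # fitted probabilities
    grad = X.T @ (y - mu) - beta / prior_sd**2         # sum of per-observation gradient terms
    return loglik + logprior, grad


def log_posterior_loop(beta, X, y, prior_sd=10.0):
    """Same log-posterior, one observation at a time (reference only)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta_i = float(x_i @ beta)
        # Each term depends only on its own observation, hence the SIMD opportunity.
        total += y_i * eta_i - np.logaddexp(0.0, eta_i)
    return total - 0.5 * float(beta @ beta) / prior_sd**2


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.5).astype(float)
beta = np.zeros(5)
f, g = log_posterior_and_grad(beta, X, y)
print(abs(f - log_posterior_loop(beta, X, y)) < 1e-8)  # → True
```

The vectorized version hands the two dominant operations, $X\beta$ and $X^\top(y-\mu)$, to BLAS-backed matrix-vector products, which is where multi-threaded and vectorized libraries can be applied directly.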
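The second opportunity, concurrent sampling of conditionally independent discrete nodes, can be illustrated with a swapped-in textbook example: a 2-D Ising model on a periodic lattice. If sites are coloured like a checkerboard, every site of one colour has all four neighbours of the other colour, so given the other colour all same-coloured spins are conditionally independent and can be drawn in a single vectorized step. For $p(s) \propto \exp\{\beta_T(J\sum_{\langle ij\rangle} s_i s_j + h\sum_i s_i)\}$, the conditional is $P(s_i = +1 \mid \text{rest}) = 1/(1+\exp\{-2\beta_T(J m_i + h)\})$, where $m_i$ is the sum of the neighbouring spins. The model, the parameters `inv_temp`, `J`, `h_ext`, and the function name are assumptions for illustration; the abstract does not say which undirected graphs the paper uses.

```python
import numpy as np


def checkerboard_gibbs_sweep(spins, inv_temp, rng, J=1.0, h_ext=0.0):
    """One Gibbs sweep of a periodic 2-D Ising lattice using red/black updates.

    spins: integer array of +1/-1 with even side lengths, updated in place.
    """
    n_rows, n_cols = spins.shape
    if n_rows % 2 or n_cols % 2:
        raise ValueError("periodic checkerboard colouring needs even side lengths")
    ii, jj = np.indices(spins.shape)
    for colour in (0, 1):
        mask = (ii + jj) % 2 == colour
        # Sum of the four nearest neighbours (periodic boundaries), using the
        # spins as they stand after the previous half-sweep.
        nb = (np.roll(spins, 1, axis=0) + np.roll(spins, -1, axis=0)
              + np.roll(spins, 1, axis=1) + np.roll(spins, -1, axis=1))
        # P(s = +1 | neighbours) for every site at once.
        p_up = 1.0 / (1.0 + np.exp(-2.0 * inv_temp * (J * nb + h_ext)))
        # Draw all sites, then keep only this colour's draws
        # (simple, though it wastes half the random numbers).
        draws = np.where(rng.random(spins.shape) < p_up, 1, -1)
        spins[mask] = draws[mask]
    return spins


rng = np.random.default_rng(1)
spins = rng.choice(np.array([-1, 1]), size=(32, 32))
for _ in range(10):
    checkerboard_gibbs_sweep(spins, inv_temp=0.3, rng=rng)
```

Because each half-sweep is an elementwise operation over the whole lattice, it maps naturally onto vector units; a sequential site-by-site Gibbs sampler would instead visit the sites one at a time.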
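The memory-bandwidth-induced hardware limit for evaluating a GLM posterior can be estimated with simple arithmetic: when the data matrix X is too large for cache, a routine that streams it from main memory cannot finish faster than (bytes read) / (sustained bandwidth). The sketch below assumes 8-byte doubles and counts only the traffic for X; the function name, problem size, and 50 GB/s bandwidth are hypothetical, not figures from the paper.

```python
def bandwidth_floor_seconds(n_obs, n_features, bandwidth_gb_per_s,
                            passes=1, bytes_per_value=8):
    """Lower bound on run time for a kernel that streams an n_obs x n_features
    matrix from main memory `passes` times (other arrays are ignored)."""
    bytes_moved = passes * n_obs * n_features * bytes_per_value
    return bytes_moved / (bandwidth_gb_per_s * 1e9)


# Hypothetical problem: one million observations, 100 features, 50 GB/s.
print(bandwidth_floor_seconds(1_000_000, 100, 50.0))            # → 0.016
print(bandwidth_floor_seconds(1_000_000, 100, 50.0, passes=2))  # → 0.032
```

A gradient routine that reads X once for $X\beta$ and again for $X^\top(y-\mu)$ has twice the floor of one that touches each row of X only once while it is in cache, which is one reason data locality matters for memory-bound kernels.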

Suggested Citation

Mahani, Alireza S. & Sharabiani, Mansour T.A., 2015. "SIMD parallel MCMC sampling with applications for big-data Bayesian analytics," Computational Statistics & Data Analysis, Elsevier, vol. 88(C), pages 75-99.

Handle: RePEc:eee:csdana:v:88:y:2015:i:c:p:75-99
DOI: 10.1016/j.csda.2015.02.010

References listed on IDEAS

Antonio, Katrien & Beirlant, Jan, 2007. "Actuarial statistics with generalized linear mixed models," Insurance: Mathematics and Economics, Elsevier, vol. 40(1), pages 58-76, January.
Bradley P. Carlin & Alan E. Gelfand & Adrian F. M. Smith, 1992. "Hierarchical Bayesian Analysis of Changepoint Problems," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 41(2), pages 389-405, June.
Ferreira da Silva, Adelino R., 2011. "cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 44(i04).
Strid, Ingvar, 2010. "Efficient parallelisation of Metropolis-Hastings algorithms using a prefetching approach," Computational Statistics & Data Analysis, Elsevier, vol. 54(11), pages 2814-2835, November.
Mark Girolami & Ben Calderhead, 2011. "Riemann manifold Langevin and Hamiltonian Monte Carlo methods," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(2), pages 123-214, March.
W. R. Gilks & P. Wild, 1992. "Adaptive Rejection Sampling for Gibbs Sampling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 41(2), pages 337-348, June.

Citations

Cited by:

Federico Palacios-González & Rosa M. García-Fernández, 2020. "A faster algorithm to estimate multiresolution densities," Computational Statistics, Springer, vol. 35(3), pages 1207-1230, September.
Li, Song & Tso, Geoffrey K.F. & Long, Lufan, 2017. "Powered embarrassing parallel MCMC sampling in Bayesian inference, a weighted average intuition," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 11-20.
Hasan, Asad & Wang, Zhiyu & Mahani, Alireza S., 2016. "Fast Estimation of Multinomial Logit Models: R Package mnlogit," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 75(i03).
Abpeykar, Shadi & Ghatee, Mehdi & Zare, Hadi, 2019. "Ensemble decision forest of RBF networks via hybrid feature clustering approach for high-dimensional data classification," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 12-36.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Rotondi, R., 2002. "On the influence of the proposal distributions on a reversible jump MCMC algorithm applied to the detection of multiple change-points," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 633-653, September.
White, Gentry & Porter, Michael D., 2014. "GPU accelerated MCMC for modeling terrorist activity," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 643-651.
Mahani, Alireza S. & Sharabiani, Mansour T. A., 2017. "Multivariate-From-Univariate MCMC Sampler: The R Package MfUSampler," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 78(c01).
Fitzpatrick, Matthew, 2014. "Geometric ergodicity of the Gibbs sampler for the Poisson change-point model," Statistics & Probability Letters, Elsevier, vol. 91(C), pages 55-61.
Pang, W. K. & Yang, Z. H. & Hou, S. H. & Leung, P. K., 2002. "Non-uniform random variate generation by the vertical strip method," European Journal of Operational Research, Elsevier, vol. 142(3), pages 595-609, November.
Atkinson, Scott E. & Tsionas, Mike G., 2021. "Generalized estimation of productivity with multiple bad outputs: The importance of materials balance constraints," European Journal of Operational Research, Elsevier, vol. 292(3), pages 1165-1186.
Jia Liu & John M. Maheu & Yong Song, 2024. "Identification and forecasting of bull and bear markets using multivariate returns," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(5), pages 723-745, August.
- Liu, Jia & Maheu, John M & Song, Yong, 2023. "Identification and Forecasting of Bull and Bear Markets using Multivariate Returns," MPRA Paper 119515, University Library of Munich, Germany.
Dimitrakopoulos, Stefanos & Tsionas, Mike, 2019. "Ordinal-response GARCH models for transaction data: A forecasting exercise," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1273-1287.
Vanhatalo, Jarno & Veneranta, Lari & Hudd, Richard, 2012. "Species distribution modeling with Gaussian processes: A case study with the youngest stages of sea spawning whitefish (Coregonus lavaretus L. s.l.) larvae," Ecological Modelling, Elsevier, vol. 228(C), pages 49-58.
Z. Rezaei Ghahroodi & M. Ganjali, 2013. "A Bayesian approach for analysing longitudinal nominal outcomes using random coefficients transitional generalized logit model: an application to the labour force survey data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 40(7), pages 1425-1445, July.
Will Penny & Biswa Sengupta, 2016. "Annealed Importance Sampling for Neural Mass Models," PLOS Computational Biology, Public Library of Science, vol. 12(3), pages 1-25, March.
Anis Fradi & Chafik Samir & Ines Adouani, 2024. "A New Bayesian Approach to Global Optimization on Parametrized Surfaces in $\mathbb{R}^{3}$," Journal of Optimization Theory and Applications, Springer, vol. 202(3), pages 1077-1100, September.
Zarezadeh Zakarya & Costantini Giovanni, 2019. "Particle diffusion Monte Carlo (PDMC)," Monte Carlo Methods and Applications, De Gruyter, vol. 25(2), pages 121-130, June.
Owyang, Michael T. & Piger, Jeremy & Wall, Howard J., 2008. "A state-level analysis of the Great Moderation," Regional Science and Urban Economics, Elsevier, vol. 38(6), pages 578-589, November.
- Michael T. Owyang & Jeremy Piger & Howard J. Wall & Federal Reserve Bank of St. Louis, 2006. "A State-Level Analysis of the Great Moderation," Computing in Economics and Finance 2006 131, Society for Computational Economics.
- Michael T. Owyang & Jeremy M. Piger & Howard J. Wall, 2007. "A state-level analysis of the Great Moderation," Working Papers 2007-003, Federal Reserve Bank of St. Louis.
Ruggieri, Eric & Antonellis, Marcus, 2016. "An exact approach to Bayesian sequential change point detection," Computational Statistics & Data Analysis, Elsevier, vol. 97(C), pages 71-86.
Nandram, Balgobin & Zelterman, Daniel, 2007. "Computational Bayesian inference for estimating the size of a finite population," Computational Statistics & Data Analysis, Elsevier, vol. 51(6), pages 2934-2945, March.
Michael L. Polemis & Mike G. Tsionas, 2019. "Bayesian nonlinear panel cointegration: an empirical application to the EKC hypothesis," Letters in Spatial and Resource Sciences, Springer, vol. 12(2), pages 113-120, August.
Agudze, Komla M. & Billio, Monica & Casarin, Roberto & Ravazzolo, Francesco, 2022. "Markov switching panel with endogenous synchronization effects," Journal of Econometrics, Elsevier, vol. 230(2), pages 281-298.
- Komla M. Agudze & Monica Billio & Roberto Casarin & Francesco Ravazzolo, 2021. "Markov Switching Panel with Endogenous Synchronization Effects," BEMPS - Bozen Economics & Management Paper Series BEMPS82, Faculty of Economics and Management at the Free University of Bozen.
Arnak S. Dalalyan, 2017. "Theoretical guarantees for approximate sampling from smooth and log-concave densities," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(3), pages 651-676, June.
- Arnak S. Dalalyan, 2014. "Theoretical guarantees for approximate sampling from smooth and log-concave densities," Working Papers 2014-45, Center for Research in Economics and Statistics.
Fuentes-García, Ruth & Mena, Ramsés H. & Walker, Stephen G., 2009. "A nonparametric dependent process for Bayesian regression," Statistics & Probability Letters, Elsevier, vol. 79(8), pages 1112-1119, April.
