SIMD parallel MCMC sampling with applications for big-data Bayesian analytics
The computational intensity and sequential nature of estimation techniques for Bayesian methods in statistics and machine learning, combined with their increasing application to big-data analytics, necessitate both the identification of opportunities to parallelize techniques such as Markov Chain Monte Carlo (MCMC) sampling, and the development of general strategies for mapping such parallel algorithms to modern CPUs in order to push performance up to the compute-based and/or memory-based hardware limits. Two opportunities for Single-Instruction Multiple-Data (SIMD) parallelization of MCMC sampling for probabilistic graphical models are presented. In exchangeable models with many observations, such as Bayesian Generalized Linear Models (GLMs), child-node contributions to the conditional posterior of each node can be calculated concurrently. In undirected graphs with discrete-value nodes, concurrent sampling of conditionally independent nodes can be transformed into a SIMD form. High-performance libraries with multi-threading and vectorization capabilities can be readily applied to such SIMD opportunities to gain a decent speedup, while a series of high-level source-code and runtime modifications provide a further performance boost by reducing parallelization overhead and increasing data locality on Non-Uniform Memory Access architectures. For big-data Bayesian GLM graphs, the end result is a routine for evaluating the conditional posterior and its gradient vector that is 5 times faster than a naive implementation using (built-in) multi-threaded Intel MKL BLAS, and that comes within striking distance of the memory-bandwidth-induced hardware limit. Using multi-threading for cache-friendly, fine-grained parallelization can outperform coarse-grained alternatives, which are often less cache-friendly; this is a likely scenario in modern predictive analytics workflows such as Hierarchical Bayesian GLM, variable selection, and ensemble regression and classification.
The proposed optimization strategies improve the scaling of performance with the number of cores and the width of vector units (applicable to many-core SIMD processors such as Intel Xeon Phi and Graphics Processing Units), resulting in cost-effectiveness, energy efficiency (‘green computing’), and higher speed on multi-core x86 processors.
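The first SIMD opportunity described in the abstract, concurrent evaluation of child-node contributions in an exchangeable model, can be illustrated with a minimal sketch. It is not the paper's implementation: the logistic-regression likelihood, the independent zero-mean Gaussian prior, and the names `logistic_log_posterior_and_grad` and `prior_var` are assumptions chosen for illustration. The point is structural: each observation contributes an independent term to the log-posterior and to its gradient, so the loop over observations is a map followed by a sum, which is the pattern that vector units and threads can execute in parallel.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_log_posterior_and_grad(beta, X, y, prior_var=100.0):
    """Log-posterior (up to a constant) and gradient for Bayesian logistic regression.

    Assumed model (illustrative only): y_i ~ Bernoulli(sigmoid(x_i . beta)),
    with independent N(0, prior_var) priors on the coefficients.
    """
    k = len(beta)
    # Prior contribution, computed once.
    logp = -sum(b * b for b in beta) / (2.0 * prior_var)
    grad = [-b / prior_var for b in beta]

    # Each pass through this loop depends only on (x_i, y_i) and beta,
    # never on another observation. The iterations are therefore
    # independent and the accumulation is a plain sum-reduction.
    for x_i, y_i in zip(X, y):
        eta = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        logp += y_i * log_sigmoid(eta) + (1 - y_i) * log_sigmoid(-eta)
        resid = y_i - sigmoid(eta)
        for j in range(k):
            grad[j] += resid * x_i[j]
    return logp, grad


# Tiny usage example with two observations and two coefficients.
X = [[1.0, 0.0], [1.0, 1.0]]
y = [1, 0]
logp, grad = logistic_log_posterior_and_grad([0.0, 0.0], X, y)
print(logp, grad)
```

In an array or BLAS-based setting, the inner products would typically be expressed as a matrix-vector product and the gradient accumulation as a transposed matrix-vector product; the pure-Python loop here only makes the independence of the per-observation terms explicit.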
When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:88:y:2015:i:c:p:75-99. See general information about how to correct material in RePEc.