A Bayesian perspective of statistical machine learning for big data

Author

Rajiv Sambasivan (Chennai Mathematical Institute)
Sourish Das (Chennai Mathematical Institute)
Sujit K. Sahu (University of Southampton)

Abstract

Statistical Machine Learning (SML) refers to a body of algorithms and methods by which computers discover important features of input data sets, which are often very large. This task of discovering features from data is essentially what the keyword ‘learning’ means in SML. Theoretical justifications for the effectiveness of SML algorithms are underpinned by sound principles from different disciplines, such as Computer Science and Statistics. The theoretical underpinnings justified specifically by statistical inference methods are together termed statistical learning theory. This paper reviews SML from a Bayesian decision-theoretic point of view, arguing that many SML techniques are closely connected to inference under the so-called Bayesian paradigm. We discuss many important SML techniques, such as supervised and unsupervised learning, deep learning, online learning and Gaussian processes, especially in the context of the very large data sets where they are often employed. We present a dictionary that maps the key concepts of SML between Computer Science and Statistics. We illustrate the SML techniques with three moderately large data sets and discuss many practical implementation issues. The review is thus targeted especially at statisticians and computer scientists who aspire to understand and apply SML to moderately large to big data sets.
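The connection the abstract draws between SML techniques and Bayesian inference can be illustrated with one standard textbook case: ridge regression coincides with the posterior mean of a linear model under a zero-mean Gaussian prior on the coefficients. The sketch below is not taken from the paper; the function names, the simulated data, and the values of `sigma2` (noise variance) and `tau2` (prior variance) are all assumptions chosen for illustration.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Penalized least squares: argmin_w ||y - Xw||^2 + lam * ||w||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def gaussian_posterior_mean(X, y, sigma2, tau2):
    """Posterior mean of w when y ~ N(Xw, sigma2 * I) and w ~ N(0, tau2 * I)."""
    p = X.shape[1]
    # Posterior precision combines the likelihood and prior precisions.
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    return np.linalg.solve(precision, X.T @ y / sigma2)


# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
sigma2, tau2 = 0.25, 4.0
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=200)

# The ridge penalty that matches the Gaussian prior is lam = sigma2 / tau2.
w_ridge = ridge_closed_form(X, y, lam=sigma2 / tau2)
w_bayes = gaussian_posterior_mean(X, y, sigma2, tau2)

print(np.allclose(w_ridge, w_bayes))
```

Under these assumptions the two estimates agree up to floating-point error, so the regularization strength can be read as the ratio of noise variance to prior variance. An analogous reading with a Laplace prior gives the lasso, which is the setting of the Bayesian Lasso literature.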

Suggested Citation

Rajiv Sambasivan & Sourish Das & Sujit K. Sahu, 2020. "A Bayesian perspective of statistical machine learning for big data," Computational Statistics, Springer, vol. 35(3), pages 893-930, September.

Handle: RePEc:spr:compst:v:35:y:2020:i:3:d:10.1007_s00180-020-00970-8
DOI: 10.1007/s00180-020-00970-8

Citations

Cited by:

Christian Soize, 2023. "Probabilistic learning constrained by realizations using a weak formulation of Fourier transform of probability measures," Computational Statistics, Springer, vol. 38(4), pages 1879-1925, December.
Hirofumi Michimae & Takeshi Emura, 2022. "Bayesian ridge estimators based on copula-based joint prior distributions for regression coefficients," Computational Statistics, Springer, vol. 37(5), pages 2741-2769, November.
Emanuele Dolera, 2022. "Asymptotic Efficiency of Point Estimators in Bayesian Predictive Inference," Mathematics, MDPI, vol. 10(7), pages 1-27, April.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Gary Koop & Dimitris Korobilis, 2023. "Bayesian Dynamic Variable Selection In High Dimensions," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 64(3), pages 1047-1074, August.
- Gary Koop & Dimitris Korobilis, 2018. "Bayesian dynamic variable selection in high dimensions," Papers 1809.03031, arXiv.org, revised May 2020.
- Korobilis, Dimitris & Koop, Gary, 2020. "Bayesian dynamic variable selection in high dimensions," MPRA Paper 100164, University Library of Munich, Germany.
- Gary Koop & Dimitris Korobilis, 2020. "Bayesian dynamic variable selection in high dimensions," Working Papers 2020_11, Business School - Economics, University of Glasgow.
Yu-Zhu Tian & Man-Lai Tang & Wai-Sum Chan & Mao-Zai Tian, 2021. "Bayesian bridge-randomized penalized quantile regression for ordinal longitudinal data, with application to firm’s bond ratings," Computational Statistics, Springer, vol. 36(2), pages 1289-1319, June.
Yagli, Gokhan Mert & Yang, Dazhi & Srinivasan, Dipti, 2019. "Automatic hourly solar forecasting using machine learning models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 105(C), pages 487-498.
Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
- Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2020. "Machine Learning Advances for Time Series Forecasting," Papers 2012.12802, arXiv.org, revised Apr 2021.
Korobilis, Dimitris, 2013. "Hierarchical shrinkage priors for dynamic regressions with many predictors," International Journal of Forecasting, Elsevier, vol. 29(1), pages 43-59.
- Dimitris Korobilis, 2011. "Hierarchical Shrinkage Priors for Dynamic Regressions with Many Predictors," Working Paper series 21_11, Rimini Centre for Economic Analysis.
- Korobilis, Dimitris, 2011. "Hierarchical shrinkage priors for dynamic regressions with many predictors," MPRA Paper 30380, University Library of Munich, Germany.
- KOROBILIS, Dimitris, 2011. "Hierarchical shrinkage priors for dynamic regressions with many predictors," LIDAM Discussion Papers CORE 2011021, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
Philip Kostov & Thankom Arun & Samuel Annim, 2014. "Financial Services to the Unbanked: the case of the Mzansi intervention in South Africa," Contemporary Economics, Vizja University, vol. 8(2), June.
Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
Olivier Collignon & Jeongseop Han & Hyungmi An & Seungyoung Oh & Youngjo Lee, 2018. "Comparison of the modified unbounded penalty and the LASSO to select predictive genes of response to chemotherapy in breast cancer," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-15, October.
Park, Seongoh & Kim, Joungyoun & Wang, Xinlei & Lim, Johan, 2024. "Variable selection in Bayesian multiple instance regression using shotgun stochastic search," Computational Statistics & Data Analysis, Elsevier, vol. 196(C).
Gilles Charmet & Louis-Gautier Tran & Jérôme Auzanneau & Renaud Rincent & Sophie Bouchet, 2020. "BWGS: A R package for genomic selection and its application to a wheat breeding programme," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-20, April.
Scutari Marco & Mackay Ian & Balding David, 2013. "Improving the efficiency of genomic selection," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 517-527, August.
Posch, Konstantin & Arbeiter, Maximilian & Pilz, Juergen, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
Tanin Sirimongkolkasem & Reza Drikvandi, 2019. "On Regularisation Methods for Analysis of High Dimensional Data," Annals of Data Science, Springer, vol. 6(4), pages 737-763, December.
Mogliani, Matteo & Simoni, Anna, 2021. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Journal of Econometrics, Elsevier, vol. 222(1), pages 833-860.
- Matteo Mogliani & Anna Simoni, 2019. "Bayesian MIDAS Penalized Regressions: Estimation, Selection, and Prediction," Papers 1903.08025, arXiv.org, revised Jun 2020.
- Matteo Mogliani & Anna Simoni, 2020. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Post-Print hal-03089878, HAL.
- Matteo Mogliani, 2019. "Bayesian MIDAS penalized regressions: estimation, selection, and prediction," Working papers 713, Banque de France.
Christis Katsouris, 2023. "High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods," Papers 2308.16192, arXiv.org.
Roberto Casarin & Fausto Corradin & Francesco Ravazzolo & Nguyen Domenico Sartore & Wing-Keung Wong, 2020. "A Scoring Rule for Factor and Autoregressive Models Under Misspecification," Advances in Decision Sciences, Asia University, Taiwan, vol. 24(2), pages 66-103, June.
- Roberto Casarin & Fausto Corradin & Francesco Ravazzolo & Nguyen Domenico Sartore & Wing-Keung Wong, 2020. "A Scoring Rule for Factor and Autoregressive Models Under Misspecification," International Association of Decision Sciences, Asia University, Taiwan, vol. 24(2), pages 66-103, June.
- Roberto Casarin & Fausto Corradin & Francesco Ravazzolo & Domenico Sartore, 2018. "A scoring rule for factor and autoregressive models under misspecification," Working Papers 2018:18, Department of Economics, University of Venice "Ca' Foscari".
Daniel Felix Ahelegbey & Monica Billio & Roberto Casarin, 2016. "Sparse Graphical Vector Autoregression: A Bayesian Approach," Annals of Economics and Statistics, GENES, issue 123-124, pages 333-361.
- Roberto Casarin & Daniel Felix Ahelegbey & Monica Billio, 2014. "Sparse Graphical Vector Autoregression: A Bayesian Approach," Working Papers 2014:29, Department of Economics, University of Venice "Ca' Foscari".
Bergersen Linn Cecilie & Glad Ingrid K. & Lyng Heidi, 2011. "Weighted Lasso with Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-29, August.
van Erp, Sara & Oberski, Daniel L. & Mulder, Joris, 2018. "Shrinkage priors for Bayesian penalized regression," OSF Preprints cg8fq, Center for Open Science.
Kshitij Khare & Malay Ghosh, 2022. "MCMC Convergence for Global-Local Shrinkage Priors," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 20(1), pages 211-234, September.
