
Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets

Author

Listed:
  • Florian Maire

    (School of Mathematics and Statistics, University College Dublin; Insight Centre for Data Analytics, University College Dublin)

  • Nial Friel

    (School of Mathematics and Statistics, University College Dublin; Insight Centre for Data Analytics, University College Dublin)

  • Pierre ALQUIER

    (CREST-ENSAE)

Abstract

This paper introduces a framework for speeding up Bayesian inference conducted in the presence of large datasets. We design a Markov chain whose transition kernel uses an unknown fraction of fixed size of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation (ABC) literature, the subsampling process is guided by the fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, contrary to existing scalable methodologies, preserves the simplicity of the Metropolis-Hastings algorithm. Even though exactness is lost, i.e. the chain distribution only approximates the target, we study and quantify this bias theoretically and show on a diverse set of examples that it yields excellent performance when the computational budget is limited. We also show that setting the summary statistics to the maximum likelihood estimator, if it is available and cheap to compute, is supported by theoretical arguments.
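
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy model (Gaussian observations with unknown mean, unit variance, flat prior) and uses the sample mean as summary statistic, which is the maximum likelihood estimator in this model. The function name iss_mcmc, the single-index swap proposal, the tuning parameters eps and step, and the N/n rescaling of the sub-sample log-likelihood are illustrative assumptions and are not taken from this record.

```python
import math
import random


def log_lik(theta, ys):
    """Gaussian log-likelihood (unit variance), up to an additive constant."""
    return -0.5 * sum((y - theta) ** 2 for y in ys)


def iss_mcmc(data, n_sub, n_iter, eps=1000.0, step=0.05, seed=0):
    """Toy sub-sampling Metropolis-Hastings chain (illustrative assumptions).

    Each iteration (1) proposes to refresh the subset by swapping one index,
    favouring subsets whose summary statistic is close to that of the full
    data, and (2) performs a random-walk MH update of theta that only
    evaluates the likelihood on the current subset.
    """
    rng = random.Random(seed)
    n_full = len(data)
    full_stat = sum(data) / n_full        # summary statistic of the full data
    idx = rng.sample(range(n_full), n_sub)
    in_sub = set(idx)
    sub_sum = sum(data[i] for i in idx)
    theta = sub_sum / n_sub               # start at the subset estimate
    scale = n_full / n_sub                # assumed rescaling of the log-likelihood
    chain = []
    for _ in range(n_iter):
        # Step 1: informed refresh of the subset (symmetric swap proposal).
        pos = rng.randrange(n_sub)
        new = rng.randrange(n_full)
        if new not in in_sub:
            new_sum = sub_sum - data[idx[pos]] + data[new]
            d_old = (sub_sum / n_sub - full_stat) ** 2
            d_new = (new_sum / n_sub - full_stat) ** 2
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < -eps * (d_new - d_old):
                in_sub.remove(idx[pos])
                in_sub.add(new)
                idx[pos] = new
                sub_sum = new_sum
        # Step 2: random-walk MH update of theta using the subset only.
        prop = theta + step * rng.gauss(0.0, 1.0)
        ys = [data[i] for i in idx]
        log_ratio = scale * (log_lik(prop, ys) - log_lik(theta, ys))
        if math.log(1.0 - rng.random()) < log_ratio:
            theta = prop
        chain.append(theta)
    return chain


if __name__ == "__main__":
    gen = random.Random(1)
    observations = [gen.gauss(2.0, 1.0) for _ in range(2000)]
    draws = iss_mcmc(observations, n_sub=100, n_iter=2000)
    print("mean of second half of chain:", sum(draws[1000:]) / 1000)
    print("full-data sample mean:", sum(observations) / len(observations))
```

In this sketch each iteration evaluates the likelihood on n_sub points rather than on the full dataset, which is where the computational saving would come from. The chain only approximates the full-data posterior, consistent with the loss of exactness described above.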

Suggested Citation

  • Florian Maire & Nial Friel & Pierre ALQUIER, 2017. "Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets," Working Papers 2017-40, Center for Research in Economics and Statistics.
  • Handle: RePEc:crs:wpaper:2017-40

    Download full text from publisher

    File URL: http://crest.science/RePEc/wpstorage/2017-40.pdf
    File Function: CREST working paper version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, J. & Nott, D.J. & Fan, Y. & Sisson, S.A., 2017. "Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 77-89.
    2. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    3. Gael M. Martin & David T. Frazier & Christian P. Robert, 2021. "Approximating Bayes in the 21st Century," Monash Econometrics and Business Statistics Working Papers 24/21, Monash University, Department of Econometrics and Business Statistics.
    4. Menéndez, P. & Fan, Y. & Garthwaite, P.H. & Sisson, S.A., 2014. "Simultaneous adjustment of bias and coverage probabilities for confidence intervals," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 35-44.
    5. Christopher C. Drovandi & Anthony N. Pettitt, 2013. "Bayesian Experimental Design for Models with Intractable Likelihoods," Biometrics, The International Biometric Society, vol. 69(4), pages 937-948, December.
    6. Baey, Charlotte & Smith, Henrik G. & Rundlöf, Maj & Olsson, Ola & Clough, Yann & Sahlin, Ullrika, 2023. "Calibration of a bumble bee foraging model using Approximate Bayesian Computation," Ecological Modelling, Elsevier, vol. 477(C).
    7. Jonathan U Harrison & Ruth E Baker, 2020. "An automatic adaptive method to combine summary statistics in approximate Bayesian computation," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-21, August.
    8. Prangle Dennis & Fearnhead Paul & Cox Murray P. & Biggs Patrick J. & French Nigel P., 2014. "Semi-automatic selection of summary statistics for ABC model choice," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 67-82, February.
    9. Buzbas, Erkan O. & Rosenberg, Noah A., 2015. "AABC: Approximate approximate Bayesian computation for inference in population-genetic models," Theoretical Population Biology, Elsevier, vol. 99(C), pages 31-42.
    10. Soubeyrand Samuel & Guiton François & Klein Etienne K. & Carpentier Florence, 2013. "Approximate Bayesian computation with functional statistics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(1), pages 17-37, March.
    11. Mikael Sunnåker & Alberto Giovanni Busetto & Elina Numminen & Jukka Corander & Matthieu Foll & Christophe Dessimoz, 2013. "Approximate Bayesian Computation," PLOS Computational Biology, Public Library of Science, vol. 9(1), pages 1-10, January.
    12. Creel, Michael & Kristensen, Dennis, 2016. "On selection of statistics for approximate Bayesian computing (or the method of simulated moments)," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 99-114.
    13. Wilkinson Richard David, 2013. "Approximate Bayesian computation (ABC) gives exact results under the assumption of model error," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(2), pages 129-141, May.
    14. Nakagome Shigeki & Fukumizu Kenji & Mano Shuhei, 2013. "Kernel approximate Bayesian computation in population genetic inferences," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(6), pages 667-678, December.
    15. Michael Stocks & Mathieu Siol & Martin Lascoux & Stéphane De Mita, 2014. "Amount of Information Needed for Model Choice in Approximate Bayesian Computation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-13, June.
    16. VanDerHorn, Eric & Mahadevan, Sankaran, 2018. "Bayesian model updating with summarized statistical and reliability data," Reliability Engineering and System Safety, Elsevier, vol. 172(C), pages 12-24.
    17. Soubeyrand, Samuel & Haon-Lasportes, Emilie, 2015. "Weak convergence of posteriors conditional on maximum pseudo-likelihood estimates and implications in ABC," Statistics & Probability Letters, Elsevier, vol. 107(C), pages 84-92.
    18. Silk Daniel & Filippi Sarah & Stumpf Michael P. H., 2013. "Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(5), pages 603-618, October.
    19. Laurent Davezies & Xavier D'Haultfoeuille & Yannick Guyonvarch, 2019. "Empirical Process Results for Exchangeable Arrays," Papers 1906.11293, arXiv.org, revised May 2020.
    20. Alexander Frankel & Maximilian Kasy, 2022. "Which Findings Should Be Published?," American Economic Journal: Microeconomics, American Economic Association, vol. 14(1), pages 1-38, February.

    More about this item

    Keywords

    Bayesian inference; Big-data; Approximate Bayesian Computation; noisy Markov chain Monte Carlo;

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
