IDEAS home Printed from https://ideas.repec.org/p/crs/wpaper/2017-40.html

Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets

Author

Listed:
  • Florian Maire

    (School of Mathematics and Statistics, University College Dublin; Insight Centre for Data Analytics, University College Dublin)

  • Nial Friel

    (School of Mathematics and Statistics, University College Dublin; Insight Centre for Data Analytics, University College Dublin)

  • Pierre ALQUIER

    (CREST-ENSAE)

Abstract

This paper introduces a framework for speeding up Bayesian inference conducted in the presence of large datasets. We design a Markov chain whose transition kernel uses a fixed-size, unknown fraction of the available data that is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation (ABC) literature, the subsampling process is guided by the fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, contrary to existing scalable methodologies, preserves the simplicity of the Metropolis-Hastings algorithm. Even though exactness is lost, i.e., the chain distribution only approximates the target, we study and quantify this bias theoretically and show on a diverse set of examples that it yields excellent performance when the computational budget is limited. We also show that setting the summary statistics to the maximum likelihood estimator, when it is available and cheap to compute, is supported by theoretical arguments.
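
The idea described in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation. Everything specific in it is an assumption made for illustration: the Gaussian model with unit variance, the flat prior, the single-index swap proposal for refreshing the subset, the subset likelihood rescaled by N/n, and the names `informed_subsampling_mh`, `eps` and `step`. The summary statistic is the sample mean, which is the maximum likelihood estimator for this toy model.

```python
import numpy as np


def log_gauss_lik(theta, y):
    # Log-likelihood of i.i.d. N(theta, 1) observations, up to a constant.
    return -0.5 * np.sum((y - theta) ** 2)


def informed_subsampling_mh(y, n, n_iter, eps, step, rng):
    """Toy sub-sampling Metropolis-Hastings sampler (illustrative assumptions only)."""
    N = y.size
    s_full = y.mean()  # summary statistic of the full data (here the MLE)
    subset = rng.choice(N, size=n, replace=False)
    dist = (y[subset].mean() - s_full) ** 2
    theta = y[subset].mean()
    chain = np.empty(n_iter)
    for t in range(n_iter):
        # Step 1: refresh the subset by proposing to swap one index.
        # The swap is accepted more readily when the subset's summary
        # statistic moves closer to the full-data summary statistic.
        candidate = rng.integers(N)
        if candidate not in subset:
            new_subset = subset.copy()
            new_subset[rng.integers(n)] = candidate
            new_dist = (y[new_subset].mean() - s_full) ** 2
            if np.log(rng.uniform()) < -eps * (new_dist - dist):
                subset, dist = new_subset, new_dist
        # Step 2: random-walk Metropolis-Hastings update of theta using only
        # the current subset; the log-likelihood is rescaled by N / n
        # (an assumption of this sketch), and the prior is flat.
        proposal = theta + step * rng.normal()
        y_sub = y[subset]
        log_ratio = (N / n) * (log_gauss_lik(proposal, y_sub) - log_gauss_lik(theta, y_sub))
        if np.log(rng.uniform()) < log_ratio:
            theta = proposal
        chain[t] = theta
    return chain


rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=10_000)
chain = informed_subsampling_mh(y, n=100, n_iter=5_000, eps=1e4, step=0.02, rng=rng)
print("approximate posterior mean:", chain[1_000:].mean())
print("full-data sample mean:     ", y.mean())
```

Each iteration touches only n = 100 of the 10,000 observations, so the per-iteration cost does not grow with the full dataset size. The price is the approximation the abstract mentions: the chain targets a distribution close to, but not equal to, the full posterior.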

Suggested Citation

  • Florian Maire & Nial Friel & Pierre ALQUIER, 2017. "Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets," Working Papers 2017-40, Center for Research in Economics and Statistics.
  • Handle: RePEc:crs:wpaper:2017-40

    Download full text from publisher

    File URL: http://crest.science/RePEc/wpstorage/2017-40.pdf
    File Function: CREST working paper version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, J. & Nott, D.J. & Fan, Y. & Sisson, S.A., 2017. "Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 77-89.
    2. Laurent Davezies & Xavier D'Haultfoeuille & Yannick Guyonvarch, 2019. "Empirical Process Results for Exchangeable Arrays," Papers 1906.11293, arXiv.org, revised May 2020.
    3. Grazzini, Jakob & Richiardi, Matteo G. & Tsionas, Mike, 2017. "Bayesian estimation of agent-based models," Journal of Economic Dynamics and Control, Elsevier, vol. 77(C), pages 26-47.
    4. Kasy, Maximilian, 2011. "A nonparametric test for path dependence in discrete panel data," Economics Letters, Elsevier, vol. 113(2), pages 172-175.
    5. D.T. Frazier & G.M. Martin & C.P. Robert & J. Rousseau, 2016. "Asymptotic Properties of Approximate Bayesian Computation," Monash Econometrics and Business Statistics Working Papers 18/16, Monash University, Department of Econometrics and Business Statistics.
    6. Xing Ju Lee & Christopher C. Drovandi & Anthony N. Pettitt, 2015. "Model choice problems using approximate Bayesian computation with applications to pathogen transmission data sets," Biometrics, The International Biometric Society, vol. 71(1), pages 198-207, March.
    7. McKinley, Trevelyan J. & Ross, Joshua V. & Deardon, Rob & Cook, Alex R., 2014. "Simulation-based Bayesian inference for epidemic models," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 434-447.
    8. Ashesh Rambachan & Jonathan Roth, 2020. "Design-Based Uncertainty for Quasi-Experiments," Papers 2008.00602, arXiv.org, revised Aug 2020.
    9. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    10. Debashis Ghosh, 2004. "Semiparametric methods for the binormal model with multiple biomarkers," The University of Michigan Department of Biostatistics Working Paper Series 1046, Berkeley Electronic Press.
    11. Brian D. Williamson & Peter B. Gilbert & Marco Carone & Noah Simon, 2021. "Nonparametric variable importance assessment using machine learning techniques," Biometrics, The International Biometric Society, vol. 77(1), pages 9-22, March.
    12. Arie Beresteanu & Francesca Molinari, 2008. "Asymptotic Properties for a Class of Partially Identified Models," Econometrica, Econometric Society, vol. 76(4), pages 763-814, July.
    13. Kristi Kuljus & Bo Ranneby, 2020. "Asymptotic normality of generalized maximum spacing estimators for multivariate observations," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 968-989, September.
    14. Laurent Davezies & Xavier D'Haultfoeuille & Yannick Guyonvarch, 2018. "Asymptotic results under multiway clustering," Papers 1807.07925, arXiv.org, revised Aug 2018.
    15. Dominic Edelmann & Tobias Terzer & Donald Richards, 2021. "A Basic Treatment of the Distance Covariance," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 12-25, May.
    16. Dalalyan, Arnak S. & Karagulyan, Avetik, 2019. "User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient," Stochastic Processes and their Applications, Elsevier, vol. 129(12), pages 5278-5311.
    17. A. Stefano Caria, 2020. "An Adaptive Targeted Field Experiment: Job Search Assistance for Refugees in Jordan," CSAE Working Paper Series 2020-20, Centre for the Study of African Economies, University of Oxford.
    18. Clément de Chaisemartin & Xavier D'Haultfœuille, 2020. "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects," American Economic Review, American Economic Association, vol. 110(9), pages 2964-2996, September.
    19. Kruiniger, Hugo, 2018. "A further look at Modified ML estimation of the panel AR(1) model with fixed effects and arbitrary initial conditions," MPRA Paper 88623, University Library of Munich, Germany.
    20. Benoumechiara Nazih & Bousquet Nicolas & Michel Bertrand & Saint-Pierre Philippe, 2020. "Detecting and modeling critical dependence structures between random inputs of computer models," Dependence Modeling, De Gruyter, vol. 8(1), pages 263-297, January.

    More about this item

    Keywords

    Bayesian inference; Big-data; Approximate Bayesian Computation; noisy Markov chain Monte Carlo;

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
