
Speeding up MCMC by Efficient Data Subsampling

Authors

  • Kohn, Robert
  • Quiroz, Matias
  • Tran, Minh-Ngoc
  • Villani, Mattias

Abstract

We propose Subsampling MCMC, a Markov Chain Monte Carlo (MCMC) framework where the likelihood function for n observations is estimated from a random subset of m observations. We introduce a general and highly efficient unbiased estimator of the log-likelihood based on control variates obtained from clustering the data. The cost of computing the log-likelihood estimator is much smaller than that of the full log-likelihood used by standard MCMC. The likelihood estimate is bias-corrected and used in two correlated pseudo-marginal algorithms to sample from a perturbed posterior, for which we derive the asymptotic error with respect to n and m, respectively. A practical estimator of the error is proposed and we show that the error is negligible even for a very small m in our applications. We demonstrate that Subsampling MCMC is substantially more efficient than standard MCMC in terms of sampling efficiency for a given computational budget, and that it outperforms other subsampling methods for MCMC proposed in the literature.
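
The estimator described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy model y_i ~ N(theta, 1), the grid-based clustering, the function names, and the choice of control variate (each observation's log-density is approximated by the log-density at its cluster centroid, which is cruder than a control variate one would use in practice). The sketch shows the general idea only: the control variates are summed over all n observations at the cost of one evaluation per cluster, and a random subsample of m observations estimates the remaining difference.

```python
import math
import random


def loglik_terms(y, theta):
    """Per-observation log-density under the toy model y_i ~ N(theta, 1)."""
    c = -0.5 * math.log(2.0 * math.pi)
    return [c - 0.5 * (yi - theta) ** 2 for yi in y]


def cluster_by_rounding(y, width=0.25):
    """Toy clustering (an assumption): group observations in the same grid cell."""
    cells = {}
    for i, yi in enumerate(y):
        cells.setdefault(round(yi / width), []).append(i)
    labels, centroids, sizes = [0] * len(y), [], []
    for c, idx in enumerate(cells.values()):
        centroids.append(sum(y[i] for i in idx) / len(idx))
        sizes.append(len(idx))
        for i in idx:
            labels[i] = c
    return labels, centroids, sizes


def difference_estimator(y, theta, labels, centroids, sizes, u):
    """Estimate the full log-likelihood from the subsample indices u.

    Returns (l_hat, sigma2_hat); sigma2_hat is None when len(u) < 2.
    """
    n, m = len(y), len(u)
    q_cent = loglik_terms(centroids, theta)            # one evaluation per cluster
    q_sum = sum(s * q for s, q in zip(sizes, q_cent))  # sum of all n control variates
    l_u = loglik_terms([y[i] for i in u], theta)       # m evaluations on the subsample
    d = [l - q_cent[labels[i]] for l, i in zip(l_u, u)]
    d_mean = sum(d) / m
    l_hat = q_sum + n * d_mean                         # unbiased for the full log-likelihood
    if m < 2:
        return l_hat, None
    s2 = sum((x - d_mean) ** 2 for x in d) / (m - 1)
    return l_hat, n * n * s2 / m                       # estimated variance of l_hat


rng = random.Random(0)
y = [rng.gauss(1.0, 1.0) for _ in range(5000)]
labels, centroids, sizes = cluster_by_rounding(y)
u = [rng.randrange(len(y)) for _ in range(100)]        # sampled with replacement
l_hat, sigma2_hat = difference_estimator(y, 0.9, labels, centroids, sizes, u)
full = sum(loglik_terms(y, 0.9))
print(l_hat, full, l_hat - 0.5 * sigma2_hat)
```

The last printed value subtracts half the estimated variance from the log-likelihood estimate, a common form of bias correction for a likelihood estimate obtained by exponentiating an approximately normal log-likelihood estimate; whether it matches the paper's exact correction is not confirmed by this page. In this sketch the per-iteration cost is m plus the number of clusters density evaluations instead of n.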

Suggested Citation

  • Kohn, Robert & Quiroz, Matias & Tran, Minh-Ngoc & Villani, Mattias, 2016. "Speeding up MCMC by Efficient Data Subsampling," Working Papers 2123/16205, University of Sydney Business School, Discipline of Business Analytics.
  • Handle: RePEc:syb:wpbsba:2123/16205

    Download full text from publisher

    File URL: http://hdl.handle.net/2123/16205
    Download Restriction: no

    References listed on IDEAS

    1. Giordani, Paolo & Jacobson, Tor & Schedvin, Erik von & Villani, Mattias, 2014. "Taking the Twists into Account: Predicting Firm Bankruptcy Risk with Splines of Financial Ratios," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 49(4), pages 1071-1099, August.
    2. Tor Jacobson & Jesper Lindé & Kasper Roszbach, 2013. "Firm Default And Aggregate Fluctuations," Journal of the European Economic Association, European Economic Association, vol. 11(4), pages 945-972, August.
    3. Shujie Ma & Jeffrey S. Racine & Lijian Yang, 2015. "Spline Regression in the Presence of Categorical Predictors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 30(5), pages 705-717, August.
    4. repec:dau:papers:123456789/5724 is not listed on IDEAS
    5. Pitt, Michael K. & Silva, Ralph dos Santos & Giordani, Paolo & Kohn, Robert, 2012. "On some properties of Markov chain Monte Carlo simulation methods based on the particle filter," Journal of Econometrics, Elsevier, vol. 171(2), pages 134-151.
    6. A. Doucet & M. K. Pitt & G. Deligiannidis & R. Kohn, 2015. "Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator," Biometrika, Biometrika Trust, vol. 102(2), pages 295-313.
    7. Christophe Andrieu & Arnaud Doucet & Roman Holenstein, 2010. "Particle Markov chain Monte Carlo methods," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(3), pages 269-342, June.
    8. Quiroz, Matias & Villani, Mattias, 2013. "Dynamic mixture-of-experts models for longitudinal and discrete-time survival data," Working Paper Series 268, Sveriges Riksbank (Central Bank of Sweden).
    9. Ormerod, J. T. & Wand, M. P., 2010. "Explaining Variational Approximations," The American Statistician, American Statistical Association, vol. 64(2), pages 140-153.

    Citations

    Cited by:

    1. Guangbao Guo & Guoqi Qian & Lu Lin & Wei Shao, 2021. "Parallel inference for big data with the group Bayesian method," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(2), pages 225-243, February.
    2. Gael M. Martin & David T. Frazier & Christian P. Robert, 2022. "Computing Bayes: From Then `Til Now," Monash Econometrics and Business Statistics Working Papers 14/22, Monash University, Department of Econometrics and Business Statistics.
    3. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    4. Boris Beranger & Huan Lin & Scott Sisson, 2023. "New models for symbolic data analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 659-699, September.
    5. Patrick Leung & Catherine S. Forbes & Gael M Martin & Brendan McCabe, 2019. "Forecasting Observables with Particle Filters: Any Filter Will Do!," Monash Econometrics and Business Statistics Working Papers 22/19, Monash University, Department of Econometrics and Business Statistics.
    6. Gael M. Martin & David T. Frazier & Worapree Maneesoonthorn & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2022. "Bayesian Forecasting in Economics and Finance: A Modern Review," Papers 2212.03471, arXiv.org, revised Jul 2023.
    7. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    8. Florian Maire & Nial Friel & Pierre ALQUIER, 2017. "Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets," Working Papers 2017-40, Center for Research in Economics and Statistics.
    9. Gael M. Martin & David T. Frazier & Christian P. Robert, 2021. "Approximating Bayes in the 21st Century," Monash Econometrics and Business Statistics Working Papers 24/21, Monash University, Department of Econometrics and Business Statistics.
    10. Feifei Wang & Danyang Huang & Tianchen Gao & Shuyuan Wu & Hansheng Wang, 2022. "Sequential one‐step estimator by sub‐sampling for customer churn analysis with massive data sets," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1753-1786, November.
    11. Quiroz, Matias & Villani, Mattias & Kohn, Robert, 2015. "Scalable Mcmc For Large Data Problems Using Data Subsampling And The Difference Estimator," Working Paper Series 306, Sveriges Riksbank (Central Bank of Sweden).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lux, Thomas, 2020. "Bayesian estimation of agent-based models via adaptive particle Markov chain Monte Carlo," Economics Working Papers 2020-01, Christian-Albrechts-University of Kiel, Department of Economics.
    2. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    3. James M. Nason & Gregor W. Smith, 2021. "Measuring the slowly evolving trend in US inflation with professional forecasts," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(1), pages 1-17, January.
    4. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    5. Chris Sherlock, 2016. "Optimal Scaling for the Pseudo-Marginal Random Walk Metropolis: Insensitivity to the Noise Generating Mechanism," Methodology and Computing in Applied Probability, Springer, vol. 18(3), pages 869-884, September.
    6. Golightly, Andrew & Bradley, Emma & Lowe, Tom & Gillespie, Colin S., 2019. "Correlated pseudo-marginal schemes for time-discretised stochastic kinetic models," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 92-107.
    7. Thomas Lux, 2022. "Bayesian Estimation of Agent-Based Models via Adaptive Particle Markov Chain Monte Carlo," Computational Economics, Springer;Society for Computational Economics, vol. 60(2), pages 451-477, August.
    8. Dang, Khue-Dung & Quiroz, Matias & Kohn, Robert & Tran, Minh-Ngoc & Villani, Mattias, 2019. "Hamiltonian Monte Carlo with Energy Conserving Subsampling," Working Paper Series 372, Sveriges Riksbank (Central Bank of Sweden).
    9. Wiqvist, Samuel & Golightly, Andrew & McLean, Ashleigh T. & Picchini, Umberto, 2021. "Efficient inference for stochastic differential equation mixed-effects models using correlated particle pseudo-marginal algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    10. Gael M. Martin & David T. Frazier & Worapree Maneesoonthorn & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2022. "Bayesian Forecasting in Economics and Finance: A Modern Review," Papers 2212.03471, arXiv.org, revised Jul 2023.
    11. Matti Vihola & Jouni Helske & Jordan Franks, 2020. "Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(4), pages 1339-1376, December.
    12. Gael M. Martin & David T. Frazier & Christian P. Robert, 2022. "Computing Bayes: From Then `Til Now," Monash Econometrics and Business Statistics Working Papers 14/22, Monash University, Department of Econometrics and Business Statistics.
    13. Johan Dahlin & Thomas B. Schon, 2015. "Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models," Papers 1511.01707, arXiv.org, revised Mar 2019.
    14. Delis, Manthos D. & Tsionas, Mike G., 2018. "Measuring management practices," International Journal of Production Economics, Elsevier, vol. 199(C), pages 65-77.
    15. Tsionas, Mike G. & Michaelides, Panayotis G., 2017. "Bayesian analysis of chaos: The joint return-volatility dynamical system," MPRA Paper 80632, University Library of Munich, Germany.
    16. Joshua Chan, 2023. "BVARs and Stochastic Volatility," Papers 2310.14438, arXiv.org.
    17. Virbickaitė, Audronė & Frey, Christoph & Macedo, Demian N., 2020. "Bayesian sequential stock return prediction through copulas," The Journal of Economic Asymmetries, Elsevier, vol. 22(C).
    18. Tsionas, Mike G. & Michaelides, Panayotis G., 2017. "Neglected chaos in international stock markets: Bayesian analysis of the joint return–volatility dynamical system," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 482(C), pages 95-107.
    19. Golightly Andrew & Wilkinson Darren J., 2015. "Bayesian inference for Markov jump processes with informative observations," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(2), pages 169-188, April.
    20. Ong, Victor M.-H. & Nott, David J. & Tran, Minh-Ngoc & Sisson, Scott A. & Drovandi, Christopher C., 2018. "Likelihood-free inference in high dimensions with synthetic likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 271-291.

    More about this item

    Keywords

    Survey sampling; Big Data; Block pseudo-marginal; Estimated likelihood; Correlated pseudo-marginal; Bayesian inference;

    JEL classification:

    • C11 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Bayesian Analysis: General
    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods
