Printed from https://ideas.repec.org/p/hhs/rbnkwp/0306.html

Scalable MCMC for Large Data Problems Using Data Subsampling and the Difference Estimator

Author

Listed:
  • Quiroz, Matias

    (Research Department, Central Bank of Sweden)

  • Villani, Mattias

    (Linköping University)

  • Kohn, Robert

    (University of New South Wales)

Abstract

We propose a generic Markov Chain Monte Carlo (MCMC) algorithm to speed up computations for datasets with many observations. A key feature of our approach is the use of the highly efficient difference estimator from the survey sampling literature to estimate the log-likelihood accurately using only a small fraction of the data. Our algorithm improves on the O(n) complexity of regular MCMC by operating over local data clusters instead of the full sample when computing the likelihood. The likelihood estimate is used in a pseudo-marginal framework to sample from a perturbed posterior which is within O(m^(-1/2)) of the true posterior, where m is the subsample size. The method is applied to a logistic regression model to predict firm bankruptcy for a large dataset. We document a significant speed-up in comparison to standard MCMC on the full dataset.
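The difference-estimator idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy logistic regression, simple random sampling with replacement, and constant proxies q_k equal to each observation's log-likelihood term at a fixed reference parameter `theta_ref` (the paper builds its proxies differently, via local data clusters). All function and variable names are hypothetical. The log-likelihood is written as the sum of the proxies plus the sum of the differences d_k = l_k - q_k, and only the second sum is estimated from a subsample of size m.

```python
import math
import random


def log_lik_term(theta, x, y):
    """Log-likelihood contribution of one observation in a logistic regression."""
    eta = sum(t * xi for t, xi in zip(theta, x))
    # y*eta - log(1 + exp(eta)), with the log term evaluated in a stable way
    return y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))


def full_log_lik(theta, X, Y):
    """Exact log-likelihood: touches all n observations."""
    return sum(log_lik_term(theta, x, y) for x, y in zip(X, Y))


def difference_estimate(theta, X, Y, proxies, proxy_sum, m, rng):
    """Estimate the log-likelihood from m sampled observations.

    proxies[k] is a cheap approximation q_k of the k-th term and proxy_sum
    is their precomputed total. Only d_k = l_k(theta) - q_k is sampled.
    """
    n = len(Y)
    idx = [rng.randrange(n) for _ in range(m)]  # sampling with replacement
    diff_mean = sum(log_lik_term(theta, X[i], Y[i]) - proxies[i] for i in idx) / m
    return proxy_sum + n * diff_mean


# Toy data (assumed for illustration only).
rng = random.Random(0)
n = 2000
theta_true = [0.5, -1.0]
X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(n)]
Y = []
for x in X:
    p = 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta_true, x))))
    Y.append(1 if rng.random() < p else 0)

# Constant proxies: each term evaluated once at a reference parameter.
theta_ref = [0.4, -0.9]
proxies = [log_lik_term(theta_ref, x, y) for x, y in zip(X, Y)]
proxy_sum = sum(proxies)

theta = [0.45, -0.95]
estimate = difference_estimate(theta, X, Y, proxies, proxy_sum, m=100, rng=rng)
exact = full_log_lik(theta, X, Y)
print("estimate:", estimate, "exact:", exact)
```

The closer the proxies track the true terms, the smaller the differences d_k and the lower the variance of the estimate for a given m. In a pseudo-marginal sampler, an estimate of this kind would stand in for the exact log-likelihood in the acceptance step; that step is omitted here.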

Suggested Citation

  • Quiroz, Matias & Villani, Mattias & Kohn, Robert, 2015. "Scalable MCMC for Large Data Problems Using Data Subsampling and the Difference Estimator," Working Paper Series 306, Sveriges Riksbank (Central Bank of Sweden).
  • Handle: RePEc:hhs:rbnkwp:0306

    Download full text from publisher

    File URL: http://www.riksbank.se/Documents/Rapporter/Working_papers/2015/rap_wp306_150729.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Pitt, Michael K. & Silva, Ralph dos Santos & Giordani, Paolo & Kohn, Robert, 2012. "On some properties of Markov chain Monte Carlo simulation methods based on the particle filter," Journal of Econometrics, Elsevier, vol. 171(2), pages 134-151.
    2. A. Doucet & M. K. Pitt & G. Deligiannidis & R. Kohn, 2015. "Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator," Biometrika, Biometrika Trust, vol. 102(2), pages 295-313.
    3. Matias Quiroz & Robert Kohn & Mattias Villani & Minh-Ngoc Tran, 2019. "Speeding Up MCMC by Efficient Data Subsampling," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 831-843, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    2. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    3. Gael M. Martin & David T. Frazier & Worapree Maneesoonthorn & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2022. "Bayesian Forecasting in Economics and Finance: A Modern Review," Papers 2212.03471, arXiv.org, revised Jul 2023.
    4. Gael M. Martin & David T. Frazier & Christian P. Robert, 2022. "Computing Bayes: From Then `Til Now," Monash Econometrics and Business Statistics Working Papers 14/22, Monash University, Department of Econometrics and Business Statistics.
    5. Lux, Thomas, 2020. "Bayesian estimation of agent-based models via adaptive particle Markov chain Monte Carlo," Economics Working Papers 2020-01, Christian-Albrechts-University of Kiel, Department of Economics.
    6. Matias Quiroz & Robert Kohn & Mattias Villani & Minh-Ngoc Tran, 2019. "Speeding Up MCMC by Efficient Data Subsampling," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 831-843, April.
    7. James M. Nason & Gregor W. Smith, 2021. "Measuring the slowly evolving trend in US inflation with professional forecasts," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(1), pages 1-17, January.
    8. Chris Sherlock, 2016. "Optimal Scaling for the Pseudo-Marginal Random Walk Metropolis: Insensitivity to the Noise Generating Mechanism," Methodology and Computing in Applied Probability, Springer, vol. 18(3), pages 869-884, September.
    9. Gael M. Martin & David T. Frazier & Christian P. Robert, 2021. "Approximating Bayes in the 21st Century," Monash Econometrics and Business Statistics Working Papers 24/21, Monash University, Department of Econometrics and Business Statistics.
    10. Golightly, Andrew & Bradley, Emma & Lowe, Tom & Gillespie, Colin S., 2019. "Correlated pseudo-marginal schemes for time-discretised stochastic kinetic models," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 92-107.
    11. Thomas Lux, 2022. "Bayesian Estimation of Agent-Based Models via Adaptive Particle Markov Chain Monte Carlo," Computational Economics, Springer;Society for Computational Economics, vol. 60(2), pages 451-477, August.
    12. Matias Quiroz & Mattias Villani & Robert Kohn & Minh-Ngoc Tran & Khue-Dung Dang, 2018. "Subsampling MCMC - an Introduction for the Survey Statistician," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 33-69, December.
    13. Dang, Khue-Dung & Quiroz, Matias & Kohn, Robert & Tran, Minh-Ngoc & Villani, Mattias, 2019. "Hamiltonian Monte Carlo with Energy Conserving Subsampling," Working Paper Series 372, Sveriges Riksbank (Central Bank of Sweden).
    14. Wiqvist, Samuel & Golightly, Andrew & McLean, Ashleigh T. & Picchini, Umberto, 2021. "Efficient inference for stochastic differential equation mixed-effects models using correlated particle pseudo-marginal algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    15. Patrick Leung & Catherine S. Forbes & Gael M Martin & Brendan McCabe, 2019. "Forecasting Observables with Particle Filters: Any Filter Will Do!," Monash Econometrics and Business Statistics Working Papers 22/19, Monash University, Department of Econometrics and Business Statistics.
    16. Matti Vihola & Jouni Helske & Jordan Franks, 2020. "Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(4), pages 1339-1376, December.
    17. Johan Dahlin & Thomas B. Schon, 2015. "Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models," Papers 1511.01707, arXiv.org, revised Mar 2019.
    18. Beatrice Franzolini & Alexandros Beskos & Maria De Iorio & Warrick Poklewski Koziell & Karolina Grzeszkiewicz, 2022. "Change point detection in dynamic Gaussian graphical models: the impact of COVID-19 pandemic on the US stock market," Papers 2208.00952, arXiv.org, revised May 2023.
    19. Benchimol, Jonathan & Ivashchenko, Sergey, 2021. "Switching volatility in a nonlinear open economy," Journal of International Money and Finance, Elsevier, vol. 110(C).
    20. Mamatzakis, Emmanuel C. & Tsionas, Mike G., 2021. "Making inference of British household's happiness efficiency: A Bayesian latent model," European Journal of Operational Research, Elsevier, vol. 294(1), pages 312-326.

    More about this item

    Keywords

    Bayesian inference; Markov Chain Monte Carlo; Pseudo-marginal MCMC; estimated likelihood; GLM for large data

    JEL classification:

    • C11 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Bayesian Analysis: General
    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods
