Printed from https://ideas.repec.org/p/hhs/rbnkwp/0297.html

Speeding Up MCMC by Efficient Data Subsampling

Authors

  • Quiroz, Matias

    (Research Department, Central Bank of Sweden)

  • Villani, Mattias

    (Linköping University)

  • Kohn, Robert

    (Australian School of Business, University of New South Wales)

Abstract

The computing time for Markov Chain Monte Carlo (MCMC) algorithms can be prohibitively large for datasets with many observations, especially when the data density for each observation is costly to evaluate. We propose a framework where the likelihood function is estimated from a random subset of the data, resulting in substantially fewer density evaluations. The data subsets are selected using an efficient Probability Proportional-to-Size (PPS) sampling scheme, where the inclusion probability of an observation is proportional to an approximation of its contribution to the log-likelihood function. Three broad classes of approximations are presented. The proposed algorithm is shown to sample from a distribution that is within O(m^{-1/2}) of the true posterior, where m is the subsample size. Moreover, the constant in the O(m^{-1/2}) error bound of the likelihood is shown to be small, and the approximation error is demonstrated to be negligible even for small m in our applications. We propose a simple way to adaptively choose the sample size m during the MCMC to optimize sampling efficiency for a fixed computational budget. The method is applied to a bivariate probit model on a dataset with half a million observations, and to a Weibull regression model with random effects for discrete-time survival data.
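
As a rough illustration of the PPS idea in the abstract, the Python sketch below estimates the full-data log-likelihood l = sum_k l_k from m draws taken with replacement, where observation k is drawn with probability proportional to |proxies[k]|, an approximation of its log-likelihood contribution. It uses the Hansen-Hurwitz estimator (the average of l_k / p_k over the draws), a standard PPS estimator. This is a minimal sketch under stated assumptions, not the paper's implementation: the with-replacement scheme, the function names pps_loglik_estimate and gauss_loglik, the Gaussian toy model, and the choice of proxies evaluated once at a reference value mu0 are all hypothetical. The log-scale correction est - var/2 is also an assumption; it would make exp(est - var/2) unbiased for the likelihood only if est were exactly Gaussian with known variance.

```python
import math
import random


def pps_loglik_estimate(loglik_i, proxies, m, rng):
    """Hansen-Hurwitz estimate of l = sum_k l_k from m PPS draws with replacement.

    loglik_i(k) returns the exact log-density of observation k and is only
    called for the m sampled indices. proxies[k] approximates l_k; draw
    probabilities are proportional to |proxies[k]|. Returns (estimate, variance).
    """
    sizes = [abs(p) for p in proxies]
    if m < 2 or any(s == 0.0 for s in sizes):
        raise ValueError("need m >= 2 and nonzero proxies")
    total = sum(sizes)
    probs = [s / total for s in sizes]
    draws = rng.choices(range(len(probs)), weights=probs, k=m)
    # Each z_j = l_k / p_k is by itself an unbiased estimate of the full sum.
    z = [loglik_i(k) / probs[k] for k in draws]
    est = sum(z) / m
    # Unbiased estimate of Var(est) under sampling with replacement.
    var = sum((zj - est) ** 2 for zj in z) / (m * (m - 1))
    return est, var


def gauss_loglik(y, mu):
    # Hypothetical toy model: y_k ~ N(mu, 1).
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - mu) ** 2


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
mu, mu0 = 0.05, 0.0  # current parameter value and a fixed reference value (hypothetical)
proxies = [gauss_loglik(y, mu0) for y in data]  # computed once, reusable across iterations
exact = sum(gauss_loglik(y, mu) for y in data)
est, var = pps_loglik_estimate(lambda k: gauss_loglik(data[k], mu), proxies, 500, rng)
corrected = est - var / 2.0  # log-scale bias correction (assumption, see lead-in)
print(f"exact={exact:.1f} estimate={est:.1f} sd={math.sqrt(var):.2f} corrected={corrected:.1f}")
```

Only m calls to loglik_i are made per estimate, which is where the savings in density evaluations would come from. In this toy example the proxies cost as much as the exact terms; in practice they would need to be cheap or precomputed. An observation with a zero proxy could never be drawn, which is why the sketch rejects zero proxies.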

Suggested Citation

  • Quiroz, Matias & Villani, Mattias & Kohn, Robert, 2015. "Speeding Up MCMC by Efficient Data Subsampling," Working Paper Series 297, Sveriges Riksbank (Central Bank of Sweden).
  • Handle: RePEc:hhs:rbnkwp:0297

    Download full text from publisher

    File URL: http://www.riksbank.se/Documents/Rapporter/Working_papers/2015/rap_wp297_150330.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Giordani, Paolo & Jacobson, Tor & Schedvin, Erik von & Villani, Mattias, 2014. "Taking the Twists into Account: Predicting Firm Bankruptcy Risk with Splines of Financial Ratios," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 49(4), pages 1071-1099, August.
    2. Christophe Andrieu & Arnaud Doucet & Roman Holenstein, 2010. "Particle Markov chain Monte Carlo methods," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(3), pages 269-342, June.
    3. Shujie Ma & Jeffrey S. Racine & Lijian Yang, 2015. "Spline Regression in the Presence of Categorical Predictors," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 30(5), pages 705-717, August.
    4. Quiroz, Matias & Villani, Mattias, 2013. "Dynamic mixture-of-experts models for longitudinal and discrete-time survival data," Working Paper Series 268, Sveriges Riksbank (Central Bank of Sweden).
    5. Tor Jacobson & Jesper Lindé & Kasper Roszbach, 2013. "Firm Default And Aggregate Fluctuations," Journal of the European Economic Association, European Economic Association, vol. 11(4), pages 945-972, August.
    6. Ormerod, J. T. & Wand, M. P., 2010. "Explaining Variational Approximations," The American Statistician, American Statistical Association, vol. 64(2), pages 140-153.
    7. repec:dau:papers:123456789/5724 is not listed on IDEAS
    8. Pitt, Michael K. & Silva, Ralph dos Santos & Giordani, Paolo & Kohn, Robert, 2012. "On some properties of Markov chain Monte Carlo simulation methods based on the particle filter," Journal of Econometrics, Elsevier, vol. 171(2), pages 134-151.
    9. A. Doucet & M. K. Pitt & G. Deligiannidis & R. Kohn, 2015. "Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator," Biometrika, Biometrika Trust, vol. 102(2), pages 295-313.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Patrick Leung & Catherine S. Forbes & Gael M. Martin & Brendan McCabe, 2019. "Forecasting Observables with Particle Filters: Any Filter Will Do!," Monash Econometrics and Business Statistics Working Papers 22/19, Monash University, Department of Econometrics and Business Statistics.
    2. Guangbao Guo & Guoqi Qian & Lu Lin & Wei Shao, 2021. "Parallel inference for big data with the group Bayesian method," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(2), pages 225-243, February.
    3. Gael M. Martin & David T. Frazier & Worapree Maneesoonthorn & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2022. "Bayesian Forecasting in Economics and Finance: A Modern Review," Papers 2212.03471, arXiv.org, revised Jul 2023.
    4. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    5. Gael M. Martin & David T. Frazier & Christian P. Robert, 2022. "Computing Bayes: From Then 'Til Now," Monash Econometrics and Business Statistics Working Papers 14/22, Monash University, Department of Econometrics and Business Statistics.
    6. Florian Maire & Nial Friel & Pierre Alquier, 2017. "Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets," Working Papers 2017-40, Center for Research in Economics and Statistics.
    7. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    8. Gael M. Martin & David T. Frazier & Christian P. Robert, 2021. "Approximating Bayes in the 21st Century," Monash Econometrics and Business Statistics Working Papers 24/21, Monash University, Department of Econometrics and Business Statistics.
    9. Boris Beranger & Huan Lin & Scott Sisson, 2023. "New models for symbolic data analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 659-699, September.
    10. Feifei Wang & Danyang Huang & Tianchen Gao & Shuyuan Wu & Hansheng Wang, 2022. "Sequential one‐step estimator by sub‐sampling for customer churn analysis with massive data sets," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1753-1786, November.
    11. Quiroz, Matias & Villani, Mattias & Kohn, Robert, 2015. "Scalable MCMC for Large Data Problems Using Data Subsampling and the Difference Estimator," Working Paper Series 306, Sveriges Riksbank (Central Bank of Sweden).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lux, Thomas, 2020. "Bayesian estimation of agent-based models via adaptive particle Markov chain Monte Carlo," Economics Working Papers 2020-01, Christian-Albrechts-University of Kiel, Department of Economics.
    2. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    3. James M. Nason & Gregor W. Smith, 2021. "Measuring the slowly evolving trend in US inflation with professional forecasts," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(1), pages 1-17, January.
    4. Gael M. Martin & David T. Frazier & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2023. "Bayesian Forecasting in the 21st Century: A Modern Review," Monash Econometrics and Business Statistics Working Papers 1/23, Monash University, Department of Econometrics and Business Statistics.
    5. Chris Sherlock, 2016. "Optimal Scaling for the Pseudo-Marginal Random Walk Metropolis: Insensitivity to the Noise Generating Mechanism," Methodology and Computing in Applied Probability, Springer, vol. 18(3), pages 869-884, September.
    6. Golightly, Andrew & Bradley, Emma & Lowe, Tom & Gillespie, Colin S., 2019. "Correlated pseudo-marginal schemes for time-discretised stochastic kinetic models," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 92-107.
    7. Thomas Lux, 2022. "Bayesian Estimation of Agent-Based Models via Adaptive Particle Markov Chain Monte Carlo," Computational Economics, Springer;Society for Computational Economics, vol. 60(2), pages 451-477, August.
    8. Dang, Khue-Dung & Quiroz, Matias & Kohn, Robert & Tran, Minh-Ngoc & Villani, Mattias, 2019. "Hamiltonian Monte Carlo with Energy Conserving Subsampling," Working Paper Series 372, Sveriges Riksbank (Central Bank of Sweden).
    9. Wiqvist, Samuel & Golightly, Andrew & McLean, Ashleigh T. & Picchini, Umberto, 2021. "Efficient inference for stochastic differential equation mixed-effects models using correlated particle pseudo-marginal algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    10. Gael M. Martin & David T. Frazier & Worapree Maneesoonthorn & Ruben Loaiza-Maya & Florian Huber & Gary Koop & John Maheu & Didier Nibbering & Anastasios Panagiotelis, 2022. "Bayesian Forecasting in Economics and Finance: A Modern Review," Papers 2212.03471, arXiv.org, revised Jul 2023.
    11. Matti Vihola & Jouni Helske & Jordan Franks, 2020. "Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(4), pages 1339-1376, December.
    12. Gael M. Martin & David T. Frazier & Christian P. Robert, 2022. "Computing Bayes: From Then 'Til Now," Monash Econometrics and Business Statistics Working Papers 14/22, Monash University, Department of Econometrics and Business Statistics.
    13. Johan Dahlin & Thomas B. Schön, 2015. "Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models," Papers 1511.01707, arXiv.org, revised Mar 2019.
    14. Delis, Manthos D. & Tsionas, Mike G., 2018. "Measuring management practices," International Journal of Production Economics, Elsevier, vol. 199(C), pages 65-77.
    15. Tsionas, Mike G. & Michaelides, Panayotis G., 2017. "Bayesian analysis of chaos: The joint return-volatility dynamical system," MPRA Paper 80632, University Library of Munich, Germany.
    16. Joshua Chan, 2023. "BVARs and Stochastic Volatility," Papers 2310.14438, arXiv.org.
    17. Virbickaitė, Audronė & Frey, Christoph & Macedo, Demian N., 2020. "Bayesian sequential stock return prediction through copulas," The Journal of Economic Asymmetries, Elsevier, vol. 22(C).
    18. Tsionas, Mike G. & Michaelides, Panayotis G., 2017. "Neglected chaos in international stock markets: Bayesian analysis of the joint return–volatility dynamical system," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 482(C), pages 95-107.
    19. Golightly, Andrew & Wilkinson, Darren J., 2015. "Bayesian inference for Markov jump processes with informative observations," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(2), pages 169-188, April.
    20. Ong, Victor M.-H. & Nott, David J. & Tran, Minh-Ngoc & Sisson, Scott A. & Drovandi, Christopher C., 2018. "Likelihood-free inference in high dimensions with synthetic likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 271-291.

    More about this item

    Keywords

    Bayesian inference; Markov Chain Monte Carlo; Pseudo-marginal MCMC; Big Data; Probability Proportional-to-Size sampling; Numerical integration.

    JEL classification:

    • C11 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Bayesian Analysis: General
    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:hhs:rbnkwp:0297. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Lena Löfgren. General contact details of provider: https://edirc.repec.org/data/rbgovse.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.