
On Optimal Selection of Summary Statistics for Approximate Bayesian Computation

Author

Listed:
  • Nunes Matthew A

    (Lancaster University)

  • Balding David J

    (University College London)

Abstract

How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary statistics, typically chosen in practice on the basis of the investigator's intuition and established practice in the field. We propose two algorithms for automated choice of efficient data summaries. Firstly, we motivate minimization of the estimated entropy of the posterior approximation as a heuristic for the selection of summary statistics. Secondly, we propose a two-stage procedure: the minimum-entropy algorithm is used to identify simulated datasets close to that observed, and these are each successively regarded as observed datasets for which the mean root integrated squared error of the ABC posterior approximation is minimized over sets of summary statistics. In a simulation study, we both singly and jointly inferred the scaled mutation and recombination parameters from a population sample of DNA sequences. The computationally fast minimum-entropy algorithm showed a modest improvement over existing methods, while our two-stage procedure showed substantial and highly significant further improvement for both univariate and bivariate inferences. We found that the optimal set of summary statistics was highly dataset-specific, suggesting that more generally there may be no globally optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.
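
The minimum-entropy heuristic described in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy model (normal data with unknown mean `theta`), the three candidate summaries, the uniform prior, the rejection-ABC acceptance of the 100 closest simulations, and the nearest-neighbour entropy estimator with `k=4`. The sketch runs rejection ABC for every subset of candidate summaries and keeps the subset whose accepted parameter sample has the lowest estimated entropy.

```python
import itertools
import math
import random
import statistics

NAMES = ("mean", "variance", "noise")  # hypothetical candidate summaries


def simulate(theta, rng, n=20):
    """Toy model (assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summaries(data, rng):
    """Mean is informative about theta; variance and noise are not."""
    return [statistics.fmean(data), statistics.variance(data), rng.random()]


def abc_reject(obs, thetas, sims, subset, n_accept):
    """Rejection ABC: keep the draws whose chosen summaries are closest to obs."""
    # Scale each summary by its spread over all simulations.
    scales = [statistics.pstdev([s[j] for s in sims]) for j in subset]

    def dist(s):
        return math.sqrt(sum(((s[j] - obs[j]) / sc) ** 2
                             for j, sc in zip(subset, scales)))

    order = sorted(range(len(sims)), key=lambda i: dist(sims[i]))
    return [thetas[i] for i in order[:n_accept]]


def knn_entropy(sample, k=4):
    """k-nearest-neighbour (Kozachenko-Leonenko) entropy estimate, 1-D sample."""
    n = len(sample)

    def digamma(m):  # exact for positive integers m
        return -0.5772156649015329 + sum(1.0 / j for j in range(1, m))

    log_r = 0.0
    for i, x in enumerate(sample):
        d = sorted(abs(x - y) for j, y in enumerate(sample) if j != i)
        log_r += math.log(max(d[k - 1], 1e-300))  # distance to k-th neighbour
    return digamma(n) - digamma(k) + math.log(2.0) + log_r / n


def min_entropy_subset(obs, thetas, sims, n_accept=100):
    """Return (entropy, subset) minimising the estimated posterior entropy."""
    best = None
    for size in range(1, len(NAMES) + 1):
        for subset in itertools.combinations(range(len(NAMES)), size):
            accepted = abc_reject(obs, thetas, sims, subset, n_accept)
            h = knn_entropy(accepted)
            if best is None or h < best[0]:
                best = (h, subset)
    return best


rng = random.Random(1)
observed = summaries(simulate(4.0, rng), rng)           # "observed" dataset
thetas = [rng.uniform(0.0, 10.0) for _ in range(5000)]  # draws from the prior
sims = [summaries(simulate(t, rng), rng) for t in thetas]
entropy, subset = min_entropy_subset(observed, thetas, sims)
print([NAMES[j] for j in subset], round(entropy, 3))
```

In this toy setting the selected subset should include the sample mean, since conditioning on uninformative summaries leaves the accepted sample close to the prior and therefore at high entropy. The second, two-stage procedure from the abstract (re-running the selection with nearby simulated datasets treated as observed and minimising the mean root integrated squared error) is not shown here.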

Suggested Citation

  • Nunes Matthew A & Balding David J, 2010. "On Optimal Selection of Summary Statistics for Approximate Bayesian Computation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-16, September.
  • Handle: RePEc:bpj:sagmbi:v:9:y:2010:i:1:n:34
    DOI: 10.2202/1544-6115.1576

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1576
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

    Citations

    Cited by:

    1. Mikael Sunnåker & Alberto Giovanni Busetto & Elina Numminen & Jukka Corander & Matthieu Foll & Christophe Dessimoz, 2013. "Approximate Bayesian Computation," PLOS Computational Biology, Public Library of Science, vol. 9(1), pages 1-10, January.
    2. Soubeyrand, Samuel & Haon-Lasportes, Emilie, 2015. "Weak convergence of posteriors conditional on maximum pseudo-likelihood estimates and implications in ABC," Statistics & Probability Letters, Elsevier, vol. 107(C), pages 84-92.
    3. Prangle Dennis & Fearnhead Paul & Cox Murray P. & Biggs Patrick J. & French Nigel P., 2014. "Semi-automatic selection of summary statistics for ABC model choice," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 67-82, February.
    4. Silk Daniel & Filippi Sarah & Stumpf Michael P. H., 2013. "Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(5), pages 603-618, October.
    5. Anthony Ebert & Kerrie Mengersen & Fabrizio Ruggeri & Paul Wu, 2021. "Curve Registration of Functional Data for Approximate Bayesian Computation," Stats, MDPI, vol. 4(3), pages 1-14, September.
    6. Soubeyrand Samuel & Guiton François & Klein Etienne K. & Carpentier Florence, 2013. "Approximate Bayesian computation with functional statistics," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(1), pages 17-37, March.
    7. Michael Stocks & Mathieu Siol & Martin Lascoux & Stéphane De Mita, 2014. "Amount of Information Needed for Model Choice in Approximate Bayesian Computation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-13, June.
    8. Nakagome Shigeki & Fukumizu Kenji & Mano Shuhei, 2013. "Kernel approximate Bayesian computation in population genetic inferences," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(6), pages 667-678, December.
    9. VanDerHorn, Eric & Mahadevan, Sankaran, 2018. "Bayesian model updating with summarized statistical and reliability data," Reliability Engineering and System Safety, Elsevier, vol. 172(C), pages 12-24.
    10. Bonassi Fernando V. & You Lingchong & West Mike, 2011. "Bayesian Learning from Marginal Data in Bionetwork Models," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-27, October.
    11. Christopher C. Drovandi & Anthony N. Pettitt, 2013. "Bayesian Experimental Design for Models with Intractable Likelihoods," Biometrics, The International Biometric Society, vol. 69(4), pages 937-948, December.
    12. Baey, Charlotte & Smith, Henrik G. & Rundlöf, Maj & Olsson, Ola & Clough, Yann & Sahlin, Ullrika, 2023. "Calibration of a bumble bee foraging model using Approximate Bayesian Computation," Ecological Modelling, Elsevier, vol. 477(C).
    13. Creel, Michael & Kristensen, Dennis, 2016. "On selection of statistics for approximate Bayesian computing (or the method of simulated moments)," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 99-114.
    14. Florian Maire & Nial Friel & Pierre Alquier, 2017. "Informed Sub-Sampling MCMC: Approximate Bayesian Inference for Large Datasets," Working Papers 2017-40, Center for Research in Economics and Statistics.
    15. Wilkinson Richard David, 2013. "Approximate Bayesian computation (ABC) gives exact results under the assumption of model error," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(2), pages 129-141, May.
    16. Jonathan U Harrison & Ruth E Baker, 2020. "An automatic adaptive method to combine summary statistics in approximate Bayesian computation," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-21, August.
    17. Li, J. & Nott, D.J. & Fan, Y. & Sisson, S.A., 2017. "Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model," Computational Statistics & Data Analysis, Elsevier, vol. 106(C), pages 77-89.
    18. Buzbas, Erkan O. & Rosenberg, Noah A., 2015. "AABC: Approximate approximate Bayesian computation for inference in population-genetic models," Theoretical Population Biology, Elsevier, vol. 99(C), pages 31-42.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Henri Pesonen & Umberto Simola & Alvaro Köhn‐Luque & Henri Vuollekoski & Xiaoran Lai & Arnoldo Frigessi & Samuel Kaski & David T. Frazier & Worapree Maneesoonthorn & Gael M. Martin & Jukka Corander, 2023. "ABC of the future," International Statistical Review, International Statistical Institute, vol. 91(2), pages 243-268, August.
    2. repec:dau:papers:123456789/5724 is not listed on IDEAS
    3. Xing Ju Lee & Christopher C. Drovandi & Anthony N. Pettitt, 2015. "Model choice problems using approximate Bayesian computation with applications to pathogen transmission data sets," Biometrics, The International Biometric Society, vol. 71(1), pages 198-207, March.
    4. Yanchun Jin, 2016. "Nonparametric tests for the effect of treatment on conditional variance," KIER Working Papers 948, Kyoto University, Institute of Economic Research.
    5. Holger Dette & Kay Pilz, 2009. "On the estimation of a monotone conditional variance in nonparametric regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 61(1), pages 111-141, March.
    6. McKinley, Trevelyan J. & Ross, Joshua V. & Deardon, Rob & Cook, Alex R., 2014. "Simulation-based Bayesian inference for epidemic models," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 434-447.
    7. Gael M. Martin & David T. Frazier & Christian P. Robert, 2020. "Computing Bayes: Bayesian Computation from 1763 to the 21st Century," Monash Econometrics and Business Statistics Working Papers 14/20, Monash University, Department of Econometrics and Business Statistics.
    8. Dette, Holger & Pilz, Kay F., 2004. "On the estimation of a monotone conditional variance in nonparametric regression," Technical Reports 2004,42, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    9. Holger Dette & Mareen Marchlewski & Jens Wagener, 2012. "Testing for a constant coefficient of variation in nonparametric regression by empirical processes," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 64(5), pages 1045-1070, October.
    10. Bertl Johanna & Ewing Gregory & Kosiol Carolin & Futschik Andreas, 2017. "Approximate maximum likelihood estimation for population genetic inference," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 387-405, December.
    11. Jung Hsuan & Marjoram Paul, 2011. "Choice of Summary Statistic Weights in Approximate Bayesian Computation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-23, September.
    12. Enno Mammen & Jens Perch Nielsen & Michael Scholz & Stefan Sperlich, 2019. "Conditional Variance Forecasts for Long-Term Stock Returns," Risks, MDPI, vol. 7(4), pages 1-22, November.
    13. Degui Li & Oliver Linton & Zudi Lu, 2010. "Local Linear Fitting under Near Epoch Dependence: Uniform Consistency with Convergence Rate," STICERD - Econometrics Paper Series 549, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    14. Maxime Lenormand & Franck Jabot & Guillaume Deffuant, 2013. "Adaptive approximate Bayesian computation for complex models," Computational Statistics, Springer, vol. 28(6), pages 2777-2796, December.
    15. Xu, Ke-Li & Phillips, Peter C. B., 2011. "Tilted Nonparametric Estimation of Volatility Functions With Empirical Applications," Journal of Business & Economic Statistics, American Statistical Association, vol. 29(4), pages 518-528.
    16. Gael M. Martin & David T. Frazier & Christian P. Robert, 2021. "Approximating Bayes in the 21st Century," Monash Econometrics and Business Statistics Working Papers 24/21, Monash University, Department of Econometrics and Business Statistics.
    17. Jonathan U Harrison & Ruth E Baker, 2020. "An automatic adaptive method to combine summary statistics in approximate Bayesian computation," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-21, August.
    18. Pérez-González, A. & Vilar-Fernández, J.M. & González-Manteiga, W., 2010. "Nonparametric variance function estimation with missing data," Journal of Multivariate Analysis, Elsevier, vol. 101(5), pages 1123-1142, May.
    19. Piotr Borkowski & Jan Mielniczuk, 2010. "Postmodel selection estimators of variance function for nonlinear autoregression," Journal of Time Series Analysis, Wiley Blackwell, vol. 31(1), pages 50-63, January.
    20. Frazier, David T. & Maneesoonthorn, Worapree & Martin, Gael M. & McCabe, Brendan P.M., 2019. "Approximate Bayesian forecasting," International Journal of Forecasting, Elsevier, vol. 35(2), pages 521-539.
    21. Buzbas, Erkan O. & Rosenberg, Noah A., 2015. "AABC: Approximate approximate Bayesian computation for inference in population-genetic models," Theoretical Population Biology, Elsevier, vol. 99(C), pages 31-42.
