
Bayesian criterion‐based variable selection

Authors
  • Arnab Kumar Maity
  • Sanjib Basu
  • Santu Ghosh

Abstract

Bayesian approaches to criterion‐based selection include the marginal likelihood‐based highest posterior model (HPM) and the deviance information criterion (DIC). The DIC is popular in practice because it can often be estimated from sampling‐based methods with relative ease and is readily available in various Bayesian software packages. We find that the sensitivity of DIC‐based selection can be high, in the range of 90–100%. However, correct selection by DIC can be in the range of 0–2%. This behaviour persists as the sample size increases. We establish that both the marginal likelihood and the DIC asymptotically disfavour under‐fitted models, which explains the high sensitivities of both criteria. However, the mis‐selection probability of the DIC remains bounded below by a positive constant in linear models with g‐priors, whereas the mis‐selection probability of the marginal likelihood converges to 0 under certain conditions. A consequence of our results is that the DIC not only cannot asymptotically differentiate between the data‐generating model and an over‐fitted model but also cannot asymptotically differentiate between two over‐fitted models. We illustrate these results in multiple simulation studies and in a biomarker selection problem on cancer cachexia in non‐small cell lung cancer patients. We further study the performance of HPM and DIC in generalized linear models, as practitioners often choose to use the DIC, which is readily available in software, in such non‐conjugate settings.
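
The contrast between the two criteria can be illustrated with a small simulation. The sketch below is not the authors' code or simulation design. It assumes a deliberately simplified setting: a Gaussian linear model without intercept, error variance known and equal to 1, a g‐prior on the coefficients with g = n, and only two candidate models (the data‐generating model with one predictor and an over‐fitted model with one extra irrelevant predictor). Under these assumptions both the DIC and the log marginal likelihood have closed forms. The function names fit_norm_sq, criteria and overfit_rates are hypothetical.

```python
import math
import random


def fit_norm_sq(xs, y):
    """Squared norm of the OLS fitted values of y on one or two columns."""
    if len(xs) == 1:
        x1 = xs[0]
        a = sum(v * v for v in x1)
        u = sum(v * t for v, t in zip(x1, y))
        return u * u / a
    x1, x2 = xs
    a = sum(v * v for v in x1)
    d = sum(v * v for v in x2)
    b = sum(v * w for v, w in zip(x1, x2))
    u = sum(v * t for v, t in zip(x1, y))
    w = sum(v * t for v, t in zip(x2, y))
    # v' G^{-1} v for the 2x2 Gram matrix G = [[a, b], [b, d]]
    return (d * u * u - 2 * b * u * w + a * w * w) / (a * d - b * b)


def criteria(xs, y, g):
    """DIC and log marginal likelihood under a g-prior, sigma^2 = 1 assumed known."""
    n, p = len(y), len(xs)
    c = g / (1.0 + g)                      # posterior shrinkage factor
    yy = sum(t * t for t in y)
    fit = fit_norm_sq(xs, y)
    rss = yy - fit
    # Deviance at the posterior mean c * beta_hat.
    dev_at_mean = n * math.log(2 * math.pi) + rss + (1 - c) ** 2 * fit
    p_d = p * c                            # effective number of parameters
    dic = dev_at_mean + 2 * p_d
    # y ~ N(0, I + g * P_X) marginally, where P_X is the projection onto the columns.
    log_ml = (-0.5 * n * math.log(2 * math.pi)
              - 0.5 * p * math.log(1 + g)
              - 0.5 * (yy - c * fit))
    return dic, log_ml


def overfit_rates(n, reps=400, beta1=1.0, seed=0):
    """Fraction of replicates in which each criterion prefers the over-fitted model."""
    rng = random.Random(seed)
    g = float(n)                           # g = n is an assumption of this sketch
    dic_over = ml_over = 0
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rng.gauss(0, 1) for _ in range(n)]   # irrelevant predictor
        y = [beta1 * v + rng.gauss(0, 1) for v in x1]
        dic_true, ml_true = criteria([x1], y, g)
        dic_big, ml_big = criteria([x1, x2], y, g)
        dic_over += dic_big < dic_true     # smaller DIC is preferred
        ml_over += ml_big > ml_true        # larger marginal likelihood is preferred
    return dic_over / reps, ml_over / reps


if __name__ == "__main__":
    for n in (100, 400):
        print(n, overfit_rates(n))
```

In this simplified setting the DIC penalty for the extra predictor stays close to 2 for every n, so the over‐fitted model wins whenever the drop in residual sum of squares (approximately chi‐squared with one degree of freedom) exceeds roughly 2, an event whose probability does not shrink with n. The marginal likelihood penalty grows like log(1 + g), so its over‐selection rate falls as n increases.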

Suggested Citation

  • Arnab Kumar Maity & Sanjib Basu & Santu Ghosh, 2021. "Bayesian criterion‐based variable selection," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 835-857, August.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:4:p:835-857
    DOI: 10.1111/rssc.12488

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12488
    Download Restriction: no



