Modified versions of the Bayesian Information Criterion for sparse Generalized Linear Models

Authors
  • Zak-Szatkowska, Malgorzata
  • Bogdan, Malgorzata

Abstract

Classical model selection criteria, such as the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC), tend to overestimate the number of regressors when the search is performed over a large number of potential explanatory variables. To address this overestimation, several modifications of the BIC have been proposed. These versions supplement the original BIC with prior distributions on the class of possible models. Three such modifications are presented and compared in the context of sparse Generalized Linear Models (GLMs). The related choices of priors are discussed, and conditions for the asymptotic equivalence of these criteria are provided. The performance of the modified versions of the BIC is illustrated with an extensive simulation study and a real data analysis. Simplified versions of the modified BIC, based on least squares regression, are also investigated.
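
As a rough illustration of how a prior on the model space changes the penalty, the sketch below compares the classical BIC with two commonly cited modifications: an mBIC-style penalty of 2k log(p/c) and an EBIC-style penalty of 2γ log C(p, k). The abstract does not state the formulas, so the exact forms, the function names, the defaults c = 4 and γ = 1, and all numeric inputs are assumptions chosen for illustration, not the authors' definitions.

```python
import math


def bic(loglik, k, n):
    """Classical BIC: -2 log L + k log n (smaller is better)."""
    return -2.0 * loglik + k * math.log(n)


def mbic(loglik, k, n, p, c=4.0):
    """mBIC-style criterion (assumed form): BIC + 2 k log(p / c).

    c plays the role of the prior expected number of true regressors;
    the default of 4 is an assumption, not taken from the abstract.
    """
    return bic(loglik, k, n) + 2.0 * k * math.log(p / c)


def ebic(loglik, k, n, p, gamma=1.0):
    """EBIC-style criterion (assumed form): BIC + 2 gamma log C(p, k)."""
    # log of the binomial coefficient C(p, k), computed via log-gamma
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return bic(loglik, k, n) + 2.0 * gamma * log_binom


# Hypothetical setting: n observations, p candidate regressors, and two
# fitted GLMs with made-up maximized log-likelihoods.
n, p = 200, 1000
small = {"k": 2, "loglik": -310.0}
large = {"k": 5, "loglik": -300.0}

for name, crit in [("BIC", lambda m: bic(m["loglik"], m["k"], n)),
                   ("mBIC", lambda m: mbic(m["loglik"], m["k"], n, p)),
                   ("EBIC", lambda m: ebic(m["loglik"], m["k"], n, p))]:
    choice = "small" if crit(small) < crit(large) else "large"
    print(f"{name}: small={crit(small):.2f} large={crit(large):.2f} -> {choice}")
```

With these made-up inputs, the plain BIC favors the larger model, while both modified criteria favor the smaller one, because their extra penalty grows with the number of candidate regressors p.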

Suggested Citation

  • Zak-Szatkowska, Malgorzata & Bogdan, Malgorzata, 2011. "Modified versions of the Bayesian Information Criterion for sparse Generalized Linear Models," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2908-2924, November.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:11:p:2908-2924

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947311001459
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    2. Ye, Gui-Bo & Xie, Xiaohui, 2011. "Split Bregman method for large scale fused Lasso," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1552-1569, April.
    3. Crews, Hugh B. & Boos, Dennis D. & Stefanski, Leonard A., 2011. "FSR methods for second-order regression models," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2026-2037, June.
    4. Fan, J. & Li, R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    5. Kapetanios, George, 2007. "Variable selection in regression models using nonstandard optimisation of information criteria," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 4-15, September.
    6. Erhardt, Vinzenz & Bogdan, Malgorzata & Czado, Claudia, 2010. "Locating Multiple Interacting Quantitative Trait Loci with the Zero-Inflated Generalized Poisson Regression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-27, June.
    7. Małgorzata Bogdan & Florian Frommlet & Przemysław Biecek & Riyan Cheng & Jayanta K. Ghosh & R.W. Doerge, 2008. "Extending the Modified Bayesian Information Criterion (mBIC) to Dense Markers and Multiple Interval Mapping," Biometrics, The International Biometric Society, vol. 64(4), pages 1162-1169, December.
    8. Marra, Giampiero & Wood, Simon N., 2011. "Practical variable selection for generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2372-2387, July.
    9. Karl W. Broman & Terence P. Speed, 2002. "A model selection approach for the identification of quantitative trait loci in experimental crosses," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 641-656, October.
    10. Baierl, Andreas & Futschik, Andreas & Bogdan, Malgorzata & Biecek, Przemyslaw, 2007. "Locating multiple interacting quantitative trait loci using robust model selection," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 6423-6434, August.

    Citations


    Cited by:

    1. Frommlet, Florian & Ljubic, Ivana & Arnardóttir, Helga Björk & Bogdan, Malgorzata, 2012. "QTL Mapping Using a Memetic Algorithm with Modifications of BIC as Fitness Function," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-26, May.
    2. Jian Huang & Yuling Jiao & Lican Kang & Jin Liu & Yanyan Liu & Xiliang Lu, 2022. "GSDAR: a fast Newton algorithm for $\ell_0$ regularized generalized linear models with statistical guarantee," Computational Statistics, Springer, vol. 37(1), pages 507-533, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frommlet, Florian & Ruhaltinger, Felix & Twaróg, Piotr & Bogdan, Małgorzata, 2012. "Modified versions of Bayesian Information Criterion for genome-wide association studies," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1038-1051.
    2. Frommlet, Florian & Ljubic, Ivana & Arnardóttir, Helga Björk & Bogdan, Malgorzata, 2012. "QTL Mapping Using a Memetic Algorithm with Modifications of BIC as Fitness Function," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-26, May.
    3. Guo, Jie & Tang, Manlai & Tian, Maozai & Zhu, Kai, 2013. "Variable selection in high-dimensional partially linear additive models for composite quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 65(C), pages 56-67.
    4. Ryan A. Peterson & Joseph E. Cavanaugh, 2022. "Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(3), pages 427-454, September.
    5. Yawei He & Zehua Chen, 2016. "The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 68(1), pages 155-180, February.
    6. Chun Wang, 2021. "Using Penalized EM Algorithm to Infer Learning Trajectories in Latent Transition CDM," Psychometrika, Springer;The Psychometric Society, vol. 86(1), pages 167-189, March.
    7. Erhardt, Vinzenz & Bogdan, Malgorzata & Czado, Claudia, 2010. "Locating Multiple Interacting Quantitative Trait Loci with the Zero-Inflated Generalized Poisson Regression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-27, June.
    8. Kaixu Yang & Tapabrata Maiti, 2022. "Ultrahigh‐dimensional generalized additive model: Unified theory and methods," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 917-942, September.
    9. Wang, Tao & Zhu, Lixing, 2011. "Consistent tuning parameter selection in high dimensional sparse linear regression," Journal of Multivariate Analysis, Elsevier, vol. 102(7), pages 1141-1151, August.
    10. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    11. Xiaotong Shen & Wei Pan & Yunzhang Zhu & Hui Zhou, 2013. "On constrained and regularized high-dimensional regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 65(5), pages 807-832, October.
    12. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    13. Lian, Heng & Du, Pang & Li, YuanZhang & Liang, Hua, 2014. "Partially linear structure identification in generalized additive models with NP-dimensionality," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 197-208.
    14. Tang, Yanlin & Song, Xinyuan & Wang, Huixia Judy & Zhu, Zhongyi, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
    15. Yunxiao Chen & Xiaoou Li & Jingchen Liu & Zhiliang Ying, 2017. "Regularized Latent Class Analysis with Application in Cognitive Diagnosis," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 660-692, September.
    16. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    17. Zhaoliang Wang & Liugen Xue & Gaorong Li & Fei Lu, 2019. "Spline estimator for ultra-high dimensional partially linear varying coefficient models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 657-677, June.
    18. Zhang, Ting & Wang, Lei, 2020. "Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    19. Chenchen Ma & Jing Ouyang & Gongjun Xu, 2023. "Learning Latent and Hierarchical Structures in Cognitive Diagnosis Models," Psychometrika, Springer;The Psychometric Society, vol. 88(1), pages 175-207, March.
    20. Canhong Wen & Xueqin Wang & Shaoli Wang, 2015. "Laplace Error Penalty-based Variable Selection in High Dimension," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 42(3), pages 685-700, September.
