
Large-scale model selection in misspecified generalized linear models

Author

Listed:
  • Emre Demirkaya
  • Yang Feng
  • Pallavi Basu
  • Jinchi Lv

Abstract

Summary: Model selection is crucial both to high-dimensional learning and to inference for contemporary big data applications in pinpointing the best set of covariates among a sequence of candidate interpretable models. Most existing work implicitly assumes that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles under the misspecified generalized linear models presented in Lv & Liu (2014), and investigate the asymptotic expansion of the posterior model probability in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback–Leibler divergence, we suggest using the high-dimensional generalized Bayesian information criterion with prior probability for large-scale model selection with misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of the new information criterion in ultrahigh dimensions under some mild regularity conditions. Our numerical studies demonstrate that the proposed method enjoys improved model selection consistency over its main competitors.

Suggested Citation

  • Emre Demirkaya & Yang Feng & Pallavi Basu & Jinchi Lv, 2022. "Large-scale model selection in misspecified generalized linear models," Biometrika, Biometrika Trust, vol. 109(1), pages 123-136.
  • Handle: RePEc:oup:biomet:v:109:y:2022:i:1:p:123-136.

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1093/biomet/asab005
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    2. Jinchi Lv & Jun S. Liu, 2014. "Model selection principles in misspecified models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 141-167, January.
    3. Heng Peng & Hongjia Yan & Wenyang Zhang, 2013. "The connection between cross-validation and Akaike information criterion in a semiparametric family," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 25(2), pages 475-485, June.
    4. Hamparsum Bozdogan, 1987. "Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions," Psychometrika, Springer;The Psychometric Society, vol. 52(3), pages 345-370, September.
    5. Emmanuel Candès & Yingying Fan & Lucas Janson & Jinchi Lv, 2018. "Panning for gold: ‘model‐X’ knockoffs for high dimensional controlled variable selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(3), pages 551-577, June.
    6. White, Halbert, 1982. "Maximum Likelihood Estimation of Misspecified Models," Econometrica, Econometric Society, vol. 50(1), pages 1-25, January.
    7. E Fong & C C Holmes, 2020. "On the marginal likelihood and cross-validation," Biometrika, Biometrika Trust, vol. 107(2), pages 489-496.
    8. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    9. Yingying Fan & Cheng Yong Tang, 2013. "Tuning parameter selection in high dimensional penalized likelihood," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 531-552, June.
    10. Yingying Fan & Emre Demirkaya & Gaorong Li & Jinchi Lv, 2020. "RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(529), pages 362-379, January.
    11. Rajen D. Shah & Peter Bühlmann, 2018. "Goodness‐of‐fit tests for high dimensional linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(1), pages 113-135, January.

    Citations

    Cited by:

    1. Hani Amir Aouissi & Ahmed Hamimes & Mostefa Ababsa & Lavinia Bianco & Christian Napoli & Feriel Kheira Kebaili & Andrey E. Krauklis & Hafid Bouzekri & Kuldeep Dhama, 2022. "Bayesian Modeling of COVID-19 to Classify the Infection and Death Rates in a Specific Duration: The Case of Algerian Provinces," IJERPH, MDPI, vol. 19(15), pages 1-18, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eguchi, Shoichi, 2018. "Model comparison for generalized linear models with dependent observations," Econometrics and Statistics, Elsevier, vol. 5(C), pages 171-188.
    2. Pan, Yingli, 2022. "Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    3. Zemin Zheng & Jinchi Lv & Wei Lin, 2021. "Nonsparse Learning with Latent Variables," Operations Research, INFORMS, vol. 69(1), pages 346-359, January.
    4. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    5. Fabio Canova & Christian Matthes, 2021. "Dealing with misspecification in structural macroeconometric models," Quantitative Economics, Econometric Society, vol. 12(2), pages 313-350, May.
    6. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    7. Francesco BARTOLUCCI & Silvia BACCI & Claudia PIGINI, 2015. "A Misspecification Test for Finite-Mixture Logistic Models for Clustered Binary and Ordered Responses," Working Papers 410, Universita' Politecnica delle Marche (I), Dipartimento di Scienze Economiche e Sociali.
    8. Tang, Yanlin & Song, Xinyuan & Wang, Huixia Judy & Zhu, Zhongyi, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
    9. Yunxiao Chen & Xiaoou Li & Jingchen Liu & Zhiliang Ying, 2017. "Regularized Latent Class Analysis with Application in Cognitive Diagnosis," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 660-692, September.
    10. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    11. Tae-Hwy Lee & Ekaterina Seregina, 2020. "Learning from Forecast Errors: A New Approach to Forecast Combination," Working Papers 202024, University of California at Riverside, Department of Economics.
    12. Canhong Wen & Xueqin Wang & Shaoli Wang, 2015. "Laplace Error Penalty-based Variable Selection in High Dimension," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 42(3), pages 685-700, September.
    13. Sakyajit Bhattacharya & Paul McNicholas, 2014. "A LASSO-penalized BIC for mixture model selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(1), pages 45-61, March.
    14. Chen, J. & Li, D. & Li, Y. & Linton, O. B., 2022. "Estimating Time-Varying Networks for High-Dimensional Time Series," Cambridge Working Papers in Economics 2273, Faculty of Economics, University of Cambridge.
    15. Zhao, Bangxin & Liu, Xin & He, Wenqing & Yi, Grace Y., 2021. "Dynamic tilted current correlation for high dimensional variable screening," Journal of Multivariate Analysis, Elsevier, vol. 182(C).
    16. Liu, Jingyuan & Lou, Lejia & Li, Runze, 2018. "Variable selection for partially linear models via partial correlation," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 418-434.
    17. Honda, Toshio & Lin, Chien-Tong, 2022. "Forward variable selection for ultra-high dimensional quantile regression models," Discussion Papers 2021-02, Graduate School of Economics, Hitotsubashi University.
    18. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    19. Haofeng Wang & Hongxia Jin & Xuejun Jiang & Jingzhi Li, 2022. "Model Selection for High Dimensional Nonparametric Additive Models via Ridge Estimation," Mathematics, MDPI, vol. 10(23), pages 1-22, December.
    20. Yang Feng & Qingfeng Liu, 2020. "Nested Model Averaging on Solution Path for High-dimensional Linear Regression," Papers 2005.08057, arXiv.org.
