Printed from https://ideas.repec.org/a/gam/jmathe/v9y2021i19p2474-d649347.html

Normalized Information Criteria and Model Selection in the Presence of Missing Data

Author

Listed:
  • Nitzan Cohen

    (Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 84105, Israel)

  • Yakir Berchenko

    (Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 84105, Israel)

Abstract

Information criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are commonly used for model selection. However, the current theory does not support unconventional data, so naive use of these criteria is not suitable for data with missing values. Imputation, at the core of most alternative methods, is both prone to distortion and computationally demanding. We propose a new approach that enables the use of classic, well-known information criteria for model selection when there are missing data. We adapt the current theory of information criteria through normalization, accounting for the different sample sizes used for each candidate model (focusing on AIC and BIC). Interestingly, when the sample sizes are different, our theoretical analysis finds that $\mathrm{AIC}_j / n_j$ is the proper correction for $\mathrm{AIC}_j$ that we need to optimize (where $n_j$ is the sample size available to the $j$th model), while $-(\mathrm{BIC}_j - \mathrm{BIC}_i)/(n_j - n_i)$ is the correction of BIC. Furthermore, we find that the computational complexity of normalized information criteria methods is exponentially better than that of imputation methods. In a series of simulation studies, we find that normalized-AIC and normalized-BIC outperform previous methods (i.e., normalized-AIC is more efficient, and normalized-BIC includes only important variables, although it tends to exclude some of them in cases of large correlation). We propose three additional methods aimed at increasing the statistical efficiency of normalized-AIC: post-selection imputation, Akaike sub-model averaging, and minimum-variance averaging. The latter succeeds in increasing efficiency further.
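To make the normalized-AIC idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian linear regression fitted by ordinary least squares, with each candidate model fitted only on the rows that are complete for its own variables. The function names (`gaussian_aic`, `normalized_aic`), the simulated data, and the 30% missingness rate are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC of an OLS fit with Gaussian errors (an intercept is added here)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(np.sum((y - Xd @ beta) ** 2))
    k = Xd.shape[1] + 1  # regression coefficients plus the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * loglik

def normalized_aic(y, X, cols):
    """Fit on the rows complete for `cols`; return (AIC_j / n_j, n_j)."""
    sub = X[:, list(cols)]
    keep = ~np.isnan(sub).any(axis=1) & ~np.isnan(y)
    n_j = int(keep.sum())
    return gaussian_aic(y[keep], sub[keep]) / n_j, n_j

# Hypothetical data: only column 0 drives y; columns 1 and 2 have missing values.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
X[rng.random(n) < 0.3, 1] = np.nan
X[rng.random(n) < 0.3, 2] = np.nan

candidates = [(0,), (1,), (0, 1), (0, 1, 2)]
scores = {c: normalized_aic(y, X, c) for c in candidates}
best = min(scores, key=lambda c: scores[c][0])  # smaller AIC_j / n_j is preferred

for c, (score, n_j) in scores.items():
    print(c, "n_j =", n_j, "AIC_j / n_j =", round(score, 4))
print("selected:", best)
```

Dividing by $n_j$ puts each criterion on a per-observation scale, so a model that uses fewer complete rows is not favored merely because its log-likelihood sums over fewer terms.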

Suggested Citation

  • Nitzan Cohen & Yakir Berchenko, 2021. "Normalized Information Criteria and Model Selection in the Presence of Missing Data," Mathematics, MDPI, vol. 9(19), pages 1-23, October.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:19:p:2474-:d:649347

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/19/2474/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/19/2474/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hai Wang & Xinjie Chen & Nancy Flournoy, 2016. "The focused information criterion for varying-coefficient partially linear measurement error models," Statistical Papers, Springer, vol. 57(1), pages 99-113, March.
    2. Jiming Jiang & Thuan Nguyen & J. Sunil Rao, 2015. "The E-MS Algorithm: Model Selection With Incomplete Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1136-1147, September.
    3. Zhimeng Sun & Zhi Su & Jingyi Ma, 2014. "Focused vector information criterion model selection and model averaging regression with missing response," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 77(3), pages 415-432, April.
    4. Adriano Zanin Zambom & Gregory J. Matthews, 2021. "Sure independence screening in the presence of missing data," Statistical Papers, Springer, vol. 62(2), pages 817-845, April.
    5. Schomaker, Michael & Heumann, Christian, 2014. "Model selection and model averaging after multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 758-770.
    6. Abe, Ryosuke & Kato, Hironori, 2017. "What led to the establishment of a rail-oriented city? Determinants of urban rail supply in Tokyo, Japan, 1950–2010," Transport Policy, Elsevier, vol. 58(C), pages 72-79.
    7. Riccardo (Jack) Lucchetti & Luca Pedini, 2020. "ParMA: Parallelised Bayesian Model Averaging for Generalised Linear Models," Working Papers 2020:28, Department of Economics, University of Venice "Ca' Foscari".
    8. Anna Sokolova, 2023. "Marginal Propensity to Consume and Unemployment: a Meta-analysis," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 51, pages 813-846, December.
    9. Jindrich Matousek & Tomas Havranek & Zuzana Irsova, 2022. "Individual discount rates: a meta-analysis of experimental evidence," Experimental Economics, Springer;Economic Science Association, vol. 25(1), pages 318-358, February.
    10. Zhongqi Liang & Qihua Wang & Yuting Wei, 2022. "Robust model selection with covariables missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(3), pages 539-557, June.
    11. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    12. Schomaker Michael & Heumann Christian, 2011. "Model Averaging in Factor Analysis: An Analysis of Olympic Decathlon Data," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 7(1), pages 1-15, January.
    13. Janus, Jakub, 2021. "The COVID-19 shock and long-term interest rates in emerging market economies," Finance Research Letters, Elsevier, vol. 43(C).
    14. Beata K. Bierut & Piot Dybka, 2019. "Institutional determinants of export competitiveness among the EU countries: evidence from Bayesian model averaging," KAE Working Papers 2019-043, Warsaw School of Economics, Collegium of Economic Analysis.
    15. Dirick, Lore & Claeskens, Gerda & Baesens, Bart, 2015. "An Akaike information criterion for multiple event mixture cure models," European Journal of Operational Research, Elsevier, vol. 241(2), pages 449-457.
    16. Roman Horvath & Ali Elminejad & Tomas Havranek, 2020. "Publication and Identification Biases in Measuring the Intertemporal Substitution of Labor Supply," Working Papers IES 2020/32, Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies, revised Sep 2020.
    17. Chen Ray-Bing & Lee Kuo-Jung & Chen Yi-Chi & Chu Chi-Hsiang, 2017. "On the determinants of the 2008 financial crisis: a Bayesian approach to the selection of groups and variables," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 21(5), pages 1-17, December.
    18. Rajeev K. Goel & James W. Saunoris, 2020. "A Replication of “Sorting through Global Corruption Determinants: Institutions and Education Matter—Not Culture” (World Development 2018)," Public Finance Review, vol. 48(4), pages 538-567, July.
    19. Joseph, Andreas & Osbat, Chiara, 2016. "How you export matters: the disassortative structure of international trade," Working Paper Series 1958, European Central Bank.
    20. Florian Morvillier, 2018. "On the impact of the launch of the euro on EMU macroeconomic vulnerability," EconomiX Working Papers 2018-51, University of Paris Nanterre, EconomiX.
