Model selection and model averaging after multiple imputation

Author

Listed:

Schomaker, Michael
Heumann, Christian

Abstract

Model selection and model averaging are two important techniques to obtain practical and useful models in applied research. However, it is now well-known that many complex issues arise, especially in the context of model selection, when the stochastic nature of the selection process is ignored and estimates, standard errors, and confidence intervals are calculated as if the selected model was known a priori. While model averaging aims to incorporate the uncertainty associated with the model selection process by combining estimates over a set of models, there is still some debate over appropriate interpretation and confidence interval construction. These problems become even more complex in the presence of missing data and it is currently not entirely clear how to proceed. To deal with such situations, a framework for model selection and model averaging in the context of missing data is proposed. The focus lies on multiple imputation as a strategy to deal with the missingness: a consequent combination with model averaging aims to incorporate both the uncertainty associated with the model selection and with the imputation process. Furthermore, the performance of bootstrapping as a flexible extension to our framework is evaluated. Monte Carlo simulations are used to reveal the nature of the proposed estimators in the context of the linear regression model. The practical implications of our approach are illustrated by means of a recent survival study on sputum culture conversion in pulmonary tuberculosis.
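
The combination described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a linear regression model, AIC-based weights over all covariate subsets, and a crude stochastic regression imputation standing in for a proper multiple-imputation procedure (a proper procedure would also draw the parameters of the imputation model). The function names (fit_ols, model_average, impute_once, mi_model_average) and the simulated data are hypothetical. Only point estimates are combined; confidence intervals, which the abstract notes are still debated, are not addressed.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients and the AIC of a Gaussian linear model."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    aic = n * np.log(rss / n) + 2 * (k + 1)  # up to an additive constant
    return beta, aic


def model_average(X, y):
    """AIC-weighted average of OLS coefficients over all covariate subsets.

    Column 0 of X is the intercept and is always included; a coefficient
    counts as zero in every model that excludes its covariate.
    """
    p = X.shape[1]
    betas, aics = [], []
    for r in range(p):
        for subset in itertools.combinations(range(1, p), r):
            cols = [0, *subset]
            beta, aic = fit_ols(X[:, cols], y)
            full = np.zeros(p)
            full[cols] = beta
            betas.append(full)
            aics.append(aic)
    aics = np.array(aics)
    weights = np.exp(-0.5 * (aics - aics.min()))
    weights /= weights.sum()
    return weights @ np.array(betas)


def impute_once(X, y, miss_col, rng):
    """Fill one covariate by stochastic regression imputation.

    Assumption: a simplified stand-in for one multiple-imputation draw.
    """
    X = X.copy()
    missing = np.isnan(X[:, miss_col])
    others = [j for j in range(X.shape[1]) if j != miss_col]
    Z = np.column_stack([X[:, others], y])  # predictors of the missing covariate
    beta, *_ = np.linalg.lstsq(Z[~missing], X[~missing, miss_col], rcond=None)
    resid = X[~missing, miss_col] - Z[~missing] @ beta
    sigma = resid.std(ddof=Z.shape[1])
    X[missing, miss_col] = Z[missing] @ beta + rng.normal(0.0, sigma, missing.sum())
    return X


def mi_model_average(X, y, miss_col, n_imputations=5, seed=0):
    """Model-average within each imputed data set, then average across them."""
    rng = np.random.default_rng(seed)
    estimates = [
        model_average(impute_once(X, y, miss_col, rng), y)
        for _ in range(n_imputations)
    ]
    return np.mean(estimates, axis=0)


# Simulated example: only x1 affects y; about 20% of x2 is missing at random.
rng = np.random.default_rng(1)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])
X[rng.random(n) < 0.2, 2] = np.nan
print(mi_model_average(X, y, miss_col=2))
```

In this sketch the averaging over models happens inside each imputed data set and the results are then averaged across imputations, so the final estimate reflects variation from both sources. Swapping the weight rule or the imputation step would not change that overall structure.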

Suggested Citation

Schomaker, Michael & Heumann, Christian, 2014. "Model selection and model averaging after multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 758-770.

Handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:758-770
DOI: 10.1016/j.csda.2013.02.017

Citations

Cited by:

Hai Wang & Xinjie Chen & Nancy Flournoy, 2016. "The focused information criterion for varying-coefficient partially linear measurement error models," Statistical Papers, Springer, vol. 57(1), pages 99-113, March.
Nitzan Cohen & Yakir Berchenko, 2021. "Normalized Information Criteria and Model Selection in the Presence of Missing Data," Mathematics, MDPI, vol. 9(19), pages 1-23, October.
Michael Schomaker & Christian Heumann, 2020. "When and when not to use optimal model averaging," Statistical Papers, Springer, vol. 61(5), pages 2221-2240, October.
Lasanthi C. R. Pelawa Watagoda & David J. Olive, 2021. "Bootstrapping multiple linear regression after variable selection," Statistical Papers, Springer, vol. 62(2), pages 681-700, April.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Michael Schomaker & Christian Heumann, 2020. "When and when not to use optimal model averaging," Statistical Papers, Springer, vol. 61(5), pages 2221-2240, October.
Xinyu Zhang & Alan T. K. Wan & Sherry Z. Zhou, 2011. "Focused Information Criteria, Model Selection, and Model Averaging in a Tobit Model With a Nonzero Threshold," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 30(1), pages 132-142, June.
Liu, Chu-An, 2015. "Distribution theory of the least squares averaging estimator," Journal of Econometrics, Elsevier, vol. 186(1), pages 142-159.
- Liu, Chu-An, 2013. "Distribution Theory of the Least Squares Averaging Estimator," MPRA Paper 54201, University Library of Munich, Germany.
Shaobo Jin & Sebastian Ankargren, 2019. "Frequentist Model Averaging in Structural Equation Modelling," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 84-104, March.
Wan, Alan T.K. & Zhang, Xinyu & Wang, Shouyang, 2014. "Frequentist model averaging for multinomial and ordered logit models," International Journal of Forecasting, Elsevier, vol. 30(1), pages 118-128.
Magnus, Jan R. & Wan, Alan T.K. & Zhang, Xinyu, 2011. "Weighted average least squares estimation with nonspherical disturbances and an application to the Hong Kong housing market," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1331-1341, March.
Qingfeng Liu & Ryo Okui & Arihiro Yoshimura, 2016. "Generalized Least Squares Model Averaging," Econometric Reviews, Taylor & Francis Journals, vol. 35(8-10), pages 1692-1752, December.
- Qingfeng Liu & Ryo Okui & Arihiro Yoshimura, 2013. "Generalized Least Squares Model Averaging," KIER Working Papers 855, Kyoto University, Institute of Economic Research.
Michael Schomaker, 2012. "Shrinkage averaging estimation," Statistical Papers, Springer, vol. 53(4), pages 1015-1034, November.
Aman Ullah & Alan T. K. Wan & Huansha Wang & Xinyu Zhang & Guohua Zou, 2017. "A semiparametric generalized ridge estimator and link with model averaging," Econometric Reviews, Taylor & Francis Journals, vol. 36(1-3), pages 370-384, March.
- Aman Ullah & Alan T.K. Wan & Huansha Wang & Xinyu Zhang & Guohua Zou, 2014. "A Semiparametric Generalized Ridge Estimator and Link with Model Averaging," Working Papers 201412, University of California at Riverside, Department of Economics.
Zhang, Xinyu & Wan, Alan T.K. & Zou, Guohua, 2013. "Model averaging by jackknife criterion in models with dependent data," Journal of Econometrics, Elsevier, vol. 174(2), pages 82-94.
Jan R. Magnus & Wendun Wang & Xinyu Zhang, 2016. "Weighted-Average Least Squares Prediction," Econometric Reviews, Taylor & Francis Journals, vol. 35(6), pages 1040-1074, June.
Haili Zhang & Guohua Zou, 2020. "Cross-Validation Model Averaging for Generalized Functional Linear Model," Econometrics, MDPI, vol. 8(1), pages 1-35, February.
Yuting Wei & Qihua Wang & Wei Liu, 2021. "Model averaging for linear models with responses missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 535-553, June.
Chu-An Liu & Biing-Shen Kuo & Wen-Jen Tsay, 2017. "Autoregressive Spectral Averaging Estimator," IEAS Working Paper : academic research 17-A013, Institute of Economics, Academia Sinica, Taipei, Taiwan.
Shangwei Zhao & Jun Liao & Dalei Yu, 2020. "Model averaging estimator in ridge regression and its large sample properties," Statistical Papers, Springer, vol. 61(4), pages 1719-1739, August.
Schomaker, Michael & Wan, Alan T.K. & Heumann, Christian, 2010. "Frequentist Model Averaging with missing observations," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3336-3347, December.
Shou-Yung Yin & Chu-An Liu & Chang-Ching Lin, 2021. "Focused Information Criterion and Model Averaging for Large Panels With a Multifactor Error Structure," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 39(1), pages 54-68, January.
- Shou-Yung Yin & Chu-An Liu & Chang-Ching Lin, 2016. "Focused Information Criterion and Model Averaging for Large Panels with a Multifactor Error Structure," IEAS Working Paper : academic research 16-A016, Institute of Economics, Academia Sinica, Taipei, Taiwan.
Liu, Chu-An, 2012. "A plug-in averaging estimator for regressions with heteroskedastic errors," MPRA Paper 41414, University Library of Munich, Germany.
Aman Ullah & Huansha Wang, 2013. "Parametric and Nonparametric Frequentist Model Selection and Model Averaging," Econometrics, MDPI, vol. 1(2), pages 1-23, September.
Shi, Ruoyao, 2024. "An Averaging Estimator For Two-Step M-Estimation In Semiparametric Models," Econometric Theory, Cambridge University Press, vol. 40(3), pages 652-687, June.
- Ruoyao Shi, 2021. "An Averaging Estimator for Two Step M Estimation in Semiparametric Models," Working Papers 202105, University of California at Riverside, Department of Economics.
- Ruoyao Shi, 2022. "An Averaging Estimator for Two Step M Estimation in Semiparametric Models," Working Papers 202211, University of California at Riverside, Department of Economics.
- Ruoyao Shi, 2022. "An Averaging Estimator for Two Step M Estimation in Semiparametric Models," Working Papers 202201, University of California at Riverside, Department of Economics.
