
Model selection and model averaging after multiple imputation

Authors

  • Schomaker, Michael
  • Heumann, Christian

Abstract

Model selection and model averaging are two important techniques for obtaining practical and useful models in applied research. However, it is now well known that many complex issues arise, especially in the context of model selection, when the stochastic nature of the selection process is ignored and estimates, standard errors, and confidence intervals are calculated as if the selected model were known a priori. While model averaging aims to incorporate the uncertainty associated with the model selection process by combining estimates over a set of models, there is still some debate over appropriate interpretation and confidence interval construction. These problems become even more complex in the presence of missing data, and it is currently not entirely clear how to proceed. To deal with such situations, a framework for model selection and model averaging in the context of missing data is proposed. The focus lies on multiple imputation as a strategy to deal with the missingness: consistently combining it with model averaging aims to incorporate the uncertainty associated with both the model selection and the imputation process. Furthermore, the performance of bootstrapping as a flexible extension to our framework is evaluated. Monte Carlo simulations are used to investigate the properties of the proposed estimators in the context of the linear regression model. The practical implications of our approach are illustrated by means of a recent survival study on sputum culture conversion in pulmonary tuberculosis.
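The abstract does not spell out the estimators, so the following is only a minimal sketch of one common way to combine the two ideas, not necessarily the authors' exact method. It assumes (our assumption) that each candidate model has already been fitted on each of the M imputed datasets, giving a point estimate, its variance, and an AIC value per model. Within each imputed dataset the estimates are averaged with smoothed AIC weights and a Buckland et al. (1997)-style variance; the M model-averaged results are then pooled with Rubin's rules. All function names and the demo numbers are hypothetical.

```python
import math
from statistics import mean, variance


def aic_weights(aics):
    """Smoothed AIC weights: w_k proportional to exp(-(AIC_k - min AIC) / 2)."""
    best = min(aics)
    raw = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, variances, aics):
    """Model-averaged estimate and variance for ONE imputed dataset.

    Variance follows Buckland et al. (1997): each model's variance is
    inflated by its squared distance from the averaged estimate, which
    is how model-selection uncertainty enters.
    """
    w = aic_weights(aics)
    theta = sum(wk * ek for wk, ek in zip(w, estimates))
    se = sum(
        wk * math.sqrt(vk + (ek - theta) ** 2)
        for wk, vk, ek in zip(w, variances, estimates)
    )
    return theta, se ** 2


def rubin_combine(estimates, variances):
    """Rubin's rules: pool M >= 2 point estimates and within-imputation variances."""
    m = len(estimates)
    qbar = mean(estimates)            # pooled point estimate
    within = mean(variances)          # average within-imputation variance W
    between = variance(estimates)     # between-imputation variance B (divisor M - 1)
    total = within + (1 + 1 / m) * between
    return qbar, total


def ma_after_mi(per_imputation):
    """per_imputation: one (estimates, variances, aics) tuple per imputed dataset,
    each list running over the same set of candidate models."""
    averaged = [model_average(e, v, a) for e, v, a in per_imputation]
    return rubin_combine([t for t, _ in averaged], [v for _, v in averaged])


if __name__ == "__main__":
    # Hypothetical fits of two candidate models on three imputed datasets.
    fits = [
        ([1.10, 0.95], [0.04, 0.03], [210.3, 211.8]),
        ([1.25, 1.05], [0.05, 0.03], [208.9, 212.4]),
        ([1.05, 0.90], [0.04, 0.03], [211.0, 210.2]),
    ]
    est, var = ma_after_mi(fits)
    print(f"estimate = {est:.3f}, SE = {math.sqrt(var):.3f}")
```

In this sketch, imputation uncertainty enters through the between-imputation term (1 + 1/M)B, and model-selection uncertainty through the deviation term inside the square root. Averaging within each imputed dataset before pooling is one design choice; other orderings and weighting schemes are possible.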

Suggested Citation

  • Schomaker, Michael & Heumann, Christian, 2014. "Model selection and model averaging after multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 758-770.
  • Handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:758-770
    DOI: 10.1016/j.csda.2013.02.017

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794731300073X
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Yan, Jun, 2007. "Enjoy the Joy of Copulas: With a Package copula," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 21(i04).
    2. Horton, Nicholas J. & Kleinman, Ken P., 2007. "Much Ado About Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models," The American Statistician, American Statistical Association, vol. 61, pages 79-90, February.
    3. Leeb, Hannes & Pötscher, Benedikt M., 2005. "Model Selection And Inference: Facts And Fiction," Econometric Theory, Cambridge University Press, vol. 21(01), pages 21-59, February.
    4. Hjort, N.L. & Claeskens, G., 2003. "Frequentist Model Average Estimators," Journal of the American Statistical Association, American Statistical Association, vol. 98, pages 879-899, January.
    5. Turek, Daniel & Fletcher, David, 2012. "Model-averaged Wald confidence intervals," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2809-2815.
    6. Magnus, Jan R. & Powell, Owen & Prüfer, Patricia, 2010. "A comparison of two model averaging techniques with an application to growth empirics," Journal of Econometrics, Elsevier, vol. 154(2), pages 139-153, February.
    7. Leeb, Hannes & Pötscher, Benedikt M., 2008. "Can One Estimate The Unconditional Distribution Of Post-Model-Selection Estimators?," Econometric Theory, Cambridge University Press, vol. 24(02), pages 338-376, April.
    8. Kabaila, Paul & Leeb, Hannes, 2006. "On the Large-Sample Minimal Coverage Probability of Confidence Intervals After Model Selection," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 619-629, June.
    9. Schomaker, Michael & Heumann, Christian, 2011. "Model Averaging in Factor Analysis: An Analysis of Olympic Decathlon Data," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 7(1), pages 1-15, January.
    10. Magnus, Jan R. & Wan, Alan T.K. & Zhang, Xinyu, 2011. "Weighted average least squares estimation with nonspherical disturbances and an application to the Hong Kong housing market," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1331-1341, March.
    11. Wan, Alan T.K. & Zhang, Xinyu & Zou, Guohua, 2010. "Least squares model averaging by Mallows criterion," Journal of Econometrics, Elsevier, vol. 156(2), pages 277-283, June.
    12. Hansen, Bruce E. & Racine, Jeffrey S., 2012. "Jackknife model averaging," Journal of Econometrics, Elsevier, vol. 167(1), pages 38-46.
    13. Liang, Hua & Zou, Guohua & Wan, Alan T. K. & Zhang, Xinyu, 2011. "Optimal Weight Choice for Frequentist Model Average Estimators," Journal of the American Statistical Association, American Statistical Association, vol. 106(495), pages 1053-1066.
    14. Schomaker, Michael & Wan, Alan T.K. & Heumann, Christian, 2010. "Frequentist Model Averaging with missing observations," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3336-3347, December.
    15. Schomaker, Michael, 2012. "Shrinkage averaging estimation," Statistical Papers, Springer, vol. 53(4), pages 1015-1034, November.
    16. repec:taf:jnlbes:v:30:y:2012:i:1:p:132-142 is not listed on IDEAS
    17. Pötscher, Benedikt M., 2006. "The Distribution of Model Averaging Estimators and an Impossibility Result Regarding Its Estimation," MPRA Paper 73, University Library of Munich, Germany, revised Jul 2006.
    18. Fletcher, David & Dillingham, Peter W., 2011. "Model-averaged confidence intervals for factorial experiments," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 3041-3048, November.
    19. Hansen, Bruce E., 2007. "Least Squares Model Averaging," Econometrica, Econometric Society, vol. 75(4), pages 1175-1189, July.

    Citations



    Cited by:

    1. Hai Wang & Xinjie Chen & Nancy Flournoy, 2016. "The focused information criterion for varying-coefficient partially linear measurement error models," Statistical Papers, Springer, vol. 57(1), pages 99-113, March.
