Printed from https://ideas.repec.org/a/spr/stpapr/v62y2021i2d10.1007_s00362-019-01108-9.html

Bootstrapping multiple linear regression after variable selection

Author

Listed:
  • Lasanthi C. R. Pelawa Watagoda

    (Appalachian State University)

  • David J. Olive

    (Southern Illinois University)

Abstract

This paper suggests a method for bootstrapping the multiple linear regression model $$Y = \beta_1 + \beta_2 x_2 + \cdots + \beta_p x_p + e$$ after variable selection. We develop asymptotic theory for some common least squares variable selection estimators such as forward selection with $$C_p$$. Then hypothesis testing is done using three confidence regions, one of which is new. Theory suggests that the three confidence regions tend to have coverage at least as high as the nominal coverage if the sample size is large enough.
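
As a rough illustration of the idea in the abstract, the sketch below bootstraps a forward selection estimator scored by Mallows' $$C_p$$. It is a minimal sketch under stated assumptions, not the authors' procedure. The residual bootstrap built on the full OLS fit, the zero padding of unselected coefficients, and the function names `ols_rss`, `forward_select_cp`, and `residual_bootstrap` are all choices made here for illustration. The simple percentile intervals at the end stand in for the three confidence regions of the paper, which the abstract does not specify.

```python
import numpy as np

def ols_rss(X, y):
    """Return (coefficients, residual sum of squares) of a least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def forward_select_cp(X, y):
    """Forward selection scored by Mallows' C_p (hypothetical helper).

    Column 0 of X is assumed to be the intercept and is always kept.
    Returns a length-p vector with zeros for the unselected predictors.
    """
    n, p = X.shape
    _, rss_full = ols_rss(X, y)
    sigma2 = rss_full / (n - p)              # error variance from the full model
    active, remaining = [0], list(range(1, p))
    beta, rss = ols_rss(X[:, active], y)
    best_cp, best_active, best_beta = rss / sigma2 - n + 2, [0], beta
    while remaining:
        # Add the predictor that gives the smallest RSS at this step.
        fits = [(ols_rss(X[:, active + [j]], y), j) for j in remaining]
        (beta, rss), j = min(fits, key=lambda t: t[0][1])
        active.append(j)
        remaining.remove(j)
        cp = rss / sigma2 - n + 2 * len(active)
        if cp < best_cp:
            best_cp, best_active, best_beta = cp, list(active), beta
    out = np.zeros(p)
    out[best_active] = best_beta             # zero padding for dropped predictors
    return out

def residual_bootstrap(X, y, B=200, seed=0):
    """Residual bootstrap of the selection estimator (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_full, _ = ols_rss(X, y)
    fitted = X @ beta_full
    resid = y - fitted                       # mean zero because of the intercept
    boots = np.empty((B, p))
    for b in range(B):
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        boots[b] = forward_select_cp(X, y_star)
    return boots

# Simulated data: only the first non-constant predictor is active.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)

boots = residual_bootstrap(X, y, B=200, seed=2)
lower, upper = np.percentile(boots, [2.5, 97.5], axis=0)
for k in range(X.shape[1]):
    print(f"beta_{k + 1}: [{lower[k]:.3f}, {upper[k]:.3f}]")
```

Because a predictor that is dropped in a bootstrap sample contributes an exact zero, the bootstrap distribution of a coefficient can be a mixture with a point mass at zero. That is one reason inference after variable selection needs the kind of asymptotic theory the abstract refers to.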

Suggested Citation

  • Lasanthi C. R. Pelawa Watagoda & David J. Olive, 2021. "Bootstrapping multiple linear regression after variable selection," Statistical Papers, Springer, vol. 62(2), pages 681-700, April.
  • Handle: RePEc:spr:stpapr:v:62:y:2021:i:2:d:10.1007_s00362-019-01108-9
    DOI: 10.1007/s00362-019-01108-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-019-01108-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Zhihua Su & R. Dennis Cook, 2012. "Inner envelopes: efficient estimation in multivariate linear regression," Biometrika, Biometrika Trust, vol. 99(3), pages 687-702.
    2. Schomaker, Michael & Heumann, Christian, 2014. "Model selection and model averaging after multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 758-770.
    3. David J. Olive, 2018. "Applications of hyperellipsoidal prediction regions," Statistical Papers, Springer, vol. 59(3), pages 913-931, September.
    4. Leeb, Hannes & Pötscher, Benedikt M., 2008. "Can One Estimate The Unconditional Distribution Of Post-Model-Selection Estimators?," Econometric Theory, Cambridge University Press, vol. 24(2), pages 338-376, April.
    5. Claeskens, Gerda & Hjort, Nils Lid, 2008. "Model Selection and Model Averaging," Cambridge Books, Cambridge University Press, number 9780521852258.
    6. Bradley Efron, 2014. "Estimation and Accuracy After Model Selection," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 991-1007, September.
    7. Luis Firinguetti & Gladys Bobadilla, 2011. "Asymptotic confidence intervals in ridge regression based on the Edgeworth expansion," Statistical Papers, Springer, vol. 52(2), pages 287-307, May.
    8. José A. F. Machado & Paulo Parente, 2005. "Bootstrap estimation of covariance matrices via the percentile method," Econometrics Journal, Royal Economic Society, vol. 8(1), pages 70-78, March.
    9. Michael Schomaker, 2012. "Shrinkage averaging estimation," Statistical Papers, Springer, vol. 53(4), pages 1015-1034, November.
    10. Ryan J. Tibshirani & Jonathan Taylor & Richard Lockhart & Robert Tibshirani, 2016. "Exact Post-Selection Inference for Sequential Regression Procedures," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 600-620, April.

    Citations



    Cited by:

    1. Tsung-Yeh Chou & Jaclyn B. Caccese & Yu-Lun Huang & Joseph J. Glutting & Thomas A. Buckley & Steven P. Broglio & Thomas W. McAllister & Michael A. McCrea & Paul F. Pasquina & Thomas W. Kaminski, 2022. "Effects of Pre-Collegiate Sport Specialization on Cognitive, Postural, and Psychological Functions: Findings from the NCAA-DoD CARE Consortium," IJERPH, MDPI, vol. 19(4), pages 1-12, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David J. Olive, 2018. "Applications of hyperellipsoidal prediction regions," Statistical Papers, Springer, vol. 59(3), pages 913-931, September.
    2. Hai Wang & Xinjie Chen & Nancy Flournoy, 2016. "The focused information criterion for varying-coefficient partially linear measurement error models," Statistical Papers, Springer, vol. 57(1), pages 99-113, March.
    3. Lasanthi C. R. Pelawa Watagoda & David J. Olive, 2021. "Comparing six shrinkage estimators with large sample theory and asymptotically optimal prediction intervals," Statistical Papers, Springer, vol. 62(5), pages 2407-2431, October.
    4. John Copas & Shinto Eguchi, 2020. "Strong model dependence in statistical analysis: goodness of fit is not enough for model choice," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(2), pages 329-352, April.
    5. Michael Schomaker & Christian Heumann, 2020. "When and when not to use optimal model averaging," Statistical Papers, Springer, vol. 61(5), pages 2221-2240, October.
    6. Su, Jiun-Hua, 2021. "Model selection in utility-maximizing binary prediction," Journal of Econometrics, Elsevier, vol. 223(1), pages 96-124.
    7. Xinyu Zhang & Alan T. K. Wan & Sherry Z. Zhou, 2011. "Focused Information Criteria, Model Selection, and Model Averaging in a Tobit Model With a Nonzero Threshold," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 30(1), pages 132-142, June.
    8. Liu, Chu-An, 2015. "Distribution theory of the least squares averaging estimator," Journal of Econometrics, Elsevier, vol. 186(1), pages 142-159.
    9. Céline Cunen & Nils Lid Hjort, 2020. "Confidence Distributions for FIC Scores," Econometrics, MDPI, vol. 8(3), pages 1-28, July.
    10. Maur, Jean-Christophe & Nedeljkovic, Milan & Von Uexkull, Jan Erik, 2022. "FDI and Trade Outcomes at the Industry Level—A Data-Driven Approach," Policy Research Working Paper Series 9901, The World Bank.
    11. Ghosh, D. & Yuan, Z., 2009. "An improved model averaging scheme for logistic regression," Journal of Multivariate Analysis, Elsevier, vol. 100(8), pages 1670-1681, September.
    12. Liu, Chu-An, 2012. "A plug-in averaging estimator for regressions with heteroskedastic errors," MPRA Paper 41414, University Library of Munich, Germany.
    13. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    14. Magnus, Jan R. & Wan, Alan T.K. & Zhang, Xinyu, 2011. "Weighted average least squares estimation with nonspherical disturbances and an application to the Hong Kong housing market," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1331-1341, March.
    15. Aman Ullah & Huansha Wang, 2013. "Parametric and Nonparametric Frequentist Model Selection and Model Averaging," Econometrics, MDPI, vol. 1(2), pages 1-23, September.
    16. Ali Charkhi & Gerda Claeskens, 2018. "Asymptotic post-selection inference for the Akaike information criterion," Biometrika, Biometrika Trust, vol. 105(3), pages 645-664.
    17. Schomaker, Michael & Heumann, Christian, 2014. "Model selection and model averaging after multiple imputation," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 758-770.
    18. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    19. Paulo M. D. C. Parente & Richard J. Smith, 2021. "Quasi‐maximum likelihood and the kernel block bootstrap for nonlinear dynamic models," Journal of Time Series Analysis, Wiley Blackwell, vol. 42(4), pages 377-405, July.
    20. Kitagawa, Toru & Muris, Chris, 2016. "Model averaging in semiparametric estimation of treatment effects," Journal of Econometrics, Elsevier, vol. 193(1), pages 271-289.
