IDEAS home Printed from https://ideas.repec.org/a/bla/jorssb/v82y2020i3p773-795.html
   My bibliography  Save this article

Goodness‐of‐fit testing in high dimensional generalized linear models

Author

Listed:
  • Jana Janková
  • Rajen D. Shah
  • Peter Bühlmann
  • Richard J. Samworth

Abstract

We propose a family of tests to assess the goodness of fit of a high dimensional generalized linear model. Our framework is flexible and may be used to construct an omnibus test or directed against testing specific non‐linearities and interaction effects, or for testing the significance of groups of variables. The methodology is based on extracting left‐over signal in the residuals from an initial fit of a generalized linear model. This can be achieved by predicting this signal from the residuals by using modern powerful regression or machine learning methods such as random forests or boosted trees. Under the null hypothesis that the generalized linear model is correct, no signal is left in the residuals and our test statistic has a Gaussian limiting distribution, translating to asymptotic control of type I error. Under a local alternative, we establish a guarantee on the power of the test. We illustrate the effectiveness of the methodology on simulated and real data examples by testing goodness of fit in logistic regression models. Software implementing the methodology is available in the R package GRPtests.

Suggested Citation

  • Jana Janková & Rajen D. Shah & Peter Bühlmann & Richard J. Samworth, 2020. "Goodness‐of‐fit testing in high dimensional generalized linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(3), pages 773-795, July.
  • Handle: RePEc:bla:jorssb:v:82:y:2020:i:3:p:773-795
    DOI: 10.1111/rssb.12371
    as

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssb.12371
    Download Restriction: no

    File URL: https://libkey.io/10.1111/rssb.12371?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    2. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    3. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controlsâ€," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 81(2), pages 608-650.
    4. Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2012. "Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors," Papers 1212.6906, arXiv.org, revised Jan 2018.
    5. A. Belloni & V. Chernozhukov & L. Wang, 2011. "Square-root lasso: pivotal recovery of sparse signals via conic programming," Biometrika, Biometrika Trust, vol. 98(4), pages 791-806.
    6. Rajen D. Shah & Richard J. Samworth, 2013. "Variable selection with error control: another look at stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(1), pages 55-80, January.
    7. Tingni Sun & Cun-Hui Zhang, 2012. "Scaled sparse linear regression," Biometrika, Biometrika Trust, vol. 99(4), pages 879-898.
    8. Rajen D. Shah & Peter Bühlmann, 2018. "Goodness‐of‐fit tests for high dimensional linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(1), pages 113-135, January.
    9. Cun-Hui Zhang & Stephanie S. Zhang, 2014. "Confidence intervals for low dimensional parameters in high dimensional linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 217-242, January.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Choi, Woohyun & Kim, Ilmun, 2023. "Averaging p-values under exchangeability," Statistics & Probability Letters, Elsevier, vol. 194(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    2. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    3. Philipp Bach & Victor Chernozhukov & Malte S. Kurz & Martin Spindler & Sven Klaassen, 2021. "DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R," Papers 2103.09603, arXiv.org, revised Feb 2024.
    4. Adamek, Robert & Smeekes, Stephan & Wilms, Ines, 2023. "Lasso inference for high-dimensional time series," Journal of Econometrics, Elsevier, vol. 235(2), pages 1114-1143.
    5. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LP, vol. 20(1), pages 176-235, March.
    6. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 1-40, March.
    7. Timothy B. Armstrong & Michal Kolesár & Soonwoo Kwon, 2020. "Bias-Aware Inference in Regularized Regression Models," Working Papers 2020-2, Princeton University. Economics Department..
    8. Kaspar Wuthrich & Ying Zhu, 2019. "Omitted variable bias of Lasso-based inference methods: A finite sample analysis," Papers 1903.08704, arXiv.org, revised Sep 2021.
    9. Alexandre Belloni & Mingli Chen & Victor Chernozhukov, 2016. "Quantile Graphical Models: Prediction and Conditional Independence with Applications to Systemic Risk," Papers 1607.00286, arXiv.org, revised Oct 2019.
    10. Victor Chernozhukov & Whitney K. Newey & Rahul Singh, 2022. "Automatic Debiased Machine Learning of Causal and Structural Effects," Econometrica, Econometric Society, vol. 90(3), pages 967-1027, May.
    11. Tianxi Cai & T. Tony Cai & Zijian Guo, 2021. "Optimal statistical inference for individualized treatment effects in high‐dimensional models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(4), pages 669-719, September.
    12. Adel Javanmard & Jason D. Lee, 2020. "A flexible framework for hypothesis testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(3), pages 685-718, July.
    13. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    14. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    15. Hansen, Christian & Liao, Yuan, 2019. "The Factor-Lasso And K-Step Bootstrap Approach For Inference In High-Dimensional Economic Applications," Econometric Theory, Cambridge University Press, vol. 35(3), pages 465-509, June.
    16. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform post selection inference for LAD regression and other z-estimation problems," CeMMAP working papers CWP74/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    17. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey, 2016. "Double machine learning for treatment and causal parameters," CeMMAP working papers 49/16, Institute for Fiscal Studies.
    18. Yumou Qiu & Jing Tao & Xiao‐Hua Zhou, 2021. "Inference of heterogeneous treatment effects using observational data with high‐dimensional covariates," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 1016-1043, November.
    19. Agboola, Oluwagbenga David & Yu, Han, 2023. "Neighborhood-based cross fitting approach to treatment effects with high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 186(C).
    20. Victor Chernozhukov & Wolfgang Härdle & Chen Huang & Weining Wang, 2018. "LASSO-driven inference in time and space," CeMMAP working papers CWP36/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jorssb:v:82:y:2020:i:3:p:773-795. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: https://edirc.repec.org/data/rssssea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.