Ultra-high dimensional variable screening via Gram–Schmidt orthogonalization

Authors
  • Huiwen Wang

    (Beihang University
    Beijing Advanced Innovation Center for Big Data and Brain Computing)

  • Ruiping Liu

    (Beihang University)

  • Shanshan Wang

    (Beihang University
    Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations)

  • Zhichao Wang

    (Beihang University)

  • Gilbert Saporta

    (Conservatoire National des Arts et Métiers)

Abstract

Independence screening procedures play a vital role in variable selection when the number of variables is massive. However, high dimensionality may bring many challenges, such as multicollinearity or high (possibly spurious) correlation between the covariates, which makes marginal correlation an unreliable measure of association between the covariates and the response. We propose a novel and simple screening procedure called Gram–Schmidt screening (GSS), which integrates classical Gram–Schmidt orthogonalization with the sure independence screening technique and takes high correlations between the covariates into account in a data-driven way. GSS can successfully discriminate between relevant and irrelevant variables, achieving a high true positive rate without including many irrelevant and redundant variables, and offers a new perspective on screening when the covariates are highly correlated. The practical performance of GSS is demonstrated through comparative simulation studies and the analysis of two real datasets.
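The abstract describes GSS only at a high level. The Python/NumPy sketch below is an illustration under stated assumptions, not the authors' algorithm: it contrasts an SIS-style ranking by absolute marginal correlation with a hypothetical greedy variant that, after each pick, Gram–Schmidt-orthogonalizes all remaining covariates against the chosen one. The function names `sis_screen` and `gram_schmidt_screen`, the fixed screening size `d`, the tolerance `tol`, and the greedy selection rule are all assumptions made for illustration; the paper's actual ordering, stopping rule, and thresholds may differ.

```python
import numpy as np


def _standardize(X):
    """Center each column and scale it to unit Euclidean norm."""
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0  # constant columns stay all-zero instead of dividing by 0
    return Xc / norms


def sis_screen(X, y, d):
    """SIS-style baseline: keep the d columns with the largest |marginal correlation| with y."""
    Xs = _standardize(X)
    yc = y - y.mean()
    yc = yc / np.linalg.norm(yc)
    corr = np.abs(Xs.T @ yc)  # |sample correlation| of each column with y
    return [int(j) for j in np.argsort(-corr)[:d]]


def gram_schmidt_screen(X, y, d, tol=1e-10):
    """Hypothetical GSS-style sketch, not the authors' exact procedure.

    Repeatedly pick the column whose orthogonalized version is most correlated
    with y, then remove that direction from every column (one classical
    Gram-Schmidt step), so later scores only reflect information not yet selected.
    """
    R = _standardize(X)
    yc = y - y.mean()
    yc = yc / np.linalg.norm(yc)
    p = R.shape[1]
    active = np.ones(p, dtype=bool)
    selected = []
    for _ in range(min(d, p)):
        norms = np.linalg.norm(R, axis=0)
        usable = active & (norms > tol)  # skip picked or (near-)collinear columns
        if not usable.any():
            break
        scores = np.full(p, -np.inf)
        scores[usable] = np.abs(R[:, usable].T @ yc) / norms[usable]
        j = int(np.argmax(scores))
        selected.append(j)
        active[j] = False
        q = R[:, j] / norms[j]      # new orthonormal direction
        R = R - np.outer(q, q @ R)  # project that direction out of all columns
    return selected


# Synthetic example: x1 enters the model but is strongly correlated with x0,
# and its population marginal correlation with y is zero.
rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.5 * rng.standard_normal(n)
y = -1.25 * X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n)
print("SIS top 2:", sis_screen(X, y, 2))
print("Gram-Schmidt top 2:", gram_schmidt_screen(X, y, 2))
```

In this toy setting the marginal ranking can easily miss x1, because its correlation with y is close to zero, whereas the orthogonalized ranking recovers it once x0 has been projected out. Because each candidate is scored against y after the selected directions have been removed, this greedy rule ranks candidates the same way forward selection by residual sum of squares does, which is close to the forward regression approach of Wang (2009) listed below; it should be read as an illustration of why orthogonalization helps with correlated covariates, not as a reproduction of GSS.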

Suggested Citation

  • Huiwen Wang & Ruiping Liu & Shanshan Wang & Zhichao Wang & Gilbert Saporta, 2020. "Ultra-high dimensional variable screening via Gram–Schmidt orthogonalization," Computational Statistics, Springer, vol. 35(3), pages 1153-1170, September.
  • Handle: RePEc:spr:compst:v:35:y:2020:i:3:d:10.1007_s00180-020-00963-7
    DOI: 10.1007/s00180-020-00963-7

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-020-00963-7
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    2. Shanshan Wang & Liming Xiang, 2017. "Two-layer EM algorithm for ALD mixture regression models: A new solution to composite quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 136-154.
    3. Haeran Cho & Piotr Fryzlewicz, 2012. "High dimensional variable selection via tilting," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 74(3), pages 593-622, June.
    4. Jianqing Fan & Runze Li, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    5. Hansheng Wang & Bo Li & Chenlei Leng, 2009. "Shrinkage tuning parameter selection with a diverging number of parameters," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 671-683, June.
    6. William D. Mangold & Luann Bean & Douglas Adams, 2003. "The Impact of Intercollegiate Athletics on Graduation Rates among Major NCAA Division I Universities," The Journal of Higher Education, Taylor & Francis Journals, vol. 74(5), pages 540-562, September.
    7. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    8. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    9. Hansheng Wang, 2009. "Forward Regression for Ultra-High Dimensional Variable Screening," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1512-1524.
    10. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiangyu Wang & Chenlei Leng, 2016. "High dimensional ordinary least squares projection for screening variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(3), pages 589-611, June.
    2. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.
    3. Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
    4. Li, Xingxiang & Cheng, Guosheng & Wang, Liming & Lai, Peng & Song, Fengli, 2017. "Ultrahigh dimensional feature screening via projection," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 88-104.
    5. Chen Xu & Jiahua Chen, 2014. "The Sparse MLE for Ultrahigh-Dimensional Feature Screening," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1257-1269, September.
    6. Wei Sun & Lexin Li, 2012. "Multiple Loci Mapping via Model-free Variable Selection," Biometrics, The International Biometric Society, vol. 68(1), pages 12-22, March.
    7. Wang, Tao & Zhu, Lixing, 2011. "Consistent tuning parameter selection in high dimensional sparse linear regression," Journal of Multivariate Analysis, Elsevier, vol. 102(7), pages 1141-1151, August.
    8. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    9. Zhang, Ting & Wang, Lei, 2020. "Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    10. Zhao, Bangxin & Liu, Xin & He, Wenqing & Yi, Grace Y., 2021. "Dynamic tilted current correlation for high dimensional variable screening," Journal of Multivariate Analysis, Elsevier, vol. 182(C).
    11. Zhang, Shucong & Zhou, Yong, 2018. "Variable screening for ultrahigh dimensional heterogeneous data via conditional quantile correlations," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 1-13.
    12. Liming Wang & Xingxiang Li & Xiaoqing Wang & Peng Lai, 2022. "Unified mean-variance feature screening for ultrahigh-dimensional regression," Computational Statistics, Springer, vol. 37(4), pages 1887-1918, September.
    13. Randy C. S. Lai & Jan Hannig & Thomas C. M. Lee, 2015. "Generalized Fiducial Inference for Ultrahigh-Dimensional Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 760-772, June.
    14. Christis Katsouris, 2023. "High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods," Papers 2308.16192, arXiv.org.
    15. Sweata Sen & Damitri Kundu & Kiranmoy Das, 2023. "Variable selection for categorical response: a comparative study," Computational Statistics, Springer, vol. 38(2), pages 809-826, June.
    16. She, Yiyuan, 2012. "An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors," Computational Statistics & Data Analysis, Elsevier, vol. 56(10), pages 2976-2990.
    17. Wang, Jia & Cai, Xizhen & Li, Runze, 2021. "Variable selection for partially linear models via Bayesian subset modeling with diffusing prior," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    18. Howard D. Bondell & Brian J. Reich, 2012. "Consistent High-Dimensional Bayesian Variable Selection via Penalized Credible Regions," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(500), pages 1610-1624, December.
    19. Zhihua Sun & Yi Liu & Kani Chen & Gang Li, 2022. "Broken adaptive ridge regression for right-censored survival data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(1), pages 69-91, February.
    20. Jian Huang & Yuling Jiao & Lican Kang & Jin Liu & Yanyan Liu & Xiliang Lu, 2022. "GSDAR: a fast Newton algorithm for ℓ0 regularized generalized linear models with statistical guarantee," Computational Statistics, Springer, vol. 37(1), pages 507-533, March.
