
Efficient test-based variable selection for high-dimensional linear models

Author

Listed:
  • Gong, Siliang
  • Zhang, Kai
  • Liu, Yufeng

Abstract

Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression (LARS), among others. These methods typically add variables into the model one by one. For such selection procedures, it is crucial to find a stopping criterion that controls model complexity. One of the most commonly used techniques to this end is cross-validation (CV) which, in spite of its popularity, has two major drawbacks: expensive computational cost and lack of statistical interpretation. To overcome these drawbacks, we introduce a flexible and efficient test-based variable selection approach that can be incorporated into any sequential selection procedure. The test, which is on the overall signal in the remaining inactive variables, is based on the maximal absolute partial correlation between the inactive variables and the response given active variables. We develop the asymptotic null distribution of the proposed test statistic as the dimension tends to infinity uniformly in the sample size. We also show that the test is consistent. With this test, at each step of the selection, a new variable is included if and only if the p-value is below some pre-defined level. Numerical studies show that the proposed method delivers very competitive performance in terms of variable selection accuracy and computational complexity compared to CV.

Suggested Citation

  • Gong, Siliang & Zhang, Kai & Liu, Yufeng, 2018. "Efficient test-based variable selection for high-dimensional linear models," Journal of Multivariate Analysis, Elsevier, vol. 166(C), pages 17-31.
  • Handle: RePEc:eee:jmvana:v:166:y:2018:i:c:p:17-31
    DOI: 10.1016/j.jmva.2018.01.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X17302749
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Jelle J. Goeman & Sara A. Van De Geer & Hans C. Van Houwelingen, 2006. "Testing against a high dimensional alternative," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(3), pages 477-493, June.
    2. Zhong, Ping-Shou & Chen, Song Xi, 2011. "Tests for High-Dimensional Regression Coefficients With Factorial Designs," Journal of the American Statistical Association, American Statistical Association, vol. 106(493), pages 260-274.
    3. Ehud Aharoni & Saharon Rosset, 2014. "Generalized α-investing: definitions, optimality results and application to public databases," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(4), pages 771-794, September.
    4. Max Grazier G'Sell & Stefan Wager & Alexandra Chouldechova & Robert Tibshirani, 2016. "Sequential selection procedures and false discovery rate control," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(2), pages 423-444, March.
    5. Runze Li & Wei Zhong & Liping Zhu, 2012. "Feature Screening via Distance Correlation Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(499), pages 1129-1139, September.
    6. James, Barry & James, Kang & Qi, Yongcheng, 2007. "Limit distribution of the sum and maximum from multivariate Gaussian sequences," Journal of Multivariate Analysis, Elsevier, vol. 98(3), pages 517-532, March.
    7. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
