
Sure independence screening for ultrahigh dimensional feature space

Author

Listed:
  • Jianqing Fan
  • Jinchi Lv

Abstract

Summary. Variable selection plays an important role in high dimensional statistical modelling, which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1‐regularization and showed that it achieves the ideal risk up to a logarithmic factor log p. Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh, as the factor log p can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved in both speed and accuracy, and can then be accomplished by a well‐developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
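To make the correlation-learning step concrete, the following is a minimal Python sketch of sure independence screening followed by a lasso fit on the screened variables. It illustrates the idea described in the abstract on simulated data and is not the authors' implementation; the function name sis, the simulated design and the default screening size d = floor(n / log n) (one choice discussed in the paper) are all illustrative assumptions.

    # Minimal sketch of sure independence screening (SIS) + a penalized fit.
    # Illustrative only; not the authors' code (their method is available,
    # e.g., in the R package 'SIS'). Names and defaults here are assumptions.
    import numpy as np
    from sklearn.linear_model import LassoCV

    def sis(X, y, d=None):
        """Rank features by absolute marginal correlation with y and keep
        the top d (componentwise regression / correlation learning)."""
        n, p = X.shape
        if d is None:
            d = int(np.floor(n / np.log(n)))   # one choice discussed in the paper
        Xs = (X - X.mean(axis=0)) / X.std(axis=0)
        ys = (y - y.mean()) / y.std()
        omega = np.abs(Xs.T @ ys) / n          # marginal correlation magnitudes
        keep = np.argsort(omega)[::-1][:d]     # indices of the d largest
        return np.sort(keep)

    # Toy data: n = 100 observations, p = 2000 features, 5 truly active.
    rng = np.random.default_rng(0)
    n, p = 100, 2000
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 3.0
    y = X @ beta + rng.standard_normal(n)

    keep = sis(X, y)                           # screen down to ~n/log(n) features
    model = LassoCV(cv=5).fit(X[:, keep], y)   # then apply a penalized method
    selected = keep[np.flatnonzero(model.coef_)]
    print("screened:", len(keep), "finally selected:", selected[:10])

On toy data with a signal this strong, the screening step typically retains the truly active variables among the top-ranked features, after which any of the penalized methods named in the abstract (SCAD, the Dantzig selector, lasso or adaptive lasso) can be applied to the much smaller screened set.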

Suggested Citation

  • Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
  • Handle: RePEc:bla:jorssb:v:70:y:2008:i:5:p:849-911
    DOI: 10.1111/j.1467-9868.2008.00674.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1467-9868.2008.00674.x
    Download Restriction: no

    File URL: https://libkey.io/10.1111/j.1467-9868.2008.00674.x?utm_source=ideas
LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Peter Hall & J. S. Marron & Amnon Neeman, 2005. "Geometric representation of high dimension, low sample size data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(3), pages 427-444, June.
    3. Lukas Meier & Sara Van De Geer & Peter Bühlmann, 2008. "The group lasso for logistic regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 53-71, February.
    4. Lexin Li, 2007. "Sparse sufficient dimension reduction," Biometrika, Biometrika Trust, vol. 94(3), pages 603-613.
    5. Jianqing Fan & Jiancheng Jiang, 2007. "Rejoinder on: Nonparametric inference with generalized likelihood ratio tests," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 16(3), pages 471-478, December.
6. Jianqing Fan & Runze Li, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    7. Jianqing Fan & Jiancheng Jiang, 2007. "Nonparametric inference with generalized likelihood ratio tests," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 16(3), pages 409-444, December.
    8. Hansheng Wang & Guodong Li & Chih‐Ling Tsai, 2007. "Regression coefficient and autoregressive order shrinkage and selection via the lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(1), pages 63-78, February.
9. Hans, Chris & Dobra, Adrian & West, Mike, 2007. "Shotgun Stochastic Search for “Large p” Regression," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 507-516, June.
    10. Wang, Lifeng & Shen, Xiaotong, 2007. "On L1-Norm Multiclass Support Vector Machines: Methodology and Theory," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 583-594, June.
    11. Liang, Feng & Paulo, Rui & Molina, German & Clyde, Merlise A. & Berger, Jim O., 2008. "Mixtures of g Priors for Bayesian Variable Selection," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 410-423, March.
    12. Wang, Hansheng & Leng, Chenlei, 2007. "Unified LASSO Estimation by Least Squares Approximation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1039-1048, September.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    2. Lenka Zbonakova & Wolfgang Karl Härdle & Weining Wang, 2016. "Time Varying Quantile Lasso," SFB 649 Discussion Papers SFB649DP2016-047, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany.
    3. Hansheng Wang & Bo Li & Chenlei Leng, 2009. "Shrinkage tuning parameter selection with a diverging number of parameters," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 671-683, June.
    4. Alessandro Gregorio & Francesco Iafrate, 2021. "Regularized bridge-type estimation with multiple penalties," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(5), pages 921-951, October.
    5. Zbonakova, L. & Härdle, W.K. & Wang, W., 2016. "Time Varying Quantile Lasso," Working Papers 16/07, Department of Economics, City University London.
    6. Leng, Chenlei & Li, Bo, 2010. "Least squares approximation with a diverging number of parameters," Statistics & Probability Letters, Elsevier, vol. 80(3-4), pages 254-261, February.
    7. Li-Ping Zhu & Lin-Yi Qian & Jin-Guan Lin, 2011. "Variable selection in a class of single-index models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(6), pages 1277-1293, December.
    8. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    9. Zhang, Hong-Fan, 2021. "Minimum Average Variance Estimation with group Lasso for the multivariate response Central Mean Subspace," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    10. Stefano Maria IACUS & Alessandro DE GREGORIO, 2010. "Adaptive LASSO-type estimation for ergodic diffusion processes," Departmental Working Papers 2010-13, Department of Economics, Management and Quantitative Methods at Università degli Studi di Milano.
    11. Tian, Shaonan & Yu, Yan & Guo, Hui, 2015. "Variable selection and corporate bankruptcy forecasts," Journal of Banking & Finance, Elsevier, vol. 52(C), pages 89-100.
    12. Pötscher, Benedikt M., 2007. "Confidence Sets Based on Sparse Estimators Are Necessarily Large," MPRA Paper 5677, University Library of Munich, Germany.
    13. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    14. Jun Zhu & Hsin‐Cheng Huang & Perla E. Reyes, 2010. "On selection of spatial linear models for lattice data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(3), pages 389-402, June.
    15. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    16. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    17. Caner, Mehmet & Fan, Qingliang, 2015. "Hybrid generalized empirical likelihood estimators: Instrument selection with adaptive lasso," Journal of Econometrics, Elsevier, vol. 187(1), pages 256-274.
    18. Kai Yang & Xue Ding & Xiaohui Yuan, 2022. "Bayesian empirical likelihood inference and order shrinkage for autoregressive models," Statistical Papers, Springer, vol. 63(1), pages 97-121, February.
    19. Weihua Zhao & Riquan Zhang & Yazhao Lv & Jicai Liu, 2017. "Quantile regression and variable selection of single-index coefficient model," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(4), pages 761-789, August.
    20. Anders Bredahl Kock, 2012. "On the Oracle Property of the Adaptive Lasso in Stationary and Nonstationary Autoregressions," CREATES Research Papers 2012-05, Department of Economics and Business Economics, Aarhus University.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jorssb:v:70:y:2008:i:5:p:849-911. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Wiley Content Delivery (email available below). General contact details of provider: https://edirc.repec.org/data/rssssea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.