Sure independence screening for ultrahigh dimensional feature space
Abstract
Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log(p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log(p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated. Copyright (c) 2008 Royal Statistical Society.
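The screening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sis_screen`, the simulated data, and the cutoff d = n / log(n) are assumptions made for illustration only. The abstract requires only that the retained dimension be below the sample size. The sketch ranks features by the absolute value of their marginal correlation with the response and keeps the top d.

```python
import numpy as np

def sis_screen(X, y, d):
    """Keep the d features with the largest absolute marginal correlation with y.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center and scale each column so the inner products below are
    # proportional to sample correlations.
    Xc = X - X.mean(axis=0)
    scale = Xc.std(axis=0)
    scale[scale == 0] = 1.0  # guard against constant columns
    Xs = Xc / scale
    yc = y - y.mean()
    # Componentwise (marginal) association of each feature with the response.
    omega = Xs.T @ yc
    # Rank by absolute value, largest first, and keep the first d indices.
    order = np.argsort(-np.abs(omega), kind="stable")
    return np.sort(order[:d])

# Simulated example (assumed setup): p >> n, three truly active features.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 5.0
y = X @ beta + rng.standard_normal(n)

d = int(n / np.log(n))  # assumed cutoff, chosen so that d < n
selected = sis_screen(X, y, d)
```

After screening, the reduced design `X[:, selected]` has fewer columns than rows, so it can be passed to a penalized method such as the lasso for the final selection step.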
Bibliographic Info
Article provided by Royal Statistical Society in its journal Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Volume (Year): 70 (2008)
Issue: 5
Contact details of provider:
Postal: 12 Errol Street, London EC1Y 8LX, United Kingdom
Web page: http://www.blackwellpublishing.com/journal.asp?ref=1369-7412
Cited by:
- Wang, Cheng & Cao, Longbing & Miao, Baiqi, 2013. "Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 66(C), pages 140-149.
- Malene Kallestrup-Lamb & Anders Bredahl Kock & Johannes Tang Kristensen, 2013. "Lassoing the Determinants of Retirement," CREATES Research Papers 2013-21, School of Economics and Management, University of Aarhus.
- Peixin Zhao & Liugen Xue, 2011. "Variable selection for varying coefficient models with measurement errors," Metrika, Springer, vol. 74(2), pages 231-245, September.
- Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
- Anders Bredahl Kock & Laurent A.F. Callot, 2012. "Oracle Inequalities for High Dimensional Vector Autoregressions," CREATES Research Papers 2012-16, School of Economics and Management, University of Aarhus.
- Peixin Zhao & Liugen Xue, 2012. "Variable selection in semiparametric regression analysis for longitudinal data," Annals of the Institute of Statistical Mathematics, Springer, vol. 64(1), pages 213-231, February.
- Dazard, Jean-Eudes & Sunil Rao, J., 2012. "Joint adaptive mean–variance regularization and variance stabilization of high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2317-2333.
- Wu, Billy & Yao, Qiwei & Zhu, Shiwu, 2013. "Estimation in the presence of many nuisance parameters: Composite likelihood and plug-in likelihood," Stochastic Processes and their Applications, Elsevier, vol. 123(7), pages 2877-2898.
- Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
- Ivan Savin, 2010. "A comparative study of the Lasso-type and heuristic model selection methods."
- Ivan Savin, 2013. "A Comparative Study of the Lasso-type and Heuristic Model Selection Methods," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), Justus-Liebig University Giessen, Department of Statistics and Economics, vol. 233(4), pages 526-549, July.
- Song Song, 2011. "Dynamic Large Spatial Covariance Matrix Estimation in Application to Semiparametric Model Construction via Variable Clustering: the SCE approach," Papers 1106.3921, arXiv.org, revised Jun 2011.
- Park, Junyong, 2009. "Independent rule in classification of multivariate binary data," Journal of Multivariate Analysis, Elsevier, vol. 100(10), pages 2270-2286, November.
- Anders Bredahl Kock, 2012. "On the Oracle Property of the Adaptive Lasso in Stationary and Nonstationary Autoregressions," CREATES Research Papers 2012-05, School of Economics and Management, University of Aarhus.
- Fan, Jianqing & Liao, Yuan, 2012. "Endogeneity in ultrahigh dimension," MPRA Paper 38698, University Library of Munich, Germany.
- Li, Gaorong & Lin, Lu & Zhu, Lixing, 2012. "Empirical likelihood for a varying coefficient partially linear model with diverging number of parameters," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 85-111.
- Mehmet Caner & Anders Bredahl Kock, 2013. "Oracle Inequalities for Convex Loss Functions with Non-Linear Targets," CREATES Research Papers 2013-51, School of Economics and Management, University of Aarhus.
- Alexandre Belloni & Victor Chernozhukov & Lie Wang, 2013. "Pivotal estimation via square-root lasso in nonparametric regression," CeMMAP working papers CWP62/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Anders Bredahl Kock, 2013. "Oracle inequalities for high-dimensional panel data models," CREATES Research Papers 2013-20, School of Economics and Management, University of Aarhus.
- Zhao, Sihai Dave & Li, Yi, 2012. "Principled sure independence screening for Cox models with ultra-high-dimensional covariates," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 397-411.