IDEAS home Printed from https://ideas.repec.org/p/azt/cemmap/35-15.html
   My bibliography  Save this paper

Variable selection and estimation in high-dimensional models

Author

Listed:
  • Joel L. Horowitz

Abstract

Models with high-dimensional covariates arise frequently in economics and other fields. Often, only a few covariates have important effects on the dependent variable. When this happens, the model is said to be sparse. In applications, however, it is not known which covariates are important and which are not. This paper reviews methods for discriminating between important and unimportant covariates with particular attention given to methods that discriminate correctly with probability approaching 1 as the sample size increases. Methods are available for a wide variety of linear, nonlinear, semiparametric, and nonparametric models. The performance of some of these methods in finite samples is illustrated through Monte Carlo simulations and an empirical example.

Suggested Citation

  • Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," CeMMAP working papers 35/15, Institute for Fiscal Studies.
  • Handle: RePEc:azt:cemmap:35/15
    DOI: 10.1920/wp.cem.2015.3515
    as

    Download full text from publisher

    File URL: https://www.cemmap.ac.uk/wp-content/uploads/2020/08/CWP3515.pdf
    Download Restriction: no

    File URL: https://libkey.io/10.1920/wp.cem.2015.3515?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Hansheng Wang & Bo Li & Chenlei Leng, 2009. "Shrinkage tuning parameter selection with a diverging number of parameters," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 671-683, June.
    2. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    3. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    4. Robinson, Peter M, 1988. "Root- N-Consistent Semiparametric Regression," Econometrica, Econometric Society, vol. 56(4), pages 931-954, July.
    5. Kim, Yongdai & Choi, Hosik & Oh, Hee-Seok, 2008. "Smoothly Clipped Absolute Deviation on High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 103(484), pages 1665-1673.
    6. Wang, Tao & Xu, Pei-Rong & Zhu, Li-Xing, 2012. "Non-convex penalized estimation in high-dimensional models with single-index structure," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 221-235.
    7. Powell, James L & Stock, James H & Stoker, Thomas M, 1989. "Semiparametric Estimation of Index Coefficients," Econometrica, Econometric Society, vol. 57(6), pages 1403-1430, November.
    8. Efang Kong & Yingcun Xia, 2007. "Variable selection for the single‐index model," Biometrika, Biometrika Trust, vol. 94(1), pages 217-229.
    9. Jing Wang & Lijian Yang, 2009. "Efficient and fast spline-backfitted kernel smoothing of additive models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 61(3), pages 663-690, September.
    10. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    11. Hansheng Wang & Runze Li & Chih-Ling Tsai, 2007. "Tuning parameter selectors for the smoothly clipped absolute deviation method," Biometrika, Biometrika Trust, vol. 94(3), pages 553-568.
    12. Antoniadis A. & Fan J., 2001. "Regularization of Wavelet Approximations," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 939-967, September.
    13. Sala-i-Martin, Xavier, 1997. "I Just Ran Two Million Regressions," American Economic Review, American Economic Association, vol. 87(2), pages 178-183, May.
    14. Heng Lian, 2012. "Variable selection in high-dimensional partly linear additive models," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(4), pages 825-839, December.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," Canadian Journal of Economics, Canadian Economics Association, vol. 48(2), pages 389-407, May.
    2. Joel L. Horowitz, 2015. "Variable selection and estimation in high-dimensional models," CeMMAP working papers CWP35/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    3. Joel L. Horowitz, 2015. "Variable selection and estimation in high‐dimensional models," Canadian Journal of Economics/Revue canadienne d'économique, John Wiley & Sons, vol. 48(2), pages 389-407, May.
    4. Gaorong Li & Liugen Xue & Heng Lian, 2012. "SCAD-penalised generalised additive models with non-polynomial dimensionality," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(3), pages 681-697.
    5. Tan, Xin Lu, 2019. "Optimal estimation of slope vector in high-dimensional linear transformation models," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 179-204.
    6. Guang Cheng & Hao Zhang & Zuofeng Shang, 2015. "Sparse and efficient estimation for partial spline models with increasing dimension," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(1), pages 93-127, February.
    7. Lian, Heng & Li, Jianbo & Tang, Xingyu, 2014. "SCAD-penalized regression in additive partially linear proportional hazards models with an ultra-high-dimensional linear part," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 50-64.
    8. Yanxin Wang & Qibin Fan & Li Zhu, 2018. "Variable selection and estimation using a continuous approximation to the $$L_0$$ L 0 penalty," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(1), pages 191-214, February.
    9. Jianbo Li & Yuan Li & Riquan Zhang, 2017. "B spline variable selection for the single index models," Statistical Papers, Springer, vol. 58(3), pages 691-706, September.
    10. Guo-Liang Tian & Mingqiu Wang & Lixin Song, 2014. "Variable selection in the high-dimensional continuous generalized linear model with current status data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(3), pages 467-483, March.
    11. Sunghoon Kwon & Jeongyoun Ahn & Woncheol Jang & Sangin Lee & Yongdai Kim, 2017. "A doubly sparse approach for group variable selection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(5), pages 997-1025, October.
    12. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    13. Garcia-Magariños Manuel & Antoniadis Anestis & Cao Ricardo & González-Manteiga Wenceslao, 2010. "Lasso Logistic Regression, GSoft and the Cyclic Coordinate Descent Algorithm: Application to Gene Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-30, August.
    14. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    15. Li, Xinyi & Wang, Li & Nettleton, Dan, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    16. Gerda Claeskens, 2012. "Focused estimation and model averaging with penalization methods: an overview," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 66(3), pages 272-287, August.
    17. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    18. Lan Wang & Yichao Wu & Runze Li, 2012. "Quantile Regression for Analyzing Heterogeneity in Ultra-High Dimension," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 214-222, March.
    19. Yongjin Li & Qingzhao Zhang & Qihua Wang, 2017. "Penalized estimation equation for an extended single-index model," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(1), pages 169-187, February.
    20. Canhong Wen & Xueqin Wang & Shaoli Wang, 2015. "Laplace Error Penalty-based Variable Selection in High Dimension," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 42(3), pages 685-700, September.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:azt:cemmap:35/15. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Dermot Watson (email available below). General contact details of provider: https://edirc.repec.org/data/ifsssuk.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.