Sparse partial least squares regression for simultaneous dimension reduction and variable selection

Authors

Listed:
  • Hyonho Chun
  • Sündüz Keleş

Abstract

Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large "p" and small "n" paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. Copyright Journal compilation (c) 2010 Royal Statistical Society.
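As a rough illustration of what "sparse linear combinations of the original predictors" can look like, the sketch below computes a single sparse direction vector for a univariate response by soft-thresholding the ordinary first partial least squares direction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sparse_direction`, the tuning parameter `eta` and the specific thresholding rule are illustrative choices, and the columns of `X` and the response `y` are assumed to be centered already.

```python
import numpy as np


def sparse_direction(X, y, eta):
    """Illustrative first sparse direction vector for a univariate response.

    Assumptions (not taken from the abstract): X and y are centered, and
    sparsity is imposed by soft-thresholding with a hypothetical tuning
    parameter eta in [0, 1].
    """
    # Ordinary first PLS direction: proportional to the covariance X^T y.
    z = X.T @ y
    norm = np.linalg.norm(z)
    if norm == 0.0:
        return np.zeros(X.shape[1])
    z = z / norm

    # Soft-threshold: shrink every entry by a fraction eta of the largest
    # absolute entry and set entries that fall below the threshold to zero.
    threshold = eta * np.max(np.abs(z))
    w = np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)

    # Rescale to unit length when any entry survives the thresholding.
    w_norm = np.linalg.norm(w)
    return w / w_norm if w_norm > 0 else w


if __name__ == "__main__":
    # Simulated "large p, small n" data: 20 observations, 200 predictors,
    # of which only the first five influence the response.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 200))
    X = X - X.mean(axis=0)
    y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(20)
    y = y - y.mean()

    w = sparse_direction(X, y, eta=0.7)
    print("non-zero loadings:", np.count_nonzero(w), "of", w.size)
```

Setting `eta=0` returns the usual dense direction, while larger values keep only the predictors most strongly associated with the response, so the resulting latent component `X @ w` involves a subset of the original variables.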

Suggested Citation

  • Hyonho Chun & Sündüz Keleş, 2010. "Sparse partial least squares regression for simultaneous dimension reduction and variable selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(1), pages 3-25.
  • Handle: RePEc:bla:jorssb:v:72:y:2010:i:1:p:3-25

    Download full text from publisher

    File URL: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1467-9868.2009.00723.x
    File Function: link to full text
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Prasad Naik & Chih-Ling Tsai, 2000. "Partial least squares estimator for single-index models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(4), pages 763-771.
    2. Inge S. Helland, 2000. "Model Reduction for Prediction in Regression Models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics; Finnish Statistical Society; Norwegian Statistical Association; Swedish Statistical Association, vol. 27(1), pages 1-20.
    3. Neil A. Butler & Michael C. Denham, 2000. "The peculiar shrinkage properties of partial least squares regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(3), pages 585-593.
    4. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768.
    5. Bair, Eric & Hastie, Trevor & Paul, Debashis & Tibshirani, Robert, 2006. "Prediction by Supervised Principal Components," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 119-137, March.
    6. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320.

    Citations

    Cited by:

    1. Christian Gayer & Alessandro Girardi & Andreas Reuter, 2016. "Replacing Judgment by Statistics: Constructing Consumer Confidence Indicators on the basis of Data-driven Techniques. The Case of the Euro Area," Working Papers LuissLab 16125, Dipartimento di Economia e Finanza, LUISS Guido Carli.
    2. repec:eee:csdana:v:112:y:2017:i:c:p:242-256 is not listed on IDEAS
    3. Lee Woojoo & Lee Donghwan & Lee Youngjo & Pawitan Yudi, 2011. "Sparse Canonical Covariance Analysis for High-throughput Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-24, July.
    4. Shin, Seung Jun & Artemiou, Andreas, 2017. "Penalized principal logistic regression for sparse sufficient dimension reduction," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 48-58.
    5. Groen, Jan J.J. & Kapetanios, George, 2016. "Revisiting useful approaches to data-rich macroeconomic forecasting," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 221-239.
    6. Cubadda, Gianluca & Guardabascio, Barbara, 2012. "A medium-N approach to macroeconomic forecasting," Economic Modelling, Elsevier, vol. 29(4), pages 1099-1105.
    7. Julieta Fuentes & Pilar Poncela & Julio Rodríguez, 2015. "Sparse Partial Least Squares in Time Series for Macroeconomic Forecasting," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 30(4), pages 576-595, June.
    8. Tommaso Proietti, 2016. "On the Selection of Common Factors for Macroeconomic Forecasting," Advances in Econometrics, in: Dynamic Factor Models, volume 35, pages 593-628, Emerald Publishing Ltd.
    9. repec:eee:jmvana:v:157:y:2017:i:c:p:14-28 is not listed on IDEAS
    10. Hayashi Takeshi, 2012. "Variational Bayes Procedure for Effective Classification of Tumor Type with Microarray Gene Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-21, October.
    11. Luo, Ruiyan & Qi, Xin, 2015. "Sparse wavelet regression with multiple predictive curves," Journal of Multivariate Analysis, Elsevier, vol. 134(C), pages 33-49.
    12. R. D. Cook & I. S. Helland & Z. Su, 2013. "Envelopes and partial least squares regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(5), pages 851-877, November.
    13. Jasmit Shah & Somnath Datta & Susmita Datta, 2014. "A multi-loss super regression learner (MSRL) with application to survival prediction using proteomics," Computational Statistics, Springer, vol. 29(6), pages 1749-1767, December.
    14. Kawano, Shuichi & Fujisawa, Hironori & Takada, Toyoyuki & Shiroishi, Toshihiko, 2015. "Sparse principal component regression with adaptive loading," Computational Statistics & Data Analysis, Elsevier, vol. 89(C), pages 192-203.
    15. Luo, Ruiyan & Qi, Xin, 2017. "Signal extraction approach for sparse multivariate response regression," Journal of Multivariate Analysis, Elsevier, vol. 153(C), pages 83-97.
    16. Zhang Yuping & Tibshirani Robert J. & Davis Ronald W., 2010. "Predicting Patient Survival from Longitudinal Gene Expression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-23, November.
    17. Mingkun Chen & Evelyne Vigneau, 2016. "Supervised clustering of variables," Advances in Data Analysis and Classification, Springer; German Classification Society - Gesellschaft für Klassifikation (GfKl); Japanese Classification Society (JCS); Classification and Data Analysis Group of the Italian Statistical Society (CLADAG); International Federation of Classification Societies (IFCS), vol. 10(1), pages 85-101, March.
    18. repec:eee:stapro:v:127:y:2017:i:c:p:173-177 is not listed on IDEAS
    19. Fuentes, Julieta & Poncela, Pilar & Rodríguez, Julio, 2014. "Selecting and combining experts from survey forecasts," DES - Working Papers. Statistics and Econometrics. WS ws140905, Universidad Carlos III de Madrid. Departamento de Estadística.
    20. Kapetanios, G & Price, SG & Young, G, 2017. "A UK financial conditions index using targeted data reduction: forecasting and structural identification," Essex Finance Centre Working Papers 20328, University of Essex, Essex Business School.
    21. repec:spr:stpapr:v:59:y:2018:i:2:d:10.1007_s00362-016-0781-8 is not listed on IDEAS
