IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0242730.html

A comparison of penalised regression methods for informing the selection of predictive markers

Author

Listed:
  • Christopher J Greenwood
  • George J Youssef
  • Primrose Letcher
  • Jacqui A Macdonald
  • Lauryn J Hagg
  • Ann Sanson
  • Jenn Mcintosh
  • Delyse M Hutchinson
  • John W Toumbourou
  • Matthew Fuller-Tyszkiewicz
  • Craig A Olsson

Abstract

Background: Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators from an often substantially larger pool of available indicators. In comparison to traditional methods, penalised regression models improve prediction in new data by shrinking the size of coefficients and retaining only those indicators whose coefficients remain non-zero. However, predictive performance and the indicators selected depend on the specific algorithm implemented. The purpose of this study was to examine the predictive performance and feature (i.e., indicator) selection capability of common penalised logistic regression methods (LASSO, adaptive LASSO, and elastic-net), compared with traditional logistic regression and forward selection methods.

Design: Data were drawn from the Australian Temperament Project, a multigenerational longitudinal study established in 1983. The analytic sample consisted of 1,292 participants (707 women). A total of 102 adolescent psychosocial and contextual indicators were available to predict young adult daily smoking.

Findings: Penalised logistic regression methods showed small improvements in predictive performance over logistic regression and forward selection. However, no single penalised logistic regression model outperformed the others. Elastic-net models selected more indicators than either LASSO or adaptive LASSO. Additionally, more regularised models included fewer indicators, yet had comparable predictive performance. Forward selection methods dismissed many indicators identified as important in the penalised logistic regression models.

Conclusions: Although overall predictive accuracy was only marginally better with penalised logistic regression methods, their benefits were clearest in their capacity to select a manageable subset of indicators. Preference among competing penalised logistic regression methods may therefore be guided by feature selection capability, and thus interpretative considerations, rather than predictive performance alone.
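
The shrink-and-select behaviour described in the abstract can be illustrated with a minimal sketch. This is not the study's code or data: it fits an L1 (LASSO) or elastic-net penalised logistic regression by proximal gradient descent on synthetic data, and every name in it (`fit_penalised_logistic`, `selected`, the penalty values, the eight simulated indicators) is an assumption made for illustration. The penalty is taken to be lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2), so alpha = 1 corresponds to the LASSO and 0 < alpha < 1 to the elastic-net.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_penalised_logistic(X, y, lam, alpha=1.0, step=0.5, n_iter=300):
    """Proximal gradient descent for penalised logistic regression.

    alpha=1.0 gives the LASSO, 0 < alpha < 1 the elastic-net.
    The intercept is left unpenalised.
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for row, target in zip(X, y):
            linear = intercept + sum(c * x for c, x in zip(coefs, row))
            err = sigmoid(linear) - target
            grad0 += err / n
            for j in range(p):
                grad[j] += err * row[j] / n
        intercept -= step * grad0
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term),
            # then soft-thresholding to apply the L1 part of the penalty.
            ridge_grad = lam * (1.0 - alpha) * coefs[j]
            moved = coefs[j] - step * (grad[j] + ridge_grad)
            coefs[j] = soft_threshold(moved, step * lam * alpha)
    return intercept, coefs


def selected(coefs):
    """Indices of indicators retained, i.e. those with non-zero coefficients."""
    return [j for j, c in enumerate(coefs) if c != 0.0]


# Synthetic data (illustrative assumption): 8 indicators, only the first two predictive.
random.seed(0)
n_obs, n_features = 200, 8
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_obs)]
y = [1 if random.random() < sigmoid(2.0 * row[0] - 2.0 * row[1]) else 0 for row in X]

_, lasso_light = fit_penalised_logistic(X, y, lam=0.01)
_, lasso_heavy = fit_penalised_logistic(X, y, lam=0.1)
_, lasso_max = fit_penalised_logistic(X, y, lam=10.0)
_, enet = fit_penalised_logistic(X, y, lam=0.1, alpha=0.5)

for label, coefs in [
    ("LASSO, lam=0.01", lasso_light),
    ("LASSO, lam=0.1", lasso_heavy),
    ("LASSO, lam=10", lasso_max),
    ("elastic-net, lam=0.1, alpha=0.5", enet),
]:
    print(label, "-> selected indicators:", selected(coefs))
```

Soft-thresholding is what sets coefficients exactly to zero, which is why a sufficiently heavy penalty retains no indicators at all. In practice the penalty strength would be tuned by cross-validation rather than fixed by hand as it is here.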

Suggested Citation

  • Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz & Craig A Olsson, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
  • Handle: RePEc:plo:pone00:0242730
    DOI: 10.1371/journal.pone.0242730

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242730
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0242730&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    3. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    4. Christopher Kath & Florian Ziel, 2018. "The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts," Papers 1811.08604, arXiv.org.
    5. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    6. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    7. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Risks, MDPI, vol. 6(2), pages 1-20, April.
    8. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    9. Tomáš Plíhal, 2021. "Scheduled macroeconomic news announcements and Forex volatility forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1379-1397, December.
    10. Kawano, Shuichi & Fujisawa, Hironori & Takada, Toyoyuki & Shiroishi, Toshihiko, 2015. "Sparse principal component regression with adaptive loading," Computational Statistics & Data Analysis, Elsevier, vol. 89(C), pages 192-203.
    11. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    12. Zeyu Bian & Erica E. M. Moodie & Susan M. Shortreed & Sahir Bhatnagar, 2023. "Variable selection in regression‐based estimation of dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 79(2), pages 988-999, June.
    13. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    14. Kath, Christopher & Ziel, Florian, 2018. "The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts," Energy Economics, Elsevier, vol. 76(C), pages 411-423.
    15. Li Shaoyu & Lu Qing & Fu Wenjiang & Romero Roberto & Cui Yuehua, 2009. "A Regularized Regression Approach for Dissecting Genetic Conflicts that Increase Disease Risk in Pregnancy," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-28, October.
    16. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    17. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    18. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LP, vol. 20(1), pages 176-235, March.
    19. Holger Breinlich & Valentina Corradi & Nadia Rocha & Michele Ruta & Joao M.C. Santos Silva & Tom Zylkin, 2021. "Machine Learning in International Trade Research ?- Evaluating the Impact of Trade Agreements," School of Economics Discussion Papers 0521, School of Economics, University of Surrey.
    20. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
