Printed from https://ideas.repec.org/a/spr/compst/v33y2018i3d10.1007_s00180-017-0773-8.html

On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models

Author

Listed:
  • Heidi Seibold

    (LMU Munich; University of Zurich)

  • Christoph Bernau

    (Leibniz Supercomputing Centre)

  • Anne-Laure Boulesteix

    (LMU Munich)

  • Riccardo De Bin

    (LMU Munich; University of Oslo)

Abstract

In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection procedure and ability to shrink the estimates of the regression coefficients toward 0 make these techniques appropriate to fit prediction models in the case of high-dimensional data, e.g. gene expressions. Their prediction performance, however, highly depends on specific tuning parameters, in particular on the number of boosting iterations to perform. This crucial parameter is usually selected via cross-validation. The cross-validation procedure may highly depend on a completely random component, namely the considered fold partition. We empirically study how much this randomness affects the results of the boosting techniques, in terms of selected predictors and prediction ability of the related models. We use four publicly available data sets related to four different diseases. In these studies, the goal is to predict survival end-points when a large number of continuous candidate predictors are available. We focus on two well known boosting approaches implemented in the R-packages CoxBoost and mboost, assuming the validity of the proportional hazards assumption and the linearity of the effects of the predictors. We show that the variability in selected predictors and prediction ability of the model is reduced by averaging over several repetitions of cross-validation in the selection of the tuning parameters.
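
The dependence on the fold partition described above can be illustrated with a small, self-contained sketch. It is not the authors' code and does not use CoxBoost or mboost: as a stated simplification, it swaps the Cox model for componentwise least-squares boosting on synthetic data, using only the Python standard library. All function names (make_data, boost_path, cv_error_curve, single_cv_choices, averaged_cv_choice) are hypothetical, and averaging the cross-validation error curves before picking the minimum is only one plausible reading of "averaging over several repetitions of cross-validation".

```python
import random
from statistics import mean


def make_data(n, p, rng):
    """Synthetic data (an assumption): only the first two of p predictors carry signal."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y


def boost_path(X, y, max_steps, nu=0.1):
    """Componentwise least-squares boosting, a stand-in for Cox boosting.

    Returns the coefficient vector after each of the max_steps iterations.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    ss = [sum(v * v for v in col) for col in cols]
    beta, resid, path = [0.0] * p, list(y), []
    for _ in range(max_steps):
        # Pick the single predictor whose univariate fit to the current
        # residuals reduces the squared error the most.
        best_j, best_b, best_gain = 0, 0.0, -1.0
        for j in range(p):
            if ss[j] == 0.0:
                continue
            dot = sum(x * r for x, r in zip(cols[j], resid))
            gain = dot * dot / ss[j]
            if gain > best_gain:
                best_j, best_b, best_gain = j, dot / ss[j], gain
        # Shrunken update: move only a fraction nu toward that fit.
        beta[best_j] += nu * best_b
        resid = [r - nu * best_b * x for r, x in zip(resid, cols[best_j])]
        path.append(list(beta))
    return path


def cv_error_curve(X, y, k, max_steps, rng):
    """Mean squared prediction error per step for one random K-fold partition."""
    idx = list(range(len(y)))
    rng.shuffle(idx)  # the random component: which observation lands in which fold
    sse = [0.0] * max_steps
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        train = [i for i in idx if i not in held]
        path = boost_path([X[i] for i in train], [y[i] for i in train], max_steps)
        for m, beta in enumerate(path):
            for i in fold:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                sse[m] += (y[i] - pred) ** 2
    return [s / len(y) for s in sse]


def best_step(curve):
    """Number of boosting steps (1-based) with the smallest CV error."""
    return min(range(len(curve)), key=curve.__getitem__) + 1


def single_cv_choices(X, y, reps, k, max_steps, seed):
    """Step count chosen by each of reps independent CV runs."""
    rng = random.Random(seed)
    return [best_step(cv_error_curve(X, y, k, max_steps, rng)) for _ in range(reps)]


def averaged_cv_choice(X, y, reps, k, max_steps, seed):
    """Step count chosen after averaging the error curves of reps CV runs."""
    rng = random.Random(seed)
    curves = [cv_error_curve(X, y, k, max_steps, rng) for _ in range(reps)]
    return best_step([mean(c[m] for c in curves) for m in range(max_steps)])


if __name__ == "__main__":
    X, y = make_data(n=40, p=10, rng=random.Random(0))
    print("steps chosen by single CV runs:", single_cv_choices(X, y, 5, 5, 30, seed=1))
    print("steps chosen from averaged curve:", averaged_cv_choice(X, y, 5, 5, 30, seed=1))
```

Running the script prints the step counts picked by individual cross-validation runs next to the single count obtained from the averaged curve. Differences among the individual runs come only from the fold partition, since the data and all other settings are held fixed.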

Suggested Citation

  • Heidi Seibold & Christoph Bernau & Anne-Laure Boulesteix & Riccardo De Bin, 2018. "On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models," Computational Statistics, Springer, vol. 33(3), pages 1195-1215, September.
  • Handle: RePEc:spr:compst:v:33:y:2018:i:3:d:10.1007_s00180-017-0773-8
    DOI: 10.1007/s00180-017-0773-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-017-0773-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Benjamin Hofner & Andreas Mayr & Nikolay Robinzonov & Matthias Schmid, 2014. "Model-based boosting in R: a hands-on tutorial using the R package mboost," Computational Statistics, Springer, vol. 29(1), pages 3-35, February.
    2. Mogensen, Ulla B. & Ishwaran, Hemant & Gerds, Thomas A., 2012. "Evaluating Random Forests for Survival Analysis Using Prediction Error Curves," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 50(i11).
    3. Gerhard Tutz & Harald Binder, 2006. "Generalized Additive Modeling with Implicit Variable Selection by Likelihood-Based Boosting," Biometrics, The International Biometric Society, vol. 62(4), pages 961-971, December.
    4. Riccardo De Bin, 2016. "Boosting in Cox regression: a comparison between the likelihood-based and the model-based approaches with focus on the R-packages CoxBoost and mboost," Computational Statistics, Springer, vol. 31(2), pages 513-531, June.

    Citations


    Cited by:

    1. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    2. Battauz, Michela & Vidoni, Paolo, 2022. "A likelihood-based boosting algorithm for factor analysis models with binary data," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    3. Hornung, Roman & Boulesteix, Anne-Laure, 2022. "Interaction forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    2. Battauz, Michela & Vidoni, Paolo, 2022. "A likelihood-based boosting algorithm for factor analysis models with binary data," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    3. Lore Zumeta-Olaskoaga & Maximilian Weigert & Jon Larruskain & Eder Bikandi & Igor Setuain & Josean Lekue & Helmut Küchenhoff & Dae-Jin Lee, 2023. "Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(1), pages 101-126, March.
    4. Riccardo De Bin, 2016. "Boosting in Cox regression: a comparison between the likelihood-based and the model-based approaches with focus on the R-packages CoxBoost and mboost," Computational Statistics, Springer, vol. 31(2), pages 513-531, June.
    5. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    6. Philip Kostov, 2010. "Do Buyers’ Characteristics and Personal Relationships Affect Agricultural Land Prices?," Land Economics, University of Wisconsin Press, vol. 86(1), pages 48-65.
    7. Bayu Adhi Tama & Sunghoon Lim, 2020. "A Comparative Performance Evaluation of Classification Algorithms for Clinical Decision Support Systems," Mathematics, MDPI, vol. 8(10), pages 1-25, October.
    8. Marra, Giampiero & Wood, Simon N., 2011. "Practical variable selection for generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2372-2387, July.
    9. Yousuf, Kashif & Ng, Serena, 2021. "Boosting high dimensional predictive regressions with time varying parameters," Journal of Econometrics, Elsevier, vol. 224(1), pages 60-87.
    10. Philipp F. M. Baumann & Enzo Rossi & Alexander Volkmann, 2020. "What Drives Inflation and How: Evidence from Additive Mixed Models Selected by cAIC," Papers 2006.06274, arXiv.org, revised Aug 2022.
    11. Osamu Komori, 2011. "A boosting method for maximization of the area under the ROC curve," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(5), pages 961-979, October.
    12. Stefanie Hieke & Axel Benner & Richard F Schlenk & Martin Schumacher & Lars Bullinger & Harald Binder, 2016. "Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-18, May.
    13. Yanis Tazi & Juan E. Arango-Ossa & Yangyu Zhou & Elsa Bernard & Ian Thomas & Amanda Gilkes & Sylvie Freeman & Yoann Pradat & Sean J. Johnson & Robert Hills & Richard Dillon & Max F. Levine & Daniel Le, 2022. "Unified classification and risk-stratification in Acute Myeloid Leukemia," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    14. Faisal Zahid & Gerhard Tutz, 2013. "Multinomial logit models with implicit variable selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(4), pages 393-416, December.
    15. Aizawa, Toshiaki, 2021. "Inequality of opportunity in infant mortality in South Asia: A decomposition analysis of survival data," Economics & Human Biology, Elsevier, vol. 43(C).
    16. Gerhard Tutz & Gunther Schauberger, 2015. "A Penalty Approach to Differential Item Functioning in Rasch Models," Psychometrika, Springer;The Psychometric Society, vol. 80(1), pages 21-43, March.
    17. Heikki Kauppi, 2019. "Recession Prediction with Optimal Use of Leading Indicators," Discussion Papers 125, Aboa Centre for Economics.
    18. Ngandu Balekelayi & Solomon Tesfamariam, 2020. "Geoadditive Quantile Regression Model for Sewer Pipes Deterioration Using Boosting Optimization Algorithm," Sustainability, MDPI, vol. 12(20), pages 1-24, October.
    19. Kamaryn T. Tanner & Linda D. Sharples & Rhian M. Daniel & Ruth H. Keogh, 2021. "Dynamic survival prediction combining landmarking with a machine learning ensemble: Methodology and empirical comparison," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 3-30, January.
    20. Lahiri, Kajal & Yang, Cheng, 2022. "Boosting tax revenues with mixed-frequency data in the aftermath of COVID-19: The case of New York," International Journal of Forecasting, Elsevier, vol. 38(2), pages 545-566.
