
Survival prediction using gene expression data: A review and comparison

Author

Listed:
  • van Wieringen, Wessel N.
  • Kun, David
  • Hampel, Regina
  • Boulesteix, Anne-Laure

Abstract

Knowledge of transcription of the human genome might greatly enhance our understanding of cancer. In particular, gene expression may be used to predict the survival of cancer patients. Microarray data are characterized by their high-dimensionality: the number of covariates (p~1000) greatly exceeds the number of samples (n~100), which is a considerable challenge in the context of survival prediction. An inventory of methods that have been used to model survival using gene expression is given. These methods are critically reviewed and compared in a qualitative way. Next, these methods are applied to three real-life data sets for a quantitative comparison. The choice of the evaluation measure of predictive performance is crucial for the selection of the best method. Depending on the evaluation measure, either the L2-penalized Cox regression or the random forest ensemble method yields the best survival time prediction using the considered gene expression data sets. Consensus on the best evaluation measure of predictive performance is needed.
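
As a rough illustration of what "L2-penalized Cox regression" refers to, the sketch below evaluates the usual ridge-penalized objective: the negative Cox partial log-likelihood plus a squared-norm penalty on the coefficients. It is not taken from the article. The function name, the penalty weight `lam`, the toy data, and the simplifying assumption of no tied event times are all assumptions made here for illustration.

```python
import math

def ridge_cox_objective(beta, X, time, event, lam):
    """Negative Cox partial log-likelihood plus (lam / 2) * ||beta||^2.

    Illustrative only: assumes no tied event times, so no Breslow or
    Efron correction is applied.
    """
    # Linear predictor eta_i = x_i . beta for each sample.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored samples enter only through the risk sets
        # Risk set: samples still under observation at time t_i.
        risk = [math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(risk))
    penalty = 0.5 * lam * sum(b * b for b in beta)
    return -loglik + penalty

# Hypothetical toy data: 3 samples, 2 covariates; event = 0 marks censoring.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
time = [2.0, 3.0, 1.0]
event = [1, 0, 1]

value_at_zero = ridge_cox_objective([0.0, 0.0], X, time, event, lam=1.0)
```

With p much larger than n, the unpenalized partial likelihood has no unique maximizer, and the penalty term is what makes the fit well defined. In practice `lam` would be tuned, for example by cross-validation, rather than fixed as in this toy call.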

Suggested Citation

  • van Wieringen, Wessel N. & Kun, David & Hampel, Regina & Boulesteix, Anne-Laure, 2009. "Survival prediction using gene expression data: A review and comparison," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1590-1603, March.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:5:p:1590-1603

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00294-6
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Bastien, Philippe & Vinzi, Vincenzo Esposito & Tenenhaus, Michel, 2005. "PLS generalised linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 48(1), pages 17-46, January.
    2. Danh V. Nguyen & A. Bulak Arpat & Naisyin Wang & Raymond J. Carroll, 2002. "DNA Microarray Experiments: Biological and Technological Aspects," Biometrics, The International Biometric Society, vol. 58(4), pages 701-717, December.
    3. Ash A. Alizadeh & Michael B. Eisen & R. Eric Davis & Chi Ma & Izidore S. Lossos & Andreas Rosenwald & Jennifer C. Boldrick & Hajeer Sabet & Truc Tran & Xin Yu & John I. Powell & Liming Yang & Gerald E, 2000. "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, Nature, vol. 403(6769), pages 503-511, February.
    4. Bair, Eric & Hastie, Trevor & Paul, Debashis & Tibshirani, Robert, 2006. "Prediction by Supervised Principal Components," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 119-137, March.
    5. Boulesteix Anne-Laure, 2006. "Reader's Reaction to "Dimension Reduction for Classification with Gene Expression Microarray Data" by Dai et al (2006)," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-7, June.
    6. Laura J. van 't Veer & Hongyue Dai & Marc J. van de Vijver & Yudong D. He & Augustinus A. M. Hart & Mao Mao & Hans L. Peterse & Karin van der Kooy & Matthew J. Marton & Anke T. Witteveen & George J. S, 2002. "Gene expression profiling predicts clinical outcome of breast cancer," Nature, Nature, vol. 415(6871), pages 530-536, January.
    7. Mahlet G. Tadesse & Joseph G. Ibrahim & Robert Gentleman & Sabina Chiaretti & Jerome Ritz & Robin Foa, 2005. "Bayesian Error-in-Variable Survival Model for the Analysis of GeneChip Arrays," Biometrics, The International Biometric Society, vol. 61(2), pages 488-497, June.
    8. Dudoit S. & Fridlyand J. & Speed T. P, 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    9. Neil A. Butler & Michael C. Denham, 2000. "The peculiar shrinkage properties of partial least squares regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 62(3), pages 585-593.
    10. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    11. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Citations


    Cited by:

    1. Antoniadis, Anestis & Fryzlewicz, Piotr & Letué, Frédérique, 2010. "The Dantzig selector in Cox's proportional hazards model," LSE Research Online Documents on Economics 30992, London School of Economics and Political Science, LSE Library.
    2. Yu Takagi & Hirokazu Matsuda & Yukio Taniguchi & Hiroaki Iwaisaki, 2014. "Predicting the Phenotypic Values of Physiological Traits Using SNP Genotype and Gene Expression Data in Mice," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-17, December.
    3. Stefanie Hieke & Axel Benner & Richard F Schlenk & Martin Schumacher & Lars Bullinger & Harald Binder, 2016. "Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-18, May.
    4. Zhao, Xiaobing & Zhou, Xian, 2014. "Sufficient dimension reduction on marginal regression for gaps of recurrent events," Journal of Multivariate Analysis, Elsevier, vol. 127(C), pages 56-71.
    5. Isabella Zwiener & Barbara Frisch & Harald Binder, 2014. "Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures," PLOS ONE, Public Library of Science, vol. 9(1), pages 1-13, January.
    6. Christine W Duarte & Christopher D Willey & Degui Zhi & Xiangqin Cui & Jacqueline J Harris & Laura Kelly Vaughan & Tapan Mehta & Raymond O McCubrey & Nikolai N Khodarev & Ralph R Weichselbaum & G Yanc, 2012. "Expression Signature of IFN/STAT1 Signaling Genes Predicts Poor Survival Outcome in Glioblastoma Multiforme in a Subtype-Specific Manner," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-8, January.
    7. Emura, Takeshi & Chen, Yi-Hau & Chen, Hsuan-Yu, 2012. "Survival prediction based on compound covariate under cox proportional hazard models," MPRA Paper 41149, University Library of Munich, Germany.
    8. Yanfeng Wang & Haohao Wang & Sanyi Li & Lidong Wang, 2022. "Survival Risk Prediction of Esophageal Cancer Based on the Kohonen Network Clustering Algorithm and Kernel Extreme Learning Machine," Mathematics, MDPI, vol. 10(9), pages 1-20, April.
    9. Anestis Antoniadis & Piotr Fryzlewicz & Frédérique Letué, 2010. "The Dantzig Selector in Cox's Proportional Hazards Model," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 37(4), pages 531-552, December.
    10. Luke Kumar & Russell Greiner, 2019. "Gene expression based survival prediction for cancer patients—A topic modeling approach," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-30, November.
    11. Wei Zhang & Takayo Ota & Viji Shridhar & Jeremy Chien & Baolin Wu & Rui Kuang, 2013. "Network-based Survival Analysis Reveals Subnetwork Signatures for Predicting Outcomes of Ovarian Cancer Treatment," PLOS Computational Biology, Public Library of Science, vol. 9(3), pages 1-16, March.
    12. Armin Rauschenberger & Iuliana Ciocănea-Teodorescu & Marianne A. Jonker & Renée X. Menezes & Mark A. Wiel, 2020. "Sparse classification with paired covariates," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(3), pages 571-588, September.
    13. Ming Yi & Ruoqing Zhu & Robert M Stephens, 2018. "GradientScanSurv—An exhaustive association test method for gene expression data with censored survival outcome," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-28, December.
    14. Farcomeni, Alessio & Nardi, Alessandra, 2010. "A two-component Weibull mixture to model early and late mortality in a Bayesian framework," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 416-428, February.
    15. Hapfelmeier, A. & Ulm, K., 2013. "A new variable selection approach using Random Forests," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 50-69.
    16. Julia Gilhodes & Florence Dalenc & Jocelyn Gal & Christophe Zemmour & Eve Leconte & Jean Marie Boher & Thomas Filleron, 2020. "Comparison of Variable Selection Methods for Time-to-Event Data in High-Dimensional Settings," Post-Print hal-02934793, HAL.
    17. Xiaolin Chen & Catherine Chunling Liu & Sheng Xu, 2021. "An efficient algorithm for joint feature screening in ultrahigh-dimensional Cox’s model," Computational Statistics, Springer, vol. 36(2), pages 885-910, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Juan C. Laria & M. Carmen Aguilera-Morillo & Rosa E. Lillo, 2023. "Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models," Statistical Papers, Springer, vol. 64(1), pages 227-253, February.
    2. Hyonho Chun & Sündüz Keleş, 2010. "Sparse partial least squares regression for simultaneous dimension reduction and variable selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(1), pages 3-25, January.
    3. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    4. Wang Zhu & Wang C.Y., 2010. "Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    5. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    6. Caroline Jardet & Baptiste Meunier, 2022. "Nowcasting world GDP growth with high‐frequency data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1181-1200, September.
    7. Kawano, Shuichi & Fujisawa, Hironori & Takada, Toyoyuki & Shiroishi, Toshihiko, 2015. "Sparse principal component regression with adaptive loading," Computational Statistics & Data Analysis, Elsevier, vol. 89(C), pages 192-203.
    8. Hojin Yang & Hongtu Zhu & Joseph G. Ibrahim, 2018. "MILFM: Multiple index latent factor model based on high‐dimensional features," Biometrics, The International Biometric Society, vol. 74(3), pages 834-844, September.
    9. Chakraborty, Sounak, 2009. "Bayesian binary kernel probit model for microarray based cancer classification and gene selection," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4198-4209, October.
    10. Khan Md Hasinur Rahaman & Bhadra Anamika & Howlader Tamanna, 2019. "Stability selection for lasso, ridge and elastic net implemented with AFT models," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(5), pages 1-14, October.
    11. Brendan P. W. Ames & Mingyi Hong, 2016. "Alternating direction method of multipliers for penalized zero-variance discriminant analysis," Computational Optimization and Applications, Springer, vol. 64(3), pages 725-754, July.
    12. Luis A. Barboza & Julien Emile-Geay & Bo Li & Wan He, 2019. "Efficient Reconstructions of Common Era Climate via Integrated Nested Laplace Approximations," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 24(3), pages 535-554, September.
    13. Zemin Zheng & Jinchi Lv & Wei Lin, 2021. "Nonsparse Learning with Latent Variables," Operations Research, INFORMS, vol. 69(1), pages 346-359, January.
    14. Bai, Jushan & Ng, Serena, 2008. "Forecasting economic time series using targeted predictors," Journal of Econometrics, Elsevier, vol. 146(2), pages 304-317, October.
    15. Lore Zumeta-Olaskoaga & Maximilian Weigert & Jon Larruskain & Eder Bikandi & Igor Setuain & Josean Lekue & Helmut Küchenhoff & Dae-Jin Lee, 2023. "Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(1), pages 101-126, March.
    16. Chakraborty, Sounak & Guo, Ruixin, 2011. "A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data," Computational Statistics & Data Analysis, Elsevier, vol. 55(3), pages 1342-1356, March.
    17. Wang, Tao & Zhu, Lixing, 2013. "Sparse sufficient dimension reduction using optimal scoring," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 223-232.
    18. Kawano, Shuichi & Fujisawa, Hironori & Takada, Toyoyuki & Shiroishi, Toshihiko, 2018. "Sparse principal component regression for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 180-196.
    19. Luo, Ruiyan & Qi, Xin, 2015. "Sparse wavelet regression with multiple predictive curves," Journal of Multivariate Analysis, Elsevier, vol. 134(C), pages 33-49.
    20. Zeyu Diao & Lili Yue & Fanrong Zhao & Gaorong Li, 2022. "High-Dimensional Regression Adjustment Estimation for Average Treatment Effect with Highly Correlated Covariates," Mathematics, MDPI, vol. 10(24), pages 1-18, December.
