Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i5p1897-1908.html

Gene selection and prediction for cancer classification using support vector machines with a reject option

Authors

Listed:
  • Choi, Hosik
  • Yeo, Donghwa
  • Kwon, Sunghoon
  • Kim, Yongdai

Abstract

In cancer classification based on gene expression data, it would be desirable to defer a decision for observations that are difficult to classify. For instance, an observation for which the conditional probability of cancer is around 1/2 would preferably be referred for more advanced tests rather than receive an immediate decision. This motivates the use of a classifier with a reject option, which reports a warning for observations that are difficult to classify. In this paper, we consider the problem of gene selection with a reject option. Gene expression data typically comprise the expression levels of several thousand candidate genes. In such cases, an effective gene selection procedure is necessary both to provide a better understanding of the underlying biological system that generates the data and to improve prediction performance. We propose a machine learning approach in which the l1 penalty is applied to the SVM with a reject option; this method is referred to as the l1 SVM with a reject option. We develop a novel optimization algorithm for this SVM that is sufficiently fast and stable to analyze gene expression data. The proposed algorithm computes the entire solution path with respect to the regularization parameter. Numerical studies show that, compared with the standard l1 SVM, the proposed method efficiently reduces prediction errors without hampering gene selectivity.
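The paper's solution-path algorithm is not reproduced here, but the two ingredients it combines, l1-penalized gene selection and a reject band around the decision boundary, can be sketched with standard tools. The following Python snippet is purely illustrative: the data are synthetic, the rejection threshold `delta` and the regularization strength `C` are arbitrary values chosen for the example, and scikit-learn's `LinearSVC` (squared hinge loss) stands in for the authors' formulation, not the method proposed in the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic "gene expression" data: 60 samples, 200 candidate genes,
# where only the first 5 genes carry signal (high dimension, low sample size).
n, p = 60, 200
X = rng.standard_normal((n, p))
y = np.where(X[:, :5].sum(axis=1) > 0, 1, -1)

# l1-penalized linear SVM: the l1 penalty drives most gene weights to zero,
# so the nonzero coefficients act as the selected genes.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.5)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_.ravel())  # indices of selected genes

# Reject option: defer any observation whose decision score falls inside a
# band (-delta, delta) around the separating hyperplane; 0 marks "reject".
delta = 0.3
scores = clf.decision_function(X)
pred = np.where(np.abs(scores) < delta, 0, np.sign(scores))
```

In this sketch the reject region is a fixed band in the score; the paper instead builds the reject option into the loss and traces the whole regularization path in one pass rather than refitting for each value of the penalty.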

Suggested Citation

  • Choi, Hosik & Yeo, Donghwa & Kwon, Sunghoon & Kim, Yongdai, 2011. "Gene selection and prediction for cancer classification using support vector machines with a reject option," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1897-1908, May.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:5:p:1897-1908

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00457-3
    Download Restriction: Full text for ScienceDirect subscribers only.

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Peter Hall & J. S. Marron & Amnon Neeman, 2005. "Geometric representation of high dimension, low sample size data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(3), pages 427-444, June.
    2. Lukas Meier & Sara Van De Geer & Peter Bühlmann, 2008. "The group lasso for logistic regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 53-71, February.
    3. Hao Helen Zhang & Grace Wahba & Yi Lin & Meta Voelker & Michael Ferris & Ronald Klein & Barbara Klein, 2004. "Variable Selection and Model Building via Likelihood Basis Pursuit," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 659-672, January.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Pedro Duarte Silva, A., 2011. "Two-group classification with high-dimensional correlated data: A factor model approach," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2975-2990, November.
    2. Drechsler, Jörg & Reiter, Jerome P., 2011. "An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3232-3243, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    2. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    3. Kuangnan Fang & Xinyan Fan & Wei Lan & Bingquan Wang, 2019. "Nonparametric additive beta regression for fractional response with application to body fat data," Annals of Operations Research, Springer, vol. 276(1), pages 331-347, May.
    4. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    5. Ye, Ya-Fen & Shao, Yuan-Hai & Deng, Nai-Yang & Li, Chun-Na & Hua, Xiang-Yu, 2017. "Robust Lp-norm least squares support vector regression with feature selection," Applied Mathematics and Computation, Elsevier, vol. 305(C), pages 32-52.
    6. Vincent, Martin & Hansen, Niels Richard, 2014. "Sparse group lasso and high dimensional multinomial classification," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 771-786.
    7. Yata, Kazuyoshi & Aoshima, Makoto, 2013. "PCA consistency for the power spiked model in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 334-354.
    8. Jung, Sungkyu & Sen, Arusharka & Marron, J.S., 2012. "Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 190-203.
    9. Croux, Christophe & Jagtiani, Julapa & Korivi, Tarunsai & Vulanovic, Milos, 2020. "Important factors determining Fintech loan default: Evidence from a lendingclub consumer platform," Journal of Economic Behavior & Organization, Elsevier, vol. 173(C), pages 270-296.
    10. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    11. Caner, Mehmet, 2023. "Generalized linear models with structured sparsity estimators," Journal of Econometrics, Elsevier, vol. 236(2).
    12. repec:jss:jstsof:33:i01 is not listed on IDEAS
    13. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    14. Kazuyoshi Yata & Makoto Aoshima, 2012. "Inference on High-Dimensional Mean Vectors with Fewer Observations Than the Dimension," Methodology and Computing in Applied Probability, Springer, vol. 14(3), pages 459-476, September.
    15. Olga Klopp & Marianna Pensky, 2013. "Sparse High-dimensional Varying Coefficient Model : Non-asymptotic Minimax Study," Working Papers 2013-30, Center for Research in Economics and Statistics.
    16. Jiang, Cuixia & Xiong, Wei & Xu, Qifa & Liu, Yezheng, 2021. "Predicting default of listed companies in mainland China via U-MIDAS Logit model with group lasso penalty," Finance Research Letters, Elsevier, vol. 38(C).
    17. Li, Peili & Jiao, Yuling & Lu, Xiliang & Kang, Lican, 2022. "A data-driven line search rule for support recovery in high-dimensional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    18. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    19. Saha, Enakshi & Sarkar, Soham & Ghosh, Anil K., 2017. "Some high-dimensional one-sample tests based on functions of interpoint distances," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 83-95.
    20. Osamu Komori & Shinto Eguchi & John B. Copas, 2015. "Generalized t-statistic for two-group classification," Biometrics, The International Biometric Society, vol. 71(2), pages 404-416, June.
    21. Wei, Fengrong & Zhu, Hongxiao, 2012. "Group coordinate descent algorithms for nonconvex penalized regression," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 316-326.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:55:y:2011:i:5:p:1897-1908. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.