Printed from https://ideas.repec.org/a/eee/csdana/v53y2009i4p1462-1474.html

Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach

Author

Listed:
  • Chakraborty, Sounak

Abstract

Since most cancer treatments carry some degree of toxicity, it is essential to identify a cancer type correctly before administering the relevant therapy. With the arrival of powerful tools such as gene expression microarrays, the basis of cancer classification is slowly shifting from morphological properties to molecular signatures. Several recent studies have demonstrated that gene expression microarray measurements predict tumor types markedly more accurately than clinical markers. The main challenge in working with gene expression microarrays is the huge number of genes involved, of which only a small fraction is actually relevant for differentiating between cancer types. A Bayesian nearest neighbor model equipped with an integrated variable selection technique is proposed to overcome this challenge. This classification and gene selection model is able to classify different cancer types accurately and simultaneously identify the relevant genes. The proposed model is completely automatic in the sense that it adaptively chooses the neighborhood size and the important covariates. The method is applied successfully to three simulated data sets and four well-known real data sets. To demonstrate its competitiveness, the method is also compared with several popular "off the shelf" classification methods. For all the simulated and real data sets, the proposed method produced highly competitive, if not better, results. Whereas the standard approach builds the model in two steps, gene selection followed by tumor prediction, this adaptive gene selection technique selects the relevant genes and predicts the tumor class in a single step. The biological relevance of the selected genes is also discussed to validate the claim.
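The idea of choosing the neighborhood size and the relevant genes together, rather than in two separate steps, can be illustrated with a small sketch. This is not the paper's Bayesian model: the abstract does not give its priors or sampler, so the sketch swaps in a plain stochastic search scored by leave-one-out accuracy. The function names (`loo_knn_accuracy`, `joint_search`), the candidate values of k, the tie-breaking rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def loo_knn_accuracy(X, y, k, mask):
    """Leave-one-out accuracy of k-NN using only the columns where mask is True.

    y must hold non-negative integer class labels.
    """
    Xs = X[:, mask]
    # Pairwise squared Euclidean distances between all samples.
    d = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    votes = y[nn]  # neighbour labels, shape (n, k)
    pred = np.array([np.bincount(v).argmax() for v in votes])
    return float((pred == y).mean())


def joint_search(X, y, ks=(1, 3, 5), n_iter=300, seed=0):
    """Search jointly over a gene subset (boolean mask) and neighbourhood size k.

    Each step flips one gene in or out and draws a candidate k; the move is
    kept if accuracy improves, or ties while using fewer genes (an assumed rule).
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    mask = np.zeros(p, dtype=bool)
    mask[rng.integers(p)] = True  # start from a single random gene
    k = ks[0]
    best = loo_knn_accuracy(X, y, k, mask)
    for _ in range(n_iter):
        new_mask = mask.copy()
        j = rng.integers(p)
        new_mask[j] = ~new_mask[j]
        if not new_mask.any():
            continue  # never evaluate an empty gene set
        new_k = ks[rng.integers(len(ks))]
        score = loo_knn_accuracy(X, y, new_k, new_mask)
        if score > best or (score == best and new_mask.sum() < mask.sum()):
            mask, k, best = new_mask, new_k, score
    return mask, k, best


# Synthetic stand-in for microarray data: 40 samples, 30 "genes",
# of which only genes 0 and 1 carry class signal (assumed setup).
rng = np.random.default_rng(1)
n, p = 40, 30
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, 0] += 3.0 * y
X[:, 1] -= 3.0 * y

mask, k, acc = joint_search(X, y)
print("selected genes:", np.flatnonzero(mask), "k:", k, "LOO accuracy:", acc)
```

A fully Bayesian treatment would instead place priors on the inclusion indicators and on k and report posterior summaries; the greedy acceptance rule here returns only a single selected subset and neighborhood size.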

Suggested Citation

  • Chakraborty, Sounak, 2009. "Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1462-1474, February.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:4:p:1462-1474

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00472-6
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Dudoit S. & Fridlyand J. & Speed T. P., 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    2. Malay Ghosh & Tapabrata Maiti & Dalho Kim & Sounak Chakraborty & Ashutosh Tewari, 2004. "Hierarchical Bayesian Neural Networks: An Application to a Prostate Cancer Study," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 601-608, January.
    3. Julian Besag & Jeremy York & Annie Mollié, 1991. "Bayesian image restoration, with two applications in spatial statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 43(1), pages 1-20, March.
    4. Naijun Sha & Marina Vannucci & Mahlet G. Tadesse & Philip J. Brown & Ilaria Dragoni & Nick Davies & Tracy C. Roberts & Andrea Contestabile & Mike Salmon & Chris Buckley & Francesco Falciani, 2004. "Bayesian Variable Selection in Multinomial Probit Models to Identify Molecular Signatures of Disease Stage," Biometrics, The International Biometric Society, vol. 60(3), pages 812-819, September.
    5. Bani K. Mallick & Debashis Ghosh & Malay Ghosh, 2005. "Bayesian classification of tumours by using gene expression data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 219-234, April.
    6. Belitz, Christiane & Lang, Stefan, 2008. "Simultaneous selection of variables and smoothing parameters in structured additive regression models," Computational Statistics & Data Analysis, Elsevier, vol. 53(1), pages 61-81, September.
    7. Zou, Hui & Yuan, Ming, 2008. "Regularized simultaneous model selection in multiple quantiles regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5296-5304, August.
    8. W. R. Gilks & P. Wild, 1992. "Adaptive Rejection Sampling for Gibbs Sampling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 41(2), pages 337-348, June.

    Citations



    Cited by:

    1. Nader Salari & Shamarina Shohaimi & Farid Najafi & Meenakshii Nallappan & Isthrinayagy Karishnarajah, 2014. "A Novel Hybrid Classification Model of Genetic Algorithms, Modified k-Nearest Neighbor and Developed Backpropagation Neural Network," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-50, November.
    2. Fraiman, Ricardo & Justel, Ana & Svarc, Marcela, 2010. "Pattern recognition via projection-based kNN rules," Computational Statistics & Data Analysis, Elsevier, vol. 54(5), pages 1390-1403, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chakraborty, Sounak, 2009. "Bayesian binary kernel probit model for microarray based cancer classification and gene selection," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4198-4209, October.
    2. Katherine A. Guthrie & Lianne Sheppard & Jon Wakefield, 2002. "A Hierarchical Aggregate Data Model with Spatially Correlated Disease Rates," Biometrics, The International Biometric Society, vol. 58(4), pages 898-905, December.
    3. Zhuoqiong He & Dongchu Sun, 2000. "Hierarchical Bayes Estimation of Hunting Success Rates with Spatial Correlations," Biometrics, The International Biometric Society, vol. 56(2), pages 360-367, June.
    4. Ngianga-Bakwin Kandala & Chibuzor Christopher Nnanatu & Glory Atilola & Paul Komba & Lubanzadio Mavatikua & Zhuzhi Moore & Gerry Mackie & Bettina Shell-Duncan, 2019. "A Spatial Analysis of the Prevalence of Female Genital Mutilation/Cutting among 0–14-Year-Old Girls in Kenya," IJERPH, MDPI, vol. 16(21), pages 1-28, October.
    5. Lizhen Shen & Hua Jiang & Mingfang He & Guoqing Liu, 2017. "Collaborative representation-based classification of microarray gene expression data," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-14, December.
    6. Tamara Broderick & Robert Gramacy, 2011. "Classification and Categorical Inputs with Treed Gaussian Process Models," Journal of Classification, Springer;The Classification Society, vol. 28(2), pages 244-270, July.
    7. Vinícius Diniz Mayrink & Renato Valladares Panaro & Marcelo Azevedo Costa, 2021. "Structural equation modeling with time dependence: an application comparing Brazilian energy distributors," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(2), pages 353-383, June.
    8. Wang, Tao & Zhu, Lixing, 2013. "Sparse sufficient dimension reduction using optimal scoring," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 223-232.
    9. Katie Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    10. Peter Congdon, 2010. "A multiple indicator, multiple cause method for representing social capital with an application to psychological distress," Journal of Geographical Systems, Springer, vol. 12(1), pages 1-23, March.
    11. Thomas C. McHale & Claudia M. Romero-Vivas & Claudio Fronterre & Pedro Arango-Padilla & Naomi R. Waterlow & Chad D. Nix & Andrew K. Falconar & Jorge Cano, 2019. "Spatiotemporal Heterogeneity in the Distribution of Chikungunya and Zika Virus Case Incidences during their 2014 to 2016 Epidemics in Barranquilla, Colombia," IJERPH, MDPI, vol. 16(10), pages 1-21, May.
    12. Pang, W. K. & Yang, Z. H. & Hou, S. H. & Leung, P. K., 2002. "Non-uniform random variate generation by the vertical strip method," European Journal of Operational Research, Elsevier, vol. 142(3), pages 595-609, November.
    13. Kubokawa, Tatsuya & Srivastava, Muni S., 2008. "Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 1906-1928, October.
    14. Riccardo (Jack) Lucchetti & Luca Pedini, 2020. "ParMA: Parallelised Bayesian Model Averaging for Generalised Linear Models," Working Papers 2020:28, Department of Economics, University of Venice "Ca' Foscari".
    15. Roy, Vivekananda, 2014. "Efficient estimation of the link function parameter in a robust Bayesian binary regression model," Computational Statistics & Data Analysis, Elsevier, vol. 73(C), pages 87-102.
    16. Renato Assunção & Carl Schmertmann & Joseph Potter & Suzana Cavenaghi, 2005. "Empirical bayes estimation of demographic schedules for small areas," Demography, Springer;Population Association of America (PAA), vol. 42(3), pages 537-558, August.
    17. Peter Congdon, 2014. "Estimating life expectancies for US small areas: a regression framework," Journal of Geographical Systems, Springer, vol. 16(1), pages 1-18, January.
    18. Chu, Chi-Hsiang & Lee, Kuo-Jung & Hsu, Chien-Chin & Chen, Ray-Bing, 2025. "Bayesian selection approach for categorical responses via multinomial probit models," Computational Statistics & Data Analysis, Elsevier, vol. 212(C).
    19. Jansen, Nora & Hinz, Oliver & Deusser, Clemens & Strufe, Thorsten, 2021. "Is the Buzz on? – A Buzz Detection System for Viral Posts in Social Media," Journal of Interactive Marketing, Elsevier, vol. 56(C), pages 1-17.
    20. Makoto Aoshima & Kazuyoshi Yata, 2014. "A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(5), pages 983-1010, October.

