Printed from https://ideas.repec.org/a/eee/csdana/v53y2009i4p1462-1474.html

Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach

Author

Listed:
  • Chakraborty, Sounak

Abstract

Because most cancer treatments carry some degree of toxicity, it is essential to identify the cancer type correctly before administering the relevant therapy. With the arrival of powerful tools such as gene expression microarrays, the basis for cancer classification is slowly shifting from morphological properties to molecular signatures. Several recent studies have demonstrated that predicting tumor types from gene expression microarray measurements is markedly more accurate than predicting them from clinical markers. The main challenge in working with gene expression microarrays is the huge number of genes involved, of which only a small fraction are actually relevant for differentiating between cancer types. A Bayesian nearest neighbor model equipped with an integrated variable selection technique is proposed to overcome this challenge. This classification and gene selection model is able to classify different cancer types accurately and simultaneously identify the relevant genes. The proposed model is completely automatic in the sense that it adaptively chooses the neighborhood size and the important covariates. The method is successfully applied to three simulated data sets and four well-known real data sets. To demonstrate the competitiveness of the method, a comparative study is also carried out against several popular "off the shelf" classification methods. For all the simulated and real data sets, the proposed method produced results that were highly competitive, if not better. While the standard approach builds the model in two steps, gene selection followed by tumor prediction, this novel adaptive gene selection technique selects the relevant genes and predicts the tumor class in a single step. The biological relevance of the selected genes is also discussed to validate the claim.
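
The idea of choosing the neighborhood size and the relevant genes together can be illustrated with a small sketch. This is not the paper's Bayesian model: as a stand-in, it searches exhaustively over small gene subsets and candidate values of k, and scores each pair by leave-one-out accuracy. The function names (`knn_predict`, `loo_accuracy`, `select_k_and_genes`) and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def knn_predict(train_X, train_y, x, k, genes):
    """Majority vote among the k nearest training samples, using only `genes`."""
    dists = sorted(
        (sum((row[g] - x[g]) ** 2 for g in genes), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, genes):
    """Leave-one-out accuracy of k-NN restricted to the given gene subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k, genes) == y[i]
    return hits / len(X)


def select_k_and_genes(X, y, ks, max_genes):
    """Jointly pick k and a gene subset (hypothetical stand-in for the
    paper's Bayesian selection): best accuracy first, then fewest genes."""
    n_genes = len(X[0])
    candidates = [
        (loo_accuracy(X, y, k, genes), -len(genes), k, genes)
        for size in range(1, max_genes + 1)
        for genes in combinations(range(n_genes), size)
        for k in ks
    ]
    acc, _, k, genes = max(candidates, key=lambda c: (c[0], c[1]))
    return acc, k, genes


# Toy expression matrix (made up): gene 0 separates the classes,
# genes 1 and 2 are noise.
X = [
    [0.1, 5.0, 2.0], [0.2, 1.0, 9.0], [0.3, 8.0, 4.0], [0.4, 3.0, 7.0],
    [2.1, 4.0, 3.0], [2.2, 9.0, 8.0], [2.3, 2.0, 1.0], [2.4, 7.0, 6.0],
]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

acc, k, genes = select_k_and_genes(X, y, ks=[1, 3], max_genes=2)
print(acc, k, genes)
```

Exhaustive search is only feasible for a handful of genes. With the thousands of genes measured on a microarray, a model-based or stochastic search such as the one the abstract describes would be needed instead.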

Suggested Citation

  • Chakraborty, Sounak, 2009. "Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1462-1474, February.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:4:p:1462-1474

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(08)00472-6
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Bani K. Mallick & Debashis Ghosh & Malay Ghosh, 2005. "Bayesian classification of tumours by using gene expression data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 219-234, April.
    2. Zou, Hui & Yuan, Ming, 2008. "Regularized simultaneous model selection in multiple quantiles regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5296-5304, August.
    3. W. R. Gilks & P. Wild, 1992. "Adaptive Rejection Sampling for Gibbs Sampling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 41(2), pages 337-348, June.
    4. Malay Ghosh & Tapabrata Maiti & Dalho Kim & Sounak Chakraborty & Ashutosh Tewari, 2004. "Hierarchical Bayesian Neural Networks: An Application to a Prostate Cancer Study," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 601-608, January.
    5. Julian Besag & Jeremy York & Annie Mollié, 1991. "Bayesian image restoration, with two applications in spatial statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 43(1), pages 1-20, March.
    6. Belitz, Christiane & Lang, Stefan, 2008. "Simultaneous selection of variables and smoothing parameters in structured additive regression models," Computational Statistics & Data Analysis, Elsevier, vol. 53(1), pages 61-81, September.
    7. Dudoit S. & Fridlyand J. & Speed T. P, 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    8. Naijun Sha & Marina Vannucci & Mahlet G. Tadesse & Philip J. Brown & Ilaria Dragoni & Nick Davies & Tracy C. Roberts & Andrea Contestabile & Mike Salmon & Chris Buckley & Francesco Falciani, 2004. "Bayesian Variable Selection in Multinomial Probit Models to Identify Molecular Signatures of Disease Stage," Biometrics, The International Biometric Society, vol. 60(3), pages 812-819, September.

    Citations


    Cited by:

    1. Nader Salari & Shamarina Shohaimi & Farid Najafi & Meenakshii Nallappan & Isthrinayagy Karishnarajah, 2014. "A Novel Hybrid Classification Model of Genetic Algorithms, Modified k-Nearest Neighbor and Developed Backpropagation Neural Network," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-50, November.
    2. Fraiman, Ricardo & Justel, Ana & Svarc, Marcela, 2010. "Pattern recognition via projection-based kNN rules," Computational Statistics & Data Analysis, Elsevier, vol. 54(5), pages 1390-1403, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chakraborty, Sounak, 2009. "Bayesian binary kernel probit model for microarray based cancer classification and gene selection," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4198-4209, October.
    2. Ngianga-Bakwin Kandala & Chibuzor Christopher Nnanatu & Glory Atilola & Paul Komba & Lubanzadio Mavatikua & Zhuzhi Moore & Gerry Mackie & Bettina Shell-Duncan, 2019. "A Spatial Analysis of the Prevalence of Female Genital Mutilation/Cutting among 0–14-Year-Old Girls in Kenya," IJERPH, MDPI, vol. 16(21), pages 1-28, October.
    3. Katherine A. Guthrie & Lianne Sheppard & Jon Wakefield, 2002. "A Hierarchical Aggregate Data Model with Spatially Correlated Disease Rates," Biometrics, The International Biometric Society, vol. 58(4), pages 898-905, December.
    4. Zhuoqiong He & Dongchu Sun, 2000. "Hierarchical Bayes Estimation of Hunting Success Rates with Spatial Correlations," Biometrics, The International Biometric Society, vol. 56(2), pages 360-367, June.
    5. Lizhen Shen & Hua Jiang & Mingfang He & Guoqing Liu, 2017. "Collaborative representation-based classification of microarray gene expression data," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-14, December.
    6. Katie Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    7. Pang, W. K. & Yang, Z. H. & Hou, S. H. & Leung, P. K., 2002. "Non-uniform random variate generation by the vertical strip method," European Journal of Operational Research, Elsevier, vol. 142(3), pages 595-609, November.
    8. Kubokawa, Tatsuya & Srivastava, Muni S., 2008. "Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 1906-1928, October.
    9. Riccardo (Jack) Lucchetti & Luca Pedini, 2020. "ParMA: Parallelised Bayesian Model Averaging for Generalised Linear Models," Working Papers 2020:28, Department of Economics, University of Venice "Ca' Foscari".
    10. Eibich, Peter & Ziebarth, Nicolas, 2014. "Examining the Structure of Spatial Health Effects in Germany Using Hierarchical Bayes Models," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 49, pages 305-320.
    11. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    12. Luca Scrucca, 2014. "Graphical tools for model-based mixture discriminant analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(2), pages 147-165, June.
    13. Samantha Leorato & Maura Mezzetti, 2015. "Spatial Panel Data Model with error dependence: a Bayesian Separable Covariance Approach," CEIS Research Paper 338, Tor Vergata University, CEIS, revised 09 Apr 2015.
    14. Shreosi Sanyal & Thierry Rochereau & Cara Nichole Maesano & Laure Com-Ruelle & Isabella Annesi-Maesano, 2018. "Long-Term Effect of Outdoor Air Pollution on Mortality and Morbidity: A 12-Year Follow-Up Study for Metropolitan France," IJERPH, MDPI, vol. 15(11), pages 1-8, November.
    15. Mayer Alvo & Jingrui Mu, 2023. "COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models," Mathematics, MDPI, vol. 11(6), pages 1-13, March.
    16. Gil, Guilherme Dôco Roberti & Costa, Marcelo Azevedo & Lopes, Ana Lúcia Miranda & Mayrink, Vinícius Diniz, 2017. "Spatial statistical methods applied to the 2015 Brazilian energy distribution benchmarking model: Accounting for unobserved determinants of inefficiencies," Energy Economics, Elsevier, vol. 64(C), pages 373-383.
    17. Z. Rezaei Ghahroodi & M. Ganjali, 2013. "A Bayesian approach for analysing longitudinal nominal outcomes using random coefficients transitional generalized logit model: an application to the labour force survey data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 40(7), pages 1425-1445, July.
    18. Vanessa Santos-Sánchez & Juan Antonio Córdoba-Doña & Javier García-Pérez & Antonio Escolar-Pujolar & Lucia Pozzi & Rebeca Ramis, 2020. "Cancer Mortality and Deprivation in the Proximity of Polluting Industrial Facilities in an Industrial Region of Spain," IJERPH, MDPI, vol. 17(6), pages 1-15, March.
    19. Berti, Patrizia & Dreassi, Emanuela & Rigo, Pietro, 2014. "Compatibility results for conditional distributions," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 190-203.
    20. Louise Choo & Stephen G. Walker, 2008. "A new approach to investigating spatial variations of disease," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 171(2), pages 395-405, April.
