IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v78y2014icp56-68.html

Bayesian nonparametric classification for spectroscopy data

Author

Listed:
  • Gutiérrez, Luis
  • Gutiérrez-Peña, Eduardo
  • Mena, Ramsés H.

Abstract

High-dimensional spectroscopy data are increasingly common in many fields of science. Building classification models in this context is challenging, due not only to high dimensionality but also to high autocorrelations. A two-stage classification strategy is proposed. First, in a data pre-processing step, the dimensionality of the data is reduced using one of two distinct methods. The output of either method is then fed to a classification procedure that uses a multivariate density estimate from a Bayesian nonparametric mixture model for discrimination purposes. The model employed is based on a random probability measure with decreasing weights. This nonparametric prior is chosen so as to ease the identifiability and label switching problems inherent to these models. This simple and flexible classification strategy is applied to the well-known ‘meat’ data set. The results are similar to or better than those previously reported in the literature for the same data.
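
The two-stage idea in the abstract (reduce dimension, then classify with a class-conditional density estimate) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not name the two reduction methods, so principal component analysis is assumed here, and a single Gaussian per class stands in for the Bayesian nonparametric mixture density estimate. All function names, the simulated spectra, and the choice of three components are hypothetical.

```python
import numpy as np


def make_spectra(rng, n_per_class=60, n_wavelengths=50):
    """Simulate two classes of smooth curves (hypothetical stand-in data)."""
    grid = np.linspace(0.0, 1.0, n_wavelengths)
    base = np.sin(2 * np.pi * grid)
    bump = 0.5 * np.exp(-((grid - 0.6) ** 2) / 0.005)  # class-1 signature
    X0 = base + 0.05 * rng.standard_normal((n_per_class, n_wavelengths))
    X1 = base + bump + 0.05 * rng.standard_normal((n_per_class, n_wavelengths))
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def fit_pca(X, k):
    """Stage 1 (assumed): training mean and top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T


def log_gaussian(Z, mu, cov):
    """Log density of a multivariate normal at each row of Z."""
    d = Z.shape[1]
    diff = Z - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def fit_classifier(X, y, k=3):
    """Stage 2 (stand-in): one Gaussian density per class on the PCA scores."""
    mean, W = fit_pca(X, k)
    Z = (X - mean) @ W
    params = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        # Small ridge keeps the covariance invertible.
        cov = np.atleast_2d(np.cov(Zc, rowvar=False)) + 1e-6 * np.eye(k)
        params[c] = (np.log(len(Zc) / len(Z)), Zc.mean(axis=0), cov)
    return mean, W, params


def predict(model, X):
    """Assign each row to the class with the largest log prior + log density."""
    mean, W, params = model
    Z = (X - mean) @ W
    classes = list(params)
    scores = np.column_stack(
        [params[c][0] + log_gaussian(Z, params[c][1], params[c][2]) for c in classes]
    )
    return np.array(classes)[scores.argmax(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = make_spectra(rng)
    X_test, y_test = make_spectra(rng)
    model = fit_classifier(X_train, y_train, k=3)
    print("held-out accuracy:", np.mean(predict(model, X_test) == y_test))
```

In this sketch the density estimate in `fit_classifier` is the only part that would change if a mixture model were used instead; the reduction step and the Bayes-rule assignment in `predict` would stay the same.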

Suggested Citation

  • Gutiérrez, Luis & Gutiérrez-Peña, Eduardo & Mena, Ramsés H., 2014. "Bayesian nonparametric classification for spectroscopy data," Computational Statistics & Data Analysis, Elsevier, vol. 78(C), pages 56-68.
  • Handle: RePEc:eee:csdana:v:78:y:2014:i:c:p:56-68
    DOI: 10.1016/j.csda.2014.04.010

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947314001170
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Crainiceanu, Ciprian M. & Ruppert, David & Wand, Matthew P., 2005. "Bayesian Analysis for Penalized Spline Regression Using WinBUGS," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 14(i14).
    2. Nema Dean & Thomas Brendan Murphy & Gerard Downey, 2006. "Using unlabelled data to update classification rules with applications in food authenticity studies," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 55(1), pages 1-14, January.
    3. Datta Somnath, 2008. "Classification of Breast Cancer versus Normal Samples from Mass Spectrometry Profiles Using Linear Discriminant Analysis of Important Features Selected by Random Forest," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-14, February.
    4. Hoefsloot Huub C. J. & Smit Suzanne & Smilde Age K., 2008. "A Classification Model for the Leiden Proteomics Competition," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-11, February.
    5. Lee, Jong Soo & Cox, Dennis D., 2010. "Robust smoothing: Smoothing parameter selection and applications to fluorescence spectroscopy," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3131-3143, December.
    6. Fearn Tom, 2008. "Principal Component Discriminant Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-6, February.
    7. Heidema A. Geert & Nagelkerke Nico, 2008. "Developing a Discrimination Rule between Breast Cancer Patients and Controls Using Proteomics Mass Spectrometric Data: A Three-Step Approach," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-11, February.
    8. Luis Gutierrez & Fernando Quintana & Dietrich von Baer & Claudia Mardones, 2011. "Multivariate Bayesian discrimination for varietal authentication of Chilean red wine," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(10), pages 2099-2109.
    9. Chakraborty, Sounak, 2012. "Bayesian multiple response kernel regression model for high dimensional data and its practical applications in near infrared spectroscopy," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2742-2755.
    10. Rolando De la Cruz‐Mesía & Fernando A. Quintana & Peter Müller, 2007. "Semiparametric Bayesian classification with longitudinal markers," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 56(2), pages 119-137, March.
    11. Raftery, Adrian E. & Dean, Nema, 2006. "Variable Selection for Model-Based Clustering," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 168-178, March.
    12. Yao, Weixin & Lindsay, Bruce G., 2009. "Bayesian Mixture Labeling by Highest Posterior Density," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 758-767.
    13. Anjishnu Banerjee & David B. Dunson & Surya T. Tokdar, 2013. "Efficient Gaussian process regression for large datasets," Biometrika, Biometrika Trust, vol. 100(1), pages 75-89.

    Citations



    Cited by:

    1. De Blasi, Pierpaolo & Martínez, Asael Fabian & Mena, Ramsés H. & Prünster, Igor, 2020. "On the inferential implications of decreasing weight structures in mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 147(C).
    2. Pierpaolo De Blasi & Ramsés H. Mena & Igor Prünster, 2022. "Asymptotic behavior of the number of distinct values in a sample from the geometric stick-breaking process," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(1), pages 143-165, February.
    3. Bao, Junshu & Hanson, Timothy E., 2016. "A mean-constrained finite mixture of normals model," Statistics & Probability Letters, Elsevier, vol. 117(C), pages 93-99.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hand David J, 2008. "Breast Cancer Diagnosis from Proteomic Mass Spectrometry Data: A Comparative Evaluation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-23, December.
    2. repec:jss:jstsof:18:i06 is not listed on IDEAS
    3. Morris, Katherine & McNicholas, Paul D., 2016. "Clustering, classification, discriminant analysis, and dimension reduction via generalized hyperbolic mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 97(C), pages 133-150.
    4. Andrews, Jeffrey L. & McNicholas, Paul D. & Subedi, Sanjeena, 2011. "Model-based classification via mixtures of multivariate t-distributions," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 520-529, January.
    5. Gutiérrez, Luis & Mena, Ramsés H. & Ruggiero, Matteo, 2016. "A time dependent Bayesian nonparametric model for air quality analysis," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 161-175.
    6. Cappozzo, Andrea & Greselin, Francesca & Murphy, Thomas Brendan, 2021. "Robust variable selection for model-based learning in presence of adulteration," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    7. Alexandre Rodrigues & Peter Diggle & Renato Assuncao, 2010. "Semiparametric approach to point source modelling in epidemiology and criminology," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(3), pages 533-542, May.
    8. Daeyoung Kim & Bruce Lindsay, 2015. "Empirical identifiability in finite mixture models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(4), pages 745-772, August.
    9. Jerzy Korzeniewski, 2016. "New Method Of Variable Selection For Binary Data Cluster Analysis," Statistics in Transition New Series, Polish Statistical Association, vol. 17(2), pages 295-304, June.
    10. Erin E. Gabriel & Michael J. Daniels & M. Elizabeth Halloran, 2016. "Comparing biomarkers as trial level general surrogates," Biometrics, The International Biometric Society, vol. 72(4), pages 1046-1054, December.
    11. Yao, Weixin & Wei, Yan & Yu, Chun, 2014. "Robust mixture regression using the t-distribution," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 116-127.
    12. Sarah Brown & Pulak Ghosh & Bhuvanesh Pareek & Karl Taylor, 2017. "Financial Hardship and Saving Behaviour: Bayesian Analysis of British Panel Data," Working Papers 2017011, The University of Sheffield, Department of Economics.
    13. Welham, S.J. & Thompson, R., 2009. "A note on bimodality in the log-likelihood function for penalized spline mixed models," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 920-931, February.
    14. Jian Guo & Elizaveta Levina & George Michailidis & Ji Zhu, 2010. "Pairwise Variable Selection for High-Dimensional Model-Based Clustering," Biometrics, The International Biometric Society, vol. 66(3), pages 793-804, September.
    15. Maugis, C. & Celeux, G. & Martin-Magniette, M.-L., 2011. "Variable selection in model-based discriminant analysis," Journal of Multivariate Analysis, Elsevier, vol. 102(10), pages 1374-1387, November.
    16. Ahlquist, John S. & Breunig, Christian, 2009. "Country clustering in comparative political economy," MPIfG Discussion Paper 09/5, Max Planck Institute for the Study of Societies.
    17. Cathy Maugis & Gilles Celeux & Marie-Laure Martin-Magniette, 2009. "Variable Selection for Clustering with Gaussian Mixture Models," Biometrics, The International Biometric Society, vol. 65(3), pages 701-709, September.
    18. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
    19. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    20. Mukhopadhyay, Subhadeep & Ghosh, Anil K., 2011. "Bayesian multiscale smoothing in supervised and semi-supervised kernel discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2344-2353, July.
    21. Wang, Ketong & Porter, Michael D., 2018. "Optimal Bayesian clustering using non-negative matrix factorization," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 395-411.
