Optimal clustering under uncertainty

Authors

  • Lori A Dalton
  • Marco E Benalcázar
  • Edward R Dougherty

Abstract

Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixtures of Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.
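
The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation; it assumes a two-cluster, one-dimensional Gaussian model with unit variance, equal label probabilities, and an uncertainty class made of a finite list of candidate mean pairs with prior weights. The names `effective_posterior`, `partition_error`, and `robust_clusterer` are hypothetical. The "effective" process here is simply the labeling likelihood averaged over the uncertain models, and the clusterer picks the labeling with the smallest expected fraction of misclustered points.

```python
import itertools
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def partition_error(a, b):
    """Fraction of misclustered points between two 2-cluster labelings.

    Cluster labels are arbitrary, so the error is minimized over the
    label swap: a labeling and its flip describe the same partition.
    """
    n = len(a)
    mismatches = sum(ai != bi for ai, bi in zip(a, b))
    return min(mismatches, n - mismatches) / n


def effective_posterior(points, models):
    """Posterior over labelings under the effective process.

    `models` is a list of (prior_weight, (mean_0, mean_1)) pairs
    (an assumed, finite uncertainty class). The likelihood of each
    labeling is averaged over the models, then normalized.
    """
    labelings = list(itertools.product((0, 1), repeat=len(points)))
    weights = []
    for labels in labelings:
        total = 0.0
        for prior, means in models:
            lik = 1.0
            for x, y in zip(points, labels):
                lik *= 0.5 * normal_pdf(x, means[y])
            total += prior * lik
        weights.append(total)
    norm = sum(weights)
    return {lab: w / norm for lab, w in zip(labelings, weights)}


def robust_clusterer(points, models):
    """Return the labeling minimizing expected partition error, and that error."""
    posterior = effective_posterior(points, models)

    def expected_error(candidate):
        return sum(p * partition_error(candidate, truth)
                   for truth, p in posterior.items())

    best = min(posterior, key=expected_error)
    return best, expected_error(best)


# Four points and two equally weighted candidate models (illustrative values).
points = [-2.1, -1.8, 1.9, 2.2]
models = [(0.5, (-2.0, 2.0)), (0.5, (-1.0, 3.0))]
labels, risk = robust_clusterer(points, models)
print(labels, risk)
```

The sketch enumerates all 2^n labelings, so it is only practical for a handful of points; it is meant to show the structure of the computation rather than a usable algorithm.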

Suggested Citation

  • Lori A Dalton & Marco E Benalcázar & Edward R Dougherty, 2018. "Optimal clustering under uncertainty," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-21, October.
  • Handle: RePEc:plo:pone00:0204627
    DOI: 10.1371/journal.pone.0204627

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0204627
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0204627&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mark S. Handcock & Adrian E. Raftery & Jeremy M. Tantrum, 2007. "Model‐based clustering for social networks," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(2), pages 301-354, March.
    2. Wang, Ketong & Porter, Michael D., 2018. "Optimal Bayesian clustering using non-negative matrix factorization," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 395-411.
    3. Im, Yunju & Tan, Aixin, 2021. "Bayesian subgroup analysis in regression using mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 162(C).
    4. Christopher A. Bush & Juhee Lee & Steven N. MacEachern, 2010. "Minimally informative prior distributions for non‐parametric Bayesian analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(2), pages 253-268, March.
    5. L. G. Leon-Novelo & B. Nebiyou Bekele & P. Müller & F. Quintana & K. Wathen, 2012. "Borrowing Strength with Nonexchangeable Priors over Subpopulations," Biometrics, The International Biometric Society, vol. 68(2), pages 550-558, June.
    6. Jim Q. Smith & Paul E. Anderson & Silvia Liverani, 2008. "Separation measures and the geometry of Bayes factor selection for classification," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 957-980, November.
    7. Maura Mezzetti & Daniele Borzelli & Andrea d’Avella, 2022. "A Bayesian approach to model individual differences and to partition individuals: case studies in growth and learning curves," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(5), pages 1245-1271, December.
    8. An Cheng & Tonghui Chen & Guogang Jiang & Xinru Han, 2021. "Can Major Public Health Emergencies Affect Changes in International Oil Prices?," IJERPH, MDPI, vol. 18(24), pages 1-13, December.
    9. Sylvia Frühwirth-Schnatter & Gertraud Malsiner-Walli, 2019. "From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 33-64, March.
    10. Chai Jian & Wang Shubin & Xiao Hao, 2013. "Abrupt Changes of Global Oil Price," Journal of Systems Science and Information, De Gruyter, vol. 1(1), pages 38-59, February.
    11. Rosangela Loschi & Leonardo Bastos & Pilar Iglesias, 2005. "Identifying Volatility Clusters Using the PPM: A Sensitivity Analysis," Computational Economics, Springer;Society for Computational Economics, vol. 24(4), pages 305-319, June.
    12. Raffaele Argiento & Alessandra Guglielmi & Antonio Pievatolo, 2014. "Estimation, prediction and interpretation of NGG random effects models: an application to Kevlar fibre failure times," Statistical Papers, Springer, vol. 55(3), pages 805-826, August.
    13. Athanasios Kottas & Milovan Krnjajić, 2009. "Bayesian Semiparametric Modelling in Quantile Regression," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(2), pages 297-319, June.
    14. Yeonseung Chung & David Dunson, 2011. "The local Dirichlet process," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(1), pages 59-80, February.
    15. Loschi, R.H. & Cruz, F.R.B., 2005. "Extension to the product partition model: computing the probability of a change," Computational Statistics & Data Analysis, Elsevier, vol. 48(2), pages 255-268, February.
    16. Fuentes-García, Ruth & Mena, Ramsés H. & Walker, Stephen G., 2019. "Modal posterior clustering motivated by Hopfield’s network," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 92-100.
    17. Subharup Guha & Rex Jung & David Dunson, 2022. "Predicting phenotypes from brain connection structure," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(3), pages 639-668, June.
    18. Abel Rodríguez & Enrique ter Horst, 2011. "Measuring expectations in options markets: an application to the S&P500 index," Quantitative Finance, Taylor & Francis Journals, vol. 11(9), pages 1393-1405, July.
    19. Sara Wade & Stephen G. Walker & Sonia Petrone, 2014. "A Predictive Study of Dirichlet Process Mixture Models for Curve Fitting," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(3), pages 580-605, September.
    20. Chai, Jian & Lu, Quanying & Hu, Yi & Wang, Shouyang & Lai, Kin Keung & Liu, Hongtao, 2018. "Analysis and Bayes statistical probability inference of crude oil price change point," Technological Forecasting and Social Change, Elsevier, vol. 126(C), pages 271-283.
