Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Authors

  • Cheng Li (Deakin University)
  • Santu Rana (Deakin University)
  • Dinh Phung (Deakin University)
  • Svetha Venkatesh (Deakin University)

Abstract

The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically and therefore has an advantage in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. A pairwise constraint indicates the relationship between two data points and comes in two types: must-link (ML) and cannot-link (CL). We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). C-DPM introduces the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a chunklet-based Gibbs sampler for C-DPM. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. SC-DPM shows superior performance in data clustering. In addition, SC-DPM could potentially be used for short-text clustering.
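
The chunklet idea in the abstract can be illustrated with a short sketch. This is not the authors' implementation; it rests on stated assumptions. Must-link constraints are treated as transitive, so chunklets are the connected components of the must-link graph (computed here with union-find), and point-level cannot-links are lifted to pairs of chunklets. For the sampling step it assumes the standard Chinese restaurant process prior of the DPM, under which a chunklet of size m joins an existing cluster of size n_k with weight Γ(n_k + m)/Γ(n_k) or opens a new cluster with weight αΓ(m). Any cluster that holds a cannot-linked chunklet gets zero weight. All function names are hypothetical, and the data-likelihood factor of the Gibbs conditional is deliberately omitted.

```python
import math
from collections import defaultdict


def build_chunklets(n_points, must_links):
    """Chunklets = connected components of the must-link graph (hypothetical helper, union-find)."""
    parent = list(range(n_points))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    # Number chunklets by their smallest member so the output is deterministic.
    chunklets = sorted(groups.values(), key=min)
    point_to_chunklet = {i: c for c, members in enumerate(chunklets) for i in members}
    return chunklets, point_to_chunklet


def chunklet_cannot_links(cannot_links, point_to_chunklet):
    """Lift point-level cannot-links to chunklet pairs; reject ML/CL contradictions."""
    pairs = set()
    for a, b in cannot_links:
        ca, cb = point_to_chunklet[a], point_to_chunklet[b]
        if ca == cb:
            raise ValueError(f"points {a} and {b} are both must-linked and cannot-linked")
        pairs.add((min(ca, cb), max(ca, cb)))
    return pairs


def chunklet_prior_logweights(c, chunklets, assignment, cl_pairs, alpha):
    """Unnormalized CRP log prior for re-assigning chunklet c (assumed prior; likelihood omitted).

    assignment maps chunklet index -> cluster label; any entry for c itself is ignored.
    The key None stands for "open a new cluster".
    """
    m = len(chunklets[c])
    sizes = defaultdict(int)
    for d, k in assignment.items():
        if d != c:
            sizes[k] += len(chunklets[d])
    # Clusters that contain a chunklet cannot-linked to c are excluded (zero prior weight).
    forbidden = {k for d, k in assignment.items()
                 if d != c and (min(c, d), max(c, d)) in cl_pairs}
    logw = {}
    for k, n_k in sizes.items():
        if k not in forbidden:
            # Gamma(n_k + m) / Gamma(n_k); reduces to n_k when m == 1.
            logw[k] = math.lgamma(n_k + m) - math.lgamma(n_k)
    # New cluster: alpha * Gamma(m); reduces to alpha when m == 1.
    logw[None] = math.log(alpha) + math.lgamma(m)
    return logw
```

For example, with six points and must-links (0, 1), (1, 2) and (3, 4), the chunklets are {0, 1, 2}, {3, 4} and {5}; a cannot-link between points 2 and 5 becomes a constraint between chunklets 0 and 2, so chunklet 2 can never be placed in the cluster that holds chunklet 0. Working with chunklets rather than single points means every must-link is satisfied by construction, and only cannot-links need to be checked during sampling.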

Suggested Citation

  • Cheng Li & Santu Rana & Dinh Phung & Svetha Venkatesh, 2016. "Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering," Annals of Data Science, Springer, vol. 3(2), pages 205-223, June.
  • Handle: RePEc:spr:aodasc:v:3:y:2016:i:2:d:10.1007_s40745-016-0082-z
    DOI: 10.1007/s40745-016-0082-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40745-016-0082-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:ers:journl:v:xxiv:y:2021:i:4b:p:659-667 is not listed on IDEAS
    2. Kim, Junyung & Shah, Asad Ullah Amin & Kang, Hyun Gook, 2020. "Dynamic risk assessment with bayesian network and clustering analysis," Reliability Engineering and System Safety, Elsevier, vol. 201(C).
    3. David G Mets & Michael S Brainard, 2018. "An automated approach to the quantitation of vocalizations and vocal learning in the songbird," PLOS Computational Biology, Public Library of Science, vol. 14(8), pages 1-29, August.
    4. Noah E. Friedkin, 1984. "Structural Cohesion and Equivalence Explanations of Social Homogeneity," Sociological Methods & Research, vol. 12(3), pages 235-261, February.
    5. David Matesanz Gomez & Guillermo J. Ortega & Benno Torgler, 2011. "Measuring globalization: A hierarchical network approach," CREMA Working Paper Series 2011-11, Center for Research in Economics, Management and the Arts (CREMA).
    6. repec:cdl:itsrrp:qt6cb1f85c is not listed on IDEAS
    7. Lisa Price, 2001. "Demystifying farmers' entomological and pest management knowledge: A methodology for assessing the impacts on knowledge from IPM-FFS and NES interventions," Agriculture and Human Values, Springer;The Agriculture, Food, & Human Values Society (AFHVS), vol. 18(2), pages 153-176, June.
    8. Elisa Frutos-Bernal & Ángel Martín del Rey & Irene Mariñas-Collado & María Teresa Santos-Martín, 2022. "An Analysis of Travel Patterns in Barcelona Metro Using Tucker3 Decomposition," Mathematics, MDPI, vol. 10(7), pages 1-17, March.
    9. Geert Soete & Wayne DeSarbo & J. Carroll, 1985. "Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 173-192, December.
    10. Teh, Boon Kin & Goo, Yik Wen & Lian, Tong Wei & Ong, Wei Guang & Choi, Wen Ting & Damodaran, Mridula & Cheong, Siew Ann, 2015. "The Chinese Correction of February 2007: How financial hierarchies change in a market crash," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 225-241.
    11. Yoshio Takane & Forrest Young & Jan Leeuw, 1977. "Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features," Psychometrika, Springer;The Psychometric Society, vol. 42(1), pages 7-67, March.
    12. Wentao Qu & Xianchao Xiu & Huangyue Chen & Lingchen Kong, 2023. "A Survey on High-Dimensional Subspace Clustering," Mathematics, MDPI, vol. 11(2), pages 1-39, January.
    13. Hung Chau & Sarah H Bana & Baptiste Bouvier & Morgan R Frank, 2023. "Connecting higher education to workplace activities and earnings," PLOS ONE, Public Library of Science, vol. 18(3), pages 1-18, March.
    14. Taggart, J. H., 1999. "MNC subsidiary performance, risk, and corporate expectations," International Business Review, Elsevier, vol. 8(2), pages 233-255, April.
    15. Sorin Alexandru Ungureanu & Diana Andreea Mandricel & Bogdan Ioan Coculescu & Ionica Oncioiu, 2020. "Prevention in Dental Medicine. Case Studies and Explanations Regarding the Cost-Benefit Ratio," Academic Journal of Economic Studies, Faculty of Finance, Banking and Accountancy Bucharest,"Dimitrie Cantemir" Christian University Bucharest, vol. 6(2), pages 135-147, June.
    16. Fang, Yixin & Wang, Junhui, 2011. "Penalized cluster analysis with applications to family data," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2128-2136, June.
    17. Xingyin Duan & Xiaobo Wu & Jie Ge & Li Deng & Liang Shen & Jingwen Xu & Xiaoying Xu & Qin He & Yixin Chen & Xuesong Gao & Bing Li, 2024. "A Novel Hierarchical Clustering Sequential Forward Feature Selection Method for Paddy Rice Agriculture Mapping Based on Time-Series Images," Agriculture, MDPI, vol. 14(9), pages 1-20, August.
    18. Simon Blanchard & Wayne DeSarbo, 2013. "A New Zero-Inflated Negative Binomial Methodology for Latent Category Identification," Psychometrika, Springer;The Psychometric Society, vol. 78(2), pages 322-340, April.
    19. Satoru Yokoyama & Atsuho Nakayama & Akinori Okada, 2009. "One-mode three-way overlapping cluster analysis," Computational Statistics, Springer, vol. 24(1), pages 165-179, February.
    20. Vincent S. Tseng & Hsieh-Hui Yu & Shih-Chiang Yang, 2009. "Efficient mining of multilevel gene association rules from microarray and gene ontology," Information Systems Frontiers, Springer, vol. 11(4), pages 433-447, September.
    21. repec:jss:jstsof:35:i07 is not listed on IDEAS
    22. Thomas J. Lampoltshammer & Valerie Albrecht & Corinna Raith, 2021. "Teaching Digital Sustainability in Higher Education from a Transdisciplinary Perspective," Sustainability, MDPI, vol. 13(21), pages 1-21, October.
    23. Sumin Yu & Zhijiao Du & Xuanhua Xu, 2021. "Hierarchical Punishment-Driven Consensus Model for Probabilistic Linguistic Large-Group Decision Making with Application to Global Supplier Selection," Group Decision and Negotiation, Springer, vol. 30(6), pages 1343-1372, December.
