A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints

Authors

Igor Melnykov (University of Minnesota - Duluth)
Volodymyr Melnykov (The University of Alabama)

Abstract

The paper discusses a new approach for incorporating hard constraints into the K-means algorithm for semi-supervised clustering. An analytic modification of the objective function of K-means is proposed that has not been previously considered in the literature.
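
To make the setting concrete, here is a minimal Python sketch of K-means under hard positive (must-link) and negative (cannot-link) constraints. It is not the analytic modification of the objective function proposed in the paper, whose details are not given in this abstract. It is a simple greedy scheme in the style of COP-KMeans, and the function name `constrained_kmeans`, its parameters, and the toy data are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(points, k, must_link=(), cannot_link=(), iters=20, seed=0):
    """Illustrative greedy K-means with hard pairwise constraints (hypothetical helper)."""
    n = len(points)

    # Positive constraints: merge must-linked points into blocks (union-find).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)
    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)

    # Negative constraints: lift cannot-link pairs to the block level.
    conflicts = {r: set() for r in blocks}
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError("a pair is both must-linked and cannot-linked")
        conflicts[ra].add(rb)
        conflicts[rb].add(ra)

    rng = random.Random(seed)
    centers = [list(points[i]) for i in rng.sample(range(n), k)]
    labels = [None] * n

    for _ in range(iters):
        # Assignment step: each block moves as a unit to the nearest
        # cluster not already used by a block it conflicts with.
        block_label = {}
        for r, members in blocks.items():
            banned = {block_label[c] for c in conflicts[r] if c in block_label}
            feasible = [j for j in range(k) if j not in banned]
            if not feasible:
                raise ValueError("greedy pass found no feasible cluster")
            block_label[r] = min(
                feasible,
                key=lambda j: sum(sq_dist(points[i], centers[j]) for i in members),
            )
        new_labels = [block_label[find(i)] for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels

        # Update step: recompute each non-empty cluster's mean.
        for j in range(k):
            cluster = [points[i] for i in range(n) if labels[i] == j]
            if cluster:
                centers[j] = [sum(c) / len(cluster) for c in zip(*cluster)]

    return labels, centers


# Toy data: points 0 and 1 must share a cluster; points 0 and 4 must not.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.2, 0.5)]
labels, centers = constrained_kmeans(toy, k=2, must_link=[(0, 1)], cannot_link=[(0, 4)])
```

Because the assignment pass is greedy and order dependent, this sketch can report infeasibility even when a valid partition exists. That limitation belongs to the illustrative scheme and says nothing about the method in the paper.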

Suggested Citation

Igor Melnykov & Volodymyr Melnykov, 2020. "A Note on the Formal Implementation of the K-means Algorithm with Hard Positive and Negative Constraints," Journal of Classification, Springer;The Classification Society, vol. 37(3), pages 789-809, October.

Handle: RePEc:spr:jclass:v:37:y:2020:i:3:d:10.1007_s00357-019-09349-x
DOI: 10.1007/s00357-019-09349-x

References listed on IDEAS

Marek Śmieja & Magdalena Wiercioch, 2017. "Constrained clustering with a complex cluster structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 493-518, September.
Melnykov, Volodymyr & Chen, Wei-Chen & Maitra, Ranjan, 2012. "MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 51(i12).
Geoffrey Barbier & Reza Zafarani & Huiji Gao & Gabriel Fung & Huan Liu, 2012. "Maximizing benefits from crowdsourced data," Computational and Mathematical Organization Theory, Springer, vol. 18(3), pages 257-279, September.
Wayne DeSarbo & Vijay Mahajan, 1984. "Constrained classification: The use of a priori information in cluster analysis," Psychometrika, Springer;The Psychometric Society, vol. 49(2), pages 187-215, June.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Dolnicar, Sara & Grün, Bettina & Leisch, Friedrich, 2016. "Increasing sample size compensates for data problems in segmentation studies," Journal of Business Research, Elsevier, vol. 69(2), pages 992-999.
Giuseppe RICCIARDO LAMONICA, 2002. "La funzionalita' nelle zone omogenee delle Marche," Working Papers 165, Universita' Politecnica delle Marche (I), Dipartimento di Scienze Economiche e Sociali.
Yu Ding & Wayne S. DeSarbo & Dominique M. Hanssens & Kamel Jedidi & John G. Lynch & Donald R. Lehmann, 2020. "The past, present, and future of measurement and methods in marketing analysis," Marketing Letters, Springer, vol. 31(2), pages 175-186, September.
Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
Renato Cordeiro Amorim, 2016. "A Survey on Feature Weighting Based K-Means Algorithms," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 210-242, July.
J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
Andrea Cerasa, 2016. "Combining homogeneous groups of preclassified observations with application to international trade," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 70(3), pages 229-259, August.
Wayne DeSarbo & Richard Oliver & Arvind Rangaswamy, 1989. "A simulated annealing methodology for clusterwise linear regression," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 707-736, September.
Ana Oliveira-Brochado & Francisco Vitorino Martins, 2008. "Aspectos Metodológicos da Segmentação de Mercado: Base de Segmentação e Métodos de Classificação," FEP Working Papers 261, Universidade do Porto, Faculdade de Economia do Porto.
Antonello Maruotti & Antonio Punzo, 2021. "Initialization of Hidden Markov and Semi‐Markov Models: A Critical Evaluation of Several Strategies," International Statistical Review, International Statistical Institute, vol. 89(3), pages 447-480, December.
Melnykov, Volodymyr, 2013. "On the distribution of posterior probabilities in finite mixture models with application in clustering," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 175-189.
Melnykov, Volodymyr, 2016. "ClickClust: An R Package for Model-Based Clustering of Categorical Sequences," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i09).
Qi Chen & Wen Luo & Gregory J. Palardy & Ryan Glaman & Amber McEnturff, 2017. "The Efficacy of Common Fit Indices for Enumerating Classes in Growth Mixture Models When Nested Data Structure Is Ignored," SAGE Open, vol. 7(1), pages 21582440177, March.
Melnykov, Igor & Melnykov, Volodymyr, 2014. "On K-means algorithm with the use of Mahalanobis distances," Statistics & Probability Letters, Elsevier, vol. 84(C), pages 88-95.
Efthymios Costa & Ioanna Papatsouma & Angelos Markos, 2023. "Benchmarking distance-based partitioning methods for mixed-type data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 701-724, September.
Francesca Torti & Domenico Perrotta & Marco Riani & Andrea Cerioli, 2019. "Assessing trimming methodologies for clustering linear regression data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 227-257, March.
Wedel, Michel & DeSarbo, Wayne S., 1996. "Semiparametric estimation of (constrained) ultrametric trees," Research Report 96B34, University of Groningen, Research Institute SOM (Systems, Organisations and Management).
Mathis Poser & Gerrit C. Küstermann & Navid Tavanapour & Eva A. C. Bittner, 2022. "Design and Evaluation of a Conversational Agent for Facilitating Idea Generation in Organizational Innovation Processes," Information Systems Frontiers, Springer, vol. 24(3), pages 771-796, June.
Xuwen Zhu & Volodymyr Melnykov, 2015. "Probabilistic assessment of model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 395-422, December.
Bairong Wang & Jun Zhuang, 2017. "Crisis information distribution on Twitter: a content analysis of tweets during Hurricane Sandy," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 89(1), pages 161-181, October.

More about this item

Keywords

K-means; Semi-supervised clustering; Hard constraints
