
An extended study of the K-means algorithm for data clustering and its applications

Authors
  • Ja-Shen Chen

    (Yuan-Ze University)

  • Russell K H Ching

    (California State University)

  • Yi-Shen Lin

    (Chinatrust Commercial Bank)

Abstract

The K-means algorithm has been a widely applied clustering technique, especially in marketing research. Despite its popularity and its ability to process large volumes of data quickly and efficiently, K-means has drawbacks, notably limited solution quality and a lack of robustness. In this paper, an extended study of the K-means algorithm is carried out. We propose a new clustering algorithm that integrates the concepts of hierarchical approaches with the K-means algorithm to improve performance in terms of solution quality and robustness. The proposed algorithm and its score function are introduced and discussed in detail. Comparisons with the K-means algorithm and three popular K-means initialization methods on five well-known test data sets are also presented. Finally, a business application involving the segmentation of credit card users demonstrates the algorithm's capability.
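The abstract measures the proposed method against the standard K-means algorithm, whose result can depend on where the initial centroids are placed. For readers who want to see that baseline concretely, the following is a minimal sketch of plain Lloyd-style K-means with random initialization, plus the common remedy of keeping the best of several random restarts. It is an illustration of the baseline only, not the authors' hybrid algorithm or score function, which the abstract does not specify. The function names `kmeans` and `kmeans_restarts` are hypothetical, and using the within-cluster sum of squared errors (SSE) as the quality measure is an assumption of this sketch.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, seed: int = 0, max_iter: int = 100):
    """Baseline Lloyd-style K-means with random initial centroids.

    Returns (centroids, labels, sse). Illustrative only: this is NOT the
    hybrid algorithm proposed in the paper.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                           for d in range(dim)))
            else:
                # Leave an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    # Within-cluster sum of squared errors: lower is better.
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def kmeans_restarts(points: List[Point], k: int, seeds):
    """Run kmeans once per seed and keep the run with the lowest SSE."""
    return min((kmeans(points, k, seed=s) for s in seeds), key=lambda run: run[2])


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centroids, labels, sse = kmeans_restarts(data, k=2, seeds=range(5))
    print(labels, sse)
```

Because each run may stop at a different local optimum depending on its seed, `kmeans_restarts` simply keeps the lowest-SSE run. Dedicated initialization schemes, such as the three the paper compares against, target the same weakness more directly.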

Suggested Citation

  • Ja-Shen Chen & Russell K H Ching & Yi-Shen Lin, 2004. "An extended study of the K-means algorithm for data clustering and its applications," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 55(9), pages 976-987, September.
  • Handle: RePEc:pal:jorsoc:v:55:y:2004:i:9:d:10.1057_palgrave.jors.2601732
    DOI: 10.1057/palgrave.jors.2601732

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/palgrave.jors.2601732
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. A C Yeo & K A Smith & R J Willis & M Brooks, 2002. "A mathematical programming approach to optimise insurance premium pricing within a data mining framework," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 53(11), pages 1197-1203, November.
    2. Paul Green & Jonathan Kim & Frank Carmone, 1990. "A preliminary study of optimal variable weighting in k-means clustering," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 271-285, September.
    3. Glenn Milligan, 1980. "An examination of the effect of six types of error perturbation on fifteen clustering algorithms," Psychometrika, Springer;The Psychometric Society, vol. 45(3), pages 325-342, September.
    4. Lee, Hsuan-Shih, 1999. "Automatic clustering of business processes in business systems planning," European Journal of Operational Research, Elsevier, vol. 114(2), pages 354-362, April.
    5. Balakrishnan, P. V. (Sundar) & Cooper, Martha C. & Jacob, Varghese S. & Lewis, Phillip A., 1996. "Comparative performance of the FSCL neural net and K-means algorithm for market segmentation," European Journal of Operational Research, Elsevier, vol. 93(2), pages 346-357, September.
    6. Glenn Milligan & Martha Cooper, 1985. "An examination of procedures for determining the number of clusters in a data set," Psychometrika, Springer;The Psychometric Society, vol. 50(2), pages 159-179, June.
    7. Hruschka, Harald & Natter, Martin, 1999. "Comparing performance of feedforward neural nets and K-means for cluster-based market segmentation," European Journal of Operational Research, Elsevier, vol. 114(2), pages 346-353, April.
    8. Birgin, E. G. & Martinez, J. M. & Ronconi, D. P., 2003. "Minimization subproblems and heuristics for an applied clustering problem," European Journal of Operational Research, Elsevier, vol. 146(1), pages 19-34, April.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. José Ramón-Cardona & David Daniel Peña-Miranda & María Dolores Sánchez-Fernández, 2021. "Acceptance of Tourist Offers and Territory: Cluster Analysis of Ibiza Residents (Spain)," Land, MDPI, vol. 10(7), pages 1-15, July.
    2. Z Hua & S Li & Z Tao, 2006. "A rule-based risk decision-making approach and its application in China's customs inspection decision," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 57(11), pages 1313-1322, November.
    3. J Kim & J Yang & S Ólafsson, 2009. "An optimization approach to partitional data clustering," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(8), pages 1069-1084, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. YongSeog Kim & W. Nick Street & Gary J. Russell & Filippo Menczer, 2005. "Customer Targeting: A Neural Network Approach Guided by Genetic Algorithms," Management Science, INFORMS, vol. 51(2), pages 264-276, February.
    2. Goethner, Maximilian & Hornuf, Lars & Regner, Tobias, 2021. "Protecting investors in equity crowdfunding: An empirical analysis of the small investor protection act," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    3. Henner Gimpel & Daniel Rau & Maximilian Röglinger, 2018. "Understanding FinTech start-ups – a taxonomy of consumer-oriented service offerings," Electronic Markets, Springer;IIM University of St. Gallen, vol. 28(3), pages 245-264, August.
    4. Michael Brusco & Douglas Steinley, 2015. "Affinity Propagation and Uncapacitated Facility Location Problems," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 443-480, October.
    5. P. (Sundar) Balakrishnan & Martha Cooper & Varghese Jacob & Phillip Lewis, 1994. "A study of the classification capabilities of neural networks using unsupervised learning: A comparison with K-means clustering," Psychometrika, Springer;The Psychometric Society, vol. 59(4), pages 509-525, December.
    6. Alexander, Corinne E. & Wilson, Christine A. & Foley, Daniel H., 2004. "Agricultural Input Market Segments: Who Is Buying?," 2004 Annual meeting, August 1-4, Denver, CO 19997, American Agricultural Economics Association (New Name 2008: Agricultural and Applied Economics Association).
    7. Weinand, J.M. & McKenna, R. & Fichtner, W., 2019. "Developing a municipality typology for modelling decentralised energy systems," Utilities Policy, Elsevier, vol. 57(C), pages 75-96.
    8. Ertl, Antal & Horn, Dániel & Kiss, Hubert János, 2024. "Economic Preferences across Generations and Family Clusters: A Comment," I4R Discussion Paper Series 105, The Institute for Replication (I4R).
    9. Bauer, Hans H. & Fischer, Marc, 2000. "Product life cycle patterns for pharmaceuticals and their impact on R&D profitability of late mover products," International Business Review, Elsevier, vol. 9(6), pages 703-725, December.
    10. Maximilian Goethner & Sebastian Luettig & Tobias Regner, 2021. "Crowdinvesting in entrepreneurial projects: disentangling patterns of investor behavior," Small Business Economics, Springer, vol. 57(2), pages 905-926, August.
    11. Cyril Atkinson-Clement & Eléonore Pigalle, 2021. "What can we learn from Covid-19 pandemic’s impact on human behaviour? The case of France’s lockdown," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-12, December.
    12. Johanna Mair & Julie Battilana & Julian Cardenas, 2012. "Organizing for Society: A Typology of Social Entrepreneuring Models," Journal of Business Ethics, Springer, vol. 111(3), pages 353-373, December.
    13. Mingoti, Sueli A. & Lima, Joab O., 2006. "Comparing SOM neural network with Fuzzy c-means, K-means and traditional hierarchical clustering algorithms," European Journal of Operational Research, Elsevier, vol. 174(3), pages 1742-1759, November.
    14. Douglas Steinley & Michael Brusco, 2008. "Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 125-144, March.
    15. Abba Krieger & Paul Green, 1999. "A cautionary note on using internal cross validation to select the number of clusters," Psychometrika, Springer;The Psychometric Society, vol. 64(3), pages 341-353, September.
    16. Maximilian Goethner & Lars Hornuf & Tobias Regner, 2020. "Protecting Investors in Equity Crowdfunding: An Empirical Analysis of the Small Investor Protection Act," Bremen Papers on Economics & Innovation 2008, University of Bremen, Faculty of Business Studies and Economics.
    17. Chenhong Peng & Yik-Wa Law, 2023. "How Do Consumption Patterns Influence the Discrepancy Between Economic and Subjective Poverty?," Journal of Happiness Studies, Springer, vol. 24(4), pages 1579-1604, April.
    18. Balakrishnan, P. V. (Sundar) & Cooper, Martha C. & Jacob, Varghese S. & Lewis, Phillip A., 1996. "Comparative performance of the FSCL neural net and K-means algorithm for market segmentation," European Journal of Operational Research, Elsevier, vol. 93(2), pages 346-357, September.
    19. Pieter C. Schoonees & Patrick J. F. Groenen & Michel Velden, 2022. "Least-squares bilinear clustering of three-way data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(4), pages 1001-1037, December.
    20. Liu, Pei-chen Barry & Hansen, Mark & Mukherjee, Avijit, 2008. "Scenario-based air traffic flow management: From theory to practice," Transportation Research Part B: Methodological, Elsevier, vol. 42(7-8), pages 685-702, August.
