Balancing effort and benefit of K-means clustering algorithms in Big Data realms

Author

Listed:
  • Joaquín Pérez-Ortega
  • Nelva Nely Almanza-Ortega
  • David Romero

Abstract

In this paper we propose a criterion to balance the processing time and the solution quality of k-means clustering algorithms when applied to instances where the number n of objects is large. Most known strategies for improving the performance of k-means algorithms target the initialization or classification steps. In contrast, our criterion applies to the convergence step: the process stops as soon as the number of objects that change their assigned cluster in an iteration falls below a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03n decreases computing time by a factor of about 4/100, while yielding solutions whose quality is reduced by less than two percent. These findings suggest that our criterion is useful in Big Data realms.
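
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain Lloyd-style k-means in standard-library Python, and the function name kmeans_early_stop, the parameter names, the random initialization and the max_iter safeguard are all assumptions made for illustration. Only the rule itself (stop when fewer than a threshold number of objects change cluster, with the threshold expressed as a fraction of n such as 0.03) comes from the abstract.

```python
import random


def kmeans_early_stop(points, k, threshold_fraction=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means with a change-count stopping rule (illustrative sketch).

    Stops when the number of points that changed cluster in an iteration
    is below threshold_fraction * n, or when no point changed at all.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])
    # Naive random initialization (an assumption; any init strategy would do).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * n
    threshold = threshold_fraction * n
    iterations = 0

    for iterations in range(1, max_iter + 1):
        # Classification step: assign each point to its nearest centroid
        # and count how many points changed their assigned cluster.
        changed = 0
        for i, p in enumerate(points):
            best = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            if best != labels[i]:
                labels[i] = best
                changed += 1

        # Update step: recompute each centroid as the mean of its points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(points, labels):
            counts[lab] += 1
            for d in range(dim):
                sums[lab][d] += p[d]
        for j in range(k):
            if counts[j]:  # leave an empty cluster's centroid where it was
                centroids[j] = [s / counts[j] for s in sums[j]]

        # Convergence step: the criterion from the abstract, plus the
        # classical "no changes" rule so a zero threshold still terminates.
        if changed == 0 or changed < threshold:
            break

    return centroids, labels, iterations
```

Calling kmeans_early_stop(points, k, threshold_fraction=0.0) falls back to the classical rule of stopping only when no object changes cluster, which gives a baseline for comparing iteration counts against the 0.03n threshold on the same data and seed.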

Suggested Citation

  • Joaquín Pérez-Ortega & Nelva Nely Almanza-Ortega & David Romero, 2018. "Balancing effort and benefit of K-means clustering algorithms in Big Data realms," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-19, September.
  • Handle: RePEc:plo:pone00:0201874
    DOI: 10.1371/journal.pone.0201874

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0201874
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0201874&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Iram Parvez & Jianjian Shen & Ishitaq Hassan & Nannan Zhang, 2021. "Generation of Hydro Energy by Using Data Mining Algorithm for Cascaded Hydropower Plant," Energies, MDPI, vol. 14(2), pages 1-28, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gupta, Pankaj & Mittal, Garima & Mehlawat, Mukesh Kumar, 2013. "Expected value multiobjective portfolio rebalancing model with fuzzy parameters," Insurance: Mathematics and Economics, Elsevier, vol. 52(2), pages 190-203.
    2. Weifan Zhong & Lijing Du, 2023. "Predicting Traffic Casualties Using Support Vector Machines with Heuristic Algorithms: A Study Based on Collision Data of Urban Roads," Sustainability, MDPI, vol. 15(4), pages 1-18, February.
    3. Zhang, Yue & Zhang, Qi & Farnoosh, Arash & Chen, Siyuan & Li, Yan, 2019. "GIS-Based Multi-Objective Particle Swarm Optimization of charging stations for electric vehicles," Energy, Elsevier, vol. 169(C), pages 844-853.
    4. J. Octavio Gutierrez-Garcia & Kwang Mong Sim, 2012. "GA-based cloud resource estimation for agent-based execution of bag-of-tasks applications," Information Systems Frontiers, Springer, vol. 14(4), pages 925-951, September.
    5. Ahmadi, Mohammad H. & Amin Nabakhteh, Mohammad & Ahmadi, Mohammad-Ali & Pourfayaz, Fathollah & Bidi, Mokhtar, 2017. "Investigation and optimization of performance of nano-scale Stirling refrigerator using working fluid as Maxwell–Boltzmann gases," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 483(C), pages 337-350.
    6. Hausken, Kjell & Levitin, Gregory, 2009. "Minmax defense strategy for complex multi-state systems," Reliability Engineering and System Safety, Elsevier, vol. 94(2), pages 577-587.
    7. Akhlaque Ahmad Khan & Ahmad Faiz Minai & Rupendra Kumar Pachauri & Hasmat Malik, 2022. "Optimal Sizing, Control, and Management Strategies for Hybrid Renewable Energy Systems: A Comprehensive Review," Energies, MDPI, vol. 15(17), pages 1-29, August.
    8. Alarcon-Rodriguez, Arturo & Ault, Graham & Galloway, Stuart, 2010. "Multi-objective planning of distributed energy resources: A review of the state-of-the-art," Renewable and Sustainable Energy Reviews, Elsevier, vol. 14(5), pages 1353-1366, June.
    9. Prina, Matteo Giacomo & Lionetti, Matteo & Manzolini, Giampaolo & Sparber, Wolfram & Moser, David, 2019. "Transition pathways optimization methodology through EnergyPLAN software for long-term energy planning," Applied Energy, Elsevier, vol. 235(C), pages 356-368.
    10. Janssens, Jochen & Van den Bergh, Joos & Sörensen, Kenneth & Cattrysse, Dirk, 2015. "Multi-objective microzone-based vehicle routing for courier companies: From tactical to operational planning," European Journal of Operational Research, Elsevier, vol. 242(1), pages 222-231.
    11. H. Liao & Q. Wu, 2013. "Multi-objective optimization by learning automata," Journal of Global Optimization, Springer, vol. 55(2), pages 459-487, February.
    12. Huan Yu & Jun Yang & Yu Zhao, 2018. "Reliability of nonrepairable phased-mission systems with common bus performance sharing," Journal of Risk and Reliability, vol. 232(6), pages 647-660, December.
    13. Li, Yuqiang & Liu, Gang & Liu, Xianping & Liao, Shengming, 2016. "Thermodynamic multi-objective optimization of a solar-dish Brayton system based on maximum power output, thermal efficiency and ecological performance," Renewable Energy, Elsevier, vol. 95(C), pages 465-473.
    14. Ahmadi, Mohammad H. & Ahmadi, Mohammad-Ali & Maleki, Akbar & Pourfayaz, Fathollah & Bidi, Mokhtar & Açıkkalp, Emin, 2017. "Exergetic sustainability evaluation and multi-objective optimization of performance of an irreversible nanoscale Stirling refrigeration cycle operating with Maxwell–Boltzmann gas," Renewable and Sustainable Energy Reviews, Elsevier, vol. 78(C), pages 80-92.
    15. Nayara R. M. Sakiyama & Joyce C. Carlo & Leonardo Mazzaferro & Harald Garrecht, 2021. "Building Optimization through a Parametric Design Platform: Using Sensitivity Analysis to Improve a Radial-Based Algorithm Performance," Sustainability, MDPI, vol. 13(10), pages 1-25, May.
    16. Abokersh, Mohamed Hany & Vallès, Manel & Cabeza, Luisa F. & Boer, Dieter, 2020. "A framework for the optimal integration of solar assisted district heating in different urban sized communities: A robust machine learning approach incorporating global sensitivity analysis," Applied Energy, Elsevier, vol. 267(C).
    17. Nizami, M.S.H. & Hossain, M.J. & Amin, B.M. Ruhul & Fernandez, Edstan, 2020. "A residential energy management system with bi-level optimization-based bidding strategy for day-ahead bi-directional electricity trading," Applied Energy, Elsevier, vol. 261(C).
    18. Y Xu & R Qu, 2011. "Solving multi-objective multicast routing problems by evolutionary multi-objective simulated annealing algorithms with variable neighbourhoods," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(2), pages 313-325, February.
    19. Ahmadi, Mohammad H. & Ahmadi, Mohammad Ali & Pourfayaz, Fathollah & Hosseinzade, Hadi & Acıkkalp, Emin & Tlili, Iskander & Feidt, Michel, 2016. "Designing a powered combined Otto and Stirling cycle power plant through multi-objective optimization approach," Renewable and Sustainable Energy Reviews, Elsevier, vol. 62(C), pages 585-595.
    20. Goerigk, Marc & Deghdak, Kaouthar & Heßler, Philipp, 2014. "A comprehensive evacuation planning model and genetic solution algorithm," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 71(C), pages 82-97.
