Printed from https://ideas.repec.org/a/plo/pone00/0297355.html

The least sample size essential for detecting changes in clustering solutions of streaming datasets

Authors

  • Muhammad Atif
  • Muhammad Farooq
  • Mohammad Abiad
  • Muhammad Shafiq

Abstract

The clustering analysis approach treats multivariate data tuples as objects and groups them into clusters based on their similarities or dissimilarities within the dataset. In the modern world, however, a significant volume of data is continuously generated from diverse sources over time. In these dynamic scenarios, the data are not static but evolve continually. Consequently, the patterns of interest and the inherent subgroups within the datasets also change and develop over time. Researchers have paid special attention to monitoring changes in the cluster solutions of evolving streams, and several algorithms have been proposed in the literature for this purpose. However, to date, no study has examined the effect of variability in cluster sizes on the evolution of cluster solutions, and no guidance is available on how cluster sizes affect the types of change that clusters undergo in streams. In the present simulation study using artificial datasets, the evolution of clusters is examined with respect to the variability in cluster sizes. The findings matter because tracing and monitoring changes in clustering solutions has a wide range of applications across fields of research. This study determines the minimum sample size required to detect changes when clustering time-stamped datasets.
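As a hedged illustration of what "monitoring changes in cluster solutions" can mean in practice, the minimal Python sketch below compares two cluster solutions taken at consecutive time points and labels each earlier cluster as survived, absorbed, split, or disappeared using a simple overlap rule in the spirit of overlap-based transition frameworks such as MONIC. It is not the authors' method. The function names `group`, `overlap`, and `external_transitions`, the uniform object weights (no ageing of old objects), the thresholds `tau=0.5` and `tau_split=0.25`, and the toy object ids are all illustrative assumptions.

```python
from collections import defaultdict


def group(labels):
    """Turn a {object_id: cluster_label} mapping into {cluster_label: set of object ids}."""
    clusters = defaultdict(set)
    for obj, lab in labels.items():
        clusters[lab].add(obj)
    return dict(clusters)


def overlap(old_members, new_members):
    """Share of an old cluster's objects found in a new cluster (uniform weights, no ageing)."""
    return len(old_members & new_members) / len(old_members)


def external_transitions(labels_t1, labels_t2, tau=0.5, tau_split=0.25):
    """Label each cluster at t1 as survived, absorbed, split, or disappeared at t2.

    Simplified, hypothetical rule: a t1 cluster "matches" the t2 cluster it
    overlaps most, provided that overlap is at least tau.
    """
    old, new = group(labels_t1), group(labels_t2)

    # Best-matching t2 cluster for every t1 cluster, kept only if overlap >= tau.
    match = {}
    for x, members in old.items():
        best = max(new, key=lambda y: overlap(members, new[y]), default=None)
        if best is not None and overlap(members, new[best]) >= tau:
            match[x] = best

    # Record which t1 clusters were matched to each t2 cluster.
    hits = defaultdict(list)
    for x, y in match.items():
        hits[y].append(x)

    result = {}
    for x, members in old.items():
        if x in match:
            # One-to-one match means survival; several t1 clusters in one t2 cluster means absorption.
            result[x] = "survived" if len(hits[match[x]]) == 1 else "absorbed"
        else:
            # No single match: call it a split if several t2 clusters each hold at least
            # tau_split of the old cluster and together cover at least tau; else it disappeared.
            parts = [y for y in new if overlap(members, new[y]) >= tau_split]
            covered = sum(overlap(members, new[y]) for y in parts)
            result[x] = "split" if len(parts) > 1 and covered >= tau else "disappeared"
    return result


# Toy example with made-up object ids: cluster labels at two time points.
t1 = {**{i: "A" for i in range(1, 5)}, **{i: "B" for i in range(5, 11)},
      **{i: "C" for i in range(11, 14)}, 14: "D", 15: "D", 16: "E", 17: "E"}
t2 = {**{i: "P" for i in range(1, 5)}, 5: "Q", 6: "Q", 7: "R", 8: "R", 9: "T", 10: "T",
      20: "S", 21: "S", **{i: "U" for i in range(14, 18)}}  # objects 11-13 have expired

print(external_transitions(t1, t2))
# {'A': 'survived', 'B': 'split', 'C': 'disappeared', 'D': 'absorbed', 'E': 'absorbed'}
```

Under this kind of rule each object contributes 1/|X| to the overlap of a cluster X, so small clusters are fragile. Reassigning one object of a two-object cluster moves its overlap by 0.5, whereas the same move in a 20-object cluster changes it by only 0.05. This gives one intuitive reason why cluster size can influence which transition is detected.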

Suggested Citation

  • Muhammad Atif & Muhammad Farooq & Mohammad Abiad & Muhammad Shafiq, 2024. "The least sample size essential for detecting changes in clustering solutions of streaming datasets," PLOS ONE, Public Library of Science, vol. 19(2), pages 1-14, February.
  • Handle: RePEc:plo:pone00:0297355
    DOI: 10.1371/journal.pone.0297355

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0297355
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0297355&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0297355?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yixuan Wang & Jianzhu Li & Ping Feng & Rong Hu, 2015. "A Time-Dependent Drought Index for Non-Stationary Precipitation Series," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 29(15), pages 5631-5647, December.
    2. Monica Auteri & Alessandro Cremaschini, 2024. "Ownership or procurement, which matters? exploring asymmetries in local public transportation in Italy through a semi-parametric approach," The Journal of Economic Asymmetries, Elsevier, vol. 30(C).
    3. Nathaniel Geiger & Bryan McLaughlin & John Velez, 2021. "Not all boomers: temporal orientation explains inter- and intra-cultural variability in the link between age and climate engagement," Climatic Change, Springer, vol. 166(1), pages 1-20, May.
    4. Efstathios Panayi & Gareth W. Peters & Jon Danielsson & Jean-Pierre Zigrand, 2018. "Designating market maker behaviour in limit order book markets," Econometrics and Statistics, Elsevier, vol. 5(C), pages 20-44.
    5. Gauss Cordeiro & Josemar Rodrigues & Mário Castro, 2012. "The exponential COM-Poisson distribution," Statistical Papers, Springer, vol. 53(3), pages 653-664, August.
    6. Christian Kleiber & Achim Zeileis, 2016. "Visualizing Count Data Regressions Using Rootograms," The American Statistician, Taylor & Francis Journals, vol. 70(3), pages 296-303, July.
    7. Shu Chen & Dongguo Shao & Xuezhi Tan & Wenquan Gu & Caixiu Lei, 2017. "An interval multistage classified model for regional inter- and intra-seasonal water management under uncertain and nonstationary condition," Agricultural Water Management, Elsevier, vol. 191(C), pages 98-112.
    8. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    9. Matteo Malavasi & Gareth W. Peters & Pavel V. Shevchenko & Stefan Truck & Jiwook Jang & Georgy Sofronov, 2021. "Cyber Risk Frequency, Severity and Insurance Viability," Papers 2111.03366, arXiv.org, revised Mar 2022.
    10. Joanna Baj-Korpak & Marian Jan Stelmach & Kamil Zaworski & Piotr Lichograj & Marek Wochna, 2022. "Assessment of Motor Abilities and Physical Fitness in Youth in the Context of Talent Identification—OSF Test," IJERPH, MDPI, vol. 19(21), pages 1-19, November.
    11. Simon Hirsch, 2025. "Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting," Papers 2504.02518, arXiv.org.
    12. Youxin Wang & Tao Peng & Qingxia Lin & Vijay P. Singh & Xiaohua Dong & Chen Chen & Ji Liu & Wenjuan Chang & Gaoxu Wang, 2022. "A New Non-stationary Hydrological Drought Index Encompassing Climate Indices and Modified Reservoir Index as Covariates," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(7), pages 2433-2454, May.
    13. Lucio Masserini & Matilde Bini & Monica Pratesi, 2017. "Effectiveness of non-selective evaluation test scores for predicting first-year performance in university career: a zero-inflated beta regression approach," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(2), pages 693-708, March.
    14. Kerstin Hötte, 2023. "Demand-pull, technology-push, and the direction of technological change," Research Policy, Elsevier, vol. 52(5).
    15. Simon N. Wood & Natalya Pya & Benjamin Säfken, 2016. "Smoothing Parameter and Model Selection for General Smooth Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1548-1563, October.
    16. Guojun Wu & Ge Song & Xiaoxiang Lv & Shikai Luo & Chengchun Shi & Hongtu Zhu, 2023. "DNet: distributional network for distributional individualized treatment effects," LSE Research Online Documents on Economics 122895, London School of Economics and Political Science, LSE Library.
    17. Dominique Guegan & Bertrand K. Hassani, 2011. "Operational risk: a Basel II++ step before Basel III," Documents de travail du Centre d'Economie de la Sorbonne 11053, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    18. Edward N.C. Tong & Christophe Mues & Lyn Thomas, 2013. "A zero-adjusted gamma model for mortgage loan loss given default," International Journal of Forecasting, Elsevier, vol. 29(4), pages 548-562.
    19. Alexander Silbersdorff & Kai Sebastian Schneider, 2019. "Distributional Regression Techniques in Socioeconomic Research on the Inequality of Health with an Application on the Relationship between Mental Health and Income," IJERPH, MDPI, vol. 16(20), pages 1-28, October.
    20. Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.