Determining the number of clusters, before finding clusters, from the susceptibility of the similarity matrix

Author

Listed:
  • Lippiello, E.
  • Baccari, S.
  • Bountzis, P.

Abstract

Clustering is a fundamental procedure for extracting meaningful insights from a data set. The quality of the resulting clusters depends largely on the correct estimate of their number, K∗, which many clustering algorithms require as an input parameter. Only a few techniques detect K∗ automatically, and they are usually based on cluster validity indexes, which are expensive in computation time. Here, we present a new algorithm that yields an accurate estimate of K∗ without partitioning the data into clusters. This makes the algorithm particularly efficient, in both time and space complexity, for large-scale data sets. The algorithm highlights the block structure implicitly present in the similarity matrix and associates K∗ with the number of blocks in the matrix. We test the algorithm on synthetic data sets with and without a hierarchical organization of elements. We explore a wide range of K∗ and show that the proposed algorithm identifies K∗ more accurately than existing methods based on standard internal validity indexes, with a large advantage in computation time and memory storage. We also discuss the application of the new algorithm to the de-clustering of instrumental earthquake catalogs, a procedure aimed at identifying the level of background seismic activity, which is useful for seismic hazard assessment.
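The idea of reading K∗ off the block structure of a similarity matrix, without assigning points to clusters, can be illustrated with a small sketch. This is not the susceptibility-based algorithm of the paper, whose details the abstract does not give. As a stand-in it uses the well-known eigengap heuristic on the normalized graph Laplacian. The function names (`make_blobs`, `similarity_matrix`, `estimate_num_blocks`), the Gaussian similarity, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def make_blobs(centers, n_per_cluster, spread, seed=0):
    """Hypothetical helper: Gaussian blobs around the given centers."""
    rng = np.random.default_rng(seed)
    blobs = []
    for c in centers:
        c = np.asarray(c, dtype=float)
        noise = spread * rng.standard_normal((n_per_cluster, c.size))
        blobs.append(c + noise)
    return np.vstack(blobs)


def similarity_matrix(points, scale):
    """Gaussian similarity exp(-d^2 / (2 scale^2)); an assumed choice."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * scale ** 2))


def estimate_num_blocks(sim):
    """Eigengap heuristic (stand-in, not the paper's method).

    For a block-structured similarity matrix, the normalized Laplacian
    has one near-zero eigenvalue per block, so the position of the
    largest gap in the sorted spectrum estimates the number of blocks.
    No point is ever assigned to a cluster.
    """
    n = sim.shape[0]
    degree = sim.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(n) - d_inv_sqrt[:, None] * sim * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1


if __name__ == "__main__":
    centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
    points = make_blobs(centers, n_per_cluster=30, spread=0.5)
    sim = similarity_matrix(points, scale=1.0)
    print("estimated number of clusters:", estimate_num_blocks(sim))
```

When the blobs are well separated, as in this synthetic example, the similarity between different blobs is essentially zero and the number of near-zero Laplacian eigenvalues matches the number of blobs. Unlike the approach described in the abstract, this sketch needs a full eigendecomposition, so it does not share the time and memory advantages claimed for the paper's algorithm.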

Suggested Citation

  • Lippiello, E. & Baccari, S. & Bountzis, P., 2023. "Determining the number of clusters, before finding clusters, from the susceptibility of the similarity matrix," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 616(C).
  • Handle: RePEc:eee:phsmap:v:616:y:2023:i:c:s0378437123001474
    DOI: 10.1016/j.physa.2023.128592



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. van den Hengel, G. & Franses, Ph.H.B.F., 2018. "Forecasting social conflicts in Africa using an Epidemic Type Aftershock Sequence model," Econometric Institute Research Papers EI2018-31, Erasmus University Rotterdam, Erasmus School of Economics (ESE), Econometric Institute.
    2. Chenlong Li & Zhanjie Song & Wenjun Wang, 2020. "Space–time inhomogeneous background intensity estimators for semi-parametric space–time self-exciting point process models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(4), pages 945-967, August.
    3. Ying Song & Harvey Miller, 2012. "Exploring traffic flow databases using space-time plots and data cubes," Transportation, Springer, vol. 39(2), pages 215-234, March.
    4. Chhotu Kumar Keshri & William Kumar Mohanty & Pratul Ranjan, 2020. "Probabilistic seismic hazard assessment for some parts of the Indo-Gangetic plains, India," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 103(1), pages 815-843, August.
    5. Jiaqi Zhang & Xijun He, 2023. "Earthquake magnitude prediction using a VMD-BP neural network model," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 117(1), pages 189-205, May.
    6. Giada Adelfio & Marcello Chiodi, 2021. "Including covariates in a space-time point process with application to seismicity," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 947-971, September.
    7. Gresnigt, Francine & Kole, Erik & Franses, Philip Hans, 2015. "Interpreting financial market crashes as earthquakes: A new Early Warning System for medium term crashes," Journal of Banking & Finance, Elsevier, vol. 56(C), pages 123-139.
    8. Rachele Foschi & Francesca Lilla & Cecilia Mancini, 2020. "Warnings about future jumps: properties of the exponential Hawkes model," Working Papers 13/2020, University of Verona, Department of Economics.
    9. D'Angelo, Nicoletta & Adelfio, Giada & Mateu, Jorge, 2023. "Locally weighted minimum contrast estimation for spatio-temporal log-Gaussian Cox processes," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    10. Gilian van den Hengel & Philip Hans Franses, 2020. "Forecasting Social Conflicts in Africa Using an Epidemic Type Aftershock Sequence Model," Forecasting, MDPI, vol. 2(3), pages 1-25, August.
    11. Nader Davoudi & Hamid Reza Tavakoli & Mehdi Zare & Abdollah Jalilian, 2020. "Aftershock probabilistic seismic hazard analysis for Bushehr province in Iran using ETAS model," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 100(3), pages 1159-1170, February.
    12. V. Filimonov & D. Sornette, 2015. "Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data," Quantitative Finance, Taylor & Francis Journals, vol. 15(8), pages 1293-1314, August.
    13. Vladimir Filimonov & Didier Sornette, 2013. "Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data," Papers 1308.6756, arXiv.org, revised Jul 2014.
    14. Rachele Foschi, 2021. "Measuring Discrepancies Between Poisson and Exponential Hawkes Processes," Methodology and Computing in Applied Probability, Springer, vol. 23(1), pages 219-239, March.
    15. Sankar Kumar Nath & Suman Mandal & Manik Adhikari & Soumya Kanti Maiti, 2017. "A unified earthquake catalogue for South Asia covering the period 1900–2014," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 85(3), pages 1787-1810, February.
    16. Baichuan Yuan & Frederic P. Schoenberg & Andrea L. Bertozzi, 2021. "Fast estimation of multivariate spatiotemporal Hawkes processes and network reconstruction," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(6), pages 1127-1152, December.
