IDEAS home Printed from https://ideas.repec.org/a/spr/compst/v33y2018i4d10.1007_s00180-018-0791-1.html

ClustGeo: an R package for hierarchical clustering with spatial constraints

Author

Listed:
  • Marie Chavent

    (Université de Bordeaux)

  • Vanessa Kuentz-Simonet

    (IRSTEA)

  • Amaury Labenne

    (IRSTEA)

  • Jérôme Saracco

    (ENSC - Bordeaux INP)

Abstract

In this paper, we propose a Ward-like hierarchical clustering algorithm that includes spatial/geographical constraints. Two dissimilarity matrices $D_0$ and $D_1$ are taken as input, along with a mixing parameter $\alpha \in [0,1]$. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the “feature space” and the second matrix gives the dissimilarities in the “constraint space”. The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with $D_0$ and the homogeneity criterion calculated with $D_1$. The idea is then to determine a value of $\alpha$ that increases the spatial contiguity without deteriorating too much the quality of the solution based on the variables of interest, i.e. those of the feature space. This procedure is illustrated on a real dataset using the R package ClustGeo.
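
The mixing idea in the abstract can be illustrated with a small, unoptimized Python sketch. It is not the ClustGeo implementation and does not use the package. It assumes that the homogeneity criterion is a Ward-like pseudo-inertia computed directly from dissimilarities, and that the merge cost is weighted $(1-\alpha)$ on $D_0$ and $\alpha$ on $D_1$, which is consistent with the abstract's statement that $\alpha$ increases spatial contiguity. The function names (`pseudo_inertia`, `merge_cost`, `mixed_ward`) and the toy data are hypothetical.

```python
from itertools import combinations

def pseudo_inertia(cluster, D, w):
    # Assumed Ward-like homogeneity of a cluster, computed from dissimilarities:
    # sum over i, j in the cluster of w_i * w_j * d_ij^2, divided by (2 * total weight).
    mu = sum(w[i] for i in cluster)
    total = sum(w[i] * w[j] * D[i][j] ** 2 for i in cluster for j in cluster)
    return total / (2.0 * mu)

def merge_cost(a, b, D0, D1, w, alpha):
    # Increase in pseudo-inertia caused by merging a and b, mixed as a
    # convex combination: (1 - alpha) on D0 (features), alpha on D1 (constraint).
    def delta(D):
        return (pseudo_inertia(a + b, D, w)
                - pseudo_inertia(a, D, w)
                - pseudo_inertia(b, D, w))
    return (1.0 - alpha) * delta(D0) + alpha * delta(D1)

def mixed_ward(D0, D1, alpha, k, w=None):
    # Naive agglomeration: repeatedly merge the pair with the smallest mixed
    # cost until k clusters remain. Weights default to uniform.
    n = len(D0)
    w = w or [1.0 / n] * n
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: merge_cost(p[0], p[1], D0, D1, w, alpha))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(sorted(a + b))
    return sorted(clusters)

# Toy data (hypothetical): observations 0 and 2 share a feature value,
# while observations 0 and 1 are spatial neighbours.
feature = [0, 1, 0, 1]
position = [0, 1, 10, 11]
D0 = [[abs(x - y) for y in feature] for x in feature]
D1 = [[abs(x - y) for y in position] for x in position]

print(mixed_ward(D0, D1, alpha=0.0, k=2))  # [[0, 2], [1, 3]]: feature space only
print(mixed_ward(D0, D1, alpha=1.0, k=2))  # [[0, 1], [2, 3]]: constraint space only
```

Intermediate values of `alpha` trade feature homogeneity against spatial contiguity. In practice the two dissimilarity matrices would likely need to be put on comparable scales before mixing; that step is omitted here.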

Suggested Citation

  • Marie Chavent & Vanessa Kuentz-Simonet & Amaury Labenne & Jérôme Saracco, 2018. "ClustGeo: an R package for hierarchical clustering with spatial constraints," Computational Statistics, Springer, vol. 33(4), pages 1799-1822, December.
  • Handle: RePEc:spr:compst:v:33:y:2018:i:4:d:10.1007_s00180-018-0791-1
    DOI: 10.1007/s00180-018-0791-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-018-0791-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Anuška Ferligoj & Vladimir Batagelj, 1982. "Clustering with relational constraint," Psychometrika, Springer;The Psychometric Society, vol. 47(4), pages 413-426, December.
    2. Mónica Bécue-Bertaut & Belchin Kostov & Annie Morin & Guilhem Naro, 2014. "Rhetorical Strategy in Forensic Speeches: Multidimensional Statistics-Based Methodology," Journal of Classification, Springer;The Classification Society, vol. 31(1), pages 85-106, April.
    3. Gordon, A. D., 1996. "A survey of constrained classification," Computational Statistics & Data Analysis, Elsevier, vol. 21(1), pages 17-29, January.
    4. Trudie Strauss & Michael Johan von Maltitz, 2017. "Generalising Ward’s Method for Use with Manhattan Distances," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-21, January.

    Citations



    Cited by:

    1. Pablo Quintana, 2022. "Una metodología de clustering para agrupar series temporales en regiones contiguas," Asociación Argentina de Economía Política: Working Papers 4589, Asociación Argentina de Economía Política.
    2. Mattera, Raffaele & Franses, Philip Hans, 2023. "Are African business cycles synchronized? Evidence from spatio-temporal modeling," Economic Modelling, Elsevier, vol. 128(C).
    3. Dalila Camêlo Aguiar & Ramón Gutiérrez Sánchez & Edwirde Luiz Silva Camêlo, 2020. "Hierarchical Clustering with Spatial Constraints and Standardized Incidence Ratio in Tuberculosis Data," Mathematics, MDPI, vol. 8(9), pages 1-12, September.
    4. Deb, Soudeep & Karmakar, Sayar, 2023. "A novel spatio-temporal clustering algorithm with applications on COVID-19 data from the United States," Computational Statistics & Data Analysis, Elsevier, vol. 188(C).
    5. Facundo Sigal & Jorge Camusso & Ana Inés Navarro, 2022. "Argentine regions based on dynamic criteria," Asociación Argentina de Economía Política: Working Papers 4600, Asociación Argentina de Economía Política.
    6. Meifang Chen & Yongwan Chun & Daniel A. Griffith, 2023. "Delineating Housing Submarkets Using Space–Time House Sales Data: Spatially Constrained Data-Driven Approaches," JRFM, MDPI, vol. 16(6), pages 1-17, June.
    7. Mello, Kaline de & Fendrich, Arthur Nicolaus & Borges-Matos, Clarice & Brites, Alice Dantas & Tavares, Paulo André & da Rocha, Gustavo Casoni & Matsumoto, Marcelo & Rodrigues, Ricardo Ribeiro & Joly, , 2021. "Integrating ecological equivalence for native vegetation compensation: A methodological approach," Land Use Policy, Elsevier, vol. 108(C).
    8. Pablo Aníbal Quintana, 2021. "Métodos de clustering espacialmente restringidos: Un análisis al agrupamiento por nivel de estudio en la provincia de Mendoza," Asociación Argentina de Economía Política: Working Papers 4510, Asociación Argentina de Economía Política.
    9. Nathanaël Randriamihamison & Nathalie Vialaneix & Pierre Neuvial, 2021. "Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 363-389, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rui Fragoso & Conceição Rego & Vladimir Bushenkov, 2016. "Clustering of Territorial Areas: A Multi-Criteria Districting Problem," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 14(2), pages 179-198, December.
    2. Juan Carlos Duque & Raúl Ramos & Jordi Suriñach, 2007. "Supervised Regionalization Methods: A Survey," International Regional Science Review, vol. 30(3), pages 195-220, July.
    3. Nathanaël Randriamihamison & Nathalie Vialaneix & Pierre Neuvial, 2021. "Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 363-389, July.
    4. Juan Carlos Duque & Raúl Ramos, 2004. "Design of homogenous territorial units: a methodological proposal," ERSA conference papers ersa04p6, European Regional Science Association.
    5. G. Damiana Costanzo, 2001. "A constrained k-means clustering algorithm for classifying spatial units," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 10(1), pages 237-256, January.
    6. Recchia, Anthony, 2010. "Contiguity-Constrained Hierarchical Agglomerative Clustering Using SAS," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(c02).
    7. Dalila Camêlo Aguiar & Ramón Gutiérrez Sánchez & Edwirde Luiz Silva Camêlo, 2020. "Hierarchical Clustering with Spatial Constraints and Standardized Incidence Ratio in Tuberculosis Data," Mathematics, MDPI, vol. 8(9), pages 1-12, September.
    8. Abang Zainoren Abang Abdurahman & Syerina Azlin Md Nasir & Wan Fairos Wan Yaacob & Serah Jaya & Suhaili Mokhtar, 2021. "Spatio-Temporal Clustering of Sarawak Malaysia Total Protected Area Visitors," Sustainability, MDPI, vol. 13(21), pages 1-19, October.
    9. Iwona Bąk & Anna Barwińska-Małajowicz & Grażyna Wolska & Paweł Walawender & Paweł Hydzik, 2021. "Is the European Union Making Progress on Energy Decarbonisation While Moving towards Sustainable Development?," Energies, MDPI, vol. 14(13), pages 1-18, June.
    10. Giuseppe Giordano & Maria Vitale, 2011. "On the use of external information in social network analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(2), pages 95-112, July.
    11. repec:jss:jstsof:33:c02 is not listed on IDEAS
    12. Juan Carlos Duque & Raul Ramos Lobo & Manuel Artis Ortuno, 2004. "Spanish unemployment: Normative versus analytical regionalisation procedures," Working Papers in Economics 118, Universitat de Barcelona. Espai de Recerca en Economia.
    13. Guidi, Lionel & Ibanez, Frédéric & Calcagno, Vincent & Beaugrand, Grégory, 2009. "A new procedure to optimize the selection of groups in a classification tree: Applications for ecological data," Ecological Modelling, Elsevier, vol. 220(4), pages 451-461.
    14. Schnettler, Berta & Grunert, Klaus G. & Lobos, Germán & Miranda-Zapata, Edgardo & Denegri, Marianela & Lapo, María & Hueche, Clementina & Rojas, Juan, 2019. "Maternal well-being, food involvement and quality of diet: Profiles of single mother-adolescent dyads," Children and Youth Services Review, Elsevier, vol. 96(C), pages 336-345.
    15. Antoine, V. & Quost, B. & Masson, M.-H. & Denœux, T., 2012. "CECM: Constrained evidential C-means algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 56(4), pages 894-914.
    16. Yongcui Lan & Jinliang Wang & Wenying Hu & Eldar Kurbanov & Janine Cole & Jinming Sha & Yuanmei Jiao & Jingchun Zhou, 2023. "Spatial pattern prediction of forest wildfire susceptibility in Central Yunnan Province, China based on multivariate data," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 116(1), pages 565-586, March.
    17. Vidoli, Francesco & Pignataro, Giacomo & Benedetti, Roberto, 2022. "Identification of spatial regimes of the production function of Italian hospitals through spatially constrained cluster-wise regression," Socio-Economic Planning Sciences, Elsevier, vol. 82(PA).
    18. Maravalle, Maurizio & Simeone, Bruno & Naldini, Rosella, 1997. "Clustering on trees," Computational Statistics & Data Analysis, Elsevier, vol. 24(2), pages 217-234, April.
    19. Renato Coppi & Pierpaolo D’Urso & Paolo Giordani, 2010. "A Fuzzy Clustering Model for Multivariate Spatial Time Series," Journal of Classification, Springer;The Classification Society, vol. 27(1), pages 54-88, March.
    20. Laurin Arnold & Jan Jöhnk & Florian Vogt & Nils Urbach, 2022. "IIoT platforms’ architectural features – a taxonomy and five prevalent archetypes," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(2), pages 927-944, June.
    21. Zdeněk Šulc & Hana Řezanková, 2019. "Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 58-72, April.
