Generalising Ward’s Method for Use with Manhattan Distances

Authors

Trudie Strauss
Michael Johan von Maltitz

Abstract

The claim that Ward’s linkage algorithm in hierarchical clustering is limited to Euclidean distances is investigated. In this paper, Ward’s clustering algorithm is generalised for use with l1-norm, or Manhattan, distances. We argue that this generalisation of Ward’s linkage method is theoretically sound, and we provide an example in which it outperforms the method based on Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied in biology and genetic classification. We aim to quantify differences in character traits between languages, and we use a statistical language signature based on relative bi-gram (two-letter sequence) frequencies to calculate a distance matrix between 32 Indo-European languages. We then classify the languages with Ward’s method of hierarchical clustering, using both the Euclidean distance and the Manhattan distance. The results obtained with the two metrics are compared to show that the characteristic property of Ward’s algorithm, minimising intra-cluster variation while maximising inter-cluster variation, is not violated when the Manhattan metric is used.
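
The pipeline described in the abstract (relative bi-gram frequencies per language, a distance matrix under the Manhattan or Euclidean metric, then Ward agglomeration) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: it counts only the letters a to z, does not let bi-grams span spaces or punctuation, and applies the standard Lance–Williams update for Ward's method directly to whatever dissimilarities it is given. With squared Euclidean distances that update reproduces classical Ward clustering; feeding it Manhattan distances instead is one way to experiment with the variant the paper studies. The helper names (bigram_signature, distance_table, ward_linkage) and the toy texts are hypothetical.

```python
import string
from collections import Counter
from itertools import combinations

ALPHABET = set(string.ascii_lowercase)  # assumption: only a-z are counted


def bigram_signature(text):
    """Relative frequencies of letter bi-grams (assumption: bi-grams never span non-letters)."""
    chars = [c if c in ALPHABET else " " for c in text.lower()]
    counts = Counter(a + b for a, b in zip(chars, chars[1:]) if a != " " and b != " ")
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def manhattan(p, q):
    """l1 distance between two sparse frequency vectors (missing bi-grams count as 0)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def euclidean(p, q):
    """l2 distance between two sparse frequency vectors."""
    keys = set(p) | set(q)
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys) ** 0.5


def distance_table(signatures, metric):
    """Pairwise dissimilarities keyed by frozenset({name_a, name_b})."""
    return {frozenset((a, b)): metric(signatures[a], signatures[b])
            for a, b in combinations(signatures, 2)}


def ward_linkage(names, dist):
    """Agglomerative clustering with the Lance-Williams update for Ward's method.

    The update is applied to the dissimilarities exactly as given; squared
    Euclidean input gives classical Ward. Returns (cluster_i, cluster_j, height)
    for each merge, where a merged cluster is the tuple (cluster_i, cluster_j).
    """
    size = {n: 1 for n in names}
    d = dict(dist)
    active = set(names)
    merges = []
    while len(active) > 1:
        # Closest pair of active clusters; sorting by str makes tie-breaking deterministic.
        i, j = min(combinations(sorted(active, key=str), 2),
                   key=lambda pair: d[frozenset(pair)])
        dij = d[frozenset((i, j))]
        new = (i, j)
        active -= {i, j}
        for k in active:
            ni, nj, nk = size[i], size[j], size[k]
            # Lance-Williams recurrence with Ward coefficients.
            d[frozenset((k, new))] = ((ni + nk) * d[frozenset((k, i))]
                                      + (nj + nk) * d[frozenset((k, j))]
                                      - nk * dij) / (ni + nj + nk)
        size[new] = size[i] + size[j]
        active.add(new)
        merges.append((i, j, dij))
    return merges


if __name__ == "__main__":
    # Hypothetical toy "languages"; the paper itself used 32 Indo-European languages.
    texts = {
        "toy_1": "the cat sat on the mat",
        "toy_2": "the hat sat on the cat",
        "toy_3": "una casa blanca y una mesa",
    }
    sigs = {name: bigram_signature(t) for name, t in texts.items()}
    for label, metric in (("Manhattan", manhattan), ("Euclidean", euclidean)):
        print(label, ward_linkage(list(sigs), distance_table(sigs, metric)))
```

Keeping the frequency vectors as sparse dictionaries avoids allocating all 676 possible a-to-z bi-grams per language; the dense-matrix equivalent would give the same distances.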

Suggested Citation

Trudie Strauss & Michael Johan von Maltitz, 2017. "Generalising Ward’s Method for Use with Manhattan Distances," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-21, January.

Handle: RePEc:plo:pone00:0168288
DOI: 10.1371/journal.pone.0168288

References listed on IDEAS

Nancy C. M. Ross & Dietmar Wolfram, 2000. "End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 51(10), pages 949-958.
Brock, Guy & Pihur, Vasyl & Datta, Susmita & Datta, Somnath, 2008. "clValid: An R Package for Cluster Validation," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 25(i04).
Glenn Milligan, 1979. "Ultrametric hierarchical clustering algorithms," Psychometrika, Springer;The Psychometric Society, vol. 44(3), pages 343-346, September.
Zhenmin Chen & John Ness, 1996. "Space-conserving agglomerative algorithms," Journal of Classification, Springer;The Classification Society, vol. 13(1), pages 157-168, March.
Gabor J. Szekely & Maria L. Rizzo, 2005. "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, Springer;The Classification Society, vol. 22(2), pages 151-183, September.

Citations

Citations are extracted by the CitEc Project.

Cited by:

G Anahí Salas-Gallardo & Jonathan-Julio Lorea-Hernández & Ángel Abdiel Robles-Gómez & Claudia Castillo-Martin Del Campo & Fernando Peña-Ortega, 2024. "Morphological differentiation of peritumoral brain zone microglia," PLOS ONE, Public Library of Science, vol. 19(3), pages 1-27, March.
Schnettler, Berta & Grunert, Klaus G. & Lobos, Germán & Miranda-Zapata, Edgardo & Denegri, Marianela & Lapo, María & Hueche, Clementina & Rojas, Juan, 2019. "Maternal well-being, food involvement and quality of diet: Profiles of single mother-adolescent dyads," Children and Youth Services Review, Elsevier, vol. 96(C), pages 336-345.
Laurin Arnold & Jan Jöhnk & Florian Vogt & Nils Urbach, 2022. "IIoT platforms’ architectural features – a taxonomy and five prevalent archetypes," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(2), pages 927-944, June.
Dalila Camêlo Aguiar & Ramón Gutiérrez Sánchez & Edwirde Luiz Silva Camêlo, 2020. "Hierarchical Clustering with Spatial Constraints and Standardized Incidence Ratio in Tuberculosis Data," Mathematics, MDPI, vol. 8(9), pages 1-12, September.
Trotta, Gianluca, 2020. "An empirical analysis of domestic electricity load profiles: Who consumes how much and when?," Applied Energy, Elsevier, vol. 275(C).
Abang Zainoren Abang Abdurahman & Syerina Azlin Md Nasir & Wan Fairos Wan Yaacob & Serah Jaya & Suhaili Mokhtar, 2021. "Spatio-Temporal Clustering of Sarawak Malaysia Total Protected Area Visitors," Sustainability, MDPI, vol. 13(21), pages 1-19, October.
Iwona Bąk & Anna Barwińska-Małajowicz & Grażyna Wolska & Paweł Walawender & Paweł Hydzik, 2021. "Is the European Union Making Progress on Energy Decarbonisation While Moving towards Sustainable Development?," Energies, MDPI, vol. 14(13), pages 1-18, June.
Marie Chavent & Vanessa Kuentz-Simonet & Amaury Labenne & Jérôme Saracco, 2018. "ClustGeo: an R package for hierarchical clustering with spatial constraints," Computational Statistics, Springer, vol. 33(4), pages 1799-1822, December.
Zdeněk Šulc & Hana Řezanková, 2019. "Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 36(1), pages 58-72, April.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Alan Lee & Bobby Willcox, 2014. "Minkowski Generalizations of Ward’s Method in Hierarchical Clustering," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 194-218, July.
Pavel I. Blus & Rustam V. Plotnikov, 2022. "Spatial clustering for reducing intraregional unevenness," Journal of New Economy, Ural State University of Economics, vol. 23(1), pages 88-108, April.
Patrick Zschech & Kai Heinrich & Raphael Bink & Janis S. Neufeld, 2019. "Prognostic Model Development with Missing Labels," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 61(3), pages 327-343, June.
Jia Zhu & Xingcheng Wu & Xueqin Lin & Changqin Huang & Gabriel Pui Cheong Fung & Yong Tang, 2018. "A novel multiple layers name disambiguation framework for digital libraries using dynamic clustering," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 781-794, March.
Anahita Nodehi & Mousa Golalizadeh & Mehdi Maadooliat & Claudio Agostinelli, 2025. "Torus Probabilistic Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 42(2), pages 435-456, July.
Gainbi Park & Zengwang Xu, 2022. "The constituent components and local indicator variables of social vulnerability index," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 110(1), pages 95-120, January.
Linde, Jona & Sonnemans, Joep & Tuinstra, Jan, 2014. "Strategies and evolution in the minority game: A multi-round strategy experiment," Games and Economic Behavior, Elsevier, vol. 86(C), pages 77-95.
- Jona Linde & Joep Sonnemans & Jan Tuinstra, 2013. "Strategies and Evolution in the Minority Game: A Multi-Round Strategy Experiment," Tinbergen Institute Discussion Papers 13-043/I, Tinbergen Institute.
- Sonnemans, J. & Tuinstra, J. & Linde, J., 2013. "Strategies and Evolution in the Minority Game: A Multi-Round Strategy Experiment," CeNDEF Working Papers 13-02, Universiteit van Amsterdam, Center for Nonlinear Dynamics in Economics and Finance.
Gautier Marti & Frank Nielsen & Philippe Donnat & Sébastien Andler, 2016. "On clustering financial time series: a need for distances between dependent random variables," Papers 1603.07822, arXiv.org.
Zdeňka Náglová & Tereza Horáková, 2017. "Position of the Bakery Enterprises in the Czech Republic According to Detailed Specification of the Businesses," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 65(5), pages 1719-1727.
Borke, Lukas & Härdle, Wolfgang Karl, 2016. "Q3-D3-Lsa," SFB 649 Discussion Papers 2016-049, Humboldt University Berlin, Collaborative Research Center 649: Economic Risk.
Ana Alina Tudoran, 2022. "A machine learning approach to identifying decision-making styles for managing customer relationships," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(1), pages 351-374, March.
Wu, Han-Ming, 2011. "On biological validity indices for soft clustering algorithms for gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1969-1979, May.
Renato Amorim, 2015. "Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 46-62, April.
Quessy, Jean-François, 2021. "A Szekely–Rizzo inequality for testing general copula homogeneity hypotheses," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
Carmen C. Rodríguez-Martínez & Mitzi Cubilla-Montilla & Purificación Vicente-Galindo & Purificación Galindo-Villardón, 2023. "X-STATIS: A Multivariate Approach to Characterize the Evolution of E-Participation, from a Global Perspective," Mathematics, MDPI, vol. 11(6), pages 1-15, March.
Fionn Murtagh & Pierre Legendre, 2014. "Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?," Journal of Classification, Springer;The Classification Society, vol. 31(3), pages 274-295, October.
Drago, Carlo & Fortuna, Fabio, "undated". "Investigating the Corporate Governance and Sustainability Relationship: A Bibliometric Analysis Using Keyword-Ensemble Community Detection," FEEM Working Papers 336985, Fondazione Eni Enrico Mattei (FEEM).
- Carlo Drago & Fabio Fortuna, 2023. "Investigating the Corporate Governance and Sustainability Relationship: A Bibliometric Analysis Using Keyword-Ensemble Community Detection," Working Papers 2023.12, Fondazione Eni Enrico Mattei.
Judit Bar-Ilan, 2001. "Data collection methods on the Web for infometric purposes — A review and analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 50(1), pages 7-32, January.
Wu, Tong & Rocha, Juan C. & Berry, Kevin & Chaigneau, Tomas & Hamann, Maike & Lindkvist, Emilie & Qiu, Jiangxiao & Schill, Caroline & Shepon, Alon & Crépin, Anne-Sophie & Folke, Carl, 2024. "Triple Bottom Line or Trilemma? Global Tradeoffs Between Prosperity, Inequality, and the Environment," World Development, Elsevier, vol. 178(C).
Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
