Author
Listed:
- Paweł Górecki
- Alexey Markin
- Sriram Vijendran
- Oliver Eulenstein
Abstract
The cophenetic distance is a well-established metric in biology used to compare pairs of trees represented in a vector format. This distance was introduced by Cardona and his co-authors, building on the foundational work of Sokal and Rohlf, which dates back over 60 years. It is widely recognized for its versatility since it can analyze trees with edge weights using various vector norms. However, when comparing large-scale trees, the quadratic runtime of the current best-known (i.e., naïve) algorithm for computing the cophenetic distance can become prohibitive. Recently, a new algorithmic framework with near-linear time complexity has been developed to calculate the distances of a generalized class of cophenetic distances, which are derived from the work of Sokal and Rohlf. This improvement not only allows the cophenetic distance to be utilized in large-scale studies but also enhances the versatility of these studies by incorporating generalized variants of the cophenetic distance. However, the framework is limited to applying only the L1 and L2 vector norms, which significantly restricts the versatility of generalized cophenetic distances in large-scale applications. To address this limitation, we present a near-linear time algorithmic framework for computing the generalized cophenetic distances across all Lp vector norms. In our scalability study, we showcase the practical performance of our unrestricted algorithmic framework. Furthermore, we investigate the applicability of the generalized cophenetic distances by analyzing the distributions of key components of these distances under various vector norms.Author summary: Biological research often relies on large-scale comparisons of evolutionary trees to extract valuable insights across different subfields of biology. To effectively compare trees on a large scale, sensitive metrics are needed to assess differences in topology and branch lengths alongside efficient computational algorithms. In this study, we focus on the classic cophenetic distance, a metric that compares pairs of trees represented in vector format. The cophenetic distance can analyze trees with edge weights using various vector norms, making it highly versatile. Recently, a fast algorithm was developed to calculate generalized cophenetic distances, including the cophenetic distance itself, which has enabled large-scale studies and expanded the types of cophenetic distances applicable for tree pair comparisons. However, this algorithm is limited to L1 and L2 norms, which greatly reduces the versatility of the cophenetic distance. To overcome this limitation, we introduce a fast algorithm that computes the generalized cophenetic distance for all vector norms, making it suitable for large-scale studies. We also conduct a scalability study to demonstrate the effectiveness of our algorithm in practice, and we analyze the distributions of key representatives of cophenetic distances across various vector norms.
Suggested Citation
Paweł Górecki & Alexey Markin & Sriram Vijendran & Oliver Eulenstein, 2025.
"Computing generalized cophenetic distances under all Lp norms: A near-linear time algorithmic framework,"
PLOS Computational Biology, Public Library of Science, vol. 21(6), pages 1-21, June.
Handle:
RePEc:plo:pcbi00:1013069
DOI: 10.1371/journal.pcbi.1013069
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1013069. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.