Printed from https://ideas.repec.org/a/plo/pcbi00/1012337.html

Reliable estimation of tree branch lengths using deep neural networks

Authors
  • Anton Suvorov
  • Daniel R Schrider

Abstract

A phylogenetic tree represents a hypothesized evolutionary history for a set of taxa. Besides the branching pattern (i.e., the tree topology), phylogenies contain information about the evolutionary distances (i.e., branch lengths) between all taxa in the tree, which include extant taxa (external nodes) and their last common ancestors (internal nodes). During phylogenetic inference, branch lengths are typically co-estimated with other phylogenetic parameters while exploring tree topology space. There are well-known regions of the branch length parameter space where accurate estimation of phylogenetic trees is especially difficult. Several recent studies have demonstrated that machine learning approaches have the potential to help solve phylogenetic problems with greater accuracy and computational efficiency. In this study, as a proof of concept, we sought to explore the potential of machine learning models to predict branch lengths. To that end, we designed several deep learning frameworks to estimate branch lengths on fixed tree topologies from multiple sequence alignments or their representations. Our results show that deep learning methods can exhibit superior performance in some difficult regions of the branch length parameter space. For example, in contrast to maximum likelihood inference, which is typically used for estimating branch lengths, deep learning methods are more efficient and accurate. In general, we find that our neural networks achieve accuracy similar to that of a Bayesian approach and are the best-performing methods when inferring long branches that are associated with distantly related taxa. Together, our findings represent a next step toward accurate, fast, and reliable phylogenetic inference with machine learning approaches.

Author summary

Phylogenetic trees that delineate organismal relationships serve as a cornerstone structure for almost any basic research leveraging evolutionary information.
Besides the tree topology, phylogeneticists are concerned with estimating other fundamental phylogenetic parameters, such as the length of each branch in the tree. Branch lengths are proportional to evolutionary distances between taxa: long branches represent distantly related taxa and/or accelerated evolution, whereas short branches indicate close taxonomic relationships and/or slower evolutionary rates. There is a plethora of phylogenetic methods that can infer branch lengths from sequence data, but they typically exhibit elevated error rates within certain regions of the branch length parameter space and thus may in some cases provide poor estimates. Here, as a proof-of-concept study, we explored the possibility of using artificial neural networks (ANNs) to accurately estimate branch lengths directly from sequence data or their summaries. We show that ANNs can reliably infer branch lengths with accuracy on par with, or even better than, traditional methods such as Bayesian and maximum likelihood approaches, especially when branches are long. We argue that further investigation of machine learning methods could lead to marked improvements in phylogenetic inference.
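To make the estimation problem concrete, the Python sketch below uses the classical Jukes-Cantor (JC69) substitution model, a textbook method swapped in for illustration and not the deep learning approach described above, to estimate the length of a single branch separating an ancestral and a descendant sequence. For two aligned sequences under JC69, the maximum likelihood branch length has the closed form d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The function names (jc69_distance, evolve_jc69, branch_length_error), the alignment length of 500 sites, the replicate count, and the branch lengths tested are hypothetical choices, not taken from the paper. Simulating alignments with known branch lengths shows the absolute estimation error growing, and the estimate eventually saturating to infinity, as the true branch gets longer. The same simulate-then-estimate loop is a common way to produce labelled examples for supervised learning, although the authors' actual simulation setup may differ.

```python
# Illustrative sketch only: a classical JC69 estimator, NOT the neural networks
# from the paper. Function names and simulation settings are hypothetical.
import math
import random

BASES = "ACGT"


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance in expected substitutions per site.

    For two aligned sequences this closed form is also the maximum likelihood
    estimate of the branch length between them under JC69. Returns math.inf
    when the observed proportion of differing sites is saturated (p >= 0.75).
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def evolve_jc69(seq: str, t: float, rng: random.Random) -> str:
    """Evolve `seq` along a branch of length t (substitutions/site) under JC69.

    Each site is redrawn uniformly from ACGT with probability 1 - exp(-4t/3),
    which gives P(site differs) = 3/4 * (1 - exp(-4t/3)), as JC69 requires.
    """
    redraw = 1.0 - math.exp(-4.0 * t / 3.0)
    return "".join(rng.choice(BASES) if rng.random() < redraw else b for b in seq)


def branch_length_error(true_t: float, n_sites: int = 500,
                        n_reps: int = 200, seed: int = 1) -> tuple[float, int]:
    """Mean absolute error of the JC69 estimate over simulated replicates,
    plus the number of replicates whose estimate was infinite (saturated)."""
    rng = random.Random(seed)
    errors, saturated = [], 0
    for _ in range(n_reps):
        ancestor = "".join(rng.choice(BASES) for _ in range(n_sites))
        descendant = evolve_jc69(ancestor, true_t, rng)
        estimate = jc69_distance(ancestor, descendant)
        if math.isinf(estimate):
            saturated += 1
        else:
            errors.append(abs(estimate - true_t))
    mae = sum(errors) / len(errors) if errors else math.nan
    return mae, saturated


if __name__ == "__main__":
    for t in (0.05, 0.5, 2.0):
        mae, n_sat = branch_length_error(t)
        print(f"true length {t:4.2f}: mean abs. error {mae:.3f}, saturated reps {n_sat}")
```

Running the file as a script prints the mean absolute error and the number of saturated replicates for each true branch length. Only one branch is estimated here; estimating all branch lengths of a fixed topology by maximum likelihood instead means optimizing the full tree likelihood, which is typically computed with Felsenstein's pruning algorithm.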

Suggested Citation

  • Anton Suvorov & Daniel R Schrider, 2024. "Reliable estimation of tree branch lengths using deep neural networks," PLOS Computational Biology, Public Library of Science, vol. 20(8), pages 1-25, August.
  • Handle: RePEc:plo:pcbi00:1012337
    DOI: 10.1371/journal.pcbi.1012337

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012337
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012337&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1012337?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Dana Azouri & Shiran Abadi & Yishay Mansour & Itay Mayrose & Tal Pupko, 2021. "Harnessing machine learning to guide phylogenetic-tree search algorithms," Nature Communications, Nature, vol. 12(1), pages 1-9, December.
    2. Shiran Abadi & Dana Azouri & Tal Pupko & Itay Mayrose, 2019. "Model selection may not be a mandatory step for phylogeny reconstruction," Nature Communications, Nature, vol. 10(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Danielle J Parsons & Abigail E Green & Bryan C Carstens & Tara A Pelletier, 2024. "Predicting genetic biodiversity in salamanders using geographic, climatic, and life history traits," PLOS ONE, Public Library of Science, vol. 19(10), pages 1-20, October.
    2. Gasparin, Andrea & Camerota Verdù, Federico Julian & Catanzaro, Daniele, 2023. "An evolution strategy approach for the Balanced Minimum Evolution Problem," LIDAM Discussion Papers CORE 2023021, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1012337. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol. General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.