
Measures of Clade Confidence Do Not Correlate with Accuracy of Phylogenetic Trees

Authors

  • Barry G Hall
  • Stephen J Salipante

Abstract

Metrics of phylogenetic tree reliability, such as parametric bootstrap percentages or Bayesian posterior probabilities, represent internal measures of the topological reproducibility of a phylogenetic tree, while the recently introduced aLRT (approximate likelihood ratio test) assesses the likelihood that a branch exists on a maximum-likelihood tree. Although these values are often equated with phylogenetic tree accuracy, they do not necessarily estimate how well a reconstructed phylogeny represents the cladistic relationships that actually exist in nature. The authors therefore attempted to quantify how well bootstrap percentages, posterior probabilities, and aLRT values reflect the probability that an inferred phylogenetic clade is present in a known phylogeny. They simulated the evolution of bacterial genes of varying lengths under biologically realistic conditions and reconstructed those known phylogenies using both maximum-likelihood and Bayesian methods. They then measured how frequently clades in the reconstructed trees that had particular bootstrap percentages, aLRT values, or posterior probabilities were found in the true trees. None of these values correlated with the probability that a given clade is present in the known phylogeny. The major conclusion is that none of the measures provides any information about the likelihood that an individual clade actually exists. The authors also found that the mean of all clade support values on a tree closely reflects the average proportion of clades that were assigned correctly, and is thus a good representation of the overall accuracy of a phylogenetic tree.

The construction of phylogenetic trees, which depict past relationships between groups of DNA or protein sequences, has valuable applications in many fields of study, most commonly evolutionary and population biology.
Before drawing conclusions from phylogenetic trees, it is important to assess how accurate those reconstructions are. This is typically done by examining measures of “clade credibility” (such as bootstrap or posterior probability values), which represent how reproducible relationships within the tree are, given the parameters of the phylogenetic analysis. However, such measures do not necessarily reflect how likely the inferred relationships are to have actually occurred in nature. Therefore, using simulated data in which the relationships are known, we determined how well several measures of clade credibility correlate with the likelihood that an inferred phylogenetic grouping actually exists. Surprisingly, we found no such correlation: inferred relationships were correctly assigned about as often when clade credibility values were very low as when they were high. This finding suggests that current measures of phylogenetic tree reliability are not useful for predicting whether specific inferred relationships have actually occurred.
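The two analyses the abstract describes (binning inferred clades by support value and checking how often each bin's clades appear in the known tree, and comparing a tree's mean support with its overall fraction of correct clades) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the function names `clade_calibration` and `mean_support_vs_accuracy`, the representation of a clade as a `frozenset` of taxon names, support expressed as a percentage, the 10-point bin width, and the toy trees are all assumptions chosen for clarity, and the toy data are not results from the paper.

```python
from collections import defaultdict


def clade_calibration(inferred, true_clades, bin_width=10):
    """Hypothetical helper. `inferred` is a list of (clade, support) pairs, where a
    clade is a frozenset of taxon names and support is a percentage (0-100).
    `true_clades` is the set of clades in the known tree. Returns, per support bin,
    (number of clades in the bin, fraction of them present in the known tree)."""
    counts = defaultdict(lambda: [0, 0])  # bin lower bound -> [total, found in true tree]
    for clade, support in inferred:
        # Support of exactly 100 goes into the top bin rather than a bin of its own.
        lower = min(int(support // bin_width) * bin_width, 100 - bin_width)
        counts[lower][0] += 1
        counts[lower][1] += clade in true_clades  # bool adds as 0 or 1
    return {lower: (n, found / n) for lower, (n, found) in sorted(counts.items())}


def mean_support_vs_accuracy(inferred, true_clades):
    """Hypothetical helper. Returns (mean support as a fraction, fraction of the
    inferred clades that are actually present in the known tree)."""
    if not inferred:
        raise ValueError("no clades supplied")
    mean_support = sum(s for _, s in inferred) / len(inferred) / 100
    accuracy = sum(clade in true_clades for clade, _ in inferred) / len(inferred)
    return mean_support, accuracy


if __name__ == "__main__":
    # Invented toy data: known tree ((A,B),(C,D)),E; inferred tree ((A,B),(C,E)),D.
    true_clades = {frozenset("AB"), frozenset("CD"), frozenset("ABCD")}
    inferred = [(frozenset("AB"), 95), (frozenset("CE"), 40), (frozenset("ABCE"), 60)]
    print(clade_calibration(inferred, true_clades))
    print(mean_support_vs_accuracy(inferred, true_clades))
```

In a real study one would pool clades from many replicate simulations before binning, since a single small tree gives only a handful of clades per bin. If support values were well calibrated, the fraction found in each bin would track the bin's support level; the abstract reports that this did not hold for individual clades, whereas the mean-support and accuracy pair returned by the second function is the tree-level comparison the authors found to agree closely.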

Suggested Citation

  • Barry G Hall & Stephen J Salipante, 2007. "Measures of Clade Confidence Do Not Correlate with Accuracy of Phylogenetic Trees," PLOS Computational Biology, Public Library of Science, vol. 3(3), pages 1-9, March.
  • Handle: RePEc:plo:pcbi00:0030051
    DOI: 10.1371/journal.pcbi.0030051

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030051
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.0030051&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.0030051?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Bob Mau & Michael A. Newton & Bret Larget, 1999. "Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods," Biometrics, The International Biometric Society, vol. 55(1), pages 1-12, March.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Malika Ihle & Isabel S. Winney & Anna Krystalli & Michael Croucher, 2017. "Striving for transparent and credible research: practical guidelines for behavioral ecologists," Behavioral Ecology, International Society for Behavioral Ecology, vol. 28(2), pages 348-354.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ian J. Wilson & Michael E. Weale & David J. Balding, 2003. "Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 166(2), pages 155-188, June.
    2. Alexandra Gavryushkina & David Welch & Tanja Stadler & Alexei J Drummond, 2014. "Bayesian Inference of Sampled Ancestor Trees for Epidemiology and Fossil Calibration," PLOS Computational Biology, Public Library of Science, vol. 10(12), pages 1-15, December.
    3. Elena Rivas & Sean R Eddy, 2008. "Probabilistic Phylogenetic Inference with Insertions and Deletions," PLOS Computational Biology, Public Library of Science, vol. 4(9), pages 1-21, September.
    4. Lin, Yu-Min & Fang, Shu-Cherng & Thorne, Jeffrey L., 2007. "A tabu search algorithm for maximum parsimony phylogeny inference," European Journal of Operational Research, Elsevier, vol. 176(3), pages 1908-1917, February.
    5. Rigat, F. & Mira, A., 2012. "Parallel hierarchical sampling: A general-purpose interacting Markov chains Monte Carlo algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1450-1467.
    6. Jordan Douglas & Rong Zhang & Remco Bouckaert, 2021. "Adaptive dating and fast proposals: Revisiting the phylogenetic relaxed clock model," PLOS Computational Biology, Public Library of Science, vol. 17(2), pages 1-30, February.
    7. Spade David A., 2020. "An extended model for phylogenetic maximum likelihood based on discrete morphological characters," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-11, February.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.