
Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting

Authors

Listed:
  • Davide Albanese
  • Carlotta De Filippo
  • Duccio Cavalieri
  • Claudio Donati

Abstract

Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schemas do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus also including in the analysis those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance, measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.

Author Summary

In metagenomics, the composition of complex microbial communities is characterized using Next Generation Sequencing technologies. Thanks to the decreasing cost of sequencing, large amounts of data have been generated for environmental samples and for a variety of health-associated conditions. In parallel, there has been a flourishing of statistical methods to analyze metagenomic datasets, concentrating mainly on the problem of assessing the existence of significant differences between microbial communities in different conditions. However, for a large number of therapeutic and diagnostic applications it would be essential to identify and rank the microbial taxa that are most relevant in these comparisons. Here we present PhyloRelief, a novel feature-ranking algorithm that fills this gap by integrating the phylogenetic relationships amongst the taxa into a statistical feature-weighting procedure. Without relying on a precompiled taxonomy, PhyloRelief uses the data to determine the lineages most relevant to the diversification of the samples. As such, PhyloRelief can be applied both to cases in which sequences can be classified according to a known taxonomy and to cases in which this is not feasible, a common occurrence in metagenomic data analysis given the increasing number of new and uncultivable taxa that are discovered using these technologies.
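
The idea of weighting clades by how well they separate classes of samples can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the classic Relief update rule (nearest hit and nearest miss) applied to clade-level abundances, obtained by summing read counts over the leaves of each clade, and it uses a plain Manhattan distance between samples, which may differ from the distance used by PhyloRelief. The tree, the taxon names (`t1` to `t4`), the clade names, the read counts and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence

def clade_abundances(sample: Dict[str, int],
                     clades: Dict[str, Sequence[str]]) -> Dict[str, int]:
    """Sum the read counts of the leaf taxa under each clade."""
    return {name: sum(sample.get(leaf, 0) for leaf in leaves)
            for name, leaves in clades.items()}

def relief_weights(features: List[Dict[str, int]],
                   labels: List[str]) -> Dict[str, float]:
    """Classic Relief: reward features that differ from the nearest
    sample of another class (miss) more than from the nearest sample
    of the same class (hit)."""
    names = list(features[0])
    # Range of each feature, used to scale differences to [0, 1].
    span = {}
    for f in names:
        values = [row[f] for row in features]
        span[f] = (max(values) - min(values)) or 1.0

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        # Assumption: Manhattan distance over scaled clade abundances.
        return sum(diff(f, a, b) for f in names)

    n = len(features)
    weights = {f: 0.0 for f in names}
    for i, row in enumerate(features):
        hits = [j for j in range(n) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(n) if labels[j] != labels[i]]
        if not hits or not misses:
            continue
        hit = features[min(hits, key=lambda j: dist(row, features[j]))]
        miss = features[min(misses, key=lambda j: dist(row, features[j]))]
        for f in names:
            weights[f] += (diff(f, row, miss) - diff(f, row, hit)) / n
    return weights

# Hypothetical toy tree: two sister clades under one root.
clades = {"A": ("t1", "t2"), "B": ("t3", "t4"),
          "root": ("t1", "t2", "t3", "t4")}

# Hypothetical read counts, 100 reads per sample.
samples = [
    {"t1": 50, "t2": 30, "t3": 10, "t4": 10},
    {"t1": 20, "t2": 50, "t3": 20, "t4": 10},
    {"t1": 10, "t2": 10, "t3": 40, "t4": 40},
    {"t1": 10, "t2": 20, "t3": 50, "t4": 20},
]
labels = ["case", "case", "control", "control"]

features = [clade_abundances(s, clades) for s in samples]
weights = relief_weights(features, labels)
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking, weights)
```

In this toy example the root clade has the same total count in every sample, so it receives zero weight, while the two sister clades whose abundance shifts between classes rank above it. Individual taxa inside a clade may vary from sample to sample even when the clade total separates the classes cleanly, which is the kind of signal a clade-level ranking is meant to capture.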

Suggested Citation

  • Davide Albanese & Carlotta De Filippo & Duccio Cavalieri & Claudio Donati, 2015. "Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting," PLOS Computational Biology, Public Library of Science, vol. 11(3), pages 1-18, March.
  • Handle: RePEc:plo:pcbi00:1004186
    DOI: 10.1371/journal.pcbi.1004186

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004186
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004186&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gertrude Ecklu-Mensah & Candice Choo-Kang & Maria Gjerstad Maseng & Sonya Donato & Pascal Bovet & Bharathi Viswanathan & Kweku Bedu-Addo & Jacob Plange-Rhule & Prince Oti Boateng & Terrence E. Forrest, 2023. "Gut microbiota and fecal short chain fatty acids differ with adiposity and country of origin: the METS-microbiome study," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    2. Fiona B. Tamburini & Dylan Maghini & Ovokeraye H. Oduaran & Ryan Brewster & Michaella R. Hulley & Venesa Sahibdeen & Shane A. Norris & Stephen Tollman & Kathleen Kahn & Ryan G. Wagner & Alisha N. Wade, 2022. "Short- and long-read metagenomics of urban and rural South African gut microbiomes reveal a transitional composition and undescribed taxa," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    3. Ruairi C. Robertson & Thaddeus J. Edens & Lynnea Carr & Kuda Mutasa & Ethan K. Gough & Ceri Evans & Hyun Min Geum & Iman Baharmand & Sandeep K. Gill & Robert Ntozini & Laura E. Smith & Bernard Chasekw, 2023. "The gut microbiome and early-life growth in a population with high prevalence of stunting," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    4. John Molloy & Katrina Allen & Fiona Collier & Mimi L. K. Tang & Alister C. Ward & Peter Vuillermin, 2013. "The Potential Link between Gut Microbiota and IgE-Mediated Food Allergy in Early Life," IJERPH, MDPI, vol. 10(12), pages 1-22, December.
    5. Antonella Gagliardi & Valentina Totino & Fatima Cacciotti & Valerio Iebba & Bruna Neroni & Giulia Bonfiglio & Maria Trancassini & Claudio Passariello & Fabrizio Pantanella & Serena Schippa, 2018. "Rebuilding the Gut Microbiota Ecosystem," IJERPH, MDPI, vol. 15(8), pages 1-24, August.
    6. Allison G. White & George S. Watts & Zhenqiang Lu & Maria M. Meza-Montenegro & Eric A. Lutz & Philip Harber & Jefferey L. Burgess, 2014. "Environmental Arsenic Exposure and Microbiota in Induced Sputum," IJERPH, MDPI, vol. 11(2), pages 1-15, February.
    7. Sanzhima Garmaeva & Trishla Sinha & Anastasia Gulyaeva & Nataliia Kuzub & Johanne E. Spreckels & Sergio Andreu-Sánchez & Ranko Gacesa & Arnau Vich Vila & Siobhan Brushett & Marloes Kruk & Jackie Deken, 2024. "Transmission and dynamics of mother-infant gut viruses during pregnancy and early life," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    8. Ruchi Shroff & Carla Ramos Cortés, 2020. "The Biodiversity Paradigm: Building Resilience for Human and Environmental Health," Development, Palgrave Macmillan;Society for International Development, vol. 63(2), pages 172-180, December.
    9. Tetyana Zakharkina & Elke Heinzel & Rembert A Koczulla & Timm Greulich & Katharina Rentz & Josch K Pauling & Jan Baumbach & Mathias Herrmann & Christiane Grünewald & Hendrik Dienemann & Lutz von Mülle, 2013. "Analysis of the Airway Microbiota of Healthy Individuals and Patients with Chronic Obstructive Pulmonary Disease by T-RFLP and Clone Sequencing," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-11, July.
    10. Jean-Sebastien Gounot & Minghao Chia & Denis Bertrand & Woei-Yuh Saw & Aarthi Ravikrishnan & Adrian Low & Yichen Ding & Amanda Hui Qi Ng & Linda Wei Lin Tan & Yik-Ying Teo & Henning Seedorf & Niranjan, 2022. "Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    11. Fanette Fontaine & Sondra Turjeman & Karel Callens & Omry Koren, 2023. "The intersection of undernutrition, microbiome, and child development in the first years of life," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
    12. Emidio Scarpellini & Emanuele Rinninella & Martina Basilico & Esther Colomier & Carlo Rasetti & Tiziana Larussa & Pierangelo Santori & Ludovico Abenavoli, 2021. "From Pre- and Probiotics to Post-Biotics: A Narrative Review," IJERPH, MDPI, vol. 19(1), pages 1-14, December.
    13. Kerstin Thriene & Karin B. Michels, 2023. "Human Gut Microbiota Plasticity throughout the Life Course," IJERPH, MDPI, vol. 20(2), pages 1-14, January.
    14. Charles K Fisher & Thierry Mora & Aleksandra M Walczak, 2017. "Variable habitat conditions drive species covariation in the human microbiota," PLOS Computational Biology, Public Library of Science, vol. 13(4), pages 1-18, April.
    15. Amanda H Pendegraft & Boyi Guo & Nengjun Yi, 2019. "Bayesian hierarchical negative binomial models for multivariable analyses with applications to human microbiome count data," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-23, August.
    16. Paul J McMurdie & Susan Holmes, 2014. "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-12, April.
    17. David Martino, 2019. "The Effects of Chlorinated Drinking Water on the Assembly of the Intestinal Microbiome," Challenges, MDPI, vol. 10(1), pages 1-7, January.
    18. Elio L Herzog & Melania Wäfler & Irene Keller & Sebastian Wolf & Martin S Zinkernagel & Denise C Zysset-Burri, 2021. "The importance of age in compositional and functional profiling of the human intestinal microbiome," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-13, October.
    19. Jian Wang & Cielito C. Reyes-Gibby & Sanjay Shete, 2021. "An Approach to Analyze Longitudinal Zero-Inflated Microbiome Count Data Using Two-Stage Mixed Effects Models," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 267-290, July.
    20. Kang Li & Zeng Dan & Luobu Gesang & Hong Wang & Yongjian Zhou & Yanlei Du & Yi Ren & Yixiang Shi & Yuqiang Nie, 2016. "Comparative Analysis of Gut Microbiota of Native Tibetan and Han Populations Living at Different Altitudes," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-16, May.
