Printed from https://ideas.repec.org/a/plo/pcbi00/1009089.html

A zero inflated log-normal model for inference of sparse microbial association networks

Authors
  • Vincent Prost
  • Stéphane Gazut
  • Thomas Brüls

Abstract

The advent of high-throughput metagenomic sequencing has prompted the development of efficient taxonomic profiling methods that measure the presence, abundance and phylogeny of organisms in a wide range of environmental samples. Multivariate sequence-derived abundance data further have the potential to enable inference of ecological associations between microbial populations, but several technical issues need to be accounted for, such as the compositional nature of the data, its extreme sparsity and overdispersion, as well as the frequent need to operate in under-determined regimes.

The ecological network reconstruction problem is frequently cast into the paradigm of Gaussian Graphical Models (GGMs), for which efficient structure inference algorithms are available, such as the graphical lasso and neighborhood selection. Unfortunately, GGMs and their variants cannot properly account for the extremely sparse patterns occurring in real-world metagenomic taxonomic profiles. In particular, structural zeros (as opposed to sampling zeros), which correspond to true absences of biological signals, are not properly handled by most statistical methods.

We present here a zero-inflated log-normal graphical model (available at https://github.com/vincentprost/Zi-LN) specifically aimed at handling such “biological” zeros, and demonstrate significant performance gains over state-of-the-art statistical methods for the inference of microbial association networks, with the most notable gains obtained when analyzing taxonomic profiles whose sparsity levels are on par with those of real-world metagenomic datasets.

Author summary

The importance of associations in the structuring and dynamics of community members is widely acknowledged, but we are currently unable to co-culture most of the micro-organisms sampled from the environment.
Computational methods to predict microbial associations can therefore be of practical interest, in particular given the large amounts of multivariate microbial abundance data generated by metagenomics. These data can in theory be leveraged to infer association networks, but with limited success so far, as several of their attributes lead to technical difficulties, including, among others, extreme sparsity, compositionality and overdispersion. In particular, structural zeros (as opposed to sampling and technical zeros), which correspond to true absences of biological signals, frequently fail to be properly handled, and such non-random absences can lead to high levels of false positives. Given their prevalence, zero values should be handled by the model itself, by accounting for the zero-generating process in the first place. We describe here a truncated log-normal graphical model that specifically addresses zeros originating from biological absences, and discuss consistent methods for estimating sparse, high-dimensional association networks. We also show that this model generates sparse multivariate counts that are closer to those derived from real-world microbiomes.
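The summary describes a truncated log-normal model in which zeros arise from biological absences. As a rough illustration only, the sketch below assumes one simple form of such a model, not necessarily the Zi-LN formulation: a latent Gaussian vector Z ~ N(mu, L L^T), where taxon j is recorded as absent (a structural zero) whenever Z_j falls below a per-taxon threshold, and otherwise has abundance exp(Z_j). The function names `simulate_truncated_lognormal` and `presence_stats`, the Cholesky-factor input `chol` and the threshold values are hypothetical choices for illustration; count sampling and compositional normalization, which real profiles would involve, are omitted.

```python
import math
import random


def simulate_truncated_lognormal(n_samples, mu, chol, thresholds, seed=0):
    """Hypothetical sketch of a truncated log-normal generative model.

    Latent vector Z ~ N(mu, L L^T), with L = `chol` a lower-triangular
    Cholesky factor given as nested lists. Taxon j is absent (value 0.0)
    when Z_j < thresholds[j]; otherwise its abundance is exp(Z_j).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    p = len(mu)
    samples = []
    for _ in range(n_samples):
        # Independent standard normal draws.
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Correlate them: z = mu + L e, summing only over the lower triangle.
        z = [mu[i] + sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(p)]
        # Truncation step: latent values below the threshold become zeros.
        samples.append([math.exp(zj) if zj >= t else 0.0 for zj, t in zip(z, thresholds)])
    return samples


def presence_stats(samples, i, j):
    """Empirical P(i present), P(j present) and P(both present)."""
    n = len(samples)
    pi = sum(1 for row in samples if row[i] > 0) / n
    pj = sum(1 for row in samples if row[j] > 0) / n
    pij = sum(1 for row in samples if row[i] > 0 and row[j] > 0) / n
    return pi, pj, pij


if __name__ == "__main__":
    # Two taxa with latent correlation 0.6: L = [[1, 0], [0.6, 0.8]]
    # gives the covariance matrix [[1, 0.6], [0.6, 1]].
    data = simulate_truncated_lognormal(
        5000, mu=[0.0, 0.0], chol=[[1.0, 0.0], [0.6, 0.8]], thresholds=[0.0, 0.5]
    )
    pi, pj, pij = presence_stats(data, 0, 1)
    print(f"P(taxon 0 present) = {pi:.3f}")
    print(f"P(taxon 1 present) = {pj:.3f}")
    print(f"P(both present)    = {pij:.3f} vs. {pi * pj:.3f} under independence")
```

In this toy setting the two taxa are jointly present noticeably more often than the product of their marginal presence rates would suggest, so the zero pattern itself reflects the latent association. Recovering a network would still require estimating the latent Gaussian dependence structure (for instance with the graphical lasso or neighborhood selection mentioned in the abstract), which this sketch does not attempt.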

Suggested Citation

  • Vincent Prost & Stéphane Gazut & Thomas Brüls, 2021. "A zero inflated log-normal model for inference of sparse microbial association networks," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-17, June.
  • Handle: RePEc:plo:pcbi00:1009089
    DOI: 10.1371/journal.pcbi.1009089

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009089
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009089&type=printable
    Download Restriction: no

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.