Sparse and Compositionally Robust Inference of Microbial Ecological Networks

Authors

  • Zachary D Kurtz
  • Christian L Müller
  • Emily R Miraldi
  • Dan R Littman
  • Martin J Blaser
  • Richard A Bonneau

Abstract

16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. 
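A small simulation can illustrate the compositional problem described above. The sketch below is a hypothetical example, not code from SPIEC-EASI. It draws independent absolute abundances for three made-up taxa and normalizes each sample to its total count. It then compares Pearson correlations before and after that normalization. It also applies the centered log-ratio (clr) transform. The clr is one standard transform from compositional data analysis, and it is used here as an assumed example because the abstract does not name a specific transform. All function names are illustrative.

```python
import math
import random


def closure(counts):
    """Divide every entry by the sample total so the entries sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(values):
    """Centered log-ratio: log of each entry minus the mean log of the sample.

    Entries must be strictly positive; zero counts would need a pseudocount first.
    """
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 500 samples of three taxa with independent absolute abundances.
random.seed(0)
absolute = [[random.lognormvariate(5, 1) for _ in range(3)] for _ in range(500)]
relative = [closure(row) for row in absolute]

# Correlation between taxon 0 and taxon 1, before and after normalization.
r_abs = pearson([row[0] for row in absolute], [row[1] for row in absolute])
r_rel = pearson([row[0] for row in relative], [row[1] for row in relative])
print(f"absolute r = {r_abs:.2f}, relative r = {r_rel:.2f}")

# clr depends only on ratios, so rescaling a sample to proportions leaves it unchanged.
sample = [10.0, 20.0, 70.0]
print(clr(sample))
print(clr(closure(sample)))
```

The absolute abundances are drawn independently, so their correlation should be close to zero. Normalization alone pushes the correlation between proportions negative. For three exchangeable taxa the expected value is about -1/2, because the proportions must sum to one. The two clr outputs agree up to floating-point rounding, because the transform depends only on ratios between entries.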
SPIEC-EASI outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial associations using data from the American Gut project.
Author Summary: Genomic survey of microbes by 16S rRNA gene sequencing and metagenomics has inspired appreciation for the role of complex communities in diverse ecosystems. However, due to the unique properties of community composition data, standard data analysis tools are likely to produce statistical artifacts. For a typical experiment studying microbial ecosystems these artifacts can lead to erroneous conclusions about patterns of associations between microbial taxa. We developed a new procedure that seeks to infer ecological associations between microbial populations, by 1) taking advantage of the proportionality invariance of relative abundance data and 2) making assumptions about the underlying network structure when the number of taxa in the dataset is larger than the number of sampled communities. Additionally, we employed a novel tool to generate biologically plausible synthetic data and objectively benchmark current association inference tools. Finally, we tested our procedures on a large-scale 16S rRNA gene sequencing dataset sampled from the human gut.
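Neighborhood selection makes the sparsity assumption concrete, and it is one of the two selection strategies the abstract names. Each variable is regressed on all the others with an L1 (lasso) penalty, and the nonzero coefficients become candidate edges. The sketch below is a simplified, hypothetical illustration on simulated Gaussian data with a known chain structure, and it makes several assumptions:

- It skips the compositional transform.
- It uses a fixed penalty `lam` chosen by hand rather than a data-driven selection procedure.
- It keeps an edge only if both regressions for a pair select it (the "AND" rule; an "OR" rule is the usual alternative).

None of the function names or settings come from the SPIEC-EASI software.

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly zero inside [-lam, lam]."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso(y, xs, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1.

    Assumes y and every column in xs are standardized, so each coordinate
    update is a single soft-thresholding step.
    """
    n = len(y)
    b = [0.0] * len(xs)
    resid = list(y)
    for _ in range(n_iter):
        for j, x in enumerate(xs):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(xi * ri for xi, ri in zip(x, resid)) / n + b[j]
            new = soft_threshold(rho, lam)
            if new != b[j]:
                delta = new - b[j]
                resid = [ri - xi * delta for xi, ri in zip(x, resid)]
                b[j] = new
    return b


def neighborhood_selection(columns, lam):
    """Lasso-regress each variable on all others; keep pairs selected both ways."""
    p = len(columns)
    selected = set()
    for i in range(p):
        others = [j for j in range(p) if j != i]
        coefs = lasso(columns[i], [columns[j] for j in others], lam)
        for j, c in zip(others, coefs):
            if c != 0.0:
                selected.add((i, j))
    return {(i, j) for (i, j) in selected if i < j and (j, i) in selected}


# Hypothetical data: a chain 0 - 1 - 2 plus an unconnected variable 3.
random.seed(1)
n = 2000
x0 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * a + random.gauss(0, 1) for a in x0]
x2 = [0.8 * a + random.gauss(0, 1) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
data = [standardize(c) for c in (x0, x1, x2, x3)]

edges = neighborhood_selection(data, lam=0.3)
print(sorted(edges))
```

In the simulated chain, variables 0 and 2 are marginally correlated but conditionally independent given variable 1. A sparse conditional-dependence estimate should therefore keep only the edges (0, 1) and (1, 2). This separation between conditional (network) associations and plain correlation is what motivates a graphical-model framework.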

Suggested Citation

  • Zachary D Kurtz & Christian L Müller & Emily R Miraldi & Dan R Littman & Martin J Blaser & Richard A Bonneau, 2015. "Sparse and Compositionally Robust Inference of Microbial Ecological Networks," PLOS Computational Biology, Public Library of Science, vol. 11(5), pages 1-25, May.
  • Handle: RePEc:plo:pcbi00:1004226
    DOI: 10.1371/journal.pcbi.1004226

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004226
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004226&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Peter J. Turnbaugh & Ruth E. Ley & Micah Hamady & Claire M. Fraser-Liggett & Rob Knight & Jeffrey I. Gordon, 2007. "The Human Microbiome Project," Nature, Nature, vol. 449(7164), pages 804-810, October.
    2. Clifford Lam & Jianqing Fan, 2009. "Sparsistency and rates of convergence in large covariance matrix estimation," LSE Research Online Documents on Economics 31540, London School of Economics and Political Science, LSE Library.
    3. Ming Yuan & Yi Lin, 2007. "Model selection and estimation in the Gaussian graphical model," Biometrika, Biometrika Trust, vol. 94(1), pages 19-35.
    4. Paul J McMurdie & Susan Holmes, 2014. "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-12, April.
    5. Wei Lin & Pixu Shi & Rui Feng & Hongzhe Li, 2014. "Variable selection in regression with compositional covariates," Biometrika, Biometrika Trust, vol. 101(4), pages 785-797.
    6. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Runtan Cheng & Lu Wang & Shenglong Le & Yifan Yang & Can Zhao & Xiangqi Zhang & Xin Yang & Ting Xu & Leiting Xu & Petri Wiklund & Jun Ge & Dajiang Lu & Chenhong Zhang & Luonan Chen & Sulin Cheng, 2022. "A randomized controlled trial for response of microbiome network to exercise and diet intervention in patients with nonalcoholic fatty liver disease," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    2. Yu Shang & Johannes Sikorski & Michael Bonkowski & Anna-Maria Fiore-Donno & Ellen Kandeler & Sven Marhan & Runa S Boeddinghaus & Emily F Solly & Marion Schrumpf & Ingo Schöning & Tesfaye Wubet & Franc, 2017. "Inferring interactions in complex microbial communities from nucleotide sequence data and environmental parameters," PLOS ONE, Public Library of Science, vol. 12(3), pages 1-24, March.
    3. Huang Lin & Merete Eggesbø & Shyamal Das Peddada, 2022. "Linear and nonlinear correlation estimators unveil undescribed taxa interactions in microbiome data," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    4. Duo Jiang & Thomas Sharpton & Yuan Jiang, 2021. "Microbial Interaction Network Estimation via Bias-Corrected Graphical Lasso," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 329-350, July.
    5. Maria Rita Perrone & Salvatore Romano & Giuseppe De Maria & Paolo Tundo & Anna Rita Bruno & Luigi Tagliaferro & Michele Maffia & Mattia Fragola, 2022. "Compositional Data Analysis of 16S rRNA Gene Sequencing Results from Hospital Airborne Microbiome Samples," IJERPH, MDPI, vol. 19(16), pages 1-21, August.
    6. Oliver Aasmets & Kertu Liis Krigul & Kreete Lüll & Andres Metspalu & Elin Org, 2022. "Gut metagenome associations with extensive digital health data in a volunteer-based Estonian microbiome cohort," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    7. Lingjing Jiang & Niina Haiminen & Anna‐Paola Carrieri & Shi Huang & Yoshiki Vázquez‐Baeza & Laxmi Parida & Ho‐Cheol Kim & Austin D. Swafford & Rob Knight & Loki Natarajan, 2022. "Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data," Biometrics, The International Biometric Society, vol. 78(3), pages 1155-1167, September.
    8. Qin Liu & Qi Su & Fen Zhang & Hein M. Tun & Joyce Wing Yan Mak & Grace Chung-Yan Lui & Susanna So Shan Ng & Jessica Y. L. Ching & Amy Li & Wenqi Lu & Chenyu Liu & Chun Pan Cheung & David S. C. Hui & P, 2022. "Multi-kingdom gut microbiota analyses define COVID-19 severity and post-acute COVID-19 syndrome," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    9. Courtney M. Thomas & Elie Desmond-Le Quéméner & Simonetta Gribaldo & Guillaume Borrel, 2022. "Factors shaping the abundance and diversity of the gut archaeome across the animal kingdom," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    10. Juan José Egozcue & Vera Pawlowsky-Glahn, 2019. "Compositional data: the sample space and its structure," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 599-638, September.
    11. Ines Wilms & Jacob Bien, 2021. "Tree-based Node Aggregation in Sparse Graphical Models," Papers 2101.12503, arXiv.org.
    12. Susheel Bhanu Busi & Massimo Bourquin & Stilianos Fodelianakis & Grégoire Michoud & Tyler J. Kohler & Hannes Peter & Paraskevi Pramateftaki & Michail Styllas & Matteo Tolosano & Vincent Staercke & Mar, 2022. "Genomic and metabolic adaptations of biofilms to ecological windows of opportunity in glacier-fed streams," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    13. Jing Ma, 2021. "Joint Microbial and Metabolomic Network Estimation with the Censored Gaussian Graphical Model," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 351-372, July.
    14. McGillivray, Annaliza & Khalili, Abbas & Stephens, David A., 2020. "Estimating sparse networks with hubs," Journal of Multivariate Analysis, Elsevier, vol. 179(C).
    15. Brandon Kieft & Niko Finke & Ryan J. McLaughlin & Aditi N. Nallan & Martin Krzywinski & Sean A. Crowe & Steven J. Hallam, 2023. "Genome-resolved correlation mapping links microbial community structure to metabolic interactions driving methane production from wastewater," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    16. Emma Schwager & Himel Mallick & Steffen Ventz & Curtis Huttenhower, 2017. "A Bayesian method for detecting pairwise associations in compositional data," PLOS Computational Biology, Public Library of Science, vol. 13(11), pages 1-21, November.
    17. Chieh Lo & Radu Marculescu, 2017. "MPLasso: Inferring microbial association networks using prior microbial knowledge," PLOS Computational Biology, Public Library of Science, vol. 13(12), pages 1-20, December.
    18. Pratheepa Jeganathan & Susan P. Holmes, 2021. "A Statistical Perspective on the Challenges in Molecular Microbial Biology," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(2), pages 131-160, June.
    19. Ana Popovic & Celine Bourdon & Pauline W. Wang & David S. Guttman & Sajid Soofi & Zulfiqar A. Bhutta & Robert H. J. Bandsma & John Parkinson & Lisa G. Pell, 2021. "Micronutrient supplements can promote disruptive protozoan and fungal communities in the developing infant gut," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    20. Tyler A Joseph & Liat Shenhav & Joao B Xavier & Eran Halperin & Itsik Pe’er, 2020. "Compositional Lotka-Volterra describes microbial dynamics in the simplex," PLOS Computational Biology, Public Library of Science, vol. 16(5), pages 1-22, May.
    21. Rieser, Christopher & Filzmoser, Peter, 2023. "Extending compositional data analysis from a graph signal processing perspective," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
    22. Jiarui Lu & Pixu Shi & Hongzhe Li, 2019. "Generalized linear models with linear constraints for microbiome compositional data," Biometrics, The International Biometric Society, vol. 75(1), pages 235-244, March.
    23. Dina in ‘t Zandt & Zuzana Kolaříková & Tomáš Cajthaml & Zuzana Münzbergová, 2023. "Plant community stability is associated with a decoupling of prokaryote and fungal soil networks," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    24. Liang, Wanfeng & Wu, Yue & Ma, Xiaoyan, 2022. "Robust sparse precision matrix estimation for high-dimensional compositional data," Statistics & Probability Letters, Elsevier, vol. 184(C).
    25. Li, Lianwei & Li, Wendy & Zou, Quan & Ma, Zhanshan (Sam), 2020. "Network analysis of the hot spring microbiome sketches out possible niche differentiations among ecological guilds," Ecological Modelling, Elsevier, vol. 431(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Khai X. Chiong & Hyungsik Roger Moon, 2017. "Estimation of Graphical Models using the $L_{1,2}$ Norm," Papers 1709.10038, arXiv.org, revised Oct 2017.
    2. Liangliang Zhang & Yushu Shi & Robert R. Jenq & Kim‐Anh Do & Christine B. Peterson, 2021. "Bayesian compositional regression with structured priors for microbiome feature selection," Biometrics, The International Biometric Society, vol. 77(3), pages 824-838, September.
    3. Pan, Yuqing & Mai, Qing, 2020. "Efficient computation for differential network analysis with applications to quadratic discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    4. Ziqi Chen & Chenlei Leng, 2016. "Dynamic Covariance Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 1196-1207, July.
    5. Zhigang Li & Katherine Lee & Margaret R. Karagas & Juliette C. Madan & Anne G. Hoen & A. James O’Malley & Hongzhe Li, 2018. "Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 10(3), pages 587-608, December.
    6. McGillivray, Annaliza & Khalili, Abbas & Stephens, David A., 2020. "Estimating sparse networks with hubs," Journal of Multivariate Analysis, Elsevier, vol. 179(C).
    7. Li, Peili & Xiao, Yunhai, 2018. "An efficient algorithm for sparse inverse covariance matrix estimation based on dual formulation," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 292-307.
    8. Duo Jiang & Thomas Sharpton & Yuan Jiang, 2021. "Microbial Interaction Network Estimation via Bias-Corrected Graphical Lasso," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 329-350, July.
    9. Lam, Clifford, 2008. "Estimation of large precision matrices through block penalization," LSE Research Online Documents on Economics 31543, London School of Economics and Political Science, LSE Library.
    10. Shilan Li & Jianxin Shi & Paul Albert & Hong-Bin Fang, 2022. "Dependence Structure Analysis and Its Application in Human Microbiome," Mathematics, MDPI, vol. 11(1), pages 1-14, December.
    11. Benjamin Poignard & Manabu Asai, 2023. "Estimation of high-dimensional vector autoregression via sparse precision matrix," The Econometrics Journal, Royal Economic Society, vol. 26(2), pages 307-326.
    12. Dong Liu & Changwei Zhao & Yong He & Lei Liu & Ying Guo & Xinsheng Zhang, 2023. "Simultaneous cluster structure learning and estimation of heterogeneous graphs for matrix‐variate fMRI data," Biometrics, The International Biometric Society, vol. 79(3), pages 2246-2259, September.
    13. Huangdi Yi & Qingzhao Zhang & Cunjie Lin & Shuangge Ma, 2022. "Information‐incorporated Gaussian graphical model for gene expression data," Biometrics, The International Biometric Society, vol. 78(2), pages 512-523, June.
    14. S Klaassen & J Kueck & M Spindler & V Chernozhukov, 2023. "Uniform inference in high-dimensional Gaussian graphical models," Biometrika, Biometrika Trust, vol. 110(1), pages 51-68.
    15. Liu, Jianyu & Yu, Guan & Liu, Yufeng, 2019. "Graph-based sparse linear discriminant analysis for high-dimensional classification," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 250-269.
    16. Lafit, Ginette & Nogales Martín, Francisco Javier & Zamar, Rubén, 2015. "Ranking Edges and Model Selection in High-Dimensional Graphs," DES - Working Papers. Statistics and Econometrics. WS ws1511, Universidad Carlos III de Madrid. Departamento de Estadística.
    17. Lichun Wang & Yuan You & Heng Lian, 2015. "Convergence and sparsity of Lasso and group Lasso in high-dimensional generalized linear models," Statistical Papers, Springer, vol. 56(3), pages 819-828, August.
    18. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    19. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    20. Wei Lan & Ronghua Luo & Chih-Ling Tsai & Hansheng Wang & Yunhong Yang, 2015. "Testing the Diagonality of a Large Covariance Matrix in a Regression Setting," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(1), pages 76-86, January.
