Source: IDEAS (RePEc), https://ideas.repec.org/a/plo/pcbi00/1003531.html

Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible

Authors

  • Paul J McMurdie
  • Susan Holmes

Abstract

Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to rarefy counts, even though both approaches are inappropriate for detecting differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model, for instance) are already available in RNA-Seq-focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate the substantial improvements that a relevant mixture model framework provides over simple proportions or rarefying. We show that both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. For sample-wise clustering of microbiomes, we also show that rarefying often discards samples that alternative methods can cluster accurately. We further compare different Negative Binomial methods with a recently described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.

Author Summary

The term microbiome refers to the ecosystem of microbes that live in a defined environment.
The decreasing cost and increasing speed of DNA sequencing technology have recently given scientists affordable and timely access to the genes and genomes of the microbiomes that inhabit our planet and even our own bodies. In these investigations, many microbiome samples are sequenced at the same time on the same DNA sequencing machine, but the resulting total numbers of sequences per sample are often vastly different. The common procedure for addressing this difference in sequencing effort across samples (different library sizes) is either to (1) base analyses on the proportional abundance of each species in a library, or (2) rarefy, that is, throw away sequences from the larger libraries so that all libraries have the same, smallest size. We show that both of these normalization methods can work when comparing obviously different whole microbiomes, but that neither works well when comparing the relative proportions of each bacterial species across microbiome samples. We show that alternative methods based on a statistical mixture model perform much better and can easily be adapted from a separate biological sub-discipline, RNA-Seq analysis.
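
The two normalizations described above, together with the size-factor idea used by Negative Binomial tools, can be sketched in a few lines of Python. This is an illustrative sketch only, assuming a samples-by-taxa NumPy count matrix. The function names `proportions`, `rarefy`, and `median_of_ratios_size_factors` are hypothetical and are not the implementations in phyloseq, DESeq, or edgeR. Here `rarefy` subsamples reads without replacement down to the smallest library size (matching the "throw away sequences" description), and the last function follows the DESeq-style median-of-ratios idea, in which size factors rescale expected counts instead of discarding reads.

```python
import numpy as np


def proportions(counts):
    """Option (1): divide each sample (row) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def rarefy(counts, depth=None, seed=0):
    """Option (2): subsample each sample's reads without replacement down to
    `depth` (default: the smallest library size). Samples with fewer than
    `depth` reads are dropped entirely."""
    counts = np.asarray(counts, dtype=int)
    rng = np.random.default_rng(seed)
    lib_sizes = counts.sum(axis=1)
    if depth is None:
        depth = int(lib_sizes.min())
    n_taxa = counts.shape[1]
    kept = []
    for row in counts[lib_sizes >= depth]:
        # One entry per read, labelled with the index of the taxon it came from.
        reads = np.repeat(np.arange(n_taxa), row)
        picked = rng.choice(reads, size=depth, replace=False)
        kept.append(np.bincount(picked, minlength=n_taxa))
    return np.array(kept)


def median_of_ratios_size_factors(counts):
    """DESeq-style size factors: for each sample, the median ratio of its counts
    to the per-taxon geometric mean across samples. Only taxa observed in every
    sample are used (assumes at least one such taxon exists)."""
    counts = np.asarray(counts, dtype=float)
    observed_everywhere = (counts > 0).all(axis=0)
    log_counts = np.log(counts[:, observed_everywhere])
    log_geo_means = log_counts.mean(axis=0)
    return np.exp(np.median(log_counts - log_geo_means, axis=1))


# Toy samples-by-taxa matrix: sample 1 is sample 0 sequenced 10x more deeply.
counts = np.array([[10, 20, 70],
                   [100, 200, 700],
                   [5, 5, 40]])

print(proportions(counts))                    # rows 0 and 1 are identical
print(rarefy(counts))                         # every row now sums to 50
print(median_of_ratios_size_factors(counts))  # sample 1's factor is 10x sample 0's
```

In this toy example, rarefying keeps only 50 of the second sample's 1,000 reads, while the size factors leave every count in place and simply record that sample 1 was sequenced ten times more deeply than sample 0.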

Suggested Citation

  • Paul J McMurdie & Susan Holmes, 2014. "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-12, April.
  • Handle: RePEc:plo:pcbi00:1003531
    DOI: 10.1371/journal.pcbi.1003531

    Download full text from publisher (no download restriction)

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531
    File URL (printable): https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003531&type=printable

    References listed on IDEAS

    1. Cameron, A. Colin & Trivedi, Pravin K., 2013. "Regression Analysis of Count Data," Cambridge Books, Cambridge University Press, number 9781107667273, January.
    2. James Robert White & Niranjan Nagarajan & Mihai Pop, 2009. "Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples," PLOS Computational Biology, Public Library of Science, vol. 5(4), pages 1-11, April.
    3. Wickham, Hadley, 2011. "The Split-Apply-Combine Strategy for Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i01).
    4. Wickham, Hadley, 2007. "Reshaping Data with the reshape Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 21(i12).
    5. Tanya Yatsunenko & Federico E. Rey & Mark J. Manary & Indi Trehan & Maria Gloria Dominguez-Bello & Monica Contreras & Magda Magris & Glida Hidalgo & Robert N. Baldassano & Andrey P. Anokhin & Andrew C, 2012. "Human gut microbiome viewed across age and geography," Nature, Nature, vol. 486(7402), pages 222-227, June.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Aaron C Ericsson & J Wade Davis & William Spollen & Nathan Bivens & Scott Givan & Catherine E Hagan & Mark McIntosh & Craig L Franklin, 2015. "Effects of Vendor and Genetic Background on the Composition of the Fecal Microbiota of Inbred Mice," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-19, February.
    2. Amanda H Pendegraft & Boyi Guo & Nengjun Yi, 2019. "Bayesian hierarchical negative binomial models for multivariable analyses with applications to human microbiome count data," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-23, August.
    3. Ewa Sajnaga & Marcin Skowronek & Agnieszka Kalwasińska & Waldemar Kazimierczak & Magdalena Lis & Monika Elżbieta Jach & Adrian Wiater, 2022. "Comparative Nanopore Sequencing-Based Evaluation of the Midgut Microbiota of the Summer Chafer (Amphimallon solstitiale L.) Associated with Possible Resistance to Entomopathogenic Nematodes," IJERPH, MDPI, vol. 19(6), pages 1-16, March.
    4. Duo Jiang & Thomas Sharpton & Yuan Jiang, 2021. "Microbial Interaction Network Estimation via Bias-Corrected Graphical Lasso," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 329-350, July.
    5. Toby Kenney & Hong Gu & Tianshu Huang, 2021. "Poisson PCA: Poisson measurement error corrected PCA, with application to microbiome data," Biometrics, The International Biometric Society, vol. 77(4), pages 1369-1384, December.
    6. Lucas Czech & Alexandros Stamatakis, 2019. "Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-50, May.
    7. Zhigang Li & Katherine Lee & Margaret R. Karagas & Juliette C. Madan & Anne G. Hoen & A. James O’Malley & Hongzhe Li, 2018. "Conditional Regression Based on a Multivariate Zero-Inflated Logistic-Normal Model for Microbiome Relative Abundance Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 10(3), pages 587-608, December.
    8. Shilan Li & Jianxin Shi & Paul Albert & Hong-Bin Fang, 2022. "Dependence Structure Analysis and Its Application in Human Microbiome," Mathematics, MDPI, vol. 11(1), pages 1-14, December.
    9. Yask Gupta & Anna Lara Ernst & Artem Vorobyev & Foteini Beltsiou & Detlef Zillikens & Katja Bieber & Simone Sanna-Cherchi & Angela M. Christiano & Christian D. Sadik & Ralf J. Ludwig & Tanya Sezin, 2023. "Impact of diet and host genetics on the murine intestinal mycobiome," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    10. Andrea Quagliariello & Alessandra Modi & Gabriel Innocenti & Valentina Zaro & Cecilia Conati Barbaro & Annamaria Ronchitelli & Francesco Boschin & Claudio Cavazzuti & Elena Dellù & Francesca Radina & , 2022. "Ancient oral microbiomes support gradual Neolithic dietary shifts towards agriculture," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    11. Zachary D Kurtz & Christian L Müller & Emily R Miraldi & Dan R Littman & Martin J Blaser & Richard A Bonneau, 2015. "Sparse and Compositionally Robust Inference of Microbial Ecological Networks," PLOS Computational Biology, Public Library of Science, vol. 11(5), pages 1-25, May.
    12. M. McCauley & T. L. Goulet & C. R. Jackson & S. Loesgen, 2023. "Systematic review of cnidarian microbiomes reveals insights into the structure, specificity, and fidelity of marine associations," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    13. Chase C. James & Andrew D. Barton & Lisa Zeigler Allen & Robert H. Lampe & Ariel Rabines & Anne Schulberg & Hong Zheng & Ralf Goericke & Kelly D. Goodwin & Andrew E. Allen, 2022. "Influence of nutrient supply on plankton microbiome biodiversity and distribution in a coastal upwelling region," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    14. Anna C. Peterson & Himanshu Sharma & Arvind Kumar & Bruno M. Ghersi & Scott J. Emrich & Kurt J. Vandegrift & Amit Kapoor & Michael J. Blum, 2021. "Rodent Virus Diversity and Differentiation across Post-Katrina New Orleans," Sustainability, MDPI, vol. 13(14), pages 1-18, July.
    15. Chieh Lo & Radu Marculescu, 2017. "MPLasso: Inferring microbial association networks using prior microbial knowledge," PLOS Computational Biology, Public Library of Science, vol. 13(12), pages 1-20, December.
    16. Kurtis Shuler & Samuel Verbanic & Irene A. Chen & Juhee Lee, 2021. "A Bayesian nonparametric analysis for zero‐inflated multivariate count data with application to microbiome study," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 961-979, August.
    17. Dawid Nosek & Tomasz Mikołajczyk & Agnieszka Cydzik-Kwiatkowska, 2023. "Anode Modification with Fe2O3 Affects the Anode Microbiome and Improves Energy Generation in Microbial Fuel Cells Powered by Wastewater," IJERPH, MDPI, vol. 20(3), pages 1-21, January.
    18. Pratheepa Jeganathan & Susan P. Holmes, 2021. "A Statistical Perspective on the Challenges in Molecular Microbial Biology," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(2), pages 131-160, June.
    19. Tianchen Xu & Ryan T. Demmer & Gen Li, 2021. "Zero‐inflated Poisson factor model with application to microbiome read counts," Biometrics, The International Biometric Society, vol. 77(1), pages 91-101, March.
    20. Jianshi Jin & Reiko Yamamoto & Tadashi Takeuchi & Guangwei Cui & Eiji Miyauchi & Nozomi Hojo & Koichi Ikuta & Hiroshi Ohno & Katsuyuki Shiroguchi, 2022. "High-throughput identification and quantification of single bacterial cells in the microbiota," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    21. Francesco Spennati & Salvatore La China & Giovanna Siracusa & Simona Di Gregorio & Alessandra Bardi & Valeria Tigini & Gualtiero Mori & David Gabriel & Giulio Munz, 2021. "Tannery Wastewater Recalcitrant Compounds Foster the Selection of Fungi in Non-Sterile Conditions: A Pilot Scale Long-Term Test," IJERPH, MDPI, vol. 18(12), pages 1-18, June.
    22. Georgia Charalampous & Efsevia Fragkou & Konstantinos A. Kormas & Alexandre B. De Menezes & Paraskevi N. Polymenakou & Nikos Pasadakis & Nicolas Kalogerakis & Eleftheria Antoniou & Evangelia Gontikaki, 2021. "Comparison of Hydrocarbon-Degrading Consortia from Surface and Deep Waters of the Eastern Mediterranean Sea: Characterization and Degradation Potential," Energies, MDPI, vol. 14(8), pages 1-18, April.
    23. Stijn Hawinkel & J C W Rayner & Luc Bijnens & Olivier Thas, 2020. "Sequence count data are poorly fit by the negative binomial distribution," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-16, April.
    24. Robert H. Lampe & Tyler H. Coale & Kiefer O. Forsch & Loay J. Jabre & Samuel Kekuewa & Erin M. Bertrand & Aleš Horák & Miroslav Oborník & Ariel J. Rabines & Elden Rowland & Hong Zheng & Andreas J. And, 2023. "Short-term acidification promotes diverse iron acquisition and conservation mechanisms in upwelling-associated phytoplankton," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    25. Cameron Wagg & Aafke van Erk & Erica Fava & Louis-Pierre Comeau & T. Fatima Mitterboeck & Claudia Goyer & Sheng Li & Andrew McKenzie-Gopsill & Aaron Mills, 2021. "Full-Season Cover Crops and Their Traits That Promote Agroecosystem Services," Agriculture, MDPI, vol. 11(9), pages 1-26, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Allison G. White & George S. Watts & Zhenqiang Lu & Maria M. Meza-Montenegro & Eric A. Lutz & Philip Harber & Jefferey L. Burgess, 2014. "Environmental Arsenic Exposure and Microbiota in Induced Sputum," IJERPH, MDPI, vol. 11(2), pages 1-15, February.
    2. Miller, Christine M.F. & Waterhouse, Hannah & Harter, Thomas & Fadel, James G. & Meyer, Deanne, 2020. "Quantifying the uncertainty in nitrogen application and groundwater nitrate leaching in manure based cropping systems," Agricultural Systems, Elsevier, vol. 184(C).
    3. Sarlas, Georgios & Páez, Antonio & Axhausen, Kay W., 2020. "Betweenness-accessibility: Estimating impacts of accessibility on networks," Journal of Transport Geography, Elsevier, vol. 84(C).
    4. Marin FOTACHE & Florin DUMITRU & Valerica GREAVU-SERBAN, 2015. "An Information Systems Master Programme in Romania. Some Commonalities and Specificities," Informatica Economica, Academy of Economic Studies - Bucharest, Romania, vol. 19(3), pages 5-18.
    5. Martijn Van Heel & Dinska Van Gucht & Koen Vanbrabant & Frank Baeyens, 2017. "The Importance of Conditioned Stimuli in Cigarette and E-Cigarette Craving Reduction by E-Cigarettes," IJERPH, MDPI, vol. 14(2), pages 1-18, February.
    6. Sean McKenzie & Hilary Parkinson & Jane Mangold & Mary Burrows & Selena Ahmed & Fabian Menalled, 2018. "Perceptions, Experiences, and Priorities Supporting Agroecosystem Management Decisions Differ among Agricultural Producers, Consultants, and Researchers," Sustainability, MDPI, vol. 10(11), pages 1-19, November.
    7. Milad Abbasiharofteh & Tom Broekel, 2021. "Still in the shadow of the wall? The case of the Berlin biotechnology cluster," Environment and Planning A, , vol. 53(1), pages 73-94, February.
    8. Andee J. Kaplan & Eric R. Hare, 2019. "Putting down roots: a graphical exploration of community attachment," Computational Statistics, Springer, vol. 34(4), pages 1449-1464, December.
    9. Stefan LINGNER & Eiko THIESSEN & Kerrin MÜLLER & Eberhard HARTUNG, 2018. "Dry Biomass Estimation of Hedge Banks: Allometric Equation vs. Structure from Motion via Unmanned Aerial Vehicle," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 64(4), pages 149-156.
    10. Wickham, Hadley, 2014. "Tidy Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 59(i10).
    11. Cornelius J. König & Clemens B. Fell & Linus Kellnhofer & Gabriel Schui, 2015. "Are there gender differences among researchers from industrial/organizational psychology?," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(3), pages 1931-1952, December.
    12. C. Sean Burns & Charles W. Fox, 2017. "Language and socioeconomics predict geographic variation in peer review outcomes at an ecology journal," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(2), pages 1113-1127, November.
    13. Martín, Belén & Páez, Antonio, 2019. "Individual and geographic variations in the propensity to travel by active modes in Vitoria-Gasteiz, Spain," Journal of Transport Geography, Elsevier, vol. 76(C), pages 103-113.
    14. Fiona B. Tamburini & Dylan Maghini & Ovokeraye H. Oduaran & Ryan Brewster & Michaella R. Hulley & Venesa Sahibdeen & Shane A. Norris & Stephen Tollman & Kathleen Kahn & Ryan G. Wagner & Alisha N. Wade, 2022. "Short- and long-read metagenomics of urban and rural South African gut microbiomes reveal a transitional composition and undescribed taxa," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    15. Jean Mercenier & Maria Teresa Alvarez Martinez & Andries Brandsma & Francesco Di Comite & Olga Diukanova & d'Artis Kancs & Patrizio Lecca & Montserrat Lopez-Cobo & Philippe Monfort & Damiaan Persyn & , 2016. "RHOMOLO-v2 Model Description: A spatial computable general equilibrium model for EU regions and sectors," JRC Research Reports JRC100011, Joint Research Centre.
    16. Kayla A. Cotterman & Anthony D. Kendall & Bruno Basso & David W. Hyndman, 2018. "Groundwater depletion and climate change: future prospects of crop production in the Central High Plains Aquifer," Climatic Change, Springer, vol. 146(1), pages 187-200, January.
    17. Chrats Melkonian & Francisco Zorrilla & Inge Kjærbølling & Sonja Blasche & Daniel Machado & Mette Junge & Kim Ib Sørensen & Lene Tranberg Andersen & Kiran R. Patil & Ahmad A. Zeidan, 2023. "Microbial interactions shape cheese flavour formation," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    18. Jana S. Dietrich & Ellen A. R. Welti & Peter Haase, 2023. "Extreme climatic events alter the aquatic insect community in a pristine German stream," Climatic Change, Springer, vol. 176(6), pages 1-16, June.
    19. Thiele, Jan C. & Nuske, Robert S. & Ahrends, Bernd & Panferov, Oleg & Albert, Matthias & Staupendahl, Kai & Junghans, Udo & Jansen, Martin & Saborowski, Joachim, 2017. "Climate change impact assessment—A simulation experiment with Norway spruce for a forest district in Central Europe," Ecological Modelling, Elsevier, vol. 346(C), pages 30-47.
    20. Wang, Xu & Zhang, Xiaobo & Xie, Zhuan & Huang, Yiping, 2016. "Roads to innovation: Firm-level evidence from China," IFPRI discussion papers 1542, International Food Policy Research Institute (IFPRI).


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.