Correcting for batch effects in case-control microbiome studies

Author

  • Sean M Gibbons
  • Claire Duvallet
  • Eric J Alm

Abstract

High-throughput data generation platforms, like mass spectrometry, microarrays, and second-generation sequencing, are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure where features (i.e. bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We look at how this percentile-normalization method compares to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses.

Author summary: Batch effects are obstacles to comparing results across studies. Traditional meta-analysis techniques for combining p-values from independent studies, like Fisher’s method, are effective but statistically conservative. If batch effects can be corrected, then statistical tests can be performed on data pooled across studies, increasing sensitivity to detect differences between treatment groups. Here, we show how a simple, model-free approach corrects for batch effects in case-control microbiome datasets.
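The percentile-normalization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`percentile_of_controls`, `percentile_normalize`), the samples-by-features array layout, and the handling of ties are all assumptions made for illustration. The sketch also assumes that control samples are mapped onto their own control distribution so that cases and controls end up on the same 0 to 100 scale; the abstract only states that case samples are converted.

```python
import numpy as np


def percentile_of_controls(values, controls):
    """Percentile (0-100) of each value within the control distribution.

    Ties are counted as half (an illustrative choice, not from the paper).
    """
    values = np.asarray(values, dtype=float)
    controls = np.asarray(controls, dtype=float)
    less = (controls[None, :] < values[:, None]).sum(axis=1)
    equal = (controls[None, :] == values[:, None]).sum(axis=1)
    return 100.0 * (less + 0.5 * equal) / controls.size


def percentile_normalize(abundance, is_case):
    """Normalize ONE study: rows are samples, columns are features (taxa).

    Every sample is expressed as a percentile of that study's controls,
    feature by feature. Hypothetical helper for illustration only.
    """
    abundance = np.asarray(abundance, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    out = np.empty_like(abundance)
    for j in range(abundance.shape[1]):
        controls = abundance[~is_case, j]
        out[:, j] = percentile_of_controls(abundance[:, j], controls)
    return out


# Toy data: one taxon, four controls followed by two cases.
is_case = np.array([False, False, False, False, True, True])
study_a = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [0.0]])
# Second "study": same biology with a made-up 10x multiplicative batch effect.
study_b = 10.0 * study_a

norm_a = percentile_normalize(study_a, is_case)
norm_b = percentile_normalize(study_b, is_case)

# After normalizing within each study, the data can be pooled across studies.
pooled = np.vstack([norm_a, norm_b])
pooled_is_case = np.concatenate([is_case, is_case])
```

In this toy case the two studies differ only by a scaling factor, so their percentile-normalized values are identical and the pooled matrix can be passed to a single case-versus-control test. Because the transformation depends only on ranks within each study's controls, no parametric model of the batch effect is needed.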

Suggested Citation

  • Sean M Gibbons & Claire Duvallet & Eric J Alm, 2018. "Correcting for batch effects in case-control microbiome studies," PLOS Computational Biology, Public Library of Science, vol. 14(4), pages 1-17, April.
  • Handle: RePEc:plo:pcbi00:1006102
    DOI: 10.1371/journal.pcbi.1006102

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006102
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006102&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Claire Duvallet & Sean M. Gibbons & Thomas Gurry & Rafael A. Irizarry & Eric J. Alm, 2017. "Meta-analysis of gut microbiome studies identifies disease-specific and shared responses," Nature Communications, Nature, vol. 8(1), pages 1-10, December.
    2. Peter J. Turnbaugh & Micah Hamady & Tanya Yatsunenko & Brandi L. Cantarel & Alexis Duncan & Ruth E. Ley & Mitchell L. Sogin & William J. Jones & Bruce A. Roe & Jason P. Affourtit & Michael Egholm & Be, 2009. "A core gut microbiome in obese and lean twins," Nature, Nature, vol. 457(7228), pages 480-484, January.
    3. Chao Chen & Kay Grennan & Judith Badner & Dandan Zhang & Elliot Gershon & Li Jin & Chunyu Liu, 2011. "Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods," PLOS ONE, Public Library of Science, vol. 6(2), pages 1-10, February.
    4. Edoardo Pasolli & Duy Tin Truong & Faizan Malik & Levi Waldron & Nicola Segata, 2016. "Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-26, July.
    5. Jeffrey T Leek & John D Storey, 2007. "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis," PLOS Genetics, Public Library of Science, vol. 3(9), pages 1-12, September.

    Citations

    Cited by:

    1. Wodan Ling & Jiuyao Lu & Ni Zhao & Anju Lulla & Anna M. Plantinga & Weijia Fu & Angela Zhang & Hongjiao Liu & Hoseung Song & Zhigang Li & Jun Chen & Timothy W. Randolph & Wei Li A. Koay & James R. Whi, 2022. "Batch effects removal for microbiome data via conditional quantile regression," Nature Communications, Nature, vol. 13(1), pages 1-14, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Qi Su & Qin Liu & Raphaela Iris Lau & Jingwan Zhang & Zhilu Xu & Yun Kit Yeoh & Thomas W. H. Leung & Whitney Tang & Lin Zhang & Jessie Q. Y. Liang & Yuk Kam Yau & Jiaying Zheng & Chengyu Liu & Mengjin, 2022. "Faecal microbiome-based machine learning for multi-class disease diagnosis," Nature Communications, Nature, vol. 13(1), pages 1-8, December.
    2. Aline Talhouk & Stefan Kommoss & Robertson Mackenzie & Martin Cheung & Samuel Leung & Derek S Chiu & Steve E Kalloger & David G Huntsman & Stephanie Chen & Maria Intermaggio & Jacek Gronwald & Fong C , 2016. "Single-Patient Molecular Testing with NanoString nCounter Data Using a Reference-Based Strategy for Batch Effect Correction," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-18, April.
    3. Charlotte Soneson & Sarah Gerster & Mauro Delorenzi, 2014. "Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-13, June.
    4. Christian Müller & Arne Schillert & Caroline Röthemeier & David-Alexandre Trégouët & Carole Proust & Harald Binder & Norbert Pfeiffer & Manfred Beutel & Karl J Lackner & Renate B Schnabel & Laurence T, 2016. "Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-23, June.
    5. Alan Le Goallec & Braden T Tierney & Jacob M Luber & Evan M Cofer & Aleksandar D Kostic & Chirag J Patel, 2020. "A systematic machine learning and data type comparison yields metagenomic predictors of infant age, sex, breastfeeding, antibiotic usage, country of origin, and delivery type," PLOS Computational Biology, Public Library of Science, vol. 16(5), pages 1-21, May.
    6. Patrick D Schloss, 2009. "A High-Throughput DNA Sequence Aligner for Microbial Ecology Studies," PLOS ONE, Public Library of Science, vol. 4(12), pages 1-9, December.
    7. John Molloy & Katrina Allen & Fiona Collier & Mimi L. K. Tang & Alister C. Ward & Peter Vuillermin, 2013. "The Potential Link between Gut Microbiota and IgE-Mediated Food Allergy in Early Life," IJERPH, MDPI, vol. 10(12), pages 1-22, December.
    8. Bharati Patel & Kadamb Patel & Shabbir Moochhala, 2020. "Diet-Derived Post-Biotic Metabolites to Promote Microbiota Function and Human Health," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 28(2), pages 21520-21524, June.
    9. Xia Qing & Thompson Jeffrey A. & Koestler Devin C., 2021. "Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE)," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 20(4-6), pages 101-119, December.
    10. Ahmed A Metwally & Philip S Yu & Derek Reiman & Yang Dai & Patricia W Finn & David L Perkins, 2019. "Utilizing longitudinal microbiome taxonomic profiles to predict food allergy via Long Short-Term Memory networks," PLOS Computational Biology, Public Library of Science, vol. 15(2), pages 1-16, February.
    11. Arjun Bhattacharya & Anastasia N. Freedman & Vennela Avula & Rebeca Harris & Weifang Liu & Calvin Pan & Aldons J. Lusis & Robert M. Joseph & Lisa Smeester & Hadley J. Hartwell & Karl C. K. Kuban & Car, 2022. "Placental genomics mediates genetic associations with complex health traits and disease," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    12. Pirjo Wacklin & Harri Mäkivuokko & Noora Alakulppi & Janne Nikkilä & Heli Tenkanen & Jarkko Räbinä & Jukka Partanen & Kari Aranko & Jaana Mättö, 2011. "Secretor Genotype (FUT2 gene) Is Strongly Associated with the Composition of Bifidobacteria in the Human Intestine," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-10, May.
    13. Yunxi Liu & R. A. Leo Elworth & Michael D. Jochum & Kjersti M. Aagaard & Todd J. Treangen, 2022. "De novo identification of microbial contaminants in low microbial biomass microbiomes with Squeegee," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    14. C. E. Dubé & M. Ziegler & A. Mercière & E. Boissin & S. Planes & C. A. -F. Bourmaud & C. R. Voolstra, 2021. "Naturally occurring fire coral clones demonstrate a genetic and environmental basis of microbiome composition," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    15. Mariana F. Fernández & Iris Reina-Pérez & Juan Manuel Astorga & Andrea Rodríguez-Carrillo & Julio Plaza-Díaz & Luis Fontana, 2018. "Breast Cancer and Its Relationship with the Microbiota," IJERPH, MDPI, vol. 15(8), pages 1-20, August.
    16. Jaron Thompson & Renee Johansen & John Dunbar & Brian Munsky, 2019. "Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition," PLOS ONE, Public Library of Science, vol. 14(7), pages 1-16, July.
    17. repec:jss:jstsof:40:i14 is not listed on IDEAS
    18. Hung-Chih Chen & Yen-Wen Liu & Kuan-Cheng Chang & Yen-Wen Wu & Yi-Ming Chen & Yu-Kai Chao & Min-Yi You & David J. Lundy & Chen-Ju Lin & Marvin L. Hsieh & Yu-Che Cheng & Ray P. Prajnamitra & Po-Ju Lin , 2023. "Gut butyrate-producers confer post-infarction cardiac protection," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    19. Braden T Tierney & Yingxuan Tan & Zhen Yang & Bing Shui & Michaela J Walker & Benjamin M Kent & Aleksandar D Kostic & Chirag J Patel, 2022. "Systematically assessing microbiome–disease associations identifies drivers of inconsistency in metagenomic research," PLOS Biology, Public Library of Science, vol. 20(3), pages 1-18, March.
    20. Won Jun Lee & Sang Cheol Kim & Jung-Ho Yoon & Sang Jun Yoon & Johan Lim & You-Sun Kim & Sung Won Kwon & Jeong Hill Park, 2016. "Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-20, February.
    21. Emanuele Aliverti & Kristian Lum & James E. Johndrow & David B. Dunson, 2021. "Removing the influence of group variables in high‐dimensional predictive modelling," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 791-811, July.
