
Batch adjustment by reference alignment (BARA): Improved prediction performance in biological test sets with batch effects

Authors
  • Robin Gradin
  • Malin Lindstedt
  • Henrik Johansson

Abstract

Many biological data acquisition platforms inadvertently introduce biologically irrelevant variance into the analyzed data, collectively termed batch effects. Batch effects complicate downstream analysis by lowering the power to detect biologically interesting differences and can, in some instances, lead to false discoveries. They are especially troublesome in predictive modelling, where the samples in training and test sets are often completely correlated with batch. In this article, we present BARA, a normalization method for adjusting batch effects in predictive modelling. BARA uses a few reference samples to adjust for batch effects in a compressed data space spanned by the training set. We evaluate BARA on a collection of publicly available datasets with three different prediction models, and compare its performance to existing methods developed for similar purposes. The results show that data normalized with BARA yield high and consistent prediction performance. Further, they suggest that BARA produces reliable performance regardless of which of the examined classifiers is used. We therefore conclude that BARA has great potential to facilitate the development of predictive assays where test sets and training sets are correlated with batch.
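The abstract describes BARA only at a high level: a few reference samples are used to adjust for batch effects in a compressed data space spanned by the training set. As a rough illustration, the sketch below assumes that the compressed space is given by the top-k right singular vectors of the centered training data (via SVD), and that "alignment" means shifting the test batch so that the mean projection of its reference samples matches the mean projection of the same reference samples measured in the training batch. The function names `fit_compression` and `reference_align` and the parameter `k` are hypothetical; this is not the authors' published implementation, and details such as how k is chosen or whether adjusted data are mapped back to feature space may differ in the paper.

```python
import numpy as np


def fit_compression(X_train, k):
    """Center the training data and keep its top-k right singular vectors.

    Assumption (not from the source): the "compressed data space spanned by
    the training set" is modelled here as a k-dimensional SVD subspace.
    Rows are samples, columns are features.
    """
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    if k > Vt.shape[0]:
        raise ValueError("k cannot exceed min(n_samples, n_features)")
    return mean, Vt[:k]  # V has shape (k, n_features) with orthonormal rows


def reference_align(X_test, ref_train, ref_test, mean, V):
    """Project a test batch into the compressed space and shift it so that
    the reference samples line up with their training-batch counterparts.

    Assumption (not from the source): the batch adjustment is a single mean
    shift estimated from the reference samples in the compressed space.
    """
    z_ref_train = (ref_train - mean) @ V.T
    z_ref_test = (ref_test - mean) @ V.T
    shift = z_ref_train.mean(axis=0) - z_ref_test.mean(axis=0)
    # Adjusted test coordinates in the k-dimensional space.
    return (X_test - mean) @ V.T + shift
```

In this sketch a classifier would be trained on the compressed training coordinates `(X_train - mean) @ V.T` and applied to the aligned test coordinates. Because only a mean shift is estimated, a handful of reference samples measured in both batches is enough to remove a constant, additive batch offset.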

Suggested Citation

  • Robin Gradin & Malin Lindstedt & Henrik Johansson, 2019. "Batch adjustment by reference alignment (BARA): Improved prediction performance in biological test sets with batch effects," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-15, February.
  • Handle: RePEc:plo:pone00:0212669
    DOI: 10.1371/journal.pone.0212669

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0212669
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0212669&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0212669?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Jeffrey T Leek & John D Storey, 2007. "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis," PLOS Genetics, Public Library of Science, vol. 3(9), pages 1-12, September.
    2. Youichi Higuchi & Motohiro Kojima & Genichiro Ishii & Kazuhiko Aoyagi & Hiroki Sasaki & Atsushi Ochiai, 2015. "Gastrointestinal Fibroblasts Have Specialized, Diverse Transcriptional Phenotypes: A Comprehensive Gene Expression Analysis of Human Fibroblasts," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-19, June.
    3. Wickham, Hadley, 2007. "Reshaping Data with the reshape Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 21(i12).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Augustinus, Benno A. & Blum, Moshe & Citterio, Sandra & Gentili, Rodolfo & Helman, David & Nestel, David & Schaffner, Urs & Müller-Schärer, Heinz & Lensky, Itamar M., 2022. "Ground-truthing predictions of a demographic model driven by land surface temperatures with a weed biocontrol cage experiment," Ecological Modelling, Elsevier, vol. 466(C).
    2. Marie Bobowski-Gerard & Clémence Boulet & Francesco P. Zummo & Julie Dubois-Chevalier & Céline Gheeraert & Mohamed Bou Saleh & Jean-Marc Strub & Amaury Farce & Maheul Ploton & Loïc Guille & Jimmy Vand, 2022. "Functional genomics uncovers the transcription factor BNC2 as required for myofibroblastic activation in fibrosis," Nature Communications, Nature, vol. 13(1), pages 1-20, December.
    3. Miller, Christine M.F. & Waterhouse, Hannah & Harter, Thomas & Fadel, James G. & Meyer, Deanne, 2020. "Quantifying the uncertainty in nitrogen application and groundwater nitrate leaching in manure based cropping systems," Agricultural Systems, Elsevier, vol. 184(C).
    4. repec:jss:jstsof:40:i14 is not listed on IDEAS
    5. Sean McKenzie & Hilary Parkinson & Jane Mangold & Mary Burrows & Selena Ahmed & Fabian Menalled, 2018. "Perceptions, Experiences, and Priorities Supporting Agroecosystem Management Decisions Differ among Agricultural Producers, Consultants, and Researchers," Sustainability, MDPI, vol. 10(11), pages 1-19, November.
    6. repec:plo:pgen00:1002078 is not listed on IDEAS
    7. Emanuele Aliverti & Kristian Lum & James E. Johndrow & David B. Dunson, 2021. "Removing the influence of group variables in high‐dimensional predictive modelling," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 791-811, July.
    8. Seungchul Baek & Yen‐Yi Ho & Yanyuan Ma, 2020. "Using sufficient direction factor model to analyze latent activities associated with breast cancer survival," Biometrics, The International Biometric Society, vol. 76(4), pages 1340-1350, December.
    9. C. J. Torrecilla-Salinas & O. Troyer & M. J. Escalona & M. Mejías, 2019. "A Delphi-based expert judgment method applied to the validation of a mature Agile framework for Web development projects," Information Technology and Management, Springer, vol. 20(1), pages 9-40, March.
    10. Griffin, Maryclare & Hoff, Peter D., 2019. "Lasso ANOVA decompositions for matrix and tensor data," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 181-194.
    11. Priyanga Dilini Talagala & Rob J Hyndman & Kate Smith-Miles & Sevvandi Kandanaarachchi & Mario A Munoz, 2018. "Anomaly detection in streaming nonstationary temporal data," Monash Econometrics and Business Statistics Working Papers 4/18, Monash University, Department of Econometrics and Business Statistics.
    12. Thelma Dede Baddoo & Zhijia Li & Yiqing Guan & Kenneth Rodolphe Chabi Boni & Isaac Kwesi Nooni, 2020. "Data-Driven Modeling and the Influence of Objective Function Selection on Model Performance in Limited Data Regions," IJERPH, MDPI, vol. 17(11), pages 1-26, June.
    13. Luiz A. Domeignoz-Horta & Seraina L. Cappelli & Rashmi Shrestha & Stephanie Gerin & Annalea K. Lohila & Jussi Heinonsalo & Daniel B. Nelson & Ansgar Kahmen & Pengpeng Duan & David Sebag & Eric Verrecc, 2024. "Plant diversity drives positive microbial associations in the rhizosphere enhancing carbon use efficiency in agricultural soils," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    14. Delaram Pouyabahar & Tallulah Andrews & Gary D. Bader, 2025. "Interpretable single-cell factor decomposition using sciRED," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    15. Paul J McMurdie & Susan Holmes, 2014. "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-12, April.
    16. Chee Ho H’ng & Shanika L. Amarasinghe & Boya Zhang & Hojin Chang & Xinli Qu & David R. Powell & Alberto Rosello-Diez, 2024. "Compensatory growth and recovery of cartilage cytoarchitecture after transient cell death in fetal mouse limbs," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    17. Priscila Villalobos Perna & Mirko Di Febbraro & Maria Laura Carranza & Flavio Marzialetti & Michele Innangi, 2023. "Remote Sensing and Invasive Plants in Coastal Ecosystems: What We Know So Far and Future Prospects," Land, MDPI, vol. 12(2), pages 1-16, January.
    18. Bikram K. Das & Robiul Islam Rubel & Surbhi Gupta & Yajun Wu & Lin Wei & Volker S. Brözel, 2022. "Impacts of Biochar-Based Controlled-Release Nitrogen Fertilizers on Soil Prokaryotic and Fungal Communities," Agriculture, MDPI, vol. 12(10), pages 1-15, October.
    19. Mark Reimers, 2010. "Making Informed Choices about Microarray Data Analysis," PLOS Computational Biology, Public Library of Science, vol. 6(5), pages 1-7, May.
    20. Jeffrey T. Leek & John D. Storey, 2011. "The Joint Null Criterion for Multiple Hypothesis Tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, June.
    21. Stefan LINGNER & Eiko THIESSEN & Kerrin MÜLLER & Eberhard HARTUNG, 2018. "Dry Biomass Estimation of Hedge Banks: Allometric Equation vs. Structure from Motion via Unmanned Aerial Vehicle," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 64(4), pages 149-156.
    22. França, Lucas Gabriel Souza & Montoya, Pedro & Miranda, José Garcia Vivas, 2019. "On multifractals: A non-linear study of actigraphy data," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 514(C), pages 612-619.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.