
Batch adjustment by reference alignment (BARA): Improved prediction performance in biological test sets with batch effects

Authors

  • Robin Gradin
  • Malin Lindstedt
  • Henrik Johansson

Abstract

Many biological data acquisition platforms inadvertently introduce biologically irrelevant variance into the analyzed data; these sources of variance are collectively termed batch effects. Batch effects can complicate downstream analysis by lowering the power to detect biologically interesting differences and can, in certain instances, lead to false discoveries. They are especially troublesome in predictive modelling, where samples in training sets and test sets are often completely correlated with batches. In this article, we present BARA, a normalization method for adjusting batch effects in predictive modelling. BARA uses a few reference samples to adjust for batch effects in a compressed data space spanned by the training set. We evaluate BARA using a collection of publicly available datasets and three different prediction models, and compare its performance to existing methods developed for similar purposes. The results show that data normalized with BARA yield high and consistent prediction performance. Further, they suggest that BARA performs reliably regardless of the classifier examined. We therefore conclude that BARA has great potential to facilitate the development of predictive assays where test sets and training sets are correlated with batch.
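
The idea of adjusting a test batch through reference samples in a compressed space spanned by the training set can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reference_align`, its parameters, the use of a truncated SVD as the compression step, and the simple mean shift are all assumptions made here for illustration, and the published method may differ in each of these choices.

```python
import numpy as np


def reference_align(train, test, train_ref_idx, test_ref_idx, n_components):
    """Shift a test batch so its reference samples line up with the
    training-set reference samples (hypothetical helper, illustration only).

    train, test: arrays of shape (samples, features).
    train_ref_idx, test_ref_idx: row indices of the reference samples.
    n_components: assumed size of the compressed space.
    """
    center = train.mean(axis=0)

    # Compressed space spanned by the training set (assumption: top
    # right-singular vectors of the centered training data).
    _, _, vt = np.linalg.svd(train - center, full_matrices=False)
    basis = vt[:n_components].T  # shape (features, n_components)

    # Project both sets into the compressed space.
    train_scores = (train - center) @ basis
    test_scores = (test - center) @ basis

    # Estimate the batch offset from the reference samples of each batch.
    offset = (test_scores[test_ref_idx].mean(axis=0)
              - train_scores[train_ref_idx].mean(axis=0))

    # Remove the offset and map back to the original feature space.
    return (test_scores - offset) @ basis.T + center
```

In this sketch the batch effect is assumed to be a constant shift per component, which is the simplest case a handful of reference samples can estimate. With `n_components` smaller than the number of features, the returned test data are also restricted to the training subspace, so variance outside that subspace is discarded.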

Suggested Citation

  • Robin Gradin & Malin Lindstedt & Henrik Johansson, 2019. "Batch adjustment by reference alignment (BARA): Improved prediction performance in biological test sets with batch effects," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-15, February.
  • Handle: RePEc:plo:pone00:0212669
    DOI: 10.1371/journal.pone.0212669

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0212669
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0212669&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Augustinus, Benno A. & Blum, Moshe & Citterio, Sandra & Gentili, Rodolfo & Helman, David & Nestel, David & Schaffner, Urs & Müller-Schärer, Heinz & Lensky, Itamar M., 2022. "Ground-truthing predictions of a demographic model driven by land surface temperatures with a weed biocontrol cage experiment," Ecological Modelling, Elsevier, vol. 466(C).
    2. Julio Cesar Alonso Cifuentes & Jaime Andres Carabali, 2019. "Breve Tuturial para visualizar y Calcular Métricas de Redes (grafos) en R (para Económisas)," Icesi Economics Lecture Notes 18170, Universidad Icesi.
    3. Arjun Bhattacharya & Anastasia N. Freedman & Vennela Avula & Rebeca Harris & Weifang Liu & Calvin Pan & Aldons J. Lusis & Robert M. Joseph & Lisa Smeester & Hadley J. Hartwell & Karl C. K. Kuban & Car, 2022. "Placental genomics mediates genetic associations with complex health traits and disease," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Marie Bobowski-Gerard & Clémence Boulet & Francesco P. Zummo & Julie Dubois-Chevalier & Céline Gheeraert & Mohamed Bou Saleh & Jean-Marc Strub & Amaury Farce & Maheul Ploton & Loïc Guille & Jimmy Vand, 2022. "Functional genomics uncovers the transcription factor BNC2 as required for myofibroblastic activation in fibrosis," Nature Communications, Nature, vol. 13(1), pages 1-20, December.
    5. Miller, Christine M.F. & Waterhouse, Hannah & Harter, Thomas & Fadel, James G. & Meyer, Deanne, 2020. "Quantifying the uncertainty in nitrogen application and groundwater nitrate leaching in manure based cropping systems," Agricultural Systems, Elsevier, vol. 184(C).
    6. Sarlas, Georgios & Páez, Antonio & Axhausen, Kay W., 2020. "Betweenness-accessibility: Estimating impacts of accessibility on networks," Journal of Transport Geography, Elsevier, vol. 84(C).
    7. repec:jss:jstsof:40:i14 is not listed on IDEAS
    8. Marin FOTACHE & Florin DUMITRU & Valerica GREAVU-SERBAN, 2015. "An Information Systems Master Programme in Romania. Some Commonalities and Specificities," Informatica Economica, Academy of Economic Studies - Bucharest, Romania, vol. 19(3), pages 5-18.
    9. Martijn Van Heel & Dinska Van Gucht & Koen Vanbrabant & Frank Baeyens, 2017. "The Importance of Conditioned Stimuli in Cigarette and E-Cigarette Craving Reduction by E-Cigarettes," IJERPH, MDPI, vol. 14(2), pages 1-18, February.
    10. Sean McKenzie & Hilary Parkinson & Jane Mangold & Mary Burrows & Selena Ahmed & Fabian Menalled, 2018. "Perceptions, Experiences, and Priorities Supporting Agroecosystem Management Decisions Differ among Agricultural Producers, Consultants, and Researchers," Sustainability, MDPI, vol. 10(11), pages 1-19, November.
    11. Won Jun Lee & Sang Cheol Kim & Jung-Ho Yoon & Sang Jun Yoon & Johan Lim & You-Sun Kim & Sung Won Kwon & Jeong Hill Park, 2016. "Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-20, February.
    12. Milad Abbasiharofteh & Tom Broekel, 2021. "Still in the shadow of the wall? The case of the Berlin biotechnology cluster," Environment and Planning A, , vol. 53(1), pages 73-94, February.
    13. Jill F. Lundell & Brennan Bean & Jürgen Symanzik, 2023. "Let’s talk about the weather: a cluster-based approach to weather forecast accuracy," Computational Statistics, Springer, vol. 38(3), pages 1135-1155, September.
    14. Andee J. Kaplan & Eric R. Hare, 2019. "Putting down roots: a graphical exploration of community attachment," Computational Statistics, Springer, vol. 34(4), pages 1449-1464, December.
    15. Emanuele Aliverti & Kristian Lum & James E. Johndrow & David B. Dunson, 2021. "Removing the influence of group variables in high‐dimensional predictive modelling," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 791-811, July.
    16. Marron, J.S., 2017. "Big Data in context and robustness against heterogeneity," Econometrics and Statistics, Elsevier, vol. 2(C), pages 73-80.
    17. Haider, Saira M. & Benscoter, Allison M. & Pearlstine, Leonard & D'Acunto, Laura E. & Romañach, Stephanie S., 2021. "Landscape-scale drivers of endangered Cape Sable Seaside Sparrow (Ammospiza maritima mirabilis) presence using an ensemble modeling approach," Ecological Modelling, Elsevier, vol. 461(C).
    18. Seungchul Baek & Yen‐Yi Ho & Yanyuan Ma, 2020. "Using sufficient direction factor model to analyze latent activities associated with breast cancer survival," Biometrics, The International Biometric Society, vol. 76(4), pages 1340-1350, December.
    19. Senka Čaušević & Manupriyam Dubey & Marian Morales & Guillem Salazar & Vladimir Sentchilo & Nicolas Carraro & Hans-Joachim Ruscheweyh & Shinichi Sunagawa & Jan Roelof van der Meer, 2024. "Niche availability and competitive loss by facilitation control proliferation of bacterial strains intended for soil microbiome interventions," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    20. Fox, John & Carvalho, Marilia S., 2012. "The RcmdrPlugin.survival Package: Extending the R Commander Interface to Survival Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 49(i07).
    21. C. J. Torrecilla-Salinas & O. Troyer & M. J. Escalona & M. Mejías, 2019. "A Delphi-based expert judgment method applied to the validation of a mature Agile framework for Web development projects," Information Technology and Management, Springer, vol. 20(1), pages 9-40, March.
