
The practical effect of batch on genomic prediction

Authors
  • Parker Hilary S.

    (Johns Hopkins Bloomberg School of Public Health)

  • Leek Jeffrey T.

    (Johns Hopkins Bloomberg School of Public Health)

Abstract

Measurements from microarrays and other high-throughput technologies are susceptible to non-biological artifacts like batch effects. It is known that batch effects can alter or obscure the set of significant results and biological conclusions in high-throughput studies. Here we examine the impact of batch effects on predictors built from genomic technologies. To investigate batch effects, we collected publicly available gene expression measurements with known outcomes, and estimated batches using date. Using these data we show (1) the impact of batch effects on prediction depends on the correlation between outcome and batch in the training data, and (2) removing expression measurements most affected by batch before building predictors may improve the accuracy of those predictors. These results suggest that (1) training sets should be designed to minimize correlation between batches and outcome, and (2) methods for identifying batch-affected probes should be developed to improve prediction results for studies with high correlation between batches and outcome.
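The abstract's two findings can be illustrated with a small sketch. It assumes two batches and a binary outcome, each coded 0/1 per sample, and an expression matrix with one row per probe. The first function measures how confounded a training set is: it returns the correlation between the batch and outcome labels. The second ranks probes by an absolute Welch t-statistic between batches, and the third drops the top `n_drop` probes before a predictor is trained. The function names, the t-statistic screen and the `n_drop` cutoff are illustrative assumptions, not the authors' published procedure.

```python
import math
from statistics import mean, variance

# Illustrative sketch only: the Welch t screen and the n_drop cutoff are
# assumptions for demonstration, not the method used in the paper.


def batch_outcome_correlation(batch, outcome):
    """Pearson correlation between two 0/1 label vectors (the phi coefficient).

    Values near +1 or -1 mean batch and outcome are confounded in the training set.
    """
    n = len(batch)
    mb, mo = sum(batch) / n, sum(outcome) / n
    cov = sum((b - mb) * (o - mo) for b, o in zip(batch, outcome))
    ssb = sum((b - mb) ** 2 for b in batch)
    sso = sum((o - mo) ** 2 for o in outcome)
    if ssb == 0 or sso == 0:
        raise ValueError("batch and outcome must each contain both labels")
    return cov / math.sqrt(ssb * sso)


def welch_t(x, y):
    """Absolute Welch two-sample t-statistic (needs >= 2 values per group)."""
    diff = abs(mean(x) - mean(y))
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / se


def rank_probes_by_batch(expr, batch):
    """Return probe indices sorted from most to least batch-associated.

    expr[i][j] is the expression of probe i in sample j; batch[j] is 0 or 1.
    """
    scores = []
    for row in expr:
        g0 = [v for v, b in zip(row, batch) if b == 0]
        g1 = [v for v, b in zip(row, batch) if b == 1]
        scores.append(welch_t(g0, g1))
    # sorted() stays stable with reverse=True, so ties keep probe order.
    return sorted(range(len(expr)), key=scores.__getitem__, reverse=True)


def drop_batch_affected(expr, batch, n_drop):
    """Remove the n_drop most batch-associated probes; return kept indices and rows."""
    drop = set(rank_probes_by_batch(expr, batch)[:n_drop])
    kept = [i for i in range(len(expr)) if i not in drop]
    return kept, [expr[i] for i in kept]


if __name__ == "__main__":
    batch = [0, 0, 0, 0, 1, 1, 1, 1]
    outcome = [0, 0, 0, 1, 0, 1, 1, 1]  # partly confounded with batch
    expr = [
        [5.0, 5.1, 4.9, 5.0, 7.0, 7.1, 6.9, 7.0],  # probe 0: strong batch shift
        [3.0, 3.2, 2.8, 3.1, 3.0, 2.9, 3.1, 3.0],  # probe 1: little batch effect
        [1.0, 1.0, 1.0, 2.5, 1.0, 2.5, 2.5, 2.5],  # probe 2: tracks the outcome
    ]
    print(batch_outcome_correlation(batch, outcome))  # 0.5
    print(rank_probes_by_batch(expr, batch))  # [0, 2, 1]
    kept, _ = drop_batch_affected(expr, batch, n_drop=1)
    print(kept)  # [1, 2]
```

On the toy data the filter drops probe 0, whose values shift by about 2 units between batches. Probe 2, which tracks the outcome, still ranks above probe 1 because batch and outcome are correlated (0.5) in this training set, so a batch screen alone can also pick up outcome signal when the two are confounded.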

Suggested Citation

  • Parker Hilary S. & Leek Jeffrey T., 2012. "The practical effect of batch on genomic prediction," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-22, April.
  • Handle: RePEc:bpj:sagmbi:v:11:y:2012:i:3:n:10
    DOI: 10.1515/1544-6115.1766

    Download full text from publisher

    File URL: https://doi.org/10.1515/1544-6115.1766
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Zhijin Wu & Rafael Irizarry & Robert Gentleman & Francisco Martinez Murillo & Forrest Spencer, 2004. "A Model Based Background Adjustment for Oligonucleotide Expression Arrays," Johns Hopkins University Dept. of Biostatistics Working Paper Series 1001, Berkeley Electronic Press.
    2. Zhijin Wu & Rafael A. Irizarry & Robert Gentleman & Francisco Martinez-Murillo & Forrest Spencer, 2004. "A Model-Based Background Adjustment for Oligonucleotide Expression Arrays," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 909-917, December.
    3. Jeffrey T Leek & John D Storey, 2007. "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis," PLOS Genetics, Public Library of Science, vol. 3(9), pages 1-12, September.
    4. Geman Donald & d'Avignon Christian & Naiman Daniel Q. & Winslow Raimond L., 2004. "Classifying Gene Expression Profiles from Pairwise mRNA Comparisons," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-22, August.
    5. Andy J. Minn & Gaorav P. Gupta & Peter M. Siegel & Paula D. Bos & Weiping Shu & Dilip D. Giri & Agnes Viale & Adam B. Olshen & William L. Gerald & Joan Massagué, 2005. "Genes that mediate breast cancer metastasis to lung," Nature, Nature, vol. 436(7050), pages 518-524, July.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Aline Talhouk & Stefan Kommoss & Robertson Mackenzie & Martin Cheung & Samuel Leung & Derek S Chiu & Steve E Kalloger & David G Huntsman & Stephanie Chen & Maria Intermaggio & Jacek Gronwald & Fong C , 2016. "Single-Patient Molecular Testing with NanoString nCounter Data Using a Reference-Based Strategy for Batch Effect Correction," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-18, April.
    2. Charlotte Soneson & Sarah Gerster & Mauro Delorenzi, 2014. "Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-13, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rinku Sharma & Garima Singh & Sudeepto Bhattacharya & Ashutosh Singh, 2018. "Comparative transcriptome meta-analysis of Arabidopsis thaliana under drought and cold stress," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-18, September.
    2. Jin-Xing Liu & Yong Xu & Chun-Hou Zheng & Yi Wang & Jing-Yu Yang, 2012. "Characteristic Gene Selection via Weighting Principal Components by Singular Values," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-10, July.
    3. Nan Li & Matthew N. McCall & Zhijin Wu, 2017. "Establishing Informative Prior for Gene Expression Variance from Public Databases," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 160-177, June.
    4. Won Jun Lee & Sang Cheol Kim & Jung-Ho Yoon & Sang Jun Yoon & Johan Lim & You-Sun Kim & Sung Won Kwon & Jeong Hill Park, 2016. "Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-20, February.
    5. Sigrun Helga Lund & Daniel Fannar Gudbjartsson & Thorunn Rafnar & Asgeir Sigurdsson & Sigurjon Axel Gudjonsson & Julius Gudmundsson & Kari Stefansson & Gunnar Stefansson, 2014. "A Method for Detecting Long Non-Coding RNAs with Tiled RNA Expression Microarrays," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-9, June.
    6. Krishanpal Anamika & Àkos Gyenis & Laetitia Poidevin & Olivier Poch & Làszlò Tora, 2012. "RNA Polymerase II Pausing Downstream of Core Histone Genes Is Different from Genes Producing Polyadenylated Transcripts," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-14, June.
    7. Lei Zhang & Linlin Wang & Pu Tian & Suyan Tian, 2016. "Identification of Genes Discriminating Multiple Sclerosis Patients from Controls by Adapting a Pathway Analysis Method," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-13, November.
    8. Upton Graham J. G. & Harrison Andrew P, 2010. "The Detection of Blur in Affymetrix GeneChips," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-19, October.
    9. Ryan Abo & Gregory D Jenkins & Liewei Wang & Brooke L Fridley, 2012. "Identifying the Genetic Variation of Gene Expression Using Gene Sets: Application of Novel Gene Set eQTL Approach to PharmGKB and KEGG," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-11, August.
    10. Jeremiah J Faith & Boris Hayete & Joshua T Thaden & Ilaria Mogno & Jamey Wierzbowski & Guillaume Cottarel & Simon Kasif & James J Collins & Timothy S Gardner, 2007. "Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles," PLOS Biology, Public Library of Science, vol. 5(1), pages 1-13, January.
    11. Chalise, Prabhakar & Fridley, Brooke L., 2012. "Comparison of penalty functions for sparse canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 245-254.
    12. Marot Guillemette & Mayer Claus-Dieter, 2009. "Sequential Analysis for Microarray Data Based on Sensitivity and Meta-Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-33, January.
    13. Wei-Chung Cheng & Cheng-Wei Chang & Chaang-Ray Chen & Min-Lung Tsai & Wun-Yi Shu & Chia-Yang Li & Ian C Hsu, 2011. "Identification of Reference Genes across Physiological States for qRT-PCR through Microarray Meta-Analysis," PLOS ONE, Public Library of Science, vol. 6(2), pages 1-8, February.
    14. Suyan Tian & James G Krueger & Katherine Li & Ali Jabbari & Carrie Brodmerkel & Michelle A Lowes & Mayte Suárez-Fariñas, 2012. "Meta-Analysis Derived (MAD) Transcriptome of Psoriasis Defines the “Core” Pathogenesis of Disease," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-15, September.
    15. Akul Singhania & Hitasha Rupani & Nivenka Jayasekera & Simon Lumb & Paul Hales & Neil Gozzard & Donna E Davies & Christopher H Woelk & Peter H Howarth, 2017. "Altered Epithelial Gene Expression in Peripheral Airways of Severe Asthma," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-16, January.
    16. Russell D J Huby & Philip Glaves & Richard Jackson, 2014. "The Incidence of Sexually Dimorphic Gene Expression Varies Greatly between Tissues in the Rat," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-19, December.
    17. Erick da Conceição Amorim & Vinícius Diniz Mayrink, 2020. "Clustering non-linear interactions in factor analysis," METRON, Springer;Sapienza Università di Roma, vol. 78(3), pages 329-352, December.
    18. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    19. Arjun Bhattacharya & Anastasia N. Freedman & Vennela Avula & Rebeca Harris & Weifang Liu & Calvin Pan & Aldons J. Lusis & Robert M. Joseph & Lisa Smeester & Hadley J. Hartwell & Karl C. K. Kuban & Car, 2022. "Placental genomics mediates genetic associations with complex health traits and disease," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    20. Huixia Wang & Xuming He, 2008. "An Enhanced Quantile Approach for Assessing Differential Gene Expressions," Biometrics, The International Biometric Society, vol. 64(2), pages 449-457, June.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.