
A Model Based Background Adjustment for Oligonucleotide Expression Arrays

Author

Listed:
  • Zhijin Wu

    (Johns Hopkins Bloomberg School of Public Health)

  • Rafael Irizarry

    (Johns Hopkins Bloomberg School of Public Health)

  • Robert Gentleman

    (Dana-Farber Cancer Institute)

  • Francisco Martinez Murillo

    (Johns Hopkins Medical Institute)

  • Forrest Spencer

    (Johns Hopkins Medical Institute)

Abstract

High-density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further pre-processing and data reduction occurs following the image processing step. Statistical procedures developed by academic groups have been successful at improving the default algorithms provided by the Affymetrix system. In this paper we present a solution to one of the pre-processing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications.

Affymetrix GeneChip arrays use short oligonucleotides to probe for genes in an RNA sample. Typically each gene will be represented by 11-20 pairs of oligonucleotide probes. The first component of these pairs is referred to as a perfect match probe and is designed to hybridize only with transcripts from the intended gene (specific hybridization). However, hybridization by other sequences (non-specific hybridization) is unavoidable. Furthermore, hybridization strengths are measured by a scanner that introduces optical noise. Therefore, the observed intensities need to be adjusted to give accurate measurements of specific hybridization. One approach to adjusting is to pair each perfect match probe with a mismatch probe that is designed with the intention of measuring non-specific hybridization. The default adjustment, provided as part of the Affymetrix system, is based on the difference between perfect match and mismatch probe intensities. We have found that this approach can be improved via the use of estimators derived from a statistical model that uses probe sequence information. The model is based on simple hybridization theory from molecular biology and experiments specifically designed to help develop it.

A final step in the pre-processing of these arrays is to combine the 11-20 probe pair intensities, after background adjustment and normalization, for a given gene to define a measure of expression that represents the amount of the corresponding mRNA species. In this paper we illustrate the practical consequences of not adjusting appropriately for the presence of non-specific hybridization and provide a solution based on our background adjustment procedure. Software that computes our adjustment is available as part of the Bioconductor project (http://www.bioconductor.
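
The difference-based idea described in the abstract can be illustrated with a minimal sketch. This is not the model-based estimator proposed in the paper, nor the exact Affymetrix algorithm; it only shows the simplest form of a perfect match minus mismatch adjustment and one possible way to combine the adjusted probe intensities into a single value. The function names, the intensity values, and the choice of a median of log2 values as the summary are all assumptions made for illustration.

```python
import math
from statistics import median


def difference_adjust(pm, mm):
    """Subtract each mismatch (MM) intensity from its paired perfect match (PM) intensity."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    return [p - m for p, m in zip(pm, mm)]


def summarize_log2(adjusted):
    """Hypothetical summary: median of log2 over the positive adjusted values.

    Non-positive values are dropped because log2 is undefined for them.
    Returns None if no probe pair yields a positive adjusted intensity.
    """
    positive = [a for a in adjusted if a > 0]
    if not positive:
        return None
    return median(math.log2(a) for a in positive)


# Made-up intensities for one probe set (real probe sets have 11-20 pairs).
pm = [1200.0, 950.0, 400.0, 310.0, 2200.0]
mm = [300.0, 420.0, 450.0, 330.0, 600.0]

adjusted = difference_adjust(pm, mm)
n_unusable = sum(1 for a in adjusted if a <= 0)
expression = summarize_log2(adjusted)

print("adjusted intensities:", adjusted)
print("probe pairs with MM >= PM:", n_unusable)
print("summary (median log2):", expression)
```

In this made-up probe set, two of the five mismatch intensities exceed their paired perfect match intensities, so the plain difference is negative and cannot be log-transformed. That is one practical weakness of a purely difference-based adjustment, and it is the kind of problem a model-based background estimate is meant to avoid.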

Suggested Citation

  • Zhijin Wu & Rafael Irizarry & Robert Gentleman & Francisco Martinez Murillo & Forrest Spencer, 2004. "A Model Based Background Adjustment for Oligonucleotide Expression Arrays," Johns Hopkins University Dept. of Biostatistics Working Paper Series 1001, Berkeley Electronic Press.
  • Handle: RePEc:bep:jhubio:1001
    Note: oai:bepress.com:jhubiostat-1001

    Download full text from publisher

    File URL: http://www.bepress.com/cgi/viewcontent.cgi?article=1001&context=jhubiostat
    Download Restriction: no

    References listed on IDEAS

    1. Cui Xiangqin & Kerr M. Kathleen & Churchill Gary A., 2003. "Transformations for cDNA Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 2(1), pages 1-22, June.

    Citations



    Cited by:

    1. Nan Li & Matthew N. McCall & Zhijin Wu, 2017. "Establishing Informative Prior for Gene Expression Variance from Public Databases," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 160-177, June.
    2. Jeremiah J Faith & Boris Hayete & Joshua T Thaden & Ilaria Mogno & Jamey Wierzbowski & Guillaume Cottarel & Simon Kasif & James J Collins & Timothy S Gardner, 2007. "Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles," PLOS Biology, Public Library of Science, vol. 5(1), pages 1-13, January.
    3. Russell D J Huby & Philip Glaves & Richard Jackson, 2014. "The Incidence of Sexually Dimorphic Gene Expression Varies Greatly between Tissues in the Rat," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-19, December.
    4. Parker Hilary S. & Leek Jeffrey T., 2012. "The practical effect of batch on genomic prediction," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-22, April.
    5. Krishanpal Anamika & Àkos Gyenis & Laetitia Poidevin & Olivier Poch & Làszlò Tora, 2012. "RNA Polymerase II Pausing Downstream of Core Histone Genes Is Different from Genes Producing Polyadenylated Transcripts," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-14, June.
    6. Rinku Sharma & Garima Singh & Sudeepto Bhattacharya & Ashutosh Singh, 2018. "Comparative transcriptome meta-analysis of Arabidopsis thaliana under drought and cold stress," PLOS ONE, Public Library of Science, vol. 13(9), pages 1-18, September.
    7. Akul Singhania & Hitasha Rupani & Nivenka Jayasekera & Simon Lumb & Paul Hales & Neil Gozzard & Donna E Davies & Christopher H Woelk & Peter H Howarth, 2017. "Altered Epithelial Gene Expression in Peripheral Airways of Severe Asthma," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-16, January.
    8. Sigrun Helga Lund & Daniel Fannar Gudbjartsson & Thorunn Rafnar & Asgeir Sigurdsson & Sigurjon Axel Gudjonsson & Julius Gudmundsson & Kari Stefansson & Gunnar Stefansson, 2014. "A Method for Detecting Long Non-Coding RNAs with Tiled RNA Expression Microarrays," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-9, June.
    9. Marot Guillemette & Mayer Claus-Dieter, 2009. "Sequential Analysis for Microarray Data Based on Sensitivity and Meta-Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-33, January.
    10. Suyan Tian & James G Krueger & Katherine Li & Ali Jabbari & Carrie Brodmerkel & Michelle A Lowes & Mayte Suárez-Fariñas, 2012. "Meta-Analysis Derived (MAD) Transcriptome of Psoriasis Defines the “Core” Pathogenesis of Disease," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-15, September.
    11. Wei-Chung Cheng & Cheng-Wei Chang & Chaang-Ray Chen & Min-Lung Tsai & Wun-Yi Shu & Chia-Yang Li & Ian C Hsu, 2011. "Identification of Reference Genes across Physiological States for qRT-PCR through Microarray Meta-Analysis," PLOS ONE, Public Library of Science, vol. 6(2), pages 1-8, February.
    12. Upton Graham J. G. & Harrison Andrew P, 2010. "The Detection of Blur in Affymetrix GeneChips," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-19, October.
    13. Jin-Xing Liu & Yong Xu & Chun-Hou Zheng & Yi Wang & Jing-Yu Yang, 2012. "Characteristic Gene Selection via Weighting Principal Components by Singular Values," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-10, July.
    14. Ryan Abo & Gregory D Jenkins & Liewei Wang & Brooke L Fridley, 2012. "Identifying the Genetic Variation of Gene Expression Using Gene Sets: Application of Novel Gene Set eQTL Approach to PharmGKB and KEGG," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-11, August.
    15. Lei Zhang & Linlin Wang & Pu Tian & Suyan Tian, 2016. "Identification of Genes Discriminating Multiple Sclerosis Patients from Controls by Adapting a Pathway Analysis Method," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-13, November.
    16. Chalise, Prabhakar & Fridley, Brooke L., 2012. "Comparison of penalty functions for sparse canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 245-254.
    17. Erick da Conceição Amorim & Vinícius Diniz Mayrink, 2020. "Clustering non-linear interactions in factor analysis," METRON, Springer;Sapienza Università di Roma, vol. 78(3), pages 329-352, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ambroise Jérôme & Bearzatto Bertrand & Robert Annie & Macq Benoit & Gala Jean-Luc, 2012. "Combining Multiple Laser Scans of Spotted Microarrays by Means of a Two-Way ANOVA Model," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-20, February.
    2. M. Kathleen Kerr, 2003. "Design Considerations for Efficient and Effective Microarray Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 822-828, December.
    3. Kelmansky Diana M. & Martínez Elena J. & Leiva Víctor, 2013. "A new variance stabilizing transformation for gene expression data analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(6), pages 653-666, December.
    4. Lama, Nicola & Boracchi, Patrizia & Biganzoli, Elia, 2009. "Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1906-1922, March.
