IDEAS home Printed from https://ideas.repec.org/a/bpj/sagmbi/v7y2008i1n10.html

Correcting the Estimated Level of Differential Expression for Gene Selection Bias: Application to a Microarray Study

Author

Listed:
  • Bickel David R.

    (Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, University of Ottawa)

Abstract

The level of differential gene expression may be defined as a fold change, a frequency of upregulation, or some other measure of the degree or extent of a difference in expression across groups of interest. On the basis of expression data for hundreds or thousands of genes, inferring which genes are differentially expressed or ranking genes in order of priority introduces a bias in estimates of their differential expression levels. A previous correction of this feature selection bias suffers from a lack of generality in the method of ranking genes, from requiring many biological replicates, and from unnecessarily overcompensating for the bias. For any method of ranking genes on the basis of gene expression measured for as few as three biological replicates, a simple leave-one-out algorithm corrects, with less overcompensation, the bias in estimates of the level of differential gene expression. In a microarray data set, the bias correction reduces estimates of the probability of upregulation or downregulation from 100% to as low as 60%, even for genes with estimated local false discovery rates close to 0. A simulation study quantifies both the advantage of smoothing estimates of bias before correction and the degree of overcompensation.
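
The abstract does not spell out the leave-one-out algorithm, so the following Python sketch only illustrates the general idea and should not be read as the author's procedure. It assumes a paired design in which each entry is a log expression ratio, takes the mean log ratio as the "level of differential expression", and ranks genes by the absolute mean. The function names `loo_rank_bias` and `corrected_estimates` are hypothetical. For each replicate left out, genes are ranked on the remaining replicates, and the estimate at each rank is compared with the held-out value; the average gap at a rank is taken as the selection bias at that rank.

```python
import numpy as np


def loo_rank_bias(x):
    """Estimate selection bias per rank by leave-one-out (illustrative only).

    x: array of shape (n_genes, n_reps) holding log ratios (assumed design).
    Returns an array of length n_genes; entry r is the estimated amount by
    which the magnitude of the estimate at rank r is inflated.
    """
    n_genes, n_reps = x.shape
    bias = np.zeros(n_genes)
    for i in range(n_reps):
        train = np.delete(x, i, axis=1)          # all replicates except i
        train_mean = train.mean(axis=1)
        order = np.argsort(-np.abs(train_mean))  # rank 0 = largest |mean|
        sign = np.sign(train_mean[order])
        # Gap between the selection-biased estimate and the independent
        # held-out value, oriented so that inflation is positive.
        bias += sign * (train_mean[order] - x[order, i])
    return bias / n_reps


def corrected_estimates(x):
    """Subtract the rank-specific bias from the full-data estimates."""
    full_mean = x.mean(axis=1)
    order = np.argsort(-np.abs(full_mean))       # ranks from all replicates
    bias = loo_rank_bias(x)
    corrected = full_mean.copy()
    corrected[order] = full_mean[order] - np.sign(full_mean[order]) * bias
    return corrected


if __name__ == "__main__":
    # Simulated null data: no gene is truly differentially expressed,
    # so any large estimate among the top-ranked genes is selection bias.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2000, 4))
    naive = x.mean(axis=1)
    corrected = corrected_estimates(x)
    top = np.argsort(-np.abs(naive))[:50]
    print("mean |naive| of top 50:    ", np.abs(naive[top]).mean())
    print("mean |corrected| of top 50:", np.abs(corrected[top]).mean())
```

Because each training set has one replicate fewer than the full data, a sketch like this tends to estimate a somewhat larger bias than the full-data ranking actually incurs, which is one plausible source of the overcompensation the abstract mentions. The raw per-rank bias is also noisy; the abstract reports an advantage from smoothing the bias estimates before correction, which this sketch omits.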

Suggested Citation

  • Bickel David R., 2008. "Correcting the Estimated Level of Differential Expression for Gene Selection Bias: Application to a Microarray Study," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-27, March.
  • Handle: RePEc:bpj:sagmbi:v:7:y:2008:i:1:n:10
    DOI: 10.2202/1544-6115.1330
    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1330
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

    Citations

    Cited by:

    1. David R. Bickel, 2011. "Estimating the Null Distribution to Adjust Observed Confidence Levels for Genome-Scale Screening," Biometrics, The International Biometric Society, vol. 67(2), pages 363-370, June.
    2. Bickel David R., 2012. "Empirical Bayes Interval Estimates that are Conditionally Equal to Unadjusted Confidence Intervals or to Default Prior Credibility Intervals," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-34, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rossell David & Guerra Rudy & Scott Clayton, 2008. "Semi-Parametric Differential Expression Analysis via Partial Mixture Estimation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-29, April.
    2. Montazeri Zahra & Yanofsky Corey M. & Bickel David R., 2010. "Shrinkage Estimation of Effect Sizes as an Alternative to Hypothesis Testing Followed by Estimation in High-Dimensional Biology: Applications to Differential Gene Expression," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-33, June.
    3. Mark A. van de Wiel & Kyung In Kim, 2007. "Estimating the False Discovery Rate Using Nonparametric Deconvolution," Biometrics, The International Biometric Society, vol. 63(3), pages 806-815, September.
    4. Robin, Stephane & Bar-Hen, Avner & Daudin, Jean-Jacques & Pierre, Laurent, 2007. "A semi-parametric approach for mixture models: Application to local false discovery rate estimation," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5483-5493, August.
    5. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    6. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    7. Ahmed Hossain & Hafiz T.A. Khan, 2016. "Identification of genomic markers correlated with sensitivity in solid tumors to Dasatinib using sparse principal components," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(14), pages 2538-2549, October.
    8. He, Yi & Pan, Wei & Lin, Jizhen, 2006. "Cluster analysis using multivariate normal mixture models to detect differential gene expression with microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 641-658, November.
    9. Cheng, Cheng, 2009. "Internal validation inferences of significant genomic features in genome-wide screening," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 788-800, January.
    10. Xiang, Qinfang & Edwards, Jode & Gadbury, Gary L., 2006. "Interval estimation in a finite mixture model: Modeling P-values in multiple testing applications," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 570-586, November.
    11. Dazard, Jean-Eudes & Sunil Rao, J., 2012. "Joint adaptive mean–variance regularization and variance stabilization of high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2317-2333.
    12. Leek Jeffrey T & Storey John D., 2011. "The Joint Null Criterion for Multiple Hypothesis Tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, June.
    13. Khalili, Abbas & Huang, Tim & Lin, Shili, 2009. "A robust unified approach to analyzing methylation and gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1701-1710, March.
    14. Miecznikowski, Jeffrey C. & Gold, David & Shepherd, Lori & Liu, Song, 2011. "Deriving and comparing the distribution for the number of false positives in single step methods to control k-FWER," Statistics & Probability Letters, Elsevier, vol. 81(11), pages 1695-1705, November.
    15. Yu, Chang & Zelterman, Daniel, 2017. "A parametric model to estimate the proportion from true null using a distribution for p-values," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 105-118.
    16. Michele Guindani & Wesley O. Johnson, 2018. "More nonparametric Bayesian inference in applications," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(2), pages 239-251, June.
    17. Muir, W.M. & Rosa, G.J.M. & Pittendrigh, B.R. & Xu, Z. & Rider, S.D. & Fountain, M. & Ogas, J., 2009. "A mixture model approach for the analysis of small exploratory microarray experiments," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1566-1576, March.
    18. Hong, Zhaoping & Lian, Heng, 2012. "BOPA: A Bayesian hierarchical model for outlier expression detection," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 4146-4156.
    19. Marot Guillemette & Mayer Claus-Dieter, 2009. "Sequential Analysis for Microarray Data Based on Sensitivity and Meta-Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-35, January.
    20. Bradley Efron, 2007. "Doing thousands of hypothesis tests at the same time," Metron - International Journal of Statistics, Dipartimento di Statistica, Probabilità e Statistiche Applicate - University of Rome, vol. 0(1), pages 3-21.
