
Investigations into refinements of Storey’s method of multiple hypothesis testing minimising the FDR, and its application to test binomial data

Author

Listed:
  • Nixon, John H.

Abstract

Storey’s method for multiple hypothesis testing, the “Optimal Discovery Procedure” (ODP), minimises the false discovery rate (FDR) and gives p-values and q-values (estimates of the FDR) for each test. Here it was extended by iteration to enforce consistency between the p-values of the tests and the binary parameters that define which data points contribute to the fitted null hypothesis. These parameters arise when the null hypothesis has to be estimated from the data. The ODP as previously described is optimal only for fixed values of these parameters. The extension proposed here requires the introduction of a cut-off parameter for the p-values. The motivation was the analysis of a set of pairs of frequencies representing gene expression for a set of genes in two libraries, from which it was desired to select the genes most likely not to follow the null hypothesis that the frequency ratio is a fixed unknown number. The method was tested by analysing many similar simulated datasets. The results showed that the ODP modified by iteration could be improved, sometimes greatly, by a suitable choice of the cut-off parameter. However, varying this parameter alone may not lead to the globally optimal solution, because statistical testing based on the binomial distribution is more efficient than a form of the ODP when the number of non-null hypotheses in the data is small, whereas the reverse is true when it is large. This may be an effect of using discrete data. Efficiency here is defined in terms of the expected proportion of errors (q-value) that occur when a given proportion of the data is declared “significant” (i.e. the null hypothesis is believed not to hold for them).
An improved version of the ODP along these lines is likely to have numerous applications, such as the optimised search for candidate genes that show unusual expression patterns, for example when more than two experimental conditions are compared simultaneously, and cases in which additional categorical variables or a time series are present in the experimental design.
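
The iteration idea in the abstract can be illustrated with a minimal Python sketch. It is not the ODP and not the author's implementation: it uses the plain binomial test that the abstract mentions as a comparison method, and a Benjamini-Hochberg-style adjustment stands in for q-value estimation. The function names, the default cut-off of 0.05, the iteration cap and the toy count pairs are all assumptions made for illustration. Each pair (a, b) holds the counts of one gene in the two libraries; under the null hypothesis, a is binomial with n = a + b and a common, unknown probability p0 that is re-estimated from the pairs currently flagged as null.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    limit = binom_pmf(k, n, p) * (1 + 1e-9)  # small tolerance for float ties
    probs = (binom_pmf(j, n, p) for j in range(n + 1))
    return min(1.0, sum(q for q in probs if q <= limit))


def fit_null_iteratively(pairs, cutoff=0.05, max_iter=50):
    """Alternate between fitting p0 on the flagged-null pairs and re-flagging.

    Stops when the flags are consistent with the p-values they produce,
    or after max_iter rounds (the flags could in principle oscillate).
    """
    in_null = [True] * len(pairs)
    p0, pvals = None, []
    for _ in range(max_iter):
        a_sum = sum(a for (a, b), keep in zip(pairs, in_null) if keep)
        n_sum = sum(a + b for (a, b), keep in zip(pairs, in_null) if keep)
        if n_sum == 0:
            if p0 is None:
                raise ValueError("no counts available to fit the null")
            break  # every pair was excluded; keep the last fit
        p0 = a_sum / n_sum
        pvals = [binom_two_sided_p(a, a + b, p0) for a, b in pairs]
        new_flags = [pv >= cutoff for pv in pvals]
        if new_flags == in_null:
            break
        in_null = new_flags
    return p0, pvals, in_null


def bh_adjust(pvals):
    """Benjamini-Hochberg-style adjusted p-values (a stand-in for q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


# Toy data (hypothetical): nine pairs near a 1:1 ratio and one clear outlier.
pairs = [(10, 10), (12, 9), (8, 11), (15, 14), (9, 10),
         (11, 12), (20, 19), (7, 8), (40, 2), (13, 12)]
p0, pvals, in_null = fit_null_iteratively(pairs)
qvals = bh_adjust(pvals)
for pair, keep, q in zip(pairs, in_null, qvals):
    print(pair, "null" if keep else "non-null", round(q, 4))
```

The cut-off decides which pairs feed the null fit, so changing it changes both the fitted ratio and the resulting p-values; this is the parameter whose choice the abstract reports as affecting performance.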

Suggested Citation

  • Nixon, John H., 2012. "Investigations into refinements of Storey’s method of multiple hypothesis testing minimising the FDR, and its application to test binomial data," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 4381-4398.
  • Handle: RePEc:eee:csdana:v:56:y:2012:i:12:p:4381-4398
    DOI: 10.1016/j.csda.2012.03.026

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947312001697
    Download Restriction: Full text for ScienceDirect subscribers only.



    References listed on IDEAS

    1. John D. Storey, 2007. "The optimal discovery procedure: a new approach to simultaneous significance testing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(3), pages 347-368, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shonosuke Sugasawa & Hisashi Noma, 2021. "Efficient screening of predictive biomarkers for individual treatment selection," Biometrics, The International Biometric Society, vol. 77(1), pages 249-257, March.
    2. Daniel Yekutieli, 2015. "Bayesian tests for composite alternative hypotheses in cross-tabulated data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(2), pages 287-301, June.
    3. Ruth Heller & Saharon Rosset, 2021. "Optimal control of false discovery criteria in the two‐group model," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(1), pages 133-155, February.
    4. Huixia Wang & Xuming He, 2008. "An Enhanced Quantile Approach for Assessing Differential Gene Expressions," Biometrics, The International Biometric Society, vol. 64(2), pages 449-457, June.
    5. Edsel Peña & Joshua Habiger & Wensong Wu, 2015. "Classes of multiple decision functions strongly controlling FWER and FDR," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 78(5), pages 563-595, July.
    6. Xiaoquan Wen, 2017. "Robust Bayesian FDR Control Using Bayes Factors, with Applications to Multi-tissue eQTL Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 28-49, June.
    7. Chen, Xiongzhi, 2019. "Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue–Stieltjes integral equations," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 724-744.
    8. Dazard, Jean-Eudes & Sunil Rao, J., 2012. "Joint adaptive mean–variance regularization and variance stabilization of high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2317-2333.
    9. Luis G. León-Novelo & Peter Müller & Wadih Arap & Mikhail Kolonin & Jessica Sun & Renata Pasqualini & Kim-Anh Do, 2013. "Semiparametric Bayesian Inference for Phage Display Data," Biometrics, The International Biometric Society, vol. 69(1), pages 174-183, March.
    10. Michele Guindani & Peter Müller & Song Zhang, 2009. "A Bayesian discovery procedure," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 905-925, November.
    11. Leek Jeffrey T & Storey John D., 2011. "The Joint Null Criterion for Multiple Hypothesis Tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, June.
    12. David Amar & Ron Shamir & Daniel Yekutieli, 2017. "Extracting replicable associations across multiple studies: Empirical Bayes algorithms for controlling the false discovery rate," PLOS Computational Biology, Public Library of Science, vol. 13(8), pages 1-22, August.
    13. Shigeyuki Matsui & Hisashi Noma & Pingping Qu & Yoshio Sakai & Kota Matsui & Christoph Heuck & John Crowley, 2018. "Multi-subgroup gene screening using semi-parametric hierarchical mixture models and the optimal discovery procedure: Application to a randomized clinical trial in multiple myeloma," Biometrics, The International Biometric Society, vol. 74(1), pages 313-320, March.
    14. Hwang J.T. Gene & Liu Peng, 2010. "Optimal Tests Shrinking Both Means and Variances Applicable to Microarray Data Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-35, October.
    15. Rossell David & Guerra Rudy & Scott Clayton, 2008. "Semi-Parametric Differential Expression Analysis via Partial Mixture Estimation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-29, April.
    16. Jules Ellis, 2014. "An Inequality for Correlations in Unidimensional Monotone Latent Variable Models for Binary Variables," Psychometrika, Springer;The Psychometric Society, vol. 79(2), pages 303-316, April.
    17. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
    18. Saharon Rosset & Ruth Heller & Amichai Painsky & Ehud Aharoni, 2022. "Optimal and maximin procedures for multiple testing problems," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1105-1128, September.
    19. Rubin Daniel B., 2016. "Evaluations of the Optimal Discovery Procedure for Multiple Testing," The International Journal of Biostatistics, De Gruyter, vol. 12(1), pages 21-29, May.
    20. Youngjo Lee & Jan F. Bjørnstad, 2013. "Extended likelihood approach to large-scale multiple testing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 553-575, June.
