
False Discovery Variance Reduction in Large Scale Simultaneous Hypothesis Tests

Authors

Listed:
  • Sairam Rayaprolu

    (University of Connecticut)

  • Zhiyi Chi

    (University of Connecticut)

Abstract

Statistical dependence between hypotheses poses a significant challenge to the stability of large-scale multiple hypothesis testing. Ignoring it often results in an unacceptably large spread in the false positive proportion, even when its average value is acceptable (Fan et al., J Amer Statist Assoc 107(499): 1019-1035, 2012; Owen, J R Stat Soc Ser B 67(3): 411-426, 2005; Qiu et al., Stat Appl Genet Mol Biol 4: 32, 2005; Schwartzman and Lin, Biometrika 98(1): 199-214, 2011). However, the statistical dependence structure of the data is often unknown. Using a generic signal-processing model, Bayesian multiple testing, and simulations, we demonstrate that the variance of the false positive proportion can be substantially reduced even under unknown short-range dependence. We do this by modeling the data-generating process as a stationary ergodic binary signal process embedded in noisy observations. We derive the conditional probabilities needed for Bayesian multiple testing by incorporating nearby observations into a second-order Taylor series approximation. Simulations under general conditions are carried out to assess the validity of the approach and the variance reduction it achieves. Along the way, we address the problem of sampling a random Markov matrix with a specified stationary distribution and lower bounds on the top absolute eigenvalues, which is of interest in its own right.
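
The data-generating model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the Metropolis-Hastings construction used here is a standard stand-in for producing a Markov matrix with a given stationary distribution, and it does not enforce the eigenvalue lower bounds the abstract mentions. The function names `random_markov_matrix` and `simulate_noisy_signal`, the unit-variance Gaussian noise, and the mean shift `mu` are all assumptions made for illustration.

```python
import random


def random_markov_matrix(pi, rng):
    """Return a row-stochastic matrix P with pi P = pi.

    Assumption: a Metropolis-Hastings construction from a random positive
    proposal matrix. It gives a reversible chain and places no bound on
    the eigenvalues, unlike the sampler described in the abstract.
    """
    n = len(pi)
    # Random proposal matrix Q with strictly positive entries, rows summing to 1.
    Q = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(row)
        Q.append([x / total for x in row])
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Accept a proposed move i -> j with the Metropolis-Hastings probability.
                accept = min(1.0, (pi[j] * Q[j][i]) / (pi[i] * Q[i][j]))
                P[i][j] = Q[i][j] * accept
        # Rejected proposals keep the chain at i (the diagonal is still 0 in the sum).
        P[i][i] = 1.0 - sum(P[i])
    return P


def simulate_noisy_signal(P, pi, n, mu, rng):
    """Simulate a stationary binary signal s_t and observations x_t = mu * s_t + noise.

    Assumption: standard Gaussian noise and a mean shift mu under the alternative.
    """
    state = 0 if rng.random() < pi[0] else 1  # start from the stationary law
    signal, obs = [], []
    for _ in range(n):
        signal.append(state)
        obs.append(mu * state + rng.gauss(0.0, 1.0))
        state = 0 if rng.random() < P[state][0] else 1
    return signal, obs


rng = random.Random(0)
pi = [0.9, 0.1]  # 90% true nulls (state 0), 10% signals (state 1)
P = random_markov_matrix(pi, rng)
signal, obs = simulate_noisy_signal(P, pi, 1000, 2.0, rng)
```

Because neighbouring states are linked through `P`, the true signals tend to occur in runs rather than independently, which is the kind of short-range dependence the abstract refers to.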

Suggested Citation

  • Sairam Rayaprolu & Zhiyi Chi, 2021. "False Discovery Variance Reduction in Large Scale Simultaneous Hypothesis Tests," Methodology and Computing in Applied Probability, Springer, vol. 23(3), pages 711-733, September.
  • Handle: RePEc:spr:metcap:v:23:y:2021:i:3:d:10.1007_s11009-019-09763-z
    DOI: 10.1007/s11009-019-09763-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11009-019-09763-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Art B. Owen, 2005. "Variance of the number of false discoveries," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(3), pages 411-426, June.
    2. Baxter, J. R. & Rosenthal, Jeffrey S., 1995. "Rates of convergence for everywhere-positive Markov chains," Statistics & Probability Letters, Elsevier, vol. 22(4), pages 333-338, March.
    3. Armin Schwartzman & Xihong Lin, 2011. "The effect of correlation in false discovery rate estimation," Biometrika, Biometrika Trust, vol. 98(1), pages 199-214.
    4. Qiu Xing & Klebanov Lev & Yakovlev Andrei, 2005. "Correlation Between Gene Expression Levels and Limitations of the Empirical Bayes Methodology for Finding Differentially Expressed Genes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-32, November.
    5. Jianqing Fan & Xu Han, 2017. "Estimation of the false discovery proportion with unknown dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1143-1164, September.
    6. Efron B. & Tibshirani R. & Storey J.D. & Tusher V., 2001. "Empirical Bayes Analysis of a Microarray Experiment," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1151-1160, December.
    7. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
    8. Efron, Bradley, 2007. "Correlation and Large-Scale Simultaneous Significance Testing," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 93-103, March.
    9. John D. Storey & Jonathan E. Taylor & David Siegmund, 2004. "Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(1), pages 187-205, February.
    10. Efron, Bradley, 2004. "Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 96-104, January.
    11. Efron, Bradley, 2010. "Correlated z-Values and the Accuracy of Large-Scale Statistical Estimates," Journal of the American Statistical Association, American Statistical Association, vol. 105(491), pages 1042-1055.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gordon, Alexander & Chen, Linlin & Glazko, Galina & Yakovlev, Andrei, 2009. "Balancing type one and two errors in multiple testing for differential expression of genes," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1622-1629, March.
    2. Jianqing Fan & Xu Han, 2017. "Estimation of the false discovery proportion with unknown dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1143-1164, September.
    3. T. Tony Cai & Weidong Liu, 2016. "Large-Scale Multiple Testing of Correlations," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 229-240, March.
    4. Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
    5. Bickel David R., 2012. "Empirical Bayes Interval Estimates that are Conditionally Equal to Unadjusted Confidence Intervals or to Default Prior Credibility Intervals," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-34, February.
    6. Leek Jeffrey T & Storey John D., 2011. "The Joint Null Criterion for Multiple Hypothesis Tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, June.
    7. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
    8. Wang, Jiangzhou & Cui, Tingting & Zhu, Wensheng & Wang, Pengfei, 2023. "Covariate-modulated large-scale multiple testing under dependence," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    9. Chang Yu & Daniel Zelterman, 2020. "Distributions associated with simultaneous multiple hypothesis testing," Journal of Statistical Distributions and Applications, Springer, vol. 7(1), pages 1-17, December.
    10. Wang, Xia & Shojaie, Ali & Zou, Jian, 2019. "Bayesian hidden Markov models for dependent large-scale multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 123-136.
    11. de Uña-Alvarez Jacobo, 2011. "On the Statistical Properties of SGoF Multitesting Method," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-30, April.
    12. Lee, Donghwan & Lee, Youngjo, 2016. "Extended likelihood approach to multiple testing with directional error control under a hidden Markov random field model," Journal of Multivariate Analysis, Elsevier, vol. 151(C), pages 1-13.
    13. Nik Tuzov & Frederi Viens, 2011. "Mutual fund performance: false discoveries, bias, and power," Annals of Finance, Springer, vol. 7(2), pages 137-169, May.
    14. Lim Johan & Kim Jayoun & Kim Sang-cheol & Yu Donghyeon & Kim Kyunga & Kim Byung Soo, 2012. "Detection of Differentially Expressed Gene Sets in a Partially Paired Microarray Data Set," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-30, February.
    15. Chen, Xiongzhi & Doerge, R.W., 2020. "A strong law of large numbers related to multiple testing normal means," Statistics & Probability Letters, Elsevier, vol. 159(C).
    16. de Uña-Alvarez Jacobo, 2012. "The Beta-Binomial SGoF method for multiple dependent tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-32, May.
    17. Pallavi Basu & Luella Fu & Alessio Saretto & Wenguang Sun, 2021. "Empirical Bayes Control of the False Discovery Exceedance," Working Papers 2115, Federal Reserve Bank of Dallas.
    18. Habiger, Joshua D. & Peña, Edsel A., 2014. "Compound p-value statistics for multiple testing procedures," Journal of Multivariate Analysis, Elsevier, vol. 126(C), pages 153-166.
    19. Noirrit Kiran Chandra & Sourabh Bhattacharya, 2021. "Asymptotic theory of dependent Bayesian multiple testing procedures under possible model misspecification," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(5), pages 891-920, October.
    20. Chen, Xiongzhi, 2020. "A strong law of large numbers for simultaneously testing parameters of Lancaster bivariate distributions," Statistics & Probability Letters, Elsevier, vol. 167(C).
