Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing


Authors

Guo Wenge
(National Institute of Environmental Health Science)
Peddada Shyamal
(National Institute of Environmental Health Science)

Abstract

It is common practice to use resampling methods such as the bootstrap to calculate the p-value for each test when performing large-scale multiple testing. The precision of the bootstrap p-values, and that of the false discovery rate (FDR), depends on the number of bootstrap samples used to test each hypothesis: the more bootstrap samples, the better the precision. However, the required number of bootstrap samples can be computationally burdensome, and this cost is multiplied by the number of tests to be performed. Adding to the computational challenge, in some applications the calculation of the test statistic itself may require considerable computation time. As technology improves, the dimension of such problems can be expected to grow as well. For instance, in the early days of microarray technology, a cDNA chip carried fewer than 10,000 probes; Affymetrix chips now come with over 50,000 probes per chip. Motivated by this need, we developed a simple adaptive bootstrap methodology for large-scale multiple testing that reduces the total number of bootstrap calculations while ensuring control of the FDR. Based on a simulation study, we found that, relative to the Benjamini-Hochberg (BH) procedure, the standard FDR methodology, the proposed methodology achieved a very substantial reduction in the number of bootstrap samples. In some cases the new algorithm required only about one-sixth as many bootstrap samples as the conventional BH procedure: if the conventional BH procedure used 1,000 bootstrap samples, the proposed method required only about 160. This methodology has been implemented for time-course/dose-response data in our software, ORIOGEN, which is available from the authors upon request.
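The abstract turns on two ingredients: a bootstrap p-value for each hypothesis, whose Monte Carlo precision depends on the number of bootstrap samples B, and the BH step-up rule that turns those p-values into discoveries. The following is a minimal Python sketch of both under stated assumptions, not the authors' adaptive algorithm or ORIOGEN code. It assumes a one-sample test of a zero mean as the per-hypothesis statistic, the common (k + 1)/(B + 1) p-value estimator, and the binomial standard error sqrt(p(1 - p)/B); the function names `bootstrap_mean_test`, `mc_standard_error` and `benjamini_hochberg` are hypothetical.

```python
import math
import random


def bootstrap_mean_test(x, n_boot, rng):
    """Two-sided bootstrap test of H0: mean = 0 (illustrative statistic).

    The data are centred so that resampling happens under the null, and
    the p-value uses the (k + 1) / (B + 1) convention (an assumption).
    """
    n = len(x)
    xbar = sum(x) / n
    centred = [xi - xbar for xi in x]  # impose H0: mean = 0
    t_obs = abs(xbar)
    exceed = 0
    for _ in range(n_boot):
        sample = rng.choices(centred, k=n)  # resample with replacement
        if abs(sum(sample) / n) >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


def mc_standard_error(p, n_boot):
    """Approximate Monte Carlo standard error of a bootstrap p-value."""
    return math.sqrt(p * (1.0 - p) / n_boot)


def benjamini_hochberg(p_values, q=0.05):
    """Indices (in input order) rejected by the BH step-up rule at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that qualifies
    return sorted(order[:k_max])


if __name__ == "__main__":
    rng = random.Random(2008)
    n_boot = 999
    # Hypothetical data: 20 hypotheses, the first 5 with a true mean shift.
    datasets = [[rng.gauss(1.0 if j < 5 else 0.0, 1.0) for _ in range(15)]
                for j in range(20)]
    p_vals = [bootstrap_mean_test(x, n_boot, rng) for x in datasets]
    print("rejected:", benjamini_hochberg(p_vals, q=0.05))
    print("statistic evaluations:", len(datasets) * n_boot)
    print("MC s.e. at p = 0.05:", round(mc_standard_error(0.05, n_boot), 4))
```

Because the standard error shrinks only as 1/sqrt(B), halving it takes four times as many bootstrap samples, and that cost is paid for every hypothesis: with 50,000 probes and 1,000 bootstrap samples each, a fixed-B analysis already needs 50 million evaluations of the test statistic. Only p-values near the BH cutoff need high precision to settle a rejection decision, which is the kind of slack an adaptive choice of the number of bootstrap samples can exploit; the authors' specific rule is given in the paper and is not reproduced here.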

Suggested Citation

Guo Wenge & Peddada Shyamal, 2008. "Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-21, March.

Handle: RePEc:bpj:sagmbi:v:7:y:2008:i:1:n:13
DOI: 10.2202/1544-6115.1360

References listed on IDEAS

Yoav Benjamini & Abba M. Krieger & Daniel Yekutieli, 2006. "Adaptive linear step-up procedures that control the false discovery rate," Biometrika, Biometrika Trust, vol. 93(3), pages 491-507, September.
van der Laan Mark J. & Dudoit Sandrine & Pollard Katherine S., 2004. "Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-27, June.
Mark van der Laan & Sandrine Dudoit & Katherine Pollard, 2004. "Multiple Testing. Part III. Procedures for Control of the Generalized Family-Wise Error Rate and Proportion of False Positives," U.C. Berkeley Division of Biostatistics Working Paper Series 1140, Berkeley Electronic Press.
Russell Davidson & James MacKinnon, 2000. "Bootstrap tests: how many bootstraps?," Econometric Reviews, Taylor & Francis Journals, vol. 19(1), pages 55-68.
- James G. MacKinnon & Russell Davidson, 2001. "Bootstrap Tests: How Many Bootstraps?," Working Paper 1036, Economics Department, Queen's University.
Donald W. K. Andrews & Moshe Buchinsky, 2000. "A Three-Step Method for Choosing the Number of Bootstrap Repetitions," Econometrica, Econometric Society, vol. 68(1), pages 23-52, January.
Yoav Benjamini & Yosef Hochberg, 2000. "On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics," Journal of Educational and Behavioral Statistics, vol. 25(1), pages 60-83, March.
Youngchao Ge & Sandrine Dudoit & Terence Speed, 2003. "Resampling-based multiple testing for microarray data analysis," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 12(1), pages 1-77, June.
John D. Storey & Jonathan E. Taylor & David Siegmund, 2004. "Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(1), pages 187-205, February.
Efron B. & Tibshirani R. & Storey J.D. & Tusher V., 2001. "Empirical Bayes Analysis of a Microarray Experiment," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1151-1160, December.
John D. Storey, 2002. "A direct approach to false discovery rates," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(3), pages 479-498, August.

Citations

Cited by:

Axel Gandy & Georg Hahn, 2014. "MMCTest—A Safe Algorithm for Implementing Multiple Monte Carlo Tests," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(4), pages 1083-1101, December.
Hahn, Georg, 2020. "On the expected runtime of multiple testing algorithms with bounded error," Statistics & Probability Letters, Elsevier, vol. 165(C).
Georg Hahn, 2018. "Closure properties of classes of multiple testing procedures," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(2), pages 167-178, April.
Axel Gandy & Georg Hahn, 2016. "A Framework for Monte Carlo based Multiple Testing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(4), pages 1046-1063, December.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Debashis Ghosh, 2006. "Shrunken p-Values for Assessing Differential Expression with Applications to Genomic Data Analysis," Biometrics, The International Biometric Society, vol. 62(4), pages 1099-1106, December.
Alessio Farcomeni, 2009. "Generalized Augmentation to Control the False Discovery Exceedance in Multiple Testing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(3), pages 501-517, September.
Joseph Romano & Azeem Shaikh & Michael Wolf, 2008. "Control of the false discovery rate under dependence using the bootstrap and subsampling," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 17(3), pages 417-442, November.
- Joseph P. Romano & Azeem M. Shaikh & Michael Wolf, 2008. "Control of the False Discovery Rate under Dependence using the Bootstrap and Subsampling," IEW - Working Papers 337, Institute for Empirical Research in Economics - University of Zurich.
Christina C. Bartenschlager & Michael Krapp, 2015. "Theorie und Methoden multipler statistischer Vergleiche," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 9(2), pages 107-129, November.
Joseph P. Romano & Azeem M. Shaikh & Michael Wolf, 2010. "Hypothesis Testing in Econometrics," Annual Review of Economics, Annual Reviews, vol. 2(1), pages 75-104, September.
- Joseph P. Romano & Azeem M. Shaikh & Michael Wolf, 2009. "Hypothesis testing in econometrics," IEW - Working Papers 444, Institute for Empirical Research in Economics - University of Zurich.
Nik Tuzov & Frederi Viens, 2011. "Mutual fund performance: false discoveries, bias, and power," Annals of Finance, Springer, vol. 7(2), pages 137-169, May.
Yoav Benjamini, 2010. "Discovering the false discovery rate," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 405-416, September.
Kim, Donggyu & Zhang, Chunming, 2014. "Adaptive linear step-up multiple testing procedure with the bias-reduced estimator," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 31-39.
Andrew Y. Chen, 2022. "Do t-Statistic Hurdles Need to be Raised?," Papers 2204.10275, arXiv.org, revised Feb 2024.
Li Wang, 2019. "Weighted multiple testing procedure for grouped hypotheses with k-FWER control," Computational Statistics, Springer, vol. 34(2), pages 885-909, June.
Habiger, Joshua D. & Peña, Edsel A., 2014. "Compound p-value statistics for multiple testing procedures," Journal of Multivariate Analysis, Elsevier, vol. 126(C), pages 153-166.
Joseph P. Romano & Michael Wolf, 2008. "Balanced Control of Generalized Error Rates," IEW - Working Papers 379, Institute for Empirical Research in Economics - University of Zurich.
Wen Shi & Xi Chen & Jennifer Shang, 2019. "An Efficient Morris Method-Based Framework for Simulation Factor Screening," INFORMS Journal on Computing, INFORMS, vol. 31(4), pages 745-770, October.
Shigeyuki Matsui & Hisashi Noma, 2011. "Estimating Effect Sizes of Differentially Expressed Genes for Power and Sample-Size Assessments in Microarray Experiments," Biometrics, The International Biometric Society, vol. 67(4), pages 1225-1235, December.
Ghosh Debashis, 2012. "Incorporating the Empirical Null Hypothesis into the Benjamini-Hochberg Procedure," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(4), pages 1-21, July.
Xiaoquan Wen, 2017. "Robust Bayesian FDR Control Using Bayes Factors, with Applications to Multi-tissue eQTL Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 28-49, June.
Alejandro Ochoa & John D Storey & Manuel Llinás & Mona Singh, 2015. "Beyond the E-Value: Stratified Statistics for Protein Domain Prediction," PLOS Computational Biology, Public Library of Science, vol. 11(11), pages 1-21, November.
Baolin Wu & Zhong Guan & Hongyu Zhao, 2006. "Parametric and Nonparametric FDR Estimation Revisited," Biometrics, The International Biometric Society, vol. 62(3), pages 735-744, September.
Chen, Xiongzhi, 2019. "Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue–Stieltjes integral equations," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 724-744.
Christina C. Bartenschlager & Jens O. Brunner, 2019. "Reaching for the stars: attention to multiple testing problems and method recommendations using simulation for business research," Journal of Business Economics, Springer, vol. 89(4), pages 447-479, June.