Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data

Author

Listed:

Constantin F Aliferis
Alexander Statnikov
Ioannis Tsamardinos
Jonathan S Schildcrout
Bryan E Shepherd
Frank E Harrell Jr.

Abstract

Background: Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate tests of differential expression, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development.

Methodology/Principal Findings: We show that choices of specific components of the analysis (i.e., error metric, classifier, error estimator and event balancing) have large and compounding effects on statistical power. The effects are demonstrated empirically by an analysis of 7 of the largest microarray cancer outcome prediction datasets and supplementary simulations, and by contrasting them to prior analyses of the same data.

Conclusions/Significance: The findings of the present study have two important practical implications. First, by avoiding under-powered data analysis protocols, high-throughput studies can achieve substantial economies in the sample size required to demonstrate statistical significance of predictive signal. Factors that affect power are identified and studied. Much smaller samples than previously thought may be sufficient for exploratory studies, as long as these factors are taken into consideration when designing and executing the analysis. Second, previous highly cited claims that microarray assays may not be able to predict disease outcomes better than chance are shown by our experiments to be due to under-powered data analysis combined with inappropriate statistical tests.
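To make the idea of testing whether a signature predicts "better than chance" concrete, the following is a minimal sketch of a label-permutation test on the area under the ROC curve (AUC). It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not say which test or error metric the authors chose, and the function names (auc, permutation_pvalue) and their parameters are hypothetical. The sketch also assumes the scores are held-out predictions that are kept fixed while labels are shuffled; a fuller protocol would typically rerun the whole analysis pipeline for each permutation.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_pvalue(scores, labels, metric=auc, n_perm=1000, seed=0):
    """One-sided permutation p-value for metric(scores, labels).

    Labels are shuffled while scores stay fixed (a simplifying assumption),
    which simulates the null hypothesis of no predictive signal.
    """
    rng = random.Random(seed)
    observed = metric(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if metric(scores, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one permutation,
    # so the p-value can never be exactly zero.
    return (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    # Hypothetical imbalanced sample: 30 non-events, 10 events, weak signal.
    demo_rng = random.Random(42)
    labels = [0] * 30 + [1] * 10
    scores = [demo_rng.gauss(0.8 * y, 1.0) for y in labels]
    print("AUC:", round(auc(scores, labels), 3))
    print("permutation p-value:", permutation_pvalue(scores, labels))
```

Because the metric is passed in as a parameter, a different error metric (for example, proportion of correct classifications at a fixed threshold) could be substituted to compare how often each one rejects the null on simulated data, which is one way to explore the kind of power differences the abstract describes.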

Suggested Citation

Constantin F Aliferis & Alexander Statnikov & Ioannis Tsamardinos & Jonathan S Schildcrout & Bryan E Shepherd & Frank E Harrell Jr., 2009. "Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data," PLOS ONE, Public Library of Science, vol. 4(3), pages 1-7, March.

Handle: RePEc:plo:pone00:0004922
DOI: 10.1371/journal.pone.0004922

Citations

Cited by:

Guo Yu & Balasubramanian Raji, 2012. "Comparative Evaluation of Classifiers in the Presence of Statistical Interactions between Features in High Dimensional Data Settings," The International Journal of Biostatistics, De Gruyter, vol. 8(1), pages 1-32, June.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Anastasiou, Andreas, 2017. "Bounds for the normal approximation of the maximum likelihood estimator from m-dependent random variables," Statistics & Probability Letters, Elsevier, vol. 129(C), pages 171-181.
Denter, Philipp & Sisak, Dana, 2015. "Do polls create momentum in political competition?," Journal of Public Economics, Elsevier, vol. 130(C), pages 1-14.
- Philipp Denter & Dana Sisak, 2013. "Do Polls create Momentum in Political Competition?," Tinbergen Institute Discussion Papers 13-169/VII, Tinbergen Institute.
Salgado Alfredo, 2018. "Incomplete Information and Costly Signaling in College Admissions," Working Papers 2018-23, Banco de México.
Albrecht, James & Anderson, Axel & Vroman, Susan, 2010. "Search by committee," Journal of Economic Theory, Elsevier, vol. 145(4), pages 1386-1407, July.
- Susan Vroman & Axel Anderson & James Albrecht, 2007. "Search by Committee," 2007 Meeting Papers 351, Society for Economic Dynamics.
- James Albrecht & Axel Anderson & Susan Vroman, 2007. "Search by Committee," Working Papers gueconwpa~07-07-09, Georgetown University, Department of Economics.
- Albrecht, James & Anderson, Axel Z. & Vroman, Susan, 2007. "Search by Committee," IZA Discussion Papers 3137, Institute of Labor Economics (IZA).
Blier-Wong, Christopher & Cossette, Hélène & Marceau, Etienne, 2023. "Risk aggregation with FGM copulas," Insurance: Mathematics and Economics, Elsevier, vol. 111(C), pages 102-120.
Simon Bruhn & Thomas Grebel & Lionel Nesta, 2023. "The fallacy in productivity decomposition," Journal of Evolutionary Economics, Springer, vol. 33(3), pages 797-835, July.
- Bruhn, Simon & Grebel, Thomas & Nesta, Lionel, 2021. "The fallacy in productivity decomposition," Ilmenau Economics Discussion Papers 160, Ilmenau University of Technology, Institute of Economics.
- Bruhn, Simon & Grebel, Thomas & Nesta, Lionel, 2023. "The fallacy in productivity decomposition," Ilmenau Economics Discussion Papers 180, Ilmenau University of Technology, Institute of Economics.
- Simon Bruhn & Thomas Grebel & Lionel Nesta, 2021. "The fallacy in productivity decomposition," SciencePo Working papers Main hal-03474838, HAL.
- Simon Bruhn & Thomas Grebel & Lionel Nesta, 2021. "The Fallacy in Productivity Decomposition," GREDEG Working Papers 2021-39, Groupe de REcherche en Droit, Economie, Gestion (GREDEG CNRS), Université Côte d'Azur, France.
- Simon Bruhn & Thomas Grebel & Lionel Nesta, 2021. "The fallacy in productivity decomposition," Working Papers hal-03474838, HAL.
- Simon Bruhn & Thomas Grebel & Lionel Nesta, 2023. "The fallacy in productivity decomposition," Post-Print hal-04288851, HAL.
Wim J. van der Linden, 2019. "Lord's Equity Theorem Revisited," Journal of Educational and Behavioral Statistics, vol. 44(4), pages 415-430, August.
Simar, Léopold & Wilson, Paul, 2022. "Modern Tools for Evaluating the Performance of Health-Care Providers," LIDAM Discussion Papers ISBA 2022006, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
Baey, Charlotte & Didier, Anne & Lemaire, Sébastien & Maupas, Fabienne & Cournède, Paul-Henry, 2013. "Modelling the interindividual variability of organogenesis in sugar beet populations using a hierarchical segmented model," Ecological Modelling, Elsevier, vol. 263(C), pages 56-63.
Tasche, Dirk, 2013. "Bayesian estimation of probabilities of default for low default portfolios," Journal of Risk Management in Financial Institutions, Henry Stewart Publications, vol. 6(3), pages 302-326, July.
- Dirk Tasche, 2011. "Bayesian estimation of probabilities of default for low default portfolios," Papers 1112.5550, arXiv.org, revised Aug 2013.
Diers, Dorothea & Linde, Marc & Hahn, Lukas, 2016. "Addendum to ‘The multi-year non-life insurance risk in the additive reserving model’ [Insurance Math. Econom. 52(3) (2013) 590–598]: Quantification of multi-year non-life insurance risk in chain ladde," Insurance: Mathematics and Economics, Elsevier, vol. 67(C), pages 187-199.
Anastasiou, Andreas, 2017. "Bounds for the normal approximation of the maximum likelihood estimator from m -dependent random variables," LSE Research Online Documents on Economics 83635, London School of Economics and Political Science, LSE Library.
Hirschberg, Joe & Lye, Jenny, 2017. "Inverting the indirect—The ellipse and the boomerang: Visualizing the confidence intervals of the structural coefficient from two-stage least squares," Journal of Econometrics, Elsevier, vol. 199(2), pages 173-183.
Serguei Kaniovski & Alexander Zaigraev, 2018. "The probability of majority inversion in a two-stage voting system with three states," Theory and Decision, Springer, vol. 84(4), pages 525-546, June.
Andreas Hefti & Peiyao Shen, 2025. "Predicting the distribution of contest success under the illusion of proportionality," ECON - Working Papers 466, Department of Economics - University of Zurich.
Packham, Natalie & Woebbeking, Fabian, 2021. "Correlation scenarios and correlation stress testing," IRTG 1792 Discussion Papers 2021-012, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
Guillermo Martínez-Flórez & Roger Tovar-Falón & Carlos Barrera-Causil, 2022. "Inflated Unit-Birnbaum-Saunders Distribution," Mathematics, MDPI, vol. 10(4), pages 1-14, February.
Xyngis, Georgios, 2017. "Business-cycle variation in macroeconomic uncertainty and the cross-section of expected returns: Evidence for scale-dependent risks," Journal of Empirical Finance, Elsevier, vol. 44(C), pages 43-65.
Nair, Gopalan M. & Turlach, Berwin A., 2012. "The stochastic h-index," Journal of Informetrics, Elsevier, vol. 6(1), pages 80-87.
Riko Kelter, 2021. "Analysis of type I and II error rates of Bayesian and frequentist parametric and nonparametric two-sample hypothesis tests under preliminary assessment of normality," Computational Statistics, Springer, vol. 36(2), pages 1263-1288, June.

