A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies

Authors

  • Anne-Laure Boulesteix
  • Robert Hable
  • Sabine Lauer
  • Manuel J. A. Eugster

Abstract

In computational sciences, including computational statistics, machine learning, and bioinformatics, articles presenting new supervised learning methods often claim that the new method performs better than existing methods on real data, for instance in terms of error rate. However, these claims are often not based on proper statistical tests and, even when such tests are performed, the tested hypothesis is not clearly defined and little attention is paid to Type I and Type II errors. In the present article, we aim to fill this gap by providing a proper statistical framework for hypothesis tests that compare the performances of supervised learning methods based on several real datasets with unknown underlying distributions. After giving a statistical interpretation of ad hoc tests commonly performed by computational researchers, we devote special attention to power issues and outline a simple method of determining the number of datasets to be included in a comparison study to reach adequate power. These methods are illustrated through three comparison studies from the literature and an exemplary benchmarking study using gene expression microarray data. All our results can be reproduced using R code and datasets available from the companion website http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/compstud2013.
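
As a rough illustration of the power question raised in the abstract, the sketch below treats each real dataset as one observation and the per-dataset difference in error rate between two methods as the quantity being tested. It is not the authors' R code and may differ from the procedure in the article: it uses the textbook normal-approximation sample size formula for a two-sided paired test, and the function names, the target difference `delta`, and the standard deviation `sigma` are hypothetical values chosen only for the example.

```python
import math
from statistics import NormalDist, mean, stdev


def datasets_needed(delta, sigma, alpha=0.05, power=0.8):
    """Approximate number of datasets for a two-sided paired test.

    delta: assumed true mean difference in error rate between the methods.
    sigma: assumed standard deviation of that difference across datasets.
    Uses the normal approximation ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    which slightly underestimates the number a t-test would require.
    """
    if delta == 0 or sigma <= 0:
        raise ValueError("delta must be non-zero and sigma must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / delta) ** 2)


def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean per-dataset difference in error rate."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equally long lists with at least 2 datasets")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical planning values: detect a 5-point difference in error
    # rate when the difference varies across datasets with SD 0.10.
    print(datasets_needed(delta=0.05, sigma=0.10))
    # Hypothetical error rates of two methods on four datasets.
    print(paired_t_statistic([0.20, 0.30, 0.25, 0.40], [0.10, 0.25, 0.20, 0.30]))
```

Under these assumed planning values the formula asks for a few dozen datasets, far more than many comparison studies include, which is the kind of power concern the abstract points to.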

Suggested Citation

  • Anne-Laure Boulesteix & Robert Hable & Sabine Lauer & Manuel J. A. Eugster, 2015. "A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies," The American Statistician, Taylor & Francis Journals, vol. 69(3), pages 201-212, August.
  • Handle: RePEc:taf:amstat:v:69:y:2015:i:3:p:201-212
    DOI: 10.1080/00031305.2015.1005128

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/00031305.2015.1005128
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Cited by:

    1. Doove, Lisa L. & Wilderjans, Tom F. & Calcagnì, Antonio & Van Mechelen, Iven, 2017. "Deriving optimal data-analytic regimes from benchmarking studies," Computational Statistics & Data Analysis, Elsevier, vol. 107(C), pages 81-91.
    2. Andrew Gelman & Christian Hennig, 2017. "Beyond subjective and objective in statistics," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(4), pages 967-1033, October.
    3. Anne-Laure Boulesteix, 2015. "Ten Simple Rules for Reducing Overoptimistic Reporting in Methodological Computational Research," PLOS Computational Biology, Public Library of Science, vol. 11(4), pages 1-6, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alan R Dabney & John D Storey, 2007. "Optimality Driven Nearest Centroid Classification from Genomic Data," PLOS ONE, Public Library of Science, vol. 2(10), pages 1-7, October.
    2. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    3. Shieh Albert D & Hung Yeung Sam, 2009. "Detecting Outlier Samples in Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-24, February.
    4. Valkenborg Dirk & Van Sanden Suzy & Lin Dan & Kasim Adetayo & Zhu Qi & Haldermans Philippe & Jansen Ivy & Shkedy Ziv & Burzykowski Tomasz, 2008. "A Cross-Validation Study to Select a Classification Procedure for Clinical Diagnosis Based on Proteomic Mass Spectrometry," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-22, March.
    5. Lambert-Lacroix, Sophie & Peyre, Julie, 2006. "Local likelihood regression in generalized linear single-index models with applications to microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 2091-2113, December.
    6. Yang, Tae Young, 2009. "Simple Bayesian binary framework for discovering significant genes and classifying cancer diagnosis," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1743-1754, March.
    7. Scrucca, Luca, 2007. "Class prediction and gene selection for DNA microarrays using regularized sliced inverse regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 438-451, September.
    8. Conde David & Salvador Bonifacio & Rueda Cristina & Fernández Miguel A., 2013. "Performance and estimation of the true error rate of classification rules built with additional information. An application to a cancer trial," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(5), pages 583-602, October.
    9. Frénay, Benoît & Doquire, Gauthier & Verleysen, Michel, 2014. "Estimating mutual information for feature selection in the presence of label noise," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 832-848.
    10. Kubokawa, Tatsuya & Srivastava, Muni S., 2008. "Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 1906-1928, October.
    11. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    12. Luca Scrucca, 2014. "Graphical tools for model-based mixture discriminant analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(2), pages 147-165, June.
    13. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    14. J. Burez & D. Van Den Poel, 2005. "CRM at a Pay-TV Company: Using Analytical Models to Reduce Customer Attrition by Targeted Marketing for Subscription Services," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 05/348, Ghent University, Faculty of Economics and Business Administration.
    15. Won, Joong-Ho & Lim, Johan & Yu, Donghyeon & Kim, Byung Soo & Kim, Kyunga, 2014. "Monotone false discovery rate," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 86-93.
    16. Budczies, Jan & Kosztyla, Daniel & von Törne, Christian & Stenzinger, Albrecht & Darb-Esfahani, Silvia & Dietel, Manfred & Denkert, Carsten, 2014. "cancerclass: An R Package for Development and Validation of Diagnostic Tests from High-Dimensional Molecular Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 59(i01).
    17. Jianqing Fan & Yang Feng & Jiancheng Jiang & Xin Tong, 2016. "Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 275-287, March.
    18. Márton Gosztonyi & Csákné Filep Judit, 2022. "Profiling (Non-)Nascent Entrepreneurs in Hungary Based on Machine Learning Approaches," Sustainability, MDPI, vol. 14(6), pages 1-20, March.
    19. Wang, Tao & Xu, Pei-Rong & Zhu, Li-Xing, 2012. "Non-convex penalized estimation in high-dimensional models with single-index structure," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 221-235.
    20. Theresa Ullmann & Anna Beer & Maximilian Hünemörder & Thomas Seidl & Anne-Laure Boulesteix, 2023. "Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 211-238, March.
