
Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography

Authors

  • Susan Mallett
  • Steve Halligan
  • Gary S Collins
  • Doug G Altman

Abstract

Background: Different methods of evaluating diagnostic performance can lead to different results when diagnostic tests are compared. We compared two such approaches, sensitivity and specificity versus area under the Receiver Operating Characteristic curve (ROC AUC), for the evaluation of CT colonography for the detection of polyps, either with or without computer-assisted detection.

Methods: In a multireader multicase study of 10 readers and 107 cases, we compared sensitivity and specificity, based on radiological reporting of the presence or absence of polyps, with ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods.

Results: Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were several problems with using confidence scores: assigning scores to all cases; the use of zero scores when no polyps were identified; the bimodal, non-normal distribution of scores; fitting ROC curves, because of extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to the use of different ROC methods exceeded the differences between test results for ROC AUC.

Conclusions: The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found that sensitivity and specificity were a more reliable and clinically appropriate method for comparing diagnostic tests.
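
The contrast between the two approaches can be sketched on a toy example. The following is a minimal illustration only, not the authors' analysis: the ten cases, the 0-100 score scale, the rule that any score above zero counts as a positive report, and the function names are all assumptions made up for this sketch. It uses the empirical (Mann-Whitney) estimate of ROC AUC, which is only one of several ROC methods.

```python
# Toy illustration (hypothetical data, NOT taken from the study).

def sensitivity_specificity(truth, calls):
    """Sensitivity and specificity from binary calls against a reference standard."""
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(truth, scores):
    """Empirical ROC AUC: the fraction of (positive, negative) case pairs in
    which the positive case has the higher score, counting ties as one half."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Reference standard: 4 cases with polyps, 6 without (invented).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Invented confidence scores. A score of 0 means the reader reported no polyp,
# so two polyps are missed and one negative case is a false positive.
scores_a = [90, 80, 0, 0, 70, 0, 0, 0, 0, 0]
# Same reports, but the single false positive was scored with higher confidence.
scores_b = [90, 80, 0, 0, 95, 0, 0, 0, 0, 0]

# Assumed reporting rule for this sketch: any score above zero is a positive call.
calls_a = [s > 0 for s in scores_a]
calls_b = [s > 0 for s in scores_b]

sens_a, spec_a = sensitivity_specificity(truth, calls_a)
sens_b, spec_b = sensitivity_specificity(truth, calls_b)
auc_a = empirical_auc(truth, scores_a)
auc_b = empirical_auc(truth, scores_b)

print(f"A: sensitivity={sens_a:.2f} specificity={spec_a:.2f} AUC={auc_a:.3f}")
print(f"B: sensitivity={sens_b:.2f} specificity={spec_b:.2f} AUC={auc_b:.3f}")
```

In this invented example the binary reports are identical in both scenarios, so sensitivity (0.50) and specificity (about 0.83) do not change, while the empirical AUC falls from about 0.71 to about 0.63 purely because one false positive received a higher confidence score. The many tied zero scores also mean that most case pairs contribute only through the tie rule.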

Suggested Citation

  • Susan Mallett & Steve Halligan & Gary S Collins & Doug G Altman, 2014. "Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography," PLOS ONE, Public Library of Science, vol. 9(10), pages 1-11, October.
  • Handle: RePEc:plo:pone00:0107633
    DOI: 10.1371/journal.pone.0107633

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107633
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0107633&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Vickers, Andrew J, 2008. "Decision Analysis for the Evaluation of Diagnostic Tests, Prediction Models, and Molecular Markers," The American Statistician, American Statistical Association, vol. 62(4), pages 314-320.
    2. Karel G.M. Moons & Theo Stijnen & Bowine C. Michel & Harry R. Büller & Gerrit-Anne Van Es & Diederick E. Grobbee & J. Dik F. Habbema, 1997. "Application of Treatment Thresholds to Diagnostic-test Evaluation," Medical Decision Making, vol. 17(4), pages 447-454, October.

    Citations


    Cited by:

    1. Thaworn Dendumrongsup & Andrew A Plumb & Steve Halligan & Thomas R Fanshawe & Douglas G Altman & Susan Mallett, 2014. "Multi-Reader Multi-Case Studies Using the Area under the Receiver Operator Characteristic Curve as a Measure of Diagnostic Accuracy: Systematic Review with a Focus on Quality of Data Reporting," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-20, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dexin Chen & Meiting Fu & Liangjie Chi & Liyan Lin & Jiaxin Cheng & Weisong Xue & Chenyan Long & Wei Jiang & Xiaoyu Dong & Jian Sui & Dajia Lin & Jianping Lu & Shuangmu Zhuo & Side Liu & Guoxin Li & G, 2022. "Prognostic and predictive value of a pathomics signature in gastric cancer," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    2. Tracey L. Marsh & Holly Janes & Margaret S. Pepe, 2020. "Statistical inference for net benefit measures in biomarker validation studies," Biometrics, The International Biometric Society, vol. 76(3), pages 843-852, September.
    3. Tengyang Wang & Guanghua Liu & Hongye Lin, 2020. "A machine learning approach to predict intravenous immunoglobulin resistance in Kawasaki disease patients: A study based on a Southeast China population," PLOS ONE, Public Library of Science, vol. 15(8), pages 1-15, August.
    4. Baker Stuart G. & Van Calster Ben & Steyerberg Ewout W., 2012. "Evaluating a New Marker for Risk Prediction Using the Test Tradeoff: An Update," The International Journal of Biostatistics, De Gruyter, vol. 8(1), pages 1-37, March.
    5. Todd J. Levy & Kevin Coppa & Jinxuan Cang & Douglas P. Barnaby & Marc D. Paradis & Stuart L. Cohen & Alex Makhnevich & David Klaveren & David M. Kent & Karina W. Davidson & Jamie S. Hirsch & Theodoros, 2022. "Development and validation of self-monitoring auto-updating prognostic models of survival for hospitalized COVID-19 patients," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    6. Jérôme Allyn & Cyril Ferdynus & Michel Bohrer & Cécile Dalban & Dorothée Valance & Nicolas Allou, 2016. "Simplified Acute Physiology Score II as Predictor of Mortality in Intensive Care Units: A Decision Curve Analysis," PLOS ONE, Public Library of Science, vol. 11(10), pages 1-11, October.
    7. Stuart Baker & Jian-Lun Xu & Ping Hu & Peng Huang, 2014. "Vardeman, S. B. and Morris, M. D. (2013), "Majority Voting by Independent Classifiers can Increase Error Rates," The American Statistician, 67, 94-96: Comment by Baker, Xu, Hu, and Huang and," The American Statistician, Taylor & Francis Journals, vol. 68(2), pages 125-126, May.
