Printed from https://ideas.repec.org/a/sae/medema/v46y2026i3p321-333.html

Use of Expected Utility to Evaluate Artificial Intelligence–Enabled Rule-out Devices for Mammography Screening

Author

Listed:
  • Kwok Lung Fan

    (US Food and Drug Administration, Silver Spring, MD, USA)

  • Yee Lam Elim Thompson

    (US Food and Drug Administration, Silver Spring, MD, USA)

  • Weijie Chen

    (US Food and Drug Administration, Silver Spring, MD, USA)

  • Craig K. Abbey

    (Department of Psychological and Brain Sciences, UC Santa Barbara, Santa Barbara, CA, USA)

  • Frank W. Samuelson

    (US Food and Drug Administration, Silver Spring, MD, USA)

Abstract

Background: An artificial intelligence (AI)–enabled rule-out device may autonomously remove patient images unlikely to have cancer from radiologist review. Many published studies evaluate this type of device by retrospectively applying the AI to large datasets and use sensitivity and specificity as the performance metrics. However, these metrics have fundamental shortcomings because sensitivity will always be negatively affected in retrospective studies of rule-out applications of AI.

Method: We reviewed 2 performance metrics to compare the screening performance between the radiologist-with-rule-out-device and radiologist-without-device workflows: positive/negative predictive values (PPV/NPV) and expected utility (EU). We applied both methods to a recent study that reported improved performance in the radiologist-with-device workflow using a retrospective US dataset. We then applied the EU method to a European study based on the reported recall and cancer detection rates at different AI thresholds to compare the potential utility among different thresholds.

Results: For the US study, neither PPV/NPV nor EU can demonstrate significant improvement for any of the algorithm thresholds reported. For the study using European data, we found that EU is lower as AI rules out more patients including false-negative cases and reduces the overall screening performance.

Conclusions: Due to the nature of the retrospective simulated study design, sensitivity and specificity can be ambiguous in evaluating a rule-out device. We showed that using PPV/NPV or EU can resolve the ambiguity. The EU method can be applied with only recall rates and cancer detection rates, which is convenient as ground truth is often unavailable for nonrecalled patients in screening mammography.

Highlights:

  • Sensitivity and specificity can be ambiguous metrics for evaluating a rule-out device in a retrospective setting. PPV and NPV can resolve the ambiguity but require the ground truth for all patients.

  • Based on utility theory, expected utility (EU) is a potential metric that helps demonstrate improvement in screening performance due to a rule-out device using large retrospective datasets.

  • We applied EU to a recent study that used a large retrospective mammography screening dataset from the United States. That study reported an improvement in specificity and decrease in sensitivity when using their AI as a rule-out device retrospectively. In terms of EU, we cannot conclude a significant improvement when the AI is used as a rule-out device.

  • We applied the method to a European study that reported only recall rates and cancer detection rates. Since there is no established EU baseline value in European mammography screening workflow, we estimated the EU baseline using data from previous literature. We cannot conclude a significant improvement when the AI is used as a rule-out device for the European study.

  • In this work, we investigated the use of EU to evaluate rule-out devices using large retrospective datasets. This metric, used with retrospective clinical data, could be used as supporting evidence for rule-out devices.
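The abstract notes that EU can be compared between workflows using only recall rates and cancer detection rates. The following is a minimal sketch of why that is possible, not the authors' implementation. It assumes the standard four-outcome expected-utility decomposition (true positive, false positive, true negative, false negative), that every detected cancer comes from a recalled patient, and that both workflows are evaluated on the same population so that prevalence-dependent terms cancel. The function name `eu_change` and the parameter `utility_ratio`, defined here as (U_TN - U_FP) / (U_TP - U_FN), are hypothetical, and the numbers are illustrative only.

```python
def eu_change(cdr_without, recall_without, cdr_with, recall_with, utility_ratio):
    """Normalized expected-utility change, with-device minus without-device.

    All rates are fractions of screened patients. `utility_ratio` is the
    assumed ratio (U_TN - U_FP) / (U_TP - U_FN); it is an input chosen by the
    analyst, not a value taken from the paper.
    """
    for cdr, recall in ((cdr_without, recall_without), (cdr_with, recall_with)):
        if not 0.0 <= cdr <= recall <= 1.0:
            raise ValueError("expected 0 <= cancer detection rate <= recall rate <= 1")

    # False-positive recalls per screened patient: recalled, but no cancer found.
    fp_without = recall_without - cdr_without
    fp_with = recall_with - cdr_with

    # EU = const + (U_TP - U_FN) * CDR - (U_TN - U_FP) * FP; the constant depends
    # only on prevalence, so it cancels in the difference. Dividing by
    # (U_TP - U_FN) gives the normalized change returned here.
    return (cdr_with - cdr_without) - utility_ratio * (fp_with - fp_without)


# Illustrative numbers only (not from either study discussed in the abstract).
delta = eu_change(0.0050, 0.100, 0.0049, 0.090, utility_ratio=0.01)
print(f"normalized EU change: {delta:+.2e}")
```

With these made-up inputs, the small loss in detected cancers slightly outweighs the benefit of fewer false-positive recalls, so the change is slightly negative. The sign depends entirely on the assumed `utility_ratio`, which is why a baseline value matters for this kind of comparison.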

Suggested Citation

  • Kwok Lung Fan & Yee Lam Elim Thompson & Weijie Chen & Craig K. Abbey & Frank W. Samuelson, 2026. "Use of Expected Utility to Evaluate Artificial Intelligence–Enabled Rule-out Devices for Mammography Screening," Medical Decision Making, vol. 46(3), pages 321-333, April.
  • Handle: RePEc:sae:medema:v:46:y:2026:i:3:p:321-333
    DOI: 10.1177/0272989X251388665

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0272989X251388665
    Download Restriction: no


    References listed on IDEAS

    1. Craig K. Abbey & Miguel P. Eckstein & John M. Boone, 2010. "An Equivalent Relative Utility Metric for Evaluating Screening Mammography," Medical Decision Making, vol. 30(1), pages 113-122, January.
    2. Charles E. Phelps & Alvin I. Mushlin, 1988. "Focusing Technology Assessment Using Medical Decision Theory," Medical Decision Making, vol. 8(4), pages 279-289, December.
    3. Robert F. Wagner & Craig A. Beam & Sergey V. Beiden, 2004. "Reader Variability in Mammography and Its Implications for Expected Utility over the Population of Readers and Cases," Medical Decision Making, vol. 24(6), pages 561-572, November.
    4. Stuart G. Baker & Nancy R. Cook & Andrew Vickers & Barnett S. Kramer, 2009. "Using relative utility curves to evaluate risk prediction," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 172(4), pages 729-748, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ben Van Calster & Andrew J. Vickers, 2015. "Calibration of Risk Prediction Models," Medical Decision Making, vol. 35(2), pages 162-169, February.
    2. Hui Zhang & Christian Wernz & Danny R. Hughes, 2018. "Modeling and designing health care payment innovations for medical imaging," Health Care Management Science, Springer, vol. 21(1), pages 37-51, March.
    3. Charles F. Manski, 2020. "Towards Reasonable Patient Care Under Uncertainty," Contemporary Economic Policy, Western Economic Association International, vol. 38(2), pages 227-245, April.
    4. Charles E. Phelps, 1997. "Good Technologies Gone Bad," Medical Decision Making, vol. 17(1), pages 107-117, February.
    5. David Bardey & Philippe de Donder & Vera Zaporozhets, 2024. "The health technology assessment approach of the economic value of diagnostic tests: a literature review," Working Papers hal-04472485, HAL.
    6. Charles F. Manski, 2018. "Reasonable patient care under uncertainty," Health Economics, John Wiley & Sons, Ltd., vol. 27(10), pages 1397-1421, October.
    7. Charles F. Manski, 2022. "Patient‐centered appraisal of race‐free clinical risk assessment," Health Economics, John Wiley & Sons, Ltd., vol. 31(10), pages 2109-2114, October.
    8. Charles F. Manski, 2016. "Credible Ecological Inference for Personalized Medicine: Formalizing Clinical Judgment," NBER Working Papers 22643, National Bureau of Economic Research, Inc.
    9. Jose A. Robles-Zurita & Neil Hawkins & Janet Bouttell, 2025. "Leveling up: Treating Uptake as Endogenous May Increase the Value of Screening Programs," Medical Decision Making, vol. 45(3), pages 318-331, April.
    10. Ben Van Calster & Ewout W. Steyerberg & Ralph B. D’Agostino Sr & Michael J. Pencina, 2014. "Sensitivity and Specificity Can Change in Opposite Directions When New Predictive Markers Are Added to Risk Models," Medical Decision Making, vol. 34(4), pages 513-522, May.
    11. Ben Van Calster & Andrew J. Vickers & Michael J. Pencina & Stuart G. Baker & Dirk Timmerman & Ewout W. Steyerberg, 2013. "Evaluation of Markers and Risk Prediction Models," Medical Decision Making, vol. 33(4), pages 490-501, May.
    12. Tracey L. Marsh & Holly Janes & Margaret S. Pepe, 2020. "Statistical inference for net benefit measures in biomarker validation studies," Biometrics, The International Biometric Society, vol. 76(3), pages 843-852, September.
    13. F. Kämpfen & X. Gómez-Olivé & O. O’Donnell & C. Riumallo Herl, 2023. "Effectiveness of Population-Based Hypertension Screening: A Multidimensional Regression Discontinuity Design," Health, Econometrics and Data Group (HEDG) Working Papers 23/15, HEDG, c/o Department of Economics, University of York.
    14. Charles F. Manski, 2023. "Using Limited Trial Evidence to Credibly Choose Treatment Dosage when Efficacy and Adverse Effects Weakly Increase with Dose," NBER Working Papers 31305, National Bureau of Economic Research, Inc.
    15. Shi, Chengchun & Lu, Wenbin & Song, Rui, 2019. "A sparse random projection-based test for overall qualitative treatment effects," LSE Research Online Documents on Economics 102107, London School of Economics and Political Science, LSE Library.
    16. Joanne Lord & George Laking & Alastair Fischer, 2006. "Non‐linearity in the cost‐effectiveness frontier," Health Economics, John Wiley & Sons, Ltd., vol. 15(6), pages 565-577, June.
    17. Greve, Jane & Kristensen, Søren Rud & Lydiksen, Nis, 2023. "Patient and peer: Guideline design and expert response," Journal of Health Economics, Elsevier, vol. 92(C).
    18. Marjolein A. M. Mulders & Monique M. J. Walenkamp & Nico L. Sosef & Frank Ouwehand & Romuald van Velde & J. Carel Goslings & Niels W. L. Schep, 2020. "The Amsterdam Wrist Rules: how much money can they save?," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 21(5), pages 745-750, July.
    19. Hormuzd A. Katki & Ionut Bebu, 2021. "A simple framework to identify optimal cost‐effective risk thresholds for a single screen: Comparison to Decision Curve Analysis," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 887-903, July.
    20. Stuart G. Baker, 2024. "Evaluating Risk Prediction with Data Collection Costs: Novel Estimation of Test Tradeoff Curves," Medical Decision Making, vol. 44(1), pages 53-63, January.
