IDEAS (RePEc) — https://ideas.repec.org/p/arx/papers/2603.11511.html

Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment

Authors
  • Gunnar P. Epping
  • Andrew Caplin
  • Erik Duhaime
  • William R. Holmes
  • Daniel Martin
  • Jennifer S. Trueblood

Abstract

Many operational AI systems depend on large-scale human annotation to detect rare but consequential events (e.g., fraud, defects, and medical abnormalities). When positives are rare, the prevalence effect induces systematic cognitive biases that inflate misses and can propagate through the AI lifecycle via biased training labels. We analyze prior experimental evidence and run a field experiment on DiagnosUs, a medical crowdsourcing platform, in which we hold the true prevalence in the unlabeled stream fixed (20% blasts) while varying (i) the prevalence of positives in the gold-standard feedback stream (20% vs. 50%) and (ii) the response interface (binary labels vs. elicited probabilities). We then post-process probabilistic labels using a linear-in-log-odds recalibration approach at the worker and crowd levels, and train convolutional neural networks on the resulting labels. Balanced feedback and probabilistic elicitation reduce rare-event misses, and pipeline-level recalibration substantially improves both classification performance and probabilistic calibration; these gains carry through to downstream CNN reliability out of sample.
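The linear-in-log-odds (LLO) recalibration mentioned in the abstract is a standard two-parameter family (used, e.g., by Han & Budescu, cited below): the recalibrated probability's log-odds are an affine function of the reported probability's log-odds. As a minimal sketch only, assuming gold-standard items with known labels are available for fitting; the function names (`llo`, `fit_llo`) and the Brier-score fitting objective are illustrative, not the paper's exact procedure:

```python
import numpy as np
from scipy.optimize import minimize

def llo(p, delta, gamma):
    """Linear-in-log-odds transform: the output's log-odds equal
    gamma * log-odds(p) + log(delta)."""
    p = np.clip(p, 1e-6, 1 - 1e-6)  # guard against 0/1 inputs
    num = delta * p**gamma
    return num / (num + (1 - p)**gamma)

def fit_llo(probs, labels):
    """Fit (delta, gamma) by minimizing the Brier score on
    gold-standard items whose true labels are known."""
    def brier(theta):
        delta, gamma = np.exp(theta)  # exponentiate to keep both positive
        return np.mean((llo(probs, delta, gamma) - labels) ** 2)
    res = minimize(brier, x0=[0.0, 0.0], method="Nelder-Mead")
    return tuple(np.exp(res.x))
```

In a pipeline like the one described, the parameters would be fit on the gold-standard feedback stream (at the worker or crowd level) and then applied to elicited probabilities on the unlabeled stream before training; with delta = gamma = 1 the transform is the identity, so the fit can only help in sample.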

Suggested Citation

  • Gunnar P. Epping & Andrew Caplin & Erik Duhaime & William R. Holmes & Daniel Martin & Jennifer S. Trueblood, 2026. "Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment," Papers 2603.11511, arXiv.org.
  • Handle: RePEc:arx:papers:2603.11511
    Download full text from publisher

    File URL: http://arxiv.org/pdf/2603.11511
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Jonathan Baron & Barbara A. Mellers & Philip E. Tetlock & Eric Stone & Lyle H. Ungar, 2014. "Two Reasons to Make Aggregated Probability Forecasts More Extreme," Decision Analysis, INFORMS, vol. 11(2), pages 133-145, June.
    2. Ying Han & David V. Budescu, 2022. "Recalibrating probabilistic forecasts to improve their accuracy," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 17(1), pages 91-123, January.
    3. David V. Budescu & Eva Chen, 2015. "Identifying Expertise to Extract the Wisdom of Crowds," Management Science, INFORMS, vol. 61(2), pages 267-280, February.
    4. Jeremy M. Wolfe & Todd S. Horowitz & Naomi M. Kenner, 2005. "Rare items often missed in visual searches," Nature, Nature, vol. 435(7041), pages 439-440, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peker, Cem & Wilkening, Tom, 2025. "Robust recalibration of aggregate probability forecasts using meta-beliefs," International Journal of Forecasting, Elsevier, vol. 41(2), pages 613-630.
    2. Ho, Emily H. & Budescu, David V. & Himmelstein, Mark, 2025. "Measuring probabilistic coherence to identify superior forecasters," International Journal of Forecasting, Elsevier, vol. 41(2), pages 596-612.
    3. Hassoun, Zane & MacKay, Niall & Powell, Ben, 2026. "Kairosis: A method for dynamical probability forecast aggregation informed by Bayesian change-point detection," International Journal of Forecasting, Elsevier, vol. 42(1), pages 112-125.
    4. Marcellin Martinie & Tom Wilkening & Piers D L Howe, 2020. "Using meta-predictions to identify experts in the crowd when past performance is unknown," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-11, April.
    5. Anca M. Hanea & Marissa F. McBride & Mark A. Burgman & Bonnie C. Wintle, 2018. "The Value of Performance Weights and Discussion in Aggregated Expert Judgments," Risk Analysis, John Wiley & Sons, vol. 38(9), pages 1781-1794, September.
    6. Ying Han & David Budescu, 2019. "A universal method for evaluating the quality of aggregators," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 14(4), pages 395-411, July.
    7. Satopää, Ville A. & Salikhov, Marat & Tetlock, Philip E. & Mellers, Barbara, 2023. "Decomposing the effects of crowd-wisdom aggregators: The bias–information–noise (BIN) model," International Journal of Forecasting, Elsevier, vol. 39(1), pages 470-485.
    8. Asa B. Palley & Jack B. Soll, 2019. "Extracting the Wisdom of Crowds When Information Is Shared," Management Science, INFORMS, vol. 67(5), pages 2291-2309, May.
    9. Vitalii Antoshchuk & Volodymyr Filippov & Varvara Kuvaieva, 2021. "Development of methodological support for improving the quality of expert assessment of business processes," Technology audit and production reserves, Socionet;Technology audit and production reserves, vol. 1(4(57)), pages 22-27.
    10. Dan Zhu & Qingwei Wang & John Goddard, 2022. "A new hedging hypothesis regarding prediction interval formation in stock price forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(4), pages 697-717, July.
    11. David R. Mandel & Daniel Irwin, 2021. "Tracking accuracy of strategic intelligence forecasts: Findings from a long‐term Canadian study," Futures & Foresight Science, John Wiley & Sons, vol. 3(3-4), September.
    12. Bernd Frick & Franziska Prockl, 2018. "Information Precision In Online Communities: Player Valuations On Www.Transfermarkt.De," Working Papers Dissertations 37, Paderborn University, Faculty of Business Administration and Economics.
    13. Esther Kaufmann, 2024. "Teachers’ judgment accuracy: A replication check by psychometric meta-analysis," PLOS ONE, Public Library of Science, vol. 19(7), pages 1-18, July.
    14. Benchimol, Jonathan & El-Shagi, Makram & Saadon, Yossi, 2022. "Do expert experience and characteristics affect inflation forecasts?," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 201, pages 205-226.
    15. Jaspersen, Johannes G., 2022. "Convex combinations in judgment aggregation," European Journal of Operational Research, Elsevier, vol. 299(2), pages 780-794.
    16. Brown, Alasdair & Reade, J. James, 2019. "The wisdom of amateur crowds: Evidence from an online community of sports tipsters," European Journal of Operational Research, Elsevier, vol. 272(3), pages 1073-1081.
    17. Pavel Atanasov & Phillip Rescober & Eric Stone & Samuel A. Swift & Emile Servan-Schreiber & Philip Tetlock & Lyle Ungar & Barbara Mellers, 2017. "Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls," Management Science, INFORMS, vol. 63(3), pages 691-706, March.
    18. Patrick Afflerbach & Christopher Dun & Henner Gimpel & Dominik Parak & Johannes Seyfried, 2021. "A Simulation-Based Approach to Understanding the Wisdom of Crowds Phenomenon in Aggregating Expert Judgment," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 63(4), pages 329-348, August.
    19. Hanea, A.M. & McBride, M.F. & Burgman, M.A. & Wintle, B.C. & Fidler, F. & Flander, L. & Twardy, C.R. & Manning, B. & Mascaro, S., 2017. "Investigate Discuss Estimate Aggregate for structured expert judgement," International Journal of Forecasting, Elsevier, vol. 33(1), pages 267-279.
    20. Ville A. Satopää & Robin Pemantle & Lyle H. Ungar, 2016. "Modeling Probability Forecasts via Information Diversity," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1623-1633, October.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2603.11511. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.