IDEAS (RePEc) — https://ideas.repec.org/p/arx/papers/2603.11511.html

Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment

Authors
  • Gunnar P. Epping
  • Andrew Caplin
  • Erik Duhaime
  • William R. Holmes
  • Daniel Martin
  • Jennifer S. Trueblood

Abstract

Many operational AI systems depend on large-scale human annotation to detect rare but consequential events (e.g., fraud, defects, and medical abnormalities). When positives are rare, the prevalence effect induces systematic cognitive biases that inflate misses and can propagate through the AI lifecycle via biased training labels. We analyze prior experimental evidence and run a field experiment on DiagnosUs, a medical crowdsourcing platform, in which we hold the true prevalence in the unlabeled stream fixed (20% blasts) while varying (i) the prevalence of positives in the gold-standard feedback stream (20% vs. 50%) and (ii) the response interface (binary labels vs. elicited probabilities). We then post-process probabilistic labels using a linear-in-log-odds recalibration approach at the worker and crowd levels, and train convolutional neural networks on the resulting labels. Balanced feedback and probabilistic elicitation reduce rare-event misses, and pipeline-level recalibration substantially improves both classification performance and probabilistic calibration; these gains carry through to downstream CNN reliability out of sample.
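The linear-in-log-odds (LLO) recalibration mentioned in the abstract is a standard two-parameter family (used, e.g., by Han & Budescu, cited below): the recalibrated probability's log-odds are an affine function of the reported probability's log-odds. As a minimal sketch only, assuming gold-standard items with known labels are available for fitting; the function names (`llo`, `fit_llo`) and the Brier-score fitting objective are illustrative, not the paper's exact procedure:

```python
import numpy as np
from scipy.optimize import minimize

def llo(p, delta, gamma):
    """Linear-in-log-odds transform: the output's log-odds equal
    gamma * log-odds(p) + log(delta)."""
    p = np.clip(p, 1e-6, 1 - 1e-6)  # guard against 0/1 inputs
    num = delta * p**gamma
    return num / (num + (1 - p)**gamma)

def fit_llo(probs, labels):
    """Fit (delta, gamma) by minimizing the Brier score on
    gold-standard items whose true labels are known."""
    def brier(theta):
        delta, gamma = np.exp(theta)  # exponentiate to keep both positive
        return np.mean((llo(probs, delta, gamma) - labels) ** 2)
    res = minimize(brier, x0=[0.0, 0.0], method="Nelder-Mead")
    return tuple(np.exp(res.x))
```

In a pipeline like the one described, the parameters would be fit on the gold-standard feedback stream (at the worker or crowd level) and then applied to elicited probabilities on the unlabeled stream before training; with delta = gamma = 1 the transform is the identity, so the fit can only help in sample.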

Suggested Citation

  • Gunnar P. Epping & Andrew Caplin & Erik Duhaime & William R. Holmes & Daniel Martin & Jennifer S. Trueblood, 2026. "Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment," Papers 2603.11511, arXiv.org.
  • Handle: RePEc:arx:papers:2603.11511
    Download full text from publisher

    File URL: http://arxiv.org/pdf/2603.11511
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Jonathan Baron & Barbara A. Mellers & Philip E. Tetlock & Eric Stone & Lyle H. Ungar, 2014. "Two Reasons to Make Aggregated Probability Forecasts More Extreme," Decision Analysis, INFORMS, vol. 11(2), pages 133-145, June.
    2. Ying Han & David V. Budescu, 2022. "Recalibrating probabilistic forecasts to improve their accuracy," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 17(1), pages 91-123, January.
    3. David V. Budescu & Eva Chen, 2015. "Identifying Expertise to Extract the Wisdom of Crowds," Management Science, INFORMS, vol. 61(2), pages 267-280, February.
    4. Jeremy M. Wolfe & Todd S. Horowitz & Naomi M. Kenner, 2005. "Rare items often missed in visual searches," Nature, Nature, vol. 435(7041), pages 439-440, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peker, Cem & Wilkening, Tom, 2025. "Robust recalibration of aggregate probability forecasts using meta-beliefs," International Journal of Forecasting, Elsevier, vol. 41(2), pages 613-630.
    2. Ho, Emily H. & Budescu, David V. & Himmelstein, Mark, 2025. "Measuring probabilistic coherence to identify superior forecasters," International Journal of Forecasting, Elsevier, vol. 41(2), pages 596-612.
    3. Hassoun, Zane & MacKay, Niall & Powell, Ben, 2026. "Kairosis: A method for dynamical probability forecast aggregation informed by Bayesian change-point detection," International Journal of Forecasting, Elsevier, vol. 42(1), pages 112-125.
    4. Marcellin Martinie & Tom Wilkening & Piers D L Howe, 2020. "Using meta-predictions to identify experts in the crowd when past performance is unknown," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-11, April.
    5. Anca M. Hanea & Marissa F. McBride & Mark A. Burgman & Bonnie C. Wintle, 2018. "The Value of Performance Weights and Discussion in Aggregated Expert Judgments," Risk Analysis, John Wiley & Sons, vol. 38(9), pages 1781-1794, September.
    6. Ying Han & David Budescu, 2019. "A universal method for evaluating the quality of aggregators," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 14(4), pages 395-411, July.
    7. Satopää, Ville A. & Salikhov, Marat & Tetlock, Philip E. & Mellers, Barbara, 2023. "Decomposing the effects of crowd-wisdom aggregators: The bias–information–noise (BIN) model," International Journal of Forecasting, Elsevier, vol. 39(1), pages 470-485.
    8. Asa B. Palley & Jack B. Soll, 2019. "Extracting the Wisdom of Crowds When Information Is Shared," Management Science, INFORMS, vol. 67(5), pages 2291-2309, May.
    9. Vitalii Antoshchuk & Volodymyr Filippov & Varvara Kuvaieva, 2021. "Development of methodological support for improving the quality of expert assessment of business processes," Technology audit and production reserves, Socionet;Technology audit and production reserves, vol. 1(4(57)), pages 22-27.
    10. Dan Zhu & Qingwei Wang & John Goddard, 2022. "A new hedging hypothesis regarding prediction interval formation in stock price forecasting," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(4), pages 697-717, July.
    11. David R. Mandel & Daniel Irwin, 2021. "Tracking accuracy of strategic intelligence forecasts: Findings from a long‐term Canadian study," Futures & Foresight Science, John Wiley & Sons, vol. 3(3-4), September.
    12. Bernd Frick & Franziska Prockl, 2018. "Information Precision In Online Communities: Player Valuations On Www.Transfermarkt.De," Working Papers Dissertations 37, Paderborn University, Faculty of Business Administration and Economics.
    13. Esther Kaufmann, 2024. "Teachers’ judgment accuracy: A replication check by psychometric meta-analysis," PLOS ONE, Public Library of Science, vol. 19(7), pages 1-18, July.
    14. Benchimol, Jonathan & El-Shagi, Makram & Saadon, Yossi, 2022. "Do expert experience and characteristics affect inflation forecasts?," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 201, pages 205-226.
    15. Jaspersen, Johannes G., 2022. "Convex combinations in judgment aggregation," European Journal of Operational Research, Elsevier, vol. 299(2), pages 780-794.
    16. Brown, Alasdair & Reade, J. James, 2019. "The wisdom of amateur crowds: Evidence from an online community of sports tipsters," European Journal of Operational Research, Elsevier, vol. 272(3), pages 1073-1081.
    17. Pavel Atanasov & Phillip Rescober & Eric Stone & Samuel A. Swift & Emile Servan-Schreiber & Philip Tetlock & Lyle Ungar & Barbara Mellers, 2017. "Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls," Management Science, INFORMS, vol. 63(3), pages 691-706, March.
    18. Patrick Afflerbach & Christopher Dun & Henner Gimpel & Dominik Parak & Johannes Seyfried, 2021. "A Simulation-Based Approach to Understanding the Wisdom of Crowds Phenomenon in Aggregating Expert Judgment," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 63(4), pages 329-348, August.
    19. Hanea, A.M. & McBride, M.F. & Burgman, M.A. & Wintle, B.C. & Fidler, F. & Flander, L. & Twardy, C.R. & Manning, B. & Mascaro, S., 2017. "Investigate Discuss Estimate Aggregate for structured expert judgement," International Journal of Forecasting, Elsevier, vol. 33(1), pages 267-279.
    20. Ville A. Satopää & Robin Pemantle & Lyle H. Ungar, 2016. "Modeling Probability Forecasts via Information Diversity," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1623-1633, October.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2603.11511. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.