Printed from https://ideas.repec.org/p/arx/papers/2403.12108.html

Does AI help humans make better decisions? A methodological framework for experimental evaluation

Author

Listed:
  • Eli Ben-Michael
  • D. James Greiner
  • Melody Huang
  • Kosuke Imai
  • Zhichao Jiang
  • Sooahn Shin

Abstract

The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases, and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions than either a human alone or AI alone. We introduce a new methodological framework that can be used to experimentally answer this question with no additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases, with a human making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that AI recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that AI-alone decisions generally perform worse than human decisions with or without AI assistance. Finally, AI recommendations tend to impose cash bail on non-white arrestees more often than necessary when compared to white arrestees.
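The three-way comparison the abstract describes can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions: the data-generating process and all variable names are hypothetical, and the baseline potential outcome is treated as observed for every case, which the paper's framework does not require (the paper instead works with partially identified quantities).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data for n cases:
#   y0: binary baseline potential outcome (the "correct" label, e.g. whether
#       a negative outcome would occur absent a restrictive decision)
#   z:  randomized treatment, 1 = AI recommendation shown to the human
#   ai: the AI system's own binary recommendation
#   d:  the human's final binary decision
n = 10_000
y0 = rng.binomial(1, 0.3, n)
z = rng.binomial(1, 0.5, n)                          # single-blinded randomization
ai = (rng.random(n) < 0.2 + 0.6 * y0).astype(int)    # AI partially tracks y0
baseline = rng.binomial(1, 0.4, n)                   # judge's unassisted decision
follow = rng.random(n) < 0.7                         # judge often defers when shown AI
d = np.where((z == 1) & follow, ai, baseline)

def accuracy(decision, label):
    """Classification accuracy of decisions against the baseline outcome."""
    return float(np.mean(decision == label))

# Compare the three decision-making systems using the randomized arms.
human_alone = accuracy(d[z == 0], y0[z == 0])
human_with_ai = accuracy(d[z == 1], y0[z == 1])
ai_alone = accuracy(ai, y0)

print(f"human alone:   {human_alone:.3f}")
print(f"human with AI: {human_with_ai:.3f}")
print(f"AI alone:      {ai_alone:.3f}")
```

In the actual design, y0 is only partially observed (e.g., outcomes are not seen for detained arrestees), so the paper derives bounds on these classification metrics rather than computing them directly as above.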

Suggested Citation

  • Eli Ben-Michael & D. James Greiner & Melody Huang & Kosuke Imai & Zhichao Jiang & Sooahn Shin, 2024. "Does AI help humans make better decisions? A methodological framework for experimental evaluation," Papers 2403.12108, arXiv.org.
  • Handle: RePEc:arx:papers:2403.12108
    Download full text from publisher

    File URL: http://arxiv.org/pdf/2403.12108
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joshua Grossman & Julian Nyarko & Sharad Goel, 2023. "Racial bias as a multi‐stage, multi‐actor problem: An analysis of pretrial detention," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 20(1), pages 86-133, March.
    2. Bharti, Nitin Kumar & Roy, Sutanuka, 2023. "The early origins of judicial stringency in bail decisions: Evidence from early childhood exposure to Hindu-Muslim riots in India," Journal of Public Economics, Elsevier, vol. 221(C).
    3. Ivan A. Canay & Magne Mogstad & Jack Mountjoy, 2020. "On the Use of Outcome Tests for Detecting Bias in Decision Making," NBER Working Papers 27802, National Bureau of Economic Research, Inc.
    4. Jens Ludwig & Sendhil Mullainathan, 2021. "Fragile Algorithms and Fallible Decision-Makers: Lessons from the Justice System," Journal of Economic Perspectives, American Economic Association, vol. 35(4), pages 71-96, Fall.
    5. Isil Erel & Léa H Stern & Chenhao Tan & Michael S Weisbach, 2021. "Selecting Directors Using Machine Learning," NBER Chapters, in: Big Data: Long-Term Implications for Financial Markets and Firms, pages 3226-3264, National Bureau of Economic Research, Inc.
    6. Ginther, Donna K. & Heggeness, Misty L., 2020. "Administrative discretion in scientific funding: Evidence from a prestigious postdoctoral training program," Research Policy, Elsevier, vol. 49(4).
    7. Nicolás Grau & Damián Vergara, "undated". "A Simple Test for Prejudice in Decision Processes: The Prediction-Based Outcome Test," Working Papers wp493, University of Chile, Department of Economics.
    8. Xiaochen Hu & Xudong Zhang & Nicholas Lovrich, 2021. "Public perceptions of police behavior during traffic stops: logistic regression and machine learning approaches compared," Journal of Computational Social Science, Springer, vol. 4(1), pages 355-380, May.
    9. Shroff, Ravi & Vamvourellis, Konstantinos, 2022. "Pretrial release judgments and decision fatigue," LSE Research Online Documents on Economics 117579, London School of Economics and Political Science, LSE Library.
    10. Chugunova, Marina & Sele, Daniela, 2022. "We and It: An interdisciplinary review of the experimental evidence on how humans interact with machines," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 99(C).
    11. Stevenson, Megan T. & Doleac, Jennifer, 2019. "Algorithmic Risk Assessment in the Hands of Humans," IZA Discussion Papers 12853, Institute of Labor Economics (IZA).
    12. David Almog & Romain Gauriot & Lionel Page & Daniel Martin, 2024. "AI Oversight and Human Mistakes: Evidence from Centre Court," Papers 2401.16754, arXiv.org, revised Feb 2024.
    13. Danielle Li & Lindsey R. Raymond & Peter Bergman, 2020. "Hiring as Exploration," NBER Working Papers 27736, National Bureau of Economic Research, Inc.
    14. Richard Berk, 2019. "Accuracy and Fairness for Juvenile Justice Risk Assessments," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 16(1), pages 175-194, March.
    15. Bauer, Kevin & Gill, Andrej, 2021. "Mirror, mirror on the wall: Machine predictions and self-fulfilling prophecies," SAFE Working Paper Series 313, Leibniz Institute for Financial Research SAFE.
    16. Runshan Fu & Ginger Zhe Jin & Meng Liu, 2022. "Does Human-algorithm Feedback Loop Lead to Error Propagation? Evidence from Zillow’s Zestimate," NBER Working Papers 29880, National Bureau of Economic Research, Inc.
    17. Fumagalli, Elena & Rezaei, Sarah & Salomons, Anna, 2022. "OK computer: Worker perceptions of algorithmic recruitment," Research Policy, Elsevier, vol. 51(2).
    18. Bauer, Kevin & Pfeuffer, Nicolas & Abdel-Karim, Benjamin M. & Hinz, Oliver & Kosfeld, Michael, 2020. "The terminator of social welfare? The economic consequences of algorithmic discrimination," SAFE Working Paper Series 287, Leibniz Institute for Financial Research SAFE.
    19. Brendan O'Flaherty & Rajiv Sethi & Morgan Williams, 2024. "The nature, detection, and avoidance of harmful discrimination in criminal justice," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 43(1), pages 289-320, January.
    20. Bhattacharya, D. & Shvets, J., 2022. "Inferring the Performance Diversity Trade-Off in University Admissions: Evidence from Cambridge," Cambridge Working Papers in Economics 2238, Faculty of Economics, University of Cambridge.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.