When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases?

Author

Listed:

Rainer Michael Rilke
(WHU - Otto Beisheim School of Management)
Dirk Sliwka
(University of Cologne)

Abstract

A large body of research across management, psychology, accounting, and economics shows that subjective performance evaluations are systematically biased: ratings cluster near the midpoint of scales and are often excessively lenient. As organizations increasingly adopt large language models (LLMs) for evaluative tasks, little is known about how these systems perform when assessing human performance. We document that, in the absence of clear objective standards and when individuals are rated independently, LLMs reproduce the familiar patterns of human raters. However, LLMs generate greater dispersion and accuracy when evaluating multiple individuals simultaneously. With noisy but objective performance signals, LLMs provide substantially more accurate evaluations than human raters, as they (i) are less subject to biases arising from concern for the evaluated employee and (ii) make fewer mistakes in information processing, closely approximating rational Bayesian benchmarks.

Suggested Citation

Rainer Michael Rilke & Dirk Sliwka, 2026. "When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases?," ECONtribute Discussion Papers Series 384, University of Bonn and University of Cologne, Germany.

Handle: RePEc:ajk:ajkdps:384

More about this item

JEL classification:

J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
J28 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Safety; Job Satisfaction; Related Public Policy
M12 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - Personnel Management; Executives; Executive Compensation
M53 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Personnel Economics - - - Training

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2026-01-26 (Artificial Intelligence)
NEP-BIG-2026-01-26 (Big Data)
NEP-CBE-2026-01-26 (Cognitive and Behavioural Economics)
NEP-CMP-2026-01-26 (Computational Economics)
NEP-HRM-2026-01-26 (Human Capital and Human Resource Management)
NEP-LMA-2026-01-26 (Labor Markets - Supply, Demand, and Wages)
