Printed from https://ideas.repec.org/p/ajk/ajkdps/384.html

When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases?

Authors

  • Rainer Michael Rilke

    (WHU - Otto Beisheim School of Management)

  • Dirk Sliwka

    (University of Cologne)

Abstract

A large body of research across management, psychology, accounting, and economics shows that subjective performance evaluations are systematically biased: ratings cluster near the midpoint of scales and are often excessively lenient. As organizations increasingly adopt large language models (LLMs) for evaluative tasks, little is known about how these systems perform when assessing human performance. We document that, in the absence of clear objective standards and when individuals are rated independently, LLMs reproduce the familiar patterns of human raters. However, LLMs generate greater dispersion and accuracy when evaluating multiple individuals simultaneously. With noisy but objective performance signals, LLMs provide substantially more accurate evaluations than human raters, as they (i) are less subject to biases arising from concern for the evaluated employee and (ii) make fewer mistakes in information processing, closely approximating rational Bayesian benchmarks.
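To make the idea of a "rational Bayesian benchmark" for a noisy performance signal concrete, the following is a minimal sketch under a textbook normal-normal model. It is not the paper's experimental design: the prior, the noise variance, the numbers, and the function name `posterior_mean` are all assumptions chosen purely for illustration.

```python
def posterior_mean(prior_mean: float, prior_var: float,
                   signal: float, noise_var: float) -> float:
    """Posterior mean of true performance after observing one noisy signal.

    Hypothetical model (an assumption, not taken from the paper):
      performance ~ Normal(prior_mean, prior_var)
      signal      = performance + noise, noise ~ Normal(0, noise_var)
    """
    if prior_var <= 0 or noise_var <= 0:
        raise ValueError("variances must be positive")
    # Weight on the signal: larger when the prior is diffuse or the signal is precise.
    weight = prior_var / (prior_var + noise_var)
    # Shrink the signal toward the prior mean in proportion to its noise.
    return prior_mean + weight * (signal - prior_mean)


# Illustrative numbers only: prior centered at 50, signal of 80, equal variances.
benchmark = posterior_mean(prior_mean=50.0, prior_var=100.0,
                           signal=80.0, noise_var=100.0)
print(benchmark)  # 65.0
```

In this toy setting, a Bayesian rater moves only part of the way from the prior toward the observed signal. A rating that ignores the signal's noise, or that is pulled upward out of concern for the employee, would deviate from this benchmark.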

Suggested Citation

  • Rainer Michael Rilke & Dirk Sliwka, 2026. "When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases?," ECONtribute Discussion Papers Series 384, University of Bonn and University of Cologne, Germany.
  • Handle: RePEc:ajk:ajkdps:384

    Download full text from publisher

    File URL: https://www.econtribute.de/RePEc/ajk/ajkdps/ECONtribute_384_2026.pdf
    File Function: First version, 2026
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Patrick Kampkötter & Dirk Sliwka, 2016. "The Complementary Use of Experiments and Field Data to Evaluate Management Practices: The Case of Subjective Performance Evaluations," Journal of Institutional and Theoretical Economics (JITE), Mohr Siebeck, Tübingen, vol. 172(2), pages 364-389, June.
    2. Axel Ockenfels & Dirk Sliwka & Peter Werner, 2025. "Multirater Performance Evaluations and Incentives," Journal of Labor Economics, University of Chicago Press, vol. 43(4), pages 985-1004.
    3. Werner, Peter, 2024. "On common evaluation standards and the acceptance of wage inequality," Games and Economic Behavior, Elsevier, vol. 145(C), pages 137-156.
    4. Frederiksen, Anders & Lange, Fabian & Kriechel, Ben, 2017. "Subjective performance evaluations and employee careers," Journal of Economic Behavior & Organization, Elsevier, vol. 134(C), pages 408-429.
    5. Irene Trapp & Rouven Trapp, 2019. "The psychological effects of centrality bias: an experimental analysis," Journal of Business Economics, Springer, vol. 89(2), pages 155-189, March.
    6. Kusterer, David & Sliwka, Dirk, 2022. "Social Preferences and Rating Biases in Subjective Performance Evaluations," IZA Discussion Papers 15496, IZA Network @ LISER.
    7. Bol, Jasmijn C. & Kramer, Stephan & Maas, Victor S., 2016. "How control system design affects performance evaluation compression: The role of information accuracy and outcome transparency," Accounting, Organizations and Society, Elsevier, vol. 51(C), pages 64-73.
    8. Kusterer, David & Bolton, Gary & Mans, Johannes, 2016. "Inflated Reputations: Uncertainty, Leniency & Moral Wiggle Room in Trader Feedback Systems," VfS Annual Conference 2016 (Augsburg): Demographic Change 145794, Verein für Socialpolitik / German Economic Association.
    9. Thuy-Van Tran & Sinikka Lepistö & Janne Järvinen, 2021. "The relationship between subjectivity in managerial performance evaluation and the three dimensions of justice perception," Journal of Management Control: Zeitschrift für Planung und Unternehmenssteuerung, Springer, vol. 32(3), pages 369-399, September.
    10. Marco Kleine & Sebastian Kube, 2015. "Communication and Trust in Principal-Team Relationships: Experimental Evidence," Discussion Paper Series of the Max Planck Institute for Behavioral Economics 2015_06, Max Planck Institute for Behavioral Economics.
    11. Enzo Brox & Michael Lechner, 2024. "Teamwork and Spillover Effects in Performance Evaluations," Papers 2403.15200, arXiv.org.
    12. Aniek Wijayanti & Mahfud Sholihin & Ertambang Nahartyo & Supriyadi, 2025. "What do we know about the forced distribution system: a systematic literature review and opportunities for future research," Management Review Quarterly, Springer, vol. 75(1), pages 747-788, February.
    13. Juho Jokinen & Jaakko Pehkonen, 2017. "Promotions and Earnings – Gender or Merit? Evidence from Longitudinal Personnel Data," Journal of Labor Research, Springer, vol. 38(3), pages 306-334, September.
    14. Fochmann, Martin & Sachs, Florian & Weimann, Joachim, 2019. "Managing wages: Fairness norms of low- and high-performing team members," arqus Discussion Papers in Quantitative Tax Research 238, arqus - Arbeitskreis Quantitative Steuerlehre.
    15. Li, Yelin & Reichert, Bernhard E. & Woods, Alex, 2024. "The interactive effects of performance evaluation leniency and performance measurement precision on employee effort and performance," Advances in Accounting, Elsevier, vol. 64(C).
    16. Fumagalli, Elena & Rezaei, Sarah & Salomons, Anna, 2022. "OK computer: Worker perceptions of algorithmic recruitment," Research Policy, Elsevier, vol. 51(2).
    17. Kathrin Manthei & Dirk Sliwka, 2019. "Multitasking and Subjective Performance Evaluations: Theory and Evidence from a Field Experiment in a Bank," Management Science, INFORMS, vol. 65(12), pages 5861-5883, December.
    18. Eddy Cardinaels & Christoph Feichter, 2021. "Forced Rating Systems from Employee and Supervisor Perspectives," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 59(5), pages 1573-1607, December.
    19. Marchegiani, Lucia & Reggiani, Tommaso & Rizzolli, Matteo, 2016. "Loss averse agents and lenient supervisors in performance appraisal," Journal of Economic Behavior & Organization, Elsevier, vol. 131(PA), pages 183-197.
    20. Gary E. Bolton & David J. Kusterer & Johannes Mans, 2019. "Inflated Reputations: Uncertainty, Leniency, and Moral Wiggle Room in Trader Feedback Systems," Management Science, INFORMS, vol. 65(11), pages 5371-5391, November.

    More about this item

    JEL classification:

    • J24 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Human Capital; Skills; Occupational Choice; Labor Productivity
    • J28 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Safety; Job Satisfaction; Related Public Policy
    • M12 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - Personnel Management; Executives; Executive Compensation
    • M53 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Personnel Economics - - - Training
