Benchmarking Municipal AI Chatbot Performance: Mixed Methods Insights into Competence, Integrity, and Algorithmic Discrimination in Dutch Public Administration

Author

Listed:

Rudzitis, Ralfs
Weißmüller, Kristina Sabrina
(Vrije Universiteit Amsterdam)

Abstract

This study introduces the Public-sector Chatbot Performance (PCP) framework, a novel and comprehensive approach to systematically assess AI chatbot performance in public administration. The framework evaluates both technical competence—factual accuracy, completeness, and source reliability—and normative integrity, including lawfulness, transparency, equality, and privacy. To demonstrate applicability of the PCP framework, we benchmark the full set of municipal chatbot systems currently deployed in Dutch local governments, alongside two leading proprietary large language models (LLMs): ChatGPT-4o and Gemini 2.5 Pro. Using a pragmatic mixed methods approach, we developed 26 prompts with systematic user-based variation to explore algorithmic bias, resulting in a dataset of n=326 user-chatbot interactions. Quantitative analysis revealed that ChatGPT-4o achieved a composite performance score of 95.7%, significantly outperforming all municipal systems. Municipal chatbots exhibited notable shortcomings in competence and integrity, with some failing to meet basic standards of lawful and equal service provision. Exploratory qualitative analysis further uncovered algorithmic opacity, discretionary advice in violation of Dutch good governance regulations, and discriminatory responses based on “ethnic” usernames. These insights challenge assumptions about neutrality in public sector AI and underscore the need for ethical benchmarks in chatbot evaluation. The PCP framework offers actionable guidance for policymakers, technologists, and scholars committed to responsible digital governance.
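The abstract reports a composite performance score per chatbot but does not state how it is aggregated. As a rough illustration only, the sketch below assumes each interaction is rated on the seven criteria named in the abstract, each on a 0 to 1 scale, and that the composite is an unweighted mean expressed as a percentage. The criterion keys, the scale, the equal weighting, and both function names are assumptions made for this example, not the authors' method.

```python
from statistics import mean

# Criteria named in the abstract; the identifiers and grouping are assumptions.
COMPETENCE = ("accuracy", "completeness", "source_reliability")
INTEGRITY = ("lawfulness", "transparency", "equality", "privacy")


def interaction_score(ratings: dict[str, float]) -> float:
    """Unweighted mean of the criterion ratings (each in [0, 1]) for one interaction."""
    criteria = COMPETENCE + INTEGRITY
    missing = [c for c in criteria if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    return mean(ratings[c] for c in criteria)


def composite_score(interactions: list[dict[str, float]]) -> float:
    """Average interaction score for one chatbot, as a percentage (hypothetical scheme)."""
    if not interactions:
        raise ValueError("no interactions to score")
    return 100 * mean(interaction_score(r) for r in interactions)
```

Under this scheme a chatbot that is rated 1.0 on every criterion in every interaction scores 100%, and any weighting between competence and integrity would have to be added explicitly.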

Suggested Citation

Rudzitis, Ralfs & Weißmüller, Kristina Sabrina, 2025. "Benchmarking Municipal AI Chatbot Performance: Mixed Methods Insights into Competence, Integrity, and Algorithmic Discrimination in Dutch Public Administration," SocArXiv me7pf_v1, Center for Open Science.

Handle: RePEc:osf:socarx:me7pf_v1
DOI: 10.31219/osf.io/me7pf_v1

References listed on IDEAS

Marianne Bertrand & Sendhil Mullainathan, 2004. "Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination," American Economic Review, American Economic Association, vol. 94(4), pages 991-1013, September.
- Marianne Bertrand & Sendhil Mullainathan, 2003. "Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination," NBER Working Papers 9873, National Bureau of Economic Research, Inc.
- Marianne Bertrand & Sendhil Mullainathan, 2003. "Are emily and greg more employable than lakisha and jamal? A field experiment on labor market discrimination," Natural Field Experiments 00216, The Field Experiments Website.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Chowdhury, Shyamal & Ooi, Evarn & Slonim, Robert, 2017. "Racial discrimination and white first name adoption: a field experiment in the Australian labour market," Working Papers 2017-15, University of Sydney, School of Economics.
Mujcic, Redzo & Frijters, Paul, 2013. "Still Not Allowed on the Bus: It Matters If You're Black or White!," IZA Discussion Papers 7300, Institute of Labor Economics (IZA).
Jonas Radbruch & Amelie Schiprowski, 2025. "Interview Sequences and the Formation of Subjective Assessments," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 92(2), pages 1226-1256.
- Jonas Radbruch & Amelie Schiprowski, 2020. "Interview Sequences and the Formation of Subjective Assessments," ECONtribute Discussion Papers Series 045, University of Bonn and University of Cologne, Germany.
- Radbruch, Jonas & Schiprowski, Amelie, 2024. "Interview Sequences and the Formation of Subjective Assessments," CEPR Discussion Papers 18839, C.E.P.R. Discussion Papers.
- Jonas Radbruch & Amelie Schiprowski, 2024. "Interview Sequences and the Formation of Subjective Assessments," Rationality and Competition Discussion Paper Series 497, CRC TRR 190 Rationality and Competition.
- Jonas Radbruch & Amelie Schiprowski, 2024. "Interview Sequences and the Formation of Subjective Assessments," CESifo Working Paper Series 10957, CESifo.
- Jonas Radbruch & Amelie Schiprowski, 2021. "Interview Sequences and the Formation of Subjective Assessments," CRC TR 224 Discussion Paper Series crctr224_2021_268v2, University of Bonn and University of Mannheim, Germany.
- Radbruch, Jonas & Schiprowski, Amelie, 2021. "Interview Sequences and the Formation of Subjective Assessments," IZA Discussion Papers 14799, Institute of Labor Economics (IZA).
Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2019. "Language skills and homophilous hiring discrimination: Evidence from gender and racially differentiated applications," Review of Economics of the Household, Springer, vol. 17(1), pages 349-376, March.
- Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2013. "Language Skills and Homophilous Hiring Discrimination: Evidence from Gender-and Racially-Differentiated Applications," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-00877458, HAL.
- Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2019. "Language skills and homophilous hiring discrimination: Evidence from gender and racially differentiated applications," PSE-Ecole d'économie de Paris (Postprint) halshs-02042942, HAL.
- Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2019. "Language skills and homophilous hiring discrimination: Evidence from gender and racially differentiated applications," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-02042942, HAL.
- Edo, Anthony & Jacquemet, Nicolas & Yannelis, Constantine, 2013. "Language Skills and Homophilous Hiring Discrimination: Evidence from Gender- and Racially-Differentiated Applications," CEPREMAP Working Papers (Docweb) 1313, CEPREMAP.
- Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2013. "Language Skills and Homophilous Hiring Discrimination: Evidence from Gender-and Racially-Differentiated Applications," Working Papers halshs-00877458, HAL.
- Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2013. "Language Skills and Homophilous Hiring Discrimination: Evidence from Gender- and Racially-Differentiated Applications," Documents de travail du Centre d'Economie de la Sorbonne 13058, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
- Anthony Edo & Nicolas Jacquemet & Constantine Yannelis, 2019. "Language skills and homophilous hiring discrimination: Evidence from gender and racially differentiated applications," Post-Print halshs-02042942, HAL.
Kevin Lang & Ariella Kahn-Lang Spitzer, 2020. "Race Discrimination: An Economic Perspective," Journal of Economic Perspectives, American Economic Association, vol. 34(2), pages 68-89, Spring.
- Kevin Lang & Ariella Kahn-Lang Spitzer, "undated". "Race Discrimination: An Economic Perspective," Mathematica Policy Research Reports 7821adedb2e441ef85021895d, Mathematica Policy Research.
Keith Head & Thierry Mayer, 2008. "Detection Of Local Interactions From The Spatial Pattern Of Names In France," Journal of Regional Science, Wiley Blackwell, vol. 48(1), pages 67-95, February.
- Mayer, Thierry & Head, Keith, 2007. "Detection of Local Interactions from the Spatial Pattern of Names in France," CEPR Discussion Papers 6340, C.E.P.R. Discussion Papers.
- Keith Head & Thierry Mayer, 2008. "Detection of local interactions from the spatial pattern of names in France," PSE-Ecole d'économie de Paris (Postprint) halshs-00754305, HAL.
- Keith Head & Thierry Mayer, 2008. "Detection of local interactions from the spatial pattern of names in France," Post-Print hal-00266554, HAL.
- Keith Head & Thierry Mayer, 2008. "Detection of local interactions from the spatial pattern of names in France," SciencePo Working papers Main hal-01022324, HAL.
- Keith Head & Thierry Mayer, 2008. "Detection of local interactions from the spatial pattern of names in France," Post-Print halshs-00754305, HAL.
- Keith Head & Thierry Mayer, 2008. "Detection of local interactions from the spatial pattern of names in France," PSE-Ecole d'économie de Paris (Postprint) hal-00266554, HAL.
- Keith Head & Thierry Mayer, 2008. "Detection of local interactions from the spatial pattern of names in France," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-00266554, HAL.
- Keith Head & Thierry Mayer, 2008. "Detection of local interactions from the spatial pattern of names in France," Post-Print hal-01022324, HAL.
David H Chae & Sean Clouston & Mark L Hatzenbuehler & Michael R Kramer & Hannah L F Cooper & Sacoby M Wilson & Seth I Stephens-Davidowitz & Robert S Gold & Bruce G Link, 2015. "Association between an Internet-Based Measure of Area Racism and Black Mortality," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-12, April.
Elizabeth Hirsh & Hazel Hollingdale & Natasha Stecy-Hildebrandt, 2013. "Gender inequality in the workplace," Chapters, in: Deborah M. Figart & Tonia L. Warnecke (ed.), Handbook of Research on Gender and Economic Life, chapter 12, pages 183-199, Edward Elgar Publishing.
Saugato Datta & Vikram Pathania, 2016. "For whom does the phone (not) ring? Discrimination in the rental housing market in Delhi, India," WIDER Working Paper Series 055, World Institute for Development Economic Research (UNU-WIDER).
Kelley Sarussi & Thomas Walstrum, 2019. "Education and the Evolution of Earnings Across Population Groups Since 2000," Profitwise, Federal Reserve Bank of Chicago, issue 5, pages 1-13.
Ritwik Banerjee & Nabanita Datta Gupta, 2015. "Awareness Programs and Change in Taste-Based Caste Prejudice," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-17, April.
- Banerjee, Ritwik & Datta Gupta, Nabanita, 2014. "Awareness Programs and Change in Taste-based Caste Prejudice," IZA Discussion Papers 8446, Institute of Labor Economics (IZA).
- Ritwik Banerjee & Nabanita Datta Gupta, 2014. "Awareness programs and change in taste-based caste prejudice," Economics Working Papers 2014-19, Department of Economics and Business Economics, Aarhus University.
Bertrand, Jérémie & Burietz, Aurore, 2023. "(Loan) price and (loan officer) prejudice," Journal of Economic Behavior & Organization, Elsevier, vol. 210(C), pages 26-42.
- Jérémie Bertrand & Aurore Burietz, 2023. "(Loan) price and (loan officer) prejudice," Post-Print hal-04130884, HAL.
Jan Hanousek & Štěpán Jurajda, 2018. "Názvy společností a jejich vliv na výkonnost firem [Corporate Names and Performance]," Politická ekonomie, Prague University of Economics and Business, vol. 2018(6), pages 671-688.
Bryson, Alex & Chevalier, Arnaud, 2015. "Is there a taste for racial discrimination amongst employers?," Labour Economics, Elsevier, vol. 34(C), pages 51-63.
Bethany Everett & David Rehkopf & Richard Rogers, 2013. "The Nonlinear Relationship Between Education and Mortality: An Examination of Cohort, Race/Ethnic, and Gender Differences," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 32(6), pages 893-917, December.
Ola Bengtsson & John R. M. Hand, 2013. "Employee Compensation in Entrepreneurial Companies," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 22(2), pages 312-340, June.
- Bengtsson, Ola & Hand, John R. M., 2012. "Employee Compensation in Entrepreneurial Companies," Working Paper Series 922, Research Institute of Industrial Economics.
John M. Nunley & Adam Pugh & Nicholas Romero & Richard Alan Seals, Jr., 2014. "Unemployment, Underemployment, and Employment Opportunities: Results from a Correspondence Audit of the Labor Market for College Graduates," Auburn Economics Working Paper Series auwp2014-04, Department of Economics, Auburn University.
Henning Hermes & Philipp Lergetporer & Fabian Mierisch & Frauke Peter & Simon Wiederhold, 2023. "Discrimination on the Child Care Market: A Nationwide Field Experiment," Working Papers 225, Bavarian Graduate Program in Economics (BGPE).
- Hermes, Henning & Lergetporer, Philipp & Mierisch, Fabian & Peter, Frauke, 2023. "Discrimination on the child care market: A nationwide field experiment," DICE Discussion Papers 398, Heinrich Heine University Düsseldorf, Düsseldorf Institute for Competition Economics (DICE).
- Hermes, Henning & Mierisch, Fabian & Peter, Frauke & Wiederhold, Simon & Lergetporer, Philipp, 2023. "Discrimination on the Child Care Market: A Nationwide Field Experiment," IZA Discussion Papers 16082, Institute of Labor Economics (IZA).
Li, Xilin & Hsee, Christopher K., 2021. "Free-riding and cost-bearing in discrimination," Organizational Behavior and Human Decision Processes, Elsevier, vol. 163(C), pages 80-90.
Button, Patrick & Walker, Brigham, 2020. "Employment discrimination against Indigenous Peoples in the United States: Evidence from a field experiment," Labour Economics, Elsevier, vol. 65(C).
- Button, Patrick & Walker, Brigham, 2019. "Employment Discrimination against Indigenous Peoples in the United States: Evidence from a Field Experiment," IZA Discussion Papers 12131, Institute of Labor Economics (IZA).
- Patrick Button & Brigham Walker, 2019. "Employment Discrimination against Indigenous Peoples in the United States: Evidence from a Field Experiment," NBER Working Papers 25849, National Bureau of Economic Research, Inc.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-EUR-2025-09-29 (Microeconomic European Issues)
