Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions

Authors

Jagdish Tripathy
Marcus Buckmann

Abstract

Instruction-tuned language models exhibit behavioural fairness in high-stakes decisions while retaining biased associations in their internal representations. However, whether these suppressed representations can affect model outputs, and whether such causal potency is symmetric across demographic groups, remains unknown. We study open-weight models applied to mortgage underwriting, using matched applications that differ only in racially associated names, and find a critical disconnect: the models show no output-level bias, yet they retain and amplify demographic representations across model layers. Through activation steering and novel cross-layer interventions, we demonstrate that this suppressed information is decision-relevant: when reinjected at critical layers, it produces near-complete decision reversals. Critically, this latent bias is asymmetric (steering interventions affect decisions in one demographic direction while producing minimal effects in the reverse direction) and is susceptible to adversarial prompt engineering and parameter-efficient fine-tuning. These findings demonstrate that behavioural audits focused on outputs are insufficient: fair outputs can mask exploitable internal biases. They also motivate dual-layer testing frameworks that combine output evaluation with representational analysis for AI governance in high-stakes decisions.
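
To make the idea of activation steering concrete, the following is a minimal toy sketch in NumPy. It is not the authors' code, model, or data: the hidden dimension, the `readout` weights, the group offsets, and the steering strength `alpha` are all hypothetical values chosen for illustration. It assumes a common formulation in which a steering direction is the difference of mean activations between two matched groups, and it shows how outputs can be identical across groups at baseline while an added activation along that direction reverses decisions.

```python
import numpy as np

def steering_vector(acts_a, acts_b):
    """Difference of mean activations between two groups (arrays of shape [n, d])."""
    return acts_a.mean(axis=0) - acts_b.mean(axis=0)

def steer(hidden, direction, alpha):
    """Add alpha times the unit-norm direction to every hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

def decide(hidden, readout):
    """Toy linear decision head: 1 = approve, 0 = deny."""
    return (hidden @ readout > 0).astype(int)

# Hypothetical setup: axis 0 encodes the demographic signal, axis 1 encodes merit.
rng = np.random.default_rng(0)
d = 8
group_axis = np.zeros(d)
group_axis[0] = 1.0
merit_axis = np.zeros(d)
merit_axis[1] = 1.0

# Matched applications: identical merit and noise, differing only in the group signal.
base = merit_axis + 0.01 * rng.normal(size=(50, d))
acts_a = base + 0.5 * group_axis
acts_b = base - 0.5 * group_axis

# The toy readout leans mostly on merit but retains a small weight on the group axis.
readout = merit_axis + 0.2 * group_axis

baseline_a = decide(acts_a, readout)
baseline_b = decide(acts_b, readout)

# Steer group B further along the negative group direction (alpha is an assumption).
vec = steering_vector(acts_a, acts_b)
steered_b = decide(steer(acts_b, vec, alpha=-10.0), readout)

print("approval rate, group A baseline:", baseline_a.mean())
print("approval rate, group B baseline:", baseline_b.mean())
print("approval rate, group B steered: ", steered_b.mean())
```

In this toy setting both groups are approved at the same rate before the intervention, so an output-only audit would find no disparity, yet the small residual weight on the group axis is enough for a large steering step to flip the decisions. A real experiment would apply the same arithmetic to a transformer's residual stream at a chosen layer rather than to synthetic vectors.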

Suggested Citation

Jagdish Tripathy & Marcus Buckmann, 2026. "Fair outputs, Biased Internals: Causal Potency and Asymmetry of Latent Bias in LLMs for High-Stakes Decisions," Papers 2605.15217, arXiv.org.

Handle: RePEc:arx:papers:2605.15217

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2026-05-25 (Artificial Intelligence)
