Prompt-Response Semantic Divergence Metrics for Faithfulness Hallucination and Misalignment Detection in Large Language Models

Author

Listed:

Igor Halperin

Abstract

The proliferation of Large Language Models (LLMs) is challenged by hallucinations, critical failure modes in which models generate non-factual, nonsensical, or unfaithful text. This paper introduces Semantic Divergence Metrics (SDM), a novel lightweight framework for detecting Faithfulness Hallucinations, that is, severe deviations of LLM responses from their input contexts. We focus on a specific type of these LLM errors, confabulations, defined as responses that are arbitrary and semantically misaligned with the user's query. Existing methods like Semantic Entropy test for arbitrariness by measuring the diversity of answers to a single, fixed prompt. Our SDM framework improves upon this by being more prompt-aware: we test for a deeper form of arbitrariness by measuring response consistency not only across multiple answers but also across multiple, semantically equivalent paraphrases of the original prompt. Methodologically, our approach uses joint clustering on sentence embeddings to create a shared topic space for prompts and answers. A heatmap of topic co-occurrences between prompts and responses can be viewed as a quantified two-dimensional visualization of the user-machine dialogue. We then compute a suite of information-theoretic metrics to measure the semantic divergence between prompts and responses. Our practical score, $\mathcal{S}_H$, combines the Jensen-Shannon divergence and the Wasserstein distance to quantify this divergence, with a high score indicating a Faithfulness Hallucination. Furthermore, we identify the KL divergence $\mathrm{KL}(\text{Answer} \,\|\, \text{Prompt})$ as a powerful indicator of Semantic Exploration, a key signal for distinguishing different generative behaviors. These metrics are further combined into the Semantic Box, a diagnostic framework for classifying LLM response types, including the dangerous, confident confabulation.
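The divergence measures named in the abstract can be illustrated with a minimal Python sketch. It assumes that prompt and answer sentences have already been assigned to clusters in a shared topic space, so each side is just a list of integer topic labels. Everything else here is an assumption for illustration and is not taken from the paper: the function names, the equal weights `w_jsd` and `w_wass`, and the use of topic indices as positions on a line for the Wasserstein distance (cluster labels have no natural order, and a real implementation might instead compute the distance on embeddings). The abstract states only that $\mathcal{S}_H$ combines the two quantities, so the weighted sum below is a stand-in, not the paper's formula.

```python
import math
from collections import Counter


def topic_distribution(labels, num_topics):
    """Turn a list of topic labels (ints in [0, num_topics)) into probabilities."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(num_topics)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats. q is floored at eps so topics absent from q stay finite."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in nats (symmetric, bounded by ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def wasserstein_1d(p, q):
    """1-Wasserstein distance between histograms on unit-spaced bins.

    Assumption: topic index k sits at position k on a line (toy ground metric).
    On a line, W1 equals the summed absolute difference of the two CDFs.
    """
    gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        gap += pi - qi
        total += abs(gap)
    return total


def divergence_report(prompt_labels, answer_labels, num_topics,
                      w_jsd=0.5, w_wass=0.5):
    """Compute the divergences plus a hypothetical weighted combination."""
    p = topic_distribution(prompt_labels, num_topics)
    a = topic_distribution(answer_labels, num_topics)
    jsd = js_divergence(p, a)
    wass = wasserstein_1d(p, a)
    return {
        "jsd": jsd,
        "wasserstein": wass,
        "kl_answer_prompt": kl_divergence(a, p),  # KL(Answer || Prompt)
        "score": w_jsd * jsd + w_wass * wass,     # illustrative stand-in only
    }


# Prompts cover topics 0 and 1; answers drift to topics 1 and 2.
report = divergence_report([0, 0, 1, 1], [1, 1, 2, 2], num_topics=3)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

In this toy input the answers place half of their mass on a topic the prompts never touch, so `kl_answer_prompt` becomes large (it is capped only by the `eps` floor), which matches the abstract's reading of KL(Answer || Prompt) as a signal of semantic exploration. If SciPy is available, `scipy.stats.wasserstein_distance` and `scipy.spatial.distance.jensenshannon` could replace the hand-written versions; note that the latter returns the square root of the divergence.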

Suggested Citation

Igor Halperin, 2025. "Prompt-Response Semantic Divergence Metrics for Faithfulness Hallucination and Misalignment Detection in Large Language Models," Papers 2508.10192, arXiv.org.

Handle: RePEc:arx:papers:2508.10192

References listed on IDEAS

Sebastian Farquhar & Jannik Kossen & Lorenz Kuhn & Yarin Gal, 2024. "Detecting hallucinations in large language models using semantic entropy," Nature, Nature, vol. 630(8017), pages 625-630, June.

Citations

Cited by:

Igor Halperin, 2025. "Topic Identification in LLM Input-Output Pairs through the Lens of Information Bottleneck," Papers 2509.03533, arXiv.org.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Xiaoyi Luo & Aiwen Wang & Xinling Zhang & Kunda Huang & Songyu Wang & Lixin Chen & Yejia Cui, 2025. "Toward Intelligent AIoT: A Comprehensive Survey on Digital Twin and Multimodal Generative AI Integration," Mathematics, MDPI, vol. 13(21), pages 1-44, October.
Abramson, Corey & Li, Zhuofan & Prendergast, Tara & Dohan, Daniel, 2025. "Qualitative Research in an Era of AI: A Pragmatic Approach to Data Analysis, Workflow, and Computation," SocArXiv 7bsgy_v1, Center for Open Science.
Yusong Ke & Hongru Lin & Yuting Ruan & Junya Tang & Li Li, 2025. "Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework," Mathematics, MDPI, vol. 13(9), pages 1-17, May.
Sheng Wang & Fangyuan Zhao & Dechao Bu & Yunwei Lu & Ming Gong & Hongjie Liu & Zhaohui Yang & Xiaoxi Zeng & Zhiyuan Yuan & Baoping Wan & Jingbo Sun & Yang Wu & Lianhe Zhao & Xirun Wan & Wei Huang & Ta, 2025. "LINS: A general medical Q&A framework for enhancing the quality and credibility of LLM-generated responses," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
Francesco Carli & Pierluigi Chiaro & Mariangela Morelli & Chakit Arora & Luisa Bisceglia & Natalia Oliveira Rosa & Alice Cortesi & Sara Franceschi & Francesca Lessi & Anna Luisa Stefano & Orazio Santo, 2025. "Learning and actioning general principles of cancer cell drug sensitivity," Nature Communications, Nature, vol. 16(1), pages 1-23, December.
Li, Butong & Zhu, Junjie & Zhao, Xufeng, 2025. "A prior knowledge-guided predictive framework for LCF life and its implementation in shaft-like components under multiaxial loading," Reliability Engineering and System Safety, Elsevier, vol. 260(C).
Gemma Turon & Mwila Mulubwa & Anna Montaner & Mathew Njoroge & Kelly Chibale & Miquel Duran-Frigola, 2025. "Artificial intelligence coupled to pharmacometrics modelling to tailor malaria and tuberculosis treatment in Africa," Nature Communications, Nature, vol. 16(1), pages 1-12, December.
Zhou, Zhen & Gu, Ziyuan & Qu, Xiaobo & Liu, Pan & Liu, Zhiyuan & Yu, Wenwu, 2024. "Urban mobility foundation model: A literature review and hierarchical perspective," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 192(C).
Xiaohan Lin & Yijie Xia & Yanheng Li & Yu-Peng Huang & Shuo Liu & Jun Zhang & Yi Qin Gao, 2025. "In-silico 3D molecular editing through physics-informed and preference-aligned generative foundation models," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
Sanchaita Hazra & Marta Serra-Garcia, 2025. "Understanding Trust in AI as an Information Source: Cross-Country Evidence," CESifo Working Paper Series 11954, CESifo.
Beining Xu & Yongming Lu, 2025. "TECP: Token-Entropy Conformal Prediction for LLMs," Mathematics, MDPI, vol. 13(20), pages 1-14, October.
Igor Halperin, 2025. "Topic Identification in LLM Input-Output Pairs through the Lens of Information Bottleneck," Papers 2509.03533, arXiv.org.
Peng Zhang & Jiayu Shi & Maged N. Kamel Boulos, 2024. "Generative AI in Medicine and Healthcare: Moving Beyond the ‘Peak of Inflated Expectations’," Future Internet, MDPI, vol. 16(12), pages 1-21, December.
Hui-Hung Yu & Wei-Tsun Lin & Chih-Wei Kuan & Chao-Chi Yang & Kuan-Min Liao, 2025. "GraphRAG-Enhanced Dialogue Engine for Domain-Specific Question Answering: A Case Study on the Civil IoT Taiwan Platform," Future Internet, MDPI, vol. 17(9), pages 1-22, September.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2025-09-01 (Big Data)
NEP-CMP-2025-09-01 (Computational Economics)
NEP-INV-2025-09-01 (Investment)
