
Off-the-Shelf Large Language Models for Causality Assessment of Individual Case Safety Reports: A Proof-of-Concept with COVID-19 Vaccines

Authors

  • Andrea Abate (University of Copenhagen; University of Messina)
  • Elisa Poncato (University of Copenhagen)
  • Maria Antonietta Barbieri (University of Copenhagen; University of Messina)
  • Greg Powell (GSK)
  • Andrea Rossi (University of Milan)
  • Simay Peker (University of Copenhagen)
  • Anders Hviid (University of Copenhagen; Statens Serum Institute)
  • Andrew Bate (GSK; London School of Hygiene and Tropical Medicine)
  • Maurizio Sessa (University of Copenhagen)
Abstract

Background: This study evaluated the feasibility of using ChatGPT and Gemini, two off-the-shelf large language models (LLMs), to automate causality assessments, focusing on the adverse events following immunization (AEFIs) myocarditis and pericarditis related to COVID-19 vaccines.

Methods: We assessed 150 cases of myocarditis and pericarditis related to COVID-19 vaccines reported to the Vaccine Adverse Event Reporting System (VAERS) in the United States of America (USA). Both LLMs and human experts applied the World Health Organization (WHO) algorithm for vaccine causality assessment, and inter-rater agreement was measured as percentage agreement. Adherence to the WHO algorithm was evaluated by comparing LLM responses to the expected sequence of steps in the algorithm. Statistical analyses, including descriptive statistics and random forest modeling, explored case complexity (e.g., string length measurements) and factors affecting LLM performance and adherence.

Results: ChatGPT showed higher adherence to the WHO algorithm (34%) than Gemini (7%) and moderate agreement (71%) with human experts, whereas Gemini showed fair agreement (53%). Both LLMs often failed to recognize listed AEFIs, with ChatGPT and Gemini incorrectly identifying 6.7% and 13.3% of AEFIs, respectively. ChatGPT showed inconsistencies in 8.0% of cases and Gemini in 46.7%. For ChatGPT, adherence to the algorithm was associated with lower string complexity in prompt sections. The random forest analysis achieved an accuracy of 55% (95% confidence interval 35.7–73.5%) for predicting ChatGPT's adherence to the WHO algorithm.

Conclusion: Notable limitations were identified in using ChatGPT and Gemini to aid causality assessment in vaccine safety. ChatGPT performed better, with higher adherence and agreement with human experts. In the investigated scenario, both models are better suited as complementary tools to human expertise.
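
The agreement and accuracy figures in the abstract rest on two simple calculations: percentage agreement between two raters and a binomial confidence interval around a classification accuracy. The Python sketch below illustrates both; the rating labels and the count/denominator passed to the confidence-interval function are hypothetical placeholders for illustration, not data from the study.

    # Minimal sketch: percentage agreement between an LLM and a human expert,
    # plus an exact (Clopper-Pearson) 95% CI for a classifier's accuracy.
    # All labels and counts below are hypothetical, not taken from the study.
    from statsmodels.stats.proportion import proportion_confint

    # Hypothetical WHO-algorithm conclusions for five reports from each rater.
    llm_labels    = ["A1", "B", "A1", "C", "B"]
    expert_labels = ["A1", "B", "C",  "C", "B"]

    # Percentage agreement: share of reports where both raters assign the same category.
    agreement = sum(a == b for a, b in zip(llm_labels, expert_labels)) / len(llm_labels)
    print(f"Percentage agreement: {agreement:.0%}")

    # Exact binomial 95% CI for an accuracy estimate, e.g. a random forest that
    # correctly predicts adherence for 16 of 29 held-out cases (hypothetical counts).
    low, high = proportion_confint(count=16, nobs=29, alpha=0.05, method="beta")
    print(f"Accuracy: {16 / 29:.0%} (95% CI {low:.1%}-{high:.1%})")

The same pattern generalizes to any categorical agreement check; percentage agreement is the simplest inter-rater measure and, unlike chance-corrected statistics such as Cohen's kappa, does not adjust for agreement expected by chance.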

Suggested Citation

  • Andrea Abate & Elisa Poncato & Maria Antonietta Barbieri & Greg Powell & Andrea Rossi & Simay Peker & Anders Hviid & Andrew Bate & Maurizio Sessa, 2025. "Off-the-Shelf Large Language Models for Causality Assessment of Individual Case Safety Reports: A Proof-of-Concept with COVID-19 Vaccines," Drug Safety, Springer, vol. 48(7), pages 805-820, July.
  • Handle: RePEc:spr:drugsa:v:48:y:2025:i:7:d:10.1007_s40264-025-01531-y
    DOI: 10.1007/s40264-025-01531-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40264-025-01531-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

