
Off-the-Shelf Large Language Models for Causality Assessment of Individual Case Safety Reports: A Proof-of-Concept with COVID-19 Vaccines

Authors

  • Andrea Abate (University of Copenhagen; University of Messina)
  • Elisa Poncato (University of Copenhagen)
  • Maria Antonietta Barbieri (University of Copenhagen; University of Messina)
  • Greg Powell (GSK)
  • Andrea Rossi (University of Milan)
  • Simay Peker (University of Copenhagen)
  • Anders Hviid (University of Copenhagen; Statens Serum Institute)
  • Andrew Bate (GSK; London School of Hygiene and Tropical Medicine)
  • Maurizio Sessa (University of Copenhagen)
Abstract

Background: This study evaluated the feasibility of using ChatGPT and Gemini, two off-the-shelf large language models (LLMs), to automate causality assessments, focusing on the adverse events following immunization (AEFIs) myocarditis and pericarditis related to COVID-19 vaccines.

Methods: We assessed 150 cases of myocarditis and pericarditis related to COVID-19 vaccines reported to the Vaccine Adverse Event Reporting System (VAERS) in the United States of America (USA). Both LLMs and human experts applied the World Health Organization (WHO) algorithm for vaccine causality assessment, and inter-rater agreement was measured as percentage agreement. Adherence to the WHO algorithm was evaluated by comparing LLM responses to the expected sequence of steps in the algorithm. Statistical analyses, including descriptive statistics and random forest modeling, explored case complexity (e.g., string length measurements) and factors affecting LLM performance and adherence.

Results: ChatGPT showed higher adherence to the WHO algorithm (34%) than Gemini (7%) and moderate agreement (71%) with human experts, whereas Gemini showed fair agreement (53%). Both LLMs often failed to recognize listed AEFIs, with ChatGPT and Gemini incorrectly identifying 6.7% and 13.3% of AEFIs, respectively. ChatGPT showed inconsistencies in 8.0% of cases and Gemini in 46.7%. For ChatGPT, adherence to the algorithm was associated with lower string complexity in prompt sections. The random forest analysis achieved an accuracy of 55% (95% confidence interval 35.7–73.5%) for predicting ChatGPT's adherence to the WHO algorithm.

Conclusion: Notable limitations were identified in using ChatGPT and Gemini to aid causality assessment in vaccine safety. ChatGPT performed better, with higher adherence and agreement with human experts. In the investigated scenario, both models are better suited as complementary tools to human expertise.
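
The agreement and accuracy figures in the abstract rest on two simple calculations: percentage agreement between two raters and a binomial confidence interval around a classification accuracy. The Python sketch below illustrates both; the rating labels and the count/denominator passed to the confidence-interval function are hypothetical placeholders for illustration, not data from the study.

    # Minimal sketch: percentage agreement between an LLM and a human expert,
    # plus an exact (Clopper-Pearson) 95% CI for a classifier's accuracy.
    # All labels and counts below are hypothetical, not taken from the study.
    from statsmodels.stats.proportion import proportion_confint

    # Hypothetical WHO-algorithm conclusions for five reports from each rater.
    llm_labels    = ["A1", "B", "A1", "C", "B"]
    expert_labels = ["A1", "B", "C",  "C", "B"]

    # Percentage agreement: share of reports where both raters assign the same category.
    agreement = sum(a == b for a, b in zip(llm_labels, expert_labels)) / len(llm_labels)
    print(f"Percentage agreement: {agreement:.0%}")

    # Exact binomial 95% CI for an accuracy estimate, e.g. a random forest that
    # correctly predicts adherence for 16 of 29 held-out cases (hypothetical counts).
    low, high = proportion_confint(count=16, nobs=29, alpha=0.05, method="beta")
    print(f"Accuracy: {16 / 29:.0%} (95% CI {low:.1%}-{high:.1%})")

The same pattern generalizes to any categorical agreement check; percentage agreement is the simplest inter-rater measure and, unlike chance-corrected statistics such as Cohen's kappa, does not adjust for agreement expected by chance.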

Suggested Citation

  • Andrea Abate & Elisa Poncato & Maria Antonietta Barbieri & Greg Powell & Andrea Rossi & Simay Peker & Anders Hviid & Andrew Bate & Maurizio Sessa, 2025. "Off-the-Shelf Large Language Models for Causality Assessment of Individual Case Safety Reports: A Proof-of-Concept with COVID-19 Vaccines," Drug Safety, Springer, vol. 48(7), pages 805-820, July.
  • Handle: RePEc:spr:drugsa:v:48:y:2025:i:7:d:10.1007_s40264-025-01531-y
    DOI: 10.1007/s40264-025-01531-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40264-025-01531-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

