Authors
- Gaston Bizel-Bizellot (Imaging and Modeling Unit)
- Simon Galmiche (Epidemiology of Emerging Diseases Unit; Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital)
- Benoît Lelandais (Imaging and Modeling Unit; Image Analysis Hub)
- Tiffany Charmet (Epidemiology of Emerging Diseases Unit)
- Laurent Coudeville (Global Medical)
- Arnaud Fontanet (Epidemiology of Emerging Diseases Unit; PACRI Unit)
- Christophe Zimmer (Imaging and Modeling Unit; Rudolf Virchow Center for integrative and translational bioimaging; Center for Artificial Intelligence and Data Science (CAIDAS))
Abstract
Identifying the circumstances of transmission of an emerging infectious disease rapidly is central for mitigation efforts. Here, we explore how large language models (LLMs) can automatically extract such circumstances from free-text descriptions in online surveys, in the context of Covid-19. In a nationwide study conducted online in France, we enrolled 545,958 adults with recent SARS-CoV-2 infection and inquired about the circumstances of transmission in both closed-ended and open-ended questions. First, we trained a classification model based on a pretrained LLM to predict one of seven predefined infection contexts (Work, Family, Friends, Sports, Cultural, Religious, Other) from the free text in answers to open-ended questions. We achieved an unbalanced accuracy of 75%, which increased to 91% when eliminating the 43% highest entropy responses. Second, we used topic modeling to define clusters of transmission circumstances agnostically. This led to 23 clusters, which agreed with the seven predefined infection contexts, but also provided finer details on previously undefined circumstances of transmission. Our study suggests that LLM-based analysis of free text may alleviate the need for closed-ended questions in epidemiological surveys and enable insights into previously unsuspected circumstances of transmission. This approach is poised to accelerate and enrich the acquisition of epidemiological insights in future pandemics.
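The entropy-based filtering mentioned in the abstract can be illustrated with a minimal sketch. It assumes a classifier that returns one probability distribution over the infection contexts per response; the helper names (`entropy`, `keep_low_entropy`, `accuracy`) and the toy numbers are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def keep_low_entropy(prob_rows, drop_fraction):
    """Indices of responses kept after discarding the highest-entropy share."""
    n_drop = int(len(prob_rows) * drop_fraction)
    # Sort response indices from most confident (low entropy) to least.
    order = sorted(range(len(prob_rows)), key=lambda i: entropy(prob_rows[i]))
    return sorted(order[: len(prob_rows) - n_drop])

def accuracy(prob_rows, labels, indices):
    """Share of the selected responses whose top-probability class is correct."""
    indices = list(indices)
    hits = 0
    for i in indices:
        row = prob_rows[i]
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == labels[i]
    return hits / len(indices)

# Toy data (hypothetical): 4 responses, 3 classes instead of the 7 contexts.
probs = [
    [0.90, 0.05, 0.05],  # confident, correct
    [0.40, 0.35, 0.25],  # uncertain, wrong
    [0.10, 0.80, 0.10],  # confident, correct
    [0.34, 0.33, 0.33],  # near-uniform, wrong
]
labels = [0, 1, 1, 2]

kept = keep_low_entropy(probs, drop_fraction=0.5)
print("accuracy on all responses:", accuracy(probs, labels, range(len(probs))))
print("accuracy after filtering:", accuracy(probs, labels, kept))
```

In this toy case the discarded responses are exactly the misclassified ones, so accuracy rises on the retained subset; on real data the gain depends on how well entropy tracks classification errors.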
Suggested Citation
Gaston Bizel-Bizellot & Simon Galmiche & Benoît Lelandais & Tiffany Charmet & Laurent Coudeville & Arnaud Fontanet & Christophe Zimmer, 2025.
"Extracting circumstances of Covid-19 transmission from free text with large language models,"
Nature Communications, Nature, vol. 16(1), pages 1-13, December.
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-60762-w
DOI: 10.1038/s41467-025-60762-w