Quantify unmet medical need across the disease landscape – A large language model-based methodology

Author

Listed:

Elliott W Sharp
Nicholas Fragola
Charlotte Blewitt
Matthew Goddeeris
Lee Lancashire
Charlie Hempstead
David C Fajgenbaum

Abstract

Background: Although the ultimate goal of medical researchers and funders is to maximize patient benefit, there is no systematic process for quantifying unmet medical need across diseases. A relative unmet medical need scoring system would be valuable for prioritizing medical research, but systematically performing this effort across all 22,701 human diseases is technically challenging, time-consuming, and expensive. Using a large language model (LLM)-based architecture, we built a scalable method that demonstrates the feasibility of quantifying “unmet medical need” criteria across all diseases, combining those criteria into a single weighted score, and extending the method to new criteria or diseases in the future. We aimed to quantitatively determine which diseases have the greatest unmet medical need and, therefore, which diseases are priority targets for new repurposed treatments.

Method and findings: We defined 11 scoring criteria across three categories of unmet medical need. For each criterion, we tested LLMs and refined prompts to generate a score for each disease, and then defined a weighting for each criterion's contribution to a final score. A 30-disease development set was used to iterate on the prompting, and a 10-disease evaluation set was held out and used to evaluate the performance of the final prompt. All 22,701 human diseases in the MONDO disease ontology were quantitatively scored for their unmet medical need across all 11 weighted criteria. The resulting scores allowed relative comparison of unmet medical need between diseases. Inter-expert agreement was strong, with 95% of ratings within a 1-point difference, indicating reliability of the scoring framework. Across multiple LLMs, gpt-4o was the most closely aligned with expert rankings, achieving low mean and standard deviation differences relative to human scores. Furthermore, LLM-generated scores demonstrated strong Spearman’s rho correlations with expert assessments across key clinical criteria, such as mortality (ρ = 0.845) and quality-adjusted life years lost (ρ = 0.822), supporting their suitability for prioritizing unmet medical need. All data were generated in approximately 1 hour with no missing data, at a total cost of $120 USD of compute, and the results of the Unmet Medical Need Index are publicly available. The main limitation of this study is that the development and evaluation sets together comprise only 40 diseases.

Conclusions: This accessible, scalable methodology enables funders and researchers across governments, universities, healthcare organizations, and disease groups to tailor prioritization efforts according to unmet medical need in the context of their organizational objectives, by selecting appropriate criteria and the weighting of those criteria. This method creates a pragmatic and transparent tool to streamline research prioritization. Future research should consider expanding the size of the disease set used to create scores.

Elliott W Sharp and colleagues develop a large language model-based methodology to quantify unmet need among human diseases, considering factors related to patient suffering, standard of care, and accessibility of care.
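The two quantitative steps named in the abstract, combining per-criterion scores into one weighted score and comparing LLM scores with expert scores by Spearman's rho, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The criterion names, weights, and all function names below are hypothetical, chosen only for illustration; the abstract does not list the 11 criteria, their weights, or the score scale.

```python
from statistics import mean

# Hypothetical criteria and weights, for illustration only. The abstract
# does not list the paper's 11 criteria or their weights.
WEIGHTS = {"mortality": 0.5, "qaly_loss": 0.3, "treatment_availability": 0.2}


def weighted_score(scores, weights):
    """Return the weighted average of per-criterion scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for two placeholder diseases.
diseases = {
    "disease_a": {"mortality": 5, "qaly_loss": 4, "treatment_availability": 5},
    "disease_b": {"mortality": 2, "qaly_loss": 3, "treatment_availability": 1},
}
ranked = sorted(
    diseases, key=lambda d: weighted_score(diseases[d], WEIGHTS), reverse=True
)

# Made-up LLM and expert scores for one criterion across five diseases.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
```

Dividing by the sum of the weights keeps the composite on the same scale as the individual criteria even if a user supplies weights that do not sum to 1, which fits the abstract's point that organizations can choose their own criteria and weighting.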

Suggested Citation

Elliott W Sharp & Nicholas Fragola & Charlotte Blewitt & Matthew Goddeeris & Lee Lancashire & Charlie Hempstead & David C Fajgenbaum, 2026. "Quantify unmet medical need across the disease landscape – A large language model-based methodology," PLOS Medicine, Public Library of Science, vol. 23(3), pages 1-17, March.

Handle: RePEc:plo:pmed00:1004798
DOI: 10.1371/journal.pmed.1004798
