
Benchmarking cell type and gene set annotation by large language models with AnnDictionary

Author

Listed:
  • George Crowley

    (Stanford)

  • Stephen R. Quake

    (Stanford University; Chan Zuckerberg Initiative)

Abstract

We develop an open-source package called AnnDictionary to facilitate the parallel, independent analysis of multiple anndata objects. AnnDictionary is built on top of LangChain and AnnData and supports all common large language model (LLM) providers. It requires only one line of code to configure or switch the LLM backend, and it contains numerous multithreading optimizations to support the analysis of many anndata objects and of large ones. We use AnnDictionary to perform the first benchmarking study of all major LLMs at de novo cell-type annotation. Absolute agreement between LLMs and manual annotation varies greatly with model size, as does inter-LLM agreement. We find LLM annotation of most major cell types to be more than 80-90% accurate, and we will maintain a leaderboard of LLM cell-type annotation. Furthermore, we benchmark these LLMs at functional annotation of gene sets and find that Claude 3.5 Sonnet recovers close matches of functional gene set annotations in over 80% of test sets.
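
The two ideas in the abstract, analysing several datasets independently in parallel and scoring LLM labels against manual annotation, can be illustrated with a minimal sketch. It uses only the Python standard library and does not reproduce AnnDictionary's API: the function names `agreement` and `agreement_per_dataset`, the toy labels, and the use of exact string matching as the agreement measure are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def agreement(manual, predicted):
    """Fraction of cells whose predicted label equals the manual label.

    Exact string equality is an illustrative simplification, not the
    paper's scoring method.
    """
    if len(manual) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not manual:
        return 0.0
    matches = sum(m == p for m, p in zip(manual, predicted))
    return matches / len(manual)


def agreement_per_dataset(datasets, max_workers=4):
    """Score each dataset independently, one task per dataset.

    `datasets` maps a dataset name to a (manual_labels, llm_labels) pair.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(agreement, manual, predicted)
            for name, (manual, predicted) in datasets.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical toy labels, invented for this example only.
toy = {
    "blood": (
        ["T cell", "B cell", "T cell", "NK cell"],
        ["T cell", "B cell", "T cell", "T cell"],
    ),
    "lung": (["AT2", "AT1"], ["AT2", "AT1"]),
}
scores = agreement_per_dataset(toy)
print(scores)
```

Keeping the datasets in a dictionary keyed by name lets each one be processed without reference to the others, which is what makes the per-dataset work straightforward to run in separate threads.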

Suggested Citation

  • George Crowley & Stephen R. Quake, 2025. "Benchmarking cell type and gene set annotation by large language models with AnnDictionary," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-64511-x
    DOI: 10.1038/s41467-025-64511-x

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-64511-x
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Robin D. Lee & Sarah A. Munro & Todd P. Knutson & Rebecca S. LaRue & Lynn M. Heltemes-Harris & Michael A. Farrar, 2021. "Single-cell analysis identifies dynamic gene expression networks that govern B cell development and transformation," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    2. X. Fan & M. Bialecka & I. Moustakas & E. Lam & V. Torrens-Juaneda & N. V. Borggreven & L. Trouw & L. A. Louwe & G. S. K. Pilgram & H. Mei & L. Westerlaken & S. M. Chuva de Sousa Lopes, 2019. "Single-cell reconstruction of follicular remodeling in the human adult ovary," Nature Communications, Nature, vol. 10(1), pages 1-13, December.
    3. George Crowley & Stephen R. Quake, 2025. "Benchmarking cell type and gene set annotation by large language models with AnnDictionary," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    4. Zehua Zeng & Yuqing Ma & Lei Hu & Bowen Tan & Peng Liu & Yixuan Wang & Cencan Xing & Yuanyan Xiong & Hongwu Du, 2024. "OmicVerse: a framework for bridging and deepening insights across bulk and single-cell sequencing," Nature Communications, Nature, vol. 15(1), pages 1-15, December.

    Citations


    Cited by:

    1. George Crowley & Stephen R. Quake, 2025. "Benchmarking cell type and gene set annotation by large language models with AnnDictionary," Nature Communications, Nature, vol. 16(1), pages 1-14, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Simeng Hu & Can Hu & Jingli Xu & Pengfei Yu & Li Yuan & Ziyu Li & Haohong Liang & Yanqiang Zhang & Jiahui Chen & Qing Wei & Shengjie Zhang & Litao Yang & Dan Su & Yian Du & Zhiyuan Xu & Fan Bai & Xian, 2024. "The estrogen response in fibroblasts promotes ovarian metastases of gastric cancer," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Ilmatar Rooda & Jasmin Hassan & Jie Hao & Magdalena Wagner & Elisabeth Moussaud-Lamodière & Kersti Jääger & Marjut Otala & Katri Knuus & Cecilia Lindskog & Kiriaki Papaikonomou & Sebastian Gidlöf & Ce, 2024. "In-depth analysis of transcriptomes in ovarian cortical follicles from children and adults reveals interfollicular heterogeneity," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    3. Xiaowen Zhang & Mingjian Peng & Jianghao Zhu & Xue Zhai & Chaoguang Wei & He Jiao & Zhichao Wu & Songqian Huang & Mingli Liu & Wenhao Li & Wenyi Yang & Kai Miao & Qiongqiong Xu & Liangbiao Chen & Peng, 2025. "Benchmarking metabolic RNA labeling techniques for high-throughput single-cell RNA sequencing," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    4. Moïra Rossitto & Stephanie Déjardin & Chris M. Rands & Stephanie Gras & Roberta Migale & Mahmoud-Reza Rafiee & Yasmine Neirijnck & Alain Pruvost & Anvi Laetitia Nguyen & Guillaume Bossis & Florence Ca, 2022. "TRIM28-dependent SUMOylation protects the adult ovary from activation of the testicular pathway," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    5. Li Yang & Xulei Wang & Xingyu Zhou & Hongyu Chen & Sentao Song & Liling Deng & Yao Yao & Xiaolei Yin, 2025. "A tunable human intestinal organoid system achieves controlled balance between self-renewal and differentiation," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    6. Matthew C. Sinton & Praveena R. G. Chandrasegaran & Paul Capewell & Anneli Cooper & Alex Girard & John Ogunsola & Georgia Perona-Wright & Dieudonné M Ngoyi & Nono Kuispond & Bruno Bucheton & Mamadou C, 2023. "IL-17 signalling is critical for controlling subcutaneous adipose tissue dynamics and parasite burden during chronic murine Trypanosoma brucei infection," Nature Communications, Nature, vol. 14(1), pages 1-21, December.
    7. Sissy E. Wamaitha & Ernesto J. Rojas & Francesco Monticolo & Fei-man Hsu & Enrique Sosa & Amanda M. Mackie & Kiana Oyama & Maggie Custer & Melinda Murphy & Diana J. Laird & Jian Shu & Jon D. Hennebold, 2025. "Defining the cell and molecular origins of the primate ovarian reserve," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
    8. Kailu Song & Yumin Zheng & Bowen Zhao & David H. Eidelman & Jian Tang & Jun Ding, 2025. "DOLPHIN advances single-cell transcriptomics beyond gene level by leveraging exon and junction reads," Nature Communications, Nature, vol. 16(1), pages 1-26, December.
    9. Jingyuan Zhang & Pei Mao & Tengfei Zhou & Bingqing Yue & Yaning Li & Yuanhua Qiu & Kejing Ying & Fudi Wang & Jingyu Chen & Jun Yang, 2025. "Macrophage ferroptosis potentiates GCN2 deficiency induced pulmonary venous arterialization," Nature Communications, Nature, vol. 16(1), pages 1-19, December.
