
CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells

Authors
  • Yuansong Zeng

    (Sun Yat-sen University
    Chongqing University
    Jinfeng Laboratory)

  • Jiancong Xie

    (Sun Yat-sen University)

  • Ningyuan Shangguan

    (Sun Yat-sen University)

  • Zhuoyi Wei

    (Sun Yat-sen University
    Ltd)

  • Wenbing Li

    (Sun Yat-sen University)

  • Yun Su

    (Ltd)

  • Shuangyu Yang

    (Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University)

  • Chengyang Zhang

    (Chongqing University)

  • Jinbo Zhang

    (Nanjing)

  • Nan Fang

    (Nanjing)

  • Hongyu Zhang

    (Chongqing University)

  • Yutong Lu

    (Sun Yat-sen University)

  • Huiying Zhao

    (Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University)

  • Jue Fan

    (Nanjing)

  • Weijiang Yu

    (Sun Yat-sen University
    Ltd)

  • Yuedong Yang

    (Sun Yat-sen University)

Abstract

Single-cell sequencing provides transcriptomic profiling at single-cell resolution, uncovering cellular heterogeneity with unprecedented precision. Yet current single-cell data analysis suffers from inherent data noise, batch effects, and sparsity, highlighting the need for a unified model to represent cellular states. To address this problem, many recent efforts have focused on training single-cell foundation models on large datasets. However, current human foundation models are still limited by the size of their training data and their number of parameters. Here, we collected a diverse dataset of 100 million human cells, on which we trained a single-cell foundation model (CellFM) containing 800 million parameters. To balance efficiency and performance, the model is trained with a modified RetNet framework in MindSpore. Extensive experiments show that CellFM outperforms existing models in cell annotation, perturbation prediction, gene function prediction, and capturing gene-gene relationships.

Suggested Citation

  • Yuansong Zeng & Jiancong Xie & Ningyuan Shangguan & Zhuoyi Wei & Wenbing Li & Yun Su & Shuangyu Yang & Chengyang Zhang & Jinbo Zhang & Nan Fang & Hongyu Zhang & Yutong Lu & Huiying Zhao & Jue Fan & Weijiang Yu & Yuedong Yang, 2025. "CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells," Nature Communications, Nature, vol. 16(1), pages 1-17, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59926-5
    DOI: 10.1038/s41467-025-59926-5

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-59926-5
    File Function: Abstract
    Download Restriction: no

