scPRINT: pre-training on 50 million cells allows robust gene network predictions

Author

Listed:
  • Jérémie Kalfon

    (Machine Learning for Integrative Genomics group)

  • Jules Samaran

    (Machine Learning for Integrative Genomics group)

  • Gabriel Peyré

    (Université PSL)

  • Laura Cantini

    (Machine Learning for Integrative Genomics group)

Abstract

A cell is governed by the interaction of myriads of macromolecules. Inferring such a network of interactions has remained an elusive milestone in cellular biology. Building on recent advances in large foundation models and their ability to learn without supervision, we present scPRINT, a large cell model for the inference of gene networks pre-trained on more than 50 million cells from the cellxgene database. Using innovative pretraining tasks and model architecture, scPRINT pushes large transformer models towards more interpretability and usability when uncovering the complex biology of the cell. Based on our atlas-level benchmarks, scPRINT demonstrates superior performance in gene network inference to the state of the art, as well as competitive zero-shot abilities in denoising, batch effect correction, and cell label prediction. On an atlas of benign prostatic hyperplasia, scPRINT highlights the profound connections between ion exchange, senescence, and chronic inflammation.

Suggested Citation

  • Jérémie Kalfon & Jules Samaran & Gabriel Peyré & Laura Cantini, 2025. "scPRINT: pre-training on 50 million cells allows robust gene network predictions," Nature Communications, Nature, vol. 16(1), pages 1-23, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-58699-1
    DOI: 10.1038/s41467-025-58699-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-58699-1
    File Function: Abstract
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hao Li & Zebei Han & Yu Sun & Fu Wang & Pengzhen Hu & Yuang Gao & Xuemei Bai & Shiyu Peng & Chao Ren & Xiang Xu & Zeyu Liu & Hebing Chen & Yang Yang & Xiaochen Bo, 2024. "CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    3. Xiang Lin & Tian Tian & Zhi Wei & Hakon Hakonarson, 2022. "Clustering of single-cell multi-omics data with a multimodal deep learning method," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    4. Hui Li & Cory R. Brouwer & Weijun Luo, 2022. "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    5. Songming Tang & Xuejian Cui & Rongxiang Wang & Sijie Li & Siyu Li & Xin Huang & Shengquan Chen, 2024. "scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    6. Zhijian Li & Christoph Kuppe & Susanne Ziegler & Mingbo Cheng & Nazanin Kabgani & Sylvia Menzel & Martin Zenke & Rafael Kramann & Ivan G. Costa, 2021. "Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    7. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    8. Lucy Xia & Christy Lee & Jingyi Jessica Li, 2024. "Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    9. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    10. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    11. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    12. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    13. Felix Fischer & David S. Fischer & Roman Mukhin & Andrey Isaev & Evan Biederstedt & Alexandra-Chloé Villani & Fabian J. Theis, 2024. "scTab: Scaling cross-tissue single-cell annotation models," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    14. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    15. Amiri, Babak & Karimianghadim, Ramin, 2024. "A novel text clustering model based on topic modelling and social network analysis," Chaos, Solitons & Fractals, Elsevier, vol. 181(C).
    16. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    17. A van Giessen & K G M Moons & G A de Wit & W M M Verschuren & J M A Boer & H Koffijberg, 2015. "Tailoring the Implementation of New Biomarkers Based on Their Added Predictive Value in Subgroups of Individuals," PLOS ONE, Public Library of Science, vol. 10(1), pages 1-14, January.
    18. Ethan Bahl & Snehajyoti Chatterjee & Utsav Mukherjee & Muhammad Elsadany & Yann Vanrobaeys & Li-Chun Lin & Miriam McDonough & Jon Resch & K. Peter Giese & Ted Abel & Jacob J. Michaelson, 2024. "Using deep learning to quantify neuronal activation from single-cell and spatial transcriptomic data," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    19. Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
    20. Qingfei Pan & Liang Ding & Siarhei Hladyshau & Xiangyu Yao & Jiayu Zhou & Lei Yan & Yogesh Dhungana & Hao Shi & Chenxi Qian & Xinran Dong & Chad Burdyshaw & Joao Pedro Veloso & Alireza Khatamian & Zhe, 2025. "scMINER: a mutual information-based framework for clustering and hidden driver inference from single-cell transcriptomics data," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
