IDEAS home Printed from https://ideas.repec.org/a/spr/stabio/v15y2023i3d10.1007_s12561-022-09335-9.html

scPI: A Scalable Framework for Probabilistic Inference in Single-Cell RNA-Sequencing Data Analysis

Authors listed:
  • Jingsi Ming

    (East China Normal University
    The Hong Kong University of Science and Technology)

  • Jia Zhao

    (The Hong Kong University of Science and Technology)

  • Can Yang

    (The Hong Kong University of Science and Technology)

Abstract

The technique of single-cell RNA-sequencing (scRNA-seq) has provided an unprecedented opportunity to investigate the cellular heterogeneity of complex tissues. As large-scale scRNA-seq datasets become more available and affordable, there is a growing demand for computationally scalable methods to analyze scRNA-seq data. Here, we propose a scalable framework, scPI, to infer latent low-dimensional representations of scRNA-seq data to facilitate downstream analysis. scPI uses amortized variational inference, in which the posterior mean and variance of the latent variable are parameterized by a nonlinear neural network. This inference structure, combined with stochastic optimization, makes the method computationally efficient and scalable. Through the analysis of two real datasets, we demonstrate that the scPI framework can be effectively applied to several probabilistic models for scRNA-seq data, as assessed by scalability, missing-value imputation, and cell type clustering. The code for reproducing the real data analysis results is available at https://github.com/YangLabHKUST/scPI.
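The abstract's central idea, amortized variational inference trained with stochastic optimization, can be illustrated with a short sketch. This is not the authors' scPI implementation (their code is in the GitHub repository above); it is a minimal numpy example under stated assumptions: a PPCA-style linear-Gaussian decoder applied to log1p-transformed counts, a one-hidden-layer tanh encoder, simulated toy counts, and hypothetical function names (`init_params`, `encode`, `minibatch_elbo`). It shows how one shared network outputs every cell's posterior mean and log-variance, and how a rescaled minibatch gives a stochastic estimate of the evidence lower bound (ELBO).

```python
import numpy as np

def init_params(p, k, h, rng, scale=0.1):
    """Hypothetical parameters: a one-hidden-layer encoder (p -> h -> 2k)
    and a PPCA-style linear decoder (k -> p) with per-gene noise variance."""
    return {
        "W1": scale * rng.standard_normal((p, h)), "b1": np.zeros(h),
        "W_mu": scale * rng.standard_normal((h, k)), "b_mu": np.zeros(k),
        "W_lv": scale * rng.standard_normal((h, k)), "b_lv": np.zeros(k),
        "W_dec": scale * rng.standard_normal((k, p)), "b_dec": np.zeros(p),
        "log_sigma2": np.zeros(p),
    }

def encode(params, x):
    """Amortized encoder: one shared network maps every cell x_i to the
    mean and log-variance of q(z_i | x_i) = N(mu_i, diag(exp(logvar_i)))."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    mu = hidden @ params["W_mu"] + params["b_mu"]
    logvar = hidden @ params["W_lv"] + params["b_lv"]
    return mu, logvar

def minibatch_elbo(params, x_batch, n_total, rng):
    """One-sample Monte Carlo estimate of the full-data ELBO from a minibatch."""
    mu, logvar = encode(params, x_batch)
    # Reparameterization: z is a deterministic function of (mu, logvar) plus
    # independent noise, so an autodiff library could backpropagate through it
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    # Gaussian log-likelihood log p(x | z) with a linear decoder, summed over genes
    mean = z @ params["W_dec"] + params["b_dec"]
    sigma2 = np.exp(params["log_sigma2"])
    loglik = -0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x_batch - mean) ** 2 / sigma2, axis=1)
    # Closed-form KL(q(z|x) || N(0, I)) for each cell
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)
    # Rescale by n / batch size so the minibatch sum is unbiased for the full sum
    return (n_total / x_batch.shape[0]) * np.sum(loglik - kl)

# Toy usage on simulated counts (not real scRNA-seq data)
rng = np.random.default_rng(0)
n, p, k, h = 1000, 50, 5, 32
x = np.log1p(rng.poisson(2.0, size=(n, p)))
params = init_params(p, k, h, rng)
batch = rng.choice(n, size=128, replace=False)
elbo_hat = minibatch_elbo(params, x[batch], n, rng)
z_mean, _ = encode(params, x)  # embed all cells with a single forward pass
print(z_mean.shape)  # (1000, 5)
```

Training would maximize this estimate using gradients from an autodiff library and a stochastic optimizer; that step is omitted here. Because the encoder is shared across cells, the number of variational parameters does not grow with the number of cells, which is the property that lets amortized inference scale to large datasets.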

Suggested Citation

  • Jingsi Ming & Jia Zhao & Can Yang, 2023. "scPI: A Scalable Framework for Probabilistic Inference in Single-Cell RNA-Sequencing Data Analysis," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(3), pages 633-656, December.
  • Handle: RePEc:spr:stabio:v:15:y:2023:i:3:d:10.1007_s12561-022-09335-9
    DOI: 10.1007/s12561-022-09335-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12561-022-09335-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Michael E. Tipping & Christopher M. Bishop, 1999. "Probabilistic Principal Component Analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 611-622.
    2. Junyue Cao & Malte Spielmann & Xiaojie Qiu & Xingfan Huang & Daniel M. Ibrahim & Andrew J. Hill & Fan Zhang & Stefan Mundlos & Lena Christiansen & Frank J. Steemers & Cole Trapnell & Jay Shendure, 2019. "The single-cell transcriptional landscape of mammalian organogenesis," Nature, Nature, vol. 566(7745), pages 496-502, February.
    3. Davide Risso & Fanny Perraudeau & Svetlana Gribkova & Sandrine Dudoit & Jean-Philippe Vert, 2018. "A general and flexible method for signal extraction from single-cell RNA-seq data," Nature Communications, Nature, vol. 9(1), pages 1-17, December.
    4. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    5. Barbara Treutlein & Doug G. Brownfield & Angela R. Wu & Norma F. Neff & Gary L. Mantalas & F. Hernan Espinoza & Tushar J. Desai & Mark A. Krasnow & Stephen R. Quake, 2014. "Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq," Nature, Nature, vol. 509(7500), pages 371-375, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    2. Anahita Nodehi & Mousa Golalizadeh & Mehdi Maadooliat & Claudio Agostinelli, 2025. "Torus Probabilistic Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 42(2), pages 435-456, July.
    3. Ming-Wen Hu & Dong Won Kim & Sheng Liu & Donald J Zack & Seth Blackshaw & Jiang Qian, 2019. "PanoView: An iterative clustering method for single-cell RNA sequencing data," PLOS Computational Biology, Public Library of Science, vol. 15(8), pages 1-17, August.
    4. Carlo Cavicchia & Maurizio Vichi & Giorgia Zaccaria, 2022. "Gaussian mixture model with an extended ultrametric covariance structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 16(2), pages 399-427, June.
    5. Bobby Ranjan & Wenjie Sun & Jinyu Park & Kunal Mishra & Florian Schmidt & Ronald Xie & Fatemeh Alipour & Vipul Singhal & Ignasius Joanito & Mohammad Amin Honardoost & Jacy Mei Yun Yong & Ee Tzun Koh &, 2021. "DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    6. Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    7. Qi Liu & Charles A Herring & Quanhu Sheng & Jie Ping & Alan J Simmons & Bob Chen & Amrita Banerjee & Wei Li & Guoqiang Gu & Robert J Coffey & Yu Shyr & Ken S Lau, 2018. "Quantitative assessment of cell population diversity in single-cell landscapes," PLOS Biology, Public Library of Science, vol. 16(10), pages 1-29, October.
    8. Jinge Yu & Qiuyu Wu & Xiangyu Luo, 2023. "Bayesian Joint Modeling of Single-Cell Expression Data and Bulk Spatial Transcriptomic Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(3), pages 719-733, December.
    9. Francisco X. Galdos & Sidra Xu & William R. Goodyer & Lauren Duan & Yuhsin V. Huang & Soah Lee & Han Zhu & Carissa Lee & Nicholas Wei & Daniel Lee & Sean M. Wu, 2022. "devCellPy is a machine learning-enabled pipeline for automated annotation of complex multilayered single-cell transcriptomic data," Nature Communications, Nature, vol. 13(1), pages 1-20, December.
    10. Christopher W. Murray & Jennifer J. Brady & Mingqi Han & Hongchen Cai & Min K. Tsai & Sarah E. Pierce & Ran Cheng & Janos Demeter & David M. Feldser & Peter K. Jackson & David B. Shackelford & Monte M, 2022. "LKB1 drives stasis and C/EBP-mediated reprogramming to an alveolar type II fate in lung cancer," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    11. Jingtao Wang & Gregory J. Fonseca & Jun Ding, 2024. "scSemiProfiler: Advancing large-scale single-cell studies through semi-profiling with deep generative models and active learning," Nature Communications, Nature, vol. 15(1), pages 1-27, December.
    12. Ran Wang & Xianfa Yang & Jiehui Chen & Lin Zhang & Jonathan A. Griffiths & Guizhong Cui & Yingying Chen & Yun Qian & Guangdun Peng & Jinsong Li & Liantang Wang & John C. Marioni & Patrick P. L. Tam & , 2023. "Time space and single-cell resolved tissue lineage trajectories and laterality of body plan at gastrulation," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    13. Paul McLaughlin & Brian C. Franczak & Adam B. Kashlak, 2024. "Unsupervised Classification with a Family of Parsimonious Contaminated Shifted Asymmetric Laplace Mixtures," Journal of Classification, Springer;The Classification Society, vol. 41(1), pages 65-93, March.
    14. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    15. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    16. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    17. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    18. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    19. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    20. Amiri, Babak & Karimianghadim, Ramin, 2024. "A novel text clustering model based on topic modelling and social network analysis," Chaos, Solitons & Fractals, Elsevier, vol. 181(C).

