
Clustering and visualization of single-cell RNA-seq data using path metrics

Authors
  • Andriana Manousidaki
  • Anna Little
  • Yuying Xie

Abstract

Recent advances in single-cell technologies have enabled high-resolution characterization of tissue and cancer compositions. Although numerous tools for dimension reduction and clustering are available for single-cell data analyses, these methods often fail to simultaneously preserve local cluster structure and global data geometry. To address these challenges, we developed a novel analysis framework, Single-Cell Path Metrics Profiling (scPMP), using power-weighted path metrics, which measure distances between cells in a data-driven way. Unlike Euclidean distance and other commonly used distance metrics, path metrics are density sensitive and respect the underlying data geometry. By combining path metrics with multidimensional scaling, a low-dimensional embedding of the data is obtained which preserves both the global data geometry and cluster structure. We evaluate the method both for clustering quality and geometric fidelity, and it outperforms current scRNAseq clustering algorithms on a wide range of benchmarking data sets.

Author summary

Advancements in single-cell technologies with the ability to measure gene expression at the cellular level have provided an unprecedented opportunity to investigate cell type diversity (T cells, B cells, etc.) and cell state diversity (active T cells and exhausted T cells) within tissues and cancers. However, analyzing this complex high-dimensional data when the noise level is high requires sophisticated tools to effectively extract useful biological information and faithfully visualize the data in a low-dimensional space (2- or 3-D). Existing computational methods for single-cell data, such as dimension reduction and clustering (grouping similar cells together), struggle to simultaneously preserve local group structure and global data geometry (developmental relationships between cell types).
To tackle this problem, we’ve developed a new analysis framework called scPMP (Single-Cell Path Metrics Profiling), based on a unique approach to measuring distances between cells which takes into account both the density of cells (common vs rare cell types) and the overall structure of the data. We have demonstrated on numerous real and simulated data sets that scPMP preserves the natural grouping of cells and the relationships between different groups better than existing methods. This improvement could lead to more accurate identification of cell types and states.
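
The abstract names two ingredients, power-weighted path metrics and multidimensional scaling, without spelling out either. The sketch below is one minimal way to combine them and is not the authors' scPMP implementation. It assumes the definition commonly used in the literature: the distance between two cells is the minimum, over all paths through the data, of the sum of Euclidean hop lengths raised to a power p, with the 1/p root taken at the end. The function names `path_metric` and `classical_mds`, the use of a complete graph, and the toy data are all illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform


def path_metric(X, p=2.0):
    """Power-weighted path distances between the rows of X (hypothetical helper).

    Assumed definition: min over paths of (sum of hop_length**p) ** (1/p),
    computed here on the complete graph over all points.
    """
    D = squareform(pdist(X))   # pairwise Euclidean distances
    W = D ** p                 # p > 1 makes long hops expensive
    # Dijkstra on the dense weight matrix; zero entries are read as "no edge",
    # so exact duplicate points are not connected directly.
    G = shortest_path(W, method="D", directed=False)
    return G ** (1.0 / p)


def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS of a distance matrix D (hypothetical helper)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]        # indices of the largest ones
    scale = np.sqrt(np.clip(vals[top], 0.0, None))  # guard small negatives
    return vecs[:, top] * scale


# Toy data standing in for cells: one dense and one sparse group in 5 dimensions.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.3, size=(40, 5))
sparse = rng.normal(loc=4.0, scale=1.0, size=(15, 5))
X = np.vstack([dense, sparse])

P = path_metric(X, p=2.0)        # density-sensitive distances
Y = classical_mds(P, dim=2)      # 2-D embedding for plotting or clustering
```

With p = 1 the path metric reduces to the Euclidean distance, because of the triangle inequality. Larger p favours paths made of many short hops, so points linked through a dense region end up closer together than their straight-line distance suggests. On real data a k-nearest-neighbour graph would usually replace the complete graph to keep the shortest-path step affordable; that choice is also an assumption here and is not taken from the abstract.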

Suggested Citation

  • Andriana Manousidaki & Anna Little & Yuying Xie, 2024. "Clustering and visualization of single-cell RNA-seq data using path metrics," PLOS Computational Biology, Public Library of Science, vol. 20(5), pages 1-19, May.
  • Handle: RePEc:plo:pcbi00:1012014
    DOI: 10.1371/journal.pcbi.1012014

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012014
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012014&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1012014?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Dmitry Kobak & Philipp Berens, 2019. "The art of using t-SNE for single-cell transcriptomics," Nature Communications, Nature, vol. 10(1), pages 1-14, December.
    2. Duc Tran & Hung Nguyen & Bang Tran & Carlo La Vecchia & Hung N. Luu & Tin Nguyen, 2021. "Fast and precise single-cell data analysis using a hierarchical autoencoder," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    3. T. C. Hu, 1961. "Letter to the Editor---The Maximum Capacity Route Problem," Operations Research, INFORMS, vol. 9(6), pages 898-900, December.
    4. Maurice Pollack, 1960. "Letter to the Editor---The Maximum Capacity Through a Network," Operations Research, INFORMS, vol. 8(5), pages 733-736, October.
    5. Wei Vivian Li & Jingyi Jessica Li, 2018. "An accurate and robust imputation method scImpute for single-cell RNA-seq data," Nature Communications, Nature, vol. 9(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kaiwen Wang & Yuqiu Yang & Fangjiang Wu & Bing Song & Xinlei Wang & Tao Wang, 2023. "Comparative analysis of dimension reduction methods for cytometry by time-of-flight data," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    2. Tayyebi, Javad & Mitra, Ankan & Sefair, Jorge A., 2023. "The continuous maximum capacity path interdiction problem," European Journal of Operational Research, Elsevier, vol. 305(1), pages 38-52.
    3. Hyun Kim & Won Chang & Seok Joo Chae & Jong-Eun Park & Minseok Seo & Jae Kyoung Kim, 2024. "scLENS: data-driven signal detection for unbiased scRNA-seq data analysis," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Lingfei Wang, 2021. "Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    5. Rong Ma & Eric D. Sun & James Zou, 2023. "A spectral method for assessing and combining multiple data visualizations," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    6. L. Mathur & B. Szalai & N. H. Du & R. Utharala & M. Ballinger & J. J. M. Landry & M. Ryckelynck & V. Benes & J. Saez-Rodriguez & C. A. Merten, 2022. "Combi-seq for multiplexed transcriptome-based profiling of drug combinations using deterministic barcoding in single-cell droplets," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    7. Erhan Erkut & Armann Ingolfsson, 2000. "Catastrophe Avoidance Models for Hazardous Materials Route Planning," Transportation Science, INFORMS, vol. 34(2), pages 165-179, May.
    8. Jing Qi & Yang Zhou & Zicen Zhao & Shuilin Jin, 2021. "SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-20, June.
    9. Calvete, Herminia I. & del-Pozo, Lourdes & Iranzo, José A., 2018. "Dealing with residual energy when transmitting data in energy-constrained capacitated networks," European Journal of Operational Research, Elsevier, vol. 269(2), pages 602-620.
    10. Zhiyuan Yuan & Yisi Li & Minglei Shi & Fan Yang & Juntao Gao & Jianhua Yao & Michael Q. Zhang, 2022. "SOTIP is a versatile method for microenvironment modeling with spatial omics data," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    11. Scott R. Tyler & Daniel Lozano-Ojalvo & Ernesto Guccione & Eric E. Schadt, 2024. "Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    12. Ziyan Huang & Myung Chung & Kentaro Tao & Akiyuki Watarai & Mu-Yun Wang & Hiroh Ito & Teruhiro Okuyama, 2023. "Ventromedial prefrontal neurons represent self-states shaped by vicarious fear in male mice," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    13. Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    14. Yasa Baig & Helena R. Ma & Helen Xu & Lingchong You, 2023. "Autoencoder neural networks enable low dimensional structure analyses of microbial growth dynamics," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    15. Mohammad Abbasi & Connor R Sanderford & Narendiran Raghu & Mirjeta Pasha & Benjamin B Bartelle, 2023. "Sparse representation learning derives biological features with explicit gene weights from the Allen Mouse Brain Atlas," PLOS ONE, Public Library of Science, vol. 18(3), pages 1-16, March.
    16. Mikhael D. Manurung & Friederike Sonnet & Marie-Astrid Hoogerwerf & Jacqueline J. Janse & Yvonne Kruize & Laura de Bes-Roeleveld & Marion König & Alex Loukas & Benjamin G. Dewals & Taniawati Supali & , 2024. "Controlled human hookworm infection remodels plasmacytoid dendritic cells and regulatory T cells towards profiles seen in natural infections in endemic areas," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    17. Thomas Hu & Mayar Allam & Shuangyi Cai & Walter Henderson & Brian Yueh & Aybuke Garipcan & Anton V. Ievlev & Maryam Afkarian & Semir Beyaz & Ahmet F. Coskun, 2023. "Single-cell spatial metabolomics with cell-type specific protein profiling for tissue systems biology," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    18. Louise Velut & Laura Fancello & Nadia Cherradi & Laurent Guyon, 2025. "Single-cell microRNA-mRNA co-sequencing techniques convey large potential for understanding microRNA regulations but require careful and systemic approaches," Nature Communications, Nature, vol. 16(1), pages 1-4, December.
    19. Hui Li & Cory R. Brouwer & Weijun Luo, 2022. "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    20. Xianke Xiang & Yao He & Zemin Zhang & Xuerui Yang, 2024. "Interrogations of single-cell RNA splicing landscapes with SCASL define new cell identities with physiological relevance," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
