IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1012014.html

Clustering and visualization of single-cell RNA-seq data using path metrics

Authors
  • Andriana Manousidaki
  • Anna Little
  • Yuying Xie

Abstract

Recent advances in single-cell technologies have enabled high-resolution characterization of tissue and cancer compositions. Although numerous tools for dimension reduction and clustering are available for single-cell data analysis, these methods often fail to preserve local cluster structure and global data geometry simultaneously. To address these challenges, we developed a novel analysis framework, Single-Cell Path Metrics Profiling (scPMP), using power-weighted path metrics, which measure distances between cells in a data-driven way. Unlike Euclidean distance and other commonly used distance metrics, path metrics are density sensitive and respect the underlying data geometry. By combining path metrics with multidimensional scaling, we obtain a low-dimensional embedding of the data that preserves both the global data geometry and the cluster structure. We evaluate the method for both clustering quality and geometric fidelity, and it outperforms current scRNA-seq clustering algorithms on a wide range of benchmarking data sets.

Author summary: Advances in single-cell technologies that measure gene expression at the cellular level have provided an unprecedented opportunity to investigate cell type diversity (T cells, B cells, etc.) and cell state diversity (e.g., active T cells and exhausted T cells) within tissues and cancers. However, analyzing these complex, high-dimensional, and noisy data requires sophisticated tools to extract useful biological information and to visualize the data faithfully in a low-dimensional (2-D or 3-D) space. Existing computational methods for single-cell data, such as dimension reduction and clustering (grouping similar cells together), struggle to preserve both local group structure and global data geometry (the developmental relationships between cell types).
To tackle this problem, we developed a new analysis framework called scPMP (Single-Cell Path Metrics Profiling). It is based on a distance measure between cells that takes into account both the density of cells (common vs. rare cell types) and the overall structure of the data. On numerous real and simulated data sets, we have shown that scPMP preserves the natural grouping of cells and the relationships between groups better than existing methods. This improvement could lead to more accurate identification of cell types and states.
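The core computation named in the abstract, power-weighted path distances followed by multidimensional scaling, can be sketched as below. This is a minimal illustration under stated assumptions and not the authors' scPMP implementation. It assumes the common definition of the power-weighted path metric, l_p(x, y) = min over paths x = x_0, ..., x_m = y of (sum_i ||x_i - x_{i+1}||^p)^(1/p). As a computational shortcut, it restricts paths to a symmetrized k-nearest-neighbour graph, and it uses classical (Torgerson) MDS for the embedding. The helper names `path_metric` and `classical_mds` and the defaults p = 2 and k = 15 are hypothetical choices made for illustration. The preprocessing and clustering steps of scPMP are not reproduced here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial import cKDTree


def path_metric(X, p=2.0, k=15):
    """Power-weighted shortest-path distances between the rows of X.

    Hypothetical helper (not scPMP's code): paths are restricted to a
    symmetrized kNN graph, each edge (i, j) costs ||x_i - x_j||**p, and
    the cheapest total path cost is raised to the power 1/p.
    """
    n = X.shape[0]
    # Ask for k + 1 neighbours because each point's nearest neighbour is itself.
    dist, idx = cKDTree(X).query(X, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    costs = dist[:, 1:].ravel() ** p
    G = csr_matrix((costs, (rows, cols)), shape=(n, n))
    # Undirected graph: keep edge i-j if either point lists the other.
    G = G.maximum(G.T)
    # Dijkstra from every node; directed=False treats each edge as two-way.
    D = shortest_path(G, method="D", directed=False)
    if not np.isfinite(D).all():
        raise ValueError("kNN graph is disconnected; try a larger k")
    return D ** (1.0 / p)


def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS: embed a distance matrix in `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J  # double-centred squared distances (Gram matrix)
    evals, evecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:dim]  # indices of the `dim` largest eigenvalues
    scale = np.sqrt(np.clip(evals[top], 0.0, None))  # drop tiny negative values
    return evecs[:, top] * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a cells-by-features matrix (e.g. after PCA).
    cells = rng.normal(size=(200, 5))
    embedding = classical_mds(path_metric(cells, p=2.0, k=15), dim=2)
    print(embedding.shape)
```

Raising edge lengths to a power p > 1 makes a sequence of short hops through a dense region cheaper than a single long jump across a sparse gap. This is what makes the distance density sensitive. As p grows, the metric approaches the minimax (bottleneck) path distance. If the kNN graph is disconnected, some distances are infinite, so the sketch raises an error rather than passing them to the MDS step.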

Suggested Citation

  • Andriana Manousidaki & Anna Little & Yuying Xie, 2024. "Clustering and visualization of single-cell RNA-seq data using path metrics," PLOS Computational Biology, Public Library of Science, vol. 20(5), pages 1-19, May.
  • Handle: RePEc:plo:pcbi00:1012014
    DOI: 10.1371/journal.pcbi.1012014

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012014
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012014&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1012014?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kaiwen Wang & Yuqiu Yang & Fangjiang Wu & Bing Song & Xinlei Wang & Tao Wang, 2023. "Comparative analysis of dimension reduction methods for cytometry by time-of-flight data," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    2. Tayyebi, Javad & Mitra, Ankan & Sefair, Jorge A., 2023. "The continuous maximum capacity path interdiction problem," European Journal of Operational Research, Elsevier, vol. 305(1), pages 38-52.
    3. Hyun Kim & Won Chang & Seok Joo Chae & Jong-Eun Park & Minseok Seo & Jae Kyoung Kim, 2024. "scLENS: data-driven signal detection for unbiased scRNA-seq data analysis," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    4. Lingfei Wang, 2021. "Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    5. Guocheng Fang & Zhen Qiao & Luqi Huang & Hui Zhu & Jun Xie & Tian Zhou & Zhongshu Xiong & I-Hsin Su & Dayong Jin & Yu-Cheng Chen, 2024. "Single-cell laser emitting cytometry for label-free nucleolus fingerprinting," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    6. Songming Tang & Xuejian Cui & Rongxiang Wang & Sijie Li & Siyu Li & Xin Huang & Shengquan Chen, 2024. "scCASE: accurate and interpretable enhancement for single-cell chromatin accessibility sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    7. Rong Ma & Eric D. Sun & James Zou, 2023. "A spectral method for assessing and combining multiple data visualizations," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    8. Yingying Kang & Rajan Batta & Changhyun Kwon, 2014. "Value-at-Risk model for hazardous material transportation," Annals of Operations Research, Springer, vol. 222(1), pages 361-387, November.
    9. L. Mathur & B. Szalai & N. H. Du & R. Utharala & M. Ballinger & J. J. M. Landry & M. Ryckelynck & V. Benes & J. Saez-Rodriguez & C. A. Merten, 2022. "Combi-seq for multiplexed transcriptome-based profiling of drug combinations using deterministic barcoding in single-cell droplets," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    10. Benjamin L. Walker & Qing Nie, 2023. "NeST: nested hierarchical structure identification in spatial transcriptomic data," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    11. Gunnar Carlsson & Facundo Mémoli & Alejandro Ribeiro & Santiago Segarra, 2018. "Hierarchical clustering of asymmetric networks," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(1), pages 65-105, March.
    12. Erhan Erkut & Armann Ingolfsson, 2000. "Catastrophe Avoidance Models for Hazardous Materials Route Planning," Transportation Science, INFORMS, vol. 34(2), pages 165-179, May.
    13. Minhui Chen & Andy Dahl, 2024. "A robust model for cell type-specific interindividual variation in single-cell RNA sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    14. Jing Qi & Yang Zhou & Zicen Zhao & Shuilin Jin, 2021. "SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-20, June.
    15. Zhijian Li & Christoph Kuppe & Susanne Ziegler & Mingbo Cheng & Nazanin Kabgani & Sylvia Menzel & Martin Zenke & Rafael Kramann & Ivan G. Costa, 2021. "Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    16. Zhenchao Tang & Guanxing Chen & Shouzhi Chen & Jianhua Yao & Linlin You & Calvin Yu-Chian Chen, 2024. "Modal-nexus auto-encoder for multi-modality cellular data integration and imputation," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    17. Calvete, Herminia I. & del-Pozo, Lourdes & Iranzo, José A., 2018. "Dealing with residual energy when transmitting data in energy-constrained capacitated networks," European Journal of Operational Research, Elsevier, vol. 269(2), pages 602-620.
    18. George C. Linderman & Jun Zhao & Manolis Roulis & Piotr Bielecki & Richard A. Flavell & Boaz Nadler & Yuval Kluger, 2022. "Zero-preserving imputation of single-cell RNA-seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    19. Zhiyuan Yuan & Yisi Li & Minglei Shi & Fan Yang & Juntao Gao & Jianhua Yao & Michael Q. Zhang, 2022. "SOTIP is a versatile method for microenvironment modeling with spatial omics data," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    20. Scott R. Tyler & Daniel Lozano-Ojalvo & Ernesto Guccione & Eric E. Schadt, 2024. "Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq," Nature Communications, Nature, vol. 15(1), pages 1-15, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.