Authors: Hong Seo Lim, Peng Qiu
Abstract
Among existing computational algorithms for single-cell RNA-seq analysis, clustering and trajectory inference are two major types of analysis that are routinely applied. For a given dataset, clustering and trajectory inference can generate vastly different visualizations that lead to very different interpretations of the data. To address this issue, we propose multiple scores to quantify the “clusterness” and “trajectoriness” of single-cell RNA-seq data, in other words, whether the data looks like a collection of distinct clusters or a continuous progression trajectory. The scores we introduce are based on pairwise distance distribution, persistent homology, vector magnitude, Ripley’s K, and degrees of connectivity. Using simulated datasets, we demonstrate that the proposed scores effectively differentiate between cluster-like data and trajectory-like data. Using real single-cell RNA-seq datasets, we demonstrate that the scores can serve as indicators of whether clustering analysis or trajectory inference is the more appropriate choice for biological interpretation of the data.

Author summary: Single-cell sequencing technologies have motivated the development of numerous computational algorithms. Two main types of these algorithms are clustering and trajectory inference. When scientists have an scRNA-seq dataset, they usually pick one of these approaches based on what they think the data shows. If they think the data has distinct clusters of cells, they analyze it using clustering algorithms. If they think the data shows a continuous progression, they use trajectory inference algorithms. However, applying clustering and trajectory inference to the same data can sometimes lead to very different interpretations: clustering algorithms produce distinct cell clusters, while trajectory inference on the same data shows continuous trajectories. This makes us wonder: which way of looking at the data is more appropriate?
In this paper, we developed a pipeline for quantifying the “clusterness” and “trajectoriness” of scRNA-seq data, in other words, whether the data looks like a collection of distinct clusters or a continuous progression trajectory. We believe such geometric quantification raises an important question that deserves broad discussion in the single-cell research community.
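The abstract lists Ripley’s K among the ingredients of the proposed scores but does not define how the score is built from it. As a hedged illustration of what the statistic itself measures, and not the paper’s score, the sketch below computes a textbook naive Ripley’s K estimator (no edge correction) on two hypothetical 2-D toy point sets: two tight blobs (cluster-like) and points scattered along a line segment (trajectory-like). The function name `ripley_k`, the unit-square area, and all toy data are assumptions made for this example.

```python
import math
import random
from itertools import combinations


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate for 2-D points (no edge correction).

    K(r) = area / (n * (n - 1)) * (number of ordered pairs i != j with d_ij <= r)
    """
    n = len(points)
    # Distances over unordered pairs; each pair stands for two ordered pairs.
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    scale = area / (n * (n - 1))
    return {r: scale * 2 * sum(d <= r for d in dists) for r in radii}


random.seed(0)
# Hypothetical toy data in the unit square (illustration only, not scRNA-seq).
# "Cluster-like": two tight Gaussian blobs far apart.
blobs = [(random.gauss(cx, 0.02), random.gauss(cy, 0.02))
         for cx, cy in [(0.25, 0.25), (0.75, 0.75)]
         for _ in range(100)]
# "Trajectory-like": points scattered along the segment from (0.1, 0.1) to (0.9, 0.9).
line = [(t, t) for t in (random.uniform(0.1, 0.9) for _ in range(200))]

radii = [0.1, 0.2, 0.4]
k_blobs = ripley_k(blobs, radii, area=1.0)
k_line = ripley_k(line, radii, area=1.0)
for r in radii:
    print(f"r={r}: K_blobs={k_blobs[r]:.3f}  K_line={k_line[r]:.3f}  "
          f"pi*r^2={math.pi * r * r:.3f}")
```

On this toy data, K for the blobs levels off once r exceeds the blob width and stays flat until r approaches the gap between blobs, whereas K for the line keeps increasing with r, as expected for points spread along a one-dimensional structure. Both sit well above the complete-spatial-randomness reference of πr² at small r, which is why the shape of K across scales, rather than a single value, is what separates the two patterns here.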
Suggested Citation
Hong Seo Lim & Peng Qiu, 2024.
"Quantifying the clusterness and trajectoriness of single-cell RNA-seq data,"
PLOS Computational Biology, Public Library of Science, vol. 20(2), pages 1-19, February.
Handle: RePEc:plo:pcbi00:1011866
DOI: 10.1371/journal.pcbi.1011866