Authors listed:
- Saishi Cui
- Sina Nassiri
- Issa Zakeri
Abstract
Single-cell RNA sequencing (scRNA-seq) data analysis faces numerous challenges, including high sparsity, a high-dimensional feature space, and biological noise. These challenges hinder downstream analysis, necessitating feature selection methods that identify informative genes and reduce data dimensionality. However, existing methods for selecting highly variable genes (HVGs) show limited overlap with one another and inconsistent clustering performance across benchmark datasets. Moreover, these methods often struggle to select HVGs accurately from fine-resolution scRNA-seq datasets and from minority cell types, which are harder to distinguish, raising concerns about the reliability of their results. To overcome these limitations, we propose Mcadet, a novel feature selection framework for scRNA-seq data. Mcadet integrates Multiple Correspondence Analysis (MCA), graph-based community detection, and a novel statistical testing approach. To assess the effectiveness of Mcadet, we conducted extensive evaluations on both simulated and real-world data, using unbiased metrics for comparison. Our results demonstrate the superior performance of Mcadet in selecting HVGs from fine-resolution scRNA-seq datasets and from datasets containing minority cell populations. Overall, we demonstrate that Mcadet enhances the reliability of selected HVGs, although the impact of HVG selection on various downstream analyses varies and needs further investigation.
Author summary
scRNA-seq brings both great opportunities and challenges for transcriptomic analysis. While scRNA-seq enables the characterization of cell heterogeneity at unprecedented resolution, analytical issues such as sparsity, noise, and bias can severely compromise interpretation if not properly addressed. Effective feature selection is therefore critical for extracting meaningful biological signals. We propose Mcadet, a novel framework for feature selection in scRNA-seq data. Mcadet aims to accurately identify informative genes in fine-resolution datasets and in datasets with minority cell types, where existing methods falter.
Suggested Citation
Saishi Cui & Sina Nassiri & Issa Zakeri, 2024.
"Mcadet: A feature selection method for fine-resolution single-cell RNA-seq data based on multiple correspondence analysis and community detection,"
PLOS Computational Biology, Public Library of Science, vol. 20(10), pages 1-39, October.
Handle: RePEc:plo:pcbi00:1012560
DOI: 10.1371/journal.pcbi.1012560