
Binomial models uncover biological variation during feature selection of droplet-based single-cell RNA sequencing

Authors

  • Breanne Sparta
  • Timothy Hamilton
  • Gunalan Natesan
  • Samuel D Aragones
  • Eric J Deeds

Abstract

Effective analysis of single-cell RNA sequencing (scRNA-seq) data requires a rigorous distinction between technical noise and biological variation. In this work, we propose a simple feature selection model, termed “Differentially Distributed Genes” or DDGs, where a binomial sampling process for each mRNA species produces a null model of technical variation. Using scRNA-seq data where cell identities have been established a priori, we find that the DDG model of biological variation outperforms existing methods. We demonstrate that DDGs distinguish a validated set of real biologically varying genes, minimize neighborhood distortion, and enable accurate partitioning of cells into their established cell-type groups.

Author summary: Single-cell omics technologies measure tens of thousands of genes in up to millions of individual cells. Yet the sheer dimensionality of the data makes it difficult to interpret. A typical first step in reducing the dimensionality is to apply a feature selection model that distinguishes real biological signals from technical noise. Without an appropriate model of technical noise, however, feature selection can introduce bias into the downstream analysis of the data. In this work, we demonstrate that, in the analysis of single-cell RNA sequencing data, the standard approach of finding Highly Variable Genes (HVGs) introduces severe distortion and bias relative to true biological variation that is known a priori. To address this issue, we present a new feature selection model and demonstrate that it outperforms existing methods in its ability to accurately identify real biological variation.
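
The idea of a binomial null model for technical variation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `zero_count_z_scores`, the choice of the number of zero-count cells as the test statistic, the normal approximation, and the simulated data are all assumptions made here for illustration. Under the null, each gene's count in a cell is a binomial draw whose number of trials is that cell's total UMI count and whose success probability is the gene's pooled share of all UMIs. A gene with far more zero-count cells than this null predicts is a candidate for real biological variation.

```python
import numpy as np


def zero_count_z_scores(counts):
    """Score each gene by its excess of zero counts over a binomial null.

    counts: (n_cells, n_genes) array of non-negative integer UMI counts.
    Returns (observed_zeros, expected_zeros, z), one entry per gene.
    """
    counts = np.asarray(counts, dtype=np.int64)
    depth = counts.sum(axis=1)                 # total UMIs captured per cell
    p_gene = counts.sum(axis=0) / depth.sum()  # pooled UMI fraction per gene

    # Null: counts[c, g] ~ Binomial(depth[c], p_gene[g]),
    # so P(counts[c, g] == 0) = (1 - p_gene[g]) ** depth[c].
    p_zero = np.exp(depth[:, None] * np.log1p(-p_gene)[None, :])

    # The number of zero-count cells is a sum of independent Bernoulli
    # variables, which gives its mean and variance under the null.
    expected = p_zero.sum(axis=0)
    variance = (p_zero * (1.0 - p_zero)).sum(axis=0)
    observed = (counts == 0).sum(axis=0)
    z = (observed - expected) / np.sqrt(np.maximum(variance, 1e-12))
    return observed, expected, z


def make_profile(base, level):
    """Copy an expression profile, set gene 0 to `level`, and renormalize."""
    profile = base.copy()
    profile[0] = level
    return profile / profile.sum()


# Simulated data (hypothetical): two cell types that differ only in gene 0,
# which is expressed in type A and absent in type B.
rng = np.random.default_rng(0)
n_cells, n_genes = 400, 100
base = rng.dirichlet(np.ones(n_genes))
profile_a = make_profile(base, 0.01)
profile_b = make_profile(base, 0.0)

depth_sim = rng.integers(200, 1000, size=n_cells)
is_type_a = np.arange(n_cells) < n_cells // 2
counts = np.stack([
    rng.multinomial(d, profile_a if a else profile_b)
    for d, a in zip(depth_sim, is_type_a)
])

observed, expected, z = zero_count_z_scores(counts)
top = int(np.argmax(z))
print(f"gene with largest excess of zeros: {top} (z = {z[top]:.1f})")
```

In this simulation every gene other than gene 0 follows essentially the same sampling process in both cell types, so only gene 0 should stand out. With real data, a threshold on the score or a multiple-testing correction would still have to be chosen, and that choice is outside the scope of this sketch.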

Suggested Citation

  • Breanne Sparta & Timothy Hamilton & Gunalan Natesan & Samuel D Aragones & Eric J Deeds, 2024. "Binomial models uncover biological variation during feature selection of droplet-based single-cell RNA sequencing," PLOS Computational Biology, Public Library of Science, vol. 20(9), pages 1-31, September.
  • Handle: RePEc:plo:pcbi00:1012386
    DOI: 10.1371/journal.pcbi.1012386

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012386
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012386&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Azka Javaid & Hildreth Robert Frost, 2023. "STREAK: A supervised cell surface receptor abundance estimation strategy for single cell RNA-sequencing data using feature selection and thresholded gene set scoring," PLOS Computational Biology, Public Library of Science, vol. 19(8), pages 1-24, August.
    2. repec:plo:pcbi00:1007925 is not listed on IDEAS
    3. Grace Yee Lin Ng & Shing Chiang Tan & Chia Sui Ong, 2023. "On the use of QDE-SVM for gene feature selection and cell type classification from scRNA-seq data," PLOS ONE, Public Library of Science, vol. 18(10), pages 1-22, October.
    4. Svenja Gramberg & Oliver Puckelwaldt & Tobias Schmitt & Zhigang Lu & Simone Haeberlein, 2024. "Spatial transcriptomics of a parasitic flatworm provides a molecular map of drug targets and drug resistance genes," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    5. Snehalika Lall & Sumanta Ray & Sanghamitra Bandyopadhyay, 2022. "A copula based topology preserving graph convolution network for clustering of single-cell RNA-seq data," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-16, March.
    6. Qunlun Shen & Shihua Zhang, 2021. "Approximate distance correlation for selecting highly interrelated genes across datasets," PLOS Computational Biology, Public Library of Science, vol. 17(11), pages 1-18, November.
    7. Jinzhou Li & Marloes H. Maathuis, 2021. "GGM knockoff filter: False discovery rate control for Gaussian graphical models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 534-558, July.
    8. Lin Lin & Wei Shi & Jianbo Ye & Jia Li, 2023. "Multisource single‐cell data integration by MAW barycenter for Gaussian mixture models," Biometrics, The International Biometric Society, vol. 79(2), pages 866-877, June.
