
SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

Author

Listed:
  • Jing Qi
  • Yang Zhou
  • Zicen Zhao
  • Shuilin Jin

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at single-cell resolution and provide a tool for exploring cell heterogeneity and cell types. Because only a small amount of mRNA is extracted from each cell, scRNA-seq data exhibit a large number of dropouts, which hinder downstream analysis. We propose a statistical method, SDImpute (Single-cell RNA-seq Dropout Imputation), to implement block imputation for dropout events in scRNA-seq data. SDImpute automatically identifies the dropout events based on the gene expression levels and the variations of gene expression across similar cells and similar genes, and it implements block imputation for dropouts by utilizing gene expression unaffected by dropouts from similar cells. Results on simulated and real datasets suggest that SDImpute is an effective tool to recover the data and preserve the heterogeneity of gene expression across cells. Compared with the state-of-the-art imputation methods, SDImpute improves the accuracy of downstream analysis, including clustering, visualization, and differential expression analysis.

Author summary: Single-cell RNA sequencing (scRNA-seq) allows researchers to analyze gene expression of thousands of single cells simultaneously. However, the low amount of extracted mRNA leads to a large number of dropout events, which introduce computational challenges and hinder downstream analysis of the data. To address this problem, we developed SDImpute, a novel statistical method that recovers scRNA-seq data based on cell-level and gene-level information. The goal of our algorithm is to denoise the scRNA-seq data while maintaining the biological nature of gene expression. Combining SDImpute with downstream analysis tools, we used matched bulk expression data and known cell labels of the scRNA-seq data as criteria to design experiments that validate the performance of our method on both simulated and real datasets. Moreover, we offer an R package with detailed instructions and an example input dataset. We hope that SDImpute will help researchers identify the mechanisms underlying biological processes through analysis of scRNA-seq data.
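
The general idea in the abstract (flag a zero as a likely dropout when similar cells express the gene, then fill it from those cells' non-zero values) can be illustrated with a toy sketch. This is not the SDImpute algorithm: it ignores gene-level information, block structure, and the statistical dropout identification the authors describe. The function name `impute_dropouts`, the parameters `k` and `min_expr`, the majority rule, and the toy matrix are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def impute_dropouts(expr, k=2, min_expr=1.0):
    """Toy neighbor-based imputation on a cells x genes matrix (list of lists).

    Hypothetical rule, NOT the published method: a zero is treated as a
    dropout only if more than half of the k most similar cells express the
    gene and their mean expression is at least min_expr.
    """
    n_cells = len(expr)
    imputed = [row[:] for row in expr]  # leave the input untouched
    for i, cell in enumerate(expr):
        # The k most correlated other cells act as "similar cells".
        neighbors = sorted(
            (j for j in range(n_cells) if j != i),
            key=lambda j: pearson(cell, expr[j]),
            reverse=True,
        )[:k]
        for g, value in enumerate(cell):
            if value != 0:
                continue  # only zeros are dropout candidates
            # Use only neighbor values that are themselves non-zero.
            observed = [expr[j][g] for j in neighbors if expr[j][g] > 0]
            if len(observed) * 2 > len(neighbors):
                mean_obs = sum(observed) / len(observed)
                if mean_obs >= min_expr:
                    imputed[i][g] = mean_obs
    return imputed


# Toy data: rows are cells, columns are genes; two groups of three cells.
expr = [
    [5, 0, 3, 0],  # gene 1 is zero here although similar cells express it
    [6, 4, 3, 0],
    [5, 5, 4, 0],
    [0, 0, 0, 7],
    [0, 1, 0, 8],
    [1, 0, 0, 6],
]
imputed = impute_dropouts(expr, k=2)
print(imputed[0])
```

In this toy matrix only the zero for gene 1 in the first cell is filled, because both of its nearest neighbors express that gene; zeros that are shared across similar cells are left as they are, which is the kind of heterogeneity-preserving behavior the abstract emphasizes.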

Suggested Citation

  • Jing Qi & Yang Zhou & Zicen Zhao & Shuilin Jin, 2021. "SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data," PLOS Computational Biology, Public Library of Science, vol. 17(6), pages 1-20, June.
  • Handle: RePEc:plo:pcbi00:1009118
    DOI: 10.1371/journal.pcbi.1009118

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009118
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009118&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1009118?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Gökcen Eraslan & Lukas M. Simon & Maria Mircea & Nikola S. Mueller & Fabian J. Theis, 2019. "Single-cell RNA-seq denoising using a deep count autoencoder," Nature Communications, Nature, vol. 10(1), pages 1-14, December.
    2. J. Gray Camp & Keisuke Sekine & Tobias Gerber & Henry Loeffler-Wirth & Hans Binder & Malgorzata Gac & Sabina Kanton & Jorge Kageyama & Georg Damm & Daniel Seehofer & Lenka Belicova & Marc Bickle & Ric, 2017. "Multilineage communication regulates human liver bud development from pluripotency," Nature, Nature, vol. 546(7659), pages 533-538, June.
    3. Wang, Xiaogang & Qiu, Weiliang & Zamar, Ruben H., 2007. "CLUES: A non-parametric clustering method based on local shrinking," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 286-298, September.
    4. Catalina A Vallejos & John C Marioni & Sylvia Richardson, 2015. "BASiCS: Bayesian Analysis of Single-Cell Sequencing Data," PLOS Computational Biology, Public Library of Science, vol. 11(6), pages 1-18, June.
    5. Wei Vivian Li & Jingyi Jessica Li, 2018. "An accurate and robust imputation method scImpute for single-cell RNA-seq data," Nature Communications, Nature, vol. 9(1), pages 1-9, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lingfei Wang, 2021. "Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    2. Zhijian Li & Christoph Kuppe & Susanne Ziegler & Mingbo Cheng & Nazanin Kabgani & Sylvia Menzel & Martin Zenke & Rafael Kramann & Ivan G. Costa, 2021. "Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    3. George C. Linderman & Jun Zhao & Manolis Roulis & Piotr Bielecki & Richard A. Flavell & Boaz Nadler & Yuval Kluger, 2022. "Zero-preserving imputation of single-cell RNA-seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    4. Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    5. Hui Li & Cory R. Brouwer & Weijun Luo, 2022. "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    6. Ethan Bahl & Snehajyoti Chatterjee & Utsav Mukherjee & Muhammad Elsadany & Yann Vanrobaeys & Li-Chun Lin & Miriam McDonough & Jon Resch & K. Peter Giese & Ted Abel & Jacob J. Michaelson, 2024. "Using deep learning to quantify neuronal activation from single-cell and spatial transcriptomic data," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    7. Chang, Fang & Qiu, Weiliang & Zamar, Ruben H. & Lazarus, Ross & Wang, Xiaogang, 2010. "clues: An R Package for Nonparametric Clustering Based on Local Shrinking," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i04).
    8. Qiu Weiliang & He Wenqing & Wang Xiaogang & Lazarus Ross, 2008. "A Marginal Mixture Model for Selecting Differentially Expressed Genes across Two Types of Tissue Samples," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-28, October.
    9. Ziqi Zhang & Xinye Zhao & Mehak Bindra & Peng Qiu & Xiuwei Zhang, 2024. "scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    10. Nicolae Sapoval & Amirali Aghazadeh & Michael G. Nute & Dinler A. Antunes & Advait Balaji & Richard Baraniuk & C. J. Barberan & Ruth Dannenfelser & Chen Dun & Mohammadamin Edrisi & R. A. Leo Elworth &, 2022. "Current progress and open challenges for applying deep learning across the biosciences," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Kaiwen Wang & Yuqiu Yang & Fangjiang Wu & Bing Song & Xinlei Wang & Tao Wang, 2023. "Comparative analysis of dimension reduction methods for cytometry by time-of-flight data," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    12. Benjamin L. Walker & Qing Nie, 2023. "NeST: nested hierarchical structure identification in spatial transcriptomic data," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    13. Ian Covert & Rohan Gala & Tim Wang & Karel Svoboda & Uygar Sümbül & Su-In Lee, 2023. "Predictive and robust gene selection for spatial transcriptomics," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    14. Xin Tang & Jiawei Zhang & Yichun He & Xinhe Zhang & Zuwan Lin & Sebastian Partarrieu & Emma Bou Hanna & Zhaolin Ren & Hao Shen & Yuhong Yang & Xiao Wang & Na Li & Jie Ding & Jia Liu, 2023. "Explainable multi-task learning for multi-modality biological data analysis," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    15. Wenwei Sun & Meimei Wang & Jun Zhao & Shuang Zhao & Wenchao Zhu & Xiaoting Wu & Feifei Li & Wei Liu & Zhuo Wang & Meng Gao & Yiyue Zhang & Jin Xu & Meijia Zhang & Qiang Wang & Zilong Wen & Juan Shen &, 2023. "Sulindac selectively induces autophagic apoptosis of GABAergic neurons and alters motor behaviour in zebrafish," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    16. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    17. Lucy Xia & Christy Lee & Jingyi Jessica Li, 2024. "Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    18. Vidhya M. Ravi & Nicolas Neidert & Paulina Will & Kevin Joseph & Julian P. Maier & Jan Kückelhaus & Lea Vollmer & Jonathan M. Goeldner & Simon P. Behringer & Florian Scherer & Melanie Boerries & Marie, 2022. "T-cell dysfunction in the glioblastoma microenvironment is mediated by myeloid cells releasing interleukin-10," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    19. Bécue-Bertaut, Monica & Pagès, Jérome, 2008. "Multiple factor analysis and clustering of a mixture of quantitative, categorical and frequency data," Computational Statistics & Data Analysis, Elsevier, vol. 52(6), pages 3255-3268, February.
    20. Fraiman, Ricardo & Justel, Ana & Svarc, Marcela, 2010. "Pattern recognition via projection-based kNN rules," Computational Statistics & Data Analysis, Elsevier, vol. 54(5), pages 1390-1403, May.
