Printed from https://ideas.repec.org/a/plo/pcbi00/1010025.html

Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders

Author

Listed:
  • Yuge Wang
  • Hongyu Zhao

Abstract

Advances in single-cell RNA sequencing (scRNA-seq) have led to successes in discovering novel cell types and understanding cellular heterogeneity among complex cell populations through cluster analysis. However, cluster analysis cannot reveal a continuous spectrum of states or the underlying gene expression programs (GEPs) shared across cell types. We introduce scAAnet, an autoencoder for single-cell non-linear archetypal analysis, to identify GEPs and infer the relative activity of each GEP across cells. We use a count distribution-based loss term to account for the sparsity and overdispersion of the raw count data and add an archetypal constraint to the loss function of scAAnet. We first show through simulations that scAAnet outperforms existing methods for archetypal analysis across different metrics. We then demonstrate the ability of scAAnet to extract biologically meaningful GEPs using publicly available scRNA-seq datasets, including a pancreatic islet dataset, a lung idiopathic pulmonary fibrosis dataset and a prefrontal cortex dataset.

Author summary

Single-cell RNA sequencing (scRNA-seq) techniques enable the profiling of gene expression at the single-cell level, and thus make it possible to uncover the cellular heterogeneity in a complex cell population composed of multiple cell types. Owing to the complexity of biological systems, different cell types may share underlying gene expression programs (GEPs) at different levels. However, such shared patterns are difficult to study by traditional cluster analysis. Based on the assumption that the expression profile of each cell results from a non-linear combination of multiple GEPs, we develop scAAnet, a deep learning model for non-linear archetypal decomposition of scRNA-seq data. We demonstrate that scAAnet both achieves better decomposition performance in simulated data and identifies biologically meaningful GEPs that are either cell-type-specific or disease-enriched in three real scRNA-seq datasets. To help interpret results from scAAnet, we also provide downstream analysis tools for the identification of program-specific marker genes. We expect that scAAnet can be applied to explore GEPs shared across cells when scRNA-seq is used to study a complex disease or biological system.
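
The two ingredients named in the abstract, an archetypal constraint and a count distribution-based loss, can be illustrated with a small sketch. This is not the scAAnet implementation. It assumes, for illustration only, that the archetypal constraint means each cell's GEP weights are non-negative and sum to one (obtained here with a softmax), that the mixing step is a plain linear combination rather than the model's non-linear decoder, and that the count distribution is negative binomial, which is one common choice for overdispersed counts. The abstract does not specify any of these details, and all function names and toy values below are hypothetical.

```python
import math


def softmax(scores):
    """Map K real-valued scores to non-negative weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_archetypes(weights, archetypes):
    """Convex combination of K archetype profiles, each a list of G gene means."""
    n_genes = len(archetypes[0])
    return [
        sum(w * profile[g] for w, profile in zip(weights, archetypes))
        for g in range(n_genes)
    ]


def nb_log_pmf(count, mean, dispersion):
    """Log-probability of `count` under a negative binomial distribution
    with the given mean and variance mean + mean**2 / dispersion."""
    mean = max(mean, 1e-10)  # guard against log(0)
    return (
        math.lgamma(count + dispersion)
        - math.lgamma(dispersion)
        - math.lgamma(count + 1)
        + dispersion * math.log(dispersion / (dispersion + mean))
        + count * math.log(mean / (dispersion + mean))
    )


def nb_loss(counts, means, dispersion):
    """Average negative log-likelihood over the genes of one cell."""
    total = sum(nb_log_pmf(c, m, dispersion) for c, m in zip(counts, means))
    return -total / len(counts)


# Hypothetical toy data: 2 archetypes (GEPs) x 3 genes, and one cell.
archetypes = [[10.0, 1.0, 0.5], [0.5, 8.0, 2.0]]
weights = softmax([1.0, -1.0])  # stands in for an encoder output
expected = mix_archetypes(weights, archetypes)
cell_counts = [7, 2, 0]
loss = nb_loss(cell_counts, expected, dispersion=2.0)
print(weights, expected, loss)
```

Because the weights lie on the simplex, the expected expression of every gene stays between the smallest and largest archetype value for that gene, which is what makes the weights readable as relative GEP activity. A smaller dispersion value gives the negative binomial a heavier tail, so the loss tolerates more variable counts.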

Suggested Citation

  • Yuge Wang & Hongyu Zhao, 2022. "Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders," PLOS Computational Biology, Public Library of Science, vol. 18(4), pages 1-31, April.
  • Handle: RePEc:plo:pcbi00:1010025
    DOI: 10.1371/journal.pcbi.1010025

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010025
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010025&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rafael Teixeira & Mário Antunes & Diogo Gomes & Rui L. Aguiar, 2024. "Comparison of Semantic Similarity Models on Constrained Scenarios," Information Systems Frontiers, Springer, vol. 26(4), pages 1307-1330, August.
    2. Del Corso, Gianna M. & Romani, Francesco, 2019. "Adaptive nonnegative matrix factorization and measure comparisons for recommender systems," Applied Mathematics and Computation, Elsevier, vol. 354(C), pages 164-179.
    3. P Fogel & C Geissler & P Cotte & G Luta, 2022. "Applying separative non-negative matrix factorization to extra-financial data," Working Papers hal-03689774, HAL.
    4. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    5. János Abonyi & Ádám Ipkovich & Gyula Dörgő & Károly Héberger, 2023. "Matrix factorization-based multi-objective ranking–What makes a good university?," PLOS ONE, Public Library of Science, vol. 18(4), pages 1-30, April.
    6. Naiyang Guan & Lei Wei & Zhigang Luo & Dacheng Tao, 2013. "Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-10, October.
    7. Spelta, A. & Pecora, N. & Rovira Kaltwasser, P., 2019. "Identifying Systemically Important Banks: A temporal approach for macroprudential policies," Journal of Policy Modeling, Elsevier, vol. 41(1), pages 197-218.
    8. M. Moghadam & K. Aminian & M. Asghari & M. Parnianpour, 2013. "How well do the muscular synergies extracted via non-negative matrix factorisation explain the variation of torque at shoulder joint?," Computer Methods in Biomechanics and Biomedical Engineering, Taylor & Francis Journals, vol. 16(3), pages 291-301.
    9. Ethan Bahl & Snehajyoti Chatterjee & Utsav Mukherjee & Muhammad Elsadany & Yann Vanrobaeys & Li-Chun Lin & Miriam McDonough & Jon Resch & K. Peter Giese & Ted Abel & Jacob J. Michaelson, 2024. "Using deep learning to quantify neuronal activation from single-cell and spatial transcriptomic data," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    10. Markovsky, Ivan & Niranjan, Mahesan, 2010. "Approximate low-rank factorization with structured factors," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3411-3420, December.
    11. Paul Fogel & Yann Gaston-Mathé & Douglas Hawkins & Fajwel Fogel & George Luta & S. Stanley Young, 2016. "Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health," IJERPH, MDPI, vol. 13(5), pages 1-14, May.
    12. Le Thi Khanh Hien & Duy Nhat Phan & Nicolas Gillis, 2022. "Inertial alternating direction method of multipliers for non-convex non-smooth optimization," Computational Optimization and Applications, Springer, vol. 83(1), pages 247-285, September.
    13. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    14. Jonathan M Werner & Jesse Gillis, 2024. "Meta-analysis of single-cell RNA sequencing co-expression in human neural organoids reveals their high variability in recapitulating primary tissue," PLOS Biology, Public Library of Science, vol. 22(12), pages 1-34, December.
    15. Chae, Bongsug (Kevin), 2018. "The Internet of Things (IoT): A Survey of Topics and Trends using Twitter Data and Topic Modeling," 22nd ITS Biennial Conference, Seoul 2018. Beyond the boundaries: Challenges for business, policy and society 190376, International Telecommunications Society (ITS).
    16. Md Nazrul Islam & Md Mofazzal Hossain & Md Shafayet Shahed Ornob, 2024. "Business research on Industry 4.0: a systematic review using topic modelling approach," Future Business Journal, Springer, vol. 10(1), pages 1-15, December.
    17. Jingfeng Guo & Chao Zheng & Shanshan Li & Yutong Jia & Bin Liu, 2022. "BiInfGCN: Bilateral Information Augmentation of Graph Convolutional Networks for Recommendation," Mathematics, MDPI, vol. 10(17), pages 1-16, August.
    18. Jianfei Cao & Han Yang & Jianshu Lv & Quanyuan Wu & Baolei Zhang, 2023. "Estimating Soil Salinity with Different Levels of Vegetation Cover by Using Hyperspectral and Non-Negative Matrix Factorization Algorithm," IJERPH, MDPI, vol. 20(4), pages 1-15, February.
    19. Wang, Ketong & Porter, Michael D., 2018. "Optimal Bayesian clustering using non-negative matrix factorization," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 395-411.
    20. Lei, Da & Cheng, Long & Wang, Pengfei & Chen, Xuewu & Zhang, Lin, 2024. "Identifying service bottlenecks in public bikesharing flow networks," Journal of Transport Geography, Elsevier, vol. 116(C).
