
Partial label learning for automated classification of single-cell transcriptomic profiles

Author

Listed:
  • Malek Senoussi
  • Thierry Artieres
  • Paul Villoutreix

Abstract

Single-cell RNA sequencing (scRNASeq) data plays a major role in advancing our understanding of developmental biology. An important current question is how to classify transcriptomic profiles obtained from scRNASeq experiments into the various cell types and identify the lineage relationship for individual cells. Because of the fast accumulation of datasets and the high dimensionality of the data, it has become challenging to explore and annotate single-cell transcriptomic profiles by hand. To overcome this challenge, automated classification methods are needed. Classical approaches rely on supervised training datasets. However, due to the difficulty of obtaining data annotated at single-cell resolution, we propose instead to take advantage of partial annotations. The partial label learning framework assumes that, for each data point, we can obtain a set of candidate labels that contains the correct one, a simpler setting than requiring a fully supervised training dataset. We study and, when needed, extend state-of-the-art multi-class classification methods, such as SVM, kNN, prototype-based, logistic regression and ensemble methods, to the partial label learning framework. Moreover, we study the effect of incorporating the structure of the label set into the methods. We focus particularly on the hierarchical structure of the labels, as commonly observed in developmental processes. We show, on simulated and real datasets, that these extensions make it possible to learn from partially labeled data and to predict with high accuracy, particularly with a nonlinear prototype-based method. We demonstrate that our methods, when trained on partially annotated data, reach the same performance as when trained on fully supervised data. Finally, we study the level of uncertainty present in the partially annotated data, and derive some prescriptive results on the effect of this uncertainty on the accuracy of the partial label learning methods.
Overall, our findings show how hierarchical and non-hierarchical partial label learning strategies can help solve the problem of automated classification of single-cell transcriptomic profiles. Notably, these methods rely on a much less stringent type of annotated dataset than fully supervised learning methods.

Author summary

Recent years have witnessed an exponential increase in the amount of single-cell RNASeq data generated, particularly in studies of development. One of the major challenges is to identify individual cell types within the data. Expert knowledge is required to identify the relevant marker genes, tissue and timing that enable cell type identification. This information can be difficult to obtain and calls for automated cell type classification approaches. Classical classification techniques would solve this problem by training a classifier on a fully supervised dataset. However, this only pushes the problem further, as a dataset annotated at single-cell resolution is still needed for training. Here we propose instead to take advantage of the partial label learning framework, which lets us train our classifiers on a set of candidate labels per transcriptomic profile. This approach overcomes the need for a training dataset annotated at single-cell resolution. We show that we obtain classification accuracy similar to the fully supervised case. Using simulated and real biological datasets, we explore the effect of varying the amount of partially labeled data and of incorporating the hierarchical structure of the label set (derived from the developmental processes) into the models.
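
To make the partial label setting concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a hypothetical toy lineage tree (`LEAVES_UNDER`), in which annotating a cell with an internal node yields the leaf cell types below that node as its candidate set, and it uses a simple k-nearest-neighbour vote in which each neighbour spreads one unit of weight uniformly over its candidates. The class names, the tree, the function names and the uniform weighting are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy hierarchy: each annotation node maps to the leaf cell
# types it could stand for. An internal node gives a partial label.
LEAVES_UNDER = {
    "root": {"neuron", "glia", "muscle"},
    "ectoderm": {"neuron", "glia"},
    "neuron": {"neuron"},
    "glia": {"glia"},
    "muscle": {"muscle"},
}
CLASSES = ["neuron", "glia", "muscle"]


def candidate_matrix(annotations):
    """Return S with S[i, k] = 1.0 if class k is a candidate for sample i."""
    S = np.zeros((len(annotations), len(CLASSES)))
    for i, node in enumerate(annotations):
        for k, cell_type in enumerate(CLASSES):
            S[i, k] = float(cell_type in LEAVES_UNDER[node])
    return S


def pl_knn_predict(X_train, S_train, X_test, k=3):
    """Partial-label kNN sketch: neighbours vote for all their candidates.

    Each neighbour contributes 1 / |candidate set| to every candidate
    label (an illustrative choice, not taken from the paper).
    """
    weights = S_train / S_train.sum(axis=1, keepdims=True)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = weights[nearest].sum(axis=1)  # shape (n_test, n_classes)
    return [CLASSES[j] for j in votes.argmax(axis=1)]


# Toy 2D "expression profiles": three well-separated groups of cells.
X_train = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # neurons
    [5.0, 0.0], [5.1, 0.0], [5.0, 0.1],   # glia
    [0.0, 5.0], [0.1, 5.0], [0.0, 5.1],   # muscle
])
# Most cells carry only a coarse (partial) annotation.
annotations = [
    "ectoderm", "neuron", "ectoderm",
    "ectoderm", "glia", "ectoderm",
    "root", "muscle", "muscle",
]
S_train = candidate_matrix(annotations)
X_test = np.array([[0.05, 0.05], [5.05, 0.05], [0.05, 5.05]])
print(pl_knn_predict(X_train, S_train, X_test, k=3))
```

In this toy example only a minority of the training cells carry an exact cell type, yet the ambiguous votes from coarsely annotated neighbours are disambiguated by the few precise ones. Real transcriptomic profiles are high dimensional and noisy, so this only illustrates the form of the training signal.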

Suggested Citation

  • Malek Senoussi & Thierry Artieres & Paul Villoutreix, 2024. "Partial label learning for automated classification of single-cell transcriptomic profiles," PLOS Computational Biology, Public Library of Science, vol. 20(4), pages 1-28, April.
  • Handle: RePEc:plo:pcbi00:1012006
    DOI: 10.1371/journal.pcbi.1012006

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012006
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012006&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wenyi Yang & Pingping Wang & Shouping Xu & Tao Wang & Meng Luo & Yideng Cai & Chang Xu & Guangfu Xue & Jinhao Que & Qian Ding & Xiyun Jin & Yuexin Yang & Fenglan Pang & Boran Pang & Yi Lin & Huan Nie, 2024. "Deciphering cell–cell communication at single-cell resolution for spatial transcriptomics with subgraph-based graph attention network," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    2. Zhiyuan Yuan & Yisi Li & Minglei Shi & Fan Yang & Juntao Gao & Jianhua Yao & Michael Q. Zhang, 2022. "SOTIP is a versatile method for microenvironment modeling with spatial omics data," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    3. Honglei Ren & Benjamin L. Walker & Zixuan Cang & Qing Nie, 2022. "Identifying multicellular spatiotemporal organization of cells with SpaceFlow," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    4. Zoe Piran & Mor Nitzan, 2024. "SiFT: uncovering hidden biological processes by probabilistic filtering of single-cell data," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    5. Md Tauhidul Islam & Jen-Yeu Wang & Hongyi Ren & Xiaomeng Li & Masoud Badiei Khuzani & Shengtian Sang & Lequan Yu & Liyue Shen & Wei Zhao & Lei Xing, 2022. "Leveraging data-driven self-consistency for high-fidelity gene expression recovery," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    6. Zhiyuan Liu & Dafei Wu & Weiwei Zhai & Liang Ma, 2023. "SONAR enables cell type deconvolution with spatially weighted Poisson-Gamma model for spatial transcriptomics," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    7. Domonkos Pogány & Péter Antal, 2024. "Towards explainable interaction prediction: Embedding biological hierarchies into hyperbolic interaction space," PLOS ONE, Public Library of Science, vol. 19(3), pages 1-23, March.
    8. Manuel Neumann & Xiaocai Xu & Cezary Smaczniak & Julia Schumacher & Wenhao Yan & Nils Blüthgen & Thomas Greb & Henrik Jönsson & Jan Traas & Kerstin Kaufmann & Jose M. Muino, 2022. "A 3D gene expression atlas of the floral meristem based on spatial reconstruction of single nucleus RNA sequencing data," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    9. Kai Cao & Qiyu Gong & Yiguang Hong & Lin Wan, 2022. "A unified computational framework for single-cell data integration with optimal transport," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    10. Qihuang Zhang & Shunzhou Jiang & Amelia Schroeder & Jian Hu & Kejie Li & Baohong Zhang & David Dai & Edward B. Lee & Rui Xiao & Mingyao Li, 2023. "Leveraging spatial transcriptomics data to recover cell locations in single-cell RNA-seq with CeLEry," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    11. Jingyang Qian & Hudong Bao & Xin Shao & Yin Fang & Jie Liao & Zhuo Chen & Chengyu Li & Wenbo Guo & Yining Hu & Anyao Li & Yue Yao & Xiaohui Fan & Yiyu Cheng, 2024. "Simulating multiple variability in spatially resolved transcriptomics with scCube," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    12. Yichun He & Xin Tang & Jiahao Huang & Jingyi Ren & Haowen Zhou & Kevin Chen & Albert Liu & Hailing Shi & Zuwan Lin & Qiang Li & Abhishek Aditham & Johain Ounadjela & Emanuelle I. Grody & Jian Shu & Ji, 2021. "ClusterMap for multi-scale clustering analysis of spatial gene expression," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    13. Clara Guijarro & Solène Song & Benoit Aigouy & Raphaël Clément & Paul Villoutreix & Robert G. Kelly, 2024. "Single-cell morphometrics reveals T-box gene-dependent patterns of epithelial tension in the Second Heart field," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
