Author
Listed:
- Hui Yuan
- Mingzhu Liu
- Yushan Qiu
- Wai-Ki Ching
- Quan Zou
Abstract
The development of single-cell multi-omics sequencing technologies has enabled the simultaneous analysis of multiple omics layers within the same cell. Accurate clustering of these cells is crucial for downstream analyses of complex biological functions. Despite significant advances in multi-omics integration approaches, current methodologies exhibit two major limitations. First, they inadequately incorporate prior biological knowledge from the various omic layers. Second, they often perform dimensionality reduction independently on each omic dataset, thereby failing to capture the intrinsic complementary information and potentially overlooking crucial cross-platform interactions. Motivated by these limitations, this study investigates a non-negative matrix factorization model called PLNMFG, which integrates unified latent representation learning, which retains the features between and within omics, and cluster structure learning, which retains the intrinsic structure of the data, into one joint framework. Specifically, PLNMFG performs adaptive imputation to handle dropout events and uses prior pseudo-labels as constraints during collective non-negative matrix factorization; as a result, it obtains a more robust latent representation that preserves the double similarity information. A graph Laplacian constraint is applied during clustering, which further preserves the structural characteristics of the multi-omics data. In addition, the weight of each omic is learned adaptively based on that omic's contribution. A series of experiments on 8 benchmark datasets shows that the model performs well in terms of clustering accuracy and computational efficiency.
Author summary
With the rapid advancement of biotechnology, we can obtain single-cell multi-omics data including genomics, transcriptomics, epigenomics, proteomics, and metabolomics. Single-cell clustering based on these omics data helps to reveal cell heterogeneity, enabling more precise analysis of the human body at the level of individual cells and thereby advancing our understanding of human systems. However, because single-cell multi-omics data are high-dimensional and sparse, clustering performance is generally poor. In this paper, a pseudo-label guided non-negative matrix factorization model with graph constraint (PLNMFG) is proposed for analyzing single-cell multi-omics data. It is the first to integrate pseudo-labels, imputation, and clustering on the basis of non-negative matrix factorization, and it carries out these tasks simultaneously in a unified manner. PLNMFG combines imputation techniques with non-negative matrix factorization to further enhance clustering accuracy. It applies an adaptive omics weighting strategy to match the importance of each omic layer, giving more influence to critical omics during clustering. PLNMFG also employs a collective matrix decomposition method based on pseudo-label constraints and thus avoids the computationally intensive feature decomposition and similarity graph construction of traditional approaches. Furthermore, PLNMFG applies manifold constraints during clustering to further preserve the data structure. It learns the latent representation and the clustering structure simultaneously in the same framework, making the latent representation more suitable for clustering. Experimental results on eight different datasets indicate that PLNMFG achieves outstanding clustering performance, validating its effectiveness and generalization ability.
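The abstract describes a collective non-negative matrix factorization with a shared latent representation and a graph Laplacian constraint. The following is a minimal sketch of that general idea only, not the authors' PLNMFG implementation: it omits the pseudo-label constraints, the adaptive imputation, and the adaptive omic weighting, and it uses the standard multiplicative updates for graph-regularized NMF. The function names (knn_affinity, graph_collective_nmf), the fixed uniform weights, the parameter lam, and the k-nearest-neighbour graph are all assumptions made for illustration. In particular, the summary states that PLNMFG avoids explicit similarity graph construction, so the graph used here is a stand-in.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Symmetric binary k-nearest-neighbour affinity over the columns (cells) of X.
    Hypothetical helper: a stand-in graph, not the construction used by PLNMFG."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                     # exclude self-neighbours
    idx = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                          # symmetrize

def objective(views, Ws, H, A, weights, lam):
    """sum_v w_v * ||X_v - W_v H||_F^2 + lam * tr(H L H^T), with L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    fit = sum(w * np.linalg.norm(X - W @ H, "fro") ** 2
              for w, X, W in zip(weights, views, Ws))
    return fit + lam * np.trace(H @ L @ H.T)

def graph_collective_nmf(views, A, rank, weights=None, lam=1.0,
                         n_iter=200, eps=1e-10, seed=0):
    """Factor each non-negative view X_v (features_v x cells) as W_v @ H with one
    shared H (rank x cells), regularized by the graph Laplacian of A."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    # Assumption: fixed weights (uniform by default); the paper learns them adaptively.
    weights = np.ones(len(views)) if weights is None else np.asarray(weights, dtype=float)
    D = np.diag(A.sum(axis=1))
    Ws = [rng.random((X.shape[0], rank)) for X in views]
    H = rng.random((rank, n))
    history = [objective(views, Ws, H, A, weights, lam)]
    for _ in range(n_iter):
        # Multiplicative update of each view-specific basis W_v.
        Ws = [W * (X @ H.T) / (W @ (H @ H.T) + eps) for X, W in zip(views, Ws)]
        # Multiplicative update of the shared representation H.
        num = sum(w * W.T @ X for w, X, W in zip(weights, views, Ws)) + lam * H @ A
        den = sum(w * W.T @ W @ H for w, W in zip(weights, Ws)) + lam * H @ D + eps
        H = H * num / den
        history.append(objective(views, Ws, H, A, weights, lam))
    return Ws, H, history

# Usage on synthetic data: two "omics" views that share one latent structure.
rng = np.random.default_rng(1)
H_true = rng.random((3, 60))
views = [rng.random((40, 3)) @ H_true, rng.random((25, 3)) @ H_true]
A = knn_affinity(np.vstack(views), k=5)
Ws, H, history = graph_collective_nmf(views, A, rank=3)
labels = H.argmax(axis=0)   # crude cluster assignment: dominant latent factor per cell
```

Sharing a single H across views is what makes the factorization collective: every omic layer is reconstructed from the same per-cell representation, so the graph term only needs to be applied once, to H.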
Suggested Citation
Hui Yuan & Mingzhu Liu & Yushan Qiu & Wai-Ki Ching & Quan Zou, 2025.
"PLNMFG: Pseudo-label guided non-negative matrix factorization model with graph constraint for single-cell multi-omics data clustering,"
PLOS Computational Biology, Public Library of Science, vol. 21(8), pages 1-17, August.
Handle:
RePEc:plo:pcbi00:1013375
DOI: 10.1371/journal.pcbi.1013375