scHG: A supercell framework with high-order graph learning enables scalable multi-omics analysis

Authors

Yixiang Huang
Yuan Gan
Xinqi Gong

Abstract

Multi-omics profiling, which spans proteomics, transcriptomics, and additional omics data types, is advancing rapidly and providing increasingly detailed maps of cellular identity and function. Yet identifying rare cell populations while maintaining computational tractability remains a major challenge in large-scale multi-omics clustering. Here, we introduce the supercell paradigm, in which expression-coherent cells are grouped into intermediate units that preserve weak but biologically meaningful local structure across omics layers, thereby improving sensitivity to rare populations that are often masked at the conventional cluster level. Supercells are constructed using angle-aware similarity metrics and second-order co-occurrence neighbors, with impurity cells pruned by degree centrality. Building on this idea, we develop scHG, a high-order graph learning framework with an omics-weighted optimizer that adaptively balances contributions from gene expression, surface proteins, and chromatin accessibility while remaining scalable on large datasets through sparse matrix optimization and iterative graph refinement. Across six benchmark datasets (up to 30,672 cells), scHG consistently outperforms state-of-the-art methods, improving mean adjusted Rand index (ARI) and normalized mutual information (NMI) by 3.97% and 3.54%, respectively, while reducing runtime by 26.40%. Beyond overall clustering accuracy, scHG resolves fine-grained heterogeneity within conventionally defined T-cell populations and, importantly, uncovers rare populations, including dendritic-cell populations and NK-like B cells, that remain hidden under standard clustering pipelines. These results demonstrate that supercells provide not only an efficient intermediate representation for large-scale multi-omics integration, but also a practical mechanism for rare-cell detection.

Author summary

Modern single-cell technologies can measure multiple molecular layers from the same cell, such as RNA, surface proteins, and chromatin accessibility. These rich “multi-omics” profiles promise a more complete view of cellular identity, but they also create a practical bottleneck: existing methods can be slow on large datasets and often miss rare yet important cell populations. We present scHG, a fast and accurate framework that compresses many similar cells into intermediate units called “supercells” and then learns relationships among supercells using a high-order graph model. This design keeps biologically meaningful structure while dramatically reducing computational cost, making large-scale analyses feasible on standard hardware. In benchmarks spanning multiple multi-omics datasets, scHG improves clustering accuracy and runs substantially faster than state-of-the-art approaches. Beyond overall performance, scHG reveals fine-grained immune subtypes within T cells and highlights rare populations, such as dendritic cells and NK-like B cells, that are easily diluted in conventional cluster-level analysis. By combining efficiency with sensitivity to subtle and rare signals, scHG helps researchers map cellular diversity more reliably in complex multi-omics studies.
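
The abstract names three ingredients of supercell construction: an angle-aware similarity, second-order co-occurrence neighbors, and pruning of impurity cells by degree centrality. The toy sketch below illustrates one plausible reading of those ideas and is not the authors' implementation. It assumes cosine similarity as the angle-aware metric, treats two cells as second-order neighbors when their nearest-neighbor sets overlap, and prunes in a single pass; the function names and the parameters `k`, `min_shared`, and `min_degree` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity: depends only on the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_sets(cells, k):
    """For each cell, the set of its k most cosine-similar other cells."""
    result = []
    for i, u in enumerate(cells):
        scored = sorted(
            ((cosine(u, v), j) for j, v in enumerate(cells) if j != i),
            reverse=True,
        )
        result.append({j for _, j in scored[:k]})
    return result


def build_supercells(cells, k, min_shared, min_degree):
    """Group cells into supercells; return (supercells, pruned_cells)."""
    n = len(cells)
    # Closed neighborhood: a cell together with its k nearest neighbors.
    closed = [nbrs | {i} for i, nbrs in enumerate(knn_sets(cells, k))]
    # Assumed second-order rule: link two cells whose closed
    # neighborhoods share at least min_shared members.
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if len(closed[i] & closed[j]) >= min_shared:
                adj[i].add(j)
                adj[j].add(i)
    # Degree-centrality pruning (single pass): drop weakly connected cells.
    keep = {i for i in range(n) if len(adj[i]) >= min_degree}
    pruned = sorted(set(range(n)) - keep)
    # Each connected component of the remaining graph is one supercell.
    supercells, seen = [], set()
    for start in sorted(keep):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node] & keep:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        supercells.append(sorted(component))
    return supercells, pruned


# Toy data: two angular groups plus one cell pointing away from both.
cells = [
    (1.0, 0.0), (1.0, 0.1), (1.0, 0.2),   # group near the x-axis
    (0.0, 1.0), (0.1, 1.0), (0.2, 1.0),   # group near the y-axis
    (-1.0, -1.0),                         # isolated cell
]
supercells, pruned = build_supercells(cells, k=2, min_shared=2, min_degree=1)
print(supercells)  # [[0, 1, 2], [3, 4, 5]]
print(pruned)      # [6]
```

On this toy input the isolated cell shares at most one neighbor with any other cell, so it gains no edges and is pruned, while the two angular groups each form one supercell. The pairwise loops are quadratic in the number of cells; the sparse matrix optimization mentioned in the abstract would be needed at realistic scale.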

Suggested Citation

Yixiang Huang & Yuan Gan & Xinqi Gong, 2026. "scHG: A supercell framework with high-order graph learning enables scalable multi-omics analysis," PLOS Computational Biology, Public Library of Science, vol. 22(5), pages 1-36, May.

Handle: RePEc:plo:pcbi00:1013851
DOI: 10.1371/journal.pcbi.1013851
