Authors
- Yuntian Wu
- Haoran Hu
- Wei Chen
- Johann E Gudjonsson
- Lam C Tsoi
- Xiaoquan Wen
Abstract
Multiplets arise when multiple cells are captured within the same droplet during single-cell sequencing, producing hybrid molecular profiles that can distort downstream analyses. Detecting multiplets in single-nucleus ATAC-seq (snATAC-seq) data is particularly challenging due to the sparsity and overdispersion of chromatin accessibility measurements. Moreover, computational approaches that jointly leverage evidence across multiple features and data modalities are highly desirable for multiplet detection. We introduce SEBULA, a semi-parametric empirical Bayes framework for multiplet detection in snATAC-seq data. SEBULA models the singlet background directly from observed chromatin accessibility signals using fragment-level information from snATAC-seq data. This approach avoids reliance on synthetic doublets and produces classification probabilities that enable direct false discovery rate control. We further extend SEBULA to integrate complementary evidence from additional features and modalities, such as simultaneously measured gene expression profiles. Across simulations and seven multimodal datasets with hashing-based ground truth, SEBULA demonstrates improved sensitivity and specificity compared with existing snATAC-seq methods. The evidence integration framework achieves comparable or superior performance relative to state-of-the-art multiomic approaches while maintaining computational efficiency.
Author summary
Single-cell sequencing has revolutionized biology by allowing researchers to look at the genetic activity of thousands of individual cells simultaneously. However, common technical artifacts occur when two or more cells are accidentally trapped in the same reaction droplet. These “multiplets” create a blurred, hybrid signal that can lead researchers to false biological conclusions. Detecting these artifacts is especially difficult in data that measures chromatin accessibility (i.e., the openness of DNA), which is often sparse and noisy.
We developed SEBULA, a new computational tool designed to solve this problem. Unlike existing methods that rely on simulated data to guess what a multiplet looks like, SEBULA learns the characteristics of true single cells directly from the observed data. This makes it more accurate at spotting subtle multiplet signals that other tools miss. Furthermore, SEBULA is built for the latest multimodal technologies that measure different types of biological information at once. It can combine evidence from multiple sources, such as gene activity and DNA structure, to confirm if a droplet contains a single cell or multiple cells. By providing a more reliable way to identify and remove multiplets, SEBULA helps improve the reliability of downstream analyses in single-cell studies.
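The abstract notes that per-droplet classification probabilities enable direct false discovery rate control. The sketch below shows one generic way this is commonly done with posterior probabilities (a Bayesian FDR rule based on the running mean of local false discovery rates). It is an illustration only, not SEBULA's implementation: the function name `bayesian_fdr_calls`, the parameter `alpha`, and the example probabilities are all assumptions made for this example.

```python
import numpy as np


def bayesian_fdr_calls(multiplet_prob, alpha=0.05):
    """Call multiplets so that the expected false discovery rate is <= alpha.

    Hypothetical helper (not part of SEBULA): `multiplet_prob` holds each
    droplet's posterior probability of being a multiplet.
    """
    p = np.asarray(multiplet_prob, dtype=float)
    # Local false discovery rate: probability that a droplet is a singlet.
    lfdr = 1.0 - p
    # Rank droplets from most to least confidently called multiplet.
    order = np.argsort(lfdr, kind="stable")
    # Mean lfdr of the top-k droplets estimates the FDR of rejecting those k.
    running_fdr = np.cumsum(lfdr[order]) / np.arange(1, p.size + 1)
    # The running mean of ascending values is nondecreasing, so the number of
    # entries at or below alpha is the size of the largest admissible set.
    k = int(np.sum(running_fdr <= alpha))
    calls = np.zeros(p.size, dtype=bool)
    calls[order[:k]] = True
    return calls


# Made-up posterior probabilities for six droplets.
example_probs = [0.99, 0.98, 0.90, 0.60, 0.10, 0.02]
print(bayesian_fdr_calls(example_probs, alpha=0.05).tolist())
# prints [True, True, True, False, False, False]
```

With these illustrative values, the first three droplets are called because their average probability of being a singlet is about 0.043, below the 0.05 target; adding the fourth droplet would raise that average to about 0.13.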
Suggested Citation
Yuntian Wu & Haoran Hu & Wei Chen & Johann E Gudjonsson & Lam C Tsoi & Xiaoquan Wen, 2026.
"Semi-parametric empirical Bayes method for multiplet detection in snATAC-seq with probabilistic multi-omic integration,"
PLOS Computational Biology, Public Library of Science, vol. 22(4), pages 1-16, April.
Handle: RePEc:plo:pcbi00:1013653
DOI: 10.1371/journal.pcbi.1013653