Printed from https://ideas.repec.org/a/plo/pcbi00/1014014.html

From noise to models to numbers: Evaluating negative binomial models and parameter estimations in single-cell RNA-seq

Author

Listed:
  • Yiling Wang
  • Zhanpeng Shu
  • Zhixing Cao
  • Ramon Grima

Abstract

The Negative Binomial (NB) distribution is widely used to approximate transcript count distributions in single-cell RNA sequencing (scRNA-seq) data, yet the reason for its ubiquity is not fully understood. Here, we employ a computationally efficient model selection technique to map the relationship between the best-fit models – Beta-Poisson (Telegraph), NB, and Poisson – and the kinetic parameters that govern gene expression stochasticity. Our findings reveal that the NB distribution closely approximates simulated data (incorporating both biological and technical noise) within an intermediate range of the sum of the gene activation and inactivation rates normalized by the mRNA degradation rate. This range expands with decreasing mean expression, increasing technical noise, and larger sample sizes. The results imply that: (i) good NB fits occur in diverse parameter regimes without exclusively indicating transcriptional bursting; (ii) for small sample sizes, biological noise predominantly shapes the NB profile even when technical noise is present; (iii) under steady-state conditions, gene-specific parameters (burst size and frequency) estimated in regions where the NB model fits well typically show large relative errors, even after corrections for technical noise; and (iv) gene ranking by burst frequency remains reliably accurate, suggesting that burst parameters are most informative in a relative sense. Finally, applying technical-noise–corrected model fitting to scRNA-seq data confirms that a substantial fraction of mammalian genes fall within these NB-fitting regimes, despite lacking transcriptional bursting.

Author summary: Single-cell RNA sequencing (scRNA-seq) measures mRNA molecule counts in individual cells. For most genes, these counts are well fit by a negative binomial (NB) distribution, and NB fits are often interpreted as evidence for transcriptional bursting. We asked when an NB model is expected to arise from a mechanistic gene-expression process, and what biological meaning can be safely assigned to its parameters. We combine the standard two-state telegraph model of promoter switching with a binomial model of transcript capture, and introduce the approximate expected Bayesian information criterion (aeBIC). aeBIC predicts which distribution (telegraph, NB, or Poisson) would be chosen by likelihood/BIC model selection. We show that NB fits are optimal in an intermediate regime of promoter switching relative to mRNA decay, and that this regime expands for low mean expression, larger sample sizes, and increased cell-to-cell variability in capture probability. Consequently, excellent NB fits can occur well outside the classical bursting limit. In these regimes, estimating burst size and burst frequency from NB parameters can incur large absolute errors, although relative comparisons are more robust: ranking genes by inferred burst frequency is usually preserved. Our results provide practical guidance for model choice and for interpreting fitted burst parameters in single-cell genomics.
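
The kind of comparison described above can be illustrated with a minimal Python sketch. It is not the authors' aeBIC and is not taken from the paper: it draws steady-state telegraph (Beta-Poisson) counts, thins them with a binomial capture step, and compares ordinary BIC values for a Poisson fit and an NB fit. Several simplifications are assumptions of this sketch: all rates are expressed in units of the mRNA degradation rate, the capture probability is the same in every cell, the NB parameters come from the method of moments rather than maximum likelihood, and the telegraph model itself is not fitted. All function and parameter names (simulate_counts, compare_bic, k_on, k_off, rho, capture) are hypothetical.

```python
import math
import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def simulate_counts(k_on, k_off, rho, capture, n_cells, rng):
    """Steady-state telegraph counts followed by binomial capture.

    k_on, k_off, rho are activation, inactivation and transcription rates,
    all divided by the mRNA degradation rate (assumption of this sketch).
    """
    p_active = rng.beta(k_on, k_off, size=n_cells)   # Beta-distributed promoter activity
    mrna = rng.poisson(rho * p_active)               # Beta-Poisson transcript counts
    return rng.binomial(mrna, capture)               # technical noise: capture each transcript


def poisson_loglik(x):
    """Poisson log-likelihood evaluated at the sample mean (the Poisson MLE)."""
    mean = x.mean()
    return float(np.sum(x * math.log(mean) - mean - _lgamma(x + 1.0)))


def nb_loglik(x):
    """NB log-likelihood at method-of-moments estimates (mean, size r)."""
    mean, var = x.mean(), x.var(ddof=1)
    if var <= mean:
        # No overdispersion: the NB degenerates to its Poisson limit.
        return poisson_loglik(x)
    r = mean ** 2 / (var - mean)                     # NB size (inverse dispersion)
    return float(np.sum(
        _lgamma(x + r) - math.lgamma(r) - _lgamma(x + 1.0)
        + r * math.log(r / (r + mean))
        + x * math.log(mean / (r + mean))
    ))


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def compare_bic(x):
    """BIC for the Poisson (1 parameter) and NB (2 parameters) fits."""
    n = len(x)
    return {"poisson": bic(poisson_loglik(x), 1, n),
            "nb": bic(nb_loglik(x), 2, n)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative parameter values only; they are not taken from the paper.
    counts = simulate_counts(k_on=0.5, k_off=5.0, rho=100.0,
                             capture=0.3, n_cells=2000, rng=rng)
    print(compare_bic(counts))
```

Sweeping k_on + k_off at a fixed mean, and adding a fitted telegraph model as a third candidate, would be the natural next step toward the parameter-space map described above; this sketch stops at the two simpler models.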

Suggested Citation

  • Yiling Wang & Zhanpeng Shu & Zhixing Cao & Ramon Grima, 2026. "From noise to models to numbers: Evaluating negative binomial models and parameter estimations in single-cell RNA-seq," PLOS Computational Biology, Public Library of Science, vol. 22(3), pages 1-37, March.
  • Handle: RePEc:plo:pcbi00:1014014
    DOI: 10.1371/journal.pcbi.1014014

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014014
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1014014&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1014014?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
