Printed from https://ideas.repec.org/a/plo/pcbi00/1014014.html

From noise to models to numbers: Evaluating negative binomial models and parameter estimations in single-cell RNA-seq

Author

Listed:
  • Yiling Wang
  • Zhanpeng Shu
  • Zhixing Cao
  • Ramon Grima

Abstract

The Negative Binomial (NB) distribution is widely used to approximate transcript count distributions in single-cell RNA sequencing (scRNA-seq) data, yet the reason for its ubiquity is not fully understood. Here, we employ a computationally efficient model selection technique to map the relationship between the best-fit models – Beta-Poisson (Telegraph), NB, and Poisson – and the kinetic parameters that govern gene expression stochasticity. Our findings reveal that the NB distribution closely approximates simulated data (incorporating both biological and technical noise) within an intermediate range of the sum of the gene activation and inactivation rates normalized by the mRNA degradation rate. This range expands with decreasing mean expression, increasing technical noise, and larger sample sizes. The results imply that: (i) good NB fits occur in diverse parameter regimes without exclusively indicating transcriptional bursting; (ii) for small sample sizes, biological noise predominantly shapes the NB profile even when technical noise is present; (iii) under steady-state conditions, gene-specific parameters (burst size and frequency) estimated in regions where the NB model fits well typically show large relative errors, even after corrections for technical noise; and (iv) gene ranking by burst frequency remains reliably accurate, suggesting that burst parameters are most informative in a relative sense. Finally, applying technical-noise–corrected model fitting to scRNA-seq data confirms that a substantial fraction of mammalian genes fall within these NB-fitting regimes, despite lacking transcriptional bursting.

Author summary

Single-cell RNA sequencing (scRNA-seq) measures mRNA molecule counts in individual cells. For most genes, these counts are well fit by a negative binomial (NB) distribution, and NB fits are often interpreted as evidence for transcriptional bursting. We asked when an NB model is expected to arise from a mechanistic gene-expression process, and what biological meaning can be safely assigned to its parameters. We combine the standard two-state telegraph model of promoter switching with a binomial model of transcript capture, and introduce the approximate expected Bayesian information criterion (aeBIC). aeBIC predicts which distribution (telegraph, NB, or Poisson) would be chosen by likelihood/BIC model selection. We show that NB fits are optimal in an intermediate regime of promoter switching relative to mRNA decay, and that this regime expands for low mean expression, larger sample sizes, and increased cell-to-cell variability in capture probability. Consequently, excellent NB fits can occur well outside the classical bursting limit. In these regimes, estimating burst size and burst frequency from NB parameters can incur large absolute errors, although relative comparisons are more robust: ranking genes by inferred burst frequency is usually preserved. Our results provide practical guidance for model choice and for interpreting fitted burst parameters in single-cell genomics.
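The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' aeBIC method: it uses plain BIC on simulated data, compares only Poisson against NB (the telegraph fit is omitted), and assumes a single fixed capture probability rather than cell-to-cell variability. All function and parameter names (`simulate_counts`, `k_on`, `k_off`, `rho`, `capture`) are hypothetical, and rates are expressed in units of the mRNA degradation rate.

```python
import numpy as np
from scipy import optimize, stats


def simulate_counts(k_on, k_off, rho, capture, n_cells, rng):
    """Steady-state telegraph counts after binomial capture (illustrative).

    At steady state the telegraph model is a Beta-Poisson mixture; binomial
    capture with a fixed probability thins the Poisson rate by that factor.
    """
    p_active = rng.beta(k_on, k_off, size=n_cells)
    return rng.poisson(capture * rho * p_active)


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * np.log(n_obs) - 2.0 * log_lik


def fit_poisson(counts):
    """Maximised Poisson log-likelihood (the MLE of the rate is the mean)."""
    return stats.poisson.logpmf(counts, counts.mean()).sum()


def fit_nb(counts):
    """Maximised NB log-likelihood and the fitted shape parameter r.

    The NB mean is fixed at the sample mean (its MLE); only r is optimised,
    on a log scale and within assumed bounds.
    """
    mean = counts.mean()

    def neg_log_lik(log_r):
        r = np.exp(log_r)
        return -stats.nbinom.logpmf(counts, r, r / (r + mean)).sum()

    res = optimize.minimize_scalar(
        neg_log_lik, bounds=(-5.0, 10.0), method="bounded"
    )
    return -res.fun, float(np.exp(res.x))


def select_model(counts):
    """Return the BIC-preferred model name, both BIC scores, and NB shape r."""
    n = counts.size
    ll_nb, r_hat = fit_nb(counts)
    scores = {
        "Poisson": bic(fit_poisson(counts), 1, n),
        "NB": bic(ll_nb, 2, n),
    }
    return min(scores, key=scores.get), scores, r_hat
```

Under the usual bursting interpretation, the fitted NB shape `r_hat` would be read as a burst frequency (in units of the degradation rate) and `counts.mean() / r_hat` as an observed burst size. The abstract's point is that these readings can carry large errors even when the NB fit is good, so such estimates are best compared across genes rather than taken at face value.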

Suggested Citation

  • Yiling Wang & Zhanpeng Shu & Zhixing Cao & Ramon Grima, 2026. "From noise to models to numbers: Evaluating negative binomial models and parameter estimations in single-cell RNA-seq," PLOS Computational Biology, Public Library of Science, vol. 22(3), pages 1-37, March.
  • Handle: RePEc:plo:pcbi00:1014014
    DOI: 10.1371/journal.pcbi.1014014

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014014
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1014014&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohammad Soltani & Cesar A Vargas-Garcia & Duarte Antunes & Abhyudai Singh, 2016. "Intercellular Variability in Protein Levels from Stochastic Expression and Noisy Cell Cycle Processes," PLOS Computational Biology, Public Library of Science, vol. 12(8), pages 1-23, August.
    2. Amy L. Hughes & Aleksander T. Szczurek & Jessica R. Kelley & Anna Lastuvkova & Anne H. Turberfield & Emilia Dimitrova & Neil P. Blackledge & Robert J. Klose, 2023. "A CpG island-encoded mechanism protects genes from premature transcription termination," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    3. Alistair N Boettiger & Peter L Ralph & Steven N Evans, 2011. "Transcriptional Regulation: Effects of Promoter Proximal Pausing on Speed, Synchrony and Reliability," PLOS Computational Biology, Public Library of Science, vol. 7(5), pages 1-14, May.
    4. Matthieu Wyart & David Botstein & Ned S Wingreen, 2010. "Evaluating Gene Expression Dynamics Using Pairwise RNA FISH Data," PLOS Computational Biology, Public Library of Science, vol. 6(11), pages 1-14, November.
    5. Qiwen Sun & Zhaohang Cai & Chunjuan Zhu, 2022. "A Novel Dynamical Regulation of mRNA Distribution by Cross-Talking Pathways," Mathematics, MDPI, vol. 10(9), pages 1-14, May.
    6. Stuart Aitken & Marie-Cécile Robert & Ross D Alexander & Igor Goryanin & Edouard Bertrand & Jean D Beggs, 2010. "Processivity and Coupling in Messenger RNA Transcription," PLOS ONE, Public Library of Science, vol. 5(1), pages 1-12, January.
    7. Singh, Abhyudai & Vahdat, Zahra & Xu, Zikai, 2019. "Time-triggered stochastic hybrid systems with two timer-dependent resets," OSF Preprints u8fzg, Center for Open Science.
    8. Muir Morrison & Manuel Razo-Mejia & Rob Phillips, 2021. "Reconciling kinetic and thermodynamic models of bacterial transcription," PLOS Computational Biology, Public Library of Science, vol. 17(1), pages 1-30, January.
    9. Elijah Roberts & Andrew Magis & Julio O Ortiz & Wolfgang Baumeister & Zaida Luthey-Schulten, 2011. "Noise Contributions in an Inducible Genetic Switch: A Whole-Cell Simulation Study," PLOS Computational Biology, Public Library of Science, vol. 7(3), pages 1-21, March.
    10. Ross D. Jones & Yili Qian & Katherine Ilia & Benjamin Wang & Michael T. Laub & Domitilla Del Vecchio & Ron Weiss, 2022. "Robust and tunable signal processing in mammalian cells via engineered covalent modification cycles," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    11. repec:plo:pcbi00:1000506 is not listed on IDEAS
    12. Xinyu Hu & Bob van Sluijs & Óscar García-Blay & Yury Stepanov & Koen Rietrae & Wilhelm T. S. Huck & Maike M. K. Hansen, 2024. "ARTseq-FISH reveals position-dependent differences in gene expression of micropatterned mESCs," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    13. Song, Yi & Xu, Wei & Wei, Wei & Niu, Lizhi, 2023. "Dynamical transition of phenotypic states in breast cancer system with Lévy noise," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 627(C).
    14. Eva Kaulich & Quinn Waselenchuk & Nicole Fürst & Kristina Desch & Janus Mosbacher & Elena Ciirdaeva & Marcel Juengling & Roshni Ray & Belquis Nassim-Assir & Georgi Tushev & Julian D. Langer & Erin M. , 2025. "An integrated transcriptomic and proteomic map of the mouse hippocampus at synaptic resolution," Nature Communications, Nature, vol. 16(1), pages 1-22, December.
    15. Marc S Sherman & Barak A Cohen, 2014. "A Computational Framework for Analyzing Stochasticity in Gene Expression," PLOS Computational Biology, Public Library of Science, vol. 10(5), pages 1-13, May.
    16. Jingyao Wang & Shihe Zhang & Hongfang Lu & Heng Xu, 2022. "Differential regulation of alternative promoters emerges from unified kinetics of enhancer-promoter interaction," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    17. Anissa Guillemin & Ronan Duchesne & Fabien Crauste & Sandrine Gonin-Giraud & Olivier Gandrillon, 2019. "Drugs modulating stochastic gene expression affect the erythroid differentiation process," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-19, November.
    18. Rajesh Ramaswamy & Ivo F Sbalzarini & Nélido González-Segredo, 2011. "Noise-Induced Modulation of the Relaxation Kinetics around a Non-Equilibrium Steady State of Non-Linear Chemical Reaction Networks," PLOS ONE, Public Library of Science, vol. 6(1), pages 1-10, January.
    19. Anton J M Larsson & Christoph Ziegenhain & Michael Hagemann-Jensen & Björn Reinius & Tina Jacob & Tim Dalessandri & Gert-Jan Hendriks & Maria Kasper & Rickard Sandberg, 2021. "Transcriptional bursts explain autosomal random monoallelic expression and affect allelic imbalance," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-16, March.
    20. Chen, Aimin & Tian, Tianhai & Chen, Yiren & Zhou, Tianshou, 2022. "Stochastic analysis of a complex gene-expression model," Chaos, Solitons & Fractals, Elsevier, vol. 160(C).
    21. Zachary R Fox & Brian Munsky, 2019. "The finite state projection based Fisher information matrix approach to estimate information and optimize single-cell experiments," PLOS Computational Biology, Public Library of Science, vol. 15(1), pages 1-23, January.
