
A matrix-analytical sampling formula for time-homogeneous coalescent processes under the infinite sites mutation model

Authors

Listed:
  • Hobolth, Asger
  • Boitard, Simon
  • Futschik, Andreas
  • Leblois, Raphael

Abstract

In this paper we develop a general framework for calculating the probability of a genetic sample under a time-homogeneous coalescent process and the infinite sites mutation model. The evolutionary model that we consider can be characterized as a two-step procedure: a coalescent process that describes the ancestral relatedness of the samples, followed by a sprinkling of mutations at separate sites on the ancestral tree according to a Poisson process. The coalescent process is defined using multivariate phase-type theory. The requirements are a rate matrix that determines the transition rates between the ancestral states, an initial state probability vector, and a reward matrix that describes the characteristics of the ancestral states. For example, the reward matrix could contain the number of singleton, doubleton or higher-order lineages in each ancestral state. We analyze the probability generating function for the evolutionary model as a function of the initial state probability vector, the transition rate matrix, the reward matrix, and the mutation rate. The matrix-analytical expression of the probability generating function allows us to develop a general method for calculating the probability of a population genetic data set. We demonstrate that the method is computationally attractive for a small number of mutations and provide a simple and easy-to-implement algorithm for determining the probability of a sample from the evolutionary model. The method is computationally stable and only involves a single matrix inversion, matrix multiplications and matrix additions. We provide a comprehensive understanding of the procedure through detailed calculations and discussions of several elementary examples. These examples include different sample representations (labeled samples and the site frequency spectrum) and different demographic and genetic models (the structured coalescent and the Beta-coalescent).
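The three ingredients named above (rate matrix, initial vector, rewards) can be illustrated on the simplest case. The following Python sketch is not the paper's algorithm. It assumes the Kingman coalescent with transient states n, n-1, ..., 2 ancestral lineages, a single reward vector r equal to the number of lineages (so the accumulated reward is the total branch length), and mutations falling at rate theta/2 per unit of branch length. Under those assumptions the probability generating function of the total number of mutations M is E[z^M] = alpha ((theta/2)(1 - z) diag(r) - S)^{-1} (-S 1), where S is the sub-intensity matrix and alpha the initial vector. All function names are hypothetical, and a linear solve stands in for the explicit matrix inverse.

```python
from math import prod


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def kingman_model(n):
    """Sub-intensity matrix S, initial vector alpha and reward vector r
    for the Kingman coalescent (assumed state order: n, n-1, ..., 2)."""
    ks = list(range(n, 1, -1))
    m = len(ks)
    S = [[0.0] * m for _ in range(m)]
    for i, k in enumerate(ks):
        rate = k * (k - 1) / 2          # pairwise coalescence rate
        S[i][i] = -rate
        if i + 1 < m:
            S[i][i + 1] = rate          # k lineages -> k - 1 lineages
    alpha = [1.0] + [0.0] * (m - 1)     # start with all n lineages
    r = [float(k) for k in ks]          # reward = number of lineages
    return S, alpha, r


def expected_total_branch_length(S, alpha, r):
    """E[L] = alpha (-S)^{-1} r."""
    neg_S = [[-v for v in row] for row in S]
    x = solve(neg_S, r)
    return sum(a * xi for a, xi in zip(alpha, x))


def mutation_pgf(z, S, alpha, r, theta):
    """E[z^M] for the total number of mutations M (rate theta/2 assumed)."""
    m = len(r)
    exit_rates = [-sum(row) for row in S]   # rates into the absorbing state
    A = [[-S[i][j] for j in range(m)] for i in range(m)]
    for i in range(m):
        A[i][i] += 0.5 * theta * (1.0 - z) * r[i]
    x = solve(A, exit_rates)
    return sum(a * xi for a, xi in zip(alpha, x))


def kingman_no_mutation_prob(n, theta):
    """Closed form P(M = 0) = prod_{k=1}^{n-1} k / (k + theta), for checking."""
    return prod(k / (k + theta) for k in range(1, n))
```

Evaluating `mutation_pgf(0.0, ...)` gives the probability of observing no mutations, which can be compared with the classical product formula in `kingman_no_mutation_prob`. Richer reward matrices (for example one column per branch type) would be needed for the site frequency spectrum and are outside this sketch.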
We apply the sampling formula to calculate probabilities of spectra for the Kingman coalescent and the Beta-coalescent. Even for a small number of samples and mutations we find that the probabilities of the spectra vary by many orders of magnitude. We compare the probabilities of the spectra to the values of Tajima's D-statistic, and find that the D-statistic is a poor predictor of the probability of a spectrum. Finally, we investigate how the probabilities of the spectra vary with the parametrization of the Beta-coalescent.
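For readers unfamiliar with the statistic used in that comparison, the sketch below computes Tajima's D from a site frequency spectrum using the standard definition from Tajima (1989). It is an illustration only and does not reproduce the paper's computations. It assumes an unfolded spectrum in which entry i counts the segregating sites whose derived allele appears in i of the n sampled sequences; the function name is hypothetical.

```python
from math import sqrt


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites whose derived allele
    is carried by i of the n sampled sequences, for i = 1, ..., n - 1.
    """
    n = len(sfs) + 1
    seg = sum(sfs)                       # number of segregating sites
    if n < 4:
        # For n = 2 and n = 3 the variance term below is identically zero.
        raise ValueError("Tajima's D needs at least 4 sequences")
    if seg == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")

    # Mean number of pairwise differences.
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * count for i, count in enumerate(sfs, start=1)) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference of the two estimators of theta, scaled by its estimated
    # standard deviation.
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
```

A spectrum proportional to 1/i, such as `[6, 3, 2]` for n = 4, gives D equal to zero up to rounding. An excess of singletons pushes D below zero and an excess of intermediate-frequency variants pushes it above zero.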

Suggested Citation

  • Hobolth, Asger & Boitard, Simon & Futschik, Andreas & Leblois, Raphael, 2025. "A matrix-analytical sampling formula for time-homogeneous coalescent processes under the infinite sites mutation model," Theoretical Population Biology, Elsevier, vol. 163(C), pages 62-79.
  • Handle: RePEc:eee:thpobi:v:163:y:2025:i:c:p:62-79
    DOI: 10.1016/j.tpb.2025.03.002

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S004058092500019X
    Download Restriction: Full text for ScienceDirect subscribers only
