Printed from https://ideas.repec.org/a/eee/thpobi/v163y2025icp50-61.html

Exact calculation of the expected SFS in structured populations

Authors
  • Arredondo, Armando
  • Corujo, Josué
  • Noûs, Camille
  • Boitard, Simon
  • Chikhi, Lounès
  • Mazet, Olivier

Abstract

The Site Frequency Spectrum (SFS), a summary statistic of the distribution of derived allele frequencies in a sample of DNA sequences, provides information about genetic variation and can be used to make population inferences. The exact calculation of the expected SFS in a panmictic population under the infinite-sites model of mutation has been known in Markovian coalescent theory for decades, but its generalization to the structured coalescent is hampered by the almost exponential growth of the state space. We show here how to obtain this expected SFS as the solution of a linear system. More precisely, we propose a complete algorithmic procedure, from how to build a suitable state space and sort it, to how to take advantage of the sparsity of the rate matrix and solve the linear system numerically using an iterative method. We then build a specialization for the simplest case, the symmetrical n-island model, to arrive at a ready-to-use program, SISiFS, from which a framework for inferring demographic parameters could easily be developed.
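The abstract describes obtaining the expected SFS as the solution of a linear system over a sorted, sparse state space, solved with an iterative method. The following is a minimal sketch of that idea for the symmetric island model; it is not the SISiFS code, and its conventions are assumptions chosen for illustration: two lineages in the same deme coalesce at rate 1, each lineage migrates at total rate `mig` spread evenly over the other demes, mutations fall on each branch at rate θ/2, all n genes are sampled from one deme, and states are lumped using the exchangeability of demes. It solves the backward equations (−Q)h = r, where h[s][k−1] is the expected total length of branches subtending k sampled genes starting from state s, by Gauss-Seidel iteration with states sorted by number of lineages. The function names (`canon`, `transitions`, `expected_sfs_island`) are hypothetical.

```python
from collections import Counter, deque


def canon(demes):
    """Lumped state: drop empty demes, sort blocks within demes and demes among themselves."""
    return tuple(sorted(tuple(sorted(d)) for d in demes if d))


def n_lineages(state):
    return sum(len(d) for d in state)


def transitions(state, n_demes, mig):
    """Outgoing rates of the block-counting process in a symmetric island model.

    Assumed (hypothetical) scaling: each pair of lineages in the same deme coalesces
    at rate 1; each lineage migrates at total rate `mig`, uniformly to one of the
    other n_demes - 1 demes.
    """
    demes = [list(d) for d in state]
    out = Counter()
    for i, d in enumerate(demes):  # coalescence of lineages a and b within deme i
        for a in range(len(d)):
            for b in range(a + 1, len(d)):
                merged = d[:a] + d[a + 1:b] + d[b + 1:] + [d[a] + d[b]]
                out[canon(demes[:i] + [merged] + demes[i + 1:])] += 1.0
    if n_demes > 1 and mig > 0:
        per_dest = mig / (n_demes - 1)
        n_empty = n_demes - len(demes)
        for i, d in enumerate(demes):  # lineage a leaves deme i ...
            for a in range(len(d)):
                for j in range(len(demes)):  # ... for another occupied deme
                    if j != i:
                        new = [list(x) for x in demes]
                        new[i] = d[:a] + d[a + 1:]
                        new[j] = demes[j] + [d[a]]
                        out[canon(new)] += per_dest
                if n_empty:  # ... or for any of the exchangeable empty demes
                    new = [list(x) for x in demes]
                    new[i] = d[:a] + d[a + 1:]
                    new.append([d[a]])
                    out[canon(new)] += per_dest * n_empty
    out.pop(state, None)  # lumping can turn a move into a self-loop; drop it
    return out


def expected_sfs_island(n, n_demes, mig, theta=1.0, tol=1e-12, max_iter=200_000):
    """Expected SFS [xi_1, ..., xi_{n-1}] for n genes sampled in a single deme."""
    start = canon([[1] * n])
    rates, queue = {}, deque([start])
    while queue:  # breadth-first enumeration of the reachable states
        s = queue.popleft()
        if s in rates:
            continue
        rates[s] = transitions(s, n_demes, mig) if n_lineages(s) > 1 else Counter()
        queue.extend(t for t in rates[s] if t not in rates)
    order = sorted(rates, key=n_lineages)  # fewer lineages first
    h = {s: [0.0] * (n - 1) for s in order}
    for _ in range(max_iter):  # Gauss-Seidel sweeps on (-Q) h = r
        delta = 0.0
        for s in order:
            out = rates[s]
            if not out:
                continue  # absorbing: only the MRCA lineage remains
            q = sum(out.values())
            reward = [0.0] * (n - 1)  # reward[k-1] = lineages subtending k genes
            for d in s:
                for size in d:
                    reward[size - 1] += 1.0
            new = [(reward[k] + sum(r * h[t][k] for t, r in out.items())) / q
                   for k in range(n - 1)]
            delta = max(delta, max(abs(x - y) for x, y in zip(new, h[s])))
            h[s] = new
        if delta < tol:
            break
    # Assumed convention: mutations occur at rate theta/2 per lineage.
    return [theta / 2 * x for x in h[start]]
```

With a single deme the process reduces to the standard coalescent; every transition lowers the number of lineages, so one sweep over the sorted states already solves the system and, with the default θ = 1, `expected_sfs_island(4, 1, 0.0)` returns [1, 1/2, 1/3] up to floating-point rounding, matching E[ξ_k] = θ/k. Migration creates cycles among states with the same number of lineages, which is why an iterative solver is used. For two genes, `expected_sfs_island(2, 3, 0.5)` returns approximately `[3.0]`, i.e. θ times the number of demes, consistent with the expected coalescence time of two genes sampled in the same deme being equal to the number of demes under this scaling.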

Suggested Citation

  • Arredondo, Armando & Corujo, Josué & Noûs, Camille & Boitard, Simon & Chikhi, Lounès & Mazet, Olivier, 2025. "Exact calculation of the expected SFS in structured populations," Theoretical Population Biology, Elsevier, vol. 163(C), pages 50-61.
  • Handle: RePEc:eee:thpobi:v:163:y:2025:i:c:p:50-61
    DOI: 10.1016/j.tpb.2025.03.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0040580925000231
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.tpb.2025.03.003?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Hobolth, Asger & Siri-Jégousse, Arno & Bladt, Mogens, 2019. "Phase-type distributions in population genetics," Theoretical Population Biology, Elsevier, vol. 127(C), pages 16-32.
    2. Hudson, Richard R., 2015. "A New Proof of the Expected Frequency Spectrum under the Standard Neutral Model," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-5, July.
    3. Li, Heng & Durbin, Richard, 2011. "Inference of human population history from individual whole-genome sequences," Nature, Nature, vol. 475(7357), pages 493-496, July.
    4. repec:plo:pgen00:1003905 is not listed on IDEAS
    5. Uyenoyama, Marcy K. & Takebayashi, Naoki & Kumagai, Seiji, 2019. "Inductive determination of allele frequency spectrum probabilities in structured populations," Theoretical Population Biology, Elsevier, vol. 129(C), pages 148-159.
    6. Chen, Hua, 2012. "The joint allele frequency spectrum of multiple populations: A coalescent theory approach," Theoretical Population Biology, Elsevier, vol. 81(2), pages 179-195.
    7. repec:plo:pgen00:1000695 is not listed on IDEAS
    8. Ganapathy, Ganeshkumar & Uyenoyama, Marcy K., 2009. "Site frequency spectra from genomic SNP surveys," Theoretical Population Biology, Elsevier, vol. 75(4), pages 346-354.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hobolth, Asger & Boitard, Simon & Futschik, Andreas & Leblois, Raphael, 2025. "A matrix-analytical sampling formula for time-homogeneous coalescent processes under the infinite sites mutation model," Theoretical Population Biology, Elsevier, vol. 163(C), pages 62-79.
    2. Hobolth, Asger & Rivas-González, Iker & Bladt, Mogens & Futschik, Andreas, 2024. "Phase-type distributions in mathematical population genetics: An emerging framework," Theoretical Population Biology, Elsevier, vol. 157(C), pages 14-32.
    3. Legried, Brandon & Terhorst, Jonathan, 2022. "Rates of convergence in the two-island and isolation-with-migration models," Theoretical Population Biology, Elsevier, vol. 147(C), pages 16-27.
    4. repec:plo:pgen00:1003905 is not listed on IDEAS
    5. Wences, Alejandro H. & Peñaloza, Lizbeth & Steinrücken, Matthias & Siri-Jégousse, Arno, 2025. "The TMRCA of general genealogies in populations with deterministically varying size," Theoretical Population Biology, Elsevier, vol. 165(C), pages 1-9.
    6. Gideon S Bradburd & Peter L Ralph & Graham M Coop, 2016. "A Spatial Framework for Understanding Population Structure and Admixture," PLOS Genetics, Public Library of Science, vol. 12(1), pages 1-38, January.
    7. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    8. Juraj Bergman & Rasmus Ø. Pedersen & Erick J. Lundgren & Rhys T. Lemoine & Sophie Monsarrat & Elena A. Pearce & Mikkel H. Schierup & Jens-Christian Svenning, 2023. "Worldwide Late Pleistocene and Early Holocene population declines in extant megafauna are associated with Homo sapiens expansion rather than climate change," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    9. Mikula, Lynette Caitlin & Vogl, Claus, 2024. "The expected sample allele frequencies from populations of changing size via orthogonal polynomials," Theoretical Population Biology, Elsevier, vol. 157(C), pages 55-85.
    10. Costa, Rui J. & Wilkinson-Herbots, Hilde M., 2021. "Inference of gene flow in the process of speciation: Efficient maximum-likelihood implementation of a generalised isolation-with-migration model," Theoretical Population Biology, Elsevier, vol. 140(C), pages 1-15.
    11. Per Unneberg & Mårten Larsson & Anna Olsson & Ola Wallerman & Anna Petri & Ignas Bunikis & Olga Vinnere Pettersson & Chiara Papetti & Astthor Gislason & Henrik Glenner & Joan E. Cartes & Leocadio Blan, 2024. "Ecological genomics in the Northern krill uncovers loci for local adaptation across ocean basins," Nature Communications, Nature, vol. 15(1), pages 1-29, December.
    12. Temple, Seth D. & Thompson, Elizabeth A., 2025. "Identity-by-descent segments in large samples," Theoretical Population Biology, Elsevier, vol. 165(C), pages 10-21.
    13. Ya-Mei Ding & Xiao-Xu Pang & Yu Cao & Wei-Ping Zhang & Susanne S. Renner & Da-Yong Zhang & Wei-Ning Bai, 2023. "Genome structure-based Juglandaceae phylogenies contradict alignment-based phylogenies and substitution rates vary with DNA repair genes," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    14. Romain Fournier & Zoi Tsangalidou & David Reich & Pier Francesco Palamara, 2023. "Haplotype-based inference of recent effective population size in modern and ancient DNA samples," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    15. Steinrücken, Matthias & Paul, Joshua S. & Song, Yun S., 2013. "A sequentially Markov conditional sampling distribution for structured populations with migration and recombination," Theoretical Population Biology, Elsevier, vol. 87(C), pages 51-61.
    16. repec:plo:pgen00:1008895 is not listed on IDEAS
    17. Barton, N.H. & Etheridge, A.M. & Kelleher, J. & Véber, A., 2013. "Inference in two dimensions: Allele frequencies versus lengths of shared sequence blocks," Theoretical Population Biology, Elsevier, vol. 87(C), pages 105-119.
    18. Bo Friis Nielsen, 2022. "Characterisation of multivariate phase type distributions," Queueing Systems: Theory and Applications, Springer, vol. 100(3), pages 229-231, April.
    19. Guangping Huang & Lingyun Song & Xin Du & Xin Huang & Fuwen Wei, 2023. "Evolutionary genomics of camouflage innovation in the orchid mantis," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    20. Jörn Bethune & April Kleppe & Søren Besenbacher, 2022. "A method to build extended sequence context models of point mutations and indels," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    21. Chen, Hua & Hey, Jody & Slatkin, Montgomery, 2015. "A hidden Markov model for investigating recent positive selection through haplotype structure," Theoretical Population Biology, Elsevier, vol. 99(C), pages 18-30.
    22. Wilton, Peter R. & Baduel, Pierre & Landon, Matthieu M. & Wakeley, John, 2017. "Population structure and coalescence in pedigrees: Comparisons to the structured coalescent and a framework for inference," Theoretical Population Biology, Elsevier, vol. 115(C), pages 1-12.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:thpobi:v:163:y:2025:i:c:p:50-61. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Catherine Liu. General contact details of the provider: https://www.sciencedirect.com/journal/theoretical-population-biology.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.