Printed from https://ideas.repec.org/a/eee/thpobi/v163y2025icp50-61.html

Exact calculation of the expected SFS in structured populations

Authors

  • Arredondo, Armando
  • Corujo, Josué
  • Noûs, Camille
  • Boitard, Simon
  • Chikhi, Lounès
  • Mazet, Olivier

Abstract

The Site Frequency Spectrum (SFS), a summary statistic of the distribution of derived allele frequencies in a sample of DNA sequences, provides information about genetic variation and can be used to make population inferences. The exact calculation of the expected SFS in a panmictic population under the infinite-sites model of mutation has been known in Markovian coalescent theory for decades, but its generalization to the structured coalescent is hampered by the almost exponential growth of the state space. We show here how to obtain this expected SFS as the solution of a linear system. More precisely, we propose a complete algorithmic procedure, from how to build and sort a suitable state space, to how to exploit the sparsity of the rate matrix and solve the linear system numerically using an iterative method. We then specialize the procedure to the simplest case, the symmetrical n-island model, to arrive at ready-to-use software called SISiFS, from which a framework for inferring demographic parameters could easily be developed.
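
To make the "expected SFS as the solution of a linear system" idea concrete, here is a minimal sketch for the panmictic case only, where the answer is the classical theta/i. It is not the SISiFS algorithm and does not handle population structure. The function names (sojourn_times, expected_sfs), the Gauss-Seidel solver (standing in for the unspecified iterative method), and the scaling (coalescence rate k(k-1)/2 with k lineages, mutation rate theta/2 per lineage) are assumptions made for illustration.

```python
from math import comb


def sojourn_times(n, tol=1e-12, max_sweeps=1000):
    """Expected time spent with k = n, ..., 2 lineages in the Kingman coalescent.

    Solves A x = b with A = -Q^T, where Q is the rate matrix restricted to the
    transient states, using Gauss-Seidel sweeps over sparse rows.
    """
    states = list(range(n, 1, -1))                 # k = n, n-1, ..., 2
    rate = {k: k * (k - 1) / 2 for k in states}    # diagonal of A
    # Sparse rows: store only the nonzero off-diagonal entries of A.
    rows = {k: ({k + 1: -rate[k + 1]} if k < n else {}) for k in states}
    b = {k: 1.0 if k == n else 0.0 for k in states}  # start with n lineages
    x = {k: 0.0 for k in states}
    for _ in range(max_sweeps):
        change = 0.0
        for k in states:
            new = (b[k] - sum(a * x[j] for j, a in rows[k].items())) / rate[k]
            change = max(change, abs(new - x[k]))
            x[k] = new
        if change < tol:
            break
    return x


def expected_sfs(n, theta=1.0):
    """Expected SFS (entries i = 1, ..., n-1) for a panmictic sample of size n."""
    t = sojourn_times(n)
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(2, n - i + 2):
            # Probability that one of the k lineages subtends exactly i samples.
            p = comb(n - i - 1, k - 2) / comb(n - 1, k - 1)
            total += k * p * t[k]
        sfs.append(theta / 2 * total)
    return sfs
```

Under these assumptions, expected_sfs(n, theta) agrees with theta/i for each i, which gives a quick sanity check. In the structured setting described in the abstract, the state space and rate matrix are far larger, which is where sparsity and an iterative solver become essential.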

Suggested Citation

  • Arredondo, Armando & Corujo, Josué & Noûs, Camille & Boitard, Simon & Chikhi, Lounès & Mazet, Olivier, 2025. "Exact calculation of the expected SFS in structured populations," Theoretical Population Biology, Elsevier, vol. 163(C), pages 50-61.
  • Handle: RePEc:eee:thpobi:v:163:y:2025:i:c:p:50-61
    DOI: 10.1016/j.tpb.2025.03.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0040580925000231
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hobolth, Asger & Boitard, Simon & Futschik, Andreas & Leblois, Raphael, 2025. "A matrix-analytical sampling formula for time-homogeneous coalescent processes under the infinite sites mutation model," Theoretical Population Biology, Elsevier, vol. 163(C), pages 62-79.
    2. Hobolth, Asger & Rivas-González, Iker & Bladt, Mogens & Futschik, Andreas, 2024. "Phase-type distributions in mathematical population genetics: An emerging framework," Theoretical Population Biology, Elsevier, vol. 157(C), pages 14-32.
    3. Legried, Brandon & Terhorst, Jonathan, 2022. "Rates of convergence in the two-island and isolation-with-migration models," Theoretical Population Biology, Elsevier, vol. 147(C), pages 16-27.
    4. repec:plo:pgen00:1003905 is not listed on IDEAS
    5. Riccardo De Bin & Vegard Grødem Stikbakke, 2023. "A boosting first-hitting-time model for survival analysis in high-dimensional settings," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 29(2), pages 420-440, April.
    6. Mikula, Lynette Caitlin & Vogl, Claus, 2024. "The expected sample allele frequencies from populations of changing size via orthogonal polynomials," Theoretical Population Biology, Elsevier, vol. 157(C), pages 55-85.
    7. Costa, Rui J. & Wilkinson-Herbots, Hilde M., 2021. "Inference of gene flow in the process of speciation: Efficient maximum-likelihood implementation of a generalised isolation-with-migration model," Theoretical Population Biology, Elsevier, vol. 140(C), pages 1-15.
    8. Ya-Mei Ding & Xiao-Xu Pang & Yu Cao & Wei-Ping Zhang & Susanne S. Renner & Da-Yong Zhang & Wei-Ning Bai, 2023. "Genome structure-based Juglandaceae phylogenies contradict alignment-based phylogenies and substitution rates vary with DNA repair genes," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    9. Romain Fournier & Zoi Tsangalidou & David Reich & Pier Francesco Palamara, 2023. "Haplotype-based inference of recent effective population size in modern and ancient DNA samples," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    10. Guangping Huang & Lingyun Song & Xin Du & Xin Huang & Fuwen Wei, 2023. "Evolutionary genomics of camouflage innovation in the orchid mantis," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    11. Jörn Bethune & April Kleppe & Søren Besenbacher, 2022. "A method to build extended sequence context models of point mutations and indels," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    12. Blath, Jochen & Buzzoni, Eugenio & Koskela, Jere & Wilke Berenguer, Maite, 2020. "Statistical tools for seed bank detection," Theoretical Population Biology, Elsevier, vol. 132(C), pages 1-15.
    13. Carmi, Shai & Wilton, Peter R. & Wakeley, John & Pe’er, Itsik, 2014. "A renewal theory approach to IBD sharing," Theoretical Population Biology, Elsevier, vol. 97(C), pages 35-48.
    14. Aoki, Kenichi & Wakano, Joe Yuichiro, 2022. "Hominin forager technology, food sharing, and diet breadth," Theoretical Population Biology, Elsevier, vol. 144(C), pages 37-48.
    15. Yupeng Sang & Zhiqin Long & Xuming Dan & Jiajun Feng & Tingting Shi & Changfu Jia & Xinxin Zhang & Qiang Lai & Guanglei Yang & Hongying Zhang & Xiaoting Xu & Huanhuan Liu & Yuanzhong Jiang & Pär K. In, 2022. "Genomic insights into local adaptation and future climate-induced vulnerability of a keystone forest tree in East Asia," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    16. Uyenoyama, Marcy K. & Takebayashi, Naoki & Kumagai, Seiji, 2020. "Allele frequency spectra in structured populations: Novel-allele probabilities under the labelled coalescent," Theoretical Population Biology, Elsevier, vol. 133(C), pages 130-140.
    17. Kimmel, Marek & Wojdyła, Tomasz, 2016. "Genetic demographic networks: Mathematical model and applications," Theoretical Population Biology, Elsevier, vol. 111(C), pages 75-86.
    18. Jason Flannick & Joshua M Korn & Pierre Fontanillas & George B Grant & Eric Banks & Mark A Depristo & David Altshuler, 2012. "Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation," PLOS Computational Biology, Public Library of Science, vol. 8(7), pages 1-13, July.
    19. Attias, Laurent & Siess, Vincent & Labbé, Stéphane, 2025. "An agile modeling framework for population dynamics," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 234(C), pages 113-134.
    20. Yee Wen Low & Sitaram Rajaraman & Crystal M. Tomlin & Joffre Ali Ahmad & Wisnu H. Ardi & Kate Armstrong & Parusuraman Athen & Ahmad Berhaman & Ruth E. Bone & Martin Cheek & Nicholas R. W. Cho & Le Min, 2022. "Genomic insights into rapid speciation within the world’s largest tree genus Syzygium," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    21. José Cerca & Bent Petersen & José Miguel Lazaro-Guevara & Angel Rivera-Colón & Siri Birkeland & Joel Vizueta & Siyu Li & Qionghou Li & João Loureiro & Chatchai Kosawang & Patricia Jaramillo Díaz & Gon, 2022. "The genomic basis of the plant island syndrome in Darwin’s giant daisies," Nature Communications, Nature, vol. 13(1), pages 1-13, December.

