IDEAS home Printed from https://ideas.repec.org/a/plo/pgen00/1011769.html

A novel expectation-maximization approach to infer general diploid selection from time-series genetic data

Author

Listed:
  • Adam G Fine
  • Matthias Steinrücken

Abstract

Detecting and quantifying the strength of selection is a major objective in population genetics. Since selection acts over multiple generations, many approaches have been developed to detect and quantify selection using genetic data sampled at multiple points in time. Such time-series genetic data is commonly analyzed using Hidden Markov Models, but in most cases, under the assumption of additive selection. However, many examples of genetic variation exhibiting non-additive mechanisms exist, making it critical to develop methods that can characterize selection in more general scenarios. Here, we extend a previously introduced expectation-maximization algorithm for the inference of additive selection coefficients to the case of general diploid selection, in which the heterozygote and homozygote fitness are parameterized independently. We furthermore introduce a framework to identify bespoke modes of diploid selection from given data, a heuristic to account for variable population size, and a procedure for aggregating data across linked loci to increase power and robustness. Using extensive simulation studies, we find that our method accurately and efficiently estimates selection coefficients for different modes of diploid selection across a wide range of scenarios; however, power to classify the mode of selection is low unless selection is very strong. We apply our method to ancient DNA samples from Great Britain in the last 4,450 years and detect evidence for selection in six genomic regions, including the well-characterized LCT locus. Our work is the first genome-wide scan characterizing signals of general diploid selection.

Author summary

Natural selection increases the likelihood that beneficial genetic variants are passed from parent to offspring and thus forms the basis of genetic adaptation to novel environments. Genomic data sampled at multiple timepoints, such as genetic material extracted from ancient remains (ancient DNA) or data from evolve and resequence experiments, can enable more precise identification of genetic variants subject to selective pressure than contemporary samples alone. However, most methods for identifying genetic variation under selection focus on additive selection, where the fitness of the heterozygote is exactly intermediate between the homozygotes. Leveraging genetic data at multiple timepoints, we develop a method to detect additive and non-additive selection as well as to infer the most likely dominance mechanism. We apply our methods to a dataset of ancient DNA from Great Britain dated less than 4,450 years before present and identify six regions with signals of recent selection, including one at the TFR2 locus that has not been previously reported as a target of selection. Our work enables more accurate quantification of non-additive selection dynamics and can be used to test more complex models of selection.
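
To illustrate what it means for heterozygote and homozygote fitness to be parameterized independently, the following is a minimal sketch of the textbook deterministic allele-frequency recursion under diploid selection. It is not the authors' expectation-maximization method or their Hidden Markov Model. The parameterization (fitnesses 1, 1 + s1, and 1 + s2 for genotypes aa, Aa, and AA) and the names next_frequency and trajectory are assumptions made here for illustration; this listing does not state which parameterization the paper uses.

```python
def next_frequency(p, s1, s2):
    """Expected frequency of allele A after one generation of selection.

    Assumed (hypothetical) parameterization: genotype fitnesses are
    w_aa = 1, w_Aa = 1 + s1, w_AA = 1 + s2, with s1 and s2 chosen freely.
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 + s2, 1.0 + s1, 1.0
    # Population mean fitness under Hardy-Weinberg genotype proportions.
    mean_fitness = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    # A alleles come from AA homozygotes and from half of the heterozygotes.
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def trajectory(p0, s1, s2, generations):
    """Deterministic frequency trajectory, ignoring genetic drift."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s1, s2))
    return freqs


if __name__ == "__main__":
    s = 0.1
    modes = {
        "additive": (s / 2.0, s),  # heterozygote exactly intermediate
        "dominant": (s, s),        # heterozygote as fit as AA
        "recessive": (0.0, s),     # heterozygote as fit as aa
    }
    for name, (s1, s2) in modes.items():
        print(name, round(trajectory(0.05, s1, s2, 20)[-1], 4))
```

Under this assumed parameterization, additive selection is the special case s1 = s2 / 2, while dominant and recessive selection correspond to s1 = s2 and s1 = 0. Starting from a low frequency, the three modes produce visibly different trajectories, and a time-series method can in principle exploit that difference to classify the mode of selection.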

Suggested Citation

  • Adam G Fine & Matthias Steinrücken, 2025. "A novel expectation-maximization approach to infer general diploid selection from time-series genetic data," PLOS Genetics, Public Library of Science, vol. 21(7), pages 1-38, July.
  • Handle: RePEc:plo:pgen00:1011769
    DOI: 10.1371/journal.pgen.1011769

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1011769
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1011769&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1011769?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
