IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1010419.html
   My bibliography  Save this article

Robust inference of population size histories from genomic sequencing data

Author

Listed:
  • Gautam Upadhya
  • Matthias Steinrücken

Abstract

Unraveling the complex demographic histories of natural populations is a central problem in population genetics. Understanding past demographic events is of general anthropological interest, but is also an important step in establishing accurate null models when identifying adaptive or disease-associated genetic variation. An important class of tools for inferring past population size changes from genomic sequence data are Coalescent Hidden Markov Models (CHMMs). These models make efficient use of the linkage information in population genomic datasets by using the local genealogies relating sampled individuals as latent states that evolve along the chromosome in an HMM framework. Extending these models to large sample sizes is challenging, since the number of possible latent states increases rapidly.Here, we present our method CHIMP (CHMM History-Inference Maximum-Likelihood Procedure), a novel CHMM method for inferring the size history of a population. It can be applied to large samples (hundreds of haplotypes) and only requires unphased genomes as input. The two implementations of CHIMP that we present here use either the height of the genealogical tree (TMRCA) or the total branch length, respectively, as the latent variable at each position in the genome. The requisite transition and emission probabilities are obtained by numerically solving certain systems of differential equations derived from the ancestral process with recombination. The parameters of the population size history are subsequently inferred using an Expectation-Maximization algorithm. In addition, we implement a composite likelihood scheme to allow the method to scale to large sample sizes.We demonstrate the efficiency and accuracy of our method in a variety of benchmark tests using simulated data and present comparisons to other state-of-the-art methods. Specifically, our implementation using TMRCA as the latent variable shows comparable performance and provides accurate estimates of effective population sizes in intermediate and ancient times. Our method is agnostic to the phasing of the data, which makes it a promising alternative in scenarios where high quality data is not available, and has potential applications for pseudo-haploid data.Author summary: The demograpic history of natural populations shapes their genetic variation. The genomes of contemporary individuals can thus be used to unravel past migration events and population size changes, which is of anthropological interest. Moreover, it is also important to uncover these past events for studies investigating disease related genetic variation, since past demographic events can confound such analyses. Here we present a novel method for inferring the size history of a given population from full-genome sequencing data of contemporary individuals. Our method is based on a Coalescent Hidden Markov model framework, a model frequently applied to this type of inference. A key component of the model is the representation of unobserved local genealogical relationships among the sampled individuals as latent states. This is achieved by numerically solving certain differential equations that describe the distributions of these quantities and ultimately enables inference of past population size changes. Other methods performing similar inference rely on availability of high quality genomic data, whereas we demonstrate that our method can be applied in situations with limited data quality.

Suggested Citation

  • Gautam Upadhya & Matthias Steinrücken, 2022. "Robust inference of population size histories from genomic sequencing data," PLOS Computational Biology, Public Library of Science, vol. 18(9), pages 1-32, September.
  • Handle: RePEc:plo:pcbi00:1010419
    DOI: 10.1371/journal.pcbi.1010419
    as

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010419
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010419&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1010419?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1010419. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.