
Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

Author

Listed:
  • Michael D Karcher
  • Julia A Palacios
  • Trevor Bedford
  • Marc A Suchard
  • Vladimir N Minin

Abstract

Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals’ genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples.

Author Summary: Phylodynamics seeks to estimate changes in population size from genetic data sampled from individuals across a particular population. One approach to accomplish this task uses a model called the coalescent, which relates the shape of the individuals’ shared ancestral tree to genetic diversity, which is in turn related to population size. However, when analyzing genetic data sampled at different times, current techniques assume that sampling times are fixed ahead of time or are distributed randomly without any relation to the size of the population. Through simulation, we show that when sampling times are related to population size, a situation referred to as preferential sampling, those estimation methods may be systematically biased. To fix this problem, we propose a new model that explicitly accounts for and models the preferential sampling. We show that in the presence of preferential sampling our new technique not only fixes the bias, but also has improved precision in its population size estimates. We also compare the performance of the old and new techniques on several real-world seasonal human influenza examples.
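
To make the idea of preferential sampling concrete, the sketch below simulates sampling times from an inhomogeneous Poisson process whose intensity depends on effective population size. It is an illustration only, not the authors' implementation. The log-linear intensity lambda(t) = exp(beta0) * N_e(t)^beta1, the seasonal trajectory `effective_pop_size`, the parameter values, and every function name are assumptions introduced here. Times are drawn by thinning: candidates come from a homogeneous process at an upper-bound rate, and each is kept with probability lambda(t) / lambda_max.

```python
import math
import random


def effective_pop_size(t):
    """Hypothetical seasonal trajectory N_e(t), with t in years (assumed)."""
    return 100.0 * (1.0 + 0.9 * math.sin(2.0 * math.pi * t))


def simulate_sampling_times(ne, t_start, t_end, beta0, beta1, ne_max, rng):
    """Draw sampling times on [t_start, t_end] by thinning.

    Assumed intensity: lambda(t) = exp(beta0) * ne(t) ** beta1.
    Requires beta1 >= 0 and ne(t) <= ne_max, so that
    exp(beta0) * ne_max ** beta1 bounds the intensity from above.
    """
    lam_max = math.exp(beta0) * ne_max ** beta1
    times = []
    t = t_start
    while True:
        # Candidate event from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_end:
            break
        lam_t = math.exp(beta0) * ne(t) ** beta1
        # Keep the candidate with probability lambda(t) / lam_max.
        if rng.random() < lam_t / lam_max:
            times.append(t)
    return times


def high_season_fraction(times):
    """Share of samples falling where the assumed N_e(t) exceeds its mean."""
    return sum(1 for t in times if (t % 1.0) < 0.5) / len(times)


rng = random.Random(1)
# beta1 = 1: sampling intensity proportional to N_e(t) (preferential sampling).
preferential = simulate_sampling_times(
    effective_pop_size, 0.0, 4.0, -1.0, 1.0, 190.0, rng)
# beta1 = 0: constant intensity, unrelated to population size.
uniform = simulate_sampling_times(
    effective_pop_size, 0.0, 4.0, 3.0, 0.0, 190.0, rng)

print(len(preferential), high_season_fraction(preferential))
print(len(uniform), high_season_fraction(uniform))
```

With `beta1 = 1` the samples cluster in the half of each year where the assumed N_e(t) is large, while with `beta1 = 0` they spread evenly. The first case is the one in which, according to the abstract, methods that ignore the sampling process may be systematically biased.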

Suggested Citation

  • Michael D Karcher & Julia A Palacios & Trevor Bedford & Marc A Suchard & Vladimir N Minin, 2016. "Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference," PLOS Computational Biology, Public Library of Science, vol. 12(3), pages 1-19, March.
  • Handle: RePEc:plo:pcbi00:1004789
    DOI: 10.1371/journal.pcbi.1004789

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004789
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004789&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1004789?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Julia A. Palacios & Vladimir N. Minin, 2013. "Gaussian Process-Based Bayesian Nonparametric Inference of Population Size Trajectories from Gene Genealogies," Biometrics, The International Biometric Society, vol. 69(1), pages 8-18, March.
    2. Andrew Rambaut & Oliver G. Pybus & Martha I. Nelson & Cecile Viboud & Jeffery K. Taubenberger & Edward C. Holmes, 2008. "The genomic and epidemiological dynamics of human influenza A virus," Nature, Nature, vol. 453(7195), pages 615-619, May.
    3. Håvard Rue & Sara Martino & Nicolas Chopin, 2009. "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 319-392, April.
    4. David A Rasmussen & Oliver Ratmann & Katia Koelle, 2011. "Inference for Nonlinear Epidemiological Models Using Genealogies and Time Series," PLOS Computational Biology, Public Library of Science, vol. 7(8), pages 1-11, August.
    5. Martins, Thiago G. & Simpson, Daniel & Lindgren, Finn & Rue, Håvard, 2013. "Bayesian computing with INLA: New features," Computational Statistics & Data Analysis, Elsevier, vol. 67(C), pages 68-83.
    6. Peter J. Diggle & Raquel Menezes & Ting‐li Su, 2010. "Geostatistical inference under preferential sampling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(2), pages 191-232, March.

    Citations



    Cited by:

    1. Raphaëlle Klitting & Liana E. Kafetzopoulou & Wim Thiery & Gytis Dudas & Sophie Gryseels & Anjali Kotamarthi & Bram Vrancken & Karthik Gangavarapu & Mambu Momoh & John Demby Sandi & Augustine Goba & F, 2022. "Predicting the evolution of the Lassa virus endemic area and population at risk over the next decades," Nature Communications, Nature, vol. 13(1), pages 1-15, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. John M. Humphreys & Robert B. Srygley & David H. Branson, 2022. "Geographic Variation in Migratory Grasshopper Recruitment under Projected Climate Change," Geographies, MDPI, vol. 2(1), pages 1-19, January.
    2. Humphreys, John M. & Srygley, Robert B. & Lawton, Douglas & Hudson, Amy R. & Branson, David H., 2022. "Grasshoppers exhibit asynchrony and spatial non-stationarity in response to the El Niño/Southern and Pacific Decadal Oscillations," Ecological Modelling, Elsevier, vol. 471(C).
    3. Nikoline N. Knudsen & Jörg Schullehner & Birgitte Hansen & Lisbeth F. Jørgensen & Søren M. Kristiansen & Denitza D. Voutchkova & Thomas A. Gerds & Per K. Andersen & Kristine Bihrmann & Morten Grønbæk , 2017. "Lithium in Drinking Water and Incidence of Suicide: A Nationwide Individual-Level Cohort Study with 22 Years of Follow-Up," IJERPH, MDPI, vol. 14(6), pages 1-13, June.
    4. Scott, Ryan P. & Scott, Tyler A., 2019. "Investing in collaboration for safety: Assessing grants to states for oil and gas distribution pipeline safety program enhancement," Energy Policy, Elsevier, vol. 124(C), pages 332-345.
    5. Cho, Daegon & Hwang, Youngdeok & Park, Jongwon, 2018. "More buzz, more vibes: Impact of social media on concert distribution," Journal of Economic Behavior & Organization, Elsevier, vol. 156(C), pages 103-113.
    6. Brown, Paul T. & Joshi, Chaitanya & Joe, Stephen & Rue, Håvard, 2021. "A novel method of marginalisation using low discrepancy sequences for integrated nested Laplace approximations," Computational Statistics & Data Analysis, Elsevier, vol. 157(C).
    7. Mayer Alvo & Jingrui Mu, 2023. "COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models," Mathematics, MDPI, vol. 11(6), pages 1-13, March.
    8. David Jiménez-Hernández & Víctor González-Calatayud & Ana Torres-Soto & Asunción Martínez Mayoral & Javier Morales, 2020. "Digital Competence of Future Secondary School Teachers: Differences According to Gender, Age, and Branch of Knowledge," Sustainability, MDPI, vol. 12(22), pages 1-16, November.
    9. Zhang, Shen & Liu, Xin & Tang, Jinjun & Cheng, Shaowu & Qi, Yong & Wang, Yinhai, 2018. "Spatio-temporal modeling of destination choice behavior through the Bayesian hierarchical approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 512(C), pages 537-551.
    10. Aaron Osgood‐Zimmerman & Jon Wakefield, 2023. "A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling," International Statistical Review, International Statistical Institute, vol. 91(2), pages 318-342, August.
    11. Luca Grassetti & Laura Rizzi, 2019. "The determinants of individual health care expenditures in the Italian region of Friuli Venezia Giulia: evidence from a hierarchical spatial model estimation," Empirical Economics, Springer, vol. 56(3), pages 987-1009, March.
    12. Muff, Stefanie & Ott, Manuela & Braun, Julia & Held, Leonhard, 2017. "Bayesian two-component measurement error modelling for survival analysis using INLA—A case study on cardiovascular disease mortality in Switzerland," Computational Statistics & Data Analysis, Elsevier, vol. 113(C), pages 177-193.
    13. Gressani, Oswaldo & Lambert, Philippe, 2021. "Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines," Computational Statistics & Data Analysis, Elsevier, vol. 154(C).
    14. Ferreira, Marco A.R. & Porter, Erica M. & Franck, Christopher T., 2021. "Fast and scalable computations for Gaussian hierarchical models with intrinsic conditional autoregressive spatial random effects," Computational Statistics & Data Analysis, Elsevier, vol. 162(C).
    15. Sameh Abdulah & Yuxiao Li & Jian Cao & Hatem Ltaief & David E. Keyes & Marc G. Genton & Ying Sun, 2023. "Large‐scale environmental data science with ExaGeoStatR," Environmetrics, John Wiley & Sons, Ltd., vol. 34(1), February.
    16. John M. Humphreys, 2022. "Amplification in Time and Dilution in Space: Partitioning Spatiotemporal Processes to Assess the Role of Avian-Host Phylodiversity in Shaping Eastern Equine Encephalitis Virus Distribution," Geographies, MDPI, vol. 2(3), pages 1-16, July.
    17. Tyler A. Scott & Nicola Ulibarri & Ryan P. Scott, 2020. "Stakeholder involvement in collaborative regulatory processes: Using automated coding to track attendance and actions," Regulation & Governance, John Wiley & Sons, vol. 14(2), pages 219-237, April.
    18. Matthew Yap & Matthew Tuson & Berwin Turlach & Bryan Boruff & David Whyatt, 2021. "Modelling the Relationship between Rainfall and Mental Health Using Different Spatial and Temporal Units," IJERPH, MDPI, vol. 18(3), pages 1-15, February.
    19. Simon N. Wood & Zheyuan Li & Gavin Shaddick & Nicole H. Augustin, 2017. "Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1199-1210, July.
    20. Christian P. Robert, 2013. "Bayesian Computational Tools," Working Papers 2013-45, Center for Research in Economics and Statistics.
