
Estimating disease prevalence in large datasets using genetic risk scores

Author

Listed:
  • Benjamin D. Evans

    (University of Exeter
    University of Bristol)

  • Piotr Słowiński

    (University of Exeter)

  • Andrew T. Hattersley

    (University of Exeter Medical School, Institute of Biomedical & Clinical Science
    Royal Devon & Exeter NHS Foundation Trust)

  • Samuel E. Jones

    (University of Exeter Medical School, Institute of Biomedical & Clinical Science)

  • Seth Sharp

    (University of Exeter Medical School, Institute of Biomedical & Clinical Science)

  • Robert A. Kimmitt

    (University of Exeter Medical School, Institute of Biomedical & Clinical Science
    Royal Devon & Exeter NHS Foundation Trust)

  • Michael N. Weedon

    (University of Exeter Medical School, Institute of Biomedical & Clinical Science)

  • Richard A. Oram

    (University of Exeter Medical School, Institute of Biomedical & Clinical Science
    Royal Devon & Exeter NHS Foundation Trust)

  • Krasimira Tsaneva-Atanasova

    (University of Exeter
    Living Systems Institute, EPSRC Hub for Quantitative Modelling in Healthcare, University of Exeter)

  • Nicholas J. Thomas

    (University of Exeter
    Royal Devon & Exeter NHS Foundation Trust)

Abstract

Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population-level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining disease prevalence within a cohort using genetic risk scores. We compare and evaluate methods based on the means of genetic risk scores’ distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We demonstrate that genetic stratification produces robust prevalence estimates. Specifically, we show that robust estimates of prevalence are still possible even with rarer diseases, smaller cohort sizes and less discriminative genetic risk scores, highlighting the general utility of these approaches. Genetic stratification techniques offer exciting new research tools, enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria.
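
The abstract names a means-based method without giving its formula. One plausible reading, offered here as a sketch rather than the authors' implementation, treats the cohort as a mixture of cases and non-cases, so that its mean genetic risk score is a weighted average of the two reference means and the weight on the case reference is the prevalence. The function name, the Gaussian score distributions, and every numeric value below are assumptions chosen only for illustration.

```python
import random
import statistics


def estimate_prevalence_by_means(mixture, ref_cases, ref_controls):
    """Estimate the proportion of cases in `mixture` from sample means.

    Assumes mean(mixture) = p * mean(ref_cases) + (1 - p) * mean(ref_controls)
    and solves for p. This is an illustrative reading of a means-based
    estimator, not a formula taken from the paper.
    """
    mean_mix = statistics.fmean(mixture)
    mean_cases = statistics.fmean(ref_cases)
    mean_controls = statistics.fmean(ref_controls)
    if mean_cases == mean_controls:
        raise ValueError("reference means coincide; the score cannot discriminate")
    p = (mean_mix - mean_controls) / (mean_cases - mean_controls)
    # Sampling noise can push the raw estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, p))


def simulate_scores(rng, n, mean, sd):
    """Draw n hypothetical genetic risk scores from a Gaussian (an assumption)."""
    return [rng.gauss(mean, sd) for _ in range(n)]


# Hypothetical reference cohorts and a mixed cohort with 10% true prevalence.
rng = random.Random(0)
ref_cases = simulate_scores(rng, 5000, 0.30, 0.03)
ref_controls = simulate_scores(rng, 5000, 0.23, 0.03)

true_prevalence = 0.10
n_mixture = 10000
n_cases = int(true_prevalence * n_mixture)
mixture = (simulate_scores(rng, n_cases, 0.30, 0.03)
           + simulate_scores(rng, n_mixture - n_cases, 0.23, 0.03))

estimate = estimate_prevalence_by_means(mixture, ref_cases, ref_controls)
print(f"true prevalence: {true_prevalence:.3f}, estimate: {estimate:.3f}")
```

In this simplified setup the estimate depends only on three sample means, so its precision degrades as the reference means move closer together, that is, as the genetic risk score becomes less discriminative.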

Suggested Citation

  • Benjamin D. Evans & Piotr Słowiński & Andrew T. Hattersley & Samuel E. Jones & Seth Sharp & Robert A. Kimmitt & Michael N. Weedon & Richard A. Oram & Krasimira Tsaneva-Atanasova & Nicholas J. Thomas, 2021. "Estimating disease prevalence in large datasets using genetic risk scores," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-26501-7
    DOI: 10.1038/s41467-021-26501-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-26501-7
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mozhgan Alirezaei Dizicheh & Nasrollah Iranpanah & Ehsan Zamanzade, 2021. "Bootstrap Methods for Judgment Post Stratification," Statistical Papers, Springer, vol. 62(5), pages 2453-2471, October.
    2. Andrew D Wilcock & Sushant Joshi & José Escarce & Peter J Huckfeldt & Teryl Nuckols & Ioana Popescu & Neeraj Sood, 2021. "Luck of the draw: Role of chance in the assignment of medicare readmissions penalties," PLOS ONE, Public Library of Science, vol. 16(12), pages 1-14, December.
    3. Casalin, Fabrizio & Dia, Enzo, 2019. "Information and reputation mechanisms in auctions of remanufactured goods," International Journal of Industrial Organization, Elsevier, vol. 63(C), pages 185-212.
    4. Hakim, Adam & Klorfeld, Shira & Sela, Tal & Friedman, Doron & Shabat-Simon, Maytal & Levy, Dino J., 2021. "Machines learn neuromarketing: Improving preference prediction from self-reports using multiple EEG measures and machine learning," International Journal of Research in Marketing, Elsevier, vol. 38(3), pages 770-791.
    5. Jale Samuwai & Jeremy Maxwell Hills, 2018. "Assessing Climate Finance Readiness in the Asia-Pacific Region," Sustainability, MDPI, vol. 10(4), pages 1-18, April.
    6. Jeffrey D. Michler & Anna Josephson, 2022. "Recent developments in inference: practicalities for applied economics," Chapters, in: A Modern Guide to Food Economics, chapter 11, pages 235-268, Edward Elgar Publishing.
    7. Timothy G. Gregoire & David L. R. Affleck, 2018. "Estimating Desired Sample Size for Simple Random Sampling of a Skewed Population," The American Statistician, Taylor & Francis Journals, vol. 72(2), pages 184-190, April.
    8. Yulong Xie & Mark Halverson & Rosemarie Bartlett & Yan Chen & Michael Rosenberg & Todd Taylor & Jeremiah Williams & Michael Reiner, 2020. "Evaluating Building Energy Code Compliance and Savings Potential through Large-Scale Simulation with Models Inferred by Field Data," Energies, MDPI, vol. 13(9), pages 1-19, May.
    9. Samanwoy Mukhopadhyay & Pravat K Thatoi & Abhay D Pandey & Bidyut K Das & Balachandran Ravindran & Samsiddhi Bhattacharjee & Saroj K Mohapatra, 2017. "Transcriptomic meta-analysis reveals up-regulation of gene expression functional in osteoclast differentiation in human septic shock," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-17, February.
    10. Subburaj Alagarsamy & Sangeeta Mehrolia & Sonia Mathew, 2021. "How Green Consumption Value Affects Green Consumer Behaviour: The Mediating Role of Consumer Attitudes Towards Sustainable Food Logistics Practices," Vision, vol. 25(1), pages 65-76, March.
    11. Au Yong Lyn, Audrey, 2022. "Vocational training and employment outcomes of domestic violence survivors: Evidence from Chihuahua City," International Journal of Educational Development, Elsevier, vol. 89(C).
    12. Fırat Bilgel & Burhan Can Karahasan, 2024. "Understanding Covid-19 Mobility Through Human Capital: A Unified Causal Framework," Computational Economics, Springer; Society for Computational Economics, vol. 63(2), pages 793-833, February.
