Printed from https://ideas.repec.org/a/plo/pgen00/1008081.html

FunSPU: A versatile and adaptive multiple functional annotation-based association test of whole-genome sequencing data

Author

Listed:
  • Yiding Ma
  • Peng Wei

Abstract

Despite ongoing large-scale population-based whole-genome sequencing (WGS) projects such as the NIH NHLBI TOPMed program and the NHGRI Genome Sequencing Program, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of rare variants, many of which are non-trait-associated neutral variants. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap and GTEx projects, may be helpful in distinguishing causal rare variants from neutral ones; however, each functional annotation captures only certain aspects of biological function. Our ability to select informative annotations a priori is limited, and incorporating non-informative annotations introduces noise and reduces power. We propose FunSPU, a versatile test that incorporates multiple biological annotations and is adaptive at both the annotation and variant levels, and thus maintains high power even in the presence of non-informative annotations. In addition to extensive simulations, we illustrate our proposed test using the TWINSUK cohort (n = 1,752) of UK10K WGS data based on six functional annotations: CADD, RegulomeDB, FunSeq, FunSeq2, GERP++, and GenoSkyline. We identified genome-wide significant genetic loci on chromosome 19 near the genes TOMM40 and APOC4-APOC2 associated with low-density lipoprotein (LDL), which are replicated in the UK10K ALSPAC cohort (n = 1,497). These replicated LDL-associated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge. We have implemented the proposed test in an R package “FunSPU”.

Author summary: In recent years, large-scale whole-genome sequencing (WGS) data have been generated, such as those in the UK10K project and the ongoing NIH Trans-Omics for Precision Medicine (TOPMed) WGS program, providing unprecedented opportunities to investigate low-frequency and rare variants in association with complex diseases and traits. However, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of rare variants, many of which are non-trait-associated neutral variants. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap and GTEx projects, can be helpful in distinguishing causal rare variants from neutral ones; however, each functional annotation captures only certain aspects of biological function. To this end, we propose a versatile and adaptive association test, FunSPU, to exploit multiple sources of biological knowledge in the analysis of WGS data. We illustrate our proposed test using the TWINSUK cohort of UK10K WGS data based on six functional annotations. We identified genome-wide significant genetic loci associated with low-density lipoprotein, which are replicated in the UK10K ALSPAC cohort. These replicated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge.
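
The abstract does not give the FunSPU statistic itself, so the following is only an illustrative sketch of the general idea of a test that adapts over both annotations and variants, not the authors' method or the API of the FunSPU R package. It assumes a quantitative trait without covariates, a hypothetical weight matrix W with one row of non-negative weights per annotation, and the usual sum-of-powered-score construction, in which larger powers put more emphasis on the variants with the largest scores. The function names, the set of powers, and the choice to take the minimum p-value over all (annotation, power) pairs are assumptions made for illustration.

```python
import numpy as np


def weighted_spu(y, X, W, gammas):
    """Return a (K, G) array of annotation-weighted sum-of-powered-score statistics.

    y: (n,) quantitative trait; X: (n, p) genotype matrix;
    W: (K, p) non-negative annotation weights (hypothetical input), one row per annotation.
    """
    U = X.T @ (y - y.mean())            # per-variant score under the null
    WU = W * U                          # (K, p): scores reweighted by each annotation
    return np.stack([(WU ** g).sum(axis=1) for g in gammas], axis=1)


def adaptive_pvalue(y, X, W, gammas=(1, 2, 3, 4), n_perm=999, seed=0):
    """Permutation p-value for the minimum p-value over annotations and powers."""
    rng = np.random.default_rng(seed)
    obs = np.abs(weighted_spu(y, X, W, gammas))                       # (K, G)
    null = np.abs(np.stack([weighted_spu(rng.permutation(y), X, W, gammas)
                            for _ in range(n_perm)]))                 # (B, K, G)
    srt = np.sort(null, axis=0)
    K, G = obs.shape
    p_obs = np.empty((K, G))
    p_null = np.empty((n_perm, K, G))
    for k in range(K):
        for g in range(G):
            col = srt[:, k, g]
            # number of permuted statistics >= the observed statistic
            ge_obs = n_perm - np.searchsorted(col, obs[k, g], side="left")
            p_obs[k, g] = (ge_obs + 1) / (n_perm + 1)
            # same count for every permuted statistic; the count includes the
            # statistic itself, i.e. (count among the others + 1) / B
            ge_null = n_perm - np.searchsorted(col, null[:, k, g], side="left")
            p_null[:, k, g] = ge_null / n_perm
    # adaptive step: compare the smallest observed p-value with the smallest
    # p-value of each permuted data set, reusing the same permutations
    min_obs = p_obs.min()
    min_null = p_null.reshape(n_perm, -1).min(axis=1)
    p_adaptive = (np.sum(min_null <= min_obs) + 1) / (n_perm + 1)
    return p_adaptive, p_obs
```

In this sketch, a row of W that is uninformative simply contributes statistics with large p-values, and the minimum over the remaining rows and powers still picks up the signal, which is the intuition behind robustness to non-informative annotations described above. Permuting the trait is only valid here because no covariates are adjusted for.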

Suggested Citation

  • Yiding Ma & Peng Wei, 2019. "FunSPU: A versatile and adaptive multiple functional annotation-based association test of whole-genome sequencing data," PLOS Genetics, Public Library of Science, vol. 15(4), pages 1-21, April.
  • Handle: RePEc:plo:pgen00:1008081
    DOI: 10.1371/journal.pgen.1008081

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008081
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1008081&type=printable
    Download Restriction: no


    Citations


    Cited by:

    1. Yangqing Deng & Yinqiu He & Gongjun Xu & Wei Pan, 2022. "Speeding up Monte Carlo simulations for the adaptive sum of powered score test with importance sampling," Biometrics, The International Biometric Society, vol. 78(1), pages 261-273, March.
