
Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data

Author

Listed:
  • Simon Boutry
  • Raphaël Helaers
  • Tom Lenaerts
  • Miikka Vikkula

Abstract

The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, no single test fits all needs, as each suffers from specific drawbacks. Selecting the right aggregation test while the underlying genetic model of the disease is unknown remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests, created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method enables novel advances in whole-exome/genome sequencing association studies: it handles a wide range of association models and provides researchers with an optimal aggregation analysis for the genetic regions of interest.

Author summary

An increasing number of diseases previously thought to be caused by a mutation in a single gene are now considered to involve several variants in a small number of genes (i.e. “oligogenic”). Only a limited number of dedicated bioinformatic tools exist to study such oligogenic causes of disease. These include so-called aggregation tests. Yet an important challenge is to select the right aggregation test among the many that have been developed, as each suffers from different limitations. We have computationally compared 59 aggregation methods to explore their limitations. We found that combining 36 of them results in a more robust method, which we named “Excalibur”. It can handle a wider range of hypotheses and case-control studies than any of the single methods, while reducing the number of false-positive results. Excalibur also elucidates the underlying genetic architecture of each genomic region under investigation. Thus, it provides a user-friendly and statistically sound platform to study oligogenic inheritance with the increasing amount of available genetic data.

Suggested Citation

  • Simon Boutry & Raphaël Helaers & Tom Lenaerts & Miikka Vikkula, 2023. "Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data," PLOS Computational Biology, Public Library of Science, vol. 19(9), pages 1-26, September.
  • Handle: RePEc:plo:pcbi00:1011488
    DOI: 10.1371/journal.pcbi.1011488

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011488
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1011488&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1011488?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:plo:pone00:0041694 is not listed on IDEAS
    2. Elodie Persyn & Richard Redon & Lise Bellanger & Christian Dina, 2018. "The impact of a fine-scale population stratification on rare variant association test results," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-17, December.
    3. Chung-Feng Kao & Jia-Rou Liu & Hung Hung & Po-Hsiu Kuo, 2015. "A Robust GWSS Method to Simultaneously Detect Rare and Common Variants for Complex Disease," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-14, April.
    4. repec:plo:pone00:0085728 is not listed on IDEAS
    5. Nanye Long & Samuel P Dickson & Jessica M Maia & Hee Shin Kim & Qianqian Zhu & Andrew S Allen, 2013. "Leveraging Prior Information to Detect Causal Variants via Multi-Variant Regression," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-11, June.
    6. repec:plo:pone00:0042530 is not listed on IDEAS
    7. Daniel D Kinnamon & Ray E Hershberger & Eden R Martin, 2012. "Reconsidering Association Testing Methods Using Single-Variant Test Statistics as Alternatives to Pooling Tests for Sequence Data with Rare Variants," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-15, February.
    8. repec:plo:pgen00:1005165 is not listed on IDEAS
    9. repec:plo:pone00:0083057 is not listed on IDEAS
    10. Faming Liang & Momiao Xiong, 2013. "Bayesian Detection of Causal Rare Variants under Posterior Consistency," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-16, July.
    11. Wenjing Qi & Andrew S Allen & Yi-Ju Li, 2019. "Family-based association tests for rare variants with censored traits," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-17, January.
    12. repec:plo:pone00:0170222 is not listed on IDEAS
    13. Rajesh Talluri & Sanjay Shete, 2013. "A Linkage Disequilibrium–Based Approach to Selecting Disease-Associated Rare Variants," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-6, July.
    14. repec:plo:pgen00:1006573 is not listed on IDEAS
    15. Zheng Xu & Song Yan & Cong Wu & Qing Duan & Sixia Chen & Yun Li, 2023. "Next-Generation Sequencing Data-Based Association Testing of a Group of Genetic Markers for Complex Responses Using a Generalized Linear Model Framework," Mathematics, MDPI, vol. 11(11), pages 1-28, June.
    16. repec:plo:pcbi00:1002600 is not listed on IDEAS
    17. Ruth Greenblatt & Peter Bacchetti & Ross Boylan & Kord Kober & Gayle Springer & Kathryn Anastos & Michael Busch & Mardge Cohen & Seble Kassaye & Deborah Gustafson & Bradley Aouizerat & on behalf of th, 2019. "Genetic and clinical predictors of CD4 lymphocyte recovery during suppressive antiretroviral therapy: Whole exome sequencing and antiretroviral therapy response phenotypes," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-25, August.
    18. Hanmin Guo & Alexander Eckehart Urban & Wing Hung Wong, 2024. "Prioritizing disease-related rare variants by integrating gene expression data," PLOS Genetics, Public Library of Science, vol. 20(9), pages 1-16, September.
    19. Xinge Jessie Jeng & Zhongyin John Daye & Wenbin Lu & Jung-Ying Tzeng, 2016. "Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level," PLOS Computational Biology, Public Library of Science, vol. 12(6), pages 1-23, June.
    20. repec:plo:pone00:0114523 is not listed on IDEAS
    21. Wan-Yu Lin, 2014. "Adaptive Combination of P-Values for Family-Based Association Testing with Sequence Data," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-16, December.
    22. Gourab De & Wai-Ki Yip & Iuliana Ionita-Laza & Nan Laird, 2013. "Rare Variant Analysis for Family-Based Design," PLOS ONE, Public Library of Science, vol. 8(1), pages 1-9, January.
    23. Cameron Palmer & Itsik Pe’er, 2016. "Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation," PLOS Genetics, Public Library of Science, vol. 12(6), pages 1-17, June.
    24. Hou-Feng Zheng & Jing-Jing Rong & Ming Liu & Fang Han & Xing-Wei Zhang & J Brent Richards & Li Wang, 2015. "Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes," PLOS ONE, Public Library of Science, vol. 10(1), pages 1-10, January.
    25. Yukinori Okada & Dorothee Diogo & Jeffrey D Greenberg & Faten Mouassess & Walid A L Achkar & Robert S Fulton & Joshua C Denny & Namrata Gupta & Daniel Mirel & Stacy Gabriel & Gang Li & Joel M Kremer &, 2014. "Integration of Sequence Data from a Consanguineous Family with Genetic Data from an Outbred Population Identifies PLB1 as a Candidate Rheumatoid Arthritis Risk Gene," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-12, February.
    26. Ren-Hua Chung & Wei-Yun Tsai & Eden R Martin, 2014. "Family-Based Association Test Using Both Common and Rare Variants and Accounting for Directions of Effects for Sequencing Data," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-7, September.
    27. Yuanjia Wang & Yin-Hsiu Chen & Qiong Yang, 2012. "Joint Rare Variant Association Test of the Average and Individual Effects for Sequencing Studies," PLOS ONE, Public Library of Science, vol. 7(3), pages 1-13, March.
    28. Rachel Marceau West & Wenbin Lu & Daniel M Rotroff & Melaine A Kuenemann & Sheng-Mao Chang & Michael C Wu & Michael J Wagner & John B Buse & Alison A Motsinger-Reif & Denis Fourches & Jung-Ying Tzeng, 2019. "Identifying individual risk rare variants using protein structure guided local tests (POINT)," PLOS Computational Biology, Public Library of Science, vol. 15(2), pages 1-24, February.
    29. Martin Ladouceur & Zari Dastani & Yurii S Aulchenko & Celia M T Greenwood & J Brent Richards, 2012. "The Empirical Power of Rare Variant Association Methods: Results from Sanger Sequencing in 1,998 Individuals," PLOS Genetics, Public Library of Science, vol. 8(2), pages 1-11, February.
