IDEAS home Printed from https://ideas.repec.org/a/plo/pgen00/1003032.html

Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies

Author

Listed:
  • Noah Zaitlen
  • Sara Lindström
  • Bogdan Pasaniuc
  • Marilyn Cornelis
  • Giulio Genovese
  • Samuela Pollack
  • Anne Barton
  • Heike Bickeböller
  • Donald W Bowden
  • Steve Eyre
  • Barry I Freedman
  • David J Friedman
  • John K Field
  • Leif Groop
  • Aage Haugen
  • Joachim Heinrich
  • Brian E Henderson
  • Pamela J Hicks
  • Lynne J Hocking
  • Laurence N Kolonel
  • Maria Teresa Landi
  • Carl D Langefeld
  • Loic Le Marchand
  • Michael Meister
  • Ann W Morgan
  • Olaide Y Raji
  • Angela Risch
  • Albert Rosenberger
  • David Scherf
  • Sophia Steer
  • Martin Walshaw
  • Kevin M Waters
  • Anthony G Wilson
  • Paul Wordsworth
  • Shanbeh Zienolddiny
  • Eric Tchetgen Tchetgen
  • Christopher Haiman
  • David J Hunter
  • Robert M Plenge
  • Jane Worthington
  • David C Christiani
  • Debra A Schaumberg
  • Daniel I Chasman
  • David Altshuler
  • Benjamin Voight
  • Peter Kraft
  • Nick Patterson
  • Alkes L Price

Abstract

Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10⁻⁹). The improvement varied across diseases with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.

Author Summary

This work describes a new methodology for analyzing genome-wide case-control association studies of diseases with strong correlations to clinical covariates, such as age in prostate cancer and body mass index in type 2 diabetes. Currently, researchers either ignore these clinical covariates or apply approaches that ignore the disease's prevalence and the study's ascertainment strategy. We take an alternative approach, leveraging external prevalence information from the epidemiological literature and constructing a statistic based on the classic liability threshold model of disease. Our approach not only improves the power of studies that ascertain individuals randomly or based on the disease phenotype, but also improves the power of studies that ascertain individuals based on both the disease phenotype and clinical covariates. We apply our statistic to seven datasets over six different diseases and a variety of clinical covariates. We found that there was a substantial improvement in test statistics relative to current approaches at known associated variants. This suggests that novel loci may be identified by applying our method to existing and future association studies of these diseases.
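The liability threshold idea named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes a standard normal residual liability, a covariate-specific prevalence supplied from external epidemiological data, and a simple correlation-based statistic. The function names `posterior_mean_liability` and `association_statistic` are hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()  # assumed distribution of the residual liability


def posterior_mean_liability(is_case, prevalence):
    """Expected residual liability given disease status.

    `prevalence` is the disease prevalence among people sharing this
    individual's covariates (an assumed external input). The individual
    is affected when the residual liability exceeds the threshold t,
    so the result is the mean of a truncated standard normal.
    """
    t = _STD_NORMAL.inv_cdf(1.0 - prevalence)
    density = _STD_NORMAL.pdf(t)
    if is_case:
        return density / prevalence          # mean of the upper tail
    return -density / (1.0 - prevalence)     # mean of the lower tail


def association_statistic(genotypes, liabilities):
    """N times the squared Pearson correlation of genotype and liability.

    Under the null of no association this is approximately chi-square
    distributed with one degree of freedom (illustrative statistic only).
    """
    n = len(genotypes)
    g_bar, l_bar = mean(genotypes), mean(liabilities)
    s_gl = sum((g - g_bar) * (l - l_bar) for g, l in zip(genotypes, liabilities))
    s_gg = sum((g - g_bar) ** 2 for g in genotypes)
    s_ll = sum((l - l_bar) ** 2 for l in liabilities)
    return n * s_gl ** 2 / (s_gg * s_ll)
```

Under these assumptions, a case whose covariates imply low prevalence receives a larger expected liability than a case whose covariates imply high prevalence, which mirrors the abstract's observation that low-BMI type 2 diabetes cases carry more genetic signal than high-BMI cases.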

Suggested Citation

  • Noah Zaitlen & Sara Lindström & Bogdan Pasaniuc & Marilyn Cornelis & Giulio Genovese & Samuela Pollack & Anne Barton & Heike Bickeböller & Donald W Bowden & Steve Eyre & Barry I Freedman & David J Fri, 2012. "Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies," PLOS Genetics, Public Library of Science, vol. 8(11), pages 1-13, November.
  • Handle: RePEc:plo:pgen00:1003032
    DOI: 10.1371/journal.pgen.1003032

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003032
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1003032&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Rose Sherri & van der Laan Mark J., 2008. "Simple Optimal Weighting of Cases and Controls in Case-Control Studies," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-24, September.

    Citations

    Cited by:

    1. Gengjie Jia & Xue Zhong & Hae Kyung Im & Nathan Schoettler & Milton Pividori & D. Kyle Hogarth & Anne I. Sperling & Steven R. White & Edward T. Naureckas & Christopher S. Lyttle & Chikashi Terao & Yoi, 2022. "Discerning asthma endotypes through comorbidity mapping," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    2. Joel Mefford & John S Witte, 2012. "The Covariate's Dilemma," PLOS Genetics, Public Library of Science, vol. 8(11), pages 1-2, November.
    3. Emil M. Pedersen & Esben Agerbo & Oleguer Plana-Ripoll & Jette Steinbach & Morten D. Krebs & David M. Hougaard & Thomas Werge & Merete Nordentoft & Anders D. Børglum & Katherine L. Musliner & Andrea G, 2023. "ADuLT: An efficient and robust time-to-event GWAS," Nature Communications, Nature, vol. 14(1), pages 1-12, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rose Sherri & van der Laan Mark J., 2011. "A Targeted Maximum Likelihood Estimator for Two-Stage Designs," The International Journal of Biostatistics, De Gruyter, vol. 7(1), pages 1-21, March.
    2. van der Laan Mark J. & Petersen Maya & Zheng Wenjing, 2013. "Estimating the Effect of a Community-Based Intervention with Two Communities," Journal of Causal Inference, De Gruyter, vol. 1(1), pages 83-106, June.
    3. van der Laan Mark J., 2010. "Targeted Maximum Likelihood Based Causal Inference: Part I," The International Journal of Biostatistics, De Gruyter, vol. 6(2), pages 1-45, February.
    4. van der Laan Mark J. & Gruber Susan, 2010. "Collaborative Double Robust Targeted Maximum Likelihood Estimation," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-71, May.
    5. Sherri Rose & Julie Shi & Thomas G. McGuire & Sharon-Lise T. Normand, 2017. "Matching and Imputation Methods for Risk Adjustment in the Health Insurance Marketplaces," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(2), pages 525-542, December.
    6. Etienne Volatier & Francesco Salvo & Antoine Pariente & Émeline Courtois & Sylvie Escolano & Pascale Tubert-Bitter & Ismaïl Ahmed, 2022. "High-Dimensional Propensity Score-Adjusted Case-Crossover for Discovering Adverse Drug Reactions from Computerized Administrative Healthcare Databases," Drug Safety, Springer, vol. 45(3), pages 275-285, March.
    7. van der Laan Mark J. & Gruber Susan, 2012. "Targeted Minimum Loss Based Estimation of Causal Effects of Multiple Time Point Interventions," The International Journal of Biostatistics, De Gruyter, vol. 8(1), pages 1-41, May.
    8. O. Saarela & L. R. Belzile & D. A. Stephens, 2016. "A Bayesian view of doubly robust causal inference," Biometrika, Biometrika Trust, vol. 103(3), pages 667-681.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.