
Annotation Regression for Genome-Wide Association Studies with an Application to Psychiatric Genomic Consortium Data

Authors

  • Sunyoung Shin (University of Wisconsin-Madison)
  • Sündüz Keleş (University of Wisconsin-Madison)

Abstract

Although genome-wide association studies (GWAS) have been successful at finding thousands of disease-associated genetic variants (GVs), identifying causal variants and elucidating the mechanisms by which genotypes influence phenotypes remain critical open questions. A key challenge is that a large percentage of disease-associated GVs are potential regulatory variants located in noncoding regions, making them difficult to interpret. Recent research efforts focus on going beyond annotating GVs by integrating functional annotation data with GWAS to prioritize GVs. However, the applicability of these approaches is challenged by the high dimensionality and heterogeneity of functional annotation data. Furthermore, existing methods often assume global associations of GVs with annotation data. This strong assumption is susceptible to violations for GVs involved in many complex diseases. To address these issues, we develop a general regression framework, named Annotation Regression for GWAS (ARoG). ARoG is based on a finite mixture of linear regressions model in which GWAS association measures are viewed as responses and functional annotations as predictors. This mixture framework addresses the heterogeneity of effects of GVs by grouping them into clusters, and the high dimensionality of the functional annotations by enabling annotation selection within each cluster. ARoG further employs permutation testing to evaluate the significance of selected annotations. Computational experiments indicate that ARoG can discover distinct associations between disease risk and functional annotations. Application of ARoG to autism and schizophrenia data from the Psychiatric Genomics Consortium led to the identification of GVs that significantly affect interactions of several transcription factors with DNA as potential mechanisms contributing to these disorders.
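
To make the modelling idea in the abstract concrete, the following is a minimal sketch of a finite mixture of linear regressions fitted by plain EM, with a response vector standing in for GWAS association measures and the columns of X standing in for functional annotations. It is not the ARoG implementation: the function names (simulate, fit_mixture_of_regressions), the simulated data, and all parameter values are assumptions made for illustration, and the within-cluster annotation selection and permutation testing described in the abstract are deliberately left out.

```python
import numpy as np


def simulate(n=400, seed=1):
    """Toy data (hypothetical): two hidden clusters with opposite effects.

    Column 0 of X has slope +2 in one cluster and -2 in the other;
    column 1 has no effect in either cluster.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    z = rng.integers(0, 2, size=n)            # hidden cluster labels
    slope = np.where(z == 0, 2.0, -2.0)
    y = slope * X[:, 0] + rng.normal(scale=0.3, size=n)
    return X, y, z


def fit_mixture_of_regressions(X, y, n_clusters=2, n_iter=200, seed=0):
    """Unpenalized EM for a mixture of Gaussian linear regressions.

    Returns mixing weights (K,), coefficients with intercept (K, p + 1),
    residual standard deviations (K,), and responsibilities (n, K).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])     # design matrix with intercept
    resp = rng.dirichlet(np.ones(n_clusters), size=n)  # random soft start

    for _ in range(n_iter):
        # M-step: weighted least squares within each cluster.
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        betas = np.empty((n_clusters, p + 1))
        sigmas = np.empty(n_clusters)
        for k in range(n_clusters):
            w = resp[:, k]
            XtW = Xd.T * w                    # scales each observation by w
            gram = XtW @ Xd + 1e-8 * np.eye(p + 1)  # tiny ridge for stability
            betas[k] = np.linalg.solve(gram, XtW @ y)
            resid_k = y - Xd @ betas[k]
            var_k = (w * resid_k ** 2).sum() / (w.sum() + 1e-12)
            sigmas[k] = np.sqrt(var_k + 1e-12)

        # E-step: posterior probability that each observation belongs to k.
        resid = y[:, None] - Xd @ betas.T     # shape (n, K)
        log_dens = (np.log(weights) - np.log(sigmas)
                    - 0.5 * np.log(2.0 * np.pi)
                    - 0.5 * (resid / sigmas) ** 2)
        log_dens -= log_dens.max(axis=1, keepdims=True)  # avoid underflow
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)

    return weights, betas, sigmas, resp


X, y, z = simulate()
weights, betas, sigmas, resp = fit_mixture_of_regressions(X, y)
print("mixing weights:", weights)
print("per-cluster coefficients (intercept, x0, x1):")
print(betas)
```

In a sketch like this, a single global regression would average the two opposite slopes toward zero, which illustrates why the abstract argues against assuming global associations. EM can settle in a local optimum depending on the random start, so in practice several seeds would typically be tried.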

Suggested Citation

  • Sunyoung Shin & Sündüz Keleş, 2017. "Annotation Regression for Genome-Wide Association Studies with an Application to Psychiatric Genomic Consortium Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 50-72, June.
  • Handle: RePEc:spr:stabio:v:9:y:2017:i:1:d:10.1007_s12561-016-9154-z
    DOI: 10.1007/s12561-016-9154-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12561-016-9154-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    2. Peng, Liang & Qi, Yongcheng & Wang, Ruodu, 2014. "Empirical likelihood test for high dimensional linear models," Statistics & Probability Letters, Elsevier, vol. 86(C), pages 85-90.
    3. Jiang, Liewen & Bondell, Howard D. & Wang, Huixia Judy, 2014. "Interquantile shrinkage and variable selection in quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 69(C), pages 208-219.
    4. Min, Aleksey & Holzmann, Hajo & Czado, Claudia, 2010. "Model selection strategies for identifying most relevant covariates in homoscedastic linear models," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3194-3211, December.
    5. Ollier, Edouard & Samson, Adeline & Delavenne, Xavier & Viallon, Vivian, 2016. "A SAEM algorithm for fused lasso penalized NonLinear Mixed Effect Models: Application to group comparison in pharmacokinetics," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 207-221.
    6. Gestel, R.V. & Müller, T. & Bosmans, J., 2016. "Does My High Blood Pressure Improve Your Survival? Overall and Subgroup Learning Curves in Health," Health, Econometrics and Data Group (HEDG) Working Papers 16/27, HEDG, c/o Department of Economics, University of York.
    7. Xinge Jessie Jeng & Zhongyin John Daye & Wenbin Lu & Jung-Ying Tzeng, 2016. "Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level," PLOS Computational Biology, Public Library of Science, vol. 12(6), pages 1-23, June.
    8. Bergersen Linn Cecilie & Glad Ingrid K. & Lyng Heidi, 2011. "Weighted Lasso with Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-29, August.
    9. Roberts, S. & Nowak, G., 2014. "Stabilizing the lasso against cross-validation variability," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 198-211.
    10. Rachel Marceau West & Wenbin Lu & Daniel M Rotroff & Melaine A Kuenemann & Sheng-Mao Chang & Michael C Wu & Michael J Wagner & John B Buse & Alison A Motsinger-Reif & Denis Fourches & Jung-Ying Tzeng, 2019. "Identifying individual risk rare variants using protein structure guided local tests (POINT)," PLOS Computational Biology, Public Library of Science, vol. 15(2), pages 1-24, February.
    11. Friedman, Jerome H., 2012. "Fast sparse regression and classification," International Journal of Forecasting, Elsevier, vol. 28(3), pages 722-738.
    12. Du, Pang & Cheng, Guang & Liang, Hua, 2012. "Semiparametric regression models with additive nonparametric components and high dimensional parametric components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 2006-2017.
    13. Radchenko, Peter, 2015. "High dimensional single index models," Journal of Multivariate Analysis, Elsevier, vol. 139(C), pages 266-282.
    14. Ruth M. Pfeiffer & Andrew Redd & Raymond J. Carroll, 2017. "On the impact of model selection on predictor identification and parameter inference," Computational Statistics, Springer, vol. 32(2), pages 667-690, June.
    15. Bergersen, Linn Cecilie & Tharmaratnam, Kukatharmini & Glad, Ingrid K., 2014. "Monotone splines lasso," Computational Statistics & Data Analysis, Elsevier, vol. 77(C), pages 336-351.
    16. Abdallah Mkhadri & Mohamed Ouhourane, 2015. "A group VISA algorithm for variable selection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(1), pages 41-60, March.
