GWAS in a Box: Statistical and Visual Analytics of Structured Associations via GenAMap

Authors

  • Eric P Xing
  • Ross E Curtis
  • Georg Schoenherr
  • Seunghak Lee
  • Junming Yin
  • Kriti Puniyani
  • Wei Wu
  • Peter Kinnaird

Abstract

With continuing improvements in genotyping and molecular phenotyping technology and falling typing costs, more and more clinical studies of complex diseases are expected, within a few years, to recruit thousands of individuals for pan-omic genetic association analyses. Hence, there is a great need for algorithms and software tools that scale to the whole-omic level, integrate different omic data, leverage rich structural information, and are easily accessible to non-technical users. We present GenAMap, an interactive analytics software platform that 1) automates the execution of principled machine learning methods that detect genome- and phenome-wide associations among genotypes, gene expression data, and clinical or other macroscopic traits, and 2) provides new visualization tools specifically designed to aid in the exploration of association mapping results. Algorithmically, GenAMap is based on a new paradigm for GWAS and PheWAS analysis, termed structured association mapping, which leverages various structures in the omic data. We demonstrate the function of GenAMap via a case study of the Brem and Kruglyak yeast dataset, then apply it to a comprehensive eQTL analysis of the NIH heterogeneous stock mice dataset and report the findings. GenAMap is available from http://sailing.cs.cmu.edu/genamap.

Suggested Citation

  • Eric P Xing & Ross E Curtis & Georg Schoenherr & Seunghak Lee & Junming Yin & Kriti Puniyani & Wei Wu & Peter Kinnaird, 2014. "GWAS in a Box: Statistical and Visual Analytics of Structured Associations via GenAMap," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-19, June.
  • Handle: RePEc:plo:pone00:0097524
    DOI: 10.1371/journal.pone.0097524

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0097524
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0097524&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Valur Emilsson & Elias F. Gudmundsson & Thorarinn Jonmundsson & Brynjolfur G. Jonsson & Michael Twarog & Valborg Gudmundsdottir & Zhiguang Li & Nancy Finkel & Stephen Poor & Xin Liu & Robert Esterberg, 2022. "A proteogenomic signature of age-related macular degeneration in blood," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    2. Xue Jiang & Han Zhang & Xiongwen Quan & Zhandong Liu & Yanbin Yin, 2017. "Disease-related gene module detection based on a multi-label propagation clustering algorithm," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-17, May.
    3. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    4. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LP, vol. 20(1), pages 176-235, March.
    5. Lingxue Zhang & Seyoung Kim, 2014. "Learning Gene Networks under SNP Perturbations Using eQTL Datasets," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-20, February.
    6. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    7. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 1-40, March.
    8. Benjamin A Logsdon & Jason Mezey, 2010. "Gene Expression Network Reconstruction by Convex Feature Selection when Incorporating Genetic Perturbations," PLOS Computational Biology, Public Library of Science, vol. 6(12), pages 1-13, December.
    9. The Tien Mai, 2023. "Reliable Genetic Correlation Estimation via Multiple Sample Splitting and Smoothing," Mathematics, MDPI, vol. 11(9), pages 1-13, May.
    10. Michimasa Fujiogi & Yoshihiko Raita & Marcos Pérez-Losada & Robert J. Freishtat & Juan C. Celedón & Jonathan M. Mansbach & Pedro A. Piedra & Zhaozhong Zhu & Carlos A. Camargo & Kohei Hasegawa, 2022. "Integrated relationship of nasopharyngeal airway host response and microbiome associates with bronchiolitis severity," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Achim Ahrens & Arnab Bhattacharjee, 2015. "Two-Step Lasso Estimation of the Spatial Weights Matrix," Econometrics, MDPI, vol. 3(1), pages 1-28, March.
    12. Siwei Xia & Yuehan Yang & Hu Yang, 2022. "Sparse Laplacian Shrinkage with the Graphical Lasso Estimator for Regression Problems," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 255-277, March.
    13. Jeon, Jong-June & Kim, Yongdai & Won, Sungho & Choi, Hosik, 2020. "Primal path algorithm for compositional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 148(C).
    14. Monica Novackova & Richard S.J. Tol, 2018. "Climate Change Awareness and Willingness to Pay for its Mitigation: Evidence from the UK," Working Paper Series 0318, Department of Economics, University of Sussex Business School.
    15. Junyang Qian & Yosuke Tanigawa & Wenfei Du & Matthew Aguirre & Chris Chang & Robert Tibshirani & Manuel A Rivas & Trevor Hastie, 2020. "A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank," PLOS Genetics, Public Library of Science, vol. 16(10), pages 1-30, October.
    16. Abhik Ghosh & Magne Thoresen, 2018. "Non-concave penalization in linear mixed-effect models and regularized selection of fixed effects," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(2), pages 179-210, April.
    17. Jan Pablo Burgard & Joscha Krause & Dennis Kreber & Domingo Morales, 2021. "The generalized equivalence of regularization and min–max robustification in linear mixed models," Statistical Papers, Springer, vol. 62(6), pages 2857-2883, December.
    18. Blum Yuna & Houée-Bigot Magalie & Causeur David, 2016. "Sparse factor model for co-expression networks with an application using prior biological knowledge," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(3), pages 253-272, June.
    19. Paul Tseng & Sangwoon Yun, 2014. "Incrementally Updated Gradient Methods for Constrained and Regularized Optimization," Journal of Optimization Theory and Applications, Springer, vol. 160(3), pages 832-853, March.
    20. Ville-Petteri Mäkinen & Mete Civelek & Qingying Meng & Bin Zhang & Jun Zhu & Candace Levian & Tianxiao Huan & Ayellet V Segrè & Sujoy Ghosh & Juan Vivar & Majid Nikpay & Alexandre F R Stewart & Christ, 2014. "Integrative Genomics Reveals Novel Molecular Pathways and Gene Networks for Coronary Artery Disease," PLOS Genetics, Public Library of Science, vol. 10(7), pages 1-14, July.
