
Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery

Author

Listed:
  • Jose A Seoane
  • Colin Campbell
  • Ian N M Day
  • Juan P Casas
  • Tom R Gaunt

Abstract

Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and in testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants, respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. To do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and of several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea, and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels.

Author Summary: Pleiotropy occurs when variation in one gene affects several unrelated phenotypes. Studying this phenomenon can be useful for discovering gene function and for understanding the evolution of a gene. In this paper, we present a methodology, based on Canonical Correlation Analysis, for gene-centered multiple association analysis: it relates the variation of SNPs in one gene or a set of genes to one phenotype or a set of phenotypes. The methodology can be applied in gene-centered association analysis, multiple association analysis or pleiotropic pattern discovery. We apply it to a genotype dataset and a set of cardiovascular-related phenotypes, and discover a new association between the gene NRG1 and phenotypes related to left ventricular hypertrophy, pleiotropic effects of this gene on other phenotypes such as coagulation factors and urea, and pleiotropic effects of the coagulation-related genes F7 and F10 on coagulation factors and cholesterol levels. The methodology could also be used to find multiple associations in other omics datasets.
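
To make the central idea concrete, the following is a minimal sketch of how canonical correlations between a block of SNPs and a block of phenotypes could be computed. It is not the authors' implementation: the function name `canonical_correlations`, the simulated genotypes and phenotypes, and the QR/SVD route are all assumptions chosen for illustration, and the genetic-algorithm attribute selection described in the abstract is not shown.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q).

    Illustrative helper (hypothetical name). Assumes more samples than
    attributes and full column rank in both blocks, which mirrors the
    abstract's point that attributes must be restricted relative to samples.
    """
    # Centre each column so that only deviations from the mean are compared.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Simulated data only: 500 individuals, 5 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
n = 500
genotypes = rng.integers(0, 3, size=(n, 5)).astype(float)

# Three simulated phenotypes; only the first depends on SNPs 0 and 2.
phenotypes = rng.normal(size=(n, 3))
phenotypes[:, 0] += genotypes[:, 0] + genotypes[:, 2]

rho = canonical_correlations(genotypes, phenotypes)
print("canonical correlations:", rho)
```

With one SNP and one phenotype the first canonical correlation reduces to the absolute Pearson correlation, which is a convenient sanity check for a sketch like this. A significance test on the correlations (for example via Wilks' lambda) would be needed before reading any value as an association.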

Suggested Citation

  • Jose A Seoane & Colin Campbell & Ian N M Day & Juan P Casas & Tom R Gaunt, 2014. "Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-13, October.
  • Handle: RePEc:plo:pcbi00:1003876
    DOI: 10.1371/journal.pcbi.1003876

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003876
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003876&type=printable
    Download Restriction: no



    Citations

    Cited by:

    1. Nan Lin & Yun Zhu & Ruzong Fan & Momiao Xiong, 2017. "A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-33, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Wenjia & Zhou, Yi-Hui, 2021. "Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    2. Coleman Jacob & Replogle Joseph & Chandler Gabriel & Hardin Johanna, 2016. "Resistant multiple sparse canonical correlation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(2), pages 123-138, April.
    3. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    4. Chalise, Prabhakar & Fridley, Brooke L., 2012. "Comparison of penalty functions for sparse canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 245-254.
    5. Ronglai Shen & Qianxing Mo & Nikolaus Schultz & Venkatraman E Seshan & Adam B Olshen & Jason Huse & Marc Ladanyi & Chris Sander, 2012. "Integrative Subtype Discovery in Glioblastoma Using iCluster," PLOS ONE, Public Library of Science, vol. 7(4), pages 1-9, April.
    6. Zhang Fan & Miecznikowski Jeffrey C. & Tritchler David L., 2020. "Identification of supervised and sparse functional genomic pathways," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-27, February.
    7. Melissa G Naylor & Xihong Lin & Scott T Weiss & Benjamin A Raby & Christoph Lange, 2010. "Using Canonical Correlation Analysis to Discover Genetic Regulatory Variants," PLOS ONE, Public Library of Science, vol. 5(5), pages 1-6, May.
    8. Kai Wang, 2014. "Testing Genetic Association by Regressing Genotype over Multiple Phenotypes," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-9, September.
    9. Szefer Elena & Graham Jinko & Lu Donghuan & Beg Mirza Faisal & Nathoo Farouk, 2017. "Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 349-365, December.
    10. Alberto Roverato & F. Marta L. Di Lascio, 2011. "Wilks' Λ Dissimilarity Measures for Gene Clustering: An Approach Based on the Identification of Transcription Modules," Biometrics, The International Biometric Society, vol. 67(4), pages 1236-1248, December.
    11. Lee Woojoo & Lee Donghwan & Lee Youngjo & Pawitan Yudi, 2011. "Sparse Canonical Covariance Analysis for High-throughput Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-24, July.
    12. Tenenhaus, Arthur & Philippe, Cathy & Frouin, Vincent, 2015. "Kernel Generalized Canonical Correlation Analysis," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 114-131.
    13. Iaci, Ross & Sriram, T.N., 2013. "Robust multivariate association and dimension reduction using density divergences," Journal of Multivariate Analysis, Elsevier, vol. 117(C), pages 281-295.
    14. Heejung Shim & Daniel I Chasman & Joshua D Smith & Samia Mora & Paul M Ridker & Deborah A Nickerson & Ronald M Krauss & Matthew Stephens, 2015. "A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-20, April.
    15. Huanhuan Zhu & Shuanglin Zhang & Qiuying Sha, 2018. "A novel method to test associations between a weighted combination of phenotypes and genetic variants," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-17, January.
    16. Lukáš Malec & Vladimír Janovský, 2020. "Connecting the multivariate partial least squares with canonical analysis: a path-following approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(3), pages 589-609, September.
    17. Xue Yuan & Zhang Sanguo & Wang Jinjuan & Ding Juan & Li Qizhai, 2019. "A powerful test for ordinal trait genetic association analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(2), pages 1-9, April.
    18. Diptavo Dutta & Yuan He & Ashis Saha & Marios Arvanitis & Alexis Battle & Nilanjan Chatterjee, 2022. "Aggregative trans-eQTL analysis detects trait-specific target gene sets in whole blood," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    19. Bin Li & Hyunjin Shin & Georgy Gulbekyan & Olga Pustovalova & Yuri Nikolsky & Andrew Hope & Marina Bessarabova & Matthew Schu & Elona Kolpakova-Hart & David Merberg & Andrew Dorner & William L Trepicc, 2015. "Development of a Drug-Response Modeling Framework to Identify Cell Line Derived Translational Biomarkers That Can Predict Treatment Outcome to Erlotinib or Sorafenib," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-20, June.
    20. Langworthy, Benjamin W. & Stephens, Rebecca L. & Gilmore, John H. & Fine, Jason P., 2021. "Canonical correlation analysis for elliptical copulas," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
