Printed from https://ideas.repec.org/a/bpj/sagmbi/v8y2009i1n28.html

Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data

Authors

  • Witten Daniela M (Stanford University)
  • Tibshirani Robert J. (Stanford University)

Abstract

In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations. Sparse CCA is a method for identifying sparse linear combinations of the two sets of variables that are highly correlated with each other. It has been shown to be useful in the analysis of high-dimensional genomic data, when two sets of assays are available on the same set of samples. In this paper, we propose two extensions to the sparse CCA methodology. (1) Sparse CCA is an unsupervised method; that is, it does not make use of outcome measurements that may be available for each observation (e.g., survival time or cancer subtype). We propose an extension to sparse CCA, which we call sparse supervised CCA, which results in the identification of linear combinations of the two sets of variables that are correlated with each other and associated with the outcome. (2) It is becoming increasingly common for researchers to collect data on more than two assays on the same set of samples; for instance, SNP, gene expression, and DNA copy number measurements may all be available. We develop sparse multiple CCA in order to extend the sparse CCA methodology to the case of more than two data sets. We demonstrate these new methods on simulated data and on a recently published and publicly available diffuse large B-cell lymphoma data set.
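The abstract describes sparse CCA only in words, so the following is a minimal sketch of one way such sparse, highly correlated linear combinations could be computed. It is not taken from the paper. It assumes a common formulation in which weight vectors u and v are chosen to make u'X'Yv large while an L1-type penalty drives many weights to exactly zero, and it approximates this by alternating soft-thresholding updates. The function names, the `frac_u`/`frac_v` tuning parameters (thresholds expressed as a fraction of the largest entry) and the toy data generator are all hypothetical and chosen for illustration.

```python
import math
import random


def standardize(M):
    """Center each column and scale it to unit sample standard deviation."""
    n, p = len(M), len(M[0])
    cols = []
    for j in range(p):
        col = [M[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((a - mean) ** 2 for a in col) / (n - 1))
        cols.append([(a - mean) / sd if sd > 0 else 0.0 for a in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def soft_threshold(vec, frac):
    """Shrink every entry toward zero by frac * max|entry| (0 <= frac < 1)."""
    delta = frac * max(abs(a) for a in vec)
    return [math.copysign(max(abs(a) - delta, 0.0), a) for a in vec]


def normalize(vec):
    """Rescale to unit Euclidean length; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else list(vec)


def sparse_cca(X, Y, frac_u=0.5, frac_v=0.5, n_iter=50):
    """Return sparse weight vectors (u, v) for the columns of X and Y.

    Assumption: thresholds are set relative to the largest entry, a
    simplification of choosing them to satisfy an explicit L1 bound.
    """
    Xs, Ys = standardize(X), standardize(Y)
    n, p, q = len(Xs), len(Xs[0]), len(Ys[0])
    # K = X'Y, the p x q matrix of cross-products between the two data sets.
    K = [[sum(Xs[i][j] * Ys[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        # Update u with v fixed, then v with u fixed.
        Kv = [sum(K[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalize(soft_threshold(Kv, frac_u))
        Ktu = [sum(K[j][k] * u[j] for j in range(p)) for k in range(q)]
        v = normalize(soft_threshold(Ktu, frac_v))
    return u, v


def make_toy_data(n=200, seed=0):
    """Hypothetical data: X columns 0-1 and Y column 0 share one latent signal."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        X.append([z + 0.3 * rng.gauss(0, 1) for _ in range(2)]
                 + [rng.gauss(0, 1) for _ in range(4)])
        Y.append([z + 0.3 * rng.gauss(0, 1)]
                 + [rng.gauss(0, 1) for _ in range(4)])
    return X, Y


if __name__ == "__main__":
    u, v = sparse_cca(*make_toy_data())
    print("u:", [round(a, 3) for a in u])
    print("v:", [round(a, 3) for a in v])
```

On the toy data, the nonzero weights should concentrate on the columns that share the latent signal, which is the behaviour the abstract attributes to sparse CCA. The supervised and multiple-data-set extensions proposed in the paper are not covered by this sketch.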

Suggested Citation

  • Witten Daniela M & Tibshirani Robert J., 2009. "Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-27, June.
  • Handle: RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:28
    DOI: 10.2202/1544-6115.1470

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1470
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Michael Morley & Cliona M. Molony & Teresa M. Weber & James L. Devlin & Kathryn G. Ewens & Richard S. Spielman & Vivian G. Cheung, 2004. "Genetic analysis of genome-wide variation in human gene expression," Nature, Nature, vol. 430(7001), pages 743-747, August.

    Citations

    Cited by:

    1. Lee Woojoo & Lee Donghwan & Lee Youngjo & Pawitan Yudi, 2011. "Sparse Canonical Covariance Analysis for High-throughput Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-24, July.
    2. Ronglai Shen & Qianxing Mo & Nikolaus Schultz & Venkatraman E Seshan & Adam B Olshen & Jason Huse & Marc Ladanyi & Chris Sander, 2012. "Integrative Subtype Discovery in Glioblastoma Using iCluster," PLOS ONE, Public Library of Science, vol. 7(4), pages 1-9, April.
    3. Wang, Wenjia & Zhou, Yi-Hui, 2021. "Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    4. Jose A Seoane & Colin Campbell & Ian N M Day & Juan P Casas & Tom R Gaunt, 2014. "Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-13, October.
    5. Zhang Fan & Miecznikowski Jeffrey C. & Tritchler David L., 2020. "Identification of supervised and sparse functional genomic pathways," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-27, February.
    6. Bin Li & Hyunjin Shin & Georgy Gulbekyan & Olga Pustovalova & Yuri Nikolsky & Andrew Hope & Marina Bessarabova & Matthew Schu & Elona Kolpakova-Hart & David Merberg & Andrew Dorner & William L Trepicc, 2015. "Development of a Drug-Response Modeling Framework to Identify Cell Line Derived Translational Biomarkers That Can Predict Treatment Outcome to Erlotinib or Sorafenib," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-20, June.
    7. Nam D Nguyen & Daifeng Wang, 2020. "Multiview learning for understanding functional multiomics," PLOS Computational Biology, Public Library of Science, vol. 16(4), pages 1-26, April.
    8. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    9. Langworthy, Benjamin W. & Stephens, Rebecca L. & Gilmore, John H. & Fine, Jason P., 2021. "Canonical correlation analysis for elliptical copulas," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    10. Iaci, Ross & Sriram, T.N., 2013. "Robust multivariate association and dimension reduction using density divergences," Journal of Multivariate Analysis, Elsevier, vol. 117(C), pages 281-295.
    11. Bayarbaatar Amgalan & Hyunju Lee, 2014. "WMAXC: A Weighted Maximum Clique Method for Identifying Condition-Specific Sub-Network," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-10, August.
    12. Diptavo Dutta & Yuan He & Ashis Saha & Marios Arvanitis & Alexis Battle & Nilanjan Chatterjee, 2022. "Aggregative trans-eQTL analysis detects trait-specific target gene sets in whole blood," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    13. Palzer, Elise F. & Wendt, Christine H. & Bowler, Russell P. & Hersh, Craig P. & Safo, Sandra E. & Lock, Eric F., 2022. "sJIVE: Supervised joint and individual variation explained," Computational Statistics & Data Analysis, Elsevier, vol. 175(C).
    14. Yunfeng Zhang & Irina Gaynanova, 2022. "Joint association and classification analysis of multi‐view data," Biometrics, The International Biometric Society, vol. 78(4), pages 1614-1625, December.
    15. Coleman Jacob & Replogle Joseph & Chandler Gabriel & Hardin Johanna, 2016. "Resistant multiple sparse canonical correlation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(2), pages 123-138, April.
    16. Tenenhaus, Arthur & Philippe, Cathy & Frouin, Vincent, 2015. "Kernel Generalized Canonical Correlation Analysis," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 114-131.
    17. Sandra E. Safo & Eun Jeong Min & Lillian Haine, 2022. "Sparse linear discriminant analysis for multiview structured data," Biometrics, The International Biometric Society, vol. 78(2), pages 612-623, June.
    18. Chalise, Prabhakar & Fridley, Brooke L., 2012. "Comparison of penalty functions for sparse canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 245-254.
    19. Jung, Sungkyu, 2018. "Continuum directions for supervised dimension reduction," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 27-43.
    20. Efrat Muller & Itamar Shiryan & Elhanan Borenstein, 2024. "Multi-omic integration of microbiome data for identifying disease-associated modules," Nature Communications, Nature, vol. 15(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Julia Schröder & Vitalia Schüller & Andrea May & Christian Gerges & Mario Anders & Jessica Becker & Timo Hess & Nicole Kreuser & René Thieme & Kerstin U Ludwig & Tania Noder & Marino Venerito & Lothar, 2019. "Identification of loci of functional relevance to Barrett’s esophagus and esophageal adenocarcinoma: Cross-referencing of expression quantitative trait loci data from disease-relevant tissues with gen," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-12, December.
    2. Bo Jiang & Jun S. Liu, 2015. "Bayesian Partition Models for Identifying Expression Quantitative Trait Loci," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1350-1361, December.
    3. Yixin Fang & Yang Feng & Ming Yuan, 2014. "Regularized principal components of heritability," Computational Statistics, Springer, vol. 29(3), pages 455-465, June.
    4. Lingxue Zhang & Seyoung Kim, 2014. "Learning Gene Networks under SNP Perturbations Using eQTL Datasets," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-20, February.
    5. Cipolli III, William & Hanson, Timothy & McLain, Alexander C., 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.
    6. Barbara E Stranger & Stephen B Montgomery & Antigone S Dimas & Leopold Parts & Oliver Stegle & Catherine E Ingle & Magda Sekowska & George Davey Smith & David Evans & Maria Gutierrez-Arcelus & Alkes P, 2012. "Patterns of Cis Regulatory Variation in Diverse Human Populations," PLOS Genetics, Public Library of Science, vol. 8(4), pages 1-13, April.
    7. Eric R Gamazon & Hae-Kyung Im & Shiwei Duan & Yves A Lussier & Nancy J Cox & M Eileen Dolan & Wei Zhang, 2010. "ExprTarget: An Integrative Approach to Predicting Human MicroRNA Targets," PLOS ONE, Public Library of Science, vol. 5(10), pages 1-8, October.
    8. Ryan Abo & Gregory D Jenkins & Liewei Wang & Brooke L Fridley, 2012. "Identifying the Genetic Variation of Gene Expression Using Gene Sets: Application of Novel Gene Set eQTL Approach to PharmGKB and KEGG," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-11, August.
    9. Mitsutaka Kadota & Howard H Yang & Nan Hu & Chaoyu Wang & Ying Hu & Philip R Taylor & Kenneth H Buetow & Maxwell P Lee, 2007. "Allele-Specific Chromatin Immunoprecipitation Studies Show Genetic Influence on Chromatin State in Human Genome," PLOS Genetics, Public Library of Science, vol. 3(5), pages 1-11, May.
    10. Oualkacha Karim & Labbe Aurelie & Ciampi Antonio & Roy Marc-Andre & Maziade Michel, 2012. "Principal Components of Heritability for High Dimension Quantitative Traits and General Pedigrees," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-27, January.
    11. Enrico Petretto & Leonardo Bottolo & Sarah R Langley & Matthias Heinig & Chris McDermott-Roe & Rizwan Sarwar & Michal Pravenec & Norbert Hübner & Timothy J Aitman & Stuart A Cook & Sylvia Richardson, 2010. "New Insights into the Genetic Control of Gene Expression using a Bayesian Multi-tissue Approach," PLOS Computational Biology, Public Library of Science, vol. 6(4), pages 1-13, April.
    12. Bergersen Linn Cecilie & Glad Ingrid K. & Lyng Heidi, 2011. "Weighted Lasso with Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-29, August.
    13. Jin Hyun Ju & Sushila A Shenoy & Ronald G Crystal & Jason G Mezey, 2017. "An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci," PLOS Computational Biology, Public Library of Science, vol. 13(5), pages 1-26, May.
    14. Parkhomenko Elena & Tritchler David & Beyene Joseph, 2009. "Sparse Canonical Correlation Analysis with Application to Genomic Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-34, January.
    15. Leopold Parts & Oliver Stegle & John Winn & Richard Durbin, 2011. "Joint Genetic Analysis of Gene Expression Data with Inferred Cellular Phenotypes," PLOS Genetics, Public Library of Science, vol. 7(1), pages 1-10, January.
    16. Ning Jiang & Minghui Wang & Tianye Jia & Lin Wang & Lindsey Leach & Christine Hackett & David Marshall & Zewei Luo, 2011. "A Robust Statistical Method for Association-Based eQTL Analysis," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-11, August.
    17. Paul C Boutros & Ivy D Moffat & Allan B Okey & Raimo Pohjanvirta, 2011. "mRNA Levels in Control Rat Liver Display Strain-Specific, Hereditary, and AHR-Dependent Components," PLOS ONE, Public Library of Science, vol. 6(7), pages 1-15, July.
    18. Hui-Min Wang & Ching-Lin Hsiao & Ai-Ru Hsieh & Ying-Chao Lin & Cathy S J Fann, 2012. "Constructing Endophenotypes of Complex Diseases Using Non-Negative Matrix Factorization and Adjusted Rand Index," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-12, July.
    19. Xiaohong Li & Steven G Self & Patricia C Galipeau & Thomas G Paulson & Brian J Reid, 2007. "Direct Inference of SNP Heterozygosity Rates and Resolution of LOH Detection," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    20. Urmo Võsa & Tõnu Esko & Silva Kasela & Tarmo Annilo, 2015. "Altered Gene Expression Associated with microRNA Binding Site Polymorphisms," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-24, October.
