Printed from https://ideas.repec.org/a/eee/jmvana/v166y2018icp241-265.html

Angle-based joint and individual variation explained

Author

Listed:
  • Feng, Qing
  • Jiang, Meilei
  • Hannig, Jan
  • Marron, J.S.

Abstract

Integrative analysis of disparate data blocks measured on a common set of experimental subjects is a major challenge in modern data analysis. This data structure naturally motivates the simultaneous exploration of the joint and individual variation within each data block, resulting in new insights. For instance, there is a strong desire to integrate the multiple genomic data sets in The Cancer Genome Atlas to characterize the common and also the unique aspects of cancer genetics and cell biology for each source. In this paper we introduce Angle-Based Joint and Individual Variation Explained, which captures both joint and individual variation within each data block. This is a major improvement over earlier approaches to this challenge in terms of a new conceptual understanding, much better adaptation to data heterogeneity, and a fast linear algebra computation. Important mathematical contributions are the use of score subspaces as the principal descriptors of variation structure and the use of perturbation theory as the guide for variation segmentation. This leads to an exploratory data analysis method which is insensitive to the heterogeneity among data blocks and does not require separate normalization. An application to cancer data reveals different behaviors of each type of signal in characterizing tumor subtypes. An application to a mortality data set reveals interesting historical lessons. Software and data are available on GitHub at https://github.com/MeileiJiang/AJIVE_Project.
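
The abstract names score subspaces and a fast linear algebra computation but does not spell out the steps. The following is a minimal NumPy sketch of one way such an angle-based split could look, not the authors' implementation (their software is at the GitHub address above). The function names `score_basis` and `ajive_sketch`, the per-block `ranks`, and the cutoff `min_sq_sv` are all hypothetical; in particular, the cutoff is simply supplied by the caller here, standing in for the perturbation-theory guidance the abstract mentions. Rows are assumed to be the common subjects.

```python
import numpy as np


def score_basis(X, rank):
    """Orthonormal basis (n x rank) for the score subspace of one block.

    X has one row per subject; the leading left singular vectors span
    the subject-space directions carrying the block's main variation.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]


def ajive_sketch(blocks, ranks, min_sq_sv):
    """Simplified joint/individual split (illustrative assumption only).

    blocks    : list of (n x d_k) arrays sharing the same n subjects
    ranks     : assumed signal rank of each block (hypothetical input)
    min_sq_sv : hypothetical cutoff on squared singular values of the
                stacked bases; values near len(blocks) mean the principal
                angles between the score subspaces are close to zero
    """
    bases = [score_basis(X, r) for X, r in zip(blocks, ranks)]
    stacked = np.hstack(bases)                    # n x sum(ranks)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    joint_rank = int(np.sum(s ** 2 > min_sq_sv))
    J = U[:, :joint_rank]                         # joint score basis
    P = J @ J.T                                   # projector onto joint space
    joint = [P @ X for X in blocks]
    # Remainder after removing the joint part; with noisy data this
    # would still contain noise as well as individual variation.
    individual = [X - P @ X for X in blocks]
    return J, joint, individual


# Toy data: two noiseless blocks sharing one score direction j.
rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, 3)))
j, a, b = Q[:, 0], Q[:, 1], Q[:, 2]               # mutually orthonormal scores
X1 = np.outer(j, rng.standard_normal(8)) + np.outer(a, rng.standard_normal(8))
X2 = np.outer(j, rng.standard_normal(5)) + np.outer(b, rng.standard_normal(5))

J, joint, individual = ajive_sketch([X1, X2], ranks=[2, 2], min_sq_sv=1.5)
```

In this toy setup the squared singular values of the stacked bases are approximately 2, 1, 1 and 0, so a cutoff of 1.5 keeps only the shared direction. Because only orthonormal bases are stacked, multiplying a whole block by a constant leaves its basis, and hence the estimated joint space, unchanged, which illustrates in a small way why no separate block normalization is needed.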

Suggested Citation

  • Feng, Qing & Jiang, Meilei & Hannig, Jan & Marron, J.S., 2018. "Angle-based joint and individual variation explained," Journal of Multivariate Analysis, Elsevier, vol. 166(C), pages 241-265.
  • Handle: RePEc:eee:jmvana:v:166:y:2018:i:c:p:241-265
    DOI: 10.1016/j.jmva.2018.03.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X1730204X
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Parkhomenko Elena & Tritchler David & Beyene Joseph, 2009. "Sparse Canonical Correlation Analysis with Application to Genomic Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-34, January.
    2. Kotz,Samuel & Nadarajah,Saralees, 2004. "Multivariate T-Distributions and Their Applications," Cambridge Books, Cambridge University Press, number 9780521826549.
    3. Paul Horst, 1961. "Relations among m sets of measures," Psychometrika, Springer;The Psychometric Society, vol. 26(2), pages 129-149, June.
    4. Waaijenborg Sandra & Verselewel de Witt Hamer Philip C. & Zwinderman Aeilko H, 2008. "Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-29, January.
    5. Vinod, H. D., 1976. "Canonical ridge and econometrics of joint production," Journal of Econometrics, Elsevier, vol. 4(2), pages 147-166, May.

    Citations

    Cited by:

    1. Yang, Xi & Hoadley, Katherine A. & Hannig, Jan & Marron, J.S., 2023. "Jackstraw inference for AJIVE data integration," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    2. Davide Pigoli & Pantelis Z. Hadjipantelis & John S. Coleman & John A. D. Aston, 2018. "The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 67(5), pages 1103-1145, November.
    3. J. S. Marron, 2019. "Comments on: Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 342-344, June.
    4. Palzer, Elise F. & Wendt, Christine H. & Bowler, Russell P. & Hersh, Craig P. & Safo, Sandra E. & Lock, Eric F., 2022. "sJIVE: Supervised joint and individual variation explained," Computational Statistics & Data Analysis, Elsevier, vol. 175(C).
    5. Zhao, Yuxuan & Matteson, David S. & Mostofsky, Stewart H. & Nebel, Mary Beth & Risk, Benjamin B., 2022. "Group linear non-Gaussian component analysis with applications to neuroimaging," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chalise, Prabhakar & Fridley, Brooke L., 2012. "Comparison of penalty functions for sparse canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 245-254.
    2. Lykou, Anastasia & Whittaker, Joe, 2010. "Sparse CCA using a Lasso with positivity constraints," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3144-3157, December.
    3. Melissa G Naylor & Xihong Lin & Scott T Weiss & Benjamin A Raby & Christoph Lange, 2010. "Using Canonical Correlation Analysis to Discover Genetic Regulatory Variants," PLOS ONE, Public Library of Science, vol. 5(5), pages 1-6, May.
    4. Wang, Wenjia & Zhou, Yi-Hui, 2021. "Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    5. Tenenhaus, Arthur & Philippe, Cathy & Frouin, Vincent, 2015. "Kernel Generalized Canonical Correlation Analysis," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 114-131.
    6. Lukáš Malec & Vladimír Janovský, 2020. "Connecting the multivariate partial least squares with canonical analysis: a path-following approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(3), pages 589-609, September.
    7. Zhang Fan & Miecznikowski Jeffrey C. & Tritchler David L., 2020. "Identification of supervised and sparse functional genomic pathways," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-27, February.
    8. Wan-Lun Wang, 2019. "Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(1), pages 196-222, March.
    9. Takane, Yoshio & Yanai, Haruo & Hwang, Heungsun, 2006. "An improved method for generalized constrained canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 221-241, January.
    10. repec:jss:jstsof:23:i12 is not listed on IDEAS
    11. Hanafi, Mohamed & Kiers, Henk A.L., 2006. "Analysis of K sets of data, with differential emphasis on agreement between and within sets," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 1491-1508, December.
    12. Lamboni, Matieyendou, 2022. "Efficient dependency models: Simulating dependent random variables," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 200(C), pages 199-217.
    13. Chen, Tao & Martin, Elaine & Montague, Gary, 2009. "Robust probabilistic PCA with missing data and contribution analysis for outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3706-3716, August.
    14. Pietro Amenta & Antonio Lucadamo & Antonello D’Ambra, 2021. "Restricted Common Component and Specific Weight Analysis: A Constrained Explorative Approach for the Customer Satisfaction Evaluation," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 156(2), pages 409-427, August.
    15. Szefer Elena & Graham Jinko & Lu Donghuan & Beg Mirza Faisal & Nathoo Farouk, 2017. "Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 349-365, December.
    16. Catania, Leopoldo & Proietti, Tommaso, 2020. "Forecasting volatility with time-varying leverage and volatility of volatility effects," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1301-1317.
    17. Andrés García-Medina & Graciela González Farías, 2020. "Transfer entropy as a variable selection methodology of cryptocurrencies in the framework of a high dimensional predictive model," PLOS ONE, Public Library of Science, vol. 15(1), pages 1-31, January.
    18. Alberto Roverato & F. Marta L. Di Lascio, 2011. "Wilks' Λ Dissimilarity Measures for Gene Clustering: An Approach Based on the Identification of Transcription Modules," Biometrics, The International Biometric Society, vol. 67(4), pages 1236-1248, December.
    19. Yuzhu Tian & Er’qian Li & Maozai Tian, 2016. "Bayesian joint quantile regression for mixed effects models with censoring and errors in covariates," Computational Statistics, Springer, vol. 31(3), pages 1031-1057, September.
    20. Jondeau, Eric, 2016. "Asymmetry in tail dependence in equity portfolios," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 351-368.
    21. Badi H. Baltagi & Georges Bresson & Anoop Chaturvedi & Guy Lacroix, 2022. "Robust Dynamic Space-Time Panel Data Models Using ε-contamination: An Application to Crop Yields and Climate Change," Center for Policy Research Working Papers 254, Center for Policy Research, Maxwell School, Syracuse University.
