Printed from https://ideas.repec.org/a/plo/pone00/0276886.html

Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations

Authors

  • Diptavo Dutta
  • Ananda Sen
  • Jaya Satagopan

Abstract

Background: Copy number aberrations (CNAs) in cancer affect disease outcomes by regulating molecular phenotypes, such as gene expression, that drive important biological processes. To gain comprehensive insight into molecular biomarkers for cancer, it is critical to identify key groups of CNAs, the associated gene modules, the regulatory modules, and their downstream effects on outcomes.

Methods: In this paper, we demonstrate an innovative use of sparse canonical correlation analysis (sCCA) to identify ensembles of CNAs and gene modules in the context of binary and censored disease endpoints. Our approach detects potentially orthogonal gene expression modules that are highly correlated with sets of CNAs and then identifies the genes within these modules that are associated with the outcome.

Results: Analyzing clinical and genomic data on 1,904 breast cancer patients from the METABRIC study, we found 14 gene modules regulated by groups of proximally located CNA sites. We validated this finding using an independent set of 1,077 breast invasive carcinoma samples from The Cancer Genome Atlas (TCGA). Our analysis of 7 clinical endpoints identified several novel and interpretable regulatory associations, highlighting the role of CNAs in key biological pathways and processes in breast cancer. Genes significantly associated with the outcomes were enriched for the early estrogen response pathway and DNA repair pathways, as well as for targets of transcription factors such as E2F4, MYC, and ETS1 that have recognized roles in tumor characteristics and survival. A subsequent meta-analysis across the endpoints identified several additional genes by aggregating weaker associations.

Conclusions: Our findings suggest that sCCA can aggregate weaker associations to identify interpretable and important genes, modules, and clinically consequential pathways.
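The abstract does not spell out the sCCA estimator. As a rough, non-authoritative illustration of the core step (finding a sparse set of CNA features and a sparse gene module whose weighted sums are maximally correlated), the Python sketch below computes one pair of sparse canonical vectors using the penalized matrix decomposition of Witten, Tibshirani and Hastie (2009), which treats the within-block covariance matrices as diagonal. This is an assumed stand-in and may differ from the authors' actual formulation, tuning, and handling of binary or censored endpoints. The helper names (`soft_threshold`, `l1_bounded_unit`, `sparse_cca`), the L1 bounds `c_x` and `c_y`, and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(a: np.ndarray, delta: float) -> np.ndarray:
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_bounded_unit(a: np.ndarray, c: float, n_iter: int = 60) -> np.ndarray:
    """Unit-L2 vector S(a, d) / ||S(a, d)||_2 whose L1 norm is at most c.

    Assumes c >= 1. The threshold d is found by bisection, as in the
    penalized matrix decomposition (assumed stand-in for the paper's method).
    """
    u = a / np.linalg.norm(a)
    if np.abs(u).sum() <= c:
        return u  # constraint already satisfied, no shrinkage needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        s = soft_threshold(a, mid)
        # Larger thresholds give sparser vectors with a smaller L1/L2 ratio.
        if np.abs(s).sum() / np.linalg.norm(s) <= c:
            hi = mid
        else:
            lo = mid
    s = soft_threshold(a, hi)
    return s / np.linalg.norm(s)


def sparse_cca(X, Y, c_x, c_y, n_iter=100, tol=1e-8):
    """First pair of sparse canonical vectors (u, v) and their correlation.

    Hypothetical helper: X (n x p) plays the role of CNA features and
    Y (n x q) of gene expression; c_x, c_y are L1 bounds controlling sparsity.
    """
    # Standardize columns; then Xs.T @ Ys / n is the cross-correlation matrix.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    K = Xs.T @ Ys
    # Start v at the leading right singular vector of K.
    v = np.linalg.svd(K, full_matrices=False)[2][0]
    u = np.zeros(K.shape[0])
    for _ in range(n_iter):
        # Alternate sparse updates of u and v (diagonal-covariance sCCA).
        u_new = l1_bounded_unit(K @ v, c_x)
        v_new = l1_bounded_unit(K.T @ u_new, c_y)
        done = np.abs(u_new - u).sum() + np.abs(v_new - v).sum() < tol
        u, v = u_new, v_new
        if done:
            break
    r = np.corrcoef(Xs @ u, Ys @ v)[0, 1]
    return u, v, r


# Simulated example: one latent factor drives the first 5 "CNA" columns
# and the first 5 "gene" columns; all other columns are pure noise.
rng = np.random.default_rng(0)
n, p, q = 200, 50, 40
z = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :5] += 2 * z[:, None]
Y = rng.normal(size=(n, q))
Y[:, :5] += 2 * z[:, None]
u, v, r = sparse_cca(X, Y, c_x=2.5, c_y=2.5)
print("CNA features with nonzero weight:", np.flatnonzero(u))
print("genes with nonzero weight:", np.flatnonzero(v))
print("canonical correlation:", round(r, 3))
```

Smaller values of `c_x` and `c_y` (between 1 and the square root of the number of columns) force sparser weight vectors. In this sketch, further modules, which the abstract describes as potentially orthogonal, would have to be extracted separately, for example by deflating `K` and repeating the alternating updates; that step, and the subsequent testing of module genes against outcomes, is not shown.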

Suggested Citation

  • Diptavo Dutta & Ananda Sen & Jaya Satagopan, 2022. "Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations," PLOS ONE, Public Library of Science, vol. 17(12), pages 1-18, December.
  • Handle: RePEc:plo:pone00:0276886
    DOI: 10.1371/journal.pone.0276886

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0276886
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0276886&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0276886?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xu, Yang & Zhao, Shishun & Hu, Tao & Sun, Jianguo, 2021. "Variable selection for generalized odds rate mixture cure models with interval-censored failure time data," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
    2. Aleix Prat & Fara Brasó-Maristany & Olga Martínez-Sáez & Esther Sanfeliu & Youli Xia & Meritxell Bellet & Patricia Galván & Débora Martínez & Tomás Pascual & Mercedes Marín-Aguilera & Anna Rodríguez &, 2023. "Circulating tumor DNA reveals complex biological features with clinical relevance in metastatic breast cancer," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    3. Shi, Chengchun & Zhou, Yunzhe & Li, Lexin, 2024. "Testing directed acyclic graph via structural, supervised and generative adversarial learning," LSE Research Online Documents on Economics 119446, London School of Economics and Political Science, LSE Library.
    4. Liang, Weijuan & Zhang, Qingzhao & Ma, Shuangge, 2024. "Hierarchical false discovery rate control for high-dimensional survival analysis with interactions," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    5. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    6. Hugh Chen & Scott M. Lundberg & Su-In Lee, 2022. "Explaining a series of models by propagating Shapley values," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    7. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    8. Camilla Tombari & Alessandro Zannini & Rebecca Bertolio & Silvia Pedretti & Matteo Audano & Luca Triboli & Valeria Cancila & Davide Vacca & Manuel Caputo & Sara Donzelli & Ilenia Segatto & Simone Vodr, 2023. "Mutant p53 sustains serine-glycine synthesis and essential amino acids intake promoting breast cancer growth," Nature Communications, Nature, vol. 14(1), pages 1-21, December.
    9. Nicolas Städler & Sach Mukherjee, 2017. "Two-sample testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 225-246, January.
    10. Kevin He & Yue Wang & Xiang Zhou & Han Xu & Can Huang, 2019. "An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(3), pages 569-585, July.
    11. Peng, Liang & Qi, Yongcheng & Wang, Ruodu, 2014. "Empirical likelihood test for high dimensional linear models," Statistics & Probability Letters, Elsevier, vol. 86(C), pages 85-90.
    12. Michael R. Kelly & Kamila Wisniewska & Matthew J. Regner & Michael W. Lewis & Andrea A. Perreault & Eric S. Davis & Douglas H. Phanstiel & Joel S. Parker & Hector L. Franco, 2022. "A multi-omic dissection of super-enhancer driven oncogenic gene expression programs in ovarian cancer," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    13. Caroline Hoffmann & Floriane Noel & Maximilien Grandclaudon & Lucile Massenet-Regad & Paula Michea & Philemon Sirven & Lilith Faucheux & Aurore Surun & Olivier Lantz & Mylene Bohec & Jian Ye & Weihua , 2022. "PD-L1 and ICOSL discriminate human Secretory and Helper dendritic cells in cancer, allergy and autoimmunity," Nature Communications, Nature, vol. 13(1), pages 1-20, December.
    14. The Tien Mai, 2023. "Reliable Genetic Correlation Estimation via Multiple Sample Splitting and Smoothing," Mathematics, MDPI, vol. 11(9), pages 1-13, May.
    15. Marta Vicioso-Mantis & Raquel Fueyo & Claudia Navarro & Sara Cruz-Molina & Wilfred F. J. Ijcken & Elena Rebollo & Álvaro Rada-Iglesias & Marian A. Martínez-Balbás, 2022. "JMJD3 intrinsically disordered region links the 3D-genome structure to TGFβ-dependent transcription activation," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    16. repec:plo:pone00:0049359 is not listed on IDEAS
    17. Chao-Hui Chang & Feng Liu & Stefania Militi & Svenja Hester & Reshma Nibhani & Siwei Deng & James Dunford & Aniko Rendek & Zahir Soonawalla & Roman Fischer & Udo Oppermann & Siim Pauklin, 2024. "The pRb/RBL2-E2F1/4-GCN5 axis regulates cancer stem cell formation and G0 phase entry/exit by paracrine mechanisms," Nature Communications, Nature, vol. 15(1), pages 1-29, December.
    18. Solari, Aldo & Djordjilović, Vera, 2022. "Multi split conformal prediction," Statistics & Probability Letters, Elsevier, vol. 184(C).
    19. Victor Chernozhukov & Mert Demirer & Esther Duflo & Iván Fernández-Val, 2017. "Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," Papers 1712.04802, arXiv.org, revised Oct 2023.
    20. Sandra M. Rocha & Sílvia Socorro & Luís A. Passarinha & Cláudio J. Maia, 2022. "Comprehensive Landscape of STEAP Family Members Expression in Human Cancers: Unraveling the Potential Usefulness in Clinical Practice Using Integrated Bioinformatics Analysis," Data, MDPI, vol. 7(5), pages 1-48, May.
    21. Jieqiong Zhang & Zhenhua Hu & Hwa Hwa Chung & Yun Tian & Kah Weng Lau & Zheng Ser & Yan Ting Lim & Radoslaw M. Sobota & Hwei Fen Leong & Benjamin Jieming Chen & Clarisse Jingyi Yeo & Shawn Ying Xuan T, 2023. "Dependency of NELF-E-SLUG-KAT2B epigenetic axis in breast cancer carcinogenesis," Nature Communications, Nature, vol. 14(1), pages 1-21, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.