Printed from https://ideas.repec.org/a/plo/pone00/0276886.html

Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations

Author

Listed:
  • Diptavo Dutta
  • Ananda Sen
  • Jaya Satagopan

Abstract

Background: Copy number aberrations (CNAs) in cancer affect disease outcomes by regulating molecular phenotypes, such as gene expression, that drive important biological processes. To gain comprehensive insight into molecular biomarkers for cancer, it is critical to identify key groups of CNAs, the associated gene modules, regulatory modules, and their downstream effects on outcomes.

Methods: In this paper, we demonstrate an innovative use of sparse canonical correlation analysis (sCCA) to identify ensembles of CNAs and gene modules in the context of binary and censored disease endpoints. Our approach detects potentially orthogonal gene expression modules that are highly correlated with sets of CNAs and then identifies the genes within these modules that are associated with the outcome.

Results: Analyzing clinical and genomic data on 1,904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. We validated this finding using an independent set of 1,077 breast invasive carcinoma samples from The Cancer Genome Atlas (TCGA). Our analysis of 7 clinical endpoints identified several novel and interpretable regulatory associations, highlighting the role of CNAs in key biological pathways and processes for breast cancer. Genes significantly associated with the outcomes were enriched for the early estrogen response pathway, DNA repair pathways, and targets of transcription factors such as E2F4, MYC, and ETS1 that have recognized roles in tumor characteristics and survival. Subsequent meta-analysis across the endpoints identified several additional genes through the aggregation of weaker associations.

Conclusions: Our findings suggest that sCCA can aggregate weaker associations to identify interpretable and important genes, modules, and clinically consequential pathways.
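To make the Methods step more concrete, the following is a minimal sketch of how one pair of sparse canonical weight vectors could be estimated between a CNA matrix and a gene expression matrix. It is not the authors' implementation: the abstract does not specify the algorithm, so the sketch assumes a common alternating soft-thresholding scheme on the cross-correlation matrix. The function names (`sparse_cca`, `soft_threshold`), the relative-threshold parameters `lam_u` and `lam_v`, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(a, frac):
    """Shrink entries of `a` toward zero by `frac` times its largest magnitude.

    Assumption for this sketch: the threshold is relative (0 <= frac < 1),
    so at least one entry always survives.
    """
    lam = frac * np.max(np.abs(a))
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def unit_norm(a):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca(X, Y, lam_u=0.5, lam_v=0.5, n_iter=100, tol=1e-8):
    """Estimate one pair of sparse canonical weights (u for X, v for Y).

    X: samples x CNA features, Y: samples x gene expression features.
    Assumes no column is constant, so that standardization is well defined.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = Xs.T @ Ys / (X.shape[0] - 1)  # cross-correlation matrix

    # Initialize v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(C.shape[0])

    for _ in range(n_iter):
        u_new = unit_norm(soft_threshold(C @ v, lam_u))
        v_new = unit_norm(soft_threshold(C.T @ u_new, lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v


# Simulated example (hypothetical data, not METABRIC or TCGA):
# a shared latent factor drives the first 5 "CNA" columns and
# the first 8 "gene" columns; all other columns are pure noise.
rng = np.random.default_rng(0)
n_samples = 200
latent = rng.normal(size=n_samples)
X = rng.normal(size=(n_samples, 30))
Y = rng.normal(size=(n_samples, 40))
X[:, :5] += 2.0 * latent[:, None]
Y[:, :8] += 2.0 * latent[:, None]

u, v = sparse_cca(X, Y)
cna_idx = np.flatnonzero(u)   # CNA features with nonzero weight
gene_idx = np.flatnonzero(v)  # genes in the correlated module
print("selected CNA columns:", cna_idx)
print("selected gene columns:", gene_idx)
```

In a workflow like the one the abstract describes, the genes with nonzero weight in `v` would form a candidate module, and each gene in that module could then be tested for association with a binary or censored endpoint. Further modules would require repeating the procedure after removing the first component, which this sketch does not attempt.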

Suggested Citation

  • Diptavo Dutta & Ananda Sen & Jaya Satagopan, 2022. "Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations," PLOS ONE, Public Library of Science, vol. 17(12), pages 1-18, December.
  • Handle: RePEc:plo:pone00:0276886
    DOI: 10.1371/journal.pone.0276886

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0276886
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0276886&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xu, Yang & Zhao, Shishun & Hu, Tao & Sun, Jianguo, 2021. "Variable selection for generalized odds rate mixture cure models with interval-censored failure time data," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
    2. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    3. Aleix Prat & Fara Brasó-Maristany & Olga Martínez-Sáez & Esther Sanfeliu & Youli Xia & Meritxell Bellet & Patricia Galván & Débora Martínez & Tomás Pascual & Mercedes Marín-Aguilera & Anna Rodríguez &, 2023. "Circulating tumor DNA reveals complex biological features with clinical relevance in metastatic breast cancer," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    4. Shi, Chengchun & Zhou, Yunzhe & Li, Lexin, 2024. "Testing directed acyclic graph via structural, supervised and generative adversarial learning," LSE Research Online Documents on Economics 119446, London School of Economics and Political Science, LSE Library.
    5. Liang, Weijuan & Zhang, Qingzhao & Ma, Shuangge, 2024. "Hierarchical false discovery rate control for high-dimensional survival analysis with interactions," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    6. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    7. Claude Renaux & Laura Buzdugan & Markus Kalisch & Peter Bühlmann, 2020. "Rejoinder on: Hierarchical inference for genome-wide association studies: a view on methodology with software," Computational Statistics, Springer, vol. 35(1), pages 59-67, March.
    8. Hugh Chen & Scott M. Lundberg & Su-In Lee, 2022. "Explaining a series of models by propagating Shapley values," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    9. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    10. Adam C. Weiner & Marc J. Williams & Hongyu Shi & Ignacio Vázquez-García & Sohrab Salehi & Nicole Rusk & Samuel Aparicio & Sohrab P. Shah & Andrew McPherson, 2024. "Inferring replication timing and proliferation dynamics from single-cell DNA sequencing data," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    11. Marco, Nicholas & Şentürk, Damla & Jeste, Shafali & DiStefano, Charlotte C. & Dickinson, Abigail & Telesca, Donatello, 2024. "Flexible regularized estimation in high-dimensional mixed membership models," Computational Statistics & Data Analysis, Elsevier, vol. 194(C).
    12. Camilla Tombari & Alessandro Zannini & Rebecca Bertolio & Silvia Pedretti & Matteo Audano & Luca Triboli & Valeria Cancila & Davide Vacca & Manuel Caputo & Sara Donzelli & Ilenia Segatto & Simone Vodr, 2023. "Mutant p53 sustains serine-glycine synthesis and essential amino acids intake promoting breast cancer growth," Nature Communications, Nature, vol. 14(1), pages 1-21, December.
    13. Ruben Dezeure & Peter Bühlmann & Cun-Hui Zhang, 2017. "Rejoinder on: High-dimensional simultaneous inference with the bootstrap," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(4), pages 751-758, December.
    14. Nicolas Städler & Sach Mukherjee, 2017. "Two-sample testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 225-246, January.
    15. Guo, Wenwen & Cui, Hengjian, 2019. "Projection tests for high-dimensional spiked covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 21-32.
    16. Lei-Jie Dai & Ding Ma & Yu-Zheng Xu & Ming Li & Yu-Wei Li & Yi Xiao & Xi Jin & Song-Yang Wu & Ya-Xin Zhao & Han Wang & Wen-Tao Yang & Yi-Zhou Jiang & Zhi-Ming Shao, 2023. "Molecular features and clinical implications of the heterogeneity in Chinese patients with HER2-low breast cancer," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    17. Ha-Linh Nguyen & Tatjana Geukens & Marion Maetens & Samuel Aparicio & Ayse Bassez & Ake Borg & Jane Brock & Annegien Broeks & Carlos Caldas & Fatima Cardoso & Maxim Schepper & Mauro Delorenzi & Caroli, 2023. "Obesity-associated changes in molecular biology of primary breast cancer," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    18. Kevin He & Yue Wang & Xiang Zhou & Han Xu & Can Huang, 2019. "An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 25(3), pages 569-585, July.
    19. Victor Chernozhukov & Mert Demirer & Esther Duflo & Iván Fernández-Val, 2017. "Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," Papers 1712.04802, arXiv.org, revised Oct 2023.
    20. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer, 2020. "lassopack: Model selection and prediction with regularized regression in Stata," Stata Journal, StataCorp LLC, vol. 20(1), pages 176-235, March.
