Printed from https://ideas.repec.org/a/bla/biomet/v77y2021i4p1445-1455.html

Using the “Hidden” genome to improve classification of cancer types

Authors

  • Saptarshi Chakraborty
  • Colin B. Begg
  • Ronglai Shen

Abstract

It is increasingly common clinically for cancer specimens to be examined using techniques that identify somatic mutations. In principle, these mutational profiles can be used to diagnose the tissue of origin, a critical task for the 3% to 5% of tumors that have an unknown primary site. Diagnosis of primary site is also critical for screening tests that employ circulating DNA. However, most mutations observed in any new tumor are very rarely occurring mutations, and indeed the preponderance of these may never have been observed in any previous recorded tumor. To create a viable diagnostic tool we need to harness the information content in this “hidden genome” of variants for which no direct information is available. To accomplish this we propose a multilevel meta‐feature regression to extract the critical information from rare variants in the training data in a way that permits us to also extract diagnostic information from any previously unobserved variants in the new tumor sample. A scalable implementation of the model is obtained by combining a high‐dimensional feature screening approach with a group‐lasso penalized maximum likelihood approach based on an equivalent mixed‐effect representation of the multilevel model. We apply the method to the Cancer Genome Atlas whole‐exome sequencing data set including 3702 tumor samples across seven common cancer sites. Results show that our multilevel approach can harness substantial diagnostic information from the hidden genome.
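The abstract's central idea is that a variant never seen in training can still carry diagnostic signal once it is described by meta-features. The deliberately simplified Python sketch below illustrates that idea. It is not the authors' multilevel meta-feature regression, and it does not reproduce their mixed-effect representation. Every choice in it is a hypothetical assumption:

- The only meta-features are the six strand-collapsed single-base substitution classes (SBS6).
- Variants are (gene, position, ref, alt) tuples.
- A marginal between-class variance screen stands in for the high-dimensional feature screening step.
- The group-lasso multinomial model is fitted by plain proximal gradient descent. Each group holds one meta-feature's coefficients across all cancer types, in the spirit of the Yuan & Lin and Vincent & Hansen references listed below.

The toy data are synthetic, not TCGA.

```python
import numpy as np

# Hypothetical meta-features: the six strand-collapsed single-base substitution
# classes (SBS6). The paper's meta-features are richer; this is only a stand-in.
SBS6 = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs6_class(ref, alt):
    """Index of the SBS6 class of a substitution, written on the pyrimidine strand."""
    if ref in ("A", "G"):
        ref, alt = _COMP[ref], _COMP[alt]
    return SBS6.index(f"{ref}>{alt}")


def meta_feature_matrix(tumours):
    """Rows = tumours, columns = meta-feature proportions. Gene and position are
    ignored, so a variant never observed in training still contributes."""
    X = np.zeros((len(tumours), len(SBS6)))
    for i, variants in enumerate(tumours):
        for _gene, _pos, ref, alt in variants:
            X[i, sbs6_class(ref, alt)] += 1.0
        X[i] /= max(len(variants), 1)
    return X


def screen(X, y, d):
    """Marginal screen (stand-in for SIS): keep the d columns whose variance is
    best explained by the class labels."""
    mu = X.mean(axis=0)
    total = ((X - mu) ** 2).sum(axis=0)
    between = sum((y == k).sum() * (X[y == k].mean(axis=0) - mu) ** 2 for k in np.unique(y))
    return np.argsort(between / np.maximum(total, 1e-12))[::-1][:d]


def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)


def fit_group_lasso_multinomial(X, y, n_classes, lam, n_iter=2000):
    """Proximal gradient for mean multinomial log-loss + lam * sum_j ||B[j, :]||_2,
    where row j holds meta-feature j's coefficients for every class."""
    n, p = X.shape
    Y = np.eye(n_classes)[y]
    B, b0 = np.zeros((p, n_classes)), np.zeros(n_classes)
    X1 = np.hstack([X, np.ones((n, 1))])
    step = n / np.linalg.norm(X1, 2) ** 2  # conservative: below 1/L for this loss
    for _ in range(n_iter):
        R = softmax(X @ B + b0) - Y  # per-sample gradient w.r.t. the logits
        B -= step * (X.T @ R) / n
        b0 -= step * R.mean(axis=0)  # intercepts are left unpenalized
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        B *= np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))  # group soft-threshold
    return B, b0


def toy_tumour(rng, k, n_var=30):
    """Synthetic tumour: class 0 is C>T-rich, class 1 is C>A-rich (not real data)."""
    dominant = ("C", "T") if k == 0 else ("C", "A")
    return [(f"GENE{rng.integers(10**6)}", j, *(dominant if rng.random() < 0.7 else ("T", "G")))
            for j in range(n_var)]


rng = np.random.default_rng(0)
y = np.array([0, 1] * 40)
X = meta_feature_matrix([toy_tumour(rng, k) for k in y])
keep = screen(X, y, d=3)
B, b0 = fit_group_lasso_multinomial(X[:, keep], y, n_classes=2, lam=0.01)
X_new = meta_feature_matrix([toy_tumour(rng, 0)])  # fresh genes, unseen variants
pred_new = softmax(X_new[:, keep] @ B + b0).argmax(axis=1)
print("predicted class:", pred_new[0])
```

The new tumour's gene labels are drawn afresh, so almost certainly none of its exact variants occur in the training set. It is still classified, through its substitution-class profile alone. Grouping coefficients by meta-feature means the penalty keeps or drops a meta-feature for all cancer types at once, rather than class by class.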

Suggested Citation

  • Saptarshi Chakraborty & Colin B. Begg & Ronglai Shen, 2021. "Using the “Hidden” genome to improve classification of cancer types," Biometrics, The International Biometric Society, vol. 77(4), pages 1445-1455, December.
  • Handle: RePEc:bla:biomet:v:77:y:2021:i:4:p:1445-1455
    DOI: 10.1111/biom.13367

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13367
    Download Restriction: no

    References listed on IDEAS

    1. Vincent, Martin & Hansen, Niels Richard, 2014. "Sparse group lasso and high dimensional multinomial classification," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 771-786.
    2. Saptarshi Chakraborty & Arshi Arora & Colin B. Begg & Ronglai Shen, 2019. "Using somatic variant richness to mine signals from rare variants in the cancer genome," Nature Communications, Nature, vol. 10(1), pages 1-9, December.
    3. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    4. Ludmil B. Alexandrov & Serena Nik-Zainal & David C. Wedge & Samuel A. J. R. Aparicio & Sam Behjati & Andrew V. Biankin & Graham R. Bignell & Niccolò Bolli & Ake Borg & Anne-Lise Børresen-Dale & Sandri, 2013. "Correction: Corrigendum: Signatures of mutational processes in human cancer," Nature, Nature, vol. 502(7470), pages 258-258, October.
    5. Ludmil B. Alexandrov & Serena Nik-Zainal & David C. Wedge & Samuel A. J. R. Aparicio & Sam Behjati & Andrew V. Biankin & Graham R. Bignell & Niccolò Bolli & Ake Borg & Anne-Lise Børresen-Dale & Sandri, 2013. "Signatures of mutational processes in human cancer," Nature, Nature, vol. 500(7463), pages 415-421, August.
    6. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Xing Cheng & Jing An & Jitong Lou & Qisheng Gu & Weimin Ding & Gaith Nabil Droby & Yilin Wang & Chenghao Wang & Yanzhe Gao & Jay Ramanlal Anand & Abigail Shelton & Andrew Benson Satterlee & Breanna Ma, 2024. "Trans-lesion synthesis and mismatch repair pathway crosstalk defines chemoresistance and hypermutation mechanisms in glioblastoma," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    3. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    4. Marjan M. Naeini & Felicity Newell & Lauren G. Aoude & Vanessa F. Bonazzi & Kalpana Patel & Guy Lampe & Lambros T. Koufariotis & Vanessa Lakis & Venkateswar Addala & Olga Kondrashova & Rebecca L. John, 2023. "Multi-omic features of oesophageal adenocarcinoma in patients treated with preoperative neoadjuvant therapy," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    5. Ambrocio Sanchez & Pedro Ortega & Ramin Sakhtemani & Lavanya Manjunath & Sunwoo Oh & Elodie Bournique & Alexandrea Becker & Kyumin Kim & Cameron Durfee & Nuri Alpay Temiz & Xiaojiang S. Chen & Reuben , 2024. "Mesoscale DNA features impact APOBEC3A and APOBEC3B deaminase activity and shape tumor mutational landscapes," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    6. Brittany N. Vandenberg & Marian F. Laughery & Cameron Cordero & Dalton Plummer & Debra Mitchell & Jordan Kreyenhagen & Fatimah Albaqshi & Alexander J. Brown & Piotr A. Mieczkowski & John J. Wyrick & S, 2023. "Contributions of replicative and translesion DNA polymerases to mutagenic bypass of canonical and atypical UV photoproducts," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    7. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    8. Anna Luiza Silva Almeida Vicente & Alexei Novoloaca & Vincent Cahais & Zainab Awada & Cyrille Cuenin & Natália Spitz & André Lopes Carvalho & Adriane Feijó Evangelista & Camila Souza Crovador & Rui Ma, 2022. "Cutaneous and acral melanoma cross-OMICs reveals prognostic cancer drivers associated with pathobiology and ultraviolet exposure," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    9. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Teresa Maria Rosaria Noviello & Anna Maria Giacomo & Francesca Pia Caruso & Alessia Covre & Roberta Mortarini & Giovanni Scala & Maria Claudia Costa & Sandra Coral & Wolf H. Fridman & Catherine Sautès, 2023. "Guadecitabine plus ipilimumab in unresectable melanoma: five-year follow-up and integrated multi-omic analysis in the phase 1b NIBIT-M4 trial," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    11. Qi Zhao & Feng Wang & Yan-Xing Chen & Shifu Chen & Yi-Chen Yao & Zhao-Lei Zeng & Teng-Jia Jiang & Ying-Nan Wang & Chen-Yi Wu & Ying Jing & You-Sheng Huang & Jing Zhang & Zi-Xian Wang & Ming-Ming He & , 2022. "Comprehensive profiling of 1015 patients’ exomes reveals genomic-clinical associations in colorectal cancer," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    12. Zeng, Yaohui & Yang, Tianbao & Breheny, Patrick, 2021. "Hybrid safe–strong rules for efficient optimization in lasso-type problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    13. Ankur Chakravarthy & Ian Reddin & Stephen Henderson & Cindy Dong & Nerissa Kirkwood & Maxmilan Jeyakumar & Daniela Rothschild Rodriguez & Natalia Gonzalez Martinez & Jacqueline McDermott & Xiaoping Su, 2022. "Integrated analysis of cervical squamous cell carcinoma cohorts from three continents reveals conserved subtypes of prognostic significance," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    14. Lee, Sangin & Lee, Youngjo & Pawitan, Yudi, 2018. "Sparse pathway-based prediction models for high-throughput molecular data," Computational Statistics & Data Analysis, Elsevier, vol. 126(C), pages 125-135.
    15. Charles‐Elie Rabier & Simona Grusea, 2021. "Prediction in high‐dimensional linear models and application to genomic selection under imperfect linkage disequilibrium," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 1001-1026, August.
    16. Francesca Menghi & Edison T. Liu, 2022. "Functional genomics of complex cancer genomes," Nature Communications, Nature, vol. 13(1), pages 1-4, December.
    17. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    18. Zhenghui Feng & Lu Lin & Ruoqing Zhu & Lixing Zhu, 2020. "Nonparametric variable selection and its application to additive models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(3), pages 827-854, June.
    19. Sujath Abbas & Oriol Pich & Ginny Devonshire & Shahriar A. Zamani & Annalise Katz-Summercorn & Sarah Killcoyne & Calvin Cheah & Barbara Nutzinger & Nicola Grehan & Nuria Lopez-Bigas & Rebecca C. Fitzg, 2023. "Mutational signature dynamics shaping the evolution of oesophageal adenocarcinoma," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    20. He, Yong & Zhang, Liang & Ji, Jiadong & Zhang, Xinsheng, 2019. "Robust feature screening for elliptical copula regression model," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 568-582.
