IDEAS home Printed from https://ideas.repec.org/a/nat/natcom/v13y2022i1d10.1038_s41467-022-31666-w.html

Machine learning-based tissue of origin classification for cancer of unknown primary diagnostics using genome-wide mutation features

Authors

  • Luan Nguyen (Universiteitsweg 100)
  • Arne Hoeck (Universiteitsweg 100)
  • Edwin Cuppen (Universiteitsweg 100; Hartwig Medical Foundation)

Abstract

Cancers of unknown primary (CUP) origin account for ∼3% of all cancer diagnoses, whereby the tumor tissue of origin (TOO) cannot be determined. Using a uniformly processed dataset encompassing 6756 whole-genome sequenced primary and metastatic tumors, we develop Cancer of Unknown Primary Location Resolver (CUPLR), a random forest TOO classifier that employs 511 features based on simple and complex somatic driver and passenger mutations. CUPLR distinguishes 35 cancer (sub)types with ∼90% recall and ∼90% precision based on cross-validation and test set predictions. We find that structural variant derived features increase the performance and utility for classifying specific cancer types. With CUPLR, we could determine the TOO for 82/141 (58%) of CUP patients. Although CUPLR is based on machine learning, it provides a human interpretable graphical report with detailed feature explanations. The comprehensive output of CUPLR complements existing histopathological procedures and can enable improved diagnostics for CUP patients.
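The recall and precision figures quoted in the abstract are per-class classification metrics. As a minimal sketch of how such figures are computed from true and predicted tissue labels (this is not the authors' implementation, and the function name and toy labels below are invented purely for illustration):

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen in either list."""
    true_count = Counter(y_true)   # how many samples truly belong to each class
    pred_count = Counter(y_pred)   # how many samples were predicted as each class
    true_pos = Counter()           # correct predictions per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1

    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # Precision: of the samples predicted as this class, the fraction that are correct.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        # Recall: of the samples truly in this class, the fraction that were recovered.
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Toy labels, invented for illustration (not data from the paper).
y_true = ["breast", "breast", "lung", "lung", "colorectal", "colorectal"]
y_pred = ["breast", "lung", "lung", "lung", "colorectal", "breast"]
metrics = per_class_precision_recall(y_true, y_pred)

# The abstract's CUP result: 82 of 141 patients with a determined TOO.
cup_resolved_fraction = 82 / 141
print(metrics)
print(f"{cup_resolved_fraction:.0%}")
```

In a multi-class setting such as the 35 cancer (sub)types described above, these per-class values would typically be averaged across classes to give a single summary figure; the abstract does not state which averaging scheme was used.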

Suggested Citation

  • Luan Nguyen & Arne Hoeck & Edwin Cuppen, 2022. "Machine learning-based tissue of origin classification for cancer of unknown primary diagnostics using genome-wide mutation features," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
  • Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-31666-w
    DOI: 10.1038/s41467-022-31666-w

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-022-31666-w
    File Function: Abstract
    Download Restriction: no



    Citations



    Cited by:

    1. Mingyun Bae & Gyuhee Kim & Tae-Rim Lee & Jin Mo Ahn & Hyunwook Park & Sook Ryun Park & Ki Byung Song & Eunsung Jun & Dongryul Oh & Jeong-Won Lee & Young Sik Park & Ki-Won Song & Jeong-Sik Byeon & Bo H, 2023. "Integrative modeling of tumor genomes and epigenomes for enhanced cancer diagnosis by cell-free DNA," Nature Communications, Nature, vol. 14(1), pages 1-15, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mischan Vali-Pour & Solip Park & Jose Espinosa-Carrasco & Daniel Ortiz-Martínez & Ben Lehner & Fran Supek, 2022. "The impact of rare germline variants on human somatic mutation processes," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    2. Ewart Kuijk & Onno Kranenburg & Edwin Cuppen & Arne Van Hoeck, 2022. "Common anti-cancer therapies induce somatic mutations in stem cells of healthy tissue," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    3. Alexander Martinez-Fundichely & Austin Dixon & Ekta Khurana, 2022. "Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Bernard C. H. Lee & Philip S. Robinson & Tim H. H. Coorens & Helen H. N. Yan & Sigurgeir Olafsson & Henry Lee-Six & Mathijs A. Sanders & Hoi Cheong Siu & James Hewinson & Sarah S. K. Yue & Wai Yin Tsu, 2022. "Mutational landscape of normal epithelial cells in Lynch Syndrome patients," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    5. Bingjie Chen & Daniele Ramazzotti & Timon Heide & Inmaculada Spiteri & Javier Fernandez-Mateos & Chela James & Luca Magnani & Trevor A. Graham & Andrea Sottoriva, 2023. "Contribution of pks+ E. coli mutations to colorectal carcinogenesis," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
    6. Thomas G. Paulson & Patricia C. Galipeau & Kenji M. Oman & Carissa A. Sanchez & Mary K. Kuhner & Lucian P. Smith & Kevin Hadi & Minita Shah & Kanika Arora & Jennifer Shelton & Molly Johnson & Andre Co, 2022. "Somatic whole genome dynamics of precancer in Barrett’s esophagus reveals features associated with disease progression," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    7. Philip S. Robinson & Laura E. Thomas & Federico Abascal & Hyunchul Jung & Luke M. R. Harvey & Hannah D. West & Sigurgeir Olafsson & Bernard C. H. Lee & Tim H. H. Coorens & Henry Lee-Six & Laura Butlin, 2022. "Inherited MUTYH mutations cause elevated somatic mutation rates and distinctive mutational signatures in normal human cells," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    8. Andrey A. Yurchenko & Fatemeh Rajabi & Tirzah Braz-Petta & Hiva Fassihi & Alan Lehmann & Chikako Nishigori & Jinxin Wang & Ismael Padioleau & Konstantin Gunbin & Leonardo Panunzi & Fanny Morice-Picard, 2023. "Genomic mutation landscape of skin cancers from DNA repair-deficient xeroderma pigmentosum patients," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    9. Yasuhiko Haga & Yoshitaka Sakamoto & Keiko Kajiya & Hitomi Kawai & Miho Oka & Noriko Motoi & Masayuki Shirasawa & Masaya Yotsukura & Shun-Ichi Watanabe & Miyuki Arai & Junko Zenkoh & Kouya Shiraishi &, 2023. "Whole-genome sequencing reveals the molecular implications of the stepwise progression of lung adenocarcinoma," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    10. Jonathan C. M. Wan & Dennis Stephens & Lingqi Luo & James R. White & Caitlin M. Stewart & Benoît Rousseau & Dana W. Y. Tsui & Luis A. Diaz, 2022. "Genome-wide mutational signatures in low-coverage whole genome sequencing of cell-free DNA," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    11. Thomas R. W. Oliver & Lia Chappell & Rashesh Sanghvi & Lauren Deighton & Naser Ansari-Pour & Stefan C. Dentro & Matthew D. Young & Tim H. H. Coorens & Hyunchul Jung & Tim Butler & Matthew D. C. Nevill, 2022. "Clonal diversification and histogenesis of malignant germ cell tumours," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    12. Michael Habig & Cecile Lorrain & Alice Feurtey & Jovan Komluski & Eva H. Stukenbrock, 2021. "Epigenetic modifications affect the rate of spontaneous mutations in a pathogenic fungus," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    13. Oriol Pich & Iker Reyes-Salazar & Abel Gonzalez-Perez & Nuria Lopez-Bigas, 2022. "Discovering the drivers of clonal hematopoiesis," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    14. Sujath Abbas & Oriol Pich & Ginny Devonshire & Shahriar A. Zamani & Annalise Katz-Summercorn & Sarah Killcoyne & Calvin Cheah & Barbara Nutzinger & Nicola Grehan & Nuria Lopez-Bigas & Rebecca C. Fitzg, 2023. "Mutational signature dynamics shaping the evolution of oesophageal adenocarcinoma," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    15. S. Mouron & M. J. Bueno & A. Lluch & L. Manso & I. Calvo & J. Cortes & J. A. Garcia-Saenz & M. Gil-Gil & N. Martinez-Janez & J. V. Apala & E. Caleiras & Pilar Ximénez-Embún & J. Muñoz & L. Gonzalez-Co, 2022. "Phosphoproteomic analysis of neoadjuvant breast cancer suggests that increased sensitivity to paclitaxel is driven by CDK4 and filamin A," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    16. Ryan N. Ptashkin & Mark D. Ewalt & Gowtham Jayakumaran & Iwona Kiecka & Anita S. Bowman & JinJuan Yao & Jacklyn Casanova & Yun-Te David Lin & Kseniya Petrova-Drus & Abhinita S. Mohanty & Ruben Bacares, 2023. "Enhanced clinical assessment of hematologic malignancies through routine paired tumor and normal sequencing," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    17. Erik Elias & Arman Ardalan & Markus Lindberg & Susanne E. Reinsbach & Andreas Muth & Ola Nilsson & Yvonne Arvidsson & Erik Larsson, 2021. "Independent somatic evolution underlies clustered neuroendocrine tumors in the human small intestine," Nature Communications, Nature, vol. 12(1), pages 1-8, December.
    18. Anouk C. Jong & Alexandra Danyi & Job Riet & Ronald Wit & Martin Sjöström & Felix Feng & Jeroen Ridder & Martijn P. Lolkema, 2023. "Predicting response to enzalutamide and abiraterone in metastatic prostate cancer using whole-omics machine learning," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    19. Yoshitaka Sakamoto & Shuhei Miyake & Miho Oka & Akinori Kanai & Yosuke Kawai & Satoi Nagasawa & Yuichi Shiraishi & Katsushi Tokunaga & Takashi Kohno & Masahide Seki & Yutaka Suzuki & Ayako Suzuki, 2022. "Phasing analysis of lung cancer genomes using a long read sequencer," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    20. Nicholas Light & Mehdi Layeghifard & Ayush Attery & Vallijah Subasri & Matthew Zatzman & Nathaniel D. Anderson & Rupal Hatkar & Sasha Blay & David Chen & Ana Novokmet & Fabio Fuligni & James Tran & Ri, 2023. "Germline TP53 mutations undergo copy number gain years prior to tumor diagnosis," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
