
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

Author

Listed:
  • William Poole
  • Kalle Leinonen
  • Ilya Shmulevich
  • Theo A Knijnenburg
  • Brady Bernard

Abstract

Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.

Author summary: Identifying driver mutations in cancer has been a major challenge in cancer research, with the ultimate goal of understanding the detailed molecular origins of cancer and providing genetically personalized treatments. For decades, the cancer research community has known that mutations in certain genes, such as tumor suppressors like P53, can drive cancer. In some cases it is also clear that mutations within cancer genes are localized in a single amino acid, such as the V600E mutation in BRAF. With the existence of large multi-omic data sets including The Cancer Genome Atlas (TCGA), it is now possible to apply big data approaches towards both identifying mutation features of interest and understanding their functional consequences. We have bridged the gap between single amino acid mutations and the whole gene view by developing an algorithm that can identify variable length regions within cancer genes that are enriched for mutations. Furthermore, we have been able to integrate our multiscale mutation clusters with additional molecular data to gain insight into possible functional consequences of the clusters.
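The abstract does not describe the internals of the clustering algorithm, so the following is only a minimal sketch of the general idea of finding variable-length mutation clusters at several scales. It is not the M2C implementation: it swaps in a simple Gaussian kernel smoothing of mutation positions with an enrichment threshold. The function names, the `fold` and `min_mutations` parameters, the bandwidth values, and the toy positions are all hypothetical and chosen purely for illustration.

```python
import math


def smoothed_density(positions, length, bandwidth):
    """Gaussian-kernel-smoothed mutation density at residues 1..length."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    density = [0.0] * (length + 1)  # index 0 is unused
    for i in range(1, length + 1):
        density[i] = sum(
            norm * math.exp(-0.5 * ((i - p) / bandwidth) ** 2) for p in positions
        )
    return density


def call_clusters(positions, length, bandwidth, fold=2.0, min_mutations=5):
    """Return (start, end, n_mutations) for each contiguous run of residues
    whose smoothed density exceeds `fold` times the uniform expectation."""
    threshold = fold * len(positions) / length
    density = smoothed_density(positions, length, bandwidth)
    clusters, start = [], None
    for i in range(1, length + 2):  # one extra step closes a run at the end
        above = i <= length and density[i] > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            end = i - 1
            n = sum(1 for p in positions if start <= p <= end)
            if n >= min_mutations:  # drop runs supported by too few mutations
                clusters.append((start, end, n))
            start = None
    return clusters


def multiscale_clusters(positions, length, bandwidths=(1.0, 5.0, 20.0)):
    """Call clusters independently at each bandwidth (scale)."""
    return {bw: call_clusters(positions, length, bw) for bw in bandwidths}


# Toy, made-up data for a hypothetical 300-residue protein:
# a single-residue hotspot, a broader mutated region, and sparse background.
positions = [100] * 20 + list(range(200, 231, 2)) + [10, 50, 150, 280]
result = multiscale_clusters(positions, length=300)

for bw, clusters in result.items():
    print(f"bandwidth {bw}: {clusters}")
```

In this sketch a narrow bandwidth keeps the hotspot tightly bounded while wider bandwidths report broader regions around the same mutations, which is one simple way to obtain clusters over a range of scales. The published algorithm may select and merge scales quite differently; the repository linked above is the authoritative source.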

Suggested Citation

  • William Poole & Kalle Leinonen & Ilya Shmulevich & Theo A Knijnenburg & Brady Bernard, 2017. "Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression," PLOS Computational Biology, Public Library of Science, vol. 13(2), pages 1-26, February.
  • Handle: RePEc:plo:pcbi00:1005347
    DOI: 10.1371/journal.pcbi.1005347

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005347
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005347&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1005347?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

