
A regression based approach to phylogenetic reconstruction from multi-sample bulk DNA sequencing of tumors

Author

Listed:
  • Henri Schmidt
  • Benjamin J Raphael

Abstract

Motivation: DNA sequencing of multiple bulk samples from a tumor provides the opportunity to investigate tumor heterogeneity and reconstruct a phylogeny of a patient’s cancer. However, since bulk DNA sequencing of tumor tissue measures thousands of cells from a heterogeneous mixture of distinct sub-populations, accurate reconstruction of the tumor phylogeny requires simultaneous deconvolution of cancer clones and inference of ancestral relationships, leading to a challenging computational problem. Many existing methods for phylogenetic reconstruction from bulk sequencing data do not scale to large datasets, such as recent datasets containing upwards of ninety samples with dozens of distinct sub-populations.

Results: We develop an approach to reconstruct phylogenetic trees from multi-sample bulk DNA sequencing data by separating the reconstruction problem into two parts: a structured regression problem for a fixed tree T, and an optimization over tree space. We derive an algorithm for the regression sub-problem by exploiting the unique, combinatorial structure of the matrices appearing within the problem. This algorithm has both asymptotic and empirical improvements over linear programming (LP) approaches to the problem. Using our algorithm for this regression sub-problem, we develop fastBE, a simple method for phylogenetic inference from multi-sample bulk DNA sequencing data. We demonstrate on simulated data with hundreds of samples and upwards of a thousand distinct sub-populations that fastBE outperforms existing approaches in terms of reconstruction accuracy, sample efficiency, and runtime. Owing to its scalability, fastBE enables both phylogenetic reconstruction directly from individual mutations, without requiring the clustering of mutations into clones, and a new phylogeny-constrained mutation clustering algorithm. On real data from fourteen B-progenitor acute lymphoblastic leukemia patients, fastBE infers mutation phylogenies with fewer violations of a widely used evolutionary constraint and better agreement with the observed mutational frequencies. Using our phylogeny-constrained mutation clustering algorithm, we also find mutation clusters with lower distortion compared to state-of-the-art approaches. Finally, we show that on two patient-derived colorectal cancer models, fastBE infers mutation phylogenies with less violation of a widely used evolutionary constraint compared to existing methods.

Author summary: DNA sequencing of a bulk tumor sample measures the genomes of the heterogeneous mixture of cells that comprise a tumor. Reconstructing the evolutionary history of a cancer from such admixed measurements is challenging, as standard phylogenetic techniques assume that genomes of individual cells are measured. Multiple specialized techniques aim to simultaneously infer the unmeasured genomes and construct the evolutionary history of these genomes, but many of these methods do not scale to large numbers of genomes in the mixture. We introduce a new tool, fast Bulk Evolution (fastBE), which accurately reconstructs the evolutionary history of tumors containing hundreds to thousands of genomes from bulk DNA sequencing data. Key to the success of fastBE are new algorithmic insights which make this task tractable. fastBE is a useful tool to analyze large multi-region tumor sequencing datasets.
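
Illustrative sketch (not part of the published abstract): the two-part split described under Results, scoring a fixed tree T and then optimizing over trees, can be pictured with a toy example. Everything below is an assumption made for illustration. The abstract does not name the evolutionary constraint, so the sketch assumes the sum condition of the perfect phylogeny mixture model (in each sample, a mutation's frequency is at least the sum of its children's frequencies). The score is a simple violation total standing in for the paper's regression objective, and the outer step is a brute-force choice among hand-written candidate trees. The function names, the parent-array encoding, and the frequency values are hypothetical and do not come from fastBE.

```python
from typing import List, Optional, Sequence

# Assumed encoding: parent[v] is the parent of mutation v; None marks the root.
Tree = List[Optional[int]]


def children(parent: Tree) -> List[List[int]]:
    """Return the child list of every node in a parent-array tree."""
    kids: List[List[int]] = [[] for _ in parent]
    for v, p in enumerate(parent):
        if p is not None:
            kids[p].append(v)
    return kids


def usages(parent: Tree, freqs: Sequence[float]) -> List[float]:
    """Clone proportions implied by one sample: u[v] = f[v] - sum of f over children of v."""
    kids = children(parent)
    return [freqs[v] - sum(freqs[c] for c in kids[v]) for v in range(len(parent))]


def violation(parent: Tree, freqs: Sequence[float]) -> float:
    """Inner step (stand-in score): total amount by which children exceed their parent."""
    return sum(max(0.0, -u) for u in usages(parent, freqs))


def best_tree(candidates: Sequence[Tree], samples: Sequence[Sequence[float]]) -> Tree:
    """Outer step (brute force): pick the candidate with the smallest total violation."""
    return min(candidates, key=lambda t: sum(violation(t, f) for f in samples))


# Toy data: 3 mutations, 2 samples (values are made up for illustration).
branching: Tree = [None, 0, 0]  # 0 -> 1 and 0 -> 2
chain: Tree = [None, 0, 1]      # 0 -> 1 -> 2
samples = [[1.0, 0.6, 0.5], [1.0, 0.3, 0.2]]

chosen = best_tree([branching, chain], samples)
print("chosen tree (parent array):", chosen)
```

In the first toy sample, mutations 1 and 2 together exceed the frequency of mutation 0, so the branching tree violates the assumed constraint while the chain does not, and the chain is selected. Enumerating candidates like this is only feasible for tiny inputs, which is why a fast solver for the fixed-tree sub-problem matters at the scales the abstract describes.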

Suggested Citation

  • Henri Schmidt & Benjamin J Raphael, 2024. "A regression based approach to phylogenetic reconstruction from multi-sample bulk DNA sequencing of tumors," PLOS Computational Biology, Public Library of Science, vol. 20(12), pages 1-24, December.
  • Handle: RePEc:plo:pcbi00:1012631
    DOI: 10.1371/journal.pcbi.1012631

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012631
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1012631&type=printable
    Download Restriction: no

