Printed from https://ideas.repec.org/a/plo/pcbi00/1003765.html

Identification of Allelic Imbalance with a Statistical Model for Subtle Genomic Mosaicism

Authors

  • Rui Xia
  • Selina Vattathil
  • Paul Scheet

Abstract

Genetic heterogeneity in a mixed sample of tumor and normal DNA can confound characterization of the tumor genome. Numerous computational methods have been proposed to detect aberrations in DNA samples from tumor and normal tissue mixtures. Most of these require tumor purities to be at least 10–15%. Here, we present a statistical model to capture information, contained in the individual's germline haplotypes, about expected patterns in the B allele frequencies from SNP microarrays while fully modeling their magnitude, the first such model for SNP microarray data. Our model consists of a pair of hidden Markov models (one for the germline and one for the tumor genome) which, conditional on the observed array data and patterns of population haplotype variation, have a dependence structure induced by the relative imbalance of an individual's inherited haplotypes. Together, these hidden Markov models offer a powerful approach for dealing with mixtures of DNA where the main component represents the germline, thus suggesting natural applications for the characterization of primary clones when stromal contamination is extremely high, and for identifying lesions in rare subclones of a tumor when tumor purity is sufficient to characterize the primary lesions. Our joint model for germline haplotypes and acquired DNA aberration is flexible, allowing a large number of chromosomal alterations, including balanced and imbalanced losses and gains, copy-neutral loss-of-heterozygosity (LOH) and tetraploidy. We found our model (which we term J-LOH) to be superior for localizing rare aberrations in a simulated 3% mixture sample. More generally, our model provides a framework for full integration of the germline and tumor genomes to deal more effectively with missing or uncertain features, and thus extract maximal information from difficult scenarios where existing methods fail.

Author Summary

Allelic imbalance, or a deviation from the expected 1-to-1 ratio of alleles where both were present in the germline, can result when there has been an acquired deletion or duplication of part of a chromosome and is a hallmark of cancer genomes. Tumor genomic profiling studies often involve analysis of samples that contain aberrant tumor cells mixed with normal cells without these acquired mutations. Methods for detecting chromosomal aberrations that result in allelic imbalance within a heterogeneous sample have previously been proposed that use the dispersion of within-sample allele frequencies measured at germline heterozygous positions. Here we demonstrate that combining this information with a measure for the correlation in these dispersions, due to the imbalance of one of the chromosomes, provides the most powerful approach. Our method allows for sensitive identification of short allelic imbalance events (e.g., 10 Mb) contained in as few as 3% of the cells in a heterogeneous mixture. Applications include profiling tumor genomes following surgical resection where there exists high contamination of normal tissue and identifying aberrations in subclones. Our work provides a framework for further development of methods that use observed data and population genetic theory for inference of allelic imbalance.
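To give a sense of how subtle the signal in a 3% mixture is, the following Python sketch computes the expected B allele frequency (BAF) at a germline-heterozygous SNP. It is an illustration only, not the J-LOH model: it assumes a noise-free, two-population mixture (normal cells plus cells carrying one event), and the names `expected_baf` and `phased_mean_deviation` are hypothetical. The second function illustrates, in the simplest possible form, why knowing which haplotype carries the B allele helps: deviations that cancel when averaged naively add up once each is signed by its haplotype.

```python
# Haplotype copy numbers (over-represented, under-represented) in aberrant cells.
# Normal cells carry one copy of each haplotype.
EVENT_COPIES = {
    "loss": (1, 0),   # one haplotype deleted
    "gain": (2, 1),   # one haplotype duplicated
    "cnloh": (2, 0),  # copy-neutral LOH: one haplotype replaces the other
}


def expected_baf(fraction, event, b_on_major=True):
    """Expected BAF at a heterozygous SNP when `fraction` of cells carry `event`.

    b_on_major is True when the B allele lies on the over-represented haplotype.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    major_copies, minor_copies = EVENT_COPIES[event]
    # Average copies per cell of each haplotype across the mixture.
    major = fraction * major_copies + (1.0 - fraction) * 1
    minor = fraction * minor_copies + (1.0 - fraction) * 1
    b_copies = major if b_on_major else minor
    return b_copies / (major + minor)


def phased_mean_deviation(bafs, b_on_hap1):
    """Mean deviation from 0.5, with each SNP signed by the B allele's haplotype."""
    signed = [(x - 0.5) if b else (0.5 - x) for x, b in zip(bafs, b_on_hap1)]
    return sum(signed) / len(signed)


if __name__ == "__main__":
    for event in EVENT_COPIES:
        print(event, round(expected_baf(0.03, event), 4))

    # Eight heterozygous SNPs; the B allele alternates between the two haplotypes,
    # and haplotype 1 is the one retained after a loss in 3% of cells.
    phase = [True, False] * 4
    bafs = [expected_baf(0.03, "loss", b) for b in phase]
    naive = sum(x - 0.5 for x in bafs) / len(bafs)
    print("naive mean deviation:", naive)
    print("phased mean deviation:", phased_mean_deviation(bafs, phase))
```

Under these assumptions, a loss in 3% of cells moves the expected BAF from 0.5 to 1/1.97, a shift of less than 0.01, which is small compared with typical array noise at a single SNP. This is the regime in which aggregating evidence along a haplotype becomes important.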

Suggested Citation

  • Rui Xia & Selina Vattathil & Paul Scheet, 2014. "Identification of Allelic Imbalance with a Statistical Model for Subtle Genomic Mosaicism," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-11, August.
  • Handle: RePEc:plo:pcbi00:1003765
    DOI: 10.1371/journal.pcbi.1003765

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003765
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003765&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1003765?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lai Yinglei & Gastwirth Joseph L., 2015. "Outlier reset CUSUM for the exploration of copy number alteration data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 14(4), pages 333-345, August.
    2. Anat Reiner-Benaim, 2016. "Scan Statistic Tail Probability Assessment Based on Process Covariance and Window Size," Methodology and Computing in Applied Probability, Springer, vol. 18(3), pages 717-745, September.
