Author
Listed:
- Nobuaki Masaki
- Sharon R Browning
- Brian L Browning
Abstract
Genotype data include errors that may influence conclusions reached by downstream statistical analyses. Previous studies have estimated genotype error rates from discrepancies in human pedigree data, such as Mendelian inconsistent genotypes or apparent phase violations. However, uncalled deletions, which generally have not been accounted for in these studies, can lead to biased error rate estimates. In this study, we propose a genotype error model that considers both genotype errors and uncalled deletions when calculating the likelihood of the observed genotypes in parent-offspring trios. Using simulations, we show that when there are uncalled deletions, our model produces genotype error rate estimates that are less biased than estimates from a model that does not account for these deletions. We applied our model to SNVs in 77 sequenced White British parent-offspring trios in the UK Biobank. We use the Akaike information criterion to show that our model fits the data better than a model that does not account for uncalled deletions. We estimate the genotype error rate at SNVs with minor allele frequency > 0.001 in these data to be 3.2 × 10⁻⁴ (90% CI: [2.8 × 10⁻⁴, 6.2 × 10⁻⁴]). We estimate that 77% of the genotype errors at these markers are attributable to uncalled deletions (90% CI: [73%, 88%]).
Author summary: A genotype error occurs when the genotype identified through molecular analysis does not match the actual genotype of the individual being analyzed. Because genotype errors can influence downstream statistical results, previous studies have attempted to estimate the rate of genotype errors in a study sample. However, uncalled deletions, which generally have not been accounted for in these studies, can lead to biased error rate estimates. In this study, we formulate a model adjusting for uncalled deletions when estimating genotype error rates. We show that when uncalled deletions are present, this model results in less biased estimates of genotype error rates compared to a model that does not adjust for uncalled deletions. We apply this model to SNVs in 77 sequenced White British parent-offspring trios in the UK Biobank and estimate the genotype error rate and the proportion of genotype errors that are attributable to uncalled deletions at SNVs with minor allele frequency > 0.001.
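To illustrate why an uncalled deletion can look like a genotype error in trio data, here is a minimal Python sketch. It is not the authors' likelihood model. The function names, the "-" symbol for a deleted allele, and the calling rule (a hemizygous genotype is reported as homozygous for the remaining allele) are assumptions made only for this example.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the father and one from the mother."""
    target = tuple(sorted(child))
    return any(tuple(sorted(pair)) == target for pair in product(father, mother))


def call_genotype(true_genotype, deletion="-"):
    """Hypothetical calling rule (an assumption for illustration): when one
    copy is deleted and the deletion is not called, the genotype is reported
    as homozygous for the allele that is still present."""
    present = [allele for allele in true_genotype if allele != deletion]
    if len(present) == 2:
        return tuple(present)
    if len(present) == 1:
        return (present[0], present[0])
    return None  # both copies deleted: no genotype call


# True genotypes: the father carries a deletion and transmits it to the child.
true_father = ("A", "-")
true_mother = ("A", "B")
true_child = ("-", "B")

# Genotypes as they would be reported if the deletion goes uncalled.
called = [call_genotype(g) for g in (true_father, true_mother, true_child)]

# The true genotypes follow Mendelian inheritance ...
print(is_mendelian_consistent(true_father, true_mother, true_child))
# ... but the called genotypes (A/A, A/B, B/B) do not.
print(is_mendelian_consistent(*called))
```

In this toy trio every allele that is present was read correctly, yet the called genotypes are Mendelian inconsistent. A method that attributes all such inconsistencies to genotype error would count this trio as an error, which is the kind of bias the abstract describes.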
Suggested Citation
Nobuaki Masaki & Sharon R Browning & Brian L Browning, 2024.
"Simultaneous estimation of genotype error and uncalled deletion rates in whole genome sequence data,"
PLOS Genetics, Public Library of Science, vol. 20(5), pages 1-18, May.
Handle:
RePEc:plo:pgen00:1011297
DOI: 10.1371/journal.pgen.1011297