
Recall DNA methylation levels at low coverage sites using a CNN model in WGBS

Author

Listed:
  • Ximei Luo
  • Yansu Wang
  • Quan Zou
  • Lei Xu

Abstract

DNA methylation is an important regulator of gene transcription. Whole-genome bisulfite sequencing (WGBS) is the gold-standard approach for quantifying DNA methylation at base-pair resolution, but it requires high sequencing depth. Many CpG sites in WGBS data have insufficient coverage, which makes the DNA methylation levels of individual sites inaccurate. Many state-of-the-art computational methods have been proposed to predict the missing values. However, many of them require either other omics datasets or cross-sample data, and most of them predict only the DNA methylation state. In this study, we propose RcWGBS, which imputes missing (or low-coverage) values from the DNA methylation levels of adjacent sites. Deep learning techniques are employed for accurate prediction. The WGBS datasets of H1-hESC and GM12878 were down-sampled. The average difference between the DNA methylation level predicted by RcWGBS at 12× depth and the level measured at >50× depth is less than 0.03 in H1-hESC cells and less than 0.01 in GM12878 cells. RcWGBS performed better than METHimpute even when the sequencing depth was as low as 12×. Our work helps in processing methylation data of low sequencing depth and lets researchers save sequencing costs and improve data utilization through computational methods.

Author summary: DNA methylation has a major impact on gene regulation. WGBS is the gold standard for investigating DNA methylation, but the DNA methylation levels of sites with low coverage are often inaccurate in WGBS datasets. We therefore propose RcWGBS, a method based on a CNN model that interpolates DNA methylation levels at specific sites. RcWGBS does not rely on other omics data or cross-sample data. It trains the model using only the sites with sufficient coverage in the target WGBS dataset. The trained model can then be used to predict the DNA methylation level of sites with low coverage. Our analyses show that RcWGBS can recalibrate the methylation level of some CpGs with insufficient coverage, which suggests that it could benefit WGBS datasets with insufficient sequencing coverage. RcWGBS is implemented as an R package. It is efficient and convenient to use and does not need other WGBS or omics data.
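
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the RcWGBS implementation (which is an R package built around a CNN). The function names, the flank size of 2, the coverage threshold of 10 reads, and the toy values are all hypothetical, and a plain neighbor average stands in for the trained CNN purely so the example runs without dependencies.

```python
from typing import List, Sequence, Tuple


def methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Methylation level of one CpG: methylated reads / all reads covering it."""
    if total_reads <= 0:
        raise ValueError("site has no coverage")
    return methylated_reads / total_reads


def split_by_coverage(
    levels: Sequence[float],
    coverage: Sequence[int],
    flank: int = 2,      # hypothetical number of neighbors taken on each side
    min_cov: int = 10,   # hypothetical "sufficient coverage" threshold
) -> Tuple[List[Tuple[List[float], float]], List[Tuple[int, List[float]]]]:
    """Build neighbor windows and split sites by their own coverage.

    Well-covered sites become (window, label) training pairs; low-coverage
    sites become (index, window) prediction targets. Sites too close to the
    ends of the sequence to have a full window are skipped.
    """
    train: List[Tuple[List[float], float]] = []
    targets: List[Tuple[int, List[float]]] = []
    for i in range(flank, len(levels) - flank):
        # Levels of the flanking sites, excluding the center site itself.
        window = list(levels[i - flank:i]) + list(levels[i + 1:i + flank + 1])
        if coverage[i] >= min_cov:
            train.append((window, levels[i]))
        else:
            targets.append((i, window))
    return train, targets


def neighbor_mean(window: Sequence[float]) -> float:
    """Stand-in predictor (NOT a CNN): average of the neighboring levels."""
    return sum(window) / len(window)


def mean_abs_difference(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference levels."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Toy data (made up): site 3 has only 3 reads, so its level is unreliable.
levels = [0.9, 0.8, 0.85, 0.1, 0.9, 0.95, 0.9]
coverage = [30, 25, 40, 3, 35, 28, 31]

train, targets = split_by_coverage(levels, coverage)
predictions = {index: neighbor_mean(window) for index, window in targets}
```

In a real workflow the training pairs would be used to fit a model, and the predictions for down-sampled data would be compared against the levels measured at high depth, for example with `mean_abs_difference`.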

Suggested Citation

  • Ximei Luo & Yansu Wang & Quan Zou & Lei Xu, 2023. "Recall DNA methylation levels at low coverage sites using a CNN model in WGBS," PLOS Computational Biology, Public Library of Science, vol. 19(6), pages 1-14, June.
  • Handle: RePEc:plo:pcbi00:1011205
    DOI: 10.1371/journal.pcbi.1011205

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011205
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1011205&type=printable
    Download Restriction: no

