
Comparing five statistical methods of differential methylation identification using bisulfite sequencing data

Author

Listed:
  • Yu Xiaoqing

    (Department of Biostatistics, Yale University, New Haven, CT 06511, USA)

  • Sun Shuying

    (Department of Mathematics, Texas State University, San Marcos, TX 78666, USA)

Abstract

We present a comprehensive comparative analysis of five differential methylation (DM) identification methods developed for bisulfite sequencing (BS) data: methylKit, BSmooth, BiSeq, HMM-DM, and HMM-Fisher. We summarize the features of these methods from several analytical aspects and compare their performance on both simulated and real BS datasets. The comparison yields four main findings. First, parameter settings can strongly affect the accuracy of DM identification: compared with the default settings, modified parameter settings yield higher sensitivity and/or lower false positive rates. Second, all five methods are more accurate when identifying simulated DM regions that are long and have small within-group variation, but concordance among the methods is low, probably because of the different approaches they use for DM identification. Third, HMM-DM and HMM-Fisher yield relatively higher sensitivity and lower false positive rates than the other methods, especially in DM regions with large variation. Finally, among the three methods that involve methylation estimation (methylKit, BSmooth, and BiSeq), BiSeq best represents the raw methylation signals. Based on these results, we suggest that users select a DM identification method according to the characteristics of their data and the advantages of each method.
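The evaluation criteria named in the abstract (sensitivity, false positive rate, and concordance between methods) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy site indices, the labels `method_A` and `method_B`, and the use of Jaccard overlap as a concordance measure are all assumptions made for illustration.

```python
from itertools import combinations


def sensitivity_and_fpr(called, truth, all_sites):
    """Return (sensitivity, false positive rate) for one method's DM calls.

    called    -- sites the method reports as differentially methylated
    truth     -- sites simulated as truly DM
    all_sites -- every site that was tested
    """
    called, truth, all_sites = set(called), set(truth), set(all_sites)
    negatives = all_sites - truth
    true_pos = len(called & truth)
    false_pos = len(called & negatives)
    sensitivity = true_pos / len(truth) if truth else float("nan")
    fpr = false_pos / len(negatives) if negatives else float("nan")
    return sensitivity, fpr


def pairwise_concordance(calls_by_method):
    """Jaccard overlap of the call sets for every pair of methods.

    Jaccard overlap is an illustrative choice here, not necessarily the
    concordance measure used in the paper.
    """
    result = {}
    for a, b in combinations(sorted(calls_by_method), 2):
        set_a, set_b = set(calls_by_method[a]), set(calls_by_method[b])
        union = set_a | set_b
        result[(a, b)] = len(set_a & set_b) / len(union) if union else float("nan")
    return result


# Hypothetical toy data: 20 CpG sites, with sites 5..9 simulated as truly DM.
all_sites = range(20)
truth = range(5, 10)
calls = {
    "method_A": [5, 6, 7, 8, 12],  # hypothetical calls, one false positive
    "method_B": [6, 7, 8, 9],      # hypothetical calls, no false positives
}

for name, called in calls.items():
    sens, fpr = sensitivity_and_fpr(called, truth, all_sites)
    print(f"{name}: sensitivity={sens:.2f}, FPR={fpr:.3f}")
print(pairwise_concordance(calls))
```

In this toy case both hypothetical methods reach the same sensitivity yet agree on only half of their combined calls, which is the kind of low concordance the abstract describes.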

Suggested Citation

  • Yu Xiaoqing & Sun Shuying, 2016. "Comparing five statistical methods of differential methylation identification using bisulfite sequencing data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(2), pages 173-191, April.
  • Handle: RePEc:bpj:sagmbi:v:15:y:2016:i:2:p:173-191:n:5
    DOI: 10.1515/sagmb-2015-0078

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2015-0078
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yoav Benjamini, 2010. "Discovering the false discovery rate," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 405-416, September.
    2. Qingyun Cai & Hock Peng Chan, 2017. "A Double Application of the Benjamini-Hochberg Procedure for Testing Batched Hypotheses," Methodology and Computing in Applied Probability, Springer, vol. 19(2), pages 429-443, June.
    3. Cheng, Cheng, 2009. "Internal validation inferences of significant genomic features in genome-wide screening," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 788-800, January.
    4. Guo Wenge & Peddada Shyamal, 2008. "Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-21, March.
    5. T. Tony Cai & Wenguang Sun, 2017. "Optimal screening and discovery of sparse signals with applications to multistage high throughput studies," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 197-223, January.
    6. Nik Tuzov & Frederi Viens, 2011. "Mutual fund performance: false discoveries, bias, and power," Annals of Finance, Springer, vol. 7(2), pages 137-169, May.
    7. Kim, Donggyu & Zhang, Chunming, 2014. "Adaptive linear step-up multiple testing procedure with the bias-reduced estimator," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 31-39.
    8. Andrew Y. Chen, 2022. "Do t-Statistic Hurdles Need to be Raised?," Papers 2204.10275, arXiv.org, revised Apr 2024.
    9. Zhao, Haibing & Fung, Wing Kam, 2016. "A powerful FDR control procedure for multiple hypotheses," Computational Statistics & Data Analysis, Elsevier, vol. 98(C), pages 60-70.
    10. Cai, Qingyun, 2018. "A scoring criterion for rejection of clustered p-values," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 180-189.
    11. Haibing Zhao & Wing Kam Fung, 2018. "Controlling mixed directional false discovery rate in multidimensional decisions with applications to microarray studies," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(2), pages 316-337, June.
    12. Rohit Kumar Patra & Bodhisattva Sen, 2016. "Estimation of a two-component mixture model with applications to multiple testing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 869-893, September.
    13. I. Ahmed & C. Dalmasso & F. Haramburu & F. Thiessard & P. Broët & P. Tubert-Bitter, 2010. "False Discovery Rate Estimation for Frequentist Pharmacovigilance Signal Detection Methods," Biometrics, The International Biometric Society, vol. 66(1), pages 301-309, March.
    14. Sun Shuying & Yu Xiaoqing, 2016. "HMM-Fisher: identifying differential methylation using a hidden Markov model and Fisher’s exact test," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(1), pages 55-67, March.
    15. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
    16. Farcomeni, Alessio & Pacillo, Simona, 2011. "A conservative estimator for the proportion of false nulls based on Dvoretzky, Kiefer and Wolfowitz inequality," Statistics & Probability Letters, Elsevier, vol. 81(12), pages 1867-1870.
    17. Alessio Farcomeni, 2009. "Generalized Augmentation to Control the False Discovery Exceedance in Multiple Testing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(3), pages 501-517, September.
    18. Haibing Zhao & Xinping Cui, 2020. "Constructing confidence intervals for selected parameters," Biometrics, The International Biometric Society, vol. 76(4), pages 1098-1108, December.
    19. Habiger, Joshua D. & Peña, Edsel A., 2014. "Compound p-value statistics for multiple testing procedures," Journal of Multivariate Analysis, Elsevier, vol. 126(C), pages 153-166.
    20. Wang, Xia & Shojaie, Ali & Zou, Jian, 2019. "Bayesian hidden Markov models for dependent large-scale multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 123-136.
