
A novel statistical method for modeling covariate effects in bisulfite sequencing derived measures of DNA methylation

Author

Listed:
  • Kaiqiong Zhao
  • Karim Oualkacha
  • Lajmi Lakhal‐Chaieb
  • Aurélie Labbe
  • Kathleen Klein
  • Antonio Ciampi
  • Marie Hudson
  • Inés Colmegna
  • Tomi Pastinen
  • Tieyuan Zhang
  • Denise Daley
  • Celia M.T. Greenwood

Abstract

Identifying disease‐associated changes in DNA methylation can help us gain a better understanding of disease etiology. Bisulfite sequencing allows the generation of high‐throughput methylation profiles at single‐base resolution. However, optimally modeling and analyzing these sparse and discrete sequencing data is still very challenging due to variable read depth, missing data patterns, long‐range correlations, data errors, and confounding from cell type mixtures. We propose a regression‐based hierarchical model that allows covariate effects to vary smoothly along genomic positions, and we have built a specialized EM algorithm, which explicitly allows for experimental errors and cell type mixtures, to make inference about smooth covariate effects in the model. Simulations show that the proposed method provides accurate estimates of covariate effects and captures the major underlying methylation patterns with excellent power. We also apply our method to analyze data from rheumatoid arthritis patients and controls. The method has been implemented in the R package SOMNiBUS.
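
The kind of data the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' model specification, their EM algorithm, or the SOMNiBUS interface. It is a minimal illustration under stated assumptions: the smooth curves beta0 and beta1, the error rates p0 and p1, the binary covariate z, and all function names are hypothetical choices made for this example. It generates variable read depth, a covariate effect that varies smoothly with genomic position, and observed methylated counts that are corrupted by experimental error.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a real-valued linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def beta0(t):
    """Hypothetical smooth baseline curve over scaled position t in [0, 1]."""
    return -1.0 + math.sin(2.0 * math.pi * t)


def beta1(t):
    """Hypothetical smooth covariate effect, concentrated near t = 0.5."""
    return 1.5 * math.exp(-((t - 0.5) ** 2) / 0.02)


def observed_prob(pi, p0, p1):
    """Chance that a single read is called methylated when the true
    methylation probability is pi, with assumed error rates:
    p0 = P(called methylated | truly unmethylated),
    p1 = P(called methylated | truly methylated)."""
    return p0 * (1.0 - pi) + p1 * pi


def simulate_sample(positions, z, p0, p1, rng, max_depth=30):
    """Simulate one sample with covariate value z.

    Returns one tuple per position:
    (position, read depth, true methylated reads, observed methylated reads).
    """
    rows = []
    for t in positions:
        depth = rng.randint(0, max_depth)  # variable depth; 0 means no coverage
        pi = expit(beta0(t) + beta1(t) * z)  # position-specific methylation level
        true_meth = sum(rng.random() < pi for _ in range(depth))
        # Each read is then called with error, read by read.
        hits = sum(rng.random() < p1 for _ in range(true_meth))
        false_hits = sum(rng.random() < p0 for _ in range(depth - true_meth))
        rows.append((t, depth, true_meth, hits + false_hits))
    return rows


rng = random.Random(0)  # fixed seed so the run is reproducible
positions = [i / 49 for i in range(50)]
case = simulate_sample(positions, z=1, p0=0.01, p1=0.95, rng=rng)
control = simulate_sample(positions, z=0, p0=0.01, p1=0.95, rng=rng)
```

In this toy setup the true methylated counts are latent and only the error-prone counts and read depths would be seen by an analyst, which is the situation an EM-type algorithm is designed for. Recovering beta0 and beta1 from such data is the estimation problem the paper addresses and is not attempted here.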

Suggested Citation

  • Kaiqiong Zhao & Karim Oualkacha & Lajmi Lakhal‐Chaieb & Aurélie Labbe & Kathleen Klein & Antonio Ciampi & Marie Hudson & Inés Colmegna & Tomi Pastinen & Tieyuan Zhang & Denise Daley & Celia M.T. Greenwood, 2021. "A novel statistical method for modeling covariate effects in bisulfite sequencing derived measures of DNA methylation," Biometrics, The International Biometric Society, vol. 77(2), pages 424-438, June.
  • Handle: RePEc:bla:biomet:v:77:y:2021:i:2:p:424-438
    DOI: 10.1111/biom.13307

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13307
    Download Restriction: no



