
Composite Likelihood Modeling of Neighboring Site Correlations of DNA Sequence Substitution Rates


  • Deng Ling

    (Johnson & Johnson)

  • Moore Dirk F.

    (University of Medicine and Dentistry of New Jersey)


Sequence data from a series of homologous DNA segments from related organisms are typically polymorphic at many sites, and these polymorphisms are the result of evolutionary processes. Such data may be used to estimate the substitution rates as well as the variability of these rates. Careful characterization of the distribution of this variation is essential for accurate estimation of evolutionary distances and for phylogeny reconstruction among these sequences. Many researchers have recognized the importance of the variability of substitution rates, and most have modeled it using a discrete gamma distribution. Some have extended these methods to explicitly account for the correlation of substitution rates among sites using hidden Markov models; others have proposed context-dependent substitution rate schemes. We accommodate these correlations using a composite likelihood method based on a bivariate gamma distribution, which is more flexible than hidden Markov models in terms of correlation structure and more computationally tractable than the context-dependent schemes. We show that the estimates have good theoretical properties. We also use simulations to compare the maximum composite likelihood estimates to those obtained from maximum likelihood based on the independence assumption. We use data from the mitochondrial DNA of ten primates to obtain maximum composite likelihood estimates of the mean substitution rate, overdispersion, and correlation parameters, and use these estimates in a parametric phylogenetic bootstrap to assess the impact of serial correlation on the estimates of substitution rates and branch lengths.
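
To make the idea of a neighboring-site composite likelihood concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper. It assumes each site contributes a Poisson substitution count whose rate is gamma distributed, and it builds a bivariate gamma for adjacent sites from a shared component (rate = shared part + site-specific part). The function names, the parameterization (`mean`, `disp`, `rho`), and the toy counts are all hypothetical; the authors' actual bivariate gamma and likelihood may differ.

```python
import math

def log_nb(b, shape, scale):
    """Log pmf of a Poisson count whose rate is Gamma(shape, scale)."""
    return (math.lgamma(shape + b) - math.lgamma(shape) - math.lgamma(b + 1)
            + b * math.log(scale) - (shape + b) * math.log1p(scale))

def log_shared(a1, a2, shape, scale):
    """Log joint pmf of two Poisson counts driven by one shared Gamma(shape, scale) rate."""
    a = a1 + a2
    return (math.lgamma(shape + a) - math.lgamma(shape)
            - math.lgamma(a1 + 1) - math.lgamma(a2 + 1)
            + a * math.log(scale) - (shape + a) * math.log1p(2.0 * scale))

def pair_pmf(y1, y2, mean, disp, rho):
    """Joint pmf of counts at two adjacent sites (assumed model, 0 < rho < 1).

    Rates are R_i = G0 + G_i with G0 ~ Gamma(rho*k, s) shared by both sites and
    G_i ~ Gamma((1-rho)*k, s) site specific, so each R_i ~ Gamma(k, s) and
    corr(R_1, R_2) = rho. Here k = 1/disp and s = mean*disp.
    """
    shape, scale = 1.0 / disp, mean * disp
    s0, s1 = rho * shape, (1.0 - rho) * shape
    total = 0.0
    # Split each count into a part from the shared rate and a site-specific part.
    for a1 in range(y1 + 1):
        for a2 in range(y2 + 1):
            total += math.exp(log_shared(a1, a2, s0, scale)
                              + log_nb(y1 - a1, s1, scale)
                              + log_nb(y2 - a2, s1, scale))
    return total

def composite_loglik(counts, mean, disp, rho):
    """Sum of bivariate log-likelihoods over all pairs of neighboring sites."""
    return sum(math.log(pair_pmf(y1, y2, mean, disp, rho))
               for y1, y2 in zip(counts, counts[1:]))

# Toy per-site counts, invented purely for illustration (not real data).
toy_counts = [0, 1, 3, 2, 0, 0, 1, 4, 3, 0, 1, 0]
for rho in (0.1, 0.5, 0.9):
    print(rho, composite_loglik(toy_counts, mean=1.25, disp=0.5, rho=rho))
```

In this sketch the composite likelihood multiplies bivariate densities of adjacent pairs rather than evaluating the full joint distribution of all sites, which is what keeps the computation cheap. Maximizing `composite_loglik` over `mean`, `disp`, and `rho` (for example with a grid search or a general-purpose optimizer) would give maximum composite likelihood estimates under these assumptions.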

Suggested Citation

  • Deng Ling & Moore Dirk F., 2009. "Composite Likelihood Modeling of Neighboring Site Correlations of DNA Sequence Substitution Rates," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-20, January.
  • Handle: RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:6


