
Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity

Authors:
  • Hyun Joo
  • Archana G Chavan
  • Ryan Day
  • Kristin P Lennox
  • Paul Sukhanov
  • David B Dahl
  • Marina Vannucci
  • Jerry Tsai

Abstract

Unlike the core structural elements of a protein, such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and the sparse sampling provided by a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated from samples of these distributions and are enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated on a diverse test set using a leave-one-out approach. The method produced candidates with RMSD as low as 0.45 Å, with a worst case of 3.66 Å. For canonical loops such as the immunoglobulin complementarity-determining regions (mean RMSD 7.0 Å), this sampling method produces a population of loop structures within about 3.66 Å for loops of up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method samples structures closer to native for both the canonical CDRH1 and the non-canonical CDRH3 loops. Finally, under the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that DPM-HMM offers an advantage by consistently sampling near-native loop structures. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.

Author Summary: A protein's structure consists of elements of regular secondary structure connected by less regular loop segments. The irregularity of loop structure makes loop modeling challenging. More accurate sampling of loop conformations directly affects protein modeling, design, and function classification, as well as the study of protein interactions. We have developed a method that extends knowledge-based approaches to modeling the loop regions of protein structures in a more comprehensive way. Most physical models cannot adequately sample the large conformational space, while the more discrete knowledge-based libraries are conformationally limited. To address both problems, we introduce a novel statistical method that uses a Dirichlet process mixture of hidden Markov models (DPM-HMM) to produce a continuous yet weighted estimate of loop conformational space from a discrete library of structures. Applied to loop structure sampling, the results of a number of tests show that our approach quickly generates large numbers of candidates with near-native loop conformations. Most significantly, in cases where template sampling is sparse and/or far from native conformations, the DPM-HMM method samples close to the native space and produces a population of accurate loop structures.
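The abstract describes a pipeline: estimate a φ,ψ density at each loop position, sample backbone torsions from it, build candidate loops, and keep only candidates that pass an end-to-end distance filter. The Python sketch below illustrates that flow under heavy simplifying assumptions and is not the authors' method or software. Each position's density is a hypothetical, hand-specified mixture of independent von Mises components rather than a fitted DPM-HMM (which uses bivariate angular components and hidden Markov dependence between positions). Bond geometry is idealized textbook values, and the target distance, tolerance, and mixture parameters are made up. The names `sample_angles`, `build_backbone`, and `sample_filtered` are illustrative only.

```python
import math
import random

# Idealized backbone geometry (angstroms, radians). Assumption: textbook
# values, not necessarily the geometry used in the paper.
N_CA, CA_C, C_N = 1.458, 1.525, 1.329
ANG_N_CA_C = math.radians(111.2)
ANG_CA_C_N = math.radians(116.2)
ANG_C_N_CA = math.radians(121.7)
OMEGA = math.pi  # trans peptide bond


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def sample_angles(mixtures, rng):
    """Draw one (phi, psi) pair per loop position.

    mixtures[i] is a hypothetical list of components
    (weight, mu_phi, kappa_phi, mu_psi, kappa_psi). Simplification: phi and
    psi are independent within a component and positions are independent
    (no bivariate von Mises correlation, no hidden Markov chain).
    """
    angles = []
    for comps in mixtures:
        weights = [comp[0] for comp in comps]
        _, mu_phi, k_phi, mu_psi, k_psi = rng.choices(comps, weights=weights)[0]
        angles.append((wrap(rng.vonmisesvariate(mu_phi, k_phi)),
                       wrap(rng.vonmisesvariate(mu_psi, k_psi))))
    return angles


def sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]


def place_atom(a, b, c, bond, angle, torsion):
    """NeRF: place d with |cd| = bond, angle bcd = angle, dihedral abcd = torsion."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    d0 = -bond * math.cos(angle)
    d1 = bond * math.sin(angle) * math.cos(torsion)
    d2 = bond * math.sin(angle) * math.sin(torsion)
    return [c[k] + d0 * bc[k] + d1 * m[k] + d2 * n[k] for k in range(3)]


def build_backbone(angles):
    """Return [(N, CA, C), ...] built from (phi, psi) pairs with fixed omega."""
    n_atom = [0.0, 0.0, 0.0]
    ca = [N_CA, 0.0, 0.0]
    c = [N_CA + CA_C * math.cos(math.pi - ANG_N_CA_C),
         CA_C * math.sin(math.pi - ANG_N_CA_C), 0.0]
    residues = [(n_atom, ca, c)]
    for i in range(1, len(angles)):
        prev_n, prev_ca, prev_c = residues[-1]
        psi_prev, phi = angles[i - 1][1], angles[i][0]
        n_atom = place_atom(prev_n, prev_ca, prev_c, C_N, ANG_CA_C_N, psi_prev)
        ca = place_atom(prev_ca, prev_c, n_atom, N_CA, ANG_C_N_CA, OMEGA)
        c = place_atom(prev_c, n_atom, ca, CA_C, ANG_N_CA_C, phi)
        residues.append((n_atom, ca, c))
    return residues


def sample_filtered(mixtures, target, tol, n_trials, seed=0):
    """Keep loops whose first-to-last CA distance is within tol of target."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_trials):
        angles = sample_angles(mixtures, rng)
        backbone = build_backbone(angles)
        dist = math.dist(backbone[0][1], backbone[-1][1])
        if abs(dist - target) <= tol:
            kept.append((dist, angles))
    return kept


# Hypothetical 6-residue loop: each position mixes a helix-like and a
# strand-like component (rough Ramachandran region centers, in degrees).
HELIX = (0.6, math.radians(-63), 8.0, math.radians(-43), 8.0)
STRAND = (0.4, math.radians(-120), 8.0, math.radians(130), 8.0)
mixtures = [[HELIX, STRAND] for _ in range(6)]
candidates = sample_filtered(mixtures, target=10.0, tol=1.5, n_trials=500)
print(len(candidates), "of 500 samples passed the filter")
```

Filtering on a single CA-CA distance reflects the abstract's end-to-end distance filter only in spirit; a practical loop-closure step would likely also check stem orientation and steric clashes, which this sketch omits. Because most samples may be rejected, `n_trials` controls the trade-off between run time and the number of surviving candidates.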

Suggested Citation

  • Hyun Joo & Archana G Chavan & Ryan Day & Kristin P Lennox & Paul Sukhanov & David B Dahl & Marina Vannucci & Jerry Tsai, 2011. "Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity," PLOS Computational Biology, Public Library of Science, vol. 7(10), pages 1-14, October.
  • Handle: RePEc:plo:pcbi00:1002234
    DOI: 10.1371/journal.pcbi.1002234

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002234
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002234&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1002234?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Pu Liu & Fangqiang Zhu & Dmitrii N Rassokhin & Dimitris K Agrafiotis, 2009. "A Self-Organizing Algorithm for Modeling Protein Loops," PLOS Computational Biology, Public Library of Science, vol. 5(8), pages 1-11, August.
    2. Lennox, Kristin P. & Dahl, David B. & Vannucci, Marina & Tsai, Jerry W., 2009. "Density Estimation for Protein Conformation Angles Using a Bivariate von Mises Distribution and Bayesian Nonparametrics," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 586-596.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Fernández-Durán Juan José & Gregorio-Domínguez María Mercedes, 2014. "Modeling angles in proteins and circular genomes using multivariate angular distributions based on multiple nonnegative trigonometric sums," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 1-18, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arthur Pewsey & Eduardo García-Portugués, 2021. "Recent advances in directional statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 1-58, March.
    2. Fernández-Durán Juan José & Gregorio-Domínguez María Mercedes, 2014. "Modeling angles in proteins and circular genomes using multivariate angular distributions based on multiple nonnegative trigonometric sums," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(1), pages 1-18, February.
    3. Saptarshi Chakraborty & Samuel W. K. Wong, 2023. "On the circular correlation coefficients for bivariate von Mises distributions on a torus," Statistical Papers, Springer, vol. 64(2), pages 643-675, April.
    4. Kanti Mardia, 2010. "Bayesian analysis for bivariate von Mises distributions," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(3), pages 515-528.
    5. Abhishek Bhattacharya & David Dunson, 2012. "Strong consistency of nonparametric Bayes density estimation on compact metric spaces with applications to specific manifolds," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 64(4), pages 687-714, August.
    6. Garnett P. McMillan & Timothy E. Hanson & Gabrielle Saunders & Frederick J. Gallun, 2013. "A two-component circular regression model for repeated measures auditory localization data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 62(4), pages 515-534, August.
    7. Daniel Ting & Guoli Wang & Maxim Shapovalov & Rajib Mitra & Michael I Jordan & Roland L Dunbrack Jr, 2010. "Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model," PLOS Computational Biology, Public Library of Science, vol. 6(4), pages 1-21, April.
    8. Ke Tang & Jinfeng Zhang & Jie Liang, 2014. "Fast Protein Loop Sampling and Structure Prediction Using Distance-Guided Sequential Chain-Growth Monte Carlo Method," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-16, April.
    9. David B. Dahl & Ryan Day & Jerry W. Tsai, 2017. "Random Partition Distribution Indexed by Pairwise Information," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 721-732, April.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1002234. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol. General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.