Printed from https://ideas.repec.org/a/plo/pone00/0218149.html

RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold

Authors

  • Clare E West
  • Saulo H P de Oliveira
  • Charlotte M Deane

Abstract

While template-free protein structure prediction protocols now produce good quality models for many targets, modelling failure remains common. For these methods to be useful it is important that users can both choose the best model from the hundreds to thousands of models that are commonly generated for a target, and determine whether this model is likely to be correct. We have developed Random Forest Quality Assessment (RFQAmodel), which assesses whether models produced by a protein structure prediction pipeline have the correct fold. RFQAmodel uses a combination of existing quality assessment scores with two predicted contact map alignment scores. These alignment scores are able to identify correct models for targets that are not otherwise captured. Our classifier was trained on a large set of protein domains that are structurally diverse and evenly balanced in terms of protein features known to have an effect on modelling success, and then tested on a second set of 244 protein domains with a similar spread of properties. When models for each target in this second set were ranked according to the RFQAmodel score, the highest-ranking model had a high-confidence RFQAmodel score for 67 modelling targets, of which 52 had the correct fold. At the other end of the scale, RFQAmodel correctly predicted that for 59 targets the highest-ranked model was incorrect. In comparison with other methods, we found that RFQAmodel is better able to identify correct models for targets where only a few of the models are correct. We found that RFQAmodel achieved similar performance on the model sets for CASP12 and CASP13 free-modelling targets. Finally, by iteratively generating models and running RFQAmodel until a model is produced that is predicted to be correct with high confidence, we demonstrate how such a protocol can be used to focus computational efforts on difficult modelling targets. RFQAmodel and the accompanying data can be downloaded from http://opig.stats.ox.ac.uk/resources.
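The ranking step and the iterate-until-confident protocol described in the abstract can be sketched as below. This is a minimal illustration, not RFQAmodel's code. It assumes each model already carries a score in [0, 1] (for instance a random forest's predicted probability that the fold is correct). The tier cutoffs `HIGH_CUTOFF` and `LOW_CUTOFF`, the `Model` class, and the batch generator are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical cutoffs for illustration only; the abstract does not give RFQAmodel's values.
HIGH_CUTOFF = 0.5
LOW_CUTOFF = 0.1


@dataclass
class Model:
    name: str
    score: float  # assumed: classifier probability that this model has the correct fold


def confidence_tier(score: float) -> str:
    """Bucket a score into a confidence label using the hypothetical cutoffs."""
    if score >= HIGH_CUTOFF:
        return "high"
    if score < LOW_CUTOFF:
        return "low"
    return "medium"


def top_model(models: List[Model]) -> Model:
    """Return the highest-ranked model for one target."""
    return max(models, key=lambda m: m.score)


def iterate_until_confident(
    generate_batch: Callable[[int], List[Model]], max_rounds: int = 5
) -> Tuple[Optional[Model], Optional[int]]:
    """Generate and score batches, re-ranking the pooled models after each round.

    Returns (best model, round at which it became high-confidence), or
    (best model seen, None) if no model reached the high tier within max_rounds.
    """
    pool: List[Model] = []
    for round_idx in range(1, max_rounds + 1):
        pool.extend(generate_batch(round_idx))
        if not pool:
            continue  # nothing scored yet; try another round
        best = top_model(pool)
        if confidence_tier(best.score) == "high":
            return best, round_idx
    return (top_model(pool) if pool else None), None


def precision(correct: int, flagged: int) -> float:
    """Fraction of high-confidence picks that actually had the correct fold."""
    return correct / flagged if flagged else 0.0


if __name__ == "__main__":
    # Hypothetical stand-in for "run the pipeline, then score the models":
    # each round yields three models whose scores rise with the round number.
    def fake_batch(round_idx: int) -> List[Model]:
        return [Model(f"r{round_idx}_m{i}", 0.2 * round_idx + 0.01 * i) for i in range(3)]

    best, rounds = iterate_until_confident(fake_batch)
    print(best.name, rounds)  # r3_m2 3
    # Worked figure from the abstract: 52 of the 67 high-confidence top models were correct.
    print(f"{precision(52, 67):.3f}")  # 0.776
```

Pooling models across rounds, rather than scoring each batch in isolation, is one reasonable reading of "iteratively generating models and running RFQAmodel". The final line shows the precision implied by the reported figures (52 / 67 ≈ 0.776).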

Suggested Citation

  • Clare E West & Saulo H P de Oliveira & Charlotte M Deane, 2019. "RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold," PLOS ONE, Public Library of Science, vol. 14(10), pages 1-16, October.
  • Handle: RePEc:plo:pone00:0218149
    DOI: 10.1371/journal.pone.0218149

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0218149
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0218149&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0218149?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rin Sato & Takashi Ishida, 2019. "Protein model accuracy estimation based on local structure quality assessment using 3D convolutional neural network," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-15, September.
