Printed from https://ideas.repec.org/a/plo/pone00/0143166.html

Machine Learning: How Much Does It Tell about Protein Folding Rates?

Author

Listed:
  • Marc Corrales
  • Pol Cuscó
  • Dinara R Usmanova
  • Heng-Chang Chen
  • Natalya S Bogatyreva
  • Guillaume J Filion
  • Dmitry N Ivankov

Abstract

The prediction of protein folding rates is a necessary step towards understanding the principles of protein folding. Due to the increasing amount of experimental data, numerous protein folding models and predictors of protein folding rates have been developed in the last decade. The problem has also attracted the attention of scientists from computational fields, which has led to the publication of several machine learning-based models to predict the rate of protein folding. Some of them claim to predict the logarithm of the protein folding rate with an accuracy greater than 90%. However, there are reasons to believe that such claims are exaggerated due to large fluctuations and overfitting of the estimates. When we confronted three selected published models with new data, we found a much lower predictive power than reported in the original publications. Overly optimistic estimates of predictive power arise from violations of the basic principles of machine learning. We highlight common misconceptions in the studies claiming excessive predictive power and propose the use of learning curves as a safeguard against those mistakes. As an example, we show that the current amount of experimental data is insufficient to build a linear predictor of the logarithms of folding rates based on protein amino acid composition.
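
The learning-curve check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the function names (`composition`, `fit_linear`, `learning_curve`), the split sizes, and the synthetic sequences and "log folding rates" below are all assumptions made for illustration. The sketch fits an ordinary least-squares predictor on amino acid composition and reports training and held-out error as the training set grows; a large gap between the two that has not closed at the largest available size suggests there is too little data for the model.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


def fit_linear(X, y):
    """Least-squares fit with an intercept (minimum-norm if rank-deficient)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def learning_curve(X, y, train_sizes, n_repeats=50, test_size=20, seed=0):
    """Average train/test RMSE over random splits for each training size."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in train_sizes:
        if n + test_size > len(y):
            raise ValueError("not enough samples for this training size")
        train_err, test_err = [], []
        for _ in range(n_repeats):
            idx = rng.permutation(len(y))
            test = idx[:test_size]
            train = idx[test_size:test_size + n]  # disjoint from the test set
            coef = fit_linear(X[train], y[train])
            train_err.append(rmse(y[train], predict(coef, X[train])))
            test_err.append(rmse(y[test], predict(coef, X[test])))
        rows.append((n, float(np.mean(train_err)), float(np.mean(test_err))))
    return rows


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in data (assumption): random sequences and a noisy
    # linear target. Real use would load measured log folding rates.
    seqs = ["".join(rng.choice(list(AMINO_ACIDS), size=80)) for _ in range(120)]
    X = np.array([composition(s) for s in seqs])
    y = 10.0 * (X @ rng.normal(size=20)) + rng.normal(scale=1.0, size=len(seqs))
    for n, tr, te in learning_curve(X, y, [25, 50, 100]):
        print(f"n_train={n:3d}  train RMSE={tr:.2f}  test RMSE={te:.2f}")
```

The test proteins are drawn before the training proteins in every repeat, so no protein is used both to fit and to score the model; reporting error on the training set alone is one way an overly optimistic estimate can arise.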

Suggested Citation

  • Marc Corrales & Pol Cuscó & Dinara R Usmanova & Heng-Chang Chen & Natalya S Bogatyreva & Guillaume J Filion & Dmitry N Ivankov, 2015. "Machine Learning: How Much Does It Tell about Protein Folding Rates?," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-12, November.
  • Handle: RePEc:plo:pone00:0143166
    DOI: 10.1371/journal.pone.0143166

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0143166
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0143166&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0143166?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Nobuyasu Koga & Rie Tatsumi-Koga & Gaohua Liu & Rong Xiao & Thomas B. Acton & Gaetano T. Montelione & David Baker, 2012. "Principles for designing ideal protein structures," Nature, Nature, vol. 491(7423), pages 222-227, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Thomas W. Linsky & Kyle Noble & Autumn R. Tobin & Rachel Crow & Lauren Carter & Jeffrey L. Urbauer & David Baker & Eva-Maria Strauch, 2022. "Sampling of structure and sequence space of small protein folds," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    2. Jorge Roel-Touris & Marta Nadal & Enrique Marcos, 2023. "Single-chain dimers from de novo immunoglobulins as robust scaffolds for multiple binding loops," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    3. Anindya Roy & Lei Shi & Ashley Chang & Xianchi Dong & Andres Fernandez & John C. Kraft & Jing Li & Viet Q. Le & Rebecca Viazzo Winegar & Gerald Maxwell Cherf & Dean Slocum & P. Daniel Poulson & Garret, 2023. "De novo design of highly selective miniprotein inhibitors of integrins αvβ6 and αvβ8," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    4. Hiroto Murata & Hayao Imakawa & Nobuyasu Koga & George Chikenji, 2021. "The register shift rules for βαβ-motifs for de novo protein design," PLOS ONE, Public Library of Science, vol. 16(8), pages 1-24, August.
    5. Kozyrev, S.V. & Volovich, I.V., 2014. "Quinary lattice model of secondary structures of polymers," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 393(C), pages 86-95.
    6. Lindsey A. Doyle & Brittany Takushi & Ryan D. Kibler & Lukas F. Milles & Carolina T. Orozco & Jonathan D. Jones & Sophie E. Jackson & Barry L. Stoddard & Philip Bradley, 2023. "De novo design of knotted tandem repeat proteins," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    7. Tamuka M. Chidyausiku & Soraia R. Mendes & Jason C. Klima & Marta Nadal & Ulrich Eckhard & Jorge Roel-Touris & Scott Houliston & Tibisay Guevara & Hugh K. Haddox & Adam Moyer & Cheryl H. Arrowsmith & , 2022. "De novo design of immunoglobulin-like domains," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    8. Jaume Bonet & Sarah Wehrle & Karen Schriever & Che Yang & Anne Billet & Fabian Sesterhenn & Andreas Scheck & Freyr Sverrisson & Barbora Veselkova & Sabrina Vollers & Roxanne Lourman & Mélanie Villard , 2018. "Rosetta FunFolDes – A general framework for the computational design of functional proteins," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-30, November.
    9. Sagar D Khare & Timothy A Whitehead, 2015. "Introduction to the Rosetta Special Collection," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-5, December.
    10. Pralay Mitra & David Shultis & Jeffrey R Brender & Jeff Czajka & David Marsh & Felicia Gray & Tomasz Cierpicki & Yang Zhang, 2013. "An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis," PLOS Computational Biology, Public Library of Science, vol. 9(10), pages 1-18, October.
    11. Willow Coyote-Maestas & David Nedrud & Antonio Suma & Yungui He & Kenneth A. Matreyek & Douglas M. Fowler & Vincenzo Carnevale & Chad L. Myers & Daniel Schmidt, 2021. "Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    12. Rebecca F Alford & Andrew Leaver-Fay & Lynda Gonzales & Erin L Dolan & Jeffrey J Gray, 2017. "A cyber-linked undergraduate research experience in computational biomolecular structure prediction and design," PLOS Computational Biology, Public Library of Science, vol. 13(12), pages 1-13, December.
