
Highly accurate protein structure prediction for the human proteome

Author

Listed:
  • Kathryn Tunyasuvunakool (DeepMind)
  • Jonas Adler (DeepMind)
  • Zachary Wu (DeepMind)
  • Tim Green (DeepMind)
  • Michal Zielinski (DeepMind)
  • Augustin Žídek (DeepMind)
  • Alex Bridgland (DeepMind)
  • Andrew Cowie (DeepMind)
  • Clemens Meyer (DeepMind)
  • Agata Laydon (DeepMind)
  • Sameer Velankar (European Bioinformatics Institute)
  • Gerard J. Kleywegt (European Bioinformatics Institute)
  • Alex Bateman (European Bioinformatics Institute)
  • Richard Evans (DeepMind)
  • Alexander Pritzel (DeepMind)
  • Michael Figurnov (DeepMind)
  • Olaf Ronneberger (DeepMind)
  • Russ Bates (DeepMind)
  • Simon A. A. Kohl (DeepMind)
  • Anna Potapenko (DeepMind)
  • Andrew J. Ballard (DeepMind)
  • Bernardino Romera-Paredes (DeepMind)
  • Stanislav Nikolov (DeepMind)
  • Rishub Jain (DeepMind)
  • Ellen Clancy (DeepMind)
  • David Reiman (DeepMind)
  • Stig Petersen (DeepMind)
  • Andrew W. Senior (DeepMind)
  • Koray Kavukcuoglu (DeepMind)
  • Ewan Birney (European Bioinformatics Institute)
  • Pushmeet Kohli (DeepMind)
  • John Jumper (DeepMind)
  • Demis Hassabis (DeepMind)

Abstract

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure¹. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.

Suggested Citation

  • Kathryn Tunyasuvunakool & Jonas Adler & Zachary Wu & Tim Green & Michal Zielinski & Augustin Žídek & Alex Bridgland & Andrew Cowie & Clemens Meyer & Agata Laydon & Sameer Velankar & Gerard J. Kleywegt, 2021. "Highly accurate protein structure prediction for the human proteome," Nature, Nature, vol. 596(7873), pages 590-596, August.
  • Handle: RePEc:nat:nature:v:596:y:2021:i:7873:d:10.1038_s41586-021-03828-1
    DOI: 10.1038/s41586-021-03828-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-021-03828-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

