
Predicting human protein function with multi-task deep neural networks

Author

Listed:
  • Rui Fa
  • Domenico Cozzetto
  • Cen Wan
  • David T Jones

Abstract

Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck that supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology (GO). In this work, we build on recent developments in deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers on top of which are stacked, in parallel, as many independent modules (additional hidden layers with their own output units) as there are output GO terms (the tasks). MTDNN learns individual tasks partly from shared representations and partly from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or on homology transfers. More importantly, the results show that the binary classification accuracy of MTDNN is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, the improvement over a single-task predictor is not linearly correlated with the number of tasks in MTDNN; in our case, medium-sized models provide greater improvement. One advantage of MTDNN is that, given a set of features, it does not require the bootstrap feature selection procedure that traditional machine learning algorithms rely on. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still considerable room for deep learning techniques to further enhance prediction ability.
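
The architecture described in the abstract (shared upstream layers feeding one independent module per GO term) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the ReLU and sigmoid activations, the random initialisation, and the helper names (dense, init_layer, build_mtdnn, predict) are all assumptions made for illustration, the two GO identifiers are only example tasks, and training is omitted entirely.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(rng, n_in, n_out):
    """Return small random weights (n_out x n_in) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mtdnn(n_features, n_shared, n_task_hidden, go_terms, seed=0):
    """Build one shared layer plus one independent module per GO term.

    All sizes are hypothetical; the abstract does not state them.
    """
    rng = random.Random(seed)
    return {
        "shared": init_layer(rng, n_features, n_shared),
        "heads": {
            term: {
                "hidden": init_layer(rng, n_shared, n_task_hidden),
                "output": init_layer(rng, n_task_hidden, 1),
            }
            for term in go_terms
        },
    }


def predict(model, features):
    """Return one score in (0, 1) per GO term for a single protein."""
    # The shared representation is computed once and reused by every task.
    shared = dense(features, *model["shared"], relu)
    scores = {}
    for term, head in model["heads"].items():
        # Each task has its own hidden layer and its own output unit.
        hidden = dense(shared, *head["hidden"], relu)
        scores[term] = dense(hidden, *head["output"], sigmoid)[0]
    return scores


# Example usage with two illustrative tasks and a made-up feature vector.
model = build_mtdnn(
    n_features=4,
    n_shared=3,
    n_task_hidden=2,
    go_terms=["GO:0003677", "GO:0005524"],
)
scores = predict(model, [0.2, 0.0, 1.0, 0.5])
```

In this sketch the shared layer is the only place where tasks interact, so changing the parameters of one task-specific module leaves the scores of every other task unchanged. Each output is treated as an independent binary decision, matching the per-term binary classification mentioned in the abstract.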

Suggested Citation

  • Rui Fa & Domenico Cozzetto & Cen Wan & David T Jones, 2018. "Predicting human protein function with multi-task deep neural networks," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-16, June.
  • Handle: RePEc:plo:pone00:0198216
    DOI: 10.1371/journal.pone.0198216

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0198216
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0198216&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peicong Lin & Yumeng Yan & Huanyu Tao & Sheng-You Huang, 2023. "Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    2. Nicolae Sapoval & Amirali Aghazadeh & Michael G. Nute & Dinler A. Antunes & Advait Balaji & Richard Baraniuk & C. J. Barberan & Ruth Dannenfelser & Chen Dun & Mohammadamin Edrisi & R. A. Leo Elworth &, 2022. "Current progress and open challenges for applying deep learning across the biosciences," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    3. Michal Brylinski & Daswanth Lingam, 2012. "eThread: A Highly Optimized Machine Learning-Based Approach to Meta-Threading and the Modeling of Protein Tertiary Structures," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-12, November.
    4. Rahmatullah Roche & Sutanu Bhattacharya & Debswapna Bhattacharya, 2021. "Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins," PLOS Computational Biology, Public Library of Science, vol. 17(2), pages 1-31, February.
    5. Thomas J Sharpton & Samantha J Riesenfeld & Steven W Kembel & Joshua Ladau & James P O'Dwyer & Jessica L Green & Jonathan A Eisen & Katherine S Pollard, 2011. "PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data," PLOS Computational Biology, Public Library of Science, vol. 7(1), pages 1-13, January.
    6. Shuangxi Ji & Tuğçe Oruç & Liam Mead & Muhammad Fayyaz Rehman & Christopher Morton Thomas & Sam Butterworth & Peter James Winn, 2019. "DeepCDpred: Inter-residue distance and contact prediction for improved prediction of protein structure," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-15, January.
    7. Juan A Morales-Cordovilla & Victoria Sanchez & Martin Ratajczak, 2018. "Protein alignment based on higher order conditional random fields for template-based modeling," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-14, June.
    8. Akira R Kinjo & Haruki Nakamura, 2012. "Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-11, February.
    9. Shivangi & Laxman S Meena & Md Amjad Beg, 2018. "Insights of Rv2921c (Ftsy) Gene of Mycobacterium tuberculosis H37Rv To Prove Its Significance by Computational Approach," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 12(2), pages 9147-9157, December.
    10. Yang Li & Chengxin Zhang & Eric W Bell & Wei Zheng & Xiaogen Zhou & Dong-Jun Yu & Yang Zhang, 2021. "Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks," PLOS Computational Biology, Public Library of Science, vol. 17(3), pages 1-19, March.
    11. Elisa Boari de Lima & Wagner Meira Júnior & Raquel Cardoso de Melo-Minardi, 2016. "Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering," PLOS Computational Biology, Public Library of Science, vol. 12(6), pages 1-32, June.
    12. Matthew N Benedict & Michael B Mundy & Christopher S Henry & Nicholas Chia & Nathan D Price, 2014. "Likelihood-Based Gene Annotations for Gap Filling and Quality Assessment in Genome-Scale Metabolic Models," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-14, October.
    13. Lei Wang & Jiangguo Zhang & Dali Wang & Chen Song, 2022. "Membrane contact probability: An essential and predictive character for the structural and functional studies of membrane proteins," PLOS Computational Biology, Public Library of Science, vol. 18(3), pages 1-27, March.
    14. Wing-Cheong Wong & Sebastian Maurer-Stroh & Frank Eisenhaber, 2010. "More Than 1,001 Problems with Protein Domain Databases: Transmembrane Regions, Signal Peptides and the Issue of Sequence Homology," PLOS Computational Biology, Public Library of Science, vol. 6(7), pages 1-19, July.
    15. Zhiye Guo & Jian Liu & Jeffrey Skolnick & Jianlin Cheng, 2022. "Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    16. Yuval Bussi & Ruti Kapon & Ziv Reich, 2021. "Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-27, October.
    17. Claudio Mirabello & Björn Wallner, 2019. "rawMSA: End-to-end Deep Learning using raw Multiple Sequence Alignments," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-15, August.
