Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes

Authors

  • Weilong Zhao
  • Xinwei Sher

Abstract

A number of machine learning-based predictors have been developed for identifying immunogenic T-cell epitopes based on major histocompatibility complex (MHC) class I and II binding affinities. Rationally selecting the most appropriate tool has been complicated by evolving training data and machine learning methods. Despite recent advances in generating high-quality, MHC-eluted, naturally processed ligandomes, the reliability of new predictors on these epitopes has yet to be evaluated. This study reports the latest benchmarking of an extensive set of MHC-binding predictors using newly available, untested data on both synthetic and naturally processed epitopes. The blind test set includes 32 human leukocyte antigen (HLA) class I and 24 HLA class II alleles. Artificial neural network (ANN)-based approaches performed better than regression-based machine learning and structural modeling. Among the 18 predictors benchmarked, the ANN-based mhcflurry and nn_align perform best for MHC class I 9-mer and class II 15-mer predictions, respectively, on binding/non-binding classification (area under the curve, AUC = 0.911). NetMHCpan4 also demonstrated comparable predictive power. Our customization of mhcflurry into a pan-HLA predictor achieved accuracy similar to that of NetMHCpan. The overall accuracy of these methods is comparable between 9-mer and 10-mer test data. However, the top methods deliver low correlations between predicted and experimental affinities for strong MHC binders. On naturally processed MHC ligands, tools that have been trained on elution data (NetMHCpan4 and MixMHCpred) show better accuracy than pure binding-affinity predictors. The false prediction rate varies considerably among HLA types and datasets. Finally, the structure-based predictor Rosetta FlexPepDock performs less well than the machine learning approaches.
By benchmarking MHC-binding and MHC-elution predictors with a comprehensive set of metrics, this study presents an unbiased view for establishing best practice in T-cell epitope prediction, facilitating future development of methods in immunogenomics.

Author summary

Computationally predicting antigen peptide sequences that elicit a T-cell immune response has a broad and significant impact on vaccine design. The most widely accepted approach relies on machine learning classifiers trained on large-scale major histocompatibility complex (MHC)-binding peptide datasets. Because machine learning algorithms are constantly developing and training data keep expanding, comprehensive benchmarking of existing algorithms on blind test datasets is important for recognizing the pros and cons of different algorithms and for providing guidelines for specific applications. Here we present such a benchmarking study, characterizing a wide array of accuracy metrics and highlighting the best-in-class algorithms as well as their limitations. The emerging concept that “naturally presented” antigen epitopes are more likely to generate an effective T-cell immune response led us to also assess the accuracy of these machine learning algorithms in predicting naturally presented peptides. We demonstrate that recent advances in incorporating high-quality naturally presented peptide data from mass spectrometry experiments have improved accuracy. Our benchmarking of machine learning predictors for MHC-binding and MHC-naturally presented antigen peptides contributes to establishing best practice in computational T-cell epitope analysis, which also has implications for tumor neoantigen-based cancer vaccine discovery.

Suggested Citation

  • Weilong Zhao & Xinwei Sher, 2018. "Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-28, November.
  • Handle: RePEc:plo:pcbi00:1006457
    DOI: 10.1371/journal.pcbi.1006457

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006457
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006457&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Samuel Rivero-Hinojosa & Melanie Grant & Aswini Panigrahi & Huizhen Zhang & Veronika Caisova & Catherine M. Bollard & Brian R. Rood, 2021. "Proteogenomic discovery of neoantigens facilitates personalized multi-antigen targeted T cell immunotherapy for brain tumors," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    2. Sinu Paul & Nathan P Croft & Anthony W Purcell & David C Tscharke & Alessandro Sette & Morten Nielsen & Bjoern Peters, 2020. "Benchmarking predictions of MHC class I restricted T cell epitopes in a comprehensively studied model system," PLOS Computational Biology, Public Library of Science, vol. 16(5), pages 1-18, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ben Kiprono Koech, 2021. "Estimation of Receiver Operating Characteristic Surface Using Mixtures of Finite Polya Trees (MFPT)," International Journal of Statistics and Probability, Canadian Center of Science and Education, vol. 10(2), pages 1-18, March.
    2. Georges Bedran & Daniel A. Polasky & Yi Hsiao & Fengchao Yu & Felipe Veiga Leprevost & Javier A. Alfaro & Marcin Cieslik & Alexey I. Nesvizhskii, 2023. "Unraveling the glycosylated immunopeptidome with HLA-Glyco," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    3. Lei Xin & Rui Qiao & Xin Chen & Hieu Tran & Shengying Pan & Sahar Rabinoviz & Haibo Bian & Xianliang He & Brenton Morse & Baozhen Shan & Ming Li, 2022. "A streamlined platform for analyzing tera-scale DDA and DIA mass spectrometry data enables highly sensitive immunopeptidomics," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    4. Jennifer G. Abelin & Erik J. Bergstrom & Keith D. Rivera & Hannah B. Taylor & Susan Klaeger & Charles Xu & Eva K. Verzani & C. Jackson White & Hilina B. Woldemichael & Maya Virshup & Meagan E. Olive &, 2023. "Workflow enabling deepscale immunopeptidome, proteome, ubiquitylome, phosphoproteome, and acetylome analyses of sample-limited tissues," Nature Communications, Nature, vol. 14(1), pages 1-22, December.
    5. Samuel Rivero-Hinojosa & Melanie Grant & Aswini Panigrahi & Huizhen Zhang & Veronika Caisova & Catherine M. Bollard & Brian R. Rood, 2021. "Proteogenomic discovery of neoantigens facilitates personalized multi-antigen targeted T cell immunotherapy for brain tumors," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    6. Jens Bauer & Natalie Köhler & Yacine Maringer & Philip Bucher & Tatjana Bilich & Melissa Zwick & Severin Dicks & Annika Nelde & Marissa Dubbelaar & Jonas Scheid & Marcel Wacker & Jonas S. Heitmann & S, 2022. "The oncogenic fusion protein DNAJB1-PRKACA can be specifically targeted by peptide-based immunotherapy in fibrolamellar hepatocellular carcinoma," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    7. Naomi Hoenisch Gravel & Annika Nelde & Jens Bauer & Lena Mühlenbruch & Sarah M. Schroeder & Marian C. Neidert & Jonas Scheid & Steffen Lemke & Marissa L. Dubbelaar & Marcel Wacker & Anna Dengler & Rei, 2023. "TOFIMS mass spectrometry-based immunopeptidomics refines tumor antigen identification," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    8. Wen-Feng Zeng & Xie-Xuan Zhou & Sander Willems & Constantin Ammar & Maria Wahle & Isabell Bludau & Eugenia Voytik & Maximillian T. Strauss & Matthias Mann, 2022. "AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    9. Yury Patskovsky & Aswin Natarajan & Larysa Patskovska & Samantha Nyovanie & Bishnu Joshi & Benjamin Morin & Christine Brittsan & Olivia Huber & Samuel Gordon & Xavier Michelet & Florian Schmitzberger , 2023. "Molecular mechanism of phosphopeptide neoantigen immunogenicity," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    10. Hanqing Liao & Carolina Barra & Zhicheng Zhou & Xu Peng & Isaac Woodhouse & Arun Tailor & Robert Parker & Alexia Carré & Persephone Borrow & Michael J. Hogan & Wayne Paes & Laurence C. Eisenlohr & Rob, 2024. "MARS an improved de novo peptide candidate selection method for non-canonical antigen target discovery in cancer," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    11. Celina Tretter & Niklas Andrade Krätzig & Matteo Pecoraro & Sebastian Lange & Philipp Seifert & Clara Frankenberg & Johannes Untch & Gabriela Zuleger & Mathias Wilhelm & Daniel P. Zolg & Florian S. Dr, 2023. "Proteogenomic analysis reveals RNA as a source for tumor-agnostic neoantigen identification," Nature Communications, Nature, vol. 14(1), pages 1-22, December.
    12. Kevin L. Yang & Fengchao Yu & Guo Ci Teo & Kai Li & Vadim Demichev & Markus Ralser & Alexey I. Nesvizhskii, 2023. "MSBooster: improving peptide identification rates using deep learning-based features," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    13. Ashish Goyal & Jens Bauer & Joschka Hey & Dimitris N. Papageorgiou & Ekaterina Stepanova & Michael Daskalakis & Jonas Scheid & Marissa Dubbelaar & Boris Klimovich & Dominic Schwarz & Melanie Märklin &, 2023. "DNMT and HDAC inhibition induces immunogenic neoantigens from human endogenous retroviral element-derived transcripts," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
