Printed from https://ideas.repec.org/a/gam/jdataj/v8y2023i2p41-d1072576.html

VPAgs-Dataset4ML: A Dataset to Predict Viral Protective Antigens for Machine Learning-Based Reverse Vaccinology

Author

Listed:
  • Zakia Salod

    (Discipline of Public Health Medicine, University of KwaZulu-Natal, Durban 4051, South Africa)

  • Ozayr Mahomed

    (Discipline of Public Health Medicine, University of KwaZulu-Natal, Durban 4051, South Africa;
    Dasman Diabetes Institute, P.O. Box 1180, Dasman 15462, Kuwait City, Kuwait)

Abstract

Reverse vaccinology (RV) is a computer-aided approach to vaccine development that identifies a subset of pathogen proteins as protective antigens (PAgs), or potential vaccine candidates. Machine learning (ML)-based RV is promising, but it requires a dataset of PAgs (positives) and non-protective protein sequences (negatives). This study aimed to create an ML dataset, VPAgs-Dataset4ML, for predicting viral PAgs, based on PAgs obtained from Protegen. We performed seven steps to identify PAgs from the Protegen website and non-protective protein sequences from the Universal Protein Resource (UniProt). The seven steps included downloading viral PAgs from Protegen; performing quality checks on the PAgs using the standard BLASTp identity check of ≤30% via MMseqs2; and computational steps, run on Google Colaboratory and the Ubuntu terminal, to retrieve non-protective protein sequences from UniProt as negatives and to perform quality checks on them (similar to those applied to the PAgs). VPAgs-Dataset4ML contains 2145 viral protein sequences, with 210 PAgs in positive.fasta and 1935 non-protective protein sequences in negative.fasta. This dataset can be used to train ML models to predict antigens for various viral pathogens, with the aim of developing effective vaccines.
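As an illustration of how the two files might be consumed, the following is a minimal Python sketch, not the authors' code. It assumes positive.fasta and negative.fasta are ordinary FASTA files; the function names (parse_fasta, build_labeled_set, balanced_class_weights, load_dataset) are hypothetical, and the weighting formula is the common "balanced" heuristic, which the abstract does not prescribe. With 210 positives and 1935 negatives the classes are imbalanced at roughly 9.2 negatives per positive, which is why some form of re-weighting or resampling is usually worth considering.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_labeled_set(
    positives: List[Tuple[str, str]], negatives: List[Tuple[str, str]]
) -> List[Tuple[str, int]]:
    """Label protective antigens as 1 and non-protective sequences as 0."""
    labeled = [(seq, 1) for _, seq in positives]
    labeled += [(seq, 0) for _, seq in negatives]
    return labeled


def balanced_class_weights(n_pos: int, n_neg: int) -> Dict[int, float]:
    """Assumed heuristic: weight = n_samples / (n_classes * n_in_class)."""
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}


def load_dataset(pos_path: str, neg_path: str) -> List[Tuple[str, int]]:
    """Read both FASTA files from disk (paths are supplied by the caller)."""
    with open(pos_path) as pos_fh, open(neg_path) as neg_fh:
        return build_labeled_set(parse_fasta(pos_fh), parse_fasta(neg_fh))


# Class weights for the counts reported in the abstract (210 vs. 1935).
weights = balanced_class_weights(210, 1935)
```

Calling load_dataset("positive.fasta", "negative.fasta") would return (sequence, label) pairs ready for feature extraction; the weights dictionary can then be passed to any classifier that accepts per-class weights.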

Suggested Citation

  • Zakia Salod & Ozayr Mahomed, 2023. "VPAgs-Dataset4ML: A Dataset to Predict Viral Protective Antigens for Machine Learning-Based Reverse Vaccinology," Data, MDPI, vol. 8(2), pages 1-12, February.
  • Handle: RePEc:gam:jdataj:v:8:y:2023:i:2:p:41-:d:1072576

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/8/2/41/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/8/2/41/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arthur Novaes de Amorim & Rob Deardon & Vineet Saini, 2021. "A stacked ensemble method for forecasting influenza-like illness visit volumes at emergency departments," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-15, March.
    2. Nikolett Orosz & Tünde Tóthné Tóth & Gyöngyi Vargáné Gyuró & Zsoltné Tibor Nábrádi & Klára Hegedűsné Sorosi & Zsuzsa Nagy & Éva Rigó & Ádám Kaposi & Gabriella Gömöri & Cornelia Melinda Adi Santoso & A, 2022. "Comparison of Length of Hospital Stay for Community-Acquired Infections Due to Enteric Pathogens, Influenza Viruses and Multidrug-Resistant Bacteria: A Cross-Sectional Study in Hungary," IJERPH, MDPI, vol. 19(23), pages 1-16, November.
    3. Mudassar Arsalan & Omar Mubin & Fady Alnajjar & Belal Alsinglawi, 2020. "COVID-19 Global Risk: Expectation vs. Reality," IJERPH, MDPI, vol. 17(15), pages 1-10, August.
    4. Prabal Das & D. A. Sachindra & Kironmala Chanda, 2022. "Machine Learning-Based Rainfall Forecasting with Multiple Non-Linear Feature Selection Algorithms," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(15), pages 6043-6071, December.
    5. Jie Zhao & Ji Chen & Damien Beillouin & Hans Lambers & Yadong Yang & Pete Smith & Zhaohai Zeng & Jørgen E. Olesen & Huadong Zang, 2022. "Global systematic review with meta-analysis reveals yield advantage of legume-based rotations and its drivers," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    6. Piaopiao Chen & Agnès H. Michel & Jianzhi Zhang, 2022. "Transposon insertional mutagenesis of diverse yeast strains suggests coordinated gene essentiality polymorphisms," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    7. Maryna Henrysson & Ranjula Bali Swain & Ashok Swain & Francesco Fuso Nerini, 2024. "Sustainable Development Goals and wellbeing for resilient societies: shocks and recovery," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 11(1), pages 1-13, December.
    8. Paulo Infante & Gonçalo Jacinto & Anabela Afonso & Leonor Rego & Pedro Nogueira & Marcelo Silva & Vitor Nogueira & José Saias & Paulo Quaresma & Daniel Santos & Patrícia Góis & Paulo Rebelo Manuel, 2023. "Factors That Influence the Type of Road Traffic Accidents: A Case Study in a District of Portugal," Sustainability, MDPI, vol. 15(3), pages 1-16, January.
    9. You, Xuemei & Fan, Xiaonan & Ma, Yinghong & Liu, Zhiyuan, 2025. "Information-disease coupling dynamics with message fatigue and behavioral responses in higher-order weighted networks," Chaos, Solitons & Fractals, Elsevier, vol. 199(P3).
    10. Li, Li & Li, Han & Panagiotelis, Anastasios, 2025. "Boosting domain-specific models with shrinkage: An application in mortality forecasting," International Journal of Forecasting, Elsevier, vol. 41(1), pages 191-207.
    11. Ephrem Habyarimana & Faheem S Baloch, 2021. "Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-23, March.
    12. Banks, Jonathan & Rabbani, Arif & Nadkarni, Kabir & Renaud, Evan, 2020. "Estimating parasitic loads related to brine production from a hot sedimentary aquifer geothermal project: A case study from the Clarke Lake gas field, British Columbia," Renewable Energy, Elsevier, vol. 153(C), pages 539-552.
    13. Tinggui Chen & Hui Wang, 2022. "Consumers' purchase intention of wild freshwater fish during the COVID‐19 pandemic," Agribusiness, John Wiley & Sons, Ltd., vol. 38(4), pages 832-849, October.
    14. Rock, Melanie & Buntain, Bonnie J. & Hatfield, Jennifer M. & Hallgrímsson, Benedikt, 2009. "Animal-human connections, "one health," and the syndemic approach to prevention," Social Science & Medicine, Elsevier, vol. 68(6), pages 991-995, March.
    15. Somodi, Imelda & Bede-Fazekas, Ákos & Botta-Dukát, Zoltán & Molnár, Zsolt, 2024. "Confidence and consistency in discrimination: A new family of evaluation metrics for potential distribution models," Ecological Modelling, Elsevier, vol. 491(C).
    16. Manuel Marques-Cruz & Daniel Martinho Dias & João A. Fonseca & Bernardo Sousa-Pinto, 2024. "Ten year citation prediction model for systematic reviews using early years citation data," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(8), pages 4847-4862, August.
    17. World Bank, 2024. "Toward a One Health Approach in Sudan," World Bank Publications - Reports 41580, The World Bank Group.
    18. Diosey Ramon Lugo-Morin, 2020. "Global Food Security in a Pandemic: The Case of the New Coronavirus (COVID-19)," World, MDPI, vol. 1(2), pages 1-20, September.
    19. Ceddia, M.G. & Bardsley, N.O. & Goodwin, R. & Holloway, G.J. & Nocella, G. & Stasi, A., 2013. "A complex system perspective on the emergence and spread of infectious diseases: Integrating economic and ecological aspects," Ecological Economics, Elsevier, vol. 90(C), pages 124-131.
    20. Franklin Amuakwa-Mensah & George Marbuah & Mwenya Mubanga, 2016. "Climate variability and infectious diseases nexus: evidence from Sweden," Working Papers 2016.02, FAERE - French Association of Environmental and Resource Economists.

