
Curating, Collecting, and Cataloguing Global COVID-19 Datasets for the Aim of Predicting Personalized Risk

Authors
  • Sepehr Golriz Khatami

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany
    Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
    These authors contributed equally to this work.)

  • Astghik Sargsyan

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany
    Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
    These authors contributed equally to this work.)

  • Maria Francesca Russo

    (Fondazione Policlinico Universitario Agostino Gemelli IRCCS, 00168 Rome, Italy)

  • Daniel Domingo-Fernández

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany
    Fraunhofer Center for Machine Learning, 53757 Sankt Augustin, Germany)

  • Andrea Zaliani

    (Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), 22525 Hamburg, Germany)

  • Abish Kaladharan

    (Causality Biomodels, Kinfra Hi-Tech Park, Kalamassery, Cochin 683503, India)

  • Priya Sethumadhavan

    (Causality Biomodels, Kinfra Hi-Tech Park, Kalamassery, Cochin 683503, India)

  • Sarah Mubeen

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany
    Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany
    Fraunhofer Center for Machine Learning, 53757 Sankt Augustin, Germany)

  • Yojana Gadiya

    (Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), 22525 Hamburg, Germany)

  • Reagon Karki

    (Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), 22525 Hamburg, Germany)

  • Stephan Gebel

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany)

  • Ram Kumar Ruppa Surulinathan

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany
    Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany)

  • Vanessa Lage-Rupprecht

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany)

  • Saulius Archipovas

    (Fraunhofer Institute for Digital Medicine (MEVIS), 28359 Bremen, Germany)

  • Geltrude Mingrone

    (Fondazione Policlinico Universitario Agostino Gemelli IRCCS, 00168 Rome, Italy
    Department of Diabetes Research, School of Life Course Sciences, Faculty of Life Sciences & Medicine, King’s College London, London WC2R 2LS, UK
    Medicina e Chirurgia “A.Gemelli”, Dipartimento di Medicina e Chirurgia Traslazionale, Università Cattolica del Sacro Cuore, 00168 Rome, Italy)

  • Marc Jacobs

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany)

  • Carsten Claussen

    (Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), 22525 Hamburg, Germany)

  • Martin Hofmann-Apitius

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany
    Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53113 Bonn, Germany)

  • Alpha Tom Kodamullil

    (Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53757 Sankt Augustin, Germany)

Abstract

Although hundreds of datasets have been published since the beginning of the coronavirus pandemic, there is a lack of centralized resources where these datasets are listed and harmonized to facilitate their applicability and uptake by predictive modeling approaches. Firstly, such a centralized resource would provide researchers who are searching for datasets to develop their predictive models with information about data owners. Secondly, harmonizing the datasets makes it possible to take advantage of several similar datasets simultaneously. This, in turn, not only eases the imperative external validation of data-driven models but can also be used for virtual cohort generation, which helps to overcome data-sharing impediments. Here, we present the COVID-19 data catalogue, a repository that provides a landscape view of COVID-19 studies and datasets as a putative source to enable researchers to develop personalized COVID-19 predictive risk models. The COVID-19 data catalogue currently contains over 400 studies and their relevant information, collected from a wide range of global sources such as global initiatives, clinical trial repositories, publications, and data repositories. Further, the curated content stored in this data catalogue is complemented by a web application that provides visualizations of these studies, including their references, relevant information such as measured variables, and the geographical locations where these studies were performed. This resource is one of the first to capture, organize, and store studies, datasets, and metadata related to COVID-19 in a comprehensive repository. We believe that our work will facilitate future research and development of personalized predictive risk models for COVID-19.

Suggested Citation

  • Sepehr Golriz Khatami & Astghik Sargsyan & Maria Francesca Russo & Daniel Domingo-Fernández & Andrea Zaliani & Abish Kaladharan & Priya Sethumadhavan & Sarah Mubeen & Yojana Gadiya & Reagon Karki & St, 2024. "Curating, Collecting, and Cataloguing Global COVID-19 Datasets for the Aim of Predicting Personalized Risk," Data, MDPI, vol. 9(2), pages 1-19, January.
  • Handle: RePEc:gam:jdataj:v:9:y:2024:i:2:p:25-:d:1328602

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/9/2/25/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/9/2/25/
    Download Restriction: no

    References listed on IDEAS

    1. Jude Moutchia & Pratik Pokharel & Aldiona Kerri & Kaodi McGaw & Shreeshti Uchai & Miriam Nji & Michael Goodman, 2020. "Clinical laboratory parameters associated with severe or critical novel coronavirus disease 2019 (COVID-19): A systematic review and meta-analysis," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-25, October.

