Authors
- Mauro Nievas Offidani
(Electrical and Computer Engineering Department, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina)
- Facundo Roffet
(Electrical and Computer Engineering Department, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina; Institute of Computer Science and Engineering, National Scientific and Technological Research Council of Argentina (CONICET), Bahía Blanca 8000, Argentina)
- María Carolina González Galtier
(Independent Researcher, Bialet Massé 5158, Argentina)
- Miguel Massiris
(Electrical and Computer Engineering Department, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina; Institute of Computer Science and Engineering, National Scientific and Technological Research Council of Argentina (CONICET), Bahía Blanca 8000, Argentina)
- Claudio Delrieux
(Electrical and Computer Engineering Department, Universidad Nacional del Sur, Bahía Blanca 8000, Argentina; Institute of Computer Science and Engineering, National Scientific and Technological Research Council of Argentina (CONICET), Bahía Blanca 8000, Argentina)
Abstract
High-quality, openly accessible clinical datasets remain a significant bottleneck in advancing both research and clinical applications within medical artificial intelligence. Case reports, often rich in multimodal clinical data, represent an underutilized resource for developing medical AI applications. We present an enhanced version of MultiCaRe, a dataset derived from open-access case reports on PubMed Central. This new version addresses the limitations identified in the previous release and incorporates newly added clinical cases and images (totaling 93,816 and 130,791, respectively), along with a refined hierarchical taxonomy featuring over 140 categories. Image labels have been meticulously curated using a combination of manual and machine learning-based label generation and validation, ensuring a higher quality for image classification tasks and the fine-tuning of multimodal models. To facilitate its use, we also provide a Python package for dataset manipulation, pretrained models for medical image classification, and two dedicated websites. The updated MultiCaRe dataset expands the resources available for multimodal AI research in medicine. Its scale, quality, and accessibility make it a valuable tool for developing medical AI systems, as well as for educational purposes in clinical and computational fields.
Suggested Citation
Mauro Nievas Offidani & Facundo Roffet & María Carolina González Galtier & Miguel Massiris & Claudio Delrieux, 2025.
"An Open-Source Clinical Case Dataset for Medical Image Classification and Multimodal AI Applications,"
Data, MDPI, vol. 10(8), pages 1-21, July.
Handle:
RePEc:gam:jdataj:v:10:y:2025:i:8:p:123-:d:1714176