Authors
- Elí Cruz-Parada
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Guillermina Vivar-Estudillo
(Facultad de Sistemas Biológicos e Innovación Tecnológica, Universidad Autónoma Benito Juárez de Oaxaca, Oaxaca de Juárez C.P. 68120, Mexico)
- Laura Pérez-Campos Mayoral
(Centro de Investigación de la Facultad de Medicina UNAM-UABJO, Universidad Autónoma Benito Juárez de Oaxaca, Oaxaca de Juárez C.P. 68020, Mexico)
- María Teresa Hernández-Huerta
(Secretaría de Ciencia, Humanidades, Tecnología e Innovación (SECIHTI), Facultad de Medicina y Cirugía, Universidad Autónoma Benito Juárez de Oaxaca, Oaxaca de Juárez C.P. 68020, Mexico)
- Alma Dolores Pérez-Santiago
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Carlos Romero-Diaz
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Eduardo Pérez-Campos Mayoral
(Centro de Investigación de la Facultad de Medicina UNAM-UABJO, Universidad Autónoma Benito Juárez de Oaxaca, Oaxaca de Juárez C.P. 68020, Mexico)
- Iván Antonio García-Montalvo
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Lucia Martínez-Martínez
(Centro de Investigación de la Facultad de Medicina UNAM-UABJO, Universidad Autónoma Benito Juárez de Oaxaca, Oaxaca de Juárez C.P. 68020, Mexico)
- Héctor Martínez-Ruiz
(Centro de Investigación de la Facultad de Medicina UNAM-UABJO, Universidad Autónoma Benito Juárez de Oaxaca, Oaxaca de Juárez C.P. 68020, Mexico)
- Idarh Matadamas
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Miriam Emily Avendaño-Villegas
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Margarito Martínez Cruz
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Hector Alejandro Cabrera-Fuentes
(Centro de Investigación de la Facultad de Medicina UNAM-UABJO, Universidad Autónoma Benito Juárez de Oaxaca, Oaxaca de Juárez C.P. 68020, Mexico; R&D Group, Vice Presidency Scientific Research & Innovation, Imam Abdulrahman Bin Faisal University (IAU), Dammam P.O. Box 1982, Saudi Arabia; División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Tijuana, Tecnológico Nacional de México, Tijuana C.P. 22414, Mexico)
- Aldo Eleazar Pérez-Ramos
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Eduardo Lorenzo Pérez-Campos
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
- Carlos Mauricio Lastre-Domínguez
(División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Oaxaca, Tecnológico Nacional de México, Oaxaca de Juárez C.P. 68030, Mexico)
Abstract
This work presents a synthetic binary database of Dengue, Zika, Chikungunya, and Influenza constructed entirely from clinical information extracted from the scientific literature. Because clinical records in medical units are scarce and heterogeneous, particularly for arboviral diseases, existing datasets are often insufficient for developing robust Machine Learning models. To address this limitation, an extensive search of PubMed and Google Scholar was conducted between February 2024 and May 2025, following strict selection criteria focused on diagnostic confirmation. The resulting dataset comprises 48,214 records and 67 standardized signs and symptoms, homogenized across all pathologies. Each record is fully binary, contains no missing values, and encodes the presence or absence of each symptom. The composition is 22,379 Dengue records, 7,135 Zika records, 7,959 Chikungunya records, and 10,741 Influenza records. Symptom prevalence was analyzed and found to be consistent with patterns reported in epidemiological and clinical studies, which supports the dataset's plausibility. The database enables statistical exploration and direct integration into Machine Learning pipelines without imputation. It has been used in an in silico predictive study of arboviral diseases, employing Influenza as a negative control, and serves as a reproducible, literature-derived resource for computational modeling.
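As a rough illustration of the kind of check the abstract describes, the sketch below verifies that the reported class counts sum to the reported total and computes per-disease symptom prevalence from fully binary records. The record layout, the `disease` key, the symptom names, and the toy rows are all assumptions made for this example; they are not taken from the published dataset, whose actual file format and column names are not given here.

```python
from collections import defaultdict

# Class counts and total exactly as reported in the abstract.
REPORTED_COUNTS = {
    "Dengue": 22379,
    "Zika": 7135,
    "Chikungunya": 7959,
    "Influenza": 10741,
}
REPORTED_TOTAL = 48214


def counts_match_total(counts, total):
    """Return True when the per-disease counts add up to the stated total."""
    return sum(counts.values()) == total


def symptom_prevalence(records, label_key="disease"):
    """Fraction of records, per disease, in which each binary symptom is 1.

    `records` is a list of dicts; `label_key` (hypothetical name) holds the
    disease label and every other key is a 0/1 symptom indicator.
    """
    sums = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for rec in records:
        label = rec[label_key]
        totals[label] += 1
        for symptom, value in rec.items():
            if symptom == label_key:
                continue
            if value not in (0, 1):
                raise ValueError(f"non-binary value for {symptom!r}: {value!r}")
            sums[label][symptom] += value
    return {
        label: {s: sums[label][s] / totals[label] for s in sums[label]}
        for label in totals
    }


# Toy rows with hypothetical symptom names, for illustration only.
TOY_RECORDS = [
    {"disease": "Dengue", "fever": 1, "rash": 1, "cough": 0},
    {"disease": "Dengue", "fever": 1, "rash": 0, "cough": 0},
    {"disease": "Influenza", "fever": 1, "rash": 0, "cough": 1},
    {"disease": "Influenza", "fever": 0, "rash": 0, "cough": 1},
]

prevalence = symptom_prevalence(TOY_RECORDS)
totals_ok = counts_match_total(REPORTED_COUNTS, REPORTED_TOTAL)
```

Because every field is 0 or 1 and nothing is missing, prevalence reduces to a plain mean per group, which is why no imputation step appears in the sketch.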
Suggested Citation
Elí Cruz-Parada & Guillermina Vivar-Estudillo & Laura Pérez-Campos Mayoral & María Teresa Hernández-Huerta & Alma Dolores Pérez-Santiago & Carlos Romero-Diaz & Eduardo Pérez-Campos Mayoral & Iván Anto, 2026.
"Synthetic and Encoded Database of Dengue, Zika, Chikungunya, and Influenza Derived from the Literature,"
Data, MDPI, vol. 11(2), pages 1-17, February.
Handle:
RePEc:gam:jdataj:v:11:y:2026:i:2:p:33-:d:1859144