Authors
- Kula Kekeba Tune
(Department of Software Engineering, HPC and Big Data Analytics Center of Excellence, Addis Ababa Science and Technology University, Addis Ababa 16417, Ethiopia)
- Foziya Ahmed Mohammed
(Department of Software Engineering, HPC and Big Data Analytics Center of Excellence, Addis Ababa Science and Technology University, Addis Ababa 16417, Ethiopia; Lina Pharmaceuticals and Medical Devices Inc., Addis Ababa, Ethiopia; Enkoy LLC, 6418 Tiffany Ct, Lanham, MD 20706, USA)
- Juhar Ahmed Mohammed
(Lina Pharmaceuticals and Medical Devices Inc., Addis Ababa, Ethiopia)
- Seid Muhie
(Enkoy LLC, 6418 Tiffany Ct, Lanham, MD 20706, USA; The Geneva Foundation, Silver Spring, MD 20910, USA)
Abstract
Human cervical cancer and pre-cancer research relies on datasets scattered across modality-specific archives, imaging repositories, benchmark platforms, trial registries, and controlled-access catalogs. This fragmentation—combined with heterogeneous metadata, ambiguous use of “cervical” terminology, and inconsistent indexing of pre-cancer and screening/triage resources—limits reproducible discovery, access planning, and cross-modal benchmarking. We present the Cervical Cancer Dataset Catalog (CCDCAT), a machine-readable, versioned dataset of datasets that enumerates host-specific dataset-instance records anchored to stable identifiers and resolvable landing records within an explicitly declared discoverable source universe (U_v1.0) and a frozen discovery/labeling lexicon (Q_v1.0). The CCDCAT spans invasive cervical cancer, pre-cancer/dysplasia, and cervix-focused screening and triage phenotypes, and it covers molecular omics, imaging and microscopy (including cervix photography, cytology, and digital pathology), trial registry records, benchmark resources, and controlled-access catalogs represented as metadata with explicit access pathways. Eligibility and labels are assigned conservatively from source-provided metadata; when evidence is insufficient, the CCDCAT abstains rather than infers. In the initial release (CCDCAT-U_v1.0; v0.1), we enumerate 14 eligible dataset instances across 11 host systems within a declared universe of 21 sources. Releases include manuscript-ready tables and interoperable artifacts (schema, controlled vocabularies, provenance logs, abstention ledgers, and a queryable database), enabling reproducible filtering, linkage, and auditable reuse planning.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jdataj:v:11:y:2026:i:6:p:136-:d:1963069. See general information about how to correct material in RePEc.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com.
Please note that corrections may take a couple of weeks to filter through the various RePEc services.