
Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project

Authors

  • Giuseppe Roberto
  • Ingrid Leal
  • Naveed Sattar
  • A Katrina Loomis
  • Paul Avillach
  • Peter Egger
  • Rients van Wijngaarden
  • David Ansell
  • Sulev Reisberg
  • Mari-Liis Tammesoo
  • Helene Alavere
  • Alessandro Pasqua
  • Lars Pedersen
  • James Cunningham
  • Lara Tramontan
  • Miguel A Mayer
  • Ron Herings
  • Preciosa Coloma
  • Francesco Lapi
  • Miriam Sturkenboom
  • Johan van der Lei
  • Martijn J Schuemie
  • Peter Rijnbeek
  • Rosa Gini

Abstract

Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three record linkage of administrative and/or registry data sources (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful to identify T2DM cases were generated through a top-down/bottom-up iterative approach. Each component was based on one single data domain among diagnoses, drugs, diagnostic test utilization and laboratory results. Diagnoses-based components were subclassified considering the healthcare setting (primary, secondary, inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted and the proportion of the population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, AND NOT were tested and local experts recommended their preferred data source-tailored combination. The population identified per data source by the resulting algorithms varied from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93–100%), while drug-based components were the main contributors in RLDs (81–100%).
The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided bases for interpretation of possible inter-data source inconsistency of findings in future studies.
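
As an illustration only, the idea of treating components as building blocks combined with AND, OR and AND NOT can be sketched with Python sets. This is a minimal sketch under stated assumptions, not the EMIF implementation: the component names, the subject IDs, the exclusion component and the particular combination are all hypothetical, chosen to show how a composite algorithm, the proportion of the population it identifies, and each component's contribution might be computed.

```python
from typing import Dict, Set

# Hypothetical component outputs: each component (one data domain) maps to
# the set of subject IDs it flags. Names and IDs are invented for illustration.
components: Dict[str, Set[int]] = {
    "diag_primary_care": {1, 2, 3, 4},
    "diag_inpatient": {3, 5},
    "drug_antidiabetic": {2, 3, 5, 6},
    "excl_other_diabetes": {6},  # hypothetical exclusion component
}

# Hypothetical source population of 20 subjects.
population: Set[int] = set(range(1, 21))


def composite(c: Dict[str, Set[int]]) -> Set[int]:
    """(primary-care diagnosis OR inpatient diagnosis OR drug) AND NOT exclusion."""
    any_evidence = c["diag_primary_care"] | c["diag_inpatient"] | c["drug_antidiabetic"]  # OR
    return any_evidence - c["excl_other_diabetes"]  # AND NOT


cases = composite(components)

# Proportion of the population identified by the composite algorithm.
proportion_identified = len(cases) / len(population)


def contribution(name: str) -> float:
    """Share of the final cases that a single component flags on its own."""
    return len(components[name] & cases) / len(cases)


print(sorted(cases))
print(proportion_identified)
print(contribution("diag_primary_care"), contribution("drug_antidiabetic"))
```

In a sketch like this, swapping the body of `composite` is the only change needed to express a different data source-tailored combination, while the component definitions stay shared across data sources.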

Suggested Citation

  • Giuseppe Roberto & Ingrid Leal & Naveed Sattar & A Katrina Loomis & Paul Avillach & Peter Egger & Rients van Wijngaarden & David Ansell & Sulev Reisberg & Mari-Liis Tammesoo & Helene Alavere & Alessan, 2016. "Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-18, August.
  • Handle: RePEc:plo:pone00:0160648
    DOI: 10.1371/journal.pone.0160648

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0160648
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0160648&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Andrea Spini & Giulia Hyeraci & Claudia Bartolini & Sandra Donnini & Pietro Rosellini & Rosa Gini & Marina Ziche & Francesco Salvo & Giuseppe Roberto, 2021. "Real-World Utilization of Target- and Immunotherapies for Lung Cancer: A Scoping Review of Studies Based on Routinely Collected Electronic Healthcare Data," IJERPH, MDPI, vol. 18(14), pages 1-21, July.
    2. Lester Darryl Geneviève & Andrea Martani & Maria Christina Mallet & Tenzin Wangmo & Bernice Simone Elger, 2019. "Factors influencing harmonized health data collection, sharing and linkage in Denmark and Switzerland: A systematic review," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-44, December.
