
Identifying cases of type 2 diabetes from heterogeneous data sources: strategy from the EMIF Project

Giuseppe Roberto

Agenzia regionale di sanità della Toscana

The EMIF project

The EMIF-Platform

• Represents a federation of heterogeneous sources of real-world health data (e.g. administrative, hospital or primary care databases, disease registries, biobanks)

• Currently collects information on around 40 million European citizens from 7 different EU countries

• Data sources differ in healthcare setting, database structure, content, reasons for recording, language, coding terminologies, and healthcare system organization

How can data from these sources be combined?

How can sufficient insight into the data be provided to interpret study results correctly?

The data derivation process

Formal Definition

Collect Experience

Literature Search

Terminology Mapping

Data Extraction

Results Analysis

Final Decision

• A standard process was designed to identify any condition or event of interest across multiple data sources, independently of their specific characteristics

• The identification of patients with type 2 diabetes was used as a test case

Participating data sources

• THIN (UK): primary care

• ARHUS (DK): record linkage system

• IMASIS (ES): hospital

• HSD (IT): primary care

• ARS (IT): record linkage system

• PHARMO (NL): record linkage system

• IPCI (NL): primary care

• EGCUT (EE): biobank

Participating data sources

• 8 data sources, over 20 million subjects from 6 different EU countries

- 3 PCD = primary care data sources (Italy, Netherlands, UK)

- 3 RLD = record linkage systems (Italy, Netherlands, Denmark)

- 1 HD = hospital data source (Spain)

- 1 BD = biobank (Estonia)

• Different coding terminologies

- Diagnoses: Read codes, ICD-9-CM, ICD-10, ICPC, free text

- Drugs: ATC

- Utilization of diagnostic tests and laboratory values: local service terminologies

• The data domains available do not fully overlap across data sources

• A list of standard algorithms, referred to as “components”, was created by combining a central, expert-based clinical and operational definition of T2DM (top-down approach) with local expertise (bottom-up approach)

• Each component was based on a single data domain among:

- diagnoses (primary, secondary, or inpatient care)

- drugs (dispensing/prescription)

- utilization of a diagnostic test

- laboratory results

• The Unified Medical Language System (UMLS) was used for semantic harmonization of coding systems: the pertinent medical concepts embedded in each component were identified and projected onto local terminologies
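A minimal sketch of the projection step, assuming a hand-maintained table that maps each concept to code prefixes in each local terminology. A real implementation would key the table on UMLS concept identifiers (CUIs) and derive it from the UMLS source vocabularies; here readable labels and a few well-known codes (ICD-10 E11, ICPC T90, Read C10F, ATC A10A/A10B) stand in, and `LOCAL_CODES`, `project_concept` and `matches` are hypothetical names, not the project's tooling.

```python
# Hypothetical concept -> terminology -> code-prefix table. A real implementation
# would key this on UMLS concept identifiers (CUIs); readable labels are used here.
# Code lists are illustrative fragments, not the project's validated lists.
LOCAL_CODES = {
    "type 2 diabetes mellitus": {"ICD10": ["E11"], "ICPC": ["T90"], "READ": ["C10F"]},
    "insulin": {"ATC": ["A10A"]},
    "non-insulin antidiabetic": {"ATC": ["A10B"]},
}


def project_concept(concept, terminology):
    """Return the code prefixes that represent a concept in one local terminology."""
    return LOCAL_CODES.get(concept, {}).get(terminology, [])


def matches(code, prefixes):
    """True if a recorded code falls under any of the projected prefixes."""
    return any(code.startswith(p) for p in prefixes)


# The same component concept, projected onto two different local terminologies.
print(project_concept("type 2 diabetes mellitus", "ICPC"))  # ['T90']
print(matches("E11.9", project_concept("type 2 diabetes mellitus", "ICD10")))  # True
```

Because each component is defined once at the concept level, adding a data source only requires extending the table for its terminology, not rewriting the component.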

The “component algorithm” strategy (generating a list of standard algorithms)

The “component algorithm” strategy (building a composite algorithm to identify T2DM)

(≥ 2 non-insulin antidiabetic DRUGS

OR ≥ 2 insulin DRUGS

OR ≥ 1 INPATIENT DIAGNOSIS of T2DM

OR ≥ 2 HbA1c VALUES above threshold

OR ≥ 5 utilizations of the HbA1c TEST in one year)

AND NOT ≥ 1 DIAGNOSIS of T1DM
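The composite can be sketched as a boolean combination of independently evaluated components. This sketch assumes a hypothetical harmonized event format (`Event`), uses ICD-10 and ATC prefixes as stand-ins for each source's projected code lists, reads the T1DM exclusion as applying to the whole disjunction, and leaves the HbA1c threshold as a required parameter because the slide does not state its value. "In one year" is interpreted here as any 365-day window.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    """One record for a patient, in a hypothetical harmonized format."""
    domain: str           # e.g. "primary_dx", "inpatient_dx", "drug", "hba1c_value", "hba1c_test"
    code: str = ""        # diagnosis or ATC code; empty for lab results and test orders
    value: float = 0.0    # laboratory value (used only for "hba1c_value")
    day: date = date(2000, 1, 1)


def count_coded(events, domain, prefixes):
    """Count events in one domain whose code starts with any of the given prefixes."""
    return sum(1 for e in events if e.domain == domain and e.code.startswith(tuple(prefixes)))


def max_in_window(days, window=timedelta(days=365)):
    """Largest number of dates that fall within any single window of the given length."""
    days = sorted(days)
    best, start = 0, 0
    for end in range(len(days)):
        while days[end] - days[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def component_flags(events, hba1c_threshold):
    """Evaluate each standard component independently (one data domain each)."""
    return {
        "non_insulin_drugs": count_coded(events, "drug", ["A10B"]) >= 2,
        "insulin_drugs": count_coded(events, "drug", ["A10A"]) >= 2,
        "inpatient_dx": count_coded(events, "inpatient_dx", ["E11"]) >= 1,
        "hba1c_values": sum(
            1 for e in events if e.domain == "hba1c_value" and e.value > hba1c_threshold
        ) >= 2,
        "hba1c_tests": max_in_window(
            [e.day for e in events if e.domain == "hba1c_test"]
        ) >= 5,
    }


def is_t2dm_case(events, hba1c_threshold):
    """OR over all components, then exclude anyone with a T1DM diagnosis in any setting."""
    t1dm = any(e.domain.endswith("_dx") and e.code.startswith("E10") for e in events)
    return any(component_flags(events, hba1c_threshold).values()) and not t1dm
```

Keeping `component_flags` separate from `is_t2dm_case` means each component remains a stand-alone result that can be benchmarked or recombined later without re-extracting the data.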

Recommended composite algorithms (data source-tailored combinations of standardized components)
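Which components can enter a source's composite depends on the data domains that source actually holds. A minimal sketch, assuming each component is tagged with the single domain it reads and that each source's available domains are known; the component names and the example source are hypothetical, and this selection rule only illustrates how tailoring could be automated, not how the recommended algorithms were chosen.

```python
# Hypothetical component names; each standard component reads exactly one data domain.
COMPONENT_DOMAIN = {
    "primary_care_dx": "primary care diagnoses",
    "inpatient_dx": "inpatient diagnoses",
    "non_insulin_drugs": "drugs",
    "insulin_drugs": "drugs",
    "hba1c_values": "laboratory results",
    "hba1c_tests": "diagnostic test utilization",
}


def applicable_components(available_domains):
    """Components that can be computed in a data source holding the given domains."""
    return sorted(c for c, d in COMPONENT_DOMAIN.items() if d in available_domains)


# A hypothetical source with inpatient diagnoses and drug dispensings only.
print(applicable_components({"inpatient diagnoses", "drugs"}))
# ['inpatient_dx', 'insulin_drugs', 'non_insulin_drugs']
```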

Benchmark of components across heterogeneous data sources

Figure panels: ≥ 1 primary care diagnosis; ≥ 1 inpatient care diagnosis; ≥ 2 non-insulin antidiabetics; ≥ 2 HbA1c values above threshold

Percentage of cases identified with:

• Diagnoses

- 93-100% in primary care data sources

- 15-73% in record linkage systems

• Drugs

- 58-83% in primary care data sources

- 81-100% in record linkage systems
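These percentages are shares of the identified cases that each component flags. A minimal sketch, assuming each identified case carries one boolean per component; the function name and the toy data are hypothetical, not EMIF results.

```python
def component_percentages(case_flags):
    """Percentage of identified cases flagged by each component.

    case_flags: one dict per identified case, mapping component name -> bool.
    A case can be flagged by several components, so percentages may sum to over 100.
    """
    total = len(case_flags)
    names = sorted({name for flags in case_flags for name in flags})
    return {
        name: 100.0 * sum(bool(flags.get(name, False)) for flags in case_flags) / total
        for name in names
    }


# Toy data: four identified cases.
cases = [
    {"diagnoses": True, "drugs": True},
    {"diagnoses": True, "drugs": False},
    {"diagnoses": False, "drugs": True},
    {"diagnoses": True, "drugs": True},
]
print(component_percentages(cases))  # {'diagnoses': 75.0, 'drugs': 75.0}
```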

Components’ contribution to the total population of cases
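One way to express a component's contribution is to count the cases that only that component identifies, i.e. the cases that would be lost if it were dropped from the composite. A minimal sketch under that assumed definition, with a hypothetical function name and toy data:

```python
from collections import Counter


def exclusive_contribution(case_flags):
    """Count, per component, the cases that no other component would have identified."""
    only = Counter()
    for flags in case_flags:
        hits = [name for name, hit in flags.items() if hit]
        if len(hits) == 1:
            only[hits[0]] += 1
    return dict(only)


# Toy data: four identified cases.
cases = [
    {"diagnoses": True, "drugs": True},
    {"diagnoses": True, "drugs": False},
    {"diagnoses": False, "drugs": True},
    {"diagnoses": True, "drugs": True},
]
print(exclusive_contribution(cases))  # {'diagnoses': 1, 'drugs': 1}
```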

Discussion and conclusions

• This data derivation process allows T2DM cases to be identified by building data source-tailored strategies in a standardized way

• It yields important information for the qualitative and quantitative evaluation of the total population of cases identified in each data source

• It makes it possible to benchmark component algorithms across otherwise non-comparable data sources

• Because components are extracted a priori, they can be recombined in different logical combinations, so further sensitivity analyses can easily be planned to explore possible heterogeneity of results across data sources
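Because component flags are extracted once, alternative case definitions become different boolean functions over the same flags. A minimal sketch, assuming per-patient flag dictionaries and three hypothetical definitions chosen only for illustration:

```python
def no_t1dm(f):
    """Exclusion shared by every definition: no T1DM diagnosis flag."""
    return not f.get("t1dm_dx", False)


def any_component(f):
    return no_t1dm(f) and any(hit for name, hit in f.items() if name != "t1dm_dx")


def diagnosis_only(f):
    return no_t1dm(f) and f.get("inpatient_dx", False)


def diagnosis_and_drug(f):
    drug = f.get("non_insulin_drugs", False) or f.get("insulin_drugs", False)
    return no_t1dm(f) and f.get("inpatient_dx", False) and drug


DEFINITIONS = {
    "any_component": any_component,
    "diagnosis_only": diagnosis_only,
    "diagnosis_and_drug": diagnosis_and_drug,
}


def case_counts(patients):
    """Number of cases each alternative definition identifies from the same flags."""
    return {name: sum(1 for f in patients if rule(f)) for name, rule in DEFINITIONS.items()}


# Toy per-patient component flags.
patients = [
    {"inpatient_dx": True, "non_insulin_drugs": True, "t1dm_dx": False},
    {"inpatient_dx": False, "non_insulin_drugs": True},
    {"inpatient_dx": True, "insulin_drugs": True, "t1dm_dx": True},
    {"hba1c_values": True},
]
print(case_counts(patients))
# {'any_component': 3, 'diagnosis_only': 1, 'diagnosis_and_drug': 1}
```

Comparing such counts across data sources is one way to see whether heterogeneity in results tracks the choice of case definition.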

Thank you for listening!
