Identifying cases of type 2 diabetes from heterogeneous data sources: strategy from the EMIF Project
Giuseppe Roberto
Agenzia regionale di sanità della Toscana
The EMIF project
The EMIF-Platform
• Represents a federation of heterogeneous sources of real-world health data (e.g. administrative, hospital or primary care databases, disease registries, biobanks)
• Currently collects information on around 40 million European citizens from 7 different EU countries
• Different healthcare settings, database structures, content, reasons for recording, languages, coding terminologies and healthcare system organization
How to combine data from these sources?
How to provide sufficient insight into the data to correctly interpret study results?
The data derivation process
Formal Definition
Collect Experience
Literature Search
Terminology Mapping
Data Extraction
Results Analysis
Final Decision
• A standard process was designed to identify any condition or event of interest across multiple data sources, independently of their specific characteristics
• The identification of patients with type 2 diabetes was used as a test case
Participating data sources
• THIN: primary care, UK
• ARHUS: record linkage system, DK
• IMASIS: hospital, ES
• HSD: primary care, IT
• ARS: record linkage system, IT
• PHARMO: record linkage system, NL
• IPCI: primary care, NL
• EGCUT: primary care, EE
Participating data sources
• 8 data sources, over 20 million subjects from 6 different EU countries
- 3 PCD = primary care data sources (Italy, Netherlands, UK)
- 3 RLD = record linkage systems (Italy, Netherlands, Denmark)
- 1 HD = hospital data source (Spain)
- 1 BD = biobank (Estonia)
• Different coding terminologies
- Diagnoses: READ, ICD9CM, ICD10, ICPC, free text
- Drugs: ATC
- Utilization of diagnostic tests or laboratory values: local service terminologies
• Data domains available do not overlap across data sources
The “component algorithm” strategy (generating a list of standard algorithms)
• Combining a central expert-based clinical and operational definition of T2DM (top-down approach) with local expertise (bottom-up approach), a list of standard algorithms, referred to as “components”, was created
• Each component was based on a single data domain among:
- diagnoses (primary, secondary, or inpatient care)
- drugs (dispensing/prescription)
- utilization of a diagnostic test
- laboratory results
• The Unified Medical Language System (UMLS) was used for semantic harmonization of coding systems: pertinent medical concepts embedded in each component were identified and projected to local terminologies
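A minimal sketch of what a single-domain component with locally projected codes might look like. The `Component` class, the `CONCEPT_TO_LOCAL` table and the few codes in it are illustrative assumptions, not the EMIF code lists; in the process described here the projection comes from UMLS, not from a hand-written table.

```python
from dataclasses import dataclass

# Hypothetical, incomplete projection of one medical concept onto local
# terminologies. Real code lists would be derived from UMLS.
CONCEPT_TO_LOCAL = {
    "type 2 diabetes mellitus": {
        "ICD10": ["E11"],
        "ICD9CM": ["250.00", "250.02"],
        "ICPC": ["T90"],
    },
}


@dataclass(frozen=True)
class Component:
    name: str         # label of the standard algorithm
    domain: str       # the single data domain it relies on
    concept: str      # medical concept embedded in the component
    min_records: int  # minimum number of matching records per patient


def local_codes(component, terminology):
    """Return the code prefixes for the component's concept in one terminology."""
    return CONCEPT_TO_LOCAL.get(component.concept, {}).get(terminology, [])


def component_cases(component, records, terminology):
    """records: iterable of (patient_id, code) pairs from one data source.

    Returns the patients with at least `min_records` matching records.
    """
    prefixes = tuple(local_codes(component, terminology))
    counts = {}
    for patient_id, code in records:
        if prefixes and code.startswith(prefixes):
            counts[patient_id] = counts.get(patient_id, 0) + 1
    return {p for p, n in counts.items() if n >= component.min_records}
```

The same `Component` can be run against data sources that use different terminologies by changing only the `terminology` argument, which is the point of defining components once and projecting them locally.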
The “component algorithm” strategy (building a composite algorithm to identify T2DM)
≥ 2 non-insulin antidiabetic DRUGS
OR ≥ 2 insulin DRUGS
OR ≥ 1 INPATIENT DIAGNOSES of T2DM
OR ≥ 2 HbA1c VALUES above threshold
OR ≥ 5 utilizations of HbA1c TEST in one year
AND NOT ≥ 1 DIAGNOSES of T1DM
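The composite algorithm on this slide can be sketched as set operations over components extracted beforehand. The grouping used here (the five positive components joined by OR, with the T1DM diagnosis applied as an exclusion to the whole result) is an assumption about how the diagram reads, and the component names are hypothetical labels.

```python
def composite_t2dm(components):
    """components: dict mapping a component name to a set of patient ids.

    Components a data source cannot compute are treated as empty sets.
    """
    def get(name):
        return components.get(name, set())

    # Any one of the positive components is enough to flag a patient.
    positive = (
        get("DRUGS_NON_INSULIN_2")
        | get("DRUGS_INSULIN_2")
        | get("DIAG_INPATIENT_T2DM_1")
        | get("LAB_HBA1C_ABOVE_2")
        | get("TEST_HBA1C_5_PER_YEAR")
    )
    # Assumed reading of "AND NOT": drop anyone with a T1DM diagnosis.
    return positive - get("DIAG_T1DM_1")
```

For example, a patient flagged only through insulin use who also has a T1DM diagnosis is excluded, while a patient with two non-insulin antidiabetic records and no T1DM diagnosis is retained.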
Recommended composite algorithms (data source-tailored combinations of standardized components)
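The recommended combinations themselves are not reproduced in this transcript. As a rough illustration of what "data source-tailored" could mean in practice, the sketch below keeps only the components whose data domain a given source actually holds. The catalogue, the domain labels and the OR-only combination are all assumptions made for the example.

```python
# Hypothetical catalogue: each standardized component belongs to one data domain.
COMPONENT_DOMAIN = {
    "DIAG_PRIMARY_CARE_1": "primary care diagnoses",
    "DIAG_INPATIENT_1": "inpatient diagnoses",
    "DRUGS_NON_INSULIN_2": "drugs",
    "LAB_HBA1C_ABOVE_2": "laboratory results",
    "TEST_HBA1C_5_PER_YEAR": "diagnostic test utilization",
}


def tailored_components(available_domains):
    """Keep only the components a data source can actually compute."""
    return [
        name
        for name, domain in COMPONENT_DOMAIN.items()
        if domain in available_domains
    ]


def tailored_cases(components, available_domains):
    """Union of the usable components' case sets (missing ones count as empty)."""
    cases = set()
    for name in tailored_components(available_domains):
        cases |= components.get(name, set())
    return cases
```

A record linkage system holding only inpatient diagnoses and drug dispensings would, under these assumptions, be limited to `DIAG_INPATIENT_1` and `DRUGS_NON_INSULIN_2`.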
Benchmark of components across heterogeneous data sources
Panels: ≥ 1 primary care diagnoses; ≥ 1 inpatient care diagnoses; ≥ 2 non-insulin antidiabetics; ≥ 2 HbA1c values above threshold
Components’ contribution to the total population of cases
Percentage of cases identified with:
• Diagnoses
- 93-100% in primary care data sources
- 15-73% in record linkage systems
• Drugs
- 58-83% in primary care data sources
- 81-100% in record linkage systems
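Percentages of this kind can be computed once each component's set of patients is available. The sketch below assumes the total population of cases is the union of all supplied components, which is a simplification: a real composite algorithm may also apply exclusions. The function name and input layout are hypothetical.

```python
def component_contribution(components):
    """components: dict mapping a component name to a set of patient ids.

    Returns, per component, the percentage of the total case population
    (here: the union of all components) that the component identifies.
    """
    total = set().union(*components.values())
    if not total:
        return {}
    return {
        name: 100.0 * len(cases) / len(total)
        for name, cases in components.items()
    }
```

Because one patient can be identified by several components, the percentages are not expected to sum to 100.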
Discussion and conclusions
• This data derivation process allows T2DM to be identified by building data source-tailored strategies in a standard fashion
• Important information for the qualitative and quantitative evaluation of the total population of cases identified in each data source can be obtained
• Benchmarking of component algorithms across otherwise non-comparable data sources is possible
• Using a priori extracted components in different logical combinations, further sensitivity analyses can easily be planned to discuss possible heterogeneity of results across data sources
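One way such a sensitivity analysis might be organized: since components are extracted once, alternative case definitions reduce to different set expressions over the same inputs. The definition names and rules below are hypothetical examples, not the analyses planned in the project.

```python
def sensitivity_table(components, definitions):
    """Count the cases found by each alternative definition.

    components: dict mapping a component name to a set of patient ids.
    definitions: dict mapping a label to a function(components) -> set.
    """
    return {label: len(rule(components)) for label, rule in definitions.items()}


# Hypothetical alternative definitions built from the same components.
DEFINITIONS = {
    "diagnoses_or_drugs": lambda c: c["diagnoses"] | c["drugs"],
    "diagnoses_and_drugs": lambda c: c["diagnoses"] & c["drugs"],
    "diagnoses_only": lambda c: c["diagnoses"],
}
```

Running the same `DEFINITIONS` on every data source gives comparable counts per definition, which can then be set side by side when discussing heterogeneity.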
...thanks for listening!!!