
Social Domain Record Linkage Environment

Presentation at the 2014 International Health Data Linkage Conference

Health Statistics Division

April 2014

Record Linkage at Statistics Canada

Linkages must satisfy a prescribed review process
• New linkages are approved by the Executive Management Board, chaired by the Chief Statistician of Canada

All projects using linked data are publicly announced on the StatCan website
• Including tabulations using linked datasets

Approved researchers access only the records needed for their project with no direct identifiers included

14/05/2014 Statistics Canada • Statistique Canada

Creating a Record Linkage Environment

The Longitudinal Health and Administrative Data (LHAD) Initiative has shown that a record linkage environment is feasible
• A system for linking health data using provincial health registries and storing the linked keys in a depository
• Does not produce a fully integrated analytical database
• Reduced cost and time for creating linked health analysis files

Social Domain Record Linkage Environment (SDRLE) Proof of Concept Project
• Building on the success of LHAD
• Using Statistics Canada administrative data
• Increasing the relevance of surveys by linking socio-demographic indicators from multiple sources


[Figure: Derived Record Depository and Key Depository, with the sources Hospital Discharge Data, Survey Data, Vital Statistics, Census, Immigration Data, Tax Data and the Canadian Cancer Registry]


SDRL Environment
• Identifiers of the datasets are linked to the DRD
• The results are stored in a depository of linked keys
• Only identifiers of datasets are brought into the environment
• The DRD is built through successive record linkages
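The division of labour on this slide (only identifiers enter the environment, and linkage results are kept as pairs of keys) can be sketched in Python. This is a minimal illustration, not the SDRLE implementation: the identifier field list, the function names, the sample records and the use of exact agreement on all identifier fields as the linkage rule are all assumptions made for the example.

```python
# Hypothetical identifier fields; the slides do not list the actual SDRLE fields.
IDENTIFIER_FIELDS = ("surname", "given_1", "birth_date", "sex")


def extract_identifiers(records):
    """Project each source record (source_id -> record) onto identifier fields only.

    Meant to run on the source side, so analytical content never enters
    the linkage environment.
    """
    return {
        source_id: {f: rec.get(f) for f in IDENTIFIER_FIELDS}
        for source_id, rec in records.items()
    }


def link_to_drd(drd, key_depository, source, identifiers):
    """Link identifier-only records to the DRD and store the results as keys.

    drd: sdrle_number -> identifier dict
    key_depository: sdrle_number -> {source name: source record id}
    Returns the ids of source records that found no DRD match.
    """
    # Index the DRD on its identifier values (assumed exact-agreement rule).
    index = {
        tuple(ids.get(f) for f in IDENTIFIER_FIELDS): number
        for number, ids in drd.items()
    }
    unlinked = []
    for source_id, ids in identifiers.items():
        number = index.get(tuple(ids[f] for f in IDENTIFIER_FIELDS))
        if number is None:
            unlinked.append(source_id)
        else:
            # Only the pair of keys is stored, never the source content.
            key_depository.setdefault(number, {})[source] = source_id
    return unlinked


# Hypothetical example data.
drd = {"123456789": {"surname": "Doe", "given_1": "John",
                     "birth_date": "19501012", "sex": "M"}}
key_depository = {}
dad = {
    "A1": {"surname": "Doe", "given_1": "John", "birth_date": "19501012",
           "sex": "M", "length_of_stay": 4},
    "A2": {"surname": "Roe", "given_1": "Ann", "birth_date": "19700101",
           "sex": "F", "length_of_stay": 2},
}
unlinked = link_to_drd(drd, key_depository, "DAD", extract_identifiers(dad))
print(key_depository, unlinked)
```

Keeping the projection step separate from the linkage step mirrors the slide's point that the environment holds identifiers and keys, while analytical variables stay with their source files.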

SDRLE – Derived Record Depository (DRD)

The prototype DRD was built by linking multiple files together to identify unique individuals
• Source files include the 2006 Census, the T1 Tax file (1980-2011), the Canadian Births Database (1985-2008), the Canadian Mortality Database (1992-2009), the Landed Immigrant File (1980-2011) and the Indian Registry

Deterministic record linkage method
• Deterministic record linkage was initially used to restrict the creation of the DRD to the highest-quality matches
• The method was later extended to include near-exact matches
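The slides do not say which near-exact rules were used. As a hedged illustration only, the sketch below treats two records as an exact match when normalized names, sex and birth date all agree, and as a near-exact match when everything agrees except that the birth date's month and day are transposed (the kind of difference visible between 19600506 and 19600605 in the example Date of birth table). The rule and all function names are hypothetical.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Uppercase and keep letters only, so "O'Neil" and "ONEIL" agree."""
    return "".join(ch for ch in name.upper() if ch.isalpha())


def day_month_swapped(d1: str, d2: str) -> bool:
    """True if two YYYYMMDD dates differ only by swapping month and day."""
    return d1[:4] == d2[:4] and d1[4:6] == d2[6:8] and d1[6:8] == d2[4:6]


def match_level(a: dict, b: dict) -> Optional[str]:
    """Classify a record pair as 'exact', 'near-exact' or None (no match)."""
    names_agree = all(normalize(a[f]) == normalize(b[f]) for f in ("surname", "given_1"))
    if not (names_agree and a["sex"] == b["sex"]):
        return None
    if a["birth_date"] == b["birth_date"]:
        return "exact"
    # Hypothetical near-exact rule: month and day transposed.
    if day_month_swapped(a["birth_date"], b["birth_date"]):
        return "near-exact"
    return None


jane_a = {"surname": "Doe", "given_1": "Jane", "sex": "F", "birth_date": "19600506"}
jane_b = {"surname": "DOE", "given_1": "Jane", "sex": "F", "birth_date": "19600605"}
print(match_level(jane_a, jane_b))
```

Checking the exact rule before the near-exact one means a date such as 19600505, which is trivially "swapped" with itself, is still reported as exact.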

Only persons identified in at least two datasets through record linkage were included in the DRD

Personal identifiers are stored in the DRD and a unique anonymous record identifier is assigned to these records
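The two inclusion rules above (a person must be found in at least two datasets, and each retained record receives an anonymous identifier) can be sketched as follows. The sketch assumes linkage has already grouped source records into one cluster per person, and it draws a random 9-digit identifier so that the number carries no personal information; both the input shape and the numbering scheme are assumptions, not the documented SDRLE procedure.

```python
import secrets


def new_anonymous_number(taken) -> int:
    """Draw a random 9-digit number not already in `taken` (hypothetical scheme)."""
    while True:
        number = 100_000_000 + secrets.randbelow(900_000_000)
        if number not in taken:
            return number


def build_drd(clusters):
    """clusters: list of {source name: source record id}, one dict per person.

    Keeps only persons found in at least two distinct datasets and assigns
    each kept record a random anonymous identifier.
    """
    drd = {}
    for cluster in clusters:
        if len(cluster) < 2:  # found in a single dataset: excluded
            continue
        drd[new_anonymous_number(drd)] = dict(cluster)
    return drd


# Hypothetical linkage clusters.
clusters = [
    {"Census": "C1", "Tax": "T1"},
    {"Tax": "T2"},
    {"Births": "B3", "Tax": "T3", "Census": "C3"},
]
drd = build_drd(clusters)
print(len(drd))
```

Because each cluster is keyed by source name, `len(cluster)` counts distinct datasets rather than raw records.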



SDRL ENVIRONMENT: Derived Record Depository

Core
SDRLE number
123456789
182354987
129998889

Name
SDRLE number  Surname  Given 1  Given 2  Start date  End date
123456789     Doe      John     Liam     2006        2010
182354987     Doe      Jane     Lena     2006        2009
182354987     Johnson  Jane     Lena     2009        2012
129998889     Simpson  Homer    J        2006        2006

Address
SDRLE number  Address       City      Postal code  Start date  End date
123456789     150 Tunney’s  Ottawa    K1A0T6       2006        2009
123456789     Disney World  Orlando   12345        2009        2010
182354987     151 Tunney’s  Ottawa    K1A0T5       2006        2012
129998889     Du Parc       Montreal  H3G1B1       2006        2006

Date of birth
SDRLE number  Date of birth  Start date  End date
123456789     19501012       2006        2010
182354987     19600506       2006        2009
182354987     19600605       2009        2012
129998889     2006           2006        2006

Sex
SDRLE number  Sex  Start date  End date
123456789     M    2006        2010
182354987     F    2006        2012
129998889     M    2006        2006

Date of death
SDRLE number  Date of death  Start date  End date
123456789     20100101       2010        2010
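Because every attribute in these tables carries a start and an end date, a question such as "what was this person's name in a given year?" becomes a lookup over validity periods. Below is a minimal sketch using a subset of the Name columns above; the function name and the tie-break rule (when two periods share a boundary year, as 2009 does for SDRLE number 182354987, prefer the later-starting row) are assumptions, not something the slides specify.

```python
from typing import List, Optional, Tuple

# (SDRLE number, surname, given 1, start year, end year), from the Name table.
Row = Tuple[str, str, str, int, int]

NAME_ROWS: List[Row] = [
    ("123456789", "Doe", "John", 2006, 2010),
    ("182354987", "Doe", "Jane", 2006, 2009),
    ("182354987", "Johnson", "Jane", 2009, 2012),
    ("129998889", "Simpson", "Homer", 2006, 2006),
]


def as_of(rows: List[Row], sdrle_number: str, year: int) -> Optional[Row]:
    """Return the row valid for sdrle_number in year (inclusive bounds), or None."""
    valid = [r for r in rows if r[0] == sdrle_number and r[3] <= year <= r[4]]
    if not valid:
        return None
    # Assumed tie-break: when periods share a boundary year, the newer row wins.
    return max(valid, key=lambda r: r[3])


print(as_of(NAME_ROWS, "182354987", 2008))
print(as_of(NAME_ROWS, "182354987", 2009))
```

Inclusive bounds keep single-year periods such as 2006-2006 queryable; the price is the boundary overlap that the tie-break resolves.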


SDRL ENVIRONMENT

Key Depository (ID number of each person's record in each source; - = no linked record)

SDRLE number  DAD      Tax        Birth       Death       Cancer      Census     Immigration
123456789     -        490212461  -           1756243763  -           129309482  1278882762
182354987     4455600  678097512  -           -           123765190A  776545411  -
129998889     1547342  -          2938365789  -           -           -          -
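Each Key Depository row holds, for one SDRLE number, the record ID of that person in every source where a link was found. A minimal sketch that loads the three example rows above and looks up a person's key in a given source; the dictionary layout and the function names are illustrative assumptions.

```python
from typing import Dict, Optional

SOURCES = ("DAD", "Tax", "Birth", "Death", "Cancer", "Census", "Immigration")

# The example rows, in column order; "-" means no linked record.
ROWS = [
    "123456789 - 490212461 - 1756243763 - 129309482 1278882762",
    "182354987 4455600 678097512 - - 123765190A 776545411 -",
    "129998889 1547342 - 2938365789 - - - -",
]


def load_key_depository(rows) -> Dict[str, Dict[str, str]]:
    """Map each SDRLE number to {source: record id}, skipping empty cells."""
    depository = {}
    for row in rows:
        number, *ids = row.split()
        depository[number] = {src: rid for src, rid in zip(SOURCES, ids) if rid != "-"}
    return depository


def key_for(depository, sdrle_number: str, source: str) -> Optional[str]:
    """Record id of this person in `source`, or None if not linked."""
    return depository.get(sdrle_number, {}).get(source)


kd = load_key_depository(ROWS)
print(key_for(kd, "182354987", "Cancer"))
print(key_for(kd, "123456789", "DAD"))
```

Storing only the non-empty cells keeps the structure sparse, which suits a depository where most people link to only some of the sources.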

Creating linked datasets

[Figure: a cohort dataset and an outcomes dataset are brought together through the Key Depository to create a linked dataset]
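The diagram amounts to a join through the Key Depository: cohort record IDs are translated into outcome record IDs via the SDRLE number, and the analytical variables are combined without carrying any identifiers into the result. A minimal sketch; the function name, the variable names and the sample values are hypothetical.

```python
def create_linked_dataset(cohort, outcomes, key_depository, cohort_source, outcome_source):
    """Combine cohort and outcome variables via the Key Depository.

    cohort / outcomes: record id -> dict of analytical variables
    key_depository: sdrle_number -> {source: record id}
    Returns one row per cohort record, without record ids or SDRLE numbers.
    """
    # Bridge table: cohort record id -> outcome record id, for persons linked to both.
    bridge = {
        keys[cohort_source]: keys[outcome_source]
        for keys in key_depository.values()
        if cohort_source in keys and outcome_source in keys
    }
    linked = []
    for cohort_id, variables in cohort.items():
        outcome_vars = outcomes.get(bridge.get(cohort_id))
        row = dict(variables)
        row["linked"] = outcome_vars is not None
        if outcome_vars is not None:
            row.update(outcome_vars)
        linked.append(row)
    return linked


# Hypothetical example data.
key_depository = {
    "123456789": {"Tax": "490212461", "Census": "129309482"},
    "182354987": {"DAD": "4455600", "Tax": "678097512", "Census": "776545411"},
}
cohort = {"129309482": {"age_group": "55-64"}, "776545411": {"age_group": "45-54"}}
outcomes = {"4455600": {"hospitalized": True}}
for row in create_linked_dataset(cohort, outcomes, key_depository, "Census", "DAD"):
    print(row)
```

Keeping every cohort record and flagging whether it linked, rather than dropping unlinked records, makes linkage rates for the cohort easy to report alongside the analysis.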


SDRLE results to date

The number of unique person records in the DRD is slightly lower than, but close to, the demographic count for Canada
• As a test, the DRD was compared to the 2011 population count
• Analysis of this DRD indicates that it includes fewer than 200,000 duplicates
• Coverage varies across sub-populations
• This could be attributable to limitations of the datasets used, as well as to the record linkage methodology used for the proof-of-concept exercise

External linkages have been carried out between the DRD and the National Longitudinal Survey of Children and Youth (NLSCY), criminal court data (ICCS), hospital discharge data (DAD), tax data, and education program records (PSIS and RAIS).


Next steps

Improve methods by:

• Simplifying process for updates to the Derived Record Depository

• Reviewing and optimizing record linkage methods and processes

• Incorporating the use of G-Link for probabilistic matching where appropriate

Additional files to be included in the model
• Files already at StatCan: Canadian Child Tax Benefit, T4 file
• Other files yet to be identified
• The environment is open to the addition of new files in the future
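G-Link is the probabilistic option named above; its settings are not described in these slides. As a generic illustration of probabilistic matching, the sketch below computes Fellegi-Sunter agreement and disagreement weights, log2(m/u) and log2((1-m)/(1-u)), and compares the total against two thresholds. It is not a description of how G-Link is configured: all m- and u-probabilities and thresholds are invented for the example and would in practice be estimated or tuned.

```python
import math

# Hypothetical parameters per field: (m, u), where
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
PARAMS = {
    "surname": (0.95, 0.01),
    "given_1": (0.90, 0.02),
    "birth_date": (0.97, 0.001),
    "sex": (0.99, 0.5),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for name, (m, u) in PARAMS.items():
        if a.get(name) is None or b.get(name) is None:
            continue  # missing value: treated as neutral (an assumption)
        if a[name] == b[name]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Three-way decision with hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


a = {"surname": "Doe", "given_1": "Jane", "birth_date": "19600506", "sex": "F"}
b = {"surname": "Doe", "given_1": "Jane", "birth_date": "19600605", "sex": "F"}
print(round(match_weight(a, b), 2), classify(match_weight(a, b)))
```

Unlike the deterministic rules, a single disagreeing field lowers the score without vetoing the pair, so the transposed birth date here lands in the clerical-review band instead of being rejected outright.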

QUESTIONS/COMMENTS?

Richard Trudeau, Health Record Linkage Section, Health Statistics Division

Craig Grimes, Health Record Linkage Section, Health Statistics Division

Bob Kingsley, Health Statistics Division



Supplemental Slides

[Figure: Derived Depository and Key Registry, with the sources Hospital Discharge Data, Survey Data, Vital Statistics, Census, Immigration Data, Tax Data and the Canadian Cancer Registry]


SDRL ENVIRONMENT
• Some datasets are used to derive the Depository
• Some datasets are linked to the Derived Depository for analytical purposes only

Deriving the Record Depository and Key Depository

[Figure: process flow. Source files (Tax, Census, Births, etc.) pass through a Filter and External Record Linkage against the Record Depository, with an Add? / Update? decision; other elements are Source updates / Update source, Process metadata, Unlinked source records, Linked source records, Source metadata, and the Key Depository (linked status)]
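The flow above can be read as: filter incoming source records, link the survivors to the Record Depository, record linked keys in the Key Depository, set aside unlinked records, and keep the outcome of each step as process metadata. The sketch below follows that reading only; the filter rule, the pluggable `link` function, `exact_link` and the `RunResult` structure are hypothetical, and since the slides do not say how unlinked records are handled, they are merely returned as candidates for addition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical filter rule: records missing these identifiers are not linked.
REQUIRED = ("surname", "birth_date")


@dataclass
class RunResult:
    """Process metadata for one source run."""
    linked: Dict[str, str] = field(default_factory=dict)  # source id -> SDRLE number
    unlinked: List[str] = field(default_factory=list)     # candidates for addition
    filtered: List[str] = field(default_factory=list)     # rejected by the filter


def process_source(source: str, records: Dict[str, dict], drd: Dict[str, dict],
                   key_depository: Dict[str, Dict[str, str]],
                   link: Callable[[dict, Dict[str, dict]], Optional[str]]) -> RunResult:
    result = RunResult()
    for source_id, rec in records.items():
        # Filter step: drop records lacking required identifiers.
        if any(not rec.get(f) for f in REQUIRED):
            result.filtered.append(source_id)
            continue
        # External record linkage step against the Record Depository.
        number = link(rec, drd)
        if number is None:
            result.unlinked.append(source_id)
            continue
        # Update step: record the linked status in the Key Depository.
        result.linked[source_id] = number
        key_depository.setdefault(number, {})[source] = source_id
    return result


def exact_link(rec, drd):
    """Illustrative linkage rule: exact agreement on the REQUIRED fields."""
    wanted = {f: rec.get(f) for f in REQUIRED}
    return next((n for n, ids in drd.items() if ids == wanted), None)


# Hypothetical example data.
drd = {"123456789": {"surname": "Doe", "birth_date": "19501012"}}
key_depository = {}
tax = {
    "T1": {"surname": "Doe", "birth_date": "19501012"},
    "T2": {"surname": "Roe", "birth_date": "19700101"},
    "T3": {"surname": "", "birth_date": "19800101"},
}
print(process_source("Tax", tax, drd, key_depository, exact_link))
```

Passing the linkage rule in as a function keeps the flow independent of the matching method, so a deterministic rule could later be swapped for a probabilistic one without changing the surrounding steps.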

Creation of a linked dataset


[Figure: the NLSCY cohort file (91% linked) is linked through the Derived Record Depository and Key Registry, shown with ICCS (69%), PSIS (90%), Tax (98%), RAIS (95%) and DAD (55%), to produce an analysis file of the linked NLSCY cohort with 93% linked to Tax, 24% to PSIS, 14% to DAD, 6% to RAIS and 5% to ICCS]