Data linkage: the key to long term outcomes
Professor Ronan Lyons Farr Institute – CIPHER
Centre for Improvement in Population Health through E-records Research. Swansea University
Biennial Scientific Meeting, Congenital Anomaly Registers: Utilizing a valuable resource
Tuesday 7th October 2014, Dylan Thomas Centre, Swansea
• Farr Institute
• Data linkage in the UK
• What is possible now and in the future
• Long term outcomes
Content of Presentation
Historical research
MRC’s vision for UK medical bioinformatics research
Enabling technologies & infrastructure
Developing capacity & expertise
Funding for innovative research
High throughput data
Cohorts
Trials
BioBanks
Educational, environmental and social data
NHS clinical data
Patient groups
Demographic data
Farr UCL Partners
Farr Scotland
Farr - CIPHER
Farr N8 Manchester
Strengthening health informatics research
• MRC coordinated 10-partner £19m call for e-health informatics research centres across the UK
Cutting edge research using data linkage
capacity building
• Additional £20m capital to create Farr Institute
• UK Health Informatics Research Network
Coordinate training, share good practice and develop methodologies
Engage with the public, collaborate with industry and the NHS
Who is Farr?
“Diseases are more easily prevented than cured and the first
step to their prevention is the discovery of their exciting causes.”
William Farr
“To harness health data for patient and public benefit by setting the international standard in trustworthy reuse of electronic patient records
and related linkable data for large-scale research.”
Our Vision
Our Ten Key Activities
1. Collaborative leadership
2. Cutting edge research
3. Public engagement
4. Governance (safe havens)
5. Methods development
6. Meta data and enabling datasets
7. Harmonised eInfrastructure
8. Partnerships
9. Training/capacity building
10. Communications
To deliver impact nationally and internationally
Various developments across the UK
• Considerable number of initiatives
• UK
– Farr Institute
– Administrative Data Research Centres/Network
• England
– Health and Social Care Information Centre
– Clinical Practice Research Datalink
• Northern Ireland
– Northern Ireland Longitudinal Study
• Scotland
– Information Services Division, ISD Scotland
– Electronic Data Research and Innovation Service (eDRIS)
• Wales
– SAIL databank
Steps in utilising health information for research
1. Building trust, partnerships and collaboration
2. Development of anonymisation and linkage techniques
3. Quality assessment and appraisal of datasets
4. Use of datasets to support research
SAIL uses a split file, a trusted third party (TTP), multi-stage encryption, and a stepwise, restricted-field remote access analysis system to ensure privacy protection
Lyons RA, et al. The SAIL databank: linking multiple health and social care datasets. BMC Med Inform Decis Mak. 2009 Jan 16;9:3. http://www.biomedcentral.com/1472-6947/9/3
Secure Anonymised Information Linkage (SAIL) databank
SAIL: a multi-sourced data bank of linkable anonymised data on the population of Wales:
• health service operational systems
• national databases
• clinical and biological data
• education, housing, social care, etc.
Uses a trusted third party, split file and multiple encryption technologies to create Anonymised Linkage Fields (ALFs) for individuals and residences
SAIL Gateway is a remote access analysis facility to curtailed data.
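The split file and trusted third party idea can be sketched in a few lines. This is a minimal illustration only, not SAIL's actual implementation: the function names, the `row_id` join key, the secret key and the use of a single HMAC-SHA256 step (standing in for the multi-stage encryption) are all assumptions, and the NHS numbers and diagnosis codes are made up.

```python
import hashlib
import hmac

# Hypothetical secret held only by the trusted third party (TTP).
TTP_KEY = b"example-ttp-secret"


def split_record(record, row_id):
    """Data provider: split one record into two files sharing only a row id."""
    demographic = {"row_id": row_id, "nhs_number": record["nhs_number"]}
    clinical = {"row_id": row_id, "diagnosis": record["diagnosis"]}
    return demographic, clinical


def make_alf(demographic, key=TTP_KEY):
    """TTP: replace the identifier with a keyed hash used as the linkage field."""
    digest = hmac.new(key, demographic["nhs_number"].encode(), hashlib.sha256)
    return {"row_id": demographic["row_id"], "alf": digest.hexdigest()}


def recombine(alf_row, clinical):
    """Databank: join the two files on row id, then drop the row id."""
    if alf_row["row_id"] != clinical["row_id"]:
        raise ValueError("row ids do not match")
    return {"alf": alf_row["alf"], "diagnosis": clinical["diagnosis"]}


# Made-up records: the first two belong to the same person.
records = [
    {"nhs_number": "9999999999", "diagnosis": "Q21.0"},
    {"nhs_number": "9999999999", "diagnosis": "J45"},
    {"nhs_number": "8888888888", "diagnosis": "E10"},
]

linked = []
for i, rec in enumerate(records):
    demo, clin = split_record(rec, i)          # provider sends demo to the TTP
    linked.append(recombine(make_alf(demo), clin))  # TTP never sees clin
```

In this sketch the TTP sees identifiers but no clinical content, and the databank sees clinical content but only the keyed hash, so records for the same person still link without the identifier being stored.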
SAIL split file/trusted third party methodology
Anonymisation process
HIRU (Blue C)
Demographic data only
Clinical / activity data
Recombine
Other recombined data
Validated, anonymised data
Encrypt and load
Operational system
NHS Wales Informatics Service
Data Provider
HIRU (Blue C)
Construct ALF
Validate, Trace & Geo-code
Datasets in SAIL (incomplete coverage)
Administrative Health:
• Population
• Inpatients
• Outpatients
• Emergency Department
• Child Health Database Wales
• NHS Direct Wales
Administrative Non-Health:
• Births
• Deaths
• Educational Attainment
• Social Services
• Housing
Clinically rich databases:
• Specialty specific: Cancer Incidence, Cancer Screening, Congenital Anomalies, Arthropathies, Myocardial Infarction, Diabetes, etc.
• General: GP Data, Laboratory systems
• Study specific: Embedded trials and cohorts
Patient Journey Analysis - Health and Social Care
• Fetal deaths are common with more severe malformations
• Fetus does not have an ‘identity’ such as an NHS number
• There may be multiple fetuses
• Babies often leave hospital with an incomplete name – ‘Baby Surname’
• Early neonatal deaths - not registered with GP
• However, possible to link maternal and baby NHS numbers if systems like National Community Child Health Databases in Wales exist
• NN4B
Particular difficulties with congenital anomaly research
• Modern cohorts/registries designed for multi-modal data linkage
– Huge amounts of data
– Different database structures/sizes
– Major challenges when creating cross-cohort/platform analyses
– Semantic interoperability/data harmonisation issues
  • Original metadata - standards
  • Variable definitions from baseline/laboratory results
  • Variable definitions from routine GP/hospital data
    – GP Read codes: UK/NZ, user variation+++
    – UK inpatient data: different in Wales/England/Scotland
– Too difficult to move very large and complex data
  • Recipients would need to design/implement very complex data structures just to receive data
• Privacy protection essential
– Potential for ‘jigsaw’ attacks, threat from reidentification scientists
• World-wide shortage of skills and expertise in managing these challenges
– No single institution with all necessary skills
– Need for international collaboration
– Build upon existing expertise, developments and investments
Informatics challenges
• 22 cohorts involved
• UK Biobank – greatest variety
– Baseline survey
– Baseline anthropometrics/physiological measurements (continuous/categorical)
– Baseline biochemistry/haematology
– Genomics – 821,000 SNPs
– Imaging: retinal/MRI/US
– Accelerometer data
– Follow up
  • Death and cancer registry
  • Primary care
  • Hospital data
  • Disease registries
  • Self reported conditions/status
  • Functional/cognitive impairment
Cohort Data in UK Dementia Platform
• Built upon SAIL Gateway developments www.saildatabank.com
• Built with MRC capital infrastructure for Farr Institute
– bid supported by ALSPAC, UK Biobank, LifeStudy cohorts
• A national / international resource delivered through FARR
– A secure environment to enable research groups to conform to best practices of data management, security and information governance
– A remote access large scale IT infrastructure with standard and bespoke analytical tools
• Leaves data ownership with the cohorts
– devolved account and access control
– information governance responsibility & control with projects
• Researchers focus on the science
Remote analysis platform for multiple cohorts: UK Secure e-Research Platform (UK SeRP)
• Multidisciplinary collaborative project
• Platform for translating routinely collected data into an anonymised population level child e-cohort
• Investigate the widest possible range of social and environmental determinants of child health and social outcomes
• Inform the development of interventions to reduce health inequalities of children in Wales
• Two phases:
– Phase 1: proof of concept
– Phase 2: dynamic capabilities
Wales Electronic Cohort for Children (WECC)
Birth records
(ONS births)
Mortality records
(ONS deaths)
Wales Electronic Cohort for Children
N=981,404
WECC eligibility criteria applied
Data cleaning: rules for removal of duplicates and errors
WDS
Child Health (NCCHD)
ALF_E
WDS: Welsh Demographic Service, NCCHD: National Community Child Health Database, ONS: Office for National Statistics
WECC development
• Links with health and education data via ALF_E
• Links with maternal health data via mALF_E
• Links with SAIL eGIS data via ALF_E/RALF_E
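Linking derived tables to the cohort through ALF_E amounts to a left join on the anonymised key. The sketch below is illustrative only: the rows, the table names and every field other than `ALF_E` are hypothetical and are not taken from WECC.

```python
# Hypothetical rows; field names other than ALF_E are illustrative only.
core = [
    {"ALF_E": "a1", "born_in_wales": True},
    {"ALF_E": "a2", "born_in_wales": False},
]
education = [{"ALF_E": "a1", "ks1_passed": True}]
inpatient = [
    {"ALF_E": "a1", "admission": "respiratory"},
    {"ALF_E": "a1", "admission": "head injury"},
]


def index_by_alf(rows):
    """Group rows by ALF_E so one child can have zero, one or many rows."""
    index = {}
    for row in rows:
        index.setdefault(row["ALF_E"], []).append(row)
    return index


def link(core_rows, **tables):
    """Left-join each derived table onto the core cohort via ALF_E."""
    indexes = {name: index_by_alf(rows) for name, rows in tables.items()}
    linked = []
    for child in core_rows:
        record = dict(child)
        for name, index in indexes.items():
            # Children with no matching rows keep an empty list.
            record[name] = index.get(child["ALF_E"], [])
        linked.append(record)
    return linked


cohort = link(core, education=education, inpatient=inpatient)
```

Every child in the core stays in the result, whether or not any education or inpatient rows match, which is the behaviour a cohort analysis usually needs.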
WECC core
n = 981,404
♂: 500,181 (51.0%)
♀: 481,205 (49.0%)
Inpatient
GP consultations
Perinatal and child health
Environment
House Moves
Non-Welsh births
n = 215,095
♂: 107,222 (49.8%)
♀: 107,872 (50.2%)
Born in Wales
n = 766,309
♂: 392,959 (51.3%)
♀: 373,333 (48.7%)
WECC derived tables
National dataset
Education
I. Influence of maternal and child health factors on time to first admission with a respiratory disorder
(Paranjothy S. et al (2013) Pediatrics 132:6 e1562-e1569)
II. Influence of head injuries on educational attainment at age 7 (Gabbe B.J. et al (2014) J Epidemiol Community Health 68:5 466-470)
III. Educational outcomes for frequent movers (Hutchings H. et al (2013) PLoS One. 8(8) e70601)
IV. Influence of the physical and social environment on childhood obesity
Examples of analyses
Background to WECC phase 2
Poor educational attainment → unemployment and/or low salary → ill-health
A greater understanding of factors underlying education inequalities is necessary to target interventions to protect future generations from poverty and ill health.
Health of the child
Environment
Family size
Household illness
Unemployment
Ill health
Low salary
Educational attainment
1. Does moving to a less deprived community influence child health and educational outcomes?
2. To what extent do serious childhood or family health conditions affect educational outcomes?
3. Is poor educational attainment a risk factor for adverse health in adolescence?
4. Can a novel hybrid cohort study, embedding a traditional detailed survey cohort (e.g. the Millennium Cohort Study, MCS) within D-WECC, be used to evaluate the strengths and weaknesses of using e-cohorts for epidemiological studies?
Research questions
• Individual linkage
– Mortality data: survival and cause of death
– GP and hospital activity: health service impact/comorbidity
– Laboratory and imaging systems: severity of condition/comorbidity
– Education attainment: social impact of condition
– Work and benefits: social impact/disability
• Family/household linkage
– Impact on the wider family
Data linkage and long term outcomes
Time to the first emergency respiratory hospital admission
• Risk decreased with each successive week in gestation up to 40 – 42 weeks.
• Risk further increased for babies that were small for gestational age.
• The increased risk is small for late preterm infants but the number affected is large and will impact on healthcare services.
Head injury and school performance
J Epidemiol Community Health 2014;68:466-470 doi:10.1136/jech-2013-203427
For children entering school, what is the association between a preceding head injury and KS1 (age 5-7 years) performance?
n=116,154 Born in Wales Sept 1998-Aug 2001
n=101,892 Remaining in Wales
n=14,262 Left Wales
n=90,661 Valid KS1 result
n=290 Head injury admission
n=90,371 No head injury
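The counts in this flow can be checked with simple arithmetic. The sketch assumes, from the ordering of the boxes, that the valid KS1 group is drawn from the children remaining in Wales; the variable names are illustrative.

```python
# Counts as given on the slide.
born_in_wales = 116_154
remaining, left_wales = 101_892, 14_262
valid_ks1 = 90_661
head_injury, no_head_injury = 290, 90_371

# Each split should account for every child in its parent box.
assert remaining + left_wales == born_in_wales
assert head_injury + no_head_injury == valid_ks1

# Children remaining in Wales without a valid KS1 result
# (assumes the KS1 box descends from the "remaining" box).
no_valid_ks1 = remaining - valid_ks1

# Share of the analysed cohort with a head injury admission, in percent.
head_injury_pct = 100 * head_injury / valid_ks1
```

Both splits balance, and head injury admissions make up roughly 0.3% of the analysed children, which helps explain the wide confidence intervals for the individual injury types.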
Association between head injury and satisfactory performance on KS1

Predictor                                OR (95% CI)          AOR (95% CI)
Head injury
  None (reference)                       1                    1
  Skull fracture                         0.73 (0.50, 1.09)    0.79 (0.52, 1.18)
  Concussion                             0.85 (0.33, 2.16)    0.87 (0.31, 2.49)
  Intracranial injury                    0.50 (0.33, 0.75)    0.46 (0.30, 0.72)
Gender
  Male (reference)                       -                    1
  Female                                 -                    1.95 (1.87, 2.03)
Townsend deprivation index quintile
  1 (Least deprived) (reference)         -                    1
  2                                      -                    0.64 (0.59, 0.69)
  3                                      -                    0.49 (0.45, 0.52)
  4                                      -                    0.38 (0.35, 0.41)
  5 (Most deprived)                      -                    0.26 (0.24, 0.28)
Age at KS1 assessment (years)            -                    2.77 (2.60, 2.97)
Birth weight (kg)                        -                    1.41 (1.35, 1.47)
Gestational age (weeks)                  -                    1.01 (1.00, 1.03)
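For readers less familiar with the OR column, an unadjusted odds ratio and its 95% confidence interval can be computed from a 2x2 table as sketched below. This uses the standard Wald interval on the log odds ratio; it is not necessarily the method used in the study, and the counts are hypothetical, not study data. The adjusted odds ratios (AOR) would come from a multivariable model such as logistic regression, which is not shown here.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's method.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study:
# exposed = head injury, outcome = satisfactory KS1 performance.
or_, lo, hi = odds_ratio_ci(a=40, b=20, c=8000, d=2000)
```

With these made-up counts the odds of a satisfactory result are 2 in the exposed group and 4 in the unexposed group, giving an odds ratio of 0.5. The small exposed group dominates the standard error, which is why rare exposures produce wide intervals.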
Household level linkage
Soon - a tidal wave of data…
• Full genome sequence ~£3,000
• Dropping in price 10x every 2-4 years
• Existing NHS genetic test ~£1,000
• Disk cost to store an individual's variations ~10p
• Development of continuous monitoring and remote sensors
• Data from many other sources
• New approaches needed for accessing, manipulating, visualizing
• Requires entirely new perspective
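To give a feel for what a 10x price drop every 2-4 years implies, the sketch below projects the sequencing cost forward from the ~£3,000 figure. It assumes a smooth exponential decline, which is a simplification; the function and variable names are illustrative.

```python
def projected_cost(start_cost, years, years_per_tenfold_drop):
    """Cost after `years` if the price falls 10x every `years_per_tenfold_drop`."""
    return start_cost * 10 ** (-years / years_per_tenfold_drop)


START = 3000.0  # approximate full genome sequence cost in pounds, from the slide

# Projection four years ahead under the two ends of the quoted range.
fast = projected_cost(START, years=4, years_per_tenfold_drop=2)  # 10x every 2 years
slow = projected_cost(START, years=4, years_per_tenfold_drop=4)  # 10x every 4 years
```

Under these assumptions the cost four years on would be about £30 at the fast rate and about £300 at the slow rate, in both cases below the ~£1,000 quoted for an existing NHS genetic test.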
• Expect further development of data linkage capabilities across the UK
• However, capacity is a major issue
• Amount of work needed is often underestimated
• Ensuring privacy is protected and that the public are engaged and accept this research approach are key activities
The future is bright