managing and analyzing health data (vldb conference)

UNIVERSITY OF WASHINGTON

Managing and Analyzing Global Health Data

Seattle, August 30, 2011

Peter Speyer, Director of Data Development

IHME Background

• Global institute dedicated to providing independent, rigorous, and scientific measurements and evaluations to accelerate progress on global health

• Part of the Department of Global Health at the University of Washington

• Funded by the Bill & Melinda Gates Foundation and the State of Washington (‘core funding’), and other funders through specific research grants

• Created in 2007

• 70 researchers, 30 staff

IHME Mission

Our goal is to improve the health of the world’s populations by providing the best information on population health

Health Data

Health-related data

• Social determinants
• Risk factors

Population-based data

• Household / facility surveys
• Census
• Vital registration
• Registries (provider, disease)

Facility-based data

• Health records
• Administrative data (financial, operational)
• Research data (DSS, clinical trials, etc.)

Individual-based data

• Personal health records
• “Quantified self”
• Disease-based social networks

Health Data Innovation

• Patient engagement
• Open data
• Health apps

Key Health Data Challenges

• Find & access data
• Disseminate data
• Use data

Key Health Data Challenges

• Lack of transparency

• Timeliness of data

• Lack of documentation

• Access vs. privacy

Key Health Data Challenges

• Sheer quantity of data files (30TB, 20K+ source datasets, 40M files)

• Diverse source data types and formats (pdf, csv, SPSS, CSPro, …)

• Data quality issues
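With tens of millions of files in mixed formats, a first step is often just taking inventory. The sketch below is one hedged way to do that: it walks a directory tree and counts files by extension. The function name and the example path are hypothetical, and nothing here reflects IHME's actual tooling.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(root):
    """Walk a directory tree and count files per lowercase extension.

    Files without an extension are grouped under "(none)".
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


# Example usage (the path is a placeholder, not a real location):
# for extension, n in count_files_by_extension("/data/sources").most_common():
#     print(extension, n)
```

A tally like this shows quickly which formats dominate (for example .pdf, .csv, or SPSS .sav files) and therefore which parsers are worth building first.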

Key Health Data Challenges

• Make results data engaging

• Accountability: share results, code, source data

• Accommodate diverse audiences (expertise, geographies)

Example: Global Burden of Disease

Mortality & causes of death

• Sources: census, surveys, vital registration, verbal autopsy

• Estimates: covariate models, spatial-temporal regressions; weighted combination of models
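The "weighted combination of models" can be pictured as a simple ensemble average. The sketch below is illustrative only: the model names, the numbers, and the choice to weight each model by the inverse of its out-of-sample error are all assumptions, not the method described in the talk.

```python
def combine_predictions(predictions, errors):
    """Weighted average of model predictions.

    predictions: dict of model name -> predicted value (e.g. a death rate)
    errors: dict of model name -> out-of-sample error (lower is better)
    Weights are proportional to 1 / error, an assumption made for illustration.
    """
    inverse = {name: 1.0 / errors[name] for name in predictions}
    total = sum(inverse.values())
    weights = {name: value / total for name, value in inverse.items()}
    combined = sum(weights[name] * predictions[name] for name in predictions)
    return combined, weights


# Hypothetical inputs, not taken from the talk.
predictions = {"covariate_model": 10.0, "spatial_temporal": 14.0}
errors = {"covariate_model": 1.0, "spatial_temporal": 3.0}
combined, weights = combine_predictions(predictions, errors)
# The model with the smaller error gets the larger weight (about 0.75 vs 0.25),
# so the combined estimate (about 11.0) sits closer to its prediction.
```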

Morbidity

• Sources: Literature reviews, surveys, registries, hospital data

• Disease modeling: compartmental Bayesian model

• Health severity weights

Burden of disease

• DALYnator

300 diseases

40 risk factors

21 regions

1990, 2005, 2010
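For readers new to the metric: burden of disease is summarized in disability-adjusted life years (DALYs), the sum of years of life lost to premature death (YLL) and years lived with disability (YLD). The sketch below shows only that arithmetic. It assumes a prevalence-based YLD, treats the slide's "health severity weights" as the disability weight, and uses hypothetical inputs; it does not represent the DALYnator's actual code.

```python
def years_of_life_lost(deaths, remaining_life_expectancy):
    """YLL = deaths x standard remaining life expectancy at the age of death."""
    return deaths * remaining_life_expectancy


def years_lived_with_disability(prevalent_cases, disability_weight):
    """YLD = prevalent cases x disability weight (0 = full health, 1 = death)."""
    return prevalent_cases * disability_weight


def dalys(deaths, remaining_life_expectancy, prevalent_cases, disability_weight):
    """DALY = YLL + YLD for one cause in one population."""
    return (years_of_life_lost(deaths, remaining_life_expectancy)
            + years_lived_with_disability(prevalent_cases, disability_weight))


# Hypothetical inputs for a single cause, not taken from the talk.
burden = dalys(deaths=100, remaining_life_expectancy=30,
               prevalent_cases=2000, disability_weight=0.25)
# YLL = 100 * 30 = 3000, YLD = 2000 * 0.25 = 500, so burden is 3500.0
```

In practice this calculation would be repeated for every cause, region, age group, sex, and year before the results are aggregated.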

GBD Country Years, Causes of Death 1950-2009

| Data source | Countries | Site-years | # of Deaths |
|---|---|---|---|
| VR | 128 | 4,190 | 722,267,710 |
| Household Surveys | 136 | 2,827 | 10,132,976 |
| Surveillance Systems | 12 | 126 | 717,698 |
| National VA | 21 | 71 | 301,855 |
| Subnational VA | 59 | 442 | 2,606,815 |
| Mortuary Registries | 6 | 25 | 54,316 |
| TOTAL | | 7,680 | 735,564,116 |

Solutions: Computing Infrastructure

• Analysis with statistical packages

– Projects with 100K+ lines of code

• File system

– 60TB disc space

– Redundant backup

• Cluster with 63 nodes (+300% in 2011), ~2000 cores

– Runs 24x7, very little downtime

• Virtual environments to test new applications, serve them to collaborators, etc.

Solutions: Global Health Data Exchange

Objectives

• Transparency => data catalog
• Access => data repository
• Information => data community (future)

Approach

• One record per dataset
• Standardized metadata
• Internal users (10K records): files on file server
• External users (5K records): files for download

Implementation

• CMS: Drupal
• Search: SOLR
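To make "one record per dataset" with "standardized metadata" concrete, here is a minimal sketch of what a catalog record could look like. Every field name, the sample records, and the example URL are hypothetical, and the keyword filter is only a stand-in for a real search index such as Solr; this is not the GHDx schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """One catalog record per dataset; all field names are hypothetical."""
    title: str
    country: str
    year: int
    data_type: str                      # e.g. "census", "household survey"
    keywords: List[str] = field(default_factory=list)
    download_url: Optional[str] = None  # set only when files are public

    def is_public(self) -> bool:
        """True when external users can download the files."""
        return self.download_url is not None


def search(records, term):
    """Naive case-insensitive match on title and keywords."""
    term = term.lower()
    return [r for r in records
            if term in r.title.lower()
            or any(term in k.lower() for k in r.keywords)]


# Hypothetical records for illustration only.
catalog = [
    DatasetRecord("Example Household Survey", "Kenya", 2008, "household survey",
                  keywords=["mortality", "fertility"],
                  download_url="https://example.org/files/1"),
    DatasetRecord("Example Census", "Peru", 2007, "census",
                  keywords=["population"]),
]
hits = search(catalog, "mortality")
public_titles = [r.title for r in catalog if r.is_public()]
```

Keeping the download link optional mirrors the split on the slide: every dataset gets a catalog record, but only some records expose files to external users.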

UNIVERSITY OF WASHINGTON

Thank you!

speyer@uw.edu

@peterspeyer

www.ghdx.org

Peter Speyer

Director of Data Development
