doing data science – chapter 12: epidemiology vast amounts of individual patient medical data is...

Upload: stephanie-oconnor

Post on 13-Jan-2016

Doing Data Science – Chapter 12: Epidemiology

• Vast amounts of individual patient medical data is available
  – Detailed – visits, prescriptions, outcomes, etc.
  – Records cover lifetimes
  – Largest databases have records on 80 million people

• However, many medical studies are observational
  – Not founded on data
  – Results affect the actions of doctors and insurance regulators

Confounder Problem and Stratification

• Confounding problem: an extraneous variable that correlates with both the dependent and the independent variable, giving a false impression of cause and effect

• Stratification: partitioning a population into subgroups (strata), evaluating each stratum separately, and combining the results to reach conclusions about the whole population
  – A weighted average of the stratum-level results is one way of combining them

• Example [p.294-295]:
  – In a study, equal numbers of women (50) and men (50) received the treatment, but the control group had different numbers (80 women, 20 men)
  – Original (unstratified) causal effect is 10%
  – Stratified causal effect is 5% for men and 11.25% for women
  – This does NOT prove that the treatment's side effects are twice as strong for women
• Problem: causal conclusions can be wrong if the group sizes after stratification are too different to give meaningful statistics
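A minimal Python sketch of stratification, under stated assumptions. The counts below are hypothetical (they are not the book's table); they were picked only so that the group sizes and percentages match the example above. Sex plays the role of the confounder: women make up 50% of the treated group but 80% of the control group, and the two sexes have different control-group event rates. The effect measure is assumed to be a simple risk difference, and weighting strata by their size is one illustrative choice, not the book's stated method.

```python
from fractions import Fraction

# Hypothetical counts (NOT the book's data), chosen so the group sizes and
# percentages match the slide. Each stratum maps to
# (events_treated, n_treated, events_control, n_control).
STRATA = {
    "women": (10, 50, 7, 80),
    "men": (10, 50, 3, 20),
}


def risk_difference(events_t, n_t, events_c, n_c):
    """Event rate in the treated group minus the event rate in the control group."""
    return Fraction(events_t, n_t) - Fraction(events_c, n_c)


def crude_effect(strata):
    """Pool all strata together, ignoring the stratifying variable (sex)."""
    totals = [sum(counts[i] for counts in strata.values()) for i in range(4)]
    return risk_difference(*totals)


def stratified_effects(strata):
    """Risk difference computed separately within each stratum."""
    return {name: risk_difference(*counts) for name, counts in strata.items()}


def weighted_average(effects, weights):
    """Combine stratum-level effects into one summary using the given weights."""
    total = sum(weights.values())
    return sum(effects[k] * weights[k] for k in effects) / total


if __name__ == "__main__":
    effects = stratified_effects(STRATA)
    # Assumed weighting: each stratum's total size (treated + control).
    sizes = {name: c[1] + c[3] for name, c in STRATA.items()}
    print("crude:", float(crude_effect(STRATA)))
    for name, eff in effects.items():
        print(f"{name}:", float(eff))
    print("size-weighted:", float(weighted_average(effects, sizes)))
```

Run as a script, this prints a crude effect of 0.1, stratum effects of 0.1125 (women) and 0.05 (men), and a size-weighted average of 0.090625. Weighting by the number treated instead would give a different summary, so the choice of weights should be stated with any stratified result. The small male control group also illustrates the "Problem" bullet: with only 20 men, a single extra event moves their control rate by 5 percentage points.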

Data-Driven Studies

• Analysis of 50 studies of drug/outcome pairs
  – 5000 analyses for each pair on nine databases
  – Example:
    • ACE inhibitors (treatment for hypertension)/swelling of the heart
    • Results varied between databases from 3X risk to 6X risk
  – For 20 of the 50 pairs, whether a risk was found depended on the database
  – By adjusting the choice of database, confounders, and time window, any study could be made to show either risk or no risk
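One way to make "database dependent" concrete, sketched in Python. Assumptions (not from the source): each database yields a relative-risk estimate with a 95% confidence interval, a pair is labeled "risk" when the whole interval lies above 1, and a pair counts as database dependent when different databases give different labels. The pair names, database names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical relative-risk estimate for one drug/outcome pair in one database."""
    rr: float     # point estimate of the relative risk
    lower: float  # lower bound of the 95% confidence interval
    upper: float  # upper bound of the 95% confidence interval


def signal(est: Estimate) -> str:
    """Assumed decision rule: 'risk' only if the whole interval lies above 1."""
    return "risk" if est.lower > 1.0 else "no risk"


def is_database_dependent(results: dict[str, Estimate]) -> bool:
    """True if the databases disagree about risk vs. no risk for this pair."""
    return len({signal(e) for e in results.values()}) > 1


def count_database_dependent(pairs: dict[str, dict[str, Estimate]]) -> int:
    """Number of drug/outcome pairs whose answer depends on the database."""
    return sum(is_database_dependent(results) for results in pairs.values())


if __name__ == "__main__":
    # Hypothetical estimates for two pairs in two hypothetical databases.
    pairs = {
        "pair_1": {"db_A": Estimate(3.0, 2.1, 4.3), "db_B": Estimate(1.2, 0.8, 1.8)},
        "pair_2": {"db_A": Estimate(3.0, 2.1, 4.3), "db_B": Estimate(6.0, 4.0, 9.0)},
    }
    for name, results in pairs.items():
        print(name, {db: signal(e) for db, e in results.items()})
    print("database dependent:", count_database_dependent(pairs), "of", len(pairs))
```

Under this rule pair_2 is not flagged even though its estimates differ twofold, because both intervals lie above 1. A spread like the 3X-to-6X one in the ACE inhibitor example would likewise count as a difference in magnitude, not direction, if all its intervals lie above 1. A different decision rule (for example, a p-value threshold) could classify the same estimates differently.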

Data-Driven Studies

• Observational Medical Outcomes Partnership (OMOP)
  – Goal: see how well current methods predict things we already know
  – 10 large medical databases containing records for 200 million people
  – $25M project
  – Determined an ROC curve: the Area Under the Curve (AUC) was 0.65, not much better than the 0.5 of random guessing
  – Databases are self-consistent: using one database gave better accuracy (0.92 in one case)
  – Graphs [p.302] show ~80% sensitivity at a ~10% false-positive rate
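An ROC curve and its AUC can be computed from any method's scores on a reference set of pairs whose true status is already known (label 1 = known association, 0 = known non-association). The Python sketch below uses the standard textbook definitions and is not OMOP's own code: each distinct score is tried as a threshold to get one (false-positive rate, sensitivity) point, and the AUC is the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counting one half. Random scoring gives an expected AUC of 0.5, the baseline the 0.65 above is compared against. The scores and labels in the example are hypothetical.

```python
def roc_points(scores, labels):
    """(false-positive rate, sensitivity) per threshold, from strictest to loosest."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        # Flag every pair whose score is at least the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical method scores for six reference pairs (1 = known association).
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    for fpr, tpr in roc_points(scores, labels):
        print(f"FPR={fpr:.2f}  sensitivity={tpr:.2f}")
    print("AUC (pairwise):", round(auc(scores, labels), 4))
    print("AUC (trapezoid):", round(trapezoid_area(roc_points(scores, labels)), 4))
```

For these six hypothetical pairs both computations give 8/9 (about 0.889). The trapezoid over the ROC points matches the pairwise definition because tied scores are grouped into a single threshold. A reported operating point such as ~80% sensitivity at ~10% false-positive rate corresponds to one threshold on such a curve.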

“The epidemiologists in general don’t believe the results of this study.”

In other words, they prefer to rely on observational rather than data-driven conclusions
