basic epidemiologic analysis

Basic epidemiologic analysis with Stata Biostatistics 212 Lecture 5

Upload: vuongdieu

Post on 04-Jan-2017


TRANSCRIPT

Page 1: Basic epidemiologic analysis

Basic epidemiologic analysis with Stata

Biostatistics 212

Lecture 5

Page 2: Basic epidemiologic analysis

Housekeeping

• Questions about Lab 4?
– Extra credit puzzle
• Lab 3 issues
– Make sure your do file executes
– Make sure your do file opens the dataset
• Final Project – by the last session you should:
– Have dataset imported into Stata
– Clean up the variables you will use
– Sketch out (paper and pencil) a table and a figure
– Be ready to write analysis do files

Page 3: Basic epidemiologic analysis

Today...

• What’s the difference between epidemiologic and statistical analysis?
• Interaction and confounding with 2 x 2’s
• Stata’s “Epitab” commands
• Adjusting for many things at once
• Logistic regression
• Testing for trends

Page 4: Basic epidemiologic analysis

Epi vs. Biostats

• Statistical analysis – Evaluating the role of chance
• Epidemiologic analysis – Analyzing and interpreting clinical research data in the context of scientific knowledge
– Directionality of causes
– Mediation vs. confounding
– Prediction vs. causal inference
– Clinical importance of effect size
– “Cost” of a type I and type II error

Page 5: Basic epidemiologic analysis

Epi vs. Biostats

• Epi
– Confounding, interaction, and causal diagrams.
– What to adjust for?
– What do the adjusted estimates mean?

[Figure: two causal diagrams relating variables A, B, and C]

Page 6: Basic epidemiologic analysis

2 x 2 Tables

• “Contingency tables” are the traditional analytic tool of the epidemiologist

                   Outcome
                   +      -
Exposure    +      a      b
            -      c      d

OR = (a/b) / (c/d) = ad/bc

RR = [a/(a+b)] / [c/(c+d)]
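The two formulas above can be checked with a few lines of code. The following is a minimal Python sketch (Python and the function names are choices made for this illustration, not part of the lecture). It uses the binge drinking and coronary calcium counts from the lecture's example, with the exposed row first.

```python
def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed (a/b) over the odds in the unexposed (c/d)."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in the exposed, a/(a+b), over risk in the unexposed, c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))


# Cell counts from the lecture's example:
# binge +: 106 with coronary calcium, 585 without
# binge -: 186 with coronary calcium, 2165 without
a, b, c, d = 106, 585, 186, 2165

print(f"OR = {odds_ratio(a, b, c, d):.2f}")
print(f"RR = {risk_ratio(a, b, c, d):.2f}")
```

Note that the OR is further from 1 than the RR, which is the usual pattern when the outcome is not rare.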

Page 7: Basic epidemiologic analysis

2 x 2 Tables

• Example

                      Coronary calcium
                        +       -     Total
Binge drinking   +     106     585      691
                 -     186    2165     2351
Total                  292    2750     3042

OR = 2.1 (1.6 – 2.7)

RR = 1.9 (1.6 – 2.4)
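The confidence intervals quoted for this example can be approximated by hand. The sketch below is one common large-sample approach, offered as an assumption rather than as the lecture's method: it builds a Wald interval on the log scale, using the Woolf standard error for log(OR) and the usual Taylor-series standard error for log(RR). Stata may report a different interval type for the odds ratio (for example an exact one), so small differences in later decimals are possible.

```python
import math


def log_scale_ci(estimate, se_log, z=1.96):
    """Wald interval computed on the log scale, then exponentiated."""
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)


# binge+/CAC+, binge+/CAC-, binge-/CAC+, binge-/CAC-
a, b, c, d = 106, 585, 186, 2165

# Odds ratio with the Woolf standard error of log(OR)
or_est = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_low, or_high = log_scale_ci(or_est, se_log_or)

# Risk ratio with the large-sample standard error of log(RR)
rr_est = (a / (a + b)) / (c / (c + d))
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
rr_low, rr_high = log_scale_ci(rr_est, se_log_rr)

print(f"OR = {or_est:.1f} ({or_low:.1f} - {or_high:.1f})")
print(f"RR = {rr_est:.1f} ({rr_low:.1f} - {rr_high:.1f})")
```

Rounded to one decimal, both intervals agree with the values shown on the slide.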

Page 8: Basic epidemiologic analysis

2 x 2 Tables


Can we say that binge drinking CAUSES atherosclerosis?

Page 9: Basic epidemiologic analysis

2 x 2 Tables

• There is a statistically significant association, but is it causal?

• Does male gender confound the association?

[Diagram: binge drinking and coronary calcium, with male gender linked to both]

Page 10: Basic epidemiologic analysis

2 x 2 Tables

• Men more likely to binge
– 34% of men, 14% of women
• Men have more coronary calcium
– 15% of men, 7% of women


Page 12: Basic epidemiologic analysis

2 x 2 Tables

• But what does confounding look like in a 2x2 table?
• And how do you adjust for it?
– Stratify
– Examine strata-specific estimates (for interaction)
– Combine estimates if appropriate (if no interaction)
   • Weighted average of strata-specific estimates

Page 13: Basic epidemiologic analysis

2 x 2 Tables

• First, stratify…

Overall
             CAC +   CAC -
Binge +       106     585
Binge -       186    2165
RR = 1.94 (1.55-2.42)

In men (binge 34%, CAC 15%)
             CAC +   CAC -
Binge +        89     374
Binge -       118     801
RR = 1.50 (1.16-1.93)

In women (binge 14%, CAC 7%)
             CAC +   CAC -
Binge +        17     211
Binge -        68    1364
RR = 1.57 (0.94-2.62)

Page 14: Basic epidemiologic analysis

2 x 2 Tables

• …compare strata-specific estimates…
• (they’re about the same)

In men (binge 34%, CAC 15%)
             CAC +   CAC -
Binge +        89     374
Binge -       118     801
RR = 1.50 (1.16-1.93)

In women (binge 14%, CAC 7%)
             CAC +   CAC -
Binge +        17     211
Binge -        68    1364
RR = 1.57 (0.94-2.62)

Page 15: Basic epidemiologic analysis

2 x 2 Tables

• …and then “combine” the estimates.

In men
             CAC +   CAC -
Binge +        89     374
Binge -       118     801
RR = 1.50 (1.16-1.93)

In women
             CAC +   CAC -
Binge +        17     211
Binge -        68    1364
RR = 1.57 (0.94-2.62)

RRadj = 1.51 (1.21-1.89)

Page 16: Basic epidemiologic analysis

Overall
             CAC +   CAC -
Binge +       106     585
Binge -       186    2165
RR = 1.94 (1.55-2.42)

In men (binge 34%, CAC 15%)
             CAC +   CAC -
Binge +        89     374
Binge -       118     801
RR = 1.50 (1.16-1.93)

In women (binge 14%, CAC 7%)
             CAC +   CAC -
Binge +        17     211
Binge -        68    1364
RR = 1.57 (0.94-2.62)

RRadj = 1.51 (1.21-1.89)

Page 17: Basic epidemiologic analysis

2 x 2 Tables

• How do we do this with Stata?
– Tabulate – output not exactly what we want.
– The “epitab” commands
   • Stata’s answer to stratified analyses

cs, cc
csi, cci
tabodds, mhodds

Page 18: Basic epidemiologic analysis

2 x 2 Tables

• Example – demo using Stata

cs cac binge
cs cac binge, by(male)

cs cac modalc
cs cac modalc, by(racegender)

cc cac binge

Page 19: Basic epidemiologic analysis

2 x 2 Tables

• Example of a crude association (unadjusted)

. cs cac binge

                 | Binge pattern [>5 drinks|
                 |      on occasion]      |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       106         186  |        292
        Noncases |       585        2165  |       2750
-----------------+------------------------+------------
           Total |       691        2351  |       3042
                 |                        |
            Risk |  .1534009    .0791153  |   .0959895
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0742856       |    .0452852     .103286
      Risk ratio |         1.938954       |    1.551487    2.423187
 Attr. frac. ex. |          .484258       |     .355457    .5873203
 Attr. frac. pop |         .1757923       |
                 +-------------------------------------------------
                               chi2(1) =    33.96  Pr>chi2 = 0.0000

Page 20: Basic epidemiologic analysis

2 x 2 Tables

• Example of Confounding

. cs cac binge, by(male)

            male |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               0 |  1.570175      .9402789   2.622042     9.339759
               1 |  1.497071      1.164201   1.925117     39.53256
-----------------+-------------------------------------------------
           Crude |  1.938954      1.551487   2.423187
    M-H combined |  1.511042      1.205656    1.89378
-------------------------------------------------------------------
Test of homogeneity (M-H)    chi2(1) =     0.027  Pr>chi2 = 0.8700
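The "M-H combined" row is a weighted average of the stratum-specific risk ratios. As an illustration, the Python sketch below applies the standard Mantel-Haenszel risk ratio formula to the stratum counts from the lecture's stratified tables. The function name and data layout are choices made for this example; it covers only the point estimate and weights, not the confidence interval or the homogeneity test.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel combined risk ratio and per-stratum weights.

    Each stratum is (a, b, c, d): exposed cases, exposed noncases,
    unexposed cases, unexposed noncases.
    """
    numerator = 0.0
    weights = []
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * (c + d) / n
        weights.append(c * (a + b) / n)  # this stratum's M-H weight
    return numerator / sum(weights), weights


# Stratum counts from the lecture's stratified 2x2 tables
women = (17, 211, 68, 1364)   # male == 0
men = (89, 374, 118, 801)     # male == 1

rr_mh, weights = mantel_haenszel_rr([women, men])
print(f"M-H weights: {weights[0]:.2f} (women), {weights[1]:.2f} (men)")
print(f"M-H combined RR = {rr_mh:.3f}")
```

Because each numerator term equals the stratum's RR times its weight, the result is the weighted average of 1.57 and 1.50, pulled toward the men's estimate because that stratum carries the larger weight.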

Page 21: Basic epidemiologic analysis

2 x 2 Tables

• Example of Effect Modification

. cs cac modalc, by(racegender)

      racegender |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
     Black women |    .75888      .3595892   1.601547     8.043758
     White women |  .8960739      .4971477    1.61511     11.07552
       Black men |  1.945668      1.114927     3.3954     8.304878
       White men |  .9279831        .66551   1.293974     29.45557
-----------------+-------------------------------------------------
           Crude |   1.30072      1.023022   1.653798
    M-H combined |  1.046446      .8225915   1.331218
-------------------------------------------------------------------
Test of homogeneity (M-H)    chi2(3) =     6.245  Pr>chi2 = 0.1003

Page 22: Basic epidemiologic analysis

2 x 2 Tables

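• Immediate commands – csi, cci
– No dataset required – just 2x2 cell frequencies
– Argument order: exposed cases, unexposed cases, exposed noncases, unexposed noncases (the rows of the cs output, not the a, b, c, d cell labels of the earlier 2 x 2 slide)

csi a b c d
csi 106 186 585 2165 (for cac binge)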
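The argument order is easy to get wrong, because csi reads the counts case row first (106 186, then 585 2165), whereas the earlier 2 x 2 slide lays the table out with the exposed row first (106 585, then 186 2165). The small Python helper below is a hypothetical convenience for this illustration, not a Stata feature. It reorders counts from the slide layout into the order used in the csi example.

```python
def csi_args(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Reorder counts from the slide's 2x2 layout into csi's argument order.

    Input follows the slide (exposed row, then unexposed row).
    Output follows csi: exposed cases, unexposed cases,
    exposed noncases, unexposed noncases.
    """
    return (exposed_cases, unexposed_cases, exposed_noncases, unexposed_noncases)


overall = csi_args(106, 585, 186, 2165)
men = csi_args(89, 374, 118, 801)

print("csi " + " ".join(str(n) for n in overall))
print("csi " + " ".join(str(n) for n in men))
```

The first printed line matches the csi example on the slide; the second gives the equivalent command for the men-only stratum.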

Page 23: Basic epidemiologic analysis

Multivariable adjustment

• Binge drinking appears to be associated with coronary calcium
– Association partially due to confounding by gender
• What about race? Age? SES? Smoking?

Page 24: Basic epidemiologic analysis

Multivariable adjustment: manual stratification

                                      # 2x2 tables
Crude association                            1
Adjust for gender                            2
Adjust for gender, race                      4
Adjust for gender, race, age                68
Adjust for “” + income, education          816
Adjust for “” + “” + smoking              2448
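The table counts grow multiplicatively: every new adjustment variable multiplies the number of strata by its number of categories. The Python sketch below reproduces the counts under assumed numbers of levels. Only gender and race (2 each) are evident from the lecture, and smoking has 3 levels in the later logistic output. The values for age (17) and for income and education combined (12) are inferred from the ratios of successive counts and are assumptions.

```python
# (variable, number of categories); age and income/education are assumed values
levels = [
    ("gender", 2),
    ("race", 2),
    ("age", 17),                    # assumption: 68 / 4
    ("income and education", 12),   # assumption: 816 / 68, combined categories
    ("smoking", 3),                 # 2448 / 816
]

counts = [1]  # the crude association needs a single 2x2 table
for _, k in levels:
    counts.append(counts[-1] * k)

print(f"{'Crude association':<28}{counts[0]:>6}")
for (name, _), n_tables in zip(levels, counts[1:]):
    label = "+ " + name
    print(f"{label:<28}{n_tables:>6}")
```

With about 3,000 participants spread over 2,448 strata, most tables would contain one person or none, which is why manual stratification stops being practical.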


Page 26: Basic epidemiologic analysis

Multivariable adjustment: cs command

• cs command
– Does manual stratification for you
   • Lists results from every stratum
   • Tests for overall homogeneity
   • Adjusted and crude results
– Demo cs cac binge, by(male black age)
– Can’t interpret interactions!


Page 28: Basic epidemiologic analysis

Multivariable adjustment: mhodds command

• mhodds allows you to look at specific interactions, adjusted for multiple covariates
– Does same stratification for you
– Adjusted results for each interaction variable
– P-value for specific interaction (homogeneity)
– Summary adjusted result
• Demo mhodds cac binge age, by(racegender)
• But strata get thin!

Page 29: Basic epidemiologic analysis

Multivariable adjustment: logistic command

• Assumes logit model
– Await biostats class for details!
– Coefficients estimated, no actual stratification
– Continuous variables used as they are

Page 30: Basic epidemiologic analysis

Multivariable adjustment: logistic command

Basic syntax:

logistic outcomevar [predictorvar1 predictorvar2 predictorvar3…]

Page 31: Basic epidemiologic analysis

Multivariable adjustment: logistic command

If using any categorical predictors:

logistic outcomevar [i.catvar var2…]

Creates “dummy variables” on the fly

If you forget, Stata won’t know they are categorical, and you’ll get the wrong answer!
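To see what "dummy variables" means here, the Python sketch below expands a three-level variable into 0/1 indicator columns, omitting a reference level. The smoking codes are made-up values for illustration and the function name is hypothetical. Without this expansion, a variable coded 0, 1, 2 enters the model as a single numeric term, which forces the same odds ratio for each one-unit step.

```python
def indicator_columns(values, reference):
    """Expand a categorical variable into 0/1 indicators, omitting the reference level."""
    levels = sorted(set(values))
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical smoking codes for six people (0, 1, 2 are category labels, not amounts)
smoke = [0, 2, 1, 0, 2, 1]

dummies = indicator_columns(smoke, reference=0)
for level, column in dummies.items():
    print(f"smoke == {level}: {column}")
```

Each indicator gets its own coefficient, so the model estimates a separate odds ratio for level 1 and for level 2, each compared with level 0.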

Page 32: Basic epidemiologic analysis

Multivariable adjustment: logistic command

Demo

logistic cac binge
logistic cac binge male
logistic cac binge male black
logistic cac binge male black age
logistic cac binge male black age i.smoke
logistic cac binge##i.racegender age i.smoke
logistic cac modalc##racegender

Page 33: Basic epidemiologic analysis

Multivariable adjustment: logistic command

Demo

. xi: logistic cac binge male black age i.smoke
i.smoke           _Ismoke_0-2         (naturally coded; _Ismoke_0 omitted)

Logistic regression                               Number of obs   =       3036
                                                  LR chi2(6)      =     211.95
                                                  Prob > chi2     =     0.0000
Log likelihood = -852.99988                       Pseudo R2       =     0.1105

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       binge |   1.387573   .1985355     2.29   0.022     1.048251    1.836736
        male |   3.253031   .4608839     8.33   0.000     2.464287    4.294226
       black |   .7282563   .0994953    -2.32   0.020     .5571756    .9518674
         age |    1.19833    .025771     8.41   0.000     1.148869     1.24992
   _Ismoke_1 |   1.357694   .2308651     1.80   0.072      .972886    1.894707
   _Ismoke_2 |   2.120925   .3302698     4.83   0.000     1.563063     2.87789
------------------------------------------------------------------------------
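The columns in this output are linked by simple arithmetic. The Python sketch below rebuilds the z statistic, p-value, and confidence interval for the binge row from its odds ratio and standard error alone. It relies on the standard error being reported on the odds ratio scale (the coefficient's standard error multiplied by the odds ratio), so dividing by the odds ratio recovers the log-odds scale.

```python
import math

# Values copied from the binge row of the logistic output
odds_ratio = 1.387573
se_odds_ratio = 0.1985355

# Convert back to the coefficient (log-odds) scale
coef = math.log(odds_ratio)
se_coef = se_odds_ratio / odds_ratio

z = coef / se_coef
ci_low = math.exp(coef - 1.959964 * se_coef)
ci_high = math.exp(coef + 1.959964 * se_coef)
# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p = {p_value:.3f}, 95% CI = {ci_low:.3f} to {ci_high:.3f}")
```

The interval is symmetric around the coefficient on the log scale, which is why it is not symmetric around the odds ratio of 1.39.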

Page 34: Basic epidemiologic analysis

logistic command interaction demo

. logistic cac modalc##racegender age i.smoke

Logistic regression                               Number of obs   =       2795
                                                  LR chi2(10)     =     186.28
                                                  Prob > chi2     =     0.0000
Log likelihood = -739.54359                       Pseudo R2       =     0.1119

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.modalc |   .6024889   .2430813    -1.26   0.209     .2732258    1.328546
             |
  racegender |
          2  |   1.018361   .3137632     0.06   0.953     .5567262    1.862783
          3  |   1.601149    .519393     1.45   0.147     .8478374    3.023786
          4  |   4.119486   1.100853     5.30   0.000     2.439922    6.955209
             |
     modalc# |
  racegender |
        1 2  |   1.422897   .7314808     0.69   0.493     .5195041    3.897247
        1 3  |   2.867897   1.473405     2.05   0.040     1.047736    7.850102
        1 4  |   1.546468   .7057105     0.96   0.339     .6322751    3.782472
             |
         age |   1.184036   .0271845     7.36   0.000     1.131937    1.238534
             |
       smoke |
          1  |   1.438413   .2623889     1.99   0.046      1.00603    2.056629
          2  |   2.464978   .4157232     5.35   0.000     1.771154    3.430597
------------------------------------------------------------------------------
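Reading an interaction model takes one extra step: the 1.modalc row is the odds ratio for modalc in the reference racegender group only, and each modalc#racegender row is a ratio of odds ratios. The Python sketch below multiplies them to get the modalc odds ratio within each group, adjusted for age and smoking. It gives point estimates only; confidence intervals for the combined terms would need the coefficient covariances, which the output does not show.

```python
# Odds ratios copied from the interaction model output
or_modalc_reference = 0.6024889  # 1.modalc: effect in racegender group 1
interaction_or = {2: 1.422897, 3: 2.867897, 4: 1.546468}  # modalc#racegender rows

# Main-effect and interaction coefficients add on the log-odds scale,
# so the corresponding odds ratios multiply.
or_by_group = {1: or_modalc_reference}
for group, ratio in interaction_or.items():
    or_by_group[group] = or_modalc_reference * ratio

for group, value in or_by_group.items():
    print(f"racegender {group}: OR for modalc = {value:.2f}")
```

If the racegender codes 1 to 4 follow the order of the labels in the stratified cs output (an assumption, since the coding is not shown), group 3 is Black men, the same stratum that stood out in the stratified analysis.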

Page 35: Basic epidemiologic analysis

Multivariable adjustment: logistic command

• Pros
– Provides all ORs in the model
– Accepted approach (mhodds rarely used by statisticians)
– Can deal with continuous variables (like age)
– Better estimation for large models?
• Cons
– Interaction testing more cumbersome, less automatic
– More assumptions
– Harder to test for trends

Page 36: Basic epidemiologic analysis

Multivariable adjustment

• The format for linear regression and other types of regression is the same as for logistic regression, except for the initial command:

regress outcomevar [predictorvar1 predictorvar2 predictorvar3…]

ologit outcomevar [predictorvar1 predictorvar2 predictorvar3…]

etc

Page 37: Basic epidemiologic analysis

Testing for trend

• Test of trend with tabodds

. tabodds cac alccat

--------------------------------------------------------------------------
     alccat |      cases     controls       odds      [95% Conf. Interval]
------------+-------------------------------------------------------------
          0 |        110         1325    0.08302       0.06835     0.10084
         <1 |         90          933    0.09646       0.07770     0.11976
      1-1.9 |         46          295    0.15593       0.11429     0.21275
         2+ |         45          193    0.23316       0.16856     0.32252
--------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3)  =    36.70
                                  Pr>chi2  =   0.0000

Score test for trend of odds:     chi2(1)  =    32.20
                                  Pr>chi2  =   0.0000
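The odds column and the trend statistic can both be recomputed from the case and control counts. The Python sketch below uses a standard score test for linear trend (a Cochran-Armitage type statistic), with category scores 0, 1, 2, 3. Those scores are an assumption, since the output shows only the category labels, and the formula is offered as an illustration rather than as a statement of Stata's internal computation.

```python
def trend_chi2(cases, controls, scores):
    """Score chi-square test (1 df) for a linear trend across ordered categories."""
    totals = [d + h for d, h in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)

    sum_x = sum(t * x for t, x in zip(totals, scores))
    sum_x2 = sum(t * x * x for t, x in zip(totals, scores))

    # Observed minus expected sum of scores among the cases
    u = sum(d * x for d, x in zip(cases, scores)) - n_cases * sum_x / n
    # Variance of u under the null hypothesis of no association
    v = n_cases * n_controls / (n * (n - 1)) * (sum_x2 - sum_x ** 2 / n)
    return u * u / v


cases = [110, 90, 46, 45]
controls = [1325, 933, 295, 193]
scores = [0, 1, 2, 3]  # assumed coding of alccat

odds = [d / h for d, h in zip(cases, controls)]
chi2 = trend_chi2(cases, controls, scores)

print("odds:", [round(o, 5) for o in odds])
print(f"trend chi2(1) = {chi2:.2f}")
```

Under these assumptions the statistic agrees with the reported score test to two decimals. It uses one degree of freedom because it tests a single slope across categories rather than any difference among them.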

Page 38: Basic epidemiologic analysis

Testing for trends: tabodds command

• Adjustment for multiple variables possible
– tabodds cac alccat, adjust(age male black)

Page 39: Basic epidemiologic analysis

Approaching your analysis

• Number of potential models/analyses is daunting
– Where do you start? How do you finish?
• My suggestion
– Explore
– Plan definitive analysis, make dummy tables/figures
– Do analysis (do/log files), fill in tables/figures
– Show to collaborators, reiterate prn
– Write paper

Page 40: Basic epidemiologic analysis

Summary

• Make sure you understand confounding and interaction with 2x2 tables in Stata
• Epitab commands are a great way to explore your data
– Emphasis on interaction
• Logistic regression is a more general approach, ubiquitous, but testing for interactions and trends is more difficult

Page 41: Basic epidemiologic analysis

In lab today…

• Lab 5
– Epi analysis of coronary calcium dataset
– Walks you through evaluation of confounding and interaction
• Judgment calls – often no right answer, just focus on reasoning.
• Reminder – put your answers as comments in the do file

* 15c – 15%, p<.001