
Posted on 04-Jan-2017


Basic epidemiologic analysis with Stata

Biostatistics 212

Lecture 5

Housekeeping

• Questions about Lab 4?
  – Extra credit puzzle
• Lab 3 issues
  – Make sure your do file executes
  – Make sure your do file opens the dataset
• Final Project – by the last session you should:
  – Have your dataset imported into Stata
  – Clean up the variables you will use
  – Sketch out (paper and pencil) a table and a figure
  – Be ready to write analysis do files

Today...

• What’s the difference between epidemiologic and statistical analysis?

• Interaction and confounding with 2 x 2’s
• Stata’s “Epitab” commands
• Adjusting for many things at once
• Logistic regression
• Testing for trends

Epi vs. Biostats

• Statistical analysis
  – Evaluating the role of chance
• Epidemiologic analysis
  – Analyzing and interpreting clinical research data in the context of scientific knowledge
  – Directionality of causes
  – Mediation vs. confounding
  – Prediction vs. causal inference
  – Clinical importance of effect size
  – “Cost” of a type I and type II error

Epi vs. Biostats

• Epi
  – Confounding, interaction, and causal diagrams
  – What to adjust for?
  – What do the adjusted estimates mean?

[Causal diagrams relating variables A, B, and C]

2 x 2 Tables

• “Contingency tables” are the traditional analytic tool of the epidemiologist

                  Outcome
                  +     -
Exposure    +     a     b
            -     c     d

OR = (a/b) / (c/d) = ad/bc

RR = [a/(a+b)] / [c/(c+d)]
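Here is a minimal Python sketch of the two formulas above. It uses the same cell layout (rows = exposure, columns = outcome). The function names are illustrative and do not come from Stata or any library. The counts are those of the binge drinking / coronary calcium example that follows.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a/b) / (c/d) = ad/bc, with rows = exposure (+/-)
    and columns = outcome (+/-)."""
    return (a * d) / (b * c)


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR = [a/(a+b)] / [c/(c+d)]: risk of the outcome in the exposed
    row divided by risk of the outcome in the unexposed row."""
    return (a / (a + b)) / (c / (c + d))


# Binge drinking (rows) by coronary calcium (columns)
a, b, c, d = 106, 585, 186, 2165
print(f"OR = {odds_ratio(a, b, c, d):.2f}")  # OR = 2.11
print(f"RR = {risk_ratio(a, b, c, d):.2f}")  # RR = 1.94
```

The OR exceeds the RR here, as it always does when the exposure raises the risk. The gap stays small only while the outcome is uncommon.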

2 x 2 Tables

• Example

                       Coronary calcium
                        +        -      Total
Binge drinking    +    106      585       691
                  -    186     2165      2351
Total                  292     2750      3042

OR = 2.1 (1.6 – 2.7)

RR = 1.9 (1.6 – 2.4)
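The intervals are usually computed on the log scale. The Python sketch below assumes the standard large-sample standard errors:

- SE[ln RR] = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))
- SE[ln OR] = sqrt(1/a + 1/b + 1/c + 1/d) (Woolf's method)

The RR interval matches the one `cs cac binge` reports later in this lecture. Stata's `cc` may use a different OR interval method by default, so its limits can differ slightly from Woolf's.

```python
import math

Z = 1.959964  # standard normal quantile for a two-sided 95% interval


def rr_with_ci(a, b, c, d, z=Z):
    """Risk ratio and 95% CI, computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def or_with_woolf_ci(a, b, c, d, z=Z):
    """Odds ratio and Woolf (log-scale) 95% CI."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, est * math.exp(-z * se), est * math.exp(z * se)


cells = (106, 585, 186, 2165)  # binge+/CAC+, binge+/CAC-, binge-/CAC+, binge-/CAC-
print("RR = %.2f (%.2f - %.2f)" % rr_with_ci(*cells))        # RR = 1.94 (1.55 - 2.42)
print("OR = %.2f (%.2f - %.2f)" % or_with_woolf_ci(*cells))  # OR = 2.11 (1.63 - 2.72)
```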


Can we say that binge drinking CAUSES atherosclerosis?

2 x 2 Tables

• There is a statistically significant association, but is it causal?

• Does male gender confound the association?

[Causal diagram: Binge drinking – Coronary calcium, with Male gender as a potential confounder]

2 x 2 Tables

• Men more likely to binge
  – 34% of men, 14% of women

• Men have more coronary calcium
  – 15% of men, 7% of women

2 x 2 Tables

• But what does confounding look like in a 2x2 table?

• And how do you adjust for it?
  – Stratify
  – Examine strata-specific estimates (for interaction)
  – Combine estimates if appropriate (if no interaction)
    • Weighted average of strata-specific estimates
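The "weighted average" can be made concrete with the Mantel-Haenszel estimator. It is one standard choice, and it is the one behind the "M-H combined" row in Stata's `cs` output later in this lecture. Take stratum i, with cells a_i, b_i, c_i, d_i in the exposure-by-outcome layout used above. Let n_{1i} = a_i + b_i be the exposed, n_{0i} = c_i + d_i the unexposed, and N_i = n_{1i} + n_{0i} the stratum total. Then:

```latex
RR_{MH} = \frac{\sum_i a_i \, n_{0i} / N_i}{\sum_i c_i \, n_{1i} / N_i}
        = \frac{\sum_i w_i \, RR_i}{\sum_i w_i},
\qquad
RR_i = \frac{a_i / n_{1i}}{c_i / n_{0i}},
\qquad
w_i = \frac{c_i \, n_{1i}}{N_i}
```

Because w_i RR_i = a_i n_{0i} / N_i, the two forms are equal. For the gender-stratified binge drinking tables, `cs` reports M-H weights of 39.53 (men) and 9.34 (women). Plugging these in gives (39.53 × 1.50 + 9.34 × 1.57) / (39.53 + 9.34) ≈ 1.51.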

2 x 2 Tables

• First, stratify…

All participants
                  CAC
                  +      -
Binge     +      106    585
          -      186   2165
RR = 1.94 (1.55-2.42)

In men (34% binge; 15% CAC)
                  CAC
                  +      -
Binge     +       89    374
          -      118    801
RR = 1.50 (1.16-1.93)

In women (14% binge; 7% CAC)
                  CAC
                  +      -
Binge     +       17    211
          -       68   1364
RR = 1.57 (0.94-2.62)

2 x 2 Tables

• …compare strata-specific estimates…

• (they’re about the same: RR = 1.50 in men, 1.57 in women)

2 x 2 Tables

• …and then “combine” the estimates.

In men: RR = 1.50 (1.16-1.93)      In women: RR = 1.57 (0.94-2.62)

RRadj = 1.51 (1.21-1.89)
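The Mantel-Haenszel pooled RR can be computed directly from the stratum cells. The minimal Python sketch below gives the point estimate only; the confidence interval needs a separate variance formula that is not shown here. The function name is illustrative. Each stratum is passed as (exposed cases, exposed noncases, unexposed cases, unexposed noncases), matching the stratified tables above.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio.

    strata: list of (a, b, c, d) tuples, one per stratum, where
    a = exposed cases, b = exposed noncases,
    c = unexposed cases, d = unexposed noncases.
    Returns the pooled RR and the per-stratum weights c * n1 / N.
    """
    numerator = 0.0
    weights = []
    for a, b, c, d in strata:
        n1, n0 = a + b, c + d      # exposed and unexposed totals
        n = n1 + n0                # stratum total
        numerator += a * n0 / n
        weights.append(c * n1 / n)
    return numerator / sum(weights), weights


men = (89, 374, 118, 801)
women = (17, 211, 68, 1364)
rr_mh, w = mantel_haenszel_rr([men, women])
print(f"RR_MH = {rr_mh:.2f}")                 # RR_MH = 1.51
print("weights:", [round(x, 2) for x in w])   # weights: [39.53, 9.34]
```

Men get about four times the weight of women, so the pooled estimate sits close to the men's RR of 1.50. It is well below the crude 1.94.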


2 x 2 Tables

• How do we do this with Stata?
  – Tabulate – output not exactly what we want
  – The “epitab” commands

• Stata’s answer to stratified analyses

cs, cc, csi, cci, tabodds, mhodds

2 x 2 Tables

• Example – demo using Stata

cs cac binge
cs cac binge, by(male)

cs cac modalc
cs cac modalc, by(racegender)

cc cac binge

2 x 2 Tables

• Example of a crude association (unadjusted)

. cs cac binge

                 | Binge pattern [>5 drinks|
                 |      on occasion]       |
                 |   Exposed   Unexposed   |      Total
-----------------+-------------------------+------------
           Cases |       106         186   |        292
        Noncases |       585        2165   |       2750
-----------------+-------------------------+------------
           Total |       691        2351   |       3042
                 |                         |
            Risk |  .1534009    .0791153   |   .0959895
                 |                         |
                 |      Point estimate     |    [95% Conf. Interval]
                 |-------------------------+------------------------
 Risk difference |         .0742856        |    .0452852     .103286
      Risk ratio |         1.938954        |    1.551487    2.423187
 Attr. frac. ex. |          .484258        |     .355457    .5873203
 Attr. frac. pop |         .1757923        |
                 +--------------------------------------------------
                               chi2(1) =    33.96  Pr>chi2 = 0.0000
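The point estimates in this output follow from the cell counts by simple arithmetic. The Python sketch below assumes the standard definitions:

- Attributable fraction among the exposed = (RR - 1)/RR.
- Population attributable fraction = (total risk - unexposed risk)/total risk.
- The chi-square is the Pearson statistic without continuity correction.

It reproduces the printed point estimates and chi-square but skips the confidence intervals, and it is not Stata's own code. The argument order follows the table (and `csi`): exposed cases, unexposed cases, exposed noncases, unexposed noncases.

```python
def cs_summary(exp_cases, unexp_cases, exp_noncases, unexp_noncases):
    """Point estimates from a crude cohort-style 2 x 2 analysis."""
    n_exp = exp_cases + exp_noncases
    n_unexp = unexp_cases + unexp_noncases
    n = n_exp + n_unexp
    cases = exp_cases + unexp_cases

    risk_exp = exp_cases / n_exp
    risk_unexp = unexp_cases / n_unexp
    risk_total = cases / n
    rr = risk_exp / risk_unexp

    # Pearson chi-square for a 2 x 2 table: N(ad - bc)^2 / product of margins
    ad_bc = exp_cases * unexp_noncases - unexp_cases * exp_noncases
    chi2 = n * ad_bc**2 / (n_exp * n_unexp * cases * (n - cases))

    return {
        "risk_difference": risk_exp - risk_unexp,
        "risk_ratio": rr,
        "attr_frac_exposed": (rr - 1) / rr,
        "attr_frac_population": (risk_total - risk_unexp) / risk_total,
        "chi2": chi2,
    }


for name, value in cs_summary(106, 186, 585, 2165).items():
    print(f"{name:>22}: {value:.4f}")
```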

2 x 2 Tables

• Example of Confounding

. cs cac binge, by(male)

            male |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               0 |   1.570175      .9402789   2.622042     9.339759
               1 |   1.497071      1.164201   1.925117     39.53256
-----------------+-------------------------------------------------
           Crude |   1.938954      1.551487   2.423187
    M-H combined |   1.511042      1.205656    1.89378
-------------------------------------------------------------------
Test of homogeneity (M-H)      chi2(1) =     0.027  Pr>chi2 = 0.8700

2 x 2 Tables

• Example of Effect Modification

. cs cac modalc, by(racegender)

      racegender |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
     Black women |     .75888      .3595892   1.601547     8.043758
     White women |   .8960739      .4971477    1.61511     11.07552
       Black men |   1.945668      1.114927     3.3954     8.304878
       White men |   .9279831        .66551   1.293974     29.45557
-----------------+-------------------------------------------------
           Crude |    1.30072      1.023022   1.653798
    M-H combined |   1.046446      .8225915   1.331218
-------------------------------------------------------------------
Test of homogeneity (M-H)      chi2(3) =     6.245  Pr>chi2 = 0.1003

2 x 2 Tables

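• Immediate commands – csi, cci
  – No dataset required – just the 2 x 2 cell frequencies

csi a b c d
csi 106 186 585 2165      (same table as cs cac binge)

Here a = exposed cases, b = unexposed cases, c = exposed noncases, and d = unexposed noncases. This is the order of the cs output table, not the a b / c d layout of the earlier 2 x 2 slides.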

Multivariable adjustment

• Binge drinking appears to be associated with coronary calcium
  – Association partially due to confounding by gender

• What about race? Age? SES? Smoking?

Multivariable adjustment: manual stratification

Analysis                                               # 2x2 tables
Crude association                                                 1
Adjust for gender                                                 2
Adjust for gender, race                                           4
Adjust for gender, race, age                                     68
Adjust for gender, race, age + income, education                816
Adjust for gender, race, age, income, education + smoking      2448
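The counts in this table grow multiplicatively. Each adjustment variable multiplies the number of strata by its number of categories. The slide does not give the per-variable category counts. The ones below are hypothetical, back-calculated only because they reproduce the table: gender 2, race 2, age 17 groups, income × education 12 combinations, smoking 3.

```python
def tables_needed(category_counts):
    """Cumulative number of 2 x 2 tables (strata) as each adjustment
    variable is added; each variable multiplies the count by its
    number of categories."""
    counts = [1]  # crude analysis: a single table
    for k in category_counts:
        counts.append(counts[-1] * k)
    return counts


# Hypothetical category counts (assumption, back-calculated from the slide)
counts = tables_needed([2, 2, 17, 12, 3])
print(counts)  # [1, 2, 4, 68, 816, 2448]

n_participants = 3042  # size of the binge drinking / CAC analysis above
print(f"{n_participants / counts[-1]:.1f} participants per table, on average")
# 1.2 participants per table, on average
```

With about one participant per table, most strata cannot contribute any information. This is why fully stratified adjustment breaks down long before the last row.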

Multivariable adjustment: cs command

• cs command
  – Does manual stratification for you
    • Lists results from every stratum
    • Tests for overall homogeneity
    • Adjusted and crude results
  – Demo: cs cac binge, by(male black age)
  – Can’t interpret interactions! (The homogeneity test pools all strata, so it does not say which variable modifies the effect.)

Multivariable adjustment: mhodds command

• mhodds allows you to look at specific interactions, adjusted for multiple covariates
  – Does the same stratification for you
  – Adjusted results within each level of the interaction (by) variable
  – P-value for the specific interaction (homogeneity)
  – Summary adjusted result

• Demo: mhodds cac binge age, by(racegender)
• But strata get thin!

Multivariable adjustment: logistic command

• Assumes a logit model
  – Await biostats class for details!
  – Coefficients estimated, no actual stratification
  – Continuous variables used as they are

Multivariable adjustment: logistic command

Basic syntax:

logistic outcomevar [predictorvar1 predictorvar2 predictorvar3…]

Multivariable adjustment: logistic command

If using any categorical predictors:

logistic outcomevar [i.catvar var2…]

The i. prefix creates “dummy variables” on the fly.

If you forget, Stata won’t know the variable is categorical. It will fit a single linear term for it, and you’ll get the wrong answer!
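The Python sketch below mimics what `i.` does. A variable with k categories becomes k - 1 indicator (0/1) columns, and the lowest category is the omitted reference. This mirrors the `_Ismoke_1` and `_Ismoke_2` terms in the `xi:` output that follows. Without `i.`, the model gets a single smoke term, which forces the log odds to change by the same amount from 0 to 1 as from 1 to 2. The function name and the example data are hypothetical; only the column naming imitates xi's `_I<var>_<level>` style.

```python
def indicator_columns(name, values):
    """Mimic i.var: one 0/1 column per non-reference category.
    The lowest category is the omitted reference group."""
    levels = sorted(set(values))
    non_reference = levels[1:]  # levels[0] is the omitted reference
    return {
        f"_I{name}_{level}": [1 if v == level else 0 for v in values]
        for level in non_reference
    }


smoke = [0, 2, 1, 0, 2]  # hypothetical values coded 0, 1, 2 like smoke
for column, indicator in indicator_columns("smoke", smoke).items():
    print(column, indicator)
# _Ismoke_1 [0, 0, 1, 0, 0]
# _Ismoke_2 [0, 1, 0, 0, 1]
```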

Multivariable adjustment: logistic command

Demo

logistic cac binge
logistic cac binge male
logistic cac binge male black
logistic cac binge male black age
logistic cac binge male black age i.smoke
logistic cac binge##i.racegender age i.smoke
logistic cac modalc##racegender

Multivariable adjustment: logistic command

Demo

. xi: logistic cac binge male black age i.smoke
i.smoke           _Ismoke_0-2         (naturally coded; _Ismoke_0 omitted)

Logistic regression                               Number of obs   =       3036
                                                  LR chi2(6)      =     211.95
                                                  Prob > chi2     =     0.0000
Log likelihood = -852.99988                       Pseudo R2       =     0.1105

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       binge |   1.387573   .1985355     2.29   0.022     1.048251    1.836736
        male |   3.253031   .4608839     8.33   0.000     2.464287    4.294226
       black |   .7282563   .0994953    -2.32   0.020     .5571756    .9518674
         age |    1.19833    .025771     8.41   0.000     1.148869     1.24992
   _Ismoke_1 |   1.357694   .2308651     1.80   0.072      .972886    1.894707
   _Ismoke_2 |   2.120925   .3302698     4.83   0.000     1.563063     2.87789
------------------------------------------------------------------------------
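`logistic` fits the model on the log-odds scale and reports exponentiated coefficients. This check assumes the standard relationships:

- The coefficient is b = ln OR.
- The printed Std. Err. is the delta-method value OR × SE(b).
- z = b / SE(b).
- The interval is exp(b ± 1.96 SE(b)).

Under those assumptions, a short Python check recovers the binge row from its OR and Std. Err.

```python
import math

Z = 1.959964  # two-sided 95% standard normal quantile


def coefficient_from_or(odds_ratio, se_or):
    """Back out the log-odds coefficient b and SE(b) from a reported OR
    and its delta-method standard error, SE(OR) = OR * SE(b)."""
    b = math.log(odds_ratio)
    return b, se_or / odds_ratio


# binge row of the xi: logistic output above
b, se_b = coefficient_from_or(1.387573, 0.1985355)
z = b / se_b
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
lower, upper = math.exp(b - Z * se_b), math.exp(b + Z * se_b)
print(f"z = {z:.2f}, P>|z| = {p:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
# z = 2.29, P>|z| = 0.022, 95% CI 1.048 to 1.837
```

The interval is symmetric around b, not around the OR. That is why the printed limits sit unevenly on either side of 1.39.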

logistic command interaction demo

. logistic cac modalc##racegender age i.smoke

Logistic regression                               Number of obs   =       2795
                                                  LR chi2(10)     =     186.28
                                                  Prob > chi2     =     0.0000
Log likelihood = -739.54359                       Pseudo R2       =     0.1119

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.modalc |   .6024889   .2430813    -1.26   0.209     .2732258    1.328546
             |
  racegender |
          2  |   1.018361   .3137632     0.06   0.953     .5567262    1.862783
          3  |   1.601149    .519393     1.45   0.147     .8478374    3.023786
          4  |   4.119486   1.100853     5.30   0.000     2.439922    6.955209
             |
      modalc#|
  racegender |
        1 2  |   1.422897   .7314808     0.69   0.493     .5195041    3.897247
        1 3  |   2.867897   1.473405     2.05   0.040     1.047736    7.850102
        1 4  |   1.546468   .7057105     0.96   0.339     .6322751    3.782472
             |
         age |   1.184036   .0271845     7.36   0.000     1.131937    1.238534
             |
       smoke |
          1  |   1.438413   .2623889     1.99   0.046      1.00603    2.056629
          2  |   2.464978   .4157232     5.35   0.000     1.771154    3.430597
------------------------------------------------------------------------------
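With an interaction term, the OR for modalc is no longer a single number. In racegender level k, it is the main-effect OR multiplied by the matching interaction OR, because the coefficients add on the log scale.

The sketch below uses the printed ORs. It assumes level 1 is the reference category, as the output implies. It does not compute confidence intervals. In Stata, `lincom` with the `or` option (for example, `lincom 1.modalc + 1.modalc#3.racegender, or`) should give the same estimate together with a CI.

```python
main_or = 0.6024889  # 1.modalc: OR for modalc in racegender level 1 (reference)
interaction_or = {2: 1.422897, 3: 2.867897, 4: 1.546468}  # 1.modalc#k.racegender

stratum_or = {1: main_or}
for level, ior in interaction_or.items():
    # log-odds coefficients add, so the odds ratios multiply
    stratum_or[level] = main_or * ior

for level, value in stratum_or.items():
    print(f"racegender = {level}: OR for modalc = {value:.2f}")
# racegender = 1: OR for modalc = 0.60
# racegender = 2: OR for modalc = 0.86
# racegender = 3: OR for modalc = 1.73
# racegender = 4: OR for modalc = 0.93
```

Suppose the codes follow the order of the `cs` output above (1 = Black women through 4 = White men). That mapping is an assumption. Level 3 would then be Black men, the stratum that also stood out with `cs`.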

Multivariable adjustment: logistic command

• Pros
  – Provides all ORs in the model
  – Accepted approach (mhodds rarely used by statisticians)
  – Can deal with continuous variables (like age)
  – Better estimation for large models?

• Cons
  – Interaction testing more cumbersome, less automatic
  – More assumptions
  – Harder to test for trends

Multivariable adjustment

• The format for linear regression and other types of regression is the same as for logistic regression, except for the initial command:

regress outcomevar [predictorvar1 predictorvar2 predictorvar3…]

ologit outcomevar [predictorvar1 predictorvar2 predictorvar3…]

etc.

Testing for trend

• Test of trend with tabodds

. tabodds cac alccat

--------------------------------------------------------------------------
     alccat |     cases    controls       odds      [95% Conf. Interval]
------------+-------------------------------------------------------------
          0 |       110        1325    0.08302     0.06835    0.10084
         <1 |        90         933    0.09646     0.07770    0.11976
      1-1.9 |        46         295    0.15593     0.11429    0.21275
         2+ |        45         193    0.23316     0.16856    0.32252
--------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3) = 36.70  Pr>chi2 = 0.0000

Score test for trend of odds:     chi2(1) = 32.20  Pr>chi2 = 0.0000
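The score test for trend can be sketched in Python. The sketch assumes the four alccat categories are scored with equally spaced values (0, 1, 2, 3). `tabodds` scores categories by the numeric codes of the exposure variable, and these are assumed here to be equally spaced; any equally spaced scores give the same statistic.

With the counts above, this version (hypergeometric variance, with N - 1 in the denominator) reproduces chi2(1) = 32.20. It is a sketch of the standard test, not Stata's source code.

```python
import math


def score_test_for_trend(cases, controls, scores):
    """Score chi-square test (1 df) for a linear trend in log odds.

    cases, controls: counts in each ordered exposure category
    scores: numeric score assigned to each category
    Returns (chi2, two-sided p-value).
    """
    totals = [d + h for d, h in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)

    sum_nx = sum(t * x for t, x in zip(totals, scores))
    sum_nx2 = sum(t * x * x for t, x in zip(totals, scores))
    sum_dx = sum(d * x for d, x in zip(cases, scores))

    # Observed minus expected total score among cases, and its variance
    u = sum_dx - n_cases * sum_nx / n
    v = n_cases * n_controls * (n * sum_nx2 - sum_nx**2) / (n**2 * (n - 1))

    chi2 = u**2 / v
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


cases = [110, 90, 46, 45]         # alccat: 0, <1, 1-1.9, 2+
controls = [1325, 933, 295, 193]
chi2, p = score_test_for_trend(cases, controls, scores=[0, 1, 2, 3])
print(f"chi2(1) = {chi2:.2f}")  # chi2(1) = 32.20
print(f"p = {p:.1e}")
```

The 3-df homogeneity test asks whether the odds differ at all across categories. The 1-df trend test spends all its power on a steady increase, and here it captures most of the 36.70.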

Testing for trends: tabodds command

• Adjustment for multiple variables possible
  – tabodds cac alccat, adjust(age male black)

Approaching your analysis

• Number of potential models/analyses is daunting
  – Where do you start? How do you finish?

• My suggestion
  – Explore
  – Plan definitive analysis, make dummy tables/figures
  – Do analysis (do/log files), fill in tables/figures
  – Show to collaborators, revise and repeat as needed
  – Write paper

Summary

• Make sure you understand confounding and interaction with 2x2 tables in Stata

• Epitab commands are a great way to explore your data
  – Emphasis on interaction

• Logistic regression is a more general approach, ubiquitous, but testing for interactions and trends is more difficult

In lab today…

• Lab 5
  – Epi analysis of coronary calcium dataset
  – Walks you through evaluation of confounding and interaction
• Judgment calls – often no right answer, just focus on reasoning.
• Reminder – put your answers as comments in the do file, e.g.:

* 15c – 15%, p<.001
