What is Interaction for A Binary Outcome?
Chun Li
Department of Biostatistics
Center for Human Genetics Research
September 19, 2007
What We Have Learned
• Little.
• Generic.
• In linear regression: y = β0 + β1x1 + β2x2 + β3x1x2
• In any other regression model, the right-hand side is the same: β0 + β1x1 + β2x2 + β3x1x2
• For a binary outcome, we often use logistic regression. For example, for the log-odds of cancer risk:
log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking
(β1×sex and β2×smoking are the “main effect” terms; β3×sex×smoking is the “interaction effect” term.)
Interaction
• Introduced by R. A. Fisher to generalize the concept “epistasis” in genetics.
• The concept is ubiquitous.
• The word sounds easy to understand, and is charismatic in some circles.
• Ambiguous without model context.
• Hard to interpret and translate to reality for some models, such as logistic regression.
Epistasis
• Example: Genotype BB masks the effect of gene A.
• It is a very special type of interaction.
• Such a phenomenon can be seen in other contexts, e.g. gene-environment interaction.
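[Figure: two grids, one of gene A genotype (aa, Aa, AA) by gene B genotype (bb, Bb, BB), and one of gene A genotype (aa, Aa, AA) by exposure (No, Yes); cell contents not recoverable.]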
“No Interaction” ≠ Independence
• Interaction is about the joint effect of input variables on an outcome, or how the effect of one input variable changes as the values of the other input variables change.
• Independence is about the statistical relationship between input variables, irrespective of the outcome or the effect on the outcome.
• Using “independent effect” to describe “no interaction” may be confusing.
Interaction = Effect Modification
• Effect modification: The effect of one variable on the outcome is modified depending on the values of other variables.
• It depends on how “effect” is measured and on what scale. ― Kenneth Rothman, Sander Greenland
• For a binary outcome, “effect” can be measured as
– risk difference
– risk ratio
– odds ratio
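The three measures can be computed directly from two risks. A minimal Python sketch, assuming risks strictly between 0 and 1; the function names and the example risks (0.10 unexposed, 0.20 exposed) are hypothetical and chosen only for illustration:

```python
def risk_difference(p0: float, p1: float) -> float:
    """Risk in the exposed (p1) minus risk in the unexposed (p0)."""
    return p1 - p0


def risk_ratio(p0: float, p1: float) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return p1 / p0


def odds(p: float) -> float:
    """Odds corresponding to a risk p, assumed to lie in (0, 1)."""
    return p / (1.0 - p)


def odds_ratio(p0: float, p1: float) -> float:
    """Odds in the exposed divided by odds in the unexposed."""
    return odds(p1) / odds(p0)


if __name__ == "__main__":
    p0, p1 = 0.10, 0.20  # hypothetical risks, for illustration only
    print(f"RD = {risk_difference(p0, p1):.3f}")  # RD = 0.100
    print(f"RR = {risk_ratio(p0, p1):.3f}")       # RR = 2.000
    print(f"OR = {odds_ratio(p0, p1):.3f}")       # OR = 2.250
```

The same pair of risks gives three different numbers, so “the effect” is only defined once a scale is chosen.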
Measuring Effect: Risk Difference
                 Smoking
                 No (0)    Yes (1)    Marginal
Male (0)         R00       R01        R0•
Female (1)       R10       R11        R1•
Marginal         R•0       R•1
“Effect” of smoking: R01 – R00 (in males); R11 – R10 (in females)
If gender doesn’t modify the “effect” of smoking, then R01 – R00 = R11 – R10 = R•1 – R•0 (!)
Equivalent:
R11 – R00 = (R10 – R00) + (R01 – R00) = (R1• – R0•) + (R•1 – R•0)
RR11 – 1 = (RR10 – 1) + (RR01 – 1), where RRij = Rij / R00
Additive decomposition of risk: Rij = ai + bj
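A small numerical check of this slide, as a Python sketch. It builds a hypothetical additive table Rij = ai + bj and computes the interaction contrast (R11 – R10) – (R01 – R00), which is zero exactly when gender does not modify the risk difference. The marginal risks assume the same share of females among smokers and non-smokers (sex independent of smoking); that assumption, the share of 0.3, and all risk values are illustrative, not from the talk.

```python
def interaction_contrast(R):
    """(R11 - R10) - (R01 - R00) for risks R[sex][smoking]; 0 = no additive interaction."""
    return (R[1][1] - R[1][0]) - (R[0][1] - R[0][0])


def marginal_risks_by_smoking(R, w_female):
    """[R.0, R.1]: risks averaged over sex, assuming the female share w_female
    is the same among non-smokers and smokers."""
    return [(1 - w_female) * R[0][j] + w_female * R[1][j] for j in (0, 1)]


# Hypothetical additive table R_ij = a_i + b_j (i = sex, j = smoking).
a = (0.05, 0.10)  # male, female
b = (0.00, 0.15)  # non-smoker, smoker
R = [[a[i] + b[j] for j in (0, 1)] for i in (0, 1)]

ic = interaction_contrast(R)
r_dot0, r_dot1 = marginal_risks_by_smoking(R, w_female=0.3)
print(f"interaction contrast: {ic:.3f}")
print(f"RD in males: {R[0][1] - R[0][0]:.3f}, RD in females: {R[1][1] - R[1][0]:.3f}")
print(f"marginal RD: {r_dot1 - r_dot0:.3f}")
```

Under the stated independence assumption the marginal risk difference equals the gender-specific one (0.15 here); if the female share differed between smokers and non-smokers, it generally would not.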
Measuring Effect: Risk Ratio
                 Smoking
                 No (0)    Yes (1)    Marginal
Male (0)         R00       R01        R0•
Female (1)       R10       R11        R1•
Marginal         R•0       R•1
“Effect” of smoking: R01 / R00 (in males); R11 / R10 (in females)
If gender doesn’t modify the “effect” of smoking, then R01 / R00 = R11 / R10 = R•1 / R•0 (!)
Equivalent:
RR11 = RR10 × RR01 = (R1• / R0•) × (R•1 / R•0)
Multiplicative decomposition of risk: Rij = ci × dj
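The same check on the ratio scale, as a Python sketch with hypothetical multiplicative factors ci and dj. The ratio of risk ratios (R11 / R10) / (R01 / R00) equals 1 exactly when gender does not modify the risk ratio; the marginal calculation again assumes sex is independent of smoking, and all numbers are illustrative.

```python
def ratio_of_risk_ratios(R):
    """(R11 / R10) / (R01 / R00) for risks R[sex][smoking]; 1 = no multiplicative interaction."""
    return (R[1][1] / R[1][0]) / (R[0][1] / R[0][0])


# Hypothetical multiplicative table R_ij = c_i * d_j (i = sex, j = smoking).
c = (0.05, 0.10)  # male, female
d = (1.0, 3.0)    # non-smoker, smoker
R = [[c[i] * d[j] for j in (0, 1)] for i in (0, 1)]

w_female = 0.3  # assumed female share, equal among non-smokers and smokers
r_dot = [(1 - w_female) * R[0][j] + w_female * R[1][j] for j in (0, 1)]

rr10, rr01, rr11 = R[1][0] / R[0][0], R[0][1] / R[0][0], R[1][1] / R[0][0]
print(f"ratio of risk ratios: {ratio_of_risk_ratios(R):.3f}")  # 1.000
print(f"RR11 = {rr11:.3f}, RR10 x RR01 = {rr10 * rr01:.3f}")   # 6.000, 6.000
print(f"marginal RR of smoking: {r_dot[1] / r_dot[0]:.3f}")    # 3.000
```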
Measuring Effect: Odds Ratio
                 Smoking
                 No (0)    Yes (1)    Marginal
Male (0)         O00       O01        O0•
Female (1)       O10       O11        O1•
Marginal         O•0       O•1
O** = R** / (1 – R**) for each cell and margin
“Effect” of smoking: O01 / O00 (in males); O11 / O10 (in females)
If gender doesn’t modify the “effect” of smoking, then O01 / O00 = O11 / O10 ≠ O•1 / O•0 in general (?!?)
Equivalent:
OR11 = OR10 × OR01, where ORij = Oij / O00
Additive decomposition of log-odds ln(Oij)
Even if gender doesn’t modify the effect of smoking, smoking’s marginal effect may differ from its gender-specific effect!?!
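The “(?!?)” can be reproduced numerically. In this Python sketch the risks are hypothetical, and sex is assumed to be independent of smoking (half of both smokers and non-smokers are female), so there is no confounding. The odds ratio of smoking is 9 in each gender, yet the marginal odds ratio is not 9, while the risk difference (0.4) is the same in each gender and marginally.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio(p0: float, p1: float) -> float:
    return odds(p1) / odds(p0)


# Hypothetical risks R[sex][smoking]: sex 0 = male, 1 = female; smoking 0 = no, 1 = yes.
R = [[0.1, 0.5],
     [0.5, 0.9]]
w_female = 0.5  # assumed share of females among both smokers and non-smokers

or_male = odds_ratio(R[0][0], R[0][1])
or_female = odds_ratio(R[1][0], R[1][1])
marginal = [(1 - w_female) * R[0][j] + w_female * R[1][j] for j in (0, 1)]
or_marginal = odds_ratio(marginal[0], marginal[1])

print(f"OR in males:   {or_male:.2f}")      # 9.00
print(f"OR in females: {or_female:.2f}")    # 9.00
print(f"marginal OR:   {or_marginal:.2f}")  # 5.44
```

The gap is not due to confounding here; it reflects the non-collapsibility of the odds ratio.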
[Figure: curves of constant odds ratio, OR = [p2 / (1 – p2)] / [p1 / (1 – p1)], for OR = 1/10, 1/5, 1/2, 2, 5 and 10, plotted against p1 (0 to 1) in four panels: p2; the risk difference p2 – p1; the risk ratio p2 / p1; and log p2 – log p1.]
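The curves in the figure follow from solving the odds-ratio formula for p2: p2 = OR × p1 / (1 – p1 + OR × p1). A Python sketch (the function name is ours, for illustration) tabulating, for a fixed OR of 10, how the risk difference and risk ratio change with p1:

```python
def p2_given_or(p1: float, odds_ratio: float) -> float:
    """Risk p2 such that [p2 / (1 - p2)] / [p1 / (1 - p1)] equals odds_ratio."""
    odds2 = odds_ratio * p1 / (1.0 - p1)
    return odds2 / (1.0 + odds2)


if __name__ == "__main__":
    OR = 10.0
    print("   p1      p2   p2-p1   p2/p1")
    for p1 in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
        p2 = p2_given_or(p1, OR)
        print(f"{p1:5.2f}  {p2:6.3f}  {p2 - p1:6.3f}  {p2 / p1:6.2f}")
```

With the odds ratio held fixed, the risk ratio approaches 10 as p1 approaches 0 and approaches 1 as p1 approaches 1, while the risk difference is near 0 at both ends.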
Interaction = Effect Modification Measure
• “No interaction” under one definition often means interaction under another definition.
• Results from interaction analysis should always be reported together with the scale that was used to measure effect.
• Some effect measures are intuitive; others are not intuitive and not even intrinsically consistent.
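To make the first point concrete: in this hypothetical table (a Python sketch; the numbers are illustrative), smoking doubles the risk in both genders, so there is no interaction on the risk-ratio scale, but there is interaction on the risk-difference and odds-ratio scales.

```python
def interaction_on_three_scales(R):
    """Interaction measures for risks R[sex][smoking].
    Returns (additive contrast, ratio of risk ratios, ratio of odds ratios);
    'no interaction' means 0, 1 and 1, respectively."""
    O = [[p / (1.0 - p) for p in row] for row in R]
    additive = (R[1][1] - R[1][0]) - (R[0][1] - R[0][0])
    rr_ratio = (R[1][1] / R[1][0]) / (R[0][1] / R[0][0])
    or_ratio = (O[1][1] / O[1][0]) / (O[0][1] / O[0][0])
    return additive, rr_ratio, or_ratio


R = [[0.1, 0.2],   # male: non-smoker, smoker (hypothetical)
     [0.3, 0.6]]   # female: non-smoker, smoker (hypothetical)
additive, rr_ratio, or_ratio = interaction_on_three_scales(R)
print(f"risk difference: {additive:.3f}")  # 0.200
print(f"risk ratio:      {rr_ratio:.3f}")  # 1.000
print(f"odds ratio:      {or_ratio:.3f}")  # 1.556
```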
Biologic Interaction
• Biologic interaction = biologically causal interaction.
• Greenland and Rothman argued that “biologic interaction” is reflected by departure from additive risks.
– Counterfactual arguments
– Causal pie arguments
• The additive definition is difficult to test directly in case-control studies.
Advantages of Logistic Regression
• For retrospective studies (e.g., case-control studies), the risk difference and risk ratio cannot be estimated or analyzed. But the odds ratio can!
• The odds ratio has no boundary effect, but the risk difference and risk ratio do:
– Because risks are bounded by 0 and 1, an interaction effect must exist under some circumstances.
– The bounds may also cause computational problems.
• Odds ratio ≈ risk ratio, when risks are very small.
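Why the odds ratio survives case-control sampling, in a Python sketch with hypothetical cohort counts: keeping all cases but only a small fraction of non-cases changes the apparent risk ratio but leaves the odds ratio unchanged. With risks of 1% to 2%, the cohort odds ratio is also close to the risk ratio. The counts and sampling fractions are assumptions for illustration.

```python
def rr_and_or(table):
    """Risk ratio and odds ratio from {'exposed': (cases, noncases), 'unexposed': (...)}."""
    a, b = table["exposed"]
    c, d = table["unexposed"]
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    return rr, odds_ratio


def case_control_sample(table, case_frac, control_frac):
    """Expected counts after keeping case_frac of cases and control_frac of non-cases."""
    return {group: (cases * case_frac, noncases * control_frac)
            for group, (cases, noncases) in table.items()}


cohort = {"exposed": (200, 9800), "unexposed": (100, 9900)}  # hypothetical cohort
study = case_control_sample(cohort, case_frac=1.0, control_frac=0.02)

cohort_rr, cohort_or = rr_and_or(cohort)
study_rr, study_or = rr_and_or(study)
print(f"cohort: RR = {cohort_rr:.3f}, OR = {cohort_or:.3f}")          # RR = 2.000, OR = 2.020
print(f"case-control: 'RR' = {study_rr:.3f}, OR = {study_or:.3f}")  # 'RR' = 1.505, OR = 2.020
```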
Misconception 1
Interaction terms are treated the same way as main-effect terms:
– Numerical comparison between an interaction coefficient and a main-effect coefficient.
– (logistic regression) Power to detect interaction when “interaction explains half of the total effect.”
– (logistic regression) “Odds ratio” of the interaction.
– Fact: They are apples and oranges.
Misconception Reinforced by Software
• Stata output:

. logistic case v1 v2 v12

Logistic regression                               Number of obs   =       1530
                                                  LR chi2(3)      =      12.93
                                                  Prob > chi2     =     0.0048
Log likelihood = -878.77373                       Pseudo R2       =     0.0073

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v1 |    1.52674   .8978875     0.72   0.472     .4821329     4.83463
          v2 |   .7779552   .4651644    -0.42   0.675     .2409871    2.511397
         v12 |   1.004005   .3277949     0.01   0.990     .5294554    1.903893
------------------------------------------------------------------------------
Interaction in Logistic Regression
μij = log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking
                 Smoking
                 No (0)    Yes (1)
Male (0)         O00       O01
Female (1)       O10       O11
μ00 = β0
μ01 = β0 + β2
μ10 = β0 + β1
μ11 = β0 + β1 + β2 + β3
Coefficient β                               exp(β)
β1 = μ10 – μ00                              O10 / O00                    (baseline OR)
β2 = μ01 – μ00                              O01 / O00                    (baseline OR)
β3 = (μ11 – μ10) – (μ01 – μ00)              (O11 / O10) / (O01 / O00)    (ratio of odds ratios)
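Because the model has four coefficients for four cells, the coefficients can be solved for exactly from the table of odds. A Python sketch with hypothetical odds, checking that exp(β3) is a ratio of odds ratios rather than an odds ratio, and that the odds ratio of smoking among females is exp(β2 + β3):

```python
import math


def saturated_logit_coefficients(O):
    """(b0, b1, b2, b3) of log(O_ij) = b0 + b1*sex + b2*smoking + b3*sex*smoking,
    solved exactly from a 2 x 2 table of odds O[sex][smoking]."""
    mu = [[math.log(o) for o in row] for row in O]
    b0 = mu[0][0]
    b1 = mu[1][0] - mu[0][0]
    b2 = mu[0][1] - mu[0][0]
    b3 = (mu[1][1] - mu[1][0]) - (mu[0][1] - mu[0][0])
    return b0, b1, b2, b3


O = [[0.2, 0.5],   # male: non-smoker, smoker (hypothetical odds)
     [0.1, 0.4]]   # female: non-smoker, smoker (hypothetical odds)
b0, b1, b2, b3 = saturated_logit_coefficients(O)
print(f"exp(b1) = {math.exp(b1):.2f}  (O10 / O00, baseline OR of sex)")
print(f"exp(b2) = {math.exp(b2):.2f}  (O01 / O00, baseline OR of smoking)")
print(f"exp(b3) = {math.exp(b3):.2f}  (ratio of odds ratios)")
print(f"exp(b2 + b3) = {math.exp(b2 + b3):.2f}  (O11 / O10, OR of smoking in females)")
```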
Misconception 2
Interpreting main-effect terms when interaction terms are included in the model:
– Evaluating the statistical significance of a “main effect”.
– Fact: A main-effect term should always be included in the model as long as the variable is involved in some interaction term.
– Interpreting a main-effect coefficient as the magnitude of the “main effect” or “marginal effect”.
– Fact: The main-effect coefficient of variable X represents its “baseline effect” when all variables “interacting” with X are zero (i.e., at baseline).
– Its interpretation depends on how the other variables are coded (i.e., where the baselines are).
Significance of a Main-Effect Term in Logistic Regression
μij = log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking
                 Smoking
                 No (0)    Yes (1)
Male (0)         O00       O01
Female (1)       O10       O11
μ00 = β0
μ01 = β0 + β2
μ10 = β0 + β1
μ11 = β0 + β1 + β2 + β3
Statistical significance of a term ≡ whether it can be removed.
What would happen if β2 = 0?
This means something different when sex is coded differently.
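A Python sketch of the coding issue, with hypothetical log-odds in which smoking has no effect in males but quadruples the odds in females. With males as the baseline, β2 = 0; with females as the baseline, β2 = log 4 and β3 changes sign. The cells are identical; only the coding differs.

```python
import math


def coefficients(mu):
    """(b0, b1, b2, b3) from log-odds mu[sex][smoking], where row 0 is the sex coded 0."""
    b0 = mu[0][0]
    b1 = mu[1][0] - mu[0][0]
    b2 = mu[0][1] - mu[0][0]
    b3 = (mu[1][1] - mu[1][0]) - (mu[0][1] - mu[0][0])
    return b0, b1, b2, b3


# Hypothetical log-odds; columns are (non-smoker, smoker).
male = [math.log(0.2), math.log(0.2)]
female = [math.log(0.1), math.log(0.4)]

male_baseline = coefficients([male, female])    # sex: male = 0, female = 1
female_baseline = coefficients([female, male])  # sex: female = 0, male = 1
print(f"male = 0:   b2 = {male_baseline[2]:.3f}, b3 = {male_baseline[3]:.3f}")      # 0.000, 1.386
print(f"female = 0: b2 = {female_baseline[2]:.3f}, b3 = {female_baseline[3]:.3f}")  # 1.386, -1.386
```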
One Input Variable is Continuous
Y = β0 + β1G + β2X + β3G×X
G = 0 (group A): YA = β0 + β2X
G = 1 (group B): YB = (β0 + β1) + (β2 + β3)X
[Figure: Y plotted against X, with one line for group A and one for group B.]
β1 = YB – YA when X = 0; β1 = 0 → same Y when X = 0 (often extrapolative and meaningless)
β2 = slope for group A; β2 = 0 → group A is flat
β3 = difference in slopes (B – A); β3 = 0 → equal slopes
β1 and β2 are not marginal effects.
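One consequence is that β1 depends on where X = 0 lies. The Python sketch below, with hypothetical coefficients and a hypothetical centering value c = 40, re-expresses the same two lines after replacing X by X – c: β1 becomes β1 + β3c (the group difference at X = c), β0 becomes β0 + β2c, and the slopes are unchanged.

```python
def predict(beta, g, x):
    """Y = b0 + b1*G + b2*X + b3*G*X."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * g + b2 * x + b3 * g * x


def recenter(beta, c):
    """Coefficients describing the same two lines after replacing X with X - c."""
    b0, b1, b2, b3 = beta
    return (b0 + b2 * c, b1 + b3 * c, b2, b3)


beta = (1.0, -2.0, 0.5, 0.3)  # hypothetical (b0, b1, b2, b3)
c = 40.0                      # hypothetical centering value for X
beta_c = recenter(beta, c)
print("original:", beta)      # b1 = -2.0: group B lower than A at X = 0
print("centered:", beta_c)    # b1 = 10.0: group B higher than A at X = 40

# Both parameterizations give identical predictions.
for g in (0, 1):
    for x in (0.0, 20.0, 40.0, 60.0):
        assert abs(predict(beta, g, x) - predict(beta_c, g, x - c)) < 1e-9
```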
Misconception 3
• If a set of variables/genes, together with all possible combinations among them (i.e., allowing full interactions), significantly predicts the outcome, then we have found interaction among these variables.
• Fact: Interaction is about departure from additive effects. The variables may just have additive effects without interaction.
Do We Want Generic Interaction?
A gene is identified to metabolize a carcinogen. Allele A is the putative susceptibility allele.
Goal: Is the risk elevated for those who have carcinogen exposure and carry the risk allele?
Data from Piegorsch et al. (1994):
                 Carcinogen exposure
                 No (#case/#control)    Yes (#case/#control)
aa               14/30                  12/34
Aa               8/20                   19/19
AA               9/18                   18/19
Generic interaction: H0: 4 parameters; Ha: 6 parameters; DF = 2, p = 0.19
Odds ratios relative to aa without exposure (− marks the reference cell):
                 Carcinogen
                 No        Yes
aa               −         0.76
Aa               0.86      2.14
AA               1.07      2.03
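The odds ratios shown on this slide can be recomputed from the counts: each cell's odds (#case/#control) divided by the odds in the reference cell (aa, no exposure). A Python sketch:

```python
# Counts (#case, #control) from the Piegorsch et al. (1994) table: (genotype, exposure) -> counts.
COUNTS = {
    ("aa", "No"): (14, 30), ("aa", "Yes"): (12, 34),
    ("Aa", "No"): (8, 20),  ("Aa", "Yes"): (19, 19),
    ("AA", "No"): (9, 18),  ("AA", "Yes"): (18, 19),
}


def odds_ratios(counts, reference=("aa", "No")):
    """Odds ratio of each non-reference cell relative to the reference cell."""
    ref_cases, ref_controls = counts[reference]
    ref_odds = ref_cases / ref_controls
    return {cell: (cases / controls) / ref_odds
            for cell, (cases, controls) in counts.items() if cell != reference}


for (genotype, exposure), value in odds_ratios(COUNTS).items():
    print(f"{genotype:>2} {exposure:>3}: {value:.2f}")
```

The printed values (0.76, 0.86, 2.14, 1.07, 2.03) match the table.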
Do We Want Generic Interaction?
Odds ratios are relative to the reference group, formed by pooling all cells marked −.
Approach 2: H0: 2 groups; Ha: 4 groups; DF = 2, p = 0.037
                 Carcinogen
                 No        Yes
aa               −         0.77
Aa               −         2.19
AA               −         2.08
Approach 3: H0: 1 group; Ha: 3 groups; DF = 2, p = 0.017
                 Carcinogen
                 No        Yes
aa               −         −
Aa               −         2.37
AA               −         2.25
Approach 4: H0: 1 group; Ha: 2 groups; DF = 1, p = 0.0043
                 Carcinogen
                 No        Yes
aa               −         −
Aa               −         2.31
AA               −         2.31
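Approaches 2 to 4 each compare two groupings of the six cells, with one case probability per group, so the likelihood-ratio statistics have closed forms. A Python sketch (standard library only) that reproduces the reported p-values; the COUNTS dictionary repeats the table from the previous slide so the block runs on its own, and the chi-square tail uses the closed forms for 1 and 2 degrees of freedom only. The generic 2-DF test on the previous slide is not reproduced here, because its null model (additive on the log-odds scale) needs an iterative fit.

```python
import math

# (genotype, exposure) -> (#case, #control), from the previous slide.
COUNTS = {
    ("aa", "No"): (14, 30), ("aa", "Yes"): (12, 34),
    ("Aa", "No"): (8, 20),  ("Aa", "Yes"): (19, 19),
    ("AA", "No"): (9, 18),  ("AA", "Yes"): (18, 19),
}
ALL = list(COUNTS)
UNEXPOSED = [cell for cell in ALL if cell[1] == "No"]
EXPOSED = [cell for cell in ALL if cell[1] == "Yes"]


def log_lik(groups):
    """Binomial log-likelihood when the cells in each group share one case probability
    (binomial coefficients omitted; they cancel in the likelihood ratio)."""
    total = 0.0
    for cells in groups:
        cases = sum(COUNTS[cell][0] for cell in cells)
        controls = sum(COUNTS[cell][1] for cell in cells)
        n = cases + controls
        total += cases * math.log(cases / n) + controls * math.log(controls / n)
    return total


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or 2 is implemented")


def lr_test(null_groups, alt_groups):
    stat = 2 * (log_lik(alt_groups) - log_lik(null_groups))
    df = len(alt_groups) - len(null_groups)
    return stat, df, chi2_sf(stat, df)


carriers_exposed = [("Aa", "Yes"), ("AA", "Yes")]
rest = [cell for cell in ALL if cell not in carriers_exposed]
approaches = {
    "Approach 2": ([UNEXPOSED, EXPOSED], [UNEXPOSED] + [[cell] for cell in EXPOSED]),
    "Approach 3": ([ALL], [rest] + [[cell] for cell in carriers_exposed]),
    "Approach 4": ([ALL], [rest, carriers_exposed]),
}
results = {name: lr_test(h0, ha) for name, (h0, ha) in approaches.items()}
for name, (stat, df, p) in results.items():
    print(f"{name}: LR = {stat:.2f}, DF = {df}, p = {p:.4f}")
```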
Testing for Interaction While Adjusting for Other Covariates
μage,ij = log(Oage,ij) = β0 + β4×age + β1×sex + β2×smoking + β3×sex×smoking
                 Smoking
                 No (0)        Yes (1)
Male (0)         Oage,00       Oage,01
Female (1)       Oage,10       Oage,11
μage,00 = (β0 + β4×age)
μage,01 = (β0 + β4×age) + β2
μage,10 = (β0 + β4×age) + β1
μage,11 = (β0 + β4×age) + β1 + β2 + β3
We are testing for interaction under the assumption that the effects of sex, smoking, and sex×smoking are the same over the whole range of the covariates.
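The assumption can be seen directly in the model: the age term cancels from the interaction contrast, so the same β3 applies at every age. A Python sketch with hypothetical coefficients:

```python
def log_odds(beta, age, sex, smoking):
    """mu = b0 + b4*age + b1*sex + b2*smoking + b3*sex*smoking."""
    b0, b1, b2, b3, b4 = beta
    return b0 + b4 * age + b1 * sex + b2 * smoking + b3 * sex * smoking


def interaction_at_age(beta, age):
    """(mu_11 - mu_10) - (mu_01 - mu_00) evaluated at a given age."""
    def mu(sex, smoking):
        return log_odds(beta, age, sex, smoking)
    return (mu(1, 1) - mu(1, 0)) - (mu(0, 1) - mu(0, 0))


beta = (-3.0, 0.2, 0.7, 0.4, 0.03)  # hypothetical (b0, b1, b2, b3, b4)
for age in (30, 50, 70):
    print(age, round(interaction_at_age(beta, age), 6))  # 0.4 at every age
```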