Additive and multiplicative models (AS08)

EPM304 Advanced Statistical Methods in Epidemiology

Course: PG Diploma/ MSc Epidemiology

This document contains a copy of the study material located within the computer assisted learning (CAL) session. If you have any questions regarding this document or your course, please contact DLsupport via [email protected] . Important note: this document does not replace the CAL material found on your module CDROM. When studying this session, please ensure you work through the CDROM material first. This document can then be used for revision purposes to refer back to specific sessions. These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale or further copying.

© London School of Hygiene & Tropical Medicine September 2013 v2.0


    Section 1: Additive and multiplicative models

    Aim

    To consider how the effects of two categorical variables combine, in terms of their effect on the outcome variable.

    To learn how additive models may sometimes be used as an alternative to multiplicative models with interaction terms.

    Objectives

    By the end of this session you will be able to:

    - describe the difference between an additive model and a multiplicative model
    - obtain an additive model for modelling risks or rates, from cohort and cross-sectional study designs
    - compare the fit to the data of an additive model with the fit to the data of a multiplicative model
    - think about the statistical and biological considerations for choosing between an additive and a multiplicative model

    This session should take you between 1 h 15 min and 2 hours to complete.

    Section 2: Planning your study

    The models we have fitted in previous sessions have all been multiplicative. This means that model parameters represent ratios (e.g. rate ratios, odds ratios) and that the effects of two factors are assumed to combine multiplicatively. This assumption seems to work well for many risk factors in epidemiology, and we can always fit interaction terms when it is not appropriate. In this session, we see how additive models may sometimes be used as an alternative to multiplicative models with interaction terms. With additive models, we assess the effects of explanatory variables with risk or rate differences, rather than with risk, rate or odds ratios.

    To work through this session you should know about regression models, specifically logistic and Poisson regression. You should also know about the strategy for building these models. If you need to review any material before you continue, refer to the appropriate sessions below:

    - Framework for regression models: AS01
    - Logistic regression: SM07, SM08, SM09
    - Poisson regression: SM11, AS05

    2.1: Planning your study

    To illustrate methods in this session, we will use data from the Whitehall cohort study.

    Whitehall Study: a cohort study of risk factors for mortality in men employed in Whitehall, London. The study was set up to examine risk factors for mortality in male government employees (civil servants) working around Whitehall, London. Employees were recruited between 1967 and 1970. During this period, information on exposure to selected risk factors was obtained by a self-administered questionnaire and a screening examination. All participants were followed up through the National Health Service Central Registry to identify mortality and emigration. Information on death (date and cause) was provided for those who died. The results used in this session are from a 10% random sample of the total dataset.

    Section 3: Background: Multiplicative models

    The models you have fitted previously all take the general form:

    $$\log(y) = \alpha + \sum_i \beta_i x_i = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots$$

    where y is a measure of disease occurrence, the $x_i$ are the explanatory variables (or categories within them), and $\alpha$ and the $\beta_i$ (i = 1, ..., I) are the regression parameters, which have to be estimated from the data.

    Question: In each of the following models, what is the disease outcome measure, and in what form is it modelled: (a) a Poisson regression model? (b) a logistic regression model?

    Answer: For Poisson regression the outcome measure y is the rate $\lambda$, but remember that it is $\log(y)$ that is modelled. For logistic regression the outcome measure y is the odds $\pi/(1-\pi)$, but again it is $\log(y)$ that is modelled.

    Question: What form can the explanatory (independent) variables $x_i$ take?

    Answer: The explanatory variables can be categorical variables, quantitative variables, or interactions between the explanatory variables.

    3.1: Background: Multiplicative models

    Because the left-hand side of the equation is on the log scale, these models are all multiplicative models. To illustrate this, consider the simple model with two dichotomous variables x1 and x2, which represent two binary (0 or 1) exposures E1 and E2:

    $$\log(y) = \alpha + \beta_1 x_1 + \beta_2 x_2$$

    If we want the model equation on the original scale of y (that is, rates or odds), we need to exponentiate both sides of the equation. Using the laws of logarithms, we have

    $$y = \exp(\alpha)\,\exp(\beta_1 x_1)\,\exp(\beta_2 x_2)$$

    Then the rates (or odds) for the four combinations of exposure are:

    Exposure   E1-                E1+
    E2-        exp(α)             exp(α) exp(β1)
    E2+        exp(α) exp(β2)     exp(α) exp(β1) exp(β2)

    so the effects of the two exposures multiply together. Note that exp(β1) is a rate ratio for the effect of E1 if the outcome y is a rate, and an odds ratio for the effect of E1 if the outcome y is an odds. The same holds for exp(β2).

    Each cell is obtained by substituting the exposure values into the model:

    - E1-, E2-: there is no exposure to E1 or E2, so x1 = 0 and x2 = 0, and $y = \exp(\alpha)\exp(\beta_1 \times 0)\exp(\beta_2 \times 0) = \exp(\alpha)\exp(0)\exp(0) = \exp(\alpha)$.

    - E1+, E2-: there is exposure to E1 but not to E2, so x1 = 1 and x2 = 0, and $y = \exp(\alpha)\exp(\beta_1)\exp(\beta_2 \times 0) = \exp(\alpha)\exp(\beta_1)\exp(0) = \exp(\alpha)\exp(\beta_1)$.
    - E1-, E2+: there is exposure to E2 but not to E1, so x1 = 0 and x2 = 1, and $y = \exp(\alpha)\exp(\beta_1 \times 0)\exp(\beta_2) = \exp(\alpha)\exp(0)\exp(\beta_2) = \exp(\alpha)\exp(\beta_2)$.
    - E1+, E2+: there is exposure to both E1 and E2, so x1 = 1 and x2 = 1, and $y = \exp(\alpha)\exp(\beta_1 \times 1)\exp(\beta_2 \times 1) = \exp(\alpha)\exp(\beta_1)\exp(\beta_2)$.

    3.2: Background: Multiplicative models

    So:

    - The effects of the exposures on log(y) combine additively; that is, the model is additive on the log scale.
    - But on the original scale of y, the effects of the exposures combine multiplicatively, so the model is multiplicative on the original scale.

    When the effects of explanatory variables combine multiplicatively on the original scale, we say that the model is a multiplicative model.

    3.3: Background: Multiplicative models

    Sometimes the effects of explanatory variables do not combine in this way. We can model the effects of categorical variables that do not combine multiplicatively by incorporating interaction terms. The interaction term serves to "correct" the model so that it fits the data better. Fitting interaction terms is how you have dealt with non-multiplicative effects so far.

    Question: Why might this be a problem? Might it be preferable to avoid interaction terms?

    Answer: Including interaction terms in a model adds parameters, and for ease of interpretation and presentation we would prefer to avoid interaction terms if possible. We now consider alternative models that have an additive structure. Such models may sometimes provide a better fit to the data without needing interaction terms.
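The four-cell table for the multiplicative model can be checked numerically. The minimal Python sketch below uses hypothetical parameter values (a baseline of 2, a ratio of 3 for E1 and a ratio of 1.5 for E2, chosen only for illustration). It computes each cell as exp(α + β1x1 + β2x2) and confirms that the joint ratio relative to baseline is the product of the two single-exposure ratios.

```python
import math


def multiplicative_cells(alpha, beta1, beta2):
    """Rates (or odds) for the four exposure combinations under
    log(y) = alpha + beta1*x1 + beta2*x2, keyed by (x1, x2)."""
    return {
        (x1, x2): math.exp(alpha + beta1 * x1 + beta2 * x2)
        for x1 in (0, 1)
        for x2 in (0, 1)
    }


# Hypothetical parameter values, for illustration only.
cells = multiplicative_cells(alpha=math.log(2.0), beta1=math.log(3.0), beta2=math.log(1.5))
baseline = cells[(0, 0)]                  # exp(alpha)
ratio_e1 = cells[(1, 0)] / baseline       # exp(beta1): rate (or odds) ratio for E1
ratio_e2 = cells[(0, 1)] / baseline       # exp(beta2): rate (or odds) ratio for E2
joint_ratio = cells[(1, 1)] / baseline    # exp(beta1) * exp(beta2)
print(round(ratio_e1, 6), round(ratio_e2, 6), round(joint_ratio, 6))  # 3.0 1.5 4.5
```

The joint ratio of 4.5 equals 3 × 1.5. With no interaction term, this is what "combining multiplicatively" means.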
    Section 4: Additive models

    The simplest alternative to the multiplicative model is the additive model. The general form of an additive model is:

    $$y = \alpha + \sum_i \beta_i x_i$$

    This model is additive on the original scale of y, hence the term additive model: the outcome y is modelled on its original scale rather than on the log scale. This type of model is suitable for modelling rates and risks.

    Rates: In cohort studies, we have person-time data and y is the rate $\lambda$:

    $$\lambda = \alpha + \sum_i \beta_i x_i$$

    This is the additive rate model. The parameters $\beta_i$ represent rate differences, whereas in the multiplicative Poisson model they represent log(rate ratios).

    Risks: For binary data, we are modelling the proportion with a given response, and y is the proportion $\pi$:

    $$\pi = \alpha + \sum_i \beta_i x_i$$

    This is the additive risk model. The parameters $\beta_i$ represent risk differences, whereas in logistic regression, which is a multiplicative model, they represent log(odds ratios).

    Note: This model involves neither logs nor odds, although it is often termed "logistic regression with additive risks". A better term would be binomial regression. Note also that this model is for risks (not odds), so it may not be used for case-control studies.

    4.1: Additive models

    The additive model for two dichotomous variables x1 and x2, which represent two binary (0 or 1) exposures E1 and E2, has the form:

    $$y = \alpha + \beta_1 x_1 + \beta_2 x_2$$

    Under this model, the rates (or risks) for the four combinations of exposure are shown in the table below, followed by the multiplicative format for comparison.
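As with the multiplicative model, the additive cells can be generated directly. This minimal Python sketch uses hypothetical values α = 2, β1 = 4 and β2 = 1, read as rates per 1000 person-years and chosen only for illustration. It shows that the differences from baseline add up, and that the corresponding ratios then do not multiply.

```python
def additive_cells(alpha, beta1, beta2):
    """Rates (or risks) for the four exposure combinations under
    y = alpha + beta1*x1 + beta2*x2, keyed by (x1, x2)."""
    return {
        (x1, x2): alpha + beta1 * x1 + beta2 * x2
        for x1 in (0, 1)
        for x2 in (0, 1)
    }


# Hypothetical values (rates per 1000 person-years), for illustration only.
cells = additive_cells(alpha=2.0, beta1=4.0, beta2=1.0)
baseline = cells[(0, 0)]

# Differences from baseline: these add up.
diff_e1 = cells[(1, 0)] - baseline      # beta1: rate difference for E1
diff_e2 = cells[(0, 1)] - baseline      # beta2: rate difference for E2
joint_diff = cells[(1, 1)] - baseline   # beta1 + beta2
print(diff_e1, diff_e2, joint_diff)     # 4.0 1.0 5.0

# Ratios to baseline: under an additive model these do not multiply.
ratio_e1 = cells[(1, 0)] / baseline
ratio_e2 = cells[(0, 1)] / baseline
joint_ratio = cells[(1, 1)] / baseline
print(ratio_e1 * ratio_e2, joint_ratio)  # 4.5 3.5
```

The single-exposure ratios (3 and 1.5) are the same as in a multiplicative model with a product of 4.5. Here, however, the joint ratio is only 3.5: an additive structure is "less than multiplicative".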

    Additive model

    Exposure   E1-          E1+
    E2-        α            α + β1
    E2+        α + β2       α + β1 + β2

    Notice how the effect due to E1 (+β1) and the effect due to E2 (+β2) add to give the combined effect (+β1 + β2). Notice also that the effect of E1, as measured by β1, is a rate (or risk) difference, and the effect of E2, as measured by β2, is also a rate (or risk) difference. In contrast, in a multiplicative model β1 and β2 were each (i) a log(odds ratio), if the outcome y was the odds of the outcome, or (ii) a log(rate ratio), if y was the rate of the outcome. Since the outcome is modelled on its original scale, there is no need to exponentiate the coefficients.

    Each cell is obtained by substituting the exposure values into $y = \alpha + \beta_1 x_1 + \beta_2 x_2$:

    - E1-, E2-: no exposure to E1 or E2, so x1 = 0 and x2 = 0, and $y = \alpha + (\beta_1 \times 0) + (\beta_2 \times 0) = \alpha$.
    - E1+, E2-: exposure to E1 but not to E2, so x1 = 1 and x2 = 0, and $y = \alpha + (\beta_1 \times 1) + (\beta_2 \times 0) = \alpha + \beta_1$.
    - E1-, E2+: exposure to E2 but not to E1, so x1 = 0 and x2 = 1, and $y = \alpha + (\beta_1 \times 0) + (\beta_2 \times 1) = \alpha + \beta_2$.
    - E1+, E2+: exposure to both E1 and E2, so x1 = 1 and x2 = 1, and $y = \alpha + (\beta_1 \times 1) + (\beta_2 \times 1) = \alpha + \beta_1 + \beta_2$.

    For comparison, the multiplicative model:

    Exposure   E1-                E1+
    E2-        exp(α)             exp(α) exp(β1)
    E2+        exp(α) exp(β2)     exp(α) exp(β1) exp(β2)

    - E1-, E2-: no exposure to E1 or E2, so x1 = 0 and x2 = 0, and $y = \exp(\alpha)\exp(\beta_1 x_1)\exp(\beta_2 x_2) = \exp(\alpha)\exp(0)\exp(0) = \exp(\alpha)$.
    - E1+, E2-: exposure to E1 but not to E2, so x1 = 1 and x2 = 0, and $y = \exp(\alpha)\exp(\beta_1)\exp(0) = \exp(\alpha)\exp(\beta_1)$. The effect of x1 is measured as (rate if x1 = 1)/(rate if x1 = 0) = $\exp(\alpha)\exp(\beta_1)/\exp(\alpha) = \exp(\beta_1)$, so exp(β1) is a rate (or odds) ratio.

    - E1-, E2+: exposure to E2 but not to E1, so x1 = 0 and x2 = 1, and $y = \exp(\alpha)\exp(0)\exp(\beta_2) = \exp(\alpha)\exp(\beta_2)$.
    - E1+, E2+: exposure to both E1 and E2, so x1 = 1 and x2 = 1, and $y = \exp(\alpha)\exp(\beta_1)\exp(\beta_2)$.

    4.2: Additive models

    Comparison between a multiplicative and an additive model

    In the following parts you will compare a multiplicative model with an additive model. You will examine the strength of evidence for interaction terms in each type of model, and compare the models in terms of whether or not they need interaction terms to give an adequate fit to the data.

    To illustrate this, consider the Whitehall study of risk factors for ischaemic heart disease (IHD) mortality. Smoking was thought to increase mortality, and mortality also increases with age. The two exposures were categorised as follows:

    - Smoking: non/ex-smokers, current smokers
    - Age: 50 to 64 years, 65 to 74 years

    The outcome, y, is the IHD mortality rate (per 1000 person-years). In a multiplicative Poisson model, log(y) = log(rate) is modelled. In an additive rate model, y = rate is modelled.

    4.3: Additive models

    The observed data

    The observed IHD mortality rates per 1000 person-years (deaths/person-years) are given below. The IHD mortality rate is higher in smokers than in non/ex-smokers (in both age groups), and higher in those aged 65-74 years than in those aged 50-64 years (for both smokers and non/ex-smokers).

                            Age group (x2)
    Smoking (x1)            50-64 years (=0)     65-74 years (=1)
    Non/ex-smoker (=0)      2.76 (240/86863)     8.78 (243/27692)
    Current smoker (=1)     5.60 (376/67131)     12.32 (293/23786)

    Question: The table below gives the rate ratios and rate differences for the effect of age group, separately for non/ex-smokers and smokers. On the basis of these, do you think the effects of smoking and age combine multiplicatively or additively?

    Rate ratios and rate differences for the effect of age, by smoking status:

                                          Non/ex-smoker   Smoker
    Rate ratio for age                    3.18            2.20
    Rate difference for age (per 1000)    6.02            6.72
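To check these figures, here is a minimal Python sketch (an illustration, not part of the course materials). It recomputes the rates per 1000 person-years, the rate ratios and the rate differences from the deaths and person-years. It assumes 67131 person-years for current smokers aged 50-64, the denominator consistent with 376 deaths and the stated rate of 5.60. Working from unrounded rates, the age rate difference in non/ex-smokers comes out as 6.01; the 6.02 shown is the difference of the rounded rates 8.78 and 2.76.

```python
# Deaths and person-years from the Whitehall IHD table, keyed by (smoking x1, age x2).
# Assumption: 67131 person-years for current smokers aged 50-64 (see lead-in).
data = {
    (0, 0): (240, 86863),
    (0, 1): (243, 27692),
    (1, 0): (376, 67131),
    (1, 1): (293, 23786),
}

# Rates per 1000 person-years.
rate = {cell: 1000 * deaths / pyrs for cell, (deaths, pyrs) in data.items()}

# Effect of age (65-74 vs 50-64) within each smoking group.
for smoking, label in ((0, "non/ex-smokers"), (1, "smokers")):
    young, old = rate[(smoking, 0)], rate[(smoking, 1)]
    print(f"Age effect in {label}: RR = {old / young:.2f}, RD = {old - young:.2f}")

# Effect of smoking (current vs non/ex) within each age group.
for age, label in ((0, "50-64"), (1, "65-74")):
    non, current = rate[(0, age)], rate[(1, age)]
    print(f"Smoking effect at {label}: RR = {current / non:.2f}, RD = {current - non:.2f}")
```

Keying the cells by (x1, x2) mirrors the 0/1 coding used in the regression models that follow.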

    Answer: The rate differences for the effect of age are similar for non/ex-smokers and smokers (6.02 compared with 6.72). The rate ratios for age are less similar (3.18 for non/ex-smokers compared with 2.20 for smokers). This suggests that the effects of smoking and age might combine additively but perhaps not multiplicatively, in which case an interaction term might be needed if we fit a multiplicative model.

    We could form a similar table to summarise the effect of smoking, separately for the age groups 50-64 years and 65-74 years. If we did this, we would find that the rate ratio for smoking is 2.03 in those aged 50-64 years and 1.40 in those aged 65-74 years. The rate difference between smokers and non/ex-smokers is 2.84 in those aged 50-64 years, compared with 3.54 in those aged 65-74 years.

    4.4: Additive models

    Below are the two multiplicative Poisson models, with and without the interaction between age and smoking. Note: the output is given on the log scale, so the constant is a log rate and the other coefficients are log rate ratios.

    Multiplicative Poisson model with smoking and age, and the interaction between them

                   Coefficient   Standard error       z    P > |z|   95% confidence interval
    Smoking             0.7066           0.0826    8.55    < 0.001    0.545 to 0.869
    Age                 1.1556           0.0910   12.70    < 0.001    0.977 to 1.334
    Smoking.Age        -0.3674           0.1198   -3.07      0.002   -0.602 to -0.133
    Constant            1.0163           0.0645   15.74    < 0.001    0.890 to 1.143

    Log likelihood = -1225.701

    Multiplicative Poisson model with smoking and age, without interaction

                   Coefficient   Standard error       z    P > |z|   95% confidence interval
    Smoking             0.5345           0.0597    8.95    < 0.001    0.417 to 0.652
    Age                 0.9426           0.0591   15.95    < 0.001    0.827 to 1.058
    Constant            1.1178           0.0527   21.21    < 0.001    1.014 to 1.221

    Log likelihood = -1230.411

    Question: In the model without interaction, what is the rate ratio for smoking, to 2 decimal places? In the model with interaction, what is the rate ratio for the additional joint effect of smoking and age, to 2 decimal places?

    Answer (smoking): The rate ratio is the exponential of the coefficient for smoking, because the coefficient is the log rate ratio: RR = exp(0.5345) = 1.71.
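The rate ratios in these answers are the exponentials of the coefficients, and exponentiating the confidence limits gives a confidence interval for the rate ratio. Below is a minimal Python sketch. It assumes a Wald interval of coefficient ± 1.96 × standard error, which reproduces the limits in the output above to rounding. It also uses the interaction coefficient -0.3674: its confidence interval (-0.602 to -0.133) confirms the negative sign.

```python
import math


def rate_ratio(coef, se, z=1.96):
    """Turn a log rate ratio and its standard error into a rate ratio
    with a Wald-type 95% confidence interval (limits exponentiated)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Coefficients and standard errors copied from the multiplicative Poisson models.
print("Smoking (no interaction): RR = %.2f (%.2f to %.2f)" % rate_ratio(0.5345, 0.0597))
print("Smoking.Age interaction:  RR = %.2f (%.2f to %.2f)" % rate_ratio(-0.3674, 0.1198))
```

This gives a rate ratio for smoking of about 1.71 (1.52 to 1.92) and an interaction rate ratio of about 0.69 (0.55 to 0.88).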

    Answer (Smoking.Age): The rate ratio is the exponential of the coefficient for the smoking.age interaction term, because the coefficient is a log rate ratio: RR = exp(-0.3674) = 0.69.

    4.5: Additive models

    The likelihood ratio test for interaction gives P = 0.002. Is there evidence of interaction when the data are analysed using a multiplicative model?

    Answer: Yes, there is strong evidence of interaction. The interaction must therefore be included in the multiplicative model to obtain a good fit to the data.
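The likelihood ratio test compares the log likelihoods of the two nested models (-1230.411 without and -1225.701 with the interaction). The statistic is twice their difference, referred to a chi-squared distribution on 1 degree of freedom (one extra parameter). The minimal Python sketch below uses only the standard library: the upper tail of a chi-squared(1) variable at s equals erfc(√(s/2)).

```python
import math


def lr_test_1df(loglik_small, loglik_big):
    """Likelihood ratio test for nested models that differ by one parameter.
    Returns the LR statistic and its P-value from chi-squared on 1 df,
    using P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))."""
    statistic = 2 * (loglik_big - loglik_small)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Log likelihoods of the multiplicative models without and with the interaction.
stat, p = lr_test_1df(-1230.411, -1225.701)
print(f"LR statistic = {stat:.2f}, P = {p:.3f}")
```

For these log likelihoods it gives an LR statistic of 9.42 and P ≈ 0.002, matching the text.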

    ent Standa

    rd Error

    z P > |z| 95% Confidence Interval

    Smoking 0.7066 0.0826 8.55 < 0.001 0.545 0.869 Age 1.1556 0.0910 12.70 < 0.001 0.977 1.334 Smoking.Age

    0.3674 0.1198 3.07 0.002 0.602 0.133

    Constant 1.0163 0.0645 15.74 < 0.001 0.890 1.143 Log likelihood = -1225.701

  • 13

    Multiplicative Poisson model with smoking and age, without interaction Coeffici

    ent Standard Error

    z P > |z| 95% Confidence Interval

    Smoking 0.5345 0.0597 8.95 < 0.001 0.417 0.652 Age 0.9426 0.0591 15.95 < 0.001 0.827 1.058 Constant 1.1178 0.0527 21.21 < 0.001 1.014 1.221

    4.6: Additive models

    The two multiplicative models can be summarised as follows.

    Multiplicative model with no interaction:

    $$\log(\lambda) = 1.12 + 0.53x_1 + 0.94x_2$$

    $$\lambda = \exp(1.12 + 0.53x_1 + 0.94x_2) = \exp(1.12) \times \exp(0.53x_1) \times \exp(0.94x_2)$$

    Here log(λ) is the log rate; 1.12 is the baseline log rate (the log rate in non/ex-smokers aged 50-64 years); 0.53 is the log RR for smoking; and 0.94 is the log RR for age. From this we can obtain the fitted mortality rates (those predicted under this model) for each of the four combinations of smoking and age. These are given in the first table below.

    Multiplicative model with an interaction:

    $$\log(\lambda) = 1.02 + 0.71x_1 + 1.16x_2 - 0.37x_1x_2$$

    $$\lambda = \exp(1.02) \times \exp(0.71x_1) \times \exp(1.16x_2) \times \exp(-0.37x_1x_2) = \frac{\exp(1.02) \times \exp(0.71x_1) \times \exp(1.16x_2)}{\exp(0.37x_1x_2)}$$

    Here 1.02 is the log rate in the baseline group (non/ex-smokers aged 50-64 years); 0.71 is the log RR for smoking in the baseline age group (those aged 50-64 years); 1.16 is the log RR for age in the baseline smoking group (non/ex-smokers); and -0.37 is the log RR for the interaction. The fitted mortality rates from this model are given in the second table below.

    Fitted mortality rates estimated from the multiplicative model with no interaction:

                            Age group (x2)
    Smoking (x1)            50-64 years (=0)     65-74 years (=1)
    None/ex-smoker (=0)     3.06                 7.85
    Current smoker (=1)     5.21                 13.33

    - 3.06 = exp(1.12)
    - 7.85 = exp(1.12) × exp(0) × exp(0.94) = exp(2.06)
    - 5.21 = exp(1.12) × exp(0.53) × exp(0) = exp(1.65)
    - 13.33 = exp(1.12) × exp(0.53) × exp(0.94) = exp(2.59)

    Model: $\log(\lambda) = 1.12 + 0.53\,\text{smoking} + 0.94\,\text{age}$, that is, $\lambda = \exp(1.12) \times \exp(0.53\,\text{smoking}) \times \exp(0.94\,\text{age})$

    Explanation: In this model we assume that the effects of smoking and age combine multiplicatively (no interaction means that effects combine multiplicatively). The estimated mortality rate for someone exposed to neither smoking nor older age is exp(1.12) = 3.06. The estimated mortality rate for someone exposed to both is obtained by multiplying together the baseline rate, the effect of smoking and the effect of age: exp(1.12) × exp(0.53) × exp(0.94) = 13.33.
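The fitted values in this table come from coefficients rounded to 2 decimal places. A minimal Python sketch below recomputes them from both the rounded and the full-precision coefficients in the regression output, and sets them beside the observed rates. Two things show up. Rounding matters a little: for smokers aged 65-74 the full-precision fit is about 13.40 rather than 13.33. More importantly, the no-interaction multiplicative model fits poorly: for example, 3.06 is fitted against 2.76 observed in the baseline group.

```python
import math

# Coefficients of the multiplicative Poisson model without interaction
# (log rates per 1000 person-years), at full and at 2-decimal precision.
full = {"const": 1.1178, "smoking": 0.5345, "age": 0.9426}
rounded = {"const": 1.12, "smoking": 0.53, "age": 0.94}


def fitted_rate(coef, smoking, age):
    """Fitted rate per 1000 person-years: exp(const + b_smoking*x1 + b_age*x2)."""
    return math.exp(coef["const"] + coef["smoking"] * smoking + coef["age"] * age)


# Observed rates per 1000 person-years, keyed by (smoking x1, age x2).
observed = {(0, 0): 2.76, (0, 1): 8.78, (1, 0): 5.60, (1, 1): 12.32}

for cell, obs in observed.items():
    print(cell, f"observed {obs:.2f}",
          f"fitted {fitted_rate(full, *cell):.2f}",
          f"(rounded coefficients: {fitted_rate(rounded, *cell):.2f})")
```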

    Fitted mortality rates estimated from the multiplicative model with interaction:

                            Age group (x2)
    Smoking (x1)            50-64 years (=0)     65-74 years (=1)
    None/ex-smoker (=0)     2.77                 8.85
    Current smoker (=1)     5.64                 12.43

    - 2.77 = exp(1.02)
    - 8.85 = exp(1.02) × exp(1.16) = exp(2.18)
    - 5.64 = exp(1.02) × exp(0.71) = exp(1.73)
    - 12.43 = exp(1.02) × exp(0.71) × exp(1.16) × exp(-0.37) = exp(2.52)

    Model: $\log(\lambda) = 1.02 + 0.71\,\text{smoking} + 1.16\,\text{age} - 0.37\,\text{smoking.age}$, that is, $\lambda = \exp(1.02) \times \exp(0.71x_1) \times \exp(1.16x_2) \times \exp(-0.37x_1x_2)$

    Explanation: In this model the effects of smoking and age do not combine multiplicatively. Including the interaction term has dealt with the problem of non-multiplicative effects. The fitted mortality rate for a non/ex-smoker aged 50-64 years is 2.77. The IHD mortality rate for someone exposed to both smoking and older age is obtained by multiplying together the baseline rate, the effect of smoking and the effect of age, and then dividing by exp(0.37), because the interaction term is negative on the log scale: exp(1.02) × exp(0.71) × exp(1.16) / exp(0.37) = 12.43. An interaction term different from zero suggests departures from a multiplicative model.
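The multiplicative model with interaction has one parameter per cell, so its interaction coefficient can be recovered directly from the observed rates. It is the log of the ratio of the rate ratio for smoking at ages 65-74 to the rate ratio for smoking at ages 50-64; equivalently, it is the log of the ratio of the two age rate ratios. A minimal Python sketch follows. As before, it assumes 67131 person-years for current smokers aged 50-64.

```python
import math

# Observed IHD mortality rates per 1000 person-years, keyed by (smoking x1, age x2).
# Person-years are in thousands; 67.131 for current smokers aged 50-64 is an assumption.
rate = {(0, 0): 240 / 86.863, (0, 1): 243 / 27.692, (1, 0): 376 / 67.131, (1, 1): 293 / 23.786}

rr_smoking_young = rate[(1, 0)] / rate[(0, 0)]  # RR for smoking, aged 50-64
rr_smoking_old = rate[(1, 1)] / rate[(0, 1)]    # RR for smoking, aged 65-74

# exp(interaction coefficient) = ratio of the two rate ratios.
ratio_of_rrs = rr_smoking_old / rr_smoking_young
print(f"ratio of RRs = {ratio_of_rrs:.2f}, log = {math.log(ratio_of_rrs):.3f}")
```

The ratio is about 0.69, with a log of about -0.367, in line with the fitted coefficient of -0.3674. A ratio below 1 means that smoking multiplies the rate by less in the older group than in the younger one.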

    4.7: Additive models

    The meaning of interaction on the multiplicative scale

    The sign (positive or negative) and size of an interaction coefficient may be used to assess whether multiplicative effects are consistent with the data. The following interpretation applies provided that (i) the RR for x1 in the baseline group of x2 and (ii) the RR for x2 in the baseline group of x1 are either both greater than one or both less than one; that is, these RRs are in the same direction in terms of whether they are above or below one.

    In our example, the RR for age in the baseline group of smoking is greater than 1 (it is exp(1.1556) = 3.18), and the RR for smoking in the baseline group of age is also greater than 1 (it is exp(0.7066) = 2.03). We can therefore interpret the interaction coefficient as follows:

    - An interaction coefficient close to zero suggests that the effects of age and smoking combine multiplicatively.
    - A large negative interaction coefficient suggests less than multiplicative effects (additive, perhaps).
    - A large positive interaction coefficient suggests greater than multiplicative effects.

    In the multiplicative model for smoking and age that includes interaction, the interaction term is negative (-0.3674) and there is strong evidence that it is not zero (P = 0.002), possibly suggesting additive effects.

    4.8: Additive models

    Fitting an additive rate model

    Now let's look at the additive rate models with and without interaction between age and smoking, shown below. Comparison of the models using a likelihood ratio test gives P = 0.47. Is there evidence of interaction in an additive model?

    Answer: A P-value of 0.47 shows that the data are consistent with no interaction in the additive model. Note that the P-value for the smoking.age interaction term in the table (from the Wald test) also gives P = 0.47.
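The Wald test mentioned here uses only the interaction coefficient (0.7053) and its standard error (0.9747) from the additive rate model with interaction. Below is a minimal Python sketch computing z = coefficient/SE and a two-sided P-value from the standard normal distribution. As a cross-check, it also shows that in this saturated additive model the interaction coefficient equals the difference between the two smoking rate differences. That cross-check assumes 67131 person-years for current smokers aged 50-64.

```python
import math


def wald_test(coef, se):
    """Two-sided Wald test of coef = 0: z = coef / se,
    P = 2 * (1 - Phi(|z|)), written here as erfc(|z| / sqrt(2))."""
    z = coef / se
    return z, math.erfc(abs(z) / math.sqrt(2))


# Interaction coefficient and standard error from the additive rate model with interaction.
z, p = wald_test(0.7053, 0.9747)
print(f"Wald z = {z:.2f}, P = {p:.2f}")

# Cross-check: smoking rate difference at 65-74 minus that at 50-64
# (rates per 1000 person-years; 67.131 thousand person-years is an assumption).
rate = {(0, 0): 240 / 86.863, (0, 1): 243 / 27.692, (1, 0): 376 / 67.131, (1, 1): 293 / 23.786}
diff_of_rds = (rate[(1, 1)] - rate[(0, 1)]) - (rate[(1, 0)] - rate[(0, 0)])
print(f"difference of rate differences = {diff_of_rds:.3f}")
```

This gives z ≈ 0.72 and P ≈ 0.47, and a difference of rate differences of about 0.705 per 1000 person-years.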

    Additive rate model with interaction

                   Coefficient   Standard error       z    P > |z|   95% confidence interval
    Smoking             2.8380           0.3395    8.36    < 0.001    2.173 to 3.503
    Age                 6.0120           0.5905   10.18    < 0.001    4.855 to 7.169
    Smoking.Age         0.7053           0.9747    0.72      0.469   -1.205 to 2.616
    Constant            2.7630           0.1783   15.49    < 0.001    2.413 to 3.113

    Log likelihood = -1225.701

    Additive rate model without interaction

                   Coefficient   Standard error       z    P > |z|   95% confidence interval
    Smoking             2.9246           0.3189    9.17    < 0.001    2.300 to 3.550
    Age                 6.2789           0.4712   13.33    < 0.001    5.355 to 7.202
    Constant            2.7394           0.1746   15.69    < 0.001    2.397 to 3.082

    Log likelihood = -1225.964

    Question: In the model without interaction, what is the rate difference for smoking, to 2 decimal places? In the model with interaction, what is the rate difference for the additional joint effect of smoking and age, to 2 decimal places?

    Answer (smoking): Because the outcome is modelled on its original scale, the rate difference is simply the coefficient for smoking in the table: RD = 2.92.

    Answer (Smoking.Age): For the same reason, the rate difference is the coefficient for smoking.age in the table: RD = 0.71.

    4.9: Additive models

    The two additive rate models can be summarised as follows.

    Additive rate model with no interaction:

    $$\lambda = 2.74 + 2.92x_1 + 6.28x_2$$

    Here λ is the rate; 2.74 is the baseline rate; 2.92 is the rate difference for smokers; and 6.28 is the rate difference for age. The corresponding mortality rates are given in the first table below.

    Additive rate model with an interaction:

    $$\lambda = 2.76 + 2.84x_1 + 6.01x_2 + 0.71x_1x_2$$

    Here 2.76 is the baseline rate (the rate in non/ex-smokers aged 50-64 years); 2.84 is the rate difference for smokers in the baseline age group (50-64 years); and 6.01 is the rate difference for age in the baseline smoking group (non/ex-smokers). The interaction, 0.71, is the "additional" effect of smoking in those aged 65-74 years (compared with the effect of smoking in those aged 50-64 years). Equivalently, it is the "additional" effect of being aged 65-74 years in smokers (compared with the effect of age in non/ex-smokers). Note that although the word "additional" is used here, the interaction term can be either negative or positive. The corresponding mortality rates are given in the second table below.

    Fitted mortality rates estimated from the additive rate model with no interaction:

                            Age group (x2)
    Smoking (x1)            50-64 years (=0)       65-74 years (=1)
    None/ex-smoker (=0)     2.74                   2.74 + 6.28 = 9.02
    Current smoker (=1)     2.74 + 2.92 = 5.66     2.74 + 2.92 + 6.28 = 11.94

    Model: $\lambda = 2.74 + 2.92\,\text{smoking} + 6.28\,\text{age}$

    Explanation: No interaction term has been included in this additive model, so the effects of smoking and age are assumed to combine additively. The fitted mortality rate for a non/ex-smoker aged 50-64 years is 2.74. The fitted mortality rate for a smoker aged 65-74 years is obtained by adding together the baseline rate, the effect of smoking and the effect of age: 2.74 + 2.92 + 6.28 = 11.94.

    Fitted mortality rates estimated from the additive rate model with interaction:

                            Age group (x2)
    Smoking (x1)            50-64 years (=0)       65-74 years (=1)
    None/ex-smoker (=0)     2.76                   2.76 + 6.01 = 8.77
    Current smoker (=1)     2.76 + 2.84 = 5.60     2.76 + 2.84 + 6.01 + 0.71 = 12.32

    Model: $\lambda = 2.76 + 2.84\,\text{smoking} + 6.01\,\text{age} + 0.71\,\text{smoking.age}$

    Explanation: In this model the effects of smoking and age do not combine additively. The fitted mortality rate for a non/ex-smoker aged 50-64 years is 2.76. The estimated mortality rate for a smoker aged 65-74 years is 2.76 + 2.84 + 6.01 + 0.71 = 12.32.

    4.10: Additive models

    The meaning of interaction on the additive scale

    The sign (positive or negative) and size of an interaction coefficient may be used to assess whether additive effects are consistent with the data. The following interpretation applies provided that (i) the rate difference for x1 in the baseline group of x2 and (ii) the rate difference for x2 in the baseline group of x1 are either both greater than zero or both less than zero; that is, these rate differences are in the same direction in terms of whether they are above or below zero.

  • 22

    In our example, both rate differences are greater than zero, so we can interpret the interaction coefficient as follows. An interaction coefficient close to zero suggests additive effects. A large negative interaction coefficient suggests less than additive effects. A large positive interaction coefficient suggests greater than additive effects (perhaps multiplicative). In the additive model with interaction above, the interaction term is small (0.71) and there is no evidence that it differs from zero (p=0.47), suggesting additive effects.

    4.11: Additive models

    Five further points on interaction in additive or multiplicative models should be noted:

    If the effects of the exposures combine multiplicatively, then they cannot combine additively, and vice versa (unless one of the exposures has no effect). However, in practice it may not be possible to distinguish between the two models.

    No interaction on the multiplicative scale implies an interaction on the additive scale (provided both exposures have an effect), although this might not be reflected in the P-values from hypothesis tests. Hence, when reporting interaction results, it is important to specify the scale, e.g. "there is (no) heterogeneity of rate ratios" or "there is (no) heterogeneity of rate differences".

    Interaction tests have low power and should be interpreted cautiously: large P-values suggest that the data are compatible with no interaction, but may simply reflect insufficient power to detect an interaction.

    Conducting many tests for interaction may produce evidence for one or more interactions purely by chance.

    The fitted values of the IHD mortality rates for the above additive model with an interaction (2.76, 8.77, 5.60, 12.32) are exactly equal to the observed data. The fitted values for the multiplicative model with an interaction (2.77, 8.85, 5.64, 12.43) differ from the observed data, but only because of rounding error (in the calculations we worked with all values to only 2 decimal places). The fitted values from a model will always equal the observed data exactly when the model has as many parameters as there are data points (the models with interactions each have 4 parameters, which are estimated from the 4 observed rates in the data).
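    Points 2 and 5 can be checked numerically with a minimal Python sketch. It assumes the four observed rates quoted above. The sketch solves the saturated additive and multiplicative models (four parameters each) directly from the four rates. It then confirms that both reproduce the data exactly, apart from floating-point error. Next, it uses hypothetical rates to show that rates combining exactly multiplicatively still have a non-zero interaction on the additive scale, equal to baseline x (RR1 - 1) x (RR2 - 1). These hypothetical rates are a baseline of 2.0 and rate ratios of 2 and 3, chosen only for illustration. All function names are illustrative assumptions.

```python
import math

# Observed IHD mortality rates, keyed by (smoking, age group).
observed = {(0, 0): 2.76, (0, 1): 8.77, (1, 0): 5.60, (1, 1): 12.32}
cells = list(observed)

def saturated_additive(r):
    """Solve baseline, smoking, age and interaction terms on the rate scale."""
    b0 = r[(0, 0)]
    b1 = r[(1, 0)] - b0
    b2 = r[(0, 1)] - b0
    b3 = r[(1, 1)] - b0 - b1 - b2
    return lambda x1, x2: b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def saturated_multiplicative(r):
    """Solve baseline rate, two rate ratios and an interaction ratio."""
    base = r[(0, 0)]
    rr1 = r[(1, 0)] / base
    rr2 = r[(0, 1)] / base
    rr_int = r[(1, 1)] / (base * rr1 * rr2)
    return lambda x1, x2: base * rr1 ** x1 * rr2 ** x2 * rr_int ** (x1 * x2)

fit_add = saturated_additive(observed)
fit_mult = saturated_multiplicative(observed)
add_exact = all(math.isclose(fit_add(*c), observed[c]) for c in cells)
mult_exact = all(math.isclose(fit_mult(*c), observed[c]) for c in cells)
print("additive model reproduces the data:", add_exact)
print("multiplicative model reproduces the data:", mult_exact)

def additive_interaction_of_multiplicative(base, rr1, rr2):
    """Additive interaction when rates combine exactly multiplicatively."""
    r00, r10, r01, r11 = base, base * rr1, base * rr2, base * rr1 * rr2
    return r11 - r10 - r01 + r00  # algebraically base * (rr1 - 1) * (rr2 - 1)

# Hypothetical illustration values, not taken from the session's data.
print(additive_interaction_of_multiplicative(2.0, 2.0, 3.0))
```

    The additive interaction vanishes only when one rate ratio equals 1, i.e. when one exposure has no effect. This matches the caveat in the first two points.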

    Section 5: Choosing between multiplicative and additive models

    We want the model to be simple (few parameters) but to provide a good fit to the data (fitted values similar to the observed data). In general, the more parameters a model includes, the better its fit. When there are as many parameters as data points (combinations of explanatory variables), the model has a perfect fit (e.g. the multiplicative and additive models with an interaction above). However, remember that models with as many parameters as data points are not in general useful (the example here is an exception, as we have so few data points). With more covariates (five or more, say), such models are unnecessarily complicated. We would like a model that is "as simple as possible, but no simpler". In this example, we would fit either a multiplicative model with an interaction (since there was evidence against the null hypothesis of no interaction) or an additive model without an interaction (as the data were compatible with the null hypothesis of no interaction). Based on statistical considerations alone, the additive model is preferable, because it describes the data with fewer parameters (3 rather than 4).

    5.1: Choosing between multiplicative and additive models

    It is also useful to look at the expected numbers of deaths from the multiplicative and additive models (with no interaction) and compare them with the observed numbers of deaths.

    Expected deaths from multiplicative and additive models without interaction:

    Smoking  Age  D (observed)  Expected (multiplicative)  Expected (additive)
    0        0    240           265.63                     237.95
    0        1    243           217.37                     249.74
    1        0    376           350.37                     380.23
    1        1    293           318.63                     284.07

    The expected numbers of deaths are closer to the observed numbers for the additive model than for the multiplicative model. This suggests that the additive model with no interaction fits the data better than the multiplicative model with no interaction.

    5.2: Choosing between multiplicative and additive models

    Our preference is for a model that is as simple as possible while providing an adequate fit to the data. Hence, we might avoid models containing interaction terms, since these generally complicate the interpretation and presentation of results. This suggests a general strategy: try both multiplicative and additive models (without interaction terms), and choose whichever provides the better fit. The choice is based on the test for interaction and on a comparison of the expected numbers of deaths (from the models with no interaction) with the observed numbers. There are several caveats, however:

    Unless there are very large amounts of data, it is often difficult to distinguish between the fit of the alternative models, because interaction tests have low power.

    Multiplicative models have a number of desirable mathematical properties that make them easier to work with, which is one reason why they are much more commonly used. Additive models are more prone to convergence problems, so they generally take longer to fit and sometimes fail to converge. Also, Wald-based confidence intervals and P-values from additive models can be misleading.
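    Section 5.1 compares observed and expected deaths by eye. A single-number summary can also be computed. The sketch below uses Pearson's statistic, X^2 = sum over cells of (O - E)^2 / E; this statistic is our illustrative choice rather than one used in the session. It is a minimal Python sketch that takes its numbers from the Section 5.1 table. The function name `pearson_x2` is hypothetical.

```python
# Observed (D) and expected deaths from the models without interaction,
# in the cell order (smoking, age) = (0,0), (0,1), (1,0), (1,1).
observed = [240, 243, 376, 293]
expected_multiplicative = [265.63, 217.37, 350.37, 318.63]
expected_additive = [237.95, 249.74, 380.23, 284.07]

def pearson_x2(obs, exp):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

x2_mult = pearson_x2(observed, expected_multiplicative)
x2_add = pearson_x2(observed, expected_additive)
print(f"X2 multiplicative model: {x2_mult:.2f}")
print(f"X2 additive model:       {x2_add:.2f}")
```

    A smaller value means the expected counts lie closer to the observed counts. Here the additive model gives the much smaller value, in line with the conclusion in Section 5.1. The sketch deliberately stops short of a formal test, given the low power of such comparisons noted above.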

    5.3: Choosing between multiplicative and additive models

    It is often impossible to distinguish clearly between the fit of additive and multiplicative models on purely statistical criteria. Despite this, the implications of these models can be very different, for example for the fitted effects of various combinations of the risk factors. It is therefore important to use any information we have about the biological mode of action of the exposures to select an appropriate model formulation.

    Biological considerations

    For two independent exposures whose pathways of disease causation are separate, the effects are more likely to be additive, and an additive model may be more appropriate. Where exposures act on the same pathway, the effects are more likely to multiply, in which case a multiplicative model is more appropriate.

    Example 1: Transfusion of contaminated blood and sexual exposure to an infected partner are two independent pathways to HIV infection. You might expect the effects of these two exposures to combine additively.

    Example 2: Condom use and number of sexual partners relate to the same pathway of infection. You could assume that the effect of not using a condom is to multiply the risk associated with the level of sexual exposure. The multiplying factor will be greater than one, since not using a condom is expected to increase the risk of infection.

    Example 3: A more complex example is that of multistage models of carcinogenesis. These have helped explain the way in which incidence rates of a cancer are related to age and to the level and duration of various exposures.
    In its simplest form, the model assumes that cells are initially "normal" (stage 0) and may then undergo a series of transitions through stages 1, 2, etc., with each transition occurring with low probability for any individual cell. If and when a cell undergoes the kth transition (to stage k), it undergoes malignant replication and the cancer occurs. Data for many cancers appear consistent with this model (often with k = 5 or 6). Exposures to risk factors are assumed to have their effect by increasing the rate of transition at one or more stages of the process. Data on the effects of a particular exposure on risk are used to make inferences about the stage or stages at which that exposure has its effect. For the joint effect of two exposures, the general conclusion is that if they act at the same stage their effects can be expected to combine additively, while if they act at different stages their effects can be expected to combine multiplicatively.

    Section 6: Summary

    Multiplicative models:
    - Ratios are useful for studying aetiology
    - Many effects combine multiplicatively
    - Models usually converge
    - Models are easily fitted in standard software

    Additive models:
    - Differences are useful public health measures
    - Some effects are known to combine additively
    - Models sometimes do not converge