Tuan V. Nguyen, Garvan Institute of Medical Research, Sydney, Australia. Garvan Institute Biostatistics Seminar, 18/8/2015. © Tuan V. Nguyen


Analysis of covariance (ANCOVA)

• Example of a before-after study

• Problems with percentage change

• Introduction to ANCOVA

 

Case 1: Goat study

•  40  goats  were  randomized  into  two  groups  of  treatment  (intensive  and  standard,  20  each)    

•  Outcome:  weight  gain    

•  Aim:  to  test  for  the  efficacy  of  treatment    

How to assess treatment effect

Common  strategy  

• Calculate percentage weight gain = (post-treatment - baseline) / baseline * 100

• Test for a difference in percentage weight gain between treatment groups (using a t-test)

How to assess the efficacy?

Strategy  2  

• Use a t-test to compare post-treatment weight between treatment groups

• (Assuming that baseline weight is comparable between treatment groups)

Analysis using R

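# Read data into R
wt = read.csv("~/Google Drive/Garvan Lectures 2014/ANCOVA/goats.csv", header=TRUE)

# Calculate percentage change from baseline
wt$PCT = ((wt$PostWt / wt$BaselineWt) - 1) * 100

# Plot PCT by treatment group, then its overall distribution
boxplot(PCT ~ Treatment, data=wt, col="blue")
hist(wt$PCT, breaks=10, col="blue", border="white", main="Histogram of PCT", xlab="PCT")

# Compare mean PCT between treatment groups
t.test(PCT ~ Treatment, data=wt)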

[Figure: boxplot of PCT (y-axis, 10 to 50) by treatment group (intensive, standard)]

> t.test(PCT ~ Treatment)

data:  PCT by Treatment
t = 1.5319, df = 37.991, p-value = 0.1338
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.89734 13.70172
sample estimates:
mean in group intensive  mean in group standard
               31.14783                25.24564

[Figure: histogram of PCT; x-axis: PCT (10 to 60), y-axis: Frequency (0 to 8)]

Problems with the analysis

• Percentage change (PCT) is a poor choice of effect metric

• Theoretical: PCT collapses two-dimensional data (baseline and post-treatment) into a single number

• Bias: the estimated PCT is NOT equal to the true PCT

• Statistical inefficiency: PCT is a ratio and is very sensitive to the baseline value

• Often non-normally distributed

Regression toward the mean (RTM)

• Offspring of tall parents tended to be shorter than their parents

• Offspring of short parents tended to be taller than their parents

Sir Francis Galton (1822 - 1911)

This can be proved mathematically (easily!)
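One way to see this is sketched below. It is not taken from the slides, and it rests on an assumption: parent height X and offspring height Y are treated as bivariate normal with a common mean \mu, a common variance, and correlation \rho with 0 < \rho < 1.

```latex
% Assumption (not from the slides): X and Y are bivariate normal with
% common mean \mu, common standard deviation \sigma, and correlation 0 < \rho < 1.
\begin{align*}
E[Y \mid X = x] &= \mu + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu)
                 = \mu + \rho\,(x - \mu), \\
\bigl|\,E[Y \mid X = x] - \mu\,\bigr| &= \rho\,\lvert x - \mu \rvert
                 < \lvert x - \mu \rvert .
\end{align*}
```

Under this assumption the expected offspring value always lies between the population mean and the parent's value, so an extreme parent is expected to have a less extreme offspring.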

A real example from the goat study

[Figure: scatter plot of PCT (y-axis, 10 to 60) against BaselineWt (x-axis, 17.5 to 30.0), colored by Treatment, with fitted lines]

library(ggplot2)
# Scatter plot of PCT against baseline weight, with a fitted line per treatment group
ggplot(wt, aes(x=BaselineWt, y=PCT, color=Treatment)) + geom_point() + geom_smooth(method="lm")

Implications of RTM

• Individuals with "extreme" baseline values tend to regress back toward the mean in subsequent measurements, even WITHOUT any treatment effect

• A proper analysis of change must take the RTM effect into account
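The point can be illustrated with a small simulation. This is a sketch on made-up data, not the goat data: the sample size, means, and noise levels are all assumptions chosen for illustration, and there is no treatment at all.

```r
# Simulated before-after data with NO treatment effect (all values are made up)
set.seed(1)
n        <- 1000
true_wt  <- rnorm(n, mean = 23, sd = 2)     # each animal's stable "true" weight
baseline <- true_wt + rnorm(n, sd = 1.5)    # baseline = truth + measurement noise
post     <- true_wt + rnorm(n, sd = 1.5)    # follow-up = truth + fresh noise
pct      <- (post / baseline - 1) * 100     # percentage change, as in the slides

# Compare the lowest and highest fifths of baseline weight
low  <- baseline <= quantile(baseline, 0.2)
high <- baseline >= quantile(baseline, 0.8)

mean_pct_low  <- mean(pct[low])       # apparent "gain" in the low-baseline group
mean_pct_high <- mean(pct[high])      # apparent "loss" in the high-baseline group
r_pct_base    <- cor(baseline, pct)   # PCT is negatively related to baseline

print(c(low = mean_pct_low, high = mean_pct_high, cor = r_pct_base))
```

Even though nothing was done to the animals, the low-baseline group appears to gain and the high-baseline group appears to lose, which is why PCT is so sensitive to baseline values.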

Analysis of Covariance

Solution: Analysis of Covariance (ANCOVA)

• A form of multiple linear regression analysis

• Informal statement:

Y = intercept + effect of covariate + effect of treatment + random error

• Where "covariate" is a confounding factor (e.g. baseline scores, other factors)

 

ANCOVA for the goat study

• Let Y = post-treatment score; X = pre-treatment (baseline) score; T = treatment group (0 or 1); and ε = random error

• A formal model for an individual i is:

Yi = α + βXi + γTi + εi

• Thus, if T = 0, the mean value is

Ŷ = α + β*mean(X)

and if T = 1,

Ŷ = α + β*mean(X) + γ

• So γ is the treatment effect: the difference in mean post-treatment score between the two groups at the same baseline value
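The model above can be checked on simulated data. The sketch below is not the goat analysis: the values of alpha, beta, gamma, the group size, and the noise level are assumptions chosen for illustration. It fits the model with lm() and computes the two adjusted means at mean(X).

```r
# Simulate Y = alpha + beta*X + gamma*T + error with known (made-up) parameters
set.seed(42)
n_per_group <- 200
treat <- rep(c(0, 1), each = n_per_group)            # T: 0 = control, 1 = treated
x     <- rnorm(2 * n_per_group, mean = 23, sd = 3)   # X: baseline value
alpha <- 15; beta <- 0.65; gamma <- 1.5              # assumed true parameters
y     <- alpha + beta * x + gamma * treat + rnorm(2 * n_per_group, sd = 1.5)

# ANCOVA is an ordinary linear model with the covariate and the treatment
fit <- lm(y ~ x + treat)
print(coef(fit))

# Adjusted means: both groups evaluated at the same baseline, mean(X)
adj_mean_0 <- unname(coef(fit)["(Intercept)"] + coef(fit)["x"] * mean(x))
adj_mean_1 <- adj_mean_0 + unname(coef(fit)["treat"])
print(c(T0 = adj_mean_0, T1 = adj_mean_1))
```

The difference between the two adjusted means is exactly the estimated coefficient of treat, which should land close to the assumed gamma of 1.5.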

We can implement ANCOVA in R

# Read data into R
wt = read.csv("~/Google Drive/Garvan Lectures 2014/ANCOVA/goats.csv", header=TRUE)

# Analysis of covariance: post-treatment weight adjusted for baseline weight
model1 = lm(PostWt ~ BaselineWt + Treatment, data=wt)

# Coefficient table, residual error and R-squared
summary(model1)

Result of analysis

> summary(model1)

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       14.96661    1.75261   8.540 2.82e-10 ***
BaselineWt         0.64863    0.07424   8.737 1.59e-10 ***
Treatmentstandard -1.26486    0.51169  -2.472   0.0182 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.618 on 37 degrees of freedom
Multiple R-squared: 0.6887, Adjusted R-squared: 0.6718
F-statistic: 40.92 on 2 and 37 DF, p-value: 4.214e-10

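Comment: after adjusting for baseline weight, the treatment effect is now statistically significant (p = 0.018). At the same baseline weight, the standard group's post-treatment weight is on average 1.26 lower than the intensive group's.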

Bayes factor (BF)

• Recall that BF is a metric of evidence. It is defined as

BF = P(D | H1) / P(D | H0)

• Where D is the data, and H0 and H1 are the competing hypotheses

Interpretation of BF

BF             Interpretation
>100           Decisively favors H1
30 to 100      Very strong evidence for H1
10 to 30       Strong evidence for H1
3 to 10        Substantial evidence for H1
1 to 3         Weak evidence for H1
1              No evidence
0.3 to 1       Weak evidence for H0
0.1 to 0.3     Substantial evidence for H0
0.03 to 0.1    Strong evidence for H0
0.01 to 0.03   Very strong evidence for H0
<0.01          Decisive evidence for H0
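The table can be turned into a small lookup helper. The sketch below is illustrative only: interpret_bf() is a hypothetical function name, the input values are made up, and the table does not say which side a boundary value such as 3 or 10 belongs to, so assigning each boundary to the interval above it is an assumption.

```r
# Hypothetical helper: map a Bayes factor to the verbal label in the table.
# ASSUMPTION: a boundary value (e.g. 3) is assigned to the interval above it.
interpret_bf <- function(bf) {
  breaks <- c(0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, Inf)
  labels <- c("Decisive evidence for H0",
              "Very strong evidence for H0",
              "Strong evidence for H0",
              "Substantial evidence for H0",
              "Weak evidence for H0",
              "Weak evidence for H1",
              "Substantial evidence for H1",
              "Strong evidence for H1",
              "Very strong evidence for H1",
              "Decisively favors H1")
  band <- cut(bf, breaks = breaks, labels = labels,
              right = FALSE, include.lowest = TRUE)
  # A Bayes factor of exactly 1 favors neither hypothesis
  ifelse(bf == 1, "No evidence", as.character(band))
}

example_labels <- interpret_bf(c(0.005, 0.5, 1, 2.96, 45))
print(example_labels)
```

For instance, a Bayes factor of 2.96 falls in the "1 to 3" row, so it is read as weak evidence for H1.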

BF in ANCOVA

> library(BayesFactor)

> # Fit with lmBF() so the two models can be compared as a ratio of Bayes factors
> # (Treatment should be stored as a factor)

> model0 = lmBF(PostWt ~ BaselineWt, data=wt)

> model1 = lmBF(PostWt ~ BaselineWt + Treatment, data=wt)

> model1/model0

Bayes factor analysis

--------------

[1] BaselineWt + Treatment : 2.961372 ±0.97%

BF in ANCOVA

> model1 = lm(PostWt ~ BaselineWt + Treatment, data=wt)
> summary(model1)

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       14.96661    1.75261   8.540 2.82e-10 ***
BaselineWt         0.64863    0.07424   8.737 1.59e-10 ***
Treatmentstandard -1.26486    0.51169  -2.472   0.0182 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.618 on 37 degrees of freedom
Multiple R-squared: 0.6887, Adjusted R-squared: 0.6718
F-statistic: 40.92 on 2 and 37 DF, p-value: 4.214e-10

> ttest.tstat(t=2.472, n1=100, n2=100, nullInterval=c(0, Inf))$bf
[1] 1.645503

Note: ttest.tstat() reports $bf on the natural-log scale unless simple=TRUE is given, and n1 and n2 are set to 100 here although the goat study had 20 animals per group.

Interaction effect

No interaction effect

[Figure: HAPPY (y-axis, 0 to 10) against INCOME (x-axis, 0 to 100,000), with separate lines for Men and Women]

Interaction effect

[Figure: HAPPY (y-axis, 0 to 10) against INCOME (x-axis, 0 to 100,000)]

Concept of "interaction"

• "Differences in the relationship (slope) between two variables for each category of a third variable"

• The relationship between X and Y is dependent on a third variable Z

Men: Happy = a1 + b1*income + e

Women: Happy = a2 + b2*income + f

Interaction is present when b1 ≠ b2
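A minimal sketch of how such an interaction can be detected in R is shown below. The data are simulated, not real: the intercepts, the slopes b1 and b2, the income scale (in thousands), and the noise level are all assumptions chosen for illustration.

```r
# Simulate happiness data where the income slope differs by sex (made-up values)
set.seed(7)
n      <- 300
income <- runif(2 * n, min = 0, max = 100)        # income in thousands (assumed scale)
sex    <- factor(rep(c("Men", "Women"), each = n))
b1     <- 0.03                                    # assumed slope for men
b2     <- 0.07                                    # assumed slope for women
happy  <- ifelse(sex == "Men", 3 + b1 * income, 2 + b2 * income) +
          rnorm(2 * n, sd = 1)

# income * sex expands to income + sex + income:sex
fit <- lm(happy ~ income * sex)
print(coef(fit))

# The income:sexWomen coefficient estimates b2 - b1, the difference in slopes
slope_diff    <- unname(coef(fit)["income:sexWomen"])
p_interaction <- summary(fit)$coefficients["income:sexWomen", "Pr(>|t|)"]
print(c(slope_diff = slope_diff, p = p_interaction))
```

When the interaction coefficient is clearly different from zero, a single common slope does not describe both groups, and the effect of income has to be reported separately for men and women.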

RCT of depression

• Aim: to estimate the efficacy of an antidepressant (imipramine) and cognitive behavior therapy (CBT)

• Design: randomized controlled trial; patients were randomized into 4 groups:

– Double placebo (placebo drug and placebo CBT, n = 35)

– Imipramine and placebo CBT (n = 35)

– Placebo imipramine and CBT (n = 35)

– Imipramine and CBT (n = 35)

Results

How to assess the efficacy?

Strategy  1  

• Calculate the difference between baseline and post-treatment scores (Diff)

• Use a t-test to compare Diff between Cognitive Rx and placebo

How to assess the efficacy?

Strategy  2  

• Use analysis of covariance (ANCOVA) to compare post-treatment scores between Imipramine and placebo, adjusting for baseline scores

 

R code for the RCT

dep = read.csv("~/Google Drive/Garvan Lectures 2014/ANCOVA/ancova.csv", header=TRUE)

# Inspect the first rows
head(dep)

# Scatter plot of post-treatment score against baseline score
plot(PostScore ~ BaselineScore, data=dep, pch=16, col="blue")

[Figure: scatter plot of PostScore (y-axis, 10 to 30) against BaselineScore (x-axis, 15 to 35)]

library(ggplot2)
p = ggplot(dep, aes(x=BaselineScore, y=PostScore, color=as.factor(CognitiveRx)))
p = p + geom_point(aes(col=as.factor(CognitiveRx)))
p + geom_smooth(method="lm")

[Figure: PostScore (y-axis, 10 to 30) against BaselineScore (x-axis, 15 to 35), with points and fitted lines colored by as.factor(CognitiveRx), levels 1 and 2]

ANCOVA model

• Let Y = post-treatment score, X = baseline score, T = cognitive treatment. The model with an interaction term is:

Y = b0 + b1*X + b2*T + b3*X*T + e

• Using R (model1 omits the interaction term, model2 includes it):

model1 = lm(PostScore ~ BaselineScore + CognitiveRx, data=dep)

model2 = lm(PostScore ~ BaselineScore + CognitiveRx + CognitiveRx:BaselineScore, data=dep)

 

> summary(model1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   10.38177    2.51911   4.121 7.46e-05 ***
BaselineScore  0.53552    0.08181   6.546 2.11e-09 ***
CognitiveRx   -1.62178    0.75895  -2.137   0.0349 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.974 on 107 degrees of freedom
  (30 observations deleted due to missingness)
Multiple R-squared: 0.3055, Adjusted R-squared: 0.2925
F-statistic: 23.54 on 2 and 107 DF, p-value: 3.379e-09

> summary(model2)

                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                18.2054     7.3240   2.486   0.0145 *
BaselineScore               0.2467     0.2667   0.925   0.3570
CognitiveRx                -6.6827     4.5136  -1.481   0.1417
BaselineScore:CognitiveRx   0.1867     0.1641   1.137   0.2579
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.968 on 106 degrees of freedom
  (30 observations deleted due to missingness)
Multiple R-squared: 0.3139, Adjusted R-squared: 0.2945
F-statistic: 16.17 on 3 and 106 DF, p-value: 1.008e-08

model3 = lm(PostScore ~ BaselineScore + CognitiveRx + Imipramine, data=dep)

> summary(model3)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   13.07321    2.80369   4.663 9.13e-06 ***
BaselineScore  0.51987    0.08095   6.422 3.90e-09 ***
CognitiveRx   -1.56658    0.74814  -2.094   0.0386 *
Imipramine    -1.54872    0.75074  -2.063   0.0416 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.915 on 106 degrees of freedom
  (30 observations deleted due to missingness)
Multiple R-squared: 0.3323, Adjusted R-squared: 0.3134
F-statistic: 17.59 on 3 and 106 DF, p-value: 2.444e-09

Main effect model

model4 = lm(PostScore ~ BaselineScore + CognitiveRx + Imipramine + CognitiveRx:Imipramine, data=dep)

> summary(model4)

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             19.8959     4.3582   4.565 1.36e-05 ***
BaselineScore            0.5222     0.0798   6.543 2.25e-09 ***
CognitiveRx             -6.0970     2.3561  -2.588   0.0110 *
Imipramine              -6.1059     2.3695  -2.577   0.0114 *
CognitiveRx:Imipramine   2.9875     1.4756   2.025   0.0455 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.859 on 105 degrees of freedom
  (30 observations deleted due to missingness)
Multiple R-squared: 0.3574, Adjusted R-squared: 0.3329
F-statistic: 14.6 on 4 and 105 DF, p-value: 1.631e-09

Interaction effect model
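With an interaction term in the model, the CognitiveRx coefficient no longer describes a single effect: the effect of CognitiveRx depends on the level of Imipramine. The arithmetic below is a sketch using the coefficients printed above. It assumes Imipramine is coded 1 and 2, as CognitiveRx appears to be in the plot legend; the slides do not show the coding, nor which code denotes placebo.

```r
# Coefficients copied from the interaction-model output above
b_cbt <- -6.0970   # CognitiveRx
b_int <-  2.9875   # CognitiveRx:Imipramine

# ASSUMPTION: Imipramine takes the values 1 and 2 (coding not shown in the slides)
imipramine_levels <- c(1, 2)

# Change in PostScore per one-unit increase in CognitiveRx, at each Imipramine level
cbt_effect <- b_cbt + b_int * imipramine_levels
names(cbt_effect) <- paste0("Imipramine=", imipramine_levels)
print(round(cbt_effect, 4))
```

Under that coding assumption, the estimated CognitiveRx effect is about -3.11 at one Imipramine level and about -0.12 at the other, which is what the significant interaction term expresses.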

Summary

• ANCOVA is a statistical method that is very useful for the analysis of change, taking into account the effects of covariates

• In before-after experiments, ANCOVA is highly preferable to an analysis of percentage change from baseline

• Bayesian analysis of an ANCOVA model can be implemented in R