SJS SDI_3 1
Design of Statistical Investigations
Stephen Senn
3. Design of Experiments 1
Some Basic Ideas
SJS SDI_3 2
Elements of an Experiment: The “Nouns”
• Experimental material
  – Basic units
  – Blocks
  – Replications
• Treatments
  – Orderings
  – Dimensions
  – Combinations
SJS SDI_3 3
Elements of an Experiment: The “Verbs”
• Allocation
  – Which material gets which treatment
    • For example, using some form of randomisation
• Conduct
  – How will it all be carried out?
• Measuring
  – When to measure what
• Analysis
SJS SDI_3 4
Exp_1 Rat TXB2
• Experimental material
  – 36 rats
• Treatments to be studied
  – 6 in a ‘one-way layout’
    • 4 new chemical entities
    • 1 vehicle
    • 1 marketed product
SJS SDI_3 5
Caution!
• In practice such things are not given
• Material
  – Why rats and not mice, dogs, or guinea-pigs?
  – Why 36?
• Treatments
  – Why these 6?
• In practice the statistician can be involved in such decisions also
SJS SDI_3 6
Exp_1 Rat TXB2: Allocation
• If the rats cannot be differentiated in any way we can determine, we might as well allocate at random
• Unconstrained randomisation is not a good idea, however: some treatments may by chance be allocated to very few rats
• So constrain to have 6 rats per group
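The point about unconstrained randomisation can be illustrated with a small simulation. This is only a sketch, written in R as a stand-in for S-PLUS; the seed, the object names and the threshold of 3 rats are illustrative assumptions, not part of the slides.

```r
# Illustrative only: unconstrained randomisation of 36 rats to 6 treatments.
set.seed(1)  # arbitrary seed, used only for reproducibility
labels <- c("V", "M", "a", "b", "c", "d")

# Each rat independently receives one of the six treatments
unconstrained <- sample(labels, 36, replace = TRUE)

# Group sizes: these will generally not all equal 6
sizes <- table(factor(unconstrained, levels = labels))
print(sizes)

# Estimated chance that at least one group ends up with 3 or fewer rats
few <- replicate(2000, {
  alloc <- factor(sample(labels, 36, replace = TRUE), levels = labels)
  min(table(alloc)) <= 3
})
mean(few)
```

Constraining the allocation to 6 rats per group removes this source of imbalance.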
SJS SDI_3 7
S-Plus Randomisation
#M2 Rat TXB2 Randomisation
#Vector of treatments
treat<-c(rep("V",6),rep("M",6),rep("a",6),
  rep("b",6),rep("c",6),rep("d",6))
#Random number for each rat
rnumb<-runif(36,0,1)
#Sort rats by random number
rat<-sort.list(rnumb)
#Join rats and treatments
temp.frame<-data.frame(rat,treat)
#Sort rows by rat
des.frame<-sort.col(temp.frame,c("rat","treat"),"rat")
#Print design
des.frame
We shall illustrate an alternative using the sample function later in the course
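As a rough idea of what a `sample`-based version might look like, here is a minimal sketch in R. It is not the course's own code; the seed and the layout of the data frame are assumptions made for illustration.

```r
# Sketch of a constrained randomisation using sample(); not the course's own code.
set.seed(2)  # arbitrary seed, used only for reproducibility

# Six copies of each treatment label, 36 in all
treat <- rep(c("V", "M", "a", "b", "c", "d"), each = 6)

# sample() without replacement returns a random permutation of the 36 labels,
# so every treatment still appears exactly 6 times
des.frame <- data.frame(rat = 1:36, treat = sample(treat))

print(des.frame)
table(des.frame$treat)
```

Because the labels are permuted rather than drawn independently, the constraint of 6 rats per group holds automatically.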
SJS SDI_3 8
Result of Randomisation
   rat treat
 9   1     M
22   2     b
 4   3     V
33   4     d
13   5     a
11   6     M
10   7     M
31   8     d
 7   9     M
19  10     b
 3  11     V
25  12     c
18  13     a
12  14     M
17  15     a
20  16     b
24  17     b
34  18     d
26  19     c
23  20     b
30  21     c
16  22     a
21  23     b
32  24     d
28  25     c
 8  26     M
14  27     a
 1  28     V
29  29     c
36  30     d
 6  31     V
 5  32     V
35  33     d
15  34     a
 2  35     V
27  36     c
SJS SDI_3 9
Exp_1 Rat TXB2: Conduct
• We will not cover this in this course
• This does not mean that it is unimportant
• In the Exp_1 example precise instructions might be necessary for treating the rats.
SJS SDI_3 10
Exp_1 Rat TXB2: Measurement
• Obviously we have to decide what it is important to measure
• Here it has been decided to measure TXB2 (thromboxane B2), a marker of Cox-1 activity
• Cox = cyclooxygenase
• Analgesics are designed to inhibit Cox-2, which is involved in synthesis of inflammatory prostaglandins
SJS SDI_3 11
Measurement (Cont)
• However, they also tend to inhibit Cox-1, which is involved in synthesis of the prostaglandins that help maintain gastric mucosa
• Cox-1 inhibition can lead to ulcers
• Ulcers are an unwanted side-effect of non-steroidal anti-inflammatory drugs (NSAIDs)
SJS SDI_3 12
The Moral
• Even ‘simple’ experiments may involve complex subject-matter knowledge
• It may be dangerous for the statistician to assume that all that is being produced is sets of numbers and that the details are irrelevant
• Team work may be necessary
SJS SDI_3 13
Analysis
• One-way layout
• Six treatments
• Balanced design
• “No-brainer” is one-way ANOVA
  – We shall look at the maths of one-way ANOVA in more detail later.
  – For the moment take this as understood
SJS SDI_3 14
S-PLUS ANOVA Code
#Analysis of TXB2 data
#Set contrast options
options(contrasts=c(factor="contr.treatment",ordered="contr.poly"))
#Input data
treat<-factor(c(rep(1,6),rep(2,6),
  rep(3,6),rep(4,6),rep(5,6),rep(6,6)),
  labels=c("V","M","a","b","c","d"))
TXB2<-c(196.85,124.40,91.20,328.05,268.30,214.70,
  2.08,1.97,4.80,5.01,2.52,9.35,
  315.85,75.60,322.80,212.15,42.95,111.90,
  127.95,81.75,52.70,352.85,198.80,107.65,
  83.19,66.80,81.15,39.00,61.96,87.00,
  74.48,60.00,77.00,42.00,48.95,66.30)
fit1<-aov(TXB2~treat)
#ANOVA
summary(fit1)
SJS SDI_3 15
S-PLUS Output
summary(fit1)
          Df Sum of Sq  Mean Sq  F Value       Pr(F)
    treat  5  184595.5 36919.11 6.313142 0.000409356
Residuals 30  175439.3  5847.98
So there is a highly significant difference between treatments, but this does not make this an adequate analysis.
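The F value and p-value in the output follow directly from the sums of squares and degrees of freedom. The following R sketch simply redoes that arithmetic from the printed figures; the object names are illustrative.

```r
# Recompute F and its p-value from the printed sums of squares.
ss.treat <- 184595.5; df.treat <- 5
ss.resid <- 175439.3; df.resid <- 30

ms.treat <- ss.treat / df.treat   # mean square for treatments
ms.resid <- ss.resid / df.resid   # residual mean square
f.value  <- ms.treat / ms.resid   # F ratio

# Upper-tail probability of the F distribution on (5, 30) degrees of freedom
p.value <- pf(f.value, df.treat, df.resid, lower.tail = FALSE)

c(F = f.value, p = p.value)
```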
SJS SDI_3 16
S-PLUS Diagnostic Code
#Diagnostic plot data
par(mfrow=c(2,2))
plot(treat~TXB2)
hist(resid(fit1),xlab="residual")
plot(fit1$fitted.values,resid(fit1),xlab="fitted",ylab="residual")
abline(h=0)
qqnorm(resid(fit1),xlab="theoretical",ylab="empirical")
qqline(resid(fit1))
SJS SDI_3 17
[Figure: four diagnostic plots for the TXB2 fit. Panels: treat against TXB2; histogram of residuals; residuals against fitted values; normal QQ plot of residuals (empirical against theoretical quantiles).]
SJS SDI_3 18
Model Failure
• Histogram of residuals has heavy tails
• QQ Plot shows clear departure from Normality
• Variance increases with mean
  – Suggests log-transformation
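One way to see the mean and variance relationship numerically, and to refit on the log scale, is sketched below in R. The data are those entered in the ANOVA code; the use of natural logarithms and the object names `grp.mean`, `grp.sd` and `fit2` are assumptions made for this illustration.

```r
# Sketch: compare spread with level by group, then refit on the log scale.
treat <- factor(rep(1:6, each = 6), labels = c("V", "M", "a", "b", "c", "d"))
TXB2 <- c(196.85, 124.40, 91.20, 328.05, 268.30, 214.70,
          2.08, 1.97, 4.80, 5.01, 2.52, 9.35,
          315.85, 75.60, 322.80, 212.15, 42.95, 111.90,
          127.95, 81.75, 52.70, 352.85, 198.80, 107.65,
          83.19, 66.80, 81.15, 39.00, 61.96, 87.00,
          74.48, 60.00, 77.00, 42.00, 48.95, 66.30)

# Group means and standard deviations on the original scale
grp.mean <- tapply(TXB2, treat, mean)
grp.sd   <- tapply(TXB2, treat, sd)
print(cbind(mean = grp.mean, sd = grp.sd))

# One-way ANOVA on log-transformed values (natural log assumed here)
LTXB2 <- log(TXB2)
fit2 <- aov(LTXB2 ~ treat)
summary(fit2)
```

The same diagnostic plots can then be redrawn with `fit2` in place of `fit1`.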
SJS SDI_3 19
[Figure: the same four diagnostic plots for the log-transformed data, LTXB2. Panels: treat against LTXB2; histogram of residuals; residuals against fitted values; normal QQ plot of residuals (empirical against theoretical quantiles).]
SJS SDI_3 20
Exp_2: A Simple Design Problem (The Simplest)
• You have N experimental units in total
• They are completely exchangeable
• You have two treatments A and B
  – with no prior knowledge of their effects
• You wish to compare A and B
  – continuous outcome assumed Normal
• How many units for A and for B?
SJS SDI_3 21
Solution is obvious
• Allocate half the units to one treatment and half to the other
  – Assuming that there is an even number of units
• However, we should go through the design cycle
• What sort of data will we collect?
• What will we do with them?
SJS SDI_3 22
Basic Design Cycle
• Objective
• Tentative Design
• Potential Data
• Possible Analysis
• Possible Conclusions
• Relevant factors
SJS SDI_3 23
The Anticipated Data
• Two mean outcomes
• Variances expected to be the same
  – An assumption, but
    • Reasonable under the null hypothesis
    • No other assumption is more reasonable given that we know nothing about the treatments
• We will calculate the contrast between these means
SJS SDI_3 24
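$$\hat{\tau} = \bar{Y}_1 - \bar{Y}_2$$
$$\operatorname{var}(\hat{\tau}) = \sigma^2 (1/n_1 + 1/n_2)$$
$$n_1 + n_2 = N$$
$$f = \sigma^2 (1/n_1 + 1/n_2) + \lambda (n_1 + n_2 - N)$$
$$\frac{df}{d\lambda} = n_1 + n_2 - N$$
$$\frac{df}{dn_1} = -\sigma^2/n_1^2 + \lambda, \qquad \frac{df}{dn_2} = -\sigma^2/n_2^2 + \lambda$$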
SJS SDI_3 25
Now set the derivatives equal to zero
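$$n_1 + n_2 - N = 0 \qquad (1)$$
$$-\sigma^2/n_1^2 + \lambda = 0 \qquad (2)$$
$$-\sigma^2/n_2^2 + \lambda = 0 \qquad (3)$$
From (2) and (3) we have
$$\sigma^2/n_1^2 = \sigma^2/n_2^2$$
$$n_1 = n_2$$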
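The equal-allocation result can also be checked numerically. The R sketch below uses an illustrative total of N = 36 units (a value chosen for the example, not given for Exp_2) and searches over all possible splits.

```r
# Numerical check of the Exp_2 result for an illustrative N.
N <- 36
n1 <- 1:(N - 1)   # candidate sizes for the first group
n2 <- N - n1      # the remainder goes to the second group

# Variance of the contrast in units of sigma^2
var.factor <- 1 / n1 + 1 / n2

# Group size that minimises the variance
best <- n1[which.min(var.factor)]
best
```

The minimising value of `n1` is N/2, in agreement with the derivation.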
SJS SDI_3 26
So What!!??
• Solution is obvious
• Statistical theory does not seem to have helped us very much
• However, this was a trivial problem
• We now try a slightly more complicated experiment
• This leads to a non-trivial problem
SJS SDI_3 27
Exp_3: A More Complicated Case
• Now suppose that we are comparing k experimental treatments to a single control.
• The treatments will not be compared to each other.
• How many units should we allocate to each treatment?
  – We assume that variances do not vary with treatment: homoscedasticity
SJS SDI_3 28
Exp_3 Continued
• Arguments of symmetry suggest the active treatments be given to the same number of units, say n.
• Suppose that m units will be allocated the control.
• With N units in total we have N = m + kn
SJS SDI_3 29
We consider the variance of a typical contrast
$$\sigma^2/m + \sigma^2/n$$
Incorporating the necessary constraint using a Lagrange multiplier, we obtain the following objective function
$$f = \sigma^2/m + \sigma^2/n + \lambda (N - m - nk)$$
and proceed to minimise this by setting the partial derivatives with respect to m, n and λ equal to zero. (Note that we assume that k and N are fixed in the design specification.)
SJS SDI_3 30
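Set derivatives equal to zero.
Differentiation gives
$$df/d\lambda = N - m - nk$$
$$df/dm = -\sigma^2/m^2 - \lambda$$
$$df/dn = -\sigma^2/n^2 - \lambda k$$
Setting equal to zero we have
$$N = m + nk \qquad (4)$$
$$\lambda = -\sigma^2/m^2 \qquad (5)$$
$$\lambda = -\sigma^2/(kn^2) \qquad (6)$$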
SJS SDI_3 31
From (5) and (6) we have
$$kn^2 = m^2 \;\Rightarrow\; n = m/\sqrt{k}$$
Substituting in (4) we have
$$N = m + km/\sqrt{k}$$
$$N = m (1 + \sqrt{k})$$
$$m = \frac{N}{1 + \sqrt{k}}$$
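As a sanity check on the closed-form solution, the R sketch below compares it with a brute-force search over integer group sizes. The values N = 90 and k = 4 are illustrative assumptions, picked so that the formula returns whole numbers.

```r
# Numerical check of m = N / (1 + sqrt(k)) for illustrative values of N and k.
N <- 90
k <- 4

m.formula <- N / (1 + sqrt(k))     # closed-form size of the control group
n.formula <- m.formula / sqrt(k)   # size of each experimental group

# Brute-force search over integer n, with m fixed by the constraint N = m + k*n
n <- 1:floor((N - 1) / k)
m <- N - k * n
var.factor <- 1 / m + 1 / n        # contrast variance in units of sigma^2

i <- which.min(var.factor)
c(m.formula = m.formula, n.formula = n.formula,
  m.search = m[i], n.search = n[i])
```

For these values the search and the formula agree.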
SJS SDI_3 32
Check
• Exp_2 was a special case of Exp_3 with k = 1
• So our general solution must give the same answer as the special case when k = 1
• Indeed, when k = 1 the formula yields m = N/2, which is the solution we reached before
SJS SDI_3 33
Allocation as a function of number of experimental treatments
[Figure: proportion of units on control (vertical axis, 0.0 to 0.6) against number of experimental treatments (horizontal axis, 2 to 8). Legend: naive = *, optimal = #.]
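The two curves can be recomputed as follows. This R sketch assumes that "naive" means equal allocation across all k + 1 arms, which the slides do not state explicitly; "optimal" uses m = N/(1 + √k). The range k = 1 to 9 is likewise an assumption read off the plot.

```r
# Proportion of units on control for k = 1, ..., 9 experimental treatments.
k <- 1:9

# ASSUMPTION: "naive" is equal allocation to all k + 1 arms
naive <- 1 / (k + 1)

# From m = N / (1 + sqrt(k)), expressed as a proportion of N
optimal <- 1 / (1 + sqrt(k))

print(round(cbind(k, naive, optimal), 3))

# Redraw the comparison with the same plotting symbols as the legend
plot(k, naive, pch = "*", ylim = c(0, 0.6),
     xlab = "Number of experimental treatments",
     ylab = "Proportion of units on control")
points(k, optimal, pch = "#")
legend("topright", legend = c("naive", "optimal"), pch = c("*", "#"))
```

Under that assumption the two allocations coincide at k = 1, and the optimal proportion on control is the larger of the two for every k above 1.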
SJS SDI_3 34
Exp_3 Concluded
• The “optimal solution” was not easy to guess
• It allocates more units to the control than to each experimental treatment
• Lesson: be careful!
SJS SDI_3 35
Questions
• What are the practical problems in implementing the solution we found for Exp_3?
• Why might this not be a good solution after all?
• Are there any implications for the design of Exp_1?