UCLA Statistics — Stat 201A lecture notes (hqxu/stat201a/ch1-4page.pdf)
Stats 201A Research Design, Sampling and Analysis
Part I. Research Design
• Textbook: Montgomery (2005+). Design and Analysis of Experiments.
• Reference: Wu and Hamada (2009). Experiments: Planning, Analysis and Optimization.
• Reference: Faraway (2005). Linear Models with R.
Chapter 1 Introduction
Topics: Basic principles, guidelines, history
1.1 Strategy of Experimentation
• one of the most common activities
• covers a wide range of applications
• used to understand and/or improve a system or a process
• deliberately apply some treatments to observe the change
All experiments are designed experiments; some are poorly designed, some are well designed.
1.2 Some typical applications of experimental designs
1. Comparing treatments
2. Screening variables
3. Characterizing a process
4. Optimizing a process
1.3 Basic Principles: replication, randomization and blocking
1. Replication means that each treatment is applied to different experimental units.
• enables the estimation of the magnitude of experimental error.
• decreases the variance of the treatment effect estimates.
• distinction between replicates and repetitions (repeated measurements on the same unit).
The estimator from replicates has a smaller variance. (Why?)
2. Randomization should be applied to
• the allocation of units to treatments,
• the order in which the treatments are applied in performing the experiment,
• the order in which the responses are measured.
It provides protection against unknown variables.
It prevents subjective assignment.
It provides a basis for inference in analyzing the experiments.
3. Blocking deals with nuisance factors (factors that have significant effects on the response but are not of interest).
• A block is a group of homogeneous units.
• For blocking to be effective, the units should be arranged so that the within-block variation is much smaller than the between-block variation.
Block what you can and randomize what you cannot.
An Example
To compare 2 wines A and B. Suppose 10 people are available.
• What response to measure?
Design 1: Let people taste A or B randomly by flipping a coin.
Design 2: Randomly choose 5 people to taste A and other 5 to taste B.
Design 3: Have all 10 people taste both A and B. Flip a coin to decide the order of A and B.
• How are the principles applied in the designs?
• Which design will you recommend?
1.4 Guidelines for designing experiments
1. Recognition of and statement of the problem.
2. Choice of factors, levels and range
• A factor is a variable that is studied in the experiment.
• These values are referred to as levels or settings.
• A treatment is a combination of factor levels.
• Design factors vs. nuisance factors.
• Factors may be quantitative or qualitative.
3. Selection of the response variable
• Make sure that the variable really provides useful information.
• Responses may be discrete or continuous.
• Continuous responses are generally preferable.
4. Choice of experimental design
• Main topic
• A poor design captures little information, which no analysis can rescue.
• The results may be obvious for a well-planned experiment.
5. Performing the experiment
• Use a planning matrix (with actual values or settings of the factors)
• Monitor the process carefully
6. Statistical analysis of the data
• Another main topic
• Graphs, models, hypotheses tests, diagnostics
7. Conclusions and recommendations
• A confirmation experiment is worthwhile to validate the conclusions.
1.5 A Brief History of Statistical Design
1. R.A. Fisher in the 1930’s.
2. G.E.P. Box in the 1950’s.
3. G. Taguchi in mid-1980’s.
4. Modern era, since 1990
1.6 Summary: Using statistical techniques in experimentation
1. Use your nonstatistical knowledge of the problem
2. Keep the design and analysis as simple as possible
3. Recognize the di↵erence between practical and statistical significance
4. Experiments are usually iterative
Chapter 2 Simple Comparative Experiments
Topics: Paired comparison designs
I go over Section 2.5 only, but you are expected to understand the whole chapter.
2.5 Paired Comparison Design
The drink experiment:
• Goal: to compare two drinks: pepsi and coke
• students are asked to taste both drinks and assign scores
• scores: 1 (very bad)–9 (excellent)
• the order of drinking pepsi and coke is determined by flipping a coin
                        Students                          Standard
Drink    1   2   3   4   5   6   7   8   9    Average    Deviation
coke     5   7   5   9   7   3   3   8   8     6.11        2.20
pepsi    6   5   8   7   4   8   2   6   7     5.89        1.96
diff    -1   2  -3   2   3  -5   1   2   1     0.22        2.68
This is a paired comparison design because each student tastes both drinks.
• For paired designs, analyze the differences.
Think about: What would be an unpaired design?
• An unpaired design is a completely randomized design
Hypotheses: H0: µ1 = µ2 vs. H1: µ1 ≠ µ2.
The paired t test:

t_paired = d̄/(sd/√n),

where dj = y1j − y2j are the differences, and d̄ and sd are the sample mean and standard deviation of the differences.

Under H0, t_paired has a t distribution with df = n − 1.

• reject H0 at level α if |t_paired| > tα/2,n−1.
For the drink experiment,

t_paired = 0.22/(2.68/√9) = 0.22/0.89 = 0.25.

• Accept H0 at the 5% level because 0.25 < t.05/2,8 = 2.306.
• P-value = Prob(|t8| > 0.25) = 0.81.
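The course demos use R, but the arithmetic above is easy to verify numerically; here is a quick Python/scipy sketch, added purely as an illustration:

```python
import numpy as np
from scipy import stats

# Scores from the drink experiment (9 students taste both drinks)
coke  = np.array([5, 7, 5, 9, 7, 3, 3, 8, 8])
pepsi = np.array([6, 5, 8, 7, 4, 8, 2, 6, 7])

# Paired analysis: work with the differences d_j = y_1j - y_2j
d = coke - pepsi
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# scipy's paired test computes the same statistic and two-sided p-value
t_scipy, p_value = stats.ttest_rel(coke, pepsi)
print(round(t_paired, 2), round(p_value, 2))  # 0.25 0.81
```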
The unpaired t test: (one of the two-sample t tests in Section 2.4)

t_unpaired = (ȳ1 − ȳ2)/(sp √(1/n1 + 1/n2)),    sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2),

where ȳi and si² are the sample mean and variance for the ith treatment. Under H0, t_unpaired has a t distribution with df = 2n − 2 (with equal sample sizes n1 = n2 = n, and assuming equal population variances σ1² = σ2²).

• reject H0 at level α if |t_unpaired| > tα/2,2n−2.
A wrong analysis: using the unpaired test,

t_unpaired = 0.22/√((2.20² + 1.96²)/9) = 0.22/0.98 = 0.22.

• Still accept H0 at the 5% level because 0.22 < t.05/2,16 = 2.120.
• P-value = Prob(|t16| > 0.22) = 0.83.
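The same numerical check for the (incorrect) unpaired analysis, again a Python/scipy sketch added for illustration:

```python
import numpy as np
from scipy import stats

coke  = np.array([5, 7, 5, 9, 7, 3, 3, 8, 8])
pepsi = np.array([6, 5, 8, 7, 4, 8, 2, 6, 7])

# Pooled two-sample t test (equal variances assumed), df = 2n - 2 = 16
t_unpaired, p_value = stats.ttest_ind(coke, pepsi, equal_var=True)
print(round(t_unpaired, 2), round(p_value, 2))  # 0.23 0.82
```

The hand calculation rounds intermediate values (0.22/0.98), which is why scipy reports 0.226 rather than 0.22; the p-value agrees with the 0.8243 shown in the unpaired ANOVA output.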
Which design is more powerful? It depends:

• If there is large unit-to-unit variation, a paired design is more effective.
• Otherwise, an unpaired design is more effective.

Recall: For blocking to be effective, the units should be arranged so that the within-block variation is much smaller than the between-block variation.
Alternative analysis using ANOVA and F test.
• A paired design is a randomized block design with blocks of size two.
The ANOVA table is
Df Sum Sq Mean Sq F value Pr(>F)
student 8 41.000 5.125 1.4247 0.3142
drink 1 0.222 0.222 0.0618 0.8100
Residuals 8 28.778 3.597
Neither the treatment variable (drink) nor the blocking variable (student) is significant.
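The block-design ANOVA entries can be reproduced directly from the sums-of-squares decomposition; the Python sketch below is an added illustration:

```python
import numpy as np

coke  = np.array([5, 7, 5, 9, 7, 3, 3, 8, 8])
pepsi = np.array([6, 5, 8, 7, 4, 8, 2, 6, 7])
y = np.vstack([coke, pepsi])   # rows = drinks (k = 2), columns = students (b = 9)
k, b = y.shape
grand = y.mean()

ss_drink   = b * ((y.mean(axis=1) - grand) ** 2).sum()   # treatment SS
ss_student = k * ((y.mean(axis=0) - grand) ** 2).sum()   # block SS
ss_resid   = ((y - grand) ** 2).sum() - ss_drink - ss_student

# F for the treatment effect; note F = t_paired^2
F_drink = (ss_drink / (k - 1)) / (ss_resid / ((k - 1) * (b - 1)))
print(round(ss_drink, 3), round(ss_student, 1), round(ss_resid, 3), round(F_drink, 4))
# 0.222 41.0 28.778 0.0618
```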
If the experiment is incorrectly analyzed as an unpaired design, the ANOVA table is
Df Sum Sq Mean Sq F value Pr(>F)
drink 1 0.222 0.222 0.051 0.8243
Residuals 16 69.778 4.361
Note the connection between F and t tests:
• F = t²
Residual analysis. What is the linear model?
What are the assumptions?
Are the assumptions reasonable?
Think again: Is pairing necessary for the drink experiment? Why?
[Figures: scores by drink (coke vs. pepsi) and by student; residuals vs. fitted values; normal Q-Q plot of standardized residuals.]
Chapter 3 Experiments With a Single Factor
Topics: Analysis of Variance (ANOVA), constraints, multiple comparison
The fiber experiment:
• Formulation of a new “synthetic” fiber to be used to make cloth for shirts.
• The response variable is tensile strength.
• To determine the “best” level of cotton (in weight %) to combine with the synthetics.
• Cotton weight content can vary between 10–40%.
• Chooses 5 levels of cotton weight content: 15, 20, 25, 30, and 35%.
• The experiment is replicated 5 times.
• Runs made in random order.
Weight     Tensile Strength       Totals   Averages
15          7   7  11  15   9       49       9.8
20         12  17  12  18  18       77      15.4
25         14  18  18  19  19       88      17.6
30         19  25  22  19  23      108      21.6
35          7  10  11  15  11       54      10.8
This is a completely randomized design, also called one-way layout.
Q: How are the principles applied here?
Q: How many ways to assign the five treatments to experimental units?
Q: How many ways to run the experiments?
Data for the one-way layout (one factor with k levels):

Treatment (level)    Observations               Totals   Averages
1                    y11  y12  ...  y1n1         y1·      ȳ1·
2                    y21  y22  ...  y2n2         y2·      ȳ2·
...                  ...                         ...      ...
k                    yk1  yk2  ...  yknk         yk·      ȳk·
The means model is:

yij = µi + εij,    i = 1, ..., k;  j = 1, ..., ni,    (1)

where yij is the jth observation with treatment i, µi is the mean of the ith treatment (or factor level), εij is a random error, k is the number of treatments, and ni is the number of observations with treatment i.

Assume: εij ~ NID(0, σ²) (i.i.d. normal errors).

The linear model (or effects model), obtained by substituting µi = µ + τi, is:

yij = µ + τi + εij,    i = 1, ..., k;  j = 1, ..., ni,    (2)

where µ is the overall mean and τi is the ith treatment effect.

In terms of the general linear model

y = Xβ + ε,

y = (7, 12, 14, 19, 7, 7, 17, 18, 25, 10, 11, 12, 18, 22, 11, 15, 18, 19, 19, 15, 9, 18, 19, 23, 11)^T;
β = (µ, τ1, τ2, τ3, τ4, τ5)^T,
X = (X0; X0; X0; X0; X0) (five copies of X0 stacked, one per replicate), where

X0 =
1 1 0 0 0 0
1 0 1 0 0 0
1 0 0 1 0 0
1 0 0 0 1 0
1 0 0 0 0 1    (3)
However, X^T X is singular and (X^T X)^{-1} does not exist.
• The model (2) is over-parameterized.
• Need one constraint on the parameters.
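The singularity is easy to see numerically; a small numpy sketch (illustrative):

```python
import numpy as np

# Over-parameterized model matrix of eq. (3): k = 5 treatments, 5 replicates;
# an intercept column plus one indicator column per treatment.
X0 = np.hstack([np.ones((5, 1)), np.eye(5)])
X = np.tile(X0, (5, 1))                 # 25 x 6

# The intercept column equals the sum of the five indicator columns,
# so the 6 x 6 matrix X'X has rank 5 and is singular.
rank = np.linalg.matrix_rank(X.T @ X)
print(X.shape, rank)                    # (25, 6) 5
```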
The Zero-Sum Constraint is

Σ_{i=1}^k τi = 0.    (4)

• Can drop any τi, usually τk.
• β = (µ, τ1, τ2, ..., τ_{k−1})^T.
• Note τk = −Σ_{i=1}^{k−1} τi.
• Be careful about the model matrix.
For the fiber experiment,

β = (µ, τ1, τ2, τ3, τ4)^T,

X = (X0; X0; X0; X0; X0), where

X0 =
1  1  0  0  0
1  0  1  0  0
1  0  0  1  0
1  0  0  0  1
1 −1 −1 −1 −1    (5)

β̂ = (µ̂, τ̂1, τ̂2, τ̂3, τ̂4)^T = (15.04, −5.24, 0.36, 2.56, 6.56)^T.
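As a check, ordinary least squares with this constrained model matrix reproduces the estimates; a numpy sketch added for illustration (the response vector is the one given above):

```python
import numpy as np

# Model matrix X0 of eq. (5) under the zero-sum constraint (last treatment dropped)
X0 = np.array([[1,  1,  0,  0,  0],
               [1,  0,  1,  0,  0],
               [1,  0,  0,  1,  0],
               [1,  0,  0,  0,  1],
               [1, -1, -1, -1, -1]], dtype=float)
X = np.tile(X0, (5, 1))      # 25 x 5, five replicates

# Fiber-experiment responses, in the same run order as the model matrix
y = np.array([7, 12, 14, 19, 7,   7, 17, 18, 25, 10,
              11, 12, 18, 22, 11, 15, 18, 19, 19, 15,
              9, 18, 19, 23, 11], dtype=float)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))     # [15.04 -5.24  0.36  2.56  6.56]
# The dropped effect is tau_5 = -(tau_1 + ... + tau_4) = -4.24 = 10.8 - 15.04
```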
Interpretation of the parameters and estimates

• (1/k) Σ_{i=1}^k E(yij) = (1/k) Σ_{i=1}^k (µ + τi) = µ + 0 = µ
• E(yij) − µ = µ + τi − µ = τi for i ≥ 1.

In words, µ represents the grand mean, and τi is the offset between the expected treatment i response and the average response.

The estimates under the zero-sum constraint are

• µ̂ = ȳ··
• τ̂i = ȳi· − ȳ·· for i = 1, ..., k.
Notation:

ȳi· = (1/ni) Σ_{j=1}^{ni} yij,    ȳ·· = (1/N) Σ_{i=1}^k Σ_{j=1}^{ni} yij,    N = Σ_{i=1}^k ni.
Remarks

• There are constraints other than the zero-sum constraint.
• The model matrix X and the LSE β̂ depend on the choice of constraint;
• the fitted values ŷ = Xβ̂ and residuals y − ŷ do not.
• The difference τ̂i − τ̂j does not, either.
In R, a baseline or treatment constraint is the default.
options(contrasts=c("contr.treatment", "contr.poly")) # default baseline constraints
options(contrasts=c("contr.sum", "contr.poly")) # specify zero-sum constraints
The estimates under the baseline constraint (τ1 = 0) are

• µ̂ = ȳ1·
• τ̂i = ȳi· − ȳ1· for i = 1, ..., k.
Hypotheses and ANOVA
• H0: there is no difference between the treatments, i.e., µ1 = · · · = µk (or τ1 = · · · = τk = 0).
• H1: there is a difference between the treatments, i.e., not all µi are the same.
The ANOVA table for the one-way layout is

Source      Degrees of Freedom   Sum of Squares                                   Mean Squares            F
treatment   k − 1                SSTr = Σ_{i=1}^k ni (ȳi· − ȳ··)²                 MSTr = SSTr/(k − 1)     MSTr/MSE
residual    N − k                SSE = Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳi·)²        MSE = SSE/(N − k)
total       N − 1                SSTotal = Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳ··)²

where N = Σ_{i=1}^k ni is the total number of observations.

The ANOVA for the one-way layout can be derived directly using the decomposition:

Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳ··)² = Σ_{i=1}^k ni (ȳi· − ȳ··)² + Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳi·)²

SSTotal = SSTr + SSE.
The treatment sum of squares (SS) is also called the between-treatment SS, and the residual SS is called the within-treatment SS.
Expected Mean Squares

E(MSE) = σ²,    E(MSTr) = σ² + (1/(k − 1)) Σ_{i=1}^k ni (τi − τ̄)²
The F statistic

F = MSTr/MSE = [Σ_{i=1}^k ni (ȳi· − ȳ··)²/(k − 1)] / [Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳi·)²/(N − k)],    (6)

has an F distribution with parameters k − 1 and N − k under H0.

• Reject H0 at the α level if F > Fk−1,N−k,α.
For the fiber experiment, the ANOVA table is

Source     DF    SS       MS       F
cotton      4    475.76   118.94   14.757
residual   20    161.20     8.06
total      24    636.96
• F = 14.757 > F4,20,.05 = 2.87
• Decision: reject H0 at the 5% level
• Conclusion: the treatments are different.
• P-value = P(F4,20 ≥ 14.757) < .001.
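The one-way ANOVA table can be rebuilt directly from the data; a Python sketch added for illustration:

```python
import numpy as np
from scipy import stats

# Fiber data: rows = cotton weight levels (15, 20, 25, 30, 35), 5 replicates each
y = np.array([[7, 7, 11, 15, 9],
              [12, 17, 12, 18, 18],
              [14, 18, 18, 19, 19],
              [19, 25, 22, 19, 23],
              [7, 10, 11, 15, 11]], dtype=float)
k, n = y.shape
N = k * n
grand = y.mean()

ss_tr = n * ((y.mean(axis=1) - grand) ** 2).sum()   # between-treatment SS
ss_e  = ((y - grand) ** 2).sum() - ss_tr            # within-treatment SS
F = (ss_tr / (k - 1)) / (ss_e / (N - k))
p = stats.f.sf(F, k - 1, N - k)
print(round(ss_tr, 2), round(ss_e, 2), round(F, 3))  # 475.76 161.2 14.757
```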
Once H0 is rejected, an immediate question is: which pairs of treatments are different? Note that we are testing more than one pair; that is, we are doing multiple comparisons. See the text for various methods of multiple comparisons (Section 3.5).
Multiple Comparisons
Suppose that the null hypothesis H0: µ1 = · · · = µk is rejected. An immediate question is to determine which pairs of treatments are significantly different.
The t statistic to test µi = µj vs. µi ≠ µj is

tij = (point estimate)/(s.e. of point estimate) = (ȳj· − ȳi·)/(σ̂ √(1/nj + 1/ni)),    (7)

where σ̂ = √MSE (has N − k df). Reject µi = µj at level α if

|tij| > tN−k,α/2.    (8)
• This test is valid for testing one pair of treatments,
• but it cannot be applied to multiple comparisons of treatments.
• EER (experimentwise error rate) = the probability of declaring at least one pair of treatments significantly different under H0.
• EER > α when k0 > 1 comparisons are made.
The Tukey Method declares “µi different from µj” if

|tij| > (1/√2) qk,N−k,α,    (9)

where qk,N−k,α is the upper α quantile of the Studentized range distribution with parameters k and N − k degrees of freedom.

• For the balanced one-way layout (i.e., ni = n), the EER is exactly α.
The simultaneous confidence intervals for µi − µj are

ȳi· − ȳj· ± (1/√2) qk,N−k,α σ̂ √(1/ni + 1/nj),    (10)

for all i and j pairs.

For the fiber experiment, the t statistics are

1 vs. 2   1 vs. 3   1 vs. 4   1 vs. 5   2 vs. 3   2 vs. 4   2 vs. 5   3 vs. 4   3 vs. 5   4 vs. 5
 3.12      4.34      6.57      0.56      1.23      3.45     −2.56      2.23     −3.79     −6.01
• (1/√2) qk,N−k,0.05 = (1/√2) q5,20,0.05 = 4.23/√2 = 2.99
• six pairs (1 vs. 2, 1 vs. 3, 1 vs. 4, 2 vs. 4, 3 vs. 5, 4 vs. 5) are different.
Note qk,N−k,α (or qα(p, f)) can be found in tables in Appendix VII (pp. 621-622) for α = .05 and .01. In R, use
> qtukey(1-.05, 5, 20)
[1] 4.231857
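If you prefer Python, scipy (version 1.7 or later) exposes the same Studentized range distribution; a small illustrative sketch:

```python
# Upper 5% quantile q_{5,20,0.05}, matching R's qtukey(0.95, 5, 20)
from scipy.stats import studentized_range

q = studentized_range.ppf(0.95, k=5, df=20)
print(round(q, 2), round(q / 2 ** 0.5, 2))  # 4.23 2.99
```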
• There are several other related procedures; see Section 3.5.
Regression Models
In the fiber experiment, the factor (cotton weight) is quantitative and the levels are evenly spaced (15, 20, 25, 30 and 35). The experiment provides strong evidence that cotton weight percent affects the tensile strength of the fiber. A natural question is to model the relationship between strength and cotton weight percent. This can be done with regression.

We can fit a polynomial model in x to the data, say

y = β0 + β1x + β2x² + · · · + βp x^p + ε.    (11)
We can start with a small p (say p = 1). If the model does not fit the data, we can increase p. Alternatively, we can start with a large p (say p = 4 here) and then drop insignificant terms. The fitted models with p = 1, 2, 3, 4 are:

y = 10.940 + 0.164x    (12)
y = −39.9886 + 4.5926x − 0.0886x²    (13)
y = 62.6114 − 9.0114x + 0.4814x² − 0.0076x³    (14)
y = −406.4000 + 73.7767x − 4.8077x² + 0.1377x³ − 0.001453x⁴    (15)

where x is in the original units (i.e., actual factor levels 15, 20, 25, 30 and 35). We can use general ANOVA to compare and select models (not shown here). At the 5% level, I will reject the linear model (12) and quadratic model (13) but accept the cubic model (14).
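The fitted polynomials can be reproduced with a least squares fit; a numpy sketch for the linear model (12), added for illustration:

```python
import numpy as np

# 25 runs: each cotton weight level is replicated 5 times
x = np.repeat([15.0, 20.0, 25.0, 30.0, 35.0], 5)
y = np.array([7, 7, 11, 15, 9,   12, 17, 12, 18, 18,
              14, 18, 18, 19, 19, 19, 25, 22, 19, 23,
              7, 10, 11, 15, 11], dtype=float)

# np.polyfit returns coefficients with the highest degree first
b1, b0 = np.polyfit(x, y, 1)
print(round(b0, 3), round(b1, 3))   # 10.94 0.164
# Higher-degree fits, e.g. np.polyfit(x, y, 3), give the cubic model (14)
```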
Remarks
• Be cautious of overfitting.
• Polynomials of degree four or higher should not be used (unless they can be justified by a physical model).
Do not forget Residual Analysis and Diagnostics
Are the assumptions reasonable?
[Figures: scatterplot of tensile strength vs. cotton weight with fitted curves; residuals vs. fitted values; normal Q-Q plot of standardized residuals.]
Chapter 4 Randomized Blocks, Latin Squares, and Related Designs
Topics: Randomized Block Designs
4.1 The Randomized Complete Block Design
The hardness testing experiment
• To determine whether 4 different tips produce different hardness readings on a Rockwell hardness tester
• Assignment of the tips to an experimental unit; that is, a test coupon
• The test coupons are a source of nuisance variability (need for blocking)
• Assign all 4 tips to each coupon
• Each coupon is called a block; that is, it is a more homogeneous experimental unit on which to test the tips
• Variability between blocks can be large; variability within a block should be relatively small
• All runs within a block are randomized
          Coupon
Tip     1     2     3     4     Averages
1      9.3   9.4   9.6  10.0     9.575
2      9.4   9.3   9.8   9.9     9.600
3      9.2   9.4   9.5   9.7     9.450
4      9.7   9.6  10.0  10.2     9.875
This is a randomized (complete) block design, with one experimental factor of k = 4 treatments and b = 4 blocks of size k.
• A paired comparison design is a randomized block design with k = 2.
The linear model (or effects model) for the randomized block design is

yij = µ + τi + βj + εij,    i = 1, ..., k;  j = 1, ..., b,    (16)

where yij represents the observation of the ith treatment in the jth block, τi is the ith treatment effect, βj is the jth block effect, and the εij are NID(0, σ²) errors.

• Constraints on the parameters are necessary.
• The zero-sum constraints are

Σ_{i=1}^k τi = Σ_{j=1}^b βj = 0.
The ANOVA table for the randomized block design is

Source      DF               SS        MS
Treatment   k − 1            SSTr      MSTr = SSTr/(k − 1)
Block       b − 1            SSB       MSB = SSB/(b − 1)
Residual    (b − 1)(k − 1)   SSE       MSE = SSE/((b − 1)(k − 1))
Total       bk − 1           SSTotal
The ANOVA decomposition:

Σ_{i=1}^k Σ_{j=1}^b (yij − ȳ··)² = Σ_{i=1}^k b(ȳi· − ȳ··)² + Σ_{j=1}^b k(ȳ·j − ȳ··)² + Σ_{i=1}^k Σ_{j=1}^b (yij − ȳi· − ȳ·j + ȳ··)²

SSTotal = SSTr + SSB + SSE.
The null hypothesis of no treatment effect difference is H0: τ1 = · · · = τk. Under H0,

F = MSTr/MSE = [SSTr/(k − 1)] / [SSE/((b − 1)(k − 1))]

has an F distribution with DF k − 1 and (b − 1)(k − 1).

• reject H0 at level α if F > Fk−1,(b−1)(k−1),α.
If H0 is rejected, multiple comparisons of the τi should be performed. The t statistics for making multiple comparisons are

tij = (ȳj· − ȳi·)/(σ̂ √(1/b + 1/b)),    (17)

where σ̂² = MSE. Under H0: τ1 = · · · = τk, each tij has a t distribution with DF = (b − 1)(k − 1).

At level α, the Tukey multiple comparison method identifies “treatments i and j as different” if

|tij| > (1/√2) qk,(b−1)(k−1),α.

The simultaneous CIs for τj − τi based on the Tukey method are

ȳj· − ȳi· ± qk,(b−1)(k−1),α σ̂/√b    (18)

for all i and j pairs.

For the hardness testing experiment, the ANOVA table is
Df Sum Sq Mean Sq F value Pr(>F)
tip 3 0.38500 0.12833 14.438 0.0008713 ***
coupon 3 0.82500 0.27500 30.938 4.523e-05 ***
Residuals 9 0.08000 0.00889
The small p-values suggest that the tips are different and the coupons are also different. Here blocking is effective.
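The randomized-block ANOVA for the hardness data can be rebuilt from the SS decomposition; a Python sketch added for illustration:

```python
import numpy as np

# rows = tips (treatments, k = 4), columns = coupons (blocks, b = 4)
y = np.array([[9.3, 9.4,  9.6, 10.0],
              [9.4, 9.3,  9.8,  9.9],
              [9.2, 9.4,  9.5,  9.7],
              [9.7, 9.6, 10.0, 10.2]])
k, b = y.shape
grand = y.mean()

ss_tip    = b * ((y.mean(axis=1) - grand) ** 2).sum()   # treatment SS
ss_coupon = k * ((y.mean(axis=0) - grand) ** 2).sum()   # block SS
ss_e      = ((y - grand) ** 2).sum() - ss_tip - ss_coupon

F_tip = (ss_tip / (k - 1)) / (ss_e / ((k - 1) * (b - 1)))
print(round(ss_tip, 3), round(ss_coupon, 3), round(ss_e, 3), round(F_tip, 2))
```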
If the analysis wrongly ignores blocking, the ANOVA table would be
Df Sum Sq Mean Sq F value Pr(>F)
tip 3 0.38500 0.12833 1.7017 0.2196
Residuals 12 0.90500 0.07542
The wrong conclusion would be that there is no treatment effect.
Multiple comparisons for the treatment (tip) means. The t statistics are

1 vs. 2   1 vs. 3   1 vs. 4   2 vs. 3   2 vs. 4   3 vs. 4
 0.375    −1.875     4.500    −2.250     4.125     6.375
The Tukey method:

• (1/√2) q4,9,0.05 = 4.41/√2 = 3.12
• 3 pairs (1 vs. 4, 2 vs. 4, 3 vs. 4) are significantly different at the 5% level
Diagnostics: Are the assumptions reasonable?
[Figures: hardness readings by coupon and by tip; mean plots of y by coupon and by tip; residuals vs. fitted values; normal Q-Q plot of standardized residuals.]