UCLA Statistics — Stat 201A lecture notes (hqxu/stat201a/ch1-4page.pdf)
Stats 201A Research Design, Sampling and Analysis
Part I. Research Design
• Textbook: Montgomery (2005+). Design and Analysis of Experiments.
• Reference: Wu and Hamada (2009). Experiments: Planning, Analysis and Optimization.
• Reference: Faraway (2005). Linear Models with R.
Chapter 1 Introduction
Topics: Basic principles, guidelines, history
1.1 Strategy of Experimentation
• one of the most common activities
• covers a wide range of applications
• used to understand and/or improve a system or a process
• deliberately apply some treatments to observe the change
All experiments are designed experiments; some are poorly designed, some are well designed.
1.2 Some typical applications of experimental designs
1. Comparing treatments
2. Screening variables
3. Characterizing a process
4. Optimizing a process
1.3 Basic Principles: replication, randomization and blocking
1. Replication means that each treatment is applied to different experimental units.
• enables the estimation of the magnitude of experimental error.
• decreases the variance of the treatment effect estimates.
• distinction between replicates and repetitions (repeated measurements on the same unit).
The estimator from replicates has a smaller variance. (Why?)
2. Randomization should be applied to
• the allocation of units to treatments,
• the order in which the treatments are applied in performing the experiment,
• the order in which the responses are measured.
It provides protection against unknown variables.
It prevents subjective assignment.
It provides a basis for inference in analyzing the experiments.
3. Blocking deals with nuisance factors (factors that have significant effects on the response but are not of interest).
• A block is a group of homogeneous units.
• For blocking to be effective, the units should be arranged so that the within-block variation is much smaller than the between-block variation.
Block what you can and randomize what you cannot.
An Example
To compare 2 wines A and B. Suppose 10 people are available.
• What response to measure?
Design 1: Let people taste A or B randomly by flipping a coin.
Design 2: Randomly choose 5 people to taste A and other 5 to taste B.
Design 3: Have all 10 people taste both A and B. Flip a coin to decide the order of A and B.
• How are the principles applied in the designs?
• Which design will you recommend?
1.4 Guidelines for designing experiments
1. Recognition of and statement of the problem.
2. Choice of factors, levels and range
• A factor is a variable that is studied in the experiment.
• These values are referred to as levels or settings.
• A treatment is a combination of factor levels.
• Design factors vs. nuisance factors.
• Factors may be quantitative or qualitative.
3. Selection of the response variable
• Make sure that the variable really provides useful information.
• Responses may be discrete or continuous.
• Continuous responses are generally preferable.
4. Choice of experimental design
• Main topic
• A poor design captures little information, which no analysis can rescue.
• The results may be obvious for a well-planned experiment.
5. Performing the experiment
• Use a planning matrix (with actual values or settings of the factors)
• Monitor the process carefully
6. Statistical analysis of the data
• Another main topic
• Graphs, models, hypotheses tests, diagnostics
7. Conclusions and recommendations
• A confirmation experiment is worthwhile to validate the conclusions.
1.5 A Brief History of Statistical Design
1. R.A. Fisher in the 1930’s.
2. G.E.P. Box in the 1950’s.
3. G. Taguchi in mid-1980’s.
4. Modern era, since 1990
1.6 Summary: Using statistical techniques in experimentation
1. Use your nonstatistical knowledge of the problem
2. Keep the design and analysis as simple as possible
3. Recognize the di↵erence between practical and statistical significance
4. Experiments are usually iterative
Chapter 2 Simple Comparative Experiments
Topics: Paired comparison designs
I go over Section 2.5 only, but you are expected to understand the whole chapter.
2.5 Paired Comparison Design
The drink experiment:
• Goal: to compare two drinks: pepsi and coke
• students are asked to taste both drinks and assign scores
• scores: 1 (very bad)–9 (excellent)
• the order of drinking pepsi and coke is determined by flipping a coin
                        Students                          Standard
Drink    1   2   3   4   5   6   7   8   9    Average    Deviation
coke     5   7   5   9   7   3   3   8   8     6.11        2.20
pepsi    6   5   8   7   4   8   2   6   7     5.89        1.96
diff    -1   2  -3   2   3  -5   1   2   1     0.22        2.68
This is a paired comparison design because each student tastes both drinks.
• For paired designs, analyze the differences.
Think about: What would be an unpaired design?
• An unpaired design is a completely randomized design
Hypotheses: H0: µ1 = µ2 vs. H1: µ1 ≠ µ2.
The paired t test:

t_paired = d̄/(sd/√n),

where dj = y1j − y2j are the differences, and d̄ and sd are the sample mean and standard deviation of the differences.

Under H0, t_paired has a t distribution with df = n − 1.

• reject H0 at level α if |t_paired| > tα/2,n−1.
For the drink experiment,

t_paired = 0.22/(2.68/√9) = 0.22/0.89 = 0.25.

• Accept H0 at the 5% level because 0.25 < t.05/2,8 = 2.306.
• P-value = Prob(|t8| > 0.25) = 0.81.
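The course demos use R, but the arithmetic above is easy to verify numerically; here is a quick Python/scipy sketch, added purely as an illustration:

```python
import numpy as np
from scipy import stats

# Scores from the drink experiment (9 students taste both drinks)
coke  = np.array([5, 7, 5, 9, 7, 3, 3, 8, 8])
pepsi = np.array([6, 5, 8, 7, 4, 8, 2, 6, 7])

# Paired analysis: work with the differences d_j = y_1j - y_2j
d = coke - pepsi
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# scipy's paired test computes the same statistic and two-sided p-value
t_scipy, p_value = stats.ttest_rel(coke, pepsi)
print(round(t_paired, 2), round(p_value, 2))  # 0.25 0.81
```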
The unpaired t test: (one of the two-sample t tests in Section 2.4)

t_unpaired = (ȳ1 − ȳ2)/(sp √(1/n1 + 1/n2)),    sp² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2),

where ȳi and si² are the sample mean and variance for the ith treatment. Under H0, t_unpaired has a t distribution with df = 2n − 2 (with equal sample sizes n1 = n2 = n, and assuming equal population variances σ1² = σ2²).

• reject H0 at level α if |t_unpaired| > tα/2,2n−2.
A wrong analysis: using the unpaired test,

t_unpaired = 0.22/√((2.20² + 1.96²)/9) = 0.22/0.98 = 0.22.

• Still accept H0 at the 5% level because 0.22 < t.05/2,16 = 2.120.
• P-value = Prob(|t16| > 0.22) = 0.83.
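The same numerical check for the (incorrect) unpaired analysis, again a Python/scipy sketch added for illustration:

```python
import numpy as np
from scipy import stats

coke  = np.array([5, 7, 5, 9, 7, 3, 3, 8, 8])
pepsi = np.array([6, 5, 8, 7, 4, 8, 2, 6, 7])

# Pooled two-sample t test (equal variances assumed), df = 2n - 2 = 16
t_unpaired, p_value = stats.ttest_ind(coke, pepsi, equal_var=True)
print(round(t_unpaired, 2), round(p_value, 2))  # 0.23 0.82
```

The hand calculation rounds intermediate values (0.22/0.98), which is why scipy reports 0.226 rather than 0.22; the p-value agrees with the 0.8243 shown in the unpaired ANOVA output.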
Which design is more powerful? It depends:

• If there is large unit-to-unit variation, a paired design is more effective.
• Otherwise, an unpaired design is more effective.

Recall: For blocking to be effective, the units should be arranged so that the within-block variation is much smaller than the between-block variation.
Alternative analysis using ANOVA and F test.
• A paired design is a randomized block design with blocks of size two.
The ANOVA table is
Df Sum Sq Mean Sq F value Pr(>F)
student 8 41.000 5.125 1.4247 0.3142
drink 1 0.222 0.222 0.0618 0.8100
Residuals 8 28.778 3.597
Neither the treatment variable (drink) nor the blocking variable (student) is significant.
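The block-design ANOVA entries can be reproduced directly from the sums-of-squares decomposition; the Python sketch below is an added illustration:

```python
import numpy as np

coke  = np.array([5, 7, 5, 9, 7, 3, 3, 8, 8])
pepsi = np.array([6, 5, 8, 7, 4, 8, 2, 6, 7])
y = np.vstack([coke, pepsi])   # rows = drinks (k = 2), columns = students (b = 9)
k, b = y.shape
grand = y.mean()

ss_drink   = b * ((y.mean(axis=1) - grand) ** 2).sum()   # treatment SS
ss_student = k * ((y.mean(axis=0) - grand) ** 2).sum()   # block SS
ss_resid   = ((y - grand) ** 2).sum() - ss_drink - ss_student

# F for the treatment effect; note F = t_paired^2
F_drink = (ss_drink / (k - 1)) / (ss_resid / ((k - 1) * (b - 1)))
print(round(ss_drink, 3), round(ss_student, 1), round(ss_resid, 3), round(F_drink, 4))
# 0.222 41.0 28.778 0.0618
```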
If the experiment is incorrectly analyzed as an unpaired design, the ANOVA table is
Df Sum Sq Mean Sq F value Pr(>F)
drink 1 0.222 0.222 0.051 0.8243
Residuals 16 69.778 4.361
Note the connection between F and t tests:
• F = t²
Residual analysis. What is the linear model?
What are the assumptions?
Are the assumptions reasonable?
Think again: Is pairing necessary for the drink experiment? Why?
[Figures: scores by drink (coke vs. pepsi) and by student; residuals vs. fitted values; normal Q-Q plot of standardized residuals.]
Chapter 3 Experiments With a Single Factor
Topics: Analysis of Variance (ANOVA), constraints, multiple comparison
The fiber experiment:
• Formulation of a new “synthetic” fiber to be used to make cloth for shirts.
• The response variable is tensile strength.
• To determine the “best” level of cotton (in weight %) to combine with the synthetics.
• Cotton weight content can vary between 10–40%.
• Chooses 5 levels of cotton weight content: 15, 20, 25, 30, and 35%.
• The experiment is replicated 5 times.
• Runs made in random order.
Weight     Tensile Strength       Totals   Averages
15          7   7  11  15   9       49       9.8
20         12  17  12  18  18       77      15.4
25         14  18  18  19  19       88      17.6
30         19  25  22  19  23      108      21.6
35          7  10  11  15  11       54      10.8
This is a completely randomized design, also called one-way layout.
Q: How are the principles applied here?
Q: How many ways to assign the five treatments to experimental units?
Q: How many ways to run the experiments?
Data for the one-way layout (one factor with k levels):

Treatment (level)    Observations               Totals   Averages
1                    y11  y12  ...  y1n1         y1·      ȳ1·
2                    y21  y22  ...  y2n2         y2·      ȳ2·
...                  ...                         ...      ...
k                    yk1  yk2  ...  yknk         yk·      ȳk·
The means model is:

yij = µi + εij,    i = 1, ..., k;  j = 1, ..., ni,    (1)

where yij is the jth observation with treatment i, µi is the mean of the ith treatment (or factor level), εij is a random error, k is the number of treatments, and ni is the number of observations with treatment i.

Assume: εij ~ NID(0, σ²) (i.i.d. normal errors).

The linear model (or effects model), obtained by substituting µi = µ + τi, is:

yij = µ + τi + εij,    i = 1, ..., k;  j = 1, ..., ni,    (2)

where µ is the overall mean and τi is the ith treatment effect.

In terms of the general linear model

y = Xβ + ε,

y = (7, 12, 14, 19, 7, 7, 17, 18, 25, 10, 11, 12, 18, 22, 11, 15, 18, 19, 19, 15, 9, 18, 19, 23, 11)^T;
β = (µ, τ1, τ2, τ3, τ4, τ5)^T,
X = (X0; X0; X0; X0; X0) (five copies of X0 stacked, one per replicate), where

X0 =
1 1 0 0 0 0
1 0 1 0 0 0
1 0 0 1 0 0
1 0 0 0 1 0
1 0 0 0 0 1    (3)
However, X^T X is singular and (X^T X)^{-1} does not exist.
• The model (2) is over-parameterized.
• Need one constraint on the parameters.
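The singularity is easy to see numerically; a small numpy sketch (illustrative):

```python
import numpy as np

# Over-parameterized model matrix of eq. (3): k = 5 treatments, 5 replicates;
# an intercept column plus one indicator column per treatment.
X0 = np.hstack([np.ones((5, 1)), np.eye(5)])
X = np.tile(X0, (5, 1))                 # 25 x 6

# The intercept column equals the sum of the five indicator columns,
# so the 6 x 6 matrix X'X has rank 5 and is singular.
rank = np.linalg.matrix_rank(X.T @ X)
print(X.shape, rank)                    # (25, 6) 5
```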
The Zero-Sum Constraint is

Σ_{i=1}^k τi = 0.    (4)

• Can drop any τi, usually τk.
• β = (µ, τ1, τ2, ..., τ_{k−1})^T.
• Note τk = −Σ_{i=1}^{k−1} τi.
• Be careful about the model matrix.
For the fiber experiment,

β = (µ, τ1, τ2, τ3, τ4)^T,

X = (X0; X0; X0; X0; X0), where

X0 =
1  1  0  0  0
1  0  1  0  0
1  0  0  1  0
1  0  0  0  1
1 −1 −1 −1 −1    (5)

β̂ = (µ̂, τ̂1, τ̂2, τ̂3, τ̂4)^T = (15.04, −5.24, 0.36, 2.56, 6.56)^T.
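As a check, ordinary least squares with this constrained model matrix reproduces the estimates; a numpy sketch added for illustration (the response vector is the one given above):

```python
import numpy as np

# Model matrix X0 of eq. (5) under the zero-sum constraint (last treatment dropped)
X0 = np.array([[1,  1,  0,  0,  0],
               [1,  0,  1,  0,  0],
               [1,  0,  0,  1,  0],
               [1,  0,  0,  0,  1],
               [1, -1, -1, -1, -1]], dtype=float)
X = np.tile(X0, (5, 1))      # 25 x 5, five replicates

# Fiber-experiment responses, in the same run order as the model matrix
y = np.array([7, 12, 14, 19, 7,   7, 17, 18, 25, 10,
              11, 12, 18, 22, 11, 15, 18, 19, 19, 15,
              9, 18, 19, 23, 11], dtype=float)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))     # [15.04 -5.24  0.36  2.56  6.56]
# The dropped effect is tau_5 = -(tau_1 + ... + tau_4) = -4.24 = 10.8 - 15.04
```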
Interpretation of the parameters and estimates

• (1/k) Σ_{i=1}^k E(yij) = (1/k) Σ_{i=1}^k (µ + τi) = µ + 0 = µ
• E(yij) − µ = µ + τi − µ = τi for i ≥ 1.

In words, µ represents the grand mean, and τi is the offset between the expected treatment i response and the average response.

The estimates under the zero-sum constraint are

• µ̂ = ȳ··
• τ̂i = ȳi· − ȳ·· for i = 1, ..., k.
Notation:

ȳi· = (1/ni) Σ_{j=1}^{ni} yij,    ȳ·· = (1/N) Σ_{i=1}^k Σ_{j=1}^{ni} yij,    N = Σ_{i=1}^k ni.
Remarks

• There are constraints other than the zero-sum constraint.
• The model matrix X and the LSE β̂ depend on the choice of constraint;
• the fitted values ŷ = Xβ̂ and residuals y − ŷ do not.
• The difference τ̂i − τ̂j does not, either.
In R, a baseline or treatment constraint is the default.
options(contrasts=c("contr.treatment", "contr.poly")) # default baseline constraints
options(contrasts=c("contr.sum", "contr.poly")) # specify zero-sum constraints
The estimates under the baseline constraint (τ1 = 0) are

• µ̂ = ȳ1·
• τ̂i = ȳi· − ȳ1· for i = 1, ..., k.
Hypotheses and ANOVA
• H0: there is no difference between the treatments, i.e., µ1 = · · · = µk (or τ1 = · · · = τk = 0).
• H1: there is a difference between the treatments, i.e., not all µi are the same.
The ANOVA table for the one-way layout is

Source      Degrees of Freedom   Sum of Squares                                   Mean Squares            F
treatment   k − 1                SSTr = Σ_{i=1}^k ni (ȳi· − ȳ··)²                 MSTr = SSTr/(k − 1)     MSTr/MSE
residual    N − k                SSE = Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳi·)²        MSE = SSE/(N − k)
total       N − 1                SSTotal = Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳ··)²

where N = Σ_{i=1}^k ni is the total number of observations.

The ANOVA for the one-way layout can be derived directly using the decomposition:

Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳ··)² = Σ_{i=1}^k ni (ȳi· − ȳ··)² + Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳi·)²

SSTotal = SSTr + SSE.
The treatment sum of squares (SS) is also called the between-treatment SS, and the residual SS is called the within-treatment SS.
Expected Mean Squares

E(MSE) = σ²,    E(MSTr) = σ² + (1/(k − 1)) Σ_{i=1}^k ni (τi − τ̄)²
The F statistic

F = MSTr/MSE = [Σ_{i=1}^k ni (ȳi· − ȳ··)²/(k − 1)] / [Σ_{i=1}^k Σ_{j=1}^{ni} (yij − ȳi·)²/(N − k)],    (6)

has an F distribution with parameters k − 1 and N − k under H0.

• Reject H0 at the α level if F > Fk−1,N−k,α.
For the fiber experiment, the ANOVA table is

Source     DF    SS       MS       F
cotton      4    475.76   118.94   14.757
residual   20    161.20     8.06
total      24    636.96
• F = 14.757 > F4,20,.05 = 2.87
• Decision: reject H0 at the 5% level
• Conclusion: the treatments are different.
• P-value = P(F4,20 ≥ 14.757) < .001.
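The one-way ANOVA table can be rebuilt directly from the data; a Python sketch added for illustration:

```python
import numpy as np
from scipy import stats

# Fiber data: rows = cotton weight levels (15, 20, 25, 30, 35), 5 replicates each
y = np.array([[7, 7, 11, 15, 9],
              [12, 17, 12, 18, 18],
              [14, 18, 18, 19, 19],
              [19, 25, 22, 19, 23],
              [7, 10, 11, 15, 11]], dtype=float)
k, n = y.shape
N = k * n
grand = y.mean()

ss_tr = n * ((y.mean(axis=1) - grand) ** 2).sum()   # between-treatment SS
ss_e  = ((y - grand) ** 2).sum() - ss_tr            # within-treatment SS
F = (ss_tr / (k - 1)) / (ss_e / (N - k))
p = stats.f.sf(F, k - 1, N - k)
print(round(ss_tr, 2), round(ss_e, 2), round(F, 3))  # 475.76 161.2 14.757
```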
Once H0 is rejected, an immediate question is: which pairs of treatments are different? Note that we are testing more than one pair; that is, we are doing multiple comparisons. See the text for various methods of multiple comparisons (Section 3.5).
Multiple Comparisons
Suppose that the null hypothesis H0: µ1 = · · · = µk is rejected. An immediate question is to determine which pairs of treatments are significantly different.
The t statistic to test µi = µj vs. µi ≠ µj is

tij = (point estimate)/(s.e. of point estimate) = (ȳj· − ȳi·)/(σ̂ √(1/nj + 1/ni)),    (7)

where σ̂ = √MSE (has N − k df). Reject µi = µj at level α if

|tij| > tN−k,α/2.    (8)
• This test is valid for testing one pair of treatments,
• but it cannot be applied to multiple comparisons of treatments.
• EER (experimentwise error rate) = the probability of declaring at least one pair of treatments significantly different under H0.
• EER > α when k0 > 1 comparisons are made.
The Tukey Method declares “µi different from µj” if

|tij| > (1/√2) qk,N−k,α,    (9)

where qk,N−k,α is the upper α quantile of the Studentized range distribution with parameters k and N − k degrees of freedom.

• For the balanced one-way layout (i.e., ni = n), the EER is exactly α.
The simultaneous confidence intervals for µi − µj are

ȳi· − ȳj· ± (1/√2) qk,N−k,α σ̂ √(1/ni + 1/nj),    (10)

for all i and j pairs.

For the fiber experiment, the t statistics are

1 vs. 2   1 vs. 3   1 vs. 4   1 vs. 5   2 vs. 3   2 vs. 4   2 vs. 5   3 vs. 4   3 vs. 5   4 vs. 5
 3.12      4.34      6.57      0.56      1.23      3.45     −2.56      2.23     −3.79     −6.01
• (1/√2) qk,N−k,0.05 = (1/√2) q5,20,0.05 = 4.23/√2 = 2.99
• six pairs (1 vs. 2, 1 vs. 3, 1 vs. 4, 2 vs. 4, 3 vs. 5, 4 vs. 5) are different.
Note qk,N−k,α (or qα(p, f)) can be found in tables in Appendix VII (pp. 621-622) for α = .05 and .01. In R, use
> qtukey(1-.05, 5, 20)
[1] 4.231857
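If you prefer Python, scipy (version 1.7 or later) exposes the same Studentized range distribution; a small illustrative sketch:

```python
# Upper 5% quantile q_{5,20,0.05}, matching R's qtukey(0.95, 5, 20)
from scipy.stats import studentized_range

q = studentized_range.ppf(0.95, k=5, df=20)
print(round(q, 2), round(q / 2 ** 0.5, 2))  # 4.23 2.99
```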
• There are several other related procedures; see Section 3.5.
Regression Models
In the fiber experiment, the factor (cotton weight) is quantitative and the levels are evenly spaced (15, 20, 25, 30 and 35). The experiment provides strong evidence that cotton weight percent affects the tensile strength of the fiber. A natural question is to model the relationship between strength and cotton weight percent. This can be done with regression.

We can fit a polynomial model in x to the data, say

y = β0 + β1x + β2x² + · · · + βp x^p + ε.    (11)
We can start with a small p (say p = 1). If the model does not fit the data, we can increase p. Alternatively, we can start with a large p (say p = 4 here) and then drop insignificant terms. The fitted models with p = 1, 2, 3, 4 are:

y = 10.940 + 0.164x    (12)
y = −39.9886 + 4.5926x − 0.0886x²    (13)
y = 62.6114 − 9.0114x + 0.4814x² − 0.0076x³    (14)
y = −406.4000 + 73.7767x − 4.8077x² + 0.1377x³ − 0.001453x⁴    (15)

where x is in the original units (i.e., actual factor levels 15, 20, 25, 30 and 35). We can use general ANOVA to compare and select models (not shown here). At the 5% level, I will reject the linear model (12) and quadratic model (13) but accept the cubic model (14).
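The fitted polynomials can be reproduced with a least squares fit; a numpy sketch for the linear model (12), added for illustration:

```python
import numpy as np

# 25 runs: each cotton weight level is replicated 5 times
x = np.repeat([15.0, 20.0, 25.0, 30.0, 35.0], 5)
y = np.array([7, 7, 11, 15, 9,   12, 17, 12, 18, 18,
              14, 18, 18, 19, 19, 19, 25, 22, 19, 23,
              7, 10, 11, 15, 11], dtype=float)

# np.polyfit returns coefficients with the highest degree first
b1, b0 = np.polyfit(x, y, 1)
print(round(b0, 3), round(b1, 3))   # 10.94 0.164
# Higher-degree fits, e.g. np.polyfit(x, y, 3), give the cubic model (14)
```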
Remarks
• Be cautious of overfitting.
• Polynomials of degree four or higher should not be used (unless they can be justified by a physical model).
Do not forget Residual Analysis and Diagnostics
Are the assumptions reasonable?
[Figures: scatterplot of tensile strength vs. cotton weight with fitted curves; residuals vs. fitted values; normal Q-Q plot of standardized residuals.]
Chapter 4 Randomized Blocks, Latin Squares, and Related Designs
Topics: Randomized Block Designs
4.1 The Randomized Complete Block Design
The hardness testing experiment
• To determine whether 4 different tips produce different hardness readings on a Rockwell hardness tester
• Assignment of the tips to an experimental unit; that is, a test coupon
• The test coupons are a source of nuisance variability (need for blocking)
• Assign all 4 tips to each coupon
• Each coupon is called a block; that is, it is a more homogeneous experimental unit on which to test the tips
• Variability between blocks can be large; variability within a block should be relatively small
• All runs within a block are randomized
          Coupon
Tip     1     2     3     4     Averages
1      9.3   9.4   9.6  10.0     9.575
2      9.4   9.3   9.8   9.9     9.600
3      9.2   9.4   9.5   9.7     9.450
4      9.7   9.6  10.0  10.2     9.875
This is a randomized (complete) block design, with one experimental factor of k = 4 treatments and b = 4 blocks of size k.
• A paired comparison design is a randomized block design with k = 2.
The linear model (or effects model) for the randomized block design is

yij = µ + τi + βj + εij,    i = 1, ..., k;  j = 1, ..., b,    (16)

where yij represents the observation of the ith treatment in the jth block, τi is the ith treatment effect, βj is the jth block effect, and the εij are NID(0, σ²) errors.

• Constraints on the parameters are necessary.
• The zero-sum constraints are

Σ_{i=1}^k τi = Σ_{j=1}^b βj = 0.
The ANOVA table for the randomized block design is

Source      DF               SS        MS
Treatment   k − 1            SSTr      MSTr = SSTr/(k − 1)
Block       b − 1            SSB       MSB = SSB/(b − 1)
Residual    (b − 1)(k − 1)   SSE       MSE = SSE/((b − 1)(k − 1))
Total       bk − 1           SSTotal
The ANOVA decomposition:

Σ_{i=1}^k Σ_{j=1}^b (yij − ȳ··)² = Σ_{i=1}^k b(ȳi· − ȳ··)² + Σ_{j=1}^b k(ȳ·j − ȳ··)² + Σ_{i=1}^k Σ_{j=1}^b (yij − ȳi· − ȳ·j + ȳ··)²

SSTotal = SSTr + SSB + SSE.
The null hypothesis of no treatment effect difference is H0: τ1 = · · · = τk. Under H0,

F = MSTr/MSE = [SSTr/(k − 1)] / [SSE/((b − 1)(k − 1))]

has an F distribution with DF k − 1 and (b − 1)(k − 1).

• reject H0 at level α if F > Fk−1,(b−1)(k−1),α.
If H0 is rejected, multiple comparisons of the τi should be performed. The t statistics for making multiple comparisons are

tij = (ȳj· − ȳi·)/(σ̂ √(1/b + 1/b)),    (17)

where σ̂² = MSE. Under H0: τ1 = · · · = τk, each tij has a t distribution with DF = (b − 1)(k − 1).

At level α, the Tukey multiple comparison method identifies “treatments i and j as different” if

|tij| > (1/√2) qk,(b−1)(k−1),α.

The simultaneous CIs for τj − τi based on the Tukey method are

ȳj· − ȳi· ± qk,(b−1)(k−1),α σ̂/√b    (18)

for all i and j pairs.

For the hardness testing experiment, the ANOVA table is
Df Sum Sq Mean Sq F value Pr(>F)
tip 3 0.38500 0.12833 14.438 0.0008713 ***
coupon 3 0.82500 0.27500 30.938 4.523e-05 ***
Residuals 9 0.08000 0.00889
The small p-values suggest that the tips are different and the coupons are also different. Here blocking is effective.
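The randomized-block ANOVA for the hardness data can be rebuilt from the SS decomposition; a Python sketch added for illustration:

```python
import numpy as np

# rows = tips (treatments, k = 4), columns = coupons (blocks, b = 4)
y = np.array([[9.3, 9.4,  9.6, 10.0],
              [9.4, 9.3,  9.8,  9.9],
              [9.2, 9.4,  9.5,  9.7],
              [9.7, 9.6, 10.0, 10.2]])
k, b = y.shape
grand = y.mean()

ss_tip    = b * ((y.mean(axis=1) - grand) ** 2).sum()   # treatment SS
ss_coupon = k * ((y.mean(axis=0) - grand) ** 2).sum()   # block SS
ss_e      = ((y - grand) ** 2).sum() - ss_tip - ss_coupon

F_tip = (ss_tip / (k - 1)) / (ss_e / ((k - 1) * (b - 1)))
print(round(ss_tip, 3), round(ss_coupon, 3), round(ss_e, 3), round(F_tip, 2))
```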
If the analysis wrongly ignores blocking, the ANOVA table would be
Df Sum Sq Mean Sq F value Pr(>F)
tip 3 0.38500 0.12833 1.7017 0.2196
Residuals 12 0.90500 0.07542
The wrong conclusion would be that there is no treatment effect.
Multiple comparisons for the treatment (tip) means. The t statistics are

1 vs. 2   1 vs. 3   1 vs. 4   2 vs. 3   2 vs. 4   3 vs. 4
 0.375    −1.875     4.500    −2.250     4.125     6.375
The Tukey method:

• (1/√2) q4,9,0.05 = 4.41/√2 = 3.12
• 3 pairs (1 vs. 4, 2 vs. 4, 3 vs. 4) are significantly different at the 5% level
Diagnostics: Are the assumptions reasonable?
[Figures: hardness readings by coupon and by tip; mean plots of y by coupon and by tip; residuals vs. fitted values; normal Q-Q plot of standardized residuals.]