Stat 502 Design and Analysis of Experiments: Two Sample Experiments


Page 1: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Stat 502: Design and Analysis of Experiments

Two Sample Experiments: Treatment vs Control

Fritz Scholz

Department of Statistics, University of Washington

September 14, 2013

1

Page 2: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Circuit Board Solder Flux Experiment (Courtesy Denis Janky)

Flux is material used to facilitate soldering on circuit boards. http://en.wikipedia.org/wiki/Flux_(metallurgy)

Moisture condensing on the boards can interact with soldering contaminants (usually residual solder flux) and result in thin filaments (dendrites) on the surface.

The dendrites can carry current and disrupt the circuits or short-circuit the board.

Measure Surface Insulation Resistance (SIR) between two electrically isolated sites.

Electrical problems on aircraft could occur during flight, possibly intermittently.

Troubled circuit boards are yanked, often without findings.

Cleaning off soldering contaminants is a leading contributor to depletion of the ozone layer.

Some fluxes are easier to clean than others.

We have 2 flux candidates.

2

Page 3: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Treatments and Controls

Here the experimental situation was laid out in terms of two different and competing flux treatments.

One of the treatments could have been used in the past.

It is the familiar treatment, and often it would then be called the control.

The other treatment would be the newer treatment.

Why should we change?

Some possible reasons:

the promise of better results
equal results with a cheaper or cleaner process
treatment flexibility, spreading the risk of treatment availability

3

Page 4: Stat 502 Design and Analysis of Experiments Two Sample Experiments

The Experimental Process

18 boards are available for the experiment, not necessarily a random sample from all boards.

Test flux brands X and Y: randomly assign 9 boards each to X & Y (FLUX variable).

The boards are soldered and cleaned. Order randomized. (SC.ORDER)

Then the boards are coated and cured to avoid handling contamination. Order randomized. (CT.ORDER)

Then the boards are placed in a humidity chamber and measured for SIR. Position in chamber randomized. (SLOT)

The randomization at the various process steps avoids unknown biases.

When in doubt on the effect of any process step, randomize!

The randomization of flux assignments gives us a mathematical basis for judging flux differences with respect to the response SIR (to become clear later).
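As a hedged illustration (not part of the original experiment protocol), here is one way such randomizations could be generated in R; the seed and object names are arbitrary:

# illustrative randomization sketch (assumed names, not from Flux.csv)
set.seed(502)                                    # arbitrary seed for reproducibility
FLUX     <- sample(rep(c("X", "Y"), each = 9))   # random flux assignment to boards 1..18
SC.ORDER <- sample(1:18)                         # random solder/clean order
CT.ORDER <- sample(1:18)                         # random coat/cure order
SLOT     <- sample(1:18)                         # random humidity-chamber slot
data.frame(BOARD = 1:18, FLUX, SC.ORDER, CT.ORDER, SLOT)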

4

Page 5: Stat 502 Design and Analysis of Experiments Two Sample Experiments

DOE Steps Recapitulated

1 Goal of the experiment. Answer the question: Is Flux X different from Flux Y? If not, we can use them interchangeably. One may be cheaper than the other. Test the null hypothesis H0: No difference in fluxes.

2 Understand the experimental units: Boards.

3 Define the appropriate response variable to be measured: SIR.

4 Define potential sources of response variation:
a) factors of interest: flux type
b) nuisance factors: boards, various processing steps, testing.

5 Decide on treatment and blocking variables. Treatment = flux type, no blocking. With 2 humidity chambers we might have blocked on those, since variation from chamber to chamber may be substantial.

6 Clearly define the experimental process and what is randomized. Treatments and all nuisance factors are randomized.

5

Page 6: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Why Are Treatments and Nuisance Factors Randomized?

The treatments are randomized among boards to avoid confounding and to provide a mathematical basis for inference concerning treatment effects.

We don't want any board peculiarities all ganging up on one treatment.

This would make it difficult to distinguish treatment from board effect.

The nuisance factors all come into play after the flux treatments are applied. If not, such nuisance factor effects could be lumped with the board peculiarities.

Any effect that they have could align with the treatments, i.e., the 9 highest nuisance factor effects could be aligned with flux X. Randomization makes that unlikely.

Randomizing over all nuisance factors separately and independently makes any combined confounding effects extremely unlikely. It reduces variability since + and − effects cancel out to a large extent.

6

Page 7: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Flux Data

BOARD  FLUX  SC.ORDER  CT.ORDER  SLOT  SIR
    1     Y        13        14     5   8.6
    2     Y        16         8     6   7.5
    3     X        18         9    15  11.5
    4     Y        11        11    11  10.6
    5     X        15        18     9  11.6
    6     X         9        15    18  10.3
    7     X         6         1    16  10.1
    8     Y        17        12    17   8.2
    9     Y         5        10    13  10.0
   10     Y        10        13    14   9.3
   11     Y        14         5    10  11.5
   12     X        12        17    12   9.0
   13     X         4         7     3  10.7
   14     X         8         6     1   9.9
   15     Y         3         2     4   7.7
   16     X         7         3     2   9.7
   17     Y         1        16     8   8.8
   18     X         2         4     7  12.6

See Flux.csv or the data frame flux. Note the 4 levels of randomization.

7

Page 8: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Flux Experiment: First Box Plot Look at SIR Data

[Box plots of SIR (log10(Ohm), roughly 6 to 14) by Flux (X, Y), with the individual data points superimposed; annotation: FLUXY − FLUXX = −1.467]

boxplot(SIR ~ FLUX, data = flux)  # basic boxplot without annotation
# see class web site for flux.plots

8

Page 9: Stat 502 Design and Analysis of Experiments Two Sample Experiments

What Does a Box Plot Show?

In the box plot (Tukey 1977) the upper/lower quartiles, Q(.75) & Q(.25), and the median, Q(.5), are portrayed by the 3 horizontal line segments of the rectangle.

Dashed lines extend from the ends of the box to the adjacent values, defined below.

Compute the interquartile range IQR = Q(.75) − Q(.25).
The upper adjacent value is the largest observation ≤ Q(.75) + 1.5 × IQR.
The lower adjacent value is the smallest observation ≥ Q(.25) − 1.5 × IQR. (Chambers, Cleveland, Kleiner, Tukey)

Observations beyond adjacent values are shown individually.

For a normal population ≈ .35% of the values fall beyond each adjacent value.

The dots shown superimposed on our SIR box plots are just the added data values.

Sample quantiles, Q(p), are determined by one of many possible schemes; see ?quantile in R.
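As a minimal sketch (not from the slides), these box plot ingredients can be computed directly in R for a numeric vector, here assuming the flux data frame from Flux.csv; R's default quantile scheme is only one of the possible choices:

x   <- flux$SIR[flux$FLUX == "X"]              # e.g., the SIR values for flux X
qs  <- quantile(x, c(.25, .5, .75))            # lower quartile, median, upper quartile
iqr <- qs[3] - qs[1]                           # interquartile range
upper.adj <- max(x[x <= qs[3] + 1.5 * iqr])    # upper adjacent value
lower.adj <- min(x[x >= qs[1] - 1.5 * iqr])    # lower adjacent value
outliers  <- x[x > qs[3] + 1.5 * iqr | x < qs[1] - 1.5 * iqr]  # points shown individually
# compare with boxplot.stats(x), which uses hinges and may differ slightly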

9

Page 10: Stat 502 Design and Analysis of Experiments Two Sample Experiments

The Utility of Box Plots

Succinct description of data sets, showing the median and the upper and lower quartiles. This avoids looking at too many data points.

If the median ≈ center of the box
=⇒ the middle 50% of the data appear symmetric.

If in addition the adjacent values are ≈ symmetric relative to the box
=⇒ this symmetry appears to extend to the tails; otherwise the data are skewed.

Any values beyond the adjacent values are singled out as potential outliers, asking for attention.

Especially effective for comparing many data sets side by side.

10

Page 11: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Detour on Taxiway Deviations

[Diagram: two taxiway centerlines, aircraft wing spans W1 and W2, deviations d1 and d2, and spacing T]

Risk of collision and of running off the taxiway, getting stuck in the mud.

11

Page 12: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Two Lasers Measure Distances to Nose & Main Gears

12

Page 13: Stat 502 Design and Analysis of Experiments Two Sample Experiments

747 Taxiway Deviations

[Scatter plot, Eastbound Traffic: Nose Gear Deviation from Taxiway Centerline (ft), −10 to 10, vs. time of day, 0 to 20]

13

Page 14: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Box Plots for 747 Taxiway Deviations

[Box plots by hour of day (1 to 24), Eastbound Traffic: Nose Gear Deviation from Taxiway Centerline (ft), −10 to 10]

See class web site for deviation.boxplot.eb

14

Page 15: Stat 502 Design and Analysis of Experiments Two Sample Experiments

747 Taxiway Deviations

[Scatter plot, Westbound Traffic: Nose Gear Deviation from Taxiway Centerline (ft), −10 to 10, vs. time of day, 0 to 20]

15

Page 16: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Box Plots for 747 Taxiway Deviations

[Box plots by hour of day (1 to 24), Westbound Traffic: Nose Gear Deviation from Taxiway Centerline (ft), −10 to 10]

See class web site for deviation.boxplot.wb

16

Page 17: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some Comments

The Eastbound Traffic box plots hinted at a preponderance of medians on one side of the taxiway centerline.

The Westbound Traffic made the same point very clearly because of the much larger traffic volume in that direction.

The consistent bias direction suggests that it is not only due to pilot position relative to the aircraft centerline (parallax effect).

The Eastbound bias seems less than the Westbound bias.

This suggests two bias components: one switching direction with traffic direction (parallax effect), while the other does not.

17

Page 18: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Taxiway Centerline and Centerlights

Pilots avoid hitting the off-centerline light bumps =⇒ a consistent bias (in the same direction) in addition to the pilot parallax bias, which changes with direction of travel.

18

Page 19: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Three Studies on Taxiway Deviations

The links below are active if connected to the web.

http://www.airporttech.tc.faa.gov/Design/Downloads/anc.report.091003.pdf

http://www.airporttech.tc.faa.gov/Design/Downloads/jfk.report.pdf

http://www.airporttech.tc.faa.gov/Design/Downloads/separation new.pdf

19

Page 20: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Returning to the Flux Experiment: QQ-Plot of SIR Data

[QQ-plot: SIR with Flux Y (log10(Ohm)) vs. SIR with Flux X (log10(Ohm)), both axes 8 to 12]

20

Page 21: Stat 502 Design and Analysis of Experiments Two Sample Experiments

What is a QQ-Plot?

A QQ-plot is another way of comparing 2 data sets visually.

Easiest to construct for equal sample sizes, m = n.

Then you plot the ordered values of one sample against the corresponding ordered values of the other sample.

For m > n you linearly interpolate n values from the sorted larger sample and plot these against the sorted smaller sample.

out <- qqplot(x,y) does it for 2 sample vectors x and y.

The out <- is optional. out is a list with plotting positions out$x and out$y.

abline(lm(y ~ x, out)) fits a line to the QQ-plot.

See the function body of qqplot and ?approx for the interpolation scheme.
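A short usage sketch on the flux data (assuming the data frame flux with columns SIR and FLUX, as on slide 7); this reproduces the kind of plot shown on the previous page:

sir.x <- flux$SIR[flux$FLUX == "X"]
sir.y <- flux$SIR[flux$FLUX == "Y"]
out <- qqplot(sir.x, sir.y,
              xlab = "SIR with Flux X (log10(Ohm))",
              ylab = "SIR with Flux Y (log10(Ohm))")
abline(0, 1)                        # main diagonal for reference
abline(lm(y ~ x, out), lty = 2)     # line fitted to the QQ-plot points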

21

Page 22: Stat 502 Design and Analysis of Experiments Two Sample Experiments

What Can a QQ-Plot Tell You?

If the points scatter around the main diagonal
=⇒ the samples appear to come from the same distribution.

An ≈ linear point pattern
=⇒ the two samples differ at most in location and scale.

An ≈ linear point pattern parallel to the main diagonal
=⇒ the two samples differ mainly in location (not scale).

If the point pattern is ≈ linear and crosses the diagonal near the middle of the point pattern,
=⇒ the two samples differ mainly in scale (not location).

If the point pattern does not look ≈ linear, then the samples differ in aspects other than location and scale.

For this there are many possibilities.

Location is captured by mean or median, scale by standard deviation or IQR.

Judgment is subjective and takes experience.

Simulation is one way to gain such experience.

22

Page 23: Stat 502 Design and Analysis of Experiments Two Sample Experiments

QQ-Plot of SIR Data (Higher Perspective?)

[The same QQ-plot of SIR with Flux Y vs. SIR with Flux X (log10(Ohm)), with both axes extended from 0 to 20]

Here the same point pattern looks more linear and closer to the main diagonal.

23

Page 24: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some QQ-Plots from N(0,1) Samples (m=9, n=9)

[20 simulated QQ-plots of pairs of N(0,1) samples with m = n = 9, each annotated with its difference of means y − x; the values range from about −1.33 to 1.07]

24

Page 25: Stat 502 Design and Analysis of Experiments Two Sample Experiments

What Do the 20 Simulated QQ-Plots Tell Us?

Very few patterns seem to vary along the main diagonal, the "expected" situation.

We say "expected" because both samples come from the same population/distribution.

Clear linear patterns are also rare. This is due to the small sample size.

12 of 20 patterns have points on both sides of the main diagonal.

Some patterns are clearly offset from the main diagonal.

Thus one should be rather forgiving when judging with small sample sizes.

The next slide illustrates how matters improve when both samples are of size 30.
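A minimal sketch (assumed code, not from the class web site) of how such a panel of simulated QQ-plots could be produced, to gain the kind of experience mentioned on slide 22:

set.seed(1)                          # arbitrary seed
par(mfrow = c(4, 5), mar = c(2, 2, 1, 1))
for (i in 1:20) {
  x <- rnorm(9); y <- rnorm(9)       # two samples from the same N(0,1) population
  qqplot(x, y, xlab = "", ylab = "",
         main = paste("y - x =", round(mean(y) - mean(x), 3)))
  abline(0, 1)                       # main diagonal
}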

25

Page 26: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some QQ-Plots from N(0,1) Samples (m=30, n=30)

[20 simulated QQ-plots of pairs of N(0,1) samples with m = n = 30, each annotated with its difference of means y − x; the values now range only from about −0.38 to 0.30, and the point patterns follow the main diagonal more closely]

26

Page 27: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Is the Difference Y − X = −1.467 Significant?

In comparing SIR for the two fluxes let us focus on the difference of means FLUXY − FLUXX = Y − X.

If the use of flux X or flux Y made no difference (H0), then we should have seen the same results for these 18 boards, no matter which got flux X or Y.

X or Y is then just an artificial label with no consequence.

Due to the different splittings into two groups of 9 & 9, we would see other differences of means.

There are (18 choose 9) = 48620 = choose(18,9) such possible splits. For each split we could obtain Y − X.

Was our observed difference of −1.467 from an unusual random split?

We need the randomization or reference distribution of Y − X for all possible splits.

27

Page 28: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some Randomization Examples of Y − X

[Seven example random splits of the 18 SIR values into groups of 9 and 9, with the resulting differences of means Y − X: 1.1778, 0.4222, −0.0889, −0.4000, 0.5778, 0.7778, 0.2000]

28

Page 29: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Reference Distribution of Y − X

Compute Y − X for each of the 48620 possible splits and determine how unusual the observed difference of −1.467 is.

This seems like a lot of computing work. Here it takes just a few seconds in R using the function combn.

But the computing work can grow rapidly, since (m+n choose m) gets large in a hurry. For example (26 choose 13) = 10400600 (10 million+).

Let SIR be the vector of all 18 SIR values in your workspace.

In your workspace define the following function of the 2 arguments ind and y:

m.diff <- function(ind, y) { mean(y[ind]) - mean(y[-ind]) }

rand.ref.dist <- combn(1:18, 9, FUN = m.diff, y = SIR)

gives the vector of all 48620 such differences of averages.
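Putting the pieces together, a self-contained sketch using the data from slide 7 (the small tolerance in the p-value line anticipates the rounding issue discussed on slide 37):

# SIR values and flux assignments in board order 1..18 (from the Flux Data table)
SIR  <- c(8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0,
          9.3, 11.5, 9.0, 10.7, 9.9, 7.7, 9.7, 8.8, 12.6)
FLUX <- c("Y","Y","X","Y","X","X","X","Y","Y",
          "Y","Y","X","X","X","Y","X","Y","X")
obs.diff <- mean(SIR[FLUX == "Y"]) - mean(SIR[FLUX == "X"])     # -1.46666...

m.diff <- function(ind, y) mean(y[ind]) - mean(y[-ind])
rand.ref.dist <- combn(1:18, 9, FUN = m.diff, y = SIR)          # all 48620 splits
p.val <- mean(abs(rand.ref.dist) >= abs(obs.diff) - 1e-8)       # two-sided p-value, about .02587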

29

Page 30: Stat 502 Design and Analysis of Experiments Two Sample Experiments

What does combn Do?

The function combn goes through all combinations of 9 indices taken from 1:18. Each such combination is fed as the value for the first argument ind to the function m.diff. Note that ind is not explicitly named in combn(...).

Instead of ind any other unused variable name could be used, e.g., IndianaJones.

The 2nd argument y in m.diff is explicitly named as y=SIR in combn(1:18, 9, FUN = m.diff, y = SIR).

For each combination ind and assigned vector SIR = (X1, ..., X9, Y1, ..., Y9) = (Z1, ..., Z18) the function m.diff(ind, y) is evaluated. For example, ind = c(3,5,7,9,10,12,14,17,18) gives
mean(y[ind]) - mean(y[-ind]) = (Z3 + Z5 + Z7 + ... + Z17 + Z18)/9 − (Z1 + Z2 + Z4 + ... + Z15 + Z16)/9.

30

Page 31: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Using IndianaJones

m.diff <- function(IndianaJones, y) { mean(y[IndianaJones]) - mean(y[-IndianaJones]) }

and

rand.ref.dist <- combn(1:18, 9, FUN = m.diff, y = SIR)

would have produced the same output vector rand.ref.dist.

If m.diff <- function(z, y1, y2) we would need to explicitly indicate the sources for the two variables y1 and y2 in combn(..., y1 = ..., y2 = ...).

31

Page 32: Stat 502 Design and Analysis of Experiments Two Sample Experiments

combn

The default function call (with no FUN specification) combn(x, m, FUN = NULL, simplify = TRUE, ...) or just combn(x, m) results in a matrix with columns representing all possible combinations of m elements taken from the vector x. Don't use x= in ... since x is already implied as the first argument. Here is an example with x = 1:5 = c(1,2,3,4,5) and m = 3:

> combn(1:5,3)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    2    2    2     3
[2,]    2    2    2    3    3    4    3    3    4     4
[3,]    3    4    5    4    5    5    4    5    5     5

Note (5 choose 3) = (5 choose 2) = 5!/(3! 2!) = 5 · 4/(1 · 2) = 10.

32

Page 33: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Reference Distribution of Y − X (continued)

rand.ref.dist gives the reference distribution of Y − X.

We find a (two-sided) p-value of p.val = .02587 for our observed Y − X = −1.467 via p.val <- mean(abs(rand.ref.dist) >= 1.46666), the proportion of TRUE (i.e., 1) values in the logical vector abs(rand.ref.dist) >= 1.46666.

The (two-sided) p-value is the probability of seeing a |Y − X| value as extreme as or more extreme than the actually observed |y − x| = 1.467, when in fact the hypothesis H0 holds true, i.e., under the randomization reference distribution.

The upfront randomization of fluxes and the assumption of H0 make this a valid probability statement!

Without that randomization we cannot speak of chance.

With upfront randomization and assuming H0 to be true, we can view the observed difference as having arisen as a result of one of 48620 equally likely splits.

33

Page 34: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Randomization Reference Distribution of Y − X

[Histogram of the randomization reference distribution of Y − X = SIRY − SIRX, range about −2 to 2, with a normal density superimposed; annotations: P(Y − X ≤ −1.466666) = 0.01294, P(Y − X ≥ 1.466666) = 0.01294; 2-sided p-value: 0.02587 reference distribution, 0.02828 normal approximation]

34

Page 35: Stat 502 Design and Analysis of Experiments Two Sample Experiments

How to Determine the P-Value

How do we obtain the p-value from the reference distribution?

Note the following:

> x <- 1:10
> x > 3
 [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> sum(x > 3)
[1] 7
> mean(x > 3)  # the proportion of x values > 3
[1] 0.7

Note that x > 3 produced a logical vector with the same length as x.

In arithmetic expressions or functions the logical values FALSE and TRUE are interpreted as 0 and 1, respectively.

35

Page 36: Stat 502 Design and Analysis of Experiments Two Sample Experiments

How to Determine the P-Value (continued)

x = rand.ref.dist is the vector of all the differences of means, Y − X, obtained for all 48620 possible splits.

mean(x <= -1.46666) and mean(x >= 1.46666) give us the respective one-sided p-values p1 = .01294 and p2 = .01294 for the randomization reference distribution.

Rather than adding these 2 p-values to get a two-sided p-value we can also do this directly via mean(abs(x) >= 1.46666) = .02587.

Here abs(x) is the vector of absolute values in x.

36

Page 37: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some Computational Issues

In the p-value calculation we used 1.46666 instead of 1.467:
p.val <- mean(abs(rand.ref.dist) >= 1.46666)
Why?

The number −1.467 is obtained by rounding the true y − x = −1.46666.... Thus the logical vector abs(rand.ref.dist) >= 1.467 would give FALSE for all those splits with |Y − X| = 1.46666....

mean(abs(rand.ref.dist) >= 1.467) = .02345 would compute the chance of seeing a result for |Y − X| that is more extreme than the observed |y − x| = 1.46666....

However, the p-value is the chance of seeing a |Y − X| that is as extreme as or more extreme than the observed |y − x| = 1.46666..., i.e., p.val = .02587.

37

Page 38: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Interpreting the P-Value

The calculation of the p-value is based on two premises:

Fluxes are assigned randomly to the 18 units upfront. TRUE!!
The hypothesis H0 of no flux effect. TRUE or FALSE.

If the p-value is very small, then we can either believe that we saw a rather rare result for our test statistic Y − X while H0 is true, or we should be induced to disbelieve H0 and reject it.

When is the p-value sufficiently small?

When the p-value ≤ .05 the result is said to be significant at level .05, and we should reject H0 if .05 is our prearranged cut-off or significance level.

The actual p-value is more informative than stating a significant result at .05 or .01.

38

Page 39: Stat 502 Design and Analysis of Experiments Two Sample Experiments

One-Sided or Two-Sided P-Value?

A two-sided p-value is appropriate if we test the hypothesis of no treatment difference against the alternative that there is a difference, without specifying in which direction.

Here we may hope for an insignificant result so that we may be able to use both treatments interchangeably.

A one-sided p-value would be appropriate if we test the hypothesis of no treatment difference against an alternative that specifies the direction in which the means may differ, say the newer treatment, flux X, yields higher SIR values than flux Y, the previous treatment.

However, the test result should not influence the choice of the targeted alternative.

39

Page 40: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Randomization Reference Distribution of Y − X

[Repeat of the histogram on slide 34: randomization reference distribution of Y − X = SIRY − SIRX with P(Y − X ≤ −1.466666) = 0.01294, P(Y − X ≥ 1.466666) = 0.01294; 2-sided p-value: 0.02587 reference distribution, 0.02828 normal approximation]

40

Page 41: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Which Normal Distribution Do We Use as Approximation?

Taking n values Y1, ..., Yn randomly and without replacement from Z1, ..., ZN, then (see Appendix A, slide 82) sampling theory gives the mean and variance of Ȳ = (1/n) ∑_{i=1}^n Yi as

E(Ȳ) = Z̄  and  var(Ȳ) = (S²/n)(1 − n/N)  with  S² = (1/(N − 1)) ∑_{i=1}^N (Zi − Z̄)².

(1 − n/N) is called the finite population correction factor.

Further theory, a version of the Central Limit Theorem (CLT), suggests that Ȳ has an approximate normal distribution with this mean and variance, i.e.,

Ȳ ≈ N( E(Ȳ), var(Ȳ) ).

41

Page 42: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Approximate Normality for Y − X

If X̄ is the average of the N − n = m Z-values not contained in the Y-sample, then

Z̄ = (nȲ + mX̄)/N  =⇒  Ȳ − Z̄ = Ȳ − (nȲ + mX̄)/N = (m/N)(Ȳ − X̄)  =⇒  Ȳ − X̄ = (N/m)(Ȳ − Z̄).

Since Z̄ remains constant during any of this random sampling, it follows that Ȳ − X̄ is approximately normally distributed with mean and variance given by E(Ȳ − X̄) = 0 and

var(Ȳ − X̄) = (N²/m²)(S²/n)(1 − n/N) = S² N/(mn) = S² (1/m + 1/n).

No more finite population correction factor!

42

Page 43: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Normal QQ-Plot of Y − X Randomization Reference Distribution

[Normal QQ-plot: the 203 unique values of Y − X plotted against the corresponding quantiles of the normal approximation, both axes about −3 to 3; the reference distribution has shorter tails than the normal approximation; σ(Y − X) = 0.669]

43

Page 44: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Construction of the Previous Normal QQ-Plot

Let D1 < D2 < ... < D203 denote the distinct possible values of Y − X and let ni be the frequency of Y − X = Di among the M = 48620 possible splits.

The proportion of Y − X ≤ Di is pi = (n1 + ... + ni)/M.

We could regard Di as the pi-quantile of the approximating normal distribution with mean zero and standard deviation σ = .669.

Unfortunately p203 = 1 and the normal 1-quantile = ∞.

To remedy this and also to treat p1 and p203 in a more symmetric fashion we modify pi to p̃i = pi · M/(M + 1) and plot Di against the p̃i-quantile of the above normal distribution. This was done in the previous plot.

Another option, with roughly the same effect, is to use p̃i = (n1 + ... + ni − .5)/M.

We note that the QQ-plot shows the discrepancy in the normal approximation much more clearly (in the tails) than the density plot superimposed on the histogram.
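A sketch of this construction in R (assuming rand.ref.dist from the combn computation; the rounding in the table() call is only there to merge values that differ by floating-point noise):

M     <- length(rand.ref.dist)                  # 48620
tab   <- table(round(rand.ref.dist, 6))         # distinct values D_i with frequencies n_i
D     <- as.numeric(names(tab))                 # should give the 203 unique values
p     <- (cumsum(tab) / M) * M / (M + 1)        # modified cumulative proportions
sigma <- sqrt(var(SIR) * (1/9 + 1/9))           # about 0.669
plot(qnorm(p, mean = 0, sd = sigma), D,
     xlab = "corresponding quantiles of the normal approximation",
     ylab = "unique values of Y - X")
abline(0, 1)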

44

Page 45: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some Experimenting with Randomization Tests

In the following we will take our SIR data and subtract Ȳ − X̄ from all the Y values, i.e., obtain Y′i = Yi − (Ȳ − X̄), i = 1, 2, ..., 9.

This altered Y′ sample will have as average Ȳ′ = Ȳ − (Ȳ − X̄) = X̄, i.e., both samples are aligned on their common average.

Then we take Y′1 and add 9 × (Ȳ − X̄), i.e., obtain Y″1 = Y′1 + 9 × (Ȳ − X̄), and leave the other Y″i = Y′i, i = 2, ..., 9 unchanged.

=⇒ (1/9)(Y″1 + ... + Y″9) = (1/9)(Y′1 + ... + Y′9) + (Ȳ − X̄) = X̄ + (Ȳ − X̄) = Ȳ

=⇒ Ȳ″ − X̄ = Ȳ − X̄, the same difference as before.

However, we round Y″1, ..., Y″9 to the nearest decimal. This changes Ȳ″ − X̄ slightly.

What is the effect of this data change on the randomization reference distribution?

45

Page 46: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Box Plot View of Altered SIR Data

[Box plots of the altered SIR data (log10(Ohm), roughly −3 to 14) by Flux (X, Y), with the data points superimposed; annotation: FLUXY − FLUXX = −1.463]

46

Page 47: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Comments and Questions

The two samples seem well aligned when comparing medians.

The Y″ sample has an outlier; it does not affect the median.

Ȳ″ − X̄ = −1.463 ≈ −1.467, our previous value (rounding). There is no difference in Flux if we disregard the outlier.

The outlier is presumably a defective circuit board, or something gone wrong with this board, and is not a treatment effect (otherwise we should see more such cases).

What happens to Y ′′ − X for all 48620 splits?

Where is −1.463 relative to the reference distribution?

What is its two-sided p-value?

47

Page 48: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Randomization Reference Distribution of Y ′′ − X

[Histogram of the randomization reference distribution of Y″ − X = SIR″Y − SIRX, range about −2 to 2; annotations: P(Y″ − X ≤ −1.463) = 0.2727, P(Y″ − X ≥ 1.463) = 0.2727; 2-sided p-value: 0.5455 reference distribution, 0.367 normal approximation]

48

Page 49: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Comments

The randomization reference distribution does not judge this outlier sample an unusual split.

The randomization test is not fooled by the outlier.

The two-sided p-value of .55 shows no significance.

The outlier has an equal chance of being in either the X-sample or the Y-sample under random splits.

The normal approximation is poor and should not be used here.

In fact, the randomization reference distribution looks like a (.5, .5) mixture of two approximately normal distributions, centered at ±1.463, respectively. The CLT effect still shows in these two constituent distributions, generated as randomization distributions from samples of 8 and 9 or 9 and 8, respectively.

49

Page 50: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Sufficient Conditions for CLT

In order for limit theory to make sense we need to let the population size N = m + n → ∞.

Denote the population values by Z_N1, ..., Z_NN.

Let Ȳ be the average of a random sample of size n (without replacement) from this population.

Sufficient conditions for asymptotic normality of Ȳ are

m → ∞ and n → ∞

and

max_i (Z_Ni − Z̄_N)² / ∑_{i=1}^N (Z_Ni − Z̄_N)²  →  0.

For an accessible proof see Lehmann's Nonparametrics: Statistical Methods Based on Ranks, Springer 2006, pp. 352ff.

While a single outlier may allow a CLT effect, it still shows its adverse effect for small N.

50

Page 51: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some More Experimenting with Randomization Tests

Take our SIR data and subtract Ȳ − X̄ from all the Y values, i.e., obtain Y′i = Yi − (Ȳ − X̄), i = 1, 2, ..., 9.

This altered Y′ sample has average Ȳ′ = Ȳ − (Ȳ − X̄) = X̄, i.e., both samples are aligned on their common average.

Then we take Y′1, ..., Y′4 and add (9/4) × (Ȳ − X̄) to each, i.e., obtain Y″i = Y′i + (9/4) × (Ȳ − X̄) for i = 1, 2, 3, 4, and leave the other Y″i = Y′i, i = 5, ..., 9 unchanged.

=⇒ (1/9)(Y″1 + ... + Y″9) = (1/9)(Y′1 + ... + Y′9) + (Ȳ − X̄) = X̄ + (Ȳ − X̄) = Ȳ

=⇒ Ȳ″ − X̄ = Ȳ − X̄, the same difference as before.

Again we will round Y″1, ..., Y″9 to the nearest decimal.

What is the effect on the randomization reference distribution?

51

Page 52: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Box Plot View of Altered SIR Data

[Box plots of the altered SIR data (log10(Ohm), roughly 5 to 14) by Flux (X, Y); annotation: FLUXY − FLUXX = −1.452]

52

Page 53: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Comments and Questions

The two samples are not so well aligned, comparing medians.

Three Y values seem to fall apart from the other 6, which results in Ȳ″ − X̄ = −1.452 ≈ −1.467, our previous value. It seems that there is a difference in Flux, but it shows as a difference in mean and scale (center and spread).

The three "outliers" could presumably be defective circuit boards or represent more strongly varying treatment effects (treatment/unit interactions).

What happens to Y″ − X for all 48620 splits?

What is the two-sided p-value of −1.452?

53

Page 54: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Randomization Reference Distribution of Y ′′ − X

[Histogram of the randomization reference distribution of Y″ − X = SIR″Y − SIRX, range about −2 to 2; annotations: P(Y″ − X ≤ −1.452) = 0.06495, P(Y″ − X ≥ 1.452) = 0.06495; 2-sided p-value: 0.1299 reference distribution, 0.1208 normal approximation]

54

Page 55: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Comments

The randomization reference distribution does not judge this more dispersed and shifted Y-sample a very unusual split.

The randomization test looks only at the difference in averages and not at scale changes, and thus loses some efficacy.

The two-sided p-value of .13 shows no significance.

The normal approximation is reasonable when compared to the previous case, especially when smoothing over the very local jaggedness, i.e., using a wider histogram bin width.

A difference in sample dispersion has a negative impact when comparing means.

55

Page 56: Stat 502 Design and Analysis of Experiments Two Sample Experiments

The Effect of Spread

We take our SIR data and transform them as follows:

X′i = X̄ + .5 × (Xi − X̄)  and  Y′i = Ȳ + .5 × (Yi − Ȳ)

X̄′ = X̄ and Ȳ′ = Ȳ, thus Ȳ′ − X̄′ = Ȳ − X̄ is unchanged.

However, we have reduced the spread in both samples by a factor .5:

√( ∑_i (X′i − X̄′)² / (m − 1) ) = .5 × √( ∑_i (Xi − X̄)² / (m − 1) )

and similarly for the Y′i.

Again round X′1, ..., X′9 and Y′1, ..., Y′9 to the nearest decimal.

What is the effect of this change on the randomization reference distribution?

56

Page 57: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Box Plot View of Altered SIR Data

[Box plots of the altered SIR data (log10(Ohm), roughly 8 to 13) by Flux (X, Y); annotation: FLUXY − FLUXX = −1.433]

57

Page 58: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Comments and Questions

The two samples show roughly the same difference in sample means, Ȳ′ − X̄′ = −1.433 ≈ −1.467 = Ȳ − X̄ in the original data. The difference comes from the rounding.

Here the difference in flux is more clearly defined, because each sample is more tightly scattered around its center.

Clearer separation of signal from noise!

What is the two-sided p-value of −1.433?

58

Page 59: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Randomization Reference Distribution of Y ′ − X ′

[Histogram of the randomization reference distribution of Y′ − X′ = SIR′Y − SIR′X, range about −2 to 2; annotations: P(Y′ − X′ ≤ −1.433) = 0.0002262, P(Y′ − X′ ≥ 1.433) = 0.0002262; 2-sided p-value: 0.0004525 reference distribution, 0.001152 normal approximation]

59

Page 60: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Comments and Questions

As expected, the two-sided p-value .00045 is much smaller.

The normal approximation seems somewhat reasonable, but the resulting two-sided p-value .0012 is more than twice as large as the p-value .00045. (−→ Tail discrepancy! Slide 43)

Instead of tightening the spread of each sample by a factor .5, enlarge the spread by a factor of 2. What would be the result?

Instead of tightening the spread of each sample by a factor .5, tighten the spread by a factor of .5 in only one of the samples. What would be the result?

60

Page 61: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Computation Time and Storage Issues

ref.dist2 <- function() {
  SIR <- flux$SIR
  rand.ref.dist <- combn(1:18, 9, FUN = m.diff, y = SIR)
}

> system.time(ref.dist2())
   user  system elapsed
   1.23    0.00    1.26

1.26 seconds to compute the (18 choose 9) = 48620 values of the reference distribution.

30 experimental units split into 15 and 15 =⇒ (30 choose 15) = 155,117,520 splits, and it would take at least 1.26 · 155117520/48620 = 4019.9 seconds, or 67 minutes.

Computation times and storage requirements get out of hand in a hurry. We need an alternative computational approach.

61

Page 62: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Approximation to Randomization Reference Distribution

A simple way out of this dilemma of exploding computation times and storage requirements is to generate a sufficiently large random sample (with replacement), say K = 10,000, of combinations from this set of all (m+n choose m) combinations.

For each sampled combination compute the statistic of interest, Si = S(Xi, Yi) = Ȳi − X̄i, i = 1, ..., K, and approximate the randomization reference CDF F(z) by the empirical distribution of the S1, ..., SK.

Approximate

F(z) = P(S(X, Y) ≤ z) = P(Ȳ − X̄ ≤ z) ≈ FK(z) = (1/K) ∑_{i=1}^K Bi(z)

with Bi(z) = 1 when Si = S(Xi, Yi) ≤ z and Bi(z) = 0 else.

62

Page 63: Stat 502 Design and Analysis of Experiments Two Sample Experiments

The Law of Large Numbers (LLN)

For any z we have E(Bi(z)) = F(z).

The Bi(z) are independent and identically distributed (iid).

The Law of Large Numbers (LLN)

=⇒ FK(z) = (1/K) ∑_{i=1}^K Bi(z) → F(z) as K → ∞.

Similarly, we can approximate p(z) = P(|S(X, Y)| ≥ |z|) = P(|Ȳ − X̄| ≥ |z|) by

pK(z) = (1/K) ∑_{i=1}^K Bi(z)

with Bi(z) = 1 when |Si| ≥ |z| and Bi(z) = 0 else.

For z = ȳ − x̄, the observed difference of sample means, pK(z) gives us a good approximation for the true two-sided p-value p(z) of z.

63

Page 64: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Sample Simulation Program

This can be done in a loop using the sample function in R.

sim.ref.dist <- function(K = 10000) {
  D.star <- numeric(K)
  for (i in 1:K) {
    SIR.s <- sample(SIR)
    D.star[i] <- mean(SIR.s[1:9]) - mean(SIR.s[10:18])
  }
  D.star
}

The output vector D.star holds the sampled reference distribution of Y − X from which estimated p-values can be computed for any observed y − x.

The following slide shows the QQ-plot comparison with the full randomization reference distribution, together with the respective p-values.

This approach should suffice for practical purposes.

It is problematic with many simultaneous tests (microarray data). Screen with a normal or Student-t approximation upfront.
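A brief usage sketch (assumed, with an arbitrary seed), tying this back to the observed difference:

set.seed(502)                              # arbitrary seed
D.star <- sim.ref.dist(10000)
p.val.sim <- mean(abs(D.star) >= 1.46666)  # approximates the full-distribution p-value .02587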

64

Page 65: Stat 502 Design and Analysis of Experiments Two Sample Experiments

QQ-Plot of Simulated & Full Randomization Reference Distribution

for Y − X

[QQ-plot comparing Y − X for all 10000 sampled combinations against Y − X for all combinations, both axes about −2 to 2, with the one-sided p-values annotated: p1 = 0.0119, p2 = 0.0127 for the sampled version versus p1 = 0.01294, p2 = 0.01294 for the full reference distribution]

65

Page 66: Stat 502 Design and Analysis of Experiments Two Sample Experiments

The 2-Sample Student t-Test Statistic

When testing the equality of means of two normal populations it is customary and optimal to compare the respective sample means relative to a measure of the dispersion in the two samples =⇒ Student's 2-sample t-statistic

t(X, Y) = [ (Ȳ − X̄) / √(1/n + 1/m) ] / √( [ ∑_{i=1}^n (Yi − Ȳ)² + ∑_{j=1}^m (Xj − X̄)² ] / (m + n − 2) )

In the presence of substantial inherent variation, "small" differences in means could easily be due to this variation and thus will not appear significant.

Judge mean differences relative to a measure of this variation.

So far we have not assumed that we sample normal populations. In fact, we only guaranteed that we sample from a finite population Z1, ..., ZN.
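As a hedged sketch (assuming the data frame flux as before), the classical two-sample t-test in R reproduces the numbers quoted on the following slides:

tt <- t.test(SIR ~ FLUX, data = flux, var.equal = TRUE)  # pooled-variance 2-sample t-test, df = 16
tt$statistic   # about +2.512; R orders the groups X, Y, so the sign is flipped
               # relative to the slide's t(x, y) = (Ybar - Xbar)/... = -2.512236
tt$p.value     # about .02310, the Student t16 p-value quoted on slides 69 and 75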

66

Page 67: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Randomization Distribution of the 2-Sample t-Statistic

The values of t(X, Y) are in 1-1 monotone increasing correspondence with those of Ȳ − X̄ under all splits. See Appendix B, slide 85.

In fact,

=⇒ t(X, Y) = [ √(m + n − 2) / √(1/n + 1/m) ] · W / √(1 − (mn/N) W²),  increasing in W,

where

W = (Ȳ − X̄) / √( ∑ (Zk − Z̄)² ),

with ∑ (Zk − Z̄)² staying constant over all splits.

Thus p-values for t(x, y) and ȳ − x̄ will be the same under each respective randomization reference distribution.

67

Page 68: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Student t-Approximation for the t(X ,Y ) Reference Distribution

In regular¹ situations the randomization reference distribution of t(X, Y) is very well approximated by a Student t-distribution with 16 = 18 − 1 − 1 degrees of freedom.

This was very useful before computing and simulation were readily available. All one needs to do is compute t(X, Y) for the observed data and calculate its p-value from the Student t16-distribution. Here t(x, y) = −2.512236.
Reference: G.E.P. Box and S.L. Anderson, "Permutation theory in the derivation of robust criteria and the study of departures from assumptions," J. Roy. Stat. Soc., Ser. B, Vol. 17 (1955), pp. 1-34.

The test based on t(X, Y) and its t-distribution under H0 also shows up in the normal-model-based approach to this problem. More on this later.

It is the concept of the t(X, Y) statistic (not Y − X) that generalizes to more complex experimental situations.

¹ Our outlier experiment shows what can happen.

68

Page 69: Stat 502 Design and Analysis of Experiments Two Sample Experiments

t(X ,Y ) Randomization Reference Distribution

[Histogram of the t(X, Y) randomization reference distribution, range about −4 to 4; annotations: P(t(X, Y) ≤ −2.512236) = 0.01294, P(t(X, Y) ≥ 2.512236) = 0.01294; 2-sided p-value: 0.02587 reference distribution, 0.023098 t16 approximation]

69

Page 70: Stat 502 Design and Analysis of Experiments Two Sample Experiments

t-QQ-Plot of t(X ,Y ) Randomization Reference Distribution

[t-QQ-plot: the 203 unique values of t(X, Y) plotted against the corresponding quantiles of the Student t16 approximation, both axes about −6 to 6; the reference distribution has longer tails than the Student t16 approximation]

70

Page 71: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Randomization Reference Distribution of Y ′′ − X

[Histogram of the t(X, Y) randomization reference distribution for the outlier-altered data (Y″), range about −3 to 3; annotations: P(t(X, Y) ≤ −0.8942885) = 0.2727, P(t(X, Y) ≥ 0.8942885) = 0.2727; 2-sided p-value: 0.5455 reference distribution, 0.38442 t16 approximation]

The outlier example shows that the t-distribution is not always a good fit.

71

Page 72: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Critical Value, Significance Level & Type I Error

Any extreme value of |Y − X| or |t(X, Y)| could either come about due to a rare chance event (via the randomization) or due to H0 actually being wrong.

We have to make a decision: Reject H0 or not?

We may decide to reject H0 when |Y − X| ≥ C or |t(X, Y)| ≥ C̃, where C and C̃ are the respective critical values.

To determine C or C̃ one usually sets a significance level α which limits the probability of rejecting H0 when in fact H0 is true (Type I error).

The requirement

α = P(reject H0 | H0) = P(|Y − X| ≥ C | H0) = P(|t(X, Y)| ≥ C̃ | H0)

then determines C = Cα or C̃ = C̃α. As α decreases, Cα and C̃α increase.

P(... | H0) refers to the randomization reference distribution.

72

Page 73: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Significance Levels and P-Values

When we reject H0 we would say that the results were significant at the level α, agreed upon prior to the experiment.

Commonly used values of α are α = .05 or α = .01, a remnant of statistical tables.

Rejecting at smaller α than these would be even stronger evidence against H0.

For how small an α would we still have rejected?

This leads us to the observed significance level α̂, or p-value, for the observed discrepancy value |y − x|:

p-value = P(|Y − X| ≥ |y − x| | H0) = α̂,  i.e., Cα̂ = |y − x|.

α̂ is the smallest α at which we would have rejected H0 with the observed |y − x|.

73

Page 74: Stat 502 Design and Analysis of Experiments Two Sample Experiments

How to Determine the Critical Value C.crit

For α = .05 we want to find C.crit such that mean(abs(x) >= C.crit) = .05. Here x = rand.ref.dist is the reference distribution vector.

Equivalently, find the .95-quantile of abs(x) via C.crit = quantile(abs(x), .95).

From the full reference distribution we get C.crit(α = .05) = 1.28889 and C.crit(α = .01) = 1.64444.

From the simulated reference distribution we get C.crit(α = .05) = 1.31111 and C.crit(α = .01) = 1.66667.

Note that we avoided the use of C in place of C.crit because in R the letter C has a predetermined meaning. =⇒ ?C.

R used to warn you when your chosen object name masks a system object name. This warning no longer happens.

74

Page 75: Stat 502 Design and Analysis of Experiments Two Sample Experiments

What Does the t-Distribution Give Us?

What does the observed t-statistic t(x, y) = −2.5122 give as a 2-sided p-value?
P(|t(X, Y)| ≥ 2.5122) = 2 * (1 - pt(2.5122, 16)) = 2 * pt(-2.5122, 16) = .02310, compared with .02587 from the randomization reference distribution.

What are the critical values tcrit(α) for |t(X, Y)| for level α = .05, .01 tests?

We find tcrit(α = .05) = C̃.05 = qt(.975, 16) = 2.1199 and tcrit(α = .01) = C̃.01 = qt(.995, 16) = 2.9208, respectively.

With |t(x, y)| = 2.5122 we would reject H0 at α = .05 since |t(x, y)| ≥ 2.1199 but not at α = .01 since |t(x, y)| < 2.9208.

The same message derives from the p-value: .01 < .02310 ≤ .05.

75

Page 76: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Comparing P-Values

The randomization reference distribution gave us a p-value of .02587.

The normal approximation for the Y − X reference distribution gave us .02828.

The t16 approximation for t(X, Y) gave us .02310.

The approximations deviate in opposite directions with roughly the same magnitude: .02828 − .02587 = .00241 and .02310 − .02587 = −.00277.

It is not clear whether the t approximation should be preferred based on approximation quality.

The t approximation involves a single distribution while the normal approximation still requires σ(Y − X).

t(X, Y) fits better into the larger picture of what lies ahead.

76

Page 77: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Critical Values of the Randomization Reference Distribution

[Histogram of the t(X, Y) randomization reference distribution, range about −4 to 4, with the observed value and the α = 0.05 and α = 0.01 critical values marked]

77

Page 78: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Some Concluding Comments

Critical values and fixed significance levels were motivated by

theoretical considerations that allowed the comparison of different level α tests and finding the best one (Neyman-Pearson optimality theory).

the planning of sample sizes to achieve sufficient power to detect treatment effects of specified size.

easy tabulation of critical values for a limited set of α values.

the fact that computing was not what it is today.

Statistical tables are following the path of logarithm tables.

78

Page 79: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Calculator Simulations in 1980

Back in 1980 I asked colleagues Piet Groeneboom and Ron Pyke to look at the large sample distribution of a particular statistic Sn (a very difficult problem).

They succeeded in showing that the asymptotic distribution of Sn should be normal:

Sn ≈ N( n log(n), 3n² log(n) ) for large n.

I simulated the distribution for Sn on my HP-41 CV.

After days & weeks intermediate results did not look normal.

Since it was known that Sn ≥ 0, a natural heuristic for decent normality would be

0 ≤ n log(n) − 3 × √(3n² log(n))  =⇒  log(n) ≥ 27, or n ≥ 5.3 × 10^11.

79

Page 80: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Calculators during R.A. Fisher's Time

Photographed at the Bonn Arithmeum, July 2013. Fisher used such a machine for 14 years.

80

Page 81: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Calculators in/before My Time

81

Page 82: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Appendix A: Sampling Theory Proofs

The following two slides establish the mean and variance of Ȳ when Y1, ..., Yn are randomly drawn without replacement from Z1, ..., ZN with N = m + n:

E(Ȳ) = Z̄

and

var(Ȳ) = (S²/n)(1 − n/N)  with  S² = (1/(N − 1)) ∑_{i=1}^N (Zi − Z̄)².

Note the evident correctness when n = N (m = 0).

82

Page 83: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Sampling Theory

Write Ȳ = (1/n) ∑_{i=1}^N Ii Zi, where the random variable Ii = 1 whenever the Y-sample contains Zi and Ii = 0 otherwise. Then

E(Ii) = P(Ii = 1) = (1 choose 1)(N−1 choose n−1) / (N choose n) = (N−1)! n! (N−n)! / [ N! (n−1)! (N−n)! ] = n/N,

and for i ≠ j

E(Ii Ij) = P(Ii = 1, Ij = 1) = (2 choose 2)(N−2 choose n−2) / (N choose n) = (N−2)! n! (N−n)! / [ N! (n−2)! (N−n)! ] = n(n−1) / (N(N−1)).

The Z1, ..., ZN are the finite population from which we sample.

=⇒ E(Ȳ) = (1/n) ∑_{i=1}^N Zi E(Ii) = (1/n) ∑_{i=1}^N Zi (n/N) = (1/N) ∑_{i=1}^N Zi = Z̄.

Further note

Ȳ = (1/n) ∑_{i=1}^N Ii Zi = (1/n) ∑_{i=1}^N Ii (Zi − Z̄) + Z̄  since  ∑_{i=1}^N Ii = n.

83

Page 84: Stat 502 Design and Analysis of Experiments Two Sample Experiments

var(Ȳ) = (S²/n)(1 − n/N)

Writing zi = Zi − Z̄ with ∑_{i=1}^N zi = 0 and dropping the constant Z̄ in the variance,

=⇒ var(Ȳ) = var( (1/n) ∑_{i=1}^N Ii zi ) = cov( (1/n) ∑_{i=1}^N Ii zi , (1/n) ∑_{j=1}^N Ij zj ) = (1/n²) ∑_{i=1}^N ∑_{j=1}^N zi zj cov(Ii, Ij).

For i = j we have cov(Ii, Ij) = var(Ii) = (n/N)(1 − n/N) = nm/N², and for i ≠ j we have

cov(Ii, Ij) = E(Ii Ij) − E(Ii)E(Ij) = P(Ii = 1, Ij = 1) − (n/N)(n/N) = n(n−1)/(N(N−1)) − n²/N²

= (n/N)( (n−1)/(N−1) − n/N ) = (n/N) · (N(n−1) − (N−1)n)/(N(N−1)) = −(n/N) · (N−n)/(N(N−1)) = −nm/(N²(N−1)).

var(Ȳ) = (1/n²) ∑_{i=1}^N zi² · nm/N² − (1/n²) ∑_{i≠j} zi zj · nm/(N²(N−1)) = S²( nm(N−1)/(n²N²) + nm/(n²N²) ) = (S²/n)(1 − n/N)

since 0 = ∑_i zi ∑_j zj = ∑_i ∑_j zi zj = ∑_i zi² + ∑_{i≠j} zi zj, i.e., ∑_{i≠j} zi zj = −∑_i zi².

84

Page 85: Stat 502 Design and Analysis of Experiments Two Sample Experiments

Appendix B: Monotone Equivalence of Y − X and t(X ,Y )

The values of t(X ,Y ) are in 1-1 monotone increasingcorrespondence with those of Y − X under all splits.

∑(Zk − Z)2 −

mn

N

(Y − X

)2=∑

Y 2

i +∑

X 2

j −1

N

(mX + nY

)2 − mn

N

(Y − X

)2=∑

Y 2

i +∑

X 2

j − nY 2 −mX 2 =∑

(Yi − Y )2 +∑

(Xi − X )2

Notet(X ,Y )

√1/n + 1/m

√m + n − 2

=Y − X√∑

(Yi − Y )2 +∑

(Xi − X )2=

W√1− mn

NW 2

↗ inW

with W = (Y − X )/√∑

(Zk − Z )2, where∑

(Zk − Z )2 isconstant over all splits.

=⇒ t(X ,Y ) =

√m + n − 2√1/n + 1/m

W√1− mn

NW 2

or W =t(X ,Y )√

m+n−21/n+1/m

+ mnNt(X ,Y )2

85