
Page 1: Valério De Patta Pillar Departamento de Ecologia Universidade Federal do Rio Grande do Sul

Accuracy and power of randomization tests in multivariate analysis of variance with vegetation data

Valério De Patta Pillar

Departamento de Ecologia

Universidade Federal do Rio Grande do Sul

Porto Alegre, Brazil

[email protected]

http://ecoqua.ecologia.ufrgs.br

Page 2

• Randomization testing:

– Became practical with fast microcomputers.

– Applicable to most cases analyzed by classical methods.

– Applicable to cases not covered by classical methods.

Page 3

How good is randomization testing?

• Is it accurate?

• Is it powerful enough?

Page 4

Group comparison by randomization testing

Choose a test criterion (λ) to compare the groups.

Permute the data according to the conditions stated by the null hypothesis (Ho) that the groups do not differ.

Calculate the test criterion in the random data (λo) and compare it to the value found in the observed data (λ).

After many iterations, the probability P(λo ≥ λ) will be the number of iterations with λo ≥ λ divided by the total number of iterations.

Reject Ho if P(λo ≥ λ) is smaller than a threshold (α).

Manly, B. F. J. 1997. Randomization, Bootstrap and Monte Carlo Methods in Biology. 2 ed. Chapman and Hall.
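The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used for the slides: the function names `randomization_test` and `abs_mean_difference`, the toy data, and the choice of the absolute difference between two group means as the test criterion are all hypothetical.

```python
import random

def randomization_test(values, groups, criterion, n_iter=10000, seed=0):
    """Return the observed criterion and the proportion of random
    permutations whose criterion is >= the observed value."""
    rng = random.Random(seed)
    observed = criterion(values, groups)
    shuffled = list(values)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # reassign observations to units at random (Ho)
        if criterion(shuffled, groups) >= observed:
            count += 1
    return observed, count / n_iter

def abs_mean_difference(values, groups):
    """Hypothetical test criterion: |mean of group a - mean of group b|."""
    a = [v for v, g in zip(values, groups) if g == "a"]
    b = [v for v, g in zip(values, groups) if g == "b"]
    return abs(sum(a) / len(a) - sum(b) / len(b))

# Toy data (assumed): two clearly separated groups of four units each.
values = [10, 12, 9, 11, 30, 32, 29, 31]
groups = ["a"] * 4 + ["b"] * 4

observed, p = randomization_test(values, groups, abs_mean_difference, n_iter=2000)
alpha = 0.05
print(observed, p, "reject Ho" if p < alpha else "do not reject Ho")
```

Any criterion can be passed in, which is what makes the approach applicable to cases not covered by classical methods.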

Page 5

Randomization test criteria for multivariate comparisons of any number of groups

Sum of squares between groups* ($Q_b$):

$$Q_b = Q_t - Q_w$$

where

$$Q_t = \frac{1}{n} \sum_{h=1}^{n-1} \sum_{i=h+1}^{n} d_{hi}^2$$

is the total sum of squares of n(n-1)/2 pair-wise squared dissimilarities between n sampling units, and

$$Q_w = \sum_{c=1}^{k} Q_{wc}$$

is the sum of squares within k groups, such that

$$Q_{wc} = \frac{1}{n_c} \sum_{h=1}^{n-1} \sum_{i=h+1}^{n} d_{hi|c}^2$$

where $d_{hi|c}^2$ compares units h and i that both belong to group c, and $n_c$ is the number of units in group c.

Pseudo F-ratio

$F = Q_b / Q_w$, the ratio between the sum of squares between groups ($Q_b$) and within groups ($Q_w$).

*Pillar, V. D. & Orlóci, L. 1996. J. Veg. Sci. 7:585-592.

Page 6

An example

How common is a Qb ≥ 50.068 if Ho were true (that the composition is unrelated to group)?

Cover-abundance of two plant functional types (PFTs) in 14 experimental plots of natural grassland, under different levels of N fertilizer addition (Sosinski 2000).

Plots     1     2     3     4     5     6     7     8     9     10
N level   0     30    30    100   100   100   100   170   170   200
PFT 4     6     5     5     3     3.2   1.6   0.6   0.8   1.4   0.4
PFT 13    1     0     1     2.2   2     2.2   0     0.8   2.6   5.6

Observed squared distance matrix

 0
 2.1   0
 1.3   0.5   0
 7.4   9.7   5.8   0
 6.1   8.0   4.5   0.1   0
16.6  17.8  12.8   2.0   2.6   0
27.0  21.2  18.0  10.6  10.8   5.8   0
23.4  20.0  16.0   6.8   7.2   2.6   0.7   0
19.1  21.2  15.6   2.7   3.6   0.2   7.4   3.6   0
44.7  54.4  44.4  18.3  20.8  13.0  31.4  23.2  10.0   0

SQ within groups (Qw) = 0/1 + 0.5/2 + (0.1 + 2 + 2.6 + 10.6 + 10.8 + 5.8)/4 + 3.6/2 + 0/1 = 10.02

Total sum of squares (Qt) = (2.1 + 1.3 + ... + 10.0)/10 = 60.088

SQ between groups (Qb) = 60.088 - 10.02 = 50.068

[Figure: scatter diagram of the 10 plots, each labelled by its N level (0, 30, 100, 170, 200)]

Is there a significant effect of N on vegetation composition as defined by these two PFTs?

Page 7

Reference set under Ho

Factor groups         0    30   30   100  100  100  100  170  170  200
Observation vectors   1    2    3    4    5    6    7    8    9    10

One possible permutation:

Factor groups         0    30   30   100  100  100  100  170  170  200
Observation vectors   4    8    7    6    10   3    9    2    5    1

If Ho is true, the observation vector in a given sampling unit is independent of the group to which the unit belongs.

Page 8

A random permutation and corresponding statistics

Sampling unit   0    30   30   100  100  100  100  170  170  200
Vector          4    8    7    6    10   3    9    2    5    1

[Figure: two scatter diagrams of the plots labelled by N level, one for the observed data and one for the permuted data]

Permuted squared distance matrix

 0
 6.8   0
10.6   0.7   0
 2.0   2.6   5.8   0
18.3  23.2  31.4  13.0   0
 5.8  16.0  18.0  12.8  44.4   0
 2.7   3.6   7.4   0.2  10.0  15.6   0
 9.7  20.0  21.2  17.8  54.4   0.5  21.2   0
 0.1   7.2  10.8   2.6  20.8   4.5   3.6   8.0   0
 7.4  23.4  27.0  16.6  44.7   1.3  19.1   2.1   6.1   0

SQ within groups (Qwo) = 0/1 + 0.7/2 + (13 + 12.8 + 44.4 + 0.2 + 10 + 15.6)/4 + 8/2 + 0/1 = 28.35

Total sum of squares (Qt) = (6.8 + 10.6 + ... + 6.1)/10 = 60.088

SQ between groups (Qbo) = 60.088 - 28.35 = 31.738

Since 31.738 < 50.068 (Qbo < Qb), this iteration adds zero to the frequency of cases in which Qbo ≥ Qb.
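The arithmetic of this example can be checked with a short Python sketch that works directly on the observed squared distance matrix printed in the slides. It is an illustration only: the names `LOWER`, `d2` and `sums_of_squares`, the random seed, and the 5000 iterations are assumptions, and because the printed distances are rounded to one decimal the results agree with the slide values only approximately.

```python
import random

# Lower triangle of the observed squared distance matrix as printed on the slide
# (LOWER[r] holds the distances between unit r+2 and units 1..r+1).
LOWER = [
    [2.1],
    [1.3, 0.5],
    [7.4, 9.7, 5.8],
    [6.1, 8.0, 4.5, 0.1],
    [16.6, 17.8, 12.8, 2.0, 2.6],
    [27.0, 21.2, 18.0, 10.6, 10.8, 5.8],
    [23.4, 20.0, 16.0, 6.8, 7.2, 2.6, 0.7],
    [19.1, 21.2, 15.6, 2.7, 3.6, 0.2, 7.4, 3.6],
    [44.7, 54.4, 44.4, 18.3, 20.8, 13.0, 31.4, 23.2, 10.0],
]
GROUPS = [0, 30, 30, 100, 100, 100, 100, 170, 170, 200]  # N level per position
N = len(GROUPS)
PAIRS = [(p, q) for p in range(N) for q in range(p + 1, N)]

def d2(h, i):
    """Squared distance between observation vectors h and i (0-based)."""
    return 0.0 if h == i else LOWER[max(h, i) - 1][min(h, i)]

def sums_of_squares(order):
    """Return (Qt, Qw, Qb) when observation order[p] sits at position p."""
    qt = sum(d2(order[p], order[q]) for p, q in PAIRS) / N
    qw = 0.0
    for g in set(GROUPS):
        within = sum(d2(order[p], order[q]) for p, q in PAIRS
                     if GROUPS[p] == g and GROUPS[q] == g)
        qw += within / GROUPS.count(g)
    return qt, qw, qt - qw

qt_obs, qw_obs, qb_obs = sums_of_squares(list(range(N)))

# The permutation shown on the slide (vectors 4 8 7 6 10 3 9 2 5 1), 0-based.
slide_order = [3, 7, 6, 5, 9, 2, 8, 1, 4, 0]
_, qw_perm, qb_perm = sums_of_squares(slide_order)

rng = random.Random(0)
n_iter = 5000
order = list(range(N))
count = 0
for _ in range(n_iter):
    rng.shuffle(order)
    # Small tolerance so that floating-point summation order cannot hide ties.
    if sums_of_squares(order)[2] >= qb_obs - 1e-9:
        count += 1
p_value = count / n_iter
print(qw_obs, qt_obs, qb_obs, qw_perm, qb_perm, p_value)
```

Since Qt is the same for every permutation, comparing Qbo with Qb is equivalent to comparing Qwo with Qw; the sketch recomputes all three only for readability.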

Page 9

After 10000 random permutations…

----------------------------------------------------------------
RANDOMIZATION TEST
----------------------------------------------------------------
Elapsed time: 1 second
Number of iterations: 10000
Group partition of sampling units:
Sampling units:  1  2  3  4  5  6  7  8  9  10
Factor N:        1  2  2  3  3  3  3  4  4  5

Source of variation    Sum of squares (Q)    P(Qbo ≥ Qb)
----------------------------------------------------------------
N:
Between groups         50.068                0.0049
Within groups          10.02
----------------------------------------------------------------
Total                  60.088

Group centroid vectors in each group:
Factor N:
Group 1 (n=1):  5.6  1.4
Group 2 (n=2):  5    0.3
Group 3 (n=4):  2.1  1.6
Group 4 (n=2):  1.1  1.7
Group 5 (n=1):  0.4  5.6

Page 10

Two-factor designs

Test criterion:

Qb = Qt - Qw is based on the groups defined by the joint states of the factors.

Qb is partitioned as

Qb = Qb|A + Qb|B + Qb|AB

where

Qb|A: sum of squares between la groups according to factor A, disregarding factor B

Qb|B: sum of squares between lb groups according to factor B, disregarding factor A

Qb|AB: sum of squares of the interaction AB, obtained by difference.

F-ratio = Qb/Qw

Page 11

Unrestricted permutation in two-factor design

Groups factor A       1  1  1  1  2  2  2  2
Groups factor B       1  2  3  4  1  2  3  4
Observation vectors   1  2  3  4  5  6  7  8

One permutation:

Groups factor A       1  1  1  1  2  2  2  2
Groups factor B       1  2  3  4  1  2  3  4
Observation vectors   6  8  1  4  5  7  2  3

Page 12

Two-factor Multivariate Analysis of Variance

Observed:

A  B
1  1     0.0
1  2   102.7    0.0
2  1    40.7  101.0    0.0
2  2   149.1   55.2  149.7    0.0
1  1    42.0  133.2   46.5  176.9    0.0
1  2   118.7   62.5  102.5   93.2  111.8    0.0
2  1    50.7   95.9   47.7  153.9   47.5   97.8    0.0
2  2    66.9   75.7   69.3  120.8   85.3   57.0   57.7    0.0

   A     1      1      2      2      1      1      2      2
   B     1      2      1      2      1      2      1      2

One random permutation:

A  B
1  1     0.0
1  2    57.7    0.0
2  1    66.9   50.7    0.0
2  2    69.3   47.7   40.7    0.0
1  1    75.7   95.9  102.7  101.0    0.0
1  2    85.3   47.5   42.0   46.5  133.2    0.0
2  1    57.0   97.8  118.7  102.5   62.5  111.8    0.0
2  2   120.8  153.9  149.1  149.7   55.2  176.9   93.2    0.0

   A     1      1      2      2      1      1      2      2
   B     1      2      1      2      1      2      1      2

                                Observed   Iteration 2   …
Factor A:                       21.5       26.6          …
Factor B:                       129.1      37.5          …
Interaction A x B:              26.9       54            …
Qb combination A and B:         177.5      118.1         …
Sum of squares within groups:   136.5      195.9         …
Total sum of squares:           314        314           …

Data: Species (57) composition in 8 vegetation units surveyed in two landscape positions (factor A) and two grazing levels (factor B).
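The observed column of the table can be reproduced from the observed matrix with the partition Qb = Qb|A + Qb|B + Qb|AB. The Python sketch below is an illustration under assumptions: the helper names `d2` and `within_ss` are hypothetical, and each main-effect term is computed as Qt minus the within-group sum of squares for the grouping by that factor alone, which matches the printed values to within rounding of the distances.

```python
# Lower triangle of the observed matrix as printed on the slide.
LOWER = [
    [102.7],
    [40.7, 101.0],
    [149.1, 55.2, 149.7],
    [42.0, 133.2, 46.5, 176.9],
    [118.7, 62.5, 102.5, 93.2, 111.8],
    [50.7, 95.9, 47.7, 153.9, 47.5, 97.8],
    [66.9, 75.7, 69.3, 120.8, 85.3, 57.0, 57.7],
]
A = [1, 1, 2, 2, 1, 1, 2, 2]  # landscape position of each unit
B = [1, 2, 1, 2, 1, 2, 1, 2]  # grazing level of each unit
N = len(A)
PAIRS = [(h, i) for h in range(N) for i in range(h + 1, N)]

def d2(h, i):
    """Entry of the matrix for units h < i (0-based)."""
    return LOWER[i - 1][h]

def within_ss(labels):
    """Sum over groups of (sum of within-group pairwise entries) / group size."""
    total = 0.0
    for g in set(labels):
        pair_sum = sum(d2(h, i) for h, i in PAIRS
                       if labels[h] == g and labels[i] == g)
        total += pair_sum / labels.count(g)
    return total

qt = sum(d2(h, i) for h, i in PAIRS) / N
qw = within_ss(list(zip(A, B)))   # groups = joint states of A and B
qb = qt - qw
qb_a = qt - within_ss(A)          # factor A, disregarding B
qb_b = qt - within_ss(B)          # factor B, disregarding A
qb_ab = qb - qb_a - qb_b          # interaction, by difference

print(round(qb_a, 1), round(qb_b, 1), round(qb_ab, 1), round(qb, 1), round(qw, 1))
```

Running the same function on the permuted matrix would give the "Iteration 2" column; repeating this over many permutations yields the reference distribution for each term.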

Page 13

After 10000 random permutations…

Source of variation       Sum of squares (Q)   F = Qb/Qw   P(Fo ≥ F)
----------------------------------------------------------------------
A (landscape position):
Between groups            21.519               0.15769     0.7172
----------------------------------------------------------------------
B (grazing):
Between groups            129.11               0.9461      0.0246
----------------------------------------------------------------------
A x B                     26.913               0.19722     0.5776
----------------------------------------------------------------------
Between groups            177.54               1.301       0.1236
Within groups             136.46
----------------------------------------------------------------------
Total                     314

Data: Species (57) composition in 8 vegetation units surveyed in two landscape positions (factor A) and two grazing levels (factor B). Unrestricted random permutations. Test criterion F-ratio = Qb/Qw.

Page 14

Restricted permutations

• In two-factor (not nested) designs, for testing one factor, permutations may be restricted to occur within the levels of the other factor (Edgington 1987).

• Restricted permutation within the levels of factor A (for testing factor B):

Groups factor A                 1  1  1  1  2  2  2  2
Groups factor B                 1  1  2  2  1  1  2  2
Observation vector identities   1  2  3  4  5  6  7  8

One permutation:

Groups factor A                 1  1  1  1  2  2  2  2
Groups factor B                 1  1  2  2  1  1  2  2
Observation vector identities   2  4  3  1  5  7  8  6

Edgington, E. S. 1987. Randomization Tests. Marcel Dekker, New York.
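A restricted permutation of this kind can be generated by shuffling observation identities separately inside each level of the factor that is held fixed. The following Python sketch is a minimal illustration; the function name `restricted_permutation` and the seed are assumptions, not part of the original software.

```python
import random

def restricted_permutation(fixed_factor, rng):
    """Permute unit indices only within the levels of `fixed_factor`.

    Returns a list `order` where order[p] is the (0-based) observation
    vector assigned to position p.
    """
    n = len(fixed_factor)
    order = list(range(n))
    for level in set(fixed_factor):
        positions = [p for p in range(n) if fixed_factor[p] == level]
        shuffled = positions[:]
        rng.shuffle(shuffled)
        for p, source in zip(positions, shuffled):
            order[p] = source
    return order

factor_a = [1, 1, 1, 1, 2, 2, 2, 2]   # held fixed while testing factor B
rng = random.Random(1)
order = restricted_permutation(factor_a, rng)
print([i + 1 for i in order])          # 1-based identities, as on the slide
```

Every vector stays inside its original level of factor A, so any effect of A is preserved in the reference set while the assignment to levels of B is randomized.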

Page 15

Permutations of residuals instead of raw data

Permuting observation vectors from which the effects of both factors were removed can overcome the impossibility of exact tests for interactions.

Residuals are computed in the data before obtaining the dissimilarity matrix.

For testing the interaction in two-factor analysis, the residuals remove both factors:

$$z_{hijk} = y_{hijk} - \bar{y}_{hi..} - \bar{y}_{h.j.} + \bar{y}_{h...}$$

$y_{hijk}$: observation of variable h in unit k, belonging to group i in factor A and to group j in factor B;

$\bar{y}_{hi..}$: mean for variable h in factor A group i;

$\bar{y}_{h.j.}$: mean for variable h in factor B group j;

$\bar{y}_{h...}$: overall mean for variable h in the data set.

Anderson, M. J. and ter Braak, C. 2003. Journal of Statistical Computation and Simulation 73: 85-113.
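The residual formula can be illustrated with a small Python sketch. The function name `interaction_residuals` and the toy data are assumptions for the example; with purely additive data in a balanced design the residuals vanish, while an added interaction leaves a non-zero pattern.

```python
def interaction_residuals(y, a, b):
    """Remove both main effects from each variable.

    y: list of observation vectors (one list of variables per unit)
    a, b: level of factor A and of factor B for each unit
    """
    n, m = len(y), len(y[0])

    def mean_vector(units):
        units = list(units)
        return [sum(y[k][h] for k in units) / len(units) for h in range(m)]

    overall = mean_vector(range(n))
    mean_a = {lv: mean_vector(k for k in range(n) if a[k] == lv) for lv in set(a)}
    mean_b = {lv: mean_vector(k for k in range(n) if b[k] == lv) for lv in set(b)}
    # z = y - mean(A group) - mean(B group) + overall mean, per variable
    return [[y[k][h] - mean_a[a[k]][h] - mean_b[b[k]][h] + overall[h]
             for h in range(m)] for k in range(n)]

a = [1, 1, 2, 2]
b = [1, 2, 1, 2]
additive = [[0.0], [1.0], [2.0], [3.0]]       # A adds 2, B adds 1, no interaction
with_interaction = [[0.0], [1.0], [2.0], [5.0]]

z_additive = interaction_residuals(additive, a, b)
z_interaction = interaction_residuals(with_interaction, a, b)
print(z_additive, z_interaction)
```

The dissimilarity matrix would then be computed from these residual vectors, and the residual vectors, not the raw data, would be permuted when testing the interaction.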

Page 16

----------------------------------------------------------------------
RANDOMIZATION TEST
----------------------------------------------------------------------
Elapsed time: 7 seconds
Number of iterations: 10000
Group partition of sampling units:
Sampling units:       1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
Landscape position:   1  1  2  2  3  3  4  4  1  1   2   2   3   3   4   4
Grazing:              1  2  1  2  1  2  1  2  1  2   1   2   1   2   1   2

Source of variation       Sum of squares (Q)   F = Qb/Qw   P(Fo ≥ F)
----------------------------------------------------------------------
Landscape position:
Between groups            439.55               1.6858      0.0003
Contrasts:
 1 -1  0  0               21.519               0.073575    0.8301
 1  0 -1  0               97.293               0.31375     0.058
 1  0  0 -1               250                  1.315       0.059
 0  1 -1  0               107.92               0.34032     0.0519
 0  1  0 -1               240.88               1.2219      0.0564
 0  0  1 -1               161.48               0.75192     0.0563
 1  1 -2  0               129.64               0.26931     0.0041
 1  1  1 -3               288.39               0.43803     0.0014
----------------------------------------------------------------------
Grazing:
Between groups            122.94               0.47152     0.0055
----------------------------------------------------------------------
LanPosition x Grazing     123.56               0.4739      0.1605
----------------------------------------------------------------------
Between groups            686.05               2.6312      0.0003
Within groups             260.73
----------------------------------------------------------------------
Total                     946.78

Two-factor multivariate analysis of variance by randomization testing for the effects of landscape position and grazing level in natural grassland, southern Brazil (data from Pillar 1986). The data set contains 16 pooled community stands by 60 species.

Restricted random permutations for testing factors landscape and grazing.

Permutation of residuals removing both factors for testing the interaction.

Page 17

How good is randomization testing in two-factor multivariate analysis of variance?

Page 18

Simulation of interaction

Factor A          1      1      2      2
Factor B          1      2      1      2

Group averages    0.00   0.00   0.16   0.16
                  0.00   0.16   0.16   0.32
                  0.00   0.16   0.16   0.00
                  0.00   0.16   0.16   0.48
                  0.00   0.16   0.16   0.64

[Figure: five interaction plots, one per row of group averages above; x-axis: levels 1 and 2]

• For each case, 1000 data sets were generated, with distribution properties of real vegetation data, and subjected to multivariate analysis of variance with randomization testing.

• When the factor or interaction effect is set to zero, the proportion of Ho rejection under a given threshold α estimates Type I Error, the probability of wrongly rejecting Ho when it is true.

• If Type I Error is equal to α, the test is exact.

• When the factor or interaction effect is > 0, the proportion of Ho rejection estimates the power of the test, which is the one-complement of Type II Error, the probability of not rejecting Ho when it is false.
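The logic of estimating Type I Error and power from the proportion of Ho rejection can be sketched with a deliberately simplified Python example. Everything specific here is an assumption made to keep the example small and fast: a single variable with Gaussian noise, two groups of eight units, the absolute difference of group means as criterion, 300 simulated data sets and 200 permutations each (the study itself used multivariate vegetation data with 1000 of each).

```python
import random

def perm_p_value(x, labels, n_perm, rng):
    """P(criterion in permuted data >= observed) for |mean difference|."""
    def criterion(lbl):
        g1 = [v for v, l in zip(x, lbl) if l == 1]
        g2 = [v for v, l in zip(x, lbl) if l == 2]
        return abs(sum(g1) / len(g1) - sum(g2) / len(g2))
    observed = criterion(labels)
    lbl = list(labels)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(lbl)
        if criterion(lbl) >= observed - 1e-12:  # tolerance for float ties
            count += 1
    return count / n_perm

def rejection_rate(effect, n_sets=300, n_perm=200, alpha=0.05, seed=42):
    """Proportion of simulated data sets in which Ho is rejected."""
    rng = random.Random(seed)
    labels = [1] * 8 + [2] * 8
    rejections = 0
    for _ in range(n_sets):
        # Group 2 is shifted by `effect`; effect = 0 means Ho is true.
        x = [rng.gauss(effect if l == 2 else 0.0, 1.0) for l in labels]
        if perm_p_value(x, labels, n_perm, rng) < alpha:
            rejections += 1
    return rejections / n_sets

type_i_error = rejection_rate(effect=0.0)   # should be close to alpha
power = rejection_rate(effect=2.0)          # should be well above alpha
print(type_i_error, power)
```

With the effect set to zero the printed rate estimates Type I Error; with a positive effect it estimates power.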

Page 19

Simulated data generated with distributional properties of real data

Data set: 16 grassland units described by cover of 60 species.

Two factors: landscape position (top-convex, concave-lowland) and grazing levels (grazed, ungrazed).

Procedure described by Peres-Neto & Olden (2000):

1. Calculate the mean (μij) and the standard deviation (σij) for each species vector i within each group j defined by the four factor level combinations;

2. Standardize these vectors to mean 0 and standard deviation 1, thij = (xhij - μij)/σij;

3. Randomly permute whole stand vectors across groups;

4. Restore the original dispersion within each group by computing new observations shij = thij σij, defining in this way a data set with the conditions specified by Ho;

5. Apply to the species vectors the corresponding group differences for factor and interaction effects;

6. Perform the randomization tests using 1000 random permutations;

7. Repeat steps (3) to (6) 1000 times, recording the proportion of Ho rejection.

Peres-Neto, P.R. & Olden, J.D. 2000. Animal Behaviour 61: 79-86.
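Steps 1 to 4 of the procedure can be sketched in Python as follows. This is only a rough illustration: the function name `simulate_under_h0`, the use of the sample standard deviation, the handling of species with zero within-group dispersion, the toy data, and the reading of step 4 as a rescaling by the standard deviation of the receiving group are all assumptions, since the slide does not spell these details out.

```python
import random
from statistics import mean, stdev

def simulate_under_h0(stands, groups, rng):
    """Generate one data set satisfying Ho from real stand vectors.

    stands: list of stand vectors (cover of each species per stand)
    groups: group label (factor level combination) of each stand
    """
    n, m = len(stands), len(stands[0])
    members = {g: [k for k in range(n) if groups[k] == g] for g in set(groups)}
    # Step 1: mean and standard deviation of each species within each group.
    mu = {g: [mean([stands[k][i] for k in idx]) for i in range(m)]
          for g, idx in members.items()}
    sd = {g: [stdev([stands[k][i] for k in idx]) for i in range(m)]
          for g, idx in members.items()}
    # Step 2: standardize within groups (assumed: 0 when a species has no dispersion).
    t = [[(stands[k][i] - mu[groups[k]][i]) / sd[groups[k]][i]
          if sd[groups[k]][i] > 0 else 0.0 for i in range(m)] for k in range(n)]
    # Step 3: permute whole stand vectors across groups.
    order = list(range(n))
    rng.shuffle(order)
    # Step 4 (assumed reading): rescale by the dispersion of the receiving group.
    return [[t[order[k]][i] * sd[groups[k]][i] for i in range(m)]
            for k in range(n)]

# Toy data (assumed): 8 stands, 3 species, 4 groups of 2 stands each.
stands = [[5, 0, 1], [7, 1, 1], [2, 3, 0], [4, 5, 2],
          [9, 0, 4], [6, 2, 3], [1, 8, 2], [3, 6, 5]]
groups = ["a1b1", "a1b1", "a1b2", "a1b2", "a2b1", "a2b1", "a2b2", "a2b2"]
simulated = simulate_under_h0(stands, groups, random.Random(3))
print(simulated)
```

Step 5 would then add the chosen group differences to these vectors, and steps 6 and 7 would run the randomization test on each generated data set and record the proportion of rejections.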

Page 20

[Figure: interaction plot of the simulated group averages; x-axis: levels 1 and 2]

Increasing effect of one factor, no interaction: proportion of Ho rejection

Unrestricted permutations

Effects                Qb                      F-ratio
1      2      1x2      1      2      1x2      1      2      1x2
0.00   0.00   0.00     0.031  0.03   0.048    0.032  0.042  0.049
0.16   0.00   0.00     0.977  0.012  0.015    0.971  0.036  0.041
0.32   0.00   0.00     1.000  0.000  0.000    1.000  0.012  0.023

Restricted permutations (factors 1 and 2) and permutation of residuals (1x2), F-ratio

1      2      1x2
0.043  0.042  0.056
0.971  0.042  0.041
1.000  0.040  0.047

With no factor and interaction effects, type I error is not different from 0.05, as expected by using α = 0.05.

As the effect of factor 1 increases, the rejection rates for factor 2 and for the interaction fall below the nominal α with unrestricted permutations, using either Qb or the F-ratio, but not with restricted permutations and residuals.

Results of power evaluation by data simulation in two-factor MANOVA. The proportion of Ho rejection at α = 0.05 was obtained for 1000 simulated data sets generated on the basis of plant community data with 16 units and 60 species, with increasing difference between the two groups for factor 1, with no interaction. Each factor combination had an equal number of units. For each data set a randomization test was run with 1000 iterations.

Page 21

[Figure: interaction plot of the simulated group averages; x-axis: levels 1 and 2]

Increasing effect of both factors, no interaction: proportion of Ho rejection

Effects                Qb                      F-ratio
1      2      1x2      1      2      1x2      1      2      1x2
0.16   0.16   0.00     0.777  0.803  0.003    0.932  0.925  0.027
0.32   0.32   0.00     1.000  1.000  0.000    1.000  1.000  0.005

Restricted permutations (factor 1) and permutation of residuals (1x2), F-ratio

1      1x2
0.971  0.041
1.000  0.053

As the effects of both factors increase, the rejection rate for the interaction falls below the nominal α with unrestricted permutations, using either Qb or the F-ratio, but not with residuals.

Page 22

[Figure: interaction plot of the simulated group averages; x-axis: levels 1 and 2]

No factor effects, increasing interaction: proportion of Ho rejection

Effects                Qb                      F-ratio
1      2      1x2      1      2      1x2      1      2      1x2
0.00   0.00   0.16     0.009  0.007  0.970    0.023  0.039  0.963
0.00   0.00   0.32     0.000  0.000  1.000    0.007  0.009  1.000

Restricted permutations (factor 1) and permutation of residuals (1x2), F-ratio

1      1x2
0.000  0.956
0.000  1.000

As the interaction effect increases, the rejection rate for both factors falls below the nominal α with Qb and the F-ratio, under both unrestricted and restricted permutations.

But main factors should not be considered at all when interaction is present!

Page 23

Increasing effect of both factors, with weak or stronger interaction:

[Figure: two interaction plots of the simulated group averages; x-axis: levels 1 and 2]

Proportion of Ho rejection

Effects                Qb                      F-ratio
1      2      1x2      1      2      1x2      1      2      1x2
0.08   0.08   0.16     0.112  0.158  0.200    0.184  0.198  0.230
0.24   0.24   0.16     0.987  0.989  0.001    1.000  1.000  0.061

Restricted permutations (factor 1) and permutation of residuals (1x2), F-ratio

1      1x2
0.159  0.255
1.000  0.222

Proportion of Ho rejection

Effects                Qb                      F-ratio
1      2      1x2      1      2      1x2      1      2      1x2
0.16   0.16   0.32     0.445  0.505  0.555    0.849  0.828  0.860
0.32   0.32   0.32     0.999  0.995  0.000    1.000  1.000  0.421

Restricted permutations (factor 1) and permutation of residuals (1x2), F-ratio

1      1x2
0.823  0.941
1.000  0.956

As the effects of both factors increase, the power of the test with permutations of raw data is decreased for detecting the interaction when using Qb and the F-ratio, but not when permuting residuals.