
Page 1:

CHAPTER 21

Canonical Correspondence Analysis

From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon http://www.pcord.com

Tables, Figures, and Equations

Page 2:

Interested only in community structure that is related to measured environmental variables?
    No: Do not use CCA or RDA.
    Yes: Species have which kind of relationship to explanatory variables?
        Unimodal: Use CCA.
        Linear: Use RDA.

Figure 21.1. Decision tree for using CCA for community data. Assume that we have a site × species matrix and a site × environment matrix and that chi-square distances are acceptable. RDA is a constrained ordination method based on a linear model (see “Variations” below).

Page 3:

Matrix relationship: A only
    Questions for which CCA is OK: Not applicable.
    Questions for which CCA is not OK: What are the strongest gradients in species composition?

Matrix relationship: Ho: no linear relationship between A and E
    Questions for which CCA is OK: Are any aspects of community structure related to these environmental variables?
    Questions for which CCA is not OK: Are the strongest community gradients related to these environmental variables?

Matrix relationship: Describe A and E
    Questions for which CCA is OK: How is the community structure related to these environmental variables?
    Questions for which CCA is not OK: How are the strongest gradients in community structure related to these environmental variables?

Table 21.1. Questions about the community (A) and environmental or experimental design (E) matrices that are appropriate for using CCA.

Page 4:

The basic method

The species data matrix Y contains nonnegative abundances, yij, for i = 1 to n sample units and j = 1 to p species.

y+j indicates species totals

yi+ indicates sample unit (site) totals

The environmental matrix Z contains values for n sites by q environmental variables.

1. Start with arbitrary but unequal site scores, x.

Page 5:

2. Calculate species scores, u, by weighted averaging of the site scores:

$$u_j = \lambda^{\alpha-1} \sum_{i=1}^{n} y_{ij} x_i / y_{+j}$$

where uj is the score for species j, xi is the score (weight) for site i, and α = user-selected scaling constant as described later.

Page 8:

3. Calculate new site scores, x*, by weighted averaging of the species scores:

$$x_i^* = \lambda^{-\alpha} \sum_{j=1}^{p} y_{ij} u_j / y_{i+}$$

where xi* is the score for site i, uj is the score (weight) for species j, and α = user-selected scaling constant as described later.

Page 10:

4. Obtain regression coefficients, b, by weighted least-squares multiple regression of the site scores on the environmental variables. The weights are the site totals stored in the diagonal of the otherwise empty, n × n square matrix R.

$$b = (Z'RZ)^{-1} Z'R x^*$$

where Z is the environmental matrix and x* contains the WA scores.

Page 12:

5. Calculate new site scores that are the fitted values from the preceding regression:

$$x = Zb$$

These are the "LC scores" of Palmer (1993), which are linear combinations of the environmental variables.

Page 13:

6. Adjust the site scores by making them uncorrelated with previous axes by weighted least squares multiple regression of the current site scores on the site scores of the preceding axes (if any). The adjusted scores are the residuals from this regression.

Page 14:

7. Center and standardize the site scores to a mean = 0 and variance = 1.

Page 15:

8. Check for convergence on a stable solution by summing the squared differences in site scores from those in the previous iteration. If the convergence criterion (detailed below) has not been reached, return to step 2.

Page 16:

9. Save site scores and species scores, then construct additional axes as desired by going to step 1.
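Steps 1 to 8 can be sketched for a single axis as follows. This is a minimal illustration under stated assumptions, not PC-ORD's implementation: the names `cca_axis` and `weighted_lstsq` are hypothetical, the user-selected scaling constant is omitted (species scores are plain weighted averages, which changes only the final scaling of the scores), the environmental matrix is centered by the weighted site means in place of an explicit intercept, and the random starting scores, tolerance, and iteration cap are arbitrary choices.

```python
import numpy as np

def weighted_lstsq(Z, x, w):
    """Weighted least squares, b = (Z'RZ)^-1 Z'R x, with R = diag(w)."""
    ZtR = Z.T * w                      # equivalent to Z.T @ np.diag(w)
    return np.linalg.solve(ZtR @ Z, ZtR @ x)

def cca_axis(Y, Z, prev_axes=None, tol=1e-13, max_iter=10000, seed=0):
    """Return (LC scores, species scores, unscaled WA scores, shrinkage s).

    prev_axes: optional list of LC score vectors from earlier axes.
    """
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    row_tot = Y.sum(axis=1)            # y_i+ (site totals, must be > 0)
    col_tot = Y.sum(axis=0)            # y_+j (species totals, must be > 0)
    w = row_tot / row_tot.sum()        # site weights y_i+ / y_++
    Zc = Z - w @ Z                     # assumption: weighted centering of Z
    x = np.random.default_rng(seed).normal(size=Y.shape[0])   # step 1
    for _ in range(max_iter):
        u = (Y.T @ x) / col_tot                      # step 2: species scores
        x_wa = (Y @ u) / row_tot                     # step 3: WA site scores
        b = weighted_lstsq(Zc, x_wa - w @ x_wa, w)   # step 4: regression
        x_lc = Zc @ b                                # step 5: LC scores
        if prev_axes:                                # step 6: residuals on
            P = np.column_stack(prev_axes)           #   earlier axes
            x_lc = x_lc - P @ weighted_lstsq(P, x_lc, w)
        x_lc = x_lc - w @ x_lc                       # step 7: center ...
        s = np.sqrt(w @ x_lc ** 2)                   #   weighted std. dev.
        x_new = x_lc / s                             #   ... and standardize
        resid = np.sum((x_new - x) ** 2)             # step 8: convergence
        x = x_new
        if resid < tol:
            break
    # At convergence one pass shrinks a standardized x by the factor s,
    # so s approximates the eigenvalue of the axis.
    return x, u, x_wa, s
```

For step 9, a second axis would be obtained by calling `cca_axis(Y, Z, prev_axes=[x1])` with the first-axis LC scores `x1`.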

Page 17:

Axis scaling

Centered with Unit Variance. The site scores are rescaled such that the mean is zero and the variance is one. Three steps:

$$\bar{x} = \sum_i w_i^* x_i$$

$$s^2 = \sum_i w_i^* (x_i - \bar{x})^2$$

$$x_i^* = (x_i - \bar{x}) / s$$

where xi* is the new site score and wi* is the weight for site i (wi* = yi+ / y++).
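The rescaling to zero mean and unit variance can be sketched in NumPy as below. The function name `standardize_site_scores` is hypothetical; the weights follow the definition wi* = yi+ / y++ given above, and the three statements in the body correspond to the weighted mean, the weighted variance, and the rescaling.

```python
import numpy as np

def standardize_site_scores(x, site_totals):
    """Center and scale site scores using weights w_i = y_i+ / y_++."""
    x = np.asarray(x, dtype=float)
    totals = np.asarray(site_totals, dtype=float)
    w = totals / totals.sum()                   # weights sum to 1
    xbar = np.sum(w * x)                        # weighted mean
    s = np.sqrt(np.sum(w * (x - xbar) ** 2))    # weighted standard deviation
    return (x - xbar) / s                       # new site scores
```

After this step the weighted mean of the returned scores is zero and their weighted variance is one; the unweighted mean and variance generally are not.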

Page 18:

Hill's scaling standardizes the scores such that:

$$\sum_{i,j} y_{ij} (x_i - u_j)^2 = y_{++}$$

In CCA, Hill's scaling is accomplished by multiplying the scores by a constant based on / 1- (see below). Thus it is a linear rescaling of the axis scores.

Page 19:

Table 21.2. Constants used for rescaling site and species scores in CCA. Combining the choices for axis scaling and optimizing species or sites results in the following constants used to rescale particular axes. Lambda (λ) is the eigenvalue for the given axis. Alpha (α) is selected as described in the text.

Biplot scaling

Hill's scaling

Constant for rescaling species scores

1

1

1 ( )-

Constant for rescaling site scores

1 -

Page 20:

Interpreting output

1. Correlations among explanatory variables

LogPoll Var2 Var3

LogPoll 1 0.107 -0.119

Var2 0.107 1 -0.039

Var3 -0.119 -0.039 1

Table 21.3. Correlations among the environmental variables.

Page 21:

2. Iteration report.

ITERATION REPORT
-----------------------------------------------------------------
Calculating axis 1
Residual = 0.53E+04 at iteration 1
Residual = 0.96E-01 at iteration 2
Residual = 0.47E-01 at iteration 3
Residual = 0.19E-01 at iteration 4
Residual = 0.84E-02 at iteration 5
Residual = 0.43E-02 at iteration 6
Residual = 0.24E-02 at iteration 7
Residual = 0.14E-02 at iteration 8
Residual = 0.88E-03 at iteration 9
Residual = 0.54E-03 at iteration 10
Residual = 0.46E-05 at iteration 20
Residual = 0.40E-07 at iteration 30
Residual = 0.34E-09 at iteration 40
Residual = 0.30E-11 at iteration 50
Residual = 0.69E-13 at iteration 58
Solution reached tolerance of 0.100000E-12 after 58 iterations.
-----------------------------------------------------------------
Calculating axis 2
Residual = 0.20E+01 at iteration 1
Residual = 0.30E-03 at iteration 2
etc....

Page 22:

3. Total variance in the species data. It is the sum of squared deviations from expected values, which are based on the row and column totals. Let

eij = the expected value of species j at site i,

y+j = total for species j,

yi+ = total for site i, and

y++ = community matrix grand total.

$$e_{ij} = \frac{y_{i+}\, y_{+j}}{y_{++}}$$

The variance of species j, var(yj), is

$$\mathrm{var}(y_j) = \left[ \sum_{i=1}^{n} (y_{ij} - e_{ij})^2 / e_{ij} \right] / y_{+j}$$

Page 23:

and the total variance is

$$\text{total variance} = \sum_{j=1}^{p} (y_{+j} / y_{++})\, \mathrm{var}(y_j)$$
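As a rough illustration, the total variance can be computed from a community matrix as sketched below. The function name `total_variance` is hypothetical, and the sketch assumes every site and species total is positive. The expected values are built from the row and column totals, each species variance is its summed squared deviation (scaled by the expected value) divided by the species total, and the total is the sum of species variances weighted by each species' share of the grand total.

```python
import numpy as np

def total_variance(Y):
    """Total variance of a sites-by-species abundance matrix Y."""
    Y = np.asarray(Y, dtype=float)
    row_tot = Y.sum(axis=1, keepdims=True)     # y_i+
    col_tot = Y.sum(axis=0, keepdims=True)     # y_+j
    grand = Y.sum()                            # y_++
    E = row_tot * col_tot / grand              # expected values e_ij
    # var(y_j): sum over sites of (y - e)^2 / e, divided by y_+j
    var_j = ((Y - E) ** 2 / E).sum(axis=0) / col_tot.ravel()
    total = np.sum(col_tot.ravel() / grand * var_j)
    return total, var_j
```

Written this way, the total equals the chi-square statistic of the table divided by the grand total. For the block matrix `[[10, 0], [0, 10]]`, each species variance is 1 and the total variance is 1.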

Page 24:

4. Axis summary statistics

Axis 1 Axis 2 Axis 3

Eigenvalue 0.636 0.044 0.016

Variance in species data

% of variance explained 14.4 1.0 0.4

Cumulative % explained 14.4 15.4 15.8

Pearson Correlation, Spp-Envt 0.900 0.307 0.213

Kendall (Rank) Corr., Spp-Envt 0.717 0.167 0.158

Table 21.4. Axis summary statistics

Page 25:

5. Multiple regression results

Table 21.5. Multiple regression results (regression of sites in species space on environmental variables).

Canonical Coefficients

               Standardized                Original Units
Variable   Axis 1  Axis 2  Axis 3     Axis 1  Axis 2  Axis 3    S.Dev

LogPoll    -0.799   0.014   0.009     -2.385   0.041   0.027    0.335
Var2        0.033  -0.194   0.048      0.11   -0.638   0.159    0.304
Var3        0.003   0.075   0.118      0.01    0.242   0.378    0.312
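The two sets of coefficients in Table 21.5 appear to differ only by the standard deviation of each variable. The check below is an assumption inferred from the tabulated numbers, not a formula stated in the source: dividing each standardized coefficient by the S.Dev of its variable reproduces the original-units column to within rounding.

```python
import numpy as np

# Values copied from Table 21.5 (rows: LogPoll, Var2, Var3; columns: axes 1-3).
standardized = np.array([[-0.799,  0.014, 0.009],
                         [ 0.033, -0.194, 0.048],
                         [ 0.003,  0.075, 0.118]])
original_units = np.array([[-2.385,  0.041, 0.027],
                           [ 0.11,  -0.638, 0.159],
                           [ 0.01,   0.242, 0.378]])
s_dev = np.array([0.335, 0.304, 0.312])

# Assumed relationship: original-units coefficient = standardized / S.Dev.
converted = standardized / s_dev[:, None]
max_abs_diff = np.max(np.abs(converted - original_units))
print(np.round(converted, 3))
print(max_abs_diff)
```

The largest discrepancy is a few thousandths, which is what one would expect from coefficients that were rounded to three decimals before being tabulated.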

Page 26:

6. Final scores for sites and species. Ordination scores (coordinates on ordination axes) are given for each site, x, and each species, u (Tables 21.6, 21.7, 21.8).

           WA scores                            Raw Data
           Axis 1      Axis 2      Axis 3       Totals

Site1      1.298381    1.555888   -0.98204      1131
Site2      1.17872     1.19812    -1.26412      1000
Site3      0.808255    0.145479   -1.10749       721
Site4      0.335053   -1.16647    -0.34654       635
Site5      0.204182   -1.40531    -0.05847       735
...
Site99    -1.15441     0.044354    0.100729      580
Site100   -1.31167    -0.45384    -0.23881       748

Table 21.6. Sample unit scores that are derived from the scores of species. These are the WA scores. Raw data totals (weights) are also given.

Page 27:

Table 21.7. Sample unit scores that are linear combinations of environmental variables for 100 sites. These are the LC Scores that are plotted in Fig. 21.3.

Axis 1 Axis 2 Axis 3

Site1 0.857 0.213 0.012

Site2 0.423 -0.100 -0.103

Site3 0.646 0.103 -0.024

Site4 0.474 -0.238 -0.104

Site5 -0.107 -0.297 0.078

...

Site99 -0.807 0.159 0.147

Site100 -1.405 0.011 -0.084

Page 28:

Table 21.8. Species scores and raw data totals (weights).

Raw Data

Axis 1 Axis 2 Axis 3 Totals

Sp1 -0.769 4.211 4.643 16

Sp2 -1.608 0.240 -2.377 627

Sp3 -1.051 1.623 1.862 68

Sp4 1.344 1.817 -0.794 3464

...

Sp37 -1.640 1.072 -1.748 164

Sp38 -1.158 3.689 2.161 7

Page 29:

From: McCune, B. 1997. Influence of noisy environmental data on canonical correspondence analysis. Ecology 78:2617-2623.

Page 30:

LC Scores WA Scores

No noise

Figure 21.2 Influence of the type and amount of noise in environmental data on LC site scores (left column) and WA site scores (right column) from CCA, based on analysis of simulated responses of 40 species to two independent environmental gradients of approximately equal strength.

Page 31:

LC Scores WA Scores

Moderate noise added to two otherwise perfect environmental variables

Figure 21.2 (cont.) A small amount of noise added to the two environmental variables.

Page 32:

LC Scores WA Scores

10 random environmental variables

Figure 21.2 (cont.) The two underlying environmental variables replaced with ten random variables.

Page 33:

7. Weights for sites and species. Sites and species are weighted by their totals.


Page 34:

8. Correlations of environmental variables with ordination axes.

"Interset correlations" are correlations of environmental variables with x*, the WA scores.

"Intraset correlations" are correlations of environmental variables with x, the LC scores.

Page 35:

Table 21.9. Biplot scores and correlations for the environmental variables with the ordination axes. Biplot scores are used to plot the vectors in the ordination diagram. Two kinds of correlations are shown, interset and intraset.

Variable Axis 1 Axis 2 Axis 3

BIPLOT scores

LogPoll -0.797 -0.008 0.002

Var2 -0.028 -0.196 0.045

Var3 0.073 0.081 0.115

INTRASET correlations

LogPoll -0.999 -0.038 0.018

Var2 -0.035 -0.933 0.357

Var3 0.092 0.386 0.918

INTERSET correlations

LogPoll -0.899 -0.012 0.004

Var2 -0.032 -0.286 0.076

Var3 0.083 0.118 0.195

Page 36:

9. Biplot scores for environmental variables. The environmental variables are often represented as lines radiating from the centroid of the ordination. The biplot scores give the coordinates of the tips of the radiating lines (Fig. 21.3).

[Fig. 21.3: ordination plot of Axis 1 (horizontal, -2.0 to 2.0) against Axis 2 (vertical, -0.6 to 0.6), with vectors for LogPoll and Var2.]

Page 37:

The coordinates for the environmental points are based on the intraset correlations. These correlations are weighted by a function of the eigenvalue of an axis and the scaling constant (α):

$$v_{jk} = r_{jk} \sqrt{\lambda_k^{\alpha}}$$

where vjk = the biplot score on axis k of environmental variable j,

rjk = intraset correlation of variable j with axis k,

λk = eigenvalue of axis k, and

α = scaling constant

If Hill's scaling is used, then

$$v_{jk} = r_{jk} \sqrt{\lambda_k^{\alpha} (1 - \lambda_k)}$$
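The first (non-Hill) weighting can be checked against the tabulated example. The sketch below assumes that the weighting function is the square root of the eigenvalue raised to the scaling constant, and that the example was run with a scaling constant of 1; both are assumptions, chosen because they reproduce the biplot scores of Table 21.9 from its intraset correlations and the eigenvalues of Table 21.4.

```python
import numpy as np

# Eigenvalues for axes 1-3 (Table 21.4).
eigenvalues = np.array([0.636, 0.044, 0.016])

# Intraset correlations and biplot scores (Table 21.9);
# rows: LogPoll, Var2, Var3; columns: axes 1-3.
intraset = np.array([[-0.999, -0.038, 0.018],
                     [-0.035, -0.933, 0.357],
                     [ 0.092,  0.386, 0.918]])
biplot_table = np.array([[-0.797, -0.008, 0.002],
                         [-0.028, -0.196, 0.045],
                         [ 0.073,  0.081, 0.115]])

alpha = 1.0  # assumed scaling constant, not stated for this example

# Assumed weighting: each column is multiplied by sqrt(eigenvalue ** alpha).
biplot = intraset * np.sqrt(eigenvalues ** alpha)
print(np.round(biplot, 3))
```

Under these assumptions the computed scores agree with the table to within about 0.001, the size of the rounding in the inputs.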

Page 38:

10. Monte Carlo tests of significance

Ho: No linear relationship between matrices. For this hypothesis, the rows in the second matrix are randomly reassigned within the second matrix.

Ho: No structure in main matrix and therefore no linear relationship between matrices. For this hypothesis, elements in the main matrix are randomly reassigned within columns.
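The two randomization schemes can be sketched with NumPy as follows. The function names are hypothetical, and the sketch shows only the shuffling step; a full test would rerun the CCA on each randomized data set and record the resulting eigenvalues.

```python
import numpy as np

def shuffle_rows(second_matrix, rng):
    """First hypothesis: reassign whole rows of the second matrix at random."""
    second_matrix = np.asarray(second_matrix)
    order = rng.permutation(second_matrix.shape[0])
    return second_matrix[order]

def shuffle_within_columns(main_matrix, rng):
    """Second hypothesis: permute the elements of each column independently."""
    shuffled = np.array(main_matrix, copy=True)
    for j in range(shuffled.shape[1]):
        shuffled[:, j] = rng.permutation(shuffled[:, j])
    return shuffled
```

Row shuffling keeps each sample unit's environmental values together while breaking their link to the species data. Within-column shuffling keeps each species' total and abundance distribution but destroys the co-occurrence structure among species.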

Page 39:

To evaluate the significance of the first CCA axis:

If:

n = the number of randomizations (permutations) with an eigenvalue greater than or equal to the corresponding observed eigenvalue

N = the total number of randomizations (permutations)

then

p = (1 + n)/(1 + N)

p = probability of type I error for the null hypothesis that you selected.
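The p-value formula is short enough to sketch directly. The function name `monte_carlo_p` is hypothetical, and the randomized eigenvalues in the usage lines are made-up placeholders rather than output from any real run.

```python
def monte_carlo_p(observed, randomized):
    """p = (1 + n) / (1 + N) for an observed statistic and N randomized ones."""
    n = sum(1 for value in randomized if value >= observed)  # ties count
    N = len(randomized)
    return (1 + n) / (1 + N)

# Placeholder illustration: 999 randomized eigenvalues, none reaching 0.636.
randomized_eigenvalues = [0.05 + 0.0001 * k for k in range(999)]
p = monte_carlo_p(0.636, randomized_eigenvalues)
```

With N = 999 and n = 0 the formula gives p = 1/1000 = 0.001, which is the smallest value obtainable from 999 randomizations.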

Page 40:

Table 21.10. Monte Carlo test results for eigenvalues and species-environment correlations based on 999 runs with randomized data.

                                     Randomized data
                 Axis   Real data    Mean    Minimum   Max.     p

Eigenvalue        1      0.636       0.098   0.033     0.217    0.001
                  2      0.044       0.046   0.009     0.112
                  3      0.016       0.020   0.004     0.076

Spp-Envt Corr.    1      0.900       0.378   0.224     0.553    0.001
                  2      0.307       0.287   0.155     0.432
                  3      0.213       0.218   0.107     0.396

Page 41:

Table 21.11. Comparison of CCA and NMS of the example data set.

CCA NMS

Variance represented (%)

Axis 1 14.4 29.5

Axis 2 1.0 38.9

Cumulative 15.4 68.4

Correlation with LogPoll

Axis 1 -0.899 0.673

Axis 2 -0.012 -0.038

Page 42:

Redundancy analysis

Given: a matrix of response variables (A) and a matrix of explanatory variables (E).

The basic steps of RDA as applied in community ecology are:

• Center and standardize columns of A and E.
• Regress each response variable on E.
• Calculate fitted values for the response variables from the multiple regressions.
• Perform PCA on the matrix of fitted values.
• Use eigenvectors from that PCA to calculate scores of sample units in the space defined by E.
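The five steps listed above can be sketched in NumPy as follows. This is an illustrative outline rather than a reference implementation: the function name `rda` is hypothetical, the sample standard deviation (ddof=1) is an arbitrary choice for the standardization, and the PCA is done by eigenanalysis of the covariance matrix of the fitted values.

```python
import numpy as np

def rda(A, E):
    """Redundancy analysis sketch: returns eigenvalues, eigenvectors, scores."""
    A = np.asarray(A, dtype=float)
    E = np.asarray(E, dtype=float)
    # 1. Center and standardize the columns of A and E.
    A = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
    E = (E - E.mean(axis=0)) / E.std(axis=0, ddof=1)
    # 2. Regress each response variable on E (all columns at once).
    B, *_ = np.linalg.lstsq(E, A, rcond=None)
    # 3. Fitted values for the response variables.
    fitted = E @ B
    # 4. PCA on the fitted values via their covariance matrix.
    cov = fitted.T @ fitted / (A.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Scores of sample units in the space defined by E.
    scores = fitted @ eigvecs
    return eigvals, eigvecs, scores
```

Because the fitted values are linear combinations of the columns of E, at most as many eigenvalues as there are explanatory variables can be nonzero.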

Page 43:

Regression with multiple dependent variables

In the usual case of regressing a single dependent variable (Y) on multiple independent variables (X), the regression coefficients (B) are found by:

$$B = (X'X)^{-1} X'Y$$

With multiple dependent variables, Y and B are matrices rather than vectors.
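A small NumPy sketch shows that the same normal-equations expression handles one or many dependent variables. The function name `regress_multi` is hypothetical, and the sketch assumes that X already contains a column of ones if an intercept is wanted and that its columns are linearly independent.

```python
import numpy as np

def regress_multi(X, Y):
    """Least-squares coefficients B = (X'X)^-1 X'Y for vector or matrix Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Solving the normal equations avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ Y)
```

When Y is a matrix, column k of B holds the coefficients that would be obtained by regressing column k of Y on X separately.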