ORDINATION TECHNIQUES IN ENVIRONMENTAL BIOLOGY - PROGRESS, PROBLEMS, AND PITFALLS

H.J.B. Birks, University of Bergen and University College London

CONTENTS

INTRODUCTION

Definitions

Data

Types of Ordination

Historical Perspective

PROGRESS

Introduction

Underlying Response Models

Indirect Gradient Analysis

Direct Gradient Analysis

PROBLEMS

PITFALLS

CONCLUSIONS

INTRODUCTION

Definitions

Ordination - "process of reducing the dimensionality (i.e., the number of variables) of multivariate data by deriving a small number of new variables ('latent variables', 'composite variables', ordination axes) that contain much of the information in the original data. The reduced data set is often most useful for investigating possible structure in the observations."

B.S. Everitt (1998)

Ordination - "the arrangement of samples or sites along gradients on the basis of their species composition or environmental attributes. Ordination is the mathematical expression of the continuum concept in ecology. Gradient analysis is often treated as a synonym by plant ecologists."

M.O. Hill (1998)

The end result is a low-dimensional (usually two-dimensional) plot in which sites are represented by points, placed so that points close together correspond to sites that are similar in species composition and points far apart correspond to sites that are dissimilar in species composition. The plot is a graphical summary of the data.

Ordination is closely related to, and sometimes used synonymously with, multidimensional scaling, component analysis, and latent-structure analysis.

Environmental biology - encompasses ecology, environmental monitoring, and palaeoecology, which basically differ only in time scale and temporal resolution.

Data

Data for ordination typically consist of a matrix of values specifying the abundance or presence of species in sites. Environmental data for the same sites are sometimes also available.

A convenient way to envisage the structure of the data is as two matrices, Y and X, placed side by side:

$$C = [\,Y \mid X\,] = [\,y_{ij} \mid x_{ik}\,], \qquad i = 1, \dots, n;\; j = 1, \dots, m;\; k = 1, \dots, q$$

Rows of the matrix represent sites. The first block of m columns represents species; $y_{ij}$ is the abundance of species j in site i. The second block of q columns represents environmental variables; $x_{ik}$ is the value of the kth environmental variable in site i.

Given data on species (Y) and environment (X), there are two major approaches to ordination:

Indirect gradient analysis - analyse Y only, and then involve X

Direct gradient analysis - analyse Y and X simultaneously

Types of Ordination

Historical Perspective

1901 - Pearson develops PCA as a regression technique.

1927 - Spearman applies factor analysis to psychology.

1930 - Ramensky uses an informal ordination technique and introduces the term 'ordnung' into ecology.

1954 - D.W. Goodall introduces PCA into ecology and proposes the term 'ordination'.

1970 - R.H. Whittaker develops theoretical foundations of gradient analysis, especially unimodal species responses and turnover along environmental gradients.

1971 - K.R. Gabriel develops biplot graphical display.

1973 - M.O. Hill re-invents correspondence analysis and introduces CA (as 'reciprocal averaging') into ecology.

1986 - Cajo ter Braak invents canonical correspondence analysis (CCA) and releases the CANOCO software.

1988 - Cajo ter Braak and Colin Prentice publish "A theory of gradient analysis" (Advances in Ecological Research 18: 271-317), which unifies indirect and direct gradient analysis and highlights the importance of underlying species response models.

1998, 2002 - Cajo ter Braak and Petr Šmilauer release the CANOCO 4 and 4.5 software and manuals.

[Photographs: Mark Hill, Cajo ter Braak, Petr Šmilauer]

PROGRESS

Introduction

Indirect gradient analysis - analyse Y data only

Direct gradient analysis - analyse Y and X data together

Aims of Indirect Gradient Analysis

1. Summarise multivariate data in a convenient low-dimensional way. Dimension-reduction technique.

2. Uncover the fundamental underlying structure of the data. Assume that there is an underlying LATENT structure: the occurrences of all species are determined by a few unknown environmental variables, LATENT VARIABLES, according to a simple response model. In ordination we try to recover and identify that underlying structure.

Reasons for using Indirect Gradient Analysis

1. Species compositions easier to determine than full range of environmental conditions. Many possible environmental variables. Which are important?

2. Overall composition is often a good reflection of overall environment.

3. Overall composition often of greater concern than individual species. Global, holistic picture, in contrast to regression which gives a local, individualist reductionist view.

Constrained canonical ordination or direct gradient analysis stands between indirect gradient analysis and regression. Many species, many environmental variables.

Underlying Response Models

A straight line displays the linear relation between the abundance value (y) of a species and an environmental variable (x), fitted to artificial data (●). (a = intercept; b = slope or regression coefficient).

A Gaussian curve displays a unimodal relation between the abundance value (y) of a species and an environmental variable (x). (u = optimum or mode; t = tolerance; c = maximum = exp(a)).
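The two response models can be written down directly as functions. The sketch below assumes the parameterisation y = exp(a - (x - u)^2 / (2t^2)) for the Gaussian curve, which places the maximum c = exp(a) at the optimum u; the function names and the parameter values are illustrative only.

```python
import math


def linear_response(x: float, a: float, b: float) -> float:
    """Linear response model: y = a + b * x (a = intercept, b = slope)."""
    return a + b * x


def gaussian_response(x: float, a: float, u: float, t: float) -> float:
    """Gaussian (unimodal) response model (assumed parameterisation).

    y = exp(a - (x - u)**2 / (2 * t**2)): the maximum c = exp(a) is reached
    at the optimum u, and the tolerance t controls the width of the curve.
    """
    return math.exp(a - (x - u) ** 2 / (2 * t ** 2))


# Hypothetical species: optimum pH 5.5, tolerance 0.5, maximum exp(2).
for ph in (4.5, 5.0, 5.5, 6.0, 6.5):
    print(ph, round(gaussian_response(ph, a=2.0, u=5.5, t=0.5), 3))
```

One tolerance away from the optimum the expected abundance has fallen to exp(-0.5), about 61%, of the maximum.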

Indirect gradient analysis can be viewed as being like regression analysis but with the MAJOR difference that in ordination the explanatory variables are not known environmental variables but are theoretical ‘latent’ variables.

Constructed so that they ‘best’ explain the species data.

As in regression, each species is a response variable but in contrast to regression, consider all response variables simultaneously. 

Principal components analysis (PCA) - linear response model

Correspondence analysis (CA) and its relative detrended correspondence analysis (DCA) - unimodal response model

Indirect Gradient Analysis

Principal components analysis

In regression, fit a particular environmental variable to all the species. Might then repeat for a different environmental variable. For some species one variable may fit better, and for other species another variable may fit better. Judge the goodness-of-fit (explanatory power) of an environmental variable by the total regression sum-of-squares, summed over all species. What is the best possible fit that is theoretically obtainable within the constraints of the linear response model?

This defines the ordination problem: to construct the single hypothetical variable that gives the best fit to the species data according to the linear response model. This hypothetical environmental variable is the LATENT VARIABLE or the first ordination axis.

Principal components analysis provides the solution to this linear ordination problem in any number of dimensions.
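To make the "best fit" criterion concrete, the following sketch (Python with NumPy; the function names are hypothetical) computes PCA from the singular value decomposition of the column-centred data and checks that regressing every species on the axis 1 site scores gives a total regression sum of squares equal to the first eigenvalue, a value no other single variable can exceed. The data are random, hypothetical counts.

```python
import numpy as np


def pca(Y: np.ndarray, n_axes: int = 2):
    """PCA of a sites-by-species matrix via the SVD of the column-centred data.

    Returns the eigenvalues (sums of squares explained by each axis), the site
    scores (the latent variables) and the species loadings.
    """
    Yc = Y - Y.mean(axis=0)                  # centre each species
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    eigenvalues = s ** 2
    site_scores = U[:, :n_axes] * s[:n_axes]
    species_loadings = Vt[:n_axes].T
    return eigenvalues, site_scores, species_loadings


def total_regression_ss(Y: np.ndarray, x: np.ndarray) -> float:
    """Regression sum of squares, summed over species, when each species is
    regressed (least squares, with intercept) on the single variable x."""
    Yc = Y - Y.mean(axis=0)
    xc = x - x.mean()
    slopes = Yc.T @ xc / (xc @ xc)           # one slope per species
    fitted = np.outer(xc, slopes)
    return float(np.sum(fitted ** 2))


rng = np.random.default_rng(1)
Y = rng.poisson(3.0, size=(20, 6)).astype(float)     # hypothetical abundances

eigenvalues, sites, loadings = pca(Y)
best = total_regression_ss(Y, sites[:, 0])
other = total_regression_ss(Y, rng.normal(size=20))  # an arbitrary variable
print(best, eigenvalues[0], other)  # best equals eigenvalue 1; other is smaller
```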

PCA results are most conveniently presented as BIPLOTS.

Correlation (= covariance) biplot scaling: species scores have sum of squares = λ; site scores are scaled to unit sum of squares. Emphasis on species.

Distance biplot scaling: site scores have sum of squares = λ; species scores are scaled to unit sum of squares. Emphasis on sites.
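A sketch of the two scalings, following the definitions above and assuming NumPy. Here λ is taken as the raw sum of squares explained by each axis; software packages may rescale eigenvalues (for example by n - 1 or by the total variance). In both scalings the product of site and species scores reproduces the centred data, which is what makes the plot a biplot.

```python
import numpy as np


def pca_biplot(Y: np.ndarray, scaling: str = "distance"):
    """Site and species scores for a PCA biplot of sites-by-species data Y.

    'distance'    : site scores have sum of squares = eigenvalue,
                    species scores have unit sum of squares (emphasis on sites).
    'correlation' : species scores have sum of squares = eigenvalue,
                    site scores have unit sum of squares (emphasis on species).
    """
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    if scaling == "distance":
        sites, species = U * s, Vt.T
    elif scaling == "correlation":
        sites, species = U, Vt.T * s
    else:
        raise ValueError("scaling must be 'distance' or 'correlation'")
    return sites, species, s ** 2


rng = np.random.default_rng(2)
Y = rng.poisson(4.0, size=(15, 5)).astype(float)   # hypothetical abundances
for scaling in ("distance", "correlation"):
    sites, species, eig = pca_biplot(Y, scaling)
    print(scaling,
          np.round((sites ** 2).sum(axis=0)[:2], 3),     # per-axis sums of squares
          np.round((species ** 2).sum(axis=0)[:2], 3),
          np.round(eig[:2], 3))
```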

Other important questions in PCA

1. Data transformations.

2. Data standardisations (covariance or correlation matrix).

3. Can position 'unknown' samples (e.g., fossil samples) into PCA of 'known' modern samples.

4. How many axes to retain for interpretation? Comparison with broken-stick model surprisingly reliable.

5. Interpretation is indirect: e.g., if environmental data are available, overlay the variables on, or regress them on, PCA axes 1 and 2. If no environmental data are available, interpret on the basis of species biology and ecology.
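The broken-stick comparison in point 4 can be sketched as follows, assuming the usual broken-stick expectation for the kth of p axes, b_k = (1/p) Σ_{i=k}^{p} 1/i, and the common rule of retaining the leading axes whose share of the total variance exceeds that expectation. The eigenvalues in the example are hypothetical.

```python
def broken_stick(p: int) -> list[float]:
    """Expected share of the total variance for each of p axes under the
    broken-stick model: b_k = (1/p) * sum_{i=k}^{p} 1/i."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def n_axes_to_retain(eigenvalues: list[float]) -> int:
    """Count leading axes whose share of variance exceeds the broken-stick value."""
    total = sum(eigenvalues)
    n = 0
    for ev, expected in zip(eigenvalues, broken_stick(len(eigenvalues))):
        if ev / total <= expected:
            break
        n += 1
    return n


print([round(b, 4) for b in broken_stick(4)])   # [0.5208, 0.2708, 0.1458, 0.0625]
print(n_axes_to_retain([5.0, 2.0, 1.0, 0.5]))   # 1
```

Stopping at the first axis that fails the comparison keeps the retained axes contiguous, which is the usual reading of the rule.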

Correspondence analysis

Invented independently numerous times:

1. Correspondence Analysis: Weighted Principal Components with Chi-squared metric.

2. Optimal or Dual Scaling: Find site and species scores so that (i) all species occurring in one site are as similar as possible, but (ii) species at different sites are as different as possible, and (iii) sites are dispersed as widely as possible relative to species scores.

3. Reciprocal Averaging: species scores are weighted averages of site scores, and simultaneously, site scores are weighted averages of species scores.

Assume we have five species (presences and absences) along a moisture or pH gradient. Can estimate by Gaussian logit regression the moisture optimum of each species.

Simple weighted averaging is a good approximation to Gaussian logit regression under a wide range of conditions.

Gaussian logit curve fitted by logit regression of the presences (plotted at p = 1) and absences (plotted at p = 0) of a species on acidity (pH).
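Weighted averaging itself is a one-line calculation: the optimum estimate is Σ y_i x_i / Σ y_i, which for presence/absence data is simply the mean pH of the sites where the species occurs. The weighted standard deviation is included as a simple tolerance estimate. Both functions and the data are illustrative assumptions; this is not the Gaussian logit regression shown in the figure.

```python
import math


def wa_optimum(abundances: list[float], env: list[float]) -> float:
    """Weighted-average estimate of a species' optimum: sum(y*x) / sum(y)."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("species does not occur at any site")
    return sum(y * x for y, x in zip(abundances, env)) / total


def wa_tolerance(abundances: list[float], env: list[float]) -> float:
    """Weighted standard deviation of env about the WA optimum (a tolerance estimate)."""
    u = wa_optimum(abundances, env)
    total = sum(abundances)
    return math.sqrt(sum(y * (x - u) ** 2 for y, x in zip(abundances, env)) / total)


ph = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]   # hypothetical site pH values
present = [0, 1, 1, 1, 0, 0]           # presence (1) / absence (0) of one species
print(wa_optimum(present, ph))              # 5.0
print(round(wa_tolerance(present, ph), 3))  # 0.408
```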

Artificial example of unimodal response curves of five species (A-E) with respect to standardised variables, showing different degrees of separation of the species curves. a: moisture; b: first axis of CA; c: first axis of CA folded in the middle, with the response curves of the species lowered by a factor of about 2. Sites are shown as dots at y = 1 if Species D is present and at y = 0 if Species D is absent.

As a measure of how well moisture explains the species data, can use the dispersion or spread of the species scores or optima. If the dispersion is large, moisture separates the species curves and explains the species data well, under the assumption of a unimodal response model. If the dispersion is small, then moisture is a poor explanatory variable.

Is there an environmental variable that would explain the species data better? CA is the technique that will construct the theoretical latent variable that will best explain the species data under the assumption of unimodal species responses i.e., will find the latent variable that will maximise the dispersion of the species scores. The theoretical variable is the first CA axis.

A second CA axis and further CA axes can be constructed so that they also maximise the dispersion of the species scores but subject to the constraint of being uncorrelated with previous CA axes.

CA can be applied not only to presence-absence data but also to abundance data. It involves a two-way iterative weighted averaging algorithm, starting from arbitrary initial values for sites or from arbitrary initial (indicator) values for species. Calculate new species scores by weighted averaging of the site scores, calculate new site scores by weighted averaging of the species scores, and continue until convergence. On convergence, the values are the site and species scores on CA axis 1, and the species scores have the maximum dispersion on CA axis 1. The eigenvalue of axis 1 is the maximised dispersion of the species scores on the CA axis and is a measure of the importance of the ordination axis.
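The two-way weighted averaging algorithm can be sketched in a few lines of Python with NumPy. The rescaling step in each cycle (weighted centring, which removes the trivial solution, followed by standardising to unit weighted variance) is an assumption about details the description above leaves implicit; the function name and data are hypothetical. At convergence, the weighted dispersion of the species scores is the axis 1 eigenvalue.

```python
import numpy as np


def ca_first_axis(Y, max_iter=10_000, tol=1e-12, seed=0):
    """First CA axis by two-way (reciprocal) weighted averaging.

    Y: sites-by-species abundance or presence/absence matrix with no empty
    rows or columns. Returns (eigenvalue, site_scores, species_scores).
    """
    Y = np.asarray(Y, dtype=float)
    r = Y.sum(axis=1)            # site totals (weights for site scores)
    c = Y.sum(axis=0)            # species totals (weights for species scores)
    n = Y.sum()

    def standardise(x):
        # Weighted mean 0 removes the trivial solution; weighted variance 1 fixes the scale.
        x = x - (r @ x) / n
        return x / np.sqrt((r @ x ** 2) / n)

    x = standardise(np.random.default_rng(seed).random(Y.shape[0]))  # arbitrary start
    for _ in range(max_iter):
        u = (Y.T @ x) / c                 # species scores: WA of site scores
        x_new = standardise((Y @ u) / r)  # site scores: WA of species scores
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    u = (Y.T @ x) / c
    eigenvalue = (c @ u ** 2) / n         # weighted dispersion of the species scores
    return eigenvalue, x, u


Y = np.array([[5, 3, 0, 0],
              [4, 4, 1, 0],
              [1, 3, 4, 1],
              [0, 1, 4, 5],
              [0, 0, 2, 6]])   # hypothetical counts: 5 sites x 4 species
eigenvalue, site_scores, species_scores = ca_first_axis(Y)
print(round(eigenvalue, 4), np.round(site_scores, 3))
```

The eigenvalue should agree with the squared first singular value of the chi-square standardised data matrix, which gives a convenient check on the iteration.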

Remarkable feature of correspondence analysis is that it turns out to be the solution to a wide range of seemingly different problems. (Hill, 1974)

CA axes can be shown to minimise the ratio of the within-species variance of the site scores to the between-species variance, i.e., a CA axis finds the latent variable that separates the species niches or optima as well as possible. This property is important in the constrained or canonical form of CA, canonical correspondence analysis.

M.O. Hill (1973) J. Ecology 61, 237-249

M.O. Hill (1974) Applied Statistics 23, 340-354

Other important questions in CA

1. Hill's scaling of scores in multiples of one standard deviation or 'TURNOVER'. Sites differing by 4 SD tend to have no species in common.

2. Often need to detrend, as CA axis 2 may be an artifact: the sites form an arch when CA axis 2 is plotted against axis 1. The commonest cause is that there is only one dominant gradient, and the second axis may simply be the first axis folded, so that axis 2 is a quadratic function of axis 1, axis 3 a cubic function of axis 1, and so on.

3. Present CA results as biplots with emphasis on species or on sites or symmetric scaling (Gabriel, 2002), or scaled in Hill's turnover standard deviation units, depending on research aims.

4. Rare species usually have 'extreme' scores - delete them, downweight them, or do not plot them.

5. Interpretation indirect, as in PCA.

When to use PCA or CA/DCA?

PCA – linear response model

CA/DCA – unimodal response model

How to know which to use?

Gradient lengths important.

If short, good statistical reasons to use LINEAR methods.

If long, linear methods become less effective, UNIMODAL methods become more effective.

In the range 1.5–3.0 standard deviations, both are effective.

In practice:

Do a DCA first and establish gradient length.

If less than 2 SD, responses are monotonic. Use PCA.

If more than 2 SD, use CA or DCA.

Choosing between CA and DCA is more difficult.

Ideally use CA (fewer assumptions) but if arch is present, use DCA.
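The practical rule above can be written as a small decision function. It assumes the gradient length (in SD units of turnover) has already been obtained from a DCA of the species data in ordination software; the function name is hypothetical, and a value of exactly 2 SD is arbitrarily sent to the unimodal branch. Within the 1.5–3.0 SD range both kinds of method are effective, so the output is a suggestion, not a verdict.

```python
def suggest_ordination(gradient_length_sd: float, arch_present: bool = False) -> str:
    """Rule of thumb: DCA axis 1 gradient length (SD units) decides between
    a linear method (PCA) and a unimodal method (CA or DCA)."""
    if gradient_length_sd < 2.0:
        return "PCA"                          # short gradient: roughly monotonic responses
    return "DCA" if arch_present else "CA"    # long gradient: unimodal responses


for length in (1.2, 2.5, 4.0):
    print(length, suggest_ordination(length))
```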

Hypothetical occurrence of species A-J over an environmental gradient. The length of the gradient is expressed in SD units. Broken lines describe fitted occurrences of species. If sampling takes place over a gradient range of 1.5 SD, the occurrences of most species are best described by a linear model. If sampling takes place over a gradient range of 3 SD, the occurrences of most species are best described by a unimodal model.

Methods of indirect gradient analysis or unconstrained ordination of a multivariate set of data, Y

Principal Components Analysis (PCA)
Distance measure preserved: Euclidean distance
Relationship of ordination axes with original variables: linear
Criterion for drawing ordination axes: finds the axis that maximises the total variance (or, equivalently, that minimises the total residual variation)

Correspondence Analysis (CA; reciprocal averaging, dual scaling)
Distance measure preserved: chi-square distance
Relationship of ordination axes with original variables: unimodal (approximately Gaussian)
Criterion for drawing ordination axes: finds the axis that maximises the dispersion of the species scores (which are themselves weighted averages of the site scores)

Principal Coordinates Analysis (PCO, PCoA; metric multidimensional scaling, classical scaling, Torgerson scaling)
Distance measure preserved: any chosen distance or dissimilarity measure
Relationship of ordination axes with original variables: unknown; depends on the distance measure chosen
Criterion for drawing ordination axes: Euclidean distances in the new full-dimensional space are equal to the original distances (or dissimilarities)

Nonmetric Multidimensional Scaling (MDS, NMDS, NMDSCAL)
Distance measure preserved: any chosen distance or dissimilarity measure
Relationship of ordination axes with original variables: unknown; depends on the distance measure chosen
Criterion for drawing ordination axes: the number of dimensions of the new space is chosen a priori (reduced); Euclidean distances in the new space are monotonically related to the original distances


Direct Gradient Analysis

Introduction

Primary data in gradient analysis

[Figure: species data, Y (response variables; abundances or +/-), and environmental variables, X (predictor or explanatory variables; values or classes). Indirect GA analyses Y; direct GA analyses Y and X together.]

Two-step approach of indirect gradient analysis

PCA or CA followed by regression

Standard approach from 1954 to about 1985

Limitations: (1) environmental variable studied may turn out to be poorly related to the first few ordination axes.

(2) may only be related to 'residual' minor directions of variation in species data.

(3) remaining variation can be substantial, especially in large data sets with many zero values.

(4) a strong relation of the environmental variables with, say, axis 5 or 6 can easily be overlooked.

Limitations overcome by canonical or constrained ordination techniques = multivariate direct gradient analysis.

Direct gradient analysis or canonical ordination techniques: ordination and regression in one technique

Search for a weighted sum of environmental variables that fits the species best, i.e. that gives the maximum regression sum of squares

Ordination diagram shows 1) patterns of variation in the species data and 2) main relationships between species and each environmental variable

Redundancy analysis (RDA) - constrained or canonical PCA

Canonical correspondence analysis (CCA) - constrained CA

(Detrended CCA) - constrained DCA

Axes constrained to be linear combinations of environmental variables.

In effect PCA or CA with one extra step:

Do a multiple regression of site scores on the environmental variables and take as new site scores the fitted values of this regression.

Multivariate regression of Y on X.
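The "one extra step" can be added to the reciprocal averaging loop directly. In this sketch (Python with NumPy; names and data hypothetical), the weighted-average site scores are regressed on the environmental variables by weighted least squares, with site totals as weights, and the fitted values (LC scores) replace them before the next cycle. This is one way of organising the alternating algorithm, not necessarily how CANOCO implements it.

```python
import numpy as np


def cca_first_axis(Y, X, max_iter=100_000, tol=1e-12, seed=0):
    """First CCA axis: reciprocal averaging plus a weighted regression step.

    Y: sites-by-species counts (no empty rows/columns); X: sites-by-variables
    environmental data. Returns (eigenvalue, LC site scores, species scores).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    r, c, n = Y.sum(axis=1), Y.sum(axis=0), Y.sum()
    Z = X - (r @ X) / n            # weighted centring of the environmental variables
    w = np.sqrt(r)                 # row multipliers for weighted least squares

    def standardise(x):
        x = x - (r @ x) / n
        return x / np.sqrt((r @ x ** 2) / n)

    x = standardise(np.random.default_rng(seed).random(Y.shape[0]))
    for _ in range(max_iter):
        u = (Y.T @ x) / c          # species scores: WA of site scores
        x_wa = (Y @ u) / r         # WA site scores
        # Extra CCA step: weighted multiple regression of the WA site scores
        # on X; the fitted values become the new (LC) site scores.
        b, *_ = np.linalg.lstsq(Z * w[:, None], x_wa * w, rcond=None)
        x_new = standardise(Z @ b)
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    u = (Y.T @ x) / c
    eigenvalue = (c @ u ** 2) / n  # dispersion of the species scores
    return eigenvalue, x, u


rng = np.random.default_rng(3)
X = rng.normal(size=(12, 2))                      # hypothetical environmental data
optima = np.linspace(-1.5, 1.5, 5)
Y = rng.poisson(8 * np.exp(-(X[:, [0]] - optima) ** 2)) + 1   # hypothetical counts
eigenvalue, lc_scores, species_scores = cca_first_axis(Y, X)
print(round(eigenvalue, 4))
```

Because the site scores are always fitted values of the regression, the axis is by construction a linear combination of the environmental variables.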

Artificial example of unimodal response curves of five species (A-E) with respect to standardised environmental variables, showing different degrees of separation of the species curves. a: moisture; b: linear combination of moisture and phosphate, chosen a priori; c: best linear combination of environmental variables, chosen by CCA. Sites are shown as dots, at y = 1 if Species D is present and at y = 0 if Species D is absent.

Combinations of environmental variables

e.g. 3 × moisture + 2 × phosphate

e.g. all possible linear combinations:

$$x_j = c_0 + c_1 z_{1j} + c_2 z_{2j} + c_3 z_{3j} + \dots$$

where $z_{kj}$ = value of environmental variable k at site j, $c_k$ = weights, and $x_j$ = resulting 'compound' environmental variable at site j.

CCA selects the linear combination of environmental variables that maximises the dispersion of the species scores, i.e. it chooses the best weights ($c_k$) of the environmental variables.

C.J.F. ter Braak (1986) Ecology 67, 1167-1179

Alternating regression algorithms

Algorithms for (A) Correspondence Analysis, (B) Detrended Correspondence Analysis, and (C) Canonical Correspondence Analysis, diagrammed as flowcharts. LC scores are the linear combination site scores, and WA scores are the weighted averaging scores.


CCA axis is a linear combination of the environmental variables supplied and it is the 'best' in the sense that it minimises the size of the species niches (minimises the ratio of within-species variance to total variance), and maximises the dispersion of the species scores or optima.

Canonical or constrained correspondence analysis

Ordinary correspondence analysis gives:

1. Site scores which may be regarded as reflecting the underlying gradients.

2. Species scores which may be regarded as the location of species optima in the space spanned by site scores.

Canonical or constrained correspondence analysis gives in addition:

3. Environmental scores which define the gradient space.

These optimise the interpretability of the results.

CCA of the Dune Meadow Data. a: Ordination diagram with environmental variables represented by arrows. The c scale applies to environmental variables, the u scale to species and sites. The types of management are also shown by closed squares at the centroids of the meadows of the corresponding types of management.

b: Inferred ranking of the species along the variable amount of manure, based on the biplot interpretation of Part a of this figure.

CCA: Biplots and triplots

• The same figure may show:
  • WA scores of species
  • WA or LC scores of sites
  • Biplot arrows or class centroids of environmental variables

• In full space, the length of an environmental vector is 1. When projected onto the ordination space:
  • the length tells the strength of the variable
  • the direction shows the gradient
  • for every arrow there is an equal arrow in the opposite direction, the decreasing direction of the gradient
  • project sample points onto a biplot arrow to get the expected value

• Class variables are coded as dummy variables:
  • plotted as class centroids
  • class centroids are weighted averages
  • the LC score shows the class centroid; the WA scores show the dispersion of the centroid

Redundancy analysis - constrained or canonical PCA

Short (< 2 SD) compositional gradients

Linear or monotonic responses

Reduced-rank regression

PCA of y with respect to x

Two-block mode C PLS

PCA of instrumental variables Rao (1964)

PCA - best hypothetical latent variable is the one that gives the smallest total residual sum of squares

RDA - selects linear combination of environmental variables that gives smallest total residual sum of squares

C.J.F. ter Braak (1994) Ecoscience 1, 127–140 Canonical community ordination Part I: Basic theory and linear methods
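Read as reduced-rank regression, RDA is a multivariate regression of all species on the environmental variables followed by a PCA of the fitted values. A minimal sketch assuming NumPy, with hypothetical data and function name: the sum of the canonical eigenvalues equals the total regression sum of squares, so total inertia minus that sum is the smallest residual sum of squares attainable with linear combinations of X.

```python
import numpy as np


def rda(Y, X):
    """Redundancy analysis as reduced-rank regression.

    1. Multivariate least-squares regression of the centred species data Y
       on the centred environmental data X.
    2. PCA (via SVD) of the fitted values.
    Returns canonical eigenvalues, total inertia (total sum of squares),
    LC site scores and species scores.
    """
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B
    U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    rank = np.linalg.matrix_rank(Xc)
    canonical = s[:rank] ** 2
    total = float(np.sum(Yc ** 2))
    return canonical, total, U[:, :rank] * s[:rank], Vt[:rank].T


rng = np.random.default_rng(4)
X = rng.normal(size=(25, 3))                       # hypothetical env variables
Y = X @ rng.normal(size=(3, 8)) + rng.normal(scale=0.5, size=(25, 8))
canonical, total, sites, species = rda(Y, X)
print(np.round(canonical, 3), round(canonical.sum() / total, 3))  # explained fraction
```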

RDA ordination diagram of the Dune Meadow Data with environmental variables represented as arrows. The scale of the diagram is: 1 unit in the plot corresponds to 1 unit for the sites, to 0.067 units for the species and to 0.4 units for the environmental variables.

Partial constrained ordinations (partial CCA, RDA, etc)

e.g. pollution effects (variables of interest) and seasonal effects (COVARIABLES)

Eliminate (partial out) effect of covariables. Relate residual variation to pollution variables. Replace environmental variables by their residuals obtained by regressing each pollution variable on the covariables.
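For the linear case, the residualising step described above can be sketched as a partial RDA (Python with NumPy; names and data hypothetical). The environmental variables are replaced by their residuals from a regression on the covariables; because those residuals are uncorrelated with the covariables, also residualising the species data would leave the canonical eigenvalues unchanged. A partial CCA would apply the same idea with site-total weights.

```python
import numpy as np


def residualise(A, Z):
    """Residuals of each column of A after least-squares regression on the
    covariables Z (an intercept is included, so the residuals are centred)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, A, rcond=None)
    return A - Z1 @ coef


def partial_rda_eigenvalues(Y, X, Z):
    """Canonical eigenvalues of a partial RDA of Y on X with covariables Z."""
    Yc = Y - Y.mean(axis=0)
    Xr = residualise(X, Z)                 # env. variables with Z partialled out
    B, *_ = np.linalg.lstsq(Xr, Yc, rcond=None)
    s = np.linalg.svd(Xr @ B, compute_uv=False)
    return s[: np.linalg.matrix_rank(Xr)] ** 2


rng = np.random.default_rng(5)
season = rng.normal(size=(30, 1))                      # hypothetical covariable
pollution = 0.7 * season + rng.normal(size=(30, 2))    # correlated with season
Y = (season @ rng.normal(size=(1, 6)) + pollution @ rng.normal(size=(2, 6))
     + rng.normal(scale=0.3, size=(30, 6)))
print(np.round(partial_rda_eigenvalues(Y, pollution, season), 3))
```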

Partial CCA

Natural variation due to sampling season and due to the gradient from fresh to brackish water was partialled out by partial CCA.

The remaining variation could then be attributed to pollution.

Ordination diagram of a partial canonical correspondence analysis of diatom species (A) in dykes, with 24 variables-of-interest (arrows) as explanatory variables and 2 covariables (chloride concentration and season). The diagram is symmetrically scaled [23] and shows selected species and standardised variables and, instead of individual dykes, centroids (•) of dyke clusters. The variables-of-interest shown are: BOD = biological oxygen demand, Ca = calcium, Fe = ferrous compounds, N = Kjeldahl-nitrogen, O2 = oxygen, P = ortho-phosphate, Si = silicon compounds, WIDTH = dyke width, and soil types (CLAY, PEAT). All variables except BOD, WIDTH, CLAY and PEAT were transformed to logarithms because of their skewed distributions. The diatoms shown are: Ach hun = Achnanthes hungarica, Ach min = A. minutissima, Aph cas = Amphora castellata Giffen, Aph lyb = A. lybica, Aph ven = A. veneta, Coc pla = Cocconeis placentulata, Eun lun = Eunotia lunaris, Eun pec = E. pectinalis, Gei oli = Gomphoneis olivaceum, Gom par = Gomphonema parvulum, Mel jur = Melosira jürgensii, Nav acc = Navicula accomoda, Nav cus = N. cuspidata, Nav dis = N. diserta, Nav exi = N. exilis, Nav gre = N. gregaria, Nav per = N. permitis, Nav sem = N. seminulum, Nav sub = N. subminuscula, Nit amp = Nitzschia amphibia, Nit bre = N. bremensis v. brunsvigensis, Nit dis = N. dissipata, Nit pal = N. palea, Rho cur = Rhoicosphenia curvata. (Adapted from H. Smit, province of Zuid Holland, in prep.)

Partial ordination analysis (partial PCA, CA, DCA)

There can be many causes of variation in ecological data, and not all are of major interest. In partial ordination, the influence of causes not of primary interest can be 'factored out'. This is directly analogous to partial correlation or partial regression. There can be partial ordination (indirect gradient analysis) and partial constrained ordination (direct gradient analysis). Variables to be factored out are 'COVARIABLES' or 'COVARIATES' or 'CONCOMITANT VARIABLES'. Examples are:

1) Differences between observers.

2) Time of observation.

3) Between-plot variation when interest is temporal trends within repeatedly sampled plots.

4) Uninteresting gradients, e.g. elevation when interest is on grazing effects.

5) Temporal or spatial dependence, e.g. stratigraphical depth, transect position, x and y co-ordinates. Help remove autocorrelation and make objects more independent.

6) Collecting habitat – outflow, shore, lake centre.

7) Everything – partial out effects of all factors to see residual variation in data. Given ecological knowledge of sites and/or species, can try to interpret residual variation. May indicate environmental variables not measured, may be largely random, etc.

Overview of major ordination techniques

Response model                    Linear           Unimodal
Ordination                        PCA              CA, DCA
Constrained ordination            RDA              CCA, DCCA
Partial ordination                Partial PCA      Partial CA, partial DCA
Partial constrained ordination    Partial RDA      Partial CCA, partial DCCA

Partitioning variance or variation

ANOVA total SS = regression SS + residual SS

Two-way ANOVA: total SS = between groups (factor 1) + between treatments (factor 2) + interactions + error component

Borcard et al. (1992) Ecology 73, 1045–1055: variance or variation decomposition into 4 components

It is important to consider groups of environmental variables at the same level of ecological relevance (e.g. micro-scale, species-level, assemblage-level, etc.).

Total inertia (total variance) = 1.164
Sum of canonical eigenvalues = 0.663, i.e. explained variance = 57%
Unexplained variance = total - explained = 43%

What of the explained variance component? The soil variables (pH, Ca, LOI) and the land-use variables (e.g. grazing, mowing) are not independent. Do CCA/RDA using:

1) Soil variables only: canonical eigenvalues 0.521
2) Land-use variables only: canonical eigenvalues 0.503
3) Partial analysis, soil variables with land-use as covariables: 0.160
4) Partial analysis, land-use variables with soil as covariables: 0.142

a) Soil variation independent of land-use (3): 0.160 (13.7%)
b) Land-use-structured (covarying) soil variation (1 - 3): 0.361 (31%)
c) Land-use variation independent of soil (4): 0.142 (12.2%)
Total explained variance 56.9%
d) Unexplained 43.1%

[Figure: bar dividing the total variation into a (unique, soil), b (covariance), c (unique, land-use) and d (unexplained).]
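The fractions a to d follow from the four analyses by simple arithmetic. The sketch below (the function name is hypothetical) reproduces them from the eigenvalue sums quoted for the soil and land-use example.

```python
def partition_two_sets(total_inertia, both, only_1, only_2, partial_1, partial_2):
    """Partition of variation into fractions a, b, c, d for two predictor sets.

    both      : sum of canonical eigenvalues with both sets as predictors
    only_1/_2 : ... with one set only as predictors
    partial_1 : set 1 as predictors, set 2 as covariables (partial_2 vice versa)
    """
    a = partial_1                  # unique to set 1
    c = partial_2                  # unique to set 2
    b = only_1 - partial_1         # shared (covarying); equals only_2 - partial_2
    d = total_inertia - both       # unexplained
    return {"a": a, "b": b, "c": c, "d": d}


# Soil (set 1) and land-use (set 2) values from the example.
fractions = partition_two_sets(1.164, 0.663, 0.521, 0.503, 0.160, 0.142)
for name, value in fractions.items():
    print(name, round(value, 3), f"{100 * value / 1.164:.1f}%")
```

Computing d directly as 1.164 - 0.663 gives 43.0%; the 43.1% quoted above is 100% minus the sum of the rounded percentages.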

Variance partitioning or decomposition with three or more sets of predictor (explanatory) variables

Qinghong & Bråkenheim (1995) Water, Air and Soil Pollution 85: 1587–1592

Three sets of predictors – Climate (C), Geography (G) and Deposition of Pollutants (D)

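Series of RDA and partial RDA:

Predictors   Covariables   Sum of canonical eigenvalues
G+C+D        -             0.811
D            G+C           0.027
G+C          -             0.784
G+C          D             0.132
D            -             0.679
Joint effect of D and G+C = 0.784 - 0.132 = 0.679 - 0.027 = 0.652

C            D+G           0.106
G+D          -             0.706
G+D          C             0.074
C            -             0.737
Joint effect of C and D+G = 0.737 - 0.106 = 0.631 (0.706 - 0.074 = 0.632, a rounding difference)

Checks against the total of 0.811: 0.027 + 0.784 = 0.811; 0.132 + 0.679 = 0.811; 0.106 + 0.706 = 0.812; 0.074 + 0.737 = 0.811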

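Predictors   Covariables   Sum of canonical eigenvalues
G            D+C           0.034
D+C          -             0.777
D+C          G             0.228
G            -             0.583

Joint effect of G and D+C = 0.777 - 0.228 = 0.583 - 0.034 = 0.549

Checks against the total of 0.811: 0.034 + 0.777 = 0.811; 0.228 + 0.583 = 0.811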

Canonical eigenvaluesAll predictors 0.811Pure deposition 0.027 PDPure climate 0.106 PCPure geography 0.034 PGJoint G + C 0.132Joint G + D 0.074Joint D + C 0.228Unexplained variance 1 – 0.811 = 0.189

[Figure: Venn diagram of the three predictor sets D, G and C, showing the pure components PD, PG and PC and the covariance terms CD, DG, CG and CDG.]

Page 50

Total explained variance 0.811 consists of:

Unique climate (PC) 0.106
Unique geography (PG) 0.034
Unique deposition (PD) 0.027
Common climate + deposition 0.095
Common deposition + geography 0.013
Common climate + geography −0.008
Common climate + geography + deposition 0.544

Unexplained variance 0.189
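The three-set decomposition can be scripted the same way. This sketch (hypothetical function `partition_three_sets`) assumes, as in the tables above, that each unique fraction comes from one set with the other two as covariables and that each two-set fraction comes from the pair with the third set as covariables; the fraction common to all three is what remains of the total explained variance. Run on the deposition/climate/geography values, it returns a climate + geography fraction of −0.008, a negative component of the kind that correlated predictor sets can produce.

```python
from itertools import combinations


def partition_three_sets(total, pure, pair_given_third):
    """Fractions of explained variation for three predictor sets.

    total            : sum of canonical eigenvalues with all three sets
    pure             : {name: set alone, other two sets as covariables}
    pair_given_third : {frozenset({a, b}): a + b together, third set as covariable}
    """
    names = list(pure)
    frac = dict(pure)                                  # unique fractions
    for a, b in combinations(names, 2):
        # (a + b | third) = unique a + unique b + common(a, b)
        frac[a + b] = pair_given_third[frozenset((a, b))] - pure[a] - pure[b]
    frac["".join(names)] = total - sum(frac.values())  # common to all three
    return frac


# Deposition (D), climate (C) and geography (G) example from the text
fractions = partition_three_sets(
    total=0.811,
    pure={"D": 0.027, "C": 0.106, "G": 0.034},
    pair_given_third={frozenset("DC"): 0.228, frozenset("DG"): 0.074,
                      frozenset("CG"): 0.132},
)
for name, value in fractions.items():
    print(f"{name}: {value:+.3f}")
```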

See also Qinghong Liu (1997) Environmetrics 8: 75–85

Anderson & Gribble (1998) Australian J. Ecology 23: 158-167

Total variation:

 1)  random variation

 2)  unique variation from a specific predictor variable or set of predictor variables

 3)  common variation contributed by all predictor variables considered together and in all possible combinations

Usually only interpretable with 2 or 3 'subsets' of predictors.

In CCA and RDA, the constraints are linear. If the environmental variables are not uncorrelated (orthogonal), negative 'components of variation' may be found.

Page 51

Statistical testing of constrained ordination results

Statistical significance of species-environmental relationships. Monte Carlo permutation tests.

Randomly permute the environmental data and relate them to the species data, giving a 'random data set'. Calculate the first eigenvalue (λ1) and the sum of all canonical eigenvalues (trace). Repeat many times (e.g. 99).

If the species react to the environmental variables, the test statistic (λ1 or trace) for the observed data should be larger than most (e.g. 95%) of the test statistics calculated from random data. If the observed value is among the top 5% of values, conclude that the species are significantly related to the environmental variables.

Page 52

Statistical significance of constraining variables

• CCA or RDA maximise correlation with constraining variables and eigenvalues.

• Permutation tests can be used to assess statistical significance:

- Permute rows of environmental data.

- Repeat CCA or RDA with permuted data many times.

- If the observed statistic is higher than those from (most of) the permutations, it is regarded as statistically significant.
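A minimal sketch of the unrestricted test described above, assuming an RDA (not CCA) and using the trace, i.e. the sum of all canonical eigenvalues, as the test statistic. The function names are illustrative. CANOCO reports an F-type ratio instead; without covariables that ratio is a monotone function of the trace, so an unrestricted test on either ranks the permutations identically.

```python
import numpy as np


def rda_trace(Y, X):
    """Sum of all canonical eigenvalues of an RDA as a fraction of total variance."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)   # multivariate least squares
    fitted = Xc @ B
    return (fitted ** 2).sum() / (Yc ** 2).sum()


def rda_permutation_test(Y, X, n_perm=99, seed=0):
    """Unrestricted Monte Carlo permutation test of the RDA trace."""
    rng = np.random.default_rng(seed)
    observed = rda_trace(Y, X)
    n_as_large = 0
    for _ in range(n_perm):
        X_perm = X[rng.permutation(X.shape[0])]   # permute rows of environmental data
        if rda_trace(Y, X_perm) >= observed:
            n_as_large += 1
    p_value = (n_as_large + 1) / (n_perm + 1)     # observed data count as one permutation
    return observed, p_value
```

For example, `rda_permutation_test(Y, X, n_perm=999)` returns the observed trace and a p-value that can be no smaller than 1/1000.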

Page 53

Monte Carlo permutation tests in CCA and RDA as implemented in CANOCO

1. Unrestricted

2. Restricted - time series and line transects

-spatial layout on rectangular grid

-split-plot design

-multifactorial analysis of variance

Model-based permutations - when covariables are present, what is permuted are the residuals of the regression of Y on the covariables ('reduced model') or the residuals of the regression of Y on X and the covariables ('full model').
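Reduced-model permutation can be sketched in the same style. This is an illustration under stated assumptions, not CANOCO's code: the statistic is a pseudo-F for X after partialling out the covariables Z (Z must contain a column of ones), and only the residuals of Y on Z are permuted; the full-model variant would permute the residuals of Y on X and Z instead. Function names are hypothetical.

```python
import numpy as np


def residuals(A, Z):
    """Residuals of the least-squares regression of A on Z."""
    coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
    return A - Z @ coef


def partial_rda_F(Y, X, Z):
    """Pseudo-F for X on Y after removing covariables Z (Z includes a column of ones)."""
    n, q = X.shape
    Yz, Xz = residuals(Y, Z), residuals(X, Z)
    B, *_ = np.linalg.lstsq(Xz, Yz, rcond=None)
    fitted = Xz @ B
    ss_fit = (fitted ** 2).sum()
    ss_res = ((Yz - fitted) ** 2).sum()
    df_res = n - q - np.linalg.matrix_rank(Z)
    return (ss_fit / q) / (ss_res / df_res)


def reduced_model_test(Y, X, Z, n_perm=99, seed=0):
    """Permute the residuals of Y on the covariables ('reduced model')."""
    rng = np.random.default_rng(seed)
    observed = partial_rda_F(Y, X, Z)
    res = residuals(Y, Z)
    n_as_large = 0
    for _ in range(n_perm):
        # The covariable part of Y is removed again inside partial_rda_F,
        # so permuting the residuals alone is enough.
        Y_star = res[rng.permutation(len(Y))]
        if partial_rda_F(Y_star, X, Z) >= observed:
            n_as_large += 1
    return observed, (n_as_large + 1) / (n_perm + 1)
```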

Page 54

An ecological example of CCA

[Table annotations: ordinal; ordinal; 4 classes; 3 classes; 7 binary class variables; remove effect of seasonal variation.]

Example data: quantitative and qualitative environmental variables (a) and qualitative covariables (b) recorded at 40 sites along two tributaries of the Hierden stream (sd: standard deviation; min: minimum; max: maximum). Aquatic macrofauna data.

Page 55

Ranking of environmental variables in importance by their marginal (left) and conditional (right) effects on the macrofauna in the example data set, as obtained by forward selection. (λ1 = fit = eigenvalue with variable j only; λa = additional fit = increase in eigenvalue; cum (λa) = cumulative total of eigenvalues λa; P = significance level of the effect, as obtained with a Monte Carlo permutation test under the null model with 199 random permutations; – = additional variables tested; veg. = vegetation.) Seasonal variation is partialled out by taking the month class variables as covariables.

Marginal effects (forward: step 1)
j | Variable | λ1 | P
1 | Shrubs (1/0) | 0.25 | (0.01)
2 | Source distance | 0.22 | (0.01)
3 | EC | 0.20 | (0.01)
4 | Discharge | 0.17 | (0.01)
5 | Tot. cover of veg. | 0.16 | (0.01)
6 | Shading | 0.15 | (0.01)
7 | Soil grain size | 0.14 | (0.02)
8 | Stream width | 0.14 | (0.05)
9 | High weedy veg. | 0.14 | (0.08)
10 | Cover bank veg. | 0.13 | (0.11)
– | U vs L stream | 0.22 | (0.01)

Conditional effects (forward: continued)
j | Variable | λa | P | Cum (λa)
1 | Shrubs (1/0) | 0.25 | (0.01) | 0.25
2 | Source distance (step 2) | 0.19 | (0.01) | 0.44
3 | Discharge (step 3) | 0.19 | (0.01) | 0.63
4 | EC (step 4) | 0.14 | (0.03) | 0.75
– | Cover emergent veg. | 0.11 | (0.10) | –
– | Cover bank veg. | 0.11 | (0.12) | –
– | Soil grain size | 0.10 | (0.13) | –
– | U vs L stream (extra fit) | 0.09 | (0.26) | –

MARGINAL EFFECTS: each variable is the only environmental variable, i.e. ignoring all other variables.
CONDITIONAL EFFECTS: change in eigenvalue if a particular variable is selected, given the other selected variables.
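Marginal and conditional effects can be illustrated with a short forward-selection sketch. It uses RDA rather than CCA for brevity, so the fits are fractions of the total variance rather than CCA eigenvalues, and it omits the permutation test that decides whether each step is significant. All names are hypothetical.

```python
import numpy as np


def rda_trace(Y, X):
    """Sum of canonical eigenvalues (fraction of total variance) of an RDA of Y on X."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return ((Xc @ B) ** 2).sum() / (Yc ** 2).sum()


def marginal_effects(Y, X, names):
    """Fit of each variable on its own, largest first (forward selection, step 1)."""
    fits = [(rda_trace(Y, X[:, [j]]), names[j]) for j in range(X.shape[1])]
    return sorted(fits, reverse=True)


def forward_selection(Y, X, names, n_steps):
    """Conditional effects: at each step add the variable with the largest extra fit."""
    selected, history, current = [], [], 0.0
    for _ in range(n_steps):
        gains = [(rda_trace(Y, X[:, selected + [j]]) - current, j)
                 for j in range(X.shape[1]) if j not in selected]
        gain, best = max(gains)
        selected.append(best)
        current += gain
        history.append((names[best], gain, current))   # (variable, lambda_a, cum)
    return history
```

Comparing the two outputs shows the point of the table: a variable with a large marginal fit can add little once correlated variables are already selected.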

Page 56

Species-conditional triplot based on a canonical correspondence analysis of the example macroinvertebrate data, displaying 13% of the inertia (= weighted variance) in the abundances and 69% of the variance in the weighted averages and class totals of species with respect to the environmental variables. The eigenvalues of axis 1 (horizontal) and axis 2 (vertical) are 0.35 and 0.17, respectively; the eigenvalue of axis 3 (not displayed) is 0.13. Sites are labelled with stream code (U, L) and are ranked by distance from the source (rank number within the stream). Species (triangles) are weighted averages of site scores (circles). Quantitative environmental variables are indicated by arrows. The class variable shrub is indicated by the square points labelled Shrub and No shrub. The scale marks along the axes apply to the quantitative environmental variables; the species scores, site scores and class scores were multiplied by 0.4 to fit in the coordinate system. Only selected species are displayed, namely those with N2 > 4 and a small N2-adjusted root mean square tolerance for the first two axes. The species names are abbreviated to the part in italics as follows: Ceratopogonidae, Dendrocoelum lacteum, Dryops luridus, Erpobdella testacea, Glossiphonia complanata, Haliplus lineatocollis, Helodidae, Micropsectra atrofasciata, Micropsectra fusca, Micropterna sequax, Prodiamesa olivacea, Stictochironomus sp.

Page 57

Other aspects of CCA/RDA

1. Very robust; the major assumption of CCA is that species responses are approximately unimodal.

2. 'Unknown' or passive samples can be added into CCA or RDA space (e.g. fossil samples positioned in modern CCA space).

3. Unlike canonical correlation analysis, RDA (and CCA) can handle data sets where the total number of variables (species and environmental variables) greatly exceeds the number of sites.

4. A range of ordination diagnostics, comparable to regression diagnostics, can be calculated.
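In CA and CCA, passive samples are commonly positioned by weighted averaging of the species scores of the fitted model. A minimal sketch of that principle, assuming the passive samples use the same species list and order as the modern data (the function name is hypothetical). Whether these weighted averages need an extra axis-specific rescaling depends on the scaling convention of the program, so this shows the idea rather than reproducing any package's output.

```python
import numpy as np


def passive_site_scores(Y_passive, species_scores):
    """Place passive samples as weighted averages of species scores.

    Y_passive      : (samples x species) abundances, same species order as the model
    species_scores : (species x axes) species scores from the fitted CA/CCA
    """
    Y_passive = np.asarray(Y_passive, float)
    totals = Y_passive.sum(axis=1, keepdims=True)   # each sample must contain some taxa
    return (Y_passive @ species_scores) / totals


# Two hypothetical fossil samples placed among three modern species scores
species_scores = np.array([[-1.0, 0.5], [2.0, 0.0], [0.0, -1.0]])
fossil = np.array([[10, 0, 0], [5, 5, 0]])
print(passive_site_scores(fossil, species_scores))
```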

Page 58

Fossil samples in modern CCA space

Page 59

Distance-based redundancy analysis

DISTPCOA: Pierre Legendre & Marti Anderson (1999) Ecol. Monogr. 69: 1-24. RDA but with any distance coefficient (DC).

RDA – Euclidean distance; absolute abundances; quantity-dominated.
CCA – chi-square metric; relative abundances; shape/composition-dominated.

Does it matter? It can, when total biomass (or cover) and species composition both vary, e.g. along a ridge to snow-bed gradient.

Other dissimilarities:
Bray & Curtis – non-Euclidean semi-metric
Jaccard (+/– data) – non-Euclidean semi-metric
Gower (mixed data) – non-Euclidean semi-metric

Basic idea: reduce the site × site DC matrix (any DC) to principal co-ordinates (principal co-ordinates analysis, PCoA; also called classical or metric scaling – Torgerson, Gower), but with a correction for negative eigenvalues to preserve the distances.

PCoA embeds the Euclidean part of the DC matrix; the rest appears as negative eigenvalues, for which no real axes exist. These correspond to the variation in the distance matrix that cannot be represented in Euclidean space.
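A compact classical-scaling sketch shows where the negative eigenvalues come from. It assumes NumPy; `bray_curtis` and `pcoa` are illustrative helpers, not the DISTPCoA program.

```python
import numpy as np


def bray_curtis(Y):
    """Bray-Curtis dissimilarity between all pairs of rows (sites)."""
    Y = np.asarray(Y, float)
    num = np.abs(Y[:, None, :] - Y[None, :, :]).sum(axis=2)
    den = (Y[:, None, :] + Y[None, :, :]).sum(axis=2)
    return num / den


def pcoa(D):
    """Classical scaling: eigenvalues (largest first) and coordinates on positive axes."""
    D = np.asarray(D, float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    G = J @ (-0.5 * D ** 2) @ J                     # Gower-centred matrix
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-9 * np.abs(eigvals).max()
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    return eigvals, coords
```

For a Euclidean distance matrix all eigenvalues are non-negative (up to rounding) and the coordinates reproduce the distances exactly; a semi-metric matrix such as Bray-Curtis can yield negative eigenvalues, which `pcoa` reports but cannot turn into real axes.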

Page 60

Correction for negative eigenvalues

$$a_{ij} = -\tfrac{1}{2}\, d_{ij}^{2}, \qquad \delta_{ij} = a_{ij} - \bar{a}_{i.} - \bar{a}_{.j} + \bar{a}_{..}$$

$$d'_{ij} = \left(d_{ij}^{2} + 2c_{1}\right)^{0.5} \quad \text{for } i \neq j$$

where c1 is equal to the absolute value of the largest negative eigenvalue of the matrix Δ1 = [δij] used in PCoA.

Use all principal co-ordinate site scores (n – 1 or m, whichever is less) as RESPONSE (species) data in RDA. Use dummy variables for the experimental design as predictors in X in the RDA.

Now, within the framework of RDA and its battery of permutation tests, structured experiments can be analysed for the WHOLE ASSEMBLAGE (cf. MANOVA, but where m > n).

The null hypothesis (as in MANOVA) that assemblages from different treatments are no more different than would be expected due to random chance at a given level of probability can now be tested. BUT, unlike non-parametric tests (ANOSIM, Mantel tests), interactions between factors in multivariate data can be tested, using any DC (not only Euclidean distance as in ANOVA/MANOVA). Because permutation tests are used, there is no need to worry about multivariate normality, homogeneity of covariance matrices within groups, or the abundance of zero values in ecological data.

DISTPCoA www.fas.umontreal.ca/biol/legendre
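The correction above can be applied as follows; the sketch assumes a symmetric distance matrix with a zero diagonal, and the helper names are hypothetical. Adding 2c1 to every off-diagonal squared distance shifts all non-trivial eigenvalues of the Gower-centred matrix up by c1, so none remain negative and all principal coordinates of the corrected distances are real, ready to be used as the response matrix in RDA.

```python
import numpy as np


def gower_centre(D):
    """Delta_1: Gower-centred matrix of a_ij = -0.5 * d_ij**2."""
    D = np.asarray(D, float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ (-0.5 * D ** 2) @ J


def lingoes_correction(D):
    """Return d'_ij = sqrt(d_ij**2 + 2*c1) for i != j, and c1."""
    D = np.asarray(D, float)
    c1 = max(0.0, -np.linalg.eigvalsh(gower_centre(D)).min())  # |largest negative eigenvalue|
    D_corr = np.sqrt(D ** 2 + 2.0 * c1)
    np.fill_diagonal(D_corr, 0.0)                              # correction applies to i != j only
    return D_corr, c1
```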

Page 61

[Flowchart: raw data (replicates × species) → distance matrix (Bray-Curtis, etc.) → principal coordinate analysis (PCoA) → correction for negative eigenvalues → matrix Y (replicates × principal coordinates). Test of one factor in a single-factor model: redundancy analysis (RDA) of Y on matrix X (dummy variables for the factor) → F# statistic → test of F# by permutation. Test of the interaction term in a multifactorial model: partial redundancy analysis (partial RDA) of Y on matrix X (dummy variables for the interaction) with matrix XC (dummy variables for the main effects) as covariables → F# statistic → test of F# by permutation under the full model.]

Page 62

Canonical analysis of principal co-ordinates

Anderson, M.J. & Willis, T.J. (2003) Ecology 84: 511-525

CAP – www.stat.auckland.ac.nz/~mja

CAP - canonical analysis of principal co-ordinates based on any symmetric distance matrix including permutation tests.

Y response variables (n x m)

X predictor variables (n x q) (1/0 or continuous variables)

Performs canonical analysis of effects of X on Y on the basis of any distance measure of choice and uses permutations of the observations to assess statistical significance.

If X contains 1/0 coding of an ANOVA model (design matrix), result is a generalised discriminant analysis. If X contains one or more predictor variables, result is a generalised canonical correlation analysis.
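The core of CAP can be sketched as a canonical correlation between X and the first m principal coordinates. This assumes that the first m PCoA axes have positive eigenvalues and follows the formulation in which the squared canonical correlations are the eigenvalues of Q'HQ (Q: orthonormal PCoA axes; H: hat matrix of X). Choosing m and the permutation test of the CAP program are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def cap_squared_correlations(D, X, m):
    """Squared canonical correlations between predictors X and the first m PCoA axes of D."""
    D = np.asarray(D, float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    G = J @ (-0.5 * D ** 2) @ J                   # Gower-centred matrix
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:m]
    Q = eigvecs[:, order]                          # orthonormal, centred PCoA axes Q_m
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)
    H = Xc @ np.linalg.pinv(Xc)                    # hat (projection) matrix of X
    return np.sort(np.linalg.eigvalsh(Q.T @ H @ Q))[::-1]
```

With X holding 1/0 group codes this behaves as a generalised discriminant analysis: groups perfectly separated along the first principal coordinate give a first squared canonical correlation of 1.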

Page 63

Summary of constrained ordination methods

Name of method (acronyms, synonyms) | Distance measure preserved | Relationship of ordination axes with original variables | Takes into account correlation structure
Redundancy Analysis (RDA) | Euclidean distance | Linear with X; linear with fitted values, Ŷ = X(X'X)⁻¹X'Y | ... among variables in X, but not among variables in Y
Canonical Correspondence Analysis (CCA) | Chi-square distance | Linear with X; approx. unimodal with Y; linear with fitted values, Ŷ* | ... among variables in X, but not among variables in Y
Canonical Analysis of Principal Coordinates (CAP; Generalized Discriminant Analysis) | Any chosen distance or dissimilarity | Linear with X; linear with Qm; unknown with Y (depends on distance measure) | ... among variables in X, and among principal coordinates Qm
Canonical Correlation Analysis (CCorA, COR) | Mahalanobis distance | Linear with X; linear with Y | ... among variables in X, and among variables in Y
Canonical Discriminant Analysis (CDA; Canonical Variate Analysis, CVA; Discriminant Function Analysis, DFA) | Mahalanobis distance | Linear with X; linear with Y | ... among variables in X, and among variables in Y

Methods of constrained ordination relating response variables, Y (species abundance variables), with predictor variables, X (such as quantitative environmental variables or qualitative variables that identify factors or groups as in ANOVA).
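The RDA row of the summary can be made concrete: regress Y on X to obtain the fitted values Ŷ = X(X'X)⁻¹X'Y, then run a PCA on Ŷ. A minimal NumPy sketch, assuming centred but unstandardised species data and no covariables (the function name is hypothetical):

```python
import numpy as np


def rda(Y, X):
    """RDA as multivariate regression followed by PCA of the fitted values."""
    Y = np.asarray(Y, float)
    X = np.asarray(X, float)
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B = np.linalg.pinv(Xc) @ Yc              # (X'X)^-1 X'Y for full-rank X
    Y_hat = Xc @ B                           # fitted values X (X'X)^-1 X'Y
    _, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    k = np.linalg.matrix_rank(Xc)            # number of canonical axes
    eigenvalues = s[:k] ** 2 / (Yc ** 2).sum()   # as fractions of total variance
    species_scores = Vt[:k].T
    lc_scores = Y_hat @ species_scores       # site scores as linear combinations of X
    wa_scores = Yc @ species_scores          # site scores from the observed data
    return eigenvalues, species_scores, lc_scores, wa_scores
```

The number of canonical axes equals the rank of X, which is why the constraints weaken as environmental variables are added.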

Page 64

Principal response curves (PRC)

van den Brink, P. & ter Braak, C.J.F. (1999) Environmental Toxicology & Chemistry 18: 138-148

van den Brink, P. & ter Braak, C.J.F. (1998) Aquatic Ecology 32: 163-178

PRC is a means of analysing repeated-measurement designs and of testing and displaying optimal treatment effects that change across time.

Based on RDA (= reduced-rank regression), adjusted for changes across time in the control treatment. This allows a focus on time-dependent treatment effects. The resulting principal component is plotted against time in a PRC diagram.

Developed in ecotoxicology; also used in repeated measures in experimental ecology and in descriptive ecology where spatial replication is substituted for temporal replication.

Highlights differences in measurement end-points between treatments and the reference control.

Page 65

PRC model

$$Y_{d(i)tk} = Y_{0tk} + b_{k}\,c_{dt} + \varepsilon_{d(i)tk}$$

where

$Y_{d(i)tk}$ = abundance count of taxon k at time t in replicate i of treatment d

$Y_{0tk}$ = mean abundance of taxon k in the controls (0) at time t

$c_{dt}$ = principal response of treatment d at time t (PRC)

$b_{k}$ = weight of species k with respect to $c_{dt}$

$\varepsilon_{d(i)tk}$ = error term with a mean of zero and variance $\sigma_{k}^{2}$

The model expresses the abundance of a particular species as the sum of three terms: the mean abundance in the control, a treatment effect, and an error term.

Data input:
- species data (often log-transformed) for different treatments at different times
- predictor variables: dummy variables (1/0) indicating all combinations of treatment and sampling time ('indicator variables')
- covariables: dummy variables indicating sampling time

Do a partial RDA with the responses, predictors, and covariables, and delete all predictor variables that represent the control. This ensures that the treatment effects are expressed as deviations from the control.
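A minimal sketch of the first PRC computed by this partial RDA, assuming a replicated design in which every treatment is sampled at every time. Because the control dummies are omitted, each regression coefficient is a treatment mean minus the control mean at that time; the first RDA axis of the fitted values gives the species weights b_k, and projecting the coefficients onto it gives c_dt. Only the product b_k c_dt is determined, so signs and scaling may differ from CANOCO's PRC output. The function name is hypothetical.

```python
import numpy as np


def prc_rank1(Y, treatment, time, control):
    """First principal response curve via partial RDA (reduced-rank regression).

    Covariables Z: dummies for sampling time.
    Predictors X: dummies for every treatment x time combination except the control.
    Returns treatment labels, time labels, C[d, t] (= c_dt) and species weights b (= b_k).
    """
    Y = np.asarray(Y, float)
    treatment, time = np.asarray(treatment), np.asarray(time)
    times = np.unique(time)
    treats = [d for d in np.unique(treatment) if d != control]
    Z = np.column_stack([(time == t).astype(float) for t in times])
    X = np.column_stack([((treatment == d) & (time == t)).astype(float)
                         for d in treats for t in times])

    def resid(A):                                  # remove the time (covariable) effect
        coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
        return A - Z @ coef

    Yz, Xz = resid(Y), resid(X)
    B, *_ = np.linalg.lstsq(Xz, Yz, rcond=None)    # each row: cell mean minus control mean
    _, _, Vt = np.linalg.svd(Xz @ B, full_matrices=False)
    b = Vt[0]                                      # species weights (unit length, sign arbitrary)
    C = (B @ b).reshape(len(treats), len(times))   # principal responses c_dt
    return treats, times, C, b
```

Plotting each row of `C` against `times` gives the PRC diagram, with the control as the horizontal zero line.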

Page 66

PRC plots

One curve for each treatment, expressed as the deviation from the control. Species weights (b_k) allow species interpretation. The higher the weight, the more the actual species response is likely to follow the PRC pattern, because the response pattern = b_k c_dt. Taxa with a high negative weight are inferred to show the opposite pattern. Taxa with a near-zero weight show no response.

Significance of PRC can be tested by Monte Carlo permutation of the whole time series within each treatment.

The second RDA axis can be used to generate a second PRC diagram (a rank-2 model).

Page 67

PRC and analysis of monitoring data

PRC usually used with experimental data. Can be used with (bio)monitoring data.

Samples at several dates at several sites of a river, some upstream of a sewage treatment plant (STP) (300 m, 100 m), in the STP outlet, and some downstream (100 m, 1 km). 795 samples, 5 sites, 1994-2002.

PRC using sampling month as covariable, product of sampling month and site as explanatory variables. Used STP outlet as the reference site.

Of total variance, 24% could be attributed to between-month variation. 57% of all variance could be allocated to between-site differences, the remaining 19% to within-month variation.

Page 68

The biggest differences are for the two upstream sites, which have lower NOx, total N, conductivity, salinity, total P, and temperature, and higher turbidity and faecal coliforms. The STP outlet leads to increases in N, P, temperature, etc. Downstream, values decrease but are not as low as at the upstream sites. The STP successfully reduces faecal coliforms, as their values are higher at the upstream sites owing to pollution.

Page 69

Filters out mean abundance pattern across time in the control. Focuses on deviation between treatment and control. PRC displays major patterns in those deviations and provides good summary of response curves of individual taxa.

PRC helps to separate 'signal' from 'noise' in ecological data from replicated experimental studies.

Simplified RDA - simplified by representing the time trajectory for the controls as a horizontal line and taking the control as the reference to which other treatments are compared.

PRC gives simple representation of how treatment effects develop over time at the assemblage level.

PRC:

Page 70

A palaeoecological example of RDA

Laacher See volcanic ash

A F Lotter & H J B Birks 1993 J Quat Sci 8, 263 - 276

11000 BP

Any impact on terrestrial and aquatic systems?

Also:

H J B Birks & A F Lotter 1994 J Paleolimnology 11, 313 - 322

A F Lotter et al 1995 J Paleolimnology 14, 23 - 47

Page 71

Map showing the location of Laacher See (red star), as well as the location of the sites investigated (blue circle). Numbers indicate the amount of Laacher See Tephra deposition in millimetres (modified from van den Bogaard, 1983).

Page 76

Data

RESPONSE VARIABLES (% data)
Terrestrial pollen and spores (9, 31 taxa)
Aquatic pollen and spores (6, 8 taxa)
Diatoms (42, 54 taxa)

EXPLANATORY VARIABLES
Biozone (Allerød, Allerød/Younger Dryas, Younger Dryas): +/–
Lithology (gyttja, clay/gyttja): +/–
Depth ("age"): continuous
Ash: exponential decay process, continuous

$$\text{ash} = x\,e^{-\lambda t}, \qquad \lambda = 0.5,\; x = 100,\; t = \text{time}$$

[Figure: ash variable decaying with time from the Allerød (AL) into the Younger Dryas (YD); 211 years.]

NUMERICAL ANALYSIS

(Partial) redundancy analysis

Restricted (stratigraphical) Monte Carlo permutation tests

Variance partitioning

Log-ratio centring because of % data
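One way to carry out the log-ratio centring listed above, assuming an offset of 1 to handle zero percentages (the offset is an illustrative choice, not taken from the study): log-transform, centre each sample, then centre each taxon as RDA does by default. The function name is hypothetical.

```python
import numpy as np


def log_ratio_centre(P, offset=1.0):
    """Log-transform percentage data, then centre by samples and by taxa."""
    L = np.log(np.asarray(P, float) + offset)     # offset avoids log(0) for absent taxa
    L = L - L.mean(axis=1, keepdims=True)          # centre each sample (row)
    L = L - L.mean(axis=0, keepdims=True)          # centre each taxon (column), as in RDA
    return L
```

With no zeros and `offset=0`, multiplying a whole sample by a constant leaves the result unchanged, which is why this centring suits closed (%) data.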

Page 77

RESULTS OF (PARTIAL) REDUNDANCY ANALYSIS OF THE BIOSTRATIGRAPHICAL DATA SETS AT ROTMEER (RO-6) AND HIRSCHENMOOR (HI-1) UNDER DIFFERENT MODELS OF EXPLANATORY VARIABLES AND COVARIABLES. Entries are significance levels as assessed by restricted Monte Carlo permutation tests (n = 99).

Site | Explanatory variables | Covariables | Terrestrial pollen | Aquatic pollen & spores | Diatoms
RO-6 | Depth + biozone + ash + lithology | – | 0.01a | 0.01a | 0.01a
HI-1 | Depth + biozone + ash + lithology | – | 0.01a | 0.10 | 0.01a

Unique ash effect (no lithology):
RO-6 | Ash | Depth + biozone | 0.09ns | 0.48ns | 0.16ns
HI-1 | Ash | Depth + biozone | 0.28ns | 0.13ns | 0.01a

Unique ash + lithology effect:
RO-6 | Ash + lithology | Depth + biozone | – | 0.88ns | 0.17ns
HI-1 | Ash + lithology | Depth + biozone | – | 0.10ns | 0.01a

Unique ash effect (lithology considered):
RO-6 | Ash | Depth + biozone + lithology | – | 0.53ns | 0.08ns
HI-1 | Ash | Depth + biozone + lithology | – | 0.10ns | 0.19ns

Unique ash + lithology + (ash*lithology) interaction effect:
RO-6 | Ash + lithology + ash*lithology | Depth + biozone | – | 0.25ns | 0.03b
HI-1 | Ash + lithology + ash*lithology | Depth + biozone | – | 0.12ns | 0.05b

a: p ≤ 0.01; b: 0.01 < p ≤ 0.05; ns: not significant

Page 79

Other recent developments in gradient analysis methods based on weighted averaging

1. Weighted averaging partial least squares regression and calibration

Predict one or more environmental variables (e.g., lake pH) from biological data (e.g., diatoms).

ter Braak and Juggins (1993) Hydrobiologia 269: 485-502

2. Canonical correspondence analysis partial least squares regression

Predict biological assemblages from many environmental variables

ter Braak and Verdonschot (1995) Aquatic Sciences 57: 255-289

3. Co-correspondence analysis

Relate two biological data sets (e.g., vascular plants and invertebrates) to identify patterns common to both.

ter Braak and Schaffers (2003) Ecology (in press)

Page 80

PROBLEMS

1. Data type and choice of ordination method

Besides gradient length (standard deviations), data type is also important in selecting ordination method.

Ordination | Absolute abundances | Relative abundances (compositional differences)
Unconstrained | PCA (linear) | CA, DCA (unimodal)
Constrained | RDA (linear) | CCA, DCCA (unimodal)
Constrained | PRC (linear) | –

PCA/RDA are weighted summations; CA/CCA are weighted averages, hence the difference between modelling absolute values (PCA/RDA) or relative values (CA/CCA).

Absolute abundances over long gradients cannot currently be modelled satisfactorily. The data need to be partitioned into shorter gradients first (e.g. with TWINSPAN).

Page 81

Long gradients and many rare species

What to do with data containing many zero values and long gradients, and to avoid the inherent problems of the Chi-squared distance metric implicit in CA and CCA?

These problems are:

1. Implicit use of relative abundances

2. A difference between abundance values for a common species contributes less to the distance than the same difference for a rare species, so rare species may have an unduly large influence on the analysis

Possible solutions:

1. Delete rare species

2. Empirical downweighting of rare species as in CANOCO

3. Data transformations that preserve the Euclidean distances and balance rare and common species

P. Legendre & E.D. Gallagher (2001) Oecologia 129: 271-280

Software: www.fas.umontreal.ca/biol/legendre

Page 82

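(1) Chord distance

$$D(x_1, x_2) = \sqrt{\sum_{j=1}^{m}\left(\frac{y_{1j}}{\sqrt{\sum_{j=1}^{m} y_{1j}^{2}}} - \frac{y_{2j}}{\sqrt{\sum_{j=1}^{m} y_{2j}^{2}}}\right)^{2}}$$

(2) Chi-square metric

$$D(x_1, x_2) = \sqrt{\sum_{j=1}^{m} \frac{1}{y_{+j}}\left(\frac{y_{1j}}{y_{1+}} - \frac{y_{2j}}{y_{2+}}\right)^{2}}$$

where y+j is the column (species) sum for species j, and y1+ is the row (sample) sum for sample 1

(3) Species profiles

$$D(x_1, x_2) = \sqrt{\sum_{j=1}^{m}\left(\frac{y_{1j}}{y_{1+}} - \frac{y_{2j}}{y_{2+}}\right)^{2}}$$

(4) Hellinger distance

$$D(x_1, x_2) = \sqrt{\sum_{j=1}^{m}\left(\sqrt{\frac{y_{1j}}{y_{1+}}} - \sqrt{\frac{y_{2j}}{y_{2+}}}\right)^{2}}$$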

(These are also relevant transformations for minimum-variance cluster analysis and k-means cluster analysis that minimise a least-squares function. These transformations result in Euclidean metrics that can be represented in Euclidean space and that preserve sum of squares.)
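Following Legendre & Gallagher (2001), each of the four measures above is the Euclidean distance between rows of a transformed data table, so the transformed data can go straight into PCA or RDA. A NumPy sketch (function name hypothetical; all species are assumed to occur somewhere, so that no column sum is zero):

```python
import numpy as np


def transform(Y, method):
    """Transformations whose Euclidean distances give the four measures above."""
    Y = np.asarray(Y, float)
    row_sums = Y.sum(axis=1, keepdims=True)          # y_i+
    if method == "chord":
        return Y / np.sqrt((Y ** 2).sum(axis=1, keepdims=True))
    if method == "chisquare_metric":
        col_sums = Y.sum(axis=0, keepdims=True)      # y_+j
        return Y / (row_sums * np.sqrt(col_sums))
    if method == "profile":
        return Y / row_sums
    if method == "hellinger":
        return np.sqrt(Y / row_sums)
    raise ValueError(f"unknown method: {method}")
```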

Page 84

1) The fewer the environmental variables, the stronger the constraints.

2) With q = (number of samples – 1) environmental variables, the analysis is unconstrained.

3) Want to try to find MINIMAL ADEQUATE SET of environmental variables that explain the species data about as well as the FULL SET.

4) Automatic selection (e.g. forward selection) can be dangerous:

a) Several sets can be almost equally good. Automatic selection finds one but may not be the best.

  b) Selection order may change the result and ecologically important variables may not be selected.

  c) Small changes in the data can change the selected variables. Difficult to draw reliable conclusions about relative importance of variables. Omission of a variable does not mean it is not ecologically important.

5) If we are lucky, there may only be one minimal adequate model but we cannot assume that there is only one such model.

2. Selecting environmental variables in constrained ordination analysis (e.g., CCA, RDA)

Page 85

Use Y species data

X environmental data

(and Z covariable data)

All matching for all n samples

But, we may have previous knowledge about our m species (e.g., life-history strategies, ecological indicator values, growth rates, heights, life-forms, etc). At present this information is not used.

Need for a three-matrix approach to CCA and RDA.

Doledec, S. et al. (1996) Environmental & Ecological Statistics 3: 143-166

3. Ignoring knowledge about species

Page 86

If the abundance of species is measured by biomass, then the analysis will only really pay attention to the trees and not the herbs.

It is useful to rescale the data or re-define the species variables. For example, reduce the data to presences and absences of particular attributes, such as the attribute 'present with a cover of at least 25%' (cf. PSEUDOSPECIES in TWINSPAN).

In much field plant ecological data, our experience is that almost all quantitative information is contained in three attributes "present", "present in moderate quantity", and "present in large quantity".
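A sketch of this kind of recoding, in the spirit of TWINSPAN pseudospecies, producing the three attributes "present", "present in moderate quantity" and "present in large quantity". The cut levels of 5% and 25% cover are hypothetical choices for illustration, as is the function name.

```python
import numpy as np


def pseudospecies(cover, cut_levels=(5.0, 25.0)):
    """Recode a (samples x species) cover table into presence/absence attributes.

    Columns: all species 'present', then all species 'cover >= cut' for each cut level.
    """
    cover = np.asarray(cover, float)
    blocks = [(cover > 0).astype(int)]
    blocks += [(cover >= level).astype(int) for level in cut_levels]
    return np.hstack(blocks)
```

A sample with covers 0, 10 and 30 for three species becomes presence on two, 'moderate' on two and 'large' on one of the recoded attributes.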

4. Data coding

Page 87

PITFALLS

1. Biplot scaling

Data-tables in an ecological study on species environmental relations. Primary data are the sub-table 1 of abundance values of species and the sub-tables 4 and 7 of values and class labels of quantitative and qualitative environmental variables (env. var), respectively. The primary data are input for canonical correspondence analysis (CCA). The other sub-tables contain derived (secondary) data, as the arrows indicate, named after the (dis)similarity coefficient they contain. The coefficients shown in the figure are optimal when species-environmental relations are unimodal. The CA ordination diagram represents these sub-tables, with emphasis on sub-tables 5 (weighted averages of species with respect to quantitative environmental variables), 8 (totals of species in classes of qualitative environmental variables) and 1 (with fitted, as opposed to observed, abundance values of species).

The sub-tables 6, 9, and 10 contain correlations among quantitative environmental variables, means of the quantitative environmental variables in each of the classes of the qualitative variables and chi-square distances among the classes, respectively. (Chi-sq = Chi-square; Aver = Averages; Rel = Relative)

Page 88

Sub-table | Scaling −1: focus on sites (Hill's scaling) | Interpretation | Scaling 2: focus on species (biplot scaling of CCA) | Interpretation
1. species × sites | Rel. abundances | CENTROID | Fitted rel. abund. | BIPLOT rule or CENTROID rule
2. species × species | – | UNKNOWN | χ²-distances | DISTANCE rule
3. sites × sites | Turnover distances | DISTANCE | χ²-distances* | DISTANCE rule
Quantitative env. vars
4. sites × env. vars | – | UNKNOWN | Values of env. vars | BIPLOT rule
5. species × env. vars | Weighted averages | BIPLOT | Weighted averages | BIPLOT rule
6. env. vars × env. vars | Effects ? | BIPLOT | Correlations | BIPLOT rule
Qualitative env. vars
7. sites × env. classes | Membership | CENTROID | Membership | CENTROID rule
8. species × env. classes | Rel. total abund. | CENTROID | Rel. total abund. | CENTROID rule
9. env. vars × env. classes | – | UNKNOWN | Mean values of env. vars | BIPLOT rule
10. env. classes × env. classes | Turnover distances | DISTANCE | χ²-distances* | DISTANCE rule

(Italics = fitted by weighted least squares.) * if λ1 ≈ λ2

Page 89

Interpretation of CCA plots

Centroid principle

Distance principle

Biplot principle (of relative abundances)

Small eigenvalues, short (< 4 SD) gradients – Biplot principle

Large eigenvalues (> 0.40), long (> 4 SD) gradients – Centroid and Distance principles and some Biplot principles

The centroid and distance principles may approximate biplot principle if gradients are short and eigenvalues small.

Differences are least important if λ1 ≈ λ2.

Page 90

1) The choice can greatly influence the results. The fewer the environmental variables, the more constrained the ordination is.

2) Possible to have one only – can evaluate its explanatory power.

3) Can remove superfluous variables if they are confusing or difficult to interpret. Can often remove large number without any marked effect. Post-hoc removal of variables is not valid in a hypothesis-testing analysis.

4) Linear combinations – environmental variables cannot be linear combinations of other variables. If a variable is a linear combination of other variables, a singular matrix results.

  Examples:
  - total cations, Ca, Mg, Na, K, etc. (delete total cations)
  - % clay, % silt, % sand
  - dummy variables (granite or limestone or basalt)

5) Transformation of environmental data – how do we scale environmental variables in such a way that vegetation ‘perceives’ the environment? Need educated guesses.

  Log transformation usually sensible – 1 unit difference in N or P is probably more important at low concentrations than at high concentrations.

As statistical significance is assessed by randomisation tests, no need to transform data to fulfil statistical assumptions.

  Transformations useful to dampen influence of outliers.

  Environmental data automatically standardised in RDA and CCA.

2. Choice of environmental variables in constrained ordinations
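Points 4 and 5 can be checked and applied before an analysis. A minimal sketch, assuming plain numpy and hypothetical function names, follows. It detects a linear combination by checking whether the design matrix (with an intercept column) loses rank. It then log-transforms and standardises a variable. The offset of 1 added before taking logs, to cope with zeros, is an assumption and not something the slides specify.

```python
import numpy as np

def has_linear_dependence(X):
    """True if any column of X is a linear combination of the other
    columns and/or a constant (an intercept column is added)."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    return np.linalg.matrix_rank(design) < design.shape[1]

def log_standardise(x, offset=1.0):
    """Log-transform (offset is an assumption, to handle zeros), then
    standardise to mean 0 and unit variance."""
    z = np.log(np.asarray(x, dtype=float) + offset)
    return (z - z.mean()) / z.std()

rng = np.random.default_rng(42)
cations = rng.uniform(0.1, 5.0, size=(12, 4))       # Ca, Mg, Na, K
total = cations.sum(axis=1, keepdims=True)          # total cations
print(has_linear_dependence(np.hstack([cations, total])))  # True
print(has_linear_dependence(cations))                      # False

clay = rng.uniform(5, 40, 12)
silt = rng.uniform(5, 40, 12)
texture = np.column_stack([clay, silt, 100 - clay - silt])  # % sand = remainder
print(has_linear_dependence(texture))                      # True

# On the log scale a 1-unit step matters far more at low concentrations
print(np.log(2) - np.log(1), np.log(101) - np.log(100))    # ~0.693 vs ~0.00995
```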


6) Dummy variables – factors such as bedrock type, land-use history, and management are usually described by categorical or class variables, coded 1 if a sample belongs to the class and 0 if it does not. For every categorical variable with K categories, only K – 1 dummy variables are needed, because the remaining one is implied by the others (in the example below, any one of the four columns is redundant), e.g.

Plot  Granite  Limestone  Basalt  Gabbro
1     1        0          0       0
2     0        1          0       0
3     1        0          0       0
4     0        1          0       0
5     0        0          1       0
6     0        0          0       1
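The K – 1 coding can be sketched as follows. This is a minimal illustration with a hypothetical function name, in which the caller chooses which category to leave out. Leaving out Gabbro reproduces the first three columns of the table.

```python
def dummy_code(values, reference):
    """Code a categorical variable as K - 1 zero/one dummy variables.

    values    : category label for each plot
    reference : the category left out; a plot in it scores 0 on every dummy
    Categories keep their order of first appearance.
    """
    levels = list(dict.fromkeys(values))
    if reference not in levels:
        raise ValueError(f"unknown reference category: {reference!r}")
    kept = [lev for lev in levels if lev != reference]
    rows = [[1 if v == lev else 0 for lev in kept] for v in values]
    return kept, rows

bedrock = ["Granite", "Limestone", "Granite", "Limestone", "Basalt", "Gabbro"]
names, matrix = dummy_code(bedrock, reference="Gabbro")
print(names)  # ['Granite', 'Limestone', 'Basalt']
for plot, row in enumerate(matrix, start=1):
    print(plot, row)
```

Plot 6 (Gabbro) scores 0 on all three dummies. No information is lost, and the linear dependence described in point 4 is avoided.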

7) Circular data – some variables, such as aspect, are circular, so large values are very close to small values (an aspect of 359° lies next to one of 1°). Transform aspect to trigonometric functions:

  northness = cos(aspect)
  eastness = sin(aspect)

  Northness is near 1 if the aspect is northward, near –1 if southward, and close to 0 if westward or eastward.

  Day of year – usually not a problem unless sampling spans the whole year. ‘Winterness’ and ‘springness’ variables can then be created as for aspect.
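A minimal sketch of these circular transformations, with hypothetical function names, follows. The day-of-year version maps day 1 (1 January) onto angle 0. The cosine then peaks around the turn of the year and the sine around early April. Calling these ‘winterness’ and ‘springness’ is an assumption suited to the northern hemisphere.

```python
import math

def aspect_components(aspect_deg):
    """Return (northness, eastness) for an aspect in degrees clockwise from north."""
    rad = math.radians(aspect_deg)
    return math.cos(rad), math.sin(rad)

def day_of_year_components(day, days_in_year=365):
    """Return (cos, sin) of the day of year mapped onto a circle.

    Assumed interpretation (northern hemisphere): the cosine acts as
    'winterness' (peak near 1 January), the sine as 'springness'
    (peak around early April).
    """
    angle = 2 * math.pi * (day - 1) / days_in_year
    return math.cos(angle), math.sin(angle)

for a in (0, 90, 180, 270, 359):
    n, e = aspect_components(a)
    print(f"aspect {a:3d}: northness {n:+.3f}, eastness {e:+.3f}")
```

Aspects of 359° and 1° now give almost identical northness values, instead of lying at opposite ends of a 0 to 360 scale.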

8) Vegetation-derived variables – maximum height, total biomass, total cover, light penetration, and % open ground can all be ‘environmental’ variables. Such variables SHOULD NOT BE USED in hypothesis testing because of the danger of circular reasoning.


9) Interaction terms – e.g. elevation * precipitation. Easy to implement, difficult to interpret. If elevation and precipitation interact to influence species composition, the product term is easy to make, but the ecological meaning of where the sites or species lie in environmental space is unclear. The number of possibilities is huge: N variables give ½N(N – 1) possible two-way interactions, so 5 variables give 10 interactions.

AVOID quadratic terms [e.g. pH * pH (pH²); cf. polynomial terms in multiple regression]. They can create an arch effect or warping of the ordination space.

Try to avoid interaction terms except in clearly defined hypothesis-testing studies where the null hypothesis is that ‘variables c and d do not interact to influence the species composition’.

For an interaction to be considered significant, the first eigenvalue (λ1) of the analysis with the product term should be considerably greater than λ1 of the analysis without it, and the t-value associated with the product term should be greater than 2 in absolute value.

Avoid product variables to avoid ‘data dredging’.
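The combinatorial growth that makes interaction terms an invitation to data dredging is easy to see. The minimal sketch below, with hypothetical function names and made-up values, builds every two-way product term and checks the count against ½N(N – 1).

```python
from itertools import combinations

def interaction_terms(variables):
    """Build every two-way product term from a dict of name -> list of values."""
    return {
        f"{a}*{b}": [x * y for x, y in zip(variables[a], variables[b])]
        for a, b in combinations(variables, 2)
    }

def n_interactions(n):
    """Number of two-way interactions among n variables: n(n - 1)/2."""
    return n * (n - 1) // 2

env = {name: [1.0, 2.0, 3.0] for name in ["elevation", "precipitation", "pH", "N", "P"]}
terms = interaction_terms(env)
print(len(terms), n_interactions(5))  # 10 10
```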

3. Monte Carlo permutation test results

1) Valid even without random samples

2) Relatively easy to take account of particular features of the data

3) Can use 'non-standard' test statistics

Permutation tests tell us whether a certain pattern could or could not have arisen by chance. They are completely specific to the data set.

'Non-parametric' does not mean 'no assumptions'.

The validity of permutation results depends on the validity of the permutation type for the particular data type: time series or line transects, spatial grids, repeated measures, and split plots all require particular types of permutation.

With restricted permutations, there may be only a limited number of possible permutations for a particular test. Increasing the number of permutations in such cases does not 'improve' the p-value!
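A minimal sketch of a Monte Carlo permutation test follows. It assumes an RDA-style test statistic, the fraction of species variance explained by the environmental variables, rather than the actual trace or F-type statistics used in CCA/RDA programs. The function names are hypothetical. The first test shuffles sites freely. The second allows only cyclic shifts, one common restricted scheme for time series and line transects. It shows why the p-value cannot fall below 1/n however many permutations are requested.

```python
import numpy as np

def explained_fraction(Y, X):
    """Fraction of the total variance of Y explained by least-squares
    regression on X (RDA-style statistic); both are centred first."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return ((Xc @ B) ** 2).sum() / (Yc ** 2).sum()

def permutation_test(Y, X, n_perm=999, seed=0):
    """Unrestricted test: rows (sites) of X are shuffled freely."""
    rng = np.random.default_rng(seed)
    observed = explained_fraction(Y, X)
    n_as_large = sum(
        explained_fraction(Y, X[rng.permutation(X.shape[0])]) >= observed
        for _ in range(n_perm)
    )
    # The observed arrangement counts as one of the permutations.
    return observed, (int(n_as_large) + 1) / (n_perm + 1)

def cyclic_shift_test(Y, X):
    """Restricted test: only cyclic shifts of X are allowed, giving n - 1
    alternative orderings, so the smallest attainable p-value is 1/n."""
    n = X.shape[0]
    observed = explained_fraction(Y, X)
    n_as_large = sum(
        explained_fraction(Y, np.roll(X, k, axis=0)) >= observed
        for k in range(1, n)
    )
    return observed, (int(n_as_large) + 1) / n

rng = np.random.default_rng(1)
x = np.arange(20.0).reshape(-1, 1)  # e.g. position along a transect
Y = np.hstack([2 * x, -x, 0.5 * x]) + rng.normal(scale=0.1, size=(20, 3))
print(permutation_test(Y, x))   # p close to 1/1000
print(cyclic_shift_test(Y, x))  # p cannot go below 1/20 = 0.05
```

Even with an almost perfect relationship, the cyclic-shift test stops at p = 0.05 because only 20 orderings exist.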

4. Forward selection procedures in CCA and RDA

Used to

(i) find a minimal adequate set of explanatory variables that explains the species data as well as the full set of variables does

(ii) rank the environmental variables in importance

(iii) evaluate the statistical significance of the effect on the species of a particular environmental variable, either unconditionally or conditionally on the effects of other environmental variables.

When applied repeatedly and in a step-wise fashion, forward selection shares the shortcomings of all regression selection procedures in that the overall size of the test is not controlled. In practice, too many variables are judged significant: the tests are too tolerant overall. Great care and patience are needed to find the 'minimal adequate model(s)'.
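The greedy logic of forward selection can be sketched as follows. This again assumes the RDA-style explained-variance statistic, hypothetical function names and simulated data. A real CCA/RDA program would test each candidate by a Monte Carlo permutation test before accepting it. That step is omitted here, and, as noted above, repeating such tests does not control the overall test size.

```python
import numpy as np

def explained_fraction(Y, X):
    """Fraction of the total variance of Y explained by regression on X."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return ((Xc @ B) ** 2).sum() / (Yc ** 2).sum()

def forward_select(Y, X, names, max_vars=None):
    """Greedy forward selection: at each step add the variable that most
    increases the explained variance, conditional on those already chosen.
    Returns (name, conditional gain) pairs in order of entry."""
    remaining = list(range(X.shape[1]))
    selected, history = [], []
    current = 0.0
    limit = X.shape[1] if max_vars is None else max_vars
    while remaining and len(selected) < limit:
        scores = {j: explained_fraction(Y, X[:, selected + [j]]) for j in remaining}
        best = max(scores, key=scores.get)
        history.append((names[best], scores[best] - current))
        current = scores[best]
        selected.append(best)
        remaining.remove(best)
    return history

rng = np.random.default_rng(7)
n = 50
pH, N, noise = rng.normal(size=(3, n))
X = np.column_stack([pH, N, noise])
Y = np.column_stack([3 * pH + 0.5 * N, -2 * pH + 0.3 * N]) + rng.normal(scale=0.2, size=(n, 2))
for name, gain in forward_select(Y, X, ["pH", "N", "noise"]):
    print(f"{name}: +{gain:.3f}")
```

The pure-noise variable still adds a small positive amount, which is why a stopping rule (a permutation test at each step) is needed.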

5. General limitations of ordination in environmental biology

M.O. Hill (1988) Bull. Soc. Roy. Bot. Belg. 121, 134–41: "Ordination is a rather artificial technique. The idea that the world consists of a series of environmental gradients, along which we should place our vegetation samples, is attractive. But this remains an artificial view of vegetation. In the end the behaviour of vegetation should be interpreted in terms of its structure, the autoecology of its species and, above all, the time factor. At this level, trends become unimportant and multivariate analysis is perhaps irrelevant. Ordination is useful to provide a first description but it cannot provide deeper biological insights."


CONCLUSIONS

What can be done with modern ordination techniques in a combined exploratory and confirmatory way?

Hallgren, Palmer, & Milberg (1999) Journal of Ecology 87, 1037-1051

2000+ plots in cereal and oil-seed crops in Sweden 1970-1994


AN INVESTIGATION OF BROAD-SCALE GRADIENTS IN SWEDISH WEED COMMUNITIES

E. Hallgren, M.W. Palmer & P. Milberg (1999), Journal of Ecology 87:1037-1051

Data Diving With Cross-validation:

Flow chart for the sequence of analyses employed in the study. Solid lines represent the flow of data and dashed lines the flow of ideas or analyses.


Map of Sweden with the geographical regions (A-H) indicated.

Environmental variables used in the analyses

Nominal variables

Sowing season (autumn or spring)

Geographical regions: Swedish counties (A-H)

Soil types (1, sandy soil; 2, fine sand soil; 3, silty soil; 4, loamy soil; 5, silty clay loam; 6, heavy to very heavy clay soil; 7, organogenic soil)

Crop species (barley, wheat, rye, oats, turnip rape, rape; categorised according to season of sowing)

Interval-scale variables

Year of trial (1970-1994)

Organic content (%; seven categories)

Continuous variables

Nitrogen fertilization (N ha⁻¹)

Crop yield (kg ha⁻¹)


Techniques used

1. Detrended correspondence analysis - to reveal gradients in the data

2. Canonical correspondence analysis - Monte Carlo permutation tests on the "trace" statistic

3. Partial DCA - are there any interpretable species patterns beyond the effects of the measured environmental variables?

4. Partial CCA - can one set of variables explain variation in species composition not explained by a second set of variables?

5. Stepwise CCA

Exploratory data set 1000 plots

Confirmatory data set 1000 plots

Display data set 2359 plots
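The slide gives only the sizes of the three data sets. Below is a minimal sketch of one way to set up such a split, with a hypothetical function name. It assumes, without confirmation from the source, that the exploratory and confirmatory sets are disjoint random subsets and that the display set is the full data set of 2359 plots.

```python
import random

def split_for_data_diving(plot_ids, n_explore, n_confirm, seed=None):
    """Draw disjoint random exploratory and confirmatory subsets.

    Assumption: the display set is simply all plots.
    """
    ids = list(plot_ids)
    if n_explore + n_confirm > len(ids):
        raise ValueError("not enough plots for the requested subsets")
    shuffled = random.Random(seed).sample(ids, len(ids))
    explore = shuffled[:n_explore]
    confirm = shuffled[n_explore:n_explore + n_confirm]
    return explore, confirm, ids

explore, confirm, display = split_for_data_diving(range(2359), 1000, 1000, seed=1)
print(len(explore), len(confirm), len(display))  # 1000 1000 2359
```

Hypotheses generated on the exploratory set are then tested once on the confirmatory set, so the "data diving" does not inflate the significance of the final tests.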


Autumn sown Spring sown


Variance partitioning

Partitioning of the explainable variation among the four groups of variables. T|U is the variation explained by T but not by U; T∩U is the variation jointly explained by T and U; U|T is the variation explained by U but not by T. S = soil type; C = crop species; Y = year; G = geographical region.

Percentage of explainable variation

Autumn
T  U    T|U   T∩U   U|T   Other
S  G    28.6  9.4   33.9  28.2
C  SG   24.5  7.2   64.6  3.7
Y  CSG  3.7*  2.4   93.9  -
S  C    32.7  5.3   26.4  35.6
S  Y    36.4  1.6   4.5   57.5
C  G    28.2  3.5   39.7  28.6
C  Y    29.0  2.7   3.3   65.0
G  Y    42.5  0.78  5.3   51.5

Spring
T  U    T|U   T∩U   U|T   Other
S  G    34.7  10.8  36.3  18.2
C  SG   14.3  4.6   77.1  1.0
Y  CSG  4.0*  4.2   91.9  -
S  C    43.4  2.0   16.9  37.6
S  Y    45.3  0.17  8.0   46.6
C  G    14.9  4.0   43.1  38.0
C  Y    14.9  4.0   4.2   77.0
G  Y    46.1  1.0   7.1   45.8

Year – low; soil – high; geography – high; crop species – high
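The fractions in such a table come from simple subtraction. One needs the variation explained by group T alone, by group U alone, and by T and U together. The sketch below shows that arithmetic with hypothetical inputs and a hypothetical function name. It treats `inertia_all`, the variation explained by all four groups together, as the 'explainable' variation. This is an assumption about how the table was normalised. In CCA each input would be the sum of canonical eigenvalues of the corresponding constrained analysis.

```python
def partition_variation(inertia_t, inertia_u, inertia_tu, inertia_all):
    """Split explained inertia into T|U, T∩U, U|T and Other, as % of inertia_all.

    inertia_t   : inertia explained by group T alone
    inertia_u   : inertia explained by group U alone
    inertia_tu  : inertia explained by T and U together
    inertia_all : inertia explained by all groups (assumed 'explainable' total)
    The shared fraction is obtained by subtraction and can come out negative.
    """
    fractions = {
        "T|U": inertia_tu - inertia_u,
        "T∩U": inertia_t + inertia_u - inertia_tu,
        "U|T": inertia_tu - inertia_t,
        "Other": inertia_all - inertia_tu,
    }
    return {k: 100.0 * v / inertia_all for k, v in fractions.items()}

# Hypothetical inertias: T|U ≈ 20, T∩U ≈ 10, U|T ≈ 30, Other ≈ 40 (%)
print(partition_variation(0.30, 0.40, 0.60, 1.00))
```

The four fractions always sum to 100%, matching the rows of the table.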


Autumn sown Spring sown

Display phase


Autumn sown Spring sown


CONCLUSIONS

1. Besides analysing 'classical' ecological data (species x sites; environmental variables x sites), ordination methods such as CCA and RDA, with associated Monte Carlo permutation tests, provide a means of examining multivariate experimental and monitoring data.

2. CCA and RDA are, in reality, reduced-rank multivariate regression techniques with Y (many species), X (environmental or predictor variables; the design matrix in MANOVA) and Z (covariables defining, for example, the plot design).

3. Ordination methods have progressed from basic two-dimensional plots (still very valuable if constructed correctly) to statistical testing of specific hypotheses about impacts, effects, etc.: a semi-graphical approach to MANOVA but without MANOVA's crippling assumptions.

4. Due to recent progress, there is now a unified theory about the choice of methods for particular data sets and research questions. There are still several critical problems and potential pitfalls.
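Conclusion 2 describes RDA as reduced-rank multivariate regression of Y on X, optionally with covariables Z. The minimal sketch below, in plain numpy with a hypothetical function name, follows that description. It partials Z out of both Y and X, regresses every species on X, and then runs a PCA (via SVD) of the fitted values. The eigenvalues are returned as fractions of the total (residual) variance of Y. Real implementations add scaling options and site, species and biplot scores, which are omitted here.

```python
import numpy as np

def rda(Y, X, Z=None):
    """Redundancy analysis as reduced-rank regression (minimal sketch).

    Y : sites x species responses; X : sites x explanatory variables;
    Z : optional sites x covariables, partialled out of Y and X first.
    Returns canonical eigenvalues as fractions of the total variance of Y;
    there are at most rank(X) of them.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = Y - Y.mean(axis=0)
    X = X - X.mean(axis=0)
    if Z is not None:
        Z = np.asarray(Z, dtype=float)
        Z = Z - Z.mean(axis=0)
        # Partial RDA: remove the part of Y and X linearly explained by Z
        Y = Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
        X = X - Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Step 1: least-squares regression of every species on X
    fitted = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    # Step 2: PCA of the fitted values gives the canonical axes
    singular = np.linalg.svd(fitted, compute_uv=False)
    rank = np.linalg.matrix_rank(X)
    return singular[:rank] ** 2 / (Y ** 2).sum()

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))   # two environmental variables
Z = rng.normal(size=(30, 1))   # one covariable (e.g. a block effect)
Y = X @ rng.normal(size=(2, 6)) + rng.normal(size=(30, 6))
eig = rda(Y, X)
print(eig, eig.sum())  # two canonical axes; the sum is the fraction explained
print(rda(Y, X, Z))
```

The rank restriction is what makes this "reduced-rank": with two environmental variables, the six species responses collapse onto at most two canonical axes.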


Outline of ordination techniques presented here. DCA (detrended correspondence analysis) was applied for the determination of the length of gradient (LG). LG is important for choosing between ordination based on a linear or on a unimodal response model. In cases where LG < 2.5 SD, ordination based on linear response models is considered to be the most appropriate. PCA (principal components analysis) visualizes variation in species data in relation to best fitting theoretical variables. Environmental variables explaining this visualized variation are deduced afterwards, hence, indirectly. RDA (redundancy analysis) visualizes variation in species data directly in relation to quantified environmental variables. Before analysis, covariables may be introduced in RDA to compensate for systematic differences in experimental units. After RDA, a permutation test can be used to examine the significance of effects.


5. Ordination methods, both for indirect and direct gradient analysis, are now an important part of environmental biology. Current and potential applications are very great in ecology, monitoring, palaeoecology, limnology, ecotoxicology, analysis of genetic diversity, biogeography, behavioural ecology, restoration ecology, taxonomy, etc.


Andrew Lang 1844-1912. He uses statistics as a drunken man uses lamp-posts – for support rather than illumination.

From MacKay, 1977, and reproduced through the courtesy of the Institute of Physics.

The use of ordination prior to about 1988


Cartoon illustrating statistical zap and shotgun approaches

Post-1988 Pre-1988