Multidimensional scaling (MDS)
and other permutation-based analyses
MDS Aim
• Graphical representation of dissimilarities between objects in as few dimensions (axes) as possible
• The graphical representation is termed an “ordination” in ecology
• Axes of the graph represent new variables that are summaries of the original variables
Haynes & Quinn (unpublished)
• Four sites along the Morwell River
– site 1 upstream from a planned sewage outfall
– sites 2, 3 and 4 downstream
– site 3 below a fish farm
• Abundance of all invertebrate species recorded from 3 stations at each site
• 12 objects (sampling units): 4 sites × 3 stations per site
• 94 variables (species)
Do invertebrate communities (or assemblages) differ between stations and sites?
– Is site 1 different from the rest?
Multidimensional scaling
1. Set up a raw data matrix (rows = samples, columns = species)
Site/sample   Sp1   Sp2   Sp3   Sp4   Sp5   etc.
S11            54     0     0     5     0
S12            37     1     0     4     0
S13            68     2     0     2     0
S21            60     0     0     0     1
S22            47     0     0     2     0
S23            60     0     0     0     0
etc.
2. Calculate a dissimilarity (Bray-Curtis) matrix
       S11    S12    S13    S21    S22    S23    etc.
S11    .000
S12    .203   .000
S13    .666   .652   .000
S21    .216   .331   .759   .000
S22    .328   .410   .796   .191   .000
S23    .336   .432   .796   .183   .054   .000
etc.
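A minimal NumPy sketch of step 2, assuming raw (untransformed) abundances in a samples × species array; the function name `bray_curtis_matrix` is hypothetical. Only the first five species from step 1 are used, so the values differ from the table above, which is based on all 94 species (for S11 vs S12 the sketch gives 19/101 ≈ 0.188 rather than .203).

```python
import numpy as np


def bray_curtis_matrix(Y):
    """Pairwise Bray-Curtis dissimilarities (0-1 scale) between rows of Y.

    Y is a samples x species array of abundances.
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    D = np.zeros((n, n))
    for j in range(n):
        for k in range(j + 1, n):
            num = np.abs(Y[j] - Y[k]).sum()  # sum over species of |y_ij - y_ik|
            den = (Y[j] + Y[k]).sum()        # sum over species of (y_ij + y_ik)
            # Two empty samples: undefined; set to 0 here (an assumption)
            D[j, k] = D[k, j] = num / den if den > 0 else 0.0
    return D


# First five species of samples S11 to S23 from the raw data matrix
Y = np.array([
    [54, 0, 0, 5, 0],  # S11
    [37, 1, 0, 4, 0],  # S12
    [68, 2, 0, 2, 0],  # S13
    [60, 0, 0, 0, 1],  # S21
    [47, 0, 0, 2, 0],  # S22
    [60, 0, 0, 0, 0],  # S23
])
D = bray_curtis_matrix(Y)
print(np.round(D, 3))
```

The double loop mirrors the formula directly; for large data sets a vectorised or library routine would be faster.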
3. Decide on the number of dimensions (axes) for the ordination:
– suspected number of underlying ecological gradients
– the aim is to match distances between objects on the plot to dissimilarities between objects as closely as possible
– more dimensions give a better match
– usually between 2 and 4 dimensions
4. Arrange objects (e.g. sampling units) initially on the ordination plot in the chosen number of dimensions
– starting configuration
– usually generated randomly
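A sketch of a random starting configuration, assuming coordinates drawn uniformly between -2 and 2 on each axis (the range of the plots that follow; the range and distribution are assumptions, and `random_start` is a hypothetical name). Because the final configuration can depend on where the iterations start, it is common practice to repeat the analysis from several random starts and keep the solution with the lowest stress.

```python
import numpy as np


def random_start(n_objects, n_dims=2, seed=None):
    """Random starting configuration: one row of coordinates per object."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-2.0, 2.0, size=(n_objects, n_dims))


# 12 sampling units (4 sites x 3 stations) placed on 2 axes
X0 = random_start(12, n_dims=2, seed=1)
print(X0.shape)  # (12, 2)
```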
[Figure: Starting configuration. Objects plotted on Axis I vs Axis II (both -2 to 2), symbols coded by Site 1 to Site 4]
5. Compare distances between objects on the ordination plot with the Bray-Curtis dissimilarities between objects
– the strength of the relationship is measured by Kruskal’s stress value
– stress measures “badness of fit”, so lower values indicate a better match
– the plot of distance against dissimilarity is called a Shepard plot
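A sketch of Kruskal’s stress (formula 1) for non-metric MDS, assuming the configuration `X` is an objects × axes array and `D` a square dissimilarity matrix; the function names are hypothetical. Distances on the plot are first fitted by monotone (isotonic) regression on the dissimilarities, using the pool-adjacent-violators algorithm, and stress measures how far the distances depart from these fitted values: $S = \sqrt{\sum (d_{jk} - \hat d_{jk})^2 / \sum d_{jk}^2}$. The returned vectors are what a Shepard plot displays. Ties in the dissimilarities are simply left in sort order here; software packages handle ties in different ways.

```python
import numpy as np


def pava(y):
    """Least-squares non-decreasing fit to the sequence y (pool-adjacent-violators)."""
    blocks = []  # each block holds [sum of values, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge backwards while block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def kruskal_stress(X, D):
    """Stress-1 of configuration X against dissimilarities D, plus Shepard-plot data."""
    n = D.shape[0]
    iu = np.triu_indices(n, k=1)
    dissim = D[iu]
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))[iu]
    order = np.argsort(dissim, kind="stable")  # sort pairs by dissimilarity
    dhat = np.empty_like(dist)
    dhat[order] = pava(dist[order])            # monotone regression of distance on dissimilarity
    stress = np.sqrt(((dist - dhat) ** 2).sum() / (dist ** 2).sum())
    return stress, dissim, dist, dhat


# A configuration whose distances are exactly the dissimilarities has zero stress
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(8, 2))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
stress, _, _, _ = kruskal_stress(X, D)
print(stress)
```

Only the rank order of the dissimilarities matters to the fitted values, which is what makes the method non-metric.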
[Figure: Starting configuration (Axis I vs Axis II, Sites 1 to 4) and its Shepard plot (distance, 0 to 3, against dissimilarity, 0 to 1)]
Stress = 0.394
6. Move objects on the ordination plot iteratively by the method of steepest descent
– each step improves the match between dissimilarities and distances between objects on the ordination plot
– each step lowers the stress value
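A sketch of steepest descent, assuming the target distances are held fixed while objects move. In full non-metric MDS the targets would be the monotone-regression fits from the stress calculation, recomputed after each move, and the configuration would be rescaled; both are omitted here to keep the example short. The step size, tolerance and function names are illustrative assumptions. The loop stops when a move no longer lowers the stress, which mirrors step 7.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distances between all rows of X."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))


def raw_stress(X, target):
    """Sum over pairs j<k of (d_jk - target_jk)^2."""
    iu = np.triu_indices(len(X), k=1)
    return ((pairwise_dist(X)[iu] - target[iu]) ** 2).sum()


def descent_step(X, target, step=0.005, eps=1e-12):
    """Move every object a small step down the gradient of raw_stress."""
    d = pairwise_dist(X)
    coef = (d - target) / np.maximum(d, eps)  # (d_jk - t_jk) / d_jk
    np.fill_diagonal(coef, 0.0)               # an object exerts no pull on itself
    diff = X[:, None, :] - X[None, :, :]      # x_j - x_k for every pair
    grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
    return X - step * grad


rng = np.random.default_rng(0)
target = pairwise_dist(rng.normal(size=(6, 2)))  # distances to reproduce
X = rng.uniform(-2, 2, size=(6, 2))              # random starting configuration
history = [raw_stress(X, target)]
for iteration in range(2000):
    X_new = descent_step(X, target)
    s_new = raw_stress(X_new, target)
    if history[-1] - s_new < 1e-9:  # no further improvement: final configuration
        break
    X = X_new
    history.append(s_new)
print(len(history) - 1, "moves; stress", history[0], "->", history[-1])
```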
[Figure: Configuration after 20 iterations (Axis I vs Axis II) and its Shepard plot (distance against dissimilarity)]
After 20 iterations
Stress = 0.119
7. Final configuration
• further moving of objects on the ordination plot cannot improve the match between dissimilarities and distances
• stress is as low as possible
[Figure: Final configuration (Axis I vs Axis II) and its Shepard plot (distance against dissimilarity)]
Final configuration - 50 iterations
Stress = 0.069
Iteration history
Iteration   Stress
1           0.394
2           0.368
3           0.357
4           0.351
...         ...
20          0.119
...         ...
49          0.069
50          0.069
Stress of final configuration is 0.069
How low should stress be?
Clarke (1993) suggests:
• > 0.20 is basically random
• < 0.15 is good
• < 0.10 is ideal
– the configuration closely represents the actual dissimilarities
How many dimensions?
• Increasing the number of dimensions above 4 usually offers little reduction in stress
• 2 or 3 dimensions are usually adequate to get a good fit (i.e. low stress)
• 2 dimensions are straightforward to plot
Lonhart (unpublished data)
• Effects of depth and piling location on a marine fouling assemblage
• Two pilings, four sides of each panel, two depths, sampled 4 times
• 40 species in total recorded
• MDS to examine the relationship of piling location and depth to the invertebrate community
– Does the community vary as a function of depth?
– Does the community vary as a function of piling location?
– Does the effect of depth on the community vary as a function of piling location?
• Bray-Curtis dissimilarity
• Non-metric MDS
• ANOSIM / PERMANOVA
• SIMPER
MDS Plot
[Figure: four views of the same non-metric MDS (transform: square root; resemblance: S17 Bray Curtis similarity; 2D stress: 0.17), with symbols coded by sampling date (2/22/2010, 3/05/2010, 3/18/2010, 4/02/2010), by piling (8381, 8179), by depth (Shallow, Deep), and by piling × depth combination (8381Shallow, 8381Deep, 8179Shallow, 8179Deep)]
Comparing groups in MDS
• 2 piling locations
• 2 depths
• 8 replicates per treatment combination (4 sides × 2 samples)
• Do the groups differ significantly in species composition?
• Is there an ANOVA-like equivalent for MDS?
Procedure 1: Analysis of similarities (ANOSIM)
• Uses the (dis)similarity matrix
• Because dissimilarities are not normally distributed, uses ranks of pairwise dissimilarities
• Because dissimilarities are not independent of each other, uses a randomization test rather than the usual significance-testing procedure
• Computes its own test statistic (R) from the rank dissimilarities and obtains its null distribution by randomly permuting group labels
• Available through the PRIMER package
Lonhart ANOSIM
• Depth effect: R = 0.305, P = 0.001, so reject H0
– significant differences between depths
• Piling location: R = 0.761, P = 0.001, so reject H0
– significant difference between pilings
PERMANOVA (permutational ANOVA)
• Run just like an ANOVA
• Sums of squares can be partitioned in multivariate space (based on distances to multidimensional centroids)
• P-values are obtained by permuting the observations and recomputing the test statistic (pseudo-F) for each permutation
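A sketch of the one-factor case, assuming a square distance matrix `D` and one group label per object; the function names are hypothetical, and the two-factor design with interaction in the Lonhart table needs a fuller partitioning than shown. The total sum of squares is the sum of squared inter-object distances divided by the number of objects N, the within-group sum of squares is the same quantity computed within each group and divided by that group’s size, and pseudo-F = [SS_among / (a - 1)] / [SS_within / (N - a)] for a groups. With Euclidean distances on a single variable, pseudo-F equals the ordinary one-way ANOVA F, which the usage below relies on as a check.

```python
import numpy as np


def pseudo_f(D, groups):
    """One-way PERMANOVA pseudo-F from distance matrix D and group labels."""
    groups = np.asarray(groups)
    N = len(groups)
    labels = np.unique(groups)
    a = len(labels)
    iu = np.triu_indices(N, k=1)
    ss_total = (D[iu] ** 2).sum() / N
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = D[np.ix_(idx, idx)]
        ss_within += (sub[np.triu_indices(len(idx), k=1)] ** 2).sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (N - a))


def permanova(D, groups, n_perm=999, seed=None):
    """Observed pseudo-F and permutation P-value (group labels shuffled)."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(D, groups)
    count = sum(pseudo_f(D, rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    return f_obs, (count + 1) / (n_perm + 1)


# One variable, two groups of three: Euclidean distances give the classical F
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
f_obs, p = permanova(D, [0, 0, 0, 1, 1, 1], seed=0)
print(f_obs, p)  # f_obs is 13.5, as in a one-way ANOVA
```

The P-value counts the observed statistic as one of the permutations, so with 999 permutations the smallest attainable value is 0.001, as in the table below.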
PERMANOVA table of results
Source          df   SS         MS       Pseudo-F   P(perm)   Unique perms
Depth            1   14884      14884    15.67      0.001     999
Piling           1   70878      70878    74.623     0.001     999
Depth×Piling     1   10558      10558    11.116     0.001     999
Residual       124   1.1778E5   949.82
Total          127   2.141E5
Interaction effect
[Figure: the same MDS plots coded by piling (8381, 8179), by depth (Shallow, Deep) and by piling × depth combination (8381Shallow, 8381Deep, 8179Shallow, 8179Deep); 2D stress: 0.17]
Which variables (species) are most important?
• For MDS-type analyses, three methods:
– correlate individual variables (species abundances) with axis scores, like PCA loadings
– SIMPER (similarity percentages) to determine which species contribute most to Bray-Curtis dissimilarity
– CA (correspondence analysis) to ordinate objects and species simultaneously (biplots)
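SIMPER (similarity percentages)
Bray-Curtis dissimilarity between objects j and k, where y_ij is the abundance of species i in object j:
$$\delta_{jk} = \frac{\sum_{i=1}^{p} \lvert y_{ij} - y_{ik} \rvert}{\sum_{i=1}^{p} (y_{ij} + y_{ik})}$$
Note: the sums run over each species, 1 to p.
The contribution of species i is:
$$\delta_i = \frac{\lvert y_{ij} - y_{ik} \rvert}{\sum_{l=1}^{p} (y_{lj} + y_{lk})}$$
so that the contributions sum to the dissimilarity: $\delta_{jk} = \sum_{i=1}^{p} \delta_i$.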
SIMPER results – comparing deep samples between pilings
Groups 8381Deep & 8179Deep
Average dissimilarity = 77.47

Species                   Av.Abund    Av.Abund    Av.Diss   Diss/SD   Contrib%   Cum.%
                          8381Deep    8179Deep
Watersipora, live         11.34       0           11.58     1.63      14.94      14.94
Detritus                  3.28        13.34       10.7      1.7       13.81      28.75
Corynactis californica    0           7.53        7.68      1.15      9.92       38.67
Burgundy crust            0           6.66        6.79      1.04      8.77       47.44
Diplosoma listerianum     6.97        2.41        6.4       0.8       8.26       55.7
CaCO3                     9.13        8.16        6.06      1.43      7.82       63.52
Dead bryozoan             5.41        0.19        5.35      1.16      6.91       70.42
Orange bryozoan           5           0           5.1       0.83      6.59       77.01
Dead Watersipora          4.88        0           4.97      0.9       6.42       83.43
Ascidia ceratodes         0.09        4.91        4.95      0.83      6.39       89.82
Rhynchozoon (brwn bryo)   1           1.44        2.04      0.67      2.64       92.45
Are these results interpretable graphically?
[Figure: MDS bubble plot with bubble size proportional to the abundance of Watersipora, live (scale 0 to 30), alongside the MDS plot coded by piling × depth (8381Shallow, 8381Deep, 8179Shallow, 8179Deep); transform: square root; resemblance: S17 Bray Curtis similarity; 2D stress: 0.17]
Linking biota MDS to environmental variables
• Are differences in species composition related to differences in environmental variables?
• Correlate MDS axis scores with environmental variables
• BIO-ENV procedure: correlates dissimilarities from biota with dissimilarities from environmental variables
BIO-ENV procedure
[Diagram: samples × species abundances → Bray-Curtis dissimilarity matrix; samples × environmental variables (subsets of variables) → Euclidean dissimilarity matrix; the two matrices are compared by rank correlation (Spearman or weighted Spearman)]
BIO-ENV correlations
• An exploratory rather than hypothesis-testing procedure
• Tries to find the best combination of environmental variables, i.e. the combination most correlated with the biotic dissimilarities
• Correlations chosen a priori can be tested with the RELATE procedure, a randomization test of correlation
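A sketch of the BIO-ENV search, assuming a biotic dissimilarity matrix (e.g. Bray-Curtis) and a samples × variables environmental array. Each variable is normalised, Euclidean distances are computed for every subset of variables, and for each subset size the subset whose distances have the highest Spearman rank correlation with the biotic dissimilarities is kept. Function names are hypothetical, the exhaustive search is only practical for a modest number of variables, and weighted Spearman is not implemented.

```python
import itertools

import numpy as np


def average_ranks(a):
    """1-based ranks, with tied values sharing their mean rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    r = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and a[order[j + 1]] == a[order[i]]:
            j += 1
        r[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def euclidean(Z):
    return np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))


def bio_env(D_bio, env, names):
    """Best (rho, variable names) for each number of environmental variables."""
    n, m = env.shape
    iu = np.triu_indices(n, k=1)
    Z = (env - env.mean(axis=0)) / env.std(axis=0, ddof=1)  # normalise
    best = {}
    for size in range(1, m + 1):
        for subset in itertools.combinations(range(m), size):
            rho = spearman(D_bio[iu], euclidean(Z[:, list(subset)])[iu])
            if size not in best or rho > best[size][0]:
                best[size] = (rho, [names[c] for c in subset])
    return best


# Toy data: the biotic dissimilarities are driven by variable "a" only
rng = np.random.default_rng(42)
env = rng.normal(size=(10, 3))
D_bio = euclidean(env[:, [0]])
best = bio_env(D_bio, env, ["a", "b", "c"])
for size, (rho, chosen) in best.items():
    print(size, round(rho, 3), chosen)
```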
Example
• Bristol Channel zooplankton
• 57 stations
• 25 species sampled
• Salinity measured at the same time
• Question: is the zooplankton community related to salinity?
Zooplankton community data
[Table: community matrix]
NMDS plot
[Figure: Bristol Channel zooplankton, non-metric MDS of the stations (transform: square root; resemblance: S17 Bray-Curtis similarity); points labelled by station number; 2D stress: 0.1]
NMDS plot with salinity bubbles
[Figure: the same Bristol Channel zooplankton NMDS with bubbles scaled by salinity (legend 1.8 to 9); 2D stress: 0.1]
Salinity data
[Table: salinity matrix]
RELATE procedure
[Diagram: samples × species abundances → Bray-Curtis dissimilarity matrix; samples × environmental variables (all variables) → Euclidean dissimilarity matrix; the two matrices are compared by rank correlation (Spearman or weighted Spearman)]
RELATE the matrices
[Figure: Bristol Channel, salinity group (1 to 9 in increasing salinity), RELATE: histogram of Rho from 999 permutations (Rho axis -0.2 to 0.8, frequency axis 0 to 56), with the sample statistic 0.741 marked]
Parameters
Correlation method: Spearman rank
Sample statistic (Rho): 0.741
Significance level of sample statistic: 0.1% (p = 0.001, the smallest attainable with 999 permutations)
Number of permutations: 999
Number of permuted statistics greater than or equal to Rho: 0
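A sketch of the RELATE test, assuming two square dissimilarity matrices over the same samples. The Spearman correlation between their off-diagonal elements is the sample statistic, and its null distribution comes from permuting the sample labels of one matrix (rows and columns together). The P-value counts the observed statistic as one of the permutations, so with 999 permutations and none at or above Rho it is (0 + 1)/(999 + 1) = 0.001, consistent with the 0.1% reported above. Function names are hypothetical.

```python
import numpy as np


def average_ranks(a):
    """1-based ranks, with tied values sharing their mean rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    r = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and a[order[j + 1]] == a[order[i]]:
            j += 1
        r[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def relate(D1, D2, n_perm=999, seed=None):
    """Spearman Rho between two dissimilarity matrices and its permutation P-value."""
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)
    rho_obs = spearman(D1[iu], D2[iu])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        D2_perm = D2[np.ix_(p, p)]  # relabel samples in the second matrix
        if spearman(D1[iu], D2_perm[iu]) >= rho_obs:
            count += 1
    return rho_obs, (count + 1) / (n_perm + 1)


rng = np.random.default_rng(3)
pts = rng.normal(size=(8, 2))
D1 = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
D2 = 2.0 * D1  # rescaling keeps the ordering of dissimilarities
rho, p_value = relate(D1, D2, seed=0)
print(rho, p_value)
```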
A more complicated example – linking multivariate biological data to multivariate environmental data
• Biological data: abundance of nematode species (>100) at 19 sites in the Exe estuary
• Environmental:
– MPD: mean particle diameter
– % Org: percent organic matter
– WT: water table depth
– H2S: depth of hydrogen sulfide layer
– Sal: interstitial salinity
– Ht: intertidal range
Environmental NMDS, Exe estuary
[Figure: non-metric MDS of the 19 sites from normalised environmental variables (resemblance: D1 Euclidean distance); points labelled 1 to 19; 2D stress: 0.06]
Biological NMDS
[Figure: Exe nematodes (19 sites averaged over season), non-metric MDS (transform: square root; resemblance: S17 Bray-Curtis similarity); points labelled site 1 to 19; 2D stress: 0.05]
Linking Environment to Community
[Figure: four bubble plots on the Exe nematode NMDS (19 sites averaged over season; transform: square root; resemblance: S17 Bray-Curtis similarity; 2D stress: 0.05), with bubble size proportional to Med Part Diam (0.2 to 2), Interstit Salinity (19 to 100), Dep Water Tab (2 to 20) and %Organics (0.8 to 8)]
Formally
• First: use RELATE to determine the relationship between the biological community and the environmental variables
[Figure: Exe estuary RELATE: histogram of Rho from 999 permutations (Rho axis -0.2 to 0.8, frequency axis 0 to 67), with the sample statistic 0.791 marked]
Parameters
Correlation method: Spearman rank
Sample statistic (Rho): 0.791
Significance level of sample statistic: 0.1% (p = 0.001, the smallest attainable with 999 permutations)
Number of permutations: 999
Number of permuted statistics greater than or equal to Rho: 0
Formally
• Second: use BIO-ENV to determine the subset of environmental variables that best fits the biological community
BIO-ENV procedure
Select best model
Best result for each number of variables
No. vars   Corr.   Selections
1          0.676   Dep H2S layer
2          0.777   Dep H2S layer, Interstit Salinity
3          0.816   Med Part Diam, Dep H2S layer, Interstit Salinity
4          0.811   Med Part Diam, Dep H2S layer, %Organics, Interstit Salinity
5          0.804   Med Part Diam, Dep H2S layer, Shore height, %Organics, Interstit Salinity
6          0.791   Med Part Diam, Dep Water Tab, Dep H2S layer, Shore height, %Organics, Interstit Salinity
Linking Environment to Community – model results
[Figure: bubble plots on the Exe nematode NMDS (19 sites averaged over season; transform: square root; resemblance: S17 Bray-Curtis similarity; 2D stress: 0.05) for the three variables in the best BIO-ENV model: Med Part Diam (0.2 to 2), Interstit Salinity (19 to 100) and Dep H2S layer (2 to 20)]
Procedure 1: Analysis of similarities (ANOSIM)
Null hypothesis
H0: the average rank dissimilarity between objects within groups equals the average rank dissimilarity between objects in different groups
$$\bar{r}_B = \bar{r}_W$$
i.e. no difference in species composition between groups
[Figure: within-group and between-group dissimilarities]
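Test statistic
R compares the average rank dissimilarity between objects in different groups with the average rank dissimilarity between objects within groups:
$$R = \frac{\bar{r}_B - \bar{r}_W}{M/2}, \qquad M = \frac{n(n-1)}{2}$$
where n is the total number of objects, so M is the number of pairwise dissimilarities.
• R lies between -1 and +1; it is near 0 when H0 is true and equals +1 only when all within-group dissimilarities are smaller than all between-group dissimilarities.
• Use a randomization test to generate the probability distribution of R when H0 is true.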
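A sketch of the ANOSIM statistic and its randomization test, assuming a square dissimilarity matrix `D` and one group label per object. All n(n-1)/2 dissimilarities are ranked together (ties get average ranks), and group labels are shuffled to build the null distribution. Function names are hypothetical. In the toy example every within-group dissimilarity is smaller than every between-group one, so R = 1.

```python
import numpy as np


def average_ranks(a):
    """1-based ranks, with tied values sharing their mean rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    r = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and a[order[j + 1]] == a[order[i]]:
            j += 1
        r[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r


def anosim_r(D, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    groups = np.asarray(groups)
    n = len(groups)
    iu = np.triu_indices(n, k=1)
    r = average_ranks(D[iu])                 # rank all M = n(n-1)/2 dissimilarities
    within = groups[iu[0]] == groups[iu[1]]  # True for pairs in the same group
    M = n * (n - 1) / 2
    return (r[~within].mean() - r[within].mean()) / (M / 2)


def anosim(D, groups, n_perm=999, seed=None):
    """Observed R and its permutation P-value (group labels shuffled)."""
    rng = np.random.default_rng(seed)
    r_obs = anosim_r(D, groups)
    count = sum(anosim_r(D, rng.permutation(groups)) >= r_obs for _ in range(n_perm))
    return r_obs, (count + 1) / (n_perm + 1)


x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])  # two well-separated groups
D = np.abs(x[:, None] - x[None, :])
r_obs, p = anosim(D, [0, 0, 0, 1, 1, 1], seed=0)
print(r_obs, p)
```

With only six objects there are few distinct ways to split them into two groups of three, so even R = 1 cannot reach P = 0.001; the Lonhart design has far more objects.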
Which species discriminate groups of objects?
• Calculate the average δ_i over all pairs of objects between groups
– larger values indicate that the species contributes more to group differences
• Calculate the standard deviation of δ_i
– smaller values indicate that the species’ contribution is consistent across all pairs of objects
• Calculate the ratio average δ_i / SD(δ_i) (Diss/SD in the SIMPER output)
– larger values indicate a good discriminating species between the two groups
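A sketch of the SIMPER calculation for two groups, assuming samples × species abundance arrays `A` and `B` with the same species columns and no pair of samples that are both empty; the function name `simper` and the toy data are hypothetical. For every between-group pair it computes δ_i for all species, then averages them, takes the ratio of the average to the standard deviation, and expresses each average as a percentage of the average dissimilarity. Because the δ_i for a pair sum to that pair’s Bray-Curtis dissimilarity, the percentages sum to 100. Dissimilarities here are on the 0 to 1 scale of the SIMPER formula; PRIMER output such as the SIMPER table reports them multiplied by 100.

```python
import numpy as np


def simper(A, B, species):
    """Average contribution of each species to Bray-Curtis dissimilarity between groups A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # delta_i for every between-group pair: rows = pairs, columns = species
    deltas = np.array([np.abs(a - b) / (a + b).sum() for a in A for b in B])
    mean = deltas.mean(axis=0)
    sd = deltas.std(axis=0, ddof=1)
    avg_diss = deltas.sum(axis=1).mean()  # each row sums to that pair's dissimilarity
    rows = []
    for i in np.argsort(-mean):           # largest average contribution first
        ratio = mean[i] / sd[i] if sd[i] > 0 else np.nan
        rows.append((species[i], mean[i], ratio, 100 * mean[i] / avg_diss))
    return avg_diss, rows


A = [[10, 0, 2], [8, 1, 0]]  # group 1: 2 samples x 3 species
B = [[0, 5, 2], [1, 6, 1]]   # group 2
avg_diss, rows = simper(A, B, ["x", "y", "z"])
print(round(avg_diss, 3))
for name, av_diss, diss_sd, contrib in rows:
    print(name, round(av_diss, 3), round(diss_sd, 2), round(contrib, 1))
```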