OPTIMCLASS: Simultaneous identification of optimal clustering method and optimal number of clusters in vegetation classification studies
Tichý Lubomír¹, Chytrý Milan¹, Botta-Dukát Zoltán², Hájek Michal¹, Talbot Stephen S.³
¹ Masaryk University, Brno, Czech Republic
² Hungarian Academy of Sciences, Vácrátót, Hungary
³ U.S. Fish and Wildlife Service, Anchorage, USA
Why do we need a method for identifying the optimal clustering algorithm and the optimal number of clusters?
[Figure: the same dataset]
- A wide variety of clustering methods produce “reasonable” results.
- The clustering method and the number of clusters are usually selected subjectively, based on empirical experience.
Published methods:
- Most algorithms identify the optimal partition mathematically, without considering its ecological interpretation.
The Method
A posteriori description of phytosociological tables is based on diagnostic species.
Diagnostic species describe a cluster; therefore, the number of diagnostic species determines whether the classified table can be sufficiently interpreted.
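Example of a classified table (columns are plots grouped into three clusters; "." = species absent):
Species 1   98788   12112   3.211
Species 2   51123   1223.   11132
Species 3   23132   .....   .....
Species 4   ..2.4   112..   1..5.
Species 5   .....   .1.1.   1.213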
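To make the example table concrete, the sketch below counts in how many plots of each cluster a species occurs. It assumes a hypothetical encoding (`TABLE`, one string per cluster, one character per plot, "." for absence) and a hypothetical helper `frequencies`; it only illustrates why some species look diagnostic for one cluster and is not part of the OPTIMCLASS procedure itself.

```python
# Hypothetical encoding of the example table above: one string per cluster,
# one character per plot; "." means the species is absent from that plot.
TABLE = {
    "Species 1": ["98788", "12112", "3.211"],
    "Species 2": ["51123", "1223.", "11132"],
    "Species 3": ["23132", ".....", "....."],
    "Species 4": ["..2.4", "112..", "1..5."],
    "Species 5": [".....", ".1.1.", "1.213"],
}


def frequencies(table):
    """Percentage of plots in each cluster in which each species occurs."""
    result = {}
    for species, clusters in table.items():
        result[species] = [
            round(100 * sum(ch != "." for ch in plots) / len(plots))
            for plots in clusters
        ]
    return result


for species, freq in frequencies(TABLE).items():
    print(species, freq)
```

In this toy table, Species 3 occurs in every plot of the first cluster and nowhere else, whereas Species 1 and Species 2 occur in nearly all plots of all clusters and therefore describe none of them.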
[Figure: the same dataset]
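Measure of classification quality: the total number of diagnostic species, summed over all clusters
Diagnostic species are identified with Fisher's exact test, which calculates the probability of the observed occurrence of a species in a cluster under a right-tailed test hypothesis (i.e., that the species is concentrated in the cluster more than expected by chance).
- The measure reduces the importance of very small clusters: a cluster with few plots cannot yield very low probabilities, even for a species confined to it.
- Easy interpretation: the more diagnostic species in the dataset, the better the clusters are described.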
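A minimal sketch of the criterion, assuming presence/absence data: for each species and cluster, a right-tailed Fisher's exact test is computed from the hypergeometric distribution, and the species counts as diagnostic when the p-value is below a chosen threshold `alpha`. The function names, the input format (one set of species per plot plus one cluster label per plot) and the default threshold are illustrative assumptions. Whether each species is counted once or once per cluster is not stated here, so the sketch returns (species, cluster) pairs from which either total can be derived. `scipy.stats.fisher_exact(table, alternative="greater")` should give the same one-sided p-value for the corresponding 2x2 table.

```python
from math import comb


def fisher_right_tailed(n_total, n_cluster, n_species, n_both):
    """Right-tailed Fisher's exact test for one species-cluster 2x2 table.

    n_total:   plots in the dataset
    n_cluster: plots in the cluster
    n_species: plots containing the species
    n_both:    plots in the cluster containing the species
    Returns P(X >= n_both), X hypergeometric (comb() gives 0 for impossible terms).
    """
    upper = min(n_species, n_cluster)
    tail = sum(
        comb(n_species, x) * comb(n_total - n_species, n_cluster - x)
        for x in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_cluster)


def count_diagnostic_species(plots, labels, alpha=1e-6):
    """Return the set of (species, cluster) pairs with p-value < alpha.

    plots:  list of sets of species, one set per plot (hypothetical format)
    labels: cluster label of each plot, in the same order
    """
    n_total = len(plots)
    all_species = set().union(*plots)
    n_species = {s: sum(s in p for p in plots) for s in all_species}
    diagnostic = set()
    for cluster in set(labels):
        members = [p for p, lab in zip(plots, labels) if lab == cluster]
        for s in all_species:
            n_both = sum(s in p for p in members)
            # A species absent from the cluster has p = 1, so it is skipped.
            if n_both and fisher_right_tailed(
                n_total, len(members), n_species[s], n_both
            ) < alpha:
                diagnostic.add((s, cluster))
    return diagnostic


# Toy data: two clusters of 10 plots, each with one exclusive species.
plots = [{"x", "common"}] * 10 + [{"y", "common"}] * 10
labels = ["A"] * 10 + ["B"] * 10
diag = count_diagnostic_species(plots, labels, alpha=1e-3)
print(len(diag))  # 2: ("x", "A") and ("y", "B"); "common" is never diagnostic
```

With `alpha=1e-6` the same toy data yields no diagnostic species, because the smallest attainable p-value for a 10-plot cluster among 20 plots is 1/C(20, 10), about 5.4 × 10⁻⁶.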
Test on three different datasets:
- Southern Siberia, Sayan Mountains (310 plots; forest, steppe and tundra vegetation)
- Central Europe, Carpathians (241 plots; mire vegetation)
- Alaska, Kenai Peninsula (171 plots; wetlands)
Classifications tested:
- Flexible beta clustering, Ward's clustering, UPGMA (PC-ORD)
  - Cover transformations: percentages, log percentages, Braun-Blanquet scale, presence/absence
  - Distance measures: Bray-Curtis, Manhattan, Euclidean
- Ordinal cluster analysis (SYN-TAX)
  - Distance measures: Kruskal-Wallis, Kendall, Gower-Podani coefficient
- Modified TWINSPAN classification (JUICE), with three modifications of pseudospecies cut levels
  - The sequence of splits in this divisive classification is determined by the internal heterogeneity of clusters; therefore, any number of clusters is possible.
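Several of the data treatments listed above can be sketched in a few lines. This is a hedged illustration, not the PC-ORD implementation: "log percentages" is assumed to mean log(x + 1) so that absences stay 0, the Braun-Blanquet conversion is omitted because its scale coding is not given here, and all function names and cover values are hypothetical.

```python
import math


def log_transform(cover):
    # Assumption: "log percentages" taken as log(x + 1), so absence stays 0.
    return [math.log1p(x) for x in cover]


def presence_absence(cover):
    return [1 if x > 0 else 0 for x in cover]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def bray_curtis(a, b):
    # Sum of absolute differences divided by the total abundance of both plots.
    total = sum(a) + sum(b)
    return manhattan(a, b) / total if total else 0.0


plot_a = [50, 5, 0]   # hypothetical percentage covers of three species
plot_b = [10, 5, 20]
for name, f in (("raw %", lambda v: v), ("log", log_transform),
                ("pres/abs", presence_absence)):
    a, b = f(plot_a), f(plot_b)
    print(name, euclidean(a, b), manhattan(a, b), bray_curtis(a, b))
```

Each transformation is applied before the distance is computed, so the same pair of plots can look very different to the clustering algorithm depending on the combination chosen.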
Results: Sayan Mountains, Siberia (310 plots, 1036 species)
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters), in three panels for probability thresholds of 10⁻³, 10⁻⁶ and 10⁻⁹]
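The choice of probability threshold also determines how small a cluster can be and still contribute diagnostic species. A right-tailed Fisher p-value for a cluster of n plots out of N can be no smaller than 1/C(N, n), the value reached when a species occurs in exactly the plots of that cluster. The sketch below (hypothetical helper names) applies this bound to the 310 Sayan plots and the three thresholds shown in the figure.

```python
from math import comb


def min_p_value(n_total, n_cluster):
    """Smallest right-tailed Fisher p-value a cluster of n_cluster plots can give:
    reached when a species occurs in exactly the plots of that cluster."""
    return 1 / comb(n_total, n_cluster)


def smallest_cluster_for(n_total, alpha):
    """Smallest cluster size for which some species could be diagnostic at alpha."""
    for size in range(1, n_total + 1):
        if min_p_value(n_total, size) < alpha:
            return size
    return None  # threshold unreachable for this number of plots


for alpha in (1e-3, 1e-6, 1e-9):
    print(alpha, smallest_cluster_for(310, alpha))
# 0.001 -> 2, 1e-06 -> 3, 1e-09 -> 5
```

With 310 plots, a cluster needs at least 2 plots to contain any diagnostic species at 10⁻³, but at least 5 plots at 10⁻⁹; stricter thresholds therefore give very small clusters even less weight.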
Untransformed cover data
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Euclidean distance measure
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Manhattan distance measure
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Bray-Curtis distance measure
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
UPGMA
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Ward's method
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Flexible beta (β = -0.25)
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Ordinal cluster analysis (SYN-TAX)
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Modified TWINSPAN
[Figure: number of diagnostic species vs. number of clusters (up to 50 clusters)]
Similar results were obtained for all three test datasets (Sayan Mountains, 310 plots; Carpathians, 241 plots; Kenai Peninsula, 171 plots).
Conclusions
- Classifications based on transformed cover values give better results than those based on untransformed percentage covers.
- Euclidean distance gives slightly poorer results than Manhattan or Bray-Curtis distance.
- UPGMA gives poorer results than Ward's method and flexible beta.
- There is no significant difference between the ordinal cluster analysis proposed by Podani (SYN-TAX 2000) and the other clustering methods.
- Modified TWINSPAN performs well with small numbers of clusters.
[Figure: Modified TWINSPAN classification: number of diagnostic species occurrences vs. number of clusters (up to 50 clusters)]
[Figure: Modified TWINSPAN classification: sum of diagnostic species vs. number of clusters (up to 50 clusters)]
[Figure: Modified TWINSPAN classification: number of clusters with more than 4 diagnostic species vs. number of clusters (up to 50 clusters)]
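The last criterion, the number of clusters with more than 4 diagnostic species, can be sketched as follows. The sketch assumes the diagnostic species have already been identified (for example, by Fisher's exact test at a chosen threshold) and are supplied as (species, cluster) pairs; the function name and input format are hypothetical.

```python
from collections import Counter


def clusters_with_many_diagnostic(pairs, min_species=4):
    """Count clusters with more than min_species diagnostic species.

    pairs: iterable of (species, cluster) diagnostic pairs (hypothetical format).
    Duplicate pairs are ignored; clusters without diagnostic species never count.
    """
    per_cluster = Counter(cluster for _, cluster in set(pairs))
    return sum(1 for n in per_cluster.values() if n > min_species)


pairs = [("s1", "A"), ("s2", "A"), ("s3", "A"), ("s4", "A"), ("s5", "A"),
         ("s1", "B"), ("s6", "B")]
print(clusters_with_many_diagnostic(pairs))  # 1: only cluster A has more than 4
```

Unlike the total number of diagnostic species, this count rewards partitions in which many clusters are individually well characterized.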