
RH-12-2008

Thesis for the degree of Master of Science in Environment and Natural

Resources

Robustness of three hierarchical agglomerative clustering techniques for ecological data

Warsha Singh

Faculty of Natural Sciences

Department of Mathematics

October 2008


A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Environment and Natural Resources at the University of Iceland.

Robustness of three hierarchical agglomerative clustering techniques for ecological

data

Warsha Singh

Science Institute Report: RH-12-2008

© Warsha Singh 2008

Committee in charge:

Dr. Gunnar Stefánsson (Department of Mathematics, University of Iceland)

Dr. Einar Hjörleifsson (Marine Research Institute of Iceland)

Moderator:

Dr. Erla Björk Ornolfsdóttir (Marine Research Center Breiðafjörður)


Contents

Abstract
Acknowledgement
1 Introduction
1.1 Purpose of the study
2 Statistical Theory
2.1 Hierarchical agglomerative clustering
2.1.1 Average linkage (UPGMA)
2.1.2 Complete linkage
2.1.3 Ward's linkage
2.2 Non-Metric Multidimensional Scaling (NMDS)
3 Methodology
3.1 Icelandic Groundfish Survey
3.2 Data
3.3 Hierarchical cluster analysis - Species Assemblages
3.3.1 Analysis I: Correlation distance
3.3.2 Analysis II: Bray-Curtis distance
3.3.3 Data Analyses
3.4 Comparison of the hierarchical clustering techniques
3.5 Comparison of hierarchical clustering with non-metric multidimensional scaling
3.6 Fish Assemblages in relation to environmental variables
3.7 Habitat analysis
3.8 Heatmap
4 Results
4.1 Comparison of the three hierarchical clustering techniques
4.1.1 Analysis I: Correlation distance
4.1.2 Analysis II: Bray-Curtis distance
4.2 Sample size effect
4.2.1 Analysis I: Correlation distance
4.2.2 Analysis II: Bray-Curtis distance
4.3 Data Aggregation (smoothing) effect
4.3.1 Analysis I: Correlation distance
4.3.2 Analysis II: Bray-Curtis distance
4.4 Comparison of hierarchical clustering with non-metric multidimensional scaling
4.5 Fish Assemblages in relation to environmental variables
4.5.1 Analysis I: Correlation distance
4.5.2 Analysis II: Bray-Curtis distance
4.6 Habitat Classification
4.6.1 Analysis I: Correlation distance
4.6.2 Analysis II: Bray-Curtis distance
4.7 Heatmap
5 Discussion
5.1 Fish Assemblages and species-environment relationships
6 Main considerations and recommendations
A Appendix


List of Figures

3.1 Icelandic groundfish survey area within the 500 meter contour line, outlining the statistical rectangles and the locations of the stations

3.2 Distribution of the data (a) before and (b) after transforming to fourth root and scaling to zero mean and variance 1, for four abundant species in the survey, as labelled. The histogram shows the number of fish per tow collections.

3.3 Distribution of the data (a) before and (b) after transforming to fourth root and standardising by range, for four abundant species in the survey, as labelled. The histogram shows the number of fish per tow collections.

4.1 Dendrogram of species assemblage for the Icelandic Groundfish (IGF) survey from 1998-2007 using (a) Average linkage and (b) Complete linkage, with correlation dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising all tow collections. The rectangles highlight the clusters with AU > 0.9. The AU values used for interpretation are indicated in blue and the cluster number (edge) is marked in green.

4.2 Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the clusters with AU > 0.9.

4.3 Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage, with correlation dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the identified species assemblages for comparison.

4.4 Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the identified species assemblages for comparison.

4.5 Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with Bray-Curtis dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.

4.6 Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.

4.7 Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with Bray-Curtis dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and standardised by range. The rectangles highlight the identified species assemblages for comparison.

4.8 Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and standardised by range. The rectangles highlight the identified species assemblages for comparison.

4.9 Dendrogram of species assemblage using Average linkage with correlation dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.10 Dendrogram of species assemblage using Average linkage with correlation dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.11 Dendrogram of species assemblage using Complete linkage with correlation dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.12 Dendrogram of species assemblage using Complete linkage with correlation dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.13 Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.14 Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.15 Dendrogram of species assemblage using Average linkage with Bray-Curtis dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and standardised by range, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.16 Dendrogram of species assemblage using Average linkage with Bray-Curtis dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and standardised by range, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.17 Dendrogram of species assemblage using Complete linkage with Bray-Curtis dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and standardised by range, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.18 Dendrogram of species assemblage using Complete linkage with Bray-Curtis dissimilarity measure. Data consists of mean species abundance in numbers by stations, fourth root transformed and standardised by range, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.19 Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and standardised by range, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.20 Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. Data consists of species abundance in numbers, fourth root transformed and standardised by range, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

4.21 Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with correlation dissimilarity measure. Data consists of mean species abundance in numbers by statistical subrectangles, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the clusters with AU > 0.9.

4.22 Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. Data consists of mean species abundance in numbers by statistical subrectangles, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the clusters with AU > 0.9.

4.23 Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with Bray-Curtis dissimilarity measure. Data consists of mean species abundance in numbers by statistical subrectangles, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.

4.24 Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. Data consists of mean species abundance in numbers by statistical subrectangles, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.

4.25 Multidimensional scaling using Bray-Curtis distance measure for (a) the full data set (comprising all tow collections) and (b) data aggregated by statistical sub-rectangle. Species abundance in numbers was fourth root transformed and standardised by range.

4.26 Geographical distribution of the 40 species analysed for this study, labelled accordingly. The bubble plot shows the mean abundance of species by statistical subrectangles averaged across years. The sizes of the circles are proportional to the square root of the mean abundance.

4.27 Weighted average depths and standard deviations for the 40 species analysed. A-D refers to the identified fish assemblages from Ward's hierarchical clustering based on correlation distance.

4.28 (a) Box and whisker plot for the mean depths of species in the identified fish assemblages from Ward's hierarchical clustering based on correlation distance (b) Tukey test results showing the significant difference between the identified fish assemblages (c) Box and whisker plot for the mean depths of species in the identified fish assemblages from Ward's hierarchical clustering based on Bray-Curtis distance (d) Tukey test results showing the significant difference between the identified fish assemblages from (c)

4.29 Weighted average depths and standard deviations for the 40 species analysed. A*-C* refers to the identified fish assemblages from Ward's hierarchical clustering based on Bray-Curtis distance.

4.30 Definition of areas in Icelandic waters using Ward's hierarchical clustering. The data consist of species abundance in numbers transformed to fourth root. Clustering was based on (a) correlation distance with data scaled to 0 mean and variance 1 (b) Bray-Curtis distance with data standardised by range.

4.31 Species composition of defined clusters from the habitat classification using Correlation distance measure and Ward's linkage. The species codes are outlined in Table 4 in the Appendix.

4.32 Species composition of defined clusters from the habitat classification using Bray-Curtis distance measure and Ward's linkage. The species codes are outlined in Table 4 in the Appendix.

4.33 A heatmap showing the species-area association for the Icelandic Groundfish (IGF) survey from 1998-2007 using Average linkage hierarchical clustering with correlation dissimilarity measure. The x-axis shows the dendrogram of areas (statistical rectangles) and the y-axis shows the dendrogram of species assemblage. Data consists of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1. The colours range from blue (low ratios) to red (high ratios) indicating the strength of associations.

A.1 Definition of areas in Icelandic waters using (a) Average (b) Complete hierarchical clustering with correlation distance. Data consists of species abundance in numbers, transformed to fourth root and scaled to 0 mean and variance 1.

A.2 Definition of areas in Icelandic waters using (a) Average (b) Complete hierarchical clustering with Bray-Curtis distance. Data consists of species abundance in numbers, transformed to fourth root and standardised by range.

List of Tables

2.1 Parameter Values for the clustering algorithms used in this study

4.1 Cophenetic Correlation Coefficient for Analysis I (Correlation distance) and II (Bray-Curtis distance)

4.2 Agglomerative Coefficient for Analysis I (Correlation distance) and II (Bray-Curtis distance)

A.1 The common and Latin names of the forty most common species analysed for this study with the codes used for analysis.

Abstract

Although cluster validity has been a subject of interest and importance in the field of molecular genetics for some decades now, substantive guidelines are not readily available for the choice of appropriate clustering algorithms for ecological data. This study tested the robustness of three common hierarchical agglomerative clustering methods, Average, Complete and Ward's linkage, for the identification of species assemblages. The Icelandic groundfish survey data for the period 1998-2007 were used for this study, taking the forty most abundant species into consideration. The objective criteria used for cluster validity or efficiency were the Cophenetic Correlation Coefficient (CPCC) and the Agglomerative Coefficient (AC). To test the reliability of the clusters, a bootstrap resampling technique was used to generate probabilities for the clusters. Furthermore, to examine the stability and consistency of the linkage methods, their performance across different sample sizes and levels of data smoothing was tested. Two modes of data analysis, each based on a different combination of data standardisation and distance measure, (1) Correlation distance on data scaled to zero mean and unit variance and (2) Bray-Curtis distance on data standardised by range, showed that Ward's clustering technique was the most robust and suitable for this data set. It generated consistent, well-defined clusters with high probabilities and gave high values of CPCC and AC. The assemblages were also ecologically meaningful when related to two environmental parameters, depth and geographical distribution. A verification of the hierarchical clusters with Non-metric Multidimensional Scaling also gave similar species groupings. Complete linkage was unstable, generating inconsistent results across different sample sizes and levels of data smoothing. Average linkage maximised CPCC but was sensitive to the way the data were standardised. The CPCC criterion of cluster validity was not found to be a very reliable or adequate measure in this study.

Subsequently, the main species assemblages in the Icelandic waters covered by the survey were defined. Biological interpretations of the fish assemblages showed that the spatial structure of the environmental gradients around Iceland played a role in characterising the fish assemblages. A definition of areas around Iceland led to a separation along the north-south gradient, according to the bathymetric and hydrographic conditions, which further showed some differentiation along depth. Furthermore, the use of a visualisation technique, the heatmap, was introduced for exploring community patterns.

Acknowledgment

I would like to acknowledge the Marine Research Institute of Iceland for making the data on the Icelandic groundfish survey available for this study. My sincere gratitude goes to Dr. Gunnar Stefánsson of the Department of Mathematics, University of Iceland and Dr. Einar Hjörleifsson of the Marine Research Institute of Iceland for their technical guidance, their continued valuable support throughout this study and their constructive comments, which strengthened it.

I am immensely and forever grateful to the co-ordinating team of the United Nations University Fisheries Training Programme, Dr. Tumi Tómasson, Mr. Þór Ásgeirsson and Ms. Sigridur Kr. Ingvarsdóttir, for giving me this opportunity to be a part of this Masters Programme in Environment and Natural Resources at the University of Iceland. I would also like to acknowledge the continued support and encouragement from Dr. Brynhildur Davidsdóttir (Co-ordinator for the Masters Programme).

I thank Mr. Sigurdur Þór Jónsson of the Marine Research Institute of Iceland for providing his technical assistance with the statistical software R.

1 Introduction

The shift toward ecosystem-based fisheries management has resulted in numerous studies, carried out world-wide, to determine fish assemblages. This new approach entails starting fisheries management at the ecosystem level rather than at the single-species level (Pikitch et al., 2004). An initial step toward understanding the ecosystem- or multispecies-based approaches is to understand the mechanisms of the biological communities in space and time, including their correlation with the environment (Sousa et al., 2005; Jaureguizar et al., 2006). Hence, the identification of fish assemblages and their relation to environmental variables may be seen as one probable measure of potential interactions between the species (Francis et al., 2002). The term fish assemblage refers to a group of species that coexist at a geographical scale because of similar habitat preferences or biological interactions (Jaureguizar et al., 2003; Mahon et al., 1998). Because these assemblages potentially characterise geographical areas or environmental gradients, they are considered an appropriate indicator for habitat complexity (Noss, 1990). The patterns of species assemblages are commonly defined using multivariate analysis, inferring species-environment relationships. Nonetheless, not much attention is given to the reliability of the methodology that is applied, which is the main topic of the present study.

Hierarchical cluster analysis is widely applied for assemblage studies. This method is based on identifying objects with similar characteristics and grouping them together such that objects within a group are more similar than objects in different groups. Cluster analysis can be used to identify species assemblages, and different sites and times having similar community structures (Clarke and Warwick, 2001). The output is a tree-like structure called a dendrogram, with the x-axis showing the objects and the y-axis indicating the level of similarity or dissimilarity of the groupings. Similarity between the clusters diminishes moving from lower to upper levels. Hierarchical clustering is sub-divided into agglomerative and divisive methods. Agglomerative methods are most commonly used (Clarke and Warwick, 2001). In the basic description given by Quinn and Keough (2002), the procedure starts with calculating a matrix of dissimilarity between the objects or variables, and the two objects which are most similar are clustered together to form a new object that replaces the merged pair. The dissimilarity between the new set of objects is re-calculated and again the most similar objects are merged. The process continues until all the objects are linked in a cluster. Dissimilarity indices (also called distances) measure how different the objects are (how far apart the objects are in multidimensional space) and are calculated for every possible pair of objects. This is the basis for the formation of a cluster. For continuous variables, dissimilarity measures include Euclidean (squared normal distance), Manhattan, Canberra, Minkowski, Bray-Curtis, Kulczynski and Chi-square (Quinn and Keough, 2002). A variety of agglomerative clustering methods exist, depending on which technique or linkage method is used to fuse the objects during the clustering process. Some of the common ones include Single linkage, Complete linkage, Average linkage and Ward's hierarchical clustering method. The divisive method is opposite to the agglomerative, starting with a single cluster which contains all the objects and splitting it up into smaller groups (Clarke and Warwick, 2001); two-way indicator species analysis (TWINSPAN) is a common method in this class (Quinn and Keough, 2002).
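The agglomerative procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `agglomerate` and the toy dissimilarity matrix are hypothetical, and only Average and Complete linkage are shown (Ward's method requires a different update rule and is left out for brevity).

```python
from itertools import combinations


def agglomerate(dist, linkage="average"):
    """Naive agglomerative clustering on a symmetric dissimilarity matrix.

    Returns the merges in order as (cluster_a, cluster_b, height), where
    each cluster is a tuple of the original object indices it contains.
    """
    if linkage not in ("average", "complete"):
        raise ValueError("linkage must be 'average' or 'complete'")

    # Start with every object in its own cluster.
    clusters = [(i,) for i in range(len(dist))]
    merges = []

    def between(a, b):
        # Dissimilarity between two clusters from the pairwise values:
        # the largest (Complete) or the mean (Average) over all pairs.
        pairs = [dist[i][j] for i in a for j in b]
        return max(pairs) if linkage == "complete" else sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the two most similar clusters and fuse them.
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((a, b, between(a, b)))
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return merges


# Hypothetical dissimilarities between four objects (illustration only).
toy = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]

for method in ("average", "complete"):
    for a, b, height in agglomerate(toy, method):
        print(method, a, b, height)
```

On this toy matrix both linkages fuse the same pairs in the same order and differ only in the height of the final merge, which is the kind of difference a dendrogram makes visible.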

For the most part, hierarchical clustering techniques lack a completely stable output and an objective measure for evaluating the outcomes obtained (Cao et al., 2002a; Nemec and Brinkhurst, 1988), introducing subjectivity into the interpretation of the classifications (Mahon et al., 1998). Generally, prior to clustering, the grouping properties of the data set are unknown and the number of expected clusters cannot be assigned beforehand. In other words, it is an unsupervised process ("unsupervised learning") and it is generally difficult to judge whether the resulting classification patterns and the number of groups are acceptable. Additionally, the functioning of the clustering algorithms is susceptible to the properties of the data and the assumptions made for the definition of the groups (Halkidi et al., 2002b; Kovács et al., 2005). Another drawback is that once a cluster is formed it cannot be broken down later in the process, and an inaccurate cluster formed early in the process will therefore influence the classification that follows (Quinn and Keough, 2002). Consequently, evaluation and validation of clustering techniques are an essential part of cluster analysis (Legendre, 1998). Comparing outcomes from a few techniques can also ensure consistency and plausibility of the results, as different clustering algorithms could lead to different results for the same data set (Jakoniene and Lambrix, 2007). Naturally, if there really is a strong association in the data, different methods should produce similar results (Quinn and Keough, 2002).

The results of hierarchical classification depend on the choice of the clustering technique (linkage method) and the initial dissimilarity index used to calculate the pairwise dissimilarity between objects, so these choices should be made with care. The purpose of the analysis, the nature of the data and the standardisations of the data all play a role in determining the optimum clustering technique (Quinn and Keough, 2002), noting that the choice of linkage method is more critical than the choice of the dissimilarity measure (Vakharia and Wemmerlöv, 1995). For ecological studies, the group mean (or Average) linkage technique, also known as the unweighted pair-groups method using arithmetic averages (UPGMA), based on Bray-Curtis dissimilarity has been a prominent technique for some decades, as noted by Clarke and Ainsworth (1993), and also falls within the recommendation of Quinn and Keough (2002). When data are in the form of species abundance, the problem of "double zeros" normally exists, that is, a species can be absent from two sites. "If a species is absent from two sites, then these two sites are either both above or both below the optimal niche value for that species, or one above and one below that value" (Legendre, 1998). Thus clear indications about the ecological preferences of the species cannot be reached in these circumstances and ecological conclusions should not be drawn. Therefore, dissimilarity coefficients that do not treat sampling units as similar merely because they both lack the same species are recommended. Coefficients of this type are called asymmetric, as they treat zeros in a different way and skip double zeros altogether when computing dissimilarities (Legendre, 1998). Bray-Curtis is one such asymmetrical quantitative coefficient, which makes it preferable for ecological studies (Legendre, 1998); others include Kulczynski and Canberra. On the other hand, Euclidean and Chi-square are also good measures of dissimilarity if the data do not have zeros (Quinn and Keough, 2002). Other reasons why the Bray-Curtis coefficient is preferred are that the inclusion of a third sample does not affect the similarity between two initial samples, and that its value is unchanged by the inclusion or exclusion of a species which is jointly absent from two samples (Clarke and Warwick, 2001). If the data are first normalised, then the use of correlation as a dissimilarity measure may be appropriate for species associations (Legendre, 1998). Correlation distance is used more in the analysis of species than of sites, since it incorporates a type of row standardization (Clarke and Warwick, 2001). This, however, does not remove the problem of double zeros, but the problem can be minimised by eliminating rare species from the analysis (Legendre, 1998).
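The double-zero property discussed above can be checked numerically. The following is a minimal sketch with hypothetical abundance vectors for two samples; the function names are illustrative, the Bray-Curtis dissimilarity is taken in its usual form (sum of absolute differences divided by the sum of all abundances), and correlation distance is taken as one minus the Pearson correlation, which is an assumption about the exact definition used.

```python
from math import sqrt


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum |x_i - y_i| / sum (x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def correlation_distance(x, y):
    """One minus the Pearson correlation between two abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 1 - sxy / sqrt(sxx * syy)


# Hypothetical abundances of three species in two samples.
site_a = [1, 2, 3]
site_b = [2, 1, 3]

# The same samples with a fourth species that is absent from both.
site_a_padded = site_a + [0]
site_b_padded = site_b + [0]

print("Bray-Curtis:", bray_curtis(site_a, site_b),
      bray_curtis(site_a_padded, site_b_padded))
print("Correlation distance:", correlation_distance(site_a, site_b),
      correlation_distance(site_a_padded, site_b_padded))
```

In this example the Bray-Curtis value is identical with and without the jointly absent species, whereas the correlation distance becomes smaller once the double zero is added, so the two samples appear more alike only because they share an absence.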

The ecological literature contains a vast number of studies on fish assemblages, ranging across various types of fisheries and habitats. Analyses of demersal fish assemblages in the northern region include those of the Galician continental shelf and upper slope, north-west Spain (Fariña et al., 1997); the eastern Norwegian Sea (Bergstad et al., 1999); the north-east Newfoundland/Labrador shelf (Gomes and Richard, 1995); the Flemish Cap (González-Troncoso et al., 2006); the Faroe Banks (Magnussen, 2002); the east coast of North America (Mahon et al., 1998); the west and east Greenland continental shelf and slope (Rätz, 1999); and the Portuguese continental margin (Sousa et al., 2005). Most of these studies relate the spatial and temporal patterns of species assemblages to possible environmental variables that could explain these structures. To ensure consistency of the results, output from at least two multivariate analytical techniques is generally compared in most studies. For example, Mahon et al. (1998), Medina et al. (2007) and González-Troncoso et al. (2006) compare PCA to hierarchical clustering. Francis et al. (2002) and Sousa et al. (2005) compare CA and hierarchical clustering, and Lee and Sampson (2000) look at DCA and hierarchical clustering. Some of the studies, such as Brazner and Beals (1997) and Massuti and Moranta (2003), among others, try to complement results obtained from clustering with MDS. All these studies report consistent results from the different techniques used. However, cluster validation or comparison of techniques was not the underlying objective of these studies. With some exceptions, justification is not provided for the choice of techniques used. Generally such studies are more focused on the biological aspects of analysis and interpretation than on the reliability of the techniques used. It is therefore not clear whether consistency is a general feature or is only present between the two methods chosen in each of those analyses.

Numerous studies have focused on testing the efficiency and stability of various hierarchical clustering techniques and in turn trying to determine the best linkage method for the data set being evaluated. Some of these include Datta and Datta (2003); Gauch Jr and Whittaker (1981); Hennig (2007); Loganantharaj et al. (2006); Milligan and Cooper (1987); Scheibler and Schneider (1985) and references therein. Quinn and Keough (2002) and Cao et al. (1997a) also give further citations. The majority of other studies on cluster validation are based on non-ecological data. Scheibler and Schneider (1985) used Monte Carlo tests to examine the accuracy of a wide range of hierarchical and non-hierarchical clustering techniques, showing that Ward's linkage was the most robust among the hierarchical classification techniques examined. Some recent studies, such as Hennig (2007), also use simulations to test the stability of clustering techniques, based on external validation criteria such as Jaccard's coefficient.

Cluster validation studies are fairly limited in the field of ecology. One study conducted by Cao et al. (1997a) compared the performance of three hierarchical linkage methods, UPGMA, Complete and Ward's linkage, and TWINSPAN on river benthic community data. Contrary to the general recommendation they found that Ward's clustering technique produced the best result. Nonetheless, the choice of dissimilarity measure also plays a role. Ward's linkage needs Euclidean distance (Vakharia and Wemmerlöv, 1995) and this distance measure is known to strongly overweight abundant species, even after data transformation (Cao et al., 1997a). In their study Cao et al. (1997a) broaden the use of Ward's linkage and apply it to a new dissimilarity measure, namely the CY dissimilarity measure, proposed by Cao et al. (1997b). Ward's technique has generally been applied and recommended for non-ecological studies. Since ecological patterns in multivariate data are normally not known a priori, it is difficult to assess how well a method recovers them. This shortcoming has been addressed by some studies through the use of simulated data. One such study by Gauch Jr and Whittaker (1981) compared hierarchical classification techniques for simulated community data and field data. They showed that UPGMA did not perform very well in separating the predetermined plant communities in comparison to TWINSPAN and Complete linkage. On the contrary, Belbin and McDonald (1993) found that flexible-UPGMA performed better than TWINSPAN when tested on simulated community data.

Even though some studies suggest that sampling effort could have a significant effect on multivariate analyses, this has seldom been investigated (Cao et al., 2002a,b). Cao et al. (2002a) investigated the effect of sampling effort on the similarity/dissimilarity measures as opposed to the clustering technique, with the justification that these are fundamental to cluster analysis. Their study illustrated that increasing sampling effort significantly improved the site separation in techniques such as cluster analysis and ordination, since more samples improve the estimate of the similarity between objects, resulting in a clearer separation between groups. Additionally, decreasing sampling effort or insufficient sampling can have an effect on the observed community structure as fewer species are caught and recorded in smaller sample sizes (Riecken, 1999). As such, testing the effect of sample size on the observed ecological communities appears worthwhile.

The two important aspects of cluster validation involve testing the efficiency and the stability of a method. Validation techniques used for testing efficiency, or the goodness-of-fit of the clustering, can be broadly classified into external, internal and relative criteria. A concise account of these is given by Halkidi et al. (2002b,a). In short, an external criterion for cluster validation involves comparing the clustering to a predefined structure. Statistical indices such as the Rand statistic, the Jaccard coefficient, Hubert's statistic and the Fowlkes and Mallows index are used for this criterion. A relative criterion is based on certain assumptions and parameters and involves comparing the obtained classification to other clustering schemes. Some of the statistics used in this criterion are the Dunn family of indices, the modified Hubert's statistic and the Davies-Bouldin index, among others (Halkidi et al., 2002a). An internal criterion, on the other hand, relies on the inherent features of the data to evaluate the clustering structure (Halkidi et al., 2002b), such as the initial dissimilarity patterns between the objects. This is particularly useful if no prior information about group definitions in the data is available. One such criterion is the Cophenetic Correlation Coefficient (CPCC), also referred to as a matrix correlation or the standardized Mantel statistic, proposed by Sokal and Rohlf (1962). The hierarchical clustering procedure produces a total dissimilarity matrix known as the cophenetic matrix. The correlation between this cophenetic matrix and the original dissimilarity matrix on which the clustering was carried out is the CPCC (Lessig, 1972). A high correlation shows that the clustering technique did not distort much of the information contained in the original dissimilarity matrix. This criterion of cluster validation has been applied in several studies for evaluating clustering efficiency (Farris, 1969). Such evaluations also include Gauch Jr and Whittaker (1981); Li (1990); Rodrigues and Diniz-Filho (1998). However, some studies such as Farris (1969); Rohlf and Fisher (1968); Phipps (1971) have questioned the reliability of this index of cluster validity. Another criterion is the agglomerative coefficient (AC), proposed by Kaufman and Rousseeuw (1990). This criterion is based on the clustering structure itself, as found by the clustering algorithm, and is normally used to assess the strength and quality of the clustering (Rodrigues and Diniz-Filho, 1998; Hasan and Masumoto, 1999; Lesage et al., 1999).

The second aspect of cluster validation, stability, normally refers to whether the clusters remain constant irrespective of changes in the initial data set, such as taking subsamples or adding noise to the data (Hennig, 2007). Perhaps one of the disadvantages of hierarchical cluster analysis is the difficulty of verifying that the clusters are not just a result of random effects. This has been, in some ways, overcome by the bootstrap technique. Bootstrapping is used to assess the uncertainty in hierarchical clustering by determining the probabilities of the obtained clusters. The stability and consistency of a cluster can therefore also be tested using the bootstrap (Hennig and Mathematik-SPST, 2005; Efron et al., 1996). Bootstrapping has also been applied in a variety of ways to assess the reliability of clusters (Efron et al., 1996; Handl et al., 2005; McKenna, 2003; Kerr and Churchill, 2001; Shimodaira, 2002; Suzuki and Shimodaira, 2004). The majority of such work has been done in the field of molecular genetics, some in a rather elaborate manner, and studies such as Bolshakova et al. (2005) have developed specific software for cluster validation of DNA microarray data. This software can be used to validate a range of clustering techniques and incorporates various validation indices. However, bootstrapping of cluster analyses is seen far less often in ecological studies.

Given the drawbacks of cluster analysis outlined earlier, ordination techniques such as MDS are sometimes preferred. An ordination is like a map of the objects in two or more dimensions, where the placement of the objects represents their similarity. Multidimensional scaling (MDS), also referred to as non-metric multidimensional scaling (NMDS), is, like clustering, based on similarities or dissimilarities between the objects. The procedure "scales objects based on a reduced set of new variables derived from the original variables" (Quinn and Keough, 2002). Other ordination techniques include Principal Co-ordinates Analysis, Correspondence Analysis (CA), Canonical Correspondence Analysis (CCA), Detrended Correspondence Analysis (DCA) and Principal Components Analysis (PCA), which is the longest-established ordination method (Clarke and Warwick, 2001). Ordination gives a more informative display when samples do not portray a strong grouping (Clarke and Warwick, 2001). Clarke and Warwick (2001) suggest that cluster analysis be used in conjunction with ordination, even if the samples are strongly grouped. Gauch Jr and Whittaker (1981) argue that in community ecology data are relatively continuous, with samples relatively evenly spaced, and are not naturally clustered. Consequently, clustering may impose clusters which are not intrinsic to the data. Therefore they suggest that non-hierarchical and ordination techniques have advantages over hierarchical techniques in such cases. NMDS is normally recommended as one of the best ordination techniques (Quinn and Keough, 2002; Clarke and Warwick, 2001; Clarke and Ainsworth, 1993) due to its flexibility. It can be applied in conjunction with a wide range of dissimilarity measures and does not rely on any particular response model between species and underlying ecological gradients (Legendre, 1998).

Examining complex patterns in community structures can be considerably demanding. When the data are extensive and structures are abstract, a clear visualisation of patterns in a graphical format can be particularly beneficial for understanding and interpretation. The heatmap is one such visualisation technique; it is a useful exploratory tool and has been applied widely in the field of genetics for studying patterns in complex DNA microarray data (Pryke et al., 2006; Hastie et al., 2001; Zhang et al., 2003; Eisen et al., 1998; Quackenbush, 2007). The approach assigns a colour to each data point that "quantitatively and qualitatively reflects the original experimental observations" (Eisen et al., 1998), which is much more interpretable and informative than reading numbers. Visualisation can also be used as a measure of quality of the solutions (Pryke et al., 2006). Although these visualisation techniques have been applied extensively in taxonomical studies, ecologists have rarely used them for exploring community structures. Here an attempt is made to give an informative representation of the species-area relationship through a heatmap.


1.1 Purpose of the study

Hierarchical agglomerative cluster analyses have been widely applied in the field of ecology. However, the robustness of the techniques used is seldom examined. The primary emphasis of this study was to address the methodological and statistical aspects of clustering procedures. Secondarily, the biological aspects of the estimation of fish assemblages were also addressed.

The robustness of three hierarchical agglomerative clustering techniques, namely Average linkage or Unweighted Pair-Group Method using Arithmetic averages (UPGMA), Complete linkage and Ward's linkage, was examined for the identification of fish assemblages. These are the most commonly used linkage methods in ecology. This study was based on Icelandic groundfish survey data for the period 1998 to 2007.

The objective criteria used for assessing cluster validity or efficiency were the Cophenetic Correlation Coefficient (CPCC) and the Agglomerative Coefficient (AC). In order to test the reliability of the clustering methods, the probability values for the clusters were determined through bootstrap resampling. As a measure of the stability and consistency of the methods, their performances were examined across different sample sizes and different levels of data smoothing (data aggregation).

As a secondary aim, it was explored whether different data standardisation methods and dissimilarity measures played a significant role in determining multivariate patterns, in this context the species assemblages. Thus the above analyses were carried out using two modes of data analysis. These differed in the combination of (1) data transformation and standardisation and (2) the dissimilarity measure used to obtain the matrix of dissimilarities before the clustering. For each mode of data analysis, relative comparisons were made between the three linkage methods in order to determine which hierarchical agglomerative clustering technique was conditionally most robust, and thus potentially most suitable for the data being studied. In addition, NMDS was used as an external subjective criterion to compare and verify the fish assemblages obtained from hierarchical cluster analysis.

Furthermore, after the identification of the most robust linkage method, it was important to examine if the species assemblages obtained from that method were ecologically meaningful. Thus the identified assemblages were examined in relation to two environmental variables, depth and geographic distribution. These two variables were hypothesised to be influential in determining the species associations.

A classification of the fishing areas was also carried out to determine similar habitats. This was carried out in line with the two modes of data analysis and the three linkage methods in order to examine the consistency of the outcomes. A visualisation technique, the "heatmap", was then used to give a more informative display of the patterns in the community structures by giving a pairwise display of the two classifications of species and areas (statistical rectangles).

2 Statistical Theory

2.1 Hierarchical agglomerative clustering

All hierarchical agglomerative clustering procedures begin with an initial dissimilarity matrix between the objects. At the start of the agglomerative process each object is considered as a separate class or cluster. For a set of N initial objects, the first clustering step will result in N-1 clusters, the next in N-2 and so on until only one cluster contains all the objects, with the objects or clusters which are most similar fusing together at each step. How the distance between the new cluster and the remaining objects is computed is determined by the clustering algorithm being used (Gordon, 1999). A general equation proposed by Lance and Williams (1967) and outlined in Scheibler and Schneider (1985) describes how the various hierarchical algorithms compute this distance:

$$d_{hk} = \alpha_i d_{hi} + \alpha_j d_{hj} + \beta d_{ij} + \lambda \, |d_{hi} - d_{hj}| \qquad (2.1)$$

where:

$d_{ij}$ denotes the Euclidean distance between the entities i and j which have been combined to form a new cluster k

$d_{hk}$ denotes the Euclidean distance between a remaining entity h and the new cluster k

$\alpha_i$, $\alpha_j$, $\beta$ and $\lambda$ are parameters that depend on the clustering method being used and are outlined in Table 2.1 below for the three methods considered here.


Cluster method   $\alpha_i$              $\alpha_j$              $\beta$             $\lambda$
Average          $n_i/n_k$               $n_j/n_k$               0                   0
Complete         0.5                     0.5                     0                   0.5
Ward's           $(n_h+n_i)/(n_h+n_k)$   $(n_h+n_j)/(n_h+n_k)$   $-n_h/(n_h+n_k)$    0

Table 2.1: Parameter values for the clustering algorithms used in this study

where:

$n_i$ is the number of entities in cluster i of the preceding partition

$n_j$ is the number of entities in cluster j of the preceding partition

$n_k$ is the number of entities in the new cluster k ($n_k = n_i + n_j$)

$n_h$ is the number of entities in the remaining cluster h, whose distance to the new cluster k has to be recomputed.
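As a minimal illustration of equation (2.1), the update can be written as a small function. This is a sketch in Python rather than the R routines used in this study; the function name `lance_williams` and the example distances are hypothetical. It assumes that $n_h$ is the number of entities in cluster h (the usual Lance-Williams convention) and, for Ward's method, that the distances supplied are squared Euclidean distances.

```python
def lance_williams(d_hi, d_hj, d_ij, n_i, n_j, n_h, method):
    """Distance from cluster h to the new cluster k = i + j (equation 2.1)."""
    n_k = n_i + n_j
    if method == "average":
        a_i, a_j, beta, lam = n_i / n_k, n_j / n_k, 0.0, 0.0
    elif method == "complete":
        a_i, a_j, beta, lam = 0.5, 0.5, 0.0, 0.5
    elif method == "ward":
        # Assumption: distances passed in are squared Euclidean distances.
        a_i = (n_h + n_i) / (n_h + n_k)
        a_j = (n_h + n_j) / (n_h + n_k)
        beta, lam = -n_h / (n_h + n_k), 0.0
    else:
        raise ValueError("unknown method: " + method)
    return a_i * d_hi + a_j * d_hj + beta * d_ij + lam * abs(d_hi - d_hj)


# Hypothetical example in which h, i and j are all single objects.
for method in ("average", "complete", "ward"):
    print(method, lance_williams(2.0, 6.0, 1.0, 1, 1, 1, method))
```

With these parameters the complete-linkage update returns the larger of the two distances and the average-linkage update returns their size-weighted mean, which is a quick sanity check on the table values.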

The output from the analyses is represented as a hierarchical tree or dendrogram. A general description of the three methods evaluated in this study is given below.

2.1.1 Average linkage (UPGMA)

In this method, after the two objects with the least dissimilarity fuse together, the arithmetic average of the dissimilarities between this new cluster and each of the remaining objects is calculated. This leads to a reduction in the size of the original dissimilarity matrix. The procedure then continues with the dissimilarity matrix being correspondingly reduced. When the average between an object and a cluster is calculated, the method gives equal weights to the members of the cluster, and it is therefore called unweighted. Thus, in the progressive reduction of the dissimilarity matrix, only relationships between groups are considered, which are given equal weighting, and this leads to loss of information about the relationships between pairs of objects (Legendre, 1998).

2.1.2 Complete linkage

The fusion of the clusters depends on the most distant pair of objects as opposed to the closest. An object can join a cluster only when it is linked to all objects present in the cluster. Two clusters can only fuse when all members of the first cluster are related to all objects of the second cluster, hence it becomes more difficult for objects to join a cluster. This however creates clusters with clear discontinuities (Legendre, 1998).

2.1.3 Ward's linkage

This method is also referred to as Ward's minimum variance method. The procedure minimizes the sum of squares to form clusters, thus it is also referred to as the incremental sum of squares method. The procedure initially considers each object as a cluster on its own, so the distance of the object to its cluster centroid is 0. The centroid of a cluster is the average of the coordinates of the objects in the cluster. As the clusters form, the centroids move away from actual object coordinates and the sum of squared distances between the objects and the centroids increases. The distance of the object to its cluster centroid is calculated using the Euclidean distance formula. At each clustering step, the pair of clusters identified for fusion is the one whose merger minimizes the sum of squared distances over all objects. The dendrogram is normally represented in squared distances.
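The increase in the sum of squares caused by a single merger can be checked numerically. The sketch below (Python, with hypothetical helper names and made-up coordinates) computes the increase directly and compares it with the identity $\frac{n_i n_j}{n_i + n_j} \lVert c_i - c_j \rVert^2$, where $c_i$ and $c_j$ are the two centroids. This identity is not stated in the text above; it is a standard property of Ward's criterion and is included here only as an illustration.

```python
def centroid(points):
    """Average of the coordinates of the objects in a cluster."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def within_ss(points):
    """Sum of squared Euclidean distances from each object to the centroid."""
    c = centroid(points)
    return sum(sum((p[k] - c[k]) ** 2 for k in range(len(c))) for p in points)


def ward_increase(a, b):
    """Increase in the total within-cluster sum of squares if a and b merge."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)


a = [(0.0, 0.0), (2.0, 0.0)]  # hypothetical cluster i
b = [(5.0, 0.0)]              # hypothetical cluster j

direct = ward_increase(a, b)
ca, cb = centroid(a), centroid(b)
dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
identity = len(a) * len(b) / (len(a) + len(b)) * dist2
print(direct, identity)
```

At each step Ward's method would evaluate this increase for every candidate pair of clusters and fuse the pair with the smallest value.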

2.2 Non-Metric Multidimensional Scaling (NMDS)

The process begins with an ordination (scaling) of the objects in full-dimensional space and then represents them in few dimensions while the distance relationships between objects are retained as much as possible. The main objective of NMDS is to plot dissimilar objects far apart in the ordination space and similar objects close to one another. An initial distance matrix is calculated using an appropriate distance measure for the data. A configuration of the objects is constructed in a specified number of dimensions and is then adjusted by an iterative algorithm that calculates a matrix of fitted distances in the ordination space, mostly using Euclidean distance. The solution depends on the initial positions of the objects, so the choice of the original dissimilarity measure is important. The fitted distances are then compared to the original distances through regression and the corresponding scatter plot is known as the Shepard diagram. The goodness-of-fit of the regression is evaluated using the sum of squares from the regression analysis. The resulting measure is known as the stress value, and the fit is considered good if the stress value is less than 0.01 (Legendre, 1998).

$$\text{Stress} = \sqrt{\frac{\sum_{h,i} (d_{hi} - \hat{d}_{hi})^2}{\sum_{h,i} d_{hi}^2}} \qquad (2.2)$$

where:

$d_{hi}$ are the fitted distance values

$\hat{d}_{hi}$ are the values forecasted by the regression between the fitted distances $d_{hi}$ and the original distances
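Equation (2.2) is straightforward to evaluate once the fitted and forecasted distances are available. The following Python sketch uses a hypothetical function name and hand-picked values; in a real NMDS run the forecasted values would come from the regression step of the algorithm rather than being typed in.

```python
import math


def kruskal_stress(fitted, forecast):
    """Stress of equation (2.2) for paired lists of fitted and forecasted distances."""
    numerator = sum((d - d_hat) ** 2 for d, d_hat in zip(fitted, forecast))
    denominator = sum(d ** 2 for d in fitted)
    return math.sqrt(numerator / denominator)


# Hypothetical distances for three pairs of objects.
fitted = [1.0, 2.0, 3.0]
forecast = [1.0, 2.5, 2.5]
print(kruskal_stress(fitted, forecast))

# A configuration that reproduces the regression exactly has zero stress.
print(kruskal_stress(fitted, fitted))
```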

3 Methodology

3.1 Icelandic Groundfish Survey

The Icelandic groundfish survey was instigated in 1985 and has been conducted in March every year since by the Marine Research Institute. The survey area, which consists of the Icelandic continental shelf inside the 500 meters depth contour, is divided into statistical rectangles. Each statistical rectangle represents one half degree latitude and one degree longitude, on which the stratification scheme is based. Statistical rectangles are further divided into 4 subrectangles. The stratification system in the survey design, used to define the locations of tows (stations), was based on the density of cod found in the area. These density patterns, estimated by statistical rectangles, were calculated from catch data from commercial and research vessels prior to the survey design. For analysis, the survey area is divided into a northern and southern area and ten strata based on biological and hydrographic considerations. The allocation of stations to strata is directly proportional to the area of the stratum and its estimated cod density (Pálsson et al., 1989). Figure 3.1 shows the survey area, the statistical rectangles and the approximate locations of the stations.

The sampling scheme can be classified as semi-random stratified (Pálsson et al., 1989) as half the stations were randomly chosen by the research team of the institute whereas the other half was chosen by fishermen who had knowledge and experience of fishing and the fishing grounds. The design however is systematic since the same stations are covered every year (Pálsson et al., 1989). Five commercial vessels are leased every year to carry out the survey within the restricted time frame of 2-3 weeks. Emphasis is placed on standardizing the fishing methods as far as possible. The towing speed is fixed at 3.8 knots over the bottom and the towing distance is 4.0 nautical miles.

Figure 3.1: Icelandic groundfish survey area within the 500 meter contour line, outlining the statistical rectangles and the locations of the stations

3.2 Data

The survey targets all major commercial demersal fish species within the survey area. The criterion used for identifying the species to be included in the current analysis was the frequency of occurrence of the species in the overall number of samples. Species which appeared in more than 5% of the total number of samples were analysed. This comprised 40 species. Rare species were excluded as they could confuse patterns in multivariate analysis if left in the similarity matrix, since they typically have only single sporadic occurrences at variable sites, without apparent structure (Clarke and Warwick, 2001).

Data for the period 1998-2007 were analyzed. The raw data used for analysis consisted of abundance in numbers by species, year, station, statistical square, sub-square, depth, latitude and longitude of the stations. The original matrix of abundance had species arranged in columns and each row corresponded to a single tow.

The data were appropriately standardized (for each method) and transformed before analysis. For data on species abundance, standardizing reduces the strong weighting and influence of a few highly abundant species. It is important to give all species similar importance so that uncommon species also contribute to the dissimilarities. Standardization also reduces the effect of different total abundance in different sampling units, which is important when comparing sites.

3.3 Hierarchical cluster analysis - Species Assemblages

The data analyses consisted of two main parts (Analysis I and II), based on different data standardisations and dissimilarity measures, which are described below.

3.3.1 Analysis I: Correlation distance

For this distance measure, the data were first transformed to fourth root and then scaled to mean 0 and variance 1 before carrying out the analysis. The distributions of the data before and after transformation are shown in Figure 3.2 for four abundant species. The dissimilarity measure used was 1 - Correlation. This coefficient best measures linear relationships between standardized (zero mean and unit variance) variables (Quinn and Keough, 2002). Since the data were already centered (zero mean), the uncentered Pearson's correlation coefficient was used, subsequently converted to a dissimilarity by subtracting it from 1:

$$1 - \frac{\sum_{i=1}^{n} x_{ij} x_{ik}}{\sqrt{\sum_{i=1}^{n} x_{ij}^{2} \, \sum_{i=1}^{n} x_{ik}^{2}}}$$

where $x_{ij}$ and $x_{ik}$ represent the abundance of the jth and kth species at site i.
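The transformation and distance used in Analysis I can be sketched for a pair of species as follows. This is an illustrative Python version, not the R code used in the study; the function names and the abundance values are hypothetical, and the scaling is assumed to use the sample variance (divisor n - 1).

```python
import math


def fourth_root_scaled(counts):
    """Fourth-root transform, then scale to zero mean and unit variance."""
    y = [c ** 0.25 for c in counts]
    mean = sum(y) / len(y)
    # Assumption: sample variance with divisor n - 1.
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (len(y) - 1))
    return [(v - mean) / sd for v in y]


def correlation_distance(xj, xk):
    """1 minus the uncentered correlation between two species vectors."""
    num = sum(a * b for a, b in zip(xj, xk))
    den = math.sqrt(sum(a * a for a in xj) * sum(b * b for b in xk))
    return 1.0 - num / den


# Hypothetical abundances of three species at four sites.
sp_a = fourth_root_scaled([0, 1, 16, 81])
sp_b = fourth_root_scaled([0, 16, 256, 1296])  # rises with sp_a
sp_c = fourth_root_scaled([81, 16, 1, 0])      # falls as sp_a rises

print(correlation_distance(sp_a, sp_b))  # close to 0: same pattern
print(correlation_distance(sp_a, sp_c))  # close to 2: opposite pattern
```

Because the scaled variables can be negative, the resulting dissimilarity ranges from 0 to 2.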


3.3.2 Analysis II: Bray-Curtis distance

The second distance measure tested was the Bray-Curtis. The data were transformed to fourth root and standardized by range, which is one suitable standardisation for this distance measure (Quinn and Keough, 2002). The Bray-Curtis measure of dissimilarity could not be applied to the earlier data standardisation as it does not accept negative values (Quinn and Keough, 2002), which are generated when the data are scaled. Figure 3.3 shows the distribution of the data before and after transformation for four abundant species. The Bray-Curtis coefficient compares two species in terms of their minimum abundance at each site:

$$100 \, \frac{\sum_{i=1}^{p} 2 \min(x_{ij}, x_{ik})}{\sum_{i=1}^{p} (x_{ij} + x_{ik})} \qquad (3.1)$$

where $x_{ij}$ and $x_{ik}$ represent the abundance of the jth and kth species at site i.

The dissimilarity coefficient is calculated by subtracting the similarity from 100.
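A small worked example may help to show how equation (3.1) behaves. The Python sketch below uses hypothetical function names and abundances, and assumes that standardising by range means (x - min) / (max - min) for each species. It also checks that a site where both species are absent (a double zero) leaves the dissimilarity unchanged.

```python
def range_standardise(values):
    """Assumed range standardisation: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bray_curtis_dissimilarity(xj, xk):
    """100 minus the Bray-Curtis similarity of equation (3.1)."""
    shared = sum(2 * min(a, b) for a, b in zip(xj, xk))
    total = sum(a + b for a, b in zip(xj, xk))
    return 100.0 - 100.0 * shared / total


# Hypothetical abundances of two species at four sites;
# both species are absent from the last site.
sp_a = range_standardise([c ** 0.25 for c in [0, 16, 81, 0]])
sp_b = range_standardise([c ** 0.25 for c in [1, 81, 16, 0]])

d_all_sites = bray_curtis_dissimilarity(sp_a, sp_b)
d_without_double_zero = bray_curtis_dissimilarity(sp_a[:3], sp_b[:3])
print(d_all_sites, d_without_double_zero)
```

The two printed values agree because the jointly absent site adds zero to both sums in equation (3.1).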

3.3.3 Data Analyses

The statistical software R was used to carry out all the analyses.

For each mode of analysis (Analysis I: Correlation distance and Analysis II: Bray-Curtis distance) the three hierarchical clustering methods, Average, Complete and Ward's, were applied. For each method three levels of data aggregation were tested: (i) raw data including all stations and years, (ii) data aggregated by station by taking an average across years and (iii) data aggregated by subrectangle by taking an average across years and stations.

The effect of sample size was tested by taking subsamples of the data. A total of 5352 tows were available initially. Subsamples of 50%, 25% and 10% of the original tow collection were taken. These subsamples were generated randomly while maintaining the design and relative station density of the survey. Clustering was done on each subsample for the two modes of analysis.
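The text does not spell out how the subsamples were drawn. One plausible reading, sketched below in Python, is to draw the same fraction of tows at random within each spatial unit so that the relative station density is preserved. The function name, the use of a stratum label per tow and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(tows, fraction, seed=0):
    """Draw the same fraction of tows at random within each stratum.

    tows: list of (tow_id, stratum) pairs; stratum is a hypothetical label
    such as a statistical rectangle.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for tow_id, stratum in tows:
        by_stratum[stratum].append(tow_id)
    chosen = []
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        k = max(1, round(fraction * len(ids)))  # keep at least one tow
        chosen.extend(rng.sample(ids, k))
    return chosen


# Toy data: 40, 20 and 10 tows in three hypothetical strata.
tows = ([("A%d" % i, "A") for i in range(40)]
        + [("B%d" % i, "B") for i in range(20)]
        + [("C%d" % i, "C") for i in range(10)])

half = stratified_subsample(tows, 0.5)
print(len(half))
```

Fixing the seed makes a given subsample reproducible; different seeds would give different random subsets with the same relative density.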

The cluster analysis was carried out using the pvclust routine in package pvclust, which assesses the uncertainty in the clustering through bootstrap resampling. A thousand bootstrap replications were run for each cluster analysis. Two types of probability values are computed in parallel by the routine, i.e. the approximately unbiased (AU) p-value and the bootstrap probability (BP) value. The AU p-value is generated through multiscale bootstrap resampling and has asymptotic superiority in bias over the BP value (Suzuki and Shimodaira, 2006). The BP value of a cluster, which is calculated by ordinary bootstrap resampling, is the frequency with which it appears in the bootstrap replicates. A detailed account of these computations is given by Shimodaira (2008).
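The BP value described above amounts to a simple count. The Python sketch below is not the pvclust implementation; the function names and the replicate clusterings are hypothetical, the clustering of each replicate is assumed to have been done elsewhere, and the AU p-value (which needs multiscale resampling) is not reproduced.

```python
import random


def resample_sites(rows, rng):
    """Draw the same number of sites (rows) with replacement."""
    return [rows[rng.randrange(len(rows))] for _ in range(len(rows))]


def bootstrap_probability(cluster, replicate_clusterings):
    """Fraction of bootstrap replicates whose dendrogram contains the cluster."""
    target = frozenset(cluster)
    hits = sum(1 for clusters in replicate_clusterings if target in clusters)
    return hits / len(replicate_clusterings)


# Resampling step on a toy site-by-species matrix (rows are tows).
rows = [[0, 3, 1], [2, 0, 0], [5, 1, 4], [0, 0, 2]]
boot = resample_sites(rows, random.Random(1))

# Hypothetical clusters found in four bootstrap replicates.
replicates = [
    {frozenset({"cod", "haddock"}), frozenset({"redfish", "dab"})},
    {frozenset({"cod", "haddock"}), frozenset({"redfish", "dab"})},
    {frozenset({"cod", "redfish"}), frozenset({"haddock", "dab"})},
    {frozenset({"cod", "haddock"}), frozenset({"redfish", "dab"})},
]
bp = bootstrap_probability({"cod", "haddock"}, replicates)
print(bp)
```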

In R the Bray-Curtis measure of dissimilarity is implemented using the routine

vegdist in package vegan.

[Figure: paired histogram panels for Cod, Haddock, Redfish and Long rough dab; vertical axis: Frequency]

Figure 3.2: Distribution of the data (a) before and (b) after transforming to fourth root and scaling to zero mean and variance 1, for four abundant species in the survey, as labelled. The histograms show the number of fish per tow.

[Figure: paired histogram panels for Cod, Haddock, Redfish and Long rough dab; vertical axis: Frequency]

Figure 3.3: Distribution of the data (a) before and (b) after transforming to fourth root and standardising by range, for four abundant species in the survey, as labelled. The histograms show the number of fish per tow.


3.4 Comparison of the hierarchical clustering techniques

One objective criterion used for comparison was the Cophenetic Correlation Coefficient (CPCC). The CPCC is a simple correlation coefficient between the original dissimilarity matrix and the cophenetic matrix, which is the total dissimilarity matrix produced after clustering, i.e. the matrix of distances at which each pair of objects become members of the same cluster. This correlation therefore measures how well the clustering was able to maintain the original dissimilarity in the data. Pearson's correlation coefficient was used here. In order to test the effect of different sample sizes, the correlation was calculated between the cophenetic matrix for the various reduced sample sizes and the original dissimilarity matrix for all samples.
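To make the definition concrete, the Python sketch below clusters a small hypothetical dissimilarity matrix with average linkage, records the height at which each pair of objects first joins the same cluster, and correlates those heights with the original dissimilarities. The function names and the matrix are made up for illustration, and this is not the routine used in the study.

```python
import math
from itertools import combinations


def average_linkage_cophenetic(dist):
    """Cophenetic matrix from average linkage on a symmetric matrix."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member objects
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        (i, j), height = min(d.items(), key=lambda kv: kv[1])
        # Every pair split between i and j first joins at this height.
        for a in clusters[i]:
            for b in clusters[j]:
                coph[a][b] = coph[b][a] = height
        ni, nj = len(clusters[i]), len(clusters[j])
        new_d = {}
        for h in clusters:
            if h in (i, j):
                continue
            d_hi = d[(min(h, i), max(h, i))]
            d_hj = d[(min(h, j), max(h, j))]
            new_d[(h, next_id)] = (ni * d_hi + nj * d_hj) / (ni + nj)
        d = {k: v for k, v in d.items() if i not in k and j not in k}
        d.update(new_d)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cpcc(dist):
    """Correlation between original and cophenetic dissimilarities."""
    coph = average_linkage_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    return pearson([dist[i][j] for i, j in pairs],
                   [coph[i][j] for i, j in pairs])


# Hypothetical dissimilarities between four objects.
D = [[0, 2, 6, 10],
     [2, 0, 5, 9],
     [6, 5, 0, 4],
     [10, 9, 4, 0]]
print(cpcc(D))
```

For this matrix objects 0 and 1 join at height 2, objects 2 and 3 at height 4, and the two pairs at height 7.5, so the cophenetic matrix contains only those three values.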

Another objective criterion used was the agglomerative coefficient (AC), which measures the strength of the clustering structure found by a technique. "For each observation i, its dissimilarity to the first cluster it is merged with is divided by the dissimilarity of the merger in the final step of the algorithm, denoted by m(i). The AC is the average of all 1 - m(i)" (Maechler et al., 2005). The value ranges from 0 to 1, and a higher AC indicates a stronger clustering structure. The AC was, however, not used to compare results for different sample sizes as the coefficient tends to increase with the number of observations. In R, the AC is computed using the routine agnes in package cluster.
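The quoted definition can be written compactly as follows. The symbols $h_i$ (the dissimilarity at which observation $i$ is first merged) and $h_{\max}$ (the dissimilarity of the final merger) are notation introduced here for illustration, and the numbers in the worked example are hypothetical.

```latex
\[
  m(i) = \frac{h_i}{h_{\max}}, \qquad
  \mathrm{AC} = \frac{1}{n} \sum_{i=1}^{n} \bigl(1 - m(i)\bigr)
\]
% Hypothetical example: four observations first merged at heights 1, 1, 2, 2,
% with the final merger at height 4.5:
\[
  \mathrm{AC} = 1 - \frac{1 + 1 + 2 + 2}{4 \times 4.5} = 1 - \frac{6}{18} \approx 0.67
\]
```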

The definition of the clusters and their probability values were noted and compared. The significance threshold for the clusters was set at 0.9 for the AU p-value. The dendrograms were also visually compared for the presence of similar clusters across the different levels of data smoothing and sample sizes.

Independent comparisons were made for Analysis I (Correlation distance) and Analysis II (Bray-Curtis distance) to examine which clustering method performed relatively better for the two modes of analysis. The most robust method was then identified.

3.5 Comparison of hierarchical clustering with non-metric multidimensional scaling

A non-statistical approach was used to validate the results from the hierarchical agglomerative clustering. This was done by comparing it with non-metric multidimensional scaling (NMDS). Kruskal's non-metric multidimensional scaling routine isoMDS in package MASS was used. The procedure does not accept negative values for initial dissimilarities, hence it could not be applied to data scaled to 0 mean and 1 variance. Thus the comparison could only be made with Analysis II: Bray-Curtis dissimilarity measure on fourth root transformed data scaled by range. NMDS plots the clusters on an ordination diagram to look for groupings. These identified groups were then compared with the clusters formed by the hierarchical clustering. The stress values were used to examine the goodness-of-fit.
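As a hedged illustration of the Analysis II input, the sketch below applies a fourth root transformation, standardises by range and computes the Bray-Curtis dissimilarity between two species. It assumes that "standardising by range" means (x - min) / (max - min); the thesis text here does not spell out the formula. The function names and abundance values are hypothetical.

```python
def fourth_root_range_scale(values):
    """Fourth-root transform, then standardise by range to [0, 1].

    Assumption: range standardisation is (x - min) / (max - min).
    Requires at least two distinct values (otherwise the range is zero).
    """
    transformed = [v ** 0.25 for v in values]
    lo, hi = min(transformed), max(transformed)
    return [(x - lo) / (hi - lo) for x in transformed]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


# Invented abundances (numbers per tow) for two species at four stations.
species_a = fourth_root_range_scale([0, 16, 81, 256])
species_b = fourth_root_range_scale([1, 0, 16, 81])
print(bray_curtis(species_a, species_b))
```

Because the scaled abundances are non-negative, the resulting dissimilarity lies between 0 and 1, which suits a routine that requires non-negative initial dissimilarities.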

3.6 Fish Assemblages in relation to environmental variables

After the comparisons of the clustering techniques and the identification of the most robust linkage method, some biological interpretations were made on the identified species assemblages, for both Analysis I and II. It was tested whether the identified fish community structures could be related to two environmental variables, depth and geographic location of species.

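For each species, the weighted average depth $\bar{d}$ and standard deviation $s_d$ were calculated by:

\[
  \bar{d} = \frac{\sum_s n_s d_s}{\sum_s n_s} \tag{3.2}
\]

and

\[
  s_d = \sqrt{\frac{\sum_s n_s (d_s - \bar{d})^2}{\sum_s n_s}} \tag{3.3}
\]

where $n_s$ represents the abundance in numbers of the species at station $s$ and $d_s$ represents the depth at station $s$.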
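Equations (3.2) and (3.3) can be sketched in a few lines of Python. The function name and the example abundances and depths are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def weighted_depth_stats(abundance, depth):
    """Abundance-weighted mean depth (Eq. 3.2) and standard deviation (Eq. 3.3).

    abundance[s] is the number of fish of one species at station s,
    depth[s] is the depth at station s.
    """
    total = sum(abundance)
    mean_depth = sum(n * d for n, d in zip(abundance, depth)) / total
    variance = sum(n * (d - mean_depth) ** 2 for n, d in zip(abundance, depth)) / total
    return mean_depth, sqrt(variance)


# Invented example: one species caught at three stations.
mean_depth, sd_depth = weighted_depth_stats([10, 30, 60], [100, 200, 300])
print(mean_depth, sd_depth)
```

Stations where the species is abundant pull the mean depth towards their own depth, so the result describes where most individuals were caught, not where most stations lie.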

A one-way Analysis of Variance (ANOVA) was carried out to examine any significant variability in mean depths among the identified fish assemblages. A Tukey multiple comparison test was then undertaken to determine between which treatment levels (assemblages) the actual differences lay.
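The F statistic behind a one-way ANOVA can be sketched as below. This is an illustrative calculation only, not the thesis code: the function name and depth values are hypothetical, the p-value (which needs the F distribution) and the Tukey multiple comparison step are not shown.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups is a list of lists, one list of observations per treatment level.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented mean depths (m) of species in three hypothetical assemblages.
assemblage_depths = [[50, 60, 70], [150, 160, 170], [300, 310, 320]]
print(one_way_anova_f(assemblage_depths))
```

A large F relative to the F distribution with the returned degrees of freedom suggests that at least one assemblage differs in mean depth from the others.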

Furthermore, the geographic distribution of each species was mapped. This was done by generating a bubble plot which shows the mean abundance of each species, averaged across all years, by statistical sub-rectangles. The sizes of the circles are proportional to the square root of the mean abundance. Any relationship between this and the identified assemblages was then examined in a non-statistical manner.

3.7 Habitat analysis

This part of the analysis entailed carrying out a classification of the areas within the Icelandic continental shelf. The areas were defined as the statistical subrectangles. An average of the species abundance in numbers was calculated for each subrectangle, generating a species-subrectangle matrix. This was essentially a transpose of the species-site matrix used for species assemblages. Clustering was then carried out on these data to determine the hierarchical classification of the areas. Classification was carried out using the three hierarchical linkage methods, for the two distance measures described above (Analysis I: Correlation distance and II: Bray-Curtis). The classifications obtained were mapped for clarity. For each identified cluster of areas, its species composition was also determined.

In a previous analysis described in Stefánsson and Pálsson (1997) it was inferred, based on the bathymetric and hydrographic structure of the Icelandic continental shelf, that some definition between the north and south areas and some depth divisions should be observed. The efficiency of the techniques was assessed against this hypothesis.

3.8 Heatmap

A heatmap was generated using the heatplot routine in package made4. This plots hierarchical dendrograms of objects and variables, in this context sites and species respectively, in a two-way rearrangement. The data were transformed to fourth root and scaled to mean 0 and variance 1 for this analysis. Here the default settings were used, which are clustering based on correlation dissimilarity and Average linkage (Culhane et al., 2005). This generated an image with a spectrum of colours indicating the strength of associations between the species and their corresponding areas of occurrence.

4 Results

4.1 Comparison of the three hierarchical clustering techniques

4.1.1 Analysis I: Correlation distance

The results from the objective criteria for assessing the clustering techniques, CPCC and AC, are outlined in Tables 4.1 and 4.2 respectively. Overall, Average linkage gave the highest CPCC (0.82), followed by Complete (0.79) then Ward's (0.76) for data aggregated by stations, although Complete linkage performed poorly with the full data set (0.67). The AC for the full data set was highest for Ward's linkage (0.82), followed by Complete (0.62) then Average (0.49).

The hierarchical clustering yielded by Average and Complete linkage, Figures 4.1a and 4.1b respectively, produced clusters at high dissimilarity levels. Ward's linkage, however, gave well-defined clusters forming at lower levels of dissimilarity. When the entire data set was used, this technique classified the species into 2 distinct significant groups (AU > 0.9; edge 37 & 38 in Figure 4.2). Edge refers to the cluster number, which is marked in green in the figures. A few significant groups of species were produced by the Average and Complete linkage. Overall, the probability of clustering was lower for the Complete linkage in comparison to the other two methods. The AU p-values, which are illustrated in blue in the figures, were used for comparison.

Clustering on the full data set provided inconsistent species assemblages across the three hierarchical clustering techniques. However, with some data smoothing, i.e. averaging the species abundance by stations and across years, the results were more consistent and comparable among the three clustering methods. Essentially four main species assemblages could be identified and these are portrayed in Figures 4.3a, 4.3b and 4.4 for Average, Complete and Ward's linkage respectively. Species such as Atlantic wolffish, moustache sculpin, lumpfish, long rough dab and snake blenny were inconsistent in clustering among the three linkage methods.

4.1.2 Analysis II: Bray-Curtis distance

For this analytical method also, it was seen that Average linkage gave the highest

CPCC (0.87), followed by Complete (0.74) then Ward's (0.61) (Table 4.1). The AC

was the highest for Ward's (0.75) linkage followed by Complete (0.62) then Average

(0.44) (Table 4.2).

When the clustering was carried out on the full data set, the Average (Figure 4.5a) and Complete linkage (Figure 4.5b) produced clusters at high dissimilarity levels. The Complete linkage in particular did not give a clear definition of clusters. Ward's linkage gave well-defined clusters (Figure 4.6). The results among the three linkage techniques were not consistent. With smoother data, the clustering structure improved for Average and Complete linkage and the results across the three clustering techniques were relatively more consistent. Similar groups of species could be identified. The results from Average and Complete linkage were similar (Figures 4.7a, 4.7b), except that Average linkage produced some outlying observations. However, the clustering structure between the constituent groups of species was different for Ward's linkage (Figure 4.8).

Data                           Average        Complete       Ward's
                               I      II      I      II      I      II
Full data set                  0.82   0.87    0.67   0.74    0.75   0.61
Aggregated by stations         0.82   0.84    0.79   0.74    0.76   0.64
Aggregated by subrectangles    0.81   0.83    0.79   0.79    0.75   0.66
50% Subsample                  0.80   0.83    0.74   0.79    0.75   0.69
25% Subsample                  0.80   0.83    0.75   0.68    0.75   0.65
10% Subsample                  0.78   0.82    0.70   0.61    0.66   0.63

Table 4.1: Cophenetic Correlation Coefficient for Analysis I (Correlation distance) and II (Bray-Curtis distance)

Data                           Average        Complete       Ward's
                               I      II      I      II      I      II
Full data set                  0.49   0.44    0.62   0.62    0.82   0.75
Aggregated by stations         0.66   0.55    0.75   0.65    0.90   0.83
Aggregated by subrectangles    0.70   0.61    0.77   0.63    0.91   0.85

Table 4.2: Agglomerative Coefficient for Analysis I (Correlation distance) and II (Bray-Curtis distance)

[Figure 4.1: dendrograms of species, panels (a) and (b); axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.1: Dendrogram of species assemblage for the Icelandic Groundfish (IGF) survey from 1998-2007 using (a) Average linkage and (b) Complete linkage, with correlation dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising all tow collections. The rectangles highlight the clusters with AU > 0.9. The AU values used for interpretation are indicated in blue and the cluster number (edge) is marked in green.

[Figure 4.2: dendrogram of species; axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.2: Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the clusters with AU > 0.9.

[Figure 4.3: dendrograms of species, panels (a) and (b); axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.3: Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage, with correlation dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the identified species assemblages for comparison.

[Figure 4.4: dendrogram of species; axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.4: Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the identified species assemblages for comparison.

[Figure 4.5: dendrograms of species, panels (a) and (b); axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.5: Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with Bray-Curtis dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.

[Figure 4.6: dendrogram of species; axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.6: Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.

[Figure 4.7: dendrograms of species, panels (a) and (b); axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.7: Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with Bray-Curtis dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and standardised by range. The rectangles highlight the identified species assemblages for comparison.

[Figure 4.8: dendrogram of species; axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.8: Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and standardised by range. The rectangles highlight the identified species assemblages for comparison.

4.2 Sample size effect

4.2.1 Analysis I: Correlation distance

For this part, Average linkage performed well down to a subsample of 25%, with some minor changes in the clustering structure of the species. On the other hand, Complete linkage gave unstable results, whereas Ward's linkage performed well down to a 10% subsample. Two main observations can be made in all three cases. The probability values decreased with smaller sample size, leading to many clusters being insignificant, and the CPCC for all linkage techniques generally decreased with decreasing sample size (Table 4.1). Some more detailed observations for the three clustering methods are outlined below.

Average linkage

The 50% subsample gave very similar assemblage groupings to the total sample size. Three clusters were identified at a dissimilarity of 1 (edge 34, 36 & 37 in Figures 4.1a and 4.9a). The 25% subsample gave similar results, except that the species group containing blue whiting, blue ling and greater argentine clustered with a different group of species (Figure 4.9b). At a subsample of 10%, the clusters containing cod and Greenland halibut (edge 36; Figure 4.10) were similar; however, the clustering for the rest of the species changed. The probability values decreased with decreasing sample size. The CPCC decreased from 0.82 for the largest sample to 0.78 for the smallest sample (Table 4.1).

Complete linkage

Data aggregated by stations were used to compare the sample sizes in this case as they gave relatively more consistent results. Additionally, the results obtained from these were similar to the results obtained from the other two clustering techniques; therefore this was considered more reliable for comparison. Reducing the sample size had an effect on the assemblages obtained from this method. Even though the results from the 25% subsample were similar (Figures 4.3b & 4.11b), the 50% subsample gave some inconsistent results; for example, the cluster containing cod (edge 34; Figure 4.11a) had a different clustering structure. At 10% subsample the clustering was significantly different (Figure 4.12). The probability values decreased with decreasing sample size. The CPCC decreased from 0.79 for the largest sample to 0.70 for the smallest sample (Table 4.1).

Ward's linkage

For the 50% and 25% subsamples the results were similar, with lumpfish, skate and dogfish being exceptions (Figures 4.2, 4.13a & 4.13b). At 10% subsample, snake blenny was an exception to the general clustering structure (Figure 4.14). The probability values of the clusters decreased significantly with fewer samples. The CPCC values were consistent down to the 25% subsample at 0.75 but decreased to 0.66 with a further reduction in the sample size (Table 4.1).

4.2.2 Analysis II: Bray-Curtis distance

For this distance measure, Average and Ward's linkage performed relatively better than Complete linkage.

Average linkage performed consistently at 50% subsample, although some species were unstable in clusters (Figures 4.5a and 4.15a). Some inconsistencies were observed at 25% subsample; the overall structure was nevertheless similar (Figure 4.15b), but it changed considerably at 10% sample size (Figure 4.16).

Complete linkage performed consistently at 50% subsample, although some species were unstable in clusters (Figures 4.7b and 4.17a). The assemblages were considerably different at 25% and 10% subsample (Figures 4.17b & 4.18).

Ward's linkage performed relatively well at 50% and 25% subsample, with some exceptions (Figures 4.6, 4.19a and 4.19b). At 10% subsample the assemblages were considerably different (Figure 4.20).

Here again, the CPCC values decreased gradually with decreasing sample size for

all techniques (Table 4.1) and the probability values of the clusters also decreased.

[Figure 4.9: dendrograms of species, panels (a) and (b); axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.9: Dendrogram of species assemblage using Average linkage with correlation dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Figure 4.10: dendrogram of species; axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.10: Dendrogram of species assemblage using Average linkage with correlation dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Figure 4.11: dendrograms of species, panels (a) and (b); axis: Dissimilarity; edges annotated with AU and BP values and edge numbers]

Figure 4.11: Dendrogram of species assemblage using Complete linkage with correlation dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrogram; axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.12: Dendrogram of species assemblage using Complete linkage with correlation dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and scaled to 0 mean and variance 1, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrograms, panels (a) and (b); axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.13: Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrogram; axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.14: Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrograms, panels (a) and (b); axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.15: Dendrogram of species assemblage using Average linkage with Bray-Curtis dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and standardised by range, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrogram; axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.16: Dendrogram of species assemblage using Average linkage with Bray-Curtis dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and standardised by range, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrograms, panels (a) and (b); axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.17: Dendrogram of species assemblage using Complete linkage with Bray-Curtis dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and standardised by range, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrogram; axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.18: Dendrogram of species assemblage using Complete linkage with Bray-Curtis dissimilarity measure. The data consist of mean species abundance in numbers by stations, fourth root transformed and standardised by range, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrograms, panels (a) and (b); axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.19: Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and standardised by range, comprising (a) a 50% random subsample and (b) a 25% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.

[Dendrogram; axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.20: Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. The data consist of species abundance in numbers, fourth root transformed and standardised by range, comprising a 10% random subsample of the total tow collections. The rectangles highlight the clusters with AU > 0.9.


4.3 Data Aggregation (smoothing) effect

4.3.1 Analysis I: Correlation distance

The level at which the data were aggregated had an effect, particularly on Complete linkage. With the full data set, the clusters were not very well defined; the definition improved with data smoothing, and the probability values increased slightly as well (Figures 4.1b, 4.3b & 4.21b). For Complete linkage, the CPCC was considerably higher for aggregated data than for the full data set. The CPCC for Average and Ward's linkage did not show any considerable difference with data smoothing (Table 4.1). The overall assemblage patterns for these two linkage methods were comparable across the different data aggregations, with a few species moving between the clusters. These are illustrated in Figures 4.1a, 4.3a & 4.21a for Average linkage and Figures 4.2, 4.4 & 4.22 for Ward's linkage.
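As a minimal sketch of how a cophenetic correlation coefficient (CPCC) comparison of this kind can be computed, the following uses SciPy on hypothetical count data. The function name `cpcc_by_linkage`, the per-species scaling, and the demo matrix are assumptions for illustration and are not taken from the thesis; SciPy's Ward update is formally defined for Euclidean distances, so applying it to a correlation dissimilarity here is an approximation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist


def cpcc_by_linkage(abundance, methods=("average", "complete", "ward")):
    """Return {method: CPCC} for a species-by-sample abundance matrix."""
    x = np.asarray(abundance, dtype=float) ** 0.25  # fourth root transform
    # Scale each species (row) to zero mean and unit variance (assumed axis).
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)
    d = pdist(x, metric="correlation")  # 1 - Pearson r between species
    # cophenet returns (CPCC, cophenetic distances); keep only the coefficient.
    return {m: float(cophenet(linkage(d, method=m), d)[0]) for m in methods}


# Hypothetical data: 8 species observed in 40 tows (not the survey data).
rng = np.random.default_rng(0)
demo = rng.poisson(lam=rng.uniform(1, 20, size=(8, 1)), size=(8, 40))
scores = cpcc_by_linkage(demo)
```

A CPCC close to 1 indicates that the dendrogram heights reproduce the original dissimilarities well, which is the sense in which the linkage methods are compared in Table 4.1.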

The dissimilarity levels at which the clusters formed were lower when the data were aggregated by stations, for Average and Complete linkage. Further data aggregation by subrectangles did not result in any significant changes in the clustering levels. The AC values increased considerably when the data were aggregated by stations for all three linkage methods. However, no considerable changes were observed when data were further aggregated by subrectangles (Table 4.2).
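The agglomerative coefficient (AC) can be sketched from a linkage matrix, assuming the usual Kaufman and Rousseeuw definition: for each observation, take the height of its first merge divided by the height of the final merge, and average one minus that ratio. The helper name `agglomerative_coefficient` and the toy points below are illustrative assumptions, not the thesis code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def agglomerative_coefficient(Z):
    """AC = mean over leaves of 1 - (first merge height / final merge height)."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0] + 1  # number of original observations
    first_merge = np.empty(n)
    for left, right, height, _ in Z:
        for node in (int(left), int(right)):
            if node < n:  # indices below n are original observations (leaves)
                first_merge[node] = height
    return float(np.mean(1.0 - first_merge / Z[-1, 2]))


# Two tight, well-separated groups give an AC close to 1 (hypothetical points).
points = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
ac_tight = agglomerative_coefficient(linkage(points, method="average"))

# Three points on a line (0, 1, 3) with complete linkage give AC = 4/9.
line = np.array([[0.0], [1.0], [3.0]])
ac_line = agglomerative_coefficient(linkage(line, method="complete"))
```

Because the AC depends on the ratio of early to final merge heights, it tends to rise when smoothing lowers the heights at which species first join a cluster.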

The probability of clustering generally decreased with data smoothing. Ward's linkage performed well across all three data aggregation levels, with the highest probability of clustering for the full data set, indicating the greatest consistency in generated clusters across bootstraps.
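The idea of cluster consistency across bootstraps can be illustrated with a simplified sketch that computes only the ordinary bootstrap probability (BP) of each cluster. It does not reproduce the multiscale resampling behind the AU values, and the function names, defaults, and synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def clusters_of(Z, n):
    """Return all clusters (frozensets of leaf indices) formed in linkage Z."""
    members = {i: frozenset([i]) for i in range(n)}
    found = set()
    for k, (a, b, _, _) in enumerate(Z):
        members[n + k] = members[int(a)] | members[int(b)]
        found.add(members[n + k])
    return found


def bootstrap_probability(x, method="average", metric="correlation",
                          n_boot=200, seed=0):
    """Fraction of bootstrap dendrograms in which each reference cluster recurs."""
    x = np.asarray(x, dtype=float)
    n_species, n_tows = x.shape
    rng = np.random.default_rng(seed)
    reference = clusters_of(linkage(pdist(x, metric), method), n_species)
    counts = dict.fromkeys(reference, 0)
    for _ in range(n_boot):
        cols = rng.integers(0, n_tows, size=n_tows)  # resample tows with replacement
        boot = clusters_of(linkage(pdist(x[:, cols], metric), method), n_species)
        for cluster in reference & boot:
            counts[cluster] += 1
    return {cluster: counts[cluster] / n_boot for cluster in reference}


# Synthetic example: two groups of three "species" following two gradients.
rng = np.random.default_rng(1)
g1, g2 = rng.normal(size=60), rng.normal(size=60)
demo = np.vstack([g1 + 0.1 * rng.normal(size=60) for _ in range(3)]
                 + [g2 + 0.1 * rng.normal(size=60) for _ in range(3)])
bp = bootstrap_probability(demo, n_boot=100)
```

Clusters that reappear in nearly every resampled dendrogram receive a BP near 1; weakly supported clusters receive lower values.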

4.3.2 Analysis II: Bray-Curtis distance

The structure of the assemblages was sensitive to data aggregation for all three linkage techniques, in particular for Complete linkage. The probability of the clusters increased with increased data smoothing for all three linkage techniques. These are illustrated in Figures 4.5a, 4.7a and 4.23a for Average linkage; Figures 4.5b, 4.7b and 4.23b for Complete linkage; and Figures 4.6, 4.8 and 4.24 for Ward's linkage.

The CPCC increased for Complete and Ward's linkage but decreased slightly for

Average linkage (Table 4.1). The AC values increased with data smoothing for all

linkage techniques (Table 4.2) together with the probability values for the clusters.
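The aggregation and Bray-Curtis steps discussed in this section can be sketched as follows. This assumes that "standardised by range" means rescaling each species to the interval 0 to 1 after the fourth root transform; the function names, station labels, and counts are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def aggregate_mean(abundance, groups):
    """Average the tow columns that share a group label (e.g. a station)."""
    abundance = np.asarray(abundance, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    means = np.column_stack(
        [abundance[:, groups == g].mean(axis=1) for g in labels]
    )
    return labels, means


def bray_curtis_between_species(abundance):
    """Bray-Curtis dissimilarity matrix between species (rows)."""
    x = np.asarray(abundance, dtype=float) ** 0.25  # fourth root transform
    lo = x.min(axis=1, keepdims=True)
    span = x.max(axis=1, keepdims=True) - lo
    x = (x - lo) / np.where(span > 0, span, 1.0)  # range standardisation (assumed form)
    return squareform(pdist(x, metric="braycurtis"))


# Hypothetical counts: 3 species in 6 tows taken at 3 stations.
tows = [[16, 0, 81, 1, 0, 0],
        [1, 1, 16, 16, 81, 81],
        [0, 0, 256, 0, 16, 16]]
stations = ["s1", "s1", "s2", "s2", "s3", "s3"]
labels, means = aggregate_mean(tows, stations)
bc = bray_curtis_between_species(means)
```

Averaging tows within a station removes much of the tow-to-tow noise before the dissimilarities are computed, which is the smoothing effect examined here.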

[Dendrograms, panels (a) and (b); axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.21: Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with correlation dissimilarity measure. The data consist of mean species abundance in numbers by statistical subrectangles, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the clusters with AU > 0.9.

[Dendrogram; axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.22: Dendrogram of species assemblage using Ward's linkage with correlation dissimilarity measure. The data consist of mean species abundance in numbers by statistical subrectangles, fourth root transformed and scaled to 0 mean and variance 1. The rectangles highlight the clusters with AU > 0.9.

[Dendrograms, panels (a) and (b); axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.23: Dendrogram of species assemblage using (a) Average linkage and (b) Complete linkage with Bray-Curtis dissimilarity measure. The data consist of mean species abundance in numbers by statistical subrectangles, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.

[Dendrogram; axis: Dissimilarity; edge labels: AU, BP, edge #]

Figure 4.24: Dendrogram of species assemblage using Ward's linkage with Bray-Curtis dissimilarity measure. The data consist of mean species abundance in numbers by statistical subrectangles, fourth root transformed and standardised by range. The rectangles highlight the clusters with AU > 0.9.


Summary

For both Analysis I (Correlation distance) and Analysis II (Bray-Curtis distance), the following holds for the linkage methods. Average linkage always gave the highest CPCC, followed by Complete and then Ward's linkage. Complete linkage was the most sensitive method, giving indefinite patterns with the full data set and with deviations in sample size. Average and Ward's linkage were to some extent sensitive to data aggregation, but less so than Complete linkage, and were more stable when the sample size was altered. The CPCC and the probability of clusters decreased with decreasing sample size for all three linkage techniques.

Ward's linkage always gave the highest AC followed by Complete then Average

linkage. The AC always increased with data aggregation.

The Bray-Curtis distance measure worked better with aggregated data yielding

higher p-values for the clusters. The Correlation distance measure worked better

with the full data set for Ward's linkage. This was based on the reliability of the

clusters in terms of their probability values.

4.4 Comparison of hierarchical clustering with non-metric multidimensional scaling

The NMDS ordination of species with the Bray-Curtis dissimilarity measure resulted in a high stress of 8.87% in three dimensions for the full data set. The ordination was repeated for the two data aggregation levels and produced stresses of 7.85% and 7.49%, respectively. Ordination of the full data set did not produce any distinct groupings (Figure 4.25a). With smoother data (aggregated by statistical subrectangles), four species groups could be identified (Figure 4.25b), which were similar to the outcome from the hierarchical clustering, particularly to Ward's linkage (Figure 4.24), on the same level of data aggregation.
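For reference, and assuming the reported stress values are Kruskal's stress (type 1) expressed as a percentage, the quantity can be written as below. The symbols are introduced here for illustration: $d_{ij}$ is the distance between species $i$ and $j$ in the ordination, and $\hat{d}_{ij}$ is the disparity obtained from a monotone regression of the $d_{ij}$ on the original dissimilarities.

```latex
\[
  S \;=\; \sqrt{\frac{\sum_{i<j} \bigl(d_{ij} - \hat{d}_{ij}\bigr)^{2}}
                     {\sum_{i<j} d_{ij}^{2}}}
\]
```

Under this definition a lower value of $S$ means that the rank order of the ordination distances agrees more closely with the rank order of the original dissimilarities.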

[NMDS ordination plots of species; panel (a) Stress = 8.87; panel (b) Stress = 7.49]

Figure 4.25: Multidimensional scaling using Bray-Curtis distance measure for (a) the full data set (comprising all tow collections) and (b) data aggregated by statistical sub-rectangle. Species abundance in numbers was fourth root transformed and standardised by range.


4.5 Fish Assemblages in relation to environmental variables

4.5.1 Analysis I: Correlation distance

The classifications from Ward's linkage, carried out on the full data set, were related to two environmental variables, depth and geographic location, to examine a possible ecological rationale for the assemblages obtained. Two discrete clusters were obtained with high probabilities (AU > 0.9) (edges 37 & 38, Figure 4.2). Each of these clusters was further divided into two, so that essentially four species assemblages were obtained. The first assemblage (A) comprised halibut, plaice, dab, monkfish, lemon sole, witch, fourbeaded rockling, whiting and haddock, clustering at AU = 0.86. The second assemblage (B) consisted of tusk, saithe, redfish, ling, norway haddock, megrim, norway pout, blueling, blue whiting, greater argentine, skate, dogfish and deepwater redfish at AU = 0.90. The third assemblage (C) was Atlantic wolffish, moustache sculpin, thorny skate, vahl's eelpout, cod, spotted wolffish, lumpfish, long rough dab and snake blenny at AU = 0.98. The fourth assemblage (D) comprised greenland halibut, esmark's eelpout, polar sculpin, Atlantic sculpin, lycodes sp., arctic rockling, Atlantic poacher, longfin snailfish and polar cod at AU = 0.88. The Latin names for the fish species are outlined in Table A.1 in the Appendix.

These assemblages could be related to environmental parameters such as the depth and geographic distribution of the species. The species that clustered together also had similar geographical distributions. Assemblages A and B were characterised as species found in the southern region (Figure 4.26). In relation to the mean depths of the species, the first assemblage was defined as shallow to intermediate, with depths ranging from 50 m to 200 m. The second assemblage was defined as intermediate to deep, with a mean depth range of 180 m to 340 m. Assemblages C and D characterised the northern region (Figure 4.26), where assemblage C was categorised as shallow to intermediate, with a depth range of 150 m to 250 m, and assemblage D was defined as deep, ranging from 280 m to more than 400 m. This relationship between depth and the identified assemblages is demonstrated in Figure 4.27, where the weighted depths and standard deviations for each species are outlined and each species is assigned to its respective cluster.
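A weighted depth and its standard deviation for a single species can be sketched as follows, assuming that the weights are the species' abundance in each tow. The thesis does not spell out the weighting here, so this is an illustrative assumption, and the function name and numbers are hypothetical.

```python
import numpy as np


def weighted_depth(depths, abundance):
    """Abundance-weighted mean depth and weighted standard deviation."""
    depths = np.asarray(depths, dtype=float)
    weights = np.asarray(abundance, dtype=float)
    mean = np.average(depths, weights=weights)
    variance = np.average((depths - mean) ** 2, weights=weights)
    return float(mean), float(np.sqrt(variance))


# Hypothetical species caught at three tow depths (m) with counts 1, 2 and 1.
mean_depth, sd_depth = weighted_depth([50, 100, 200], [1, 2, 1])
```

With these numbers the weighted mean is 112.5 m, pulled towards the 100 m tow because it holds half of the catch.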

The box and whisker plot in Figure 4.28a shows the data on which a one-way
ANOVA was performed to investigate statistical differences in the mean depths of
the species comprising the assemblages. The ANOVA showed that the mean depths
at which the assemblages occurred were significantly different (F = 41.282,
df = 3, P < 0.05). The Tukey multiple comparisons test showed that assemblages A,
B and D were significantly different from each other, but assemblages B and C were
not significantly different (Figure 4.28b).
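The F statistic of a one-way ANOVA such as the one reported above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the thesis: the function name is hypothetical and the depths are invented. The p-value and the Tukey comparisons need the F and studentized range distributions, which are not computed here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mean depths (m) of the species in three assemblages.
depths_by_assemblage = [
    [60.0, 90.0, 120.0],
    [190.0, 220.0, 250.0],
    [300.0, 330.0, 360.0],
]
f_stat, df_b, df_w = one_way_anova(depths_by_assemblage)
```

The between-group degrees of freedom equal the number of assemblages minus one, which matches the df of 3 reported for four assemblages. In practice a statistics library (for example `scipy.stats.f_oneway`) would also return the p-value.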

Average linkage gave similar assemblages when applied to data aggregated by
stations, although some species from assemblage C became part of assemblage D,
and skate and dogfish moved to cluster A (Figure 4.3a). Complete linkage gave
similar clusters, with long rough dab, snake blenny, skate and dogfish being
exceptions. The probabilities of the clusters were slightly lower than those
from Average linkage (Figure 4.3b).
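The difference between the two linkage criteria compared above can be sketched as below. This is a minimal illustration of how the distance between two clusters is defined under average linkage (UPGMA) and complete linkage. The function names, the one-dimensional data and the absolute-difference distance are hypothetical choices made for brevity.

```python
def average_linkage(cluster_a, cluster_b, dist):
    """Mean of all pairwise distances between members of two clusters (UPGMA)."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def complete_linkage(cluster_a, cluster_b, dist):
    """Largest pairwise distance between members of two clusters."""
    return max(dist(a, b) for a in cluster_a for b in cluster_b)


def abs_diff(a, b):
    """Hypothetical distance: absolute difference between two numbers."""
    return abs(a - b)


# Two hypothetical clusters of one-dimensional points.
left, right = [0.0, 2.0], [5.0, 9.0]
avg_d = average_linkage(left, right, abs_diff)
max_d = complete_linkage(left, right, abs_diff)
```

Because complete linkage uses the farthest pair, a single distant member can delay a merge that average linkage would make earlier. This is one way the two methods can place individual species differently.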

4.5.2 Analysis II: Bray-Curtis distance

Three assemblages were identified by Ward's linkage on data aggregated by
statistical subrectangles. The first assemblage (A*) comprised halibut, plaice,
dab, monkfish, lemon sole, fourbearded rockling, whiting, witch, tusk, saithe,
redfish, ling, Norway haddock, megrim, Norway pout, blue ling, blue whiting,
greater argentine, skate and dogfish with AU=0.94. The second assemblage (B*)
comprised cod, spotted wolffish, Vahl's eelpout, moustache sculpin, long rough
dab, thorny skate, lumpfish, haddock and Atlantic wolffish with a probability of
0.94, and the third assemblage (C*) consisted of deepwater redfish, polar cod,
Greenland halibut, Lycodes sp., Arctic rockling, longfin snailfish, Atlantic
poacher, Atlantic sculpin, polar sculpin and Esmark's eelpout with AU=0.88
(Figure 4.24).
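For reference, the Bray-Curtis dissimilarity that underlies this analysis can be sketched as follows. This is an illustration only: it assumes two species are compared on their catches across the same set of statistical subrectangles, and the function name and data are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    Computed as sum(|x_i - y_i|) / sum(x_i + y_i); 0 means identical,
    1 means the two vectors share no occupied units.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both vectors contain no catch")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


# Hypothetical catches of two species across four subrectangles.
species_1 = [10.0, 0.0, 5.0, 5.0]
species_2 = [5.0, 5.0, 0.0, 10.0]
d = bray_curtis(species_1, species_2)
```

Subrectangles where both species are absent contribute nothing to either sum, so joint absences do not make two species appear more similar.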

The relationship between depth and the identified assemblages is demonstrated
in Figure 4.29, where the weighted depths and standard deviations for each species
are outlined and each species is assigned to its respective cluster. The box and
whisker plot in Figure 4.28c shows the data on which a one-way ANOVA was
performed to test for statistical differences in the mean depths of the species
comprising the assemblages. The ANOVA showed that the mean depths at which the
assemblages occurred were significantly different (F = 26.398, df = 2, P < 0.05).
The Tukey multiple comparisons test showed that the difference lay between
assemblages A and C; assemblages A and B were not significantly different
(Figure 4.28d). The species separated broadly into north and south divisions in
relation to geographic location (Figure 4.26).

The Average linkage gave two significant clusters when applied to data aggregated
by statistical subrectangles. One of the assemblages was similar to assemblage C*
defined above, with a probability of 0.96. The rest of the species grouped together
with a probability of 0.99, with two outliers (Figure 4.23a). Complete linkage gave
two distinct groups, with a probability of > 0.80, following the north and south
divisions, except that species such as whiting, witch and fourbearded rockling
grouped with the cod cluster instead (Figure 4.23b).

[Figure: map panels by species; axes 63° to 67° latitude and 10° to 28° longitude. Panels: Haddock, Tusk, Saithe, Redfish, Ling, Norway haddock, Megrim, Norway pout, Blue ling, Blue whiting, Greater argentine, Deepwater redfish, Skate, Dogfish, Atlantic wolffish, Moustache sculpin, Lumpfish, Long rough dab, Snake blenny, Thorny skate, Cod, Spotted wolffish, Vahl's eelpout, Greenland halibut, Esmark's eelpout, Polar sculpin, Atlantic sculpin.]

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ● ●● ●

●●

●●

Lyco

des

sp.

Page 83: Robustness of three hierarchical agglomerative clustering

4.5 Fish Assemblages in relation to environmental variables 67

63°

64°

65°

66°

67°

28°

26°

24°

22°

20°

18°

16°

14°

12°

10°

* **

**

**

***

**

**

**

**

**

* * **

**

**

**

* **

**

**

**

**

***

***

**

** *

***

* * **

***

**

**

**

**

**

**

**

** **

**

**

*

* **

*

**

**

**

*

***

*

*

**

* *

* **

** *

*

**

*

**

*** *

**

**

* **

*

**

* **

***

**

*

*

* **

*

* *

* **

*

**

*** *

**

**

* **

* **

**

**

**

**

** *

**

** *

**

**

***

* * ***

* **

**

**

* **

** *

**

* **

*

**

**

**

*

**

**

*

**

**

**

**

**

**

** *

* * **

**

**

**

**

**

**

** *

**

**

**

**

*

**

**

* **

**

* **

* **

**

**

**

*

● ●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●● ●

●●

● ●

●●

●●

●●

●●

●●

● ●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

● ●●

● ●●

●●

●●

●●

Art

ic r

ockl

ing

63°

64°

65°

66°

67°

28°

26°

24°

22°

20°

18°

16°

14°

12°

10°

* **

**

**

***

**

**

**

**

**

* * **

**

**

**

* **

**

**

**

**

***

***

**

** *

***

* * **

***

**

**

**

**

**

**

**

** **

**

**

*

* **

*

**

**

**

*

***

*

*

**

* *

* **

** *

*

**

*

**

*** *

**

**

* **

*

**

* **

***

**

*

*

* **

*

* *

* **

*

**

*** *

**

**

* **

* **

**

**

**

**

** *

**

** *

**

**

***

* * ***

* **

**

**

* **

** *

**

* **

*

**

**

**

*

**

**

*

**

**

**

**

**

**

** *

* * **

**

**

**

**

**

**

** *

**

**

**

**

*

**

**

* **

**

* **

* **

**

**

**

*

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●● ● ●

●●● ● ●

●●

●●

●●

Atla

ntic

poa

cher

63°

64°

65°

66°

67°

28°

26°

24°

22°

20°

18°

16°

14°

12°

10°

* **

**

**

***

**

**

**

**

**

* * **

**

**

**

* **

**

**

**

**

***

***

**

** *

***

* * **

***

**

**

**

**

**

**

**

** **

**

**

*

* **

*

**

**

**

*

***

*

*

**

* *

* **

** *

*

**

*

**

*** *

**

**

* **

*

**

* **

***

**

*

*

* **

*

* *

* **

*

**

*** *

**

**

* **

* **

**

**

**

**

** *

**

** *

**

**

***

* * ***

* **

**

**

* **

** *

**

* **

*

**

**

**

*

**

**

*

**

**

**

**

**

**

** *

* * **

**

**

**

**

**

**

** *

**

**

**

**

*

**

**

* **

**

* **

* **

**

**

**

*

● ●●

● ●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●● ●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●● ●

●● ●

●●

●●

●● ●

●●

●●

●●

●●

●●

●● ●●

● ●● ●

●●

●●

●●

●●

Long

fin s

nailf

ish

63°

64°

65°

66°

67°

28°

26°

24°

22°

20°

18°

16°

14°

12°

10°

* **

**

**

***

**

**

**

**

**

* * **

**

**

**

* **

**

**

**

**

***

***

**

** *

***

* * **

***

**

**

**

**

**

**

**

** **

**

**

*

* **

*

**

**

**

*

***

*

*

**

* *

* **

** *

*

**

*

**

*** *

**

**

* **

*

**

* **

***

**

*

*

* **

*

* *

* **

*

**

*** *

**

**

* **

* **

**

**

**

**

** *

**

** *

**

**

***

* * ***

* **

**

**

* **

** *

**

* **

*

**

**

**

*

**

**

*

**

**

**

**

**

**

** *

* * **

**

**

**

**

**

**

** *

**

**

**

**

*

**

**

* **

**

* **

* **

**

**

**

*

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

● ●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

pola

r co

d

Figure4.26:Geographicaldistribution

ofthe40

speciesanalysed

forthisstudy,

labelledaccordingly.

The

bubble

plot

show

sthemeanabundanceofspeciesby

statisticalsubrectangles

averaged

acrossyears.The

size

ofcirclesareproportional

tothesquare

root

ofthemeanabundance.

[Figure 4.27: plot of weighted average depth and standard deviation per species. Axis: Depth (m), 0 to 600. Species and assemblage labels, in plotted order: dab A; dogfish A; plaice A; lemon sole A; halibut A; whiting A; haddock A; atlantic wolffish C; lumpfish C; monkfish A; snake blenny C; witch A; saithe B; long rough dab C; fourbeaded rockling A; skate A; moustache sculpin C; norway pout B; thorny skate C; cod C; norway haddock B; redfish B; tusk B; ling B; vahl's eelpout C; megrim B; spotted wolffish C; polar cod D; polar sculpin D; blue whiting B; blueling B; atlantic poacher D; arctic rockling D; greater argentine D; longfin snailfish D; atlantic sculpin D; deepwater redfish D; greenland halibut D; esmark's eelpout D; lycodes sp D.]

Figure 4.27: Weighted average depths and standard deviations for the 40 species analysed. A-D refers to the identified fish assemblages from Ward's hierarchical clustering based on correlation distance.

[Figure 4.28: four panels. (a) Box and whisker plot of Depth (100 to 400) by Assemblage (A, B, C, D). (b) Tukey plot at the 95% family-wise confidence level showing differences in mean levels of Assemblage for the pairs B-A, C-A, D-A, C-B, D-B and D-C. (c) Box and whisker plot of Depth (100 to 400) by Assemblage (A, B, C). (d) Tukey plot at the 95% family-wise confidence level showing differences in mean levels of Assemblage for the pairs B-A, C-A and C-B.]

Figure 4.28: (a) Box and whisker plot for the mean depths of species in the identified fish assemblages from Ward's hierarchical clustering based on correlation distance (b) Tukey test results showing the significant difference between the identified fish assemblages (c) Box and whisker plot for the mean depths of species in the identified fish assemblages from Ward's hierarchical clustering based on Bray-Curtis distance (d) Tukey test results showing the significant difference between the identified fish assemblages from (c)

[Figure 4.29: plot of weighted average depth and standard deviation per species. Axis: Depth (m), 0 to 600. Species and assemblage labels, in plotted order: dab A*; dogfish A*; plaice A*; lemon sole A*; halibut A*; whiting A*; haddock B*; atlantic wolffish B*; lumpfish B*; monkfish A*; snake blenny A*; witch A*; saithe A*; long rough dab B*; fourbeaded rockling A*; skate A*; moustache sculpin B*; norway pout A*; thorny skate B*; cod B*; norway haddock A*; redfish A*; tusk A*; ling A*; vahl's eelpout B*; megrim A*; spotted wolffish B*; polar cod C*; polar sculpin C*; blue whiting A*; blueling A*; atlantic poacher C*; arctic rockling C*; greater argentine A*; longfin snailfish C*; atlantic sculpin C*; deepwater redfish C*; greenland halibut C*; esmark's eelpout C*; lycodes sp C*.]

Figure 4.29: Weighted average depths and standard deviations for the 40 species analysed. A*-C* refers to the identified fish assemblages from Ward's hierarchical clustering based on Bray-Curtis distance.

4.6 Habitat Classification

4.6.1 Analysis I: Correlation distance

The Average and Ward linkages yielded similar results. When the dendrogram of subrectangles was split into 5 clusters, a separation along the north-west to south-east gradient was obtained, with clusters 1, 4 & 5 in the north and clusters 2 & 3 in the south. The north and south areas separated further along the depth gradient (Figure 4.30a). The output from Ward's linkage is presented here; the output from Average linkage is given in the Appendix (Figure A.1a). The outcome from the Complete linkage was different and is outlined in the Appendix (Figure A.1b).

The species composition of the five clusters is delineated in Figure 4.31. Cluster 1 mainly comprised greenland halibut, blue whiting, atlantic poacher, deepwater redfish, esmark's eelpout, longfin snailfish, polar cod, atlantic sculpin, vahl's eelpout, polar sculpin, arctic rockling, lycodes sp. and some atlantic cod, thorny skate and spotted wolffish. Cluster 5 mainly comprised atlantic wolffish, lumpfish, moustache sculpin, vahl's eelpout, cod, haddock, spotted wolffish, tusk, long rough dab, snake blenny and some of the species in cluster 1. Cluster 4 consisted of haddock, whiting, thorny skate, plaice, witch, long rough dab, lumpfish, fourbeaded rockling, vahl's eelpout, snake blenny and some cod. Cluster 3 consisted of haddock, atlantic wolffish, monkfish, dogfish, halibut, plaice, lemon sole, dab, lumpfish and moustache sculpin. Cluster 2 contained haddock, saithe, whiting, redfish, ling, blueling, tusk, monkfish, skate, dogfish, greater argentine, halibut, lemon sole, witch, megrim, norway pout, blue whiting, fourbeaded rockling and norway haddock. The species codes shown in Figure 4.31 are outlined in Table A.1 in the Appendix.

4.6.2 Analysis II: Bray-Curtis distance

The Bray-Curtis distance with Ward's linkage showed a definition along the north and south areas, with some separation along the depth gradient within these areas (Figure 4.30b). The Complete linkage gave similar patterns (Appendix, Figure A.2b). The Average linkage, however, separated only the north and south areas; a definition according to depth was not apparent (Appendix, Figure A.2a). The species compositions of the different clusters are delineated in Figure 4.32.


4.7 Heatmap

A heatmap of the association between species and areas is shown in Figure 4.33. The map shows a pair-wise display of two dendrograms which were generated using the Average linkage technique based on Analysis I: Correlation distance. The species assemblage dendrogram is on the y-axis and the dendrogram of areas is on the x-axis. The spectrum of colours, ranging from blue (low ratios) to red (high ratios), gave three main patches of high-ratio colours indicating the species-environment patterns. Thus it can be seen that the species relationships were reflected by the spatial relationships.
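The data preparation behind Analysis I (fourth-root transformation, scaling to zero mean and unit variance, correlation distance) can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the toy counts and the orientation of the matrix (rows as areas, columns as species) are assumptions made for the example.

```python
from math import sqrt

def fourth_root(matrix):
    """Apply the fourth-root transformation to every abundance value."""
    return [[x ** 0.25 for x in row] for row in matrix]

def scale_columns(matrix):
    """Scale each column to mean 0 and sample variance 1."""
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / len(col)
        sd = sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        # A constant column carries no information; map it to zeros.
        scaled_cols.append([(x - mean) / sd if sd else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]

def correlation_distance(u, v):
    """Return 1 - Pearson correlation (0 = same profile, 2 = opposite)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 1.0  # assumption: treat a constant profile as uncorrelated
    return 1.0 - sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

# Hypothetical counts: 4 areas (rows) by 3 species (columns).
counts = [[16, 0, 81], [1, 16, 0], [0, 81, 1], [81, 1, 16]]
scaled = scale_columns(fourth_root(counts))
species_profiles = [list(col) for col in zip(*scaled)]
```

The distances between the resulting species profiles (or, transposed, between areas) would then be passed to a linkage method to build each of the two dendrograms displayed around the heatmap.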

[Figure 4.30: two map panels, (a) and (b), of Icelandic waters. Axes: latitude 62°30' to 67°30', longitude 10° to 29°. Legend: cluster 1, cluster 2, cluster 3, cluster 4, cluster 5.]

Figure 4.30: Definition of areas in Icelandic waters using Ward's hierarchical clustering. The data consist of species abundance in numbers transformed to fourth root. Clustering was based on (a) correlation distance with data scaled to 0 mean and variance 1 (b) Bray-Curtis distance with data standardised by range.

[Figure 4.31: five panels, titled cluster 1 to cluster 5, each showing values per species code. Species codes on the horizontal axis: cod, had, sai, whi, red, lin, blu, tus, atw, tho, spo, mon, ska, dog, gra, hal, gre, pla, lem, wit, meg, dab, lrd, nor, blw, lum, mou, atp, fou, noh, der, esm, los, pol, ats, vah, pos, art, sna, lyc.]

Figure 4.31: Species composition of defined clusters from the habitat classification using Correlation distance measure and Ward's linkage. The species codes are outlined in Table 4 in the Appendix.

[Figure 4.32: five panels, titled cluster 1 to cluster 5, each showing values per species code. Species codes on the horizontal axis: cod, had, sai, whi, red, lin, blu, tus, atw, tho, spo, mon, ska, dog, gra, hal, gre, pla, lem, wit, meg, dab, lrd, nor, blw, lum, mou, atp, fou, noh, der, esm, los, pol, ats, vah, pos, art, sna, lyc.]

Figure 4.32: Species composition of defined clusters from the habitat classification using Bray-Curtis distance measure and Ward's linkage. The species codes are outlined in Table 4 in the Appendix.

[Figure 4.33: heatmap with a dendrogram on each axis. Columns are labelled with statistical rectangle codes; rows are labelled with the 40 species names. Colour key: Column Z-Score, from −2 to 4.]

Figure 4.33: A heatmap showing the species-area association for the Icelandic Groundfish (IGF) survey from 1998-2007 using Average linkage hierarchical clustering with correlation dissimilarity measure. The x-axis shows the dendrogram of areas (statistical rectangles) and the y-axis shows the dendrogram of species assemblage. Data consist of species abundance in numbers, fourth root transformed and scaled to 0 mean and variance 1. The colours range from blue (low ratios) to red (high ratios), indicating the strength of associations.

5 Discussion

A clustering algorithm will always generate a clustering structure even if no real structure is intrinsic to the data (Loganantharaj et al., 2006), and different clustering algorithms are likely to generate different results from the same data set. The problem becomes more complex when the choice of dissimilarity measure and the properties of the data themselves are taken into consideration, since these in turn influence the effectiveness of the algorithm (Loganantharaj et al., 2006). The issue becomes more difficult as the number of variables increases. Cluster validity has therefore been a subject of interest and importance in the field of molecular genetics for some decades now. However, substantive guidelines are not available on the choice of appropriate algorithms and distance metrics for ecological data. In the field of ecology the Average linkage technique is generally recommended in conjunction with the Bray-Curtis distance measure (Clarke and Warwick, 2001; Quinn and Keough, 2002).

A number of assessment criteria were used in this study to test the robustness of the three hierarchical agglomerative clustering techniques that are commonly applied in ecological studies: Average, Complete and Ward's linkage. According to the internal criterion of cluster validity and efficiency, the CPCC, Average linkage performed most efficiently for both modes of data analysis (Correlation and Bray-Curtis distance) and yielded the highest values of the coefficient. In theory, this would indicate that this linkage generated the classification most similar to the original dissimilarity patterns in the species and site matrix, since the CPCC is a basic correlation between the two matrices of dissimilarities, that is, prior to and subsequent to the clustering. Thus, by the definition of the CPCC criterion, the overall performance ranking of the clustering methods was Average followed by Complete then Ward's, although the CPCC values for Complete and Ward's linkage were not considerably lower. However, Complete linkage did not perform efficiently when applied to the full set of data.

On the other hand, based on the AC criterion, which measures the quality of the clustering, Ward's linkage outperformed the Average and Complete linkage for both modes of data analysis. The AC values for this method were always higher than those of the other two techniques. Thus this clustering technique gave the highest-quality clustering of the data set. This can be seen from the dendrograms, which had well-defined clusters forming at lower dissimilarity levels. Also, no outliers were produced by this technique. Ward's linkage is designed to give compact clusters that minimise the loss of information based on the sum-of-squares criterion (Ward, 1963). Thus, from a different perspective, this technique could impose clusters or patterns on a data set which are not truly there (Gauch Jr and Whittaker, 1981). Average and Complete linkage, on the other hand, gave clusters at high dissimilarity levels, and some species were always defined as outliers.
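To make the AC criterion concrete, the sketch below computes an agglomerative coefficient from a dissimilarity matrix. It assumes that AC here means the agglomerative coefficient of Kaufman and Rousseeuw (the average, over objects, of one minus the height of the object's first merge divided by the height of the final merge); that definition, the naive clustering routine and the toy matrices are illustrative assumptions, not the implementation used in the study. Only Average and Complete linkage are covered.

```python
from itertools import combinations

def agglomerate(d, method="average"):
    """Naive agglomerative clustering on a full symmetric dissimilarity matrix.

    Returns the merges as (members_a, members_b, height) in merge order.
    """
    clusters = [[i] for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pair = [d[i][j] for i in clusters[a] for j in clusters[b]]
            # Complete linkage uses the largest pairwise dissimilarity,
            # Average linkage (UPGMA) the mean of all pairwise values.
            h = max(pair) if method == "complete" else sum(pair) / len(pair)
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

def agglomerative_coefficient(d, method="average"):
    """Mean of 1 - (first-merge height / final-merge height) over all objects."""
    merges = agglomerate(d, method)
    final_height = merges[-1][2]
    first = {}
    for left, right, h in merges:
        for side in (left, right):
            if len(side) == 1:  # a singleton is merged for the first time
                first.setdefault(side[0], h)
    return sum(1 - first[i] / final_height for i in first) / len(first)

def two_group_matrix(within, between, size=3):
    """Hypothetical dissimilarities for two groups of equal size."""
    n = 2 * size
    return [[0.0 if i == j else (within if i // size == j // size else between)
             for j in range(n)] for i in range(n)]

tight = two_group_matrix(within=1.0, between=10.0)
loose = two_group_matrix(within=5.0, between=10.0)
ac_tight = agglomerative_coefficient(tight)
ac_loose = agglomerative_coefficient(loose)
```

Under this definition, objects that join a cluster at a low dissimilarity relative to the final merge push the coefficient towards 1, which matches the observation that dendrograms with clusters forming at lower dissimilarity levels score higher.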

An assessment of the uncertainty of the clusters through bootstrapping showed that Ward's linkage performed best. When applied to the full data set with the Correlation distance, it gave two distinct clusters with high probabilities. Thus some confidence could be placed in the clusters that were obtained. Average and Complete linkage, on the other hand, gave clusters with lower probabilities, resulting in many small significant clusters. Thus the species were not grouping together with a high likelihood. For Ward's linkage, similar patterns were observed with aggregated data, except that the likelihood (p-values) of the clusters was lower. This technique was also robust across decreasing sample sizes and performed well down to a subsample of 10%, with some anomalies. The clusters obtained were similar although their probabilities were much lower. For Average linkage, aggregating the data formed clusters at lower dissimilarity levels. However, this reduced the probability of the clusters. The linkage method worked well down to a sample size of 25%, giving similar species clusters.
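The idea of attaching a bootstrap probability to each cluster can be illustrated with a small sketch. It resamples sites with replacement, re-clusters the species, and records how often each cluster of the original tree reappears. This yields only a plain bootstrap probability, not the multiscale values that a dedicated package would report, and it uses Average linkage with correlation distance for brevity; the function names and the toy species-by-site matrix are assumptions for illustration.

```python
import random
from itertools import combinations
from math import sqrt

def corr_distance(u, v):
    """Return 1 - Pearson correlation between two species profiles."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 1.0  # assumption: a constant profile is treated as uncorrelated
    return 1.0 - sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def clusters_of(rows):
    """Average-linkage clustering of rows; return every cluster that is formed."""
    n = len(rows)
    d = [[corr_distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    active = [frozenset([i]) for i in range(n)]
    found = set()
    while len(active) > 1:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        a, b = min(
            combinations(active, 2),
            key=lambda p: sum(d[i][j] for i in p[0] for j in p[1])
            / (len(p[0]) * len(p[1])),
        )
        merged = a | b
        active = [c for c in active if c not in (a, b)] + [merged]
        found.add(merged)
    return found

def bootstrap_probabilities(rows, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each original cluster recurs."""
    rng = random.Random(seed)
    n_sites = len(rows[0])
    reference = clusters_of(rows)
    counts = dict.fromkeys(reference, 0)
    for _ in range(n_boot):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]  # resample sites
        resampled = [[row[c] for c in cols] for row in rows]
        for cluster in clusters_of(resampled) & reference:
            counts[cluster] += 1
    return {cluster: counts[cluster] / n_boot for cluster in reference}

# Hypothetical data: 4 species (rows) by 8 sites (columns);
# species 0 and 1 dominate the first four sites, species 2 and 3 the last four.
rows = [
    [5, 6, 7, 5, 0, 1, 0, 1],
    [6, 5, 8, 6, 1, 0, 1, 0],
    [0, 1, 0, 1, 6, 7, 5, 6],
    [1, 0, 1, 0, 7, 6, 6, 5],
]
bp = bootstrap_probabilities(rows)
```

With fewer sites available to resample, fewer replicates reproduce the same groups, which is one way to read the lower probabilities reported for the reduced sample sizes.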

The Complete linkage was observed to be the most unstable method. Irrespective of the data analysis method used, it was sensitive to the different levels of data aggregation (smoothing) and the extent of the data used for clustering (sample size). When the full set of data was used, this technique did not perform well and gave an unclear definition of clusters. With aggregated data the classification was more defined, with a high probability of clustering. Similarly, as the samples were reduced the patterns observed from the clustering were not coherent. This algorithm only allows an object to merge with a cluster if it is similar to all objects already present in the cluster (Legendre, 1998). Thus, as a cluster is formed, it recedes in space from other clusters as its dissimilarity with the other groups increases (Cao et al., 1997a). Hence a large amount of information in the data set could potentially affect the algorithm in defining close groups, which leads to an unstable outcome.

Generally, it seemed that a reduction in sample size reduced the information about the species-site similarities in the data, which resulted in lower bootstrap probability values for the clusters. Even though the assemblages obtained were similar for Ward's and Average linkage with Correlation distance, the accuracy and reliability of the clusters decreased with fewer samples.

Ward's linkage yielded similar species assemblages even with the Bray-Curtis distance measure. However, this was when highly smoothed data (aggregated by subrectangles) were used. There, distinct clusters with similar species composition were obtained with high probabilities. Complete linkage also gave comparable results, with some exceptions, when the outcome for data aggregated by stations for Correlation distance and data aggregated by statistical subrectangles for Bray-Curtis distance were compared. Average linkage, on the other hand, appeared sensitive to the type of data standardisation and the distance measure that were used. This method resulted in considerably different species assemblages for the two modes of data analysis. The Average clustering algorithm takes an average dissimilarity between two groups. All agglomerative methods possess a monotonic property, that is, the dissimilarity between the merged clusters increases monotonically with the level of the merger. The Average technique appears sensitive to the numerical scale on which the clustering dissimilarities are calculated from the initial dissimilarities, since applying a monotonic function to the averaging formula can have an effect on the outcome (Hastie et al., 2001). Average linkage in combination with data standardised by range and Bray-Curtis distance did not perform well in identifying species assemblages for this data set. Clarke and Warwick (2001) recommend a row standardisation on untransformed data for Average linkage with the Bray-Curtis distance measure.

The general observation was that Ward's linkage, when applied with Correlation distance, performed better with the full data set. This was assessed in terms of the accuracy and reliability of the clusters. By contrast, with Bray-Curtis distance the method performed better with highly smoothed data (aggregated by statistical subrectangles). This could be related to the properties of the Bray-Curtis distance measure, which compares two species according to their minimum abundance at each site.
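The role of the minimum abundance in the Bray-Curtis measure can be checked numerically. The sketch below computes the dissimilarity in its usual absolute-difference form and in the equivalent form based on the abundance shared at each site; the two species vectors are hypothetical values chosen for illustration.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum |x_k - y_k| / sum (x_k + y_k)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

def bray_curtis_min_form(x, y):
    """Equivalent form: 1 - 2 * sum min(x_k, y_k) / (sum x_k + sum y_k)."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    total = sum(x) + sum(y)
    return 1.0 - 2.0 * shared / total if total else 0.0

# Hypothetical abundances of two species at four sites.
species_a = [0.0, 2.0, 5.0, 1.0]
species_b = [1.0, 2.0, 0.0, 3.0]

d_abs = bray_curtis(species_a, species_b)
d_min = bray_curtis_min_form(species_a, species_b)
```

Because only the shared (minimum) abundance at each site contributes to similarity, sites where either species is absent add nothing to it, which is one possible reason why sparse, unaggregated data behave differently from smoothed data under this measure.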

Stability in cluster analysis is to a great extent dependent on the data set itself. Essentially, if strong patterns are not present in the data then the clustering algorithm might not give clear definitions, and different methods may give considerable deviations in the patterns obtained (Hennig, 2007). The NMDS ordination technique, which is considered more reliable in finding groups, was used as an independent technique to verify the results from the hierarchical cluster analyses. The NMDS ordination showed roughly three groupings, which were similar to the clusters obtained from Ward's linkage with Bray-Curtis distance, and resulted in stress values of approximately 0.075 in three dimensions for data aggregated by statistical subrectangles. Normally, stress values of < 0.1 indicate a good ordination with no real likelihood of misleading interpretation (Clarke and Ainsworth, 1993).

The Average technique has some desirable properties, such as the maximisation of the cophenetic correlation, which makes it highly preferable in ecological studies (Gauch Jr and Whittaker, 1981). As Cao et al. (1997a) point out, it has seldom been assessed whether the classification acquired from the Average linkage is ecologically meaningful, even though the technique is highly recommended. Cao et al. (1997a) based their study on river samples with some predeterminations on site separation from cluster analysis. They found that Ward's and Complete linkage were better at separating the sites than Average linkage, with Ward's linkage performing best. Similar observations are made in this study. Gauch Jr and Whittaker (1981) also showed that Average and Complete linkage were not as adequate in recognising pre-determined plant communities as Ward's linkage and other non-hierarchical clustering methods.

Since its formulation by Sokal and Rohlf (1962), the CPCC criterion of cluster validity has been widely applied (Farris, 1969). However, this criterion has been questioned, and studies such as Farris (1969), Rohlf and Fisher (1968) and Phipps (1971) have deemed it inadequate. In this study the CPCC criterion was not adequate in identifying the optimal clustering method either. As described earlier, it is a correlation between the initial dissimilarities and the final cophenetic dissimilarities obtained by the clustering algorithm. The cophenetic dissimilarity is a restrictive measure since it contains tied values, i.e. out of the $N(N-1)/2$ pairs of dissimilarities only $N-1$ values can be distinct (Hastie et al., 2001). Additionally, hierarchical classifications of objects obey the ultrametric inequality for the distance $h_{ij}$ (from the classification): "every triple of objects $(i, j, k)$ possesses the property that the two largest values in the set $h_{ij}, h_{ik}, h_{jk}$ are equal" (Gordon, 1999). Comparing dissimilarities and ultrametric distances by measuring the strength of their linear relationship seems ambiguous, even more so when the ultrametric distance contains many tied values (Gordon, 1999). Besides, the significance of the CPCC cannot be tested since the cophenetic matrix is dependent on the original dissimilarity matrix (Legendre, 1998). Thus the nature of this cluster validity index is limiting.
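The properties discussed here (tied cophenetic values, the ultrametric condition on triples, and the CPCC as a plain correlation) can be verified on a small example. The sketch below runs Average linkage on a hypothetical 5 by 5 dissimilarity matrix, builds the cophenetic matrix and computes the CPCC; the function names and the matrix values are assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt

def average_linkage_cophenetic(d):
    """Run UPGMA on a full symmetric matrix and return the cophenetic matrix."""
    n = len(d)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member objects
    dist = {(i, j): d[i][j] for i, j in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        (a, b), h = min(dist.items(), key=lambda kv: kv[1])
        # Every pair split between the two merging clusters gets height h.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = h
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        new_dist = {}
        for c in clusters:
            dac = dist[(min(a, c), max(a, c))]
            dbc = dist[(min(b, c), max(b, c))]
            # Size-weighted mean keeps this equal to the mean pairwise value.
            new_dist[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = merged
        next_id += 1
    return coph

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

def cpcc(d, coph):
    """Correlation between original and cophenetic dissimilarities."""
    pairs = list(combinations(range(len(d)), 2))
    return pearson([d[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])

# Hypothetical dissimilarities between five objects.
d = [
    [0, 2, 6, 10, 9],
    [2, 0, 5, 9, 8],
    [6, 5, 0, 4, 5],
    [10, 9, 4, 0, 3],
    [9, 8, 5, 3, 0],
]
coph = average_linkage_cophenetic(d)
distinct_heights = {coph[i][j] for i, j in combinations(range(len(d)), 2)}
value = cpcc(d, coph)
```

For these five objects there are ten original dissimilarities but at most four distinct cophenetic values, so the correlation compares a nearly continuous variable with a heavily tied one, which is the limitation noted above.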

Ward's linkage was identified as the most robust method after assessing it against the above criteria. This linkage method has been shown to be robust in some non-ecological studies (Scheibler and Schneider, 1985; Milligan and Cooper, 1987). Its use in ecology has been restricted since it is normally used in conjunction with Euclidean distance, which appears unsuitable for species abundance data, as noted earlier. This study showed that this linkage method performed well with the Correlation and Bray-Curtis distance metrics.

It was considered important that the validity and efficiency of the linkage techniques were not based entirely on numeric indices. Clusters can be stable and yet give meaningless results; therefore it is important to complement the results with some visual inspection and subject-based validation (Hennig, 2007). In the present study, no pre-determinations could be made about the fish assemblages. However, as community structures may change along environmental gradients, it was inferred that the assemblages should be distributed according to some key environmental variables; in this case geographic distribution and depth were considered influential parameters. Thus the obtained assemblages were related to these variables to observe any meaningful ecological patterns.

5.1 Fish Assemblages and species-environment relationships

The boreal fisheries are dominated by a few key species which interact strongly. Generally, a highly dynamic environment, attributed to the oceanographic conditions, influences the fish stocks (Livingston and Tjelmeland, 2000). The assemblage patterns of the forty most abundant species in the Icelandic groundfish survey area were studied here. The focus was on demersal species; therefore pelagic species such as capelin and herring were not considered in the study.

Bathymetric studies show that Iceland is situated on two ridges: the Mid-Atlantic Ridge, running from the Reykjanes Ridge in the south-west to the Jan Mayen Ridge in the north-east, and the Faroes-Greenland ridge, going from south-east to north-west (Stefánsson and Pálsson, 1997). The bathymetry largely influences the hydrography. Several water masses are present on the Iceland shelf. The Irminger Current, which is part of the North Atlantic Current, brings warm saline Atlantic water to the south coast of Iceland. To the north, the East Icelandic Current is cold and fresh as it carries Arctic water, sea ice and icebergs from the East Greenland Current. These largely affect the atmosphere and oceanography around Iceland, with warm conditions in the south and the west, cold in the east, and variable conditions in the north (Valdimarsson and Malmberg, 1999). Different water masses have distinct thermal properties and oxygen concentrations, and temperature and salinity are highly variable as a result. This leads to a natural separation in the habitat preferences of fish species. Thus it was inferred that the species occurring in the north and south areas should cluster separately.

The species assemblage obtained by the Ward's linkage on Correlation distance gave a separation along the geographic location (north and south) and depth gradient within each region, as per the inference. Some confidence could be placed in the clusters obtained as the bootstrap generated high probability values for these clusters and indicated that the assemblages were not entirely a result of random effects. High probability values essentially indicate the accuracy of a cluster where "accuracy means the certainty of the existence of a cluster" (Suzuki and Shimodaira, 2004).
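
The idea of a bootstrap probability for a cluster can be illustrated with a minimal sketch. It is not the multiscale bootstrap of Suzuki and Shimodaira used in this study, and it uses Average linkage rather than Ward's linkage to keep the code short: stations are resampled with replacement, the species are re-clustered on correlation distance, and the share of replicates in which a given species group reappears as a cluster is reported. The abundance matrix, the function names and the number of replicates are all hypothetical.

```python
import random
from math import sqrt


def correlation_distance(x, y):
    """1 - Pearson correlation; returns 1.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    return 1.0 - sxy / sqrt(sxx * syy)


def dendrogram_clusters(rows):
    """Average-linkage clustering of rows (species); returns every cluster,
    as a frozenset of row indices, formed at some merge of the dendrogram."""
    n = len(rows)
    dist = [[correlation_distance(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]
    clusters = [frozenset([i]) for i in range(n)]
    formed = set()
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                a, b = clusters[p], clusters[q]
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters[p] | clusters[q]
        formed.add(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return formed


def bootstrap_probability(rows, target, n_boot=200, seed=1):
    """Share of bootstrap replicates (stations resampled with replacement)
    in which `target` appears as a cluster of the dendrogram."""
    rng = random.Random(seed)
    n_stations = len(rows[0])
    target = frozenset(target)
    hits = 0
    for _ in range(n_boot):
        cols = [rng.randrange(n_stations) for _ in range(n_stations)]
        resampled = [[row[c] for c in cols] for row in rows]
        if target in dendrogram_clusters(resampled):
            hits += 1
    return hits / n_boot


# Hypothetical data: 4 species (rows) at 8 stations (columns).
# Species 0 and 1 are abundant at stations 0-3, species 2 and 3 at stations 4-7.
abundance = [
    [9, 8, 7, 9, 1, 0, 1, 0],
    [8, 9, 8, 7, 0, 1, 0, 1],
    [0, 1, 0, 1, 9, 8, 9, 7],
    [1, 0, 1, 0, 8, 9, 7, 9],
]
bp = bootstrap_probability(abundance, {0, 1})
print(f"bootstrap probability of cluster {{0, 1}}: {bp:.2f}")
```

A value of `bp` close to 1 indicates that the group is recovered in nearly all resampled data sets, which is the sense in which a high probability value lends confidence to a cluster.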

Essentially, four "species assemblage areas" (Jaureguizar et al., 2006) were defined on the basis of the geographic distribution of the species. Species found in the north clustered together (assemblages C and D). These formed two constituent groups, one containing the deepwater species such as Greenland halibut, which prefer colder environmental conditions, and one containing species which are more dispersed within the area, such as cod, and shallow range species such as lumpfish. Species found in the south, which prefer warmer conditions, clustered into two assemblages, A and B. Assemblage A contained the shallow water species and assemblage B was the group of species present in the intermediate to deep region. Similar observations have been made in studies on demersal fish assemblages in the region (Bergstad et al., 1999; Colvocoresses and Musick, 1984; Fariña et al., 1997; Gabriel, 1992; Rätz, 1999), where depth and geographic distribution were significant variables in explaining the fish assemblages. A similar observation was made for the analysis based on Bray-Curtis distance: the assemblages obtained with Ward's linkage could be related to the environmental gradients. Bottom temperature and salinity are two other potentially important variables that could explain the variability in the fish assemblages. However, this has not been addressed in the present study, since its primary focus was on the methodological aspects of identifying fish assemblages.

Species assemblages are groups of species that tend to co-occur in space and time because they have similar habitat preferences or because they interact biologically. Nonetheless, association of species or co-occurrence does not necessarily imply that the species are interacting (Legendre, 1998). This study showed assemblage patterns in the data and it was seen that the environmental gradients, depth and geographic properties, played a role in the structuring of the fish assemblages. Thus the fish assemblages reflected the habitat heterogeneity.

The deeper water species such as Greenland halibut, Atlantic poacher, longfin snailfish and others that form part of this species group (assemblage D) have distinct geographical locations, and it was observed that this cluster of species was always obtained irrespective of the data analysis and clustering methods used. In contrast, most of the other species occur over a wide area, and this could have confused the multivariate patterns, leading to discrepancies in the classifications acquired with the different approaches used for analysis.

The definition of areas around Iceland (habitat classification) also led to a separation along the north-south gradient, which further showed some differentiation along depth. The definitions obtained were comparable to the previous study on the definition of oceanic areas around Iceland in Stefánsson and Pálsson (1997), which was in relation to identifying appropriate areas for Bormicon, a Boreal migration and consumption model for multispecies modeling. Similar observations were made in this study, where the areas were approximately split according to the Bormicon area definitions (Stefánsson and Pálsson, 1997). The previous study, which was based on hierarchical cluster analysis of some key species including cod, haddock, saithe, redfish, catfish, Greenland halibut, plaice, herring, capelin and shrimp, showed some consistency between the clusters of areas and the Bormicon strata. It was seen that this independent study, which took many species and different hierarchical clustering methods into consideration, complemented the definitions of the Bormicon strata.

This study experimented with the use of heatmaps in the field of ecology for pattern recognition. The visual display showed some patterns in community structure. Essentially three species-environment associations could be observed through the high ratio (red) patches. These identified the species characteristic of the northern area and their corresponding habitats (statistical squares). The species in the southern areas were divided into two groups according to depth. This gives a visual representation showing that specific species groups characterise specific geographical locations. It should be noted that the heatmap here was generated using the default settings, namely Average linkage hierarchical clustering with a correlation distance measure. However, the heatmap routine in R allows specific clustering techniques and distance measures to be defined for calculating the dendrograms.

6 Main considerations and recommendations

Ward's linkage was the most robust hierarchical clustering method according to this study and is recommended for any further studies based on the Icelandic groundfish survey data. It generated consistent, well-defined clusters with high probabilities and gave high values of CPCC and AC. The assemblages were also ecologically meaningful when related to two environmental parameters, depth and geographical distribution. It also performed well for the classification of habitats, giving a definition as per the inference based on the bathymetric and hydrographic conditions of the Icelandic continental shelf. Complete linkage worked well with aggregated data, but was generally an unstable method. The Average technique appeared to be sensitive to the type of data standardisation and distance measure used. The Bray-Curtis distance metric in conjunction with Average linkage on data standardised by range was not a suitable method of analysis for this data set. The fishing areas were also not well-defined by this mode of data analysis.
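
As an illustration of the data treatment referred to above, the following minimal sketch applies a fourth-root transformation, standardises by range, and computes the Bray-Curtis dissimilarity between two species. The abundance values are hypothetical, and standardising each species over its own range across stations is an assumption made for this example rather than a statement of the exact procedure used in the study.

```python
def fourth_root(values):
    """Fourth-root transform, which damps the influence of very abundant species."""
    return [v ** 0.25 for v in values]


def standardise_by_range(values):
    """Rescale to [0, 1] as (x - min) / (max - min); a constant vector maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i), for non-negative data."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


# Hypothetical abundances (numbers per tow) of two species at five stations.
cod = [16.0, 81.0, 0.0, 256.0, 1.0]
haddock = [0.0, 16.0, 1.0, 81.0, 16.0]

cod_std = standardise_by_range(fourth_root(cod))
haddock_std = standardise_by_range(fourth_root(haddock))
d = bray_curtis(cod_std, haddock_std)
print(f"Bray-Curtis dissimilarity: {d:.4f}")
```

The dissimilarity lies between 0 (identical standardised profiles) and 1 (no stations in common), so a matrix of such values over all species pairs is what the linkage methods would then operate on.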

The choice of the distance measure, data standardisation and clustering algorithm is important and should be given more attention. As has been noted in prior studies, the internal cluster validity criterion CPCC was not adequate for this study either.
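
For reference, the CPCC is the Pearson correlation between the original pairwise distances and the cophenetic distances implied by the dendrogram, that is, the heights at which each pair of objects is first joined. A minimal sketch using a naive Average linkage (UPGMA) implementation is given below; the distance matrix and function names are hypothetical and serve only to show the calculation.

```python
from itertools import combinations
from math import sqrt


def average_linkage_cophenetic(dist):
    """Run UPGMA on a symmetric distance matrix (list of lists) and return
    the cophenetic distance for every pair (i, j) with i < j."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    coph = {}
    while len(clusters) > 1:
        # Find the two clusters with the smallest average inter-member distance.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Objects joined for the first time at height d get cophenetic distance d.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = d
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cpcc(dist):
    """Cophenetic correlation coefficient for Average linkage on `dist`."""
    coph = average_linkage_cophenetic(dist)
    pairs = sorted(coph)
    original = [dist[i][j] for i, j in pairs]
    cophenetic = [coph[p] for p in pairs]
    return pearson(original, cophenetic)


# Hypothetical distance matrix for four objects.
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
r = cpcc(dist)
print(f"CPCC = {r:.3f}")
```

Because the CPCC only measures how faithfully the dendrogram reproduces the input distances, a high value does not by itself show that the clusters are stable or ecologically meaningful, which is consistent with the remark above.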

Biological interpretations of fish assemblages showed that the spatial structure of the environmental gradients around Iceland played a role in characterising the fish assemblages. Further studies of this nature could relate the fish assemblages to other environmental variables such as temperature and salinity, which could be significant parameters in explaining the variation in fish assemblages. Examining spatial and temporal patterns in species assemblages could also be of interest.

Use of visualisation techniques such as heatmaps is recommended in the field of ecology for displaying community patterns (species-habitat associations). Generating a heatmap based on Ward's linkage would be recommended for any further studies of this nature.

Some limitations of the study need to be taken into consideration and some

appropriate recommendations are provided. More attention needs to be paid to

the initial sample selection criterion for analysis. Some pelagic and semi-pelagic

species such as blue ling and greater argentine were not excluded from the data

before analysis. This needs to be taken into consideration for any further studies

of this nature, if the emphasis needs to be on demersal species. In future this type

of analysis could also incorporate some details on the structural composition of the

major species by splitting the abundance values into juvenile (immature) and adult

(mature) prior to analysis.

The Icelandic groundfish survey covers the fishing grounds down to 500 m depth as it was primarily designed for cod. As such, the variability of deep water species such as Greenland halibut is relatively high in the survey. The autumn survey, on the other hand, covers stations in deeper waters even though it has fewer stations. However, this study indicated that a reduction in the sample size did not lead to any major changes in the species assemblage patterns. Whether the high variability of some deep water species in the spring survey, which are included in this assemblage study, has an effect on the species associations could be examined by using the data from the autumn survey.

Fisheries management is largely moving toward community analysis and identifying potential management strategies to target fish assemblages rather than single species. These findings on species assemblages in relation to the particular environmental conditions and the habitat definitions could be used for multi-species or ecosystem based management purposes. Further research on temporal and spatial variability and persistence of these assemblages would be recommended. Whether these assemblages have functional relationships cannot be determined from this analysis. Some trophic studies in relation to habitat association within the defined assemblages could be used to determine functional associations between the species. The definition of the specific geographical units having distinct species assemblages relating to the bathymetry and hydrographic conditions, such as shown here, could also be utilised for conservation purposes; for example, if there were intentions of setting up marine protected areas, these species-environment relationships could be useful.

A Appendix

Common Name                       Latin Name                                Code
Cod                               Gadus morhua                              cod
Haddock                           Melanogrammus aeglefinus                  had
Saithe                            Pollachius virens                         sai
Whiting                           Merlangius merlangus                      whi
Redfish                           Sebastes marinus                          red
Ling                              Molva molva                               lin
Blueling (European ling)          Molva dipterygia                          blu
Tusk                              Brosme brosme                             tus
Atlantic wolffish                 Anarhichas lupus                          atw
Thorny skate (starry ray)         Raja (Amblyraja) radiata                  tho
Spotted wolffish (leopardfish)    Anarhichas minor                          spo
Monkfish                          Lophius piscatorius                       mon
Skate                             Raja (Dipturus) batis                     ska
Dogfish                           Squalus acanthias                         dog
Greater argentine                 Argentina silus                           gra
Halibut                           Hippoglossus hippoglossus                 hal
Greenland halibut                 Reinhardtius hippoglossoides              gre
Plaice                            Pleuronectes platessa                     pla
Lemon sole                        Microstomus kitt                          lem
Witch                             Glyptocephalus cynoglossus                wit
Megrim                            Lepidorhombus whiffiagonis                meg
Dab                               Limanda limanda                           dab
Long rough dab                    Hippoglossoides platessoides limandoides  lrd
Norway pout                       Trisopterus esmarki                       nor
Blue whiting                      Micromesistius poutassou                  blw
Lumpfish (lumpsucker)             Cyclopterus lumpus                        lum
Moustache sculpin                 Triglops murrayi                          mou
Atlantic poacher                  Leptagonus decagonus                      atp
Fourbearded rockling              Rhinonemus cimbrius                       fou
Norway haddock                    Sebastes viviparus                        noh
Deepwater redfish                 Sebastes mentella                         der
Esmark's eelpout                  Lycodes esmarki                           esm
Longfin snailfish (sea tadpole)   Careproctus reinhardti                    los
Polar cod                         Boreogadus saida                          pol
Atlantic hookear sculpin          Artediellus atlanticus                    ats
Vahl's eelpout (checker eelpout)  Lycodes vahli                             vah
Polar sculpin                     Cottunculus microps                       pos
Arctic rockling                   Onogadus argentatus                       art
Snake blenny                      Lumpenus lampretaeformis                  sna
Lycodes sp.                       Lycodes eudipleurostictus                 lyc

Table A.1: The common and Latin names of the forty most common species analysed for this study, with the codes used for analysis.

[Figure A.1: two map panels, (a) and (b); axes run from 62°30' to 67°30' latitude and from 10° to 29° longitude; legend: cluster 1 to cluster 5.]

Figure A.1: Definition of areas in Icelandic waters using (a) Average and (b) Complete hierarchical clustering with correlation distance. Data consists of species abundance in numbers, transformed to fourth root and scaled to mean 0 and variance 1.

[Figure A.2: two map panels, (a) and (b); axes run from 62°30' to 67°30' latitude and from 10° to 29° longitude; legend: cluster 1 to cluster 5.]

Figure A.2: Definition of areas in Icelandic waters using (a) Average and (b) Complete hierarchical clustering with Bray-Curtis distance. Data consists of species abundance in numbers, transformed to fourth root and standardised by range.

Bibliography

L. Belbin and C. McDonald. Comparing Three Classification Strategies for Use in Ecology. Journal of Vegetation Science, 4(3):341–348, 1993.

OA Bergstad, O. Bjelland, and JDM Gordon. Fish communities on the slope of the eastern Norwegian Sea. Sarsia, 84:67–78, 1999.

N. Bolshakova, F. Azuaje, and P. Cunningham. An integrated tool for microarray data clustering and cluster validity assessment. Bioinformatics, 21(4):451–455, 2005.

J.C. Brazner and E.W. Beals. Patterns in fish assemblages from coastal wetland and beach habitats in Green Bay, Lake Michigan: A multivariate analysis of abiotic and biotic forcing factors. Canadian Journal of Fisheries and Aquatic Sciences, 54(8):1743–1761, 1997.

Y. Cao, A.W. Bark, and W.P. Williams. A comparison of clustering methods for river benthic community analysis. Hydrobiologia, 347(1):24–40, 1997a.

Y. Cao, W.P. Williams, and A.W. Bark. Similarity measure bias in river benthic Aufwuchs community analysis. Water Environment Research, 69(1):95–106, 1997b.

Y. Cao, DP Larsen, RM Hughes, PL Angermeier, and TM Patton. Sampling effort affects multivariate comparisons of stream assemblages. Journal of the North American Benthological Society, 21(4):701–714, 2002a.

Y. Cao, D.D. Williams, and D.P. Larsen. Comparison of Ecological Communities: The Problem of Sample Representativeness. Ecological Monographs, 72(1):41–56, 2002b.

KR Clarke and M. Ainsworth. A method of linking multivariate community structure to environmental variables. Marine Ecology Progress Series, 92(3):205–219, 1993.

KR Clarke and R.M. Warwick. Change in Marine Communities: An Approach to Statistical Analysis and Interpretation; Second Edition. PRIMER-E Ltd, 2001.

JA Colvocoresses and JA Musick. Species associations and community composition of Middle Atlantic Bight continental shelf demersal fishes. Fishery Bulletin, 82(2):295–313, 1984.

A.C. Culhane, J. Thioulouse, G. Perriere, and D.G. Higgins. MADE4: an R package for multivariate analysis of gene expression data, 2005.

S. Datta and S. Datta. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 19(4):459–466, 2003.

B. Efron, E. Halloran, and S. Holmes. Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences, 93(23):13429, 1996.

M.B. Eisen, P.T. Spellman, P.O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863, 1998.

AC Fariña, J. Freire, and E. González-Gurriarán. Demersal Fish Assemblages in the Galician Continental Shelf and Upper Slope (NW Spain): Spatial Structure and Long-term Changes. Estuarine, Coastal and Shelf Science, 44(4):435–454, 1997.

J.S. Farris. On the Cophenetic Correlation Coefficient. Systematic Zoology, 18(3):279–285, 1969.

M.P. Francis, R.J. Hurst, B.H. McArdle, N.W. Bagley, and O.F. Anderson. New Zealand Demersal Fish Assemblages. Environmental Biology of Fishes, 65(2):215–234, 2002.

W.L. Gabriel. Persistence of demersal fish assemblages between Cape Hatteras and Nova Scotia, Northwest Atlantic. Journal of Northwest Atlantic Fisheries Science, 14:29–46, 1992.

H.G. Gauch Jr and R.H. Whittaker. Hierarchical Classification of Community Data. The Journal of Ecology, 69(2):537–557, 1981.

M.C. Gomes and L. Richard. Spatial and temporal changes in the groundfish assemblages on the northeast Newfoundland/Labrador Shelf, northwest. Fisheries Oceanography, 4(2):85–101, 1995.

D. González-Troncoso, X. Paz, and X. Cardoso. Persistence and Variation in the Distribution of Bottom-trawl Fish Assemblages over the Flemish Cap. Journal of Northwest Atlantic Fisheries Science, 37:103–117, 2006.

A.D. Gordon. Classification, second edition. Chapman & Hall, 1999.

M. Halkidi, Y. Batistakis, and M. Vazirgiannis. Clustering validity checking methods: part II. ACM SIGMOD Record, 31(3):19–27, 2002a.

M. Halkidi, Y. Batistakis, and M. Vazirgiannis. Cluster validity methods: part I. Association for Computing Machinery Special Interest Group in Management of Data (ACM SIGMOD) Record, 31(2):40–45, 2002b.

J. Handl, J. Knowles, and D.B. Kell. Computational cluster validation in post-genomic data analysis. Bioinformatics, 21(15):3201–3212, 2005.

M. Hasan and Y. Masumoto. Document clustering: before and after the singular value decomposition. Sapporo, Japan, Information Processing Society of Japan (IPSJ-TR: 99-NL-134), pages 47–55, 1999.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

C. Hennig. Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis, 52(1):258–271, 2007.

C. Hennig and F. Mathematik-SPST. A Method for Visual Cluster Validation. Classification - the Ubiquitous Challenge: Proceedings of the 28th Annual Conference of the Gesellschaft für Klassifikation EV, University of Dortmund, March 9-11, 2004, 2005.

V. Jakoniene and P. Lambrix. A Tool for Evaluating Strategies for Grouping of Biological Data. Journal of Integrative Bioinformatics, 4(3):83, 2007.

AJ Jaureguizar, R. Menni, C. Bremec, H. Mianzan, and C. Lasta. Fish assemblage and environmental patterns in the Río de la Plata estuary. Estuarine, Coastal and Shelf Science, 56(5-6):921–933, 2003.

A.J. Jaureguizar, R. Menni, C. Lasta, and R. Guerrero. Fish assemblages of the northern Argentine coastal system: spatial patterns and their temporal variations. Fisheries Oceanography, 15(4):326–344, 2006.

L. Kaufman and P.J. Rousseeuw. Finding groups in data: an introduction to cluster analysis. Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics, New York: Wiley, 1990.

M.K. Kerr and G.A. Churchill. Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. Proceedings of the National Academy of Sciences, page 161273698, 2001.

F. Kovács, C. Legány, and A. Babos. Cluster Validity Measurement Techniques. Proceedings of 6th International Symposium of Hungarian Researchers on Computational Intelligence, Budapest, Hungary, 2005.

GN Lance and WT Williams. Mixed-Data Classificatory Programs I - Agglomerative Systems. Australian Computer Journal, 1(1):15–20, 1967.

Y.W. Lee and D.B. Sampson. Spatial and temporal stability of commercial groundfish assemblages off Oregon and Washington as inferred from Oregon trawl logbooks. Canadian Journal of Fisheries and Aquatic Sciences, 57(12):2443–2454, 2000.

P. Legendre. Numerical Ecology. Elsevier Science, 1998.

V. Lesage, M.O. Hammill, and K.M. Kovacs. Functional classification of harbor seal (Phoca vitulina) dives using depth profiles, swimming velocity, and an index of foraging success. Canadian Journal of Zoology, 77:74–87, 1999.

V.P. Lessig. Comparing Cluster Analyses with Cophenetic Correlation. Journal of Marketing Research, 9(1):82–84, 1972.

X. Li. Parallel algorithms for hierarchical clustering and cluster validity. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(11):1088–1092, 1990.

P.A. Livingston and S. Tjelmeland. Fisheries in boreal ecosystems. ICES Journal of Marine Science, 57(3):619, 2000.

R. Loganantharaj, S. Cheepala, and J. Clifford. Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression. Bioinformatics, 7(Suppl 2), 2006.

Martin Maechler, Peter Rousseeuw, Anja Struyf, and Mia Hubert. Cluster analysis basics and extensions. Rousseeuw et al provided the S original which has been ported to R by Kurt Hornik and has since been enhanced by Martin Maechler: speed improvements, silhouette() functionality, bug fixes, etc. See the 'Changelog' file (in the package source), 2005.

E. Magnussen. Demersal fish assemblages of Faroe Bank: species composition, distribution, biomass spectrum and diversity. Marine Ecology Progress Series, 238:211–225, 2002.

R. Mahon, S.K. Brown, K.C.T. Zwanenburg, D.B. Atkinson, K.R. Buja, L. Claflin, G.D. Howell, M.E. Monaco, R.N. O'Boyle, and M. Sinclair. Assemblages and biogeography of demersal fishes of the east coast of North America. Canadian Journal of Fisheries and Aquatic Sciences, 55(7):1704–1738, 1998.

E. Massuti and J. Moranta. Demersal assemblages and depth distribution of elasmobranchs from the continental shelf and slope off the Balearic Islands (western Mediterranean). ICES Journal of Marine Science, 60(4):753, 2003.

JE McKenna. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis. Environmental Modelling and Software, 18(3):205–220, 2003.

A. Medina, J.C. Brêthes, J.M. Sévigny, and B. Zakardjian. How geographic distance and depth drive ecological variability and isolation of demersal fish communities in an archipelago system (Cape Verde, Eastern Atlantic Ocean). Marine Ecology, 28(3):404–417, 2007.

G.W. Milligan and M.C. Cooper. Methodology Review: Clustering Methods. Applied Psychological Measurement, 11(4):329, 1987.

AFL Nemec and RO Brinkhurst. Using the bootstrap to assess statistical significance in the cluster analysis of species abundance data. Canadian Journal of Fisheries and Aquatic Sciences, 45(6):965–970, 1988.

R.F. Noss. Indicators for Monitoring Biodiversity: A Hierarchical Approach. Conservation Biology, 4(4):355–364, 1990.

O.K. Pálsson, E. Jónsson, SA Schopka, G. Stefánsson, and BÆ Steinarsson. Icelandic groundfish survey data used to improve precision in stock assessments. Journal of Northwest Atlantic Fishery Science, 9:53–72, 1989.

JB Phipps. Dendrogram topology. Systematic Zoology, 20:306–308, 1971.

EK Pikitch, C. Santora, EA Babcock, A. Bakun, R. Bonfil, DO Conover, P. Dayton, P. Doukakis, D. Fluharty, B. Heneman, et al. ECOLOGY: Ecosystem-Based Fishery Management. Science, 305(5682):346–347, 2004.

A. Pryke, S. Mostaghim, and A. Nazemi. Heatmap Visualization of Population Based Multi Objective Algorithms. School of computer science research reports - University of Birmingham CSR, 14, 2006.

J. Quackenbush. Extracting biology from high-dimensional biological data. Journal of Experimental Biology, 210(9):1507, 2007.

G.P. Quinn and M.J. Keough. Experimental Design and Data Analysis for Biologists. Cambridge University Press, 2002.

H.J. Rätz. Structures and changes of the demersal fish assemblage off Greenland, 1982–96. NAFO Scientific Council Studies, 32(1):15, 1999.

U. Riecken. Effects of Short-Term Sampling on Ecological Characterization and Evaluation of Epigeic Spider Communities and Their Habitats for Site Assessment Studies. Journal of Arachnology, 27(1):189–195, 1999.

F.M. Rodrigues and J.A.F. Diniz-Filho. Hierarchical structure of genetic distances: Effects of matrix size, spatial distribution and correlation structure among gene frequencies. Genetics and Molecular Biology, 21:233–240, 1998.

F.J. Rohlf and DL Fisher. Test for hierarchical structure in random data sets. Systematic Zoology, 17:407–412, 1968.

D. Scheibler and W. Schneider. Monte Carlo Tests of the Accuracy of Cluster Analysis Algorithms: A Comparison of Hierarchical and Nonhierarchical Methods. Multivariate Behavioral Research, 20:283–304, 1985.

H. Shimodaira. An Approximately Unbiased Test of Phylogenetic Tree Selection. Systematic Biology, 51(3):492–508, 2002.

H. Shimodaira. Testing regions with nonsmooth boundaries via multiscale bootstrap. Journal of Statistical Planning and Inference, 138(5):1227–1241, 2008.

R.R. Sokal and F.J. Rohlf. The comparison of dendrograms by objective methods. Taxon, 11(1):30–40, 1962.

P. Sousa, M. Azevedo, and M.C. Gomes. Demersal assemblages off Portugal: Mapping, seasonal, and temporal patterns. Fisheries Research, 75(1-3):120–137, 2005.

G. Stefánsson and OK Pálsson. BORMICON: A Boreal Migration and Consumption Model. Marine Research Institute Report. 58. 223 p., 1997.

R. Suzuki and H. Shimodaira. An application of multiscale bootstrap resampling to hierarchical clustering of microarray data: How accurate are these clusters? Proceedings of the Fifteenth International Conference on Genome Informatics (GIW 2004), p. P, 34, 2004.

R. Suzuki and H. Shimodaira. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22(12):1540–1542, 2006.

A.J. Vakharia and U. Wemmerlöv. A comparative investigation of hierarchical clustering techniques and dissimilarity measures applied to the cell formation problem. Journal of Operations Management, 13(2):117–138, 1995.

H. Valdimarsson and S. Malmberg. Near-surface circulation in Icelandic waters derived from satellite tracked drifters. Rit Fiskideildar, 16:23–39, 1999.

J.H. Ward. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.

L. Zhang, A. Zhang, and M. Ramanathan. Fourier harmonic approach for visualizing temporal patterns of gene expression data. Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE, pages 137–147, 2003.
