systematics lecture: phenetics
TRANSCRIPT
PHENETICS (Numerical Taxonomy)
Phenetics - Character scoring
[Figure: raw table with OTUs 1-5 as rows and characters C1, C2, C3, C4, C5, C6, ... as columns; each cell holds a character state]
• Multi-dimensional problem
• Numerical taxonomy/phenetics is essentially a multivariate method of statistical analysis
• Characters are reduced to distances for phenetic analysis
1) Choose taxa
2) Discover and measure characters
3) List characters (raw table of scores: OTUs 1-5 by characters C1, C2, C3, ...; any set of numbers per column)
4) Calculate from the characters the pairwise measures of overall resemblance between OTUs, using a similarity criterion (results in an OTU x OTU distance matrix, here 5 × 5)
5) Dendrogram: cluster or group OTUs by overall resemblance, using a cluster criterion
Caminalcules
Operational Taxonomic Units (OTUs)--a name we use to avoid
assigning organisms to any particular taxonomic rank (such as
species).
Step 1
The first step is to make a
subjective judgement about
the overall similarity
between all pair-wise
combinations of the eight
OTUs
Measures of Overall Similarity
• Measured by means of a “similarity coefficient”
1. Qualitative characters
a. Coefficients of association
i. Simple matching
ii. Jaccard
2. Quantitative characters
a. Distance
i. Taxonomic distance
Coefficients of association
- Data are qualitative characters with 2 states,
i.e. presence/absence
CHARACTERS
1 2 3 4
OTUi + + - -
OTUj - + - +
Simple matching coefficient
• Fraction of characters where OTUs have
the identical state:
• Formula:
Ssm = m/(m+u), where m = match
u = mismatch
CHARACTERS
1 2 3 4
OTUi + + - -
OTUj - + - +
• Ssm = m/(m+u)
• Number of matches = 2
• Number of mismatches = 2
• Ssm = 2/(2+2)
= 1/2
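The same calculation can be sketched in a few lines of Python. This is only an illustration: the function name simple_matching and the use of "+" / "-" strings for character states are assumptions made here, not part of the lecture.

```python
def simple_matching(otu_i, otu_j):
    """Fraction of characters on which two OTUs show the identical state."""
    if len(otu_i) != len(otu_j):
        raise ValueError("both OTUs must be scored for the same characters")
    # m = matches (identical state, whether both "+" or both "-")
    matches = sum(1 for x, y in zip(otu_i, otu_j) if x == y)
    # u = mismatches
    mismatches = len(otu_i) - matches
    return matches / (matches + mismatches)


# Character states from the example: "+" = present, "-" = absent.
otu_i = ["+", "+", "-", "-"]
otu_j = ["-", "+", "-", "+"]

ssm = simple_matching(otu_i, otu_j)
print(ssm)  # 0.5
```

Characters 2 and 3 match (one shared presence, one shared absence), so m = 2 and u = 2.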
Jaccard coefficient (Sneath)
• Sj = a/(a+u)
where a = # presence identities
u = b + c (mismatch)
- Ignores absence matches
2 x 2 table of character states:
              OTUi
              +      -
OTUj    +     a      b
        -     c      d

Counts for the example OTUs:
              OTUi
              +      -
OTUj    +     a = 1  b = 1
        -     c = 1  d = 1
CHARACTERS
1 2 3 4
OTUi + + - -
OTUj - + - +
• Sj = a/(a+u)
where a = # presence identities
u = b + c
Sj = 1 / (1+ 2)
= 1/3
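A minimal Python sketch of the Jaccard calculation, counting the cells a, b and c of the 2 x 2 table. The function name jaccard and the "+" / "-" encoding are assumptions for illustration only.

```python
def jaccard(otu_i, otu_j, present="+"):
    """Jaccard coefficient: shared presences over all characters
    where at least one OTU shows a presence (shared absences ignored)."""
    a = b = c = 0
    for x, y in zip(otu_i, otu_j):
        if x == present and y == present:
            a += 1          # present in both OTUs
        elif y == present:
            b += 1          # present in OTUj only
        elif x == present:
            c += 1          # present in OTUi only
        # both absent (cell d) is deliberately not counted
    u = b + c
    if a + u == 0:
        raise ValueError("no presences in either OTU; coefficient undefined")
    return a / (a + u)


otu_i = ["+", "+", "-", "-"]
otu_j = ["-", "+", "-", "+"]

sj = jaccard(otu_i, otu_j)
print(round(sj, 4))  # 0.3333
```

The shared absence at character 3 raises the simple matching coefficient to 1/2 but leaves the Jaccard coefficient at 1/3.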
Distance coefficients
1. Taxonomic distance = Euclidean
distance in character space
Quantitative characters
Euclidean Distance Metric
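The Euclidean distance between two points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn) in Euclidean n-space is defined as:
d(P, Q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)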
Sample data for seven operational
taxonomic units
Distance between species 1 and 2
CHARACTER   SPECIES 1 (X)   SPECIES 2 (Y)   X-Y      (X-Y)^2
5           2.47            2.35            0.12     0.0144
6           3.08            2.99            0.09     0.0081
7           1.93            1.88            0.05     0.0025
9           1.97            1.88            0.09     0.0081
10          1.93            1.81            0.12     0.0144
11          2.46            2.31            0.15     0.0225
12          1.08            1.36            -0.28    0.0784
15          2.3             2.23            0.07     0.0049
17          8.5             8.3             0.2      0.04
22          109.7           111.1           -1.4     1.96
23          96              94.6            1.4      1.96
25          90.9            89.9            1        1
Sum of (X-Y)^2 (add the last column) = 5.1133
Euclidean distance = sqrt(5.1133) ≈ 2.26
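The arithmetic in the table can be checked with a short Python sketch. The values are the twelve characters listed for species 1 and 2; the variable names are chosen here for illustration. Note that 5.1133 is the sum of the squared differences, and the Euclidean distance is the square root of that sum.

```python
import math

# Measurements for characters 5, 6, 7, 9, 10, 11, 12, 15, 17, 22, 23, 25.
species_1 = [2.47, 3.08, 1.93, 1.97, 1.93, 2.46, 1.08, 2.3, 8.5, 109.7, 96.0, 90.9]
species_2 = [2.35, 2.99, 1.88, 1.88, 1.81, 2.31, 1.36, 2.23, 8.3, 111.1, 94.6, 89.9]

# Sum the squared differences character by character.
sum_sq = sum((x - y) ** 2 for x, y in zip(species_1, species_2))

# Euclidean (taxonomic) distance is the square root of that sum.
distance = math.sqrt(sum_sq)

print(round(sum_sq, 4))    # 5.1133
print(round(distance, 2))  # 2.26
```

The three large-valued characters (22, 23, 25) contribute almost all of the distance, which is one reason quantitative characters are often standardized before this step.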
Step 2
The similarity rankings you have produced
are then used to create a similarity matrix.
Step 3 Find the pair of OTUs that have the highest similarity
ranking. (In this example, it happens to be OTUs 2 and
7, with a similarity ranking of 0.9 shown in boldface and
with an asterisk*).
Step 4 Combine OTUs 2 and 7, and treat them as a single composite
unit from this point on. Construct a new similarity matrix (this
time it will be 7 x 7), as shown in the table below.
Recalculate the similarity values for each OTU with the new
composite 2/7 OTU. To do so, simply compute the average
similarity of each OTU with 2 and with 7.
How to calculate the new similarity values
For each remaining OTU, average its similarity with 7 and its similarity with 2:
1 & 7 = 0.1, 1 & 2 = 0.1: 1 and (7,2) = (0.1 + 0.1)/2 = 0.2/2 = 0.1
3 & 7 = 0.2, 3 & 2 = 0.1: 3 and (7,2) = (0.2 + 0.1)/2 = 0.3/2 = 0.15
4 & 7 = 0.3, 4 & 2 = 0.3: 4 and (7,2) = (0.3 + 0.3)/2 = 0.6/2 = 0.3
5 & 7 = 0.2, 5 & 2 = 0.2: 5 and (7,2) = (0.2 + 0.2)/2 = 0.4/2 = 0.2
6 & 7 = 0.3, 6 & 2 = 0.2: 6 and (7,2) = (0.3 + 0.2)/2 = 0.5/2 = 0.25
8 & 7 = 0.4, 8 & 2 = 0.3: 8 and (7,2) = (0.4 + 0.3)/2 = 0.7/2 = 0.35
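The recalculation for the composite 2/7 unit can be sketched in Python as follows. The similarity values are the ones given above; the dictionary names are assumptions made for this illustration.

```python
# Similarity of each remaining OTU with OTU 2 and with OTU 7 (values from the example).
sim_with_2 = {1: 0.1, 3: 0.1, 4: 0.3, 5: 0.2, 6: 0.2, 8: 0.3}
sim_with_7 = {1: 0.1, 3: 0.2, 4: 0.3, 5: 0.2, 6: 0.3, 8: 0.4}

# New similarity with the composite (2,7) unit = average of the two old values.
# Rounding to two decimals only hides floating-point noise.
sim_with_composite = {
    otu: round((sim_with_2[otu] + sim_with_7[otu]) / 2, 2)
    for otu in sim_with_2
}

print(sim_with_composite)
# {1: 0.1, 3: 0.15, 4: 0.3, 5: 0.2, 6: 0.25, 8: 0.35}
```

These six values fill the row and column for the composite unit in the new 7 x 7 matrix; all other entries carry over unchanged.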
Step 5 In the new, reduced matrix with recomputed similarity
values, find the next pair of OTUs with the highest
similarity value. In this case, OTUs 1 and 6 and OTUs 3
and 5 are tied with a similarity value of 0.8. For
simplicity, choose one pairing at random and recalculate
the similarity indices, and then do the next pairing.
[Figure: similarity matrix -> cluster criterion -> dendrogram (tree)]
Your OTUs can now be clustered
graphically in a branching diagram
called a phenogram.
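The whole find-merge-recalculate loop from Steps 3 to 5 can be sketched in Python. Everything here is illustrative: the function name cluster, the four OTUs A-D and their similarity values are hypothetical and are not the lecture's eight-OTU data. Two assumptions are made: a composite's similarity to another unit is the plain average of its two parts' similarities (the rule used in Step 4), and ties are broken by list order rather than at random.

```python
from itertools import combinations


def cluster(similarity):
    """Merge the most similar pair repeatedly.

    similarity maps frozenset({a, b}) -> similarity for every pair of OTUs.
    Returns the merges in order as (unit_a, unit_b, similarity_level).
    """
    sim = dict(similarity)
    units = sorted({u for pair in sim for u in pair})
    merges = []
    while len(units) > 1:
        # Step 3 / Step 5: find the pair with the highest similarity.
        a, b = max(combinations(units, 2), key=lambda p: sim[frozenset(p)])
        merges.append((a, b, sim[frozenset((a, b))]))
        merged = (a, b)  # composite unit, stored as a nested tuple
        others = [u for u in units if u not in (a, b)]
        # Step 4: similarity of each other unit with the composite is the
        # average of its similarities with the two merged parts.
        for u in others:
            sim[frozenset((merged, u))] = (
                sim[frozenset((a, u))] + sim[frozenset((b, u))]
            ) / 2
        units = others + [merged]
    return merges


# Hypothetical similarity matrix for four OTUs (upper triangle only).
example = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.3,
    frozenset(("A", "D")): 0.1, frozenset(("B", "C")): 0.5,
    frozenset(("B", "D")): 0.3, frozenset(("C", "D")): 0.7,
}

merges = cluster(example)
for a, b, level in merges:
    print(a, "+", b, "at similarity", round(level, 2))
```

Each merge corresponds to one fork of the phenogram, drawn at the similarity level where the two units join: here A joins B at 0.9, C joins D at 0.7, and the two pairs join at about 0.3.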
How to construct the dendrogram ?
Simple matching coefficient Formula:
Ssm = m/(m+u), where m = match
u = mismatch
Jaccard coefficient Formula:
Sj = a/(a+u)
where a = # matching presence identities
u = b + c (mismatch)