ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Memorial University of Newfoundland
Lecture 15, June 29, 2006
http://www.engr.mun.ca/~charlesr
Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM
EN-3026
[Course calendar, July 2006: Lectures 16-22 on Tuesdays and Thursdays; Assignment 4 due; Presentations; Assignment 5 due; Final Reports.]
Last Week

Clustering (Unsupervised Classification)

[Histogram figure: number of occurrences vs. pixel value, illustrating distribution modes and minima.]

Pattern Grouping:
- Group similar patterns using distance metrics.
- Merge or split clusters based on cluster similarity measurements.
- Measures of cluster 'goodness'.
Recap: Simple Grouping using Threshold

1. k = 1 (number of clusters)
2. z1 = x1 (set first sample as class prototype)
3. For all other samples xi:
   a. Find zj for which d(xi, zj) is minimum
   b. If d(xi, zj) ≤ T, then assign xi to Cj
   c. Else k = k + 1, zk = xi
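The steps above can be sketched in a few lines. This is a minimal illustration (function name and Euclidean distance are my choices, not from the slides); note that prototypes stay fixed at their seeding sample, exactly as in the recap:

```python
import numpy as np

def threshold_grouping(samples, T):
    """Sequential threshold clustering: the first sample seeds the first
    cluster; each later sample joins the nearest prototype if within T,
    otherwise it seeds a new cluster."""
    prototypes = [samples[0]]          # z1 = x1
    labels = [0]
    for x in samples[1:]:
        d = [np.linalg.norm(x - z) for z in prototypes]
        j = int(np.argmin(d))          # nearest prototype zj
        if d[j] <= T:
            labels.append(j)           # assign x to Cj
        else:
            prototypes.append(x)       # new cluster, zk = x
            labels.append(len(prototypes) - 1)
    return labels, prototypes
```

As the slides note later, the result depends on the sample order and on the choice of T.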
Recap: Hierarchical Grouping Algorithms

Successively merge based on some measure of within- or between-cluster similarity (e.g. the Hamming distance dH(xi, xj) = number of differences between xi and xj).

For n samples {x1, ..., xn}:
1. k = n, Ci = {xi} for i = 1, ..., n
2. Merge the Ci, Cj which are most similar, k = k − 1:
      min_{i,j} d(Ci, Cj)  ⇒  C'i = Ci ∪ Cj
3. Continue until some stopping condition is met.

Intercluster distance:

Nearest neighbour:   d1(Ci, Cj) = min_{x∈Ci, x'∈Cj} d(x, x')
Furthest neighbour:  d2(Ci, Cj) = max_{x∈Ci, x'∈Cj} d(x, x')
Average neighbour:   d3(Ci, Cj) = (1 / (Ni Nj)) ∑_{x∈Ci} ∑_{x'∈Cj} d(x, x')
Mean distance:       d4(Ci, Cj) = d(mi, mj)

Stopping condition:  min_{i,j} d(Ci, Cj) ≥ T
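The merge loop and the linkage choices above can be sketched as follows. This is an O(n³) illustration, not an efficient implementation; function and parameter names are my own:

```python
import numpy as np

def agglomerate(samples, T, linkage="nearest"):
    """Agglomerative grouping: start with one cluster per sample and
    repeatedly merge the closest pair until the minimum intercluster
    distance reaches T (the stopping condition above)."""
    clusters = [[i] for i in range(len(samples))]

    def dist(A, B):
        pair = [np.linalg.norm(samples[a] - samples[b]) for a in A for b in B]
        if linkage == "nearest":          # d1
            return min(pair)
        if linkage == "furthest":         # d2
            return max(pair)
        return sum(pair) / len(pair)      # d3, average neighbour

    while len(clusters) > 1:
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]))
        if dist(clusters[i], clusters[j]) >= T:
            break                          # stopping condition met
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```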
Goodness of Partitioning

• Several of the stopping conditions suggest that we can use a measure of the scatter of each cluster to gauge how good the overall clustering is.
• In general, we would like compact clusters with a lot of space between them.
• We can use the measure of goodness to iteratively move samples from one cluster to another to optimize the groupings.
Clustering Criterion

Global measurements of the goodness of the clusters.

1. Representation error = summed scatter within clusters.

The representation error of a clustering is the error from representing the N samples by the k cluster prototypes:

   Je = ∑_{i=1}^{k} ∑_{x∈Ci} |x − zi|²

We can choose each zi to minimize the per-cluster error

   Ji = ∑_{x∈Ci} |x − zi|²

Setting ∂Ji/∂zi = −2 ∑_{x∈Ci} (x − zi) = 0 gives zi = mi.
So

   Je = ∑_{i=1}^{k} ∑_{x∈Ci} |x − mi|²

Now define the scatter matrix

   SWi = ∑_{x∈Ci} (x − mi)(x − mi)^T

Thus

   Je = ∑_{i=1}^{k} tr SWi

Define the summed scatter to be

   SW = ∑_{i=1}^{k} SWi,   so   Je = tr SW
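The equivalence Je = tr SW can be checked numerically. A small sketch (function name is mine) computing Je both ways:

```python
import numpy as np

def representation_error(samples, labels, k):
    """Je computed two equivalent ways: directly as summed squared
    distances to the cluster means, and as tr(SW), the trace of the
    summed within-cluster scatter matrix."""
    d = samples.shape[1]
    SW = np.zeros((d, d))
    Je_direct = 0.0
    for i in range(k):
        X = samples[labels == i]
        mi = X.mean(axis=0)
        diff = X - mi
        SW += diff.T @ diff              # SWi = sum (x - mi)(x - mi)^T
        Je_direct += (diff ** 2).sum()   # sum |x - mi|^2
    return Je_direct, np.trace(SW)
```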
Clustering Criterion

2. Use the volume of the summed scatter:

   Cluster criterion 2: |SW|
Clustering Criterion

3. Could use the between-cluster to within-cluster scatter. Define the between-cluster scatter matrix

   SB = ∑_{i=1}^{k} Ni (mi − m)(mi − m)^T

So could use

   Cluster criterion 3: tr(SW⁻¹ SB)

Note: Any within-cluster criterion is minimized with k = N, and thus we would need an independent criterion for k.
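Criterion 3 can be sketched directly from the two scatter matrices. A minimal illustration (function name is mine; it assumes SW is invertible, i.e. enough samples per cluster in every dimension):

```python
import numpy as np

def criterion3(samples, labels, k):
    """Cluster criterion 3, tr(SW^-1 SB): large when clusters are
    compact (small SW) and well separated (large SB)."""
    d = samples.shape[1]
    m = samples.mean(axis=0)                   # overall mean
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for i in range(k):
        X = samples[labels == i]
        mi = X.mean(axis=0)
        diff = X - mi
        SW += diff.T @ diff                    # within-cluster scatter
        SB += len(X) * np.outer(mi - m, mi - m)  # between-cluster scatter
    return np.trace(np.linalg.inv(SW) @ SB)
```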
K-Means

Once we have a criterion, we can create an iterative clustering scheme. K-Means is the classic iterative clustering scheme:

1. Choose k prototypes {z1, ..., zk}.
2. Assign all samples to clusters: x ∈ Ci if d(x, zi) < d(x, zj) for all j ≠ i.
3. Update {zi} to minimize Ji, i = 1, ..., k:
      zi = (1/Ni) ∑_{x∈Ci} x = mi
4. Reassign the samples using the new prototypes.
5. Continue until no prototypes change.
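Steps 1-5 can be sketched compactly. This is a bare-bones illustration (function name and the empty-cluster fallback are my choices), not a production implementation:

```python
import numpy as np

def kmeans(samples, k, z0):
    """Basic K-means: assign each sample to its nearest prototype,
    move each prototype to its cluster mean, and repeat until the
    prototypes stop changing. z0 holds the k initial prototypes."""
    z = np.array(z0, dtype=float)
    while True:
        # steps 2/4: nearest-prototype assignment
        d = np.linalg.norm(samples[:, None, :] - z[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 3: each prototype moves to its cluster mean
        # (an empty cluster keeps its old prototype, a common fallback)
        new_z = np.array([samples[labels == i].mean(axis=0)
                          if np.any(labels == i) else z[i]
                          for i in range(k)])
        if np.allclose(new_z, z):     # step 5: converged
            return labels, z
        z = new_z
```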
From "KM Demo Algorithm" Java Applet: http://web.sfc.keio.ac.jp/~osada/KM/index.html
K-Means

Good points:
- Conceptually simple.
- Successful if k is accurate and the clusters are well separated.

Problems:
- If k is incorrect, then the clusters can't be right.
- Efficiency depends on sample order.
- Non-spherical clusters cause problems.
Extensions to K-Means

There are several ways to extend the basic k-means algorithm:

1. Global minimization of Je, as an alternative to simply assigning samples to the closest cluster prototype.
2. Allow a variable k.
Global Minimization of Je

(Notes: "Pattern Recognition - Clustering," Charles Robertson, July 9, 2005.)

   Je = ∑_{i=1}^{k} Ji

where Ji is the representation error for the ith cluster.

Basic plan: move sample x' from Ci to Cj if the magnitude of the increment to the representation error Jj is less than the decrement to the representation error Ji.

If x' is added to Cj, the new error for Cj is

   J'j = ∑_{x∈Cj} |x − m'j|² + |x' − m'j|²

But the new mean is

   m'j = (Nj mj + x') / (Nj + 1) = mj + (x' − mj) / (Nj + 1)

so

   J'j = ∑_{x∈Cj} |x − mj − (x' − mj)/(Nj + 1)|² + |x' − mj − (x' − mj)/(Nj + 1)|²

     = ∑_{x∈Cj} |x − mj|² + 0 + Nj |x' − mj|² / (Nj + 1)² + |((Nj + 1)(x' − mj) − (x' − mj)) / (Nj + 1)|²

(the cross term vanishes since ∑_{x∈Cj} (x − mj) = 0)

     = ∑_{x∈Cj} |x − mj|² + Nj |x' − mj|² / (Nj + 1)² + Nj² |x' − mj|² / (Nj + 1)²

     = ∑_{x∈Cj} |x − mj|² + (Nj / (Nj + 1)) |x' − mj|²
So

   J'j = Jj + ρj   with   ρj = (Nj / (Nj + 1)) |x' − mj|²

Similarly, we can show that the decrement to Ji from removing x' is

   ρi = (Ni / (Ni − 1)) |x' − mi|²

So the reassignment rule (step 2 of K-means) is: move x' from Ci to Cj if

   (Nj / (Nj + 1)) |x' − mj|² < (Ni / (Ni − 1)) |x' − mi|²
Notes on Global Minimization

1. The rule has little impact when Ni and Nj are very large.

2. A point nearly on the MED boundary will be reassigned, since

      Nj / (Nj + 1) < 1   while   Ni / (Ni − 1) > 1

   no matter what Ni and Nj are.

3. If x' is an unassigned sample, we would get the minimum increase to Je by assigning it to the cluster Ci that minimizes

      (Ni / (Ni + 1)) |x' − mi|²

   This modifies the initial K-Means assignment by taking cluster size into account: if x' is equidistant from mi and mj, assign it to the smaller cluster.
Example

Consider the following set of samples:

(0,2) (1,0) (1,1) (1,2) (1,3) (1,4) (2,2) (3,1) (4,1) (5,0) (5,1) (5,2) (6,1) (7,1)

Cluster using basic K-means and using K-means with the global minimization method. Use k = 2.

What happens if we start with k ≠ 2 (e.g. k = 3)?
Dealing with K

Need a way of varying k in accordance with the goodness of the partitioning.

Strategies for dealing with k:

1. Delete clusters with too few samples.
   If Ni < T1, drop Ci and zi, and reassign the samples from Ci.

2. Merge clusters which are close together.
   If

      (mi − mj)^T ((Si + Sj) / 2) (mi − mj) < T2

   then replace Ci with the union of Ci and Cj, and drop zj.
3. Split clusters which are spread out.
   If the maximum eigenvalue of Si is greater than T3, split Ci with a plane through mi perpendicular to the maximum eigenvector, and add a new cluster.
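The split strategy (3) can be sketched as follows. This is an illustration under my own naming, and it uses the cluster's scatter matrix Si as defined earlier; the plane through mi perpendicular to the largest eigenvector simply separates the samples by the sign of their projection onto that eigenvector:

```python
import numpy as np

def split_cluster(X, T3):
    """Split a cluster whose scatter matrix has a large maximum
    eigenvalue: cut by the plane through the mean perpendicular to the
    corresponding eigenvector. Returns one or two sample arrays."""
    m = X.mean(axis=0)
    diff = X - m
    S = diff.T @ diff                  # scatter matrix Si
    vals, vecs = np.linalg.eigh(S)     # eigh: S is symmetric
    if vals[-1] <= T3:
        return [X]                     # spread is acceptable
    v = vecs[:, -1]                    # maximum eigenvector
    side = diff @ v > 0                # which side of the plane
    return [X[side], X[~side]]
```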
There are many possible clustering algorithms. See "Statistical Pattern Recognition: A Review" by Jain, Duin, and Mao for more possibilities.