Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed) by R. O.
Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000
with the permission of the authors and the publisher
Chapter 2 (Part 3): Bayesian Decision Theory
(Sections 2.6 and 2.9)
• Discriminant Functions for the Normal Density
• Bayes Decision Theory – Discrete Features
Pattern Classification, Chapter 2 (Part 3)
Discriminant Functions for the Normal Density
• We saw that the minimum error-rate classification can be achieved by the discriminant function
$$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)$$
• Case of the multivariate normal density:

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^t \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$
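As a concrete check, here is a minimal NumPy sketch of this general-case discriminant; the function name `discriminant` and its signature are mine, not from the slides:

```python
import numpy as np

def discriminant(x, mu, sigma, prior):
    """Evaluate g_i(x) for one class with mean mu, covariance sigma, prior P(omega_i).

    g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - d/2 ln 2pi - 1/2 ln|Sigma| + ln P(omega_i)
    """
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    d = x.size
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * d * np.log(2.0 * np.pi)
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))
```

Since g_i(x) equals the log joint ln p(x|ω_i) + ln P(ω_i), choosing the class with the largest g_i(x) implements minimum-error-rate classification.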
Case Σi = σ²I (I is the identity matrix)
• What does "Σi = σ²I" say about the dimensions? (The features are statistically independent.)
• What about the variance of each dimension? (Each dimension has the same variance σ².)
Note: both |Σi| and the (d/2) ln 2π term in

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^t \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

are independent of i. Thus we can simplify to:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i)$$

where $\|\cdot\|$ denotes the Euclidean norm.
• We can further simplify by recognizing that the quadratic term x^t x implicit in the Euclidean norm is the same for all i.
$$g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{x} + w_{i0} \qquad \text{(linear discriminant function)}$$

where:

$$\mathbf{w}_i = \frac{\boldsymbol{\mu}_i}{\sigma^2}\,; \qquad w_{i0} = -\frac{1}{2\sigma^2}\,\boldsymbol{\mu}_i^t\boldsymbol{\mu}_i + \ln P(\omega_i)$$

(w_{i0} is called the threshold for the ith category!)
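A minimal NumPy sketch of this linear form; the names `linear_discriminant`, `w`, and `w0` are my own. Up to the shared x^t x /(2σ²) term, it ranks classes exactly like the Euclidean-norm discriminant:

```python
import numpy as np

def linear_discriminant(x, mu, sigma2, prior):
    # g_i(x) = w_i^t x + w_i0, with
    # w_i  = mu_i / sigma^2
    # w_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(omega_i)
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    w = mu / sigma2
    w0 = -(mu @ mu) / (2.0 * sigma2) + np.log(prior)
    return w @ x + w0
```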
• A classifier that uses linear discriminant functions is called a "linear machine".
• The decision surfaces for a linear machine are pieces of hyperplanes defined by:

$$g_i(\mathbf{x}) = g_j(\mathbf{x})$$

The equation can be written as: $\mathbf{w}^t(\mathbf{x} - \mathbf{x}_0) = 0$
• The hyperplane separating Ri and Rj is always orthogonal to the line linking the means!

$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} \ln\frac{P(\omega_i)}{P(\omega_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

If $P(\omega_i) = P(\omega_j)$ then $\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j)$.
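A NumPy sketch of this boundary point; `boundary_point` is my own name. One can check numerically that the two spherical-covariance discriminants agree at x0, i.e. x0 lies on the decision surface:

```python
import numpy as np

def boundary_point(mu_i, mu_j, sigma2, p_i, p_j):
    # x0 = 1/2 (mu_i + mu_j)
    #      - sigma^2 / ||mu_i - mu_j||^2 * ln(P(omega_i)/P(omega_j)) * (mu_i - mu_j)
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    diff = mu_i - mu_j
    return 0.5 * (mu_i + mu_j) - sigma2 / (diff @ diff) * np.log(p_i / p_j) * diff
```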
• Case Σi = Σ (covariances of all classes are identical but arbitrary!)

The hyperplane separating Ri and Rj has the equation:

$$\mathbf{w}^t(\mathbf{x} - \mathbf{x}_0) = 0$$

where

$$\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

and

$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^t\,\Sigma^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

(The hyperplane separating Ri and Rj is generally not orthogonal to the line between the means!)
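A NumPy sketch for this shared-covariance case; `shared_cov_boundary` is my own name. At the returned x0 the two Mahalanobis-based discriminants should coincide:

```python
import numpy as np

def shared_cov_boundary(mu_i, mu_j, sigma, p_i, p_j):
    # Returns (w, x0) for the hyperplane w^t (x - x0) = 0 when Sigma_i = Sigma_j = Sigma:
    # w  = Sigma^{-1} (mu_i - mu_j)
    # x0 = 1/2 (mu_i + mu_j)
    #      - ln[P(omega_i)/P(omega_j)] / ((mu_i-mu_j)^t Sigma^{-1} (mu_i-mu_j)) * (mu_i - mu_j)
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    inv = np.linalg.inv(sigma)
    diff = mu_i - mu_j
    w = inv @ diff
    x0 = 0.5 * (mu_i + mu_j) - np.log(p_i / p_j) / (diff @ inv @ diff) * diff
    return w, x0
```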
• Case Σi = arbitrary
• The covariance matrices are different for each category.
The decision surfaces are hyperquadrics (hyperquadrics are: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids):

$$g_i(\mathbf{x}) = \mathbf{x}^t W_i\,\mathbf{x} + \mathbf{w}_i^t \mathbf{x} + w_{i0}$$

where:

$$W_i = -\frac{1}{2}\Sigma_i^{-1}\,; \qquad \mathbf{w}_i = \Sigma_i^{-1}\boldsymbol{\mu}_i\,; \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^t \Sigma_i^{-1} \boldsymbol{\mu}_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$
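A NumPy sketch of this quadratic discriminant; `quadratic_discriminant` is my own name. It drops only the class-independent (d/2) ln 2π term, so it agrees with the full normal-density discriminant up to that constant:

```python
import numpy as np

def quadratic_discriminant(x, mu, sigma, prior):
    # g_i(x) = x^t W_i x + w_i^t x + w_i0, with
    # W_i  = -1/2 Sigma_i^{-1}
    # w_i  = Sigma_i^{-1} mu_i
    # w_i0 = -1/2 mu_i^t Sigma_i^{-1} mu_i - 1/2 ln|Sigma_i| + ln P(omega_i)
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    inv = np.linalg.inv(sigma)
    W = -0.5 * inv
    w = inv @ mu
    w0 = -0.5 * mu @ inv @ mu - 0.5 * np.log(np.linalg.det(sigma)) + np.log(prior)
    return x @ W @ x + w @ x + w0
```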
Bayes Decision Theory – Discrete Features
• Components of x are binary or integer valued; x can take on only one of m discrete values v1, v2, …, vm.
• We are concerned with probabilities rather than probability densities in Bayes' formula:

$$P(\omega_j \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid \omega_j)\,P(\omega_j)}{P(\mathbf{x})}\,, \qquad \text{where } P(\mathbf{x}) = \sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j)\,P(\omega_j)$$
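A one-function NumPy sketch of this discrete Bayes formula; `posteriors` is my own name. It normalizes the joint probabilities P(x|ω_j)P(ω_j) by their sum:

```python
import numpy as np

def posteriors(likelihoods, priors):
    # P(omega_j | x) = P(x|omega_j) P(omega_j) / sum_k P(x|omega_k) P(omega_k)
    joint = np.asarray(likelihoods, float) * np.asarray(priors, float)
    return joint / joint.sum()
```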
Bayes Decision Theory – Discrete Features
• Conditional risk is defined as before: R(α | x)
• The approach is still to minimize the conditional risk:

$$\alpha^* = \arg\min_i R(\alpha_i \mid \mathbf{x})$$
Bayes Decision Theory – Discrete Features
• Case of independent binary features in a 2-category problem.
Let x = [x1, x2, …, xd]^t, where each xi is either 0 or 1, with probabilities:

$$p_i = P(x_i = 1 \mid \omega_1)\,, \qquad q_i = P(x_i = 1 \mid \omega_2)$$
Bayes Decision Theory – Discrete Features
• Assuming conditional independence, P(x|ωi) can be written as a product of component probabilities:
$$P(\mathbf{x} \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}$$

and

$$P(\mathbf{x} \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i} (1 - q_i)^{1 - x_i}$$

yielding a likelihood ratio given by:

$$\frac{P(\mathbf{x} \mid \omega_1)}{P(\mathbf{x} \mid \omega_2)} = \prod_{i=1}^{d} \left(\frac{p_i}{q_i}\right)^{x_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i}$$
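A minimal NumPy sketch of this Bernoulli product likelihood; `bernoulli_likelihood` and the parameter name `theta` are my own. Passing the p_i gives P(x|ω1), passing the q_i gives P(x|ω2):

```python
import numpy as np

def bernoulli_likelihood(x, theta):
    # P(x | omega) = prod_i theta_i^{x_i} (1 - theta_i)^{1 - x_i}
    x, theta = np.asarray(x, float), np.asarray(theta, float)
    return np.prod(theta ** x * (1.0 - theta) ** (1.0 - x))
```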
Bayes Decision Theory – Discrete Features
• Taking our likelihood ratio

$$\frac{P(\mathbf{x} \mid \omega_1)}{P(\mathbf{x} \mid \omega_2)} = \prod_{i=1}^{d} \left(\frac{p_i}{q_i}\right)^{x_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i}$$

and plugging it into Eq. 31

$$g(\mathbf{x}) = \ln\frac{P(\mathbf{x} \mid \omega_1)}{P(\mathbf{x} \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

yields:

$$g(\mathbf{x}) = \sum_{i=1}^{d} \left[x_i \ln\frac{p_i}{q_i} + (1 - x_i)\ln\frac{1 - p_i}{1 - q_i}\right] + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
• The discriminant function in this case is:

$$g(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + w_0$$

where:

$$w_i = \ln\frac{p_i (1 - q_i)}{q_i (1 - p_i)} \qquad i = 1, \dots, d$$

and:

$$w_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Decide ω1 if g(x) > 0 and ω2 if g(x) ≤ 0.
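A NumPy sketch of the resulting classifier; the names `binary_feature_classifier` and `decide` are my own. The weight vector should reproduce the log-likelihood-ratio form of g(x) exactly:

```python
import numpy as np

def binary_feature_classifier(p, q, prior1, prior2):
    # Build (w, w0) for g(x) = sum_i w_i x_i + w0:
    # w_i = ln[ p_i (1-q_i) / (q_i (1-p_i)) ]
    # w0  = sum_i ln[(1-p_i)/(1-q_i)] + ln[P(omega_1)/P(omega_2)]
    p, q = np.asarray(p, float), np.asarray(q, float)
    w = np.log(p * (1.0 - q) / (q * (1.0 - p)))
    w0 = np.sum(np.log((1.0 - p) / (1.0 - q))) + np.log(prior1 / prior2)
    return w, w0

def decide(x, w, w0):
    # Decide omega_1 if g(x) > 0, else omega_2.
    g = w @ np.asarray(x, float) + w0
    return 1 if g > 0 else 2
```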