ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition
Memorial University of Newfoundland
Pattern Recognition, Lecture 17, July 11, 2006
http://www.engr.mun.ca/~charlesr
Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM
EN-3026
Presentations - July 18th and 20th
• All graduate students are to give a 10-12 minute
presentation + answer questions for 3-5 minutes
• All students are expected to attend
• Fill in evaluation sheets for each presentation
• Email me ([email protected]) by
Thursday, July 13th, with:
• Final project title
• Requirements for presentation (if any)
“Chance favours the prepared mind.” (Louis Pasteur)
Presentations - July 25th
• Suggestions and considerations
• Think about what your audience already knows and
probably doesn’t know
• Include information on the pattern recognition algorithm
(2-4 minutes)
• Describe the context of your use of the algorithm (1-2
min)
• Describe how you used the algorithm (2-3 min)
• Discuss your experiment and results - is the algorithm
actually useful? (2-3 min)
Recap
• K-Means
• Basic algorithm review
• Extensions - global minimization method
• Discussion of strategies for dealing with k
• Examples
• K-Means - basic and global minimization
• Dendrogram
Feature Selection and Extraction
• Classifier performance depends on a combination of
the number of samples, the number of features, and the
complexity of the classifier.
• Ideally, using more features leads to better
performance
• Only true if the class-conditional densities are
completely known
• Not true if the number of training samples is small relative
to the number of features
• This is known as the ‘peaking phenomenon’, or
equivalently ‘the curse of dimensionality’
selection algorithms in terms of classification error and runtime. The general conclusion is that the sequential forward floating search (SFFS) method performs almost as well as the branch-and-bound algorithm and demands lower computational resources. Somol et al. [154] have proposed an adaptive version of the SFFS algorithm which has been shown to have superior performance.

The feature selection methods in Table 5 can be used with any of the well-known classifiers. But, if a multilayer feed-forward network is used for pattern classification, then the node-pruning method simultaneously determines both the optimal feature subset and the optimal network classifier [26], [103]. First train a network and then remove the least salient node (in input or hidden layers). The reduced network is trained again, followed by a removal of yet another least salient node. This procedure is repeated until the desired trade-off between classification error and size of the network is achieved. The pruning of an input node is equivalent to removing the corresponding feature.

How reliable are the feature selection results when the ratio of the available number of training samples to the number of features is small? Suppose the Mahalanobis distance [58] is used as the feature selection criterion. It depends on the inverse of the average class covariance matrix. The imprecision in its estimate in small sample size situations can result in an optimal feature subset which is quite different from the optimal subset that would be obtained when the covariance matrix is known. Jain and Zongker [85] illustrate this phenomenon for a two-class classification problem involving 20-dimensional Gaussian class-conditional densities (the same data was also used by Trunk [157] to demonstrate the curse of dimensionality phenomenon). As expected, the quality of the selected feature subset for small training sets is poor, but improves as the training set size increases. For example, with 20 patterns in the training set, the branch-and-bound algorithm selected a subset of 10 features which included only five features in common with the ideal subset of 10 features (when densities were known). With 2,500 patterns in the training set, the branch-and-bound procedure selected a 10-feature subset with only one wrong feature.

Fig. 6 shows an example of the feature selection procedure using the floating search technique on the PCA features in the digit dataset for two different training set sizes. The test set size is fixed at 1,000 patterns. In each of the selected feature spaces with dimensionalities ranging from 1 to 64, the Bayes plug-in classifier is designed assuming Gaussian densities with equal covariance matrices and evaluated on the test set. The feature selection criterion is the minimum pairwise Mahalanobis distance. In the small sample size case (total of 100 training patterns), the curse of dimensionality phenomenon can be clearly observed. In this case, the optimal number of features is about 20, which equals n/5 (n = 100), where n is the number of training patterns. The rule-of-thumb of having less than n/10 features is on the safe side in general.

5 Classifiers

Once a feature selection or classification procedure finds a proper representation, a classifier can be designed using a number of possible approaches. In practice, the choice of a classifier is a difficult problem and it is often based on which classifier(s) happen to be available, or best known, to the user.

We identify three different approaches to designing a classifier. The simplest and the most intuitive approach to classifier design is based on the concept of similarity: patterns that are similar should be assigned to the same class. So, once a good metric has been established to define similarity, patterns can be classified by template matching or the minimum distance classifier using a few prototypes per class. The choice of the metric and the prototypes is crucial to the success of this approach. In the nearest mean classifier, selecting prototypes is very simple and robust; each pattern class is represented by a single prototype which is the mean vector of all the training patterns in that class. More advanced techniques for computing prototypes are vector quantization [115], [171] and learning vector quantization [92], and the data reduction methods associated with the one-nearest neighbor decision rule (1-NN), such as editing and condensing [39]. The most straightforward 1-NN rule can be conveniently used as a benchmark for all the other classifiers since it appears to always provide a reasonable classification performance in most applications. Further, as the 1-NN classifier does not require any user-specified parameters (except perhaps the distance metric used to find the nearest neighbor, but Euclidean distance is commonly used), its classification results are implementation independent.

In many classification problems, the classifier is expected to have some desired invariant properties. An example is the shift invariance of characters in character recognition; a change in a character's location should not affect its classification. If the preprocessing or the representation scheme does not normalize the input pattern for this invariance, then the same character may be represented at multiple positions in the feature space. These positions define a one-dimensional subspace. As more invariants are considered, the dimensionality of this subspace correspondingly increases. Template matching

Fig. 6. Classification error vs. the number of features using the floating search feature selection technique (see text).
From “Statistical Pattern Recognition: A Review,” by Jain, Duin,
and Mao; IEEE Trans. on PAMI, January 2000
Feature Selection and Extraction
In general, we would like a classifier or clustering
algorithm to use a minimum number of dimensions:
• reduces the number of computations
• improves statistical estimation reliability (more samples per dimension)
We can often use prior knowledge about the physical experiment
to choose the most discriminating measurements, but this is not
always the case.
Feature Selection:
Given m measurements, choose n<m best as features.
Feature Extraction:
Given m measurements {xi}, find n<m functions yj=fj(x1, ..., xm)
which produce the best features.
Feature Selection
We require:
1. A criterion to establish the best features
2. An algorithm to optimize the chosen criterion
We need to evaluate a measure of the feature’s utility to
determine how much it contributes to performance.
A natural (but usually impractical) measure is the increase in
classification error when feature xk is left out:

max_k J(xk) = P(E | {xi, i ≠ k}) − P(E | {xi, ∀i})
Can’t usually test classifier before choosing features, so we need
some sub-optimal but reasonable criterion.
Feature Selection - Criterion
Interclass distance (normalized by intraclass distance)
2 classes:

Ji = (mi1 − mi2)² / (s²i1 + s²i2)

where mi1 = mean of the ith feature in class 1
and s²i1 = scatter (variance) of the ith feature in class 1
k classes:

Ji = Σ_{j=1}^{k} Σ_{l=1}^{k} (mij − mil)² / (s²ij + s²il)

Information measure:

I(Ci) = − log P(Ci)
if feature yi is independent of class Ci: I(Ci, yi) = 0
to evaluate a feature set Y: I(Ci, Y) = E[I(Ci, yj)]
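As a quick illustration, here is a minimal pure-Python sketch of the two-class criterion, evaluated per feature on a small hypothetical dataset and then used to rank the features (the data values and feature count are invented for illustration):

```python
from statistics import mean, pvariance

def interclass_distance(vals_c1, vals_c2):
    # J_i = (m_i1 - m_i2)^2 / (s^2_i1 + s^2_i2) for a single feature i
    m1, m2 = mean(vals_c1), mean(vals_c2)
    s1, s2 = pvariance(vals_c1), pvariance(vals_c2)
    return (m1 - m2) ** 2 / (s1 + s2)

# Hypothetical samples: rows are patterns, columns are features.
class1 = [[1.0, 5.0], [1.2, 4.0], [0.8, 6.0]]
class2 = [[3.0, 5.5], [3.2, 4.5], [2.8, 5.0]]

J = [interclass_distance([row[i] for row in class1],
                         [row[i] for row in class2])
     for i in range(2)]
ranked = sorted(range(2), key=lambda i: J[i], reverse=True)
# Feature 0 separates the class means well relative to its scatter,
# so it ranks first; feature 1 has identical class means, so J = 0.
```

Note that the criterion rewards separated means and penalizes within-class scatter, exactly as in the formula above.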
Optimizing
Whether we use distance or some other criterion, we still have to
find the best subset of measurements.
For m measurements and n features, there are

C(m, n) = m! / (n! (m − n)!)

possible subsets to compare.
Usually an exhaustive comparison is not feasible.
Need a sub-optimal strategy...
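To see the scale of the problem, two stdlib `math.comb` evaluations of the subset count (the particular m and n are arbitrary examples):

```python
from math import comb

# C(m, n) = m! / (n! (m - n)!)
print(comb(10, 3))   # 120: exhaustive comparison is still easy
print(comb(50, 10))  # 10272278170: far too many subsets to evaluate
```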
1. Feature Ranking
If Ji = measure of ith feature’s effectiveness, then order features
such that:
J1 ≥ J2 ≥ ... ≥ Jn ≥ ... ≥ Jm
- Take first n features.
- Might not comprise the best subset, since some features might
be highly correlated.
2. Incrementally Best Feature
Let y1 = xi such that Ji is maximized.
Choose y2 = xj such that J(y1, xj) > J(y1, xk) for all k ≠ j.
Choose y3 = xk similarly (keeping y1 and y2).
And so on...
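A sketch of this greedy strategy, with a stand-in subset criterion J (the scores and the correlation penalty are invented for illustration; any real subset criterion could be plugged in):

```python
def greedy_select(features, n, J):
    """Incrementally-best selection: repeatedly add the feature that
    maximizes the subset criterion J together with those already chosen."""
    chosen, remaining = [], list(features)
    while len(chosen) < n and remaining:
        best = max(remaining, key=lambda f: J(chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Invented criterion: sum of individual scores, penalized when two
# chosen features are highly correlated (values are illustrative only).
score = {"a": 3.0, "b": 2.5, "c": 1.0}
penalty = {frozenset(("a", "b")): 2.0}

def J(subset):
    total = sum(score[f] for f in subset)
    for i, f in enumerate(subset):
        for g in subset[i + 1:]:
            total -= penalty.get(frozenset((f, g)), 0.0)
    return total

print(greedy_select(["a", "b", "c"], 2, J))  # ['a', 'c']
# Plain ranking would pick {'a', 'b'}, but b is correlated with a,
# so the incremental step prefers c.
```

This also illustrates the earlier point about ranking: evaluating features jointly avoids choosing two highly correlated features.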
Note: When we start evaluating subsets rather than individual
features, we might use a criterion like:

(a) |SB| / |SW|  (ratio of between-class to within-class scatter)

(b) Σ_{i=1}^{k} Σ_{j=1}^{k} (mi − mj)^T Sj⁻¹ (mi − mj)

(c) I(C, Y)
Still don’t necessarily get the best subset, since it’s possible that

J(y2, y3) > J(y1, y2)

i.e., a pair that excludes the individually best feature can beat
the greedily chosen pair.
3. Successive Addition/Deletion
Given k features {y1, ..., yk} with Jk( {y} ), add yk+1 such that
J(y1, ..., yk+1) is maximized.
Now find l, 1 ≤ l ≤ k, to maximize J( {yi, 1 ≤ i ≤ k+1, i ≠ l} )
If this is greater than Jk, then delete yl and add the next
incrementally best feature.
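The addition/deletion loop above can be sketched as follows (again with an invented criterion J; the scores, penalties, and target size are illustrative assumptions, not part of the lecture material):

```python
def add_delete_select(features, n, J):
    """Successive addition/deletion: after adding the incrementally best
    feature y_{k+1}, try deleting whichever earlier feature y_l (l <= k)
    leaves the highest J; keep the deletion only if it beats J_k."""
    chosen, remaining = [], list(features)
    while len(chosen) < n and remaining:
        prev_score = J(chosen)                     # J_k before adding
        best = max(remaining, key=lambda f: J(chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
        if len(chosen) > 2:
            l = max(range(len(chosen) - 1),        # only l <= k
                    key=lambda i: J(chosen[:i] + chosen[i + 1:]))
            if J(chosen[:l] + chosen[l + 1:]) > prev_score:
                remaining.append(chosen.pop(l))    # delete y_l, keep adding
    return chosen

# Invented criterion: individual scores minus a penalty for keeping
# two highly correlated features together.
score = {"a": 5.0, "b": 4.0, "c": 4.0, "d": 4.0}
penalty = {frozenset(p): 3.0 for p in (("a", "b"), ("a", "c"), ("a", "d"))}

def J(subset):
    total = sum(score[f] for f in subset)
    for i, f in enumerate(subset):
        for g in subset[i + 1:]:
            total -= penalty.get(frozenset((f, g)), 0.0)
    return total

print(add_delete_select(["a", "b", "c", "d"], 3, J))  # ['b', 'c', 'd']
# 'a' is added first (best alone) but later deleted: it is correlated
# with every other feature, so the set {b, c, d} scores higher.
```

The backward (deletion) step is what lets the method undo an early greedy choice that the purely incremental strategy would be stuck with.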
Feature Selection
In summary:
1. Evaluate feature to determine how much it contributes to
performance.
a. Interclass distance
b. Information measures
2. Optimize chosen criterion
a. Rank features by effectiveness and choose best
b. Incrementally add features to set of chosen features
c. Successively add and delete features to chosen set