Lecture 17

Page 1: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Memorial University of Newfoundland

Pattern Recognition, Lecture 17, July 11, 2006

http://www.engr.mun.ca/~charlesr

Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM

EN-3026

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Presentations - July 18th and 20th

• All graduate students are to give a 10-12 minute presentation + answer questions for 3-5 minutes

• All students are expected to attend

• Fill in evaluation sheets for each presentation

• Email me (charles.robertson@verafin.com) by Thursday, July 13th, with:

• Final project title

• Requirements for presentation (if any)

2

“Chance favours the prepared mind.” (Louis Pasteur)

Page 2: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Presentations - July 25th

• Suggestions and considerations

• Think about what your audience already knows and probably doesn’t know

• Include information on the pattern recognition algorithm (2-4 minutes)

• Describe the context of your use of the algorithm (1-2 min)

• Describe how you used the algorithm (2-3 min)

• Discuss your experiment and results - is the algorithm actually useful? (2-3 min)

3

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Recap

• K-Means

• Basic algorithm review

• Extensions - global minimization method

• Discussion of strategies for dealing with k

• Examples

• K-Means - basic and global minimization

• Dendrogram

4

Page 3: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Feature Selection and Extraction

• Classifier performance depends on a combination of the number of samples, the number of features, and the complexity of the classifier.

• Ideally, using more features leads to better performance

• Only true if the class-conditional densities are completely known

• Not true if the number of training samples is small relative to the number of features

• This is known as the ‘peaking phenomenon’, or equivalently ‘the curse of dimensionality’ (illustrated in the sketch below)

5
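A quick way to see the peaking phenomenon is to simulate it. The sketch below is a minimal NumPy illustration; the Trunk-style synthetic data, the nearest-mean classifier, and the sample sizes are my own illustrative choices, not something specified in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_error(n_train, n_test, n_features):
    """Trunk-style data: feature j carries a mean shift of +/- 1/sqrt(j+1),
    so every extra feature adds a little information, but less and less."""
    shift = 1.0 / np.sqrt(np.arange(1, n_features + 1))

    def sample(n):
        x0 = rng.normal(-shift, 1.0, size=(n, n_features))  # class 0
        x1 = rng.normal(+shift, 1.0, size=(n, n_features))  # class 1
        return x0, x1

    tr0, tr1 = sample(n_train)
    te0, te1 = sample(n_test)
    m0, m1 = tr0.mean(axis=0), tr1.mean(axis=0)   # estimated class means

    def predict(x):
        d0 = ((x - m0) ** 2).sum(axis=1)
        d1 = ((x - m1) ** 2).sum(axis=1)
        return (d1 < d0).astype(int)              # 1 means "class 1"

    err = (predict(te0) == 1).sum() + (predict(te1) == 0).sum()
    return err / (2 * n_test)

# With a small, fixed training set the test error drops at first, then
# climbs again as poorly estimated dimensions start to dominate.
for d in (1, 2, 5, 10, 20, 50, 100, 200):
    print(d, round(nearest_mean_error(n_train=10, n_test=1000, n_features=d), 3))
```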

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition 6

… selection algorithms in terms of classification error and run time. The general conclusion is that the sequential forward floating search (SFFS) method performs almost as well as the branch-and-bound algorithm and demands lower computational resources. Somol et al. have proposed an adaptive version of the SFFS algorithm which has been shown to have superior performance.

These feature selection methods can be used with any of the well-known classifiers. But, if a multilayer feed-forward network is used for pattern classification, then the node-pruning method simultaneously determines both the optimal feature subset and the optimal network classifier. First train a network and then remove the least salient node (in input or hidden layers). The reduced network is trained again, followed by a removal of yet another least salient node. This procedure is repeated until the desired trade-off between classification error and size of the network is achieved. The pruning of an input node is equivalent to removing the corresponding feature.

How reliable are the feature selection results when the ratio of the available number of training samples to the number of features is small? Suppose the Mahalanobis distance is used as the feature selection criterion. It depends on the inverse of the average class covariance matrix. The imprecision in its estimate in small sample size situations can result in an optimal feature subset which is quite different from the optimal subset that would be obtained when the covariance matrix is known. Jain and Zongker illustrate this phenomenon for a two-class classification problem involving 20-dimensional Gaussian class-conditional densities (the same data was also used by Trunk to demonstrate the curse of dimensionality phenomenon). As expected, the quality of the selected feature subset for small training sets is poor, but improves as the training set size increases. For example, with 20 patterns in the training set, the branch-and-bound algorithm selected a subset of 10 features which included only five features in common with the ideal subset of 10 features (when densities were known). With over 2,000 patterns in the training set, the branch-and-bound procedure selected a 10-feature subset with only one wrong feature.

Fig. 6 shows an example of the feature selection procedure using the floating search technique on the PCA features in the digit dataset for two different training set sizes. The test set size is fixed at 1,000 patterns. In each of the selected feature spaces, with dimensionalities ranging from 1 to 64, the Bayes plug-in classifier is designed assuming Gaussian densities with equal covariance matrices and evaluated on the test set. The feature selection criterion is the minimum pairwise Mahalanobis distance. In the small sample size case (total of 100 training patterns), the curse of dimensionality phenomenon can be clearly observed. In this case, the optimal number of features is about 20, which equals n/5, where n is the number of training patterns. The rule-of-thumb of having less than n/10 features is on the safe side in general.

Classifiers

Once a feature selection or classification procedure finds a proper representation, a classifier can be designed using a number of possible approaches. In practice, the choice of a classifier is a difficult problem and it is often based on which classifier(s) happen to be available, or best known, to the user.

We identify three different approaches to designing a classifier. The simplest and the most intuitive approach to classifier design is based on the concept of similarity: patterns that are similar should be assigned to the same class. So, once a good metric has been established to define similarity, patterns can be classified by template matching or the minimum distance classifier using a few prototypes per class. The choice of the metric and the prototypes is crucial to the success of this approach. In the nearest mean classifier, selecting prototypes is very simple and robust; each pattern class is represented by a single prototype which is the mean vector of all the training patterns in that class. More advanced techniques for computing prototypes are vector quantization and learning vector quantization, and the data reduction methods associated with the one-nearest neighbor decision rule (1-NN), such as editing and condensing. The most straightforward 1-NN rule can be conveniently used as a benchmark for all the other classifiers since it appears to always provide a reasonable classification performance in most applications. Further, as the 1-NN classifier does not require any user-specified parameters (except perhaps the distance metric used to find the nearest neighbor, but Euclidean distance is commonly used), its classification results are implementation independent.

In many classification problems, the classifier is expected to have some desired invariant properties. An example is the shift invariance of characters in character recognition; a change in a character’s location should not affect its classification. If the preprocessing or the representation scheme does not normalize the input pattern for this invariance, then the same character may be represented at multiple positions in the feature space. These positions define a one-dimensional subspace. As more invariants are considered, the dimensionality of this subspace correspondingly increases. Template matching …

[Fig. 6 from the review: classification error vs. the number of features using the floating search feature selection technique (see text).]

From “Statistical Pattern Recognition: A Review” by Jain, Duin, and Mao; IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), January 2000.

Page 4: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Feature Selection and Extraction

7

In general, we would like a classifier or clustering algorithm to use a minimum number of dimensions:
- reduces the number of computations
- improves the reliability of statistical estimates (more samples per dimension)

We can often use prior knowledge about the physical experiment to choose the most discriminating measurements, but this is not always the case.

Feature Selection:

Given m measurements, choose n<m best as features.

Feature Extraction:

Given m measurements {xi}, find n < m functions yj = fj(x1, ..., xm) which produce the best features.
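The distinction is easy to state in code: selection keeps a subset of the original measurement columns, while extraction computes new features as functions of all of them. A minimal NumPy sketch, where the random data and the choice of a linear (PCA-like) projection for extraction are illustrative assumptions:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 8))    # 100 samples, m = 8 measurements

# Feature selection: keep n < m of the original measurements unchanged.
selected_idx = [0, 3, 5]                               # chosen by some criterion J
X_selected = X[:, selected_idx]                        # shape (100, 3)

# Feature extraction: build n < m new features y_j = f_j(x_1, ..., x_m).
# Here f is a linear map; the top principal components are one natural choice.
Xc = X - X.mean(axis=0)
W = np.linalg.svd(Xc, full_matrices=False)[2][:3].T    # (8, 3) projection matrix
X_extracted = Xc @ W                                   # shape (100, 3)
```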

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Feature Selection

8

We require:

1. A criterion to establish the best features

2. An algorithm to optimize the chosen criterion

We need to evaluate a measure of the feature’s utility to determine how much it contributes to performance.

Ideally, a feature’s utility would be the increase in classification error when it is left out; we would keep the features $x_k$ that maximize

$$J(x_k) = P(E \mid \{x_i,\ i \ne k\}) - P(E \mid \{x_i\ \forall i\})$$

Can’t usually test the classifier before choosing features, so we need some sub-optimal but reasonable criterion.
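If we could afford to design a classifier for every candidate, the criterion above could be estimated directly: drop feature x_k, measure the error on held-out data, and compare with the error using all features. A hedged sketch of that idea, using a simple nearest-mean classifier of my own choosing (nothing here is prescribed by the slides):

```python
import numpy as np

def nearest_mean_error(X_train, y_train, X_test, y_test):
    """Error rate of a minimum-distance-to-class-mean classifier."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    y_hat = classes[dists.argmin(axis=1)]
    return (y_hat != y_test).mean()

def utility(k, X_train, y_train, X_test, y_test):
    """Estimate J(x_k) = P(E | all features except x_k) - P(E | all features):
    how much the error rises when feature k is left out."""
    keep = [i for i in range(X_train.shape[1]) if i != k]
    err_without = nearest_mean_error(X_train[:, keep], y_train, X_test[:, keep], y_test)
    err_all = nearest_mean_error(X_train, y_train, X_test, y_test)
    return err_without - err_all
```

In practice this is usually too expensive, since it means designing and testing the classifier before the features are chosen; hence the cheaper criteria on the following slides.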

Page 5: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Feature Selection - Criterion

9

1. Interclass distance (normalized by intraclass distance)

2 classes:

$$J_i = \frac{(m_{i1} - m_{i2})^2}{s_{i1}^2 + s_{i2}^2}$$

where $m_{i1}$ = mean of the $i$th feature in class 1, and $s_{i1}^2$ = scatter (variance) of the $i$th feature in class 1.

k classes:

$$J_i = \sum_{j=1}^{k} \sum_{l=1}^{k} \frac{(m_{ij} - m_{il})^2}{s_{ij}^2 + s_{il}^2}$$

2. Information measure

$$I(C_i) \propto -\log P(C_i)$$

If class $C_i$ and feature $y_j$ are independent, $I(C_i, y_j) = 0$; to evaluate a feature set $Y$, use $I(C_i, Y) = E[\,I(C_i, y_j)\,]$.
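A small sketch of the interclass-distance criterion above for a single feature; it assumes each class’s values of that feature are given as a 1-D NumPy array, and the function names are mine:

```python
import numpy as np

def interclass_distance_2(f_class1, f_class2):
    """J_i = (m_i1 - m_i2)^2 / (s_i1^2 + s_i2^2) for one feature, two classes."""
    m1, m2 = f_class1.mean(), f_class2.mean()
    s1, s2 = f_class1.var(), f_class2.var()
    return (m1 - m2) ** 2 / (s1 + s2)

def interclass_distance_k(feature_by_class):
    """Sum of pairwise separations over all k classes (list of 1-D arrays)."""
    J = 0.0
    for fj in feature_by_class:
        for fl in feature_by_class:
            num = (fj.mean() - fl.mean()) ** 2
            den = fj.var() + fl.var()
            if den > 0:
                J += num / den
    return J
```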

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Optimizing

10

Whether we use distance or some other criterion, we still have to find the best subset of measurements.

For m measurements and n features, there are

$$\binom{m}{n} = \frac{m!}{n!\,(m-n)!}$$

combinations. Usually an exhaustive comparison is not feasible.

Need a sub-optimal strategy...
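The count alone shows why exhaustive evaluation is hopeless for even modest problems; a one-liner with the standard library (the numbers are purely illustrative):

```python
from math import comb

m, n = 64, 10          # e.g., choose 10 features out of 64 measurements
print(comb(m, n))      # 151473214816 candidate subsets to evaluate
```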

Page 6: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition 11

1. Feature Ranking

If $J_i$ is a measure of the $i$th feature’s effectiveness, then order the features such that

$$J_1 \ge J_2 \ge \dots \ge J_n \ge \dots \ge J_m$$

- Take first n features.

- Might not comprise the best subset, since some features might be highly correlated.
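Ranking is just a sort on per-feature scores. A short sketch, assuming the two-class criterion `interclass_distance_2` from the earlier sketch is available:

```python
import numpy as np

def rank_features(X_class1, X_class2, n):
    """Score every column independently and keep the n best.
    X_class1, X_class2: (samples, m) arrays for the two classes."""
    scores = np.array([interclass_distance_2(X_class1[:, i], X_class2[:, i])
                       for i in range(X_class1.shape[1])])
    order = np.argsort(scores)[::-1]     # J_1 >= J_2 >= ... after reordering
    return order[:n], scores[order]
```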

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition 12

2. Incrementally Best Feature

Let $y_1 = x_i$ such that $J_i$ is maximized.

Choose $y_2 = x_j$ such that $J(y_1, x_j) > J(y_1, x_k)$ for all $k \ne j$.

Choose $y_3 = x_k$ such that $J(y_1, y_2, x_k)$ is maximized, and so on.
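The incrementally-best rule is plain sequential forward selection. A minimal greedy sketch, written against any caller-supplied subset criterion J(X_subset, y); the function and argument names are mine:

```python
import numpy as np

def forward_select(X, y, J, n):
    """Greedily grow a feature set: at each step add the single measurement
    that maximizes the subset criterion J(X[:, subset], y)."""
    chosen, remaining = [], list(range(X.shape[1]))
    while len(chosen) < n and remaining:
        best_j, best_score = None, -np.inf
        for j in remaining:
            score = J(X[:, chosen + [j]], y)
            if score > best_score:
                best_j, best_score = j, score
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen
```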

Note: When we start evaluating subsets rather than individual features, we might use a criterion like:

(a) $\dfrac{|S_B|}{|S_W|}$

(b) $\displaystyle \sum_{i=1}^{k} \sum_{j=1}^{k} (m_i - m_j)^T S_j^{-1} (m_i - m_j)$

(c) $I(C, Y)$
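Criterion (a) is the classic scatter ratio: between-class scatter of the subset’s class means against the pooled within-class scatter. A sketch under the usual definitions (weighting S_B by class sizes is one common convention among several):

```python
import numpy as np

def scatter_ratio(X, y):
    """|S_B| / |S_W| for the feature subset X (samples x features)."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)           # within-class scatter
        diff = (mc - overall_mean)[:, None]
        S_B += len(Xc) * (diff @ diff.T)         # between-class scatter
    return np.linalg.det(S_B) / np.linalg.det(S_W)
```

Note that S_B has rank at most k-1 for k classes, so |S_B| is zero once the subset has more than k-1 features; trace-based variants of this criterion avoid that limitation.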

Page 7: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition 13

Still don’t necessarily get the best subset, since it’s possible that

$$J(y_2, y_3) > J(y_1, y_2)$$

i.e., a pair that omits the individually best feature $y_1$ can beat the pair the incremental procedure builds.

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition 14

3. Successive Addition/Deletion

Given $k$ features $\{y_1, \dots, y_k\}$ with criterion value $J_k = J(\{y_1, \dots, y_k\})$, add $y_{k+1}$ such that $J(y_1, \dots, y_{k+1})$ is maximized.

Now find $l$, $1 \le l \le k$, that maximizes $J(\{y_i : 1 \le i \le k+1,\ i \ne l\})$.

If this is greater than $J_k$, then delete $y_l$ and add the next incrementally best feature.
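A sketch of the add-one / conditionally-delete-one loop described above (a simplified relative of the floating search mentioned in the quoted review). It reuses a caller-supplied subset criterion J as in the forward-selection sketch; the names are mine:

```python
def add_delete_select(X, y, J, n):
    """Grow the set one feature at a time; after each addition, drop a
    previously chosen feature if doing so improves the criterion."""
    chosen = []
    while len(chosen) < n:
        # 1. Incrementally best addition.
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        add = max(candidates, key=lambda j: J(X[:, chosen + [j]], y))
        chosen.append(add)

        # 2. Conditional deletion: among the previously chosen features,
        #    find the one whose removal hurts the criterion the least.
        if len(chosen) > 2:
            drop = max(chosen[:-1],
                       key=lambda l: J(X[:, [i for i in chosen if i != l]], y))
            without = [i for i in chosen if i != drop]
            if J(X[:, without], y) > J(X[:, chosen[:-1]], y):
                chosen = without
    return chosen
```

Each accepted deletion strictly improves the criterion of the current subset, so the loop cannot cycle; it simply trades a little extra computation for a chance to undo an early greedy mistake.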

Page 8: Lecture 17

ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition 15

Feature Selection

In summary:

1. Evaluate each feature to determine how much it contributes to performance.

  a. Interclass distance
  b. Information measures

2. Optimize chosen criterion

  a. Rank features by effectiveness and choose the best
  b. Incrementally add features to the set of chosen features
  c. Successively add and delete features from the chosen set