Machine Learning: Jordan Boyd-Graber
users.umiacs.umd.edu/~jbg/teaching/CMSC_726/08b.pdf

Page 1:

Introduction to Machine Learning

Machine Learning: Jordan Boyd-Graber, University of Maryland
KERNEL SVMS

Slides adapted from Jerry Zhu

Machine Learning: Jordan Boyd-Graber | UMD Introduction to Machine Learning | 1 / 27

Page 2:

Can you solve this with a linear separator?

Page 5:

Adding another dimension

Behold yon miserable creature. That Point is a Being like ourselves, but confined to the non-dimensional Gulf. He is himself his own World, his own Universe; of any other than himself he can form no conception; he knows not Length, nor Breadth, nor Height, for he has had no experience of them; he has no cognizance even of the number Two; nor has he a thought of Plurality, for he is himself his One and All, being really Nothing. Yet mark his perfect self-contentment, and hence learn this lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy. (Edwin A. Abbott, Flatland)

Page 6:

Problems get easier in higher dimensions

Page 7:

What’s special about SVMs?

$$\max_{\vec{\alpha}} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, (\vec{x}_i \cdot \vec{x}_j) \qquad (1)$$

• This dot product is basically just how much x_i looks like x_j. Can we generalize that?

• Kernels!
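Since the data enters the dual objective only through dot products, swapping in a kernel is a one-line change. A minimal sketch (the function and data below are illustrative, not from the slides):

```python
import numpy as np

def dual_objective(alpha, X, y, kernel):
    """SVM dual objective with the dot product x_i . x_j replaced by kernel(x_i, x_j)."""
    m = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    # sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K[i, j]
    return alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K)

linear = lambda a, b: np.dot(a, b)   # with this choice, the expression is Equation 1
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
```

Any other kernel slots into the same objective without touching the optimizer.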

Page 10:

What’s a kernel?

• A function K : X × X → ℝ is a kernel over X.

• This is equivalent to taking the dot product ⟨φ(x₁), φ(x₂)⟩ for some mapping φ.

• Mercer's Theorem: so long as the function is continuous, symmetric, and positive semi-definite, K admits an expansion of the form

$$K(x, x') = \sum_{n=0}^{\infty} a_n \, \phi_n(x) \, \phi_n(x') \qquad (2)$$

• The computational cost is just in computing the kernel.
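Mercer's condition can be spot-checked numerically: a valid kernel's Gram matrix is symmetric and positive semi-definite. A sketch with made-up data (`gram_matrix` and the RBF lambda are illustrative helpers, not from the slides):

```python
import numpy as np

def gram_matrix(X, kernel):
    """Pairwise kernel evaluations K[i, j] = kernel(x_i, x_j)."""
    return np.array([[kernel(a, b) for b in X] for a in X])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
K = gram_matrix(X, rbf)

# A valid kernel yields a symmetric matrix whose eigenvalues are all
# (numerically) non-negative:
eigenvalues = np.linalg.eigvalsh(K)
```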

Page 12:

Polynomial Kernel

$$K(x, x') = (x \cdot x' + c)^d \qquad (3)$$

When d = 2:
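For d = 2 in two dimensions the implicit mapping can be written out explicitly: φ(x) = (x₁², x₂², √2·x₁x₂, √(2c)·x₁, √(2c)·x₂, c), and its dot product reproduces the kernel. A quick check (the two points are arbitrary):

```python
import numpy as np

def poly_kernel(x, xp, c=1.0, d=2):
    """Equation 3: (x . x' + c)^d."""
    return (np.dot(x, xp) + c) ** d

def phi(x, c=1.0):
    """Explicit feature map for the d = 2 polynomial kernel in two dimensions."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# phi(x) . phi(xp) equals poly_kernel(x, xp), yet the kernel never builds phi.
```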

Page 14:

Gaussian Kernel

$$K(x, x') = \exp\left(-\frac{\lVert x' - x \rVert^2}{2\sigma^2}\right) \qquad (4)$$

which can be rewritten as

$$K(x, x') = \sum_{n} \frac{(x \cdot x')^n}{\sigma^n \, n!} \qquad (5)$$

(All polynomials!)
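The rewrite comes from factoring the exponential: exp(−‖x′−x‖²/2σ²) = exp(−‖x‖²/2σ²)·exp(−‖x′‖²/2σ²)·exp(x·x′/σ²), where the last factor is a power series in x·x′, hence "all polynomials." A numeric sanity check of the factorization (points and σ are arbitrary):

```python
import math
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """Equation 4 directly."""
    return math.exp(-np.sum((xp - x) ** 2) / (2 * sigma ** 2))

def via_series(x, xp, sigma=1.0, terms=40):
    """Same value via the factorization, expanding exp(x . x' / sigma^2) as a power series."""
    s2 = sigma ** 2
    series = sum(np.dot(x, xp) ** n / (s2 ** n * math.factorial(n)) for n in range(terms))
    return math.exp(-np.dot(x, x) / (2 * s2)) * math.exp(-np.dot(xp, xp) / (2 * s2)) * series

x, xp = np.array([0.5, -0.2]), np.array([0.1, 0.3])
```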

Page 17:

Tree Kernels

• Sometimes we have examples x that are hard to express as vectors.

• For example, the sentences "a dog" and "a cat" have internal syntactic structure.

Here 3 of the 5 structures match, so the tree kernel returns 0.6.
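A toy version of the idea, counting shared productions between two bracketed trees (the tuple encoding and the counting convention are illustrative and differ from the slide's 3-of-5 count; real tree kernels count matching subtrees recursively):

```python
def productions(tree):
    """Collect (parent, children) rewrite rules from a nested-tuple tree like ("NP", ("DT", "a"), ...)."""
    if isinstance(tree, str):      # a leaf word has no production
        return []
    label, children = tree[0], tree[1:]
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, child_labels)]
    for child in children:
        rules.extend(productions(child))
    return rules

def toy_tree_kernel(t1, t2):
    """Fraction of t1's productions that also appear in t2."""
    p1, p2 = productions(t1), set(productions(t2))
    return sum(1 for rule in p1 if rule in p2) / len(p1)

a_dog = ("NP", ("DT", "a"), ("NN", "dog"))
a_cat = ("NP", ("DT", "a"), ("NN", "cat"))
# NP -> DT NN and DT -> a match; NN -> dog does not.
```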

Page 20:

What does this do to learnability?

• Kernelized hypothesis spaces are obviously more complicated.

• What does this do to complexity?

• Rademacher complexity for data with kernel radius r and weight norm at most Λ: S ⊆ {x : K(x, x) ≤ r²}, H = {x ↦ w · φ(x) : ‖w‖ ≤ Λ}

$$\hat{R}_S(H) \le \sqrt{\frac{r^2 \Lambda^2}{m}} \qquad (6)$$

• The proof requires real analysis.
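Plugging illustrative numbers into Equation 6 shows the familiar 1/√m decay (the values of r, Λ, and m below are made up):

```python
import math

def rademacher_bound(r, lam, m):
    """Upper bound sqrt(r^2 Lambda^2 / m) on the empirical Rademacher complexity (Equation 6)."""
    return math.sqrt(r ** 2 * lam ** 2 / m)

# Quadrupling the sample size halves the bound:
b100 = rademacher_bound(1.0, 2.0, 100)   # 0.2
b400 = rademacher_bound(1.0, 2.0, 400)   # 0.1
```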

Page 23:

Margin learnability

• With probability 1 − δ:

$$R(h) \le \hat{R}_\rho(h) + 2\sqrt{\frac{r^2 \Lambda^2 / \rho^2}{m}} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}} \qquad (7)$$

• So if you can find a simple kernel representation that induces a margin, use it!

• . . . so long as you can handle the computational complexity.
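Equation 7 makes the tradeoff concrete: a larger margin ρ shrinks the complexity term. A sketch evaluating the right-hand side (all inputs are made-up illustrative values):

```python
import math

def margin_bound(emp_risk, r, lam, rho, m, delta=0.05):
    """Right-hand side of Equation 7."""
    complexity = 2 * math.sqrt(r ** 2 * lam ** 2 / (rho ** 2 * m))
    confidence = math.sqrt(math.log(1 / delta) / (2 * m))
    return emp_risk + complexity + confidence

# Doubling the margin halves the complexity term, tightening the bound:
loose = margin_bound(0.10, r=1.0, lam=1.0, rho=1.0, m=1000)
tight = margin_bound(0.10, r=1.0, lam=1.0, rho=2.0, m=1000)
```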

Page 26:

How does it affect optimization?

• Replace all dot products with kernel evaluations K(x₁, x₂).

• This makes computation more expensive, but the overall structure is the same.

• Try linear first!
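The "same structure, only the dot product changes" point can be seen in a kernel perceptron, a simpler cousin of the SVM shown here because it fits in a few lines (it is not the SVM solver): the mistake-driven update is untouched.

```python
import numpy as np

def kernel_perceptron(X, y, kernel, epochs=20):
    """Perceptron in dual form: every dot product is a kernel evaluation."""
    m = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    alpha = np.zeros(m)
    for _ in range(epochs):
        for i in range(m):
            if np.sign(np.sum(alpha * y * K[:, i])) != y[i]:
                alpha[i] += 1    # same update rule as the linear perceptron
    return alpha

# XOR-style data: no linear separator exists, but a quadratic kernel separates it.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
y = np.array([1, -1, -1, 1])
quad = lambda a, b: (np.dot(a, b) + 1) ** 2
alpha = kernel_perceptron(X, y, quad)
preds = np.sign([(alpha * y) @ np.array([quad(xi, x) for xi in X]) for x in X])
```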

Page 27:

Examples

Kernelized SVM

from sklearn import svm

X, Y = read_data("ex8a.txt")
clf = svm.SVC(kernel=kk, degree=dd, gamma=gg)  # kk, dd, gg hold the kernel name and hyperparameters
clf.fit(X, Y)

Page 28:

Examples

Linear Kernel Doesn’t Work

Page 29:

Examples

Polynomial Kernel

$$K(x, x') = (x \cdot x' + c)^d \qquad (8)$$

When d = 2:

Page 30:

Examples

Polynomial Kernel, d = 1, c = 5

Page 31:

Examples

Polynomial Kernel, d = 2, c = 5

Page 32:

Examples

Polynomial Kernel, d = 3, c = 5

Page 33:

Examples

Gaussian Kernel

$$K(x, x') = \exp\left(-\gamma \, \lVert x' - x \rVert^2\right) \qquad (9)$$
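γ controls how fast similarity decays with distance, which is what drives the increasingly wiggly decision boundaries in the plots that follow. A quick look (the two points are arbitrary):

```python
import numpy as np

def rbf(x, xp, gamma):
    """Equation 9: similarity decays with squared distance, faster for larger gamma."""
    return np.exp(-gamma * np.sum((xp - x) ** 2))

x, xp = np.array([0.0, 0.0]), np.array([1.0, 0.0])
values = {g: rbf(x, xp, g) for g in (1, 10, 100)}
# Large gamma: the kernel is near zero everywhere except right at a training
# point, so the classifier can effectively memorize the training set.
```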

Page 34:

Examples

RBF Kernel, γ = 2

Page 35:

Examples

RBF Kernel, γ = 100

Page 36:

Examples

RBF Kernel, γ = 1

Page 37:

Examples

RBF Kernel, γ = 10

Page 38:

Examples

RBF Kernel, γ = 100

Page 39:

Examples

RBF Kernel, γ = 1000

Page 40:

Examples

Be careful!

• Which has the lowest training error?

• Which one would generalize best?
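Training error alone rewards the most flexible kernel; cross-validation answers the second question. A sketch using scikit-learn's grid search on synthetic ring data (the dataset and the γ grid are made up; the slides use ex8a.txt):

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.sum(X ** 2, axis=1) > 1.2).astype(int)   # a ring: not linearly separable

# Score each gamma by held-out accuracy rather than training accuracy.
search = GridSearchCV(svm.SVC(kernel="rbf"),
                      {"gamma": [0.1, 1, 10, 100, 1000]}, cv=5)
search.fit(X, y)
best_gamma = search.best_params_["gamma"]
```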

Page 41:

Examples

Recap

• This completes our discussion of SVMs.

• Workhorse method of machine learning.

• Flexible, fast, effective.

• Kernels: applicable to a wide range of data; the inner product trick keeps the method simple.
