introduction to kernel methods - github pages · to kernel methods f. gonz alez introduction the...

99
Introduction to Kernel Methods F. Gonz´ alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm Kernel Functions Kernel Algorithms Kernels in Complex Structured Data Introduction to Kernel Methods Fabio A. Gonz´ alez Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogot´ a September 11, 2015

Upload: others

Post on 22-Sep-2020

33 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Introduction to Kernel Methods

Fabio A. Gonzalez Ph.D.

Depto. de Ing. de Sistemas e IndustrialUniversidad Nacional de Colombia, Bogota

September 11, 2015

Page 2: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 3: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

Motivation

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 4: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

Motivation

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 5: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

Motivation

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Problem 1

How to separate these two classes using a linear function?

Page 6: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

Motivation

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Problem 2

How to do symbolic regression?

Σ = {A,C ,G ,T}

f : Σd → RACGTA 7→ 10.0GTCCA 7→ 11.3GGTAC 7→ 1.0CCTGA 7→ 4.5

......

...

Page 7: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 8: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 9: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Problem 1

• How to separate these two classes using a linear function?

Page 10: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Map to R3:

φ : R2 → R3

(x , y) 7→ (x 2, y2, xy)

Page 11: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Map to R3:

φ : R2 → R3

(x , y) 7→ (x 2, y2, xy)

Page 12: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Input space vs. feature space

Page 13: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 14: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dot product in the feature space

φ : R2 → R3

(x1, x2) 7→ (x 21 , x

22 ,√

2x1x2)

〈φ(x ), φ(z )〉 =⟨

(x 21 , x

22 ,√

2x1x2), (z21 , z

22 ,√

2z1z2)⟩

= x 21 z

21 + x 2

2 z22 + 2x1x2z1z2

= (x1z1 + x2z2)2

= 〈x , z 〉2

• A function k : X ×X → R such thatk(x , z ) = 〈φ(x ), φ(z )〉 is called a kernel

• Morale: you don’t need to apply φ explicitly tocalculate the dot product in the feature space!

Page 15: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dot product in the feature space

φ : R2 → R3

(x1, x2) 7→ (x 21 , x

22 ,√

2x1x2)

〈φ(x ), φ(z )〉 =⟨

(x 21 , x

22 ,√

2x1x2), (z21 , z

22 ,√

2z1z2)⟩

= x 21 z

21 + x 2

2 z22 + 2x1x2z1z2

= (x1z1 + x2z2)2

= 〈x , z 〉2

• A function k : X ×X → R such thatk(x , z ) = 〈φ(x ), φ(z )〉 is called a kernel

• Morale: you don’t need to apply φ explicitly tocalculate the dot product in the feature space!

Page 16: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dot product in the feature space

φ : R2 → R3

(x1, x2) 7→ (x 21 , x

22 ,√

2x1x2)

〈φ(x ), φ(z )〉 =⟨

(x 21 , x

22 ,√

2x1x2), (z21 , z

22 ,√

2z1z2)⟩

= x 21 z

21 + x 2

2 z22 + 2x1x2z1z2

= (x1z1 + x2z2)2

= 〈x , z 〉2

• A function k : X ×X → R such thatk(x , z ) = 〈φ(x ), φ(z )〉 is called a kernel

• Morale: you don’t need to apply φ explicitly tocalculate the dot product in the feature space!

Page 17: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dot product in the feature space

φ : R2 → R3

(x1, x2) 7→ (x 21 , x

22 ,√

2x1x2)

〈φ(x ), φ(z )〉 =⟨

(x 21 , x

22 ,√

2x1x2), (z21 , z

22 ,√

2z1z2)⟩

= x 21 z

21 + x 2

2 z22 + 2x1x2z1z2

= (x1z1 + x2z2)2

= 〈x , z 〉2

• A function k : X ×X → R such thatk(x , z ) = 〈φ(x ), φ(z )〉 is called a kernel

• Morale: you don’t need to apply φ explicitly tocalculate the dot product in the feature space!

Page 18: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Kernel induced feature space

• The feature space induced by the kernel is not unique:The kernel

k(x , z ) = 〈x , z 〉2

also calculates the dot product in the four dimensionalfeature space:

φ : R2 → R4

(x1, x2) 7→ (x 21 , x

22 , x1x2, x2x1)

• The example can be generalised to Rn

Page 19: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

Mapping the inputspace to the featurespace

Calculating the dotproduct in thefeature space

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Kernel induced feature space

• The feature space induced by the kernel is not unique:The kernel

k(x , z ) = 〈x , z 〉2

also calculates the dot product in the four dimensionalfeature space:

φ : R2 → R4

(x1, x2) 7→ (x 21 , x

22 , x1x2, x2x1)

• The example can be generalised to Rn

Page 20: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 21: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

The Process

Page 22: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

The Approach

• Data items are embedded into a vector space called thefeature space

• Linear relations are sought among the images of the dataitems in the feature space

• The pattern analysis algorithm are based only on thepairwise dot products, they do not need the actualcoordinates of the embedded points

• The pairwise dot products in the feature space could beefficiently calculated using a kernel function

Page 23: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

The Approach

• Data items are embedded into a vector space called thefeature space

• Linear relations are sought among the images of the dataitems in the feature space

• The pattern analysis algorithm are based only on thepairwise dot products, they do not need the actualcoordinates of the embedded points

• The pairwise dot products in the feature space could beefficiently calculated using a kernel function

Page 24: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

The Approach

• Data items are embedded into a vector space called thefeature space

• Linear relations are sought among the images of the dataitems in the feature space

• The pattern analysis algorithm are based only on thepairwise dot products, they do not need the actualcoordinates of the embedded points

• The pairwise dot products in the feature space could beefficiently calculated using a kernel function

Page 25: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

The Approach

• Data items are embedded into a vector space called thefeature space

• Linear relations are sought among the images of the dataitems in the feature space

• The pattern analysis algorithm are based only on thepairwise dot products, they do not need the actualcoordinates of the embedded points

• The pairwise dot products in the feature space could beefficiently calculated using a kernel function

Page 26: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 27: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 28: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Problem definition

• Given a training set S = {(x1, y1), . . . , (xl , yl )} of pointsxi ∈ Rn with corresponding labels yi ∈ R the problem isto find a real-valued linear function that best interpolatesthe training set:

g(x) = 〈w, x〉 = w′x =

n∑i=1

wixi

• If the data points were generated by a function like g(x),it is possible to find the parameters w by solving

Xw = y

where

X =

x′1...

x′l

Page 29: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Problem definition

• Given a training set S = {(x1, y1), . . . , (xl , yl )} of pointsxi ∈ Rn with corresponding labels yi ∈ R the problem isto find a real-valued linear function that best interpolatesthe training set:

g(x) = 〈w, x〉 = w′x =

n∑i=1

wixi

• If the data points were generated by a function like g(x),it is possible to find the parameters w by solving

Xw = y

where

X =

x′1...

x′l

Page 30: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Graphical representation

Page 31: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Loss function

• Minimize

L(g ,S ) = L(w,S ) =

l∑i=1

(yi − g(xi))2 =

l∑i=1

ξ2i

=

l∑i=1

L(g , (xi , yi))

• This could be written as

L(w,S ) = ‖ξ‖2 = (y −Xw)′(y −Xw)

• Optimization problem:

minwL(w,S ) = min

w(y −Xw)′(y −Xw)

Page 32: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Loss function

• Minimize

L(g ,S ) = L(w,S ) =

l∑i=1

(yi − g(xi))2 =

l∑i=1

ξ2i

=

l∑i=1

L(g , (xi , yi))

• This could be written as

L(w,S ) = ‖ξ‖2 = (y −Xw)′(y −Xw)

• Optimization problem:

minwL(w,S ) = min

w(y −Xw)′(y −Xw)

Page 33: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Loss function

• Minimize

L(g ,S ) = L(w,S ) =

l∑i=1

(yi − g(xi))2 =

l∑i=1

ξ2i

=

l∑i=1

L(g , (xi , yi))

• This could be written as

L(w,S ) = ‖ξ‖2 = (y −Xw)′(y −Xw)

• Optimization problem:

minwL(w,S ) = min

w(y −Xw)′(y −Xw)

Page 34: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

∂L(w,S )

∂w= −2X′y + 2X′Xw = 0,

thereforeX′Xw = X′y,

andw = (X′X)−1X′y

Page 35: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 36: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dual representation of the problem

• w = (X′X)−1X′y = X′X(X′X)−2X′y = X′α

• So, w is a linear combination of the training samples,w =

∑li=1 αixi .

Page 37: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dual representation of the problem

• w = (X′X)−1X′y = X′X(X′X)−2X′y = X′α

• So, w is a linear combination of the training samples,w =

∑li=1 αixi .

Page 38: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• From the solution of the primal problem:

X′Xw = X′y,

• thenXX′Xw = XX′y,

• using the dual representation

XX′XX′α = XX′y,

• thenα = (XX′)−1y,

• andg(x) = w′x = α′Xx.

• Note: XX′ may be close to singular, or singular accordingto machine precision.

Page 39: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• From the solution of the primal problem:

X′Xw = X′y,

• thenXX′Xw = XX′y,

• using the dual representation

XX′XX′α = XX′y,

• thenα = (XX′)−1y,

• andg(x) = w′x = α′Xx.

• Note: XX′ may be close to singular, or singular accordingto machine precision.

Page 40: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• From the solution of the primal problem:

X′Xw = X′y,

• thenXX′Xw = XX′y,

• using the dual representation

XX′XX′α = XX′y,

• thenα = (XX′)−1y,

• andg(x) = w′x = α′Xx.

• Note: XX′ may be close to singular, or singular accordingto machine precision.

Page 41: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• From the solution of the primal problem:

X′Xw = X′y,

• thenXX′Xw = XX′y,

• using the dual representation

XX′XX′α = XX′y,

• thenα = (XX′)−1y,

• andg(x) = w′x = α′Xx.

• Note: XX′ may be close to singular, or singular accordingto machine precision.

Page 42: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• From the solution of the primal problem:

X′Xw = X′y,

• thenXX′Xw = XX′y,

• using the dual representation

XX′XX′α = XX′y,

• thenα = (XX′)−1y,

• andg(x) = w′x = α′Xx.

• Note: XX′ may be close to singular, or singular accordingto machine precision.

Page 43: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• From the solution of the primal problem:

X′Xw = X′y,

• thenXX′Xw = XX′y,

• using the dual representation

XX′XX′α = XX′y,

• thenα = (XX′)−1y,

• andg(x) = w′x = α′Xx.

• Note: XX′ may be close to singular, or singular accordingto machine precision.

Page 44: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Ridge regression

• If XX′ is singular, the pseudo-inverse could be used: tofind the w that satisfies X′Xw = X′y with minimal norm.

• Optimisation problem:

minwLλ(w,S ) = min

wλ ‖w‖2 +

l∑i=1

(yi − g(xi))2,

where λ defines the trade-off between norm and loss. Thiscontrols the complexity of the model (the process is calledregularization).

Page 45: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Ridge regression

• If XX′ is singular, the pseudo-inverse could be used: tofind the w that satisfies X′Xw = X′y with minimal norm.

• Optimisation problem:

minwLλ(w,S ) = min

wλ ‖w‖2 +

l∑i=1

(yi − g(xi))2,

where λ defines the trade-off between norm and loss. Thiscontrols the complexity of the model (the process is calledregularization).

Page 46: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Taking the derivative and making it equal to zero:

X′Xw + λw = (X′X + λIn)w = X′y,

where In is an identity matrix of n × n dimension,

• then,w = (X′X + λIn)−1X′y.

• In terms of α:

w = λ−1X′(y −Xw) = X′α,

• thenα = λ−1(y −Xw) = (XX′ + λIl )

−1y.

Page 47: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Taking the derivative and making it equal to zero:

X′Xw + λw = (X′X + λIn)w = X′y,

where In is an identity matrix of n × n dimension,

• then,w = (X′X + λIn)−1X′y.

• In terms of α:

w = λ−1X′(y −Xw) = X′α,

• thenα = λ−1(y −Xw) = (XX′ + λIl )

−1y.

Page 48: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Taking the derivative and making it equal to zero:

X′Xw + λw = (X′X + λIn)w = X′y,

where In is an identity matrix of n × n dimension,

• then,w = (X′X + λIn)−1X′y.

• In terms of α:

w = λ−1X′(y −Xw) = X′α,

• thenα = λ−1(y −Xw) = (XX′ + λIl )

−1y.

Page 49: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Taking the derivative and making it equal to zero:

X′Xw + λw = (X′X + λIn)w = X′y,

where In is an identity matrix of n × n dimension,

• then,w = (X′X + λIn)−1X′y.

• In terms of α:

w = λ−1X′(y −Xw) = X′α,

• thenα = λ−1(y −Xw) = (XX′ + λIl )

−1y.

Page 50: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Prediction function

g(x) = 〈w, x〉 =

⟨l∑

i=1

αixi, x

⟩=

l∑i=1

αi 〈xi, x〉

Page 51: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Ridge regression as a kernelmethod

• The Gram matrix G = XX′ is the matrix of dot products

G = XX′ =

x′1...

x′l

[x1 · · · xl ] =

〈x1, x1〉 〈x1, xl 〉

〈xl , x1〉 〈xl , xl 〉

• G may be replaced by a general kernel matrix, K, with

kij = k(xi, xj) = < φ(xi), φ(xj) >

• The α’s are calculated as:

α = (K + λIl )−1y

• The predicted function is approximated as:

g(x) =

l∑i=1

αik(x, xi) = y ′(K + λIl )−1

k(x, x1)...

k(x, xl)

Page 52: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Ridge regression as a kernelmethod

• The Gram matrix G = XX′ is the matrix of dot products

G = XX′ =

x′1...

x′l

[x1 · · · xl ] =

〈x1, x1〉 〈x1, xl 〉

〈xl , x1〉 〈xl , xl 〉

• G may be replaced by a general kernel matrix, K, with

kij = k(xi, xj) = < φ(xi), φ(xj) >

• The α’s are calculated as:

α = (K + λIl )−1y

• The predicted function is approximated as:

g(x) =

l∑i=1

αik(x, xi) = y ′(K + λIl )−1

k(x, x1)...

k(x, xl)

Page 53: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Ridge regression as a kernelmethod

• The Gram matrix G = XX′ is the matrix of dot products

G = XX′ =

x′1...

x′l

[x1 · · · xl ] =

〈x1, x1〉 〈x1, xl 〉

〈xl , x1〉 〈xl , xl 〉

• G may be replaced by a general kernel matrix, K, with

kij = k(xi, xj) = < φ(xi), φ(xj) >

• The α’s are calculated as:

α = (K + λIl )−1y

• The predicted function is approximated as:

g(x) =

l∑i=1

αik(x, xi) = y ′(K + λIl )−1

k(x, x1)...

k(x, xl)

Page 54: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

Primal linearregression

Dual linear regression

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Ridge regression as a kernelmethod

• The Gram matrix G = XX′ is the matrix of dot products

G = XX′ =

x′1...

x′l

[x1 · · · xl ] =

〈x1, x1〉 〈x1, xl 〉

〈xl , x1〉 〈xl , xl 〉

• G may be replaced by a general kernel matrix, K, with

kij = k(xi, xj) = < φ(xi), φ(xj) >

• The α’s are calculated as:

α = (K + λIl )−1y

• The predicted function is approximated as:

g(x) =

l∑i=1

αik(x, xi) = y ′(K + λIl )−1

k(x, x1)...

k(x, xl)

Page 55: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 56: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 57: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Characterisation

Theorem(Mercer’s Theorem)A function

k : X ×X → R,

which is either continuous or has a countable domain, can bedecomposed

k(x, z) = 〈φ(x), φ(z)〉

into a feature map φ into a Hilbert space F applied to both itsarguments followed by the evaluation of the inner product in Fif and only if it satisfies the finitely positive semi-definiteproperty.

Page 58: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Some kernel functions

Assume k1 and k2 kernels:

• k(x, z) = p(k1(x, z)). p a polynomial with positivecoefficients.

• k(x, z) = exp(k1(x, z)).

• k(x, z) = exp(−‖x− z‖2 /(2σ2)). Gaussian kernel.

• k(x, z) = k1(x, z)k2(x, z)

Page 59: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Some kernel functions

Assume k1 and k2 kernels:

• k(x, z) = p(k1(x, z)). p a polynomial with positivecoefficients.

• k(x, z) = exp(k1(x, z)).

• k(x, z) = exp(−‖x− z‖2 /(2σ2)). Gaussian kernel.

• k(x, z) = k1(x, z)k2(x, z)

Page 60: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Some kernel functions

Assume k1 and k2 kernels:

• k(x, z) = p(k1(x, z)). p a polynomial with positivecoefficients.

• k(x, z) = exp(k1(x, z)).

• k(x, z) = exp(−‖x− z‖2 /(2σ2)). Gaussian kernel.

• k(x, z) = k1(x, z)k2(x, z)

Page 61: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Some kernel functions

Assume k1 and k2 kernels:

• k(x, z) = p(k1(x, z)). p a polynomial with positivecoefficients.

• k(x, z) = exp(k1(x, z)).

• k(x, z) = exp(−‖x− z‖2 /(2σ2)). Gaussian kernel.

• k(x, z) = k1(x, z)k2(x, z)

Page 62: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Embeddings corresponding tokernels

• It is possible to calculate the feature space induced by akernel (Mercer’s Theorem)

• This can be done in a constructive way

• The feature space can even be of infinite dimension.

Page 63: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Embeddings corresponding tokernels

• It is possible to calculate the feature space induced by akernel (Mercer’s Theorem)

• This can be done in a constructive way

• The feature space can even be of infinite dimension.

Page 64: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Embeddings corresponding tokernels

• It is possible to calculate the feature space induced by akernel (Mercer’s Theorem)

• This can be done in a constructive way

• The feature space can even be of infinite dimension.

Page 65: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 66: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

How to visualize?

• Choose a point in input space p0

• Calculate the distance from another point x to p0 in thefeature space:

‖φ(p0)− φ(x )‖2F = 〈φ(p0)− φ(x ), φ(p0)− φ(x )〉F= 〈φ(p0), φ(p0)〉F + 〈φ(x ), φ(x )〉F−2 〈φ(p0), φ(x )〉F

= k(p0, p0) + k(x , x )− 2k(p0, x )

• Plot f (x ) = ‖φ(p0)− φ(x )‖2F

Page 67: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

How to visualize?

• Choose a point in input space p0

• Calculate the distance from another point x to p0 in thefeature space:

‖φ(p0)− φ(x )‖2F = 〈φ(p0)− φ(x ), φ(p0)− φ(x )〉F= 〈φ(p0), φ(p0)〉F + 〈φ(x ), φ(x )〉F−2 〈φ(p0), φ(x )〉F

= k(p0, p0) + k(x , x )− 2k(p0, x )

• Plot f (x ) = ‖φ(p0)− φ(x )‖2F

Page 68: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

How to visualize?

• Choose a point in input space p0

• Calculate the distance from another point x to p0 in thefeature space:

‖φ(p0)− φ(x )‖2F = 〈φ(p0)− φ(x ), φ(p0)− φ(x )〉F= 〈φ(p0), φ(p0)〉F + 〈φ(x ), φ(x )〉F−2 〈φ(p0), φ(x )〉F

= k(p0, p0) + k(x , x )− 2k(p0, x )

• Plot f (x ) = ‖φ(p0)− φ(x )‖2F

Page 69: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Identity kernel

k(x , z ) = 〈x , z 〉

Page 70: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Quadratic kernel (1)

k(x , z ) = 〈x , z 〉2

Page 71: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Identity kernel (2)

k(x , z ) = 〈x , z 〉2

Page 72: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

Mathematicalcharacterisation

Visualizing kernels ininput space

KernelAlgorithms

Kernels inComplexStructuredData

Gaussian kernel

k(x, z) = e−‖x−z‖22σ2

Page 73: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 74: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Basic computations in featurespace

• Means

• Distances

• Projections

• Covariance

Page 75: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Basic computations in featurespace

• Means

• Distances

• Projections

• Covariance

Page 76: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Basic computations in featurespace

• Means

• Distances

• Projections

• Covariance

Page 77: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Basic computations in featurespace

• Means

• Distances

• Projections

• Covariance

Page 78: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Classification and regression

• Support Vector Machines

• Support Vector Regression

• Kernel Fisher Discriminant

• Kernel Perceptron

Page 79: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Classification and regression

• Support Vector Machines

• Support Vector Regression

• Kernel Fisher Discriminant

• Kernel Perceptron

Page 80: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Classification and regression

• Support Vector Machines

• Support Vector Regression

• Kernel Fisher Discriminant

• Kernel Perceptron

Page 81: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Classification and regression

• Support Vector Machines

• Support Vector Regression

• Kernel Fisher Discriminant

• Kernel Perceptron

Page 82: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dimensionality reduction andclustering

• Kernel PCA

• Kernel CCA

• Kernel k -means

• Kernel SOM

Page 83: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dimensionality reduction andclustering

• Kernel PCA

• Kernel CCA

• Kernel k -means

• Kernel SOM

Page 84: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dimensionality reduction andclustering

• Kernel PCA

• Kernel CCA

• Kernel k -means

• Kernel SOM

Page 85: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Dimensionality reduction andclustering

• Kernel PCA

• Kernel CCA

• Kernel k -means

• Kernel SOM

Page 86: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Outline

1 IntroductionMotivation

2 The Kernel TrickMapping the input space to the feature spaceCalculating the dot product in the feature space

3 The Kernel Approach to Machine Learning

4 A Kernel Pattern Analysis AlgorithmPrimal linear regressionDual linear regression

5 Kernel FunctionsMathematical characterisationVisualizing kernels in input space

6 Kernel Algorithms

7 Kernels in Complex Structured Data

Page 87: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Kernels in complex structured data

• Since kernel methods do not require an attribute-basedrepresentation of objects, it is possible to perform learningover complex structured data (or unstructured data)

• We only need to define a dot product operation (similarity,dissimilarity measure)

• Examples:

• Strings• Texts• Trees• Graphs

Page 88: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Kernels in complex structured data

• Since kernel methods do not require an attribute-basedrepresentation of objects, it is possible to perform learningover complex structured data (or unstructured data)

• We only need to define a dot product operation (similarity,dissimilarity measure)

• Examples:

• Strings• Texts• Trees• Graphs

Page 89: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Kernels in complex structured data

• Since kernel methods do not require an attribute-basedrepresentation of objects, it is possible to perform learningover complex structured data (or unstructured data)

• We only need to define a dot product operation (similarity,dissimilarity measure)

• Examples:

• Strings• Texts• Trees• Graphs

Page 90: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Problem 2

How to do symbolic regression?

Σ = {A,C ,G ,T}

f : Σd → RACGTA 7→ 10.0GTCCA 7→ 11.3GGTAC 7→ 1.0CCTGA 7→ 4.5

......

...

Page 91: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Define a kernel on strings

k : Σd × Σd → R

• Use the kernel along with a kernel learning regressionalgorithm to find the regression function

• What is a good candidate for k?

• a function that measures string similarity• higher value for similar strings, smaller value for different

strings• k(s1 . . . sd , t1 . . . td) =

∑ni=1 equal(si , ti)

equal(si , ti) =

{1 if si = ti

0 otherwise

• k(ACTAG ,CCTCG) =?

• Is it a kernel?

Page 92: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Define a kernel on strings

k : Σd × Σd → R

• Use the kernel along with a kernel learning regressionalgorithm to find the regression function

• What is a good candidate for k?

• a function that measures string similarity• higher value for similar strings, smaller value for different

strings• k(s1 . . . sd , t1 . . . td) =

∑ni=1 equal(si , ti)

equal(si , ti) =

{1 if si = ti

0 otherwise

• k(ACTAG ,CCTCG) =?

• Is it a kernel?

Page 93: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Define a kernel on strings

k : Σd × Σd → R

• Use the kernel along with a kernel learning regressionalgorithm to find the regression function

• What is a good candidate for k?

• a function that measures string similarity• higher value for similar strings, smaller value for different

strings• k(s1 . . . sd , t1 . . . td) =

∑ni=1 equal(si , ti)

equal(si , ti) =

{1 if si = ti

0 otherwise

• k(ACTAG ,CCTCG) =?

• Is it a kernel?

Page 94: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Define a kernel on strings

k : Σd × Σd → R

• Use the kernel along with a kernel learning regressionalgorithm to find the regression function

• What is a good candidate for k?

• a function that measures string similarity• higher value for similar strings, smaller value for different

strings• k(s1 . . . sd , t1 . . . td) =

∑ni=1 equal(si , ti)

equal(si , ti) =

{1 if si = ti

0 otherwise

• k(ACTAG ,CCTCG) =?

• Is it a kernel?

Page 95: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Define a kernel on strings

k : Σd × Σd → R

• Use the kernel along with a kernel learning regressionalgorithm to find the regression function

• What is a good candidate for k?

• a function that measures string similarity• higher value for similar strings, smaller value for different

strings• k(s1 . . . sd , t1 . . . td) =

∑ni=1 equal(si , ti)

equal(si , ti) =

{1 if si = ti

0 otherwise

• k(ACTAG ,CCTCG) =?

• Is it a kernel?

Page 96: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Solution

• Define a kernel on strings

k : Σd × Σd → R

• Use the kernel along with a kernel learning regressionalgorithm to find the regression function

• What is a good candidate for k?

• a function that measures string similarity• higher value for similar strings, smaller value for different

strings• k(s1 . . . sd , t1 . . . td) =

∑ni=1 equal(si , ti)

equal(si , ti) =

{1 if si = ti

0 otherwise

• k(ACTAG ,CCTCG) =?

• Is it a kernel?

Page 97: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Induced Feature Space

• What is the feature space induced by k?

φ : Σd → R4d

s1 . . . sd 7→ (x 11 , . . . , x

14 , x

21 , . . . , x

24 , . . . , x

d1 , . . . , x

d4 )

(x j1 , . . . , x

j4 ) =

(1, 0, 0, 0) if sj = ′A′

(0, 1, 0, 0) if sj = ′C′

(0, 0, 1, 0) if sj = ′G′

(0, 0, 0, 1) if sj = ′T′

Page 98: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

Induced Feature Space

• What is the feature space induced by k?

φ : Σd → R4d

s1 . . . sd 7→ (x 11 , . . . , x

14 , x

21 , . . . , x

24 , . . . , x

d1 , . . . , x

d4 )

(x j1 , . . . , x

j4 ) =

(1, 0, 0, 0) if sj = ′A′

(0, 1, 0, 0) if sj = ′C′

(0, 0, 1, 0) if sj = ′G′

(0, 0, 0, 1) if sj = ′T′

Page 99: Introduction to Kernel Methods - GitHub Pages · to Kernel Methods F. Gonz alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm

Introductionto KernelMethods

F. Gonzalez

Introduction

The KernelTrick

The KernelApproach toMachineLearning

A KernelPatternAnalysisAlgorithm

KernelFunctions

KernelAlgorithms

Kernels inComplexStructuredData

References

Shawe-Taylor, J. and Cristianini, N. 2004 Kernel Methodsfor Pattern Analysis. Cambridge University Press.