SVM ICML 2001 Tutorial


Support Vector and Kernel Machines

Nello Cristianini, BIOwulf Technologies
[email protected]
http://www.support-vector.net/tutorial.html

ICML 2001

A Little History

• SVMs introduced in COLT-92 by Boser, Guyon, Vapnik. Greatly developed ever since.
• Initially popularized in the NIPS community, now an important and active field of all Machine Learning research.
• Special issues of the Machine Learning Journal and of the Journal of Machine Learning Research.
• Kernel Machines: a large class of learning algorithms, of which SVMs are a particular instance.

A Little History

• Annual workshop at NIPS
• Centralized website: www.kernel-machines.org
• Textbook (2000): see www.support-vector.net
• Now a large and diverse community: from machine learning, optimization, statistics, neural networks, functional analysis, etc.
• Successful applications in many fields (bioinformatics, text, handwriting recognition, etc.)
• Fast expanding field, EVERYBODY WELCOME!

Preliminaries

• Task of this class of algorithms: detect and exploit complex patterns in data (e.g. by clustering, classifying, ranking, or cleaning the data)
• Typical problems: how to represent complex patterns, and how to exclude spurious (unstable) patterns (= overfitting)
• The first is a computational problem; the second a statistical problem.

Very Informal Reasoning

• The class of kernel methods implicitly defines the class of possible patterns by introducing a notion of similarity between data
• Example: similarity between documents
  • By length
  • By topic
  • By language
• Choice of similarity → choice of relevant features

More formal reasoning

• Kernel methods exploit information about the inner products between data items
• Many standard algorithms can be rewritten so that they only require inner products between data (inputs)
• Kernel functions = inner products in some feature space (potentially very complex)
• If the kernel is given, there is no need to specify what features of the data are being used

Just in case

• Inner product between vectors: $\langle x, z \rangle = \sum_i x_i z_i$
• Hyperplane: $\langle w, x \rangle + b = 0$

[Figure: points labelled x and o separated by a hyperplane with normal vector w and offset b]

Overview of the Tutorial

• Introduce basic concepts with an extended example of the Kernel Perceptron
• Derive Support Vector Machines
• Other kernel-based algorithms
• Properties and Limitations of Kernels
• On Kernel Alignment
• On Optimizing Kernel Alignment

Parts I and II: overview

• Linear Learning Machines (LLM)
• Kernel Induced Feature Spaces
• Generalization Theory
• Optimization Theory
• Support Vector Machines (SVM)

Modularity

• Any kernel-based learning algorithm is composed of two modules:
  • A general purpose learning machine
  • A problem specific kernel function
• Any kernel-based algorithm can be fitted with any kernel
• Kernels themselves can be constructed in a modular way
• Great for software engineering (and for analysis)

IMPORTANT CONCEPT

1 - Linear Learning Machines

• Simplest case: classification. The decision function is a hyperplane in input space
• The Perceptron Algorithm (Rosenblatt, 57)
• Useful to analyze the Perceptron algorithm before looking at SVMs and Kernel Methods in general

Basic Notation

• Input space:    $x \in X$
• Output space:   $y \in Y = \{-1, +1\}$
• Hypothesis:     $h \in H$
• Real-valued:    $f : X \to \mathbb{R}$
• Training set:   $S = \{(x_1, y_1), \ldots, (x_i, y_i), \ldots\}$
• Test error:     $\epsilon$
• Dot product:    $\langle x, z \rangle$

Perceptron

• Linear separation of the input space:

  $f(x) = \langle w, x \rangle + b$
  $h(x) = \mathrm{sign}(f(x))$

[Figure: points x and o separated by a hyperplane with weight vector w and offset b]

Perceptron Algorithm

Update rule (ignoring threshold):

• if $y_i \langle w_k, x_i \rangle \le 0$ then

  $w_{k+1} \leftarrow w_k + \eta\, y_i x_i$
  $k \leftarrow k + 1$

[Figure: separating hyperplane with weight vector w and offset b]
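
A minimal sketch of this update in Python/NumPy (the epoch loop, the learning rate eta, and the stopping test are illustrative additions, not part of the slide):

```python
import numpy as np

def perceptron(X, y, eta=1.0, epochs=100):
    """Primal perceptron, threshold ignored as on the slide.
    X: (m, d) array of inputs, y: array of labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:   # mistake (or point exactly on the hyperplane)
                w = w + eta * y_i * x_i     # update rule: w_{k+1} <- w_k + eta * y_i * x_i
                mistakes += 1
        if mistakes == 0:                   # no mistakes in a full pass: data separated
            break
    return w
```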

Observations

• The solution is a linear combination of training points:

  $w = \sum_i \alpha_i y_i x_i, \qquad \alpha_i \ge 0$

• Only the informative points are used (mistake driven)
• The coefficient of a point in the combination reflects its difficulty

Observations - 2

• Mistake bound: $M \le \left(\frac{R}{\gamma}\right)^2$
• The coefficients are non-negative
• It is possible to rewrite the algorithm using this alternative representation

[Figure: separable points x and o with margin γ]

Dual Representation

The decision function can be re-written as follows:

  $f(x) = \langle w, x \rangle + b = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$

  $w = \sum_i \alpha_i y_i x_i$

IMPORTANT CONCEPT

Dual Representation

• The update rule can also be rewritten as follows:

  if $y_i \left( \sum_j \alpha_j y_j \langle x_j, x_i \rangle + b \right) \le 0$ then $\alpha_i \leftarrow \alpha_i + \eta$

• Note: in the dual representation, data appear only inside dot products
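
In dual form the algorithm touches the data only through dot products. A sketch (assuming eta = 1 and a user-supplied kernel of two arguments; the bias b is kept fixed at 0 for brevity):

```python
import numpy as np

def kernel_perceptron(X, y, kernel=np.dot, epochs=100):
    """Dual perceptron: alpha_i counts the updates (mistakes) made on x_i.
    The data enter only through kernel evaluations K(x_j, x_i)."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    alpha, b = np.zeros(m), 0.0
    K = np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
    for _ in range(epochs):
        for i in range(m):
            # dual decision value: sum_j alpha_j y_j <x_j, x_i> + b
            if y[i] * (np.sum(alpha * y * K[:, i]) + b) <= 0:
                alpha[i] += 1.0             # alpha_i <- alpha_i + eta, with eta = 1
    return alpha, b
```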

Duality: First Property of SVMs

• DUALITY is the first feature of Support Vector Machines
• SVMs are Linear Learning Machines represented in a dual fashion
• Data appear only within dot products (in the decision function and in the training algorithm):

  $f(x) = \langle w, x \rangle + b = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$

Limitations of LLMs

Linear classifiers cannot deal with:

• Non-linearly separable data
• Noisy data
• In addition, this formulation only deals with vectorial data

Non-Linear Classifiers

• One solution: create a net of simple linear classifiers (neurons): a Neural Network (problems: local minima; many parameters; heuristics needed to train; etc.)
• Other solution: map the data into a richer feature space including non-linear features, then use a linear classifier

Learning in the Feature Space

• Map the data into a feature space where they are linearly separable: $x \mapsto \phi(x)$

[Figure: map φ from input space X, where the x and o points are not linearly separable, to feature space F, where φ(x) and φ(o) are]

Problems with Feature Space

• Working in high dimensional feature spaces solves the problem of expressing complex functions

BUT:

• There is a computational problem (working with very large vectors)
• And a generalization theory problem (curse of dimensionality)

Implicit Mapping to Feature Space

We will introduce Kernels:

• They solve the computational problem of working with many dimensions
• They can make it possible to use infinite dimensions efficiently in time / space
• Other advantages, both practical and conceptual

Kernel-Induced Feature Spaces

• In the dual representation, the data points only appear inside dot products:

  $f(x) = \sum_i \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$

• The dimensionality of the space F is not necessarily important. We may not even know the map φ.

Kernels

• A kernel is a function that returns the value of the dot product between the images of its two arguments:

  $K(x_1, x_2) = \langle \phi(x_1), \phi(x_2) \rangle$

• Given a function K, it is possible to verify that it is a kernel

IMPORTANT CONCEPT

Kernels

• One can use LLMs in a feature space by simply rewriting them in dual representation and replacing dot products with kernels:

  $\langle x_1, x_2 \rangle \leftarrow K(x_1, x_2) = \langle \phi(x_1), \phi(x_2) \rangle$

The Kernel Matrix

• (aka the Gram matrix):

  $K = \begin{pmatrix}
  K(1,1) & K(1,2) & K(1,3) & \cdots & K(1,m) \\
  K(2,1) & K(2,2) & K(2,3) & \cdots & K(2,m) \\
  \vdots & \vdots & \vdots & \ddots & \vdots \\
  K(m,1) & K(m,2) & K(m,3) & \cdots & K(m,m)
  \end{pmatrix}$

IMPORTANT CONCEPT
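
A small sketch of building this matrix for a training set, given any kernel function (the plain dot product is used as the default only for illustration):

```python
import numpy as np

def gram_matrix(X, kernel=np.dot):
    """Return the m x m matrix with entries K[i, j] = kernel(x_i, x_j)."""
    m = X.shape[0]
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(X[i], X[j])
    return K

# Example usage with a degree-2 polynomial kernel:
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
K = gram_matrix(X, kernel=lambda x, z: np.dot(x, z) ** 2)
```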

The Kernel Matrix

• The central structure in kernel machines
• Information bottleneck: it contains all the information needed by the learning algorithm
• It fuses information about the data AND the kernel
• Many interesting properties:

Mercer's Theorem

• The kernel matrix is Symmetric Positive Definite
• Any symmetric positive definite matrix can be regarded as a kernel matrix, that is, as an inner product matrix in some space

More Formally: Mercer's Theorem

• Every (semi) positive definite, symmetric function is a kernel: i.e. there exists a mapping φ such that it is possible to write

  $K(x_1, x_2) = \langle \phi(x_1), \phi(x_2) \rangle$

• Positive definite here means:

  $\int K(x, z)\, f(x)\, f(z)\, dx\, dz \ge 0 \qquad \text{for all } f \in L_2$
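
On a finite sample, this condition can be checked numerically on the Gram matrix itself: it must be symmetric with no negative eigenvalues. A minimal sketch (the tolerance is an implementation detail, not part of the theorem):

```python
import numpy as np

def is_valid_gram_matrix(K, tol=1e-10):
    """Check that K is symmetric and positive semi-definite."""
    if not np.allclose(K, K.T):
        return False
    eigenvalues = np.linalg.eigvalsh((K + K.T) / 2)   # eigvalsh expects a symmetric matrix
    return bool(np.all(eigenvalues >= -tol))
```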

Mercer's Theorem

• Eigenvalue expansion of Mercer kernels:

  $K(x_1, x_2) = \sum_i \lambda_i\, \phi_i(x_1)\, \phi_i(x_2)$

• That is: the eigenfunctions act as features!

Examples of Kernels

• Simple examples of kernels are:

  $K(x, z) = \langle x, z \rangle^d$

  $K(x, z) = e^{-\|x - z\|^2 / (2\sigma^2)}$

Example: Polynomial Kernels

  $x = (x_1, x_2); \qquad z = (z_1, z_2);$

  $\langle x, z \rangle^2 = (x_1 z_1 + x_2 z_2)^2$
  $= x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 z_1 x_2 z_2$
  $= \langle (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2),\ (z_1^2, z_2^2, \sqrt{2}\, z_1 z_2) \rangle$
  $= \langle \phi(x), \phi(z) \rangle$
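
The identity above is easy to verify numerically; a tiny sketch with arbitrary example vectors:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel on R^2
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = np.dot(x, z) ** 2              # kernel value <x, z>^2
rhs = np.dot(phi(x), phi(z))         # dot product of the images in feature space
assert np.isclose(lhs, rhs)          # here both sides equal (1*3 + 2*(-1))^2 = 1
```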


    Example: Polynomial Kernels

Example: the two spirals

• Separated by a hyperplane in feature space (Gaussian kernels)

Making Kernels

• The set of kernels is closed under some operations. If K, K' are kernels, then:
  • K + K' is a kernel
  • cK is a kernel, if c > 0
  • aK + bK' is a kernel, for a, b > 0
  • etc.
• So we can make complex kernels from simple ones: modularity! (See the sketch below.)

IMPORTANT CONCEPT
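
These closure rules translate directly into code: treat a kernel as a function of two arguments and build new kernels by wrapping existing ones. A minimal sketch (the linear kernel is only an example base kernel):

```python
import numpy as np

def sum_kernel(k1, k2):
    """K + K' is a kernel."""
    return lambda x, z: k1(x, z) + k2(x, z)

def scale_kernel(k, c):
    """cK is a kernel, provided c > 0."""
    if c <= 0:
        raise ValueError("c must be positive")
    return lambda x, z: c * k(x, z)

# Example: aK + bK' built from two copies of the linear kernel
linear = lambda x, z: float(np.dot(x, z))
combined = sum_kernel(scale_kernel(linear, 2.0), scale_kernel(linear, 0.5))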

Second Property of SVMs:

SVMs are Linear Learning Machines that:

• Use a dual representation, AND
• Operate in a kernel induced feature space; that is,

  $f(x) = \sum_i \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$

  is a linear function in the feature space implicitly defined by K

Kernels over General Structures

• Haussler, Watkins, etc.: kernels over sets, over sequences, over trees, etc.
• Applied in text categorization, bioinformatics, etc.

A bad kernel

• ... would be a kernel whose kernel matrix is mostly diagonal: all points orthogonal to each other, no clusters, no structure

  $K = \begin{pmatrix}
  1 & 0 & 0 & \cdots & 0 \\
  0 & 1 & 0 & \cdots & 0 \\
  \vdots & & \ddots & & \vdots \\
  0 & 0 & 0 & \cdots & 1
  \end{pmatrix}$

No Free Kernel

• If the mapping goes to a space with too many irrelevant features, the kernel matrix becomes diagonal
• Some prior knowledge of the target is needed to choose a good kernel

IMPORTANT CONCEPT


BREAK

The Generalization Problem

• The curse of dimensionality: it is easy to overfit in high dimensional spaces (= regularities could be found in the training set that are accidental, i.e. that would not be found again in a test set)
• The SVM problem is ill posed (finding one hyperplane that separates the data: many such hyperplanes exist)
• We need a principled way to choose the best possible hyperplane

NEW TOPIC

The Generalization Problem

• Many methods exist to choose a good hyperplane (inductive principles)
• Bayes, statistical learning theory / PAC, MDL, ...
• Each can be used; we will focus on a simple case motivated by statistical learning theory (it will give the basic SVM)

Statistical (Computational) Learning Theory

• Generalization bounds on the risk of overfitting (in a PAC setting: assumption of i.i.d. data; etc.)
• Standard bounds from VC theory give upper and lower bounds proportional to the VC dimension
• The VC dimension of LLMs is proportional to the dimension of the space (which can be huge)

Assumptions and Definitions

• A distribution D over the input space X
• Train and test points are drawn randomly (i.i.d.) from D
• Training error of h: fraction of points in S misclassified by h
• Test error of h: probability under D of misclassifying a point x
• VC dimension: size of the largest subset of X shattered by H (every dichotomy implemented)

VC Bounds

  $\epsilon \approx \tilde{O}\!\left(\frac{VC}{m}\right)$

• VC = (number of dimensions of X) + 1
• Typically VC >> m, so the bound is not useful
• It does not tell us which hyperplane to choose

Margin Based Bounds

  $\epsilon \approx \tilde{O}\!\left(\frac{(R/\gamma)^2}{m}\right), \qquad \gamma = \min_i \frac{y_i f(x_i)}{\|f\|}$

Note: compression bounds and online bounds also exist.

Margin Based Bounds

• The worst case bound still holds, but if we are lucky (the margin is large) the other bound can be applied and better generalization can be achieved:

  $\epsilon \approx \tilde{O}\!\left(\frac{(R/\gamma)^2}{m}\right)$

• Best hyperplane: the maximal margin one
• The margin is large if the kernel is chosen well

IMPORTANT CONCEPT

Maximal Margin Classifier

• Minimize the risk of overfitting by choosing the maximal margin hyperplane in feature space
• Third feature of SVMs: maximize the margin
• SVMs control capacity by increasing the margin, not by reducing the number of degrees of freedom (dimension free capacity control)

Two kinds of margin

• Functional and geometric margin:

  $\gamma_{funct} = \min_i y_i f(x_i)$

  $\gamma_{geom} = \min_i \frac{y_i f(x_i)}{\|f\|}$

[Figure: separable points x and o with margin γ]
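
A small sketch computing both quantities for a linear classifier on a labelled sample (the norm of f is taken to be ||w||, its norm as a linear function):

```python
import numpy as np

def margins(w, b, X, y):
    """Return (functional margin, geometric margin) of f(x) = <w, x> + b on the sample."""
    y = np.asarray(y, dtype=float)
    functional = float(np.min(y * (X @ w + b)))
    geometric = functional / float(np.linalg.norm(w))
    return functional, geometric
```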


    Two kinds of margin

Max Margin = Minimal Norm

• If we fix the functional margin to 1, the geometric margin equals $1/\|w\|$
• Hence, we can maximize the margin by minimizing the norm

Max Margin = Minimal Norm

Distance between the two convex hulls:

  $\langle w, x^+ \rangle + b = +1$
  $\langle w, x^- \rangle + b = -1$
  $\langle w, (x^+ - x^-) \rangle = 2$
  $\left\langle \tfrac{w}{\|w\|}, (x^+ - x^-) \right\rangle = \tfrac{2}{\|w\|}$

[Figure: separable points x and o, margin γ between the two convex hulls]

The primal problem

• Minimize:    $\langle w, w \rangle$
  subject to:  $y_i \left( \langle w, x_i \rangle + b \right) \ge 1$

IMPORTANT STEP

Optimization Theory

• The problem of finding the maximal margin hyperplane is a constrained optimization (quadratic programming) problem
• Use Lagrange theory (or Kuhn-Tucker Theory)
• Lagrangian:

  $L(w, b, \alpha) = \tfrac{1}{2} \langle w, w \rangle - \sum_i \alpha_i \left[ y_i \left( \langle w, x_i \rangle + b \right) - 1 \right], \qquad \alpha_i \ge 0$

From Primal to Dual

  $L(w, b, \alpha) = \tfrac{1}{2} \langle w, w \rangle - \sum_i \alpha_i \left[ y_i \left( \langle w, x_i \rangle + b \right) - 1 \right]$

Differentiate and substitute:

  $\frac{\partial L}{\partial b} = 0, \qquad \frac{\partial L}{\partial w} = 0$

The Dual Problem

• Maximize:

  $W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$

• Subject to:

  $\alpha_i \ge 0, \qquad \sum_i \alpha_i y_i = 0$

• The duality again! Can use kernels!

IMPORTANT STEP
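
A minimal sketch of solving this dual numerically with SciPy's general-purpose SLSQP solver (a dedicated QP or SVM package would be used in practice; this is only to make the objective and constraints concrete):

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(K, y):
    """Hard-margin SVM dual. K: m x m Gram matrix, y: labels in {-1, +1}."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    Q = (y[:, None] * y[None, :]) * K                   # Q_ij = y_i y_j K(x_i, x_j)
    objective = lambda a: 0.5 * a @ Q @ a - a.sum()     # minimizing the negative of W(alpha)
    gradient = lambda a: Q @ a - np.ones(m)
    constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_i alpha_i y_i = 0
    bounds = [(0.0, None)] * m                               # alpha_i >= 0
    res = minimize(objective, np.zeros(m), jac=gradient,
                   method="SLSQP", bounds=bounds, constraints=constraints)
    return res.x
```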

Convexity

• This is a Quadratic Optimization problem: convex, no local minima (a second effect of Mercer's conditions)
• Solvable in polynomial time
• (Convexity is another fundamental property of SVMs)

IMPORTANT CONCEPT

Kuhn-Tucker Theorem

Properties of the solution:

• Duality: can use kernels
• KKT conditions:

  $w = \sum_i \alpha_i y_i x_i$

  $\alpha_i \left[ y_i \left( \langle w, x_i \rangle + b \right) - 1 \right] = 0 \qquad \forall i$

• Sparseness: only the points nearest to the hyperplane (margin = 1) have positive weight
• They are called support vectors

KKT Conditions Imply Sparseness

[Figure: only the points on the margin (the support vectors) carry positive weight]

Sparseness: another fundamental property of SVMs

Properties of SVMs - Summary

✓ Duality
✓ Kernels
✓ Margin
✓ Convexity
✓ Sparseness

Dealing with noise

In the case of non-separable data in feature space, the margin distribution can be optimized:

  $y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i$

  $\epsilon \approx \tilde{O}\!\left(\frac{R^2 + \|\xi\|^2}{m\,\gamma^2}\right)$

The Soft-Margin Classifier

Minimize:

  $\tfrac{1}{2} \langle w, w \rangle + C \sum_i \xi_i$

Or:

  $\tfrac{1}{2} \langle w, w \rangle + C \sum_i \xi_i^2$

Subject to:

  $y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i$

Slack Variables

  $y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i$

  $\epsilon \approx \tilde{O}\!\left(\frac{R^2 + \|\xi\|^2}{m\,\gamma^2}\right)$

Soft Margin - Dual Lagrangian

• The 1-norm case gives box constraints:

  Maximize  $W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$

  subject to  $0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0$

• The 2-norm case adds a term to the diagonal:

  Maximize  $W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \left( \langle x_i, x_j \rangle + \tfrac{1}{C}\delta_{ij} \right)$

  subject to  $\alpha_i \ge 0, \qquad \sum_i \alpha_i y_i = 0$
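
Relative to the hard-margin dual sketch shown earlier, these two variants change very little: the 1-norm case only changes the bounds on alpha, and the 2-norm case only changes the Gram matrix. A sketch:

```python
import numpy as np

def soft_margin_inputs(K, C):
    """Return (bounds for the 1-norm case, modified Gram matrix for the 2-norm case)."""
    m = K.shape[0]
    bounds_1norm = [(0.0, C)] * m        # box constraints: 0 <= alpha_i <= C
    K_2norm = K + np.eye(m) / C          # add 1/C to the diagonal of the Gram matrix
    return bounds_1norm, K_2norm
```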

The regression case

• For regression, all the above properties are retained, by introducing the epsilon-insensitive loss:

[Figure: the ε-insensitive loss L as a function of $y - \langle w, x \rangle - b$, equal to zero inside the interval $[-\varepsilon, +\varepsilon]$]

Regression: the ε-tube

Implementation Techniques

• Maximizing a quadratic function, subject to a linear equality constraint (and inequalities as well):

  $W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

  $\alpha_i \ge 0, \qquad \sum_i \alpha_i y_i = 0$

Simple Approximation

• Initially, complex QP packages were used.
• Stochastic Gradient Ascent (sequentially update one weight at a time) gives an excellent approximation in most cases:

  $\alpha_i \leftarrow \alpha_i + \frac{1}{K(x_i, x_i)} \left( 1 - y_i \sum_j \alpha_j y_j K(x_i, x_j) \right)$
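
A sketch of this coordinate-wise ascent (the projection back onto alpha_i >= 0 and the fixed number of passes are added details; the equality constraint sum_i alpha_i y_i = 0 is ignored here, which is part of why this is only an approximation):

```python
import numpy as np

def stochastic_gradient_ascent(K, y, epochs=100):
    """Update one dual variable at a time, as in the rule above."""
    y = np.asarray(y, dtype=float)
    m = len(y)
    alpha = np.zeros(m)
    for _ in range(epochs):
        for i in range(m):
            # gradient of W(alpha) w.r.t. alpha_i, scaled by 1 / K(x_i, x_i)
            step = (1.0 - y[i] * np.sum(alpha * y * K[i, :])) / K[i, i]
            alpha[i] = max(0.0, alpha[i] + step)   # keep alpha_i >= 0 (clip at C for the soft margin)
    return alpha
```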

Full Solution: S.M.O.

• SMO: update two weights simultaneously
• Realizes gradient descent without leaving the linear constraint (J. Platt)
• Online versions exist (Li-Long; Gentile)

Other Kernelized Algorithms

• Adatron, nearest neighbour, Fisher discriminant, Bayes classifier, ridge regression, etc.
• Much work in past years has gone into designing kernel based algorithms
• Now: more work on designing good kernels (for any algorithm)

On Combining Kernels

• When is it advantageous to combine kernels?
• Too many features lead to overfitting also in kernel methods
• Kernel combination needs to be based on principles
• Alignment

Kernel Alignment

• Notion of similarity between kernels: Alignment (= similarity between Gram matrices):

  $A(K_1, K_2) = \frac{\langle K_1, K_2 \rangle}{\sqrt{\langle K_1, K_1 \rangle \langle K_2, K_2 \rangle}}$

IMPORTANT CONCEPT
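
A short sketch computing the empirical alignment between two Gram matrices, and against the ideal kernel built from the labels (Frobenius inner products throughout):

```python
import numpy as np

def alignment(K1, K2):
    """A(K1, K2) = <K1, K2> / sqrt(<K1, K1> <K2, K2>), a cosine between matrices."""
    inner = lambda A, B: float(np.sum(A * B))    # Frobenius inner product
    return inner(K1, K2) / np.sqrt(inner(K1, K1) * inner(K2, K2))

def target_alignment(K, y):
    """Alignment of K to the ideal kernel YY' defined by the labels."""
    y = np.asarray(y, dtype=float)
    return alignment(K, np.outer(y, y))
```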

Many interpretations

• As a measure of clustering in the data
• As a correlation coefficient between oracles
• Basic idea: the ultimate kernel should be YY', that is, it should be given by the labels vector (after all, the target is the only relevant feature!)

The ideal kernel

  $YY' = \begin{pmatrix}
  1 & 1 & -1 & -1 \\
  1 & 1 & -1 & -1 \\
  -1 & -1 & 1 & 1 \\
  -1 & -1 & 1 & 1
  \end{pmatrix}$

Combining Kernels

• Alignment is increased by combining kernels that are aligned to the target and not aligned to each other.

  $A(K_1, YY') = \frac{\langle K_1, YY' \rangle}{\sqrt{\langle K_1, K_1 \rangle \langle YY', YY' \rangle}}$

Spectral Machines

• One can (approximately) maximize the alignment of a set of labels to a given kernel
• By solving this problem:

  $y = \arg\max_{y} \frac{y' K y}{y' y}, \qquad y_i \in \{-1, +1\}$

• Approximated by the principal eigenvector (thresholded) (see the Courant-Hilbert theorem)
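
A sketch of the eigenvector approximation (NumPy's eigh returns eigenvalues in ascending order, so the last column is the principal eigenvector; the tie-breaking rule for zero entries is an arbitrary added detail):

```python
import numpy as np

def spectral_labels(K):
    """Approximate argmax_y (y' K y) / (y' y) over y in {-1, +1}^m
    by thresholding the principal eigenvector of K."""
    eigenvalues, eigenvectors = np.linalg.eigh((K + K.T) / 2)
    v = eigenvectors[:, -1]           # eigenvector of the largest eigenvalue
    labels = np.sign(v)
    labels[labels == 0] = 1           # arbitrary sign for zero entries
    return labels
```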

Courant-Hilbert theorem

• A: symmetric and positive definite
• The principal eigenvalue / eigenvector pair is characterized by:

  $\lambda = \max_v \frac{v' A v}{v' v}$

Optimizing Kernel Alignment

• One can either adapt the kernel to the labels or vice versa
• In the first case: a model selection method
• In the second case: a clustering / transduction method

Applications of SVMs

• Bioinformatics
• Machine Vision
• Text Categorization
• Handwritten Character Recognition
• Time series analysis

Text Kernels

• Joachims (bag of words)
• Latent semantic kernels (ICML 2001)
• String matching kernels
• ...
• See the KerMIT project

Bioinformatics

• Gene Expression
• Protein sequences
• Phylogenetic Information
• Promoters
• ...

Conclusions:

• Much more than just a replacement for neural networks
• A general and rich class of pattern recognition methods
• Book on SVMs: www.support-vector.net
• Kernel machines website: www.kernel-machines.org
• www.NeuroCOLT.org