

Minimal Neural Networks

Support vector machines and Bayesian learning for neural networks

Peter Andras
andrasp@ieee.org


Bayesian neural networks I.

The Bayes rule: Consider a model of a system and an observation of the system (an event). The a posteriori probability that the model is correct, after the observation of the event, is proportional to the product of the a priori probability that the model is correct and the probability of the event conditioned on the correctness of the model.

Mathematically:

$$P(H_\theta \mid D) = \frac{P(D \mid H_\theta)\, P(H_\theta)}{P(D)}$$

where $\theta$ is the parameter of the model $H$ and $D$ is the observed event.
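As a minimal numeric illustration of the rule (not from the slides; the two candidate models and all probabilities are made up), the posterior is just prior times likelihood, normalized by the evidence:

```python
# Bayes rule over a discrete set of candidate models; all numbers illustrative.
prior = {"H1": 0.5, "H2": 0.5}        # a priori P(H)
likelihood = {"H1": 0.8, "H2": 0.2}   # P(D | H) for the observed event D

evidence = sum(prior[h] * likelihood[h] for h in prior)              # P(D)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}  # P(H | D)
print(posterior)  # {'H1': 0.8, 'H2': 0.2}
```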


Bayesian neural networks II.

Best model: model with highest a posteriori probability of correctness

Model selection by optimizing the formula:

$$\min_\theta \left[-\ln(P(D \mid H_\theta)) - \ln(P(H_\theta))\right]$$


Bayesian neural networks III.

Application to neural networks:

$g_\theta$ is the function represented by the neural network,

$D = \{(x_i, y_i) \mid i = 1, \dots, n\}$ is the observed event,

where $\theta$ is the vector of all parameters of the network;

we suppose a normal distribution for the data conditioned on the validity of a model, i.e., the observed values $y_i$ are normally distributed around $g_\theta(x_i)$ if $\theta$ is the correct parameter vector.


Bayesian neural networks IV.

By making the calculations we get:

$$-\ln(P(D \mid H_\theta)) = -\ln\left(\prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y_i - g_\theta(x_i))^2}{2\sigma^2}}\right) = \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - g_\theta(x_i))^2 + n\ln\left(\sigma\sqrt{2\pi}\right)$$

and the new formula for optimization is:

$$\min_\theta \left[\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - g_\theta(x_i))^2 - \ln(P(\theta))\right]$$
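A short sketch of this objective as code may help; the linear toy "network", the Gaussian log-prior, and all names here are illustrative assumptions, not the slides' setup:

```python
import numpy as np

def neg_log_posterior(theta, g, x, y, sigma, log_prior):
    """(1/(2*sigma**2)) * sum((y - g(theta, x))**2) - ln P(theta)."""
    residuals = y - g(theta, x)
    return np.sum(residuals ** 2) / (2.0 * sigma ** 2) - log_prior(theta)

# Toy usage: a "network" with g_theta(x) = theta[0]*x + theta[1]
g = lambda theta, x: theta[0] * x + theta[1]
log_prior = lambda theta: -0.5 * np.sum(theta ** 2)  # Gaussian prior, up to a constant
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 0.9, 2.1])
print(neg_log_posterior(np.array([1.0, 0.0]), g, x, y, sigma=1.0, log_prior=log_prior))
```

Minimizing this quantity over $\theta$ selects the model with the highest a posteriori probability.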


Bayesian neural networks V.

The equivalence of the regularization and Bayesian model selection

Regularization formula:

$$\min_\theta \left[\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - g_\theta(x_i))^2 + \lambda\|Tg_\theta\|^2\right]$$

Bayesian optimization formula:

$$\min_\theta \left[\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - g_\theta(x_i))^2 - \ln(P(\theta))\right]$$

Equivalence: $-\ln(P(\theta)) \sim \lambda\|Tg_\theta\|^2$

Both represent a priori information about the correct solution.


Bayesian neural networks VI.

Bayesian pruning by regularization

Gauss pruning:

$$\min_\theta \left[\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - g_\theta(x_i))^2 + \frac{1}{2r^2}\sum_{k=1}^{N}\theta_k^2\right]$$

Laplace pruning:

$$\min_\theta \left[\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - g_\theta(x_i))^2 + \frac{1}{r}\sum_{k=1}^{N}|\theta_k|\right]$$

Cauchy pruning:

$$\min_\theta \left[\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - g_\theta(x_i))^2 + \sum_{k=1}^{N}\ln\left(1 + \frac{\theta_k^2}{r^2}\right)\right]$$

$N$ is the number of components of the parameter vector $\theta$.
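A sketch of the three penalty terms as functions (the data-fit term is the same in all three formulas, so only the penalties are shown; the function names and the value of r are illustrative):

```python
import numpy as np

def gauss_penalty(theta, r):    # sum_k theta_k^2 / (2 r^2)
    return np.sum(theta ** 2) / (2.0 * r ** 2)

def laplace_penalty(theta, r):  # sum_k |theta_k| / r
    return np.sum(np.abs(theta)) / r

def cauchy_penalty(theta, r):   # sum_k ln(1 + theta_k^2 / r^2)
    return np.sum(np.log(1.0 + theta ** 2 / r ** 2))

theta = np.array([0.5, -2.0, 0.01])
for pen in (gauss_penalty, laplace_penalty, cauchy_penalty):
    print(pen.__name__, pen(theta, r=1.0))
```

The three penalties push small weights toward zero at different rates, which is what makes them usable for pruning.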


Support vector machines - SVM I.

Linearly separable classes:

- many separators

- there is an optimal separator


Support vector machines - SVM II.

How to find the optimal separator?

- support vectors

- overspecification

Property: one less support vector → new optimal separator


Support vector machines - SVM III.

We look for minimal and robust separators.

These are minimal and robust models of the data.

The full data set is equivalent to the set of the support vectors with respect to the specification of the minimal robust model.


Support vector machines - SVM IV.

Mathematical problem formulation I.

$(x_i, y_i),\ i = 1, \dots, n,\quad y_i \in \{-1, +1\}$

we represent the separator as a pair $(w, b)$, where $w$ is a vector and $b$ is a scalar;

we look for $w$ and $b$ such that they satisfy:

$$w^T x_i + b \ge +1 \quad \text{if } y_i = +1$$
$$w^T x_i + b \le -1 \quad \text{if } y_i = -1$$

The support vectors are those $x_i$'s for which this inequality is in fact an equality.


Support vector machines - SVM V.

Mathematical problem formulation II.

The distances from the origin of the hyperplanes of the support vectors are:

$$d_{+} = \frac{|1 - b|}{\|w\|_2}, \qquad d_{-} = \frac{|-1 - b|}{\|w\|_2}$$

The distance between the two planes is:

$$d = \frac{2}{\|w\|_2}$$


Support vector machines - SVM VI.

Mathematical problem formulation III.

Optimal separator: the distance between the two hyperplanes is maximal.

Optimization:

$$\min \frac{1}{2}\|w\|_2^2$$

with the restrictions that

$$w^T x_i + b = y_i \quad \text{if } i \in I_{SV}$$

or in other form

$$y_i(w^T x_i + b) - 1 = 0 \quad \text{if } i \in I_{SV}$$


Support vector machines - SVM VII.

Mathematical problem formulation IV.

Complete optimization formula, using Lagrange multipliers:

$$L_P = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n}\alpha_i y_i (w^T x_i + b) + \sum_{i=1}^{n}\alpha_i, \qquad \alpha_i \ge 0$$


Support vector machines - SVM VIII.

Mathematical problem formulation V.

Writing the optimality conditions for $w$ and $b$ we get:

$$w = \sum_{i=1}^{n}\alpha_i y_i x_i, \qquad \sum_{i=1}^{n}\alpha_i y_i = 0$$

The dual problem is:

$$L_D = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle$$

The support vectors are those $x_i$'s for which $\alpha_i$ is strictly positive.


Support vector machines - SVM IX.

Graphical interpretation

We search for the tangent point of a hyper-ellipsoid with the positive quadrant of the $\alpha$-space.


Support vector machines - SVM X.

How to solve the support vector problem?

Optimization with respect to the $\alpha$'s:

- gradient method

- Newton and quasi-Newton methods

As a result we get:

- the support vectors

- the optimal linear separator
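As an illustration of the gradient approach, here is a minimal numpy sketch that ascends the dual on a toy 2-D data set. To keep it short, the bias b is folded into w by augmenting each x_i with a constant 1, which removes the equality constraint $\sum_i \alpha_i y_i = 0$; this simplification, the data, and the step size are assumptions, not the slides' exact procedure:

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 3.0],       # class +1
              [-1.0, -1.0], [-2.0, 0.0], [0.0, -2.0]])  # class -1
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

Xa = np.hstack([X, np.ones((len(X), 1))])    # fold b into w
Q = (y[:, None] * Xa) @ (y[:, None] * Xa).T  # Q_ij = y_i y_j <x_i, x_j>

alpha = np.zeros(len(X))
for _ in range(5000):
    grad = 1.0 - Q @ alpha                        # gradient of the dual L_D
    alpha = np.maximum(alpha + 0.01 * grad, 0.0)  # ascend, keep alpha_i >= 0

w = (alpha * y) @ Xa                              # w = sum_i alpha_i y_i x_i
print("support vectors:", np.where(alpha > 1e-6)[0])
print("w =", w[:-1], "b =", w[-1])
```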


Support vector machines - SVM XI.

Implications for artificial neural networks:

- robust perceptron (low sensitivity to noise)

- minimal linear classification neural network


Support vector machines - SVM XII.

What can we do if the boundary is nonlinear?

Idea: transform the data vectors to a space where the separator is linear


Support vector machines - SVM XIII.

The transformation is often made to an infinite-dimensional space, usually a function space.

Example: $x \mapsto \cos(u^T x)$


Support vector machines - SVM XIV.

The new optimization formulas are:

$$L_P = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n}\alpha_i y_i \left(\langle w, \Phi(x_i)\rangle + b\right) + \sum_{i=1}^{n}\alpha_i, \qquad \alpha_i \ge 0, \quad x_i \mapsto \Phi(x_i)$$

$$L_D = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j y_i y_j \langle \Phi(x_i), \Phi(x_j)\rangle$$


Support vector machines - SVM XV.

How to handle the products of the transformed vectors?

Idea: use a transformation that fits the Mercer theorem

Mercer theorem: Let $K: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Then $K$ has a decomposition

$$K(x, y) = \langle \Phi(x), \Phi(y)\rangle$$

where $\Phi: \mathbb{R}^n \to H$ and $H$ is a function space,

if and only if

$$0 \le \int_{\mathbb{R}^n \times \mathbb{R}^n} g(x)\, K(x, y)\, g(y)\, dx\, dy$$

for each $g \in L_2(\mathbb{R}^n)$.
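On a finite sample the Mercer condition reduces to the Gram matrix $K_{ij} = K(x_i, x_j)$ being positive semi-definite, which is easy to check numerically; the Gaussian kernel and the parameter values below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 random points in R^3
r = 1.0

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * r ** 2))  # Gaussian kernel Gram matrix

print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())  # >= 0 up to round-off
```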


Support vector machines - SVM XVI.

Optimization formula with a transformation that fits the Mercer theorem:

$$L_D = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

The form of the solution:

$$\langle w, \Phi(x)\rangle + b = \sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b = 0$$

$b$ is determined from the equality that holds for a support vector.
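A sketch of this decision function in code (the polynomial kernel and every numeric value here are made up for illustration; in practice the alphas, support vectors, and b come from the optimization above):

```python
import numpy as np

def svm_decision(x, sv_x, sv_y, sv_alpha, b, K):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b over the support vectors."""
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(sv_alpha, sv_y, sv_x)) + b

K = lambda u, v: (np.dot(u, v) + 1) ** 2  # an example Mercer kernel
f = svm_decision(np.array([0.5, 0.5]),
                 sv_x=[np.array([1.0, 1.0]), np.array([-1.0, -1.0])],
                 sv_y=[1.0, -1.0], sv_alpha=[0.3, 0.3], b=0.0, K=K)
print("class:", 1 if f >= 0 else -1)
```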


Support vector machines - SVM XVII.

Examples of transformations and kernels

a. $\Phi: \mathbb{R}^2 \to \mathbb{R}^3;\ \Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)^T, \quad K(u, v) = \langle u, v\rangle^2$

b. $\Phi: \mathbb{R}^2 \to \mathbb{R}^4;\ \Phi(x) = (x_1^2, x_1 x_2, x_1 x_2, x_2^2)^T, \quad K(u, v) = \langle u, v\rangle^2$

c. $\Phi: \mathbb{R} \to H;\ \Phi(x) = \left(\tfrac{1}{\sqrt{2}}, \cos(x), \cos(2x), \dots, \cos(Nx), \sin(x), \sin(2x), \dots, \sin(Nx)\right)^T,$
$\quad K(u, v) = \dfrac{\sin\left(\left(N + \tfrac{1}{2}\right)(u - v)\right)}{2\sin\left(\tfrac{u - v}{2}\right)}$
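Example (a) can be verified numerically: the inner product of the transformed vectors equals the kernel value (the test points are arbitrary):

```python
import numpy as np

def phi(x):  # the map of example (a): R^2 -> R^3
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(u), phi(v)))  # 1.0
print(np.dot(u, v) ** 2)       # 1.0 -- the same value, as the decomposition promises
```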


Support vector machines - SVM XVIII.

Other typical kernels

a. $K(u, v) = \langle u, v\rangle^p$

b. $K(u, v) = (\langle u, v\rangle + 1)^p$

c. $K(u, v) = e^{-\frac{\|u - v\|^2}{2r^2}}$

d. $K(u, v) = \tanh(\langle u, v\rangle - a)$
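The four kernel families as code (the parameter names p, r, a follow the formulas above; the test values are arbitrary):

```python
import numpy as np

def poly_kernel(u, v, p):        # a. <u, v>^p
    return np.dot(u, v) ** p

def poly_plus1_kernel(u, v, p):  # b. (<u, v> + 1)^p
    return (np.dot(u, v) + 1) ** p

def gaussian_kernel(u, v, r):    # c. exp(-||u - v||^2 / (2 r^2))
    return np.exp(-np.sum((u - v) ** 2) / (2 * r ** 2))

def sigmoid_kernel(u, v, a):     # d. tanh(<u, v> - a)
    return np.tanh(np.dot(u, v) - a)

u, v = np.array([1.0, 0.5]), np.array([0.5, 1.0])
print(poly_kernel(u, v, 2), gaussian_kernel(u, v, 1.0), sigmoid_kernel(u, v, 0.0))
```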


Support vector machines - SVM XIX.

Summary of main ideas

• look for minimal complexity classification

• transform the data to another space where the class boundaries are linear

• use Mercer kernels


Support vector machines - SVM XX.

Practical issues

• the global optimization doesn't work with large amounts of data → sequential optimization with chunks of the data

• the resulting models are minimal-complexity models; they are insensitive to noise and keep the generalization ability of the more complex models

• applications: character recognition, economic forecasting
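The slides do not name a particular implementation; as one present-day illustration, scikit-learn's SVC uses an SMO-style sequential solver of exactly this chunked kind and exposes the resulting support vectors (the data and parameter values below are made up):

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

X = np.array([[1, 1], [2, 2.5], [0, 3], [-1, -1], [-2, 0], [0, -2]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="rbf", C=1.0, gamma=0.5)  # Gaussian kernel
clf.fit(X, y)
print("support vectors per class:", clf.n_support_)
print("prediction:", clf.predict([[0.5, 0.5]]))
```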


Regularization neural networks

General optimization vs. optimization over the grid

The regularization operator specifies the grid:

- we look for functions that satisfy $\|Tg\|^2 = 0$

- in the relaxed case the regularization operator is incorporated as a constraint in the error function:

$$E_T = E_{usual} + \lambda\|Tg\|^2$$
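A sketch of this relaxed error in code, taking T to be the first-derivative operator approximated by finite differences on a grid (the choice of T, lambda, and the sine target are illustrative assumptions):

```python
import numpy as np

def regularized_error(g_vals, y_vals, grid, lam):
    """E_T = E_usual + lambda * ||Tg||^2 with Tg ~ g' on a grid."""
    e_usual = 0.5 * np.sum((y_vals - g_vals) ** 2)          # the usual error term
    dg = np.diff(g_vals) / np.diff(grid)                    # finite-difference g'
    tg_norm_sq = np.sum(dg ** 2) * np.mean(np.diff(grid))   # ~ integral of (g')^2
    return e_usual + lam * tg_norm_sq

grid = np.linspace(0.0, 1.0, 50)
y_vals = np.sin(2 * np.pi * grid)  # target values on the grid
print(regularized_error(np.zeros(50), y_vals, grid, lam=0.1))
```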