
MML Inference of RBFs

Enes Makalic, Lloyd Allison, Andrew Paplinski

Presentation Outline

RBF architecture selection
  Existing methods

Overview of MML
  MML87

MML inference of RBFs
  MML estimators for RBF parameters
  Results

Conclusion
  Future work

RBF Architecture Selection (1)

Determine the optimal network architecture for a given problem. This involves choosing the number and type of basis functions.

The choice influences the success of the training process. If the chosen RBF network is too small, performance is poor; if it is too large, it overfits.

RBF Architecture Selection (2)

[Figure: two example fits, one showing poor performance (too few basis functions), one showing overfitting (too many)]

RBF Architecture Selection (3)

Architecture selection solutions:

Use as many basis functions as there are data points

Expectation Maximization (EM)

K-means clustering

Regression trees (M. Orr)

BIC, GPE, etc.

Bayesian inference: reversible jump MCMC

Overview of MML (1)

An objective function to estimate the goodness of a model: a sender wishes to send data, x, to a receiver.

How well is the data encoded? Measured by the message length (for example, in bits).

[Diagram: Sender → Receiver over a noiseless transmission channel]

Overview of MML (2)

Transmit the data in two parts:

Part 1: encoding of the model

Part 2: encoding of the data given the model

A quantitative form of Occam's razor.

\mathrm{msgLen} = -\log \Pr(H) - \log \Pr(D \,|\, H)

where the first term encodes the hypothesis and the second the data given the hypothesis (a toy example is sketched below).
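A toy two-part coding example (not from the slides): pick a coin bias from a small discrete set, then encode the flips given that bias; the nine-value discretisation and the data are assumptions of the example.

import math

# Hypothetical two-part code: choose a coin bias p from a small
# discrete set (uniform prior), then encode the flips given p.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]            # example data
models = [0.1 * k for k in range(1, 10)]          # candidate hypotheses

def msglen_bits(p, data):
    part1 = -math.log2(1.0 / len(models))         # -log2 Pr(H)
    part2 = -sum(math.log2(p if x else 1.0 - p)   # -log2 Pr(D|H)
                 for x in data)
    return part1 + part2

for p in models:
    print(f"p = {p:.1f}  msgLen = {msglen_bits(p, flips):6.2f} bits")
print("shortest message at p =", min(models, key=lambda p: msglen_bits(p, flips)))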

Overview of MML (3)

MML87

An efficient approximation to strict MML. The total message length for a model with parameters θ:

\mathrm{msgLen}(\boldsymbol{\theta}, \mathbf{x}) = -\log h(\boldsymbol{\theta}) + \frac{1}{2} \log F(\boldsymbol{\theta}) - \log f(\mathbf{x} \,|\, \boldsymbol{\theta}) + \frac{n}{2} \log \kappa_n + \frac{n}{2}
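A minimal numeric sketch of this formula (not from the slides) for the simplest case, a Gaussian with unknown mean μ and known σ, where F(μ) = N/σ² and κ₁ = 1/12; the data, prior range and seed are assumptions of the example.

import math
import random

random.seed(0)
sigma = 1.0
data = [random.gauss(2.0, sigma) for _ in range(50)]

def msglen87(mu, data, prior_range=20.0):
    n = 1                                   # one free parameter
    kappa = 1.0 / 12.0                      # lattice constant for n = 1
    h = 1.0 / prior_range                   # uniform prior on mu
    F = len(data) / sigma ** 2              # expected Fisher information
    nll = sum(0.5 * ((x - mu) / sigma) ** 2 +
              0.5 * math.log(2 * math.pi * sigma ** 2) for x in data)
    return (-math.log(h) + 0.5 * math.log(F) + nll
            + 0.5 * n * math.log(kappa) + 0.5 * n)   # message length, nats

mu_hat = sum(data) / len(data)
print("msgLen at sample mean:", msglen87(mu_hat, data))
print("msgLen at mu = 0:     ", msglen87(0.0, data))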

Overview of MML (4)

MML87

h(θ) is the prior information

f(x|θ) is the likelihood function

n is the number of parameters

κ_n is a dimension constant

F(θ) is the determinant of the expected Fisher information matrix, with entries (i, j):

F(\boldsymbol{\theta})_{ij} = -\mathrm{E}_{\mathbf{x}}\!\left[\frac{\partial^2}{\partial \theta_i \, \partial \theta_j} \log f(\mathbf{x} \,|\, \boldsymbol{\theta})\right]
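As a quick check of the definition (not from the slides): for N independent observations from a Gaussian with unknown mean μ and known variance σ²,

F(\mu) = -\mathrm{E}_{\mathbf{x}}\!\left[\frac{\partial^2}{\partial \mu^2} \log f(\mathbf{x} \,|\, \mu)\right] = -\mathrm{E}_{\mathbf{x}}\!\left[\frac{\partial^2}{\partial \mu^2} \sum_{i=1}^{N} \left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)\right] = \frac{N}{\sigma^2}

More data or less noise means the parameter can, and must, be stated more precisely, which is exactly the trade-off the next slide describes.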

Overview of MML (5)

MML87 Fisher information:

Sensitivity of the likelihood function to the parameters

Determines the accuracy of stating the model:

small second derivatives: parameters stated less precisely

large second derivatives: parameters stated more accurately

A model that minimises the total message length is optimal.

MML Inference of RBFs (1)

Regression problems. We require:

A likelihood function

Fisher information

Priors on all model parameters

MML Inference of RBFs (2)

[Diagram: RBF network. Inputs x_1, …, x_i, …, x_m feed H hidden basis functions (c_1, r_1), …, (c_H, r_H); the network output is ŷ = N(w), and the output non-linearity gives ẑ = M(ŷ).]

MML Inference of RBFs (3)

RBF network: m inputs, n parameters, o outputs

w: vector of network parameters; the network output implicitly depends on the network input vector x ∈ R^m

Mapping from parameters to outputs: N: R^n → R^o, ŷ = N(w)

Output non-linearity: M: R^o → R^o, ẑ = M(ŷ)

(a minimal forward pass is sketched below)
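The slides do not fix the basis-function type here, so the sketch below assumes the common Gaussian basis and takes M to be the identity, as is usual for regression; rbf_forward and its arguments are illustrative names, not the authors' code.

import numpy as np

def rbf_forward(x, centres, radii, weights, bias=0.0):
    """Gaussian RBF network output for a single input x.

    x: (m,) input; centres: (H, m); radii: (H,); weights: (H,)."""
    d2 = np.sum((centres - x) ** 2, axis=1)     # squared distances to centres
    phi = np.exp(-d2 / (2.0 * radii ** 2))      # basis function activations
    y_hat = weights @ phi + bias                # ŷ = N(w)
    return y_hat                                # ẑ = M(ŷ) = ŷ (identity M)

# Example: H = 3 basis functions, m = 2 inputs, one output.
rng = np.random.default_rng(0)
centres = rng.uniform(-8, 8, size=(3, 2))
print(rbf_forward(np.zeros(2), centres, np.ones(3), rng.normal(size=3)))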

MML Inference of RBFs (4)

Likelihood function

Learning: minimisation of a scalar function

L: R^o → R, L = L(M(N(w)))

We define L as the negative log-likelihood: L(ẑ) = −log Pr(ẑ)

L implicitly depends on the given targets, z, for the network outputs

Different input–target pairs are considered independent:

D = {(x_1, z_1), …, (x_N, z_N)}

MML Inference of RBFs (5)

Likelihood function

Regression problems: the network error, ε = z − ẑ, is assumed Gaussian with mean 0 and variance σ²:

\Pr(\hat{z} \,|\, \mathbf{x}, \mathbf{w}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(z - \hat{z})^2}{2\sigma^2}\right)

and, over the data set D = (x, z),

\Pr(\hat{z}_1, \ldots, \hat{z}_N \,|\, \mathbf{x}, \mathbf{w}) = \prod_{i=1}^{N} \Pr(\hat{z}_i \,|\, \mathbf{x}_i, \mathbf{w})
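This density translates directly into code; a minimal sketch (the function name and vectorised form are mine):

import numpy as np

def neg_log_likelihood(z, z_hat, sigma):
    """L = -log Pr(ẑ_1, ..., ẑ_N | x, w) under i.i.d. Gaussian errors."""
    z, z_hat = np.asarray(z), np.asarray(z_hat)
    return (0.5 * np.sum((z - z_hat) ** 2) / sigma ** 2
            + 0.5 * z.size * np.log(2.0 * np.pi * sigma ** 2))

print(neg_log_likelihood([1.0, 2.0, 0.5], [1.1, 1.8, 0.4], sigma=0.1))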

MML Inference of RBFs (6)

Fisher information

Expected Hessian matrix of L. By the chain rule, the Jacobian of L(M(N(w))) with respect to w is

\mathbf{J}_{\mathbf{w}} = \mathbf{J}_L \, \mathbf{J}_M \, \mathbf{J}_N

and the Hessian matrix of L is

\mathbf{H} = \mathbf{J}_N^\top \mathbf{H}_{LM} \, \mathbf{J}_N + \sum_i (\mathbf{J}_{LM})_i \, \mathbf{H}_{N_i}

with expectation \mathbf{F} = \mathrm{E}_{\mathbf{z}|\mathbf{x}}[\mathbf{H}].

MML Inference of RBFs (7)

Fisher information

Taking expectations and simplifying we obtain

\mathbf{F} = \mathbf{J}_N^\top \mathbf{J}_M^\top \mathbf{J}_M \, \mathbf{J}_N

which is positive semi-definite. The complete Fisher information includes a summation over the whole data set D.

We used an approximation to F: a block-diagonal matrix in which the hidden basis functions are assumed independent; the determinant then simplifies to a product of determinants, one per block (see the sketch below).
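A sketch of why the block-diagonal assumption helps: log det F becomes a sum of small per-block log-determinants. The random SPD blocks here merely stand in for the true per-basis-function Fisher blocks.

import numpy as np

rng = np.random.default_rng(1)

def random_spd(k):
    """A random symmetric positive-definite stand-in for a Fisher block."""
    a = rng.normal(size=(k, k))
    return a @ a.T + k * np.eye(k)

blocks = [random_spd(4) for _ in range(3)]   # one block per basis function

# Block-diagonal approximation: log det F is the sum of block log-dets.
logdet_blocks = sum(np.linalg.slogdet(b)[1] for b in blocks)

# Check against the assembled block-diagonal matrix.
F = np.zeros((12, 12))
for i, b in enumerate(blocks):
    F[4 * i:4 * i + 4, 4 * i:4 * i + 4] = b
print(logdet_blocks, np.linalg.slogdet(F)[1])   # identical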

MML Inference of RBFs (8)

Priors: must specify a prior density for each parameter

Centres: uniform

Radii: uniform (log-scale)

Weights: Gaussian, with zero mean; the standard deviation is usually taken to be large (a vague prior)
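A minimal evaluation of these priors, sketched below; the input range, radius bounds and weight standard deviation are assumptions chosen for the example, not values from the slides.

import numpy as np

def log_prior(centres, radii, weights,
              x_lo=-8.0, x_hi=8.0, r_lo=0.1, r_hi=10.0, w_sd=100.0):
    """log h(w): uniform centres, log-uniform radii, vague Gaussian weights."""
    c, r, w = (np.asarray(a, dtype=float) for a in (centres, radii, weights))
    lp = -c.size * np.log(x_hi - x_lo)                   # uniform centre coords
    lp -= np.sum(np.log(r)) + r.size * np.log(np.log(r_hi / r_lo))  # log-scale radii
    lp -= 0.5 * np.sum((w / w_sd) ** 2) \
          + 0.5 * w.size * np.log(2.0 * np.pi * w_sd ** 2)  # zero-mean Gaussian
    return lp

print(log_prior(centres=[[0.0, 1.0]], radii=[1.0], weights=[0.5]))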

MML Inference of RBFs (9)

Message length of an RBF:

\mathrm{msgLen} = \log^* H - \log h(\mathbf{w}) + \frac{1}{2} \log F(\mathbf{w}) + \mathrm{L} + C

where:

log* H denotes the cost of transmitting the number of basis functions

F(w) = det(J_N^⊤ J_M^⊤ J_M J_N) is the determinant of the expected Fisher information

L = −log Pr(ẑ_1, …, ẑ_N | x, w) is the negative log-likelihood

C is a dimension constant, independent of w
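Assembling the pieces as a sketch. The slides do not say how log* H is coded; Rissanen's universal integer code is one common reading and is used here, and the component values in the example are made up.

import math

def log_star(k):
    """Rissanen's universal code length for a positive integer (nats)."""
    total, t = math.log(2.865), math.log(k)
    while t > 0:
        total += t
        t = math.log(t)
    return total

def msg_len(H, log_h_w, log_det_F, nll, C=0.0):
    """msgLen = log* H - log h(w) + (1/2) log F(w) + L + C."""
    return log_star(H) - log_h_w + 0.5 * log_det_F + nll + C

# Made-up component values for two candidate architectures:
print("H = 3:", msg_len(3, log_h_w=-12.0, log_det_F=8.5, nll=140.2))
print("H = 5:", msg_len(5, log_h_w=-20.0, log_det_F=15.1, nll=133.0))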

MML Inference of RBFs (10)

MML estimators for parameters

Standard unbiased estimator for the error s.d.:

\hat{\sigma}^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (z_i - \hat{z}_i)^2

The remaining parameters are found by numerical optimisation of msgLen(w), which requires differentiation of the expected Fisher information determinant:

\frac{d}{da} \log \det(\mathbf{F}) = \mathrm{Trace}\!\left(\mathbf{F}^{-1} \frac{d\mathbf{F}}{da}\right)

(checked numerically below)
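The trace identity is easy to verify numerically; the parametric matrix below is arbitrary, chosen only to be smooth and positive definite near a = 0.7.

import numpy as np

def F_of_a(a):
    """An arbitrary smooth, positive-definite matrix family F(a)."""
    return np.array([[2.0 + a ** 2, a],
                     [a, 1.0 + np.exp(a)]])

a, eps = 0.7, 1e-6
dF = (F_of_a(a + eps) - F_of_a(a - eps)) / (2 * eps)        # numerical dF/da

lhs = (np.linalg.slogdet(F_of_a(a + eps))[1]
       - np.linalg.slogdet(F_of_a(a - eps))[1]) / (2 * eps) # d/da log det F
rhs = np.trace(np.linalg.solve(F_of_a(a), dF))              # Trace(F^-1 dF/da)
print(lhs, rhs)                                             # agree closely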

Results (1)

The MML inference criterion is compared to:

the conventional MATLAB RBF implementation

M. Orr's regression tree method

Functions used for criteria evaluation: cases where the correct answer is known, and cases where it is not

Results (2)

Correct answer known:

Generate data from a known RBF (one, three and five basis functions respectively)

Inputs uniformly sampled in the range (−8, 8); 1D and 2D inputs were considered

Gaussian noise N(0, 0.1) added to the network outputs

Training and test sets comprise 100 and 1000 patterns respectively

(a data-generation sketch follows)
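A sketch of this setup for the 1D case, with a made-up 3-basis-function target network; reading N(0, 0.1) as a standard deviation of 0.1 is an assumption, since it could equally denote the variance.

import numpy as np

rng = np.random.default_rng(42)

def make_data(n_patterns, centres, radii, weights, noise_sd=0.1):
    """Inputs uniform in (-8, 8); Gaussian noise added to RBF outputs."""
    x = rng.uniform(-8, 8, size=n_patterns)
    d2 = (x[:, None] - centres[None, :]) ** 2
    z = np.exp(-d2 / (2.0 * radii ** 2)) @ weights   # true network outputs
    return x, z + rng.normal(0.0, noise_sd, size=n_patterns)

centres = np.array([-4.0, 0.0, 4.0])                 # a known 3-basis RBF
weights = np.array([1.0, -2.0, 1.5])
x_train, z_train = make_data(100, centres, np.ones(3), weights)
x_test, z_test = make_data(1000, centres, np.ones(3), weights)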

Results (3)

[Plot: MSE, correct answer known (1D input)]

Results (4)

[Plot: MSE, correct answer known (2D inputs)]

Results (5)

Correct answer not known. The following functions were used:

f_1(x) = (1 - x + 2x^2) \, e^{-x^2}, \quad x \in (-4, 4)

f_2(x) = \sin(2x), \quad x \in (-4, 4)

f_3(x_1, x_2) = \sin\!\left(\frac{x_1}{4}\right) \sin\!\left(\frac{x_2}{2}\right), \quad x_1 \in (0, 10), \; x_2 \in (-5, 5)
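The same benchmarks in code, as reconstructed above; the source is badly garbled here, so treat the exact coefficients as a best-effort reading rather than the authors' definitive forms.

import numpy as np

def f1(x):
    return (1.0 - x + 2.0 * x ** 2) * np.exp(-x ** 2)   # x in (-4, 4)

def f2(x):
    return np.sin(2.0 * x)                              # x in (-4, 4)

def f3(x1, x2):
    return np.sin(x1 / 4.0) * np.sin(x2 / 2.0)          # x1 in (0, 10), x2 in (-5, 5)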

Results (6)

Correct answer not known:

Gaussian noise N(0, 0.1) added to the network outputs

Training and test sets comprise 100 and 1000 patterns respectively

Results (7)

Results (8)

Results (9)

[Plot: MSE, correct answer not known]

Results (10)

Sensitivity of criteria to noise

Results (11)

Sensitivity of criteria to data set size

Conclusion (1)

A novel approach to architecture selection in RBF networks: MML87 with a block-diagonal approximation to the Fisher information matrix.

MATLAB code available from: http://www.csse.monash.edu.au/~enesm

Conclusion (2)

Results: initial testing shows good performance as the level of noise and the data set size are varied, with no over-fitting.

Future work: further testing; examine whether the MML parameter estimators improve performance; MML and regularization.

Conclusion (3)

Questions?
