MML Inference of RBFs
![Page 1: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/1.jpg)
MML Inference of RBFs
Enes Makalic, Lloyd Allison, Andrew Paplinski
![Page 2: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/2.jpg)
Presentation Outline
- RBF architecture selection
  - Existing methods
- Overview of MML
  - MML87
- MML inference of RBFs
  - MML estimators for RBF parameters
- Results
- Conclusion
  - Future work
![Page 3: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/3.jpg)
RBF Architecture Selection (1)
- Determine the optimal network architecture for a given problem
- Involves choosing:
  - Number and type of basis functions
- Influences the success of the training process
- If we choose an RBF that is:
  - Too small: poor performance
  - Too large: overfitting
![Page 4: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/4.jpg)
RBF Architecture Selection (2)
Figure: two panels, "Poor Performance" and "Overfitting".
![Page 5: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/5.jpg)
RBF Architecture Selection (3)
- Architecture selection solutions:
  - Use as many basis functions as there are data points
  - Expectation Maximization (EM)
  - K-means clustering
  - Regression trees (M. Orr)
  - BIC, GPE, etc.
  - Bayesian inference
    - Reversible jump MCMC
![Page 6: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/6.jpg)
Overview of MML (1)
- Objective function to estimate the goodness of a model
- A sender wishes to send data, x, to a receiver
- How well is the data encoded?
  - Message length (for example, in bits)
- Diagram: Sender, transmission channel (noiseless), Receiver
![Page 7: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/7.jpg)
Overview of MML (2)
- Transmit the data in two parts:
  - Part 1: encoding of the model (the hypothesis H), of length $-\log \Pr(H)$
  - Part 2: encoding of the data given the model, of length $-\log \Pr(D \mid H)$
- Quantitative form of Occam’s razor
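As a minimal sketch of the two-part idea, the snippet below computes $-\log_2 \Pr(H) - \log_2 \Pr(D \mid H)$ in bits for two hypotheses about a coin. The coin example, the prior values and the function names are illustrative assumptions, not part of the presentation.

```python
import math

def two_part_message_length(prior_h, prob_data_given_h):
    """Two-part message length in bits: -log2 Pr(H) - log2 Pr(D|H)."""
    return -math.log2(prior_h) - math.log2(prob_data_given_h)

def sequence_probability(p_heads, heads, tails):
    """Probability of one particular toss sequence for a coin with bias p_heads."""
    return p_heads ** heads * (1.0 - p_heads) ** tails

# Assumed toy data: 8 heads and 2 tails.
# Assumed priors: 0.75 for the fair coin, 0.25 for the biased coin (p = 0.8).
fair_len = two_part_message_length(0.75, sequence_probability(0.5, 8, 2))
biased_len = two_part_message_length(0.25, sequence_probability(0.8, 8, 2))

# The hypothesis with the shorter total message is preferred.
best = "biased" if biased_len < fair_len else "fair"
```

Here the biased hypothesis costs more to state (a smaller prior) but encodes the data more cheaply, so the total decides between them.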
![Page 8: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/8.jpg)
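Overview of MML (3)
- MML87
  - Efficient approximation to strict MML
  - Total message length for a model with parameters $\theta \in \mathbb{R}^n$:

$$\mathrm{msgLen} = -\log\left(\frac{h(\theta)}{\kappa_n^{n/2}\sqrt{F(\theta)}}\right) - \log f(x \mid \theta) + \frac{n}{2}$$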
![Page 9: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/9.jpg)
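Overview of MML (4)
- MML87
  - $h(\theta)$ is the prior information
  - $f(x \mid \theta)$ is the likelihood function
  - $n$ is the number of parameters
  - $\kappa_n^{n/2}$ is a dimension constant
  - $F(\theta)$ is the determinant of the expected Fisher information matrix with entries $(i, j)$:

$$-\int_X f(x \mid \theta)\, \frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \log f(x \mid \theta)\, dx$$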
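The MML87 message length, $-\log h(\theta) + \frac{n}{2}\log \kappa_n + \frac{1}{2}\log F(\theta) - \log f(x \mid \theta) + \frac{n}{2}$, can be sketched for the simplest case of a single parameter. The example below is an assumption chosen for illustration (a Gaussian mean with known standard deviation and a uniform prior); it uses the one-dimensional lattice constant $\kappa_1 = 1/12$, works in nats, and ignores the constant cost of the measurement precision of the data.

```python
import math

KAPPA_1 = 1.0 / 12.0  # lattice constant for a single parameter (n = 1)

def mml87_message_length(neg_log_prior, log_det_fisher, neg_log_likelihood,
                         n_params, kappa_n):
    """MML87 total in nats: -log h + (n/2) log kappa_n + 0.5 log F + L + n/2."""
    return (neg_log_prior
            + 0.5 * n_params * math.log(kappa_n)
            + 0.5 * log_det_fisher
            + neg_log_likelihood
            + 0.5 * n_params)

def gaussian_mean_message_length(data, mu, sigma, prior_range):
    """Message length for stating the mean mu of Gaussian data with known sigma.

    Assumptions: uniform prior h(mu) = 1 / prior_range, and expected Fisher
    information N / sigma^2 for the mean.
    """
    n_points = len(data)
    neg_log_prior = math.log(prior_range)
    log_det_fisher = math.log(n_points / sigma ** 2)
    neg_log_likelihood = sum(
        0.5 * math.log(2.0 * math.pi * sigma ** 2)
        + (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )
    return mml87_message_length(neg_log_prior, log_det_fisher,
                                neg_log_likelihood, 1, KAPPA_1)
```

Calling `gaussian_mean_message_length(data, mu, sigma, prior_range)` for several candidate values of `mu` and keeping the smallest result mirrors the "shortest message wins" rule.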
![Page 10: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/10.jpg)
Overview of MML (5)
- MML87
  - Fisher information:
    - Sensitivity of the likelihood function to the parameters
    - Determines the accuracy of stating the model
    - Small second derivatives: state parameters less precisely
    - Large second derivatives: state parameters more accurately
  - A model that minimises the total message length is optimal
![Page 11: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/11.jpg)
MML Inference of RBFs (1)
- Regression problems
- We require:
  - A likelihood function
  - Fisher information
  - Priors on all model parameters
![Page 12: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/12.jpg)
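MML Inference of RBFs (2)
- Notation
- Network diagram: inputs $x_1, \ldots, x_i, \ldots, x_m$ feed $H$ hidden basis functions, each with a centre and radius $(c, r)$; the network output is $\hat{y} = N(w)$, followed by $\hat{z} = M(\hat{y})$.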
![Page 13: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/13.jpg)
MML Inference of RBFs (3)
- RBF Network
  - m inputs, n parameters, o outputs
  - Mapping from parameters to outputs: $N: \mathbb{R}^n \to \mathbb{R}^o$, $\hat{y} = N(w)$
    - w: vector of network parameters
    - Network output implicitly depends on the network input vector, $x \in \mathbb{R}^m$
  - Define output non-linearity: $M: \mathbb{R}^o \to \mathbb{R}^o$, $\hat{z} = M(\hat{y})$
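A minimal sketch of the two mappings, $\hat{y} = N(w)$ and $\hat{z} = M(\hat{y})$, for one input vector. The presentation does not fix the basis function type, so Gaussian basis functions, a single output, an optional bias term and an identity output non-linearity are all assumptions made here for illustration.

```python
import math

def gaussian_basis(x, centre, radius):
    """Activation of one Gaussian basis function with the given centre and radius."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * radius ** 2))

def rbf_output(x, centres, radii, weights, bias=0.0):
    """y_hat = N(w) for one input x: weighted sum of the basis activations."""
    activations = [gaussian_basis(x, c, r) for c, r in zip(centres, radii)]
    return bias + sum(w * a for w, a in zip(weights, activations))

def output_nonlinearity(y_hat):
    """z_hat = M(y_hat); assumed to be the identity for regression."""
    return y_hat
```

The parameter vector w corresponds here to the centres, radii and weights taken together.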
![Page 14: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/14.jpg)
MML Inference of RBFs (4)
- Likelihood function
  - Learning: minimisation of a scalar function, $L: \mathbb{R}^o \to \mathbb{R}$, $L(M(N(w)))$
  - We define L as the negative log likelihood: $L(\hat{z}) = -\log \Pr(\hat{z})$
  - L implicitly depends on given targets, z, for network outputs
  - Different input-target pairs are considered independent: $D = \{(x_1, z_1), \ldots, (x_N, z_N)\}$
![Page 15: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/15.jpg)
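MML Inference of RBFs (5)
- Likelihood function
  - Regression problems
  - The network error, $z - \hat{z}$, is assumed Gaussian with zero mean and variance $\sigma^2$:

$$\Pr(\hat{z} \mid x, w) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(z - \hat{z})^2\right)$$

$$\Pr(\hat{z}_1, \ldots, \hat{z}_N \mid x, w) = \prod_{(x_i, z_i) \in D} \Pr(\hat{z}_i \mid x_i, w)$$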
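Because the input-target pairs are treated as independent, the negative log-likelihood L under the Gaussian error model is a sum of per-pattern terms. The sketch below assumes a single network output and a shared error standard deviation `sigma`; the function name is hypothetical.

```python
import math

def gaussian_nll(targets, predictions, sigma):
    """Negative log-likelihood (nats) of targets z given network outputs z_hat.

    Each error z - z_hat is assumed Gaussian with zero mean and standard
    deviation sigma, and patterns are assumed independent.
    """
    n_points = len(targets)
    sum_sq_error = sum((z - z_hat) ** 2 for z, z_hat in zip(targets, predictions))
    return (0.5 * n_points * math.log(2.0 * math.pi * sigma ** 2)
            + sum_sq_error / (2.0 * sigma ** 2))
```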
![Page 16: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/16.jpg)
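MML Inference of RBFs (6)
- Fisher information
  - Expected Hessian matrix: $F = \mathrm{E}_{z \mid x}[H]$
  - Jacobian matrix of L (a prime denotes the transpose): $\frac{\partial}{\partial w} L(M(N(w))) = J_{L \circ M \circ N}' = J_N' J_M' J_L'$
  - Hessian matrix of L: $H = J_N' H_{L \circ M} J_N + \sum_i (J_{L \circ M})_i H_{N_i}$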
![Page 17: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/17.jpg)
MML Inference of RBFs (7)
- Fisher information
  - Taking expectations and simplifying we obtain: $F = J_N' J_M J_N$
    - Positive semi-definite
  - Complete Fisher includes a summation over the whole data set D
  - We used an approximation to F
    - Block-diagonal
    - Hidden basis functions assumed to be independent
    - Simplified determinant: product of determinants for each block
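With a block-diagonal approximation, the determinant of F is the product of the block determinants, so its logarithm is a sum over blocks. The sketch below illustrates only that property; it assumes each block (for example, one per hidden basis function) is symmetric positive definite, and the pure-Python Cholesky routine is an illustrative choice rather than the authors' implementation.

```python
import math

def cholesky_logdet(matrix):
    """log det of a symmetric positive-definite matrix via Cholesky factorisation."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    logdet = 0.0
    for i in range(size):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][i] = math.sqrt(s)
                # det = product of squared diagonal entries of the factor
                logdet += 2.0 * math.log(lower[i][i])
            else:
                lower[i][j] = s / lower[j][j]
    return logdet

def block_diagonal_logdet(blocks):
    """log det of a block-diagonal matrix: the sum of the per-block log dets."""
    return sum(cholesky_logdet(block) for block in blocks)
```

Working block by block avoids factorising the full n-by-n matrix, which is the practical benefit of treating the hidden basis functions as independent.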
![Page 18: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/18.jpg)
MML Inference of RBFs (8)
- Priors
  - Must specify a prior density for each parameter
  - Centres: uniform
  - Radii: uniform (log-scale)
  - Weights: Gaussian
    - Zero mean; the standard deviation is usually taken to be large (vague prior)
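The three prior families (uniform centres, radii uniform on a log scale, zero-mean Gaussian weights) contribute $-\log h(w)$ to the message length. A minimal sketch follows; the prior ranges (`centre_width`, `r_min`, `r_max`) and the weight standard deviation are hypothetical hyperparameters, since the slides do not give their values.

```python
import math

def neg_log_uniform(width):
    """-log of a uniform density over an interval of the given width."""
    return math.log(width)

def neg_log_uniform_in_log(radius, r_min, r_max):
    """-log density of a radius that is uniform in log(r) on [r_min, r_max].

    The density of r is 1 / (r * log(r_max / r_min)).
    """
    return math.log(radius) + math.log(math.log(r_max / r_min))

def neg_log_gaussian(weight, sd):
    """-log of a zero-mean Gaussian density with standard deviation sd."""
    return 0.5 * math.log(2.0 * math.pi * sd ** 2) + weight ** 2 / (2.0 * sd ** 2)

def neg_log_prior(centres, radii, weights, centre_width, r_min, r_max, weight_sd):
    """-log h(w): sum over all centres (per coordinate), radii and weights."""
    total = 0.0
    for centre in centres:
        total += len(centre) * neg_log_uniform(centre_width)
    total += sum(neg_log_uniform_in_log(r, r_min, r_max) for r in radii)
    total += sum(neg_log_gaussian(w, weight_sd) for w in weights)
    return total
```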
![Page 19: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/19.jpg)
MML Inference of RBFs (9)
- Message length of an RBF:

$$\mathrm{msgLen} = \log^* H - \log h(w) + \frac{1}{2} \log F(w) + L + C$$

- where:
  - $\log^* H$ denotes the cost of transmitting the number of basis functions
  - $F(w)$ is the determinant of the expected Fisher information: $F(w) = \det(J_N' J_M J_N)$
  - $L$ is the negative log-likelihood: $L = -\log \Pr(\hat{z}_1, \ldots, \hat{z}_N \mid x, w)$
  - $C$ is a dimension constant
    - Independent of w
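The terms of the RBF message length, $\log^* H - \log h(w) + \frac{1}{2}\log F(w) + L + C$, can be combined and compared across candidate architectures as sketched below. The slides do not define $\log^*$; the code assumes it is Rissanen's universal code length for positive integers (the sum of the positive iterated base-2 logarithms plus the logarithm of a normalising constant of about 2.865064), converted to nats. The function names and the `candidates` structure are hypothetical.

```python
import math

RISSANEN_C = 2.865064  # normalising constant of the log* code (assumed definition)

def log_star_bits(k):
    """Assumed log* code length in bits for a positive integer k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    total = math.log2(RISSANEN_C)
    term = math.log2(k)
    while term > 0.0:          # keep only the positive iterated logarithms
        total += term
        term = math.log2(term)
    return total

def rbf_message_length(num_basis, neg_log_prior, log_det_fisher,
                       neg_log_likelihood, dim_constant):
    """msgLen in nats: log* H - log h(w) + 0.5 log F(w) + L + C."""
    return (log_star_bits(num_basis) * math.log(2.0)
            + neg_log_prior
            + 0.5 * log_det_fisher
            + neg_log_likelihood
            + dim_constant)

def select_architecture(candidates):
    """Return the number of basis functions with the shortest message.

    candidates maps num_basis to a tuple
    (neg_log_prior, log_det_fisher, neg_log_likelihood, dim_constant).
    """
    return min(candidates, key=lambda h: rbf_message_length(h, *candidates[h]))
```

Each candidate network would be trained first; only its fitted terms enter the comparison.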
![Page 20: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/20.jpg)
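MML Inference of RBFs (10)
- MML estimators for parameters
  - Standard unbiased estimator for the error s.d.:

$$\hat{\sigma} = \sqrt{\frac{\sum (z - \hat{z})^2}{N - 1}}$$

  - Numerical optimisation using $\frac{\partial\, \mathrm{msgLen}}{\partial w}$
  - Differentiation of the expected Fisher information determinant:

$$\frac{d}{da} \log \det(F) = \mathrm{Trace}\left(F^{-1} \frac{dF}{da}\right)$$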
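The identity $\frac{d}{da}\log\det(F) = \mathrm{Trace}(F^{-1}\,dF/da)$ can be checked numerically on a small example. The 2-by-2 matrix `fisher(a)` below is an arbitrary symmetric positive-definite matrix invented for the check, not a Fisher matrix from the presentation.

```python
import math

def fisher(a):
    """Arbitrary 2x2 symmetric positive-definite matrix depending on a scalar a."""
    return [[2.0 + a, 1.0], [1.0, 3.0 + a * a]]

def fisher_derivative(a):
    """Element-wise derivative dF/da of the matrix above."""
    return [[1.0, 0.0], [0.0, 2.0 * a]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def trace_of_product(p, q):
    """Trace(P Q) for 2x2 matrices."""
    return sum(p[i][k] * q[k][i] for i in range(2) for k in range(2))

def dlogdet_analytic(a):
    """Trace(F^-1 dF/da)."""
    return trace_of_product(inv2(fisher(a)), fisher_derivative(a))

def dlogdet_numeric(a, eps=1e-6):
    """Central finite difference of log det F(a)."""
    return (math.log(det2(fisher(a + eps)))
            - math.log(det2(fisher(a - eps)))) / (2.0 * eps)
```

Agreement between `dlogdet_analytic` and `dlogdet_numeric` is the kind of check that is useful before passing analytic gradients to a numerical optimiser.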
![Page 21: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/21.jpg)
Results (1)
- MML inference criterion is compared to:
  - Conventional MATLAB RBF implementation
  - M. Orr’s regression tree method
- Functions used for criteria evaluation:
  - Correct answer known
  - Correct answer not known
![Page 22: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/22.jpg)
Results (2)
- Correct answer known
  - Generate data from a known RBF (one, three and five basis functions respectively)
  - Inputs uniformly sampled in the range (-8, 8)
    - 1D and 2D inputs were considered
  - Gaussian noise N(0, 0.1) added to the network outputs
  - Training set and test set comprise 100 and 1000 patterns respectively
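The "correct answer known" setup can be sketched for the 1D case as follows. The centres, radii and weights of the generating RBF are hypothetical (the slides do not list them), Gaussian basis functions are assumed, and the 0.1 in N(0, 0.1) is read here as a variance; if it is a standard deviation, `NOISE_SD` should be 0.1 instead.

```python
import math
import random

# Hypothetical generating network with three basis functions.
TRUE_CENTRES = [-4.0, 0.0, 4.0]
TRUE_RADII = [1.0, 1.5, 1.0]
TRUE_WEIGHTS = [1.0, -2.0, 1.5]
NOISE_SD = math.sqrt(0.1)  # assumes N(0, 0.1) specifies the variance

def known_rbf(x):
    """Noise-free output of the generating RBF for a scalar input x."""
    return sum(
        w * math.exp(-((x - c) ** 2) / (2.0 * r ** 2))
        for c, r, w in zip(TRUE_CENTRES, TRUE_RADII, TRUE_WEIGHTS)
    )

def make_dataset(n_patterns, rng, low=-8.0, high=8.0):
    """Sample inputs uniformly in (low, high) and add Gaussian noise to the outputs."""
    data = []
    for _ in range(n_patterns):
        x = rng.uniform(low, high)
        z = known_rbf(x) + rng.gauss(0.0, NOISE_SD)
        data.append((x, z))
    return data

rng = random.Random(0)           # fixed seed so the sketch is repeatable
train_set = make_dataset(100, rng)
test_set = make_dataset(1000, rng)
```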
![Page 23: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/23.jpg)
Results (3)
- MSE: correct answer known (1D input)
![Page 24: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/24.jpg)
Results (4)
- MSE: correct answer known (2D inputs)
![Page 25: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/25.jpg)
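Results (5)
- Correct answer not known
  - The following functions were used:

$$f_1(x) = (1 + x - 2x^2)\, e^{-x^2}, \quad x \in (-4, 4)$$

$$f_2(x) = \sin(2x), \quad x \in (-4, 4)$$

$$f_3(x_1, x_2) = \sin\left(\frac{x_1}{4}\right) \sin\left(\frac{x_2}{2}\right), \quad x_1 \in (0, 10),\ x_2 \in (-5, 5)$$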
![Page 26: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/26.jpg)
Results (6)
- Correct answer not known
  - Gaussian noise N(0, 0.1) added to the network outputs
  - Training set and test set comprise 100 and 1000 patterns respectively
![Page 27: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/27.jpg)
Results (7)
![Page 28: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/28.jpg)
Results (8)
![Page 29: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/29.jpg)
Results (9)
- MSE: correct answer not known
![Page 30: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/30.jpg)
Results (10)
- Sensitivity of criteria to noise
![Page 31: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/31.jpg)
Results (11)
- Sensitivity of criteria to data set size
![Page 32: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/32.jpg)
Conclusion (1)
- Novel approach to architecture selection in RBF networks
  - MML87
  - Block-diagonal Fisher information matrix approximation
- MATLAB code available from: http://www.csse.monash.edu.au/~enesm
![Page 33: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/33.jpg)
Conclusion (2)
- Results
  - Initial testing
  - Good performance when the level of noise and the dataset size are varied
  - No over-fitting
- Future work
  - Further testing
  - Examine if MML parameter estimators improve performance
  - MML and regularization
![Page 34: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/34.jpg)
Conclusion (3)
- Questions?
![Page 35: MML Inference of RBFs](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815e38550346895dcc9d2a/html5/thumbnails/35.jpg)
Conclusion (4)