ls-svmlab & large scale modeling

35
LS-SVMlab & Large scale modeling Kristiaan Pelckmans, ESAT- SCD/SISTA J.A.K. Suykens, B. De Moor

Upload: mikko

Post on 07-Feb-2016

58 views

Category:

Documents


0 download

DESCRIPTION

LS-SVMlab & Large scale modeling. Kristiaan Pelckmans, ESAT- SCD/SISTA J.A.K. Suykens, B. De Moor. I. Overview II. Classification III. Regression IV. Unsupervised Learning V. Time-series VI. Conclusions and Outlooks. Content. People Contributors to LS-SVMlab: Kristiaan Pelckmans - PowerPoint PPT Presentation

TRANSCRIPT

LS-SVMlab & Large scale modeling

Kristiaan Pelckmans, ESAT-SCD/SISTA

J.A.K. Suykens, B. De Moor

Content

• I. Overview
• II. Classification
• III. Regression
• IV. Unsupervised Learning
• V. Time-series
• VI. Conclusions and Outlooks

People

Contributors to LS-SVMlab:

•Kristiaan Pelckmans

•Johan Suykens

•Tony Van Gestel

•Jos De Brabanter

•Lukas Lukas

•Bart Hamers

•Emmanuel Lambert

Supervisors:

•Bart De Moor

•Johan Suykens

•Joos Vandewalle

Acknowledgements

Our research is supported by grants from several funding agencies and sources: Research Council K.U.Leuven: Concerted Research Action GOA-Mefisto 666 (Mathematical Engineering), IDO (IOTA Oncology, Genetic networks), several PhD/postdoc & fellow grants; Flemish Government: Fund for Scientific Research FWO Flanders (several PhD/postdoc grants, projects G.0407.02 (support vector machines), G.0080.01 (collective intelligence), G.0256.97 (subspace), G.0115.01 (bio-i and microarrays), G.0240.99 (multilinear algebra), G.0197.02 (power islands), research communities ICCoS, ANMMM), AWI (Bil. Int. Collaboration South Africa, Hungary and Poland), IWT (Soft4s (softsensors), STWW-Genprom (gene promotor prediction), GBOU McKnow (Knowledge management algorithms), Eureka-Impact (MPC-control), Eureka-FLiTE (flutter modeling), several PhD-grants); Belgian Federal Government: DWTC (IUAP IV-02 (1996-2001) and IUAP V-10-29 (2002-2006): Dynamical Systems and Control: Computation, Identification & Modelling), Program Sustainable Development PODO-II (CP-TR-18: Sustainability effects of Traffic Management Systems); Direct contract research: Verhaert, Electrabel, Elia, Data4s, IPCOS. JS is a professor at K.U.Leuven Belgium and a postdoctoral researcher with FWO Flanders. BDM and JWDW are full professors at K.U.Leuven Belgium.

I. Overview

• Goal of the presentation:
  1. Overview & intuition
  2. Demonstration LS-SVMlab
  3. Pinpoint research challenges
  4. Preparation NIPS 2002

• Research results and challenges
• Towards applications
• Overview LS-SVMlab

I.2 Overview research

“Learning, generalization, extrapolation, identification, smoothing, modeling”

• Prediction (black box modeling)

• Point of view: Statistical Learning, Machine Learning, Neural Networks, Optimization, SVM

I.2 Type, Target, Topic

I.3 Towards applications

• System identification
• Financial engineering
• Biomedical signal processing
• Datamining
• Bio-informatics
• Textmining
• Adaptive signal processing

I.4 LS-SVMlab

I.4 LS-SVMlab (2)

• Starting points:
  – Modularity
  – Object Oriented & Functional Interface
  – Basic bricks for advanced research

• Website and tutorial

• Reproducibility (preprocessing)

II. Classification

“Learn the decision function associated with a set of labeled data points to predict the values of unseen data”

• Least Squares Support Vector Machines

• Bayesian Framework
• Different norms
• Coding schemes

II.1 Least Squares Support Vector Machines (LS-SVM(γ, σ²))

1. Least squares cost function + regularization & equality constraints

2. Non-linearity by Mercer kernels

3. Primal-dual interpretation (Lagrange multipliers)

Primal parametric model:

  y_i = w^T φ(x_i) + b + e_i

Dual non-parametric model:

  y_i = Σ_{j=1}^{n} α_j K(x_i, x_j) + b + e_i

with Mercer kernel K(·,·)
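To make the dual model concrete, here is a minimal numpy sketch (not the LS-SVMlab API; the function names, the RBF kernel choice, and the regression-style system are illustrative assumptions) that trains an LS-SVM by solving the dual linear system and predicts with the kernel expansion above.

```python
import numpy as np

def rbf_kernel(X1, X2, sig2):
    # K(x, z) = exp(-||x - z||^2 / (2 * sig2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sig2))

def lssvm_train(X, y, gam, sig2):
    # Solve the dual linear system
    #   [ 0    1^T        ] [ b     ]   [ 0 ]
    #   [ 1    K + I/gam  ] [ alpha ] = [ y ]
    n = X.shape[0]
    K = rbf_kernel(X, X, sig2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gam
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xtest, X, alpha, b, sig2):
    # y_hat(x) = sum_j alpha_j K(x, x_j) + b
    return rbf_kernel(Xtest, X, sig2) @ alpha + b
```

For classification with ±1 labels, the decision is simply the sign of lssvm_predict.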

II.1 LS-SVM(γ, σ²)

“Learning representations from relations”

The N×N matrix of pairwise relations a_ij between the data points:

  [ a_11  a_12  ...  a_1N ]
  [  ...   ...  ...   ... ]
  [ a_N1  a_N2  ...  a_NN ]

II.2 Bayesian Inference

• Bayes rule (MAP):

  P(θ | X) = P(X | θ) P(θ) / P(X)

• Closed-form formulas
  Approximations:
  – Hessian in the optimum
  – Gaussian distribution

• Three levels of posteriors:

  Level 1: P(w, b | γ, K, X)
  Level 2: P(γ | K, X)
  Level 3: P(K | X)

II.3 SVM formulations & norms

• 1-norm + inequality constraints: SVM
  Extensions to any convex cost function

• 2-norm + equality constraints: LS-SVM
  Weighted versions

II.4 Coding schemes

Multi-class classification task → (multiple) binary classifiers

Labels (e.g. … 1 2 4 6 2 1 3 …) are encoded into several ±1 sequences, one per binary classifier (e.g. … 1 -1 1 1 …, … -1 -1 -1 1 …, … 1 -1 -1 -1 …), and the binary outputs are decoded back into class labels.
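As one concrete instance of such a coding scheme, the sketch below encodes class labels with a one-vs-all code matrix and decodes binary outputs by minimum Hamming distance; the helper names are illustrative, and other code matrices (e.g. error-correcting output codes) fit the same pattern.

```python
import numpy as np

def onevsall_code(n_classes):
    # Row c is the +/-1 codeword of class c (one-vs-all coding).
    return 2 * np.eye(n_classes) - 1

def encode(labels, code):
    # Map class labels 0..C-1 to one +/-1 target column per binary classifier.
    return code[labels]

def decode(bin_outputs, code):
    # Assign each sample to the class whose codeword is closest in Hamming distance.
    dist = (bin_outputs[:, None, :] != code[None, :, :]).sum(-1)
    return dist.argmin(axis=1)

code = onevsall_code(4)
y = np.array([0, 1, 3, 2, 0])
Y_bin = encode(y, code)                  # train one binary classifier per column
y_back = decode(np.sign(Y_bin), code)    # decoding recovers the original labels
```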

III. Regression

"Learn the underlying function from a set of data points and its corresponding noisy targets in order to predict the values of unseen data"

• LS-SVM(γ, σ²)
• Cross-validation (CV)
• Bayesian Inference
• Robustness

III.1 LS-SVM(γ, σ²)

• Least Squares cost-function + Regularization & Equality constraints

• Mercer kernels

• Lagrange multipliers: primal parametric model ↔ dual non-parametric model

III.1 LS-SVM(γ, σ²) (2)

• Regularization parameter:
  – Do not fit noise (overfitting)!
  – Trade-off between noise and information

Illustration: regression on a noisy sinc-type toy function for different values of the regularization parameter.
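To make the trade-off concrete, this sketch fits the LS-SVM from the earlier snippet to a noisy sinc toy function (used here purely as an illustrative stand-in for the slide's toy example) with a small, a moderate, and a very large regularization constant γ.

```python
import numpy as np

# Noisy sinc toy data (illustrative stand-in for the slide's toy function).
rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 100)[:, None]
y = np.sinc(X[:, 0]) + 0.1 * rng.standard_normal(100)

# Reusing lssvm_train / lssvm_predict from the earlier sketch:
for gam in (0.01, 10.0, 1e6):   # tiny gam oversmooths, huge gam fits the noise
    alpha, b = lssvm_train(X, y, gam=gam, sig2=0.5)
    yhat = lssvm_predict(X, X, alpha, b, sig2=0.5)
    print(gam, np.mean((y - yhat) ** 2))   # training error alone is misleading here
```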

III.2 Cross-validation (CV)

“How to estimate generalization power of model?”

• Division training set – test set

• Repeated division: Leave-one-out CV (fast implementation)

• L-fold cross-validation

• Generalized Cross-validation (GCV):

• Complexity criteria: AIC, BIC, …

[ŷ_1, …, ŷ_N]^T = S(X, K) · [y_1, …, y_N]^T     (the "hat"/smoother matrix S underlying GCV)

(Diagram: index sequences 1 … n showing which points around position t are left out in leave-one-out and in block-wise cross-validation.)
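A sketch of how L-fold cross-validation can drive the choice of the two hyper-parameters, reusing the toy data and the lssvm_train / lssvm_predict helpers from the earlier snippets; the grid values are arbitrary.

```python
import numpy as np

def lfold_cv_mse(X, y, gam, sig2, L=10):
    # L-fold cross-validation estimate of the prediction error for one (gam, sig2).
    n = len(y)
    folds = np.array_split(np.random.default_rng(0).permutation(n), L)
    errs = []
    for idx in folds:
        mask = np.ones(n, dtype=bool)
        mask[idx] = False
        alpha, b = lssvm_train(X[mask], y[mask], gam, sig2)
        yhat = lssvm_predict(X[idx], X[mask], alpha, b, sig2)
        errs.append(np.mean((y[idx] - yhat) ** 2))
    return np.mean(errs)

# Plain grid search over (gam, sig2); the best pair minimizes the CV error.
grid = [(g, s) for g in (0.1, 1.0, 10.0, 100.0) for s in (0.1, 0.5, 1.0, 5.0)]
best_gam, best_sig2 = min(grid, key=lambda gs: lfold_cv_mse(X, y, *gs))
```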

III.2 Cross-validation Procedure (CVP)

“How to optimize model for optimal generalization performance”

• Trade-off: fitting vs. model complexity

• Kernel parameters

• Optimization routine?

III.1 LS-SVM(γ, σ²) (3)

• Kernel type and parameter
  "Zoölogy as elephantism and non-elephantism"

• Model Comparison

• By cross-validation or Bayesian Inference

III.3 Applications

"OK, but does it work?"

• Soft4s
  – Together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
  – Soft-sensor

• ELIA
  – Together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
  – Prediction of short- and long-term electricity consumption

III.2 Bayesian Inference

• Bayes rule (MAP):

• Closed form formulas

• Three levels of posteriors:

P(θ | X) = P(X | θ) P(θ) / P(X)

Level 1 (model parameters):  P(w, b | γ, K, X)
Level 2 (regularization):    P(γ | K, X)
Level 3 (model comparison):  P(K | X)

III.4 Robustness

“How to build good models in the case of non-Gaussian noise or outliers”

• Influence function

• Breakdown point

• How:
  – Down-weighting the influence of large residuals
  – Mean → trimmed mean → median

• Robust CV, GCV, AIC,…
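One way to realize this down-weighting is an iteratively reweighted LS-SVM: fit, inspect the residuals, down-weight the large ones, and refit. The sketch below uses a simple Huber-type weighting and the rbf_kernel helper from the first snippet; it is only an illustration, and the toolbox's own weighting functions may differ.

```python
import numpy as np

def weighted_lssvm(X, y, gam, sig2, n_iter=3):
    # Iteratively reweighted LS-SVM: points with large residuals get weight v_i < 1,
    # which replaces I/gam by diag(1/(gam*v)) in the dual system.
    n = len(y)
    v = np.ones(n)
    K = rbf_kernel(X, X, sig2)
    for _ in range(n_iter):
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = K + np.diag(1.0 / (gam * v))
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        b, alpha = sol[0], sol[1:]
        e = alpha / (gam * v)                              # residuals e_i = alpha_i / (gam*v_i)
        s = 1.4826 * np.median(np.abs(e - np.median(e)))   # robust scale estimate (MAD)
        r = np.abs(e) / max(s, 1e-12)
        v = np.where(r <= 2.5, 1.0, 2.5 / r)               # Huber-type down-weighting
    return alpha, b
```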

IV. Unsupervised Learning

“Extract important features from the unlabeled data”

• Kernel PCA and related methods

• Nyström approximation

  – From dual to primal

  – Fixed-size LS-SVM

IV.1 Kernel PCA

Principal Component Analysis vs. kernel-based PCA (illustration)

IV.2 Kernel PCA (2)

• Primal-dual LS-SVM style formulations

• For Kernel PCA, CCA, PLS
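For intuition, a compact sketch of kernel PCA itself: center the kernel matrix and take its leading eigenvectors (this is the plain eigendecomposition view; the primal-dual LS-SVM formulation referred to above is not reproduced here). rbf_kernel is the helper from the first snippet.

```python
import numpy as np

def kernel_pca(X, sig2, n_components=2):
    # Kernel PCA: eigendecomposition of the centered kernel (Gram) matrix.
    n = X.shape[0]
    K = rbf_kernel(X, X, sig2)
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = J @ K @ J
    lam, U = np.linalg.eigh(Kc)               # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]            # largest first
    # Score of training point i on component k is sqrt(lam_k) * U[i, k].
    scores = U[:, :n_components] * np.sqrt(np.maximum(lam[:n_components], 0.0))
    return scores, lam
```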

IV.2 Nyström approximation

• Sampling of integral equation

• Approximating Feature map for Mercer kernel

∫ K(x, y) φ_i(x) p(x) dx = λ_i φ_i(y)

(1/N) Σ_{j=1}^{N} K(x_j, y) φ_i(x_j) ≈ λ_i φ_i(y)

(1/n) Σ_{j=1}^{n} K(x_j, y) φ_i(x_j) ≈ λ_i φ_i(y)     (subsample of size n)

K(x, y) = φ(x)^T φ(y), with feature map φ(·)
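A sketch of the resulting approximate feature map: eigendecompose the kernel matrix of a subsample and map every point through it, so that inner products of the approximate features reproduce K(x, y); the function name is illustrative and rbf_kernel is the helper from the first snippet.

```python
import numpy as np

def nystrom_features(X, Xsub, sig2):
    # Approximate feature map phi_hat with phi_hat(x)^T phi_hat(z) ~= K(x, z),
    # built from a subsample Xsub of size n << N (the sampled integral equation).
    Knn = rbf_kernel(Xsub, Xsub, sig2)
    lam, U = np.linalg.eigh(Knn)
    keep = lam > 1e-10                        # drop numerically zero eigenvalues
    lam, U = lam[keep], U[:, keep]
    KxS = rbf_kernel(X, Xsub, sig2)           # kernel between all points and the subsample
    return KxS @ U / np.sqrt(lam)             # N x m matrix of approximate features
```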

IV.3 Fixed Size LS-SVM

Primal:  y_i = w^T φ(x_i) + b + e_i

Dual:    y_i = Σ_{j=1}^{n} α_j K(x_i, x_j) + b + e_i

?  (from the dual representation back to a sparse primal model)
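One possible reading of the question mark, sketched in code: approximate the feature map with a fixed-size subsample (Nyström) and then estimate w and b directly in the primal as a small regularized least-squares problem, which yields a model that is sparse in terms of the subsample. This reuses nystrom_features from the previous snippet and is an illustrative sketch, not the toolbox routine.

```python
import numpy as np

def fixed_size_lssvm(X, y, Xsub, gam, sig2):
    # Fixed-size LS-SVM sketch: Nystrom features + regularized least squares in the primal.
    Phi = nystrom_features(X, Xsub, sig2)             # N x m, m = size of the subsample
    Phi1 = np.hstack([Phi, np.ones((len(y), 1))])     # append a bias column
    R = np.eye(Phi1.shape[1]) / gam
    R[-1, -1] = 0.0                                   # do not regularize the bias term
    wb = np.linalg.solve(Phi1.T @ Phi1 + R, Phi1.T @ y)
    return wb[:-1], wb[-1]                            # w, b

# Prediction on new data: nystrom_features(Xtest, Xsub, sig2) @ w + b
```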

V. Time-series

"Learn to predict future values given a sequence of past values"

• NARX
• Recurrent vs. feedforward

V.1 NARX

• Reducible to static regression

• CV and complexity criteria
• Predicting in recurrent mode
• Fixed-size LS-SVM (sparse representation)

ŷ_t = f(y_{t-1}, y_{t-2}, …, y_{t-l})

applied along the sequence …, y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}, y_t, …
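Reducing the NARX model to static regression amounts to stacking lagged values into a regressor matrix; a minimal sketch (the helper name is illustrative):

```python
import numpy as np

def narx_dataset(y, lag):
    # Turn a series y_1..y_n into the static regression problem
    #   target y_t, regressor [y_{t-1}, ..., y_{t-lag}].
    X = np.column_stack([y[lag - k - 1: len(y) - k - 1] for k in range(lag)])
    t = y[lag:]
    return X, t

# e.g. X, t = narx_dataset(series, lag=5); then train any static regressor on (X, t),
# such as the LS-SVM sketched earlier.
```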

V.1 NARX (2)

Santa Fe Time-series competition

V.2 Recurrent models?

“How to learn recurrent dynamical models?”

• Training cost = Prediction cost?

• Non-parametric model class?

• Convex or non-convex?

• Hyper-parameters?

ŷ_t = f(ŷ_{t-1}, ŷ_{t-2}, …, ŷ_{t-l})
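In recurrent (simulation) mode the model's own predictions are fed back as inputs, which is exactly why the training cost and the prediction cost need not coincide; a sketch, where model_predict stands for any trained one-step predictor (e.g. the LS-SVM above wrapped in a lambda):

```python
import numpy as np

def iterate_prediction(model_predict, y_init, n_steps):
    # Recurrent (simulation) mode: y_hat_t = f(y_hat_{t-1}, ..., y_hat_{t-lag}),
    # so prediction errors can accumulate over the horizon.
    window = list(y_init)                      # last `lag` known values, oldest first
    preds = []
    for _ in range(n_steps):
        x = np.array(window[::-1])[None, :]    # regressor [y_{t-1}, ..., y_{t-lag}]
        yhat = float(model_predict(x))
        preds.append(yhat)
        window = window[1:] + [yhat]           # slide the window, feeding back y_hat
    return np.array(preds)

# e.g. iterate_prediction(lambda x: lssvm_predict(x, X, alpha, b, sig2), series[-5:], 100)
```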

VI.0 References

• J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor & J. Vandewalle (2002), Least Squares Support Vector Machines, World Scientific.

• V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer-Verlag.

• B. Schölkopf & A. Smola (2002), Learning with Kernels, MIT Press.

• T. Poggio & F. Girosi (1990), "Networks for approximation and learning", Proceedings of the IEEE, 78, 1481-1497.

• N. Cristianini & J. Shawe-Taylor (2000), An Introduction to Support Vector Machines, Cambridge University Press.

VI. Conclusions

"Non-linear non-parametric learning as a generalized methodology"

• Non-parametric learning
• Intuition & formulations
• Hyper-parameters
• LS-SVMlab

Questions?