introduction to gaussian processesdanielpr/files/gptut-penn_0.pdfintroduction to gaussian processes...

Post on 10-Jun-2020

16 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Introduction to Gaussian Processes

Daniel Preotiuc-Pietro

Positive Psychology CenterUniversity of Pennsylvania

13 October 2014

with slides from Trevor Cohn, Neil Lawrence, Richard Turner

Gaussian Processes

Brings together several key ideas in one framework

� Bayesian� kernelised� non-parametric� non-linear� modelling uncertainty

Elegant and powerful framework, with growing popularity inmachine learning and application domains.

Gaussian Processes

State of the art for regression

� exact posterior inference� supports very complex non-linear functions� elegant model selection

Gaussian Processes

Now mature enough for use in NLP

� support for classification, ranking, etc� fancy kernels, e.g., text� sparse approximations for large scale inference� probabilistic formulation allows incorporation into larger

graphical models� models the prediction uncertainty so that it’s propagated

through pipelines of probabilistic components

Gaussian Processes

Regression and classification are fundamental techniques inNLP

Linear models:

� simple and fast to run� provide straightforward interpretability� but very limited in the type of relations they can model

Non-linear models (e.g. SVM):

� better results because they allow to model otherrelationships

� but high cost of hyperparameter optimisation� lack of interoperability with downstream processing

Gaussian Processes

The Gaussian distribution:

� is a distribution over vectors

The Gaussian process:

� is a distribution over functions

Gaussian Processes, prior

� the prior represents our prior beliefs over the functions weexpect to learn

� here, standard choices: 0 mean, 1 variance, ‘smooth’functions

Gaussian Process, posterior

� after observing data points, adjust the posteriordistribution over functions

� the functions must pass ’close’ to the observations,compute posterior mean and variance

Gaussian Processes

GP demo: http://www.tmpl.fi/gp/

Relationship to Linear regression

Linear regression:y = wx + ε (1)

where w are parameters we learn.

Gaussian process regression:

y = f (x) + ε (2)

where f is drawn from a Gaussian process prior.

� the w parameters are gone, hence the term‘non-parametric’ for GP’s

� f is not limited to a parametric form

Graphical model view

f ∼ GP(m, k)

y ∼ N( f (x), σ2)

� f : RD− > R is a latentfunction

� y is a noisy realisationof f (x)

� k is the covariancefunction or kernel

� m and σ2 is a learnedparameter

Formal definition

Definition

A Gaussian process is a collection of random variables, anyfinite number of which have a joint Gaussian distribution.

A GP is fully specified by mean m(x) and covariance k(x, x′)functions:

� m(x) = E[ f (x)]

� k(x, x′) = E[( f (x) −m(x))(( f (x′) −m(x′))]

Without loss of generality we can assume m(x) = 0.

Compared to a Gaussian distribution

Gaussian distribution:

� specified by a mean and covariance matrix: x ∼ N(μ,Σ)� the vector index is the position of the random variables x

Gaussian process:

� specified by a mean and covariance function: f ∼ GP(m, k)� the index is the argument x of f

The Gaussian Process is an infinite extension of multivariateGaussian distributions

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .7

.7 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .7

.7 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .6

.6 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .4

.4 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .1

.1 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .0

.0 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .7

.7 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

Σ =

⎡⎣ 1 .7

.7 1

⎤⎦

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

2

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

2

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

2

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

2

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

2

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

2

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

Gaussian distribution

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

2

y1

y 2

−3 −2 −1 0 1 2 3−3

−2

−1

0

1

2

3

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 2

1 2−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎣ 1 .9

.9 1

⎤⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 5

1 2 3 4 5−3

−2

−1

0

1

2

3

variable index

y

Σ =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

New visualisation

−2 0 2−3

−2

−1

0

1

2

3

y1

y 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

Regression using Gaussians

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

Regression using Gaussians

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

Regression using Gaussians

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

1 10 20

1

10

20

Regression using Gaussians

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

−2

−1

0

1

2

3

variable index

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)

1 10 20

1

10

20

Regression: probabilistic inference in function space

−3

−2

−1

0

1

2

3

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)

1 10 20

1

10

20

Regression: probabilistic inference in function space

−3

−2

−1

0

1

2

3

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)

1 10 20

1

10

20

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)

Regression: probabilistic inference in function space

−3

−2

−1

0

1

2

3

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

Non-parametric (∞-parametric)

p(y|θ) = N (0,Σ)

function estimatewith uncertaintyK(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

1 10 20

1

10

20

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)

Regression: probabilistic inference in function space

−3

−2

−1

0

1

2

3

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

noise

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)

1 10 20

1

10

20

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)

Regression: probabilistic inference in function space

−3

−2

−1

0

1

2

3

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

horizontal-scale

noise

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)

1 10 20

1

10

20

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)

Regression: probabilistic inference in function space

−3

−2

−1

0

1

2

3

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

noise

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)

1 10 20

1

10

20

horizontal-scalevertical-scale

Squared Exponential (SE) kernel

Usual choice for covariance is the Squared Exponential (SE)kernel (a.k.a. RBF, exponentiated quadratic, Gaussian):

k(x, x′) = σ2 exp(− (x − x′)2

2l2

)

The kernel’s parameters {l, σ} are called the GP’shyperparameters.

When l is very high, neighbouring points are very correlatedi.e. that feature is less relevant.

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

short horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

short horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

short horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

short horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

short horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

short horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

short horizontal length-scale

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

short horizontal length-scale

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

short horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

short horizontal length-scale

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

short horizontal length-scale

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

short horizontal length-scale

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

short horizontal length-scale

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

short horizontal length-scale

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

long horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

long horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

long horizontal length-scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

What effect do the hyper-parameters have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

)

Hyperparameters

Training a GP means:

� finding the right kernel� finding values of hyperparameters

The covariance function controls properties of the GP:

� smoothness� stationarity� periodicity� etc.

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

Σ =

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

Σ =

Periodic

sinusoid × squared exponential

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

What effect does the form of the covariance function have?

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

−3

−2

−1

0

1

2

3

x

y

Σ =

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

)

Periodic

sinusoid × squared exponential

Inference

GP is an infinite dimensional object but we only need to workwith finite dimensional objects.

For prediction (and the Bayesian update) we are interested inthe conditional distribution of ‘new’ points y∗ given observedpoints y:

p(y∗|y) =p(y, y∗)

p(y)

Using the GP definition, we know that:

p(y) = N(m,Σ), p(y, y∗) = N([

mm∗

],

[Σ ΣT∗Σ∗ Σ∗∗

])

where we ignored the white noise for simplification and Σ arecovariance matrices.

Inference

Using the multi-variate Gaussian distribution properties:

p(y∗|y) = N(m∗ + ΣT∗ Σ−1(y −m),Σ∗∗ − ΣT∗ Σ−1Σ∗)

Predictive mean is linear in the data:

μy∗|y = ΣT∗ Σ−1y

Predictive covariance is the prior covariance minus theinformation the observations give about the function:

Σy∗|y = Σ∗∗ − ΣT∗ Σ−1Σ∗

Relationship to Dirichlet Processes

A Gaussian process defines a distribution over functions:

f ∼ GP(m, k)

GPs can be viewed as infinite-dimensional Gaussiandistributions.

A Dirichlet process (DP) defines a distribution overdistributions:

G ∼ DP(G0, α0)

DPs can be viewed as infinite-dimensional Dirichletdistributions.

Both f and G are infinite-dimensional objects and both makeuse of conjugacy (Gaussian-Gaussian andMultinomial-Dirichlet).

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1co

effic

ient

s

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.5

1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−1

0

1

x

f(x)

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.2

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

coef

ficie

nts

−10 −8 −6 −4 −2 0 2 4 6 8 100

0.05

0.1

basi

s fu

nctio

ns

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 〈f(x)〉

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 〈∑k γkgk(x)〉

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) =∑

k〈γk〉gk(x)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) = 〈f(x)f(x′)〉

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) = 〈∑k γkgk(x)∑

k′ γk′gk′(x′)〉

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k,k′〈γkγk′〉gk(x)gk′(x′)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k,k′ δk,k′gk(x)gk′(x′)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

K(x, x′) −→ ∫du e− 1

l2(x−u)2− 1

l2(x′−u)2

K → ∞

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

K(x, x′) −→ ∫du e− 1

l2(x−u)2− 1

l2(x′−u)2

K → ∞

K(x, x′) ∝ e− 1

2l2(x−x′)2

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

0

0.5

x

f(x)

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

K(x, x′) −→ ∫du e− 1

l2(x−u)2− 1

l2(x′−u)2

K → ∞

K(x, x′) ∝ e− 1

2l2(x−x′)2

Gaussian processes ≡ models with ∞ parameters

Basis function view of Gaussian Processes

�����������

� �����

���������

Basis function view of Gaussian Processes

�����������

�������

��������

�������

���������

Basis function view of Gaussian Processes

�����������

�������

��������

�������

���������

��������������

Basis function view of Gaussian Processes

��� ��������

��������

�������

����

���� �����

��������������

���������

����� ��������

����� �������

Basis function view of Gaussian Processes

�������������

��������

�������

����

���� �����

��������������

���������

����� ��������

��������� � � ������������������� ��� �� ��

��������������� ����� ���

����� ����� �

Basis function view of Gaussian Processes

�������������

���

��������

�������

����

�����������

���� ���������

����������

���������� � �

�������

����� ���

������������������������������������� ���� �

������� ���������������

�����������!

Beyond GP regression

� Classification: non-Gaussian likelihoods, use EP or Laplace� Ordinal regression� Count data� Model selection: kernel and hyperparameter optimisation� Kernel design� Extrapolation� Multi-output GPs: for multi-task learning� Sparse GPs: scaling to large number of features and inputs� Latent variable models: GP-LVM as non-linear

probabilistic PCA� And many others ...

Resources

Free book:http://www.gaussianprocess.org/gpml/chapters/

Tutorials

� GPs for Natural Language Processing tutorial (ACL 2014)http://goo.gl/18heUk

� GP Schools in Sheffield and roadshows in Kampala,Pereira, and (future) Nyeri, Melbournehttp://ml.dcs.shef.ac.uk/gpss/

� GP Regression demohttp://www.tmpl.fi/gp/

� Annotated bibliography and other materialshttp://www.gaussianprocess.org

Toolkits

� GPML (Matlab)http://www.gaussianprocess.org/gpml/code

� GPy (Python)https://github.com/SheffieldML/GPy

� GPstuff (R, Matlab, Octave)http://becs.aalto.fi/en/research/bayes/gpstuff/

top related