Introduction to Gaussian Processes
Daniel Preotiuc-Pietro
Positive Psychology Center, University of Pennsylvania
13 October 2014
with slides from Trevor Cohn, Neil Lawrence, Richard Turner
Gaussian Processes
Brings together several key ideas in one framework:
• Bayesian
• kernelised
• non-parametric
• non-linear
• modelling uncertainty
Elegant and powerful framework, with growing popularity in machine learning and application domains.
Gaussian Processes
State of the art for regression
• exact posterior inference
• supports very complex non-linear functions
• elegant model selection
Gaussian Processes
Now mature enough for use in NLP
• support for classification, ranking, etc.
• fancy kernels, e.g. for text
• sparse approximations for large-scale inference
• probabilistic formulation allows incorporation into larger graphical models
• models the prediction uncertainty so that it is propagated through pipelines of probabilistic components
Gaussian Processes
Regression and classification are fundamental techniques in NLP
Linear models:
• simple and fast to run
• provide straightforward interpretability
• but very limited in the type of relations they can model
Non-linear models (e.g. SVM):
• better results because they can model other, non-linear relationships
• but high cost of hyperparameter optimisation
• lack of interoperability with downstream processing
Gaussian Processes
The Gaussian distribution:
• is a distribution over vectors
The Gaussian process:
• is a distribution over functions
Gaussian Processes, prior
• the prior represents our prior beliefs over the functions we expect to learn
• here, standard choices: zero mean, unit variance, ‘smooth’ functions
Gaussian Process, posterior
• after observing data points, adjust the posterior distribution over functions
• the functions must pass ‘close’ to the observations; compute the posterior mean and variance
Gaussian Processes
GP demo: http://www.tmpl.fi/gp/
Relationship to Linear regression
Linear regression:
$y = wx + \varepsilon$ (1)
where w are parameters we learn.
Gaussian process regression:
$y = f(x) + \varepsilon$ (2)
where f is drawn from a Gaussian process prior.
• the w parameters are gone, hence the term ‘non-parametric’ for GPs
• f is not limited to a parametric form
Graphical model view
$f \sim \mathcal{GP}(m, k)$
$y \sim \mathcal{N}(f(x), \sigma^2)$
• $f : \mathbb{R}^D \to \mathbb{R}$ is a latent function
• y is a noisy realisation of f(x)
• k is the covariance function or kernel
• m and $\sigma^2$ are learned parameters
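The generative story in this graphical model can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the kernel (a squared exponential), the input grid, the noise variance and the name `se_kernel` are all choices made here, not taken from the slides.

```python
import numpy as np

def se_kernel(xa, xb, variance=1.0, lengthscale=1.0):
    """Squared exponential covariance between two 1-D arrays (assumed kernel)."""
    diff = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * diff**2 / lengthscale**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)   # input locations (assumed)
m = np.zeros_like(x)            # mean function m(x) = 0
K = se_kernel(x, x)             # covariance function k evaluated on the grid
noise_var = 0.1                 # sigma^2 (assumed value)

# f ~ GP(m, k): on a finite grid this is one multivariate Gaussian draw.
# A small jitter on the diagonal keeps the Cholesky factorisation stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
f = m + L @ rng.standard_normal(len(x))

# y ~ N(f(x), sigma^2): add independent Gaussian noise to the latent values.
y = f + np.sqrt(noise_var) * rng.standard_normal(len(x))
```

Here `f` plays the role of the latent function values and `y` the noisy observations; in practice only `y` would be observed.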
Formal definition
Definition
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
A GP is fully specified by mean m(x) and covariance k(x, x′) functions:
• $m(x) = \mathbb{E}[f(x)]$
• $k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))]$
Without loss of generality we can assume m(x) = 0.
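Spelled out for a finite set of inputs, the definition says the following. The inputs $x_1, \dots, x_n$ are arbitrary points introduced here for illustration.

```latex
\begin{bmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix}
\sim \mathcal{N}\left(
\begin{bmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{bmatrix},
\begin{bmatrix}
k(x_1, x_1) & \cdots & k(x_1, x_n) \\
\vdots & \ddots & \vdots \\
k(x_n, x_1) & \cdots & k(x_n, x_n)
\end{bmatrix}
\right)
```

With m(x) = 0 the mean vector vanishes and the covariance matrix alone describes the function values at those inputs.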
Compared to a Gaussian distribution
Gaussian distribution:
• specified by a mean and covariance matrix: $\mathbf{x} \sim \mathcal{N}(\mu, \Sigma)$
• the index is the position of a random variable in the vector x
Gaussian process:
• specified by a mean and covariance function: $f \sim \mathcal{GP}(m, k)$
• the index is the argument x of f
A Gaussian process is the infinite-dimensional extension of the multivariate Gaussian distribution
Gaussian distribution
$p(\mathbf{y} \mid \Sigma) \propto \exp\left(-\tfrac{1}{2}\,\mathbf{y}^\top \Sigma^{-1} \mathbf{y}\right)$
$\Sigma = \begin{bmatrix} 1 & .7 \\ .7 & 1 \end{bmatrix}$
[Figure: plot over (y1, y2), both axes from −3 to 3. Successive slides vary the off-diagonal entry of Σ: .7, .6, .4, .1, 0, then back to .7.]
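To see what the off-diagonal entry of Σ does, one can draw samples for each value used on these slides and compare the empirical correlation between y1 and y2. This is a small sketch; the sample size and seed are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
empirical = {}
for rho in [0.7, 0.6, 0.4, 0.1, 0.0]:
    # 2 x 2 covariance with unit variances and off-diagonal entry rho
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    samples = rng.multivariate_normal(np.zeros(2), Sigma, size=20000)
    # empirical correlation between the two coordinates
    empirical[rho] = np.corrcoef(samples.T)[0, 1]
    print(f"off-diagonal = {rho:.1f}  empirical correlation = {empirical[rho]:.2f}")
```

Because the variances are 1, the off-diagonal entry is exactly the correlation, so the printed values should track it closely.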
Gaussian distribution
$p(y_2 \mid y_1, \Sigma) \propto \exp\left(-\tfrac{1}{2}\,(y_2 - \mu_*)\,\Sigma_*^{-1}\,(y_2 - \mu_*)\right)$
[Figure: plot over (y1, y2), both axes from −3 to 3, repeated across seven slides.]
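The slides do not spell out $\mu_*$ and $\Sigma_*$. For a zero-mean bivariate Gaussian the standard conditioning identities give $\mu_* = \Sigma_{21}\Sigma_{11}^{-1} y_1$ and $\Sigma_* = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. A minimal sketch follows, assuming the Σ with off-diagonal .7 from the earlier slides and an observed value y1 = 1 chosen here for illustration; `condition_bivariate` is a hypothetical helper name.

```python
import numpy as np

def condition_bivariate(Sigma, y1):
    """Mean and variance of y2 given y1 for a zero-mean 2-D Gaussian."""
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    mu_star = s12 / s11 * y1          # Sigma_21 Sigma_11^{-1} y1
    var_star = s22 - s12**2 / s11     # Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12
    return mu_star, var_star

Sigma = np.array([[1.0, 0.7], [0.7, 1.0]])   # assumed, as on the earlier slides
mu_star, var_star = condition_bivariate(Sigma, y1=1.0)
print(mu_star, var_star)   # mean 0.7, variance 1 - 0.49 = 0.51 (up to rounding)
```

Observing y1 pulls the mean of y2 towards it and shrinks its variance below the prior value of 1.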
New visualisation
$\Sigma = \begin{bmatrix} 1 & .9 \\ .9 & 1 \end{bmatrix}$
[Figure: left panel, plot over (y1, y2); right panel, y plotted against variable index (1, 2). All y axes run from −3 to 3. The slide is repeated as an animation.]
New visualisation
$\Sigma = \begin{bmatrix} 1 & .9 & .8 & .6 & .4 \\ .9 & 1 & .9 & .8 & .6 \\ .8 & .9 & 1 & .9 & .8 \\ .6 & .8 & .9 & 1 & .9 \\ .4 & .6 & .8 & .9 & 1 \end{bmatrix}$
[Figure: left panel, plot over (y1, y5); right panel, y plotted against variable index (1 to 5). All y axes run from −3 to 3. The slide is repeated as an animation.]
New visualisation
[Figure: left panel, plot over (y1, y20); right panel, y plotted against variable index (1 to 20). All y axes run from −3 to 3. Σ is shown as a 20 × 20 image with axis ticks at 1, 10 and 20. The slide is repeated as an animation.]
Regression using Gaussians
[Figure: y plotted against variable index (1 to 20), y axis from −3 to 3, with Σ shown as a 20 × 20 image (axis ticks at 1, 10 and 20); repeated across four slides.]
$\Sigma(x_1, x_2) = K(x_1, x_2) + I\sigma_y^2$
$K(x_1, x_2) = \sigma^2 \exp\left(-\frac{1}{2l^2}(x_1 - x_2)^2\right)$
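The 20-dimensional covariance can be built directly from the two formulas above by treating the variable index as the input x. The sketch below does that and draws one sample; the hyperparameter values and the name `se_cov` are assumptions made for illustration.

```python
import numpy as np

def se_cov(x, sigma=1.0, l=3.0, sigma_y=0.1):
    """Sigma = K + I * sigma_y^2, with K the squared exponential kernel."""
    diff = x[:, None] - x[None, :]
    K = sigma**2 * np.exp(-0.5 * diff**2 / l**2)
    return K + sigma_y**2 * np.eye(len(x))

x = np.arange(1, 21, dtype=float)   # variable index 1..20 used as input
Sigma = se_cov(x)

rng = np.random.default_rng(1)
y = rng.multivariate_normal(np.zeros(20), Sigma)   # one 20-dimensional draw

# Covariance decays with distance between indices.
print(Sigma[0, 1], Sigma[0, 5], Sigma[0, 19])
```

Plotting `y` against `x` gives the right-hand view of the earlier slides: nearby indices take similar values because their covariance is high.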
Regression: probabilistic inference in function space
[Figure: function estimate with uncertainty, y axis from −3 to 3, with Σ shown as a 20 × 20 image (axis ticks at 1, 10 and 20); annotations are added over successive slides.]
Non-parametric (∞-parametric)
$p(\mathbf{y} \mid \theta) = \mathcal{N}(\mathbf{0}, \Sigma)$
$\Sigma(x_1, x_2) = K(x_1, x_2) + I\sigma_y^2$
$K(x_1, x_2) = \sigma^2 \exp\left(-\frac{1}{2l^2}(x_1 - x_2)^2\right)$
• $\sigma_y^2$: noise
• $l$: horizontal-scale
• $\sigma^2$: vertical-scale
Parametric model
$y(x) = f(x; \theta) + \sigma_y \varepsilon$
$\varepsilon \sim \mathcal{N}(0, 1)$
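The "function estimate with uncertainty" can be computed in closed form. The sketch below uses the standard GP regression predictive equations, which are not written out on these slides: mean $K_*^\top (K + \sigma_y^2 I)^{-1} \mathbf{y}$ and covariance $K_{**} - K_*^\top (K + \sigma_y^2 I)^{-1} K_*$. The data, hyperparameter values and function names are assumptions made for illustration.

```python
import numpy as np

def se_kernel(xa, xb, sigma=1.0, l=1.0):
    """Squared exponential kernel between two 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * diff**2 / l**2)

def gp_posterior(x_train, y_train, x_test, sigma=1.0, l=1.0, sigma_y=0.1):
    """Posterior mean and pointwise variance of f at x_test (zero prior mean)."""
    K = se_kernel(x_train, x_train, sigma, l) + sigma_y**2 * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test, sigma, l)
    K_ss = se_kernel(x_test, x_test, sigma, l)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                           # K_*^T K^{-1} y
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                           # K_** - K_*^T K^{-1} K_*
    return mean, np.diag(cov)

x_train = np.array([-2.0, -1.0, 0.5, 2.0])   # toy inputs (assumed)
y_train = np.sin(x_train)                    # toy targets (assumed)
x_test = np.array([-1.0, 0.0, 5.0])
mean, var = gp_posterior(x_train, y_train, x_test)
print(mean, var)
```

The variance is small at the test point that coincides with a training input and returns to roughly the prior variance $\sigma^2$ at x = 5, far from the data.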
Squared Exponential (SE) kernel
The usual choice of covariance function is the Squared Exponential (SE) kernel (a.k.a. RBF, exponentiated quadratic, Gaussian):
$k(x, x') = \sigma^2 \exp\left(-\frac{(x - x')^2}{2l^2}\right)$
The kernel’s parameters {l, σ} are called the GP’s hyperparameters.
When l is very large, even distant points are highly correlated, so the function barely varies along that input, i.e. that feature is less relevant.
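A quick numerical check of the last point: evaluate the SE kernel for two inputs one unit apart under different length-scales. The function name and the chosen values of l are assumptions for illustration.

```python
import math

def se_kernel(x, x_prime, sigma=1.0, l=1.0):
    """k(x, x') = sigma^2 * exp(-(x - x')^2 / (2 l^2))"""
    return sigma**2 * math.exp(-((x - x_prime) ** 2) / (2.0 * l**2))

values = {l: se_kernel(0.0, 1.0, l=l) for l in [0.1, 1.0, 10.0]}
for l, k in values.items():
    print(f"l = {l:>4}: k(0, 1) = {k:.3g}")
```

With σ = 1 the kernel value is the correlation between f(0) and f(1): it is essentially zero for l = 0.1, about 0.61 for l = 1, and about 0.995 for l = 10.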
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
−3
−2
−1
0
1
2
3
x
y
Σ =
Σ(x1, x2) = K(x1, x2) + Iσ2y
p(y|θ) = N (0,Σ)
Non-parametric (∞-parametric) Parametric model
y(x) = f(x; θ) + σyε
ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1
2l2(x1 − x2)2
)K(x1, x2) = σ2 exp
(− 1
2l2(x1 − x2)2
)
What effect do the hyper-parameters have?
[Figure: y against x, axis ticks from −3 to 3; panel label: short horizontal length-scale]
$K(x_1, x_2) = \sigma^2 \exp\left(-\frac{1}{2 l^2}(x_1 - x_2)^2\right)$
What effect do the hyper-parameters have?
[Figure: y against x, axis ticks from −3 to 3; panel label: long horizontal length-scale]
$K(x_1, x_2) = \sigma^2 \exp\left(-\frac{1}{2 l^2}(x_1 - x_2)^2\right)$
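The short and long length-scale panels can be read directly off the kernel: K(x1, x2)/σ² is the prior correlation between the function values at x1 and x2, and l sets how quickly it decays with distance. A minimal sketch follows, in which the two length-scales (0.3 and 3.0) and the distances are assumed values chosen for illustration, not the settings used for the plots.

```python
import math

def se_correlation(distance, length):
    """Prior correlation K / sigma^2 = exp(-distance^2 / (2 * length^2))."""
    return math.exp(-distance ** 2 / (2.0 * length ** 2))

# Assumed illustrative values: one short and one long length-scale.
for length in (0.3, 3.0):
    row = [round(se_correlation(d, length), 3) for d in (0.25, 1.0, 3.0)]
    print("l =", length, "correlation at distances 0.25, 1, 3:", row)
```

With the short length-scale, points one unit apart are almost uncorrelated, so the function can change quickly; with the long one they remain strongly correlated, so the function varies slowly across the whole plotted range.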
What effect do the hyper-parameters have?
[Figure: y against x, axis ticks from −3 to 3; panel label: small vertical scale]
$K(x_1, x_2) = \sigma^2 \exp\left(-\frac{1}{2 l^2}(x_1 - x_2)^2\right)$
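The vertical scale is controlled by σ. A short worked step makes this explicit; here f denotes the noise-free function with covariance K, a notation assumed for this note rather than taken from the slide.

```latex
\operatorname{Var}[f(x)] = K(x, x) = \sigma^2 \exp(0) = \sigma^2,
\qquad
\operatorname{Var}[y(x)] = \Sigma(x, x) = \sigma^2 + \sigma_y^2 .
```

So a small σ keeps the prior functions within a narrow band around zero (about ±2σ for most of the prior mass), independently of the length-scale l.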
Hyperparameters
Training a GP means:
• finding the right kernel
• finding values of hyperparameters
The covariance function controls properties of the GP:
• smoothness
• stationarity
• periodicity
• etc.
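The slide does not say how hyperparameter values are found. One common approach, assumed here, is to maximise the log marginal likelihood log p(y|θ) = log N(y | 0, Σ). The sketch below scores a few candidate length-scales on made-up toy data with a squared-exponential kernel; the data, the candidate grid, and the fixed σ and σ_y are all illustrative assumptions, and a real implementation would typically use a gradient-based optimiser.

```python
import math

def se_cov(xs, sigma, length, sigma_y):
    """Sigma = K + sigma_y^2 * I for the squared-exponential kernel."""
    n = len(xs)
    return [[sigma ** 2 * math.exp(-((xs[i] - xs[j]) ** 2) / (2.0 * length ** 2))
             + (sigma_y ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular factor low with low * low^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def log_marginal_likelihood(xs, ys, sigma, length, sigma_y):
    """log N(ys | 0, Sigma), computed through the Cholesky factor of Sigma."""
    n = len(xs)
    low = cholesky(se_cov(xs, sigma, length, sigma_y))
    # Forward substitution: solve low * v = ys, so ys^T Sigma^{-1} ys = v^T v.
    v = []
    for i in range(n):
        v.append((ys[i] - sum(low[i][k] * v[k] for k in range(i))) / low[i][i])
    quad = sum(t * t for t in v)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

# Toy data, made up for illustration: a smooth curve observed at 13 inputs.
xs = [-3.0 + 0.5 * i for i in range(13)]
ys = [math.sin(x) for x in xs]

# Score a small grid of candidate length-scales (sigma and sigma_y held fixed).
candidates = [0.1, 0.3, 1.0, 3.0, 10.0]
scores = {l: log_marginal_likelihood(xs, ys, 1.0, l, 0.1) for l in candidates}
best_length = max(scores, key=scores.get)
print("log marginal likelihood per length-scale:", scores)
print("selected length-scale:", best_length)
```

The same score can be used to compare different kernels, which covers the first bullet (finding the right kernel) under the same assumption.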
What effect does the form of the covariance function have?
[Figure: y against x, axis ticks from −3 to 3, alongside a panel labelled Σ]
Laplacian covariance function (Brownian motion, Ornstein-Uhlenbeck)
$K(x_1, x_2) = \sigma^2 \exp\left(-\frac{1}{2 l^2}|x_1 - x_2|\right)$
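One way to see why this kernel is associated with Brownian motion is to look at the variance of a small increment, Var[f(x + d) − f(x)] = 2(K(0) − K(d)) for a stationary kernel. The sketch below compares the Laplacian kernel as written on this slide with the squared-exponential kernel from the earlier slides; the helper names and the values σ = 1, l = 1 are assumptions for illustration.

```python
import math

def exp_kernel(d, sigma=1.0, length=1.0):
    """Laplacian kernel as written on the slide: sigma^2 * exp(-|d| / (2 * length^2))."""
    return sigma ** 2 * math.exp(-abs(d) / (2.0 * length ** 2))

def se_kernel(d, sigma=1.0, length=1.0):
    """Squared-exponential kernel: sigma^2 * exp(-d^2 / (2 * length^2))."""
    return sigma ** 2 * math.exp(-d ** 2 / (2.0 * length ** 2))

def increment_variance(kernel, d):
    """Var[f(x + d) - f(x)] = 2 * (k(0) - k(d)) for a stationary kernel k."""
    return 2.0 * (kernel(0.0) - kernel(d))

for d in (0.1, 0.01, 0.001):
    laplacian_ratio = increment_variance(exp_kernel, d) / d       # linear in d
    se_ratio = increment_variance(se_kernel, d) / d ** 2          # quadratic in d
    print(d, laplacian_ratio, se_ratio)
```

Both ratios settle to a constant as d shrinks. For the Laplacian kernel the increment variance is therefore proportional to d, as for Brownian motion, which gives continuous but rough sample paths; for the squared-exponential kernel it is proportional to d², which gives smooth ones.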
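What effect does the form of the covariance function have?
[Figure: sample functions, y against x for x from −3 to 3, shown with the covariance matrix Σ]
Rational Quadratic
$K(x_1, x_2) = \sigma^2 \left( 1 + \frac{1}{2\alpha l^2} |x_1 - x_2| \right)^{-\alpha}$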
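A small sketch of the rational quadratic covariance, following the form printed on the slide (with |x1 − x2|; many references instead write this kernel with the squared distance). The function name and parameter defaults are assumptions for illustration. The example also checks numerically that, for large α, this form approaches the exponential of the same argument, since (1 + z/α)^(−α) tends to exp(−z).

```python
import numpy as np

def rational_quadratic(x1, x2, sigma=1.0, length=1.0, alpha=1.0):
    """K = sigma^2 * (1 + |x1 - x2| / (2 * alpha * length^2))^(-alpha), as on the slide."""
    dist = np.abs(np.asarray(x1)[:, None] - np.asarray(x2)[None, :])
    return sigma**2 * (1.0 + dist / (2.0 * alpha * length**2)) ** (-alpha)

x = np.linspace(-3.0, 3.0, 50)
K_small_alpha = rational_quadratic(x, x, alpha=0.5)   # heavy-tailed correlations
K_large_alpha = rational_quadratic(x, x, alpha=1e6)   # close to the limiting kernel

# Limiting kernel for sigma = length = 1: exp(-|x1 - x2| / 2)
K_exponential = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)
max_gap = np.max(np.abs(K_large_alpha - K_exponential))
```

Small α keeps distant points more strongly correlated than the limiting exponential kernel does, which is one way to read α as a mixture over length scales.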
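What effect does the form of the covariance function have?
[Figure: sample functions, y against x for x from −3 to 3, shown with the covariance matrix Σ]
Periodic
sinusoid × squared exponential
$K(x_1, x_2) = \sigma^2 \cos(\omega (x_1 - x_2)) \exp\left( -\frac{1}{2l^2} (x_1 - x_2)^2 \right)$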
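The sinusoid × squared exponential covariance above can be written down directly. In this sketch the function name and the values ω = 3 and l = 1.5 are assumptions chosen only so that a few oscillations fit in the plotted range. The product of two valid covariance functions is again valid, so the matrix should be positive semi-definite up to rounding error.

```python
import numpy as np

def periodic_se_kernel(x1, x2, sigma=1.0, omega=3.0, length=1.5):
    """K = sigma^2 * cos(omega * (x1 - x2)) * exp(-(x1 - x2)^2 / (2 * length^2))."""
    diff = np.asarray(x1)[:, None] - np.asarray(x2)[None, :]
    return sigma**2 * np.cos(omega * diff) * np.exp(-diff**2 / (2.0 * length**2))

x = np.linspace(-3.0, 3.0, 50)
Sigma = periodic_se_kernel(x, x)

# Smallest eigenvalue: expected to be non-negative up to floating-point error.
min_eig = np.linalg.eigvalsh(Sigma).min()

# Unlike the kernels on the previous slides, some covariances are negative:
# points half a period apart are anti-correlated.
most_negative = Sigma.min()
```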
Inference
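A GP is an infinite-dimensional object, but we only need to work with finite-dimensional objects.
For prediction (and the Bayesian update) we are interested in the conditional distribution of 'new' points y∗ given observed points y:
$p(y_* \mid y) = \frac{p(y, y_*)}{p(y)}$
Using the GP definition, we know that:
$p(y) = \mathcal{N}(m, \Sigma), \quad p(y, y_*) = \mathcal{N}\left( \begin{bmatrix} m \\ m_* \end{bmatrix}, \begin{bmatrix} \Sigma & \Sigma_* \\ \Sigma_*^T & \Sigma_{**} \end{bmatrix} \right)$
where we have ignored the white noise for simplicity. Σ, Σ∗ and Σ∗∗ are covariance matrices, with Σ∗ the cross-covariance between the observed points y and the new points y∗.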
Inference
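Using the properties of the multivariate Gaussian distribution:
$p(y_* \mid y) = \mathcal{N}\left( m_* + \Sigma_*^T \Sigma^{-1} (y - m),\; \Sigma_{**} - \Sigma_*^T \Sigma^{-1} \Sigma_* \right)$
The predictive mean is linear in the data (shown here for a zero prior mean, m = m∗ = 0):
$\mu_{y_*|y} = \Sigma_*^T \Sigma^{-1} y$
The predictive covariance is the prior covariance minus the information the observations give about the function:
$\Sigma_{y_*|y} = \Sigma_{**} - \Sigma_*^T \Sigma^{-1} \Sigma_*$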
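The two predictive equations translate almost line by line into NumPy. The following is a minimal sketch under stated assumptions: a zero prior mean, a squared exponential kernel with σ = l = 1, a small jitter in place of the ignored white noise, and toy data from a sine function. The names `se_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np

def se_kernel(a, b, sigma=1.0, length=1.0):
    """Squared exponential covariance between two sets of 1-D inputs."""
    sq = (np.asarray(a)[:, None] - np.asarray(b)[None, :]) ** 2
    return sigma**2 * np.exp(-sq / (2.0 * length**2))

def gp_predict(x_train, y_train, x_test, kernel, jitter=1e-8):
    """Predictive mean and covariance of a zero-mean GP.

    mean = Sigma_*^T Sigma^{-1} y
    cov  = Sigma_** - Sigma_*^T Sigma^{-1} Sigma_*
    The jitter is an assumption added so the Cholesky factorisation is stable.
    """
    Sigma = kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    Sigma_star = kernel(x_train, x_test)       # shape (n, n_star)
    Sigma_ss = kernel(x_test, x_test)
    chol = np.linalg.cholesky(Sigma)           # Sigma = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))  # Sigma^{-1} y
    v = np.linalg.solve(chol, Sigma_star)      # chol^{-1} Sigma_*
    mean = Sigma_star.T @ alpha
    cov = Sigma_ss - v.T @ v
    return mean, cov

x_train = np.array([-2.0, -0.5, 1.0, 2.5])
y_train = np.sin(x_train)                      # toy observations
x_test = np.linspace(-3.0, 3.0, 61)
mean, cov = gp_predict(x_train, y_train, x_test, se_kernel)
```

Solving with the Cholesky factor instead of forming the inverse of Σ explicitly is a common design choice for numerical stability; the result is the same as the formulas on the slide. At the training inputs the predictive variance collapses to roughly the jitter, and away from them it grows back towards the prior variance.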
Relationship to Dirichlet Processes
A Gaussian process defines a distribution over functions:
f ∼ GP(m, k)
GPs can be viewed as infinite-dimensional Gaussian distributions.
A Dirichlet process (DP) defines a distribution over distributions:
G ∼ DP(G0, α0)
DPs can be viewed as infinite-dimensional Dirichlet distributions.
Both f and G are infinite-dimensional objects and both make use of conjugacy (Gaussian-Gaussian and Multinomial-Dirichlet).
Basis function view of Gaussian processes
[Figure: three panels against x from −10 to 10: coefficients, basis functions, and the resulting f(x)]
$\gamma \sim \mathcal{N}(0, 1)$
$g(x) = e^{-\frac{1}{l^2}(x - \mu)^2}$
$f(x) = \gamma\, g(x)$
Basis function view of Gaussian processes
[Figure: three panels against x from −10 to 10: coefficients, basis functions, and the resulting f(x)]
$\gamma_k \sim \mathcal{N}(0, 1)$
$g_k(x) = \frac{1}{\sqrt{K}} e^{-\frac{1}{l^2}(x - k/K)^2}$
$f(x) = \sum_{k=1}^{K} \gamma_k g_k(x)$
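The construction above (random Gaussian coefficients times fixed Gaussian bumps) can be sampled directly. This sketch follows the formulas as printed, with centres at k/K for k = 1, ..., K; the function names, l = 1, K = 100 and the seed are assumptions for illustration, and the centres used for the original plots may have been spread differently.

```python
import numpy as np

def basis_functions(x, n_basis, length=1.0):
    """Matrix G with G[i, k-1] = g_k(x_i) = exp(-(x_i - k/K)^2 / length^2) / sqrt(K)."""
    centres = np.arange(1, n_basis + 1) / n_basis
    diff = np.asarray(x)[:, None] - centres[None, :]
    return np.exp(-diff**2 / length**2) / np.sqrt(n_basis)

def sample_function(x, n_basis, length=1.0, seed=0):
    """One draw of f(x) = sum_k gamma_k g_k(x) with gamma_k ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    gamma = rng.standard_normal(n_basis)
    return basis_functions(x, n_basis, length) @ gamma

x = np.linspace(-10.0, 10.0, 201)
f = sample_function(x, n_basis=100)

# Covariance implied by the construction: K(x, x') = sum_k g_k(x) g_k(x')
G = basis_functions(x, 100)
implied_cov = G @ G.T
```

Because f is a linear combination of Gaussian variables, its values at any finite set of inputs are jointly Gaussian with covariance `implied_cov`.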
Basis function view of Gaussian processes
[Figure: sample function f(x) against x from −10 to 10]
$\gamma_k \sim \mathcal{N}(0, 1)$
$g_k(x) = \frac{1}{\sqrt{K}} e^{-\frac{1}{l^2}(x - k/K)^2}$
$f(x) = \sum_{k=1}^{K} \gamma_k g_k(x)$
$m(x) = \langle f(x) \rangle = \left\langle \sum_k \gamma_k g_k(x) \right\rangle = \sum_k \langle \gamma_k \rangle g_k(x) = 0$
$K(x, x') = \langle f(x) f(x') \rangle = \left\langle \sum_k \gamma_k g_k(x) \sum_{k'} \gamma_{k'} g_{k'}(x') \right\rangle$
$\phantom{K(x, x')} = \sum_{k,k'} \langle \gamma_k \gamma_{k'} \rangle g_k(x) g_{k'}(x') = \sum_{k,k'} \delta_{k,k'} g_k(x) g_{k'}(x') = \sum_k g_k(x) g_k(x')$
$K(x, x') = \frac{1}{K} \sum_k e^{-\frac{1}{l^2}(x - k/K)^2 - \frac{1}{l^2}(x' - k/K)^2}$
$K(x, x') \longrightarrow \int du\, e^{-\frac{1}{l^2}(x - u)^2 - \frac{1}{l^2}(x' - u)^2} \quad \text{as } K \to \infty$
$K(x, x') \propto e^{-\frac{1}{2l^2}(x - x')^2}$
Gaussian processes ≡ models with ∞ parameters
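The last step of the derivation can be checked numerically. Completing the square gives (x − u)² + (x′ − u)² = 2(u − (x + x′)/2)² + (x − x′)²/2, so the integral over u equals l·sqrt(π/2)·exp(−(x − x′)²/(2l²)). The sketch below compares a dense sum over basis-function centres with that closed form. It assumes, beyond what the slide states, that the centres cover a wide interval (here −20 to 20) so the sum approximates an integral over the whole real line; the grid, l = 1 and the function names are illustrative choices.

```python
import numpy as np

length = 1.0
# Dense grid of centres standing in for k/K as K grows (an assumption, see above).
u, du = np.linspace(-20.0, 20.0, 4001, retstep=True)

def basis_sum_kernel(x, x_prime):
    """Riemann-sum version of the integral over u of exp(-(x-u)^2/l^2 - (x'-u)^2/l^2)."""
    return np.sum(np.exp(-((x - u) ** 2 + (x_prime - u) ** 2) / length**2)) * du

def se_limit(x, x_prime):
    """Closed form: l * sqrt(pi/2) * exp(-(x - x')^2 / (2 l^2))."""
    return length * np.sqrt(np.pi / 2.0) * np.exp(-(x - x_prime) ** 2 / (2.0 * length**2))

pairs = [(0.0, 0.0), (0.3, -0.4), (1.0, 2.5)]
errors = [abs(basis_sum_kernel(a, b) - se_limit(a, b)) for a, b in pairs]
```

The errors should be negligible, which supports reading the squared exponential GP as the limit of a linear model with infinitely many basis functions.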
Beyond GP regression
• Classification: non-Gaussian likelihoods, use EP or Laplace
• Ordinal regression
• Count data
• Model selection: kernel and hyperparameter optimisation
• Kernel design
• Extrapolation
• Multi-output GPs: for multi-task learning
• Sparse GPs: scaling to large numbers of features and inputs
• Latent variable models: GP-LVM as non-linear probabilistic PCA
• And many others ...
Resources
Free book: http://www.gaussianprocess.org/gpml/chapters/
Tutorials
• GPs for Natural Language Processing tutorial (ACL 2014): http://goo.gl/18heUk
• GP Schools in Sheffield and roadshows in Kampala, Pereira, and (future) Nyeri, Melbourne: http://ml.dcs.shef.ac.uk/gpss/
• GP Regression demo: http://www.tmpl.fi/gp/
• Annotated bibliography and other materials: http://www.gaussianprocess.org
Toolkits
• GPML (Matlab): http://www.gaussianprocess.org/gpml/code
• GPy (Python): https://github.com/SheffieldML/GPy
• GPstuff (R, Matlab, Octave): http://becs.aalto.fi/en/research/bayes/gpstuff/