introduction to gaussian processesdanielpr/files/gptut-penn_0.pdfintroduction to gaussian processes...

Introduction to Gaussian Processes

Daniel Preotiuc-Pietro

Positive Psychology CenterUniversity of Pennsylvania

13 October 2014

with slides from Trevor Cohn, Neil Lawrence, Richard Turner

Gaussian Processes

Brings together several key ideas in one framework

� Bayesian� kernelised� non-parametric� non-linear� modelling uncertainty

Elegant and powerful framework, with growing popularity inmachine learning and application domains.

Gaussian Processes

State of the art for regression

� exact posterior inference� supports very complex non-linear functions� elegant model selection

Gaussian Processes

Now mature enough for use in NLP

� support for classification, ranking, etc� fancy kernels, e.g., text� sparse approximations for large scale inference� probabilistic formulation allows incorporation into larger

graphical models� models the prediction uncertainty so that it’s propagated

through pipelines of probabilistic components

Gaussian Processes

Regression and classification are fundamental techniques inNLP

Linear models:

� simple and fast to run� provide straightforward interpretability� but very limited in the type of relations they can model

Non-linear models (e.g. SVM):

� better results because they allow to model otherrelationships

� but high cost of hyperparameter optimisation� lack of interoperability with downstream processing

Gaussian Processes

The Gaussian distribution:

� is a distribution over vectors

The Gaussian process:

� is a distribution over functions

Gaussian Processes, prior

� the prior represents our prior beliefs over the functions weexpect to learn

� here, standard choices: 0 mean, 1 variance, ‘smooth’functions

Gaussian Process, posterior

� after observing data points, adjust the posteriordistribution over functions

� the functions must pass ’close’ to the observations,compute posterior mean and variance

Gaussian Processes

GP demo: http://www.tmpl.fi/gp/

Relationship to Linear regression

Linear regression:y = wx + ε (1)

where w are parameters we learn.

Gaussian process regression:

y = f (x) + ε (2)

where f is drawn from a Gaussian process prior.

� the w parameters are gone, hence the term‘non-parametric’ for GP’s

� f is not limited to a parametric form

Graphical model view

f ∼ GP(m, k)

y ∼ N( f (x), σ2)

� f : RD− > R is a latentfunction

� y is a noisy realisationof f (x)

� k is the covariancefunction or kernel

� m and σ2 is a learnedparameter

Formal definition

Definition

A Gaussian process is a collection of random variables, anyfinite number of which have a joint Gaussian distribution.

A GP is fully specified by mean m(x) and covariance k(x, x′)functions:

� m(x) = E[ f (x)]

� k(x, x′) = E[( f (x) −m(x))(( f (x′) −m(x′))]

Without loss of generality we can assume m(x) = 0.

Compared to a Gaussian distribution

Gaussian distribution:

� specified by a mean and covariance matrix: x ∼ N(μ,Σ)� the vector index is the position of the random variables x

Gaussian process:

� specified by a mean and covariance function: f ∼ GP(m, k)� the index is the argument x of f

The Gaussian Process is an infinite extension of multivariateGaussian distributions

Gaussian distribution

p(y|Σ) ∝ exp(−1

2yTΣ−1y)

⎡⎣ 1 .7

⎤⎦

−3 −2 −1 0 1 2 3−3

2yTΣ−1y)

⎡⎣ 1 .7

⎤⎦

−3 −2 −1 0 1 2 3−3

2yTΣ−1y)

⎡⎣ 1 .6

⎤⎦

−3 −2 −1 0 1 2 3−3

2yTΣ−1y)

⎡⎣ 1 .4

⎤⎦

−3 −2 −1 0 1 2 3−3

2yTΣ−1y)

⎡⎣ 1 .1

⎤⎦

−3 −2 −1 0 1 2 3−3

2yTΣ−1y)

⎡⎣ 1 .0

⎤⎦

−3 −2 −1 0 1 2 3−3

2yTΣ−1y)

⎡⎣ 1 .7

⎤⎦

−3 −2 −1 0 1 2 3−3

2yTΣ−1y)

⎡⎣ 1 .7

⎤⎦

−3 −2 −1 0 1 2 3−3

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

−3 −2 −1 0 1 2 3−3

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

−3 −2 −1 0 1 2 3−3

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

−3 −2 −1 0 1 2 3−3

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

−3 −2 −1 0 1 2 3−3

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

−3 −2 −1 0 1 2 3−3

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

−3 −2 −1 0 1 2 3−3

p(y2|y1,Σ) ∝ exp(−1

2(y2 − μ∗)Σ∗−1(y2 − μ∗)) 1

−3 −2 −1 0 1 2 3−3

New visualisation

−2 0 2−3

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2−3

variable index

⎡⎣ 1 .9

⎤⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5−3

variable index

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

1 .9 .8 .6 .4

.9 1 .9 .8 .6

.8 .9 1 .9 .8

.6 .8 .9 1 .9

.4 .6 .8 .9 1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

New visualisation

−2 0 2−3

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

Regression using Gaussians

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

1 10 20

1 2 3 4 5 6 7 8 9 1011121314151617181920−3

variable index

Σ(x1, x2) = K(x1, x2) + Iσ2y

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

1 10 20

Regression: probabilistic inference in function space

Σ(x1, x2) = K(x1, x2) + Iσ2y

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

1 10 20

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric)

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

1 10 20

Parametric model

y(x) = f(x; θ) + σyε

ε ∼ N (0, 1)

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

function estimatewith uncertaintyK(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

1 10 20

Parametric model

ε ∼ N (0, 1)

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

1 10 20

Parametric model

ε ∼ N (0, 1)

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

horizontal-scale

K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

1 10 20

Parametric model

ε ∼ N (0, 1)

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

1 10 20

horizontal-scalevertical-scale

Squared Exponential (SE) kernel

Usual choice for covariance is the Squared Exponential (SE)kernel (a.k.a. RBF, exponentiated quadratic, Gaussian):

k(x, x′) = σ2 exp(− (x − x′)2

The kernel’s parameters {l, σ} are called the GP’shyperparameters.

When l is very high, neighbouring points are very correlatedi.e. that feature is less relevant.

What effect do the hyper-parameters have?

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Non-parametric (∞-parametric) Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

short horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

Parametric model

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

long horizontal length-scale

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

small vertical scale

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Σ(x1, x2) = K(x1, x2) + Iσ2y

p(y|θ) = N (0,Σ)

ε ∼ N (0, 1)K(x1, x2) = σ2 exp(− 1

2l2(x1 − x2)2

)K(x1, x2) = σ2 exp

(− 1

2l2(x1 − x2)2

Hyperparameters

Training a GP means:

� finding the right kernel� finding values of hyperparameters

The covariance function controls properties of the GP:

� smoothness� stationarity� periodicity� etc.

What effect does the form of the covariance function have?

Laplacian covariance function

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Browninan motion

Ornstein-Uhlenbeck

K(x1, x2) = σ2 exp(− 1

2l2|x1 − x2|

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Rational Quadratic

K(x1, x2) = σ2(1 + 1

2αl2|x1 − x2|

)−α

Periodic

sinusoid × squared exponential

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

K(x1, x2) = σ2 cos(ω(x1 − x2)) exp(− 1

2l2(x1 − x2)2

Periodic

Inference

GP is an infinite dimensional object but we only need to workwith finite dimensional objects.

For prediction (and the Bayesian update) we are interested inthe conditional distribution of ‘new’ points y∗ given observedpoints y:

p(y∗|y) =p(y, y∗)

Using the GP definition, we know that:

p(y) = N(m,Σ), p(y, y∗) = N([

[Σ ΣT∗Σ∗ Σ∗∗

where we ignored the white noise for simplification and Σ arecovariance matrices.

Inference

Using the multi-variate Gaussian distribution properties:

p(y∗|y) = N(m∗ + ΣT∗ Σ−1(y −m),Σ∗∗ − ΣT∗ Σ−1Σ∗)

Predictive mean is linear in the data:

μy∗|y = ΣT∗ Σ−1y

Predictive covariance is the prior covariance minus theinformation the observations give about the function:

Σy∗|y = Σ∗∗ − ΣT∗ Σ−1Σ∗

Relationship to Dirichlet Processes

A Gaussian process defines a distribution over functions:

f ∼ GP(m, k)

GPs can be viewed as infinite-dimensional Gaussiandistributions.

A Dirichlet process (DP) defines a distribution overdistributions:

G ∼ DP(G0, α0)

DPs can be viewed as infinite-dimensional Dirichletdistributions.

Both f and G are infinite-dimensional objects and both makeuse of conjugacy (Gaussian-Gaussian andMultinomial-Dirichlet).

Basis function view of Gaussian processes

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

γ ∼ N (0, 1)

g(x) = e− 1

l2(x−μ)2

f(x) = γg(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−101

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−202

−10 −8 −6 −4 −2 0 2 4 6 8 100

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 〈f(x)〉

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 〈∑k γkgk(x)〉

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) =∑

k〈γk〉gk(x)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) = 〈f(x)f(x′)〉

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) = 〈∑k γkgk(x)∑

k′ γk′gk′(x′)〉

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k,k′〈γkγk′〉gk(x)gk′(x′)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k,k′ δk,k′gk(x)gk′(x′)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

K(x, x′) −→ ∫du e− 1

l2(x−u)2− 1

l2(x′−u)2

K → ∞

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

K(x, x′) −→ ∫du e− 1

l2(x−u)2− 1

l2(x′−u)2

K → ∞

K(x, x′) ∝ e− 1

2l2(x−x′)2

−10 −8 −6 −4 −2 0 2 4 6 8 10

−0.5

γk ∼ N (0, 1)

gk(x) = 1√K

e− 1

l2(x−k/K)2

f(x) =∑K

k=1 γkgk(x)

m(x) = 0

K(x, x′) =∑

k gk(x)gk(x′)

K(x, x′) = 1K

∑k e− 1

l2(x−k/K)2− 1

l2(x′−k/K)2

K(x, x′) −→ ∫du e− 1

l2(x−u)2− 1

l2(x′−u)2

K → ∞

K(x, x′) ∝ e− 1

2l2(x−x′)2

Gaussian processes ≡ models with ∞ parameters

Basis function view of Gaussian Processes

��

� ��

��

��!

Beyond GP regression

� Classification: non-Gaussian likelihoods, use EP or Laplace� Ordinal regression� Count data� Model selection: kernel and hyperparameter optimisation� Kernel design� Extrapolation� Multi-output GPs: for multi-task learning� Sparse GPs: scaling to large number of features and inputs� Latent variable models: GP-LVM as non-linear

probabilistic PCA� And many others ...

Resources

Free book:http://www.gaussianprocess.org/gpml/chapters/

Tutorials

� GPs for Natural Language Processing tutorial (ACL 2014)http://goo.gl/18heUk

� GP Schools in Sheffield and roadshows in Kampala,Pereira, and (future) Nyeri, Melbournehttp://ml.dcs.shef.ac.uk/gpss/

� GP Regression demohttp://www.tmpl.fi/gp/

� Annotated bibliography and other materialshttp://www.gaussianprocess.org

Toolkits

� GPML (Matlab)http://www.gaussianprocess.org/gpml/code

� GPy (Python)https://github.com/SheffieldML/GPy

� GPstuff (R, Matlab, Octave)http://becs.aalto.fi/en/research/bayes/gpstuff/

introduction to gaussian processesdanielpr/files/gptut-penn_0.pdfintroduction to gaussian processes...

Documents

squiz iuc - 3.2 mashups

b.s. health science & psychology preot

slides liquidação iuc v4 20140317

regolamento iuc tricase

an analysis of the user occupational class through twitter...

the iuc pre partner programme

preot nicolae grebenea - glasul...

techsoup iuc

multi-task pairwise neural ranking for hashtag...

catehetica preot profesor dumitru calugaru

povestea cescutei - preot arsenie boca

academic programme 2019 - iuc

preot sorin cosma - cateheze

preot dr. pavel vesa - bibliotecaarad.ro

iuc brochure bg

preot, parohie, înnoire - emilianos timiadis

beyond binary labels: political ideology prediction...

steno 20 06 2008 - arad, romania · 2010. 4. 30. · ioan...

iuc npresentation final

1 october 2008the iuc pre partner programme iuc jimma...