Bayesian methods, priors and Gaussian processes
John Paul Gosling
Department of Probability and Statistics
24-25 January 2007: An Overview of State-of-the-Art Data Modelling
Overview
• The Bayesian paradigm
• Bayesian data modelling
• Quantifying prior beliefs
• Data modelling with Gaussian processes
Bayesian methods
The beginning, the subjectivist philosophy, and an overview of Bayesian techniques.
Subjective probability
• Bayesian statistics involves a very different way of thinking about probability in comparison to classical inference.
• The probability of a proposition is defined to be a measure of a person’s degree of belief.
• Wherever there is uncertainty, there is probability.
• This covers both aleatory and epistemic uncertainty.
Differences with classical inference
To a frequentist, data are repeatable, parameters are not:
P(data|parameters)
To a Bayesian, the parameters are uncertain, the observed data are not:
P(parameters|data)
Bayes’s theorem for distributions
• In early probability courses, we are taught Bayes’s theorem for events:
  P(A | B) = P(B | A) P(A) / P(B)
• This can be extended to continuous distributions:
  p(θ | x) = p(x | θ) p(θ) / p(x)
• In Bayesian statistics, we use Bayes’s theorem in a particular way:
  p(θ | x) ∝ p(x | θ) p(θ), i.e. posterior ∝ likelihood × prior.
Prior to posterior updating
Bayes’s theorem is used to update our beliefs.
The posterior is proportional to the prior times the likelihood.
[Diagram: prior + data → posterior]
Posterior distribution
• So, once we have our posterior, we have captured all our beliefs about the parameter of interest.
• We can use this to do informal inference, e.g. compute intervals and summary statistics.
• Formally, to make choices about the parameter, we must couple this with decision theory to calculate the optimal decision.
Sequential updating
Today’s posterior is tomorrow’s prior:
prior beliefs + data → posterior beliefs
posterior beliefs + more data → new posterior beliefs
The triplot
• A triplot gives a graphical representation of prior to posterior updating.
[Triplot: prior, likelihood and posterior densities plotted on the same axes]
Audience participation
Quantification of our prior beliefs
• What proportion of people in this room are left handed? – call this parameter ψ
• When I toss this coin, what’s the probability of me getting a tail? – call this θ
A simple example
• The archetypal example in probability theory is the outcome of tossing a coin.
• Each toss of a coin is a Bernoulli trial with the probability of tails given by θ.
• If we carry out 10 independent trials, we know the number of tails, X, will follow a binomial distribution: X | θ ~ Bi(10, θ).
Our prior distribution
• A Beta(2,2) distribution may reflect our beliefs about θ.
Our posterior distribution
• If we observe X = 3, we get the following triplot:
Our posterior distribution
• If we are more convinced, a priori, that θ = 0.5 and we observe X = 3, we get the following triplot:
Credible intervals
• If asked to provide an interval in which there is a 90% chance of θ lying, we can derive this directly from our posterior distribution.
• Such an interval is called a credible interval.
• In frequentist statistics, there are confidence intervals, which cannot be interpreted in the same way.
• In our example, using our first prior distribution, we can report a 95% posterior credible interval for θ of (0.14,0.62).
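As a sketch of how this interval can be computed numerically (the Beta(2,2) prior and X = 3 out of 10 tosses come from the example above; the grid-integration helper is my own, not from the talk):

```python
def beta_kernel(x, a, b):
    """Unnormalised Beta(a, b) density."""
    return x ** (a - 1) * (1 - x) ** (b - 1)

def credible_interval(a, b, mass=0.95, n=100_000):
    """Equal-tailed credible interval for Beta(a, b) by grid integration."""
    xs = [(i + 0.5) / n for i in range(n)]
    ws = [beta_kernel(x, a, b) for x in xs]
    total = sum(ws)
    lo_p, hi_p = (1 - mass) / 2, 1 - (1 - mass) / 2
    cum, lo, hi = 0.0, None, None
    for xi, w in zip(xs, ws):
        cum += w / total
        if lo is None and cum >= lo_p:
            lo = xi
        if hi is None and cum >= hi_p:
            hi = xi
            break
    return lo, hi

# Beta(2, 2) prior and X = 3 tails in 10 tosses give a Beta(5, 9) posterior
lo, hi = credible_interval(2 + 3, 2 + 7)
```

The endpoints land close to the (0.14, 0.62) interval reported on the slide.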
Basic linear model
• Yesterday we saw a lot of this: y = Xβ + ε.
• We have a least squares solution given by β̂ = (XᵀX)⁻¹Xᵀy.
• Instead of trying to find the optimal set of parameters, we express our beliefs about them.
Basic linear model
• By selecting appropriate priors for the two parameters (the coefficients β and the variance σ²), we can derive the posterior analytically.
• It is a normal inverse-gamma distribution.
• With prior β | σ² ~ N(m, σ²V), the mean of our posterior distribution is then
  E[β | y] = (V⁻¹ + XᵀX)⁻¹ (V⁻¹m + XᵀX β̂),
which is a weighted average of the LSE β̂ and the prior mean m.
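A minimal numerical sketch of this weighted-average behaviour, taking the error variance as known; the data-generating values and prior settings below are illustrative choices of mine, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data from y = 1 + 2x + noise
X = np.column_stack([np.ones(20), np.linspace(0.0, 1.0, 20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.3, 20)

# Least squares estimate
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Conjugate normal prior beta ~ N(m, V), with known error variance sigma2
m = np.zeros(2)
V = 10.0 * np.eye(2)
sigma2 = 0.3 ** 2

# Posterior mean: a precision-weighted average of the prior mean and the LSE
V_post = np.linalg.inv(np.linalg.inv(V) + X.T @ X / sigma2)
beta_post = V_post @ (np.linalg.inv(V) @ m + X.T @ y / sigma2)
```

With a weak prior like this one, the posterior mean sits much closer to the least squares estimate than to the prior mean.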
Bayesian model comparison
• Suppose we have two plausible models for a set of data, M and N say.
• We can calculate posterior odds in favour of M using
  P(M | x) / P(N | x) = [P(x | M) / P(x | N)] × [P(M) / P(N)],
  i.e. posterior odds = Bayes factor × prior odds.
Bayesian model comparison
• The Bayes factor is calculated using
  B = P(x | M) / P(x | N),
  where each term is a marginal likelihood, e.g. P(x | M) = ∫ P(x | θ_M, M) p(θ_M | M) dθ_M.
• A Bayes factor that is greater than one would mean that your odds in favour of M increase.
• Bayes factors naturally help guard against too much model structure.
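A toy Bayes factor calculation using the coin example: hypothetically compare M, which fixes θ = 0.5, against N, which puts a uniform Be(1,1) prior on θ, after observing 3 tails in 10 tosses (this particular comparison is my own illustration, not from the talk):

```python
from math import comb

# Model M fixes theta = 0.5; model N puts a uniform Be(1, 1) prior on theta.
# Data: 3 tails in 10 tosses.
n, x = 10, 3

p_x_M = comb(n, x) * 0.5 ** n    # likelihood under M
p_x_N = 1.0 / (n + 1)            # binomial pmf averaged over a uniform prior

bayes_factor = p_x_M / p_x_N     # > 1, so the odds in favour of M increase
```

Here the Bayes factor is about 1.29, a very mild preference for the simpler model.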
Advantages/Disadvantages
• Bayesian methods are often more complex than frequentist methods.
• There is not much software to give scientists off-the-shelf analyses.
• Subjectivity: all the inferences are based on somebody’s beliefs.
Advantages/Disadvantages
• Bayesian statistics offers a framework to deal with all the uncertainty.
• Bayesians make use of more information – not just the data in their particular experiment.
• The Bayesian paradigm is very flexible, and it is able to tackle problems that frequentist techniques cannot.
• In selecting priors and likelihoods, Bayesians are showing their hands – they can’t get away with making arbitrary choices when it comes to inference.
• …
Summary
• The basic principles of Bayesian statistics have been covered.
• We have seen how we update our beliefs in the light of data.
• Hopefully, I’ve convinced you that the Bayesian way is the right way.
Priors
Advice on choosing suitable prior distributions and eliciting their parameters.
Importance of priors
• As we saw in the previous section, prior beliefs about uncertain parameters are a fundamental part of Bayesian statistics.
• When we have few data about the parameter of interest, our prior beliefs dominate inference about that parameter.
• In any application, effort should be made to model our prior beliefs accurately.
Weak prior information
• If we accept the subjective nature of Bayesian statistics but are not comfortable using subjective priors, then many have argued that we should try to specify prior distributions that represent no prior information.
• These prior distributions are called noninformative, reference, ignorance or weak priors.
• The idea is to have a completely flat prior distribution over all possible values of the parameter.
• Unfortunately, this can lead to improper distributions being used.
Weak prior information
• In our coin tossing example, Be(1,1), Be(0.5,0.5) and Be(0,0) have been recommended as noninformative priors. Be(0,0) is improper.
Conjugate priors
• When we move away from noninformative priors, we might use priors that are in a convenient form.
• That is, a form where combining it with the likelihood produces a posterior from the same family.
• In our example, the beta distribution is a conjugate prior for a binomial likelihood.
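The conjugate beta-binomial update can be sketched in a couple of lines (the Beta(2,2) prior and the 3-tails-in-10 data are from the earlier coin example):

```python
def beta_binomial_update(a, b, tails, n):
    """Conjugate update: a Beta(a, b) prior with `tails` successes in `n`
    Bernoulli trials gives a Beta(a + tails, b + n - tails) posterior."""
    return a + tails, b + n - tails

# The earlier coin example: Beta(2, 2) prior, X = 3 tails in 10 tosses
posterior = beta_binomial_update(2, 2, 3, 10)
```

This is exactly why conjugate priors are convenient: the posterior is available by updating two parameters, with no integration required.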
Informative priors
• An informative prior is an accurate representation of our prior beliefs.
• We are not interested in the prior being part of some conjugate family.
• An informative prior is essential when we have few or no data for the parameter of interest.
• Elicitation, in this context, is the process of translating someone’s beliefs into a distribution.
Elicitation
• It is unrealistic to expect someone to be able to fully specify their beliefs in terms of a probability distribution.
• Often, they are only able to report a few summaries of the distribution.
• We usually work with medians, modes and percentiles.
• Sometimes they are able to report means and variances, but there are more doubts about these values.
Elicitation
• Once we have some information about their beliefs, we fit some parametric distribution to them.
• These distributions almost never fit the judgements precisely.
• There are nonparametric techniques that can bypass this.
• Feedback is essential in the elicitation process.
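A rough sketch of quantile-based elicitation: fit a beta distribution to a judged median and 95th percentile by a crude grid search. The elicited values and the fitting scheme below are hypothetical illustrations of mine, not the nonparametric techniques mentioned above:

```python
def beta_cdf_grid(a, b, n=1_000):
    """Grid approximation to the Beta(a, b) CDF."""
    xs = [(i + 0.5) / n for i in range(n)]
    ws = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]
    total = sum(ws)
    cdf, cum = [], 0.0
    for w in ws:
        cum += w / total
        cdf.append(cum)
    return xs, cdf

def quantile(xs, cdf, p):
    """Smallest grid point whose CDF value reaches p."""
    for xi, c in zip(xs, cdf):
        if c >= p:
            return xi
    return xs[-1]

def fit_beta(elicited):
    """Crude grid search for a Beta(a, b) matching elicited quantiles.
    `elicited` maps probabilities to judged values, e.g. {0.5: 0.3}."""
    best, best_err = None, float("inf")
    for a4 in range(2, 41):                    # a in [0.5, 10] in steps of 0.25
        for b4 in range(2, 41):
            a, b = a4 / 4, b4 / 4
            xs, cdf = beta_cdf_grid(a, b)
            err = sum((quantile(xs, cdf, p) - v) ** 2
                      for p, v in elicited.items())
            if err < best_err:
                best, best_err = (a, b), err
    return best

# Hypothetical judgements: median 0.3 and 95th percentile 0.6
a, b = fit_beta({0.5: 0.3, 0.95: 0.6})
```

Feeding the fitted distribution's quantiles back to the expert, as the slide suggests, then checks whether the parametric compromise is acceptable.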
Normal with unknown mean
Noninformative prior: with x_i | μ ~ N(μ, σ²) and σ² known, take p(μ) ∝ 1; this gives the posterior μ | x ~ N(x̄, σ²/n).
Normal with unknown mean
Conjugate prior: μ ~ N(m, v) gives μ | x ~ N((m/v + nx̄/σ²) / (1/v + n/σ²), (1/v + n/σ²)⁻¹).
Normal with unknown mean
Proper prior:
Structuring prior information
• It is possible to structure our prior beliefs in a hierarchical manner:
  Data model: x | θ ~ p(x | θ)
  First level of prior: θ | φ ~ p(θ | φ)
  Second level of prior: φ ~ p(φ)
• Here φ is referred to as the hyperparameter(s).
Structuring prior information
• An example of this type of hierarchical structure is a nonparametric regression model.
• We want to know about μ, so the other parameters must be removed; these are known as nuisance parameters.
Data model:
First level of prior:
Second level of prior:
Analytical tractability
• The more complexity that is built into your prior and likelihood, the more likely it is that you won’t be able to derive your posterior analytically.
• In the 1990s, computational techniques were devised to combat this.
• Markov chain Monte Carlo (MCMC) techniques allow us to access our posterior distributions even in complex models.
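As a sketch of MCMC on a case where the answer is already known analytically, here is a random-walk Metropolis sampler targeting the coin example's Beta(5, 9) posterior (the step size and seed are arbitrary choices of mine):

```python
import math
import random

random.seed(1)

def log_post(theta):
    """Unnormalised log posterior for the coin example:
    Beta(2, 2) prior and 3 tails in 10 tosses -> kernel of Beta(5, 9)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return 4.0 * math.log(theta) + 8.0 * math.log(1.0 - theta)

def metropolis(n_samples, step=0.1):
    """Random-walk Metropolis sampler targeting log_post."""
    theta, samples = 0.5, []
    for _ in range(n_samples):
        prop = theta + random.gauss(0.0, step)
        accept_prob = math.exp(min(0.0, log_post(prop) - log_post(theta)))
        if random.random() < accept_prob:
            theta = prop
        samples.append(theta)
    return samples

samples = metropolis(20_000)[5_000:]        # discard burn-in
post_mean = sum(samples) / len(samples)     # close to the exact 5/14
```

In a real application the target would be a posterior we cannot normalise analytically; the algorithm is unchanged.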
Sensitivity analysis
• It is clear that the elicitation of prior distributions is far from being a precise science.
• A good Bayesian analysis will check that the conclusions are sufficiently robust to changes in the prior.
• If they aren’t, we need more data or more agreement on the prior structure.
Summary
• Prior distributions are an important part of Bayesian statistics.
• When modelled properly, they are far from being ad hoc, pick-the-easiest-to-use distributions.
• There are classes of noninformative priors that allow us to represent ignorance.
Gaussian processes
A Bayesian data modelling technique that fully accounts for uncertainty.
Data modelling: a fully probabilistic method
• Bayesian statistics offers a framework to account for uncertainty in data modelling.
• In this section, we’ll concentrate on regression using Gaussian processes and the associated Bayesian techniques.
The basic idea
We have: y_i = f(x_i) + ε_i, or, in vector form, y = f(x) + ε.
f(·) and ε are uncertain.
In order to proceed, we must elicit our beliefs about these two.
ε can be dealt with as in the previous section.
Gaussian processes
• We assume that f(·) follows a Gaussian process a priori.
• That is: f(·) ~ GP(m(·), c(·, ·)) for a mean function m and covariance function c.
• i.e. any sample of f(x)’s will follow a multivariate normal distribution.
A process is Gaussian if and only if every finite sample from the process is a vector-valued Gaussian random variable.
Gaussian processes
We have prior beliefs about the form of the underlying model.
We observe/experiment to get data about the model, with which we train our GP.
We are left with our posterior beliefs about the model, which can have a ‘nice’ form.
A simple example
Warning: more audience participation coming up
A simple example
• Imagine we have data about some one dimensional phenomenon.
• Also, we’ll assume that there is no observational error.
• We’ll start with five data points between 0 and 4.
• A priori, we believe f(·) is roughly linear and differentiable everywhere.
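A minimal GP-regression sketch of this setup, assuming a zero prior mean, a squared-exponential covariance, and five illustrative noise-free points (the talk's actual data and covariance choices are not given):

```python
import numpy as np

def sq_exp(x1, x2, ell=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# Five illustrative noise-free observations of a roughly linear function
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.9])

x_star = np.linspace(0.0, 4.0, 101)            # prediction grid

K = sq_exp(x, x) + 1e-10 * np.eye(len(x))      # tiny jitter for stability
K_s = sq_exp(x_star, x)
K_ss = sq_exp(x_star, x_star)

# Without observation error, the posterior mean interpolates the data
# and the posterior variance collapses to zero at the observed points
post_mean = K_s @ np.linalg.solve(K, y)
post_cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
```

Between the data points the posterior variance grows, which is exactly the uncertainty the plots on the following slides display.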
A simple example
[Figures: the GP fit through the five data points]
A simple example with error
• Now, we’ll start over and put some Gaussian error on the observations: y_i = f(x_i) + ε_i with ε_i ~ N(0, σ²).
• Note: in kriging, this is equivalent to adding a nugget effect.
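A minimal sketch of the nugget, again assuming a zero-mean GP with a squared-exponential covariance and illustrative data: observation error simply adds σ²I to the covariance matrix of the observed points.

```python
import numpy as np

def sq_exp(x1, x2, ell=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# Illustrative data, now treated as noisy observations
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.9])
sigma2 = 0.09                                  # hypothetical error variance

# The nugget: observation error adds sigma^2 I to the data covariance
K = sq_exp(x, x) + sigma2 * np.eye(len(x))
k0 = sq_exp(np.array([0.0]), x)                # covariance with the point x = 0

pred0 = (k0 @ np.linalg.solve(K, y)).item()    # no longer interpolates exactly
var0 = (1.0 - k0 @ np.linalg.solve(K, k0.T)).item()   # strictly positive now
```

The fit now smooths through the observations rather than passing through them, and the posterior variance stays positive everywhere.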
[Figure: the GP fit when observational error is included]
The mean function
Recall that our prior mean for f(·) is given by
  m(x) = h(x)ᵀβ,
where h(x) is a vector of regression functions evaluated at x and β is a vector of unknown coefficients.
The form of the regression functions is dependent on the application.
The mean function
• It is common practice to use a constant (bias)
• Linear functions
• Gaussian basis functions
• Trigonometric basis functions
• …
It is important to capture your beliefs about f(·) in the mean function.
The correlation structure
The correlation function defines how we believe f(·) will deviate nonparametrically from the mean function.
In the examples here, I have used a stationary correlation function of the form
  c(x, x′) = exp{−(x − x′)ᵀ B (x − x′)},
where B is a diagonal matrix of positive roughness parameters.
Dealing with the model parameters
We have the following hyperparameters: the mean-function coefficients, the variance, and the correlation function’s roughness parameters.
The coefficients and the variance can be removed analytically using conjugate priors.
The roughness parameters are not so easily accounted for…
A 2-D example
Rock porosity somewhere in the US
A 2-D example
Mean of our posterior beliefs about the underlying model, f(.).
A 2-D example
Mean of our posterior beliefs about the underlying model, f(.), in 3D!!!
A 2-D example
Our uncertainty about f(.) – two standard deviations
A 2-D example
Our uncertainty about f(.) looks much better in 3D.
A 2-D example - prediction
• The geologists held back two observations at: P1 = (0.60, 0.35), z1 = 10.0 and P2 = (0.20, 0.90), z2 = 20.8.
• Using our posterior distribution for f(.) and e, we get the following 90% credible intervals:
z1|rest of points in (8.7,12.0) and
z2|rest of points in (21.1,26.0)
Diagnostics
• Cross validation allows us to check the validity of our GP fit.
• Two variations are often used: leave-one-out or leave-final-20%-out.
• Leave-one-out: hyperparameters use all data and are then fixed when prediction is carried out for each omitted point.
• Leave-final-20%-out (hold out): hyperparameters are estimated using the reduced data subset.
• Cross validation is not enough to justify the GP fit.
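The leave-one-out scheme can be sketched as follows, for a zero-mean GP with fixed hyperparameters (the 1-D data and kernel below are illustrative, not the talk's porosity data):

```python
import numpy as np

def sq_exp(x1, x2, ell=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# Illustrative 1-D data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.9])

def loo_rmse(x, y, ell=1.0, nugget=1e-6):
    """Leave-one-out RMSE for a zero-mean GP with fixed hyperparameters."""
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        K = sq_exp(x[keep], x[keep], ell) + nugget * np.eye(int(keep.sum()))
        k_s = sq_exp(x[i:i + 1], x[keep], ell)
        pred = (k_s @ np.linalg.solve(K, y[keep])).item()
        errs.append(pred - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))

rmse = loo_rmse(x, y)
```

Comparing this RMSE across candidate mean functions is exactly what the next slide's numbers do for the 2-D example.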
Cross validation for the 2-D e.g.
• Applying leave-one-out cross validation gives an RMSE of:
  Constant: 2.1787
  Linear: 2.1185
  (Using a linear mean function reduces the RMSE by 2.8%.)
• Applying leave-last-20%-out cross validation gives:
  Constant: 6.8684
  Linear: 5.7466
  (A 16.3% difference.)
Benefits and limitations of GPs
• Gaussian processes offer a rich class of models which, when fitted properly, is extremely flexible.
• They also offer us a framework in which we can account for all of our uncertainty.
• If there are discontinuities, the method will struggle to provide a good fit.
• The computation time hinges on the inversion of an n × n matrix, where n is the number of data points.
Extensions
• Nonstationarity in the covariance can be modelled by adding extra levels to the variance term or by deforming the input space.
• Discontinuities can be handled by using piecewise Gaussian process models.
• The GP model can be applied in a classification setting.
• There is a lot more research on GPs, and there will probably be a way of using them in your applications.
Further details
I have set up a section on my website that has a comprehensive list of references for extended information on the topics covered in this presentation.
j-p-gosling.staff.shef.ac.uk