Bayesian Methods I: Parameter Estimation
“A statistician is a person who draws a mathematically precise line from an unwarranted assumption to a foregone conclusion.”
Anon
The Medical Test

You take a test for a rare debilitating disease, Frequentitis:

• False positive rate for the test = 5%
• False negative rate for the test = 1%
• The incidence of Frequentitis in the population is 0.1%
• Data D = you test positive

What is the probability that you have the disease (hypothesis H)?

Bayes' theorem:

  P(H|D,I) = P(H|I) P(D|H,I) / P(D|I)

The normalization factor P(D|I) ensures Σi P(Hi|D,I) = 1. By the sum rule:

  P(D|I) = P(H|I) P(D|H,I) + P(H̄|I) P(D|H̄,I)

  P(H|D,I) = (0.1% × 99%) / (0.1% × 99% + 99.9% × 5%) = 0.019

The probability is 1.9% (not 95%) that you have the disease!
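The arithmetic above can be checked in a few lines (a minimal sketch; the variable names are mine, the numbers are from the slide):

```python
# Posterior probability of disease given a positive test, via Bayes' theorem.
# Slide values: 5% false positive, 1% false negative, 0.1% incidence.
p_disease = 0.001            # P(H|I): prior incidence
p_pos_given_disease = 0.99   # P(D|H,I): 1 - false negative rate
p_pos_given_healthy = 0.05   # P(D|H̄,I): false positive rate

# P(D|I) by the sum rule over the two hypotheses
p_pos = (p_disease * p_pos_given_disease
         + (1 - p_disease) * p_pos_given_healthy)

posterior = p_disease * p_pos_given_disease / p_pos
print(round(posterior, 3))   # → 0.019
```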
Two Basic Classes of Inference

1. Model Comparison

Which of two or more competing models is the most probable, given our present state of knowledge?

• Competing models may have free parameters
• Models may vary in complexity (some with more free parameters)
• Generally, model comparison is not concerned with finding parameter values
• Free parameters are usually marginalized out in the analysis

2. Parameter Estimation

Given a certain model, what is the probability density function for each of its free parameters?

• Suppose model M has free parameters f and A
• We wish to find p(f|D,M,I) and p(A|D,M,I)
• p(f|D,M,I) is known as the marginal posterior distribution for f
Spectral Line Fitting

Gaussian line profile in noisy data. We are given the model M:

  T fi = T exp[ -(νi - ν0)² / (2σL²) ]

where ν0 = 37 and σL = 2 (channels).

The noise has been independently characterized as Gaussian with σn = 1 (in units of the signal).

Estimates of T from theory are uncertain over three orders of magnitude, from 0.1 to 100.
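This setup is easy to simulate. Below is a minimal sketch; the channel range (1–64) and the line strength T = 3 are illustrative assumptions, not given on the slide:

```python
import numpy as np

# Gaussian line profile T*f_i = T*exp(-(nu_i - nu0)^2 / (2*sigma_L^2)),
# with nu0 = 37 and sigma_L = 2 channels, as on the slide.
nu0, sigma_L, sigma_n = 37.0, 2.0, 1.0
channels = np.arange(1, 65)          # assumed channel range for the demo
f = np.exp(-(channels - nu0) ** 2 / (2 * sigma_L ** 2))

rng = np.random.default_rng(0)
T_true = 3.0                         # assumed line strength for the demo
d = T_true * f + rng.normal(0, sigma_n, channels.size)  # data = signal + noise
print(f.max())                       # profile peaks at 1, at channel nu0 = 37
```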
Parameter Estimation: Spectral Line Fit

  P(T|D,M,I) = P(T|M,I) P(D|M,T,I) / P(D|M,I)
                (prior)  (likelihood)

Calculating the likelihood P(D|M,T,I): the data are di = T fi + ei, so for Gaussian noise

  P(D|M,T,I) = P(E1, E2, ..., EN | M,T,I) = ∏i P(Ei|M,T,I)

             = ∏i [ 1 / (σn √(2π)) ] exp[ -(di - T fi)² / (2σn²) ]

             = σn^(-N) (2π)^(-N/2) exp[ -Σi (di - T fi)² / (2σn²) ]
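This likelihood can be evaluated directly (in log form, for numerical stability). The sketch below mirrors the slide's values; the simulated data and channel range are illustrative assumptions:

```python
import numpy as np

# ln P(D|M,T,I) for the Gaussian line model, evaluated on a grid of T.
nu0, sigma_L, sigma_n = 37.0, 2.0, 1.0
channels = np.arange(1, 65)          # assumed channel range
f = np.exp(-(channels - nu0) ** 2 / (2 * sigma_L ** 2))

rng = np.random.default_rng(1)
T_true = 3.0                         # assumed line strength for the demo
d = T_true * f + rng.normal(0, sigma_n, channels.size)

def log_likelihood(T):
    # ln P(D|M,T,I) = -N ln(sigma_n) - (N/2) ln(2*pi) - sum((d - T*f)^2)/(2*sigma_n^2)
    N = d.size
    return (-N * np.log(sigma_n) - 0.5 * N * np.log(2 * np.pi)
            - np.sum((d - T * f) ** 2) / (2 * sigma_n ** 2))

T_grid = np.linspace(0.1, 100, 10000)
T_ml = T_grid[np.argmax([log_likelihood(T) for T in T_grid])]
print(T_ml)  # maximum-likelihood estimate, near T_true
```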
Parameter Estimation: Spectral Line Fit
What should we use for the prior P(T|M,I)? For now, let us assume a uniform prior over Tmin < T < Tmax. The posterior is then proportional to the likelihood, so the value of T that maximizes the posterior is the Maximum Likelihood estimator.
The Choice of Prior

Our choice of prior can have a strong influence on the outcome of a Bayesian analysis. In our example, we adopted a uniform prior for the unknown line strength. Was this the right thing to do?

  Tmin < T < Tmax, where Tmin = 0.1 and Tmax = 100 (given in the problem)

Implication: we don't know the scale. A uniform prior heavily weights the upper decade of the range. In such cases, consider the scale-invariant Jeffreys prior (equal probability per decade), defined as:

  P(T|I) = 1 / [ T ln(Tmax / Tmin) ]

[Figure: the uniform and Jeffreys priors, shown both as PDFs and as probability per log interval]
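A quick numerical check of the "equal probability per decade" claim (a sketch, using the slide's range 0.1 to 100):

```python
import numpy as np

# Probability mass each prior assigns to each decade of 0.1 <= T <= 100.
T_min, T_max = 0.1, 100.0

def uniform_mass(a, b):
    # uniform prior: mass proportional to interval length
    return (b - a) / (T_max - T_min)

def jeffreys_mass(a, b):
    # Jeffreys prior 1/(T ln(T_max/T_min)): mass proportional to log width
    return np.log(b / a) / np.log(T_max / T_min)

for a, b in [(0.1, 1), (1, 10), (10, 100)]:
    print(f"[{a}, {b}]: uniform {uniform_mass(a, b):.3f}, "
          f"jeffreys {jeffreys_mass(a, b):.3f}")
```

The uniform prior puts about 90% of its mass in the top decade [10, 100], while the Jeffreys prior assigns exactly one third to each decade.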
Varying the Prior: Spectral Line Fit

[Figure: posterior PDF for line strength T under the uniform and Jeffreys priors]
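A comparison of the two posteriors can be sketched numerically on a grid (an illustration under assumed simulated data with a weak line, T = 1; not the slide's actual dataset):

```python
import numpy as np

# Posterior on a grid of T under each prior: posterior ∝ prior × likelihood.
nu0, sigma_L, sigma_n = 37.0, 2.0, 1.0
T_min, T_max = 0.1, 100.0
channels = np.arange(1, 65)          # assumed channel range
f = np.exp(-(channels - nu0) ** 2 / (2 * sigma_L ** 2))

rng = np.random.default_rng(2)
d = 1.0 * f + rng.normal(0, sigma_n, channels.size)   # weak line, T_true = 1

T = np.linspace(T_min, T_max, 20000)
log_like = np.array([-np.sum((d - t * f) ** 2) / (2 * sigma_n ** 2) for t in T])
like = np.exp(log_like - log_like.max())

dT = T[1] - T[0]
means = {}
for name, prior in [("uniform", np.ones_like(T)), ("jeffreys", 1.0 / T)]:
    post = prior * like
    post /= post.sum() * dT          # normalize on the grid
    means[name] = (T * post).sum() * dT
    print(name, round(means[name], 2))
```

Because the 1/T weighting favors small values, the Jeffreys posterior mean sits below the uniform-prior mean; for a weak line the difference is noticeable.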
Increasing the Line Strength T

In the case of a stronger line detection, the data place a more powerful constraint on the parameters, so the choice of prior is less critical.

[Figure: posterior PDF for line strength T under the uniform and Jeffreys priors, stronger-line case]
Ignorance Priors

How do we select a prior when (we think) we have no clue? The principle of indifference states p(Ai|B) = 1/N, where there are N possible states.

Location parameter: a location measured from some origin:

  p(X|I) = p(x → x+dx | I)

From a different (arbitrary) origin, p(X'|I) = p(x' → x'+dx' | I), where x' = x + c.

Indifference requires p(X|I) = p(X'|I), so that pdf(x) = pdf(x') = pdf(x+c).

The solution to this is pdf(x) = constant (a uniform prior).
Ignorance Priors

Scale parameter: for example, the half-life of a new radioactive element. Ignorance of a scale parameter implies that the distribution should be invariant whether measured in units t or t' = βt.

Then p(T|I) dT = p(T'|I) dT', with dT' = β dT, so that

  pdf(t) = β pdf(t') = β pdf(βt)

The solution to this is pdf(t) = constant/t (the Jeffreys prior).
Improper Priors

Suppose we have absolutely no idea of the limits Xmin and Xmax (a recent physics example: the distance to a GRB).

A uniform prior with an infinite range cannot be normalized. Such priors are known as improper priors.

Improper priors can still be used for parameter estimation problems (like the previous problem), but not for model comparison (Lecture 4), where the normalization of the prior is required to obtain probabilities.
Nuisance Parameters

Frequently, we are only interested in a subset of the model parameters. The uninteresting parameters are called nuisance parameters.

• Example: we may be interested in the frequency ω of a sinusoidal signal in a noisy dataset, but not in the phase φ or the amplitude a.

We obtain the marginal posterior for ω by marginalization (integration) over the nuisance parameters:

  P(ω|D,I) = ∫dφ ∫da P(ω, φ, a|D,I)
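Marginalization is straightforward to do numerically on a grid. The sketch below is a simplified, assumed example (a sinusoid with known phase, so only the amplitude is a nuisance parameter), not the slide's full three-parameter problem:

```python
import numpy as np

# Toy model y = A*sin(omega*t) + noise; we want P(omega|D,I) with the
# amplitude A as a nuisance parameter, under uniform priors.
sigma = 0.5
rng = np.random.default_rng(3)
t = np.linspace(0, 10, 200)
y = 1.5 * np.sin(2.0 * t) + rng.normal(0, sigma, t.size)   # true omega = 2.0

omega = np.linspace(1.5, 2.5, 201)
A = np.linspace(0.1, 3.0, 201)

# log joint posterior on the (omega, A) grid
log_post = np.array([[-np.sum((y - a * np.sin(w * t)) ** 2) / (2 * sigma ** 2)
                      for a in A] for w in omega])
post = np.exp(log_post - log_post.max())

# marginalize out A: P(omega|D,I) = ∫ dA P(omega, A|D,I)
dA = A[1] - A[0]
marginal = post.sum(axis=1) * dA
omega_best = omega[np.argmax(marginal)]
print(round(omega_best, 2))  # peak of the marginal posterior, near 2.0
```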
Likely Examples of Nuisance Parameters

Spectral line fitting: a Gaussian line profile in noisy data. We are given the model M:

  T fi = T exp[ -(νi - ν0)² / (2σL²) ]

where ν0 = 37 and σL = 2 (channels).

The noise has been independently characterized as Gaussian with σn = 1 (in units of the signal). Estimates of T from theory are uncertain over three orders of magnitude, from 0.1 to 100.
Maximum Likelihood

Hypothesis H pertains to the PDF for a variable x:

  P(X|D,I) = P(X|I) P(D|X,I) / P(D|I)

Assume a uniform prior and ignore the normalization factor. Then we have:

  P(X|D,I) ∝ P(D|X,I)

The value x0 that maximizes the posterior in this case is the one that maximizes the likelihood function P(D|X,I), and is referred to as the Maximum Likelihood estimator.
Maximum Likelihood and Least Squares

Assuming the noise is Gaussian, then for each datum:

  P(Di|X,I) = [ 1 / (σi √(2π)) ] exp[ -(Fi - Di)² / (2σi²) ]

where Fi = f(xi) is our ideal (noiseless) model prediction.

Given a set of data D whose individual points are independent, the likelihood is:

  P(D|X,I) = ∏i P(Di|X,I) ∝ exp(-χ²/2), where χ² = Σi (Fi - Di)² / σi²

Since the location of a maximum is not affected by a monotonic transformation:

  L = ln[ P(X|D,I) ] = constant - χ²/2

Maximum Likelihood is obtained by minimizing χ²: we have recovered the well-known Least Squares optimization result.
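The equivalence can be verified numerically: minimizing χ² by brute force should land on the standard weighted least-squares solution. A sketch for an assumed straight-line model (data, model, and grids are all illustrative):

```python
import numpy as np

# Chi-squared minimization vs. weighted least squares for F_i = m*x_i + b.
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
sigma = np.full(x.size, 0.3)
y = 2.0 * x + 1.0 + rng.normal(0, sigma)     # true m = 2, b = 1

def chi2(m, b):
    # chi^2 = sum_i (F_i - D_i)^2 / sigma_i^2
    return np.sum(((m * x + b - y) / sigma) ** 2)

# brute-force minimum over a (m, b) grid...
m_grid = np.linspace(1.5, 2.5, 301)
b_grid = np.linspace(0.0, 2.0, 301)
chi = np.array([[chi2(m, b) for b in b_grid] for m in m_grid])
i, j = np.unravel_index(chi.argmin(), chi.shape)

# ...against the standard weighted least-squares fit
m_ls, b_ls = np.polyfit(x, y, 1, w=1 / sigma)
print(m_grid[i], b_grid[j], m_ls, b_ls)      # the two solutions agree
```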