TRANSCRIPT
Harrison B. Prosper Workshop on Top Physics, Grenoble
Bayesian Statistics in Analysis
Harrison B. Prosper, Florida State University
Workshop on Top Physics: from the TeVatron to the LHC
October 19, 2007
Outline
Introduction
Inference
Model Selection
Summary
Introduction

Blaise Pascal (1670)
Thomas Bayes (1763)
Pierre Simon de Laplace (1812)
Introduction

Let P(A) and P(B) be probabilities, assigned to statements, or events, A and B, and let P(AB) be the probability assigned to the joint statement AB. Then the conditional probability of A given B is defined by

P(A | B) = P(AB) / P(B)

P(A) is the probability of A without the restriction specified by B.
P(A | B) is the probability of A when we restrict to the conditions specified by statement B.

Likewise, the conditional probability of B given A is

P(B | A) = P(AB) / P(A)
Introduction

From P(A | B) P(B) = P(AB) = P(B | A) P(A) we deduce immediately Bayes' Theorem:

P(A | B) = P(B | A) P(A) / P(B)

Bayesian statistics is the application of Bayes' theorem to problems of inference.
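As a quick illustration (the two hypotheses and all probability values below are invented, not from the talk), Bayes' theorem for a discrete pair of models can be computed directly:

```python
# Hypothetical two-model example: applying Bayes' theorem to get
# Pr(Model | Data) from Pr(Data | Model) and Pr(Model).

prior = {"signal": 0.1, "background": 0.9}        # Pr(Model)
likelihood = {"signal": 0.8, "background": 0.2}   # Pr(Data | Model)

# Pr(Data) = sum over models of Pr(Data | Model) Pr(Model)
p_data = sum(likelihood[m] * prior[m] for m in prior)

# Pr(Model | Data) = Pr(Data | Model) Pr(Model) / Pr(Data)
posterior = {m: likelihood[m] * prior[m] / p_data for m in prior}
print(posterior)
```

The posterior probabilities sum to one by construction, since Pr(Data) is exactly the normalization over the two models.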
Inference

The Bayesian approach to inference is conceptually simple and always the same:

Compute Pr(Data | Model)
Compute Pr(Model | Data) = Pr(Data | Model) Pr(Model) / Pr(Data)

Pr(Model) is called the prior. It is the probability assigned to the Model irrespective of the Data.
Pr(Data | Model) is called the likelihood.
Pr(Model | Data) is called the posterior probability.
Inference

In practice, inference is done using the continuous form of Bayes' theorem:

p(θ, λ | D) = p(D | θ, λ) π(θ, λ) / ∫ p(D | θ, λ) π(θ, λ) dθ dλ

where p(D | θ, λ) is the likelihood, π(θ, λ) is the prior density, and p(θ, λ | D) is the posterior density. Here θ are the parameters of interest; λ denote all other parameters of the problem, which are referred to as nuisance parameters. The nuisance parameters are removed by marginalization:

p(θ | D) = ∫ p(θ, λ | D) dλ
Example – 1

Model: s is the mean signal count, b is the mean background count
Datum: D = {N}
Likelihood: P(D | s, b) = Poisson(N, s + b)
Prior information: 0 ≤ s ≤ s_max, together with an estimate b̂ of the background mean b
Task: Infer s, given N
Example – 1

Apply Bayes' theorem:

p(s, b | D) = P(D | s, b) π(s, b) / ∫∫ P(D | s, b) π(s, b) ds db

π(s, b) is the prior density for s and b, which encodes our prior knowledge of the signal and background means. The encoding is often difficult and can be controversial.
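This posterior can be evaluated numerically on a grid. The sketch below assumes an illustrative observed count N, a flat prior, and finite grid ranges; none of these values come from the talk:

```python
import math

# Grid evaluation of p(s, b | D) = P(D | s, b) pi(s, b) / normalization,
# followed by marginalization over b. N, the flat prior, and the grid
# ranges are illustrative assumptions.

N = 5                                      # observed count
ds, db = 0.05, 0.05
s_grid = [i * ds for i in range(401)]      # 0 <= s <= 20
b_grid = [j * db for j in range(401)]      # 0 <= b <= 20

def poisson(n, mu):
    return math.exp(-mu) * mu**n / math.factorial(n)

# With a flat prior pi(s, b) = constant, the prior cancels between the
# numerator and the normalization integral.
joint = [[poisson(N, s + b) for b in b_grid] for s in s_grid]
norm = sum(sum(row) for row in joint) * ds * db

# p(s | D): marginalize the posterior over the nuisance parameter b
p_s = [sum(row) * db / norm for row in joint]
print(sum(p * ds for p in p_s))            # ~1: properly normalized
```

The same grid-and-marginalize pattern carries over directly when π(s, b) is not flat: multiply each grid cell by the prior before normalizing.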
Example – 1

First factor the prior,

π(s, b) = π(b | s) π(s) = π(b) π(s)

(the last step taking s and b to be independent). Define the marginal likelihood

l(D | s) = ∫ P(D | s, b) π(b) db

and write the posterior density for the signal as

p(s | D) = l(D | s) π(s) / ∫ l(D | s) π(s) ds
Example – 1

The Background Prior Density

Suppose that the background has been estimated from a Monte Carlo simulation of the background process, yielding B events that pass the cuts. Assume that the probability for the count B is given by P(B | λ) = Poisson(B, λ), where λ is the (unknown) mean count of the Monte Carlo sample. We can infer the value of λ by applying Bayes' theorem to the Monte Carlo background experiment:

p(λ | B) = P(B | λ) π(λ) / ∫ P(B | λ) π(λ) dλ
Example – 1

The Background Prior Density

Assuming a flat prior π(λ) = constant, we find

p(λ | B) = Gamma(λ; 1, B+1) = λ^B exp(−λ) / B!

Often the mean background count b in the real experiment is related to the mean count λ in the Monte Carlo experiment linearly, b = kλ, where k is an accurately known scale factor, for example, the ratio of the data to Monte Carlo integrated luminosities. The background can then be estimated as

b̂ = kB,  δb = k√B
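A small numerical check of this result (the values of B and k below are invented): the Gamma posterior has mean B + 1 and standard deviation √(B + 1), consistent with b̂ = kB and δb = k√B for large B.

```python
import math

# Check the Gamma posterior p(lambda | B) = lambda^B exp(-lambda) / B!
# numerically. B and k are illustrative assumptions.

B = 100       # Monte Carlo events passing the cuts
k = 0.25      # hypothetical data/MC luminosity ratio

def gamma_post(lam):
    return lam**B * math.exp(-lam) / math.factorial(B)

# posterior mean by direct numerical integration; should be ~ B + 1
dlam = 0.01
mean = sum(lam * gamma_post(lam) * dlam
           for lam in (i * dlam for i in range(1, 30000)))
print(mean)                          # ~ B + 1 = 101
print(k * B, k * math.sqrt(B))       # background estimate and uncertainty
```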
Example – 1

The Background Prior Density

The posterior density p(λ | B) now serves as the prior density for the background b in the real experiment:

π(b) = p(λ | B), where b = kλ.

We can write

l(D | s) = ∫ P(D | s, kλ) p(λ | B) dλ

and

p(s | D) = l(D | s) π(s) / ∫ l(D | s) π(s) ds
Example – 1

The calculation of the marginal likelihood yields:

l(D | s) = ∫_0^∞ [e^−(s+kλ) (s + kλ)^N / N!] [λ^B e^−λ / B!] dλ
         = e^−s Σ_{r=0}^{N} (s^r / r!) [k^(N−r) / (1 + k)^(N−r+B+1)] [(N − r + B)! / ((N − r)! B!)]
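The closed-form sum can be checked against direct numerical integration of the integrand above; in this sketch the values of s, N, B, and k are arbitrary illustrations:

```python
import math

# Verify the closed-form marginal likelihood against numerical
# integration. The values passed in at the bottom are illustrative.

def l_closed(s, N, B, k):
    # l(D|s) = exp(-s) * sum_r (s^r/r!) (k^(N-r)/(1+k)^(N-r+B+1))
    #                          ((N-r+B)! / ((N-r)! B!))
    total = 0.0
    for r in range(N + 1):
        total += (s**r / math.factorial(r)
                  * k**(N - r) / (1 + k)**(N - r + B + 1)
                  * math.factorial(N - r + B)
                  / (math.factorial(N - r) * math.factorial(B)))
    return math.exp(-s) * total

def l_numeric(s, N, B, k, dlam=0.001, lam_max=100.0):
    # midpoint integration of Poisson(N, s + k*lam) * lam^B e^-lam / B!
    total = 0.0
    lam = dlam / 2
    while lam < lam_max:
        mu = s + k * lam
        total += (math.exp(-mu) * mu**N / math.factorial(N)
                  * lam**B * math.exp(-lam) / math.factorial(B)) * dlam
        lam += dlam
    return total

print(l_closed(3.0, N=5, B=10, k=0.5), l_numeric(3.0, N=5, B=10, k=0.5))
```

The two results agree to high precision, which is a useful sanity check whenever such sums are coded by hand.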
Example – 2: Top Mass – Run I

Data are partitioned into K bins and modeled by a sum of N sources of strength p. The numbers A are the source distributions for model M. Each M corresponds to a different top signal + background model.

Model: d_i = Σ_{j=1}^{N} a_ji p_j

Likelihood: P(D | a, p, M) = Π_{i=1}^{K} exp(−d_i) d_i^{D_i} / D_i!

Prior: π(a, p | M) = π(p) Π_{j,i} exp(−a_ji) a_ji^{A_ji} / A_ji!

Posterior: P(M | D) ∝ P(D | M) = ∫∫ P(D | a, p, M) π(a, p | M) da dp
Example – 2: Top Mass – Run I

[Figure: probability of model M, P(M | D), versus the top quark mass (GeV/c²)]

m_top = 173.5 ± 4.5 GeV
s = 33 ± 8 events
b = 50.8 ± 8.3 events
To Bin Or Not To Bin

Binned – Pros
- Likelihood can be modeled accurately
- Bins with low counts can be handled exactly
- Statistical uncertainties handled exactly

Binned – Cons
- Information loss can be severe
- Suffers from the curse of dimensionality
December 8, 2006 - Binned likelihoods do work!
To Bin Or Not To Bin
To Bin Or Not To Bin

Un-Binned – Pros
- No loss of information (in principle)

Un-Binned – Cons
- Can be difficult to model the likelihood accurately; requires fitting (either parametric or KDE)
- Error in the likelihood grows approximately linearly with the sample size, so at the LHC large sample sizes could become an issue
Un-binned Likelihood Functions

Start with the standard binned likelihood over K bins,

P(D | a, b) = Π_{i=1}^{K} exp(−d_i) d_i^{D_i} / D_i!
            = exp(−Σ_{i=1}^{K} d_i) Π_{i=1}^{K} d_i^{D_i} / D_i!

with the model d_i = a_i + b_i.
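The second equality is just the factorization of a product of independent per-bin Poisson probabilities; a quick numerical check (all counts and means below are invented):

```python
import math

# The binned likelihood is a product of independent Poisson terms, one
# per bin. Counts D and means a, b are illustrative.

D = [3, 7, 2, 0, 5]               # observed counts per bin
a = [1.0, 4.0, 1.5, 0.2, 3.0]     # signal means a_i
b = [1.5, 2.0, 0.5, 0.4, 1.0]     # background means b_i
d = [ai + bi for ai, bi in zip(a, b)]   # model: d_i = a_i + b_i

product_form = math.prod(
    math.exp(-di) * di**Di / math.factorial(Di) for di, Di in zip(d, D))

factored_form = (math.exp(-sum(d))
                 * math.prod(di**Di / math.factorial(Di)
                             for di, Di in zip(d, D)))

print(product_form, factored_form)   # identical up to rounding
```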
Un-binned Likelihood Functions

Make the bins smaller and smaller,

d_i = d(x_i) Δx_i = [σ a(x_i) + b(x_i)] Δx_i

and the likelihood becomes

P(D | σ, A, B) = exp[−∫ (σ a(x) + b(x)) dx] Π_{i=1}^{K} [σ a(x_i) + b(x_i)] Δx_i
               ∝ exp[−(σA + B)] Π_{i=1}^{K} [σ a(x_i) + b(x_i)]

where K is now the number of events, a(x) and b(x) are the effective luminosity and background densities, respectively, and A and B are their integrals.
Un-binned Likelihood Functions

The un-binned likelihood function

p(D | σ, A, B) = exp[−(σA + B)] Π_{i=1}^{K} [σ a(x_i) + b(x_i)]

is an example of a marked Poisson likelihood. Each event is marked by the discriminating variable x_i, which could be multi-dimensional.

The various methods for measuring the top cross section and mass differ in the choice of discriminating variables x.
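A marked-Poisson log-likelihood is easy to evaluate once the densities are specified. In the sketch below, σ, a(x), b(x), and the event marks are toy choices, not anything from the analysis:

```python
import math

# Toy marked-Poisson log-likelihood:
# log p(D | sigma, A, B) = -(sigma*A + B) + sum_i ln[sigma*a(x_i) + b(x_i)]
# All densities and event values are illustrative assumptions.

sigma = 2.0
A, Bint = 1.0, 10.0             # integrals of a(x) and b(x) over [0, 1]

def a(x):                        # effective-luminosity density on [0, 1]
    return 2.0 * x               # integrates to A = 1

def b(x):                        # background density on [0, 1]
    return 20.0 * (1.0 - x)      # integrates to Bint = 10

events = [0.9, 0.8, 0.7, 0.3, 0.75, 0.6, 0.2, 0.85]   # marks x_i

log_like = -(sigma * A + Bint) + sum(
    math.log(sigma * a(x) + b(x)) for x in events)
print(log_like)
```

In practice one would scan σ (and any other parameters) over this function, or feed it to a posterior-sampling step as in the slides.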
Un-binned Likelihood Functions

Note: Since the functions a(x) and b(x) have to be modeled, they will depend on sets of modeling parameters α and β, respectively. Therefore, in general, the un-binned likelihood function is

p(D | σ, A, B, α, β) = exp[−(σA + B)] Π_{i=1}^{K} [σ a(x_i, α) + b(x_i, β)]

which must be combined with a prior density π(σ, A, B, α, β) to compute the posterior density for the cross section:

p(σ | D) ∝ ∫ p(D | σ, A, B, α, β) π(σ, A, B, α, β) dA dB dα dβ
Computing the Un-binned Likelihood Function

If we write s(x) = σ a(x) and S = σA, we can re-write the un-binned likelihood function as

p(D | S, B) = exp[−(S + B)] Π_{i=1}^{K} [s(x_i) + b(x_i)]

Since a likelihood function is defined only to within a scaling by a parameter-independent quantity, we are free to scale it by, for example, the observed distribution d(x):

p(D | S, B) = exp[−(S + B)] Π_{i=1}^{K} [s(x_i) + b(x_i)] / d(x_i)
Computing the Un-binned Likelihood Function

One way to approximate the ratio [s(x) + b(x)] / d(x) is with a neural network function trained with an admixture of data, signal, and background in the ratio 2:1:1. If the training can be done accurately enough, the network will approximate

n(x) = [s(x) + b(x)] / [s(x) + b(x) + d(x)]

in which case we can then write

p(D | S, B) = exp[−(S + B)] Π_{i=1}^{K} n(x_i) / [1 − n(x_i)]
Model Selection

Model selection can also be addressed using Bayes' theorem. It requires computing

P(M | D) = p(D | M) P(M) / p(D)

where the evidence for model M is defined by

p(D | M) = ∫∫ p(D | θ_M, λ_M, M) π(θ_M, λ_M | M) dθ_M dλ_M
Model Selection

posterior odds = Bayes factor × prior odds:

P(M | D) / P(N | D) = [p(D | M) / p(D | N)] × [P(M) / P(N)]

The Bayes factor, B_MN = p(D | M) / p(D | N), or any one-to-one function thereof, can be used to choose between two competing models M and N, e.g., signal + background versus background only.
However, one must be careful to use proper priors.
Model Selection – Example

Consider the following two prototypical models:

Model 1: P(D | s, b, 1) = Poisson(N, s + b), with prior π(s, b)
Model 2: P(D | b, 2) = Poisson(N, b), with prior π(b)

The Bayes factor for these models is given by

B_12 = ∫∫ Poisson(N, s + b) π(s, b) ds db / ∫ Poisson(N, b) π(b) db
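This Bayes factor is simple to compute numerically for simple priors. In the sketch below, b is taken as known exactly (a delta-function prior) and s is given a flat, proper prior on [0, s_max]; N, b, and s_max are all invented values:

```python
import math

# Bayes factor for signal+background (Model 1) vs background-only
# (Model 2), with b known exactly and a flat proper prior for s on
# [0, s_max]. All numerical values are illustrative.

N, b, s_max = 15, 5.0, 30.0

def poisson(n, mu):
    return math.exp(-mu) * mu**n / math.factorial(n)

ds = 0.001
# numerator: integral over s of Poisson(N, s + b) * (1 / s_max)
num = sum(poisson(N, s + b) / s_max
          for s in (i * ds + ds / 2 for i in range(int(s_max / ds)))) * ds
den = poisson(N, b)          # Model 2 likelihood at the known b

B12 = num / den
print(B12)                   # > 1: these data favor the signal model
```

Because the flat prior here is proper (it integrates to one over [0, s_max]), the resulting Bayes factor is well defined, in line with the warning about proper priors above.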
Model Selection – Example

Calibration of Bayes Factors

Consider the quantity (called the Kullback-Leibler divergence)

k(2 || 1) = 2 ∫ P(D | 2) ln[P(D | 2) / P(D | 1)] dD

For the simple Poisson models with known signal and background, it is easy to show that

k(2 || 1) = 2 [s − b ln(1 + s/b)]

For s << b, we get √k(2||1) ≈ s/√b. That is, roughly speaking, for s << b, √(2 ln B_12) ≈ s/√b.
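Both the exact formula and the s/√b limit can be verified numerically; the s and b values below are illustrative:

```python
import math

# Verify the calibration: k(2||1) computed as an explicit expectation
# over the background-only Poisson distribution should match
# 2*[s - b*ln(1 + s/b)], and sqrt(k) should approach s/sqrt(b) for s << b.

def log_poisson(n, mu):
    # ln Poisson(n, mu), in log space to avoid overflow at large n
    return -mu + n * math.log(mu) - math.lgamma(n + 1)

def k_direct(s, b, n_max=200):
    return 2.0 * sum(math.exp(log_poisson(n, b))
                     * (log_poisson(n, b) - log_poisson(n, s + b))
                     for n in range(n_max))

def k_formula(s, b):
    return 2.0 * (s - b * math.log(1.0 + s / b))

s, b = 1.0, 50.0
print(k_direct(s, b), k_formula(s, b))                 # agree
print(math.sqrt(k_formula(s, b)), s / math.sqrt(b))    # nearly equal
```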
Summary

Bayesian statistics is a well-founded and general framework for thinking about and solving analysis problems, including:
- Analysis design
- Modeling uncertainty
- Parameter estimation
- Interval estimation (limit setting)
- Model selection
- Signal/background discrimination
- etc.

It is well worth learning how to think this way!