
Bayesian Statistics in Analysis

Harrison B. Prosper, Florida State University

Workshop on Top Physics: from the TeVatron to the LHC
October 19, 2007

Outline

Introduction

Inference

Model Selection

Summary

Introduction

Blaise Pascal, 1670
Thomas Bayes, 1763
Pierre Simon de Laplace, 1812

Introduction

Let P(A) and P(B) be probabilities, assigned to statements, or events, A and B, and let P(AB) be the probability assigned to the joint statement AB. Then the conditional probability of A given B is defined by

P(A|B) = P(AB) / P(B)

P(A) is the probability of A without the restriction specified by B.
P(A|B) is the probability of A when we restrict to the conditions specified by statement B.

Similarly, the conditional probability of B given A is P(B|A) = P(AB) / P(A).

Introduction

From P(AB) = P(A|B) P(B) = P(B|A) P(A) we deduce immediately Bayes' Theorem:

P(A|B) = P(B|A) P(A) / P(B)

Bayesian statistics is the application of Bayes' theorem to problems of inference.
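As a minimal numerical illustration of the theorem (my own sketch, with made-up probabilities, not from the talk):

```python
# Minimal sketch (hypothetical numbers): check Bayes' theorem P(A|B) = P(B|A) P(A) / P(B)
p_A = 0.01                      # prior probability of A, e.g. "the event is signal"
p_B_given_A = 0.95              # P(B|A), e.g. probability of passing a cut given signal
p_B_given_notA = 0.10           # P(B|not A)

# Total probability of B
p_B = p_B_given_A * p_A + p_B_given_notA * (1.0 - p_A)

# Bayes' theorem
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(A|B) = {p_A_given_B:.4f}")   # ~0.088: passing the cut raises P(A) from 1% to ~9%
```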

Inference


The Bayesian approach to inference is conceptually simple and always the same:

Compute Pr(Data | Model)

Compute Pr(Model | Data) = Pr(Data | Model) Pr(Model) / Pr(Data)

Pr(Model) is called the prior. It is the probability assigned to the Model irrespective of the Data.
Pr(Data | Model) is called the likelihood.
Pr(Model | Data) is called the posterior probability.

Inference

In practice, inference is done using the continuous form of Bayes' theorem:

p(θ, ω | D) = p(D | θ, ω) π(θ, ω) / ∫ p(D | θ, ω) π(θ, ω) dθ dω

(posterior density = likelihood × prior density, suitably normalized)

p(θ | D) = ∫ p(θ, ω | D) dω     (marginalization)

θ are the parameters of interest; ω denotes all the other parameters of the problem, which are referred to as nuisance parameters.
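As a rough sketch of how the marginalization integral is carried out in practice (my own illustration, with an arbitrary toy likelihood and a flat prior, not from the talk), one can evaluate the posterior on a grid and sum over the nuisance parameter:

```python
import numpy as np

# Toy sketch (assumed example): parameter of interest theta, nuisance omega,
# flat prior, Gaussian stand-in for the likelihood p(D | theta, omega).
theta = np.linspace(-5, 5, 201)
omega = np.linspace(-5, 5, 201)
T, W = np.meshgrid(theta, omega, indexing="ij")

def likelihood(t, w):
    # stand-in for p(D | theta, omega); a correlated Gaussian, purely illustrative
    return np.exp(-0.5 * (t**2 - 1.6 * t * w + w**2))

prior = 1.0                                   # flat prior pi(theta, omega) = constant
post_joint = likelihood(T, W) * prior
post_joint /= post_joint.sum()                # normalize: the denominator integral

post_theta = post_joint.sum(axis=1)           # marginalization over omega
print("posterior mode of theta:", theta[np.argmax(post_theta)])
```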

Example – 1

Model
Likelihood: P(D | s, b) = Poisson(N, s + b)
s is the mean signal count
b is the mean background count

Datum: D = {N}

Prior information: 0 ≤ s ≤ s_max, plus an estimate of the mean background count b

Task: Infer s, given N

Example – 1

Apply Bayes' theorem:

p(s, b | D) = P(D | s, b) π(s, b) / ∫ P(D | s, b) π(s, b) ds db

(posterior = likelihood × prior, suitably normalized)

π(s, b) is the prior density for s and b, which encodes our prior knowledge of the signal and background means. The encoding is often difficult and can be controversial.

Example – 1

First factor the prior:

π(s, b) = π(b | s) π(s) = π(b) π(s)

Define the marginal likelihood

l(D | s) = ∫ P(D | s, b) π(b) db

and write the posterior density for the signal as

p(s | D) = l(D | s) π(s) / ∫ l(D | s) π(s) ds

Example – 1

The Background Prior Density

Suppose that the background has been estimated from a Monte Carlo simulation of the background process, yielding B events that pass the cuts.

Assume that the probability for the count B is given by P(B | μ) = Poisson(B, μ), where μ is the (unknown) mean count of the Monte Carlo sample. We can infer the value of μ by applying Bayes' theorem to the Monte Carlo background experiment:

p(μ | B) = P(B | μ) π(μ) / ∫ P(B | μ) π(μ) dμ

Example – 1

The Background Prior Density

Assuming a flat prior π(μ) = constant, we find

p(μ | B) = Gamma(μ, 1, B+1) = μ^B exp(-μ) / B!

Often the mean background count b in the real experiment is related to the mean count μ in the Monte Carlo experiment linearly, b = k μ, where k is an accurately known scale factor, for example, the ratio of the data to Monte Carlo integrated luminosities.

The background can then be estimated as b̂ = k B, with uncertainty δb ≈ k √B.
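A minimal numerical sketch of this step (my own illustration; the values of B and k are hypothetical):

```python
import numpy as np
from scipy.stats import gamma

B = 40       # hypothetical Monte Carlo count passing the cuts
k = 0.25     # hypothetical data/MC integrated-luminosity ratio

# Posterior for the MC mean mu, given a flat prior: Gamma with shape B+1, scale 1
mu_posterior = gamma(a=B + 1, scale=1.0)

# Point estimate of the real-experiment background b = k*mu and its spread
b_hat   = k * B               # b-hat = k B
b_sigma = k * np.sqrt(B)      # delta b ~ k sqrt(B)
print(f"b = {b_hat:.2f} +/- {b_sigma:.2f}")

# The prior density for b follows by the change of variables b = k*mu
b_grid = np.linspace(0.0, 3.0 * b_hat, 301)
prior_b = mu_posterior.pdf(b_grid / k) / k
```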

Example – 1

The Background Prior Density

The posterior density p(μ | B) now serves as the prior density for the background b in the real experiment:

π(b) = p(μ | B), where b = k μ.

We can therefore write

l(D | s) = ∫ P(D | s, kμ) p(μ | B) dμ

and

p(s | D) = l(D | s) π(s) / ∫ l(D | s) π(s) ds

Example – 1

The calculation of the marginal likelihood yields:

l(D | s) = ∫_0^∞ P(D | s, kμ) p(μ | B) dμ
         = ∫_0^∞ [e^{-(s+kμ)} (s + kμ)^N / N!] [μ^B e^{-μ} / B!] dμ
         = e^{-s} Σ_{r=0}^{N} s^r k^{N-r} (N - r + B)! / [ r! (N - r)! B! (1 + k)^{N-r+B+1} ]
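A numerical cross-check of this closed form, and the resulting signal posterior, can be sketched as follows (my own illustration; N, B, k and the flat signal prior range are hypothetical):

```python
import numpy as np
from math import exp, factorial
from scipy import integrate
from scipy.stats import poisson, gamma

N, B, k = 7, 40, 0.25           # hypothetical observed count, MC count, scale factor
s_max = 25.0                    # flat prior for s on [0, s_max]

# Closed-form marginal likelihood l(D|s) from the sum over r
def l_marginal(s):
    return exp(-s) * sum(
        s**r * k**(N - r) * factorial(N - r + B)
        / (factorial(r) * factorial(N - r) * factorial(B) * (1 + k)**(N - r + B + 1))
        for r in range(N + 1))

# Cross-check against direct numerical integration over the MC mean mu, at s = 3
direct, _ = integrate.quad(
    lambda mu: poisson.pmf(N, 3.0 + k * mu) * gamma.pdf(mu, a=B + 1, scale=1.0), 0.0, np.inf)
print(l_marginal(3.0), direct)   # the two numbers should agree closely

# Posterior p(s|D) with a flat prior on [0, s_max], and a 95% credible upper limit on s
s_grid = np.linspace(0.0, s_max, 501)
ds = s_grid[1] - s_grid[0]
post = np.array([l_marginal(s) for s in s_grid])
post /= post.sum() * ds
cdf = np.cumsum(post) * ds
print("95% upper limit on s:", s_grid[np.searchsorted(cdf, 0.95)])
```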

Example – 2: Top Mass – Run I

The data are partitioned into K bins and modeled by a sum of N sources of strength p. The numbers A_ji are the source distributions for model M. Each M corresponds to a different top signal + background model.

d_i = Σ_{j=1}^{N} p_j a_ji     (model)

P(D | a, p, M) = Π_{i=1}^{K} exp(-d_i) d_i^{D_i} / D_i!     (likelihood)

π(a, p | M) = π(p) Π_{j,i} exp(-a_ji) a_ji^{A_ji} / A_ji!     (prior)

P(M | D) ∝ ∫ P(a, p, D | M) da dp     (posterior)
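A rough sketch of evaluating such a binned, multi-source likelihood (my own illustration with made-up counts; in the real analysis the a_ji come from Monte Carlo and are themselves marginalized):

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical inputs: K = 5 bins, N = 2 sources (top signal for one m_top hypothesis, background)
D = np.array([3, 7, 9, 6, 2])                      # observed counts per bin
a = np.array([[0.4, 1.5, 2.5, 1.8, 0.6],           # a[j, i]: source distributions (signal)
              [2.0, 4.0, 5.0, 3.5, 1.5]])          # (background)
p = np.array([1.2, 1.0])                           # source strengths

d = p @ a                                          # model: d_i = sum_j p_j a_ji
log_likelihood = poisson.logpmf(D, d).sum()        # log P(D | a, p, M)
print(log_likelihood)
```

Marginalizing this likelihood over p (and over the a_ji, using their Monte Carlo priors) for each candidate top-mass model M gives the model probabilities P(M | D) shown on the next slide.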

Example – 2: Top Mass – Run I

[Figure: probability of model M, P(M | d), versus top quark mass (GeV/c²), for masses from 130 to 230 GeV]

m_top = 173.5 ± 4.5 GeV
s = 33 ± 8 events
b = 50.8 ± 8.3 events

To Bin Or Not To Bin

Binned – Pros:
Likelihood can be modeled accurately.
Bins with low counts can be handled exactly.
Statistical uncertainties handled exactly.

Binned – Cons:
Information loss can be severe.
Suffers from the curse of dimensionality.

To Bin Or Not To Bin

December 8, 2006 - Binned likelihoods do work!

To Bin Or Not To Bin

Un-Binned – Pros:
No loss of information (in principle).

Un-Binned – Cons:
Can be difficult to model the likelihood accurately.
Requires fitting (either parametric or KDE).
The error in the likelihood grows approximately linearly with the sample size, so at the LHC large sample sizes could become an issue.

Un-binned Likelihood Functions

Start with the standard binned likelihood over K bins,

P(D | a, b) = Π_{i=1}^{K} exp(-d_i) d_i^{D_i} / D_i!
            = exp(-Σ_{i=1}^{K} d_i) Π_{i=1}^{K} d_i^{D_i} / D_i!     (likelihood)

with the model

d_i = a_i + b_i     (model)

Un-binned Likelihood Functions

Make the bins smaller and smaller,

d_i = d(x_i) Δx_i = [a(x_i) + b(x_i)] Δx_i

and the likelihood becomes

P(D | A, B) = exp[-∫ (a(x) + b(x)) dx] Π_{i=1}^{K} [a(x_i) + b(x_i)] Δx_i
            = exp[-(A + B)] Π_{i=1}^{K} [a(x_i) + b(x_i)] Δx_i

where K is now the number of events, a(x) and b(x) are the effective luminosity and background densities, respectively, and A and B are their integrals.

Un-binned Likelihood Functions

The un-binned likelihood function

p(D | A, B) = exp[-(A + B)] Π_{i=1}^{K} [a(x_i) + b(x_i)]

is an example of a marked Poisson likelihood. Each event is marked by the discriminating variable x_i, which could be multi-dimensional.

The various methods for measuring the top cross section and mass differ in the choice of discriminating variables x.
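A minimal sketch of evaluating this marked Poisson log-likelihood (my own illustration; the densities a(x), b(x) and the event sample are made up):

```python
import numpy as np

# Hypothetical effective-luminosity (signal) and background densities in a 1-D mark x on [0, 1]
def a_density(x, A=10.0):                 # integrates to A over [0, 1]
    return A * 2.0 * x                    # rising, signal-like shape
def b_density(x, B=50.0):                 # integrates to B over [0, 1]
    return B * 2.0 * (1.0 - x)            # falling, background-like shape

A, B = 10.0, 50.0
x_events = np.random.default_rng(1).uniform(0.0, 1.0, size=55)   # stand-in event marks

# Marked Poisson log-likelihood: -(A + B) + sum_i log[a(x_i) + b(x_i)]
log_L = -(A + B) + np.log(a_density(x_events, A) + b_density(x_events, B)).sum()
print(log_L)
```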

Un-binned Likelihood Functions

Note: since the functions a(x) and b(x) have to be modeled, they will depend on sets of modeling parameters α and β, respectively. Therefore, in general, the un-binned likelihood function is

p(D | σ, A, B, α, β) = exp[-(σA + B)] Π_{i=1}^{K} [σ a(x_i, α) + b(x_i, β)]

which must be combined with a prior density

π(σ, A, B, α, β)

to compute the posterior density for the cross section:

p(σ | D) ∝ ∫ p(D | σ, A, B, α, β) π(σ, A, B, α, β) dA dB dα dβ
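One common way to carry out the integral over the nuisance parameters is to sample them from their priors and average the likelihood. The sketch below is my own illustration under assumed toy shapes a(x, α), b(x, β), Gaussian priors, and a flat prior in σ; none of these choices come from the talk:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)

# Hypothetical event marks on [0, 1] and unit-normalized signal/background shapes
x = rng.uniform(0.0, 1.0, size=40)
f_a = lambda x, alpha: alpha * x**(alpha - 1.0)          # signal shape, parameter alpha
f_b = lambda x, beta: beta * (1.0 - x)**(beta - 1.0)     # background shape, parameter beta

def log_like(sigma, A, B, alpha, beta):
    # marked Poisson likelihood with signal density sigma*A*f_a and background B*f_b
    return -(sigma * A + B) + np.log(sigma * A * f_a(x, alpha) + B * f_b(x, beta)).sum()

# Draw the nuisance parameters (A, B, alpha, beta) from hypothetical priors
n_mc = 2000
A_s, B_s = rng.normal(10.0, 1.0, n_mc), rng.normal(50.0, 7.0, n_mc)
al_s, be_s = rng.normal(2.0, 0.2, n_mc), rng.normal(2.0, 0.2, n_mc)

# p(sigma | D): prior-average of the likelihood (flat prior in sigma), on a grid;
# the constant 1/n_mc drops out in the normalization below
sigma_grid = np.linspace(0.01, 3.0, 60)
log_post = np.array([logsumexp([log_like(s, A, B, a, b)
                                for A, B, a, b in zip(A_s, B_s, al_s, be_s)])
                     for s in sigma_grid])
post = np.exp(log_post - log_post.max())
post /= post.sum() * (sigma_grid[1] - sigma_grid[0])
print("posterior mode of the cross section:", sigma_grid[np.argmax(post)])
```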

Computing the Un-binned Likelihood Function

If we write s(x) = σ a(x) and S = σ A, we can re-write the un-binned likelihood function as

p(D | S, B) = exp[-(S + B)] Π_{i=1}^{K} [s(x_i) + b(x_i)]

Since a likelihood function is defined only to within a scaling by a parameter-independent quantity, we are free to scale it by, for example, the observed distribution d(x):

p(D | S, B) = exp[-(S + B)] Π_{i=1}^{K} [s(x_i) + b(x_i)] / d(x_i)

Computing the Un-binned Likelihood Function

One way to approximate the ratio [s(x) + b(x)] / d(x) is with a neural network function trained with an admixture of data, signal and background in the ratio 2:1:1.

If the training can be done accurately enough, the network will approximate

n(x) = [s(x) + b(x)] / [s(x) + b(x) + d(x)]

in which case we can then write

p(D | S, B) = exp[-(S + B)] Π_{i=1}^{K} n(x_i) / [1 - n(x_i)]
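A rough sketch of this trick (my own illustration, using scikit-learn's MLPClassifier as the neural network and toy Gaussian samples in place of real signal, background and data):

```python
import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy stand-ins: "signal" and "background" samples, and a "data" sample with a different mixture
sig  = rng.normal(+1.0, 1.0, size=5000)
bkg  = rng.normal(-1.0, 1.0, size=5000)
data = np.concatenate([rng.normal(+1.0, 1.0, size=3000),
                       rng.normal(-1.0, 1.0, size=7000)])

# Class 1 = signal + background admixture, class 0 = data (equal class sizes, cf. the 2:1:1 recipe)
X = np.concatenate([sig, bkg, data]).reshape(-1, 1)
y = np.concatenate([np.ones(len(sig) + len(bkg)), np.zeros(len(data))])
net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0).fit(X, y)

# With equal class sizes the network output approximates
#   n(x) = f_{s+b}(x) / [f_{s+b}(x) + f_d(x)]
# for the normalized sample densities, so n/(1-n) approximates their ratio; in the real
# method the 2:1:1 admixture fixes the relative normalizations so that n/(1-n)
# tracks [s(x)+b(x)]/d(x) in the scaled likelihood.
x_test = np.array([[-2.0], [0.0], [2.0]])
n = net.predict_proba(x_test)[:, 1]
true_ratio = (0.5 * norm.pdf(x_test[:, 0], +1) + 0.5 * norm.pdf(x_test[:, 0], -1)) / \
             (0.3 * norm.pdf(x_test[:, 0], +1) + 0.7 * norm.pdf(x_test[:, 0], -1))
print(np.c_[x_test[:, 0], n / (1 - n), true_ratio])   # network ratio vs exact toy ratio
```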

Model Selection

Model selection can also be addressed using Bayes' theorem. It requires computing

P(M | D) = p(D | M) P(M) / p(D)     (posterior ∝ evidence × prior)

where the evidence for model M is defined by

p(D | M) = ∫ p(D | θ_M, ω_M, M) π(θ_M, ω_M | M) dθ_M dω_M

Model Selection

P(M | D) / P(N | D) = [p(D | M) / p(D | N)] × [P(M) / P(N)]

posterior odds = Bayes factor B_MN × prior odds

The Bayes factor, B_MN, or any one-to-one function thereof, can be used to choose between two competing models M and N, e.g., signal + background versus background only.

However, one must be careful to use proper priors.

Model Selection – Example

Consider the following two prototypical models:

Model 1:  P(D | s, b) = Poisson(N, s + b), with prior π(s, b)
Model 2:  P(D | b) = Poisson(N, b), with prior π(b)

The Bayes factor for these models is given by

B_12 = ∫ Poisson(N, s + b) π(s, b) ds db / ∫ Poisson(N, b) π(b) db
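A minimal numerical sketch of this Bayes factor (my own illustration; the observed count and the particular proper priors chosen here, a gamma prior for b and a flat prior on a finite range for s, are hypothetical):

```python
import numpy as np
from scipy import integrate
from scipy.stats import poisson, gamma

N = 12                              # hypothetical observed count
s_max = 20.0                        # proper (normalized) flat prior for s on [0, s_max]
b_prior = gamma(a=41, scale=0.25)   # hypothetical proper prior for b (e.g. from an MC estimate)

# Evidence for model 1 (signal + background): integrate over s and b
def integrand_1(b, s):
    return poisson.pmf(N, s + b) * b_prior.pdf(b) / s_max
evidence_1, _ = integrate.dblquad(integrand_1, 0.0, s_max, 0.0, np.inf)

# Evidence for model 2 (background only): integrate over b
evidence_2, _ = integrate.quad(lambda b: poisson.pmf(N, b) * b_prior.pdf(b), 0.0, np.inf)

B12 = evidence_1 / evidence_2
print(f"B12 = {B12:.2f}, ln B12 = {np.log(B12):.2f}")
```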

Model Selection – Example

Calibration of Bayes Factors

Consider the quantity (called the Kullback–Leibler divergence)

k(2||1) = ∫ P(D | 2) ln [P(D | 2) / P(D | 1)] dD

For the simple Poisson models with known signal and background, it is easy to show that

k(2||1) = (s + b) ln(1 + s/b) - s

For s << b, this gives k(2||1) ≈ s²/2b, that is, √(2 k(2||1)) ≈ s/√b. Roughly speaking, then, for s << b, √(2 ln B12) ≈ s/√b.
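A quick numerical check of this formula and its small-signal approximation (my own sketch; the s and b values are arbitrary):

```python
import numpy as np
from scipy.stats import poisson

s, b = 5.0, 100.0   # hypothetical known signal and background means, s << b

# Exact Kullback-Leibler divergence between Poisson(N, s+b) (model 2) and Poisson(N, b) (model 1)
N = np.arange(0, 400)
p2 = poisson.pmf(N, s + b)
p1 = poisson.pmf(N, b)
k_exact = np.sum(p2 * np.log(p2 / p1))

k_formula = (s + b) * np.log(1.0 + s / b) - s       # closed form quoted above
print(k_exact, k_formula)                           # should agree
print(np.sqrt(2.0 * k_formula), s / np.sqrt(b))     # ~0.5 each: sqrt(2k) vs s/sqrt(b)
```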

Summary

Bayesian statistics is a well-founded and general framework for thinking about and solving analysis problems, including:
Analysis design
Modeling uncertainty
Parameter estimation
Interval estimation (limit setting)
Model selection
Signal/background discrimination
etc.

It is well worth learning how to think this way!