Goodness of fit, confidence intervals and limits

Jorge Andre Swieca School

Campos do Jordão, January 2003

Fourth lecture


Limits

“Do you, like Hamlet, dread the unknown? But what is known? What is it that you know, that you should call anything in particular unknown?”

Álvaro de Campos (Fernando Pessoa)

“If you have the truth, keep it!” Lisbon Revisited, Álvaro de Campos

Statistical tests

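How well do the data agree with given predicted probabilities, i.e. with a hypothesis?

null hypothesis H0: $f(x|H_0)$

alternatives: $f(x|H_1)$, $f(x|H_2)$, ...

test statistic: a function of the measured variables, $t(x)$, with p.d.f. $g(t|H_0)$ under H0

$\alpha = \int_{t_{cut}}^{\infty} g(t|H_0)\,dt$: significance level (probability of an error of the first kind, i.e. rejecting H0 when it is true)

$\beta = \int_{-\infty}^{t_{cut}} g(t|H_1)\,dt$: probability of an error of the second kind (accepting H0 when H1 is true)

power = $1 - \beta$: power to discriminate against H1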
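A minimal sketch of these definitions, assuming (hypothetically, not from the lecture) that $g(t|H_0)$ and $g(t|H_1)$ are unit-width Gaussians centred at 0 and 2, and that H0 is rejected for $t > t_{cut}$ as in the integrals above. Moving $t_{cut}$ up lowers α but raises β, so the significance level is traded against power.

```python
from statistics import NormalDist

# Hypothetical test-statistic distributions (illustration only):
g_H0 = NormalDist(mu=0.0, sigma=1.0)  # g(t|H0)
g_H1 = NormalDist(mu=2.0, sigma=1.0)  # g(t|H1)

def error_rates(t_cut):
    """Return (alpha, beta, power) when H0 is rejected for t > t_cut."""
    alpha = 1.0 - g_H0.cdf(t_cut)  # integral of g(t|H0) from t_cut to +infinity
    beta = g_H1.cdf(t_cut)         # integral of g(t|H1) from -infinity to t_cut
    return alpha, beta, 1.0 - beta

for t_cut in (1.0, 1.645, 2.0):
    a, b, p = error_rates(t_cut)
    print(f"t_cut={t_cut:5.3f}  alpha={a:.4f}  beta={b:.4f}  power={p:.4f}")
```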

Neyman-Pearson lemma

Where to place $t_{cut}$? (H0: signal, H1: background)

1-D: choose $t_{cut}$ from the efficiency (and purity). m-D: with $\vec{t} = (t_1, \ldots, t_m)$, the definition of the acceptance region is not obvious.

Neyman-Pearson lemma: the highest power (highest signal purity) for a given significance level α is obtained with the region of t-space such that

$\frac{g(\vec{t}|H_0)}{g(\vec{t}|H_1)} > c$,

where c is determined by the desired efficiency.
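A sketch of the likelihood-ratio acceptance region in two dimensions. It assumes a hypothetical setup (not from the lecture): signal (H0) and background (H1) are independent unit Gaussians centred at (1, 1) and (0, 0). The constant c is chosen here by a Monte Carlo quantile so that a fraction 0.90 of signal events is accepted; that numerical choice is ours, not a prescribed method.

```python
import math
import random

# Hypothetical example: t = (t1, t2); H0 (signal) centred at (1, 1),
# H1 (background) centred at (0, 0), both unit-width Gaussians.
MU_H0, MU_H1 = (1.0, 1.0), (0.0, 0.0)

def g(t, mu):
    """2-D unit Gaussian p.d.f. centred at mu, evaluated at t."""
    d2 = (t[0] - mu[0]) ** 2 + (t[1] - mu[1]) ** 2
    return math.exp(-0.5 * d2) / (2.0 * math.pi)

def ratio(t):
    """Likelihood ratio g(t|H0) / g(t|H1)."""
    return g(t, MU_H0) / g(t, MU_H1)

def sample(mu, n, rng):
    return [(rng.gauss(mu[0], 1.0), rng.gauss(mu[1], 1.0)) for _ in range(n)]

def cut_for_efficiency(eff, n=50_000, seed=1):
    """Pick c so that a fraction `eff` of simulated H0 events has ratio > c."""
    r = sorted(ratio(t) for t in sample(MU_H0, n, random.Random(seed)))
    return r[int((1.0 - eff) * n)]

def accepted_fraction(mu, c, n=50_000, seed=2):
    """Fraction of events from the Gaussian centred at mu with ratio(t) > c."""
    return sum(ratio(t) > c for t in sample(mu, n, random.Random(seed))) / n

c = cut_for_efficiency(0.90)
print(f"c = {c:.3f}")
print("signal efficiency    :", accepted_fraction(MU_H0, c))
print("background acceptance:", accepted_fraction(MU_H1, c))
```

In this particular example the ratio reduces to $\exp(t_1 + t_2 - 1)$, so the optimal region is simply a cut on $t_1 + t_2$; in general the ratio need not be this simple.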

Goodness of fit

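How well a given null hypothesis H0 is compatible with the observed data (no reference to alternative hypotheses).

Coins: N tosses, $n_h$ heads, $n_t = N - n_h$ tails. Is the coin “fair”, i.e. are H and T equally probable?

Test statistic: $n_h$, binomially distributed with p = 0.5:

$f(n_h; N, p) = \frac{N!}{n_h!\,(N-n_h)!}\, p^{n_h} (1-p)^{N-n_h} = \frac{N!}{n_h!\,(N-n_h)!} \left(\frac{1}{2}\right)^{N}$

N = 20, $n_h$ = 17; $E[n_h] = Np = 10$.

Values of $n_h$ as far from 10 as the observed one, or farther: 0, 1, 2, 3 and 17, 18, 19, 20:

$P = f(0;20,0.5) + f(1;20,0.5) + f(2;20,0.5) + f(3;20,0.5) + f(17;20,0.5) + f(18;20,0.5) + f(19;20,0.5) + f(20;20,0.5)$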
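A short check of the coin example. It assumes that “equal or less compatible with H0” means “at least as far from $E[n_h] = Np$ as the observed $n_h$”, which is the natural choice for the symmetric case p = 0.5.

```python
from math import comb

def binomial_pmf(n, N, p):
    """f(n; N, p) = N! / (n! (N - n)!) p^n (1 - p)^(N - n)."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def two_sided_p_value(n_obs, N, p=0.5):
    """Sum f(n; N, p) over all n at least as far from N*p as n_obs."""
    mean = N * p
    dist = abs(n_obs - mean)
    return sum(binomial_pmf(n, N, p) for n in range(N + 1) if abs(n - mean) >= dist)

print(round(two_sided_p_value(17, 20), 4))  # 0.0026
```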

P = 0.0026.

P-value: the probability P, under H0, of obtaining a result as compatible with H0 as the one actually observed, or less so.

The P-value is a random variable; α is a constant specified before carrying out the test.

Bayesian statistics: use Bayes' theorem to assign a probability to H0 (this requires specifying the prior probability).

The P-value is often incorrectly interpreted as the probability of H0.

P-value: the fraction of times one would obtain data as compatible with H0, or less so, if the experiment (20 coin tosses) were repeated under similar circumstances.
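The frequency interpretation can be sketched by literally repeating the 20-toss experiment many times under H0. This is a hypothetical simulation, not part of the lecture; each random bit stands for one fair toss.

```python
import random

def simulated_p_value(n_obs=17, N=20, trials=200_000, seed=3):
    """Fraction of fair-coin experiments at least as far from N/2 as n_obs."""
    rng = random.Random(seed)
    dist = abs(n_obs - N / 2)
    hits = 0
    for _ in range(trials):
        heads = bin(rng.getrandbits(N)).count("1")  # each bit is one fair toss
        if abs(heads - N / 2) >= dist:
            hits += 1
    return hits / trials

print(simulated_p_value())
```

The printed fraction fluctuates around the exact binomial value 0.0026, and would fluctuate differently for another seed: the P-value computed from one experiment is itself a random variable.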

It is easy to identify the region of values of t with an equal or lower degree of compatibility with the hypothesis than the observed value (alternative hypothesis: p ≠ 0.5).

“optional stopping problem”

Significance of an observed signal

Whether a discrepancy between data and expectation is sufficiently significant to merit a claim for a new discovery

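Signal events $n_s$: Poisson variable with mean $\nu_s$

Background events $n_b$: Poisson variable with mean $\nu_b$

$n = n_s + n_b$: Poisson variable with mean $\nu = \nu_s + \nu_b$

Probability to observe n events:

$f(n; \nu_s, \nu_b) = \frac{(\nu_s + \nu_b)^n}{n!}\, e^{-(\nu_s + \nu_b)}$

Experiment: $n_{obs}$ events. Quantify our degree of confidence in the discovery of a new effect ($\nu_s \neq 0$).

How likely is it to find $n_{obs}$ events or more from background alone?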


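$P(n \geq n_{obs}) = \sum_{n=n_{obs}}^{\infty} f(n; \nu_s = 0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} f(n; \nu_s = 0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} \frac{\nu_b^n}{n!}\, e^{-\nu_b}$

Ex: expect $\nu_b = 0.5$, observe $n_{obs} = 5$: $P(n \geq n_{obs}) = 1.7 \times 10^{-4}$

This is not the probability of the hypothesis $\nu_s = 0$!

It is the probability, under the hypothesis $\nu_s = 0$, of obtaining as many events as observed or more.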
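The background-only tail probability above, written as a small function; the numbers $\nu_b = 0.5$ and $n_{obs} = 5$ are the ones from the example.

```python
from math import exp, factorial

def poisson_tail(n_obs, nu_b):
    """P(n >= n_obs) for a Poisson variable of mean nu_b (background only, nu_s = 0)."""
    below = sum(nu_b**n / factorial(n) * exp(-nu_b) for n in range(n_obs))
    return 1.0 - below

print(f"{poisson_tail(5, 0.5):.2e}")  # 1.72e-04
```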


How to report the measurement?

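Estimate of $\nu_s$: $\hat{\nu}_s = n_{obs} - \nu_b = 4.5 \pm 2.2$ (error $\approx \sqrt{n_{obs}} = \sqrt{5}$)

Misleading:
• only two standard deviations from zero
• gives the impression that $\nu_s$ is not very incompatible with zero

Yes: the probability that a Poisson variable of mean $\nu_b$ will fluctuate up to $n_{obs}$ or higher.

No: the probability that a variable with mean $n_{obs}$ will fluctuate down to $\nu_b$ or lower.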
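A side-by-side sketch of the two ways of reporting the same observation ($n_{obs} = 5$, $\nu_b = 0.5$): the naive “estimate ± error” summary, and the probability that background alone fluctuates up to $n_{obs}$ or more.

```python
from math import exp, factorial, sqrt

n_obs, nu_b = 5, 0.5

# Naive report: nu_s_hat = n_obs - nu_b with error sqrt(n_obs)
nu_s_hat = n_obs - nu_b
sigma = sqrt(n_obs)
naive_z = nu_s_hat / sigma  # number of standard deviations from zero

# Relevant question: background alone fluctuating up to n_obs or higher
poisson_p = 1.0 - sum(nu_b**n / factorial(n) * exp(-nu_b) for n in range(n_obs))

print(f"nu_s = {nu_s_hat:.1f} +- {sigma:.1f} ({naive_z:.1f} std. dev. from zero)")
print(f"P(n >= {n_obs} | nu_b = {nu_b}) = {poisson_p:.1e}")
```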

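Pearson’s χ² test

Histogram of x with N bins: observed contents $n_i$, expected contents $\nu_i$.

Construct a statistic that reflects the level of agreement between the observed and expected histograms:

$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}$

If the data $\vec{n} = (n_1, \ldots, n_N)$ are Poisson distributed with means $\vec{\nu} = (\nu_1, \ldots, \nu_N)$ and approximately Gaussian ($n_i \geq 5$), the statistic follows a $\chi^2$ distribution for N degrees of freedom:
• regardless of the distribution of x
• distribution free

Larger $\chi^2$: larger discrepancy between the data and the hypothesis.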
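A direct transcription of Pearson's statistic. The five-bin histogram below is hypothetical illustration data, chosen so that every bin has at least 5 entries.

```python
def pearson_chi2(observed, expected):
    """chi^2 = sum_i (n_i - nu_i)^2 / nu_i over the N bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((n - nu) ** 2 / nu for n, nu in zip(observed, expected))

# Hypothetical histogram (illustration only): N = 5 bins, all n_i >= 5
n_i = [12, 18, 25, 16, 9]
nu_i = [10.0, 20.0, 22.0, 18.0, 10.0]
print(pearson_chi2(n_i, nu_i), "for", len(n_i), "degrees of freedom")
```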

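$P = \int_{\chi^2}^{\infty} f(z; n_d)\,dz$, with $E[\chi^2] = n_d$, so $\chi^2 / n_d \approx 1$ (rule of thumb for a good fit)

Examples, both with $\chi^2 / n_d = 1.5$: $\chi^2 = 15$, $n_d = 10$: $P = 0.13$; $\chi^2 = 150$, $n_d = 100$: $P \approx 9 \times 10^{-4}$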
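A dependency-free sketch of the P-value integral. It evaluates the $\chi^2$ survival function through the series for the regularized incomplete gamma function, an implementation choice of ours that is adequate for moderate $\chi^2$ and $n_d$ such as the two examples (a library routine would normally be used instead).

```python
from math import exp, lgamma, log

def chi2_p_value(chi2, n_d):
    """P = integral from chi2 to infinity of f(z; n_d) dz.

    Uses the series for the regularized lower incomplete gamma function
    P(a, x) with a = n_d / 2 and x = chi2 / 2; fine for moderate values.
    """
    if chi2 <= 0:
        return 1.0
    a, x = n_d / 2.0, chi2 / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)
        total += term
        k += 1
    lower = total * exp(a * log(x) - x - lgamma(a))
    return max(0.0, 1.0 - lower)

for chi2, n_d in [(15, 10), (150, 100)]:
    print(f"chi2 = {chi2}, n_d = {n_d}: chi2/n_d = {chi2 / n_d:.1f}, "
          f"P = {chi2_p_value(chi2, n_d):.2g}")
```

The same $\chi^2 / n_d$ gives very different P-values: with more degrees of freedom the $\chi^2$ distribution is relatively narrower, so a 50% excess over $n_d$ is far less likely.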

Before: $n_{tot} = \sum_{i=1}^{N} n_i$ was a Poisson variable with mean $\nu_{tot} = \sum_{i=1}^{N} \nu_i$.

Set $n_{tot}$ fixed: the $n_i$ are then distributed as a multinomial with probabilities $p_i = \nu_i / n_{tot}$.

We are then not testing the total number of expected and observed events, but only the distribution of x:

$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - p_i n_{tot})^2}{p_i n_{tot}}$

For a large number of entries in each bin and known $p_i$, this follows a $\chi^2$ distribution for N − 1 degrees of freedom.

In general, if m parameters are estimated from the data, $n_d = N - m$.
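A sketch of the fixed-total version of the statistic, returning N − 1 degrees of freedom for known $p_i$. The four-bin example with equal probabilities is hypothetical.

```python
def pearson_chi2_fixed_total(observed, probs):
    """chi^2 = sum_i (n_i - p_i n_tot)^2 / (p_i n_tot), with n_tot = sum_i n_i fixed.

    Returns (chi2, n_d) with n_d = N - 1 for known bin probabilities p_i.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("bin probabilities must sum to 1")
    n_tot = sum(observed)
    chi2 = sum((n - p * n_tot) ** 2 / (p * n_tot) for n, p in zip(observed, probs))
    return chi2, len(observed) - 1

# Hypothetical example: 100 entries in 4 equally probable bins
chi2, n_d = pearson_chi2_fixed_total([30, 20, 27, 23], [0.25] * 4)
print(chi2, n_d)
```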

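ML: estimator for θ

Standard deviation as statistical error

n observations of x, hypothesis p.d.f. $f(x;\theta)$

Estimator $\hat{\theta}(x_1, \ldots, x_n)$; its standard deviation $\sigma_{\hat{\theta}}$ can be obtained with:
• an analytic method
• the Rao-Cramér-Fréchet (RCF) bound
• Monte Carlo
• a graphical method

Measurement: $\hat{\theta} \pm \hat{\sigma}_{\hat{\theta}}$

Repeated estimates, each based on n observations: the estimator distribution $g(\hat{\theta};\theta)$ is centered around the true value θ, with true standard deviation $\sigma_{\hat{\theta}}$; these are estimated by $\hat{\theta}$ and $\hat{\sigma}_{\hat{\theta}}$.

Most practical estimators: $g(\hat{\theta};\theta)$ becomes approximately Gaussian in the large-sample limit.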
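A Monte Carlo sketch of the repeated-estimates picture. It assumes a hypothetical exponential p.d.f. $f(x;\tau) = e^{-x/\tau}/\tau$, whose ML estimator is the sample mean, with analytic standard deviation $\tau/\sqrt{n}$; the values τ = 2 and n = 50 are illustrative.

```python
import random
import statistics

def mc_estimates(tau=2.0, n=50, n_experiments=5000, seed=4):
    """Repeat a hypothetical experiment many times; return the ML estimates tau_hat."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_experiments):
        xs = [rng.expovariate(1.0 / tau) for _ in range(n)]  # exponential with mean tau
        estimates.append(statistics.fmean(xs))  # ML estimator: sample mean
    return estimates

tau_hats = mc_estimates()
print("mean of tau_hat     :", statistics.fmean(tau_hats))
print("MC std of tau_hat   :", statistics.stdev(tau_hats))
print("analytic tau/sqrt(n):", 2.0 / 50 ** 0.5)
```

A histogram of `tau_hats` would show the estimator distribution $g(\hat{\tau};\tau)$ centred near the true value and already close to Gaussian for n = 50.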

Classical confidence intervals

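n observations of x; evaluate an estimator for a parameter θ: $\hat{\theta}(x_1, \ldots, x_n)$

$\hat{\theta}_{obs}$ obtained, and its p.d.f. $g(\hat{\theta};\theta)$ (for a given, unknown θ)

$\alpha = P(\hat{\theta} \geq u_\alpha(\theta)) = \int_{u_\alpha(\theta)}^{\infty} g(\hat{\theta};\theta)\,d\hat{\theta} = 1 - G(u_\alpha(\theta);\theta)$

$\beta = P(\hat{\theta} \leq v_\beta(\theta)) = \int_{-\infty}^{v_\beta(\theta)} g(\hat{\theta};\theta)\,d\hat{\theta} = G(v_\beta(\theta);\theta)$

where G is the cumulative distribution of $\hat{\theta}$.

Probability for the estimator to be inside the belt, regardless of θ:

$P(v_\beta(\theta) \leq \hat{\theta} \leq u_\alpha(\theta)) = 1 - \alpha - \beta$

$u_\alpha(\theta)$, $v_\beta(\theta)$: monotonically increasing functions of θ, so define

$a(\hat{\theta}) = u_\alpha^{-1}(\hat{\theta})$, $b(\hat{\theta}) = v_\beta^{-1}(\hat{\theta})$

$\hat{\theta} \geq u_\alpha(\theta) \iff a(\hat{\theta}) \geq \theta$, $\hat{\theta} \leq v_\beta(\theta) \iff b(\hat{\theta}) \leq \theta$

$P(a(\hat{\theta}) \geq \theta) = \alpha$, $P(b(\hat{\theta}) \leq \theta) = \beta$

$P(a(\hat{\theta}) \leq \theta \leq b(\hat{\theta})) = 1 - \alpha - \beta$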


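Usually: central confidence interval, $\alpha = \beta = \gamma/2$:

$P(a \leq \theta \leq b) = 1 - \gamma$

a: hypothetical value of θ for which a fraction α of the repeated estimates $\hat{\theta}$ would be higher than the obtained $\hat{\theta}_{obs}$ (and b: the value for which a fraction β would be lower).

$\hat{\theta}_{obs} = u_\alpha(a)$, $\hat{\theta}_{obs} = v_\beta(b)$

$\alpha = \int_{\hat{\theta}_{obs}}^{\infty} g(\hat{\theta};a)\,d\hat{\theta} = 1 - G(\hat{\theta}_{obs};a)$

$\beta = \int_{-\infty}^{\hat{\theta}_{obs}} g(\hat{\theta};b)\,d\hat{\theta} = G(\hat{\theta}_{obs};b)$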
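A numerical sketch of the construction: given the cumulative distribution $G(\hat{\theta};\theta)$ of the estimator, solve $1 - G(\hat{\theta}_{obs};a) = \alpha$ and $G(\hat{\theta}_{obs};b) = \beta$ by bisection. It assumes G decreases monotonically in θ on the search range (equivalent to $u_\alpha$, $v_\beta$ increasing). The estimator model, Gaussian with mean θ and σ = 0.1θ, is hypothetical and chosen so that the interval comes out asymmetric.

```python
from statistics import NormalDist

def central_interval(theta_obs, G, alpha, beta, lo, hi, tol=1e-10):
    """Solve 1 - G(theta_obs; a) = alpha and G(theta_obs; b) = beta for (a, b).

    G(theta_hat, theta) is the c.d.f. of the estimator; it is assumed to
    decrease monotonically in theta on [lo, hi].
    """
    def solve(target):
        # Bisection for theta such that G(theta_obs, theta) = target.
        left, right = lo, hi
        while right - left > tol:
            mid = 0.5 * (left + right)
            if G(theta_obs, mid) > target:
                left = mid   # G still too large: theta must increase
            else:
                right = mid
        return 0.5 * (left + right)
    return solve(1.0 - alpha), solve(beta)

# Hypothetical estimator model: Gaussian with mean theta and sigma = 0.1 * theta
def G(theta_hat, theta):
    return NormalDist(mu=theta, sigma=0.1 * theta).cdf(theta_hat)

a, b = central_interval(theta_obs=5.0, G=G, alpha=0.05, beta=0.05, lo=1.0, hi=20.0)
print(f"[a, b] = [{a:.3f}, {b:.3f}]")
```

For this model the equations also have a closed form, $a = \hat{\theta}_{obs}/(1 + z/10)$ and $b = \hat{\theta}_{obs}/(1 - z/10)$ with $z = \Phi^{-1}(0.95)$, which can be used to check the bisection.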


Relationship between a confidence interval and a goodness-of-fit test:

Test the hypothesis θ = a using $\hat{\theta}$, taking $\hat{\theta} \geq \hat{\theta}_{obs}$ as the region of equal or less agreement than the result obtained:

P-value = α (a random variable), and θ = a is specified.

Confidence interval: α is specified first, and a is a random quantity depending on the data.

[Figure: the interval [a, b] with two parameter values, c and d, marked]


Many experiments: the interval $[a, b]$ would include the true value in a fraction $1 - \alpha - \beta$ of them.

It does not mean that the probability that the true value of θ lies in the fixed interval is $1 - \alpha - \beta$.

Frequency interpretation: θ is not a random variable, but the interval fluctuates since it is constructed from data.
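A coverage simulation illustrating the frequency interpretation. It assumes a Gaussian estimator with known σ, for which the interval is $a = \hat{\theta}_{obs} - \sigma\Phi^{-1}(1-\alpha)$, $b = \hat{\theta}_{obs} + \sigma\Phi^{-1}(1-\beta)$; the true value θ = 3 is a hypothetical choice. θ stays fixed while each simulated experiment produces a different interval.

```python
import random
from statistics import NormalDist

def coverage(theta_true=3.0, sigma=1.0, alpha=0.05, beta=0.05, n_exp=100_000, seed=5):
    """Fraction of simulated experiments whose interval [a, b] contains theta_true."""
    rng = random.Random(seed)
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    hits = 0
    for _ in range(n_exp):
        theta_obs = rng.gauss(theta_true, sigma)  # one experiment's estimate
        a, b = theta_obs - sigma * z_a, theta_obs + sigma * z_b
        hits += a <= theta_true <= b
    return hits / n_exp

print(coverage())
```

The result scatters around $1 - \alpha - \beta = 0.90$; any single interval either contains θ or does not.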

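Gaussian distributed $\hat{\theta}$

Simple and very important application

Central limit theorem: any estimator that is a linear function of a sum of random variables becomes Gaussian in the large-sample limit.

$G(\hat{\theta};\theta,\sigma_{\hat{\theta}}) = \int_{-\infty}^{\hat{\theta}} \frac{1}{\sqrt{2\pi\sigma_{\hat{\theta}}^2}} \exp\left(-\frac{(\hat{\theta}' - \theta)^2}{2\sigma_{\hat{\theta}}^2}\right) d\hat{\theta}'$

$\sigma_{\hat{\theta}}$ known; the experiment resulted in $\hat{\theta}_{obs}$:

$\alpha = 1 - G(\hat{\theta}_{obs}; a, \sigma_{\hat{\theta}}) = 1 - \Phi\left(\frac{\hat{\theta}_{obs} - a}{\sigma_{\hat{\theta}}}\right)$

$\beta = G(\hat{\theta}_{obs}; b, \sigma_{\hat{\theta}}) = \Phi\left(\frac{\hat{\theta}_{obs} - b}{\sigma_{\hat{\theta}}}\right)$

$a = \hat{\theta}_{obs} - \sigma_{\hat{\theta}}\, \Phi^{-1}(1 - \alpha)$

$b = \hat{\theta}_{obs} + \sigma_{\hat{\theta}}\, \Phi^{-1}(1 - \beta)$

using $\Phi^{-1}(\beta) = -\Phi^{-1}(1 - \beta)$, where Φ is the standard Gaussian cumulative distribution.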
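The closed-form Gaussian interval, as a small helper. The measurement $\hat{\theta}_{obs} = 10$ with $\sigma_{\hat{\theta}} = 2$ is hypothetical; a central 68.27% interval should come out as roughly $\hat{\theta}_{obs} \pm \sigma_{\hat{\theta}}$.

```python
from statistics import NormalDist

def gaussian_interval(theta_obs, sigma, alpha, beta):
    """a = theta_obs - sigma * Phi^-1(1 - alpha), b = theta_obs + sigma * Phi^-1(1 - beta)."""
    phi_inv = NormalDist().inv_cdf
    return theta_obs - sigma * phi_inv(1 - alpha), theta_obs + sigma * phi_inv(1 - beta)

# Hypothetical measurement: theta_obs = 10.0, sigma = 2.0, central 68.27% interval
gamma = 1 - 0.6827
a, b = gaussian_interval(10.0, 2.0, gamma / 2, gamma / 2)
print(f"[{a:.2f}, {b:.2f}]")
```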


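Choose quantile:

$\Phi^{-1}(1 - \gamma/2)$    $1 - \gamma$    $\Phi^{-1}(1 - \alpha)$    $1 - \alpha$
1    0.6827    1    0.8413
2    0.9545    2    0.9772
3    0.9973    3    0.9987

Choose confidence level:

$1 - \gamma$    $\Phi^{-1}(1 - \gamma/2)$    $1 - \alpha$    $\Phi^{-1}(1 - \alpha)$
0.90    1.645    0.90    1.282
0.95    1.960    0.95    1.645
0.99    2.576    0.99    2.326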
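Both tables can be regenerated from the standard Gaussian cumulative distribution Φ and its inverse, here taken from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

phi = NormalDist()  # standard Gaussian: cdf = Phi, inv_cdf = Phi^-1

print("quantile  central 1-gamma  one-sided 1-alpha")
for k in (1, 2, 3):
    central = 2 * phi.cdf(k) - 1
    print(f"{k:8d}  {central:15.4f}  {phi.cdf(k):17.4f}")

print("CL    Phi^-1(1-gamma/2)  Phi^-1(1-alpha)")
for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.2f}  {phi.inv_cdf(1 - (1 - cl) / 2):17.3f}  {phi.inv_cdf(cl):15.3f}")
```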
