Goodness of fit, confidence intervals and limits
Jorge Andre Swieca School
Campos do Jordão, January,2003
fourth lecture
References
• Statistical Data Analysis, G. Cowan, Oxford, 1998
• Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, R. Barlow, J. Wiley & Sons, 1989
• Particle Data Group (PDG), Review of Particle Physics, 2002 electronic edition
• Data Analysis: Statistical and Computational Methods for Scientists and Engineers, S. Brandt, Third Edition, Springer, 1999
Limits
“Do you, like Hamlet, dread the unknown? But what is known? What is it that you know, that you should call anything in particular unknown?”
Álvaro de Campos (Fernando Pessoa)
“If you have the truth, keep it!” Lisbon Revisited, Álvaro de Campos
Statistical tests
How well do the data agree with the probabilities predicted by a given hypothesis?
null hypothesis $H_0$: $f(x|H_0)$
alternative hypotheses: $f(x|H_1)$, $f(x|H_2)$
test statistic (a function of the measured variables): $t(x)$, with p.d.f. $g(t|H_0)$
error of the first kind (α is the significance level):
$$\alpha = \int_{t_{cut}}^{\infty} g(t|H_0)\,dt$$
error of the second kind:
$$\beta = \int_{-\infty}^{t_{cut}} g(t|H_1)\,dt$$
power = $1 - \beta$: power to discriminate against $H_1$
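As a minimal sketch of these definitions, the snippet below evaluates α, β and the power for a one-dimensional test statistic. The Gaussian shapes of $g(t|H_0)$ and $g(t|H_1)$, their means and widths, and the value of `t_cut` are illustrative assumptions, not taken from the lecture.

```python
from statistics import NormalDist

# Assumed (hypothetical) p.d.f.s of the test statistic under each hypothesis.
g_h0 = NormalDist(mu=0.0, sigma=1.0)  # g(t|H0)
g_h1 = NormalDist(mu=3.0, sigma=1.0)  # g(t|H1)

t_cut = 1.645  # assumed cut: H0 is rejected when t > t_cut

# Error of the first kind: integral of g(t|H0) from t_cut to infinity.
alpha = 1.0 - g_h0.cdf(t_cut)

# Error of the second kind: integral of g(t|H1) from -infinity to t_cut.
beta = g_h1.cdf(t_cut)

# Power to discriminate against H1.
power = 1.0 - beta

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Moving `t_cut` to larger values lowers α but raises β, which is the trade-off behind the choice of the cut.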
Neyman-Pearson lemma
Where to place $t_{cut}$? ($H_0$: signal, $H_1$: background)
1-D: efficiency (and purity)
m-D: $\vec{t} = (t_1, \ldots, t_m)$; the definition of the acceptance region is not obvious
Neyman-Pearson lemma: the highest power (highest signal purity) for a given significance level α is obtained with the acceptance region of $t$-space such that
$$\frac{g(\vec{t}|H_0)}{g(\vec{t}|H_1)} > c$$
where $c$ is determined by the desired efficiency.
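A small sketch of how the constant $c$ follows from the desired efficiency, in one dimension. The two Gaussian p.d.f.s and the 95% signal efficiency are assumptions chosen for illustration; with these shapes the likelihood ratio decreases monotonically with $t$, so the acceptance region "ratio > c" is simply $t < t_{cut}$.

```python
from statistics import NormalDist

# Assumed (hypothetical) p.d.f.s: H0 = signal, H1 = background.
g_signal = NormalDist(mu=0.0, sigma=1.0)      # g(t|H0)
g_background = NormalDist(mu=3.0, sigma=1.0)  # g(t|H1)

def likelihood_ratio(t):
    """Ratio g(t|H0) / g(t|H1) used to define the acceptance region."""
    return g_signal.pdf(t) / g_background.pdf(t)

# Desired signal efficiency: fraction of signal with t inside the acceptance region.
efficiency = 0.95

# For these shapes the acceptance region is t < t_cut.
t_cut = g_signal.inv_cdf(efficiency)

# The constant c is the value of the ratio on the boundary of the region.
c = likelihood_ratio(t_cut)

# Fraction of background that is also accepted (the error of the second kind).
background_efficiency = g_background.cdf(t_cut)

print(f"t_cut = {t_cut:.3f}, c = {c:.3f}, background accepted = {background_efficiency:.4f}")
```

In more than one dimension the same ratio defines the region, but the boundary is a surface in $t$-space rather than a single cut value.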
Goodness of fit
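How well is a given null hypothesis $H_0$ compatible with the observed data (no reference to any alternative hypothesis)?
coins: $N$ tosses, $n_h$ heads, $n_t = N - n_h$ tails. Is the coin "fair", i.e. are H and T equally likely?
test statistic: $n_h$, binomially distributed with $p = 0.5$
$$f(n_h; N) = \frac{N!}{n_h!\,(N-n_h)!}\left(\frac{1}{2}\right)^{n_h}\left(\frac{1}{2}\right)^{N-n_h}$$
$N = 20$, $n_h = 17$
$E[n_h] = Np = 10$
[Figure: axis of $n_h$ from 0 to 20 with the expected value 10 marked; the outcomes 0, 1, 2, 3 and 17, 18, 19, 20 are those at least as far from 10 as the observed 17]
$$P = f(0;20) + f(1;20) + f(2;20) + f(3;20) + f(17;20) + f(18;20) + f(19;20) + f(20;20)$$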
Goodness of fit
$P = 0.0026$
P-value: the probability $P$, under $H_0$, of obtaining a result as compatible or less compatible with $H_0$ than the one actually observed.
The P-value is a random variable; α is a constant specified before carrying out the test.
Bayesian statistics: use Bayes' theorem to assign a probability to $H_0$ (this requires specifying the prior probability).
The P-value is often interpreted incorrectly as the probability of $H_0$.
P-value: the fraction of times one would obtain data as compatible with $H_0$ or less so if the experiment (20 coin tosses) were repeated under similar circumstances.
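The coin example (20 tosses, 17 heads, fair coin under $H_0$) can be checked numerically. The sketch below sums the binomial probabilities of all outcomes at least as far from the expected value 10 as the observed 17; the function and variable names are illustrative.

```python
from math import comb

def binomial_pmf(n_heads, n_tosses, p=0.5):
    """Probability of n_heads heads in n_tosses tosses with head probability p."""
    return comb(n_tosses, n_heads) * p**n_heads * (1.0 - p) ** (n_tosses - n_heads)

n_tosses = 20
n_heads_obs = 17
expected = n_tosses * 0.5  # E[n_h] = N p = 10

# Outcomes with equal or less compatibility with H0: at least as far from 10 as 17 is.
distance_obs = abs(n_heads_obs - expected)
p_value = sum(
    binomial_pmf(k, n_tosses)
    for k in range(n_tosses + 1)
    if abs(k - expected) >= distance_obs
)

print(f"P-value = {p_value:.4f}")
```

The selected outcomes are 0 to 3 and 17 to 20 heads, the same eight terms as in the sum for $P$.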
Goodness of fit
Easy to identify the region of values of t with equal or less degree of compatibility with the hypothesis than the observed value (alternate hypothesis: p ≠ 0.5)
“optional stopping problem”
Significance of an observed signal
Whether a discrepancy between data and expectation is sufficiently significant to merit a claim for a new discovery
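signal events $n_s$: Poisson variable with mean $\nu_s$
background events $n_b$: Poisson variable with mean $\nu_b$
$n = n_s + n_b$, $\nu = \nu_s + \nu_b$
prob. to observe $n$ events:
$$f(n; \nu_s, \nu_b) = \frac{(\nu_s + \nu_b)^n}{n!}\, e^{-(\nu_s + \nu_b)}$$
experiment: $n_{obs}$ events observed; quantify our degree of confidence in the discovery of a new effect ($\nu_s \neq 0$).
How likely is it to find $n_{obs}$ events or more from background alone?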
Significance of an observed signal
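$$P(n \ge n_{obs}) = \sum_{n=n_{obs}}^{\infty} f(n; \nu_s = 0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} f(n; \nu_s = 0, \nu_b) = 1 - \sum_{n=0}^{n_{obs}-1} \frac{\nu_b^n}{n!}\, e^{-\nu_b}$$
Ex: expect $\nu_b = 0.5$, $n_{obs} = 5$: $P(n \ge n_{obs}) = 1.7 \times 10^{-4}$
This is not the prob. of the hypothesis $\nu_s = 0$!
It is the prob., under the hypothesis $\nu_s = 0$, of obtaining as many events as observed or more.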
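The background-only probability for the example ($\nu_b = 0.5$, $n_{obs} = 5$) can be reproduced with a few lines. This is a sketch with illustrative names; it evaluates one minus the Poisson probability of observing fewer than $n_{obs}$ events.

```python
from math import exp, factorial

def poisson_pmf(n, nu):
    """Poisson probability of observing n events for mean nu."""
    return nu**n * exp(-nu) / factorial(n)

def background_only_p_value(n_obs, nu_b):
    """P(n >= n_obs) under the hypothesis nu_s = 0."""
    return 1.0 - sum(poisson_pmf(n, nu_b) for n in range(n_obs))

p_value = background_only_p_value(n_obs=5, nu_b=0.5)
print(f"P(n >= 5 | nu_b = 0.5) = {p_value:.2e}")
```

For very small P-values the subtraction from one loses precision; summing the upper tail directly would be a more robust choice in that regime.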
Significance of an observed signal
How to report the measurement?
estimate of $\nu_s$: $\hat{\nu}_s = n_{obs} - \nu_b = 4.5 \pm 2.2$
misleading:
• only two std. deviations from zero
• gives the impression that $\nu_s$ is not very incompatible with zero
yes: prob. that a Poisson variable of mean $\nu_b$ will fluctuate up to $n_{obs}$ or higher
no: prob. that a variable with mean $n_{obs}$ will fluctuate down to $\nu_b$ or lower
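Pearson’s $\chi^2$ test
histogram of $x$ with $N$ bins: $n_i$ observed entries and $\nu_i$ expected entries in bin $i$
construct a statistic which reflects the level of agreement between observed and expected histograms
$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\nu_i}$$
If the data $\vec{n} = (n_1, \ldots, n_N)$ are Poisson distributed with mean values $\vec{\nu} = (\nu_1, \ldots, \nu_N)$ and approx. Gaussian ($n_i \ge 5$), the statistic follows a $\chi^2$ distribution for $N$ degrees of freedom
• regardless of the distribution of $x$
• distribution free
larger $\chi^2$: larger discrepancy between data and the hypothesis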
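A minimal sketch of the statistic itself: the sum over bins of (observed minus expected) squared, divided by expected. The bin contents below are made-up numbers used only to exercise the function.

```python
def pearson_chi2(observed, expected):
    """Pearson's chi-square: sum over bins of (n_i - nu_i)^2 / nu_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((n - nu) ** 2 / nu for n, nu in zip(observed, expected))

# Hypothetical histogram: 5 bins, 10 entries expected in each.
observed = [12, 9, 15, 8, 6]
expected = [10.0, 10.0, 10.0, 10.0, 10.0]

chi2 = pearson_chi2(observed, expected)
print(f"chi2 = {chi2:.2f} for {len(observed)} bins")
```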
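Pearson’s $\chi^2$ test
$$P = \int_{\chi^2}^{\infty} f(z; n_d)\,dz$$
where $f(z; n_d)$ is the $\chi^2$ p.d.f. for $n_d$ degrees of freedom.
$E[\chi^2] = n_d$, so $\chi^2 / n_d \approx 1$ (rule of thumb for a good fit)
$\chi^2 = 15$, $n_d = 10$: $P = 0.13$
$\chi^2 = 150$, $n_d = 100$: $P = 9.0 \times 10^{-4}$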
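The two cases $\chi^2 = 15$ with $n_d = 10$ and $\chi^2 = 150$ with $n_d = 100$ have the same ratio $\chi^2/n_d = 1.5$ but very different P-values. The sketch below evaluates them with the closed form of the upper-tail integral, which exists when $n_d$ is even; this restriction is a simplification chosen here to avoid external libraries, and a general $n_d$ would need an incomplete gamma function.

```python
from math import exp, factorial

def chi2_p_value_even_dof(chi2, n_d):
    """Upper-tail probability of a chi-square distribution, valid for even n_d only.

    For even n_d the integral reduces to
    exp(-chi2/2) * sum_{k=0}^{n_d/2 - 1} (chi2/2)^k / k!
    """
    if n_d <= 0 or n_d % 2 != 0:
        raise ValueError("this closed form requires a positive even n_d")
    half = chi2 / 2.0
    return exp(-half) * sum(half**k / factorial(k) for k in range(n_d // 2))

p_10 = chi2_p_value_even_dof(15.0, 10)
p_100 = chi2_p_value_even_dof(150.0, 100)
print(f"chi2 = 15,  n_d = 10:  P = {p_10:.2f}")
print(f"chi2 = 150, n_d = 100: P = {p_100:.1e}")
```

The comparison shows why $\chi^2/n_d \approx 1$ is only a rule of thumb: the P-value also depends on $n_d$ itself.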
Pearson’s $\chi^2$ test
Before: $n_{tot} = \sum_{i=1}^{N} n_i$ was a Poisson variable with mean $\nu_{tot} = \sum_{i=1}^{N} \nu_i$.
Set $n_{tot}$ = fixed: the $n_i$ are distributed as multinomial with prob. $p_i = \nu_i / n_{tot}$
Not testing the total numbers of expected and observed events, but only the distribution of $x$.
$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - p_i n_{tot})^2}{p_i n_{tot}}$$
For a large number of entries in each bin and $p_i$ known, this follows a $\chi^2$ distribution for $N - 1$ degrees of freedom.
In general, if $m$ parameters are estimated from data, $n_d = N - m$.
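Standard deviation as stat. error
$n$ observations of $x$, hypothesis p.d.f. $f(x; \theta)$
ML estimator for $\theta$: $\hat{\theta}(x_1, \ldots, x_n)$
standard deviation $\sigma_{\hat{\theta}}$, obtained by:
• analytic method
• RCF bound
• Monte Carlo
• graphical method
measurement reported as $\hat{\theta}_{obs} \pm \hat{\sigma}_{\hat{\theta}}$
repeated estimates, each based on $n$ obs.: the estimator distribution $g(\hat{\theta}; \theta)$ is centered around the true value $\theta$, with true standard deviation $\sigma_{\hat{\theta}}$; these are estimated by $\hat{\theta}_{obs}$ and $\hat{\sigma}_{\hat{\theta}}$
Most practical estimators: $g(\hat{\theta}; \theta)$ becomes approx. Gaussian in the large sample limit.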
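The Monte Carlo route to the standard deviation of an estimator can be sketched as follows. The exponential p.d.f., the true mean, the sample size and the number of repetitions are all assumptions made for the illustration; for an exponential the ML estimator of the mean is the sample mean and the analytic standard deviation is $\tau/\sqrt{n}$, which gives a cross-check.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TAU_TRUE = 2.0        # assumed true mean of the exponential p.d.f.
N_OBS = 50            # observations per simulated experiment
N_EXPERIMENTS = 2000  # number of repeated experiments

def ml_estimate(sample):
    """ML estimator of the exponential mean: the sample mean."""
    return statistics.fmean(sample)

# Repeat the experiment many times and collect the estimates.
estimates = []
for _ in range(N_EXPERIMENTS):
    sample = [random.expovariate(1.0 / TAU_TRUE) for _ in range(N_OBS)]
    estimates.append(ml_estimate(sample))

mean_of_estimates = statistics.fmean(estimates)
sigma_mc = statistics.stdev(estimates)    # Monte Carlo standard deviation
sigma_analytic = TAU_TRUE / N_OBS ** 0.5  # analytic result tau / sqrt(n)

print(f"mean of estimates = {mean_of_estimates:.3f}")
print(f"sigma (MC) = {sigma_mc:.3f}, sigma (analytic) = {sigma_analytic:.3f}")
```

In a real analysis the true parameter is unknown, so the simulation would be run with the observed estimate in its place.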
Classical confidence intervals
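$n$ obs. of $x$, evaluate an estimator $\hat{\theta}(x_1, \ldots, x_n)$ for a param. $\theta$
$\hat{\theta}_{obs}$ obtained, and its p.d.f. $g(\hat{\theta}; \theta)$ (for a given unknown $\theta$), with cumulative distribution $G$
prob. α for $\hat{\theta} \ge u_\alpha(\theta)$:
$$\alpha = P(\hat{\theta} \ge u_\alpha(\theta)) = \int_{u_\alpha(\theta)}^{\infty} g(\hat{\theta}; \theta)\,d\hat{\theta} = 1 - G(u_\alpha(\theta); \theta)$$
prob. β for $\hat{\theta} \le v_\beta(\theta)$:
$$\beta = P(\hat{\theta} \le v_\beta(\theta)) = \int_{-\infty}^{v_\beta(\theta)} g(\hat{\theta}; \theta)\,d\hat{\theta} = G(v_\beta(\theta); \theta)$$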
Classical confidence intervals
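prob. for the estimator to be inside the belt, regardless of $\theta$:
$$P(v_\beta(\theta) \le \hat{\theta} \le u_\alpha(\theta)) = 1 - \alpha - \beta$$
$u_\alpha(\theta)$, $v_\beta(\theta)$: monotonic increasing functions of $\theta$, with inverses
$$a(\hat{\theta}) \equiv u_\alpha^{-1}(\hat{\theta}), \qquad b(\hat{\theta}) \equiv v_\beta^{-1}(\hat{\theta})$$
$$\hat{\theta} \ge u_\alpha(\theta) \Leftrightarrow a(\hat{\theta}) \ge \theta, \qquad \hat{\theta} \le v_\beta(\theta) \Leftrightarrow b(\hat{\theta}) \le \theta$$
$$P(a(\hat{\theta}) \ge \theta) = \alpha, \qquad P(b(\hat{\theta}) \le \theta) = \beta$$
$$P(a(\hat{\theta}) \le \theta \le b(\hat{\theta})) = 1 - \alpha - \beta$$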
Classical confidence intervals
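Usually: central confidence interval, $\alpha = \beta = \gamma/2$
$$P(a \le \theta \le b) = 1 - \alpha - \beta = 1 - \gamma$$
$a$: hypothetical value of $\theta$ for which a fraction α of the repeated estimates $\hat{\theta}$ would be higher than the obtained $\hat{\theta}_{obs}$
$$\hat{\theta}_{obs} = u_\alpha(a) = v_\beta(b)$$
$$\alpha = \int_{\hat{\theta}_{obs}}^{\infty} g(\hat{\theta}; a)\,d\hat{\theta} = 1 - G(\hat{\theta}_{obs}; a)$$
$$\beta = \int_{-\infty}^{\hat{\theta}_{obs}} g(\hat{\theta}; b)\,d\hat{\theta} = G(\hat{\theta}_{obs}; b)$$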
Classical confidence intervals
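Relationship between a conf. interval and a test of goodness of fit:
test the hypothesis $\theta = a$, taking results with $\hat{\theta} \ge \hat{\theta}_{obs}$ as having equal or less agreement than the result obtained
P-value = α (random variable), and $\theta = a$ is specified
Confidence interval: α is specified first; $a$ is a random quantity depending on the data
The interval $[a, b]$ is often reported as $\hat{\theta}^{+d}_{-c}$, with $c = \hat{\theta} - a$ and $d = b - \hat{\theta}$.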
Classical confidence intervals
Many experiments: the interval would include the true value $\theta$ in a fraction $1 - \alpha - \beta$ of them.
It does not mean that the probability that the true value of $\theta$ is in the fixed interval is $1 - \alpha - \beta$.
Frequency interpretation: $\theta$ is not a random variable, but the interval fluctuates since it is constructed from data.
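The frequency interpretation can be illustrated with a toy simulation: the true value stays fixed, the interval is rebuilt in every simulated experiment, and one counts how often it contains the true value. The Gaussian estimator with known standard deviation, the true value and the number of experiments are assumptions for this sketch; the interval $\hat{\theta} \pm \sigma$ is then a central interval with coverage 68.27%.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

THETA_TRUE = 10.0       # fixed (non-random) true value, assumed for the toy
SIGMA = 1.5             # assumed known standard deviation of the estimator
N_EXPERIMENTS = 20000   # number of simulated experiments

n_covered = 0
for _ in range(N_EXPERIMENTS):
    # Each experiment yields a different estimate, hence a different interval.
    theta_hat = random.gauss(THETA_TRUE, SIGMA)
    a = theta_hat - SIGMA
    b = theta_hat + SIGMA
    if a <= THETA_TRUE <= b:
        n_covered += 1

coverage = n_covered / N_EXPERIMENTS
print(f"fraction of intervals containing the true value: {coverage:.4f}")
```

The printed fraction is a property of the procedure over many repetitions, not a probability statement about any single interval.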
Gaussian distributed
Simple and very important application
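Central limit theorem: any estimator that is a linear function of a sum of random variables becomes Gaussian in the large sample limit.
$$G(\hat{\theta}; \theta, \sigma_{\hat{\theta}}) = \int_{-\infty}^{\hat{\theta}} \frac{1}{\sqrt{2\pi\sigma_{\hat{\theta}}^2}} \exp\left(-\frac{(\hat{\theta}' - \theta)^2}{2\sigma_{\hat{\theta}}^2}\right) d\hat{\theta}'$$
$\sigma_{\hat{\theta}}$ known, experiment resulted in $\hat{\theta}_{obs}$:
$$\alpha = 1 - G(\hat{\theta}_{obs}; a, \sigma_{\hat{\theta}}) = 1 - \Phi\left(\frac{\hat{\theta}_{obs} - a}{\sigma_{\hat{\theta}}}\right)$$
$$\beta = G(\hat{\theta}_{obs}; b, \sigma_{\hat{\theta}}) = \Phi\left(\frac{\hat{\theta}_{obs} - b}{\sigma_{\hat{\theta}}}\right)$$
where $\Phi$ is the cumulative distribution of the standard Gaussian.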
Gaussian distributed
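$$a = \hat{\theta}_{obs} - \sigma_{\hat{\theta}}\,\Phi^{-1}(1 - \alpha)$$
$$b = \hat{\theta}_{obs} + \sigma_{\hat{\theta}}\,\Phi^{-1}(1 - \beta)$$
using $\Phi^{-1}(\beta) = -\Phi^{-1}(1 - \beta)$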
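For a Gaussian estimator with known standard deviation, the interval limits follow directly from the quantiles of the standard Gaussian. The sketch below uses `statistics.NormalDist().inv_cdf` for $\Phi^{-1}$; the observed value 10.0, the standard deviation 2.0 and the function name are hypothetical.

```python
from statistics import NormalDist

def gaussian_interval(theta_obs, sigma, alpha, beta):
    """Limits a and b for a Gaussian estimator with known standard deviation.

    a = theta_obs - sigma * Phi^-1(1 - alpha)
    b = theta_obs + sigma * Phi^-1(1 - beta)
    """
    phi_inv = NormalDist().inv_cdf  # quantile of the standard Gaussian
    a = theta_obs - sigma * phi_inv(1.0 - alpha)
    b = theta_obs + sigma * phi_inv(1.0 - beta)
    return a, b

# Hypothetical measurement: central 95% interval, alpha = beta = 0.025.
a, b = gaussian_interval(theta_obs=10.0, sigma=2.0, alpha=0.025, beta=0.025)
print(f"[a, b] = [{a:.2f}, {b:.2f}]")
```

Choosing different values of `alpha` and `beta` gives a non-central interval, and letting one of them go to zero is not supported by this sketch since the corresponding limit would be infinite.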
Gaussian distributed
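Choose quantile:
  $\Phi^{-1}(1-\gamma/2)$   $1-\gamma$      $\Phi^{-1}(1-\alpha)$   $1-\alpha$
  1                         0.6827          1                       0.8413
  2                         0.9545          2                       0.9772
  3                         0.9973          3                       0.9987

Choose confidence level:
  $1-\gamma$   $\Phi^{-1}(1-\gamma/2)$      $1-\alpha$   $\Phi^{-1}(1-\alpha)$
  0.90         1.645                        0.90         1.282
  0.95         1.960                        0.95         1.645
  0.99         2.576                        0.99         2.326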
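The entries of these two tables can be recomputed from the standard Gaussian cumulative distribution and its inverse. The sketch below is one way to do so with the Python standard library; the dictionary names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Gaussian: mean 0, standard deviation 1

# Choose the quantile, read off the confidence level.
central_level = {}    # quantile -> 1 - gamma (central interval)
one_sided_level = {}  # quantile -> 1 - alpha (one-sided limit)
for quantile in (1, 2, 3):
    central_level[quantile] = 2.0 * std_normal.cdf(quantile) - 1.0
    one_sided_level[quantile] = std_normal.cdf(quantile)

# Choose the confidence level, read off the quantile.
central_quantile = {}    # 1 - gamma -> Phi^-1(1 - gamma/2)
one_sided_quantile = {}  # 1 - alpha -> Phi^-1(1 - alpha)
for level in (0.90, 0.95, 0.99):
    central_quantile[level] = std_normal.inv_cdf(1.0 - (1.0 - level) / 2.0)
    one_sided_quantile[level] = std_normal.inv_cdf(level)

for quantile in (1, 2, 3):
    print(quantile, f"{central_level[quantile]:.4f}", f"{one_sided_level[quantile]:.4f}")
for level in (0.90, 0.95, 0.99):
    print(level, f"{central_quantile[level]:.3f}", f"{one_sided_quantile[level]:.3f}")
```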