Statistics in HEP 2
Helge Voss, Hadron Collider Physics Summer School, June 8-17, 2011


Page 1:

Statistics In HEP 2

Helge Voss, Hadron Collider Physics Summer School, June 8-17, 2011

How do we understand/interpret our measurements?

Page 2:

Outline

Maximum Likelihood fit

strict frequentist Neyman confidence intervals

what "bothers" people with them

Feldman/Cousins confidence belts/intervals

Bayesian treatment of 'unphysical' results

How the LEP Higgs limit was derived

What about systematic uncertainties? Profile Likelihood


Page 3:

Parameter Estimation


want to measure/estimate some parameter θ, e.g. mass, polarisation, etc.

observe: observables x_1, ..., x_N for N events

"hypothesis", i.e. the PDF f(x; θ): the distribution of x for a given θ (e.g. a differential cross section); for independent events: P(x_1, ..., x_N; θ) = ∏_i f(x_i; θ)

for the fixed observed data, regard P as a function of θ (i.e. the Likelihood L(θ)!); for θ close to the true value, L(θ) will be large

Glen Cowan: Statistical data analysis

try to maximise L(θ); in practice:

minimise -2 ln L(θ)

θ̂ = argmax_θ L(θ) is the Maximum Likelihood estimator
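Not part of the slides: a minimal numerical sketch of the recipe above, with made-up Gaussian data and a brute-force scan of -2 ln L(μ).

```python
def neg2_log_l(mu, data, sigma=1.0):
    # -2 ln L(mu) for independent Gaussian measurements,
    # dropping the mu-independent normalisation terms
    return sum(((x - mu) / sigma) ** 2 for x in data)

# toy data set (hypothetical measurements of some parameter)
data = [4.9, 5.2, 5.0, 4.8, 5.1]

# brute-force scan of mu; the minimum of -2 ln L is the ML estimate
mu_grid = [4.0 + 0.001 * i for i in range(2001)]
mu_hat = min(mu_grid, key=lambda m: neg2_log_l(m, data))
print(round(mu_hat, 3))  # coincides with the sample mean for a Gaussian
```

In a real analysis one would use a proper minimiser instead of a grid scan; the scan just makes the "minimise -2 ln L" step explicit.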

Page 4:

Maximum Likelihood Estimator


example: PDF f(x) = Gauss(x; μ, σ)

estimator for μ from the data x_1, ..., x_N measured in an experiment

full Likelihood: L(μ) = ∏_i Gauss(x_i; μ, σ)

typically one plots -2 ln L(μ). Note: it is a function of μ!

[figure: -2 ln L(μ) vs. μ; the minimum is at μ̂, the best estimate]

For Gaussian PDFs the estimator is the sample mean, μ̂ = (1/N) Σ_i x_i, and the variance of the estimate follows from the interval where -2 ln L(μ) has risen by 1 from its minimum: σ_μ̂ = σ/√N
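The "rises by 1" rule can be checked numerically (my own sketch with illustrative data, not from the slides); the half-width of the interval should reproduce σ/√N.

```python
import math

def neg2_log_l(mu, data, sigma):
    # -2 ln L(mu) for independent Gaussian measurements (constants dropped)
    return sum(((x - mu) / sigma) ** 2 for x in data)

sigma = 1.0
data = [4.9, 5.2, 5.0, 4.8, 5.1]   # hypothetical measurements
n = len(data)
mu_hat = sum(data) / n             # analytic ML estimate for a Gaussian mean
n2ll_min = neg2_log_l(mu_hat, data, sigma)

# walk upward from the minimum until -2 ln L has risen by exactly 1
mu = mu_hat
while neg2_log_l(mu, data, sigma) - n2ll_min < 1.0:
    mu += 1e-5
half_width = mu - mu_hat
print(round(half_width, 3), round(sigma / math.sqrt(n), 3))  # both ~0.447
```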

Page 5:

Parameter Estimation


properties of estimators

biased or unbiased? large or small variance? i.e. what is the distribution of θ̂ over many repeated measurements?

[figure (Glen Cowan): estimator distributions around θ_true]

Small bias and small variance are typically "in conflict"; Maximum Likelihood is in general unbiased only in the large-sample limit

If the Likelihood function is "Gaussian" (often the case for large N, by the central limit theorem):

get the "error" estimate from the Δ(-2 ln L) = 1 interval; if it is (very) non-Gaussian,

typically revert to (classical) Neyman confidence intervals

Page 6:

Classical Confidence Intervals


another way to look at a measurement, rigorously "frequentist": Neyman's confidence belt for CL α (e.g. 90%)

[figure (Feldman/Cousins 1998): confidence belt, x_measured on the horizontal axis, μ_hypothetically_true on the vertical axis]

each hypothetically true μ has a PDF of how the measured values will be distributed

determine the (central) intervals ("acceptance regions") in these PDFs such that they contain α

do this for ALL μ_hyp.true

connect all the "red dots": the confidence belt

measure x_obs: the confidence interval [μ1, μ2] is given by the vertical line intersecting the belt.

by construction: for each x_meas (drawn according to the PDF), the confidence interval [μ1, μ2] contains μ_true in α = 90% of cases

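A toy check of the coverage claim (my own sketch, assuming a unit-Gaussian measurement, for which the central 90% belt inverts to [x - 1.645, x + 1.645]):

```python
import random

random.seed(1)
z90 = 1.645          # central 90% half-width of a unit Gaussian
mu_true = 3.0        # some hypothetical true value
trials = 20000
covered = 0
for _ in range(trials):
    x = random.gauss(mu_true, 1.0)   # one "measurement"
    lo, hi = x - z90, x + z90        # belt inversion: [mu1, mu2]
    covered += lo <= mu_true <= hi
print(covered / trials)              # close to 0.90 by construction
```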

Page 7:

Classical Confidence Intervals


another way to look at a measurement, rigorously "frequentist": Neyman's confidence belt for CL α (e.g. 90%)

[figure (Feldman/Cousins 1998): confidence belt, x_measured on the horizontal axis, μ_hypothetically_true on the vertical axis]

the confidence interval [μ1, μ2] is given by the vertical line intersecting the belt. By construction:

if the true value were μ_true, it lies in [μ1, μ2] exactly if the vertical line at x_meas intersects the acceptance region of μ_true, which happens in 90% of cases

(that's how it was constructed)

only those x_meas whose vertical lines intersect the acceptance region give intervals [μ1, μ2] that contain μ_true

→ 90% of the intervals cover μ_true

if the PDF is Gaussian: the central 68% Neyman confidence interval coincides with the Maximum Likelihood estimate ± its "error" estimate

Page 8:

Flip-Flop


Feldman/Cousins(1998)

When to quote a measurement, and when a limit? Estimate a Gaussian distributed quantity that cannot be < 0 (e.g. a mass); same Neyman confidence belt construction as before:

once for a measurement (two-sided, each tail contains 5%), once for a limit (one-sided, the tail contains 10%)

flip-flopping induces "undercoverage", as the combined acceptance region contains only 85%!!

decide: if the measured value is < 0, quote 0 (conservative)

if you observe < 3σ, quote an upper limit only

if you observe > 3σ, quote a measurement
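The 85% figure can be verified with a quick toy study (my own sketch; the 3σ switch point and the Gaussian quantiles follow the slide, the true value 2.0 is an illustrative choice inside the undercovered region):

```python
import random

random.seed(3)

z_one, z_two = 1.282, 1.645   # one-sided 90% and central 90% Gaussian quantiles
mu_true = 2.0                 # illustrative true value
trials, covered = 50000, 0
for _ in range(trials):
    x = random.gauss(mu_true, 1.0)
    if x < 3.0:
        lo, hi = 0.0, x + z_one          # "flip": quote an upper limit only
    else:
        lo, hi = x - z_two, x + z_two    # "flop": quote a measurement
    covered += lo <= mu_true <= hi
print(round(covered / trials, 2))   # ~0.85 instead of the nominal 0.90
```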

Page 9:

Some things people don’t like..


Feldman/Cousins(1998).

same example: estimate Gaussian distributed quantity that cannot be < 0 (e.g. mass)

using the proper confidence belt, suppose the measured value falls so low that the vertical line misses the belt entirely:

the confidence interval is EMPTY!

Note: that's OK in the frequentist interpretation: the interval covers the true value in 90% of (hypothetical) measurements.

Obviously we were 'unlucky' to pick one out of the remaining 10%

nonetheless: tempted to "flip-flop"? tsz.. tsz.. tsz..

Page 10:

Feldman Cousins: a Unified Approach


How we determine the "acceptance" region for each μ_hyp.true is up to us, as long as it covers the desired integral of size α (e.g. 90%)

include those x_meas with the largest likelihood ratio first:

R = L(x_meas | μ) / L(x_meas | μ_best)

where μ_best is either the observation itself or, if that is unphysical, the closest ALLOWED value

[figure: acceptance regions ordered by R, α = 90%]

No "empty intervals" anymore!
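A sketch of the likelihood-ratio ordering for the Gaussian-bounded-at-zero case (my own discretised implementation, not the slides' code; grid ranges and step are arbitrary choices):

```python
import math

def gauss(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def fc_acceptance(mu, alpha=0.90, dx=0.01, xlo=-6.0, xhi=10.0):
    # rank measured values x by R = L(x|mu) / L(x|mu_best),
    # where mu_best = max(0, x) is the closest ALLOWED value to the observation
    xs = [xlo + dx * i for i in range(int(round((xhi - xlo) / dx)) + 1)]
    ranked = sorted(xs, key=lambda x: gauss(x, mu) / gauss(x, max(0.0, x)),
                    reverse=True)
    region, p = [], 0.0
    for x in ranked:            # fill the region with the best-ranked x first
        region.append(x)
        p += gauss(x, mu) * dx
        if p >= alpha:
            break
    return min(region), max(region)

lo, hi = fc_acceptance(0.5)
print(round(lo, 2), round(hi, 2))   # a non-empty interval even for small mu
```

For this problem the ordering function is unimodal, so the accepted points form a single interval.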

Page 11:

Being Lucky…


set an upper limit on a signal s on top of a known (mean) background b, with n = s + b counts following a Poisson distribution

Neyman: draw the confidence belt with s on the y-axis (the possible true values of s)

for fixed background b

[figure (Glen Cowan, Statistical data analysis): confidence belts vs. observed n for fixed background. Note on the slide: sorry, the plots don't match: one is for 90% CL, the other for 95% CL]

2 experiments (E1 and E2), both (out of luck) observing n = 0: their 95% limits on s differ, with the experiment expecting more background quoting the tighter limit. UNFAIR!?
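The "being lucky" effect can be reproduced for a counting experiment (a sketch; the backgrounds 0.5 and 2.5 are made-up illustrations):

```python
import math

def pois_cdf(n, lam):
    # P(N <= n) for a Poisson with mean lam
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n + 1))

def upper_limit(n_obs, b, cl=0.95, ds=0.001):
    # classical limit: smallest s with P(N <= n_obs | s + b) <= 1 - cl
    s = 0.0
    while pois_cdf(n_obs, s + b) > 1.0 - cl:
        s += ds
    return s

# two experiments, both observing n = 0, but with different expected background
print(round(upper_limit(0, 0.5), 2))   # ~2.5
print(round(upper_limit(0, 2.5), 2))   # ~0.5: tighter limit for identical data
```

For n = 0 this reduces to s_up = -ln(1 - CL) - b, so the limit shrinks linearly with the expected background.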

Page 12:

Being Lucky …


Feldman/Cousins confidence belts were motivated by "popular" Bayesian approaches to handle such problems.

Bayesian: rather than constructing confidence belts, turn the Likelihood of n (for given s) into a posterior probability for s by adding a prior probability on s:

[figure: upper limit on signal vs. background b, for fixed n_obs]

Feldman/Cousins:
• there is still SOME "unfairness"
• perfectly "fine" in the frequentist interpretation
• should quote "limit + sensitivity"

Feldman/Cousins (1998); Helene (1983).
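A sketch of the Helene-style flat-prior limit (my own grid implementation; grid range and step are arbitrary). For n = 0 the posterior is ∝ exp(-s) whatever the background, so the limit is the same for both experiments:

```python
import math

def bayes_upper_limit(n_obs, b, cl=0.95, ds=0.001, smax=50.0):
    # flat prior on s >= 0: posterior(s) proportional to Poisson(n_obs | s + b)
    def like(s):
        lam = s + b
        return math.exp(-lam) * lam ** n_obs / math.factorial(n_obs)
    grid = [i * ds for i in range(int(smax / ds))]
    norm = sum(like(s) for s in grid)
    acc = 0.0
    for s in grid:              # integrate the posterior up to the cl quantile
        acc += like(s) / norm
        if acc >= cl:
            return s
    return smax

# n = 0: the Bayesian limit is identical for both backgrounds (cf. Helene 1983)
print(round(bayes_upper_limit(0, 0.5), 2), round(bayes_upper_limit(0, 2.5), 2))
```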

Page 13:

Statistical Tests in Particle Searches


exclusion limits:
• upper limit on the cross section (lower limit on the mass scale)
• (σ < limit, as otherwise we would have seen it)

need to estimate the probability of a downward fluctuation of s+b; try to "disprove" H0 = s+b; better: find the minimal s for which you can still exclude H0 = s+b at a pre-specified Confidence Level

discoveries: need to estimate the probability of an upward fluctuation of b; try to disprove H0 = "background only"

[figure: distributions of N_obs for b only and s+b, with a possible observation marked; the N_obs for a 5σ discovery corresponds to a type-1 error P = 2.8·10⁻⁷ under b only]
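The numbers above can be checked with a short sketch (my own code; the counts in the monotonicity check are made up, and `significance` inverts the one-sided Gaussian tail by bisection):

```python
import math

def p_value(n_obs, b):
    # probability of an upward fluctuation >= n_obs under background only
    return 1.0 - sum(math.exp(-b) * b ** k / math.factorial(k) for k in range(n_obs))

def significance(p):
    # one-sided Gaussian significance Z for a given p-value, by bisection
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > p:
            lo = mid            # tail still too large: Z must be bigger
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(significance(2.8e-7), 2))      # ~5: the "5 sigma" convention
print(p_value(10, 3.2) < p_value(8, 3.2))  # more observed events: smaller p
```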

Page 14:

Which Test Statistic to Use?


exclusion limit:
• the test statistic does not necessarily have to be simply the counted number of events: remember Neyman-Pearson: use the likelihood ratio

pre-specify the 95% CL for exclusion, then make your measurement

if the result falls in the "critical" (red) region, where you decided to "reject" H0 = s+b (i.e. quote what would have been the chance for THIS particular measurement to still have been "fooled", with there actually having BEEN a signal)

[figure: distributions of the test statistic t for b only and s+b, with the tail probability P(t ≥ t_obs) shaded]

Page 15:

Example: LEP Higgs Search


Remember: there were 4 experiments, with many different search channels; treat the different experiments just like "more channels"

Evaluate how -2lnQ is distributed for background only and for signal+background (note: this needs to be done for all Higgs masses)

example: m_H = 115 GeV/c²

[figure: -2lnQ distributions, more signal-like (left) to more background-like (right); the tail areas give CL_s+b and CL_b]

Page 16:

Example: LEP SM Higgs Limit


Page 17:

Example: LEP Higgs Search


[figure: observed -2lnQ with the CL_s+b and CL_b tail areas; more signal-like (left) to more background-like (right)]

In order to "avoid" the possible "problem" of Being Lucky when setting the limit,

rather than quoting the expected sensitivity in addition, weight your CL_s+b by it:

CL_s = CL_s+b / CL_b = P(-2lnQ ≥ -2lnQ_obs | s+b) / P(-2lnQ ≥ -2lnQ_obs | b)
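A toy version of the CL_s construction for a single counting channel (my own sketch; s, b and n_obs are made up, and Q is the simple Poisson likelihood ratio):

```python
import math
import random

random.seed(42)
s, b = 3.0, 5.0          # hypothetical signal and background expectations

def poisson(lam):
    # Knuth's simple Poisson generator (fine for small lam)
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def llr(n):
    # -2 ln Q for one counting channel, Q = L(n | s+b) / L(n | b);
    # decreases as n grows, so signal-like outcomes sit at small values
    return -2.0 * (n * math.log(1.0 + s / b) - s)

n_toys = 20000
q_b  = [llr(poisson(b)) for _ in range(n_toys)]
q_sb = [llr(poisson(s + b)) for _ in range(n_toys)]

n_obs = 5                 # hypothetical observed count
q_obs = llr(n_obs)
cls_b = sum(q >= q_obs for q in q_sb) / n_toys   # CL_s+b
clb   = sum(q >= q_obs for q in q_b) / n_toys    # CL_b
print(round(cls_b / clb, 2))                     # CL_s = CL_s+b / CL_b
```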

Page 18:

Systematic Uncertainties

standard popular way (Cousins/Highland):

integrate over all systematic errors and their "probability distribution"

marginalisation of the joint probability density of measurement parameters and systematic errors

"hybrid": frequentist intervals with Bayesian systematics

has been shown to have possibly large "undercoverage" for very small p-values / large significances (i.e. it underestimates the chance of a "false discovery"!!)


! Bayesian ! (a probability distribution for the systematic parameter)

LEP Higgs: generated MC to get the PDFs with parameters varied within their systematic uncertainty

essentially the same as "integrating over" them; needs a probability density for how these parameters vary
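A sketch of the hybrid recipe for a p-value (my own implementation; the counts and the Gaussian pdf for the background are made-up assumptions):

```python
import math
import random

random.seed(7)

def p_tail(n_obs, b):
    # Poisson P(N >= n_obs | mean b)
    return 1.0 - sum(math.exp(-b) * b ** k / math.factorial(k) for k in range(n_obs))

def hybrid_p(n_obs, b0, sigma_b, toys=20000):
    # Cousins/Highland: average the p-value over a (Bayesian) pdf for b,
    # here a Gaussian of mean b0 and width sigma_b, truncated at zero
    total = 0.0
    for _ in range(toys):
        b = max(random.gauss(b0, sigma_b), 1e-9)
        total += p_tail(n_obs, b)
    return total / toys

p_fixed = p_tail(8, 3.0)
p_smeared = hybrid_p(8, 3.0, 1.0)
print(p_smeared > p_fixed)   # smearing b inflates the tail probability here
```

The inflation follows from the convexity of the tail probability in b in this regime, which is one way to see why the hybrid p-values can misbehave far out in the tails.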

Page 19:

Systematic Uncertainties


Why don't we include any systematic uncertainty as a "free parameter" in the fit?

K.Cranmer Phystat2003

e.g. measure the background contribution under the signal peak in the sidebands

the measurement + extrapolation into the signal region have uncertainties

but you can parametrise your expected background such that:

if the sideband measurement gives this data, then b = …

Note: no need to specify a prior probability

Build your Likelihood function such that it includes: your parameters of interest, and those describing the influence of the systematic uncertainty: the nuisance parameters
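A sketch of this on/off idea (my own implementation; the counts and the sideband/signal ratio τ are made up). The sideband term in the joint likelihood determines b without any prior:

```python
import math

def neg2_log_l(s_val, b_val, n_sig, n_side, tau):
    # joint -2 ln L (constants dropped): signal region n_sig ~ Pois(s + b),
    # sideband n_side ~ Pois(tau * b); the sideband measures b, no prior needed
    def term(n, lam):
        return 2.0 * (lam - n * math.log(lam))
    return term(n_sig, s_val + b_val) + term(n_side, tau * b_val)

n_sig, n_side, tau = 15, 40, 4.0   # illustrative counts; tau = sideband/signal ratio

# crude grid scan for the joint minimum (a real fit would use a minimiser)
s_grid = [0.05 * i for i in range(201)]          # s in [0, 10]
b_grid = [5.0 + 0.05 * i for i in range(201)]    # b in [5, 15]
s_hat, b_hat = min(((sv, bv) for sv in s_grid for bv in b_grid),
                   key=lambda p: neg2_log_l(p[0], p[1], n_sig, n_side, tau))
print(round(s_hat, 2), round(b_hat, 2))   # analytic answer: s = 5, b = 10
```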

Page 20:

Nuisance Parameters and Profile Likelihood


Build your Likelihood function such that it includes: your parameters of interest, and those describing the influence of the systematic uncertainty: the nuisance parameters

“ratio of likelihoods”, why ?

Page 21:

Profile Likelihood


Why not simply use L(μ, θ) as the test statistic?

• The number of degrees of freedom of the fit would be N_θ + 1

• However, we are not interested in the values of θ (they are nuisances!)

• Additional degrees of freedom dilute the interesting information on μ

• The "profile likelihood" (= ratio of maximum likelihoods) concentrates the information on what we are interested in

• It is just as we usually do for chi-squared: Δχ²(μ) = χ²(μ, θ̂'_best) - χ²(μ̂_best, θ̂_best), where θ̂'_best minimises χ² at that fixed μ

• N_d.o.f. of Δχ²(μ) is 1, and the value of χ²(μ̂_best, θ̂_best) measures the "Goodness-of-fit"

“ratio of likelihoods”, why ?
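A sketch of a profile-likelihood scan on an on/off counting model (my own implementation with made-up counts); the nuisance parameter b is minimised away at each fixed s, and the Δ(-2 ln L) = 1 rule then gives an interval on s alone:

```python
import math

N_SIG, N_SIDE, TAU = 15, 40, 4.0   # illustrative counts; TAU = sideband/signal ratio

def n2ll(s, b):
    # joint -2 ln L of signal region and sideband, constants dropped
    return 2.0 * ((s + b) - N_SIG * math.log(s + b)
                  + TAU * b - N_SIDE * math.log(TAU * b))

b_grid = [2.0 + 0.05 * i for i in range(361)]    # b in [2, 20]
s_grid = [0.05 * i for i in range(301)]          # s in [0, 15]

def profile(s):
    # minimise over the nuisance parameter b at fixed s
    return min(n2ll(s, b) for b in b_grid)

profs = [profile(s) for s in s_grid]
gmin = min(profs)

# 68% interval on s from Delta(-2 ln L_profile) <= 1 (one degree of freedom)
inside = [s for s, p in zip(s_grid, profs) if p - gmin <= 1.0]
print(round(min(inside), 2), round(max(inside), 2))
```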

Page 22:

Summary


Maximum Likelihood fit to estimate parameters

what to do if the estimator is non-Gaussian: Neyman confidence intervals

what "bothers" people with them: Feldman/Cousins confidence belts/intervals

unify "limit" and "measurement" confidence belts

CL_s … the HEP limit; a ratio of "p-values" … statisticians don't like that

new idea: Power Constrained Limits: rather than quoting the "sensitivity" alongside the "Neyman confidence interval", decide beforehand that you will "accept" limits only where your experiment has sufficient "power", i.e. "sensitivity"!

.. a bit about Profile Likelihood and systematic errors.