hypothesis testing class of “experimental methods of physics” mikhail yurov kyungpook national...

HYPOTHESIS TESTING

class of

“Experimental Methods of Physics”

Mikhail Yurov
Kyungpook National University
May 9th, 2005

Contents

• Introduction
• Use of weighted sum of squared deviations
• Errors of two sorts
• Types of hypothesis testing
• Using the z-statistic

Introduction

In probability theory, we start with some well defined problem, and calculate from this the possible outcomes of a specific experiment. We thus proceed from theory to the data.

In ‘statistics’ we try to solve the inverse problem: using the data to deduce the rules or laws relevant to our experiment.

The two basic sorts of problems that we deal with in the subject of statistics are hypothesis testing and parameter fitting.

In the former, we test whether our data are consistent with a specific theory (which may contain some free parameters) and in the latter we use the data to determine the values of the free parameters.

Logically, hypothesis testing precedes parameter fitting, since if our hypothesis is incorrect, then there is no point in determining the values of the free parameters contained within the hypothesis.

In fact, we deal with parameter fitting first, since it is easier to understand. In practice, one often does parameter fitting first anyway; it may be impossible to perform a sensible test of the hypothesis before its free parameters have been set at their optimum values.

Example

Suppose we have data on an angular distribution, consisting of a set of values cosθi for each interaction, where θi is the angle that the observed particle makes with some fixed direction. We can ask: are the data consistent with an angular distribution of the form

$$\frac{dn}{d\cos\theta} = a + b\cos^2\theta \; ?$$

If the data look inconsistent with this, can we make a numerical estimate indicating how confident we are that the experimental data show that this angular distribution is incorrect?

Use of weighted sum of squared deviations

So the more fundamental question is whether our hypothesis concerning the form of the data is correct or not. In fact we will not be able to give a ‘yes or no’ answer, but only to state how confident we are about accepting or rejecting the hypothesis.

In simple cases, the hypothesis may consist simply of a particular value for some parameter.

Figure: the desirability of examining a distribution, rather than simply determining a parameter, when we are hypothesis testing. If we fit either the solid or the dashed distribution in cosθ by an expression [1 + (b/a) cos²θ], the value of b/a is liable to be close to zero. This does not imply that either distribution is isotropic.

It is preferable to perform distribution testing rather than parameter testing. Distributions are tested by the χ2-method.

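In order to test a hypothesis we have to:

a. Construct S and minimize it with respect to the free parameters:

$$S = \sum_{i=1}^{\text{bins}} \left( \frac{y_i^{\text{obs}} - y_i^{\text{th}}(\alpha_j)}{\sigma_i} \right)^2$$

where y_i^obs is the observed value in bin i, y_i^th(α_j) is the theoretical prediction for the free parameters α_j, and σ_i is the error on the observed value.

b. Determine the number of degrees of freedom ν from ν = b - p, where b is the number of bins of the distribution and p is the number of free parameters.

c. Look up in the relevant set of tables the probability that, for ν degrees of freedom, χ2 is greater than or equal to our observed value Smin.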
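Steps a and b can be sketched in Python for the simplest possible hypothesis, a constant y_th = N with one free parameter. This is only an illustration: the bin contents and errors are hypothetical values, not data from the lecture, and the function names are made up for this sketch. For a constant model, S is minimized analytically by the weighted mean, so no numerical minimizer is needed.

```python
# Hypothetical binned data: observed values and their errors (not from the lecture).
y_obs = [12.0, 9.0, 11.0, 8.0, 10.0]
sigma = [3.0, 3.0, 3.0, 3.0, 3.0]

def weighted_sum_of_squares(y_obs, y_th, sigma):
    """S = sum over bins of ((y_obs - y_th) / sigma)^2."""
    return sum(((yo - yt) / s) ** 2 for yo, yt, s in zip(y_obs, y_th, sigma))

def fit_constant(y_obs, sigma):
    """Value of N that minimizes S for the model y_th = N (the weighted mean)."""
    weights = [1.0 / s ** 2 for s in sigma]
    return sum(w * y for w, y in zip(weights, y_obs)) / sum(weights)

# Step a: minimize S with respect to the single free parameter N.
n_best = fit_constant(y_obs, sigma)
s_min = weighted_sum_of_squares(y_obs, [n_best] * len(y_obs), sigma)

# Step b: degrees of freedom = number of bins - number of free parameters.
nu = len(y_obs) - 1

print(f"N = {n_best:.2f}, S_min = {s_min:.3f}, nu = {nu}")
```

Step c is then the table look-up of the probability that χ2 with ν degrees of freedom exceeds S_min.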

The χ2-distribution has the property that the expectation value is

$$\langle \chi^2 \rangle = \nu$$

and the variance is σ2(χ2) = 2ν.
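The mean and variance quoted above can be checked numerically. The sketch below assumes the standard construction of a χ2 variable as the sum of ν squared standard normal deviates; the seed, sample size and choice ν = 4 are arbitrary illustration values.

```python
import random

def sample_chi2(nu, rng):
    """One chi-squared value: the sum of nu squared standard normal deviates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

rng = random.Random(12345)  # fixed seed so the run is reproducible
nu = 4
samples = [sample_chi2(nu, rng) for _ in range(50000)]
mean, var = mean_and_variance(samples)
print(f"nu = {nu}: mean ~ {mean:.2f} (expect {nu}), variance ~ {var:.2f} (expect {2 * nu})")
```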

χ2-distribution for various numbers of degrees of freedom ν. As ν increases, so do the mean and variance of the distribution.

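Thus large values of Smin (well above ν) are unlikely if the hypothesis is correct, so observing one suggests that our hypothesis is probably wrong. Very small values of Smin are also unlikely, so again something is suspicious; for example, the errors may have been overestimated.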

More useful than the χ2 distribution itself is

Fν(c)=Pν(χ2>c)

i.e. the probability that, for the given number of degrees of freedom, the value of χ2 will exceed a particular specified value c.

Such distributions are tabulated in almost all books on statistics.

Example

In a cosθ histogram, let’s assume that there are 12 bins and that when we fit the expression N[1 + (b/a) cos²θ] to the data, we obtain a value of 20.0 for Smin.

In this case we have ten degrees of freedom (12 bins less two parameters, N and b/a).

From the figure, we see that the probability of getting a value of 20.0 or larger is about 3%.
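The 3% read off the figure can be cross-checked without tables. When ν is even, the tail probability Fν(c) has a finite closed form; the sketch below uses that closed form only (it does not handle odd ν), and the function name is an assumption made for this illustration.

```python
import math

def chi2_tail_even(c, nu):
    """F_nu(c) = P(chi^2 > c) for an even number of degrees of freedom nu.

    Uses the closed form exp(-c/2) * sum_{k=0}^{nu/2 - 1} (c/2)^k / k!,
    which is valid only when nu is a positive even integer.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form needs a positive even nu")
    half = c / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(nu // 2))

# Example from the text: 12 bins, 2 free parameters, S_min = 20.0.
nu = 12 - 2
f = chi2_tail_even(20.0, nu)
print(f"F_{nu}(20.0) = {f:.4f}")
```

The result is about 0.029, in agreement with the value of roughly 3% taken from the figure.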

If our experiment is repeated many times, and assuming that our hypothesis is correct, then because of fluctuations we will get a larger value of Smin than the particular one we are considering in a fraction F of the experiments.

Errors of two sorts

In deciding whether or not to reject a hypothesis, we can make two sorts of incorrect decision.

Error of the first kind

In this case we reject the hypothesis H when it is in fact correct.

This should happen in a well known fraction F of the tests, where F is determined by the maximum accepted value of Smin.

But if we have biases in our experiment so that the actual value of the answer is incorrect, or if our errors are incorrectly estimated, then such errors of the first kind can happen more or less frequently than the fraction F suggests.

The number of errors of the first kind can be reduced simply by increasing the limit on Smin above which we reject the hypothesis.

Error of the second kind

In this case we fail to reject the hypothesis when in fact it is false, and some other hypothesis is correct.

The value of Smin accidentally turns out to be small, even though the hypothesis H (i.e. the theoretical curve yth that is being compared with the data) is incorrect. It is very difficult to estimate how frequent this effect is likely to be; it depends not only on the magnitude of the cut for Smin but also on the nature of the competing hypotheses.

If these are known, then we may be able to predict what distribution they will give for Smin and hence how often we will be incorrect in accepting H.
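The trade-off between the two kinds of error can be illustrated with pseudo-experiments. The sketch below is a toy model under stated assumptions: five bins in cosθ with equal Gaussian errors, a flat hypothesis H, and one hypothetical competing shape proportional to 1 + 1.5 cos²θ. None of these numbers come from the lecture. For each cut on Smin it estimates how often H is rejected when true (first kind) and accepted when the competing shape is true (second kind).

```python
import random

def s_min_constant(y, sigma):
    """S_min for the hypothesis y_th = N with equal errors sigma (best N = mean of y)."""
    n_best = sum(y) / len(y)
    return sum(((yi - n_best) / sigma) ** 2 for yi in y)

def rejection_fraction(truth, sigma, cut, trials, rng):
    """Fraction of pseudo-experiments in which S_min exceeds the cut."""
    rejected = 0
    for _ in range(trials):
        y = [rng.gauss(mu, sigma) for mu in truth]
        if s_min_constant(y, sigma) > cut:
            rejected += 1
    return rejected / trials

rng = random.Random(2005)  # fixed seed so the run is reproducible
x = [-0.8, -0.4, 0.0, 0.4, 0.8]                    # hypothetical bin centres in cos(theta)
flat = [10.0 for _ in x]                           # hypothesis H: isotropic
alt = [10.0 * (1.0 + 1.5 * xi ** 2) for xi in x]   # hypothetical competing shape
sigma, trials = 3.0, 20000

results = []
for cut in (6.0, 10.0, 14.0):
    first_kind = rejection_fraction(flat, sigma, cut, trials, rng)        # H true, rejected
    second_kind = 1.0 - rejection_fraction(alt, sigma, cut, trials, rng)  # H false, accepted
    results.append((cut, first_kind, second_kind))
    print(f"cut = {cut:4.1f}: first kind ~ {first_kind:.3f}, second kind ~ {second_kind:.3f}")
```

In this toy setup, raising the cut lowers the rate of errors of the first kind and raises the rate of errors of the second kind; the size of the second effect depends entirely on the assumed competing shape.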

Types of hypothesis testing

The hypothesis we are testing may relate to the experiment as a whole, or alternatively it may be used as a selector for subsets of a data sample which satisfy specific criteria.

Hypothesis relates to whole experiment

We observe an angular distribution from the decay of a resonance. The question is “Does the resonance have spin zero?” This would imply that the angular distribution is isotropic.

In this case, an error of the first kind is serious, and in this example so is an error of the second kind: in the former we reject the spin-zero hypothesis when it is in fact true; in the latter we accept it when the spin is non-zero.

In this experiment, the alternative hypotheses are well defined: if the spin is not zero, it is 1, 2, 3, ... It may also be possible to calculate the angular distributions for these cases, and hence we can deduce how often each of them gives a low value for Smin.

Example

The angular distribution for the decay of a state whose spin we wish to determine. If the spin is zero, the distribution must be isotropic (dashed line). We calculate the value of Smin for this hypothesis. There are five experimental points and four degrees of freedom for this hypothesis, since the only variable is the normalization. If Smin is larger than 10, we would reject this hypothesis: the probability that χ2 for 4 degrees of freedom exceeds 10 is only about 4%. In our case Smin is 8.7, so the hypothesis is not rejected.

This does not necessarily mean that the spin is zero. If it were 1, the predicted decay distribution may be cos²θ (dotted curve). The value S'min for this hypothesis is 4.1, which is also below our rejection cut. The errors on our data are so large that we have poor discrimination between these two hypotheses.
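For four degrees of freedom the tail probability has a simple closed form, so the numbers in this example can be checked by hand. The following worked evaluation assumes that both Smin values follow a χ2 distribution with ν = 4 (that is, that the cos²θ hypothesis also has only the normalization as a free parameter); the rounded values are computed here and are not quoted from the lecture.

```latex
F_4(c) = P_4(\chi^2 > c) = e^{-c/2}\left(1 + \frac{c}{2}\right)

F_4(10) \approx 0.040, \qquad F_4(8.7) \approx 0.069, \qquad F_4(4.1) \approx 0.39
```

Both observed values are therefore quite compatible with their respective hypotheses, which is another way of seeing that these data discriminate poorly between them.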

Hypothesis used as data selector

An experiment may consist of a large set of interactions of a beam of protons with a hydrogen target, in each of which four charged tracks are observed and measured.

We test the hypothesis that these interactions are examples of the reaction

pp→ppπ+π-

The hypothesis is tested by seeing whether the measured directions and momenta of the tracks are consistent with those expected for this reaction on the basis of energy and momentum conservation. Here we are using our hypothesis to select individual sets of data (events) for study, with a view to extracting some interesting physics.

Errors of the first kind correspond to rejecting a small fraction of genuine examples of the reaction. This is not serious; the reduction in the size of the data sample due to the rejection of these events should be small.

Errors of the second kind correspond to accepting events as examples of the reaction when they are in fact produced by some other reaction with four visible charged tracks, for example

pp→ppμ+μ- (*)

Thus errors of the second kind constitute a potentially more dangerous problem: our data sample is contaminated. The extent of this contamination is difficult to estimate. It will depend on how frequently reaction (*) produces kinematical configurations resembling those of the reaction of interest. Since the μ mass is very close to that of the π, reaction (*) will be difficult to distinguish from the primary reaction simply on the basis of measurements of directions and momenta.

This contamination is in general reduced by lowering the value of the cut on Smin, at the cost of rejecting more genuine events.

Using the z-statistic

When σ is known it is possible to describe the form of the distribution of the sample mean as a Z statistic.

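$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where x̄ is the sample mean, μ is the population mean (either known or hypothesized under H0), σ is the population standard deviation, and n is the sample size.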

Critical Region: the portion of the area under the curve which includes those values of a statistic that lead to the rejection of the null hypothesis.

The most often used significance levels are 0.01, 0.05 and 0.1. For a one-tailed test using the z-statistic, these correspond to z-values of 2.33, 1.65 and 1.28 respectively. For a two-tailed test, the critical region of 0.01 is split into two equal outer areas of 0.005 each, bounded by z = -2.58 and z = +2.58.
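The critical z-values quoted above can be reproduced from the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper function names are made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def one_tailed_critical_z(alpha):
    """z such that the area in the upper tail beyond z equals alpha."""
    return standard_normal.inv_cdf(1.0 - alpha)

def two_tailed_critical_z(alpha):
    """z such that the two tails beyond -z and +z together hold area alpha."""
    return standard_normal.inv_cdf(1.0 - alpha / 2.0)

for alpha in (0.01, 0.05, 0.1):
    print(f"alpha = {alpha}: one-tailed z = {one_tailed_critical_z(alpha):.3f}, "
          f"two-tailed |z| = {two_tailed_critical_z(alpha):.3f}")
```

To three decimals the one-tailed value at 0.05 is 1.645, which printed tables commonly round to 1.65.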

Example

Given a population with μ=250 and σ=50, what is the probability of drawing a sample of n=100 values whose mean is at least 255?

In this case, Z = (255 - 250)/(50/√100) = 1.00. Looking at the Table of Areas Under the Normal Curve, the given area for Z=1.00 is 0.3413. To its right is 0.1587 (= 0.5 - 0.3413), or 15.87%.

Conclusion

There are approximately 16 chances in 100 of obtaining a sample mean of at least 255 from this population when n=100.
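The same calculation can be done without a printed table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library in place of the table of areas; the function names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n)), for a known population sigma."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 1.0 - NormalDist().cdf(z)

z = z_statistic(255.0, mu=250.0, sigma=50.0, n=100)
p = upper_tail_probability(z)
print(f"z = {z:.2f}, P(mean >= 255) = {p:.4f}")
```

The standard error is 50/√100 = 5, so a sample mean of 255 lies exactly one standard error above μ.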

References

• L. Lyons, “Statistics for Nuclear and Particle Physics”, Cambridge (1985)
• William R. Leo, “Techniques for Nuclear and Particle Physics Experiments”, Springer-Verlag Berlin Heidelberg (1987)
• http://rvgs.k12.va.us/statman/
