

Physics 3700 Probability, Statistics, & Data Analysis

Introduction:
I) The understanding of many physical phenomena relies on statistical and probabilistic concepts:
    Statistical Mechanics (physics of systems composed of many parts: gases, liquids, solids)
        1 mole of anything contains $6\times10^{23}$ particles (Avogadro's number).
        Even though the force between particles (Newton's laws) is known, it is impossible to keep track of all $6\times10^{23}$ particles, even with the fastest computer imaginable.
        We must resort to learning about the group properties of all the particles: use the partition function to calculate the average energy, entropy, pressure... of a system.
    Quantum Mechanics (physics at the atomic or smaller scale, < $10^{-10}$ m)
        wavefunction = probability amplitude
        talk about the probability of an electron being located at (x,y,z) at a certain time.

II) Our understanding/interpretation of experimental data relies on statistical and probabilistic concepts:
    how do we extract the best value of a quantity from a set of measurements?
    how do we decide if our experiment is consistent/inconsistent with a given theory?
    how do we decide if our experiment is internally consistent?
    how do we decide if our experiment is consistent with other experiments?
In this course we will concentrate on II), the above experimental issues!

Note: The theory of probability is an area of pure mathematics, while statistics is an area of applied mathematics that uses the axioms and definitions of probability theory.

R. Kass/Sp15, P3700 Lecture 1


An Example from Particle Physics
Many of the processes involved in the detection of particles are statistical in nature:
    Number of ion pairs created when a proton goes through 1 cm of gas
    Energy lost by an electron going through 1 mm of lead
The understanding and interpretation of all experimental data depend on statistical and probabilistic concepts:
    "The result of the experiment was inconclusive so we had to use statistics"
    how do we extract the best value of a quantity from a set of measurements?
    how do we decide if our experiment is consistent/inconsistent with a given theory?
    how do we decide if our experiment is internally consistent?
    how do we decide if our experiment is consistent with other experiments?
    how do we decide if we have a signal (i.e. evidence for a new particle)?

Consider the recent discovery of the Higgs particle by ATLAS and CMS in 2012.
    What is a Higgs particle? It is responsible for you having MASS!
        2013 Nobel Prize to Englert & Higgs for predicting this particle.
    What is the evidence that such a particle has been discovered?
    How did "statistics" play a role in its discovery?
        It was important to convince skeptics that there was a very low probability for two experiments to make a mistake at the same time.
        We can estimate this probability using the stuff we learn in this class.
    Also, how do we use the data in an optimal way to calculate the mass of the Higgs?

[Photos: Peter Higgs, François Englert]


[Figure: Higgs decay into 2 high-energy gamma rays; data taken over a two-year period, 2011-2012]


[Figure: Higgs decay into 4 leptons (lepton = electron or muon); data taken over a two-year period, 2011-2012]


How do we define Probability?

Definitions of probability:
● Frequentist definition: Suppose we have N trials and a specified event occurs r times.
    example: the trial could be rolling a die and the event could be rolling a 6.
    define the probability (P) of an event (E) occurring as: P(E) = r/N when N → ∞
    examples:
        six-sided die: P(6) = 1/6
            for an honest die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
        coin toss: P(heads) = P(tails) = 0.5
            The observed fraction of heads should approach 0.5 the more times you toss the coin.
            For a single coin toss we can never get P(heads) = 0.5! (with N = 1 the observed fraction r/N is either 0 or 1)
◆ Mathematical definition (Kolmogorov): By definition probability (P) is a non-negative real number bounded by 0 ≤ P ≤ 1
    if P = 0 then the event never occurs
    if P = 1 then the event always occurs
    Let A and B be subsets of S; then P(A) ≥ 0, P(B) ≥ 0
    Events are independent if: P(A∩B) = P(A)P(B)
        Coin tosses are independent events: the result of the next toss does not depend on the previous toss.
    Events are mutually exclusive (disjoint) if: P(A∩B) = 0 or, equivalently, P(A∪B) = P(A) + P(B)
        In tossing a coin we either get a head or a tail.
    The sum (or integral) of the probabilities over all mutually exclusive possibilities must = 1.
◆ Bayesian definition: Instead of repeatability, it relies on degree of belief; it obeys Kolmogorov's axioms.
    Avoids the difficulty of N → ∞, which never happens in practice.

Notation: ∩ ≡ intersection, ∪ ≡ union
    example: A = {1,2,3}, B = {1,3,5}: A∩B = {1,3}, A∪B = {1,2,3,5}
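The frequentist definition can be tried out numerically. The sketch below simulates rolls of a fair die and prints r/N for increasing N; the function name `estimate_probability` and the seed value are arbitrary choices for illustration, not part of the lecture. It also reproduces the set example with Python's built-in sets.

```python
import random

def estimate_probability(n_trials, target=6, seed=3700):
    """Estimate P(target) for a fair six-sided die as r/N from simulated rolls."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    r = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == target)
    return r / n_trials

# r/N should drift toward 1/6 as N grows; for N = 1 it can only be 0 or 1
for n in (1, 100, 10_000, 100_000):
    print(f"N = {n:>7}: r/N = {estimate_probability(n):.4f}")

# The set example from the slide
A = {1, 2, 3}
B = {1, 3, 5}
print("intersection:", A & B)
print("union:", A | B)
```

With a finite number of trials the estimate only approximates 1/6, which is the practical difficulty with N → ∞ mentioned under the Bayesian definition.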


● Probability can be a discrete or a continuous variable.
    Discrete probability: P can have certain values only.
        examples:
            tossing a six-sided die: P(x_i) = P_i, here x_i = 1, 2, 3, 4, 5, 6 and P_i = 1/6 for all x_i.
            tossing a coin: only 2 choices, heads or tails.
        NOTATION: x_i is called a random variable.
        for both of the above discrete examples (and in general), when we sum over all mutually exclusive possibilities:
            $$\sum_i P(x_i) = 1$$
    Continuous probability: P can be any number between 0 and 1.
        define a "probability density function", pdf, f(x):
            $$f(x)\,dx = dP(x \le \alpha \le x + dx)$$
            with α a continuous variable
        The probability for x to be in the range a ≤ x ≤ b is:
            $$P(a \le x \le b) = \int_a^b f(x)\,dx$$
            Probability = "area under the curve"
        Just like the discrete case, the sum of all probabilities must equal 1:
            $$\int_{-\infty}^{+\infty} f(x)\,dx = 1$$
        We say that f(x) is normalized to one.
        The probability for x to be exactly some number is zero since:
            $$\int_{x=a}^{x=a} f(x)\,dx = 0$$

Note: in the above example the pdf depends on only 1 variable, x. In general, the pdf can depend on many variables, i.e. f = f(x,y,z,…). In these cases the probability is calculated using multi-dimensional integration.
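The "area under the curve" statements can be checked with a few lines of numerical integration. This is a minimal sketch, assuming an exponential pdf f(x) = exp(-x) for x ≥ 0 as the illustration (the choice of pdf, the helper `integrate`, and the cutoff at x = 50 are assumptions, not part of the lecture).

```python
import math

def integrate(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule on n slices."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def pdf(x):
    """Exponential pdf f(x) = exp(-x) for x >= 0 (illustrative choice)."""
    return math.exp(-x) if x >= 0 else 0.0

# Normalization: the upper limit 50 stands in for +infinity (exp(-50) is negligible)
total = integrate(pdf, 0.0, 50.0)

# Probability for 1 <= x <= 2 is the area under the curve between those limits
p_1_to_2 = integrate(pdf, 1.0, 2.0)

# A single point has zero width, hence zero probability
p_point = integrate(pdf, 1.0, 1.0)

# Discrete case: the probabilities of the six faces of a fair die sum to 1
die_total = sum(1 / 6 for _ in range(6))

print(total, p_1_to_2, p_point, die_total)
```

For this pdf the integral can also be done by hand, P(1 ≤ x ≤ 2) = exp(-1) - exp(-2), which gives a cross-check on the numerical result.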


● Examples of some common P(x)'s and f(x)'s:
    Discrete = P(x)      Continuous = f(x)
    binomial             uniform, i.e. constant
    Poisson              Gaussian
                         exponential
                         chi square

● How do we describe a probability distribution?
    ◆ mean, mode, median, and variance
    ◆ for a normalized continuous distribution, $\int_{-\infty}^{+\infty} f(x)\,dx = 1$, these quantities are defined by:
        Mean (average):                    $\mu = \int_{-\infty}^{+\infty} x f(x)\,dx$
        Mode (most probable):              $\left.\frac{\partial f(x)}{\partial x}\right|_{x=a} = 0$
        Median (50% point):                $0.5 = \int_{-\infty}^{a} f(x)\,dx$
        Variance (width of distribution):  $\sigma^2 = \int_{-\infty}^{+\infty} f(x)\,(x-\mu)^2\,dx$
    ◆ for a discrete distribution, the mean and variance are defined by:
        $$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
        $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$


Some Continuous Probability Distributions

[Figures: a symmetric distribution (Gaussian) with width σ; an asymmetric distribution showing the mean, median and mode; the chi-square distribution; the Student t distribution (v=∞ ⇒ Gaussian, v=1 ⇒ Cauchy (Breit-Wigner))]

For a Gaussian pdf the mean, mode, and median are all at the same x.
For many pdfs the mean, mode, and median are in different places.

Remember: Probability is the area under these curves! For many pdfs the integral cannot be done in closed form; use a table to calculate the probability.


● Calculation of mean and variance:
    example: a discrete data set consisting of three numbers: {1, 2, 3}
        The average (µ) is just:
            $$\mu = \sum_{i=1}^{n} \frac{x_i}{n} = \frac{1+2+3}{3} = 2$$
        Complication: suppose some measurements are more precise than others.
            Let each measurement x_i have a weight w_i associated with it; then the "weighted average" is:
            $$\mu = \sum_{i=1}^{n} x_i w_i \Big/ \sum_{i=1}^{n} w_i$$
        The variance (σ^2), or average squared deviation from the mean, is just:
            $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$
            σ is called the standard deviation. The variance describes the width of the pdf!
        Rewrite the above expression by expanding the summations:
            $$\sigma^2 = \frac{1}{n}\left[\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \mu^2 - 2\mu\sum_{i=1}^{n} x_i\right]
                       = \frac{1}{n}\sum_{i=1}^{n} x_i^2 + \mu^2 - 2\mu^2
                       = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2$$
            This is sometimes written as <x^2> - <x>^2, with <> ≡ average of whatever is in the brackets.
    Note: The n in the denominator would be n - 1 if we determined the average (µ) from the data itself.
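The two forms of the variance can be compared directly on the {1, 2, 3} data set. A minimal sketch follows; the weights used for the weighted average are hypothetical values chosen only to show the formula at work.

```python
data = [1, 2, 3]
n = len(data)

mean = sum(data) / n                                    # mu = (1/n) * sum of x_i
var_direct = sum((x - mean) ** 2 for x in data) / n     # (1/n) * sum of (x_i - mu)^2
var_shortcut = sum(x * x for x in data) / n - mean**2   # <x^2> - <x>^2

# Weighted average: the weights below are hypothetical
weights = [1.0, 1.0, 2.0]
weighted_mean = sum(x * w for x, w in zip(data, weights)) / sum(weights)

print(mean, var_direct, var_shortcut, weighted_mean)
```

Both variance expressions agree, and giving the last measurement twice the weight pulls the weighted average above the plain average of 2.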


        Using the definition of µ from above we have for our example of {1,2,3}:
            $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2 = 4.67 - 2^2 = 0.67$$
        The case where the measurements have different weights is more complicated:
            $$\sigma^2 = \sum_{i=1}^{n} w_i (x_i - \mu)^2 \Big/ \sum_{i=1}^{n} w_i = \sum_{i=1}^{n} w_i x_i^2 \Big/ \sum_{i=1}^{n} w_i - \mu^2$$
            Here µ is the weighted mean.
            If we calculated µ from the data, σ^2 gets multiplied by a factor n/(n-1).

Example: a continuous probability distribution,
    $$f(x) = c\,\sin^2 x \quad \text{for } 0 \le x \le 2\pi, \quad c = \text{constant}$$
    This "pdf" has two modes!
    It has the same mean and median, but they differ from the mode(s).


    For continuous probability distributions, the mean, mode, and median are calculated using either integrals or derivatives:
        $$\mu = \int_0^{2\pi} x \sin^2 x\,dx \Big/ \int_0^{2\pi} \sin^2 x\,dx = \pi$$
        $$\text{mode}: \ \frac{\partial}{\partial x}\sin^2 x = 0 \ \Rightarrow\ x = \frac{\pi}{2},\ \frac{3\pi}{2}$$
        $$\text{median}: \ \int_0^{\alpha} \sin^2 x\,dx \Big/ \int_0^{2\pi} \sin^2 x\,dx = \frac{1}{2} \ \Rightarrow\ \alpha = \pi$$
        Note: $\int_0^{2\pi} \sin^2 x\,dx = \pi$
    f(x) = sin^2 x is not a true pdf since it is not normalized!
    f(x) = (1/π) sin^2 x is a normalized pdf (c = 1/π).
    In this class you should feel free to use a table of integrals and/or derivatives.

◆ example: Gaussian distribution function, a continuous probability distribution
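The mean, median, and modes quoted for the sin^2 x example can also be verified numerically instead of with a table of integrals. The sketch below uses a plain midpoint rule, a bisection search for the median, and a grid scan for the modes; the helper names, grid sizes, and iteration counts are illustrative choices.

```python
import math

def pdf(x):
    """Normalized pdf from the example: f(x) = (1/pi) sin^2(x) on [0, 2*pi]."""
    return math.sin(x) ** 2 / math.pi

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

two_pi = 2 * math.pi
norm = integrate(pdf, 0.0, two_pi)                   # should be 1
mean = integrate(lambda x: x * pdf(x), 0.0, two_pi)  # should be pi

# Median: bisect on the cumulative probability until it reaches 0.5.
# The pdf vanishes at x = pi, so the cumulative curve is flat there and
# the search pins down the median less sharply than the mean.
lo, hi = 0.0, two_pi
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if integrate(pdf, 0.0, mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = 0.5 * (lo + hi)

# Modes: interior grid points where the pdf is larger than at both neighbours
grid = [i * two_pi / 2000 for i in range(2001)]
modes = [grid[i] for i in range(1, 2000)
         if pdf(grid[i]) > pdf(grid[i - 1]) and pdf(grid[i]) > pdf(grid[i + 1])]

print(norm, mean, median, modes)
```

The derivative condition alone also picks out the minima of sin^2 x (at 0, π, and 2π), which is why the scan looks for local maxima rather than just zeros of the slope.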


Accuracy and Precision
Accuracy: The accuracy of an experiment refers to how close the experimental measurement is to the true value of the quantity being measured.
Precision: This refers to how well the experimental result has been determined, without regard to the true value of the quantity being measured.
    Just because an experiment is precise does not mean it is accurate!!
    example: measurements of the neutron lifetime over the years:

[Figure: various measurements of the neutron lifetime over the years. The size of each bar reflects the precision of the experiment.]

Steady increase in precision of the neutron lifetime, but are any of these measurements accurate?


Measurement Errors (or uncertainties)
Use results from probability and statistics as a way of calculating how "good" a measurement is.
    most common quality indicator:
        relative precision = [uncertainty of measurement]/measurement
        example: we measure a table to be 10 inches with an uncertainty of 1 inch.
            relative precision = 1/10 = 0.1 or 10% (% relative precision)
    Uncertainty in a measurement is usually the square root of the variance:
        σ = standard deviation
        σ is usually calculated using the technique of "propagation of errors".

Statistical and Systematic Errors
Results from experiments are often presented as:
    N ± XX ± YY
    N: value of quantity measured (or determined) by experiment.
    XX: statistical error, usually assumed to be from a Gaussian distribution.
        With the assumption of Gaussian statistics we can say (calculate) something about how well our experiment agrees with other experiments and/or theories.
        Expect ~68% chance that the true value is between N - XX and N + XX.
    YY: systematic error. Hard to estimate; the distribution of errors is usually not known.
    examples:
        mass of proton = 0.9382769 ± 0.0000027 GeV (only statistical error given)
        mass of W boson = 80.8 ± 1.5 ± 2.4 GeV (both statistical and systematic error given)
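The ~68% figure follows from the Gaussian assumption and can be reproduced with the error function from the standard library. A short sketch, where the function name `prob_within` is an illustrative choice:

```python
import math

def prob_within(n_sigma):
    """Probability that a Gaussian variable lies within n_sigma standard deviations of its mean."""
    return math.erf(n_sigma / math.sqrt(2))

# Relative precision of the table example: uncertainty / measurement
relative_precision = 1 / 10

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

If the errors are not Gaussian (as is often the case for systematic errors), these fractions no longer apply.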


What's the difference between statistical and systematic errors?
    N ± XX ± YY
Statistical errors are "random" in the sense that if we repeat the measurement enough times:
    XX → 0 as the number of measurements increases
Systematic errors, YY, do not → 0 with repetition of the measurements.
    examples of sources of systematic errors:
        voltmeter not calibrated properly
        a ruler not the length we think it is (a meter stick might really be < 1 meter!)
    Because of systematic errors, an experimental result can be precise, but not accurate!

How do we combine systematic and statistical errors to get one estimate of precision?
    BIG PROBLEM!
    two choices:
        σ_tot = XX + YY                  add them linearly
        σ_tot = (XX^2 + YY^2)^(1/2)      add them in quadrature
Some other ways of quoting experimental results
    lower limit: "the mass of particle X is > 100 GeV"
    upper limit: "the mass of particle X is < 100 GeV"
    asymmetric errors: mass of particle $X = 100^{+4}_{-3}$ GeV
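To see how much the two choices differ, they can be applied to the W boson example quoted earlier (80.8 ± 1.5 ± 2.4 GeV). This is only a sketch of the arithmetic; it does not settle which combination is appropriate.

```python
import math

stat, syst = 1.5, 2.4  # GeV: statistical and systematic errors from the W boson example

linear = stat + syst                 # sigma_tot = XX + YY
quadrature = math.hypot(stat, syst)  # sigma_tot = (XX^2 + YY^2)^(1/2)

print(f"linear:     {linear:.2f} GeV")
print(f"quadrature: {quadrature:.2f} GeV")
```

Adding in quadrature never gives a larger total than adding linearly, so the linear sum is the more conservative of the two.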


Probability, Set Theory and Stuff
The relationships and results from set theory are essential to the understanding of probability. Below are some definitions and examples that illustrate the connection between set theory, probability and statistics.
We define an experiment as a process that generates "observations" and a sample space (S) as the set of all possible outcomes from the experiment:
    simple event: only one possible outcome
    compound event: more than one outcome

As an example of simple and compound events consider particles (e.g. protons, neutrons) made of u ("up"), d ("down"), and s ("strange") quarks. The u quark has electric charge (Q) = 2/3|e| (e = charge of electron) while the d and s quarks have charge = -1/3|e|.
Let the experiment be the ways we combine 3 quarks to make a Q = 0, 1, or 2 state.
    Event A: Q=0 {ssu, ddu, sdu}    note: a neutron is a ddu state
    Event B: Q=1 {suu, duu}         note: a proton is a duu state
    Event C: Q=2 {uuu}

For this example events A and B are compound while event C is simple.
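The events A, B, and C can be generated by brute force: list every unordered combination of three quarks and group the combinations by total charge. A minimal sketch using exact fractions (variable names are illustrative):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations_with_replacement

# Quark charges in units of |e|
CHARGE = {"u": Fraction(2, 3), "d": Fraction(-1, 3), "s": Fraction(-1, 3)}

# Group every unordered 3-quark combination by its total charge Q
states_by_charge = defaultdict(list)
for quarks in combinations_with_replacement("uds", 3):
    q_total = sum(CHARGE[q] for q in quarks)
    states_by_charge[q_total].append("".join(quarks))

for q_total in sorted(states_by_charge):
    states = states_by_charge[q_total]
    kind = "simple" if len(states) == 1 else "compound"
    print(f"Q = {q_total}: {states} ({kind})")
```

The enumeration also turns up Q = -1 combinations (for example ddd), which fall outside the Q = 0, 1, or 2 experiment defined above.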

The following definitions from set theory are used all the time in the discussion of probability.
Let A and B be events in a sample space S.
    Union: The union of A & B (A∪B) is the event consisting of all outcomes in A or B.
    Intersection: The intersection of A & B (A∩B) is the event consisting of all outcomes in both A and B.
    Complement: The complement of A (A´) is the set of outcomes in S not contained in A.
    Mutually exclusive: If A & B have no outcomes in common they are mutually exclusive.


Probability, Set Theory and Stuff
Returning to our example of particles containing 3 quarks ("baryons"):
    The event consisting of charged particles with Q=1,2 is the union of B and C: B∪C
    The events A, B, C are mutually exclusive since they do not have any particles in common.

A common and useful way to visualize union, intersection, and mutually exclusive is to use a Venn diagram of sets A and B defined in space S:

[Figure: four Venn diagrams of sets A and B in space S: Venn diagram of A&B; A∩B: intersection of A&B; A∪B: union of A&B; A & B mutually exclusive]

The axioms of probabilities (P):
    a) For any event A, P(A) ≥ 0. (no negative probabilities allowed)
    b) P(S) = 1.
    c) If A1, A2, …, An is a collection of mutually exclusive events then:
        $$P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i)$$
        (the collection can be infinite (n=∞))

From the above axioms we can prove the following useful propositions:
    a) For any event A: P(A) = 1 - P(A´)
    b) If A & B are mutually exclusive then P(A∩B) = 0
    c) For any two events A & B: P(A∪B) = P(A) + P(B) - P(A∩B)
    Items b, c are "obvious" from their Venn diagrams.


Probability, Set Theory and Stuff
Example: Everyone likes pizza.
Assume the probability of having pizza for lunch is 40%, the probability of having pizza for dinner is 70%, and the probability of having pizza for lunch and dinner is 30%. Also, this person always skips breakfast. We can recast this example using:
    P(A) = probability of having pizza for lunch = 40%
    P(B) = probability of having pizza for dinner = 70%
    P(A∩B) = 30% (pizza for lunch and dinner)

1) What is the probability that pizza is eaten at least once a day?
    The key words are "at least once"; this means we want the union of A & B:
    P(A∪B) = P(A) + P(B) - P(A∩B) = 0.4 + 0.7 - 0.3 = 0.8    (prop. c)
2) What is the probability that pizza is not eaten on a given day?
    Not eating pizza (Z´) is the complement of eating pizza (Z), so P(Z) + P(Z´) = 1
    P(Z´) = 1 - P(Z) = 1 - 0.8 = 0.2    (prop. a)
3) What is the probability that pizza is only eaten once a day?
    This can be visualized by looking at the Venn diagram and realizing we need to exclude the overlap (intersection) region.
    P(A∪B) - P(A∩B) = 0.8 - 0.3 = 0.5

[Figure: Venn diagram of "pizza for lunch" and "pizza for dinner". The non-overlapping blue area is pizza for lunch, no pizza for dinner. The non-overlapping red area is pizza for dinner, no pizza for lunch.]
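The three pizza answers can be reproduced with exact fractions, which avoids floating-point round-off in the sums. A minimal sketch (variable names are illustrative):

```python
from fractions import Fraction

p_lunch = Fraction(40, 100)    # P(A)
p_dinner = Fraction(70, 100)   # P(B)
p_both = Fraction(30, 100)     # P(A∩B)

p_at_least_once = p_lunch + p_dinner - p_both   # proposition c)
p_never = 1 - p_at_least_once                   # proposition a)
p_exactly_once = p_at_least_once - p_both       # exclude the overlap region

# Cross-check: exactly once = (lunch only) + (dinner only)
p_lunch_only = p_lunch - p_both
p_dinner_only = p_dinner - p_both

print(p_at_least_once, p_never, p_exactly_once, p_lunch_only + p_dinner_only)
```

The cross-check corresponds to adding the two non-overlapping areas of the Venn diagram.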


Conditional Probability
Frequently we must calculate a probability assuming something else has occurred. This is called conditional probability.
Here's an example of conditional probability:
    Suppose a day of the week is chosen at random. The probability the day is Thursday is 1/7.
        P(Thursday) = 1/7
    Suppose we also know the day is a weekday. Now the probability is conditional, = 1/5.
        P(Thursday|weekday) = 1/5
        the notation is: probability of it being Thursday given that it is a weekday

Formally, we define the conditional probability of A given B has occurred as:
    P(A|B) = P(A∩B)/P(B)

We can use this definition to calculate the intersection of A and B:
    P(A∩B) = P(A|B)P(B)

For the case where the Ai's are both mutually exclusive and exhaustive we have:
    $$P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \dots + P(B|A_n)P(A_n) = \sum_{i=1}^{n} P(B|A_i)P(A_i)$$

For our example let B = the day is a Thursday, A1 = weekday, A2 = weekend, then:
    P(Thursday) = P(Thursday|weekday)P(weekday) + P(Thursday|weekend)P(weekend)
    P(Thursday) = (1/5)(5/7) + (0)(2/7) = 1/7
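The day-of-the-week example is small enough to check by counting outcomes. The sketch below assumes each of the seven days is equally likely; the helper names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
WEEKDAYS = set(DAYS[:5])
WEEKEND = set(DAYS[5:])

def prob(event):
    """P(event) when each day of the week is equally likely."""
    return Fraction(len(event), len(DAYS))

def cond_prob(a, b):
    """Conditional probability P(A|B) = P(A∩B) / P(B)."""
    return prob(a & b) / prob(b)

thursday = {"Thu"}
p_thu_given_weekday = cond_prob(thursday, WEEKDAYS)

# Sum over the mutually exclusive and exhaustive events weekday / weekend
p_thu = (cond_prob(thursday, WEEKDAYS) * prob(WEEKDAYS)
         + cond_prob(thursday, WEEKEND) * prob(WEEKEND))

print(p_thu_given_weekday, p_thu)
```

The sum recovers the unconditional P(Thursday) = 1/7, as in the hand calculation.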


Bayes's Theorem
Bayes's Theorem relates conditional probabilities. It is widely used in many areas of the physical and social sciences.
Let A1, A2, …, An be a collection of mutually exclusive and exhaustive events with P(Ai) > 0 for all i.
Then for any other event B with P(B) > 0 we have:
    $$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B|A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B|A_j)\,P(A_j)}$$

We call:
    P(Aj) the a priori probability of Aj occurring
    P(Aj|B) the posterior probability that Aj will occur given that B has occurred
    P(B|Aj) the likelihood

Independence has a special meaning in probability:
    Events A and B are said to be independent if P(A|B) = P(A).
    Using the definition of conditional probability, A and B are independent iff:
        P(A∩B) = P(A)P(B)
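The independence condition can be tested by enumeration on a small sample space. The sketch below uses two tosses of a fair coin (an illustrative choice) and also shows that mutually exclusive events with non-zero probability fail the test.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin, all four outcomes equally likely
S = set(product("HT", repeat=2))

def prob(event):
    """P(event) for equally likely outcomes in S."""
    return Fraction(len(event), len(S))

first_heads = {s for s in S if s[0] == "H"}
second_heads = {s for s in S if s[1] == "H"}
first_tails = S - first_heads

# Different tosses: P(A∩B) equals P(A)P(B)
independent = (prob(first_heads & second_heads)
               == prob(first_heads) * prob(second_heads))

# Mutually exclusive events on the same toss: P(A∩B) = 0, but P(A)P(B) = 1/4
exclusive_pair_independent = (prob(first_heads & first_tails)
                              == prob(first_heads) * prob(first_tails))

print(independent, exclusive_pair_independent)
```

Mutually exclusive is therefore not the same as independent: knowing the first toss was heads tells you for certain that it was not tails.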


Example of Bayes's Theorem
While Bayes's theorem is very useful in physics, perhaps the best illustration of its use is in medical statistics, especially drug testing.
Assume a certain drug test:
    gives a positive result 97% of the time when the drug is present:
        P(positive test|drug present) = 0.97
    gives a positive result 0.4% of the time if the drug is not present ("false positive"):
        P(positive test|drug not present) = 0.004
Let's assume that the drug is present in 0.5% of the population (1 out of 200 people).
    P(drug present) = 0.005
    P(drug not present) = 1 - P(drug present) = 0.995

What is the probability that the drug is not present given that you have a positive test?
    P(drug not present|positive test) = ????

Bayes's Theorem gives:
    $$P(\text{drug not present}\,|\,\text{positive test}) = \frac{P(\text{positive test}\,|\,\text{drug not present})\,P(\text{drug not present})}{P(\text{positive test}\,|\,\text{drug not present})\,P(\text{drug not present}) + P(\text{positive test}\,|\,\text{drug present})\,P(\text{drug present})}$$
    $$P(\text{drug not present}\,|\,\text{positive test}) = \frac{(0.004)(1-0.005)}{(0.004)(1-0.005) + (0.97)(0.005)} = 0.45$$

Thus there is a 45% chance that you are drug free even though the test came back positive!
The real-life consequence of this large probability is that drug tests are often administered twice!
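The drug-test numbers can be plugged into Bayes's theorem in a few lines. The sketch below also estimates the effect of a second positive test, under the added assumption (not stated in the lecture) that the two test results are independent given the person's true status; the function name `posterior` is illustrative.

```python
def posterior(priors, likelihoods, i):
    """Bayes's theorem: P(A_i|B) from priors P(A_j) and likelihoods P(B|A_j)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(B)
    return priors[i] * likelihoods[i] / evidence

# Index 0: drug not present, index 1: drug present
priors = [0.995, 0.005]
likelihoods = [0.004, 0.97]   # P(positive test | each hypothesis)

p_clean_given_positive = posterior(priors, likelihoods, 0)
p_drug_given_positive = posterior(priors, likelihoods, 1)

# Second positive test: yesterday's posteriors become today's priors
# (assumes the two tests are independent, which is an added assumption)
p_clean_after_two = posterior(
    [p_clean_given_positive, p_drug_given_positive], likelihoods, 0)

print(f"after one positive test:  {p_clean_given_positive:.3f}")
print(f"after two positive tests: {p_clean_after_two:.4f}")
```

Under that independence assumption a second positive result reduces the chance of being drug free from about 45% to well under 1%, which illustrates why repeating the test is useful.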


Another example from Particle Physics

Consider the (non-)discovery of the pentaquark
    A pentaquark is a bound state of 5 quarks.
    These states are allowed, but in ~40 years of searching there was no evidence for them...
    Until 2003, when several experiments reported positive evidence!

Consider the pentaquark from the SPring-8 (LEPS) experiment: Θ5 = dudus
    Signal: 19 events
    Significance: 4.6σ
    (Assuming Gaussian stats the prob. for a 4.6σ effect is ~$4\times10^{-6}$)

What are the authors trying to say here?
    If this bump is accidental, then the accident rate is about 1 in 250,000 (4 in a million).
    Or: if I repeated the experiment 250,000 times, I would expect to get a bump this big or bigger about once.
What do the authors want you to think?
    Since the accident rate is so low it must not be an accident, therefore it is physics!
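The probability attached to a 4.6σ effect can be recomputed from the Gaussian tail using the complementary error function. The slide does not say whether its number counts fluctuations in both directions, so the sketch below prints both conventions; the function names are illustrative.

```python
import math

def two_sided_p(n_sigma):
    """Probability that a Gaussian fluctuates at least n_sigma from its mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2))

def one_sided_p(n_sigma):
    """Same, but counting only upward fluctuations."""
    return 0.5 * two_sided_p(n_sigma)

p_two = two_sided_p(4.6)
p_one = one_sided_p(4.6)
print(f"4.6 sigma: two-sided p = {p_two:.2e}, one-sided p = {p_one:.2e}")
```

The two-sided value is the one closest to the quoted ~4x10^-6, which suggests (but does not prove) that this is the convention behind the number on the slide.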


Pentaquark Reality

Sometimes it is not a question of statistical significance!
Again, the pentaquark state Θ+(1540) gives a great example:
    Consider the CLAS experiment at JLAB:
        2003/4: report a 7.8σ effect (~$6\times10^{-15}$ according to MATHEMATICA)
        2005: report NO signal! (better experiment)

[Figures: CLAS 2003/2004; CLAS 2005 g11]

What size signal should we expect?

Lesson: This is not a statistics issue, but one of experiment design and implementation.
