lecture 8 probabilities and distributions


Upload: maia-dorsey

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Lecture 8 Probabilities  and  distributions

Lecture 8: Probabilities and distributions

$$p(k) = \frac{k}{n}$$

$$p(\neg k) = \frac{n-k}{n} = 1 - \frac{k}{n} = 1 - p(k)$$

$$p(k) + p(\neg k) = 1$$

$$0 \le p \le 1$$

Probability is the number of desired events k divided by the total number of events n.

If k and n cannot be counted, we can apply the stochastic definition of probability: the probability of an event j is approximately the relative frequency of j during n observations.
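The two definitions can be compared numerically. The following is a minimal sketch, assuming a fair six-sided die as the example event (the die, the function name `relative_frequency`, and the seed are illustrative choices, not part of the lecture): the classical probability is k/n = 1/6, and the stochastic definition approximates it by the relative frequency in simulated observations.

```python
import random

def relative_frequency(trials: int, seed: int = 1) -> float:
    """Estimate p(six) as the fraction of simulated die rolls that show a six."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return hits / trials

classical = 1 / 6                       # k = 1 desired event out of n = 6
estimate = relative_frequency(100_000)  # frequency during 100000 observations
print(f"classical p = {classical:.4f}, relative frequency = {estimate:.4f}")
```

The estimate approaches 1/6 as the number of observations grows, which is the sense of "approximately" in the stochastic definition.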

Page 2: Lecture 8 Probabilities  and  distributions

What is the probability of winning in Duży Lotek?

$$p = \frac{1}{\dfrac{49!}{6!\,(49-6)!}} = \frac{1}{13983816}$$

The number of desired events is 1. The number of possible events is the number of combinations of 6 numbers out of 49.

$$C^{49}_{6} = \binom{49}{6} = \frac{49!}{6!\,(49-6)!}$$

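We need the number of combinations of k events out of a total of N events:

$$C^{N}_{k} = \binom{N}{k} = \frac{N!}{k!\,(N-k)!}$$

$$\binom{0}{0} = 1, \qquad 0! = 1$$

Bernoulli distribution

$$\binom{n}{k} = \binom{n}{n-k}, \qquad \binom{n}{0} = 1$$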
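The count of combinations can be checked directly. This is a small sketch that evaluates the factorial formula and compares it with Python's built-in `math.comb`; the helper name `combinations` is only illustrative.

```python
from math import comb, factorial

def combinations(n_total: int, k: int) -> int:
    """Number of ways to choose k items out of n_total when order is ignored."""
    return factorial(n_total) // (factorial(k) * factorial(n_total - k))

tickets = combinations(49, 6)
assert tickets == comb(49, 6)   # the standard-library function gives the same count
print(tickets)                  # 13983816
print(1 / tickets)              # roughly 7.15e-08, the chance of six correct numbers
```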

Page 3: Lecture 8 Probabilities  and  distributions

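What is the probability of winning anything (3, 4, 5, or 6 correct numbers) in Duży Lotek?

$$p = \frac{1}{\binom{49}{6}} + \frac{1}{\binom{49}{5}} + \frac{1}{\binom{49}{4}} + \frac{1}{\binom{49}{3}} = 0.00006$$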

Wrong!

Hypergeometric distribution

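| | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| 1 | N | 49 | | 49 | | 49 | | 49 | |
| 2 | K | 6 | =+KOMBINACJE(B1;B2) | 6 | =+KOMBINACJE(D1;D2) | 6 | =+KOMBINACJE(F1;F2) | 6 | =+KOMBINACJE(H1;H2) |
| 3 | n | 6 | =+KOMBINACJE(B2;B4) | 6 | =+KOMBINACJE(D2;D4) | 6 | =+KOMBINACJE(F2;F4) | 6 | =+KOMBINACJE(H2;H4) |
| 4 | k | 3 | =+KOMBINACJE(B1-B2;B3-B4) | 4 | =+KOMBINACJE(D1-D2;D3-D4) | 5 | =+KOMBINACJE(F1-F2;F3-F4) | 6 | =+KOMBINACJE(H1-H2;H3-H4) |
| 5 | Combinations | | =+C2/(C3*C4) | | =+E2/(E3*E4) | | =+G2/(G3*G4) | | =+I2/(I3*I4) |
| 6 | Probability | | =1/C5 | | =1/E5 | | =1/G5 | | =1/I5 |
| 7 | Sum | | =+SUMA(C6:I6) | | | | | | |

KOMBINACJE and SUMA are the Polish-locale Excel names of COMBIN and SUM. Columns B, D, F, and H hold the cases of 3, 4, 5, and 6 correct numbers (row 4).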

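We need the probability that, in a sample of K elements drawn from a universe of N elements, exactly n have the desired property and k do not (K = n + k). In the lottery the universe contains as many desired elements as are drawn (6 winning numbers, 6 numbers on the ticket), so K appears in both binomial coefficients.

$$p_{n,k,N,K} = \frac{\binom{K}{n}\binom{N-K}{k}}{\binom{N}{K}}, \qquad C_{n,k,N,K} = \frac{\binom{N}{K}}{\binom{K}{n}\binom{N-K}{k}} = \frac{1}{p_{n,k,N,K}}$$

$$P = \sum p_{n,k,N,K} = \frac{\binom{6}{6}\binom{49-6}{6-6}}{\binom{49}{6}} + \frac{\binom{6}{5}\binom{49-6}{6-5}}{\binom{49}{6}} + \frac{\binom{6}{4}\binom{49-6}{6-4}}{\binom{49}{6}} + \frac{\binom{6}{3}\binom{49-6}{6-3}}{\binom{49}{6}} = 0.0186$$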
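The hypergeometric sum for the lottery can be reproduced in a few lines. This is a sketch under the assumption that both the ticket and the draw contain 6 of 49 numbers; the function name `match_probability` and its parameter names are illustrative.

```python
from math import comb

def match_probability(matches: int, picked: int = 6, pool: int = 49) -> float:
    """Probability that exactly `matches` of the drawn numbers are on the ticket.

    Assumes the ticket and the draw both contain `picked` numbers out of `pool`.
    """
    misses = picked - matches
    return comb(picked, matches) * comb(pool - picked, misses) / comb(pool, picked)

# Winning means 3, 4, 5, or 6 correct numbers.
p_win = sum(match_probability(n) for n in (3, 4, 5, 6))
print(round(p_win, 4))  # 0.0186
```

Summing `match_probability` over all cases from 0 to 6 matches gives 1, which is a quick check that the terms form a probability distribution.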

Page 4: Lecture 8 Probabilities  and  distributions

In Multi Lotek 20 numbers are taken out of a total of 80. What is the probability that you have exactly 10 numbers correct?

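$$C_{n,k,N,K} = \frac{\binom{N}{K}}{\binom{K}{n}\binom{N-K}{k}}, \qquad p = \frac{1}{C_{n,k,N,K}}$$

N = 80, K = 20, n = 10, k = 10

$$p(80; 20; 10) = \frac{\binom{20}{10}\binom{60}{10}}{\binom{80}{20}} = 0.004$$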
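The Multi Lotek value can be recomputed as follows. The sketch simply evaluates the expression used on the slide with N = 80, K = 20, n = 10, and k = 10; it does not model the actual rules of the game.

```python
from math import comb

# Values from the slide, with K = n + k.
N, K, n, k = 80, 20, 10, 10

p = comb(K, n) * comb(N - K, k) / comb(N, K)
print(round(p, 3))  # 0.004

# Sanity check: summed over every possible n (0..K) the probabilities give 1.
total = sum(comb(K, i) * comb(N - K, K - i) for i in range(K + 1)) / comb(N, K)
print(total)
```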

Page 5: Lecture 8 Probabilities  and  distributions

Assessing the number of infected persons
Assessing total population size

Capture-recapture methods

The frequency of marked animals in the resample should equal their frequency within the total population.

Assumptions:

• Closed population
• Random catches
• Random dispersal
• Marked animals do not differ in behaviour

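$$\frac{m_{resample}}{n_{resample}} = \frac{m_{total}}{N_{total}} \quad\Rightarrow\quad N_{total} = \frac{n_{resample}\, m_{total}}{m_{resample}}$$

Here m_total is the number of individuals marked in the first sample, n_resample is the size of the second sample, and m_resample is the number of marked individuals found in the second sample.

$$N = \frac{7 \cdot 6}{1} = 42, \qquad N_{real} = 38$$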

We take a sample of animals/plants and mark them.

We take a second sample and count the number of marked individuals.
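The capture-recapture estimate is a one-line calculation. The sketch below assumes the simple proportional estimator described on this slide (commonly called the Lincoln-Petersen estimator); the function and parameter names are illustrative.

```python
def estimate_population(marked: int, resample_size: int, marked_in_resample: int) -> float:
    """Estimate total population size N from one marking and one resample.

    Uses marked_in_resample / resample_size = marked / N, solved for N.
    """
    if marked_in_resample == 0:
        raise ValueError("no marked individuals recaptured; N cannot be estimated")
    return marked * resample_size / marked_in_resample

# Numbers from the slide: the product 7 * 6 divided by 1 recaptured individual.
print(estimate_population(7, 6, 1))  # 42.0
```

With only one recaptured individual the estimate is very sensitive to chance, which fits the gap between the estimate of 42 and the real value of 38 on the slide.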

Page 6: Lecture 8 Probabilities  and  distributions

The two sample case

How many persons have a certain infectious disease?

You take two samples and count the number of infected persons in the first sample, m1, in the second sample, m2, and the number of infected persons noted in both samples, k.

$$\frac{k}{m_1} = \frac{m_2}{N} \quad\Rightarrow\quad N = \frac{m_1 m_2}{k}$$

$$N = \frac{4 \cdot 3}{1} = 12$$

Page 7: Lecture 8 Probabilities  and  distributions

In ecology we often need to compare the species composition of two habitats. The species overlap is measured by the Sørensen similarity index:

$$S = \frac{2k}{l + m}$$

Here one habitat contains m species, the other contains l species, and k species occur in both.

From S alone we do not know whether the overlap is large or small.

To assess the expectation we construct a null model. Both habitats (Habitat A and Habitat B) contain species from a common species pool of n species. If the pool size n is known, we can estimate how many joint species k two random samples of sizes m and l out of n contain.

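The probability to get exactly k joint species (probability distribution):

$$p_k = \frac{\binom{n}{k}\binom{n-k}{m-k}\binom{n-m}{l-k}}{\binom{n}{m}\binom{n}{l}}$$

The expected number of joint species (mathematical expectation):

$$\frac{k}{l} = \frac{m}{n} \quad\Rightarrow\quad k = \frac{m\,l}{n}$$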
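The null model can be evaluated numerically. The sketch below assumes the probability of exactly k joint species is the product of three binomial coefficients divided by the number of ways to draw both samples, as on this slide; the pool size and sample sizes (n = 100, m = 30, l = 25) are hypothetical values chosen only for illustration.

```python
from math import comb

def joint_species_pmf(n: int, m: int, l: int, k: int) -> float:
    """Probability that two random samples of m and l species from a pool of n share exactly k."""
    if k < 0 or k > min(m, l):
        return 0.0
    ways = comb(n, k) * comb(n - k, m - k) * comb(n - m, l - k)
    return ways / (comb(n, m) * comb(n, l))

# Hypothetical pool and habitat sizes, not taken from the lecture.
n, m, l = 100, 30, 25
pmf = [joint_species_pmf(n, m, l, k) for k in range(min(m, l) + 1)]
expected = sum(k * p for k, p in enumerate(pmf))

print(round(sum(pmf), 6))   # the probabilities sum to 1
print(round(expected, 3))   # equals m * l / n = 7.5
```

An observed number of joint species far in the upper tail of `pmf` would indicate more overlap than expected by chance.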

Page 8: Lecture 8 Probabilities  and  distributions

[Figure: four panels (A, B, C, D), each plotting probability (0 to 0.3) against the number of species in common (0 to 15).]

Ground beetle species of two poplar plantations and two adjacent wheat fields near Toruń (Ulrich et al. 2004, Annales Zool. Fenn.)

Pool size: 90 to 110 species.

There are many more species in common than expected by chance alone. The ecological interpretation is that ground beetles colonize fields and adjacent seminatural habitats in a similar manner. Ground beetles do not colonize according to ecological requirements (niches) but according to spatial neighbourhood.

$$p_k = \frac{\binom{n}{k}\binom{n-k}{m-k}\binom{n-m}{l-k}}{\binom{n}{m}\binom{n}{l}}$$

Page 9: Lecture 8 Probabilities  and  distributions

Bayesian inference and maximum likelihood

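(Idź na całość, the Polish edition of "Let's Make a Deal")

[Diagram: three gates hiding one car and two Zonks, drawn once for each possible position of the first choice, with the first choice and the gate shown by the moderator marked. Labels: P = 1/3, P = 1/2.]

| 1st choice | Remain | Change |
|---|---|---|
| Car | win | lose |
| Zonk | lose | win |
| Zonk | lose | win |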
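The win and lose table can be verified by listing every equally likely case. This is a minimal sketch, assuming the moderator always opens a gate that is neither the first choice nor the gate hiding the car; the function name `play` is illustrative.

```python
from fractions import Fraction
from itertools import product

GATES = (1, 2, 3)

def play(car: int, first_choice: int, switch: bool) -> bool:
    """Return True if the player ends up with the car."""
    # The moderator opens a Zonk gate that was not chosen. When two gates
    # qualify, which one is opened does not change the outcome.
    opened = next(g for g in GATES if g != first_choice and g != car)
    if switch:
        final = next(g for g in GATES if g != first_choice and g != opened)
    else:
        final = first_choice
    return final == car

cases = list(product(GATES, GATES))  # 9 equally likely (car, first choice) pairs
p_remain = Fraction(sum(play(c, f, switch=False) for c, f in cases), len(cases))
p_change = Fraction(sum(play(c, f, switch=True) for c, f in cases), len(cases))
print(p_remain, p_change)  # 1/3 2/3
```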

Page 10: Lecture 8 Probabilities  and  distributions

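The law of dependent probability

$$p(A \cap B) = p(A \mid B)\,p(B) \quad\Rightarrow\quad p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$

$$p(B \cap A) = p(B \mid A)\,p(A) \quad\Rightarrow\quad p(B \mid A) = \frac{p(B \cap A)}{p(A)}$$

$$p(A \cap B) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$$

Theorem of Bayes

$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$$

$$\text{posterior} = \frac{\text{conditional} \times \text{prior}(A)}{\text{prior}(B)}$$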

Thomas Bayes (1702-1761)

Abraham de Moivre (1667-1754)

Page 11: Lecture 8 Probabilities  and  distributions

Total probability

$$p(A) = \sum_{i=1}^{n} p(A \mid B_i)\,p(B_i)$$

$$p(B_i \mid A) = \frac{p(A \mid B_i)\,p(B_i)}{\sum_{j=1}^{n} p(A \mid B_j)\,p(B_j)}$$

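Idź na całość

Assume we choose gate 1 (G1) at the first choice. We are looking for the probability p(G1|M3) that the car is behind gate 1 if we know that the moderator opened gate 3 (M3).

$$p(G1 \mid M3) = \frac{p(M3 \cap G1)}{p(M3)} = \frac{p(M3 \mid G1)\,p(G1)}{p(M3 \mid G1)\,p(G1) + p(M3 \mid G2)\,p(G2) + p(M3 \mid G3)\,p(G3)}$$

$$p(G1 \mid M3) = \frac{1/2 \cdot 1/3}{1/2 \cdot 1/3 + 1 \cdot 1/3 + 0 \cdot 1/3} = \frac{1}{3}$$

[Diagram: the sample space N is divided into B1, B2, and B3 with probabilities P(B1), P(B2), and P(B3); the event A overlaps the three parts with conditional probabilities P(A|B1), P(A|B2), and P(A|B3).]

$$p(B \mid A) = \frac{p(A \mid B)\,p(B)}{p(A)}$$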
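The Bayes calculation for the three gates can be written as a small general routine. The sketch below uses exact fractions; the function name `posterior` and the dictionary layout are illustrative, while the priors and conditional probabilities are those of the gate example (first choice G1, moderator opens gate 3).

```python
from fractions import Fraction

def posterior(priors: dict, likelihoods: dict) -> dict:
    """Apply Bayes' theorem to a set of mutually exclusive hypotheses."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    total = sum(joint.values())  # total probability of the observation
    return {h: joint[h] / total for h in joint}

priors = {"G1": Fraction(1, 3), "G2": Fraction(1, 3), "G3": Fraction(1, 3)}
# p(M3 | car behind gate): the moderator never opens the chosen gate or the car gate.
p_m3_given = {"G1": Fraction(1, 2), "G2": Fraction(1), "G3": Fraction(0)}

post = posterior(priors, p_m3_given)
print(post["G1"], post["G2"], post["G3"])  # 1/3 2/3 0
```

The posterior for G2 is 2/3, so changing gates doubles the chance of winning compared with remaining at G1.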

Page 12: Lecture 8 Probabilities  and  distributions

Calopteryx splendens

We study the occurrence of the damselfly Calopteryx splendens at small rivers. We know from the literature that C. splendens occurs at about 10% of all rivers. Occurrence depends on water quality. Suppose we have five quality classes that occur in 10% (class I), 15% (class II), 27% (class III), 43% (class IV), and 5% (class V) of all rivers. The probability to find Calopteryx in these five classes is 1% (class I), 7% (class II), 14% (class III), 31% (class IV), and 47% (class V).

To which class does a river most probably belong if we find Calopteryx (event A)?

$$p(\text{class I} \mid A) = \frac{p(A \mid \text{class I})\,p(\text{class I})}{p(A \mid \text{class I})\,p(\text{class I}) + p(A \mid \text{class II})\,p(\text{class II}) + p(A \mid \text{class III})\,p(\text{class III}) + p(A \mid \text{class IV})\,p(\text{class IV}) + p(A \mid \text{class V})\,p(\text{class V})}$$

$$p(\text{class I} \mid A) = \frac{0.1 \cdot 0.01}{0.1 \cdot 0.01 + 0.15 \cdot 0.07 + 0.27 \cdot 0.14 + 0.43 \cdot 0.31 + 0.05 \cdot 0.47} = 0.0048$$

p(class II|A) = 0.051, p(class III|A) = 0.183, p(class IV|A) = 0.647, p(class V|A) = 0.114

Indicator values
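The five posterior probabilities follow from the class frequencies and the occurrence probabilities given in the text. A minimal sketch of the calculation (the variable names are illustrative):

```python
# Frequency of each water quality class among all rivers (priors).
priors = {"I": 0.10, "II": 0.15, "III": 0.27, "IV": 0.43, "V": 0.05}
# Probability to find Calopteryx in a river of each class.
likelihoods = {"I": 0.01, "II": 0.07, "III": 0.14, "IV": 0.31, "V": 0.47}

# Total probability of finding Calopteryx at a randomly chosen river.
evidence = sum(priors[c] * likelihoods[c] for c in priors)

# Bayes' theorem for every class.
posteriors = {c: priors[c] * likelihoods[c] / evidence for c in priors}
for c, p in posteriors.items():
    print(f"p(class {c} | A) = {p:.4f}")

most_probable = max(posteriors, key=posteriors.get)
print(most_probable)  # IV
```

Class V has the highest occurrence probability, but class IV is the most probable answer because it is far more common among rivers.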

Page 13: Lecture 8 Probabilities  and  distributions

Bayes and forensics

False positive fallacy
Error of the prosecutor

500 suspects:

• DNA identical (1 person): DNA test positive, 1 person
• DNA not identical (499 persons): DNA test negative, 495 persons; DNA test positive, 4 persons

Let’s take a standard DNA test for identifying persons. The test has a precision of more than 99%.

What is the probability that we identify the wrong person?

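The forensic version of Bayes theorem

Let + denote a positive DNA test and c the event that the tested person is the true source of the DNA.

$$p(c \mid +) = \frac{p(+ \mid c)\,p(c)}{p(+)} = \frac{1 \cdot 1/500}{5/500} = \frac{1}{5}$$

$$p(c \mid +) = \frac{p(+ \mid c)\,p(c)}{p(+ \mid c)\,p(c) + p(+ \mid \neg c)\,p(\neg c)} = \frac{1 \cdot \frac{1}{500}}{1 \cdot \frac{1}{500} + \frac{4}{499} \cdot \frac{499}{500}} = \frac{1}{5}$$

A positive test therefore identifies the right person with a probability of only 1/5; with a probability of 4/5 it points to the wrong person.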
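The false positive fallacy can be checked with exact fractions. The sketch uses the numbers of the example (500 suspects, one true match, 4 false positives among the other 499); the symbols c and + are encoded in the illustrative variable names.

```python
from fractions import Fraction

suspects = 500
p_c = Fraction(1, suspects)          # prior: one suspect is the true source
p_pos_given_c = Fraction(1)          # the true source always tests positive
p_pos_given_not_c = Fraction(4, 499) # 4 false positives among 499 others

# Total probability of a positive test.
p_pos = p_pos_given_c * p_c + p_pos_given_not_c * (1 - p_c)

p_c_given_pos = p_pos_given_c * p_c / p_pos
print(p_c_given_pos, 1 - p_c_given_pos)  # 1/5 4/5
```

Although the test is wrong for fewer than 1% of the persons tested, four of the five positive results belong to the wrong person.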

Page 14: Lecture 8 Probabilities  and  distributions

The error of the advocate

In the trial of the American football star O. J. Simpson, one of his lawyers (a Harvard professor) argued that Simpson had sometimes beaten his wife, but that only very few men who beat their wives later murder them (about 0.1%).

[Diagram: whole population, 250 000 000; beaten wives, 250 000 000 - N; not beaten wives, N; murdered by husband, P = 1/10000; murdered otherwise, P = 1/10000 (shown for both groups); 10000 beaten wives; murdered by husband, P = 1/2.]

$$p(m \mid b) = \frac{p(m \mid h_b)}{p(m \mid h_b) + p(m \mid \neg h_b)} = \frac{1}{2}$$

Page 15: Lecture 8 Probabilities  and  distributions

Maximum likelihood

Suppose you studied 50 patients in a clinical trial and detected the presence of a certain bacterial disease in 30 of them.

What is the most probable frequency of this disease in the population?

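$$p_{0.5}(30 \mid 50) = \binom{50}{30}\left(\frac{1}{2}\right)^{50} = 0.042$$

$$p_{0.6}(30 \mid 50) = \binom{50}{30}\left(\frac{3}{5}\right)^{30}\left(\frac{2}{5}\right)^{20} = 0.115$$

$$p_{0.8}(30 \mid 50) = \binom{50}{30}\left(\frac{4}{5}\right)^{30}\left(\frac{1}{5}\right)^{20} = 0.001$$

$$L_p = f_p(x_1 \ldots x_i)$$

[Figure: the likelihood L(p), ranging from 0 to 0.14, plotted against p from 0 to 1.]

We look for the maximum value of the likelihood function.

$$L_p = \binom{50}{30} p^{30}(1-p)^{20}$$

$$\frac{dL_p}{dp} = \binom{50}{30}\,30p^{29}(1-p)^{20} - \binom{50}{30}\,p^{30}\,20(1-p)^{19} = 0$$

$$3(1-p) = 2p \quad\Rightarrow\quad p = \frac{3}{5}$$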

Page 16: Lecture 8 Probabilities  and  distributions

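Log likelihood estimator ln(L_p)

$$L_p = \binom{50}{30} p^{30}(1-p)^{20}$$

$$\ln(L_p) = \ln\binom{50}{30} + 30\ln(p) + 20\ln(1-p)$$

$$\frac{d\ln L_p}{dp} = \frac{30}{p} - \frac{20}{1-p} = 0 \quad\Rightarrow\quad p = \frac{3}{5}$$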
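The maximum of the likelihood function can also be located numerically. The following is a minimal sketch, assuming a simple grid search over p in steps of 0.001 instead of the analytical derivative; the function names and the grid resolution are illustrative choices.

```python
from math import comb, log

def likelihood(p: float, successes: int = 30, trials: int = 50) -> float:
    """Binomial likelihood of observing `successes` infected patients in `trials`."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

def log_likelihood(p: float, successes: int = 30, trials: int = 50) -> float:
    """Natural logarithm of the likelihood; it has its maximum at the same p."""
    return log(comb(trials, successes)) + successes * log(p) + (trials - successes) * log(1 - p)

for p in (0.5, 0.6, 0.8):
    print(p, round(likelihood(p), 3))

# Grid search over 0.001 ... 0.999 (the end points are excluded because of log(0)).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
p_hat_log = max(grid, key=log_likelihood)
print(p_hat, p_hat_log)
```

Both searches return the same estimate, 30/50 = 0.6, in agreement with the derivative of L_p and of ln(L_p).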

Page 17: Lecture 8 Probabilities  and  distributions

Homework and literature

Refresh:

• Probability
• Permutations, variations, combinations
• Bernoulli event
• Pascal triangle, binomial coefficients
• Dependent probability
• Independent probability
• Derivative, integral of power functions

Prepare for the next lecture:

• Arithmetic, geometric, harmonic mean
• Cauchy inequality
• Statistical distribution
• Probability distribution
• Moments of distributions
• Error law of Gauß

Literature:

http://www.brixtonhealth.com/CRCaseFinding.pdf