Lecture 8: Probabilities and distributions
$$p(k) = \frac{k}{n}$$
$$p(\neg k) = \frac{n-k}{n} = 1 - \frac{k}{n} = 1 - p(k)$$
$$p(k) + p(\neg k) = 1$$
$$0 \le p \le 1$$
Probability is the number of desired events k divided by the total number of events n.
If k and n cannot be counted, we can apply the stochastic definition of probability: the probability of an event j is approximately the relative frequency of j in n observations.
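The stochastic definition can be illustrated with a small simulation. This is a minimal sketch, not part of the lecture: the die example, the function names and the fixed seed are assumptions chosen for illustration.

```python
import random


def relative_frequency(event, trial, n, seed=0):
    """Estimate p(event) as the share of n simulated trials in which it occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n) if event(trial(rng)))
    return hits / n


def roll_die(rng):
    """One trial: roll a fair six-sided die."""
    return rng.randint(1, 6)


# Frequency of rolling a six in 60 000 throws; the counting definition gives 1/6.
estimate = relative_frequency(lambda x: x == 6, roll_die, 60_000)
print(estimate, 1 / 6)
```

With more observations the estimate settles closer to the counted value k/n = 1/6.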
What is the probability of winning in Duży Lotek?
$$p = \frac{1}{\dfrac{49!}{6!\,(49-6)!}} = \frac{1}{13983816}$$
The number of desired events is 1. The number of possible events is the number of combinations of 6 numbers out of 49.
$$C_6^{49} = \binom{49}{6} = \frac{49!}{6!\,(49-6)!}$$
We need the number of combinations of k events out of a total of N events:
$$C_k^N = \binom{N}{k} = \frac{N!}{k!\,(N-k)!}$$
$$\binom{0}{0} = 1, \qquad 0! = 1$$
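The count of lottery tickets can be checked directly. A minimal sketch: the function name `combinations` is an assumption, and the result is compared with Python's built-in `math.comb`.

```python
import math


def combinations(total, chosen):
    """Number of ways to choose `chosen` items out of `total`, ignoring order."""
    return math.factorial(total) // (
        math.factorial(chosen) * math.factorial(total - chosen)
    )


tickets = combinations(49, 6)
print(tickets)       # 13983816
print(1 / tickets)   # probability of hitting all six numbers

# The factorial formula agrees with the standard library implementation.
assert tickets == math.comb(49, 6)
```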
Bernoulli distribution
$$\binom{n}{k} = \binom{n}{n-k}, \qquad \binom{n}{0} = 1$$
What is the probability of winning in Duży Lotek?
$$p = \frac{1}{49/6} \cdot \frac{1}{49/5} \cdot \frac{1}{49/4} \cdot \frac{1}{49/3} \approx 0.00006$$
Wrong!
Hypergeometric distribution
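  | A            | B  | C                         | D  | E                         | F  | G                         | H  | I
1 | N            | 49 |                           | 49 |                           | 49 |                           | 49 |
2 | K            | 6  | =+KOMBINACJE(B1;B2)       | 6  | =+KOMBINACJE(D1;D2)       | 6  | =+KOMBINACJE(F1;F2)       | 6  | =+KOMBINACJE(H1;H2)
3 | n            | 6  | =+KOMBINACJE(B2;B4)       | 6  | =+KOMBINACJE(D2;D4)       | 6  | =+KOMBINACJE(F2;F4)       | 6  | =+KOMBINACJE(H2;H4)
4 | k            | 3  | =+KOMBINACJE(B1-B2;B3-B4) | 4  | =+KOMBINACJE(D1-D2;D3-D4) | 5  | =+KOMBINACJE(F1-F2;F3-F4) | 6  | =+KOMBINACJE(H1-H2;H3-H4)
5 | Combinations |    | =+C2/(C3*C4)              |    | =+E2/(E3*E4)              |    | =+G2/(G3*G4)              |    | =+I2/(I3*I4)
6 | Probability  |    | =1/C5                     |    | =1/E5                     |    | =1/G5                     |    | =1/I5
7 | Sum          |    | =+SUMA(C6:I6)             |    |                           |    |                           |    |
(KOMBINACJE and SUMA are the Polish-language Excel names of COMBIN and SUM.)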
P = 0.0186
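With N numbers in total, K winning numbers, n numbers on the ticket and k of them correct:
$$C_{n,k,N,K} = \frac{\binom{N}{n}}{\binom{K}{k}\binom{N-K}{n-k}}$$
$$p_{n,k,N,K} = \frac{1}{C_{n,k,N,K}} = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
For Duży Lotek (N = 49, K = 6, n = 6) and k = 6, 5, 4, 3 correct numbers:
$$p_6 = \frac{\binom{6}{6}\binom{49-6}{6-6}}{\binom{49}{6}} \qquad p_5 = \frac{\binom{6}{5}\binom{49-6}{6-5}}{\binom{49}{6}}$$
$$p_4 = \frac{\binom{6}{4}\binom{49-6}{6-4}}{\binom{49}{6}} \qquad p_3 = \frac{\binom{6}{3}\binom{49-6}{6-3}}{\binom{49}{6}}$$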
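The spreadsheet calculation for Duży Lotek can be reproduced in a few lines. A minimal sketch, assuming that "winning" means at least three correct numbers, as the spreadsheet columns for 3, 4, 5 and 6 hits suggest; the function and parameter names are illustrative.

```python
from math import comb


def hypergeometric(hits, population, winning, ticket):
    """Probability of exactly `hits` correct numbers on a ticket of `ticket`
    numbers when `winning` of the `population` numbers are drawn."""
    return (
        comb(winning, hits)
        * comb(population - winning, ticket - hits)
        / comb(population, ticket)
    )


# Duży Lotek: 49 numbers, 6 drawn, 6 on the ticket, 3 to 6 hits.
probabilities = {hits: hypergeometric(hits, 49, 6, 6) for hits in range(3, 7)}
total = sum(probabilities.values())

for hits, p in probabilities.items():
    print(hits, p)
print(total)  # close to the spreadsheet value P = 0.0186
```

Almost all of the total comes from the three-hit case; six hits contribute only 1/13983816.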
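We need the probability that a sample of n elements, drawn from a universe of N elements of which K have the desired property, contains exactly k elements with that property and n - k without it.
In Multi Lotek 20 numbers are drawn out of a total of 80. What is the probability that you have exactly 10 numbers correct?
$$p_{n,k,N,K} = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
N = 80, K = 20 (numbers drawn), n = 20 (numbers on the ticket), k = 10 (correct numbers)
$$p(80; 20; 10) = \frac{\binom{20}{10}\binom{60}{10}}{\binom{80}{20}} \approx 0.004$$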
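The Multi Lotek value is one point of a whole distribution of hit counts. The sketch below computes that distribution with exact fractions. It assumes a ticket of 20 numbers, which is what the binomial coefficients 60 over 10 and 80 over 20 in the calculation imply; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def hit_distribution(population, drawn, ticket):
    """Exact probabilities of 0, 1, ... correct numbers on one ticket."""
    total = comb(population, ticket)
    return {
        hits: Fraction(
            comb(drawn, hits) * comb(population - drawn, ticket - hits), total
        )
        for hits in range(min(drawn, ticket) + 1)
    }


# Assumed setup: 80 numbers, 20 drawn, 20 marked on the ticket.
multi = hit_distribution(80, 20, 20)
print(float(multi[10]))     # about 0.004
print(sum(multi.values()))  # the probabilities of all hit counts add up to 1
```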
Capture-recapture methods
Assessing the number of infected persons
Assessing total population size
We take a sample of animals/plants and mark them.
We take a second sample and count the number of marked individuals.
The frequency of marked animals in the resample should equal their frequency within the total population.
Assumptions:
• Closed population
• Random catches
• Random dispersal
• Marked animals do not differ in behaviour
$$\frac{m_{resample}}{n_{resample}} = \frac{m_{total}}{N_{total}} \quad\Rightarrow\quad N_{total} = \frac{m_{total}\, n_{resample}}{m_{resample}}$$
$$N = \frac{7 \cdot 6}{1} = 42, \qquad N_{real} = 38$$
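The capture-recapture estimate is a single ratio (commonly called the Lincoln-Petersen estimator). A minimal sketch: the function and parameter names are assumptions, and which of the slide's numbers 7 and 6 is the marked sample and which the resample is not recoverable from the source, so the assignment below is illustrative.

```python
def estimate_population(marked_total, resample_size, marked_in_resample):
    """Estimate total population size from one marking and one resample.

    Assumes the share of marked individuals in the resample equals their
    share in the whole (closed) population.
    """
    if marked_in_resample == 0:
        raise ValueError("no marked individuals recaptured; estimate undefined")
    return marked_total * resample_size / marked_in_resample


# Numbers from the slide example: 7 * 6 / 1.
print(estimate_population(7, 6, 1))  # 42.0
```

With only one recaptured individual the estimate is very coarse, which fits the gap between the estimated 42 and the real 38.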
The two sample case
How many persons have a certain infectious disease?
You take two samples and count the number of infected persons in the first sample, m1, in the second sample, m2, and the number of infected persons noted in both samples, k = n_common.
$$\frac{m_1}{N} = \frac{n_{common}}{m_2} \quad\Rightarrow\quad N = \frac{m_1 m_2}{n_{common}}$$
$$N = \frac{4 \cdot 3}{1} = 12$$
In ecology we often need to compare the species composition of two habitats. Habitat A holds m species, habitat B holds l species, and k species occur in both. The species overlap is measured by the Soerensen similarity index.
$$S = \frac{2k}{m + l}$$
From S alone we cannot tell whether the overlap is large or small.
To assess the expectation we construct a null model. Both habitats contain species from a common species pool. If the pool size n is known, we can estimate how many joint species k two random samples of size m and l out of n contain.
[Diagram: a common species pool of n species, from which Habitat A and Habitat B are drawn.]
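The probability to get exactly k joint species (probability distribution):
$$p_k = \frac{\binom{n}{k}\binom{n-k}{m-k}\binom{n-m}{l-k}}{\binom{n}{m}\binom{n}{l}}$$
The expected number of joint species (mathematical expectation):
$$\frac{k}{l} = \frac{m}{n} \quad\Rightarrow\quad k = \frac{ml}{n}$$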
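The null model can be evaluated numerically. The sketch below computes the distribution of joint species with exact fractions and compares its mean with ml/n. The pool size of 100 and the habitat sizes of 20 and 30 species are hypothetical values chosen for illustration; the function names are assumptions.

```python
from fractions import Fraction
from math import comb


def joint_species_distribution(pool, size_a, size_b):
    """Probability of each number of joint species for two random samples
    of size_a and size_b species drawn from a pool of `pool` species."""
    denominator = comb(pool, size_a) * comb(pool, size_b)
    distribution = {}
    for joint in range(min(size_a, size_b) + 1):
        ways = (
            comb(pool, joint)                      # the joint species
            * comb(pool - joint, size_a - joint)   # the rest of habitat A
            * comb(pool - size_a, size_b - joint)  # the rest of habitat B
        )
        distribution[joint] = Fraction(ways, denominator)
    return distribution


def soerensen(joint, size_a, size_b):
    """Soerensen index S = 2k / (m + l)."""
    return Fraction(2 * joint, size_a + size_b)


dist = joint_species_distribution(100, 20, 30)
expected = sum(k * p for k, p in dist.items())
print(float(expected))                      # equals m * l / n = 6
print(float(soerensen(expected, 20, 30)))   # S expected under the null model
```

An observed S can then be judged against the S expected by chance, or the observed k against the tail of the distribution.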
[Figure: four panels A to D, each showing probability (0 to 0.3) against the number of species in common (0 to 15).]
Ground beetle species of two poplar plantations and two adjacent wheat fields near Toruń (Ulrich et al. 2004, Annales Zool. Fenn.)
Pool size 90 to 110 species.
There are many more species in common than expected by chance alone. The ecological interpretation is that ground beetles colonize fields and adjacent seminatural habitats in a similar manner. Ground beetles do not colonize according to ecological requirements (niches) but according to spatial neighborhood.
Bayesian inference and maximum likelihood
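(Idź na całość, the Polish edition of Let's Make a Deal)
[Diagram: three gates hiding Car, Zonk, Zonk, annotated with P = 1/3 and P = 1/2; the first choice and the shown gate are marked.]
1st choice | Remain | Change
Car        | win    | lose
Zonk       | lose   | win
Zonk       | lose   | win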
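The win/lose table can be checked by enumerating every case. A minimal sketch with exact fractions, assuming the host always opens a gate that hides a Zonk and is not the player's gate, choosing at random when two such gates exist; the function name is illustrative.

```python
from fractions import Fraction

GATES = (0, 1, 2)


def win_probabilities(first_choice=0):
    """Return (p_win_if_remain, p_win_if_change) for a fixed first choice."""
    remain = Fraction(0)
    change = Fraction(0)
    for car in GATES:  # the car is behind each gate with probability 1/3
        openable = [g for g in GATES if g != first_choice and g != car]
        for opened in openable:
            p = Fraction(1, 3) / len(openable)
            # The gate the player ends up with after changing.
            switched = next(g for g in GATES if g not in (first_choice, opened))
            if car == first_choice:
                remain += p
            if car == switched:
                change += p
    return remain, change


remain, change = win_probabilities()
print(remain, change)  # 1/3 2/3
```

Changing wins in two of the three rows of the table, remaining in one.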
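The law of dependent probability
$$p(A \wedge B) = p(A \mid B)\,p(B) \qquad p(A \mid B) = \frac{p(A \wedge B)}{p(B)}$$
$$p(B \wedge A) = p(B \mid A)\,p(A) \qquad p(B \mid A) = \frac{p(B \wedge A)}{p(A)}$$
$$p(A \wedge B) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$$
$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$$
$$\text{posterior} = \frac{\text{conditional} \cdot \text{prior}(A)}{\text{prior}(B)}$$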
Theorem of Bayes
Thomas Bayes (1702-1761)
Abraham de Moivre (1667-1754)
Total probability
$$p(A) = \sum_{i=1}^{n} p(A \mid B_i)\,p(B_i)$$
$$p(B_i \mid A) = \frac{p(A \mid B_i)\,p(B_i)}{\sum_{j=1}^{n} p(A \mid B_j)\,p(B_j)}$$
Idź na całość
Assume we choose gate 1 (G1) at the first choice. We are looking for the probability p(G1|M3) that the car is behind gate 1 if we know that the moderator opened gate 3 (M3).
$$p(G1 \mid M3) = \frac{p(M3 \wedge G1)}{p(M3)} = \frac{p(M3 \mid G1)\,p(G1)}{p(M3 \mid G1)\,p(G1) + p(M3 \mid G2)\,p(G2) + p(M3 \mid G3)\,p(G3)}$$
$$p(G1 \mid M3) = \frac{1/2 \cdot 1/3}{1/2 \cdot 1/3 + 1 \cdot 1/3 + 0 \cdot 1/3} = \frac{1}{3}$$
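[Diagram: the sample space N is divided into B1, B2 and B3 with probabilities P(B1), P(B2), P(B3); the event A overlaps each part with P(A|B1), P(A|B2), P(A|B3).]
$$p(B \mid A) = \frac{p(A \mid B)\,p(B)}{p(A)}$$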
Calopteryx splendens
We study the occurrence of the damselfly Calopteryx splendens at small rivers. We know from the literature that C. splendens occurs at about 10% of all rivers. Occurrence depends on water quality. Suppose we have five quality classes that occur in 10% (class I), 15% (class II), 27% (class III), 43% (class IV), and 5% (class V) of all rivers. The probability to find Calopteryx in these five classes is 1% (class I), 7% (class II), 14% (class III), 31% (class IV), and 47% (class V).
To which class does a river most probably belong if we find Calopteryx (event A)?
$$p(\text{class I} \mid A) = \frac{p(A \mid \text{class I})\,p(\text{class I})}{\sum_{j=\mathrm{I}}^{\mathrm{V}} p(A \mid \text{class } j)\,p(\text{class } j)}$$
$$p(\text{class I} \mid A) = \frac{0.1 \cdot 0.01}{0.1 \cdot 0.01 + 0.15 \cdot 0.07 + 0.27 \cdot 0.14 + 0.43 \cdot 0.31 + 0.05 \cdot 0.47} = 0.0048$$
p(class II|A) = 0.051, p(class III|A) = 0.183, p(class IV|A) = 0.647, p(class V|A) = 0.114
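The five posterior probabilities follow from one application of Bayes' theorem with the total probability in the denominator. A minimal sketch using the class frequencies and detection probabilities given in the text; the function name `posterior` is an assumption.

```python
def posterior(priors, likelihoods):
    """Return (posterior probabilities, total probability of the evidence)."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    evidence = sum(joint)  # total probability p(A)
    return [value / evidence for value in joint], evidence


# Water quality classes I to V: share of rivers and chance to find Calopteryx.
priors = [0.10, 0.15, 0.27, 0.43, 0.05]
likelihoods = [0.01, 0.07, 0.14, 0.31, 0.47]

post, p_a = posterior(priors, likelihoods)
for name, value in zip(["I", "II", "III", "IV", "V"], post):
    print(name, round(value, 3))
print("p(A) =", round(p_a, 4))
```

Class IV is the most probable class although class V has the highest detection probability, because class V rivers are rare.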
Indicator values
Bayes and forensics
False positive fallacy: the error of the prosecutor
500 suspects
  DNA identical: 1 person
    DNA test positive: 1 person
  DNA not identical: 499 persons
    DNA test negative: 495 persons
    DNA test positive: 4 persons
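Let's take a standard DNA test for identifying persons. The test has a precision of more than 99%.
What is the probability that we identify the wrong person?
The forensic version of Bayes' theorem, with c = the person is the culprit and + = the DNA test is positive:
$$p(c \mid +) = \frac{p(+ \mid c)\,p(c)}{p(+)} = \frac{1 \cdot 1/500}{5/500} = \frac{1}{5}$$
$$p(c \mid +) = \frac{p(+ \mid c)\,p(c)}{p(+ \mid c)\,p(c) + p(+ \mid \neg c)\,p(\neg c)} = \frac{1 \cdot \frac{1}{500}}{1 \cdot \frac{1}{500} + \frac{4}{499} \cdot \frac{499}{500}} = \frac{1}{5}$$
A person with a positive test is therefore the wrong person with probability 4/5.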
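The prosecutor's fallacy can be recomputed from the counts in the suspect tree. A minimal sketch with exact fractions; the function and parameter names are assumptions, and the true culprit is assumed to test positive with certainty, as in the tree.

```python
from fractions import Fraction


def p_culprit_given_positive(suspects, false_positives):
    """Probability that a suspect with a positive DNA test is the culprit."""
    p_culprit = Fraction(1, suspects)
    p_pos_given_culprit = Fraction(1)
    p_pos_given_innocent = Fraction(false_positives, suspects - 1)
    p_positive = (
        p_pos_given_culprit * p_culprit
        + p_pos_given_innocent * (1 - p_culprit)
    )
    return p_pos_given_culprit * p_culprit / p_positive


p_right = p_culprit_given_positive(suspects=500, false_positives=4)
print(p_right)      # 1/5
print(1 - p_right)  # 4/5, the chance of identifying the wrong person
```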
The error of the advocate
In the trial of the former American football star O. J. Simpson, one of his defence lawyers (a Harvard professor) argued that, although Simpson had sometimes beaten his wife, only very few men who beat their wives later murder them (about 0.1%).
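Whole population: 250 000 000
  Not beaten wives: N
    Murdered otherwise: P = 1/10000
  Beaten wives: 250 000 000 - N
    Murdered by husband: P = 1/10000
    Murdered otherwise: P = 1/10000
10000 beaten wives; among those murdered, murdered by husband: P = 1/2
With m = the wife is murdered and h_b = the murderer is the husband who beat her:
$$p(h_b \mid m) = \frac{p(m \mid h_b)}{p(m \mid h_b) + p(m \mid \neg h_b)} = \frac{1}{2}$$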
Maximum likelihoods
Suppose you studied 50 patients in a clinical trial and detected a certain bacterial disease in 30 of them.
What is the most probable frequency of this disease in the population?
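$$p_{0.5}(30 \mid 50) = \binom{50}{30}\left(\frac{1}{2}\right)^{50} = 0.042$$
$$p_{0.6}(30 \mid 50) = \binom{50}{30}\left(\frac{3}{5}\right)^{30}\left(\frac{2}{5}\right)^{20} = 0.115$$
$$p_{0.8}(30 \mid 50) = \binom{50}{30}\left(\frac{4}{5}\right)^{30}\left(\frac{1}{5}\right)^{20} = 0.001$$
$$L_p = f_p(x_1 \ldots x_i)$$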
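The three likelihood values are binomial probabilities and can be recomputed directly. A minimal sketch; the function name `likelihood` is an assumption.

```python
from math import comb


def likelihood(p, successes=30, trials=50):
    """Binomial probability of `successes` in `trials` for frequency p."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


values = {p: likelihood(p) for p in (0.5, 0.6, 0.8)}
for p, value in values.items():
    print(p, round(value, 3))
```

Of the three candidate frequencies, p = 0.6 makes the observed 30 out of 50 most probable.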
[Figure: the likelihood L(p), ranging from 0 to 0.14, plotted against p from 0 to 1.]
We look for the maximum value of the likelihood function.
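$$L_p = \binom{50}{30} p^{30} (1-p)^{20}$$
$$\frac{dL_p}{dp} = \binom{50}{30}\, 30 p^{29} (1-p)^{20} - \binom{50}{30}\, p^{30}\, 20 (1-p)^{19} = 0$$
$$3(1-p) = 2p \quad\Rightarrow\quad p = \frac{3}{5}$$
Log likelihood estimator ln(L_p):
$$\ln(L_p) = \ln\binom{50}{30} + 30\ln(p) + 20\ln(1-p)$$
$$\frac{d \ln L_p}{dp} = \frac{30}{p} - \frac{20}{1-p} = 0 \quad\Rightarrow\quad p = \frac{3}{5}$$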
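The analytical maximum can be cross-checked numerically. The sketch below swaps the derivative for a simple grid search over the log likelihood; the grid step of 0.001 and the function name are assumptions.

```python
import math


def log_likelihood(p, successes=30, trials=50):
    """ln(L_p) for `successes` cases among `trials` patients."""
    return (
        math.log(math.comb(trials, successes))
        + successes * math.log(p)
        + (trials - successes) * math.log(1 - p)
    )


# Candidate frequencies strictly between 0 and 1 (the logarithm needs 0 < p < 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)  # 0.6, the observed frequency 30/50
```

The log likelihood has its maximum at the same p as the likelihood itself, because the logarithm is a monotonic function.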
Homework and literature
Refresh:
• Probability
• Permutations, variations, combinations
• Bernoulli event
• Pascal triangle, binomial coefficients
• Dependent probability
• Independent probability
• Derivative, integral of power functions
Prepare for the next lecture:
• Arithmetic, geometric, harmonic mean
• Cauchy inequality
• Statistical distribution
• Probability distribution
• Moments of distributions
• Error law of Gauß
Literature:
http://www.brixtonhealth.com/CRCaseFinding.pdf