8/4/2019 Maximum Likelihood (transcript)
HO #2 EE 722
9/3/2002 J. Chun
MLD (Maximum Likelihood Detection)
Example 1 A blood test is 95% effective in detecting an HIV infection when it is, in fact, present.
However, the test also yields a false positive result for 1% of the healthy persons tested. If a person
tests positive, would you decide that he has HIV?
P{positive | HIV} = 0.95
P{positive | no HIV} = 0.01
It is more likely that a person with HIV gives the positive result than a person with no HIV. So we
would decide that the person has HIV.
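In code, the comparison of the two likelihoods is a one-liner (a sketch I added, not part of the handout; the numbers are those of Example 1):

```python
# Likelihood of the observation "positive" under each hypothesis (Example 1).
p_pos_given_hiv = 0.95     # P{positive | HIV}
p_pos_given_no_hiv = 0.01  # P{positive | no HIV}

# Maximum likelihood decision: pick the hypothesis under which the
# observation is more likely.
decision = "HIV" if p_pos_given_hiv > p_pos_given_no_hiv else "no HIV"
print(decision)  # HIV
```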
The decision criterion used in Example 1 is called the maximum likelihood decision criterion. Let us
generalize the idea in Example 1. There are two hypotheses H_0 and H_1 (H_0 = no HIV,
H_1 = HIV in the above example). Each of the two messages generates a point z in the observation
space Z. It is desired to divide Z into two decision regions Z_0 and Z_1 (the division gives a
decision rule). We make decision d_0, that hypothesis H_0 is true, if z ∈ Z_0, and similarly for
decision d_1.
In Example 1, Z = {negative, positive}.
[Figure: the observation space Z divided into decision regions Z_0 (where d(z) = d_0) and Z_1 (where d(z) = d_1); binary hypotheses, single observation.]
MLD Criterion
Given an observation z ∈ Z, set d(z) = d_0 if it is more likely that H_0 generated z than that H_1 generated z. Namely,

$$Z_0 = \{\, z : p(z\mid H_0) > p(z\mid H_1) \,\}, \qquad Z_1 = \{\, z : p(z\mid H_0) \le p(z\mid H_1) \,\}$$

Equivalently, in terms of the likelihood ratio,

$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} \gtrless_{H_0}^{H_1} 1.$$
Example 2 (single observation, binary decision)

H_0: z = n
H_1: z = 1 + n,   n ~ N(0, 1)

$$p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad p(z\mid H_1) = \frac{1}{\sqrt{2\pi}}\, e^{-(z-1)^2/2}$$

[Figure: the two densities p(z|H_0) and p(z|H_1), centered at 0 and 1; Z_0 lies to the left of z = 0.5 and Z_1 to the right.]
$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = \frac{\frac{1}{\sqrt{2\pi}}\, e^{-(z-1)^2/2}}{\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}} = e^{(2z-1)/2} \gtrless_{H_0}^{H_1} 1$$

$$\ln\Lambda(z) = \frac{2z-1}{2} \gtrless_{H_0}^{H_1} 0 \quad\text{or}\quad z \gtrless_{H_0}^{H_1} \frac{1}{2}$$

So, if z = 0.6, we choose H_1 because 0.6 > 1/2.
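The rule z ≷ 1/2 can be sanity-checked numerically (a Python sketch, not from the handout; the helper name is mine):

```python
import math

def likelihood_ratio(z):
    """Lambda(z) = p(z|H1)/p(z|H0) for H0: z = n, H1: z = 1 + n, n ~ N(0,1)."""
    p0 = math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    p1 = math.exp(-(z - 1)**2 / 2) / math.sqrt(2 * math.pi)
    return p1 / p0

# Lambda(z) crosses 1 exactly at z = 1/2, so z = 0.6 gives H1.
print(likelihood_ratio(0.6) > 1)  # True
print(likelihood_ratio(0.4) > 1)  # False
```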
Example 3

H_0: $p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$  (z ~ N(0, 1))
H_1: $p(z\mid H_1) = \frac{1}{2\sqrt{2\pi}}\, e^{-z^2/8}$  (z ~ N(0, 2²))

[Figure: p(z|H_0) is narrower than p(z|H_1); Z_0 is an interval around the origin and Z_1 is the two tails.]
$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = \frac{\frac{1}{2\sqrt{2\pi}}\, e^{-z^2/8}}{\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}} = \frac{1}{2}\, e^{3z^2/8} \gtrless_{H_0}^{H_1} 1$$

$$\ln\Lambda(z) = -\ln 2 + \frac{3}{8}z^2 \gtrless_{H_0}^{H_1} 0$$

$$z^2 \gtrless_{H_0}^{H_1} \frac{8}{3}\ln 2 \quad\text{or}\quad |z| \gtrless_{H_0}^{H_1} \sqrt{\frac{8}{3}\ln 2}.$$
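Numerically the Example 3 threshold is √((8/3) ln 2) ≈ 1.36; a quick check of the sign change (my sketch, not in the handout):

```python
import math

def log_likelihood_ratio(z):
    """ln Lambda(z) for H0: N(0,1) versus H1: N(0,4), as in Example 3."""
    p0 = math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    p1 = math.exp(-z**2 / 8) / (2 * math.sqrt(2 * math.pi))
    return math.log(p1 / p0)

threshold = math.sqrt(8 / 3 * math.log(2))  # the |z| threshold derived above
print(round(threshold, 2))  # 1.36
print(log_likelihood_ratio(threshold + 0.01) > 0)  # True: the tails go to H1
print(log_likelihood_ratio(threshold - 0.01) > 0)  # False: the center goes to H0
```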
Example 4 (Multiple observation)

H_0: z = n
H_1: z = s + n    (z, s, n are p-element vectors)

$$p(\mathbf{n}) = \frac{1}{(2\pi)^{p/2}\,\det(R)^{1/2}} \exp\left[-\tfrac{1}{2}\,\mathbf{n}^T R^{-1}\mathbf{n}\right]$$

Aside:
(i) R = σ²I, so R^{-1} = (1/σ²)I and
$\det(\sigma^2 I) = (\sigma^2)^p$. So,

$$p(\mathbf{n}) = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n_i^2}{2\sigma^2}\right)$$

(ii) p = 1:

$$p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right)$$
$$\Lambda(z) = \frac{p(\mathbf{z}\mid H_1)}{p(\mathbf{z}\mid H_0)} = \exp\left[-\tfrac{1}{2}\left\{(\mathbf{z}-\mathbf{s})^T R^{-1}(\mathbf{z}-\mathbf{s}) - \mathbf{z}^T R^{-1}\mathbf{z}\right\}\right]$$

$$\ln\Lambda(z) = \mathbf{s}^T R^{-1}\mathbf{z} - \tfrac{1}{2}\,\mathbf{s}^T R^{-1}\mathbf{s} \gtrless_{H_0}^{H_1} 0$$

So,

$$\mathbf{z}^T R^{-1}\mathbf{s} \gtrless_{H_0}^{H_1} \tfrac{1}{2}\,\mathbf{s}^T R^{-1}\mathbf{s}.$$
This, too, is a binary decision, now made from multiple observations.
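The vector rule is a correlation against the known signal compared with a fixed threshold; a small NumPy sketch (mine, not from the handout; `np.linalg.solve` stands in for forming R^{-1} explicitly):

```python
import numpy as np

def detect(z, s, R):
    """Return 1 (decide H1) if s^T R^{-1} z >= (1/2) s^T R^{-1} s, else 0."""
    Rinv_z = np.linalg.solve(R, z)   # R^{-1} z
    Rinv_s = np.linalg.solve(R, s)   # R^{-1} s
    return int(s @ Rinv_z >= 0.5 * (s @ Rinv_s))

s = np.array([1.0, 1.0, 1.0])        # known signal
R = np.eye(3)                        # white noise covariance, sigma^2 = 1
print(detect(np.array([0.9, 1.2, 0.1]), s, R))   # 1: s^T z = 2.2 >= 1.5
print(detect(np.array([-0.2, 0.3, 0.1]), s, R))  # 0: s^T z = 0.2 <  1.5
```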
Neyman-Pearson Criterion
Fix P{d_1 | H_0} at a preselected value α_0, and then maximize P{d_1 | H_1} (a constrained
maximization).
[Figure: a threshold on z separating Z_0 (d(z) = d_0) from Z_1 (d(z) = d_1), with the densities p(z|H_0) and p(z|H_1).]
area of p(z|H_0) beyond the threshold = P{d_1 | H_0} = P_F, the false alarm probability
area of p(z|H_1) beyond the threshold = P{d_1 | H_1} = P_D, the detection probability

We want P{d_1 | H_1} → 1 and P{d_1 | H_0} → 0. By moving the threshold, however, P{d_1 | H_1} and P{d_1 | H_0}
increase or decrease simultaneously. To find the threshold according to the Neyman-Pearson criterion, we maximize, with a Lagrange multiplier λ,

$$P\{d_1\mid H_1\} - \lambda\left[P\{d_1\mid H_0\} - \alpha_0\right] = \int_{Z_1} p(z\mid H_1)\,dz - \lambda\int_{Z_1} p(z\mid H_0)\,dz + \lambda\alpha_0 = \int_{Z_1}\left[p(z\mid H_1) - \lambda\, p(z\mid H_0)\right]dz + \lambda\alpha_0.$$

This can be maximized by selecting for Z_1 all z such that

$$p(z\mid H_1) - \lambda\, p(z\mid H_0) > 0 \quad\text{or}\quad \Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} > \lambda.$$

We must select λ such that the constraint

$$P\{d_1\mid H_0\} = \int_{Z_1} p(z\mid H_0)\,dz = \alpha_0$$

is satisfied.
Example 5

$$p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right), \qquad p(z\mid H_1) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(z-\theta)^2}{2}\right), \quad \theta > 0.$$

We require that P{d_1 | H_0} = 0.25.

$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = \frac{e^{-(z-\theta)^2/2}}{e^{-z^2/2}} = e^{\theta z - \theta^2/2} \gtrless_{H_0}^{H_1} \lambda$$

So,

$$\theta z - \frac{\theta^2}{2} \gtrless_{H_0}^{H_1} \ln\lambda \quad\text{or}\quad z \gtrless_{H_0}^{H_1} \frac{\ln\lambda}{\theta} + \frac{\theta}{2}.$$

$$P\{d_1\mid H_0\} = \int_{\frac{\ln\lambda}{\theta}+\frac{\theta}{2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = Q\left(\frac{\ln\lambda}{\theta} + \frac{\theta}{2}\right) = 0.25,$$

where Q is the Gaussian tail probability. So,

$$\frac{\ln\lambda}{\theta} + \frac{\theta}{2} = 0.674.$$

Notice that we did not have to know the value of θ to derive the Neyman-Pearson detection
rule.
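The number 0.674 is Q^{-1}(0.25); Python's standard library `statistics.NormalDist` can confirm it (a sketch I added, not part of the handout):

```python
from statistics import NormalDist

# Q(z) = 1 - Phi(z); the Neyman-Pearson constraint is Q(z0) = 0.25.
z0 = NormalDist().inv_cdf(1 - 0.25)   # Q^{-1}(0.25)
print(round(z0, 3))  # 0.674

# False alarm probability actually achieved at this threshold.
alpha0 = 1 - NormalDist().cdf(z0)
print(round(alpha0, 2))  # 0.25
```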
[Figure: decision regions for Example 5, Z_1 = {z ≥ 0.674} and Z_0 = {z < 0.674}; the shaded tail area is Q(z_0) with z_0 = 0.674; θ unknown.]

Example 6

Suppose, in Example 1, that it is known that only 0.5% of the Korean population has HIV. Would you strongly trust the test when
you are tested to be positive?
The probability that a person has HIV, given that his test result is positive:

$$P\{\text{HIV}\mid\text{positive}\} = \frac{P\{\text{HIV},\text{positive}\}}{P\{\text{positive}\}} = \frac{P\{\text{positive}\mid\text{HIV}\}\,P\{\text{HIV}\}}{P\{\text{positive}\mid\text{HIV}\}\,P\{\text{HIV}\} + P\{\text{positive}\mid\text{no HIV}\}\,P\{\text{no HIV}\}}$$

$$= \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} = 0.323$$
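The 0.323 can be reproduced directly from Bayes' rule (a sketch, not part of the handout):

```python
# Numbers from Examples 1 and 6.
p_pos_hiv = 0.95       # P{positive | HIV}
p_pos_no_hiv = 0.01    # P{positive | no HIV}
p_hiv = 0.005          # P{HIV} = 0.5% of the population

# Bayes' rule: P{HIV | positive}.
p_pos = p_pos_hiv * p_hiv + p_pos_no_hiv * (1 - p_hiv)
posterior = p_pos_hiv * p_hiv / p_pos
print(round(posterior, 3))  # 0.323
```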
So, we need a better decision criterion that can use a priori information such as the 0.5% in the above
argument. The idea is that if either H_0 or H_1 is highly unlikely to be true, the MLD is not a good criterion.
MAP (maximum a posteriori decision criterion)

Given an observation z, choose H_0 if H_0 is more likely than H_1:

$$\frac{P(H_1\mid z)}{P(H_0\mid z)} \gtrless_{H_0}^{H_1} 1$$

Since $P(H_i\mid z) = p(z\mid H_i)P\{H_i\}/p(z)$, this is equivalent to

$$\Lambda(z) \gtrless_{H_0}^{H_1} \frac{P\{H_0\}}{P\{H_1\}} \qquad (*)$$
For Example 1, we consider the ratio

$$\Lambda(z)\,\frac{P\{H_1\}}{P\{H_0\}} = \frac{p(z\mid H_1)\,P\{H_1\}}{p(z\mid H_0)\,P\{H_0\}} = \frac{P\{\text{HIV}\mid\text{positive}\}}{P\{\text{no HIV}\mid\text{positive}\}} = \frac{(0.95)(0.005)}{(0.01)(0.995)} = 0.477 < 1.$$
So the MAP criterion decides that he does not have HIV.
Another good thing about the MAP criterion is that the MAP minimizes the probability of error (of making
an incorrect decision).
Proof.

$$P_e = P\{d_1, H_0\} + P\{d_0, H_1\} = P\{d_1\mid H_0\}P\{H_0\} + P\{d_0\mid H_1\}P\{H_1\}$$

Since $P\{d_1\mid H_0\} = \int_{Z_1} p(z\mid H_0)\,dz$ and $P\{d_0\mid H_1\} = 1 - \int_{Z_1} p(z\mid H_1)\,dz$,

$$P_e = P\{H_1\} + \int_{Z_1}\left[p(z\mid H_0)P\{H_0\} - p(z\mid H_1)P\{H_1\}\right]dz.$$

To minimize P_e, put into Z_1 every z for which $p(z\mid H_0)P\{H_0\} - p(z\mid H_1)P\{H_1\}$ is negative:

$$Z_1 = \left\{\, z : p(z\mid H_0)P\{H_0\} - p(z\mid H_1)P\{H_1\} < 0 \,\right\},$$

which is the same as (*).
Example 7

$$p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right), \qquad p(z\mid H_1) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(z-1)^2}{2}\right)$$

$$P\{H_0\} = 0.25, \qquad P\{H_1\} = 0.75$$

By (*),

$$\Lambda(z) = \exp\left(\frac{2z-1}{2}\right) \gtrless_{H_0}^{H_1} \frac{P\{H_0\}}{P\{H_1\}} = \frac{1}{3}$$

i.e.,

$$z \gtrless_{H_0}^{H_1} \ln\frac{1}{3} + \frac{1}{2} \approx -0.6.$$
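A numerical check of the Example 7 threshold (my sketch, not from the handout):

```python
import math

# MAP rule of Example 7: z >< ln(1/3) + 1/2, with P{H0} = 0.25, P{H1} = 0.75.
threshold = math.log(1 / 3) + 0.5
print(round(threshold, 2))  # -0.6

# Even z = 0 is decided as H1: the prior P{H1} = 0.75 pulls the
# threshold well below the ML threshold of 1/2.
print(0.0 > threshold)  # True
```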
Example 8 (single observation, multiple decision)

A closed region has N animals.
(i) Catch r animals, mark them, and release them.
(ii) After they are dispersed, catch n animals, and count the number i of the marked animals.
Let x denote the number of the marked animals in the second catch.
(The population size N is the unknown that we want to estimate (decide): a multiple decision from a single observation.)

[Figure for Example 7: threshold at z ≈ −0.6, with Z_1 to the right and Z_0 to the left.]
$$P_i(N) = P\{x = i\} = \frac{\binom{r}{i}\binom{N-r}{n-i}}{\binom{N}{n}}$$

(a function of the unknown parameter N; i is the observation)

Suppose that r = 50, n = 40, i = 4.
The MLD chooses the value of N that maximizes P_i(N), the probability of the observed event
(i = 4) when there are actually N animals.
clear all
r = 50;
n = 40;
i = 4;
for N = 50:1000
    rCi = prod(r:-1:r-(i-1))/factorial(i);
    NmrCnmi = prod(N-r:-1:N-r-(n-i-1))/factorial(n-i);
    NCn = prod(N:-1:N-(n-1))/factorial(n);
    Pi(N) = rCi*NmrCnmi/NCn;
end
plot(Pi);
[Figure: plot of P_i(N) versus N for r = 50, n = 40, i = 4; the curve peaks near N = 500.]
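The same maximization can be done in Python with exact arithmetic via `math.comb` and `fractions` (my sketch, not part of the handout). It also shows that with these numbers the likelihood is maximized by two values of N, since rn/i = 500 is an integer:

```python
from fractions import Fraction
from math import comb

r, n, i = 50, 40, 4   # marked, recaptured, marked among the recaptured

def P_i(N):
    """Exact hypergeometric probability of catching i marked animals."""
    if N < r + (n - i):
        return Fraction(0)
    return Fraction(comb(r, i) * comb(N - r, n - i), comb(N, n))

best = max(P_i(N) for N in range(50, 1001))
N_hats = [N for N in range(50, 1001) if P_i(N) == best]
print(N_hats)  # [499, 500]: the classical estimate rn/i = 500 ties with 499
```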
Decision vs. Estimation

In the decision problem, the number of hypotheses is finite or countably infinite. (So the hypotheses
form a discrete space, as in Example 8.) In the estimation problem, the number of hypotheses is
uncountably infinite.

The same physical problem may be formulated as either a decision problem or an estimation problem.
Example 8, where we used the decision problem setting, may be formulated as an estimation problem,
which would give a solution such as N = 501.42.

e.g. Locating a target on an image plane with pixel coordinates (i, j):
decision setting: (5, 6)
estimation setting: (4.98, 6.12)  (subpixel resolution of the target position)

(Terminology: estimator ↔ decision rule; estimate ↔ decision.)

Something more general than the MLD, Neyman-Pearson, or MAP criteria? The Bayes risk criterion.
Assign a cost to each of the four possible situations and minimize the total average cost.

c_00 = cost of deciding d_0 when H_0 is true (cost for a correct decision)
c_10 = cost of deciding d_1 when H_0 is true (cost for an incorrect decision)
c_01 = cost of deciding d_0 when H_1 is true (cost for an incorrect decision)
c_11 = cost of deciding d_1 when H_1 is true (cost for a correct decision)

Total average cost

$$B = E\{c_{ij}\} = c_{00}P\{d_0, H_0\} + c_{10}P\{d_1, H_0\} + c_{01}P\{d_0, H_1\} + c_{11}P\{d_1, H_1\}$$

$$= \underbrace{\left[c_{00}P\{d_0\mid H_0\} + c_{10}P\{d_1\mid H_0\}\right]}_{b_0}P\{H_0\} + \underbrace{\left[c_{01}P\{d_0\mid H_1\} + c_{11}P\{d_1\mid H_1\}\right]}_{b_1}P\{H_1\}$$

(b_i is the conditional cost, i.e., the average cost assuming that H_i is true.)

Using $P\{d_0\mid H_i\} = 1 - P\{d_1\mid H_i\} = 1 - \int_{Z_1} p(z\mid H_i)\,dz$,

$$B = c_{00}P\{H_0\} + c_{01}P\{H_1\} + \int_{Z_1}\left[(c_{10}-c_{00})P\{H_0\}\,p(z\mid H_0) - (c_{01}-c_{11})P\{H_1\}\,p(z\mid H_1)\right]dz.$$
To minimize B, put into Z_1 every z for which $(c_{10}-c_{00})P\{H_0\}\,p(z\mid H_0) - (c_{01}-c_{11})P\{H_1\}\,p(z\mid H_1)$ is negative.
Therefore, the Bayes decision rule is:

$$\Lambda(z) \gtrless_{H_0}^{H_1} \frac{(c_{10}-c_{00})\,P\{H_0\}}{(c_{01}-c_{11})\,P\{H_1\}} \qquad (\text{when } c_{01} > c_{11})$$
Example 9

$$p(z\mid H_0) = \tfrac{1}{2}\,e^{-|z|}, \qquad p(z\mid H_1) = e^{-2|z|}$$

$$c_{00} = c_{11} = 0, \quad c_{01} = 1 \ (\text{cost for a miss}), \quad c_{10} = 2 \ (\text{cost for a false alarm})$$

$$P\{H_1\} = 0.75$$

$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = 2e^{-|z|} \gtrless_{H_0}^{H_1} \frac{(2-0)(0.25)}{(1-0)(0.75)} = \frac{2}{3}$$

i.e.,

$$-|z| \gtrless_{H_0}^{H_1} \ln\frac{1}{3}, \quad\text{so decide } H_1 \text{ when } |z| < \ln 3.$$

Now change the problem to

$$p(z\mid H_0) = e^{-z}, \ z > 0, \qquad p(z\mid H_1) = 2e^{-2z}, \ z > 0,$$

$$c_{00} = c_{11} = 0, \quad c_{01} = 2, \quad c_{10} = 1,$$

with the a priori probability $P_0 = P\{H_0\}$ unknown.

[Figure: p(z|H_1) = 2e^{−2z} and p(z|H_0) = e^{−z} on 0 ≤ z ≤ 2; the curves cross at z = ln 2, with z → H_1 to the left and z → H_0 to the right.]
The Bayes rule as a function of $P_0 = P\{H_0\}$:

$$\Lambda(z) = \frac{2e^{-2z}}{e^{-z}} = 2e^{-z} \gtrless_{H_0}^{H_1} \frac{(1-0)\,P_0}{(2-0)(1-P_0)} = \frac{P_0}{2(1-P_0)}$$

or

$$z \mathop{\lessgtr}_{H_0}^{H_1} \ln\frac{4(1-P_0)}{P_0}$$

(the inequality flips because $2e^{-z}$ is decreasing in z). The decision regions are a function of the unknown $P_0$.
With the threshold $z_0 = \ln\frac{4(1-P_0)}{P_0}$ (decide H_1 for $z < z_0$),

$$B = \left[0 + (1-0)\,P\{d_1\mid H_0\}\right]P_0 + \left[0 + (2-0)\,P\{d_0\mid H_1\}\right](1-P_0)$$

$$P\{d_1\mid H_0\} = \int_0^{z_0} e^{-z}\,dz = 1 - e^{-z_0} = 1 - \frac{P_0}{4(1-P_0)}, \qquad P\{d_0\mid H_1\} = \int_{z_0}^{\infty} 2e^{-2z}\,dz = e^{-2z_0} = \frac{P_0^2}{16(1-P_0)^2}$$

$$B = P_0\left(1 - \frac{P_0}{4(1-P_0)}\right) + 2(1-P_0)\,\frac{P_0^2}{16(1-P_0)^2} = \frac{8P_0 - 9P_0^2}{8(1-P_0)}$$

$$\frac{dB}{dP_0} = 0 \quad\Rightarrow\quad P_0 = \frac{2}{3}$$

So,

$$z_0' = \ln\frac{4\left(1-\tfrac{2}{3}\right)}{\tfrac{2}{3}} = \ln 2 \approx 0.69.$$
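A grid search over P_0 confirms the stationary point (a sketch using the closed form B(P_0) derived above):

```python
# B(P0) = (8 P0 - 9 P0^2) / (8 (1 - P0)) from the derivation above.
def B(p0):
    return (8 * p0 - 9 * p0**2) / (8 * (1 - p0))

grid = [k / 3000 for k in range(1, 3000)]   # open interval (0, 1)
p0_star = max(grid, key=B)
print(round(p0_star, 4))     # 0.6667, i.e., P0 = 2/3
print(round(B(p0_star), 4))  # 0.5: the min-max cost
```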
B = [];
for p0 = 0:0.01:0.99    % stop before p0 = 1 to avoid dividing by zero
    B = [B (8*p0 - 9*p0^2)/(8*(1-p0))];    % B(P0) derived above
end
plot(0:0.01:0.99, B);
Claim
Suppose that there exists Z_1 such that b_0(Z_1) = b_1(Z_1), and Z_1 is a Bayes decision region for some
P_0 (= P{H_0}). Then Z_1 is the min-max decision region.
Proof
Suppose that there exists Z_1 such that b_0(Z_1) = b_1(Z_1) but Z_1 is not the min-max decision region. Then
there exists Z_1' such that

$$\max_{P_0} B(P_0, Z_1') < \max_{P_0} B(P_0, Z_1).$$
$$B(P_0, Z_1) = b_0(Z_1)\,P_0 + b_1(Z_1)(1-P_0) \qquad (*) \quad (\text{p. 15})$$

[Figure: B(P_0, Z_1) versus P_0 is a line for each fixed Z_1; the lines for Z_1 = Z_1^(1), Z_1^(2), Z_1^(3) give the minimum cost when P_0 = P_0^(1), P_0^(2), P_0^(3) respectively (each Z_1^(k) is then the optimal Bayes decision region); the curve tangent to all the lines, which must be convex (why?), attains the maximum of the minimum cost.]

So, from (*),

$$\frac{dB(P_0, Z_1)}{dP_0} = b_0(Z_1) - b_1(Z_1) = 0, \quad\text{i.e.,}\quad b_0(Z_1) = b_1(Z_1).$$
Connection to game theory (G. Strang)

[Figure: player x and player y, each holding a $20 card and a $10 card, with the dealer between them.]

Player x and player y show one of their two cards simultaneously.
If player y matches the card of player x ($20 and $20, or $10 and $10), then player y gets $10 from player x.
If player y does not match the card of player x ($20 and $10, or $10 and $20), then player x gets $20 (if
player x showed $20) or $10 (if player x showed $10) from player y.

Some thought
Players x and y must make decisions which do not have a regular pattern, and each decision must be
independent of the previous decisions. Otherwise the opponent would try to take advantage of it.

x chooses $20 with probability P_{20,x} and $10 with probability 1 − P_{20,x}.
y chooses $20 with probability P_{20,y} and $10 with probability 1 − P_{20,y}.

We want to find the optimal P_{20,x} and P_{20,y} (the equilibrium point).

Suppose that x and y choose a card with equal probability, i.e., P_{20,x} = P_{20,y} = 1/2. Then the average cost of y is

$$\frac{1}{4}(20) + \frac{1}{4}(10) - \frac{1}{2}(10) = \$2.5.$$

Player y does not know what card player x would show.
(P_{20,x} plays the role of the unknown P_0 in the previous examples.) But player y wishes to minimize the average cost by choosing P_{20,y}.

Cost for y (the cost for y is the earning for x):

                x shows $10    x shows $20
y shows $10:       −10             20
y shows $20:        10            −10
y's strategy is to minimize the average cost:

$$P_{20,y}\,[10,\ -10] + (1 - P_{20,y})\,[-10,\ 20] = [\underbrace{20P_{20,y} - 10}_{a},\ \underbrace{20 - 30P_{20,y}}_{b}]$$

(the first entry is y's average cost when x shows $10, the second when x shows $20). Setting a = b:

$$20P_{20,y} - 10 = 20 - 30P_{20,y} \quad\Rightarrow\quad P_{20,y} = \frac{3}{5}.$$

So, y should show the $20 card with rate 3/5 and the $10 card with rate 2/5.

What is the cost for y with this strategy?

$$\left[20\cdot\tfrac{3}{5} - 10,\ 20 - 30\cdot\tfrac{3}{5}\right] = [2,\ 2]$$

average cost = $2, which is less than $2.5.
So y minimizes his maximum cost.
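The equalization argument can be checked directly (my sketch; the cost matrix is the one tabulated above):

```python
# Cost for y: rows are y's card ($10, $20), columns are x's card ($10, $20).
cost = [[-10, 20],
        [10, -10]]

def avg_cost(p20y):
    """y's average cost against each pure strategy of x."""
    p10y = 1 - p20y
    vs_x10 = p10y * cost[0][0] + p20y * cost[1][0]  # x shows $10
    vs_x20 = p10y * cost[0][1] + p20y * cost[1][1]  # x shows $20
    return vs_x10, vs_x20

a, b = avg_cost(3 / 5)   # y's equalizing strategy P_{20,y} = 3/5
print(a, b)              # both entries equal 2: y guarantees an average cost of $2

# With the naive 1/2-1/2 strategy, y's worst case is higher.
print(max(avg_cost(1 / 2)))  # 5.0
```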
(This is a zero-sum game.)
[Figure: y's average cost as a function of P_{20,x} and P_{20,y}; the optimal point is the equilibrium (saddle) point, which gives the optimal cost for y, and x tries to stay on this line. Another example of the same structure: an unknown noise covariance matrix playing against our parameter estimate.]