8/4/2019 Maximum Likelihood (transcript)
HO #2 EE 722
9/3/2002 J. Chun
MLD (Maximum Likelihood Detection)
Example 1 A blood test is 95% effective in detecting an HIV infection when it is, in fact, present.
However, the test also yields a false positive result for 1% of the healthy persons tested. If a person
tests positive, would you decide that he has HIV?
P{positive | HIV} = 0.95
P{positive | no HIV} = 0.01
It is more likely that a person with HIV gives the positive result than a person with no HIV. So we
would decide that the person has HIV.
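In code, the comparison of the two likelihoods is a one-liner (a sketch I added, not part of the handout; the numbers are those of Example 1):

```python
# Likelihood of the observation "positive" under each hypothesis (Example 1).
p_pos_given_hiv = 0.95     # P{positive | HIV}
p_pos_given_no_hiv = 0.01  # P{positive | no HIV}

# Maximum likelihood decision: pick the hypothesis under which the
# observation is more likely.
decision = "HIV" if p_pos_given_hiv > p_pos_given_no_hiv else "no HIV"
print(decision)  # HIV
```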
The decision criterion used in Example 1 is called the maximum likelihood decision criterion. Let us
generalize the idea in Example 1. There are two hypotheses H_0 and H_1 (H_0 = no HIV,
H_1 = HIV in the above example). Each of the two messages generates a point z in the observation
space Z. It is desired to divide Z into two decision regions Z_0 and Z_1 (the division gives a
decision rule). We make decision d_0, that hypothesis H_0 is true, if z ∈ Z_0, and similarly for
decision d_1.
In Example 1, Z = {negative, positive}.
[Figure: the observation space Z divided into decision regions Z_0 (where d(z) = d_0) and Z_1 (where d(z) = d_1); binary hypotheses, single observation.]
MLD Criterion
Given an observation z ∈ Z, set d(z) = d_0 if it is more likely that H_0 generated z than that H_1 generated z. Namely,

$$Z_0 = \{\, z : p(z\mid H_0) > p(z\mid H_1) \,\}, \qquad Z_1 = \{\, z : p(z\mid H_0) \le p(z\mid H_1) \,\}$$

Equivalently, in terms of the likelihood ratio,

$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} \gtrless_{H_0}^{H_1} 1.$$
Example 2 (single observation, binary decision)

H_0: z = n
H_1: z = 1 + n,   n ~ N(0, 1)

$$p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad p(z\mid H_1) = \frac{1}{\sqrt{2\pi}}\, e^{-(z-1)^2/2}$$

[Figure: the two densities p(z|H_0) and p(z|H_1), centered at 0 and 1; Z_0 lies to the left of z = 0.5 and Z_1 to the right.]
$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = \frac{\frac{1}{\sqrt{2\pi}}\, e^{-(z-1)^2/2}}{\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}} = e^{(2z-1)/2} \gtrless_{H_0}^{H_1} 1$$

$$\ln\Lambda(z) = \frac{2z-1}{2} \gtrless_{H_0}^{H_1} 0 \quad\text{or}\quad z \gtrless_{H_0}^{H_1} \frac{1}{2}$$

So, if z = 0.6, we choose H_1 because 0.6 > 1/2.
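The rule z ≷ 1/2 can be sanity-checked numerically (a Python sketch, not from the handout; the helper name is mine):

```python
import math

def likelihood_ratio(z):
    """Lambda(z) = p(z|H1)/p(z|H0) for H0: z = n, H1: z = 1 + n, n ~ N(0,1)."""
    p0 = math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    p1 = math.exp(-(z - 1)**2 / 2) / math.sqrt(2 * math.pi)
    return p1 / p0

# Lambda(z) crosses 1 exactly at z = 1/2, so z = 0.6 gives H1.
print(likelihood_ratio(0.6) > 1)  # True
print(likelihood_ratio(0.4) > 1)  # False
```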
Example 3

H_0: $p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$  (z ~ N(0, 1))
H_1: $p(z\mid H_1) = \frac{1}{2\sqrt{2\pi}}\, e^{-z^2/8}$  (z ~ N(0, 2²))

[Figure: p(z|H_0) is narrower than p(z|H_1); Z_0 is an interval around the origin and Z_1 is the two tails.]
$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = \frac{\frac{1}{2\sqrt{2\pi}}\, e^{-z^2/8}}{\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}} = \frac{1}{2}\, e^{3z^2/8} \gtrless_{H_0}^{H_1} 1$$

$$\ln\Lambda(z) = -\ln 2 + \frac{3}{8}z^2 \gtrless_{H_0}^{H_1} 0$$

$$z^2 \gtrless_{H_0}^{H_1} \frac{8}{3}\ln 2 \quad\text{or}\quad |z| \gtrless_{H_0}^{H_1} \sqrt{\frac{8}{3}\ln 2}.$$
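Numerically the Example 3 threshold is √((8/3) ln 2) ≈ 1.36; a quick check of the sign change (my sketch, not in the handout):

```python
import math

def log_likelihood_ratio(z):
    """ln Lambda(z) for H0: N(0,1) versus H1: N(0,4), as in Example 3."""
    p0 = math.exp(-z**2 / 2) / math.sqrt(2 * math.pi)
    p1 = math.exp(-z**2 / 8) / (2 * math.sqrt(2 * math.pi))
    return math.log(p1 / p0)

threshold = math.sqrt(8 / 3 * math.log(2))  # the |z| threshold derived above
print(round(threshold, 2))  # 1.36
print(log_likelihood_ratio(threshold + 0.01) > 0)  # True: the tails go to H1
print(log_likelihood_ratio(threshold - 0.01) > 0)  # False: the center goes to H0
```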
Example 4 (Multiple observation)

H_0: z = n
H_1: z = s + n    (z, s, n are p-element vectors)

$$p(\mathbf{n}) = \frac{1}{(2\pi)^{p/2}\,\det(R)^{1/2}} \exp\left[-\tfrac{1}{2}\,\mathbf{n}^T R^{-1}\mathbf{n}\right]$$

Aside:
(i) R = σ²I, so R^{-1} = (1/σ²)I and
$\det(\sigma^2 I) = (\sigma^2)^p$. So,

$$p(\mathbf{n}) = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n_i^2}{2\sigma^2}\right)$$

(ii) p = 1:

$$p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{n^2}{2\sigma^2}\right)$$
$$\Lambda(z) = \frac{p(\mathbf{z}\mid H_1)}{p(\mathbf{z}\mid H_0)} = \exp\left[-\tfrac{1}{2}\left\{(\mathbf{z}-\mathbf{s})^T R^{-1}(\mathbf{z}-\mathbf{s}) - \mathbf{z}^T R^{-1}\mathbf{z}\right\}\right]$$

$$\ln\Lambda(z) = \mathbf{s}^T R^{-1}\mathbf{z} - \tfrac{1}{2}\,\mathbf{s}^T R^{-1}\mathbf{s} \gtrless_{H_0}^{H_1} 0$$

So,

$$\mathbf{z}^T R^{-1}\mathbf{s} \gtrless_{H_0}^{H_1} \tfrac{1}{2}\,\mathbf{s}^T R^{-1}\mathbf{s}.$$
This, too, is a binary decision, now made from multiple observations.
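The vector rule is a correlation against the known signal compared with a fixed threshold; a small NumPy sketch (mine, not from the handout; `np.linalg.solve` stands in for forming R^{-1} explicitly):

```python
import numpy as np

def detect(z, s, R):
    """Return 1 (decide H1) if s^T R^{-1} z >= (1/2) s^T R^{-1} s, else 0."""
    Rinv_z = np.linalg.solve(R, z)   # R^{-1} z
    Rinv_s = np.linalg.solve(R, s)   # R^{-1} s
    return int(s @ Rinv_z >= 0.5 * (s @ Rinv_s))

s = np.array([1.0, 1.0, 1.0])        # known signal
R = np.eye(3)                        # white noise covariance, sigma^2 = 1
print(detect(np.array([0.9, 1.2, 0.1]), s, R))   # 1: s^T z = 2.2 >= 1.5
print(detect(np.array([-0.2, 0.3, 0.1]), s, R))  # 0: s^T z = 0.2 <  1.5
```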
Neyman-Pearson Criterion
Fix P{d_1 | H_0} at a preselected value α_0, and then maximize P{d_1 | H_1} (a constrained
maximization).
[Figure: a threshold on z separating Z_0 (d(z) = d_0) from Z_1 (d(z) = d_1), with the densities p(z|H_0) and p(z|H_1).]
area of p(z|H_0) beyond the threshold = P{d_1 | H_0} = P_F, the false alarm probability
area of p(z|H_1) beyond the threshold = P{d_1 | H_1} = P_D, the detection probability

We want P{d_1 | H_1} → 1 and P{d_1 | H_0} → 0. By moving the threshold, however, P{d_1 | H_1} and P{d_1 | H_0}
increase or decrease simultaneously. To find the threshold according to the Neyman-Pearson criterion, we maximize, with a Lagrange multiplier λ,

$$P\{d_1\mid H_1\} - \lambda\left[P\{d_1\mid H_0\} - \alpha_0\right] = \int_{Z_1} p(z\mid H_1)\,dz - \lambda\int_{Z_1} p(z\mid H_0)\,dz + \lambda\alpha_0 = \int_{Z_1}\left[p(z\mid H_1) - \lambda\, p(z\mid H_0)\right]dz + \lambda\alpha_0.$$

This can be maximized by selecting for Z_1 all z such that

$$p(z\mid H_1) - \lambda\, p(z\mid H_0) > 0 \quad\text{or}\quad \Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} > \lambda.$$

We must select λ such that the constraint

$$P\{d_1\mid H_0\} = \int_{Z_1} p(z\mid H_0)\,dz = \alpha_0$$

is satisfied.
Example 5

$$p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right), \qquad p(z\mid H_1) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(z-\theta)^2}{2}\right), \quad \theta > 0.$$

We require that P{d_1 | H_0} = 0.25.

$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = \frac{e^{-(z-\theta)^2/2}}{e^{-z^2/2}} = e^{\theta z - \theta^2/2} \gtrless_{H_0}^{H_1} \lambda$$

So,

$$\theta z - \frac{\theta^2}{2} \gtrless_{H_0}^{H_1} \ln\lambda \quad\text{or}\quad z \gtrless_{H_0}^{H_1} \frac{\ln\lambda}{\theta} + \frac{\theta}{2}.$$

$$P\{d_1\mid H_0\} = \int_{\frac{\ln\lambda}{\theta}+\frac{\theta}{2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = Q\left(\frac{\ln\lambda}{\theta} + \frac{\theta}{2}\right) = 0.25,$$

where Q is the Gaussian tail probability. So,

$$\frac{\ln\lambda}{\theta} + \frac{\theta}{2} = 0.674.$$

Notice that we did not have to know the value of θ to derive the Neyman-Pearson detection
rule.
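The number 0.674 is Q^{-1}(0.25); Python's standard library `statistics.NormalDist` can confirm it (a sketch I added, not part of the handout):

```python
from statistics import NormalDist

# Q(z) = 1 - Phi(z); the Neyman-Pearson constraint is Q(z0) = 0.25.
z0 = NormalDist().inv_cdf(1 - 0.25)   # Q^{-1}(0.25)
print(round(z0, 3))  # 0.674

# False alarm probability actually achieved at this threshold.
alpha0 = 1 - NormalDist().cdf(z0)
print(round(alpha0, 2))  # 0.25
```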
[Figure: decision regions for Example 5, Z_1 = {z ≥ 0.674} and Z_0 = {z < 0.674}; the shaded tail area is Q(z_0) with z_0 = 0.674; θ unknown.]

Example 6

Suppose, in Example 1, that it is known that only 0.5% of the Korean population has HIV. Would you strongly trust the test when
you are tested to be positive?
The probability that a person has HIV, given that his test result is positive:

$$P\{\text{HIV}\mid\text{positive}\} = \frac{P\{\text{HIV},\text{positive}\}}{P\{\text{positive}\}} = \frac{P\{\text{positive}\mid\text{HIV}\}\,P\{\text{HIV}\}}{P\{\text{positive}\mid\text{HIV}\}\,P\{\text{HIV}\} + P\{\text{positive}\mid\text{no HIV}\}\,P\{\text{no HIV}\}}$$

$$= \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} = 0.323$$
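The 0.323 can be reproduced directly from Bayes' rule (a sketch, not part of the handout):

```python
# Numbers from Examples 1 and 6.
p_pos_hiv = 0.95       # P{positive | HIV}
p_pos_no_hiv = 0.01    # P{positive | no HIV}
p_hiv = 0.005          # P{HIV} = 0.5% of the population

# Bayes' rule: P{HIV | positive}.
p_pos = p_pos_hiv * p_hiv + p_pos_no_hiv * (1 - p_hiv)
posterior = p_pos_hiv * p_hiv / p_pos
print(round(posterior, 3))  # 0.323
```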
So, we need a better decision criterion that can use a priori information such as the 0.5% in the above
argument. The idea is that if either H_0 or H_1 is highly unlikely to be true, the MLD is not a good criterion.
MAP (maximum a posteriori decision criterion)

Given an observation z, choose H_0 if H_0 is more likely than H_1:

$$\frac{P(H_1\mid z)}{P(H_0\mid z)} \gtrless_{H_0}^{H_1} 1$$

Since $P(H_i\mid z) = p(z\mid H_i)P\{H_i\}/p(z)$, this is equivalent to

$$\Lambda(z) \gtrless_{H_0}^{H_1} \frac{P\{H_0\}}{P\{H_1\}} \qquad (*)$$
For Example 1, we consider the ratio

$$\Lambda(z)\,\frac{P\{H_1\}}{P\{H_0\}} = \frac{p(z\mid H_1)\,P\{H_1\}}{p(z\mid H_0)\,P\{H_0\}} = \frac{P\{\text{HIV}\mid\text{positive}\}}{P\{\text{no HIV}\mid\text{positive}\}} = \frac{(0.95)(0.005)}{(0.01)(0.995)} = 0.477 < 1.$$
So the MAP criterion decides that he does not have HIV.
Another good thing about the MAP criterion is that the MAP minimizes the probability of error (of making
an incorrect decision).
Proof.

$$P_e = P\{d_1, H_0\} + P\{d_0, H_1\} = P\{d_1\mid H_0\}P\{H_0\} + P\{d_0\mid H_1\}P\{H_1\}$$

Since $P\{d_1\mid H_0\} = \int_{Z_1} p(z\mid H_0)\,dz$ and $P\{d_0\mid H_1\} = 1 - \int_{Z_1} p(z\mid H_1)\,dz$,

$$P_e = P\{H_1\} + \int_{Z_1}\left[p(z\mid H_0)P\{H_0\} - p(z\mid H_1)P\{H_1\}\right]dz.$$

To minimize P_e, put into Z_1 every z for which $p(z\mid H_0)P\{H_0\} - p(z\mid H_1)P\{H_1\}$ is negative:

$$Z_1 = \left\{\, z : p(z\mid H_0)P\{H_0\} - p(z\mid H_1)P\{H_1\} < 0 \,\right\},$$

which is the same as (*).
Example 7

$$p(z\mid H_0) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right), \qquad p(z\mid H_1) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{(z-1)^2}{2}\right)$$

$$P\{H_0\} = 0.25, \qquad P\{H_1\} = 0.75$$

By (*),

$$\Lambda(z) = \exp\left(\frac{2z-1}{2}\right) \gtrless_{H_0}^{H_1} \frac{P\{H_0\}}{P\{H_1\}} = \frac{1}{3}$$

i.e.,

$$z \gtrless_{H_0}^{H_1} \ln\frac{1}{3} + \frac{1}{2} \approx -0.6.$$
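A numerical check of the Example 7 threshold (my sketch, not from the handout):

```python
import math

# MAP rule of Example 7: z >< ln(1/3) + 1/2, with P{H0} = 0.25, P{H1} = 0.75.
threshold = math.log(1 / 3) + 0.5
print(round(threshold, 2))  # -0.6

# Even z = 0 is decided as H1: the prior P{H1} = 0.75 pulls the
# threshold well below the ML threshold of 1/2.
print(0.0 > threshold)  # True
```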
Example 8 (single observation, multiple decision)

A closed region has N animals.
(i) Catch r animals, mark them, and release them.
(ii) After they are dispersed, catch n animals, and count the number i of the marked animals.
Let x denote the number of the marked animals in the second catch.
(The population size N is the unknown that we want to estimate (decide): a multiple decision from a single observation.)

[Figure for Example 7: threshold at z ≈ −0.6, with Z_1 to the right and Z_0 to the left.]
$$P_i(N) = P\{x = i\} = \frac{\binom{r}{i}\binom{N-r}{n-i}}{\binom{N}{n}}$$

(a function of the unknown parameter N; i is the observation)

Suppose that r = 50, n = 40, i = 4.
The MLD chooses the value of N that maximizes P_i(N), the probability of the observed event
(i = 4) when there are actually N animals.
clear all
r = 50;
n = 40;
i = 4;
for N = 50:1000
    rCi = prod(r:-1:r-(i-1))/factorial(i);
    NmrCnmi = prod(N-r:-1:N-r-(n-i-1))/factorial(n-i);
    NCn = prod(N:-1:N-(n-1))/factorial(n);
    Pi(N) = rCi*NmrCnmi/NCn;
end
plot(Pi);
[Figure: plot of P_i(N) versus N for r = 50, n = 40, i = 4; the curve peaks near N = 500.]
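The same maximization can be done in Python with exact arithmetic via `math.comb` and `fractions` (my sketch, not part of the handout). It also shows that with these numbers the likelihood is maximized by two values of N, since rn/i = 500 is an integer:

```python
from fractions import Fraction
from math import comb

r, n, i = 50, 40, 4   # marked, recaptured, marked among the recaptured

def P_i(N):
    """Exact hypergeometric probability of catching i marked animals."""
    if N < r + (n - i):
        return Fraction(0)
    return Fraction(comb(r, i) * comb(N - r, n - i), comb(N, n))

best = max(P_i(N) for N in range(50, 1001))
N_hats = [N for N in range(50, 1001) if P_i(N) == best]
print(N_hats)  # [499, 500]: the classical estimate rn/i = 500 ties with 499
```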
Decision vs. Estimation

In the decision problem, the number of hypotheses is finite or countably infinite. (So the hypotheses
form a discrete space, as in Example 8.) In the estimation problem, the number of hypotheses is
uncountably infinite.

The same physical problem may be formulated as either a decision problem or an estimation problem.
Example 8, where we used the decision problem setting, may be formulated as an estimation problem,
which would give a solution such as N = 501.42.

e.g. Locating a target on an image plane with pixel coordinates (i, j):
decision setting: (5, 6)
estimation setting: (4.98, 6.12)  (subpixel resolution of the target position)

(Terminology: estimator ↔ decision rule; estimate ↔ decision.)

Something more general than the MLD, Neyman-Pearson, or MAP criteria? The Bayes risk criterion.
Assign a cost to each of the four possible situations and minimize the total average cost.

c_00 = cost of deciding d_0 when H_0 is true (cost for a correct decision)
c_10 = cost of deciding d_1 when H_0 is true (cost for an incorrect decision)
c_01 = cost of deciding d_0 when H_1 is true (cost for an incorrect decision)
c_11 = cost of deciding d_1 when H_1 is true (cost for a correct decision)

Total average cost

$$B = E\{c_{ij}\} = c_{00}P\{d_0, H_0\} + c_{10}P\{d_1, H_0\} + c_{01}P\{d_0, H_1\} + c_{11}P\{d_1, H_1\}$$

$$= \underbrace{\left[c_{00}P\{d_0\mid H_0\} + c_{10}P\{d_1\mid H_0\}\right]}_{b_0}P\{H_0\} + \underbrace{\left[c_{01}P\{d_0\mid H_1\} + c_{11}P\{d_1\mid H_1\}\right]}_{b_1}P\{H_1\}$$

(b_i is the conditional cost, i.e., the average cost assuming that H_i is true.)

Using $P\{d_0\mid H_i\} = 1 - P\{d_1\mid H_i\} = 1 - \int_{Z_1} p(z\mid H_i)\,dz$,

$$B = c_{00}P\{H_0\} + c_{01}P\{H_1\} + \int_{Z_1}\left[(c_{10}-c_{00})P\{H_0\}\,p(z\mid H_0) - (c_{01}-c_{11})P\{H_1\}\,p(z\mid H_1)\right]dz.$$
To minimize B, put into Z_1 every z for which $(c_{10}-c_{00})P\{H_0\}\,p(z\mid H_0) - (c_{01}-c_{11})P\{H_1\}\,p(z\mid H_1)$ is negative.
Therefore, the Bayes decision rule is:

$$\Lambda(z) \gtrless_{H_0}^{H_1} \frac{(c_{10}-c_{00})\,P\{H_0\}}{(c_{01}-c_{11})\,P\{H_1\}} \qquad (\text{when } c_{01} > c_{11})$$
Example 9

$$p(z\mid H_0) = \tfrac{1}{2}\,e^{-|z|}, \qquad p(z\mid H_1) = e^{-2|z|}$$

$$c_{00} = c_{11} = 0, \quad c_{01} = 1 \ (\text{cost for a miss}), \quad c_{10} = 2 \ (\text{cost for a false alarm})$$

$$P\{H_1\} = 0.75$$

$$\Lambda(z) = \frac{p(z\mid H_1)}{p(z\mid H_0)} = 2e^{-|z|} \gtrless_{H_0}^{H_1} \frac{(2-0)(0.25)}{(1-0)(0.75)} = \frac{2}{3}$$

i.e.,

$$-|z| \gtrless_{H_0}^{H_1} \ln\frac{1}{3}, \quad\text{so decide } H_1 \text{ when } |z| < \ln 3.$$

Now change the problem to

$$p(z\mid H_0) = e^{-z}, \ z > 0, \qquad p(z\mid H_1) = 2e^{-2z}, \ z > 0,$$

$$c_{00} = c_{11} = 0, \quad c_{01} = 2, \quad c_{10} = 1,$$

with the a priori probability $P_0 = P\{H_0\}$ unknown.

[Figure: p(z|H_1) = 2e^{−2z} and p(z|H_0) = e^{−z} on 0 ≤ z ≤ 2; the curves cross at z = ln 2, with z → H_1 to the left and z → H_0 to the right.]
The Bayes rule as a function of $P_0 = P\{H_0\}$:

$$\Lambda(z) = \frac{2e^{-2z}}{e^{-z}} = 2e^{-z} \gtrless_{H_0}^{H_1} \frac{(1-0)\,P_0}{(2-0)(1-P_0)} = \frac{P_0}{2(1-P_0)}$$

or

$$z \mathop{\lessgtr}_{H_0}^{H_1} \ln\frac{4(1-P_0)}{P_0}$$

(the inequality flips because $2e^{-z}$ is decreasing in z). The decision regions are a function of the unknown $P_0$.
With the threshold $z_0 = \ln\frac{4(1-P_0)}{P_0}$ (decide H_1 for $z < z_0$),

$$B = \left[0 + (1-0)\,P\{d_1\mid H_0\}\right]P_0 + \left[0 + (2-0)\,P\{d_0\mid H_1\}\right](1-P_0)$$

$$P\{d_1\mid H_0\} = \int_0^{z_0} e^{-z}\,dz = 1 - e^{-z_0} = 1 - \frac{P_0}{4(1-P_0)}, \qquad P\{d_0\mid H_1\} = \int_{z_0}^{\infty} 2e^{-2z}\,dz = e^{-2z_0} = \frac{P_0^2}{16(1-P_0)^2}$$

$$B = P_0\left(1 - \frac{P_0}{4(1-P_0)}\right) + 2(1-P_0)\,\frac{P_0^2}{16(1-P_0)^2} = \frac{8P_0 - 9P_0^2}{8(1-P_0)}$$

$$\frac{dB}{dP_0} = 0 \quad\Rightarrow\quad P_0 = \frac{2}{3}$$

So,

$$z_0' = \ln\frac{4\left(1-\tfrac{2}{3}\right)}{\tfrac{2}{3}} = \ln 2 \approx 0.69.$$
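A grid search over P_0 confirms the stationary point (a sketch using the closed form B(P_0) derived above):

```python
# B(P0) = (8 P0 - 9 P0^2) / (8 (1 - P0)) from the derivation above.
def B(p0):
    return (8 * p0 - 9 * p0**2) / (8 * (1 - p0))

grid = [k / 3000 for k in range(1, 3000)]   # open interval (0, 1)
p0_star = max(grid, key=B)
print(round(p0_star, 4))     # 0.6667, i.e., P0 = 2/3
print(round(B(p0_star), 4))  # 0.5: the min-max cost
```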
B = [];
for p0 = 0:0.01:0.99    % stop before p0 = 1 to avoid dividing by zero
    B = [B (8*p0 - 9*p0^2)/(8*(1-p0))];    % B(P0) derived above
end
plot(0:0.01:0.99, B);
Claim
Suppose that there exists Z_1 such that b_0(Z_1) = b_1(Z_1), and Z_1 is a Bayes decision region for some
P_0 (= P{H_0}). Then Z_1 is the min-max decision region.
Proof
Suppose that there exists Z_1 such that b_0(Z_1) = b_1(Z_1) but Z_1 is not the min-max decision region. Then
there exists Z_1' such that

$$\max_{P_0} B(P_0, Z_1') < \max_{P_0} B(P_0, Z_1).$$
$$B(P_0, Z_1) = b_0(Z_1)\,P_0 + b_1(Z_1)(1-P_0) \qquad (*) \quad (\text{p. 15})$$

[Figure: B(P_0, Z_1) versus P_0 is a line for each fixed Z_1; the lines for Z_1 = Z_1^(1), Z_1^(2), Z_1^(3) give the minimum cost when P_0 = P_0^(1), P_0^(2), P_0^(3) respectively (each Z_1^(k) is then the optimal Bayes decision region); the curve tangent to all the lines, which must be convex (why?), attains the maximum of the minimum cost.]

So, from (*),

$$\frac{dB(P_0, Z_1)}{dP_0} = b_0(Z_1) - b_1(Z_1) = 0, \quad\text{i.e.,}\quad b_0(Z_1) = b_1(Z_1).$$
Connection to game theory (G. Strang)

[Figure: player x and player y, each holding a $20 card and a $10 card, with the dealer between them.]

Player x and player y show one of their two cards simultaneously.
If player y matches the card of player x ($20 and $20, or $10 and $10), then player y gets $10 from player x.
If player y does not match the card of player x ($20 and $10, or $10 and $20), then player x gets $20 (if
player x showed $20) or $10 (if player x showed $10) from player y.

Some thought
Players x and y must make decisions which do not have a regular pattern, and each decision must be
independent of the previous decisions. Otherwise the opponent would try to take advantage of it.

x chooses $20 with probability P_{20,x} and $10 with probability 1 − P_{20,x}.
y chooses $20 with probability P_{20,y} and $10 with probability 1 − P_{20,y}.

We want to find the optimal P_{20,x} and P_{20,y} (the equilibrium point).

Suppose that x and y choose a card with equal probability, i.e., P_{20,x} = P_{20,y} = 1/2. Then the average cost of y is

$$\frac{1}{4}(20) + \frac{1}{4}(10) - \frac{1}{2}(10) = \$2.5.$$

Player y does not know what card player x would show.
(P_{20,x} plays the role of the unknown P_0 in the previous examples.) But player y wishes to minimize the average cost by choosing P_{20,y}.

Cost for y (the cost for y is the earning for x):

                x shows $10    x shows $20
y shows $10:       −10             20
y shows $20:        10            −10
y's strategy is to minimize the average cost:

$$P_{20,y}\,[10,\ -10] + (1 - P_{20,y})\,[-10,\ 20] = [\underbrace{20P_{20,y} - 10}_{a},\ \underbrace{20 - 30P_{20,y}}_{b}]$$

(the first entry is y's average cost when x shows $10, the second when x shows $20). Setting a = b:

$$20P_{20,y} - 10 = 20 - 30P_{20,y} \quad\Rightarrow\quad P_{20,y} = \frac{3}{5}.$$

So, y should show the $20 card with rate 3/5 and the $10 card with rate 2/5.

What is the cost for y with this strategy?

$$\left[20\cdot\tfrac{3}{5} - 10,\ 20 - 30\cdot\tfrac{3}{5}\right] = [2,\ 2]$$

average cost = $2, which is less than $2.5.
So y minimizes his maximum cost.
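The equalization argument can be checked directly (my sketch; the cost matrix is the one tabulated above):

```python
# Cost for y: rows are y's card ($10, $20), columns are x's card ($10, $20).
cost = [[-10, 20],
        [10, -10]]

def avg_cost(p20y):
    """y's average cost against each pure strategy of x."""
    p10y = 1 - p20y
    vs_x10 = p10y * cost[0][0] + p20y * cost[1][0]  # x shows $10
    vs_x20 = p10y * cost[0][1] + p20y * cost[1][1]  # x shows $20
    return vs_x10, vs_x20

a, b = avg_cost(3 / 5)   # y's equalizing strategy P_{20,y} = 3/5
print(a, b)              # both entries equal 2: y guarantees an average cost of $2

# With the naive 1/2-1/2 strategy, y's worst case is higher.
print(max(avg_cost(1 / 2)))  # 5.0
```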
(This is a zero-sum game.)
[Figure: y's average cost as a function of P_{20,x} and P_{20,y}; the optimal point is the equilibrium (saddle) point, which gives the optimal cost for y, and x tries to stay on this line. Another example of the same structure: an unknown noise covariance matrix playing against our parameter estimate.]