Maximum Likelihood


8/4/2019 Maximum Likelihood (1/28)

HO #2 EE 722
9/3/2002 J. Chun

MLD (Maximum Likelihood Detection)

Example 1. A blood test is 95% effective in detecting the HIV infection when it is, in fact, present. However, the test also yields a false positive result for 1% of the healthy persons tested. If a person is tested to be positive, would you decide that he has HIV?

$$P\{\text{positive} \mid \text{HIV}\} = 0.95, \qquad P\{\text{positive} \mid \text{no HIV}\} = 0.01$$

It is more likely that a person with HIV gives the positive result than a person with no HIV. So we would decide that the person has HIV.

The decision criterion used in Example 1 is called the maximum likelihood decision criterion. Let us generalize the idea in Example 1. There are two hypotheses $H_0$ and $H_1$ ($H_0$ = no HIV, $H_1$ = HIV in the above example). Each of the two messages generates a point $z$ in the observation space $Z$. It is desired to divide $Z$ into two decision regions $Z_0$ and $Z_1$ (the division gives a decision rule). We make decision $d_0$, that hypothesis $H_0$ is true, if $z \in Z_0$, and similarly for decision $d_1$.

In Example 1, $Z = \{\text{negative}, \text{positive}\}$.

[Figure: the binary, single-observation setting. The observation space $Z$ is partitioned into $Z_0$, where $d(z) = d_0$, and $Z_1$, where $d(z) = d_1$.]


MLD Criterion

Given an observation $z \in Z$, set $d(z) = d_0$ if it is more likely that $H_0$ generated $z$ than that $H_1$ generated $z$. Namely,

$$Z_0 = \{\, z \mid p(z \mid H_0) > p(z \mid H_1) \,\}, \qquad Z_1 = \{\, z \mid p(z \mid H_0) < p(z \mid H_1) \,\}.$$

Equivalently, in terms of the likelihood ratio,

$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1.$$

Example 2 (single observation, binary)

$$H_0: z = n, \qquad H_1: z = 1 + n, \qquad n \sim N(0,1)$$

so that

$$p(z \mid H_0) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad p(z \mid H_1) = \frac{1}{\sqrt{2\pi}}\, e^{-(z-1)^2/2}.$$

[Figure: the densities $p(z \mid H_0)$ and $p(z \mid H_1)$ cross at $z = 0.5$; $Z_0$ lies to the left of 0.5 and $Z_1$ to the right.]


$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{\frac{1}{\sqrt{2\pi}}\, e^{-(z-1)^2/2}}{\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}} = e^{\left(z^2 - (z-1)^2\right)/2} = e^{z - 1/2} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

$$\ln \Lambda(z) = z - \frac{1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} 0 \qquad \text{or} \qquad z \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}.$$

So, if $z = 0.6$, we choose $H_1$ because $0.6 > \frac{1}{2}$.
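The rule of Example 2 can be sanity-checked numerically. A minimal Python sketch (the function names are mine, not from the handout):

```python
import math

def likelihood_ratio(z):
    # Lambda(z) = p(z|H1)/p(z|H0) for unit-variance Gaussians with means 1 and 0
    p1 = math.exp(-(z - 1.0) ** 2 / 2.0)
    p0 = math.exp(-z ** 2 / 2.0)
    return p1 / p0          # the 1/sqrt(2*pi) factors cancel

def mld(z):
    # ML decision: H1 iff Lambda(z) > 1, equivalently z > 1/2
    return 1 if likelihood_ratio(z) > 1.0 else 0

print(mld(0.6))   # z = 0.6 > 1/2, so decide H1 (prints 1)
```

At the crossing point $z = 0.5$ the likelihood ratio is exactly 1, matching the threshold derived above.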

Example 3

$$H_0: \; p(z \mid H_0) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad (z \sim N(0,1))$$

$$H_1: \; p(z \mid H_1) = \frac{1}{2\sqrt{2\pi}}\, e^{-z^2/8} \qquad (z \sim N(0,2^2))$$

[Figure: $p(z \mid H_0)$ is narrower and taller than $p(z \mid H_1)$; the decision regions are $Z_1$ (left tail), $Z_0$ (middle), $Z_1$ (right tail).]


$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{\frac{1}{2\sqrt{2\pi}}\, e^{-z^2/8}}{\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}} = \frac{1}{2}\, e^{3z^2/8} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

$$\ln \Lambda(z) = -\ln 2 + \frac{3}{8} z^2 \underset{H_0}{\overset{H_1}{\gtrless}} 0$$

$$z^2 \underset{H_0}{\overset{H_1}{\gtrless}} \frac{8}{3} \ln 2 \qquad \text{or} \qquad |z| \underset{H_0}{\overset{H_1}{\gtrless}} \sqrt{\frac{8}{3} \ln 2}.$$
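The threshold can be checked by confirming that the two densities are equal exactly at $|z| = \sqrt{\frac{8}{3}\ln 2}$; a short Python sketch (my own check, not from the handout):

```python
import math

# Threshold from Example 3: |z| compared against sqrt((8/3) ln 2)
t = math.sqrt(8.0 / 3.0 * math.log(2.0))

def p0(z):  # N(0,1) density
    return math.exp(-z ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def p1(z):  # N(0,4) density
    return math.exp(-z ** 2 / 8.0) / (2.0 * math.sqrt(2.0 * math.pi))

print(round(t, 2))                  # 1.36
print(abs(p0(t) - p1(t)) < 1e-12)   # True: the densities cross at the threshold
```

For $|z|$ above this value the wider $N(0,4)$ density dominates, which is why $Z_1$ consists of the two tails.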

Example 4 (multiple observation)

$$H_0: \mathbf{z} = \mathbf{n}, \qquad H_1: \mathbf{z} = \mathbf{s} + \mathbf{n}$$

where $\mathbf{n}$ is a $p$-element Gaussian vector with

$$p(\mathbf{n}) = \frac{1}{(2\pi)^{p/2} \det(R)^{1/2}} \exp\!\left[-\frac{1}{2}\, \mathbf{n}^T R^{-1} \mathbf{n}\right].$$

Aside:

(i) $R = \sigma^2 I \;\Rightarrow\; R^{-1} = \frac{1}{\sigma^2} I$ and


$$\det(\sigma^2 I) = (\sigma^2)^p,$$

so

$$p(\mathbf{n}) = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{n_i^2}{2\sigma^2}\right].$$

(ii) $p = 1$:

$$p(n) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{n^2}{2\sigma^2}\right].$$

Back to Example 4:

$$\Lambda(\mathbf{z}) = \frac{p(\mathbf{z} \mid H_1)}{p(\mathbf{z} \mid H_0)} = \exp\!\left\{-\frac{1}{2}\left[(\mathbf{z}-\mathbf{s})^T R^{-1} (\mathbf{z}-\mathbf{s}) - \mathbf{z}^T R^{-1} \mathbf{z}\right]\right\}$$

$$\ln \Lambda(\mathbf{z}) = \mathbf{s}^T R^{-1} \mathbf{z} - \frac{1}{2}\, \mathbf{s}^T R^{-1} \mathbf{s} \underset{H_0}{\overset{H_1}{\gtrless}} 0.$$

So,

$$\mathbf{s}^T R^{-1} \mathbf{z} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}\, \mathbf{s}^T R^{-1} \mathbf{s}.$$

This, too, is a binary decision, but now based on multiple observations.
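For the white-noise case $R = \sigma^2 I$ of the aside, the rule reduces to the correlator $\mathbf{s}^T \mathbf{z} \gtrless \frac{1}{2}\, \mathbf{s}^T \mathbf{s}$. A small Python sketch, where the values of $\mathbf{s}$ and $\sigma$ are made-up illustrations, not from the handout:

```python
import math, random

random.seed(0)

# With R = sigma^2 I, the rule s^T R^{-1} z >< (1/2) s^T R^{-1} s
# reduces to the correlator: s^T z >< (1/2) s^T s.
s = [1.0, 2.0, -1.0]   # illustration signal (assumed, not from the handout)
sigma = 1.0
threshold = 0.5 * sum(si * si for si in s)

def decide(z):
    # H1 iff the correlation s^T z exceeds half the signal energy
    return 1 if sum(si * zi for si, zi in zip(s, z)) > threshold else 0

# Monte Carlo check: data generated under H1 (signal present) should
# usually be detected
trials = 2000
hits = sum(decide([si + random.gauss(0, sigma) for si in s])
           for _ in range(trials))
print(hits / trials)   # detection rate well above 1/2
```

Under $H_1$ the correlator statistic is Gaussian with mean $\mathbf{s}^T\mathbf{s}$, so the threshold at half that mean yields a detection probability well above one half for this signal-to-noise ratio.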

Neyman-Pearson Criterion

Fix $P\{d_1 \mid H_0\}$ at a preselected value $\alpha_0$, and then maximize $P\{d_1 \mid H_1\}$: a constrained maximization.

[Figure: the densities $p(z \mid H_0)$ and $p(z \mid H_1)$ with a threshold separating $Z_0$, where $d(z) = d_0$, from $Z_1$, where $d(z) = d_1$.]


The area of $p(z \mid H_0)$ over $Z_1$ is $P\{d_1 \mid H_0\} \equiv P_F$, the false alarm probability; the area of $p(z \mid H_1)$ over $Z_1$ is $P\{d_1 \mid H_1\} \equiv P_D$, the detection probability.

We want $P\{d_1 \mid H_1\} \to 1$ and $P\{d_1 \mid H_0\} \to 0$. By moving the threshold, however, $P\{d_1 \mid H_1\}$ and $P\{d_1 \mid H_0\}$ decrease or increase simultaneously.

To find the threshold according to the Neyman-Pearson criterion, we maximize, with a Lagrange multiplier $\lambda$,

$$P\{d_1 \mid H_1\} - \lambda\left[P\{d_1 \mid H_0\} - \alpha_0\right] = \int_{Z_1} p(z \mid H_1)\, dz - \lambda \int_{Z_1} p(z \mid H_0)\, dz + \lambda \alpha_0 = \int_{Z_1} \left[p(z \mid H_1) - \lambda\, p(z \mid H_0)\right] dz + \lambda \alpha_0.$$

This can be maximized by selecting for $Z_1$ all $z$ such that

$$p(z \mid H_1) - \lambda\, p(z \mid H_0) > 0 \qquad \text{or} \qquad \frac{p(z \mid H_1)}{p(z \mid H_0)} > \lambda.$$

Namely,

$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda.$$

We must select $\lambda$ such that the constraint

$$P\{d_1 \mid H_0\} = \int_{Z_1} p(z \mid H_0)\, dz = \alpha_0$$

is satisfied.


Example 5

$$p(z \mid H_0) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{z^2}{2}\right], \qquad p(z \mid H_1) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{(z-\theta)^2}{2}\right], \quad \theta > 0.$$

We require that $P\{d_1 \mid H_0\} = 0.25$.

$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{e^{-(z-\theta)^2/2}}{e^{-z^2/2}} = e^{\theta z - \theta^2/2} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$$

So,

$$\theta z - \frac{\theta^2}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \ln \lambda \qquad \text{or} \qquad z \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\theta}{2} + \frac{\ln \lambda}{\theta} \equiv \gamma_0.$$

The constraint gives

$$P\{d_1 \mid H_0\} = \int_{\gamma_0}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = Q(\gamma_0) = 0.25.$$

So,

$$\gamma_0 = \frac{\theta}{2} + \frac{\ln \lambda}{\theta} \approx 0.674.$$

Notice that we did not have to know the value $\theta$ to derive the Neyman-Pearson detection rule.
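The threshold can be checked with the standard normal quantile function; a quick Python check using the stdlib `statistics.NormalDist` (the check itself is mine, not from the handout):

```python
from statistics import NormalDist

# Neyman-Pearson threshold for P{d1|H0} = 0.25 under H0: z ~ N(0,1).
# gamma0 solves Q(gamma0) = 0.25, i.e. gamma0 = Phi^{-1}(0.75).
gamma0 = NormalDist().inv_cdf(0.75)
print(round(gamma0, 3))                 # 0.674

# The rule "decide H1 iff z > gamma0" meets the constraint exactly:
print(round(1 - NormalDist().cdf(gamma0), 3))   # 0.25, the false alarm probability
```

As the handout notes, neither line depends on $\theta$: the rule $z \gtrless 0.674$ is the same for every positive mean shift.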

[Figure: $p(z \mid H_0)$ with the threshold $\gamma_0 = 0.674$ separating $Z_0$ (left) from $Z_1$ (right); the shaded tail area $Q(\gamma_0)$ equals 0.25, and $\theta$ is unknown.]

Example 6

Consider Example 1 again, and suppose

that it is known that only 0.5% of the Korean population has HIV. Would you strongly trust the test when you are tested to be positive?

The probability that a person has HIV, given that his test result is positive, is

$$P\{\text{HIV} \mid \text{positive}\} = \frac{P\{\text{HIV}, \text{positive}\}}{P\{\text{positive}\}} = \frac{P\{\text{positive} \mid \text{HIV}\}\, P\{\text{HIV}\}}{P\{\text{positive} \mid \text{HIV}\}\, P\{\text{HIV}\} + P\{\text{positive} \mid \text{no HIV}\}\, P\{\text{no HIV}\}}$$

$$= \frac{(0.95)(0.005)}{(0.95)(0.005) + (0.01)(0.995)} = 0.323.$$
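The arithmetic of this Bayes computation, coded directly from the handout's numbers:

```python
# Bayes' rule for Example 6 (numbers from the handout)
p_pos_given_hiv, p_pos_given_nohiv = 0.95, 0.01
p_hiv = 0.005

p_pos = p_pos_given_hiv * p_hiv + p_pos_given_nohiv * (1 - p_hiv)
posterior = p_pos_given_hiv * p_hiv / p_pos
print(round(posterior, 3))   # 0.323
```

Even with a positive result, the posterior probability of HIV is only about one third, because the disease is rare.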

So, we need a better decision criterion that can use a priori information, such as the 0.5% in the above argument. The idea is that if either $H_0$ or $H_1$ is highly unlikely to be true, the MLD is not a good criterion.

MAP (maximum a posteriori decision criterion)

Given an observation $z$, choose $H_0$ if $H_0$ is more likely than $H_1$:

$$\frac{P\{H_1 \mid z\}}{P\{H_0 \mid z\}} \underset{H_0}{\overset{H_1}{\gtrless}} 1.$$

So,

$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P\{H_0\}}{P\{H_1\}}. \qquad (*)$$

For Example 1, we consider the ratio

$$\frac{P\{H_1 \mid z\}}{P\{H_0 \mid z\}} = \Lambda(z)\, \frac{P\{H_1\}}{P\{H_0\}} = \frac{p(z \mid H_1)\, P\{H_1\}}{p(z \mid H_0)\, P\{H_0\}}$$

$$\frac{P\{\text{HIV} \mid \text{positive}\}}{P\{\text{no HIV} \mid \text{positive}\}} = \frac{P\{\text{positive} \mid \text{HIV}\}\, P\{\text{HIV}\}}{P\{\text{positive} \mid \text{no HIV}\}\, P\{\text{no HIV}\}} = \frac{(0.95)(0.005)}{(0.01)(0.995)} = 0.477 < 1.$$


So the MAP criterion decides that he does not have HIV.

Another good thing about the MAP criterion is that the MAP minimizes the probability of error (of making an incorrect decision).

Proof.

$$P_e = P\{d_1, H_0\} + P\{d_0, H_1\} = P\{d_1 \mid H_0\}\, P\{H_0\} + P\{d_0 \mid H_1\}\, P\{H_1\}$$

$$= \int_{Z_1} p(z \mid H_0)\, P\{H_0\}\, dz + \int_{Z_0} p(z \mid H_1)\, P\{H_1\}\, dz.$$

Since

$$\int_{Z_0} p(z \mid H_1)\, dz = 1 - \int_{Z_1} p(z \mid H_1)\, dz,$$

$$P_e = P\{H_1\} + \int_{Z_1} \left[ p(z \mid H_0)\, P\{H_0\} - p(z \mid H_1)\, P\{H_1\} \right] dz.$$

To minimize $P_e$, put every $z$ for which $p(z \mid H_0)\, P\{H_0\} - p(z \mid H_1)\, P\{H_1\}$ is negative into $Z_1$:

$$Z_1 = \left\{\, z \mid p(z \mid H_0)\, P\{H_0\} - p(z \mid H_1)\, P\{H_1\} < 0 \,\right\}, \qquad \text{i.e.} \qquad \frac{p(z \mid H_1)}{p(z \mid H_0)} > \frac{P\{H_0\}}{P\{H_1\}},$$

which is the same as (*).

Example 7

$$p(z \mid H_0) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{z^2}{2}\right], \qquad p(z \mid H_1) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{(z-1)^2}{2}\right]$$


$$P\{H_0\} = 0.25, \qquad P\{H_1\} = 0.75$$

$$\Lambda(z) = \exp\!\left[z - \frac{1}{2}\right] \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P\{H_0\}}{P\{H_1\}} = \frac{1}{3}$$

i.e.

$$z \underset{H_0}{\overset{H_1}{\gtrless}} \ln\frac{1}{3} + \frac{1}{2} \approx -0.6.$$
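A small Monte Carlo sketch (my own illustration, not in the handout) of the claim that the MAP rule makes fewer errors than plain ML in Example 7:

```python
import math, random

random.seed(1)

# Example 7: H0: z ~ N(0,1) with prior 0.25;  H1: z ~ N(1,1) with prior 0.75.
# MAP threshold: z > ln(1/3) + 1/2 (about -0.6);  ML threshold: z > 1/2.
t_map = math.log(1.0 / 3.0) + 0.5

def error_rate(threshold, n=20000):
    errors = 0
    for _ in range(n):
        h = 1 if random.random() < 0.75 else 0
        z = random.gauss(h, 1.0)        # mean 0 under H0, mean 1 under H1
        d = 1 if z > threshold else 0
        errors += (d != h)
    return errors / n

err_map, err_ml = error_rate(t_map), error_rate(0.5)
print(err_map < err_ml)   # True: MAP makes fewer errors than ML here
```

The estimated error rates come out near the theoretical values (about 0.22 for MAP versus about 0.31 for ML with these priors).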

Example 8 (single observation, multiple decision)

A closed region has $N$ animals.

(i) Catch $r$ animals, mark them, and release them.

(ii) After they are dispersed, catch $n$ animals, and count the number $i$ of the marked animals.

Let $x$ denote the number of the marked animals. The unknown $N$ is what we want to estimate (decide): a multiple decision based on a single observation.


The probability of the observed count is a function of the unknown parameter $N$:

$$P_i(N) = P\{x = i\} = \frac{\binom{r}{i} \binom{N-r}{n-i}}{\binom{N}{n}}.$$

Suppose that $r = 50$, $n = 40$, $i = 4$.

The MLD chooses the value $N$ that maximizes $P_i(N)$, the probability of the observed event ($i = 4$) when there are actually $N$ animals.

clear all
r = 50;
n = 40;
i = 4;    % i: the observation
for N = 50:1000
    rCi = prod(r:-1:r-(i-1))/factorial(i);
    NmrCnmi = prod(N-r:-1:N-r-(n-i-1))/factorial(n-i);
    NCn = prod(N:-1:N-(n-1))/factorial(n);
    Pi(N) = rCi*NmrCnmi/NCn;
end
plot(Pi);
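The same search can be sketched in Python (my translation of the MATLAB above, using the stdlib `math.comb`):

```python
from math import comb

r, n, i = 50, 40, 4

# P_i(N) is nonzero only for N >= r + (n - i); search up to 1000 as above
P = {N: comb(r, i) * comb(N - r, n - i) / comb(N, n)
     for N in range(r + n - i, 1001)}
N_ml = max(P, key=P.get)
print(N_ml)   # the ML estimate, near r*n/i = 500
```

The likelihood increases while $N \le rn/i$ and decreases afterwards, so the maximizer sits at (or one below) $rn/i = 500$, which is where the plotted curve peaks.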


Decision vs. Estimation

In the decision problem, the number of hypotheses is finite or countably infinite (so the hypotheses form a discrete space), as in Example 8. In the estimation problem, the number of hypotheses is uncountably infinite.

The same physical problem may be formulated as either a decision problem or an estimation problem. Example 8, where we used the decision-problem setting, may be formulated as an estimation problem, which would give a solution such as N = 501.42.

e.g. Locating a target on an image plane with pixel coordinates $(i, j)$:

decision setting: $(5, 6)$, the target position to pixel resolution;
estimation setting: $(4.98, 6.12)$, subpixel resolution.

(estimator vs. decision rule; estimate vs. decision)

Is there something more general than the MLD, Neyman-Pearson, or MAP criteria? The Bayes risk criterion.


Assign a cost to each of the four possible situations and minimize the total average cost:

$c_{00}$ = cost of deciding $d_0$ when $H_0$ is true.
$c_{10}$ = cost of deciding $d_1$ when $H_0$ is true.
$c_{01}$ = cost of deciding $d_0$ when $H_1$ is true.
$c_{11}$ = cost of deciding $d_1$ when $H_1$ is true.

The total average cost is

$$B = E\{c_{ij}\} = c_{00} P\{d_0, H_0\} + c_{10} P\{d_1, H_0\} + c_{01} P\{d_0, H_1\} + c_{11} P\{d_1, H_1\}$$

$$= \underbrace{\left[ c_{00} P\{d_0 \mid H_0\} + c_{10} P\{d_1 \mid H_0\} \right]}_{b_0} P\{H_0\} + \underbrace{\left[ c_{01} P\{d_0 \mid H_1\} + c_{11} P\{d_1 \mid H_1\} \right]}_{b_1} P\{H_1\},$$

where $b_0$ and $b_1$ are the conditional costs, i.e. the average cost assuming that $H_0$ (respectively $H_1$) is true. Using $P\{d_0 \mid H_i\} = 1 - P\{d_1 \mid H_i\}$,

$$B = c_{00} P\{H_0\} + c_{01} P\{H_1\} + \int_{Z_1} \left[ (c_{10} - c_{00})\, P\{H_0\}\, p(z \mid H_0) - (c_{01} - c_{11})\, P\{H_1\}\, p(z \mid H_1) \right] dz.$$


To minimize $B$, put into $Z_1$ every $z$ for which the bracketed integrand $(c_{10} - c_{00})\, P\{H_0\}\, p(z \mid H_0) - (c_{01} - c_{11})\, P\{H_1\}\, p(z \mid H_1)$ is negative. Therefore, the Bayes decision rule is

$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(c_{10} - c_{00})\, P\{H_0\}}{(c_{01} - c_{11})\, P\{H_1\}} \qquad (\text{when } c_{01} > c_{11}).$$

Example 9

$$p(z \mid H_0) = \frac{1}{2}\, e^{-|z|}, \qquad p(z \mid H_1) = e^{-2|z|}$$

$$c_{00} = c_{11} = 0, \qquad c_{01} = 1 \ (\text{cost for a miss}), \qquad c_{10} = 2 \ (\text{cost for a false alarm}), \qquad P\{H_1\} = 0.75.$$

$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = 2\, e^{-|z|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(2-0)(0.25)}{(1-0)(0.75)} = \frac{2}{3}$$


i.e.

$$-|z| \underset{H_0}{\overset{H_1}{\gtrless}} \ln\frac{1}{3} \qquad \text{or} \qquad |z| \underset{H_1}{\overset{H_0}{\gtrless}} \ln 3,$$

so $H_1$ is decided for $|z| < \ln 3$.

Another example, now with the a priori probability $P_0 = P\{H_0\}$ unknown:

$$p(z \mid H_0) = e^{-z}, \quad z > 0, \qquad p(z \mid H_1) = 2\, e^{-2z}, \quad z > 0$$

$$c_{00} = c_{11} = 0, \qquad c_{01} = 2, \qquad c_{10} = 1.$$

[Figure: $p(z \mid H_0)$ and $p(z \mid H_1)$ plotted on $0 \le z \le 2$; the curves cross at $z = \ln 2$, with $H_1$ decided for small $z$ and $H_0$ for large $z$.]

$$\Lambda(z) = \frac{p(z \mid H_1)}{p(z \mid H_0)} = \frac{2\, e^{-2z}}{e^{-z}} = 2\, e^{-z} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(c_{10} - c_{00})\, P_0}{(c_{01} - c_{11})(1 - P_0)} = \frac{P_0}{2(1 - P_0)}$$

or

$$z \underset{H_1}{\overset{H_0}{\gtrless}} \ln\frac{4(1 - P_0)}{P_0} \equiv \gamma_0,$$

a function of the unknown $P_0$.


Evaluating the Bayes cost at the threshold $\gamma_0 = \ln\frac{4(1-P_0)}{P_0}$ (decide $H_1$ for $z < \gamma_0$):

$$P\{d_1 \mid H_0\} = \int_0^{\gamma_0} e^{-z}\, dz = 1 - e^{-\gamma_0}, \qquad P\{d_0 \mid H_1\} = \int_{\gamma_0}^{\infty} 2\, e^{-2z}\, dz = e^{-2\gamma_0},$$

$$B(P_0) = \left[0 + (1-0)\, P\{d_1 \mid H_0\}\right] P_0 + \left[(2-0)\, P\{d_0 \mid H_1\} + 0\right](1 - P_0)$$

$$= \left(1 - e^{-\gamma_0}\right) P_0 + 2\, e^{-2\gamma_0}\, (1 - P_0) = P_0\left(1 - \frac{P_0}{4(1 - P_0)}\right) + \frac{P_0^2}{8(1 - P_0)} = \frac{(8 - 9 P_0)\, P_0}{8(1 - P_0)}.$$

Setting

$$\frac{dB}{dP_0} = 0 \quad \Rightarrow \quad 9 P_0^2 - 18 P_0 + 8 = 0 \quad \Rightarrow \quad P_0 = \frac{2}{3}$$

(the other root, $4/3$, lies outside $[0,1]$). So,

$$\gamma_0' = \ln\frac{4\left(1 - \frac{2}{3}\right)}{\frac{2}{3}} = \ln 2 \approx 0.69.$$
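A quick numerical confirmation of the maximizing prior (my own check, coded from the formula for $B(P_0)$ above):

```python
import math

def B(p0):
    # Bayes cost at the matched threshold, from the formula above
    return (8 - 9 * p0) * p0 / (8 * (1 - p0))

# B should peak at P0 = 2/3, where the min-max threshold is ln 2
grid = [k / 1000 for k in range(1, 1000)]
p_star = max(grid, key=B)
print(p_star)                  # 0.667, i.e. about 2/3
print(round(math.log(2), 2))   # 0.69, the min-max threshold
```

The peak cost is $B(2/3) = 1/2$; away from $P_0 = 2/3$ the matched Bayes cost is strictly smaller.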


The cost $B(P_0)$ can be plotted in MATLAB (the loop body below is reconstructed from the formula for $B(P_0)$ above; the original listing is truncated):

B = [];
for p0 = 0:0.01:0.99        % stop short of p0 = 1 to avoid dividing by zero
    B = [B, (8-9*p0)*p0/(8*(1-p0))];
end
plot(0:0.01:0.99, B);


Claim

Suppose that there exists $Z_1^*$ such that $b_0(Z_1^*) = b_1(Z_1^*)$ and $Z_1^*$ is a Bayes decision region for some $P_0$ ($= P\{H_0\}$). Then $Z_1^*$ is the min-max decision region.

Proof

Suppose that there exists $Z_1^*$ such that $b_0(Z_1^*) = b_1(Z_1^*)$ but $Z_1^*$ is not the min-max decision region. Then there exists $Z_1'$ such that

$$\max_{P_0} B(P_0, Z_1') < \max_{P_0} B(P_0, Z_1^*).$$


$$B(P_0, Z_1) = b_0(Z_1)\, P_0 + b_1(Z_1)(1 - P_0) \qquad (*) \quad \text{(p. 15)}$$

[Figure: $B(P_0, Z_1)$ versus $P_0 \in [0, 1]$ for fixed regions $Z_1 = Z_1^{(1)}, Z_1^{(2)}, Z_1^{(3)}, \ldots$: for each fixed $Z_1$, $B$ is a straight line in $P_0$. The minimum Bayes cost is the lower envelope, tangent to all the lines (it must be concave; why?). The minimum cost is attained at $P_0^{(1)}$ when $Z_1^{(1)}$ is the optimal Bayes decision region, at $P_0^{(2)}$ when $Z_1^{(2)}$ is, and so on; the maximum of the minimum cost occurs where the tangent line is horizontal.]

So, from (*),

$$\frac{dB(P_0, Z_1^*)}{dP_0} = b_0(Z_1^*) - b_1(Z_1^*) = 0, \qquad \text{i.e.} \quad b_0(Z_1^*) = b_1(Z_1^*).$$


Connection to game theory (G. Strang)

[Figure: player x and player y each hold two cards, a $20 card and a $10 card.]

Player x and player y show one of their two cards simultaneously.

If player y matches the card of player x ($20 against $20, or $10 against $10), then player y gets $10 from player x.

If player y does not match the card of player x ($20 against $10, or $10 against $20), then player x gets $20 (if player x showed $20) or $10 (if player x showed $10) from player y.

Some thought: players x and y must make decisions which do not have a regular pattern, and each decision must be independent of the previous decision. Otherwise the opponent would try to take advantage of it.

x chooses $20 with probability $P_{20,x}$, and $10 with probability $1 - P_{20,x}$.
y chooses $20 with probability $P_{20,y}$, and $10 with probability $1 - P_{20,y}$.

We want to find the optimal $P_{20,x}$ and $P_{20,y}$: the equilibrium point.

Suppose that x and y choose a card with equal probability, i.e. $P_{20,x} = P_{20,y} = \frac{1}{2}$. Then the average cost of y (player y does not know what card player x would show) is

$$\frac{1}{4}(-10) + \frac{1}{4}(20) + \frac{1}{4}(10) + \frac{1}{4}(-10) = \$2.5.$$


(The unknown $P_{20,x}$ plays the role of $P_0$ in the previous examples.) Player y wishes to minimize the average cost by choosing $P_{20,y}$.

The cost for y (which equals the earning for x in this zero-sum game):

                    x chooses $10    x chooses $20
y chooses $10           -10              +20
y chooses $20           +10              -10

y's strategy is to minimize the average cost. Against x's two pure choices, y's expected costs are

$$[a, b] = P_{20,y}\, [10, -10] + (1 - P_{20,y})\, [-10, 20] = \left[-10 + 20 P_{20,y},\; 20 - 30 P_{20,y}\right],$$

where $a$ is the cost when x shows $10 and $b$ the cost when x shows $20. Setting $a = b$:

$$-10 + 20 P_{20,y} = 20 - 30 P_{20,y} \quad \Rightarrow \quad P_{20,y} = \frac{3}{5}.$$

So y should show the $20 card at the rate of $\frac{3}{5}$ and the $10 card at the rate of $\frac{2}{5}$.

What is the cost for y with this strategy?

$$\left[-10 + 20 \cdot \frac{3}{5},\; 20 - 30 \cdot \frac{3}{5}\right] = [2, 2]$$

The average cost is $2, which is less than $2.5. So y minimizes his maximum cost.
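The equalizing computation above can be coded directly (my own sketch of the handout's arithmetic):

```python
# Player y's equalizing mixed strategy for the card game above.
# Cost matrix for y: rows = y's card ($10, $20), cols = x's card ($10, $20).
cost = [[-10, 20],
        [ 10, -10]]

# y mixes: p = P(y shows $20). Expected cost against each pure x choice:
#   x shows $10:  10*p - 10*(1 - p) = 20*p - 10
#   x shows $20: -10*p + 20*(1 - p) = 20 - 30*p
# Equalize the two: 20*p - 10 = 20 - 30*p  =>  p = 3/5
p = 30 / 50
cost_vs_x10 = 20 * p - 10
cost_vs_x20 = 20 - 30 * p
print(p, cost_vs_x10, cost_vs_x20)   # 0.6 2.0 2.0
```

Because both columns give the same cost, x cannot exploit y's mixture: whatever x does, y pays $2 on average, the game's value.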

[Figure: y's average cost as a function of $P_{20,x}$ and $P_{20,y}$, a saddle surface. The optimal point is the equilibrium (saddle) point: at $P_{20,y} = 3/5$ the cost line for y is flat in $P_{20,x}$, and x tries to stay on this line. Another example of the same min-max structure: detection with an unknown noise covariance matrix, where the unknown parameter plays the role of $P_0$ and is replaced by our parameter estimate.]