Detection Theory - Binary and M-ary Source


  • Outline. Statistical decision theory I: simple hypothesis testing; binary hypothesis testing; Bayesian hypothesis testing; minimax hypothesis testing; Neyman-Pearson criterion; M hypotheses; receiver operating characteristics; composite hypothesis testing and its approaches; performance of the GLRT for large data records; nuisance parameters.

  • Classical detection and estimation theory. What is detection? Signal detection and estimation is the area of study that deals with the processing of information-bearing signals for the purpose of extracting information from them.

    [Figure: a simple digital communication system. A source emits a digital sequence; the transmitter maps each symbol to a signal of duration $T$ (a 1 to $\sin \omega_1 t$, a 0 to $\sin \omega_0 t$); the channel adds noise, so the received waveform is $r(t) = s(t) + n(t)$.]

  • Components of a decision theory problem.

    1. Source: generates an output.
    2. Probabilistic transition mechanism: a device that knows which hypothesis is true and generates a point in the observation space according to some probability law.
    3. Observation space: describes all possible outcomes of the transition mechanism.
    4. Decision rule: to each point in the observation space is assigned one of the hypotheses.

    [Figure: components of a decision theory problem. Source ($H_0$ or $H_1$) → probabilistic transition mechanism → observation space → decision rule → decision.]

  • Example:

    When $H_1$ is true the source generates $+1$; when $H_0$ is true the source generates $-1$. An independent discrete random variable $n$ is added to the source output; its probability mass function is

    $p_n(N)$: $1/4$ at $N = -1$, $1/2$ at $N = 0$, $1/4$ at $N = +1$.

    The conditional densities of the observation are shifted copies of $p_n$:

    $p_{r|H_1}(R \mid H_1)$: $1/4$ at $R = 0$, $1/2$ at $R = +1$, $1/4$ at $R = +2$;
    $p_{r|H_0}(R \mid H_0)$: $1/4$ at $R = -2$, $1/2$ at $R = -1$, $1/4$ at $R = 0$.

    [Figure: the three probability mass functions, and the block diagram source ($+1$ under $H_1$, $-1$ under $H_0$) → transition mechanism → observation space → $r$.]

  • The sum of the source output and $n$ is the observed variable $r$. In general the observation space has finite dimension, i.e. an observation consists of a set of $N$ numbers and can be represented as a point in an $N$-dimensional space. Under the two hypotheses we have

    $H_1:\; r = 1 + n$
    $H_0:\; r = -1 + n$

    After observing the outcome in the observation space we shall guess which hypothesis is true. We use a decision rule that assigns each point to one of the hypotheses (a worked sketch for this example follows).
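
    The decision rule for this example can be tabulated by brute force over the five possible observations. A minimal sketch in Python, assuming equal priors and unit error costs (so the threshold is $\eta = 1$; these assumptions and the names are mine, for illustration):

    ```python
    # Likelihood ratio for the discrete example: source sends +1 (H1) or -1 (H0);
    # noise n takes values -1, 0, +1 with probabilities 1/4, 1/2, 1/4; r = source + n.
    from fractions import Fraction

    noise = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

    # Conditional probability mass functions of r under each hypothesis.
    p_r_H1 = {1 + n: p for n, p in noise.items()}   # support {0, +1, +2}
    p_r_H0 = {-1 + n: p for n, p in noise.items()}  # support {-2, -1, 0}

    eta = 1  # threshold for equal priors and C00 = C11 = 0, C01 = C10 = 1

    for R in sorted(set(p_r_H1) | set(p_r_H0)):
        num = p_r_H1.get(R, Fraction(0))
        den = p_r_H0.get(R, Fraction(0))
        if den == 0:
            decision = "H1"            # Lambda(R) is infinite: only H1 can produce R
        elif num / den >= eta:
            decision = "H1"            # Lambda(R) >= eta
        else:
            decision = "H0"
        print(f"R = {R:+d}: p(R|H1) = {num}, p(R|H0) = {den} -> say {decision}")
    ```

    Running this assigns $R \in \{0, +1, +2\}$ to $H_1$ (the tie at $R = 0$ goes to $H_1$ under the convention above) and $R \in \{-2, -1\}$ to $H_0$.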

  • Detection and estimation applications involve making inferences from observations that are distorted or corrupted in some unknown manner.

  • Simple binary hypothesis testing. The decision problem in which each of two source outputs corresponds to a hypothesis. Each hypothesis maps into a point in the observation space. We assume that the observation is a set of $N$ observations $r_1, r_2, \ldots, r_N$; each set can be represented as a vector

    $\mathbf{r} = \begin{bmatrix} r_1 & r_2 & \cdots & r_N \end{bmatrix}^T.$

  • The probabilistic transition mechanism generates points in accord with the two known conditional densities $p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)$ and $p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)$. The objective is to use this information to develop a decision rule.

  • Decision criteria. In the binary hypothesis problem either $H_0$ or $H_1$ is true. We are seeking decision rules for making a choice. Each time the experiment is conducted one of four things can happen:

    1. $H_0$ true; choose $H_0$ (correct)
    2. $H_0$ true; choose $H_1$
    3. $H_1$ true; choose $H_1$ (correct)
    4. $H_1$ true; choose $H_0$

    The purpose of a decision criterion is to attach some relative importance to the four possible courses of action. The method for processing the received data depends on the decision criterion we select.

  • Bayesian criterion. The source generates two outputs with given a priori probabilities $P_0, P_1$. These represent the observer's information before the experiment is conducted. A cost is assigned to each course of action: $C_{00}, C_{10}, C_{01}, C_{11}$, where $C_{ij}$ is the cost of choosing $H_i$ when $H_j$ is true. Each time the experiment is conducted a certain cost is incurred. The decision rule is designed so that on the average the cost is as small as possible. Two probabilities are averaged over: the a priori probability and the probability that a particular course of action will be taken.

    [Figure: the observation space $Z$ with the densities $p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)$ and $p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)$, divided into the regions $Z_0$ (say $H_0$) and $Z_1$ (say $H_1$).]

  • The expected value of the cost (the risk) is

    $\mathcal{R} = C_{00} P_0 \Pr(\text{say } H_0 \mid H_0 \text{ true}) + C_{10} P_0 \Pr(\text{say } H_1 \mid H_0 \text{ true}) + C_{11} P_1 \Pr(\text{say } H_1 \mid H_1 \text{ true}) + C_{01} P_1 \Pr(\text{say } H_0 \mid H_1 \text{ true}).$

    The binary decision rule divides the total observation space $Z$ into two parts, $Z_0$ and $Z_1$. Each point in the observation space is assigned to one of these sets. The expression of the risk in terms of the transition probabilities and the decision regions is:

  • $\mathcal{R} = C_{00} P_0 \int_{Z_0} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} + C_{10} P_0 \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} + C_{11} P_1 \int_{Z_1} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R} + C_{01} P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R}.$

    $Z_0$ and $Z_1$ together cover the observation space, so each conditional density integrates to one over $Z$. We assume that the cost of a wrong decision is higher than the cost of a correct decision:

    $C_{10} > C_{00}, \qquad C_{01} > C_{11}.$

    For the Bayes test the regions $Z_0$ and $Z_1$ are chosen such that the risk is minimized.

  • We assume that the decision is to be made for each point in the observation space, so $Z = Z_0 \cup Z_1$ (and $Z_1 = Z - Z_0$). Rewriting the risk with all integrals over $Z_0$:

    $\mathcal{R} = C_{00} P_0 \int_{Z_0} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} + C_{10} P_0 \int_{Z - Z_0} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} + C_{11} P_1 \int_{Z - Z_0} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R} + C_{01} P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R}.$

    Observing that

    $\int_{Z} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} = \int_{Z} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R} = 1,$

  • the risk reduces to

    $\mathcal{R} = P_0 C_{10} + P_1 C_{11} + \int_{Z_0} \left[ P_1 (C_{01} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1) - P_0 (C_{10} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0) \right] d\mathbf{R}.$

    The first two terms are fixed; the integral represents the cost controlled by those points $\mathbf{R}$ that we assign to $Z_0$. Values of $\mathbf{R}$ where the second term is larger than the first contribute a negative amount to the integral and should be included in $Z_0$. Values of $\mathbf{R}$ where the two terms are equal have no effect. The decision regions are defined by the statement: if

    $P_1 (C_{01} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1) \ge P_0 (C_{10} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0),$

    assign $\mathbf{R}$ to $Z_1$ and say that $H_1$ is true. Otherwise assign $\mathbf{R}$ to $Z_0$ and say that $H_0$ is true.

  • This may be expressed as:

    $\frac{p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{P_0 (C_{10} - C_{00})}{P_1 (C_{01} - C_{11})}.$

    The quantity

    $\Lambda(\mathbf{R}) = \frac{p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)}$

    is called the likelihood ratio. Regardless of the dimension of $\mathbf{R}$, $\Lambda(\mathbf{R})$ is a one-dimensional variable. The data processing is involved in computing $\Lambda(\mathbf{R})$ and is not affected by the a priori probabilities and cost assignments.

    The quantity

    $\eta = \frac{P_0 (C_{10} - C_{00})}{P_1 (C_{01} - C_{11})}$

    is the threshold of the test.

  • The threshold $\eta$ can be left as a variable and may be changed if our a priori knowledge or costs change. The Bayes criterion has led us to a likelihood ratio test (LRT):

    $\Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta.$

    An equivalent test is

    $\ln \Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta.$

  • Summary of the Bayes test: the test can be conducted simply by calculating the likelihood ratio $\Lambda(\mathbf{R})$ and comparing it to the threshold. Test design:

    1. Assign a priori probabilities to the source outputs.
    2. Assign costs to each course of action.
    3. Assume distributions for $p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)$ and $p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)$.
    4. Calculate and simplify $\Lambda(\mathbf{R})$.

    [Figure: likelihood ratio processor. A data processor computes $\Lambda(\mathbf{R})$ from $\mathbf{R}$; a threshold device compares it with $\eta$ and outputs the decision.]
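
    As a concrete illustration of this processor, here is a minimal sketch in Python. The function name and the Gaussian densities in the usage example are my assumptions for illustration, not part of the notes:

    ```python
    from scipy.stats import norm

    def bayes_lrt(p1, p0, P0, P1, C00, C01, C10, C11):
        """Build a Bayes likelihood ratio test from densities, priors and costs."""
        eta = (P0 * (C10 - C00)) / (P1 * (C01 - C11))   # threshold of the test

        def decide(R):
            lam = p1(R) / p0(R)                         # likelihood ratio Lambda(R)
            return "H1" if lam >= eta else "H0"

        return decide

    # Usage: minimum-probability-of-error test for N(+1,1) vs N(-1,1), equal priors.
    decide = bayes_lrt(lambda R: norm.pdf(R, 1.0, 1.0),
                       lambda R: norm.pdf(R, -1.0, 1.0),
                       P0=0.5, P1=0.5, C00=0.0, C01=1.0, C10=1.0, C11=0.0)
    print(decide(0.3), decide(-0.2))   # -> H1 H0
    ```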

  • Special case.

    $C_{00} = C_{11} = 0, \qquad C_{01} = C_{10} = 1.$

    With these costs the risk is the total probability of error,

    $\mathcal{R} = P_0 \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} + P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R},$

    and the test becomes

    $\ln \Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta = \ln \frac{P_0}{P_1}.$

    When the two hypotheses are equally likely, the threshold is zero.

  • Sufficient statistics.

    A sufficient statistic is a function $T$ that maps the initial data set $\mathbf{R}$ to a new data set $T(\mathbf{R})$ that still contains all the information in $\mathbf{R}$ relevant to the problem under investigation. The set that contains a minimal number of elements is called a minimal sufficient statistic. When making a decision, knowing the value of the sufficient statistic is just as good as knowing $\mathbf{R}$.

  • The integrals in the Bayes test.

    Probability of false alarm (we say the target is present when it is not):

    $P_F = \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R}.$

    Probability of detection:

    $P_D = \int_{Z_1} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R}.$

    Probability of a miss (we say the target is absent when it is present):

    $P_M = \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R} = 1 - P_D.$

    For the discrete example above these integrals reduce to finite sums; see the sketch below.
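
    A minimal check for the discrete example, using the decision region $Z_1 = \{0, +1, +2\}$ obtained earlier from the LRT with $\eta = 1$ (that choice of example and region is from the notes; the code itself is my sketch):

    ```python
    from fractions import Fraction

    Z1 = {0, 1, 2}   # decision region "say H1" from the LRT with eta = 1

    p_r_H1 = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
    p_r_H0 = {-2: Fraction(1, 4), -1: Fraction(1, 2), 0: Fraction(1, 4)}

    P_F = sum(p for R, p in p_r_H0.items() if R in Z1)   # say H1 when H0 is true
    P_D = sum(p for R, p in p_r_H1.items() if R in Z1)   # say H1 when H1 is true
    P_M = 1 - P_D                                        # say H0 when H1 is true

    print(f"P_F = {P_F}, P_D = {P_D}, P_M = {P_M}")      # -> 1/4, 1, 0
    ```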

  • Special case: the prior probabilities are unknown. Minimax test. Start from the risk

    $\mathcal{R} = C_{00} P_0 \int_{Z_0} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} + C_{10} P_0 \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} + C_{11} P_1 \int_{Z_1} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R} + C_{01} P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R}.$

    If the regions $Z_0$ and $Z_1$ are fixed, the integrals are determined. In terms of $P_F$ and $P_M$, with $P_0 = 1 - P_1$,

    $\mathcal{R} = P_0 C_{00} + P_0 (C_{10} - C_{00}) P_F + P_1 C_{11} + P_1 (C_{01} - C_{11}) P_M.$

    The Bayes risk will be a function of $P_1$.

  • Substituting $P_0 = 1 - P_1$ and collecting terms in $P_1$:

    $\mathcal{R}(P_1) = C_{00} + (C_{10} - C_{00}) P_F + P_1 \left[ (C_{11} - C_{00}) + (C_{01} - C_{11}) P_M - (C_{10} - C_{00}) P_F \right].$

    The Bayes test can be found if all the costs and a priori probabilities are known; if we know all the probabilities we can calculate the Bayes cost. Assume that we do not know $P_1$, simply assume a certain value $P_1^*$, and design the corresponding test. If $P_1$ changes, the regions $Z_0, Z_1$ change, and with them $P_F$ and $P_D$.

  • The test is designed for $P_1^*$ but the actual a priori probability is $P_1$. By assuming $P_1^*$ we fix $P_F$ and $P_M$. The cost for a different $P_1$ is given by a function $\mathcal{R}(P_1^*, P_1)$; because the threshold is fixed, $\mathcal{R}(P_1^*, P_1)$ is a linear function of $P_1$.

    The Bayes test minimizes the risk at $P_1 = P_1^*$. For other values of $P_1$,

    $\mathcal{R}(P_1^*, P_1) \ge \mathcal{R}(P_1),$

    and $\mathcal{R}(P_1)$ is strictly concave. (If $\Lambda(\mathbf{R})$ is a continuous random variable with a strictly monotonic probability distribution function, a change of the threshold always changes the risk.)

    [Figure: the Bayes risk curve $\mathcal{R}_B(P_1)$, running from $C_{00}$ at $P_1 = 0$ to $C_{11}$ at $P_1 = 1$, with the fixed-test line $\mathcal{R}_F$ tangent to it at $P_1^*$: a function of $P_1$ if $P_1^*$ is fixed.]

    Minimax test. The Bayes test designed to minimize the maximum possible risk is called a minimax test. $P_1$ is chosen to maximize our risk $\mathcal{R}(P_1^*, P_1)$; to minimize the maximum risk we select the $P_1^*$ for which $\mathcal{R}(P_1)$ is maximum. If the maximum occurs inside the interval $[0, 1]$, then $\mathcal{R}(P_1^*, P_1)$ becomes a horizontal line: the coefficient of $P_1$ must be zero,

    $(C_{11} - C_{00}) + (C_{01} - C_{11}) P_M - (C_{10} - C_{00}) P_F = 0.$

    This equation is the minimax equation.

  • [Figure: risk curves. The Bayes risk $\mathcal{R}_B(P_1)$ runs from $C_{00}$ at $P_1 = 0$ to $C_{11}$ at $P_1 = 1$; $\mathcal{R}_F$ is the fixed-test line. Maximum value of $\mathcal{R}$ at: a) $P_1 = 1$; b) $P_1 = 0$; c) $0 < P_1 < 1$.]

  • Special case. The cost function is

    $C_{00} = C_{11} = 0, \qquad C_{01} = C_M,\; C_{10} = C_F.$

    The risk is

    $\mathcal{R} = C_F P_0 P_F + C_M P_1 P_M = C_F (1 - P_1) P_F + C_M P_1 P_M,$

    and the minimax equation is

    $C_M P_M = C_F P_F.$

    A numerical sketch of solving the minimax equation is given below.
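
    The minimax equation can be solved numerically once a detection model is fixed. A minimal sketch, assuming the Gaussian shift model derived later in these notes ($l \sim N(0,1)$ under $H_0$, $l \sim N(d,1)$ under $H_1$, test: say $H_1$ if $l \ge \gamma$) and illustrative costs of my choosing:

    ```python
    from scipy.optimize import brentq
    from scipy.stats import norm

    # Minimax equation C_M * P_M = C_F * P_F for the Gaussian shift model:
    # P_F = Q(gamma) = norm.sf(gamma), P_M = Phi(gamma - d) = norm.cdf(gamma - d).
    C_F, C_M, d = 1.0, 2.0, 1.5   # illustrative costs and distance

    f = lambda g: C_M * norm.cdf(g - d) - C_F * norm.sf(g)
    gamma = brentq(f, -10.0, 10.0)         # root of the minimax equation

    P_F = norm.sf(gamma)
    P_M = norm.cdf(gamma - d)
    print(gamma, P_F, P_M)                 # at the root, C_M*P_M == C_F*P_F
    ```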

  • Neyman-Pearson test. Often it is difficult to assign realistic costs or a priori probabilities. This can be bypassed by working with the conditional probabilities $P_F$ and $P_D$. We have two conflicting objectives: to make $P_F$ as small as possible and $P_D$ as large as possible.

    Neyman-Pearson criterion: constrain $P_F = \alpha'$ and design a test to maximize $P_D$ (or minimize $P_M$) under this constraint. The solution can be obtained by using Lagrange multipliers:

    $F = P_M + \lambda (P_F - \alpha').$

  • $F = \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)\, d\mathbf{R} + \lambda \left[ \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)\, d\mathbf{R} - \alpha' \right].$

    If $P_F = \alpha'$, minimizing $F$ minimizes $P_M$. Rewriting,

    $F = \lambda (1 - \alpha') + \int_{Z_0} \left[ p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1) - \lambda\, p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0) \right] d\mathbf{R}.$

    For any positive value of $\lambda$ an LRT will minimize $F$: $F$ is minimized if we assign a point $\mathbf{R}$ to $Z_0$ only when the term in the bracket is negative, i.e. if

    $\frac{p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)} < \lambda,$

    assign the point to $Z_0$ (say $H_0$).

  • $F$ is minimized by the likelihood ratio test

    $\Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \lambda.$

    To satisfy the constraint, $\lambda$ is selected so that $P_F = \alpha'$:

    $P_F = \int_{\lambda}^{\infty} p_{\Lambda|H_0}(\Lambda \mid H_0)\, d\Lambda = \alpha'.$

    The value of $\lambda$ will be nonnegative because $p_{\Lambda|H_0}(\Lambda \mid H_0)$ is zero for negative values of $\Lambda$.

    [Figure: the densities $p_{l|H_0}(X \mid H_0)$ and $p_{l|H_1}(X \mid H_1)$ of the test statistic, with the threshold marked on the $L$ axis.]

    A sketch of this threshold selection for a Gaussian test statistic follows.
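
    For a Gaussian test statistic the constrained threshold has a closed form. A minimal sketch, again assuming the Gaussian shift model ($l \sim N(0,1)$ under $H_0$, $l \sim N(d,1)$ under $H_1$) with illustrative values of $\alpha'$ and $d$:

    ```python
    from scipy.stats import norm

    # Neyman-Pearson threshold: choose gamma so that P_F = Q(gamma) = alpha.
    alpha, d = 0.05, 1.5
    gamma = norm.isf(alpha)        # inverse of Q: P_F = norm.sf(gamma) = alpha
    P_D = norm.sf(gamma - d)       # detection probability at this threshold
    print(gamma, P_D)
    ```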

  • Example. We assume that under $H_1$ the source output is a constant voltage $m$, and under $H_0$ the source output is zero. The voltage is corrupted by additive noise. The output is sampled, giving $N$ samples per second. Each noise sample is an i.i.d. zero-mean Gaussian random variable with variance $\sigma^2$:

    $H_1:\; r_i = m + n_i, \qquad i = 1, 2, \ldots, N$
    $H_0:\; r_i = n_i, \qquad\qquad i = 1, 2, \ldots, N$

    $p_{n_i}(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{X^2}{2\sigma^2} \right).$

    The probability density of $r_i$ under each hypothesis is:

  • $p_{r_i|H_1}(R_i \mid H_1) = p_{n_i}(R_i - m) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(R_i - m)^2}{2\sigma^2} \right),$

    $p_{r_i|H_0}(R_i \mid H_0) = p_{n_i}(R_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{R_i^2}{2\sigma^2} \right).$

    Because the samples are independent, the joint probability density of the $N$ samples is:

    $p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(R_i - m)^2}{2\sigma^2} \right),$

    $p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{R_i^2}{2\sigma^2} \right).$

  • The likelihood ratio is

    $\Lambda(\mathbf{R}) = \frac{\displaystyle\prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(R_i - m)^2}{2\sigma^2} \right)}{\displaystyle\prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{R_i^2}{2\sigma^2} \right)}.$

    After cancelling common terms and taking the logarithm:

    $\ln \Lambda(\mathbf{R}) = \frac{m}{\sigma^2} \sum_{i=1}^{N} R_i - \frac{N m^2}{2\sigma^2}.$

    The likelihood ratio test is:

  • $\frac{m}{\sigma^2} \sum_{i=1}^{N} R_i - \frac{N m^2}{2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta, \qquad \text{or} \qquad \sum_{i=1}^{N} R_i \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{\sigma^2}{m} \ln \eta + \frac{N m}{2}.$

    We use $d = \sqrt{N}\, m / \sigma$ for normalisation:

    $l = \frac{1}{\sqrt{N}\,\sigma} \sum_{i=1}^{N} R_i \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{\ln \eta}{d} + \frac{d}{2} \equiv \gamma.$

    Under $H_0$, $l$ is obtained by adding $N$ independent zero-mean Gaussian variables with variance $\sigma^2$ and then dividing by $\sqrt{N}\,\sigma$. Therefore $l$ is $N(0, 1)$.

  • Under $H_1$, $l$ is $N\!\left( \frac{\sqrt{N}\, m}{\sigma},\, 1 \right) = N(d, 1)$.

    The false alarm probability is

    $P_F = \int_{\frac{\ln \eta}{d} + \frac{d}{2}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{x^2}{2} \right) dx = \operatorname{erfc}_*\!\left( \frac{\ln \eta}{d} + \frac{d}{2} \right),$

    where $\operatorname{erfc}_*(X) = \int_X^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx$ and $d = \sqrt{N}\, m / \sigma$ is the distance between the means of the two densities.

  • Similarly, the detection probability is

    $P_D = \int_{\frac{\ln \eta}{d} + \frac{d}{2}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(x - d)^2}{2} \right) dx = \int_{\frac{\ln \eta}{d} - \frac{d}{2}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{y^2}{2} \right) dy = \operatorname{erfc}_*\!\left( \frac{\ln \eta}{d} - \frac{d}{2} \right).$

    In communication systems a special case is important:

    $\Pr(\epsilon) = P_0 P_F + P_1 P_M.$

    If $P_0 = P_1$, the threshold is one and $\Pr(\epsilon) = \tfrac{1}{2}(P_F + P_M)$.

    These expressions are easy to verify numerically; see the sketch below.
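
    The closed-form $P_F$ and $P_D$ can be checked against a simulation. A minimal sketch; the parameter values ($N$, $m$, $\sigma$, $\eta$) are my illustrative choices, not from the notes:

    ```python
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    N, m, sigma, eta = 16, 0.5, 1.0, 1.0
    d = np.sqrt(N) * m / sigma
    gamma = np.log(eta) / d + d / 2          # threshold on the statistic l

    # Closed-form expressions derived above (norm.sf is erfc_*).
    P_F = norm.sf(gamma)                     # erfc_*(ln(eta)/d + d/2)
    P_D = norm.sf(gamma - d)                 # erfc_*(ln(eta)/d - d/2)

    # Monte Carlo check: l = (1 / (sqrt(N) * sigma)) * sum(r_i).
    trials = 200_000
    r0 = rng.normal(0.0, sigma, (trials, N))   # H0: r_i = n_i
    r1 = rng.normal(m, sigma, (trials, N))     # H1: r_i = m + n_i
    l0 = r0.sum(axis=1) / (np.sqrt(N) * sigma)
    l1 = r1.sum(axis=1) / (np.sqrt(N) * sigma)
    print(P_F, (l0 >= gamma).mean())           # theoretical vs empirical P_F
    print(P_D, (l1 >= gamma).mean())           # theoretical vs empirical P_D
    ```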

  • Receiver Operating Characteristic (ROC).

    For a Neyman-Pearson test the values of $P_F$ and $P_D$ completely specify the test performance. $P_D$ depends on $P_F$; the function $P_D(P_F)$ is defined as the Receiver Operating Characteristic (ROC). The ROC completely describes the performance of the test as a function of the parameters of interest.

  • Example. [Figure: ROC curves for the Gaussian example, parameterized by $d$.]

  • Properties of the ROC.

    - All continuous likelihood ratio tests have ROCs that are concave downward.
    - All continuous likelihood ratio tests have ROCs that lie above the line $P_D = P_F$.
    - The slope of the ROC at a particular point is equal to the value of the threshold $\eta$ required to achieve the $P_F$ and $P_D$ at that point.

    A sketch tracing the ROC for the Gaussian example follows.
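
    For the Gaussian example the ROC follows directly from the formulas above, $P_D = \operatorname{erfc}_*(\operatorname{erfc}_*^{-1}(P_F) - d)$. A minimal sketch (the values of $d$ are illustrative):

    ```python
    import numpy as np
    from scipy.stats import norm

    # ROC for the Gaussian shift model: P_D = Q(Q^{-1}(P_F) - d).
    for d in (0.5, 1.0, 2.0):
        P_F = np.linspace(1e-4, 1 - 1e-4, 9)
        P_D = norm.sf(norm.isf(P_F) - d)     # lies above P_D = P_F, concave down
        print(f"d = {d}:", np.round(P_D, 3))
    ```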

  • Whenever the maximum value of the Bayes risk is interior to the interval $(0, 1)$ of the $P_1$ axis, the minimax operating point is the intersection of the line

    $(C_{11} - C_{00}) + (C_{01} - C_{11})(1 - P_D) - (C_{10} - C_{00}) P_F = 0$

    and the appropriate curve on the ROC.

  • Determination of the minimax operating point. [Figure: the minimax line intersecting the ROC.]

  • Conclusions.

    - Using either the Bayes criterion or a Neyman-Pearson criterion, we find that the optimum test is a likelihood ratio test. Regardless of the dimension of the observation space, the optimum test consists of comparing a scalar variable $\Lambda(\mathbf{R})$ with a threshold.
    - For the binary hypothesis test the decision space is one-dimensional.
    - The test can be simplified by calculating a sufficient statistic.
    - A complete description of the LRT performance was obtained by plotting the conditional probabilities $P_D$ and $P_F$ as the threshold was varied.

  • M hypotheses. We choose one of $M$ hypotheses. There are $M$ source outputs, each of which corresponds to one of the $M$ hypotheses. We are forced to make a decision. There are $M^2$ alternatives that may occur each time the experiment is conducted.

    [Figure: source ($H_0, H_1, \ldots$) → probabilistic transition mechanism with densities $p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)$, $p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)$, ..., $p_{\mathbf{r}|H_{M-1}}(\mathbf{R} \mid H_{M-1})$ → observation space $Z$ partitioned into regions $Z_0$ (say $H_0$), $Z_1$ (say $H_1$), ..., $Z_i$ (say $H_i$).]

  • Bayes criterion. Let $C_{ij}$ be the cost of each course of action (choose $H_i$ when $H_j$ is true), $Z_i$ the region of the observation space where we choose $H_i$, and $P_i$ the a priori probabilities. The risk is

    $\mathcal{R} = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} P_j C_{ij} \int_{Z_i} p_{\mathbf{r}|H_j}(\mathbf{R} \mid H_j)\, d\mathbf{R},$

    and $\mathcal{R}$ is minimized through selecting the $Z_i$.

  • Example, $M = 3$. With $Z = Z_0 \cup Z_1 \cup Z_2$,

    $\mathcal{R} = P_0 C_{00} \int_{Z_0} p_{\mathbf{r}|H_0}\, d\mathbf{R} + P_0 C_{10} \int_{Z_1} p_{\mathbf{r}|H_0}\, d\mathbf{R} + P_0 C_{20} \int_{Z_2} p_{\mathbf{r}|H_0}\, d\mathbf{R}$
    $\quad + P_1 C_{01} \int_{Z_0} p_{\mathbf{r}|H_1}\, d\mathbf{R} + P_1 C_{11} \int_{Z_1} p_{\mathbf{r}|H_1}\, d\mathbf{R} + P_1 C_{21} \int_{Z_2} p_{\mathbf{r}|H_1}\, d\mathbf{R}$
    $\quad + P_2 C_{02} \int_{Z_0} p_{\mathbf{r}|H_2}\, d\mathbf{R} + P_2 C_{12} \int_{Z_1} p_{\mathbf{r}|H_2}\, d\mathbf{R} + P_2 C_{22} \int_{Z_2} p_{\mathbf{r}|H_2}\, d\mathbf{R},$

    where $p_{\mathbf{r}|H_j}$ is shorthand for $p_{\mathbf{r}|H_j}(\mathbf{R} \mid H_j)$.

  • Eliminating one region from each group of terms (since the densities integrate to one over $Z$), the risk reduces to

    $\mathcal{R} = P_0 C_{00} + P_1 C_{11} + P_2 C_{22}$
    $\quad + \int_{Z_0} \left[ P_1 (C_{01} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1) + P_2 (C_{02} - C_{22})\, p_{\mathbf{r}|H_2}(\mathbf{R} \mid H_2) \right] d\mathbf{R}$
    $\quad + \int_{Z_1} \left[ P_0 (C_{10} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0) + P_2 (C_{12} - C_{22})\, p_{\mathbf{r}|H_2}(\mathbf{R} \mid H_2) \right] d\mathbf{R}$
    $\quad + \int_{Z_2} \left[ P_0 (C_{20} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0) + P_1 (C_{21} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1) \right] d\mathbf{R}.$

    $\mathcal{R}$ is minimized if we assign each $\mathbf{R}$ to the region in which the value of the integrand is the smallest. Label the integrands $I_0(\mathbf{R})$, $I_1(\mathbf{R})$, $I_2(\mathbf{R})$.

  • If $I_0(\mathbf{R}) \le I_1(\mathbf{R})$ and $I_0(\mathbf{R}) \le I_2(\mathbf{R})$, choose $H_0$; and similarly for $H_1$ and $H_2$.
  • Defining the likelihood ratios $\Lambda_1(\mathbf{R}) = \frac{p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)}$ and $\Lambda_2(\mathbf{R}) = \frac{p_{\mathbf{r}|H_2}(\mathbf{R} \mid H_2)}{p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0)}$, the test becomes three pairwise comparisons:

    $P_1 (C_{01} - C_{11})\, \Lambda_1(\mathbf{R}) \;\underset{H_0 \text{ or } H_2}{\overset{H_1 \text{ or } H_2}{\gtrless}}\; P_0 (C_{10} - C_{00}) + P_2 (C_{12} - C_{02})\, \Lambda_2(\mathbf{R}),$

    $P_2 (C_{02} - C_{22})\, \Lambda_2(\mathbf{R}) \;\underset{H_0 \text{ or } H_1}{\overset{H_2 \text{ or } H_1}{\gtrless}}\; P_0 (C_{20} - C_{00}) + P_1 (C_{21} - C_{01})\, \Lambda_1(\mathbf{R}),$

    $P_2 (C_{12} - C_{22})\, \Lambda_2(\mathbf{R}) \;\underset{H_1 \text{ or } H_0}{\overset{H_2 \text{ or } H_0}{\gtrless}}\; P_1 (C_{21} - C_{11})\, \Lambda_1(\mathbf{R}) + P_0 (C_{20} - C_{10}).$

    $M$ hypotheses always lead to a decision space that has, at most, $M - 1$ dimensions.

    [Figure: the $(\Lambda_1(\mathbf{R}), \Lambda_2(\mathbf{R}))$ decision space divided into regions labelled $H_0$, $H_1$, $H_2$.]
  • Special case (often in communication).

    $C_{00} = C_{11} = C_{22} = 0, \qquad C_{ij} = 1,\; i \neq j.$

    The test reduces to:

    $P_1\, p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1) \;\underset{H_0 \text{ or } H_2}{\overset{H_1 \text{ or } H_2}{\gtrless}}\; P_0\, p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0),$

    $P_2\, p_{\mathbf{r}|H_2}(\mathbf{R} \mid H_2) \;\underset{H_0 \text{ or } H_1}{\overset{H_2 \text{ or } H_1}{\gtrless}}\; P_0\, p_{\mathbf{r}|H_0}(\mathbf{R} \mid H_0),$

    $P_2\, p_{\mathbf{r}|H_2}(\mathbf{R} \mid H_2) \;\underset{H_1 \text{ or } H_0}{\overset{H_2 \text{ or } H_0}{\gtrless}}\; P_1\, p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1).$

    Compute the posterior probabilities and choose the largest: a maximum a posteriori (MAP) probability computer. A minimal sketch is given below.
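
    A minimal sketch of such a MAP computer for $M = 3$; the scalar Gaussian observation model, priors, and means are my assumptions for illustration:

    ```python
    import numpy as np
    from scipy.stats import norm

    # Minimum-probability-of-error (MAP) receiver for M = 3 hypotheses:
    # compute P_j * p(R | H_j) for each j and choose the largest.
    priors = np.array([0.5, 0.25, 0.25])   # P_0, P_1, P_2 (assumed)
    means = np.array([-1.0, 0.0, 1.0])     # R ~ N(mu_j, 1) under H_j (assumed)

    def map_decide(R):
        posteriors = priors * norm.pdf(R, loc=means, scale=1.0)
        return int(np.argmax(posteriors))  # index of the chosen hypothesis

    for R in (-1.2, 0.1, 2.0):
        print(f"R = {R:+.1f} -> say H{map_decide(R)}")
    ```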

  • Special case: degeneration of hypotheses. What happens if we combine $H_1$ and $H_2$:

    $C_{12} = C_{21} = 0.$

    For simplicity let

    $C_{01} = C_{10} = C_{20} = C_{02}, \qquad C_{00} = C_{11} = C_{22} = 0.$

    The first two equations of the test reduce to

    $P_1 \Lambda_1(\mathbf{R}) + P_2 \Lambda_2(\mathbf{R}) \;\underset{H_0}{\overset{H_1 \text{ or } H_2}{\gtrless}}\; P_0.$

    [Figure: the $(\Lambda_1(\mathbf{R}), \Lambda_2(\mathbf{R}))$ decision space with the single line $P_1 \Lambda_1 + P_2 \Lambda_2 = P_0$ separating the region "$H_1$ or $H_2$" from $H_0$.]

  • Dummy hypothesis. The actual problem has two hypotheses, $H_1$ and $H_2$. We introduce a new one, $H_0$, with a priori probability $P_0 = 0$; let $P_1 + P_2 = 1$ and $C_{12} = C_{02}$, $C_{21} = C_{01}$. We always choose $H_1$ or $H_2$. The test reduces to:

    $P_2 (C_{12} - C_{22})\, \Lambda_2(\mathbf{R}) \;\underset{H_1}{\overset{H_2}{\gtrless}}\; P_1 (C_{21} - C_{11})\, \Lambda_1(\mathbf{R}).$

    This is useful if the direct likelihood ratio

    $\frac{p_{\mathbf{r}|H_2}(\mathbf{R} \mid H_2)}{p_{\mathbf{r}|H_1}(\mathbf{R} \mid H_1)}$

    is difficult to work with but $\Lambda_1(\mathbf{R})$ and $\Lambda_2(\mathbf{R})$ are simple.

  • Conclusions. 1. The minimum dimension of the decision space is no more that

    1M . The boundaries of the decision regions are hyperplanes in the ( )1 1, , m plane. 2. The test is simple to find but error probabilities are often difficult to compute. 3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each test and choose the largest.