
    Lecture 1: Introduction to Statistical Inference

    1 Introduction

We are interested in processing information-bearing signals to extract information. There are two types of problems of fundamental interest.

1. Detection: deciding among a finite number of possible situations.

2. Estimation: selecting, from a continuum of possible situations, the value nearest to the true one.

    We look at three examples of interest.

Example 1.1 (Communication System). We need to estimate an unknown analog signal from a received signal that has been distorted and corrupted by the channel.

Figure 1: Block diagram for analog communication (Source → Transducer → Transmitter → Channel → Receiver → Transducer → Output).

Example 1.2 (Radar Communication). Pulsed electromagnetic waves are sent and received after possible reflection from a target, if one exists. The target could be an aircraft, ship, spacecraft, missile, etc. If a target is detected, then one is interested in estimating its range, angle, and velocity. One may also be interested in tracking the trajectory of a mobile target, or even controlling it.

Example 1.3 (Automatic Control). In an automatic control problem, given a linear time-invariant system H, we need to design a controller C to achieve a desired response through the output signal y(t). Typically, the reference signal x(t) is unknown in such systems, and only a noisy version of the state may be observable.


Figure 2: Block diagram for automatic control (reference x(t), error e(t) into controller C, control signal u(t) into plant H, output y(t), with feedback block F).

Other applications of estimation and detection theory are in seismology, radio astronomy, sonar, speech, signal, and image processing, biomedical signal processing, optimal communications, etc.

    2 Probability Review

We denote the observation space by $\Gamma$, equipped with a $\sigma$-algebra $\mathcal{G}$, that is, a collection of measurable subsets of $\Gamma$. Further, for all elements $A \in \mathcal{G}$, we have a non-negative set function $P : \mathcal{G} \to [0, 1]$ that satisfies the following axioms of probability:

1. $P(\Gamma) = 1$,

2. for any disjoint countable collection of sets $\{A_n : n \in \mathbb{N}\}$, we have $P(\cup_n A_n) = \sum_n P(A_n)$.

Example 2.1 (Finite Observations). When the observation space $\Gamma$ has finitely many elements, we can take $\mathcal{G} = \mathcal{P}(\Gamma)$, the power set of $\Gamma$. Further, specifying $P(\{\gamma\})$ for all $\gamma \in \Gamma$ completely specifies the probability set function.

Example 2.2 (Euclidean Space). For the case when the observation space is $\Gamma = \mathbb{R}^n$, we take $\mathcal{G} = \mathcal{B}^n$, the Borel $\sigma$-algebra on $\mathbb{R}^n$. For this case, it suffices to specify the set function $P(A)$ for sets $A \in \mathcal{G}$ of the form $\{\gamma \in \Gamma : \gamma_i \le x_i,\ i \in [n]\}$.

Definition 2.3 (Expectation). For a real-valued function $g : \Gamma \to \mathbb{R}$, we denote its expectation by $E[g(Y)]$ and define it as
$$E[g(Y)] = \int_{y \in \Gamma} g(y)\, dP(y).$$
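For the finite observation space of Example 2.1, this integral reduces to a weighted sum. Below is a minimal Python sketch of that reduction; the observation space {0, 1, 2} and its probabilities are hypothetical values chosen only to illustrate Definition 2.3.

# A minimal sketch (assumed values): expectation over a finite observation
# space, as in Example 2.1 and Definition 2.3.

pmf = {0: 0.2, 1: 0.5, 2: 0.3}   # P({y}) for each y in the observation space

def expectation(g, pmf):
    # E[g(Y)] = sum over y of g(y) * P({y}) for a finite observation space
    return sum(g(y) * p for y, p in pmf.items())

print(expectation(lambda y: y, pmf))       # mean: 1.1
print(expectation(lambda y: y ** 2, pmf))  # second moment: 1.7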

    3 Hypothesis Testing

Definition 3.1. A hypothesis is a statement about a population parameter.

Definition 3.2. The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis, and are denoted by $H_0$ and $H_1$, respectively.


We assume that the observation is a random variable $Y$ distributed with probability set function $P_i$ when the true hypothesis is $H_i$, for $i \in \{0, 1\}$.

Definition 3.3. A hypothesis test is a rule $\delta : \Gamma \to \{0, 1\}$ that specifies, for each value of $y$, the index $\delta(y)$ of the accepted hypothesis $H_{\delta(y)}$.

Definition 3.4. The region $\Gamma_1 = \{y \in \Gamma : \delta(y) = 1\}$ is called the rejection region, and $\Gamma_0 = \Gamma_1^c$ is called the acceptance region.

Definition 3.5. When the true underlying hypothesis is $H_j$, the cost incurred on accepting hypothesis $H_i$ is denoted by $C_{ij}$. The uniform cost is given by
$$C_{ij} = \mathbb{1}\{i \neq j\}, \quad i, j \in \{0, 1\}.$$

Definition 3.6. For each hypothesis $H_j$, the conditional risk is denoted by $R_j(\delta)$ and defined as the expected cost incurred by the decision rule $\delta$ when $H_j$ is the underlying true hypothesis. That is,
$$R_j(\delta) = E\Big[\sum_i C_{ij}\, \mathbb{1}\{\delta(y) = i\} \,\Big|\, H_j \text{ holds}\Big] = E_j\Big[\sum_i C_{ij}\, \mathbb{1}\{\delta(y) = i\}\Big] = C_{0j} P_j(\Gamma_0) + C_{1j} P_j(\Gamma_1).$$

Our objective is to design a decision rule $\delta$ that minimizes risk. Usually, the cost of correctly identifying the true hypothesis is low, and the cost of incorrect identification is higher. Hence, minimizing the risk under either hypothesis amounts to ensuring that the probability $P_j(\Gamma_i)$ is low for $i \neq j$. One cannot simultaneously shrink all decision regions $\{\Gamma_i\}$, since they form a partition of the observation space $\Gamma$.
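To make this trade-off concrete, here is a minimal Python sketch that evaluates the conditional risks of Definition 3.6 for a simple threshold rule on a finite observation space; the probability mass functions, the costs, and the rule itself are assumed purely for illustration.

# A minimal sketch (assumed values): conditional risks R_j(delta) of
# Definition 3.6 on a finite observation space {0, 1, 2}.

P0 = {0: 0.6, 1: 0.3, 2: 0.1}   # P_0({y}) under H_0
P1 = {0: 0.1, 1: 0.3, 2: 0.6}   # P_1({y}) under H_1
C = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 0.0}   # uniform cost C_ij

def delta(y):
    # a hypothetical decision rule: accept H_1 when y >= 2
    return 1 if y >= 2 else 0

def conditional_risk(j, delta, P0, P1, C):
    # R_j(delta) = C_0j * P_j(Gamma_0) + C_1j * P_j(Gamma_1)
    Pj = P1 if j == 1 else P0
    gamma1 = sum(p for y, p in Pj.items() if delta(y) == 1)   # P_j(Gamma_1)
    gamma0 = 1.0 - gamma1                                     # P_j(Gamma_0)
    return C[(0, j)] * gamma0 + C[(1, j)] * gamma1

print(conditional_risk(0, delta, P0, P1, C))   # R_0 = P_0(Gamma_1) = 0.1
print(conditional_risk(1, delta, P0, P1, C))   # R_1 = P_1(Gamma_0) = 0.4

Enlarging $\Gamma_1$ here would lower $R_1$ but raise $R_0$, which is exactly the tension described above.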

    3.1 Bayesian Hypothesis Testing

In this approach, we assume a prior distribution on the hypotheses. Specifically, let $\pi_i$ denote the prior probability of $H_i$ being true.

Definition 3.7. We define the unconditional risk, denoted by $r(\delta)$, as the expected value of the conditional risk over the possible hypotheses. That is,
$$r(\delta) = E[R_J(\delta)] = \sum_j \pi_j R_j(\delta),$$
where $J$ denotes the random index of the true hypothesis.

Definition 3.8. For two distributions $P_1$ and $P_0$, we can define the likelihood ratio as $L(y) = \frac{dP_1}{dP_0}(y)$. When the two distributions admit densities, it is the ratio of their densities at $y$. For discrete distributions, it is the ratio of their probability mass functions at $y$.


Theorem 3.9. For a Bayesian hypothesis testing problem, the optimal decision rule that minimizes the unconditional risk for a prior distribution $\pi$ and costs $\{C_{ij}\}$ is a threshold-based rule called the likelihood ratio test. That is,
$$\delta_B(y) = \mathbb{1}\{L(y) \ge \tau\},$$
where the likelihood ratio is $L(y) = \frac{dP_1}{dP_0}(y)$ and the threshold is
$$\tau = \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}.$$

Proof. We can write the unconditional risk as
$$r(\delta) = \sum_j \pi_j C_{0j} + \int_{y \in \Gamma_1} \sum_j \pi_j (C_{1j} - C_{0j})\, dP_j(y).$$
Minimizing the unconditional risk is equivalent to selecting the rejection region $\Gamma_1$ to be the set on which the integrand is non-positive. That is,
$$\Gamma_1 = \Big\{y \in \Gamma : \sum_j \pi_j (C_{1j} - C_{0j})\, dP_j(y) \le 0\Big\}.$$
By the definition of the likelihood ratio and the threshold defined in the theorem statement, the theorem follows.
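As a numerical sanity check of Theorem 3.9, the following Python sketch evaluates the likelihood ratio test for two hypothetical unit-variance Gaussian observation models; the means, priors, and costs are assumed values, not part of the lecture.

# A minimal sketch (assumed model): the Bayes likelihood ratio test of
# Theorem 3.9 for two unit-variance Gaussian densities with means m0 and m1.

import math

m0, m1 = 0.0, 1.0                 # means under H_0 and H_1, common variance 1
pi0, pi1 = 0.7, 0.3               # prior probabilities of H_0 and H_1
C00, C01, C10, C11 = 0.0, 1.0, 1.0, 0.0   # uniform cost here; any C_ij works

def density(y, m):
    return math.exp(-0.5 * (y - m) ** 2) / math.sqrt(2 * math.pi)

def likelihood_ratio(y):
    # L(y) = p_1(y) / p_0(y)
    return density(y, m1) / density(y, m0)

tau = pi0 * (C10 - C00) / (pi1 * (C01 - C11))   # threshold from Theorem 3.9

def bayes_rule(y):
    # delta_B(y) = 1{L(y) >= tau}
    return 1 if likelihood_ratio(y) >= tau else 0

for y in (-0.5, 0.5, 1.5, 2.0):
    print(y, likelihood_ratio(y), bayes_rule(y))

With these assumed values the threshold is $\tau = \pi_0/\pi_1 \approx 2.33$, so the rule accepts $H_1$ roughly for $y \ge 1.35$.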

Remark 1. For uniform cost, the threshold is $\tau = \pi_0/\pi_1$ and the unconditional risk
$$r(\delta) = \pi_0 P_0(\Gamma_1) + \pi_1 P_1(\Gamma_0)$$
is the probability of error in detection.

Definition 3.10. We can define the posterior probability of hypothesis $H_j$ being true, conditioned on the observation being $y$, as
$$\pi_j(y) = \Pr\{H_j \text{ is true} \mid Y = y\} = \frac{\pi_j\, dP_j(y)}{\pi_0\, dP_0(y) + \pi_1\, dP_1(y)}.$$

Remark 2. Observe that the rejection region can be written in terms of posterior probabilities as
$$\Gamma_1 = \{y : (C_{10} - C_{00})\, \pi_0(y) - (C_{01} - C_{11})\, \pi_1(y) \le 0\}.$$
This is again a likelihood ratio test, now in terms of the ratio of posterior probabilities $\tilde{L}(y) = \frac{\pi_1(y)}{\pi_0(y)}$ with threshold $\tilde{\tau} = \frac{C_{10} - C_{00}}{C_{01} - C_{11}}$. That is, the Bayes decision rule is
$$\delta_B(y) = \mathbb{1}\{\tilde{L}(y) \ge \tilde{\tau}\}.$$
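The equivalence between this posterior-ratio form and the original likelihood ratio test can be checked numerically. The Python sketch below computes the posterior probabilities of Definition 3.10 and the rejection region of Remark 2 for a hypothetical discrete model; all probability values and costs are assumed for illustration.

# A minimal sketch (assumed values): posterior probabilities pi_j(y) from
# Definition 3.10 and the posterior-ratio rejection region of Remark 2.

P0 = {0: 0.6, 1: 0.3, 2: 0.1}     # pmf under H_0
P1 = {0: 0.1, 1: 0.3, 2: 0.6}     # pmf under H_1
pi0, pi1 = 0.5, 0.5
C00, C01, C10, C11 = 0.0, 1.0, 1.0, 0.0

def posterior(j, y):
    # pi_j(y) = pi_j P_j({y}) / (pi_0 P_0({y}) + pi_1 P_1({y}))
    num = (pi1 * P1[y]) if j == 1 else (pi0 * P0[y])
    return num / (pi0 * P0[y] + pi1 * P1[y])

def in_rejection_region(y):
    # y in Gamma_1 iff (C10 - C00) pi_0(y) <= (C01 - C11) pi_1(y)
    return (C10 - C00) * posterior(0, y) <= (C01 - C11) * posterior(1, y)

for y in (0, 1, 2):
    print(y, round(posterior(1, y), 3), in_rejection_region(y))

For these assumed values, the rejection decisions agree with thresholding the ordinary likelihood ratio $P_1(\{y\})/P_0(\{y\})$ at $\tau = \pi_0/\pi_1 = 1$, as the theory predicts.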


Definition 3.11. The expected cost of choosing hypothesis $H_i$ given observation $y$ is called the posterior cost and is denoted by $R_i(y)$, where
$$R_i(y) = E\Big[\sum_j C_{ij}\, \mathbb{1}\{H_j \text{ holds}\} \,\Big|\, Y = y\Big] = \sum_j C_{ij}\, \pi_j(y) = C_{i0}\, \pi_0(y) + C_{i1}\, \pi_1(y).$$

Remark 3. Alternatively, one can also write the rejection region in terms of posterior costs as
$$\Gamma_1 = \{y : R_1(y) \le R_0(y)\}.$$
Therefore, the Bayes decision rule can be interpreted as the one that minimizes the posterior cost of choosing a hypothesis when the observation is $y$. That is,
$$\delta_B(y) = \mathbb{1}\Big\{\frac{R_1(y)}{R_0(y)} \le 1\Big\}.$$

Remark 4. For uniform cost, the Bayes decision rule is the likelihood ratio test on posterior probabilities with the threshold equal to unity. That is,
$$\delta_B(y) = \mathbb{1}\{\tilde{L}(y) \ge 1\}.$$
This is equivalent to maximizing the posterior probability of the underlying hypothesis based on the observation, and is also called the MAP (maximum a posteriori) decision rule for the binary hypothesis test.
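As a small illustration of the MAP rule, the sketch below applies it to a single observation of a binary symmetric channel; the crossover probability and the priors are hypothetical.

# A minimal sketch (assumed model): the MAP rule of Remark 4 for one
# observation of a binary symmetric channel, with H_j meaning "input bit j".

eps = 0.2                          # probability that the observed bit is flipped
pi0, pi1 = 0.6, 0.4                # priors on H_0 and H_1

def likelihood(y, j):
    # P_j({y}): probability of observing bit y when hypothesis H_j holds
    return 1.0 - eps if y == j else eps

def map_rule(y):
    # delta_B(y) = argmax_j pi_j P_j({y}), i.e. the larger posterior wins
    post0 = pi0 * likelihood(y, 0)      # proportional to pi_0(y)
    post1 = pi1 * likelihood(y, 1)      # proportional to pi_1(y)
    return 1 if post1 >= post0 else 0

print(map_rule(0), map_rule(1))        # 0 1 for these assumed values

Since the cost is uniform, choosing the hypothesis with the larger posterior probability is exactly the rule $\delta_B(y) = \mathbb{1}\{\tilde{L}(y) \ge 1\}$ above.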
