ESE 524 Detection and Estimation Theory
TRANSCRIPT
![Page 1: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/1.jpg)
ESE 524 Detection and Estimation Theory
Joseph A. O'Sullivan
Samuel C. Sachs Professor
Electronic Systems and Signals Research Laboratory
Electrical and Systems Engineering
Washington University, 211 Urbauer Hall
314-935-4173 (Lynda answers)
[email protected]
J. A. O'S. ESE 524, Lecture 10, 02/20/09
![Page 2: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/2.jpg)
Announcements
- Problem Set 3 is due in class 2/20
- We have another make-up class Feb. 27
- Another Friday after spring break
- Midterm exam?
- Other announcements or questions?
![Page 3: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/3.jpg)
Statistical Inference (block diagram)
- Parameter Space: hypothesis, continuous, discrete, random process
- Transition probability or pdf: $p(R|\theta)$
- Data Space
- Inference Algorithm: log-likelihood ratio test, parameter estimate
- Inference Space: hypothesis, continuous
![Page 4: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/4.jpg)
Outline: Introduction to Estimation Theory
- Range of problems studied
- Minimum mean cost problems
  - Minimum mean square error estimation
  - Minimum absolute error estimation
  - Maximum a posteriori estimation
  - Other
- Maximum likelihood for nonrandom parameters
- Fisher information and the Cramér-Rao bound
![Page 5: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/5.jpg)
Range of Estimation Theory Problems Studied

| Parameters | Cost Function | Solution |
|---|---|---|
| Random | Mean square error | Posterior mean |
| Random | Mean absolute error | Median |
| Random | Likelihood function: likelihood equation | Maximum a posteriori (MAP) equation |
| Random | Other mean cost | Generalized mean |
| Nonrandom | Likelihood function: likelihood equation | Maximum likelihood |
| Nonrandom | Other cost | |
![Page 6: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/6.jpg)
Random Parameter Estimation
- Prior on the parameters: $p_s(S)$
- Conditional pdf of the data given the parameters: $p_{r|s}(R|S)$
- Bayes' rule gives the posterior pdf
- The cost function $C[s, \hat{s}(r)]$ is given
- Select the estimator that minimizes the mean cost:

$$\hat{s}^{*} = \arg\min_{\hat{s}} E\{C[s,\hat{s}(r)]\}$$

$$E\{C[s,\hat{s}(r)]\} = \iint C[S,\hat{s}(R)]\, p_{r|s}(R|S)\, p_s(S)\, dS\, dR = E\big\{E\{C[s,\hat{s}(r)]\mid r\}\big\}$$

- The estimator is a function; for each data point the estimator is single-valued, so it suffices to minimize the conditional mean cost:

$$E\{C[s,\hat{s}(r)]\mid r = R\} = \int C[S,\hat{s}(R)]\, p_{s|r}(S|R)\, dS$$

$$\hat{s}(R) = \arg\min_{\hat{S}} \int C[S,\hat{S}]\, p_{s|r}(S|R)\, dS$$

- This is a generalized notion of mean
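The recipe above can be checked numerically on a grid. This is a minimal sketch under an assumed scalar conjugate model (illustrative, not from the slides): prior $s \sim \mathcal{N}(0,1)$, one observation $r = s + w$ with $w \sim \mathcal{N}(0, 0.5^2)$. Under squared-error cost, the candidate that minimizes the conditional mean cost coincides with the posterior mean.

```python
import numpy as np

# Minimum mean cost on a grid (scalar case). Assumed illustrative model:
# s ~ N(0, 1), r = s + w with w ~ N(0, 0.5^2), one observed value R.
S = np.linspace(-5.0, 5.0, 2001)             # parameter grid
prior = np.exp(-S**2 / 2.0)                  # p_s(S), unnormalized
R = 0.8                                      # observed data point
like = np.exp(-(R - S)**2 / (2.0 * 0.25))    # p_{r|s}(R|S), unnormalized
post = prior * like
post /= post.sum()                           # Bayes' rule; grid pmf approximation

# Conditional mean cost of every candidate estimate under squared-error cost
mean_cost = ((S[:, None] - S[None, :])**2 * post[None, :]).sum(axis=1)
s_star = S[int(np.argmin(mean_cost))]        # arg min of the conditional mean cost
post_mean = float((S * post).sum())          # posterior mean E[s | r = R]
# For this conjugate model E[s|R] = R * 1/(1 + 0.25) = 0.64, and s_star matches it.
```

The brute-force minimization over the grid is exactly the last equation on the slide, specialized to $C[S,\hat{S}] = (S-\hat{S})^2$.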
![Page 7: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/7.jpg)
Range of Estimation Theory Problems Studied (same table as Page 5)
![Page 8: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/8.jpg)
Range of Estimation Theory Problems Studied: mean square error (table as on Page 5, with the MMSE entry worked out)

$$C[\mathbf{S},\hat{\mathbf{s}}] = \|\mathbf{S}-\hat{\mathbf{s}}\|^2$$

$$\hat{\mathbf{s}}_{MMSE}(\mathbf{R}) = \arg\min_{\hat{\mathbf{S}}} \int \|\mathbf{S}-\hat{\mathbf{S}}\|^2\, p_{s|r}(\mathbf{S}|\mathbf{R})\, d\mathbf{S}$$

Setting the gradient with respect to the estimate to zero:

$$\nabla_{\hat{\mathbf{s}}}\, E\big\{\|\mathbf{s}-\hat{\mathbf{s}}\|^2 \mid \mathbf{r}\big\} = -2\,E\{(\mathbf{s}-\hat{\mathbf{s}})\mid \mathbf{r}\} = -2\big(E[\mathbf{s}\mid\mathbf{r}]-\hat{\mathbf{s}}\big) = 0$$

$$\hat{\mathbf{s}}_{MMSE}(\mathbf{R}) = E[\mathbf{s}\mid\mathbf{r}=\mathbf{R}]$$
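A Monte Carlo sanity check of the boxed result, under an assumed conjugate Gaussian model (illustrative; the prior and noise variances are made up): the posterior-mean estimator has the smallest mean square error among the candidates tried, and its MSE matches the closed-form posterior variance.

```python
import numpy as np

# Monte Carlo check that the posterior mean minimizes mean square error.
# Assumed conjugate Gaussian model (illustrative, not from the slides):
# s ~ N(0, sig_s^2), r = s + w, w ~ N(0, sig_n^2).
rng = np.random.default_rng(0)
sig_s, sig_n, M = 1.0, 0.5, 200_000
s = rng.normal(0.0, sig_s, M)
r = s + rng.normal(0.0, sig_n, M)

# Closed-form posterior mean for this model: E[s|r] = r * sig_s^2/(sig_s^2 + sig_n^2)
shrink = sig_s**2 / (sig_s**2 + sig_n**2)
mse_mmse = np.mean((s - shrink * r)**2)   # posterior-mean estimator
mse_raw = np.mean((s - r)**2)             # use r itself as the estimate
mse_zero = np.mean(s**2)                  # constant zero estimator
# Theory: minimum MSE = sig_s^2 * sig_n^2 / (sig_s^2 + sig_n^2) = 0.2 here.
```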
![Page 9: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/9.jpg)
Range of Estimation Theory Problems Studied: mean absolute error (table as on Page 5, with the MAE entry worked out)

$$C[S,\hat{s}] = |S-\hat{s}|$$

Setting the derivative of the conditional mean cost to zero puts equal posterior probability on either side of the estimate:

$$\int_{-\infty}^{\hat{s}} p_{s|r}(S|R)\, dS = \int_{\hat{s}}^{\infty} p_{s|r}(S|R)\, dS$$

$$\hat{s}_{MAE}(R) = \text{median of the posterior}$$
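The median result can be verified on a grid. A short sketch with an assumed skewed posterior (a two-component Gaussian mixture, chosen for illustration so that mean and median differ): the candidate minimizing the mean absolute error lands on the posterior median.

```python
import numpy as np

# Grid check that the posterior median minimizes mean absolute error, using an
# assumed skewed posterior (two-component Gaussian mixture; illustrative).
S = np.linspace(-6.0, 10.0, 2001)
post = 0.7 * np.exp(-S**2 / 2.0) + 0.3 * np.exp(-(S - 4.0)**2 / 2.0)
post /= post.sum()                            # treat grid values as a pmf

# Mean absolute error of each candidate estimate
mae = np.abs(S[:, None] - S[None, :]) @ post  # entry i: E|s - S_i|
s_mae = S[int(np.argmin(mae))]

# Posterior median from the cumulative distribution
cdf = np.cumsum(post)
s_med = S[int(np.searchsorted(cdf, 0.5))]
```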
![Page 10: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/10.jpg)
Range of Estimation Theory Problems Studied: maximum a posteriori (table as on Page 5, with the MAP entry worked out)

$$\hat{s}_{MAP}(R) = \arg\max_{S} \ln p_{s|r}(S|R) = \arg\max_{S}\big[\ln p_{r|s}(R|S) + \ln p_s(S) - \ln p_r(R)\big]$$

The last term does not depend on $S$, so the MAP equation is

$$\left.\frac{\partial \ln p_{r|s}(R|S)}{\partial S} + \frac{\partial \ln p_s(S)}{\partial S}\right|_{S=\hat{s}_{MAP}} = 0$$
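A grid-search MAP sketch under an assumed Gaussian-Gaussian model (illustrative values): because the posterior is Gaussian and symmetric here, the MAP estimate should equal the closed-form posterior mean, which makes the grid result easy to check.

```python
import numpy as np

# MAP on a grid: maximize ln p(R|S) + ln p_s(S). Assumed illustrative model:
# prior s ~ N(0, 1); N i.i.d. observations r_i = s + w_i, w_i ~ N(0, sig_n^2).
rng = np.random.default_rng(1)
s_true, sig_n, N = 1.5, 1.0, 25
R = s_true + rng.normal(0.0, sig_n, N)

S = np.linspace(-4.0, 6.0, 5001)
log_like = -((R[None, :] - S[:, None])**2).sum(axis=1) / (2.0 * sig_n**2)
log_prior = -S**2 / 2.0                       # log of N(0,1) prior, up to a constant
s_map = S[int(np.argmax(log_like + log_prior))]

# For this Gaussian-Gaussian model the posterior is Gaussian, so the MAP
# estimate equals the posterior mean:
closed_form = (R.sum() / sig_n**2) / (N / sig_n**2 + 1.0)
```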
![Page 11: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/11.jpg)
Comments on Random Parameter Estimation
- If the posterior is symmetric around its mean, then the posterior mean (MMSE estimate) equals the posterior median (MAE estimate).
- If the posterior mean is also the maximum, then the MAP estimate equals the MMSE estimate.
- If the cost function is symmetric in the error and the posterior is symmetric around the mean, then the minimum cost estimate equals the MMSE estimate.
![Page 12: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/12.jpg)
Other Cost Functions
- Parameters may take many forms:
  - Amplitude, frequency, phase
  - Intensity of a Poisson process (concentration of a radioactive substance)
  - Variance of noise in an amplifier or circuit
  - Direction: SO(3); distance and direction: SE(3)
  - Subspace in signal space
  - Deformation or warping: image or volume warping
- A distance or other discrepancy must be defined on the parameter space: nonnegative, zero at the truth, monotonic in some sense
- Example: map the parameter into a matrix and use a matrix distance (or squared distance, such as the sum of squared errors) to induce a discrepancy on the parameter space
![Page 13: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/13.jpg)
Today: Nonrandom Parameters
- Maximum likelihood for nonrandom parameters
- Fisher information and the Cramér-Rao bound
![Page 14: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/14.jpg)
Nonrandom Parameter Estimation
- There is no prior on the parameters.
- Concentrate on the maximum likelihood rule: find the parameter that maximizes the likelihood function, or equivalently the log-likelihood function.
- This is the nonrandom-parameter version of MAP estimation.
- Performance?

$$\hat{s}_{ML}(R) = \arg\max_{S} p_{r|s}(R|S) = \arg\max_{S} \ln p_{r|s}(R|S)$$

$$\left.\frac{\partial \ln p_{r|s}(R|S)}{\partial S}\right|_{S=\hat{s}_{ML}} = 0$$
![Page 15: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/15.jpg)
Nonrandom Parameter Estimation (continued): single-variable and multiple-variable likelihood equations

$$\left.\frac{\partial \ln p_{r|s}(R|S)}{\partial S}\right|_{S=\hat{s}_{ML}} = 0, \qquad \left.\nabla_{\mathbf{S}}\, \ln p_{\mathbf{r}|\mathbf{s}}(\mathbf{R}|\mathbf{S})\right|_{\mathbf{S}=\hat{\mathbf{s}}_{ML}} = \mathbf{0}$$
![Page 16: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/16.jpg)
Maximum Likelihood Estimation
- Repeated measurements of a deterministic variable in Gaussian noise:

$$r_i = s + w_i,\quad i = 1,2,\ldots,N,\quad w_i \text{ i.i.d. } \mathcal{N}(0,\sigma_n^2),\quad s\in\mathbb{R}$$

$$p_{r|s}(\mathbf{R}|S) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(R_i-S)^2}{2\sigma_n^2}\right)$$

$$\ln p_{r|s}(\mathbf{R}|S) = -\sum_{i=1}^{N} \frac{(R_i-S)^2}{2\sigma_n^2} + \text{constant}$$

- Solve the likelihood equation:

$$\left.\frac{\partial \ln p_{r|s}(\mathbf{R}|S)}{\partial S}\right|_{S=\hat{s}_{ML}} = \sum_{i=1}^{N} \frac{R_i-\hat{s}_{ML}}{\sigma_n^2} = 0 \quad\Rightarrow\quad \hat{s}_{ML}(\mathbf{R}) = \frac{1}{N}\sum_{i=1}^{N} R_i$$

- The ML estimate is the limit of the MMSE estimate as the SNR (the prior variance) goes to infinity:

$$\hat{s}_{MMSE}(\mathbf{R}) = \frac{1}{1+\frac{1}{SNR\cdot N}}\cdot\frac{1}{N}\sum_{i=1}^{N} R_i$$

- Performance? MSE = (mean error)² + (error variance)
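A small sketch of the limit claimed above, for the slide's model $r_i = s + w_i$ (the specific numbers are illustrative): the MMSE estimate shrinks the sample mean by $1/(1 + 1/(SNR \cdot N))$, and the shrinkage vanishes as the SNR (prior variance) grows, recovering the ML estimate.

```python
import numpy as np

# ML vs MMSE for r_i = s + w_i, w_i ~ N(0, sig_n^2), i = 1..N.
# The ML estimate is the sample mean; the MMSE estimate (Gaussian prior with
# variance SNR * sig_n^2) shrinks it and approaches ML as SNR -> infinity.
rng = np.random.default_rng(2)
s_true, sig_n, N = 2.0, 1.0, 10
R = s_true + rng.normal(0.0, sig_n, N)

s_ml = R.mean()                               # maximum likelihood estimate

def s_mmse(snr):
    """MMSE estimate: sample mean shrunk by 1 / (1 + 1/(snr*N))."""
    return R.mean() / (1.0 + 1.0 / (snr * N))

gap_low = abs(s_mmse(0.1) - s_ml)             # low SNR: strong shrinkage toward 0
gap_high = abs(s_mmse(1e6) - s_ml)            # high SNR: essentially the ML estimate
```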
![Page 17: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/17.jpg)
Performance: Deterministic Parameters
- The estimate is random. The bias equals the mean of the estimate minus the truth:

$$E[\hat{\mathbf{s}}(\mathbf{r})] = \mathbf{S} + \mathbf{B}(\mathbf{S})$$

- Variance, or covariance matrix, of the estimate:

$$\mathrm{cov}(\hat{\mathbf{s}}(\mathbf{r})) = E\Big[\big(\hat{\mathbf{s}}(\mathbf{r})-\mathbf{S}-\mathbf{B}(\mathbf{S})\big)\big(\hat{\mathbf{s}}(\mathbf{r})-\mathbf{S}-\mathbf{B}(\mathbf{S})\big)^T\Big]$$

- For the example ($r_i = s + w_i$, $w_i$ i.i.d. $\mathcal{N}(0,\sigma_n^2)$, $\hat{s}_{ML} = \frac{1}{N}\sum_i R_i$), the estimate is unbiased and the variance is easily computed:

$$E[\hat{s}_{ML}] = \frac{1}{N}\sum_{i=1}^{N} E[r_i] = s$$

$$E\big[(\hat{s}_{ML}-s)^2\big] = E\left[\left(\frac{1}{N}\sum_{i=1}^{N} w_i\right)^2\right] = \frac{\sigma_n^2}{N}$$

- In many cases, computing the bias and variance may be hard.
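The two claims for the example, zero bias and variance $\sigma_n^2/N$, are easy to confirm by simulation. A minimal Monte Carlo sketch (the trial counts and parameter values are illustrative):

```python
import numpy as np

# Monte Carlo check: the sample-mean ML estimate of the slide's example is
# unbiased and its variance is sig_n^2 / N.
rng = np.random.default_rng(3)
s_true, sig_n, N, trials = 1.0, 2.0, 16, 50_000
R = s_true + rng.normal(0.0, sig_n, (trials, N))  # each row: one experiment
est = R.mean(axis=1)                              # ML estimate per experiment

bias = est.mean() - s_true                        # should be near 0
var = est.var()                                   # should be near sig_n^2 / N
predicted_var = sig_n**2 / N                      # = 0.25 here
```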
![Page 18: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/18.jpg)
Fisher Information and the Cramér-Rao Bound
- Actual performance in terms of variance may be difficult to compute, so bounds on performance are sought.
- The Cramér-Rao bound is a lower bound on the variance of any unbiased estimator. It depends only on the probability distribution for the data, not on any particular estimator.
- Later, we consider algorithms to compute estimates. It is important to note that the Cramér-Rao bound (and related bounds) are independent of the algorithm.
- If the lower bound is achievable, then any estimator that achieves that lower bound is called efficient.
- There is a version of the Cramér-Rao bound for biased estimators as well, but it is not as useful because the bias is not known (otherwise it could be subtracted).
- Performance bounds such as the Cramér-Rao bound may be used for system design and analysis by evaluating how the bounds depend on system parameters.
![Page 19: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/19.jpg)
Fisher Information and the Cramér-Rao Bound

Theorem: Let $\hat{s}(\mathbf{r})$ be any unbiased estimate of $s$. Assume that $\frac{\partial}{\partial s} p_{r|s}(\mathbf{R}|s)$ and $\frac{\partial^2}{\partial s^2} p_{r|s}(\mathbf{R}|s)$ exist and are absolutely integrable. Then

$$\mathrm{var}(\hat{s}(\mathbf{r})) \ge \frac{1}{E\left[\left(\frac{\partial \ln p_{r|s}(\mathbf{r}|s)}{\partial s}\right)^{2}\right]}$$

or, equivalently,

$$\mathrm{var}(\hat{s}(\mathbf{r})) \ge \frac{-1}{E\left[\frac{\partial^{2} \ln p_{r|s}(\mathbf{r}|s)}{\partial s^{2}}\right]}$$
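The "or, equivalently" in the theorem says the two Fisher-information forms agree. A quick numerical sketch for a single sample $r \sim \mathcal{N}(s, \sigma^2)$ (an assumed illustrative case): here $\ln p = -(r-s)^2/(2\sigma^2) + \text{const}$, the score is $(r-s)/\sigma^2$, and the second derivative is the constant $-1/\sigma^2$, so both forms give $1/\sigma^2$ and the bound is $\sigma^2$.

```python
import numpy as np

# Numerical check of the two equivalent Fisher-information forms,
# E[(d ln p/ds)^2] = -E[d^2 ln p/ds^2], for one sample r ~ N(s, sig^2).
rng = np.random.default_rng(4)
s, sig, M = 0.7, 1.3, 400_000
r = rng.normal(s, sig, M)

score = (r - s) / sig**2              # d ln p / ds evaluated at the true s
fi_score = np.mean(score**2)          # expectation of the squared score
fi_curv = 1.0 / sig**2                # -E[d^2 ln p / ds^2], exact here
crb = 1.0 / fi_curv                   # variance bound from one sample = sig^2
```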
![Page 20: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/20.jpg)
Fisher Information and the Cramér-Rao Bound: CRB = 1/FI

Proof: Unbiasedness means

$$E[\hat{s}(\mathbf{r})-s] = \int \big(\hat{s}(\mathbf{R})-s\big)\, p_{r|s}(\mathbf{R}|s)\, d\mathbf{R} = 0$$

Differentiate with respect to $s$:

$$\int \big(\hat{s}(\mathbf{R})-s\big)\,\frac{\partial p_{r|s}(\mathbf{R}|s)}{\partial s}\, d\mathbf{R} - \int p_{r|s}(\mathbf{R}|s)\, d\mathbf{R} = 0$$

Since the second integral equals 1, and using $\frac{\partial p}{\partial s} = p\,\frac{\partial \ln p}{\partial s}$,

$$\int \big(\hat{s}(\mathbf{R})-s\big)\,\frac{\partial \ln p_{r|s}(\mathbf{R}|s)}{\partial s}\, p_{r|s}(\mathbf{R}|s)\, d\mathbf{R} = 1$$

Split $p = \sqrt{p}\,\sqrt{p}$ and apply the Schwarz inequality:

$$\left(\int \big(\hat{s}(\mathbf{R})-s\big)^{2} p_{r|s}(\mathbf{R}|s)\, d\mathbf{R}\right)\left(\int \left(\frac{\partial \ln p_{r|s}(\mathbf{R}|s)}{\partial s}\right)^{2} p_{r|s}(\mathbf{R}|s)\, d\mathbf{R}\right) \ge 1$$

The first factor is $\mathrm{var}(\hat{s}(\mathbf{r}))$ and the second is the Fisher information, which gives the bound.
![Page 21: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/21.jpg)
Comments
- Equality is achieved in the CRB if the Schwarz inequality holds with equality, i.e., if

$$\frac{\partial \ln p_{r|s}(\mathbf{R}|s)}{\partial s} = k(s)\big(\hat{s}(\mathbf{R})-s\big)$$

- If an estimator exists that achieves equality (is efficient), then the ML estimator is efficient: evaluating the condition at $s = \hat{s}_{ML}(\mathbf{R})$, where the derivative is zero,

$$0 = \left.\frac{\partial \ln p_{r|s}(\mathbf{R}|s)}{\partial s}\right|_{s=\hat{s}_{ML}} = k(\hat{s}_{ML})\big(\hat{s}(\mathbf{R})-\hat{s}_{ML}(\mathbf{R})\big)$$

so $\hat{s}(\mathbf{R}) = \hat{s}_{ML}(\mathbf{R})$ provided $k(\hat{s}_{ML}(\mathbf{R})) \ne 0$.

- If no efficient estimator exists, the variance may be arbitrarily larger than the CRB.
- The second-derivative form of the Fisher information is easily found.
- Biased estimator:

$$\mathrm{var}(\hat{s}(\mathbf{r})) \ge \frac{\left(1+\frac{dB(s)}{ds}\right)^{2}}{E\left[\left(\frac{\partial \ln p_{r|s}(\mathbf{r}|s)}{\partial s}\right)^{2}\right]}$$

- For biased estimators the bound changes; the variance of a biased estimator may be lower than that of an unbiased estimator. Consider the zero estimator.
![Page 22: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/22.jpg)
Maximum Likelihood Estimation
- i.i.d. measurements of a function of a deterministic variable in Gaussian noise:

$$r_i = g_i(s) + w_i,\quad i = 1,2,\ldots,N,\quad w_i \text{ i.i.d. } \mathcal{N}(0,\sigma_n^2),\quad s\in\mathbb{R}$$

$$p_{r|s}(\mathbf{R}|S) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(R_i-g_i(S))^2}{2\sigma_n^2}\right)$$

- Solve the likelihood equation using some preferred solution technique:

$$\left.\frac{\partial \ln p_{r|s}(\mathbf{R}|S)}{\partial S}\right|_{S=\hat{s}_{ML}} = \left.\sum_{i=1}^{N} \frac{R_i-g_i(S)}{\sigma_n^2}\,\frac{dg_i(S)}{dS}\right|_{S=\hat{s}_{ML}} = 0$$

- The Fisher information is easily computed:

$$J = E\left[\left(\frac{\partial \ln p_{r|s}(\mathbf{r}|S)}{\partial S}\right)^{2}\right] = \frac{1}{\sigma_n^2}\sum_{i=1}^{N}\left(\frac{dg_i(S)}{dS}\right)^{2}$$

- Note the dependence of performance on the true value of the parameter.
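The Fisher information formula above can be tested against a Monte Carlo average of the squared score. This sketch uses an assumed nonlinear model $g_i(s) = \sin(s\,i)$ (chosen for illustration, not one of the slide's examples):

```python
import numpy as np

# Check J = (1/sig^2) * sum_i (dg_i/dS)^2 against the Monte Carlo mean of the
# squared score, for an assumed nonlinear model g_i(s) = sin(s * i).
rng = np.random.default_rng(5)
s, sig, N, M = 0.4, 0.8, 6, 300_000
i = np.arange(1, N + 1)
g = np.sin(s * i)                     # g_i(s) at the true parameter
dg = i * np.cos(s * i)                # dg_i/dS at the true parameter

J_formula = (dg**2).sum() / sig**2    # closed-form Fisher information

w = rng.normal(0.0, sig, (M, N))      # M independent noise realizations
R = g + w
score = ((R - g) * dg).sum(axis=1) / sig**2   # d ln p/dS at the true s, per trial
J_mc = np.mean(score**2)              # Monte Carlo Fisher information
```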
![Page 23: ESE 524 Detection and Estimation Theory](https://reader033.vdocuments.net/reader033/viewer/2022042107/6256d67b3496cd37cc224067/html5/thumbnails/23.jpg)
Maximum Likelihood Estimation: Example Computations

$$J = E\left[\left(\frac{\partial \ln p_{r|s}(\mathbf{r}|S)}{\partial S}\right)^{2}\right] = \frac{1}{\sigma_n^2}\sum_{i=1}^{N}\left(\frac{dg_i(S)}{dS}\right)^{2}$$

- Amplitude: the Fisher information is the signal energy divided by the noise power:

$$g_i(s) = s,\quad \frac{dg_i(S)}{dS} = 1,\quad J = \frac{N}{\sigma_n^2}$$

- Frequency: the signal energy is proportional to the square of the number of cycles:

$$g_i(s) = \cos\!\big(2\pi s (i-1)/M\big),\quad \frac{dg_i(S)}{dS} = -\frac{2\pi (i-1)}{M}\sin\!\big(2\pi S(i-1)/M\big)$$

$$J = \frac{4\pi^2}{\sigma_n^2 M^2}\sum_{i=1}^{N} (i-1)^{2} \sin^{2}\!\big(2\pi S(i-1)/M\big) \approx \frac{2\pi^2}{\sigma_n^2 M^2}\sum_{i=1}^{N}(i-1)^{2}$$

- Exponent (positive vs. negative exponent):

$$g_i(s) = e^{s(i-1)},\quad \frac{dg_i(S)}{dS} = (i-1)\,e^{S(i-1)},\quad J = \frac{1}{\sigma_n^2}\sum_{i=1}^{N}(i-1)^{2} e^{2S(i-1)}$$

The sums follow from the geometric series and its derivatives with respect to $\alpha$:

$$\sum_{i=0}^{N-1} e^{\alpha i} = \frac{1-e^{\alpha N}}{1-e^{\alpha}},\qquad \sum_{i=0}^{N-1} i\, e^{\alpha i} = \frac{\partial}{\partial\alpha}\,\frac{1-e^{\alpha N}}{1-e^{\alpha}},\qquad \sum_{i=0}^{N-1} i^{2} e^{\alpha i} = \frac{\partial^{2}}{\partial\alpha^{2}}\,\frac{1-e^{\alpha N}}{1-e^{\alpha}}$$