CHAPTER 2. Cramer-Rao lower bound

• Given an estimation problem, what is the variance of the best possible estimator?

• This quantity is given by the Cramer-Rao lower bound (CRLB), which we will study in this section.

• As a side product, the CRLB theorem also gives a method for finding the best estimator.

• However, this is not very practical. There exist easier methods, which will be studied in later chapters.


Discovering the CRLB idea

• Consider again the problem of estimating the DC level A in the model

x[n] = A + w[n],

where w[n] ∼ N(0, σ²).

• For simplicity, suppose that we're using only a single observation to do this: x[0].

• Now the pdf of x[0] is

$$p(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\,(x[0]-A)^2\right]$$


Discovering the CRLB idea (cont.)

• Once we have observed x[0], say for example x[0] = 3, some values of A are more likely than others.

• Actually, the pdf of A has the same form as the pdf of x[0]:

$$\text{pdf of } A = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\,(3-A)^2\right]$$

This pdf is plotted below for two values of σ².


Discovering the CRLB idea (cont.)

[Figure: p(x[0] = 3; A) plotted versus A for σ² = 1/3 (top) and σ² = 1 (bottom); the σ² = 1/3 curve is much narrower and more sharply peaked around A = 3.]

• Observation: in the first case (σ² = 1/3) we will probably get more accurate results.
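The two curves above are easy to reproduce; a minimal sketch in Python using numpy and matplotlib (the observation x[0] = 3 and the two noise variances are the values from the figure):

```python
import numpy as np
import matplotlib.pyplot as plt

x0 = 3.0                      # the single observation x[0]
A = np.linspace(0, 6, 500)    # candidate values of the parameter A

for var in (1/3, 1.0):        # the two noise variances from the figure
    p = np.exp(-(x0 - A)**2 / (2*var)) / np.sqrt(2*np.pi*var)
    plt.plot(A, p, label=f"$\\sigma^2$ = {var:.2f}")

plt.xlabel("Value of A")
plt.ylabel("p(x[0]=3; A)")
plt.legend()
plt.show()
```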


Discovering the CRLB idea

• If the pdf is viewed as a function of the unknown parameter (with x fixed), it is called the likelihood function.

• Often the likelihood function has an exponential form. Then it's usual to take the natural logarithm to get rid of the exponential. Note that the location of the maximum of the new log-likelihood function does not change, since the logarithm is monotonically increasing.

• The "sharpness" of the likelihood function determines how accurately we can estimate the parameter. For example, the upper pdf above is the easier case. If a single sample is observed as x[0] = A + w[0], then we can expect a better estimate if σ² is small.


Discovering the CRLB idea (cont.)

• How to quantify the sharpness?

• We could try to guess a good estimator, and find its variance.

• But this gives information only about our estimator, not about the whole estimation problem (and all possible estimators).

• Is there any measure that would be common for all possible estimators for a specific estimation problem? Specifically, we'd like to find the smallest possible variance.

• The second derivative of the likelihood function (or log-likelihood) is one alternative for measuring the sharpness of a function.


Discovering the CRLB idea (cont.)

• Let's see what it is in our simple case. Note that in this case the minimum variance of all estimators is σ², since we're using only one sample (we can't prove it, but it seems obvious).

• From the likelihood function

$$p(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\,(x[0]-A)^2\right]$$

we get the log-likelihood function:

$$\ln p(x[0]; A) = -\ln\sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2}\,(x[0]-A)^2$$


Discovering the CRLB idea (cont.)

• The derivative with respect to A is:

$$\frac{\partial \ln p(x[0]; A)}{\partial A} = \frac{1}{\sigma^2}\,(x[0]-A)$$

• And the second derivative:

$$\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2} = -\frac{1}{\sigma^2}$$


Discovering the CRLB idea (cont.)

• Since σ² is the smallest possible variance, in this case we have an alternative way of finding the minimum variance of all estimators:

$$\text{minimum variance} = \sigma^2 = \frac{1}{-\dfrac{\partial^2 \ln p(x[0]; A)}{\partial A^2}}$$

• Does our hypothesis hold in general?
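The derivatives in this simple case can be checked symbolically; a minimal sketch using sympy (the symbol names are illustrative):

```python
import sympy as sp

x0, A = sp.symbols('x0 A', real=True)
sigma2 = sp.symbols('sigma2', positive=True)

# Log-likelihood of a single Gaussian observation x[0] with mean A
loglik = -sp.log(sp.sqrt(2*sp.pi*sigma2)) - (x0 - A)**2 / (2*sigma2)

d2 = sp.diff(loglik, A, 2)
print(d2)                  # -1/sigma2
print(sp.simplify(1/-d2))  # sigma2, i.e. the claimed minimum variance
```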


Discovering the CRLB idea (cont.)

• The answer is yes, with the following notes.

• Note 1: if the function depends on the data x, take the expectation over all x.

• Note 2: if the function depends on the parameter θ, evaluate the derivative at the true value of θ.

• Thus, we have the following rule:

$$\text{minimum variance of any unbiased estimator} = \frac{1}{-E\left[\dfrac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}$$

• The following two slides state the above in mathematical terms. Additionally, there is a rule for finding an estimator reaching the CRLB.


Cramer-Rao Lower Bound - theorem

Theorem: CRLB - scalar parameter

It is assumed that the pdf p(x; θ) satisfies the "regularity" condition

$$E\left[\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right] = 0, \quad \forall\theta \qquad (1)$$

where the expectation is taken with respect to p(x; θ). Then, the variance of any unbiased estimator θ̂ must satisfy

$$\mathrm{var}(\hat{\theta}) \geq \frac{1}{-E\left[\dfrac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]} \qquad (2)$$


Cramer-Rao Lower Bound - theorem (cont.)

where the derivative is evaluated at the true value of θ and the expectation is taken with respect to p(x; θ). Furthermore, an unbiased estimator may be found that attains the bound for all θ if and only if

$$\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} = I(\theta)\,(g(\mathbf{x}) - \theta) \qquad (3)$$

for some functions g and I. That estimator, which is the MVU estimator, is θ̂ = g(x), and the minimum variance is 1/I(θ).

Proof. Three pages of integrals; see Appendix 3A of Kay's book.


CRLB Example 1: estimation of DC level in WGN

• Example 1. DC level in white Gaussian noise with N data points:

x[n] = A + w[n],  n = 0, 1, …, N−1

What's the minimum variance of any unbiased estimator using N samples?


CRLB Example 1: estimation of DC level in WGN (cont.)

• Now the pdf (and the likelihood function) is the product of the individual densities:

$$p(\mathbf{x}; A) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\,(x[n]-A)^2\right] = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right].$$


CRLB Example 1: estimation of DC level in WGN (cont.)

• The log-likelihood function is now

$$\ln p(\mathbf{x}; A) = -\ln\left[(2\pi\sigma^2)^{N/2}\right] - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2.$$

• The first derivative is

$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = \frac{N}{\sigma^2}\,(\bar{x}-A),$$

where x̄ denotes the sample mean.


CRLB Example 1: estimation of DC level in WGN (cont.)

• The second derivative has a simple form:

$$\frac{\partial^2 \ln p(\mathbf{x}; A)}{\partial A^2} = -\frac{N}{\sigma^2}$$

• Therefore, the minimum variance of any unbiased estimator is

$$\mathrm{var}(\hat{A}) \geq \frac{\sigma^2}{N}$$

• In Lecture 1 we saw that this variance can be achieved using the sample mean estimator. Therefore, the sample mean is an MVUE for this problem: no better estimators exist.
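A quick Monte Carlo check of this bound; a minimal sketch (N, A, σ² and the trial count are arbitrary illustrative choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, A, sigma2, trials = 20, 1.5, 2.0, 100_000

# Each row is one realization of x[n] = A + w[n], n = 0..N-1
x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)                  # sample-mean estimator per trial

print("empirical var: ", A_hat.var())   # ~0.1
print("CRLB sigma^2/N:", sigma2 / N)    # 0.1
```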


CRLB Example 1: estimation of DC level in WGN (cont.)

• Note that we could have arrived at the optimal estimator also using the CRLB theorem, which says that an unbiased estimator attaining the CRLB exists if and only if ∂ ln p(x; A)/∂A can be factored as

$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = I(A)\,(g(\mathbf{x})-A).$$


CRLB Example 1: estimation of DC level in WGN (cont.)

• Earlier we saw that

$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = \frac{N}{\sigma^2}\,(\bar{x}-A),$$

which means that this factorization exists with

$$I(A) = \frac{N}{\sigma^2} \quad\text{and}\quad g(\mathbf{x}) = \bar{x}.$$

Furthermore, g(x) = x̄ is the MVUE and 1/I(A) is its variance.


CRLB Example 2: estimation of the phase of a sinusoid

• Example 2. Phase estimation: estimate the phase φ of a sinusoid embedded in WGN. Suppose that the amplitude A and frequency f0 are known.

• We measure N data points assuming the model:

x[n] = A cos(2πf0 n + φ) + w[n],  n = 0, 1, …, N−1

How accurately is it possible to estimate φ? Can we find this estimator?


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• Let's start with the pdf:

$$p(\mathbf{x}; \phi) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big[x[n] - A\cos(2\pi f_0 n + \phi)\big]^2\right\}$$


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• The derivative of its logarithm is

$$\begin{aligned}
\frac{\partial \ln p(\mathbf{x}; \phi)}{\partial \phi}
&= -\frac{1}{\sigma^2}\sum_{n=0}^{N-1}\big[x[n] - A\cos(2\pi f_0 n + \phi)\big]\, A\sin(2\pi f_0 n + \phi) \\
&= -\frac{A}{\sigma^2}\sum_{n=0}^{N-1}\Big[x[n]\sin(2\pi f_0 n + \phi) - A\underbrace{\cos(2\pi f_0 n + \phi)\sin(2\pi f_0 n + \phi)}_{\frac{1}{2}\sin(2(2\pi f_0 n + \phi))}\Big] \\
&= -\frac{A}{\sigma^2}\sum_{n=0}^{N-1}\Big[x[n]\sin(2\pi f_0 n + \phi) - \frac{A}{2}\sin(4\pi f_0 n + 2\phi)\Big]
\end{aligned}$$


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• And the second derivative:

$$\frac{\partial^2 \ln p(\mathbf{x}; \phi)}{\partial \phi^2} = -\frac{A}{\sigma^2}\sum_{n=0}^{N-1}\big[x[n]\cos(2\pi f_0 n + \phi) - A\cos(4\pi f_0 n + 2\phi)\big]$$


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• This time the second derivative depends on the data x. Therefore, we have to take the expectation:

$$\begin{aligned}
E\left[\frac{\partial^2 \ln p(\mathbf{x}; \phi)}{\partial \phi^2}\right]
&= E\left[-\frac{A}{\sigma^2}\sum_{n=0}^{N-1}\big[x[n]\cos(2\pi f_0 n + \phi) - A\cos(4\pi f_0 n + 2\phi)\big]\right] \\
&= -\frac{A}{\sigma^2}\sum_{n=0}^{N-1}\big[E[x[n]]\cos(2\pi f_0 n + \phi) - A\cos(4\pi f_0 n + 2\phi)\big]
\end{aligned}$$


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• The expectation simplifies to

$$E[x[n]] = A\cos(2\pi f_0 n + \phi) + E[w[n]] = A\cos(2\pi f_0 n + \phi),$$

and thus we have the following form

$$\begin{aligned}
E\left[\frac{\partial^2 \ln p(\mathbf{x}; \phi)}{\partial \phi^2}\right]
&= -\frac{A}{\sigma^2}\sum_{n=0}^{N-1}\big[A\cos^2(2\pi f_0 n + \phi) - A\cos(4\pi f_0 n + 2\phi)\big] \\
&= -\frac{A^2}{\sigma^2}\sum_{n=0}^{N-1}\big[\cos^2(2\pi f_0 n + \phi) - \cos(4\pi f_0 n + 2\phi)\big].
\end{aligned}$$


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• We can evaluate this approximately by noting that the squared cosine has the form

$$\cos^2(x) = \frac{1}{2} + \frac{1}{2}\cos(2x).$$


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• Therefore,

$$\begin{aligned}
E\left[\frac{\partial^2 \ln p(\mathbf{x}; \phi)}{\partial \phi^2}\right]
&= -\frac{A^2}{\sigma^2}\sum_{n=0}^{N-1}\Big[\frac{1}{2} + \frac{1}{2}\cos(4\pi f_0 n + 2\phi) - \cos(4\pi f_0 n + 2\phi)\Big] \\
&= -\frac{A^2}{\sigma^2}\sum_{n=0}^{N-1}\Big[\frac{1}{2} - \frac{1}{2}\cos(4\pi f_0 n + 2\phi)\Big] \\
&= -\frac{NA^2}{2\sigma^2} + \frac{A^2}{2\sigma^2}\sum_{n=0}^{N-1}\cos(4\pi f_0 n + 2\phi)
\end{aligned}$$


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• As N grows, the effect of the sum of cosines becomes negligible compared to the first term (unless f0 = 0 or f0 = 1/2, which are pathological cases anyway).

• Therefore, we can approximate roughly:

$$\mathrm{var}(\hat{\phi}) \;\underset{\text{approx.}}{\geq}\; \frac{1}{\dfrac{NA^2}{2\sigma^2}} = \frac{2\sigma^2}{NA^2}$$

for any unbiased estimator φ̂.


CRLB Example 2: estimation of the phase of a sinusoid (cont.)

• Note that in this case the factorization of the CRLB theorem cannot be done. This means that an unbiased estimator achieving the bound does not exist. However, an MVUE may still exist.

• Note also that we could have evaluated the exact bound numerically for a fixed f0. Here we approximated to get a nice-looking formula.
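For a fixed f0, the exact bound from the derivation above is easy to evaluate numerically; a minimal sketch (the values of N, A, σ², f0 and φ are arbitrary illustrative choices, not values from the text):

```python
import numpy as np

N, A, sigma2, f0, phi = 10, 1.0, 1.0, 0.13, 0.4   # hypothetical test values

n = np.arange(N)
# Exact expected Fisher information from the derivation above:
# I(phi) = (A^2/sigma^2) * sum_n [1/2 - 1/2 cos(4 pi f0 n + 2 phi)]
I_exact = (A**2 / sigma2) * np.sum(0.5 - 0.5 * np.cos(4*np.pi*f0*n + 2*phi))

print("exact CRLB:      ", 1 / I_exact)
print("approximate CRLB:", 2 * sigma2 / (N * A**2))
```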


CRLB summary

• The Cramer-Rao inequality provides a lower bound for the estimation error variance.

• The minimum attainable variance is often larger than the CRLB.

• We need to know the pdf to evaluate the CRLB. Often we don't know this information and cannot evaluate the bound. If the data is multivariate Gaussian or i.i.d. with a known distribution, we can evaluate it.¹

• If an estimator reaches the CRLB, it is called efficient.

• The MVUE may or may not be efficient. If it is not, we have to use other tools than the CRLB to find it (see Chapter 5).

• It's not guaranteed that the MVUE exists or is realizable.

¹ i.i.d. is a widely used abbreviation for independent and identically distributed. This means that such random variables all have the same probability distribution and they are mutually independent.


Fisher information

• The CRLB states that

$$\mathrm{var}(\hat{\theta}) \geq \frac{1}{-E\left[\dfrac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}$$

• The denominator is so significant that it has its own name, Fisher information, and it is denoted by

$$I(\theta) = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]$$


Fisher information (cont.)

• In the proof of the CRLB, I(θ) also appears in another form:

$$I(\theta) = E\left[\left(\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right)^2\right]$$

• The Fisher information is

1. nonnegative
2. additive for independent observations, i.e., if the samples are i.i.d.,

$$I(\theta) = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right] = -\sum_{n=0}^{N-1} E\left[\frac{\partial^2 \ln p(x[n]; \theta)}{\partial \theta^2}\right]$$


CRLB for a general WGN case

• Since we often assume that the signal is embedded in white Gaussian noise, it is worthwhile to derive a general formula for the CRLB in this case.

• Suppose that a deterministic signal depending on an unknown parameter θ is observed in WGN:

x[n] = s[n] + w[n],  n = 0, 1, …, N−1,

where only s[n] depends on θ.


CRLB for a general WGN case (cont.)

• It can be shown that the variance of any unbiased estimator θ̂ satisfies

$$\mathrm{var}(\hat{\theta}) \geq \frac{\sigma^2}{\displaystyle\sum_{n=0}^{N-1}\left(\frac{\partial s[n]}{\partial \theta}\right)^2}$$

• Note that if s[n] is constant, s[n] = A, we have the familiar DC-level-in-WGN case, and var(Â) ≥ σ²/N, as expected.
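In code, this bound is one line once the signal derivative is available. A minimal sketch of a hypothetical helper crlb_wgn (the central-difference gradient is just one convenient way to obtain ∂s[n]/∂θ; the name and step size are assumptions of this sketch):

```python
import numpy as np

def crlb_wgn(s, theta, sigma2, eps=1e-6):
    """CRLB for a scalar parameter theta of a deterministic signal in WGN.

    s      : function mapping theta to the signal samples s[0], ..., s[N-1]
    theta  : parameter value at which the bound is evaluated
    sigma2 : noise variance
    """
    ds = (s(theta + eps) - s(theta - eps)) / (2 * eps)  # ds[n]/dtheta
    return sigma2 / np.sum(ds**2)

# Sanity check with the DC level, s[n] = A: the bound must equal sigma2/N.
print(crlb_wgn(lambda A: np.full(10, A), 1.0, 2.0))     # 0.2 = 2.0/10
```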


CRLB for a general WGN case—example

• Suppose we observe x[n] assuming the model

x[n] = A cos(2πf0 n + φ) + w[n],

where A and φ are known, and f0 is the unknown parameter to be estimated. Moreover, w[n] ∼ N(0, σ²).

• Using the general rule for the CRLB in WGN, we only need to evaluate

$$\sum_{n=0}^{N-1}\left(\frac{\partial s[n]}{\partial f_0}\right)^2$$


CRLB for a general WGN case—example (cont.)

• Now, the derivative equals

$$\frac{\partial s[n]}{\partial f_0} = \frac{\partial}{\partial f_0}\big(A\cos(2\pi f_0 n + \phi)\big) = -2\pi A n\sin(2\pi f_0 n + \phi).$$

• Thus, the variance of any unbiased estimator f̂0 satisfies

$$\mathrm{var}(\hat{f}_0) \geq \frac{\sigma^2}{(2\pi A)^2 \displaystyle\sum_{n=0}^{N-1} n^2\sin^2(2\pi f_0 n + \phi)}$$

• The plot of this bound for different frequencies is shown below.


CRLB for a general WGN case—example (cont.)

• Here N = 10, φ = 0, A² = σ² = 1.

[Figure: Cramer-Rao lower bound versus frequency f0 for N = 10; the bound varies between about 1×10⁻⁴ and 5×10⁻⁴ across the band.]

• Below is the case with N = 50.

[Figure: Cramer-Rao lower bound versus frequency f0 for N = 50; the bound is on the order of 10⁻⁶.]

• There are a number of "easy" frequencies.
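The curves above can be regenerated with a short script; a minimal sketch that simply evaluates the closed-form bound on a frequency grid (the grid endpoints avoid the pathological f0 = 0 and f0 = 1/2):

```python
import numpy as np
import matplotlib.pyplot as plt

sigma2, A, phi = 1.0, 1.0, 0.0

for N in (10, 50):
    n = np.arange(N)
    f0 = np.linspace(0.01, 0.49, 400)
    # var(f0_hat) >= sigma2 / ((2 pi A)^2 sum_n n^2 sin^2(2 pi f0 n + phi))
    denom = (2*np.pi*A)**2 * np.array(
        [np.sum(n**2 * np.sin(2*np.pi*f*n + phi)**2) for f in f0])
    plt.plot(f0, sigma2 / denom, label=f"N = {N}")

plt.xlabel("Frequency $f_0$")
plt.ylabel("Cramer-Rao lower bound")
plt.yscale("log")          # the N = 10 and N = 50 curves differ by ~100x
plt.legend()
plt.show()
```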


CRLB - Transformation of parameters

• Suppose we want to estimate α = g(θ) through estimating θ first. Is it enough just to apply the transformation to the estimation result?

• As an example, instead of estimating the DC level A in WGN:

x[n] = A + w[n],

we might be interested in estimating the power A². What is the CRLB for any estimator of A²?


CRLB - Transformation of parameters (cont.)

• It can be shown (Kay, Appendix 3A) that the CRLB is

$$\mathrm{var}(\hat{\alpha}) \geq \frac{\left(\dfrac{\partial g}{\partial \theta}\right)^2}{-E\left[\dfrac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]}.$$

• Thus, it's the CRLB of θ, but multiplied by (∂g/∂θ)².

• For example, in the above case of estimating α = A², the CRLB becomes:

$$\mathrm{var}(\widehat{A^2}) \geq \frac{(2A)^2}{N/\sigma^2} = \frac{4A^2\sigma^2}{N}.$$


CRLB - Transformation of parameters (cont.)

• However, the MVU estimator of A does not transform to the MVU estimator of A².

• This is easy to see by noting that it's not even unbiased:

$$E(\bar{x}^2) = E(\bar{x})^2 + \mathrm{var}(\bar{x}) = A^2 + \frac{\sigma^2}{N} \neq A^2$$

• The efficiency of an estimator is destroyed by a nonlinear transformation.

• The efficiency of an estimator is, however, maintained for affine transformations.
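The bias is easy to confirm by simulation; a minimal sketch (A, σ², N and the trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma2, N, trials = 1.0, 1.0, 10, 500_000

# Sample mean of each trial, then square it to estimate A^2
xbar = (A + rng.normal(0, np.sqrt(sigma2), (trials, N))).mean(axis=1)
print(np.mean(xbar**2))     # ~ 1.1, not 1.0: the estimator is biased
print(A**2 + sigma2 / N)    # 1.1, matching E(xbar^2) = A^2 + sigma2/N
```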


CRLB - Transformation of parameters (cont.)

• Additionally, the efficiency of an estimator is approximately preserved over nonlinear transformations if the data record is large enough.

• For example, α̂ = x̄² above is asymptotically unbiased. Since x̄ ∼ N(A, σ²/N), it can be shown that the variance

$$\mathrm{var}(\bar{x}^2) = \frac{4A^2\sigma^2}{N} + \frac{2\sigma^4}{N^2} \to \frac{4A^2\sigma^2}{N}, \quad \text{as } N \to \infty.$$

• Thus, the variance approaches the CRLB as N → ∞, and α̂ = x̄² is asymptotically efficient.
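The same simulation, extended to the variance, shows the convergence toward the CRLB 4A²σ²/N; a sketch with arbitrary parameter choices:

```python
import numpy as np

rng = np.random.default_rng(1)
A, sigma2, trials = 1.0, 1.0, 100_000

for N in (10, 100):
    x = A + rng.normal(0, np.sqrt(sigma2), (trials, N))
    alpha_hat = x.mean(axis=1)**2   # estimate A^2 by squaring the sample mean
    exact = 4*A**2*sigma2/N + 2*sigma2**2/N**2
    crlb = 4*A**2*sigma2/N
    print(N, alpha_hat.var(), exact, crlb)  # empirical vs exact vs CRLB
```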


CRLB - Vector Parameter

Theorem: CRLB - vector parameter

It is assumed that the pdf p(x; θ) satisfies the "regularity" condition

$$E\left[\frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right] = \mathbf{0}, \quad \forall\boldsymbol{\theta} \qquad (4)$$

where the expectation is taken with respect to p(x; θ). Then, the covariance matrix of any unbiased estimator θ̂ satisfies:

$$\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}^{-1}(\boldsymbol{\theta}) \geq 0 \qquad (5)$$


CRLB - Vector Parameter (cont.)

where "≥ 0" is interpreted as positive semidefinite. The Fisher information matrix I(θ) is given as

$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \theta_i\,\partial \theta_j}\right]$$

where the derivatives are evaluated at the true value of θ and the expectation is taken with respect to p(x; θ). Furthermore, an unbiased estimator may be found that attains the bound, in that C_θ̂ = I⁻¹(θ), if and only if

$$\frac{\partial \ln p(\mathbf{x}; \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{I}(\boldsymbol{\theta})\,(\mathbf{g}(\mathbf{x}) - \boldsymbol{\theta}) \qquad (6)$$


CRLB - Vector Parameter (cont.)

for some function g and some m × m matrix I (m = dim(θ)). That estimator, which is the MVU estimator, is θ̂ = g(x), and its covariance matrix is I⁻¹(θ).


The special case of WGN

• Again, the important special case of WGN is easier than the general case.

• Suppose we have observed the data x = [x[0], x[1], …, x[N−1]]ᵀ, and assume the model

x[n] = s[n] + w[n],

where s[n] depends on the parameter vector θ = [θ1, θ2, …, θm]ᵀ and w[n] is WGN.

• In this case the Fisher information matrix becomes

$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial \theta_i}\,\frac{\partial s[n]}{\partial \theta_j}$$
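This formula is straightforward to turn into code. A minimal sketch of a hypothetical fisher_wgn helper that builds the m × m matrix from a signal model using central differences (the function name, step size and interface are assumptions of this sketch, not from the text):

```python
import numpy as np

def fisher_wgn(s, theta, sigma2, eps=1e-6):
    """Fisher information matrix of a deterministic signal s(theta) in WGN.

    s      : function mapping a length-m parameter vector to the N samples
    theta  : parameter vector at which the matrix is evaluated
    sigma2 : noise variance
    """
    theta = np.asarray(theta, dtype=float)
    m = theta.size
    J = np.empty((s(theta).size, m))          # J[n, j] = ds[n]/dtheta_j
    for j in range(m):
        step = np.zeros(m)
        step[j] = eps
        J[:, j] = (s(theta + step) - s(theta - step)) / (2 * eps)
    return J.T @ J / sigma2  # [I]_ij = (1/sigma2) sum_n ds/dtheta_i ds/dtheta_j
```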


CRLB with vector parameters—Example

• Consider the sinusoidal parameter estimation seen earlier, but this time with A, f0 and φ all unknown:

x[n] = A cos(2πf0 n + φ) + w[n],  n = 0, 1, …, N−1,

with w[n] ∼ N(0, σ²). We cannot derive an estimator yet, but let's see what the bounds are.

• Now the parameter vector θ is

$$\boldsymbol{\theta} = \begin{bmatrix} A \\ f_0 \\ \phi \end{bmatrix}.$$


CRLB with vector parameters—Example (cont.)

• Because the noise is white Gaussian, we can apply the WGN form of the CRLB from the previous slide (easier than the general form). The Fisher information matrix I(θ) is a 3 × 3 matrix given by

$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial \theta_i}\,\frac{\partial s[n]}{\partial \theta_j}$$

• Let us simplify the notation by using α = 2πf0 n + φ when possible.


CRLB with vector parameters—Example (cont.)

$$\begin{aligned}
[\mathbf{I}(\boldsymbol{\theta})]_{11} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial A}\frac{\partial s[n]}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\cos^2\alpha \approx \frac{N}{2\sigma^2} \\
[\mathbf{I}(\boldsymbol{\theta})]_{12} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial A}\frac{\partial s[n]}{\partial f_0} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} 2\pi A n\cos\alpha\sin\alpha = -\frac{\pi A}{\sigma^2}\sum_{n=0}^{N-1} n\sin 2\alpha \approx 0 \\
[\mathbf{I}(\boldsymbol{\theta})]_{13} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial A}\frac{\partial s[n]}{\partial \phi} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} A\cos\alpha\sin\alpha = -\frac{A}{2\sigma^2}\sum_{n=0}^{N-1}\sin 2\alpha \approx 0 \\
[\mathbf{I}(\boldsymbol{\theta})]_{22} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial f_0}\frac{\partial s[n]}{\partial f_0} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(2\pi A n\sin\alpha)^2 \approx \frac{(2\pi A)^2}{2\sigma^2}\sum_{n=0}^{N-1} n^2 = \frac{(2\pi A)^2 N(N-1)(2N-1)}{12\sigma^2} \\
[\mathbf{I}(\boldsymbol{\theta})]_{23} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial f_0}\frac{\partial s[n]}{\partial \phi} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} 2\pi A^2 n\sin^2\alpha \approx \frac{\pi A^2}{\sigma^2}\sum_{n=0}^{N-1} n = \frac{\pi A^2 N(N-1)}{2\sigma^2} \\
[\mathbf{I}(\boldsymbol{\theta})]_{33} &= \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\frac{\partial s[n]}{\partial \phi}\frac{\partial s[n]}{\partial \phi} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} A^2\sin^2\alpha \approx \frac{NA^2}{2\sigma^2}
\end{aligned}$$


CRLB with vector parameters—Example (cont.)

• All approximations are based on the following trigonometric identities:

$$\cos 2\alpha = 2\cos^2\alpha - 1 \;\Rightarrow\; \cos^2\alpha = \frac{1}{2} + \frac{1}{2}\cos 2\alpha$$

$$\sin 2\alpha = 2\sin\alpha\cos\alpha \;\Rightarrow\; \sin\alpha\cos\alpha = \frac{1}{2}\sin 2\alpha$$

A sum of sines or cosines becomes negligible for large N.


CRLB with vector parameters—Example (cont.)

• Since the Fisher information matrix is symmetric, we have all we need to form it:

$$\mathbf{I}(\boldsymbol{\theta}) = \frac{1}{\sigma^2}\begin{bmatrix}
\frac{N}{2} & 0 & 0 \\
0 & \frac{(\pi A)^2 N(N-1)(2N-1)}{3} & \frac{\pi A^2 N(N-1)}{2} \\
0 & \frac{\pi A^2 N(N-1)}{2} & \frac{NA^2}{2}
\end{bmatrix}$$

• Inversion with the symbolic math toolbox gives the inverse:

$$\mathbf{I}^{-1}(\boldsymbol{\theta}) = \sigma^2\begin{bmatrix}
\frac{2}{N} & 0 & 0 \\
0 & \frac{6}{A^2 N(N^2-1)\pi^2} & -\frac{6}{A^2(N+N^2)\pi} \\
0 & -\frac{6}{A^2(N+N^2)\pi} & \frac{4(2N-1)}{A^2(N+N^2)}
\end{bmatrix}$$


CRLB with vector parameters—Example (cont.)

• The CRLB theorem states that the covariance matrix C_θ̂ of any unbiased estimator satisfies

$$\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}^{-1}(\boldsymbol{\theta}) \geq 0,$$

which means that the matrix on the left is positive semidefinite.

• In particular, all diagonal elements of a positive semidefinite matrix are non-negative, that is,

$$\left[\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{ii} \geq 0.$$


CRLB with vector parameters—Example (cont.)

• In other words, the diagonal elements of the inverse of the Fisher information matrix are the bounds for the three estimators:

$$\mathrm{var}(\hat{\theta}_i) = [\mathbf{C}_{\hat{\boldsymbol{\theta}}}]_{ii} \geq \left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{ii}$$

• The end result is that

$$\mathrm{var}(\hat{A}) \geq \frac{2\sigma^2}{N}, \qquad \mathrm{var}(\hat{f}_0) \geq \frac{6\sigma^2}{A^2 N(N^2-1)\pi^2}, \qquad \mathrm{var}(\hat{\phi}) \geq \frac{4\sigma^2(2N-1)}{A^2(N+N^2)}$$
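These closed forms can be cross-checked numerically with the fisher_wgn sketch from the WGN slide above; since the closed forms rely on the large-N approximations, the agreement is only approximate (all parameter values below are arbitrary test choices):

```python
import numpy as np
# assumes the fisher_wgn() sketch from the WGN slide is in scope

N, sigma2 = 50, 1.0
A, f0, phi = 1.0, 0.12, 0.3
n = np.arange(N)

s = lambda th: th[0] * np.cos(2*np.pi*th[1]*n + th[2])   # th = [A, f0, phi]
I_inv = np.linalg.inv(fisher_wgn(s, [A, f0, phi], sigma2))

print(np.diag(I_inv))                       # numerically exact bounds
print(2*sigma2/N,                           # var(A_hat) bound
      6*sigma2/(A**2*N*(N**2-1)*np.pi**2),  # var(f0_hat) bound
      4*sigma2*(2*N-1)/(A**2*(N+N**2)))     # var(phi_hat) bound
```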


CRLB with vector parameters—Example (cont.)

• It is interesting to note that the bounds are higher than when estimating the parameters individually.

• For example, the minimum variance of f̂0 is significantly higher than when the other parameters are known.

• The plot below shows the approximated lower bound in this case (straight line) compared to the earlier bound for individual estimation of f0 (curve).


CRLB with vector parameters—Example (cont.)

[Figure: Cramer-Rao lower bound versus frequency f0 (scale ×10⁻⁴), comparing the CRLB when the other parameters are known (curve) with the CRLB when they are not known (straight line).]