Chapter 7: Point Estimation (Part II)
STK4011/9011: Statistical Inference Theory
Johan Pensar
STK4011/9011: Statistical Inference Theory > Chapter 7: Point Estimation (Part II) > 1 / 21
Overview
1 Methods of Evaluating Estimators
  Mean Squared Error
  Best Unbiased Estimators
  Sufficiency and Unbiasedness
Covers Sec 7.3.1–7.3.3 in CB.
Mean Squared Error (MSE)
Definition 7.3.1: The mean squared error (MSE) of an estimator W of a parameter θ is the function of θ defined by Eθ([W − θ]²).
The MSE is tractable analytically, and it has a natural interpretation in terms of variance and bias:

Eθ([W − θ]²) = Varθ(W) + (Eθ(W) − θ)².
Definition 7.3.2: The bias of a point estimator W of a parameter θ is
Biasθ(W ) = Eθ(W − θ) = Eθ(W )− θ.
An estimator for which Biasθ(W ) = 0 (that is, Eθ(W ) = θ) for all θ is called unbiased.
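As a quick sanity check of the decomposition (an illustrative simulation, not part of the slides), one can estimate the MSE, variance, and bias of the biased variance estimator σ̂² = (1/n) Σ(Xi − X̄)² under N(0, 1), whose true bias is −σ²/n, and confirm that MSE = variance + bias²:

```python
import random

# Monte Carlo check of E[(W - theta)^2] = Var(W) + Bias(W)^2, using the biased
# variance estimator W = (1/n) * sum((x_i - xbar)^2) for N(0, 1) data, where the
# true parameter is theta = sigma^2 = 1 and the bias is -theta/n.
random.seed(0)
n, reps, theta = 10, 20000, 1.0

def w(sample):
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / len(sample)  # divides by n, not n - 1

ws = [w([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
mse = sum((v - theta) ** 2 for v in ws) / reps
mean_w = sum(ws) / reps
var_w = sum((v - mean_w) ** 2 for v in ws) / reps
bias = mean_w - theta  # should be close to -theta/n = -0.1

print(mse, var_w + bias ** 2)  # the two agree up to rounding
```

The identity holds exactly for the Monte Carlo estimates as well, since the empirical MSE decomposes the same way around the empirical mean.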
MSE, Bias, and Variance
An estimator with good MSE needs to control both variance (random error) and bias (systematic error).
Unbiased estimators are optimal in terms of bias, and the MSE is then equal to the variance:

Eθ([W − θ]²) = Varθ(W).
However, there is typically a bias-variance tradeoff; sometimes, a small increase in bias can be traded for a larger decrease in variance, and thereby a lower MSE.
Example: Normal MSE
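This example was worked on the board. A sketch of the standard normal-theory computation (closed-form MSEs for estimators of σ², using the usual chi-squared facts; not transcribed from the board):

```python
# Closed-form MSE of estimators c * SS for sigma^2 under N(mu, sigma^2), where
# SS = sum((x_i - xbar)^2) ~ sigma^2 * chi^2_{n-1}, so E(SS) = (n - 1) sigma^2
# and Var(SS) = 2 (n - 1) sigma^4 (standard normal-theory facts).
def mse(c, n, sigma2=1.0):
    return sigma2 ** 2 * (2 * c ** 2 * (n - 1) + (c * (n - 1) - 1) ** 2)

n = 10
unbiased = mse(1 / (n - 1), n)  # S^2: zero bias, largest variance of the three
mle = mse(1 / n, n)             # MLE: small bias, smaller MSE
best = mse(1 / (n + 1), n)      # divide by n + 1: minimizes MSE in this class
print(unbiased, mle, best)
```

Dividing by n + 1 trades a little bias for a larger variance reduction, beating both S² and the MLE in MSE, which is exactly the tradeoff described above.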
Unbiased Estimators
The notion of finding the “best MSE” estimator is problematic in the sense that no such estimator exists in general.
Example: the constant estimator W ≡ 17 has zero MSE when θ = 17, but is a terrible estimator for all other values of θ.
One way to make the problem tractable is to consider a limited class of estimators.
We are going to focus on the class of unbiased estimators, for which the MSE equals the variance of the estimator, and we choose the estimator with the smallest variance.
In particular, if we can find an unbiased estimator with uniformly smallest variance, we have an optimal unbiased estimator w.r.t. the MSE.
Best Unbiased Estimator
Definition 7.3.7: An estimator W ∗ is a best unbiased estimator of τ(θ) if it satisfies
Eθ(W ∗) = τ(θ) for all θ,
and for any other estimator W with Eθ(W ) = τ(θ), we have that
Varθ(W ∗) ≤ Varθ(W ) for all θ.
W ∗ is also called a uniform minimum variance unbiased estimator (UMVUE) of τ(θ).
Finding a Best Unbiased Estimator
Finding a best unbiased estimator (or UMVUE), if one exists, is not an easy task.
Example: Let X1, . . . ,Xn be iid Poisson(λ).
X̄ and S² are unbiased estimators of λ.
It can be shown that Varλ(X̄) ≤ Varλ(S²) for all λ.
But, what about other unbiased estimators?
One technique for finding a best unbiased estimator is to bound the variance from below, and find an unbiased estimator whose variance equals the bound.
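A small simulation agreeing with the Poisson claims above (illustrative only; the sampler is Knuth's standard Poisson algorithm): both X̄ and S² are unbiased for λ, and X̄ has the smaller variance.

```python
import math
import random
import statistics

random.seed(0)

def rpois(lam):
    # Knuth's algorithm: count uniforms until their product drops below e^{-lam}.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

lam, n, reps = 3.0, 20, 5000
xbars, s2s = [], []
for _ in range(reps):
    xs = [rpois(lam) for _ in range(n)]
    xbars.append(statistics.mean(xs))
    s2s.append(statistics.variance(xs))  # divides by n - 1, hence unbiased

print(statistics.mean(xbars), statistics.mean(s2s))          # both near lam = 3
print(statistics.variance(xbars), statistics.variance(s2s))  # Xbar has smaller variance
```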
The Cramer-Rao Lower Bound
Theorem 7.3.9: Let X1, . . . , Xn be a sample with pdf f(x | θ), and let W(X) = W(X1, . . . , Xn) be any estimator satisfying

d/dθ Eθ(W(X)) = ∫𝒳 ∂/∂θ [W(x) f(x | θ)] dx  and  Varθ(W(X)) < ∞.

Then,

Varθ(W(X)) ≥ [d/dθ Eθ(W(X))]² / Eθ([∂/∂θ log f(X | θ)]²).

NOTE: The quantity Eθ([∂/∂θ log f(X | θ)]²) is known as the Fisher information, and it measures the amount of information a random sample X carries about θ.
The Cramer-Rao Lower Bound - IID Case
Corollary 7.3.10: If the assumptions of Theorem 7.3.9 are satisfied and, additionally, X1, . . . , Xn are iid with pdf f(x | θ), then

Varθ(W(X)) ≥ [d/dθ Eθ(W(X))]² / (n Eθ([∂/∂θ log f(X | θ)]²)).

NOTE: The C-R lower bound also applies to discrete variables, but the key condition is modified to enable interchange of summation and differentiation (it assumes that the pmf is differentiable in θ, which is the case for most common pmfs).
A Useful Result For Calculating the Fisher Information
Lemma 7.3.11: If f(x | θ) satisfies

d/dθ Eθ(∂/∂θ log f(X | θ)) = ∫ ∂/∂θ [(∂/∂θ log f(x | θ)) f(x | θ)] dx

(which is true for an exponential family), then

Eθ([∂/∂θ log f(X | θ)]²) = −Eθ(∂²/∂θ² log f(X | θ)).
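A numerical illustration of the lemma (not from the slides), using the exponential density f(x | θ) = θe^(−θx), which is an exponential family: the score is 1/θ − x and the second derivative of the log-density is −1/θ², so both sides of the identity equal 1/θ².

```python
import random

random.seed(1)
theta, reps = 2.0, 100000
xs = [random.expovariate(theta) for _ in range(reps)]  # density theta * exp(-theta x)

# Left side: Fisher information E[(d/dtheta log f)^2], with score 1/theta - x.
lhs = sum((1 / theta - x) ** 2 for x in xs) / reps
# Right side: -E[d^2/dtheta^2 log f], where the second derivative is the
# constant -1/theta^2.
rhs = -sum(-1 / theta ** 2 for _ in xs) / reps

print(lhs, rhs)  # both close to 1/theta^2 = 0.25
```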
Example: Poisson Best Unbiased Estimator
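This example was worked on the board. The standard calculation: for one Poisson(λ) observation the score is ∂/∂λ log f(k | λ) = k/λ − 1, so the Fisher information is Var(X)/λ² = 1/λ, and the C-R bound for unbiased estimators of λ is λ/n, which Var(X̄) = λ/n attains. A numerical check (illustrative, not transcribed from the board):

```python
import math

lam, n = 3.0, 25

def pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Fisher information of one observation: E[(X/lam - 1)^2]; the pmf tail beyond
# k = 60 is negligible for lam = 3.
info = sum((k / lam - 1) ** 2 * pmf(k, lam) for k in range(60))

bound = 1 / (n * info)  # C-R bound for unbiased estimators of lam
var_xbar = lam / n      # exact variance of the sample mean

print(info, bound, var_xbar)  # info ~ 1/3, and bound = var_xbar = 0.12
```

Note also that the sample score factorizes as ∂/∂λ log L(λ | x) = (n/λ)(x̄ − λ), so Corollary 7.3.15 applies with a(λ) = n/λ, W = X̄, and τ(λ) = λ.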
Attainment of the C-R Lower Bound
There is in general no guarantee that the C-R bound is sharp; that is, it may be strictly smaller than the variance of any unbiased estimator.
Corollary 7.3.15: Let X1, . . . , Xn be iid with pdf f(x | θ), which satisfies the conditions listed in Theorem 7.3.9. If W(X) = W(X1, . . . , Xn) is any unbiased estimator of τ(θ), then W(X) attains the C-R lower bound iff

a(θ)[W(x) − τ(θ)] = ∂/∂θ log L(θ | x)

for some function a(θ).
NOTE: In addition to checking if the bound can be reached, the above result also implicitly gives a way to find a best unbiased estimator.
Example: Normal Variance Bound
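This example was worked on the board. In the usual version (mean unknown), the C-R bound for unbiased estimators of σ² is 2σ⁴/n, while Var(S²) = 2σ⁴/(n − 1): the bound is not attained, even though S² turns out to be best unbiased. A Monte Carlo check (illustrative only):

```python
import random
import statistics

random.seed(2)
sigma2, n, reps = 2.0, 10, 20000

bound = 2 * sigma2 ** 2 / n               # C-R lower bound: 0.8
var_s2_exact = 2 * sigma2 ** 2 / (n - 1)  # exact Var(S^2): 8/9

s2s = [statistics.variance([random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
       for _ in range(reps)]

print(statistics.mean(s2s))             # near sigma2 = 2 (S^2 is unbiased)
print(bound, statistics.variance(s2s))  # the simulated variance exceeds the bound
```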
Sufficiency and Unbiased Estimators
The C-R theorem cannot be used for finding a best unbiased estimator if:
f(x | θ) does not satisfy the assumptions required by the theorem, or
the bound is unattainable by the considered class of estimators.
As an alternative to the C-R approach, we are going to introduce the concept of sufficiency in our search for best unbiased estimators.
The main theorem is a clever application of the following results:
E (X ) = E [E (X |Y )] (Thm 4.4.3)
Var(X ) = Var [E (X |Y )] + E [Var(X |Y )] (Thm 4.4.7)
The Rao-Blackwell Theorem
Theorem 7.3.17: Let W be any unbiased estimator of τ(θ), let T be a sufficient statistic for θ, and define φ(T) = E(W | T). Then, for all θ:

Eθ(φ(T)) = τ(θ) and Varθ(φ(T)) ≤ Varθ(W).

In other words, φ(T) is a uniformly better unbiased estimator of τ(θ).
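An illustrative Rao-Blackwell step (not from the slides): for X1, . . . , Xn iid Poisson(λ), W = X1 is unbiased for λ and T = ΣXi is sufficient; by symmetry, φ(T) = E(X1 | T) = T/n = X̄, which keeps the expectation but shrinks the variance from λ to λ/n.

```python
import math
import random
import statistics

random.seed(3)

def rpois(lam):
    # Knuth's algorithm for Poisson sampling.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

lam, n, reps = 3.0, 10, 4000
w, phi = [], []
for _ in range(reps):
    xs = [rpois(lam) for _ in range(n)]
    w.append(xs[0])          # W = X1: unbiased, but ignores most of the data
    phi.append(sum(xs) / n)  # phi(T) = E(X1 | T) = Xbar, by symmetry

print(statistics.mean(w), statistics.mean(phi))          # both near lam
print(statistics.variance(w), statistics.variance(phi))  # roughly lam vs lam/n
```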
Proof of Thm 7.3.17
Towards a Characterization of Best Unbiased Estimators
By Thm 7.3.17, we only need to consider estimators that are functions of a sufficient statistic in our search for best unbiased estimators.
Moreover, we have that the best unbiased estimator is unique.
Theorem 7.3.19: If W is a best unbiased estimator of τ(θ), then W is unique.
But, if E(φ) = τ(θ) and φ is based on a sufficient statistic T, i.e. E(φ | T) = φ, how do we know that φ is best unbiased for τ(θ) (if it does not attain the C-R lower bound)?
Improving Upon an Unbiased Estimator
Idea: To check if an estimator is best unbiased, see if it can be improved upon:
Let W and U be two estimators for which Eθ(W ) = τ(θ) and Eθ(U) = 0 for all θ.
Consider the unbiased estimator φa = W + aU, for which

Varθ(φa) = Varθ(W) + 2aCovθ(W, U) + a²Varθ(U).

If for some θ0 we have that Covθ0(W, U) ≠ 0, we can choose a value of a such that

2aCovθ0(W, U) + a²Varθ0(U) < 0 ⇒ Varθ0(φa) < Varθ0(W),

meaning that W cannot be best unbiased.
The relationship of W with unbiased estimators of 0 can be used to characterize best unbiasedness.
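A concrete instance of this argument (illustrative, not from the slides): for X1, X2 iid N(μ, σ²), W = X1 is unbiased for μ and U = X1 − X2 is an unbiased estimator of 0 with Covθ(W, U) = σ² ≠ 0; the variance-minimizing choice a = −Cov(W, U)/Var(U) = −1/2 turns W into (X1 + X2)/2.

```python
import random
import statistics

random.seed(4)
mu, sigma, reps = 1.0, 1.0, 40000

w, u = [], []
for _ in range(reps):
    x1, x2 = random.gauss(mu, sigma), random.gauss(mu, sigma)
    w.append(x1)       # unbiased for mu
    u.append(x1 - x2)  # unbiased estimator of 0, correlated with W

mw, mu_u = statistics.mean(w), statistics.mean(u)
cov = sum((wi - mw) * (ui - mu_u) for wi, ui in zip(w, u)) / (reps - 1)
a = -cov / statistics.variance(u)  # minimizes Var(W + a U); near -1/2 here

phi = [wi + a * ui for wi, ui in zip(w, u)]
print(a, statistics.variance(phi), statistics.variance(w))
```

Since Var(φa) < Var(W), the estimator W = X1 cannot be best unbiased, exactly as the criterion above predicts.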
A Characterization of Best Unbiased Estimators
Theorem 7.3.20: If Eθ(W) = τ(θ), then W is a best unbiased estimator of τ(θ) iff W is uncorrelated with all unbiased estimators of 0.
NOTE: An unbiased estimator of 0 is essentially random noise (the most sensible estimator of 0 is 0).
The practical usefulness of Thm 7.3.20 is limited in general, since characterizing all unbiased estimators of 0 is typically very difficult, requiring conditions on the pdf/pmf.
Completeness
Consider a family of pdfs/pmfs with the property that there are no unbiased estimators of 0 other than 0 itself (recall completeness), and note that Covθ(W, 0) = 0.
Theorem 7.3.23: Let T be a complete sufficient statistic for a parameter θ, and let φ(T) be any estimator based only on T. Then φ(T) is the unique best unbiased estimator of its expected value.
NOTE: If T is a complete sufficient statistic for a parameter θ and h(X) is any unbiased estimator of τ(θ), then φ(T) = E(h(X) | T) is the best unbiased estimator of τ(θ).
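A classic illustration of Theorem 7.3.23 (not from the slides): for X1, . . . , Xn iid Uniform(0, θ), T = X(n) is complete and sufficient, and (n + 1)/n · X(n) is unbiased for θ, hence the unique best unbiased estimator; it clearly beats the naive unbiased estimator 2X̄.

```python
import random
import statistics

random.seed(5)
theta, n, reps = 2.0, 10, 20000

naive, umvue = [], []
for _ in range(reps):
    xs = [random.uniform(0.0, theta) for _ in range(n)]
    naive.append(2 * statistics.mean(xs))  # unbiased, but ignores sufficiency
    umvue.append((n + 1) / n * max(xs))    # based on the complete sufficient X(n)

print(statistics.mean(naive), statistics.mean(umvue))          # both near theta
print(statistics.variance(naive), statistics.variance(umvue))  # ~theta^2/(3n) vs ~theta^2/(n(n+2))
```

Note that the uniform family violates the C-R regularity conditions (its support depends on θ), so the completeness route is essential here.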