Chapter 7: Point Estimation (Part II)
STK4011/9011: Statistical Inference Theory
Johan Pensar
STK4011/9011: Statistical Inference Theory > Chapter 7: Point Estimation (Part II) > 1 / 21
Overview
1 Methods of Evaluating Estimators
  Mean Squared Error
  Best Unbiased Estimators
  Sufficiency and Unbiasedness
Covers Sec 7.3.1–7.3.3 in CB.
Mean Squared Error (MSE)
Definition 7.3.1: The mean squared error (MSE) of an estimator W of a parameter θ is the function of θ defined by Eθ([W − θ]²).
The MSE is tractable analytically, and it has a natural interpretation in terms of variance and bias:

Eθ([W − θ]²) = Varθ(W) + (Eθ(W) − θ)².
Definition 7.3.2: The bias of a point estimator W of a parameter θ is
Biasθ(W ) = Eθ(W − θ) = Eθ(W )− θ.
An estimator for which Biasθ(W ) = 0 (that is, Eθ(W ) = θ) for all θ is called unbiased.
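As a quick sanity check of the decomposition (an illustrative simulation, not part of the slides), one can estimate the MSE, variance, and bias of the biased variance estimator σ̂² = (1/n) Σ(Xi − X̄)² under N(0, 1), whose true bias is −σ²/n, and confirm that MSE = variance + bias²:

```python
import random

# Monte Carlo check of E[(W - theta)^2] = Var(W) + Bias(W)^2, using the biased
# variance estimator W = (1/n) * sum((x_i - xbar)^2) for N(0, 1) data, where the
# true parameter is theta = sigma^2 = 1 and the bias is -theta/n.
random.seed(0)
n, reps, theta = 10, 20000, 1.0

def w(sample):
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / len(sample)  # divides by n, not n - 1

ws = [w([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
mse = sum((v - theta) ** 2 for v in ws) / reps
mean_w = sum(ws) / reps
var_w = sum((v - mean_w) ** 2 for v in ws) / reps
bias = mean_w - theta  # should be close to -theta/n = -0.1

print(mse, var_w + bias ** 2)  # the two agree up to rounding
```

The identity holds exactly for the Monte Carlo estimates as well, since the empirical MSE decomposes the same way around the empirical mean.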
MSE, Bias, and Variance
An estimator with good MSE needs to control both variance (random error) and bias (systematic error).
Unbiased estimators are optimal in terms of bias, and the MSE is then equal to the variance:

Eθ([W − θ]²) = Varθ(W).
However, there is typically a bias-variance tradeoff; sometimes, a small increase in bias can be traded for a larger decrease in variance, and thereby a lower MSE.
Example: Normal MSE
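This example was worked on the board. A sketch of the standard normal-theory computation (closed-form MSEs for estimators of σ², using the usual chi-squared facts; not transcribed from the board):

```python
# Closed-form MSE of estimators c * SS for sigma^2 under N(mu, sigma^2), where
# SS = sum((x_i - xbar)^2) ~ sigma^2 * chi^2_{n-1}, so E(SS) = (n - 1) sigma^2
# and Var(SS) = 2 (n - 1) sigma^4 (standard normal-theory facts).
def mse(c, n, sigma2=1.0):
    return sigma2 ** 2 * (2 * c ** 2 * (n - 1) + (c * (n - 1) - 1) ** 2)

n = 10
unbiased = mse(1 / (n - 1), n)  # S^2: zero bias, largest variance of the three
mle = mse(1 / n, n)             # MLE: small bias, smaller MSE
best = mse(1 / (n + 1), n)      # divide by n + 1: minimizes MSE in this class
print(unbiased, mle, best)
```

Dividing by n + 1 trades a little bias for a larger variance reduction, beating both S² and the MLE in MSE, which is exactly the tradeoff described above.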
Unbiased Estimators
The notion of finding the “best MSE” estimator is problematic in the sense that no such estimator exists in general.
Example: the constant estimator W ≡ 17 has zero MSE when θ = 17, but is a terrible estimator for all other values of θ.
One way to make the problem tractable is to consider a limited class of estimators.
We are going to focus on the class of unbiased estimators, for which the MSE equals the variance of the estimator, and we choose the estimator with the smallest variance.
In particular, if we can find an unbiased estimator with uniformly smallest variance, we have an optimal unbiased estimator w.r.t. the MSE.
Best Unbiased Estimator
Definition 7.3.7: An estimator W ∗ is a best unbiased estimator of τ(θ) if it satisfies
Eθ(W ∗) = τ(θ) for all θ,
and for any other estimator W with Eθ(W ) = τ(θ), we have that
Varθ(W ∗) ≤ Varθ(W ) for all θ.
W ∗ is also called a uniform minimum variance unbiased estimator (UMVUE) of τ(θ).
Finding a Best Unbiased Estimator
Finding a best unbiased estimator (or UMVUE), if one exists, is not an easy task.
Example: Let X1, . . . ,Xn be iid Poisson(λ).
X̄ and S² are unbiased estimators of λ.
It can be shown that Varλ(X̄) ≤ Varλ(S²) for all λ.
But, what about other unbiased estimators?
One technique for finding a best unbiased estimator is to bound the variance from below, and find an unbiased estimator whose variance equals the bound.
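A small simulation agreeing with the Poisson claims above (illustrative only; the sampler is Knuth's standard Poisson algorithm): both X̄ and S² are unbiased for λ, and X̄ has the smaller variance.

```python
import math
import random
import statistics

random.seed(0)

def rpois(lam):
    # Knuth's algorithm: count uniforms until their product drops below e^{-lam}.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

lam, n, reps = 3.0, 20, 5000
xbars, s2s = [], []
for _ in range(reps):
    xs = [rpois(lam) for _ in range(n)]
    xbars.append(statistics.mean(xs))
    s2s.append(statistics.variance(xs))  # divides by n - 1, hence unbiased

print(statistics.mean(xbars), statistics.mean(s2s))          # both near lam = 3
print(statistics.variance(xbars), statistics.variance(s2s))  # Xbar has smaller variance
```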
The Cramer-Rao Lower Bound
Theorem 7.3.9: Let X1, . . . , Xn be a sample with pdf f(x | θ), and let W(X) = W(X1, . . . , Xn) be any estimator satisfying

d/dθ Eθ(W(X)) = ∫𝒳 ∂/∂θ [W(x) f(x | θ)] dx  and  Varθ(W(X)) < ∞.

Then,

Varθ(W(X)) ≥ [d/dθ Eθ(W(X))]² / Eθ([∂/∂θ log f(X | θ)]²).

NOTE: The quantity Eθ([∂/∂θ log f(X | θ)]²) is known as the Fisher information, and it measures the amount of information a random sample X carries about θ.
The Cramer-Rao Lower Bound - IID Case
Corollary 7.3.10: If the assumptions of Theorem 7.3.9 are satisfied and, additionally, X1, . . . , Xn are iid with pdf f(x | θ), then

Varθ(W(X)) ≥ [d/dθ Eθ(W(X))]² / (n Eθ([∂/∂θ log f(X | θ)]²)).

NOTE: The C-R lower bound also applies to discrete variables, but the key condition is modified to enable interchange of summation and differentiation (it assumes that the pmf is differentiable in θ, which is the case for most common pmfs).
A Useful Result For Calculating the Fisher Information
Lemma 7.3.11: If f(x | θ) satisfies

d/dθ Eθ(∂/∂θ log f(X | θ)) = ∫ ∂/∂θ [(∂/∂θ log f(x | θ)) f(x | θ)] dx

(which is true for an exponential family), then

Eθ([∂/∂θ log f(X | θ)]²) = −Eθ(∂²/∂θ² log f(X | θ)).
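A numerical illustration of the lemma (not from the slides), using the exponential density f(x | θ) = θe^(−θx), which is an exponential family: the score is 1/θ − x and the second derivative of the log-density is −1/θ², so both sides of the identity equal 1/θ².

```python
import random

random.seed(1)
theta, reps = 2.0, 100000
xs = [random.expovariate(theta) for _ in range(reps)]  # density theta * exp(-theta x)

# Left side: Fisher information E[(d/dtheta log f)^2], with score 1/theta - x.
lhs = sum((1 / theta - x) ** 2 for x in xs) / reps
# Right side: -E[d^2/dtheta^2 log f], where the second derivative is the
# constant -1/theta^2.
rhs = -sum(-1 / theta ** 2 for _ in xs) / reps

print(lhs, rhs)  # both close to 1/theta^2 = 0.25
```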
Example: Poisson Best Unbiased Estimator
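This example was worked on the board. The standard calculation: for one Poisson(λ) observation the score is ∂/∂λ log f(k | λ) = k/λ − 1, so the Fisher information is Var(X)/λ² = 1/λ, and the C-R bound for unbiased estimators of λ is λ/n, which Var(X̄) = λ/n attains. A numerical check (illustrative, not transcribed from the board):

```python
import math

lam, n = 3.0, 25

def pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Fisher information of one observation: E[(X/lam - 1)^2]; the pmf tail beyond
# k = 60 is negligible for lam = 3.
info = sum((k / lam - 1) ** 2 * pmf(k, lam) for k in range(60))

bound = 1 / (n * info)  # C-R bound for unbiased estimators of lam
var_xbar = lam / n      # exact variance of the sample mean

print(info, bound, var_xbar)  # info ~ 1/3, and bound = var_xbar = 0.12
```

Note also that the sample score factorizes as ∂/∂λ log L(λ | x) = (n/λ)(x̄ − λ), so Corollary 7.3.15 applies with a(λ) = n/λ, W = X̄, and τ(λ) = λ.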
Attainment of the C-R Lower Bound
There is in general no guarantee that the C-R bound is sharp; that is, it may be strictly smaller than the variance of any unbiased estimator.
Corollary 7.3.15: Let X1, . . . , Xn be iid with pdf f(x | θ), which satisfies the conditions listed in Theorem 7.3.9. If W(X) = W(X1, . . . , Xn) is any unbiased estimator of τ(θ), then W(X) attains the C-R lower bound iff

a(θ)[W(x) − τ(θ)] = ∂/∂θ log L(θ | x)

for some function a(θ).
NOTE: In addition to checking if the bound can be reached, the above result also implicitly gives a way to find a best unbiased estimator.
Example: Normal Variance Bound
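This example was worked on the board. In the usual version (mean unknown), the C-R bound for unbiased estimators of σ² is 2σ⁴/n, while Var(S²) = 2σ⁴/(n − 1): the bound is not attained, even though S² turns out to be best unbiased. A Monte Carlo check (illustrative only):

```python
import random
import statistics

random.seed(2)
sigma2, n, reps = 2.0, 10, 20000

bound = 2 * sigma2 ** 2 / n               # C-R lower bound: 0.8
var_s2_exact = 2 * sigma2 ** 2 / (n - 1)  # exact Var(S^2): 8/9

s2s = [statistics.variance([random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
       for _ in range(reps)]

print(statistics.mean(s2s))             # near sigma2 = 2 (S^2 is unbiased)
print(bound, statistics.variance(s2s))  # the simulated variance exceeds the bound
```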
Sufficiency and Unbiased Estimators
The C-R theorem cannot be used for finding a best unbiased estimator if:
f(x | θ) does not satisfy the assumptions required by the theorem, or
the bound is unattainable by the considered class of estimators.
As an alternative to the C-R approach, we are going to introduce the concept of sufficiency in our search for best unbiased estimators.
The main theorem is a clever application of the following results:
E (X ) = E [E (X |Y )] (Thm 4.4.3)
Var(X ) = Var [E (X |Y )] + E [Var(X |Y )] (Thm 4.4.7)
The Rao-Blackwell Theorem
Theorem 7.3.17: Let W be any unbiased estimator of τ(θ), let T be a sufficient statistic for θ, and define φ(T) = E(W | T). Then, for all θ:

Eθ(φ(T)) = τ(θ) and Varθ(φ(T)) ≤ Varθ(W).

In other words, φ(T) is a uniformly better unbiased estimator of τ(θ).
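An illustrative Rao-Blackwell step (not from the slides): for X1, . . . , Xn iid Poisson(λ), W = X1 is unbiased for λ and T = ΣXi is sufficient; by symmetry, φ(T) = E(X1 | T) = T/n = X̄, which keeps the expectation but shrinks the variance from λ to λ/n.

```python
import math
import random
import statistics

random.seed(3)

def rpois(lam):
    # Knuth's algorithm for Poisson sampling.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

lam, n, reps = 3.0, 10, 4000
w, phi = [], []
for _ in range(reps):
    xs = [rpois(lam) for _ in range(n)]
    w.append(xs[0])          # W = X1: unbiased, but ignores most of the data
    phi.append(sum(xs) / n)  # phi(T) = E(X1 | T) = Xbar, by symmetry

print(statistics.mean(w), statistics.mean(phi))          # both near lam
print(statistics.variance(w), statistics.variance(phi))  # roughly lam vs lam/n
```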
Proof of Thm 7.3.17
Towards a Characterization of Best Unbiased Estimators
By Thm 7.3.17, we only need to consider estimators that are functions of a sufficient statistic in our search for best unbiased estimators.
Moreover, we have that the best unbiased estimator is unique.
Theorem 7.3.19: If W is a best unbiased estimator of τ(θ), then W is unique.
But, if E(φ) = τ(θ) and φ is based on a sufficient statistic T, i.e. E(φ | T) = φ, how do we know that φ is best unbiased for τ(θ) (if it does not attain the C-R lower bound)?
Improving Upon an Unbiased Estimator
Idea: To check if an estimator is best unbiased, see if it can be improved upon:
Let W and U be two estimators for which Eθ(W ) = τ(θ) and Eθ(U) = 0 for all θ.
Consider the unbiased estimator φa = W + aU, for which

Varθ(φa) = Varθ(W) + 2aCovθ(W, U) + a²Varθ(U).

If for some θ0 we have that Covθ0(W, U) ≠ 0, we can choose a value of a such that

2aCovθ0(W, U) + a²Varθ0(U) < 0 ⇒ Varθ0(φa) < Varθ0(W),

meaning that W cannot be best unbiased.
The relationship of W with unbiased estimators of 0 can be used to characterize best unbiasedness.
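A concrete instance of this argument (illustrative, not from the slides): for X1, X2 iid N(μ, σ²), W = X1 is unbiased for μ and U = X1 − X2 is an unbiased estimator of 0 with Covθ(W, U) = σ² ≠ 0; the variance-minimizing choice a = −Cov(W, U)/Var(U) = −1/2 turns W into (X1 + X2)/2.

```python
import random
import statistics

random.seed(4)
mu, sigma, reps = 1.0, 1.0, 40000

w, u = [], []
for _ in range(reps):
    x1, x2 = random.gauss(mu, sigma), random.gauss(mu, sigma)
    w.append(x1)       # unbiased for mu
    u.append(x1 - x2)  # unbiased estimator of 0, correlated with W

mw, mu_u = statistics.mean(w), statistics.mean(u)
cov = sum((wi - mw) * (ui - mu_u) for wi, ui in zip(w, u)) / (reps - 1)
a = -cov / statistics.variance(u)  # minimizes Var(W + a U); near -1/2 here

phi = [wi + a * ui for wi, ui in zip(w, u)]
print(a, statistics.variance(phi), statistics.variance(w))
```

Since Var(φa) < Var(W), the estimator W = X1 cannot be best unbiased, exactly as the criterion above predicts.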
A Characterization of Best Unbiased Estimators
Theorem 7.3.20: If Eθ(W) = τ(θ), then W is a best unbiased estimator of τ(θ) iff W is uncorrelated with all unbiased estimators of 0.
NOTE: An unbiased estimator of 0 is essentially random noise (the most sensible estimator of 0 is 0).
The practical usefulness of Thm 7.3.20 is limited in general, since characterizing all unbiased estimators of 0 is typically very difficult, requiring conditions on the pdf/pmf.
Completeness
Consider a family of pdfs/pmfs with the property that there are no unbiased estimators of 0 other than 0 itself (recall completeness), and note that Covθ(W, 0) = 0.
Theorem 7.3.23: Let T be a complete sufficient statistic for a parameter θ, and let φ(T) be any estimator based only on T. Then φ(T) is the unique best unbiased estimator of its expected value.
NOTE: If T is a complete sufficient statistic for a parameter θ and h(X) is any unbiased estimator of τ(θ), then φ(T) = E(h(X) | T) is the best unbiased estimator of τ(θ).
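A classic illustration of Theorem 7.3.23 (not from the slides): for X1, . . . , Xn iid Uniform(0, θ), T = X(n) is complete and sufficient, and (n + 1)/n · X(n) is unbiased for θ, hence the unique best unbiased estimator; it clearly beats the naive unbiased estimator 2X̄.

```python
import random
import statistics

random.seed(5)
theta, n, reps = 2.0, 10, 20000

naive, umvue = [], []
for _ in range(reps):
    xs = [random.uniform(0.0, theta) for _ in range(n)]
    naive.append(2 * statistics.mean(xs))  # unbiased, but ignores sufficiency
    umvue.append((n + 1) / n * max(xs))    # based on the complete sufficient X(n)

print(statistics.mean(naive), statistics.mean(umvue))          # both near theta
print(statistics.variance(naive), statistics.variance(umvue))  # ~theta^2/(3n) vs ~theta^2/(n(n+2))
```

Note that the uniform family violates the C-R regularity conditions (its support depends on θ), so the completeness route is essential here.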