point estimators - statistics -- lecture no. 10 -...


Point Estimators
STATISTICS – Lecture no. 10

Jiří Neubauer

Department of Econometrics, FEM UO Brno
office 69a, tel. 973 442029
email: [email protected]

8. 12. 2009



Introduction

Suppose that we manufacture lightbulbs and want to state the average lifetime on the box. Say we have the following five observed lifetimes (in hours):

983 1063 1241 1040 1103

Their average is 1086. If this is all the information we have, it seems reasonable to state 1086 as the average lifetime.



Introduction

Let the random variable $X$ be the lifetime of a lightbulb, and let $E(X) = \mu$. Here $\mu$ is an unknown parameter. We decide to repeat the lifetime measurement 5 times and will then obtain outcomes of the five random variables $X_1, \dots, X_5$, which are i.i.d. (independent and identically distributed). We now estimate $\mu$ by

$$\bar{X} = \frac{1}{5} \sum_{i=1}^{5} X_i,$$

which is the sample mean.
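As a quick check of the arithmetic, the sample mean of the five observed lifetimes can be computed directly. This is only a small sketch; the helper name `sample_mean` is an illustrative choice, not part of the lecture.

```python
# Observed lifetimes (in hours) from the introductory example.
lifetimes = [983, 1063, 1241, 1040, 1103]


def sample_mean(xs):
    """Return the arithmetic mean (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)


x_bar = sample_mean(lifetimes)
print(x_bar)  # 1086.0
```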




Point Estimator

Definition

Let $X_1, \dots, X_n$ be a random sample. The statistic (random variable)

$$T = T(X_1, X_2, \dots, X_n) = T(\mathbf{X}),$$

which is a function of the random sample and is used to estimate an unknown parameter $\theta$, is called a point estimator of $\theta$. We write $T(\mathbf{X}) = \hat{\theta}$.




Unbiased Estimator

Definition

The estimator $T(\mathbf{X})$ is said to be an unbiased estimator of the parameter $\theta$ if

$$E[T(\mathbf{X})] = \theta.$$

The difference

$$B(\theta, T) = E[T(\mathbf{X})] - \theta$$

is called the bias of the estimator $T(\mathbf{X})$.




Example

Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with the mean $\mu$ and the variance $\sigma^2$.

The sample mean $\bar{X}$ is an unbiased estimator of $\mu$, because

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu.$$

The sample variance $S^2$ is an unbiased estimator of $\sigma^2$, because

$$E(S^2) = E\left(\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\right) = \cdots = \sigma^2.$$




Example

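Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with the mean $\mu$ and the variance $\sigma^2$.

The (moment) variance $S_n^2$ is a biased estimator of $\sigma^2$, because

$$E(S_n^2) = E\left(\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right) = \cdots = \frac{n-1}{n}\,\sigma^2.$$

The bias of the estimator $S_n^2$ is

$$B(\sigma^2, S_n^2) = E(S_n^2) - \sigma^2 = \frac{n-1}{n}\,\sigma^2 - \sigma^2 = -\frac{1}{n}\,\sigma^2.$$

The bias is negative, so $S_n^2$ underestimates $\sigma^2$ on average, and its magnitude decreases as $n$ grows.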
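The two expectations, $E(S^2) = \sigma^2$ and $E(S_n^2) = \frac{n-1}{n}\sigma^2$, can be illustrated by simulation. The following is a minimal sketch, not part of the lecture: it assumes normally distributed data with $\sigma^2 = 4$ and $n = 5$, and the function names, seed and number of replications are arbitrary illustrative choices.

```python
import random


def variance_estimates(xs):
    """Return (S^2, S_n^2): divisor n - 1 and divisor n, respectively."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)
    return ss / (n - 1), ss / n


def average_variance_estimates(n=5, sigma=2.0, reps=20000, seed=0):
    """Average both estimators over many simulated normal samples."""
    rng = random.Random(seed)
    s2_total = 0.0
    s2n_total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2, s2n = variance_estimates(xs)
        s2_total += s2
        s2n_total += s2n
    return s2_total / reps, s2n_total / reps


mean_s2, mean_s2n = average_variance_estimates()
# Theory: E(S^2) = 4.0 and E(S_n^2) = (4/5) * 4.0 = 3.2
print(mean_s2, mean_s2n)
```

The averages should land close to 4.0 and 3.2; the gap between them is the simulated counterpart of the bias $-\sigma^2/n = -0.8$.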




Asymptotically Unbiased Estimator

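Some estimators are biased, but their bias decreases as $n$ increases.

Definition

If

$$\lim_{n\to\infty} E[T(\mathbf{X})] = \theta,$$

then the estimator $T(\mathbf{X})$ is said to be an asymptotically unbiased estimator of the parameter $\theta$.

It is easy to see that this is equivalent to the bias tending to zero:

$$\lim_{n\to\infty} E[T(\mathbf{X}) - \theta] = 0.$$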




Example

The (moment) variance $S_n^2$ is an asymptotically unbiased estimator of $\sigma^2$, because

$$\lim_{n\to\infty} E(S_n^2) = \lim_{n\to\infty} \frac{n-1}{n}\,\sigma^2 = \sigma^2.$$




Consistent Estimator

Definition

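The statistic $T(\mathbf{X})$ is a consistent estimator of the parameter $\theta$ if for every $\varepsilon > 0$

$$\lim_{n\to\infty} P(|T(\mathbf{X}) - \theta| < \varepsilon) = 1.$$

If

$$\lim_{n\to\infty} B(\theta, T) = 0 \quad\text{and}\quad \lim_{n\to\infty} D[T(\mathbf{X})] = 0,$$

then $T(\mathbf{X})$ is a consistent estimator of $\theta$.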
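The sufficient condition above is stated without proof. One standard way to justify it, sketched here as a supplement to the slide, applies Markov's inequality to $(T - \theta)^2$ and splits the second moment into variance plus squared bias:

```latex
\[
P\bigl(|T(\mathbf{X}) - \theta| \ge \varepsilon\bigr)
  \le \frac{E\bigl[(T(\mathbf{X}) - \theta)^2\bigr]}{\varepsilon^2}
  = \frac{D[T(\mathbf{X})] + B^2(\theta, T)}{\varepsilon^2}
  \xrightarrow[n\to\infty]{} 0 ,
\]
\[
\text{hence}\quad
\lim_{n\to\infty} P\bigl(|T(\mathbf{X}) - \theta| < \varepsilon\bigr) = 1 .
\]
```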




Example

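Prove that the sample mean is a consistent estimator of the expected value $\mu$.

Since $E(\bar{X}) = \mu$ and $D(\bar{X}) = \sigma^2/n$, we obtain

$$B(\mu, \bar{X}) = E(\bar{X}) - \mu = 0 \quad\text{and}\quad \lim_{n\to\infty} D(\bar{X}) = \lim_{n\to\infty} \frac{\sigma^2}{n} = 0,$$

so both sufficient conditions hold and $\bar{X}$ is a consistent estimator of $\mu$.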
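Consistency of the sample mean can also be observed empirically: the proportion of simulated sample means that fall within $\varepsilon$ of $\mu$ should approach 1 as $n$ grows. The sketch below assumes standard normal data and $\varepsilon = 0.1$; the function name, seed and number of replications are illustrative choices, not taken from the lecture.

```python
import random


def fraction_within(n, eps=0.1, mu=0.0, sigma=1.0, reps=1000, seed=1):
    """Estimate P(|X_bar - mu| < eps) for samples of size n by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(x_bar - mu) < eps:
            hits += 1
    return hits / reps


# The estimated probability should increase towards 1 with n.
fractions = {n: fraction_within(n) for n in (10, 100, 1000)}
for n, frac in fractions.items():
    print(n, frac)
```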




Efficiency of Estimators

If we have two unbiased estimators $T_1(\mathbf{X}) = \hat{\theta}_1$ and $T_2(\mathbf{X}) = \hat{\theta}_2$ of $\theta$, which should we choose? Intuitively, we should choose the one that tends to be closer to $\theta$, and since $E(T_1) = E(T_2) = \theta$, it makes sense to choose the estimator with the smaller variance.

Definition

Suppose that $T_1(\mathbf{X}) = \hat{\theta}_1$ and $T_2(\mathbf{X}) = \hat{\theta}_2$ are two unbiased estimators of $\theta$. If

$$D(T_1(\mathbf{X})) < D(T_2(\mathbf{X})),$$

then $T_1(\mathbf{X})$ is said to be more efficient than $T_2(\mathbf{X})$.




Example

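We can find two unbiased estimators of the parameter $\lambda$ of the Poisson distribution, because both the mean and the variance of this distribution equal $\lambda$:

$$E(\bar{X}) = \lambda \quad\text{and}\quad E(S^2) = \lambda.$$

It is possible to calculate that

$$D(\bar{X}) < D(S^2).$$

The estimator $\bar{X}$ is more efficient than the estimator $S^2$.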
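The slide leaves the calculation out. A sketch of it follows, using two standard facts that are not stated in the lecture: the general formula $D(S^2) = \frac{\mu_4}{n} - \frac{n-3}{n(n-1)}\sigma^4$ for an i.i.d. sample with central fourth moment $\mu_4$, and the Poisson value $\mu_4 = \lambda + 3\lambda^2$.

```latex
\begin{align*}
D(\bar{X}) &= \frac{\sigma^2}{n} = \frac{\lambda}{n}, \\
D(S^2) &= \frac{\mu_4}{n} - \frac{n-3}{n(n-1)}\,\sigma^4
        = \frac{\lambda + 3\lambda^2}{n} - \frac{(n-3)\,\lambda^2}{n(n-1)}
        = \frac{\lambda}{n} + \frac{2\lambda^2}{n-1}
        > \frac{\lambda}{n} = D(\bar{X}).
\end{align*}
```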




How to Compare Estimators?

Suppose we would like to compare unbiased and biased estimators of the parameter $\theta$. In this case it might not be suitable to simply choose the one with the smallest variance.

The estimator $T$ has the smallest variance but a large bias. Even the estimator with the smallest bias is not necessarily the best one: the estimator $U$ has no bias, but its variance is too large. The estimator $V$ seems to be the best.




Mean Square Error

Definition

The mean square error of the estimator $T$ of a parameter $\theta$ is defined as

$$MSE(T) = E(T - \theta)^2 = D(T) + B^2(\theta, T)$$

(MSE of estimator = variance of estimator + bias$^2$),

where $T - \theta$ is the sampling error.
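The decomposition into variance plus squared bias follows from adding and subtracting $E(T)$ inside the square. The short derivation below is supplied for completeness and uses only the definitions of $D(T)$ and $B(\theta, T)$ given earlier:

```latex
\begin{align*}
E(T - \theta)^2
  &= E\bigl[(T - E(T)) + (E(T) - \theta)\bigr]^2 \\
  &= E\bigl(T - E(T)\bigr)^2
     + 2\,\bigl(E(T) - \theta\bigr)\,\underbrace{E\bigl(T - E(T)\bigr)}_{=\,0}
     + \bigl(E(T) - \theta\bigr)^2 \\
  &= D(T) + B^2(\theta, T).
\end{align*}
```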




Mean Square Error

The mean square error

- indicates the "average" sampling error of the estimates that can be calculated from all possible random samples of size $n$,
- combines two required properties (a small bias and a small variance), which is why it is a universal criterion.

If $T$ is an unbiased estimator, then $MSE(T) = D(T)$.

Another way to measure the accuracy of an estimator is the standard error

$$SE = \sqrt{D(T)}.$$




Example

The sample mean is an unbiased estimator of the expected value $\mu$, so its standard error is equal to the standard deviation of the sample mean:

$$SE = \sqrt{D(\bar{X})} = \sigma(\bar{X}) = \frac{\sigma(X)}{\sqrt{n}}.$$

Since $\sigma(X)$ is unknown, we have to estimate it by the sample standard deviation $S$, which gives the estimate

$$\widehat{SE} = \frac{\hat{\sigma}(X)}{\sqrt{n}} = \frac{S}{\sqrt{n}}.$$
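For a concrete number, the estimated standard error $S/\sqrt{n}$ can be evaluated for the five lightbulb lifetimes from the introduction. This is a small illustrative sketch; the variable names are arbitrary.

```python
import math

# Observed lifetimes (in hours) from the introductory example.
lifetimes = [983, 1063, 1241, 1040, 1103]

n = len(lifetimes)
x_bar = sum(lifetimes) / n
# Sample variance S^2 with divisor n - 1.
s2 = sum((x - x_bar) ** 2 for x in lifetimes) / (n - 1)
# Estimated standard error of the sample mean, S / sqrt(n).
se_hat = math.sqrt(s2) / math.sqrt(n)

print(x_bar, s2, round(se_hat, 2))  # 1086.0 9392.0 43.34
```

So the reported average lifetime of 1086 hours comes with an estimated standard error of roughly 43 hours.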




Example

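Find the mean square error of $S^2$ and $S_n^2$ for a random sample from a normal distribution (the closed forms below rely on normality). Let us start with the statistic $S^2$, which is an unbiased estimator of $\sigma^2$:

$$MSE(S^2) = D(S^2) = E(S^2 - \sigma^2)^2 = E(S^4) - 2\sigma^2 E(S^2) + \sigma^4 = E(S^4) - \sigma^4 = \frac{2\sigma^4}{n-1}.$$

The MSE of the estimator $S_n^2$ is

$$MSE(S_n^2) = E(S_n^2 - \sigma^2)^2 = E(S_n^4) - 2\,\frac{n-1}{n}\,\sigma^4 + \sigma^4 = E(S_n^4) - \frac{n-2}{n}\,\sigma^4 = \frac{2n-1}{n^2}\,\sigma^4.$$

Hence $MSE(S_n^2) < MSE(S^2)$, because

$$\frac{2n-1}{n^2} < \frac{2}{n-1}.$$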
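The comparison of the two mean square errors can be checked by simulation. The sketch below assumes normal data with $\sigma^2 = 1$ and $n = 10$, for which the formulas above give $2/(n-1) \approx 0.222$ and $(2n-1)/n^2 = 0.19$; the function name, seed and number of replications are illustrative choices.

```python
import random


def mse_of_variance_estimators(n=10, sigma=1.0, reps=20000, seed=2):
    """Monte Carlo estimates of MSE(S^2) and MSE(S_n^2) for normal samples."""
    rng = random.Random(seed)
    sigma2 = sigma ** 2
    sq_err_s2 = 0.0
    sq_err_s2n = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        ss = sum((x - x_bar) ** 2 for x in xs)
        sq_err_s2 += (ss / (n - 1) - sigma2) ** 2   # error of S^2
        sq_err_s2n += (ss / n - sigma2) ** 2        # error of S_n^2
    return sq_err_s2 / reps, sq_err_s2n / reps


n = 10
mse_s2, mse_s2n = mse_of_variance_estimators(n=n)
print(mse_s2, 2 / (n - 1))            # simulated vs. 2*sigma^4/(n-1)
print(mse_s2n, (2 * n - 1) / n ** 2)  # simulated vs. (2n-1)*sigma^4/n^2
```

The biased estimator $S_n^2$ trades a small bias for a larger reduction in variance, which is why its simulated MSE should come out smaller.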




Methods of Point Estimation

The definitions of unbiasedness and other properties of estimators do not provide any guidance on how good estimators can be obtained. In this part, we discuss two methods for obtaining point estimators:

- the method of moments,
- the method of maximum likelihood.

Maximum likelihood estimators are generally preferable to moment estimators because they have better efficiency properties. However, moment estimators are sometimes easier to compute. Both methods can produce unbiased point estimators.




Method of Moments

The general idea behind the method of moments is to equate population moments, which are defined in terms of expected values, to the corresponding sample moments. The population moments will be functions of the unknown parameters. These equations are then solved to yield estimators of the unknown parameters.




Method of Moments

Consider a distribution with $m \ge 1$ real parameters $\theta_1, \theta_2, \dots, \theta_m$ and let $X_1, X_2, \dots, X_n$ be a random sample from this distribution. Suppose that the moments

$$\mu'_r = E(X_i^r) \quad\text{for } r = 1, 2, \dots, m$$

exist. These moments depend on the parameters $\theta_1, \theta_2, \dots, \theta_m$. Sample moments are defined by the formula

$$M'_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r, \quad r = 1, 2, \dots$$




Method of Moments

Let $X_1, \dots, X_n$ be a random sample from either a probability function or a probability density function with $m$ unknown parameters $\theta_1, \dots, \theta_m$. The moment estimators are found by equating the first $m$ population moments to the first $m$ sample moments and solving the resulting equations for the unknown parameters:

$$\mu'_r = M'_r, \quad r = 1, 2, \dots, m.$$




Example

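Estimation of the parameter $\lambda$ of the Poisson distribution.

Suppose that $X_1, \dots, X_n$ is a random sample from the Poisson distribution $Po(\lambda)$. We get the equation

$$\mu'_1 = M'_1 \;\Rightarrow\; E(X_i) = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

and since $E(X_i) = \lambda$, the estimator $\hat{\lambda}$ of the parameter $\lambda$ is

$$\hat{\lambda} = \bar{X}.$$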




Example

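Estimation of the parameters $\mu$ and $\sigma^2$ of the normal distribution.

Suppose that $X_1, \dots, X_n$ is a random sample from the normal distribution $N(\mu, \sigma^2)$. Then

$$\mu'_1 = M'_1 \;\Rightarrow\; E(X_i) = \frac{1}{n}\sum_{i=1}^{n} X_i,$$

$$\mu'_2 = M'_2 \;\Rightarrow\; E(X_i^2) = \frac{1}{n}\sum_{i=1}^{n} X_i^2 \;\Leftrightarrow\; D(X_i) + E(X_i)^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2,$$

that is,

$$\sigma^2 + \mu^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2.$$

We obtain the estimators

$$\hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = S_n^2 = \frac{n-1}{n}\,S^2.$$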
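The two moment equations translate directly into a few lines of code. As a purely illustrative sketch, the lightbulb lifetimes from the introduction are reused here; treating them as a normal sample is an assumption made only for this example, and the variable names are arbitrary.

```python
# Observed lifetimes (in hours); normality is assumed only for illustration.
lifetimes = [983, 1063, 1241, 1040, 1103]

n = len(lifetimes)
m1 = sum(lifetimes) / n                  # first sample moment M'_1
m2 = sum(x ** 2 for x in lifetimes) / n  # second sample moment M'_2

mu_hat = m1                 # solves mu = M'_1
sigma2_hat = m2 - m1 ** 2   # solves sigma^2 + mu^2 = M'_2

# Cross-check against the equivalent form (1/n) * sum (x_i - x_bar)^2 = S_n^2.
s2n = sum((x - m1) ** 2 for x in lifetimes) / n

print(mu_hat, sigma2_hat, s2n)  # approximately 1086, 7513.6, 7513.6
```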







Method of Maximum Likelihood

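Let $X_1, X_2, \dots, X_n$ be a random sample from either a probability density function $f(x, \boldsymbol{\theta})$ or a probability function $p(x, \boldsymbol{\theta})$ with an unknown parameter $\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_m)$. The random vector $\mathbf{X} = (X_1, X_2, \dots, X_n)$ has either the joint probability density function or the joint probability function

$$g(\mathbf{x}, \boldsymbol{\theta}) = g(x_1, x_2, \dots, x_n, \boldsymbol{\theta}) = f(x_1, \boldsymbol{\theta})\, f(x_2, \boldsymbol{\theta}) \cdots f(x_n, \boldsymbol{\theta})$$

or

$$g(\mathbf{x}, \boldsymbol{\theta}) = g(x_1, x_2, \dots, x_n, \boldsymbol{\theta}) = p(x_1, \boldsymbol{\theta})\, p(x_2, \boldsymbol{\theta}) \cdots p(x_n, \boldsymbol{\theta}).$$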





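The density $g(\mathbf{x}, \boldsymbol{\theta})$ is a function of $\mathbf{x}$ for a given value of $\boldsymbol{\theta}$. If the values $\mathbf{x}$ are given (observed data), then $g(\mathbf{x}, \boldsymbol{\theta})$ is a function of the variable $\boldsymbol{\theta}$. We denote it $L(\boldsymbol{\theta}, \mathbf{x})$ and call it the likelihood function.

If there exists some $\hat{\boldsymbol{\theta}}$ which fulfils

$$L(\hat{\boldsymbol{\theta}}, \mathbf{x}) \ge L(\boldsymbol{\theta}, \mathbf{x}) \quad\text{for all admissible } \boldsymbol{\theta},$$

then $\hat{\boldsymbol{\theta}}$ is a maximum likelihood estimator of the parameter $\boldsymbol{\theta}$.

Sometimes it is reasonable to use the logarithm of the likelihood function, $\ell(\boldsymbol{\theta}, \mathbf{x}) = \ln L(\boldsymbol{\theta}, \mathbf{x})$. For the maximum likelihood estimator we can write

$$\ell(\hat{\boldsymbol{\theta}}, \mathbf{x}) \ge \ell(\boldsymbol{\theta}, \mathbf{x}),$$

because the logarithm is an increasing function.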









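The maximum likelihood estimator of the vector $\boldsymbol{\theta} = (\theta_1, \theta_2, \dots, \theta_m)$ is obtained by solving the system of equations

$$\frac{\partial \ln L(\boldsymbol{\theta}, \mathbf{x})}{\partial \theta_i} = 0, \quad i = 1, 2, \dots, m.$$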




Example

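Let $X$ be a Bernoulli random variable. The probability function is

$$p(x) = \begin{cases} \pi^{x}(1-\pi)^{1-x} & x = 0, 1, \\ 0 & \text{otherwise.} \end{cases}$$

The likelihood function is

$$L(\pi, \mathbf{x}) = \pi^{x_1}(1-\pi)^{1-x_1}\,\pi^{x_2}(1-\pi)^{1-x_2} \cdots \pi^{x_n}(1-\pi)^{1-x_n} = \pi^{\sum_{i=1}^{n} x_i}\,(1-\pi)^{\,n-\sum_{i=1}^{n} x_i}.$$

The logarithm of $L(\pi, \mathbf{x})$ is

$$\ln L(\pi, \mathbf{x}) = \sum_{i=1}^{n} x_i \,\ln \pi + \left(n - \sum_{i=1}^{n} x_i\right)\ln(1-\pi).$$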




Example

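We find the maximum of $\ln L(\pi, \mathbf{x})$ by setting its derivative to zero,

$$\frac{d \ln L(\pi, \mathbf{x})}{d\pi} = \frac{\sum_{i=1}^{n} x_i}{\pi} - \frac{n - \sum_{i=1}^{n} x_i}{1-\pi} = 0,$$

and get the estimator

$$\hat{\pi} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}.$$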
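The closed-form result can be compared with a brute-force maximisation of the Bernoulli log-likelihood over a grid of candidate values. This is a minimal sketch: the data vector is hypothetical (7 successes in 10 trials), and the function name and grid resolution are illustrative choices.

```python
import math


def bernoulli_log_likelihood(pi, xs):
    """ln L(pi, x) = sum(x) * ln(pi) + (n - sum(x)) * ln(1 - pi)."""
    s = sum(xs)
    n = len(xs)
    return s * math.log(pi) + (n - s) * math.log(1.0 - pi)


# Hypothetical observations: 7 successes in 10 trials.
xs = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# Candidate values 0.001, 0.002, ..., 0.999 (endpoints excluded: ln(0) is undefined).
grid = [k / 1000 for k in range(1, 1000)]
pi_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, xs))

pi_closed_form = sum(xs) / len(xs)  # the sample mean from the derivation
print(pi_grid, pi_closed_form)      # 0.7 0.7
```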




Example

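Find the maximum likelihood estimator of the parameter $\lambda$ of the Poisson distribution $Po(\lambda)$.

$$L(\lambda, \mathbf{x}) = e^{-n\lambda}\,\frac{\lambda^{\sum_{i=1}^{n} x_i}}{x_1!\,x_2! \cdots x_n!},$$

$$\ln L(\lambda, \mathbf{x}) = -n\lambda + \sum_{i=1}^{n} x_i \,\ln\lambda - \ln(x_1!\,x_2! \cdots x_n!),$$

$$\frac{d \ln L(\lambda, \mathbf{x})}{d\lambda} = -n + \sum_{i=1}^{n} x_i \cdot \frac{1}{\lambda} = 0,$$

$$\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.$$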
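A short numerical check of this result: at $\hat{\lambda} = \bar{x}$ the derivative of the log-likelihood should vanish, and the log-likelihood should be lower at nearby values. The counts below are hypothetical, and the function names are illustrative choices rather than anything from the lecture.

```python
import math


def poisson_log_likelihood(lam, xs):
    """ln L(lam, x) = -n*lam + sum(x)*ln(lam) - sum(ln(x_i!))."""
    n = len(xs)
    log_factorials = sum(math.lgamma(x + 1) for x in xs)  # ln(x!) = lgamma(x + 1)
    return -n * lam + sum(xs) * math.log(lam) - log_factorials


def poisson_score(lam, xs):
    """Derivative of the log-likelihood with respect to lam."""
    return -len(xs) + sum(xs) / lam


# Hypothetical counts with sample mean 2.0.
xs = [2, 0, 3, 1, 4, 2, 2]
lam_hat = sum(xs) / len(xs)

print(lam_hat, poisson_score(lam_hat, xs))  # 2.0 0.0
# The log-likelihood is lower on both sides of lam_hat.
print(poisson_log_likelihood(lam_hat - 0.1, xs) < poisson_log_likelihood(lam_hat, xs))
print(poisson_log_likelihood(lam_hat + 0.1, xs) < poisson_log_likelihood(lam_hat, xs))
```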

