Decision Theory 1 (Transcript)
8/8/2019 Decision Thry1
http://slidepdf.com/reader/full/decision-thry1 1/23
Statistical Decision Theory
Abraham Wald (1902-1950)
Wald's test
Rigorous proof of the consistency of the MLE:
"Note on the consistency of the maximum likelihood estimate", Ann. Math. Statist., 20, 595-601.
Statistical Decision Theory
and Hypothesis Testing.
A major use of statistical inference is its application to decision making
under uncertainty, e.g. parameter estimation.
Unlike classical statistics, which is directed only towards using sampling
information to make inferences about unknown numerical quantities,
decision theory attempts to combine the sampling information with
knowledge of the consequences of our decisions.
Three elements in SDT
State of Nature: θ, some unknown quantity, say a parameter.
Decision Space D: the space of all possible decisions/actions/rules/estimators.
Loss function L(θ, d(X)):
- a non-negative function on Θ × D.
- a measure of how much we lose by choosing action d when θ is the true state.
- In estimation, a measure of the accuracy of the estimator d of θ.
For example,
Θ = {0, 1}:
θ = 0 means "nuclear warhead is NOT headed to UBC"
θ = 1 means "nuclear warhead is headed to UBC"
D = {0, 1} = {Stay in Vancouver, Leave}
L(θ, d):
L(0, 0) = 0
L(0, 1) = cost of moving
L(1, 0) = loss of belongings +
L(1, 1) = cost of moving + cost of belongings we cannot move
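This two-state example can be written down as a small lookup table. A minimal sketch, with hypothetical numeric costs (the slides give no numbers):

```python
# Sketch of the warhead example's loss function as a table.
# All numeric costs are hypothetical placeholders, not from the slides.
COST_MOVING = 10          # cost of moving
COST_LEFT_BEHIND = 50     # cost of belongings we cannot move
COST_BELONGINGS = 1000    # loss of belongings if we stay and theta = 1

# L[(theta, d)]: theta in {0, 1}, d in {0: stay in Vancouver, 1: leave}
L = {
    (0, 0): 0,
    (0, 1): COST_MOVING,
    (1, 0): COST_BELONGINGS,
    (1, 1): COST_MOVING + COST_LEFT_BEHIND,
}

def loss(theta, d):
    return L[(theta, d)]
```

The table makes explicit that the loss depends jointly on the unknown state θ and the chosen action d.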
Common loss functions
Univariate:
- L1 = |θ - d(x)| (absolute error loss)
- L2 = (θ - d(x))² (squared error loss)
Multivariate:
- (Generalized) Euclidean norm: [θ - d(x)]ᵀ Q [θ - d(x)], where Q is positive definite.
More generally:
- Non-decreasing functions of L1 or of the Euclidean norm.
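These loss functions are one-liners in code. A minimal NumPy sketch (function names are my own):

```python
import numpy as np

def abs_loss(theta, d):
    """L1: absolute error loss |theta - d|."""
    return abs(theta - d)

def sq_loss(theta, d):
    """L2: squared error loss (theta - d)^2."""
    return (theta - d) ** 2

def quad_loss(theta, d, Q):
    """Generalized Euclidean norm [theta - d]^T Q [theta - d], Q positive definite."""
    e = np.asarray(theta, dtype=float) - np.asarray(d, dtype=float)
    return float(e @ Q @ e)
```

With Q = I the generalized norm reduces to the ordinary squared Euclidean distance.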
The loss function L(θ, d(X)) is random. Averaging removes the randomness in two ways:
- Frequentist: R(θ, d) = E[L(θ, d(X))], averaging over X given θ: the risk.
- Bayesian: E[L(θ, d(X)) | X = x], averaging over θ given the data: the posterior risk.
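The frequentist risk can be approximated by Monte Carlo. A sketch (this specific example, sample mean vs sample median for a normal mean, is mine, not the slides'):

```python
import numpy as np

# Monte Carlo approximation of the risk R(theta, d) = E[L(theta, d(X))]
# under squared error loss, for two estimators of a normal mean.
rng = np.random.default_rng(0)

def risk(estimator, theta, n=10, reps=20000):
    X = rng.normal(theta, 1.0, size=(reps, n))      # reps datasets of size n
    return float(np.mean((estimator(X) - theta) ** 2))

r_mean = risk(lambda X: X.mean(axis=1), theta=2.0)      # ~ 1/n = 0.1
r_med = risk(lambda X: np.median(X, axis=1), theta=2.0)  # larger for normal data
```

For normal data the sample mean's risk is exactly 1/n, while the median's is larger, so the mean is better in the sense of risk here.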
Estimator Comparison
The risk principle: the estimator d1(X) is better than another estimator d2(X) in the sense of risk if R(θ, d1) ≤ R(θ, d2) for all θ, with strict inequality for some θ.
Best estimator (uniformly minimum risk estimator): d*(X) = arg min R(θ, d(X)) over the class of all estimators, simultaneously for all θ.
However, in general, such a best estimator does not exist.
The class of all estimators is too large. One remedy is to shrink the class of estimators and then find the best estimator in this smaller class.
For instance, consider only mean-unbiased estimators. In particular, the UMVUE is the best unbiased estimator when the L2 (squared error) loss is used.
Another remedy is to weaken the optimality criterion by considering the maximum
value of the risk over all θ, and then choose the estimator with the
smallest maximum risk. The best estimator according to this
minimax principle is called the minimax estimator.
Notice that the risk depends on θ, so the risk functions R(θ, d1) and R(θ, d2) of two
estimators often cross each other: each is the winner over part of the parameter
space and the loser over the rest. This is another way in which a uniformly best
estimator can fail to exist.
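A concrete numerical sketch of the minimax comparison (this binomial example is standard but not from the slides): X ~ Bin(n, p) under squared error loss, comparing the usual d1 = X/n against the classical constant-risk shrinkage estimator d2 = (X + √n/2)/(n + √n).

```python
import numpy as np

n = 25
p_grid = np.linspace(0.001, 0.999, 999)

# d1 = X/n: exact risk p(1-p)/n, maximized at p = 1/2.
risk1 = p_grid * (1 - p_grid) / n

# d2 = (X + sqrt(n)/2)/(n + sqrt(n)): linear in X, so its exact risk is
# variance + bias^2, which works out to be constant in p.
a = np.sqrt(n) / 2
c = 1.0 / (n + np.sqrt(n))
risk2 = c**2 * n * p_grid * (1 - p_grid) + (c * (n * p_grid + a) - p_grid) ** 2

max1 = float(risk1.max())   # = 1/(4n) = 0.01
max2 = float(risk2.max())   # = n / (4 (n + sqrt(n))^2) < 0.01
```

Since max2 < max1, the minimax principle prefers d2 even though d1 has smaller risk for p near 0 or 1: the two risk curves cross.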
Alternatively, we can find the best estimator by minimizing
the average risk with respect to a prior π of θ in the Bayesian
framework.
Given a prior π of θ, the average risk of the estimator d(X),
r_π(d) = ∫ R(θ, d) π(θ) dθ,
is called the Bayes risk.
The estimator having the smallest Bayes risk with a specific prior π
is called the Bayes estimator (with respect to π).
Under the Bayes risk principle, with a discrete prior,
r_π(d) = Σᵢ R(θᵢ, d) π(θᵢ).
(Figure: the risk functions R(θ, d1) and R(θ, d2) cross, so each of d1 and d2 is the winner in some regions of θ and the loser in others; the Bayes risk averages over θ with weights π(θ).)
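The discrete Bayes risk is just a weighted sum. A sketch using the two-point warhead state space; the prior and loss numbers are hypothetical placeholders:

```python
# Discrete Bayes risk r_pi(d) = sum_i R(theta_i, d) * pi(theta_i),
# for Theta = {0, 1} with a hypothetical prior and hypothetical losses.
pi = {0: 0.99, 1: 0.01}                               # prior on theta
L = {(0, 0): 0, (0, 1): 10, (1, 0): 1000, (1, 1): 60}  # loss table

def bayes_risk(d):
    # For a fixed decision taken without data, the risk R(theta, d) is
    # simply the loss L(theta, d), so the Bayes risk is a weighted sum.
    return sum(pi[theta] * L[(theta, d)] for theta in pi)

r_stay = bayes_risk(0)   # 0.99*0 + 0.01*1000 = 10.0
r_leave = bayes_risk(1)  # 0.99*10 + 0.01*60  = 10.5
```

Under this prior "stay" has the smaller Bayes risk; a prior putting more weight on θ = 1 would flip the decision.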
In general, it is not easy to find the Bayes estimator by
minimizing the Bayes risk.
However, if the Bayes risk of the Bayes estimator is
finite, then the estimator minimizing the posterior risk
and the Bayes estimator are the same.
Some examples of finding the Bayes estimator
(1) Squared error loss: minimize the posterior risk,
min over d of E[(θ - d)² | x].
Expanding, E[(θ - d)² | x] = E[θ² | x] - 2d E[θ | x] + d², a quadratic f(d) in d.
The minimizer of f(d) is d = E[θ | x], i.e. the posterior mean.
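This minimization can be checked numerically: over a grid of candidate decisions d, the estimated posterior risk should bottom out at the posterior mean. A sketch using an arbitrary stand-in posterior (Gamma draws; the example is mine):

```python
import numpy as np

# Check that the posterior mean minimizes E[(theta - d)^2 | x].
rng = np.random.default_rng(1)
theta = rng.gamma(shape=3.0, scale=2.0, size=100_000)  # stand-in posterior draws

d_grid = np.linspace(0.0, 12.0, 1201)                  # candidate decisions
post_risk = [np.mean((theta - d) ** 2) for d in d_grid]
d_star = float(d_grid[int(np.argmin(post_risk))])
# d_star lands (up to grid resolution) at theta.mean(), near shape*scale = 6.
```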
Some examples of finding the Bayes estimator
(2) Absolute error loss: min over d of E[|θ - d| | x].
The minimizer is med[θ | x], i.e. the posterior median.
(3) Linear error loss:
L(θ, d) = K0(θ - d) if θ - d ≥ 0,
L(θ, d) = K1(d - θ) if θ - d < 0.
The K0/(K0 + K1)-th quantile of the posterior is the Bayes estimator of θ.
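The quantile result for the linear loss can also be verified numerically: the grid minimizer of the posterior expected linear loss should coincide with the K0/(K0+K1) posterior quantile. A sketch with standard normal stand-in posterior draws (my example):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.normal(0.0, 1.0, size=200_000)  # stand-in posterior draws
K0, K1 = 3.0, 1.0                           # under-estimation 3x as costly

def lin_loss(d):
    # E[L(theta, d) | x] for the asymmetric linear loss, estimated by averaging.
    e = theta - d
    return np.mean(np.where(e >= 0, K0 * e, K1 * (-e)))

d_grid = np.linspace(-2.0, 2.0, 801)
d_star = float(d_grid[int(np.argmin([lin_loss(d) for d in d_grid]))])

q = float(np.quantile(theta, K0 / (K0 + K1)))  # 0.75 posterior quantile
```

With K0 = K1 this reduces to the absolute error loss and the minimizer becomes the posterior median.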
Relationship between minimax and
Bayes estimators
Denote by d_π the Bayes estimator with respect to π.
If the Bayes risk of d_π is equal to the maximum risk of d_π, i.e.
∫ R(θ, d_π) π(θ) dθ = sup over θ of R(θ, d_π),
then the Bayes estimator d_π is minimax.
In particular, if the Bayes estimator has a constant risk, then it is minimax.
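A standard worked instance of the constant-risk case (this example is not in the slides): for X ~ Bin(n, p) under squared error loss, the Bayes estimator with respect to a Beta(√n/2, √n/2) prior has constant risk, hence is minimax.

```latex
% Bayes estimator (posterior mean) under the Beta(\sqrt{n}/2, \sqrt{n}/2) prior:
d_\pi(X) = \frac{X + \sqrt{n}/2}{n + \sqrt{n}}
% Its risk decomposes into variance plus squared bias:
R(p, d_\pi) = \frac{n\,p(1-p)}{(n+\sqrt{n})^2} + \frac{n\,(1/2 - p)^2}{(n+\sqrt{n})^2}
            = \frac{n}{4\,(n+\sqrt{n})^2}
% which is constant in p, so by the result above d_\pi is minimax.
```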
Problems with the risk measure:
The risk measure is too sensitive to the choice of loss function.
All estimators are assumed to have finite risks, so, in general, the risk
measure cannot be used in problems with heavy tails or outliers.
Other measures:
(1) Pitman measure of closeness, PMC:
d1 is Pitman-closer to θ than d2 if
P_θ(||d1 - θ||_Q < ||d2 - θ||_Q) ≥ 1/2 for all θ.
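The PMC probability is easy to estimate by simulation. A sketch (my example, not the slides'): comparing the sample mean (d1) and sample median (d2) of n normal observations.

```python
import numpy as np

# Monte Carlo estimate of P_theta(|d1 - theta| < |d2 - theta|)
# for d1 = sample mean, d2 = sample median, X_i ~ N(theta, 1).
rng = np.random.default_rng(3)
theta, n, reps = 0.0, 11, 50_000
X = rng.normal(theta, 1.0, size=(reps, n))
d1 = X.mean(axis=1)
d2 = np.median(X, axis=1)
pmc = float(np.mean(np.abs(d1 - theta) < np.abs(d2 - theta)))
# pmc exceeds 1/2: the mean is Pitman-closer than the median for normal data.
```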
Other measures:
(2) Universal domination, u.d.:
d1(X) is said to universally dominate d2(X) if,
for all nondecreasing functions h and all θ,
E[h(|| d1(X) - θ ||_Q)] ≤ E[h(|| d2(X) - θ ||_Q)].
(3) Stochastic domination, s.d.:
d1(X) is said to stochastically dominate d2(X) if,
for every c > 0 and all θ,
P[|| d1(X) - θ ||_Q ≥ c] ≤ P[|| d2(X) - θ ||_Q ≥ c].
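Stochastic domination can be checked exactly in simple cases. A sketch (my example): for d1 = mean of n observations and d2 = a single observation with X_i ~ N(θ, 1), the losses are |N(0, 1/n)| and |N(0, 1)|, so d1's loss has the uniformly smaller tail.

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tail(c, sigma):
    """P(|N(0, sigma^2)| >= c)."""
    return 2.0 * (1.0 - Phi(c / sigma))

# d1 = mean of n obs: loss ~ |N(0, 1/n)|; d2 = one obs: loss ~ |N(0, 1)|.
n = 9
cs = np.linspace(0.01, 3.0, 300)
dominates = all(tail(c, 1 / np.sqrt(n)) <= tail(c, 1.0) for c in cs)
# dominates is True on this grid: d1 stochastically dominates d2.
```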
Problems:
(1) Pitman measure of closeness, PMC:
PMC is not transitive: it can happen that d1 is Pitman-closer than d2,
and d2 is Pitman-closer than d3, yet d3 is Pitman-closer than d1,
so PMC need not single out a best estimator.
Problems:
(2) Universal domination, u.d.:
d1(X) is said to universally dominate d2(X) if,
for all nondecreasing functions h and all θ,
E[h(|| d1(X) - θ ||_Q)] ≤ E[h(|| d2(X) - θ ||_Q)].
Since expectation is a linear operator, taking h(t) = at + b with a > 0
shows that universal domination already implies domination of the
risks E|| d(X) - θ ||_Q themselves.
For all nondecreasing functions h, comparing E[h(|| d1(X) - θ ||_Q)] with
E[h(|| d2(X) - θ ||_Q)] can be reduced to a stochastic comparison of the
losses via a transform T with the property T[h(y)] = h[T(y)] for monotone h;
in this sense universal domination and stochastic domination are equivalent.
Thank You!!