Decision Thry1

Uploaded by manishbansalkota, 10-Apr-2018

TRANSCRIPT

Page 1: Decision Thry1

8/8/2019 Decision Thry1

http://slidepdf.com/reader/full/decision-thry1 1/23

Page 2: Decision Thry1


Statistical Decision Theory

Abraham Wald (1902 - 1950)

Wald's test

Rigorous proof of the consistency of the MLE: "Note on the consistency of the maximum likelihood estimate", Ann. Math. Statist., 20, 595-601.

Page 3: Decision Thry1


Statistical Decision Theory and Hypothesis Testing

A major use of statistical inference is its application to decision making under uncertainty, e.g. parameter estimation.

Unlike classical statistics, which is directed only toward the use of sampling information in making inferences about unknown numerical quantities, decision theory attempts to combine the sampling information with knowledge of the consequences of our decisions.

Page 4: Decision Thry1


Three elements in SDT

State of Nature Θ: some unknown quantities, say parameters.

Decision space D: the space of all possible decisions/actions/rules/estimators.

Loss function L(θ, d(X)):
 - a non-negative function on Θ × D.
 - a measure of how much we lose by choosing action d when θ is the true state of nature.
 - in estimation, a measure of the accuracy of an estimator d of θ.

Page 5: Decision Thry1


For example,

Θ = {0, 1}, where
θ = 0 means "nuclear warhead is NOT headed to UBC"
θ = 1 means "nuclear warhead is headed to UBC"

D = {0, 1} = {Stay in Vancouver, Leave}

L(θ, d):
L(0, 0) = 0
L(0, 1) = cost of moving
L(1, 1) = cost of moving + cost of belongings we cannot move
L(1, 0) = loss of belongings +

Page 6: Decision Thry1


Common loss functions

Univariate:
 - L1 = |θ − d(x)| (absolute error loss)
 - L2 = (θ − d(x))² (squared error loss)

Multivariate:
 - (Generalized) Euclidean norm: [θ − d(x)]ᵀ Q [θ − d(x)], where Q is positive definite

More generally:
 - non-decreasing functions of L1 or of the Euclidean norm
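As a small sketch of these loss functions in code (the function names are my own, not from the slides):

```python
import numpy as np

def l1_loss(theta, d):
    """Absolute error loss |theta - d|."""
    return abs(theta - d)

def l2_loss(theta, d):
    """Squared error loss (theta - d)^2."""
    return (theta - d) ** 2

def q_norm_loss(theta, d, Q):
    """Generalized Euclidean norm (theta - d)^T Q (theta - d), Q positive definite."""
    e = np.asarray(theta) - np.asarray(d)
    return float(e @ Q @ e)

# Example: estimate theta = 2.0 by d = 2.5
print(l1_loss(2.0, 2.5))   # 0.5
print(l2_loss(2.0, 2.5))   # 0.25
# Bivariate case with Q = identity (ordinary squared Euclidean distance)
print(q_norm_loss([1.0, 2.0], [1.5, 1.0], np.eye(2)))  # 0.25 + 1.0 = 1.25
```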

Page 7: Decision Thry1


The loss function L(θ, d(X)) is random, since it depends on the data X. Taking an expectation of L removes the randomness:

Frequentist: R(θ, d) = E_X[L(θ, d(X))], called the risk.

Bayesian: E[L(θ, d) | x], the expectation over the posterior of θ, called the posterior risk.
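As a numerical sketch of the two expectations (my own example, not from the slides): for X1, …, Xn iid N(θ, σ²) with σ known, the frequentist risk of the sample mean under squared error is σ²/n, while with a N(μ0, τ²) prior the posterior risk of the posterior mean is the posterior variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, theta_true = 25, 2.0, 1.0

# Frequentist risk of the sample mean under squared error loss:
# R(theta, xbar) = E_X[(xbar - theta)^2] = sigma^2/n, approximated by Monte Carlo.
reps = 200_000
xbars = rng.normal(theta_true, sigma, size=(reps, n)).mean(axis=1)
freq_risk = np.mean((xbars - theta_true) ** 2)      # close to sigma^2/n = 4/25 = 0.16

# Posterior risk: with a N(mu0, tau^2) prior, the posterior of theta is normal,
# and the posterior risk of the posterior mean under squared error is the
# posterior variance 1/(n/sigma^2 + 1/tau^2).
mu0, tau = 0.0, 10.0
post_var = 1.0 / (n / sigma**2 + 1.0 / tau**2)
```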

Page 8: Decision Thry1


Estimator Comparison

The risk principle: the estimator d1(X) is better than another estimator d2(X) in the sense of risk if R(θ, d1) ≤ R(θ, d2) for all θ, with strict inequality for some θ.

The best estimator over the class of all estimators (the uniformly minimum risk estimator),

d*(X) = arg min_d R(θ, d(X)) for all θ,

however, in general does not exist.

Page 9: Decision Thry1


One remedy: shrink the class of estimators, then find the best estimator in this smaller class. For instance, consider only mean-unbiased estimators. In particular, the UMVUE is the best unbiased estimator when the L2 (squared error) loss is used.

[Figure: the class of all estimators is too large for a best estimator to exist; within a suitably smaller class, one estimator can claim "I am the best!!"]

Page 10: Decision Thry1


Another remedy: weaken the optimality criterion by considering the maximum value of the risk over all θ. Then choose the estimator with the smallest maximum risk. The best estimator according to this minimax principle is called the minimax estimator.

Notice that the risk depends on θ. So the risk curves of two estimators often cross each other: each is the winner on part of the parameter space and the loser elsewhere. This is another way the best estimator can fail to exist.

[Figure: crossing risk curves R(θ, d) for two estimators d1 and d2.]
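The crossing of risk curves and the minimax comparison can be sketched numerically. The setup below (Binomial data, the MLE X/n versus a shrinkage estimator (X + √n/2)/(n + √n), squared error loss) is my own illustrative choice, not taken from the slides:

```python
import numpy as np

n = 16
a = np.sqrt(n) / 2          # shrinkage constant sqrt(n)/2
thetas = np.linspace(0.01, 0.99, 99)

# Exact risks (MSEs) under squared error for X ~ Binomial(n, theta):
risk_mle = thetas * (1 - thetas) / n                              # d1(X) = X/n
risk_shrunk = (n * thetas * (1 - thetas)
               + a**2 * (1 - 2 * thetas)**2) / (n + 2 * a)**2     # d2(X) = (X + a)/(n + 2a)

# The risk curves cross: neither estimator is uniformly better.
crossing = bool(np.any(risk_mle < risk_shrunk) and np.any(risk_mle > risk_shrunk))

# The minimax principle compares maximum risks instead.
max_mle, max_shrunk = risk_mle.max(), risk_shrunk.max()   # 1/64 vs 1/100 here
```

Here the shrinkage estimator has the smaller maximum risk even though the MLE beats it near θ = 0 and θ = 1.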

Page 11: Decision Thry1


Alternatively, we can find the best estimator by minimizing the average risk with respect to a prior of θ in the Bayesian framework.

Given a prior π of θ, the average risk of the estimator d(X), defined by

r(d) = ∫ R(θ, d) π(θ) dθ,

is called the Bayes risk of d.

The estimator having the smallest Bayes risk with respect to a specific prior π is called the Bayes estimator (with respect to π).
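The integral defining the Bayes risk can be approximated numerically. Below (my own illustration, reusing the Binomial MLE and a shrinkage estimator under squared error) the prior is Uniform(0,1) and the integral is a grid average:

```python
import numpy as np

# Bayes risks r(d) = integral of R(theta, d) * pi(theta) dtheta with a
# Uniform(0,1) prior, approximated by averaging the risk over a fine theta grid.
n = 16
a = np.sqrt(n) / 2
thetas = np.linspace(0.0, 1.0, 100_001)

risk_mle = thetas * (1 - thetas) / n
risk_shrunk = (n * thetas * (1 - thetas) + a**2 * (1 - 2 * thetas)**2) / (n + 2 * a)**2

bayes_risk_mle = risk_mle.mean()        # exact value is 1/(6n) = 1/96
bayes_risk_shrunk = risk_shrunk.mean()  # constant risk, so exactly 0.01 here
```

Under this prior the shrinkage estimator has the smaller Bayes risk, so it would be preferred by the Bayes risk principle.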

Page 12: Decision Thry1


Under the Bayes risk principle, estimators are compared by their prior-weighted average risks; for a discrete prior this is

r(d) = Σ_i R(θ_i, d) π(θ_i).

[Figure: risk curves R(θ, d1) and R(θ, d2) weighted by the prior π(θ); each estimator is the winner in some regions and the loser in others, and the prior-weighted average decides the comparison.]

Page 13: Decision Thry1


In general, it is not easy to find the Bayes estimator by minimizing the Bayes risk directly.

However, if the Bayes risk of the Bayes estimator is finite, then the estimator minimizing the posterior risk and the Bayes estimator coincide.

Page 14: Decision Thry1


Some examples of finding the Bayes estimator

(1) Squared error loss: minimize the posterior risk

min_d E[(θ − d)² | x]

f(d) = E[(θ − d)² | x] = E[θ² | x] − 2d E[θ | x] + d²

The minimizer of f(d) is E[θ | x], i.e. the posterior mean.
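A quick numerical check of this result (my own worked example, assuming a conjugate Beta-Binomial setup not stated on the slides):

```python
import numpy as np

# Beta(2, 2) prior, 7 successes in 10 Bernoulli trials -> Beta(9, 5) posterior.
a0, b0 = 2.0, 2.0
n, x = 10, 7
a_post, b_post = a0 + x, b0 + (n - x)
post_mean = a_post / (a_post + b_post)         # posterior mean = 9/14

# Verify that the posterior mean minimizes E[(theta - d)^2 | x] by brute force:
rng = np.random.default_rng(1)
draws = rng.beta(a_post, b_post, size=100_000)   # posterior draws
grid = np.linspace(0.0, 1.0, 1001)
post_risk = [np.mean((draws - d) ** 2) for d in grid]
best_d = grid[int(np.argmin(post_risk))]         # lands next to post_mean
```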

Page 15: Decision Thry1


Some examples of finding the Bayes estimator

(2) Absolute error loss:

min_d E[|θ − d| | x]

The minimizer is med[θ | x], i.e. the posterior median.

(3) Linear error loss:

L(θ, d) = K0(θ − d) if θ − d ≥ 0, and
        = K1(d − θ) if θ − d < 0.

The K0/(K0 + K1)-th quantile of the posterior is the Bayes estimator of θ.
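Both results can be checked numerically against posterior draws (a sketch under my own assumed posterior, Beta(9, 5), and my own choice K0 = 3, K1 = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
draws = rng.beta(9, 5, size=100_000)    # stand-in posterior draws
grid = np.linspace(0.3, 0.95, 651)      # candidate values of d

# Absolute error loss -> minimizer should be the posterior median
abs_risk = [np.mean(np.abs(draws - d)) for d in grid]
d_abs = grid[int(np.argmin(abs_risk))]

# Linear error loss with K0 = 3, K1 = 1 -> minimizer should be the
# K0/(K0+K1) = 3/4 posterior quantile
K0, K1 = 3.0, 1.0
lin_risk = [np.mean(np.where(draws - d >= 0, K0 * (draws - d), K1 * (d - draws)))
            for d in grid]
d_lin = grid[int(np.argmin(lin_risk))]
```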

Page 16: Decision Thry1


Relationship between minimax and Bayes estimators

Denote by d_π the Bayes estimator with respect to the prior π. If the Bayes risk of d_π is equal to the maximum risk of d_π, i.e.

∫ R(θ, d_π) π(θ) dθ = sup_θ R(θ, d_π),

then the Bayes estimator d_π is minimax.

In particular, if the Bayes estimator has a constant risk, then it is minimax.
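A classic instance of the constant-risk criterion (my own worked example, not on the slides): for X ~ Binomial(n, θ) under squared error loss, the Bayes estimator with respect to the Beta(√n/2, √n/2) prior is d(X) = (X + √n/2)/(n + √n), and its risk is constant in θ, so it is minimax.

```python
import numpy as np

n = 16
a = np.sqrt(n) / 2    # Beta(sqrt(n)/2, sqrt(n)/2) prior parameters

def risk(theta):
    # Exact MSE of d(X) = (X + a)/(n + 2a) for X ~ Binomial(n, theta):
    # [n*theta*(1-theta) + a^2*(1-2*theta)^2] / (n + 2a)^2,
    # which simplifies to n / (4*(n + 2a)^2) when a = sqrt(n)/2.
    return (n * theta * (1 - theta) + a**2 * (1 - 2 * theta)**2) / (n + 2 * a)**2

risks = np.array([risk(t) for t in np.linspace(0.0, 1.0, 11)])
constant_risk = bool(np.allclose(risks, risks[0]))   # risk does not depend on theta
```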

Page 17: Decision Thry1


Problems with the risk measure:

 - The risk measure is too sensitive to the choice of loss function.
 - All estimators are assumed to have finite risks. So, in general, the risk measure cannot be used in problems with heavy tails or outliers.

Page 18: Decision Thry1


Other measures:

(1) Pitman measure of closeness (PMC):

P(‖d1 − θ‖_Q < ‖d2 − θ‖_Q) > 1/2

d1 is Pitman-closer to θ than d2 if the above condition holds for all θ.
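A Monte Carlo sketch of the PMC criterion (my own example: sample mean versus sample median for normal data, where the mean is typically Pitman-closer):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 0.0, 9, 200_000

x = rng.normal(theta, 1.0, size=(reps, n))
err_mean = np.abs(x.mean(axis=1) - theta)          # |sample mean - theta|
err_med = np.abs(np.median(x, axis=1) - theta)     # |sample median - theta|

# Estimated PMC probability P(|mean - theta| < |median - theta|);
# a value above 1/2 favors the sample mean in the PMC sense.
pmc = float(np.mean(err_mean < err_med))
```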

Page 19: Decision Thry1


Other measures:

(2) Universal domination (u.d.):

d1(X) is said to universally dominate d2(X) if, for all nondecreasing functions h and all θ,

E[h(‖d1(X) − θ‖_Q)] ≤ E[h(‖d2(X) − θ‖_Q)].

(3) Stochastic domination (s.d.):

d1(X) is said to stochastically dominate d2(X) if, for every c > 0 and all θ,

P[‖d1(X) − θ‖_Q ≥ c] ≤ P[‖d2(X) − θ‖_Q ≥ c].
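A simulation sketch of stochastic domination (my own example: the mean of all 20 observations versus the mean of only the first 5, so the error distributions are |N(0, 1/20)| versus |N(0, 1/5)| and the former is stochastically smaller for every θ):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, reps = 2.0, 200_000
x = rng.normal(theta, 1.0, size=(reps, 20))

err_full = np.abs(x.mean(axis=1) - theta)          # d1: mean of all 20 observations
err_part = np.abs(x[:, :5].mean(axis=1) - theta)   # d2: mean of the first 5 only

# Check P[|d1 - theta| >= c] <= P[|d2 - theta| >= c] over a grid of thresholds c
cs = np.linspace(0.05, 1.0, 20)
dominates = all(np.mean(err_full >= c) <= np.mean(err_part >= c) for c in cs)
```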

Page 20: Decision Thry1


Problems:

(1) Pitman measure of closeness (PMC): the pairwise comparison is not transitive. It can happen that d1 is Pitman-closer than d2, and d2 is Pitman-closer than d3, yet d3 is Pitman-closer than d1, so d1, d2, d3 form a cycle with no overall winner.
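A stylized numeric sketch of this cycle (my own construction, in the spirit of nontransitive dice; the error sets are hypothetical, with the absolute errors drawn independently and uniformly from each set):

```python
from itertools import product

# Absolute errors |d_i - theta| take the listed values with equal probability;
# a smaller error means the estimator is closer to theta.
errs = {"d1": [8, 6, 1], "d2": [9, 4, 2], "d3": [7, 5, 3]}

def pmc(a, b):
    # P(error of a < error of b) over all equally likely independent pairs
    pairs = list(product(errs[a], errs[b]))
    return sum(x < y for x, y in pairs) / len(pairs)

p12 = pmc("d1", "d2")   # d1 Pitman-closer than d2: probability 5/9 > 1/2
p23 = pmc("d2", "d3")   # d2 Pitman-closer than d3: probability 5/9 > 1/2
p31 = pmc("d3", "d1")   # d3 Pitman-closer than d1: probability 5/9 -- a cycle!
```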

Page 21: Decision Thry1


Problems:

(2) Universal domination (u.d.): recall that d1(X) universally dominates d2(X) if, for all nondecreasing functions h and all θ,

E[h(‖d1(X) − θ‖_Q)] ≤ E[h(‖d2(X) − θ‖_Q)].

Since expectation is a linear operator, taking the linear (nondecreasing) functions h(t) = at + b with a > 0 reduces the condition to E‖d1(X) − θ‖_Q ≤ E‖d2(X) − θ‖_Q, i.e. domination in expected Q-norm error.

Page 22: Decision Thry1


For all nondecreasing functions h, compare E_θ[h(‖d1(X) − θ‖_Q)] with E_θ[h(‖d2(X) − θ‖_Q)], where the operator T has the property that T[h(y)] = h[T(y)].

Page 23: Decision Thry1


Thank You!!