Decision Theory 1 (Transcript)
8/8/2019 Decision Thry1
http://slidepdf.com/reader/full/decision-thry1 1/23
Statistical Decision Theory
Abraham Wald (1902-1950)
Wald's test
Rigorous proof of the consistency of the MLE:
"Note on the consistency of the maximum likelihood estimate", Ann. Math. Statist., 20, 595-601.
Statistical Decision Theory
and Hypothesis Testing.
A major use of statistical inference is its application to decision making
under uncertainty, e.g. parameter estimation.
Unlike classical statistics, which is directed only towards using sampling
information to make inferences about unknown numerical quantities,
decision theory attempts to combine the sampling information with
knowledge of the consequences of our decisions.
Three elements in SDT
State of Nature: θ, some unknown quantity, say a parameter.
Decision Space D: the space of all possible decisions/actions/rules/estimators.
Loss function L(θ, d(X)):
- a non-negative function on Θ × D.
- a measure of how much we lose by choosing action d when θ is the true state.
- In estimation, a measure of the accuracy of the estimator d of θ.
For example,
Θ = {0, 1}:
θ = 0 means "nuclear warhead is NOT headed to UBC"
θ = 1 means "nuclear warhead is headed to UBC"
D = {0, 1} = {Stay in Vancouver, Leave}
L(θ, d):
L(0, 0) = 0
L(0, 1) = cost of moving
L(1, 0) = loss of belongings +
L(1, 1) = cost of moving + cost of belongings we cannot move
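This two-state example can be written down as a small lookup table. A minimal sketch, with hypothetical numeric costs (the slides give no numbers):

```python
# Sketch of the warhead example's loss function as a table.
# All numeric costs are hypothetical placeholders, not from the slides.
COST_MOVING = 10          # cost of moving
COST_LEFT_BEHIND = 50     # cost of belongings we cannot move
COST_BELONGINGS = 1000    # loss of belongings if we stay and theta = 1

# L[(theta, d)]: theta in {0, 1}, d in {0: stay in Vancouver, 1: leave}
L = {
    (0, 0): 0,
    (0, 1): COST_MOVING,
    (1, 0): COST_BELONGINGS,
    (1, 1): COST_MOVING + COST_LEFT_BEHIND,
}

def loss(theta, d):
    return L[(theta, d)]
```

The table makes explicit that the loss depends jointly on the unknown state θ and the chosen action d.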
Common loss functions
Univariate:
- L1 = |θ - d(x)| (absolute error loss)
- L2 = (θ - d(x))² (squared error loss)
Multivariate:
- (Generalized) Euclidean norm: [θ - d(x)]ᵀ Q [θ - d(x)], where Q is positive definite.
More generally:
- Non-decreasing functions of L1 or of the Euclidean norm.
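These loss functions are one-liners in code. A minimal NumPy sketch (function names are my own):

```python
import numpy as np

def abs_loss(theta, d):
    """L1: absolute error loss |theta - d|."""
    return abs(theta - d)

def sq_loss(theta, d):
    """L2: squared error loss (theta - d)^2."""
    return (theta - d) ** 2

def quad_loss(theta, d, Q):
    """Generalized Euclidean norm [theta - d]^T Q [theta - d], Q positive definite."""
    e = np.asarray(theta, dtype=float) - np.asarray(d, dtype=float)
    return float(e @ Q @ e)
```

With Q = I the generalized norm reduces to the ordinary squared Euclidean distance.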
The loss function L(θ, d(X)) is random. Averaging removes the randomness in two ways:
- Frequentist: R(θ, d) = E[L(θ, d(X))], averaging over X given θ: the risk.
- Bayesian: E[L(θ, d(X)) | X = x], averaging over θ given the data: the posterior risk.
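The frequentist risk can be approximated by Monte Carlo. A sketch (this specific example, sample mean vs sample median for a normal mean, is mine, not the slides'):

```python
import numpy as np

# Monte Carlo approximation of the risk R(theta, d) = E[L(theta, d(X))]
# under squared error loss, for two estimators of a normal mean.
rng = np.random.default_rng(0)

def risk(estimator, theta, n=10, reps=20000):
    X = rng.normal(theta, 1.0, size=(reps, n))      # reps datasets of size n
    return float(np.mean((estimator(X) - theta) ** 2))

r_mean = risk(lambda X: X.mean(axis=1), theta=2.0)      # ~ 1/n = 0.1
r_med = risk(lambda X: np.median(X, axis=1), theta=2.0)  # larger for normal data
```

For normal data the sample mean's risk is exactly 1/n, while the median's is larger, so the mean is better in the sense of risk here.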
Estimator Comparison
The risk principle: the estimator d1(X) is better than another estimator d2(X) in the sense of risk if R(θ, d1) ≤ R(θ, d2) for all θ, with strict inequality for some θ.
Best estimator (uniformly minimum risk estimator): d*(X) = arg min R(θ, d(X)) over the class of all estimators, simultaneously for all θ.
However, in general, such a best estimator does not exist.
The class of all estimators is too large. One remedy is to shrink the class of estimators and then find the best estimator in this smaller class.
For instance, consider only mean-unbiased estimators. In particular, the UMVUE is the best unbiased estimator when the L2 (squared error) loss is used.
Another remedy is to weaken the optimality criterion by considering the maximum
value of the risk over all θ, and then choose the estimator with the
smallest maximum risk. The best estimator according to this
minimax principle is called the minimax estimator.
Notice that the risk depends on θ, so the risk functions R(θ, d1) and R(θ, d2) of two
estimators often cross each other: each is the winner over part of the parameter
space and the loser over the rest. This is another way in which a uniformly best
estimator can fail to exist.
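A concrete numerical sketch of the minimax comparison (this binomial example is standard but not from the slides): X ~ Bin(n, p) under squared error loss, comparing the usual d1 = X/n against the classical constant-risk shrinkage estimator d2 = (X + √n/2)/(n + √n).

```python
import numpy as np

n = 25
p_grid = np.linspace(0.001, 0.999, 999)

# d1 = X/n: exact risk p(1-p)/n, maximized at p = 1/2.
risk1 = p_grid * (1 - p_grid) / n

# d2 = (X + sqrt(n)/2)/(n + sqrt(n)): linear in X, so its exact risk is
# variance + bias^2, which works out to be constant in p.
a = np.sqrt(n) / 2
c = 1.0 / (n + np.sqrt(n))
risk2 = c**2 * n * p_grid * (1 - p_grid) + (c * (n * p_grid + a) - p_grid) ** 2

max1 = float(risk1.max())   # = 1/(4n) = 0.01
max2 = float(risk2.max())   # = n / (4 (n + sqrt(n))^2) < 0.01
```

Since max2 < max1, the minimax principle prefers d2 even though d1 has smaller risk for p near 0 or 1: the two risk curves cross.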
Alternatively, we can find the best estimator by minimizing
the average risk with respect to a prior π of θ in the Bayesian
framework.
Given a prior π of θ, the average risk of the estimator d(X),
r_π(d) = ∫ R(θ, d) π(θ) dθ,
is called the Bayes risk.
The estimator having the smallest Bayes risk with a specific prior π
is called the Bayes estimator (with respect to π).
Under the Bayes risk principle, with a discrete prior,
r_π(d) = Σᵢ R(θᵢ, d) π(θᵢ).
(Figure: the risk functions R(θ, d1) and R(θ, d2) cross, so each of d1 and d2 is the winner in some regions of θ and the loser in others; the Bayes risk averages over θ with weights π(θ).)
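The discrete Bayes risk is just a weighted sum. A sketch using the two-point warhead state space; the prior and loss numbers are hypothetical placeholders:

```python
# Discrete Bayes risk r_pi(d) = sum_i R(theta_i, d) * pi(theta_i),
# for Theta = {0, 1} with a hypothetical prior and hypothetical losses.
pi = {0: 0.99, 1: 0.01}                               # prior on theta
L = {(0, 0): 0, (0, 1): 10, (1, 0): 1000, (1, 1): 60}  # loss table

def bayes_risk(d):
    # For a fixed decision taken without data, the risk R(theta, d) is
    # simply the loss L(theta, d), so the Bayes risk is a weighted sum.
    return sum(pi[theta] * L[(theta, d)] for theta in pi)

r_stay = bayes_risk(0)   # 0.99*0 + 0.01*1000 = 10.0
r_leave = bayes_risk(1)  # 0.99*10 + 0.01*60  = 10.5
```

Under this prior "stay" has the smaller Bayes risk; a prior putting more weight on θ = 1 would flip the decision.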
In general, it is not easy to find the Bayes estimator by
minimizing the Bayes risk.
However, if the Bayes risk of the Bayes estimator is
finite, then the estimator minimizing the posterior risk
and the Bayes estimator are the same.
Some examples of finding the Bayes estimator
(1) Squared error loss: minimize the posterior risk,
min over d of E[(θ - d)² | x].
Expanding, E[(θ - d)² | x] = E[θ² | x] - 2d E[θ | x] + d², a quadratic f(d) in d.
The minimizer of f(d) is d = E[θ | x], i.e. the posterior mean.
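This minimization can be checked numerically: over a grid of candidate decisions d, the estimated posterior risk should bottom out at the posterior mean. A sketch using an arbitrary stand-in posterior (Gamma draws; the example is mine):

```python
import numpy as np

# Check that the posterior mean minimizes E[(theta - d)^2 | x].
rng = np.random.default_rng(1)
theta = rng.gamma(shape=3.0, scale=2.0, size=100_000)  # stand-in posterior draws

d_grid = np.linspace(0.0, 12.0, 1201)                  # candidate decisions
post_risk = [np.mean((theta - d) ** 2) for d in d_grid]
d_star = float(d_grid[int(np.argmin(post_risk))])
# d_star lands (up to grid resolution) at theta.mean(), near shape*scale = 6.
```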
Some examples of finding the Bayes estimator
(2) Absolute error loss: min over d of E[|θ - d| | x].
The minimizer is med[θ | x], i.e. the posterior median.
(3) Linear error loss:
L(θ, d) = K0(θ - d) if θ - d ≥ 0,
L(θ, d) = K1(d - θ) if θ - d < 0.
The K0/(K0 + K1)-th quantile of the posterior is the Bayes estimator of θ.
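The quantile result for the linear loss can also be verified numerically: the grid minimizer of the posterior expected linear loss should coincide with the K0/(K0+K1) posterior quantile. A sketch with standard normal stand-in posterior draws (my example):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.normal(0.0, 1.0, size=200_000)  # stand-in posterior draws
K0, K1 = 3.0, 1.0                           # under-estimation 3x as costly

def lin_loss(d):
    # E[L(theta, d) | x] for the asymmetric linear loss, estimated by averaging.
    e = theta - d
    return np.mean(np.where(e >= 0, K0 * e, K1 * (-e)))

d_grid = np.linspace(-2.0, 2.0, 801)
d_star = float(d_grid[int(np.argmin([lin_loss(d) for d in d_grid]))])

q = float(np.quantile(theta, K0 / (K0 + K1)))  # 0.75 posterior quantile
```

With K0 = K1 this reduces to the absolute error loss and the minimizer becomes the posterior median.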
Relationship between minimax and
Bayes estimators
Denote by d_π the Bayes estimator with respect to π.
If the Bayes risk of d_π is equal to the maximum risk of d_π, i.e.
∫ R(θ, d_π) π(θ) dθ = sup over θ of R(θ, d_π),
then the Bayes estimator d_π is minimax.
In particular, if the Bayes estimator has a constant risk, then it is minimax.
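A standard worked instance of the constant-risk case (this example is not in the slides): for X ~ Bin(n, p) under squared error loss, the Bayes estimator with respect to a Beta(√n/2, √n/2) prior has constant risk, hence is minimax.

```latex
% Bayes estimator (posterior mean) under the Beta(\sqrt{n}/2, \sqrt{n}/2) prior:
d_\pi(X) = \frac{X + \sqrt{n}/2}{n + \sqrt{n}}
% Its risk decomposes into variance plus squared bias:
R(p, d_\pi) = \frac{n\,p(1-p)}{(n+\sqrt{n})^2} + \frac{n\,(1/2 - p)^2}{(n+\sqrt{n})^2}
            = \frac{n}{4\,(n+\sqrt{n})^2}
% which is constant in p, so by the result above d_\pi is minimax.
```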
Problems with the risk measure:
The risk measure is too sensitive to the choice of loss function.
All estimators are assumed to have finite risks, so, in general, the risk
measure cannot be used in problems with heavy tails or outliers.
Other measures:
(1) Pitman measure of closeness, PMC:
d1 is Pitman-closer to θ than d2 if
P_θ(||d1 - θ||_Q < ||d2 - θ||_Q) ≥ 1/2 for all θ.
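The PMC probability is easy to estimate by simulation. A sketch (my example, not the slides'): comparing the sample mean (d1) and sample median (d2) of n normal observations.

```python
import numpy as np

# Monte Carlo estimate of P_theta(|d1 - theta| < |d2 - theta|)
# for d1 = sample mean, d2 = sample median, X_i ~ N(theta, 1).
rng = np.random.default_rng(3)
theta, n, reps = 0.0, 11, 50_000
X = rng.normal(theta, 1.0, size=(reps, n))
d1 = X.mean(axis=1)
d2 = np.median(X, axis=1)
pmc = float(np.mean(np.abs(d1 - theta) < np.abs(d2 - theta)))
# pmc exceeds 1/2: the mean is Pitman-closer than the median for normal data.
```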
Other measures:
(2) Universal domination, u.d.:
d1(X) is said to universally dominate d2(X) if,
for all nondecreasing functions h and all θ,
E[h(|| d1(X) - θ ||_Q)] ≤ E[h(|| d2(X) - θ ||_Q)].
(3) Stochastic domination, s.d.:
d1(X) is said to stochastically dominate d2(X) if,
for every c > 0 and all θ,
P[|| d1(X) - θ ||_Q ≥ c] ≤ P[|| d2(X) - θ ||_Q ≥ c].
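Stochastic domination can be checked exactly in simple cases. A sketch (my example): for d1 = mean of n observations and d2 = a single observation with X_i ~ N(θ, 1), the losses are |N(0, 1/n)| and |N(0, 1)|, so d1's loss has the uniformly smaller tail.

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tail(c, sigma):
    """P(|N(0, sigma^2)| >= c)."""
    return 2.0 * (1.0 - Phi(c / sigma))

# d1 = mean of n obs: loss ~ |N(0, 1/n)|; d2 = one obs: loss ~ |N(0, 1)|.
n = 9
cs = np.linspace(0.01, 3.0, 300)
dominates = all(tail(c, 1 / np.sqrt(n)) <= tail(c, 1.0) for c in cs)
# dominates is True on this grid: d1 stochastically dominates d2.
```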
Problems:
(1) Pitman measure of closeness, PMC:
PMC is not transitive: it can happen that d1 is Pitman-closer than d2,
and d2 is Pitman-closer than d3, yet d3 is Pitman-closer than d1,
so PMC need not single out a best estimator.
Problems:
(2) Universal domination, u.d.:
d1(X) is said to universally dominate d2(X) if,
for all nondecreasing functions h and all θ,
E[h(|| d1(X) - θ ||_Q)] ≤ E[h(|| d2(X) - θ ||_Q)].
Since expectation is a linear operator, taking h(t) = at + b with a > 0
shows that universal domination already implies domination of the
risks E|| d(X) - θ ||_Q themselves.
For all nondecreasing functions h, comparing E[h(|| d1(X) - θ ||_Q)] with
E[h(|| d2(X) - θ ||_Q)] can be reduced to a stochastic comparison of the
losses via a transform T with the property T[h(y)] = h[T(y)] for monotone h;
in this sense universal domination and stochastic domination are equivalent.
Thank You!!