Statistical Decision Theory
Perry Williams
Department of Fish, Wildlife, and Conservation Biology
Department of Statistics
Colorado State University
26 June 2016
Perry Williams Statistical Decision Theory 1 / 50
Motivation, History, and Fundamentals
Why Statistical Decision Theory?
Decisions are made at every step in scientific investigation
Data collection
Model selection
Summary statistics
Management
SDT provides a cohesive framework for decision making
Data collection – Dynamic adaptive sampling
Model selection – Optimal prediction
Summary statistics – Bayes rules
Management actions – Optimal management
History
Bayes’ Theorem appeared in “An Essay Towards Solving a Problem in the Doctrine of Chances”
“Aldrich suggests that we interpret [Bayes’ definition of probability] in terms of expected utility, and thus that Bayes’ result would make sense only to the extent to which one can bet on its observable consequences.”
- Stephen Fienberg, 2006.
1763 1774 1922 1931 1934 1949 1954 1961
History
Laplace published “Mémoire sur la Probabilité des Causes par les Événements”
Elaborate example of inverse probability
Uniform prior distributions
Methods for choosing estimators that minimize posterior loss
History
Fisher published “On the Mathematical Foundations of Theoretical Statistics”
Rejected inverse probability
Grounded his theory on the frequency interpretation of probability
Obviated the need for prior distributions
Introduced likelihood
Tests of significance
History
Fisher on Probability and Decisions
“We aim, in fact, at methods of inference which should be equally convincing to all rational minds, irrespective of any intentions they may have in utilizing the knowledge inferred.”
History
Wald published “Statistical Decision Functions”
Unified statistical theory by treating statistical problems as special cases of zero-sum two-person games
Statistical inference was viewed as a special case of decision theory (cf. Von Neumann and Morgenstern 1944)
History
“It is well recognized that the statistical estimation theory should and can be organized within the framework of the theory of statistical decision functions (Wald 1950)”
- Akaike, H. 1973. Information theory and an extension of the maximum likelihood principle.
History
Savage published “The Foundations of Statistics”
Set the stage for the Bayesian revival
History
“Decision theory is the best and most stimulating, if not the only, systematic model of statistics.”
History
Raiffa and Schlaifer published “Applied Statistical Decision Theory”
The methods of Fisher, Neyman, and Pearson did not address the main problem of a businessman: how to make decisions under uncertainty
Developed Bayesian decision theory
F.P. Ramsey
B. De Finetti
J.M. Keynes
H. Jeffreys
D.V. Lindley
D.R. Cox
J.W. Tukey
A. Birnbaum
M. Kendall
Fundamentals of SDT
Inferential Steps
1 Identify possible states of nature (support)
2 Assign prior probabilities
3 Assign model processes (for potentially many models)
4 Apply Bayes theorem to obtain posterior probabilities from data
Decision Steps
1 Identify possible states of nature (support)
2 Assign prior probabilities
3 Assign model processes
4 Apply Bayes theorem to obtain posterior probabilities from data
5 Enumerate possible decisions
6 Assign a loss function
7 Choose decision that minimizes expected loss
(Diagram: data y and a conceptual model give the likelihood [y|θ] and prior [θ]; combined with the potential actions a and the loss L(θ, a), these yield the optimal decision a* = argmin_a ∫_Θ ∫_Y L(θ, a) [y|θ] dy [θ] dθ.)
Williams, P.J., and M.B. Hooten. 2016. Combining statistical inference and decisions in ecology. Ecological Applications.
Basic Elements
Θ: potential states of nature
θ: true state of nature
A : potential actions
a: a specific action, possibly a function of data (i.e., δ(y))
L : Θ × A → ℝ: loss function
L(θ, a): loss incurred if action a is taken and θ is true
Example: Measurement With Uncertainty
(Figure: a posterior distribution for θ on [0, 5], marking candidate point estimates (the posterior mean, the median, and the fixed value 2.5) and, in successive panels, the loss associated with each.)
Types of Loss
Squared Error Loss: L(θ, a) = (θ − a)²

Linear Loss: L(θ, a) = c1(θ − a) if θ > a; c2(a − θ) if θ < a

0–1 Loss: L(θ, a) = 0 if θ = a; 1 if θ ≠ a
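These three loss functions can be sketched in Python. This is an illustrative implementation (the function names and the numpy vectorization are mine, not from the slides):

```python
import numpy as np

def squared_error_loss(theta, a):
    """L(theta, a) = (theta - a)^2."""
    return (theta - a) ** 2

def linear_loss(theta, a, c1=1.0, c2=1.0):
    """Asymmetric linear loss: c1*(theta - a) if theta > a, c2*(a - theta) if theta < a."""
    return np.where(theta > a, c1 * (theta - a), c2 * (a - theta))

def zero_one_loss(theta, a):
    """0 if the action hits theta exactly, 1 otherwise."""
    return np.where(theta == a, 0.0, 1.0)
```

All three accept scalars or arrays, so they can later be averaged over posterior draws of θ.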
Risk
Frequentist Risk
Bayesian Expected Loss
Bayesian Risk
Decision Rule δ(y)
Y : a random variable that depends on θ
Y : the sample space of Y
y : a realization from Y
δ : Y → A
(for any possible realization y ∈ Y, δ describes which action to take)
Decision Rule Example
H0 : θ ≥ 0
Ha : θ < 0
δ1(y) = a1 = Reject H0 if y < −0.3; a2 = Fail to reject H0 if y ≥ −0.3

δ2(y) = a1 = Reject H0 if y < −0.5; a2 = Fail to reject H0 if y ≥ −0.5
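Each rule maps a realization y to an action; a minimal sketch, with the cutoffs −0.3 and −0.5 taken from the slide:

```python
def delta1(y):
    """Reject H0: theta >= 0 (action a1) when the observation falls below -0.3."""
    return "reject H0" if y < -0.3 else "fail to reject H0"

def delta2(y):
    """A more conservative rule: reject only when y falls below -0.5."""
    return "reject H0" if y < -0.5 else "fail to reject H0"
```

The same observation can lead different rules to different actions, e.g. y = −0.4 rejects under δ1 but not δ2.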
Frequentist Risk
How much you expect to lose when using a decision rule ∀y ∈ Y
R(θ, δ) = E[L(θ, δ(y))] = ∫_Y L(θ, δ(y)) [y|θ] dy
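The risk integral can be approximated by Monte Carlo once a sampling model is assumed. The slides do not specify one, so the sketch below assumes Y ∼ Normal(θ, 1) and the 0–1 testing loss used on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def risk(theta, cutoff=-0.3, n_rep=200_000):
    """Monte Carlo estimate of R(theta, delta) = E[L(theta, delta(Y))] for the
    rule 'reject H0: theta >= 0 iff y < cutoff', assuming Y ~ Normal(theta, 1)."""
    y = rng.normal(theta, 1.0, size=n_rep)
    reject = y < cutoff
    # 0-1 testing loss: a wrong decision costs 1, a correct one costs 0
    wrong = reject if theta >= 0 else ~reject
    return wrong.mean()
```

For θ far above zero the rule almost never rejects, so the risk is near zero; at θ = 0 it approaches P(Y < −0.3 | θ = 0) ≈ 0.38.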
Frequentist Risk and p-values
δ1(y) = a1 = Reject H0 if y < −0.3; a2 = Fail to reject H0 if y ≥ −0.3

L(θ, δ(y)) = 1 if reject H0 and θ ≥ 0; 1 if fail to reject H0 and θ < 0; 0 otherwise
Frequentist Risk
(Figure: frequentist risk R(θ, δ) as a function of θ ∈ [−3, 3] for the rules δ1 and δ2.)
Decision Theory is Inherently Bayesian
Goal of Decision Theory: make a decision based on our belief in the probability of an unknown state
Frequentist Probability: the limit of a state’s relative frequency in a large number of trials
Bayesian Probability: the degree of rational belief to which a state is entitled in light of the given evidence
Bayesian Expected Loss
Probability distribution assigned to θ
Prior probability distribution: [θ]
Posterior probability distribution: [θ|y ]
Bayesian Expected Loss: the loss averaged over the distribution of θ.
Bayesian Expected Loss
ρ(a) = E_θ[L(θ, a)] = ∫_Θ L(θ, a) [θ] dθ (prior)

ρ(a) = E_θ|y[L(θ, a)] = ∫_Θ L(θ, a) [θ|y] dθ (posterior)

Bayes Rule: a* = argmin_a ρ(a)
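A Bayes rule can be computed numerically by minimizing ρ(a) over a grid of candidate actions. The sketch below uses hypothetical posterior draws (a gamma stand-in for [θ|y]) and the asymmetric linear loss with c1 = 3, c2 = 1, for which the minimizer is the c1/(c1 + c2) = 0.75 posterior quantile:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.gamma(2.0, 1.25, size=100_000)  # stand-in draws from [theta | y]

def rho(a, c1=3.0, c2=1.0):
    """Posterior expected linear loss, E_{theta|y} L(theta, a), by Monte Carlo."""
    return np.mean(np.where(theta > a, c1 * (theta - a), c2 * (a - theta)))

grid = np.linspace(0.0, 15.0, 1501)               # candidate actions
a_star = grid[np.argmin([rho(a) for a in grid])]  # Bayes rule by grid search
```

a_star lands on the grid point nearest the 75th percentile of the draws, illustrating how the loss function, not just the posterior, determines the optimal action.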
Bayesian Expected Loss
(Figure: the posterior [θ|y] with the mean, median, and the value 2.5 marked, and the weighted loss L(θ, a)[θ|y] whose integral over θ gives ρ(a) for each candidate a.)
Bayes Risk
Mathematical relationship between frequentist risk and Bayes expected loss:

r(a) = ∫_Θ { ∫_Y L(θ, a) [y|θ] dy } [θ] dθ

Note: [y|θ][θ] = [θ|y][y]

r(a) = ∫_Y { ∫_Θ L(θ, a) [θ|y] dθ } [y] dy
Bayesian Expected Loss vs. Frequentist Risk
R(θ, δ) integrates over y, but y is known (i.e., data)
R(θ, δ) is a function of unknown θ
R(θ, δ) doesn’t make use of auxiliary information (e.g., prior knowledge)
The δ that minimizes ρ(δ) also minimizes r(δ) (no need to integrate over hypothetical replicates of y)
Bayesian Point Estimation
Bayesian Point Estimation
Suppose we want to summarize a posterior distribution [θ|y] with a point estimate
We want to minimize the information lost in using the point estimate to summarize [θ|y]
Is the choice of point estimate arbitrary?
Bayes Estimator: the Bayes rule for point estimation
Bayes rule for squared-error loss
ρ(a) = ∫_Θ (θ − a)² [θ|y] dθ

d/da ρ(a) = 2a − 2E[θ|y]

Setting 2a − 2E[θ|y] = 0:

a* = E[θ|y]
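The derivation can be checked numerically with hypothetical draws standing in for [θ|y]: the posterior mean attains a smaller expected squared-error loss than any nearby action.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.gamma(2.0, 1.25, size=100_000)  # stand-in draws from [theta | y]

def rho(a):
    """Posterior expected squared-error loss, by Monte Carlo."""
    return np.mean((theta - a) ** 2)

a_star = theta.mean()  # a* = E[theta | y], per the derivation
```

Empirically rho(a) = rho(a_star) + (a − a_star)², so the mean is the exact minimizer over the draws.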
Implied Loss of Posterior Mean
a = E[θ|y]

f″(a)(a − E[θ|y]) = 0

∫ f″(a)(a − E[θ|y]) da = ∫ (f′(a)(a − θ) − f(a) + g(θ)) [θ|y] dθ

L(θ, a) = f′(a)(a − θ) − f(a) + g(θ)
Bayes rule for absolute-error loss
ρ(a) = ∫_Θ |θ − a| [θ|y] dθ

d/da ρ(a) = P(θ ≤ a | y) − P(θ ≥ a | y)

Setting P(θ ≤ a | y) − P(θ ≥ a | y) = 0:

a* = median([θ|y])
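The same numerical check for absolute-error loss, again with hypothetical posterior draws: the posterior median minimizes the expected loss.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = rng.gamma(2.0, 1.25, size=100_001)  # stand-in draws from [theta | y]

def rho(a):
    """Posterior expected absolute-error loss, by Monte Carlo."""
    return np.mean(np.abs(theta - a))

a_star = np.median(theta)  # a* = median([theta | y]), per the derivation
```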
Bayes rule for 0–1 loss
ρ(a) = ∫_{a1}^{a2} 0 [θ|y] dθ + ∫_{−∞}^{a1} 1 [θ|y] dθ + ∫_{a2}^{∞} 1 [θ|y] dθ

d/da ρ(a) = [a1|y] − [a2|y], as a1 → a ← a2

a* = mode([θ|y])
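For a discrete parameter the 0–1 result can be verified directly: minimizing ρ(a) over the support selects the most probable value. Hypothetical Poisson draws stand in for a discrete posterior:

```python
import numpy as np

rng = np.random.default_rng(4)
theta = rng.poisson(4.3, size=100_000)  # stand-in draws from a discrete [theta | y]

def rho(a):
    """Posterior expected 0-1 loss: the probability that the action misses theta."""
    return np.mean(theta != a)

support = np.arange(theta.max() + 1)
a_star = support[np.argmin([rho(a) for a in support])]  # empirical posterior mode
```

Minimizing the miss probability is the same as maximizing the empirical frequency, so a_star is the mode of the draws.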
Asymmetric Distributions
(Figure: an asymmetric posterior density for φ on [0, 1], with the mean, median, and mode marked; a second panel adds the true value.)
Bayesian Point Estimation
Bayesian point estimation is NOT arbitrary!
Different loss functions yield different point estimates.
SEL: Posterior mean
Absolute loss: Posterior median
0–1 loss: Posterior mode

A choice of point estimator implies the decision maker’s choice of loss function (or class of loss functions).
Management Decisions
Henslow’s Sparrow
Williams, P.J., and M.B. Hooten. 2016. Combining statistical inference and decisions in ecology. Ecological Applications.
Big Oaks National Wildlife Refuge
Optimal Prescribed Fire Return Interval?
Non-Optimal Solution
(Figure: observed Henslow’s sparrow (HESP) density versus years since burn, 1–4.)
Elements for SDT
Data: Density estimates
Prior Information: mean Henslow’s sparrow densities at other sites (Herkert and Glass 1999)
Loss Function: related to cost and management objectives
Response-to-fire model
yj,t ∼ Poisson(Aj λj,t),
log(λj,t) = x′j,t β + ηj,
β ∼ Normal(μ, σ²I),
ηj ∼ Normal(0, σ²η).
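The response-to-fire model can be simulated as a sanity check. All values below (numbers of units and years, areas, coefficients, the random-effect standard deviation) are hypothetical; the slides do not give them:

```python
import numpy as np

rng = np.random.default_rng(5)
J, T = 8, 12                          # hypothetical survey units and years
A = rng.uniform(5.0, 20.0, size=J)    # assumed unit areas
x = rng.integers(1, 5, size=(J, T))   # years-since-burn covariate (1-4)

beta0, beta1 = 0.5, -0.3              # assumed intercept and fire-response coefficient
eta = rng.normal(0.0, 0.2, size=J)    # unit-level random effects (sigma_eta = 0.2)

# log(lambda_jt) = x'_jt beta + eta_j ;  y_jt ~ Poisson(A_j * lambda_jt)
lam = np.exp(beta0 + beta1 * x + eta[:, None])
y = rng.poisson(A[:, None] * lam)
```

With a negative beta1, expected density declines with years since burn, matching the qualitative pattern on the earlier density slide.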
Cumulative abundance as a derived parameter
Na = lim_{T̃→∞} ( Σ_{j=1}^{8} Σ_{t=T+1}^{T̃} Aj λj,t ) / (T̃ − T)

(Table: for unit j = 1 under burn interval a = 3, the expected counts A1λ1,1, A1λ1,2, A1λ1,3 repeat cyclically over t = 1, 2, 3, ….)
Posterior Distributions
(Figure: posterior distributions of N over 0–1500 for burn intervals of 1, 2, 3, and 4 years.)
Axioms of Henslow’s Sparrow Management
1 Frequent fire intervals are more expensive
2 More birds are better
3 The relative importance of cost decreases as abundance increases
Henslow’s sparrow loss function
L(θ, a) = α0(a) + α1(a) Na,θ if Na,θ < 1835; 0 if Na,θ ≥ 1835
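The piecewise loss is easy to code. The cost coefficients α0(a) and α1(a) below are hypothetical (the slides do not list them); α1(a) is taken negative so that, per the axioms, loss declines as abundance approaches the 1835-bird objective:

```python
import numpy as np

N_TARGET = 1835  # abundance objective from the slide

# Hypothetical cost coefficients by burn interval a (years): frequent burns cost more
alpha0 = {1: 1.00, 2: 0.70, 3: 0.55, 4: 0.45}
alpha1 = {1: -5.0e-4, 2: -3.5e-4, 3: -2.7e-4, 4: -2.2e-4}

def loss(N, a):
    """L(theta, a): alpha0(a) + alpha1(a)*N below the objective, zero once it is met."""
    N = np.asarray(N, dtype=float)
    return np.where(N < N_TARGET, alpha0[a] + alpha1[a] * N, 0.0)
```

Averaging loss(N_draws, a) over posterior draws of N for each interval a then gives the Bayes expected losses to be compared.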
(Figure: the loss function over N, 0–1500, for burn intervals of 1, 2, 3, and 4 years.)
(Figure: the posterior distributions of N shown together with the loss functions for burn intervals of 1, 2, 3, and 4 years.)
Convolution of Loss Function and Posterior Distribution
(Figure: the product L(θ, a)[θ|y] over N for burn intervals of 1, 2, 3, and 4 years.)
Bayes Expected Loss
(Figure: L(θ, a)[θ|y] over N for each burn interval, annotated with Bayes expected losses of 0.65, 0.34, 0.26, and 0.27.)
Model Selection Example
Posterior predictive distribution

[ỹ|y, m] = ∫ [ỹ|y, θ(m)] [θ(m)|y] dθ(m)

Posterior predictive loss (Gelfand and Ghosh 1998)

L(ỹ, y, a) = L(ỹ, a) + kL(y, a)

Posterior predictive expectation of loss for model m and action a:

E_{ỹ|y,m} L(ỹ, y, a)
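Under squared-error loss, the Gelfand and Ghosh (1998) criterion reduces to a penalty term (summed posterior predictive variances) plus a weighted goodness-of-fit term. A sketch with hypothetical observed counts and posterior predictive draws from two candidate models:

```python
import numpy as np

rng = np.random.default_rng(6)
y_obs = rng.poisson(4.0, size=50)  # hypothetical observed counts

def gg_criterion(y_obs, y_rep, k=1.0):
    """Gelfand-Ghosh D_k = P + (k/(k+1)) G for squared-error loss, where
    P = sum of posterior predictive variances and G = sum of squared biases."""
    mu = y_rep.mean(axis=0)        # predictive mean per observation
    P = y_rep.var(axis=0).sum()    # penalty term
    G = ((y_obs - mu) ** 2).sum()  # goodness-of-fit term
    return P + (k / (k + 1.0)) * G

# Posterior predictive draws from two candidate models (illustrative)
y_rep_m1 = rng.poisson(4.0, size=(2000, 50))   # model close to the data
y_rep_m2 = rng.poisson(10.0, size=(2000, 50))  # badly misspecified model
```

The model with the smaller criterion is preferred; the misspecified model is penalized by its large squared bias.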
Summary
SDT combines statistical analyses and decision theory
Implications for data collection, point estimation, and model selection
Ecological/Management applications