Brief Introduction to (Hierarchical) Bayesian Methods
Christopher K. Wikle
Department of Statistics
University of Missouri-Columbia
Outline
• Bayesian Statistics
• Hierarchical Bayesian Models
• Examples
– Blending Tropical Surface Winds
– Long-Lead Prediction of SST
– Tornado Report Climatology
• Conclusion
AMS Short Course, January, 2004
Geophysical Analysis
Two types of geophysical models?
• Physical:
– Laws of classical physics
– Uncertainties: neglecting higher-order terms; choice of space-time averaging (subgrid-scale parameterizations); linking systems (coupling biological processes to weather), etc.
• Statistical:
– Descriptive
– Uncertainties: inefficient in complex settings; data not representative of full dynamics; application under system changes; sampling; estimation
“Physical/Statistical Models”
Better to consider a “spectrum of models” (e.g., Berliner, 2003)
Deterministic ⇐⇒ Stochastic
Bayesian Approach:
• Natural for combining information sources while managing their uncertainties.
– Multiple data sources
– Uncertain physics
– Physical models with stochastic parameters
– Expert opinion
• Predictive distributions of quantities of interest, conditioned on
observations
Distributional Notation
We use the short-hand notation [ ] for probability distribution.
Given random variables A and B, represent:
• Joint distribution of A and B: [A, B]
• Conditional distribution of A given B: [A|B]
• Marginal distribution of A: [A]
Also, note the following are equivalent:
$$Y \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$$
$$Y = \mu + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$$
where “∼” means “is distributed as”.
Bayesian Viewpoint
Bayes’ Theorem:
$$[Y \mid X] = \frac{[Y, X]}{[X]} = \frac{[X \mid Y][Y]}{\int [X \mid Y][Y]\,dY}$$
For random variables with known distributions, this is a “fact” from probability theory. It is not controversial!
Bayesian Statistics:
Modeling unknown distributions and updating those models based on data. Might be controversial!
Bayesian Statistics
$$[Y \mid X] = \frac{[X \mid Y][Y]}{\int [X \mid Y][Y]\,dY} \propto [X \mid Y][Y]$$
• Want to make inference about Y, but we only observe X
• We update our uncertainty about Y after observing X
• [Y] reflects our prior knowledge of Y
• [X|Y] is the “likelihood”
• [Y|X] is the posterior distribution
Bayesian Modeling:
Treat all unknowns as if they are random and evaluate probabilistically
[Process|Data] ∝ [Data|Process][Process]
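As a rough illustration of "posterior ∝ likelihood × prior" (not part of the original slides), the sketch below evaluates an unnormalized posterior on a grid and normalizes it numerically. The observation value, the N(Y, 1) likelihood, and the N(0, 2^2) prior are all hypothetical choices made only for this example.

```python
import numpy as np

def grid_posterior(y_grid, prior, likelihood):
    """Normalize likelihood * prior on an evenly spaced grid of Y values."""
    unnorm = likelihood * prior              # [X|Y][Y], up to a constant
    dy = y_grid[1] - y_grid[0]
    return unnorm / (unnorm.sum() * dy)      # Riemann-sum stand-in for the integral

# Hypothetical setup: a single observation x_obs with X|Y ~ N(Y, 1), prior Y ~ N(0, 2^2)
y = np.linspace(-10.0, 10.0, 2001)
dy = y[1] - y[0]
prior = np.exp(-0.5 * (y / 2.0) ** 2)        # constants dropped; they cancel
x_obs = 1.2
likelihood = np.exp(-0.5 * (x_obs - y) ** 2)

post = grid_posterior(y, prior, likelihood)
post_mean = (y * post).sum() * dy            # numerical posterior mean
```

Grid evaluation like this is only practical in low dimensions, which is why sampling methods matter for the larger models discussed later.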
Simple Example 1
• Temperature observations: $T(s_1), T(s_2), \ldots, T(s_n)$
• These are observations of the true (unknown) process: θ
• We have prior information about θ, say $\theta_0$ from a “numerical model”, but we don’t believe our model is perfect.
Probabilistically, we may believe:
[Data|Process]: $T(s_i) \mid \theta, \sigma^2 \sim N(\theta, \sigma^2)$
[Process]: $\theta \mid \theta_0, \tau^2 \sim N(\theta_0, \tau^2)$
We want the posterior: [Process|Data], i.e.,
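$$[\theta \mid T(s_1), \ldots, T(s_n)] \propto [T(s_1), \ldots, T(s_n) \mid \theta, \sigma^2]\,[\theta \mid \theta_0, \tau^2]$$
where, for simplicity, we assume $\theta_0$, $\tau^2$, and $\sigma^2$ are known. Then,
$$\theta \mid T(s_1), \ldots, T(s_n) \sim N\left( w\bar{T} + (1 - w)\theta_0, \; \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2} \right)$$
where
$$\bar{T} = \frac{1}{n} \sum_{i=1}^{n} T(s_i), \qquad w = \frac{\tau^2}{\tau^2 + \sigma^2/n}.$$
Note: the posterior mean is a weighted average of the sample mean and the prior mean, where the weights reflect our certainty about the observations and the prior.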
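The closed-form update above can be sketched in a few lines. The temperature values and the settings for the prior mean and the two variances are hypothetical, chosen only to show how the weight w behaves.

```python
import numpy as np

def normal_posterior(obs, theta0, tau2, sigma2):
    """Posterior of theta for N(theta, sigma2) data and a N(theta0, tau2) prior."""
    obs = np.asarray(obs, dtype=float)
    n = obs.size
    t_bar = obs.mean()                                # sample mean
    w = tau2 / (tau2 + sigma2 / n)                    # weight on the data
    post_mean = w * t_bar + (1.0 - w) * theta0        # weighted average
    post_var = sigma2 * tau2 / (sigma2 + n * tau2)    # posterior variance
    return post_mean, post_var, w

# Hypothetical observations and a vague-ish prior from a "numerical model"
temps = [14.2, 15.1, 13.8, 14.9]
post_mean, post_var, w = normal_posterior(temps, theta0=12.0, tau2=4.0, sigma2=1.0)
```

With a large prior variance relative to the sampling variance of the mean, w is close to 1 and the posterior mean sits near the sample mean; shrinking the prior variance pulls it toward the prior mean.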
Basic Bayes is Not New to Meteorology
• Epstein, in the early 1960’s, used Bayesian decision making in applied meteorology
• Weather Modification Research in 1970’s (e.g., Simpson et al. 1973, JAS)
• Data Assimilation can be viewed from a Bayesian viewpoint (e.g., Lorenc 1986, Q. J. Roy. Met. Soc.)
– Kalman filter is Bayesian (e.g., Meinhold and Singpurwalla, 1983)
– Ensemble Kalman Filters (Sequential Monte Carlo); e.g., Evensen and van Leeuwen, 2000, MWR
• Climate Change (e.g., Solow 1988, J. Clim.; Leroy 1998, J. Clim.)
• Satellite Retrieval (many!)
Hierarchical Bayes is new to meteorology!
Hierarchical Bayesian Viewpoint
Cornerstone of hierarchical modeling is conditional thinking.
Joint distribution can be represented as products of conditionals. e.g.,
[A, B, C] = [A|B, C][B|C][C]
NOTE: Our choice for this decomposition is based on what we know about the process, and what assumptions we are willing and able to make for simplification:
e.g., conditional independence: [A|B,C]=[A|B].
Often easier to express conditional models than full joint models.
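As a small worked step (an illustration, not from the slides), substituting the conditional independence assumption into the general factorization gives a chain in which each variable depends only on the one after it:

```latex
[A, B, C] = [A \mid B, C]\,[B \mid C]\,[C] = [A \mid B]\,[B \mid C]\,[C]
```

Each factor on the right involves at most two variables, which is typically much easier to specify than the full three-way joint distribution.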
Hierarchical Bayesian Modeling of Geophysical Systems
Separate unknowns into two groups:
• process variables: actual physical quantities of interest (e.g., temperature, pressure, wind)
• model parameters: quantities introduced in model development (e.g., propagator matrices, measurement error and sub-grid scale variances, unknown physical constants)
Basic Hierarchical Model
1. [data|process, parameters]
2. [process|parameters]
3. [parameters]
The posterior [process, parameters|data] is proportional to the product of these three distributions!
Simple Example 1 (cont.)
Say that in Example 1, we believe that the numerical model value of θ is biased. We do not know the bias exactly, but suspect it is related to known factors $x \equiv (x_1, x_2, \ldots, x_k)'$ (e.g., MOS). Then, our hierarchical model has the following stages:
• Data Model:
$$T(s_i) \mid \theta, \sigma^2 \sim N(\theta, \sigma^2)$$
• Process Model:
$$\theta \mid \theta_0, x, \beta, \tau^2 \sim N(\theta_0 + x'\beta, \tau^2)$$
• Parameter Model:
$$\beta \mid \beta_0, \Sigma \sim N(\beta_0, \Sigma)$$
We then seek the posterior distribution: $[\theta, \beta \mid T(s_1), \ldots, T(s_n)]$
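One possible way to sample this posterior is a two-block Gibbs sampler, sketched below. This is an illustration rather than anything given in the slides: it assumes the two variances, the prior mean of θ, the factors x, and the prior for β are all known, and the normal full conditionals were worked out for this example under those assumptions. All numerical inputs are hypothetical.

```python
import numpy as np

def gibbs_example1(temps, theta0, x, beta0, Sigma, sigma2, tau2,
                   n_iter=5000, seed=0):
    """Alternate draws from [theta | beta, data] and [beta | theta]."""
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps, dtype=float)
    x = np.asarray(x, dtype=float)
    beta0 = np.asarray(beta0, dtype=float)
    n, t_bar = temps.size, temps.mean()

    Sigma_inv = np.linalg.inv(Sigma)
    # Conditional covariance of beta and variance of theta do not change by iteration
    V_beta = np.linalg.inv(Sigma_inv + np.outer(x, x) / tau2)
    v_theta = 1.0 / (n / sigma2 + 1.0 / tau2)

    beta = beta0.copy()
    theta_draws = np.empty(n_iter)
    beta_draws = np.empty((n_iter, x.size))
    for it in range(n_iter):
        # theta | beta, data: precision-weighted mix of data mean and process mean
        m_theta = v_theta * (n * t_bar / sigma2 + (theta0 + x @ beta) / tau2)
        theta = rng.normal(m_theta, np.sqrt(v_theta))
        # beta | theta: Bayesian linear regression of (theta - theta0) on x
        m_beta = V_beta @ (Sigma_inv @ beta0 + x * (theta - theta0) / tau2)
        beta = rng.multivariate_normal(m_beta, V_beta)
        theta_draws[it] = theta
        beta_draws[it] = beta
    return theta_draws, beta_draws

# Hypothetical inputs: four observations and two bias-related factors
theta_draws, beta_draws = gibbs_example1(
    temps=[14.2, 15.1, 13.8, 14.9], theta0=12.0, x=[1.0, 0.5],
    beta0=[0.0, 0.0], Sigma=np.eye(2), sigma2=1.0, tau2=1.0)
```

Because the data depend on β only through θ, the conditional for β does not involve the observations directly, which is the kind of simplification conditional thinking buys.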
Complicated Processes
Usually, we can’t find analytically the normalizing constant in Bayes’ theorem. Thus we can’t easily get the posterior distribution.
• Numerical integration (o.k. for low dimensions)
• Use Monte Carlo methods to sample from the posterior. Specifically,
– Markov Chain Monte Carlo (MCMC)
∗ Sample from a Markov Chain that has the same ergodic distribution as the posterior
∗ Don’t need to know normalizing constant
∗ e.g., Metropolis, Gibbs sampler algorithms
∗ revolutionized Bayesian statistics in the late 1980’s
– Very computationally intensive
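A minimal random-walk Metropolis sketch is shown below to make the "don't need the normalizing constant" point concrete: only the difference of unnormalized log posteriors enters the accept step. The target reuses the setup of Simple Example 1 with hypothetical data and settings, so the draws can be compared with the known closed-form posterior; the step size and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def metropolis(log_unnorm_post, init, step, n_iter=20000, seed=0):
    """Random-walk Metropolis for a scalar parameter."""
    rng = np.random.default_rng(seed)
    current = float(init)
    current_lp = log_unnorm_post(current)
    draws = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        proposal = current + step * rng.standard_normal()   # symmetric proposal
        proposal_lp = log_unnorm_post(proposal)
        # Accept with probability min(1, ratio); the normalizing constant cancels
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws[i] = current
    return draws, accepted / n_iter

# Hypothetical data and known variances, as in Simple Example 1
temps = np.array([14.2, 15.1, 13.8, 14.9])
theta0, tau2, sigma2 = 12.0, 4.0, 1.0

def log_post(theta):
    log_like = -0.5 * np.sum((temps - theta) ** 2) / sigma2
    log_prior = -0.5 * (theta - theta0) ** 2 / tau2
    return log_like + log_prior

draws, accept_rate = metropolis(log_post, init=theta0, step=1.0)
```

After discarding an initial burn-in portion, the sample mean of `draws` should be close to the analytic posterior mean from Example 1.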
Example 2: Tropical Ocean Winds
Problem: Blending tropical surface winds given high-resolution satellite scatterometer observations and low-resolution assimilated model output. (Wikle, Milliff, Nychka, Berliner, 2001; JASA)
Wind Problem: Process Model Motivation I
Consider the linear shallow water equations (on an equatorial beta-plane):
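$$\frac{\partial u}{\partial t} - \beta_0 y v + g \frac{\partial h}{\partial x} = 0$$
$$\frac{\partial v}{\partial t} + \beta_0 y u + g \frac{\partial h}{\partial y} = 0$$
$$\frac{\partial h}{\partial t} + h \left( \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \right) = 0$$
• $\beta_0$: constant related to rotation of earth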
• g: gravitational acceleration
This linear system can be solved analytically, giving a series of traveling waves (equatorial normal modes) - Observed in the tropics!
Wind Problem: Process/Parameter Model Motivation II
Empirical results suggest turbulent scaling behavior for near surface tropical winds (e.g., Wikle, Milliff and Large, 1999; JAS)
In particular:
$$S_v(\omega) \propto \frac{\sigma_v^2}{|\omega|^{\kappa}}$$
where κ = 5/3, and ω is the spatial frequency.
Wind Model: Hierarchical Model Sketch (v-Component)
• Data Model: Observations at different resolutions
$$[Z_t^{s\prime} \;\; Z_t^{a\prime}]' = K_t v_t + \epsilon_t, \quad \epsilon_t \sim N(0, \Sigma_\epsilon)$$
• Process Model:
$$v_t = \Phi a_t + \gamma_t,$$
$\Phi$ - matrix containing “important” normal mode basis functions (importance determined from empirical studies, e.g., Wheeler and Kiladis, 1999)
$a_t$ - time-varying spectral coefficients
$\gamma_t$ - multiresolution spatio-temporal process
• Parameter Models:
$$a_t = H(\theta) a_{t-1} + \eta_t, \quad \eta_t \sim N(0, \Sigma_\eta)$$
$\theta$ - priors based on equatorial wave theory and empirical studies
$\gamma_t$ - priors to suggest turbulent power-law relationships
Wind Problem Results: Posterior Decomposition
Wind Problem Results: Tropical Cyclone Dale
Wind Blending: “Operational”
Tim Hoar, Geophysical Statistics Project, NCAR has made this model “operational” on QSCAT data (1/2 deg)
Example 3: Long Lead Prediction of SST
Goal: Predict Pacific SST anomalies (2° × 2° resolution) 7 months in advance while realistically accounting for uncertainty. [Berliner, Wikle, Cressie, 2000, J. Climate]
Consider data $\{Z(s_i; t)\}$ to be anomalies from monthly means.
We want to predict $Z(s_0; T + \tau)$, for $\tau = 7$ months, from space-time data $D(T) \equiv \{Z_T, Z_{T-1}, \ldots, Z_1\}$.
Data Model
Dimension Reduction:
$$Z_t = \Phi a_t + \nu_t,$$
where
• $\nu_t \sim \mathrm{Gau}(0, \Sigma_\nu)$: measurement error and small-scale spatial variability that is uncorrelated in time
• $\Phi$: truncated empirical orthogonal function basis set, obtained from the SVD of $Z \equiv (Z_1, \ldots, Z_T)$.
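The dimension reduction above can be sketched with a plain SVD. The data matrix here is synthetic noise standing in for SST anomalies, and the truncation level p = 10 is an arbitrary illustrative choice, not the value used in the paper.

```python
import numpy as np

def truncated_eofs(Z, p):
    """Z: (n_locations, n_times) anomaly matrix. Keep the leading p EOFs."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    Phi = U[:, :p]                    # truncated EOF basis (orthonormal columns)
    a = Phi.T @ Z                     # spectral coefficients a_t, one column per time
    var_frac = (s[:p] ** 2).sum() / (s ** 2).sum()   # variance retained
    return Phi, a, var_frac

# Synthetic stand-in for the anomaly data (hypothetical sizes)
rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 120))
Z -= Z.mean(axis=1, keepdims=True)    # simple anomalies from each location's time mean

Phi, a, var_frac = truncated_eofs(Z, p=10)
residual = Z - Phi @ a                # plays the role of the nu_t term
```

The residual left after projecting onto the retained EOFs is what the error term in the data model has to absorb.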
Process Model
$$a_{t+\tau} = \mu_t + H_t a_t + \eta_{t+\tau},$$
where $\eta_t \sim \mathrm{Gau}(0, \Sigma_\eta)$ for all t.
Critical modeling assumption: Let $H_t$ and $\mu_t$ depend on both the current and future climate regimes:
$$H_t = H(I_t, J_t), \qquad \mu_t = \mu(I_t, J_t),$$
where
• $I_t$ - classifies the current regime as “warm” (2), “normal” (1), or “cold” (0)
• $J_t$ - anticipates a transition to one of the three regimes at time $t + \tau$
Climate States
Current, $I_t$: [Threshold Model, e.g., Tong 1990]
$$I_t = \begin{cases} 0, & \text{if } SOI_t > \text{low threshold} \\ 1, & \text{otherwise} \\ 2, & \text{if } SOI_t < \text{upper threshold,} \end{cases}$$
where $SOI_t$ is the Southern Oscillation Index (assumed “known”).
Future, $J_t$: [Latent (hidden) Process Model]
$$J_t = \begin{cases} 0, & \text{if } W_t > \text{low threshold} \\ 1, & \text{otherwise} \\ 2, & \text{if } W_t < \text{upper threshold,} \end{cases}$$
where $W_t$ is a latent process which anticipates the future climate regime.
Latent Process to Assess “Future”
$$W_t \mid \beta_w, \sigma_w^2 \sim \mathrm{Gau}(x_t'\beta_w, \sigma_w^2)$$
where
$$x_t = \left(1, \; U_t, \; U_t \sin\left(\frac{2\pi m_t}{12}\right), \; U_t \cos\left(\frac{2\pi m_t}{12}\right), \; U_t^2 \right)',$$
where
• $U_t$ - the lowpass filtered E-W component of the wind at 10 meters above the surface at 5° N and 157° E. (This is absolutely critical!)
• $m_t$ - index of month (0 - 11) at time t
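The covariate vector for the latent process can be written out directly. In the sketch below the function names and the example wind value are hypothetical; the five entries follow the definition of the covariate vector given above.

```python
import numpy as np

def latent_covariates(U_t, m_t):
    """Build x_t = (1, U, U sin(2 pi m / 12), U cos(2 pi m / 12), U^2)'."""
    angle = 2.0 * np.pi * m_t / 12.0          # seasonal phase from month index 0..11
    return np.array([1.0,
                     U_t,
                     U_t * np.sin(angle),
                     U_t * np.cos(angle),
                     U_t ** 2])

def latent_mean(U_t, m_t, beta_w):
    """Mean of W_t given the regression coefficients beta_w."""
    return latent_covariates(U_t, m_t) @ np.asarray(beta_w, dtype=float)

# Hypothetical filtered wind value for month index 3
x_t = latent_covariates(U_t=2.0, m_t=3)
```

The sine and cosine terms let the effect of the wind on the anticipated regime vary with the season, and the squared term allows a nonlinear response.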
Priors on Model Parameters
At next stage of hierarchy:
$$\beta_w \sim \mathrm{Gau}(\beta_{SOI}, \; c_\beta \, \mathrm{var}(\beta_{SOI})),$$
$$\sigma_w^2 \sim IG(q, r)$$
where the mean, variance, and IG (“Inverse Gamma”) parameters are obtained from fitting
$$SOI_{t+\tau} = x_t'\beta_{SOI} + e_t$$
(gives $R^2 \approx 0.7$!)
• $\mathrm{vec}(H_j)$, j = 1, 2, 3 - Multivariate Normal distributions
• $\mu_j$, j = 1, 2, 3 - Multivariate Normal distributions
• Covariance matrices - Wishart distributions
(“Empirical Bayes” priors)
Bayesian Prediction
MCMC Implementation
• Probability weighted mixtures:
$$E(a_{T+\tau} \mid D(T)) = \sum_{j=0}^{2} \Pr\{J_T = j \mid D(T)\} \, E(\mu(I_T, j) + H(I_T, j) a_T \mid D(T))$$
• Pixelwise percentiles of $[\Phi a_{T+\tau} \mid D(T)]$
• Summary measures of El Nino/La Nina (e.g., Nino 3.4 Index) and their posterior distributions
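As a rough sketch of the probability-weighted mixture above, suppose MCMC output is available as one sampled regime and one predicted coefficient vector per iteration. The arrays below are synthetic stand-ins for such output (the actual sampler is not shown in the slides), and the function name is hypothetical.

```python
import numpy as np

def mixture_forecast(J_draws, pred_draws, n_regimes=3):
    """Combine regime probabilities with regime-conditional mean predictions."""
    J_draws = np.asarray(J_draws)
    pred_draws = np.asarray(pred_draws, dtype=float)
    probs = np.zeros(n_regimes)
    cond_means = np.zeros((n_regimes, pred_draws.shape[1]))
    for j in range(n_regimes):
        mask = J_draws == j
        probs[j] = mask.mean()                      # estimate of Pr{J_T = j | D(T)}
        if mask.any():
            cond_means[j] = pred_draws[mask].mean(axis=0)   # mean prediction in regime j
    return probs, cond_means, probs @ cond_means    # weighted mixture

# Synthetic "MCMC output": regime labels and predicted coefficient vectors
rng = np.random.default_rng(0)
J = rng.integers(0, 3, size=1000)
pred = rng.standard_normal((1000, 4)) + J[:, None]
probs, cond_means, forecast = mixture_forecast(J, pred)
```

The mixture equals the plain average over all draws; splitting it by regime simply makes the contribution of each anticipated climate state visible.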
Results: Nino 3.4 Prediction of 10/97 from 3/97
Results: Posterior Predictions for 10/97 and 10/98
Results: Component Predictions for 10/98
Results: Out of Sample Nino 3.4 Posterior Predictions
Example 4: Tornado Counts (Wikle and Anderson, 2003; JGR-A)
Investigate possible dependence between observed tornado counts and climate indices (ENSO, NAO, NPI).
NWS Storm Reports 1953-2001
Modeling Issues
Complicated Variability
• Non-normal data (counts)
• Observational uncertainty in counts
– Uncertainty in procedure for assigning Fujita scale (reliance on human observation and variable rating procedure)
– Demographic and technological influences
• Spatial variability in long-term frequency of tornado counts
• Temporal trends that may vary with space
• Exogenous climatological factors with effects that may vary spatially (e.g., regionally)
Data Model
$Y(s_i; t)$ - tornado count in grid box $s_i$ in year t
Poisson Data Model
$$Y(s_i; t) \mid \lambda(s_i; t) \sim \mathrm{Poisson}(\lambda(s_i; t))$$
assumes conditional independence (i.e., counts are independent conditioned on the true Poisson intensity process, λ)
NOTE: In paper, we consider a “zero-inflated Poisson model”.
Process Model
$$\log(\lambda(s_i; t)) = \beta_1(s_i) X_1(t) + \ldots + \beta_6(s_i) X_6(t) + \eta(s_i; t) + \epsilon(s_i; t)$$
where
• $X_1$ - time
• $X_2$ - ENSO index
• $X_3$ - NPI index
• $X_4$ - NAO index
• $X_5$ - ENSO by NPI interaction
• $X_6$ - ENSO by NAO interaction
• $\eta(s_i; t)$ - spatially correlated temporal process (large-scale) [accounting for other potential climate effects]
• $\epsilon(s_i; t)$ - uncorrelated random effect [$\epsilon(s_i; t) \sim N(0, \sigma_\epsilon^2)$]
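To see how the data and process stages fit together, the sketch below simulates counts from a log-linear Poisson intensity of this form. Everything numerical is hypothetical: the covariates, coefficients, and large-scale term are random stand-ins (no spatial correlation is built in), and the grid size is arbitrary. The zero-inflated variant mentioned in the paper is not included.

```python
import numpy as np

def simulate_counts(X, beta, eta, sigma_eps, seed=0):
    """X: (T, 6) covariates; beta: (n, 6) coefficients by grid box; eta: (n, T)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma_eps, size=eta.shape)   # uncorrelated random effect
    log_lam = beta @ X.T + eta + eps                   # process model on the log scale
    lam = np.exp(log_lam)                              # Poisson intensity, always positive
    counts = rng.poisson(lam)                          # data model: independent given lam
    return counts, lam

# Hypothetical sizes: 5 grid boxes, 49 years (1953-2001)
rng = np.random.default_rng(1)
n_boxes, n_years = 5, 49
X = 0.1 * rng.standard_normal((n_years, 6))
beta = 0.5 * rng.standard_normal((n_boxes, 6))
eta = 0.2 * rng.standard_normal((n_boxes, n_years))
counts, lam = simulate_counts(X, beta, eta, sigma_eps=0.1)
```

Modeling the log intensity keeps the Poisson mean positive while letting the climate covariates enter linearly, with coefficients that differ by grid box.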
Parameter Models
$\beta_k \sim N(0, \Sigma(\theta_k))$ - Spatially correlated random fields specified by variance and dependence parameters ($\theta_k$).
$\eta_t \sim N(\mu, \Sigma(\theta_\eta))$ - spatial process, with mean µ and variance and dependence parameters given by $\theta_\eta$.
Distributions must also be specified for the “hyper-parameters”: $\theta_k$, $\theta_\eta$, µ, and $\sigma_\epsilon^2$ (see Wikle and Anderson (2003) for details)
Posterior Summary: Time Trend (β1) Coefficients
Posterior Mean and Posterior 95% “Credibility”
Posterior Summary: ENSO (β2) Coefficients
Posterior Mean and Posterior 95% “Credibility”
Posterior Summary: NPI (β3) Coefficients
Posterior Mean and Posterior 95% “Credibility”
Some Other Recent Hierarchical Bayesian Examples
• Air-Sea Interaction: (Berliner, Milliff, Wikle, 2003, JGR-O)
• Climate Change: (Berliner, Levine, Shea, 2000, J. Clim.)
• Stochastic Boundary Value Problems: (Wikle, Berliner, Milliff, 2003, MWR)
• Nowcasting Radar: (Xu, Wikle, Fox, 2003)
• Hurricane Climatology: (Elsner and Bossak, 2001, J. Clim.)
• Analysis of Rainfall Data: (Sanso and Guenni, 1999, Applied Statistics)
Conclusion
• Bayesian methods allow one to quantify uncertainty in all aspects of the problem (data, process, parameters), and the distributional output reflects that quantification of uncertainty
• Hierarchical Bayesian methods allow one to decompose the problem into a series of simpler conditional models
– Can accommodate data, model, and parameter uncertainty
– Can accommodate data of different resolution/alignment (e.g., GIS layers)
– Can accommodate complicated spatial, temporal, and spatio-temporal dependence
– Can accommodate physics (e.g., shallow water PDE; turbulence)
– Can accommodate expert opinion
• Downside: Complicated and computationally intense (canned software available for free, WinBUGS)
* Most of the papers discussed here can be found on my web page:
http://www.stat.missouri.edu/~wikle/