Bayesian Forecasting and Dynamic Models
M.West and J.Harrison
Springer, 1997
Presented by Deepak Agarwal
Problem Definition
{y_t}: a 1-d time series to be monitored, e.g., daily counts of some pattern, such as the number of emergency room visits to a hospital.
Goal: a statistical method which
Forecasts accurately (short-term and long-term behavior), i.e., a good baseline model.
Detects deviations from the baseline (outliers, gradual changes, structural changes) with good ROC characteristics.
Adapts the baseline model to changes over time, e.g., learns gradual changes in day-of-week effects, learns mean shifts, etc.
The Approach
Baseline model learned using a Kalman Filter, with a novel and simple way of learning the "evolution" covariance using a "discount" concept.
Change detection done by cumulating evidence against the status quo through residuals.
The procedure adapts to changes in the baseline using the principle of management by exception:
use the forecasting model unless exceptional circumstances arise, in which case one intervenes and corrects the forecasting model.
Simple but illustrative model
Observation equation: y_t = \theta_t + v_t,  v_t ~ N(0, V)
State equation: \theta_t = \theta_{t-1} + w_t,  w_t ~ N(0, W_t)
w_t is the evolution (or innovation) term; the errors are conditionally independent.
Prior at the beginning: (\theta_0 | D_0) ~ N(m_0, C_0), where D_t denotes the data up until time t.
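To make the model concrete, here is a minimal Python sketch that simulates the observation and state equations above; it is not from the book, and the parameter values (V, W, m0, C0) are illustrative.

```python
import numpy as np

def simulate_local_level(T, V=1.0, W=0.1, m0=0.0, C0=1.0, seed=0):
    """Simulate the simple DLM: theta_t = theta_{t-1} + w_t, y_t = theta_t + v_t."""
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(m0, np.sqrt(C0))      # theta_0 ~ N(m0, C0)
    w = rng.normal(0.0, np.sqrt(W), size=T)   # evolution innovations w_t ~ N(0, W)
    v = rng.normal(0.0, np.sqrt(V), size=T)   # observation errors   v_t ~ N(0, V)
    theta = theta0 + np.cumsum(w)             # random-walk state
    y = theta + v                             # observations
    return theta, y

theta, y = simulate_local_level(200)
```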
Kalman Filter update at time t: Bayes Rule
(a) Posterior for \theta_{t-1}: (\theta_{t-1} | D_{t-1}) ~ N(m_{t-1}, C_{t-1})
(b) Prior for \theta_t: (\theta_t | D_{t-1}) ~ N(m_{t-1}, R_t), with R_t = C_{t-1} + W_t
(c) Marginal for y_t: (y_t | D_{t-1}) ~ N(f_t, Q_t), with f_t = m_{t-1} and Q_t = R_t + V
(d) Posterior for \theta_t: (\theta_t | D_t) ~ N(m_t, C_t), with
    m_t = m_{t-1} + A_t e_t,  C_t = A_t V,  where A_t = R_t / Q_t and e_t = y_t - f_t
(Regressing on y_t: borrowing strength, or shrinking, across time.)
Compare with EWMA (e.g., A_t = .05 or .1 for all t). Asymptotically, under steady state and with constant variance components, the Filter converges to EWMA.
Asymptotic relation between SNR and EWMA coefficient
Estimating Variance components
Evolution variance: W_t = C_{t-1} (1 - \delta) / \delta, where the factor \delta is called the "discount factor" (asymptotic justification; \delta between .8 and 1 is reasonable; I have found it useful to estimate it from some initial data, but that doesn't always work).
Estimating V is routine Bayesian scale-mixture analysis, i.e., mix the scale of the Normals using an inverse gamma, which essentially replaces the Normals with Student-t's (see the book for formulae).
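As a sketch (hypothetical helper names, not the book's code), the filter recursions combined with the discount construction fit in a short Python loop; run on a constant series, it also illustrates the asymptotic EWMA limit A_t -> 1 - delta.

```python
import numpy as np

def filter_local_level(y, V, delta=0.9, m0=0.0, C0=1.0):
    """One-pass Kalman filter for the first-order DLM, with the evolution
    variance set by the discount factor: W_t = C_{t-1} * (1 - delta) / delta."""
    m, C = m0, C0
    ms, Cs, es = [], [], []
    for yt in y:
        W = C * (1 - delta) / delta   # discounted evolution variance
        R = C + W                     # prior variance:  R_t = C_{t-1} + W_t
        f, Q = m, R + V               # forecast:        y_t | D_{t-1} ~ N(f_t, Q_t)
        e = yt - f                    # forecast error
        A = R / Q                     # adaptive coefficient
        m = m + A * e                 # posterior mean:  m_t = m_{t-1} + A_t e_t
        C = A * V                     # posterior var:   C_t = A_t V
        ms.append(m); Cs.append(C); es.append(e)
    return np.array(ms), np.array(Cs), np.array(es)
```

With delta = .9 the steady-state adaptive coefficient is 1 - .9 = .1, matching the EWMA comparison above.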
Illustration on Data
Percentages of calls to an automated service at AT&T that ended in hang-ups. Can't give you the real numbers; this is what I did:
Did an arcsine transform.
Generated a mean surface using Loess, making sure the span was chosen to minimize autocorrelation in the residuals.
Generated smooth variances using the deviation of the observed values from the mean surface.
Simulated observations from this process (see figure on next page).
A realization of the simulated process
Frequentist property of the procedure.
For L replicates of the process:
MSE_KF(t) = E(y_t - \hat{y}_t)^2, estimated by \sum_{p=1}^{L} (y_{tp} - \hat{y}_{tp})^2 / L,
where \hat{y}_t is the filter's one-step forecast, and
MSE_t(Mean Known) = E(y_t - \theta_t)^2, estimated by \sum_{p=1}^{L} (y_{tp} - \theta_{tp})^2 / L.
Efficiency_t = 100 * MSE_KF(t) / MSE_t(Mean Known). Used L = 100.
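A self-contained Monte Carlo sketch of this efficiency computation (my own illustration, with an assumed random-walk signal rather than the Loess-derived surface from the talk; all parameter values are made up):

```python
import numpy as np

def efficiency(T=100, L=100, V=1.0, W=0.1, delta=0.9, seed=0):
    """Per-time-point efficiency of the discount Kalman filter forecast
    relative to the oracle that knows the true mean theta_t, over L replicates."""
    rng = np.random.default_rng(seed)
    sse_kf = np.zeros(T)      # squared one-step forecast errors, summed over replicates
    sse_known = np.zeros(T)   # squared errors when theta_t is known
    for _ in range(L):
        theta = np.cumsum(rng.normal(0.0, np.sqrt(W), T))  # random-walk signal
        y = theta + rng.normal(0.0, np.sqrt(V), T)         # observations
        m, C = 0.0, 1.0
        for t in range(T):
            R = C / delta                     # C_{t-1} + W_t under the discount construction
            f, Q = m, R + V                   # one-step forecast
            sse_kf[t] += (y[t] - f) ** 2
            sse_known[t] += (y[t] - theta[t]) ** 2
            A = R / Q
            m, C = m + A * (y[t] - f), A * V  # posterior update
    return 100.0 * (sse_kf / L) / (sse_known / L)

eff = efficiency()
```

The efficiency sits above 100 because the filter's forecast error includes both the observation noise and the error in tracking the moving mean.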
Red: recovered signal
Discount=.8
Discount=.95
How to detect changes?
Standardized residuals: se_t = (y_t - E(y_t | D_{t-1})) / \sqrt{Var(y_t | D_{t-1})}.
Under the null model M_0, the standardized residuals are iid N(0, 1). Consider an alternate model M_1 which assumes the distribution is N(0, h) with h > 1.
Bayes factor: H_t = P_{M_0}(se_t | D_{t-1}) / P_{M_1}(se_t | D_{t-1}), so small values are evidence against the status quo.
Cumulative Bayes factor for the most recent k observations:
W_t(k) = P_{M_0}(se_t, se_{t-1}, ..., se_{t-k+1} | D_{t-k}) / P_{M_1}(se_t, se_{t-1}, ..., se_{t-k+1} | D_{t-k}) = H_t H_{t-1} ... H_{t-k+1}.
Our goal is to identify the most discrepant group of recent, consecutive observations, which involves monitoring
S_t = min_k W_t(k) = H_t min(1, S_{t-1}).
l_t = arg min_k W_t(k) is the related run length of the sequential procedure.
Detecting changes, continued
Procedure: at time t,
a) If H_t < \tau, declare a change with run length l_t = 1.
b) If H_t >= \tau, compute S_t = H_t min(1, S_{t-1}) and increment the run length l_t. Declare a change if S_t < \tau, also returning the run length l_t.
c) If a change is detected, re-initialize S_t = 1 and l_t = 1, intervene (discussed later), and proceed to the next time point.
d) If a change is not detected, update the Filter and proceed to the next time point.
Choice of h in M_1: 4 or 5 is adequate. \tau between .1 and .15 is adequate.
One could be creative here and use different types of alternative models to detect different kinds of changes, e.g., mean shift, local autocorrelation, slow linear trend, etc.
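The detection rule can be sketched in Python; the closed form for H_t is just the ratio of the two normal densities at se_t, and all names here are mine, not the book's.

```python
import math

def bayes_factor(se, h=4.0):
    """H_t = N(se; 0, 1) / N(se; 0, h): small values are evidence
    against the null in favor of the inflated-variance alternative."""
    return math.sqrt(h) * math.exp(-0.5 * se * se * (1.0 - 1.0 / h))

def monitor(residuals, h=4.0, tau=0.15):
    """Sequential monitor: S_t = H_t * min(1, S_{t-1}); alarm when S_t < tau.
    Returns a list of (time index, run length) alarms, re-initializing after each."""
    alarms = []
    S, l = 1.0, 0
    for t, se in enumerate(residuals):
        H = bayes_factor(se, h)
        if S < 1.0:
            S, l = H * S, l + 1   # keep cumulating evidence over the current run
        else:
            S, l = H, 1           # start a fresh run
        if S < tau:
            alarms.append((t, l))
            S, l = 1.0, 0         # re-initialize after the alarm
    return alarms
```

For example, a single standardized residual of 3 after a calm stretch triggers an alarm with run length 1, corresponding to case a) of the procedure.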
What to do when a change is detected? Possibilities:
Ignore the points: underestimates the variance.
Proceed with filtering as usual: introduces bias and overestimates the variance.
Need something in between.
Intervention, via management by exception: use a forecasting model unless exceptional circumstances arise.
Feed forward: anticipatory in nature, e.g., a new version of the system comes out which is likely to increase hang-up rates.
Feed back: model performance deteriorates; adapt to the new conditions, done automatically.
How to intervene at time t? Add additional evolution to state at time t
w_t ~ N(h_t, U_t), where the parameters h_t and U_t depend on the application.
Example: h_t = 0; U_t = c W_t, with c chosen so that the prior standard deviation of \theta_t increases by some factor m (m = 2 or 3 are good choices),
i.e., U_t = (m^2 - 1) W_t / (1 - \delta), so that the prior variance becomes R_t = C_{t-1} + W_t + U_t = m^2 (C_{t-1} + W_t).
This preserves the integrity of the Kalman Filter; all we are doing is changing the parameters of the prior at time t.
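Under the discount construction, this choice of U_t reduces to one line; a hypothetical helper (the function name and the check are mine):

```python
def intervention_variance(W, delta, m=2.0):
    """Extra evolution variance U_t = (m^2 - 1) * W_t / (1 - delta).
    Under the discount construction R_t = C_{t-1} + W_t = W_t / (1 - delta),
    so adding U_t inflates the prior variance by m^2 (the prior sd by m)."""
    return (m * m - 1.0) * W / (1.0 - delta)
```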
Mild intervention, U_{t}=0
Zoomed area, mild intervention
Strong intervention, sd of state vector tripled
Zoomed in, strong intervention
More general models
Observation equation: y_t = x_t^T \theta_t + v_t,  v_t ~ N(0, V_t)
State equation: \theta_t = G_t \theta_{t-1} + w_t,  w_t ~ N(0, W_t)
\theta_t is updated using the Kalman Filter (filtering equations in the book).
One can take almost any static model and make it dynamic.
E.g., \theta_t is a 7-dim vector corresponding to day-of-week effects; x_t could be a harmonic series to model seasonal patterns parsimoniously.
Covariates whose coefficients evolve dynamically
(Graphical model: the states X_{t-1} -> X_t evolve via G_t; each Y_t is observed from X_t through x_t.)
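One step of the general filter can be sketched in numpy (standard DLM filtering recursions in matrix form; the variable names are mine, and this is a sketch rather than the book's code):

```python
import numpy as np

def dlm_step(y, x, G, m, C, V, W):
    """One Kalman-filter step for the general DLM:
    y_t = x_t' theta_t + v_t,   theta_t = G_t theta_{t-1} + w_t."""
    a = G @ m                        # prior mean:      a_t = G_t m_{t-1}
    R = G @ C @ G.T + W              # prior variance:  R_t = G_t C_{t-1} G_t' + W_t
    f = x @ a                        # forecast mean:   f_t = x_t' a_t
    Q = x @ R @ x + V                # forecast var:    Q_t = x_t' R_t x_t + V_t
    A = R @ x / Q                    # adaptive vector
    e = y - f                        # forecast error
    m_new = a + A * e                # posterior mean
    C_new = R - np.outer(A, A) * Q   # posterior variance
    return m_new, C_new, f, Q
```

With scalar state, identity G, and x = 1, this reduces to the simple model's update equations.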
Model with Day of week effects on real data.
Non-normal models
The observation model is a one-parameter exponential family.
The state equations are the same.
Using the canonical parametrization, the prior on the natural parameter \eta_t = x_t^T \theta_t is formed from the prior on \theta_t through the method of moments.
The posterior of \eta_t is then converted to a posterior of \theta_t. Details in the book.
Recent work and possible research questions
Detecting subtle changes that are not outliers: breakpoints, variance changes, autocorrelated errors (Salvador and Gargallo, JCGS).
Detecting blips might not be important unless they are huge; we want to alert only if things persist for a while. Take an EWMA of the Bayes factor, similar to the Q-chart idea.
Intend to analyse data posted on the AD website using these models.
Comparative analyses with other commonly used methods.