
Page 1:

Bayesian Forecasting and Dynamic Models

M. West and J. Harrison

Springer, 1997

Presented by Deepak Agarwal

Page 2:

Problem Definition

{y_t}: a 1-d time series to be monitored, e.g., daily counts of some pattern, such as the number of emergency room visits to a hospital.

Goal: a statistical method which
- Forecasts accurately (short-term and long-term behavior), i.e., provides a good baseline model.
- Detects deviations from the baseline (outliers, gradual changes, structural changes) with good ROC characteristics.
- Adapts the baseline model to changes over time, e.g., learns gradual changes in day-of-week effects, learns mean shifts, etc.

Page 3:

The Approach

- The baseline model is learned using a Kalman Filter, with a novel and simple way of learning the "evolution" covariance based on a "discount" concept.

- Change detection is done by accumulating evidence against the status quo through residuals.

- The procedure adapts to changes in the baseline via the principle of management by exception: use the forecasting model unless exceptional circumstances arise, in which case one intervenes and corrects the forecasting model.

Page 4:

Simple but illustrative model

Observation equation: $y_t = \theta_t + v_t$, $v_t \sim N(0, V)$

State equation: $\theta_t = \theta_{t-1} + w_t$, $w_t \sim N(0, W_t)$; $w_t$ is the evolution or innovation

Prior: $(\theta_0 \mid D_0) \sim N(m_0, C_0)$, where $D_t$ denotes the data observed up to time $t$

The errors $v_t$ and $w_t$ are conditionally independent.
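A minimal Python sketch of simulating from this local-level model; the particular values of V, W, m0, and C0 are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simulate_local_level(T, V=1.0, W=0.1, m0=0.0, C0=1.0, seed=0):
    """Simulate y_1..y_T from the local-level DLM:
    theta_t = theta_{t-1} + w_t,  w_t ~ N(0, W)
    y_t     = theta_t + v_t,      v_t ~ N(0, V)
    with theta_0 ~ N(m0, C0)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(m0, np.sqrt(C0))
    y = np.empty(T)
    for t in range(T):
        theta = theta + rng.normal(0.0, np.sqrt(W))   # state evolution
        y[t] = theta + rng.normal(0.0, np.sqrt(V))    # observation
    return y
```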

Page 5:

Kalman Filter update at time t: Bayes Rule

(a) Posterior for $\theta_{t-1}$: $(\theta_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})$

(b) Prior for $\theta_t$: $(\theta_t \mid D_{t-1}) \sim N(m_{t-1}, R_t)$, $R_t = C_{t-1} + W_t$

(c) Marginal for $y_t$: $(y_t \mid D_{t-1}) \sim N(f_t, Q_t)$, $f_t = m_{t-1}$, $Q_t = R_t + V$

(d) Posterior for $\theta_t$: $(\theta_t \mid D_t) \sim N(m_t, C_t)$, with
$m_t = m_{t-1} + A_t e_t$, $e_t = y_t - f_t$, $A_t = R_t / Q_t$, $C_t = A_t V$
(regressing $y_t$ on $m_{t-1}$; borrowing strength, or shrinking, across time).

Compare with EWMA (e.g., $A = .05, .1$ for all $t$). Asymptotically, under steady state and with constant variance components, the Filter converges to EWMA.
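A short Python sketch of these update equations, assuming the scalar model above with known V and W_t:

```python
def kalman_step(y_t, m_prev, C_prev, V, W_t):
    """One update of the local-level Kalman filter (steps (b)-(d) above)."""
    R_t = C_prev + W_t          # prior variance for theta_t
    f_t = m_prev                # one-step forecast mean
    Q_t = R_t + V               # one-step forecast variance
    e_t = y_t - f_t             # forecast error (residual)
    A_t = R_t / Q_t             # adaptive coefficient
    m_t = m_prev + A_t * e_t    # posterior mean
    C_t = A_t * V               # posterior variance
    return m_t, C_t, f_t, Q_t, e_t
```

Filtering a whole series is then just `m, C = m0, C0` followed by repeated calls `m, C, f, Q, e = kalman_step(y_t, m, C, V, W)`.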

Page 6:

Asymptotic relation between SNR and EWMA coefficient

Page 7:

Estimating Variance components

Evolution variance: $W_t = C_{t-1}(1 - \delta)/\delta$, with asymptotic justification; the factor $\delta$ is called the "discount factor". Values $.8 \le \delta \le 1$ are reasonable; I have found it useful to estimate $\delta$ from some initial data, but that doesn't always work.

Estimating $V$ is routine Bayesian scale-mixture analysis, i.e., mix the scale of the Normals using an inverse gamma, which essentially replaces the Normals with Student-t's (see the book for formulae).
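The slides defer the unknown-V formulae to the book; the sketch below combines the discount idea with one common conjugate variance-learning recursion, so the S_t and C_t updates should be read as assumptions of this sketch rather than a transcription.

```python
def discount_kalman_step(y_t, m_prev, C_prev, S_prev, n_prev, delta=0.9):
    """Local-level filter step with discount factor delta and online estimation
    of the observation variance V. S_prev is the current point estimate of V,
    n_prev its degrees of freedom. (One standard variance-learning recursion;
    not copied from the slides.)"""
    R_t = C_prev / delta                 # equivalent to W_t = C_prev * (1 - delta) / delta
    f_t = m_prev
    Q_t = R_t + S_prev                   # forecast variance using current estimate of V
    e_t = y_t - f_t
    A_t = R_t / Q_t
    m_t = m_prev + A_t * e_t
    # Conjugate (inverse-gamma / Student-t) update for V:
    n_t = n_prev + 1
    S_t = S_prev + (S_prev / n_t) * (e_t**2 / Q_t - 1.0)
    C_t = (S_t / S_prev) * (R_t - A_t**2 * Q_t)
    return m_t, C_t, S_t, n_t, f_t, Q_t, e_t
```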

Page 8:

Illustration on Data

Percentages of calls to an automated service at AT&T that ended in hang-ups. Can't give you the real numbers; this is what I did:
- Did an arcsine transform.
- Generated a mean surface using Loess, making sure the span was chosen to minimize autocorrelation in the residuals.
- Generated smooth variances using the deviations of the observed values from the mean surface.
- Simulated observations from this process (see the figure on the next page).
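A rough sketch of this recipe, assuming statsmodels' lowess for the Loess fits. The input array p and the fixed span value are hypothetical, and the span is not tuned against residual autocorrelation here as the slide describes.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def simulate_from_fit(p, span=0.3, seed=0):
    """p: observed hang-up proportions (hypothetical input, not the real AT&T data).
    Returns one simulated realization following the recipe above."""
    t = np.arange(len(p))
    z = np.arcsin(np.sqrt(p))                                     # arcsine transform
    mean_surface = lowess(z, t, frac=span, return_sorted=False)   # Loess mean surface
    resid = z - mean_surface
    # Smooth the squared residuals to get a slowly varying variance surface.
    var_surface = lowess(resid**2, t, frac=span, return_sorted=False)
    rng = np.random.default_rng(seed)
    return mean_surface + rng.normal(0.0, np.sqrt(np.clip(var_surface, 1e-8, None)))
```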

Page 9:

A realization of the simulated process

Page 10:

Frequentist property of the procedure.

$\mathrm{MSE}_t(\mathrm{KF}) = E(y_t - \hat{y}_t)^2 \approx \sum_{p=1}^{L} (y_{tp} - \hat{y}_{tp})^2 / L$ (for $L$ replicates of the process), and

$\mathrm{MSE}_t(\text{Mean Known}) = E(y_t - \mu_t)^2 \approx \sum_{p=1}^{L} (y_{tp} - \mu_t)^2 / L$,

where $\hat{y}_{tp}$ is the one-step Kalman filter forecast and $\mu_t$ is the true (known) mean.

$\mathrm{Efficiency}_t = 100 \times \mathrm{MSE}_t(\mathrm{KF}) / \mathrm{MSE}_t(\text{Mean Known})$. Used $L = 100$.
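A Monte Carlo sketch of these quantities in Python, assuming a constant observation variance V and a local-level filter with fixed W; both are simplifying assumptions of the sketch, not choices stated on the slides.

```python
import numpy as np

def efficiency_curve(mu, V, W, L=100, m0=0.0, C0=1.0, seed=0):
    """Estimate Efficiency_t = 100 * MSE_t(KF) / MSE_t(Mean Known) over L replicates,
    for a known mean surface mu (array of length T)."""
    rng = np.random.default_rng(seed)
    T = len(mu)
    se_kf = np.zeros(T)
    se_known = np.zeros(T)
    for p in range(L):
        y = mu + rng.normal(0.0, np.sqrt(V), size=T)   # one replicate of the process
        m, C = m0, C0
        for t in range(T):
            R = C + W
            f, Q = m, R + V                             # one-step forecast before seeing y[t]
            se_kf[t] += (y[t] - f) ** 2
            se_known[t] += (y[t] - mu[t]) ** 2
            A = R / Q
            m, C = m + A * (y[t] - f), A * V            # posterior update
    return 100.0 * (se_kf / L) / (se_known / L)
```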

Page 11:

Red: recovered signal

Discount=.8

Page 12:

Discount=.95

Page 13:

How to detect changes?

Standardized residuals: $se_t = (y_t - E(y_t \mid D_{t-1})) / \sqrt{\mathrm{Var}(y_t \mid D_{t-1})}$.

Under the null model $M_0$, the standardized residuals are iid $N(0, 1)$. Consider an alternate model $M_1$ which assumes the distribution is $N(0, h)$ with $h > 1$.

Bayes factor: $H_t = P_{M_0}(se_t \mid D_{t-1}) / P_{M_1}(se_t \mid D_{t-1})$.

Cumulative Bayes factor for the most recent set of $k$ observations:
$W_t(k) = P_{M_0}(se_t, se_{t-1}, \ldots, se_{t-k+1} \mid D_{t-k}) / P_{M_1}(se_t, se_{t-1}, \ldots, se_{t-k+1} \mid D_{t-k}) = H_t H_{t-1} \cdots H_{t-k+1}$.

Our goal is to identify the most discrepant group of recent, consecutive observations, which involves monitoring
$S_t = \min_k W_t(k) = H_t \min(1, S_{t-1})$;
$l_t = \arg\min_k W_t(k)$ is the related run length of the sequential procedure.
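A small Python sketch of $H_t$ and the $S_t$ recursion, using a variance-$h$ normal for $M_1$; the run-length recursion used below is the standard one associated with this minimization and is an assumption of the sketch.

```python
from scipy.stats import norm

def bayes_factor(se_t, h=4.0):
    """H_t = N(se_t; 0, 1) / N(se_t; 0, h): evidence for the null M_0 over the
    inflated-variance alternative M_1 (h is the variance under M_1)."""
    return norm.pdf(se_t, scale=1.0) / norm.pdf(se_t, scale=h**0.5)

def cumulative_evidence(se_t, S_prev, l_prev, h=4.0):
    """One step of S_t = H_t * min(1, S_{t-1}) and the associated run length l_t."""
    H_t = bayes_factor(se_t, h)
    S_t = H_t * min(1.0, S_prev)
    l_t = 1 if S_prev >= 1.0 else l_prev + 1   # run length of the minimizing group
    return S_t, l_t, H_t
```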

Page 14:

Detecting changes, continued

Procedure: at time $t$,
a) If $H_t < \tau$, declare a change with $l_t = 1$.
b) If $H_t \ge \tau$, compute $S_t = H_t \min(1, S_{t-1})$ and the run length $l_t$. Declare a change if $S_t < \tau$, and also return $l_t$.
c) If a change is detected, re-initialize $S_t = 1$, $l_t = 1$, intervene (discussed later), and proceed to time $t + 1$.
d) If no change is detected, update the Filter and proceed to time $t + 1$.

Choice of $h$ in $M_1$: 4 or 5 is adequate. A threshold $\tau$ between .1 and .15 is adequate.

One could be creative here and use different types of alternative models to detect different changes, e.g., mean shift, local autocorrelation, slow linear trend, etc. A sketch of the full monitoring loop follows.
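A sketch of the full procedure using the two helpers above; the defaults h = 4 and tau = 0.15 follow the slide's suggested ranges, and the filter update itself (step d) is assumed to happen outside this function.

```python
def monitor(std_residuals, h=4.0, tau=0.15):
    """Sequential change monitoring on standardized residuals (variable names are
    ours, not from the slides). Returns (time index, run length) pairs at which
    changes were signalled."""
    alarms = []
    S_prev, l_prev = 1.0, 0
    for t, se_t in enumerate(std_residuals):
        H_t = bayes_factor(se_t, h)
        if H_t < tau:                       # (a) single sharply discrepant observation
            alarms.append((t, 1))
            S_prev, l_prev = 1.0, 0         # (c) re-initialize after intervention
            continue
        S_t, l_t, _ = cumulative_evidence(se_t, S_prev, l_prev, h)   # (b)
        if S_t < tau:                       # accumulated evidence of a change
            alarms.append((t, l_t))
            S_prev, l_prev = 1.0, 0         # (c)
        else:
            S_prev, l_prev = S_t, l_t       # (d) carry evidence forward
    return alarms
```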

Page 15:

What to do when a change is detected? Possibilities:
- Ignore the points: underestimates the variance.
- Proceed with filtering as usual: introduces bias and overestimates the variance.
- Need something in between.

Intervention, i.e., management by exception: use the forecasting model unless exceptional circumstances arise.
- Feed-forward: anticipatory in nature, e.g., a new version of the system comes out which is likely to increase hang-up rates.
- Feed-back: model performance deteriorates; adapt to the new conditions, done automatically.

Page 16:

How to intervene at time t? Add additional evolution to state at time t

$w_t \rightarrow w_t + \xi_t$, with $\xi_t \sim N(h_t, U_t)$.

The parameters $h_t$, $U_t$ depend on the application. Example: $h_t = 0$; $U_t = c\,W_t$, with $c$ chosen so that the prior standard deviation of $\theta_t$ increases by some factor $m$ ($m = 2, 3$ are good choices), i.e., $C_{t-1} + W_t + U_t = m^2 (C_{t-1} + W_t)$, i.e., $U_t = (m^2 - 1)(C_{t-1} + W_t)$.

This preserves the integrity of the Kalman Filter; all we are doing is changing the parameters of the prior at time $t$.
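A minimal sketch of this example intervention; at a detected change one would use the returned R_t in place of C_{t-1} + W_t inside the filter update.

```python
def intervene(C_prev, W_t, m=2.0):
    """Inflate the evolution variance at an intervention time so that the prior
    standard deviation of the state grows by a factor m (the example above)."""
    U_t = (m**2 - 1.0) * (C_prev + W_t)   # extra evolution variance
    R_t = C_prev + W_t + U_t              # inflated prior variance: m^2 * (C_prev + W_t)
    return R_t
```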

Page 17:

Mild intervention, U_{t}=0

Page 18:

Zoomed area, mild intervention

Page 19:

Strong intervention, sd of state vector tripled

Page 20:

Zoomed in, strong intervention

Page 21:

More general models

Observation equation: $y_t = x_t^T \theta_t + v_t$, $v_t \sim N(0, V_t)$

State equation: $\theta_t = G_t \theta_{t-1} + w_t$, $w_t \sim N(0, W_t)$

$\theta_t$ is updated using the Kalman Filter (filtering equations in the book).

One can take almost any static model and make it dynamic. E.g., $x_t$ is a 7-dim vector corresponding to day-of-week effects; $x_t$ could be a harmonic series to model seasonal patterns parsimoniously; covariates whose coefficients evolve dynamically.

[Diagram: graphical model with $X_{t-1} \rightarrow X_t$ via $G_t$, $X_{t-1} \rightarrow Y_{t-1}$, and $X_t \rightarrow Y_t$ via $x_t$.]
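A sketch of the general filtering step for this model; constructing x_t and G_t for day-of-week or harmonic seasonal effects follows the book's forms and is not shown here.

```python
import numpy as np

def dlm_step(y_t, x_t, G_t, m_prev, C_prev, V_t, W_t):
    """One Kalman filter update for the general DLM
    y_t = x_t' theta_t + v_t,  theta_t = G_t theta_{t-1} + w_t.
    x_t: (p,) regression vector, G_t: (p, p), W_t: (p, p), V_t: scalar."""
    a_t = G_t @ m_prev                       # prior mean for theta_t
    R_t = G_t @ C_prev @ G_t.T + W_t         # prior covariance
    f_t = x_t @ a_t                          # one-step forecast mean
    Q_t = x_t @ R_t @ x_t + V_t              # one-step forecast variance
    e_t = y_t - f_t                          # forecast error
    A_t = R_t @ x_t / Q_t                    # adaptive (gain) vector
    m_t = a_t + A_t * e_t                    # posterior mean
    C_t = R_t - np.outer(A_t, A_t) * Q_t     # posterior covariance
    return m_t, C_t, f_t, Q_t, e_t
```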

Page 22:

Model with Day of week effects on real data.

Page 23:

Non-normal models

The observation model is a one-parameter exponential family.

The state equations are the same.

Using the canonical parametrization, the prior on the natural parameter $\eta_t = x_t^T \theta_t$ is formed from the prior on $\theta_t$ by the method of moments.

The posterior for $\eta_t$ is then converted back to a posterior for $\theta_t$. Details in the book; a sketch for the Poisson case follows.
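The slides leave the details to the book; as an illustration, here is a sketch of one standard moment-matching update for a dynamic Poisson model with log link. The gamma moment-matching and the linear Bayes step below are assumptions of this sketch, not a transcription of the book's formulae.

```python
import numpy as np
from scipy.special import digamma, polygamma
from scipy.optimize import brentq

def poisson_dglm_step(y_t, x_t, G_t, m_prev, C_prev, W_t):
    """One conjugate / moment-matching update for a dynamic Poisson model with
    natural parameter eta_t = x_t' theta_t (one common presentation of the idea)."""
    a_t = G_t @ m_prev
    R_t = G_t @ C_prev @ G_t.T + W_t
    f_t = x_t @ a_t                          # prior mean of eta_t
    q_t = x_t @ R_t @ x_t                    # prior variance of eta_t
    # Match a Gamma(alpha, beta) prior on the Poisson mean exp(eta_t):
    #   digamma(alpha) - log(beta) = f_t,  trigamma(alpha) = q_t
    alpha = brentq(lambda a: polygamma(1, a) - q_t, 1e-8, 1e8)
    beta = np.exp(digamma(alpha) - f_t)
    # Conjugate update with y_t, then posterior moments of eta_t:
    g_t = digamma(alpha + y_t) - np.log(beta + 1.0)
    p_t = polygamma(1, alpha + y_t)
    # Linear Bayes update of theta_t from the change in eta_t's moments:
    gain = R_t @ x_t / q_t
    m_t = a_t + gain * (g_t - f_t)
    C_t = R_t - np.outer(gain, gain) * (q_t - p_t)
    return m_t, C_t
```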

Page 24:

Recent work and possible research questions

- Detecting subtle changes that are not outliers: breakpoints, variance changes, autocorrelated errors (Salvador and Gargallo, JCGS).

- Detecting blips might not be important unless they are huge; we want to alert only if things persist for a while. Take an EWMA of the Bayes factor, similar to the Q-chart idea.

- Intend to analyse data posted on the AD website using these models.

- Comparative analyses with other commonly used methods.