Model Building Steps
Forecasting the jobs number
John H. Muller
October 1, 2012
John H. Muller () Model Building Steps October 1, 2012 1 / 40
Outline
1 Goals
2 The Task
  Forecasting the jobs number
  Predictor Variables
3 Modeling Process
  Preliminaries
  Fitting and Tuning the Model
  Model Selection
  Prediction
4 Resources
Goals for the presentation
Illustrate issues and choices in a typical model building process.
To do that, we take the following as our task:
Build a model to forecast a macroeconomic time series
Time is limited, so we don’t have time to discuss:
econometrics or macroeconomics
time series methods
details or merits of particular modeling or model fitting methods
Total nonfarm (FRED symbol = PAYEMS)
Figure: Total nonfarm, 1960 to 2010 (vertical axis 60000 to 140000)
Every month BLS publishes the Employment Situation report.
Two most important numbers: Unemployment rate and Total nonfarm
Total nonfarm: count of jobs from survey of businesses (units: thousands of jobs)
Task: Forecast month-over-month change in Total nonfarm
Total nonfarm (FRED symbol = PAYEMS)
Figure: PAYEMS since 2000 (vertical axis 130000 to 138000)
Month over Month Change in PAYEMS
mean = 23, sd = 289
Figure: Month-over-month change in PAYEMS (2000 to 2012, vertical axis -1000 to 1000)
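The month-over-month change is the first difference of the level series. A minimal sketch of that calculation follows; the function name and the sample levels are hypothetical and are not real PAYEMS data.

```python
from statistics import mean, stdev


def month_over_month_change(levels):
    """Return first differences: change[t] = levels[t] - levels[t-1]."""
    return [curr - prev for prev, curr in zip(levels, levels[1:])]


# Hypothetical levels in thousands of jobs (illustrative only, not real data).
levels = [132000, 132150, 132090, 132300, 132280]
changes = month_over_month_change(levels)

# Summary statistics of the kind quoted on the slide (mean and sd).
change_mean = mean(changes)
change_sd = stdev(changes)
```

The differenced series has one fewer observation than the level series, so the target for the first month is undefined.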
ID Description
ALTSALES Light Weight Vehicle Sales: Autos & Light Trucks
BUSLOANS Commercial and Industrial Loans at All Commercial Banks
CE16OV Civilian Employment
CIVPART Civilian Labor Force Participation Rate
CLF16OV Civilian Labor Force
CONSUMER Consumer Loans at All Commercial Banks
CPATAX Corporate Profits After Tax
DPI Disposable Personal Income
PAYEMS All Employees: Total nonfarm
PCE Personal Consumption Expenditures
PSAVERT Personal Saving Rate
SRVPRD All Employees: Service-Providing Industries
TCU Capacity Utilization: Total Industry
UEMP27OV Civilians Unemployed for 27 Weeks and Over
UEMPLT5 Civilians Unemployed - Less Than 5 Weeks
UEMPMEAN Average (Mean) Duration of Unemployment
UEMPMED Median Duration of Unemployment
UNEMPLOY Unemployed
UNRATE Civilian Unemployment Rate
USGOOD All Employees: Goods-Producing Industries
Table: Variables and descriptions
Figure: Original Series, Jobs group (2000 to 2012). Panels: CE16OV, CIVPART, CLF16OV, PAYEMS, SRVPRD, UEMP27OV, UEMPLT5, UEMPMEAN, UEMPMED, UNEMPLOY, UNRATE, USGOOD
Figure: Original Series, Consumer group (2000 to 2012). Panels: ALTSALES, CONSUMER, DPI, PCE, PSAVERT
Figure: Original Series, Business group (2000 to 2012). Panels: BUSLOANS, CPATAX, TCU
Figure: Differenced Series, Jobs group (2000 to 2012). Panels: CE16OV, CIVPART, CLF16OV, PAYEMS, SRVPRD, UEMP27OV, UEMPLT5, UEMPMEAN, UEMPMED, UNEMPLOY, UNRATE, USGOOD
Figure: Differenced Series, Consumer group (2000 to 2012). Panels: ALTSALES, CONSUMER, DPI, PCE, PSAVERT
Figure: Differenced Series, Business group (2000 to 2012). Panels: BUSLOANS, CPATAX, TCU
Preliminaries
Choose target and predictor variables
  considerations might include: history, cost, frequency, accuracy
Choose model form & method: Lasso & random forest
  alternatives: neural networks, OLS, robust regression, ...
  Criteria for choosing:
    ◮ prediction accuracy
    ◮ interpretability
    ◮ suitability to the task and data
    ◮ available software, model maintenance, implementation complexity
Derive variables from inputs: smoothed, standardized
  alternatives: powers of original variables, cross terms, ratios
Plan for estimating out of sample error: cross validation & test/train split
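Smoothing and standardizing are the two derived-variable steps named above. The slides do not say which smoother or which scaling was used, so the sketch below assumes a trailing moving average and z-scores; the function names and the window length are hypothetical.

```python
from statistics import mean, pstdev


def smooth(series, window=3):
    """Trailing moving average; early points average over the shorter history available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def standardize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    m = mean(series)
    s = pstdev(series)
    if s == 0:
        raise ValueError("cannot standardize a constant series")
    return [(x - m) / s for x in series]


smoothed = smooth([1, 2, 3, 4], window=2)
z_scores = standardize([1, 2, 3])
```

A trailing window uses only past values, which matters for forecasting: a centered window would let future observations leak into each predictor.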
Preliminaries
Data issues
Missing data: remove
  alternatives: impute, ignore (for some model forms)
Outliers: trim to within 3 sd of rolling mean
  alternatives: ignore, remove
Correlated predictor variables: ignore
  alternatives: cluster variables and choose 1 from each cluster
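The outlier rule above (trim to within 3 sd of the rolling mean) might look like the following. This is a sketch under assumptions: the window length, the use of the preceding points only, and the function name are not given in the slides.

```python
from statistics import mean, stdev


def trim_to_rolling_band(series, window=12, n_sd=3.0):
    """Clip each point to rolling mean +/- n_sd * rolling sd of the previous `window` points.

    The first `window` points have too little history and are left unchanged.
    The band is computed from the original (untrimmed) history.
    """
    out = list(series[:window])
    for i in range(window, len(series)):
        history = series[i - window:i]
        m, s = mean(history), stdev(history)
        lower, upper = m - n_sd * s, m + n_sd * s
        out.append(min(max(series[i], lower), upper))
    return out


# Hypothetical series: a calm alternating pattern followed by one spike.
series = [0, 1] * 6 + [100]
trimmed = trim_to_rolling_band(series, window=12)
```

Trimming keeps the observation (unlike removal) but caps its influence; if the rolling history is constant, the band collapses to the rolling mean.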
Figure: Trimmed and Smoothed, Jobs group (2000 to 2012). Panels: CE16OV, CIVPART, CLF16OV, PAYEMS, SRVPRD, UEMP27OV, UEMPLT5, UEMPMEAN, UEMPMED, UNEMPLOY, UNRATE, USGOOD
Figure: Trimmed and Smoothed, Consumer group (2000 to 2012). Panels: ALTSALES, CONSUMER, DPI, PCE, PSAVERT
Figure: Trimmed and Smoothed, Business group (2000 to 2012). Panels: BUSLOANS, CPATAX, TCU
Fitting and Tuning the Model
Complexity: how many knobs the model has
  e.g. degrees of freedom, # variables, shrinkage factor, tree size, ...
Fitting: estimating parameters for a given complexity
  Methods: least squares, method of moments, maximum likelihood, optimization
Tuning: adjusting the model's complexity
Possibly iterative, using diagnostics:
  out of sample error
  sensitivity, e.g. ∂error/∂data
  significance of parameters
  error structure, e.g. heteroskedastic
  alignment with prior beliefs
Which variables are important for the model?
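To make the fitting/tuning distinction concrete: fitting estimates coefficients for one fixed shrinkage level, and tuning picks the shrinkage level by out of sample error. The sketch below uses coordinate descent for the lasso, which is not necessarily the algorithm behind the slides; it assumes centered data with no intercept, and all names and data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=200):
    """Fitting: minimize (1/2n) * RSS + lam * sum|beta_j| for a fixed lam."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # sum of squares of predictor j (assumed nonzero)
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of a linear model without intercept."""
    errors = [yi - sum(xij * bj for xij, bj in zip(xi, beta)) for xi, yi in zip(X, y)]
    return sum(e * e for e in errors) / len(errors)


def tune_lasso(X_train, y_train, X_val, y_val, lams):
    """Tuning: choose the shrinkage level with the lowest validation MSE."""
    fits = {lam: fit_lasso(X_train, y_train, lam) for lam in lams}
    best_lam = min(lams, key=lambda lam: mse(X_val, y_val, fits[lam]))
    return best_lam, fits[best_lam]


# Hypothetical centered data: y depends only on the first predictor.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
beta_shrunk = fit_lasso(X, y, lam=0.5)
best_lam, best_beta = tune_lasso(X, y, X, y, lams=[0.0, 0.5, 3.0])
```

Larger shrinkage sets more coefficients to exactly zero, which is why a lasso fit doubles as a variable-selection step.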
Figure: Cross-Validated MSE against Fraction of final L1 norm (0.0 to 1.0)
Figure: LASSO coefficient paths, Standardized Coefficients against |beta|/max|beta| (0.0 to 1.0)
variable estimate
ALTSALES 0.000
BUSLOANS 0.000
CE16OV 0.157
CIVPART 0.000
CLF16OV 0.000
CONSUMER 0.000
CPATAX 0.000
DPI 0.000
PAYEMS 0.028
PCE 2.250
PSAVERT -15.350
SRVPRD 0.000
TCU 0.000
UEMP27OV -0.265
UEMPLT5 0.000
UEMPMEAN -18.559
UEMPMED 0.000
UNEMPLOY 0.000
UNRATE -305.852
USGOOD 0.292
Table: Coefficient estimates for LASSO
Figure: Random Forest: predictor variable importance (IncNodePurity, horizontal axis 0 to 1200000)
Variables, in the order extracted from the plot axis: ALTSALES, CPATAX, DPI, UEMPLT5, BUSLOANS, CONSUMER, PSAVERT, UEMPMED, UEMPMEAN, SRVPRD, CIVPART, PAYEMS, CLF16OV, PCE, UNRATE, UEMP27OV, CE16OV, UNEMPLOY, TCU, USGOOD
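IncNodePurity is the label R's randomForest package uses for regression forests: the total decrease in node impurity, measured by residual sum of squares, from splits on a variable, averaged over trees. Assuming the plot comes from that package, the contribution of a single split can be sketched as follows; the function names and data are hypothetical.

```python
def rss(values):
    """Residual sum of squares around the mean of a node."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def purity_increase(x, y, threshold):
    """Decrease in RSS when a node is split on x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return rss(y) - rss(left) - rss(right)


# Hypothetical node: the target jumps between x = 2 and x = 3.
x = [1, 2, 3, 4]
y = [10, 10, 20, 20]
good_split = purity_increase(x, y, threshold=2.5)
poor_split = purity_increase(x, y, threshold=1.5)
```

A variable's importance then accumulates these decreases over every split that uses it, so the scale depends on the units of the target.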
Model Selection
Model selection: choosing the best among different models
Our criterion: prediction accuracy
How will we measure this?
Training set cross validation estimates of out of sample MSE
  RF: 52,000
  Lasso: 62,000
Separate test data
  25,000, essentially the same for both!
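A cross validation estimate of out of sample MSE can be sketched as below. The slides do not state the fold scheme, so contiguous folds are an assumption here, and the mean predictor stands in for the real models; all names and data are hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_mse(X, y, fit, predict, k=5):
    """Average squared error over held-out folds; each fold is predicted
    by a model fitted on the remaining folds only."""
    n = len(y)
    total_sq_err = 0.0
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            total_sq_err += (y[i] - predict(model, X[i])) ** 2
    return total_sq_err / n


# Stand-in model: always predict the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


cv_mse = cross_validated_mse([[0], [0], [0], [0]], [0, 0, 2, 2],
                             fit_mean, predict_mean, k=2)
```

With time series, folds that mix past and future observations can make the estimate optimistic, which is one reason to also hold out a separate, later test period.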
Figure: target, rf and lasso series, 2000 to 2012 (vertical axis -1000 to 1000)
Figure: Training set error (rf and lasso, 2000 to 2012)
Figure: Training error ACF (panels: Random Forest, Lasso; lags 0 to 20)
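The autocorrelation function of the errors checks whether the model has left predictable structure behind. A minimal sketch of the sample ACF follows; the function name and the example residuals are hypothetical.

```python
def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag.

    Each lag's autocovariance is divided by the lag-0 value,
    so the result at lag 0 is always 1.
    """
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


# Hypothetical residuals that flip sign every period.
residual_acf = acf([1, -1, 1, -1], max_lag=1)
```

Autocorrelations near zero at every positive lag suggest the errors behave like noise; large values mean past errors could help predict future ones.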
Figure: target, rf and lasso series, Jan to Sep (vertical axis 0 to 500)
Figure: Test set error (rf and lasso, Jan to Sep)
Prediction
Random Forest: 120
Lasso: 86
The Secrets of Economic Indicators, Bernard Baumohl
The Elements of Statistical Learning, Hastie, Tibshirani, Friedman
Macroeconomic Patterns and Stories, Edward E. Leamer
Analysis of Financial Time Series, Ruey S. Tsay
http://api.stlouisfed.org/docs/fred/ (good source for both FRED and ALFRED)
http://cran.r-project.org/
Thank you!
and thank you to John Verostek, Vladimir Valenta and Steve Kusiak