Heterogeneous Employment Effects of Job Search Programmes:
A Machine Learning Approach
Michael Lechner (jointly with Michael Knaus & Anthony Strittmatter)
Swiss Institute for Empirical Economic Research (SEW)
University of St. Gallen | Switzerland | December 2017
1 | Introduction
2 | Institutions & data
3 | Empirical strategy & econometrics
4 | Results & robustness
5 | Conclusions & further research
1 | Introduction
Motivation
Understanding differential effects of policy measures for different types of individuals is important for the efficient allocation of public expenditures
• Same for private sector
Common practice is to search for effect heterogeneity by including interaction terms or slicing data
• Spurious heterogeneity likely to be discovered: multiple testing problem
− Report only factors that are ‘significant’: ex-post data mining
– In medicine, researchers have to pre-specify analysis plan
– Ex.: With 50 independent test statistics, the probability of at least one false positive is 92% (5% sig. level)
• Some important heterogeneity may be overlooked
− Impossible to stratify & estimate (semiparametrically) for all possible strata
− Even for regression models there may be more possible interactions than data points
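The 92% figure follows from the probability of at least one rejection among n independent tests when every null hypothesis is true, 1 − (1 − α)^n. A minimal Python sketch of that arithmetic; the function name is illustrative and not from the slides:

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one rejection) when all n_tests nulls are true and tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 10, 50):
        # For n = 50 and alpha = 0.05 this is roughly 0.92
        print(n, round(prob_any_false_positive(n), 3))
```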
Our research questions
1) Do causal machine learning methods provide useful tools to
uncover effect heterogeneities in active labour market
programmes?
2) Did Swiss job search programmes have differential effects for
different groups of unemployed and case workers?
Literature | Causal ML for heterogeneity | 1
Goal: Finding and estimating conditional average treatment effects (CATEs) under the conditional independence assumption (CIA)
ML methods are effective in prediction
• Able to deal with very high dimensions (N & p)
• Computationally efficient
• Semiparametric (sort-of)
Literature | Causal ML for heterogeneity-latest | 4
Least Absolute Shrinkage and Selection Operator (LASSO) (or similar)-type approaches based on transformed covariates or outcomes
• Tian, Alizadeh, Gentles, Tibshirani (2014, JASA): Experimental (plus)
• Chen et al. (2017): General weighting functions
Trees with larger leaves & IPW (or transformed outcomes)
• Athey & Imbens (2016, Nat. Acad. Science)
Random Forests with deep trees
• Wager & Athey (2017, JASA)
...
Literature | Active Labour Market Programmes
Effects of active labour market programmes
• This has now become a very large literature
• Typically based on observational studies informed by rich administrative data employing a selection-on-observables identification strategy
• Nice summary, e.g., by meta study of Card, Kluve, Weber (2015)
• Results generally mixed
Literature | Job search programmes
Considerable literature
• E.g. Cottier, Lalive (2017), Crepon, van den Berg (2016)
Mixed results
• Negative for Germany (Lechner, Wunsch, 2008)
• Negative for Switzerland (Gerfin, Lechner, 2002)
• More positive Danish studies (Graversen, van Ours, 2008)
• …
Heterogeneity
• Card et al. (2015) report better results for disadvantaged participants
• Lechner & Wunsch (2009) report better results during recessions
Our (intended) contributions | 1
Show how new causal machine learning tools can be fruitfully applied in a causal framework to uncover effect heterogeneities in active labour market programmes (ALMPs)
Check Swiss job search programmes for heterogeneities
• Swiss data has case worker information
− Advantage for selection-bias correction & heterogeneity analysis
If heterogeneities were discovered, translate them into information useful for policy makers
Our approach
Use informative Swiss administrative data such that CIA plausibly
holds for conditional programme effects
Use (mainly) the LASSO (Least Absolute Shrinkage and Selection
Operator) based methods suggested by Tian et al. (2014) to
investigate the heterogeneity
Reanalyse a typical programme that has already been evaluated
• Tested (and thus a bit older) administrative data set with a standard ALMP for a ‘normal’ developed country
The results of the paper in a nutshell
Methods ‘work’ and provide useful information
• Main conclusions robust to particular type of method & its implementation
Swiss job search programmes
• Substantial heterogeneity in the beginning (lock-in phase)
− Heterogeneity is related to type of unemployed
– Programme works better for unemployed (UE) with bad a-priori labour market chances
– Programme works better for foreigners (probably because of lack of network for informal job search) – this effect has been overlooked so far …
− Case worker heterogeneity seems to play only a very limited role
• Heterogeneity fades out after 1 year
2 | Institutions & data
Institutional setting | ALMP
Active labour market programmes are part of the Swiss unemployment insurance (UI) system
• Standard set of programmes (subsidized employment & training)
• >500 million CHF (=450 million EUR) expenditures
Job search programme
• Content: Learn how to search and apply for a job
• Duration ~ 22 days
• Classroom training
• Private providers
• Active job search by participants is supposed to continue during the programme
Data | Social security data & case worker survey
Data is a (merged) combination of
• Social security data
− Data sources and main variables
– AVAM: Information from counselling process
– ASAL: Information relevant for paying out benefits
– AHV: Information relevant for paying out pensions
− Variables useful for selection and heterogeneity analysis
– Employment histories and individual socio-demographic information
• Regional data
− Economic environment
• Case worker survey
− Sociodemographics of case worker
− Counselling strategies
Definition of treated and control group | 2
Job search programmes start early
• Treated: First participation during first 6 months of UE spell
• Control
− No programme participation in this period
− Not employed prior to a start date randomly drawn from the start date distribution of the treated
12’000 participants, 72’000 controls, 1’300 case workers
• (First) inflow into UE in 2003 and first case worker
• Unemployed are 24-55 years old & receive UE benefits …
• Case worker replied to questionnaire (response rate 84%)
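One possible reading of the control group definition, sketched in Python. It assumes, hypothetically, that each treated person is described by the month of programme start and each control by the number of months until leaving unemployment; the function name and data layout are illustrative and not taken from the slides.

```python
import random


def draw_pseudo_starts(treated_start_months, control_ue_months, seed=0):
    """Return (control index, pseudo start month) for controls kept in the sample."""
    rng = random.Random(seed)
    kept = []
    for idx, months_unemployed in enumerate(control_ue_months):
        # Draw a start date from the empirical start date distribution of the treated
        pseudo_start = rng.choice(treated_start_months)
        # Keep only controls who had not found a job before their pseudo start
        if months_unemployed >= pseudo_start:
            kept.append((idx, pseudo_start))
    return kept
```

Controls who left unemployment before their drawn date are dropped, so that treated and controls are compared at the same elapsed unemployment duration.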
3 | Empirical strategy & econometrics
Basic concept | 1
Causal framework: Neyman-Rubin potential outcome framework
Objects of interest: Conditional average treatment effects (CATE)
Useful to distinguish between two types of conditioning variables
• X: Variables needed to remove selectivity
• Z: Variables capturing (‘policy-relevant’) heterogeneity
− Z may be larger, smaller, partially or fully overlapping with X
• Distinction of X & Z is absent from many papers on the topic
Sometimes useful to distinguish between effects for treated & non-treated (if components of X do not appear in Z)
• Ex.: Precise pre-specified treatment rules that can be used in prediction
CATE | 1
Goal is to estimate CATE(z) under CIA
$$\gamma(z) = E(Y^1 \mid Z = z) - E(Y^0 \mid Z = z)$$
$$\theta(z, d) = E(Y^1 \mid Z = z, D = d) - E(Y^0 \mid Z = z, D = d)$$
Main assumptions
a) $Y^1, Y^0 \perp D \mid X = x, Z = z, \quad \forall x \in \chi, \ \forall z \in \Xi$
c) $0 < P(D = 1 \mid X = x, Z = z) = p(x, z) < 1$
d) $X^1 = X^0, \ Z^1 = Z^0$ (a bit too strong)
Identification in this study | 1
Why is CIA plausible in this particular implementation?
• Discussed in many previous papers
− Gerfin, Lechner (2002), Gerfin, Lechner, Steiger (2004), Behncke, Frölich, Lechner (2010), etc.
• Main arguments
− Decision mainly driven by case worker (CW)
− CW has high autonomy within organisation
− We have (almost) the same information set about the UE as the CW
− We also have information about the CW and his/her counselling style
Estimation and inference
Main idea
Modify covariates potentially responsible for heterogeneity within a regression-type framework
Use inverse probability weighting (IPW) to remove selection bias
Use ML tools for dimension reduction and variable selection
• Lasso inference possible
• Requires sparsity assumption on heterogeneity
Model can be very flexible by creating many, many interactions
Motivation and intuition of this approach
Assume linear regression model & 50% random assignment
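$$Y = \underbrace{\beta Z}_{\text{main effect}} + \underbrace{\delta Z \tfrac{T}{2}}_{\text{treatment effect}} + error; \quad T = 2D - 1; \quad E(T) = 0; \quad E(T^2) = 1$$
$$\Rightarrow \ Y^1 = \beta Z + \tfrac{1}{2} \delta Z + error; \qquad Y^0 = \beta Z - \tfrac{1}{2} \delta Z + error$$
$$\Rightarrow \ \underbrace{\gamma(z)}_{\text{CATE}(z)} = E(Y^1 - Y^0 \mid Z = z) = 2 \cdot 0.5\, \delta z = \delta z$$
$$\Rightarrow \ E(2YT \mid Z = z) = \gamma(z)$$
Mod. outcome:
$$\Rightarrow \ 2YT = \gamma(Z) + error = \delta Z + error$$
Mod. covariate:
$$\Rightarrow \ Y = \gamma(Z) \tfrac{T}{2} + error = \delta Z \tfrac{T}{2} + error = \delta \tfrac{TZ}{2} + error$$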
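The two transformations can be checked numerically. The sketch below simulates the linear model with 50% random assignment for a single scalar Z and recovers δ both by regressing Y on the modified covariate TZ/2 and by regressing the modified outcome 2YT on Z (no intercept in either case). The parameter values, the uniform distribution of Z and the function name are assumptions made for illustration only.

```python
import random


def simulate_delta_estimates(n=20000, beta=1.0, delta=2.0, seed=1):
    """Return (delta_hat from modified covariate, delta_hat from modified outcome)."""
    rng = random.Random(seed)
    sxy_cov = sxx_cov = 0.0   # sums for the regression of Y on T*Z/2
    sxy_out = sxx_out = 0.0   # sums for the regression of 2*Y*T on Z
    for _ in range(n):
        z = rng.random()                      # heterogeneity variable Z ~ U(0, 1)
        d = 1 if rng.random() < 0.5 else 0    # 50% random assignment
        t = 2 * d - 1                         # T = 2D - 1, so E(T) = 0 and T^2 = 1
        y = beta * z + delta * z * t / 2 + rng.gauss(0.0, 1.0)
        x = t * z / 2                         # modified covariate
        sxy_cov += x * y
        sxx_cov += x * x
        sxy_out += z * (2 * y * t)            # modified outcome 2YT
        sxx_out += z * z
    return sxy_cov / sxx_cov, sxy_out / sxx_out


if __name__ == "__main__":
    print(simulate_delta_estimates())
```

Both estimates should be close to the true δ even though the main effect βZ is never modelled, because T is independent of Z and has mean zero.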
Modified covariate approach | 1
Similar transformation available for logit and survival models
Robustness for deviations of functional form (LS best approx.)
Use weighted regression in observational studies
Increased efficiency if main effects are included
Modified covariate approach | 1
This specification facilitates the use of certain ML methods to search for heterogeneity
• Create very many interactions of variables (flexibility)
• Use variable selection methods (LASSO)
− Sparsity assumption on heterogeneity required (restricted eigenvalue condition)
– Heterogeneity can be well approximated by few variables
– Oracle property
− Lack of main effects facilitates the use of LASSO-type estimators
– Main & interaction effects may be highly correlated, which makes it difficult for LASSO to distinguish between them
− Post-Lasso inference may be available under standard conditions
Summary
Weighted Lasso
• Propensity score estimated with several methods
• Choice of penalty term by 10-fold cross-validation of Post-Lasso MSE
• Note that Lasso coefficients are generally biased towards 0
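$$\hat{w}(d, x, z) = \frac{d - \hat{P}(D = 1 \mid X = x, Z = z)}{\hat{P}(D = 1 \mid X = x, Z = z) \left(1 - \hat{P}(D = 1 \mid X = x, Z = z)\right)}$$
$$\hat{\delta} = \arg\min_{\delta} \ \frac{1}{N} \sum_{i=1}^{N} t_i \, \hat{w}(d_i, x_i, z_i) \left( y_i - \frac{t_i}{2} z_i \delta \right)^2 + \lambda \sum_{j=1}^{p} |\delta_j|$$
$$\hat{\gamma}(z) = z \hat{\delta}$$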
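A minimal pure-Python sketch of a weighted Lasso on modified covariates, assuming the propensity scores have already been estimated and that the weight has the form (d − p) / (p(1 − p)). The coordinate-descent solver, the fixed number of sweeps and all function names are choices made for this illustration; penalty selection by cross-validation and the Post-Lasso step are not shown.

```python
def ipw_weight(d, p_hat):
    """w(d, x, z) = (d - p) / (p (1 - p)), with p the estimated propensity score."""
    return (d - p_hat) / (p_hat * (1.0 - p_hat))


def soft_threshold(c, lam):
    """Soft-thresholding operator used in Lasso coordinate updates."""
    if c > lam:
        return c - lam
    if c < -lam:
        return c + lam
    return 0.0


def modified_covariate_lasso(y, d, z, p_hat, lam, n_sweeps=200):
    """Minimise (1/N) sum_i t_i w_i (y_i - (t_i/2) z_i'delta)^2 + lam * sum_j |delta_j|."""
    n, p = len(y), len(z[0])
    t = [2 * di - 1 for di in d]
    # t_i * w_i is positive: 1/p for treated, 1/(1 - p) for controls
    omega = [ti * ipw_weight(di, pi) for ti, di, pi in zip(t, d, p_hat)]
    # Modified covariates (t_i / 2) * z_i
    x = [[ti * zij / 2.0 for zij in zi] for ti, zi in zip(t, z)]
    delta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            a = c = 0.0
            for i in range(n):
                fit_others = sum(x[i][k] * delta[k] for k in range(p) if k != j)
                a += omega[i] * x[i][j] ** 2
                c += omega[i] * x[i][j] * (y[i] - fit_others)
            a, c = 2.0 * a / n, 2.0 * c / n
            delta[j] = soft_threshold(c, lam) / a if a > 0.0 else 0.0
    return delta


def predict_cate(z_row, delta):
    """Predicted CATE for one observation: z'delta."""
    return sum(zj * dj for zj, dj in zip(z_row, delta))
```

With a larger penalty more coefficients are set exactly to zero, which is what performs the variable selection among the many candidate interactions.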
Inference | 1
Honesty
• Use one (50%-) sample to select variables
• Use other (50%-) sample to estimate coefficients given selected variables
• Standard OLS inference is valid
Implementation
• Random sample split does (randomly) lead to different models
• Average over 30 splits
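The honest sample-splitting loop can be sketched as follows. The callables `select` and `estimate` are hypothetical placeholders for the variable selection step and the estimation step; for simplicity the estimate is assumed to be a single number that is averaged over the splits.

```python
import random


def honest_estimate(n_obs, select, estimate, n_splits=30, seed=0):
    """Average an estimate over random 50/50 splits into selection and estimation samples."""
    rng = random.Random(seed)
    indices = list(range(n_obs))
    estimates = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        half = n_obs // 2
        selection_sample = indices[:half]    # used only to choose variables
        estimation_sample = indices[half:]   # used only to estimate coefficients
        selected = select(selection_sample)
        estimates.append(estimate(estimation_sample, selected))
    return sum(estimates) / n_splits
```

Because no observation is used for both steps within a split, the coefficients are estimated on data that played no role in choosing the model.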
Inference | 2
Results may also be desired for aggregated (marginalized) elements of Z
• Predict for full Z and then aggregate at the end
Approximate bootstrap
• OLS inference not useful because of sample splitting and aggregation
• Full bootstrap computationally too demanding because of LASSO
• Bootstrap contains all estimation steps but keeps 2 steps constant
− Sample splits
− Variable selection within each sample split
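A simplified sketch of the idea, not the exact procedure of the paper: the sample splits and the selected variables are treated as fixed inputs, and only the estimation step is repeated on resampled data. Resampling within each estimation sample, the callable `estimate` and all names are assumptions made for this illustration.

```python
import random
import statistics


def approximate_bootstrap_se(estimation_samples, selected_by_split, estimate,
                             n_boot=200, seed=0):
    """Bootstrap standard error holding splits and selected variables fixed."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        per_split = []
        for sample, selected in zip(estimation_samples, selected_by_split):
            # Resample observations with replacement inside the fixed split
            resample = [rng.choice(sample) for _ in sample]
            # Re-estimate with the variable selection taken as given
            per_split.append(estimate(resample, selected))
        draws.append(sum(per_split) / len(per_split))
    return statistics.stdev(draws)
```

Skipping the selection step in every replication is what keeps the computation feasible, at the cost of ignoring the uncertainty from variable selection.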
4 | Results & robustness
ATE, ATET and potential outcomes
Classical average programme effects: Summary
Programmes on average not effective
Substantial lock-in effect
No difference between effects on treated and non-treated
Summary of variable selection
Outcome: Cumulated employment after programme start
0-6 months: 17 variables selected
0-12 months: 13 variables selected
25-31 months: No variables selected
Distribution of CATE | 1
Aggregation of effects
Due to many variables (incl. interactions & polynomials), the Post-Lasso coefficients may be difficult to interpret
Aggregate w.r.t. policy relevant variables
• Here just examples, could also use other dimensions
Bootstrap-based inference takes the variable selection step as given
• Computation costs prohibit a ‘full’ bootstrap
Results | Lock-in effects | Marginal effects | 1
Robustness checks
IPW & matching estimation
Random forest estimation of p-score
Different specifications of the modified covariate method
• Adaptive Lasso
• Choice of penalty terms
• Efficiency augmentation
Modified outcome method
Causal forests
Potential of different assignments
Investigate 5 different allocations (# of participants constant)
Statistical rules
• random allocation
• unemployed with the highest CATEs (best case)
• unemployed with the lowest CATEs (worst case)
Easy-to-implement rules
• all unemployed with at least 1 UE spell in the previous 2 years & no degree plus a random selection of unemployed with at least 1 UE spell in previous 2 years & unskilled
• all unemployed with low employability rating plus a random sample with medium employability rating (easiest to implement)
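The three statistical rules can be compared with a few lines of Python, assuming (hypothetically) that an estimated CATE is available for every unemployed person and that the number of programme slots is fixed. The function returns the average estimated effect among those assigned; function and argument names are illustrative.

```python
import random


def average_effect_of_rule(cates, n_slots, rule, seed=0):
    """Average estimated CATE among the n_slots people selected by the rule."""
    if rule == "best":
        chosen = sorted(cates, reverse=True)[:n_slots]   # highest CATEs first
    elif rule == "worst":
        chosen = sorted(cates)[:n_slots]                 # lowest CATEs first
    elif rule == "random":
        chosen = random.Random(seed).sample(cates, n_slots)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return sum(chosen) / n_slots
```

The best and worst cases bound what any reallocation of the same number of slots could achieve; the easy-to-implement rules would fall somewhere in between.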
Average employment after 6 months
5 | Conclusions & further research
Conclusions
Methods provide useful (and plausible) additional information for ‘consumers’ of empirical evaluation studies
Effect of this programme in general is questionable
No heterogeneity detected in medium-term effects
Heterogeneity detected in short-term effect (<6-12 months)
• Penalty (=lock-in effect) due to reduced job search is smaller for unemployed with lower a priori employment chances
Different CATE-based allocation rules change overall effectiveness of programme
Further research
Investigate comparative performance under specific
circumstances of the various causal ML methods
Apply to different programmes of ALMP
Apply to different fields in economics
Multiple treatments
Address trade-offs: Efficiency-robustness-computation time
Thank you for your attention!
Michael Lechner
University of St. Gallen
You find this paper (and related work) on Repec or ResearchGate.