Heterogeneous Employment Effects of Job Search Programmes:

A Machine Learning Approach

Michael Lechner (jointly with Michael Knaus & Anthony Strittmatter)

Swiss Institute for Empirical Economic Research (SEW)

University of St. Gallen | Switzerland | December 2017

Motivation

Understanding differential effects of policy measures for different types of individuals is important for the efficient allocation of public expenditures

• Same for private sector

Common practice is to search for effect heterogeneity by including interaction terms or slicing data

• Spurious heterogeneity likely to be discovered: multiple testing problem

− Report only factors that are ‘significant’ ex-post: data mining

– In medicine, researchers have to pre-specify analysis plan

– Ex.: With 50 independent test statistics, the probability of at least one false positive is 92% (5% sig. level)

• Some important heterogeneity may be overlooked

− Impossible to stratify & estimate (semiparametrically) for all possible strata

− Even for regression models there may be more possible interactions than data points
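The 92% figure in the multiple-testing example can be checked with a few lines. This is a minimal sketch that assumes the tests are independent and all null hypotheses are true; the function name is illustrative only.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false rejection among n_tests
    independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# 50 independent tests at the 5% level
fwer_50 = familywise_error_rate(50)
print(f"{fwer_50:.3f}")
```

With 50 tests the result is roughly 0.92, so finding at least one 'significant' interaction is close to certain even when no heterogeneity exists.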

Our research questions

1) Do causal machine learning methods provide useful tools to uncover effect heterogeneities in active labour market programmes?

2) Did Swiss job search programmes have differential effects for different groups of unemployed and case workers?

Literature | Causal ML for heterogeneity | 1

Goal: Finding and estimating CATEs under CIA

ML methods are effective in prediction

• Able to deal with very high dimensions (N & p)

• Computationally efficient

• Semiparametric (sort-of)

Literature | Causal ML for heterogeneity-latest | 4

Least Absolute Shrinkage and Selection Operator (LASSO) (or similar)-type approaches based on transformed covariates or outcomes

• Tian, Alizadeh, Gentles, Tibshirani (2014, JASA): Experimental (plus)

• Chen et al. (2017): General weighting functions

Trees with larger leaves & IPW (or transformed outcomes)

• Athey & Imbens (2016, Nat. Acad. Science)

Random Forests with deep trees

• Wager & Athey (2017, JASA)

...

Literature | Active Labour Market Programmes

Effects of active labour market programmes

• This has now become a very large literature

• Typically based on observational studies informed by rich administrative data employing a selection-on-observables identification strategy

• Nice summary, e.g., by meta study of Card, Kluve, Weber (2015)

• Results generally mixed

Literature | Job search programmes

Considerable literature

• E.g. Cottier, Lalive (2017), Crépon, van den Berg (2016)

Mixed results

• Negative for Germany (Lechner, Wunsch, 2008)

• Negative for Switzerland (Gerfin, Lechner, 2002)

• More positive Danish studies (Graversen, van Ours, 2008)

• …

Heterogeneity

• Card et al. (2015) report better results for disadvantaged participants

• Lechner & Wunsch (2009) report better results during recessions

Our (intended) contributions | 1

Show how new causal machine learning tools can be fruitfully applied in a causal framework to uncover effect heterogeneities in ALMPs

Check Swiss job search programmes for heterogeneities

• Swiss data has case worker information

− Advantage for selection-bias correction & heterogeneity analysis

If heterogeneities were discovered, translate them into information useful for policy makers

Our approach

Use informative Swiss administrative data such that CIA plausibly holds for conditional programme effects

Use (mainly) the LASSO (Least Absolute Shrinkage and Selection Operator) based methods suggested by Tian et al. (2014) to investigate the heterogeneity

Reanalyse a typical programme that has already been evaluated

• Tested (and thus a bit older) administrative data set with a standard ALMP programme for a ‘normal’ developed country

The results of the paper in a nutshell

Methods ‘work’ and provide useful information

• Main conclusions robust to particular type of method & its implementation

Swiss job search programmes

• Substantial heterogeneity in the beginning (lock-in phase)

− Heterogeneity is related to type of unemployed

– Programme works better for UE with bad a-priori labour market chances

– Programme works better for foreigners (probably because of lack of network for informal job search) – this effect has been overlooked so far …

− Case worker heterogeneity seems to play only a very limited role

• Heterogeneity fades out after 1 year

1 | Introduction





Institutional setting | ALMP

Active labour market programmes part of Swiss UI system

• Standard set of programmes (subsidized employment & training)

• >500 mio CHF (=450 mio EUR) expenditures

Job search programme

• Content: Learn how to search and apply for a job

• Duration ~ 22 days

• Class room training

• Private providers

• Active job search by participants is supposed to continue during the programme

Data | Social security data & case worker survey

Data is a (merged) combination of

• Social security data

− Data sources and main variables

– AVAM: Information from counselling process

– ASAL: Information relevant for paying out benefits

– AHV: Information relevant for paying out pensions

− Variables useful for selection and heterogeneity analysis

– Employment histories and individual socio-demographic information

• Regional data

− Economic environment

• Case worker survey

− Sociodemographics of case worker

− Counselling strategies

Definition of treated and control group | 2

Job search programmes start early

• Treated: First participation during first 6 months of UE spell

• Control

− No programme participation in this period

− Not employed prior to a randomly allocated start date drawn from the start date distribution of the treated

12’000 participants, 72’000 controls, 1’300 case workers

• (First) inflow into UE in 2003 and first case worker

• UE are 24-55 years old & receive UE benefits …

• Case worker replies to questionnaire (response rate 84%)
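The control-group definition relies on randomly allocated (pseudo) start dates. The following is a minimal sketch of how such dates could be drawn. All inputs are simulated and hypothetical: the variable names, the monthly time scale and the sample sizes are assumptions for illustration, not the authors' data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs, measured in months since the start of the UE spell
treated_start = rng.integers(1, 7, size=1_000)   # observed programme starts (months 1..6)
control_exit = rng.integers(1, 25, size=6_000)   # month in which a non-participant takes up a job

# Draw a pseudo start date for each non-participant from the empirical
# start-date distribution of the treated
pseudo_start = rng.choice(treated_start, size=control_exit.size, replace=True)

# Keep only non-participants who were not employed before their pseudo start date
keep = control_exit >= pseudo_start
control_ids = np.flatnonzero(keep)

print(f"{control_ids.size} of {control_exit.size} non-participants retained")
```

Drawing from the start dates of the treated makes the timing of the (hypothetical) programme start comparable across the two groups.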

Basic concept | 1

Causal framework: Neyman-Rubin potential outcome framework

Objects of interest: Conditional average treatment effects (CATE)

Useful to distinguish between two types of conditioning variables

• X: Variables needed to remove selectivity

• Z: Variables capturing (‘policy-relevant’) heterogeneity

− Z may be larger, smaller, partially or fully overlapping with X

• Distinction of X & Z is usually absent from many papers on the topic

Sometimes useful to distinguish between effects for treated & non-treated (if components of X do not appear in Z)

• Ex.: Precise pre-specified treatment rules that can be used in prediction

CATE | 1

Goal is to estimate CATE(z) under CIA

\[
\gamma(z) = E(Y^1 \mid Z=z) - E(Y^0 \mid Z=z); \qquad
\theta(z,d) = E(Y^1 \mid Z=z, D=d) - E(Y^0 \mid Z=z, D=d)
\]

Main assumptions

\[
\begin{aligned}
&\text{a) } Y^1, Y^0 \perp D \mid X=x, Z=z, \quad \forall x \in \chi,\ \forall z \in \Xi \\
&\text{c) } 0 < P(D=1 \mid X=x, Z=z) = p(x,z) < 1 \\
&\text{d) } X^1 = X^0,\ Z^1 = Z^0 \quad \text{(a bit too strong)}
\end{aligned}
\]
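As a reading aid, the textbook selection-on-observables argument shows how assumptions a) and c) turn the CATE into a function of observable quantities. The step below is not taken from the slides; it additionally assumes the usual observation rule $Y = D Y^1 + (1-D) Y^0$ and that X contains all components needed to apply a) within Z = z.

```latex
\begin{aligned}
\gamma(z) &= E\bigl[\,E(Y \mid D=1, X, Z=z) - E(Y \mid D=0, X, Z=z) \,\big|\, Z=z\,\bigr] \\
          &= E\!\left[\frac{\bigl(D - p(X,Z)\bigr)\,Y}{p(X,Z)\,\bigl(1 - p(X,Z)\bigr)} \,\middle|\, Z=z\right]
\end{aligned}
```

The first line is the regression representation and the second the inverse probability weighting (IPW) representation; common support c) guarantees that the denominator is never zero.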

Identification in this study | 1

Why is CIA plausible in this particular implementation?

• Discussed in many previous papers

− Gerfin, Lechner (2002), Gerfin, Lechner, Steiger (2004), Behncke, Frölich, Lechner (2010), etc.

• Main arguments

− Decision mainly driven by case worker

− CW has high autonomy within organisation

− We have (almost) the same information set about the UE as the CW

− We also have information about the CW and his/her counselling style

Estimation and inference

Main idea

Modify covariates potentially responsible for heterogeneity within a regression-type framework

Use IPW to remove selection bias

Use ML tools for dimension reduction and variable selection

• Lasso inference possible

• Requires sparsity assumption on heterogeneity

Model can be very flexible by creating many, many interactions

Motivation and intuition of this approach

Assume linear regression model & 50% random assignment

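\[
Y = \underbrace{Z\beta}_{\text{main effect}} + \underbrace{\frac{T}{2} Z\delta}_{\text{treatment effect}} + error; \quad T = 2D - 1; \quad E(T) = 0; \quad E(T^2) = 1
\]

\[
\Rightarrow\ Y^1 = Z\beta + \frac{1}{2} Z\delta + error^1; \quad Y^0 = Z\beta - \frac{1}{2} Z\delta + error^0
\]

\[
\Rightarrow\ \underbrace{\gamma(z)}_{CATE(z)} = E(Y^1 - Y^0 \mid Z=z) = 2\,\frac{z\delta}{2} = z\delta
\]

\[
\Rightarrow\ E(2YT \mid Z=z) = z\delta = \gamma(z)
\]

\[
\Rightarrow\ Y = \underbrace{\frac{TZ}{2}}_{\text{Mod. covariate}}\,\delta + error
\]

\[
\Rightarrow\ \underbrace{2YT}_{\text{Mod. outcome}} = Z\delta + error
\]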
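The intuition can be checked in a small simulation. The sketch below assumes the linear model and 50% random assignment stated on this slide; the coefficient values, sample size and variable names are hypothetical. It regresses Y on the modified covariate TZ/2 and, separately, the modified outcome 2YT on Z. Neither regression includes main effects.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200_000, 3

# Heterogeneity variables Z (first column is a constant)
Z = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.5, -0.5])    # main effect (hypothetical values)
delta = np.array([0.2, 1.0, -0.3])   # CATE coefficients (hypothetical values)

D = rng.integers(0, 2, size=n)       # 50% random assignment
T = 2 * D - 1                        # T takes values -1 and +1
Y = Z @ beta + (T / 2) * (Z @ delta) + rng.normal(size=n)

# Modified covariate method: regress Y on (T/2) * Z
X_mod = (T / 2)[:, None] * Z
delta_mcm = np.linalg.lstsq(X_mod, Y, rcond=None)[0]

# Modified outcome method: regress 2*Y*T on Z
delta_mom = np.linalg.lstsq(Z, 2 * Y * T, rcond=None)[0]

print("true delta:        ", delta)
print("modified covariate:", delta_mcm.round(2))
print("modified outcome:  ", delta_mom.round(2))
```

Both regressions recover delta up to sampling noise, because T has mean zero and is independent of Z, so the omitted main effect is uncorrelated with the regressors.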

Modified covariate approach | 1

Similar transformation available for logit and survival models

Robust to deviations from the assumed functional form (least-squares best approximation)

Use weighted regression in observational studies

Increased efficiency if main effects are included

Modified covariate approach | 2

This specification facilitates the use of certain ML methods to search for heterogeneity

• Create very many interactions of variables (flexibility)

• Use variable selection methods (LASSO)

− Sparsity assumption on heterogeneity required (restricted eigenvalue condition)

– Heterogeneity can be well approximated by few variables

– Oracle property

− Lack of main effects facilitates the use of LASSO-type estimators

– Main & interaction effects may be highly correlated, which makes it difficult for LASSO to distinguish between them

− Post-Lasso inference may be available under standard conditions

Summary

Weighted Lasso

• Propensity score estimated with several methods

• Choice of penalty term by 10-fold cross-validation of Post-Lasso MSE

• Note that Lasso coefficients are generally biased towards 0

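\[
\hat{w}(d,x,z) = \frac{d - \hat{P}(D=1 \mid X=x, Z=z)}{\hat{P}(D=1 \mid X=x, Z=z)\,\bigl(1 - \hat{P}(D=1 \mid X=x, Z=z)\bigr)}
\]

\[
\hat{\delta} = \arg\min_{\delta}\ \frac{1}{N} \sum_{i=1}^{N} \hat{w}(d_i,x_i,z_i) \Bigl(y_i - \frac{t_i z_i}{2}\,\delta\Bigr)^2 + \lambda \sum_{j=1}^{p} |\delta_j|
\]

\[
\hat{\gamma}(z) = z\hat{\delta}
\]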
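A weighted Lasso on modified covariates can be sketched in plain NumPy. The following is an illustration under stated assumptions, not the authors' implementation: the solver is a simple coordinate descent, the propensity score is treated as known rather than estimated, the penalty is fixed instead of cross-validated, and the weights are the standard positive inverse-probability weights (1/p for participants, 1/(1-p) for non-participants), which may differ from the exact weighting used in the paper. All names and numbers are hypothetical.

```python
import numpy as np


def weighted_lasso(X, y, w, lam, n_iter=50):
    """Coordinate descent for (1/N) sum_i w_i (y_i - x_i'd)^2 + lam * sum_j |d_j|."""
    n, p = X.shape
    d = np.zeros(p)
    resid = y.astype(float).copy()                 # y - X @ d with d = 0
    scale = (w[:, None] * X**2).sum(axis=0) / n    # (1/N) sum_i w_i x_ij^2
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * d[j]                # partial residual without coordinate j
            rho = (w * X[:, j] * resid).sum() / n
            d[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / scale[j]  # soft threshold
            resid -= X[:, j] * d[j]
    return d


rng = np.random.default_rng(1)
n, p = 20_000, 10
Z = rng.normal(size=(n, p))
pscore = 1 / (1 + np.exp(-0.5 * Z[:, 0]))   # stand-in for an estimated propensity score
D = rng.binomial(1, pscore)                 # selection into treatment depends on Z[:, 0]
T = 2 * D - 1

delta_true = np.zeros(p)
delta_true[:2] = [1.0, -1.0]                # sparse heterogeneity (hypothetical values)
Y = Z[:, 0] + (T / 2) * (Z @ delta_true) + rng.normal(size=n)

w = D / pscore + (1 - D) / (1 - pscore)     # positive IPW weights (assumption, see text)
X_mod = (T / 2)[:, None] * Z                # modified covariates, no main effects
delta_hat = weighted_lasso(X_mod, Y, w, lam=0.2)
selected = np.flatnonzero(delta_hat)

print("selected variables:", selected)
print("Lasso coefficients:", delta_hat[selected].round(2))
```

The selected coefficients are shrunk towards zero relative to the true values of 1 and -1, which is why a Post-Lasso step (OLS on the selected variables) is used for the final estimates.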

Inference |1

Honesty

• Use one (50%-) sample to select variables

• Use other (50%-) sample to estimate coefficients given selected variables

• Standard OLS inference is valid

Implementation

• Random sample split does (randomly) lead to different models

• Average over 30 splits
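The honest sample-splitting procedure can be sketched as follows. To keep the example short and self-contained, the selection step is a simple correlation screen that stands in for the Lasso; that substitution, the simulated data and all function names are assumptions for illustration only.

```python
import numpy as np


def select_variables(X, y, threshold=0.1):
    """Stand-in selector (NOT the Lasso): keep columns whose absolute
    correlation with y exceeds a threshold."""
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(corr) > threshold)


def honest_estimate(X, y, rng):
    """Select variables on one random half, estimate by OLS on the other half."""
    n, p = X.shape
    idx = rng.permutation(n)
    sel_half, est_half = idx[: n // 2], idx[n // 2:]
    selected = select_variables(X[sel_half], y[sel_half])
    coef = np.zeros(p)
    if selected.size:
        ols = np.linalg.lstsq(X[est_half][:, selected], y[est_half], rcond=None)[0]
        coef[selected] = ols
    return coef


rng = np.random.default_rng(0)
n, p = 4_000, 20
X = rng.normal(size=(n, p))
coef_true = np.zeros(p)
coef_true[:2] = [1.0, -0.5]                 # hypothetical sparse coefficients
y = X @ coef_true + rng.normal(size=n)

# Average the honest estimates over 30 random splits
coef_avg = np.mean([honest_estimate(X, y, rng) for _ in range(30)], axis=0)
print(coef_avg.round(2))
```

Because the estimation half plays no role in choosing the model, the OLS step within each split is not contaminated by the selection step. Averaging over splits reduces the dependence on any single random partition.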

Inference | 2

Results may also be desired for aggregated (marginalized) elements of Z

• Predict for full Z and then aggregate at the end

Approximate bootstrap

• OLS inference not useful because of sample splitting and aggregation

• Full bootstrap computationally too demanding because of LASSO

• Bootstrap contains all estimation steps but keeps 2 steps constant

− Sample splits

− Variable selection within each sample split

ATE, ATET and potential outcomes

Classical average programme effects: Summary

Programmes on average not effective

Substantial lock-in effect

No difference between effects on treated and non-treated

Summary of variable selection

Outcome: Cumulated employment after programme start

0-6 months: 17 variables selected

0-12 months: 13 variables selected

25-31 months: No variables selected

Distribution of CATE | 1

Aggregation of effects

Due to many variables (incl. interactions & polynomials), the Post-Lasso coefficients may be difficult to interpret

Aggregate w.r.t. policy relevant variables

• Here just examples, could also use other dimensions

Bootstrap-based inference takes the variable selection step as given

• Computation costs prohibit a ‘full’ bootstrap

Results | Lock-in effects | Marginal effects | 1

Robustness checks

IPW & matching estimation

Random forest estimation of p-score

Different specifications of the modified covariate method

• Adaptive Lasso

• Choice of penalty terms

• Efficiency augmentation

Modified outcome method

Causal forests

Potential of different assignments

Investigate 5 different allocations (# of participants constant)

Statistical rules

• random allocation

• unemployed with the highest CATEs (best case)

• unemployed with the lowest CATEs (worst case)

Easy-to-implement rules

• all unemployed with at least 1 UE spell in the previous 2 years & no degree plus a random selection of unemployed with at least 1 UE spell in previous 2 years & unskilled

• all unemployed with low employability rating plus a random sample with medium employability rating (most easy to implement)
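The comparison of the statistical rules can be sketched as follows, holding the number of participants constant. The CATE values here are simulated and purely hypothetical (they are not the estimates from the paper), and the helper name is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_slots = 10_000, 1_500                        # unemployed persons and programme slots

# Hypothetical estimated CATEs (e.g. effect on months employed), mostly negative
cate = rng.normal(loc=-0.5, scale=0.3, size=n)


def mean_effect(order):
    """Average CATE among the first n_slots individuals in the given ordering."""
    return cate[order[:n_slots]].mean()


random_rule = mean_effect(rng.permutation(n))     # random allocation
best_case = mean_effect(np.argsort(-cate))        # highest CATEs first
worst_case = mean_effect(np.argsort(cate))        # lowest CATEs first

print(f"random: {random_rule:.2f}, best: {best_case:.2f}, worst: {worst_case:.2f}")
```

The easy-to-implement rules could be evaluated in the same way by replacing the ordering with an indicator built from observable characteristics such as the employability rating.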

Average employment after 6 months

Conclusions

Methods provide useful (and plausible) additional information for ‘consumers’ of empirical evaluation studies

Effect of this programme in general is questionable

No heterogeneity detected in medium-term effects

Heterogeneity detected in short-term effect (<6-12 months)

• Penalty (=lock-in effect) due to reduced job search is smaller for unemployed with lower a priori employment chances

Different CATE-based allocation rules change overall effectiveness of programme

Further research

Investigate comparative performance of the various causal ML methods under specific circumstances

Apply to different programmes of ALMP

Apply to different fields in economics

Multiple treatments

Address trade-offs: Efficiency-robustness-computation time

Thank you for your attention!

Michael Lechner

University of St. Gallen | [email protected]

You can find this paper (and related work) on RePEc or ResearchGate.
