IAB homepage: www.iab.de

Institut für Arbeitsmarkt- und Berufsforschung/Institute for Employment Research

A New Approach for Disclosure Control in the IAB Establishment Panel –

Multiple Imputation for a Better Data Access

Jörg Drechsler

Competence Center for Empirical Methods

Institute for Employment Research of the Federal Employment Agency, Germany

UNECE Work Session on Statistical Data Editing, Bonn, 25-27 September 2006

Jörg Drechsler 26. September 2006

Overview

The IAB Establishment Panel

Three approaches for disclosure control via multiple imputation

Application of the full MI approach to the IAB Establishment Panel

First results

Proceedings/open questions

The IAB Establishment Panel

Annually conducted Establishment Survey (generally face-to-face interviews)

Since 1993 in Western Germany, since 1996 in Eastern Germany

Population: All establishments with at least one employee covered by social security

Source: Official Employment Statistics

Response rate among repeatedly interviewed establishments: more than 80%

The IAB Establishment Panel: Sample/Weighting

Sample of more than 16,000 establishments in the last wave

Stratified sample: 20 economic branches x 10 size classes

Oversampling of large establishments

Yearly additional samples: newly founded firms and replacements for panel attrition

Weighting:
- inverse sampling probabilities
- adjustment to exogenous values
- probabilities to stay in the sample

The IAB Establishment Panel: Contents

Annual: employment structure, changes in employment, business policies, investment, training, remuneration, working hours, collective wage agreements, works councils

Bi- or triennial: innovations, government aid, further training, flexibility of working hours, business activities, contact with employment offices

Focus:
- 2001: innovation and modern technologies
- 2002: elderly employees and contact with the labour offices

Kölling, A. (2000): The IAB-Establishment Panel. Journal of Applied Social Science Studies, 120(2), 291-300.

(1) Fully Synthetic Data

Proposed by Rubin (1993)

Idea:
- Treat all the units from the population not included in the sample as missing data and impute them multiply
- Take random samples from the imputed population and release these samples to the public

[Figure: schematic of the data blocks X, Yinc and Yexc]

X: variables available for all units in the population
Y: variables available only for units in the survey
Yinc: units included in the survey
Yexc: units not included in the survey
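
The two steps of the fully synthetic approach can be illustrated with a minimal sketch. It is not the procedure used for the panel: it assumes a single X and a single Y variable, a simple linear regression as imputation model, and fixed parameter estimates (a proper multiple imputation would also draw the model parameters anew for every round). All function and variable names are hypothetical.

```python
import random


def fit_simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x, plus residual std. dev."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return intercept, slope, sigma


def synthesize(population_x, sample_idx, sample_y, m=5, n_release=None, seed=0):
    """Return m released samples of (x, y) pairs drawn from imputed populations."""
    rng = random.Random(seed)
    xs = [population_x[i] for i in sample_idx]
    intercept, slope, sigma = fit_simple_regression(xs, sample_y)
    observed = dict(zip(sample_idx, sample_y))  # Yinc
    n_release = n_release or len(sample_idx)
    releases = []
    for _ in range(m):
        # Step 1: impute Y for every unit not in the survey (Yexc).
        y_full = [
            observed[i] if i in observed else rng.gauss(intercept + slope * x, sigma)
            for i, x in enumerate(population_x)
        ]
        # Step 2: draw a random sample from the imputed population.
        chosen = rng.sample(range(len(population_x)), n_release)
        releases.append([(population_x[i], y_full[i]) for i in chosen])
    return releases


# Toy population of 200 units, 40 of which were surveyed (made-up numbers).
rng = random.Random(1)
population_x = [rng.uniform(0, 10) for _ in range(200)]
sample_idx = rng.sample(range(200), 40)
sample_y = [1.0 + 2.0 * population_x[i] + rng.gauss(0, 1) for i in sample_idx]
releases = synthesize(population_x, sample_idx, sample_y, m=5, seed=2)
```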

(2) Imputation of Selected Variables

Observed values are replaced by imputed values only for variables that bear a high risk of disclosure (key variables)

Proposal: Replace only parts of each key variable in every imputation round and combine the imputed parts to achieve fully imputed variables.

[Figure: example with 3 variables and 3 imputation rounds]

(3) Selective Multiple Imputation of Key Variables (SMIKe)

Suggested by Liu and Little (2002)

Only selected units of key variables are multiply imputed

Assume the dataset can be divided into a set of categorical key variables X and a set of continuous variables Y

Cross tabulation of X yields the vector x containing cell counts for all combinations of X

Cell counts lower than a previously defined sensitivity threshold possibly allow re-identification

These cells, combined with some non-sensitive cells that are closely related to the sensitive cells with regard to Y, are replaced by imputed values
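
The cross-tabulation step can be sketched as follows. This is only an illustration of how sensitive cells might be flagged, not the SMIKe implementation of Liu and Little; the field names, the toy records and the threshold of 3 are assumptions, and the selection of related non-sensitive cells and the imputation itself are left out.

```python
from collections import Counter


def sensitive_cells(records, key_vars, threshold):
    """Cross-tabulate the categorical key variables and return every cell
    (combination of categories) whose count is below the threshold.
    Only combinations that occur in the data are counted."""
    counts = Counter(tuple(r[k] for k in key_vars) for r in records)
    return {cell for cell, n in counts.items() if n < threshold}


# Hypothetical toy records with two categorical key variables.
records = (
    [{"branch": "retail", "size": "20-199"}] * 5
    + [{"branch": "retail", "size": "1000+"}] * 1
    + [{"branch": "mining", "size": "200-499"}] * 2
)
flagged = sensitive_cells(records, ["branch", "size"], threshold=3)
```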

Generating a synthetic data set

Create a synthetic data set for selected variables from the 1997 wave of the Establishment Panel

Imputation for the whole population is not feasible

Draw a new sample from the Official Employment Statistics using the same sampling design as for the Establishment Panel (stratification by economic branch, size, and region)

Each stratum cell contains the same number of observations as the 1997 wave of the Establishment Panel

Additional information from the German Social Security Data (GSSD) is used for the imputation

[Figure: schematic of the data blocks X, Yinc and Yexc, labelled "data from the IAB Establishment Panel", "data from the new sample" and "missing data"]
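
Drawing a new sample with the same stratum cell counts as the panel could look roughly like the sketch below. It assumes simple random sampling without replacement inside each stratum; the frame layout, the stratum labels and the function name are hypothetical, and the actual sampling design of the panel may differ.

```python
import random
from collections import Counter, defaultdict


def draw_matching_sample(frame, panel_strata, seed=0):
    """Draw units from the sampling frame so that every stratum cell holds
    as many units as it does in the panel.

    frame: list of (unit_id, stratum) pairs for the whole population
    panel_strata: one stratum label per panel establishment
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit_id, stratum in frame:
        by_stratum[stratum].append(unit_id)
    sample = []
    for stratum, n in sorted(Counter(panel_strata).items()):
        candidates = by_stratum[stratum]
        if len(candidates) < n:
            raise ValueError(f"stratum {stratum!r} has fewer than {n} units")
        sample.extend(rng.sample(candidates, n))
    return sample


# Toy frame of 60 units in four (branch, size) strata (made-up labels).
frame = [
    (i, ("manufacturing" if i % 2 else "services", "small" if i % 3 else "large"))
    for i in range(60)
]
panel_strata = [("services", "small")] * 4 + [("manufacturing", "large")] * 2
new_sample = draw_matching_sample(frame, panel_strata, seed=1)
```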

The German Social Security Data (GSSD)

Contains information on all employees covered by social security

Since 1973 all employers are required to notify the social security agencies about all employees covered by social security.

The GSSD represents about 80% of the German workforce

Information from the GSSD is aggregated on the establishment level and is matched to the IAB Establishment Panel via establishment identification number

Information on: number of employees by gender, schooling, mean of the employees' age, mean of the employees' wages…
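
The aggregation and matching step might be sketched like this. The record layout and all field names (establishment_id, gender, age, wage) are hypothetical placeholders, not the actual GSSD variables.

```python
from collections import defaultdict
from statistics import mean


def aggregate_employees(employees):
    """Aggregate employee-level records to one record per establishment."""
    groups = defaultdict(list)
    for emp in employees:
        groups[emp["establishment_id"]].append(emp)
    return {
        est_id: {
            "n_employees": len(rows),
            "n_women": sum(r["gender"] == "f" for r in rows),
            "mean_age": mean(r["age"] for r in rows),
            "mean_wage": mean(r["wage"] for r in rows),
        }
        for est_id, rows in groups.items()
    }


def match_to_panel(panel, aggregates):
    """Attach the aggregated information to each panel record via the
    establishment identification number; unmatched records stay unchanged."""
    return [{**row, **aggregates.get(row["establishment_id"], {})} for row in panel]


# Made-up employee records and panel rows.
employees = [
    {"establishment_id": 1, "gender": "f", "age": 30, "wage": 2000.0},
    {"establishment_id": 1, "gender": "m", "age": 50, "wage": 3000.0},
    {"establishment_id": 2, "gender": "f", "age": 40, "wage": 2500.0},
]
panel = [{"establishment_id": 1, "training": 1}, {"establishment_id": 3, "training": 0}]
matched = match_to_panel(panel, aggregate_employees(employees))
```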

Imputation procedure

For simplicity, newly founded establishments are excluded from the sampling frame and from the panel

8 new samples are drawn

The number of observations in each sample equals the number of observations in the panel: n_s = n_p = 7332

Every sample is imputed five times using chained equations

Number of variables in X = 24

Number of variables in Y = 48

Imputations are generated using IVEware by Raghunathan, Solenberger and Van Hoewyk (2001)

A regression by T. Zwick (2005) as a means of evaluation

Zwick analyses the productivity effects of different continuing vocational training forms in Germany

Results: vocational training is one of the most important measures to gain and keep productivity

Probit regression to explain why firms offer vocational training

13 explanatory variables, including: share of qualified employees, establishment size, region, collective wage agreement, high qualification needs expected…

2 variables, based on the 1998 wave of the panel, are dropped for the evaluation

Binary variables in the original and in the synthetic data set

Variable                                               Survey mean   Synthetic data mean   Deviation
Training Yes/No                                        0.7069        0.7229                  2.25%
Redundancies expected                                  0.2239        0.1880                -16.01%
Many employees are expected to be on maternity leave   0.0644        0.0811                 25.84%
High qualification needs expected                      0.1551        0.1752                 12.95%
Establishment size 20-199                              0.3973        0.4092                  3.00%
Establishment size 200-499                             0.1348        0.1450                  7.57%
Establishment size 500-999                             0.0745        0.0777                  4.29%
Establishment size 1000+                               0.0942        0.0991                  5.17%
Collective wage agreement                              0.7643        0.7562                 -1.06%
Apprenticeship training reaction on skill shortages    0.3632        0.3725                  2.58%
Training reaction on skill shortages                   0.4490        0.4693                  4.52%
State-of-the-art technical equipment                   0.6513        0.7095                  8.94%
Apprenticeship training                                0.6141        0.6398                  4.17%

Continuous variables in the original and in the synthetic dataset

Variable                                                     Survey mean   Synthetic data mean   Deviation
Share of qualified employees                                 0.6741        0.6236                 -7.49%
Number of employees                                          365.6238      356.1432               -2.59%
Number of employees that participated in training measures   110.2944      88.2385               -20.00%
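
The deviation column appears to be the relative difference between the synthetic data mean and the survey mean. Assuming that definition, the figures for the three continuous variables can be recomputed from the rounded means shown on the slide; the function name is hypothetical.

```python
def relative_deviation(survey_mean, synthetic_mean):
    """Relative deviation of the synthetic mean from the survey mean, in percent."""
    return (synthetic_mean - survey_mean) / survey_mean * 100.0


# (survey mean, synthetic data mean) as reported for the continuous variables.
means = {
    "Share of qualified employees": (0.6741, 0.6236),
    "Number of employees": (365.6238, 356.1432),
    "Number of employees that participated in training measures": (110.2944, 88.2385),
}

deviations = {
    name: round(relative_deviation(survey, synthetic), 2)
    for name, (survey, synthetic) in means.items()
}

for name, dev in deviations.items():
    print(f"{name}: {dev:+.2f}%")
```

Recomputing the binary variables in the same way can differ from the reported values in the second decimal, presumably because the means shown there are rounded to four digits.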

Results from the regression

Left: regression as performed by T. Zwick (n=6,258). Right: regression with all missing data imputed (n=7,332).

                                                        Zwick (n=6,258)         Imputed (n=7,332)
Exogenous variable                                      Coefficient   z-value   Coefficient   z-value
Redundancies expected                                   0.2610        4.58      0.2491        4.62
Many employees expected on maternity leave              0.2516        2.49      0.2657        2.82
High qualification needs expected                       0.6407        8.1       0.6483        8.76
Apprenticeship training reaction on skill shortages     0.1763        3.4       0.1142        2.05
Training reaction on skill shortages                    0.5974        11.91     0.5270        9.92
Establishment size 20-199                               0.6827        15.19     0.6866        16.01
Establishment size 200-499                              1.3514        15.71     1.3555        17.22
Establishment size 500-999                              1.3984        11.75     1.3475        12.78
Establishment size 1000+                                1.9725        9.15      1.9622        10.13
Share of qualified employees                            0.7663        10.28     0.7793        11.21
State-of-the-art technical equipment                    0.1755        4.16      0.1694        4.3
Collective wage agreement                               0.2450        5.46      0.2535        5.82
Apprenticeship training                                 0.4199        9.31      0.4841        11.24

Complete data set and synthetic data set

Left: regression with all missing data imputed (n=7,332). Right: regression on the synthetic data (n=7,332).

                                                        Imputed (n=7,332)       Synthetic (n=7,332)
Exogenous variable                                      Coefficient   z-value   Coefficient   z-value
Redundancies expected                                   0.2491        4.62      0.2764        4.71
Many employees expected on maternity leave              0.2657        2.82      0.2373        2.78
High qualification needs expected                       0.6483        8.76      0.6308        9.15
Apprenticeship training reaction on skill shortages     0.1142        2.05      0.1442        2.66
Training reaction on skill shortages                    0.5270        9.92      0.5566        10.69
Establishment size 20-199                               0.6866        16.01     0.5466        12.65
Establishment size 200-499                              1.3555        17.22     1.0313        14.37
Establishment size 500-999                              1.3475        12.78     1.1425        10.40
Establishment size 1000+                                1.9622        10.13     1.2331        9.89
Share of qualified employees                            0.7793        11.21     0.8692        9.98
State-of-the-art technical equipment                    0.1694        4.3       0.2041        5.00
Collective wage agreement                               0.2535        5.82      0.3117        7.10
Apprenticeship training                                 0.4841        11.24     0.4655        10.81

Proceedings/Open Questions

Use non-parametric approaches

Replace only selected variables

Measure the disclosure risk after imputation

Generate weights for the synthetic sample?

Thank you for your attention!

Rubin’s adjusted combining rules

• Imputation yields m different data sets

• Information from the data sets has to be combined to get valid estimates

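Point estimate: average of the point estimates from the different data sets

$$\hat{\theta}_{MI} = \frac{1}{m} \sum_{i=1}^{m} \hat{\theta}^{(i)}$$

Variance estimate as a combination of the variance within the data sets (W) and the variance between the data sets (B)

$$\widehat{var}(\hat{\theta}_{MI}) = \frac{m+1}{m} B - W \qquad \left(\text{not } W + \frac{m+1}{m} B\right)$$

with

$$W = \frac{1}{m} \sum_{i=1}^{m} \widehat{var}(\hat{\theta}^{(i)})$$

$$B = \frac{1}{m-1} \sum_{i=1}^{m} \left(\hat{\theta}^{(i)} - \hat{\theta}_{MI}\right)^{2}$$

An additional sampling step is necessary when creating synthetic data sets, so the variance B already reflects the variance within each population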
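
As a small sketch, the adjusted combining rules can be applied to the m point estimates and their estimated variances as shown below. The function name and the example numbers are made up for illustration. Note that this variance estimate can become negative in small samples, which the sketch does not handle.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Combine results from m synthetic data sets.

    estimates: point estimate from each data set
    variances: estimated variance of that point estimate in each data set
    Returns (point estimate, W, B, variance estimate).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two data sets and one variance per estimate")
    theta_mi = mean(estimates)       # average of the point estimates
    w = mean(variances)              # within-data-set variance W
    b = variance(estimates)          # between-data-set variance B (divisor m - 1)
    total = (m + 1) / m * b - w      # adjusted rule: W is subtracted, not added
    return theta_mi, w, b, total


# Made-up results from m = 3 synthetic data sets.
theta_mi, w, b, total = combine_fully_synthetic([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
```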

Information from the two data sets

Information contained in the IAB Establishment Panel (wave 1997)

Available for establishments in the survey

- number of employees in June 1996
- qualification of the employees
- number of temporary employees
- number of agency workers
- working week (full-time and overtime)
- the firm's commitment to collective agreements
- existence of a works council
- turnover, advance performance and export share
- investment total
- overall wage bill in June 1997
- technological status
- age of the establishment
- legal form and corporate position
- overall company-economic situation
- reorganisation measures
- company further training activities
- additional information on new foundations

Covered in both datasets

- establishment number, branch and size
- location of the establishment
- number of employees in June 1997

Information contained in the German Social Security Data (from 1997)

Available for all German establishments with at least one employee covered by social security

- number of full-time and part-time employees
- short-time employment
- mean and standard deviation of the employees' age
- mean and standard deviation of wages from full-time employees
- mean and standard deviation of wages from all employees
- occupation
- schooling and training
- number of women and men
- number of German employees