Assessing quality of income data in a survey vs an administrative source Dmitri Romanov and Yuri Gubman Q2014 Vienna, 5 June 2014



Outline

Motivation

Literature

Research hypotheses

Data

Methodology

Key findings and conclusions

Romanov&Gubman: Quality of income data in survey vs. administrative file 2


Motivation

Income items are often presented in surveys as a banded question when the income variable serves not for substantive analysis but as a socioeconomic classificatory variable, e.g. in Israel’s Social Survey.

This format is considered to mitigate the item non-response and inaccuracy associated with the sensitivity of income questions (Tourangeau and Smith, 1996).

Administrative data on incomes, usually from the national tax or social security authorities, are available in many NSOs (national statistical offices). These are regarded as a (partial) substitute for survey data.



Motivation (cont.)

Income data from the two types of sources, administrative and survey, differ in definitions, timeliness, reference periods, and populations covered.

Any unadjusted comparison is bound to yield discrepancies. Even after thorough adjustment on both sides, meaningful discrepancies are normally found.

There is a lively debate on the issue: Which source better represents true income? Is under-reporting in a survey preferable, for statistical purposes, to cheating the tax authorities?

Without taking a stand on that, we analyze the distribution of the discrepancies by income source.



Literature

Micklewright and Schnepf (2007): the use of a banded question may result in loss of information within the bands and, in turn, may impair the accuracy of the aggregate estimates derived from the survey.

Moore et al. (JOS 2001): “Response bias estimates for wage/salary income amounts are generally small and without a consistent sign, indicating neither under- nor over-reporting bias given accurate source reporting”.

Martin et al. (1996): reporting income from business and self-employment activity is more difficult in terms of conceptualization, definition and questioning than reporting wage/salary income and transfer payments.



Literature (cont.)

Romanov and Furman (2006): found “regression to the mean” in census income reports relative to the data from an administrative file. If errors among the wealthy in one direction are offset by errors among the poor in the opposite direction, the average error may be negligible.

Abowd and Stinson (2011): both survey and administrative data on earnings are viewed as noisy measures of some underlying true amount of annual earnings.



Research hypotheses

The distribution of discrepancies between the survey and administrative income measures exhibits “regression to the mean”, i.e. the discrepancies at the two ends of the distribution have opposite signs and, possibly, different variances (heteroscedasticity).

Different factors explain the discrepancies at each end of the income distribution. These factors are related to patterns of employment and regularity of earnings: number of jobs, part-time employment and periods of unemployment, fringe benefits.

Different factors explain the discrepancies among wage earners and among self-employed (SE) individuals. A leading factor for the SE is the cognitive complexity of answering the income item.



Data

In the 2008 Social Survey (CAPI, response rate 83%), 4,493 working individuals, employees or self-employed, were asked the GROSS INCOME question:

Employees: Last month, what was your gross income, before deductions, from all places where you worked?

Self-employed: Last month, what was your gross income, before deductions, from all places where you worked, including wages and income from a business?

The overall item non-response rate was 8.26%.

The respondents were asked to choose a value from a card listing 10 predetermined income bands.



Data (cont.)

These records were matched, by unique PIN, with the administrative file that contains information on all taxable incomes from wage/salary and self-employment/business of individual taxpayers.

Annual administrative records were converted to monthly amounts using the number of months of the individual’s employment in all jobs (12 for the SE) and traced to the “last month” referred to in the Social Survey.
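A minimal sketch of the annual-to-monthly conversion, assuming that adjusting by the number of months means dividing the annual amount by the months employed (always 12 for the self-employed). The function name `monthly_admin_income` is hypothetical and not taken from the study.

```python
def monthly_admin_income(annual_income: float, months_employed: int,
                         self_employed: bool) -> float:
    """Average monthly income from an annual administrative amount.

    Assumption: annual income is divided by months employed in all jobs;
    the self-employed are always treated as working 12 months.
    """
    months = 12 if self_employed else months_employed
    if not 1 <= months <= 12:
        raise ValueError("months of employment must be between 1 and 12")
    return annual_income / months
```

For example, `monthly_admin_income(60000, 10, self_employed=False)` gives 6000.0, while the same annual amount for a self-employed person gives 5000.0. This is why an employee with a short or interrupted work year can have a monthly administrative figure close to a "full month" survey answer, while the self-employed are averaged over the whole year.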



Data (cont.)

For the 2008 Social Survey, Blaise Audit Trail (AT) log files are available. The log captures, inter alia, the time of entering a field (a question), the time of exiting it, and the value entered by the interviewer. If the respondent corrects a previous response, the AT file records one line for the relevant field for each time the interviewer changed its value.
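The per-field bookkeeping described above can be sketched as follows. The record layout (`ATEntry` with field name, entry and exit times in seconds, and recorded value) is a simplified, hypothetical stand-in; actual Blaise audit-trail files have their own format, which is not reproduced here.

```python
from typing import NamedTuple


class ATEntry(NamedTuple):
    """One line of a simplified, hypothetical audit-trail log."""
    field: str      # questionnaire field (question) name
    entered: float  # seconds from interview start when the field was entered
    left: float     # seconds from interview start when the field was left
    value: str      # value recorded by the interviewer on leaving the field


def field_summary(log, field):
    """Return (number of visits, total seconds spent) for one field.

    More than one visit means the interviewer came back to the field.
    """
    visits = [e for e in log if e.field == field]
    return len(visits), sum(e.left - e.entered for e in visits)


log = [
    ATEntry("GROSS", 100.0, 130.0, "5"),
    ATEntry("NET", 130.0, 150.0, "4"),
    ATEntry("GROSS", 150.0, 160.0, "6"),  # respondent went back and corrected
]
print(field_summary(log, "GROSS"))  # (2, 40.0)
```

Summing the time over all visits is one possible definition of response time; using only the first visit would be an equally defensible choice.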

Matching between the survey, the administrative file, and the AT log file yielded 3,417 records.



Data (cont.)


Figure: band of income reported in the survey (vertical axis) by centiles of income from the administrative file (horizontal axis). Thick line: income from the administrative file by survey reportage bands. Thin line: income reported in the survey. Broken line: continuous estimator of income reported in the survey based on fitted …


Methodology


Dependent variable: the discrepancy (in %) between the individual’s income in the administrative file and an approximation of her income from the survey. The banded values of gross income were approximated by maximum-likelihood (ML) fitting of several theoretical distributions; the best fit was obtained for the skew-t distribution.
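The idea of fitting a continuous distribution to banded answers can be illustrated with an interval-censored likelihood, in which each respondent contributes the probability mass of the chosen band. This sketch swaps in a lognormal distribution (the Python standard library has no skew-t) and uses a crude grid search instead of a numerical optimiser; the band limits, grids and function names are hypothetical. Each band is then represented by the income at which the fitted CDF lies halfway between the band limits, which is one possible continuous approximation, not necessarily the one used by the authors.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, applied on the log-income scale


def band_cdf(mu, sigma, x):
    """CDF of a lognormal(mu, sigma) at income x (handles 0 and infinity)."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return _STD.cdf((math.log(x) - mu) / sigma)


def band_loglik(mu, sigma, bands, counts):
    """Interval-censored log-likelihood: each answer contributes the
    probability mass of its band. bands are (lower, upper) income limits."""
    ll = 0.0
    for (lo, hi), n in zip(bands, counts):
        p = band_cdf(mu, sigma, hi) - band_cdf(mu, sigma, lo)
        ll += n * math.log(max(p, 1e-300))  # guard against log(0)
    return ll


def fit_lognormal(bands, counts, mu_grid, sigma_grid):
    """Crude grid-search ML estimate; a real analysis would use an optimiser."""
    candidates = [(mu, s) for mu in mu_grid for s in sigma_grid]
    return max(candidates,
               key=lambda p: band_loglik(p[0], p[1], bands, counts))


def band_midpoint_income(mu, sigma, lo, hi):
    """Income at which the fitted CDF is halfway between the band limits."""
    p_mid = (band_cdf(mu, sigma, lo) + band_cdf(mu, sigma, hi)) / 2
    return math.exp(mu + sigma * _STD.inv_cdf(p_mid))
```

The same likelihood applies to any parametric family, including a skew-t, once its CDF is available; only `band_cdf` would change.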

A response is counted as “no error” when both values fall within the same band.

Respondent group | Negative errors (%) | No error (%) | Positive errors (%) | Mean error (%) | S.D. of error (%)
Full sample | 42.1 | 39.6 | 18.3 | -19.5 | 71.9
Employees | 44.6 | 40.4 | 15.0 | -21.8 | 67.3
Self-employed | 24.0 | 33.8 | 42.2 | -2.3 | 97.7
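The band-agreement rule and the error shares in the table can be sketched as below. The band limits in `BAND_LIMITS` are hypothetical (the show card's actual limits are not given), the discrepancy is assumed to be expressed relative to the administrative income, and negative values are taken to mean under-reporting in the survey; function names are illustrative.

```python
from bisect import bisect_right

# Hypothetical upper limits of the first nine of the 10 bands (monthly income);
# the real limits of the 2008 Social Survey card are not reproduced here.
BAND_LIMITS = [2000, 3000, 4000, 5000, 6000, 8000, 10000, 13000, 17000]


def band_of(income):
    """Index 0..9 of the band containing a monthly income (lower limit inclusive)."""
    return bisect_right(BAND_LIMITS, income)


def discrepancy_pct(survey_approx, admin_income):
    """Survey approximation minus administrative income, in % of the latter."""
    return 100.0 * (survey_approx - admin_income) / admin_income


def classify(survey_band, survey_approx, admin_income):
    """'none' if both incomes fall in the same band, else the sign of the error."""
    if band_of(admin_income) == survey_band:
        return "none"
    return "negative" if discrepancy_pct(survey_approx, admin_income) < 0 else "positive"


def shares(labels):
    """Percentage of negative, no and positive errors, as in the table columns."""
    n = len(labels)
    return {k: round(100 * labels.count(k) / n, 1)
            for k in ("negative", "none", "positive")}
```

For instance, a respondent who chose band 3 (approximated at 4,500) but has 6,000 in the administrative file falls two bands lower in the survey, giving a discrepancy of -25% and the label "negative".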


Methodology (cont.)

Explanatory variables:

Basic socio-demographic characteristics: gender, age, immigrant status, marital status, ethnic group, education.

Employment traits for wage earners (WE): number of jobs held, usual working hours, part-time main job, more/fewer than usual hours last week, job satisfaction, wage raise/cut last year.

Contract-dependent fringe benefits: receives full pay for sick days, pension and advance training fund contributions paid by employer, participation in profit-sharing, receives company car, reimbursement of transportation expenses.

Ln of income from the administrative file.


Methodology (cont.)

Audit Trail variables:

Response time.

After the GROSS income question, a NET income question was asked. When facing it, some respondents realized they had given their net income (the figure people in Israel are more familiar with) instead of their gross income. They then returned to the gross income question and corrected their previous answer.

Two variables were constructed on this basis:

Dummy variable for correction of the gross income response during the interview.

Size of the response correction.
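A minimal sketch of how the two correction variables might be derived from the sequence of values logged for the gross income field. The log layout (ordered `(field, value)` pairs with band codes as values) is hypothetical, and so are the exact definitions: here the dummy is 1 when the last recorded band differs from the first, and the size is the last band minus the first.

```python
def correction_vars(log, field="GROSS"):
    """Return (corrected, size) for one interview.

    log: (field, value) pairs in audit-trail order, values being band codes.
    corrected: 1 if the last recorded band differs from the first, else 0.
    size: change in band codes, last minus first (assumed definition).
    """
    values = [int(v) for f, v in log if f == field]
    if not values:
        raise ValueError(f"field {field!r} was never visited")
    first, last = values[0], values[-1]
    return int(last != first), last - first
```

A respondent who first chose band 4 (a net figure), answered the NET question, and then went back and chose band 6 would get `(1, 2)`; with this definition, a value changed and then restored counts as no correction.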


Key findings and conclusions

1. Regression to the mean occurs in the income reported in the survey as against that obtained from the administrative source. Positive discrepancies were more common among low-income respondents while negative discrepancies were more common among those of high income.

2. A significant negative monotonic relation was found between the income recorded in the administrative file and the measurement error. The relation was stronger for negative discrepancies than for positive ones (elasticities of -0.5 and -0.2, respectively).



Key findings and conclusions (cont.)

3. High-income persons tend to “forget” income that they received from additional jobs, overtime, self-employed income, fringe benefits, and windfall gains such as bonuses and profit-sharing. As a result, they tend to under-report their labor income in the survey relative to the administrative data.

4. In contrast, low-income workers, who hold part-time and/or irregular jobs, tend to report the income they receive in a full month of work, a level that may not be representative of their average income. Consequently, they are likely to over-report their income in the survey relative to the administrative data.

5. This indicates that the factors related to negative discrepancies (at the high end of the income distribution) are different from those associated with positive discrepancies (at the low end of the income distribution).



Key findings and conclusions (cont.)

6. Employees and self-employed should not be pooled into one model due to material differences in the conceptual definition of income, how income is measured, and volatility in income level during the year.

7. The response time to the gross income question was 27% longer among the self-employed than among employees, indicating that the former found the question harder to answer. Among the self-employed, positive discrepancies were almost twice as frequent as negative ones but the average was close to zero.

8. Only a few factors were found to be related to discrepancies among the self-employed: subjective variables such as satisfaction with income and expectations of business progress in the near future.



Key findings and conclusions (cont.)

9. Inserting a question about NET income immediately after the one about GROSS income, as a logical check that lets respondents verify their answers, led 22% of the respondents to go back and check the accuracy of their responses to the gross income question. Returning to the gross income question during the interview and correcting the answer reduced the discrepancies among both employees and the self-employed.

10. When analyzing measurement errors, it is advisable to do so separately for employees and the self-employed and for positive and negative discrepancies, with different factors for each group.
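The recommendation to model each group and each direction separately amounts to splitting the sample into cells before estimation. The sketch below does only that split plus a one-regressor OLS slope of the discrepancy on ln(administrative income). The authors' models also included the socio-demographic, employment, fringe-benefit and audit-trail variables listed in the methodology, which are omitted here, so this is an illustration of the splitting, not a reproduction of the reported elasticities; the record layout and function names are hypothetical.

```python
import math
from collections import defaultdict


def ols_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def slopes_by_cell(records):
    """records: (group, admin_income, discrepancy_pct, label) tuples, with
    label in {'negative', 'none', 'positive'}.

    Returns one slope of the discrepancy on ln(admin income) for each
    (group, label) cell; 'no error' cases and cells whose incomes do not
    vary are skipped.
    """
    cells = defaultdict(lambda: ([], []))
    for group, income, disc, label in records:
        if label == "none":
            continue
        xs, ys = cells[(group, label)]
        xs.append(math.log(income))
        ys.append(disc)
    return {cell: ols_slope(xs, ys)
            for cell, (xs, ys) in cells.items() if len(set(xs)) > 1}
```

Keeping the cells separate lets the slope differ by direction of error, which a pooled model would average away.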



Thank you