on measuring tax evasion

Journal of Economic Psychology 13 (1992) 545-567 North-Holland

On measuring tax evasion

Henk Elffers a, Henry S.J. Robben b and Dick J. Hessing a
a Erasmus University Rotterdam, Rotterdam, The Netherlands
b University of Tilburg, Tilburg, The Netherlands

Received January 20, 1992; accepted May 1X, 1992

This study reports on the measurement problem in studying tax evasion behaviour of individuals. The three most frequently used methods in researching tax evasion (self-reports, officers’ classification and experimental methods) are presented. Having observed a lack of association between self-report evasion behaviour and officers’ classifications in a previous study (Elffers, Weigel and Hessing 1987), the authors report an empirical study in which the three measures were used on one and the same sample of taxpayers. Not only was the lack of association between self-reported behaviour and officers’ classifications replicated but evasion in the experiment did not correlate with either of these. The authors conclude that tax evasion consists of at least three conceptually independent aspects that need to be assessed by three independent measures. Consequences for future research on tax evasion are discussed.

1. Introduction

In the last decade the academic community has displayed a keen interest in researching fiscal noncompliance and tax evasion. Both the practical relevance, vis-a-vis the assumed immense tax gap, and the theoretical interest in rule transgression behaviour in general have been stated as reasons for the popularity of the theme. The predominant problem in tax evasion research is measurement: the difficulties in assessing tax behaviour are formidable. In 1987 the Journal of Economic Psychology published an empirical study on tax evasion measurement (Elffers, Weigel and Hessing, 1987; we refer to this study as EWH). The present article discusses progress in the measurement problem since EWH. We concentrate on the measurement aspects of a study designed for developing and testing a model for taxpaying behaviour. That study among taxpayers and tax inspectors was carried out in the Netherlands in the late ’80s, and is extensively reported in Robben (1991) and Elffers (1991). They also give an overview of the relevant literature. Other literature reviews are given by Roth et al. (1989) and Long and Swingen (1991).

Noncompliance is understood here as filing a tax return which deviates from what the law requires, given the situation of the taxpayer. Evasion is understood as wilful noncompliance. Basically, three approaches are available for assessing individuals’ fiscal status as tax compliers or noncompliers. The first two approaches are taxpayer-oriented. The most popular method uses self-reports, where taxpayers are asked to report their filing behaviour. Less common are experimental methods, in which the taxpayer is studied while performing tasks in an experimental setting. The other approach is return-oriented, where taxpayers’ behaviour is assessed by means of classifying their returns as displaying (non-)compliance or evasion. The dominant method here is the routine assessment of a tax officer processing a tax return, which establishes whether or not the return is acceptable, or whether it is challenged because the declared income is not in accordance with legal requirements. This method yields a judgement on compliance, not on evasion. We shall denote this type of measure here as officers’ classification.

Correspondence to: H. Elffers, Erasmus Centre for Sociolegal Tax Research, Erasmus University Rotterdam, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands.

0167-4870/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved

Though the literature shows awareness of the problem of distortion of self-reports on rule transgressive behaviour, seeking self-reports has been the dominant strategy in tax-evasion research. This is to some extent due to the cumbersomeness of obtaining tax inspectors’ judgements; an approach for which legal obstacles on confidentiality of the data concerned abound. Experimental methods are relatively new in this field and until now less frequently applied.

The remainder of the article is organised as follows. In section 2.1 we present a review of where EWH has left us. Section 2.2 formulates the need to further research the quality of officers’ classification of (non)compliance, and section 2.3 looks into the need for further investigating the validity of self-reports. Section 2.4 discusses the possible value of incorporating experimental methods in a research design addressing tax evasion. Section 2.5 gives an overview of the research goals of our empirical study presented in section 3. Section 4 concludes with a discussion on the relevance of the empirical study for tax-evasion research in general.

2. Tax-evasion measurement in the EWH-study

2.1. Self-report versus officers’ classification

EWH was the first article which directly addressed the issue of the relationship between officers’ classification and self-reports of taxpayers. That paper demonstrated a strategy capable of documenting the occurrence of an individual’s tax-evasion behaviour as seen by the tax inspectorate from tax returns, and combining it with the self-reports of the taxpayers concerned, while protecting individual anonymity. The results of combining these radically different ways of assessing tax evasion were astonishing: a zero correlation between the two measures emerged and, moreover, both measures seemed to relate to different sets of attitudinal and personal characteristics of the taxpayers. Of course, no perfect correlation could be expected, as a result of such factors as differences in knowledge levels, perspectives and purposes between tax officers and taxpayers. However, although rather low correlation was to be expected, a zero correlation was indeed a surprise. After all, EWH believed that the various measures were targeted at one common core concept of tax evasion. The authors discussed various possibilities with respect to the divergence of the two measures, and concluded that in this particular study, due to the contrast group design used, more trust should be placed in the officially documented behaviour. They classified people whose self-reports differed from officers’ reports as boasters and repudiators, and they argued that the self-reports differed from the documented behaviour because of self-presentational concerns and lack of awareness of the behaviour of the respondents. Partial replications of these results were obtained by Kinsey (1988) and by Webley et al. (1991).

Before dealing with an interpretation of this vanishing correlation between the official judgement and the individual self-report where a substantial relationship was expected, the possibility must be considered that this result is an artifact of measurement procedures. We see three possibilities:

(i) that the officers’ classification involves large measurement errors. If this is the case, we have to explore the possibility of improving that measure;

(ii) that the self-report measure involves large measurement errors. We should in that case try and improve the self-reports;

(iii) that the correlation observed is in reality not zero, but only relatively small. If improving upon these measures themselves is difficult, we may try to solve the problem of assessing evasion by adding more indicators which tap the same dimension, thus finding a way of identifying a common factor behind the different measures.

The present article considers these three possibilities, and demonstrates, from a new empirical study, that none of them can account for, nor can increase, the low correlation between self-reports and officers’ judgements. The implications for the evasion measurement problem that this observation yields will be discussed.

2.2. Restrictions on the validity of officers’ classification

To what extent does the officers’ judgement represent a good measure of tax evasion? Apart from the practical objection that gaining the collaboration of the tax authorities for this type of assessment is difficult, we observe that the design of the EWH-study displays too many restrictions for claiming generality. The main restriction lies in the artificial selection of cases by means of a contrast group design: the study incorporated only clear cases. Tax inspectors were asked to review individual tax returns independently. An inspector classified a taxpayer as an evader if in two consecutive years a correction had been applied of at least f500 (approximately $250), and if both of these corrections were judged as clearly reproachable attempts to underreport taxable income. The inspectors, on the other hand, had to classify a case as fully compliant if no correction whatsoever had been applied for two consecutive years. All other cases were set aside: these comprised, for example, cases for which corrections were indicated in only one year or only small corrections were indicated, and cases in which the misrepresentation of income could not unequivocally be classified as reproachable (such misrepresentations may have resulted from inadvertent errors). Only those cases were retained that were judged to be evasive or fully compliant by two independently working tax officials.
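The selection rule described above can be written out as a small decision procedure. The following is a minimal sketch, not the procedure EWH actually implemented: the names `YearReview`, `classify` and `retain` are hypothetical, and only the f500 threshold, the two-year requirement and the two-official agreement rule are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD_GUILDERS = 500  # minimum correction per year, as stated in the text


@dataclass
class YearReview:
    """One official's review of one tax year (hypothetical structure)."""
    correction: float   # guilder amount of the correction, 0 if none
    reproachable: bool  # judged a clearly reproachable underreporting attempt


def classify(year1: YearReview, year2: YearReview) -> Optional[str]:
    """Return 'evader', 'compliant', or None when the case is set aside."""
    years = (year1, year2)
    if all(y.correction >= THRESHOLD_GUILDERS and y.reproachable for y in years):
        return "evader"
    if all(y.correction == 0 for y in years):
        return "compliant"
    return None  # single-year, small, or ambiguous corrections


def retain(verdict_a: Optional[str], verdict_b: Optional[str]) -> Optional[str]:
    """Keep a case only if two independent officials reach the same clear verdict."""
    if verdict_a is not None and verdict_a == verdict_b:
        return verdict_a
    return None
```

Written this way, it is easy to see how much of the population falls through to the "set aside" branch, which is the source of the limited generality discussed next.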

Of course, this clear-cases selection mechanism was used by the authors for enhancing the power of their research. A contrast study works as a magnifying glass: rather small differences become visible, because the unclear cases that clutter and confound the image are removed. This quality of the contrast group method is gained at the cost of the generalisability of the results. EWH only studied clear cases, and the result, therefore, is only relevant for clear cases. For EWH this means that only cases for which the judgement of the tax inspector is rather clear and relatively indisputable have been studied. The conclusions of the study lack generality, and we reconsider them by addressing them in a more general, less selective design since we firmly believe that there is a general model for tax evasion behaviour that applies to all categories of respondents, including clear as well as other cases. A tentative conclusion that officers’ classification is valid could be considerably affected by the selective design, and it remains to be seen whether it can be substantiated in a more natural, unselective design in which less clear, in-between cases also have to be classified by the inspectors.

In itself, the need for officers’ classification of more representative cases creates new problems. The tax evasion literature rather generally expresses trust in officers’ classification of tax behaviour. Though most studies have used self-reports, it seems fair to say that this has been largely the effect of faute de mieux. The impressive report of the NAS-panel on tax compliance research (Roth et al. 1989; Roth and Scholz 1989) neatly sums up the general idea that if and when possible, officers’ classification should be used, as it is the best approximation of a valid assessment of tax evasion. We have to be aware, however, that this positive opinion about the quality of officers’ classification ignores the difference between noncompliance and evasion. The routine assessment by a tax officer of the status of a tax return as either being in need of correction or being correct does not address the issue of evasion, but only that of noncompliance. In most cases, and in the Dutch tax system in fact in virtually all cases, it is immaterial for tax officers as to whether or not an established incorrect declaration of taxable income should be classified as wilful, hence evasion, or as accidental: in both cases they will confine themselves to correcting the declared income, and send an additional tax bill or refund. No fines will be applied in the majority of cases, therefore no questions about intent need to be answered. It is well documented that a considerable fraction of noncompliance should not be seen as evasion, and the identification of officers’ judgement about noncompliance as a measure of evasion has been severely criticised (Long and Swingen 1991). Therefore, it is conceivable that in such cases low correlations should be attributed to the fact that self-report measures address evasion, while this type of officers’ assessments are targeted at noncompliance.

For assessing evasion by means of a return-oriented method, it is paramount that the officers qualify their noncompliance statements as to assumed intent, yielding an evasion judgement. Where EWH argued that in contrasting clear cases the evasion judgement of tax officers is a useful measurement method, we do not have any evidence that this is also true for the in-between cases. We expect that it is much more difficult for the officers to assess intent in these cases.

The study reported in section 3 addresses the reliability and validity of officers’ classification of evasion in a representative sample of tax returns.

2.3. Restrictions on the validity of self-reports

Problems with self-reports on rule-transgressing behaviour are well documented in the literature (see e.g. Zimring and Hawkins 1973: 321-327; Wilson and Herrnstein 1985: 37-38; Hessing et al. 1988). In the special case of comparing self-reports with officers’ classification of evasion, EWH in fact argue that it is the low quality of self-reports that best explains the discrepancy observed. They argue that self-presentational concerns, awareness of behaviour and memory problems can account for many of the differences observed between self-reports and the - in this particular study rather indisputable - official report. The conclusion that self-reports are invalid is not so much affected by the use of the special contrast design of EWH. For, if a method does not perform well in a particular area, logically it cannot perform well over the whole range. It is, of course, possible that this special area is not representative, and that by coincidence we have seen the method’s worst performance in this particular case. Nevertheless, we feel a need for replicating these results. We will adapt the self-report measure. The characteristics of the situation of self-reporting will be altered in a way that minimises or redirects the self-presentational concern of the respondent, as well as enhances his or her memory of exactly what happened with the tax return in question. The study described in section 3 has adapted the self-report method in this way.

2.4. More indicators? Experimental methods

When contemplating additional indicators of evasion and noncompliance, we need indicators that are tapping the same dimension, without being mere echoes of the measures already present. For this purpose we included experimental measures of tax evasion. Two books on experimental measures of tax evasion have recently been published. The first one, Webley et al. (1991), is a monograph on experimental methods in tax-evasion research in general. The second one, Robben (1991), treats experimental methods as a way of improving the existing set of measures in order to approach a solution of the measurement problem. The present treatment draws heavily on Robben.

The general idea of an experimental approach to study taxpaying behaviour is simple. Ideally we would manipulate the environment in which people perform the behaviour under study, observe that behaviour, and then draw appropriate conclusions. The great advantage of this method is that we can isolate the variables of interest and study the effects of our manipulations knowing that these are not the result of extraneous factors. No other method is as good at untangling the causal structure of a phenomenon. Set against these advantages are some disadvantages. The artificial nature of experiments is probably the commonest criticism and this usually leads to comments about their lack of external validity. People are right to worry about external validity but not because of artificiality. Artificiality per se does not matter as long as the important variables have been operationalised so that they engage the same psychological processes as their real-world counterparts. It is, of course, no easy task to design experiments satisfying these demands.

Friedland et al. (1978) pioneered the use of experiments in this area. They carried out a small-scale study in which student subjects were given tax tables and then received a monthly ‘income’ which they had to declare so as to maximise net income. The experiment had a 2 x 2 design with tax rates and frequency of auditing/severity of fine as independent variables. A number of studies have used the same experimental set-up (Spicer and Becker 1980; Friedland 1982; Spicer and Thomas 1982; Spicer and Hero 1985). Later experiments tried to enhance realism by introducing more complex tasks for the tax-declaring subjects (progressive tax rates; transfer payments; possibility of asking for outside advice) and some experiments paid the subjects real money that was subsequently used for paying taxes (Güth and Mackscheidt 1985; Baldry 1986, 1987; Becker et al. 1987).

A general criticism raised against these experiments by Webley and Halstead (1986) was that the purpose of the experiment is too obvious for the subjects, who can only manipulate what is happening by evading taxes. Webley devised a new type of experiment (Webley et al. 1985) in which the tax declaration purpose is concealed. Subjects are asked to act as a businessman (landscape gardener, small shop keeper) and their small business is simulated on a computer. They have to make decisions in several rounds (months, quarters, years) about a number of business tasks, such as pricing, advertising, hiring personnel, etc. The computer program presents these choices to the subject, and reports what sales and profits are made when the subject has made his choices. One of the tasks is filling out tax declarations at the end of each simulated year. Subjects may be audited and fined when underreporting taxable income. A similar series of experiments has been devised and reported by Webley et al. (1993).

The advantage of the Webley-type experiment is its experimental realism. Participants become involved in it and take it seriously; it tries to evoke the same processes it models. Moreover, subjects do not experience the task as a tax evasion situation, but as a business task. Hence, we suppose that problems of self-presentation, if at all present, will not be focused at evasion. Experimental measures may be less prone to distortion by self-presentational concerns than other measures. A definite disadvantage of experiments as a means to measure evasion tendencies is that we place subjects in an environment that may be unknown to them. After all, few people are small shop keepers. The elicited behaviour may be partly hypothetical in that respect. The experiment treated business incomes as if they were personal, i.e. individual, incomes. This procedure reflects the situation in the Netherlands, where many of the small businesses are taxed under individual income tax laws. Although differences exist between the fiscal treatment of individual and corporate incomes, they are of lesser importance for the present investigation.

The study reported in section 3 incorporates this type of experimental measurement of evasive behaviour.

2.5. The aims of the empirical study

An empirical study has been designed and executed to research the following questions:

- Is the lack of relationship between self-report and officers’ classification an artifact of the contrast design of EWH, or can it be replicated in a representative design?

- How reliably and with what validity can we assess evasion by means of officers’ classification of returns as to evasion?

- How reliably and with what validity can we assess evasion by means of an adapted self-report method?

- How reliably and with what validity can we assess evasion by means of an experimental measure?

- Is it possible to use the three methods mentioned to yield a combined valid and reliable measure, in order to identify a common factor?

Answers to these questions make it possible to consider to what extent we can exclude an explanation of the EWH zero correlation in terms of low measurement quality.

3. Design and results of the empirical study

To address the research questions formulated above, an empirical study was conducted to collect and combine the three types of data on an individual level: officers’ classification (section 3.1), self-report (section 3.2), and experimental classification (section 3.3). The study was done in 1988/1989 in the Netherlands, with the full cooperation of the Dutch Tax Service, and with elaborate safeguards for confidentiality of the data. The study reported was designed for testing the tax evasion behaviour model of Weigel et al. (1987), a model that explains evasion in terms of situational and personal instigations and constraints. In this paper, however, we will concentrate on the dependent variable of that model, tax evasion behaviour.

Section 3.4 discusses the mutual relationships of the resulting measures and the meaning of the results for our research questions.

3.1. Results of officers’ classification of filed tax returns

The Dutch Tax Service provided data on 918 individual income tax returns concerning nonbusiness income over the fiscal year 1986. The sample is believed to be representative for nonbusiness tax returns (Elffers 1991; Robben 1991).

Three sets of measures are relevant to the present discussion: actual assessment, reassessment and expert assessment. The first set concerns the actual outcome of the typical tax assessment process as executed during the regular work of tax inspectors. Each return is classified as to its noncompliance status: unchallenged, corrected, or corrigible (i.e. in principle correctable, but the inspector refrains from actually applying the correction). Also noted is the guilder amount of corrections. The tax officer also described the corrections based on assumed intent to evade: surely not, uncertain, sure (evasion status).

The second set concerns the independent reassessment by a second tax officer of a clean copy of the returns filed. The officers were restrained to some extent in their operations, because they were not allowed to contact the taxpayer or tax consultant to ask questions or obtain proof. Within this limitation, the reassessment is a replication of the original assessment. The reassessment was done on a sample of 413 files, a stratified sample from the original 918, which overrepresented corrected files. The analysis reweighs for disproportionate sampling probabilities. The analysis presents only the 166 cases in which the reassessor was of the same rank as the original assessor. Reassessing officers noted down the same data as the assessors: noncompliance status, guilder amount of corrections, and evasion status.
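The text does not spell out how the reweighting was carried out. One common approach, shown here purely as an illustrative sketch, weights each stratum by the inverse of its sampling fraction. Only the totals 918 and 413 come from the text; the split into strata and the disagreement counts below are hypothetical numbers chosen for illustration, as are the function names.

```python
def stratum_weights(population: dict, sample: dict) -> dict:
    """Inverse sampling fraction N_h / n_h for every stratum h."""
    return {h: population[h] / sample[h] for h in population}


def weighted_proportion(hits: dict, sample: dict, weights: dict) -> float:
    """Weighted share of cases with some property, estimated for the population."""
    weighted_hits = sum(weights[h] * hits[h] for h in sample)
    weighted_total = sum(weights[h] * sample[h] for h in sample)
    return weighted_hits / weighted_total


# Hypothetical stratum sizes; only the totals (918 and 413) come from the text.
population = {"corrected": 300, "uncorrected": 618}
sample = {"corrected": 250, "uncorrected": 163}
# Hypothetical counts of reassessments disagreeing with the actual assessment.
disagreements = {"corrected": 100, "uncorrected": 20}

weights = stratum_weights(population, sample)
estimate = weighted_proportion(disagreements, sample, weights)
naive = sum(disagreements.values()) / sum(sample.values())
```

With these made-up numbers the unweighted share overstates disagreement, because the overrepresented corrected files disagree more often; the weighted estimate restores the population proportions.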

The third set concerns a third assessment of the same tax return by a team of three expert tax officers (expert assessment). They had access to the work of actual assessment and reassessment, and had to discuss and reach consensus about how the return should be handled. Again, they were not allowed to contact the taxpayer. Within this limitation their assessment approximates the best the tax service can provide for. Data are on the same 413 cases already reassessed, and again noncompliance status, guilder amount and evasion status were noted.

By comparing actual assessment and reassessment we estimate the reliability of routine officers’ classification; by comparing actual and expert assessment its validity is estimated. We judge the extent to which both assessments agree by computing Cohen’s kappa (Cohen 1960). Perfect agreement between two judgements is reflected in a kappa of 1, while judgements that agree no more often than chance would predict display a kappa of 0. A kappa of at least 0.60 is considered satisfactory. There was, in fact, considerable disagreement between the different kinds of assessments of noncompliance status (actual assessment and reassessment, Cohen’s kappa = 0.35; actual assessment and expert assessment, kappa = 0.31; reassessment and expert assessment, kappa = 0.32).
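For readers unfamiliar with the statistic, Cohen's kappa is defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from the two raters' marginal frequencies. The sketch below computes it for two lists of category labels; the function name and the example ratings are illustrative and are not data from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who classified the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed
    # over categories (Counter returns 0 for labels a rater never used).
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Illustrative labels only, using the three noncompliance categories.
actual = ["unchallenged", "corrected", "corrigible", "unchallenged"]
second = ["unchallenged", "corrected", "corrigible", "unchallenged"]
kappa_identical = cohens_kappa(actual, second)  # identical ratings

# Two raters whose agreement equals what their marginals predict by chance.
kappa_chance = cohens_kappa(["x", "x", "y", "y"], ["x", "y", "x", "y"])
```

Because kappa corrects for chance agreement, values such as 0.35 indicate agreement well short of the 0.60 benchmark even when the raw percentage of matching classifications looks moderate.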

The differences in the guilder amount of corrections are quite large: between actual and reassessment 30% of the cases differed by at least f500, and 6% of cases differed by even more than f5000 in assessed income. Between actual and expert assessment these figures are 43% and 12%.

There was even less agreement about assessments of evasion status (actual assessment and reassessment, Cohen’s kappa = 0.32; actual assessment and expert assessment, kappa = 0.24; reassessment and expert assessment, kappa = 0.21).

As part of the irreproducibility of the evasion status derives from the fact that the noncompliance status turned out to be irreproducible, it is worthwhile to observe the reproducibility of the evasion status among cases where both assessments had resulted in corrections. We observe that Cohen’s kappa between actual assessment and expert assessment among those cases is only 0.03 (n = 99). (For computing a comparable figure involving reassessment only 25 returns are available; therefore, we omit it.)

The inescapable conclusion of this part of our study is that neither reliability nor validity of the officers’ classification is adequate for research purposes. Measuring tax evasion on the basis of the outcome of the routine assessment of tax return is not warranted. This holds for noncompliance status, for guilder amount of corrections, as well as for evasion status.

For the last of our research questions, the possibility of identifying a common factor in several measures, this conclusion is a setback. We intend to use for that purpose, therefore, the expert assessment, i.e. the result of a discussion of three expert tax officers who had access to what two independently working officers had already indicated as the right classification of a return. Though a direct evaluation of its reproducibility is not at hand, we consider it to be the best available.

Not all tax returns in the final sample received the expert assessment, so subsequent analyses would have suffered from a large number of missing values had this assessment been used on its own. To arrive at a value of the evasion status for all respondents, a new variable was created. For respondents for whom the evasion status from the expert assessment was present, this score was used; for the remaining participants the next best score was used, namely the judgement of evasion status of the actual assessor (for an analysis of the differences between these scores see Elffers 1991). A zero on this measure indicated no assumed intent, a score of one meant that the assessor was sure of an intentional misrepresentation.
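The construction of this combined variable amounts to a fallback rule. A minimal sketch follows, assuming that a missing assessment is represented by `None`; the function name is hypothetical, and the text does not say how the intermediate "uncertain" category was scored, so the sketch simply passes scores through unchanged.

```python
from typing import Optional


def combined_evasion_status(expert: Optional[float],
                            actual: Optional[float]) -> Optional[float]:
    """Use the expert score when present, else the actual assessor's score."""
    # Compare against None explicitly: a score of 0 (no assumed intent)
    # is a valid expert judgement and must not trigger the fallback.
    return expert if expert is not None else actual
```

The explicit `is not None` test matters here: a truthiness check would wrongly replace an expert score of 0 with the actual assessor's judgement.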

3.2. Results of self-reports by individual taxpayers

Gathering potentially sensitive fiscal information demands procedures that guarantee confidentiality of that information. To collect these data, a version of the procedures reported by Elffers et al. (1987) and Robben et al. (1989) was used. This procedure guaranteed that none of the parties involved could, at any point in time, identify the source of a given data set. The present methodology ensured that only in the analysis phase did the investigators have a complete, yet depersonalized, data set. In total, 840 of the taxpayers whose returns form the data of the study were approached by letter. It was not possible to contact the remaining 78 taxpayers of the original sample. The sample was randomly divided in half. One subsample of 415 tax returns was assigned to the so-called confrontation study. The other 425 cases were assigned to the experimental study. The assignment was stratified according to correction status. Both groups were asked to receive an interviewer with a questionnaire, identical for both groups. The experimental group, however, were asked to take part in an experimental study (section 3.3) before completing the questionnaire, while the confrontation group had to complete an additional confrontation questionnaire afterwards. This confrontation questionnaire prompted respondents about their reaction to a written reminder by the tax inspectorate on what corrections, if any, the inspector had applied for 1986. The covering letter explicitly asked for consent to present this data in a sealed envelope. It is not surprising that the prospect of being interrogated about one’s past tax filing behaviour was not inviting: ultimately, 83 taxpayers took part in the confrontation study, a response rate of 20%.

Those to be approached for cooperation in the experiment received a request to participate in a survey on the Dutch economy. At that moment, prospective subjects did not know the real goal of the investigation. Eventually, 147 people indicated their willingness to participate, a response rate of 35%. For the total sample of participants, the response rate was just over 27%.
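The three response rates follow directly from the counts reported in this section (415 and 425 approached, 83 and 147 participating). The short check below re-derives them; the helper name is illustrative.

```python
def response_rate(participants: int, approached: int) -> float:
    """Share of approached taxpayers who took part."""
    return participants / approached


confrontation = response_rate(83, 415)          # confrontation study
experiment = response_rate(147, 425)            # experimental study
overall = response_rate(83 + 147, 415 + 425)    # both subsamples together

# Rounded to whole percentages, as reported in the text.
rounded = [round(100 * r) for r in (confrontation, experiment, overall)]
```

The overall figure is 230 participants out of 840 approached, which is where "just over 27%" comes from.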

During the personal interview, a questionnaire was administered that contained the personal instigation and personal constraint concepts as specified by the theoretical model (Weigel et al. 1987), and questions concentrating on the respondents’ past filing behaviour. A brief description of some of the measures is given below.

The dependent variables are measures of self-reported behaviour. During the interview, respondents were asked in a variety of ways about whether, how and when they had underdeclared income, overstated deductions and filed incorrectly. For the present analyses, only the answers to the question regarding the 1986 income tax returns were used. This question reads: ‘Did you try to underdeclare your income or overstate your deductions in 1986?’. The response categories were ‘Yes’, ‘No’, and ‘Don’t know’. A negative response received a score of ‘1’, and an affirmative answer a score of ‘3’, with the ‘Don’t know’ category in the middle position. There were also questions about having received a correction or not.

The confrontation group considered several additional questions. These were presented after the main questionnaire, well separated in time from the earlier self-report questions. After having been asked to open the sealed envelope, they answered questions on whether a correction was mentioned in the letter, whether it was remembered by the respondent, and agreed upon, and whether the respondent was prepared to qualify that correction as pertaining to an attempt at wilful underreporting. Respondents were also asked whether the inspector had missed other instances of underreporting in that return. The rationale of the confrontation method is twofold: first, it attempts to overcome memory problems associated with self-report; secondly, we hope that a self-presentational concern that possibly aligned self-report with attitudes is overridden by a desire to uphold a coherent image before the interviewer by acknowledging the content of the envelope and commenting upon its meaning.

The independent variables employed in the study were as follows: (1) Personal instigation measures: These measures contained variables such as Dissatisfaction with tax authorities and Comprehensibility of tax-related information and rules. Three measures were included to assess aspects of individuals’ personality orientation: (a) Competitiveness, (b) Alienation, and (c) Tolerance of deviance. (2) Personal constraint measures: These measures contained three types of constraint measures: fear of punishment, social controls, and personal controls. Three measures were used to assess fear of punishment: (a) Perceived certainty of punishment, (b) Perceived severity of punishment, and (c) Perceived risk index. Two measures assessed the social controls: (a) Perceived frequency of tax evasion, and (b) Anticipated social support. Two attitude measures were included to assess personal controls: (a) Underreporting income, and (b) False deductions.

In addition, the questionnaire contained questions on other psychological concepts. These measures and the corresponding analyses with regard to the theoretical model are further described in Elffers (1991) and Robben (1991).

The resulting numbers of people reporting a correction or a case of evasion are given in table 1. We observe that self-reported evasion and self-reported correction status are minimally associated.

The results of the confrontation method are interesting. The number of cases for analysis is, however, rather low (only 83 consenting respondents remained and of these only 52 had been corrected). The method did succeed in decreasing the discrepancy between the self-report on being corrected and what we know from tax office files. Only in 10 cases did the respondents not acknowledge the content of the envelope, resulting in a correlation of 0.78 between self-report and officials’ report on being corrected. So we succeeded in focusing the respondent on the intended event. Of the 57 respondents who stated during the ordinary self-report that they did not evade, only one changed his mind when confronted. On the other hand, of the 12 people who admitted evasion when self-reporting, 10 denied evasion when confronted. This may indicate that a lot of the admitted evasion does not refer to the special correction mentioned in the envelope. In addition, 5 people stated, when asked, that the tax inspector had not found all there was to find. Ordinary self-report on evasion and the confrontational variant have a low correlation (r = 0.27). The significance of this observation is not clear before studying the relationship with other measures (section 3.4). In any case, we lack sufficient data to decide definitely on the usefulness of the method.

Table 1
Self-reports on being corrected and on having evaded in 1986.

Self-report on        Self-report on evasion        Total
being corrected       No evasion     Evasion
No correction         139 (68%)      25 (12%)       164 (80%)
Correction             31 (15%)      10 (5%)         41 (20%)
Total                 170 (83%)      35 (17%)       205 (100%)

Note: Percentages are with respect to the total number of 205. The product moment correlation coefficient between both measures is 0.20.

3.3. Result of experimental measurement

The experimental simulation presented to half of the respondents constitutes the third measurement study by which information on an individual level was collected. The present simulation is very similar to the one developed by Robben et al. (1990). The most important differences pertained to procedural matters which facilitated subjects’ participation in the simulation. The subjects’ task in this simulation involved running a small business, an assignment that required them to consider a number of recurring financial decisions in each of the two years that they were in business. Of interest here were subjects’ decisions on filing a tax return for each year. The tax return included a report of income earned and expenses incurred in the simulation.

Of 147 prospective subjects (see section 3.2), 10 did not participate. They indicated that they had no time to complete the investigation and one of them had already left the country. Some difficulties with experimental procedures limited the usefulness of an additional eight data sets. Nine data sets could not be classified for fiscal undercompliance, and were not included in the analyses. Only those data sets were used in the subsequent analysis that were generated by the participants themselves without help or guidance from the interviewers. This left 120 data sets for further analysis.

The experiment contained two independent variables - withholding status and opportunity to evade taxes. These variables stand out in the theoretical model (Weigel et al. 1987) as prime causes of tax evasion behaviour. Robben (1991) provides an in-depth treatment of the relevance of these variables to tax evasion behaviour. Withholding status was manipulated by providing half of the subjects with a situation requiring them to make a substantial additional payment to the revenue authorities; the other half could expect a considerable refund of the same amount from the authorities. Opportunity to evade taxes was systematically varied by telling half of the subjects that the chance of being audited was 1 in 100, that their income would be difficult to verify as it derived largely from cash payments, and that the penalties for evasion were not harsh. Contrasting messages were presented to the other participants. Robben (1991) presents more details of this type of simulation, as well as fuller treatment of the present simulation.

Three measures were used to assess the occurrence and extent of tax-cheating behaviour within the simulation. The first of these was the Occurrence of undercompliance index. During the simulation, tax declaration forms were completed twice by each subject. If a subject underreported income or exaggerated business expenses on either tax return, that subject was assigned a score of ‘1’ on the occurrence of undercompliance index. Subjects who did not underreport income or overstate deductible expenses on either tax return were assigned a score of ‘0’. The second measure was the Frequency of undercompliance index. This measure was calculated by summing the instances of income and deduction misrepresentation that occurred on the two tax returns subjects filed. That is, subjects were assigned a score of ‘1’ for each tax return on which they underreported their business income. Similarly, subjects received a score of ‘1’ for each tax return on which they exaggerated any deductible business expense. Hence, scores on this index varied from 0 to 4, with higher scores indicating more frequent undercompliance behaviour exhibited either by underreporting income or declaring unwarranted deductions. The third measure was the Amount unreported index. Here guilder amounts of underreported income for both tax returns were summed. Similarly, the guilder amounts of deductions claimed above a subject’s legitimate business expenses were summed for the two tax returns filed. These two sums, in turn, were added together to yield a figure reflecting the total guilder amount of undercompliance.
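The three indices can be illustrated with a minimal sketch. It is not the study's own scoring procedure: the `TaxReturn` record, its field names and the helper functions are hypothetical, and the sketch assumes that over-reported income or under-claimed expenses count as zero rather than offsetting undercompliance elsewhere, a point the text does not address.

```python
from dataclasses import dataclass


@dataclass
class TaxReturn:
    """One simulated tax return (hypothetical field names, guilders)."""
    true_income: float
    reported_income: float
    legitimate_expenses: float
    claimed_expenses: float


def income_shortfall(r):
    # Guilders of income left undeclared; assumed never negative.
    return max(0.0, r.true_income - r.reported_income)


def excess_deductions(r):
    # Guilders claimed above legitimate expenses; assumed never negative.
    return max(0.0, r.claimed_expenses - r.legitimate_expenses)


def undercompliance_indices(returns):
    """Return (occurrence, frequency, amount) for one subject's returns."""
    # Frequency: one point per return with underreported income plus one
    # point per return with exaggerated expenses (0-4 for two returns).
    frequency = sum(
        (income_shortfall(r) > 0) + (excess_deductions(r) > 0) for r in returns
    )
    # Amount: total guilders of undeclared income and excess deductions.
    amount = sum(income_shortfall(r) + excess_deductions(r) for r in returns)
    # Occurrence: 1 if any misrepresentation occurred on either return.
    occurrence = 1 if frequency > 0 else 0
    return occurrence, frequency, amount
```

For a subject with two returns, the frequency component ranges from 0 to 4, matching the range stated for the index.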

As part of the initial set-up interviewers visited participants at home. The simulation study was done before the core questionnaire was given to the respondents.

A number of questions in the postexperimental questionnaire assessed the subjects’ subjective experiences of the experimental conditions they had confronted. In general, subjects perceived the experimental manipulations as intended. In the overwithholding situation, the refund was seen as a gain; in the underwithholding condition, the additional payment created a sense of loss. Similarly, the opportunity conditions evoked feelings of having little or much opportunity to successfully evade taxes without being detected. All subjects expressed satisfaction with their performance in the simulation. Most of the participants found the task interesting and indicated that they found the simulation to be a realistic representation of running a small store. Only one subject responded to an open-ended question that he/she thought that tax evasion was one of the study’s main aims; 11 other participants indicated that they thought that taxation was involved in the investigation. These figures represent, we believe, the relative success with which the original purpose of the study remained hidden for the subjects.

The three quantitative measures obtained in the simulation, occurrence of noncompliance, frequency of noncompliance and the money amount were subjected to 2 x 2 analyses of variance. The opportunity to evade taxes factor yielded significant effects for the occurrence (F(1, 116) = 14.9, p < 0.01, two-tailed), frequency (F(1, 116) = 13.3, p < 0.01, two-tailed) and money indices (F(1, 116) = 7.5, p < 0.05, two-tailed). People in the high-opportunity condition evaded more and to a larger extent than those in the low-opportunity condition. Withholding status had little effect on the frequency index (F(1, 116) = 4.0, p < 0.10, two-tailed). The overwithholding situation led to less frequent evasions than the underwithholding situation. There were no significant interactions between the predictor variables. The results are straightforward: the prospect of financial loss and increased opportunity to cheat without being detected both produced more frequent and more extensive noncompliance. This accords with what was hypothesized (Robben 1991), and is interpreted as support for the predictive validity of the experimental measure.
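The error degrees of freedom in these F ratios follow from the design: 120 data sets spread over 2 x 2 = 4 cells leave 120 - 4 = 116. The sketch below shows how the F ratios of a 2 x 2 between-subjects analysis of variance are formed. It assumes a balanced design with equal cell sizes, which the text does not state; the function name and data layout are hypothetical, and the toy scores are made up for illustration and are not study data.

```python
from statistics import mean


def two_by_two_anova(cells):
    """F ratios for a balanced 2 x 2 between-subjects design.

    `cells` maps (a, b), with a and b in {0, 1}, to equal-length lists
    of scores. Each effect has 1 degree of freedom, so its mean square
    equals its sum of squares.
    """
    n = len(cells[(0, 0)])
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch handles balanced designs only")
    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cells[(a, 0)] + cells[(a, 1)]) for a in (0, 1)}
    b_mean = {b: mean(cells[(0, b)] + cells[(1, b)]) for b in (0, 1)}
    # Main effects: each marginal mean is based on 2 * n scores.
    ss_a = 2 * n * sum((a_mean[a] - grand) ** 2 for a in (0, 1))
    ss_b = 2 * n * sum((b_mean[b] - grand) ** 2 for b in (0, 1))
    # Interaction: between-cell variation not explained by main effects.
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    # Error: variation of scores around their own cell mean.
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    df_err = 4 * (n - 1)
    if df_err == 0 or ss_err == 0:
        raise ValueError("no within-cell variation to estimate error")
    ms_err = ss_err / df_err
    return {
        "F_a": ss_a / ms_err,
        "F_b": ss_b / ms_err,
        "F_ab": ss_ab / ms_err,
        "df_error": df_err,
    }


# Toy data (not from the study): factor a shifts scores, factor b does not.
toy = {(0, 0): [1, 3], (0, 1): [1, 3], (1, 0): [3, 5], (1, 1): [3, 5]}
result = two_by_two_anova(toy)
```

With 30 scores in each of the four cells the function reports 116 error degrees of freedom, the value that appears in the F ratios above. An unbalanced design would need a different partition of the sums of squares.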

Consistent with the focus of the present article and previous work (e.g. Robben et al. 1990), an attempt was made to refocus the dependent variable on tax evasion rather than on noncompliance; this effort capitalized on the strong correlation (r = 0.79, p < 0.001) between scores on the measure assessing the occurrence of undercompliance (EOU) and scores on the measure of self-reported evasion in the experiment. The clearest subset of evaders were subjects who (a) underreported income or exaggerated deductions at least once during the simulation, and (b) also acknowledged their intent to evade on the postexperimental questionnaire. In this way, the 23 subjects who satisfied these criteria were classified as ‘evaders’ and assigned a score of ‘1’ on this tax-cheating index. Another 87 subjects who had both filed accurate returns and indicated in their self-reports that they had intended to do so were classified as ‘nonevaders’ and assigned a score of ‘0’. The resulting index is called the Experimental index of Occurrence of Evasion, EOE. Analysis of variance of EOE yields almost identical results as the analysis in terms of the occurrence of undercompliance index EOU.
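The classification rule behind EOE can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that the postexperimental self-report reduces to a single yes/no statement of intent to evade. Subjects whose behaviour and stated intent disagree are left unclassified, which is consistent with 23 + 87 = 110 of the 120 data sets receiving an EOE score.

```python
def classify_eoe(undercomplied, admitted_intent_to_evade):
    """Score one subject on the EOE index (hypothetical helper).

    Returns 1 for an 'evader' (undercomplied and admitted intent),
    0 for a 'nonevader' (accurate returns and no intent to evade),
    and None when behaviour and self-report disagree.
    """
    if undercomplied and admitted_intent_to_evade:
        return 1
    if not undercomplied and not admitted_intent_to_evade:
        return 0
    return None  # mixed case: excluded from the EOE analyses
```

Returning `None` for the mixed cases keeps them distinguishable from nonevaders, so that later analyses can drop them explicitly.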

3.4. Combining measurement methods

In the previous sections we assessed the individual qualities of officers’ classification, self-report, and the experimental measures. This section examines their mutual relationships using only evasion indices. For officials’ classification we use the experts’ assessment of evasion (when available) and the actual assessors’ classification in the rest of the cases; the resulting measure is called OCE (Officers’ Classification of Evasion). For the self-report we use the ordinary self-report of evasion (SRE) and - when available - the version after confrontation (SREC). For the experimental measure we use the direct experimental result, occurrence of undercompliance (EOU). Because we are trying to assess evasion, this experimental measure is augmented by a mixed experimental/self-report variable, occurrence of undercompliance and admittance of intent (EOE).

These methods yield different frequencies (see table 2) but in fact the differences are not very large. This does not imply that the methods identify substantially the same taxpayers as evaders, as can be seen from table 3, in which the relationships between the measures are presented in terms of correlation coefficients.
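Because each measure is available for a different subsample, every coefficient in table 3 rests on the cases that have both measures. A minimal sketch of such a pairwise product moment correlation is given below; the function name and the use of `None` for a missing measurement are assumptions of this illustration, not a description of the software used in the study.

```python
from math import sqrt


def pairwise_pearson(x, y):
    """Product moment correlation over cases present in both lists.

    `x` and `y` are equal-length lists with None marking a case for
    which the measure is unavailable. Returns (r, n); r is None when
    fewer than two complete pairs exist or when a measure is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / sqrt(sxx * syy), n
```

When two measures share no respondents, the function returns no coefficient and n = 0, which is the situation of the cells in table 3 that pair SREC with the experimental measures.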

We observe that the lack of correlation between evasion measurements as reported in EWH has been replicated. The experimental measures do not display a statistically significant relationship with the self-reports, nor with the officers’ classification. Since we have painstakingly attempted to optimize the reproducibility of the measures involved, we reject the solution of blaming measurement error for the zero correlation. We must now take very seriously the fact that three different ways of approaching the measurement of tax evasion result in nearly uncorrelated indices.

Table 2
Frequencies of evasion measures.

Measure                                              Evasion     No evasion    Total N
Officers’ classification of evasion (OCE)            27 (12%)    193 (88%)     220
Self-reported evasion (SRE)                          35 (17%)    170 (83%)     205
Self-reported evasion after confrontation (SREC)      3 (17%)     80 (83%)      69
Experimental measure of undercompliance (EOU)        31 (26%)     89 (74%)     120
Experimental measure of evasion (EOE)                23 (21%)     87 (79%)     110

Table 3
Correlation matrix of evasion measures based on (subsamples of) the survey group.

Product moment correlation coefficients, with the number of cases in parentheses.

                                                  Officers’         Self-reports                       Experimental measures
                                                  classification
Variable                                          OCE               SRE               SREC             EOU               EOE
Officers’ classification (OCE)                    1.00
Self-reported evasion (SRE)                       0.10 (n = 209)    1.00
Idem, after confrontation (SREC)                  0.16 (n = 69)     0.27 a (n = 69)   1.00
Experimental occurrence undercompliance (EOU)     -0.02 (n = 120)   0.02 (n = 120)    - (n = 0)        1.00
Experimental occurrence evasion (EOE)             -0.06 (n = 110)   0.04 (n = 110)    - (n = 0)        0.79 b (n = 110)  1.00

a Significantly different from zero (5%, one sided); b significantly different from zero (0.1%, one sided).

4. An outlook on future research

We are forced to present here a complex, but simultaneously straightforward, outlook on future research. Tax evasion is a highly complex behavioural act which is fragmented over many different periods in a tax year, based on different detailed aspects of tax obligations, occurring in different spheres of life, and under ever-changing situational, economic, fiscal, social and psychological conditions. Last but not least, it occurs with changing and multifaceted levels of conscious intentions to comply or evade. It is no wonder then that the measurement of tax evasion behaviour is an arduous task for the social scientist.

To address this task, EWH performed a study using a two-method approach in a contrast-group design. The zero correlation observed between both measures of tax evasion directly influenced the present investigation, which sought to elaborate on the EWH study. This was done by improving the two original measurement methods and by introducing a third method, the experimental approach.

The measurement problem on the inspector’s side has been researched thoroughly in the present study, given the classification of tax-evasion behaviour by three expert tax inspectors, who also used the judgement of two independent tax inspectors on the same individual tax forms. Since EWH put some faith in the validity of the inspector’s classification, we feel the need to be careful in our evaluation of the present findings regarding the official classification. First, there are several differences between the behavioural outcome measures used in both studies (see Robben 1991: 227-229). Second, the contrast-group design used in the EWH study was not replicated in the present study in which the focus of the model was generalised to a representative sample of taxpayers. And, third, we encountered a rather low number of wilful misrepresentations of income in the sample of tax returns we used in the present study. With these limitations in mind, we believe that it will be rather difficult to improve on this approach, and, until then, we believe that we have reached a more or less optimal measurement level.

We have also tried to improve on the self-reports by introducing a so-called confrontation approach. Unfortunately, however, this elaboration was marred by a small sample size. Nevertheless, since the relationship between self-reported behaviour and official classification was improved by the confrontation approach, we are convinced that the resulting self-reported behaviour represents an optimal measure. We therefore cannot avoid the conclusion that the zero correlation between self-reported and officially measured behaviour in the EWH study was replicated in the present study. Within the limitations of this study, we believe we have exhausted the possibilities of improving both measures.

The introduction of the third measure - the experimental method - to tap the underlying concept of tax evasion also failed, to the extent that this measure did not significantly associate with either of the other evasion measures. Although the experiment showed reasonably strong external validity by revealing the expected effects of withholding and opportunity, using the results of the experimental method in combination with the results of the two other approaches did not enable us to measure a common factor of tax evasion behaviour.

In this sense, this study replicates the main results of the EWH study in producing uncorrelated behavioural measures of self-reported and officially classified tax evasion. In this case, however, we are confronted with three uncorrelated behavioural measures.

What is the impact of these results on our understanding of the concept of tax evasion and on future research on tax evasion? Since we have established that the existing measurement error cannot be used as an explanation for the lack of associations between the three behavioural measures, we are obliged to consider the possibility that these three measures do not share a common core as is conventionally supposed. That would imply the disquieting conclusion that neither we nor any other tax evasion researchers have offered a successful definition and operationalization of tax evasion in general. Furthermore, tax-evasion behaviour seems to consist of at least three unrelated aspects of one and the same underlying concept. The unavoidability of this conclusion is accompanied by incomprehensibility. We have to admit that our imagination fails to produce even the slightest insight into the possible conceptual structure of an emerging multi-concept of tax evasion behaviour.

However, through the lack of other explanations, and with much hesitation, we put forward a multi-concept hypothesis for the act of tax evasion. That is, until proven wrong, we suggest that tax-evasion behaviour consists of three independent conceptual aspects that are measured by three independent methods. Notwithstanding our hesitation, this hypothesis has a strong raison d’être: we are convinced that, until a solution is found for the problems raised by this study and the EWH study, there is no valid scientific reason for studying tax evasion behaviour by using only one of the three methods. Does this all mean that empirical studies on tax evasion are useless? We believe that such a conclusion would be too rash at this time, but we are convinced that researchers should refrain from identifying their measurements with tax evasion in general, and, more modestly, qualify them as evasion in the experimental, official or self-reported sense.

The task for future tax-evasion researchers is clear but demanding: studying and developing theoretical models for explaining tax evasion must wait until our multi-concept hypothesis is proven false. Until then we believe that tax-evasion studies will not significantly contribute to our understanding of real-life tax-evasion behaviour.

References

Baldry, J.C., 1986. Tax evasion is not a gamble: A report on two experiments. Economics Letters 22, 333-335.

Baldry, J.C., 1987. Income tax evasion and the tax schedule: Some experimental results. Public Finance 42, 357-383.

Becker, W., H.-J. Büchner and S. Sleeking, 1987. The impact of public expenditures on tax evasion: An experimental approach. Journal of Public Economics 34, 243-253.

Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46.

Elffers, H., 1991. Income tax evasion. Theory and measurement. Deventer: Kluwer.

Elffers, H., R.H. Weigel and D.J. Hessing, 1987. The consequences of different strategies for measuring tax evasion behavior. Journal of Economic Psychology 8, 311-337.

Friedland, N., 1982. A note on tax evasion as a function of the quality of information about the magnitude and credibility of threatened fines: Some preliminary research. Journal of Applied Social Psychology 12, 54-59.

Friedland, N., S. Maital and A. Rutenberg, 1978. A simulation study of income tax evasion. Journal of Public Economics 10, 107-116.

Güth, W. and K. Mackscheidt, 1985. An empirical study of tax evasion. Mimeo, University of Cologne. Cited by Becker et al. (1987).

Hessing, D.J., H. Elffers and R.H. Weigel, 1988. Exploring the limits of self-reports and reasoned action: An investigation of the psychology of tax evasion behavior. Journal of Personality and Social Psychology 54, 405-413.

Kinsey, K.A., 1988. Measurement bias or honest disagreement? Problems of validating measures of tax evasion. American Bar Foundation Working Paper 8811.

Long, S.B. and J.A. Swingen, 1991. Taxpayer compliance: Setting new agendas for research. Review of: National Academy of Sciences Panel Report on Taxpayer Compliance: Taxpayer compliance: An agenda for research, Vols. 1 and 2. Law and Society Review 25, 639-683.

Robben, H.S.J., 1991. A behavioral simulation and documented behavior approach to income tax evasion. Deventer: Kluwer.

Robben, H.S.J., H. Elffers and W.F.M. Verlind, 1989. Determinanten van individueel misbruik van sociale verzekeringen: het geval van WW-misbruik [Determinants of individual’s misusing of social security: The case of misuse of unemployment benefits]. Rapport R89-02. Amsterdam: Federatie van Bedrijfsverenigingen.

Robben, H.S.J., P. Webley, R.H. Weigel, K.E. Wärneryd, K.A. Kinsey, D.J. Hessing, F. Alvira Martin, H. Elffers, R. Wahlund, L. van Langenhove, S.B. Long and J.T. Scholz, 1990. Decision frame and opportunity as determinants of tax cheating: An international experimental study. Journal of Economic Psychology 11, 341-364.

Roth, J.A. and J.T. Scholz (Eds.), 1989. Taxpayer compliance, Vol. 2: Social science perspectives. Philadelphia, PA: University of Pennsylvania Press.

Roth, J.A., J.T. Scholz and A.D. Witte (Eds.), 1989. Taxpayer compliance, Vol. 1: An agenda for research. Philadelphia, PA: University of Pennsylvania Press.

Spicer, M.W. and L.A. Becker, 1980. Fiscal inequity and tax evasion: An experimental approach. National Tax Journal 33, 172-175.

Spicer, M.W. and R.E. Hero, 1985. Tax evasion and heuristics: A research note. Journal of Public Economics 26, 263-267.

Spicer, M.W. and J.E. Thomas, 1982. Audit probabilities and the tax evasion decision: An experimental approach. Journal of Economic Psychology 2, 241-245.

Webley, P. and S. Halstead, 1986. Tax evasion on the micro: Significant simulations or expedient experiments? Journal of Interdisciplinary Economics 1, 87-100.

Webley, P., I. Morris and F. Amstutz, 1985. ‘Tax evasion during a small business simulation’. In: H. Brandstätter and E. Kirchler (Eds.), Economic psychology: Proceedings of the 10th Annual Colloquium of the International Association for Research in Economic Psychology (pp. 233-242). Linz: Trauner.

Webley, P., H.S.J. Robben, H. Elffers and D.J. Hessing, 1991. Tax evasion: An experimental approach. Cambridge: Cambridge University Press.

Weigel, R.H., D.J. Hessing and H. Elffers, 1987. Tax evasion research: A critical appraisal and theoretical model. Journal of Economic Psychology 8, 215-235.

Wilson, J.Q. and R.J. Herrnstein, 1985. Crime & human behavior: The definitive study on the causes of crime. New York: Simon and Schuster.

Zimring, F.E. and G.J. Hawkins, 1973. Deterrence: The legal threat in crime control. Chicago, IL: The University of Chicago Press.
