Estimation of Interviewer Effects for Categorical Items in a Random Digit Dial Telephone Survey. Author(s): Lynne Stokes. Source: Journal of the American Statistical Association, Vol. 83, No. 403 (Sep., 1988), pp. 623-630. Published by: American Statistical Association. Stable URL: http://www.jstor.org/stable/2289284


Estimation of Interviewer Effects for Categorical Items in a Random Digit Dial Telephone Survey

LYNNE STOKES*

The loss of precision in estimates of means due to variability among interviewers can be substantial for some questionnaire items and survey designs. The most commonly used methods for estimating the magnitude of this loss are inappropriate for binary items in complex surveys. This article shows how parameters from a model for variance components in binary variables (Anderson and Aitkin 1985) are related to the increased variance of population estimates. It is suggested that one of these parameters, a measure of correlation between interviewer observations on a latent variable, may be more appropriate than the intrainterviewer correlation $\rho$ for measuring the magnitude of interviewer effects. This is because it is unaffected by the level of the attribute in the population, which is not true of $\rho$. Interviewer effects for a household respondent's recorded labor-force status in a random digit dial telephone survey are examined using the new model and estimation process. Small but positive effects are present for at least one of the labor-force categories.

KEY WORDS: Interviewer variance; Logistic regression; Nonsampling error; Variance component.

1. INTRODUCTION

Interviewers, editors, and coders of survey data can introduce errors into the data they handle. For example, when the errors made by an individual interviewer are of the same or similar type, a correlation is induced among the units in that interviewer's assignment. Measurement of this correlation is important for at least three reasons. First, the presence of the correlation inflates the variance of the sample mean $\bar d$. I assume throughout this article that the correlation is due to interviewers, and I let $d_{ij}$ be the response of the $j$th unit of interviewer $i$'s assignment, where $i = 1, \ldots, k$ and $j = 1, \ldots, n$. Then

$$\operatorname{var}(\bar d) = \operatorname{var}\Bigl(\sum_{i,j} d_{ij}/kn\Bigr) = [\operatorname{var}(d_{ij}) + (n - 1)\operatorname{cov}(d_{ij}, d_{ij'})]/kn = \operatorname{var}(d_{ij})[1 + (n - 1)\rho]/kn, \tag{1}$$

where $\rho = \operatorname{cov}(d_{ij}, d_{ij'})/\operatorname{var}(d_{ij})$. If $n$ is moderately large, even a small $\rho$ can dramatically increase the variance of $\bar d$. Knowing the magnitude of $\rho$ gives researchers an idea of the amount of precision lost to interviewer variability.
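To make the size of the inflation factor in (1) concrete, the following minimal sketch evaluates $1 + (n - 1)\rho$ and the resulting variance of the mean. The function names and the numerical inputs are illustrative assumptions and are not taken from RDD-I.

```python
def design_effect(n: int, rho: float) -> float:
    """Variance inflation factor 1 + (n - 1) * rho from equation (1)."""
    return 1.0 + (n - 1) * rho


def var_of_mean(var_d: float, k: int, n: int, rho: float) -> float:
    """var(d-bar) for k interviewers with n units each, per equation (1)."""
    return var_d * design_effect(n, rho) / (k * n)


if __name__ == "__main__":
    # Hypothetical values: rho = 0.02 and three assignment sizes.
    for n in (10, 51, 101):
        print(n, design_effect(n, 0.02))
```

With these assumed inputs, an assignment of 51 units and $\rho = .02$ already doubles the variance relative to independent responses.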

Second, a large correlation may indicate a problem with interviewer training on an item or with the design of the question itself. If such items can be identified, they might be improved.

Third, researchers in survey-sampling methods use $\rho$ as a comparative measure of interviewer effects for different question types, different interviewing modes (e.g., telephone and personal visit), or different interviewer and respondent characteristics. This information can aid in the design of better surveys.

Because of the importance of this source of nonsampling errors, experiments designed to allow estimation of the interviewer variance [as $\operatorname{cov}(d_{ij}, d_{ij'})$ is called when it is due to interviewers] in personal-visit surveys and censuses have been conducted for some time by the U.S. Bureau of the Census (e.g., Bailey, Moore, and Bailar 1978; Hanson and Marks 1958; U.S. Bureau of the Census 1968) and Statistics Canada (e.g., Fellegi 1974; McLeod and Krotki 1979). Experiments designed to allow estimation of interviewer variance in telephone surveys have also been reported (Groves and Magilavy 1986; Stokes 1986). In fact, measuring interviewer variance may be particularly important in telephone surveys, since "better interviewer control" is commonly listed among the advantages of telephone compared with personal-visit interviews. Improved control should manifest itself as a smaller intrainterviewer correlation, although this has not yet been clearly demonstrated with empirical studies.

* Lynne Stokes is Assistant Professor, Department of Management Science and Information Systems, University of Texas, Austin, TX 78712. This work was partially supported by an Intergovernmental Personnel Agreement with the U.S. Bureau of the Census.

From April through September 1982, the U.S. Bureau of the Census conducted an experimental random digit dial telephone survey (called RDD-I) from its headquarters in Suitland, Maryland. The survey was conducted in seven two-week replicates, each resulting in completed interviews with about 500 households. One of the several goals of the experiment was gaining experience with estimating nonsampling errors in telephone surveys. An interviewer interpenetration experiment was embedded in the survey design so that interviewer variance could be estimated. This article describes some of the results of that experiment.

Interpenetration means randomizing sample units within a defined area to two or more interviewers. When data are collected by personal visit, interviewers are generally interpenetrated in pairs, because of the high cost of travel that would be incurred for larger numbers. Because of practical field considerations, interpenetration in telephone surveys is not as straightforward (though less expensive) as in personal-visit surveys and censuses. Therefore, some modeling of the effects of these complications was required for the analysis of the RDD-I data. This difference in the experimental-design requirements, and therefore the resulting estimators, is one reason that comparison of interviewer variance for personal-visit and telephone surveys is difficult.

© 1988 American Statistical Association. Journal of the American Statistical Association, September 1988, Vol. 83, No. 403, Applications & Case Studies


In the past, analysis of variance (ANOVA) models have been used to describe the effect of interviewers on responses. Since the items of primary interest in survey questionnaires are proportions of the population in certain categories, however, the usual ANOVA estimators are not appropriate. When the ANOVA model has only a single random effect (such as that due to interviewers) and no fixed effects, the estimator of the variance component obtained from the model is still unbiased (Stokes and Mulry 1987). But when there are fixed effects in the model as well, as is required in telephone surveys, a single parameter cannot describe the intrainterviewer correlation of the responses. Therefore, the RDD-I data were analyzed with an adaptation of a method suggested by Anderson and Aitkin (1985), which eliminates this problem.

Section 2 describes the practical problems caused by the interpenetrated design in RDD-I. Section 3 gives an overview of Anderson and Aitkin's model. The relationship between the parameters of their model and the intrainterviewer correlation defined in (1) is clarified. In Section 4, a method for obtaining maximum likelihood estimators and confidence intervals for parameters from their model is described, as well as a method for checking the fit of the model. In Section 5, results from the RDD-I experiment are reported, and a discussion follows in Section 6.

2. INTERPENETRATION DESIGN FOR RDD-I

To attain complete interpenetration, each interviewer must complete a random sample from the entire population. This could be accomplished in a telephone survey by randomly assigning sample units to interviewers at the beginning of the survey period. The interviewers would pursue these units until they were complete or the survey period ended. Unless an interviewer were available to work all shifts, however, there might be units in their assignment that they could never reach, but that would have been reachable if they had been called at the appropriate time. Thus the interviewer would be forced to accept a higher nonresponse rate than could have been obtained with a different method.

One of the requirements of the RDD-I design was that the measurement of nonsampling errors should not increase them. Therefore, an alternative scheme was chosen that intended to achieve interpenetrated assignments only within shifts. Each interviewer received a random sample of all units unreached at the beginning of each shift, and all units that remained unreached at the end of the shift were returned to the pool of respondents. The assignment of each interviewer working a given shift was a random sample from the population available (at home and willing to talk) during that shift.

A further complication resulted because the managers of the field operation felt that too much efficiency would be sacrificed by not arranging that the cases most likely to be reached or most important to resolve could be called first. This procedure is incompatible with interpenetration, because faster interviewers would be assigned respondents of lower priority. If respondent characteristics are related to priority, then the assignments of slow interviewers will differ from those of fast ones, and interviewer differences will be confounded with priority differences.

This problem was resolved in RDD-I by assigning each unreached unit to one of six priority groups at the beginning of each shift. The units within each group were randomly assigned to interviewers, but one group was exhausted before units from the next were assigned. The priority level of the completed case was recorded along with other case information, so it could be accounted for before comparing interviewer assignment results.

3. THE MODEL

The variability among interviewers is generally treated as a random effect in an ANOVA model. One of several estimators [such as ANOVA or MINQUE0 (minimum norm quadratic unbiased estimator)] for variance components is usually computed (Searle 1971, chap. 10). For balanced designs, these estimators have some optimal properties when the components are assumed normally distributed. The designs of most personal-visit surveys are relatively balanced, but telephone survey designs are less likely to be so. Nevertheless, the responses to most questionnaire items in both personal-visit and telephone surveys are recorded in categories, and each category is generally treated for analysis as dichotomous. Since the assumption of normality no longer holds, the properties of the variance-component estimators are not known. Furthermore, the presence of fixed effects in the model (needed to describe the more complex design necessary for modeling the RDD-I data, as for that from most telephone surveys) can no longer be assessed with $F$ tests.

Another problem is that a single measure of intrainterviewer correlation may not describe the correlation structure of Bernoulli variables having different means, since unlike normal random variables their second moments are functions of their means. Let $d_{tij}$ and $d_{tij'}$ denote the responses recorded for the $j$th and $j'$th units in interviewer $i$'s assignment in subpopulation $t$ ($t = 1, \ldots, T$); $d_{tij} = 1$ denotes that the unit is recorded in a given category and $d_{tij} = 0$ indicates that it is not. Let $p_i(t)$ denote the proportion in subpopulation $t$ that would be recorded by interviewer $i$ as being in the category; that is,

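$$p_i(t) = E(d_{tij} \mid i). \tag{2}$$

(The subpopulations might represent shift, priority, or shift × priority-level categories.) If interviewer $i$'s assignment is a random sample from subpopulation $t$ and the finite population correction may be ignored, then

$$\begin{aligned} \operatorname{cov}(d_{tij}, d_{t'ij'}) &= E[\operatorname{cov}(d_{tij}, d_{t'ij'} \mid i)] + \operatorname{cov}[E(d_{tij} \mid i), E(d_{t'ij'} \mid i)] \\ &= \operatorname{var}[p_i(t)] \quad \text{if } t = t' \\ &= \operatorname{cov}[p_i(t), p_i(t')] \quad \text{if } t \neq t' \end{aligned} \tag{3}$$

and

$$\operatorname{var}(d_{tij}) = \operatorname{var}[p_i(t)] + E\{p_i(t)[1 - p_i(t)]\} = E[p_i(t)]\{1 - E[p_i(t)]\}. \tag{4}$$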
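The intermediate steps behind (3) and (4) can be spelled out as follows. This is a sketch that assumes, as the random-sampling condition above suggests, that distinct units in one interviewer's assignment are conditionally independent given the interviewer.

```latex
\begin{align*}
\operatorname{cov}(d_{tij}, d_{t'ij'} \mid i) &= 0 \quad (j \neq j')
  && \text{so the first term of (3) vanishes,} \\
\operatorname{cov}[E(d_{tij} \mid i), E(d_{t'ij'} \mid i)] &= \operatorname{cov}[p_i(t), p_i(t')]
  && \text{by (2),} \\
\operatorname{var}(d_{tij}) &= \operatorname{var}[p_i(t)] + E[p_i(t)] - E[p_i^2(t)] \\
  &= E[p_i(t)] - E^2[p_i(t)]
  && \text{since } \operatorname{var}[p_i(t)] = E[p_i^2(t)] - E^2[p_i(t)].
\end{align*}
```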


Thus $\rho_t = \operatorname{cov}(d_{tij}, d_{tij'})/\operatorname{var}(d_{tij})$ and $\rho_{tt'} = \operatorname{cov}(d_{tij}, d_{t'ij'})/[\operatorname{var}(d_{tij})\operatorname{var}(d_{t'ij'})]^{1/2}$ might vary from one subpopulation (or pair of them) to another if $E(d_{tij}) = E[p_i(t)]$ does.

To avoid this model inadequacy, Anderson and Aitkin (1985) developed a method for estimating interviewer variability for binary variables by allowing for random effects in logit and probit models. As usual for these models, they hypothesized the presence of an unobservable continuous random variable that determines the outcome of the observed variable. Let $Y_{tij}$ ($t = 1, \ldots, T$) denote the unobservable variable controlling the $j$th unit in interviewer $i$'s assignment in subpopulation $t$, and let $c_{tij}$ be the threshold value. Then

$$p_i(t) = \Pr[d_{tij} = 1 \mid i] = \Pr[Y_{tij} > c_{tij} \mid i]. \tag{5}$$

The value $c_{tij}$ is assumed to be a random variable as well, but since its mean and variance can be absorbed into those of $Y$, we may with no loss of generality write $p_i(t) = \Pr[Y_{tij} > 0 \mid i]$. $Y$ may be treated as having any unimodal distribution, but assuming a normal or logistic distribution gives rise to the probit or logit model for $d$. Any recognized treatment effects may then be included as fixed or random effects in an ANOVA model for $\mathbf{Y}$, the vector of random variables $Y_{tij}$; that is,

$$\mathbf{Y} \mid \boldsymbol\beta_1, \ldots, \boldsymbol\beta_R \sim G\Bigl(\mathbf{X}\boldsymbol\mu + \sum_r \mathbf{Z}_r\boldsymbol\beta_r,\ \sigma^2\mathbf{I}\Bigr), \tag{6}$$

where $\boldsymbol\mu$ and $\boldsymbol\beta_r$ ($r = 1, \ldots, R$) are the vectors of fixed and random effects, $\mathbf{X}$ and $\mathbf{Z}_r$ are their associated design matrices, and $\sigma^2$ is a scale parameter (which is a variance if $G$ is the normal cdf, but not necessarily otherwise). Anderson and Aitkin assumed that the elements of each $\boldsymbol\beta_r$ are independent and marginally $N(0, \sigma^2_{\beta_r})$, regardless of the form of $G$. They then provided a method for obtaining maximum likelihood estimators of all $\rho_{Y_r} = \sigma^2_{\beta_r}/(\sigma^2 + \sum_r \sigma^2_{\beta_r})$. When $\sigma^2$ is a variance (e.g., in the probit model), $\rho_{Y_r}$ has its usual interpretation, as the proportion of variation in $Y$ explained by the $r$th random effect.

For the remainder of this discussion I assume the presence of only one random effect, attributable to interviewers. I then suppress the indexing subscript on the parameters described previously and write $\boldsymbol\beta_1 = \boldsymbol\beta = (\beta_1, \ldots, \beta_k)$, $\sigma_{\beta_1} = \sigma_\beta$, and $\rho_Y = \sigma^2_\beta/(\sigma^2 + \sigma^2_\beta)$.

Excess variability in the unobservable variable $Y$ (which is being measured by the estimation process) indicates the presence of interviewer effects, just as variability in the observable variable $d$ does. Therefore, the parameter $\rho_Y$ can be helpful in identifying problem items on a questionnaire or for comparing interviewer effects across surveys or interviewing modes, which were identified in the introduction as two of the uses of estimates of interviewer variability. In fact, $\rho_Y$ is more appropriate for that purpose (theoretically at least) than $\rho_t$ or $\rho_{tt'}$, because they are affected by the prevalence of the attribute in the population and $\rho_Y$ is not.

The value $\rho_Y$ does not conveniently reflect the increase in variability of $\bar d$ due to the interviewers, however, which was the other reason identified in the introduction for estimating $\rho$. Anderson and Aitkin's approach does not directly yield an estimate of a parameter that can be used for this purpose. Nevertheless, for any specific model of the form described by (5) and (6), $\rho_t$ can be written as a function of $\rho_Y$ and the parameter vector $\boldsymbol\mu$.

First, note again from (3) and (4) that

$$\rho_t = \{E[p_i^2(t)] - E^2[p_i(t)]\}/\{E[p_i(t)] - E^2[p_i(t)]\}. \tag{7}$$

Thus $\rho_t$ depends on only the first and second moments of $p_i(t)$, which in turn depend on the choice of $G$ in (6). For example, when $G$ is the logistic distribution function, we have from (2), (5), and (6) that

$$p_i(t) = \exp(\eta_t(\boldsymbol\mu) + \beta_i/\sigma)/[1 + \exp(\eta_t(\boldsymbol\mu) + \beta_i/\sigma)], \tag{8}$$

where $\eta_t(\boldsymbol\mu) = -E(Y_{tij})/\sigma = -x_t\boldsymbol\mu/\sigma$, and $x_t$ denotes a row from the design matrix $\mathbf{X}$ corresponding to a unit in subgroup $t$. Then, further assuming a normal distribution for $\beta_i$ yields

$$E[p_i(t)] = \int g_t(z)\phi(z)\,dz \tag{9}$$

and

$$E[p_i^2(t)] = \int g_t^2(z)\phi(z)\,dz, \tag{10}$$

where $g_t(z) = \exp\{\eta_t(\boldsymbol\mu) + [\rho_Y/(1 - \rho_Y)]^{1/2}z\}/\{1 + \exp\{\eta_t(\boldsymbol\mu) + [\rho_Y/(1 - \rho_Y)]^{1/2}z\}\}$ and $\phi$ is the standard normal pdf. Equations (9) and (10) can be evaluated numerically using Gaussian quadrature. [The weights and quadrature points can be obtained, for example, from Abramowitz and Stegun (1972, table 25.10).] For small values of $\rho_Y$, a closed-form approximation for $E[p_i(t)]$ and $E[p_i^2(t)]$, and thus for $\rho_t$ from (7), can be derived by approximating (9) and (10) by a polynomial in $\rho_Y/(1 - \rho_Y)$, obtained from a Taylor series expansion about the point $\rho_Y/(1 - \rho_Y) = 0$. This yields the expressions

$$E[p_i(t)] \approx \pi_t + \tfrac{1}{2}\pi_t(1 - \pi_t)(1 - 2\pi_t)[\rho_Y/(1 - \rho_Y)]$$

and

$$E[p_i^2(t)] \approx \pi_t^2 + [\pi_t(1 - \pi_t)]^2[2 - \pi_t/(1 - \pi_t)][\rho_Y/(1 - \rho_Y)],$$

where $\pi_t = \exp[\eta_t(\boldsymbol\mu)]/\{1 + \exp[\eta_t(\boldsymbol\mu)]\}$.

The resulting relationship between $\rho_t$ and $E(d_{tij})$ is shown in Figure 1 for three values of $\rho_Y$. [The functions are all symmetric about the point $E(d_{tij}) = 1/2$.] Three observations can be made. First, the closer $E(d_{tij})$ is to 0 or 1, the smaller $\rho_t$ is. So, a given correlation among the unobservable variables has a smaller impact on the variance of estimates of the prevalence of rare attributes than moderately common ones. Second, the curves are relatively flat over a large range when $\rho_Y$ is small; however, when $E(d_{tij})$ is extreme, and especially when $\rho_Y$ is moderately large, the curves are steep. As a result, comparisons of $\rho_t$ for rare but unequally prevalent attributes may give the incorrect impression that one suffers from interviewer effects more than the other, when in fact $\rho_Y$ is the same for both. Finally, even at their maximum the $\rho_t$ values are much smaller than the $\rho_Y$ values. Therefore, it is not surprising to find (as Anderson and Aitkin did) that the ANOVA estimate of intrainterviewer correlation, which estimates a kind of average $\rho_t$ value directly, is far smaller than the corresponding estimate of $\rho_Y$.

Figure 1. Relationship Between $\rho_t$ and $E(d_{tij})$. The value $\rho_t$ is shown as a function of $E(d_{tij})$ for three values of $\rho_Y$: $\rho_Y$ = .2, .091, and .05. All functions are symmetric about the point $E(d_{tij}) = 1/2$.

The same method used for computing $\rho_t$ can be used to determine $\rho_{tt'}$, the correlation between observed responses collected by the same interviewer but from different subpopulations. Its numerator [from (3)] requires computation of

$$E[p_i(t)p_i(t')] = \int g_t(z)g_{t'}(z)\phi(z)\,dz, \tag{11}$$

which also can be obtained by Gaussian quadrature or approximated by a method similar to that described for $E[p_i^2(t)]$. The relationship between $\rho_{tt'}$ and the pair of values $E(d_{tij})$ and $E(d_{t'ij'})$ is shown in Figure 2 for $\rho_Y = .091$ and three values of $E(d_{t'ij'})$. (.091 is the estimated value of $\rho_Y$ in an example in Sec. 5.) The first and third observations about characteristics of $\rho_t$ can also be made about $\rho_{tt'}$ from Figure 2. (The second observation made about $\rho_t$ is true as well, but it is not clearly shown in Fig. 2.) The correlation function reaches its maximum and is symmetric about the point $E(d_{tij}) = .5$, when $E(d_{t'ij'}) = .5$. It is asymmetric for any other value of $E(d_{t'ij'})$.
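The quadrature evaluation of (7) and (9)-(11) can be sketched as follows. This is a minimal illustration rather than the computation used in the article: it fixes a 5-point Gauss-Hermite rule (the RDD-I analysis used six points), builds the nodes and weights from the closed-form roots of the degree-5 Hermite polynomial instead of a table, and all function names are assumptions made here.

```python
import math


def gauss_hermite_5():
    """Nodes and weights of the 5-point Gauss-Hermite rule for a N(0, 1) weight.

    Nodes are the roots of He5(x) = x^5 - 10x^3 + 15x; the weights
    w = 5! / (5^2 * He4(x)^2) sum to 1, so E[f(Z)] ~ sum(w * f(x)).
    """
    a = math.sqrt(5.0 - math.sqrt(10.0))
    b = math.sqrt(5.0 + math.sqrt(10.0))
    nodes = [-b, -a, 0.0, a, b]
    weights = [120.0 / (25.0 * (x**4 - 6.0 * x**2 + 3.0) ** 2) for x in nodes]
    return nodes, weights


def g(eta: float, rho_y: float, z: float) -> float:
    """g_t(z): logistic probability at standardized interviewer effect z."""
    s = math.sqrt(rho_y / (1.0 - rho_y))
    return 1.0 / (1.0 + math.exp(-(eta + s * z)))


def rho_observed(eta_t: float, eta_u: float, rho_y: float) -> float:
    """rho_{tt'} built from (3), (4), (9), and (11).

    With eta_t == eta_u this reduces to rho_t of equation (7).
    """
    nodes, weights = gauss_hermite_5()
    pairs = list(zip(nodes, weights))
    m_t = sum(w * g(eta_t, rho_y, z) for z, w in pairs)    # E[p_i(t)]
    m_u = sum(w * g(eta_u, rho_y, z) for z, w in pairs)    # E[p_i(t')]
    m_tu = sum(w * g(eta_t, rho_y, z) * g(eta_u, rho_y, z) for z, w in pairs)
    cov = m_tu - m_t * m_u
    return cov / math.sqrt(m_t * (1.0 - m_t) * m_u * (1.0 - m_u))


if __name__ == "__main__":
    # rho_Y = .091 is the value used for Figure 2; eta = 0 gives E(d) = .5.
    print(rho_observed(0.0, 0.0, 0.091))
    print(rho_observed(0.0, 1.0, 0.091))
```

Under these assumptions the value at $\eta_t = 0$ and $\rho_Y = .091$ comes out near .024, far below $\rho_Y$ itself, and it shrinks as $\eta_t$ moves away from 0.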

4. DATA ANALYSIS WITH THE NEW MODEL

4.1 The Estimators

Figure 2. Relationships Between $\rho_{tt'}$, $E(d_{tij})$, and $E(d_{t'ij'})$. The value $\rho_{tt'}$ is shown as a function of $E(d_{tij})$ when $\rho_Y = .091$, for three values of $E(d_{t'ij'})$: .5, .75, and .90.

For personal-visit surveys, interviewers are generally interpenetrated within areas, so the factor interviewer can be considered nested within the factor area. In contrast, in a telephone interpenetration experiment such as that described in Section 2, the factor interviewer is generally crossed with the fixed factors, since interviewers are allowed to work in more than one shift or priority level. This leads [using (7)] to the log-likelihood equation

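$$L(\rho_Y, \boldsymbol\mu) = \sum_i \log \int_{-\infty}^{\infty} l_i(z)\phi(z)\,dz, \tag{12}$$

where

$$l_i(z) = \prod_t \frac{\exp\{d_{ti\cdot}(\eta_t(\boldsymbol\mu) + [\rho_Y/(1 - \rho_Y)]^{1/2}z)\}}{\{1 + \exp(\eta_t(\boldsymbol\mu) + [\rho_Y/(1 - \rho_Y)]^{1/2}z)\}^{n_{ti}}},$$

$d_{ti\cdot} = \sum_j d_{tij}$, and $n_{ti}$ denotes the number of sample units belonging to subpopulation $t$ that are in interviewer $i$'s assignment.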
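A log-likelihood of the form (12) can be evaluated along the following lines. This is a hedged sketch, not the article's implementation: it uses a fixed 5-point Gauss-Hermite rule, the data layout (`eta` as a dictionary of linear predictors and `counts` as per-interviewer dictionaries of $(d_{ti\cdot}, n_{ti})$ pairs) is an assumption made for illustration, and it keeps each $l_i(z)$ on the log scale with a log-sum-exp step so that large assignments do not underflow.

```python
import math


def gauss_hermite_5():
    """5-point Gauss-Hermite nodes and weights for a N(0, 1) weight function."""
    a = math.sqrt(5.0 - math.sqrt(10.0))
    b = math.sqrt(5.0 + math.sqrt(10.0))
    nodes = [-b, -a, 0.0, a, b]
    weights = [120.0 / (25.0 * (x**4 - 6.0 * x**2 + 3.0) ** 2) for x in nodes]
    return nodes, weights


def log1pexp(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def log_likelihood(rho_y: float, eta: dict, counts: list) -> float:
    """Quadrature approximation to L(rho_Y, mu) in equation (12).

    eta[t]       : linear predictor eta_t(mu) for subpopulation t
    counts[i][t] : (d, n) = (units coded 1, total units) for interviewer i
                   in subpopulation t (hypothetical layout)
    """
    s = math.sqrt(rho_y / (1.0 - rho_y))
    nodes, weights = gauss_hermite_5()
    total = 0.0
    for cells in counts:  # one interviewer at a time
        log_terms = []
        for z, w in zip(nodes, weights):
            # log l_i(z): sum over subpopulations of d*lin - n*log(1 + e^lin)
            log_li = 0.0
            for t, (d, n) in cells.items():
                lin = eta[t] + s * z
                log_li += d * lin - n * log1pexp(lin)
            log_terms.append(math.log(w) + log_li)
        m = max(log_terms)  # log-sum-exp over the quadrature points
        total += m + math.log(sum(math.exp(v - m) for v in log_terms))
    return total
```

The function could be handed to any general-purpose optimizer over $\rho_Y$ and the fixed-effect parameters that determine `eta`; at $\rho_Y = 0$ it reduces to the ordinary logistic log-likelihood.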

Anderson and Aitkin (1985) described a method for solving their likelihood equations [similar to those that would be obtained from (12)] by using GENSTAT, after numerically approximating the integral by Gaussian quadrature. They found that this method, though simple to program, became computationally difficult at times because of space problems when using an adequate number of quadrature points. They found that at least five quadrature points are necessary for stability of the estimate when $\rho_Y$ is large; for small variance components, three are enough.

For this analysis, the maximum likelihood estimates were obtained by maximizing (12) directly using the software package GRG2 (available at the University of Texas at Austin). This package maximizes nonlinear functions by the generalized reduced-gradient method (Lasdon, Waren, Jain, and Ratner 1978). The number of quadrature points does not appreciably affect the size of the computational problem with this approach, so six quadrature points were used for all of the results given in Section 5. Computational difficulties may be encountered, however, if the assignment size of a single interviewer ($\sum_t n_{ti}$) is very large, since $l_i(z)$ may then become too small. This problem was mitigated in the RDD-I data analysis by multiplying each term in the product of $l_i(z)$ by a large constant. This slows the rate at which $l_i(z)$ diminishes without changing the values for which $L(\rho_Y, \boldsymbol\mu)$ is maximized.

A likelihood-ratio test of the hypothesis $H_0\colon \rho_Y = 0$, as well as of hypotheses concerning the fixed effects in the model, can be easily performed by comparing twice the difference of the log-likelihoods with a chosen percentage point of the appropriate $\chi^2$ distribution. Confidence intervals for all parameters can be constructed from likelihood-ratio statistics as well.
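The comparison described above can be sketched for a hypothesis that removes one parameter, so that the reference distribution is $\chi^2$ with 1 degree of freedom. The function name and the log-likelihood values in the usage lines are hypothetical; the tail probability uses the identity $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def lr_test_1df(loglik_full: float, loglik_null: float):
    """Likelihood-ratio statistic and chi-square(1) tail probability."""
    stat = max(2.0 * (loglik_full - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = lr_test_1df(-612.4, -614.9)
    print(round(stat, 2), round(p, 4))
```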

4.2 Assessing the Model Fit

When the data set to be analyzed is large and the number of subpopulations required is few, the fit of this model can be assessed graphically by a plot of the residuals from the expected marginal distributions of the logits. To determine that distribution, first observe from a Taylor series expansion of (8) that the conditional distribution of the logit for a fixed interviewer has

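$$E[\operatorname{logit}_{ti} \mid i] \approx \eta_t(\boldsymbol\mu) + \beta_i/\sigma \tag{13}$$

and

$$\operatorname{var}[\operatorname{logit}_{ti} \mid i] \approx 1/\{n_{ti}\,p_i(t)[1 - p_i(t)]\}, \tag{14}$$

where $\operatorname{logit}_{ti} = \log(\hat p_{ti}/(1 - \hat p_{ti}))$ and $\hat p_{ti} = d_{ti\cdot}/n_{ti}$. Equations (13) and (14) imply that

$$E(\operatorname{logit}_{ti}) \approx \eta_t(\boldsymbol\mu) \tag{15}$$

and

$$\operatorname{var}(\operatorname{logit}_{ti}) = E[\operatorname{var}(\operatorname{logit}_{ti} \mid i)] + \operatorname{var}[E(\operatorname{logit}_{ti} \mid i)] \approx E\{1/\{n_{ti}\,p_i(t)[1 - p_i(t)]\}\} + \rho_Y/(1 - \rho_Y). \tag{16}$$

Therefore, if the model holds we can expect the standardized logit, or

$$e_{ti} = \frac{\operatorname{logit}_{ti} - E(\operatorname{logit}_{ti})}{[\operatorname{var}(\operatorname{logit}_{ti})]^{1/2}}, \tag{17}$$

to have mean 0 and variance 1 for all $i$ and $t$. Furthermore, $\operatorname{cov}(e_{ti}, e_{ti'}) = 0$, but $\operatorname{cov}(e_{ti}, e_{t'i}) \neq 0$. By replacing the parameter values in (15)-(17) with their estimates, one can compute estimates of the standardized logits, $\hat e_{ti}$. If the model is adequate, a plot of $\hat e_{ti}$ against $\eta_t(\hat{\boldsymbol\mu})$ should show no trends and a constant variance near 1 in each subpopulation.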
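A standardized logit of the kind described above can be computed for a single interviewer-by-subpopulation cell as in the sketch below. It rests on two assumptions made here rather than in the article: the between-interviewer term is taken as $\operatorname{var}(\beta_i/\sigma) = \rho_Y/(1 - \rho_Y)$, which follows from (13), and the expectation of $1/\{p_i(t)[1 - p_i(t)]\}$ is evaluated in closed form from the identity $1/[p(1 - p)] = 2 + e^{x} + e^{-x}$ for $x = \operatorname{logit}(p)$ with $x$ normally distributed. The function name is hypothetical.

```python
import math


def standardized_logit(d: int, n: int, eta: float, rho_y: float) -> float:
    """Standardized logit e_ti for one cell with 0 < d < n.

    d, n  : units coded 1 and total units in the cell
    eta   : estimated linear predictor eta_t(mu-hat)
    rho_y : estimated rho_Y
    """
    if not 0 < d < n:
        raise ValueError("the empirical logit is undefined when d is 0 or n")
    s2 = rho_y / (1.0 - rho_y)            # var(beta_i / sigma)
    p_hat = d / n
    logit = math.log(p_hat / (1.0 - p_hat))
    # E{1/[p(1-p)]} when logit(p) ~ N(eta, s2): 2 + 2*cosh(eta)*exp(s2/2)
    inv_pq = 2.0 + 2.0 * math.cosh(eta) * math.exp(s2 / 2.0)
    variance = inv_pq / n + s2
    return (logit - eta) / math.sqrt(variance)
```

Cells with no responses in the category (or only responses in it) have no finite logit, which is why the sketch rejects them.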

5. RESULTS FROM RDD-I

The analyses presented in this section are of data from replicates 6 and 7 of RDD-I, the last two of the experiment. By that time, the interviewers had completed their learning process and their biases had stabilized. There were 935 households whose questionnaires could be matched to one of the 15 interviewers who worked in those two replicates. The interviewer workloads were unbalanced; the largest number of interviews completed by a single interviewer was 174, and the smallest was 11.

The RDD-I questionnaire was similar to that of the Current Population Survey (CPS), containing items concerned with employment. Results from the analysis of the household respondent's labor-force status are discussed here. The question is identical to one in the CPS: "What were you doing most of last week, working or something else?" This is probably the most important item on the CPS questionnaire, for it is used in the production of the estimates of parameters such as unemployment rate and labor-force size.

The interviewer places the responses to this question into several labor-force categories (working; with a job but not at work; looking for work) or non-labor-force categories (keeping house; in school; retired; unable to work; other). An analysis was done for only a subset of these categories (working; looking for work; keeping house; other), since the other categories contained very few responses. In addition, a composite category, containing those units in any non-labor-force category, was considered separately.

A logistic model [as shown in (8)] having three fixed effects and a normally distributed interviewer effect was first fit for each variable. The fixed effects considered for each of the variables were replicate (6 and 7), shift (morning: 9:00 a.m.-2:00 p.m. EDT weekdays; afternoon: 2:00 p.m.-7:00 p.m. EDT weekdays; evening: 7:00 p.m.-midnight EDT weekdays; weekend: 10:00 a.m.-8:00 p.m. EDT Saturday), and priority level of the case when reached (new cases, previously called, and irregular cases, which include appointments and other cases with unusual circumstances). The data for the working category are presented in Table 1. It shows the number of responses coded as working and the total number of responses obtained by each interviewer for each replicate, shift, and priority level.

The replicate effect was not found significant for any category of the labor-force item (or for any other variable examined), but shift and/or priority were significant for four of them, so both effects were retained in the models reported here for all labor-force variables. Table 2 shows parameter estimates for the model

$$p_i(t) = \exp[\mu + \alpha_s + \delta_p + (\beta_i/\sigma)]/\{1 + \exp[\mu + \alpha_s + \delta_p + (\beta_i/\sigma)]\},$$

where $t = (s, p)$ is the subpopulation of shift $s$ (M, morning; A, afternoon; E, evening; or W, weekend) and priority $p$ (N, new; P, previously called; I, irregular) cases, with $\alpha_W = -\alpha_M - \alpha_A - \alpha_E$ and $\delta_I = -\delta_N - \delta_P$.

Shift was a significant effect for the two variables "working" (for which $p < .001$) and "not in the labor force" (for which $p = .007$). The parameter estimates for the shift effect coincide with what intuition suggests. For example, interviewers in daytime shifts are more likely to find their respondents not in the labor force than those in evening and weekend shifts, for which the opposite is true (i.e., $\hat\alpha_M$ and $\hat\alpha_A > 0$, whereas $\hat\alpha_E$ and $\hat\alpha_W < 0$). Working respondents are most often encountered on the weekend and least often in the morning.

Table 1. Data for the "Working" Category: $d_{ti\cdot}/n_{ti}$

Subpopulation t (Replicate, Shift, Priority), followed by the entries for interviewers i = 1, ..., 15. Only nonempty cells are listed, in left-to-right order.

6  M  N   6/15 1/4 1/1 2/3 4/8 5/9 3/11 5/10 7/16
6  M  P   5/8 2/4 0/1 1/1 3/6 0/1 6/15 3/5
6  M  I   0/1 2/3 3/3
6  A  N   5/10 1/2 16/34 1/1 3/6 10/15 6/9 16/28 9/14 8/9 5/9
6  A  P   0/2 3/8 1/2 0/2 0/1 0/3 4/5 1/1 1/1
6  A  I   2/3 8/17 1/3 4/5 3/3 2/2 1/1 6/7 3/4 1/1 2/3
6  E  N   1/1 1/3 2/2 3/4 6/12 2/3
6  E  P   3/7 2/5 13/18 9/18 3/4
6  E  I   6/9 1/3 4/5 18/20 5/7
6  W  N   2/2 1/1 2/3 1/1 2/3
6  W  P   1/2 1/1 2/3 3/3
6  W  I   2/3 1/2 1/1 3/3 1/1
7  M  N   0/1 2/4 6/16 5/15 7/20 5/14 10/14
7  M  P   1/2 4/7 3/8 1/1 1/1 3/12 1/3 1/1 5/5
7  M  I   1/1 0/2 0/1 1/1
7  A  N   1/1 16/41 1/2 4/4 11/25 1/1 4/8 2/11 12/27 11/18
7  A  P   6/8 1/2 1/2 1/1 0/1 1/2 4/4 3/4
7  A  I   2/4 1/3 1/1 1/1
7  E  N   2/2 4/6 3/6 1/2
7  E  P   14/17 1/1 18/21 10/22 7/8
7  E  I   7/7 0/1 5/6 4/6 2/2
7  W  N   4/6 0/5 3/3
7  W  P   1/1 4/4 4/6 5/6 4/5
7  W  I   2/3 1/1 1/1 1/1

NOTE: The potential fixed effects considered were replicate, having levels 6 and 7; shift, having levels morning (M), afternoon (A), evening (E), and weekend (W); and priority, having levels new cases (N), previously called (P), and irregular (I). The value to the left of the slash is the number of responses coded "working." The value to the right of the slash is the total number of responses obtained by each interviewer for each replicate, shift, and priority level.

Priority level was a significant effect for three of the variables considered: "keeping house" ($p = .002$), "working" ($p = .002$), and "not in the labor force" ($p = .002$). As expected, the results indicate that those respondents in the categories "keeping house" or "not in the labor force" are most easily reached, since they are more frequently encountered as new cases than previously called ones, and least frequently become irregular cases. In contrast, those in the "working" category are most frequently irregular cases, and least frequently new cases. Point estimates of $\rho_Y$ were substantial for three of the variables considered: "working" ($\hat\rho_Y = .091$), "looking for work" ($\hat\rho_Y = .11$), and "keeping house" ($\hat\rho_Y = .095$). Nevertheless, a test of the hypothesis $H_0\colon \rho_Y = 0$ could be rejected at a reasonable level for only the "working" category. [A 90% confidence interval for $\rho_Y$ for that variable is (.020, .21).] The smaller number of cases in the other two categories makes the power of the tests of no intrainterviewer correlation low.

Table 2. Parameter Estimates for the Labor-Force Variables Model

Parameter   Working   Looking for work   Keeping house   Other   Not in the labor force
μ             .58         -3.05             -2.42        -1.98         -.97
α_M          -.61           .32               .50          .36          .54
α_A          -.20           .25               .07          .00          .12
α_E           .28           .09              -.12         -.27         -.27
α_W           .53          -.66              -.45         -.09         -.39
δ_N          -.31           .12               .61          .22          .40
δ_P          -.11           .01              -.04         -.08         -.10
δ_I           .42          -.13              -.57         -.14         -.30
ρ_Y           .091          .11               .095         .00          .00

To evaluate the impact of this variability on the observable variables d_tij, and thus on d, estimates of ρ_t and ρ_tt' were obtained by substituting estimates for η_t(μ) and ρ_y in (9)-(11) and then substituting the resulting estimates into (3) and (4). The behavior of ρ_t and ρ_tt' for the "working" category can be seen in Figures 1 and 2. The estimate of ρ_t ranges from a low of .015 to a high of .024 as the estimated value of E(d_tij) ranges from .82 (weekend "irregular" cases) to .51 (afternoon "new" cases), and estimates of ρ_tt' have a similar range. In contrast, the MINQUE0 estimate of intrainterviewer correlation from a mixed ANOVA model having both shift and priority as fixed effects is .028, slightly larger than the largest ρ_t estimate from the new model.

The model fit was assessed graphically using the method described in Section 4.2; Figure 3 is typical of the results. It shows residual plots of e_ti against η̂_t(μ) for the categories "working" and "other." Neither plot shows any obvious trends or variance heterogeneity.

The shift effects were collapsed before computing residuals for the category "other," since there were many subpopulation × interviewer cells that had no responses there. Shift was not a significant effect for "other," so this procedure should still yield a residual plot without trends if the model is adequate. The same approach was necessary for the other two sparse categories, "looking for work" and "keeping house."

Figure 3. Residual Plots. These are plots of the standardized logits [given in (17)] against η̂_t(μ) for the (a) "working" and (b) "other" categories. There is no evidence of nonlinearity or variance heterogeneity in either case.

A further examination of the residuals was made by computing their first two sample moments for each subpopulation. Table 3 shows n_t, the number of interviewers having responses in subpopulation t, ē_t = Σ_i e_ti / n_t, and s_t = [Σ_i (e_ti - ē_t)² / (n_t - 1)]^(1/2), for the categories "working" and "other." Though the e_ti's are not identically distributed, their means and standard deviations are all approximately 0 and 1, respectively, and they are uncorrelated within each subpopulation if the model holds. Thus one should expect to see ē_t and s_t near 0 and 1 for each t if the model fit is adequate. This expectation is met reasonably well, with the exception that the standard deviations may be slightly too small. One reason for this is that the residuals for cells in which d_ti. = 0 or n_ti were actually based on adjusted logits of the form log[(d_ti. + .5)/(n_ti + 1 - d_ti.)]. These adjusted logits are less variable than (16) predicts.

Table 3. Standardized Logits for the "Working" and "Other" Categories

Shift   Priority   n_t    ē_t     s_t

Working
M       N          10     .05     .92
M       P          12     .10     .66
M       I           6    -.32     .95
A       N          12     .25     .91
A       P          11    -.28    1.19
A       I          11    -.09     .85
E       N           6    -.13     .54
E       P           5     .02    1.74
E       I           5    -.10    1.09
W       N           6    -.42    1.52
W       P           7     .15     .64
W       I           7    -.47     .24

Other
        N          15     .27     .80
        P          15    -.10    1.13
        I          13    -.04     .87
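The per-subpopulation summaries reported in Table 3 (the count n_t, the mean of the standardized residuals, and their sample standard deviation with divisor n_t - 1) can be sketched as follows. The function name `residual_moments` and the example residuals are hypothetical; the paper's actual residuals are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def residual_moments(residuals: Sequence[float]) -> Tuple[int, float, float]:
    """Return (n_t, mean, sample SD) for one subpopulation's residuals."""
    n_t = len(residuals)
    if n_t < 2:
        raise ValueError("need at least two interviewers in the subpopulation")
    e_bar = sum(residuals) / n_t
    # Sample standard deviation uses the divisor n_t - 1, as in the text.
    sum_sq = sum((e - e_bar) ** 2 for e in residuals)
    s_t = math.sqrt(sum_sq / (n_t - 1))
    return n_t, e_bar, s_t


# Hypothetical standardized residuals for five interviewers in one subpopulation.
example = [-1.0, 0.0, 1.0, 0.5, -0.5]
n_t, e_bar, s_t = residual_moments(example)
print(n_t, e_bar, round(s_t, 3))
```

Under an adequate model, values of the mean near 0 and of the standard deviation near 1 would be expected for each subpopulation.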

6. CONCLUSIONS

Estimating intrainterviewer correlation for binary items for data from RDD-I required a change from the usual methods, because the nature of the interpenetration design is more complex than usual in personal-visit surveys. A model for accommodating variance components in binary variables was implemented. The new method allows inferences about the intrainterviewer correlation to be made correctly, and it provides a parameter better suited for comparing interviewer effects among items, interviewing modes, and populations.

The new model was used to evaluate interviewer effects for several variables on a random digit dial telephone survey. The model appeared to fit the data reasonably well for all of the variables considered (those reported here and others). An indicator of the improved fit over the ANOVA model is that the maximum likelihood estimates of ρ_y, though unconstrained, were in all cases nonnegative, leading to nonnegative estimates of ρ_t and ρ_tt'. This contrasts with the ANOVA approach, which frequently yields negative estimates of ρ for items with small interviewer effects.
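To illustrate how an ANOVA-type estimate of intrainterviewer correlation can fall below zero, here is a minimal sketch using the textbook one-way ANOVA estimator for a balanced layout, (MSB - MSW) / (MSB + (m - 1) MSW). This is an assumption made for illustration: it is not the mixed-model MINQUE0 estimator used for the comparison in the paper, and the data are hypothetical.

```python
from typing import List


def anova_icc(groups: List[List[int]]) -> float:
    """One-way ANOVA estimate of intraclass correlation, balanced layout."""
    k = len(groups)
    m = len(groups[0])
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    means = [sum(g) / m for g in groups]
    grand = sum(means) / k
    # Between-interviewer and within-interviewer mean squares.
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g) / (k * (m - 1))
    denom = msb + (m - 1) * msw
    if denom == 0:
        raise ValueError("no variation in the data")
    return (msb - msw) / denom


# Hypothetical binary responses: three interviewers with identical proportions.
no_effect = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 0]]
# Hypothetical extreme case: interviewers fully determine the response.
strong_effect = [[1, 1, 1, 1], [0, 0, 0, 0]]

rho_none = anova_icc(no_effect)
rho_strong = anova_icc(strong_effect)
print(round(rho_none, 4), rho_strong)
```

When interviewer proportions agree more closely than binomial sampling alone would suggest, the between-interviewer mean square falls below the within-interviewer mean square and the estimate is negative, with lower bound -1/(m - 1).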

The results of the analysis of the labor-force status item were reported here. Three categories of the item showed some evidence of interviewer effects, though the hypothesis of zero correlation could not be rejected for any but the "working" category. The data suggest that interviewers were consistent in recognizing a response as being in or out of the labor force, but were variable in their assignments to categories within the larger groups. Interestingly, a category that failed to show an effect due to interviewers was the catch-all category "other." It might have been expected to show an effect if the interviewers were not well trained in coding responses.

The results further showed that either shift or priority effects were needed in the model for most variables. Failure to include the shift effect when it was needed produced a substantially larger estimate of ρ_y, since interviewer units tended to be concentrated in one or two shifts. Failure to include the priority effect when needed also consistently increased the estimate of ρ_y, but by a smaller amount. This supports the concern that interviewers are likely to vary with regard to the proportion of their assignments in various priority levels. This might have been caused in RDD-I not so much by the relative speed of the interviewers, but because certain interviewers were consistently chosen to work in the late stages of the replicate, when staffing needs were low and when higher-priority cases were all that remained.
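The inflation that results from omitting a needed shift effect can be illustrated with a small hypothetical example in which each interviewer works a single shift and the shifts differ in their response proportions. The sketch below uses a balanced one-way ANOVA estimator as a stand-in, which is an assumption for illustration only; the paper's estimates of ρ_y come from maximum likelihood under its own model. The function name `icc_with_and_without_shift` and the data are hypothetical.

```python
from typing import Dict, List, Tuple


def icc_with_and_without_shift(data: Dict[str, List[List[int]]]) -> Tuple[float, float]:
    """Return (ICC ignoring shift, ICC adjusting for shift), balanced layout."""
    interviewers = [(shift, y) for shift, ys in data.items() for y in ys]
    k = len(interviewers)
    m = len(interviewers[0][1])
    if m < 2 or k <= len(data) or any(len(y) != m for _, y in interviewers):
        raise ValueError("need equal workloads and more interviewers than shifts")
    means = [sum(y) / m for _, y in interviewers]
    grand = sum(means) / k
    # The within-interviewer mean square is the same in both analyses.
    msw = sum((v - mu) ** 2 for (_, y), mu in zip(interviewers, means) for v in y)
    msw /= k * (m - 1)
    # Ignoring shift: interviewer means are compared with the grand mean.
    msb_pooled = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    # Adjusting for shift: interviewer means are compared with their shift mean.
    shift_mean = {s: sum(sum(y) for y in ys) / (m * len(ys)) for s, ys in data.items()}
    msb_adj = m * sum((mu - shift_mean[s]) ** 2 for (s, _), mu in zip(interviewers, means))
    msb_adj /= k - len(data)

    def icc(msb: float) -> float:
        return (msb - msw) / (msb + (m - 1) * msw)

    return icc(msb_pooled), icc(msb_adj)


# Hypothetical data: two interviewers per shift, four responses each.
data = {
    "morning": [[1, 0, 0, 0], [1, 1, 0, 0]],
    "weekend": [[1, 1, 1, 0], [1, 1, 1, 1]],
}
pooled, adjusted = icc_with_and_without_shift(data)
print(round(pooled, 3), round(adjusted, 3))
```

In this example the difference between shifts is attributed to interviewers when shift is ignored, so the unadjusted estimate is much larger than the shift-adjusted one.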

[Received April 1987. Revised January 1988.]

REFERENCES

Abramowitz, M., and Stegun, I. (1972), Handbook of Mathematical Functions, New York: Dover Publications.

Anderson, D. A., and Aitkin, M. (1985), "Variance Component Models With Binary Response: Interviewer Variability," Journal of the Royal Statistical Society, Ser. B, 47, 203-210.

Bailey, L., Moore, T. F., and Bailar, B. (1978), "An Interviewer Variance Study for the Eight Impact Cities of the National Crime Survey Cities Sample," Journal of the American Statistical Association, 73, 16-23.

Fellegi, I. P. (1974), "An Improved Method of Estimating the Correlated Response Variance," Journal of the American Statistical Association, 69, 496-501.

Groves, R. M., and Magilavy, L. (1986), "Measuring and Explaining Interviewer Effects in Centralized Telephone Interviewing," Public Opinion Quarterly, 50, 251-266.

Hanson, R. H., and Marks, E. S. (1958), "Influence of the Interviewer on the Accuracy of Survey Results," Journal of the American Statistical Association, 53, 635-655.

Lasdon, L. S., Waren, A. D., Jain, A., and Ratner, M. (1978), "Design and Testing of a Generalized Reduced Gradient Code for Nonlinear Programming," ACM Transactions on Mathematical Software, 4, 34-50.

McLeod, A., and Krotki, K. P. (1979), "An Empirical Investigation of an Improved Method of Measuring Correlated Response Variances," Survey Methodology, 5, 59-78.

Searle, S. R. (1971), Linear Models, New York: John Wiley.

Stokes, S. L. (1986), "Estimation of Interviewer Effects in Complex Surveys With Application to RDD," in Proceedings of the Second Annual Census Bureau Research Conference, Washington, DC: U.S. Department of Commerce, pp. 21-31.

Stokes, S. L., and Mulry, M. H. (1987), "Estimation of Interviewer Variance for Categorical Variables," Journal of Official Statistics, 3, 389-401.

U.S. Bureau of the Census (1968), Evaluation and Research Program of the U.S. Census of Population and Housing 1960: Effects of Interviewers and Crew Leaders (Ser. ER60, No. 7), Washington, DC: U.S. Government Printing Office.
