the us national comorbidity survey - harvard medical school · the us national comorbidity survey...

24
69 International Journal of Methods in Psychiatric Research, Volume 13, Number 2 The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PATRICIA BERGLUND, 2 WAI TAT CHIU, 1 OLGA DEMLER, 1 STEVEN HEERINGA, 2 EVA HIRIPI, 1 ROBERT JIN, 1 BETH-ELLEN PENNELL, 2 ELLEN E. WALTERS, 1 ALAN ZASLAVSKY, 1 HUI ZHENG 1 1 Department of Health Care Policy, Harvard Medical School, Boston MA, USA 2 Institute for Social Research, University of Michigan, Ann Arbor MI, USA ABSTRACT The National Comorbidity Survey Replication (NCS-R) is a survey of the prevalence and correlates of mental disorders in the US that was carried out between February 2001 and April 2003. Interviews were administered face-to-face in the homes of respondents, who were selected from a nationally representative multi-stage clustered area probability sample of households. A total of 9,282 interviews were completed in the main survey and an additional 554 short non-response interviews were completed with initial non-respondents. This paper describes the main features of the NCS-R design and field procedures, including information on fieldwork organization and procedures, sample design, weighting and considerations in the use of design-based versus model-based estimation. Empirical information is presented on non-response bias, design effect, and the trade-off between bias and efficiency in minimizing total mean- squared error of estimates by trimming weights. Key words: design effects, epidemiologic research design, sample bias, sample weights, survey design efficiency, survey sampling literature on survey methodology (Groves, Fowler, Couper, Lepkowski, Singer and Tourangeau, in press) and the fourth of which is based on considerations unique to the NCS-R. First, the coverage properties of an area probability sample are superior to other samples such as those used in telephone, mail, or Internet surveys. Second, the accuracy of screening and household enumeration procedures, which are required to create a probability sample, is greater in face-to-face surveys than in surveys based on these other modes of data collection. Third, response rates are generally much higher in face-to-face surveys than in those based on other modes of data collec- tion. Fourth, the NCS-R interview schedule was quite long and highly complex, making it impossible to use these other modes effectively. Although the above four considerations were sufficient to convince us that a face-to-face survey mode was needed, there were also additional advan- tages of this mode that we recognized and used to our advantage. One was related to the issue of length and The current paper presents an overview of the National Comorbidity Survey Replication (NCS-R) survey design and field procedures. We also discuss weighting and design effects. Although the instru- ment used in the NCS-R is described in a separate report in this issue (Kessler and Üstün, 2004), there is some discussion of broad outlines of the interview, with implications for the design of the survey. Survey mode The NCS-R was carried in the homes of a nationally representative sample of respondents between February 2001 and April 2003. The survey was administered using laptop computer-assisted personal interview (CAPI) methods by professional survey interviewers employed by the Survey Research Center (SRC) of the Institute for Social Research at the University of Michigan. The decision to use face- to-face administration rather than telephone, mail, or Internet administration was based on four main factors, the first three of which come from the

Upload: others

Post on 02-Feb-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

69International Journal of Methods in Psychiatric Research Volume 13 Number 2

The US National Comorbidity SurveyReplication (NCS-R) design and field proceduresRONALD C KESSLER1 PATRICIA BERGLUND2 WAI TAT CHIU1 OLGA DEMLER1 STEVEN

HEERINGA2 EVA HIRIPI1 ROBERT JIN1 BETH-ELLEN PENNELL2 ELLEN E WALTERS1 ALANZASLAVSKY1 HUI ZHENG1

1 Department of Health Care Policy Harvard Medical School Boston MA USA2 Institute for Social Research University of Michigan Ann Arbor MI USA

ABSTRACT The National Comorbidity Survey Replication (NCS-R) is a survey of the prevalence and correlates ofmental disorders in the US that was carried out between February 2001 and April 2003 Interviews were administeredface-to-face in the homes of respondents who were selected from a nationally representative multi-stage clustered areaprobability sample of households A total of 9282 interviews were completed in the main survey and an additional 554short non-response interviews were completed with initial non-respondents This paper describes the main features ofthe NCS-R design and field procedures including information on fieldwork organization and procedures sampledesign weighting and considerations in the use of design-based versus model-based estimation Empirical information ispresented on non-response bias design effect and the trade-off between bias and efficiency in minimizing total mean-squared error of estimates by trimming weights

Key words design effects epidemiologic research design sample bias sample weights survey design efficiency survey sampling

literature on survey methodology (Groves FowlerCouper Lepkowski Singer and Tourangeau in press)and the fourth of which is based on considerationsunique to the NCS-R First the coverage propertiesof an area probability sample are superior to othersamples such as those used in telephone mail orInternet surveys Second the accuracy of screeningand household enumeration procedures which arerequired to create a probability sample is greater inface-to-face surveys than in surveys based on theseother modes of data collection Third response ratesare generally much higher in face-to-face surveysthan in those based on other modes of data collec-tion Fourth the NCS-R interview schedule wasquite long and highly complex making it impossibleto use these other modes effectively

Although the above four considerations weresufficient to convince us that a face-to-face surveymode was needed there were also additional advan-tages of this mode that we recognized and used to ouradvantage One was related to the issue of length and

The current paper presents an overview of theNational Comorbidity Survey Replication (NCS-R)survey design and field procedures We also discussweighting and design effects Although the instru-ment used in the NCS-R is described in a separatereport in this issue (Kessler and Uumlstuumln 2004) thereis some discussion of broad outlines of the interviewwith implications for the design of the survey

Survey modeThe NCS-R was carried in the homes of a nationallyrepresentative sample of respondents betweenFebruary 2001 and April 2003 The survey wasadministered using laptop computer-assisted personalinterview (CAPI) methods by professional surveyinterviewers employed by the Survey ResearchCenter (SRC) of the Institute for Social Research atthe University of Michigan The decision to use face-to-face administration rather than telephone mailor Internet administration was based on four mainfactors the first three of which come from the

IJMPR 132 3rd v4 25604 1026 am Page 69

Kessler Berglund et al70

complexity of the interview which can lead to highrespondent burden in some cases The face-to-facesurvey mode made it possible for interviewers togauge respondent fatigue and to suggest short breaksif respondents needed time to regain their focusAlong the same lines interviews with respondentswho had complex histories of psychopathology wereoften broken up into two or more sessions that werespread out over a period of days or even weeks In arelated way use of the face-to-face mode minimizedthe problem of the respondent prematurely haltingthe interview (which is common in telephoneadministration) or completing only part of the assess-ment (which is common in mail questionnaires andInternet surveys) Break-offs of this sort are rare inin-person surveys This is especially true when as in the NCS-R interviewers are trained to monitorrespondent fatigue and to suggest breaks Consistentwith this thinking only 107 out of 9389 initialNCS-R respondents broke off an interview

The decision to use CAPI rather than paper-and-pencil (PAPI) administration was based on the factthat the interview schedule had many complex skipsThese skips create opportunities for interviewer errorin PAPI that are avoided in CAPI due to thecomputer controlling the skip logic Computer-assisted personal interviews can also be cost-effectivewhen the sample size is large because the investmentin the application programming is less than thelabour needed to keypunch PAPI responses An addi-tional appeal of CAPI compared to PAPI is that theinterviewer can be prompted for missing or inconsis-tent responses while the interview is in progressallowing these problems to be resolved immediately

Given the fact that the NCS-R interview askedabout a number of embarrassing feelings and behav-iours a question could be raised whether the methodof audio computer-assisted self-administered inter-viewing (A-CASI) should have been used instead ofmore conventional CAPI The use of A-CASI allowsrespondents to enter answers to embarrassing ques-tions into a laptop without the interviewer knowingtheir answers by using digital audio recordings andheadsets connected to the laptop to administer thesurvey questions There is now impressive evidencesuggesting that A-CASI leads to significantly higherreports of some illegal or embarrassing behaviours(Turner Lessler and Devore 1982 Tourangeau andSmith 1998 Turner Ku Rogers Lindberg Pleck

and Sonenstein 1998) Our decision not to use A-CASI despite this evidence was based on a concernabout non-comparability of responses for purposes oftrending with the baseline NCS and also with timingconsiderations Regarding the latter the field periodfor the NCS-R was set back more than 2 years fromits original start date because of unplanned complexi-ties in mounting a parallel national survey ofadolescent mental health The use of A-CASI wouldhave added to this delay

As noted above it was sometimes necessary toadminister an interview over two or more sessionsMost of the follow-up sessions were carried out inperson in the homes of respondents Interviewerswere allowed to complete follow-up sessions over thetelephone in three kinds of situations (1) when therespondent requested telephone administration forthe remaining questions (2) when the respondentlived in a remote rural area that required a great dealof travel time for the interviewer to reach and (3)when the remaining questions were few in numberIn each of these three situations the interviewer lefta respondent booklet with the respondent to be usedas a visual aid in completing the remainder of theinterview by telephone

The NCS-R interview scheduleAs described in more detail by Kessler and Uumlstuumln(2004) the NCS-R interview schedule was theversion of the World Health Organization (WHO)Composite International Diagnostic Interview(CIDI) that was developed for the WHO WorldMental Health (WMH) Survey Initiative Thisinstrument is referred to as the WMH-CIDI A smallnumber of supplemental sections were also includedthat were unique to the US version of the survey Ahard copy of the instrument is posted atwwwhcpmedharvardeduncs Only one importantaspect of the interview is discussed here ndash its length ndashas this had important implications for design andfield procedures

The NCS-R interview schedule was quite long Ittook a minimum of 90 minutes to complete amongrespondents who reported no lifetime disorders anaverage of approximately 2 hours and 30 minutesamong people with a history of disorder and as longas 5 to 6 hours among respondents with a verycomplex history of many different disorders Thislong length was due to the fact that the scientific

IJMPR 132 3rd v4 25604 1026 am Page 70

NCS-R design and field procedures 71

questions guiding instrument development requiredus to assess a wide variety of topics in the samesample of respondents Some of the sections couldhave been administered independently of the maininterview in a separate survey without undercuttingthe investigation of the issues considered in thosesections However given that major mental healthsurveys like the NCS-R are only funded once everydecade we were eager to keep all of these sections inthe instrument

The problem of interview length was addressed inthree ways First we carefully reviewed each questionin the interview to make sure it added value We alsocarefully evaluated each skip instruction to makesure that we were skipping respondents out ofsections as soon as we had the information needed toevaluate the issues under consideration This wasespecially important in the diagnostic sectionswhere it was possible to skip respondents once itbecame clear that they failed to meet any symptomrequired for a diagnosis Given that one of our aimswas to explore sub-threshold diagnoses though wehad to balance the desire to invoke skips with theneed to obtain sub-threshold information

Second we evaluated the data-analysis goals todetermine if they could be achieved by administeringthe section to a probability subsample of respondentsrather than to all respondents For example onesection replicated questions about psychologicaldistress that were originally developed for the 1957Americans View Their Mental Health survey(Gurin Veroff and Feld 1960) and were laterrepeated in a 1976 replication of that survey (VeroffDouvan and Kulka 1981) The aim was to merge thearchival individual-level data files from these earliersurveys with NCS-R data to study trends in self-

reported psychological distress over time Howeveras the earlier surveys were based on samples of1800ndash2100 respondents there would have beenlittle incremental improvement in the statisticalpower of trend comparisons if we administered thesequestions to the entire sample This section wasconsequently administered to a 30 subsampleSimilar subsampling was used to evaluate familyburden and to assess a number of disorders that were either included for exploratory purposes (forexample adult attention-deficithyperactivitydisorder adult separation anxiety disorder) or thatrequired in-depth assessment (non-affectivepsychosis obsessive-compulsive disorder)

Third the interview schedule was divided intotwo parts Part I administered to all respondentsincluded all core WMH-CIDI disorders The admin-istration time of Part I averaged 338 minutes andhad an inter-quartile range between 226 and 398minutes (Table 1) Part II included assessments ofrisk factors consequences services and other corre-lates of the core disorders Part II also includedassessments of additional disorders that were eitherof secondary importance or that were very time-consuming to assess The administration time of PartII had a mean of 1094 minutes a median of 1017minutes and an inter-quartile range between 839and 1241 minutes In order to reduce respondentburden Part II was administered only to 5692 of the9282 NCS-R respondents over-sampling those withclinically significant psychopathology All respon-dents who did not receive Part II were administered abrief demographic battery and were then eitherterminated or sampled in their appropriate propor-tions into sub-sampled interview sections that aredescribed below

Table 1 Administration time (minutes) of the Part I and Part II NCS-R interview schedule

Among respondents who completedPercentile Part I only Part I and Part II

Part I Part II

0 24 01 0125 226 335 83950 297 492 101775 398 708 124199 1029 1824 2565

Mean 338 573 1094

IJMPR 132 3rd v4 25604 1026 am Page 71

Kessler Berglund et al72

Selection into Part II was controlled by the CAPIprogram which divided respondents into three stratabased on their Part I responses The first stratumconsisted of respondents who either (1) met lifetimecriteria for at least one of the mental disordersassessed in Part I (2) met subthreshold lifetimecriteria for any of these disorders and sought treat-ment for at least one of them at some time in theirlife or (3) ever in their life either made a plan tocommit suicide or attempted suicide All of thesepeople were administered Part II The secondstratum consisted of respondents who although notmeeting criteria for membership in the first stratumgave responses in Part I indicating that they either(1) ever met subthreshold criteria for any of the PartI disorders (2) ever sought treatment for anyemotional or substance problem (3) ever hadsuicidal ideation or (4) used any psychotropicmedications (whether or not under the direct super-vision of a physician) in the past 12 months to treatemotional problems A probability sample of 59 ofthe respondents in this second stratum was selectedto receive Part II The third stratum consisted of allother respondents of whom 25 were selected toreceive Part II

As virtually all planned analyses of the NCS-Rdata feature comparisons of cases with non-cases andas the number of non-cases substantially exceeds thenumber of cases the under-sampling of non-cases inthe second and third strata has only a small effect onstatistical power to study correlates of disorder Thisassertion is demonstrated empirically later in thepaper An additional efficiency of the Part II sampleis that respondents in the second and third stratawere selected with probabilities proportional to theirhousehold size This approach which is described inmore detail below is opposite of the procedure usedin initial sample selection leading to a partialcancelling out of the variation in the within house-hold probability of selection weight

The survey populationThe survey was designed to be representative ofEnglish-speaking adults ages 18 or older living in thenon-institutionalized civilian household populationof the coterminous US (excluding Alaska andHawaii) plus students living in campus grouphousing who have a permanent household address

Fieldwork organization and proceduresAs noted above the NCS-R fieldwork was carriedout by the professional SRC national field interviewstaff Over 300 interviewers participated in datacollection The SRC field staff was supervised by ateam of 18 experienced regional supervisorsSupervisors in larger regions also had team leaderswho worked with them A study manager located atthe central SRC facility in Michigan oversaw thework of the supervisors and their staff

After sample selection (see below) each inter-viewer received a folder from his or her supervisor foreach household in which an interview was to beobtained An advance letter and study fact brochurewere sent to each of these households at a timedesigned to arrive a few days before the interviewermade their first contact attempt The letterexplained the purpose of the study and gave a toll-free number for respondents who had additionalquestions The study fact brochure containedanswers to frequently asked questions Upon makingin-person contact with the household the inter-viewer explained the study once again and obtaineda household listing This listing was then used toselect a random respondent in the household Therandom respondent was approached the interviewwas explained and verbal informed consent wasobtained Respondents were given $50 as a token ofappreciation for participating in the survey It shouldbe noted that verbal rather than written informedconsent was obtained because the NCS-R wasdesigned as a trend study replication of the baselineNCS which used verbal informed consent

In cases where the interviewer had difficultycontacting the household or in which the householdwas reluctant to participate persuasion letters weresent to the household Sixty days before the end of thefield period a special effort was made to recruit asmany unresolved cases as possible by sending a specialrecruitment letter and offering a higher financialincentive ($100) to complete a truncated version ofthe interview either in person or over the telephoneInterviewers were allowed to make unlimited in-personcontact attempts to complete these final interviewsand were given financial incentives for completingthese interviews during the closeout period

Survey Research Center interviewers were paid bythe hour rather than by the interview This is

IJMPR 132 3rd v4 25604 1026 am Page 72

NCS-R design and field procedures 73

important in light of evidence that interviewers paidby the interview tend to get low response ratesbecause they focus their efforts on interviewing easy-to-recruit respondents In the case of theWMH-CIDI where there is great variation in lengthof interview per-interview payment rather than per-hour payment also encourages interviewers to rushthrough long interviews We wanted to avoid thisand in fact we encouraged interviewers duringtraining to work especially hard at long interviewsbecause respondents with long interviews are gener-ally those with complex histories of psychopathologywhich are of special interest to the project We alsoencouraged interviewers to set up appointments forsecond and third interview sessions to complete longinterviews It is easy to facilitate such procedureswhen interviewers are paid by the hour

The Human Subjects Committees of bothHarvard Medical School (HMS) and the Universityof Michigan approved these recruitment consentand field procedures

Interviewer training Each professional SRC interviewer must complete atwo-day general interviewer training (GIT) coursebefore working on any SRC survey Moreover expe-rienced interviewers have to complete GIT refreshercourses on a periodic basis Each interviewer whoworked on NCS-R also received 7 days of study-specific training Each interviewer had to completean NCS-R certification test that involved adminis-tering a series of practice interviews with scriptedresponses before beginning production work

Fieldwork quality control As noted above sample households were selectedcentrally to avoid interviewers selectively recruitingrespondents from attractive neighbourhoods Therandom respondent in the household was selectedusing a standardized method that minimizes theprobability of interviewer cheating to select easy-to-recruit household members In addition the CAPIprogram controlled skip logic and had a built-inclock to record speed of data entry making it difficultfor interviewers to truncate interviews by skippingsections or to fake parts of interviews by filling in thelast sections quickly

Despite these structural disincentives to error and

cheating supervisors checked for all of these types oferror Supervisors also made contact with a random10 of interviewed households to confirm thehousehold address household enumeration andrandom selection procedures Supervisors alsoconfirmed the length of the interview and repeated arandom sample of questions in order to make surethat interviewers administered the full interview torespondents and to make sure responses wererecorded accurately

Survey Research Center field procedures called forcompleted CAPI interviews to be sent electronicallyby interviewers every night This allowed supervisorsto review the completeness of open-ended responsesand to make other quality control checks of the dataon a daily basis In cases where problems weredetected the interviewers were contacted andinstructed to re-contact the respondent to obtainmissing data

The SRC uses a computerized survey tracking soft-ware system to facilitate field quality control Thenumber of interviews completed outstandingresponse rate hours per interview and so forth areall recorded on a constantly updated basis by thissystem at the interviewer level along with bench-mark comparisons Supervisors can at a glance callinto the SRC computer server to monitor thesestatistics and go over them with individual inter-viewers in their weekly phone calls These same dataare available to study staff The supervisors SRCand HMS staff used these systems to pinpoint inter-viewers with low response rates in the first replicatefor remedial re-training Interviewers who persistedin low performance or who were found to makeconscious errors were terminated from the study

In the case of interviews with stem-branch logiclike the WMH-CIDI an additional type of potentialproblem is that interviewers who want to shorten thelength of the interview can do so by entering nega-tive responses to diagnostic stem questions evenwhen respondents endorse these questions As notedearlier SRC interviewers are paid by the hour in aneffort to avoid providing an incentive for this type ofcheating but we have seen clear evidence of it insurveys where interviewers were paid by the inter-view and supervision was only minimal Thereforein the NCS-R as in the other WMH surveys inter-viewer-level data were monitored to look for

IJMPR 132 3rd v4 25604 1026 am Page 73

Kessler Berglund et al74

evidence of systematic patterns of under-reportingdiagnostic stem questions Consistent with the ratio-nale for paying interviewers by the hour no suchevidence was found in the NCS-R

The sample design

Household sample selection proceduresRespondents were selected from a four-stage areaprobability sample of the non-institutionalizedcivilian population using small area data collected bythe US Bureau of the Census from the year 2000census of the US population to select the first twostages of the sample

The first stage of sampling selected a probabilitysample of 62 primary sampling units (PSUs) repre-sentative of the population These PSUs were linkedto the original PSUs used in the baseline NCS inorder to maximize the efficiency of cross-timecomparison Each PSU consisted of all counties in acensus-defined metropolitan statistical area (MSA)or in the case of counties not in an MSA of indi-vidual counties Primary sampling units wereselected with probabilities proportional to size (PPS)and geographic stratification from all possiblesegments in the country

The 62 PSUs include 16 MSAs that entered thesample with certainty an additional 31 non-certainty MSAs and 15 non-MSA counties Thecertainty selections include in order of size NewYork City Los Angeles Chicago PhiladelphiaDetroit San Francisco Washington DC DallasFortWorth Houston Boston Nassau-Suffolk NY StLouis Pittsburgh Baltimore Minneapolis andAtlanta These 16 PSUs are referred to as lsquoself-representingrsquo PSUs because they were not selectedrandomly to represent other MSAs but are so largethat they represent themselves Rigorously speakingthese 16 are not PSUs in the technical sense of thatterm but rather population strata The other 46PSUs are lsquonon-self-representingrsquo in the sense thatthey were selected to be representative of smallerareas in the country Systematic selection from anordered list was used to select the non-self-repre-senting PSUs with the order building in secondarystratification for geographic variation across thecountry After selection each of the three largestself-representing PSUs (New York Los AngelesChicago) was divided into four pseudo-PSUs while

each of the remaining 13 self-representing PSUs wasdivided into two pseudo-PSUs When combinedwith the 46 non-self-representing PSUs this yieldeda sample of 84 PSUs and pseudo-PSUs We hence-forth refer to both PSUs and pseudo-PSUs as PSUs

The second stage of sampling divided each PSUinto segments of between 50 and 100 housing unitsbased on 2000 small-area census data and selected aprobability sample of 12 such segments from eachnon-self-representing PSU A larger number ofsegments were selected from the self-representingPSUs using the ratio of the population size over thesystematic sampling interval Within each PSU thesegments were selected systematically from anordered list with probabilities of selection propor-tional to size The order in the list built in secondarystratification for geographic variation A total of1001 area segments were selected in the entiresample

Once the sample segments were selected eachsegment was either visited by an interviewer torecord the addresses of all housing units (HUs) (inthe case of segments that were not included in thebaseline NCS) or the listing used in the baselineNCS was updated for new construction and demoli-tion (in the case of segments that were included inthe baseline NCS) These lists were entered into acentralized computer data file In order to adjust fordiscrepancies between expected and observednumbers of HUs a random sample of HUs wasselected that equals 10 times OE where O is theobserved number of households listed in the segmentand E is the number of HUs expected in the segmentfrom the Census data files

This three-stage design creates a sample in whichthe probability of any individual HU being selectedto participate in the survey is equal for every HU inthe coterminous US The fourth stage of selectionthen obtained a household listing of all residents inthe age range 18 and older from a household infor-mant The informant was also asked whether eachadult HU resident spoke English Once the HUlisting was obtained a probability procedure wasused to select one or in some cases (see below) tworespondents to be interviewed As the number ofsampled cases in the HU did not increase proportion-ally with increase in HU size there is a bias in thisapproach toward underrepresenting people who livein large HUs As described below this bias was

IJMPR 132 3rd v4 25604 1026 am Page 74

NCS-R design and field procedures 75

corrected by weighting the data to adjust for thedifferential probability of selection within HUs

Procedures for sampling students living away from homeThe largest segment of the US population not livingin HUs consists of students who live in campus grouphousing (for example dormitories fraternities andsororities) We included these students in thesampling frame if they had a permanent homeaddress (typically the home of their parents) byincluding them in HU listings in all sample house-holds If they were selected for interview theircontact information at school was obtained andspecial arrangements were made to interview themStudents living in off-campus private housing ratherthan campus group housing were eligible for sampleselection in their school residences Students of thislatter sort were not included in the HU listings oftheir permanent residences in order to avoid givingsuch students two chances to enter the sample

Within-household sample selection proceduresRecruitment of HUs began by the interviewermailing an advance letter and study fact brochure tothe HU These materials explained the purposes ofthe study described the funding sources and thesurvey organization that was carrying out the surveylisted the names and affiliations of the senior scien-tists involved in the research provided informationabout the content and length of the interviewdescribed confidentiality procedures clearly statedthat participation was voluntary and provided a toll-free number for respondents who had additionalquestions The advance letter said that the inter-viewer would make an in-person visit to the HUwithin the next week in order to answer anyremaining questions and to determine whether theHU resident selected to participate would be willingto do so

In cases where it was difficult to find a HU resi-dent at home at least 15 in-person call attemptswere made to each sample HU to make an initialcontact with a HU member Additional contactattempts were made by telephone using a reversedirectory and by leaving notes at the HU with thestudy toll-free number and asking someone in theHU to call the study office to schedule an appoint-ment Additional in-person contact attempts weremade based on the discretion of the supervisor

Once a HU resident was contacted a listing of alleligible HU residents was made that recorded the ageand sex of each such person confirmed that eachcould speak English and ensured that permanentresidents who were students living away from homein campus group housing were included The Kishtable selection method was used to select one eligibleperson as the primary predesignated respondent Inaddition in a probability sample of households inwhich there were at least two eligible residents asecond predesignated respondent was selected inorder to study within-household aggregation ofmental disorders and to reduce variation across HUsin within-household probability of selection into thesample No attempt was made to select or to inter-view a secondary respondent unless the primaryrespondent was interviewed first This decision wasmade to avoid compromising the response rate forprimary respondents

The sampling fraction for the second predesig-nated respondent was lower in HUs with only two eligible respondents than those with three ormore eligible respondents in order to reduce variancein the within-household probability of selectionweight No more than two interviews were taken perHU even when the number of HU residents was largebecause we were concerned that taking more thantwo interviews would adversely affect the responserate by increasing household burden Moreover welimited selection of a second respondent to 25 ofHUs because we were concerned that using thisprocedure in a larger proportion of HUs in a samplewith a target of 10000 interviews would adverselyaffect the geographic dispersion of the sample byreducing the total number of HUs in the sample

Interviewing hard-to-recruit casesHard-to-recruit cases included both HUs that weredifficult to contact because the residents were usuallyaway from home and respondents who were reluctantto participate once they were reached The first ofthese problems was addressed by being persistent inmaking contact attempts at all times of day and alldays of the week We also left notes at the HUswhere we were consistently unable to make contactsthat included toll free numbers for residents tocontact us to schedule interview appointments Thesecond problem was addressed by offering a $50financial incentive to all respondents and by using a

IJMPR 132 3rd v4 25604 1026 am Page 75

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 2: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al70

complexity of the interview which can lead to highrespondent burden in some cases The face-to-facesurvey mode made it possible for interviewers togauge respondent fatigue and to suggest short breaksif respondents needed time to regain their focusAlong the same lines interviews with respondentswho had complex histories of psychopathology wereoften broken up into two or more sessions that werespread out over a period of days or even weeks In arelated way use of the face-to-face mode minimizedthe problem of the respondent prematurely haltingthe interview (which is common in telephoneadministration) or completing only part of the assess-ment (which is common in mail questionnaires andInternet surveys) Break-offs of this sort are rare inin-person surveys This is especially true when as in the NCS-R interviewers are trained to monitorrespondent fatigue and to suggest breaks Consistentwith this thinking only 107 out of 9389 initialNCS-R respondents broke off an interview

The decision to use CAPI rather than paper-and-pencil (PAPI) administration was based on the factthat the interview schedule had many complex skipsThese skips create opportunities for interviewer errorin PAPI that are avoided in CAPI due to thecomputer controlling the skip logic Computer-assisted personal interviews can also be cost-effectivewhen the sample size is large because the investmentin the application programming is less than thelabour needed to keypunch PAPI responses An addi-tional appeal of CAPI compared to PAPI is that theinterviewer can be prompted for missing or inconsis-tent responses while the interview is in progressallowing these problems to be resolved immediately

Given the fact that the NCS-R interview askedabout a number of embarrassing feelings and behav-iours a question could be raised whether the methodof audio computer-assisted self-administered inter-viewing (A-CASI) should have been used instead ofmore conventional CAPI The use of A-CASI allowsrespondents to enter answers to embarrassing ques-tions into a laptop without the interviewer knowingtheir answers by using digital audio recordings andheadsets connected to the laptop to administer thesurvey questions There is now impressive evidencesuggesting that A-CASI leads to significantly higherreports of some illegal or embarrassing behaviours(Turner Lessler and Devore 1982 Tourangeau andSmith 1998 Turner Ku Rogers Lindberg Pleck

and Sonenstein 1998) Our decision not to use A-CASI despite this evidence was based on a concernabout non-comparability of responses for purposes oftrending with the baseline NCS and also with timingconsiderations Regarding the latter the field periodfor the NCS-R was set back more than 2 years fromits original start date because of unplanned complexi-ties in mounting a parallel national survey ofadolescent mental health The use of A-CASI wouldhave added to this delay

As noted above it was sometimes necessary toadminister an interview over two or more sessionsMost of the follow-up sessions were carried out inperson in the homes of respondents Interviewerswere allowed to complete follow-up sessions over thetelephone in three kinds of situations (1) when therespondent requested telephone administration forthe remaining questions (2) when the respondentlived in a remote rural area that required a great dealof travel time for the interviewer to reach and (3)when the remaining questions were few in numberIn each of these three situations the interviewer lefta respondent booklet with the respondent to be usedas a visual aid in completing the remainder of theinterview by telephone

The NCS-R interview scheduleAs described in more detail by Kessler and Uumlstuumln(2004) the NCS-R interview schedule was theversion of the World Health Organization (WHO)Composite International Diagnostic Interview(CIDI) that was developed for the WHO WorldMental Health (WMH) Survey Initiative Thisinstrument is referred to as the WMH-CIDI A smallnumber of supplemental sections were also includedthat were unique to the US version of the survey Ahard copy of the instrument is posted atwwwhcpmedharvardeduncs Only one importantaspect of the interview is discussed here ndash its length ndashas this had important implications for design andfield procedures

The NCS-R interview schedule was quite long Ittook a minimum of 90 minutes to complete amongrespondents who reported no lifetime disorders anaverage of approximately 2 hours and 30 minutesamong people with a history of disorder and as longas 5 to 6 hours among respondents with a verycomplex history of many different disorders Thislong length was due to the fact that the scientific

IJMPR 132 3rd v4 25604 1026 am Page 70

NCS-R design and field procedures 71

questions guiding instrument development requiredus to assess a wide variety of topics in the samesample of respondents Some of the sections couldhave been administered independently of the maininterview in a separate survey without undercuttingthe investigation of the issues considered in thosesections However given that major mental healthsurveys like the NCS-R are only funded once everydecade we were eager to keep all of these sections inthe instrument

The problem of interview length was addressed inthree ways First we carefully reviewed each questionin the interview to make sure it added value We alsocarefully evaluated each skip instruction to makesure that we were skipping respondents out ofsections as soon as we had the information needed toevaluate the issues under consideration This wasespecially important in the diagnostic sectionswhere it was possible to skip respondents once itbecame clear that they failed to meet any symptomrequired for a diagnosis Given that one of our aimswas to explore sub-threshold diagnoses though wehad to balance the desire to invoke skips with theneed to obtain sub-threshold information

Second we evaluated the data-analysis goals todetermine if they could be achieved by administeringthe section to a probability subsample of respondentsrather than to all respondents For example onesection replicated questions about psychologicaldistress that were originally developed for the 1957Americans View Their Mental Health survey(Gurin Veroff and Feld 1960) and were laterrepeated in a 1976 replication of that survey (VeroffDouvan and Kulka 1981) The aim was to merge thearchival individual-level data files from these earliersurveys with NCS-R data to study trends in self-

reported psychological distress over time Howeveras the earlier surveys were based on samples of1800ndash2100 respondents there would have beenlittle incremental improvement in the statisticalpower of trend comparisons if we administered thesequestions to the entire sample This section wasconsequently administered to a 30 subsampleSimilar subsampling was used to evaluate familyburden and to assess a number of disorders that were either included for exploratory purposes (forexample adult attention-deficithyperactivitydisorder adult separation anxiety disorder) or thatrequired in-depth assessment (non-affectivepsychosis obsessive-compulsive disorder)

Third the interview schedule was divided intotwo parts Part I administered to all respondentsincluded all core WMH-CIDI disorders The admin-istration time of Part I averaged 338 minutes andhad an inter-quartile range between 226 and 398minutes (Table 1) Part II included assessments ofrisk factors consequences services and other corre-lates of the core disorders Part II also includedassessments of additional disorders that were eitherof secondary importance or that were very time-consuming to assess The administration time of PartII had a mean of 1094 minutes a median of 1017minutes and an inter-quartile range between 839and 1241 minutes In order to reduce respondentburden Part II was administered only to 5692 of the9282 NCS-R respondents over-sampling those withclinically significant psychopathology All respon-dents who did not receive Part II were administered abrief demographic battery and were then eitherterminated or sampled in their appropriate propor-tions into sub-sampled interview sections that aredescribed below

Table 1 Administration time (minutes) of the Part I and Part II NCS-R interview schedule

Among respondents who completedPercentile Part I only Part I and Part II

Part I Part II

0 24 01 0125 226 335 83950 297 492 101775 398 708 124199 1029 1824 2565

Mean 338 573 1094

IJMPR 132 3rd v4 25604 1026 am Page 71

Kessler Berglund et al72

Selection into Part II was controlled by the CAPIprogram which divided respondents into three stratabased on their Part I responses The first stratumconsisted of respondents who either (1) met lifetimecriteria for at least one of the mental disordersassessed in Part I (2) met subthreshold lifetimecriteria for any of these disorders and sought treat-ment for at least one of them at some time in theirlife or (3) ever in their life either made a plan tocommit suicide or attempted suicide All of thesepeople were administered Part II The secondstratum consisted of respondents who although notmeeting criteria for membership in the first stratumgave responses in Part I indicating that they either(1) ever met subthreshold criteria for any of the PartI disorders (2) ever sought treatment for anyemotional or substance problem (3) ever hadsuicidal ideation or (4) used any psychotropicmedications (whether or not under the direct super-vision of a physician) in the past 12 months to treatemotional problems A probability sample of 59 ofthe respondents in this second stratum was selectedto receive Part II The third stratum consisted of allother respondents of whom 25 were selected toreceive Part II

As virtually all planned analyses of the NCS-Rdata feature comparisons of cases with non-cases andas the number of non-cases substantially exceeds thenumber of cases the under-sampling of non-cases inthe second and third strata has only a small effect onstatistical power to study correlates of disorder Thisassertion is demonstrated empirically later in thepaper An additional efficiency of the Part II sampleis that respondents in the second and third stratawere selected with probabilities proportional to theirhousehold size This approach which is described inmore detail below is opposite of the procedure usedin initial sample selection leading to a partialcancelling out of the variation in the within house-hold probability of selection weight

The survey populationThe survey was designed to be representative ofEnglish-speaking adults ages 18 or older living in thenon-institutionalized civilian household populationof the coterminous US (excluding Alaska andHawaii) plus students living in campus grouphousing who have a permanent household address

Fieldwork organization and proceduresAs noted above the NCS-R fieldwork was carriedout by the professional SRC national field interviewstaff Over 300 interviewers participated in datacollection The SRC field staff was supervised by ateam of 18 experienced regional supervisorsSupervisors in larger regions also had team leaderswho worked with them A study manager located atthe central SRC facility in Michigan oversaw thework of the supervisors and their staff

After sample selection (see below) each inter-viewer received a folder from his or her supervisor foreach household in which an interview was to beobtained An advance letter and study fact brochurewere sent to each of these households at a timedesigned to arrive a few days before the interviewermade their first contact attempt The letterexplained the purpose of the study and gave a toll-free number for respondents who had additionalquestions The study fact brochure containedanswers to frequently asked questions Upon makingin-person contact with the household the inter-viewer explained the study once again and obtaineda household listing This listing was then used toselect a random respondent in the household Therandom respondent was approached the interviewwas explained and verbal informed consent wasobtained Respondents were given $50 as a token ofappreciation for participating in the survey It shouldbe noted that verbal rather than written informedconsent was obtained because the NCS-R wasdesigned as a trend study replication of the baselineNCS which used verbal informed consent

In cases where the interviewer had difficultycontacting the household or in which the householdwas reluctant to participate persuasion letters weresent to the household Sixty days before the end of thefield period a special effort was made to recruit asmany unresolved cases as possible by sending a specialrecruitment letter and offering a higher financialincentive ($100) to complete a truncated version ofthe interview either in person or over the telephoneInterviewers were allowed to make unlimited in-personcontact attempts to complete these final interviewsand were given financial incentives for completingthese interviews during the closeout period

Survey Research Center interviewers were paid bythe hour rather than by the interview This is

IJMPR 132 3rd v4 25604 1026 am Page 72

NCS-R design and field procedures 73

important in light of evidence that interviewers paidby the interview tend to get low response ratesbecause they focus their efforts on interviewing easy-to-recruit respondents In the case of theWMH-CIDI where there is great variation in lengthof interview per-interview payment rather than per-hour payment also encourages interviewers to rushthrough long interviews We wanted to avoid thisand in fact we encouraged interviewers duringtraining to work especially hard at long interviewsbecause respondents with long interviews are gener-ally those with complex histories of psychopathologywhich are of special interest to the project We alsoencouraged interviewers to set up appointments forsecond and third interview sessions to complete longinterviews It is easy to facilitate such procedureswhen interviewers are paid by the hour

The Human Subjects Committees of bothHarvard Medical School (HMS) and the Universityof Michigan approved these recruitment consentand field procedures

Interviewer training Each professional SRC interviewer must complete atwo-day general interviewer training (GIT) coursebefore working on any SRC survey Moreover expe-rienced interviewers have to complete GIT refreshercourses on a periodic basis Each interviewer whoworked on NCS-R also received 7 days of study-specific training Each interviewer had to completean NCS-R certification test that involved adminis-tering a series of practice interviews with scriptedresponses before beginning production work

Fieldwork quality control As noted above sample households were selectedcentrally to avoid interviewers selectively recruitingrespondents from attractive neighbourhoods Therandom respondent in the household was selectedusing a standardized method that minimizes theprobability of interviewer cheating to select easy-to-recruit household members In addition the CAPIprogram controlled skip logic and had a built-inclock to record speed of data entry making it difficultfor interviewers to truncate interviews by skippingsections or to fake parts of interviews by filling in thelast sections quickly

Despite these structural disincentives to error and

cheating supervisors checked for all of these types oferror Supervisors also made contact with a random10 of interviewed households to confirm thehousehold address household enumeration andrandom selection procedures Supervisors alsoconfirmed the length of the interview and repeated arandom sample of questions in order to make surethat interviewers administered the full interview torespondents and to make sure responses wererecorded accurately

Survey Research Center field procedures called forcompleted CAPI interviews to be sent electronicallyby interviewers every night This allowed supervisorsto review the completeness of open-ended responsesand to make other quality control checks of the dataon a daily basis In cases where problems weredetected the interviewers were contacted andinstructed to re-contact the respondent to obtainmissing data

The SRC uses a computerized survey tracking soft-ware system to facilitate field quality control Thenumber of interviews completed outstandingresponse rate hours per interview and so forth areall recorded on a constantly updated basis by thissystem at the interviewer level along with bench-mark comparisons Supervisors can at a glance callinto the SRC computer server to monitor thesestatistics and go over them with individual inter-viewers in their weekly phone calls These same dataare available to study staff The supervisors SRCand HMS staff used these systems to pinpoint inter-viewers with low response rates in the first replicatefor remedial re-training Interviewers who persistedin low performance or who were found to makeconscious errors were terminated from the study

In the case of interviews with stem-branch logiclike the WMH-CIDI an additional type of potentialproblem is that interviewers who want to shorten thelength of the interview can do so by entering nega-tive responses to diagnostic stem questions evenwhen respondents endorse these questions As notedearlier SRC interviewers are paid by the hour in aneffort to avoid providing an incentive for this type ofcheating but we have seen clear evidence of it insurveys where interviewers were paid by the inter-view and supervision was only minimal Thereforein the NCS-R as in the other WMH surveys inter-viewer-level data were monitored to look for

IJMPR 132 3rd v4 25604 1026 am Page 73

Kessler Berglund et al74

evidence of systematic patterns of under-reportingdiagnostic stem questions Consistent with the ratio-nale for paying interviewers by the hour no suchevidence was found in the NCS-R

The sample design

Household sample selection proceduresRespondents were selected from a four-stage areaprobability sample of the non-institutionalizedcivilian population using small area data collected bythe US Bureau of the Census from the year 2000census of the US population to select the first twostages of the sample

The first stage of sampling selected a probabilitysample of 62 primary sampling units (PSUs) repre-sentative of the population These PSUs were linkedto the original PSUs used in the baseline NCS inorder to maximize the efficiency of cross-timecomparison Each PSU consisted of all counties in acensus-defined metropolitan statistical area (MSA)or in the case of counties not in an MSA of indi-vidual counties Primary sampling units wereselected with probabilities proportional to size (PPS)and geographic stratification from all possiblesegments in the country

The 62 PSUs include 16 MSAs that entered thesample with certainty an additional 31 non-certainty MSAs and 15 non-MSA counties Thecertainty selections include in order of size NewYork City Los Angeles Chicago PhiladelphiaDetroit San Francisco Washington DC DallasFortWorth Houston Boston Nassau-Suffolk NY StLouis Pittsburgh Baltimore Minneapolis andAtlanta These 16 PSUs are referred to as lsquoself-representingrsquo PSUs because they were not selectedrandomly to represent other MSAs but are so largethat they represent themselves Rigorously speakingthese 16 are not PSUs in the technical sense of thatterm but rather population strata The other 46PSUs are lsquonon-self-representingrsquo in the sense thatthey were selected to be representative of smallerareas in the country Systematic selection from anordered list was used to select the non-self-repre-senting PSUs with the order building in secondarystratification for geographic variation across thecountry After selection each of the three largestself-representing PSUs (New York Los AngelesChicago) was divided into four pseudo-PSUs while

each of the remaining 13 self-representing PSUs wasdivided into two pseudo-PSUs When combinedwith the 46 non-self-representing PSUs this yieldeda sample of 84 PSUs and pseudo-PSUs We hence-forth refer to both PSUs and pseudo-PSUs as PSUs

The second stage of sampling divided each PSUinto segments of between 50 and 100 housing unitsbased on 2000 small-area census data and selected aprobability sample of 12 such segments from eachnon-self-representing PSU A larger number ofsegments were selected from the self-representingPSUs using the ratio of the population size over thesystematic sampling interval Within each PSU thesegments were selected systematically from anordered list with probabilities of selection propor-tional to size The order in the list built in secondarystratification for geographic variation A total of1001 area segments were selected in the entiresample

Once the sample segments were selected eachsegment was either visited by an interviewer torecord the addresses of all housing units (HUs) (inthe case of segments that were not included in thebaseline NCS) or the listing used in the baselineNCS was updated for new construction and demoli-tion (in the case of segments that were included inthe baseline NCS) These lists were entered into acentralized computer data file In order to adjust fordiscrepancies between expected and observednumbers of HUs a random sample of HUs wasselected that equals 10 times OE where O is theobserved number of households listed in the segmentand E is the number of HUs expected in the segmentfrom the Census data files

This three-stage design creates a sample in whichthe probability of any individual HU being selectedto participate in the survey is equal for every HU inthe coterminous US The fourth stage of selectionthen obtained a household listing of all residents inthe age range 18 and older from a household infor-mant The informant was also asked whether eachadult HU resident spoke English Once the HUlisting was obtained a probability procedure wasused to select one or in some cases (see below) tworespondents to be interviewed As the number ofsampled cases in the HU did not increase proportion-ally with increase in HU size there is a bias in thisapproach toward underrepresenting people who livein large HUs As described below this bias was

IJMPR 132 3rd v4 25604 1026 am Page 74

NCS-R design and field procedures 75

corrected by weighting the data to adjust for thedifferential probability of selection within HUs

Procedures for sampling students living away from homeThe largest segment of the US population not livingin HUs consists of students who live in campus grouphousing (for example dormitories fraternities andsororities) We included these students in thesampling frame if they had a permanent homeaddress (typically the home of their parents) byincluding them in HU listings in all sample house-holds If they were selected for interview theircontact information at school was obtained andspecial arrangements were made to interview themStudents living in off-campus private housing ratherthan campus group housing were eligible for sampleselection in their school residences Students of thislatter sort were not included in the HU listings oftheir permanent residences in order to avoid givingsuch students two chances to enter the sample

Within-household sample selection proceduresRecruitment of HUs began by the interviewermailing an advance letter and study fact brochure tothe HU These materials explained the purposes ofthe study described the funding sources and thesurvey organization that was carrying out the surveylisted the names and affiliations of the senior scien-tists involved in the research provided informationabout the content and length of the interviewdescribed confidentiality procedures clearly statedthat participation was voluntary and provided a toll-free number for respondents who had additionalquestions The advance letter said that the inter-viewer would make an in-person visit to the HUwithin the next week in order to answer anyremaining questions and to determine whether theHU resident selected to participate would be willingto do so

In cases where it was difficult to find a HU resi-dent at home at least 15 in-person call attemptswere made to each sample HU to make an initialcontact with a HU member Additional contactattempts were made by telephone using a reversedirectory and by leaving notes at the HU with thestudy toll-free number and asking someone in theHU to call the study office to schedule an appoint-ment Additional in-person contact attempts weremade based on the discretion of the supervisor

Once a HU resident was contacted a listing of alleligible HU residents was made that recorded the ageand sex of each such person confirmed that eachcould speak English and ensured that permanentresidents who were students living away from homein campus group housing were included The Kishtable selection method was used to select one eligibleperson as the primary predesignated respondent Inaddition in a probability sample of households inwhich there were at least two eligible residents asecond predesignated respondent was selected inorder to study within-household aggregation ofmental disorders and to reduce variation across HUsin within-household probability of selection into thesample No attempt was made to select or to inter-view a secondary respondent unless the primaryrespondent was interviewed first This decision wasmade to avoid compromising the response rate forprimary respondents

The sampling fraction for the second predesig-nated respondent was lower in HUs with only two eligible respondents than those with three ormore eligible respondents in order to reduce variancein the within-household probability of selectionweight No more than two interviews were taken perHU even when the number of HU residents was largebecause we were concerned that taking more thantwo interviews would adversely affect the responserate by increasing household burden Moreover welimited selection of a second respondent to 25 ofHUs because we were concerned that using thisprocedure in a larger proportion of HUs in a samplewith a target of 10000 interviews would adverselyaffect the geographic dispersion of the sample byreducing the total number of HUs in the sample

Interviewing hard-to-recruit casesHard-to-recruit cases included both HUs that weredifficult to contact because the residents were usuallyaway from home and respondents who were reluctantto participate once they were reached The first ofthese problems was addressed by being persistent inmaking contact attempts at all times of day and alldays of the week We also left notes at the HUswhere we were consistently unable to make contactsthat included toll free numbers for residents tocontact us to schedule interview appointments Thesecond problem was addressed by offering a $50financial incentive to all respondents and by using a

IJMPR 132 3rd v4 25604 1026 am Page 75

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 3: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 71

questions guiding instrument development requiredus to assess a wide variety of topics in the samesample of respondents Some of the sections couldhave been administered independently of the maininterview in a separate survey without undercuttingthe investigation of the issues considered in thosesections However given that major mental healthsurveys like the NCS-R are only funded once everydecade we were eager to keep all of these sections inthe instrument

The problem of interview length was addressed inthree ways First we carefully reviewed each questionin the interview to make sure it added value We alsocarefully evaluated each skip instruction to makesure that we were skipping respondents out ofsections as soon as we had the information needed toevaluate the issues under consideration This wasespecially important in the diagnostic sectionswhere it was possible to skip respondents once itbecame clear that they failed to meet any symptomrequired for a diagnosis Given that one of our aimswas to explore sub-threshold diagnoses though wehad to balance the desire to invoke skips with theneed to obtain sub-threshold information

Second we evaluated the data-analysis goals todetermine if they could be achieved by administeringthe section to a probability subsample of respondentsrather than to all respondents For example onesection replicated questions about psychologicaldistress that were originally developed for the 1957Americans View Their Mental Health survey(Gurin Veroff and Feld 1960) and were laterrepeated in a 1976 replication of that survey (VeroffDouvan and Kulka 1981) The aim was to merge thearchival individual-level data files from these earliersurveys with NCS-R data to study trends in self-

reported psychological distress over time Howeveras the earlier surveys were based on samples of1800ndash2100 respondents there would have beenlittle incremental improvement in the statisticalpower of trend comparisons if we administered thesequestions to the entire sample This section wasconsequently administered to a 30 subsampleSimilar subsampling was used to evaluate familyburden and to assess a number of disorders that were either included for exploratory purposes (forexample adult attention-deficithyperactivitydisorder adult separation anxiety disorder) or thatrequired in-depth assessment (non-affectivepsychosis obsessive-compulsive disorder)

Third the interview schedule was divided intotwo parts Part I administered to all respondentsincluded all core WMH-CIDI disorders The admin-istration time of Part I averaged 338 minutes andhad an inter-quartile range between 226 and 398minutes (Table 1) Part II included assessments ofrisk factors consequences services and other corre-lates of the core disorders Part II also includedassessments of additional disorders that were eitherof secondary importance or that were very time-consuming to assess The administration time of PartII had a mean of 1094 minutes a median of 1017minutes and an inter-quartile range between 839and 1241 minutes In order to reduce respondentburden Part II was administered only to 5692 of the9282 NCS-R respondents over-sampling those withclinically significant psychopathology All respon-dents who did not receive Part II were administered abrief demographic battery and were then eitherterminated or sampled in their appropriate propor-tions into sub-sampled interview sections that aredescribed below

Table 1 Administration time (minutes) of the Part I and Part II NCS-R interview schedule

Among respondents who completedPercentile Part I only Part I and Part II

Part I Part II

0 24 01 0125 226 335 83950 297 492 101775 398 708 124199 1029 1824 2565

Mean 338 573 1094

IJMPR 132 3rd v4 25604 1026 am Page 71

Kessler Berglund et al72

Selection into Part II was controlled by the CAPIprogram which divided respondents into three stratabased on their Part I responses The first stratumconsisted of respondents who either (1) met lifetimecriteria for at least one of the mental disordersassessed in Part I (2) met subthreshold lifetimecriteria for any of these disorders and sought treat-ment for at least one of them at some time in theirlife or (3) ever in their life either made a plan tocommit suicide or attempted suicide All of thesepeople were administered Part II The secondstratum consisted of respondents who although notmeeting criteria for membership in the first stratumgave responses in Part I indicating that they either(1) ever met subthreshold criteria for any of the PartI disorders (2) ever sought treatment for anyemotional or substance problem (3) ever hadsuicidal ideation or (4) used any psychotropicmedications (whether or not under the direct super-vision of a physician) in the past 12 months to treatemotional problems A probability sample of 59 ofthe respondents in this second stratum was selectedto receive Part II The third stratum consisted of allother respondents of whom 25 were selected toreceive Part II

As virtually all planned analyses of the NCS-Rdata feature comparisons of cases with non-cases andas the number of non-cases substantially exceeds thenumber of cases the under-sampling of non-cases inthe second and third strata has only a small effect onstatistical power to study correlates of disorder Thisassertion is demonstrated empirically later in thepaper An additional efficiency of the Part II sampleis that respondents in the second and third stratawere selected with probabilities proportional to theirhousehold size This approach which is described inmore detail below is opposite of the procedure usedin initial sample selection leading to a partialcancelling out of the variation in the within house-hold probability of selection weight

The survey populationThe survey was designed to be representative ofEnglish-speaking adults ages 18 or older living in thenon-institutionalized civilian household populationof the coterminous US (excluding Alaska andHawaii) plus students living in campus grouphousing who have a permanent household address

Fieldwork organization and proceduresAs noted above the NCS-R fieldwork was carriedout by the professional SRC national field interviewstaff Over 300 interviewers participated in datacollection The SRC field staff was supervised by ateam of 18 experienced regional supervisorsSupervisors in larger regions also had team leaderswho worked with them A study manager located atthe central SRC facility in Michigan oversaw thework of the supervisors and their staff

After sample selection (see below) each inter-viewer received a folder from his or her supervisor foreach household in which an interview was to beobtained An advance letter and study fact brochurewere sent to each of these households at a timedesigned to arrive a few days before the interviewermade their first contact attempt The letterexplained the purpose of the study and gave a toll-free number for respondents who had additionalquestions The study fact brochure containedanswers to frequently asked questions Upon makingin-person contact with the household the inter-viewer explained the study once again and obtaineda household listing This listing was then used toselect a random respondent in the household Therandom respondent was approached the interviewwas explained and verbal informed consent wasobtained Respondents were given $50 as a token ofappreciation for participating in the survey It shouldbe noted that verbal rather than written informedconsent was obtained because the NCS-R wasdesigned as a trend study replication of the baselineNCS which used verbal informed consent

In cases where the interviewer had difficultycontacting the household or in which the householdwas reluctant to participate persuasion letters weresent to the household Sixty days before the end of thefield period a special effort was made to recruit asmany unresolved cases as possible by sending a specialrecruitment letter and offering a higher financialincentive ($100) to complete a truncated version ofthe interview either in person or over the telephoneInterviewers were allowed to make unlimited in-personcontact attempts to complete these final interviewsand were given financial incentives for completingthese interviews during the closeout period

Survey Research Center interviewers were paid bythe hour rather than by the interview This is

IJMPR 132 3rd v4 25604 1026 am Page 72

NCS-R design and field procedures 73

important in light of evidence that interviewers paidby the interview tend to get low response ratesbecause they focus their efforts on interviewing easy-to-recruit respondents In the case of theWMH-CIDI where there is great variation in lengthof interview per-interview payment rather than per-hour payment also encourages interviewers to rushthrough long interviews We wanted to avoid thisand in fact we encouraged interviewers duringtraining to work especially hard at long interviewsbecause respondents with long interviews are gener-ally those with complex histories of psychopathologywhich are of special interest to the project We alsoencouraged interviewers to set up appointments forsecond and third interview sessions to complete longinterviews It is easy to facilitate such procedureswhen interviewers are paid by the hour

The Human Subjects Committees of bothHarvard Medical School (HMS) and the Universityof Michigan approved these recruitment consentand field procedures

Interviewer training Each professional SRC interviewer must complete atwo-day general interviewer training (GIT) coursebefore working on any SRC survey Moreover expe-rienced interviewers have to complete GIT refreshercourses on a periodic basis Each interviewer whoworked on NCS-R also received 7 days of study-specific training Each interviewer had to completean NCS-R certification test that involved adminis-tering a series of practice interviews with scriptedresponses before beginning production work

Fieldwork quality control As noted above sample households were selectedcentrally to avoid interviewers selectively recruitingrespondents from attractive neighbourhoods Therandom respondent in the household was selectedusing a standardized method that minimizes theprobability of interviewer cheating to select easy-to-recruit household members In addition the CAPIprogram controlled skip logic and had a built-inclock to record speed of data entry making it difficultfor interviewers to truncate interviews by skippingsections or to fake parts of interviews by filling in thelast sections quickly

Despite these structural disincentives to error and

cheating supervisors checked for all of these types oferror Supervisors also made contact with a random10 of interviewed households to confirm thehousehold address household enumeration andrandom selection procedures Supervisors alsoconfirmed the length of the interview and repeated arandom sample of questions in order to make surethat interviewers administered the full interview torespondents and to make sure responses wererecorded accurately

Survey Research Center field procedures called forcompleted CAPI interviews to be sent electronicallyby interviewers every night This allowed supervisorsto review the completeness of open-ended responsesand to make other quality control checks of the dataon a daily basis In cases where problems weredetected the interviewers were contacted andinstructed to re-contact the respondent to obtainmissing data

The SRC uses a computerized survey tracking soft-ware system to facilitate field quality control Thenumber of interviews completed outstandingresponse rate hours per interview and so forth areall recorded on a constantly updated basis by thissystem at the interviewer level along with bench-mark comparisons Supervisors can at a glance callinto the SRC computer server to monitor thesestatistics and go over them with individual inter-viewers in their weekly phone calls These same dataare available to study staff The supervisors SRCand HMS staff used these systems to pinpoint inter-viewers with low response rates in the first replicatefor remedial re-training Interviewers who persistedin low performance or who were found to makeconscious errors were terminated from the study

In the case of interviews with stem-branch logiclike the WMH-CIDI an additional type of potentialproblem is that interviewers who want to shorten thelength of the interview can do so by entering nega-tive responses to diagnostic stem questions evenwhen respondents endorse these questions As notedearlier SRC interviewers are paid by the hour in aneffort to avoid providing an incentive for this type ofcheating but we have seen clear evidence of it insurveys where interviewers were paid by the inter-view and supervision was only minimal Thereforein the NCS-R as in the other WMH surveys inter-viewer-level data were monitored to look for

IJMPR 132 3rd v4 25604 1026 am Page 73

Kessler Berglund et al74

evidence of systematic patterns of under-reportingdiagnostic stem questions Consistent with the ratio-nale for paying interviewers by the hour no suchevidence was found in the NCS-R

The sample design

Household sample selection proceduresRespondents were selected from a four-stage areaprobability sample of the non-institutionalizedcivilian population using small area data collected bythe US Bureau of the Census from the year 2000census of the US population to select the first twostages of the sample

The first stage of sampling selected a probabilitysample of 62 primary sampling units (PSUs) repre-sentative of the population These PSUs were linkedto the original PSUs used in the baseline NCS inorder to maximize the efficiency of cross-timecomparison Each PSU consisted of all counties in acensus-defined metropolitan statistical area (MSA)or in the case of counties not in an MSA of indi-vidual counties Primary sampling units wereselected with probabilities proportional to size (PPS)and geographic stratification from all possiblesegments in the country

The 62 PSUs include 16 MSAs that entered thesample with certainty an additional 31 non-certainty MSAs and 15 non-MSA counties Thecertainty selections include in order of size NewYork City Los Angeles Chicago PhiladelphiaDetroit San Francisco Washington DC DallasFortWorth Houston Boston Nassau-Suffolk NY StLouis Pittsburgh Baltimore Minneapolis andAtlanta These 16 PSUs are referred to as lsquoself-representingrsquo PSUs because they were not selectedrandomly to represent other MSAs but are so largethat they represent themselves Rigorously speakingthese 16 are not PSUs in the technical sense of thatterm but rather population strata The other 46PSUs are lsquonon-self-representingrsquo in the sense thatthey were selected to be representative of smallerareas in the country Systematic selection from anordered list was used to select the non-self-repre-senting PSUs with the order building in secondarystratification for geographic variation across thecountry After selection each of the three largestself-representing PSUs (New York Los AngelesChicago) was divided into four pseudo-PSUs while

each of the remaining 13 self-representing PSUs wasdivided into two pseudo-PSUs When combinedwith the 46 non-self-representing PSUs this yieldeda sample of 84 PSUs and pseudo-PSUs We hence-forth refer to both PSUs and pseudo-PSUs as PSUs

The second stage of sampling divided each PSUinto segments of between 50 and 100 housing unitsbased on 2000 small-area census data and selected aprobability sample of 12 such segments from eachnon-self-representing PSU A larger number ofsegments were selected from the self-representingPSUs using the ratio of the population size over thesystematic sampling interval Within each PSU thesegments were selected systematically from anordered list with probabilities of selection propor-tional to size The order in the list built in secondarystratification for geographic variation A total of1001 area segments were selected in the entiresample

Once the sample segments were selected eachsegment was either visited by an interviewer torecord the addresses of all housing units (HUs) (inthe case of segments that were not included in thebaseline NCS) or the listing used in the baselineNCS was updated for new construction and demoli-tion (in the case of segments that were included inthe baseline NCS) These lists were entered into acentralized computer data file In order to adjust fordiscrepancies between expected and observednumbers of HUs a random sample of HUs wasselected that equals 10 times OE where O is theobserved number of households listed in the segmentand E is the number of HUs expected in the segmentfrom the Census data files

This three-stage design creates a sample in whichthe probability of any individual HU being selectedto participate in the survey is equal for every HU inthe coterminous US The fourth stage of selectionthen obtained a household listing of all residents inthe age range 18 and older from a household infor-mant The informant was also asked whether eachadult HU resident spoke English Once the HUlisting was obtained a probability procedure wasused to select one or in some cases (see below) tworespondents to be interviewed As the number ofsampled cases in the HU did not increase proportion-ally with increase in HU size there is a bias in thisapproach toward underrepresenting people who livein large HUs As described below this bias was

IJMPR 132 3rd v4 25604 1026 am Page 74

NCS-R design and field procedures 75

corrected by weighting the data to adjust for thedifferential probability of selection within HUs

Procedures for sampling students living away from homeThe largest segment of the US population not livingin HUs consists of students who live in campus grouphousing (for example dormitories fraternities andsororities) We included these students in thesampling frame if they had a permanent homeaddress (typically the home of their parents) byincluding them in HU listings in all sample house-holds If they were selected for interview theircontact information at school was obtained andspecial arrangements were made to interview themStudents living in off-campus private housing ratherthan campus group housing were eligible for sampleselection in their school residences Students of thislatter sort were not included in the HU listings oftheir permanent residences in order to avoid givingsuch students two chances to enter the sample

Within-household sample selection proceduresRecruitment of HUs began by the interviewermailing an advance letter and study fact brochure tothe HU These materials explained the purposes ofthe study described the funding sources and thesurvey organization that was carrying out the surveylisted the names and affiliations of the senior scien-tists involved in the research provided informationabout the content and length of the interviewdescribed confidentiality procedures clearly statedthat participation was voluntary and provided a toll-free number for respondents who had additionalquestions The advance letter said that the inter-viewer would make an in-person visit to the HUwithin the next week in order to answer anyremaining questions and to determine whether theHU resident selected to participate would be willingto do so

In cases where it was difficult to find a HU resi-dent at home at least 15 in-person call attemptswere made to each sample HU to make an initialcontact with a HU member Additional contactattempts were made by telephone using a reversedirectory and by leaving notes at the HU with thestudy toll-free number and asking someone in theHU to call the study office to schedule an appoint-ment Additional in-person contact attempts weremade based on the discretion of the supervisor

Once a HU resident was contacted a listing of alleligible HU residents was made that recorded the ageand sex of each such person confirmed that eachcould speak English and ensured that permanentresidents who were students living away from homein campus group housing were included The Kishtable selection method was used to select one eligibleperson as the primary predesignated respondent Inaddition in a probability sample of households inwhich there were at least two eligible residents asecond predesignated respondent was selected inorder to study within-household aggregation ofmental disorders and to reduce variation across HUsin within-household probability of selection into thesample No attempt was made to select or to inter-view a secondary respondent unless the primaryrespondent was interviewed first This decision wasmade to avoid compromising the response rate forprimary respondents

The sampling fraction for the second predesig-nated respondent was lower in HUs with only two eligible respondents than those with three ormore eligible respondents in order to reduce variancein the within-household probability of selectionweight No more than two interviews were taken perHU even when the number of HU residents was largebecause we were concerned that taking more thantwo interviews would adversely affect the responserate by increasing household burden Moreover welimited selection of a second respondent to 25 ofHUs because we were concerned that using thisprocedure in a larger proportion of HUs in a samplewith a target of 10000 interviews would adverselyaffect the geographic dispersion of the sample byreducing the total number of HUs in the sample

Interviewing hard-to-recruit casesHard-to-recruit cases included both HUs that weredifficult to contact because the residents were usuallyaway from home and respondents who were reluctantto participate once they were reached The first ofthese problems was addressed by being persistent inmaking contact attempts at all times of day and alldays of the week We also left notes at the HUswhere we were consistently unable to make contactsthat included toll free numbers for residents tocontact us to schedule interview appointments Thesecond problem was addressed by offering a $50financial incentive to all respondents and by using a

IJMPR 132 3rd v4 25604 1026 am Page 75

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 4: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al72

Selection into Part II was controlled by the CAPIprogram which divided respondents into three stratabased on their Part I responses The first stratumconsisted of respondents who either (1) met lifetimecriteria for at least one of the mental disordersassessed in Part I (2) met subthreshold lifetimecriteria for any of these disorders and sought treat-ment for at least one of them at some time in theirlife or (3) ever in their life either made a plan tocommit suicide or attempted suicide All of thesepeople were administered Part II The secondstratum consisted of respondents who although notmeeting criteria for membership in the first stratumgave responses in Part I indicating that they either(1) ever met subthreshold criteria for any of the PartI disorders (2) ever sought treatment for anyemotional or substance problem (3) ever hadsuicidal ideation or (4) used any psychotropicmedications (whether or not under the direct super-vision of a physician) in the past 12 months to treatemotional problems A probability sample of 59 ofthe respondents in this second stratum was selectedto receive Part II The third stratum consisted of allother respondents of whom 25 were selected toreceive Part II

As virtually all planned analyses of the NCS-Rdata feature comparisons of cases with non-cases andas the number of non-cases substantially exceeds thenumber of cases the under-sampling of non-cases inthe second and third strata has only a small effect onstatistical power to study correlates of disorder Thisassertion is demonstrated empirically later in thepaper An additional efficiency of the Part II sampleis that respondents in the second and third stratawere selected with probabilities proportional to theirhousehold size This approach which is described inmore detail below is opposite of the procedure usedin initial sample selection leading to a partialcancelling out of the variation in the within house-hold probability of selection weight

The survey populationThe survey was designed to be representative ofEnglish-speaking adults ages 18 or older living in thenon-institutionalized civilian household populationof the coterminous US (excluding Alaska andHawaii) plus students living in campus grouphousing who have a permanent household address

Fieldwork organization and proceduresAs noted above the NCS-R fieldwork was carriedout by the professional SRC national field interviewstaff Over 300 interviewers participated in datacollection The SRC field staff was supervised by ateam of 18 experienced regional supervisorsSupervisors in larger regions also had team leaderswho worked with them A study manager located atthe central SRC facility in Michigan oversaw thework of the supervisors and their staff

After sample selection (see below) each inter-viewer received a folder from his or her supervisor foreach household in which an interview was to beobtained An advance letter and study fact brochurewere sent to each of these households at a timedesigned to arrive a few days before the interviewermade their first contact attempt The letterexplained the purpose of the study and gave a toll-free number for respondents who had additionalquestions The study fact brochure containedanswers to frequently asked questions Upon makingin-person contact with the household the inter-viewer explained the study once again and obtaineda household listing This listing was then used toselect a random respondent in the household Therandom respondent was approached the interviewwas explained and verbal informed consent wasobtained Respondents were given $50 as a token ofappreciation for participating in the survey It shouldbe noted that verbal rather than written informedconsent was obtained because the NCS-R wasdesigned as a trend study replication of the baselineNCS which used verbal informed consent

In cases where the interviewer had difficultycontacting the household or in which the householdwas reluctant to participate persuasion letters weresent to the household Sixty days before the end of thefield period a special effort was made to recruit asmany unresolved cases as possible by sending a specialrecruitment letter and offering a higher financialincentive ($100) to complete a truncated version ofthe interview either in person or over the telephoneInterviewers were allowed to make unlimited in-personcontact attempts to complete these final interviewsand were given financial incentives for completingthese interviews during the closeout period

Survey Research Center interviewers were paid bythe hour rather than by the interview This is

IJMPR 132 3rd v4 25604 1026 am Page 72

NCS-R design and field procedures 73

important in light of evidence that interviewers paidby the interview tend to get low response ratesbecause they focus their efforts on interviewing easy-to-recruit respondents In the case of theWMH-CIDI where there is great variation in lengthof interview per-interview payment rather than per-hour payment also encourages interviewers to rushthrough long interviews We wanted to avoid thisand in fact we encouraged interviewers duringtraining to work especially hard at long interviewsbecause respondents with long interviews are gener-ally those with complex histories of psychopathologywhich are of special interest to the project We alsoencouraged interviewers to set up appointments forsecond and third interview sessions to complete longinterviews It is easy to facilitate such procedureswhen interviewers are paid by the hour

The Human Subjects Committees of bothHarvard Medical School (HMS) and the Universityof Michigan approved these recruitment consentand field procedures

Interviewer training Each professional SRC interviewer must complete atwo-day general interviewer training (GIT) coursebefore working on any SRC survey Moreover expe-rienced interviewers have to complete GIT refreshercourses on a periodic basis Each interviewer whoworked on NCS-R also received 7 days of study-specific training Each interviewer had to completean NCS-R certification test that involved adminis-tering a series of practice interviews with scriptedresponses before beginning production work

Fieldwork quality control As noted above sample households were selectedcentrally to avoid interviewers selectively recruitingrespondents from attractive neighbourhoods Therandom respondent in the household was selectedusing a standardized method that minimizes theprobability of interviewer cheating to select easy-to-recruit household members In addition the CAPIprogram controlled skip logic and had a built-inclock to record speed of data entry making it difficultfor interviewers to truncate interviews by skippingsections or to fake parts of interviews by filling in thelast sections quickly

Despite these structural disincentives to error and

cheating supervisors checked for all of these types oferror Supervisors also made contact with a random10 of interviewed households to confirm thehousehold address household enumeration andrandom selection procedures Supervisors alsoconfirmed the length of the interview and repeated arandom sample of questions in order to make surethat interviewers administered the full interview torespondents and to make sure responses wererecorded accurately

Survey Research Center field procedures called forcompleted CAPI interviews to be sent electronicallyby interviewers every night This allowed supervisorsto review the completeness of open-ended responsesand to make other quality control checks of the dataon a daily basis In cases where problems weredetected the interviewers were contacted andinstructed to re-contact the respondent to obtainmissing data

The SRC uses a computerized survey tracking soft-ware system to facilitate field quality control Thenumber of interviews completed outstandingresponse rate hours per interview and so forth areall recorded on a constantly updated basis by thissystem at the interviewer level along with bench-mark comparisons Supervisors can at a glance callinto the SRC computer server to monitor thesestatistics and go over them with individual inter-viewers in their weekly phone calls These same dataare available to study staff The supervisors SRCand HMS staff used these systems to pinpoint inter-viewers with low response rates in the first replicatefor remedial re-training Interviewers who persistedin low performance or who were found to makeconscious errors were terminated from the study

In the case of interviews with stem-branch logiclike the WMH-CIDI an additional type of potentialproblem is that interviewers who want to shorten thelength of the interview can do so by entering nega-tive responses to diagnostic stem questions evenwhen respondents endorse these questions As notedearlier SRC interviewers are paid by the hour in aneffort to avoid providing an incentive for this type ofcheating but we have seen clear evidence of it insurveys where interviewers were paid by the inter-view and supervision was only minimal Thereforein the NCS-R as in the other WMH surveys inter-viewer-level data were monitored to look for

IJMPR 132 3rd v4 25604 1026 am Page 73

Kessler Berglund et al74

evidence of systematic patterns of under-reportingdiagnostic stem questions Consistent with the ratio-nale for paying interviewers by the hour no suchevidence was found in the NCS-R

The sample design

Household sample selection proceduresRespondents were selected from a four-stage areaprobability sample of the non-institutionalizedcivilian population using small area data collected bythe US Bureau of the Census from the year 2000census of the US population to select the first twostages of the sample

The first stage of sampling selected a probabilitysample of 62 primary sampling units (PSUs) repre-sentative of the population These PSUs were linkedto the original PSUs used in the baseline NCS inorder to maximize the efficiency of cross-timecomparison Each PSU consisted of all counties in acensus-defined metropolitan statistical area (MSA)or in the case of counties not in an MSA of indi-vidual counties Primary sampling units wereselected with probabilities proportional to size (PPS)and geographic stratification from all possiblesegments in the country

The 62 PSUs include 16 MSAs that entered thesample with certainty an additional 31 non-certainty MSAs and 15 non-MSA counties Thecertainty selections include in order of size NewYork City Los Angeles Chicago PhiladelphiaDetroit San Francisco Washington DC DallasFortWorth Houston Boston Nassau-Suffolk NY StLouis Pittsburgh Baltimore Minneapolis andAtlanta These 16 PSUs are referred to as lsquoself-representingrsquo PSUs because they were not selectedrandomly to represent other MSAs but are so largethat they represent themselves Rigorously speakingthese 16 are not PSUs in the technical sense of thatterm but rather population strata The other 46PSUs are lsquonon-self-representingrsquo in the sense thatthey were selected to be representative of smallerareas in the country Systematic selection from anordered list was used to select the non-self-repre-senting PSUs with the order building in secondarystratification for geographic variation across thecountry After selection each of the three largestself-representing PSUs (New York Los AngelesChicago) was divided into four pseudo-PSUs while

each of the remaining 13 self-representing PSUs wasdivided into two pseudo-PSUs When combinedwith the 46 non-self-representing PSUs this yieldeda sample of 84 PSUs and pseudo-PSUs We hence-forth refer to both PSUs and pseudo-PSUs as PSUs

The second stage of sampling divided each PSUinto segments of between 50 and 100 housing unitsbased on 2000 small-area census data and selected aprobability sample of 12 such segments from eachnon-self-representing PSU A larger number ofsegments were selected from the self-representingPSUs using the ratio of the population size over thesystematic sampling interval Within each PSU thesegments were selected systematically from anordered list with probabilities of selection propor-tional to size The order in the list built in secondarystratification for geographic variation A total of1001 area segments were selected in the entiresample

Once the sample segments were selected eachsegment was either visited by an interviewer torecord the addresses of all housing units (HUs) (inthe case of segments that were not included in thebaseline NCS) or the listing used in the baselineNCS was updated for new construction and demoli-tion (in the case of segments that were included inthe baseline NCS) These lists were entered into acentralized computer data file In order to adjust fordiscrepancies between expected and observednumbers of HUs a random sample of HUs wasselected that equals 10 times OE where O is theobserved number of households listed in the segmentand E is the number of HUs expected in the segmentfrom the Census data files

This three-stage design creates a sample in whichthe probability of any individual HU being selectedto participate in the survey is equal for every HU inthe coterminous US The fourth stage of selectionthen obtained a household listing of all residents inthe age range 18 and older from a household infor-mant The informant was also asked whether eachadult HU resident spoke English Once the HUlisting was obtained a probability procedure wasused to select one or in some cases (see below) tworespondents to be interviewed As the number ofsampled cases in the HU did not increase proportion-ally with increase in HU size there is a bias in thisapproach toward underrepresenting people who livein large HUs As described below this bias was

IJMPR 132 3rd v4 25604 1026 am Page 74

NCS-R design and field procedures 75

corrected by weighting the data to adjust for thedifferential probability of selection within HUs

Procedures for sampling students living away from homeThe largest segment of the US population not livingin HUs consists of students who live in campus grouphousing (for example dormitories fraternities andsororities) We included these students in thesampling frame if they had a permanent homeaddress (typically the home of their parents) byincluding them in HU listings in all sample house-holds If they were selected for interview theircontact information at school was obtained andspecial arrangements were made to interview themStudents living in off-campus private housing ratherthan campus group housing were eligible for sampleselection in their school residences Students of thislatter sort were not included in the HU listings oftheir permanent residences in order to avoid givingsuch students two chances to enter the sample

Within-household sample selection proceduresRecruitment of HUs began by the interviewermailing an advance letter and study fact brochure tothe HU These materials explained the purposes ofthe study described the funding sources and thesurvey organization that was carrying out the surveylisted the names and affiliations of the senior scien-tists involved in the research provided informationabout the content and length of the interviewdescribed confidentiality procedures clearly statedthat participation was voluntary and provided a toll-free number for respondents who had additionalquestions The advance letter said that the inter-viewer would make an in-person visit to the HUwithin the next week in order to answer anyremaining questions and to determine whether theHU resident selected to participate would be willingto do so

In cases where it was difficult to find a HU resi-dent at home at least 15 in-person call attemptswere made to each sample HU to make an initialcontact with a HU member Additional contactattempts were made by telephone using a reversedirectory and by leaving notes at the HU with thestudy toll-free number and asking someone in theHU to call the study office to schedule an appoint-ment Additional in-person contact attempts weremade based on the discretion of the supervisor

Once a HU resident was contacted a listing of alleligible HU residents was made that recorded the ageand sex of each such person confirmed that eachcould speak English and ensured that permanentresidents who were students living away from homein campus group housing were included The Kishtable selection method was used to select one eligibleperson as the primary predesignated respondent Inaddition in a probability sample of households inwhich there were at least two eligible residents asecond predesignated respondent was selected inorder to study within-household aggregation ofmental disorders and to reduce variation across HUsin within-household probability of selection into thesample No attempt was made to select or to inter-view a secondary respondent unless the primaryrespondent was interviewed first This decision wasmade to avoid compromising the response rate forprimary respondents

The sampling fraction for the second predesig-nated respondent was lower in HUs with only two eligible respondents than those with three ormore eligible respondents in order to reduce variancein the within-household probability of selectionweight No more than two interviews were taken perHU even when the number of HU residents was largebecause we were concerned that taking more thantwo interviews would adversely affect the responserate by increasing household burden Moreover welimited selection of a second respondent to 25 ofHUs because we were concerned that using thisprocedure in a larger proportion of HUs in a samplewith a target of 10000 interviews would adverselyaffect the geographic dispersion of the sample byreducing the total number of HUs in the sample

Interviewing hard-to-recruit casesHard-to-recruit cases included both HUs that weredifficult to contact because the residents were usuallyaway from home and respondents who were reluctantto participate once they were reached The first ofthese problems was addressed by being persistent inmaking contact attempts at all times of day and alldays of the week We also left notes at the HUswhere we were consistently unable to make contactsthat included toll free numbers for residents tocontact us to schedule interview appointments Thesecond problem was addressed by offering a $50financial incentive to all respondents and by using a

IJMPR 132 3rd v4 25604 1026 am Page 75

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 5: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 73

important in light of evidence that interviewers paidby the interview tend to get low response ratesbecause they focus their efforts on interviewing easy-to-recruit respondents In the case of theWMH-CIDI where there is great variation in lengthof interview per-interview payment rather than per-hour payment also encourages interviewers to rushthrough long interviews We wanted to avoid thisand in fact we encouraged interviewers duringtraining to work especially hard at long interviewsbecause respondents with long interviews are gener-ally those with complex histories of psychopathologywhich are of special interest to the project We alsoencouraged interviewers to set up appointments forsecond and third interview sessions to complete longinterviews It is easy to facilitate such procedureswhen interviewers are paid by the hour

The Human Subjects Committees of bothHarvard Medical School (HMS) and the Universityof Michigan approved these recruitment consentand field procedures

Interviewer training Each professional SRC interviewer must complete atwo-day general interviewer training (GIT) coursebefore working on any SRC survey Moreover expe-rienced interviewers have to complete GIT refreshercourses on a periodic basis Each interviewer whoworked on NCS-R also received 7 days of study-specific training Each interviewer had to completean NCS-R certification test that involved adminis-tering a series of practice interviews with scriptedresponses before beginning production work

Fieldwork quality control As noted above sample households were selectedcentrally to avoid interviewers selectively recruitingrespondents from attractive neighbourhoods Therandom respondent in the household was selectedusing a standardized method that minimizes theprobability of interviewer cheating to select easy-to-recruit household members In addition the CAPIprogram controlled skip logic and had a built-inclock to record speed of data entry making it difficultfor interviewers to truncate interviews by skippingsections or to fake parts of interviews by filling in thelast sections quickly

Despite these structural disincentives to error and

cheating supervisors checked for all of these types oferror Supervisors also made contact with a random10 of interviewed households to confirm thehousehold address household enumeration andrandom selection procedures Supervisors alsoconfirmed the length of the interview and repeated arandom sample of questions in order to make surethat interviewers administered the full interview torespondents and to make sure responses wererecorded accurately

Survey Research Center field procedures called forcompleted CAPI interviews to be sent electronicallyby interviewers every night This allowed supervisorsto review the completeness of open-ended responsesand to make other quality control checks of the dataon a daily basis In cases where problems weredetected the interviewers were contacted andinstructed to re-contact the respondent to obtainmissing data

The SRC uses a computerized survey tracking soft-ware system to facilitate field quality control Thenumber of interviews completed outstandingresponse rate hours per interview and so forth areall recorded on a constantly updated basis by thissystem at the interviewer level along with bench-mark comparisons Supervisors can at a glance callinto the SRC computer server to monitor thesestatistics and go over them with individual inter-viewers in their weekly phone calls These same dataare available to study staff The supervisors SRCand HMS staff used these systems to pinpoint inter-viewers with low response rates in the first replicatefor remedial re-training Interviewers who persistedin low performance or who were found to makeconscious errors were terminated from the study

In the case of interviews with stem-branch logiclike the WMH-CIDI an additional type of potentialproblem is that interviewers who want to shorten thelength of the interview can do so by entering nega-tive responses to diagnostic stem questions evenwhen respondents endorse these questions As notedearlier SRC interviewers are paid by the hour in aneffort to avoid providing an incentive for this type ofcheating but we have seen clear evidence of it insurveys where interviewers were paid by the inter-view and supervision was only minimal Thereforein the NCS-R as in the other WMH surveys inter-viewer-level data were monitored to look for

IJMPR 132 3rd v4 25604 1026 am Page 73

Kessler Berglund et al74

evidence of systematic patterns of under-reportingdiagnostic stem questions Consistent with the ratio-nale for paying interviewers by the hour no suchevidence was found in the NCS-R

The sample design

Household sample selection proceduresRespondents were selected from a four-stage areaprobability sample of the non-institutionalizedcivilian population using small area data collected bythe US Bureau of the Census from the year 2000census of the US population to select the first twostages of the sample

The first stage of sampling selected a probabilitysample of 62 primary sampling units (PSUs) repre-sentative of the population These PSUs were linkedto the original PSUs used in the baseline NCS inorder to maximize the efficiency of cross-timecomparison Each PSU consisted of all counties in acensus-defined metropolitan statistical area (MSA)or in the case of counties not in an MSA of indi-vidual counties Primary sampling units wereselected with probabilities proportional to size (PPS)and geographic stratification from all possiblesegments in the country

The 62 PSUs include 16 MSAs that entered thesample with certainty an additional 31 non-certainty MSAs and 15 non-MSA counties Thecertainty selections include in order of size NewYork City Los Angeles Chicago PhiladelphiaDetroit San Francisco Washington DC DallasFortWorth Houston Boston Nassau-Suffolk NY StLouis Pittsburgh Baltimore Minneapolis andAtlanta These 16 PSUs are referred to as lsquoself-representingrsquo PSUs because they were not selectedrandomly to represent other MSAs but are so largethat they represent themselves Rigorously speakingthese 16 are not PSUs in the technical sense of thatterm but rather population strata The other 46PSUs are lsquonon-self-representingrsquo in the sense thatthey were selected to be representative of smallerareas in the country Systematic selection from anordered list was used to select the non-self-repre-senting PSUs with the order building in secondarystratification for geographic variation across thecountry After selection each of the three largestself-representing PSUs (New York Los AngelesChicago) was divided into four pseudo-PSUs while

each of the remaining 13 self-representing PSUs wasdivided into two pseudo-PSUs When combinedwith the 46 non-self-representing PSUs this yieldeda sample of 84 PSUs and pseudo-PSUs We hence-forth refer to both PSUs and pseudo-PSUs as PSUs

The second stage of sampling divided each PSUinto segments of between 50 and 100 housing unitsbased on 2000 small-area census data and selected aprobability sample of 12 such segments from eachnon-self-representing PSU A larger number ofsegments were selected from the self-representingPSUs using the ratio of the population size over thesystematic sampling interval Within each PSU thesegments were selected systematically from anordered list with probabilities of selection propor-tional to size The order in the list built in secondarystratification for geographic variation A total of1001 area segments were selected in the entiresample

Once the sample segments were selected eachsegment was either visited by an interviewer torecord the addresses of all housing units (HUs) (inthe case of segments that were not included in thebaseline NCS) or the listing used in the baselineNCS was updated for new construction and demoli-tion (in the case of segments that were included inthe baseline NCS) These lists were entered into acentralized computer data file In order to adjust fordiscrepancies between expected and observednumbers of HUs a random sample of HUs wasselected that equals 10 times OE where O is theobserved number of households listed in the segmentand E is the number of HUs expected in the segmentfrom the Census data files

This three-stage design creates a sample in whichthe probability of any individual HU being selectedto participate in the survey is equal for every HU inthe coterminous US The fourth stage of selectionthen obtained a household listing of all residents inthe age range 18 and older from a household infor-mant The informant was also asked whether eachadult HU resident spoke English Once the HUlisting was obtained a probability procedure wasused to select one or in some cases (see below) tworespondents to be interviewed As the number ofsampled cases in the HU did not increase proportion-ally with increase in HU size there is a bias in thisapproach toward underrepresenting people who livein large HUs As described below this bias was

IJMPR 132 3rd v4 25604 1026 am Page 74

NCS-R design and field procedures 75

corrected by weighting the data to adjust for thedifferential probability of selection within HUs

Procedures for sampling students living away from homeThe largest segment of the US population not livingin HUs consists of students who live in campus grouphousing (for example dormitories fraternities andsororities) We included these students in thesampling frame if they had a permanent homeaddress (typically the home of their parents) byincluding them in HU listings in all sample house-holds If they were selected for interview theircontact information at school was obtained andspecial arrangements were made to interview themStudents living in off-campus private housing ratherthan campus group housing were eligible for sampleselection in their school residences Students of thislatter sort were not included in the HU listings oftheir permanent residences in order to avoid givingsuch students two chances to enter the sample

Within-household sample selection proceduresRecruitment of HUs began by the interviewermailing an advance letter and study fact brochure tothe HU These materials explained the purposes ofthe study described the funding sources and thesurvey organization that was carrying out the surveylisted the names and affiliations of the senior scien-tists involved in the research provided informationabout the content and length of the interviewdescribed confidentiality procedures clearly statedthat participation was voluntary and provided a toll-free number for respondents who had additionalquestions The advance letter said that the inter-viewer would make an in-person visit to the HUwithin the next week in order to answer anyremaining questions and to determine whether theHU resident selected to participate would be willingto do so

In cases where it was difficult to find a HU resi-dent at home at least 15 in-person call attemptswere made to each sample HU to make an initialcontact with a HU member Additional contactattempts were made by telephone using a reversedirectory and by leaving notes at the HU with thestudy toll-free number and asking someone in theHU to call the study office to schedule an appoint-ment Additional in-person contact attempts weremade based on the discretion of the supervisor

Once a HU resident was contacted a listing of alleligible HU residents was made that recorded the ageand sex of each such person confirmed that eachcould speak English and ensured that permanentresidents who were students living away from homein campus group housing were included The Kishtable selection method was used to select one eligibleperson as the primary predesignated respondent Inaddition in a probability sample of households inwhich there were at least two eligible residents asecond predesignated respondent was selected inorder to study within-household aggregation ofmental disorders and to reduce variation across HUsin within-household probability of selection into thesample No attempt was made to select or to inter-view a secondary respondent unless the primaryrespondent was interviewed first This decision wasmade to avoid compromising the response rate forprimary respondents

The sampling fraction for the second predesig-nated respondent was lower in HUs with only two eligible respondents than those with three ormore eligible respondents in order to reduce variancein the within-household probability of selectionweight No more than two interviews were taken perHU even when the number of HU residents was largebecause we were concerned that taking more thantwo interviews would adversely affect the responserate by increasing household burden Moreover welimited selection of a second respondent to 25 ofHUs because we were concerned that using thisprocedure in a larger proportion of HUs in a samplewith a target of 10000 interviews would adverselyaffect the geographic dispersion of the sample byreducing the total number of HUs in the sample

Interviewing hard-to-recruit casesHard-to-recruit cases included both HUs that weredifficult to contact because the residents were usuallyaway from home and respondents who were reluctantto participate once they were reached The first ofthese problems was addressed by being persistent inmaking contact attempts at all times of day and alldays of the week We also left notes at the HUswhere we were consistently unable to make contactsthat included toll free numbers for residents tocontact us to schedule interview appointments Thesecond problem was addressed by offering a $50financial incentive to all respondents and by using a

IJMPR 132 3rd v4 25604 1026 am Page 75

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 6: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al74

evidence of systematic patterns of under-reportingdiagnostic stem questions Consistent with the ratio-nale for paying interviewers by the hour no suchevidence was found in the NCS-R

The sample design

Household sample selection proceduresRespondents were selected from a four-stage areaprobability sample of the non-institutionalizedcivilian population using small area data collected bythe US Bureau of the Census from the year 2000census of the US population to select the first twostages of the sample

The first stage of sampling selected a probabilitysample of 62 primary sampling units (PSUs) repre-sentative of the population These PSUs were linkedto the original PSUs used in the baseline NCS inorder to maximize the efficiency of cross-timecomparison Each PSU consisted of all counties in acensus-defined metropolitan statistical area (MSA)or in the case of counties not in an MSA of indi-vidual counties Primary sampling units wereselected with probabilities proportional to size (PPS)and geographic stratification from all possiblesegments in the country

The 62 PSUs include 16 MSAs that entered thesample with certainty an additional 31 non-certainty MSAs and 15 non-MSA counties Thecertainty selections include in order of size NewYork City Los Angeles Chicago PhiladelphiaDetroit San Francisco Washington DC DallasFortWorth Houston Boston Nassau-Suffolk NY StLouis Pittsburgh Baltimore Minneapolis andAtlanta These 16 PSUs are referred to as lsquoself-representingrsquo PSUs because they were not selectedrandomly to represent other MSAs but are so largethat they represent themselves Rigorously speakingthese 16 are not PSUs in the technical sense of thatterm but rather population strata The other 46PSUs are lsquonon-self-representingrsquo in the sense thatthey were selected to be representative of smallerareas in the country Systematic selection from anordered list was used to select the non-self-repre-senting PSUs with the order building in secondarystratification for geographic variation across thecountry After selection each of the three largestself-representing PSUs (New York Los AngelesChicago) was divided into four pseudo-PSUs while

each of the remaining 13 self-representing PSUs wasdivided into two pseudo-PSUs When combinedwith the 46 non-self-representing PSUs this yieldeda sample of 84 PSUs and pseudo-PSUs We hence-forth refer to both PSUs and pseudo-PSUs as PSUs

The second stage of sampling divided each PSUinto segments of between 50 and 100 housing unitsbased on 2000 small-area census data and selected aprobability sample of 12 such segments from eachnon-self-representing PSU A larger number ofsegments were selected from the self-representingPSUs using the ratio of the population size over thesystematic sampling interval Within each PSU thesegments were selected systematically from anordered list with probabilities of selection propor-tional to size The order in the list built in secondarystratification for geographic variation A total of1001 area segments were selected in the entiresample

Once the sample segments were selected eachsegment was either visited by an interviewer torecord the addresses of all housing units (HUs) (inthe case of segments that were not included in thebaseline NCS) or the listing used in the baselineNCS was updated for new construction and demoli-tion (in the case of segments that were included inthe baseline NCS) These lists were entered into acentralized computer data file In order to adjust fordiscrepancies between expected and observednumbers of HUs a random sample of HUs wasselected that equals 10 times OE where O is theobserved number of households listed in the segmentand E is the number of HUs expected in the segmentfrom the Census data files

This three-stage design creates a sample in whichthe probability of any individual HU being selectedto participate in the survey is equal for every HU inthe coterminous US The fourth stage of selectionthen obtained a household listing of all residents inthe age range 18 and older from a household infor-mant The informant was also asked whether eachadult HU resident spoke English Once the HUlisting was obtained a probability procedure wasused to select one or in some cases (see below) tworespondents to be interviewed As the number ofsampled cases in the HU did not increase proportion-ally with increase in HU size there is a bias in thisapproach toward underrepresenting people who livein large HUs As described below this bias was

IJMPR 132 3rd v4 25604 1026 am Page 74

NCS-R design and field procedures 75

corrected by weighting the data to adjust for thedifferential probability of selection within HUs

Procedures for sampling students living away from homeThe largest segment of the US population not livingin HUs consists of students who live in campus grouphousing (for example dormitories fraternities andsororities) We included these students in thesampling frame if they had a permanent homeaddress (typically the home of their parents) byincluding them in HU listings in all sample house-holds If they were selected for interview theircontact information at school was obtained andspecial arrangements were made to interview themStudents living in off-campus private housing ratherthan campus group housing were eligible for sampleselection in their school residences Students of thislatter sort were not included in the HU listings oftheir permanent residences in order to avoid givingsuch students two chances to enter the sample

Within-household sample selection proceduresRecruitment of HUs began by the interviewermailing an advance letter and study fact brochure tothe HU These materials explained the purposes ofthe study described the funding sources and thesurvey organization that was carrying out the surveylisted the names and affiliations of the senior scien-tists involved in the research provided informationabout the content and length of the interviewdescribed confidentiality procedures clearly statedthat participation was voluntary and provided a toll-free number for respondents who had additionalquestions The advance letter said that the inter-viewer would make an in-person visit to the HUwithin the next week in order to answer anyremaining questions and to determine whether theHU resident selected to participate would be willingto do so

In cases where it was difficult to find a HU resi-dent at home at least 15 in-person call attemptswere made to each sample HU to make an initialcontact with a HU member Additional contactattempts were made by telephone using a reversedirectory and by leaving notes at the HU with thestudy toll-free number and asking someone in theHU to call the study office to schedule an appoint-ment Additional in-person contact attempts weremade based on the discretion of the supervisor

Once a HU resident was contacted a listing of alleligible HU residents was made that recorded the ageand sex of each such person confirmed that eachcould speak English and ensured that permanentresidents who were students living away from homein campus group housing were included The Kishtable selection method was used to select one eligibleperson as the primary predesignated respondent Inaddition in a probability sample of households inwhich there were at least two eligible residents asecond predesignated respondent was selected inorder to study within-household aggregation ofmental disorders and to reduce variation across HUsin within-household probability of selection into thesample No attempt was made to select or to inter-view a secondary respondent unless the primaryrespondent was interviewed first This decision wasmade to avoid compromising the response rate forprimary respondents

The sampling fraction for the second predesig-nated respondent was lower in HUs with only two eligible respondents than those with three ormore eligible respondents in order to reduce variancein the within-household probability of selectionweight No more than two interviews were taken perHU even when the number of HU residents was largebecause we were concerned that taking more thantwo interviews would adversely affect the responserate by increasing household burden Moreover welimited selection of a second respondent to 25 ofHUs because we were concerned that using thisprocedure in a larger proportion of HUs in a samplewith a target of 10000 interviews would adverselyaffect the geographic dispersion of the sample byreducing the total number of HUs in the sample

Interviewing hard-to-recruit casesHard-to-recruit cases included both HUs that weredifficult to contact because the residents were usuallyaway from home and respondents who were reluctantto participate once they were reached The first ofthese problems was addressed by being persistent inmaking contact attempts at all times of day and alldays of the week We also left notes at the HUswhere we were consistently unable to make contactsthat included toll free numbers for residents tocontact us to schedule interview appointments Thesecond problem was addressed by offering a $50financial incentive to all respondents and by using a

IJMPR 132 3rd v4 25604 1026 am Page 75

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 7: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 75

corrected by weighting the data to adjust for thedifferential probability of selection within HUs

Procedures for sampling students living away from homeThe largest segment of the US population not livingin HUs consists of students who live in campus grouphousing (for example dormitories fraternities andsororities) We included these students in thesampling frame if they had a permanent homeaddress (typically the home of their parents) byincluding them in HU listings in all sample house-holds If they were selected for interview theircontact information at school was obtained andspecial arrangements were made to interview themStudents living in off-campus private housing ratherthan campus group housing were eligible for sampleselection in their school residences Students of thislatter sort were not included in the HU listings oftheir permanent residences in order to avoid givingsuch students two chances to enter the sample

Within-household sample selection proceduresRecruitment of HUs began by the interviewermailing an advance letter and study fact brochure tothe HU These materials explained the purposes ofthe study described the funding sources and thesurvey organization that was carrying out the surveylisted the names and affiliations of the senior scien-tists involved in the research provided informationabout the content and length of the interviewdescribed confidentiality procedures clearly statedthat participation was voluntary and provided a toll-free number for respondents who had additionalquestions The advance letter said that the inter-viewer would make an in-person visit to the HUwithin the next week in order to answer anyremaining questions and to determine whether theHU resident selected to participate would be willingto do so

In cases where it was difficult to find a HU resi-dent at home at least 15 in-person call attemptswere made to each sample HU to make an initialcontact with a HU member Additional contactattempts were made by telephone using a reversedirectory and by leaving notes at the HU with thestudy toll-free number and asking someone in theHU to call the study office to schedule an appoint-ment Additional in-person contact attempts weremade based on the discretion of the supervisor

Once a HU resident was contacted a listing of alleligible HU residents was made that recorded the ageand sex of each such person confirmed that eachcould speak English and ensured that permanentresidents who were students living away from homein campus group housing were included The Kishtable selection method was used to select one eligibleperson as the primary predesignated respondent Inaddition in a probability sample of households inwhich there were at least two eligible residents asecond predesignated respondent was selected inorder to study within-household aggregation ofmental disorders and to reduce variation across HUsin within-household probability of selection into thesample No attempt was made to select or to inter-view a secondary respondent unless the primaryrespondent was interviewed first This decision wasmade to avoid compromising the response rate forprimary respondents

The sampling fraction for the second predesig-nated respondent was lower in HUs with only two eligible respondents than those with three ormore eligible respondents in order to reduce variancein the within-household probability of selectionweight No more than two interviews were taken perHU even when the number of HU residents was largebecause we were concerned that taking more thantwo interviews would adversely affect the responserate by increasing household burden Moreover welimited selection of a second respondent to 25 ofHUs because we were concerned that using thisprocedure in a larger proportion of HUs in a samplewith a target of 10000 interviews would adverselyaffect the geographic dispersion of the sample byreducing the total number of HUs in the sample

Interviewing hard-to-recruit casesHard-to-recruit cases included both HUs that weredifficult to contact because the residents were usuallyaway from home and respondents who were reluctantto participate once they were reached The first ofthese problems was addressed by being persistent inmaking contact attempts at all times of day and alldays of the week We also left notes at the HUswhere we were consistently unable to make contactsthat included toll free numbers for residents tocontact us to schedule interview appointments Thesecond problem was addressed by offering a $50financial incentive to all respondents and by using a

IJMPR 132 3rd v4 25604 1026 am Page 75

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 8: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al76

variety of standard survey persuasion techniques toexplain the purpose and importance of the surveyand to emphasize the confidentiality of responses

Main data collection continued until all non-contact HUs had their full complement of callattempts and all reluctant predesignated respondentshad a number of recruitment attempts It was clear atthis point that the remaining hard-to-recruit caseswere busy people who were unlikely to participate ina survey that could reasonably be expected to lastbetween 2 and 3 hours We therefore developed ashort-form version of the interview that could beadministered in less than 1 hour and we made onelast attempt to obtain an interview with hard-to-recruit cases using this shortened instrumentRemaining hard-to-reach cases were sent a letterinforming them that the study was about to end andthat we were offering an honorarium of $100 forcompleting a shortened version of the interview inthe next 30 days that would last no more than onehour Interviewers were also given a bonus for eachshort-form interview they completed in this 30-dayclose-out period at which point fieldwork ended

Sample dispositionThe sample disposition as of the end of the mainphase of data collection is shown in the first fourcolumns of Table 2 The first two columns describeprimary predesignated respondents in the 10843

final sample HUs while the third and fourthcolumns describe secondary predesignated respon-dents in the 1976 HUs that were selected to obtainsecond interviews Ineligible HUs are excluded fromthe table The response rate at this point in the datacollection was 709 among primary and 804among secondary predesignated respondents with atotal of 9282 completed interviews Non-respondents included 73 of primary and 63 of secondary cases who refused to participate 177 ofprimary and 116 of secondary respondents whowere reluctant to participate (who told the inter-viewer that they were too busy at the time of contactbut did not refuse) 20 of primary and 17 ofsecondary respondents who could not be interviewedbecause of either a permanent condition (forexample mental retardation) or a long-term situa-tion (such as an overseas work assignment) and HUsthat were never contacted (20)

We subsequently attempted to administer short-form interviews to the 2143 reluctant andno-contact primary predesignated respondents andthe 230 reluctant secondary predesignated respon-dents described in the first two columns of Table 2In the course of completing the short-form inter-views with primary respondents an additional 104secondary pre-designated respondents were gener-ated resulting in a total of 334 secondarypre-designated respondents who we attempted to

Table 2 The NCS-R sample disposition

Main interview Short-form interview

Primary Secondary Primary Secondary

(n) (n) (n) (n)

Interview 709 (7693) 804 (1589) 186 (399) 464 (155)Refusal 73 (791)1 63 (124)1 751 (1609) 443 (148)Reluctant 177 (1922) 116 (230) ndash 2 ndash 2 ndash 2 ndash 2Circumstantial 20 (216) 17 (33) 06 (14) 93 (31)No contact 20 (221) ndash 3 ndash 3 56 (121) ndash 3 ndash 3Total (10843) (1976) (2143) (334)

1 Refusals among primary respondents include informants who refused to provide a HU listing respondents who refused to be interviewed and respondents who began the interview but broke off before completing Part I Refusals among secondary respon-dents include the last two of these three categories2 All non-respondents in the short-form interview phase of the survey were classified as refusals rather than reluctant unless theyhad circumstantial reasons for not being interviewed 3 No contact is defined at the HU-level as never making any contact with a resident As a result none of the secondary pre-designated respondents was included in this category even if no contact was ever made with them In the latter case they wereclassified as reluctant in the main interview and as refusals in the short-form interview

IJMPR 132 3rd v4 25604 1026 am Page 76

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 9: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 77

interview The sample disposition is shown in thelast four columns of Table 2 The conditionalresponse rate was 186 among primary and 464among secondary pre-designated respondents with atotal of 554 completed short-form interviews

The 9282 main interviews combined with the554 short-form interviews total to 9836 with 8092among primary and 1744 among secondary respon-dents Focusing on primary respondents the overallresponse rate is 746 the overall cooperation rate(excluding from the denominator HUs that werenever contacted) is 761 and the cooperation rateamong pre-designated respondents without circum-stantial constraints on their participation is 778Among secondary respondents the conditionaloverall response rate is 838 and the cooperationrate among pre-designated respondents withoutcircumstantial constraints is 865

WeightingTwo approaches were taken to weighting the NCS-Rdata The first which we refer to as the non-response(NR) adjustment approach treats the 9282 main interviews as the achieved sample The short-form interviews are treated as non-respondentinterviews which are used to develop a non-responseadjustment weight applied to the 9282 main inter-views The second approach which we refer to as themultiple-imputation (MI) approach combines the9282 main interviews with the 554 short-form inter-views to create a combined sample of 9836 cases inwhich the 554 short-form interviews are weighted totreat them as representative of all initial non-respon-dents The method of MI (Rubin 1987) is used toadjust for the fact that the short-form interviews didnot include all the questions in the main interviewsAdditional weights are applied in both approaches toadjust for differences in within-household proba-bility of selection and to post-stratify the finalsample to approximate the distribution of the 2000Census on a range of socio-demographic variables

The NR approach is the main one used in analysisof the NCS-R data The MI approach is moreexploratory because MI can lead to downward bias inestimates of associations if the models used to makethe imputations do not capture the full complexity ofthe associations in the main cases On the otherhand by using explicit models to impute missingvalues the MI approach allows more flexibility than

the NR approach in combining observed variablesfor short-form cases with patterns detected in thecomplete data In addition mild modelling assumptions can be used in the MI approach toimprove efficiency compared to the NR approach Incases of this sort when the parameter estimates aresimilar in the two approaches the SE will generallybe lower in the MI approach As a result of thesepotential advantages of the MI approach we plan touse MI to replicate marginally significant substantivepatterns as a sensitivity analysis bearing in mindthat shrinkage of the associations might occur due tolack of precision in the imputations Based on thishierarchy between the two approaches in the NCS-Ranalyses the focus in this section of the paper islargely on the NR approach

The non-response adjustment approachFive weights are applied to the data in the NRapproach a locked building subsampling weight(WT11) a within-household probability of selec-tion weight (WT12) a non-response adjustmentweight (WT13) a post-stratification weight(WT14) and a Part II selection weight (WT15)The joint product of the first four of these weights isthe consolidated weight used to analyse data fromthe Part I sample in the NR approach (n = 9282)while the joint product of all five weights is theconsolidated weight used to analyse data from the PartII sample (n = 5692) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample andweight are used This section of the paper reviewseach of these five weights

The locked building subsampling weight (WT11)gives a double weight to 60 sample HUs from an orig-inal 120 apartments in locked apartment buildingswhere we could not make contact with a superinten-dent or other official to gain entry A random 50 ofthe predesignated units in these buildings wereselected for especially intense recruitment effortThese 60 were double weighted It should be notedthat residents of a number of other locked apartmentbuildings were also included and interviewed basedon the interviewers contacting the building ownersuperintendent or official committee of residentsand obtaining permission to carry out interviews inthe building In addition we were officially refusedpermission to carry out interviews in five large

IJMPR 132 3rd v4 25604 1026 am Page 77

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 10: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al78

locked buildings in New York City each representinga complete sample segment Non-probabilityreplacements of other locked buildings in the sameneighbourhoods were made in order to avoidcomplete non-response in these segments

The within-household probability of selectionweight (WT12) adjusts for the fact that the proba-bility of selection of respondents within HUs variesinversely with the number of people in the HU Thisis true because as noted earlier only one or tworespondents were selected for interview in each HUWT12 was created by generating a separate observa-tional record for each of the weighted (by WT11)14025 eligible HU residents in the 7693 partici-pating HUs The three variables in the householdlisting were included in the individual-level recordsrespondent age and sex and number of eligible resi-dents in the HU The weighted distributions of thesethree variables were then calculated for the 9282respondents the 4733 eligible HU residents whowere not interviewed and for all 14015 eligible HUresidents weighted to adjust for WT11 The sum ofweights was 14025 because of this adjustment

As shown in Table 3 these distributions differmeaningfully with the greatest difference being for

size of household This is because the individual-level probability of selection within HUs wasinversely proportional to HU size leading to adistortion in the age and sex distributions amongrespondents because these variables vary with size ofHU WT12 was constructed to correct this bias Theweighted three-way cross-classification of age sexand household size was computed separately for the9282 respondents (weighted to 9287 because ofWT11) and for all 14015 residents of the partici-pating HUs (weighted to 14025) The ratio of thenumber of cases in the latter to the former was thencalculated separately for each cell and these ratioswere applied to each respondent The sum of theweights across respondents was normed to sum to14025 These normed ratios define WT12

The non-response adjustment weight (WT13)was then developed and applied to the weighted (bythe product of WT11 and WT12) dataset of 9282respondents to adjust for the fact that non-respondents differ from respondents The partici-pants in the short-form interviews from initialnon-respondent HUs were treated as representativeof all non-respondents in order to develop thisweight As described in the next subsection a two-

Table 3 Household listing information for respondents and other eligible residents of the households thatparticipated in the main survey1

Respondents Others Total

(SE) (SE) (SE)

Age18ndash29 227 (04) 269 (06) 241 (04)30ndash44 317 (05) 298 (07) 311 (04)45ndash59 246 (04) 266 (06) 253 (04)60+ 210 (04) 166 (05) 195 (03)

SexMale 446 (05) 520 (07) 471 (04)Female 554 (05) 480 (07) 529 (04)

Number of eligible HU residents1 293 (05) 00 (00) 194 (03)2 539 (05) 606 (07) 562 (04)3+ 168 (04) 394 (07) 244 (04)

Total (n) (9287) (4738) (14025)

1 A total of 9282 respondents from 7693 HUs participated in the main survey An additional 4733 eligible people lived in the 7693HUs The locked building weight (WT11) was used in calculating the distributions in the table converting the 9282 respondents intoa weighted total of 9287 and the 4733 other eligible HU residents into a weighted total of 4738 for a total of 14025

IJMPR 132 3rd v4 25604 1026 am Page 78

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 11: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 79

part weight was applied as a preliminary step to theseshort-form respondents aimed at making them asrepresentative as possible of all residents of non-respondent HUs As the short-form respondentscompleted the disorder screening section as well asthe demographic sections it was possible to use diag-nostic stem questions in addition to individual-leveldemographic variables as predictors in comparing themain survey respondents with the initial non-respon-dents Prior to carrying out the stepwise logisticregression analysis the main survey respondentswere weighted to a sum of weights of 14025 to repre-sent all eligible residents of the respondent HUswhile the initial non-respondents were weighted to asum of weights of 6302 to represent all eligible resi-dents of non-respondent HUs In both cases thesesums were obtained from HU listings

WT13 is a very important weight because thelogistic regression equation on which it is based eval-uates whether there is a significant bias in theprevalence of mental disorders in the main sampleWe consequently consider the construction ofWT13 in some detail in this part of the paper Fourstages of model fitting were used to create the logisticregression equation on which WT13 is based Thesestages sequentially evaluated the effects ofgeographic variables individual-level demographicvariables census small area aggregate variables andscreening measures of mental disorders Results ofthe first three stages are presented in Table 4 Thefirst stage (Model 1) examined segment-level differ-ences between respondent and non-respondent HUsin Census region and Census urbanicity Results inthe first two columns of Table 4 show that respon-dent HUs differ meaningfully from non-respondentsHUs in urbanicity (χ2

7 = 1060 p = 000) and region(χ2

3 = 66 p = 009) The odds-ratios (ORs) forurbanicity and region show that the sample over-represents non-metropolitan counties the Midwestand the South

The second stage (Model 2) added basic individual-level demographic variables to the equa-tion ndash age sex race-ethnicity marital statuseducation employment status and household sizeAs shown in Table 4 household size (χ2

3 = 604 p =000) and marital status (χ2

2 = 92 p = 001) werethe only significant predictors in this set withrespondents significantly less likely than non-respondents to live in houses with one to three

eligible residents and to be never married or previ-ously married

The third stage (Model 3) was based on a forwardstepwise logistic regression analysis that searched for2000 census block group (BG) aggregate measuresthat significantly discriminate between respondentsand non-respondents A wide range of variables wasused to describe BG characteristics includingpercentage variables (for example the percentage ofadults in the BG living in poverty the percentageliving alone) and mean variables (for example theaverage family income of HUs in the BG the averagenumber of adults per HU) In cases where a samplesegment crossed BG boundaries weighted averagesacross these units were calculated

Five aggregate variables were found to be signifi-cant predictors in the third stage of analysis All werediscretized in elaborations of the basic logistic regres-sion equations in order to study their functional formin distinguishing respondents from non-respondents(005 level two-sided tests) Dichotomous classifica-tion was found to be appropriate to characterize thefunctional form of four predictors A trichotomousspecification was required for the fifth predictorResults presented in Table 4 show that respondentswere significantly more likely than non-respondentsto live in areas with a low proportion of people overthe age of 65 a low proportion of foreign-speakers ahigh proportion of non-Hispanic whites a highproportion of never-married people and highproportions of people not in the labour force

In the fourth stage of estimating the non-responsebias equation positive responses to diagnostic stemquestions in the screening section of the interviewwere added to the predictors in Model 3 Stem questions were included for mood disturbance(dysphoria euphoria irritability) anxiety (persistentworry panic agoraphobia specific fear social fearchildhood and adult separation anxiety) substanceproblems (alcohol drugs nicotine) and impulsecontrol problems (oppositional-defiant conductdisorder intermittent explosive attentional andhyperactive) As shown in Table 5 none of thesevariables alone was either a statistically or substan-tively significant predictor in the fourth stage of theanalysis These variables are significant as a group(χ2

20 = 381 p = 0010) though even though noneof the predictors in the equation is individuallysignificant We also considered more complex

IJMPR 132 3rd v4 25604 1026 am Page 79

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 12: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al80

Table 4 Geographic and socio-demographic predictors of response versus non-response based on thecomparison of main survey respondents (n = 9282) versus short-form survey respondents (n = 475)1

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Region Midwest 17 (11ndash26) 66 16 (10ndash25) 53 14 (09ndash21) 28 3South 18 (11ndash29) 16 (10ndash26) 13 (08ndash22)West 13 (07ndash25) 14 (07ndash26) 13 (07ndash24)Northeast 10 ndash

County urbanicity2

Central counties of metro 10 ndash 10 ndash 10 ndashareas of 1+million

Fringe counties of metro a 10 (05ndash21) 1060 12 (05ndash24) 875 08 (04ndash15) 232 7reas 1+ million

Central and fringe counties 15 (09ndash25) 16 (10ndash27) 15 (10ndash25)of metro areas of 250000 ndash1 million

Central and fringe counties 15 (08ndash29) 16 (09ndash29) 13 (07ndash25)of metro areasof less than 250000

Non-metro counties of 20000 19 (09ndash42) 21 (10ndash45) 20 (11ndash40)or more adjacent to a metro area

Non-metro counties of 20000 or 69 (42ndash111) 69 (41ndash118) 28 (14ndash54)more not adjacent to a metro area

Non-metro counties of 17 (08ndash37) 19 (09ndash40) 16 (08ndash35)2500 ndash19999

Non-metro counties of less than 31 (18ndash55) 34 (20ndash58) 22 (12ndash43)2500

Age18ndash29 10 ndash 10 ndash30ndash44 09 (06ndash14) 23 10 (07ndash15) 29 345ndash59 08 (06ndash12) 09 (06ndash14)60+ 11 (06ndash19) 13 (08ndash23)

SexFemale 10 (09ndash13) ndash 10 (09ndash13) ndashMale 10 ndash 10 ndash

Race Non-Hispanic White 10 ndash 10 ndashNon-Hispanic Black 15 (10ndash22) 39 17 (11ndash26) 76 3Hispanic 11 (07ndash17) 14 (09ndash20)Other 09 (05ndash15) 09 (05ndash15)

Marital statusMarried3 10 ndash 10 ndashNever married 16 (11ndash24) 92 17 (11ndash25) 91 2Separatedwidoweddivorced 18 (12ndash28) 18 (12ndash27)

Education0ndash11 10 ndash 10 ndash12 09 (06ndash14) 18 09 (06ndash14) 32 313ndash15 10 (07ndash14) 10 (07ndash14)16+ 11 (08ndash16) 12 (08ndash17)

IJMPR 132 3rd v4 25604 1026 am Page 80

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 13: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 81

specifications that included disorder clusters andcounts but none yielded any more compellingevidence than in Table 5 for significant differencesbetween respondents and initial non-respondents inthe prevalence of these diagnostic stem questions

Based on these results the final non-responseadjustment weight was based on Model 3 Each ofthe 9282 main study respondents was assigned apredicted probability of response based on this equa-tion The non-response adjustment weight was

defined as the multiplicative inverse of thatpredicted probability This weight was then normedto sum to 9282 This normed weight defines WT13

The 9282 cases were then weighted by the jointproduct of WT11 WT12 and WT13 A fourthweight (WT14) was then created to adjust for varia-tion between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the March 2002 Current PopulationSurvey (CPS) data This post-stratification weight

Table 4 contd

Model 1 Model 2 Model 3

OR (95 CI) χ2 OR (95 CI) χ2 OR (95 CI) χ2 df

Employment statusEmployed 10 ndash 10 ndashHomemaker 13 (08ndash20) 20 14 (09ndash21) 28 4Retired 11 (06ndash18) 11 (07ndash18)Student 09 (04ndash18) 09 (04ndash18)Other 11 (06ndash19) 11 (06ndash20)

Number of eligible household residents1 07 (04ndash11) 604 06 (04ndash10) 720 32 19 (11ndash35) 18 (10ndash33)3 27 (14ndash52) 26 (13ndash50)4+ 10 ndash 10 ndash

Block group-level socio-demographics4

Unable to speak English D1-6 19 (14ndash26) Never married D5-10 15 (11ndash22) Not in labor force D1-3 06 (04ndash09) Age 65+ D1-3 27 (16ndash45) Age 65+ D4-9 15 (09ndash23) Non-Hispanic White D1 04 (02ndash06)

Significant at the 005 level two-sided test

1 Based on logistic regression equations in which main survey respondents were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of all non-respondents usingmethods described in the text2 Metropolitan counties are defined as either central or fringe counties of census metropolitan statistical areas Non-metropolitancounties are defined residually as all counties that are not in census metropolitan statistical areas For more details on the CensusBureau definition of metropolitan statistical areas see httpwwwcensusgovpopulationwwwestimatesmetrodefhtml3 Includes co-habitors4 The percentages in each county were converted to percentiles based on weighted (by population size) county-level distributionsand converted into deciles (D) of the weighted percentile distributions The coefficients from preliminary regression analysesincluded nine dummies for the deciles of each substantive variable These coefficients were examined in order to choose theoptimal way to collapse the deciles A dichotomous coding was optimal for four of the five substantive variables and a trichotomouscoding for the fifth

IJMPR 132 3rd v4 25604 1026 am Page 81

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 14: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al82

was based on comparisons of age sex race-ethnicityeducation marital status region and urbanicity inthe weighted (by the joint product of WT11WT12 and WT13) sample and the 2002 CPSsample for persons ages 18+ in the continental USAs the full cross-classification of these seven vari-ables even using coarse categories has more cellsthan there are respondents in the sample asmoothing method was used to fit only the two-waymarginals by estimating a logistic regression equationthat combined the weighted 9282 respondents(coded 1 on the dichotomous dependent variable)with a synthetic dataset representative of the individual-level 2002 CPS data for the populationages 18+ (coded 0 on the dependent variable) Theseven socio-demographic variables and their two-way interactions were the predictors The predictedprobability of being in the NCS-R sample ratherthan in the synthetic CPS population sample wasdefined for each respondent (pNn) based on thisequation The ratio pNn (1 ndash pNn) was then calcu-lated for each respondent The sum of these ratiosacross the 9282 respondents was normed to sum to9282 These normed ratios define WT14

The four-way product of weights WT11 throughWT14 was then created and applied to the 9282respondent cases This defines the consolidated PartI weight based on the NR approach An additionalweight (WT15) is needed though to analyse datain the Part II sample This is because as notedearlier the Part I sample was divided into three stratathat differed in their probabilities of selection intoPart II (100 in stratum I 50 in stratum II and25 in stratum III) The proportions who actuallycompleted Part II differ from these targets 992 ofthe 4088 respondents in stratum I (n = 4054)534 of the 1555 respondents in stratum II (n = 830) and 222 of the 3639 respondents instratum III (n = 808) for a total Part II sample of5692 respondents These differences from the targetproportions occurred because some respondentsrefused to complete Part II (approximately 1 ofdesignated cases in each stratum) and because therandom selection procedure for respondents imple-mented in the CAPI program had some randomerror The latter accounts for the fact that thenumber of Part II respondents in stratum II is largerthan the target proportion

We could have generated WT15 as the simple

inverse of the weighted (by the consolidated Part Iweight) proportion of Part I respondents in thestratum who completed Part II However in order tocheck for the possibility of systematic bias we esti-mated separate stepwise logistic regression equationsin each stratum to predict participation in Part IIfrom Part I responses Only a handful of variables allof them demographic were found to be significantpredictors in these equations Each Part II respon-dent was assigned the inverse of his or her predictedprobability of participation in Part II from the finalwithin-stratum equation These values were thennormed to have a sum equal to the sum of the Part Iweights in the full Part I sample in the stratumThese normed values were then summed across theentire Part II sample of 5692 cases and renormed tohave a sum of weights of 5692 These renormedvalues define WT15 WT15 was then multiplied bythe consolidated Part I weight to create the consoli-dated Part II weight

Comparison of the weighted and unweighteddistributions of the Part I and Part II samples with2002 CPS population distributions provides informa-tion on the effects of weighting As shown in Table6 the unweighted Part I samples overrepresentedracial minorities females residents of the Midwestpeople with 13+ years of education and residents ofmetropolitan areas All of these biases were correctedwith the consolidated Part I weight Biases in theunweighted Part II sample were similar althoughmore extreme biases were found than in the Part Isample with regard to the over-representation offemales young adults (ages 18ndash34) and residents ofMetropolitan areas All of these biases werecorrected with the consolidated Part II weight

Weighting short-form respondents to be representative ofall non-respondentsWe noted in the last subsection that WT13 relies ona two-part weighting scheme that was applied to theshort-form respondents before they were comparedwith the main survey respondents This was done inorder to increase the extent to which the short-formrespondents represent all main survey non-respondents The first part of this weighting schemecompared the 399 HUs in which short-form inter-views were completed with all other non-respondentHUs on the same aggregate 2000 census block group(BG) data as those used in the third stage (Model 3)

IJMPR 132 3rd v4 25604 1026 am Page 82

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 15: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 83

of the non-response adjustment model (Table 4)Stepwise logistic regression was used to select signifi-cant predictors of HU-level participation versusnon-participation The final set of significant predic-tors included region urbanicity and household sizeThe participating HUs were then weighted by theinverse of their predicted probability of participationto adjust for segment-level variation in responseThis weight was then normed to have a sum ofweights equal to 3258 the weighted total number ofnon-respondent HUs in the sample

With this first weight applied separately to each ofthe 773 residents of the 399 participating short-formHUs a comparison of the short-form respondents inthese 399 HUs with the remaining eligible residentsof the same HUs was made based on the cross-classification of listing information (age and sex ofeach eligible HU resident and number of eligibleresidents in each HU) A second weight was thendeveloped that was equivalent to WT12 in adjustingthe short-form respondents to be representative of all773 eligible residents of these HUs As the 399 HUs

Table 5 Diagnostic stem question predictors of response versus non-response based on the comparison ofmain survey respondents (n = 9282) versus short-form survey respondents in non-respondent households (n = 475)1

Bivariate Multivariate

OR (95 CI) OR (95 CI)

I Mood disturbanceDysphoria 09 (07ndash12) 10 (08ndash14)Euphoria 08 (06ndash11) 09 (07ndash12)Irritability 08 (06ndash10) 08 (06ndash11)Extreme irritability 08 (06ndash12) 09 (07ndash13)

II AnxietyPersistent worry 09 (07ndash11) 09 (07ndash12)Panic 08 (06ndash11) 08 (06ndash11)Agoraphobic fear 09 (07ndash12) 09 (07ndash12)Specific fear 10 (07ndash13) 10 (08ndash13)Social fear 10 (07ndash13) 10 (07ndash14)Separation anxietyndash Childhood only 12 (08ndash18) 13 (09ndash20)ndash Adult only 11 (07ndash17) 13 (08ndash20)ndash Both 13 (07ndash24) 16 (08ndash29)

III Substance problemsNicotine 12 (09ndash15) 12 (09ndash16)Alcohol-drugs 11 (08ndash15) 11 (08ndash14)

IV Impulse-control problemsOppositional-defiant 10 (08ndash13) 10 (08ndash14)Conduct 09 (07ndash11) 09 (06ndash11)Intermittent explosive 11 (09ndash14) 12 (10ndash16)Attention-hyperactive problemsndash Attention only 09 (06ndash13) 09 (07ndash13)ndash Hyperactive only 11 (07ndash19) 11 (06ndash20)ndash Both 10 (06ndash06) 10 (06ndash16)

1 Based on logistic regression equations in which respondents in the main survey were coded 1 and short-form survey respondents(excluding secondary respondents in HUs where the primary respondent participated in the main survey) were coded 0 on a dichoto-mous dependent variable Short-form respondents in addition were weighted to be representative of the residents of all non-respondent HUs using methods described in the text All equations controlled for the predictors in Model 3 in Table 4

IJMPR 132 3rd v4 25604 1026 am Page 83

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 16: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al84

were weighted to represent all 3258 non-respondentHUs in the entire sample the sum of weights acrossthe 773 eligible residents of the 399 participatingHUs was equal to 6302 The latter is our best esti-mate of the number of eligible residents of allnon-respondent HUs The two weights were thenmultiplied together and the sum of the productnormed to equal 6302

This weighting scheme was then used to comparethe 9282 main sample respondents (with a sum ofweights of 14025) with the short-form respondentswho lived in HUs where the primary pre-designatedrespondent did not complete an interview in themain survey (n = 475 with a sum of weights of6302) to create a non-response adjustment weightNote that no weighting of the 9282 cases based onaggregate Census small area data was made beforecomparison with the short-form respondents Thereason is that the 9282 respondents represented onlytheir own HUs in this comparison whereas theshort-form respondents were made to represent allnon-respondents rather than only non-respondentsin their 399 HUs Note that the secondary short-form respondents from HUs where the primaryrespondent completed an interview in the mainsurvey were excluded from this exercise

The multiple-imputation approach Five weights were used in the MI approach a lockedbuilding subsampling weight (WT21) a weight thatadjusts for geographic variation in response rate(WT22) a within-household probability of selec-tion weight (WT23) a post-stratification weight(WT24) and a Part II selection weight (WT25)The joint product of the first four weights is theconsolidated weight used to analyse data from thePart I sample in the MI approach (n = 9836) whilethe joint product of all five weights is the consoli-dated weight used to analyse data from the Part IIsample (n = 6279) When the data to be analysedinclude some variables assessed in Part I and othervariables assessed in Part II the Part II sample isused This section of the paper briefly reviews each ofthese five weights

The locked building weight (WT21) is identicalto WT11 The non-response adjustment weight(WT22) in comparison differs from the similarweight in the NR approach because the short-formcases which were treated as non-respondents in the

NR approach are treated as respondents in the MIapproach This means that non-response adjustmentin the MI approach must be based entirely oncomparisons of small area census data betweenrespondent HUs and non-respondent HUs As aresult the MI non-response adjustment weight(WT21) adjusts for segment-level variation in thehousehold response rate The simple way of doingthis would have been to weight each participatingHU by the inverse of the response rate in itssegment However this would have created aproblem for the small number of segments in whichno interviews were obtained and it would also haveintroduced an unnecessarily large amount of varia-tion in the weight because of the small level ofaggregation As an alternative then the sameapproach was used as in developing the non-responseadjustment weight (Table 4) Specifically stepwiselogistic regression analysis was used to calculate thepredicted probability of participation of each HUthat actually participated in the survey based on acomparison of BG demographic data from the 2000census The inverse of this predicted probability wasthen used as the non-response adjustment weight

The product of WT21 and WT22 was thencomputed for each participating HU and thisproduct was normed so that it summed to 8092 thenumber of participating HUs in the entire survey Athird weight (WT23) was then created at therespondent level to adjust for variation in within-household probability of selection As in the NRapproach this was done by creating a separateweighted (by the product of WT21 and WT22)observational record for each of the 14788 eligibleHU residents in the 8092 participating HUs witheach case assigned his or her HU weight The threevariables in the household listing (respondent ageand sex and size of household) were included in theindividual-level records The weighted distributionsof these three variables were then calculated sepa-rately for the 9836 respondents and the remaining4952 eligible residents in the same HUs As with theNR approach these distributions were found todiffer meaningfully with the greatest differencebeing for size of household As in the non-responseadjustment approach the weighted three-way cross-classifications of age sex and household size wascomputed separately for the 9836 respondents andfor all 14788 residents of the participating HUs the

IJMPR 132 3rd v4 25604 1026 am Page 84

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 17: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 85

ratio of the number of cases in the latter versus theformer was calculated within cells and these ratioswere applied to each of the 9836 respondents Thesum of the ratios was then normed to sum to 9836These normed ratios define WT23

The three-way product of WT21 WT22 andWT23 was computed for each respondent and thisproduct was normed so that it summed to 9836 Afourth weight (WT24) was then created to adjust for

variation between the joint distribution of severalsocio-demographic variables in this weighted samplecompared to the 2000 Census This post-stratifica-tion weight was constructed in exactly the same wayas in the NR approach (WT14) As a result adescription of this method will not be repeated here

The four-way product of WT21-WT24 was thencomputed to define the consolidated Part I weightbased on the MI approach An additional weight

Table 6 Comparison of unweighted and weighted Part I and Part II NCS-R respondents with socio-demographicinformation from the March 2002 Current Population Survey (CPS)

NCS-R Part I NCS-R Part II CPS

Unweighted Weighted Unweighted Weighted

(SE) (SE) (SE) (SE)

RaceNon-Hispanic White 721 (18) 732 (19) 734 (17) 728 (18) 730Non-Hispanic Black 133 (11) 116 (11) 126 (10) 124 (10) 116Hispanic 95 (10) 108 (10) 93 (10) 111 (12) 110Other 51 (07) 44 (04) 47 (07) 38 (04) 44

SexMale 446 (05) 479 (05) 418 (06) 470 (10) 480Female 554 (05) 521 (05) 582 (06) 530 (10) 520

RegionNortheast 184 (18) 193 (33) 183 (18) 188 (30) 192Midwest 267 (17) 232 (18) 275 (17) 235 (18) 229South 345 (10) 358 (20) 325 (12) 356 (19) 358West 205 (06) 217 (22) 217 (10) 221 (19) 221

Age18ndash34 327 (10) 315 (12) 340 (11) 315 (12) 31235ndash49 309 (06) 315 (08) 322 (07) 309 (10) 31750ndash64 207 (05) 211 (06) 213 (07) 209 (10) 21165+ 157 (05) 160 (05) 125 (06) 167 (10) 161

Educationlt 12 Years 449 (13) 484 (17) 450 (14) 493 (15) 48713+ Years 551 (13) 516 (17) 550 (14) 507 (15) 513

Marital statusMarried 573 (09) 558 (11) 569 (10) 559 (12) 560Not married 427 (09) 442 (11) 431 (10) 441 (12) 440

Urbanicity statusMetropolitan 756 (30) 675 (54) 769 (31) 682 (50) 673Non-metro 244 (30) 325 (54) 231 (31) 318 (50) 327

IJMPR 132 3rd v4 25604 1026 am Page 85

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 18: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al86

(WT25) was then constructed to analyse dataincluded in Part II of the interview (n = 6279)WT25 was constructed using the same three strataand the same methods as WT15 in the NRapproach The product of the Part I MI weight andWT25 was computed for each Part I respondent andthis product was normed so that it summed to 6279This normed product defines the consolidated Part IIweight based on the MI approach

Item-level imputationThe amount of item-missing data is much smaller inNCS-R than in a paper-and-pencil survey of compa-rable complexity because the CAPI programeliminated interviewer skip errors However respon-dents occasionally refused to answer some questionsand sometimes reported that they did not know theanswers to other questions As in many surveys thehighest item-level non-response was for the ques-tions about earnings and family income Aregression-based multiple imputation approach wasused to impute missing values for these income datausing information about age sex education employ-ment status and occupation of household residentsConservative rational imputation was used for otheritems that have item-missing cases For example inthe life events section the small numbers of missingvalues were recoded as negative responses In thecase of missing items in a psychometric scale respon-dents were assigned a total scale score based onpartial values using mean standardized scores on theremaining items in the scale The MI approach couldhave been used to take these item-level imputationsinto account in evaluating statistical significancebut we did not do so because the amount of item-missing data was quite small

Design-based estimationWeighting and clustering introduce imprecision intodescriptive statistics Conventional methods of esti-mating significance which assume a simple randomsample do not take this imprecision into considera-tion As a result special design-based methods ofestimating SEs and significance tests are being usedin the analysis of the NCS-R data The Taylor serieslinearization method is the main approach used here(Wolter 1985) although we also use the morecomputationally intensive method of jackkniferepeated replications (JRR) for some applications

(Kish and Frankel 1974) JRR is used for applica-tions where a convenient software application usingthe Taylor series method is not readily available andfor highly non-linear estimation problems in whichthe linearization of the Taylor series method mightbe problematic

As noted earlier in the paper the NCS-R is basedon 84 PSUs and pseudo-PSUs These 84 weredivided into 42 matched pairs for purposes of esti-mating design effects Design-based estimation wasthen carried out using 42 sampling strata and twosampling error calculation units (SECUs) perstratum Although the decision to work with onlytwo SECUs per stratum was arbitrary from theperspective of the Taylor series estimation method itwas necessary for using JRR

Although the effects of weighting on clusteringcan be described in a number of ways a particularlyconvenient approach is to calculate a statistic knownas the design effect (DE) (Kish and Kish 1965) for anumber of variables of interest The DE is the squareof the ratio of the design-based SE of a descriptivestatistic divided by the simple random sample SEThe DE can be interpreted as the approximateproportional increase in the sample size that wouldbe required to increase the precision of the design-based estimate to the precision of an estimate basedon a simple random sample of the same size

Design effectsDesign effects due to clustering are usually a gooddeal larger in estimating means and other first-orderstatistics than more complex statistics The reasonfor this is that the number of respondents having thesame characteristics in the same SECU of a singlestratum becomes smaller and smaller as the statisticsbecome more complex This leads to a reduction inthe effects of clustering in the estimation of DEDesign effects due to weighting are also usuallysomewhat smaller for multivariate than bivariatedescriptive statistics because DEs are due not onlyto the variance in the weights but also to thestrength of the association between the weights andthe substantive variables under considerationMeans typically have higher DEs than other statis-tics so evaluations of DEs typically focus on theestimation of means We do the same here consid-ering prevalence estimates for a number of themental disorders assessed in the NCS-R Five

IJMPR 132 3rd v4 25604 1026 am Page 86

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 19: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 87

dichotomous measures of lifetime prevalence ofDSM-IV disorders included in the Part I samplewere included in the evaluation of Part I DEs Theseinclude major depressive disorder (MDD) bipolardisorder (BPD) generalized anxiety disorder(GAD) panic disorder (PD) and intermittentexplosive disorder (IED) The DEs for those esti-mates are 16 for MDD 14 for BPD 16 for GAD09 for PD and 19 for IED

As described earlier only 5692 of the 9282 Part Irespondents were administered Part II If Part IIrespondents were a simple random sample of the PartI sample the expected design effect of the formercompared to the latter would be approximately 16(92825692) However we expected the designeffects for most prevalence estimates to be consider-ably lower than this due to the fact that Part IIrespondents include all Part I respondents who metcriteria for any of the DSM-IV disorders assessed inPart I plus other respondents selected proportional tohousehold size In order to evaluate whether thisexpectation is borne out in the data we calculatedthe DEs for the same five prevalence estimates as inthe last paragraph for the Part II sample and thencomputed the ratios of the DEs based on the Part IIversus Part I samples The results are as follows 156for MDD 092 for BPD 112 for GAD 062 for PDand 080 for IED

The fact that these estimates with the exceptionof MDD are all considerably smaller than the 16 wewould have expected based on using simple randomsampling to select respondents into Part II demon-strates that the disproportional case-controlsampling approach used to select the Part II sampleincreased the efficiency of the Part II sample Indeedthe fact that the design effect is close to 10 in anumber of cases means that this sub-samplingapproach was able to retain the vast majority of theprecision in the full Part I sample with only slightlymore than 60 of the Part I sample The exception isMDD by far the most prevalent of the disordersconsidered here with a ratio of controls to cases inthe Part II sample of less than 41 As a result of thiscomparatively low ratio a meaningful amount ofprecision was lost by subsampling controls into PartII However this is a self-limiting problem in thatthe high prevalence of MDD means that we havegreater precision for studying this condition than theless common conditions

Trimming weights to reduce design effectsEstimates of DE can be sensitive to extreme weightsWeight trimming of various sorts is often used toreduce this sensitivity A small amount of trimmingwas built into the construction of two of theweighting steps described above First the withinprobability of selection weights (WT12 WT23)were trimmed by combining respondents in house-holds with three or more eligible residents into asingle weighting stratum This means that noattempt was made for example to assign a weight of80 to the rare respondent who lived in a householdwith eight eligible residents Instead respondentsliving in HUs with three or more eligible residentswere combined and the weight to adjust for theirunder-sampling was distributed equally among all ofthem Second trimming was used in the non-response adjustment weights (WT13 WT22) todistribute the weights at the tails of these distribu-tions (the upper and lower 2 of each distribution)equally across all cases at these tails

We also empirically investigated the implicationsof trimming the final consolidated Part I weight inthe non-response adjustment approach Althoughweight trimming usually reduces the variance ofweights and in this way improves the precision ofestimates and the statistical power of tests trimmingcan also lead to bias in the estimates If the reductionin variance created due to added efficiency exceedsthe increase in variance due to bias the trimming ishelpful overall Weighting is unhelpful in compar-ison if the opposite occurs It is possible to study thistrade-off between bias and efficiency empirically inorder to evaluate alternative weight trimmingschemes by making use of the equality

MSEYp = BYp2 + Var(Yp) (1a)

= (BYp)2 - Var(BYp) + Var(Yp) (1b)

where MSEYp is the estimated mean squared error ofthe prevalence of outcome variable Y at trimmingpoint p BYp is the estimated bias of that prevalenceestimate Var(BYp) is the estimated variance of BYpand Var(Yp) is the estimated variance of Yp

Each of the three elements in equation (1b) canbe estimated empirically for any value of p makingit possible to calculate MSE across a range of trim-ming points and to determine in this way the

IJMPR 132 3rd v4 25604 1026 am Page 87

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 20: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al88

trimming point that minimizes MSE The firstelement (BYp)2 was estimated directly as (Yp minus Y0)2

where Y0 represents the weighted prevalence estimate of Y based on the untrimmed weight Theother two elements in equation (1b) were estimatedusing a pseudo-replicate method in which 84 sepa-rate estimates were generated for Yp at each value ofp (Zaslavsky Schenker and Belin 2001) Thenumber 84 is based on the fact that the NCS-Rsample design has 42 strata each with two SECUsfor a total of 84 stratum-SECU combinations Theseparate estimates were obtained by sequentiallymodifying the sample and then generating an esti-mate based on that modified sample Themodification consisted of removing all cases fromone SECU and then weighting the cases in theremaining SECU in the same stratum to have a sumof weights equal to the original sum of weights inthat stratum If we define Yp as the weighted estimateof Y at trimming point p in the total sample and wedefine Yp(sn) as the weighted estimate at the sametrimming point in the sample that deletes SECU n (n = 12) of stratum s (s = 1ndash42) then Var(Yp) can beestimated as

Var(Yp) = SUMS[(Yp(s1) minus Yp)2 + (Yp(s2) minus Yp)2]2

(2)

Var(BYp) was estimated in the same fashion byreplacing Yp(sn) in Eq (2) with BYp(sn) = Yp(sn) ndash Y0(sn)

and replacing Yp with BYp(sn) = Yp ndash Y0 This analysis compared the design-based MSE of

lifetime prevalence estimates for the same fiveoutcomes considered in the last subsection plus five comparable outcomes for 12-month prevalenceusing the consolidated Part I weight and 10 succes-sively more severely trimmed versions of theseweights in which between 1 and 10 of cases weretrimmed at each tail of the distribution Trimmingconsisted of distributing the weights at each of thesetails equally across all cases in that tail As resultswere very similar across the outcomes averages ofMSEYp BY

2 and Var(Yp) were calculated at eachtrimming point The mean of MSEY0 across theoutcomes was then arbitrarily set at 100 and all othervalues were defined in relation to that mean for easeof interpretation

Summary results are presented in Table 8 Three

patterns are immediately apparent First the equalityin Eq (1a)ndash(1b) does not hold in the table becausethe results are based on means across a number ofequations that were calculated on raw data Secondthe decrease in the variance of the prevalence is verymodest with successively higher percentages of trim-ming (approximately 25) and only has effects atthe highest levels of trimming (9ndash10) Third thevariance due to bias increases from a low of approxi-mately 05 at 1 trimming to approximately 14at 7 trimming and remains fairly constant in therange 7ndash10 Because this increase is greater thanthe decrease in the variance of Y due to trimmingMSE increases as a result of trimming Based on thisresult we did not trim the consolidated weight

Model-based versus design-based estimationAlthough analysis of the NCS-R data will largelyinvolve design-based methods we will also explorethe possibility of using model-based estimation ofrisk factors This approach uses weights as predictorvariables in unweighted analyses It is also possible tocarry out model-based analyses of the effects of clus-tering by including dummy variables for segments orSECUs as predictor variables In cases where theweights or clusters significantly interact withsubstantive predictors it is necessary to include theseinteractions in prediction equations and to recognizethat the effects of the substantive predictors varydepending on the determinants of the weights andoron geography Insights into interactions that involveweights can be obtained by substituting the substan-tive variables on which the weights are based for theweights themselves in expanded prediction equa-tions Similar insights into interactions that involveclusters can be obtained by including small areacensus data in expanded prediction equations usingrandom effects models to interpret the modifyingeffects of the clusters

In cases where the weight or cluster variables arefound to be significant predictors but not to havesignificant interactions with substantive predictorsthe coefficients associated with substantive predic-tors can be interpreted as they would if the analyseswere based on weighted data The reason for doingthe extra work involved in the model-based estima-tion in such a situation is that the SE of thesubstantive predictors are likely to be smallerperhaps substantially so than in analyses based on

IJMPR 132 3rd v4 25604 1026 am Page 88

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 21: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 89

weighted data Finally in cases where the weights areneither significant predictors nor significant modi-fiers of the effects of the substantive predictors theweights can be ignored entirely leading to a similarreduction in the estimated SE of the coefficientsassociated with substantive predictors

As model-based analysis is very labour intensiveour use of this approach in the NCS-R will bereserved for the most important areas of investiga-tion in which concerns exist about the statisticalpower and sensitivity of parameter estimates A verypreliminary exploration of likely complexities inimplementing such an approach based loosely onthe approach proposed by DuMouchel and Duncan(1983) was carried out by examining the associationsof weights WT12 through WT14 in predicting life-time prevalence of the same five disorders consideredin the last section In addition to evaluating themain effects of the weights we also evaluated the statistical significance of interactions betweenthe weights and several socio-demographic variables(age sex education race-ethnicity) in predictingthe outcomes

Summary results are presented in Table 8 The χ2

values of main effects are shown in Part I for clusters(83 dummy predictor variables) and weights (threeseparate continuous variables) and in Part II forselected socio-demographic variables (age sex

education race-ethnicity) In Part I none of themain effects of either clusters or WT13 is significantat the 005 level while WT12 is significant in fourequations and WT14 in one equation Inspection ofthe coefficients associated with the significantweights shows that the effects of WT12 are due topeople who live alone (who are overrepresented inthe unweighted sample) having higher prevalencethan other respondents while the effect of WT14 isdue to women (who are overrepresented in theunweighted sample) having a higher prevalence ofMDD than men In Part II age is significant in allfive equations (due to high prevalence in the latemiddle-aged age group) sex is significant in four ofthe five (due to women having higher prevalencethan men) education in one (due to high prevalenceof BPD among people with the highest education)and race-ethnicity is significant in three of the five(due to non-Hispanic Whites having higher preva-lence than non-Hispanic Blacks or Hispanics)

Parts III-V show χ2 values of interactions betweenthe socio-demographic variables and the weightsUsing 005-level tests 10 of the interactionsinvolving WT12 10 involving WT13 and 30involving WT14 are statistically significant Of the10 significant interactions six involve age and fourinvolve education Inspection of the coefficientsshows that the age interactions are due to greater

Table 7 The effects of trimming the Part I weight in the range 1ndash10 on the mean squared error of prevalence estimates1

of trimming at each tail Bias (BYp2) Inefficiency [Var(Yp)] Mean squared error

0 00 1000 10001 05 1006 10132 08 1025 10293 21 1006 10224 32 991 10165 60 987 10386 85 1003 10847 143 1003 11448 132 997 11179 129 975 1087

10 128 981 1090

1 Lifetime and 12-month prevalence estimates for positive screens of broad categories of DSM-IV disorders were included in theevaluation of design effects These included any lifetime anxiety disorder mood disorder impulse-control disorder substance usedisorder and any of the four kinds of disorder The Taylor series method was used to estimate each prevalence with the untrimmedPart I weight (Trimming = 0) and with trimming in the range 1ndash10 at each tail Trimming consisted of equally distributing the sumof weights at the tail across all cases at the tail As results were quite similar for all ten screening measures results are averagedhere Note that the equality BYp2 + Var(Yp) = mean squared error does not hold here as the averaging was done at the level ofabsolute bias SE and root mean squared error

IJMPR 132 3rd v4 25604 1026 am Page 89

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 22: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al90

effects of age among people who live alone (WT12)among people who have a low probability of partici-pating in the survey (WT13) and among peoplewho are underrepresented in the sample afteradjusting for non-response bias (WT14) The educa-tion interactions are due to greater effects ofeducation among people who are under-representedin the sample after adjustment for non-response bias(WT14)

Three broad conclusions can be drawn from thisexercise The first is that the effects of weightscannot be ignored even in studying very simple asso-ciations of the sort considered in Table 8 Some wayof dealing with the weights is needed using eitherdesign-based or model-based methods The secondconclusion is that the effects of many substantivelyimportant predictors are not modified by weightingThis means that it will often be useful to carry outmodel-based analysis to investigate whether riskfactors of special importance are unaffected byweights The third conclusion is that the significantmodifying effects of weights can often be decom-posed to yield substantively meaningfulinterpretations in unweighted analyses This thirdconclusion supports the use of model-based methodsto study important risk factors even in cases whereweight modification is found to exist

OverviewThis paper has presented an overview of the NCS-Rsurvey design and field procedures The use of face-to-face interviewing as the data collection mode isconsistent with most other major national govern-ment-sponsored household surveys Our ability tomanage the complexity of the interview wasincreased greatly by the use of CAPI rather thanPAPI administration The use of CAPI was the onemajor change in the fundamental survey conditionsintroduced into the NCS-R compared to the baselineNCS As described in the body of the paper westopped short of using A-CASI because of concernsabout trending disorder prevalence estimates andtiming However A-CASI is likely to be the mostimportant innovation introduced into the nextround of the NCS which we tentatively plan to fieldin 2010

The NCS-R multi-stage area probability sampledesign was very similar to the NCS design This wasdone to facilitate trend comparisons However three

changes were made in the within-HU selectionprocedures First we interviewed people in the agerange 18+ rather than in the NCS age range of15ndash54 The exclusion of the 15ndash17 age range wasdictated by the fact that we are carrying out a sepa-rate NCS Adolescent survey of 10000 respondentsin the age range 13ndash17 The inclusion of the agerange 55+ was based on the desire to study the entireadult age range Second we sampled two respondentsin a probability subsample of NCS-R HUs TheNCS in comparison had only one respondent ineach HU The implications of this design change canbe evaluated by analysing the NCS-R data separatelyin the subsample of primary respondents which isidentical to the NCS within-HU sample designThird we used a much more efficient way ofselecting Part I non-cases for participation in Part IIof the interview in the NCS-R than in the NCSWhile simple random sampling was used to selectPart II respondents in the NCS differential samplingproportional to HU size was used in the NCS-RBecause of this change the Part II NCS-R sample of5692 cases is considerably more efficient than thePart II NCS sample of 5877 cases

The 709 response rate of primary respondentsin the NCS-R is considerably lower than the 802response rate in the NCS a decade earlier This ispart of a consistent trend in survey response ratesbecoming much lower over the past decade than inthe past Interestingly though while the NCS non-response survey which was very similar in design tothe NCS-R short-form survey found statisticallysignificant underestimation of disorder prevalenceamong NCS respondents versus non-respondents noevidence for downward bias was found in the NCS-Rshort-form survey This result suggests that the NCS-R might be as representative of the population orperhaps even more so with respect topsychopathology as the baseline NCS despite thelower response rate

The Part I NCS-R interview was very similar inlength to Part I of the NCS interview while the PartII NCS-R interview was longer by an average ofabout 30 minutes than NCS Part II This led to ahigher proportion of NCS-R than NCS interviewsthat had to be completed over multiple interviewsessions This difference may have implications fortrending We were mindful of this possibility indesigning the NCS-R interview schedule and we

IJMPR 132 3rd v4 25604 1026 am Page 90

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 23: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

NCS-R design and field procedures 91

consequently included comparable questions largelyin the first half of the interview schedule in order notto change the time burden across the two surveys atthe time the comparable questions were asked Thegoal here was to remove any potential order bias thatmight lead to differences in response

The design-based estimation approach brieflydescribed in this paper is identical to the approachused to analyse the NCS data Importantly as theNCS-R PSUs are linked to the NCS PSUs it will bepossible to merge the two surveys and blend strata forpurposes of increasing statistical power in carrying

Table 8 χ2 values of the main effects and interactions of design weights with socio-demographic variables inpredicting life time Major Depressive Disorder (MDD) Bipolar Disorder I or II (BPD) Generalized AnxietyDisorder (GAD) Panic Disorder (PD) and Intermittent Explosive Disorder (IED) in the NCS-R Part 2 Sample (n = 5692)

df MDD BPD GAD PD IED

I Main effects of clusters and weightsStrataSECU variables 83 1009 797 627 516 726HU selection weight (WT12) 1 98 003 135 97 63Non-response weight (WT13) 1 003 14 11 00 00Post-stratification weight (WT14) 1 82 27 16 02 01

II Main effects of demographicsAge 3 663 558 198 666 240Education 3 52 138 51 58 70Sex 1 407 003 136 406 110Race 3 140 27 89 139 45

III Interactions involved with WT12HU selectionage 3 95 50 50 96 28HU selectioneducation 3 27 18 62 28 01HU selectionsex 1 23 19 71 23 09HU selectionrace 3 19 14 08 18 13

IV Interactions involved with WT13Non-response age 3 183 21 09 184 57Non-response education 3 50 32 71 50 14Non-response sex 1 13 08 07 13 05Nonresponse race 3 70 08 31 18 08

V Interactions involved with WT14Post-stratification age 3 50 67 57 49 84Post-stratification education 3 108 45 80 108 128Post-stratification sex 1 34 09 26 33 00Post-stratification race 3 42 101 05 41 19

Significant at the 005 level two-sided test using estimation methods that assume the sample is a simple random sample

All disorders were defined with diagnostic hierarchy rules The Taylor series method was used to estimate design effectsStandard errors based on simple random sampling were assumed to have an expectation of (pqn)12

out trend comparisons Although model-based esti-mation was not used in the NCS this will be animportant feature of the NCS-R analyses as well as ofNCS versus NCS-R trend analyses based on thegreater focus than in the NCS on risk factor analysesand on concerns about the extent to which riskfactor results generalize to segments of the popula-tion that might differ in representation across sampleweights and clusters

Finally it is important to recognize that a richmulti-purpose survey like the NCS-R contains somuch information that it will take many years and

IJMPR 132 3rd v4 25604 1026 am Page 91

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92

Page 24: The US National Comorbidity Survey - Harvard Medical School · The US National Comorbidity Survey Replication (NCS-R): design and field procedures RONALD C. KESSLER, 1 PA TRICIA BERGLUND,

Kessler Berglund et al92

many workers to learn all the survey has to tell usThis is a much greater undertaking than any oneresearch team can hope to achieve As a result apublic use NCS-R data file is being released for unre-stricted use We hope that this data file will be thesource of many dissertations and secondary analysesthat expand on the primary analyses being carried outby our research group The NCS Web site will postinformation about timing and procedures forobtaining the public data file We will also offer aseries of training courses in the use of the public datafile Information about these courses will be posted onthe Web site as the schedule is set Finally we plan tooffer help in letting users of the public data file knowabout each other and coordinate their investigationsby creating a separate page on the NCS Web site forusers to describe the lines of research they arecarrying out with the data

AcknowledgementsThe National Comorbidity Survey Replication (NCS-R) issupported by the National Institute of Mental Health(NIMH U01-MH60220) with supplemental support fromthe National Institute of Drug Abuse (NIDA) theSubstance Abuse and Mental Health ServicesAdministration (SAMHSA) the Robert Wood JohnsonFoundation (RWJF Grant 044708) and the John WAlden Trust Collaborating investigators include RonaldC Kessler (Principal Investigator Harvard MedicalSchool) Kathleen Merikangas (Co-Principal InvestigatorNIMH) James Anthony (Michigan State University)William Eaton (The Johns Hopkins University) MeyerGlantz (NIDA) Doreen Koretz (Harvard University) JaneMcLeod (Indiana University) Mark Olfson (ColumbiaUniversity College of Physicians and Surgeons) HaroldPincus (University of Pittsburgh) Greg Simon (GroupHealth Cooperative) Michael Von Korff (Group HealthCooperative) Philip Wang (Harvard Medical School)Kenneth Wells (UCLA) Elaine Wethington (CornellUniversity) and Hans-Ulrich Wittchen (Institute ofClinical Psychology Technical University Dresden and Max Planck Institute of Psychiatry) The authorsappreciate the helpful comments on earlier drafts of Jim Anthony Doreen Koretz Kathleen MerikangasBedirhan Uumlstuumln Michael von Korff and Philip Wang A complete list of NCS publications and the full text of all NCS-R instruments can be found athttpwwwhcpmedharvardeduncs Send correspon-dence to NCShcpmedharvardedu

ReferencesDuMouchel WH Duncan GJ Using sample survey

weights in multiple regression analyses of stratifiedsamples J Am Stat Assoc 1983 78 535ndash43

Groves R Fowler F Couper M Lepkowski J Singer ETourangeau R Survey Methodology New York Wileyin press

Gurin G Veroff J Feld SC Americans View Their MentalHealth A Nationwide Interview New York BasicBooks Inc 1960

Kessler RC Uumlstuumln TB The World Mental Health (WMH)survey initiative version of the World HealthOrganization Composite International DiagnosticInterview (CIDI) International Journal of Methods inPsychiatric Research 2004 (this issue)

Kish L Frankel MR Inferences from complex samplesJournal of the Royal Statistical Society 1974 36 (SeriesB) 1ndash37

Kish L Kish JL Survey Sampling New York John Wileyamp Sons 1965

Rubin DB Multiple Imputation for Nonresponse inSurveys New York John Wiley amp Sons 1987

Tourangeau R Smith TW Collecting sensitive informa-tion with different modes of data collection In MCouper RP Baker J Bethlehem CZF Clark J MartinWL Nicholos II JM OrsquoReilly eds Computer AssistedSurvey Information Collection New York Wiley 1998431ndash54

Turner CF Ku L Rogers SM Lindberg LD Pleck JHSonenstein FL Adolescent sexual behavior drug useand violence increased reporting with computer surveytechnology Science 1998 280 (5365) 867ndash73

Turner CF Lessler JT Devore J Effects of mode of admin-istration and wording on reporting of drug use In CFTurner JT Lessler and J Gfroerer eds SurveyMeasurement of Drug Use Methodological StudiesRockville MD National Institute on Drug Abuse1982

Veroff J Douvan E Kulka R The Inner American A SelfPortrait from 1957 to 1976 New York Basic Books1981

Wolter KM Introduction to Variance Estimation NewYork Springer-Verlag 1985

Zaslavsky AM Schenker N Belin TR Downweightinginfluential clusters in surveys application to the 1990Post Enumeration Survey Am J Stat Assoc 2001 96(455) 858ndash69

Correspondence RC Kessler Department of HealthCare Policy Harvard Medical School 180 LongwoodAvenue Boston MA USA 02115Telephone (+1) 617-432-3587Fax (+1) 617-432-3588Email kesslerhcpmedharvardedu

IJMPR 132 3rd v4 25604 1026 am Page 92