
Test-Retest Reliability of the Diagnostic Interview Schedule for Children (DISC 2.1)

Peter Jensen, National Institute of Mental Health
Margaret Roper, National Institute of Mental Health
Prudence Fisher, Columbia University College of Physicians and Surgeons
John Piacentini, Columbia University College of Physicians and Surgeons
Glorisa Canino, University of Puerto Rico
John Richters, National Institute of Mental Health
Maritza Rubio-Stipec, University of Puerto Rico
Mina Dulcan, Emory University
Sherryl H Goodman, Emory University
Mark Davies, Columbia University College of Physicians and Surgeons

Only first 10 authors above; see publication for full author list.

Journal Title: Archives of General Psychiatry
Volume: Volume 52, Number 1
Publisher: American Medical Association (AMA) | 1995-01-01, Pages 61-71
Type of Work: Article | Final Publisher PDF
Publisher DOI: 10.1001/archpsyc.1995.03950130061007
Permanent URL: https://pid.emory.edu/ark:/25593/rfm56

Final published version: http://dx.doi.org/10.1001/archpsyc.1995.03950130061007

Copyright information: © 1995, American Medical Association

Accessed November 28, 2021 7:51 AM EST

Test-Retest Reliability of the Diagnostic Interview Schedule for Children (DISC 2.1): Parent, Child, and Combined Algorithms

Peter Jensen, MD; Margaret Roper, MS; Prudence Fisher, MS; John Piacentini, PhD; Glorisa Canino, PhD; John Richters, PhD; Maritza Rubio-Stipec, MA; Mina Dulcan, MD; Sherryl Goodman, PhD; Mark Davies, MPH; Donald Rae, MS; David Shaffer, MD; Hector Bird, MD; Benjamin Lahey, PhD; Mary Schwab-Stone, MD

Background: Previous research has not compared the psychometric properties of diagnostic interviews of community samples and clinically referred subjects within a single study. As part of a multisite cooperative agreement study funded by the National Institute of Mental Health, 97 families with clinically referred children and 278 families identified through community sampling procedures participated in a test-retest study of version 2.1 of the Diagnostic Interview Schedule for Children (DISC 2.1).

Methods: The DISC was separately administered to children and parents, and diagnoses were derived from computer algorithms keyed to DSM-III-R criteria. Three sets of diagnoses were obtained, based on parent information only (DISC-P), child information only (DISC-C), and information from either or both (DISC-PC).

Results: Test-retest reliabilities of the DISC-PC ranged from moderate to substantial for diagnoses in the clinical sample. Test-retest coefficients were higher for the clinical sample than for the community sample. The DISC-PC algorithm generally had higher reliabilities than the algorithms that relied on single informants. Unreliability was primarily due to diagnostic attenuation at time 2. Attenuation was greatest among child informants and less severe cases and in the community sample.

Conclusions: Test-retest reliability findings were consistent with or superior to those reported in previous studies. Results support the usefulness of the DISC in further clinical and epidemiologic research; however, closely spaced or repeated DISC interviews may result in significant diagnostic attenuation on retest. Further studies of the test-retest attenuation phenomena are needed, including careful examination of the child, family, and illness characteristics of diagnostic stability.

(Arch Gen Psychiatry. 1995;52:61-71)

The last decade has been a time of remarkable growth and progress in the study of child psychopathology. Despite limitations, the standardized nomenclatures1 have enabled clinical investigators and epidemiologists to develop instruments that can be employed across clinical, laboratory, and epidemiologic settings. A number of these measures are intended to be used by trained lay interviewers, thus making possible some studies not otherwise feasible using traditional research interviews with clinically trained interviewers.2

Of the available diagnostic instruments for children,3 the Diagnostic Interview Schedule for Children (DISC) has had the most extensive history of development, including four field trials. The results of the second of these trials have been recently described by Shaffer and colleagues.4-6 Investigators have used the DISC across a range of research settings,7-9 have examined its sensitivity with rare conditions,10 and have examined its use with young children.11 In earlier versions, its criterion validity and relationships with other measures have been described,8,9 as well as the degree to which it is tolerated by subjects.12

The last two field trials have been conducted as part of the Cooperative Agreement for Methodologic Research for Multisite Epidemiologic Surveys of Mental Disorders in Child and Adolescent Populations (hereafter referred to as the Methodology for Epidemiology in Children and Adolescents [MECA] Study), funded by the National Institute of Mental Health, Rockville, Md. As envisioned in the original request for applications, the purpose of the MECA Study was to test the feasibility and merits of epidemiologic methods (structured lay diagnostic interviews, multi-informant approaches, risk factor assessments, etc) in a moderate-sized survey of children and adolescents prior to mounting a full-scale nationwide study.

From the National Institute of Mental Health, Rockville, Md (Drs Jensen and Richters, Ms Roper, and Mr Rae); the Division of Child and Adolescent Psychiatry, New York State Psychiatric Institute and Columbia University College of Physicians and Surgeons, New York, NY (Ms Fisher, Drs Piacentini, Shaffer, and Bird, and Mr Davies); the Department of Psychiatry, University of Puerto Rico, San Juan (Dr Canino and Ms Rubio-Stipec); the Department of Psychiatry, Emory University School of Medicine, Atlanta, Ga (Drs Dulcan and Goodman); the Department of Psychiatry, University of Miami (Fla) School of Medicine (Dr Lahey); and the Yale Child Study Center, Yale University School of Medicine, New Haven, Conn (Dr Schwab-Stone).


METHODS

The data reported were obtained during the first phase of the MECA Study. The DISC 2.1 was administered as part of an extensive battery of measures being tested, including demographic characteristics, global impairment, service use, barriers to service use, potential presumed risk factors, and correlates of childhood psychopathologic conditions.

SAMPLE

The subjects were 97 clinically referred and 278 community-sampled 9- to 17-year-old children and one of their primary caretakers (usually the mother). Clinical subjects were recruited from clinical settings in three geographic areas (Atlanta, Ga; New York, NY; and Puerto Rico) involved in the MECA Study. The 278 nonclinical subjects were recruited across the three sites from community populations. For these community subjects, one site (New York) used a stratified random sampling procedure (with replacement of refusals), and two sites (Atlanta and Puerto Rico) used household probability sampling procedures (with no replacement). Data from a fourth site (Connecticut) are not presented here, since this site used different procedures and examined validity rather than reliability. The three sites' sampling procedures are described below.

New York

The New York sample was drawn from an area directly north of New York City, where there is a population with significant socioeconomic, geographic, and ethnic diversity, including sizable proportions of blacks, Hispanics, and other ethnic minorities. The aims of the New York site's sampling procedure were to achieve heterogeneity and to parallel the general population's proportions in age, sex, socioeconomic status (SES), population density, and ethnicity.

Fifty-three of the 63 subjects from the clinical sample were outpatients from school-based outpatient clinics in five counties within the designated catchment area. Ten additional participating subjects (from the same region) were currently hospitalized at a child psychiatric facility. All of the clinical subjects were active cases and had been involved with the clinical facility for a minimum of 3 months and for a maximum of 2 years. The clinical subjects were drawn from a larger sample of 123 eligible clinical subjects, resulting in a study participation rate of 51%. The community sample was obtained by using Department of Education data tapes to determine the age and ethnicity profiles of school systems, within which individual schools were selected based on their representativeness of the characteristics of the community. Within target schools, English classes (or "homeroom" classes for younger ages) were selected at random, as were students within these classes, based on the stratification target numbers. Children enrolled in classes for English as a second language and those in special education classes were excluded so that responses would not be biased by lack of language proficiency or cognitive ability. Three hundred sixty-four eligible subjects were identified, of whom the parents of 138 children indicated by letter or telephone that they did not wish to participate (38%). An additional 104 families could not be reached to secure study participation, and complete data were not available for two families. Despite these difficulties, the resulting sample (n=120; study participation rate, 33%) achieved the site's sampling aims for heterogeneity, was distributed along the socioeconomic spectrum and the age groups studied, consisted of half white and half nonwhite subjects, and paralleled the general population for urban/suburban/rural status. There was a preponderance of females in the community sample, in contrast to more males in the clinical sample (Table 1).

Puerto Rico

The clinical test-retest subjects were obtained from 9- to 17-year-olds consecutively admitted to an outpatient clinic (18 cases) and additional subjects (10 cases) from a substance abuse treatment center. Two thirds of subjects were male. Twenty-four of the 28 cases completed the interviews at both time points. As expected in clinical samples drawn from the public sector, children in the clinic sample were predominantly male (Table 1) and of low SES (82%). Refusal rates were not tracked, so it was not possible to examine factors that may have shaped study participation in this sample.

Community subjects were obtained from a probability sample of children ages 9 through 17 years from the San Juan metropolitan area. Sampling blocks were randomly distributed into two community subsamples of 124 subjects each to carry out two separate studies designed to measure reliability and validity of the DISC. This report concerns only one of the two samples (the reliability sample). Of these 124 families, 118 pairs (95%) agreed to participate and completed the study. This community sample resembled the 1990 census population of children in Puerto Rico ages 9 to 17 years, evenly distributed by gender, with most of low SES.

Atlanta

Twelve children being seen in a psychiatric outpatient clinic were selected for participation in the test-retest study; of these 12, one family refused (92% participation). As in the New York and Puerto Rico clinical samples, males outnumbered females. The community sample was selected from two census tracts in Georgia, one in an urban county and one in a rural county. These two census tracts were chosen so that the resultant probability sample of youths would have variation in race, urban/rural residence, and SES. A stratified multistage survey design was used to identify youths to be interviewed. Sampling regions in the urban census tract were stratified by race (mostly black, mostly white, mixed) and by SES (lower, upper). The rural census tract was stratified by race only since the SES appeared relatively homogeneous. Differential sampling rates were applied across strata to equalize the number of black and white youths and the number of lower and upper SES youths in the sample. Sample housing units were randomly selected for participation in the survey. If there was more than one child of the appropriate age living in the household, one was randomly selected using a table of random numbers. Sixty-six subjects were selected for study participation; of these, 26 refused, leaving 40 subjects and families (61%) who participated in the community test-retest study at the Atlanta site.

INSTRUMENT

The DISC 2.1, a highly structured diagnostic instrument intended for lay administration, was administered to parents and children.6 Parent and child versions of the interview require approximately 60 to 75 minutes of the subject's time. This version of the DISC (version 2.1) differs significantly from earlier versions of the instrument,6,8 in that items were grouped into separate diagnostic modules, each based on a set of related diagnoses. Also, the time covered by the interview was changed from 1 year to 6 months (to maintain consistency with the duration requirements in DSM-III-R), and questions concerning age at first episode, current impairment, treatment history, and precipitating stressors were added at the end of each diagnostic module. Further specific refinements were made based on the input of experienced clinical investigators concerning exact wording of specific questions. Also, unreliable items were deleted, based on data analyses from the earlier DISC-Revised field trials.6

All DISCs were administered by college or graduate level "lay" interviewers who had completed an intensive training period (usually 2 weeks, with slight site-to-site variation). The DISC interviews were monitored by experienced child interviewers and/or child clinicians through review of audiotapes. After the interview was completed, all responses from both parents and children were entered into a computer. Using computer algorithms based on DSM-III-R diagnostic criteria, DSM-III-R diagnoses were generated separately from parent information (DISC-P) and child information (DISC-C). The algorithms also generated a third set of diagnoses based on the "or" rule (DISC-PC, where PC indicates both parent and child informants). The "or" rule specifies that either informant may provide symptom information, which when combined may be used to jointly meet criteria for a particular diagnosis, even though the diagnostic criteria may not be met by any single informant. Thus, the "or" rule accepts the endorsement of a positive symptom from either the child or the adult informant, while the determination of symptom absence requires two negative responses. The application of the "or" rule generally tends to inflate prevalence but is thought to more closely approximate clinical practice.18,19
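To make the combination rule concrete, the sketch below applies an "or" rule at the symptom level before checking a simple count threshold. It is a minimal illustration only: the symptom names, data layout, and two-symptom threshold are invented here, whereas the actual DISC 2.1 algorithms were keyed item by item to the full DSM-III-R criteria (including duration, onset, and exclusions).

```python
# Illustrative sketch of the "or" combination rule described above.
# Symptom names, data layout, and the count threshold are hypothetical.

def combine_or(parent_items, child_items):
    """Combine parent and child reports with the 'or' rule: a symptom
    counts as present if EITHER informant endorses it; absence
    requires a 'no' from BOTH informants."""
    return {item: parent_items.get(item, False) or child_items.get(item, False)
            for item in set(parent_items) | set(child_items)}

def meets_criteria(items, required_count):
    """Toy diagnostic rule: symptom count at or above a threshold."""
    return sum(items.values()) >= required_count

parent = {"fidgets": True, "interrupts": False, "distractible": False}
child = {"fidgets": False, "interrupts": True, "distractible": False}

combined = combine_or(parent, child)   # DISC-PC style record
print(meets_criteria(parent, 2))       # False (DISC-P alone)
print(meets_criteria(child, 2))        # False (DISC-C alone)
print(meets_criteria(combined, 2))     # True: criteria met jointly
```

Note that neither informant alone meets the toy criteria, but the combined record does; this is exactly the situation the "or" rule is designed to capture.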

PROCEDURES

Clinically referred cases were generally interviewed in the clinical setting, while community subjects were interviewed in their homes. In general, retest intervals were shorter for clinical than community subjects (about 2 weeks vs 3 weeks), and sites differed significantly in test-retest intervals (Table 1). All interviews (parent, child, and test-retest) were conducted by separate interviewers who had no knowledge of the other interviews, requiring a total of four different interviewers for each parent-child pair. The use of two different interviewers for each administration introduces maximum variability of conditions and submits the instrument to a stringent test of its reliability. Parent and child interviews were conducted concurrently (at the same time but in separate rooms) for over 93% of cases. The adult who knew the selected child best (usually the mother) was chosen as the adult informant.

DATA ANALYSIS

There currently exists little empiric information that demonstrates superior validity for either parent or child information. The current standard of clinical diagnostic approaches usually involves gathering information from both the parent and the child. Accordingly, we restricted our initial analyses to the combined (DISC-PC) algorithm results. Research support for this approach is provided in several recent studies that have indicated that statistical weighting of informants' responses or exclusive reliance on single-informant algorithms does not perform as well as the simple combinatorial "or" rule against clinicians' diagnoses.18,19

The data analytic strategy proceeded as follows: First, reliability statistics from individual sites' data sets were compared to determine whether sites differed in reliability. This was warranted as an initial strategy given the geographic, linguistic, and sampling differences between sites. Furthermore, given the instability of statistics with few cases, site differences were examined only for those diagnoses for which five or more subjects met diagnostic criteria at the initial time point at two or more sites. In some instances, all three sites had sufficient cases to allow this comparison; in other instances, only two of the three sites had five or more cases; and in several instances, sites could not be compared because of insufficient numbers.

Because we found few or no site differences, further analyses of combined sites' data were undertaken. In most instances, clinical and community samples were analyzed separately. Except where noted, diagnostic reliabilities were examined for DISC-PC diagnoses. Given sufficient cases, diagnostic reliabilities were compared between males and females, younger and older children, internalizing vs externalizing disorders, clinic vs community subjects, and parent vs child informants. Some diagnoses may be more stable as a function of sex, considering the sex differences in prevalence of most childhood conditions, both before and after puberty. Similarly, some diagnoses may be more stable as a function of age, considering the age differences in prevalence of most childhood conditions.

Given the relatively small sample size and number of positive diagnoses, priority was given to examining test-retest reliability for the most common childhood conditions. There were sufficient cases of ADHD, ODD, and CD to allow for the examination of their reliabilities as separate categories. For most affective and anxiety disorders, however, too few cases were available to conduct meaningful reliability analyses across the clinical and community samples or to conduct more fine-grained analyses by age and sex. Therefore, we constructed two additional diagnostic categories: (1) Children with major depression and/or dysthymia were combined into a single depressed diagnostic group. (2) Children with any of the DSM-III-R anxiety disorder diagnoses (simple phobia, social phobia, separation anxiety disorder, generalized anxiety disorder, panic disorder, overanxious disorder, etc) were combined into a single group with any anxiety disorder. More detailed analyses of specific affective and anxiety disorders (eg, major depression, simple phobia, and separation anxiety disorder) are available from the authors on request.

To parallel the major conceptual distinction drawn by many investigators concerning types of childhood psychopathologic conditions, we also constructed superordinate categories of any internalizing disorder (major depressive disorder, dysthymia, and/or any anxiety disorder) and any externalizing disorder (ADHD, ODD, and CD). Reliability coefficients were computed for each of the three diagnostic algorithms (DISC-P, DISC-C, and DISC-PC).
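Because κ is the reliability statistic used throughout this report, a minimal sketch of the test-retest κ computation for one dichotomous diagnosis may be useful. This is standard Cohen's κ rather than any study-specific code, and the 0/1 diagnosis vectors below are invented for illustration.

```python
# Minimal sketch of test-retest kappa for a dichotomous diagnosis.
# The 0/1 vectors below are invented for illustration.

def cohens_kappa(time1, time2):
    """Cohen's kappa: chance-corrected agreement between two ratings."""
    n = len(time1)
    observed = sum(a == b for a, b in zip(time1, time2)) / n
    p1_pos = sum(time1) / n
    p2_pos = sum(time2) / n
    expected = p1_pos * p2_pos + (1 - p1_pos) * (1 - p2_pos)
    return (observed - expected) / (1 - expected)

t1 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # diagnosis at time 1
t2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]   # same subjects at retest
print(round(cohens_kappa(t1, t2), 2)) # 0.78: one attenuating case
```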

To examine the possibility that unreliability is related to the relative oversensitivity of κ statistics to slight differences in symptom levels around the diagnostic threshold, we computed intraclass correlations based on the test-retest diagnostic criteria for all major diagnostic categories. For example, the 14 ADHD symptoms listed in the diagnostic criteria were treated as a dimensional scalar variable, and the intraclass correlation was computed between the two time points for this "scale." The criteria for most diagnoses involve some degree of summated symptoms as well as duration and age of onset items. However, because onset and duration criteria rarely can be summated for more than two items, intraclass correlations were only computed for symptom count criteria.
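As a companion to the κ sketch above, the following illustrates an intraclass correlation computed on symptom counts treated as a dimensional scale. The report does not specify which ICC variant was used, so this sketch assumes the common one-way random-effects form, ICC(1); the symptom counts are invented.

```python
# Illustrative one-way intraclass correlation for test-retest symptom
# counts (eg, the 14 ADHD criterion symptoms summed at each time point).
# Assumes the one-way random-effects form, ICC(1); counts are invented.

def icc_oneway(time1, time2):
    n = len(time1)
    k = 2  # two measurement occasions
    grand = (sum(time1) + sum(time2)) / (n * k)
    subject_means = [(a + b) / k for a, b in zip(time1, time2)]
    # between-subjects and within-subject mean squares from one-way ANOVA
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ss_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(time1, time2, subject_means))
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

counts_t1 = [9, 3, 7, 0, 5, 12, 2, 4]   # invented symptom counts, time 1
counts_t2 = [8, 2, 6, 1, 4, 11, 3, 4]   # same children at retest
print(round(icc_oneway(counts_t1, counts_t2), 2))
```

Because the counts vary continuously rather than flipping across a single cutoff, a subject who drops from 9 to 8 symptoms barely moves the ICC but can flip a categorical diagnosis, which is the threshold effect the text describes.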


Further development of the DISC is forthcoming, pending analyses from the two field trials conducted during the MECA Study. The validity of the major diagnostic categories is being assessed as a part of the second MECA field trial and will form the basis of a future report. The focus of this report concerns the first MECA field trial, during which a test-retest reliability study was carried out. Information concerning the test-retest characteristics of the instrument is critical to assist field workers and clinical researchers who wish to conduct studies with this instrument.

Previous studies examining the reliability of earlier versions of the DISC reported moderate test-retest reliabilities, with the best reliabilities in the externalizing disorders (conduct disorder [CD], attention-deficit hyperactivity disorder [ADHD], oppositional defiant disorder [ODD]). These studies were carried out only in clinical settings,4,8 an important limitation, since instruments' psychometric properties are best examined in settings similar to those where they will be used. Instruments behave quite differently in clinical vs community settings; they will likely be less reliable and less valid in the community.4 Testing the psychometric performance of the agreed-on diagnostic interview in community settings was a central stated goal of the initial request for applications for the MECA Study. The DISC was the diagnostic instrument chosen by the MECA investigators.

Unfortunately, few systematic studies exist concerning the reliability of any child diagnostic instrument in community settings, so points of comparison are difficult to establish. Indeed, we have been unable to locate any studies that have compared the psychometric properties of diagnostic interviews for community samples and clinically referred subjects within a single study, either of children or of adults. Some relevant information bearing on this question is found in the adult literature, however. As a part of the Epidemiologic Catchment Area Study, Helzer et al13 examined lay-clinician test-retest reliability in a community sample. They found lower reliabilities with the Diagnostic Interview Schedule (DIS) for adults than reported for clinical samples assessed with earlier versions of the instrument.14 Similar findings have been noted by other investigators,15-17 with low reliability, diagnostic stability, and validity in community subjects compared with clinical subjects.

If diagnostic instruments are less reliable in community settings than in clinical settings, the discrepancy may be due to several factors. First, those with severe cases may be more likely to seek out or be referred for care than those with mild cases, who may hover just above the diagnostic "threshold." Among such threshold subjects, small decreases in symptom reporting over the test-retest interval will result in decreased reliability.13 Second, as clinical subjects and parents learn more about a diagnosis and/or its symptoms during the evaluation or treatment process, they may become more accepting of the symptoms and diagnostic process and more likely to reaffirm the presence of symptoms in a test-retest paradigm. Third, parents of children referred to clinical settings may have justifiably greater concern about their children and may respond more carefully to interview questions than parents from a community sample, whose motivation to participate is less well understood. Also, given two informants, the parent and child could confer about their interview responses between the first test and the retest. It is unclear what effects (if any) the possibility of parent-child discussions may have on retest, but some instances of attenuation may be explained by this factor.

The purpose of this report from the MECA Study is to describe the test-retest reliability of the recently revised DISC (version 2.1) in clinical settings and to extend previous research by also examining its test-retest reliability in community settings across the participating MECA sites. Factors mediating differences between clinical and community sample test-retest statistics are explored.

RESULTS

In Table 1 we outline the number of subjects from the clinical and community settings from each site as well as the mean age of subjects, male-female ratio, and test-retest intervals. Sites differed significantly in terms of age, with Puerto Rican subjects significantly older than subjects from the other two sites in the clinical setting. In contrast, Puerto Rican subjects were significantly younger than subjects from the other two sites in the community setting. Sites did not differ in proportions of males and females. There was a preponderance of males in the clinical setting, possibly reflecting the increased referral rates of young males commonly found in studies of clinical populations.20

Given the range of methods across sites and the differences in age noted above, in Table 2 we considered the extent to which findings differed across sites. Given the relatively modest power to test for site differences because of the small number of cases, we did not correct for the number of comparisons in this table. Despite the methodologic, geographic, cultural, linguistic, and ethnic differences among sites and despite the fact that the DISC had been translated into Spanish for use in Puerto Rico, there was little evidence of consistent or large intersite differences in either the clinical or community setting. Therefore, data for individual sites were combined for all subsequent analyses.

[Table 2 footnotes: Bold, underlined statistics indicate groups in which five or more cases were positive at time 1 for that site; site comparisons were computed only between these sites. Where only one site had five or more cases, comparisons were not computed.]

Table 3 provides information on the test-retest reliability of the DISC-PC in the clinical and community settings. Extensive information is not provided on the parent and child algorithms because the DISC-PC generally yielded more stable and higher reliabilities than the DISC-P or DISC-C. Comparisons of clinical and community values indicate that, as predicted, the coefficients are higher in clinical than community subjects (five of five coefficients were higher in the clinical sample, P≤.05 by one-tailed sign test). This same pattern held within each site. Further comparisons indicated that at the individual diagnostic level, clinical coefficients were higher than community coefficients for depression and/or dysthymia (.70 vs .26, χ²=15.3, P=.0001). A similar but nonsignificant trend was noted for any anxiety disorder (.50 vs .32, χ²=2.45, P=.11). Kappa coefficients for the other diagnoses did not differ between clinic and community settings (analyses available from the authors on request).

Table 4 provides further information about the test-retest reliability of the DISC-P and DISC-C. Among community subjects, four of five coefficients were higher for DISC-P than for DISC-C, and among clinical subjects, four of five coefficients were higher for DISC-P than for DISC-C.

Analyses were conducted to determine whether diagnostic reliabilities were related to the child's gender (10 comparisons, five for each major diagnostic area, with separate analyses for community and clinic subjects). These analyses yielded only one significant difference in reliability as a function of gender: within the community sample, higher test-retest reliability was found for depression and/or dysthymia among female subjects (κ=.40, N=148) than among male subjects (κ=.01, N=126) (χ²=23.5, P=.0001).

To determine whether reliabilities were related to the child's age, the samples were divided into subjects 9 through 12 years and those 13 through 18 years. This split allowed the most even distribution of cases in the two age groups and is reasonably consistent with age differences in diagnoses before and after puberty. These analyses (10 comparisons, five for each major diagnostic area, with separate analyses for community and clinic subjects) yielded two significant differences as a function of age: Within the community sample, higher test-retest reliability was found for depression and/or dysthymia among older subjects (κ=.28, N=152) than among younger subjects (κ=.00, N=117). Also within the community sample, higher test-retest reliability was found for conduct disorder among older subjects (κ=.68, N=150) than among younger subjects (κ=.43, N=118) (χ²=5.19, P=.02). Given the small number of cases and relatively modest power to find true differences, the number of comparisons was not corrected for chance and should be viewed with caution.

[Table footnote: All values are in the form clinic/community.]

To assess the possibility that unreliability is related to the sensitivity of κ statistics to slight differences in symptom or criterion levels around the diagnostic threshold, intraclass correlations were computed to examine the test-retest reliability for the diagnostic criteria for all major diagnostic categories. Intraclass correlation coefficients for clinical subjects were .63 for any anxiety disorder, .68 for depression and/or dysthymia, .79 for ADHD, .76 for ODD, and .88 for CD. Similarly, community test-retest intraclass correlation coefficients were .38 for depression and/or dysthymia, .47 for any anxiety disorder, .74 for ADHD, .63 for ODD, and .68 for CD. As can be seen when these figures are compared with those in Table 3, in all instances but one the intraclass correlation coefficients were higher than the κ coefficients, and these differences were sizable, especially for community subjects. These data are supportive of the hypothesis that some difficulties with reliability occur around the diagnostic threshold, particularly in community cases. Given that the intraclass correlations themselves fell significantly short of optimal reliability estimates, however, it must be concluded that threshold cases at time 1 are not a sole or sufficient explanation for test-retest attenuation or the less-than-optimal reliabilities.

SOURCES OF UNRELIABILITY

Given the lower reliabilities found in community settings (especially for anxiety and depressive disorders), we conducted a number of post hoc analyses to further explore the sources of unreliability and attenuation. We examined the extent to which caseness unreliability might be related to symptom severity at time 1, so that cases less severely impaired at time 1 were less likely to meet diagnostic criteria on retest. We operationalized severity as a ratio of the sum of all endorsed "stem" questions (those asked of all respondents, regardless of the skip structure built into the DISC) across the five major diagnostic categories, divided by the total number of possible stem questions. As expected, in the community sample, stable cases (n=71, positive for diagnosis at both time points) endorsed an average of 43.7% of time 1 stem symptoms compared with 33.4% among attenuating cases (n=74) (t=−4.96, P<.0001). Similarly, in the clinical sample, stable cases (n=64) endorsed 53.4% of time 1 stem symptoms compared with 41.4% among attenuating cases (n=15) (t=−3.04, P<.003). Thus, attenuation at the diagnostic level is not simply a result of subjects switching to different symptom patterns at time 2 but reflects an absolute decrease in the number of symptoms.
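The severity index just described reduces to a simple proportion; the sketch below computes it and compares stable with attenuating cases. The record layout and all counts are invented for illustration and are not the MECA data.

```python
# Sketch of the severity index described above: endorsed "stem"
# questions over all possible stem questions across the five major
# diagnostic categories. Record fields and counts are invented.

def stem_severity(stems_endorsed, stems_total):
    """Proportion of possible stem questions endorsed at time 1."""
    return stems_endorsed / stems_total

# each subject: (stems endorsed at time 1, total stems, stable diagnosis?)
subjects = [(42, 96, True), (31, 96, False), (55, 96, True), (28, 96, False)]

stable = [stem_severity(e, t) for e, t, s in subjects if s]
attenuating = [stem_severity(e, t) for e, t, s in subjects if not s]

print(f"stable cases:      mean {sum(stable) / len(stable):.1%} of stems endorsed")
print(f"attenuating cases: mean {sum(attenuating) / len(attenuating):.1%}")
```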

Similarly, subjects in the community sample who had used mental health services in the past year (n=46) endorsed an average of 39.0% of time 1 stem symptoms compared with 28.7% among nonusers (n=232) (t=−4.38, P≤.0001). Finally, we examined the proportion of community cases who used services and compared service users and nonusers on diagnostic attenuation (attenuators vs nonattenuators). Among service users, 20 (69%) of 29 cases showed diagnostic stability, while among nonusers, only 54 (47%) of 116 cases showed diagnostic stability (Fisher's Exact Test, P≤.04). These findings further indicate that the lower reliabilities in community samples are related to a combination of factors, including decreased symptom severity, the presence of threshold cases, and other possible factors we did not examine (eg, mental health attitudes, stigma, burden).
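The 2×2 comparison just reported can be checked with a standard statistics library; the sketch below assumes SciPy is available and uses the counts given above (whether the original analysis was one- or two-sided is not stated, so the two-sided default is shown).

```python
# Fisher's exact test on the 2x2 table of service use (rows) by
# diagnostic stability (columns), using the counts reported above.
from scipy.stats import fisher_exact

table = [[20, 9],    # service users: stable, attenuating
         [54, 62]]   # nonusers:      stable, attenuating
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio {odds_ratio:.2f}, two-sided P = {p_value:.3f}")
```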

MODELING UNRELIABILITY

As a final examination of the sources of unreliability, we adapted methods described by Rubio-Stipec and colleagues21 to perform regression analyses using computed κ values as the dependent variable. Because there were five diagnostic categories, two algorithms (we used DISC-P and DISC-C only, since DISC-PC is the combination of the first two), and two settings (clinic and community) (5×2×2=20), there were 20 total κ values. In the regression analyses we entered the DISC diagnosis source (1 for child algorithm, 2 for parent algorithm), setting (1 for community, 2 for clinic), and simple ratios for the four sources of unreliability.

According to the rationale for the computation of these four ratios, under ideal circumstances and perfect reliability, the population consists of persons who are either "true cases" or "true noncases." Those who meet criteria at both time points are part of the population of true cases, while those who do not meet criteria at either time point are part of the population of true noncases. With fallible instruments administered under human conditions, however, four permutations of test-retest responses are possible: time 1 positive and time 2 positive (cell a [+/+]), time 1 positive and time 2 negative (cell b [+/−]), time 1 negative and time 2 positive (cell c [−/+]), and time 1 negative and time 2 negative (cell d [−/−]).

With these four cells, the ratio that models the error represented by persons who are from the population of true cases but fail to meet diagnostic criteria on second interview (for any of a variety of reasons) is b/(a+b), representing "true case attenuation." Second, a group of subjects with a tendency to impulsively overendorse symptoms in novel settings may meet caseness criteria at time 1 even though they are actually part of the population of true noncases (cell d). The ratio to approximate this error is b/(b+d), called for descriptive purposes "true noncase attenuation." Third, some true noncases might endorse and meet criteria at time 2, possibly because of measurement error, random responding, etc. The ratio to approximate this potential source of information discrepancy is c/(c+d), called for descriptive purposes "true noncase augmentation." Fourth, some subjects who are part of the population of true cases could be slow to warm up, requiring a second interview before they are fully able to disclose their symptoms (or they may have developed the disorder in the test-retest interim). The ratio to model this "error" is c/(a+c), called "true case augmentation." Of course, any number of hypotheses might be invoked to explain each of these four sources of unreliability; we have only described several of the possibilities.
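The four ratios map directly onto the cells of a 2×2 test-retest table, and a short sketch makes the bookkeeping explicit; the cell counts are invented for illustration.

```python
# The four unreliability ratios defined above, computed from a 2x2
# test-retest table. Cell counts are invented for illustration.
#   a: time 1 +, time 2 +     b: time 1 +, time 2 -
#   c: time 1 -, time 2 +     d: time 1 -, time 2 -

def unreliability_ratios(a, b, c, d):
    return {
        "true case attenuation":     b / (a + b),  # cases lost on retest
        "true noncase attenuation":  b / (b + d),  # over-endorsers at time 1
        "true noncase augmentation": c / (c + d),  # noncases gained at time 2
        "true case augmentation":    c / (a + c),  # slow-to-warm-up cases
    }

for name, value in unreliability_ratios(a=20, b=15, c=5, d=200).items():
    print(f"{name}: {value:.2f}")
```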

These four specific ratios were entered into the regression equation along with the informant source variable (child or parent) and setting variable (clinic or community). Stepwise and hierarchical regression analyses were performed to determine the percentage of variance accounted for in the κ statistics by these four sources of "error" and by the source and setting variables. As seen in Table 5, the regression was highly significant (as it should be, since all the variance in the κ coefficients is represented by the four ratios), and the great preponderance of decreased κ values was explained by the "true case attenuation" ratio (usually more than four times that of the other variables). A number of related analyses were run, systematically forcing each of the other variables and potential sources of unreliability first into the model; the results of these analyses were similar, in that the "true case attenuation" ratio continued to account for over 80% of the variance in κ values. The informant and setting variables did not enter into any models, as all the variance was accounted for by three of the four ratios. Other terms did not enter into the final model, so no β weights or t values are noted in Table 5 for them (more extensive information is available from the authors on request).

[Table 5 footnotes: R²=.987; R² (adjusted for degrees of freedom)=.985; F=422.9; P≤.0001; df=3,16. Significance levels of zero-order correlations are flagged in the table (P≤.001, P≤.07, and P≤.10). Three variables did not enter the final model, so β coefficients and t values are not reported.]
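As a sketch of the modeling step only (not the study's actual data or software), an ordinary least-squares fit of κ values on the four ratios can be set up as follows; every number below is invented, including the coefficient pattern that makes attenuation dominate.

```python
# Sketch of regressing computed kappa values on the four unreliability
# ratios. All values are invented; the study had 20 kappas
# (5 diagnoses x 2 algorithms x 2 settings).
import numpy as np

rng = np.random.default_rng(0)
n = 20
X = rng.uniform(0.0, 0.6, size=(n, 4))          # four ratios per kappa
beta_true = np.array([-1.2, -0.3, -0.4, -0.2])  # attenuation dominates
kappa = 0.9 + X @ beta_true + rng.normal(0, 0.02, n)

X1 = np.column_stack([np.ones(n), X])           # add intercept
coef, *_ = np.linalg.lstsq(X1, kappa, rcond=None)
fitted = X1 @ coef
r2 = 1 - np.sum((kappa - fitted) ** 2) / np.sum((kappa - kappa.mean()) ** 2)
print("coefficients:", np.round(coef, 2), " R^2:", round(r2, 3))
```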

COMMENT

Before we comment on our findings, several caveats are necessary. First, there were major differences among sites in sampling procedures and subject characteristics as well as variations in ethnicity, culture, and language. Although these differences were part of the planned, allowable site variations during the first phase of the MECA Study, they nonetheless constitute important methodologic differences and possible study limitations. Regardless, the examination of site-specific data revealed only minor variations across sites. In fact, the patterns of reliability (eg, clinic vs community and parent vs child) remained fairly stable across all sites, particularly in diagnoses for sites with at least five time 1 cases.

Furthermore, because most clinical subjects were male (whereas community subjects were equally distributed across gender) and because of the longer test-retest intervals in the community subjects, it cannot be determined whether the higher clinical test-retest reliabilities are best explained by clinical referral, male gender, or the length of the test-retest interval. However, only one difference in reliability was found as a function of gender (depression and/or dysthymia for females vs males in the community sample only), which tends to rule out gender as a major confounder in the overall pattern of results. Similarly, differences in clinical vs community test-retest intervals (2 vs 3 weeks) were small and seem unlikely to explain the clinic-community reliability differences. In contrast, post hoc analyses of community subjects indicated that a history of mental health services use was related to increased diagnostic stability, as was increased symptom severity. In our view, both of these factors are more closely related to our observed clinic-community reliability differences.

COMPARISONS WITH PREVIOUS STUDIES

In general, our reliability findings are consistent with or superior to those reported in previous studies.4,22,23 Compared with those found using earlier versions of the DISC studied in clinical samples,4,23 our clinical sample reliabilities are substantially better for CD, moderately better for ADHD, and equivalent or better for depressive disorders. Similarly, we found superior test-retest reliabilities for two anxiety disorders studied in earlier versions of the DISC23: .74 for separation anxiety disorder and .57 for overanxious disorder (further details of specific anxiety and depressive disorder reliabilities are available on request from the authors). Compared with test-retest studies of other diagnostic instruments, clinical sample reliabilities reported here are comparable or superior, despite the need for clinically trained interviewers for most other instruments.23

Unfortunately, there are few points of comparison for our community sample reliability findings. Two recent reports used a two-stage sampling design to select community subjects with a high probability of diagnosis at the second stage.11,22 However, the two-stage design of these studies makes direct comparisons of κ statistics with our data difficult, since such sampling strategies increase disorder base rates and sample heterogeneity and result in higher κ coefficients in equivalent (ie, fixed levels of sensitivity and specificity) diagnostic instruments.24 Nonetheless, these investigators reported much lower community sample reliabilities than in previous test-retest studies in clinical samples using the same instrument.23 Our community sample test-retest reliabilities do compare favorably (generally superior) with the reliabilities of the DIS used in the adult Epidemiologic Catchment Area studies,13,14,25 despite the challenges entailed in obtaining and combining information from two informants in the MECA Study. However, DIS reliability data may not be directly comparable to our results, since the DIS test-retest study was conducted with two different types of interviewers (lay interviewers at time 1, clinicians at time 2).

ADEQUACY OF THE DISC

Given the lower test-retest reliability of the DISC in the community, are lay interviewers really "up to the task" of collecting complex diagnostic data for epidemiologic studies? The κ coefficients reported in the present study must be considered in light of the fact that an instrument with acceptable characteristics (eg, sensitivity, specificity, test-retest coefficients) in clinical settings will demonstrate lower test-retest coefficients in epidemiologic (homogeneous) samples with low base rates. As others have noted,24,26 this does not indicate a problem with κ statistics per se but reflects the true level of difficulty in obtaining agreement between two raters of less common conditions in the community. We suggest that in community samples with low disorder prevalence, "acceptable" κ values may be overly conservative, and the lower community sample reliability values reported here for the internalizing disorders (.26 to .32) should not be dismissed as too unreliable for use in epidemiologic studies.

Could clinical interviewers do better? In a recent head-to-head comparison with the DISC, clinician-generated diagnoses proved less reliable than lay interviewer-administered DISC diagnoses.5 Large-scale epidemiologic studies with clinical interviewers are probably not logistically feasible, and clinician-generated diagnoses are likely to be quite unreliable in community settings.

It is unfortunate that our samples only included children age 9 years and older, given the relatively large proportion of younger children who come to clinical attention and are in need of services.27,28 While previous research has indicated that problems with test-retest reliability increase as a function of younger age of the child informant, this is not necessarily the case with parental informants. In fact, evidence suggests that some diagnoses may be more reliable with the parents of younger children than with the parents of older children.9,11,22 Certainly, more information is needed concerning the psychometric properties of diagnostic interviews for younger children.

Given the greater reliabilities among parents than their children, could one use the parent report alone for diagnostic purposes? Such a strategy is problematic because parents and children provide nonredundant information. For example, in the present report, in the community sample, most of the conduct, anxiety, and affective disorders were identified through information obtained from the DISC-C. While opposite findings were seen in the clinical sample (most cases were identified through the DISC-P), increasing evidence suggests that child-derived diagnoses in community subjects are reliable and, more important, have long-term prognostic significance.29,30 Instead of surrendering children as potential informants, increased research is needed to determine under which circumstances (eg, clinical vs community settings) which diagnoses are best determined from which informants (parent, child, or both) for which children (eg, as a function of age and gender). Certainly, it would be dubious to fail to obtain information directly from children for internalizing diagnoses or conduct disorder, where the child often does not reveal symptoms to the parent. Even in the absence of parental endorsement, the child's report of symptoms may have meaning in terms of subthreshold conditions, other diagnostic entities, or risk for future disorder.

ATTENUATION

Collectively, our analyses indicate that test-retest changes in DISC-based diagnostic status are not just a function of random error but are significantly more likely to occur around the diagnostic threshold, occur more commonly in community than in clinic cases, and are related to symptom severity at time 1. In addition, for specific diagnoses, decreased reliability may be related to children's age (depression, CD) and gender (depression).

The level of test-retest attenuation we report here is not unique to childhood diagnostic interviews but is also a significant problem for adult diagnostic interviews, including semistructured interviews completed by clinicians and highly structured interviews conducted by lay interviewers.13-17 This pervasiveness of test-retest attenuation has led some investigators to argue that diagnostic instability should be regarded as a phenomenon worthy of study in its own right,15 with particular attention to personal and situational factors that are systematically related to test-retest differences in symptom reporting. It is probably too narrow a perspective to dismiss such phenomena simply as "unreliability," since systematic, measurable (and perhaps alterable) sources of variance (heretofore presumed to be "error") underlie significant amounts of test-retest attenuation.

Although test-retest attenuation in psychopathology research very much reflects a phenomenon in need of an explanation, its pervasiveness across measures, methods, and informants suggests the possibility that traditional retest strategies in this domain may constitute a poor assay of a measure's reliability. As Robins31 has suggested, the diagnostic interview process itself appears to change subjects' response sets in ways that make a closely spaced retest interview difficult to interpret. If such a reactivity effect is operating, the comparison of two interviews over a relatively short time may be no more interpretable than retesting the effects of an acute psychological stress on heart rate immediately following an initial test. In both cases, it may be necessary to wait until a subject's reactivity to the initial stimulus condition has returned to its baseline level.

Any number of mechanisms may underlie interview-triggered changes in subjects' response sets. One possibility is that the intensive nature of diagnostic interview questions alters an individual's threshold for symptom reporting. The mere act of reporting a symptom as present may have a cathartic effect on some individuals, so that they later de-emphasize the significance of those symptoms. This hypothesis has not been examined in the literature, but it is reasonable to predict that any cathartic effect on symptom reporting would be short-lived, with subjects returning to their baseline evaluative levels following a sufficient "washout" period. Existing data (including those reported above) suggest that the catharsis hypothesis will need to be tested with test-retest durations exceeding the standard retest interval of 2 weeks.

Another source of test-retest attenuation may be a subject's conscious or unconscious desire to shorten the second interview by saying "no" to more symptom questions, having learned that "yes" responses during the initial interview result in additional questions and a longer interview. To the best of our knowledge, this hypothesis has not been tested in either the adult or child psychopathology literature. An optimal test of this hypothesis requires a design with symptom questions presented in counterbalanced order on two occasions. Support for the endorsement-avoidance hypothesis would be consistent with a significant association between likelihood of symptom endorsement and the temporal ordering of symptoms within either interview or with a negative association between the length of the first and second interviews. This hypothesis cannot be tested on our data because symptom questions were presented in a fixed order, with more severe diagnoses (and therefore lower probabilities of symptom endorsement) coming toward the end of the interview. Interestingly, Ribera and colleagues (J. Ribera, PhD, G.C., M.R.-S., et al, unpublished results, 1992) have noted that the test-retest attenuation may be lower when the subsequent interview is completed by a clinician, and they note the possibility that the interviewee continues to tell the complete story on retest, possibly because he or she may feel that the physician would notice the discrepancy between the first report and the subsequent one. These alternative explanations will be examined in more detail in subsequent reports.

The post hoc regression analyses do not in any sense "prove" which if any of the cells constitute the "true" cases (eg, +/+, +/−, or −/+). However, because psychopathologic conditions are not usually seen as socially desirable, because of the potential cathartic effects of the first interview, and because subjects may wish to shorten the second interview by saying no (having learned that "yes" responses result in additional questions and a lengthier interview), we tend to place greater credence on the veracity of the first interview (+/+ plus +/− subjects) rather than on the smaller number of cases who met the criteria for diagnosis in both time periods (+/+ subjects only).31 Our analyses of the four ratios of potential unreliability (Table 5) tend to support this position, but further studies specifically examining this question are needed.

Attenuation appears to be higher in children than adults, possibly because children may be more vulnerable to cognitive distortions of the meaning of the retest. For example, children may fail to understand the meaning of the retest, thinking that they need not repeat information given earlier (assuming it is now known), or they may conclude that the interview questions are being repeated because their previous responses were incorrect. If such explanations are accurate, the actual nature of the preparation of subjects for test-retest designs is of major importance for future studies.

Other factors that may explain parent-child attenuation and reliability differences include the possibility that children are more likely to forget the details of the previous interview or attach less significance to the rereporting of all symptoms or to the diagnostic process, so that they discount symptoms more readily (resulting in greater attenuation on the second interview). Also, children are likely more impatient than adults, have a shorter attention span, and have a lower tolerance for boredom. If so, a second interview will be less novel than the first and may fail to hold their attention, with the result that they may tend to deny symptoms the second time around, attempting to rush through the interview.

A number of issues particular to child and adolescent psychopathology warrant special mention. For example, the relatively greater reliability of externalizing disorders (ADHD, ODD, and CD) than internalizing disorders (depression and/or dysthymia, any anxiety disorder) (Table 4) suggests that externalizing disorders are more "noticeable" and problematic to parents (hence remembered and recalled at time 2), while internalizing disorders are more subjective, transient, and prone to recall difficulties. Furthermore, much of the parent's and child's recall (as in studies of adult psychopathologic conditions and life stress research) may be colored by the reporter's current emotional state.32,33 We have relatively little information about such questions, and systematic studies of these phenomena are needed.

CONCLUSIONS

In summary, the test-retest reliability data from the MECA Study range from moderate to substantial34 in clinical settings and are comparable or superior to the reliabilities reported for other child and adolescent diagnostic instruments, despite the stringent conditions to which the DISC was submitted. However, the test-retest reliabilities from the community samples range from only fair (internalizing disorders) to substantial (externalizing disorders).34

Is the DISC too "unreliable" for use in field surveys? Given the ubiquitous difficulties inherent in test-retest designs (particularly in community samples with low base rates, decreased diagnostic severity, and increased attenuation), we suggest that such questions are better phrased as, "What is the validity of diagnostic information obtained at a single (first) time point?" and "Which strategies will decrease diagnostic attenuation, particularly in longitudinal studies that require repeated diagnostic assessments?" Further analyses from the second phase of the MECA studies will examine the validity of DISC diagnoses with respect to clinician interviews and external impairment criteria (eg, impairment, need for and use of mental health services). If, as expected, these data indicate robust validity correlates of time 1 DISC interviews, the DISC should provide meaningful prevalence estimates in community surveys. However, further studies of diagnostic attenuation are needed to examine variations in the DISC by reporter (parent vs child), by interviewer (layperson vs clinician), and by diagnosis. Studies of the effects of child and parent age, intelligence, attributions about symptoms, and the context in which the child and/or parent describes the child's symptoms are essential. Innovative strategies to reduce the effects of repeated assessment procedures on symptom reporting are very much needed.

Accepted for publication June 13, 1994.

The opinions and assertions contained in this article are the private views of the authors and are not to be construed as official or as reflecting the views of the Department of Health and Human Services or the National Institute of Mental Health.

The logic for the computer algorithms used to process the responses was developed by a team composed of David Shaffer, MD, Prudence Fisher, MS, and John Piacentini, PhD. The computer programming was done by Mary Rojas, PhD, Michael Parides, MS, and Mark Davies, MPH.

The MECA Program is an epidemiologic methodology study performed by four independent research teams in collaboration with staff of the Division of Clinical Research, which was reorganized in 1992 with components now in the Division of Epidemiology and Services Research and the Division of Clinical and Treatment Research of the National Institute of Mental Health (NIMH), Rockville, Md. The NIMH Principal Collaborators are Darryl A. Regier, MD, MPH, Ben Z. Locke, MSPH, Peter S. Jensen, MD, William E. Narrow, MD, MPH, and Donald S. Rae, MA; the NIMH Project Officer was William J. Huber. The Principal Investigators and Coinvestigators from the four sites are as follows: Emory University, Atlanta, Ga (U01 MH46725): Mina K. Dulcan, MD, Benjamin B. Lahey, PhD, Donna J. Brogan, PhD, Sherryl Goodman, PhD, and Elaine Flagg, PhD; Research Foundation for Mental Hygiene at New York State Psychiatric Institute, Columbia University, New York (U01 MH46718): Hector R. Bird, MD, David Shaffer, MD, Myrna Weissman, PhD, Patricia Cohen, PhD, Denise Kandel, PhD, Christina Hoven, PhD, Mark Davies, MPH, Madelyn S. Gould, PhD, and Agnes Whitaker, MD; Yale University, New Haven, Conn (U01 MH46717): Mary Schwab-Stone, MD, Philip J. Leaf, PhD, Sarah Horwitz, PhD, and Judith H. Lichtman, MPH; and University of Puerto Rico, San Juan (U01 MH46732): Glorisa Canino, PhD, Maritza Rubio-Stipec, MA, Milagros Bravo, PhD, Margarita Alegría, PhD, Julio Ribera, PhD, Sarah Huertas, MD, and Michael Woodbury, MD.

Reprint requests to Room 18C-17, Parklawn Bldg, Child and Adolescent Disorders Research Branch, National Institute of Mental Health, 5600 Fishers Ln, Rockville, MD 20857 (Dr Jensen).

REFERENCES

1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition. Washington, DC: American Psychiatric Association; 1987.

2. Chambers W, Puig-Antich J, Hirsch M, Paez P, Ambrosini P, Tabrizi M, Davies M. The assessment of affective disorders in children and adolescents by semistructured interview. Arch Gen Psychiatry. 1985;42:696-702.

3. Gutterman E, O'Brien J, Young G. Structured Diagnostic Interviews for Children and Adolescents: current status and future directions. J Am Acad Child Adolesc Psychiatry. 1987;26:621-630.

4. Schwab-Stone M, Fisher P, Piacentini JC, Shaffer D, Gioia P, Davies M. The Diagnostic Interview Schedule for Children-Revised Version (DISC-R), II: test-retest reliability. J Am Acad Child Adolesc Psychiatry. 1993;32:651-657.

5. Piacentini J, Shaffer D, Fisher P, Schwab-Stone M, Davies M, Gioia P. The Diagnostic Interview Schedule for Children-Revised Version (DISC-R), III: concurrent criterion validity. J Am Acad Child Adolesc Psychiatry. 1993;32:658-665.

6. Shaffer D, Schwab-Stone M, Fisher P, Cohen P, Piacentini J, Davies M, Edelbrock C, Regier D. The Diagnostic Interview Schedule for Children-Revised Version (DISC-R), I: preparation, field testing, interrater reliability, and acceptability. J Am Acad Child Adolesc Psychiatry. 1993;32:643-650.

7. Cohen P, O'Connor P, Lewis S, Velez N, Malachowsky B. Comparison of DISC and K-SADS-P interviews of an epidemiological sample of children. J Am Acad Child Adolesc Psychiatry. 1987;26:662-667.


8. Costello E, Edelbrock C, Costello A. Validity of the NIMH Diagnostic Interview Schedule for Children: a comparison between psychiatric and pediatric referrals. J Abnorm Child Psychol. 1985;13:579-595.

9. Edelbrock C, Costello AJ, Dulcan MK, Conover NC, Kalas R. Age differences in the reliability of the psychiatric interview of the child. Child Dev. 1985;56:265-275.

10. Fisher P, Shaffer D, Piacentini J, Lapkin J, Kafantaris V, Leonard H, Herzog D. Sensitivity of the Diagnostic Interview Schedule for Children, 2nd edition (DISC-2.1), for specific diagnoses of children and adolescents. J Am Acad Child Adolesc Psychiatry. 1993;32:666-673.

11. Schwab-Stone M, Fallon T, Briggs M, Crowther B. Reliability of diagnostic testing for children ages 6-11 years: a test-retest study of the Diagnostic Interview Schedule for Children-Revised. Am J Psychiatry. 1994;151:1048-1054.

12. Lewis SA, Gorsky A, Cohen P, Hartmark C. The reactions of youth to diagnostic interviews. J Am Acad Child Adolesc Psychiatry. 1985;24:750-755.

13. Helzer JE, Robins LN, McEvoy LT, Spitznagel EL, Stolzman RK, Farmer A, Brockington IF. A comparison of clinical and Diagnostic Interview Schedule diagnoses. Arch Gen Psychiatry. 1985;42:657-666.

14. Helzer J, Robins L, Croughan J, Welner A. Renard Diagnostic Interview: its reliability and procedural validity with physicians and lay interviewers. Arch Gen Psychiatry. 1981;38:393-398.

15. Rice JP, Rochberg N, Endicott J, Lavori PW, Miller C. Stability of psychiatric diagnoses: an application to the affective disorders. Arch Gen Psychiatry. 1992;49:824-830.

16. Andreasen NC, Grove WM, Shapiro RW, Keller MB, Hirschfeld RMA, McDonald-Scott P. Reliability of lifetime diagnosis. Arch Gen Psychiatry. 1981;38:400-405.

17. Anthony JC, Folstein M. Comparison of the lay Diagnostic Interview Schedule and a standardized psychiatric diagnosis. Arch Gen Psychiatry. 1985;42:667-675.

18. Bird H, Gould M, Staghezza B. Aggregating data from multiple informants in child psychiatry epidemiological research. J Am Acad Child Adolesc Psychiatry. 1992;31:78-85.

19. Piacentini JC, Cohen P, Cohen J. Combining discrepant diagnostic information from multiple sources: are complex algorithms better than simple ones? J Abnorm Child Psychol. 1992;20:51-63.

20. Jensen PS, Bloedau L, Davis H. Children at risk, II: predictors of clinic utilization. J Am Acad Child Adolesc Psychiatry. 1990;29:804-812.

21. Rubio-Stipec M, Canino G, Shrout P, Dulcan M, Freeman D, Bravo M. Psychometric properties of parents and children as informants in child psychiatry epidemiology with the Spanish Diagnostic Interview Schedule for Children (DISC-2). J Abnorm Child Psychol. 1994;22:1-18.

22. Boyle MH, Offord DR, Racine Y, Sanford M, Szatmari P, Fleming JE, Price-Munn N. Evaluation of the Diagnostic Interview for Children and Adolescents for use in general population samples. J Abnorm Child Psychol. 1993;21:663-681.

23. Hodges K. Structured interviews for assessing children. J Child Psychol Psychiatry. 1993;34:49-68.

24. Shrout PE, Spitzer RL, Fleiss J. Quantification of agreement in psychiatric diagnosis revisited. Arch Gen Psychiatry. 1987;44:172-177.

25. Regier DA, Myers JK, Kramer M, Robins LN, Blazer DG, Hough RL, Eaton WW, Locke BZ. The NIMH Epidemiologic Catchment Area Program: historical context, major objectives, and study population characteristics. Arch Gen Psychiatry. 1984;41:934-941.

26. Kraemer HC. Charlie Brown and statistics: an exchange. Arch Gen Psychiatry. 1987;44:192-193.

27. Novack AH, Bromet E, Neill TK, Abramovitz RH, Storch S. Children's mental health services in an inner-city neighborhood. Am J Public Health. 1975;65:133-138.

28. Offord DR, Boyle MH, Szatmari P, Rae-Grant NI, Links PS, Cadman DT, Byles JA, Crawford JW, Blum HM, Byrne C, Thomas H, Woodward CA. Ontario Child Health Study, II: six-month prevalence of disorder and rates of service utilization. Arch Gen Psychiatry. 1987;44:832-836.

29. McGee R, Feehan M, Williams S, Anderson J. DSM-III disorders from age 11 to age 15 years. J Am Acad Child Adolesc Psychiatry. 1992;31:50-59.

30. Rohde P, Lewinsohn PM, Seeley JR. Comorbidity of unipolar depression, II:comorbidity with other mental disorders in adolescents and adults. J AbnormPsychol. 1991;100:214-222.

31. Robins L. Epidemiology: reflections on testing the validity of psychiatric interviews. Arch Gen Psychiatry. 1985;42:918-924.

32. Jenkins CD, Hurst MW, Rose RM. Life changes: do people really remember?Arch Gen Psychiatry. 1979;36:379-384.

33. Jensen PS, Traylor J, Xenakis SN, Davis H. Child psychopathology rating scales and interrater agreement, I: parents' gender and psychiatric symptoms. J Am Acad Child Adolesc Psychiatry. 1988;27:442-450.

34. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:766-771.

