
  Neuropsychology Review, Vol. 5, No. 3, 1995

    Issues Associated with Repeated Neuropsychological Assessments

    Robert J. McCaffrey 1,2 and Holly James Westervelt 1

    Distinguishing practice effects from other factors in repeated neuropsychological assessments is discussed in the context of research studies and clinical/forensic assessments. Potential methodological procedures for reducing the impact of practice effects in research settings are outlined. In contrast, the potential clinical utility and interpretation of practice effects in clinical assessments and forensic evaluations are highlighted.

    KEY WORDS: practice effects; regression to the mean; serial assessments; forensic evaluations; clinical assessments; neuropsychological assessment.

    INTRODUCTION

    Clinical neuropsychological practitioners employed in either a clinical or research capacity may be called upon to perform and interpret serial neuropsychological assessments. The need to perform successive neuropsychological assessments may arise in order to monitor the progression of a disease process (e.g., dementia), to evaluate the therapeutic efficacy of a drug (e.g., AZT), or to document the effectiveness of a rehabilitation program (e.g., attention remediation training). In these instances, the task of the clinical neuropsychological practitioner is to ascertain, within reasonable certainty, the causal relationship between the independent variables and the change in the dependent variable(s).

    An important factor in any serial neuropsychological assessment is the partialling of the total variance into that attributable to the factor under examination and that attributable to unwanted sources of variance. A patient's neuropsychological performance across evaluations may be affected, in a complex manner, by unwanted sources of variance.

    1 Department of Psychology, University at Albany, State University of New York, Albany, New York 12222.

    2 To whom reprint requests should be addressed.

    1040-7308/95/0900-0203$07.50/0 © 1995 Plenum Publishing Corporation

    There are multiple sources of potential variance in any neuropsychological assessment situation (cf. Puente & McCaffrey, 1992). Thorndike (1949) classified sources of test-score variance along lasting/temporary and general/specific dimensions.

    Lasting and general characteristics of the individual would influence test-score variance in several ways. The individual's general skills (e.g., reading level) may affect test performance. The individual's general ability to comprehend instructions and his/her test-taking ability would also be important factors, along with the ability to solve the types of problems contained in the test and his/her general emotional reaction to test-taking situations (viz., self-confidence).

    Lasting and specific characteristics of the individual would include knowledge and skills required by a particular item or type of test. Additionally, prior attitudes and emotional reactions related to particular assessment stimuli may affect performance. For example, an adult with a developmental reading disability may experience a stimulus-specific reaction to being asked to read aloud written materials from any of a number of neuropsychological assessment instruments.

    Temporary and general characteristics would be expected to systematically affect an individual's performance on various assessment instruments at a particular point in time. Included among these variables are the individual's health and emotional state. The assessment environment could also contribute to this subtype of error variance (e.g., room temperature, light, ventilation, noise, etc.). The individual's motivation and rapport with the assessor would also be important sources of error variance.

    Temporary and specific characteristics of the individual pose the greatest challenge to neuropsychological practitioners conducting serial assessments since, by definition, these potential sources of variance are the least stable. For example, fluctuations in attention, concentration, and memory are potential sources of variance in serial neuropsychological assessment. Changes in fatigue or motivation, as well as emotional states, may also contribute to the apparent instability of an individual's performance across assessments. In some instances, the issue of malingering could also be an important factor in discrepant performance between assessments.

    Another important source of unwanted variance in serial neuropsychological assessments is practice effects, that is, improvement in a patient's neuropsychological performance attributable to the effects of repeated assessment with the same instrument(s). This paper will discuss practice effects from two perspectives: research studies and clinical/forensic evaluations. In research settings, controlling or reducing the variance attributable to practice effects may be an important experimental design consideration in accurately evaluating changes in patients' performance. In routine clinical assessments/forensic evaluations, the presence of practice effects may provide the clinician with important and useful information about the integrity of the underlying cerebral system(s) mediating an individual patient's performance. Thus, depending upon the setting, practice effects may be conceptualized either as a source of unwanted variance needing to be addressed or as another factor to be entered into the clinical judgement equation.

    Reliability

    Reliability may be defined as the degree to which test scores are free from the effects of measurement error. The difference in a patient's performance on the same neuropsychological assessment instrument from one time to another may be due to measurement error. Measurement error reduces both the reliability and generalizability of patients' performance (Standards for Educational and Psychological Testing, 1985).

    The psychological and neuropsychological assessment literature contains data on test-retest reliability for only the most widely used instruments that assess cognitive/behavioral performance. For the most part, however, only reliability coefficients are reported (e.g., Brown, Rourke, & Cicchetti, 1989; Matarazzo, Wiens, Matarazzo, & Manaugh, 1973; Su & Yerxa, 1984). Although reliability coefficients provide useful psychometric information, they are meaningless with regard to evaluating the impact of practice effects. A test-retest reliability coefficient of 0.89 does not necessarily imply that the group mean performance is highly stable from the testing at time one to the testing at time two. A coefficient of 0.89 could be obtained under several scenarios in which the mean performance at time one and time two differed markedly. For example, if patients systematically showed a mild, moderate, or substantial increase or decrease in their performance at time two compared to time one but maintained their relative rank ordering across the two administrations, the test-retest reliability coefficient could remain at 0.89. This is because test-retest reliability coefficients do not take into consideration the potential influence of practice effects.

    Regression to the Mean

    Regression to the mean is a statistical feature of any linear prediction rule that utilizes a "least squares" model. Specifically, given any standard score Zx, the best linear prediction of the standard score Zy is one which is relatively closer to the mean of zero than is Zx, if the correlation between Zx and Zy is greater than zero (Hays, 1988). Under these circumstances, it is more probable that an individual's score on subsequent evaluations will fall closer to the group mean of the measured variable, regardless of whether the initial score was above or below the group mean. As noted by Hays (1988), regression to the mean is "...not some immutable law of nature. Rather, it is, at least in part, a statistical consequence of our choosing to predict in this linear way, using the criterion of least squares in the choice of a rule" (p. 560).

    On a theoretical level, regression to the mean could result in either an increment or a decrement in an individual's performance. Incremental and decremental changes in performance due to regression to the mean would be a function of whether the initial score was below or above the group mean, respectively. On a pragmatic level, it may be difficult to disentangle the influence of regression to the mean from other potential sources of variance.
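Hays' least-squares rule can be stated compactly: the predicted standard score on retest is the test-retest correlation times the initial standard score. A minimal sketch (the numbers are illustrative assumptions, not taken from any cited study):

```python
def predicted_retest_z(z_initial: float, r_xy: float) -> float:
    """Best linear (least-squares) prediction of the retest standard
    score: Zy_hat = r * Zx. With 0 < r < 1, the prediction always lies
    closer to the group mean of zero than the initial score did."""
    return r_xy * z_initial

# An examinee scoring 2 SD above the mean, with an assumed
# test-retest correlation of .70:
print(predicted_retest_z(2.0, 0.70))   # 1.4, regression toward the mean

# The same magnitude of regression applies below the mean:
print(predicted_retest_z(-2.0, 0.70))  # -1.4
```

Note that the rule is symmetric about the mean, matching the point above that regression can produce either an apparent increment or an apparent decrement.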

    In an effort to distinguish practice effects from regression to the mean in a re-analysis of ratings of hyperactivity in children at time one and time two, Zentall and Zentall (1986) make several compelling points worthy of elaboration. First, regression effects cannot explain changes in an overall mean score between time one and time two, since the overall mean score at time two would be expected to equal the overall mean score at time one. This assumes, however, that no interventions, manipulations, etc., have occurred between obtaining the two scores. For example, a statistically significant change in the overall mean performance from the first to the second administration of an assessment instrument with no manipulation of an independent variable cannot be due to a regression effect. Rather, this would usually represent practice effects. With regression to the mean, subjects who initially scored above the mean would be expected to regress downward toward the mean, whereas subjects who initially scored below the mean would be expected to regress upward toward the mean, provided that the subjects were not an extreme group drawn from a larger population (e.g., severely depressed college students selected from the general college population on the dimension of depression). If the subjects were selected from an extreme group, they would be expected, at time two, to demonstrate regression to the mean of the population from which they were drawn; no regression effect would be expected to operate toward the sample mean. In this example, the extreme depressed sample of college students would be expected to regress toward the mean of the general college student population but not toward the mean of the sample of extremely depressed college students.


    Regression to the mean is a phenomenon which can be objectively evaluated by examining the relationship of each subject's score relative to the overall group mean at various assessment times. Changes in overall mean performance across assessments, on the other hand, would reflect practice effects.
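The diagnostic logic of this paragraph can be sketched in code: a shift in the overall mean across assessments points to practice, while a tendency of initially extreme scorers to move back toward the group mean (after the mean shift is removed) points to regression. This is an illustrative decomposition of our own devising, not a procedure from the cited studies:

```python
def practice_vs_regression(t1, t2):
    """Return (mean_change, regression_index) for paired scores.

    mean_change: overall mean shift from time one to time two; a nonzero
    value (absent any intervention) suggests practice effects.
    regression_index: correlation between centered time-one scores and the
    mean-adjusted change; a negative value means high scorers fell and low
    scorers rose relative to the group, i.e., regression to the mean.
    """
    n = len(t1)
    m1 = sum(t1) / n
    m2 = sum(t2) / n
    mean_change = m2 - m1
    c1 = [x - m1 for x in t1]                              # centered time-one scores
    resid = [(y - m2) - (x - m1) for x, y in zip(t1, t2)]  # mean-adjusted change
    num = sum(a * b for a, b in zip(c1, resid))
    den = (sum(a * a for a in c1) * sum(b * b for b in resid)) ** 0.5
    return mean_change, (num / den if den else 0.0)

# Toy data: everyone gains about 5 points, and extremes drift toward the mean.
t1 = [90, 100, 110]
t2 = [98, 105, 112]
print(practice_vs_regression(t1, t2))  # (5.0, -1.0)
```

Here the positive mean change reflects a uniform practice gain, while the negative index reflects movement of extreme scorers toward the group mean, the two signatures the paragraph distinguishes.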

    Practice Effects

    The repeated administration of the same neuropsychological assessment instrument or battery may result in an increment in the patient's performance due to practice effects alone. The current neuropsychological literature offers little guidance on the issue of practice effects. In general, neuropsychological assessment instruments involving a timed test, requiring an infrequently practiced response, or having a single, easily conceptualized solution are reported to be more likely to result in significant practice effects (Dodrill & Troupin, 1975). Lezak (1982) notes that significant practice effects are reported to occur frequently among brain injured patients but not among neurologically intact subjects. Other investigators (Dikmen, Machamer, Temkin, & McLean, 1990; Levin, Ewing-Cobbs, & Fletcher, 1989), however, have stated that head injured patients are less likely to benefit fully from an initial neuropsychological assessment than are non-injured subjects. Shatz (1981) reached the tentative conclusion that patients with cerebral dysfunction would not be expected to show practice effects with a single retest using the WAIS. It seems likely that these differing conclusions are a function of the instrument employed, the population studied, and other factors, such as time since brain injury.

    Addressing Practice Effects

    The issue of practice effects in clinical neuropsychology has begun to be addressed in recent symposia (Francis, Fletcher, Davidson, & Steubing, 1991; Johnson & Kane, 1991; Kay, 1991; Spector, 1991; Goldstein, 1991; McCaffrey, 1991; Chelune, 1991; Hermann & Wyler, 1991) and in statistical procedures developed for the evaluation of change in clinical trials (Brouwers & Mohr, 1989; Knight & Shelton, 1983; Meredith & Tisak, 1990; Mohr & Brouwers, 1991; Tisak & Meredith, 1989; Welford, 1985, 1987). Nonetheless, data on the effects of repeated administrations of neuropsychological instruments and batteries, in the absence of specific interventions, are lacking.

    Statistically, Shatz (1981) has suggested that the standard error of measurement for each neuropsychological assessment instrument be used to set up confidence intervals around an individual patient's scores in order to partial out practice effects from other factors related to improvement in patients' performance across assessments (e.g., recovery of function). Another approach has been the use of equated alternate forms [cf. the Repeatable Cognitive-Perceptual-Motor Battery (Lewis & Rennick, 1979)]. For the majority of neuropsychological assessment instruments, however, these are not available. Moreover, increments in a patient's performance may also occur at retesting using equated or parallel forms. This has been referred to as the "test sophistication effect" (Anastasi, 1988). In general, gains on alternate forms are smaller than those obtained using the same form on successive assessments.
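Shatz's suggestion can be made concrete. The standard error of measurement is classically computed as SD times the square root of one minus the reliability, and a confidence band around the observed score then gives a rough benchmark against which retest gains can be judged. The values below (SD 15, reliability .90) are illustrative assumptions, not figures from Shatz (1981):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx), the classical-test-theory estimate of
    the measurement error attached to a single observed score."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(score: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate 95% confidence band around an observed score."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return score - half_width, score + half_width

# A patient scores 85 on an instrument with SD 15 and an assumed r_xx = .90:
low, high = score_band(85, 15, 0.90)
print(f"{low:.1f} to {high:.1f}")  # 75.7 to 94.3
```

A retest score falling inside such a band is difficult to attribute to real change, whether from recovery of function or from practice.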

    An alternative approach to reducing the influence of practice effects, which we have been utilizing in our neuropsychological assessment laboratory since 1986, is to administer the entire neuropsychological battery twice prior to the introduction of any independent variables. This procedure involves, in effect, obtaining dual baseline assessments. The manipulation of the independent variables occurs after the second administration of the neuropsychological battery. The baseline level of performance is more stable since any benefits of practice have already accrued. The second administration of the battery is then used as the baseline for comparison with subsequent assessments, while the initial administration serves as a methodological procedure to reduce the influence of practice effects on subsequent assessments. This methodological procedure for reducing the impact of practice effects may not be appropriate for use with all patient populations or with all neuropsychological assessment instruments. For example, the Paced Auditory Serial Addition Test may require several administrations before practice effects are minimized and a stable level of performance is achieved (Stuss, Stethem, Hugenholtz, & Richard, 1989).

    Control Group Considerations

    In biomedical research and treatment outcome studies, the utilization and selection of a control group(s) is of considerable methodological importance. The process of attributing changes in the dependent variable(s) of interest to manipulation of the independent variables may require consideration of more than one factor. In traumatic brain injury research and clinical practice, several factors may be operating simultaneously which may differentially affect patients' performance on serial neuropsychological assessments. This is all the more important when the factors affecting a patient's performance have either non-uniform or differential temporal relationships. For example, the rate of recovery of function following a traumatic brain injury may not be consistent across patients in a particular study. While evaluations of patients at 2, 4, 8, and 12 months post-injury share a common temporal relationship, this does not assure that the patients are equivalent in terms of recovery of function at each assessment point. The extent of change in an individual patient's performance across assessments may reflect differential and, potentially, confounded variables. In these instances, the selection of a control group or control procedure becomes crucial in understanding the changes in a patient's performance across assessments.

    Recently, Rawlings and Crewe (1992) examined WAIS-R performance in two groups of head injured patients. Group one was administered the WAIS-R at approximately 2, 4, 8, and 12 months post-injury. The patients in group two, who were matched to those in group one on a variety of variables, were evaluated at approximately 2 and 12 months post-injury. The results indicated significantly greater improvements on several of the WAIS-R subtests for patients in group one relative to those in group two. These findings suggest that the differential performance between the two groups was due to practice effects.

    Another important factor in partialling out the contribution of practice effects is the selection of the most appropriate control group. For example, in evaluating the neuropsychological sequelae of prophylactic cranial irradiation therapy in patients with small cell lung cancer, McCaffrey et al. (1990) included a control group of chronic cigarette smokers rather than other, more readily available populations (e.g., college students). This was done in order to control for the influence on neuropsychological functioning of pulmonary disease secondary to cigarette smoking, since the majority of the cancer patients had a history of smoking. If a more readily available control group had been utilized, the degree of practice effects may, or may not, have been comparable. A similar rationale may be applied to other biomedical studies involving diseases such as AIDS and the use of control groups comprised of "at risk" volunteers versus more convenient samples.

    PRACTICE EFFECTS IN RESEARCH

    In our research on the neuropsychological effects of a beta-adrenergic blocker (metoprolol) in patients with essential hypertension (McCaffrey, Ortega, Orsillo, Haase, & McCoy, 1992), the effects of prophylactic cranial irradiation in patients undergoing treatment for small cell lung cancer (McCaffrey, Ortega, Orsillo, Nelles, & Haase, 1992), and the progression of cognitive changes in symptomatic and asymptomatic patients diagnosed with HIV infection (McCaffrey et al., 1995), we have utilized the dual baseline assessments procedure as a means of reducing the confounding influence of practice effects. In the essential hypertension study, significant practice effects at a 7- to 10-day test-retest interval were found for the Wechsler Memory Scale-Russell's Revision, the paired associate subtest of the Wechsler Memory Scale, the digits backwards portion of the Digit Span subtest of the WAIS-R, a Math Test from the State of New York Regents High School Examination Competency Test, and the Grooved Pegboard test (non-preferred hand). In the cancer study, the control group of chronic cigarette smokers also demonstrated significant practice effects, at a 7- to 10-day test-retest interval, on the Wechsler Memory Scale-Russell's Revision, as well as on the Trail Making Test (Part B), the Grooved Pegboard Test (non-preferred hand), and a Simple Auditory Reaction Time Test. In a follow-up report, McCaffrey, Ortega, and Haase (1993) found that across four assessment periods (practice effects, baseline, 3-month and 6-month follow-up), the chronic cigarette smokers demonstrated statistically significant trend analyses on the Wechsler Memory Scale-Russell's Revision, the Trail Making Test (Part B), and the Grooved Pegboard (preferred hand). Of interest was the highly consistent and stable performance of the subjects on the Speech Sounds Perception Test, the Seashore Rhythm Test, and Simple Auditory Reaction Time across the four assessments.

    In an ongoing project using the Brief NIMH Neuropsychological Battery for HIV Infection and AIDS (Butters et al., 1990), we have followed the practice of obtaining dual baseline assessments (Van Gorp, Lamb, & Schmitt, 1993). This study is comprised of two patient groups and an "at risk" control group. The patient groups are asymptomatic and symptomatic HIV infected patients, whereas the control group consists of at-risk heterosexual and homosexual volunteers. Preliminary results (McCaffrey et al., 1995) reveal the presence of significant practice effects with a 7- to 14-day test-retest interval for the California Verbal Learning Test in all three groups. Significant practice effects have also been obtained on the Paced Auditory Serial Addition Test for all groups. On the Visual Search Test, significant practice effects were obtained for the asymptomatic HIV group only. The at-risk control group demonstrated significant practice effects for the Vocabulary subtest of the WAIS-R. These findings demonstrate the importance of not generalizing between patient groups and control groups regarding the likelihood of practice effects.

    CASE ILLUSTRATION

    The following study illustrates the potential for misinterpretation of serial neuropsychological testing, as well as the importance of experimental design considerations. Kilburn, Warsaw, and Shields (1989) conducted a study to evaluate neurobehavioral dysfunction in firemen exposed to polychlorinated biphenyls (PCBs) during the course of a fire involving a transformer. The 14 firemen were exposed to PCBs through skin contact and the inhalation of smoke and gases. All 14 firemen began to experience various combinations of symptoms 2 days to 3 months after the fire. The symptoms included extreme fatigue, headache, muscle weakness, aching joints, memory loss, impaired concentration, irritability, insomnia, impaired balance, weight loss, and hypertension.

    Approximately 5 months after the fire, neurobehavioral, medical, and biochemical studies were conducted. Following these evaluations, the firemen underwent a 2- to 3-week experimental detoxification regimen consisting of a regulated diet (polyunsaturated oil supplement and gradually increasing doses of niacin) and two daily sessions of aerobic exercise for 30 to 60 minutes plus heat stress (sauna to 155°F) to increase fat metabolism and sweating. Approximately 2 months after the experimental detoxification regimen, the firemen were readministered the neurobehavioral tests. At the time of the readministration of the neurobehavioral tests to the PCB-exposed firemen, another group of 14 firemen who had not been involved in the PCB transformer fire were also administered the neurobehavioral tests.

    The neurobehavioral assessment consisted of the Verbal Memory, Figural Memory, and Digit Span subtests of the Wechsler Memory Scale (Wechsler, 1945); the Block Design subtest from the Wechsler Adult Intelligence Scale; the Embedded Figures Test; a culture-fair test; the Grooved Pegboard Test (dominant hand only); and a visual two-choice reaction time task. In addition, Parts A and B of the Trail Making Test and the Fingertip Number Writing subtest from the Halstead-Reitan Neuropsychological Battery were included. The firemen were also administered the Profile of Mood States and two standing erect body balance measures.

    A comparison of the baseline performance of the 14 PCB-exposed firemen with the 14 control firemen revealed that the PCB-exposed firemen's neuropsychological performance was statistically poorer (p < .05) on the Verbal Memory, Figural Memory, and digits backward subtests of the WMS, the Block Design subtest of the WAIS, Parts A and B of the Trail Making Test, the culture-fair test, and the choice reaction time test. Statistically significant differences were also obtained on the Profile of Mood States for anger, depression, vigor, and fatigue.

    Kilburn et al. (1989) then compared the PCB-exposed firemen's neurobehavioral performance pre- to posttreatment. This comparison indicated improved performance from the pretreatment assessment on the Verbal Memory and Figural Memory subtests of the WMS, the Block Design subtest of the WAIS, Part B of the Trail Making Test, the Embedded Figures Test, and one of the two indices of body balance. There were no statistically significant changes obtained with the Profile of Mood States questionnaire.

    Kilburn et al. (1989) concluded that their results indicate the reversibility of neuropsychological impairment as a function of the experimental detoxification regimen. A closer examination of their procedures raises several issues which may not support the authors' conclusions. First, the control group of firemen was evaluated only once, for comparison with the pretreatment status of the exposed firemen. The differences obtained between the two groups at the "pretreatment" assessment may have been due to a variety of factors in addition to PCB exposure. Second, the control group of firemen was not reassessed for direct comparison with the PCB-exposed firemen. This raises a serious question about the role of practice effects as a primary factor in the obtained "improvement" in the PCB-exposed firemen's neuropsychological status from pretreatment to posttreatment. Nonetheless, the authors conclude "... we consider it unlikely that the improved function recorded across the test-retest interval in firemen reflects test familiarity or learning" (p. 348). The authors later acknowledged that their findings should be interpreted with caution and that the improved cognitive functioning in the PCB-exposed firemen from pre- to posttreatment may not be due exclusively to the experimental detoxification program.

    From a research perspective, this report is, at best, inconclusive for three key reasons. First, the data from our laboratory have demonstrated significant practice effects for the Verbal Memory and Figural Memory subtests of the WMS and for Part B of the Trail Making Test. Although the test-retest interval and patient populations studied in our lab differ from those in the Kilburn et al. (1989) report, we have obtained statistically significant practice effects, in more than one population, on three of the five indices on which Kilburn et al. reportedly obtained significant improvement and attributed those changes to their experimental detoxification treatment. Second, in the absence of a posttreatment assessment for the control group of firemen, it is unclear whether the reported cognitive improvements in the PCB-exposed firemen were due to the detoxification treatment, practice effects, or a combination of factors. Third, Kilburn et al. ignored the issues associated with multiple t-tests and also presumed that the assessment instruments employed in their study were free of practice effects. The assumption that neuropsychological assessment instruments are free of significant practice effects is all too commonly made by both researchers and clinicians in the absence of data to support this premise.


    ASSUMPTIONS REGARDING PRACTICE EFFECTS

    The clinical and research literature contains multiple examples of practitioners and investigators presuming that there are minimal or no practice effects associated with the assessment instruments they have chosen to utilize, or that the population under study is somehow immune to practice effects. For example, Schain, Ward, and Guthrie (1977) examined the behavioral and cognitive effects of an anticonvulsant drug in children with seizure disorders. The improved performance on all of the cognitive measures was attributed to the anticonvulsant medication. The following statement was made by the investigators in order to justify disregarding the role of practice effects in their results: "The 4- to 6-month interval between pretrial and post-trial measurements suggests to us that practice effects should be minimal. Furthermore, the lack of correlation of test changes with age or intelligence argues against practice effects, since older, more intelligent subjects are more likely to learn and retain test-taking skills" (p. 252). Although the assumptions of Schain et al. appear logical and, perhaps, reasonable, they are nothing more than unsubstantiated conjecture. An equally plausible argument could be made that the more intelligent children's performance would be nearer the ceiling range at the initial assessment and, therefore, would have less potential for improvement at the retest. By the same logic, the less intelligent children would be expected to perform well below the ceiling range and have a greater potential for improved performance (i.e., a practice effect) at the retest.

    Other researchers have made similar assumptions regarding practice effects. In a study evaluating test-retest IQ changes among patients with epilepsy, Seidenberg, O'Leary, Giordani, Berent, and Boll (1981) noted that "...the test-retest interval for the three groups exceeds 20 months, which is longer than the period of time the effects of practice are typically expected to operate" (p. 252). Again, this is an unsubstantiated assumption.

    These two studies highlight the tendency of investigators to make convenient but unsubstantiated assumptions regarding practice effects.

    Practice Effects and Reliability Coefficients

    The rationale for the presumed absence of practice effects with neuropsychological assessment instruments administered at intervals of 6 months or longer is not clearly delineated in either the clinical or research literature. One possible explanation for this unfounded assumption is the observation that the magnitude of test-retest reliability coefficients tends to be inversely related to the duration of the test-retest interval: the shorter the test-retest interval, the larger the correlation coefficient, whereas the longer the test-retest interval, the smaller the correlation coefficient. As noted previously, however, test-retest correlation coefficients may be independent of practice effects. Test-retest correlations evaluate the relative stability of the rank order of the individual scores between tests. Practice effects, however, are based on the overall group mean performance between test intervals. As such, it is possible for the overall group mean performance of subjects to improve between assessments while the relative rankings of the same subjects vary. This could result in a lowered test-retest correlation coefficient together with a substantial practice effect. The stability of test-retest reliability coefficients between assessments may have no bearing on the presence or degree of practice effects. Therefore, inferences regarding the presence or absence of practice effects based on test-retest reliability coefficients are unfounded.

    AGE FACTORS

    In normal adults, cognitive abilities tend to remain quite stable. As such, improved performance in a test-retest situation most likely reflects the effects of practice. The normal process of aging, however, may require special considerations in the evaluation of practice effects when dealing with children and older adults.

    CHILDREN

    Several researchers have claimed that it seems unlikely that children would be susceptible to the effects of practice (Schain, Ward, & Guthrie, 1977); however, empirical support for this position is lacking. Dyche and Johnson (1991) found that children ages 8 to 14 made a gain of 19.7% on a children's version of the Paced Auditory Serial Addition Test with a test-retest interval of approximately 4 weeks. This gain was slightly greater than that generally obtained by adults on the PASAT (Stuss, Stethem, & Poirier, 1987). In addition, Longstreth and Alcorn (1990) reported practice effects on four of the five performance subtests of the Wechsler Preschool and Primary Scale of Intelligence in groups of children ages 3 to 6 at a test-retest interval of seven to ten days.

    A variety of factors may complicate the interpretation of practice ef- fects when working with children. Most notable is the issue of maturation. In children, maturational changes may account for improved performance on neuropsychological tests at test-retest intervals as brief as six months
(Levin, Ewing-Cobbs, & Fletcher, 1989). Another important issue has been outlined by Coutts et al. (1987), who found that, among sixth graders, exposure to other testing situations alone may improve performance on the Category Test of the Halstead-Reitan Neuropsychological Battery as much as repeated exposure to the Category Test itself. Dirks (1982) found that exposure to a commercially available game (Trac 4) increased the performance of 10-year-old children on the Block Design subtest of the WISC-R. The object of the Trac 4 game is to arrange, as quickly and accurately as possible, three-dimensional cubes to match a picture model. Dirks noted that such a finding may be important because the Block Design subtest, along with the Vocabulary subtest, is often used in short-form estimates of the overall WISC-R IQ. These findings may not be robust, however, as Longstreth and Alcorn (1990) were unable to replicate them in a similar study. The discrepancy between the two studies could be the result of different testing materials and procedures.

    Bourgeois, Prensky, Palkes, Talent, and Busch (1983) point out that in studies involving children's cognitive abilities, parents of bright children may be more likely to allow their child to participate than are parents of less bright children. This may present a serious selection bias in the re- cruitment of children as either experimental or control subjects. In addition, the issues surrounding regression to the mean of populations versus sam- ples discussed earlier would be a major concern. Bourgeois et al. recom- mended using non-affected siblings of the experimental group as controls to correct for potential subject selection bias.

    Older Adults

Decline in intellectual test scores may begin around age 50 in non-patient populations (Albert, Duffy, & Naeser, 1987). Albert et al. reported a decline in performance on memory tests, especially for delayed recall, between the ages of 30 and 50. On tasks involving confrontation naming and abstraction, Albert et al. reported a decline in performance starting at around age 60. These changes in cognitive functioning were also accompanied by a change in the magnitude of practice effects. Ryan, Paolo, and Brungardt (1992) noted that the practice effects typically seen in middle-aged adults do not occur as reliably or as robustly in older adults. These performance changes may reflect an age-related decline in fluid intelligence, specifically, a decline in older adults' ability to benefit from learning. Shatz (1981) reported that the practice effects on the WAIS generally seen in older adults may be less than half of those seen in middle-aged adults, or even nonexistent. Middle-aged adults may be expected to show an
increase of 5 IQ points on a retest of the WAIS, compared with older adults, for whom a test-retest increment of only 2 IQ points may be found (Matarazzo, Carmody, & Jacobs, 1980). Similarly, Mitrushina and Satz (1991) found that 66- to 75-year-old adults were less likely to demonstrate test-retest improvement than 57- to 65-year-old adults on a battery of neuropsychological tests. In addition, adults 75 years of age and older were not likely to benefit at all from previous exposure (i.e., at a test-retest interval of one year). Mitrushina and Satz noted that apparently consistent test-retest performance in older adults may actually reflect a cognitive decline that is being offset by practice effects. It is important to note that decreased scores on performance tasks (in this case the Trail Making Test, Parts A and B) among the elderly may reflect not only a decline in attention/concentration capacities but also psychomotor slowing (Mitrushina & Satz, 1991). As a general rule, Ryan et al. (1992) suggest that on a WAIS-R retest, a decline of 7 or more IQ points, or of 3 or more scaled score points on a subtest, may be cause for further investigation.
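Ryan et al.'s (1992) rule of thumb could be coded as a simple screening check. The function name and the sign convention (retest minus baseline) are choices made here for illustration, not part of the source:

```python
def flag_for_followup(fsiq_change, subtest_changes):
    """Screen a WAIS-R retest using Ryan et al.'s (1992) rule of
    thumb: a decline of 7 or more IQ points, or of 3 or more scaled
    score points on any subtest, may warrant further investigation.
    Changes are coded as retest minus baseline, so declines are
    negative numbers."""
    if fsiq_change <= -7:
        return True
    return any(change <= -3 for change in subtest_changes)

# Hypothetical retest: Full Scale IQ is down only 4 points, but one
# subtest scaled score has dropped by 3, so the case is flagged.
print(flag_for_followup(-4, [0, -1, -3, 1]))  # True
print(flag_for_followup(-2, [0, -2, 1]))      # False
```

A flag here is only a prompt for further investigation, not a diagnostic conclusion, consistent with the "general rule" framing above.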

    CLINICAL ASSESSMENTS

Researchers may perceive practice effects as unwanted variance that needs to be controlled so as to minimize potential confusion in interpreting changes in patients' performance across assessments. In the clinical assessment domain, however, practice effects may provide the clinician with useful information about an individual patient. For example, if a patient has a reduced capacity to function within a specific cognitive, perceptual, or motoric domain, then the patient may also demonstrate a correspondingly reduced capacity to show practice effects on serial assessments. If the base rates of practice effects for both a particular neuropsychological assessment instrument or battery and a specific patient population were established, then the presence and magnitude of practice effects might provide useful information on the integrity of the underlying cerebral system subserving the patient's performance. Unfortunately, the systematic collection of such base rate data remains an important issue yet to be addressed empirically by clinical neuropsychologists.

    FORENSIC EVALUATIONS

    Neuropsychologists are often confronted with the request to assess an individual who has been assessed earlier by another neuropsychologist. A
    frequent situation involves medico-legal cases in which the neuropsy- chological practitioner is requested by defense counsel to perform a neuropsychological evaluation of a plaintiff.

In forensic cases involving mild head injury, Barth, Gideon, Sciara, Hulsey, and Anchor (1986) recommended reevaluation proximate to the court date, since this generally provides the most valuable information about the patient's current neuropsychological functioning and course of recovery. Of course, the issue of practice effects would require careful consideration by the neuropsychologist, so that improved test scores attributable to practice are not misattributed to other variables (e.g., recovery of function, regression to the mean, rehabilitation).

The forensic neuropsychological literature offers few guidelines for addressing this issue from an idiographic perspective. A recent case study by Putnam, Adams, and Schneider (1992) of a plaintiff in a personal injury case provides a unique opportunity within which to consider the issue of practice effects, regression to the mean, and other factors. Putnam et al. (1992) report on a patient who was assessed by two independent neuropsychologists at a one-day test-retest interval. Putnam et al. defined a practice effect as a retest score, indicative of improved performance, that exceeded two standard errors of the mean (SEM) for the neuropsychological assessment instrument in question. Using this criterion, Putnam et al. concluded that the majority of the findings were equivalent between the two assessments; increases in performance at retest attributable to practice effects were found on only four of the instruments. While an objectively defined criterion for separating practice effects from other sources of variability in serial neuropsychological assessments is important, the two-SEM cutoff used by Putnam et al. was arbitrary and may not be applicable to all assessment instruments/batteries and patient/subject populations. The SEM is defined as the standard deviation divided by the square root of n. Defining practice effects as changes that exceed two SEM may be too stringent a criterion. In addition, using the SEM may actually involve applying a criterion with unknown properties. Specifically, defining practice effects based on the SEM implies that the relationship of the individual subject/patient to the assessment instrument's normative group is known. A direct comparison of an individual subject/patient to established norms, however, may not always be possible. What would be the appropriate normative group for comparison with a 34-year-old woman alleged to have sustained a mild traumatic brain injury in a motor vehicle accident who was also known to have sustained an anoxic episode at age 3? Under an SEM criterion, the normative group would have to consist of subjects who had sustained an early anoxic event and later sustained a mild traumatic brain injury. Moreover, any arbitrarily defined criterion may by its very nature
    overlook consistent practice effects that do not achieve the criterion. For example, Putnam et al.'s use of a 2 SEM difference for defining practice effects would exclude all "real" practice effects that are less than their 2 SEM criterion.
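A minimal sketch of the 2-SEM criterion discussed above, using the text's definition of the SEM (standard deviation divided by the square root of n); the function names and the normative values are hypothetical:

```python
import math

def sem(sd, n):
    """Standard error of the mean as defined in the text: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def exceeds_two_sem(baseline, retest, sd, n):
    """Under the 2-SEM criterion, only a retest gain larger than two
    standard errors of the mean is labeled a practice effect; anything
    smaller is treated as equivalent performance."""
    return (retest - baseline) > 2 * sem(sd, n)

# Hypothetical normative values: SD = 10, n = 100, so SEM = 1.0 and
# the cutoff is a gain of more than 2 raw-score points.
print(exceeds_two_sem(baseline=48, retest=51, sd=10, n=100))    # True
print(exceeds_two_sem(baseline=48, retest=49.5, sd=10, n=100))  # False
```

The second call illustrates the objection raised above: a smaller but possibly "real" practice effect falls below the arbitrary cutoff and goes undetected.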

The type of comparison conducted by Putnam et al. may not always be possible, particularly when neuropsychologists utilize a flexible battery (Kane, 1991) or a casually composed neuropsychological battery (Reitan & Wolfson, 1993). There are also a number of other factors that could contribute to performance differences in medico-legal evaluations beyond those associated with practice effects (cf. McCaffrey, Williams, Fisher, & Laing, in preparation), including the test-retest interval, secondary gain, and malingering. The clinician's ability to interpret the significance, if any, of practice effects in both the clinical assessment and forensic evaluation arenas necessitates the development of base rate data for specific assessment instruments/batteries and patient populations.

    SUMMARY AND FUTURE DIRECTIONS

The challenges posed by practice effects in serial neuropsychological assessments may ultimately be best addressed by the establishment of base rate data on the degree of practice effects associated with various temporal intervals, specific instruments/batteries, and patient/subject populations. We are currently in the first stage of this process, as evidenced by the articles contained in this issue. The next step in the development of base rate data on practice effects could begin with a reanalysis of existing reliability data for most clinical neuropsychological assessment instruments/batteries. If the reliability measures were based on test-retest data, then the raw data necessary for computing the degree of practice effects already exist. An archival analysis of these data would go a long way toward establishing base rate data on practice effects. Clinical neuropsychologists engaged in either the development or revision of clinical neuropsychological instruments/batteries should be encouraged to report both the standard psychometric properties of the instrument/battery (viz., validity and reliability) and practice effects data.
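As an illustration of the kind of archival reanalysis proposed here, the sketch below computes simple practice-effect statistics from test-retest score pairs. The function and the data are assumptions for illustration, not drawn from any published data set:

```python
from statistics import mean

def practice_effect_summary(pairs):
    """Summarize archival test-retest data for one instrument.
    `pairs` is a list of (baseline, retest) raw scores; the result
    reports the mean raw gain, the mean percent gain, and the
    proportion of subjects whose scores improved at retest."""
    gains = [retest - base for base, retest in pairs]
    pct_gains = [100 * (retest - base) / base for base, retest in pairs if base != 0]
    return {
        "mean_gain": mean(gains),
        "mean_pct_gain": mean(pct_gains),
        "prop_improved": sum(g > 0 for g in gains) / len(gains),
    }

# Hypothetical raw scores of the kind a test-retest reliability
# study would already contain.
data = [(40, 46), (35, 38), (50, 49), (42, 47)]
summary = practice_effect_summary(data)
print(summary["mean_gain"], summary["prop_improved"])  # 3.25 0.75
```

Tabulating such summaries by instrument, test-retest interval, and population would yield exactly the base rate data called for above.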

As in other areas of clinical neuropsychology, attention must be paid to how biopsychosocial variables may modulate brain function and dysfunction with regard to practice effects (cf. Puente & McCaffrey, 1992). From a biopsychosocial perspective, our current knowledge base on practice effects has centered on the variable of age. To date, this appears to have been more of an unintended consequence than
the result of a systematic exploration of the effects of age on practice effects. Although we have a limited database on the role of age as it affects practice effects on neuropsychological assessment instruments, our understanding of this variable and its interaction with other factors remains incomplete. In addition, there are a host of other biopsychosocial variables whose impact upon practice effects, if any, has yet to be examined systematically by clinical neuropsychologists. Among these biopsychosocial variables are gender, ethnicity, bilingualism, socioeducational factors, psychiatric conditions, medical conditions, and environmental factors. Increasing our understanding of practice effects in serial neuropsychological assessments poses an important challenge for both the practitioner and the researcher.

    REFERENCES

Albert, M., Duffy, F. H., and Naeser, M. (1987). Nonlinear changes in cognition with age and their neuropsychological correlates. Canadian Journal of Psychology 41: 141-157.

Anastasi, A. (1988). Psychological Testing (6th ed.), Macmillan, New York.

Barth, J. T., Gideon, D. A., Sciara, A. D., Hulsey, P. H., and Anchor, K. N. (1986). Forensic aspects of mild head trauma. Journal of Head Trauma Rehabilitation 1: 63-70.

Brown, S. J., Rourke, B. P., and Cicchetti, D. V. (1989). Reliabilities of tests and measures used in neuropsychological assessment of children. The Clinical Neuropsychologist 3: 353-368.

Bourgeois, B. F. D., Prensky, A. L., Palkes, H. S., Talent, B. K., and Busch, S. G. (1983). Intelligence in epilepsy: A prospective study in children. Annals of Neurology 14: 438-444.

Brouwers, P., and Mohr, E. (1989). A metric for the evaluation of change in clinical trials. Clinical Neuropharmacology 12: 127-133.

Butters, N., Grant, I., Haxby, J., Judd, L. L., Martin, A., McClelland, J., Pequegnat, W., Schacter, D., and Stover, E. (1990). Assessment of AIDS-related cognitive changes: Recommendations of the NIMH workshop on neuropsychological assessment approaches. Journal of Clinical and Experimental Neuropsychology 12: 963-978.

Chelune, G. (1991). Impact of confidence intervals and base-rate data on clinical interpretation [Summary]. The Clinical Neuropsychologist 5: 263.

Coutts, R. L., Lichstein, L., Bermudez, J. M., Daigle, M., Mann, D. P., Charbonnel, T. S., Michaud, R., and Williams, C. R. (1987). Treatment assessment of learning disabled children: Is there a role for frequently repeated neuropsychological testing? Archives of Clinical Neuropsychology 2: 237-244.

Dikmen, S., Machamer, J., Temkin, N., and McLean, A. (1990). Neuropsychological recovery in patients with moderate to severe head injury: 2 year follow-up. Journal of Clinical and Experimental Neuropsychology 12: 507-519.

Dirks, J. (1982). The effect of a commercial game on children's Block Design scores on the WISC-R IQ Test. Intelligence 6: 109-123.

Dodrill, C. B., and Troupin, A. S. (1975). Effects of repeated administration of a comprehensive neuropsychological battery among chronic epileptics. Journal of Nervous and Mental Disease 161: 185-190.

Dyche, G. M., and Johnson, D. A. (1991). Development and evaluation of CHIPASAT, an attention test for children: II. Test-retest reliability and practice effect for a normal sample. Perceptual and Motor Skills 72: 563-572.

Francis, D. J., Fletcher, J. M., Davidson, K. C., and Steubing, K. K. (1991). Conceptual and statistical issues in modeling individual growth and recovery [Summary]. Journal of Clinical and Experimental Neuropsychology 13: 48-49.

Goldstein, G. (1991). Practice effect phenomena in a national hypertension study [Summary]. The Clinical Neuropsychologist 5: 263.

Hays, W. L. (1988). Statistics (4th ed.), Holt, Rinehart and Winston, New York.

Hermann, B., and Wyler, A. R. (1991). The impact of regression toward the mean and interpretation of test-retest data [Summary]. The Clinical Neuropsychologist 5: 263.

Johnson, B. F., and Kane, R. L. (1991). An assessment of pretraining, cross-over design, and learning curve analysis in separating practice from medication effects [Summary]. Journal of Clinical and Experimental Neuropsychology 13: 50.

Kane, R. L. (1991). Standardized and flexible batteries in neuropsychology: An assessment update. Neuropsychology Review 2: 281-339.

Kay, G. (1991). Repeated testing applications employing computer-based performance assessment measures [Summary]. Journal of Clinical and Experimental Neuropsychology 13: 50.

Kilburn, K. H., Warsaw, R. H., and Shields, M. G. (1989). Neurobehavioral dysfunction in firemen exposed to polychlorinated biphenyls (PCBs): Possible improvement after detoxification. Archives of Environmental Health 44: 345-350.

Knight, R. G., and Shelton, E. J. (1983). Tables for evaluating predicted retest changes in Wechsler Adult Intelligence Scale scores. British Journal of Clinical Psychology 22: 77-81.

Levin, H. S., Ewing-Cobbs, L., and Fletcher, J. M. (1989). Neurobehavioral outcome of mild head injury in children. In Levin, H. S., Eisenberg, H. M., and Benton, A. L. (Eds.), Mild Head Injury (pp. 189-213), Oxford University Press, New York.

Lewis, R. F., and Rennick, P. M. (1979). Manual for the Repeatable Cognitive-Perceptual-Motor Battery, Axon Publishing Company, Grosse Pointe Park, MI.

Lezak, M. D. (1982, June). The test-retest stability and reliability of some tests commonly used in neuropsychological assessment. Paper presented at the fifth European conference of the International Neuropsychological Society, Deauville, France.

Longstreth, L. E., and Alcorn, M. B. (1990). Susceptibility of Wechsler Spatial Ability to experience with related games. Educational and Psychological Measurement 50: 1-6.

Matarazzo, J. D., Carmody, T. P., and Jacobs, L. D. (1980). Test-retest reliability and stability of the WAIS: A literature review with implications for clinical practice. Journal of Clinical Neuropsychology 2: 89-105.

Matarazzo, J. D., Wiens, A. N., Matarazzo, R. G., and Goldstein, S. G. (1974). Psychometric and clinical test-retest reliability of the Halstead Impairment Index in a sample of healthy, young, normal men. The Journal of Nervous and Mental Disease 158: 37-49.

Matarazzo, R. G., Wiens, A. N., Matarazzo, J. D., and Manaugh, T. S. (1973). Test-retest reliability of the WAIS in a normal population. Journal of Clinical Psychology 29: 194-197.

McCaffrey, R. J. (1991). Clinical issues in the reliability and stability of neuropsychological instruments in four patient samples [Summary]. The Clinical Neuropsychologist 5: 263-264.

McCaffrey, R. J., Orsillo, S. M., Lefkowicz, D. P., Ortega, A., Haase, R. F., Wagner, H., and Ruckdeschel, J. C. (1990, November). Neuropsychological sequelae of chemotherapy and prophylactic cranial irradiation: An extension of earlier findings. Presented at the Annual Meeting of the National Academy of Neuropsychology, Reno, NV.

McCaffrey, R. J., Ortega, A., Orsillo, S. M., Nelles, W. B., and Haase, R. F. (1992). Practice effects in repeated neuropsychological assessments. The Clinical Neuropsychologist 6: 32-42.

McCaffrey, R. J., Ortega, A., Orsillo, S. M., Haase, R. F., and McCoy, G. C. (1992). Neuropsychological and physical side effects of metoprolol in essential hypertensives. Neuropsychology 6: 225-238.

McCaffrey, R. J., Ortega, A., and Haase, R. F. (1993). Effects of repeated neuropsychological assessments. Archives of Clinical Neuropsychology 8: 519-524.

McCaffrey, R. J., Cousins, J. P., Westervelt, H. J., Martynowicz, M., Remick, S. C., Szebenyi, S., Wagle, W. A., Bottomley, P. A., Hardy, C. J., and Haase, R. F. (1995). Practice effects with the NIMH AIDS Abbreviated Neuropsychological Battery. Archives of Clinical Neuropsychology 10: 241-250.

McCaffrey, R. J., Williams, A. D., Fisher, J. M., and Laing, L. C. (in preparation). The Practice of Forensic Neuropsychology, Plenum Press, New York.

Meredith, W., and Tisak, J. (1990). Latent curve analysis. Psychometrika 55: 107-122.

Mitrushina, M., and Satz, P. (1991). Effect of repeated administration of a neuropsychological battery in the elderly. Journal of Clinical Psychology 47: 790-801.

Mohr, E., and Brouwers, P. (Eds.). (1991). Handbook of Clinical Trials: The Neurobehavioral Approach, Swets & Zeitlinger, Netherlands.

Puente, A. E., and McCaffrey, R. J. (Eds.). (1992). Handbook of Neuropsychological Assessment: A Biopsychosocial Perspective, Plenum Press, New York.

Putnam, S. H., Adams, K. M., and Schneider, A. M. (1992). One-day test-retest reliability of neuropsychological tests in a personal injury case. Psychological Assessment 4: 312-316.

Rawlings, D. B., and Crewe, N. M. (1992). Test-retest practice effects and test score changes of the WAIS-R in recovering traumatically brain-injured survivors. The Clinical Neuropsychologist 6: 415-430.

Reitan, R. M., and Wolfson, D. (1993, October). Issues in the interpretation of difficult cases using the Halstead-Reitan neuropsychological test battery. Presented at the annual meeting of the National Academy of Neuropsychology, Phoenix, AZ.

Ryan, J. J., Paolo, A. M., and Brungardt, T. M. (1992). WAIS-R test-retest stability in normal persons 75 years and older. The Clinical Neuropsychologist 6: 3-8.

Schain, R. J., Ward, J. W., and Guthrie, D. (1977). Carbamazepine as an anticonvulsant in children. Neurology 27: 476-480.

Seidenberg, M., O'Leary, D. S., Giordani, B., Berent, S., and Boll, T. J. (1981). Test-retest IQ changes of epilepsy patients: Assessing the influence of practice effects. Journal of Clinical Neuropsychology 3: 237-255.

Shatz, M. W. (1981). WAIS practice effects in clinical neuropsychology. Journal of Clinical Neuropsychology 3: 171-179.

Spector, J. (1991). Within-session repeated measures of neuropsychological functioning [Summary]. Journal of Clinical and Experimental Neuropsychology 13: 50.

Standards for Educational and Psychological Testing. (1985). American Psychological Association, Washington, DC.

Stuss, D. T., Stethem, L. L., Hugenholtz, H., and Richard, M. T. (1989). Traumatic brain injury: A comparison of three clinical tests, and an analysis of recovery. The Clinical Neuropsychologist 3: 145-156.

Stuss, D. T., Stethem, L. L., and Poirier, C. A. (1987). Comparison of three tests of attention and rapid information processing across six age groups. The Clinical Neuropsychologist 1: 139-152.

Su, R., and Yerxa, E. J. (1984). Comparison of the motor test of the SCSIT and the LNNBC. The Occupational Therapy Journal of Research 4: 96-108.

Thorndike, R. L. (1949). Personnel Selection, Wiley, New York.

Tisak, J., and Meredith, W. (1989). Exploratory longitudinal factor analysis in multiple populations. Psychometrika 54: 261-281.

Van Gorp, W. G., Lamb, D. G., and Schmitt, F. A. (1993). Methodologic issues in neuropsychological research with HIV-spectrum disease. Archives of Clinical Neuropsychology 8: 17-33.

Wechsler, D. (1945). A standardized memory scale for clinical use. Journal of Psychology 19: 87-95.

Welford, A. T. (1985). Practice effects in relation to age: A review and a theory. Developmental Neuropsychology 1: 173-190.

Welford, A. T. (1987). On rates of improvement with practice. Journal of Motor Behavior 19: 401-415.

Zentall, S. S., and Zentall, T. R. (1986). Hyperactivity ratings: Statistical regression provides an insufficient explanation of practice effects. Journal of Pediatric Psychology 11: 393-396.