The Distribution of Scaled Scores and Possible Floor Effects on the WISC-III and WAIS-III
Simon Whitaker* and Christopher Wood†
*The Learning Disability Research Unit, University of Huddersfield, Huddersfield, UK; †Clinical Psychology, School of Psychological Sciences, University of Manchester, Manchester, UK
Accepted for publication 21 May 2007
Objective It has been suggested that, as the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) and the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) give a scaled score of one even if a client obtains a raw score of zero, these assessments may have a hidden floor effect at low IQ levels. The study looked for indications of this in a sample of assessments that had been given for clinical and diagnostic reasons.
Design The degree to which a hidden floor effect could be present was assessed by looking at the proportion of scaled scores of one in IQ bands 50–59, 60–69 and 70 plus, and by plotting the distribution of scaled scores in these bands for both the WISC-III and WAIS-III.
Method Fifty WISC-III and 49 WAIS-III assessments were obtained from records and analysed.
Results The distribution of scaled scores in the WAIS-III was approximately normal with very few scaled scores of one, suggesting that a hidden floor effect would only be a potential problem for IQs in the 40s and 50s. The WISC-III had a skewed distribution of scaled scores with more scaled scores of one than any other scaled score. Scaled scores of one were found at all IQ levels up to 70 plus.
Conclusions There is potentially a significant floor effect on the WAIS-III at IQs in the 40s and 50s and on the WISC-III up to IQs in the 70s. There are also indications that the WISC-III has a much harder criterion for gaining a scaled score of two than the WAIS-III, resulting in it producing lower IQs.
Keywords: floor effect, intellectual disabilities, WAIS-III,
WISC-III
Introduction
The Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) gives IQs down to 45 and the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) down to 40, which correspond to 3.67 and 4 standard deviations (SDs) below the norm respectively. In both assessments these floor IQs occur when a client scores the minimum scaled score of one in each subtest used to calculate IQ. The scaled scores have a mean of 10 and SD of 3; a scaled score of one is therefore 3 SDs below the mean. Whitaker (2005) has suggested that, as a scaled score of one is given even if the client gains a raw score of zero, there may be a hidden floor effect. If a client with an ability level more than 3 SDs below the norm is given a scaled score of one, then this will artificially increase his/her overall IQ score. To some extent the test designers recognize this as a problem, as both WISC-III and WAIS-III manuals state that a Full Scale IQ should not be given unless the client has raw scores above zero on at least three Verbal and three Performance subtests. However, this does not seem to be a sufficient safeguard against a client with very low raw scores having their IQ overestimated by the assessments. A raw score of zero could imply an ability level just below that corresponding to scaled score one; however, it could also imply ability well below this or no ability at all. Logically there should be a raw score that corresponds to a scaled score of zero or less. It is therefore not known what raw score should correspond to a scaled score of one, and it is possible that some clients who gain low raw scores have ability levels more than 3 SDs below the norm and so have their ability overestimated by the allocation of a scaled score of one. This will obviously be a problem with IQs in the 40s, where scaled scores of one are inevitable. What is not clear is whether it would affect higher IQs.
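The conversions quoted in this paragraph can be checked with a short sketch. It assumes the conventional Wechsler IQ metric of mean 100 and SD 15 (not stated explicitly in the text, but consistent with the 3.67 and 4 SD figures) together with the scaled-score mean of 10 and SD of 3 given above; the function names are illustrative only.

```python
# Illustrative sketch: express IQs and scaled scores in SD units.
# IQ_MEAN and IQ_SD are assumed (conventional Wechsler metric);
# the scaled-score mean and SD are those given in the text.
IQ_MEAN, IQ_SD = 100, 15
SCALED_MEAN, SCALED_SD = 10, 3


def iq_to_sd_units(iq):
    """Return how many SDs an IQ lies from the mean (negative = below)."""
    return (iq - IQ_MEAN) / IQ_SD


def scaled_to_sd_units(scaled_score):
    """Return how many SDs a subtest scaled score lies from the mean."""
    return (scaled_score - SCALED_MEAN) / SCALED_SD


print(round(iq_to_sd_units(45), 2))  # WAIS-III floor IQ
print(round(iq_to_sd_units(40), 2))  # WISC-III floor IQ
print(scaled_to_sd_units(1))         # minimum scaled score
```

On this metric the minimum scaled score sits 3 SDs below the mean, so any ability further below the mean than that cannot be distinguished at the subtest level.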
Journal of Applied Research in Intellectual Disabilities 2008, 21, 136–141
© 2007 The Authors. Journal compilation © 2007 Blackwell Publishing Ltd. doi: 10.1111/j.1468-3148.2007.00378.x
The degree to which this floor effect should be considered a concern for higher IQs in the 50s, 60s and 70s can be indicated by two measures. First, the absolute number of scaled scores of one obtained at given IQ levels will indicate how low the assessment will measure before this hidden floor effect may become a problem. Secondly, the distribution of scaled scores will give an indication as to whether this hidden floor is genuine. An individual’s score on a subtest should be a function of a number of factors, the main one being their true intellectual ability; others are their specific skills in the subtest, and situational factors such as the client’s state when assessed, the level of distraction and variation in how the assessment was given. The combination of these factors should result in variation in scaled scores on different subtests. If the scaled scores of a number of subjects of similar ability were combined, one would expect the distribution to be approximately normal, with the majority of scaled scores close to the mean value and few very low or high scaled scores. One would therefore expect few scaled scores of one. However, if the assessment was subject to a floor effect and a number of clients with intellectual ability levels more than 3 SDs below the mean were allocated a scaled score of one, there would be more scaled scores of one than would be predicted by a normal distribution.
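The expected pile-up at the floor can be illustrated with a minimal sketch. It assumes a simple model that is not taken from the study: a client group's latent scaled scores are normally distributed, reported scores are the latent scores rounded to the nearest integer, and anything below one is reported as one. The latent means (7 and 3) and SD (2.5) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

FLOOR, CEILING = 1, 19  # Wechsler scaled scores run from 1 to 19


def reported_distribution(latent_mean, latent_sd):
    """Probability of each reported scaled score when a normally
    distributed latent score is rounded to the nearest integer and
    values beyond the floor or ceiling are reported as the floor or
    ceiling (assumed model, for illustration only)."""
    latent = NormalDist(latent_mean, latent_sd)
    probs = {}
    for score in range(FLOOR, CEILING + 1):
        below = 0.0 if score == FLOOR else latent.cdf(score - 0.5)
        up_to = 1.0 if score == CEILING else latent.cdf(score + 0.5)
        probs[score] = up_to - below
    return probs


def modal_score(probs):
    """Return the reported scaled score with the highest probability."""
    return max(probs, key=probs.get)


for mean in (7, 3):  # hypothetical latent means
    probs = reported_distribution(mean, 2.5)
    print(mean, modal_score(probs), round(probs[FLOOR], 3))
```

Under these assumptions a group whose latent mean is well above the floor shows few scaled scores of one and a mode near its mean, whereas a group whose latent mean is close to the floor shows a mode at one, because the hidden left tail is added to the tally for scaled score one.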
The literature on the WISC-III and the WAIS-III does not provide any information on the distribution of scaled scores or the number of scaled scores of one that can be expected at low IQ levels. The aim of this study is to examine this in order to assess whether there is evidence of a hidden floor effect.
Method
The files of the learning disability psychology services in the locality were searched to find WISC-III and WAIS-III assessments. In all, 49 WAIS-III assessments and 50 WISC-III assessments were identified. The assessments were conducted either by a clinical psychologist or by an assistant psychologist who had been trained to administer the assessment. All assessors administered both the WISC-III and WAIS-III. In all cases only the subtests needed to calculate a Full Scale (FS) IQ were completed. The average age of those assessed on the WAIS-III was 30 years 4 months (range 16–58 years, SD 12.02 years) and on the WISC-III 11 years 9 months (range 8–16 years, SD 2.67 years). All assessments were performed for either clinical or diagnostic reasons.
Results
Table 1 shows the mean IQs for both the WISC-III and WAIS-III for the three FS IQ bands: 50–59, 60–69 and 70 plus. The IQ range 40–49 was not used in the quantitative analysis as, although 13 clients given the WISC-III had an FS IQ of less than 50, only one client given the WAIS-III scored below 50. There is no significant difference in mean IQs (using t-tests) between the WISC-III and WAIS-III in any of these bands. Table 2 shows the number and percentage of scaled scores of one for both the WISC-III and WAIS-III for each of the IQ bands. In all IQ bands the percentage of scaled scores of one was greater on the WISC-III than on the WAIS-III. The significance of this difference was tested using Wilcoxon’s rank-sum test (cf. Howell 1992) for each IQ band. For IQs in the 50s and 70s this just failed to reach statistical significance (P < 0.10); however, for the 60s IQ band it was highly significant (P < 0.002).

Table 1 The mean and standard deviation (SD) of Verbal IQ (V IQ), Performance IQ (P IQ) and Full Scale IQ (FS IQ) for both the WISC-III and WAIS-III for three Full Scale IQ bands: 50–59, 60–69 and 70 and above.

Level of IQ    WISC-III (SD)    WAIS-III (SD)    Diff IQ
50s            n = 13           n = 8
  V IQ         57.85 (5.44)     59.63 (4.27)      1.78 NS
  P IQ         59.15 (5.97)     58.50 (3.46)     −0.65 NS
  FS IQ        55.38 (3.12)     55.50 (3.38)      0.12 NS
60s            n = 19           n = 25
  V IQ         63.53 (6.10)     66.04 (3.35)      2.51 NS
  P IQ         69.21 (7.38)     67.84 (5.31)     −1.37 NS
  FS IQ        63.84 (2.63)     64.08 (2.27)      0.24 NS
70s            n = 5            n = 15
  V IQ         72.80 (7.95)     75.00 (6.90)      2.20 NS
  P IQ         81.40 (8.91)     76.13 (7.55)     −5.27 NS
  FS IQ        74.00 (3.87)     73.33 (3.04)     −0.67 NS

The difference between the means (WAIS-III minus WISC-III) is shown in the Diff IQ column. The statistical significance of the differences in mean IQ is calculated using t-tests. NS, non-significant.

Table 2 Percent (number) of scaled scores of one for the IQ bands 50s, 60s and 70s

Level of IQ    WISC-III      WAIS-III      Diff
50s            n = 13        n = 8
               31.5% (41)    17.0% (15)    14.5 NS
60s            n = 19        n = 25
               15.3% (29)    2.6% (7)      12.7*
70s            n = 5         n = 15
               10.0% (5)     0.0% (0)      10.0 NS

The statistical significance of the difference in the frequency of scaled scores of one is compared using Wilcoxon’s rank-sum test. *P < 0.002. NS, non-significant.
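The percentages in Table 2 can be related to the counts with a small sketch. It assumes that each client contributes 10 scaled scores on the WISC-III and 11 on the WAIS-III (the number of subtests entering the Full Scale IQ); this denominator is an assumption made here, not something stated in the text, and the names below are illustrative.

```python
# Assumed number of subtests contributing to Full Scale IQ
# (10 on the WISC-III, 11 on the WAIS-III); not stated in the text.
SUBTESTS = {"WISC-III": 10, "WAIS-III": 11}

# (number of clients, number of scaled scores of one) from Table 2.
TABLE_2 = {
    ("WISC-III", "50s"): (13, 41),
    ("WAIS-III", "50s"): (8, 15),
    ("WISC-III", "60s"): (19, 29),
    ("WAIS-III", "60s"): (25, 7),
    ("WISC-III", "70s"): (5, 5),
    ("WAIS-III", "70s"): (15, 0),
}


def percent_ones(test, band):
    """Percentage of all scaled scores in an IQ band that equal one."""
    clients, ones = TABLE_2[(test, band)]
    total_scaled_scores = clients * SUBTESTS[test]
    return 100 * ones / total_scaled_scores


for (test, band) in TABLE_2:
    print(test, band, round(percent_ones(test, band), 1))
```

Under that assumption the recomputed values agree with the percentages printed in Table 2 to within about 0.1 of a percentage point.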
Figure 1 shows the distribution of scaled scores for both WISC-III and WAIS-III in the three IQ bands, together with the distribution of scaled scores for the 13 clients with an FS IQ less than 50 on the WISC-III. For IQs in the 40s, 50s and 60s on the WISC-III there are more scaled scores of one than any other scaled score. For IQs in the 70s, although there are more scaled scores of five than one, 18 as opposed to 15, the distribution is bimodal. In all cases the impression is of a truncated normal distribution in which the area under the hidden left side of the curve has been added to the left part of the visible curve. The distributions are consistent with a floor effect in which the non-allocated scaled scores below one are added to the tally of scaled score one. With the WAIS-III, although the distribution of scaled scores for IQs in the 50s appears to be somewhat truncated, the distribution in all three IQ bands appears to be approximately normal with the mode at the midpoint of the distribution.
Figure 1 Distribution of scaled scores for WISC-III and WAIS-III. Panels: percentage of scaled scores on the WISC-III for IQs in the 40s, 50s, 60s and 70s, and on the WAIS-III for IQs in the 50s, 60s and 70s. Horizontal axis: scaled score; vertical axis: percentage of scaled scores.
Discussion
The aim of the study was to look at the distribution of scaled scores and the relative number of scaled scores of one in order to see if there were indications of a hidden floor effect. It was found that the WISC-III, and to a lesser extent the WAIS-III, had a relatively high number of scaled scores of one for IQs below 60. There is therefore a possibility that IQ scores in this low range may be artificially increased by this floor effect. For IQs in the 60s and 70s the WISC-III also showed far more scaled scores of one than would have been expected, suggesting that at these relatively high IQ levels there may be a floor effect.
The reason for the difference in the number of scaled scores of one between the WISC-III and the WAIS-III is not clear. It is not likely to be due to a difference in the intellectual abilities of the clients given the different assessments, as there was no significant difference between WISC-III and WAIS-III mean IQs in the IQ bands. An alternative explanation is that the WISC-III could have harder criteria for gaining a scaled score of two than the WAIS-III. Both Flynn (1985) and Spitz (1986, 1989) have pointed to differences between the WISC-R and the WAIS-R for IQs of 70 and below. Flynn (1985) suggests that the WAIS-R will score as much as 13 points higher than the WISC-R in this range. Similarly, Spitz (1986) reports data showing that the WAIS-R scores 10 to 19 points higher than the WISC-R for IQs of 70 and below. In a more detailed analysis of the literature, Spitz (1989) compared the WISC-R and the WAIS-R and found that the WISC-R scores 15 points lower at IQ 60 (on the WAIS-R). We are not aware of any empirical comparison of the WAIS-III and WISC-III. However, a comparison between the criteria specified in the manuals for 16-year-olds (the age range covered by both assessments) to gain a scaled score of two suggests that the WISC-III criterion is harder.
Table 3 indicates the requirements for a 16-year-old to obtain a scaled score of two on the common subtests needed for an FS IQ on both the WISC-III and WAIS-III. It shows the minimum raw score needed and gives a description of what the client has to do to obtain that raw score, usually by giving the final item that the client would need to pass to get that raw score, assuming they got full marks on all previous items. It is clear that the raw scores required on the WISC-III are considerably greater than those on the WAIS-III. This in itself is not surprising as the WISC-III is designed to test children as young as 6 years old and so will need to have items that 6-year-olds with low intellectual ability will be able to pass. However, further examination of the criteria for a scaled score of two suggests that the WISC-III is harder. On Vocabulary, Similarities, Information and Comprehension, the WAIS-III requires an understanding of common concrete concepts that people would use in their day-to-day lives, for example the days of the week, money and animals that are at least commonly seen on television. On the WISC-III the concepts are more abstract, for example ‘brave’ and ‘rule’, or require an understanding of function, i.e. that an elbow and knee are both joints and not just parts of the body. On the WAIS-III Arithmetic subtest the client is required to take one from three, needing an understanding of number to three, while on the WISC-III they have to add eight and six, requiring an ability to deal with numbers above 10.

Table 3 Raw score and items or final item required to be passed by a 16-year-old to obtain a scaled score of two on the WISC-III and WAIS-III

Subtest               Item or final item required                           Raw score
WISC-III
Vocabulary            What does brave mean?¹                                22
Similarities          In what way are an elbow and knee alike?¹             11
Arithmetic            Jim had 8 crayons and he bought 6 more. How many      13
                      crayons did he have altogether?¹
Information           How many things make a dozen?¹                        11
Comprehension         Tell me some reasons why games have rules¹            17
Picture completion    A plug hole missing from a bath¹                      16
Coding                                                                      39
Block design          Completion of one 2-block model and six 4-block       29
                      models gaining full bonus points for time on three
                      of the models
Picture arrangement   Arranging four sets of pictures correctly gaining     13
                      all bonus points for time on item 3 and all but
                      one on item 4
WAIS-III
Vocabulary            Tell me what ship means¹                              4
Similarities          In what way are a dog and a lion alike?¹              4
Arithmetic            If you have 3 books and give one away, how many       4
                      do you have left?¹
Information           What is the day that comes after Saturday?            1
Comprehension         What do people use money for?                         1
Picture completion    A nose missing from a face¹                           4
Digit symbol                                                                14
Block design          Completion of two 2-block models, being given a       3
                      second trial on one model if an error occurred on
                      the first trial
Picture arrangement   Arranging one set of pictures that the client had     1
                      previously seen the examiner demonstrate and being
                      given a second trial due to an error on the first
                      trial

¹The item that would gain a scaled score of two assuming all previous items gained full points and all subsequent items are failed.
On Block Design, Picture Completion and Picture Arrangement the client has to complete more items, the final ones of which are more complex, on the WISC-III than on the WAIS-III. However, the clearest indication that the WISC-III is harder than the WAIS-III for scaled score two comes from the Coding and Digit Symbol subtests, which are virtually the same test on both assessments. On the WAIS-III the 16-year-old is required to fill in 14 symbols and on the WISC-III he/she is required to complete 39, a score that on the WAIS-III would get them a scaled score of 5. All of this suggests that it is harder for a 16-year-old to obtain a scaled score of two on the WISC-III than it is on the WAIS-III.
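The thresholds in Table 3 can be held in a small lookup to make the comparison concrete. This is a sketch only: the raw scores are copied from Table 3, the function name and the raw score of 20 used below are hypothetical, and raw scores are only directly comparable across the two assessments where the task is essentially the same, as with Coding and Digit Symbol.

```python
# Minimum raw score for a scaled score of two at age 16, from Table 3.
# Each key pairs a WISC-III subtest with its WAIS-III counterpart.
MIN_RAW_FOR_SCALED_TWO = {
    "Vocabulary": {"WISC-III": 22, "WAIS-III": 4},
    "Similarities": {"WISC-III": 11, "WAIS-III": 4},
    "Arithmetic": {"WISC-III": 13, "WAIS-III": 4},
    "Information": {"WISC-III": 11, "WAIS-III": 1},
    "Comprehension": {"WISC-III": 17, "WAIS-III": 1},
    "Picture completion": {"WISC-III": 16, "WAIS-III": 4},
    "Coding / Digit symbol": {"WISC-III": 39, "WAIS-III": 14},
    "Block design": {"WISC-III": 29, "WAIS-III": 3},
    "Picture arrangement": {"WISC-III": 13, "WAIS-III": 1},
}


def reaches_scaled_two(test, subtest, raw_score):
    """True if the raw score meets the Table 3 minimum for scaled score two."""
    return raw_score >= MIN_RAW_FOR_SCALED_TWO[subtest][test]


# Hypothetical 16-year-old who completes 20 symbols on the coding task.
raw = 20
print(reaches_scaled_two("WAIS-III", "Coding / Digit symbol", raw))
print(reaches_scaled_two("WISC-III", "Coding / Digit symbol", raw))
```

The same hypothetical performance clears the WAIS-III threshold but falls short of the WISC-III threshold, which is the disparity the paragraph above describes.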
There is clearly a possibility that one reason why the WISC-III gives a greater number of scaled scores of one is that it is much harder to gain a scaled score of two on it than on the WAIS-III. This is somewhat paradoxical, as it would also be expected that, given the high number of scaled scores of one (37%) on the WISC-III, it would be subject to a floor effect that would artificially increase IQ scores.
One of the basic problems is probably that neither the WISC-III nor the WAIS-III was standardized on samples that contained a sufficient number of people with low IQs to find the correct relationship between raw scores and scaled scores below four. Although both assessments were standardized using relatively large stratified samples of the US population (2450 for the WAIS-III, 2200 for the WISC-III), the samples were then divided into subsamples of 200 subjects in specified age ranges. Effectively the standardization was done separately for each age range. This meant that there were very few subjects with low IQs: only five people below two SDs (IQ 70 or scaled score four) and none below 2.8 SDs (IQ 58 or scaled score two). There would therefore be too few subjects in the sample to reliably find the appropriate raw scores corresponding to scaled scores of three and two, or the sums of scaled scores corresponding to IQs below about 65. It is possible that scaled scores of one on the WISC-III are subject to both a floor effect, increasing the overall IQ score if a client has a relatively low raw score, and a suppression effect, due to the harder criteria for scaled score two, if the client has a relatively high raw score. The problem is that we do not know at what raw score these effects may occur.
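The scarcity of low scorers in a subsample of 200 can be illustrated with a short calculation. The sketch assumes that ability within each age-band subsample is normally distributed, which is an assumption made here for illustration; the function name is hypothetical.

```python
from statistics import NormalDist


def expected_count_below(sample_size, sd_units):
    """Expected number of people scoring below a cut-off, given in SD
    units from the mean, in a normally distributed sample (assumed)."""
    return sample_size * NormalDist().cdf(sd_units)


# Cut-offs discussed in the text: 2 SDs and 2.8 SDs below the mean.
for cutoff in (-2.0, -2.8):
    print(cutoff, round(expected_count_below(200, cutoff), 2))
```

Under that assumption roughly four or five people in a subsample of 200 would be expected to fall more than 2 SDs below the mean and fewer than one more than 2.8 SDs below it, which is consistent with the figures given above.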
Conclusions
The aim of the study was to look for indications of a hidden floor effect that could artificially increase a client’s IQ. Such indications were found, suggesting that IQs, particularly those measured by the WISC-III, may be overestimates. However, evidence was also found that it is more difficult to gain a scaled score of two on the WISC-III than it is on the WAIS-III, suggesting that the IQs given by the two assessments are not equivalent. These two findings will be considered separately.
The floor effect
Logically, a floor effect due to scaled scores of one will
occur for IQs in the 40s where scaled scores of one
would inevitably occur. It is therefore likely that many
assessed IQs in the 40s are overestimates of the client’s
true ability. The degree to which a client’s true ability is
overestimated is not clear as, although a scaled score of
one may correspond to a true scaled score of one or
zero, at this level of ability, it may also correspond to a
true scaled score of less than zero. Therefore the only
accurate way of reporting a measured IQ in the 40s, in
which there are several scaled scores of one, is to state
that the true IQ is equal to or less than that calculated
on the basis of the scaled scores of one being one.
At higher IQ levels there does not seem to be a major
concern with the WAIS-III even for IQs in the 50s where
only 17% of scaled scores were one. However, with the
WISC-III there are potentially more serious problems. In
the 50–59 IQ range 31% of scaled scores were one, with
one client having six scaled scores of one. In the 60–69
IQ range, 15% of scaled scores were one and one client
had five scaled scores of one. In the 70–79 IQ range 10%
of scaled scores were one, with one client having three
scaled scores of one. The degree to which this would
affect the scores of clients in these IQ ranges can be
estimated if it is assumed that a scaled score of one will
correspond to a true scaled score of not less than zero,
which at this level may be a reasonable assumption. The true IQ will then lie between the IQ calculated with the scaled scores of one counted as one and the IQ calculated with all the scaled scores of one counted as zero. For IQs in the 70s, three scaled scores of one could therefore mean that the true IQ is two points lower than that reported. These two points in turn could determine whether the client is given a diagnosis of learning disabilities and a service or not, and so cannot be regarded as unimportant.
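The bounding argument can be sketched at the level of the sum of scaled scores. The example assumes, as the text does, that a reported scaled score of one corresponds to a true scaled score of not less than zero; the profile of ten scaled scores is hypothetical, and converting either sum to an IQ still requires the conversion tables in the relevant manual, which are not reproduced here.

```python
def scaled_sum_bounds(scaled_scores, floor=1, assumed_true_minimum=0):
    """Return (lowest, reported) sums of scaled scores, where the lowest
    sum counts every score at the floor as the assumed true minimum."""
    reported = sum(scaled_scores)
    at_floor = sum(1 for score in scaled_scores if score == floor)
    lowest = reported - at_floor * (floor - assumed_true_minimum)
    return lowest, reported


# Hypothetical profile with three scaled scores of one.
profile = [1, 1, 1, 5, 6, 7, 4, 5, 6, 8]
print(scaled_sum_bounds(profile))
```

The true IQ would then lie between the IQs that the manual assigns to the lower and the reported sums.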
Criteria for gaining a scaled score of two
A further finding was the difference in the shape of the distribution of scaled scores between the WISC-III and WAIS-III. The WAIS-III had an apparently normal distribution of scaled scores, though with a few more scaled scores of one for IQs in the 50s. On the other hand, the WISC-III had highly skewed distributions with more scaled scores of one than any other in all but IQs in the 70s. It is not clear why the differences in the distributions of scaled scores between the WISC-III and WAIS-III occur, though it is suggested that the WISC-III has more difficult criteria for gaining a scaled score of two. This in turn would result in the WISC-III giving lower IQs than the WAIS-III. A similar situation has been reported by Spitz (1989) with the WISC-R and WAIS-R, with the WAIS-R being reported to score 15 points higher than the WISC-R.
If this were the case for the WISC-III and WAIS-III, then an individual found to have an IQ of 62 on the WISC-III as a child, and so meeting the criterion for having a learning disability, could be reassessed on the WAIS-III some years later, be found to have an IQ of 77 and be considered no longer eligible for learning disability services. If there were clear empirical evidence that the WISC-III scored lower than the WAIS-III, and by how much, then this could be taken into account when drawing up criteria for services. The problem is that this evidence does not exist, so all that is known is that the WISC-III may score substantially lower than the WAIS-III.
The overall conclusion is that there may be a lot more error in the assessment of IQs below 80 than has previously been acknowledged, not only because of the floor effect but also because of problems in the standardization of the assessments leading to disparities between commonly used assessments. With our current state of knowledge, although it may be possible to give some indication as to the degree of error caused by the floor effect, we do not know how much the other factors may affect IQ scores. It may therefore not be appropriate to state IQ scores in reports without indicating that they may be subject to a considerably greater degree of error than that suggested in the manual.
There is a need for further research. The current study was based on data that were available in records and is clearly subject to possible error in how consistently the assessments were administered, the motivation and state of the client being assessed and the conditions under which the assessment took place. In addition, there is no way of telling how similar the adults given the WAIS-III were in intellectual ability to the children given the WISC-III. A much better study, which could control for the level of intellectual ability and other errors in testing, would be for a large group of 16-year-olds to be tested on both the WISC-III and WAIS-III in a counterbalanced order. A further question that needs to be addressed is the degree to which the WISC-IV is subject to the possible floor effects and the degree to which it will produce scores equivalent to the WAIS-III at low IQ levels.
Correspondence
Any correspondence should be directed to Simon Whitaker, The Learning Disability Research Unit, Room HW 2/08, University of Huddersfield, Huddersfield HD1 2DH, UK (E-mail: [email protected])
References
Flynn J.R. (1985) Wechsler intelligence tests: do we really have a criterion of mental retardation? American Journal of Mental Deficiency 90, 236–244.
Howell D.C. (1992) Statistical Methods for Psychologists, 3rd edn. Duxbury Press, Belmont.
Spitz H.H. (1986) Disparity in mentally retarded persons’ IQs derived from different intelligence tests. American Journal of Mental Deficiency 90, 588–591.
Spitz H.H. (1989) Variations in the Wechsler interscale IQ disparities at different levels of IQ. Intelligence 13, 157–167.
Whitaker S. (2005) The use of the WISC-III and the WAIS-III with people with a learning disability: three concerns. Clinical Psychology 50, 37–40.