
The Distribution of Scaled Scores and Possible Floor Effects on the WISC-III and WAIS-III

Simon Whitaker* and Christopher Wood†

*The Learning Disability Research Unit, University of Huddersfield, Huddersfield, UK; †Clinical Psychology, School of Psychological Sciences, University of Manchester, Manchester, UK

Accepted for publication 21 May 2007

Objective It has been suggested that, as the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) and the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) give a scaled score of one even if a client obtains a raw score of zero, these assessments may have a hidden floor effect at low IQ levels. The study looked for indications of this in a sample of assessments that had been given for clinical and diagnostic reasons.

Design The degree to which a hidden floor effect could be present was assessed by looking at the proportion of scaled scores of one in the IQ bands 50–59, 60–69 and 70 plus, and by plotting the distribution of scaled scores in these bands for both the WISC-III and WAIS-III.

Method Fifty WISC-III and 49 WAIS-III assessments were obtained from records and analysed.

Results The distribution of scaled scores in the WAIS-III was approximately normal with very few scaled scores of one, suggesting that a hidden floor effect would only be a potential problem for IQs in the 40s and 50s. The WISC-III had a skewed distribution of scaled scores with more scaled scores of one than any other scaled score. Scaled scores of one were found at all IQ levels up to 70 plus.

Conclusions There is potentially a significant floor effect on the WAIS-III at IQs in the 40s and 50s and on the WISC-III up to IQs in the 70s. There are also indications that the WISC-III has a much harder criterion for gaining a scaled score of two than the WAIS-III, resulting in it producing lower IQs.

Keywords: floor effect, intellectual disabilities, WAIS-III, WISC-III

Introduction

The Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) gives IQs down to 45 and the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) to 40, which correspond to 3.67 and 4 standard deviations (SDs) below the norm respectively. In both assessments these floor IQs occur when a client scores the minimum scaled score of one in each subtest used to calculate IQ. The scaled scores have a mean of 10 and SD of 3; a scaled score of one is therefore 3 SDs below the mean. Whitaker (2005) has suggested that, as a scaled score of one is given even if the client gains a raw score of zero, there may be a hidden floor effect. If a client with an ability level more than 3 SDs below the norm is given a scaled score of one, then this will artificially increase his/her overall IQ score. To some extent the test designers recognize this as a problem, as both WISC-III and WAIS-III manuals state that a Full Scale IQ should not be given unless the client has raw scores above zero on at least three Verbal and three Performance subtests. However, this does not seem to be a sufficient safeguard against a client with very low raw scores having their IQ overestimated by the assessments. A raw score of zero could imply an ability level just below that corresponding to scaled score one; however, it could also imply ability well below this or no ability at all. Logically there should be a raw score that corresponds to a scaled score of zero or less. It is therefore not known what raw score should correspond to a scaled score of one. It is thus possible that some clients who gain low raw scores have ability levels more than 3 SDs below the norm and so have their ability overestimated by the allocation of a scaled score of one. This will obviously be a problem with IQs in the 40s where scaled scores of one are inevitable. What is not clear is whether it would affect higher IQs.
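The SD figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the usual IQ metric of mean 100 and SD 15 (implied by the 3.67 and 4 SD figures but not stated explicitly) and the scaled score metric of mean 10 and SD 3 given in the text; the function name is illustrative only.

```python
# Convert IQs and subtest scaled scores to SDs from the mean.
# Assumed metrics: IQ mean 100, SD 15; scaled score mean 10, SD 3.

def sds_from_mean(score, mean, sd):
    """Return how many SDs `score` lies from `mean` (negative = below)."""
    return (score - mean) / sd

wais_floor = sds_from_mean(45, 100, 15)   # WAIS-III floor IQ
wisc_floor = sds_from_mean(40, 100, 15)   # WISC-III floor IQ
scaled_one = sds_from_mean(1, 10, 3)      # minimum scaled score

print(round(wais_floor, 2), round(wisc_floor, 2), round(scaled_one, 2))
# -> -3.67 -4.0 -3.0
```

On these assumptions the lowest reportable IQs lie further below the mean than a single scaled score of one does.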

Journal of Applied Research in Intellectual Disabilities 2008, 21, 136–141

© 2007 The Authors. Journal compilation © 2007 Blackwell Publishing Ltd 10.1111/j.1468-3148.2007.00378.x

The degree to which this floor effect should be considered a concern for higher IQs in the 50s, 60s and 70s can be indicated by two measures. First, the absolute number of scaled scores of one obtained at given IQ levels will indicate how low the assessment will measure before this hidden floor effect may become a problem. Secondly, the distribution of scaled scores will give an indication as to whether this hidden floor is genuine. An individual’s score on a subtest should be a function of a number of factors, the main one being their true intellectual ability; others are their specific skills in the subtest, and situational factors such as the client’s state when assessed, level of distraction and variation in how the assessment was given. The combination of these factors should result in variation in scaled scores on different subtests. If the scaled scores of a number of subjects of similar ability were combined one would expect that the distribution would be approximately normal with the majority of scaled scores being at or near the mean value and few very low or high scaled scores. One would therefore expect few scaled scores of one. However, if the assessment was subject to a floor effect and a number of clients with intellectual ability levels more than 3 SDs below the mean were allocated a scaled score of one, there would be more scaled scores of one than would be predicted by a normal distribution.
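The expected pile-up at a scaled score of one can be illustrated with a simple model. This is a sketch under stated assumptions, not the analysis used in this study: latent scaled scores for a group of similar ability are taken to be normally distributed, reported scores run from 1 to 19, and any latent score below one is reported as one. The group means (4 and 1) and the SD of 2 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

def observed_shares(true_mean, sd, floor=1, top=19):
    """Share of reported scaled scores at each integer value when latent
    scores are Normal(true_mean, sd) and anything below `floor` is
    reported as `floor` (anything above `top` as `top`)."""
    latent = NormalDist(true_mean, sd)
    shares = {}
    for score in range(floor, top + 1):
        # Cumulative probabilities at the edges of the bin around `score`;
        # the end bins absorb the whole tail beyond them.
        lower = 0.0 if score == floor else latent.cdf(score - 0.5)
        upper = 1.0 if score == top else latent.cdf(score + 0.5)
        shares[score] = upper - lower
    return shares

# Hypothetical groups: latent mean well above the floor, and at the floor.
well_above = observed_shares(true_mean=4, sd=2)
at_floor = observed_shares(true_mean=1, sd=2)

for label, shares in (("mean 4", well_above), ("mean 1", at_floor)):
    mode = max(shares, key=shares.get)
    print(f"{label}: share at 1 = {shares[1]:.2f}, most common score = {mode}")
```

In this model a group whose latent mean sits well above the floor keeps its mode near that mean. For a group whose latent mean is at or below the floor, one becomes the most common reported score, which is the pattern the text describes as a truncated distribution.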

The literature on the WISC-III and the WAIS-III does not provide any information on the distribution of scaled scores or the number of scaled scores of one that can be expected at low IQ levels. It is the aim of this study to examine this in order to assess if there is evidence of a hidden floor effect.

Method

The files of the learning disability psychology services in the locality were searched to find WISC-III and WAIS-III assessments. In all, 49 WAIS-III assessments and 50 WISC-III assessments were identified. The assessments were conducted either by a clinical psychologist or by an assistant psychologist who had been trained to administer the assessment. All assessors administered both the WISC-III and WAIS-III. In all cases only the subtests needed to calculate a Full Scale (FS) IQ were completed. The average age of those assessed on the WAIS-III was 30 years 4 months (range 16–58 years, SD 12.02 years) and on the WISC-III 11 years 9 months (range 8–16 years, SD 2.67 years). All assessments were performed for either clinical or diagnostic reasons.

Results

Table 1 shows the mean IQs for both the WISC-III and WAIS-III for the three FS IQ bands: 50–59, 60–69 and 70 plus. The IQ range 40–49 was not used in the quantitative analysis as, although 13 clients given the WISC-III had FS IQ of less than 50, only one client given the WAIS-III scored below 50. There is no significant difference in mean IQs (using t-tests) between the WISC-III and WAIS-III in any of these bands. Table 2 shows the number and percentage of scaled scores of one for both the WISC-III and WAIS-III for each of the IQ bands. In all IQ bands the percentage of scaled scores of one was greater on the WISC-III than that on the WAIS-III. The significance of this difference was tested using Wilcoxon’s rank-sum test (cf. Howell 1992), for each IQ band. For IQs in the 50s and 70s this just failed to reach statistical significance (P < 0.10); however, for the 60s IQ band it was highly significant (P < 0.002).

Table 1 The mean and standard deviation (SD) of Verbal IQ (VIQ), Performance IQ (PIQ) and Full Scale IQ (FS IQ) for both the WISC-III and WAIS-III for three Full Scale IQ bands: 50–59, 60–69 and 70 and above.

Level of IQ    WISC-III (SD)    WAIS-III (SD)    Diff IQ
50s            n = 13           n = 8
  V IQ (SD)    57.85 (5.44)     59.63 (4.27)      1.78 NS
  P IQ (SD)    59.15 (5.97)     58.50 (3.46)     −0.65 NS
  FS IQ (SD)   55.38 (3.12)     55.50 (3.38)      0.12 NS
60s            n = 19           n = 25
  V IQ (SD)    63.53 (6.10)     66.04 (3.35)      2.51 NS
  P IQ (SD)    69.21 (7.38)     67.84 (5.31)     −1.37 NS
  FS IQ (SD)   63.84 (2.63)     64.08 (2.27)      0.24 NS
70s            n = 5            n = 15
  V IQ (SD)    72.80 (7.95)     75.00 (6.90)      2.20 NS
  P IQ (SD)    81.40 (8.91)     76.13 (7.55)     −5.27 NS
  FS IQ (SD)   74.00 (3.87)     73.33 (3.04)     −0.67 NS

The difference between means is shown in the Diff IQ column. The statistical significance of the differences in mean IQ is calculated using t-tests. NS, non-significant.

Table 2 Percent (number) of scaled scores of one for the IQ bands 50s, 60s and 70s

Level of IQ    WISC-III         WAIS-III         Diff
50s            n = 13           n = 8
               31.5% (41)       17.0% (15)       14.5 NS
60s            n = 19           n = 25
               15.3% (29)       2.6% (7)         12.7*
70s            n = 5            n = 15
               10.0% (5)        0.0% (0)         10.0 NS

The statistical significance of the difference in the frequency of scaled scores of one is compared using Wilcoxon’s rank-sum test. *P < 0.002. NS, non-significant.
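The percentages in Table 2 can be approximately recovered from the reported counts. The sketch below assumes that each Full Scale IQ is based on 10 subtests on the WISC-III and 11 on the WAIS-III; these subtest counts are not stated in the text and are an assumption here. The variable and function names are illustrative.

```python
# Counts of scaled scores of one and numbers of clients, from Table 2.
# ASSUMPTION (not stated in the text): 10 subtests per Full Scale IQ on
# the WISC-III and 11 on the WAIS-III.
SUBTESTS = {"WISC-III": 10, "WAIS-III": 11}

# band -> test -> (number of scaled scores of one, number of clients)
table2 = {
    "50s": {"WISC-III": (41, 13), "WAIS-III": (15, 8)},
    "60s": {"WISC-III": (29, 19), "WAIS-III": (7, 25)},
    "70s": {"WISC-III": (5, 5), "WAIS-III": (0, 15)},
}

def percent_ones(count, n_clients, n_subtests):
    """Percentage of all scaled scores in a band that equal one."""
    return 100.0 * count / (n_clients * n_subtests)

percentages = {
    band: {test: percent_ones(count, n, SUBTESTS[test])
           for test, (count, n) in row.items()}
    for band, row in table2.items()
}

for band, row in percentages.items():
    print(band, {test: round(pct, 1) for test, pct in row.items()})
```

Under these assumed subtest counts the computed values agree with the percentages in Table 2 to within 0.1 of a percentage point.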

Figure 1 shows the distribution of scaled scores for both WISC-III and WAIS-III in the three IQ bands, together with the distribution of scaled scores for the 13 clients with an FS IQ less than 50 on the WISC-III. For IQs in the 40s, 50s and 60s on the WISC-III there are more scaled scores of one than any other scaled score. For IQs in the 70s, although there are more scaled scores of five than one, 18 as opposed to 15, the distribution is bimodal. In all cases the impression is of a truncated normal distribution in which the area under the hidden left side of the curve has been added to the left part of the visible curve. The distributions are consistent with a floor effect in which the non-allocated scaled scores below one are added to the tally of scaled score one. With WAIS-III, although the distribution of scaled scores of IQs in the 50s appears to be somewhat truncated, the distribution in all three IQ bands appears to be approximately normal with the mode in the midpoint of the distribution.

[Figure not reproduced. Panels plot percentage of scaled scores (y-axis) against scaled score (x-axis), e.g. "Percentage of scale scores on WISC-III for IQs in 40s" and "Percentage of scale scores on WAIS-III for IQs in 50s".]

Figure 1 Distribution of scaled scores for WISC-III and WAIS-III.



Discussion

The aim of the study was to look at the distribution of scaled scores and the relative number of scaled scores of one in order to see if there were indications of a hidden floor effect. It was found that the WISC-III, and to a lesser extent the WAIS-III, had a relatively high number of scaled scores of one for IQs less than 60. There is therefore a possibility that IQ scores in this low range may be artificially increased by this floor effect. For IQs in the 60s and 70s the WISC-III also showed far more scaled scores of one than would have been expected, suggesting that at these relatively high IQ levels there may be a floor effect.

The reason for the difference in the number of scaled scores of one between the WISC-III and the WAIS-III is not clear. It is not likely to be due to a difference in the intellectual abilities of the clients given the different assessments, as there was no significant difference between WISC-III and WAIS-III mean IQs in the IQ bands. An alternative explanation is that the WISC-III could have harder criteria for gaining a scaled score of two than the WAIS-III. Both Flynn (1985) and Spitz (1986, 1989) have pointed to differences between the WISC-R and the WAIS-R for IQs of 70 and below. Flynn (1985) suggests that the WAIS-R will score as much as 13 points higher than the WISC-R in this range. Similarly, Spitz (1986) reports data showing that the WAIS-R scores 10 to 19 points higher than the WISC-R for IQs of 70 and below. In a more detailed analysis of the literature, Spitz (1989) compared the WISC-R and the WAIS-R and found that the WISC-R scores 15 points lower at IQ 60 (on the WAIS-R). We are not aware of any empirical comparison of the WAIS-III and WISC-III. However, a comparison between the criteria specified in the manuals for 16-year olds (the age range covered by both assessments) to gain a scaled score of two suggests that the WISC-III criterion is harder.

Table 3 indicates the requirements for a 16-year old to obtain a scaled score of two on the common subtests needed for an FS IQ on both the WISC-III and WAIS-III. It shows the minimum raw score needed and gives a description of what the client has to do to obtain that raw score, usually by giving the final item that the client would need to pass to get that raw score, assuming they got full marks on all previous items. It is clear that the raw scores required on the WISC-III are considerably greater than those on the WAIS-III. This in itself is not surprising as the WISC-III is designed to test children as young as 6 years old and so will need to have items that 6-year olds with low intellectual ability will be able to pass. However, further examination of criteria for a scaled score of two suggests that the WISC-III is harder. On Vocabulary, Similarities, Information and Comprehension, the WAIS-III requires an understanding of common concrete concepts that people would use in their day-to-day lives, for example the days of the week, money and animals that are at least commonly seen on television. On the WISC-III the concepts are more abstract, for example ‘brave’ and ‘rule’, or require an understanding of function, i.e. that an elbow and knee are both joints and not just parts of the body. On the WAIS-III Arithmetic subtest the client is required to take one from three, needing an understanding of number to three, while on the WISC-III they have to add eight and six, requiring an ability to deal with numbers above 10.

Table 3 Raw score and items or final item required to be passed by a 16-year old to obtain a scaled score of two on the WISC-III and WAIS-III

Subtest               Raw score   Item or final item required
WISC-III
Vocabulary            22          What does brave mean?¹
Similarities          11          In what way are an elbow and knee alike?¹
Arithmetic            13          Jim had 8 crayons and he bought 6 more. How many crayons did he have altogether?¹
Information           11          How many things make a dozen?¹
Comprehension         17          Tell me some reasons why games have rules¹
Picture completion    16          A plug hole missing from a bath¹
Coding                39
Block design          29          Completion of one 2-block model and six 4-block models gaining full bonus points for time on three of the models
Picture arrangement   13          Arranging four sets of pictures correctly gaining all bonus points for time on item 3 and all but one on item 4
WAIS-III
Vocabulary            4           Tell me what ship means¹
Similarities          4           In what way are a dog and a lion alike?¹
Arithmetic            4           If you have 3 books and give one away, how many do you have left?¹
Information           1           What is the day that comes after Saturday?
Comprehension         1           What do people use money for?
Picture completion    4           A nose missing from a face¹
Digit symbol          14
Block design          3           Completion of two 2-block models, being given a second trial on one model if an error occurred on the first trial
Picture arrangement   1           Arranging one set of pictures that the client had previously seen the examiner demonstrate and being given a second trial due to an error on the first trial

¹The item that would gain a scaled score of two assuming all previous items gained full points and all subsequent items are failed.

On Block Design, Picture Completion and Picture Arrangement the client has to complete more items, the final ones of which are more complex, on the WISC-III than on the WAIS-III. However, the clearest indication that the WISC-III is harder than the WAIS-III for scaled score two comes from the Coding and Digit Symbol subtests, which are virtually the same test on both assessments. On the WAIS-III the 16-year old is required to fill in 14 symbols and on the WISC-III he/she is required to complete 39, a score that on the WAIS-III would get them a scaled score of 5. All of this suggests that it is harder for a 16-year old to obtain a scaled score of two on the WISC-III than it is on the WAIS-III.

There is clearly a possibility that one reason why the WISC-III gives a greater number of scaled scores of one is that it is much harder to gain a scaled score of two on it than on the WAIS-III. This is somewhat paradoxical, as it would also be expected that, given the high number of scaled scores of one (37%) on the WISC-III, it would be subject to a floor effect that would artificially increase IQ scores.

Probably one of the basic problems is that neither the WISC-III nor WAIS-III was standardized on samples that contained a sufficient number of people with low IQs to find the correct relationship between raw score and scaled scores below four. Although both assessments were standardized using relatively large stratified samples of the US population (2450 for the WAIS-III, 2200 for the WISC-III), the samples were then divided into subsamples of 200 subjects in specified age ranges. Effectively the standardization was done separately for each age range. This meant that there were very few subjects with low IQs, only five people below two SDs (IQ 70 or scaled score four) and none below 2.8 SDs (IQ 58 or scaled score two). There would therefore be too few subjects in the sample to reliably find the appropriate raw scores corresponding to scaled scores of three and two, or sum of scaled scores corresponding to IQs below about 65. It is possible that scaled scores of one on the WISC-III are subject to both a floor effect increasing the overall IQ score if a client has a relatively low raw score, and a suppression effect due to the harder criteria for scaled score two if the client has a relatively high raw score. The problem is that we do not know at what raw score these effects may occur.
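The small numbers of low-scoring subjects in each standardization subsample follow from the tail areas of the normal curve. The following sketch assumes that scores within each subsample of 200 are normally distributed; the function name is illustrative.

```python
from statistics import NormalDist

SUBSAMPLE_SIZE = 200        # subjects per age range, as stated in the text
std_normal = NormalDist()   # standard normal: mean 0, SD 1

def expected_below(sds_below_mean, n=SUBSAMPLE_SIZE):
    """Expected number of subjects scoring more than `sds_below_mean`
    SDs below the mean in a normally distributed sample of size n."""
    return n * std_normal.cdf(-sds_below_mean)

below_2 = expected_below(2.0)     # IQ 70 or scaled score four
below_2_8 = expected_below(2.8)   # IQ 58 or scaled score two

print(f"below 2 SDs: {below_2:.2f}; below 2.8 SDs: {below_2_8:.2f}")
```

On this assumption roughly four or five subjects per subsample would be expected below two SDs and fewer than one below 2.8 SDs, in line with the figures given above.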

Conclusions

The aim of the study was to look for indications of a hidden floor effect that could artificially increase a client’s IQ. This was found to be the case, suggesting that IQs, particularly measured by the WISC-III, may be overestimates. However, evidence was also found that it is more difficult to gain a scaled score of two on the WISC-III than it is on the WAIS-III, suggesting that the IQs given by the two assessments are not equivalent. These two findings will be considered separately.

The floor effect

Logically, a floor effect due to scaled scores of one will occur for IQs in the 40s where scaled scores of one would inevitably occur. It is therefore likely that many assessed IQs in the 40s are overestimates of the client’s true ability. The degree to which a client’s true ability is overestimated is not clear as, although a scaled score of one may correspond to a true scaled score of one or zero, at this level of ability, it may also correspond to a true scaled score of less than zero. Therefore the only accurate way of reporting a measured IQ in the 40s, in which there are several scaled scores of one, is to state that the true IQ is equal to or less than that calculated on the basis of the scaled scores of one being one.

At higher IQ levels there does not seem to be a major concern with the WAIS-III even for IQs in the 50s where only 17% of scaled scores were one. However, with the WISC-III there are potentially more serious problems. In the 50–59 IQ range 31% of scaled scores were one, with one client having six scaled scores of one. In the 60–69 IQ range, 15% of scaled scores were one and one client had five scaled scores of one. In the 70–79 IQ range 10% of scaled scores were one, with one client having three scaled scores of one. The degree to which this would affect the scores of clients in these IQ ranges can be estimated if it is assumed that a scaled score of one will correspond to a true scaled score of not less than zero, which at this level may be a reasonable assumption. Therefore, the true IQ will be between the IQ corresponding to the scaled scores of one being counted as one and the IQ in which all the scaled scores of one are counted as zero. For example, an IQ based on three scaled scores of one would fall between that calculated if these scaled scores were considered to be one and that calculated if they were considered to be zero. For IQs in the 70s, three scaled scores of one could therefore mean a true IQ up to two points lower than that measured. These two points in turn could determine whether the client is given a diagnosis of learning disabilities and a service or not, and so cannot be regarded as unimportant.
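The bracketing described above can be sketched at the level of the sum of scaled scores. This is a minimal illustration only: the subtest profile is hypothetical, and converting a sum of scaled scores to an IQ requires the conversion tables in the test manuals, which are not reproduced here.

```python
def sum_bounds(scaled_scores, floor=1, assumed_true_minimum=0):
    """Return (lowest, highest) plausible sum of scaled scores, assuming
    each score at the floor has a true value somewhere between
    `assumed_true_minimum` and `floor`."""
    observed = sum(scaled_scores)
    n_at_floor = sum(1 for s in scaled_scores if s == floor)
    lowest = observed - n_at_floor * (floor - assumed_true_minimum)
    return lowest, observed

# Hypothetical 10-subtest profile containing three scaled scores of one.
profile = [1, 1, 1, 6, 7, 5, 6, 8, 7, 6]
low, high = sum_bounds(profile)
print(low, high)  # -> 45 48
```

Looking up both sums in the manual's conversion table would give the range within which the true IQ is assumed to lie.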

Criteria for gaining a scaled score of two

A further finding was the difference in the shape of the distribution of scaled scores between the WISC-III and WAIS-III. The WAIS-III had an apparently normal distribution of scaled scores though with a few more scaled scores of one for IQs in the 50s. On the other hand, the WISC-III had highly skewed distributions with more scaled scores of one than any other in all but IQs in the 70s. It is not clear why the differences in the distributions of scaled scores between the WISC-III and WAIS-III occur, though it is suggested that the WISC-III has more difficult criteria for gaining a scaled score of two. This in turn would result in the WISC-III giving lower IQs than the WAIS-III. A similar situation has been reported by Spitz (1989) with the WISC-R and WAIS-R, with the WAIS-R being reported to score 15 points higher than the WISC-R.

If this were the case for the WISC-III and WAIS-III, then an individual found to have an IQ of 62 on the WISC-III as a child, meeting the criterion for having a learning disability, could be reassessed on the WAIS-III some years later, found to have an IQ of 77 and be considered no longer eligible for learning disability services. If there were clear empirical evidence that the WISC-III scored lower than the WAIS-III, and by how much, then this could be taken into account when drawing up criteria for services. The problem is that this evidence does not exist, so all that is known is that the WISC-III may score substantially lower than the WAIS-III.

The overall conclusion is that there may be a lot more error in the assessment of IQs below 80 than has previously been acknowledged, not only because of the floor effect but also because of problems in the standardization of the assessments leading to disparities between commonly used assessments. With our current state of knowledge, although it may be possible to give some indication as to the degree of error caused by the floor effect, we do not know how much the other factors may affect IQ scores. It may therefore not be appropriate to state IQ scores in reports without indicating that they may be subject to a considerably greater degree of error than that suggested in the manual.

There is a need for further research. This current study was based on data that were available in records and is clearly subject to possible error in how consistently the assessments were administered, the motivation and state of the client being assessed and the conditions under which the assessment took place. In addition, there is no way of telling how similar the adults given the WAIS-III were in intellectual ability to the children given the WISC-III. A much better study, which could control for the level of intellectual ability and other errors in testing, would be for a large group of 16-year olds to be tested on both the WISC-III and WAIS-III in a counterbalanced order. A further question that needs to be addressed is the degree to which the WISC-IV is subject to the possible floor effects and the degree to which it will produce scores equivalent to the WAIS-III at low IQ levels.

Correspondence

Any correspondence should be directed to Simon Whitaker, The Learning Disability Research Unit, Room HW 2/08, University of Huddersfield, Huddersfield HD1 2DH, UK (E-mail: [email protected])

References

Flynn J.R. (1985) Wechsler intelligence tests: do we really have a criterion of mental retardation? American Journal of Mental Deficiency 90, 236–244.

Howell D.C. (1992) Statistical Methods for Psychologists, 3rd edn. Duxbury Press, Belmont.

Spitz H.H. (1986) Disparity in mentally retarded persons’ IQs derived from different intelligence tests. American Journal of Mental Deficiency 90, 588–591.

Spitz H.H. (1989) Variations in the Wechsler interscale IQ disparities at different levels of IQ. Intelligence 13, 157–167.

Whitaker S. (2005) The use of the WISC-III and the WAIS-III with people with a learning disability: three concerns. Clinical Psychology 50, 37–40.


