Journal of Forecasting, Vol. 1, 257-269 (1982)

How Probable is Probable? A Numerical Translation of Verbal Probability Expressions

RUTH BEYTH-MAROM Decision Research, A Branch of Perceptronics, Eugene, Oregon

ABSTRACT

The reported experiment took place in a professional forecasting organization accustomed to giving verbal probability assessments (‘likely’, ‘probable’, etc.). It attempts to highlight the communication problems caused by verbal probability expressions and to offer possible solutions that are compatible with the forecasters’ overall perspective on their jobs.

Experts in the organization were first asked to give a numerical translation to 30 different verbal probability expressions, most of which were taken from the organization’s own published political forecasts. In a second part of the experiment the experts were given 15 paragraphs selected from the organization’s political publications, each of which contained at least one verbal expression of probability. Subjects were again asked to give a numerical translation to each verbal probability expression.

The results indicate that (a) there is a high variability in the interpretation of verbal probability expressions and (b) the variability is even higher in context. Possible reasons for the context effect are discussed and practical implications are suggested.

KEY WORDS Verbal probability Probability estimation Subjective probability Forecasting

Forecasting is essential for decisions that involve possible future events: how to invest one’s money depends on forecasting future market behaviour; when to launch a space shuttle depends on the weather forecast; whether to invade Poland depends on anticipated NATO reactions and Polish resistance.

Sometimes the forecaster and the decision maker are the same person, as when a physician makes a diagnosis and prescribes treatment. Frequently, however, there is a division of labour; one person or organization forecasts (e.g. an intelligence unit in an army), while another makes the decisions (e.g. an operational unit). A necessary condition for good decision processes is good communication between these two persons or organizations. To ensure this, all forecasts should unambiguously specify the event (‘The Polish army will resist a Russian invasion’) and the probability of its occurrence (‘It is unlikely that ...’). However, the evidence suggests that communication problems may be quite common. Lichtenstein and Newman (1967) showed that the interpretation of everyday probability expressions is highly ambiguous. When subjects were asked to assign numerical values (between 0 and 100) to 41 different expressions (e.g. highly probable, seldom, etc.), the range of responses for each word was very large. For example, ‘probable’ was given numbers between 1 and 99; the range of ‘seldom’ was 1 to 47. This ambiguity associated with verbal expressions of uncertainty has led some agencies, such as the National Weather Service, to express forecasts numerically (Murphy and Winkler, 1974).

0277-6693/82/030257-13$01.30. Received March 1982. © 1982 by John Wiley & Sons, Ltd.

None the less, verbal probability expressions are still very common, both because people are unaware of this ambiguity and because they resist numerical probabilities, often for reasons that are ungrounded. One likely reason for resistance is to avoid having the quality of their forecast judged (‘I said it is only possible.’). Proper scoring rules (Winkler and Murphy, 1968) can be used to evaluate a single numerical probability assessment. The result of such an evaluation has only a small reflection upon the forecaster’s general ability. Proper evaluation of a forecaster requires the consideration of a set of probability assessments, examining whether predictions assigned higher probabilities are more often correct than those assigned lower probabilities (Lichtenstein, Fischhoff and Phillips, 1982).
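As an illustration of the point about evaluating a set of assessments, the sketch below scores a small batch of numerical forecasts. It assumes the quadratic (Brier) score as one example of a proper scoring rule; the paper does not say which rule is meant, and the forecast data and function names are hypothetical.

```python
from collections import defaultdict

def brier_score(forecasts):
    """Mean quadratic (Brier) score over (probability, outcome) pairs; lower is better."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

def calibration_table(forecasts):
    """For each stated probability, the fraction of events that actually occurred."""
    groups = defaultdict(list)
    for p, outcome in forecasts:
        groups[p].append(outcome)
    return {p: sum(hits) / len(hits) for p, hits in sorted(groups.items())}

# Hypothetical data: (stated probability, 1 if the event occurred, else 0).
forecasts = [
    (0.8, 1), (0.8, 1), (0.8, 0), (0.8, 1),
    (0.2, 0), (0.2, 0), (0.2, 1), (0.2, 0),
]

score = brier_score(forecasts)
table = calibration_table(forecasts)
print(score)
print(table)
```

A single forecast contributes only one term to the score, which is why one outcome says little about a forecaster; the calibration table becomes informative only as the number of assessments in each group grows.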

Numerical probabilities may also be resisted because people believe that the precise value chosen must be defended by an explicit derivation (e.g. ‘I have 4 reasons for and 1 reason against, therefore, the probability is 0.8’). Probabilities are, however, a subjective degree of belief; the same set of reasons causing one person to reach an evaluation of 0.2 may cause another to conclude that the probability is 0.5.

A third misconception is the belief that the opposite of ambiguity is ‘being precise’; believing that numbers like 25, 50, and even a specific number like 67 should replace verbal expressions of probability. As people find it difficult to specify a precise number, they prefer a verbal ‘substitute’. Even if people find it easier to express a range of numbers (e.g. 10-30), they may not choose to express their degree of belief in that way if they believe that a range of probability values is as ambiguous as a verbal expression. Yet, a verbal expression is ambiguous because of the variability associated with its interpretation; there is no variability in the interpretation of a numerical range. The use of numerical ranges to express degree of confidence is no less legitimate than the use of precise numbers.

The remainder of this paper reports an experiment that took place in a professional forecasting organization in Israel accustomed to giving verbal probability assessments.¹ It attempted first to highlight the problems caused by verbal probability expressions and then to offer possible solutions that are compatible with the forecasters’ overall perspective on their jobs.

In contrast to Lichtenstein and Newman’s subjects, who were not forecasters, the present experiment was done with experts in political forecasting. Furthermore, each verbal probability expression was judged by the expert not only in isolation (‘What is the numerical translation of “probable”?’) but also in context. The experts had to substitute numbers for verbal probability expressions that were embedded in paragraphs taken from reports published by the studied organization.

METHOD

Subjects

The subjects were all experts in political forecasting. Most had at least three years of higher education in political science or international relations. Twenty-seven subjects took part in the ‘in-isolation’ section. Twenty-five of those participated in the second, ‘in-context’ section, in addition to seven other subjects, making a total of 32 in the second part of the experiment.

¹ Confidentiality of the participants and of the organization with which they were affiliated was guaranteed; describing them in more detail than is given here would jeopardize this confidentiality.

Design

The experiment had two parts. In the first (isolation) part, thirty verbal probability expressions, in Hebrew, were collected (the translated expressions are in the first column of Exhibit 2). Most were taken from the organization’s own published political forecasts. The remainder (7) were extreme expressions, such as ‘not likely’ or ‘nearly certain’, that were not found in any of the many written publications reviewed by the experimenter. In addition, a list of 50 clearly defined political events was prepared, all of which were possible during 1980 (the year of the experiment). Each of the 30 uncertainty expressions and 50 events was typed on a separate card.

For the second (in context) part, 15 paragraphs were selected from the organization’s political publications, each of which contained at least one verbal expression of probability. In all, they included 14 different expressions, some mentioned more than once.

Procedure

In-isolation part

In this part, three questions were tested: (a) how ambiguous are verbal probability expressions? (b) are subjects’ probability assessments specific enough to ascribe each event to one of 7 probability categories? and (c) which verbal expressions do subjects prefer as descriptions of the 7 probability categories?

Subjects were seen individually and were seated across from the experimenter. Each subject performed the following 5 tasks:

1. Numerical translation: The experimenter told the subject that: ‘On each of the 30 cards, there is written a verbal expression of uncertainty. Each verbal expression can be translated into a number between 0 and 100 where 0 expresses complete confidence that the event will not happen; 100 expresses complete confidence that the event will happen, and any number between 0 and 100 expresses different levels of likelihood. For each verbal expression, indicate the number that best represents it.’ The 30 cards were given to all subjects in the same randomized order.

2. Event classification: Each subject was asked to classify the 50 events according to the probability of their occurrence before the end of 1980 (about a year from the date of the experiment). This classification was done in 3 separate steps: (a) ‘Classify the events (cards) into 4 categories: “I don’t know what the probability is (don’t know)”, “the probability that the event will occur is identical to the probability that the event will not occur (50-50)”, “it is more probable that the event will occur than that it will not occur (more than 50)”, and “it is more probable that the event will not occur than that it will occur (less than 50)”.’ (b) ‘Distribute all the events in the “more than 50” category into three subcategories from the least probable (I) to the most probable (III). You may leave some of the events in the original “more than 50” pile.’ (c) ‘Distribute all the events in the “less than 50” category into three subcategories from the most probable (V) to the least probable (VII). Again, you may leave some of the events in the original “less than 50” pile.’

At the end of this procedure, each subject had 7 piles of events labelled I, II, ..., VII, where the 50-50 pile was the fourth (IV). Most subjects distributed all events between the 7 piles, leaving none in the ‘I don’t know’, ‘more than 50’, and ‘less than 50’ piles.

3. Probability-expression classification: Subjects were asked to order the 30 probability expressions from least probable to most probable. In contrast to the event classification task (where subjects were forced to distribute the events into 7 categories) the present task left the number of probability categories open.

4. Matching: Subjects were asked to match each probability category to one of the 7 event categories (see example in Exhibit 1). In doing so, subjects expressed their subjective 7-point scale of probability expressions, i.e. what they thought were suitable verbal expressions for each one of the 7 ordered-event categories.

[Exhibit 1. A matching example: ranked piles of events (I to VII) matched to ranked piles of probability expressions.]

5. Second numerical translation: Subjects gave each probability-expression category a number between 0 and 100 representing their interpretation of the likelihood conveyed by all expressions in that category.

In-context part

The 15 selected paragraphs were given to subjects in a take-home questionnaire that they returned to the experimenter after completion. This was given to them at least a week after their participation in the first, in-isolation, part. It is reasonable to assume that after a week subjects could not remember all their numerical translations of the 30 verbal probability expressions. The instructions read as follows:

The questionnaire presents a number of paragraphs taken from papers published by [the organization with which you are affiliated]. Each paragraph contains a number of words expressing the probability that an event will occur (e.g., probable, there is a chance, etc). Those expressions are underlined and followed by empty parentheses. Please match every underlined verbal probability expression with a number and write it in the empty parentheses. Your number should be between 0 and 100 such that 0 indicates ‘complete confidence that the event will not happen’, 100 expresses ‘complete confidence that the event will happen’ and any number between 0 and 100 expresses a different level of chance. The number expresses the way you interpret what is written in the paragraph and not your opinions concerning the specific topic. You have to try to think what the author of the paragraph had in mind when it was written.

RESULTS

In-isolation part

Within-subject consistency

As subjects gave numbers to the verbal expressions twice (tasks 1 and 5), it was possible to check their consistency. For each subject, for each expression, the difference between the two numbers was calculated, representing the difference between the number given to an expression before seeing and ordering all expressions and after doing so. All 810 differences were smaller than 15 (on a scale from 0 to 100). Thus, subjects were highly consistent in their numerical translation, at least during the experimental session.
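The consistency check described above can be sketched as follows. The responses are hypothetical and the function name is an assumption; only the threshold of 15 and the count of 27 subjects by 30 expressions come from the text.

```python
def translation_differences(first, second):
    """Absolute difference between two numerical translations of each expression."""
    return {expr: abs(first[expr] - second[expr]) for expr in first}

# Hypothetical responses of one subject on the 0-100 scale:
# task 1 (before ordering the expressions) and task 5 (after ordering them).
task1 = {"doubtful": 25, "possible": 55, "likely": 65}
task5 = {"doubtful": 20, "possible": 50, "likely": 70}

diffs = translation_differences(task1, task5)
consistent = all(d < 15 for d in diffs.values())

# 27 subjects x 30 expressions gives the 810 differences reported in the text.
n_differences = 27 * 30

print(diffs, consistent, n_differences)
```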

Between-subject consistency

The numbers which were given after classifying the 30 expressions were used for the between-subject consistency test. For each expression, two measures of dispersion were calculated: the interquartile range (C25-C75) and the 80 per cent range (C10-C90). The full range was not chosen because of its sensitivity to extreme and unrepresentative responses. The interquartile range treats this problem by discarding 50 per cent of the responses as unrepresentative; the 80 per cent range is less extreme, discarding only 20 per cent of the sample. These measures (rounded to integers) appear in columns 3-6 of Exhibits 2 and 3. The interquartile range highlights the most common interpretation of the 30 expressions and their classification, whereas the 80 per cent range highlights the variability of their interpretation.
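A minimal sketch of the two dispersion measures is given below. The paper does not state how the centiles were computed, so linear interpolation between order statistics is an assumption here, and the responses are hypothetical.

```python
def percentile(values, q):
    """q-th centile (0-100) by linear interpolation between order statistics.

    The interpolation rule is an assumption; the paper does not specify one.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def dispersion(values):
    """Limits and width of the interquartile range and of the 80 per cent range."""
    c10, c25, c75, c90 = (percentile(values, q) for q in (10, 25, 75, 90))
    return {
        "interquartile": (c25, c75, c75 - c25),
        "eighty_per_cent": (c10, c90, c90 - c10),
    }

# Hypothetical translations of one expression by 11 subjects.
responses = [30, 40, 45, 50, 50, 55, 60, 60, 65, 70, 95]

result = dispersion(responses)
full_range = max(responses) - min(responses)
print(result, full_range)
```

In this made-up sample the single response of 95 stretches the full range to 65, while the 80 per cent range stays at 30, which is the sensitivity to extreme responses that the text gives as the reason for not using the full range.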

Grouping

Column 7 of Exhibit 2 shows the median rank attached to each probability expression in the matching task. Most of these medians were also modes. From Exhibit 4 (which was produced from the results shown in Exhibit 3(b)), one can see that most of the verbal expressions fall into six more or less distinct groups: four extreme ones, assigned ranks I, II, VI, and VII, and two middle groups, one sharing ranks III and IV, and one to which rank V was assigned. However, there are some intermediate expressions which have a very large range and/or partly overlap with two adjacent classes. These can be seen in the lower part of Exhibit 4.

Exhibit 2. Numerical translation of verbal probability expressions in isolation: range of expression

                                    C25-C75            C10-C90         Median
No.  Verbal expression           Limits   Range     Limits   Range     rank
 1.  Not likely                   5-15     10        2-18     16       I
 2.  Very low chance             10-18      8        4-23     19       I
 3.  Poor chance                 11-25     14        4-33     29       I
 4.  Doubtful                    16-33     17       11-39     28       II
 5.  Low chance                  22-34     12       15-38     23       II
 6.  Small chance                22-36     14       17-42     25       II
 7.  Can’t rule out entirely     24-49     25       12-58     46       III
 8.  Chances are not great       28-41     13       22-52     30       II-III
 9.  Not inevitable              35-56     21       26-59     33       IV
10.  Perhaps                     36-53     17       28-58     30       III
11.  One must consider           37-59     22       27-64     34       IV
12.  There is a chance           37-60     23       28-67     39       IV
13.  May                         41-58     17       32-65     33       III-IV
14.  It could be                 42-57     15       34-63     29       V
15.  Possible                    51-58      7       42-61     19       IV
16.  One can expect              51-63     12       42-69     27       V
17.  Reasonable to assume        52-69     17       43-81     38       V
18.  Likely                      53-69     16       42-81     39       V
19.  It seems                    53-65     12       50-69     19       V
20.  Non-negligible chance       53-67     14       36-77     41       V
21.  It seems to me              54-67     13       50-73     23       V
22.  One should assume           54-68     14       48-75     27       V
23.  Reasonable chance           54-69     15       49-81     32       V
24.  Meaningful chance           63-80     17       58-86     28       V
25.  High chance                 75-87     12       71-91     20       VI
26.  Close to certain            75-92     17       58-97     39       VI-VII
27.  Most likely                 78-92     14       72-97     25       VI
28.  Nearly certain              83-96     13       76-99     23       VII
29.  Very high chance            87-96      9       83-99     16       VII
30.  Certain                     98-100     2       93-100     7       VII

[Exhibit 3: figure not recoverable in this transcript.]

Exhibit 4. A categorization of the verbal probability expressions

Main classes of expressions

Rank     Range     Expressions
I        10-35     Not likely; Very low chance; Poor chance
II       10-40     Doubtful; Low/small chance
III-IV   25-65     Not inevitable; Perhaps; One must consider; There is a chance; May; It could be
V        40-80     Possible; One can expect; Reasonable to assume; Likely; It seems; Non-negligible chance; It seems to me; One should assume; Reasonable chance
VI       70-90     High chance; Close to certain
VII      70-100    Most likely; Nearly certain; Very high chance; Certain

Intermediate expressions

Rank     Expressions
II-III   Can’t rule out entirely; Chance not great
V-VI     Meaningful chance

Note that the expressions in the two extreme classes are those seldom, if ever, found in any of the material published by the organization in question. In the categories which are used (II-VI), the more extreme the category, the smaller the number of different expressions used to indicate that category. As many as 14 verbal expressions, covering the two middle categories, were detected in the organization’s publications.

In-context part

Within-subject consistency

As the second part of the experiment was planned when the first one was nearly complete, many of the subjects had already been debriefed about the first part of the experiment. As a result, most tried to be consistent, in the sense of giving the same number to a particular verbal probability expression each time it was repeated in different paragraphs. They were quite successful; hence, no results will be reported from this analysis.

Exhibit 5. Numerical translations of verbal probability expressions in isolation and in context: range of expression

                                   In isolation     In context (1)
No.  Verbal expression             Range (limits)   Range (limits)
 4.  Doubtful                      28 (11-39)       66 (17-83)
 6.  Small chance                  25 (17-42)       48 (12-60)
 8.  Chances are not great         30 (22-52)       55 (20-75)
 9.  Not inevitable                33 (26-59)       45 (30-75)
10.  Perhaps                       30 (28-58)       33 (30-66)
11.  One must consider             34 (27-64)       40 (30-70)
13.  May                           33 (32-65)       38 (30-68)
14.  It could be                   29 (34-63)       44 (22-66)
15.  Possible                      19 (42-61)       40 (27-67)
16.  One can expect                27 (42-69)       53 (27-80)
18.  Likely                        39 (42-81)       37 (50-87)
19.  It seems                      19 (50-69)       43 (47-90)
21.  It seems to me                23 (50-73)       27 (60-87)
22.  One should assume             27 (48-75)       42 (30-72)

In-context columns 2-5 (further occurrences of an expression in other paragraphs; the row each value belongs to is not recoverable in this transcript): 38 (10-48), 42 (34-76), 46 (30-76), 48 (40-88), 30 (53-83), 29 (60-89), 52 (35-87), 33 (46-79), 38 (41-79), 49 (37-86), 29 (61-90), 41 (55-96), 28 (54-82).

Between-subject consistency

For each verbal expression appearing in the paragraphs, C10-C90 was calculated and compared to the same measure taken from Part I (see Exhibits 5 and 6). In all but one case (‘likely’), judgements were more variable when an expression appeared in context than when it was judged out of context.
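The comparison can be reproduced from the C10-C90 widths in Exhibit 5. The sketch below is only illustrative: the values are transcribed from the exhibit, and using the first in-context column for each expression is an assumption made here, not something the paper states.

```python
# C10-C90 widths as read from Exhibit 5: (in isolation, in context).
# Only the first in-context column is used (an assumption of this sketch).
ranges = {
    "Doubtful": (28, 66),
    "Small chance": (25, 48),
    "Chances are not great": (30, 55),
    "Not inevitable": (33, 45),
    "Perhaps": (30, 33),
    "One must consider": (34, 40),
    "May": (33, 38),
    "It could be": (29, 44),
    "Possible": (19, 40),
    "One can expect": (27, 53),
    "Likely": (39, 37),
    "It seems": (19, 43),
    "It seems to me": (23, 27),
    "One should assume": (27, 42),
}

# Expressions whose range did not widen in context.
not_wider_in_context = [
    expr for expr, (isolation, context) in ranges.items() if context <= isolation
]

mean_isolation = sum(iso for iso, _ in ranges.values()) / len(ranges)
mean_context = sum(ctx for _, ctx in ranges.values()) / len(ranges)

print(not_wider_in_context)
print(round(mean_isolation, 1), round(mean_context, 1))
```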

DISCUSSION

The results indicate that there is considerable disagreement in the interpretation of most verbal probability expressions as evidenced in the large range of numbers given to each expression. This result replicates Lichtenstein and Newman’s (1967) findings concerning probability expressions and Simpson’s (1944, 1963) results concerning frequency words (e.g. often, seldom). It is, however, difficult to compare the present results with previous ones with regard to specific words since most of the translated words (from Hebrew) are different from the word list used by Lichtenstein and Newman (1967). The difference can be ascribed to the word-selection process. In the work of Lichtenstein and Newman, subjects were asked to elicit verbal probability expressions in a pilot study. From their suggestions, a list of expressions was composed (personal communication by S. Lichtenstein, 1982). In the present experiment, the list primarily contained words found in publications written by people in the organization under study.

[Exhibit 6. A comparison between C10-C90 of verbal probability expressions in isolation and in context (the expression numbers refer to the same numbers in Exhibit 5). Recoverable panels, each on a 0-100 scale showing the in-context and in-isolation ranges: 4. Doubtful; 6. Small chance; 8. Chances are not great; 9. Not inevitable; 10. Perhaps; 11. One must consider; 14. It could be; 15. Possible; 16. One can expect; 18. Likely; 19. It seems; 21. It seems to me; 22. One should assume.]

No doubt, verbal probability expressions are a poor tool to convey one’s confidence in a forecast. A decision maker receiving such a forecast may interpret the event probability very differently from the way the forecaster intended, and may base an important decision on an erroneous interpretation.

One might be tempted to discount such disagreements on the grounds that probability expressions normally are used in a specific context which tends to decrease their range of interpretation. However, the greater disagreement found when the expressions were interpreted in context refutes this claim.

Three explanations can be given for this last result. First, the events in the given paragraphs may have not been defined clearly, causing large differences between subjects in interpreting them. For example, when assigning a number to ‘the outbreak of hostile activities is most likely ( )’, subjects may differ in their interpretation of ‘hostile activities’: an ‘outbreak of war’ is less likely than ‘one rocket shell’, but both are ‘hostile activities’. Thus, the interpretation of ‘most likely’ may fluctuate with the interpretation of ‘hostile activities’. In such cases, an ambiguous event definition would cause disagreement about both the nature of the event itself and about its likelihood, causing severe communication problems.

Second, although subjects were instructed not to think about their own opinions concerning the probability of the specific event in question, subjects’ deep involvement in the task may have encouraged personal evaluations (of the events’ probabilities) instead of pure numerical translations (of the given verbal probability). Opinion differences with regard to the events’ probabilities would then increase the variability of the values assigned.

Finally, although normative models (Raiffa, 1968) suggest that subjective probabilities should be independent of the values assigned to the events, previous research has indicated the opposite; the desirability of an event influences its judged probability (Slovic, 1966). People find it difficult to ignore the value of an event while assessing its probability. Values, like probabilities, are subjective, differing from person to person. If, when translating a verbal expression into numbers, the translation is affected by people’s own opinions with regard to the probability of the event, and if they differ in the values they ascribe to the event, then one would, again, expect greater variability in probability values assigned to verbal probability expressions when assessed in context.

Another result relates to the number of words expressing different degrees of probability. Of the probability expressions found in the publications of the studied organization, there were more words expressing intermediate probabilities than extreme ones. This may be a result of the fact that events with the intermediate range probability occur much more frequently in these individuals’ work. Lepley and Kobrick (1952) showed that the variety of the individual’s synonym vocabulary for a concept varies directly with the frequency with which the individual uses that concept.

The frequent use of the intermediate range of probabilities may be caused by either a sincere feeling that the uncertainty inherent in political events is very high, or a preference for non-committal verbal expressions, allowing one to defend one’s prediction in hindsight. A related possibility is that people use probability expressions mainly when they feel high uncertainty. When talking about a binary event, this high uncertainty is expressed by numbers near 0.5 for each of the two possible outcomes. However, when uncertainty is small (i.e. the subjective probability of the event is very high or very low), probability expressions may not be used at all, and certainty expressions take their place. Such a situation could be very harmful for a decision maker who bases decisions on such forecasts. A decision maker often would like to take preventive steps against low probability-high risk events. No such preventive steps will be taken if the events are evaluated as ‘impossible’.

Possible practical implications

The reported results should convince any forecasting organization to change its policy and to use numerical expressions of probability rather than verbal ones. In addition to the better communication achieved by a common scale of probability expressions, a secondary gain is the possible application of various quantitative approaches to political forecasting (Heuer, 1978), all of which require clear definition of the events and their related probabilities. However, as mentioned before, the resistance to such a change can be enormous. When this approach does fail, a second, compromise solution may be suggested: a scale of verbal probability expressions with underlying numerical translation. The formation of such a scale is demonstrated.

On the basis of the numerical translation and matching tasks, a 7-category scale of probability was constructed, with each category having a numerical range and a few verbal expressions (see Exhibit 7). Subjects’ performance in the event-classification task indicated that they easily distributed the 50 events among the 7 probability categories. Although the option of leaving some events in the ‘more than 50’ or ‘less than 50’ piles was suggested, most subjects distributed all events among the 7 categories. Thus, subjects seem able to discriminate 7 levels of subjective confidence. The constructed scale can be suggested to the organization as a common scale to be used in all internal communications and in reports to outside decision makers. Although each word has a numerical translation, members are not forced to use those numbers directly. Rather, they can use verbal expressions of probability as long as they agree upon their interpretation. For example, when a sentence like ‘it is likely that the U.S. will intervene in El Salvador during the coming year’ is written, both writer and reader should make the following inferences: (a) it is more probable that the U.S. will intervene than that it will not; (b) the probability that it will intervene is 50-70 per cent (and the probability that it will not intervene is 30-50 per cent); (c) the probability that it will intervene is at most about twice as high as the probability that it will not intervene.

Rank   Range (per cent)   Verbal expressions
1      0-10               Very small chance; Poor chance
2      10-30              Small chance; Doubtful
3      30-50              Perhaps; May; Chance not great
4      50                 It could be
5      50-70              Likely; Reasonable to assume; One should assume; Reasonable chance; It seems to me; Can expect; It seems
6      70-90              High chance; Close to certain
7      90-100             Very high chance; Most likely

Exhibit 7. A suggested common scale for the numerical translation of verbal probability expressions
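
The three inferences drawn from ‘likely’ can be sketched as a simple lookup. The following Python fragment is an illustration only and is not part of the reported study: the function names `interpret` and `max_odds` are hypothetical, and the mapping lists just one expression per category, as read from Exhibit 7, rather than the full scale.

```python
# Category ranges in per cent, keyed by rank, as given in Exhibit 7.
RANGES = {
    1: (0, 10), 2: (10, 30), 3: (30, 50), 4: (50, 50),
    5: (50, 70), 6: (70, 90), 7: (90, 100),
}

# Illustrative subset only: one expression per category, as read from Exhibit 7.
RANK_OF = {
    "very small chance": 1,
    "small chance": 2,
    "perhaps": 3,
    "it could be": 4,
    "likely": 5,
    "high chance": 6,
    "very high chance": 7,
}


def interpret(expression):
    """Return the per-cent range for the event and for its complement."""
    low, high = RANGES[RANK_OF[expression.lower()]]
    return {"event": (low, high), "complement": (100 - high, 100 - low)}


def max_odds(expression):
    """Largest ratio P(event) / P(not event) that the category allows."""
    _, high = RANGES[RANK_OF[expression.lower()]]
    return float("inf") if high == 100 else high / (100 - high)


print(interpret("likely"))           # {'event': (50, 70), 'complement': (30, 50)}
print(round(max_odds("likely"), 2))  # 2.33
```

The upper bound of 70 against 30 gives odds of roughly 2.3 to 1, which is the sense in which the event is ‘at most about twice as’ probable as its complement.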

With such a scale, the number of categories, the range of each category, and the words describing each are all based on subjects’ responses and thus are as compatible as possible with their own judgements. A further advantage of such a common scale is that it uses a probability range rather than a point estimate, in keeping with subjects’ strong resistance to making point estimates. Furthermore, the provision of several verbal expressions for each category allows writers to feel that their writings are interesting and creative as well as clear and normative.

In addition to the common scale, evaluators should get a list of ‘words that should not be used to indicate how probable events are’. That list should definitely include expressions that indicate only that a probability is not zero, but say little about how probable the event is: ‘one must consider’, ‘one can’t rule it out entirely’ and ‘not inevitable’. People tend to perceive these terms as indicating low probability, but their range of interpretations is very wide (C10-C90 = 36.2, 45.9 and 33.6, respectively).

A second category of forbidden words consists of expressions such as ‘meaningful chance’ or ‘good chance’ (the latter was not in the list of 30 expressions), which tend to confound the strength of the probability with the desirability of the associated outcome. A 10 per cent chance of recovering from an operation may be a ‘good’ one if the patient would otherwise die. However, the same 10 per cent chance is ‘bad’ if the operation is not essential and the person is healthy. What matters is the chance of recovering from the operation, not the forecaster’s evaluation of the outcome.

A verbal scale of probability expressions is a compromise between people’s resistance to the use of numbers and the necessity of having a common numerical scale. There is no doubt that this is the second-best solution and should be implemented only after giving up hope of convincing the organization to use numerical expressions explicitly. The big disadvantage of this solution is that people trained to use it can easily regress to their former ways of forecasting without detection. They may continue using the recommended verbal expressions without paying any attention to their underlying numerical interpretations.

The implementation of a common scale (be it numbers per se or verbal expressions substituting for them) is a big organizational change. As such, it calls for a great deal of training (for forecasters in the organization and decision makers outside it) and serious post-implementation evaluation.

If a verbal scale is implemented, a replication of the present experiment after training, implementation, and an extended period of usage could be an efficient evaluative tool. Similar ranges of interpretation would indicate that nothing has actually changed, whereas significantly smaller ranges would indicate improvement.
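
Such a before-and-after comparison can be sketched with the C10-C90 range used above as the measure of variability. The following Python fragment is a minimal illustration under stated assumptions: the response values are invented for the example and are not data from the experiment, the function name `interdecile_range` is hypothetical, and the ‘inclusive’ interpolation rule is an assumption, since the exact percentile method behind the reported ranges is not specified. It compares the two ranges descriptively and does not test whether the difference is significant.

```python
from statistics import quantiles


def interdecile_range(responses):
    """Distance between the 10th and 90th percentiles (C10-C90) of the responses."""
    # quantiles(..., n=10) returns the nine decile cut points C10 ... C90.
    deciles = quantiles(responses, n=10, method="inclusive")
    return deciles[8] - deciles[0]


# Hypothetical numerical translations (per cent) of one expression by 11 experts.
before_training = [40, 50, 55, 60, 60, 65, 70, 70, 75, 80, 90]
after_training = [50, 55, 55, 60, 60, 60, 65, 65, 65, 70, 75]

print(interdecile_range(before_training))  # 30.0
print(interdecile_range(after_training))   # 15.0
```

In this invented example the range is halved after training; a real evaluation would repeat the calculation for every expression on the scale.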

ACKNOWLEDGEMENTS

My thanks to Baruch Fischhoff, Paul Slovic, Don MacGregor, and Sarah Lichtenstein for their helpful comments on earlier drafts of this paper. This research was supported by the U.S. Office of Naval Research under contract N00014-80-C-0150 to Perceptronics, Inc.

REFERENCES

Heuer, R. J., Quantitative approaches to political intelligence: The CIA experience. Boulder, Colorado: Westview, 1978.

Lepley, W. M. and Kobrick, J. L., ‘Word usage and synonym representation in the English language’, Journal of Abnormal and Social Psychology, 41 (1952), 572-573.

Lichtenstein, S. and Newman, J. R., ‘Empirical scaling of common verbal phrases associated with numerical probabilities’, Psychonomic Science, 9 (1967), 563-564.

Lichtenstein, S., Fischhoff, B. and Phillips, L. D., ‘Calibration of probabilities: State of the art to 1980’, in Kahneman, D., Slovic, P. and Tversky, A. (eds.), Judgement under uncertainty: Heuristics and biases, New York: Cambridge University Press, 1982.

Murphy, A. H. and Winkler, R. L., ‘Probability forecasts: A survey of National Weather Service forecasters’, Bulletin of the American Meteorological Society, 55 (1974), 1449-1453.

Raiffa, H., Decision analysis, Reading, Mass.: Addison-Wesley, 1968.

Simpson, R. H., ‘The specific meaning of certain terms indicating differing degrees of frequency’, Quarterly Journal of Speech, 30 (1944), 328-330.

Simpson, R. H., ‘Stability in meanings for quantitative terms: A comparison over 20 years’, Quarterly Journal of Speech, 49 (1963), 146-151.

Slovic, P., ‘Value as a determiner of subjective probability’, Transactions of the Institute of Electronic Engineers: Human Factors Issue, HFE-7 (1966), 22-28.

Winkler, R. L. and Murphy, A. H., ‘“Good” probability assessors’, Journal of Applied Meteorology, 7 (1968), 751-758.

Author’s biography: Ruth Beyth-Marom is an experimental psychologist interested in how people make judgements and decisions under conditions of uncertainty and how they can be aided.

Author’s address: Dr. Ruth Beyth-Marom, 11 Derech Haganin St., Kfar Shmaryahu 46910, Israel.