
LEARNING AND MOTIVATION (1971) 2, 40-48

A Method for Mapping Stimulus Distance into Reinforcement Value¹

JAMES PETERSON AND DAVID PREMACK

University of California, Santa Barbara

Three pigeons were trained to peck for grain on a five-ply multiple variable-interval (VI) schedule of reinforcement. A different VI schedule was associated with each of the five orthogonal training stimuli. A vertical line (90°) was associated with the highest rate of reinforcement (VI 15 sec). Subsequent paired-comparison choice tests revealed that the training stimuli were chosen in direct relation to the reinforcement frequency of each of them. More importantly, the proportions of choice of line orientations other than the 90° line, when paired with the training stimuli and with each other, reflected both similarity to the 90° line and reinforcement frequency in training. The line closest to 90° was chosen by all birds in preference to any of the other line test stimuli and to most of the orthogonal training stimuli. The other line test stimuli could similarly be located at some point on the reinforcement-frequency continuum of the training stimuli. The data were interpreted in terms of expected probabilities of reinforcement, which may be produced directly by the actual frequencies of reinforcement associated with individual stimuli or indirectly, as in stimulus generalization. The data suggested the possibility of mapping a stimulus-distance variable, such as line orientation, into a reinforcement variable, such as frequency of reinforcement.

In this paper we describe a method for mapping stimulus distance into reinforcement value. Consider that a pigeon is trained to peck at a vertical line, for which it receives 60 reinforcements per hour (rft/hr), and at a red dot, for which it receives 30 rft/hr. Assume that the red dot and the black line lie on orthogonal dimensions. Subsequently the bird is given a choice between the dot and various orientations of the line. Suppose that for angles down to about 60° the bird consistently chooses the line, that below 40° it chooses the dot, and that at 50° it chooses the line and the dot about equally.

Although the 50° line was not previously associated with reinforcement, the bird nonetheless chose indifferently between it and the red dot, which was previously reinforced. In fact, the red dot was associated with 30 rft/hr. Since the bird chose indifferently between the dot and the 50° line, we assume that the latter has a reinforcement value equivalent to that of 30 rft/hr. How did the 50° line attain a reinforcement value when it was not previously associated with reinforcement? The 50° line belongs to a stimulus continuum at least one member of which was reinforced. That is, the 90° line was associated with 60 rft/hr (twice the frequency associated with the red dot), and we assume that the reinforcement of any member of a stimulus continuum generates expected reinforcement value in other members of the continuum.

¹ Supported in part by National Institutes of Health Grant USPHS MH 15616. Requests for reprints should be sent to David Premack, Department of Psychology, University of California, Santa Barbara, California 93106.

STIMULI SCALED AGAINST REINFORCEMENT DENSITY 41

Can stimulus distance be mapped into reinforcement value on the basis of such data? Notice that 60 rft/hr were associated with the 90° line and 30 rft/hr with the stimulus that the bird treated as equivalent to a 50° line. This says that reinforcement value apparently decreased from 60 rft/hr at 90° to the equivalent of 30 rft/hr at 50°, a decrease of 30 rft/hr over a stimulus change of 40°, or an average loss of .75 rft/hr per degree of change in line orientation. In this paper we describe the first results obtained with this general procedure.
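Written out, the estimate in the preceding paragraph is the drop in reinforcement value divided by the angular distance. Here $R(\theta)$ is our shorthand (not the authors') for the reinforcement value, in rft/hr, attributed to a line at orientation $\theta$, with $R(50^\circ)$ taken from the red dot that the bird treated as equivalent. Because it is an average, the result says nothing about the shape of the decline between the two points.

$$
\frac{R(90^\circ) - R(50^\circ)}{90^\circ - 50^\circ} = \frac{(60 - 30)\ \text{rft/hr}}{40^\circ} = 0.75\ \text{rft/hr per degree}
$$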

METHOD

Subjects. Three experimentally naive White Carneaux pigeons (Palmetto Pigeon Plant) were reduced to 80% of free-feeding weight and maintained at that level throughout the study.

Apparatus. A Lehigh Valley pigeon chamber was modified to contain three transparent pecking keys mounted behind 25.4-mm circular openings, and a standard grain magazine. Since the center key was not used in this study, it remained dark (but uncovered) throughout the study.

One IEE in-line digital readout was mounted behind each of the pecking keys. Each readout contained four line orientations differing in 26° steps from the vertical in the counterclockwise direction (i.e., 90°, 64°, 38°, and 12°) and four stimuli presumably orthogonal to line orientation and to one another. The line orientations were provided by 22.2 × 1.6-mm black lines on a plain white background. The orthogonal stimuli were a plain white surface (white), a plain red surface (red), an irregular grid of thin black lines (grid), and a large black spot 19 mm in diameter on a white background (spot).

The experiment was programmed and the data were recorded by means of solid-state electronic circuitry and Friden tape readers and tape punches.

Method. The birds were magazine trained and shaped to peck both keys; 3-sec access to mixed grain was the contingent reward used throughout the study. After preliminary training, all birds were trained on the following four procedures.

42 PETERSON AND PREMACK

1. Preferences among the five stimuli were assessed by reinforcing all stimuli equally. The vertical line and the four orthogonal stimuli (red, spot, grid, white) were presented equally often on each key, in an irregular order, in association with the same VI 15-sec reinforcement schedule. Three to five 30-min sessions were given on this procedure.

2. Different VI schedules were associated with each of the five stimuli, and all birds were stabilized on the resulting five-ply multiple VI schedule. The schedule associated with each stimulus differed across birds (Table 1), but the vertical line was associated with the highest rate of reinforcement for all birds, since line orientation was the dimension on which the test of generalized reinforcement value was ultimately made; each of the orthogonal stimuli was used as a standard and compared with different line orientations. The schedules used were VI 15, 30, 60, 120, and 360 sec, or in terms of possible reinforcements per hour, 48, 24, 12, 6, and 2, respectively. Since each session lasted exactly 30 min, a maximum of 46 reinforcements could be obtained per session. Stimulus periods lasted 60 sec and were separated by 10-sec time-out (TO) periods during which all lights in the box were extinguished. All stimuli were presented equally often on both keys, and each session consisted of six randomized blocks of the five stimuli. Each bird was trained until the proportion of daily responding to each stimulus was stable over five 3-day blocks of sessions. Birds 1 and 5 were trained for 33 sessions and Bird 4 for 51 sessions.

TABLE 1
Schedule of Reinforcement and the Reinforcement Frequency of Each Training Stimulus for Each Pigeon

Training stimulus    Birds 1 and 5            Bird 4
Vertical line        VI 15 sec (48 rft/hr)    VI 15 sec (48 rft/hr)
Spot                 VI 30 sec (24 rft/hr)    VI 360 sec (2 rft/hr)
Grid                 VI 60 sec (12 rft/hr)    VI 30 sec (24 rft/hr)
Red                  VI 120 sec (6 rft/hr)    VI 60 sec (12 rft/hr)
White                VI 360 sec (2 rft/hr)    VI 120 sec (6 rft/hr)


3. Each bird was given a choice test involving all possible pairs of the five stimuli used in steps 1 and 2. Each test pair was presented 10 times, for a total of 100 test trials per session. A session consisted of 100 30-sec stimulus periods separated by 10-sec TOs. Each period began with the simultaneous presentation of two of the stimuli, one on each of the two keys used in training. When the bird pecked either key, the light on the unchosen key was extinguished, and the chosen stimulus remained available for the balance of the 30-sec period. All stimulus pairs were presented an equal number of times, in an irregular order, counterbalanced for side of presentation. No reinforcement was given during this procedure. After this test, Birds 1 and 5 and Bird 4 were given 39 and 33 type-2 sessions, respectively, and then moved to the last step of the experiment.

4. Line orientations other than the 90° used in training (64°, 38°, and 12°) were introduced for the first time, and all birds were given two kinds of choice tests, neither of which was reinforced. (a) All possible pairs of the four line orientations were presented so as to obtain a generalization gradient of line orientation, e.g., 90° vs 64°, 64° vs 12°, etc. (b) All possible pairs consisting of a line orientation and an orthogonal stimulus, e.g., 38° vs red, were presented so as to compare line orientations, for which reinforcement values were unknown, with the orthogonal training stimuli, for which reinforcement values were known. The choice procedure was the same as that used earlier, i.e., stimulus pairs were presented for 30 sec, separated by 10-sec TOs; the unchosen stimulus was extinguished and the chosen stimulus allowed to remain for the balance of the 30-sec period. The two kinds of pairs, line-line and line-orthogonal stimulus, were intermixed throughout the session, and all pairs were presented equally often in an irregular order counterbalanced for position. Each test pair was presented six times, for a total of 108 test trials per session.
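The session arithmetic in steps 1 to 4 can be checked with a short sketch. It assumes, as the stated figures imply, that each of the five stimuli occupies one-fifth of a 30-min session and that "rft/hr" is expressed per hour of session time; under that assumption a VI 15-sec component yields 3600/15/5 = 48 rft/hr, and the five components together allow at most 46 reinforcements per session. The pair counts reproduce the 100 trials of step 3 and the 108 of step 4 only if the line-orthogonal pairs in step 4 used the three new orientations; that is our inference from the stated total, not something the text says. The function names and the random seed are hypothetical, and the shuffle merely stands in for the authors' unspecified irregular order.

```python
import itertools
import random

# Table 1 schedules for Birds 1 and 5 (VI value in seconds).
VI_SECONDS = {"vertical line": 15, "spot": 30, "grid": 60, "red": 120, "white": 360}
N_COMPONENTS = 5      # five-ply multiple schedule
SESSION_HOURS = 0.5   # 30-min session


def possible_rft_per_hour(vi_seconds, n_components=N_COMPONENTS):
    """Maximum reinforcements per hour of session time for one component.

    Assumes the component is present for 1/n_components of the session.
    """
    return 3600 / vi_seconds / n_components


def counterbalanced_trials(pairs, reps, seed=0):
    """Each pair appears `reps` times, half with each member on the left key,
    shuffled into an irregular order (a stand-in for the authors' sequence)."""
    if reps % 2:
        raise ValueError("side counterbalancing needs an even number of repetitions")
    trials = []
    for a, b in pairs:
        trials += [(a, b)] * (reps // 2) + [(b, a)] * (reps // 2)
    random.Random(seed).shuffle(trials)
    return trials


rates = {s: possible_rft_per_hour(vi) for s, vi in VI_SECONDS.items()}
max_per_session = sum(rates.values()) * SESSION_HOURS

# Step 3: all pairs of the five training stimuli, 10 presentations each.
step3 = counterbalanced_trials(itertools.combinations(VI_SECONDS, 2), reps=10)

# Step 4: line-line pairs plus new-line x orthogonal pairs, 6 presentations each.
lines = [90, 64, 38, 12]
orthogonal = ["spot", "grid", "red", "white"]
step4_pairs = list(itertools.combinations(lines, 2)) + list(
    itertools.product([64, 38, 12], orthogonal))  # inferred from the 108 total
step4 = counterbalanced_trials(step4_pairs, reps=6)

print(rates)  # {'vertical line': 48.0, 'spot': 24.0, 'grid': 12.0, 'red': 6.0, 'white': 2.0}
print(max_per_session, len(step3), len(step4))  # 46.0 100 108
```

Step 4 then has 6 line-line pairs and 12 line-orthogonal pairs; if all four orientations had been paired with the orthogonal stimuli, the same six presentations would give 132 trials rather than 108.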

RESULTS

All birds showed some preferences among the five training stimuli. For example, in the first session, Birds 1, 4, and 5 responded 68, 70, and 80% less, respectively, to red than to the average of the other four stimuli. These initial biases may account for at least some of the anomalies observed in the subsequent choice data; e.g., Birds 1 and 4 tended to underrespond to red even though red was subsequently a relatively highly reinforced stimulus for them.
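The bias figures above compare responding to red with the mean of responding to the other four stimuli. A minimal sketch of that comparison follows; the function name is hypothetical, and the rates are invented for illustration because the first-session rates themselves are not reported.

```python
from statistics import mean


def percent_less(target_rate: float, other_rates: list[float]) -> float:
    """Percent by which responding to the target falls below the mean of the others."""
    return 100 * (1 - target_rate / mean(other_rates))


# Hypothetical first-session rates (responses/min): red vs. the other four stimuli.
print(round(percent_less(8.0, [20.0, 25.0, 30.0, 25.0]), 1))  # 68.0
```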

Figure 1 presents relative rates of responding as a function of the relative rate of reinforcement associated with each of the five training stimuli, averaged over the terminal nine sessions. Relative response rate was proportional to relative reinforcement rate except at the highest reinforcement values, where the curves flattened for both Birds 1 and 4. Reynolds (1963) reported that response rate in two-component multiple schedules flattens above 30 rft/hr. The highest frequencies used here, in the context of a five-component schedule, were 24 and 48 rft/hr.

[Figure 1 not reproduced: horizontal axis, relative rate of reinforcement; legend, three-session blocks (e.g., 43-45, 46-48).]
FIG. 1. Mean relative response rate to each of the five training stimuli as a function of the relative frequency of reinforcement of each stimulus.
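The horizontal axis of Fig. 1 is presumably each stimulus's reinforcement rate divided by the total over the five stimuli; the formula is not stated, so treat this as our assumption. A sketch with the Table 1 values for Birds 1 and 5 (Bird 4 received the same five rates assigned to different stimuli, so it has the same set of values); the function name is hypothetical.

```python
def relative_rates(rates):
    """Each stimulus's rate divided by the sum over all stimuli."""
    total = sum(rates.values())
    return {stim: r / total for stim, r in rates.items()}


# Possible rft/hr for Birds 1 and 5 (Table 1).
birds_1_and_5 = {"vertical line": 48, "spot": 24, "grid": 12, "red": 6, "white": 2}
for stim, rel in relative_rates(birds_1_and_5).items():
    print(f"{stim:<14}{rel:.3f}")
# vertical line 0.522, spot 0.261, grid 0.130, red 0.065, white 0.022
```

Strict proportionality in Fig. 1 would mean relative response rates tracking these values; the text reports flattening at the top of the range instead.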

Figure 2 shows the results of the paired comparisons between all possible pairs of the five training stimuli. Response rate to each of the five stimuli and the number of times each was chosen were divided by the total amount of responding and the total number of choice trials, respectively, and are plotted against the relative frequency of reinforcement. Choice and rate were in general both monotonically related to relative frequency of reinforcement; the major exceptions were the low responding to red by Birds 1 and 4. Interestingly, the choice data show that Birds 1 and 4 clearly preferred the stimulus associated with 48 rft/hr to the one associated with 24 rft/hr, even though this was not predictable from the response-rate data shown in Fig. 1.

[Figure 2 not reproduced: horizontal axis, relative rate of reinforcement.]
FIG. 2. Proportion of actual choices to possible choices for each training stimulus and mean number of responses per minute to each stimulus in the first choice test.
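The choice normalization described for Fig. 2 can be sketched from a trial log. The log format `(left, right, chosen)`, with `None` when neither key was pecked, and the function name are hypothetical. Following the text, each count is divided by the total number of choice trials; the Fig. 2 caption instead speaks of "possible choices", which would use only the trials on which a stimulus appeared as the denominator.

```python
from collections import Counter


def choice_proportions(trials):
    """Proportion of all choice trials on which each stimulus was chosen.

    trials: iterable of (left, right, chosen); chosen is None when the bird
    pecked neither key within the period.
    """
    trials = list(trials)
    chosen = Counter(c for _, _, c in trials if c is not None)
    stimuli = sorted({s for left, right, _ in trials for s in (left, right)})
    return {s: chosen[s] / len(trials) for s in stimuli}


# Hypothetical four-trial log, not the paper's data.
log = [
    ("vertical line", "red", "vertical line"),
    ("red", "white", "red"),
    ("white", "vertical line", "vertical line"),
    ("red", "vertical line", None),
]
print(choice_proportions(log))  # {'red': 0.25, 'vertical line': 0.5, 'white': 0.0}
```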

Figure 3 shows the individual and average generalization gradients for line orientation plotted in terms of relative rate and proportion of choices. The choice gradients were the most orderly, decreasing monotonically with increasing distance from the training stimulus for all three birds. The rate gradients were also monotonic, except for minor inversions.
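One way to turn the line-line choice test into a choice gradient is to credit each orientation with the proportion of its pairings that it won; the paper does not say how its gradient was computed, so this summary, the function names, and the counts below are all hypothetical. A monotonicity check then expresses the "decreasing with distance from 90°" description.

```python
def choice_gradient(results):
    """results[(a, b)] = (times a chosen, times b chosen, presentations).

    Returns each orientation's share of wins over all its presentations,
    ordered from 90 degrees downward.
    """
    won, shown = {}, {}
    for (a, b), (wa, wb, n) in results.items():
        won[a] = won.get(a, 0) + wa
        won[b] = won.get(b, 0) + wb
        shown[a] = shown.get(a, 0) + n
        shown[b] = shown.get(b, 0) + n
    return {s: won[s] / shown[s] for s in sorted(shown, reverse=True)}


def is_monotone_decreasing(values):
    return all(x >= y for x, y in zip(values, values[1:]))


# Hypothetical counts for the six line-line pairs, six presentations each.
results = {(90, 64): (5, 1, 6), (90, 38): (6, 0, 6), (90, 12): (6, 0, 6),
           (64, 38): (5, 1, 6), (64, 12): (5, 0, 6), (38, 12): (4, 2, 6)}
grad = choice_gradient(results)
print({s: round(p, 2) for s, p in grad.items()})  # {90: 0.94, 64: 0.61, 38: 0.28, 12: 0.11}
print(is_monotone_decreasing(list(grad.values())))  # True
```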

[Figure 3 not reproduced: generalization gradients along line orientation; legend distinguishes choice and rate curves.]
FIG. 3. Generalization gradients for response rate and choice along the dimension of line orientation from the final choice test.

The data of greatest interest are the birds' choices between the new line orientations, which had not been associated with reinforcement, and the training stimuli, all of which were associated with known frequencies of reinforcement. These data are shown in Fig. 4, where the ratio of choice of test stimulus to training stimulus is plotted as a function of the reinforcement value of the training stimuli. Separate curves are shown for each of the test stimuli, i.e., 64°, 38°, and 12°. Despite inversions in the individual data, the data were orderly in two major respects. First, preference for a test line was inversely related to its distance from 90°; second, preference for a training stimulus was directly related to its reinforcement frequency. For example, the 64° line was chosen more often than either the 38° or the 12° line, and stimuli associated with 24 rft/hr were chosen more often than those associated with 12, 6, or 2 rft/hr.
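Locating a test line on the reinforcement continuum amounts to finding where its choice proportion against the training stimuli falls through 0.5 (indifference). The sketch below brackets that point between adjacent training values and, as an added assumption not made in the paper, interpolates linearly inside the bracket; it then converts the result into an average loss per degree. Function names and the proportions are hypothetical.

```python
def equivalent_value(choice_vs_training):
    """Bracket the indifference point of one test line.

    choice_vs_training maps a training stimulus's rft/hr to the proportion of
    trials on which the test line was chosen over that stimulus. Returns
    (lower rft/hr, upper rft/hr, interpolated rft/hr) for the first adjacent
    pair of training values across which the proportion falls to 0.5, or None
    if it never does (the line lies outside the series).
    """
    points = sorted(choice_vs_training.items())  # ascending rft/hr
    for (r_lo, p_lo), (r_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= 0.5 >= p_hi:
            if p_lo == p_hi:  # both exactly 0.5: indifferent across the interval
                return r_lo, r_hi, (r_lo + r_hi) / 2
            # Linear interpolation between adjacent training values (an assumption).
            r_eq = r_lo + (p_lo - 0.5) / (p_lo - p_hi) * (r_hi - r_lo)
            return r_lo, r_hi, r_eq
    return None


def loss_per_degree(r_trained, theta_trained, r_equiv, theta_test):
    """Average drop in rft/hr per degree between the trained and the test line."""
    return (r_trained - r_equiv) / (theta_trained - theta_test)


# Hypothetical proportions for a 64-degree line, not the paper's data.
line_64 = {2: 1.0, 6: 1.0, 12: 0.75, 24: 0.25, 48: 0.0}
lo, hi, r_eq = equivalent_value(line_64)
print(lo, hi, r_eq)                                 # 12 24 18.0
print(round(loss_per_degree(48, 90, r_eq, 64), 2))  # 1.15
```

A line never preferred over even the 2 rft/hr stimulus returns `None`, which corresponds to a value below the lowest point of the series.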

DISCUSSION

Does the indifference that a bird may show between a 50° line and a red dot indicate that the bird is unable to discriminate between them? A more plausible interpretation is that the bird chooses stimuli in proportion to the subjective expected probability of reinforcement that they occasion, and hence chooses stimuli equally when their expected probabilities of reinforcement are the same. This interpretation is interesting because, as the present results indicate, the value a bird ascribes to a stimulus can have altogether different procedural determinants. In the case of the red dot, the critical determinant is presumably the objective frequency of reinforcement associated with responding to that stimulus. But in the case of the 50° line it cannot be actual reinforcement frequency, since no reinforcement was ever associated with the 50° line. The procedure in this case was different: an objective reinforcement frequency was associated with the 90° line, and the stimulus was then displaced on the stimulus scale from 90° to 50°.
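"Chooses stimuli in proportion to" expected reinforcement can be made concrete with a Luce-style ratio rule, in which the probability of choosing A over B is A's value divided by the summed values. That rule is our formalization, not the authors' model, and using rft/hr as a stand-in for subjective expected probability of reinforcement is likewise an assumption. Equal values give exactly 0.5, the indifference the mapping relies on.

```python
def p_choose(value_a: float, value_b: float) -> float:
    """Ratio rule: probability of choosing A over B is v_A / (v_A + v_B)."""
    if value_a + value_b <= 0:
        raise ValueError("at least one alternative must have positive value")
    return value_a / (value_a + value_b)


# Values are expected reinforcement rates in rft/hr, used as stand-ins for
# subjective expected probability of reinforcement (an assumption).
print(p_choose(30, 30))            # 0.5  (red dot vs. a 50-degree line valued at 30)
print(round(p_choose(48, 24), 3))  # 0.667
```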

If we accept the assumption that stimuli which are chosen indifferently have equal reinforcement value, the present procedure makes it possible to map stimulus distance into reinforcement value. (We have used only reinforcement frequency in this study, but the procedure should apply equally to other parameters of reinforcement, e.g., duration, sucrose concentration, etc.) The mapping requires knowledge of at least two coordinates for each of two stimuli: reinforcement value and location on a common stimulus continuum. The stimulus locations of the two stimuli are known directly (90° and 50°), as is the reinforcement value of one of them. The reinforcement value of the second stimulus is then determined by finding an orthogonal stimulus of known reinforcement value that the subject treats as equal to the stimulus in question. These two pieces of coordinate information make it possible to say, for example, that reinforcement value declined from 60 rft/hr at 90° to 30 rft/hr at 50°, a change of 30 rft/hr over a distance of 40°.

[Figure 4 not reproduced: horizontal axis, reinforcements/hour of the paired training stimulus (48, 24, 12, 6, 2); separate curves for the 64°, 38°, and 12° lines.]
FIG. 4. Proportion of times each line stimulus was chosen when paired with training stimuli represented on the horizontal axis by their respective reinforcement frequencies.

How internally consistent was the present mapping? Figure 4, which shows the data for choices between the orthogonal training stimuli and the unreinforced line orientations, reveals a number of inversions in the individual data, suggesting only a moderate degree of internal consistency. The mean curves show the kind of relationships we may hope to obtain in the individual data with improved procedures. The group data suggest that the reinforcement value of the 64-degree line is between 24 and 12 rft/hr, the value of the 38-degree line is between 12 and 2 rft/hr, and the value of the 12-degree line is less than the lowest value represented in this series (2 rft/hr). If the 64-degree line were equivalent to 20 rft/hr, the loss in rft/hr per degree of line orientation would be about 1.08. Ideally, this loss would be the same at the 38-degree and 12-degree line orientations. Some of the inconsistency in the individual data may be due to unreliability: each point in Fig. 4 is based on only six choices. Choices were restricted because the data were collected in the absence of reinforcement, to avoid affecting the values of the new line orientations, and there was evidence of extinction over the course of the session, since the number of times birds failed to choose either alternative in the allotted time increased over the session.
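The 1.08 figure in the preceding paragraph can be reproduced if the 90° line is credited with the 48 rft/hr it actually carried in this experiment (Table 1) rather than the 60 rft/hr of the introductory example; this reading is our inference, since the text does not state which value was used. With $R(\theta)$ denoting reinforcement value in rft/hr at orientation $\theta$:

$$
\frac{R(90^\circ) - R(64^\circ)}{90^\circ - 64^\circ} = \frac{(48 - 20)\ \text{rft/hr}}{26^\circ} \approx 1.08\ \text{rft/hr per degree}
$$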

The present mapping was based on physical parameters, e.g., degrees of line orientation on the one hand and number of reinforcements per hour on the other. A more accurate or internally consistent mapping may be possible with the use of behavioral parameters. For example, for the physical distance between test stimuli we might substitute the psychological distance revealed by the generalization gradient on line orientation. Similarly, number of reinforcements per hour might be translated into its behavioral equivalent, rate of responding or choice between the stimuli. The present data are too slight to serve as a basis for deciding between physical and behavioral parameters; we plan to make this comparison with more recent data obtained with improved test procedures.

REFERENCES

REYNOLDS, G. S. On some determinants of choice in pigeons. Journal of the Experimental Analysis of Behavior, 1963, 6, 53-59.

(Received April 21, 1970)