processing similarity does not improve metamemory



Processing Similarity Does Not Improve Metamemory: Evidence Against Transfer-Appropriate Monitoring

Charles A. Weaver III
Baylor University

William L. Kelemen
California State University, Long Beach

The transfer-appropriate monitoring (TAM) hypothesis of metamemory predicts that judgment of learning (JOL) accuracy should improve when conditions during JOLs closely match conditions of the memory test. The authors devised 5 types of delayed JOLs for paired associates and varied them along with the type of memory test (cued recall or recognition). If the TAM hypothesis is correct, JOL and test type should interact to influence metamemory. Contrary to TAM, metamemory accuracy did not improve when JOL and test conditions matched but instead tended to vary according to whether the answer was apparent at time of JOL. Memory test scores and JOL magnitude were both greater when the correct target was evident during JOLs. Overall, the results are largely consistent with a monitoring retrieval view of delayed JOLs and do not support TAM as a viable account of JOL accuracy.

Many studies of metamemory have focused on judgments of learning (JOLs), which are predictions about future memory performance that occur during, or soon after, study (Nelson & Narens, 1990). When JOLs occur for paired associates (e.g., elephant–sunburn) and the JOL cue provides only the cue term (elephant) and asks about future recall of the absent target term (sunburn), metamemory accuracy increases dramatically if a delay of several minutes occurs between study and JOL. This finding is known as the delayed-JOL effect (Nelson & Dunlosky, 1991), and it produces the largest change in JOL accuracy yet reported. Subsequent research has shown that the delayed-JOL effect is quite robust, and its theoretical explanation has been discussed widely (Dunlosky & Nelson, 1992, 1994, 1997; Kelemen, 2000; Kelemen & Weaver, 1997; Kimball & Metcalfe, 2002, in press; Koriat, 1997; Nelson & Dunlosky, 1991, 1992, 1996; Schwartz, 1994; Spellman & Bjork, 1992, 1997; Weaver & Kelemen, 1997). In the present study, we focused on one possible explanation of delayed-JOL accuracy known as transfer-appropriate monitoring (TAM).

Conceptually, TAM can be seen as an extension of the well-known transfer-appropriate processing memory hypothesis. According to this view, memory is best when the processes used during encoding are recapitulated during retrieval (Blaxton, 1986; Graf & Ryan, 1990; Lockhart, 2002; Morris, 1978; Morris, Bransford, & Franks, 1977; Rajaram, Srinivas, & Roediger, 1998; Roediger, 1990; Roediger, Gallo, & Geraci, 2002). Similarly, according to the TAM hypothesis, metamemory accuracy should vary as a function of the match between conditions during JOLs and conditions during subsequent memory tests; the closer the match between JOL and retrieval conditions, the more accurate the monitoring.

To test the TAM hypothesis, Dunlosky and Nelson (1997) examined memory for paired associates on an associative recognition test, eliciting delayed JOLs with either the cue alone or the entire cue–target pair. Because the recognition test involved both the cue and the target, TAM predicts that metamemory accuracy should have been higher for the cue–target JOL cues. In fact, the opposite pattern emerged: Cue-alone JOL accuracy was better than cue–target accuracy.

The above-mentioned view of TAM could be considered a context-oriented version of TAM; when attempting to match judgment and retrieval conditions, one looks for a match between the stimuli. If the stimuli present at judgment match those at retrieval, a context-oriented view of TAM would allow one to consider this a match. The results of Dunlosky and Nelson (1997) can be used to argue against the context-oriented version of TAM.

Alternatively, one might focus not on the specific stimuli that are present but on the kind of processing the stimuli evoke. Specifically, metamemory accuracy might increase if the cognitive processing required during JOLs matches that required for successful test performance, even if the exact context differs. We call this the process-oriented version of TAM. This distinction, though slight, may prove important.¹ For example, Begg, Duft, Lalonde, Melnick, and Sanvito (1989) compared recall and recognition of paired associates (items A and B) using different kinds of conditions at time of judgment. All participants were tested on recognition of A and on cued (A → B) recall. Begg et al. found that a precise match between prediction and test context was not required for accurate predictions. Instead, more accurate predictions were obtained when the processing requirements were similar. For example, consider the case in which the memory test requires recall of B given A as a cue. Judgments were made in several different ways, but in this example two are especially critical: predicting recall of B given AB as the cue, and predicting recognition of B given A as a cue. The second condition produced more accurate predictions, even though the predictions involved recognition, whereas the test was cued recall. Begg et al. (1989) concluded that “predictive accuracy depends on whether the predictive task requires the same processes as the test, not on the nominal question . . .” (p. 630). This could be interpreted as supporting a process-oriented view of TAM.

¹ This distinction is similar to the one between encoding specificity and transfer-appropriate processing. Encoding specificity, for all its utility, is descriptive in nature. Encoding specificity does not explain why memory improves when encoding conditions match retrieval conditions; it simply states that it does. Transfer-appropriate processing is an attempt to provide an explanation in terms of the processing similarity.

Charles A. Weaver III, Department of Psychology and Neuroscience, Baylor University; William L. Kelemen, Department of Psychology, California State University, Long Beach.

We thank John Dunlosky and Janet Metcalfe for helpful comments on a draft of this article. We also thank Sheila Barnes and Candice Ferguson for assistance with data collection.

Correspondence concerning this article should be addressed to Charles A. Weaver III, Department of Psychology and Neuroscience, Baylor University, Box 97334, Waco, Texas 76798. E-mail: charles_weaver@baylor.edu

Journal of Experimental Psychology: Learning, Memory, and Cognition, 2003, Vol. 29, No. 6, 1058–1065. Copyright 2003 by the American Psychological Association, Inc. 0278-7393/03/$12.00 DOI: 10.1037/0278-7393.29.6.1058

One complicating factor in comparing the context-oriented and process-oriented versions of TAM is the covariability of the two. In general, altering the stimuli present at study, prediction, and test induces processing differences. One useful way of isolating the two is comparing recall and recognition. Predictions of recall should be best in a cue-only JOL condition. One’s ability to predict future performance on a recognition test, however, requires knowledge not only of the correct answer but also of the alternatives. In a situation in which one is uncertain of an answer, it is much easier to recognize the correct response if the alternatives are not plausible. Recognition of caballo as the Spanish word for horse is much easier given agua, verde, and sol as distractors than if given pato, vaquero, and oveja. Indeed, the difficulty of alternatives is a major determinant of recognition performance (e.g., Drum, Calfee, & Cook, 1981).

To examine a process-oriented version of TAM, we compared performance on five different delayed-JOL conditions in a paired-associate task, summarized in Table 1. Participants studied a series of cue–target pairs like ELEPHANT–sunburn and made JOLs several minutes after studying. Participants in Condition 1 were presented with only the cue word at time of JOL (ELEPHANT) and were asked to predict the likelihood of future recall (or recognition) of the target (sunburn) given the cue. Participants in Condition 2 also predicted future performance but were presented both the cue and the target at JOL (ELEPHANT–sunburn). The superiority of the cue-alone condition is well established but consistent with both context-oriented and process-oriented versions of TAM. To distinguish between the two, we varied prediction conditions on a recognition test (Conditions 3–5).

Participants in Condition 3 studied the same word pairs, and at time of JOL, they were shown the cue alone (ELEPHANT–?) along with six incorrect alternative pairs and were asked to predict later recognition performance. These incorrect alternatives combined a correct cue (ELEPHANT) paired with incorrect responses (such as elbow). Condition 4 was identical to Condition 3, except that the cue-alone alternative was replaced with the correct cue–target. Finally, in Condition 5, not only was the correct cue–target pair presented at JOL (as in Condition 4) but it was marked as correct with asterisks.

According to a process version of TAM, the JOL conditions that produce the most accurate metamemory should vary according to the type of test. For recognition, JOL accuracy should be highest in Condition 4 because the processing elicited during JOLs most closely matches the test. In contrast, Condition 1 should produce the best performance for cued-recall tests; including incorrect alternatives at time of JOL might even hinder metamemory by providing irrelevant information. Thus, a critical prediction of the TAM hypothesis is that a significant interaction between type of JOL and type of test should emerge.

Table 1
Examples of the Five Delayed-JOL Conditions for a Hypothetical Item, ELEPHANT–sunburn

Condition  Description                                     Example of JOL prompt
1          Cue alone                                       ELEPHANT – ?
2          Cue–target                                      ELEPHANT – sunburn
3          Cue alone + incorrect alternatives              ELEPHANT – diamond
                                                           ELEPHANT – hillside
                                                           ELEPHANT – macaroni
                                                           ELEPHANT – bar
                                                           ELEPHANT –
                                                           ELEPHANT – elbow
                                                           ELEPHANT – sugar
4          Cue–target + incorrect alternatives             ELEPHANT – diamond
                                                           ELEPHANT – hillside
                                                           ELEPHANT – macaroni
                                                           ELEPHANT – bar
                                                           ELEPHANT – sunburn
                                                           ELEPHANT – elbow
                                                           ELEPHANT – sugar
5          Cue–target (marked) + incorrect alternatives    ELEPHANT – diamond
                                                           ELEPHANT – hillside
                                                           ELEPHANT – macaroni
                                                           ELEPHANT – bar
                                                           ELEPHANT – sunburn***
                                                           ELEPHANT – elbow
                                                           ELEPHANT – sugar

Note. JOL = judgment of learning.


Method

Participants and Materials

A total of 68 college undergraduates participated for course credit. The stimuli were 60 unrelated pairs of nouns from Paivio, Yuille, and Madigan’s (1968) norms. Up to 4 people participated simultaneously during experimental sessions, and all testing was conducted in individual cubicles using IBM-compatible PCs.

 Design and Procedures

We used a 2 × 5 (type of memory test, varied between subjects; by type of JOL, within subject) mixed factorial design. All JOLs were delayed by at least 2.5 min. On arrival, individuals were designated to receive either a cued-recall test or a seven-alternative forced-choice recognition test over all 60 items. Participants were informed which type of test would be administered. Incorrect alternatives for the recognition test were constructed from correct answers for other stimuli.

Participants studied the items at a rate of 4 s/pair. An additional 5 pairs of items (one in each JOL condition) were included at the beginning of the study phase as a primacy buffer. The 60 critical stimuli were divided into two blocks of 30 during the study phase (although these blocks were transparent to the participants). Following study, the items were presented for JOLs, again preceded by the 5 buffer items. The first block of 30 paired associates was presented in random order for JOLs, followed by the second block. Thus, at least 35 items (30 studied items plus the JOLs on the 5 buffer items) intervened between study and JOL. After providing all 60 JOLs, participants completed an unrelated filler activity for 10 min, followed by an untimed memory test (either cued recall or recognition).

For each person, 12 items were randomly assigned to each of the five JOL conditions. As described above, these JOL conditions were as follows:

1. Cue alone: The cue was shown at the top of the screen, followed by the phrase “The first word appears alone above.”

2. Cue + Target: The previously studied pair was shown at the top of the screen, followed by the phrase “The correct pair appears above.”

3. Cue alone with incorrect alternative pairs: The cue word (alone) was shown at the top of the screen, followed by six incorrect cue–target pairs and then the statement “The first word appears alone above, mixed with six incorrect pairs.”

4. Cue + Target with six incorrect alternative word pairs, followed by the statement “The correct pair appears above, mixed with six incorrect pairs.”

5. Cue + Target with six incorrect alternative word pairs, in which case the correct pair was noted by flanking asterisks followed by the statement “The correct pair is marked with ‘***’ above. Six incorrect pairs also are listed.”

Participants in all conditions were asked to rate how likely they were to recall or recognize the correct answer by selecting ratings of 0% (labeled definitely will not remember), 20%, 40%, 60%, 80%, or 100% (labeled definitely will remember) confident.

Results

An alpha level of .05 was used for all statistical tests except where noted. We computed η² as a measure of effect size for all statistically significant analyses of variance (ANOVAs), and we used guidelines based on Cohen (1988) to interpret η²: 0.01 = small effect size, 0.06 = medium effect size, and 0.14 = large effect size (see Clark-Carter, 1997, for details).
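As a rough illustration of the effect-size guidelines above, the sketch below recovers an effect size from an F ratio and its degrees of freedom and labels it with the stated cutoffs. It assumes the partial η² formula, F·df1 / (F·df1 + df2); the article does not say which variant was used, and not every reported value need be reproduced exactly by this formula. The function names and the "negligible" label are hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio (assumed formula, see lead-in)."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


def interpret_effect_size(eta_sq: float) -> str:
    """Label an effect size with the cutoffs quoted in the text."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # hypothetical label; the text gives no name for this range


# F(1, 37) = 6.41 is the test-type effect on gamma reported in the Results.
eta_sq = partial_eta_squared(6.41, 1, 37)
print(round(eta_sq, 2), interpret_effect_size(eta_sq))
```

With these inputs the result rounds to the .15 reported for that test, which is consistent with (but does not prove) the assumption that partial η² was used.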

 Metamemory Accuracy

We computed Goodman–Kruskal Gamma correlations (G) between JOL magnitude and memory test performance for each participant. G was undefined in some conditions in which there was a lack of variability in JOLs (i.e., using the same JOL rating for all 12 items) or test performance (i.e., scoring 0/12 or 12/12 on the memory test). Fourteen participants who received the recall test and 15 who received the recognition test had undefined Gs in one or more JOL conditions. Data from these participants were excluded from analyses; mean Gs from the remaining participants (n = 39) appear in Figure 1.
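The gamma statistic described above can be sketched as follows. This is a minimal illustration of the standard Goodman–Kruskal definition, (concordant − discordant) / (concordant + discordant) over all item pairs, not the authors' analysis code; the function name and the example ratings are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence


def goodman_kruskal_gamma(jols: Sequence[float],
                          correct: Sequence[int]) -> Optional[float]:
    """Gamma between JOL ratings and test outcomes (1 = correct, 0 = incorrect).

    Returns None when gamma is undefined, i.e. when every pair of items is
    tied on JOL or on outcome (no variability in ratings or in performance).
    """
    concordant = 0
    discordant = 0
    for (jol_a, hit_a), (jol_b, hit_b) in combinations(zip(jols, correct), 2):
        direction = (jol_a - jol_b) * (hit_a - hit_b)
        if direction > 0:
            concordant += 1   # higher JOL goes with the better outcome
        elif direction < 0:
            discordant += 1   # higher JOL goes with the worse outcome
        # ties on either variable are ignored, as in the usual definition
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ratings for four items in one JOL condition.
print(goodman_kruskal_gamma([0, 20, 80, 100], [0, 0, 1, 1]))  # perfect ordering
print(goodman_kruskal_gamma([40, 40, 40, 40], [0, 1, 1, 0]))  # same JOL throughout
```

The second call shows the undefined case that led to participants being excluded: with identical ratings (or a score of 0/12 or 12/12) no pair of items is informative, so the function returns None rather than a number.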

Gs were significantly lower for recognition tests than for recall tests (cf. Thiede & Dunlosky, 1994), F(1, 37) = 6.41, MSE = .36, η² = .15, and JOL type had a significant effect, F(4, 148) = 3.49, MSE = .18, η² = .09. The interaction between test type and JOL type, however, was not significant, F(4, 148) = 1.09, MSE = .18, p > .05. This is important because TAM predicts that monitoring will be best when stimuli (for the context-oriented view) or the processing used (for the processing-oriented view) at JOL match those at test. For recall, this would be JOL Condition 1: The JOL prompt is the cue alone, exactly like the final test. Although predictive accuracy was high in Condition 1 for recall, adding irrelevant alternatives to the context (Conditions 3 and 4) did not decrease G so long as the correct answer was not evident. For recognition, best performance would be expected in JOL Condition 4, where the unmarked cue–target pair is presented with the same six distractor pairs that would be present at test; G was intermediate in that condition.

Because the interaction was not significant (and because the pattern was similar across the two conditions), we combined Gs for recall and recognition tests and obtained the following mean Gs for Conditions 1–5, respectively: 0.69, 0.44, 0.72, 0.67, and 0.50. Mean Gs tended to be higher when the answer was not evident. Comparing all pairwise combinations of the five JOL conditions required 10 separate tests, so we used the Bonferroni correction procedure and adopted a more stringent alpha level (.05/10) = .005 for each t test. No significant differences in mean Gs emerged at the modified alpha level (observed p values for tests comparing Conditions 2 and 5 vs. Conditions 1, 3, and 4 ranged from .007 to .056). However, when recall and recognition tests were considered separately, the direction of the effect (higher G when the answer was not evident) was consistent in 12 of 12 cases; this is significant by a sign test (p = .0002). Overall, Gs did not vary as predicted by TAM but rather varied according to whether the correct answer was evident during JOLs.²

² Using 137 participants, we replicated this finding in a second experiment. The procedures were the same except that type of JOL varied between subjects and all participants received a recall test for 30 items and a recognition test for the remaining 30 items. This modification produced fewer indeterminate Gs and reduced the variability in mean scores but again completely failed to support the TAM hypothesis.
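The arithmetic behind the Bonferroni adjustment and the sign test reported in this section can be checked with a short sketch. It assumes a one-tailed binomial sign test with a null probability of .5; the article does not state the tail, so that is an assumption, and the function names are hypothetical.

```python
from math import comb


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha level after a Bonferroni correction."""
    return alpha / n_tests


def sign_test_p(successes: int, n: int) -> float:
    """One-tailed sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    favourable = sum(comb(n, k) for k in range(successes, n + 1))
    return favourable / 2 ** n


n_pairwise = comb(5, 2)                    # pairwise comparisons among 5 JOL conditions
print(n_pairwise, bonferroni_alpha(0.05, n_pairwise))
print(sign_test_p(12, 12))                 # effect in the same direction in 12 of 12 cases
```

Under the one-tailed assumption, 12 of 12 consistent cases gives a probability of 1/4096, which rounds to the reported .0002. A two-tailed test would double that value.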


 JOL Magnitude

Mean JOL magnitude across conditions appears in Table 2. A 5 × 2 mixed ANOVA showed a significant main effect of JOL type, F(4, 264) = 51.74, MSE = .02, η² = .44, and a significant interaction between JOL type and test, F(4, 264) = 3.56, MSE = .02, η² = .05. To follow up the interaction, we conducted separate one-way ANOVAs for the recall and recognition tests, and the effect of JOL type remained significant in both cases (Fs > 15). Next, we conducted post hoc paired t tests at alpha = .005 to examine differences across conditions. JOL magnitude in Conditions 2, 4, and 5 was significantly higher compared with Conditions 1 and 3 for both types of tests in all but one case (11 of 12 comparisons, p = .003 using a sign test). Overall, participants were more confident about their future memory performances when they saw the answer at time of JOL.

Test Performance

Performance was better on recognition tests than on recall tests (see Table 2), F(1, 66) = 48.73, MSE = .23, η² = .43. Type of JOL also had a strong influence on test performance, F(4, 264) = 46.58, MSE = .02, η² = .41. In addition, the interaction between test type and JOL type was significant, F(4, 264) = 4.92, MSE = .02, η² = .07. We conducted separate repeated measures ANOVAs for each type of test, and the influence of JOL type remained significant (Fs > 18) with large effect sizes (η² > .35) for both types of tests. The influence of JOL cues on test performance is clear: Participants’ memory was best when the correct answer was evident (Conditions 2 and 5), moderate when the answer was included but not distinguished from incorrect alternatives (Condition 4), and lowest when the answer was not shown (Conditions 1 and 3). Post hoc t tests were consistent with this interpretation (see subscripts in Table 2).

Discussion

The purpose of this study was to evaluate the effects of matching the processing elicited during delayed JOLs and the processing required at subsequent tests. The context-oriented version of the TAM hypothesis proposes that metamemory accuracy will increase as judgments and tests become more similar. One major prediction was the emergence of a reliable interaction between type of JOL and test: For recall tests, Condition 1 should have produced the highest Gs; for recognition tests, Condition 4 should have been the best. This interaction did not occur. At the same time, we failed to obtain evidence to support a processing-oriented version of TAM. Although metamemory accuracy was high in Condition 1 for recall, adding irrelevant alternatives during JOLs (thereby degrading the processing match) did not decrease metamemory in Conditions 3 and 4. For associative recognition, Condition 4 provided an exact match of context and processing at JOL and test, but metamemory accuracy did not improve as predicted. In fact, mean Gs were slightly higher in Conditions 1 and 3, which provided imperfect matches.

We found large increases in JOL magnitude and test performance when the correct answer was evident during delayed JOLs (Conditions 2 and 5) compared with when the answer was absent (Conditions 1 and 3). Seeing the correct cue–target item at time of JOL improved subsequent memory, and participants adjusted their JOLs accordingly. The opposite pattern of results was obtained for relative metamemory accuracy: Mean Gs were higher when the correct answer was absent at time of JOL and lower when the correct answer was evident. These results may have emerged because participants attempted to retrieve the answer during JOLs in Conditions 1 and 3, which provided highly diagnostic information regarding future memory performance. Conditions 2 and 5, on the other hand, provided an additional opportunity to learn the item but offered fewer diagnostic cues for JOLs.

Table 2
Mean JOL Magnitude and Test Performance by Type of JOL and Type of Test

                                        JOL condition
Test type and         1            2            3             4             5
performance        M     SE     M     SE     M      SE     M      SE     M     SE
Recall
  JOL magnitude   .22a   .04   .51b   .05   .24a    .03   .44b    .04   .51b   .04
  Performance     .17a   .03   .43c   .04   .14a    .03   .31b    .04   .41c   .04
Recognition
  JOL magnitude   .33a   .03   .52b   .04   .37a,c  .04   .44b,c  .04   .52b   .04
  Performance     .61a   .04   .79b   .04   .57a    .04   .58a    .05   .73b   .05

Note. Means in the same row with different subscripts were significantly different at p < .01 using post hoc paired t tests. JOL = judgment of learning.

Figure 1. Mean gammas as a function of judgment of learning (JOL) condition and type of test. Vertical bars represent standard errors of the mean.

Isolating Causes of Metamemory Accuracy

The main differences in metamemory accuracy can be summed up as follows: Metamemory accuracy in JOL Conditions 2 and 5 is lower than it is in the other three conditions. Why is this the case? First of all, Conditions 2 and 5 present the (clearly marked) correct answer during the JOL, removing the need for any kind of covert retrieval attempt, if in fact this is what participants are doing at JOL. Specifically, these conditions preclude the case of a failed retrieval attempt, which is particularly diagnostic (see Nelson, Narens, & Dunlosky, in press). If so, then anything that reduces the number of low JOLs (predictions of unsuccessful future recall) should reduce JOL accuracy. Second, those JOL conditions may induce an illusion of knowing (Glenberg, Wilkinson, & Epstein, 1982; Hart et al., 1992; Koriat, 1998), in which individuals develop a sense of overconfidence, believing that they know more than they do. Finally, presenting the targets at time of JOL may impair JOL accuracy simply by distorting the distribution of JOLs. That is, decreasing the frequency of low JOLs may reduce gammas for reasons having to do with measurement factors, not metacognitive factors. By restricting the range of JOLs, observed levels of G may be reduced. Compared with immediate JOLs, delayed JOLs induce many more JOLs at the extremes of the JOL continuum (Dunlosky & Nelson, 1994; Schwartz, 1994). In previous work (Weaver & Kelemen, 1997), however, we determined that this distribution shift was not the primary cause of the delayed-JOL effect.

In the present study, the different JOL conditions did induce

major differences in the distribution of JOLs. Table 3 displays thefrequency with which different JOLs were selected as a function of 

JOL condition. JOL Conditions 2 and 5 elicit far fewer judgments

of 0 than any other condition: less than 10% of the time for both

recall and recognition tests. The other JOL conditions elicited

JOLs of 0 between two and six times more frequently. At the same

time, these conditions produce different levels of correct perfor-

mance when conditionalized upon JOL (also shown in Table 3);

we refer to the patterns of conditional proportions correct as

calibration curves, following common practice in this field and in

 judgments and decision making (Hart et al., 1992; Nelson, 1996;

Stankov, 1998; Wallsten, 1996; Weaver, 1990). Theoretically,

perfect metacognitive accuracy would be indicated by proportions

correct that are identical to the JOL level (that is, items with JOLsof 80% would be answered correctly 80% of the time) and to be

independent of JOL frequency. When G is less than perfect,

though, the distribution of JOLs has a significant effect. G involves

a weighted averaging of items. An inaccurate prediction that

occurs frequently will significantly lower G. The same inaccuracy

Table 3
Frequency of JOL Usage and Conditional Proportion Correct by Type of JOL and Type of Test

                                              JOL
JOL condition   Measure               0     20    40    60    80    100

Recall
1               Frequency            .55   .19   .05   .07   .05   .09
                Proportion correct   .04   .09   .30   .34   .45   .83
2               Frequency            .10   .25   .20   .11   .13   .21
                Proportion correct   .20   .28   .55   .57   .47   .50
3               Frequency            .48   .25   .08   .07   .06   .07
                Proportion correct   .03   .07   .16   .29   .54   .69
4               Frequency            .22   .25   .13   .11   .13   .17
                Proportion correct   .03   .13   .37   .33   .58   .68
5               Frequency            .08   .26   .20   .14   .13   .19
                Proportion correct   .06   .31   .30   .54   .65   .53

Recognition
1               Frequency            .27   .36   .14   .04   .06   .13
                Proportion correct   .39   .56   .70  1.00   .88   .89
2               Frequency            .05   .24   .23   .18   .13   .17
                Proportion correct   .60   .60   .81   .82   .85   .99
3               Frequency            .20   .32   .20   .09   .07   .12
                Proportion correct   .36   .44   .63   .66   .79   .98
4               Frequency            .23   .27   .12   .08   .08   .23
                Proportion correct   .30   .31   .62   .85   .90   .94
5               Frequency            .08   .24   .19   .17   .13   .19
                Proportion correct   .45   .58   .69   .83   .83   .94

Note. JOL = judgment of learning.


observation, occurring infrequently, has much less of an effect.

Estimates of G, then, are influenced not only by the function

relating JOL and performance (the calibration curves) but also by

the relative frequency with which each JOL category is used.
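The dependence of G on JOL frequency can be illustrated with a minimal sketch, assuming G is computed as the Goodman-Kruskal gamma, (C - D) / (C + D), over all item pairs, where C and D are the numbers of concordant and discordant pairs. The function name and the toy items below are hypothetical and are not data from the experiment.

```python
from itertools import combinations


def goodman_kruskal_gamma(jols, outcomes):
    """Return (C - D) / (C + D) over all item pairs; tied pairs are ignored."""
    concordant = discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(jols, outcomes), 2):
        product = (j1 - j2) * (o1 - o2)
        if product > 0:
            concordant += 1      # higher JOL went with the better outcome
        elif product < 0:
            discordant += 1      # higher JOL went with the worse outcome
    if concordant + discordant == 0:
        return None              # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical items: (JOL, outcome) with outcome 1 = recalled, 0 = not recalled.
accurate = [(20, 0)] * 4 + [(80, 1)] * 4   # well-calibrated items
inaccurate = (100, 0)                      # confident but wrong


def gamma_with_errors(n_errors):
    items = accurate + [inaccurate] * n_errors
    jols, outcomes = zip(*items)
    return goodman_kruskal_gamma(jols, outcomes)


g_rare = gamma_with_errors(1)      # the inaccurate prediction occurs once
g_frequent = gamma_with_errors(3)  # the same inaccuracy occurs three times
```

In this toy case the single error gives 16 concordant and 4 discordant pairs (gamma = .60), whereas three copies of the same error give 12 discordant pairs (gamma of about .14), even though the relation between JOL and outcome for each kind of item is unchanged.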

To separate these two influences, we conducted a series of 

Monte Carlo simulations, similar to those performed by Weaver

(1990) and Weaver and Kelemen (1997). For each observation, we

first determined the JOL for that item using the frequency distributions displayed in Table 3. For example, in JOL Condition 1 for

the recall data, participants selected the 0% JOL rating 55% of the

time, the 20% rating 19% of the time, and so on. We generated a

random number between 0 and 1 and used this to determine the

item’s JOL: If the random number was less than .55, the item was

assigned a JOL of 0. If the number was between .55 and .74 (.55 +

.19), it was given a JOL of 20, and so on. Once an item received a

JOL—assume for illustrative purposes that the item was assigned

a JOL of 0 —the proportion correct data from Table 3 were used to

determine whether this item was successfully recalled. Another

random number was generated; if the number was less than .04

(the conditional proportion correct for JOL Condition 1, JOL =

0), the item was presumed to have been correctly recalled. This

was repeated for each of 60 items, for 50 participants per simulated

experiment. Each experiment was replicated 50 times.
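The procedure just described can be sketched as follows. This is a minimal illustration rather than our original simulation code: the function names, the use of a seeded Python random generator, and the choice to average gamma over simulated participants (skipping any participant whose gamma is undefined) are assumptions. The frequencies and conditional proportions correct are those of Table 3 for the recall test, JOL Condition 1.

```python
import random
from itertools import combinations

JOL_LEVELS = [0, 20, 40, 60, 80, 100]
# Table 3, recall test, JOL Condition 1.
FREQ = [.55, .19, .05, .07, .05, .09]
P_CORRECT = [.04, .09, .30, .34, .45, .83]


def gamma(jols, outcomes):
    """Goodman-Kruskal gamma over all item pairs; None if all pairs are tied."""
    concordant = discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(jols, outcomes), 2):
        product = (j1 - j2) * (o1 - o2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


def draw_jol_index(freq, rng):
    """Map a uniform random number onto a JOL category via cumulative frequencies."""
    u = rng.random()
    cumulative = 0.0
    for index, f in enumerate(freq):
        cumulative += f
        if u < cumulative:          # e.g. u < .55 -> JOL 0, u < .74 -> JOL 20
            return index
    return len(freq) - 1            # guard against rounding in the frequencies


def simulate_participant(freq, p_correct, n_items, rng):
    jols, outcomes = [], []
    for _ in range(n_items):
        index = draw_jol_index(freq, rng)
        jols.append(JOL_LEVELS[index])
        # A second random number decides whether the item is recalled.
        outcomes.append(1 if rng.random() < p_correct[index] else 0)
    return jols, outcomes


def simulate_experiment(freq, p_correct, n_participants=50, n_items=60, rng=None):
    """Mean gamma across simulated participants (assumed aggregation)."""
    rng = rng or random.Random()
    gammas = []
    for _ in range(n_participants):
        g = gamma(*simulate_participant(freq, p_correct, n_items, rng))
        if g is not None:
            gammas.append(g)
    return sum(gammas) / len(gammas) if gammas else None


def simulate_replications(freq, p_correct, n_replications=50, seed=0):
    """Grand mean over replicated simulated experiments."""
    rng = random.Random(seed)
    means = [simulate_experiment(freq, p_correct, rng=rng)
             for _ in range(n_replications)]
    return sum(means) / len(means)
```

Calling `simulate_replications(FREQ, P_CORRECT)` runs 50 simulated experiments of 50 participants and 60 items each; passing a different frequency list or proportion-correct list simulates another cell of the design.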

These procedures allowed us to separate the effects of JOL

distribution shifts from those due to differences in calibration

curves. For example, are the lower observed Gs in Conditions 2

and 5 an artifact of the relative infrequency of using JOLs of 0? If 

so, then assigning JOLs based on data where JOLs of 0 are more

frequent (such as recall, Condition 1) but using the same calibra-

tion curve should produce higher Gs. If the lower Gs reflect true

metacognitive impairments, then varying the calibration curves

while holding constant the JOL distributions should produce larger

effects. In all, 25 combinations of 5 JOL distributions and 5

conditional proportions correct were possible for both the recall

and the recognition data.
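The crossing of JOL distributions with calibration curves can be sketched for the recall data as below. Note the substitution: instead of Monte Carlo sampling, this sketch computes the large-sample (expected) gamma implied by a frequency distribution and a calibration curve, under the assumption that each item's JOL and outcome are drawn independently. The function names are hypothetical; the numbers are those of Table 3.

```python
# Table 3, recall test. Each row is a JOL condition (1-5);
# columns are JOL levels 0, 20, 40, 60, 80, 100.
RECALL_FREQ = [
    [.55, .19, .05, .07, .05, .09],
    [.10, .25, .20, .11, .13, .21],
    [.48, .25, .08, .07, .06, .07],
    [.22, .25, .13, .11, .13, .17],
    [.08, .26, .20, .14, .13, .19],
]
RECALL_P_CORRECT = [
    [.04, .09, .30, .34, .45, .83],
    [.20, .28, .55, .57, .47, .50],
    [.03, .07, .16, .29, .54, .69],
    [.03, .13, .37, .33, .58, .68],
    [.06, .31, .30, .54, .65, .53],
]


def expected_gamma(freq, p_correct):
    """Expected Goodman-Kruskal gamma for one distribution and one curve."""
    concordant = discordant = 0.0
    n = len(freq)
    for low in range(n):
        for high in range(low + 1, n):
            pair_weight = freq[low] * freq[high]
            # Concordant: lower-JOL item missed, higher-JOL item recalled.
            concordant += pair_weight * (1 - p_correct[low]) * p_correct[high]
            # Discordant: lower-JOL item recalled, higher-JOL item missed.
            discordant += pair_weight * p_correct[low] * (1 - p_correct[high])
    return (concordant - discordant) / (concordant + discordant)


def cross_conditions(freqs, curves):
    """Rows index the JOL distribution; columns index the calibration curve."""
    return [[expected_gamma(f, c) for c in curves] for f in freqs]


grid = cross_conditions(RECALL_FREQ, RECALL_P_CORRECT)
for condition, row in enumerate(grid, start=1):
    print(condition, " ".join(f"{g:.2f}" for g in row))
```

Reading across a row holds the JOL distribution constant while the calibration curve varies; reading down a column does the reverse, which is the comparison the simulations were designed to make.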

The results of the simulations are shown in Table 4. The main

diagonal indicates places where the accuracy of the simulations

can be checked with participants’ actual data. In 8 of 10 cases, the

simulated Gs were nearly perfect (within the 95% confidence

interval for the mean of participants’ data). In the others, the

pattern observed still mirrors the data actually obtained. Overall, we are satisfied that our simulations allow us to answer the

questions of interest.3

The results of the simulations are clear and striking. Although

changing the JOL distribution alters the Gs somewhat, varying the

calibration curves alters them substantially. Regardless of the

underlying JOL distributions, the calibration curves from Condi-

tions 2 and 5 (those at which the correct answer is displayed and

identified at JOL) produce substantially lower Gs. The effects are

particularly powerful with recall data. This is noteworthy because

the vast majority of JOL research uses cued recall as the dependent

variable. We conclude from these data that the poor metacognitive

3 We can speculate as to why our results in some simulated conditions

differed more than others. First of all, our simulations assume that JOLs are

distributed randomly across each participant and among all participants.

The condition in which our error was greatest, recognition JOL Condition

4, illustrates one consequence of this assumption. If we assume that all

participants use all categories equally, our simulated numbers are more

believable. However, if participants tended to use either the higher cate-

gories or the lower categories more frequently, observed gammas would be

lower than simulated gammas. This is true because conditional probabili-

ties for Categories 0 and 20 are almost identical, as are those for Categories

60, 80, and 100. Those using JOLs at only the higher range, for example,

will have many cases in which the item with the higher JOL is not more

likely to be recalled, lowering the gammas.

Table 4
Results (Mean gamma) of Monte Carlo Simulations Varying JOL Frequency and Conditional Proportion Correct

                                            Condition
JOL distribution   1          2          3          4          5          M

Recall
1                  .81, .84   .43        .81        .80        .70        .71
2                  .74        .28, .44   .74        .63        .42        .59
3                  .77        .43        .82, .83   .77        .66        .69
4                  .76        .35        .77        .70, .84   .53        .62
5                  .71        .27        .73        .61        .40, .57   .54
M                  .76        .35        .71        .70        .54

Recognition
1                  .56, .53   .45        .55        .63        .50        .54
2                  .58        .51, .45   .55        .72        .50        .57
3                  .58        .47        .54, .59   .66        .49        .55
4                  .62        .57        .64        .72, .49   .58        .62
5                  .59        .53        .59        .73        .42, .37   .59
M                  .59        .50        .57        .70        .50

Note. Simulated data are based on observed values from each experimental condition. Actual results (set in bold in the original) are the second value in each diagonal cell.


performance seen in Conditions 2 and 5 is a true deficit, not an

artifact of the shift in JOL distributions.

Our data, unfortunately, do not let us distinguish between the

two most compelling explanations of JOLs, the monitoring-dual

memories (MDM) hypothesis of Nelson and Dunlosky (Dunlosky

& Nelson, 1992, 1994, 1997; Nelson & Dunlosky, 1991, 1992) and

the self-fulfilling hypothesis of Spellman and Bjork (1992) and its

more recent variant, the memory hypothesis of Kimball and Metcalfe

(in press). Gs tended to be high when the correct answers

were not evident during JOLs; this is largely consistent with an

MDM account of JOLs. However, Gs were relatively high for

recall tests in Condition 4, even though the correct cue–target pair

was presented. At the same time, presenting target answers along

with cues at time of JOL also produced an increase in memory

accuracy (at the expense of metamemory accuracy); this is con-

sistent with the self-fulfilling and memory hypotheses. Nelson et

al.’s newly developed prejudgment recall and monitoring (PRAM)

procedure (unpublished manuscript), in which recall attempts are

made prior to JOLs, may allow this question to be addressed more

directly in future research.

Most important, our data strongly argue against a processing

view of TAM for paired associates. It is possible that support for

TAM may yet emerge using more complex stimulus materials such

as passages of text. Text materials permit a wider range of encod-

ing strategies and processing during judgment and test; this might

increase the importance of matching processing at these times

(though see Rawson, Dunlosky, & McDonald, 2002, for a discus-

sion that contradicts this view). At present, however, we see little

evidence to support TAM as a viable account of metamemory

accuracy.

References

Begg, I., Duft, S., Lalonde, P., Melnick, R., & Sanvito, J. (1989). Memory predictions are based on ease of processing. Journal of Memory and Language, 28, 610–632.

Blaxton, T. A. (1986). Investigating dissociations among memory measures: Support for a transfer appropriate processing framework (Doctoral dissertation, Purdue University, 1985). Dissertation Abstracts International, 47, 408.

Clark-Carter, D. (1997). Doing quantitative psychological research: From design to report. East Sussex, England: Psychology Press.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Drum, P. A., Calfee, R. C., & Cook, L. K. (1981). The effects of surface structure variables on performance in reading comprehension tests. Reading Research Quarterly, 16, 486–514.

Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory and Cognition, 20, 374–380.

Dunlosky, J., & Nelson, T. O. (1994). Does the sensitivity of judgments of learning (JOLs) to the effects of various study activities depend on when the JOLs occur? Journal of Memory and Language, 33, 545–565.

Dunlosky, J., & Nelson, T. O. (1997). Similarity between the cue for judgments of learning (JOL) and the cue for test is not the primary determinant of JOL accuracy. Journal of Memory and Language, 36, 34–49.

Glenberg, A. M., Wilkinson, A. C., & Epstein, W. (1982). The illusion of knowing: Failure in the self-assessment of comprehension. Memory and Cognition, 10, 597–602.

Graf, P., & Ryan, L. (1990). Transfer-appropriate processing for implicit and explicit memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 978–992.

Hart, J. T., Nelson, T. O., Gerler, D., Narens, L., Arbuckle, T. Y., Cuddy, L. A., et al. (1992). Metacognitive monitoring. In T. O. Nelson (Ed.), Metacognition: Core readings (pp. 131–231). Needham Heights, MA: Allyn & Bacon.

Kelemen, W. L. (2000). Metamemory cues and monitoring accuracy: Judging what you know and what you will know. Journal of Educational Psychology, 92, 800–810.

Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced memory at delays: Why do judgments of learning improve over time? Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1394–1409.

Kimball, D. R., & Metcalfe, J. (2002, November). Explaining the delayed-JOL effect: Evidence of a Heisenberg effect. Paper presented at the 43rd Annual Meeting of the Psychonomic Society, Kansas City, MO.

Kimball, D. R., & Metcalfe, J. (in press). Delaying judgments of learning affects memory, not metamemory. Memory & Cognition.

Koriat, A. (1997). Monitoring one’s own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370.

Koriat, A. (1998). Illusions of knowing: The link between knowledge and metaknowledge. In V. Y. Yzerbyt (Ed.), Metacognition: Cognitive and social dimensions (pp. 16–34). Thousand Oaks, CA: Sage.

Lockhart, R. S. (2002). Levels of processing, transfer-appropriate processing, and the concept of robust encoding. Memory, 10, 397–403.

Morris, C. D. (1978). Transfer appropriate processing between different encoding dimensions (Doctoral dissertation, Vanderbilt University, 1977). Dissertation Abstracts International, 39, 1017.

Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.

Nelson, T. O. (1996). Gamma is a measure of the accuracy of predicting performance on one item relative to another item, not of the absolute performance on an individual item. Applied Cognitive Psychology, 10, 257–260.

Nelson, T. O., & Dunlosky, J. (1991). When people’s judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The “delayed-JOL effect.” Psychological Science, 2, 267–270.

Nelson, T. O., & Dunlosky, J. (1992). How shall we explain the delayed-judgment-of-learning effect? Psychological Science, 3, 317–318.

Nelson, T. O., & Dunlosky, J. (1996, November). Toward the theoretical mechanisms underlying immediate versus delayed judgments of learning. Paper presented at the 37th Annual Meeting of the Psychonomic Society, Chicago.

Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 26, pp. 125–173). San Diego, CA: Academic Press.

Nelson, T. O., Narens, L., & Dunlosky, J. (in press). A revised methodology for research on metamemory: Pre-judgment recall and monitoring (PRAM). Psychological Methods.

Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monographs, 76 (1, Pt. 2).

Rajaram, S., Srinivas, K., & Roediger, H. L. (1998). A transfer-appropriate processing account of context effects in word-fragment completion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 993–1004.

Rawson, K. A., Dunlosky, J., & McDonald, S. L. (2002). Influences of metamemory on performance predictions for text. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 55A, 505–524.

Roediger, H. L. (1990). Implicit memory: Retention without remembering. American Psychologist, 45, 1043–1056.


Roediger, H. L., Gallo, D. A., & Geraci, L. (2002). Processing approaches to cognition: The impetus from the levels-of-processing framework. Memory, 10, 319–332.

Schwartz, B. L. (1994). Sources of information in metamemory: Judgments of learning and feelings of knowing. Psychonomic Bulletin & Review, 1, 357–375.

Spellman, B. A., & Bjork, R. A. (1992). When predictions create reality: Judgments of learning may alter what they are intended to assess. Psychological Science, 3, 315–316.

Spellman, B. A., & Bjork, R. A. (1997, November). When prophecy succeeds (too well): Inaccurate judgments of learning can produce better-than-perfect predictions. Paper presented at the 38th Annual Meeting of the Psychonomic Society, Philadelphia.

Stankov, L. (1998). Calibration curves, scatterplots and the distinction between general knowledge and perceptual tasks. Learning and Individual Differences, 10, 29–50.

Thiede, K. W., & Dunlosky, J. (1994). Delaying students’ metacognitive monitoring improves their accuracy in predicting their recognition performance. Journal of Educational Psychology, 86, 290–302.

Wallsten, T. S. (1996). An analysis of judgment research analyses. Organizational Behavior and Human Decision Processes, 65, 220–226.

Weaver, C. A., III. (1990). Constraining factors in calibration of comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 214–222.

Weaver, C. A., III, & Kelemen, W. L. (1997). Judgments of learning at delays: Shifts in response patterns or increased metamemory accuracy? Psychological Science, 8, 318–321.

Received February 20, 2002

Revision received March 20, 2003

Accepted May 8, 2003  
