depth of processing and the retention of words in episodic

27
Journal ol Experimental Psychology: General 1975, Vol. 104, No. 3, 268-294 Depth of Processing and the Retention of Words in Episodic Memory Fergus I. M. Craik and Endel Tulving University of Toronto, Toronto, Ontario, Canada SUMMARY Ten experiments were designed to explore the levels of processing framework for human memory research proposed by Craik and Lockhart (1972). The basic notions are that the episodic memory trace may be thought of as a rather auto- matic by-product of operations carried out by the cognitive system and that the durability of the trace is a positive function of "depth" of processing, where depth refers to greater degrees of semantic involvement. Subjects were induced to process words to different depths by answering various questions about the words. For example, shallow encodings were achieved by asking questions about type- script; intermediate levels of encoding were accomplished by asking questions about rhymes; deep levels were induced by asking whether the word would fit into a given category or sentence frame. After the encoding phase was completed, subjects were unexpectedly given a recall or recognition test for the words. In general, deeper encodings took longer to accomplish and were associated with higher levels of performance on the subsequent memory test. Also, questions lead- ing to positive responses were associated with higher retention levels than questions leading to negative responses, at least at deeper levels of encoding. Further experiments examined this pattern of effects in greater analytic detail. It was established that the original results did not simply reflect differential encod- ing times; an experiment was designed in which a complex but shallow task took longer to carry out but yielded lower levels of recognition than an easy, deeper task. Other studies explored reasons for the superior retention of words associated with positive responses on the initial task. Negative responses were remembered as well as positive responses when the questions led to an equally elaborate encoding in the two cases. The idea that elaboration or "spread" of encoding provides a better description of the results was given a further boost by the finding of the typical pattern of results under intentional learning conditions, and where each word was exposed for 6 sec in the initial phase. While spread and elaboration may indeed be better descriptive terms for the present findings, retention depends critically on the qualitative nature of the encoding operations performed; a minimal semantic analysis is more beneficial than an extensive structural analysis. Finally, Schulman's (1974) principle of congruity appears necessary for a complete description of the effects obtained. Memory performance is enhanced to the extent that the context, or encoding question, forms an integrated unit with the word presented. A congruous encoding yields superior memory performance because a more elaborate trace is laid down and because in such cases the struc- ture of semantic memory can be utilized more effectively to facilitate retrieval. The article concludes with a discussion of the broader implications of these data and ideas for the study of human learning and memory, 268

Upload: others

Post on 16-Oct-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Depth of Processing and the Retention of Words in Episodic

Journal ol Experimental Psychology: General1975, Vol. 104, No. 3, 268-294

Depth of Processing and the Retention of Wordsin Episodic Memory

Fergus I. M. Craik and Endel TulvingUniversity of Toronto, Toronto, Ontario, Canada

SUMMARY

Ten experiments were designed to explore the levels of processing frameworkfor human memory research proposed by Craik and Lockhart (1972). The basicnotions are that the episodic memory trace may be thought of as a rather auto-matic by-product of operations carried out by the cognitive system and that thedurability of the trace is a positive function of "depth" of processing, where depthrefers to greater degrees of semantic involvement. Subjects were induced toprocess words to different depths by answering various questions about the words.For example, shallow encodings were achieved by asking questions about type-script; intermediate levels of encoding were accomplished by asking questionsabout rhymes; deep levels were induced by asking whether the word would fit intoa given category or sentence frame. After the encoding phase was completed,subjects were unexpectedly given a recall or recognition test for the words. Ingeneral, deeper encodings took longer to accomplish and were associated withhigher levels of performance on the subsequent memory test. Also, questions lead-ing to positive responses were associated with higher retention levels than questionsleading to negative responses, at least at deeper levels of encoding.

Further experiments examined this pattern of effects in greater analytic detail.It was established that the original results did not simply reflect differential encod-ing times; an experiment was designed in which a complex but shallow task tooklonger to carry out but yielded lower levels of recognition than an easy, deepertask. Other studies explored reasons for the superior retention of words associatedwith positive responses on the initial task. Negative responses were rememberedas well as positive responses when the questions led to an equally elaborate encodingin the two cases. The idea that elaboration or "spread" of encoding provides abetter description of the results was given a further boost by the finding of thetypical pattern of results under intentional learning conditions, and where eachword was exposed for 6 sec in the initial phase. While spread and elaborationmay indeed be better descriptive terms for the present findings, retention dependscritically on the qualitative nature of the encoding operations performed; aminimal semantic analysis is more beneficial than an extensive structural analysis.

Finally, Schulman's (1974) principle of congruity appears necessary for acomplete description of the effects obtained. Memory performance is enhancedto the extent that the context, or encoding question, forms an integrated unit withthe word presented. A congruous encoding yields superior memory performancebecause a more elaborate trace is laid down and because in such cases the struc-ture of semantic memory can be utilized more effectively to facilitate retrieval.The article concludes with a discussion of the broader implications of these dataand ideas for the study of human learning and memory,

268

Page 2: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 269

While information-processing models ofhuman memory have been concerned largelywith structural aspects of the system, thereis a growing tendency for theorists to focus,rather, on the processes involved in learningand remembering. Thus the theorist's task,until recently, has been to provide anadequate description of the characteristicsand interrelations of the successive stagesthrough which information flows. An al-ternative approach is to study more directlythose processes involved in remembering—processes such as attention, encoding, re-hearsal, and retrieval—and to formulate adescription of the memory system in termsof these constituent operations. This alter-native viewpoint has been advocated byCermak (1972), Craik and Lockhart(1972), Hyde and Jenkins (1969, 1973),Kolers (1973a), Neisser (1967), and Paivio(1971), among others, and it represents asufficiently different set of fundamentalassumptions to justify its description as anew paradigm, or at least a miniparadigm,in memory research. How should we con-ceptualize learning and retrieval operationsin these terms? What changes in the sys-tem underlie remembering? Is the "mem-ory trace" best regarded as some copy ofthe item in a memory store (Waugh & Nor-man, 1965), as a bundle of features (Bower,1967), as the record resulting from theperceptual and cognitive analyses carriedout on the stimulus (Craik & Lockhart,1972), or do we remember in terms of theencoding operations themselves (Neisser,1967; Kolers, 1973a) ? Although we arestill some way from answering these crucialquestions satisfactorily, several recent stud-ies have provided important clues.

The incidental learning situation, in whichsubjects perform different orienting tasks,

The research reported in this article was sup-ported by National Research Council of CanadaGrants A8261 and A8632 to the first and secondauthors, respectively. The authors gratefullyacknowledge the assistance of Michael Anderson,Ed Darte, Gregory Mazuryk, Marsha Carnat,Marilyn Tiller, and Margaret Barr.

Requests for reprints should be sent to F. I. M.Craik, Erindale College, University of Toronto,Mississauga, Ontario, LSL 1C6, Canada.

provides an experimental setting for thestudy of mental operations and their effectson learning. It has been shown that whensubjects perform orienting tasks requiringanalysis of the meaning of words in a list,subsequent recall is as extensive and ashighly structured as the recall observedunder intentional conditions in the absenceof any specific orienting task; further re-search has indicated that a "process"explanation is most compatible with theresults (Hyde, 1973; Hyde & Jenkins,1969, 1973; Walsh & Jenkins, 1973).Schulman (1971) has also shown that asemantic orienting task is followed byhigher retention of words than a "struc-tural" task in which the nonsemantic aspectsof the words are attended to. Similar find-ings have been reported for the retention ofsentences (Bobrow & Bower, 1969; Rosen-berg & Schiller, 1971; Treisman & Tux-worth, 1974) and in memory for faces(Bower & Karlin, 1974). In all theseexperiments, an orienting task requiringsemantic or affective judgments led tobetter memory performance than tasksinvolving structural or syntactic judgments.However, the involvement of semanticanalyses is not the whole story: Schulman(1974) has shown that congruous queriesabout words (e.g., "Is • a SOPRANO asinger?") yield better memory for thewords than incongruous queries (e.g., "IsMUSTARD concave?"). Instruction to formimages from the words also leads to excel-lent retention (e.g., Paivio, 1971; Sheehan,1971).

The results of these studies have impor-tant theoretical implications. First, theydemonstrate a continuity between incidentaland intentional learning—the operationscarried out on the material, not the intentionto learn, as such, determine retention. Theresults thus corroborate Postman's (1964)position on the essential similarity of inci-dental and intentional learning, although therecent work is more usually described interms of similar processes rather than sim-ilar responses (Hyde & Jenkins, 1973).Second, it seems clear that attention to theword's meaning is a necessary prerequisiteof good retention. Third, since retrieval

Page 3: Depth of Processing and the Retention of Words in Episodic

270 FERGUS I. M. CRAIK AND ENDEL TULVING

conditions are typically held constant inthe experiments described above, the dif-ferences in retention reflect the effects ofdifferent encoding operations, although thepicture is complicated by the finding thatdifferent encoding operations are optimalfor different retrieval conditions (e.g.,Eagle & Leiter, 1964; Jacoby, 1973).Fourth, large differences in recall underdifferent encoding operations have beenobserved under conditions where the sub-jects' task does not entail organization orestablishment of interitem associations;thus the results seem to take us beyondassociative and organization processes, asimportant determinants of learning andretention. It may be, of course, that theorienting tasks actually do lead to organiz-ation as suggested by the results of Hydeand Jenkins (1973). Yet, it now becomespossible to entertain the hypothesis thatoptimal processing of individual words, quaindividual words, is sufficient to supportgood recall. Finally, the experiments mayyield some insights into the nature of learn-ing operations themselves. Classical verballearning theory has not been much con-cerned with processes and changes withinthe system but has concentrated largely onmanipulations of the material or the experi-mental situation and the resulting effectson learning. Thus at the moment, we knowa lot about the effects of meaningfulness,word frequency, rate of presentation, var-ious learning instructions, and the like, butrather little about the nature and character-istics of underlying or accompanyingmental events. Experimental and theo-retical analysis of the effects of variousencoding operations holds out the promisethat intentional learning can be reducedto, and understood in terms of, some com-bination of more basic operations.

The experiments reported in the presentpaper were carried out to gain further in-sights into the processes involved in goodmemory performance. The initial experi-ments were designed to gather evidencefor the depth of processing view of mem-ory outlined by Craik and Lockhart (1972).These authors proposed that the memorytrace could usefully be regarded as the by-

product of perceptual processing; just asperception may be thought to be composedof a series of analyses, proceeding fromearly sensory processing to later semantic-associative operations, so the resultantmemory trace may be more or less elab-orate depending on the number and qualita-tive nature of the perceptual analyses car-ried out on the stimulus. It was furthersuggested that the durability of the memorytrace is a function of depth of processing.That is, stimuli which do not receive fullattention, and are analyzed only to a shal-low sensory level, give rise to very transientmemory traces. On the other hand, stimulithat are attended to, fully analyzed, andenriched by associations or images yield adeeper encoding of the event, and a long-lasting trace.

The Craik and Lockhart formulationprovides one possible framework to accom-modate the findings from the incidentallearning studies cited above. It has theadvantage of focusing attention on the pro-cesses underlying trace formation and onthe importance of encoding operations;also, since memory traces are not seen asresiding in one of several stores, the depthof processing approach eliminates the neces-sity to document the capacity of postulatedstores, to define the coding characteristic ofeach store, or to characterize the mechanismby which an item is transferred from onestore to another. Despite these advantages,there are several obvious shortcomings ofthe Craik and Lockhart viewpoint. Doesthe levels of processing framework say anymore than "meaningful events are wellremembered" ? If not, it is simply a collec-tion of old ideas in a somewhat differentsetting. Further, the position may actuallyrepresent a backward step in the study ofhuman memory since the notions are muchvaguer than any of the mathematical modelsproposed, for example, in Norman's (1970)collection. If we already know that thememory trace can be precisely representedas

I =

(Wickelgren, 1973), then such woollystatements as "deeper processing yields a

Page 4: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 271

more durable trace" are surely far behindus. Third, and most serious perhaps, thevery least the levels position requires issome independent index of depth—there areobvious dangers of circularity present inthat any well-remembered event can tooeasily be labeled deeply processed.

Such criticisms can be partially countered.First, cogent arguments can be marshaled(e.g., Broadbent, 1961) for the advantagesof working with a rather general theory-—•provided the theory is still capable of gen-erating predictions which are distinguish-able from the predictions of other theories.From this general and undoubtedly truestarting point, the concepts can be refined inthe light of experimental results suggestedby the theoretical framework. In thissense the levels of processing viewpoint willencourage rather different types of questionand may yield new insights. A furtherpoint on the issue of general versus specifictheories is that while strength theories ofmemory are commendably specific and so-phisticated mathematically, the sophistica-tion may be out of place if the basic premisesare of limited generality or even wrong. Itis now established, for example, that thetrace of an event can be readily retrieved inone environment of retrieval cues, while itis retrieved with difficulty in another (e.g.,Tulving & Thomson, 1973) ; it is hard toreconcile such a finding with the view thatthe probability of retrieval depends only onsome unidimensional strength.

With regard to an independent index ofprocessing depth, Craik and Lockhart(1972) suggested that, when other thingsare held constant, deeper levels of process-ing would require longer processing times.Processing time cannot always be taken asan absolute indicator of depth, however,since highly familiar stimuli (e.g., simplephrases or pictures) can be rapidly analyzedto a complex meaningful level. But withinone class of materials, or better, with onespecific stimulus, deeper processing isassumed to require more time. Thus, inthe present studies, the time to make deci-sions at different levels of analysis wastaken as an initial index of processingdepth.

The purpose of this article is to describe10 experiments carried out within the levelsof processing framework. The first experi-ments examined the-plausibility of the basicnotions and attempted to rule out alterna-tive explanations of the results. Furtherexperiments were carried out in an attemptto achieve a better characterization of depthof processing and how it is that deepersemantic analysis yields superior memoryperformance. Finally, the implications ofthe results for an understanding of learningoperations are examined, and the adequacyof the depth of processing metaphor ques-tioned.

EXPERIMENTAL INVESTIGATIONS

Since one basic paradigm is used through-out the series of studies, the method will bedescribed in detail at this point. Variationsin the general method will be indicated aseach study is described.

General MethodTypically, subjects were tested individually.

They were informed that the experiment con-cerned perception and speed of reaction. On eachtrial a different word (usually a common noun)was exposed in a tachistoscope for 200 msec.Before the word was exposed, the subject wasasked a question about the word. The purposeof the question was to induce the subject to pro-cess the word to one of several levels of analysis,thus the questions were chosen to necessitateprocessing either to a relatively shallow level(e.g., questions about the word's physical appear-ance) or to a relatively deep level (e.g., questionsabout the word's meaning). In some experiments,the subject read the question on a card; in others,the question was read to him. After reading orhearing the question, the subject looked in thetachistoscope with one hand resting on a yesresponse key and the other on a no response key.One second after a warning "ready" signal theword appeared and the subject recorded his (orher) decision by pressing the appropriate key(e.g., if the question was "Is the word an animalname?" and the word presented was TIGER, thesubject would respond yes). After a series ofsuch question and answer trials, the subject wasunexpectedly given a retention test for the words.The expectation was that memory performancewould vary systematically with the depth ofprocessing.

Three types of question were asked in theinitial encoding phase, (a) An analysis of thephysical structure of the word was effected byasking about the physical structure of the word

Page 5: Depth of Processing and the Retention of Words in Episodic

272 FERGUS I. M. CRAIK AND ENDEL TULVING

TABLE 1TYPICAL QUESTIONS AND RESPONSES USED IN THE EXPERIMENTS

Level of processingAnswer

Question Yes No

StructuralPhonemicCategorySentence

Is the word in capital letters?Does the word rhyme with WEIGHT?Is the word a type of fish?Would the word fit the sentence:"He met a in the street"?

TABLEcrateSHARK

FRIEND

tableMARKETheaven

cloud

(e.g., "Is the word printed in capital letters?").(b) A phonemic level of analysis was induced byasking about the word's rhyming characteristics(e.g., "Does the word rhyme with TRAIN?").(c) A semantic analysis was activated by askingeither categorical questions (e.g., "Is the wordan animal name?") or "sentence" questions (e.g.,"Would the word fit the following sentence:The girl placed the on the table'?").Further examples are shown in Table 1. At eachof the three levels of analysis, half of the ques-tions yielded yes responses and half no responses.

The general procedure thus consisted ofexplaining the perceptual-reaction time task to asingle subject, giving him a long series of trialsin which both the type of question and yes-nodecisions were randomized, and finally giving himan unexpected retention test. This test was eitherfree recall ("Recall all the words you have seenin the perceptual task, in any order") ; cued recall,in which some aspect of each word event was re-presented as a cue; or recognition, where copiesof the original words were re-presented alongwith a number of distractors. In the initial en-coding phase, response latencies were in factrecorded: A millisecond stop clock was started bythe timing mechanism which activated the tachisto-scope, and the clock was stopped by the subject'skey response. Typically, over a group of sub-jects, the same pool of words was used, but eachword was rotated through the various level andresponse combinations (CAPITALS ?-yes; SEN-TENCE ?-no, and so on). The general predictionwas that deeper level questions would take longerto answer but would yield a more elaborate mem-ory trace which in turn would support higherrecognition and recall performance.

Experiment 1Method. In the first experiment, single subjects

were given the perceptual-reaction time test; thisencoding phase was followed by a recognition test.Five types of question were used. First, "Is therea word present?" Second, "Is the word in cap-ital letters?" Third, "Does the word rhyme with

?" Fourth, "Is the word in the cat-egory ?" Fifth, "Would the word fitin the sentence ?" When the first typeof question was asked ("Is there a word pres-ent?"), on half of the trials a word was present

and on half of the trials no word was present onthe tachistoscope card; thus, the subject couldrespond yes when he detected any wordlike pat-tern on the card. (This task may be rather dif-ferent from the others and was not used infurther experiments; also, of course, it yieldsdifficulties of analysis since no word is presentedon the negative trials, these trials cannot beincluded in the measurement of retention.)

The stimuli used were common two-syllablenouns of 5, 6, or 7 letters. Forty trials weregiven; 4 words represented each of the 10 condi-tions (5 levels X yes-no). The same pool of 40words was used for all 20 subjects, but each wordwas rotated through the 10 conditions so that, fordifferent subjects, a word was presented as arhyme-jie.? stimulus, a category-wo stimulus andso on. This procedure yielded 10 combinationsof questions and words; 2 subjects received eachcombination. On each trial, the question wasread to the subject who was already lookingin the tachistoscope. After 2 sec, the word wasexposed and the subject responded by saying yesor no—his vocal response activated a voice keywhich stopped a millisecond timer. The experi-menter recorded the response latency, changed theword in the tachistoscope, and read the nextquestion; trials thus occurred approximatelyevery 10 sec.

After a brief rest, the subject was given a sheetwith the 40 original words plus 40 similar dis-tractors typed on it. Any one subject hadactually only seen 36 words as no word waspresented on negative "Word present?" trials. Hewas asked to check all words he had seen on thetachistoscope. No time limit was imposed forthis task. Two different randomizations of the80 recognition words were typed; one random-ization was given to each member of the pair ofsubjects who received identical study lists. Thuseach subject received a unique presentation-recognition combination. The 20 subjects werecollege students of both sexes paid for theirservices.

Results and discussion. The results areshown in Table 2. The upper portionshows response latencies for the differentquestions. Only correct answers were in-

Page 6: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 273

eluded in the analysis. The median latencywas calculated for each subject; Table 2shows mean medians. Although the fivequestion levels were selected intuitively, thetable shows that in fact response latencyrises systematically as the questions neces-sitated deeper processing. Apart from thesentence level, yes and no responses tookequivalent times. The median latencyscores were subjected to an analysis ofvariance (after log transformation). Theanalysis showed a significant effect of level,F(4, 171) = 35.4, p < .001, but no effectof response type (yes-no) and no inter-action. Thus, intuitively deeper questions—semantic as opposed to structural deci-sions about the word—required slightlylonger processing times (150-200 msec).

Table 2 also shows the recognition re-sults. Performance (the hit rate) increasedsubstantially from below 20% recognizedfor questions concerning structural charac-teristics, to 96% correct for sentence-yesdecisions. The other prominent feature ofthe recognition results is that the yes re-sponses to words in the initial perceptualphase were accompanied by higher sub-sequent recognition than the no responses.Further, the superiority of recognition ofyes words increased with depth (until thetrend was apparently halted by a ceilingeffect). These observations were confirmedby analysis of variance on recognition pro-portions (after arc sine transformation).Since the first level (word present?) hadonly yes responses, words from this levelwere not included in the analysis. Type ofquestion was a significant factor, F(3, 133)= 52.8, p < .001, as was response type (yes-no), F(\, 133) =40.2, /X.001. TheQuestion X Response Type interaction wasalso significant, F(3, 133) = 6.77, p < .001.

The results have thus shown that differ-ent encoding questions led to different re-sponse latencies; questions about the sur-face form of the word were answered com-paratively rapidly, while more abstractquestions about the word's meaning tooklonger to answer. If processing time is anindex of depth, then words presented aftera semantic question were indeed processedmore deeply. Further, the different encod-

TABLE 2INITIAL DECISION LATENCY AND RECOGNITION

PERFORMANCE FOR WORDS AS A FUNCTION OFINITIAL TASK (EXPERIMENT 1)

Responsetype

Level of processing1 2 3 4 S

Response latency (msec)

YesNo

591590

614625

689678

711716

746832

Proportion recognized

YesNo

.22 .18.14

.78

.36.93.63

.96

.83

ing questions were associated with markeddifferences in recognition performance:Semantic questions were followed by higherrecognition of the word. In fact, Table 2shows that initial response latency is sys-tematically related to subsequent recogni-tion. Thus, within the limits of the presentassumptions, it may be concluded thatdeeper processing yields superior retention.

It is of course possible to argue that thehigher recognition levels are more simplyattributable to longer study times. Thispoint will be dealt with later in the paper,but for the present it may be noted that inthese terms, 200 msec of extra study timeled to a 400% improvement in retention.It seems more reasonable to attribute theenhanced performance to qualitative differ-ences in processing and to conclude thatmanipulation of levels of processing at thetime of input is an extremely powerfuldeterminant of retention of word events.The reason for the superior recognition ofyes responses is not immediately apparent—it cannot be greater depth of processing inthe simple sense, since yes and no responsestook the same time for each encoding ques-tion. Further discussion of this point isdeferred until more experiments are de-scribed.

Experiment 2 is basically a replication ofExperiment 1 but with a somewhat tidierdesign and with more recognition distrac-tors to remove ceiling effects.

Experiment 2Method. Only three levels of encoding were

used in this study: questions concerning type-

Page 7: Depth of Processing and the Retention of Words in Episodic

274 FERGUS I. M. CRAIK AND ENDEL TULVING

CASE RHYME SENTENCE CASE RHYME SENTENCE

LEVEL OF PROCESSING

FIGURE 1. Initial decision latency and recognition performance for wordsas a function of the initial task (Experiment 2).

script (uppercase or lowercase), rhyme questions,and sentence questions (in which subjects weregiven a sentence frame with one word missing).During the initial perceptual phase 60 questionswere presented: 10 yes and 10 no questions ateach of the three levels. Question type was ran-domized within the block of 60 trials. The ques-tion was presented auditorily to the subject; 2sec later the word appeared in the tachistoscopefor 200 msec. The subject responded as rapidlyas possible by pressing one of two response keys.After completing the 60 initial trials, the subjectwas given a typed list of 180 words comprisingthe 60 original words plus 120 distractors. Hewas told to check all words he had seen in thefirst phase.

All words used were five-letter common con-crete nouns. From the pool Of 60 words, twoquestion formats were constructed by randomlyallocating each word to a question type until all10 words for each question type were filled. Inaddition, two orders of question presentation andtwo random orderings of the 180-word recogni-tion list were used. Three subjects were testedon each of the eight combinations thus generated.The 24 subjects were students of both sexes paidfor their services and tested individually.

Results and discussion. The left-handpanel of Figure 1 shows that responselatency rose systematically for both- responsetypes, from case questions to rhyme ques-tions to sentence questions. These dataagain are interpreted as showing that deeperprocessing took longer to accomplish. At

each level, positive and negative responsestook the same time. An analysis of varianceon mean medians yielded an effect of ques-tion type, F(2, 46) = 46.5, p < .001, butyielded no effect of response type and nointeraction.

Figure 1 also shows the recognitionresults. For yes words, performance in-creased from 15% for case decisions to 81%for sentence decisions—more than a five-fold increase in hit rate for memory per-formance for the same subjects in the sameexperiment. Recognition of no words alsoincreased, but less sharply from 19% (case)to 49% (sentence). An analysis of vari-ance showed a question type (level of pro-cessing) effect, F(2, 46) = 118, p < .001,a response type (yes-no} effect, F(\, 23)= 47.9, p < .001, and a Question Type XResponse Type interaction, F(2, 46) =22.5, p < .001.

Experiment 2 thus replicated the resultsof Experiment 1 and showed clearly (a)Different encoding questions are associatedwith different response latencies—this find-ing is interpreted to mean that semanticquestions induce a deeper level of analysisof the presented word, (b) positive andnegative responses are equally fast, (c)

Page 8: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 275

recognition increases to the extent that theencoding question deals with more abstract,semantic features of the word, and (d)words given a positive response are asso-ciated with higher recognition performance,but only after rhyme and category ques-tions.

The data from Figure 1 are replotted inFigure 2, in which recognition performanceis shown as a function of initial categoriza-tion time. Both yes and no functions arestrikingly linear, with a steeper slope foryes responses. This pattern of data sug-gests that memory performance may simplybe a function of processing time as such(regardless of "level of analysis"). Thissuggestion is examined (and rejected) inthis article, where we argue that level ofanalysis, not processing time, is the criticaldeterminant of recognition performance.

Experiments 3 and 4 extended the gen-erality of these findings by showing thatthe same pattern of results holds in recalland under intentional learning conditions.

Experiment 3Method. Three levels of encoding were again

included in the study by asking questions abouttypescript (case), rhyme, and sentences. On eachtrial the question was read to the subject; after2 sec the word was exposed for 200 msec on thetachistoscope. The subject responded by press-ing the relevant response key. At the end ofthe encoding trials, the subject was allowed torest for 1 min and was then asked to recall asmany words as he could. In Experiment 3, thisfinal recall task was unexpected—thus the initialencoding phase may be considered an incidentallearning task—while in Experiment 4, subjectswere informed at the beginning of the sessionthat they would be required to recall the words.

Pilot studies had shown that the recall levelin this situation tends to be low. Thus, to boostrecall, and to examine the effects of encodinglevel on recall more clearly, half of the words inthe present study were presented twice. In all,48 different words were used, but 24 were pre-sented twice, making a total of 72 trials. Of the24 words presented once only, 4 were presentedunder each of the six conditions (three types ofquestion X yes-no). Similarly, of the 24 wordspresented twice, 4 were presented under each ofthe six conditions. When a word was repeated,it always occurred as the 20th item after its firstpresentation; that is, the lag between first andsecond presentations was held constant. On itssecond appearance, the same type of question wasasked as on the word's first appearance but, for

'500 600 700 800 900 1000

INITIAL DECISION TIME (msec)

FIGURE 2. Proportion of words recognized asa function of initial decision time (Experiment2).

rhyme and sentence questions, a different specificquestion was asked. Thus, when the word TRAINfell into the rhyme-yes category, the questionasked on its first presentation might have been"Does the word rhyme with BRAIN?" while onthe second presentation the question might havebeen "Does the word rhyme with CRANE?" Forcase questions the same question was asked on thetwo occurrences since each subject was given thesame question throughout the experiment '(e.g.,"Is the word in lowercase?"). This procedurewas adopted as early work had shown that sub-jects' response latencies were greatly slowed ifthey had to associate yes responses to both upper-case and lowercase words.

A constant pool of 48 words was used for allsubjects. The words were common concretenouns. Five presentation formats were constructedin which the words were randomly allocated tothe various encoding conditions. Four subjectswere tested on each format: Two made yesresponses with their right hand on the rightresponse key while two used the left-hand keyfor yes responses. The 20 student subjects werepaid for their services. They were told that theexperiment concerned perception and reaction time;they were warned that some words would occurtwice, but they were not informed of the finalrecall test.

Results and discussion. Response laten-cies are shown in Table 3. For each sub-ject and each experimental condition (e.g.,case-yes) the median response latency wascalculated for the eight words presented ontheir first occurrence (i.e., the four wordspresented only once, and the first occurrenceof the four repeated words). The median

Page 9: Depth of Processing and the Retention of Words in Episodic

276 FERGUS I. M. CRAIK AND ENDEL TULVING

TABLE 3RESPONSE LATENCIES FOR EXPERIMENTS

3 AND 4

Condition Case Rhyme Sentence

1st presentation

Incidental(Exp. 3)

YesNo

Intentional(Exp. 4)

68970S

816725

870872

YesNo

687685

796768

897911

2nd presentation

Incidental(Exp. 3)

YesNo

Intentional(Exp. 4)YesNo

616634

609599

689725

684716

771856

793866

Note. Mean medians of response latencies are presented.

latency was also calculated for the fourrepeated words on their second presentation.Only correct responses were included in thecalculation of the medians. Table 3 showsthe mean medians for the various experi-mental conditions. There was a systematicincrease in response latency from case ques-tion to sentence questions. Also, responselatencies were more rapid on the word'ssecond presentation—this was especiallytrue for yes responses. These observationswere confirmed by an analysis of variance.The effect of question type was significant,F(2, 38) = 14.4, p < .01, but the effect ofresponse type was not (F < 1.0). Repeatedwords were responded to reliably faster,F(l, 19) = 10.3, p < .01 and the Numberof Presentations X Response Type (yes-no}interaction was significant, F(l, 19) = 5.33,p < .05.

Thus, again, deeper level questions tooklonger to process, but yes responses tookno longer than no responses. The extrafacilitation shown by positive responses onthe second presentation may be attributableto the greater predictive value of yes ques-

tions. For example, the second presenta-tion of a rhyme question may remind thesubject of the first presentation and thusfacilitate the decision.

Figure 3 shows the recall probabilitiesfor words presented once or twice. Thereis a marked effect of question type (sen-tence > rhymes > case); retention is againsuperior for words given an initial yesresponse and recall of twice-presented wordsis higher than once-presented words. Ananalysis of variance confirmed these obser-vations. Semantic questions yielded higherrecall, F(2, 38) = 36.9, p < .01; more yesresponses than no responses were recalled,F(l, 19) = 21.4, p < .01; two presenta-tions increased performance, F(l, 19) =33.0, p < .01. In addition, semanticallyencoded words benefited more from the sec-ond presentation, as shown by the signifi-cant Question Level X Number of Presen-tations interaction, F(2, 38) = 10.8, p <.01.

Experiment 3 thus confirmed that deeperlevels of encoding take longer to accomplishand that yes and no responses take equalencoding times. More important, semanticquestions led to higher recall performanceand more yes response words were recalledthan no response words. These basic re-sults thus apply as well to recall as they doto recognition. Experiments 1-3 have usedan incidental learning paradigm; there aregood reasons to believe that the incidentalnature of the task is not critical for the ob-tained pattern of results to appear (Hyde& Jenkins, 1973). Nevertheless, it wasdecided to verify Hyde and Jenkins' con-clusion using the present paradigm. Thus,Experiment 4 was a replication of Experi-ment 3, but with the difference that sub-jects were informed of the final recall taskat the beginning of the session.

Experiment 4Method. The material and procedures were

identical to those in Experiment 3 except thatsubjects were informed of the final free recalltask. They were told that the memory task wasof equal importance to the initial phase and thatthey should thus attempt to remember all wordsshown in the tachistoscope. A 10-min period wasallowed for recall. The subjects were 20 college

Page 10: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 277

CASE RHYME SENTENCE CASE RHYME SENTENCE

LEVEL OF PROCESSING

FIGURE 3. Proportion of words recalled as a function of the initial task(Experiment 3).

students, none of whom had participated in Experi-ments 1, 2, or 3.

Results and discussion. The responselatencies are shown in Table 3. These dataare very similar to those from Experiment3, indicating that subjects took no longer torespond under intentional learning instruc-tions. Analysis of variance showed thatdeeper levels were associated with longerdecision latencies, F(2, 38) = 27.7, p <.01, and that second presentations were re-sponded to faster, F(l, 19) = 18.9, p <.01. No other effect was statisticallyreliable.

With regard to the recall results, theanalysis of variance yielded significanteffects of processing level, F(2, 38) = 43.4,p < .01, of repetition, F(\, 19) = 69.7, p< .01, and of response type (yes-no}, F(l,19) = 13.9, p < .01. In addition, the Num-ber of Presentations X Level of Processinginteraction, F(2, 38) = 12.4, p < .01, andthe Number of Presentations X ResponseType (yes^no) interaction, F(\, 19) = 7.93,p < .025, were statistically reliable. Figure4 shows that these effects were attributableto superior recall of sentence decisions,

twice-presented words and yes responses.Words associated with semantic questionsand with yes responses showed the greatestenhancement of recall after a second presen-tation.

To further explore the effects of inten-tional versus incidental conditions morecomprehensive analyses of variance werecarried out, involving the data from bothExperiments 3 and 4. For the latency data,there was no significant effect of the inten-tional-incidental manipulation, nor did theintentional-incidental factor interact withany other factor. Thus, knowledge of thefinal recall test had no effect on subjects'decision times. In the case of recall scores,intentional instructions yielded superiorperformance, F(l, 38) = 11.73, p < .01,and the Intentional-Incidental X Number ofPresentations interaction was significant,F(l, 38) = 5.75, p < .05. This latter ef-fect shows that the superiority of inten-tional instructions was greater for twice-presented items. No other interaction in-volving the incidental-intentional factor wassignificant. It may thus be concluded thatthe pattern of results obtained in the present

Page 11: Depth of Processing and the Retention of Words in Episodic

278 FERGUS I. M. CRAIK AND ENDEL TULVtNG

CASE RHYME SENTENCE CASE RHYME SENTENCE

LEVEL OF PROCESSING

FIGURE 4. Proportion of words recalled as a function of the initial task(Experiment 4).

experiments does not depend critically onincidental instructions.

The findings that intentional recall wassuperior to incidental recall, but that deci-sion times did not differ between intentionaland incidental conditions, is at first sightcontrary to the theoretical notions proposedin the introduction to this article. If recallis a function of depth of processing anddepth is indexed by decision time, thenclearly differences in recall should be asso-ciated with differences in initial responselatency. However, it is possible that fur-ther processing was carried out in the inten-tional condition, after the orienting taskquestion was answered, and was thus notreflected in the decision times.

Discussion of Experiments 1—4

Experiments 1-4 have provided empiricalflesh for the theoretical bones of the argu-ment advanced by Craik and Lockhart(1972). When semantic (deeper level)questions were asked about a presentedword, its subsequent retention was greatlyenhanced. This result held for both recog-nition and recall; it also held for both inci-

dental and intentional learning (Hyde &Jenkins, 1969, 1973; Till & Jenkins, 1973).The reported effects were both robust, andlarge in magnitude: Sentence-^« wordsshowed recognition and recall levels whichwere superior to case-wo words by a factorranging from 2.4 to 13.6. Plainly, the na-ture of the encoding operation is an impor-tant determinant of both incidental andintentional learning and hence of retention.

At the same time, some aspects of thepresent results are clearly inconsistent withthe depth of processing formulation outlinedin the introduction. First, words given ayes response in the initial task were betterrecalled and recognized than words given ano response, although reaction times to yesand no responses were identical. Eitherreaction time is not an adequate index ofdepth, or depth is not a good predictor ofsubsequent retention. We will argue theformer case. If depth of processing (definedloosely as increasing semantic-associativeanalysis of the stimulus) is decoupled fromprocessing time, then on the one hand theindependent index of depth has been lost,but on the other hand, the results of Experi-

Page 12: Depth of Processing and the Retention of Words in Episodic

DEPTH OF" PROCESSING AND WORD RETENTION 273

rttents 1-4 can be described in terms ofqualitative differences in encoding opera-tions rather than simply in terms of in-creased processing times. The followingsection describes evidence relevant to thequestion of whether retention performanceis primarily a function of "study time" orthe qualitative nature of mental operationscarried out during that time.

The results obtained under intentionallearning conditions (Experiment 4) arealso not well accommodated by the initialdepth of processing notions. If the largedifferences in retention found in Experi-ments 1-3 are attributable to differentdepths of processing in the rather literalsense that only structural analyses are acti-vated by the case judgment task, phonemicanalyses are activated by rhyme judgments,and semantic analyses activated by categoryor sentence judgments, then surely underintentional learning conditions the subjectwould analyse and perceive the name andmeaning of the target word with all threetypes of question. In this case equal reten-tion should ensue (by the Craik and Lock-hart formulation), but Experiment 4 showedthat large differences in recall were stillfound.

A more promising notion is that retentiondifferences should be attributed to degreesof stimulus elaboration rather than to differ-ences in depth. This revised formulationretains the important point (borne out byExperiments 1-4) that the qualitative na-ture .of encoding operations is critical forthe establishment of a durable trace, butgets away from the notions that semanticanalyses necessarily always follow structuralanalyses and that no meaning is involved inshallow processing tasks.

Discussion of the best descriptive frame-work for these studies will be resumed afterfurther experiments are reported; for themoment, the term depth is retained to signifygreater degrees of semantic involvement.Before further discussions of the theoreticalframework are presented, the following sec-tion describes attempts to evaluate the rela-tive effects of processing time and the qual-itative nature of encoding operations on theretention of words.

PROCESSING TIME VERSUS ENCODINGOPERATIONS

As a first step, the data from Experiment2 were examined for evidence relating theeffects of processing time to subsequentmemory performance. At first sight, Ex-periment 2 provided evidence in line withthe notion that longer categorization timesare associated with higher retention levels—Figure 2 demonstrated linear relationshipsbetween initial decision latency and sub-sequent recognition performance. How-ever, if it is processing time which deter-mines performance, and not the qualitativenature of the task, then within one task,longer processing times should be associatedwith superior memory performance. Thatis, with the qualitative differences in pro-cessing held constant, performance shouldbe determined by the time taken to make theinitial decision. On the other hand, if dif-ferences in encoding operations are criticalfor differences in retention, then memoryperformance should vary between orientingtasks, but within any given task, retentionlevel should not depend on processing time.

This point was explored by analyzing thedata from Experiment 2 in terms of fast andslow categorization times. The 10 responselatencies for each subject in each conditionwere divided into the 5 fastest responsesand the 5 slowest responses. Next, meanrecognition probabilities for the fast andslow subsets of words were calculated acrossall subjects for each condition. The resultsof this analysis are shown in Figure 5;mean medians for the response latencies ineach subset are plotted against recognitionprobabilities. If processing time werecrucial, then the words which fell into theslow subset for each task should have beenrecognized at higher levels than words whichelicited fast responses. Figure 5 showsthat this did not happen; Slow responseswere recognized little better than fastresponses within each level of analysis. Onthe other hand, the qualitative nature ofthe task continued to exert a very largeeffect on recognition performance, suggest-ing again that it is the nature of the encod-

Page 13: Depth of Processing and the Retention of Words in Episodic

280 FERGUS I. M. CRAIK AND ENDEL TULVING

I.U

.9QUJ 8Nz |7

O 6LJ

.5

i .4h-CC 3

§ 'Z

. 1

0

YES Decisions

_A

-

-

• _„

- i

^

NO Decisions

.

-A-̂ 4 SENTENCE• — • RHYME AD — D CASE _ -̂- "

* —

• °~~ ̂ ^̂ — *•

1 1 1 I 1 1 1 t500 600 700 800 900 500 600 700

INITIAL DECISION TIME (msec)

800 900

FIGURE 5. Recognition of words as a function of task and initial decisiontime: Data partitioned into fast and slow decision times (Experiment 2).

ing operations and not processing time whichdetermines memory performance.

For both yes and no responses, slow casecategorization decisions took longer thanfast sentence decisions. However, wordsabout which subjects had made sentencedecisions showed higher levels of recogni-tion; 73% as opposed to 17% for yes re-sponses and 45% as opposed to 17% for noresponses. No statistical analysis wasthought necessary to support the conclusionthat task rather than time is the crucialaspect in these experiments. Since thepoint is an important one, however, a fur-ther experiment was conducted to clinch theissue. Subjects were given either a com-plex structural task or a simple semantictask to perform; it was predicted that thecomplex structural task would take longerto accomplish but that the semantic taskwould yield superior memory performance.

Experiment 5Method. The purpose of Experiment S was to

devise a shallow nonsemantic task which wasdifficult to perform and would thus take longerthan an easy but deeper semantic task. In thisway, further evidence on the relative contribu-tions of processing time and processing depth tomemory performance could be obtained. In bothtasks, a five-letter word was shown in the tachisto-scope for 200 msec and the subject made a yes-wodecision about the word. The nonsemantic deci-sion concerned the pattern of vowels and con-sonants which made up the word. Where V =

vowel and C = consonant, the word brain couldbe characterized as CCVVC, the word uncle asVCCCV, and so on. Before each nonsemantictrial the subject was shown a card with a partic-ular consonant-vowel pattern typed on it; afterstudying the card as long as necessary, the sub-ject looked into the tachistoscope and the wordwas exposed. The experiment was again describedas a perceptual, reaction time study concerningdifferent aspects of words and the subject wasinstructed to respond as rapidly as possible bypressing one of two response keys. The seman-tic task was the sentence task from previousstudies in the series. In this case, the subjectwas shown a card with a short sentence typedon it; the sentence had one missing word, thusthe subject's task was to decide whether the wordon the tachistoscope screen would fit the sentence.Examples of sentence-jiM trials are: "The manthrew the ball to the " (CHILD) and"Near her bed she kept a " (CLOCK).On sentence-Mo trials an inappropriate noun fromthe general pool was exposed on the tachistoscope.Again the subject responded as rapidly as pos-sible. The subjects were not informed of thesubsequent memory test.

The pool of words used consisted of 120 highfrequency, concrete five-letter nouns. Each sub-ject received 40 words on the initial decisionphase of the task and was then shown all 120words, 40 targets and 80 distractors mixed ran-domly, in the second phase. He was then askedto recognize the 40 words he had been shown onthe tachistoscope by circling exactly 40 words.Two forms of the recognition test were typed withthe same 120 words randomized differently. Inall, 24 subjects were tested in the experiment.The pool of 120 words was arbitrarily parti-tioned into three blocks of 40 words; the first 8subjects received one block of 40 as targets and

Page 14: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 281

the remaining 80 words served as distractors;the second 8 subjects received the second blockof 40 words as targets and the third 8 subjectsreceived the third block of 40—in all cases theremaining 80 words formed the distractor pool.Within each group of 8 subjects who receivedthe same 40 target words, 4 received one formof the recognition test and 4 received the otherform. Finally, within each group of 4 subjects,each word was rotated so .that it appeared (fordifferent subjects) in all four conditions: non-semantic yes and no and semantic yes and no.

Each subject was tested individually. Afterthe two tasks had been explained, he was given afew practice trials, then received 40 further trials,10 under each experimental condition. The orderof presentation of conditions was randomized.After a brief rest period the subject was giventhe recognition list and told to circle exactly 40words (those he had just seen on the tachisto-scope), guessing if necessary. The subjects were24 undergraduate students of both sexes, paidfor their services.

Results. The results of the experimentare straightforward. Table 4 shows that thenonsemantic task took longer to accomplishbut that the deeper sentence task gave riseto higher levels of recognition. Decisionsabout consonant-vowel structure of wordswere substantially slower than sentencedecisions'(1.7 sec as opposed to .85 sec)and this difference was significant statis-tically, F(l, 23) = 11.3, p < .01. Neitherthe response type (yes-no) nor the inter-action was significant. For recognition, theanalysis of variance showed that sentencedecisions gave rise to higher recognition,P(\, 23) = 40.9, p < .001; yes responseswere recognized better than no responses,F(l, 23) = 10.6, p < .01, but the Task XResponse Type interaction was not signifi-cant.

Experiment 5 has thus confirmed the con-clusion from the reanalysis of Experiment2; that it is the qualitative nature of the task—we argue, depth of processing—and notthe amount of processing time, which deter-mines memory performance. Figure 2illustrates that a deep semantic task takeslonger to accomplish and yields superiormemory performance, but when the twofactors are separated it is the task which iscrucial, not processing time as such.

One constant feature of Experiments 1-4has been the superior recall or recognitionof words given a yes response in the initial

TABLE 4DECISION LATENCY AND RECOGNITION PERFORM-

ANCE FOR WORDS AS A FUNCTION OF THE INITIALTASK (EXPERIMENT 5)

Response typeLevel of processing

Structure Sentence

Response latency (sec)

YesNo

1.701.74

.83

.88

Proportion recognized

YesNo

.57

.50.82.69

perceptual phase. This result has alsobeen reported by Schulman (1974). Thereasons for the better retention of yes re-sponses are not immediately apparent; forexample, it is not obvious that positiveresponses require deeper processing beforethe initial perceptual decision can be made.This problem invites a closer investigationof the yes-no difference and may perhapsforce a further reevaluation of the concept ofdepth.

POSITIVE AND NEGATIVE CATEGORIZATIONDECISIONS

Why are words to which positive re-sponses are made in the perceptual-decisiontask better remembered ? As discussed pre-viously, it does not seem intuitively reason-able that words associated with yes responsesrequire deeper processing before the deci-sion is made. However, if high levels ofretention are associated with "rich" or"elaborate" encodings of the word (ratherthan deep encodings), the differences inretention between positive and negativewords become understandable. In caseswhere a positive response is made, theencoding question and the target word canform a coherent, integrated unit. Thisintegration would be especially likely withsemantic questions: for example, "A four-footed animal?" (BEAR) or "The boy met a

— on the street" (FRIEND). How-ever,, integration of the question and tar-get word would be much less likely in thenegative case: "A four-footed animal ?"

Page 15: Depth of Processing and the Retention of Words in Episodic

282 FERGUS I. M. CRAIK AND ENDEL TULVING

(CLOUD) or "The boy met a onthe street" (SPEECH). Greater degrees ofintegration (or, alternatively, greater de-grees of elaboration of the target word)may support higher retention in the sub-sequent test. This factor of integration orcongruity (Schulman, 1974) between tar-get word and question would also apply torhyme questions but not to questions abouttypescript: If the target word is in capitalletters (a yes decision), the word's encod-ing would be elaborated no more than if theword had been presented in lowercase type(a no decision). This analysis is based onthe premise that effective elaboration of anencoding requires further descriptive attri-butes which (a) are salient, or applicable tothe event, and (b) specify the event moreuniquely. While positive semantic andrhyme decisions fit this description, neg-ative semantic and rhyme decisions andboth types of case decision do not. In linewith this analysis is the finding from Experi-ments 1-4 that while positive decisions areassociated with higher retention levels forsemantic and rhyme questions, words elicit-ing positive and negative decisions areequally well retained after typescript judg-ments.

If the preceding argument is valid, thenquestions leading to equivalent elaborationfor positive and negative decisions should befollowed by equivalent levels of retention.Questions which appear to meet the caseare those of the type "Is the object biggerthan a chair?" In this case both positivetarget words (HOUSE, TRUCK) and negativetarget words (MOUSE, PIN) should be en-coded with equivalent degrees of elabora-tion; thus, they should be equally wellremembered. This proposition was testedin Experiment 6.

Experiment 6Method, Eight descriptive dimensions were

used in the study: size, length, width, height,weight, temperature, sharpness, and value. Foreach of these dimensions, a set of eight concretenouns was generated, such that the dimension was asalient descriptive feature for the words in each set(e.g., size-ELEPHANT, MOUSE; value-DiAMOND,CRUMB). The words were chosen to span the com-plete range of the relevant dimension (e.g., fromvery small to very large; very hot to very cold).

For each set an additional reference object waschosen such that half of the objects represented bythe word set were "greater than" the reference ob-ject and half of the objects were "less than" thereferent. The reference object was always usedin the question pertaining to that dimension;examples were "Taller than a man?" (STEEPLE-yes; CHILD-WO), "More valuable than $10?"(JEWEL-JIM; BUTTON-WO). " Sharper than afork?" (NEEDLE-JIM; CLUB-no). For half of thesubjects, the question was reversed in sense, sothat words given a yes response by one group ofsubjects were given a no response by the othergroup. Thus, "Taller than a man?" became"Shorter than a man?" (STEEPLE-WO; CHILD-yes).

Each subject was asked questions relating totwo dimensions; he thus answered 16 questions—4 yielding positive responses and 4 yielding neg-ative responses for each dimension. Four dif-ferent versions of the questions and targets wereconstructed, with two different dimensions beingused in each version. Four subjects received eachversion—two received the original questions (e.g.,"heavier than . . ." "hotter than . . .") and tworeceived the questions reversed ("lighter than . . .""colder than . . ."). Thus each subject received16 questions; both question type and responsetype (yes-no) were randomized. Subjects were16 undergraduate students of both sexes; theywere paid for their services.

On each trial, the subject looked into a tachisto-scope; the question was presented auditorily, and2 sec later the target word was exposed for 1sec. The subject responded by pressing the ap-propriate one of two keys. Subjects were againtold that they had to make rapid judgments aboutwords; they were not informed of the retentiontest. After completing the 16 question trials,subjects were asked to recall the target words.Each subject was reminded of the questions hehad been asked. Thus, in this study, memorywas assessed in the presence of the originalquestions.

Results. Again, the results are mucheasier to describe than the procedure.Words given yes responses were recalledwith a probability of .36, while words givenno responses were recalled with a probabil-ity of .39. These proportions did not differsignificantly when tested by the Wilcoxontest. Thus, when positive and negativedecisions are equally well encoded, the re-spective sets of target words are equally wellrecalled. The results of this demonstrationstudy suggest that it is not the type ofresponse given to the presented word that isresponsible for differences in subsequentrecall and recognition, but rather the rich-

Page 16: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 283

ness or elaborateness of the encoding. Itis possible that negative decisions in Experi-ments 1-4 were associated with rather poorencodings of the presented words—they didnot fit the encoding question and thus didnot form an integrated unit with the ques-tion. On the other hand, positive responseswould be integrated with the question, andthus, arguably, formed more elaborate en-codings which supported better retentionperformance.

Experiment 7 was an attempt to manip-ulate encoding elaboration more directly.Only semantic information was involved inthis study. All encoding questions weresentences with a missing word; on half ofthe trials the word fitted the sentence (thusall queries were congruous in Schulman'sterms). The degree of encoding elabora-tion was varied by presenting three levelsof sentence complexity, ranging from verysimple, • spare sentence frames (e.g., "Hedropped the ") to complex, elaborateframes (e.g., "The old man hobbled acrossthe room and picked up the valuablefrom the mahogany table"). The wordpresented was WATCH in both cases. Al-though the second sentence is no morepredictive of the word, it should yield amore elaborate encoding and thus superiormemory performance.

Experiment 7Method. Three levels of sentence complexity

were used: simple, medium, and complex. Eachsubject received 20 sentence frames at each levelof complexity; within each set of 20 there were10 yes responses and 10 no responses. The 60encoding trials were randomized with respectto level of complexity and response type. Aconstant pool of 60 words was used in the experi-ment, but two completely different sets of en-coding questions were constructed. Words wererandomly allocated to sentence level and responsetype in the two sets (with the obvious constraintthat yes and no words clearly fitted or did notfit the sentence frame, respectively). Withineach set of sentence frames, two different ran-dom presentation orders were constructed. Fivesubjects were presented with each format thusgenerated and 20 subjects were tested in all.

The words used were common nouns. Examplesof sentence frames used are: simple, "She cookedthe " "The • is torn"; medium, "The

frightened the children" and "The ripetasted delicious"; complex, "The great bird

swooped down and carried off the struggling" and "The small lady angrily picked up

the red ." The sentence frames werewritten on cards and given to the subject. Afterstudying it he looked into the tachistoscope withone hand on each response key. After a readysignal the word was presented for 1.0 sec andthe subject responded yes or no by pressing theappropriate key. The words were exposed fora longer time in this study since the questionswere more complex. Subjects were again toldthat the experiment was concerned with percep-tion and speed of reaction and that they shouldthus respond as rapidly as possible. No mentionwas made of a memory test. The 20 subjectswere tested individually. They were undergrad-uate students of both sexes, paid for their services.

After completing the 60 encoding trials, sub-jects were given a short rest and then asked torecall as many words as they could from the firstphase of the experiment. They were given 8 minfor free recall. After a further rest, they weregiven the deck of cards containing the originalsentence frames (in a new random order) andasked to recall the word associated with eachsentence. Thus there were two retention tests inthis study: free recall followed by cued recall.

Results. Figure 6 shows the results.For free recall, there is no effect of sentencecomplexity in the case of no responses, buta systematic increase in recall from simpleto complex in the case of yes responses.The provision of the sentence frames ascues did not enhance the recall of no re-sponses, but had a large positive effect onthe recall of yes responses; the effect ofsentence complexity was also amplified incued recall. These observations were con-

i.o.9

QLd .8

SIMPLE MEDIUM COMPLEX

SENTENCE TYPE

FIGURE 6. Proportion of words recalled as afunction of sentence complexity (Experiment 7).(CR = cued recall, NCR = noncued recall.)

Page 17: Depth of Processing and the Retention of Words in Episodic

284 FERGUS I. M. CRAIK AND ENDEL TULVING

firmed by analysis of variance. In freerecall, a greater proportion of words givenpositive responses were recalled than thosegiven negative responses, F(l, 19) = 18.6,p < .001; the overall effect of complexitywas not significant, F(2, 38) = 2.37, p >.05, but the interaction between complexityand yes-no was reliable, F(2, 38) = 3.78,p < .05. A further analysis, involving posi-tive responses only, showed that greatersentence complexity was reliably associatedwith higher recall levels, F(2, 38) = 4.44,p < .025. In cued recall, there were sig-nificant effects of response type, F(\, 19)= 213, p < .001, complexity, F(2, 38) =49.2, p < .001, and the Complexity X Re-sponse Type interaction, F(2, 38) = 19.2,p < .001. An overall analysis of variance,incorporating both free and cued recall, wasalso carried out and this analysis revealedsignificantly higher performance for greatercomplexity, F(2, 38) = 36.5, p < .001,for positive target words, F(\, 19)= 139, p < .001, and for cued recall rela-tive to free recall, F(\, 19) = 100, p <.001. All the interactions were significantat the p < .01 level or better; the descrip-tion of these effects is provided by Figure 6.

Experiment 7 has thus demonstrated thatmore complex, elaborate sentence framesdo lead to higher recall, but only in the caseof positive target words. Further, theeffects of complexity and response type aregreatly magnified by reproviding the sen-tence frames as cues.

These results do not fit the original simpleview that memory performance is deter-mined only by the nominal level of pro-cessing. In all conditions of Experiment 7semantic processing of the target word wasnecessary, yet there were still large differ-ences in performance depending on sentencecomplexity, the relation between target wordand the sentence context, and the presenceor absence of cues. It seems that otherfactors besides the level of processing re-quired to make the perceptual decision areimportant determinants of memory perform-ance.

The notion of code elaboration providesa more satisfactory basis for describing theresults. If a presented word does not fit

the sentence frame, the subject cannot forma unified image or percept of the completesentence, the memory trace will not rep-resent an integrated meaningful pattern,and the word will not be well recalled. Inthe case of positive responses, such coherentpatterns can be formed and their degree ofcognitive elaborateness will increase withsentence complexity. While increased elab-oration by itself leads to some increase inrecall (possibly because richer sentenceframes can be more readily recalled) per-formance is further enhanced when part ofthe encoded trace is reprovided as a cue.It is well established that cuing aids recall,provided that the cue information has beenencoded with the target word at presenta-tion and thus forms part of the same encodedunit (Tulving & Thomson, 1973). Thepresent results are consistent with the find-ing, but may also be interpreted as showingthat a cue is effective to the extent that thecognitive system can encode the cue and thetarget as a congruous, integrated unit.Elaborate cues by themselves do not aidperformance even if they were presentedwith the target word at input, as shown bythe poor recall of negative response words.It is also necessary that the target and thecue form a coherent, integrated pattern.

Schulman (1974) reported results whichare essentially identical to the results ofExperiment 7. He found better recall ofcongruous than incongruous phrases; healso found that cuing benefited congruouslyencoded words much more than incongruouswords. Schulman suggests that congruentwords can form a relational encoding withtheir context, and that the context can thenserve as an effective redintegrative cue atrecall (Begg, 1972; Horowitz & Prytulak,1969). In these terms, Experiment 7 hasadded the finding that the semantic richnessof the context benefits congruent encodingsbut has no effect on the encoding of incon-gruous words.

Is the concept of depth still useful indescribing the present experimental results,or are the findings better described in termsof the "spread" of encoding where spreadrefers to the degrees of encoding elaborationor the number of encoded features? These

Page 18: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 285

questions will be taken up in the generaldiscussion, but in outline, we believe thatdepth still gives a useful account of themajor qualitative shifts in a word's encod-ing (from an analysis of physical featuresthrough phonemic features to semantic prop-erties). Within one encoding domain, how-ever, spread or number of encoded featuresmay be better descriptions. Before grap-pling with these theoretical issues, three finalshort experiments will be described. Thefindings from the preceding experimentswere so robust that it becomes of interestto ask under what conditions the effects ofdifferential encoding disappear. Experi-ments 8, 9, and 10 were attempts to setboundary limits on the phenomena.

FURTHER EXPLORATIONS OF DEPTH ANDELABORATION

The three studies described in this sec-tion were undertaken to examine furtheraspects of depth of processing and to throwmore light on the factors underlying goodmemory performance. The first experi-ment explored the idea that the critical dif-ference between case-encoded and sentence-encoded words might lie in the similarityof encoding operations within the group ofcase-encoded words. That is, each case-encoded word is preceded by the same ques-tion, "Is the word in capital letters?",whereas each rhyme-encoded and sentence-encoded word has its own unique question.At retrieval, it is likely that the subject useswhat he can remember of the encodingquestion to help him retrieve the targetword, Plausibly, encoding questions whichwere used for many target words would beless effective as retrieval cues since theydo not uniquely specify one encoded eventin episodic memory. This overloading ofretrieval cues would be particularly evidentfor case-encoded words. It is possible toextend the argument to rhyme-encodedwords also; although each target wordreceives a different rhyme question, pho-nemic differences may not be so unique ordistinctive as semantic differences (Lock-hart, Craik, & Jacoby, 1975).

Some empirical support for these ideasmay be drawn from two unpublished studiesby Moscovitch and Craik (Note 1). Thefirst study used the same paradigm as thepresent series and compared cued with non-cued recall, where the cues were the originalencoding questions. It was found that cuingenhanced recall, and that the effect of cuingwas greater with deeper levels of encoding.Thus the encoding questions do helpretrieval, and their beneficial effect isgreatest with semantically encoded words.The second study showed that when severaltarget words shared the same encodingquestion (e.g., "Rhymes with train?" BRAIN,CRANE, PLANE; "Animal category?" LION,HORSE, GIRAFFE), the sharing manipulationhad an adverse effect on cued recall. Fur-ther, the adverse effect was greatest fordeeper levels of encoding, suggesting thatthe normal advantage to deeper levels isassociated with the uniqueness of the en-coded question-target complex, and thatwhen this uniqueness is removed, themnemonic advantage disappears.

These ideas and findings suggest anexperiment in which a case-encoded wordis made more unique by being the one wordin an encoding series to be encoded in thisway. In this situation the one case wordmight be remembered as well as a word,which, nominally, received deeper process-ing. Such an experiment in its extremeform would be expensive to conduct, in thatone word forms the focus of interest. Ex-periment 8 pursues the idea of uniquenessin a less extreme form. Three groups ofsubjects each received 60 encoding trials;each trial consisted of a case, rhyme, orcategory question. However, each groupof subjects received a different number oftrials of each question type: either 4 case,16 rhyme, and 40 category trials; 16, 40,and 4 trials; or 40, 4, and 16 trials, respec-tively. The prediction was that while thetypical pattern of results would be foundwhen 40 trials of one type were given, sub-sequent recognition performance would beenhanced with smaller set sizes; this en-hancement would be especially marked forthe case level of encoding.

Page 19: Depth of Processing and the Retention of Words in Episodic

286 FERGUS I. M. CRAIK AND ENDEL TULVING

TABLE 5DESIGN AND RESULTS OF EXPERIMENT 8

Experimentalcondition

Case Rhyme

Yes No Yes No

Category

Yes No

Design: Number of trials per condition

Group 1Group 2Group 3

2820

2820

8202

8202

2028

2028

Proportion recognized

Group 1Group 2Group 3

Set size 4Set size 16Set size 40

.50

.51

.49

.50

.51

.49

.36

.40

.43

.36

.40

.43

.73

.66

.90

.90

.73

.66

.47

.54

.70

.70

.47

.54

.88

.95

.91

.95

.91

.88

.70

.64

.68

.64

.68

.70

Experiment 8Method. Three groups of subjects were tested.

Group 1 received 4 case questions, 16 rhymequestions, and 40 category questions. Group 2received 16, 40, and 4, respectively, while Group3 received 40, 4, and 16, respectively. At eachlevel of encoding, half of the questions were de-signed to elicit yes responses and half no responses.Thus each group received 60 trials; question typeand response type were randomized. The designis shown in Table 5.

The subjects were tested individually. Eachquestion was read by the experimenter while thesubject looked in the tachistoscope; the word wasexposed for 200 msec and the subject respondedby pressing one of two response keys. The sub-jects were informed that the test was a perceptual-reaction time task; the subsequent memory testwas not mentioned. After completing the 60 en-coding trials, each subject was given a sheetcontaining the 60 target words plus 120 distrac-tors. He was told to check exactly 60 words-—those words he had seen on the tachistoscope.

The same pool of 60 common nouns was usedas targets throughout the experiment. Withineach experimental group there were four pre-sentation lists; in each case Lists 1 and 2 differedonly in the reversal of positive and negative deci-sions (e.g., category-jiej in List 1 became cat-egory-no in List 2). Lsits 3 and 4 contained afresh randomization of the 60 words, but againLists 3 and 4 differed between themselves onlyin the reversal of positive and negative responses.In all, 32 subjects were tested in the experiment;11 each in Groups 1 and 2, and 10 in Group 3.Two or three subjects were tested under eachrandomization condition.

Results. Table 5 shows the proportionrecognized by each group. Each groupshows the typical pattern of results already

familiar from Experiments 1-4; there is noevidence of a perturbation due to set size.Table 5 also shows the recognition resultsorganized by set size; it may now be seenthat set size does exert some effect, mostconspicuously on rhyme-yes responses.However, the differences previously attri-buted to different levels of encoding werecertainly not eliminated by the manipula-tion of set size; in general, when set sizewas held constant (across groups), strongeffects of question type were still found.

To recapitulate, the argument underlyingExperiment 8 was that in the standard ex-periment, the encoding operation for casedecisions is, in some sense, always the same;for rhyme decisions, it is somewhat similarfrom word to word, and is most dissimilaramong words in the category task. If theisolation effect in memory (see Cermak,1972) is a consequence of uniqueness ofencoding operations, then when similar en-codings (e.g., "case decision" words) arefew in number, they should also be encodeduniquely, show the isolation effect, and thusbe well recalled. Table 5 shows that reduc-ing the number of case-encoded words from40 to 4 did not enhance their recall, thuslack of isolation cannot account for their lowretention. On the other hand, a reductionin set size did enhance the recall of rhyme-encoded words, thus isolation effects mayplay some part in these experiments,although they cannot account for all aspects

Page 20: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 287

of the results. Finally, it may be of someinterest that recall proportions for rhymes-Set Size 4 are quite similar to category-SetSize 40 (.90 and .70 vs. .88 and .70); thisobservation is at least in line with the notionthat when rhyme encodings are made moreunique, their recall levels are equivalent tosemantic encodings.

Experiment 9: A Classroom Demonstration

Throughout this series of experiments,experimental rigor was strictly observed.Words were exposed for exactly 200 msec;great care was exercised to ensure thatsubjects would not inform future subjectsthat a memory test formed part of the ex-periment; subjects were told that the experi-ments concerned perception and reactiontime; response latencies were painstakinglyrecorded in all cases. One of the authors,by nature more skeptical than the other, hadformed a growing suspicion that this rigorreflected superstitious behavior rather thanessential features of the paradigm. Thisfeeling of suspicion was increased by thefinding of the typical pattern of results inExperiment 9, which was conducted underintentional learning conditions. Accord-ingly, a simplified version of Experiment 2was formulated which violated many of therules observed in previous studies. Sub-jects were informed that the main purposeof the experiment was to study an aspect ofmemory; thus the final recognition test wasexpected and encoding was intentionalrather than incidental. Words were pre-sented serially on a screen at a 6-sec rate;during each 6-sec interval subjects recordedtheir response to the encoding question.Indeed, the subjects were tested in one groupof 12 in a classroom situation during a courseon learning and memory; they recordedtheir own judgments on a question sheet andsubsequently attempted to recognize the tar-get words from a second sheet. Reactiontimes were not measured.

The point of this study was not to attackexperimental rigor, but rather to deter-mine to what extent the now familiar pat-tern of results would emerge under thesemuch looser conditions. If such a patterndoes emerge, it will force a further examina-tion of what is meant by deeper levels of

TABLE 6PROPORTION OF WORDS RECOGNIZED FROM Two

REPLICATIONS OF EXPERIMENT 9

Responsetype Case Rhyme Category

1st study

YesNo

.23 .59

.28 .33.81.62

2nd study

YesNo

.42

.37.65.50

.90

.65

processing and what factors underlie thesuperior retention of deeply processedstimuli.

Method. On a projection screen, 60 words werepresented, one at a time, for 1 sec each with aS-sec interword interval. All subjects saw thesame sequence of words, but different subjectswere asked different questions about each word.For example, if the first word was COPPER, onesubject would be asked, "Is the word a metal?",a second, "Is the word a kind of fruit?", a third,"Does the word rhyme with STOPPER?", and soon. For each word, six questions were asked(case, rhyme, category X yes-no). During theseries of 60 words, each subject received 10 trialsof each question-response combination, but in adifferent random order. The questions were pre-sented in booklets, 20 questions per page. Sixtypes of question sheet were made up, each typepresented to two subjects. These sheets balancedthe words across question types. The subjectstudied the question, saw the word exposed on thescreen, then answered the question by checkingyes or no on the sheet. After the 60 encodingtrials, subjects received a further sheet contain-ing 180 words consisting of the original 60 targetwords plus 120 distractors. The subjects wereasked to check exactly 60 words as "old." Twodifferent randomizations of the recognition listwere constructed; this control variable was crossedwith the six types of question sheets. Thus eachof the 12 subjects served in a unique replicationof the experiment. Instructions to subjectsemphasized that their main task was to rememberthe words, and that a recognition test wouldbe given after the presentation phase. The ma-terials used are presented in the Appendix.

Result. The top of Table 6 shows thatthe results of Experiment 9 are quite similarto those of Experiment 2, despite the factthat in the present study subjects knew ofthe recognition test and words were pre-sented at the rate of 6 sec each. The find-ing that subjects show exactly the same pat-

Page 21: Depth of Processing and the Retention of Words in Episodic

288 FERGUS I. M. CRAIK AND ENDEL TULVING

tern of results under these very differentconditions attests to the fact that the basicphenomenon under study is a robust one.It parallels results from Experiment 4 andprevious findings of Hyde and Jenkins(1969, 1973). Before considering theimplications of Experiment 9, a replicationwill be mentioned. This second experimentwas a complete replication with 12 othersubjects. The results of the second studyare also shown in Table 6. Overall recog-nition performance was higher, especiallywith case questions, but the pattern is thesame.

The results of these two studies are quitesurprising. Despite intentional learningconditions and a slow presentation rate,subjects were quite poor at recognizingwords which had been given shallow encod-ings. Since subjects in this experimentwere asked to circle exactly 60 words, theycould not have used a strict criterion ofresponding. Thus their low level of recog-nition performance in the case task mustreflect inadequate initial registration of theinformation or rapid loss of registered infor-mation. Indeed, chance performance inthis task would be 33%; we have not cor-rected the data for chance in any experi-ment. The question now arises as to whysubjects do not encode case words to adeeper level during the time after theirjudgment was recorded. It is possible thatrecognition of the less well-encoded items issomehow adversely affected by well-encodeditems. It is also possible that subjects donot know how best to prepare for a memorytest and thus do no further processing ofeach word beyond the particular judgmentthat is asked. A third hypothesis, that sub-jects were poorly motivated and thus simplydid not bother to rehearse case words in amore effective way, is put to test in thefinal experiment. Here subjects were paidby results; in one condition the recognitionof case words carried a much higher rewardthan the recognition of category words.

In any event, Experiment 9 has demon-strated that encoding operations constitutean important determinant of learning orretention under a wide variety of experi-mental conditions. The finding of a strongeffect under quite loosely controlled class-

room conditions, without the trappings oftimers and tachistoscopes, is difficult toreconcile with the view that was implicit inthe initial experiments of the series: thatprocessing of an item is somehow stoppedat a particular level and that an additionalfraction of a second would have led to bet-ter performance. This view is thereforenow rejected. It seems to be the qualitativenature of the encoding achieved that isimportant for memory, regardless of howmuch time the system requires to reachsome hypothetical level or depth of encod-ing.

Experiment 10The final experiment to be reported was

carried out to determine whether subjectscan achieve high recognition performancewith case-encoded words if they are givena stronger inducement to concentrate onthese items. Subjects were paid for eachword correctly recognized; also, they wereinformed beforehand that a recognition testwould be given. Correct recognition of thethree types of word was differentially re-warded under three different conditions.Subjects know that case, rhyme, and cat-egory words carried either a 1 ,̂ 3(f, or 6^reward.

Method. Subjects were tested under the sameconditions as subjects in Experiment 9. Thatis, 60 words were presented for 1 sec each plusS sec for the subject to record his judgment.Each subject had 20 words under each encodingcondition (case, rhyme, category) with 10 yes and10 no responses in each condition. As in Experi-ment 9, each word appeared in each encodingcondition across different subjects. After theinitial phase, subjects were given a recognitionsheet of 180 words (60 targets plus 120 distrac-tors) and instructed to check exactly 60 words.

There were three experimental groups. Allsubjects were informed that the experiment wasa study of word recognition, that they would bepaid according to the number of words theyrecognized, and therefore that they shouldattempt to learn each word. The groups differedin the value associated with each class of word:Group 1 subjects knew that they would be paid10, 60, and 30 for case, rhyme, and categorywords, respectively; Group 2 subjects were paid30, 10, and 60, respectively; and Group 3 subjectswere paid 6tf, 30, and 10, respectively. Theseconditions are summarized in Table 7. Thus,across groups, each class of words was associatedwith each reward. There were 12 undergraduatesubjects in each of three groups.

Page 22: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 289

Results. Table 7 shows that while recog-nition performance was somewhat higherthan the comparable conditions of Experi-ment 9 (Table 6), the differential rewardmanipulation had no effect whatever. Ananalysis of variance confirmed the obvious;there were significant effects due to typeof encoding, F(2, 22) = 90.7, p < .01,response type (yes-no), F(l, 11) = 42.4,p < .01, and the Encoding X ResponseType interaction, F(2, 22) = 4.13, p < .05,but no significant main effect or interactionsinvolving the differential reward conditions.

Although this experiment yielded a nullresult, its results are not without interest.Even when subjects were presumably quitemotivated to learn and recognize case-encoded words, they failed to reach the per-formance levels associated with rhyme orcategory words. Subjects in Group 3(6-3-1) reported that although they reallydid attempt to concentrate on case words, .the category words were somehow "simplyeasier" to recognize in the second phase ofthe study.

Thus, Experiments 8, 9, and 10, con-ducted in an attempt to establish the bound-ary conditions for the depth of processingeffect, failed to remove the strong superi-ority originally found for semantically en-coded words. The effect is not due to iso-lation, in the simple sense at least (Experi-ment 8), it does not disappear under inten-tional learning conditions and a slow pre-sentation rate (Experiment 9), and it re-mains when subjects are rewarded more forrecognizing words with shallower encod-ings (Experiment 10). The problem nowis to develop an adequate theoretical con-text for these findings and it is to this taskthat we now turn.

GENERAL DISCUSSION

The experimental results will first bebriefly summarized. Experiments 1-4showed that when subjects are asked tomake various cognitive judgments aboutwords exposed briefly on a tachistoscope,subsequent memory performance is stronglydetermined by the nature of that judgment.Questions concerning the word's meaningyielded higher memory performance thanquestions concerning either the word's

TABLE 7PROPORTIONS OF WORDS RECOGNIZED UNDER

EACH CONDITION IN EXPERIMENT 10

Encodingoperation

Case

YesNo

RhymeYesNo

CategoryYesNo

MeanYesNo

Reward value1 cent 3 cents 6 cents

.50

.51

.73

.53

.93

.72

.72

.59

.51

.50

.73

.50

.89

.75

.71

.58

.54

.52

.69

.60

.88

.77

.70

.63

M

.52

.51

.72

.54

.90

.75

.71

.60

sound or the physical characteristics of itsprinted form. Further, positive decisionsin the initial task were associated withhigher memory performance (for moresemantic questions at least) than werenegative decisions. These effects wereshown to hold for recognition and recallunder incidental and intentional memoriz-ing conditions. One analysis of Experi-ment 2 showed that recognition increasedsystematically with initial categorizationtime, but a further analysis demonstratedthat it was the nature of the encoding op-erations which was crucial for retention,not the amount of time as such. Experi-ment 5 confirmed that conclusion. Experi-ments 6 and 7 explored possible reasonsfor the higher retention of words givenpositive' responses; it was argued that en-coding elaboration provided a more satis-factory description of the results than depthof encoding. Experiment 8 showed thatisolation effects could not by themselvesgive an account of the results, Experiment9 demonstrated that the main findings stilloccurred under much looser experimentalconditions, and Experiment 10 showed thatthe pattern of results was unaffected whendifferential rewards were offered for remem-bering words associated with differentorienting tasks.

This set of results confirms and extendsthe findings of other recent investigations,

Page 23: Depth of Processing and the Retention of Words in Episodic

290 FERGUS I. M. CRAIK AND ENDEL TULVING

notably the series of studies by Hyde, Jenk-ins, and their colleagues (Hyde, 1973; Hydeand Jenkins, 1969, 1973; Till & Jenkins,1973; Walsh & Jenkins, 1973) and bySchulman (1971, 1974). It is abundantlyclear that what determines the level of recallor recognition of a word event is not inten-tion to learn, the amount of effort involved,the difficulty of the orienting task, theamount of time spent making judgmentsabout the items, or even the amount ofrehearsal the items receive (Craik & Wat-kins, 1973) ; rather it is the qualitativenature of the task, the kind of operationscarried out on the items, that determinesretention. The problem now is to developan adequate theoretical formulation whichcan take us beyond such vague statementsas "meaningful things are well remem-bered."

Depth of Processing

Craik and Lockhart (1972) suggestedthat memory performance depends on thedepth to which the stimulus is analyzed.This formulation implies that the stimulusis processed through a fixed series of ana-lyzers, from structural to semantic; thatthe system stops processing the stimulusonce the analysis relevant to the task hasbeen carried out, and that judgment timemight serve as an index of the depth reachedand thus of the trace's memorability.

These original notions now seem unsatis-factory in a number of ways. First, thepostulated series of analyzers cannot lie ona continuum since structural analyses do notshade into semantic analyses. The modifiedview of "domains" of encoding (Sutherland,1972) was suggested by Lockhart, Craik,and Jacoby (1975). The modificationpostulates that while some structuralanalysis must precede semantic analysis,a full structural analysis is not usually car-ried out; only those structural analysesnecessary to provide evidence for subsequentdomains are performed. Thus, in the casewhere a stimulus is highly predictable atthe semantic level, only rather minimalstructural analysis, sufficient to confirm theexpectation, would be carried out. Theoriginal levels of processing viewpoint isalso unsatisfactory in the light of the present

empirical findings if it is assumed that yesand no responses are processed to roughlythe same depth before a decision can bemade, since there are no differences inreaction times, yet there are large differ-ences in retention of the words.

Second, large differences in retentionwere also found when the complexity ofthe encoding context was manipulated.Experiment 7 showed that elaborate sen-tence frames led to higher recall levels thandid simple sentence frames. This observa-tion suggests than an adequate theory mustnot focus only on the nominal stimulus butmust also consider the encoded pattern of"stimulus in context."

Third, and most crucial perhaps, strongencoding effects were found under inten-tional learning conditions in Experiments 4and 9; it is totally implausible that, undersuch conditions, the system stops processingthe stimulus at some peripheral level.Unless one assumes complete perversity ofsubjects, it must be clear that the word isfully perceived on each trial. Thus, dif-ferential depth of encoding does not seema promising description, except in very gen-eral terms. Finally, as detailed earlier,initial processing time is not always a goodpredictor of retention. Many of the ideassuggested in the Craik and Lockhart (1972)article thus stand in need of considerablemodification if that processing frameworkis to remain useful.

Degree of Encoding Elaboration

Is spread of encoding a more satisfactorymetaphor than depth? The implicationof this second description is that while averbal stimulus is usually identified as aparticular word, this minimal core encodingcan be elaborated by a context of furtherstructural, phonemic, and semantic encod-ings. Again, the memory trace can be con-ceptualized as a record of the various pat-tern-recognition and interpretive analysescarried out on the stimulus and its context;the difference between the depth and spreadviewpoints lies only in the postulated orga-nization of the cognitive structures respon-sible for pattern recognition and elabora-tion, with depth implying that encodingoperations are carried out in a fixed

Page 24: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 291

sequence and spread leading to the moreflexible notion' that the basic perceptualcore of the event can be elaborated in manydifferent ways. The notion of encodingdomains suggested by Lockhart, Craik, andJacoby (1975) is in essence a spread theory,since encoding elaboration depends more onthe breadth of analysis carried out withineach domain than on the ordinal position ofan analysis in the processing sequence.However, while spread and elaboration mayindeed be better descriptive terms for theresults reported in this paper, it should beborne in mind that retention dependscritically on the qualitative nature of theencoding operations performed—a minimalsemantic analysis is more beneficial formemory than an elaborate structural analysis(Experiment 5).

Whatever the sequence of operations, thepresent findings are well described by theidea that memory performance depends onthe elaborateness of the final encoding.Retention is enhanced when the encodingcontext is more fully descriptive (Experi-ment 7), although this beneficial effect isrestricted to cases where the target stim-ulus is compatible with the context and canthus form an integrated encoded unit withit. Thus the increased elaboration providedby complex sentence frames in Experiment7 did'not increase recall performance in thecase of negative response words.- The sameargument can be applied to the generallysuperior retention of positive responsewords in all the present experiments; forpositive responses the encoding questioncan be integrated with the target word anda more elaborate unit formed. In certaincases, however, positive responses do notyield a more elaborately encoded unit; suchcases occur when negative decisions specifythe nature of the attributes in question asprecisely as positive decisions. For ex-ample, the response no to the question "Isthe word in capital letters?" indicatesclearly that the word is in lowercase letters;similarly a no response to the question "Isthe object bigger than a man?" indicatesthat the object is smaller than a man. Whenno responses yield as elaborate an encodingas yes responses, memory performancelevels are equivalent. There is nothing

inherently superior about a yes response;retention depends on the degree of elabora-tion of the encoded trace.

Several authors (e.g., Bower, 1967; Tul-ving & Watkins, 1975) have suggested thatthe memory trace can be described in termsof its component attributes. This viewpointis quite compatible with the notion of encod-ing elaboration. The position argued in thissection is that the trace may be consideredthe record of encoding operations carried outon the input; the function of these opera-tions is to analyze and specify the attributesof the stimulus. However, it is necessaryto add that memory performance cannot beconsidered simply a function of the num-ber of encoded attributes; the qualitativenature of these attributes is critically im-portant. A second equivalent descriptionis in terms of the "features checked" duringencoding. Again, a greater number of fea-tures (especially deeper semantic features)implies a more elaborate trace.

Finally, it seems necessary to bring in theprinciple of integration or congruity for acomplete description of encoding. That is,memory performance is enhanced to theextent that the encoding question or contextforms an integrated unit with the targetword. The higher retention of positivedecision words in Schulman's (1974) studyand in the present experiments can be de-scribed in this way. The question immedi-ately arises as to why integration with theencoding context is so helpful. One pos-sibility is that an encoded unit is unitizedor integrated on the basis of past experienceand, just as the target stimulus fits naturallyinto a compatible context at encoding, so atretrieval, re-presentation of part of theencoded unit will lead easily to regenerationof the total unit. The suggestion is that atencoding the stimulus is interpreted interms of the system's structured record ofpast learning, that is, knowledge of theworld or "semantic memory" (Tulving,1972) ; at retrieval, the information pro-vided as a cue again utilizes the structureof semantic memory to reconstruct the initialencoding. An integrated or congruousencoding thus yields better memory per-formance, first, because a more elaboratetrace is laid down and, second, because

Page 25: Depth of Processing and the Retention of Words in Episodic

292 FERGUS I. M. CRAIK AND ENDEL TULVING

richer encoding implies greater compatibilitywith the structure, rules, and organizationof semantic memory. This structure, inturn, is drawn upon to facilitate retrievalprocesses.

Broader Implications

Finally, the implications of the presentexperiments and the related work reportedby Hyde and Jenkins (1969, 1973), Schul-man (1971, 1974) and Kolers (1973a;Kolers & Ostry, 1974) will be briefly dis-cussed. All these studies conform to thenew look in memory research in that thestress is on mental operations; items areremembered not as presented stimuli actingon the organism, but as components of men-tal activity. Subjects remember not whatwas "out there" but what they did duringencoding.

In more traditional memory paradigms,the major theoretical concepts were tracesand associations; in both cases their maintheoretical property was strength. In turn,the subject's performance in acquisition,retention, transfer, and retrieval was held tobe a direct function of the strength of asso-ciations and their interrelations. The deter-minants of strength were also well known:study time, number of repetitions, recency,intentionality of the subject, preexperimentalassociative strength between items, inter-ference by associations involving identicalor similar elements, and so on. In the ex-periments we have described here, theseimportant determinants of the strength ofassociations and traces were held constant:nominal identity of items, preexperimentalassociations among items, intralist similarity,frequency, recency, instructions to "learn"the materials, the amount and duration ofinterpolated activity. The only thing thatwas manipulated was the mental activity ofthe learner; yet, as the results showed,memory performance was dramaticallyaffected by these activities.

This difference between the old paradigmand the new creates many interesting re-search problems that would not readily havesuggested themselves in the former frame-work. For example, to what extent arethe encoding operations performed on anevent under the person's volitional strategic

control, and to what extent are they deter-mined by factors such as context and set?Why are there such large differences be-tween different encoding operations? Inparticular, why is it that subjects do not, orcan not, encode case words efficiently whenthey are given explicit instructions to learnthe words? How does the ability of onelist item to serve as a retrieval cue foranother list item (e.g., in an A-B pair)vary as a function of encoding operationsperformed on the pair as opposed to theindividual items? The important conceptof association as such, the bond or relationbetween the two items, A and B, mayassume a different form in the new paradigm.The classical ideas of frequency and recencymay be eclipsed by notions referring tomental activity.

There are problems, too, associated withthe development of a taxonomy of encodingoperations. How should such operationsbe classified ? Do encoding operations reallyfall into types as implied by the distinctionbetween case, rhyme, and category in thepresent experiments, or is there someunderlying continuity between different op-erations ? This last point reflects the debatewithin theories of perception on whetheranalysis of structure and analysis of mean-ing are qualitatively distinct (Sutherland,1972) or are better thought of as continuous(Kolers, 1973b).

Finally, the major question generated bythe present approach is what are the encod-ing operations underlying "normal" learn-ing and remembering? The experimentsreported in this article show that people donot necessarily learn best when they aremerely given "learn" instructions. Thepresent viewpoint suggests that when sub-jects are instructed to learn a list of items,they perform self initiated encoding opera-ions on the items. Thus, by comparingquantitative and qualitative aspects of per-formance under learn instructions with per-formance after various combinations of in-cidental orienting tasks, the nature of learn-ing processes may be further elucidated.The possibility of analysis and control oflearning through its constituent mental op-erations opens up exciting vistas for theoryand application.

Page 26: Depth of Processing and the Retention of Words in Episodic

DEPTH OF PROCESSING AND WORD RETENTION 293

REFERENCE NOTE

1. Moscovitch, M., & Craik, F. I. M. Retrievaleves and levels of processing in recall andrecognition. Unpublished manuscript, 1975.(Available from Morris Moscovitch, ErindaleCollege, Mississauga, Ontario, Canada).

REFERENCESBegg, I. Recall of meaningful phrases. Journal

of Verbal Learning and Verbal Behavior, 1972,11, 431-439.

Bobrow, S. A., & Bower, G. H. Comprehensionand recall of sentences. Journal of Experi-mental Psychology, 1969, 80, 55-61.

Bower, G. H. A multicomponent theory of thememory trace. In K. W. Spence & J. T. Spence(Eds.), The psychology of learning and moti-vation (Vol. 1). New York: Academic Press,1967.

Bower, G. H., & Karlin, M. B. Depth of pro-cessing pictures of faces and recognition mem-ory. Journal of Experimental Psychology,

' 1974, 103, 751-757.Broadbent, D. E. Behaviour. London: Eyre &

Spottiswoode, 1961.Cermak, L. S. Human memory: Research and

theory. New York: Ronald, 1972.Craik, F. I. M., & Lockhart, R. S. Levels of

processing: A framework for memory research.Journal of Verbal Learning and Verbal Be-havior, 1972, 11, 671-684.

Craik, F. I. M., & Watkins, M. J. The role ofrehearsal in short-term memory. Journal ofVerbal Learning and Verbal Behavior, 1973,12, 599-607.

Eagle, M., & Leiter, E. Recall and recognition inintentional and incidental learning. Journal ofExperimental Psychology, 1964, 68, 58-63.

Horowitz, L. M., & Prytulak, L. S. Redinte-grative memory. Psychological Review, 1969,76, 519-531.

Hyde, T. S, Differential effects of effort andtype of orienting task on recall and organiza-tion of highly associated words. Journal ofExperimental Psychology, 1973, 79, 111-113.

Hyde, T. S., & Jenkins, J. J. Differential effectsof incidental tasks on the organization ofrecall of a list of highly associated words.Journal of Experimental Psychology, 1969, 82,472-481.

Hyde, T. S., & Jenkins, J. J. Recall for wordsas a function of semantic, graphic, and syntacticorienting tasks. Journal of Verbal Learningand Verbal Behavior, 1973, 12, 471-480.

Jacoby, L. L. Test appropriate strategies inretention of categorized lists. Journal of VerbalLearning and Verbal Behavior, 1973, 12, 675-682.

Kolers, P. A. Remembering operations. Mem-ory & Cognition, 1973, 1, 347-355. (a)

Kolers, P. A. Some modes of representation. InP. Pliner, L. Krames, & T. Alloway (Eds.),Communication and affect: Language andthought. New York: Academic Press, 1973.(b)

Kolers, P. A., & Ostry, D. J. Time course ofloss of information regarding pattern analyzingoperations. Journal of Verbal Learning andVerbal Behavior, 1974, 13, 599-612.

Lockhart, R. S., Craik, F. I. M., & Jacoby, L. L.Depth of processing in recognition and recall:Some aspects of a general memory system. InJ. Brown (Ed.), Recognition and recall. Lon-don: Wiley, 1975.

Neisser, U. Cognitive psychology. New York:Appleton-Century-Crofts, 1967.

Norman, D. A. (Ed.). Models of human mem-ory. New York: Academic Press, 1970.

Paivio, A. Imagery and verbal processes. NewYork: Holt, Rinehart & Winston, 1971.

Postman, L. Short-term memory and incidentallearning. In A. W. Melton (Ed.), Categoriesof human learning. New York: AcademicPress, 1964.

Rosenberg, S., & Schiller, W. J. Semantic cod-ing and incidental sentence recall. Journal ofExperimental Psychology, 1971, 90, 345-346.

Schulman, A. I. Recognition memory for targetsfrom a scanned word list. British Journal ofPsychology, 1971, 62, 335-346.

Schulman, A. I. Memory for words recentlyclassified. Memory & Cognition, 1974, 2, 47-52.

Sheehan, P. W. The role of imagery in incidentallearning. British Journal of Psychology, 1971,62, 235-244.

Sutherland, N. S. Object recognition. In E. C.Carterette & M. P. Friedman (Eds.), Hand-book of perception (Vol. 3). New York:Academic Press, 1972.

Till, R. E., & Jenkins, J. J. The effects of cuedorienting tasks on the free recall of words.Journal of Verbal Learning and Verbal Be-havior, 1973, 12, 489-498.

Treisman, A., & Tuxworth, J. Immediate anddelayed recall of sentences after perceptualprocessing at different levels. Journal of Ver-bal Learning and Verbal Behavior, 1974, 13,38-44.

Tulving, E. Episodic and semantic memory. InE. Tulving & W. Donaldson (Eds.), Organiza-tion of memory. New York: Academic Press,1972.

Tulving, E., & Thomson, D. M. Encodingspecificity and retrieval processes in episodicmemory. Psychological Review, 1973, 80, 352-373.

Tulving, E., & Watkins, M. J. Structure ofmemory traces. Psychological Review, 1975,82, 261-275.

Walsh, D. A., & Jenkins, J. J. Effects of orient-ing tasks on free recall in incidental learning:"Difficulty," "effort," and "process" explana-tions. Journal of Verbal Learning and VerbalBehavior, 1973, 12, 481-488.

Waugh, N. C., & Norman, D. A. Primarymemory. Psychological Review, 1965, 72, 89-104.

Wickelgren, W. A. The long and the short ofmemory. Psychological Bulletin, 1973, 80, 425-438.

(Received February 5, 1975)

Page 27: Depth of Processing and the Retention of Words in Episodic

294 FERGUS I. M. CRAIK AND ENDEL TULVING

APPENDIX

Each subject in Experiment 9 received thesame 60 words in the same order, but six dif-ferent "formats" were constructed, such thatall six possible questions (case, rhyme, cat-egory X yes-no) were asked for each word(Table Al). Thus, for SPEECH, the questionswere (a) Is the word in capital letters? (b)Is the word in small print? (c) Does the

word rhyme with each? (d) Does the wordrhyme with tense? (e) Is the word a form ofcommunication? (f) Is the word somethingto wear? Each format contained 10 questionsof each type. Negative questions were drawnfrom the pool of unused questions in thatparticular format.

TABLE AlWORDS AND QUESTIONS USED IN EXPERIMENT 9

Word

SPEECHBRUSHCHEEKFENCEFLAMEFLOURHONEYKNIFESHEEPCOPPERGLOVE

XMONKDAISYMINERCARTCLOVEROBBERMASTFIDDLECHAPELSONNETWITCHROACHBRAKETWIGGRINDRILLMOANCLAWSINGER

Rhymequestion

eachlushteaktenseclaimsourfunnywifeleapstoppershovetrunkcrazylinerstartroveclobberpastriddlegrapplebonnetrichcoachshakebigbinfillpronerawringer

Category question

a form of communicationused for cleaninga part of the bodyfound in the gardensomething hotused for cookinga type of fooda type of weapona type of farm animala type of metalsomething to weara type of clergya type of flowera type of occupationa type of vehiclea type of herba type of criminala part of a shipa musical instrumenta type of buildinga written form of artassociated with magica type of insecta part of a cara part of a treea human expressiona type of implementa human sounda part of an animala type of entertainer

Word

BEARLAMPCHERRY

XROCKEARLPOOLWEEKBOATPAILTROUTGRAMWOOLCLIPJUICEPONDLANENURSELARKSTATESOAPJADESLEETRICETIRECHILDDANCEFIELDFLOORGLASSTRIBE

Rhymequestion

haircampverystockpearlschoolpeakrotewhalebouttrampullshipnoosewandpaincurseparkcrateroperaidfeetdicefirewildstanceshieldsorepassscribe

Category question

a wild animala type of furniturea type of fruita type of minerala type of nobilitya type of gamea division of timea mode of travela type of containera type of fisha type of measurementa type of materiala type of office supplya type of beveragea body of watera type of thoroughfareassociated with medicinea type of birda territorial unita type of toiletrya type of precious stonea type of weathera type of graina round objecta human beinga type of physical activityfound in tne countrysidea part of a rooma type of utensila group of people