
Psychological Review, Vol. 65, No. 6, 1958

THE PERCEPTRON: A PROBABILISTIC MODEL FOR INFORMATION STORAGE AND ORGANIZATION IN THE BRAIN 1

F. ROSENBLATT

Cornell Aeronautical Laboratory

If we are eventually to understand the capability of higher organisms for perceptual recognition, generalization, recall, and thinking, we must first have answers to three fundamental questions:

1. How is information about the physical world sensed, or detected, by the biological system?

2. In what form is information stored, or remembered?

3. How does information contained in storage, or in memory, influence recognition and behavior?

The first of these questions is in the province of sensory physiology, and is the only one for which appreciable understanding has been achieved. This article will be concerned primarily with the second and third questions, which are still subject to a vast amount of speculation, and where the few relevant facts currently supplied by neurophysiology have not yet been integrated into an acceptable theory.

With regard to the second question, two alternative positions have been maintained.

1 The development of this theory has been carried out at the Cornell Aeronautical Laboratory, Inc., under the sponsorship of the Office of Naval Research, Contract Nonr-2381(00). This article is primarily an adaptation of material reported in Ref. 15, which constitutes the first full report on the program.

The first suggests that storage of sensory information is in the form of coded representations or images, with some sort of one-to-one mapping between the sensory stimulus and the stored pattern. According to this hypothesis, if one understood the code or "wiring diagram" of the nervous system, one should, in principle, be able to discover exactly what an organism remembers by reconstructing the original sensory patterns from the "memory traces" which they have left, much as we might develop a photographic negative, or translate the pattern of electrical charges in the "memory" of a digital computer. This hypothesis is appealing in its simplicity and ready intelligibility, and a large family of theoretical brain models has been developed around the idea of a coded, representational memory (2, 3, 9, 14). The alternative approach, which stems from the tradition of British empiricism, hazards the guess that the images of stimuli may never really be recorded at all, and that the central nervous system simply acts as an intricate switching network, where retention takes the form of new connections, or pathways, between centers of activity. In many of the more recent developments of this position (Hebb's "cell assembly," and Hull's "cortical anticipatory goal response," for example) the "responses" which are associated to stimuli may be entirely contained within the CNS itself. In this case the response represents an "idea" rather than an action. The important feature of this approach is that there is never any simple mapping of the stimulus into memory, according to some code which would permit its later reconstruction.


Whatever information is retained must somehow be stored as a preference for a particular response; i.e., the information is contained in connections or associations rather than topographic representations. (The term response, for the remainder of this presentation, should be understood to mean any distinguishable state of the organism, which may or may not involve externally detectable muscular activity. The activation of some nucleus of cells in the central nervous system, for example, can constitute a response, according to this definition.)

Corresponding to these two positions on the method of information retention, there exist two hypotheses with regard to the third question, the manner in which stored information exerts its influence on current activity. The "coded memory theorists" are forced to conclude that recognition of any stimulus involves the matching or systematic comparison of the contents of storage with incoming sensory patterns, in order to determine whether the current stimulus has been seen before, and to determine the appropriate response from the organism. The theorists in the empiricist tradition, on the other hand, have essentially combined the answer to the third question with their answer to the second: since the stored information takes the form of new connections, or transmission channels in the nervous system (or the creation of conditions which are functionally equivalent to new connections), it follows that the new stimuli will make use of these new pathways which have been created, automatically activating the appropriate response without requiring any separate process for their recognition or identification.

The theory to be presented here takes the empiricist, or "connectionist" position with regard to these questions.

The theory has been developed for a hypothetical nervous system, or machine, called a perceptron. The perceptron is designed to illustrate some of the fundamental properties of intelligent systems in general, without becoming too deeply enmeshed in the special, and frequently unknown, conditions which hold for particular biological organisms. The analogy between the perceptron and biological systems should be readily apparent to the reader.

During the last few decades, the development of symbolic logic, digital computers, and switching theory has impressed many theorists with the functional similarity between a neuron and the simple on-off units of which computers are constructed, and has provided the analytical methods necessary for representing highly complex logical functions in terms of such elements. The result has been a profusion of brain models which amount simply to logical contrivances for performing particular algorithms (representing "recall," stimulus comparison, transformation, and various kinds of analysis) in response to sequences of stimuli—e.g., Rashevsky (14), McCulloch (10), McCulloch & Pitts (11), Culbertson (2), Kleene (8), and Minsky (13). A relatively small number of theorists, like Ashby (1) and von Neumann (17, 18), have been concerned with the problems of how an imperfect neural network, containing many random connections, can be made to perform reliably those functions which might be represented by idealized wiring diagrams. Unfortunately, the language of symbolic logic and Boolean algebra is less well suited for such investigations. The need for a suitable language for the mathematical analysis of events in systems where only the gross organization can be characterized, and the precise structure is unknown, has led the author to formulate the current model in terms of probability theory rather than symbolic logic.

The theorists referred to above were chiefly concerned with the question of how such functions as perception and recall might be achieved by a deterministic physical system of any sort, rather than how this is actually done by the brain. The models which have been produced all fail in some important respects (absence of equipotentiality, lack of neuroeconomy, excessive specificity of connections and synchronization requirements, unrealistic specificity of stimuli sufficient for cell firing, postulation of variables or functional features with no known neurological correlates, etc.) to correspond to a biological system. The proponents of this line of approach have maintained that, once it has been shown how a physical system of any variety might be made to perceive and recognize stimuli, or perform other brainlike functions, it would require only a refinement or modification of existing principles to understand the working of a more realistic nervous system, and to eliminate the shortcomings mentioned above. The writer takes the position, on the other hand, that these shortcomings are such that a mere refinement or improvement of the principles already suggested can never account for biological intelligence; a difference in principle is clearly indicated. The theory of statistical separability (cf. 15), which is to be summarized here, appears to offer a solution in principle to all of these difficulties.

Those theorists—Hebb (7), Milner (12), Eccles (4), Hayek (6)—who have been more directly concerned with the biological nervous system and its activity in a natural environment, rather than with formally analogous machines, have generally been less exact in their formulations and far from rigorous in their analysis, so that it is frequently hard to assess whether or not the systems that they describe could actually work in a realistic nervous system, and what the necessary and sufficient conditions might be. Here again, the lack of an analytic language comparable in proficiency to the Boolean algebra of the network analysts has been one of the main obstacles. The contributions of this group should perhaps be considered as suggestions of what to look for and investigate, rather than as finished theoretical systems in their own right. Seen from this viewpoint, the most suggestive work, from the standpoint of the following theory, is that of Hebb and Hayek.

The position, elaborated by Hebb (7), Hayek (6), Uttley (16), and Ashby (1), in particular, upon which the theory of the perceptron is based, can be summarized by the following assumptions:

1. The physical connections of the nervous system which are involved in learning and recognition are not identical from one organism to another. At birth, the construction of the most important networks is largely random, subject to a minimum number of genetic constraints.

2. The original system of connected cells is capable of a certain amount of plasticity; after a period of neural activity, the probability that a stimulus applied to one set of cells will cause a response in some other set is likely to change, due to some relatively long-lasting changes in the neurons themselves.

3. Through exposure to a large sample of stimuli, those which are most "similar" (in some sense which must be defined in terms of the particular physical system) will tend to form pathways to the same sets of responding cells. Those which are markedly "dissimilar" will tend to develop connections to different sets of responding cells.

4. The application of positive and/or negative reinforcement (or stimuli which serve this function) may facilitate or hinder whatever formation of connections is currently in progress.

5. Similarity, in such a system, is represented at some level of the nervous system by a tendency of similar stimuli to activate the same sets of cells. Similarity is not a necessary attribute of particular formal or geometrical classes of stimuli, but depends on the physical organization of the perceiving system, an organization which evolves through interaction with a given environment. The structure of the system, as well as the ecology of the stimulus-environment, will affect, and will largely determine, the classes of "things" into which the perceptual world is divided.

THE ORGANIZATION OF A PERCEPTRON

The organization of a typical photoperceptron (a perceptron responding to optical patterns as stimuli) is shown in Fig. 1. The rules of its organization are as follows:

1. Stimuli impinge on a retina of sensory units (S-points), which are assumed to respond on an all-or-nothing basis, in some models, or with a pulse amplitude or frequency proportional to the stimulus intensity, in other models. In the models considered here, an all-or-nothing response will be assumed.

FIG. 1. Organization of a perceptron.

2. Impulses are transmitted to a set of association cells (A-units) in a "projection area" (A_I). This projection area may be omitted in some models, where the retina is connected directly to the association area (A_II). The cells in the projection area each receive a number of connections from the sensory points. The set of S-points transmitting impulses to a particular A-unit will be called the origin points of that A-unit. These origin points may be either excitatory or inhibitory in their effect on the A-unit. If the algebraic sum of excitatory and inhibitory impulse intensities is equal to or greater than the threshold (θ) of the A-unit, then the A-unit fires, again on an all-or-nothing basis (or, in some models, which will not be considered here, with a frequency which depends on the net value of the impulses received). The origin points of the A-units in the projection area tend to be clustered or focalized, about some central point, corresponding to each A-unit. The number of origin points falls off exponentially as the retinal distance from the central point for the A-unit in question increases. (Such a distribution seems to be supported by physiological evidence, and serves an important functional purpose in contour detection.)

3. Between the projection area and the association area (A_II), connections are assumed to be random. That is, each A-unit in the A_II set receives some number of fibers from origin points in the A_I set, but these origin points are scattered at random throughout the projection area. Apart from their connection distribution, the A_II units are identical with the A_I units, and respond under similar conditions.

4. The "responses," R_1, R_2, ..., R_n, are cells (or sets of cells) which respond in much the same fashion as the A-units. Each response has a typically large number of origin points located at random in the A_II set. The set of A-units transmitting impulses to a particular response will be called the source-set for that response. (The source-set of a response is identical to its set of origin points in the A-system.) The arrows in Fig. 1 indicate the direction of transmission through the network. Note that up to A_II all connections are forward, and there is no feedback. When we come to the last set of connections, between A_II and the R-units, connections are established in both directions. The rule governing feedback connections, in most models of the perceptron, can be either of the following alternatives:

(a) Each response has excitatory feedback connections to the cells in its own source-set, or

(b) Each response has inhibitory feedback connections to the complement of its own source-set (i.e., it tends to prohibit activity in any association cells which do not transmit to it).

The first of these rules seems more plausible anatomically, since the R-units might be located in the same cortical area as their respective source-sets, making mutual excitation between the R-units and the A-units of the appropriate source-set highly probable. The alternative rule (b) leads to a more readily analyzed system, however, and will therefore be assumed for most of the systems to be evaluated here.

FIG. 2A. Schematic representation of connections in a simple perceptron. [In-figure legend, partly illegible: solid lines, excitatory connections; broken lines, inhibitory connections.]

FIG. 2B. Venn diagram of the same perceptron (shading shows active sets for R_1 response).

Figure 2 shows the organization of a simplified perceptron, which affords a convenient entry into the theory of statistical separability. After the theory has been developed for this simplified model, we will be in a better position to discuss the advantages of the system in Fig. 1. The feedback connections shown in Fig. 2 are inhibitory, and go to the complement of the source-set for the response from which they originate; consequently, this system is organized according to Rule b, above. The system shown here has only three stages, the first association stage having been eliminated. Each A-unit has a set of randomly located origin points in the retina. Such a system will form similarity concepts on the basis of coincident areas of stimuli, rather than by the similarity of contours or outlines. While such a system is at a disadvantage in many discrimination experiments, its capability is still quite impressive, as will be demonstrated presently. The system shown in Fig. 2 has only two responses, but there is clearly no limit on the number that might be included.

The responses in a system organized in this fashion are mutually exclusive. If R_1 occurs, it will tend to inhibit R_2, and will also inhibit the source-set for R_2. Likewise, if R_2 should occur, it will tend to inhibit R_1. If the total impulse received from all the A-units in one source-set is stronger or more frequent than the impulse received by the alternative (antagonistic) response, then the first response will tend to gain an advantage over the other, and will be the one which occurs. If such a system is to be capable of learning, then it must be possible to modify the A-units or their connections in such a way that stimuli of one class will tend to evoke a stronger impulse in the R_1 source-set than in the R_2 source-set, while stimuli of another (dissimilar) class will tend to evoke a stronger impulse in the R_2 source-set than in the R_1 source-set.
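The following sketch may help make this organization concrete. It is not Rosenblatt's program (his experiments ran on an IBM 704); it is a minimal modern illustration of the simplified perceptron of Fig. 2: a retina of S-points, A-units with randomly chosen excitatory and inhibitory origin points and a fixed threshold, and two antagonistic responses. Every identifier and parameter value here is invented for the example.

```python
import random

# A minimal illustrative sketch (ours, not Rosenblatt's code) of the
# simplified perceptron of Fig. 2.  Each active A-unit delivers a unit
# impulse, so the response whose source-set receives the larger total
# impulse becomes dominant.

random.seed(0)
N_S, N_A, X, Y, THETA = 100, 40, 5, 3, 1      # illustrative parameters only
a_units = [(random.sample(range(N_S), X), random.sample(range(N_S), Y))
           for _ in range(N_A)]               # (excitatory, inhibitory) origins
source_sets = {"R1": range(0, N_A // 2),      # disjoint source-sets
               "R2": range(N_A // 2, N_A)}

def active_a_units(stimulus):
    """An A-unit fires if excitation minus inhibition reaches the threshold."""
    return [k for k, (exc, inh) in enumerate(a_units)
            if sum(p in stimulus for p in exc)
             - sum(p in stimulus for p in inh) >= THETA]

def dominant_response(stimulus):
    """Mutually exclusive responses: the larger total impulse dominates."""
    active = set(active_a_units(stimulus))
    totals = {r: len(active & set(ids)) for r, ids in source_sets.items()}
    best = max(totals.values())
    return random.choice([r for r, t in totals.items() if t == best])

stimulus = set(random.sample(range(N_S), 20))   # 20 illuminated S-points
print(dominant_response(stimulus))
```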

It will be assumed that the impulses delivered by each A-unit can be characterized by a value, V, which may be an amplitude, frequency, latency, or probability of completing transmission. If an A-unit has a high value, then all of its output impulses are considered to be more effective, more potent, or more likely to arrive at their endbulbs than impulses from an A-unit with a lower value. The value of an A-unit is considered to be a fairly stable characteristic, probably depending on the metabolic condition of the cell and the cell membrane, but it is not absolutely constant. It is assumed that, in general, periods of activity tend to increase a cell's value, while the value may decay (in some models) with inactivity. The most interesting models are those in which cells are assumed to compete for metabolic materials, the more active cells gaining at the expense of the less active cells. In such a system, if there is no activity, all cells will tend to remain in a relatively constant condition, and (regardless of activity) the net value of the system, taken in its entirety, will remain constant at all times.

TABLE 1

COMPARISON OF LOGICAL CHARACTERISTICS OF α, β, AND γ SYSTEMS

                                          α-system          β-system          γ-system
                                          (Uncompensated    (Constant Feed    (Parasitic
                                          Gain System)      System)           Gain System)

Total value-gain of source-set
  per reinforcement                       N_ar              K                 0

ΔV for A-units active for
  1 unit of time                          +1                K/N_ar            +1

ΔV for inactive A-units outside
  of dominant set                         0                 K/N_Ar            0

ΔV for inactive A-units of
  dominant set                            0                 0                 −N_ar/(N_Ar − N_ar)

Mean value of A-system                    Increases with    Increases with    Constant
                                          number of         time
                                          reinforcements

Difference between mean values            Proportional to   0                 0
  of source-sets                          differences of
                                          reinforcement
                                          frequency
                                          (n_sr1 − n_sr2)

Note: In the β and γ systems, the total value-change for any A-unit will be the sum of the ΔV's for all source-sets of which it is a member.

N_ar = number of active units in the source-set
N_Ar = total number of units in the source-set
n_sr = number of stimuli associated to response r
K = arbitrary constant

392 F. ROSENBLATT

Three types of systems, which differ in their value dynamics, have been investigated quantitatively. Their principal logical features are compared in Table 1. In the alpha system, an active cell simply gains an increment of value for every impulse, and holds this gain indefinitely. In the beta system, each source-set is allowed a certain constant rate of gain, the increments being apportioned among the cells of the source-set in proportion to their activity. In the gamma system, active cells gain in value at the expense of the inactive cells of their source-set, so that the total value of a source-set is always constant.
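As a compact restatement of these three dynamics (our own rendering, not code from the original program), the update rules of Table 1 can be written as follows; `values` holds V for each A-unit of one source-set, and `active` flags the units that responded to the stimulus:

```python
def alpha_update(values, active):
    # Alpha: every active cell gains one unit of value and keeps it.
    return [v + 1 if a else v for v, a in zip(values, active)]

def beta_update(values, active, K=1.0):
    # Beta: the source-set as a whole gains a constant amount K per
    # reinforcement, shared among its active cells.
    n_active = sum(active)
    if n_active == 0:
        return list(values)
    return [v + K / n_active if a else v for v, a in zip(values, active)]

def gamma_update(values, active):
    # Gamma: active cells gain at the expense of the inactive cells of the
    # same source-set, so the total value of the set stays constant.
    n_active = sum(active)
    n_inactive = len(values) - n_active
    if n_active == 0 or n_inactive == 0:
        return list(values)
    loss = n_active / n_inactive   # -N_ar/(N_Ar - N_ar) per Table 1
    return [v + 1 if a else v - loss for v, a in zip(values, active)]
```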

For purposes of analysis, it is convenient to distinguish two phases in the response of the system to a stimulus (Fig. 3). In the predominant phase, some proportion of A-units (represented by solid dots in the figure) responds to the stimulus, but the R-units are still inactive. This phase is transient, and quickly gives way to the postdominant phase, in which one of the responses becomes active, inhibiting activity in the complement of its own source-set, and thus preventing the occurrence of any alternative response. The response which happens to become dominant is initially random, but if the A-units are reinforced (i.e., if the active units are allowed to gain in value), then when the same stimulus is presented again at a later time, the same response will have a stronger tendency to recur, and learning can be said to have taken place.

FIG. 3A. Predominant phase. Inhibitory connections are not shown. Solid black units are active.

FIG. 3B. Postdominant phase. Dominant subset suppresses rival sets. Inhibitory connections shown only for R_1.

FIG. 3. Phases of response to a stimulus.

ANALYSIS OF THE PREDOMINANT PHASE

The perceptrons considered here will always assume a fixed threshold, θ, for the activation of the A-units. Such a system will be called a fixed-threshold model, in contrast to a continuous transducer model, where the response of the A-unit is some continuous function of the impinging stimulus energy.

In order to predict the learning curves of a fixed-threshold perceptron, two variables have been found to be of primary importance. They are defined as follows:

P_a = the expected proportion of A-units activated by a stimulus of a given size,

P_c = the conditional probability that an A-unit which responds to a given stimulus, S_1, will also respond to another given stimulus, S_2.

It can be shown (Rosenblatt, 15) that as the size of the retina is increased, the number of S-points (N_s) quickly ceases to be a significant parameter, and the values of P_a and P_c approach the value that they would have for a retina with infinitely many points. For a large retina, therefore, the equations are as follows:

P_a = Σ P(e, i)    (1)

where the summation is taken over all pairs (e, i) for which e − i ≥ θ, and

P(e, i) = C(x, e) R^e (1 − R)^(x−e) × C(y, i) R^i (1 − R)^(y−i)

R = proportion of S-points activated by the stimulus

x = number of excitatory connections to each A-unit

y = number of inhibitory connections to each A-unit

θ = threshold of A-units.

(The quantities e and i are the excitatory and inhibitory components of the excitation received by the A-unit from the stimulus. If the algebraic sum of these components, e − i, is equal to or greater than θ, the A-unit is assumed to respond.)
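Equation 1, as reconstructed above, is a pair of binomial distributions truncated by the threshold condition, and can be evaluated directly. The following short routine (ours, for illustration only) computes P_a:

```python
from math import comb

def p_a(R, x, y, theta):
    """Equation 1 as reconstructed above: probability that an A-unit with x
    excitatory and y inhibitory origin points fires, when each origin point
    is illuminated independently with probability R and the unit requires
    e - i >= theta."""
    total = 0.0
    for e in range(x + 1):
        for i in range(y + 1):
            if e - i >= theta:
                total += (comb(x, e) * R**e * (1 - R)**(x - e)
                          * comb(y, i) * R**i * (1 - R)**(y - i))
    return total

# Example: with x = 10 purely excitatory connections (y = 0) and theta = 1,
# the A-unit fires unless all ten origin points are dark.
print(p_a(0.2, 10, 0, 1))   # 1 - 0.8**10, approximately 0.8926
```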

For the conditional probability P_c,

P_c = (1/P_a) Σ P(e, i, l_e, l_i, g_e, g_i)    (2)

where the summation is taken over all combinations of e, i, l_e, l_i, g_e, and g_i for which e − i ≥ θ and e − i − l_e + l_i + g_e − g_i ≥ θ, and

P(e, i, l_e, l_i, g_e, g_i) = P(e, i) × C(e, l_e) L^(l_e) (1 − L)^(e−l_e) × C(i, l_i) L^(l_i) (1 − L)^(i−l_i) × C(x − e, g_e) G^(g_e) (1 − G)^(x−e−g_e) × C(y − i, g_i) G^(g_i) (1 − G)^(y−i−g_i)

and

L = proportion of the S-points illuminated by the first stimulus, S_1, which are not illuminated by S_2,

G = proportion of the residual S-set (left over from the first stimulus) which is included in the second stimulus (S_2).

The quantities R, L, and G specify the two stimuli and their retinal overlap. l_e and l_i are, respectively, the numbers of excitatory and inhibitory origin points "lost" by the A-unit when stimulus S_1 is replaced by S_2; g_e and g_i are the numbers of excitatory and inhibitory origin points "gained" when stimulus S_1 is replaced by S_2. The summations in Equation 2 are between the limits indicated, subject to the side condition e − i − l_e + l_i + g_e − g_i ≥ θ.

Some of the most important characteristics of P_a are illustrated in Fig. 4, which shows P_a as a function of the retinal area illuminated (R). Note that P_a can be reduced in magnitude by either increasing the threshold, θ, or by increasing the proportion of inhibitory connections (y). A comparison of Fig. 4b and 4c shows that if the excitation is about equal to the inhibition, the curves for P_a as a function of R are flattened out, so that there is little variation in P_a for stimuli of different sizes. This fact is of great importance for systems which require P_a to be close to an optimum value in order to perform properly.

FIG. 4. P_a as function of retinal area illuminated. (a) Effect of inhibitory-excitatory mixture, θ = 1. (b) Variation with θ, for x = 10, y = 0. (c) Variation with x and θ, for mixtures about 50% inhibitory; solid lines are for x = 5, y = 5. (Abscissa in each panel: R, the proportion of S-points illuminated.)

The behavior of P_c is illustrated in Fig. 5 and 6. The curves in Fig. 5 can be compared with those for P_a in Fig. 4. Note that as the threshold is increased, there is an even sharper reduction in the value of P_c than was the case with P_a. P_c also decreases as the proportion of inhibitory connections increases, as does P_a. Fig. 5, which is calculated for nonoverlapping stimuli, illustrates the fact that P_c remains greater than zero even when the stimuli are completely disjunct, and illuminate no retinal points in common. In Fig. 6, the effect of varying amounts of overlap between the stimuli is shown. In all cases, the value of P_c goes to unity as the stimuli approach perfect identity. For smaller stimuli (broken line curves), the value of P_c is lower than for large stimuli. Similarly, the value is less for high thresholds than for low thresholds. The minimum value of P_c will be equal to

P_c min = (1 − L)^x (1 − G)^y.    (3)

In Fig. 6, P_c min corresponds to the curve for θ = 10. Note that under these conditions the probability that the A-unit responds to both stimuli (P_c) is practically zero, except for stimuli which are quite close to identity. This condition can be of considerable help in discrimination learning.

FIG. 5. P_c as a function of R, for nonoverlapping stimuli.

FIG. 6. P_c as a function of C (proportion of overlap between stimuli). x = 10, y = 0. Solid lines: R = .5; broken lines: R = .2.
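Since Equation 2 is tedious to evaluate by hand, P_c can also be estimated by direct simulation. The sketch below is our own construction, with invented parameter values; it draws random A-units and pairs of equal-sized stimuli with a prescribed overlap, and reproduces the qualitative behavior described above: at a moderate threshold P_c stays above zero even for disjunct stimuli, and it rises to unity as the overlap approaches identity.

```python
import random

# Monte Carlo estimate of P_c: the probability that an A-unit which responds
# to stimulus S1 also responds to S2 (cf. Fig. 5 and 6).  All parameters are
# illustrative only.

def estimate_pc(R=0.5, overlap=0.0, x=10, y=0, theta=4,
                n_s=200, trials=20_000, seed=1):
    rng = random.Random(seed)
    points = range(n_s)
    size = int(R * n_s)
    fired_s1 = fired_both = 0
    for _ in range(trials):
        s1 = set(rng.sample(points, size))
        keep = rng.sample(sorted(s1), int(overlap * size))    # shared points
        rest = rng.sample([p for p in points if p not in s1], size - len(keep))
        s2 = set(keep) | set(rest)
        exc = rng.sample(points, x)              # random excitatory origins
        inh = rng.sample(points, y)              # random inhibitory origins
        def fires(s):
            return sum(p in s for p in exc) - sum(p in s for p in inh) >= theta
        if fires(s1):
            fired_s1 += 1
            fired_both += fires(s2)
    return fired_both / fired_s1 if fired_s1 else float("nan")

print(estimate_pc(overlap=0.0))   # disjunct stimuli: well above zero
print(estimate_pc(overlap=1.0))   # identical stimuli: exactly 1.0
```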

MATHEMATICAL ANALYSIS OF LEARNING IN THE PERCEPTRON

The response of the perceptron in the predominant phase, where some fraction of the A-units (scattered throughout the system) responds to the stimulus, quickly gives way to the postdominant response, in which activity is limited to a single source-set, the other sets being suppressed. Two possible systems have been studied for the determination of the "dominant" response, in the postdominant phase. In one (the mean-discriminating system, or μ-system), the response whose inputs have the greatest mean value responds first, gaining a slight advantage over the others, so that it quickly becomes dominant. In the second case (the sum-discriminating system, or Σ-system), the response whose inputs have the greatest net value gains an advantage. In most cases, systems which respond to mean values have an advantage over systems which respond to sums, since the means are less influenced by random variations in P_a from one source-set to another. In the case of the γ-system (see Table 1), however, the performance of the μ-system and Σ-system becomes identical.

We have indicated that the perceptron is expected to learn, or to form associations, as a result of the changes in value that occur as a result of the activity of the association cells. In evaluating this learning, one of two types of hypothetical experiments can be considered. In the first case, the perceptron is exposed to some series of stimulus patterns (which might be presented in random positions on the retina) and is "forced" to give the desired response in each case. (This forcing of responses is assumed to be a prerogative of the experimenter. In experiments intended to evaluate trial-and-error learning, with more sophisticated perceptrons, the experimenter does not force the system to respond in the desired fashion, but merely applies positive reinforcement when the response happens to be correct, and negative reinforcement when the response is wrong.) In evaluating the learning which has taken place during this "learning series," the perceptron is assumed to be "frozen" in its current condition, no further value changes being allowed, and the same series of stimuli is presented again in precisely the same fashion, so that the stimuli fall on identical positions on the retina. The probability that the perceptron will show a bias towards the "correct" response (the one which has been previously reinforced during the learning series) in preference to any given alternative response is called P_r, the probability of correct choice of response between two alternatives.
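A minimal simulation of this first type of experiment might look as follows. This is our own sketch, not the original 704 program: an alpha-system learning series with forced responses, followed by a frozen test over the same stimuli, whose hit rate estimates P_r. All parameter choices are arbitrary illustrations.

```python
import random

random.seed(2)
N_S, N_A, X, THETA = 200, 60, 8, 2
a_units = [random.sample(range(N_S), X) for _ in range(N_A)]  # excitatory only
source = {"R1": range(0, N_A // 2), "R2": range(N_A // 2, N_A)}
values = [0.0] * N_A

def active(stim):
    return [i for i, pts in enumerate(a_units)
            if sum(p in stim for p in pts) >= THETA]

stimuli = [set(random.sample(range(N_S), 40)) for _ in range(20)]
forced = ["R1" if k < 10 else "R2" for k in range(20)]   # forced responses

# Learning series, alpha rule: every active A-unit in the source-set of the
# forced (dominant) response gains one unit of value.
for stim, resp in zip(stimuli, forced):
    for i in active(stim):
        if i in source[resp]:
            values[i] += 1

# Test series, values frozen: the response with the larger summed value wins.
correct = 0
for stim, resp in zip(stimuli, forced):
    act = set(active(stim))
    score = {r: sum(values[i] for i in ids if i in act)
             for r, ids in source.items()}
    correct += (max(score, key=score.get) == resp)
print(correct / len(stimuli))   # empirical estimate of P_r (two alternatives)
```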

In the second type of experiment, a learning series is presented exactly as before, but instead of evaluating the perceptron's performance using the same series of stimuli which were shown before, a new series is presented, in which stimuli may be drawn from the same classes that were previously experienced, but are not necessarily identical. This new test series is assumed to be composed of stimuli projected onto random retinal positions, which are chosen independently of the positions selected for the learning series. The stimuli of the test series may also differ in size or rotational position from the stimuli which were previously experienced. In this case, we are interested in the probability that the perceptron will give the correct response for the class of stimuli which is represented, regardless of whether the particular stimulus has been seen before or not. This probability is called P_g, the probability of correct generalization. As with P_r, P_g is actually the probability that a bias will be found in favor of the proper response rather than any one alternative; only one pair of responses at a time is considered, and the fact that the response bias is correct in one pair does not mean that there may not be other pairs in which the bias favors the wrong response. The probability that the correct response will be preferred over all alternatives is designated P_R or P_G.

In all cases investigated, a single general equation gives a close approximation to P_r and P_g, if the appropriate constants are substituted. This equation is of the form:

P = P(N_ar > 0) · Φ(Z)    (4)

where

P(N_ar > 0) = 1 − (1 − P_a)^N_e,

Φ(Z) = normal curve integral from −∞ to Z,

and

Z = (c1 n_sr + c2) / (c3 n_sr + c4)^1/2.

If R_1 is the "correct" response, and R_2 is the alternative response under consideration, Equation 4 is the probability that R_1 will be preferred over R_2 after n_sr stimuli have been shown for each of the two responses, during the learning period. N_e is the number of "effective" A-units in each source-set; that is, the number of A-units in either source-set which are not connected in common to both responses. Those units which are connected in common contribute equally to both sides of the value balance, and consequently do not affect the net bias towards one response or the other. N_ar is the number of active units in a source-set, which respond to the test stimulus, S_t. P(N_ar > 0) is the probability that at least one of the N_e effective units in the source-set of the correct response (designated, by convention, as the R_1 response) will be activated by the test stimulus, S_t.

In the case of P_g, the constant c2 is always equal to zero, the other three constants being the same as for P_r. The values of the four constants depend on the parameters of the physical nerve net (the perceptron) and also on the organization of the stimulus environment.
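In numerical form (our transcription of Equation 4 as reconstructed above, with the normal integral expressed through the error function), the prediction reads:

```python
from math import erf, sqrt

def phi(z):
    """Normal curve integral from -infinity to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_correct(n_sr, p_a, n_e, c1, c2, c3, c4):
    """P = P(N_ar > 0) * phi(Z),  Z = (c1*n_sr + c2) / sqrt(c3*n_sr + c4)."""
    p_active = 1.0 - (1.0 - p_a) ** n_e
    z = (c1 * n_sr + c2) / sqrt(c3 * n_sr + c4)
    return p_active * phi(z)

# Ideal environment: c1 = 0, so P_r decays toward chance as n_sr grows;
# a differentiated environment (c1 > 0) gives a nonrandom asymptote.
print(p_correct(n_sr=1000, p_a=0.07, n_e=500,
                c1=0.0, c2=0.93, c3=0.02, c4=1.86))   # roughly 0.58
```

The coefficients c1 through c4 must be taken from Equations 5 through 11 for the particular system and environment; the numbers in the example call are illustrative only (c2 and c4 follow the ideal-environment pattern c2 = 1 − P_a and c4 = 2(1 − P_a) of Equation 7, while c3 is simply invented for the demonstration).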

The simplest cases to analyze are those in which the perceptron is shown stimuli drawn from an "ideal environment," consisting of randomly placed points of illumination, where there is no attempt to classify stimuli according to intrinsic similarity. Thus, in a typical learning experiment, we might show the perceptron 1,000 stimuli made up of random collections of illuminated retinal points, and we might arbitrarily reinforce R_1 as the "correct" response for the first 500 of these, and R_2 for the remaining 500. This environment is "ideal" only in the sense that we speak of an ideal gas in physics; it is a convenient artifact for purposes of analysis, and does not lead to the best performance from the perceptron. In the ideal environment situation, the constant c1 is always equal to zero, so that, in the case of P_g (where c2 is also zero), the value of Z will be zero, and P_g can never be any better than the random expectation of 0.5. The evaluation of P_r for these conditions, however, throws some interesting light on the differences between the alpha, beta, and gamma systems (Table 1).

First consider the alpha system, which has the simplest dynamics of the three. In this system, whenever an A-unit is active for one unit of time, it gains one unit of value. We will assume an experiment, initially, in which n_sr (the number of stimuli associated to each response) is constant for all responses. In this case, for the sum system,

[Equation 5 is not legible in this transcript.]    (5)

where ω = the fraction of responses connected to each A-unit. If the source-sets are disjunct, ω = 1/N_R, where N_R is the number of responses in the system. For the μ-system,

[Equation 6 is not legible in this transcript.]    (6)

The reduction of c3 to zero gives the μ-system a definite advantage over the Σ-system. Typical learning curves for these systems are compared in Fig. 7 and 8. Figure 9 shows the effect of variations in P_a upon the performance of the system.

FIG. 7. P_r(Σ) as function of n_sr, for discrete subsets. (ω_c = 0, P_a = .005. Ideal environment assumed. Abscissa: n_sr, the number of stimuli associated to each response, 100 to 100,000.)

FIG. 8. P_r(μ) as function of n_sr. (For P_a = .07, ω_c = 0. Ideal environment assumed.)

If n_sr, instead of being fixed, is treated as a random variable, so that the number of stimuli associated to each response is drawn separately from some distribution, then the performance of the α-system is considerably poorer than the above equations indicate. Under these conditions, the constants for the μ-system are

c1 = 0

c2 = 1 − P_a

c3 = [expression illegible in this transcript]

c4 = 2(1 − P_a)    (7)

where

q = ratio of σ_(n_sr) to n̄_sr,

N_R = number of responses in the system,

N_A = number of A-units in the system,

ω_c = proportion of A-units common to R_1 and R_2.

For this equation (and any others in which n_sr is treated as a random variable), it is necessary to define n_sr in Equation 4 as the expected value of this variable, over the set of all responses.

For the β-system, there is an even greater deficit in performance, due to the fact that the net value continues to grow regardless of what happens to the system. The large net values of the subsets activated by a stimulus tend to amplify small statistical differences, causing an unreliable performance. The constants in this case (again for the μ-system) are

c1 = 0

c2 = (1 − P_a)N_e

c3 = 2(P_a N_e q [remainder illegible])^1/2

c4 = 2(1 − P_a)[remainder illegible]    (8)

FIG. 9. P_r(μ) as function of P_a. (For n_sr = 1,000, ω_c = 0. Ideal environment assumed.)

In both the alpha and beta systems, performance will be poorer for the sum-discriminating model than for the mean-discriminating case. In the gamma-system, however, it can be shown that P_r(Σ) = P_r(μ); i.e., it makes no difference in performance whether the Σ-system or μ-system is used. Moreover, the constants for the γ-system, with variable n_sr, are identical to the constants for the alpha μ-system, with n_sr fixed (Equation 6). The performance of the three systems is compared in Fig. 10, which clearly demonstrates the advantage of the γ-system.

FIG. 10. Comparison of α, β, and γ systems, for variable n_sr. (N_R = 100, σ_(n_sr) = .5 n̄_sr, N_A = 10,000, P_a = .07, ω = .2.)


Let us now replace the "ideal environment" assumptions with a model for a "differentiated environment," in which several distinguishable classes of stimuli are present (such as squares, circles, and triangles, or the letters of the alphabet). If we then design an experiment in which the stimuli associated to each response are drawn from a different class, then the learning curves of the perceptron are drastically altered. The most important difference is that the constant c1 (the coefficient of n_sr in the numerator of Z) is no longer equal to zero, so that Equation 4 now has a nonrandom asymptote. Moreover, in the form for P_g (the probability of correct generalization), where c2 = 0, the quantity Z remains greater than zero, and P_g actually approaches the same asymptote as P_r. Thus the equation for the perceptron's performance after infinite experience with each class of stimuli is identical for P_r and P_g:

[Equation 9 is not legible in this transcript.]    (9)

This means that in the limit it makes no difference whether the perceptron has seen a particular test stimulus before or not; if the stimuli are drawn from a differentiated environment, the performance will be equally good in either case.

In order to evaluate the performance of the system in a differentiated environment, it is necessary to define the quantity P_c(αβ). This quantity is interpreted as the expected value of P_c between pairs of stimuli drawn at random from classes α and β. In particular, P_c11 is the expected value of P_c between members of the same class, and P_c12 is the expected value of P_c between an S_1 stimulus drawn from Class 1 and an S_2 stimulus drawn from Class 2. P_c1x is the expected value of P_c between members of Class 1 and stimuli drawn at random from all other classes in the environment.

If P_c11 > P_a > P_c12, the limiting performance of the perceptron (P_R∞) will be better than chance, and learning of some response, R_1, as the proper "generalization response" for members of Class 1 should eventually occur. If the above inequality is not met, then improvement over chance performance may not occur, and the Class 2 response is likely to occur instead. It can be shown (15) that for most simple geometrical forms, which we ordinarily regard as "similar," the required inequality can be met, if the parameters of the system are properly chosen.

The equation for P_r, for the sum-discriminating version of an α-perceptron, in a differentiated environment where n_sr is fixed for all responses, will have the following expressions for the four coefficients:

c1 = [illegible in this transcript]

c2 = P_a N_e (1 − P_c11)

c3 = [illegible; a summation over r = 1, 2]

c4 = [illegible; a summation over r = 1, 2]    (10)

where

σ_s²(P_c1r) and σ_s²(P_c1x) represent the variance of P_c1r and P_c1x measured over the set of possible test stimuli, S_t,

σ_A²(P_c1r) and σ_A²(P_c1x) represent the variance of P_c1r and P_c1x measured over the set of all A-units, and

ε = the covariance of P_c1r and P_c1x, which is assumed to be negligible.

The variances which appear in these expressions have not yielded, thus far, to a precise analysis, and can be treated as empirical variables to be determined for the classes of stimuli in question. If the sigma is set equal to half the expected value of the variable, in each case, a conservative estimate can be obtained. When the stimuli of a given class are all of the same shape, and uniformly distributed over the retina, the subscript-s variances are equal to zero. P_g∞ will be represented by the same set of coefficients, except for c2, which is equal to zero, as usual.

For the mean-discriminating system, the coefficients are:

c1 = (P_c11 − P_c12) × [remainder illegible, involving σ_A²(P_c1r)]

c2, c3 = [illegible in this transcript]

c4 = Σ over r = 1, 2 of [illegible expression in P_a, N_e, and P_c1r]    (11)

Some covariance terms, which are considered negligible, have been omitted here.

FIG. 11. P_r and P_g as function of n_sr. Parameters based on square-circle discrimination.

A set of typical learning curves for the differentiated environment model is shown in Fig. 11, for the mean-discriminating system. The parameters are based on measurements for a square-circle discrimination problem. Note that the curves for P_r and P_g both approach the same asymptotes, as predicted. The values of these asymptotes can be obtained by substituting the proper coefficients in Equation 9. As the number of association cells in the system increases, the asymptotic learning limit rapidly approaches unity, so that for a system of several thousand cells, the errors in performance should be negligible on a problem as simple as the one illustrated here.

As the number of responses in the system increases, the performance becomes progressively poorer, if every response is made mutually exclusive of all alternatives. One method of avoiding this deterioration (described in detail in Rosenblatt, 15) is through the binary coding of responses. In this case, instead of representing 100 different stimulus patterns by 100 distinct, mutually exclusive responses, a limited number of discriminating features is found, each of which can be independently recognized as being present or absent, and consequently can be represented by a single pair of mutually exclusive responses. Given an ideal set of binary characteristics (such as dark, light; tall, short; straight, curved; etc.), 100 stimulus classes could be distinguished by the proper configuration of only seven response pairs. In a further modification of the system, a single response is capable of denoting by its activity or inactivity the presence or absence of each binary characteristic. The efficiency of such coding depends on the number of independently recognizable "earmarks" that can be found to differentiate stimuli. If the stimulus can be identified only in its entirety and is not amenable to such analysis, then ultimately a separate binary response pair, or bit, is required to denote the presence or absence of each stimulus class (e.g., "dog" or "not dog"), and nothing has been gained over a system where all responses are mutually exclusive.
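The arithmetic behind the seven response pairs is simply that 2^7 = 128 ≥ 100. A toy illustration of such a binary code (the feature names are invented placeholders for whatever independently recognizable characteristics the system can learn):

```python
# Seven binary "earmark" responses suffice to distinguish 100 stimulus
# classes, since 2**7 = 128 >= 100.

FEATURES = ["dark", "tall", "straight", "closed", "symmetric", "thick", "wide"]

def code_for(class_index):
    """Assign each stimulus class the bit pattern of its index."""
    assert 0 <= class_index < 2 ** len(FEATURES)
    return {f: bool(class_index >> k & 1) for k, f in enumerate(FEATURES)}

print(code_for(37))   # one of up to 128 distinct 7-bit response patterns
```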

BIVALENT SYSTEMS

In all of the systems analyzed up to this point, the increments of value gained by an active A-unit, as a result of reinforcement or experience, have always been positive, in the sense that an active unit has always gained in its power to activate the responses to which it is connected. In the gamma-system, it is true that some units lose value, but these are always the inactive units, the active ones gaining in proportion to their rate of activity. In a bivalent system, two types of reinforcement are possible (positive and negative), and an active unit may either gain or lose in value, depending on the momentary state of affairs in the system. If the positive and negative reinforcement can be controlled by the application of external stimuli, they become essentially equivalent to "reward" and "punishment," and can be used in this sense by the experimenter. Under these conditions, a perceptron appears to be capable of trial-and-error learning. A bivalent system need not necessarily involve the application of reward and punishment, however. If a binary-coded response system is so organized that there is a single response or response-pair to represent each "bit," or stimulus characteristic that is learned, with positive feedback to its own source-set if the response is "on," and negative feedback (in the sense that active A-units will lose rather than gain in value) if the response is "off," then the system is still bivalent in its characteristics. Such a bivalent system is particularly efficient in reducing some of the bias effects (preference for the wrong response due to greater size or frequency of its associated stimuli) which plague the alternative systems.

Several forms of bivalent systems have been considered (15, Chap. VII). The most efficient of these has the following logical characteristics.

If the system is under a state of positive reinforcement, then a positive ΔV is added to the values of all active A-units in the source-sets of "on" responses, while a negative ΔV is added to the active units in the source-sets of "off" responses. If the system is currently under negative reinforcement, then a negative ΔV is added to all active units in the source-set of an "on" response, and a positive ΔV is added to active units in an "off" source-set. If the source-sets are disjunct (which is essential for this system to work properly), the equation for a bivalent γ-system has the same coefficients as the monovalent α-system, for the μ-case (Equation 11).
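In code form, the rule just described might be sketched as follows (our rendering, not the original program; `on` is the set of currently active responses, and `positive` gives the sign of the reinforcement):

```python
def bivalent_update(values, source_sets, active_units, on, positive, dv=1.0):
    """Bivalent reinforcement: active A-units feeding "on" responses gain
    value under positive reinforcement and lose it under negative
    reinforcement; active units feeding "off" responses do the opposite."""
    for response, units in source_sets.items():
        sign = 1.0 if (response in on) == positive else -1.0
        for i in units:
            if i in active_units:
                values[i] += sign * dv
    return values
```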

The performance curves for this system are shown in Fig. 12, where the asymptotic generalization probability attainable by the system is plotted for the same stimulus parameters that were used in Fig. 11. This is the probability that all bits in an n-bit response pattern will be correct. Clearly, if a majority of correct responses is sufficient to identify a stimulus correctly, the performance will be better than these curves indicate.

In a form of bivalent system which utilizes more plausible biological assumptions, A-units may be either excitatory or inhibitory in their effect on connected responses. A positive ΔV in this system corresponds to the incrementing of an excitatory unit, while a negative ΔV corresponds to the incrementing of an inhibitory unit.

Such a system performs similarly to the one considered above, but can be shown to be less efficient.

Bivalent systems similar to those illustrated in Fig. 12 have been simulated in detail in a series of experiments with the IBM 704 computer at the Cornell Aeronautical Laboratory. The results have borne out the theory in all of its main predictions, and will be reported separately at a later time.

FIG. 12. P_g∞ for a bivalent binary system (same parameters as Fig. 11).

IMPROVED PERCEPTRONS AND SPONTANEOUS ORGANIZATION

The quantitative analysis of perceptron performance in the preceding sections has omitted any consideration of time as a stimulus dimension. A perceptron which has no capability for temporal pattern recognition is referred to as a "momentary stimulus perceptron." It can be shown (15) that the same principles of statistical separability will permit the perceptron to distinguish velocities, sound sequences, etc., provided the stimuli leave some temporarily persistent trace, such as an altered threshold, which causes the activity in the A-system at time t to depend to some degree on the activity at time t − 1.

It has also been assumed that the origin points of A-units are completely random. It can be shown that by a suitable organization of origin points, in which the spatial distribution is constrained (as in the projection area origins shown in Fig. 1), the A-units will become particularly sensitive to the location of contours, and performance will be improved.

In a recent development, which we hope to report in detail in the near future, it has been proven that if the values of the A-units are allowed to decay at a rate proportional to their magnitude, a striking new property emerges: the perceptron becomes capable of "spontaneous" concept formation. That is to say, if the system is exposed to a random series of stimuli from two "dissimilar" classes, and all of its responses are automatically reinforced without any regard to whether they are "right" or "wrong," the system will tend towards a stable terminal condition in which (for each binary response) the response will be "1" for members of one stimulus class, and "0" for members of the other class; i.e., the perceptron will spontaneously recognize the difference between the two classes. This phenomenon has been successfully demonstrated in simulation experiments, with the 704 computer.

A perceptron, even with a single logical level of A-units and response units, can be shown to have a number of interesting properties in the field of selective recall and selective attention. These properties generally depend on the intersection of the source-sets for different responses, and are elsewhere discussed in detail (15). By combining audio and photo inputs, it is possible to associate sounds, or auditory "names" to visual objects, and to get the perceptron to perform such selective responses as are designated by the command "Name the object on the left," or "Name the color of this stimulus."

The question may well be raised at this point of where the perceptron's capabilities actually stop. We have seen that the system described is sufficient for pattern recognition, associative learning, and such cognitive sets as are necessary for selective attention and selective recall. The system appears to be potentially capable of temporal pattern recognition, as well as spatial recognition, involving any sensory modality or combination of modalities. It can be shown that with proper reinforcement it will be capable of trial-and-error learning, and can learn to emit ordered sequences of responses, provided its own responses are fed back through sensory channels.

Does this mean that the perceptron is capable, without further modification in principle, of such higher order functions as are involved in human speech, communication, and thinking? Actually, the limit of the perceptron's capabilities seems to lie in the area of relative judgment, and the abstraction of relationships. In its "symbolic behavior," the perceptron shows some striking similarities to Goldstein's brain-damaged patients (5). Responses to definite, concrete stimuli can be learned, even when the proper response calls for the recognition of a number of simultaneous qualifying conditions (such as naming the color if the stimulus is on the left, the shape if it is on the right). As soon as the response calls for the recognition of a relationship between stimuli (such as "Name the object left of the square," or "Indicate the pattern that appeared before the circle."), however, the problem generally becomes excessively difficult for the perceptron. Statistical separability alone does not provide a sufficient basis for higher order abstraction. Some system, more advanced in principle than the perceptron, seems to be required at this point.

CONCLUSIONS AND EVALUATION

The main conclusions of the theoretical study of the perceptron can be summarized as follows:

1. In an environment of random stimuli, a system consisting of randomly connected units, subject to the parametric constraints discussed above, can learn to associate specific responses to specific stimuli. Even if many stimuli are associated to each response, they can still be recognized with a better-than-chance probability, although they may resemble one another closely and may activate many of the same sensory inputs to the system.

2. In such an "ideal environment," the probability of a correct response diminishes towards its original random level as the number of stimuli learned increases.

3. In such an environment, no basis for generalization exists.

4. In a "differentiated environment," where each response is associated to a distinct class of mutually correlated, or "similar" stimuli, the probability that a learned association of some specific stimulus will be correctly retained typically approaches a better-than-chance asymptote as the number of stimuli learned by the system increases. This asymptote can be made arbitrarily close to unity by increasing the number of association cells in the system.

5. In the differentiated environment, the probability that a stimulus which has not been seen before will be correctly recognized and associated to its appropriate class (the probability of correct generalization) approaches the same asymptote as the probability of a correct response to a previously reinforced stimulus. This asymptote will be better than chance if the inequality P_c12 < P_a < P_c11 is met, for the stimulus classes in question.

6. The performance of the system can be improved by the use of a contour-sensitive projection area, and by the use of a binary response system, in which each response, or "bit," corresponds to some independent feature or attribute of the stimulus.

7. Trial-and-error learning is possible in bivalent reinforcement systems.

8. Temporal organizations of both stimulus patterns and responses can be learned by a system which uses only an extension of the original principles of statistical separability, without introducing any major complications in the organization of the system.

9. The memory of the perceptron is distributed, in the sense that any association may make use of a large proportion of the cells in the system, and the removal of a portion of the association system would not have an appreciable effect on the performance of any one discrimination or association, but would begin to show up as a general deficit in all learned associations.

10. Simple cognitive sets, selective recall, and spontaneous recognition of the classes present in a given environment are possible. The recognition of relationships in space and time, however, seems to represent a limit to the perceptron's ability to form cognitive abstractions.

Psychologists, and learning theorists in particular, may now ask: "What has the present theory accomplished, beyond what has already been done in the quantitative theories of Hull, Bush and Mosteller, etc., or physiological theories such as Hebb's?" The present theory is still too primitive, of course, to be considered as a full-fledged rival of existing theories of human learning. Nonetheless, as a first approximation, its chief accomplishment might be stated as follows:

For a given mode of organization (α, β, or γ; Σ or μ; monovalent or bivalent) the fundamental phenomena of learning, perceptual discrimination, and generalization can be predicted entirely from six basic physical parameters, namely:

x: the number of excitatory connections per A-unit,

y: the number of inhibitory connections per A-unit,

θ: the expected threshold of an A-unit,

ω: the proportion of R-units to which an A-unit is connected,

N_A: the number of A-units in the system, and

N_R: the number of R-units in the system.

N_S (the number of sensory units) becomes important if it is very small. It is assumed that the system begins with all units in a uniform state of value; otherwise the initial value distribution would also be required. Each of the above parameters is a clearly defined physical variable, which is measurable in its own right, independently of the behavioral and perceptual phenomena which we are trying to predict.

As a direct consequence of its foundation on physical variables, the present system goes far beyond existing learning and behavior theories in three main points: parsimony, verifiability, and explanatory power and generality. Let us consider each of these points in turn.

1. Parsimony. Essentially all of the basic variables and laws used in this system are already present in the structure of physical and biological science, so that we have found it necessary to postulate only one hypothetical variable (or construct) which we have called V, the "value" of an association cell; this is a variable which must conform to certain functional characteristics which can clearly be stated, and which is assumed to have a potentially measurable physical correlate.

2. Verifiability. Previous quantitative learning theories, apparently without exception, have had one important characteristic in common: they have all been based on measurements of behavior, in specified situations, using these measurements (after theoretical manipulation) to predict behavior in other situations. Such a procedure, in the last analysis, amounts to a process of curve fitting and extrapolation, in the hope that the constants which describe one set of curves will hold good for other curves in other situations. While such extrapolation is not necessarily circular, in the strict sense, it shares many of the logical difficulties of circularity, particularly when used as an "explanation" of behavior. Such extrapolation is difficult to justify in a new situation, and it has been shown that if the basic constants and parameters are to be derived anew for any situation in which they break down empirically (such as change from white rats to humans), then the basic "theory" is essentially irrefutable, just as any successful curve-fitting equation is irrefutable. It has, in fact, been widely conceded by psychologists that there is little point in trying to "disprove" any of the major learning theories in use today, since by extension, or a change in parameters, they have all proved capable of adapting to any specific empirical data. This is epitomized in the increasingly common attitude that a choice of theoretical model is mostly a matter of personal aesthetic preference or prejudice, each scientist being entitled to a favorite model of his own. In considering this approach, one is reminded of a remark attributed to Kistiakowsky, that "given seven parameters, I could fit an elephant." This is clearly not the case with a system in which the independent variables, or parameters, can be measured independently of the predicted behavior. In such a system, it is not possible to "force" a fit to empirical data, if the parameters in current use should lead to improper results. In the current theory, a failure to fit a curve in a new situation would be a clear indication that either the theory or the empirical measurements are wrong. Consequently, if such a theory does hold up for repeated tests, we can be considerably more confident of its validity and of its generality than in the case of a theory which must be hand-tailored to meet each situation.

3. Explanatory power and generality. The present theory, being derived from basic physical variables, is not specific to any one organism or learning situation. It can be generalized in principle to cover any form of behavior in any system for which the physical parameters are known. A theory of learning, constructed on these foundations, should be considerably more powerful than any which has previously been proposed. It would not only tell us what behavior might occur in any known organism, but would permit the synthesis of behaving systems, to meet special requirements. Other learning theories tend to become increasingly qualitative as they are generalized.

Thus a set of equations describing the effects of reward on T-maze learning in a white rat reduces simply to a statement that rewarded behavior tends to occur with increasing probability, when we attempt to generalize it to any species and any situation. The theory which has been presented here loses none of its precision through generality.

The theory proposed by Donald Hebb (7) attempts to avoid these difficulties of behavior-based models by showing how psychological functioning might be derived from neurophysiological theory. In his attempt to achieve this, Hebb's philosophy of approach seems close to our own, and his work has been a source of inspiration for much of what has been proposed here. Hebb, however, has never actually achieved a model by which behavior (or any psychological data) can be predicted from the physiological system. His physiology is more a suggestion as to the sort of organic substrate which might underlie behavior, and an attempt to show the plausibility of a bridge between biophysics and psychology.

The present theory represents the first actual completion of such a bridge. Through the use of the equations in the preceding sections, it is possible to predict learning curves from neurological variables, and likewise, to predict neurological variables from learning curves. How well this bridge stands up to repeated crossings remains to be seen. In the meantime, the theory reported here clearly demonstrates the feasibility and fruitfulness of a quantitative statistical approach to the organization of cognitive systems. By the study of systems such as the perceptron, it is hoped that those fundamental laws of organization which are common to all information handling systems, machines and men included, may eventually be understood.

REFERENCES

1. ASHBY, W. R. Design for a brain. New York: Wiley, 1952.

2. CULBERTSON, J. T. Consciousness and behavior. Dubuque, Iowa: Wm. C. Brown, 1950.

3. CULBERTSON, J. T. Some uneconomical robots. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 99-116.

4. ECCLES, J. C. The neurophysiological basis of mind. Oxford: Clarendon, 1953.

5. GOLDSTEIN, K. Human nature in the light of psychopathology. Cambridge: Harvard Univer. Press, 1940.

6. HAYEK, F. A. The sensory order. Chicago: Univer. Chicago Press, 1952.

7. HEBB, D. O. The organization of behavior. New York: Wiley, 1949.

8. KLEENE, S. C. Representation of events in nerve nets and finite automata. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 3-41.

9. KOHLER, W. Relational determination in perception. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley, 1951. Pp. 200-243.

10. MCCULLOCH, W. S. Why the mind is in the head. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley, 1951. Pp. 42-111.

11. MCCULLOCH, W. S., & PITTS, W. A logical calculus of the ideas immanent in nervous activity. Bull. math. Biophysics, 1943, 5, 115-133.

12. MILNER, P. M. The cell assembly: Mark II. Psychol. Rev., 1957, 64, 242-252.

13. MINSKY, M. L. Some universal elements for finite automata. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 117-128.

14. RASHEVSKY, N. Mathematical biophysics. Chicago: Univer. Chicago Press, 1938.

15. ROSENBLATT, F. The perceptron: A theory of statistical separability in cognitive systems. Buffalo: Cornell Aeronautical Laboratory, Inc. Rep. No. VG-1196-G-1, 1958.

16. UTTLEY, A. M. Conditional probability machines and conditioned reflexes. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 253-275.

17. VON NEUMANN, J. The general and logical theory of automata. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley, 1951. Pp. 1-41.

18. VON NEUMANN, J. Probabilistic logics and the synthesis of reliable organisms from unreliable components. In C. E. Shannon & J. McCarthy (Eds.), Automata studies. Princeton: Princeton Univer. Press, 1956. Pp. 43-98.

(Received April 23, 1958)