
REPRINT OF

INFORMATION PROCESSING 1962
PROCEEDINGS OF IFIP CONGRESS 62

INTERNATIONAL FEDERATION FOR INFORMATION PROCESSING

MUNICH, AUGUST 27 TO SEPTEMBER 1, 1962

NORTH-HOLLAND PUBLISHING COMPANY, AMSTERDAM


COMPUTER PROGRAMMING AND FORMAL SYSTEMS

EDITORS

P. BRAFFORT and D. HIRSCHBERG

6 x 9" 170 pp. Gld. 20.—; 40s.; $5.60

Computer programmers and logicians are becoming increasingly interested in one another's techniques. This is due to the growing diversity of computer usage, a situation which justifies a more fundamental approach to the symbol manipulation involved, and also to the fact that one of the applications happens to be automatic theorem proving. This book presents a collection of articles dealing with various aspects of the relationship between formal system theory and computer programming and with a number of borderline problems. The papers are based on contributions made at two seminars which were held at the IBM WTEC Education Centre of Blaricum (Holland), in April and October 1961.

CONTENTS: Introduction; 1. Mechanical Mathematics and Inferential Analysis (Hao Wang); 2. Observations concerning Computing, Deduction and Heuristics (E. W. Beth); 3. A Basis for a Mathematical Theory of Computation (John McCarthy); 4. An Abstract Computer with a Lisp-like Machine Language without a Label Operator (P. C. Gilmore); 5. A Simplified Proof Method for Elementary Logic (Stig Kanger); 6. A Basis for the Mechanization of the Theory of Equations (A. Robinson); 7. Programming and the Theory of Automata (Arthur W. Burks); 8. The Algebraic Theory of Context-Free Languages (N. Chomsky and M. P. Schützenberger).


LEARNING, GENERALITY AND PROBLEM-SOLVING

A. NEWELL*

Carnegie Institute of Technology, Pittsburgh, Pa., USA

1. Introduction

In the field of artificial intelligence we are engaged in constructing mechanisms that reproduce the information processing we see in man. Sometimes we wish only to match or surpass his external performance; sometimes we are at pains to simulate his behavior in detail. For this paper the distinction is of little importance. For many sufficiently limited symbolic tasks, such as multiplication, we both understand the task and can greatly exceed man's performance; interest fades from these. We are more concerned with performances that seem highly sophisticated and elaborate, where we understand completely neither the task nor how to accomplish it. Yet we demand a certain clarity: dreaming, reverie and wit still lie outside the operational boundaries of artificial intelligence in 1962.

Among the functions that do concern us, learning plays a peculiarly crucial and subtle role. For instance, the following typical questions are asked with great regularity by sophisticated visitors:

"I understand that your program plays chess, butdoes it learn to play better?""No.""Oh." (with an intonation of disappointment)"Will your chess program repeat itself exactlyif it is payed against in the same way?""Yes.""I see." (with a hint of satisfaction: the programhas failed some crucial test)

Throughout the field of artificial intelligence emphasis is continually placed on learning. Pattern recognition programs, for instance, continually underplay the performance of recognition, emphasizing instead the ability to learn to recognize new patterns. When Samuel's checker player 1) is cited in discussions of problem solving, its learning is usually stressed as a crucial feature.

Why does learning play this crucial role? What is its special power? I would like to explore these questions and their implications for progress in artificial intelligence.

2. The Problem of Learning

Learning is a broad but well established concept that stands for a cluster of notions, many of which are implicit. Discussions of learning, such as the following, are not attempts at precision; they constitute hypotheses about what are the important features of the existing concept.

* I am much indebted to my colleague H. A. Simon for discussions on the issues raised in this paper.

If we observe that a performance of X is a function of its experience in the past, then we say that X learned from its experience. This corresponds generally to the man-in-the-street's notion of learning. There are so many things wrong with it that the psychologists long ago elaborated it to the paradigm of fig. 1. At time T we observe the performance of X on a specific task; if at T' its performance on the same task has changed, it has learned something from its previous performance and intervening experience. The task may be repeated over and over again in order to study the course of learning. With a few caveats, this paradigm has sufficed in the psychological laboratory.

Fig. 1. Learning paradigm (a performance program observed at time T and again at time T')

The paradigm is also much used in the study of artificial intelligence. It has even been slightly elaborated, since we can determine the internal structure of our learning machines precisely. The machine consists of a performance program for doing the task and a learning program that modifies the performance program as a function of its behavior on the task.
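Stated as code, the elaborated paradigm is just a pair of procedures sharing a modifiable performance program. The following minimal sketch uses an invented numerical task and reinforcement rule, purely for illustration:

```python
# A minimal sketch of the elaborated learning paradigm: a machine is a
# performance program plus a learning program that modifies it as a
# function of behavior on the task. The task and the update rule are
# hypothetical illustrations, not any particular program of the period.

class LearningMachine:
    def __init__(self):
        self.weight = 0.0            # the modifiable part of the performance program

    def perform(self, stimulus):
        """Performance program: produce a response on the task."""
        return stimulus * self.weight

    def learn(self, stimulus, target):
        """Learning program: observe behavior on the task and modify
        the performance program accordingly."""
        error = target - self.perform(stimulus)
        self.weight += 0.1 * error * stimulus   # simple reinforcement

machine = LearningMachine()
print(machine.perform(2.0))          # performance at time T
for trial in range(50):              # the task repeated over and over
    machine.learn(stimulus=2.0, target=6.0)
print(machine.perform(2.0))          # performance at time T' has changed
```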

Does this paradigm express the aspects of learning that are really important to us? Let me test it against the existing problem solving programs. By common consent these are not learning programs (witness the dialogue just quoted on the chess program), yet they fit the paradigm exactly. Fig. 2 shows the top-level flow diagram of LT, one of the early theorem-proving programs 2).

The executive routine is performed repeatedly in the same situation. On first performance it usually does not solve the problem and on some later performance it does. Its eventual success, that is, its change in performance, is due to the experience accumulated from the intervening trials. Even LT's internal structure corresponds to the structure of a learning machine. The top box with the subproblem tree corresponds to the performance program. This includes the substitution method, which is the only method in LT that can actually solve problems. The lower box in the diagram corresponds to the learning program. It contains the detachment and chaining methods, which create new subproblems and add them to the subproblem tree, that is, modify the performance program. This part is called into play only after the performance program fails; that is, learning occurs only on failure. The modification is adaptive, since by construction it generates new subproblems that increase the probability of success on later trials.

Fig. 2. LT executive
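This division of labor can be caricatured in a few lines. In the sketch below the substitution, detachment and chaining methods are passed in as stubs; everything specific to LT's symbolic logic is omitted:

```python
# A schematic sketch of the LT executive as just described: substitution
# is the only method that can actually solve a subproblem; detachment
# and chaining only create new subproblems and add them to the tree,
# i.e. they modify the performance program, and only after it fails.

def lt_executive(problem, substitution, detachment, chaining, limit=100):
    subproblems = [problem]                 # the subproblem tree, flattened
    for _ in range(limit):
        for sub in list(subproblems):       # the performance program
            proof = substitution(sub)
            if proof is not None:
                return proof                # the original theorem is proved
        # Performance failed; the learning program modifies it by
        # generating new subproblems via detachment and chaining.
        new = [q for p in subproblems for q in detachment(p) + chaining(p)]
        subproblems.extend(q for q in new if q not in subproblems)
    return None

# Toy usage: "prove" a problem by counting it down to zero.
print(lt_executive(3,
                   substitution=lambda p: "proved" if p == 0 else None,
                   detachment=lambda p: [p - 1] if p > 0 else [],
                   chaining=lambda p: []))
```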

This example is not a trick. LT can in fact be said to learn the solution. However, LT is still not usually considered a learning program, nor do I consider it one. Instead, important elements exist in our concept of learning besides those given in the standard paradigm.

The most important way the problem solvers seem deficient as learning machines is in the use they make of their experience. All the information is accumulated to obtain a single result: to prove the original theorem. Once this is done, the experiences have no more value. In contrast, the devices we do call learning machines use their experience to prepare for a whole range of future situations. Pattern recognizers use the learning trials to extract information from which innumerable varying examples can be recognized. Samuel's checker player improves its function for evaluating positions, which is used in all future play. Significant learning seems to imply obtaining something of general utility from the experience.

We also seem to require a learning program to use genuine generalization and induction in processing its experience. The problem solvers are again deficient on this score, since what is learned from each new experience, from each new subproblem, is primarily that it is not a solution. In contrast, some basic postulate of generalization is a prominent part of current learning machines. Often this is simply "what has worked in the past will work in the future," but it always exists.

3. The Problem of Generality

These added factors, general utility and induction, that imply significant learning, as opposed to learning that simply satisfies the paradigm, reveal a deep concern in artificial intelligence to construct a machine general enough to transcend the vision of its designer.* Learning is viewed as the major means of achieving this. Man has the ability to get along on his own and to define for himself the terms on which he will enter into dependencies with parts of his environment. In artificial intelligence, we face the prospect of eternally special-purpose machines, brilliant within a narrow range, but always encased within an artificial universe bounded by the limited vision of their designers.

Generality, then, is one of our major goals: the ability to cope with the range and diversity of the real world. Theoretically, we do not care how we get such a machine. We would accept a perfectly constructed general problem solver. Such a machine would not need to learn at all! That we reject this possibility, as by common consent I believe we do, expresses the conviction that the real world is too diverse, too full of details and too complex to permit any pure "performance" device to achieve perfection. Continuous learning is the price of generality.

When learning is emphasized as the basic solution to the problem of generality, we must include all the ways experience might be processed to prepare for future action. The world itself determines what regularities exist over time and place to be exploited by learning; their nature determines in large measure the kind of processing that must be used.

However, current learning machines utilize an extremely narrow range of mechanisms. The central ideas are well known: repetition, simple reinforcement, statistically determined weights. They derive in many ways from taking seriously the paradigm of fig. 1 as realized in animal experimentation. The range of relevant processes is clearly broader than this, however difficult it may be to visualize the possibilities. Our horizons need expanding on the varieties of mechanisms that are relevant and requisite to achieving generality.

4. Representations

All learning requires an internal representation of the experiences to be made available for later use. This may be as simple as a recorded fact, as obscure as the connectivity of a network, or as elaborate as a detailed map, but it must exist. The representation forms a crucial bridge, not just between the moment of gathering and the moment of using information, but between descriptions of the environment and determinations of action 3). The representation is pulled in two directions: processes must translate from the raw experience to the representation; to simplify this translation, the representation should be simple in terms of the environment. But processes must also translate from the representation to action; to simplify this translation, the representation should be simple in terms of the action principles of the machine. Complexity of translation can be exchanged between the encoding and decoding processes by an appropriate choice of representation.** In all events, a certain gap must be bridged to get from the environment to its implications for action. The representation remains the filter through which all information must pass.

* This same concern is expressed, although improperly, in the question of whether a machine can outperform its designer. This question was laid to rest long ago, both trivially in tasks like multiplication and non-trivially in tasks like checkers.

** There is a nice discussion of this in 4).

Representations of experience are crucial in another way. They form limits, not only to what can be accomplished by the machine, but to what can be envisaged by the designer. New forms of learning require the invention of representations that can express new potentialities, that make variable what was previously fixed. These extensions are rarely suggested by existing structures; they constitute true inventions. For example, LT never learned new methods, because we could not invent a space of possible methods expressed in a language that LT could manipulate. A consequence of this paucity of good representations is that existing learning programs tend to cluster around the few existing representations. Here, above all, we need our horizons extended.

Most of the representations currently in use are extremely close to the action scheme of their machines. This is quite natural, since the easiest way to get a learning scheme is to generalize some feature of the way an existing machine performs. The problem of translation from the representation to action is thereby solved, since the representation is already in terms directly understood by the performance program. However, translation of the raw experience into the representation must still occur. Our excessive tendency to describe an environment solely by the dichotomy, "succeed" or "fail," may partly reflect the difficulties of this translation.

Many learning programs use some collection of numerical parameters of the action scheme as a representation, which are then optimized by experience. From our viewpoint here, these are the least interesting kinds of learning, precisely because they never lead beyond themselves, which does not deny their occasional effectiveness. The variety of the world is far richer than can be represented by a fixed set of numbers controlling a machine of fixed structure.

More interesting are representations using extremely general languages of action. Much of the work in learning machines has used networks of linear threshold elements, the set of threshold values serving as the representation of experience. If I dismiss this work somewhat summarily, it is only because this representation still seems too limiting to express, without great organizational innovation, the kind of detail and complexity both of action and environment that we see in the world.

I am more intrigued with programming languages, the one known class of action languages able to express truly complex behaviors. However, programs are extremely obdurate languages for encoding descriptive experiences, which is why they have not been used much in learning programs. Early examples, such as Friedberg's 5), did not incorporate enough power and structure to fashion programs that could accomplish useful tasks. Work by Kilburn, Grimsdale and Sumner three years ago 6) removed some of the deficiency, and was considerably more successful. Currently, at least two efforts are developing programs that construct programs. One, by Amarel 7), takes a set of examples of the desired inputs and outputs and tries to discover a program that yields these; the other, by my colleague H. A. Simon 8), takes general defining statements about the input and output and attempts to find the program. These programs are not learning programs, precisely as LT is not.* They solve the problem of constructing a program that meets certain conditions. Their significance lies in developing techniques for translating from descriptive information to a language of action rich enough to express complex behavior. Continued advance along these lines should finally permit representations of experience that are much richer and that more directly reflect the environment than the action-oriented representations we use currently.

Another kind of action language is worth noting. Pattern recognition of the non-network variety tends to use a set of features of the sample to be identified, along with learning schemes for selecting and weighting these features. The representational limitations of sets of weights have already been commented upon. Recently, programs have been written that produce their own features. The most advanced scheme is that of Uhr and Vossler 9).** From our viewpoint the advance of these programs derives from representing past experience not merely as numerical weights, but as features.

The ways of representing experience discussed above were developed directly out of the action structure of the machine. If we turn to the other side, to representations that mirror the environment, less has been done. The work of Remus 12) moves in this direction by basing actions directly on a descriptive classification of the environment. More generally, these would be programs that seek to construct models, theories or explanations of their environment, independent of particular action implications. A hypothesis testing program by Feldman, Tonge and Kanter 13) is an example. Programs designed to obey natural language 14, 4) also constitute exceptions, since they face the problem of first understanding what is being said. These latter programs, by exploring the possibilities for environment-oriented representations, have great relevance for generality and learning, even though they are not cast directly in the form of learning machines.

I have been emphasizing the critical role of the internal representation of experience, and the limits it puts on our vision in constructing general machines. Whole ranges of mechanisms seem to me to lie outside the view of much of the current work on learning machines, limited as they are by implicit assumptions about how experience should be inducted and represented. Consequently I have been stressing those areas of research (programs that construct programs, recognizers that create new features, and programs that build descriptive models of the environment) which expand our horizons by making available new forms of representation. I would like to expand further the possibilities for using experience by means of an example.

5. An Example from GPS

The program called GPS was described at the Unesco Conference in Paris in 1959 15) and has been reported on several times since.*** GPS is a computer program for solving problems, a member of the class of game-playing and theorem-proving programs. It has been used both for exploration into artificial intelligence, and for detailed simulation of human problem solving 16, 17). Its performance in both of these aspects has been dominated by the problems it has presented in program organization 18).

* Nevertheless, the early programs called themselves learning programs.

** Interestingly, the very early program of Selfridge and Dinneen 10, 11) also created its own features.

*** GPS is the joint work of J. C. Shaw, H. A. Simon and the author. This section stems from this joint effort, although I alone am responsible for its shortcomings. I am indebted to C. Bush for his help in running GPS.

Fig. 3 shows the general scheme by which GPS operates. Much goes on under the surface of this diagram; yet it is still the most adequate simple picture.

Fig. 3. Means-ends analysis (transform object A into object B: match A to B, reduce the difference D, transform the new A' into B; reduce difference D: search for an operator Q relevant to D, apply Q to A; apply operator Q to object A: match A to the conditions C(Q), reduce the difference, apply Q to A')

GPS is a program for accepting a task environment defined in terms of discrete objects, operators that manipulate these objects, and particular tasks like transforming one object into another. It performs these tasks by growing a structure of goals. Each goal is a data structure that describes some state of affairs to be achieved and gives ancillary information about methods, history and environment. Different types of goal exist, each with their own methods. The three methods shown are the crucial ones. (1) To transform an object A into an object B, the first method matches the two objects; if they are not the same, a difference is found, which leads to setting up a subgoal of reducing that difference. If this subgoal is attained, a new object A' is produced, which hopefully is more like B. The subgoal is then set up to transform A' into B. (2) The principal method to reduce a difference is to find an available operator relevant to that difference, and to set up the subgoal of applying it. (3) To apply an operator, the operand is matched to the conditions required by the operator. This may lead to a difference, which requires setting up a subgoal to reduce it, similar to the earlier case. The methods, these three along with others, grow a tree of subgoals in the process of solving a problem. In addition, GPS has devices for pruning and shaping the tree of subgoals: checking for duplicate objects and goals; rejecting goals as unprofitable; and selecting goals as especially worthwhile.
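The three methods can be rendered as a pair of mutually recursive procedures. The sketch below invents a deliberately simple task environment (objects are sets of symbols; an operator adds and deletes symbols under a precondition) and omits the goal-tree bookkeeping, but the recursion is the means-ends analysis just described:

```python
# A sketch of means-ends analysis over an invented task environment.
# Objects are frozensets of symbols; an operator has preconditions C(Q),
# an add-set and a delete-set. Not the 1962 program, only its scheme.

from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    pre: frozenset        # conditions C(Q) the operand must satisfy
    add: frozenset
    delete: frozenset

def differences(a, b):
    """Method 1's match: the elementary differences between A and B."""
    return (a - b) | (b - a)

def transform(a, b, operators, depth=10):
    """Goal: transform object A into object B."""
    if depth == 0:
        return None
    if not differences(a, b):
        return []                                    # A already matches B
    for q in operators:                              # method 2: find an
        if not (q.add | q.delete) & differences(a, b):   # operator relevant
            continue                                     # to the difference
        result = apply_op(a, q, operators, depth - 1)
        if result is not None:
            a_new, steps = result                    # a new object A',
            rest = transform(a_new, b, operators, depth - 1)   # more like B
            if rest is not None:
                return steps + rest
    return None

def apply_op(a, q, operators, depth):
    """Goal: apply operator Q to object A (method 3)."""
    if not q.pre <= a:                     # match A to C(Q): a difference,
        steps = transform(a, a | q.pre, operators, depth)     # so reduce it
        if steps is None:
            return None
        a = a | q.pre     # simplification: assume the subgoal achieved C(Q)
    else:
        steps = []
    return (a - q.delete) | q.add, steps + [q.name]

ops = [Operator("make-x", frozenset(), frozenset({"x"}), frozenset()),
       Operator("x-to-y", frozenset({"x"}), frozenset({"y"}), frozenset({"x"}))]
print(transform(frozenset(), frozenset({"y"}), ops))   # ['make-x', 'x-to-y']
```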

With this brief sketch, let me consider some of the issues raised earlier. A problem for GPS is basically defined by objects and operators. The differences are the constructs that mediate between them, that allow GPS to select artfully the operators that are appropriate to transform one object into another. Fig. 4 shows the table of connections for logic, whereby differences lead to relevant operators. The differences are not given explicitly by either the task environment or the specific task; clearly GPS should learn them for itself. The fact of the program is otherwise: we, GPS's designers, give it differences as part of its equipment to deal with a task when we define a task environment. Thus, differences represent a sensitive point of dependence of GPS upon its designers.

Three years ago in a paper entitled "A Variety of Intelligent Learning in GPS" 19) we discussed these same issues, and there also focussed on the differences. First we observed that, although a simple standard learning paradigm could be used to learn the connections in the table of fig. 4, the information could be generated directly. If the input and output forms of the operators were matched, the differences so found would be the table entries; that is, the operators were relevant to precisely those differences they produced. Looking, rather than learning from repetitive trial, was the appropriate way to gain experience here.
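The point is easily made concrete: if each operator carries an input form and an output form, one pass of matching yields the whole table of connections, with no repetitive trials at all. A sketch, with invented attribute-style operator forms:

```python
# "Looking rather than learning": generate the table of connections
# directly by matching each operator's input form against its output
# form; the differences found are exactly the table entries. The
# operator forms and difference names below are invented examples.

def differences(before, after):
    """Attributes whose values differ between the input and output forms."""
    return {k for k in set(before) | set(after)
            if before.get(k) != after.get(k)}

def table_of_connections(operators):
    table = {}                              # difference -> relevant operators
    for name, (inp, out) in operators.items():
        for d in differences(inp, out):
            table.setdefault(d, []).append(name)
    return table

operators = {
    "commute":   ({"position": "A.B"}, {"position": "B.A"}),
    "de-morgan": ({"connective": "v", "sign": "+"},
                  {"connective": ".", "sign": "-"}),
}
for diff, names in sorted(table_of_connections(operators).items()):
    print(diff, "->", names)
# connective -> ['de-morgan']
# position -> ['commute']
# sign -> ['de-morgan']
```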

We went on to consider how the differences themselves might be generated by GPS. As initially created, each difference was simply a symbol, linked to a machine language program to detect the difference. From GPS's viewpoint differences were unanalyzable unities; from ours they were members of the class of all programs. From either viewpoint no learning was possible. We first had to invent a representation of differences in terms of a simpler programming language, one in which differences, though programs, were represented as objects. We then gave GPS operators to construct and modify these difference programs, and differences to detect features of these programs. We thus cast the learning of differences into the form of a problem for GPS in terms of objects and operators, one it could handle by the same means it handles all problems. We carried this effort through a small hand simulation, problems of program organization still preventing our simulating it in the metal. Our exploration went far enough to make the point that intelligent learning might look more like problem solving than like a stochastic learning process.

Fig. 4. Logic table of connections (differences: add variables, delete variables, increase number, decrease number, change connective, change sign, change grouping, change position; against operators R1 to R12)

Today I wish to reconsider the matter along an alternative path, although I shall make much the same point. We again wish GPS to obtain somehow its own differences. However, GPS cannot be given only the objects and operators; some elementary form of perceptual discrimination must be available to it. We assume that, for each elementary attribute of an object structure, GPS can tell its location in the structure, and whether corresponding attributes of two expressions have the same or different values. If we provided GPS with less perceptual capability than this, we should be hiding some of the environment from it.

In the best of all worlds, GPS would have an operator available to remove directly every elementary difference between a given object and a desired object. If at position P, attribute A for the given object had the value X while for the desired object it had the value V, then an operator would exist which would change X to V without disturbing anything else. In reality, GPS must work with operators that are not nearly so obliging as these operators of immediate perception. (Real saws produce sawdust, as well as cut.) Matching can be looked on as an act of wishful thinking, of seeing the problem in terms of the immediate perceptual operations that would make one expression like the other. By way of example, fig. 5 shows two simple logic expressions as GPS would see them. Each consists of a collection of nodes, linked together by the attributes. If the two expressions are matched, the difference structure shown below them is obtained; it specifies all the elementary things one would do (if one could) to change the first expression directly into the second.

Fig. 5. Difference structure of objects (L: left subexpression, R: right subexpression)
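A sketch of this matching, using an invented dictionary encoding of the node-and-attribute trees:

```python
# Matching two expressions, as in fig. 5: each expression is a tree of
# nodes carrying attributes (here: symbol, connective, and the L and R
# subexpressions), and the result is a difference structure listing
# every elementary change that would make the first like the second.
# The encoding is invented; only the idea of the match follows the text.

def match(a, b, location="top"):
    """Return (location, attribute, value-in-a, value-in-b) for each
    elementary difference between expressions a and b."""
    diffs = []
    for attr in sorted(set(a) | set(b)):
        x, y = a.get(attr), b.get(attr)
        if attr in ("L", "R"):                   # subexpressions: descend
            diffs += match(x or {}, y or {}, location + "/" + attr)
        elif x != y:
            diffs.append((location, attr, x, y))
    return diffs

# (S v T) matched against (T v S):
SvT = {"connective": "v", "L": {"symbol": "S"}, "R": {"symbol": "T"}}
TvS = {"connective": "v", "L": {"symbol": "T"}, "R": {"symbol": "S"}}
print(match(SvT, TvS))
# [('top/L', 'symbol', 'S', 'T'), ('top/R', 'symbol', 'T', 'S')]
```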

That the real-world operators are inconvenient and do not allow such direct transformations does not make them unpatterned. Each one can be seen as a composite, made up of these same elementary perceptual operations. Fig. 6 shows the difference structure derived by matching the input form to the output form of an operator. Thus, the basic bridge from environment to action is built: both objects and operators are described in a common language of immediate perceptions.

Fig. 6. Difference structure of operator (L: left subexpression, R: right subexpression; input form matched to output form)

The price paid for this becomes evident as soon as we attempt to construct the table of connections from differences to operators. When the differences were specified by the designers, each difference represented a macroscopic change (some pattern of the simple differences just discussed), known to be of utility in solving problems in terms of the set of operators available. Thus there was a difference, change of position, and operators, such as A.B => B.A, AvB => BvA, and A⊃B => -B⊃-A, which dealt simply and directly with this difference. The table of connections was formed between the names of the individual differences and the names of the individual operators. With the elementary differences we are now considering this is no longer possible, since each operator has a complicated structure of differences.

I will borrow a technique to deal with this from the work of my colleagues E. A. Feigenbaum and H. A. Simon 20). Basically, their program EPAM consists of a very simple but intriguing scheme for discriminating among a collection of objects by growing a tree of tests when confronted with the task of sorting the objects. GPS already uses EPAM-like trees. During the course of problem solving each newly created structure (either a goal or an object) is sorted through a tree, which normally grows to accommodate it. However, if an identical structure has already been stored in the tree, it is discovered, since the new structure comes to rest at the same place in the tree. Appropriate action is then taken. For example, by this sorting process GPS immediately recognizes the final desired object if it ever generates it, independently of its reason for generating it. GPS does not have to ask the question deliberately of each new object "Are you the final answer?".

We will use an EPAM net for our table of connections, growing it under the impact of the operators that are available for a problem. Fig. 7 shows the tree that might be produced by discriminating the standard set of logic operators. The difference structure of each operator was sorted down this tree until it came to rest either at an unoccupied slot, or at a place holding a previously stored difference structure. In the latter case, the two difference structures were matched and the most important difference between them was used to define a test from which new branches would develop to discriminate the two operators. Once the tree of operators has been grown, it can be used to select operators for reducing the differences found between two objects. Their difference structure can be sorted down this tree to select the operators that most closely fit the pattern of elementary differences in the structure.

Fig. 7. Tree of connections for logic (L: left, R: right, M: main)
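Such a discrimination net is compact to state. In the sketch below a difference structure is reduced to a set of difference labels, and the "most important difference" between two colliding structures is taken to be just the alphabetically first distinguishing label; both are simplifying assumptions:

```python
# An EPAM-like tree of connections: each operator's difference structure
# is sorted down the tree; a collision with a stored structure grows a
# new test that discriminates the two. Difference structures are plain
# frozensets of labels here, and operators are assumed to have distinct
# structures -- both simplifications of the scheme described in the text.

class Node:
    def __init__(self):
        self.item = None        # (operator name, diffs) stored at a leaf
        self.test = None        # difference label tested at an interior node
        self.yes = self.no = None

def insert(node, name, diffs):
    if node.test is not None:                 # interior node: sort down
        insert(node.yes if node.test in diffs else node.no, name, diffs)
    elif node.item is None:                   # came to rest at an empty slot
        node.item = (name, diffs)
    else:                                     # collision: grow a new test
        old_name, old_diffs = node.item
        node.test = min(diffs ^ old_diffs)    # the "most important" difference
        node.item, node.yes, node.no = None, Node(), Node()
        insert(node, old_name, old_diffs)
        insert(node, name, diffs)

def select(node, diffs):
    """Sort a difference structure down the grown tree to find the
    operator whose pattern of differences most closely fits it."""
    while node.test is not None:
        node = node.yes if node.test in diffs else node.no
    return node.item

root = Node()
insert(root, "R1", frozenset({"change position"}))
insert(root, "R5", frozenset({"change connective", "change sign"}))
insert(root, "R6", frozenset({"change connective"}))
print(select(root, frozenset({"change connective", "change sign"}))[0])  # R5
```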

Let us consider the total operation of GPS with such a modification. In a new task area, GPS first explores the environment by processing the operators as we have shown; the tree that is grown represents the set of operators in their capacity to modify differences in this environment. Then GPS is prepared to turn to particular tasks, using this representation as a central part of its problem solving.

One aspect of this description is still missing. Generalization was a central feature of our discussion of learning; one part of the environment is relevant to another only by an inductive leap. In most learning problems one is trying to pass between essentially analogous situations, and the basis of induction from one to the other is usually some form of invariance: what was good there is good here. In the present arrangement, although the inductive bases are different, generalization is by no means absent.

At least three generalizing assumptions are incorporated in this scheme. The most important is the assumption of "conservation of symbols." A term in a desired expression must come from somewhere: if SvT is to transform into TvS, then it is a fair assumption that the T in the second expression "comes from" the T in the first. Thus GPS creates a generalized difference structure by replacing each term in the raw difference structure with an expression representing the location of that term in the expression. Fig. 8 shows the generalized difference structures for the two examples of figs. 5 and 6.

Fig. 8. Generalized difference structures (L: left subexpression, R: right subexpression)

The second generalizing assumption is spatial invariance: most operators in logic can be applied at any location in the expression. Hence the position of the total pattern of differences in the total expression is irrelevant. Consequently, the lowest node of the raw difference structure that covers all differences is made the top node of the generalized difference structure. This assumption is quite analogous, both in its power and in its inductive basis, to the transformations, such as centering, focussing and smoothing, that are standard in many pattern recognizers.

The third generalizing assumption concerns the sign of the logic expressions. Whenever operations on a binary variable (say with values + and -) are symmetric in the values, it is possible to describe both changes from + to - and from - to + simply as "change value." This is a familiar transformation in all work with Boolean variables. As shown in fig. 8, the generalized difference structure is transformed this way.
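The three assumptions can be written down as transformations of the raw difference structure produced by the matching sketch given earlier. The encodings below are invented; only the three transformations themselves follow the text:

```python
# Generalizing a raw difference structure (a list of (location,
# attribute, old, new) entries, as in the earlier matching sketch).
# 1. Conservation of symbols: a term in the desired expression is
#    assumed to "come from" its location in the given expression.
# 2. Spatial invariance: re-root at the lowest node covering all the
#    differences, by stripping their common location prefix.
# 3. Sign symmetry: + to - and - to + both become "change value".

from os.path import commonprefix

def generalize(diffs, source_locations):
    """source_locations maps each term of the given expression to its
    location, e.g. {"S": "top/L", "T": "top/R"} (invented encoding)."""
    out = []
    for loc, attr, old, new in diffs:
        if attr == "symbol" and new in source_locations:      # assumption 1
            new = "term at " + source_locations[new]
        if attr == "sign":                                    # assumption 3
            old, new = None, "change value"
        out.append((loc, attr, old, new))
    prefix = commonprefix([loc for loc, *_ in out])           # assumption 2
    return [(loc[len(prefix):] or "(here)", attr, old, new)
            for loc, attr, old, new in out]

raw = [("top/L", "symbol", "S", "T"), ("top/R", "symbol", "T", "S")]
print(generalize(raw, {"S": "top/L", "T": "top/R"}))
# [('L', 'symbol', 'S', 'term at top/R'), ('R', 'symbol', 'T', 'term at top/L')]
```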

GPS, of course, neither learns nor discovers these bases of generalization. To do so would require inventing a representation of the environment that would include in a plausible way many possible bases. Neither GPS, nor its designers, know yet how to specify such a representation.

Surely what we have been describing here is learning, even though there is no repetition of experience, no success or failure, no piling up of statistical data about the past. The key features of learning, preparation for an indefinite future and the use of generalization to bridge separate experiences, are both clearly present. Something akin to an analysis of the environment is going on. So does one problem solving program slowly grow independent of its designers.

6. References

1) Samuel, A. L.: Some Studies in Machine Learning using the Game of Checkers. IBM J. Res. Dev. 3 (1959) 210.
2) Newell, A., H. A. Simon and J. C. Shaw: Empirical Explorations of the Logic Theory Machine: a Case Study in Heuristics. Proc. W. Joint Comp. Conf. (1957).
3) Newell, A. and H. A. Simon: Computer Simulation of Human Thinking and Problem Solving. M. Greenberger (ed.), Management and the Computer of the Future. (Wiley, New York, 1962).
4) Lindsay, R. K.: Toward the Development of a Machine which Comprehends. Unpublished Ph.D. thesis, Carnegie Institute of Technology (1961).
5) Friedberg, R. M.: A Learning Machine, Part I. IBM J. Res. Dev. 2 (1958) 2.
6) Kilburn, T., R. L. Grimsdale and F. H. Sumner: Experiments in Machine Learning and Thinking. Information Processing. (Paris, 1959).
7) Amarel, S.: On the Automatic Formation of a Computer Program which Represents a Theory. Proc. Conf. on Self-Organizing Systems. (Chicago, 1962; to be published).
8) Simon, H. A.: Experiments with the Heuristic Compiler. P-2349, The RAND Corporation, Santa Monica, California (1961).
9) Vossler, C. and L. Uhr: Computer Simulations of a Perceptual Learning Model for Sensory Pattern Recognition, Concept Formation, and Symbol Transformation. These Proceedings.
10) Dinneen, G. P.: Programming Pattern Recognition. Proc. W. Joint Comp. Conf. (1955).
11) Selfridge, O.: Pattern Recognition and Modern Computers. Proc. W. Joint Comp. Conf. (1955).
12) Remus, H.: Simulation of a Learning Machine for Playing GO. These Proceedings.
13) Feldman, J., F. Tonge and H. Kanter: Empirical Explorations of a Hypothesis-Testing Model of Binary Choice Behavior. SP-0546, System Development Corporation, Santa Monica, California (1961).
14) Green, B. F., A. K. Wolf, C. Chomsky and K. Laughery: Baseball: an Automatic Question-Answerer. Proc. W. Joint Comp. Conf. (1961).
15) Newell, A., J. C. Shaw and H. A. Simon: Report on a General Problem Solving Program. Information Processing. (Paris, 1959).
16) Newell, A. and H. A. Simon: GPS, a Program that Simulates Human Thought. H. Billing (ed.), Lernende Automaten. (Oldenbourg, Munich, 1961).
17) Newell, A. and H. A. Simon: Computer Simulation of Human Thought. Science 134 (1961) 2011.
18) Newell, A.: Some Problems of Basic Organization in Problem Solving Programs. Proc. Conf. on Self-Organizing Systems (Chicago, 1962; to be published).
19) Newell, A., J. C. Shaw and H. A. Simon: A Variety of Intelligent Learning in a General Problem Solver. M. C. Yovits and S. Cameron (eds.), Self-Organizing Systems. (Pergamon, New York, 1960).
20) Feigenbaum, E. A. and H. A. Simon: Generalization of an Elementary Perceiving and Memorizing Machine. These Proceedings.