a question-answering system for high school algebra word



PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964

There are a number of reasons why I chose the context of algebra story problems in which to embed an English language question-answering system. First, we know a good type of data structure in which to store information needed to answer questions in this context, namely, algebraic equations. There exist well known algorithms for deducing information implicit in the equations, that is, values for particular variables which satisfy the set of equations.

In addition, I felt that there was a manageable subset of English in which many types of algebra story problems were expressible. A large number of these story problems are available in first year high school text books, and I have transcribed some of them into STUDENT's input English. Since this question-answering task is one performed by humans, and since the entire process from input to solution of the equations was programmed, we can obtain a measure of comparison between the performance of STUDENT and of a human on the same problems. In fact, this program, on an IBM 7094, answers most questions that it can handle as fast as or faster than humans trying the same problem. In judging this comparison, one should remember the base speed of the IBM 7094, which can perform over one hundred thousand additions per second.

II. SEMANTIC GENERATION AND ANALYSIS OF DISCOURSE

The purpose of this section is to put the techniques of analysis embedded in the STUDENT program into a wider context, and indicate how they would fit into a more general language processing system. We will describe a theory of semantic generation and analysis of discourse. STUDENT can then be considered a first approximation to a computer implementation of the analytic portion of the theory, with certain restrictions on the interpretation of a discourse to be analyzed. It will be evident from the theory why analysis is so greatly simplified by the imposed restrictions.

A. Language as Communication

Language is an encoding used for communication between a speaker and a listener (or writer and reader). To transmit an "idea", the speaker must first encode it in a message, as a string in the transmission language. In order to understand this message, a listener must decode it and extract its meaning. The coding of a particular message, M, is a function of both its global context and local context. The global context of a message is the background knowledge of the speaker and the listener, including some knowledge of possible universes of discourse, and codings for some simple ideas.

The local context of a message, M, is the set of messages temporally adjacent to M. M may refer back to earlier messages. M may even be just a modification of a previous message, and only understandable in this context. For example, consider the second sentence of the following discourse: "How many chaplains are in the U.S. Army? How many are in the Navy?"

In order for communication to take place, the information map of both the listener and the speaker must be approximately the same, at least for the universe of discourse; also the decoding process of the listener must be an approximate inverse of the encoding process of the speaker. Education in language is, in large part, an attempt to force the language processors of different people into a uniform mold to facilitate successful communication. We are not proposing that identity in detail is achieved, but as Quine⁹ put it:

"Different persons growing up in the same language are like different bushes trimmed and trained to take the shape of identical elephants. The anatomical details of twigs and branches will fulfill the elephantine form differently from bush to bush, but the overall outward results are alike."

As a speaker transmits successive messages concerning some portion of his information map, the listener who understands the messages constructs a model of a "situation". The relation between the listener's model and the speaker's information map is that from each can be extracted the transmitted information relevant to the universe of discourse, including information deducible from the entire set of messages. The internal structure of the listener's model need bear no resemblance to that of the speaker, and may in general contain far less detail.

From the collection of the Computer History Museum (www.computerhistory.org)


B. Definition of Coherent Discourse

The theory of language generation and analysis which we shall describe below is designed to handle what we call coherent discourse. A discourse is a sequence of sentences, and the meaning of a discourse for a listener is the model of a situation he derives from this discourse. Determination of the meaning of each sentence of the discourse may sometimes involve knowledge of the meanings of other sentences of the discourse. A discourse is coherent if it has a complete and consistent interpretation within the model of the situation being built by the listener. A listener understands the discourse if his model of the situation is isomorphic to the speaker's model.

A listener's ability to build a model of a situation from a discourse is dependent on information available to him from his general store of knowledge. Therefore it is quite possible for a discourse to seem coherent to one listener and not another. A writer, reading his own writing, may feel that he has generated a coherent sequence of sentences, which, in fact, is incoherent to all other readers. This is, unfortunately, not a rare occurrence in the scientific literature. Conversely, a listener who is a psychiatrist, for example, may find coherence in a sequence of remarks which a patient thinks are entirely unrelated.

The STUDENT system utilizes an expandable store of general knowledge to build a model of a situation described in a member of a limited class of discourses. The form of the model of a situation built by STUDENT will be discussed in detail below.

C. The Use of Kernel Sentences in Our Theory

A basic postulate of our theory of language analysis is that a listener understands a discourse by transforming it into an equivalent (in meaning) sequence of simpler kernel sentences. A kernel sentence is one which the listener can understand directly; that is, one for which he knows a transformation into his information store. Conversely, a speaker generates a set of kernel sentences from his information map, and utilizes a sequence of transformations on this set to yield his spoken discourse. This set of kernel sentences is not invariant from person to person, and even varies for a single individual as he learns.

Although we are not proposing our theory as a basis for a psychological model, it has been useful, to avoid circumlocutions, to describe the theory in terms of the properties and actions of a hypothetical speaker and listener. All statements about speakers and listeners should be interpreted as referring to computer programs which, respectively, generate and analyze coherent discourse.

D. Generation of Coherent Discourse

1) The Speaker's Model of the World. We assume that a speaker has some model of the world in his information store. We shall not be concerned here with how this model was built, or its exact form. Different forms for the model will be useful for different language tasks, but they must all have the properties described below.

The basic components of the model are a set of objects, $\{O_i\}$, a set of functions $\{F_i^n\}$, a set of relations $\{R_i^n\}$, a set of propositions $\{P_i\}$, and a set of semantic deductive rules. A function $F_i^n$ is a mapping from ordered sets of n objects, called the arguments of $F_i^n$, into the $\{O_i\}$. The mapping may be multivalued and is defined only if the arguments satisfy a set of conditions associated with $F_i^n$. A condition is essentially membership in a class of objects, but is defined more precisely below. A relation $R_i^n$ is a special type of object in the model, and consists of a label (a unique identifier), and an ordered set of n conditions, called the argument conditions for the relation. Functions of relations are again relations.

An elementary proposition $P_i$ consists of a label associated with some relation, $R_i^n$, and an ordered set of n objects satisfying the argument conditions for this relation. One may think of these propositions as the beliefs of a speaker about what relationships between objects he has noticed are true in the world. Complex propositions are logical combinations (in the usual sense) of elementary propositions.

The semantic deductive rules give procedures for adding new propositions to the model based on the propositions now in the model. In addition to the ordinary rules of logic, these rules include axioms about the relationships of the relations in the model. The semantic deductive rules also include links to the senses of the speaker. For example, one such deductive rule for adding a proposition to the model might be (loosely speaking) "Look in the real world and see if it is true." These rules essentially determine how the model is to be expanded, and are the most complex part of a complete system. However, from our present point of view, we need only consider these rules as a black box which can extend the set of propositions in the model.

A closed question is a relational label for some $R_i^n$ and an ordered set of n objects. The answer to this question is affirmative if the proposition, consisting of this label and the n objects, is in the model (or can be added to it according to the semantic deductive rules). If the negation of this proposition is in the model (or can be added), the answer is negative. Otherwise the answer is undefined.

An open question consists of a relational label for an n-argument relation, $R_i^n$, and a set of objects corresponding to $n-k$ of these arguments, where $n \ge k \ge 1$. An answer to an open question is an ordered set of k objects, such that if these objects are associated with the k unspecified arguments of $R_i^n$, the resulting proposition is in the model, or can be added to it. An open question may have no answers, or may have one or more answers. A condition is an open question with $k = 1$, and an object satisfies a condition if it is an answer to the question.
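The definitions of closed and open questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original system: propositions are modelled as tuples of a relation label followed by its arguments, negated propositions are kept in a separate set, the semantic deductive rules are left out entirely, and the names `closed_question`, `open_question` and the sample facts are hypothetical.

```python
# Illustrative sketch only; the semantic deductive rules are omitted.
# A proposition is a tuple: (relation_label, arg_1, ..., arg_n).
PROPOSITIONS = {
    ("gave", "John", "book", "Mary"),   # hypothetical sample facts
    ("gave", "Bill", "pen", "Mary"),
    ("speaks", "John"),
}
NEGATED = {("speaks", "Bill")}  # propositions whose negation is in the model

OPEN = None  # marks an unspecified argument position in an open question


def closed_question(label, *args):
    """Return True (affirmative), False (negative) or None (undefined)."""
    prop = (label, *args)
    if prop in PROPOSITIONS:
        return True
    if prop in NEGATED:
        return False
    return None


def open_question(label, *args):
    """Return every ordered set of k objects that fills the OPEN positions."""
    answers = []
    for prop in PROPOSITIONS:
        if prop[0] != label or len(prop) - 1 != len(args):
            continue
        # Every specified argument must agree with the stored proposition.
        if all(a is OPEN or a == b for a, b in zip(args, prop[1:])):
            answers.append(tuple(b for a, b in zip(args, prop[1:]) if a is OPEN))
    return sorted(answers)
```

In this sketch a condition is simply an open question with one open position, so `open_question("speaks", OPEN)` lists the objects that satisfy the condition of being a speaker.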

2) Generation of Kernel Sentences. We have described the logical properties of the speaker's model of the world. We shall now consider how strings in a language, words, phrases, and sentences, are associated with the model. Corresponding to the set of objects $\{O_i\}$ there is a set $\{N_{ij}\}$ of strings (in English in our case), called the names of the objects. There is a many-one mapping from $\{N_{ij}\}$ onto $\{O_i\}$. It is many-one because one object may have more than one name, e.g. frankfurter and hot dog both map back into the same object in the model.

Recall that functions map n-tuples of objects into objects. Thus a function name and an n-tuple can specify an object. We can derive a name for this object from the function name and the names of its n arguments. Associated with each function is at least one linguistic form, a string of words with blanks in which names of arguments of the function must be inserted. Examples of linguistic forms associated with a model are "number of ___", "father of ___", and "the child of ___ and ___". There is a many-one mapping from the set of linguistic forms $\{L_{ij}^n\}$ onto the set of functions. Two examples of multiple linguistic forms for the same function are: "father of ___" and "___'s father"; and "___ plus ___" and "the sum of ___ and ___". Thus, if objects x and y have names "the first number" and "the second number" and associated with the function "*" is the linguistic form "the product of ___ and ___", then the name of the object produced by applying the function "*" to x and y is "the product of the first number and the second number". A parsing of a name must decompose it into the part which is the linguistic form, and the parts which are names of arguments of the corresponding function. We shall call objects defined in terms of a function and an n-tuple of objects a functionally defined object, and those which are not functionally defined we shall call simple objects. Simple objects have simple names and functionally defined objects have composite names.
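As a rough illustration of how composite names relate to linguistic forms, the following Python sketch fills the blanks of a form to build a name and, in the other direction, decomposes a name into a function and its argument names. The table `FORMS`, the blank marker `___`, and the helper names `compose_name` and `parse_name` are assumptions made for this example; the sketch handles one level of composition only and is not the parsing method of the original program.

```python
import re

# Hypothetical table: one linguistic form per function, "___" marks a blank.
FORMS = {
    "*": "the product of ___ and ___",
    "+": "the sum of ___ and ___",
    "father": "___'s father",
}


def compose_name(function, *arg_names):
    """Build a composite name by filling the form's blanks left to right."""
    form = FORMS[function]
    if form.count("___") != len(arg_names):
        raise ValueError("wrong number of arguments for this linguistic form")
    for name in arg_names:
        form = form.replace("___", name, 1)
    return form


def parse_name(name):
    """Split a name into (function, argument names); None for a simple name."""
    for function, form in FORMS.items():
        # Turn each blank into a capturing group, keep the rest literal.
        pattern = "^" + "(.+)".join(re.escape(p) for p in form.split("___")) + "$"
        match = re.match(pattern, name)
        if match:
            return function, match.groups()
    return None
```

For instance, `compose_name("*", "the first number", "the second number")` yields the composite name used in the text, and `parse_name` applied to that name recovers the function "*" and the two argument names.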

In addition to linguistic forms associated with functions, there are linguistic forms associated with relations. For an n argument relation there are n blanks in the linguistic form. Examples of relational linguistic forms are: "___ equals ___", "___ gave ___ to ___" and "___ speaks". These linguistic forms, corresponding to the relations in the model, serve as frames for the kernel sentences.

In a manner similar to the way composite names are built, a kernel sentence corresponding to an elementary proposition is constructed by inserting names corresponding to each argument in the appropriate blank. Names may be simple or composite. An example of a kernel sentence for a proposition built from such a relational linguistic form is "John's father gave .3 times the salary of Bill to Jack." which contains the simple names "John", ".3", "Bill", and "Jack". It contains the functional linguistic forms "___'s father", "___ times ___" and "salary of ___" and the relational linguistic form "___ gave ___ to ___".

A kernel sentence corresponding to a complex proposition is constructed recursively from the kernel sentences corresponding to its elementary propositional constituents by placing them in the corresponding places in the linguistic forms "___ and ___", "___ or ___", "not ___", etc.

The kernel sentence corresponding to a closed question is constructed from the kernel of the corresponding proposition by placing it in the linguistic form "It is true that ___?" For an open question, dummy objects are placed in the open argument positions to complete a propositional form. These dummy arguments have names "who", "what", "where", etc., and which dummy objects are used depends on the condition on that argument position. A question mark is placed at the end of the kernel sentence constructed in the usual way from the relational linguistic form and the names of the arguments.

In generating a coherent discourse, a speaker chooses a number of propositions in his model and/or some open or closed questions. He then uses linguistic information associated with the model to construct the set of kernel sentences corresponding to this set of chosen propositions. In the next section we will discuss how he generates his discourse from this set of kernels.

3) Transformations on Kernel Sentences. The set of kernel sentences is the base of the coherent discourse. The meaning of a kernel sentence is the proposition into which it maps, and similarly, the meaning of any name is the object which is its image under the mapping. To this set of kernels we apply a sequence of meaning preserving transformations to get the final discourse. We use the word "transformation" in its broad general sense, not in the narrow technical sense defined by Chomsky.¹⁰

There are two distinct types of transformations, structural and definitional. A structural or syntactic transformation is only dependent on the structure of the kernel string(s) on which it operates. For example, one syntactic transformation takes a kernel in the active voice to one in the passive voice. Another combines two sentences into a single complex coordinate sentence.

One large class of syntactic transformations is used to substitute pronominal phrases for names. Pronominal phrases may be ordinary pronouns such as "he", "she", or "it". They may be referential phrases such as "the latter", "the former" or "this quantity". They may also be truncations of a full name such as "the distance" for "the distance between New York and Los Angeles". In cases where such pronominal reference is made, the coherence of the final discourse is dependent on the order in which the resultant strings appear.

The second type of transformation is definitional. It involves substitutions of linguistic strings and forms for ones appearing in the kernel sentences. For example, for any appearance of "2 times" we may substitute "twice", and for ".5 times" substitute "one half of". In addition to this string substitution, some transformations perform form substitution and rearrangement. For example, for a kernel sentence of the form "x is y more than z", where x, y, and z are any names, one definitional transformation can substitute "x exceeds z by y".
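The two kinds of definitional transformation just described, string substitution and form substitution with rearrangement, can be sketched as follows. This is a minimal Python illustration using regular expressions, which are a stand-in chosen for this example rather than the mechanism of the original program; the function names and the guard against matching inside longer numerals (such as "12 times") are assumptions.

```python
import re

# String substitutions taken from the examples in the text.
STRING_SUBSTITUTIONS = [
    ("2 times", "twice"),
    (".5 times", "one half of"),
]


def apply_string_substitutions(sentence):
    """Replace kernel strings by their defined surface equivalents."""
    for kernel_form, surface_form in STRING_SUBSTITUTIONS:
        # The lookbehind keeps "12 times" or "2.5 times" from matching.
        pattern = r"(?<![\w.])" + re.escape(kernel_form) + r"\b"
        sentence = re.sub(pattern, surface_form, sentence)
    return sentence


# Form substitution: "x is y more than z" becomes "x exceeds z by y".
MORE_THAN = re.compile(r"^(?P<x>.+) is (?P<y>.+) more than (?P<z>.+)\.$")


def exceeds_transformation(sentence):
    """Rearrange a kernel of the form 'x is y more than z.' if it matches."""
    match = MORE_THAN.match(sentence)
    if match is None:
        return sentence  # the transformation is not defined on this kernel
    return f"{match['x']} exceeds {match['z']} by {match['y']}."
```

Applied in sequence to the kernel "The first number is 3 more than 2 times the second number.", the two functions give "The first number exceeds twice the second number by 3."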

Some transformations are optional, and some may be mandatory if certain forms are present in the kernel set. Certain transformations are used by a speaker for stylistic purposes, for example, to emphasize certain objects; other syntactic transformations, such as those which perform pronominal substitutions, are used because they decrease the depth of a construction, in the sense defined by Yngve.¹¹

Let us review the steps in the generation of a coherent discourse. The speaker chooses a set of propositions, the "ideas" he wishes to transmit. He then encodes them as language strings called kernel sentences in the manner described above. He then chooses a sequence of structural and definitional transformations which are defined on this set of kernels or on the ordered set of sentences which result from applications of the first transformations. The resulting sequence of sentences will be a coherent discourse to a listener if he knows all the definitional transformations applied. In addition, the listener must map into a single object every pair of distinct names which the speaker maps back into the same object.

E. Analysis of Coherent Discourse

Generation of coherent discourse consists of two distinguishable steps. From propositions in the speaker's model of the world, he generates an ordered set of kernel sentences. He then applies a sequence of transformations to this kernel set. The resulting discourse is a coded message which is to be analyzed and decoded by a listener. The listener's problem can be loosely characterized as an attempt to answer the question, "What would I have meant if I said that?"

To analyze a discourse the listener must find the set of kernel sentences from which it was generated; one way to do this is to find a set of inverse transformations which when applied to the input discourse yield a sequence of kernel sentences. The listener must then transform these kernel sentences to an appropriate representation in his information store. The appropriateness of a representation is a function of what later use the listener expects to make of the information contained in the discourse. The listener may simultaneously transform a given kernel sentence into a number of different representations in his information store. On a level of pragmatic analysis, statements require only storage of information. Questions and imperatives require appropriate responses from the listener. The difficulties in analysis dichotomize into those associated with finding the kernel sentences which are the base of the discourse, and those associated with transforming the kernel sentences into representations in the information store.

STUDENT's analytical program utilizes a set of inverse analytic transformations. If $T_i$ is a transformation that may be used in generating a discourse, and $T_i(S) = \bar{S}$, where $S$ and $\bar{S}$ are sets of sentences, then the analytic transformation $T_i^{-1}$ is the inverse of $T_i$ if and only if $T_i^{-1}(\bar{S}) = S$. The choice of which inverse transformations to apply and the order of their application may be restricted by utilizing heuristics concerned with features of the input.
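To make the idea of an inverse analytic transformation concrete, here is a small Python sketch that maps surface sentences back to kernel sentences. It assumes, for illustration only, the definitional transformations mentioned earlier in the paper ("twice" for "2 times", "one half of" for ".5 times", and "x exceeds z by y" for "x is y more than z"); the names `INVERSE_STRINGS`, `exceeds_to_kernel` and `to_kernels` are hypothetical, and no heuristics for choosing among inverses are modelled.

```python
import re

# Inverse string substitutions: surface string -> kernel string.
INVERSE_STRINGS = [
    ("twice", "2 times"),
    ("one half of", ".5 times"),
]


def exceeds_to_kernel(sentence):
    """Inverse of the form substitution 'x is y more than z' -> 'x exceeds z by y'."""
    match = re.match(r"^(.+) exceeds (.+) by (.+)\.$", sentence)
    if match is None:
        return sentence  # this inverse is not applicable to the sentence
    x, z, y = match.groups()
    return f"{x} is {y} more than {z}."


def to_kernels(discourse):
    """Apply the inverse transformations to each sentence of a discourse."""
    kernels = []
    for sentence in discourse:
        for surface_form, kernel_form in INVERSE_STRINGS:
            sentence = sentence.replace(surface_form, kernel_form)
        kernels.append(exceeds_to_kernel(sentence))
    return kernels
```

With these definitions, `to_kernels(["The first number exceeds twice the second number by 3."])` recovers the kernel "The first number is 3 more than 2 times the second number."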

Once the base set of kernel sentences for a given discourse is determined, there remains the problem of entering representations of these sentences in the listener's information store. The major problem in accomplishing this step involves the separation of those words which are part of linguistic forms for relations, and those which are part of a name. This is difficult because the same word (lexicographic symbol) may have multiple uses in a language. Having separated the relational form from the names which represent the arguments of this relation, one can then analyze the name in terms of components which are functional linguistic forms and others which are simple names. From this parsing in terms of relational linguistic forms, functional linguistic forms and simple names, the discourse can be transformed into a canonical representation in the information store of the listener.

F. Limited Deductive Models

A complete understanding of a discourse by a listener would imply that the representation of the discourse in his information store is essentially isomorphic to the speaker's model of the world, at least for the universe of discourse. The listener's representation must preserve all information implicit in the discourse.

If the listener is only interested in certain aspects of the discourse, he need only preserve information relevant to his interest, and discard the rest. Within his area of interest the listener's model is isomorphic to the speaker's model in the sense that all relevant deductions which can be made by the speaker on the basis of the discourse can also be made by the listener. Outside this area of interest, the listener will be unable to answer any questions. We call such restricted information stores limited deductive models.

The question-answering programs of Lindsay and Raphael, and the STUDENT system, all utilize limited deductive models. For the area of interest in each of these programs there is a "natural" representation for the information in the allowable input. These representations are natural in that they facilitate the deduction of implicit information. For example, Lindsay's family tree representation makes it easy to compute the relationship of any two individuals in the tree, independent of the number of sentences necessary to build the tree.

Because the number of relations and functions expressible in the models in all three systems is very limited, there is a corresponding limitation on the number of linguistic forms that may appear in the input. This greatly simplifies the parsing problem by restricting alternatives for forms in the input text.

G. The STUDENT Deductive Model

The STUDENT system is an implementation of the analytic portion of our theory. STUDENT performs certain inverse transformations to obtain a set of kernel sentences and then transforms these kernel sentences to expressions in a limited deductive model. Utilizing the power of this deductive model, within its limited domain of understanding, it is able to answer questions based on information implicit in the input information.

The analytic and transformational techniques utilized in STUDENT are described in detail in the next section. We shall describe here the canonical representation of objects, relations, and functions within the model. STUDENT is restricted to answering questions framed in the context of algebra story problems. Algebraic equations are a natural representation for information in the input.

The objects in the model are numbers, or numbers with an associated dimension. The only relation in the model is equality, and the only functions represented directly in the model are the arithmetic operations of addition, negation, multiplication, division and exponentiation. Other functions are defined in terms of these basic functions, by composition, and/or substitution of constants for arguments of these functions. For example, the operation of squaring is defined as exponentiation with "2" as the second argument of the exponential function; subtraction is a composition of addition and negation.

Within the computer, a parenthesized prefix notation is used for a standard representation of the equations implicit in the English input. The arithmetic operation to be expressed is made the first element of a list, and the arguments of the function are succeeding list elements. The exact notation is given in Figure 1. In the figure, A, B, and C are any representations of objects in the model, either composite or simple names. The usual infix notation for these functional expressions is given for comparison. Because this is a fully parenthesized notation, no ambiguity of operational order arises, as it does, for example, for the unparenthesized infix notation expression A*B+C, or its corresponding natural language expression "A times B plus C". Note also that in this prefix notation plus and times are not strictly binary operators. Indeed, in the model they may have any finite number of arguments, e.g. (TIMES A B C D) is a legitimate expression in the STUDENT model.

Representations of objects in the STUDENT deductive model are taken from the input. Any strings of words not containing a linguistic form associated with an arithmetic function expressible in the model are considered simple names for objects. Thus, "the age of the child of John and Jane" is considered a simple name because it contains no functional linguistic forms associated with functions represented in STUDENT's limited deductive model. In a more general model it would be considered a composite name, and the functional forms "age of ___" and "child of ___ and ___" would be mapped into their corresponding functions in the model.

Figure 1: Notation Within the STUDENT Deductive Model

Operation        Infix Notation   Prefix Notation
Equality         A=B              (EQUAL A B)
Addition         A+B              (PLUS A B)
                 A+B+C            (PLUS A B C)
Negation         -A               (MINUS A)
Subtraction      A-B              (PLUS A (MINUS B))
Multiplication   A*B              (TIMES A B)
                 A*B*C            (TIMES A B C)
Division         A/B              (QUOTIENT A B)
Exponentiation   A^B              (EXPT A B)
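The prefix notation of the STUDENT deductive model can be mimicked in Python with nested tuples, which makes the points about n-ary PLUS and TIMES and about derived operations easy to check. This is a hedged sketch, not the original LISP-style implementation: the functions `evaluate`, `show`, `subtract` and `square` are names invented here, and variables are represented as strings looked up in a dictionary of values.

```python
import math


def evaluate(expr, values):
    """Evaluate a prefix expression given as nested tuples."""
    if isinstance(expr, str):          # a name: look up its numeric value
        return values[expr]
    if not isinstance(expr, tuple):    # a numeric constant
        return expr
    op, *args = expr
    vals = [evaluate(arg, values) for arg in args]
    if op == "PLUS":                   # any finite number of arguments
        return sum(vals)
    if op == "TIMES":                  # any finite number of arguments
        return math.prod(vals)
    if op == "MINUS":                  # negation, one argument
        return -vals[0]
    if op == "QUOTIENT":
        return vals[0] / vals[1]
    if op == "EXPT":
        return vals[0] ** vals[1]
    raise ValueError(f"unknown operation: {op}")


def show(expr):
    """Render nested tuples in the parenthesized prefix notation."""
    if isinstance(expr, tuple):
        return "(" + " ".join(show(e) for e in expr) + ")"
    return str(expr)


def subtract(a, b):
    """Subtraction as a composition of addition and negation."""
    return ("PLUS", a, ("MINUS", b))


def square(a):
    """Squaring as exponentiation with 2 as the second argument."""
    return ("EXPT", a, 2)
```

For example, `show(subtract("A", "B"))` gives `(PLUS A (MINUS B))`, matching the subtraction row of the figure, and `evaluate(("TIMES", "A", "B", "C", "D"), ...)` accepts four arguments without nesting.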

Figure 1. Flow Chart of the Student Program


Because such complex strings are considered simple names in the model, and objects are dis­tinguished only by their names, it is important to determine when two distinct names actually refer to the same object. In fact, answers to questions in the STUDENT system are state­ments of the identity of the object referenced by two names. However, one of the names (the desired one) must satisfy a certain lexical con­dition. Most often this condition is just that the name be a numeral. For a more general model this restriction could be stated as re~ quiring a simple name corresponding to some functionally defined name-because, for ex­ample, "number of " would be a func­tional linguistic form in the general model, and the only simple name for such an object would be the numeral corresponding to this number. An answer consists of a statement of identity e.g. "The number of customers Tom gets is 162."

The other lexical restriction on answers sometimes used in the STUDENT system is insistence that a certain unit (corresponding to a dimension associated with a number) appear in the desired answer. For example, spans is the unit specified by the question "How many spans equals 1 fathom?", and the answer given by STUDENT is "1 fathom is 8 spans."

The deductive model described here is useful for answering questions because we know how to extract implicit information from expressions in this model. More explicitly, we know how to solve sets of algebraic equations to find numerical values which satisfy these equations.

III. Transformation of English to the STUDENT Deductive Model

Our question-answering system contains two main programs which process English input. One is a program called STUDENT which accepts the statement of an algebra story problem and attempts to find the solution to the particular problem. STUDENT does not store any information, nor "remember" anything from problem to problem. The information obtained by STUDENT is the local context of the question.

The other program is called REMEMBER, and it processes and stores facts not specific to any one problem. These facts make up STUDENT's store of "global information" as opposed to the "local information" contained in the statement of any one problem. This information is accepted in a subset of English which overlaps but is different from the subset of English accepted by STUDENT. REMEMBER accepts statements in certain fixed formats, and for each format the information is stored in a way that makes it convenient for retrieval and use within the STUDENT program. The following examples indicate some of the acceptable formats for global information:

"Distance equals speed times time."
"Feet is the plural of foot."
"Bill is a person."
"Times is an operator of level 1."
"One half always means .5."

We will not give any details of the processing done by REMEMBER, except to note that some of the global information is stored by actually adding statements to the STUDENT program, and other information is stored on dictionary entries for the individual words.

A. Outline of the Operation of STUDENT

To provide perspective by which to view the detailed heuristic techniques used in the STUDENT program, we shall first give an outline of the operation of the STUDENT program when given a problem to solve. This outline is a verbal description of the flow chart of the program found in the appendix.

STUDENT is asked to solve a particular problem. We assume that all necessary global information has been stored previously. STUDENT will now transform the English input statement of this problem into expressions in its limited deductive model, and through appropriate deductive procedures attempt to find a solution. More specifically, STUDENT finds the kernel sentences of the input discourse, and transforms this sequence of kernels into a set of simultaneous equations, keeping a list of the answers required, a list of the units involved in the problem (e.g. dollars, pounds) and a list of all the variables (simple names) in the equations. Then STUDENT invokes the SOLVE program to solve this set of equations for the desired unknowns. If a solution is found, STUDENT prints the values of the unknowns requested in a fixed format, substituting in "(variable IS value)" the appropriate phrases for variable and value. If a solution cannot be found, various heuristics are used to identify two variables (i.e. find two slightly different phrases that refer to the same object in the model). If two variables, A and B, are identified, the equation A = B is added to the set of equations. In addition, the store of global information is searched to find any equations that may be useful in finding the solution to this problem. STUDENT prints out any assumptions it makes about the identity of two variables, and also any equations that it retrieves because it thinks they may be relevant. If the use of global equations or equations from identifications leads to a solution, the answers are printed out in the format described above.

If a solution is not found, and certain idioms are present in the problem (a result of a definitional transformation used in the generation of the problem), a substitution is made for each of these idioms in turn and the transformation and solution process is repeated. If the substitutions for these idioms do not enable the problem to be solved by STUDENT, then STUDENT requests additional information from the questioner, showing him the variables being used in the problem. If any information is given, STUDENT tries to solve the problem again. If none is given, it reports its inability to solve this problem and terminates. If the problem is ever solved, the solution is printed and the program terminates.

B. Categories of Words in a Transformation.

The words and phrases (strings of words) in the English input can be classified into three distinct categories on the basis of how they are handled in the transformation to the deductive model. The first category consists of strings of words which name objects in the model; I call such strings variables. Variables are identified only by the string of words in them, and if two strings differ at all, they define distinct variables. One important problem considered below is how to determine when two distinct variables refer to the same object.

The second category of words and phrases consists of what I call substitutors. Each substitutor may be replaced by another string. Some substitutions are mandatory; others are optional and are only made if the problem cannot be solved without such substitutions. An example of a mandatory substitution is "2 times" for the word "twice". "Twice" always means "2 times" in the context of the model, and therefore this substitution is mandatory. One optional "idiomatic" substitution is "twice the sum of the length and width of the rectangle" for "the perimeter of the rectangle". The use of these substitutions in the transformation process is discussed below. These substitutions are inverses of definitional transformations as defined earlier.

Members of the third category of words indicate the presence of functional linguistic forms which represent functions in the deductive model. I call members of this third class operators. Operators may indicate operations which are complex combinations of the basic functions of the deductive model. One simple operator is the word "plus", which indicates that the objects named by the two variables surrounding it are to be added. An example of a more complex operator is the phrase "percent less than", as in "10 percent less than the marked price", which indicates that the number immediately preceding the "percent" is to be subtracted from 100, this result divided by 100, and then this quotient multiplied by the variable following the "than".

Operators may be classified according to where their arguments are found. A prefix operator, such as "the square of ..." precedes its argument. An operator like "... percent" is a suffix operator, and follows its argument. Infix operators such as "... plus ..." or "... less than ..." appear between their two arguments. In a split prefix operator such as "difference between ... and ...", part of the operator precedes, and part appears between the two arguments. "The sum of ... and ... and ..." is a split prefix operator with an indefinite number of arguments.

Some words may act as operators conditionally, depending on their context. For example, "of" is equivalent to "times" if there is a fraction immediately preceding it; e.g., ".5 of the profit" is equivalent to ".5 times the profit"; however, "Queen of England" does not imply a multiplicative relationship between the Queen and her country.

C. Transformational Procedures.

Let us now consider in detail the transformation procedure used by STUDENT, and see how the three categories of phrases interact. Let us consider the following example, which has been solved by STUDENT.

(THE PROBLEM TO BE SOLVED IS)
(IF THE NUMBER OF CUSTOMERS TOM GETS IS TWICE THE SQUARE OF 20 PER CENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE NUMBER OF ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF CUSTOMERS TOM GETS Q.)

Shown below are copies of actual printout from the STUDENT program, illustrating stages in the transformation and the solution of the problem. The parentheses are an artifact of the LISP programming language, and "Q." is a replacement for the question mark not available on the key punch.

The first stage in the transformation is to perform all mandatory substitutions. In this problem only the three phrases underlined (by the author, not the program) are substitutors: "twice" becomes "2 times", "per cent" becomes the single word "percent", and "square of" is truncated to "square". Having made these substitutions, STUDENT prints:

(WITH MANDATORY SUBSTITUTIONS THE PROBLEM IS)
(IF THE NUMBER OF CUSTOMERS TOM GETS IS 2 TIMES THE SQUARE 20 PERCENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE NUMBER OF ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF CUSTOMERS TOM GETS Q.)
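The mandatory substitution stage can be sketched in Python as a sequence of whole-phrase replacements. The table `MANDATORY` below contains only the three rules named in the text, and the function name `apply_mandatory` is hypothetical; the sketch makes no claim about how STUDENT stored or applied its substitutions internally.

```python
import re

# Assumed substitution table, limited to the three rules named in the text.
MANDATORY = [
    ("TWICE", "2 TIMES"),
    ("PER CENT", "PERCENT"),
    ("SQUARE OF", "SQUARE"),
]

def apply_mandatory(text):
    """Replace each substitutor, matched on word boundaries, by its expansion."""
    for old, new in MANDATORY:
        text = re.sub(r"\b" + re.escape(old) + r"\b", new, text)
    return text

problem = ("IF THE NUMBER OF CUSTOMERS TOM GETS IS TWICE THE SQUARE OF "
           "20 PER CENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE "
           "NUMBER OF ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF "
           "CUSTOMERS TOM GETS Q.")

print(apply_mandatory(problem))
```

Applied to the example problem, the sketch yields the same wording as the printout above.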

From dictionary entries for each word, the words in the problem are tagged by their function in terms of the transformation process, and STUDENT prints:

(WITH WORDS TAGGED BY FUNCTION THE PROBLEM IS)
(IF THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1) THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS, AND THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS IS 45, (WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) (QMARK / DLM))

If a word has a tag, or tags, the word followed by "/", followed by the tags, becomes a single unit, and is enclosed in parentheses. Some typical taggings are shown above. "(OF / OP)" indicates that "OF" is an operator, and other taggings show that "GETS" is a verb, "TIMES" is an operator of level 1 (operator levels will be explained below), "SQUARE" is an operator of level 1, "PERCENT" is an operator of level 2, "HE" is a pronoun, "WHAT" is a question word, and "QMARK" (replacing Q.) is a delimiter of a sentence. These tagged words will play the principal role in the remaining transformation to the set of equations implicit in this problem statement.
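A dictionary-driven tagging step of this kind might look as follows in Python. The dictionary `TAGS` holds only the words tagged in the printout above, and `RENAME` and `tag_words` are hypothetical names; the real dictionary entries were created through REMEMBER and are not reproduced here.

```python
# Assumed dictionary: only the words tagged in the printout above.
TAGS = {
    "OF": "OP", "TIMES": "OP 1", "SQUARE": "OP 1", "PERCENT": "OP 2",
    "GETS": "VERB", "HE": "PRO", "WHAT": "QWORD", "Q.": "DLM",
}
# "Q." is printed as QMARK once it has been tagged as a delimiter.
RENAME = {"Q.": "QMARK"}

def tag_words(text):
    """Wrap every word that has a dictionary tag as (WORD / TAG)."""
    out = []
    for word in text.split():
        tag = TAGS.get(word)
        word = RENAME.get(word, word)
        out.append(f"({word} / {tag})" if tag else word)
    return " ".join(out)

print(tag_words("2 TIMES THE SQUARE 20 PERCENT OF THE NUMBER"))
print(tag_words("TOM GETS Q."))
```

Untagged words pass through unchanged, so they can later be grouped into variables.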

The next stage in the transformation is to break the input sentences into "kernel sentences". As in the example, a problem may be stated using sentences of great grammatical complexity; however, the final stage of the transformation is only defined on a set of kernel sentences. The simplification to kernel sentences as done in STUDENT depends on the recursive use of format matching. If an input sentence is of the form "IF" followed by a substring, followed by a comma, a question word and a second substring, then the first substring (between the IF and the comma) is made an independent sentence, and everything following the comma is made into a second sentence. In the example, this means that the input is resolved into the following two sentences (where tags are omitted for the sake of brevity):

"The number of customers Tom gets is 2 times the square 20 percent of the number of advertisements he runs, and the number of advertisements he runs is 45."

and

"What is the number of customers Tom gets?"

This last procedure effectively resolves a problem into declarative assumptions and a question sentence. A second complexity resolved by STUDENT is illustrated in the first sentence of this pair. A coordinate sentence consisting of two sentences joined by a comma immediately followed by an "and" will be resolved into these two independent sentences. The first sentence is therefore resolved into two simpler sentences.

Using these two inverse syntactic transformations, this problem statement is resolved into "simple" kernel sentences. For the example, STUDENT prints:

(THE SIMPLE SENTENCES ARE)
(THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1) THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS (PERIOD / DLM))
(THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS IS 45 (PERIOD / DLM))
((WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) (QMARK / DLM))

Each simple sentence is a separate list, i.e., is enclosed in parentheses, and each ends with a delimiter (a period or question mark). Each of these sentences can now be transformed directly to its interpretation in the model.
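The two format-matching rules described above (splitting at "IF ..., question word ..." and at ", AND") can be sketched recursively in Python. This is a minimal sketch that works on untagged text; the function name `kernels`, the list `QWORDS`, and the use of regular expressions are assumptions, not a description of STUDENT's own matcher.

```python
import re

# Assumed question words; the text only says "a question word".
QWORDS = ("WHAT", "HOW")

def kernels(sentence):
    """Recursively split a sentence into simple kernel sentences."""
    s = sentence.strip()
    # Format 1: IF <first substring> , <question word> <second substring>
    m = re.match(r"IF (.+), ((?:%s)\b.+)$" % "|".join(QWORDS), s)
    if m:
        return kernels(m.group(1) + ".") + kernels(m.group(2))
    # Format 2: <sentence> , AND <sentence>
    m = re.match(r"(.+?), AND (.+)$", s)
    if m:
        return kernels(m.group(1) + ".") + kernels(m.group(2))
    return [s]

problem = ("IF THE NUMBER OF CUSTOMERS TOM GETS IS 2 TIMES THE SQUARE "
           "20 PERCENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE "
           "NUMBER OF ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF "
           "CUSTOMERS TOM GETS Q.")

result = kernels(problem)
for k in result:
    print(k)
```

For the example problem the sketch returns three sentences, corresponding to the three lists in the printout above.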

D. From Kernel Sentences to Equations

The transformation from the simple kernel sentences to equations uses three levels of precedence for operators. Operators of higher precedence level are used earlier in the transformation. Before utilizing the operators, STUDENT looks for linguistic forms associated with the equality relation. These forms include the copula "is" and transitive verbs in certain contexts. In the example we are considering, only the copula "is" is used to indicate equality. The use of transitive verbs as indicators of equality, that is, as relational linguistic forms, will be discussed in connection with another example. When the relational linguistic form is identified, the names which are the arguments of the form are broken down into variables and operators (functional linguistic forms). In the present problem, the two names are those on either side of the "is" in each sentence.

The word "is" may also be used meaningfully within algebra story problems as an auxiliary verb (not meaning equality) in such verbal phrases as "is multiplied by" or "is divided by". A special check is made for the occurrence of these phrases before proceeding on to the main transformation procedure. The transformation of sentences containing these special verbal phrases will be discussed later. If "is" does not appear as an auxiliary in such a verbal phrase, a sentence of the form "P1 is P2" is interpreted as indicating the equality of the objects named by phrases P1 and P2. No equality relation will be recognized within these phrases, even if an appropriate transitive verb occurs within either of them. If P1* and P2* represent the arithmetic transformations of P1 and P2, then "P1 is P2" is transformed into the equation "(EQUAL P1* P2*)".

The transformation of P1 and P2 to give them an interpretation in the model is performed recursively using a program equivalent to the table in Figure 2. This table shows all the operators and formats currently recognized by the STUDENT program. New operators can easily be added to the program equivalent of this table.

In performing the transformation of a phrase P, a left to right search is made for an operator of level 2 (indicated by subscripts of "OP" and 2). If there is none, a left to right search is made for a level 1 operator (indicated by subscripts "OP" and 1), and finally another left to right search is made for an operator of level 0 (indicated by a subscript "OP" and no numerical subscript). The first operator found in this ordered search determines the first step in the transformation of the phrase. This operator and its context are transformed as indicated in column 4 in the table. If no operator is present, delimiters and articles (a, an and the) are deleted, and the phrase is treated as an indivisible entity, a variable.

In the example, the first simple sentence is

(THE NUMBER (OF/OP) CUSTOMERS TOM (GETS/VERB) IS 2 (TIMES/OP 1) THE (SQUARE/OP 1) 20 (PERCENT/OP 2) (OF/OP) THE NUMBER (OF/OP) ADVERTISEMENTS (HE/PRO) RUNS (PERIOD/DLM))


Figure 2

Operator     Precedence   Context                         Interpretation in the Model
             Level

PLUS         2            P1 PLUS P2                      (PLUS P1* P2*)
PLUSS        0            P1 PLUSS P2                     (PLUS P1* P2*)
MINUS        2            P1 MINUS P2                     (PLUS P1* (MINUS P2*))
                          MINUS P2                        (MINUS P2*)
MINUSS       0            P1 MINUSS P2                    (PLUS P1* (MINUS P2*))
TIMES        1            P1 TIMES P2                     (TIMES P1* P2*)
DIVBY        1            P1 DIVBY P2                     (QUOTIENT P1* P2*)
SQUARE       1            SQUARE P1                       (EXPT P1* 2)
SQUARED      0            P1 SQUARED                      (EXPT P1* 2)
**           0            P1 ** P2                        (EXPT P1* P2*)
LESSTHAN     2            P1 LESSTHAN P2                  (PLUS P2* (MINUS P1*))
PER          0            P1 PER K P2                     (QUOTIENT P1* (K P2)*)
                          P1 PER P2                       (QUOTIENT P1* (1 P2)*)
PERCENT      2            P1 K PERCENT P2                 (P1 (K/100) P2)*
PERLESS      2            P1 K PERLESS P2                 (P1 ((100-K)/100) P2)*
SUM          0            SUM P1 AND P2 AND P3            (PLUS P1* (SUM P2 AND P3)*)
                          SUM P1 AND P2                   (PLUS P1* P2*)
DIFFERENCE   0            DIFFERENCE BETWEEN P1 AND P2    (PLUS P1* (MINUS P2*))
OF           0            K OF P2                         (TIMES K P2*)
                          P1 OF P2                        (P1 OF P2)*

(a) If P1 is a phrase, P1* indicates its interpretation in the model.
(b) PLUSS and MINUSS are identical to PLUS and MINUS except for precedence level.
(c) When two possible contexts are indicated, they are checked in the order shown.
(d) SQUARE P1 and SUM P1 are idiomatic shortenings of SQUARE OF P1 and SUM OF P1.
(e) * outside a parenthesized expression indicates that the enclosed phrase is to be transformed.
(f) K is a number.
(g) / and - imply that the indicated arithmetic operations are actually performed.

This is of the form "P1 is P2", and is transformed to (EQUAL P1* P2*). P1 is "(THE NUMBER (OF/OP) CUSTOMERS TOM (GETS/VERB))". The occurrence of the verb "gets" is ignored because of the presence of the "is" in the sentence, meaning "equals". The only operator found is "(OF/OP)". From the table we see that if "OF" is immediately preceded by a number (not the word "number") it is treated as if it were the infix "TIMES". In this case, however, "OF" is not preceded by a number; the subscript OP, indicating that "OF" is an operator, is stripped away, and the transformation process is repeated on the phrase with "OF" no longer acting as an operator. In this repetition, no operators are found, and P1* is the variable

(NUMBER OF CUSTOMERS TOM (GETS/VERB)).

To the right of "IS" in the sentence is P2:

(2 (TIMES/OP 1) THE (SQUARE/OP 1) 20 (PERCENT/OP 2) (OF/OP) THE NUMBER (OF/OP) ADVERTISEMENTS (HE/PRO) RUNS (PERIOD/DLM))

The first operator found in P2 is PERCENT, an operator of level 2. From the table in Figure 2, we see that this operator has the effect of dividing the number immediately preceding it by 100. The "PERCENT" is removed and the transformation is repeated on the remaining phrase. In the example, the "... 20 (PERCENT/OP 2) (OF/OP) ..." becomes "... .2000 (OF/OP) ...".

Continuing the transformation, the operators found are, in order, TIMES, SQUARE, OF and OF. Each is handled as indicated in the table. The "OF" in the context "... .2000 (OF/OP) THE ..." is treated as an infix TIMES, while at the other occurrence of "OF", the operator marking is removed. The resulting transformed expression for P2 is:

(TIMES 2 (EXPT (TIMES .2 (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS)) 2))
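The ordered search over precedence levels can be sketched in Python for a small subset of the operators in Figure 2 (PLUS, PERCENT, TIMES, SQUARE and OF). The names `LEVELS`, `tag` and `transform` are hypothetical, tags other than operator levels are omitted, and numbers are printed as `0.2` rather than `.2`; this is an illustration of the search order under those assumptions, not a reconstruction of the LISP program.

```python
# Assumed subset of the operator table, with the levels given in Figure 2.
LEVELS = {"PLUS": 2, "PERCENT": 2, "TIMES": 1, "SQUARE": 1, "OF": 0}
ARTICLES = {"A", "AN", "THE"}

def is_number(word):
    try:
        float(word)
        return True
    except ValueError:
        return False

def tag(phrase):
    """Pair each word with its operator level, or None for non-operators."""
    return [(w, LEVELS.get(w)) for w in phrase.split()]

def transform(toks):
    # Search left to right for a level 2 operator, then level 1, then level 0.
    for level in (2, 1, 0):
        for i, (word, lv) in enumerate(toks):
            if lv != level:
                continue
            left, right = toks[:i], toks[i + 1:]
            if word == "PERCENT":
                # Divide the preceding number by 100 and drop the operator.
                k = format(float(left[-1][0]) / 100, "g")
                return transform(left[:-1] + [(k, None)] + right)
            if word == "SQUARE":
                return f"(EXPT {transform(right)} 2)"
            if word == "OF" and not (left and is_number(left[-1][0])):
                # Not preceded by a number: strip the operator marking, retry.
                return transform(left + [(word, None)] + right)
            name = "TIMES" if word == "OF" else word
            return f"({name} {transform(left)} {transform(right)})"
    # No operator left: delete articles and treat the phrase as a variable.
    words = [w for w, _ in toks if w not in ARTICLES]
    if len(words) == 1 and is_number(words[0]):
        return words[0]
    return "(" + " ".join(words) + ")"

p2 = "2 TIMES THE SQUARE 20 PERCENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS"
print(transform(tag(p2)))
```

On the phrase P2 the sketch visits PERCENT first, then TIMES, SQUARE and the two occurrences of OF, in the same order as described in the text.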

The transformation of the second sentence of the example is done in a similar manner, and yields the equation:

(EQUAL (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS) 45)

The third sentence is of the form "What is P1?". It starts with a question word and is therefore treated specially. A unique variable, a single word consisting of an X or G followed by five digits, is created, and the equation (EQUAL Xnnnnn P1*) is stored. For this example, the variable X00001 was created, and this last simple sentence is transformed to the equation:

(EQUAL X00001 (NUMBER OF CUSTOMERS TOM (GETS/VERB)))

In addition, the created variable is placed on the list of variables for which STUDENT is to find a value. Also, this variable is stored, paired with P1, the untransformed right side, for use in printing out the answer. If a value is found for this variable, STUDENT prints the sentence (P1 is value) with the appropriate substitution for value. Below we show the full set of equations, and the printed solution given by STUDENT for the example being considered. For ease in solution, the last equations created are put first in the list of equations.

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (NUMBER OF CUSTOMERS TOM (GETS/VERB)))
(EQUAL (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS) 45)
(EQUAL (NUMBER OF CUSTOMERS TOM (GETS/VERB)) (TIMES 2 (EXPT (TIMES .2000 (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS)) 2)))

(THE NUMBER OF CUSTOMERS TOM GETS IS 162)
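The printed answer can be checked by substituting the second equation into the third. The short Python check below uses exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

ads = Fraction(45)                                # number of advertisements he runs
customers = 2 * (Fraction(20, 100) * ads) ** 2    # (TIMES 2 (EXPT (TIMES .2 ads) 2))

print(customers)   # 162
```

That is, 20 percent of 45 is 9, the square of 9 is 81, and twice 81 is 162.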

In the example just shown, the equality relation was indicated by the copula "is". In the problem shown below, solved by STUDENT, equality is indicated by the occurrence of a transitive verb in the proper context.

(THE PROBLEM TO BE SOLVED IS)
(TOM HAS TWICE AS MANY FISH AS MARY HAS GUPPIES. IF MARY HAS 3 GUPPIES, WHAT IS THE NUMBER OF FISH TOM HAS Q.)

(THE NUMBER OF FISH TOM HAS IS 6)

The verb in this case is "has". The simple sentence "Mary has 3 guppies" is transformed to the "equivalent" sentence "The number of guppies Mary has is 3" and the sentence is processed as before. This transformation rule may be stated generally as: anything (a subject) followed by a verb followed by a number followed by anything (the unit) is transformed to a sentence starting with "THE NUMBER OF" followed by the unit, followed by the subject and the verb, followed by "IS" and then the number. In "Mary has 3 guppies" the subject is "Mary", the verb is "has", and the unit is "guppies". Similarly, the sentence "The witches of Firth brew 3 magic potions" would be transformed to

"The number of magic potions the witches of Firth brew is 3."

In addition to a declaration of number, a single-object transitive verb may be used in a comparative structure, such as exhibited in the sentence "Tom has twice as many fish as Mary has guppies." STUDENT transforms this sentence into the equivalent sentence:

"The number of fish Tom has is twice the number of guppies Mary has."

Transformations of new sentence formats to formats previously "understood" by the program can be easily added to the program, thus extending the subset of English "understood" by STUDENT.
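The number-declaration rule stated above (subject, verb, number, unit) can be sketched as a single pattern in Python. The verb list `VERBS` is an assumption standing in for the dictionary's verb tags, and `declare_number` is a hypothetical name; sentences that do not fit the format are returned unchanged.

```python
import re

# Assumed verb list; STUDENT takes verbs from dictionary tags instead.
VERBS = ("HAS", "HAVE", "BREW", "GETS")

PATTERN = re.compile(
    r"^(?P<subject>.+?) (?P<verb>" + "|".join(VERBS) + r") "
    r"(?P<number>\d+(?:\.\d+)?) (?P<unit>.+)$"
)

def declare_number(sentence):
    """Rewrite '<subject> <verb> <number> <unit>' as an 'IS' sentence."""
    m = PATTERN.match(sentence)
    if not m:
        return sentence
    return "THE NUMBER OF {unit} {subject} {verb} IS {number}".format(**m.groupdict())

print(declare_number("MARY HAS 3 GUPPIES"))
print(declare_number("THE WITCHES OF FIRTH BREW 3 MAGIC POTIONS"))
```

Once a sentence is in the "IS" form, it can be handled by the same transformation used for the copula.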

The word "is" indicates equality only if it is not used as an auxiliary. The example below shows how verbal phrases containing "is", such as "is multiplied by", and "is increased by" are handled in the transformation.

(THE PROBLEM TO BE SOLVED IS)
(A NUMBER IS MULTIPLIED BY 6. THIS PRODUCT IS INCREASED BY 44. THIS RESULT IS 68. FIND THE NUMBER.)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (NUMBER))
(EQUAL (PLUS (TIMES (NUMBER) 6) 44) 68)

(THE NUMBER IS 4)

The sentence "A number is multiplied by 6" only indicates that two objects in the model are related multiplicatively, and does not indicate explicitly any equality relation. The interpretation of this sentence in the model is the prefix notation product "(TIMES (NUMBER) 6)". This latter phrase is stored in a temporary location for possible later reference. In this problem, it is referenced in the next sentence, with the phrase "this product". The important word in this last phrase is "this": STUDENT ignores all other words in a variable containing the key word "this". The last temporarily stored phrase is substituted for the phrase containing "this". Thus, the first three sentences in the problem shown above yield only one equation, after two substitutions for "this" phrases. The last sentence "Find the number." is transformed as if it were "What is the number Q.", and yields the first equation shown.

The word "this" may occur in a context where it is not referring to a previously stored phrase. Below is an example with such a context.

(THE PROBLEM TO BE SOLVED IS)
(THE PRICE OF A RADIO IS 69.70 DOLLARS. IF THIS PRICE IS 15 PERCENT LESS THAN THE MARKED PRICE, FIND THE MARKED PRICE.)

(THE MARKED PRICE IS 82 DOLLARS)

In such contexts, the phrase containing "THIS" is replaced by the left half of the last equation created. In this example, STUDENT breaks the last sentence into two simple sentences, deleting the "IF". Then the phrase "THIS PRICE" is replaced by the variable "PRICE OF RADIO", which is the left half of the previous equation.

This problem illustrates two other features of the STUDENT program. The first is the action of the complex operator "percent less than". It causes the number immediately preceding it, i.e., 15, to be subtracted from 100, this result divided by 100, to give .85. Then this operator becomes the infix operator "TIMES". This is indicated in the table in Figure 2.
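The arithmetic behind "15 percent less than" in the radio problem can be checked with exact decimals in Python. The variable names are illustrative only.

```python
from decimal import Decimal

k = Decimal(15)
factor = (100 - k) / 100      # (100 - 15) / 100 = .85
price = Decimal("69.70")      # price of the radio, in dollars

# price = factor * marked_price, so:
marked_price = price / factor

print(factor, marked_price)
```

The factor .85 times the marked price equals 69.70, which gives the marked price of 82 dollars printed by STUDENT.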

This problem also illustrates how units such as "dollars" are handled by the STUDENT program. Any word which immediately follows a number is labeled as a special type of variable called a unit. A number followed by a unit is treated in the equation as a product of the number and the unit, e.g., "69.70 DOLLARS" becomes "(TIMES 69.70 (DOLLARS))". Units are treated as special variables in solving the set of equations; a unit may appear in the answer though variables cannot. If the value for a variable found by the solver is the product of a number and a unit, STUDENT concatenates the number and the unit. For example, the solution for "(MARKED PRICE)" in the problem above was (TIMES 82 (DOLLARS)) and STUDENT printed out:

(THE MARKED PRICE IS 82 DOLLARS)

There is an exception to the fact that any unit may appear in the answer, as illustrated in the problem below.

(THE PROBLEM TO BE SOLVED IS)
(IF 1 SPAN EQUALS 9 INCHES, AND 1 FATHOM EQUALS 6 FEET, HOW MANY SPANS EQUALS 1 FATHOM Q.)

(THE EQUATIONS TO BE SOLVED ARE)
(EQUAL X00001 (TIMES 1 (FATHOMS)))
(EQUAL (TIMES 1 (FATHOMS)) (TIMES 6 (FEET)))
(EQUAL (TIMES 1 (SPANS)) (TIMES 9 (INCHES)))

THE EQUATIONS WERE INSUFFICIENT TO FIND A SOLUTION

(USING THE FOLLOWING KNOWN RELATIONSHIPS)
((EQUAL (TIMES 1 (YARDS)) (TIMES 3 (FEET))) (EQUAL (TIMES 1 (FEET)) (TIMES 12 (INCHES))))

(1 FATHOM IS 8 SPANS)

If the unit of the answer is specified, in this problem by the phrase "how many spans", then only that unit, in this problem "spans", may appear in the answer. Without this restriction, STUDENT would blithely answer this problem with "(1 FATHOM IS 1 FATHOM)".

In the transformation from the English statement of the problem to the equations, "9 INCHES" became (TIMES 9 (INCHES)). However, "1 FATHOM" became "(TIMES 1 (FATHOMS))". The plural form for fathom has been used instead of the singular form. STUDENT always uses the plural form if known, to ensure that all units appear in only one form. Since "fathom" and "fathoms" are different, if both were used STUDENT would treat them as distinct, unrelated units. The plural form is part of the global information that can be made available to STUDENT, and the plural form of a word is substituted for any singular form appearing after "1" in any phrase. The inverse operation is carried out for correct printout of the solution.

Notice that the information given in the problem above was insufficient to allow the set of equations to be solved. Therefore, STUDENT looked in its glossary for information concerning each of the units in this set of equations. It found the relationships "1 foot equals 12 inches." and "1 yard equals 3 feet." Using only the first fact, and the equation it implies, STUDENT is then able to solve the problem. Thus, in certain cases where a problem is not analytic, in the sense that it does not contain, explicitly stated, all the information needed for its solution, STUDENT is able to draw on a body of facts, picking out relevant ones, and use them to obtain a solution.
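The role of the retrieved fact "1 foot equals 12 inches" can be illustrated by chaining unit conversions in Python. This sketch replaces STUDENT's general equation solver with a simple recursive lookup, which is a different technique chosen only to show the arithmetic; the dictionary `facts` and the function `to_inches` are hypothetical.

```python
from fractions import Fraction

# Each fact says: 1 <left unit> = <n> <right unit>.
facts = {
    ("SPANS", "INCHES"): 9,     # stated in the problem
    ("FATHOMS", "FEET"): 6,     # stated in the problem
    ("FEET", "INCHES"): 12,     # global information
    ("YARDS", "FEET"): 3,       # global information, not needed here
}

def to_inches(unit):
    """Express one `unit` in inches by chaining the stored facts."""
    if unit == "INCHES":
        return Fraction(1)
    for (left, right), n in facts.items():
        if left == unit:
            return n * to_inches(right)
    raise ValueError("no conversion known for " + unit)

spans_per_fathom = to_inches("FATHOMS") / to_inches("SPANS")
print(spans_per_fathom)   # 8
```

One fathom is 6 feet or 72 inches, and 72 inches divided by 9 inches per span gives the 8 spans of the printed answer.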

In certain problems, the transformation process does not yield a set of solvable equations. However, within this set of equations there exists a pair of variables (or more than one pair) such that the two variables are only "slightly different", and really name the same object in the model. When a set of equations is unsolvable, STUDENT searches for relevant global equations. In addition, it uses several heuristic techniques for identifying two "slightly different" variables in the equations. The problem below illustrates the identification of two variables where in one variable a pronoun has been substituted for a noun phrase in the other variable. This identification is made by checking all variables appearing before one containing the pronoun, and finding one which is identical to this pronoun phrase, with a substitution of a string of any length for the pronoun.

(THE PROBLEM TO BE SOLVED IS)
(THE NUMBER OF SOLDIERS THE RUSSIANS HAVE IS ONE HALF OF THE NUMBER OF GUNS THEY HAVE. THE NUMBER OF GUNS THEY HAVE IS 7000. WHAT IS THE NUMBER OF SOLDIERS THEY HAVE Q.)

THE EQUATIONS WERE INSUFFICIENT TO FIND A SOLUTION

(ASSUMING THAT)
((NUMBER OF SOLDIERS (THEY/PRO) (HAVE/VERB)) IS EQUAL TO (NUMBER OF SOLDIERS RUSSIANS (HAVE/VERB)))

(THE NUMBER OF SOLDIERS THEY HAVE IS 3500)

If two variables match in this fashion, STUDENT assumes the two variables are equal, prints out a statement of this assumption, as shown, and adds an equation expressing this equality to the set to be solved. The solution procedure is tried again, with this additional equation. In this example, the additional equation was sufficient to allow determination of the solution.
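The pronoun heuristic can be sketched in Python as a prefix-and-suffix comparison: an earlier variable matches if it equals the pronoun variable with one or more words standing where the pronoun is. The pronoun set `PRONOUNS` and the function `pronoun_match` are assumptions for illustration, and tags are omitted.

```python
# Assumed pronoun list; STUDENT uses the PRO dictionary tag instead.
PRONOUNS = {"THEY", "HE", "SHE", "IT"}

def pronoun_match(pronoun_var, earlier_vars):
    """Return the first earlier variable equal to `pronoun_var` with a
    string of one or more words substituted for the pronoun, or None."""
    words = pronoun_var.split()
    for i, w in enumerate(words):
        if w not in PRONOUNS:
            continue
        head, tail = words[:i], words[i + 1:]
        for cand in earlier_vars:
            c = cand.split()
            if (len(c) > len(head) + len(tail)
                    and c[:len(head)] == head
                    and c[len(c) - len(tail):] == tail):
                return cand
    return None

earlier = ["NUMBER OF SOLDIERS RUSSIANS HAVE", "NUMBER OF GUNS THEY HAVE"]
print(pronoun_match("NUMBER OF SOLDIERS THEY HAVE", earlier))
```

A successful match corresponds to the "(ASSUMING THAT)" line in the printout, after which the equality of the two variables is added as an equation.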

The example below is again a non-analytic problem. The first set of equations developed by STUDENT is unsolvable. Therefore, STUDENT tries to find some relevant equations in its store of global information.

(THE PROBLEM TO BE SOLVED IS)
(THE GAS CONSUMPTION OF MY CAR IS 15 MILES PER GALLON. THE DISTANCE BETWEEN BOSTON AND NEW YORK IS 250 MILES. WHAT IS THE NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK AND BOSTON Q.)

THE EQUATIONS WERE INSUFFICIENT TO FIND A SOLUTION

(USING THE FOLLOWING KNOWN RELATIONSHIPS)
((EQUAL (DISTANCE) (TIMES (SPEED) (TIME))) (EQUAL (DISTANCE) (TIMES (GAS CONSUMPTION) (NUMBER OF GALLONS OF GAS USED))))

(ASSUMING THAT)
((DISTANCE) IS EQUAL TO (DISTANCE BETWEEN BOSTON AND NEW YORK))

(ASSUMING THAT)
((GAS CONSUMPTION) IS EQUAL TO (GAS CONSUMPTION OF MY CAR))

(ASSUMING THAT)
((NUMBER OF GALLONS OF GAS USED) IS EQUAL TO (NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK AND BOSTON))

(THE NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK AND BOSTON IS 16.66 GALLONS)

STUDENT uses the first word of each variable string as a key to its glossary. The one exception to this rule is that the words "number of" are ignored if they are the first two words of a variable string. Thus, in this problem, STUDENT retrieved equations which were stored under the key words distance, gallons, gas, and miles. Two facts about distance had been stored earlier: "distance equals speed times time" and "distance equals gas consumption times number of gallons of gas used". The equations implicit in these sentences were stored and are retrieved now as possibly useful for the solution of this problem. In fact, only the second is relevant.
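The key-selection rule can be written as a few lines of Python. The function name `glossary_key` is hypothetical, and the sketch assumes the variable has already had its articles removed and contains at least one word besides "NUMBER OF".

```python
def glossary_key(variable):
    """First word of the variable, ignoring a leading 'NUMBER OF'."""
    words = variable.split()
    if words[:2] == ["NUMBER", "OF"]:
        words = words[2:]
    return words[0]

for v in ("DISTANCE BETWEEN BOSTON AND NEW YORK",
          "GAS CONSUMPTION OF MY CAR",
          "NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK AND BOSTON"):
    print(glossary_key(v))
```

For the variables of the gas problem this gives the keys DISTANCE, GAS and GALLONS.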

Before any attempt is made to solve this augmented set of equations, the variables in the augmented set are matched, to identify "slightly different" variables which refer to the same object in the model. In this example "(DISTANCE)", "(GAS CONSUMPTION)", and "(NUMBER OF GALLONS OF GAS USED)" are all identified with "similar" variables. The following conditions must be satisfied for this type of identification of variables P1 and P2:

1) P1 must appear later in the problem than P2.

2) P1 is completely contained in P2 in the sense that P1 is a contiguous substring within P2.

This identification reflects a syntactic phenomenon where a truncated phrase, with one or more modifying phrases dropped, is often used in place of the original phrase. For example, if the phrase "the length of a rectangle" has occurred, the phrase "the length" may be used to mean the same thing. This type of identification is distinct from that made using pronoun substitution.
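The two conditions can be sketched as follows. This is a minimal illustration, not STUDENT's code: variables are represented as lists of words in order of first appearance, containment is tested word by word (an assumption; the text says only "contiguous substring"), and the function names are hypothetical.

```python
def contains_contiguously(short, long):
    """True if the word list `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    if n == 0:
        return False
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def identify_variables(variables):
    """Return (P1, P2) pairs where P1 appears later than P2 and lies inside P2.

    `variables` is a list of word lists, in order of first appearance.
    """
    pairs = []
    for later_index, p1 in enumerate(variables):
        for p2 in variables[:later_index]:
            if p1 != p2 and contains_contiguously(p1, p2):
                pairs.append((" ".join(p1), " ".join(p2)))
    return pairs


# Example: the truncated phrase "LENGTH" follows "LENGTH OF RECTANGLE".
example = [s.split() for s in ["LENGTH OF RECTANGLE", "WIDTH OF RECTANGLE", "LENGTH"]]
print(identify_variables(example))
```

Because condition 1 is directional, a long phrase that appears after its shorter form is not identified with it.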

In the example above, a stored schema was used by identifying the variables in the schema with the variables that occur in the problem. This problem is solvable because the key phrases "distance", "gas consumption" and "number of gallons of gas used" occur as substrings of the variables in the problem. Since STUDENT identifies each generic key phrase of the schema with a particular variable of the problem, any schema can be used only once in a problem. Because STUDENT handles schema in this ad hoc fashion it cannot solve problems in which a relationship such as "distance equals speed times time" is needed for two different values of distance, speed, and time.

E. Possible Idiomatic Substitutions

There are some phrases which have a dual character, depending on the context. In the example below, the phrase "perimeter of a rectangle" becomes a variable with no reference to its meaning, or definition, in terms of the length and width of the rectangle. This definition is unneeded for solution.

(THE PROBLEM TO BE SOLVED IS) (THE SUM OF THE PERIMETER OF A RECTANGLE AND THE PERIMETER OF A TRIANGLE IS 24 INCHES. IF THE PERIMETER OF THE RECTANGLE IS TWICE THE PERIMETER OF THE TRIANGLE, WHAT IS THE PERIMETER OF THE TRIANGLE Q.)

(THE PERIMETER OF THE TRIANGLE IS 8 INCHES)

However, the following problem is stated in terms of the perimeter, length and width of the rectangle. Transforming the English into equations is not sufficient for solution. Neither retrieving and using an equation about "inches", the unit in the problem, nor identifying "length" with a longer phrase serves to make the problem solvable. Therefore, STUDENT looks in its dictionary of possible idioms, and finds one which it can try in the problem.

(THE PROBLEM TO BE SOLVED IS) (THE LENGTH OF A RECTANGLE IS 8 INCHES MORE THAN THE WIDTH OF THE RECTANGLE. ONE HALF OF THE PERIMETER OF THE RECTANGLE IS 18 INCHES. FIND THE LENGTH AND THE WIDTH OF THE RECTANGLE.)

THE EQUATIONS WERE INSUFFICIENT TO FIND A SOLUTION

TRYING POSSIBLE IDIOMS

(THE PROBLEM WITH AN IDIOMATIC SUBSTITUTION IS) (THE LENGTH OF A RECTANGLE IS 8 INCHES MORE THAN THE WIDTH OF THE RECTANGLE. ONE HALF OF TWICE THE SUM OF THE LENGTH AND WIDTH OF THE RECTANGLE IS 18 INCHES. FIND THE LENGTH AND THE WIDTH OF THE RECTANGLE.)

(ASSUMING THAT) ((LENGTH) IS EQUAL TO (LENGTH OF RECTANGLE))

(THE LENGTH IS 13 INCHES)

(THE WIDTH OF THE RECTANGLE IS 5 INCHES)

STUDENT actually had two possible idiomatic substitutions which it could have made for "perimeter of a rectangle"; one was in terms of the length and width of the rectangle and the other was in terms of the shortest and longest sides of the rectangle. When there are two possible substitutions for a given phrase, one is tried first, namely the one STUDENT has been told about most recently. In this problem, the correct one was fortuitously first. If the other had been first, the revised problem would not have been solvable, and eventually the second (correct) substitution would have been made. Only one non-mandatory idiomatic substitution is ever made at one time, although the substitution is made for all occurrences of the phrase chosen.

In this problem, the idiomatic substitution made allows the problem to be solved, after identification of the variables "length" and "length of rectangle". The retrieved equation about inches was not needed. However, its presence in the set of equations to be solved did not sidetrack the solver in any way.


This use of possible, but non-mandatory, idiomatic substitutions can also give STUDENT a way to solve problems in which two phrases denoting one particular variable are quite different. For example, the phrases "students who passed the admissions test" and "successful candidates" might describe the same set of people. Since STUDENT knows nothing of the "real world" and its value system for success, it would never identify these two phrases on its own. However, if told that "successful candidates" sometimes means "students who passed the admissions test", it would be able to solve a problem using these two phrases to identify the same variable. Thus, possible idiomatic substitutions serve the dual purpose of providing tentative substitutions of definitions, and identification of synonymous phrases.

F. Special Heuristics.

The methods thus far discussed have been applicable to the entire range of algebra problems. However, for special classes of problems, additional heuristics may be used which are needed for members of the class, but not applicable to other problems. An example is the class of age problems, as typified by the problem below.

(THE PROBLEM TO BE SOLVED IS) (BILL S FATHER S UNCLE IS TWICE AS OLD AS BILL S FATHER. 2 YEARS FROM NOW BILL S FATHER WILL BE 3 TIMES AS OLD AS BILL. THE SUM OF THEIR AGES IS 92. FIND BILL S AGE.)

(BILL S AGE IS 8)

Before the age problem heuristics are used, a problem must be identified as belonging to that class of problems. STUDENT identifies age problems by any occurrence of one of the following phrases: "as old as", "years old" and "age". This identification is made immediately after all words are looked up in the dictionary and tagged by function. After the special heuristics are used the modified problem is transformed to equations as described previously.

The need for special methods for age problems arises because of the conventions used for denoting the variables, all of which are ages. The word age is usually not used explicitly, but is implicit in such phrases as "as old as". People's names are used where their ages are really the implicit variables. In the example, for instance, the phrase "Bill's father's uncle" is used instead of the phrase "Bill's father's uncle's age".

STUDENT uses a special heuristic to make all these ages explicit. To do this, it must know which words are "person words" and therefore may be associated with an age. For this problem STUDENT has been told that Bill, father, and uncle are person words. A space followed by "S" after a word is the STUDENT representation for the possessive, used instead of an apostrophe followed by "s" for programming convenience. STUDENT inserts "S AGE" after every person word not followed by "S" (because this "S" indicates that the person word is being used in a possessive sense, not as an independent age variable). Thus, as indicated, the phrase "BILL S FATHER S UNCLE" becomes "BILL S FATHER S UNCLE S AGE".
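The insertion rule can be sketched as follows, assuming the sentence has already been split into words. The set `PERSON_WORDS` holds the person words STUDENT has been told for this problem; the function name `make_ages_explicit` is hypothetical.

```python
PERSON_WORDS = {"BILL", "FATHER", "UNCLE"}


def make_ages_explicit(words):
    """Insert "S AGE" after every person word that is not followed by "S"."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        following = words[i + 1] if i + 1 < len(words) else None
        # A following "S" marks a possessive, so the word is not itself an age.
        if word in PERSON_WORDS and following != "S":
            out.extend(["S", "AGE"])
    return out


sentence = "BILL S FATHER S UNCLE IS TWICE AS OLD AS BILL S FATHER".split()
print(" ".join(make_ages_explicit(sentence)))
```

A phrase that already ends in "S AGE", such as "BILL S AGE", is left unchanged because its last person word is followed by "S".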

In addition to changing phrases naming people to ones naming ages, STUDENT makes certain special idiomatic substitutions. For the phrase "their ages", STUDENT substitutes a conjunction of all the age variables encountered in the problem. In the example, for "THEIR AGES" STUDENT substitutes "BILL S FATHER S UNCLE S AGE AND BILL S FATHER S AGE AND BILL S AGE". The phrases "as old as" and "years old" are then deleted as dummy phrases not having any meaning, and "will be" and "was" are changed to "is". There is no need to preserve the tense of the copula, since the sense of the future or past tense is preserved in such prefix phrases as "2 years from now", or "3 years ago".

The remaining special age problem heuristics are used to process the phrases "in 2 years", "5 years ago" and "now". The phrase "2 years from now" is transformed to "in 2 years" before processing. These three time phrases may occur immediately after the word "age" (e.g., "Bill's age 3 years ago") or at the beginning of the sentence. If a time phrase occurs at the beginning of the sentence, it implicitly modifies all ages mentioned in the sentence, except those followed by their own time phrase. For example, "In 2 years Bill's father's age will be 3 times Bill's age" is equivalent to "Bill's father's age in 2 years will be 3 times Bill's age in 2 years". However, "3 years ago Mary's age was 2 times Ann's age now" is equivalent to "Mary's age 3 years ago was 2 times Ann's age now". Thus prefix time phrases are handled by distributing them over all ages not modified by another time phrase.

After these prefix phrases have been distributed, each time phrase is translated appropriately. The phrase "in 5 years" causes 5 to be added to the age it follows, and "7 years ago" causes 7 to be subtracted from the age preceding this phrase. The word "now" is deleted.
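A rough sketch of the distribution and translation steps is given below. It assumes each sentence has already been reduced to a prefix time phrase (or `None`) and a list of age variables, each paired with its own time phrase or `None`; this representation and the names `offset` and `distribute` are illustrative assumptions only.

```python
import re


def offset(time_phrase):
    """Translate a time phrase into a signed number of years."""
    if time_phrase is None or time_phrase == "now":
        return 0  # "now" is simply deleted
    m = re.fullmatch(r"in (\d+) years", time_phrase)
    if m:
        return int(m.group(1))
    m = re.fullmatch(r"(\d+) years ago", time_phrase)
    if m:
        return -int(m.group(1))
    raise ValueError(f"unrecognized time phrase: {time_phrase!r}")


def distribute(prefix, ages):
    """Apply the prefix phrase to every age that lacks its own time phrase.

    `ages` is a list of (age_variable, own_time_phrase_or_None) pairs.
    Returns (age_variable, years_to_add) pairs.
    """
    return [(var, offset(own if own is not None else prefix)) for var, own in ages]


# "3 years ago Mary's age was 2 times Ann's age now"
print(distribute("3 years ago", [("Mary's age", None), ("Ann's age", "now")]))
```

In the second example from the text, Ann's age carries its own phrase "now", so the prefix "3 years ago" applies to Mary's age alone.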

Only the special heuristics described thus far are necessary to solve the first age problem. The second age problem, given below, requires one additional heuristic not previously mentioned. This is a substitution for the phrase "was when" which effectively decouples the two facts combined in the first sentence. For "was when", STUDENT substitutes "was K years ago. K years ago" where K is a new variable created for this purpose.

(THE PROBLEM TO BE SOLVED IS) (MARY IS TWICE AS OLD AS ANN WAS WHEN MARY WAS AS OLD AS ANN IS NOW. IF MARY IS 24 YEARS OLD, HOW OLD IS ANN Q.)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00008 ((ANN / PERSON) S AGE))

(EQUAL ((MARY / PERSON) S AGE) 24)

(EQUAL (PLUS ((MARY / PERSON) S AGE) (MINUS X00007)) ((ANN / PERSON) S AGE))

(EQUAL ((MARY / PERSON) S AGE) (TIMES 2 (PLUS ((ANN / PERSON) S AGE) (MINUS X00007))))

(ANN S AGE IS 18)

In the example, the first sentence becomes the two sentences: "Mary is twice as old as Ann X00007 years ago. X00007 years ago Mary was as old as Ann is now." These two occurrences of time phrases are handled as discussed previously. Similarly the phrase "will be when" would be transformed to "in K years. In K years".
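The decoupled equations can be checked by hand or with a few lines of Python. The sketch below writes the three equations as an augmented matrix in the unknowns M (Mary's age), A (Ann's age) and K (the new variable, X00007 in the printout) and solves it by Gauss-Jordan elimination with exact fractions. The elimination routine is a generic stand-in, not a description of STUDENT's own solver.

```python
from fractions import Fraction


def solve_linear(rows):
    """Solve a square linear system given as an augmented matrix.

    Assumes the system has a unique solution; a singular system would make
    the pivot search fail.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]


# Columns: M, A, K, constant.
equations = [
    [1, 0, 0, 24],   # M = 24
    [1, -1, -1, 0],  # M - K = A
    [1, -2, 2, 0],   # M = 2 * (A - K)
]
mary, ann, k = solve_linear(equations)
print(mary, ann, k)
```

The solution gives Ann's age as 18, in agreement with the printout, with K equal to 6: six years ago Mary was 18, Ann's present age, and Ann was 12, half of Mary's present age.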


These decoupling heuristics are useful not only for the STUDENT program but for people trying to solve age problems. The classic age problem about Mary and Ann, given above, took an MIT graduate student over 5 minutes to solve because he did not know this heuristic. With the heuristic he was able to set up the appropriate equations much more rapidly. As a crude measure of STUDENT's speed, note that STUDENT took less than one minute to solve this problem.

G. When All Else Fails.

For all the problems discussed thus far, STUDENT was able to find a solution eventually. In some cases, however, necessary global information is missing from its store of information, or variables which name the same object cannot be identified by the heuristics of the program. Whenever STUDENT cannot find a solution for any reason, it turns to the questioner for help. As in the problem below, it prints out "(DO YOU KNOW ANY MORE RELATIONSHIPS AMONG THESE VARIABLES)" followed by a list of the variables in the problem. The questioner can answer "yes" or "no". If he says "yes", STUDENT says "TELL ME", and the questioner can append another sentence to the statement of the problem.

(THE PROBLEM TO BE SOLVED IS) (THE GROSS WEIGHT OF A SHIP IS 20000 TONS. IF ITS NET WEIGHT IS 15000 TONS, WHAT IS THE WEIGHT OF THE SHIPS CARGO Q.)

THE EQUATIONS WERE INSUFFICIENT TO FIND A SOLUTION

(DO YOU KNOW ANY MORE RELATIONSHIPS AMONG THESE VARIABLES)

(GROSS WEIGHT OF SHIP)

(TONS)

(ITS NET WEIGHT)

(WEIGHT OF SHIPS CARGO)

yes

TELL ME

(the weight of a ships cargo is the difference between the gross weight and the net weight)

THE EQUATIONS WERE INSUFFICIENT TO FIND A SOLUTION

(ASSUMING THAT) ((NET WEIGHT) IS EQUAL TO (ITS NET WEIGHT))

(ASSUMING THAT) ((GROSS WEIGHT) IS EQUAL TO (GROSS WEIGHT OF SHIP))

(THE WEIGHT OF THE SHIPS CARGO IS 5000 TONS)

In this problem, the additional information typed in (in lower case letters) was sufficient to solve the problem. If it had not been, the question would have been repeated until the questioner said "no", or provided sufficient information for solution of the problem.

In the problem below, the solution to the set of equations involves solving a quadratic equation, which is beyond the mathematical ability of the present STUDENT system. Note that in this case STUDENT reports that the equations were unsolvable, not simply insufficient for solution. STUDENT still requests additional information from the questioner. In the example, the questioner says "no", and STUDENT states that "I CANT SOLVE THIS PROBLEM" and terminates.

(THE PROBLEM TO BE SOLVED IS) (THE SQUARE OF THE DIFFERENCE BETWEEN THE NUMBER OF APPLES AND THE NUMBER OF ORANGES ON THE TABLE IS EQUAL TO 9. IF THE NUMBER OF APPLES IS 7, FIND THE NUMBER OF ORANGES ON THE TABLE.)

UNABLE TO SOLVE THIS SET OF EQUATIONS

TRYING POSSIBLE IDIOMS

(DO YOU KNOW ANY MORE RELATIONSHIPS AMONG THESE VARIABLES)

(NUMBER OF APPLES)

(NUMBER OF ORANGES ON TABLE)

no

I CANT SOLVE THIS PROBLEM

H. Summary of the STUDENT Subset of English

The subset of English understandable by STUDENT is built around a core of sentence and phrase formats which can be transformed into expressions in the STUDENT deductive model. On this basic core is built a larger set of formats. Each of these is first transformed into a string built on formats in this basic set and then this string is transformed into an expression in the deductive model. For example, the format ($ IS EQUAL TO $) is changed to the basic format ($ IS $), and the phrase "IS CONSECUTIVE TO" is changed to "IS 1 PLUS". The constructions discussed earlier involving single object transitive verbs could have been handled this way, though for programming convenience they were not.

The basic linguistic form which is transformed into an equation is one containing "is" as a copula. The phrases "is equal to" and "equals" are both changed to the copula "is". The auxiliary verbal constructions "is multiplied by", "is divided by" and "is increased by" are also acceptable as principal verbs in a sentence. As discussed in detail earlier, a sentence with no occurrence of "is" can have as a main verb a transitive verb immediately followed by a number. This number must be an element of the phrase which is the direct object of the verb, as in "Mary has three guppies". This type of transitive verb can also have a comparative structure as direct object, e.g., "Mary has twice as many guppies as Tom has fish".

This completes the repertoire of declarative sentence formats. Any number of declarative sentences may be conjoined, with ", and" between each pair, to form a new (complex) declarative sentence. A declarative sentence (even a complex declarative) can be made a presupposition for a question by preceding it with "IF" and following it with a comma and the question.

Questions, that is, requests for information from STUDENT, will be understood if they match any of the following patterns (where $ will match any string, and $1 any one word):

(WHAT ARE $ AND $)

(WHAT IS $)

(FIND $ AND $)

(HOW MANY $ DO $ HAVE)

(HOW MANY $1 IS $)

(FIND $)

(HOW MANY $ DOES $ HAVE)
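The behavior of these patterns can be approximated with regular expressions. The sketch below is written in Python rather than METEOR, treats `$` as any non-empty string (an assumption), and returns the first pattern that matches; the trial order and the names `compile_format` and `match_question` are assumptions made for illustration.

```python
import re

QUESTION_FORMATS = [
    "WHAT ARE $ AND $",
    "WHAT IS $",
    "FIND $ AND $",
    "HOW MANY $ DO $ HAVE",
    "HOW MANY $1 IS $",
    "FIND $",
    "HOW MANY $ DOES $ HAVE",
]


def compile_format(fmt):
    """Turn a format such as 'HOW MANY $1 IS $' into a regular expression."""
    parts = []
    for token in fmt.split():
        if token == "$":
            parts.append(r"(.+)")    # any non-empty string
        elif token == "$1":
            parts.append(r"(\S+)")   # exactly one word
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))


def match_question(sentence):
    """Return (format, matched pieces) for the first matching format, else None."""
    for fmt in QUESTION_FORMATS:
        m = compile_format(fmt).fullmatch(sentence)
        if m:
            return fmt, m.groups()
    return None


print(match_question("FIND THE LENGTH AND THE WIDTH OF THE RECTANGLE"))
```

Listing (FIND $ AND $) ahead of (FIND $) matters in this sketch, since the shorter pattern would otherwise absorb the conjunction into a single string.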

This completes the summary of the set of input formats presently understood by STUDENT. This set can be enlarged in two distinct ways. One is to enlarge the set of basic formats, using standard subroutines to aid in defining, for each new basic format, its interpretation in the deductive model. The other method of extending the range of STUDENT input is to define transformations from new input formats to previously understood basic or extension formats.

Even if a story problem is stated within the subset of English acceptable to STUDENT, this is not a guarantee that this problem can be solved by STUDENT (assuming it to be solvable). Two phrases describing the same object must be at worst only "slightly different" by the criteria prescribed earlier. Appropriate global information must be available to STUDENT, and the algebra involved must not exceed the abilities of the solver. However, though most algebra story problems found in the standard texts cannot be solved by STUDENT exactly as written, the author has usually been able to find some paraphrase of almost all such problems which is solvable by STUDENT.

I. Limitations of the STUDENT Subset of English

The techniques presented here are general and can be used to enable a computer program to accept and understand a fairly extensive subset of English for a fixed semantic base. However, the current STUDENT system is experimental and has a number of limitations.

STUDENT's interpretation of the input is based on format matching. If each format is used to express the meaning understood by STUDENT, no misinterpretation will occur. However, these formats occur in English discourse, even in algebra story problems, in semantic contexts not consistent with STUDENT's interpretation of these formats. For example, a sentence containing ", and" is always interpreted by STUDENT as the conjunction of two declarative statements. Therefore, the sentence "Tom has 2 apples, 3 bananas, and 4 pears." would be incorrectly divided into the two "sentences", "Tom has 2 apples, 3 bananas." and "4 pears."
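The misdivision is easy to reproduce with a purely textual split, as in the following sketch. The function `split_on_comma_and` is hypothetical and stands in for the format match on ", and"; it is not the METEOR rule used in STUDENT.

```python
def split_on_comma_and(sentence):
    """Divide a sentence at every ', and', treating each piece as a sentence."""
    body = sentence.rstrip(".")
    return [part.strip() + "." for part in body.split(", and ")]


print(split_on_comma_and("Tom has 2 apples, 3 bananas, and 4 pears."))
```

Since the split looks only at the string ", and", it cannot tell a list of noun phrases from a conjunction of declarative sentences.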

Each of the operator words shown in Figure 2 must be used as an operator in the context as shown or a misinterpretation will result. For example, the phrase "the number of times I went to the movies", which should be interpreted as a variable string, will be interpreted incorrectly as the product of the two variables "number of" and "I went to the movies", because "times" is always considered to be an operator. Similarly, in the current implementation of STUDENT, "of" is considered to be an operator if it is preceded by any number. Thus, the phrase "2 of the boys who passed" will be misinterpreted as the product of "2" and "the boys who passed".

These examples obviously do not constitute a complete list of misinterpretations and errors STUDENT will make, but they should give the reader an idea of limitations on the STUDENT subset of English. In principle, all of these restrictions could be removed. However, removing some of them would require only minor changes to the program, while others would require techniques not used in the current system.

For example, to correct the error in interpreting "2 of the boys who passed", one can simply check to see if the number before the "of" is less than 1, and if so, only then interpret "of" as an operator "times". However, a much more sophisticated grammar and parsing program would be necessary to distinguish different occurrences of ", and" and correctly extract simpler sentences from complex coordinate and subordinate sentences.
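The proposed check on "of" amounts to a single comparison. A minimal sketch, assuming the number before "of" is available as text in decimal or fractional form (the function name `of_is_times` is hypothetical):

```python
from fractions import Fraction


def of_is_times(number_text):
    """Interpret "of" as the operator "times" only if the number is below 1."""
    return Fraction(number_text) < 1


print(of_is_times("0.5"), of_is_times("2"))
```

With this test, "0.5 of the boys" is still read as a product, while "2 of the boys who passed" is left as part of a variable string.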

Because of limitations of the sort described above, and the fact that the STUDENT system currently occupies almost all of the computer memory, STUDENT serves principally as a demonstration of the power of the techniques utilized in its construction. However, I believe that on a larger computer one could use these techniques to construct a system of practical value which would communicate well with people in English over the limited range of material understood by the program.

IV. CONCLUSION

A. Results

The purpose of the research reported here was to develop techniques which facilitate natural language communication with a computer. A semantic theory of coherent discourse was proposed as a basis for the design and understanding of such man-machine systems. This theory was only outlined, and much additional work remains to be done. However, in its present rough form, the theory served as a guide for construction of the STUDENT system, which can communicate in a limited subset of English.

The language analysis in STUDENT is an implementation of the analytic portion of this theory. The STUDENT system has a very narrow semantic base. From the theory it is clear that by utilizing this knowledge of the limited range of meaning of the input discourse, the parsing problem becomes greatly simplified, since the number of linguistic forms that must be recognized is very small. If a parsing system were based on any small semantic base, this same simplification would occur. This suggests that in a general language processor, some time might be spent putting the input into a semantic context before going ahead with the syntactic analysis.

The semantic base of the STUDENT language analysis is delimited by the characteristics of the problem solving system embedded in it. STUDENT is a question-answering system which answers questions posed in the context of "algebra story problems." We shall use four general criteria for evaluating this question-answering system.

1) Extent of Understanding. Other question-answering systems analyze input sentence by sentence. Although a representation of the meaning of all input sentences may be placed in some common store, no syntactic connection is ever made between sentences.

In the STUDENT system, an acceptable input is a sequence of sentences, such that these sentences cannot be understood by just finding the meanings of the individual sentences, ignoring their local context. Inter-sentence dependencies must be determined, and inter-sentence syntactic relationships must be used in this case for solution of the problem given. This extension of the syntactic dimension of understanding is important because such inter-sentence dependencies (e.g., the use of pronouns) are very commonly used in natural language communication.

The semantic model in the STUDENT system is based on one relationship (equality) and five basic arithmetic functions. Composition of these functions yields other functions which are also expressed as individual linguistic forms in the input language. The input language is richer in expressing functions than Lindsay's or Raphael's system. Some logic-based question-answering systems may have more relationships (predicates) allowable in the input, but do not allow any composition of these predicates. The logical combinations of predicates used are only those expressed in the input as logical combinations (using and, or, etc.).

The deductive system in STUDENT, as in Lindsay's and Raphael's programs, is designed for the type of questions to be asked. It can only deduce answers of a certain type from the input information, that is, arithmetic values satisfying a set of equations. In performing its deductions it is reasonably sophisticated in avoiding irrelevant information, as are the other two mentioned. It lacks the general power of a logical system, but is much more efficient in obtaining its particular class of deductions than would be a general deductive system utilizing the axioms of arithmetic.

2) Facility for Extending Abilities. Extending the syntactic abilities of most other question-answering systems would require reprogramming. In the STUDENT system new definitional transformations can be introduced at run time without any reprogramming. The information concerning these transformations can be input in English, or in a combination of English and METEOR, if that is more appropriate. New syntactic transformations must be added by extending the program.

The semantic base of the STUDENT system can be extended only by adding new program code, as is true of other question-answering systems. However, STUDENT is organized to facilitate such extensions, by minimizing the interactions of different parts of the program. The necessary information need only be added to the program equivalent of the table of operators in Figure 2.

Similarly, the deductive portion of STUDENT, which solves the derived set of equations, is an independent package. Therefore, a new extended solver can be added to the system by just replacing the package, and maintaining the input-output characteristics of this subroutine.

3) Knowledge of Internal Structure Needed by User. Very little if any internal knowledge of the workings of the STUDENT system need be known by the user. He must have a firm grasp of the type of problem that STUDENT can solve, and a knowledge of the input grammar. For example, he must be aware that the same phrase must always be used to represent the same variable in a problem, within the limits of similarity defined earlier. He must realize that even within these limits STUDENT will not recognize more than one variation on a phrase. But if the user does forget any of these facts, he can still use the system, for the interaction discussed in the next section allows him to make amends for almost any mistake.

4) Interaction with the User. The STUDENT system is embedded in a time-sharing environment and this greatly facilitates interaction with the user. STUDENT differentiates between its failure to solve a problem because of its mathematical limitations and failure from lack of sufficient information. In case of failure it asks the user for additional information, and suggests the nature of the needed information (relationships among variables of the problem). It can go back to the user repeatedly for information until it has enough to solve the problem, or until the user gives up.

STUDENT also reports when it does not recognize the format of an input sentence. Using this information as a guide, the user is in a teaching-machine type situation, and can quickly learn to speak STUDENT's brand of input English. By monitoring the assumptions that STUDENT makes about the input, and the global information it uses, the user can stop the system and reword a problem to avoid an unwanted ambiguity, or add new general information to the global information store.

B. Extensions

The present STUDENT system has reached the maximum size allowable in the LISP system on a thirty-two thousand word IBM 7094. Therefore, very little can be added directly to the present system. All the programming extensions mentioned here are predicated on the existence of a computer with a much larger memory.

Without inventing any new techniques, I think that the STUDENT system could be made to understand most of the algebra story problems that appear in first year high school text books. If new operators or new combinations of arithmetic operations occur, they can easily be added to the subroutine which maps the kernel English sentences into equations. The number of formats recognizable in the system can be increased without reprogramming through the machinery available for storing global information. The problems it would not handle are those having excessive verbiage or implied information about the world not expressible in a single sentence.

As mentioned earlier, the system can now make use of any given schema only once in solving a problem. This is because the schema equation is added to the set of equations to be solved, and the variables in the schema are identified with only one set of variables appearing in the problem. For example, if "distance equals speed times time" were the schema, then "distance", as a variable in the schema, might be set equal to "distance traveled by train" or "distance traveled by plane", but not both in the same problem. This problem could be resolved by not adding the schema equation directly to the set of equations to be solved, but by looking for consistent sets of variables to identify with the schema variables. Then STUDENT could add an instance of the schema equations, with the appropriate substitutions, for each consistent set of variables found which are "similar" to the schema variables.
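One way such an extension might look is sketched below. The text does not say how a "consistent" set of variables would be recognized, so this sketch makes a strong simplifying assumption: problem variables belong together when they begin with a schema key phrase and share the same trailing qualifier. The names `SCHEMA` and `instantiate_schema` and the output format are hypothetical.

```python
from collections import defaultdict

SCHEMA = ("distance", "speed", "time")  # distance = speed * time


def instantiate_schema(problem_variables):
    """Emit one instance of the schema equation per consistent variable set.

    Assumption: variables are grouped by the qualifier that follows the
    schema key phrase, e.g. "of train" in "distance of train".
    """
    groups = defaultdict(dict)
    for var in problem_variables:
        for key in SCHEMA:
            if var.startswith(key + " "):
                groups[var[len(key) + 1:]][key] = var
    equations = []
    for bound in groups.values():
        if len(bound) == len(SCHEMA):  # every schema variable has a counterpart
            equations.append(
                ("EQUAL", bound["distance"], ("TIMES", bound["speed"], bound["time"]))
            )
    return equations


variables = [
    "distance of train", "speed of train", "time of train",
    "distance of plane", "speed of plane", "time of plane",
]
for equation in instantiate_schema(variables):
    print(equation)
```

Under this assumption the schema yields one equation for the train and one for the plane, which is the case the present system cannot handle.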

At the moment the solving subroutine of STUDENT can only perform linear operations on literal equations, and substitutions of numbers in polynomials and exponentials. It would be relatively easy to add the facility for solving quadratic or even higher order solvable equations. One could even add, quite easily, sufficient mechanisms to allow the solver to perform the differentiation needed to do related rate problems in the differential calculus.

The semantic base of the STUDENT system could be expanded. In order to add the relations recognized by the SIR system of Raphael, for example, one would have to add on the lowest level of the STUDENT program the set of kernel sentences understood in SIR, their mapping to the SIR model, and the question-answering routine to retrieve facts. Then the apparatus of the STUDENT system would process much more complicated input statements for the SIR model. One serious problem which arises when the semantic base is extended is based on the fact that one kernel may have an interpretation in terms of two different semantic bases. For example, "Tom has 3 fish." can be interpreted in both SIR and the present STUDENT system. To resolve this semantic ambiguity, the program can check the context of the ambiguous statement to see if there has been one consistent model into which all the other statements have been processed. If the latter condition does not determine a single preferred interpretation for the statement, then both interpretations can be stored.

One use for our model for generation and analysis of discourse would be as a hypothesis about the linguistic behavior of people. STUDENT may be a good predictive model for the behaviour of people when confronted with an algebra problem to solve. This can be tested, and such a study may lead to a better understanding of human behaviour, and/or a better reformulation of this theory of language processing.

I think we are far from writing a program which can understand all, or even a very large segment of English. However, within its narrow field of competence, STUDENT has demonstrated that "understanding" machines (12) can be built. Indeed, I believe that using the techniques developed in this research, one could construct a system of practical value which would communicate well with people in English over the range of material understood by the program.

REFERENCES

1. BOBROW, D. G., "METEOR: A LISP Interpreter for String Transformations"; in (13)

2. MCCARTHY, J., et al., LISP 1.5 Programmers Reference Manual; MIT Press, Cambridge, Mass.; 1963

3. CORBATO, F. J., et al., The Compatible Time-Sharing System; MIT Press, Cambridge, Mass.; 1963

4. GREEN, B. F., et al., "Baseball: An Automatic Question Answerer"; Proc. WJCC; May 1961

5. LINDSAY, R. K., "Inferential Memory as the Basis of Machines Which Understand Natural Language," in (14)

6. RAPHAEL, B., "A Computer Program Which 'Understands'"; Proc. FJCC; Spartan Press, Baltimore, Maryland; 1964

7. SIMMONS, R. F., "Answering English Questions by Computer - A Survey"; SDC Report SP-1556, Santa Monica, California; April 1964

8. BOBROW, D. G., "Natural Language Input for a Computer Problem Solving System", Ph.D. Thesis, Mathematics Department, M.I.T., Cambridge, Mass.; 1964

9. QUINE, W. V., Word and Object; MIT Press, Cambridge, Mass.; 1960

10. CHOMSKY, A. N., Syntactic Structures; Mouton and Co.; 's-Gravenhage; 1957

11. YNGVE, V., "A Model and an Hypothesis for Language Structure," Proceedings of the American Philosophical Society, vol. 104, No. 5; 1960

12. MINSKY, M., "Steps toward Artificial In-

13. BERKELEY, E. C. and BOBROW, D. G. (editors) The Programming Language LISP: Its Operation and Applications, Information International Inc., Cambridge, Mass.; 1964

14. FEIGENBAUM, E. and FELDMAN, J. (editors) Computers and Thought, McGraw Hill, New York; 1963

THE UNIT PREFERENCE STRATEGY IN THEOREM PROVING*

Lawrence Wos, Daniel Carson, and George Robinson

Argonne National Laboratory, Argonne, Illinois

Unit Preference and Set of Support Strategies

The theorems, axioms, etc., to which the algorithm and strategies described in this paper are applied are stated in a normal form defined as follows: A literal is formed by prefixing a predicate letter to an appropriate number of arguments (constants, variables, or expressions formed with the aid of function symbols) and then perhaps writing a negation sign (-) before the predicate letter. For example:

P(b,x) -P(b,x) Q(y) R(a,b,x,z,c) S

are all literals if P, Q, R, and S are two-, one-, five-, and zero-place predicate letters, respectively. The predicate letter is usually thought of as standing for some n-place relation. Then the literal P(a,b), for example, is thought of as saying that the ordered pair (a,b) has the property P. The literal -P(a,b) is thought of as saying that (a,b) does not have the property P.

One may build a clause by writing a sequence of literals separated by disjunction (logical "or") signs. Logical "or" will be symbolized by a small letter "v" (distinguishable from possible uses of "v" as a variable from context). Where it is desired to indicate dependence of a particular argument of a predicate letter upon one or more other variables, function symbols are employed. The clause

P(x,y,z) v Q(x,f(x,y))

thus says that either the ordered triple (x,y,z) has the property P or the ordered pair (x,f(x,y)) has the property Q (or both). Each of the variables in a clause is then thought of as being universally quantified (that is, if the variable x occurs in a clause, the clause is assumed to be preceded by an implicit quantifier, "for each x"). Functional expressions such as f(x,y) are then treated as existentially quantified variables. Roughly speaking, f(x,y) in the example above stands for an element (depending on x and y) which forms, when used as the second element with x as the first element, an ordered pair which has the property Q. A unit clause is a clause composed of a single literal.

Finally, one may consider a sequence of clauses implicitly joined by logical "and" (conjunction). Such a sequence of clauses will be said to be in (conjunctive) normal form.
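
The definitions above (literal, clause, unit clause, normal form) can be mirrored in a small data structure. The following Python sketch is one possible encoding, not the representation used by the authors' program; all class and function names (`Var`, `Const`, `Func`, `Literal`, `is_unit`, and so on) are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str            # implicitly universally quantified


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Func:
    symbol: str          # function symbol, e.g. "f"
    args: Tuple["Term", ...]


Term = Union[Var, Const, Func]


@dataclass(frozen=True)
class Literal:
    predicate: str                   # predicate letter
    args: Tuple[Term, ...] = ()      # empty for a zero-place predicate
    negated: bool = False            # True if preceded by "-"


Clause = Tuple[Literal, ...]         # literals joined by "v"
NormalForm = Tuple[Clause, ...]      # clauses joined by implicit "and"


def show_term(t: Term) -> str:
    if isinstance(t, Func):
        return f"{t.symbol}({','.join(show_term(a) for a in t.args)})"
    return t.name


def show_literal(lit: Literal) -> str:
    args = f"({','.join(show_term(a) for a in lit.args)})" if lit.args else ""
    return ("-" if lit.negated else "") + lit.predicate + args


def show_clause(clause: Clause) -> str:
    return " v ".join(show_literal(lit) for lit in clause)


def is_unit(clause: Clause) -> bool:
    """A unit clause is composed of a single literal."""
    return len(clause) == 1


x, y, z, b = Var("x"), Var("y"), Var("z"), Const("b")
clause = (Literal("P", (x, y, z)), Literal("Q", (x, Func("f", (x, y)))))
print(show_clause(clause))                                 # P(x,y,z) v Q(x,f(x,y))
print(show_clause((Literal("P", (b, x), negated=True),)))  # -P(b,x)
print(is_unit(clause), is_unit((Literal("S"),)))           # False True
```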

Instantiation, as applied to this normal form, can be thought of as the forming of a possibly less general (more specific) instance of a clause by performing a systematic replacement of variables by constants, new variables, or expressions formed with the aid of function symbols. Substituting b for x and f(d,u) for y in the clause

P(x,y) v Q(b,y)

would yield a less general instance of that clause:

P(b,f(d,u)) v Q(b,f(d,u)).
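
The substitution just carried out by hand can be reproduced mechanically. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: terms are encoded as strings (variables or constants) or as tuples headed by a function symbol, literals as `(negated, predicate, args)` triples, and the names `substitute` and `instantiate` are hypothetical. The replacement is simultaneous, so terms introduced by the substitution are not themselves substituted again.

```python
from typing import Dict, List, Tuple, Union

# A term is a name (variable or constant) or a tuple ("f", arg1, arg2, ...).
Term = Union[str, tuple]
# A literal is (negated, predicate letter, argument tuple).
Literal = Tuple[bool, str, Tuple[Term, ...]]


def substitute(term: Term, sub: Dict[str, Term]) -> Term:
    """Replace every variable named in `sub` throughout `term`."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, sub) for a in term[1:])
    return sub.get(term, term)


def instantiate(clause: List[Literal], sub: Dict[str, Term]) -> List[Literal]:
    """Apply the same replacement systematically to every literal."""
    return [(neg, pred, tuple(substitute(a, sub) for a in args))
            for neg, pred, args in clause]


def show_term(t: Term) -> str:
    if isinstance(t, tuple):
        return f"{t[0]}({','.join(show_term(a) for a in t[1:])})"
    return t


def show_clause(clause: List[Literal]) -> str:
    parts = []
    for neg, pred, args in clause:
        body = f"({','.join(show_term(a) for a in args)})" if args else ""
        parts.append(("-" if neg else "") + pred + body)
    return " v ".join(parts)


# P(x,y) v Q(b,y), with b for x and f(d,u) for y
clause = [(False, "P", ("x", "y")), (False, "Q", ("b", "y"))]
instance = instantiate(clause, {"x": "b", "y": ("f", "d", "u")})
print(show_clause(instance))  # P(b,f(d,u)) v Q(b,f(d,u))
```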

* Work performed under the auspices of the U.S. Atomic Energy Commission.
