in logic programming s stems eeiiieiiiiiiie state univ ... · in logic programming s stems 1/ (u)...
TRANSCRIPT
SAD-A 36 522 FILE SEARCHING PROBLEMS IN LOGIC PROGRAMMING S STEMS 1/
(U) FLORIDA STATE UNIV TALLAHASSEE DEPT OF MATHEMATICSAND COMPUTER SCIENCE C M EASTMAN FEB 83
UNCLASS1FIED AFOSR-TR-83-1252 AFOSR-81-0110 F/G 9/2 NLEEIIIEIIIIIIIEEIEEEEEEEIIEEEmA-16 2 E SEHN RE MSI OGCPOGAMNGSE MS /
IW.L 13 2
96- BA.
L.
nn1.25 11.4 1 6
MICROCOPY RESOLUTION TEST CHAR?NATIONAL SRAO 01 ANDAROS l~t, A
SECURITY CLASSIFICATION OF THIS PAGE
REPORT DOCUMENTATION PAGEI& REPORT SECURITY CLASSIFICATION IIb. RESTRICTIVE MARKINGS
UNCLASSIFIED_______________________
2& SECURITY CLASSIFICATION AUTHORITY 3. DISTRISBUT tON/AVAI LABILITY OF REPORT
IApproved for public release; distributionW. DECLASSIP ICATIOI4IDOWNGRADING SCH4EDULE unlimited.
4. PERFORMING ORGANIZATION REPORT NUMBER(S) S. MONITORING ORGANIZATION REPORT NUMBER(S)
NAMEOPERFRMINGORGANZATIO A]FOS'"eh 8 ~3 -1252.. NME F PRFOMIN ORANIATIN Ib. OFFICE SYMBOL 7a. NAME OF MONITORING ORGANIZATION
Florida State University Air Force Office of Scientific Research
6c. ADDRESS (City. Stat# and ZIP Code) 7b. ADDRESS (City. Stabe dad ZIP Code)
Department of Mathematics and Computer Directorate of Mathematical and InformationScience, Tallahassee FL 32306 Sciences, Bolling AFB DC 20332
Ba. NAME OF FUNDINGISPO#4SORING Bb. OFFICE SYMBOL 9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBERORGANIZATION (if appicable)
-AOSR INM AFOSR-8-0110Bc. ADDRESS (City. State anid ZIP Code) 10. SOURCE OF FUNDING NOS.
Balling AFB DC 20332 PROGRAM PROJECT TASK WORK UNITELE ME NT NO. NO. NO. NO.
It. TITLE (include Secumity Claamifteat4on)
SEE REMARKS ,,A) KeveuO e 61102F 2304 A212. PERSONAL AUTHOR(S)
Caroline M. Eastman13&. TYPE OF REPORT 13b. TIME COVERED 14. DATE OF REPORT (Yr., Mo.. Day) 15. PAGE COUNT
1S. ABSTRACT (Continue on revrse if necebsary and iden tf by block rnumberA
During this period, the investigator intended to investigate alternative approaches forimproving searching performance in logic programming systems to a level that would beacceptable in a production system by conducting experiments in the LOGLISP system. Dueto incompatibilities between the DEC 10 source computer and the CDC CYBER 760 runningunder NOS, which was available at Florida State University, as well as the differencesbetween the UCI LISP on the DEC 10 and the ALISP available on the CYBER, it wasimpossible to bring the L0GLISP system to fully operational status and perform theexperiments.
40 DISTRIGUTION/AVAILADILITY OF ABSTRACT 21. ABSTRACT SECURITY CL.AzSIFti
UNCLASSIPISO/UNLIMITEO [I SAME AS RPT. QOTIC USERS 0 UNCLASSIFIED2U. NAME OF RESPONSIBLE INDIVIDUAL 22k TELEPHONE NUMBER 2c.OFIESML
(in lude Arme Cede) 1 2.OFIESMODr. Robert N. BUChal (202) 767-4939N1
00FORM 1473,83 APR EDITION OF I JAN 12I OBSOLETE. 1K QTETEflSCURITY CLASSIFICATION OF THIS PAGE
UNCLASIFIEDSICURITY CLAMSIPICATION Of THIS PAGE
ITEM #11, TIT.E- FILE SEACHING PROBLEMs IN LOGIC PROGRMING SYSTEMS
koossion .ForRTIS GRA&IDTIC TABUnannounced 0Justification
ByDistribution/___
Availability Codes 01.90
Avail and/orDist Special
UNCLASSIFIED
SECURITY CLAIISIPICATION OP THIS PAGE
A&=, !N.. .
AFOSR.Ths 8 3 12 52
PILE SPA~nc PfcKM IwGIC PFDWM SYS'IU
CaroiLum ftI. Eastm~an
L-ebruary 1983
A final report an work perroruald Urdsr grant A.V&0-81-011O.
APProVed for publita el e-otsgdistributleftnimtd
84 01 04 103,
TABLE OF QOWOM
Logic Progruzmdng 1
Loglimp .ortaDilty
Fiie Organizations ror Logic Programing 10
A Logic -esign 2:ur a &4.cumnt Retreva.L Database 14
References 1
Appendix A: Index to tne .- LISP Reference Mana. 20
Appendix B: Caq~arimon or ALISP and UICI LISP 26
CbSf. .ah4~1 iafmttDiviso
-4. ,."
LOGIC PROGRAMMING
The area ot knowledge representation has received activeresearch interest recently as more powerrul knowledge-basedsystems have been developed. Such systems show potentiaL inseveral application areas, including database management systems,decision support systems, and automatic programming systems. Avariety or tecnques tor knowledge representation nave beenexplored; one or tne more promising is tne use or resolutionlogic.
Among the many problems which must be resolved betore sucbapproaches can be used in product.on systems is that or searcnerriciency. Current systems eitner nandie a variety or problemstructures at the cost ot relatively unconstrained search orconstrain search at the cost or rigidly derined proolemstructures. An additional search problem resits trcm tne needto expand the current small systems, which are primarily in-coresystems, to much larger size.
Prolog is a programming language based upon the use orresolution logic which provides a high level nonprocedura±rec Mr am tor writing programs and representing data. 7he Prolog
language is described in Warren e= &L (1977), and the underlyinglogical theory is described in Kowalski (1974) and Robinson(1965). Kowalski (1979) provides an extensive introduction tologic programming witn emprisis on Prolog.
7he basic construct useo in Prolog is the clause, whicconsists ot a series or terms. An example of such a clause is
grandparent(x,y) :- parent(x,z), parent (z,y)
The first term is tne nead; tne rest maKe up tne body. Ifal or the terms in the body are true, then the nead is true. Aprocedure is a set ot clauses. A clause witn an epty body isreterred to as a unit clause. Suppose tnat tue rollowing unitclauses are adcled to the previous clause:
parent (Jim, Jane)
parent (Jane, Jennirer)
parent (Jane, John)
parent (Joe, John)
parent (Jim, Janice)
1
parent (Josie, Jane)
gmen posing the goal grandarent (u, Jenn ter) wi.a tinda±± or jennizer's grandarents. Posing the goal graprent(Jim, v) will find Jim's grand=.udren. Posing the goalgrandparent (u, v) wix tin all or the pairs or grandparents andgra dren knon to the system.
Oansider the tirst case given, that or rin ing Jenniter'sgrandparents. The system attempts to snow tnat te head is trueby sawing tat botn or tne clauses parent tu, y) and parent (y,Jennifer) are true. (Here x is replaoed by u, and z is replacedby Jennirter.) It can do th=s by subst tung y = Jane ami u =Jim or by substituting y - Jane ana u = Josie. Te process ortrXin ng appropriate Sbsttur1Ons roL tne variables is rererredto as unification. 2* process ot tinding an appropriatewLurication means traversing a search space witn mtiny possiblecnoices ot clauses and variable assigmnents am includes toepossibility or backtracking it a particular path does not workout. It is also possible tnat no appropriate unirxicat.on wil beround.
7he exauple given here is quite siqnle. More elaborateexanpLes can be round in the rererences previously given. Prologcan De used for a variety ot applications ranging fromintelligent databases to autanatic programming systems.
Lisp is a list oriented Language based upon the -ambdacalculus. TWo ot the zmn expository descriptions are given inGreenberg (1978) and Siklosaky (1976). A brier swuary ot Lispdevelopment is given in MN. tny (1978). Its predauinant use isin artiricia intelligence work.
In Lisp, both programs and data are represented as lists andare not explicitly distinguisred. For exanple, a (very simple)Lisp program to evalate the square root or 3.3 + (4.1 x 5.2)could be written as
(SW (FPLUS 3.3 (FTD 4.1 5.2)))
'This is a two item list; the second item is itselr a list.tm tus rimotion is evaluated, the muiltiplication in the uersubList is evaluated first. Then the 3.3 is added. Finally thesquare root is taken.
Lisp contains a variety or tunctions and special constructs,Including tnose ot taking aprt am putting togetner lists,testing conditiM, and nnapulating muters and strings.
2
I
A-..
Prolog and Lisp are not equal.ly easy to use over a taillrange ot applications. For exaile, the gradprent exampLe usein thie discussion ot Prolog would be imich harder to write in Lispsinc, a tunction which explicitly takes apart lists representingparent intromticn would need to be written. a the otner hand,the sizrp~e Lisp calculation given would be uuch harder to writein Prolog sinc rem!L aritnmUstc and square roots are tuon narderto naixale in a logic context.
Loglisp, is a system wnich combines tue advantages or botnProlog and Lisp. It has been developed under Air Forcesponsorap (PAX) by Roabinson anxa Sibert (1981) at SyracuseUniversity as an extension to UCI Lisp for the DE:-.u. * in thislanguage, Lisp is extended to aL.l.ow tne use ot logicprogramuing. 7hle syntax and techniques are noat quire tWe ame asin tue Prolog system, but tue some basic capebi.it.Les areprovided. 11he Lisp features and tue Logic teatures may bezintermueu, or only one set of teatures ay be used. So bornPattern matching (as in Prolog) and function eva.Luation (as inidsp) mey be easi.Ly done in tue context ot an integrated
language.
32
LWGLISP PORMBILITY
IjLolisp was originally inplemented at Syracuse University; adescription of this system is given in mobinson and Sibert(1981). It was implemented as an extension to UCI Lisp, a'dialect ot Lisp 1.6, on the DBC-10 (Quam and vittie, undated;Borow, Burton, and Lewis, undated). It has also been convertedto run under INTEMLISP. (Schrag, 1982).
ALISP is a Lisp dialect based on Lisp 1.5 and described in3b& ALM user I anua l (Univ. Mass., undated). It was developedat tne University of Massachusetts at Amherst on %.ontroi Datamaintrames. The rirst version ran on a CDC 3600/3800 under tneUMASS timesnaring system. The current version runs on LX Cybersunder the NOS operating system. In addition to basic Lispfeatures, it provides a compiler, a limited programmingenvirornent (editor, file system, and pretty-printer), andapplications packages (relational database system and grapnicsroutines).
Since the manual proved very dir£icult to use witnout anindex, one was constructea. It is given in Appendix A. (Themanual reters to its index, but it was not present and coula notbe located.) Since the reasability or conversion trom UCI Lispto ALISP was being examined, a comparison list of tunctions wasconstructed. It is given in Appendix B. Ibis list snowstunctions present in each ot the languages and includes coulentson runcLon similarities and ditterences. It can be seen tromtnis list that twere are substantial dirrerences between tne twolanguages.
I. ~imp DUaLO=t
There are mny oaalects of Lisp in existence. There navebeen some errorts at standardization witnin tne Lisp commumity,but these nave not met witn overwneiming success (e.g. Marti e=a., 1979; Steele At AL, 1982). While the basic core or thelangge is the same trm dialect to dialect, the additionalfeatures provided as part ot the system and the manner in wnichtne system is implemented can ditfer widely. Such diversity isnot surprising and pernaps nealtny in an experumental language,but it nas nampered tne use of Lisp in more production-orientedenviron ints.
Smt (1981) describes a conversion system to translateprograms written in LISP 1.6 to 111ISLISP which was motivated Dytne need to convert a program used in a couiler testing system.2e conversion system was designed to run under eitner version oL
4
L7 ,i ..
LISP. It depends primariLy on pattern substitution techniquesand tunction redetinitions to convert LISP 1.6 features toInTERISP, including most tunctions, I/0 functions, the escapecharacter, strings, names, an numbers. Hw-er, LEXPIs, macros,arrays, and a few tunctions were excluded from tne conversionsystem.
Samet classiried saoe problems as irreconcxLable; tneseincluded mainly problems witn ditrerences in data typedetinitions. A working detinition of an irreconcilable problemin this context is one which could not be handled by astraighttorward trans ormation or wnicn could not be bandledwitnout run-time support. The conversions tnat were perrormeuwere divided into those based on tne external torm ot the programand those based on tne semantics ot basic constructs.
so~±Vftr CDDMtr5jgA SLUdi&&
Conversion or sortware developed in one environment toanother environment is an important activity (U.S. GAO, 1977,1980). Despite the extensive resources devoted to conversionerrorts, little torma! attention nas been paso to it. Most orour k ledge about sortware conversion is institu-onaL innature, based upon extensive experience.
Gilb (1977) defines sot tware portabi.ty as "the ease orconversion trcm one environment to anotner; the relativeconversion cost tor a given conversion metnoo or algoritLu." Hemeasures portability as 1-(/ER), where ET is the cost totranster tne system to a ditferent environment, ano ER is theoriginal cost ot developing the sottware.
This measure or portability depends not only on the system,but also on the two environments. The omatablity between thetwo environments can be measured as the average portability orsystems converted trom one environment to tne other. Gilb'smeasures provide a measure of portability ana compatabilityrelative to a population or tasks, but they do not directlyaddress the question of predicting the ertort involved witnoutguidance tram previous experience witu tnose systems.
Boehm (1981) tackles this more ditricult question orestimating the resources required for conversion. His estimateor conversion costs is based on calculating a value tor EDSI,equivalent delivered source instructions. This is estimated as
MS - (ADSI) x (AAF/100).
Here ADBI is the actual deliverea source instructions in the codeto be converted, and AAF is an adaptation aadustment tactorcalculated as
PAP"- (0.40 x EK) + (0.30 x CK) + (0.30 x IR).
5
777~
Here rm is an estuate o the percent of the design modified, Ciis an estimate or the percent of lines of code wich must bem dified, and n4 is an estimate or tne percent or the orrgina±integration and testing wnich must be perrormed on the convertedsottware.
The total conversion effort in man-months can tnen beestimated as
M9= 2.4 x (EDSI)**1.05.
Boebm includes an extensive discussion or cost driver factorswncih can be usd to take into acccount such ractors as systemcouplexity, reliability requirements, programming language, andstart experience.
Obviously these estimates, especid,.±y ror D1 and IM, must bein large part subjective. However, this approach provides astructured framework for the problem of estimating sottwareconversion etfort wnose wortn is supported by extensiveexperience.
A -umber of incompaability problems betweeen UCI Lisp andALISP were encounterea. These were categorized as environmentinxwqxtabilities, feature incuwptabi ities, syntactic
cLitabites ana tundn ntal incoaiarabilities.
The envirormental inconatabiities incluaed
The character sets usea in te two systems are differentbotn in size and in encoding. his created some delay in evenreading a Loglisp tape. Furtnermore, some characters used forspecial purposes in Loglisp are rererrea to Dy tneir encoaeavaLue (GEVAL)
Both systems included an editor as part or the environment.These allow both structure editing and pattern matching. However,the cawudm used are difterent. The editor is used by Loglisp inorder to handle editing ot knowledge bases. Ebrmatters areprovideu in both systems, but difrerent runction names are used.
Both dialects of Lisp provide functions to alow access tothe System tile system in order to Mailitate rue nandling witnthe Lisp system. The capabiuities provided are simuar, but tnewunerLying file system are not.
6
Feature incomuptabilites included
here are many unctions present in UCI Lisp wnich nave nocorresponding runctions in ALSIP. For tne most part tnese can benanoled in a straightrorwara manner simruy by writing newtunction detinitions. Appendix B contains a List ot tnetunctions present in botn Lisp dialects.
UCI Lisp provides a macro capability; ALISP does not.
Features i.pleaented using macros tnus need to be rewritten.
Syntactic inopatabilites included
In many cases aurterent names were used in tne two clia.iectstor the same tunction. Examples include PBSVAL (ALISP) vs. ABS(UCI Lisp) and uIFF (ALISP) vs. DIFFERENCE (UCI Lisp). 7heseproblems are also quite straighttorward to handle by renaming.
Inonn tet umjm zu=n
I few cases the syntax used tor runctions was notconsistent. For example, the paraneter order ror MAPC isditterent. Although nandling tnese situations isstraigntrorward, they are more subtle since they appear correct.
Fundamental incompatabilites included
I ALISP function derintions are stored in tne vaLue ce.u.or the appropriate ±iierai atoms. In UCI Lisp tunctiondetinitions are stored on the property list. Thus tunctiondetinitions are easier to change on the ry in UCI Lisp. Sincethis is done in Loglisp in order to switch between Lisp anaLogic, substantia.L conversion problems are presented.
It should be noted trat, witn very tew exceptzons, theseproblems are inherent in the dialect ditterences and are not dueto tne design and implementation or oglisp. It would beextrumently ditticult to implement a system witn omplexfWWt3n nLLat-y wLtnour making tull use Ot the teatures available.
7
lan.- Ri~~9tSmilaritV
The simiarity between ALISP and UCI Lisp can be examinea oycomparing the tunctions avaiiable in the two languages. A simplemeasure of such simlarity between two programming languages isgiven by
SIM= / (Ni + N2 - OV)
where N1 is the nnber of function names used in one language, N2is the number ot tunction names used in tne other language, andOV (overlap) is the number of function names used in botnlanguages. This measure ranges between 0 (least simi.ar) ana 1(most similar).
There are a number of factors which are not taken intoaccount Joy tnis measure. Function names Ln Lisp and otherprogramming languages are not used witn equal frequency, and tnemost trequent ones snould pernaps be given nigner weignts. Also,no distinction is made between a tunction name used for arunction present in one language but not tne other and a runctionname used for a function which is present in botn languages butcalled by different names.
There are 303 function names tor UCI Lisp ana 227 tor ALISP;these are given in Appendix B. There are 80 overlaps, includingmost of the "core" Lisp functions. The similarity is 0.18. Itis interesting to note that the simiuarity computed Joy the samemeasure between C(OL and Ada is 0.32 (Eastman, 1982). Sc thekeyword similarity between two dialects of the same Language canactually be less tnan that between two distinct languages.
2., Fat Donei Rf fort
The formulas given in Boen were used to estimate theconversion effort required tor a full conversion of Loglisp tothe aLISP system. The Loglisp system contains approximately2,000 lines of code, as tormatted Dy tne pretty-printer. Thefactors iA and IM are estimateo at 30% to a..±Low for te change inthe system environment and tunction definition mechanism as wellas the more straightxorward changes. CM is estimated at 50%.With these figures, EDSI is estimateo at 720 ana conversionextort in man-months at 1.7. The tape conversion required about0.25 MM (Franson and Haslup, 1981), and the conversion ot aminimal core system required about 0.5 MM.
Of course, these tigures provide only a very rough estimateor conversion effort. They were developed based on experiencewin systems written in otter languages, and it is not at allclear how well they apply to Lisp. Since programmuing in Lisp ismore ciuplicated than programming in C(OL or FaRNM, it islikely that these tormulas wiL unoerestimate tne exrort
8
required. Also, the concept of "lire ot oode" is not as welLdetined for Lisp as it is tor many otner ianguages since programsare not divided into statements in tne same way as ror manylanguages. 7he approximation used nere was to simply use tnelines ot text provided by the prettyprinter; rnoever, the Linebreaks could have been done in many dirterent ways. It would behtgh.Ly desirable to have a working detintion ot sourceinstruction tor Lisp that could be used in sz= estimations andto nave data validating tneir use in a Lisp envirorment.
9
- - ---. *-- - .-.
FILE O GANIZATIONS FOR IGIC PROGRAING
The clauses in a Logic program can be divided into the twocategories ot rules and facts. A fact is detined here as aclause with no body and no variables. A query is a clausecontaining only a body. All other clauses are rules. So a tactwill be at the end ot an inference patn and wi.i not lead toturtner steps. A rule contains eitner varia.Les or interences(or ooth) and can lead to a turrner unixication step. Thisdivision follows that made by Klahr (1979) and corresponicluueiy to similar (but not necessarily identicai) distinctionsmade by others (e.g. unit and non-unit clauses, ground ana non-ground clauses, extensiona± aata base ano intensiona± data base,assertions and implications.)
Searching in logic based systems can thus be broken downinto two distinguishable but related searching problems: ruleselection an fact retrieval. Rule selection involves tne choice
t or the next clause or goal to use in the inference process. Factretrieval involves locating a particular clause. However, thedistinction is not absolute. Located facts are used in theinterence process, and selecting rules involves tiaing them aswell. Fact retrieval is generally discussed witnhin tne contextor tne retrieval paradigm used in the database area; overviewsare provided in Knuth (1977) ana Wiederhold (1977). Ruleselection talls witnin the heuristic search paradigm used inartiticial intelligence. In most current work, these twosearching problems are kept separate; this approach appears to bemore erxicient than intermixing them.
2L. Maba h~iaix
The rel.ive importance ot rule selection an tact retrieva.aepenos in large part on the application. Same applications,such as database systems, have relatively tew rules. Others,such as theorem proving systems, have relativeLy more rules.Since searching problems in unrestricted resolutions sytems arenot yet well understood, it is reasonable to consider only asubset o such problems. One way to narrow tne problem is toConsider application areas ot interest to see what theimplications ot their speciric characteristics are tor searchstrategies.
Database applications are considered here since use orartiricial intelligence techniques can provide databases witnmore tiexibility than is possible in current x Pmmrcia.L systems.A possible def nition of an intelligent data base is that it is adatabase from which implicit information wnich is not explicitlystored can be retrieved. One well-studied class of suchdatabases allows the use ot logical inference. Such interentialdatabases are discussed in Gallaire and Minker (1978), Klahr
10
(1979), and Minker (1978). (It is possible to have otner typesor inference, e.g. statistical.)
Some of the characteristics to be expected of an inferentialdatabase based on logic principles are
a. Low ratio ot rules to tacts
b. Low level o most proors
c. Possible need for best match and range searches
d. Rapid response tor interactive use
e. Possible need for extensive updating
t. Potentially very large size
g. Varying degrees of data accuracy and timeliness
h. Not aj.z relevant data included (open world assumption)
These characteristics are not necessarily tound in otherlogic application aras. For example, none are likely to be trueor theorem proving applications. These factors intluence theacceptable searching strategies.
SLiaitALM Qkhot
C -rent artiticial intelligence systems depend heavily onhashing. When it is feasible, hashing is extremely rast ana isotren the metnod o choice. This reliance upon hashing hasallowed researchers to concentrate on the problem structure andheuristic search without getting bogged down in the details ottact retrieval. And this approach has not created any problemsin the relatively malJ in-core systems preduinant in artiricialintelligence work. However, hashing has sane limitations whichmay cause problems in such systems; tnse involve update problems,best match and range searching, and locality o± reterenoe.
Hash indices degenerate unoer update to a greater extentthan many other index systems (e.g., B-trees). This requiresmore garbage collection and possible rehasn ig. whi.Le thepointer chasing involved in Lisp garbage collection does notpresent a problem in in-core systems, it may in systems tnatinvolve storage over several levels of Memory hierarchy.
A second problem is tnat, unless tne hean tunction used Isorder preserving, it is very hard to handle situations involvingbest match or range searches. The respond to such a request, thesystem is reduced to hashing on ai± possible values that mighttal within th desired range or to a sequential search. Indextechniques which preserve order intormation are much better ableto handle such questions.
11
management of a memory hierarchy is far more ertective whentnere is a nigh degree ot locality of reterence. This allows tneI/0 overhead to e spread over a greater number ot accesses.large syste wil presuably depend on tne use ot a memoryhierarcty and will need to use storage techniques which preservelocality ot reterence.
It would be desirable to exploit the advantages or nasningeven in a much larger system. However, mechanisms which preserveorder and locality wll be required.
M oia 9L LUS ena
Almost all of the work on selection of twe organizationshas rocused on situations in wnich the workload to be handled isbotn precisely known and stable. Under tnose conditions, thereare mnwy aigoriunms know which can be used to determine optima,riLe organizations anca/or parameters. However, these appoacnesare not necessarily as userul when the workload is neitnerprecisely known or stable, as is to be expected in an intel igentdatabase.
In such a context only partia± intormation wiL be availableabout the classes and frequencies or queries wnich wuil be posedto the system, the tields which wil be used in queries, and thefrequencies with which those rieLas will be used. The relativeumportance ot update and retrieval is also unlikely to be knownprecisely.
Consider a ground relation wnich has mive possible tieldswhich could be used for indexing. If the frequencies witn whichthe ditterent combinations or fields will be used in retrieva.and update are known, it is possible to determine a reasonablesolution to the problem ot determining wnicn or the tieLds shouldbe indexed (e.g. Anderson ano Berra, 1977).
However, if little is known about the workload that thesystem will handle, such algoritrm are not directly applicable.They must be sujppLmented by other approaches, wach mightinclude worklad sanpling, risk analysis, statistical analysis,possibility theory, and expert systems. The system can maintainstatistics on requests posed over time in order to estimate tneworkload at a given ti e. This information can be used to selectan optimsl or near optimal configuration based on tnisinformation. However, this may not be adequate it tne workloadis highly variable. The use of logic procedures to analyzeuseage data and determine appropriate storage organizations isconsistent with the goal of an integrated logic system.
It wll be difticult to extend the size ot currentIntellient systems by one or more orders of magnitude witnout tueuse or ditterent narawre architectures. The are two maindirections in this architectural work: the use ot multiple
12
prooessors and the use of content addressabie (or associative)storage. multiple processors, whicha could be in a distributedsystem# could be used to explore, in yaraL.LIi, a.terflate patnatnrou'jh te search upsoe. Assocatie storage can be usd fortau retrieva.L. (Scirag 1981) describes a possioJie .&..uJarchitecture tor hand]in logic databse applications. Even ir awmqxziter architecture vell adopted to trese types or anL icationsis used, it viii stui neew to be used etrectively. So storageorganizations viii contiue to be a pcoblem even troigh tneprobim parameters wiii change.
13
A LOGIC DESIGN FOR A DOCW REIRIEVAL IATSE
IL D.AD t~J le a tm
Te term intormation retrievai is widely used in both abroad sense and a narrow sense. In its narrower meaning, aninformat.ion retrieval system is a reterenoe retrieva. system. Itis this use which will be considered here. Such systems providelists of reterenoes to documents in response to user queriesoaseu on topics, authors, dates or publication, and similar typesor information. Information retrieva.L systems dirrer trmdatabase management systems tor tormattet data in thneir emphasison textual data.
The index terms used to retrieve tne documents may be chosentram a caretu lly controlled thesaurus of allowed terms. Or theimportant words from the title, the abstract, or the entire textmay De used in a less oontro.Led situation. Norma.lzation to astandard torm is generally allowed in order to eliminatevariation caued by plurals, verb endings, and orner sufrixes.In current commercial systems, boolean combinations or indexterms are generally used in queries. More elaborate matchingfunctions have been used in experimental systems, and researchindicates that they might be more etrective than conventiona.boolean matching.
Information retrieva.L systems are in cmon use today. Theyrange from extensive comercial services, such as Locheed' s
IAEG system, which provide access to dozens ot diterentdatabases (collections ot documents) to small systems intendedtor personal use to experLwentai systems. An introduction tointormation retrieval systems may be found in Salton and McGill(1983).
Evaluation of information retrieva± systems is based onpetormance effectiveness as well as eticiency. The response toa query tor documents on a particular topic will generallyincluKe item which are not reievant to the query and taiito include some items which are relevant. The extent ot suchfailures is measured by recall and precision. Recall is detinedas the traction of the relevant documents wnich are retrieved.Precision is defined as the traction of the retrieved documentswnicn actually are relevant.
Databeen performedd An Unt
Extensive work has been pertormed on tne application otlogic to databases. 7he volume edited by (iallaire and vinker(1978) contains mny pcunir Ly theoretical papers on tne subject.
Daru (1982) describes a database application wit a Spanishlanguage intertace which was implemented in Prolog. The eaxpleappli .ation is a ml1 uplayee database.
14
.. !
To describe a database, Dahl ues domain detinitions, whicha-Lso include hierarchica! relationshps, relation declarations,to serve as relation templates, and relation detinitions, whichinclude the actual data. Coemlo (1979) describes an adaptationof this approach to Portuguese.
L. ani £MtuWm = a M Sytm
Documents in an information retrieva± system need to bedescribed by their contents and by the appropriate bibliograpnicinformation. A possible set of relation declarations tor thisintormation is
title (document title)
author (document autnor order)
date (document puJblication-date)
publisher (document publisher)
topic (document concept)
The first four relations (title, author, date, andpublisher) provide the basic bibliographic information tor thedocument. Here it is assumed that, for a given document, tnereis only one title, one publication date, and one publisher.There may however, be more that one author. The order ispreserved tor an author so that it is possible to tell the oLderin whch the authors were listed.
The domain for document is assumed to be a set or uniquedocument identi£iers. It is not possible to reliably identiry adocument by author, by title, by publisher, or even by date; so aunique identirier is required. (The need for unique identiriertor the various entities about wnich information is stored hasbeen glossed over in work on logic databases.)
A document may contain intormation about many topics, and atopic may be discussed in many docu emts. It is possible toobtain all the topics contained in a document or ail thedocuments discussing a topic by posing the symmetric queries:
<- topic (document x)
<- topic (x concept)
where either a specific document or a specific concept isindicated.
A bibliographic description of a document may be detinedusing
15
description (u v w x) <- title (u v),publisher (u w),date (u x)
<- author (document x y)<- description (document v w x)
Usually a combination ot topics is specified in a queryrather than a single topic. Boolean queries invoviing theoperators h and OR could be readily handled using the Loglispboolean operators or other extralogical teatures to avoidrepetition in query specitication.
Retrieval pertormance can be enhanced y tne use or synonyminformation. Even it one term is used in a query, documentsciassiried with synonyms of that term can be retrieved. 2hisreature could be handled by the tollowing ground relation
synonym (wordl word2)
and the tollowing rules
synonym (x y) <- synornm (y x)
synonym (x y) <- synonym (x z), synonym (z y)
topic (document x) <- topic (document y), synonmm (x y)
A more elaborate thesaurus could be inpleented to nanclehierarchical reLatonsnips.
Ooncept weights are rrequently used in order to indicate tnerelative ixortance or topics ciscussed in *cuent. Sucnconcept weights could be handled by using an expanded groundrelation
topic (document concept weight)
A query which request documents which are about a speciric topiccan tam specity a cutotr weight
<- topic (document concept x), greater (x cutort)
Since, in most systas, concepts w.xn zero weights are uike.tyto be explicitly entered, testing the weight to see it it isgreater than zero Mould be unnecessary.
AMere is som evidence that retrievas perrormence can beenhanced by using matching functions other than esipe booLean,mtcdig. sggeuons includ the use o smilarity tunctionsbaed on vector nodels ot documents and queries and tne use ortizzy set theory. Such munures could be plimented in Logic
16
but might be more easily izilenented in extralogical teaturessucha as those avau.able in Loglisp.
!dw closed world assumption that ail re.Levant information is]present in the kcnowledge base is not viable in informationretrieval systemls. It is generally accepted tnat, a documntmay contain intormation about a concept even though tnat, concepthas not been used to index it. Thbus failure can niot beinterpreted as negation.
17
RERENCES
Henry D. Anderson ana P. Bruce Berra, "Minimum cost selection orsecondary indices for formatted rlies, &Z TXAD&IgJ1f ontanas am, Vol 2, No 1, March 1977, pp 68-90.
Robert J. Boorow, Richard R. Burton, and Daryle Lewis, UIa LISPManurL, undated.
Barry W. Boehm, Software E aeg F m r, Prentice-Hall,I E 'ewood Clirts, New jersey, 1981.
Heicer (oehlo, "A program conversing in Portuguese providing alibrary service", PhD thesis, University ot Edinburgh, December1979.
Veronica Dahl, No database systems developnent tnrough Logic",AX X!a=IQ= DA n b SatM, Vol 7, No 1, March 1982, p102-123.
C. M. Eastman, "A lexical analysis of keywords in high levelprogramming languages", submitted for publication.
Jo n Franson and Lee Haslup, "Implementing Loglisp at FSJ",Florida State University, 1981.
Herve Gallaire ana JacK Minker (editors), "Vig A an a, Ba&Plenum Press, New York, NY, 1978.
Tom Gilb, ftwae at m, Winthrop rublishers, Cambridge,Massachusetts, 1977.
P. Klahr, "Conditional answers in question-answering systems",
kia g M 1, pp 481-483, 1979.
D. E. Knuth, b A t g g XL * r. r=Zd±n.. olue II."5...r an AW 5rtajn, Mdison-Waley Publishig Campny, Reading,Massacusett, 1973.
Robert Kowaski, LJa Lor plm -AgSAM, Elsevier worthHolland, Inc., New York, NY, 1979.
Robert Kowalski, "Predicate logic as programming language", Zo=I= 2A, North HolLana Publishing Oo., Amsterdam, pp 569-574,1974.
J. arthy, "History of LISP", f£gaE ZiaLgM Vol 1, No 8, pp217-224, 1978.
J. Marti, A. C. Hearn, M. L. G(iss, and C. Grims, "Standard LISPreport", SIGUM U, Vol 14, No 10, October 1979, pp 4-8.
18
Minker, "Search strategy and selection tunction tor aninferential relational system", AM gn t on &taas&Am, Vol 3, No 1, pp 1-31, 1978.
Lynn H. Quan and Whitfieeld Diffie, Stford LISP I.& Mia ,Stanford University, undated.
J. A. Robinson, "A machine oriented logic based on the resolutionprincipleO, Jo uaU of tb A M, Vol 12, pp 23-41, 1965.
J. A. Robinson and E. E. Sibert, "Logic programming in LISP",RADC-'I-80-379, Vol 1, Rome Air Development Center, Grirtiss mirrorce Base, New York, 1981.
Gerard Salton and Michael McGill, I to ModernLRetrieal, McGraw-Hill book Lmpany, New York, NY,1983.
Hanan S -et, "Experience witn sottware conversion", SQ waL1zEaw D r , Vol 11 105-1069, 1981.
Robert C. Schrag, "Logical data base machines", SyracuseUniversity, 1981.
Robert C. Schrag, personal comumnication, 1982.
Guy L. Steele, Jr. r& Al, "An overview of Oammn Lisp", AC4Syqmsium on Lisp ano Functional Programing, 1982.
U. S. General Accounting Office, "Millions in savings possible inconverting programs from one computer to another", FED-77-34,1977 (reterencea in U. S. GAO, 1980)
U. S. General Accounting Office, "Wider use or better cmputersotware technology can improve management control ana reducecostsm, FGNSD-80-8, 1980.
University of Massachusetts, ALM5P Reference aA, Universityor asachusetts at Amherst, undated.
D. H. D. Warren, L. M Pereira, and F. Pereira, "PRLCG-tnlanguage and its implementation -a red witn LISP", IGPLANNtia, Vol 1, No 8, pp 109-115, 1977.
G. Wiederhold, DLJ b Dwa, McGraw-Hill Book Cumpany, NewYork, NY, 1977.
G. A. Wilson and J. Minker, 'Resolution, retineaents and searchstrategies: a cmparative study", I= Dn nZstua, Vol C-25, No 8, Rp 182-801, 1976.
19
APPfIOIX A
rfi= TO TM ALLSP WFMEWNC MAMM~
20
____________________________
ALISP INDEX
* 48
aboval 108addP. 102addgt 90addlt 90addi 108and 76append 96aply 54apply* 54argn 59array 116arrayp 117arrayl 116arrtype 117as.ii 149atiength 39atmhash 33
backprn 120backtrk 119bnum 116break 131
c .... r 95car 95cars 95cdr 95cdrs 95close 146clrbit IIInTm ent 170Ocmpile 199Conc 97concons 96cond 75cons 96copy 98COPfi.Le 168cos 109cshift 110
dooW 98de 63decfile 174dstprop 93deletel 102df 63diff 107dim 117dispose 147divide 107do 79dooons 81
21
' ___t______II il lli .... . .
ALISP INDEX
echo 11,150edit 178ettace 102entry 138eofmark 153eotskip 153eotstat 152eo.LL 9eow 25eq 83eqp 84eqs 114equal 88err 124errprin 120errset 121esnitt 110
eva. 45,53evalquote 45exit 49,138exp 109
filestat 145fix 106fixp 106flambda 57float 107floatp 106fntype 65frunno 145fuzz 86
gc 14genchar 7gensym 37get 93gettun 65getval 42go 77greaterp 87grinu 168gts 114
haitpri 30hprnwn 30hw 116
if?7illegal 40irbase 15initfile 164input 164intadd 90intern U25imunit 9
22
ALISP INDEX
label 60lanbda 57lap 204last 95length 95lessp 87list 96list lie 168litp 36lnum 116load 154log 109logand 110logical 107lognot 110logp 106logor 110logxor 110its 114local 147
mapc 71mapcar 71mapoon 71
mapoonc 71mapl 71maplist 71max 87mmb 83meier 8min 87minus 108minusp 87
nconc 101ril 37normtab 152null 37numberp 106
oblist 33Oddp 87open 144or 76outbase 16output 165outputa 165outuflut 24
pack 39packl 32paget iLe 2paramtl 141paramgt 141parami 125
23
i.- - I ]
ALISP MM
plist 34,42,93plus 107plUap 87]Iprine 177wrint 177prinarray 117prinb 32prinbeg 70prinend 24,30prinlen 30print 25prinI 30prog 77progn 79prompt 10prop 93purgepurgfi-ie 167put 93
qsetcq 40quotient 107
randy 110read 12readhrray 117reabeg 22readch 23readend 22readent 22readlen 22readrib 2readnm 23readk 23reanter 12r iob 34remprop 93repeat 81return 77reverse 99rewinW 152rplaca 100rplacd 100runtim 126
save 154aeLe q 76set 40setblt 111
sin 109slash" 152Special 33sqrt 109status 13
24
ALLSP IN
strcars 113strcdrs 113strconc 114strrinca 114strtest 114SUDi 108Sys 45sysin 47sysprin 48sysout 47
teread 22terpri 25tints 107togbit 11ntraoe 135tracflg 135tatbit il1ttyckiar 11
uflitflos 145unpacK 39untrace 135
va.Luep 42
wipe 34wipelist 34
zerop 87
25
APPEND~IX B
A cflARL90N BE'11iEM ALISP AND~ UCI LISP
26
t=i LM, ALMB amom
abs aboval
aos (U) aboval abs,
aa&l
allotand andand= (u)
apply applyapply*
apply- (u)arg
argnarray array
arraylarrayp
asciiarrtypsasciiascii
wsinasuoc(u)atsom (u)auuon (u)
atlength
atcm atntam
bacllprnbacktrk
bakgagbigium
bnumboole logand, logior, logxor
bporgbreak breakbreak (u)breakd (u)bresiwip (U)breai (U)breaibacrom (U)brokent z (u)
car carcars
odc odr
chrct
dirva± (U)
27I
UCI lisp ALISP Qu~~
cirbitcczueft
cxu ieconcon
co ~neocono conscons conscoW (U) o
copy (U) COPY
cood (U)Ocoed (u)
cshit
dcopyddlt (U)ddtind&tt
de dedecf ile
derprop detpropdeletel
depositdf d
dift ditterenoeclitter~eCLM ditt
disposedivide divide
doclocons
dreawn (u)dreverfie (U)drm (u)dakin (U?)dekout (u)do (u)deutat (U)
edmedit
editom (U).ditdstault (u)edite (U)edit (U)editf 1ld (U)eatfils (U)adittyt (U)editi (U)eatp (U)
28
MIC LIS awI Comm=
editracstn (u)editv (u)editye (u)
ettaceentryeowakeorskipeotstat,eo.Lreo.Lw
eq eqeqpeqs
equal equalerr err
errprinferror (Merrse Muerrset, errset
esrurteva~l eval,
evalquoteevaiv (u)
exarrayexcise
exitexp (u) exp
explodeexpodecexpr
texpr
t3uestattix tix
tixia
fautdatiatsizetJlatsitzec (u)float (U) float
fioatp
toro,torkpt, (u)
fntypefree MUfreelist (u)
frunnofaftrrunargfunction
29
UCX LISP ALISP ctuu"fl
f urction.LLtuzz
gc 9cgod
gctlmgenchar
gensym
greaterp greaterp,grind
grirxetgrini MUgrinprops MU
gts
hairprik~lcor (u)h~hend (Mhghorg (uI)
hprnim
ibas inbaseidentitier
iliegalinbaae ibase
incinitfue
initfninitpracp (u)input input
integerintern intern
inunit,
label labellambda lambda,lap (u) laplapist (u)last lastlastword (u)lastpos (u)lconc (u)idiff (uI)length le1 tlessp, lesuplexorder (u)lexpr
30
UCI LI. ALISP CMET
linelengthlineread (u)list list
listf iielitatom (u)
litplnwn
load loadlog (u) log
logmnd boolelogicallogPlogor boolelogxor boole,
lablsubrlaubat (u)
its
macromaknam
MapMRpc mawc ditterent p~arameter orderumpcan (u)umpear mapearmapoon (u) mapoonaqMpoW (u) mapoonc
naplist maplistmaux (ii) manxmuxievel (u)minb (u) mmmemer memb*er
ittr- (u)mwq
mq.(u)min (u) minminus minus
modaeaar (u)
nooncnooncnoons rmnug (u)nxtev (u)nil nilnil (u)nocall. (u)
normtabnot
nti (u)
null null
31
Uci LISP ALLSP ODWM~I
flmbery, flLmerp
Oblist oblistoddP
or oror- (u)
outbaseoutcoutput output
outPutaOutunit
outval (11)
packpackipegetileparauparamgcparuumi.
patcn (u)pgiLine
plus plistplus plus
plusp,
pprinewprint
prevev (U)prinl prini
prinarray
prirtbegprimc
prinendprinlen
print printprintliev (u)prog progprogi (u)prog2progn progn
*prmt (U) promuptproppurgepurgfiJleput
putpropPiurm
quote stquotient quotient
32
UCI LimP ALISP (XOUflS
ranzm (u) randyrandy rand=
read readreadarrayreadbeg
reacdch readchreadentireadent.
readlistreadLenreadldnp
readnreadp (u) raL
ree, (u)reoiainder remainderrb rei~remove (u)raiprop remprop
repeatresetretrrom (u)return returnreverse reverse
rewindrplaca rplacarplacd rplacd
runtine
sassoc uses egsave
se.Lectq (u) selectqset setsetarg
setjDitsetchr (u)
setqsin (u) sinSimd (U)Sima (u)
slashes
spxbindspecial (u) specialspecstrspdlxr (u)spdlpt (u)spdlrt (u)sRxedo (u)OpceaL (u)sprintairt (u) uqrt
Statusstkaount (u)trnm (u)
33
L *,-"A
UCI LISP ALISP COMEY'S
stkntn (u)stkprt (u)stksrch (u)store
strcarsstrcdirsstrconcstrtiMd
stringp (u)strtest
sul ~ SUblsutlis (u)suopair (u)subrsubst
Syssyscir (u)
sysinsysprinsysout
ttab (u)taiip (u)tan (u)tamn (u)toonc (u)
tereadterpri terpri
tines timstogbit
trace tracetracedfns (u)tracet
tracflgtstbitttychar
ttyecho (u)tyijtyo
unbound (u)unbreak (u)undoist (u)unfind (u)
unitnosunpack
untrace (u) untracewntyi (u)upfindlg (u)
valuep
wipe
34
U)CI LISP ALISP CONU11'S
wipelist
xconszerop, zerop
$eof $
*amake
*appendwdit*evai
expandi
*less*max (u)*min (u)*nopoint,*plus*pUtsymh*quo*rset
35
0 1