
7/31/2019 Learner Manual

Learner: A Knowledge Acquisition System

    Release 0.11 September 2001

    by Timothy Chklovski and Matthew Fredette


Copyright (c) 2001 Tim Chklovski, Matt Fredette. All rights reserved.

Permission to use, copy, modify and distribute this documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the copyright notice and statements, including the following disclaimer, and that the same appear on ALL copies of the documentation, including modifications that you make for internal use or for distribution:

THIS SOFTWARE IS PROVIDED "AS IS", AND TIM CHKLOVSKI AND MATT FREDETTE MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. By way of example, but not limitation, TIM CHKLOVSKI AND MATT FREDETTE MAKE NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The names of Tim Chklovski and Matt Fredette may NOT be used in advertising or publicity pertaining to distribution of the software. Title to copyright in this software and any associated documentation shall at all times remain with Tim Chklovski and Matt Fredette, and USER agrees to preserve same.


Table of Contents

1 Introduction .......................................... 1
  1.1 Motivation ........................................ 1
2 Installing Learner .................................... 3
3 Running Learner on Your System ........................ 5
4 Using the Web Interface ............................... 8
  4.1 Starting a Conversation ........................... 8
  4.2 Tips on using the Interface ....................... 8
5 Using the Command Line Interface ...................... 10
  5.1 The 30-second tutorial ............................ 10
  5.2 Teaching Learner Thinking Rules ................... 12
  5.3 Asking Learner Questions .......................... 12
6 How Does Learner Work? ................................ 13
  6.1 Making Similarity Judgments ....................... 13
  6.2 Making Analogies .................................. 15
  6.3 Making Inferences ................................. 16
7 Learner Data Flow Architecture ........................ 17
  7.1 Generating Questions .............................. 17
  7.2 Pruning and Ordering the Questions ................ 17
8 Learner Interface API ................................. 19
  8.1 The Syntax of a Term .............................. 19
  8.2 The Learner Interface API Functions ............... 20
9 FramerD and Link Grammar Parser ....................... 23
  9.1 FramerD ........................................... 23
  9.2 Link Grammar Parser ............................... 23
10 How Frames Work ...................................... 25
  10.1 Representing Atoms and Topics .................... 25
  10.2 Representing English and Internal Knowledge ...... 26
    10.2.1 Representing English Uframes ................. 26
    10.2.2 Representing Derived Knowledge: Slotted Frames 26
    10.2.3 Functions Related to Assertions .............. 26
11 Variables ............................................ 27
  11.1 Substitution Functions ........................... 27
12 Indexing ............................................. 29
13 Iframes .............................................. 30
  13.1 Iframe Functions ................................. 30
14 Learner and the Community ............................ 32
  14.1 Population Contributing to the Learner ........... 32
  14.2 Wishlist ......................................... 32
  14.3 Bug Reports ...................................... 33
15 Related Work ......................................... 34
16 History and Acknowledgements ......................... 35
Function Index .......................................... 36
Concept Index ........................................... 37


    1 Introduction

learner allows ordinary Web users (teachers) to teach a computer things they know and allows anyone to ask it questions about what others have taught it.

The home page for the project is at http://www.media.mit.edu/~timc/learner. It contains a link to a running learner that you can teach and ask questions.

We hope that by capturing things people know, and capturing how they conclude things from other things, we can pave the way to smarter, easier-to-use information repositories, computers, and other devices.

To put it another way, learner is a system for capturing declarative knowledge and inference rules. A fundamental feature of the system is that it teases the knowledge out of the user by posing plausible questions.

Importantly, the number and quality of follow-up questions learner can ask goes up as the amount of knowledge in the system goes up. Our approach is to intelligently re-use the knowledge that users put in to get more good information.

However, the mechanisms for intelligently re-using the knowledge users enter are necessarily diverse.(1)

From the outset, learner has been designed to be able to capture this large diversity of reasoning methods. It is open source, has a general plug-in architecture for question-posing modules, and an extensive and well-documented api.

Even though it does not get sensory information the way people do, it can accumulate and use a vast amount of knowledge and rules. With plenty of luck, learner can become a collaborative creation of mankind that rivals any other artifact in its usefulness.

1.1 Motivation

Simply put, there currently are no good tools for dealing with assertion-level unstructured data.

This is a generally acknowledged problem for any knowledge-intensive business. For example, darpa recognizes the importance of being able to gather and reason with heterogeneous, unstructured knowledge in its recent Rapid Knowledge Formation (rkf) initiative.

    However, the enabling technologies such as:

    natural language parsing,

    persistent object-oriented storage, and

    rapid information retrieval from very large repositories

have matured over the past 20 years to a point where robust, new applications can be built on top of them.

Furthermore, the project of collecting knowledge, and even the project of developing the set of methods to process such knowledge, cannot be effectively tackled by a small team. Thus it almost inevitably calls for a distributed or, more specifically, community-based approach. Such approaches have become feasible only recently, thanks to the World Wide Web and the advent of the Open-Source movement.

(1) In line with Marvin Minsky's general thesis in his seminal Society of Mind.


It is the goal of the learner project to break new ground in collecting large repositories of unstructured assertions and to enable reasoning over them. The underlying technologies and approaches exist. What is needed now is an organized effort of individuals skilled in this field.

The learner technology is still very young. However, there are many huge opportunities that it could, with maturity, address very effectively. Here are some examples of potentially very successful and lucrative applications:

Self-service Help Desks
    Responding to natural-language queries with ever-increasing precision to automate customer support and help-desk operations.

Voice Command and Control
    As an enabling technology for controlling computers and other devices with your voice. Continuous speech recognition has made great advances in recent years. The new gap is: given the recognized speech, to figure out what the user actually wanted to do, i.e. what commands should actually be executed.

    Generally, when speaking to each other, people can communicate effectively because the listener is assumed to have a lot of commonsense knowledge and some reasoning ability. Computers and other devices can potentially join the class of intelligent listeners if they are equipped with a large common sense knowledge base.

An Information Repository for Assertion-Sized Knowledge
    Currently, databases address the issue of storing and manipulating structured knowledge. A lot of valuable knowledge is simply too heterogeneous to be stored that way. That is why the looser organizational approach of the World Wide Web has been so successful. learner can capture roughly sentence-level information and be as useful on that level as the Web is on the document level.

The rest of the manual describes how learner works so you can use and extend the project.


    2 Installing Learner

Before installing any learner software, you must download and install a large database containing the WordNet (http://www.cogsci.princeton.edu/~wn/) lexical database and a released part of the CYC (http://www.opencyc.org/) ontology. This combined database is called bricolage or brico, and it is in the format used by the FramerD (http://framerd.org) Scheme interpreter.

    As of this writing, you can find this database at

    http://framerd.org/download.html

    by clicking on the .tar.gz link after the words BRICO ontology.

Now unpack this .tar.gz file, preferably under a new /usr/local/share/brico/ directory. The rest of these installation notes assume that you used this specific directory, such that the file

/usr/local/share/brico/brico/brico.pool

exists.(1)

Once the brico database has been installed, you can install the learner software. As of this writing, the learner software is available for download at

    http://mit.edu/fredette/www/learner/

To run the learner, you need to download and install a total of four software packages from this site, in the following order:

1. link-4.1+learner0.1.tar.gz is a modified version of the Link Grammar Parser (http://www.link.cs.cmu.edu/link), an English-language parser developed at Carnegie Mellon University.

   Our modifications include some additional functionality and the ability to be integrated with FramerD.

   Unpack this .tar.gz file and follow the instructions in the INSTALL file to compile and install this package.

2. framerd-2.2preA+learner0.1.tar.gz is a modified version of FramerD (http://framerd.org), a distributed object-oriented database that includes an extended Scheme implementation and was developed at the Media Lab of the Massachusetts Institute of Technology.

   Our modifications include new functions to integrate the Link Grammar Parser.

   Unpack this .tar.gz file and follow the instructions in the INSTALL file to compile and install this package.

    3. learnerdict-0.1.tar.gz is an English dictionary used by the learner software.

   Unpack this .tar.gz file under the exact same directory under which you unpacked the brico database.

    4. learner-0.1.tar.gz is the learner software itself.

   Unpack this .tar.gz file and follow the instructions in the INSTALL file to build this package. The learner does not need to be installed.

(1) It is possible to override this with the --with-brico option to learner's configure script; see the installation item for the learner for more information.


Please note: if you did not install the brico database and English dictionary under the /usr/local/share/brico directory (for example, to store them on a large second disk or on the network), you will need to tell learner's configure script where to find the database. For example, if you installed brico under /usr/bigdisk/brico, you would do:

./configure --with-brico=/usr/bigdisk/brico


    3 Running Learner on Your System

For these early releases of the learner, you interact with it entirely at a FramerD command line. You can reach this command line by running the command fdscript at your shell prompt or, if FramerD found an Emacs when it was installed, you can get a captive command-line buffer in Emacs by doing M-x fdscript.

The first fdscript command you run should change the working directory to where you built the learner. If you unpacked and built the learner under the directory /usr/home/test/learner, do:

    (cd "/usr/home/test/learner")

    There are a few parameters you can set to control how the learner runs:

Variable: %keep-pools-and-indices
    This is set in learner-init-server.fdx. If it is #f, all the pools and indices learner uses will be wiped out if they are present (and empty ones will be created in their place).

    Otherwise, the learner assumes indices and pools are in place and uses those.

Variable: %prefetch-brico
    When defined and not #f, will prefetch every topic mentioned in learner's database from brico. This takes some time but speeds up future operation. It is convenient to turn this off if you are relaunching the system frequently.

    If running in the client-server configuration, see learner-init-server.fdx to turn prefetching off when launching the server.

Variable: %run-as-standalone
    When #f, loads the system as the client, assuming a server process has been started separately. The server process can be started with the command fdserver learner-server.fdz --local.

Variable: %use-generalization
    When not #f, enables another mechanism for generating questions. Roughly, given "cats have tails" and "dogs have tails", uses brico's hierarchy to arrive at the hypothesis that (and ask whether) all pets have tails. This feature is still being tuned and should probably be left off by beginning users.

Variable: %user
    This is the default username to use. If not set, "unknown" will be used. When used with a web interface in a multi-user situation, the username should be provided in the user slot of some api calls. See Section 8.2 [The Learner Interface API Functions], page 20, for an explanation of which api calls need the user slot.

Variable: %verbose
    When not #f, will cause various information to be printed. Furthermore, if (contains? 2 %verbose), more information (the level 2 printouts) will be output. The most output is obtained by setting %verbose as follows: (define %verbose (choice 1 2 3 4 5)).


    For example, you may set the variables and load the system as follows:

(define %run-as-standalone #t)
(define %user "guest")
(define %prefetch-brico #t)
(define %verbose #t)
(define %allow-slotted-frame-questions #t)
(define %use-generalization #f)

(define launch-learner
  (lambda ()
    (if %run-as-standalone
        (load-library "learner-init-server.fdx")
        ;; run as client of client-server
        ;; NOTE: the server should be started before running the
        ;; following line. Server can be started with, approx.:
        ;;   fdserver learner-server.fdz --local
        (load-library "learner-use-server.fdx"))
    (load-library "learner-init-client.fdx")))

(launch-learner)

If you would like to run it in the client-server configuration, you will need to start the server in a separate shell with

    fdserver learner-server.fdz --local

(you may wish to set %prefetch-brico and %verbose in learner-init-server.fdx). When launching the client (inside Emacs if you wish), make sure to set:

    (define %run-as-standalone #f)

You can have several client processes talking to the same server. All the database updates and retrievals are done by the server to avoid cache-coherency problems.

The client exports the api functions, as listed in learner-client-exports.txt. Those are accessible by issuing dtcall commands.(1)

    Next, you can make sure that things are functional by taking the following steps:

1. load the assertions file learner-initial-assertions.fdx,

2. load the Iframe definition file learner-initial-iframes.fdx, and

3. evaluate the contents of learner-tests.fdx.

    Or, you can go on to the 30-second tutorial (see Chapter 5 [Command Line Interface],page 10).

Note that, as one of the first users of the learner, your contribution makes a difference. The knowledge you contribute at this formative stage will shape the future direction of the development of the learner.

(1) See the fdscript documentation for more details.


We hope you enjoy experimenting with and extending the learner. learner can achieve its potential only with the contributions of many, so we strongly encourage you to contribute any significant changes back to the learner community. See Section 14.2 [Wishlist], page 32, for possible directions.

Should you accumulate a significant knowledge base with your copy of the learner, we strongly encourage you to share all or (if you must) part of it with the learner community by contacting us.


    4 Using the Web Interface

4.1 Starting a Conversation

You can start by asserting something (in the "say something interesting" box) or by clicking on one of the hot topics.

    You can also get more information on any specific topic by clicking on it when it is ahyperlink, or by entering it in the Summarize the topic blank field.

    Finally, you can make any existing topic the topic of conversation by entering it in theblank in the Make blank the new topic section.

    4.2 Tips on using the Interface

Here are a few tips on how to (and how not to) enter knowledge so learner can make the best use of it:

Respond to the questions (guesses) that come up.
    For the right guesses, change the radio button next to them to Yes, and for the wrong ones, change it to No. It's ok to leave some as Don't know.

Do not worry about the trailing "?".
    A trailing "?" will be stripped off from your replies.

Provide because reasons.
    With either a Yes or No reply, you can (and, when it makes sense, you are encouraged to) provide a reason in the because field.(1)

    For example, when presented with a question "a car has a tail?", you are encouraged to respond with:

        a car has a tail? because a car is not an animal(2)

    Please note: only enter because reasons that can stand on their own. Do not rely on the words "it" or "they".

    Wrong:

    a car has a tail? because it is not an animal

Correct Learner's guesses whenever possible.
    Rather than just answering Yes or No, correct the system when possible. For example, if the system says:

        a car uses electricity?

    you are encouraged to edit the line to read:

        a car uses gasoline

    . . . and select the Yes radio button.

(1) This is a simplified interface for entering inference frames. See Section 5.2 [Teaching Learner Thinking Rules], page 12, for the full set of options in specifying Iframes.

(2) This said that a car does not have a tail because a car is not an animal. If you prefer, you can enter the equivalent "a car does not have a tail because a car is not an animal"; negating the statement and using the radio button do the same thing.


Do Not Over-Capitalize
    Do not capitalize the first word in a sentence unless you would capitalize that word in the middle of a sentence. For example:

        Ford makes minivans

    But there is no capitalization in:

        sun rays are sometimes blocked by clouds

Do not forget to select the appropriate Yes or No checkbox.

Keep Adding Knowledge
    The more the system knows, the better it can be at making analogies. To help make the guesses more on target, try to type in some new facts every two or three question-answering sessions.

To use the interface most effectively, it is best to have the guess and its because field fit on a single line. You may need to resize your browser window and/or your browser's font sizes to achieve this.

The interface also allows you to query what learner knows in the input box labeled "try to answer a question". See Section 5.3 [Asking Learner Questions], page 12, for an explanation (with examples) of how to pose the questions.


    5 Using the Command Line Interface

    5.1 The 30-second tutorial

To run the examples given here, start up FDscript, load the learner code, and make sure the database contains only the assertions from learner-initial-assertions.fdx and the Iframes from learner-initial-iframes.fdx. You can do this by starting with an empty database and loading these two files.

These are the functions you can use to talk to the system from the FDscript command prompt, or from Emacs if you are running FDscript within Emacs:

show-topics
show-topic-summary
say-topic
say
say-Iframe!
find-answers

    See Chapter 8 [Learner Interface API], page 19, for definitions of these functions.

At any time, you can use the show-topics command to see what the learner knows about. For example, (show-topics 4 4) shows the topics the system knows exactly four things about. On the initial database, this command produces the following output:

;; There are 3 results
("cup" . 4)
("elephant" . 4)
("fork" . 4)
;; Nothing (void) was returned

    (show-topics 2 -1) shows the topics the system knows at least two things about.

    You can see more about any topic with the show-topic-summary command.

    For example, (show-topic-summary "cat") produces:

(show-topic-summary "cat")
;; Topic: cat (8 yes uframes, 0 no uframes, 0 srvframes known)
;; Similar topics:
("dog" . 5)
("bear" . 2)
("car" . 1)
("elephant" . 1)
;; Statements that have cat as a topic:
("a cat has a tail" . 10)
("a cat is a pet" . 10)
("a cat is an animal" . 10)
("cats can scratch with their paws" . 10)
("cats drink milk" . 10)
("cats eat mice" . 10)
("cats have paws" . 10)
("cats have sharp claws on their paws" . 10)
;; Nothing (void) was returned

The number next to each similar topic is the similarity strength. The more the system knows, the closer to human judgment these will generally be.

The number next to each statement that has the topic as a topic is the strength of the topic in the assertion. For example, having cat in the subject position merits a higher score than having it in the object position.

    You can talk to the system about a topic by using the say-topic command.

    For example, (say-topic "cat") produces something like:

;; I would like to know:
(sc: 3) Please confirm, deny, or correct: a cat barks when it is angry?
(sc: 3) Please confirm, deny, or correct: a cat can bark?
(sc: 3) Please confirm, deny, or correct: a cat can bite you?
(sc: 3) Please confirm, deny, or correct: cats chew on bones?
(sc: 3) Please confirm, deny, or correct: cats drink water?
(sc: 3) Please confirm, deny, or correct: cats eat meat?
(sc: 2) Please confirm, deny, or correct: a Porsche is a fast cat?
(sc: 2) Please confirm, deny, or correct: a cat can be fast?
(sc: 2) Please confirm, deny, or correct: a cat is an object?
(sc: 2) Please confirm, deny, or correct: a cat is brown?
(sc: 2) Please confirm, deny, or correct: a cat is dangerous?
(sc: 2) Please confirm, deny, or correct: a cat is similar to a wolf?
(sc: 2) Please confirm, deny, or correct: a fast cat has a big engine?
(sc: 2) Please confirm, deny, or correct: a fast cat has large tires?
(sc: 2) Please confirm, deny, or correct: a wolf is similar to a cat?
(sc: 2) Please confirm, deny, or correct: an cat has tusks?
(sc: 2) Please confirm, deny, or correct: an cat is very heavy?
(sc: 2) Please confirm, deny, or correct: an cat is very large?
;; Nothing (void) was returned

This is the system guessing at what may be true about cats. Based on similarity with dogs, the system has correctly guessed that "cats drink water", but incorrectly that "a cat can bark".

This is where you, the user, come in. Please teach the system using the say command. You can type:

(say "cats drink water")
(say "a cat can not bark")(1)

At this point, you know enough to start talking to the system directly.

As you say more things to the system, it uses them to make better guesses about what is and what is not true. So, you're making the system smarter with every little bit of knowledge you put in. Every little bit really helps, so you can make this effort a success while just having fun!

(1) This statement is stored negated, as "a cat can bark" with probability 0. Negated statements can also be entered as (say "a cat can bark" 0).

5.2 Teaching Learner Thinking Rules

A more direct, powerful way to make the system smarter is to teach the system Thinking Rules in addition to just facts.

For example, in response to the system guessing that "a car has a tail" (or for any other reason), you may tell the system

(say-Iframe! (("a car is not an animal")) ("a car has a tail" 0))

creating an Iframe with the what being "a ?car? is an animal => a ?car? has a tail".

5.3 Asking Learner Questions

A way to ask the system about what it already knows is using the find-answers command.

learner can handle both simple interrogatives (e.g. "What do cats eat?") and fill-in-the-blank questions.

    Here are some examples:

(find-answers "What do cats eat?")

produces:

;; Finding answers for "cats eat ?X?"
;; Found an answer:
"cats eat mice" is true

(find-answers "?Xes? eat ?Y?")

produces:

;; Finding answers for: "?Xes? eat ?Y?"
;; Found these answers:
"dogs eat mice" is true
"cats eat mice" is true
"dogs eat meat" is true

(find-answers "turtles eat mice")

produces:

;; Finding answers for: "turtles eat mice"
;; Found an answer:
"turtles eat mice" is not true
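The fill-in-the-blank form can be pictured as wildcard matching of the ?Name? blanks against stored assertions. The following sketch is purely illustrative (hypothetical helper names, written in Python for compactness); the real learner matches over parsed uframes with variables, not raw strings, and also rewrites interrogatives like "What do cats eat?" into blank form first:

```python
import re

def matches(pattern, assertion):
    """True if assertion fits pattern, where each ?Name? is a one-word blank."""
    regex = re.escape(pattern)
    # Turn each (escaped) ?Name? blank into a word-capturing group.
    regex = re.sub(r"\\\?\w+\\\?", r"(\\w+)", regex)
    return re.fullmatch(regex, assertion) is not None

def find_matching_assertions(pattern, assertions):
    """Return the stored assertions that fit the pattern."""
    return [a for a in assertions if matches(pattern, a)]
```

Under this sketch, "dogs eat mice" fits the pattern "?Xes? eat ?Y?", while "a cat has a tail" does not.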


    6 How Does Learner Work?

The learner has a set of features that allow it to ask relevant follow-up questions when obtaining information from the user. This set can be expanded as needed; the system itself ships with a core set of plug-ins. We demonstrate what these plug-ins allow learner to do in a series of examples.

    6.1 Making Similarity Judgments

If two topics are similar, similar statements will be true about them. For example, a spoon and a fork are similar, and it is true that "you can eat with a spoon" and "you can eat with a fork", and "forks are usually made of metal" and "spoons are usually made of metal".

Furthermore, similarity drops off gradually. For example, a spoon and a shovel are both inanimate objects that are tools. In some general sense, they are more similar than, say, a spoon and a rabbit. On the level of assertions, we observe that you can say "you can eat with a spoon" and "you can dig with a shovel", but it is awkward to phrase something similar about a rabbit.

Based on these two observations, it should be possible to go the other way. That is, similarity of topics can be derived from similarity of assertions about them!

    That is what learner does.

Similarity is (or should be) used throughout the system to drive the creation of new hypotheses, estimate the plausibility of an answer or a newly acquired fact, retrieve the most relevant information, organize how retrieved information is presented, and so on.

In our experience, most of the things learner needs to do can be reduced to some combination of similarity and inference computation. So, having a good similarity function is quite important. similar-topics-hash is the function that implements similarity in the learner.

Topics similar to a given topic (the source topic) are computed by taking the following steps:

    1. For the source topic, identify all the assertions about it (the source assertions).

See scored-assertions-on-topic.

    2. For each source assertion, find all assertions that exceed a certain similarity threshold.

    See similar-Uframes-set.

3. In each of the similar assertions, find the topic that is in the same role as the source topic is in the corresponding source assertion.

   For example, for a source topic dog, "dogs eat mice" may be an assertion about dogs (a source assertion). Then, "cats eat mice" would be a similar assertion, and cats in it would be in the same role as dogs is in "dogs eat mice". That would add weight to the similarity of cat and dog.

    See corresponding-item.

4. All of the similarity scores are added up, arriving at a hashtable of topics, each similar to the source topic and each associated with a similarity weight.
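The four steps above can be sketched as follows. This is a hypothetical illustration (in Python, for compactness), not the actual FDScript implementation of similar-topics-hash; the three helper callables stand in for the functions named in the steps:

```python
from collections import defaultdict

def similar_topics(source_topic, assertions_on_topic, similar_assertions,
                   corresponding_item):
    """Score topics by how similar they are to source_topic.

    assertions_on_topic(t)        -> assertions about t              (step 1)
    similar_assertions(a)         -> (assertion, score) pairs above
                                     the similarity threshold        (step 2)
    corresponding_item(t, a1, a2) -> topic in a2 playing the same
                                     role that t plays in a1         (step 3)
    """
    scores = defaultdict(int)
    for source_assertion in assertions_on_topic(source_topic):
        for similar, score in similar_assertions(source_assertion):
            topic = corresponding_item(source_topic, source_assertion, similar)
            # Step 4: add up the similarity scores per topic.
            if topic is not None and topic != source_topic:
                scores[topic] += score
    return dict(scores)
```

With the dog/cat example from step 3, the similar assertion "cats eat mice" contributes its similarity score to the weight of cat in the resulting table.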


Step 2 above was to find the most similar assertions given an assertion. That in itself is a multi-step process (and one that can be improved by a willing contributor!). Currently, assertions similar to a given source assertion (uframe) are computed by taking the following steps:

1. Given the uframe, compute its significant keys (words) and their significant links (parse links).

2. Retrieve all uframes that have at least one significant word in common with the source frame, making the candidate frames.

3. Score each candidate frame by comparing significant atoms in the source frame with the corresponding significant atoms in the candidate frame, as follows:

   Add many points for atoms that are identical,

   Add some points for atoms that are similar according to brico,

   Subtract some points for atoms that are not similar,

   Subtract some points for an atom in one frame that does not have a corresponding atom in the other.
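The scoring rubric in step 3 can be sketched as below. The weights and the brico_similar predicate are illustrative assumptions made for this sketch (written in Python), not the values or code used by the actual FDScript implementation:

```python
# Illustrative weights only; the real system's values may differ.
IDENTICAL, SIMILAR, DISSIMILAR, UNMATCHED = 10, 4, -2, -3

def score_candidate(source_atoms, candidate_atoms, brico_similar):
    """Score a candidate frame against the source frame.

    source_atoms / candidate_atoms: significant atoms (words) aligned by
    parse role; None marks an atom with no counterpart in the other frame.
    brico_similar(a, b): True if brico judges atoms a and b similar.
    """
    score = 0
    for a, b in zip(source_atoms, candidate_atoms):
        if a is None or b is None:
            score += UNMATCHED    # no corresponding atom in the other frame
        elif a == b:
            score += IDENTICAL    # identical atoms
        elif brico_similar(a, b):
            score += SIMILAR      # similar according to brico
        else:
            score += DISSIMILAR   # atoms that are not similar
    return score
```

Under these made-up weights, scoring "cats eat mice" against "dogs eat mice" yields SIMILAR + IDENTICAL + IDENTICAL when brico judges dogs and cats similar.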

Explicit statements about what is similar do not currently affect the internal similarity measure, but they could, so that saying "a spoon is similar to a fork" (or finding that out from brico or from mining the World Wide Web) would prompt exploration of how they are similar.

    Relevant functions:

    Functionscored-assertions-on-topic topicGiven a topic, returns conses of assertions and strengths. The strength for an assertionis what strength of the topic would be if we called assertion-topics on the assertion.

    Function: corresponding-item topic uframe similar-uframe
    Given a topic, a uframe containing the topic, and a similar-uframe, returns the topic from similar-uframe that is in the same role, i.e. corresponds to topic in the uframe by its parsing role.

    Does not base-formify the topic it returns.

    Function: similar-Uframes-set uframe threshold . exclude-topics
    Given a uframe, a threshold, and an optional choice of exclude-topics, returns the choice of uframes at least threshold similar to uframe; excludes any frames that mention one or more of the exclude-topics.

    Function: similar-uframes-of->string uframe
    Given a uframe, returns a string describing uframes at least similarity-threshold (hardcoded to a certain value) similar to it. Accounts for probability classes of the frames.

    (lineout (similar-uframes-of->string (the uframe "dogs eat mice")))

    ;; Related to what you said:
    ("cats eat mice" . 26) ("cat dog")


    ("dogs eat meat" . 13)

    On each line, the number associated with the string is the similarity score, and the optional parenthetical argument lists which things in equivalent positions in the two assertions were found to be brico-similar.

    6.2 Making Analogies

    The key task of the system is to pose plausible questions on the topic of conversation. How can we come up with these?

    There are a few options. One is to re-use what other users have put in, verifying it verbatim. While perhaps needed, this does not lead to a very interesting system, as there is no expansion of the knowledge base.

    Another is to seed learner's questions with statements that seem plausible based on, for example, mining the World Wide Web. That could be a very interesting direction, especially if what was extracted was sufficiently clean.

    Yet another approach would be to rely on learner's inference mechanism to generate new statements. The problem here is that we often do not have enough inferences in the areas new to learner to make this the weight-bearing mechanism.

    Finally, there is a very good avenue, the one the system actually uses. This avenue is to generate new statements about a topic by analogy from known statements about similar topics.

    So, fundamentally, the analogy mechanism rests on top of the similarity mechanism. Analogy is potentially a very powerful and sophisticated tool. Here, we describe its current implementation in the learner.

    The preceding section on Making Similarity Judgments (see Section 6.1 [Making Similarity Judgments], page 13) showed how to compute topics similar to a given topic. Let's call such topics friends of the source topic.

    To analogize the statements from similar topics (friends) to the target topic, we basically take all the assertions that are true about the friends, change them to be about the target topic, and sum up assertions from all the friends, giving more weight to statements that came from better friends (more similar topics), and letting assertions that are true about some friends and not true about others partially cancel each other out.

    More formally, we take the following steps:

    1. For each friend, retrieve all assertions that mention it, omitting the uncertain ones and the ones that already mention the target topic.1

    2. In each of the retrieved assertions, substitute the source topic with the target topic, conjugating nouns to be plural or singular as needed. This forms the analogized assertions.

    Each analogized assertion has a score associated with it: the better the friend (i.e. the more similar the source topic), the more weight assertions analogized from it get.

    3. Collapse the identical analogized assertions that were formed from different source topics, adding up their weights and accounting for probability-classes being the same or opposite.

    1 The latter filter prevents creating strange assertions such as mice eat mice.
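A toy sketch of these three steps follows. The knowledge base, the plural table, the friend weights, and the sign convention for probability classes are all illustrative assumptions, not learner's actual representation:

```python
from collections import defaultdict

# per topic: (assertion text, probability class: +1 true / -1 false)
KB = {
    "cat": [("cats eat mice", +1), ("cats can fly", -1)],
    "wolf": [("wolves eat mice", +1)],
}

# step 2's singular/plural conjugation, reduced to a toy lookup table
PLURALS = {"cat": "cats", "wolf": "wolves", "dog": "dogs"}

def analogize(target, friends):
    """friends: {source_topic: similarity_weight}. Returns the merged
    analogized assertions with summed, sign-adjusted weights."""
    scores = defaultdict(float)
    for friend, weight in friends.items():
        for text, prob_class in KB.get(friend, []):
            if target in text:        # already mentions the target:
                continue              # avoids "mice eat mice" cases
            # step 2: substitute the source topic with the target
            analog = text.replace(PLURALS[friend], PLURALS[target])
            # step 3: better friends contribute more weight; opposite
            # probability classes partially cancel each other out
            scores[analog] += prob_class * weight
    return dict(scores)

print(analogize("dog", {"cat": 2.0, "wolf": 1.0}))
```

Here "dogs eat mice" is analogized from both friends and its weights add up, while the negative "cats can fly" yields a negatively weighted "dogs can fly".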


    Furthermore, it is an upcoming feature that assertions that are true about more things get more weight, so that the system progresses from the more general to the more specific questions in its learning about a topic.

    More modules for generating plausible things to ask can, and will, with time, be added.

    Once the assertions are computed, they get cleaned up to exclude asking what is already known, what is inferable from other things we are already asking, etc. See Section 7.2 [Pruning and Ordering the Questions], page 17, for details.

    6.3 Making Inferences

    This capability is built on top of Iframes. See Chapter 13 [Iframes], page 30, for details on working with inferences.

    Inference is used in several places.

    First, inference is used every time an assertion is added, to see what it implies. It is a planned feature to evaluate and react to an assertion creating implications that contradict what the system already believes.

    When selecting the questions to pose, we use inference to avoid asking things that may be inferred from the current knowledge. We also use it to avoid asking questions which may be obviated by answers to others. See Section 7.2 [Pruning and Ordering the Questions], page 17.

    It is an upcoming feature to try to infer the answer when you use the find-answers command. Currently, it simply tries to look it up in the knowledge base.


    7 Learner Data Flow Architecture

    learner has a plug-in architecture for generating questions. That means that separate users can experiment with their own question-formulating modules and take advantage of the overall framework to organize and present questions for them.

    Fundamentally, learner's read-eval-print loop takes the following steps:

    1. accept the current input,

    2. identify the current set of hot topics being discussed,

    3. create questions for the current topics and assertions,

    4. finalize and present the questions to prompt the next input.
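Schematically, one pass through this loop might look like the following sketch, where every stage is a stand-in stub rather than learner's actual code:

```python
def identify_hot_topics(text):
    # stand-in heuristic: every word longer than 3 letters is a "topic"
    return [w for w in text.split() if len(w) > 3]

def create_questions(topics):
    # stand-in for the question-generating plug-ins
    return [f"What else is true about {t}?" for t in topics]

def finalize(questions):
    # deduplicate while keeping order: a stand-in for the real
    # combining/pruning/ordering stage described in Section 7.2
    seen, out = set(), []
    for q in questions:
        if q not in seen:
            seen.add(q)
            out.append(q)
    return out

def read_eval_print_step(user_input):
    topics = identify_hot_topics(user_input)      # step 2
    questions = create_questions(topics)          # step 3
    return finalize(questions)                    # step 4

print(read_eval_print_step("cats eat mice"))
```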

    We now look in more detail at the steps of creating the questions and finalizing them.

    7.1 Generating Questions

    When you say things to the system, you are making it more knowledgeable. Through its built-in analogy and generalization mechanisms, it can also think of (hypothesize) new things. Fundamentally, these new hypotheses form the basis of the questions the system is asking.

    Here, we describe the several specific mechanisms for creating new questions that ship with the learner distribution.

    7.2 Pruning and Ordering the Questions

    The learner is set up to support multiple independent question generators. This is to allow independent groups to experiment with question-generating and to make it simple to increase learner's question-asking prowess.

    To make question-generation easy, a lot of the clean-up functionality is offloaded into a common finalizing stage. All question-generators feed their outputs into the finalizing stage, and this stage generates and outputs the final set of questions.

    Finalization consists of the following actions:

    Combining
    Several questions that are asking the same (or nearly the same) thing are combined into a single question. This is useful when separate mechanisms generate the same question. The scores of the questions are combined, paying attention to the probability classes of the statements (a positive and a negative conjecture cancel each other's scores out).

    Pruning

    A question is dropped altogether. There may be several reasons for this:

    The answer to this question is already known. We currently drop all such questions, although some questions may have multiple answers.

    The question may be made irrelevant by an answer to another question we are posing. See below for an explanation.


    Ordering

    Given a great many questions, in which order do we ask them? The question generators produce questions already paired with scores, but the other work done by the finalizer can further alter the scores.

    The pruning of questions to those which do not follow from answers to other questions deserves further explanation.

    For example, if learner is considering asking about the truth of the following statements:

    Question 1: "cats have paws"
    Question 2: "cats have sharp claws on their paws"

    and it already knows that "?snakes? have ?paws? => ?snakes? have sharp claws on their ?paws?" (i.e. if you do not have X, you cannot have sharp claws on X), then it will not ask Question 2 together with Question 1.

    This is an important feature. If the interface makes it easy to enter "because" reasons (Iframes), then the filtering out of dependent questions will lead to spontaneous structuring of the dialogue. That is, the system will evolve from a mass of questions towards a decision-tree type dialogue. The initial questions will be the more general ones, and depending on their answers, the more specific ones will become relevant.

    This mechanism helps make learner a powerful knowledge acquisition tool that improves as it accumulates more knowledge.
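The combining, pruning, and ordering actions can be sketched as follows. The triple representation and the score arithmetic are illustrative assumptions, and dependent-question pruning is omitted for brevity:

```python
from collections import defaultdict

def finalize(candidates, known):
    """candidates: (text, score, prob_class) triples from all question
    generators; known: set of assertion texts already in the kb."""
    combined = defaultdict(float)
    for text, score, prob_class in candidates:
        # combining: a positive and a negative conjecture about the
        # same statement cancel each other's scores out
        combined[text] += score * prob_class
    # pruning: drop questions whose answer is known or fully cancelled
    pruned = {t: s for t, s in combined.items()
              if t not in known and s != 0}
    # ordering: strongest conjectures (either sign) first
    return sorted(pruned.items(), key=lambda p: -abs(p[1]))

candidates = [
    ("dogs eat meat", 5, +1),   # from one generator
    ("dogs eat meat", 3, +1),   # same question from another generator
    ("dogs can fly", 4, +1),
    ("dogs can fly", 4, -1),    # opposite conjecture: cancels out
    ("dogs have tails", 2, +1),
]
print(finalize(candidates, known={"dogs have tails"}))
```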


    8 Learner Interface API

    The Learner Interface API covers the functions you should use to interact with the learner, both from a command line and from a front end.

    The subset that should be used from the command line was described in the tutorial on using the command line interface (see Section 5.1 [The 30-second tutorial], page 10).

    The front-end api functions are a subset of the interface api. The functions exported to front-ends are as follows:

    functions for adding to the system:
    add-to-kb-start
    add-to-kb-with-string-output
    add-to-kb-end-with-string-output
    say-iframe!-with-string-output
    say-topic-with-string-output

    functions for browsing:
    find-answers-with-string-output
    similar-uframes-of->string
    show-topics-with-string-output
    show-topic-summary-with-string-output

    These functions allow you to build front ends for the learner without the need to understand its architecture or algorithms.

    The file learner-client-exports.txt dictates which functions are exported.

    8.1 The Syntax of a Term

    In describing the API functions, we introduce the notion of a term. A term is the input you can give to the system to describe a frame, either to retrieve a frame that already exists or a frame you would like to create. The function that parses these terms is term->protoframe; it is defined below. Interface functions rely on it to parse the terms they receive as arguments.

    A term can be one of the following:

    When denoting a Uframe, a list consisting of a string and an optional probability and optional matched extra slots and values:

    (list string [probability] [slot value]*)

    The string may contain variables, which are denoted by surrounding them in ?. For example:

    ("cats eat mice")
    (list "cats eat ?mice?")
    (list "cats can bark" 0)
    (list "pigs cannot fly" slot1 value1 slot2 value2)

    are all valid terms denoting Uframes.

    When denoting an srvframe, a set of 3 strings with an optional number denoting probability and optional matched extra slots and values.

    For example,


    ("cats" "eat" "mice")

    is a valid term denoting an srvframe.

    Function: term->protoframe term . other-slots-values
    Given a user-inputtable description of a frame term, creates and returns a protoframe described by term and including other-slots-values.

    This function is verbose, i.e. it will output lines describing the problem if the inputted expression cannot be parsed.

    8.2 The Learner Interface API Functions

    This section defines all the commands that constitute the interface api of the learner.

    Function: add-to-kb . term
    Given a term, this function adds it to the knowledge base, producing lineouts if there are any problems (e.g. can't parse, the system already believes the opposite, etc.).

    Registers the assertion in this user's history.

    If term has a user slot, makes that user the source of the frame that is added.

    See also add-to-kb-with-string-output.

    Please note: When specifying slotted frames, if the atoms mentioned in the term do not exist, neither the atom nor the frame gets created. This is because parsing provides a part of speech and some other useful information, so we prefer for all atoms to be created as a result of parsing and then used in slotted frames as needed.

    Function: add-to-kb-with-string-output . term
    A version of add-to-kb that returns its output as a string rather than printing it out.

    Examples:

    (add-to-kb-with-string-output "a cat has a tail" user "test")
    (add-to-kb-with-string-output "cat" "have" "tail" 0.9 user "test")

    Function: add-to-kb-start . slots-values
    This function should be called before adding zero or more assertions to the system. See also add-to-kb-end-with-string-output. The set of assertions enclosed between these two commands is treated as one input session, to which learner responds.

    slots-values should contain the user slot and value when running under multiple users.

    Function: add-to-kb-end . slots-values
    This function should be called after all the new things the user has said in one submission have been added to the knowledge base. This function produces the output to present as the reaction to what the user has said.

    slots-values should contain the user slot and value when running under multiple users.


    Function: add-to-kb-end-with-string-output . slots-values
    A version of add-to-kb-end that returns its output as a string rather than printing it out.

    Function: find-answers . term
    Given a term, interprets it as a question to the system and tries to find or infer knowledge that would constitute an answer to this question.

    If term has a user slot and a frame is created to be answered later, makes that user the source of the frame.

    See Section 8.1 [The Syntax of a Term], page 19, for an explanation of how to specify terms.

    For example, if you ask (find-answers "?Xes? have tails") or (find-answers "a ?X? has a tail"), with just the initial database loaded, you will see output similar to:

    ;; Finding answers for "?Xes? have tails"
    ;; Found these answers:
    a dog has a tail
    a cat has a tail
    ;; Nothing (void) was returned

    Function: find-answers-with-string-output
    A version of find-answers that returns its output as a string rather than printing it out.

    Function: say . term
    This is a wrapper for asserting a single assertion and getting a reply from the system. Given a term, adds it to the kb, registers it in the history, and poses the relevant questions.

    See also:

    add-to-kb-start
    add-to-kb-with-string-output
    add-to-kb-end-with-string-output

    See Section 13.1 [Iframe Functions], page 30, for the definition of say-iframe! and say-iframe!-with-string-output.

    Function: say-topic arg
    This is the main way a user can set the current topic of conversation. Given an arg (an atom frame or a string), this sets the current topic to be the base-form of arg.

    Function: say-topic-with-string-output
    A version of say-topic that returns its output as a string rather than printing it out.

    Function: show-topic-summary string . [slots]
    Shows the summary of the topic indicated by the string and the optional slots.

    Function: show-topic-summary-with-string-output topic-str
    A version of show-topic-summary that returns its output as a string rather than printing it out.


    Function: show-topics min max
    Given a min and a max, this top-level command outputs topics about which at least min and at most max uframes are known. If max is -1, no upper limit is used.

    Function: show-topics-with-string-output
    A version of show-topics that returns its output as a string rather than printing it out.

    Function: similar-uframes-of->string uframe
    Given a uframe, returns a string describing uframes at least similarity-threshold (hardcoded to a certain value) similar to it.

    Accounts for probability classes of the frames.


    9 FramerD and Link Grammar Parser

    learner depends on two major pieces of software: FramerD is a Scheme interpreter married to a flexible database implementation, and the Link Grammar Parser is an English parser from cmu.

    9.1 FramerD

    FramerD is a distributed object-oriented database used by learner. FramerD is available under the lgpl and includes persistent storage and indexing facilities that can scale to very large database sizes, as well as a language, FDscript, a superset of Scheme.

    learner is written in FDscript.

    FramerD introduces the concepts of frames, slots, and slotmaps, which we use in describing how learner works.

    FramerD also comes with a version of the WordNet lexical database and a released part of the cyc ontology, combined and converted into the FramerD format (the database is called bricolage, or brico). For now, learner uses the WordNet component only.

    FramerD also has many attractive features:

    Built-in support for distributed operation.

    Built-in functions for xml and html parsing and output.

    Built-in support for perl-like regular expression pattern matching.

    To do the more advanced things with the learner, you will need to understand FDscript.

    FramerD documentation, covering the database implementation and the FDscript language, was available at the time of writing at http://framerd.org.

    9.2 Link Grammar Parser

    The Link Grammar Parser is a constraint-based English-language parser that tries to assign a consistent set of linkages between all words in a sentence.

    The Link Grammar Parser is an impressive system in its own right. The parser is written in C, and source code is freely available for non-commercial purposes.

    The complete distribution and documentation of the link grammar parser was available at the time of writing at http://www.link.cs.cmu.edu/link.

    Here is an example of how the parser would parse the sentence "cats eat mice":

    +--Sp---+-Op-+
    |       |    |
    cats.n  eat  mice.n

    The above parsing contains the following information about the word cats:

    cats is a noun: cats.n,

    cats has the subject role in the sentence: it is on the left side of an S* link; another way to say this is that cats forms an S- link with the word eat,


    cats and eat are plural: they are linked by the Sp link, in which the lowercase p denotes plurality.
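The link-label notation can be illustrated with a small decoder. The function and the tuple representation of the parse are our sketch, not part of the parser's API; the notation itself (leading capital letters name the link type, trailing lowercase letters are subscripts such as p for plural) follows the Link Grammar documentation.

```python
def decode_link(label):
    """Split a link label into (major link type, subscripts)."""
    i = 0
    while i < len(label) and label[i].isupper():
        i += 1
    return label[:i], label[i:]

# the parse of "cats eat mice" as (label, left word, right word) links
links = [("Sp", "cats", "eat"), ("Op", "eat", "mice")]

for label, left, right in links:
    major, subscript = decode_link(label)
    print(major, subscript, left, right)

# cats is on the left of the S link, so it is the subject, and the
# "p" subscript marks the cats/eat pairing as plural.
```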

    learner currently only accepts sentences that can be parsed completely. When multiple parsings can be found by the parser, the learner uses the first one.

    This can lead to some unexpected results. For example, in the sentence "cats have sharp claws", sharp gets parsed as a noun in the first returned parsing. This does not, however, cause any known difficulties with the operation of the learner.

    According to the authors of the Link Grammar Parser, a statistical version of the parser is under development and may become available sometime in the future to address this.


    10 How Frames Work

    All frames in learner, be they for representing atoms, assertions, or parser links, have some fundamental mechanisms (such as inheritance) and policies (such as rules on mutating frames) that apply to them. We start by overviewing these mechanisms and policies and then go on to describe how more specific types of frames work.

    Frames in learner have an inheritance mechanism. Namely, the ifget command works the same way FDscript's built-in fget command does, except it will recursively follow a frame's inherits-from slot until it fails or gets to a frame that has a value for that slot.

    It is a policy that frames which are OIDs are not to be mutated. Mutating them would lead to difficulties with the need to update their indexing and with knowing what the original author asserted. Rather, protoframes that inherit from oids are created. Mutation of protoframes is permitted where appropriate, as in committing scheduled changes (see Chapter 11 [Variables], page 27). To effect this policy, use the learner function fset-safe!.

    Relevant functions: ifget
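A minimal sketch of this inheritance rule, modelling frames as plain dictionaries (learner's real frames are FramerD OIDs and slotmaps, so the data structures here are illustrative only):

```python
def ifget(frames, frame_name, slot):
    """Return slot's value, recursively following inherits-from until
    the lookup fails or reaches a frame with a value for that slot."""
    frame = frames.get(frame_name)
    while frame is not None:
        if slot in frame:
            return frame[slot]          # first frame with a value wins
        parent = frame.get("inherits-from")
        frame = frames.get(parent)      # keep walking up, or fail
    return None

# a protoframe inheriting from an (immutable) base frame
frames = {
    "proto-cat": {"inherits-from": "cat", "mood": "sleepy"},
    "cat": {"legs": 4},
}
print(ifget(frames, "proto-cat", "mood"))   # found locally
print(ifget(frames, "proto-cat", "legs"))   # inherited from "cat"
```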

    10.1 Representing Atoms and Topics

    learner's knowledge consists of assertions, but to work with these effectively, we organize assertions around topics. Currently, only nouns in their base form can be topics, but gerunds ("skating" in "some people like skating") and noun phrases ("beach chair") can also be added.

    Interactions with learner revolve around topics, and similarity is also measured between a pair of topics (the similarity of sentences helps compute the similarity of topics).

    Assertions are said to have a main topic (sometimes not present) and, more generally, topics.

    Related functions:

    Function: assertion-main-topic assertion
    Given an assertion, returns its main topic, if any.

    Function: assertion-topics assertion
    Given an assertion, returns a choice of conses, the first element of each being a topic, and the second a score of how much the assertion is about the topic.

    Function: topic-total-mentions topic
    Given a topic, returns how many assertions are indexed by it.

    Function: topic-frequency topic
    Given a topic, returns how many certain-yes? assertions mention it.

    Function: topic-absolute-frequency-weight topic
    Given a topic, returns its weight (decreases with frequency, i.e. is less for common words).


    10.2 Representing English and Internal Knowledge

    The preceding section explained how frames are used to represent individual atoms and topics. This section explains how frames are used to represent compound structures that hold assertions.

    There are two types of frames for holding assertion-level information. One is a Uframe (for Utterance frame), used to hold information in a natural language, and the other is a slotted frame, to hold information for internal processing.

    In this section, we explain the common features of all frames representing assertion-level information. More details about the specifics of each type are available in the sections that follow.

    10.2.1 Representing English Uframes

    A Uframe (for an Utterance frame) is a structure for holding utterances in English (or, potentially, any natural language).

    Uframes are created from the output of the link grammar parser and roughly mirror it, although they have additional slots.

    A Uframe contains the following slots:

    parsed

    simplified

    utterance-type

    significant-keys-links-hash

    For efficient processing, we envision that Uframes will be recognized into internal data structures, slotted frames.

    10.2.2 Representing Derived Knowledge Slotted Frames

    An SRVframe is an example of a slotted frame. srv stands for Subject-Relation-Value. Accordingly, an SRVframe has three slots: the subject, relation, and value.

    See Chapter 11 [Variables], page 27, for a description of frames with and without substitutions.

    10.2.3 Functions Related to Assertions

    Adding assertions: add-to-kb, add-to-kb-with-string-output, say.
    Similarity: similar-uframes.


    11 Variables

    Frames may be assertions, if they have no substitutable atoms, or templates, if they do.

    When a frame has a substitutable atom (a variable), the atom is shown surrounded with question marks. Here is a quick example of making the atom cat variable:

    (let ((frame1 (frame-atom-is-substitution
                    (the uframe "cats eat mice")
                    (a "cat"))))
      (frame-finalize! frame1)
      frame1)

    The above example returns a protoframe with the what being "?cats? eat mice".

    Whether a frame is an assertion also determines which index it is indexed in (see Chapter 12 [Indexing], page 29).

    Note that for efficiency reasons, learner has a system of scheduling updates to a frame, keeping a log of changes to be made. The log may contain directives such as:

    make the atom cat a variable,

    substitute dog for cat, and

    stop treating dog as a variable.1

    You need to commit a log before examining the slots that the scheduled changes affect. frame-update! and frame-finalize! do that. The ! functions (such as frame-variables! and ifget!) are the same as their non-! counterparts, but they update the frames they operate on. Generally, updating a frame that has previously been updated is a low-cost operation that does not mutate the frame.
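The scheduled-update pattern can be sketched as follows; the class and method names are illustrative, not learner's API, but the log mirrors the example directives above:

```python
class ProtoFrame:
    def __init__(self, what, variables=()):
        self.what = what
        self.variables = set(variables)
        self._log = []                 # directives awaiting commit
        self._clean = True             # True once the log is applied

    def schedule(self, directive, atom, replacement=None):
        """Record a change without applying it yet."""
        self._log.append((directive, atom, replacement))
        self._clean = False

    def update(self):
        """Apply the log (like frame-update!); cheap if already clean."""
        if self._clean:
            return self
        for directive, atom, replacement in self._log:
            if directive == "make-variable":
                self.variables.add(atom)
            elif directive == "substitute":
                self.what = self.what.replace(atom, replacement)
                self.variables.discard(atom)
            elif directive == "unmake-variable":
                self.variables.discard(atom)
        self._log.clear()
        self._clean = True
        return self

f = ProtoFrame("cats eat mice")
f.schedule("make-variable", "cat")     # make the atom cat a variable
f.schedule("substitute", "cat", "dog") # substitute dog for cat
f.schedule("unmake-variable", "dog")   # stop treating dog as a variable
f.update()          # must commit before reading the affected slots
print(f.what, sorted(f.variables))
```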

    11.1 Substitution Functions

    Function: variable? frame
    Returns #t for a template and #f for an assertion.

    Function: frame-atom-is-substitution frame atom
    Given a frame and an atom, this returns a proto-frame where atom is now a variable (i.e. a candidate for substitution).

    Function: frame-atoms-are-substitutions frame atoms
    Given a frame and a list of atoms, this returns a proto-frame where atoms are now variables (i.e. candidates for substitution).

    Function: frame-atom-is-not-substitution frame atom
    Given a frame and an atom, this returns a proto-frame where atom is no longer a candidate for substitution.

    1 The term substitution is used interchangeably with variable.


    Function: frame-atoms-are-not-substitutions frame atoms
    Given a frame and a list of atoms, this returns a proto-frame where atoms are no longer candidates for substitution.

    Function: frame-substitute-atom frame atom-from args
    Given a frame, an atom-from, and the substitution args (which in the simplest case is atom-to), returns a proto-frame where that substitution has been scheduled.

    Function: frame-substitute-atoms frame atom-map
    Given a frame and an atom-map, returns a proto-frame where these substitutions have been scheduled.


    12 Indexing

    The learner currently uses six index files:

    assertions.index Indexes the assertions (Uframes and slotted frames) that learner knows about. This includes uncertain assertions (those with probability close to .5).

    inferences-lhs.index Indexes Iframes by their left-hand sides.

    inferences-rhs.index Indexes Iframes by their right-hand sides.

    memoization.index Holds results of computation that never change but that may take a while to compute.

    templates.index Indexes templates: Uframes and slotted frames that have variables in them.

    types.index Indexes all frames by their type and what.

    Uframes are indexed by the pairs (Wframe . #t) and by the pairs (Wframe . link-with-direction). For example, the Uframe (the uframe "cats eat mice") will be indexed by:1

    (@3af3492b/57802a"WFRAME: eat" . #t)
    (@3af3492b/57802a"WFRAME: eat" . S-)
    (@3af3492b/57802a"WFRAME: eat" . O+)
    (@3af3492b/57802d"WFRAME: mouse" . #t)
    (@3af3492b/57802d"WFRAME: mouse" . O-)
    (@3af3492b/578014"WFRAME: cat" . #t)
    (@3af3492b/578014"WFRAME: cat" . W-)
    (@3af3492b/578014"WFRAME: cat" . S+)

    A pair such as (@3af3492b/57802a"WFRAME: eat" . S-) means that this frame can be retrieved by a low-level fdscript call (find-frames assertions-index (the wframe "eat") S-) or by a higher-level learner call (all-relevant-assertions (the wframe "eat") S-).

    Slotted frames are indexed by the pair (slot-name . wframe). For example, an srvframe "[cat|have|tail]" will have (subject . @3af3492b/578014"WFRAME: cat") as one of its indices.

    1 The exact numbers after the @ signs will vary.
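An illustrative in-memory analogue of this indexing scheme follows; the index keys mirror the (wframe . #t) and (wframe . link-with-direction) pairs above, while the data and function are a toy stand-in, not learner's actual index files:

```python
from collections import defaultdict

def index_uframe(index, frame_name, word_links):
    """word_links: {wframe: [links-with-direction]}."""
    for wframe, links in word_links.items():
        index[(wframe, True)].add(frame_name)       # word occurs at all
        for link in links:
            index[(wframe, link)].add(frame_name)   # word in that role

index = defaultdict(set)
index_uframe(index, "cats eat mice",
             {"cat": ["W-", "S+"], "eat": ["S-", "O+"], "mouse": ["O-"]})
index_uframe(index, "dogs eat meat",
             {"dog": ["W-", "S+"], "eat": ["S-", "O+"], "meat": ["O-"]})

# analogue of (find-frames assertions-index (the wframe "eat") S-):
# all frames where "eat" carries an S- link (i.e. has a subject)
print(sorted(index[("eat", "S-")]))
print(sorted(index[("mouse", True)]))
```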


    13 Iframes

    An Iframe is basically a rule of a rule-based system. It has the two structure slots left-terms and right-term. These hold its lhs (left-hand side, preconditions) and rhs (right-hand side, postcondition).

    An Iframe's lhs is indexed in iframes-lhs.index and its rhs in iframes-rhs.index.

    The left-terms value is a list of frame templates, and the right-term value is a single frame, typically a template (having an assertion is allowed, but special provisions not to index it in assertions need to be made).

    All of the terms on the left-hand side must contain substitutions: atoms where anything can be plugged in (see Chapter 11 [Variables], page 27).

    For a rule, when the lhs is satisfied, the rhs should also hold true.

    Here is the what of a sample Iframe:

    "a ?cat? is sleeping => a ?cat? is awake "

    This Iframe implies that if X is sleeping, X is not awake.

    This frame can be instantiated with the call

    (frame-plug-in-atom
     (the Iframe "a ?cat? is sleeping => a ?cat? is awake [p=0]")
     (the Wframe "cat")
     (the Wframe "dog"))

    See Section 5.3 [Asking Learner Questions], page 12, for more on inputting Iframes.
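A toy sketch of instantiating such a rule follows. The string-template representation and the plug_in function are illustrative assumptions; learner's frame-plug-in-atom operates on frames, not strings. The [p=0] probability class (the rhs is false) is carried alongside the rhs text:

```python
import re

def plug_in(template, bindings):
    """Replace every ?var? in template using bindings."""
    return re.sub(r"\?(\w+)\?", lambda m: bindings[m.group(1)], template)

def fire(iframe, bindings):
    """Instantiate a rule's lhs terms and rhs term with bindings."""
    lhs, (rhs, p) = iframe
    return [plug_in(t, bindings) for t in lhs], (plug_in(rhs, bindings), p)

# "a ?cat? is sleeping => a ?cat? is awake [p=0]"
iframe = (["a ?cat? is sleeping"], ("a ?cat? is awake", 0))

# plugging dog in for the variable cat, as frame-plug-in-atom does
print(fire(iframe, {"cat": "dog"}))
```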

    13.1 Iframe Functions

    learner provides functions for creating, retrieving, and doing inference with Iframes.

    Function: say-Iframe! lhs-expr rhs-expr . [other-slots-values]
    Given lhs-terms-lst, rhs-term, and optional other-slots-values, adds and returns the Iframe.

    Deciding what the variables are:
    All variables indicated in any term (lhs or rhs) become the Iframe's variables. If lhs-terms-lst and rhs-term indicate no variables, variables are created by taking all the nouns that are both in the lhs and the rhs. For example, asserting (say-Iframe! (("a cat is not a dog")) ("a cat cannot bark")) creates an Iframe such as:

    "IFRAME: a ?cat? is a dog => a ?cat? can bark "

    Controlling whether constituent frames are also added to the kb:
    When assert-constituents is in other-slots-values with a value of #t or #f, will assert (or not assert) the constituent frames to the db accordingly. Otherwise (when assert-constituents has another value or is not present), will decide whether to assert the constituents as follows:

    If either the lhs or rhs mentions any of X, Y, Z, Xes, Ys, Zs, will not assert.


    14 Learner and the Community

    14.1 Population Contributing to the Learner

    learner is, in a way, a bet on people. Much of learner's power comes from having access to an oracle: the contributing population collectively holds an enormous body of diverse knowledge. As people contribute it, learner hangs on to it, ever rising in its sophistication.

    The opportunity to solve hard problems with incremental contributions of many has not existed historically. One of the project's goals is to explore and to learn more about the collaborative approach to solving problems.

    The demands of the project are diverse, and successful growth of the learner requires contributions on different levels.

    Basically, the needs form a pyramid:

    core

    write plug-ins

    contribute inference rules

    contribute and verify simple assertions

    Luckily, we can expect the contributors to naturally be distributed in roughly such a pyramid as well.

    This is because the prior experience required is an inverse pyramid:

    some lisp / scheme experience, knowledge representation / AI background

    knowledge rep / ai background or interest, some programming

    analytical skills, familiarity with reasoning

    possess common sense

    The amount of effort required to contribute to the lowest level of the pyramid is also much less (and has smaller overall effect) than to contribute to a higher level.

    To put it another way, the natural distribution of contributors is likely to roughly match the need.

    14.2 Wishlist

    Most wanted features expanding learner's usability:

    An open source Web interface for the learner. Note that FDscript has built-in support for web scripting and xml and html generation and processing.

    A client-side front-end for Learner (perhaps a VB application for the Windows platform)


    A port of the FramerD / Link Grammar Parser glue code to the Windows platform, so Windows machines can run Learner

    RPMs of release code for the convenience of Red Hat Linux users

    Code to make it easy to share data between individual deployments of the system (or between a central server and individual copies)

    Code to import knowledge into Learner from OpenMind Commonsense and Mindpixel

    Code to confirm or invalidate assertions made previously by other users

    Most wanted features making Learner smarter:

    Knowledge and algorithms to solve a specific domain problem. That is, specific inference rules and questions that would make Learner smart about a specific niche (such as the PC hardware field) so that it can collect and allow lookup of knowledge in that domain.

    Code for assigning credit to contributors of knowledge. That is, code that would compute how true and useful the knowledge added by each user has been, and assign each user a positive impact rating.

    Code to hypothesize new rules (Iframes).

    Code for identifying areas of interest to Learner, to gather knowledge in a directed way.

    Code for advanced question-answering and reporting, e.g. presenting a structured report on everything known about an object.

    There are many more exciting directions; feel free to contact us to discuss them.

    An upcoming feature of Learner is persistent (on-disk) storage of the assertions Learner would like to find out about. This repository will bear the dramatic name "the purgatory". There are several reasons a frame can be added to the purgatory:

    questions and assertions whose truth people have asked about and that we could not answer,

    things that were hypothesized by one of Learner's hypothesis-making algorithms, and

    things that are contradictions (i.e., Learner can infer them to be both true and not true based on what it believes).
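    As a rough illustration of the three categories above (the names and fields here are hypothetical and are not Learner's actual on-disk format), each purgatory entry would pair a frame with the reason it was deferred:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical sketch; Learner's real purgatory representation may differ.
class PurgatoryReason(Enum):
    UNANSWERED = "asked by a person, but we could not answer"
    HYPOTHESIZED = "produced by a hypothesis-making algorithm"
    CONTRADICTION = "inferable as both true and false"

@dataclass
class PurgatoryEntry:
    frame: str              # the deferred assertion, in plain English
    reason: PurgatoryReason

entry = PurgatoryEntry("apples are vegetables", PurgatoryReason.CONTRADICTION)
print(entry.reason.name)  # → CONTRADICTION
```

    Tagging each entry with its reason would let later processing treat unanswered questions, hypotheses, and contradictions differently.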

    14.3 Bug Reports

    You can contact the maintainers at [email protected].

    When reporting a problem, please include as much of the following as is relevant so that we can address it:

    a description of the nature of the problem,

    a self-contained (preferably small) case that triggers it,

    the output you got and the output you expected,

    what was in the database at the time (preferably, nothing), and

    information on your setup and configuration if appropriate.

    Your patches are also welcome and will be incorporated in a timely fashion.


    15 Related Work

    There are some simpler systems on the Web for your enjoyment.

    20Q.net (http://www.20q.org) is a learning program that plays the game of 20 questions. It gathers and re-uses information useful in guessing an object. It is an interesting demonstration of the approach on a much narrower problem than the one Learner addresses.

    Guess the Dictator or Sit-Com Character is a game on the Web that poses previously gathered questions, trying to guess the dictator or sit-com character you are thinking of.

    A New Database Direction (http://www.zdnet.co.uk/itweek/analysis/2001/21/enterprise/)

    OpenMind (http://openmind.org) is a site dedicated to the open approach to knowledge collection from many Web users.

    OpenCYC (http://opencyc.com) is a project by CYCorp dedicated to sharing parts of the Cyc database and the inference engine.


    16 History and Acknowledgements

    The concept of Learner was originally developed by Tim and Anatoli Chklovski, as were the similarity, analogy, and question-pruning algorithms. Matthew Fredette and Tim Chklovski have cooperated on the Learner implementation.

    We are grateful to Alex Vasserman, who has contributed some Link Grammar Parser glue code.

    We are grateful to Push Singh for his feedback and for sharing his experiences on the OpenMind Commonsense project with us.

    We are also grateful to Kenneth Haase for his continued support and development of FramerD and his responsiveness in personal communication.


    Function Index

    A

    add-to-kb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    add-to-kb-end . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    add-to-kb-end-with-string-output. . . . . . . . . . 21

    add-to-kb-start . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    add-to-kb-with-string-output. . . . . . . . . . . . . . 20

    assertion-main-topic. . . . . . . . . . . . . . . . . . . . . . . 25

    assertion-topics . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    C

    corresponding-item. . . . . . . . . . . . . . . . . . . . . . . . . 14

    F

    find-answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    find-answers-with-string-output. . . . . . . . . . . 21

    frame-atom-is-not-substitution. . . . . . . . . . . . 27

    frame-atom-is-substitution . . . . . . . . . . . . . . . . 27

    frame-atoms-are-not-substitutions. . . . . . . . . 28

    frame-atoms-are-substitutions. . . . . . . . . . . . . 27

    frame-substitute-atom . . . . . . . . . . . . . . . . . . . . . 28

    frame-substitute-atoms . . . . . . . . . . . . . . . . . . . . 28

    I

    infer-from . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    infer-one-iframe . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    R

    relevant-iframes . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    S

    say . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    say-Iframe! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    say-iframe!-with-string-output. . . . . . . . . . . . 31

    say-topic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    say-topic-with-string-output. . . . . . . . . . . . . . 21

    scored-assertions-on-topic . . . . . . . . . . . . . . . . 14

    show-topic-summary. . . . . . . . . . . . . . . . . . . . . . . . . 21

    show-topic-summary-with-string-output . . . . 21

    show-topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

    show-topics-with-string-output. . . . . . . . . . . . 22

    similar-uframes-of->string. . . . . . . . . . . . . 14, 22

    similar-Uframes-set. . . . . . . . . . . . . . . . . . . . . . . . 14

    T

    term->protoframe . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    topic-absolute-frequency-weight. . . . . . . . . . . 25

    topic-frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    topic-total-mentions. . . . . . . . . . . . . . . . . . . . . . . 25

    V

    variable? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27


    Concept Index

    A

    assertions, vs. templates . . . . . . . . . . . . . . . . . . . . . . 27

    atoms, substitutable . . . . . . . . . . . . . . . . . . . . . . . . . 27

    C

    combining the questions . . . . . . . . . . . . . . . . . . . . . . 17

    F

    FramerD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

    frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    frames, inheritance in . . . . . . . . . . . . . . . . . . . . . . 25

    frames, mutating . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    frames, slotted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    frames, SRV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    I

    Iframes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    indexing, uframes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    inheritance in frames. . . . . . . . . . . . . . . . . . . . . . . . . 25

    L

    Link Grammar Parser . . . . . . . . . . . . . . . . . . . . . . . . 23

    O

    ordering the questions . . . . . . . . . . . . . . . . . . . . . . . . 18

    P

    pruning the questions . . . . . . . . . . . . . . . . . . . . . . . . 17

    purgatory, the . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    Q

    questions, combining . . . . . . . . . . . . . . . . . . . . . . . . . 17

    questions, ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    questions, pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    S

    slotted frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    SRV frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    substitutions (variables) . . . . . . . . . . . . . . . . . . . . . . 27

    Ttemplates, vs. assertions. . . . . . . . . . . . . . . . . . . . . . 27

    terms, writing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    U

    Uframes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    uncertain assertions, management of . . . . . . . . . . 33

    utterances (in plain English), representing . . . . . 26

    V

    variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27