self-improvement through self-understanding: model-based reflection for agent adaptation


TRANSCRIPT

Page 1: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

J. William Murdock 1/42

Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

J. William Murdock
Intelligent Decision Aids Group
Navy Center for Applied Research in Artificial Intelligence
Naval Research Laboratory, Code 5515
Washington, DC 20375
[email protected]
http://bill.murdocks.org

Presentation at NIST – March 18, 2002

Page 2: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Adaptation

• People adapt very well.
  – They figure out how to do new things.
  – If something doesn’t work, they try something else.
  – They understand how and why they are doing things.
• Computer programs do not adapt very well.
  – They can only do what they are programmed for.
  – They keep making the same mistakes.
  – They have no understanding of themselves.

Can we make computer programs adapt?

Page 3: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

REM (Reflective Evolutionary Mind)

• Operating environment for intelligent agents
• Provides support for adaptation to new functional requirements
• Uses functional models, generative planning, and reinforcement learning
• J. William Murdock and Ashok K. Goel

Page 4: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Example: Web Browsing Agent

• A mock-up of web browsing software
• Based on Mosaic for X Windows, version 2.4
• Imitates not only behavior but also internal process and information of Mosaic 2.4

(Diagram: document types handled by the browser: ps, pdf, txt, html.)

Page 5: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Example: Disassembly and Assembly

• Software agent for disassembly in the domain of cameras
  – Information about cameras
  – Information about relevant actions
    • e.g., pulling, unscrewing, etc.
  – Information about disassembly processing
    • e.g., decide how to disconnect subsystems from each other and then decide how to disassemble those subsystems separately.
• Agent now needs to assemble a camera

Page 6: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

TMK (Task-Method-Knowledge)

• TMK models provide the agent with knowledge of its own design.
• TMK encodes:
  – Tasks: functional specification / requirements and results
  – Methods: behavioral specification / composition and control
  – Knowledge: domain concepts and relations

(Diagram: an Access task decomposed into Request, Receive, and Store, spanning remote and local knowledge: URLs, servers, documents, etc.)

Page 7: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

REM Reasoning Process

(Diagram: an unimplemented task and a set of input values are passed to Adaptation, which produces an adapted method and an adapted implemented task; Execution then runs the implemented task on the input values, yielding a set of output values and a trace.)

Page 8: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Adaptation Process

(Diagram: a task can be adapted by Proactive Model Transfer from a similar implemented task, by Failure-Driven Model Transfer from an existing method and a trace, or by Generative Planning from a set of input values; together with a Situator (for Q-Learning), the process produces an adapted method and an adapted implemented task.)

Page 9: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Execution Process

(Diagram: given an implemented task, its method, and a set of input values, execution cycles through Select Method, Select Next Task Within Method, and Execute Primitive Task, producing a set of output values and a trace.)
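The execution cycle above can be sketched as a tiny recursive interpreter. This is a toy stand-in, not REM's actual code; the dict-based task representation and the first-applicable-method policy are illustrative assumptions.

```python
# Sketch of the execution loop: to execute a task, either run its primitive
# procedure or select a method and execute that method's subtasks in order.
# Toy model only; task/method dicts and the selection policy are assumptions.

def execute(task, state):
    if task.get("procedure"):                          # primitive task
        return task["procedure"](state)
    method = select_method(task["methods"], state)     # "Select Method"
    for subtask in method["series"]:                   # "Select Next Task Within Method"
        state = execute(subtask, state)                # primitives run at the leaves
    return state

def select_method(methods, state):
    # Toy policy: first method whose provided condition holds.
    return next(m for m in methods if m.get("provided", lambda s: True)(state))

# A two-step task accomplished by one method:
task = {"methods": [{"series": [{"procedure": lambda s: s + ["request"]},
                                {"procedure": lambda s: s + ["receive"]}]}]}
```

In REM itself the selection points are decided by Q-learning rather than a fixed policy, and execution additionally records a trace.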

Page 10: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Selection: Q-Learning

• Popular, simple form of reinforcement learning.
• In each state, each possible decision is assigned an estimate of its potential value (“Q”).
• For each decision, preference is given to higher Q values.
• Each decision is reinforced, i.e., its Q value is altered based on the results of the actions.
• These results include actual success or failure and the Q values of the next available decisions.
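The reinforcement step described above can be sketched as a standard tabular Q-learning update. This is a generic textbook sketch, not REM's implementation; the learning-rate and discount parameters (`alpha`, `gamma`) and their values are assumptions.

```python
# Minimal tabular Q-learning sketch: Q values keyed by (state, decision),
# updated from the actual result (reward) plus the best Q value among the
# next available decisions. Illustrative only; not REM's code.

from collections import defaultdict

class QTable:
    def __init__(self, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)   # Q values start at 0
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount on future decisions (assumed value)

    def update(self, state, decision, reward, next_state, next_decisions):
        best_next = max((self.q[(next_state, d)] for d in next_decisions),
                        default=0.0)
        key = (state, decision)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])

qt = QTable()
# One reinforcement after a successful decision:
qt.update("s0", "method-A", 1.0, "s1", ["method-B", "method-C"])
```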

Page 11: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Q-Learning in REM

• Decisions are made for method selection and for selecting new transitions within a method.
• A decision state is a point in the reasoning (i.e., task, method) plus the set of all decisions which have been made in the past.
• Initial Q values are set to 0.
• REM either decides on the option with the highest Q value or randomly selects an option with probabilities weighted by Q value (configurable).
• A decision receives positive reinforcement when it leads immediately (without any other decisions) to the success of the overall task.
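The configurable selection described above (greedy versus Q-weighted random) can be sketched as follows. This is an illustrative sketch, not REM's code, and it assumes non-negative Q values for the weighted case.

```python
# Sketch of the configurable selection policy: either pick the option with
# the highest Q value, or sample with probability proportional to Q.
# Illustrative only; assumes non-negative Q values when weighted=True.

import random

def select(q_values, weighted=False, rng=random):
    # q_values: dict mapping option -> Q value
    if not weighted:
        return max(q_values, key=q_values.get)
    total = sum(q_values.values())
    if total == 0:
        return rng.choice(list(q_values))   # all-zero Q: choose uniformly
    r = rng.uniform(0, total)
    acc = 0.0
    for option, q in q_values.items():
        acc += q
        if r <= acc:
            return option
    return option
```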

Page 12: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Task-Method-Knowledge Language (TMKL)

• A new, powerful formalism of TMK developed for REM.
• Uses LOOM, a popular off-the-shelf knowledge representation framework: concepts, relations, etc.

REM models not only the tasks of the domain but also itself in TMKL.

Page 13: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Tasks in TMKL

• All tasks can have input & output parameter lists and given & makes conditions.
• A non-primitive task must have one or more methods which accomplish it.
• A primitive task must include one or more of the following: source code, a logical assertion, a specified output value.
• Unimplemented tasks have neither methods nor any of these primitive implementations.

Page 14: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

TMKL Task

(define-task communicate-with-www-server
  :input (input-url)
  :output (server-reply)
  :makes (:and (document-at-location (value server-reply)
                                     (value input-url))
               (document-at-location (value server-reply)
                                     local-host))
  :by-mmethod (communicate-with-server-method))

Page 15: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Methods in TMKL

• Methods have provided and additional result conditions which specify incidental requirements and results.
• In addition, a method specifies a start transition for its processing control.
• Each transition specifies requirements for using it and a new state that it goes to.
• Each state has a task and a set of outgoing transitions.

Page 16: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Simple TMKL Method

(define-mmethod external-display
  :provided (:not (internal-display-tag (value server-tag)))
  :series (select-display-command
           compile-display-command
           execute-display-command))

Page 17: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Complex TMKL Method

(define-mmethod make-plan-node-children-mmethod
  :series (select-child-plan-node
           make-subplan-hierarchy
           add-plan-mappings
           set-plan-node-children))

(tell (transition>links make-plan-node-children-mmethod-t3
                        equivalent-plan-nodes
                        child-equivalent-plan-nodes)
      (transition>next make-plan-node-children-mmethod-t5
                       make-plan-node-children-mmethod-s1)
      (:create make-plan-node-children-terminate transition)
      (reasoning-state>transition make-plan-node-children-mmethod-s1
                                  make-plan-node-children-terminate)
      (:about make-plan-node-children-terminate
              (transition>provided
                '(terminal-addam-value (value child-plan-node)))))

Page 18: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Knowledge in TMKL

Foundation: LOOM
– Concepts, instances, relations
– Concepts and relations are instances and can have facts about them.

Knowledge representation in TMKL involves LOOM plus some TMKL-specific reflective concepts and relations.

Page 19: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Some TMKL Knowledge Modeling

(defconcept location)
(defconcept computer :is-primitive location)
(defconcept url :is-primitive location :roles (text))
(defrelation text :range string :characteristics :single-valued)
(defrelation document-at-location :domain reply :range location)
(tell (external-state-relation document-at-location))

Page 20: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Sample Meta-Knowledge in TMKL

• relation characteristics
  – single-valued / multiple-valued
  – symmetric, commutative
• relations over relations
  – external / internal
  – state / definitional
• generic relations
  – same-as
  – instance-of
  – inverse-of
• concepts involving concepts
  – thing
  – meta-concept
  – concept

Page 21: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Web Browsing Agent

• Interactive Domain: Web agent is affected by the user and by the network
• Dynamic Domain: Both users and networks often change
• Knowledge Intensive Domain: Documents, networks, servers, local software, etc.

Mock-up of a web browser: steps through the web-browsing process.

Page 22: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Tasks and Methods of Web Agent

(Diagram: task-method decomposition. Process URL has the Process URL Method: Communicate with WWW Server, then Display File. Communicate with WWW Server has the Communicate with WWW Server Method: Request from Server, then Receive from Server. Display File has the Display File Method: Interpret Reply, then Display Interpreted File. Display Interpreted File has two methods: External Display (Select Display Command, Compile Display Command, Execute Display Command) and Internal Display (Execute Internal Display).)

Page 23: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Example: PDF Viewer

• The web agent is asked to browse the URL for a PDF file. It does not have any information about external viewers for PDF.
• Because the agent already has a task for browsing URLs, it is executed first.
• When the system fails, the user provides feedback indicating the correct viewer.
• Failure-Driven Model Transfer

Page 24: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Web Agent Adaptation

(Diagram: before adaptation, the External Display method runs Select Display Command, Compile Display Command, Execute Display Command. After adaptation, Select Display Command is a task with two methods: Select Display Command Base Method, wrapping the Select Display Command Base Task, and Select Display Command Alternate Method, wrapping the Select Display Command Alternate Task.)

Page 25: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Physical Device Disassembly

• ADDAM: Legacy software agent for case-based, design-level disassembly planning and (simulated) execution
• Interactive: Agent connects to a user specifying goals and to a complex physical environment
• Dynamic: New designs and demands
• Knowledge Intensive: Designs, plans, etc.

Page 26: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Disassembly → Assembly

• A user with access to the ADDAM disassembly agent wishes to have this agent instead do assembly.
• ADDAM has no assembly method and thus must adapt first.
• Since assembly is similar to disassembly, REM selects Proactive Model Transfer.

Page 27: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Pieces of ADDAM which are Key to Disassembly → Assembly

(Diagram: Disassemble is accomplished by Plan Then Execute Disassembly: Adapt Disassembly Plan, then Execute Plan. Under Adapt Disassembly Plan: Topology Based Plan Adaptation and Make Plan Hierarchy, with the Make Equivalent Plan Nodes Method (Make Equivalent Plan Node, Add Equivalent Plan Node) and Map Dependencies (Select Dependency, Assert Dependency). Under Execute Plan: Hierarchical Plan Execution (Select Next Action, Execute Action).)

Page 28: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

New Adapted Task in Disassembly → Assembly

(Diagram: the adapted structure for the new Assemble task mirrors the one above, with COPIED versions of Plan Then Execute Disassembly, Adapt Disassembly Plan, Execute Plan, Topology Based Plan Adaptation, Make Plan Hierarchy, Make Equivalent Plan Nodes Method, Make Equivalent Plan Node, Add Equivalent Plan Node, Map Dependencies, Select Dependency, and Hierarchical Plan Execution. Assert Dependency is INVERTED; two new tasks, INSERTED Inversion Task 1 and INSERTED Inversion Task 2, are added; Select Next Action and Execute Action are reused unchanged.)

Page 29: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Task: Assert Dependency

Before:
define-task Assert-Dependency
  input: target-before-node, target-after-node
  asserts: (node-precedes (value target-before-node)
                          (value target-after-node))

After:
define-task Mapped-Assert-Dependency
  input: target-before-node, target-after-node
  asserts: (node-follows (value target-before-node)
                         (value target-after-node))
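The transfer step shown above rewrites a task's assertion by swapping a relation for its inverse (node-precedes becomes node-follows). A minimal sketch of that rewrite, with a hypothetical inverse table and tuple representation (not REM's actual code):

```python
# Sketch of relation inversion during model transfer: swap the relation in
# an assertion for its known inverse, leaving its arguments untouched.
# The INVERSES table and the tuple encoding are illustrative assumptions.

INVERSES = {"node-precedes": "node-follows",
            "node-follows": "node-precedes"}

def invert_assertion(assertion):
    # assertion is a tuple like ("node-precedes", before, after)
    relation, *args = assertion
    return (INVERSES.get(relation, relation), *args)

before = ("node-precedes", "target-before-node", "target-after-node")
after = invert_assertion(before)
# after == ("node-follows", "target-before-node", "target-after-node")
```

In REM this kind of rewrite is driven by the model's own meta-knowledge (the inverse-of relation) rather than a hand-built table.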

Page 30: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Task: Make Equivalent Plan Node

define-task make-equivalent-plan-node
  input: base-plan-node, parent-plan-node, equivalent-topology-node
  output: equivalent-plan-node
  makes: (:and
           (plan-node-parent (value equivalent-plan-node)
                             (value parent-plan-node))
           (plan-node-object (value equivalent-plan-node)
                             (value equivalent-topology-node))
           (:implies (plan-action (value base-plan-node))
                     (type-of-action (value equivalent-plan-node)
                                     (type-of-action (value base-plan-node)))))
  by procedure ...

Page 31: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Task: Inserted-Reversal-Task

define-task inserted-reversal-task
  input: equivalent-plan-node
  asserts: (type-of-action
             (value equivalent-plan-node)
             (inverse-of
               (type-of-action
                 (value equivalent-plan-node))))
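The inserted task above replaces each plan node's action type with its inverse, so a disassembly step such as unscrewing becomes a screwing step in the assembly plan. A toy sketch of that step (the action names and plan-node dict shape are illustrative assumptions, not ADDAM's representation):

```python
# Sketch of the inserted reversal step: replace a plan node's action type
# with its inverse when transferring a disassembly plan to assembly.
# INVERSE_OF entries and the dict-based plan node are assumptions.

INVERSE_OF = {"unscrew": "screw", "screw": "unscrew",
              "pull": "push", "push": "pull"}

def reverse_action(plan_node):
    node = dict(plan_node)  # copy; keep the original node unchanged
    node["type-of-action"] = INVERSE_OF[node["type-of-action"]]
    return node

node = {"object": "camera-back", "type-of-action": "unscrew"}
# reverse_action(node)["type-of-action"] == "screw"
```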

Page 32: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

ADDAM Example: Layered Roof

Page 33: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Roof Assembly

(Chart: elapsed time in seconds, on a log scale from 1 to 1,000,000, versus number of boards, 1 through 7, for REM: Meta-CBR, REM: Graphplan, and REM: Q-Learning.)

Page 34: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Modified Roof Assembly: No Conflicting Goals

(Chart: elapsed time in seconds, on a log scale from 1 to 100,000, versus number of boards, 1 through 7, for REM: Meta-CBR, REM: Graphplan, and REM: Q-Learning.)

Page 35: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Applicability of Proactive Model Transfer

• Knowledge about the concepts and relations in the domain
• Knowledge about how the tasks and methods affect these concepts and relations
• Differences between the old task and the new map onto knowledge of the concepts and relations in the domain.

Page 36: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Applicability of Failure-Driven Model Transfer

• May need less knowledge about the domain itself since the adaptation is grounded in a specific incident.
  – e.g., feedback about PDF for an example instead of advance knowledge of all document types.
• Still requires knowledge about how the tasks and methods interact with the domain.

Page 37: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Additional Mechanisms

• Model-based adaptation may leave some design decisions unsolved.
  – These decisions may be solved by traditional decision-making mechanisms, e.g., reinforcement learning.
• Models may be unavailable or irrelevant for some tasks or subtasks.
  – Generative planning can combine primitive actions.

Page 38: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Level of Decomposition

• Level of decomposition may be dictated by the nature of the agent.
  – Some tasks simply cannot be decomposed.
• In other situations, level of decomposition may be guided by the nature of the adaptation to be done.
  – Can be brittle if unpredicted demands arise.
• REM enables autonomous decomposition of primitives, which addresses this problem.

Page 39: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Computational Costs

• Reasoning about models incurs some costs.
  – For very easy problems, this overhead may not be justified.
  – For other problems, the benefits enormously outweigh these costs.

Models can localize planning and learning.

Page 40: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Knowledge Requirements

• Someone has to build an agent.
• The builder should know what that agent does and how it does it, and so can make a model.
• An analyst may be able to understand the builder’s notes, etc., and so can make a model.
• Some evidence for this exists in the context of software engineering / architectural extraction.

Page 41: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Current Work: AHEAD

• Theme: Analyzing hypotheses regarding asymmetric threats (e.g., criminals, terrorists).
  – Input: Hypotheses regarding a potential threat
  – Output: Argument for and/or against the hypotheses
• Technique: Analogy over functional models
  – An extension to TMKL will encode known behaviors for asymmetric threats and the purposes that the behaviors serve.
  – Analogical reasoning will enable retrieval and mapping of new hypotheses to existing models.
  – Models will provide arguments about how observed actions do or do not support the purposes of the hypothesized behavior.
• Naval Research Laboratory / DARPA Evidence Extraction and Link Discovery program
• David Aha, J. William Murdock, Len Breslow


Page 42: Self-Improvement through Self-Understanding: Model-Based Reflection for Agent Adaptation

Summary

• REM (Reflective Evolutionary Mind)
  – Operating environment for agents that adapt
• TMKL (Task-Method-Knowledge Language)
  – The language for agents in REM
  – Functional modeling language for encoding computational processes
• Adaptation
  – Some kinds of adaptation can be performed using specialized model-based techniques
  – Others require more generic planning & learning mechanisms (localized using models)