Intelligent Agents: Technology and Applications
Multi-agent Learning
IST 597B, Spring 2003
John Yen

Page 1: Intelligent Agents: Technology and Applications

Intelligent Agents: Technology and Applications
Multi-agent Learning

IST 597B

Spring 2003

John Yen

Page 2: Intelligent Agents: Technology and Applications

Learning Objectives

How to identify goals for agent projects?
How to design agents?
How to identify risks/obstacles early on?

Page 3: Intelligent Agents: Technology and Applications

Multi-Agent Learning

Page 4: Intelligent Agents: Technology and Applications

Multi-Agent Learning

The learned behavior can be used as a basis for more complex interactive behavior

Enables the agent to participate in higher-level collaborative or adversarial learning situations

Learning would not be possible if the agent were isolated

Page 5: Intelligent Agents: Technology and Applications

Examples

Examples of single-agent learning in a multi-agent environment:

1. Reinforcement Learning agent which incorporates information gathered by another agent (Tan, 93)

2. Agent learning negotiating techniques of another using Bayesian Learning (Zeng & Sycara, 96)

These are examples of a class of multi-agent learning in which an agent attempts to model another agent

Page 6: Intelligent Agents: Technology and Applications

Examples

Training scenario in which a novice agent learns from a knowledgeable agent (Clouse, 96)

What all these examples have in common is that the learning agent is interacting with other agents

Page 7: Intelligent Agents: Technology and Applications

Predator/Prey (Pursuit) Domain

Introduced by Benda et al. (1986)
Four predators and one prey
Goal: to capture (or surround) the prey
Not a complex real-world domain, but a toy domain that helps concretize concepts

Page 8: Intelligent Agents: Technology and Applications

Predator/Prey (Pursuit) Domain

Page 9: Intelligent Agents: Technology and Applications

Taxonomy of MAS

Taxonomy organized along:
– the degree of heterogeneity, and
– the degree of communication

1. Homogeneous, Non-Communicating Agents

2. Heterogeneous, Non-Communicating Agents

3. Homogeneous, Communicating Agents

4. Heterogeneous, Communicating Agents

Page 10: Intelligent Agents: Technology and Applications

Taxonomy of MAS

Page 11: Intelligent Agents: Technology and Applications

Taxonomy of MAS

Page 12: Intelligent Agents: Technology and Applications

1. Homogeneous, Non-Communicating Agents

All agents have the same internal structure:
• Goals

• Domain knowledge

• Actions

The only difference is their sensory input and the actions that they take

• They are situated differently in the world

Korf (1992) introduces a policy for each predator based on an attractive force toward the prey and a repulsive force from the other predators
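A minimal sketch of what such a force-based move choice could look like on a grid: attraction toward the prey minus a weighted repulsion from the other predators. The function name, the weighting, and the scoring are illustrative, not Korf's published formulation:

```python
# Force-based move selection: pull toward the prey, push away from
# the other predators. Weighting is illustrative.
import math

MOVES = ((0, 1), (0, -1), (1, 0), (-1, 0))

def force_based_move(predator, prey, others, repel_weight=0.3):
    def score(pos):
        attract = math.dist(pos, prey)                  # pull toward prey
        repel = sum(math.dist(pos, p) for p in others)  # push from peers
        return attract - repel_weight * repel           # lower is better
    candidates = [(predator[0] + dx, predator[1] + dy) for dx, dy in MOVES]
    return min(candidates, key=score)

# Example: predator at (0, 0), prey at (3, 3), two other predators nearby.
print(force_based_move((0, 0), (3, 3), [(1, 0), (0, 2)]))  # -> (0, 1)
```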

Page 13: Intelligent Agents: Technology and Applications

1. Homogeneous, Non-Communicating Agents

Korf concludes that explicit cooperation is not necessary

Haynes & Sen show that Korf’s heuristic does not work for certain instantiations of the domain

Page 14: Intelligent Agents: Technology and Applications

1. Homogeneous, Non-Communicating Agents

Issues:

1. Reactive vs. deliberative agents

2. Local vs. global perspective

3. Modeling of other agents

4. How to affect others

5. Further learning opportunities

Page 15: Intelligent Agents: Technology and Applications

1: Reactive vs. Deliberative Agents

Reactive agents do not maintain an internal state and simply retrieve pre-set behaviors

Deliberative agents maintain an internal state and behave by searching through a space of behaviors, predicting the actions of other agents and the effects of actions

Page 16: Intelligent Agents: Technology and Applications

2: Local vs. Global Perspective

How much sensory input should be available to agents? (observability)

Having a global view might lead to sub-optimal results

Better performance by agents with less knowledge: “Ignorance is Bliss”

Page 17: Intelligent Agents: Technology and Applications

3: Modeling of Other Agents

Since agents are identical, they can predict each other’s actions given the sensory input

Recursive Modeling Method (RMM): model the internal state of another agent in order to predict its actions

Each predator bases its move on the predicted move of other predators and vice versa

Since reasoning can recurse indefinitely, it must be limited in terms of time or recursion depth
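To make the recursion limit concrete, here is a minimal, self-contained sketch of depth-limited recursive modeling in a 1-D chase. The toy world and the scoring are illustrative stand-ins, not the published RMM algorithm:

```python
# Two identical predators pick moves by simulating each other's
# reasoning down to a fixed depth; depth 0 falls back to a greedy
# (non-recursive) policy. Everything here is illustrative.
MOVES = (-1, 0, 1)

def best_move(me, other, prey, depth):
    """Return my move, modeling the other predator `depth` levels deep."""
    if depth == 0:
        # Base case: greedy, ignore the other agent entirely.
        return min(MOVES, key=lambda m: abs((me + m) - prey))
    # Predict the other's move with one less level of recursion,
    # then best-respond: approach the prey, but avoid collisions.
    other_move = best_move(other, me, prey, depth - 1)
    other_next = other + other_move
    def score(m):
        pos = me + m
        return abs(pos - prey) + (10 if pos == other_next else 0)
    return min(MOVES, key=score)

# Example: predators at 0 and 2, prey at 1, recursion limited to depth 2.
print(best_move(0, 2, 1, depth=2))  # -> 1 (step toward the prey)
```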

Page 18: Intelligent Agents: Technology and Applications

3: Modeling of Other Agents

If agents know too much, RMM could recurse indefinitely

For coordination to be possible, some potential knowledge must be ignored

Schmidhuber (1996) shows that agents can cooperate without modeling each other

They consider each other as part of the environment

Page 19: Intelligent Agents: Technology and Applications

4: How to Affect Others

Without communication, agents cannot affect each other directly

Can affect each other indirectly in several ways

1. They can be sensed by other agents

2. Change the state of another agent (e.g. by pushing it)

3. Affect each other by stigmergy (Beckers et al., 1994)

Page 20: Intelligent Agents: Technology and Applications

4: How to Affect Others

Active stigmergy:
– an agent alters the environment so as to affect the sensory input of another agent. E.g. an agent might leave a marker for other agents to observe

Passive stigmergy:
– altering the environment so that the effect of another agent’s actions changes. If an agent turns off the main water valve of a building, the effect of another agent turning on the faucet is altered

Page 21: Intelligent Agents: Technology and Applications

4: How to Affect Others

Example: A number of robots in an area with many pucks scattered around. Robots reactively move straight (turning at walls) until they are pushing 3 or more pucks. Then they back up and turn away

Although robots do not communicate, they can collect the pucks in a single pile over time

When a robot approaches an existing pile, it adds the pucks and turns away

A robot approaching an existing pile obliquely might take a puck away, but over time the desired result is accomplished
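The reactive rule itself is tiny. A minimal sketch, with illustrative sensor names rather than a specific robot API:

```python
# The puck-collecting rule: the robot senses only how many pucks it is
# pushing and whether a wall is ahead.
def reactive_action(pucks_pushed: int, wall_ahead: bool) -> str:
    if pucks_pushed >= 3:
        # Leave the load behind: this is what builds the piles.
        return "back_up_and_turn"
    if wall_ahead:
        return "turn"
    return "move_straight"

# The same rule in every robot, with no communication, is enough for
# piles to emerge over time (stigmergy through the pucks themselves).
print(reactive_action(pucks_pushed=3, wall_ahead=False))  # back_up_and_turn
print(reactive_action(pucks_pushed=1, wall_ahead=True))   # turn
```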

Page 22: Intelligent Agents: Technology and Applications

5: Further Learning Opportunities

An agent might try to learn to take actions that will not help it directly in the current situation, but may allow other agents to be more effective in the future.

In traditional RL, if an action leads to a reward for another agent, the acting agent may have no way of reinforcing that action

Page 23: Intelligent Agents: Technology and Applications

2. Heterogeneous, Non-Communicating Agents

Can be heterogeneous in any of the following:
• Goals
• Actions
• Domain knowledge

In the pursuit domain, the prey can be modeled as an agent

Haynes et al. have used genetic algorithms (GA) and case-based reasoning to make predators learn to cooperate in the absence of communication

Page 24: Intelligent Agents: Technology and Applications

2. Heterogeneous, Non-Communicating Agents

They also explore the possibility of evolving both predators and the prey

• Predators use Korf’s greedy heuristic

Though one might think this will result in repeated improvement of predator and prey with no convergence, a prey behavior emerges that always succeeds

• Prey simply moves in a constant straight line

Haynes et al. conclude that Korf’s greedy algorithm relies on random prey movement

Page 25: Intelligent Agents: Technology and Applications

2. Heterogeneous, Non-Communicating Agents

Issues:

1. Benevolence vs. competitiveness

2. Fixed vs. learning agents

3. Modeling of other agents

4. Resource management

5. Social conventions

Page 26: Intelligent Agents: Technology and Applications

1: Benevolence vs. Competitiveness

Agents can be benevolent even if they have different goals (if they are willing to help each other)

Selfish agents: more effective and biologically plausible

Agents cooperate because it is in their own best interest

Page 27: Intelligent Agents: Technology and Applications

1: Benevolence vs. Competitiveness

Prisoner’s dilemma: two burglars are captured. Each has to choose whether or not to confess and implicate the other. If neither confesses, they will both serve 1 year. If both confess, they will both serve 10 years. If one confesses and the other does not, the one who confessed will go free and the other will serve 20 years

Page 28: Intelligent Agents: Technology and Applications

1: Benevolence vs. Competitiveness

Page 29: Intelligent Agents: Technology and Applications

1: Benevolence vs. Competitiveness

Each agent will decide to confess to maximize its own interest

If both confess, they will get 10 years each

If they had acted “irrationally” and kept quiet, they would each get 1 year

Mor et al. (1995) show that in the repeated prisoner’s dilemma, cooperative behavior can emerge
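A minimal sketch of the repeated game. Tit-for-tat is used here as one well-known strategy under which cooperation can persist; the slide cites Mor et al. (1995) but does not name a particular strategy, so this is illustrative:

```python
# Repeated prisoner's dilemma; payoffs are years served (lower is
# better), matching the story above.
YEARS = {  # (my move, their move) -> years I serve
    ("quiet", "quiet"): 1,
    ("confess", "confess"): 10,
    ("confess", "quiet"): 0,
    ("quiet", "confess"): 20,
}

def tit_for_tat(opponent_history):
    # Cooperate (stay quiet) first, then mirror the opponent's last move.
    return "quiet" if not opponent_history else opponent_history[-1]

def play(strategy_a, strategy_b, rounds=5):
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_b), strategy_b(hist_a)
        total_a += YEARS[(move_a, move_b)]
        total_b += YEARS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return total_a, total_b

# Two tit-for-tat agents keep cooperating: 1 year per round each.
print(play(tit_for_tat, tit_for_tat))  # (5, 5)
```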

Page 30: Intelligent Agents: Technology and Applications

1: Benevolence vs. Competitiveness

In zero-sum games cooperation is not sensible

If a third dimension were to be added to the taxonomy, besides the degree of heterogeneity and communication, it would be benevolence vs. competitiveness

Page 31: Intelligent Agents: Technology and Applications

2: Fixed vs. Learning Agents

Learning agents desirable in dynamic environments

Competitive vs. cooperative learning

Possibility of an “arms race” in competitive learning: competing agents continually adapt to each other in more and more specialized ways, never stabilizing at a good behavior

Page 32: Intelligent Agents: Technology and Applications

2: Fixed vs. Learning Agents

Credit-assignment problem: when the performance of an agent improves, it is not clear whether the improvement is due to an improvement in the agent’s own behavior or a deterioration in the opponent’s behavior. The same problem arises if the performance of an agent gets worse.

One solution is to fix one agent while allowing the other to learn, and then to switch. But this encourages the arms race even more!

Page 33: Intelligent Agents: Technology and Applications

3: Modeling of other agents

Goals, actions and domain knowledge of other agents may be unknown and need modeling

Without communication, modeling is done strictly through observation

RMM is good for modeling the states of homogeneous agents

Tambe (1995) takes it one step further, studying how agents can learn models of teams of agents

Page 34: Intelligent Agents: Technology and Applications

4: Resource Management

Examples:
– Network traffic problem: several agents send information through the same network (GA)
– Load balancing: several users have a limited amount of computing power to share among them (RL)

Braess’ Paradox (Glance et al., 1995): adding more resources to a network but getting worse performance

Page 35: Intelligent Agents: Technology and Applications

5: Social Conventions

Imagine you are to meet a friend in Paris. You both arrive on the same day but were unable to get in touch to set a time and place. Where will you go, and when?

75% of the audience at the AAAI-95 Symposium on Active Learning answered (without prior communication) that they would go to the Eiffel Tower at noon.

Even without communication, agents are able to coordinate actions

Page 36: Intelligent Agents: Technology and Applications

3. Homogeneous, Communicating Agents

Communication can be either broadcast or point-to-point

Issues:

1. Distributed sensing
– Distributed vision project (Matsuyama, 1997)
– Trafficopter system (Moukas et al., 1997)

2. Communication content
– What should they communicate? States or goals?

3. Further learning opportunities
– When to communicate?

Page 37: Intelligent Agents: Technology and Applications

4. Heterogeneous, Communicating Agents

Tradeoff between cost and freedom

Osawa suggests predators should go through 4 phases:
– Autonomy, communication, negotiation, and control
– When they stop making progress using one strategy, they should move to the next, more expensive strategy

Increasing order of cost (decreasing order of freedom)

Page 38: Intelligent Agents: Technology and Applications

4. Heterogeneous, Communicating Agents

Important issues:

1. Understanding each other

2. Planning communication acts

3. Negotiation

4. Commitment/decommitment

5. Further learning opportunities

Page 39: Intelligent Agents: Technology and Applications

1: Understanding Each Other

Need some set protocol for communication

Aspects of the protocol:

1. Information content: KIF (Genesereth, 92)

2. Message Format: KQML (Finin, 94)

3. Coordination: COOL (Barbuceanu, 95)

Page 40: Intelligent Agents: Technology and Applications

2: Planning Communication Acts

The theory of communication as action is called speech act theory

Communication acts have preconditions and effects

Effects might be to alter an agent’s belief about the state of another agent or agents
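A minimal sketch of a communication act with a precondition and effects, in the spirit of speech-act theory. The "inform" act and the belief representation are illustrative, not from a specific agent framework:

```python
# An "inform" act: preconditions gate whether it can be performed;
# effects alter the beliefs of both parties.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    beliefs: set = field(default_factory=set)

def inform(speaker: Agent, hearer: Agent, fact: str) -> bool:
    # Precondition: the speaker itself believes the fact.
    if fact not in speaker.beliefs:
        return False
    # Effects: the hearer's beliefs change, and the speaker now
    # believes the hearer knows the fact.
    hearer.beliefs.add(fact)
    speaker.beliefs.add(f"{hearer.name} knows {fact}")
    return True

a = Agent("predator1", beliefs={"prey at (3, 4)"})
b = Agent("predator2")
inform(a, b, "prey at (3, 4)")
print(b.beliefs)  # {'prey at (3, 4)'}
```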

Page 41: Intelligent Agents: Technology and Applications

3: Negotiation

Design a negotiating MAS based on the law of supply and demand

1. Contract nets (Smith, 1990):
• Agents have their own goals, are self-interested, and have limited reasoning resources. They bid to accept tasks from other agents and can then either perform the task or subcontract it to another agent. Agents must pay to contract out their tasks.
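A minimal sketch of one contract-net round, assuming a simple announce–bid–award cycle with self-interested bidders; the announcement format and bid contents are simplified illustrations of Smith's protocol:

```python
# A manager announces a task, contractors bid their price, and the task
# is awarded to the cheapest bidder (who may perform or subcontract it).
def announce_and_award(task, contractors):
    bids = {c: c.bid(task) for c in contractors}
    bids = {c: p for c, p in bids.items() if p is not None}  # drop refusals
    if not bids:
        return None  # no contractor accepted the announcement
    winner = min(bids, key=bids.get)
    winner.accept(task)
    return winner, bids[winner]

class Contractor:
    def __init__(self, name, cost_per_unit, capacity):
        self.name, self.cost_per_unit, self.capacity = name, cost_per_unit, capacity
        self.tasks = []
    def bid(self, task):
        # Self-interested bid: decline when already at capacity.
        if len(self.tasks) >= self.capacity:
            return None
        return self.cost_per_unit * task["size"]
    def accept(self, task):
        self.tasks.append(task)

agents = [Contractor("a1", 2.0, 1), Contractor("a2", 1.5, 1)]
winner, price = announce_and_award({"name": "scout", "size": 4}, agents)
print(winner.name, price)  # a2 6.0
```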

Page 42: Intelligent Agents: Technology and Applications

3: Negotiation

MAS controlling air temperature in different rooms of a building:

• An agent can set the thermostat to any temperature. Depending on the actual air temperature, the agent can ‘buy’ hot or cold air from another room that has an excess. At the same time the agent can sell the excess air at the current temperature to other rooms. Modeling the loss of heat in transfer from one room to another, the agents try to buy and sell at the best possible prices.

Page 43: Intelligent Agents: Technology and Applications

4: Commitment/Decommitment

An agent agrees to pursue a given goal regardless of how much it serves its own interest

Commitments can make the systems run more smoothly by making agents trust each other

Unclear how to make self-interested agents commit to others

Belief/desire/intention (BDI) is a popular technique for modeling other agents
– Used in OASIS: air traffic control

Page 44: Intelligent Agents: Technology and Applications

5: Further Learning Opportunities

Instead of predefining a protocol, allow the agents to learn for themselves what to communicate and how to interpret it

A possible result would be more efficient communication

Page 45: Intelligent Agents: Technology and Applications

Q Learning

Assess state-action pairs (s, a) using a Q value

Learn the Q value using rewards/feedback

A reward received at time t is discounted back to previous state-action pairs (using a discount factor)

The goal of learning is to find an optimal policy for selecting actions.

Page 46: Intelligent Agents: Technology and Applications

The Q value:

$$Q^*(x, a) = R(x, a) + \gamma \sum_{y} P_{xy}(a)\, V^*(y)$$

R: the reward

P_xy(a): the probability of reaching state y from x by taking action a.

γ (gamma): the discount factor (between 0 and 1).

V*(y): the expected total discounted return starting in y and following the optimal policy.

Policy: a mapping from states to actions.

Page 47: Intelligent Agents: Technology and Applications

$$V^*(x) = \max_{a} Q^*(x, a)$$

The expected total discounted return V* for a state is the maximal Q value among all actions that can be taken at the state (following the rest of the policy).

Page 48: Intelligent Agents: Technology and Applications

Learning rule for the Q value:

$$Q(x, a) \leftarrow (1 - \alpha)\, Q(x, a) + \alpha \left( r + \gamma V(y) \right)$$

α (alpha): the learning rate
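A minimal tabular sketch of this update rule in Python; the two-state problem, the action names, and the constants are illustrative:

```python
# One-step Q-learning update: blend the old estimate with the
# one-step return r + gamma * V(y).
def q_update(Q, x, a, r, y, alpha=0.1, gamma=0.9):
    v_y = max(Q[y].values())  # V(y) = max over actions of Q(y, a')
    Q[x][a] = (1 - alpha) * Q[x][a] + alpha * (r + gamma * v_y)

# Two states, two actions, all Q values initially zero.
Q = {s: {a: 0.0 for a in ("left", "right")} for s in (0, 1)}
q_update(Q, x=0, a="right", r=1.0, y=1)
print(Q[0]["right"])  # 0.1
```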

Page 49: Intelligent Agents: Technology and Applications

Q-learning algorithm (with eligibility traces Tr):

1. $Q(x, a) \leftarrow 0$ and $Tr(x, a) \leftarrow 0$ for all $x$ and $a$

2. Do forever:

(a) $x_t \leftarrow$ the current state

(b) Choose an action $a_t$ that maximizes $Q(x_t, a)$ over all $a$

(c) Carry out action $a_t$ in the world. Let the short-term reward be $r_t$, and the new state be $x_{t+1}$

(d) $e'_t = r_t + \gamma V(x_{t+1}) - Q(x_t, a_t)$

(e) $e_t = r_t + \gamma V(x_{t+1}) - V(x_t)$

(f) For each state-action pair $(x, a)$ do:
$Tr(x, a) \leftarrow \gamma \lambda \, Tr(x, a)$
$Q(x, a) \leftarrow Q(x, a) + \alpha \, Tr(x, a) \, e_t$

(g) $Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha \, e'_t$

(h) $Tr(x_t, a_t) \leftarrow Tr(x_t, a_t) + 1$
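A minimal, self-contained Python sketch of the loop above on a toy two-state chain; the environment, the exploration scheme, and all constants are illustrative additions:

```python
# Tabular Q-learning with eligibility traces on a toy 2-state chain.
import random

N_STATES, ACTIONS = 2, ("left", "right")
alpha, gamma, lam = 0.1, 0.9, 0.8

Q = {(x, a): 0.0 for x in range(N_STATES) for a in ACTIONS}
Tr = {(x, a): 0.0 for x in range(N_STATES) for a in ACTIONS}

def V(x):
    return max(Q[(x, a)] for a in ACTIONS)

def step(x, a):
    # Toy dynamics: "right" from state 0 reaches state 1 and pays 1.
    if x == 0 and a == "right":
        return 1.0, 1
    return 0.0, 0

x = 0
for t in range(1000):
    # (b) Greedy choice, plus a little exploration (illustrative).
    a = max(ACTIONS, key=lambda act: Q[(x, act)])
    if random.random() < 0.1:
        a = random.choice(ACTIONS)
    # (c) Act in the world.
    r, x_next = step(x, a)
    # (d), (e) The two error terms.
    e_prime = r + gamma * V(x_next) - Q[(x, a)]
    e = r + gamma * V(x_next) - V(x)
    # (f) Decay all traces and apply the trace-weighted update.
    for key in Q:
        Tr[key] *= gamma * lam
        Q[key] += alpha * Tr[key] * e
    # (g), (h) Correction and trace bump for the pair just visited.
    Q[(x, a)] += alpha * e_prime
    Tr[(x, a)] += 1.0
    x = x_next

print(round(Q[(0, "right")], 2))
```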

Page 50: Intelligent Agents: Technology and Applications

Probability for the agent to select action $a_i$ based on Q values:

$$p(a_i \mid x) = \frac{e^{Q(x, a_i)/T}}{\sum_{k \in \text{actions}} e^{Q(x, a_k)/T}}$$

T: “temperature” parameter to determine the randomness of decisions.
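A minimal Python sketch of this selection rule; the Q table and the temperatures are illustrative:

```python
# Boltzmann (softmax) action selection over Q values.
import math, random

def boltzmann_select(q_values, T=0.5):
    """Sample an action with probability proportional to exp(Q/T)."""
    weights = [math.exp(q / T) for q in q_values.values()]
    return random.choices(list(q_values), weights=weights, k=1)[0]

q = {"north": 0.2, "south": 0.1, "east": 0.9, "west": 0.0}
# High T -> nearly uniform (explore); low T -> nearly greedy (exploit).
print(boltzmann_select(q, T=5.0), boltzmann_select(q, T=0.05))
```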

Page 51: Intelligent Agents: Technology and Applications

Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer

Peter Stone & Manuela Veloso

Page 52: Intelligent Agents: Technology and Applications

Introduction

Layered learning: develop complex multi-agent behaviors from simple ones

Simple multi-agent behavior in robotic soccer: to shoot a moving ball

• Passer

• Shooter

Behavior to be learnt: When the shooter should begin to move (shooting policy)

Page 53: Intelligent Agents: Technology and Applications

Simple Behavior

Page 54: Intelligent Agents: Technology and Applications

Parameters

1. Ball speed (fixed vs. variable)

2. Ball trajectory (fixed vs. variable)

3. Goal location (fixed vs. variable)

4. Action quadrant (fixed vs. variable)

Page 55: Intelligent Agents: Technology and Applications

Parameters

Page 56: Intelligent Agents: Technology and Applications

Fixed Ball Motion

Simple shooting policy: begin accelerating when the ball’s distance to its projected point of intersection with the agent’s path reaches 110 units
• 100% success rate if the shooter position is fixed
• 61% success rate if the shooter position is variable

Use a neural network. Inputs to the NN (coordinate independent):
• Ball distance
• Agent distance
• Heading offset

Output: 1 or 0 (shot successful or not)

Use a random shooting policy for training
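A minimal sketch of learning such a policy from the three inputs above. A simple logistic model stands in for the slide's neural network (the original architecture is not given here), trained on trials gathered under the random shooting policy:

```python
# Learn P(shot succeeds | ball_dist, agent_dist, heading_offset) from
# logged trials; a logistic model via SGD stands in for the NN.
import math

def predict(w, b, x):
    # Probability that starting to move now leads to a successful shot.
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def train(samples, lr=0.1, epochs=200):
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:  # x = (ball_dist, agent_dist, heading_offset)
            err = predict(w, b, x) - y  # y = 1 if the shot succeeded
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Fake trials just to show the interface (real data would be logged
# from simulated shots taken under a random policy).
trials = [((1.1, 0.9, 0.1), 1), ((0.2, 1.5, 0.8), 0),
          ((1.0, 1.0, 0.2), 1), ((0.3, 1.4, 0.9), 0)]
w, b = train(trials)
print(predict(w, b, (1.05, 0.95, 0.15)))  # high probability -> move now
```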

Page 57: Intelligent Agents: Technology and Applications

Neural Network

Page 58: Intelligent Agents: Technology and Applications

Results

Page 59: Intelligent Agents: Technology and Applications

Varying Ball Speed

Add a fourth input to NN, Ball Speed

Page 60: Intelligent Agents: Technology and Applications

Varying Ball’s Trajectory

Use the same shooting policy

Use another NN to determine the direction the shooter should steer (the shooter’s aiming policy)

Page 61: Intelligent Agents: Technology and Applications

Moving the Goal

Can think of it as aiming for different parts of the goal

Change nothing but the shooter’s knowledge of the goal location

Page 62: Intelligent Agents: Technology and Applications

Cooperative Learning

Passing a moving ball:
• Passer: where to aim the pass
• Shooter: where to position itself

Page 63: Intelligent Agents: Technology and Applications

Cooperative Learning

Page 64: Intelligent Agents: Technology and Applications

Adversarial Learning

Page 65: Intelligent Agents: Technology and Applications

References

Peter Stone and Manuela Veloso, 2000, “Multiagent Systems: A Survey from a Machine Learning Perspective”

Ming Tan, 1993, “Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents”

Peter Stone and Manuela Veloso, 1998, “Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer”