summer school july 2010


Automated negotiations: Agents interacting with other automated agents and with humans

Sarit Kraus
Department of Computer Science
Bar-Ilan University
University of Maryland

http://www.cs.biu.ac.il/~sarit/



Negotiations

“A discussion in which interested parties exchange information and come to an agreement.” (Davis and Smith, 1977)



Negotiations

NEGOTIATION is an interpersonal decision-making process necessary whenever we cannot achieve our objectives single-handedly.



Agent environments

• Teams of agents that need to coordinate joint activities; problems: distributed information, distributed decision solving, local conflicts.
• Open agent environments, in which agents act in the same environment; problems: need for motivation to cooperate, conflict resolution, trust, distributed and hidden information.



Open Agent Environments

Consist of:
◦ Automated agents developed by or serving different people or organizations.
◦ People with a variety of interests and institutional affiliations.

The computer agents are “self-interested”; they may cooperate to further their interests. The set of agents is not fixed.




Open Agent Environments (examples)

• Agents support people
◦ Collaborative interfaces
◦ CSCW: Computer Supported Cooperative Work systems
◦ Cooperative learning systems
◦ Military-support systems
• Agents act as proxies for people
◦ Coordinating schedules
◦ Patient care-delivery systems
◦ Online auctions
• Groups of agents act autonomously alongside people
◦ Simulation systems for education and training
◦ Computer games and other forms of entertainment
◦ Robots in rescue operations
◦ Software personal assistants



Examples

• Monitoring electricity networks (Jennings)
• Distributed design and engineering (Petrie et al.)
• Distributed meeting scheduling (Sen & Durfee)
• Teams of robotic systems acting in hostile environments (Balch & Arkin, Tambe)
• Collaborative Internet agents (Etzioni & Weld, Weiss)
• Collaborative interfaces (Grosz & Ortiz, Andre)
• Information agents on the Internet (Klusch)
• Cooperative transportation scheduling (Fischer)
• Supporting hospital patient scheduling (Decker & Jin)
• Intelligent agents for command and control (Sycara)



Types of agents

• Fully rational agents
• Boundedly rational agents



Using other disciplines’ results

• No need to start from scratch!
• Modification and adjustment are required; AI gives insights and complementary methods.
• Is it worth it to use formal methods for multi-agent systems?



Negotiating with rational agents

• Quantitative decision making
◦ Maximizing expected utility
◦ Nash equilibrium, Bayesian Nash equilibrium
• Automated negotiator
◦ Model the scenario as a game
◦ The agent computes (if complexity allows) the equilibrium strategy and acts accordingly.

(Kraus, Strategic Negotiation in Multiagent Environments, MIT Press, 2001)



Short introduction to game theory

Game Theory studies situations of strategic interaction in which each decision maker's plan of action depends on the plans of the other decision makers.



Decision Theory (reminder): how to make decisions

Decision Theory = Probability Theory (deals with chance) + Utility Theory (deals with outcomes)

Fundamental idea
◦ The MEU (Maximum Expected Utility) principle
◦ Weigh the utility of each outcome by the probability that it occurs



Basic Principle

Given the probabilities P(out_1 | A_i), P(out_2 | A_i), … and the utilities U(out_1), U(out_2), …, the expected utility of an action A_i is

$$EU(A_i) = \sum_{out_j \in OUT} U(out_j)\, P(out_j \mid A_i)$$

Choose the action A_i that maximizes EU:

$$MEU = \arg\max_{A_i \in Ac} \sum_{out_j \in OUT} U(out_j)\, P(out_j \mid A_i)$$
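A minimal Python sketch of the MEU rule: compute EU(A_i) for each action and take the argmax. The actions `A1`, `A2`, the outcomes `good`/`bad`, and all numbers are hypothetical; exact fractions are used only so the results are easy to check.

```python
from fractions import Fraction as F

def expected_utility(action, prob, utility):
    # EU(A_i) = sum_j U(out_j) * P(out_j | A_i)
    return sum(utility[out] * p for out, p in prob[action].items())

def meu(actions, prob, utility):
    # MEU principle: choose the action with the highest expected utility
    return max(actions, key=lambda a: expected_utility(a, prob, utility))

# Hypothetical example: two actions, two outcomes
utility = {"good": 10, "bad": -5}
prob = {
    "A1": {"good": F(1, 2), "bad": F(1, 2)},
    "A2": {"good": F(9, 10), "bad": F(1, 10)},
}
best = meu(["A1", "A2"], prob, utility)
print(best)  # A2 (EU = 17/2, versus 5/2 for A1)
```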



Risk Averse, Risk Neutral, Risk Seeking

[Figure: three panels, RISK AVERSE, RISK NEUTRAL and RISK SEEKER, each plotting Utility (vertical axis) against Money (horizontal axis, in millions).]
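To make the three attitudes concrete, the sketch below compares a 50/50 lottery between 0 and 4M with receiving its expected value (2M) for sure. The utility functions (square root, linear, square, with money in millions) are illustrative assumptions, not the plotted curves: a concave utility gives risk aversion, a linear one risk neutrality, and a convex one risk seeking.

```python
import math

# Hypothetical utility functions over money measured in millions
def u_averse(m):
    return math.sqrt(m)   # concave

def u_neutral(m):
    return m              # linear

def u_seeker(m):
    return m ** 2         # convex

def attitude(u, low=0.0, high=4.0, p_high=0.5):
    """Compare a lottery (high with prob. p_high, else low) with its expected value for sure."""
    eu_lottery = p_high * u(high) + (1 - p_high) * u(low)
    u_sure = u(p_high * high + (1 - p_high) * low)
    if math.isclose(eu_lottery, u_sure):
        return "indifferent"
    return "prefers lottery" if eu_lottery > u_sure else "prefers sure amount"

for name, u in [("averse", u_averse), ("neutral", u_neutral), ("seeker", u_seeker)]:
    print(name, attitude(u))
```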



Game Description

• Players
◦ Who participates in the game?
• Actions / Strategies
◦ What can each player do?
◦ In what order do the players act?
• Outcomes / Payoffs
◦ What is the outcome of the game?
◦ What are the players' preferences over the possible outcomes?



Game Description (cont.)

• Information
◦ What do the players know about the parameters of the environment or about one another?
◦ Can they observe the actions of the other players?
• Beliefs
◦ What do the players believe about the unknown parameters of the environment or about one another?
◦ What can they infer from observing the actions of the other players?



Strategies and Equilibrium

• Strategy
◦ A complete plan, describing an action for every contingency
• Nash Equilibrium
◦ Each player's strategy is a best response to the strategies of the other players
◦ Equivalently: no player can improve his payoff by changing his strategy alone
◦ A self-enforcing agreement; no need for formal contracting
• Other equilibrium concepts also exist



Classification of Games

• Depending on the timing of moves
◦ Games with simultaneous moves
◦ Games with sequential moves
• Depending on the information available to the players
◦ Games with perfect information
◦ Games with imperfect (or incomplete) information
• We concentrate on non-cooperative games
◦ Groups of players cannot deviate jointly
◦ Players cannot make binding agreements



Games with Simultaneous Moves and Perfect Information

• All players choose their actions simultaneously, or just independently of one another
• There is no private information
• All aspects of the game are known to the players
• Representation by game matrices
• Often called normal-form games or strategic-form games



Matching Pennies

• Example of a zero-sum game.
• Strategic issue: competition.



Prisoner’s Dilemma

Each player can cooperate or defect. Payoffs are listed as (Row, Column).

                      Column
                      cooperate    defect
Row    cooperate      -1,-1        -10,0
       defect         0,-10        -8,-8

Main issue: tension between social optimality and individual incentives.



Coordination Games

A supplier and a buyer need to decide whether to adopt a new purchasing system. Payoffs are listed as (Supplier, Buyer).

                      Buyer
                      new       old
Supplier   new        20,20     0,0
           old        0,0       5,5



Battle of sexes

Payoffs are listed as (Husband, Wife).

                      Wife
                      football    shopping
Husband   football    2,1         0,0
          shopping    0,0         1,2

The game involves both coordination and competition.



Definition of Nash Equilibrium

• A game has n players.
• Each player i has a strategy set S_i
◦ These are his possible actions.
• Each player i has a payoff function p_i: S → R, where S = S_1 × … × S_n.
• A strategy t_i in S_i is a best response if there is no other strategy in S_i that produces a higher payoff, given the opponents’ strategies.



Definition of Nash Equilibrium (cont.)

• A strategy profile is a list (s_1, s_2, …, s_n) of the strategies each player is using.
• If each strategy is a best response given the other strategies in the profile, the profile is a Nash equilibrium.
• Why is this important?
◦ If we assume players are rational, they will play Nash strategies.
◦ Even less-than-rational play will often converge to Nash in repeated settings.



An Example of a Nash Equilibrium

Payoffs are listed as (Row, Column).

                 Column
                 a       b
Row    a         1,2     0,1
       b         2,1     1,0

(b,a) is a Nash equilibrium: given that Column is playing a, Row’s best response is b; given that Row is playing b, Column’s best response is a.
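The best-response test in the definition can be checked mechanically for small matrix games. A minimal sketch, assuming two players and payoffs read off the matrices in this section (the example game, the Prisoner’s Dilemma, and the Battle of sexes); `pure_nash` is a hypothetical helper name.

```python
from itertools import product

def pure_nash(game):
    """Return all pure-strategy Nash equilibria of a two-player matrix game.
    game maps (row_action, col_action) -> (row_payoff, col_payoff)."""
    rows = sorted({r for r, _ in game})
    cols = sorted({c for _, c in game})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = game[(r, c)]
        # Row cannot gain by switching actions while Column stays at c
        row_best = all(u_row >= game[(r2, c)][0] for r2 in rows)
        # Column cannot gain by switching actions while Row stays at r
        col_best = all(u_col >= game[(r, c2)][1] for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

example = {("a", "a"): (1, 2), ("a", "b"): (0, 1),
           ("b", "a"): (2, 1), ("b", "b"): (1, 0)}
prisoners = {("cooperate", "cooperate"): (-1, -1), ("cooperate", "defect"): (-10, 0),
             ("defect", "cooperate"): (0, -10), ("defect", "defect"): (-8, -8)}
battle = {("football", "football"): (2, 1), ("football", "shopping"): (0, 0),
          ("shopping", "football"): (0, 0), ("shopping", "shopping"): (1, 2)}

print(pure_nash(example))    # [('b', 'a')]
print(pure_nash(prisoners))  # [('defect', 'defect')]
print(pure_nash(battle))     # [('football', 'football'), ('shopping', 'shopping')]
```

The Battle of sexes has two pure equilibria, which is exactly the coordination problem the mixed-strategy slides address.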



Mixed strategies

• Unfortunately, not every game has a pure-strategy equilibrium.
◦ Rock-paper-scissors
• However, every finite game has a mixed-strategy Nash equilibrium.
◦ Each action is assigned a probability of play.
◦ Each player is indifferent between the actions he plays, given the other players' probabilities.



Mixed Strategies

Battle of sexes, payoffs listed as (Husband, Wife):

                      Wife
                      football    shopping
Husband   football    2,1         0,0
          shopping    0,0         1,2



Mixed strategy

• Instead, each player selects a probability associated with each action.
◦ Goal: the expected utility of each action is equal.
◦ Players are indifferent between their choices at these probabilities.
• Let a = probability that the husband chooses football, and b = probability that the wife chooses shopping.
• Since the payoffs must be equal, for the husband:
◦ b·1 = (1−b)·2, so b = 2/3
• For the wife:
◦ a·1 = (1−a)·2, so a = 2/3
• In each case, the expected payoff is 2/3.
◦ 2/9 of the time they go to football, 2/9 of the time shopping, and 5/9 of the time they miscoordinate.
• If they could synchronize ahead of time, they could do better.
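The same indifference conditions can be solved for any 2x2 game that has a fully mixed equilibrium. A minimal sketch, assuming such an interior equilibrium exists (non-zero denominators); `mixed_2x2` is a hypothetical helper, and the payoffs are the Battle of sexes matrix above.

```python
from fractions import Fraction as F

def mixed_2x2(A, B):
    """Fully mixed Nash equilibrium of a 2x2 game from the indifference conditions.
    A[i][j], B[i][j]: row / column payoffs when row plays i and column plays j.
    Returns (p, q): p = P(row plays action 0), q = P(column plays action 0).
    Assumes an interior equilibrium exists (non-zero denominators, 0 < p, q < 1)."""
    # q makes the row player indifferent between its two actions
    q = F(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # p makes the column player indifferent between its two actions
    p = F(B[1][1] - B[1][0], B[0][0] - B[1][0] - B[0][1] + B[1][1])
    return p, q

# Battle of sexes: action 0 = football, action 1 = shopping; row = husband, column = wife
A = [[2, 0], [0, 1]]  # husband's payoffs
B = [[1, 0], [0, 2]]  # wife's payoffs
p, q = mixed_2x2(A, B)
a = p        # probability the husband chooses football
b = 1 - q    # probability the wife chooses shopping
husband_payoff = q * A[0][0] + (1 - q) * A[0][1]
wife_payoff = p * B[0][0] + (1 - p) * B[1][0]
both_football = p * q
both_shopping = (1 - p) * (1 - q)
print(a, b, husband_payoff, wife_payoff)                                 # 2/3 2/3 2/3 2/3
print(both_football, both_shopping, 1 - both_football - both_shopping)  # 2/9 2/9 5/9
```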



Rock paper scissors

Payoffs are listed as (Row, Column).

                    Column
                    rock     paper    scissors
Row    rock         0,0      -1,1     1,-1
       paper        1,-1     0,0      -1,1
       scissors     -1,1     1,-1     0,0



Setup

Player 1 plays rock with probability p_r, scissors with probability p_s, and paper with probability 1 − p_r − p_s. Player 2's expected utilities are:

$$U_2(\text{rock}) = 0\cdot p_r + 1\cdot p_s - 1\cdot(1 - p_r - p_s) = 2p_s + p_r - 1$$
$$U_2(\text{scissors}) = 0\cdot p_s + 1\cdot(1 - p_r - p_s) - 1\cdot p_r = 1 - 2p_r - p_s$$
$$U_2(\text{paper}) = 0\cdot(1 - p_r - p_s) + 1\cdot p_r - 1\cdot p_s = p_r - p_s$$

In a mixed equilibrium, Player 1 chooses these probabilities so that Player 2's expected payoff is the same for every action (and symmetrically for Player 2).



Setup (cont.)

Setting Player 2's three expected utilities equal:

$$2p_s + p_r - 1 = 1 - 2p_r - p_s = p_r - p_s$$

• The first and last expressions give 3p_s = 1; the middle and last give 3p_r = 1. So the equilibrium mixed strategy is to play each action 1/3 of the time.
• Intuition: what if you played rock half the time? Your opponent would then play paper half the time, and you’d lose more often than you won.
• So you’d decrease the fraction of times you played rock, until your opponent had no ‘edge’ in guessing what you’ll do.
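The algebra can be checked by writing each of Player 2's expected utilities as a linear function of (p_r, p_s) and solving the two indifference equations. A minimal sketch using Cramer's rule; the tuple encoding of the linear functions and the helper names are assumptions for illustration.

```python
from fractions import Fraction as F

# Player 2's expected utility for each pure action as a linear function of
# Player 1's mix, stored as (coefficient of p_r, coefficient of p_s, constant).
U2 = {
    "rock": (1, 2, -1),       # 2p_s + p_r - 1
    "scissors": (-2, -1, 1),  # 1 - 2p_r - p_s
    "paper": (1, -1, 0),      # p_r - p_s
}

def evaluate(lin, pr, ps):
    a, b, c = lin
    return a * pr + b * ps + c

def indifferent_mix(U2):
    """Solve U2(rock) = U2(paper) and U2(scissors) = U2(paper) for (p_r, p_s)."""
    a1, b1, c1 = (x - y for x, y in zip(U2["rock"], U2["paper"]))
    a2, b2, c2 = (x - y for x, y in zip(U2["scissors"], U2["paper"]))
    # Linear system a1*p_r + b1*p_s = -c1, a2*p_r + b2*p_s = -c2 (Cramer's rule)
    det = a1 * b2 - a2 * b1
    pr = F(-c1 * b2 + b1 * c2, det)
    ps = F(-a1 * c2 + a2 * c1, det)
    return pr, ps

pr, ps = indifferent_mix(U2)
print(pr, ps, 1 - pr - ps)                     # 1/3 1/3 1/3
print([evaluate(U2[k], pr, ps) for k in U2])   # every action is worth 0 to Player 2
```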



Extensive Form Games

[Figure: a two-level game tree with moves H and T at each node and leaf payoffs (1,2), (4,0), (2,1), (2,1).]

• Any finite game of perfect information has a pure-strategy Nash equilibrium. It can be found by backward induction.
• Chess is a finite game of perfect information. Therefore it is a “trivial” game from a game-theoretic point of view.



Extensive Form Games - Intro

• A game can have complex temporal structure.
• Information
◦ set of players
◦ who moves when and under what circumstances
◦ what actions are available when called upon to move
◦ what is known when called upon to move
◦ what payoffs each player receives
• Foundation is a game tree



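Example: Cuban Missile Crisis

Khrushchev moves first and chooses Arm or Retract; after Arm, Kennedy chooses Fold or Nuke. Payoffs are listed as (Khrushchev, Kennedy):

• Retract: (-1, 1)
• Arm, then Fold: (10, -10)
• Arm, then Nuke: (-100, -100)

Pure strategy Nash equilibria: (Arm, Fold) and (Retract, Nuke)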



Subgame perfect equilibrium & credible threats

• Proper subgame = subtree (of the game tree) whose root is alone in its information set
• Subgame perfect equilibrium
◦ A strategy profile that is in Nash equilibrium in every proper subgame (including the root), whether or not that subgame is reached along the equilibrium path of play



Example: Cuban Missile Crisis

[Game tree: Khrushchev chooses Arm or Retract; Retract → (-1, 1); after Arm, Kennedy chooses Fold → (10, -10) or Nuke → (-100, -100). Payoffs as (Khrushchev, Kennedy).]

Pure strategy Nash equilibria: (Arm, Fold) and (Retract, Nuke)
Pure strategy subgame perfect equilibria: (Arm, Fold)
Conclusion: Kennedy’s Nuke threat was not credible.
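Backward induction solves every decision node, including those off the equilibrium path, which is why it yields a subgame perfect equilibrium and discards the non-credible Nuke threat. A minimal sketch, assuming a nested-tuple tree encoding (a leaf is a payoff tuple, a decision node is `(player, {action: child})`) and the payoff assignment read from the figure; `backward_induction` is a hypothetical helper.

```python
def backward_induction(node, players, label="root", plan=None):
    """Solve a finite perfect-information game tree by backward induction.
    A leaf is a tuple of payoffs (one per player); a decision node is
    (player_name, {action: child}). Every decision node is solved, including
    ones off the equilibrium path, so the plan is subgame perfect.
    Ties are broken in favour of the first action listed."""
    if plan is None:
        plan = {}
    if not isinstance(node[0], str):          # leaf: payoff vector
        return node, plan
    player, moves = node
    i = players.index(player)
    best_action, best_payoffs = None, None
    for action, child in moves.items():
        payoffs, _ = backward_induction(child, players, f"{label}/{action}", plan)
        if best_payoffs is None or payoffs[i] > best_payoffs[i]:
            best_action, best_payoffs = action, payoffs
    plan[label] = (player, best_action)
    return best_payoffs, plan

players = ["Khrushchev", "Kennedy"]
crisis = ("Khrushchev", {
    "Arm": ("Kennedy", {"Fold": (10, -10), "Nuke": (-100, -100)}),
    "Retract": (-1, 1),
})
outcome, plan = backward_induction(crisis, players)
print(outcome)  # (10, -10)
print(plan)     # {'root/Arm': ('Kennedy', 'Fold'), 'root': ('Khrushchev', 'Arm')}
```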



Types of games

Diplomacy



Take it or leave it deals

The rules of the game:
1. You will be randomly paired up with someone in the other section; this pairing will remain completely anonymous.
2. One of you will be chosen (by coin flip) to be either the Proposer or the Responder in this experiment.
3. The Proposer gets to make an offer to split $100 in some proportion with the Responder. So the Proposer can offer $x to the Responder, proposing to keep $100 − x for themselves.
4. The Responder must decide what is the lowest amount offered by the Proposer that he / she will accept; i.e., “I will accept any offer which is greater than or equal to $y.”
5. If the Responder accepts the offer made by the Proposer, they split the sum according to the proposal. If the Responder rejects, both parties lose their shares.
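The payoff rule of this game fits in a few lines. A minimal sketch; `ultimatum` and `best_offer` are hypothetical helpers, and `best_offer` assumes, unlike the experiment, that the Proposer knows the Responder's threshold y and offers whole dollars.

```python
def ultimatum(offer_x, threshold_y, pie=100):
    """Payoffs (proposer, responder) in the take-it-or-leave-it game.
    The proposer offers x and keeps pie - x; the responder accepts any offer >= y."""
    if not 0 <= offer_x <= pie:
        raise ValueError("offer must be between 0 and the pie size")
    if offer_x >= threshold_y:
        return pie - offer_x, offer_x   # split according to the proposal
    return 0, 0                         # rejection: both parties lose their shares

def best_offer(threshold_y, pie=100):
    """Proposer's best whole-dollar offer when the responder's threshold y is known."""
    return max(range(pie + 1), key=lambda x: ultimatum(x, threshold_y, pie)[0])

print(ultimatum(30, 20))  # (70, 30)
print(ultimatum(10, 20))  # (0, 0)
print(best_offer(20))     # 20
```

With a known threshold the Proposer offers exactly y; the interesting part of the experiment is that y is private to the Responder.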



AN EXAMPLE OF Buyer/Seller negotiation



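BARGAINING

[Figure: a price line with the seller's reservation price s, the buyer's reservation price b, and the final price x between them. The zone of possible agreement (ZOPA) lies between s and b; the seller's surplus lies between s and x, the buyer's surplus between x and b.]

• Seller's RP: the seller wants s or more.
• Buyer's RP: the buyer wants b or less.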



• If b < s: negative bargaining zone, no possible agreements.
• If b > s: positive bargaining zone, agreement possible.
• (x − s) is the seller's surplus; (b − x) is the buyer's surplus.
• The total surplus to divide, (x − s) + (b − x) = b − s, is independent of x: a constant-sum game!
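A small check of the constant-sum claim: moving the final price x shifts surplus between seller and buyer but never changes the total b − s. The prices are hypothetical, `surpluses` is an illustrative helper, and treating b = s as a (zero-surplus) feasible case is an assumption the slides do not address.

```python
def surpluses(s, b, x):
    """Split of the bargaining surplus at final price x.
    s: seller's reservation price (wants s or more),
    b: buyer's reservation price (wants b or less)."""
    if b < s:
        raise ValueError("negative bargaining zone: no agreement is possible")
    if not s <= x <= b:
        raise ValueError("final price x must lie in the zone [s, b]")
    seller, buyer = x - s, b - x
    return seller, buyer, seller + buyer

print(surpluses(100, 150, 120))  # (20, 30, 50)
print(surpluses(100, 150, 140))  # (40, 10, 50): a different split, same total
```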



POSITIVE BARGAINING ZONE

[Figure: the sellers’ bargaining range (between the sellers’ reservation point and target point) overlaps the buyers’ bargaining range (between the buyers’ target point and reservation point); the overlap is the POSITIVE bargaining zone.]



NEGATIVE BARGAINING ZONE

[Figure: the sellers’ bargaining range and the buyers’ bargaining range (each between its target and reservation points) do not overlap; the gap between the reservation points is the NEGATIVE bargaining zone.]



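Single issue negotiation

• Agents a and b negotiate over a pie of size 1.
• Offer: (x, y), with x + y = 1 (x is a's share, y is b's share).
• Deadline: n; discount factor: δ.
• Utility:
$$U_a((x,y),t) = \begin{cases} x\,\delta^{t-1} & t \le n \\ 0 & \text{otherwise} \end{cases} \qquad U_b((x,y),t) = \begin{cases} y\,\delta^{t-1} & t \le n \\ 0 & \text{otherwise} \end{cases}$$
• The agents negotiate using Rubinstein’s alternating offers protocol.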



Alternating offers protocol

Time    Offer          Respond
1       a: (x1, y1)    b: accept/reject
2       b: (x2, y2)    a: accept/reject
…
n



How much should an agent offer if there is only one time period?

• Let n = 1 and let a be the first mover.
• Equilibrium strategies
◦ Agent a’s offer: propose to keep the whole pie, (1, 0); agent b will accept this.



Equilibrium strategies for n = 2, δ = 1/4, first mover: a

Offer: (x, y), where x is a’s share and y is b’s share. Optimal offers are obtained using backward induction.

Time    Offering agent    Offer         Utility (a; b)
1       a → b             (3/4, 1/4)    3/4; 1/4
2       b → a             (0, 1)        0; 1/4

Agreement is reached at time 1: the offer (3/4, 1/4) forms a P.E. (perfect equilibrium) Nash equilibrium.
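The backward induction generalizes to any deadline. If the responder at time t gets share y, its utility is y·δ^(t−1); by rejecting it becomes the proposer at t+1 and gets s(t+1)·δ^t. Equating the two gives y = δ·s(t+1), so the proposer keeps s(t) = 1 − δ·s(t+1), with s(n) = 1. A minimal sketch, assuming a common δ for both agents (as on the slide) and that an indifferent responder accepts; one pass of n − 1 steps matches the O(n) cost stated in the key points.

```python
from fractions import Fraction as F

def first_mover_share(n, delta):
    """First mover's share of the pie in the equilibrium of the alternating offers
    protocol with deadline n and a common discount factor delta.
    At t = n the proposer keeps the whole pie. At t < n the proposer offers the
    responder y = delta * s(t+1), which is exactly what the responder would
    secure by proposing at t+1, so s(t) = 1 - delta * s(t+1)."""
    share = F(1)               # s(n)
    for _ in range(n - 1):     # backward induction from t = n-1 down to t = 1
        share = 1 - delta * share
    return share

print(first_mover_share(1, F(1, 4)))  # 1
print(first_mover_share(2, F(1, 4)))  # 3/4, the offer (3/4, 1/4) in the table
print(first_mover_share(3, F(1, 4)))  # 13/16
```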



Effect of discount factor and deadline on the equilibrium outcome

• What happens to the first mover’s share as δ increases?
• What happens to the second mover’s share as δ increases?
• As the deadline increases, what happens to the first mover’s share? Likewise for the second mover?

[Figure: Effect of δ and deadline on the agents’ shares.]



Multiple issues

• Set of issues: S = {1, 2, …, m}. Each issue is a pie of size 1.
• The issues are divisible.
• Deadline: n (for all the issues).
• Discount factor: δ_c for issue c.
• Utility: $U(x, t) = \sum_{c} U(x_c, t)$



Multi-issue procedures

• Package deal procedure: the issues are bundled and discussed together as a package.
• Simultaneous procedure: the issues are negotiated in parallel but independently of each other.
• Sequential procedure: the issues are negotiated sequentially, one after another.



Package deal procedure

• Issues are negotiated using the alternating offers protocol.
• An offer specifies a division for each of the m issues.
• The agents are allowed to accept/reject a complete offer.
• The agents may have different preferences over the issues.
• The agents can make trade-offs across the issues to maximize their utility; this leads to a Pareto optimal outcome.



Utility for two issues

U_a = 2X + Y
U_b = X + 2Y



Making trade-offs

U_b = 2
What is a’s utility for U_b = 2?



Example for two issues

• Deadline: n = 2
• Discount factors: δ1 = δ2 = 1/2
• Utilities: $U_a = (1/2)^{t-1}(x_1 + 2x_2)$; $U_b = (1/2)^{t-1}(2y_1 + y_2)$

Time    Offering agent    Package offer
1       a → b             [(1/4, 3/4); (1, 0)] OR [(3/4, 1/4); (0, 1)]
2       b → a             [(0, 1); (0, 1)], U_b = 1.5

Agreement is reached at time 1. The outcome is not symmetric.



P.E. Nash equilibrium strategies

For t = n:
• The offering agent takes 100 percent of all the issues.
• The receiving agent accepts.

For t < n (for agent a):
• OFFER [x, y] s.t. U_b(y, t) = EQ_UB(t+1)
◦ If there is more than one such [x, y], perform trade-offs across issues to find the best offer.
• RECEIVE [x, y]
◦ If U_a(x, t) ≥ EQ_UA(t+1), ACCEPT; else REJECT.

EQ_UA(t+1) is a’s equilibrium utility for t+1.
EQ_UB(t+1) is b’s equilibrium utility for t+1.



Making trade-offs: divisible issues

Agent a’s trade-off problem at time t:

TR: Find a package [x, y] to

$$\text{maximize} \sum_{c=1}^{m} k^a_c\, x_c \quad \text{subject to} \quad \sum_{c=1}^{m} k^b_c\, y_c \ge EQ_{UB}(t+1), \quad 0 \le x_c \le 1,\; 0 \le y_c \le 1$$

where x_c + y_c = 1 for each issue c. This is the fractional knapsack problem.



Making trade-offs: divisible issues

Agent a’s perspective (time t):
• Agent a considers the m issues in increasing order of k^a_c / k^b_c and assigns to b the maximum possible share of each of them until b’s cumulative utility equals EQ_UB(t+1).
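A minimal sketch of this greedy rule (the standard greedy solution of the fractional knapsack problem), assuming positive additive weights and that the target is reachable. It is run on the two-issue example at t = 1, where b's equilibrium utility for t = 2 is 1.5 (b takes everything at t = 2: (1/2)·(2 + 1)); `divisible_tradeoff` is a hypothetical name.

```python
from fractions import Fraction as F

def divisible_tradeoff(ka, kb, target):
    """Agent a's best package for divisible issues (a fractional knapsack).
    ka[c], kb[c] > 0: weights of issue c in a's and b's additive utilities.
    target: the utility b must receive, EQ_UB(t+1).
    Returns (x, y): a's and b's shares of each issue, with x[c] + y[c] = 1."""
    m = len(ka)
    y = [F(0)] * m
    need = F(target)
    # Concede first the issues where a loses least per unit of utility b gains
    for c in sorted(range(m), key=lambda c: F(ka[c]) / kb[c]):
        if need <= 0:
            break
        y[c] = min(F(1), need / kb[c])
        need -= kb[c] * y[c]
    if need > 0:
        raise ValueError("target exceeds b's utility from the whole pie")
    x = [1 - yc for yc in y]
    return x, y

# Two-issue example at t = 1: U_a = x1 + 2*x2, U_b = 2*y1 + y2, EQ_UB(2) = 1.5
x, y = divisible_tradeoff(ka=[1, 2], kb=[2, 1], target=F(3, 2))
print(x, y)  # a gets (1/4, 1), b gets (3/4, 0)
```

The result is the package [(1/4, 3/4); (1, 0)], giving a a utility of 1/4 + 2 = 9/4 and b exactly 3/2.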






Making trade-offs: indivisible issues

Agent a’s trade-off problem at time t is to find a package [x, y] that

$$\text{maximizes} \sum_{c=1}^{m} k^a_c\, x_c \quad \text{subject to} \quad \sum_{c=1}^{m} k^b_c\, y_c \ge EQ_{UB}(t+1), \quad x_c \in \{0,1\},\; y_c \in \{0,1\}$$

For indivisible issues, this is the integer knapsack problem.
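For a handful of issues, the 0/1 problem can be solved exactly by enumerating every way of handing whole issues to b. This brute-force sketch is exponential in m and is only an illustration; it is not the FPTAS mentioned in the key points. The three-issue instance is hypothetical.

```python
from itertools import product

def indivisible_tradeoff(ka, kb, target):
    """Exact solution of agent a's 0/1 trade-off problem by enumerating all 2^m
    ways of handing whole issues to b; only practical for a small number of issues.
    Returns (a's utility, y) with y[c] = 1 if issue c goes to b, or None if infeasible."""
    best = None
    for y in product((0, 1), repeat=len(ka)):
        if sum(k * yc for k, yc in zip(kb, y)) < target:
            continue                  # b would reject: below its equilibrium utility
        ua = sum(k * (1 - yc) for k, yc in zip(ka, y))   # a keeps the other issues
        if best is None or ua > best[0]:
            best = (ua, y)
    return best

# Hypothetical three-issue instance
print(indivisible_tradeoff(ka=[1, 2, 3], kb=[3, 1, 2], target=3))  # (5, (1, 0, 0))
```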



Key points

• Single issue:
◦ Time to compute the equilibrium is O(n).
◦ The equilibrium is not unique; it is not symmetric.
• Multiple divisible issues (exact solution):
◦ Time to compute the equilibrium for t = 1 is O(mn).
◦ The equilibrium is Pareto optimal; it is not unique; it is not symmetric.
• Multiple indivisible issues (approximate solution):
◦ There is an FPTAS to compute an approximate equilibrium.
◦ The equilibrium is Pareto optimal; it is not unique; it is not symmetric.



Negotiation on data allocation in multi-server environment

R. Azulay-Schwartz and S. Kraus. Negotiation On Data Allocation in Multi-Agent Environments. Autonomous Agents and Multi-Agent Systems journal 5(2):123-172, 2002.



Cooperative Web Servers

• The Data and Information System component of the Earth Observing System (EOSDIS) of NASA is a distributed knowledge system which supports archival and distribution of data at multiple and independent servers.

Cooperative Web Servers (cont.)

• Each data collection, or file, is called a dataset. The datasets are huge, so each dataset has only one copy.
• The current policy for data allocation in NASA is static: old datasets are not reallocated; each new dataset is located at the server with the nearest topics (defined according to the topics of the datasets stored by this server).



Related Work: File Allocation Problem

• The original problem: how to distribute files among computers in order to optimize the system performance.
• Our problem: how can self-motivated servers decide about the distribution of files, when each server has its own objectives?



Environment Description

• There are several information servers. Each server is located in a different geographical area.
• Each server receives queries from the clients in its area, and sends documents as responses to queries. These documents can be stored locally or in another server.



Environment Description (cont.)

[Figure: a client in area i sends a query to server i; the requested document(s) may be stored at server j in area j, at some distance, and are returned to the client.]



Basic Definitions

• SERVERS: the set of the servers.
• DATASETS: the set of datasets (files) to be allocated.
• Allocation: a mapping of each dataset to one of the servers. The set of all possible allocations is denoted by Allocs.
• U: the utility function of each server.



Utility Function

• U_server(alloc, t) specifies the utility of server from alloc ∈ Allocs at time t.
• It consists of:
◦ The utility from the assignment of each dataset.
◦ The cost of negotiation delay.
• At time 0:
$$U_{server}(alloc, 0) = \sum_{x \in DATASETS} V_{server}(x, alloc(x))$$



Parameters of utility

• query price: payment for retrieved documents.
• usage(ds, s): the expected number of documents of dataset ds requested by clients in the area of server s.
• storage costs, retrieve costs, answer costs.



Cost over time

• Cost of communication and computation time of the negotiation.
• Loss of unused information: new documents cannot be used until the negotiation ends.
• Dataset usage and storage cost are assumed to decrease over time, with the same discount ratio (p-1).
• Thus, there is a constant discount ratio of the utility from an allocation:
$$U_{server}(alloc, t) = \delta^t \cdot U_{server}(alloc, 0) - t \cdot C$$
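A minimal sketch of this utility: sum a server's per-dataset values under an allocation, then apply the discount and the per-period cost C. The value table `V`, the dataset and server names, δ = 1/2 and C = 1 are all hypothetical; the point is only that each period of delay lowers the utility of the same allocation.

```python
from fractions import Fraction as F

def utility_at_time(V, alloc, delta, C, t):
    """U_server(alloc, t) = delta**t * U_server(alloc, 0) - t*C, where
    U_server(alloc, 0) is the sum of V[(dataset, server)] over the allocation."""
    u0 = sum(V[(ds, srv)] for ds, srv in alloc.items())
    return delta ** t * u0 - t * C

# Hypothetical values of one server for two datasets placed at servers s1 / s2
V = {("d1", "s1"): 10, ("d1", "s2"): 4, ("d2", "s1"): 3, ("d2", "s2"): 6}
alloc = {"d1": "s1", "d2": "s2"}
print([utility_at_time(V, alloc, F(1, 2), 1, t) for t in range(3)])  # values 16, 7, 2
```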



Assumptions

• Each server prefers any agreement over continuation of the negotiation indefinitely.
• The utility of each server from the conflict allocation is always greater than or equal to 0.
• OFFERS: the set of allocations that are preferred by all the agents over opting out.



Negotiation Analysis: Simultaneous Responses

• Simultaneous responses: a server, when responding, is not informed of the other responses.
• Theorem: for each offer x ∈ OFFERS, there is a subgame-perfect equilibrium of the bargaining game, with the outcome x offered and unanimously accepted in period 0.



Choosing the Allocation

• The designers of the servers can agree in advance on a joint technique for choosing x:
◦ giving each server its conflict utility
◦ maximizing a social welfare criterion: the sum of the servers’ utilities, or the generalized Nash product of the servers’ utilities, $\prod_s (U_s(x) - U_s(conflict))$
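The two welfare criteria can pick different allocations from the same OFFERS set. A minimal sketch with two hypothetical servers `A`, `B` and three hypothetical candidate allocations; the unweighted product follows the formula on the slide, and `choose_allocation` is an illustrative name.

```python
from math import prod

def choose_allocation(offers, utils, conflict, criterion="sum"):
    """Pick an allocation from OFFERS by a social welfare criterion.
    utils[s](alloc): utility of server s from alloc; conflict[s]: its conflict utility."""
    def welfare(alloc):
        if criterion == "sum":
            return sum(u(alloc) for u in utils.values())
        if criterion == "nash":
            # generalized Nash product: prod_s (U_s(x) - U_s(conflict))
            return prod(utils[s](alloc) - conflict[s] for s in utils)
        raise ValueError(f"unknown criterion: {criterion}")
    return max(offers, key=welfare)

# Hypothetical utilities of two servers for three candidate allocations
table = {"x1": {"A": 10, "B": 2}, "x2": {"A": 6, "B": 5}, "x3": {"A": 4, "B": 6}}
utils = {s: (lambda alloc, s=s: table[alloc][s]) for s in ("A", "B")}
conflict = {"A": 1, "B": 1}
print(choose_allocation(list(table), utils, conflict, "sum"))   # x1 (sum 12)
print(choose_allocation(list(table), utils, conflict, "nash"))  # x2 (product 5 * 4 = 20)
```

The sum favours the lopsided allocation x1, while the Nash product favours the more balanced x2.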



Experimental Evaluation

• How do the parameters influence the results of the negotiation?
• vcost(alloc): the variable costs due to an allocation (excludes storage_cost and the gains due to queries).
• vcost_ratio: the ratio of the vcosts when using negotiation to the vcosts of the static allocation.



Effect of Parameters on The Results

• As the number of servers grows, vcost_ratio increases (more complex computations).
• As the number of datasets grows, vcost_ratio decreases (negotiation is more beneficial).
• Changing the mean usage did not influence vcost_ratio significantly, but vcost_ratio decreases as the standard deviation of the usage increases.



Influence of Parameters (cont.)

• When the standard deviation of the distances between servers increases, vcost_ratio decreases.
• When the distance between servers increases, vcost_ratio decreases.
• In the domains tested:
◦ answer_cost: unfavorable effect on vcost_ratio
◦ storage_cost: unfavorable effect on vcost_ratio
◦ retrieve_cost: favorable effect on vcost_ratio
◦ query_price: favorable effect on vcost_ratio



Incomplete Information

• Each server knows:
◦ The usage frequency of all datasets by clients from its area.
◦ The usage frequency of the datasets stored in it, by all clients.




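[Figure: the ZOPA price line under incomplete information. The seller's reservation price may be low or high (s_L or s_H) and the buyer's may be low or high (b_L or b_H); the final price x lies between the seller's and the buyer's reservation prices and divides the seller's surplus from the buyer's surplus.]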



Definition of a Bayesian game

• N is the set of players.
• Ω is the set of the states of nature.
• A_i is the set of actions for player i; A = A_1 × A_2 × … × A_n.
• T_i is the type set of player i. For each state of nature, the game will have different types of players (one type per player).
• u_i: Ω × A → R is the payoff function for player i.
• p_i is the probability distribution over Ω for each player i; that is to say, each player has a different view of the probability distribution over the states of nature. In the game, they never know the exact state of nature.



Solution concepts for Bayesian games

A (Bayesian) Nash equilibrium is a strategy profile, together with beliefs specified for each player about the types of the other players, that maximizes the expected utility for each player given their beliefs about the other players' types and given the strategies played by the other players.
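The core step in a Bayesian game is a best response to beliefs about the other player's type. A minimal sketch for the buyer/seller setting with a low or high seller reservation price (s_L, s_H): the seller's response is fixed (accept any price at or above its reservation price), and the buyer maximizes expected utility over its belief. This is one step of Bayesian reasoning, not a full equilibrium computation; the prices, probabilities, and `best_take_it_or_leave_it` are hypothetical.

```python
from fractions import Fraction as F

def best_take_it_or_leave_it(b, seller_types):
    """Buyer's expected-utility-maximizing price when the seller's reservation price is unknown.
    b: buyer's reservation price; seller_types: list of (reservation_price, probability).
    A seller accepts any price at or above its reservation price. Only the possible
    reservation prices need checking: any price in between is accepted by the same
    types as the reservation price just below it, but costs more."""
    def eu(price):
        p_accept = sum(q for s, q in seller_types if s <= price)
        return (b - price) * p_accept
    candidates = [s for s, _ in seller_types if s <= b]
    if not candidates:
        return None               # no seller type can be satisfied
    best = max(candidates, key=eu)
    return best, eu(best)

# Hypothetical types: seller's reservation price is s_L = 40 (prob 3/5) or s_H = 70 (prob 2/5)
print(best_take_it_or_leave_it(100, [(40, F(3, 5)), (70, F(2, 5))]))  # offers 40, EU = 36
```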



Incomplete Information (cont.)

• A revelation mechanism:
◦ First, all the servers simultaneously report all their private information: for each dataset, the past usage of the dataset by this server; and for each server, the past usage of each local dataset by this server.
◦ Then, the negotiation proceeds as in the complete information case.



Incomplete Information (cont.)

• Lemma: there is a Nash equilibrium where each server tells the truth about its past usage of remote datasets, and about the other servers' usage of its local datasets.
• Lies concerning details about local usage of local datasets are intractable.



Summary: negotiation on data allocation

• We have considered the data allocation problem in a distributed environment.
• We have presented the utility function of the servers, which expresses their preferences.
• We have proposed using a negotiation protocol for solving the problem.
• For incomplete information situations, a revelation process was added to the protocol.



Agent-Human Negotiation

Computers interacting with people

[Figure: a spectrum from "Computer has the control" to "Human has the control"; label: "Computer persuades human".]






Culture sensitive agents

The development of a standardized agent to be used in the collection of data for studies on culture and negotiation

…r agents negotiate well across cultures

Semi-autonomous cars



Medical applications

Gertner Institute for Epidemiology and Health Policy Research



Automated care-taker

“I will be too tired in the afternoon!!!”
“I scheduled an appointment for you at the physiotherapist.”
Try to reschedule and fail
“The physiotherapist has no other available appointments. How about resting before the appointment?”

Security applications

Collect / Update / Analyze / Prioritize

People often follow suboptimal decision strategies

Irrationalities attributed to
◦ sensitivity to context
◦ lack of knowledge of own preferences
◦ the effects of complexity
◦ the interplay between emotion and cognition
◦ the problem of self control
◦ bounded rationality



Agents that play repeatedly with the same person

AutONA [BY03]

• Buyers and sellers
• Using data from previous experiments
• Belief function to model opponent
• Implemented several tactics and heuristics
◦ including a concession mechanism

A. Byde, M. Yearworth, K.-Y. Chen, and C. Bartolini. AutONA: A system for automated multiple 1-1 negotiation. In CEC, pages 59–67, 2003.

Cliff-Edge

• Virtual learning and reinforcement learning
• Using data from previous interactions
• Implemented several tactics and heuristics
◦ qualitative in nature
• Non-deterministic behavior, by means of randomization

R. Katz and S. Kraus. Efficient agents for cliff edge environments with a large set of decision options. In AAMAS, pages 697–704, 2006.



Agents that play with the same person only once

General opponent* modeling



Challenges of human opponent modeling

• Small number of examples
◦ difficult to collect data on people
• Noisy data
◦ people are inconsistent (the same person may act differently)
◦ people are diverse



Guessing Heuristic

• Multi-issue, multi-attribute, with incomplete information
• Domain independent
• Implemented several tactics and heuristics
◦ including a concession mechanism

C. M. Jonker, V. Robu, and J. Treur. An agent architecture for multi-attribute negotiation using incomplete preference information. JAAMAS, 15(2):221–252, 2007.



PURB Agent

• Building blocks: personality model, utility function, rules for guiding choice.
• Key idea: models personality traits of its negotiation partners over time.
• Uses decision theory to decide how to negotiate, with a utility function that depends on models and other environmental features.
• Pre-defined rules facilitate computation.
• Plays as well as people; adapts to c…

QOAgent [LIN08]

• Multi-issue, multi-attribute, with incomplete information
• Domain independent
• Implemented several tactics and heuristics
◦ qualitative in nature
• Non-deterministic behavior, also by means of randomization
• Played at least as well as p…

R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry. Negotiating with bounded rational agents in environments with incomplete information using an automated agent. Artificial Intelligence, 172(6-7):823–851, 2008.

Is it possible to improve the QOAgent? Yes, if you have data.



KBAgent

Y. Oshrat, R. Lin, and S. Kraus. Facing the challenge of human-agent negotiations via effective general opponent modeling. In AAMAS, 2009.

• Multi-issue, multi-attribute, with incomplete information
• Domain independent
• Implemented several tactics and heuristics
◦ qualitative in nature
• Non-deterministic behavior, also by means of randomization
• Using data from previous interactions



General opponent modeling

• Challenge: sparse data of past negotiation sessions of people negotiating
• Technique: Kernel Density Estimation
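Kernel density estimation turns a few observed values into a smooth probability density, which suits sparse data. A minimal one-dimensional sketch with a Gaussian kernel and a fixed bandwidth; the kernel, the bandwidth, and the sample utilities are assumptions for illustration, not the KBAgent's actual estimator.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Kernel density estimate f(x) = 1/(n*h) * sum_i K((x - x_i)/h)
    with a standard Gaussian kernel K and a fixed bandwidth h."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in samples)
    return density

# Hypothetical utilities of offers that people accepted in past sessions
accepted = [300, 320, 350, 360, 400]
f = gaussian_kde(accepted, bandwidth=25)
print(f(340) > f(500))  # True: far more past mass near 340 than near 500
```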



General opponent modeling (cont.)

• Estimate the likelihood that the other party will:
◦ accept an offer
◦ make an offer
◦ and estimate its expected average utility
• The estimation is done separately for each possible agent type:
◦ the type of a negotiator is determined using a simple Bayes' classifier
• Use the estimation for decision making
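A simple Bayes classifier picks the negotiator type with the highest posterior given its observed behaviour. A minimal naive Bayes sketch; the type names, the accept/reject observations, and every probability are hypothetical and not taken from the KBAgent.

```python
from fractions import Fraction as F

def classify_type(priors, likelihoods, observations):
    """Naive Bayes: P(type | obs) is proportional to P(type) * prod_o P(o | type).
    Returns the normalized posterior over negotiator types."""
    scores = {}
    for typ, prior in priors.items():
        score = prior
        for o in observations:
            score *= likelihoods[typ][o]
        scores[typ] = score
    total = sum(scores.values())
    return {typ: s / total for typ, s in scores.items()}

# Hypothetical types and per-type probabilities of responses to offers
priors = {"type_A": F(1, 2), "type_B": F(1, 2)}
likelihoods = {
    "type_A": {"accept": F(3, 4), "reject": F(1, 4)},
    "type_B": {"accept": F(1, 4), "reject": F(3, 4)},
}
posterior = classify_type(priors, likelihoods, ["accept", "accept", "reject"])
print(posterior)                          # type_A: 3/4, type_B: 1/4
print(max(posterior, key=posterior.get))  # type_A
```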



110

KBAgent KBAgent as the job candidateas the job candidate

Best result: 20,000, Project manager, with leased car; 20% pension funds, fast promotion, 8 hours

KBAgent vs. Human:
20,000, Team Manager, with leased car; pension: 20%, slow promotion, 9 hours
12,000, Programmer, without leased car; pension: 10%, fast promotion, 10 hours
20,000, Project manager, without leased car; pension: 20%, slow promotion, 9 hours

KBAgent as the job candidate

Best agreement: 20,000, Project manager, with leased car; 20% pension funds, fast promotion, 8 hours

KBAgent vs. Human:
20,000, Programmer, with leased car; pension: 10%, slow promotion, 9 hours
Round 7: 12,000, Programmer, without leased car; pension: 10%, fast promotion, 10 hours
20,000, Team Manager, with leased car; pension: 20%, slow promotion, 9 hours

Experiments

172 grad and undergrad students in Computer Science

People were told they may be playing a computer agent or a person.

Scenarios:
◦ Employer-Employee
◦ Tobacco Convention: England vs. Zimbabwe
Learned from 20 human-human games

Results: comparing KBAgent to others

Average utility value (std), by role and player type:

Role            Player type           Average utility (std)
Employer        KBAgent vs. people    468.9 (37.0)
Employer        QOAgent vs. people    417.4 (135.9)
Employer        People vs. people     408.9 (106.7)
Employer        People vs. QOAgent    431.8 (80.8)
Employer        People vs. KBAgent    380.4 (48.5)
Job candidate   KBAgent vs. people    482.7 (57.5)
Job candidate   QOAgent vs. people    397.8 (86.0)
Job candidate   People vs. people     310.3 (143.6)
Job candidate   People vs. QOAgent    320.5 (112.7)
Job candidate   People vs. KBAgent    370.5 (58.9)

Main results

In comparison to the QOAgent:
◦ The KBAgent achieved higher utility values than the QOAgent.
◦ More agreements were accepted by people.
◦ The sum of utility values (social welfare) was higher when the KBAgent was involved.
The KBAgent achieved significantly higher utility values than people.
The results demonstrate the proficiency of the negotiation done by the KBAgent.

General opponent* modeling improves agent ba…

Automated care-taker

I will be too tired in the afternoon!!!
I arrange for you to go to the physiotherapist in the …

How can I convince him? What argument should I give?

Security applications
How should I convince him to provide me with information?

Should I tell him that we are running out of antibiotics?



Which information to reveal?

Argumentation

Should I tell him that I will lose a project if I don’t hire t…

Should I tell him I was fired from my last job?

Should I tell her that my leg hurts?

Build a game that combines information revelation and bargaining


Colored Trails (CT)

An infrastructure for agent design, implementation and evaluation for open environments
Designed with Barbara Grosz (AAMAS 2004)
Implemented by the Harvard team and the BIU team

An experimental test-bed

Interesting for people to play:
◦ analogous to task settings;
◦ vivid representation of the strategy space (not just a list of outcomes).
Possible for computers to play.
Can vary in complexity:
◦ repeated vs. one-shot setting;
◦ availability of information;
◦ communication protocol.

Social Preference Agent

Learns the extent to which people are affected by social preferences such as social welfare and competitiveness.
Designed for one-shot take-it-or-leave-it scenarios.
Does not reason about the future ramifications of its actions.

Y. Gal and A. Pfeffer: Predicting people's bidding behavior in negotiation. AAMAS 2006: 370-376




Agents for Revelation Games

Noam Peled, Kobi Gal, Sarit Kraus

Introduction - Revelation games

Combine two types of interaction:
◦ Signaling games (Spence 1974): players choose whether to convey private information to each other
◦ Bargaining games (Osborne and Rubinstein 1999): players engage in multiple negotiation rounds
Example: job interview

Colored Trails (CT)

Figure panels: asymmetric board; symmetric board

Why not equilibrium agents?

Results from the social sciences suggest people do not follow equilibrium strategies:
◦ Equilibrium-based agents that played against people failed.
People rarely design agents to follow equilibrium strategies (Sarne et al., AAMAS 2008).
Equilibrium strategies are usually not cooperative, so all players lose.



PE agent – Phase one

First proposal round (generous):
◦ First proposer: proposes the opponent’s counter-proposal.
◦ First responder: accepts any proposal that gives it the same or higher benefit than its counter-proposal.
Revelation phase (revelation vs. non-revelation): on both boards, the PE agent with goal revelation yields lower or equal expected utility than the non-revelation PE agent.
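The two first-round rules above are simple enough to state as code. A minimal sketch, assuming benefits are plain numbers and that the opponent's counter-proposal has already been predicted by some other component (not shown); the function names are hypothetical.

```python
def pe_first_proposal(predicted_counter_proposal):
    """First proposer: offer exactly what the opponent would itself counter-propose."""
    return predicted_counter_proposal


def pe_first_response(offer_benefit, own_counter_proposal_benefit):
    """First responder: accept any proposal worth at least as much as its own counter-proposal."""
    return offer_benefit >= own_counter_proposal_benefit


# Hypothetical numbers: the responder's own counter-proposal would give it a benefit of 90.
print(pe_first_response(95, 90))  # True (accept)
print(pe_first_response(80, 90))  # False (reject)
```

When the proposer predicts the counter-proposal correctly, the two rules mesh: the generous opening is worth exactly the responder's counter-proposal, so it is accepted at once.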

Benefits Diversity

Average proposed benefit to players from the first and second rounds

Performance of PEQ agent

Revelation Effect

Only 35% of the games played by humans included revelation.
Revelation had a significant effect on human performance but not on agent performance.
Revelation didn't help the agent.
People were deterred by the strategic machine-generated proposals.

SIGAL agent

Agent based on general opponent modeling:
◦ Genetic algorithm
◦ Logistic regression

SIGAL Agent

Learns from previous games.
Predicts the acceptance probability for each proposal using logistic regression.
Models the human as using a weighted utility function of:
◦ the human's benefit
◦ the benefits difference
◦ the revelation decision
◦ the benefits in the previous round
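The acceptance model above is a logistic regression over the listed features. A minimal sketch of the prediction step only, with stated assumptions: the feature names are ours, `benefit_difference` is taken to be the agent's benefit minus the human's, revelation is coded 0/1, and the weights and bias are invented for illustration (they are not the learned values reported in the talk).

```python
import math

# Features from the slide; names and the sign convention for the difference are assumptions.
FEATURES = ("human_benefit", "benefit_difference", "human_revealed", "prev_round_benefit")


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_accept(proposal, weights, bias):
    """P(human accepts) = sigmoid(bias + weighted sum of the proposal's features)."""
    z = bias + sum(weights[f] * proposal[f] for f in FEATURES)
    return sigmoid(z)


# Hypothetical weights and bias.
weights = {"human_benefit": 0.05, "benefit_difference": -0.04,
           "human_revealed": 0.3, "prev_round_benefit": 0.01}
bias = -4.0

proposal = {"human_benefit": 100, "benefit_difference": 20,
            "human_revealed": 1, "prev_round_benefit": 90}
print(round(p_accept(proposal, weights, bias), 3))
```

A negative weight on the benefits difference encodes the idea that people are less likely to accept proposals that favor the agent.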

Logistic Regression using a Genetic Algorithm

Genetic Algorithm
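The slide title names a genetic algorithm as the way the logistic-regression weights are fitted. A minimal sketch of that idea, assuming a simple scheme of our own choosing: score each weight vector by its log-likelihood, keep the better half unchanged, and breed children by averaging two survivors and adding Gaussian noise. The one-feature toy data set is hypothetical.

```python
import math
import random


def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(weights, data):
    """weights = [bias, w1, ..., wk]; data = [(features, accepted), ...]."""
    total = 0.0
    for features, accepted in data:
        z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
        total += log_sigmoid(z) if accepted else log_sigmoid(-z)
    return total


def evolve_weights(data, n_weights, pop_size=40, generations=200, sigma=0.3, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []  # best log-likelihood per generation
    for _ in range(generations):
        population.sort(key=lambda w: log_likelihood(w, data), reverse=True)
        history.append(log_likelihood(population[0], data))
        parents = population[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover by averaging, then Gaussian mutation.
            children.append([(x + y) / 2 + rng.gauss(0, sigma) for x, y in zip(a, b)])
        population = parents + children
    population.sort(key=lambda w: log_likelihood(w, data), reverse=True)
    return population[0], history


# Hypothetical data: one scaled feature (benefit offered to the human) and whether it was accepted.
data = [([-3], 0), ([-2], 0), ([-1], 0), ([-0.5], 0), ([0.5], 1), ([1], 1), ([2], 1), ([3], 1)]
best, history = evolve_weights(data, n_weights=2)
print(best, history[0], history[-1])
```

Because the best individual always survives, the best log-likelihood never decreases from one generation to the next; a gradient-based fit would be the usual alternative.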

Expected benefit maximization
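Expected benefit maximization weighs what a proposal is worth to the agent by how likely it is to be accepted. A minimal sketch, assuming proposals are (agent benefit, human benefit) pairs, a hypothetical logistic acceptance model in the human's benefit, and a hypothetical fixed benefit of 40 to the agent if the proposal is rejected.

```python
import math


def best_proposal(candidates, p_accept, agent_benefit, benefit_if_rejected):
    """Pick the proposal maximizing
    P(accept) * benefit if accepted + (1 - P(accept)) * benefit if rejected."""
    def expected_benefit(proposal):
        q = p_accept(proposal)
        return q * agent_benefit(proposal) + (1 - q) * benefit_if_rejected

    return max(candidates, key=expected_benefit)


# Hypothetical proposals as (agent benefit, human benefit) pairs.
candidates = [(150, 50), (120, 80), (100, 100)]


def p_accept(proposal):
    # Hypothetical acceptance model: logistic in the human's benefit.
    return 1.0 / (1.0 + math.exp(-(proposal[1] - 70) / 10))


choice = best_proposal(candidates, p_accept, agent_benefit=lambda p: p[0], benefit_if_rejected=40)
print(choice)
```

Here the greedy offer is rarely accepted and the most generous one gives away more than the extra acceptance probability returns. In a two-round game, one option (an assumption, not necessarily the agent's exact method) is to solve the second round first and use its expected value as `benefit_if_rejected` in the first round.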



Maximization – round 2

Strategy Comparison

Strategies for the asymmetric board, when neither player has revealed, the human lacks 2 chips for reaching the goal, and the agent lacks 1:

* In the first round the agent was offered a benefit of 90

Heuristics

Tit for Tat: never give more than you ask for in the counter-proposal.
Risk averseness: isoelastic utility:
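The slide's isoelastic utility formula did not survive extraction. For reference, a minimal sketch of the textbook isoelastic (constant relative risk aversion) form, u(b) = (b^(1-η) - 1)/(1 - η), with u(b) = ln b at η = 1. Whether the agent used exactly this parameterization, and which η, are assumptions.

```python
import math


def isoelastic_utility(benefit, eta):
    """Textbook isoelastic (CRRA) utility; eta > 0 is risk averse, eta == 0 is risk neutral."""
    if benefit <= 0:
        raise ValueError("isoelastic utility is defined for positive benefits only")
    if eta == 1:
        return math.log(benefit)
    return (benefit ** (1 - eta) - 1) / (1 - eta)


# A risk-averse agent (eta = 2, hypothetical) prefers a sure 100 to a 50/50 gamble on 50 or 150.
sure = isoelastic_utility(100, eta=2)
gamble = 0.5 * isoelastic_utility(50, eta=2) + 0.5 * isoelastic_utility(150, eta=2)
print(sure > gamble)
```

Because the function is concave for η > 0, the certain outcome beats a gamble with the same mean, which is what makes the agent risk averse.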

Learned Coefficients

Responder benefit: 0.96
Benefits difference: -0.79
Responder revelation: 0.26
Proposer revelation: 0.03
Responder benefit in first round: 0.45
Proposer benefit in first round: 0.33

Methodology

Cross validation: 10-fold.
Over-fitting removal: stop learning at the minimum of the generalization error.
Error calculation on a held-out test set, using new human-human games.
Performance prediction criteria.
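The methodology combines 10-fold cross validation with early stopping at the minimum of the generalization error. A minimal sketch of just those two pieces in plain Python; model training is left out, the examples are assumed to be shuffled beforehand, and the error curve in the usage lines is made up.

```python
def k_fold_indices(n, k=10):
    """Yield (train, validation) index lists for k-fold cross validation.
    Assumes the n examples were shuffled beforehand; fold sizes differ by at most one."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size


def early_stopping_epoch(validation_errors):
    """Stop learning at the epoch whose generalization (validation) error is minimal."""
    return min(range(len(validation_errors)), key=validation_errors.__getitem__)


folds = list(k_fold_indices(23, k=10))
print([len(v) for _, v in folds])                            # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
print(early_stopping_epoch([0.50, 0.41, 0.35, 0.37, 0.42]))  # 2
```

Each example lands in exactly one validation fold, so every game is used once for evaluation and nine times for training.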

Performance

General opponent* modeling in Maximization problems

AAT agent

Agent based on general* opponent modeling:
◦ Decision Tree / Naïve Bayes
◦ AAT

Aspiration Adaptation Theory (AAT)

Economic theory of people’s behavior (Selten)
◦ No utility function exists for decisions (!)
◦ Relative decisions are used instead
◦ Retreat and urgency are used for goal variables

Avi Rosenfeld and Sarit Kraus. Modeling Agents through Bounded Rationality Theories. Proc. of IJCAI 2009; JAAMAS, 2010.

Commodity search

Figure: prices observed during the search (1000, 900, 950)

If price < 800, buy; otherwise visit 5 stores and buy at the cheapest.
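The rule above reads like a person's satisficing strategy rather than an optimal search policy. A minimal sketch of one reading of it, assuming the shopper checks stores in turn, buys immediately at the first price below 800, and otherwise, after 5 stores, buys at the cheapest store seen. The function name and the extra prices in the examples are hypothetical.

```python
def commodity_search(prices, threshold=800, max_stores=5):
    """Return (store index, price) chosen by the rule: buy at the first price below
    `threshold`; otherwise visit `max_stores` stores and buy at the cheapest.
    Requires at least one store."""
    seen = []
    for i, price in enumerate(prices[:max_stores]):
        if price < threshold:
            return i, price  # good enough: stop searching and buy here
        seen.append(price)
    cheapest = min(range(len(seen)), key=seen.__getitem__)
    return cheapest, seen[cheapest]  # no bargain found: buy at the cheapest store visited


print(commodity_search([1000, 900, 950, 980, 920]))  # (1, 900)
print(commodity_search([1000, 790, 700]))            # (1, 790)
```

Note that the second call stops at 790 even though 700 is available further on; that is the price a satisficing rule pays for cutting search short.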

Results

General opponent* modeling in cooperative environments

Coordination with limited communication

Communication is not always possible:
◦ High communication costs
◦ Need to act undetected
◦ Damaged communication devices
◦ Language incompatibilities
◦ Goal: limited interruption of human activities

I. Zuckerman, S. Kraus and J. S. Rosenschein. Using Focal Point Learning to Improve Human-Machine Tacit Coordination, JAAMAS, 2010.

Focal Points (Examples)

Divide £100 into two piles. If your piles are identical to your coordination partner's, you get the £100. Otherwise, you get nothing.

101 equilibria
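The count of 101 can be checked by brute force. A minimal sketch, assuming the piles are whole pounds and ordered (first pile, second pile), so the strategies are (0, 100), (1, 99), ..., (100, 0). Every matching pair of choices is a pure-strategy Nash equilibrium and no mismatched pair is, which is exactly why the players need something beyond equilibrium reasoning to coordinate.

```python
def coordination_equilibria(total=100):
    """Enumerate pure-strategy Nash equilibria of the 'divide into two piles' coordination game."""
    strategies = [(a, total - a) for a in range(total + 1)]

    def payoff(mine, theirs):
        # Both players get `total` only if their divisions are identical.
        return total if mine == theirs else 0

    equilibria = []
    for s in strategies:
        for t in strategies:
            # Equilibrium: neither player can gain by changing its own division alone.
            p1_stable = all(payoff(d, t) <= payoff(s, t) for d in strategies)
            p2_stable = all(payoff(d, s) <= payoff(t, s) for d in strategies)
            if p1_stable and p2_stable:
                equilibria.append((s, t))
    return equilibria


print(len(coordination_equilibria()))  # 101
```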

Focal points (Examples)

Figure panels: 9 equilibria; 16 equilibria

Focal Points

Thomas Schelling (63)

Focal points = prominent solutions to tacit coordination games

Prior work: Focal Points Based Coordination for closed environments

Domain-independent rules that could be used by automated agents to identify focal points:
Properties: Centrality, Firstness, Extremeness, Singularity.
◦ Logic-based model
◦ Decision-theory-based model
Algorithms for agent coordination

Kraus and Rosenschein, MAAMAW 1992; Fenster et al., ICMAS 1995; Annals of Mathematics and Artificial Intelligence, 2000
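To make the four properties concrete, here is a minimal sketch that scores options laid out in a row, in the spirit of a "pick the pile" style game. The scoring formulas and the weights are our own illustrative assumptions, not the logic-based or decision-theory-based models cited above; a learning agent could instead feed such a feature vector to a classifier.

```python
from collections import Counter


def focal_point_features(options):
    """Score each option (an attribute value, e.g. a colour, listed left to right)
    on the four focal-point properties. Scores are in [0, 1]; formulas are assumptions."""
    n = len(options)
    counts = Counter(options)
    center = (n - 1) / 2
    features = []
    for i, value in enumerate(options):
        features.append({
            "centrality": 1 - abs(i - center) / center if n > 1 else 1.0,
            "firstness": 1.0 if i == 0 else 0.0,
            "extremeness": 1.0 if i in (0, n - 1) else 0.0,
            "singularity": 1.0 if counts[value] == 1 else 0.0,
        })
    return features


def pick_focal_point(options, weights):
    """Choose the index of the option with the highest weighted sum of focal-point features."""
    feats = focal_point_features(options)
    scores = [sum(weights[k] * f[k] for k in weights) for f in feats]
    return max(range(len(options)), key=scores.__getitem__)


# Hypothetical weights.
weights = {"centrality": 1.0, "firstness": 0.5, "extremeness": 0.5, "singularity": 1.0}
print(pick_focal_point(["blue", "blue", "blue", "red", "blue", "blue", "blue"], weights))  # 3
```

The single red option in the middle is both central and singular, so it stands out; with different weights, or a different layout, another option could become the focal point.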

FPL agent

Agent based on general* opponent modeling:
◦ Decision Tree / neural network
◦ Focal Point

FPL agent

Agent based on general opponent modeling:
◦ Decision Tree / neural network
◦ raw data vector
◦ FP vector

Focal Point Learning

3 experimental domains:

Results – cont’

General opponent* modeling improves agent coordination

“very similar domain” (VSD) vs “similar domain” (SD) of the “pick the pile” game.



Experimenting with people is a costly process

Evaluation of agents (EDA)

Peer Designed Agents (PDA): computer agents developed by humans
Experiment: 300 human subjects, 50 PDAs, 3 EDA
Results:
◦ EDA outperformed PDAs in the same situations in which they outperformed people;
◦ on average, EDA exhibited the same measure of generosity.

R. Lin, S. Kraus, Y. Oshrat and Y. Gal. Facilitating the Evaluation of Automated Negotiators using Peer Designed Agents, in AAAI 2010.

Conclusions

Negotiation and argumentation with people is required for many applications

General* opponent modeling is beneficial:
◦ Machine learning
◦ Behavioral model
◦ Challenge: how to integrate machine learning and behavioral models

References

1. S.S. Fatima, M. Wooldridge, and N.R. Jennings, Multi-issue negotiation with deadlines, Jnl of AI Research, 21: 381-471, 2006.

2. R. Keeney and H. Raiffa, Decisions with multiple objectives: Preferences and value trade-offs, John Wiley, 1976.
3. S. Kraus, Strategic negotiation in multiagent environments, The MIT press, 2001.

4. S. Kraus and D. Lehmann. Designing and Building a Negotiating Automated Agent, Computational Intelligence, 11(1):132-171, 1995.

5. S. Kraus, K. Sycara and A. Evenchik. Reaching agreements through argumentation: a logical model and implementation. Artificial Intelligence journal, 104(1-2):1-69, 1998.

6. R. Lin and Sarit Kraus. Can Automated Agents Proficiently Negotiate With Humans? Communications of the ACM Vol. 53 No. 1, Pages 78-88, January 2010.

7. R. Lin, S. Kraus, Y. Oshrat and Y. Gal. Facilitating the Evaluation of Automated Negotiators using Peer Designed Agents, in AAAI 2010.

8. R. Lin, S. Kraus, J. Wilkenfeld, and J. Barry. Negotiating with bounded rational agents in environments with incomplete information using an automated agent. Artificial Intelligence, 172(6-7):823–851, 2008.

9. A. Lomuscio, M. Wooldridge, and N.R. Jennings, A classification scheme for negotiation in electronic commerce, Int. Jnl. of Group Decision and Negotiation, 12(1), 31-56, 2003.

10. M.J. Osborne and A. Rubinstein, A course in game theory, The MIT press, 1994.

11. M.J. Osborne and A. Rubinstein, Bargaining and Markets, Academic Press, 1990.

12. Y. Oshrat, R. Lin, and S. Kraus. Facing the challenge of human-agent negotiations via effective general opponent modeling. In AAMAS, 2009.

13. H. Raiffa, The Art and Science of Negotiation, Harvard University Press, 1982.

14. J.S. Rosenschein and G. Zlotkin, Rules of encounter, The MIT press, 1994.
15. I. Stahl, Bargaining Theory, Economics Research Institute, Stockholm School of Economics, 1972.

16. I. Zuckerman, S. Kraus and J. S. Rosenschein. Using Focal Point Learning to Improve Human-Machine Tacit Coordination, JAAMAS, 2010.


Tournament

2nd annual competition of state-of-the-art negotiating agents, to be held at AAMAS’11

Do you want to participate?

At least $2,000 for the winner!
