worlds with many intelligent agents

33
Worlds with many intelligent agents An important consideration in AI, as well as games, distributed systems and networking, economics, sociology, political science, international relations, and other disciplines

Upload: colton

Post on 08-Feb-2016

38 views

Category:

Documents


0 download

DESCRIPTION

Worlds with many intelligent agents. An important consideration in AI, as well as games, distributed systems and networking, economics, sociology, political science, international relations, and other disciplines. Multi-agent Systems, or “Games”. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Worlds with  many intelligent agents

Worlds with many intelligent agents

An important consideration in AI, as well as games, distributed systems and networking,

economics, sociology, political science, international relations, and other disciplines

Page 2: Worlds with  many intelligent agents

Multi-agent Systems, or “Games”Worlds with multiple agents, somtimes called “games”, are VASTLY more complicated than worlds with single agents.

They’re also more interesting and fun!

There are many different aspects to learn about:- Representations (we will consider two, but there are MANY)- Evaluation metrics (I’ll show you a few standard ones, but again there are

MANY)- Inference and planning (we will talk about a few techniques)- Learning (We won’t really cover this)- Communication (We won’t directly cover this)- Computational Complexity (I’ll mention this in passing)- Applications (We’ll look at several examples, but there are TONS)

Page 3: Worlds with  many intelligent agents

Example Game (Prisoner’s Dilemma)Student A and Student B have been naughty, cheating on an exam. Their teacher finds some evidence, and sends them to the dean’s office, where the dean grills them individually to find out more.

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 4: Worlds with  many intelligent agents

Example Game (Prisoner’s Dilemma)Definitions: the set of choices available to each agent is called their action set, denoted Actions(A) or Actions(B).Each square in the grid is called an outcome. Mathematically, let a be an action in Actions(A), and b in Actions(B). O(a, b) is the outcome square. OA(a, b) is the reward for A, and OB(a, b) is the reward for B.This grid is called a normal form representation of the game.

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 5: Worlds with  many intelligent agents

Quiz: Best outcome?Which outcome is the best, and why?

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 6: Worlds with  many intelligent agents

Answer: Best outcome?The answer depends. Let’s look at a few definitions of “best”.

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 7: Worlds with  many intelligent agents

Definition: Dominant StrategyDefinition: A strategy (or pure strategy) for player A (ditto for B) is a selection of one of the actions from Actions(A).

Definition: A’s strategy a ∊ Actions(A) is dominant if∀b ∊ Actions(B), ∀a’≠a ∊ Actions(A) . OA(a, b) > OA(a’, b)

Another way of saying this is: a is a dominant strategy for A if it is better than any other strategy a’ for A, no matter what strategy B chooses.

Page 8: Worlds with  many intelligent agents

Quiz: Dominant StrategyDefinition: A’s strategy a ∊ Actions(A) is dominant if∀b ∊ Actions(B), ∀a’≠a ∊ Actions(A) . OA(a, b) > OA(a’, b)Does A have a dominant strategy for the game below? What is it? What about B?

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 9: Worlds with  many intelligent agents

Answer: Dominant StrategyDefinition: A’s strategy a ∊ Actions(A) is dominant if∀b ∊ Actions(B), ∀a’≠a ∊ Actions(A) . OA(a, b) > OA(a’, b)Does A have a dominant strategy for the game below? What is it? What about B?

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Dominant strategy for A!

Dom

inan

t st

rate

gy fo

r B!

Page 10: Worlds with  many intelligent agents

Definition: Pareto OptimalDefinition: An outcome o is called pareto optimal if there is no other outcome o’ such that all players prefer o’ to o.

Notice: pareto optimality can lead to lots of optimal outcomes, not just one.

That’s because o’ is considered better than o only if ALL players prefer o’ to o.

Page 11: Worlds with  many intelligent agents

Quiz: Pareto OptimalDefinition: An outcome o is called pareto optimal if there is no other outcome o’ such that all players prefer o’ to o.

Quiz: which outcome(s) is (are) pareto optimal?

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 12: Worlds with  many intelligent agents

Answer: Pareto OptimalDefinition: An outcome o is called pareto optimal if there is no other outcome o’ such that all players prefer o’ to o.

Quiz: which outcome(s) is (are) pareto optimal?

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 13: Worlds with  many intelligent agents

Definition: (Nash) EquilibriumDefinition: An outcome O(a, b) is called a Nash Equilibrium (or just equilibrium) if 1. ∀a’≠a ∊ Actions(A) . OA(a, b) >= OA(a’, b) and2. ∀b’≠b ∊ Actions(B) . OB(a, b) >= OB(a, b’)

Another way of saying this is that an outcome is an equilibrium if each player is as happy with this outcome as any other available option, assuming the other player sticks with his equilibrium strategy.

Notice that this means an outcome is stable, in the sense that both players have no incentive to change their strategy.

Page 14: Worlds with  many intelligent agents

Quiz: (Nash) EquilibriumDefinition: An outcome O(a, b) is called a Nash Equilibrium if 1. ∀a’≠a ∊ Actions(A) . OA(a, b) >= OA(a’, b) and2. ∀b’≠b ∊ Actions(B) . OB(a, b) >= OB(a, b’)Which outcome(s) is (are) a Nash equilibrium?

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 15: Worlds with  many intelligent agents

Answer: (Nash) EquilibriumDefinition: An outcome O(a, b) is called a Nash Equilibrium if 1. ∀a’≠a ∊ Actions(A) . OA(a, b) >= OA(a’, b) and2. ∀b’≠b ∊ Actions(B) . OB(a, b) >= OB(a, b’)Which outcome(s) is (are) a Nash equilibrium?

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Outcome:

Page 16: Worlds with  many intelligent agents

Answer: (Nash) EquilibriumOne reason that the prisoner’s dilemma is famous:The only outcome that is a Nash equilibrium is also the only outcome that is NOT a pareto optimum.This is not true in all games; it depends on the rewards!For the prisoner’s dilemma, it means that the only stable solution is probably the worst possible solution for the players. (It’s sort of a depressing example.)

Dean now has lots of evidence on both students, both are suspended for the semester.

A: -7, B: -7

Dean has lots of evidence on A, gives B a light punishment for cooperating.

A: -10, B: -1

Dean has lots of evidence on B, gives A a light punishment for cooperating.

A: -1, B: -10

Dean has little evidence on both, gives both some penalty.

A: -3, B: -3

A rats out B A says nothing

B ra

ts

out A

B sa

ys

noth

ing

Page 17: Worlds with  many intelligent agents

Quiz: The game of Chicken(a.k.a. “Hawk-Dove”, “snow-drift”)

What are the dominant strategies, pareto optima, and Nash equlibria for the Chicken game?

Neither player wins; a tie.

P1: 0, P2: 0

Player 1 wins.

P1: +1, P2: -1

Player 2 wins.

P1: -1, P2: +1

The players crash; a catastrophe!

P1: -10, P2: -10

Swerve Straight

Swer

veSt

raig

ht

Player 1

Play

er 2

Page 18: Worlds with  many intelligent agents

Answer: The game of Chicken(a.k.a. “Hawk-Dove”, “snow-drift”)

What are the dominant strategies, pareto optima, and Nash equlibria for the Chicken game?

Neither player wins; a tie.

P1: 0, P2: 0

Player 1 wins.

P1: +1, P2: -1

Player 2 wins.

P1: -1, P2: +1

The players crash; a catastrophe!

P1: -10, P2: -10

Swerve Straight

Swer

veSt

raig

ht

Player 1

Play

er 2

No dominant strategies!

Pareto Optimum

Nash Equilibrium

Page 19: Worlds with  many intelligent agents

Quiz: Coordination GamesWhat are the dominant strategies, pareto optima, and Nash equlibria for each of the games below?

P1: 10, P2: 10 P1: -10, P2: -10

P1: -10, P2: -10 P1: 10, P2: 10

Left Right

Left

Righ

t

10, 5 0, 0

0, 0 5, 10

Party Home

Part

yHo

me

Drive on which side of the road?

“Battle of the Sexes” game

10, 10 0, 0

0, 0 5, 5

Party Home

Part

yHo

me

Pure coordination game

10, 10 7, 0

0, 7 7, 7

Stag HareSt

agHa

re

“Stag Hunt” game

Page 20: Worlds with  many intelligent agents

Answer: Coordination GamesWhat are the dominant strategies, pareto optima, and Nash equlibria for each of the games below?

P1: 10, P2: 10 P1: -10, P2: -10

P1: -10, P2: -10 P1: 10, P2: 10

Left Right

Left

Righ

t

10, 5 0, 0

0, 0 5, 10

Party Home

Part

yHo

me

Drive on which side of the road?

“Battle of the Sexes” game

10, 10 0, 0

0, 0 5, 5

Party Home

Part

yHo

me

Pure coordination game

10, 10 7, 0

0, 7 7, 7

Stag HareSt

agHa

re

“Stag Hunt” game

Pareto Optimum

Nash EquilibriumDominant

Strategy

Page 21: Worlds with  many intelligent agents

Humans don’t necessarily play Nash equlibria

The “Guess 2/3 of the Average” Game is famous because in experiments with people, they often don’t play Nash equilibrium strategies. (Other games like this are the centipede game and the prisoner’s dilemma.)

Thus Nash equilibria aren’t perfect models of human behavior.

Let’s try it. Here are the rules to “Guess 2/3 of the average”:

1. Everyone guess a number between 0 and 100 (integer or real). Once everyone guesses, we’ll compute the average.

2. The set of people who come closest to 2/3 of the average are the winners (payoff 1), everyone else loses (payoff 0).

Page 22: Worlds with  many intelligent agents

Nash equilibrium for “Guess 2/3 of the Average”

If everyone guesses 100, then 2/3 of the average is 66.7, so 66.7 is the biggest possible result, so everyone has an incentive to guess 66.7 or less, so anything more than 66.7 can’t be a Nash equilibrium.

If everyone guesses 66.7, then 2/3 of the average will be around 44.4, so everyone has an incentive to guess lower, so guessing more than 44.4 can’t be a Nash equilibrium.

If everyone guesses 0, then we have a Nash equilibrium (they’d all guess right and win, and there’s no incentive for anyone to change their guess assuming everyone else remains at 0).

Page 23: Worlds with  many intelligent agents

Constant-Sum Games

0, 0 +1, -1 -1, +1

-1, +1 0, 0 +1, -1

+1, -1 -1, +1 0, 0

Rock Scissors

Player 1

Play

er 2

Paper

Sciss

ors

Pape

rRo

ck

Notice: if you add up the reward for P1 and P2 in each square, you get 0.

The fact that all squares have the same sum makes this a “constant-sum” game.

This game is in fact a “zero-sum” game.

Page 24: Worlds with  many intelligent agents

Constant-Sum Games

0, 0 +1, -1 -1, +1

-1, +1 0, 0 +1, -1

+1, -1 -1, +1 0, 0

Rock Scissors

Player 1

Play

er 2

Paper

Sciss

ors

Pape

rRo

ck

Actually, any constant-sum game can be converted to an equivalent zero-sum game by subtracting a constant from every payoff.

They are “equivalent” in that this transformation preserves equilibria, optimal, and dominant strategies.

Page 25: Worlds with  many intelligent agents

Constant-Sum Games

0, 0 +1, -1 -1, +1

-1, +1 0, 0 +1, -1

+1, -1 -1, +1 0, 0

Rock Scissors

Player 1

Play

er 2

Paper

Sciss

ors

Pape

rRo

ck

In constant-sum games, any gain for one player is offset by a loss for the other player.

So these are hyper-competitive games.

Page 26: Worlds with  many intelligent agents

Quiz: Pure Strategy Nash Equilibria

0, 0 +1, -1 -1, +1

-1, +1 0, 0 +1, -1

+1, -1 -1, +1 0, 0

Rock Scissors

Player 1

Play

er 2

Paper

Sciss

ors

Pape

rRo

ck

Can you find a Nash equilibrium for Rock-Paper-Scissors?

Page 27: Worlds with  many intelligent agents

Answer: Pure Strategy Nash Equilibria

0, 0 +1, -1 -1, +1

-1, +1 0, 0 +1, -1

+1, -1 -1, +1 0, 0

Rock Scissors

Player 1

Play

er 2

Paper

Sciss

ors

Pape

rRo

ck

Can you find a Nash equilibrium for Rock-Paper-Scissors?

No “pure” strategy for either player can lead to a Nash equilibrium for this game!

(Our definition of “strategy” is equivalent to “pure strategy”. We’ll talk about unpure strategies next.)

Page 28: Worlds with  many intelligent agents

Definition: Mixed StrategyA mixed strategy for player A is a probability distribution over the set of available actions, Actions(A).

For instance, for Rock-Paper-Scissors, the distribution

P(Rock) = 1/3P(Paper) = 1/3P(Scissors) = 1/3

is a mixed strategy.

A pure strategy is also a mixed strategy, but with a probability distribution that places all the probability on one outcome.E.g.: P(Rock) = 0, P(Paper) = 1, P(Scissors) = 0.

Page 29: Worlds with  many intelligent agents

Definition: Mixed Strategy Nash Equilibrium

A mixed strategy profile for players A and B is a pair of mixed strategies, distribution PA for player A and distribution PB for player B.

A mixed strategy Nash equilibrium is a mixed strategy profile (PA, PB) such that neither player would gain by choosing a different mixed strategy, assuming the other player’s mixed strategy stays the same.

Page 30: Worlds with  many intelligent agents

Quiz: Mixed Strategy Nash Equilibria

0, 0 +1, -1 -1, +1

-1, +1 0, 0 +1, -1

+1, -1 -1, +1 0, 0

Rock Scissors

Player 1

Play

er 2

Paper

Sciss

ors

Pape

rRo

ck

Can you find a mixed-strategy Nash equilibrium for Rock Paper Scissors?

(This can be hard in general, but see if you can guess it for this game.)

Page 31: Worlds with  many intelligent agents

Answer: Mixed Strategy Nash Equilibria

0, 0 +1, -1 -1, +1

-1, +1 0, 0 +1, -1

+1, -1 -1, +1 0, 0

Rock Scissors

Player 1

Play

er 2

Paper

Sciss

ors

Pape

rRo

ck

The mixed strategy profile where each player has 1/3 probability for each action is a Nash equilibrium.

(It’s the only Nash equilibrium for this game.)

Page 32: Worlds with  many intelligent agents

Nash’s TheoremTheorem (Nash, 1951): Every finite game (finite number of players, finite number of pure strategies) has at least one mixed-strategy Nash equilibrium.

Nash, John (1951) "Non-Cooperative Games" The Annals of Mathematics 54(2):286-295.

(John Nash did not call them “Nash equilibria”, that name came later.)

He shared the 1994 Nobel Memorial Prize in Economic Sciences with game theorists Reinhard Selten and John Harsanyi for his work on Nash equilibria.

He suffered from schizophrenia in the 1950s and 1960s, as depicted in the 1998 film, “A Beautiful Mind”. He nevertheless recovered enough to return to academia and continue his research.

Page 33: Worlds with  many intelligent agents

More on Constant-Sum GamesMinimax Theorem (John von Neumann, 1928): For every two-person, zero-sum game with finitely many pure strategies, there exists a mixed strategy for each player and a value V such that:- Given player 2’s strategy, the best possible payoff for player 1 is V- Given player 1’s strategy, the best possible payoff for player 2 is –V.

The existence of strategies part is a special case of Nash’s theorem, and a precursor to it.

This basically says that player 1 can guarantee himself a payoff of at least V, and player 2 can guarantee himself a payoff of at least –V. If both players play optimally, that’s exactly what they will get.

It’s called “minimax” because the players get this value by pursuing a strategy that tries to minimize the maximum payoff of the other player. We’ll come back to this.

Definition: The value V is called the value of the game.Eg: The value of Rock-paper-scissors is 0; the best that P1 can hope to achieve, assuming P2 plays optimally (1/3 probability of each action), is a payoff of 0.