CS 416 Artificial Intelligence
Lecture 23: Making Complex Decisions
Chapter 17



Partially observable Markov Decision Processes (POMDPs)

Relationship to MDPs

• Value and Policy Iteration assume you know a lot about the world:
  – current state, action, next state, reward for each state, …

• In the real world, you don't know exactly what state you're in
  – Is the car in front braking hard or braking lightly?
  – Can you successfully kick the ball to your teammate?


Partially observable

Consider not knowing what state you're in…

• Go left, left, left, left, left
• Go up, up, up, up, up
  – You're probably in the upper-left corner
• Go right, right, right, right, right


Extending the MDP model

MDPs have an explicit transition function T(s, a, s′)

• We add O(s, o)
  – the probability of observing o when in state s

• We add the belief state, b
  – a probability distribution over all possible states
  – b(s) = the belief that you are in state s


Two parts to the problem

Figure out what state you're in
• Use filtering from Chapter 15

Figure out what to do in that state
• Bellman's equation is useful again

The optimal action depends only on the agent's current belief state

Update b(s) and π(s) after each iteration

Selecting an action

The belief-state update (filtering, as in Chapter 15) is

  b′(s′) = α O(s′, o) Σ_s T(s, a, s′) b(s)

• α is a normalizing constant that makes the belief state sum to 1
• b′ = FORWARD(b, a, o)
• The optimal policy maps belief states to actions
  – Note that the n-dimensional belief state is continuous
    Each belief value is a number between 0 and 1
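The FORWARD update above can be sketched directly. A minimal version, assuming the T(s, a, s′) and O(s, o) tables are stored as nested dicts; the tiny two-state world is made up for illustration, not the lecture's grid world:

```python
def forward(b, a, o, T, O):
    """FORWARD(b, a, o): new belief state after doing a and observing o.
    b[s] = belief in s, T[s][a][s2] = P(s2 | s, a), O[s][o] = P(o | s)."""
    b_new = {s2: O[s2][o] * sum(T[s][a][s2] * b[s] for s in b) for s2 in b}
    alpha = sum(b_new.values())  # the normalizing constant from the slide
    return {s: p / alpha for s, p in b_new.items()}

# Illustrative two-state world (made-up numbers)
T = {'s1': {'go': {'s1': 0.2, 's2': 0.8}},
     's2': {'go': {'s1': 0.1, 's2': 0.9}}}
O = {'s1': {'beep': 0.9}, 's2': {'beep': 0.3}}
b = {'s1': 0.5, 's2': 0.5}
b2 = forward(b, 'go', 'beep', T, O)  # belief shifts toward s2
```

Note the function only needs b, never the true state: the belief state is what the agent actually has to work with.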

A slight hitch

The previous slide required that you know the outcome o of action a in order to update the belief state

If the policy is supposed to navigate through belief space, we want to know what belief state we're moving into before executing action a

Predicting future belief states

Suppose you know action a was performed when in belief state b. What is the probability of receiving observation o?

• b provides a guess about the initial state
• a is known
• Any observation could be realized… any subsequent state could be realized… any new belief state could be realized

Predicting future belief states

The probability of perceiving o, given action a and belief state b, is given by summing over all the actual states the agent might reach:

  P(o | a, b) = Σ_{s′} O(s′, o) Σ_s T(s, a, s′) b(s)

Predicting future belief states

We just computed the odds of receiving o

We want the new belief state

• Let τ(b, a, b′) be the belief transition function
  – the term P(b′ | b, a, o) inside it is equal to 1 if b′ = FORWARD(b, a, o) and equal to 0 otherwise

Predicted future belief states

Combining the previous two slides:

  τ(b, a, b′) = P(b′ | b, a) = Σ_o P(b′ | b, a, o) P(o | a, b)

This is a transition model through belief states
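These pieces, plus the expected reward under a belief, can be sketched together. The reward form ρ(b) = Σ_s b(s) R(s) is the standard belief-MDP reward (assumed here, since the slide's formula images did not survive), and the two-state world is made up for illustration:

```python
def prob_obs(o, a, b, T, O):
    """P(o | a, b): sum over all actual states s2 the agent might reach."""
    return sum(O[s2][o] * sum(T[s][a][s2] * b[s] for s in b) for s2 in b)

def belief_reward(b, R):
    """rho(b) = sum_s b(s) R(s): expected reward under belief b (assumed form)."""
    return sum(b[s] * R[s] for s in b)

# Illustrative two-state world (made-up numbers)
T = {'s1': {'go': {'s1': 0.2, 's2': 0.8}},
     's2': {'go': {'s1': 0.1, 's2': 0.9}}}
O = {'s1': {'beep': 0.9}, 's2': {'beep': 0.3}}
b = {'s1': 0.5, 's2': 0.5}
R = {'s1': -0.04, 's2': 1.0}
p = prob_obs('beep', 'go', b, T, O)
r = belief_reward(b, R)
```

τ(b, a, b′) then sums prob_obs(o, a, b) over exactly those observations o for which FORWARD(b, a, o) equals b′.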

Relating POMDPs to MDPs

We've found a model for transitions through belief states
• Note that MDPs had transitions through states (the real things)

We need a model for rewards based on beliefs
• Note that MDPs had a reward function based on state

Bringing it all together

We've constructed a representation of POMDPs that makes them look like MDPs

• Value and Policy Iteration can be used for POMDPs
• The optimal policy π*(b) of the MDP belief-state representation is also optimal for the physical-state POMDP representation

Continuous vs. discrete

Our POMDP in MDP form is continuous

• Cluster the continuous space into regions and try to solve for approximations within these regions

Final answer to POMDP problem

[l, u, u, r, u, u, r, u, u, r, …]

• It's deterministic (it already takes into account the absence of observations)
• It has an expected utility of 0.38 (compared with 0.08 for the simple l, l, l, u, u, u, r, r, r, …)
• It is successful 86.6% of the time

In general, POMDPs with even a few dozen states are nearly impossible to optimize

Game Theory

Multiagent games with simultaneous moves

• First, study games with one move
  – bankruptcy proceedings
  – auctions
  – economics
  – war gaming

Definition of a game

• The players
• The actions
• The payoff matrix
  – provides the utility to each player for each combination of actions

Two-finger Morra
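The slide's payoff matrix image did not survive extraction; it can be rebuilt from the standard rules of two-finger Morra: each player shows one or two fingers, E wins the total if it is even, O wins the total if it is odd (a zero-sum game):

```python
def morra_payoff(e_fingers, o_fingers):
    """Payoff to player E; O receives the negation (zero-sum)."""
    total = e_fingers + o_fingers
    return total if total % 2 == 0 else -total

matrix = {(e, o): morra_payoff(e, o) for e in (1, 2) for o in (1, 2)}
# {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}
```

Neither pure strategy dominates here, which is why Morra is the textbook example calling for a mixed strategy.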

Game theory strategies

Strategy == policy (as in policy iteration)

• What do you do?
  – pure strategy
    you do the same thing all the time
  – mixed strategy
    you rely on some randomized policy to select an action

• Strategy profile
  – the assignment of strategies to players

Game theoretic solutions

What's a solution to a game?

• All players select a "rational" strategy
• Note that we're not analyzing one particular game, but the outcomes that accumulate over a series of played games

Prisoner’s Dilemma

Alice and Bob are caught red-handed at the scene of a crime

• both are interrogated separately by the police
• the penalty if they both confess is 5 years for each
• the penalty if they both refuse to confess is 1 year for each
• if one confesses and the other doesn't
  – the one who confesses (testifies) goes free: 0 years
  – the one who refuses gets 10 years

What do you do to act selfishly?

Prisoner's dilemma payoff matrix

Entries are (Alice, Bob), in years as negative utility:

                  Bob: testify   Bob: refuse
Alice: testify    (-5, -5)       (0, -10)
Alice: refuse     (-10, 0)       (-1, -1)

Prisoner’s dilemma strategy

Alice's strategy

• If Bob testifies
  – best option is to testify (-5)
• If Bob refuses
  – best option is to testify (0)

Testifying is a dominant strategy for Alice

Prisoner’s dilemma strategy

Bob's strategy

• If Alice testifies
  – best option is to testify (-5)
• If Alice refuses
  – best option is to testify (0)

Testifying is a dominant strategy for Bob
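The dominance check on these slides can be done mechanically. A sketch using the payoff values stated in the lecture (entries are (Alice, Bob) utilities; Bob's case is symmetric):

```python
PAYOFF = {
    ('testify', 'testify'): (-5, -5),
    ('testify', 'refuse'):  (0, -10),
    ('refuse',  'testify'): (-10, 0),
    ('refuse',  'refuse'):  (-1, -1),
}
ACTIONS = ('testify', 'refuse')

def dominant_for_alice():
    """Alice's strongly dominant action, or None if there isn't one."""
    for s in ACTIONS:
        others = [s2 for s2 in ACTIONS if s2 != s]
        # s strongly dominates: strictly better no matter what Bob does
        if all(PAYOFF[(s, b)][0] > PAYOFF[(s2, b)][0]
               for b in ACTIONS for s2 in others):
            return s
    return None
```

dominant_for_alice() returns 'testify', matching the argument on the slides.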

Rationality

Both players seem to have clear strategies

• Both testify
  – game outcome would be (-5, -5)

Dominance of strategies

Comparing strategies

• Strategy s can strongly dominate s′
  – the outcome of s is always better than the outcome of s′ no matter what the other player does
    testifying strongly dominates refusing for Bob and Alice

• Strategy s can weakly dominate s′
  – the outcome of s is better than the outcome of s′ on at least one action of the opponent and no worse on the others

Pareto Optimal

Pareto optimality comes from economics

• An outcome can be Pareto optimal
  – textbook: no alternative outcome that all players would prefer
  – I prefer: the best that could be accomplished without disadvantaging at least one group

Is the testify outcome (-5, -5) Pareto optimal?

Is (-5, -5) Pareto Optimal?

Is there an outcome that improves the result without disadvantaging any group?

How about (-1, -1), from (refuse, refuse)?

Dominant strategy equilibrium

(-5, -5) represents a dominant strategy equilibrium

• neither player has an incentive to diverge from the dominant strategy
  – if Alice assumes Bob executes the same strategy as he does now, she will only lose more by switching
    likewise for Bob

• Imagine this as a local optimum in outcome space
  – each dimension of outcome space is a dimension of a player's choice
  – any movement from the dominant strategy equilibrium in this space results in worse outcomes

Thus the dilemma…

Now we see the problem

• Outcome (-5, -5) is Pareto dominated by outcome (-1, -1)
  – achieving the Pareto optimal outcome requires diverging from the local optimum at the strategy equilibrium
• Tough situation… Pareto optimal would be nice, but it is unlikely because each player risks losing more

Nash Equilibrium

John Nash studied game theory in the 1950s

• Proved that every (finite) game has at least one equilibrium, possibly in mixed strategies
  – if there is a set of strategies with the property that no player can benefit by changing her strategy while the other players keep their strategies unchanged, then that set of strategies and the corresponding payoffs constitute a Nash equilibrium

• All dominant strategy equilibria are Nash equilibria

Another game

• Acme: hardware manufacturer chooses between CD and DVD format for the next game platform
• Best: software manufacturer chooses between CD and DVD format for the next title

No dominant strategy

• Verify that there is no dominant strategy

Yet two Nash equilibria exist

Outcome 1: (DVD, DVD) … (9, 9)

Outcome 2: (CD, CD) … (5, 5)

If either player unilaterally changes strategy, that player will be worse off
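The unilateral-deviation test can be run over all four profiles. The (9, 9) and (5, 5) entries are from the slides; the mismatch payoffs below are placeholders chosen only to reflect "both will lose", not the lecture's actual numbers:

```python
PAYOFF = {
    ('DVD', 'DVD'): (9, 9),
    ('DVD', 'CD'):  (-1, -1),   # placeholder mismatch payoff
    ('CD',  'DVD'): (-1, -1),   # placeholder mismatch payoff
    ('CD',  'CD'):  (5, 5),
}
ACTIONS = ('DVD', 'CD')

def pure_nash_equilibria():
    """Profiles where neither player gains by unilaterally deviating."""
    eq = []
    for a in ACTIONS:
        for b in ACTIONS:
            best_a = all(PAYOFF[(a, b)][0] >= PAYOFF[(a2, b)][0] for a2 in ACTIONS)
            best_b = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, b2)][1] for b2 in ACTIONS)
            if best_a and best_b:
                eq.append((a, b))
    return eq
```

Both matching profiles come out as equilibria, even though (CD, CD) is Pareto dominated by (DVD, DVD).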

We still have a problem

Two Nash equilibria, but which is selected?

• If the players fail to select the same strategy, both will lose
  – they could "agree" to select the Pareto optimal solution
    that seems reasonable
  – they could coordinate

Repeated games

Imagine the same game played multiple times

• payoffs accumulate for each player
• the optimal strategy is a function of game history
  – must select the optimal action for each possible game history
• Strategies
  – perpetual punishment
    cross me once and I'll take us both down forever
  – tit for tat
    cross me once and I'll cross you on the subsequent move
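The two repeated-game strategies can be simulated against the prisoner's dilemma payoffs from the earlier slides; a sketch, where 'refuse' plays the role of cooperation:

```python
PAYOFF = {('testify', 'testify'): (-5, -5), ('testify', 'refuse'): (0, -10),
          ('refuse', 'testify'): (-10, 0), ('refuse', 'refuse'): (-1, -1)}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then mirror the opponent's previous move."""
    return 'refuse' if not their_history else their_history[-1]

def always_testify(my_history, their_history):
    return 'testify'

def play(strategy_a, strategy_b, rounds):
    """Accumulate payoffs over repeated play; strategies see full histories."""
    hist_a, hist_b, total_a, total_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        total_a, total_b = total_a + pa, total_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return total_a, total_b
```

Two tit-for-tat players cooperate forever at (-1, -1) per round, while a constant defector against tit for tat gains once and then settles into (-5, -5).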

The design of games

Let's invert the strategy selection process to design fair/effective games

• Tragedy of the commons
  – individual farmers bring their livestock to the town commons to graze
  – the commons is destroyed and all experience negative utility
  – all behaved rationally: refraining would not have saved the commons, as someone else's livestock would have eaten it anyway

Externalities are a way to place a value on changes in global utility

Power utilities pay for the utility they deprive neighboring communities of (yet another Nobel Prize in economics for this: Coase, a professor at UVa)

Auctions

• English auction
  – auctioneer incrementally raises the bid price until one bidder remains
    the winning bidder gets the item at the highest price of another bidder plus the increment (perhaps the highest bidder would have spent more?)
    the strategy is simple… keep bidding until the price is higher than your utility
    the strategy of the other bidders is irrelevant

Auctions

• Sealed-bid auction
  – place your bid in an envelope and the highest bid is selected
    say the most the item is worth to you is v
    say you believe the highest competing bid is b
    bid min(v, b + ε)
    the player with the highest value on the good may not win the good, and players must contemplate the other players' values

Auctions

• Vickrey auction (a sealed-bid auction)
  – winner pays the price of the second-highest bid
  – dominant strategy is to bid what the item is worth to you
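A sketch of the Vickrey rule (the winner pays the second-highest bid), which is what makes truthful bidding dominant:

```python
def vickrey(bids):
    """Sealed-bid second-price auction.
    bids: dict bidder -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return ranked[0], bids[ranked[1]]

winner, price = vickrey({'ann': 10, 'bob': 7, 'carol': 4})  # ann wins, pays 7
```

Since the price is set by the others' bids, shading your own bid can only lose you the item; it never lowers the price you pay.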

Auctions

• These auction algorithms can find their way into computer-controlled systems
  – networking
    routers
    Ethernet
  – thermostat control in offices (Xerox PARC)