uncertainty and utilities - github...
TRANSCRIPT
![Page 1: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/1.jpg)
CS 5522: Artificial Intelligence II Uncertainty and Utilities
Instructor: Alan Ritter
Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu.]
![Page 2: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/2.jpg)
Uncertain Outcomes
![Page 3: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/3.jpg)
Worst-Case vs. Average Case
10 10 9 100
max
min
Idea: Uncertain outcomes controlled by chance, not an adversary!
![Page 4: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/4.jpg)
Worst-Case vs. Average Case
10 10 9 100
max
min
Idea: Uncertain outcomes controlled by chance, not an adversary!
![Page 5: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/5.jpg)
Worst-Case vs. Average Case
10 10 9 100
max
min
Idea: Uncertain outcomes controlled by chance, not an adversary!
![Page 6: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/6.jpg)
Worst-Case vs. Average Case
10 10 9 100
max
min
Idea: Uncertain outcomes controlled by chance, not an adversary!
![Page 7: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/7.jpg)
Worst-Case vs. Average Case
10 10 9 100
max
min
Idea: Uncertain outcomes controlled by chance, not an adversary!
![Page 8: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/8.jpg)
Expectimax Search
▪ Why wouldn’t we know what the result of an action will be?▪ Explicit randomness: rolling dice▪ Unpredictable opponents: the ghosts respond randomly▪ Actions can fail: when moving a robot, wheels might slip
▪ Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
10 4 5 7
max
chance
10 10 9 100
[Demo: min vs exp (L7D1,2)]
![Page 9: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/9.jpg)
Expectimax Search
▪ Why wouldn’t we know what the result of an action will be?▪ Explicit randomness: rolling dice▪ Unpredictable opponents: the ghosts respond randomly▪ Actions can fail: when moving a robot, wheels might slip
▪ Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
▪ Expectimax search: compute the average score under optimal play▪ Max nodes as in minimax search▪ Chance nodes are like min nodes but the outcome is
uncertain▪ Calculate their expected utilities▪ I.e. take weighted average (expectation) of children
▪ Later, we’ll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
10 4 5 7
max
chance
10 10 9 100
[Demo: min vs exp (L7D1,2)]
![Page 10: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/10.jpg)
Video of Demo Minimax vs Expectimax (Min)
![Page 11: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/11.jpg)
Video of Demo Minimax vs Expectimax (Min)
![Page 12: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/12.jpg)
Video of Demo Minimax vs Expectimax (Min)
![Page 13: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/13.jpg)
Video of Demo Minimax vs Expectimax (Exp)
![Page 14: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/14.jpg)
Video of Demo Minimax vs Expectimax (Exp)
![Page 15: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/15.jpg)
Video of Demo Minimax vs Expectimax (Exp)
![Page 16: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/16.jpg)
Expectimax Pseudocode
def value(state): if the state is a terminal state: return the state’s utility if the next agent is MAX: return max-value(state) if the next agent is EXP: return exp-value(state)
![Page 17: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/17.jpg)
Expectimax Pseudocode
def value(state): if the state is a terminal state: return the state’s utility if the next agent is MAX: return max-value(state) if the next agent is EXP: return exp-value(state)
def exp-value(state): initialize v = 0 for each successor of state: p = probability(successor)
v += p * value(successor) return v
def max-value(state): initialize v = -∞ for each successor of state:
v = max(v, value(successor)) return v
![Page 18: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/18.jpg)
Expectimax Pseudocode
![Page 19: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/19.jpg)
Expectimax Pseudocode
def exp-value(state): initialize v = 0 for each successor of state: p = probability(successor)
v += p * value(successor) return v
![Page 20: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/20.jpg)
Expectimax Pseudocode
def exp-value(state): initialize v = 0 for each successor of state: p = probability(successor)
v += p * value(successor) return v 5 78 24 -12
1/21/3
1/6
![Page 21: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/21.jpg)
Expectimax Pseudocode
def exp-value(state): initialize v = 0 for each successor of state: p = probability(successor)
v += p * value(successor) return v 5 78 24 -12
1/21/3
1/6
v = (1/2) (8) + (1/3) (24) + (1/6) (-12) = 10
![Page 22: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/22.jpg)
Expectimax Example
12 9 6 03 2 154 6
![Page 23: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/23.jpg)
Expectimax Pruning?
12 93 2
![Page 24: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/24.jpg)
Expectimax Pruning?
12 93 2
![Page 25: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/25.jpg)
Depth-Limited Expectimax
…
…
492 362 …
![Page 26: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/26.jpg)
Depth-Limited Expectimax
…
…
492 362 …
400 300
![Page 27: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/27.jpg)
Depth-Limited Expectimax
…
…
492 362 …
400 300Estimate of true expectimax value
(which would require a lot of
work to compute)
![Page 28: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/28.jpg)
Probabilities
![Page 29: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/29.jpg)
Reminder: Probabilities
▪ A random variable represents an event whose outcome is unknown ▪ A probability distribution is an assignment of weights to outcomes
▪ Example: Traffic on freeway ▪ Random variable: T = whether there’s traffic ▪ Outcomes: T in {none, light, heavy} ▪ Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
▪ Some laws of probability (more later): ▪ Probabilities are always non-negative ▪ Probabilities over all possible outcomes sum to one
▪ As we get more evidence, probabilities may change: ▪ P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60 ▪ We’ll talk about methods for reasoning and updating probabilities later
0.25
0.50
0.25
![Page 30: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/30.jpg)
Reminder: Expectations
![Page 31: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/31.jpg)
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
Reminder: Expectations
![Page 32: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/32.jpg)
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
Reminder: Expectations
0.25 0.50 0.25Probability:
![Page 33: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/33.jpg)
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
Reminder: Expectations
0.25 0.50 0.25Probability:
20 min 30 min 60 minTime:
![Page 34: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/34.jpg)
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
Reminder: Expectations
0.25 0.50 0.25Probability:
20 min 30 min 60 minTime:x x x
![Page 35: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/35.jpg)
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
Reminder: Expectations
0.25 0.50 0.25Probability:
20 min 30 min 60 minTime:x x x+ +
![Page 36: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/36.jpg)
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
Reminder: Expectations
0.25 0.50 0.25Probability:
20 min 30 min 60 minTime:35 minx x x+ +
![Page 37: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/37.jpg)
▪ In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state ▪ Model could be a simple uniform distribution (roll a die) ▪ Model could be sophisticated and require a great deal
of computation ▪ We have a chance node for any outcome out of our
control: opponent or environment ▪ The model might say that adversarial actions are likely!
▪ For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes
What Probabilities to Use?
Having a probabilistic belief about another agent’s action does not
mean that the agent is flipping any coins!
![Page 38: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/38.jpg)
▪ In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state ▪ Model could be a simple uniform distribution (roll a die) ▪ Model could be sophisticated and require a great deal
of computation ▪ We have a chance node for any outcome out of our
control: opponent or environment ▪ The model might say that adversarial actions are likely!
▪ For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes
What Probabilities to Use?
Having a probabilistic belief about another agent’s action does not
mean that the agent is flipping any coins!
![Page 39: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/39.jpg)
▪ In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state ▪ Model could be a simple uniform distribution (roll a die) ▪ Model could be sophisticated and require a great deal
of computation ▪ We have a chance node for any outcome out of our
control: opponent or environment ▪ The model might say that adversarial actions are likely!
▪ For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes
What Probabilities to Use?
Having a probabilistic belief about another agent’s action does not
mean that the agent is flipping any coins!
![Page 40: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/40.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
![Page 41: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/41.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
▪ Answer: Expectimax!
![Page 42: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/42.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
▪ Answer: Expectimax!
![Page 43: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/43.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
▪ Answer: Expectimax!▪ To figure out EACH chance node’s
probabilities, you have to run a simulation of your opponent
![Page 44: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/44.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
▪ Answer: Expectimax!▪ To figure out EACH chance node’s
probabilities, you have to run a simulation of your opponent
![Page 45: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/45.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
▪ Answer: Expectimax!▪ To figure out EACH chance node’s
probabilities, you have to run a simulation of your opponent
![Page 46: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/46.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
0.1 0.9
▪ Answer: Expectimax!▪ To figure out EACH chance node’s
probabilities, you have to run a simulation of your opponent
![Page 47: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/47.jpg)
Quiz: Informed Probabilities
▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
0.1 0.9
▪ Answer: Expectimax!▪ To figure out EACH chance node’s
probabilities, you have to run a simulation of your opponent
▪ This kind of thing gets very slow very quickly
▪ Even worse if you have to simulate your opponent simulating you…
▪ … except for minimax, which has the nice property that it all collapses into one game tree
![Page 48: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/48.jpg)
Modeling Assumptions
![Page 49: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/49.jpg)
The Dangers of Optimism and Pessimism
![Page 50: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/50.jpg)
The Dangers of Optimism and Pessimism
Dangerous Optimism Assuming chance when the world is adversarial
![Page 51: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/51.jpg)
The Dangers of Optimism and Pessimism
Dangerous Optimism Assuming chance when the world is adversarial
![Page 52: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/52.jpg)
The Dangers of Optimism and Pessimism
Dangerous Optimism Assuming chance when the world is adversarial
Dangerous Pessimism Assuming the worst case when it’s not likely
![Page 53: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/53.jpg)
The Dangers of Optimism and Pessimism
Dangerous Optimism Assuming chance when the world is adversarial
Dangerous Pessimism Assuming the worst case when it’s not likely
![Page 54: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/54.jpg)
Assumptions vs. Reality
Adversarial Ghost Random Ghost
Minimax Pacman
Won 5/5
Avg. Score: 483
Won 5/5
Avg. Score: 493
Expectimax Pacman
Won 1/5
Avg. Score: -303
Won 5/5
Avg. Score: 503
[Demos: world assumptions (L7D3,4,5,6)]
Results from playing 5 games
Pacman used depth 4 search with an eval function that avoids troubleGhost used depth 2 search with an eval function that seeks Pacman
![Page 55: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/55.jpg)
Video of Demo World AssumptionsRandom Ghost – Expectimax Pacman
![Page 56: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/56.jpg)
Video of Demo World AssumptionsRandom Ghost – Expectimax Pacman
![Page 57: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/57.jpg)
Video of Demo World AssumptionsRandom Ghost – Expectimax Pacman
![Page 58: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/58.jpg)
Video of Demo World AssumptionsAdversarial Ghost – Minimax Pacman
![Page 59: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/59.jpg)
Video of Demo World AssumptionsAdversarial Ghost – Minimax Pacman
![Page 60: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/60.jpg)
Video of Demo World AssumptionsAdversarial Ghost – Minimax Pacman
![Page 61: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/61.jpg)
Video of Demo World AssumptionsAdversarial Ghost – Expectimax Pacman
![Page 62: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/62.jpg)
Video of Demo World AssumptionsAdversarial Ghost – Expectimax Pacman
![Page 63: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/63.jpg)
Video of Demo World AssumptionsAdversarial Ghost – Expectimax Pacman
![Page 64: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/64.jpg)
Video of Demo World AssumptionsRandom Ghost – Minimax Pacman
![Page 65: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/65.jpg)
Video of Demo World AssumptionsRandom Ghost – Minimax Pacman
![Page 66: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/66.jpg)
Video of Demo World AssumptionsRandom Ghost – Minimax Pacman
![Page 67: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/67.jpg)
Other Game Types
![Page 68: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/68.jpg)
Mixed Layer Types
▪ E.g. Backgammon ▪ Expectiminimax
▪ Environment is an extra “random agent” player that moves after each min/max agent
▪ Each node computes the appropriate combination of its children
![Page 69: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/69.jpg)
Example: Backgammon
▪ Dice rolls increase b: 21 possible rolls with 2 dice ▪ Backgammon ≈ 20 legal moves ▪ Depth 2 = 20 x (21 x 20)3 = 1.2 x 109
▪ As depth increases, probability of reaching a given search node shrinks ▪ So usefulness of search is diminished ▪ So limiting depth is less damaging ▪ But pruning is trickier…
▪ Historic AI: TDGammon uses depth-2 search + very good evaluation function + reinforcement learning: world-champion level play
▪ 1st AI world champion in any game!
Image: Wikipedia
![Page 70: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/70.jpg)
Multi-Agent Utilities
▪ What if the game is not zero-sum, or has multiple players?
▪ Generalization of minimax: ▪ Terminals have utility tuples ▪ Node values are also utility tuples ▪ Each player maximizes its own component ▪ Can give rise to cooperation and competition dynamically…
1,6,6 7,1,2 6,1,2 7,2,1 5,1,7 1,5,2 7,7,1 5,2,5
![Page 71: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/71.jpg)
Multi-Agent Utilities
▪ What if the game is not zero-sum, or has multiple players?
▪ Generalization of minimax: ▪ Terminals have utility tuples ▪ Node values are also utility tuples ▪ Each player maximizes its own component ▪ Can give rise to cooperation and competition dynamically…
1,6,6 7,1,2 6,1,2 7,2,1 5,1,7 1,5,2 7,7,1 5,2,5
![Page 72: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/72.jpg)
Multi-Agent Utilities
▪ What if the game is not zero-sum, or has multiple players?
▪ Generalization of minimax: ▪ Terminals have utility tuples ▪ Node values are also utility tuples ▪ Each player maximizes its own component ▪ Can give rise to cooperation and competition dynamically…
1,6,6 7,1,2 6,1,2 7,2,1 5,1,7 1,5,2 7,7,1 5,2,5
![Page 73: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/73.jpg)
Utilities
![Page 74: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/74.jpg)
Maximum Expected Utility
▪ Why should we average utilities? Why not minimax?
![Page 75: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/75.jpg)
Maximum Expected Utility
▪ Why should we average utilities? Why not minimax?
![Page 76: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/76.jpg)
Maximum Expected Utility
▪ Why should we average utilities? Why not minimax?
![Page 77: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/77.jpg)
Maximum Expected Utility
▪ Why should we average utilities? Why not minimax?
▪ Principle of maximum expected utility:▪ A rational agent should chose the action that maximizes its
expected utility, given its knowledge
![Page 78: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/78.jpg)
Maximum Expected Utility
▪ Why should we average utilities? Why not minimax?
▪ Principle of maximum expected utility:▪ A rational agent should chose the action that maximizes its
expected utility, given its knowledge
▪ Questions:▪ Where do utilities come from?▪ How do we know such utilities even exist?▪ How do we know that averaging even makes sense?▪ What if our behavior (preferences) can’t be described by
utilities?
![Page 79: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/79.jpg)
What Utilities to Use?
0 40 20 30 x2 0 1600 400 900
![Page 80: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/80.jpg)
What Utilities to Use?
▪ For worst-case minimax reasoning, terminal function scale doesn’t matter ▪ We just want better states to have higher evaluations (get the ordering
right) ▪ We call this insensitivity to monotonic transformations
▪ For average-case expectimax reasoning, we need magnitudes to be meaningful
0 40 20 30 x2 0 1600 400 900
![Page 81: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/81.jpg)
What Utilities to Use?
▪ For worst-case minimax reasoning, terminal function scale doesn’t matter ▪ We just want better states to have higher evaluations (get the ordering
right) ▪ We call this insensitivity to monotonic transformations
▪ For average-case expectimax reasoning, we need magnitudes to be meaningful
0 40 20 30 x2 0 1600 400 900
![Page 82: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/82.jpg)
Utilities
▪ Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences
▪ Where do utilities come from? ▪ In a game, may be simple (+1/-1) ▪ Utilities summarize the agent’s goals ▪ Theorem: any “rational” preferences
can be summarized as a utility function
▪ We hard-wire utilities and let behaviors emerge ▪ Why don’t we let agents pick utilities? ▪ Why don’t we prescribe behaviors?
![Page 83: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/83.jpg)
Utilities: Uncertain OutcomesGetting ice cream
Get Single Get Double
![Page 84: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/84.jpg)
Utilities: Uncertain OutcomesGetting ice cream
Get Single Get Double
![Page 85: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/85.jpg)
Utilities: Uncertain OutcomesGetting ice cream
Get Single Get Double
Oops
![Page 86: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/86.jpg)
Utilities: Uncertain OutcomesGetting ice cream
Get Single Get Double
Oops Whew!
![Page 87: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/87.jpg)
Preferences
▪ An agent must have preferences among: ▪ Prizes: A, B, etc. ▪ Lotteries: situations with uncertain prizes
▪ Notation: ▪ Preference: ▪ Indifference:
![Page 88: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/88.jpg)
Preferences
▪ An agent must have preferences among: ▪ Prizes: A, B, etc. ▪ Lotteries: situations with uncertain prizes
▪ Notation: ▪ Preference: ▪ Indifference:
A B
p 1-p
A Lottery A Prize
A
![Page 89: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/89.jpg)
Preferences
▪ An agent must have preferences among: ▪ Prizes: A, B, etc. ▪ Lotteries: situations with uncertain prizes
▪ Notation: ▪ Preference: ▪ Indifference:
A B
p 1-p
A Lottery A Prize
A
![Page 90: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/90.jpg)
Rationality
![Page 91: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/91.jpg)
Rational Preferences
The Axioms of Rationality
![Page 92: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/92.jpg)
Rational Preferences
Theorem: Rational preferences imply behavior describable as maximization of expected utility
The Axioms of Rationality
![Page 93: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/93.jpg)
Rational Preferences
Theorem: Rational preferences imply behavior describable as maximization of expected utility
The Axioms of Rationality
![Page 94: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/94.jpg)
▪ Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]▪ Given any preferences satisfying these constraints, there exists a real-valued function U such that:
▪ I.e. values assigned by U preserve preferences of both prizes and lotteries!
MEU Principle
![Page 95: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/95.jpg)
▪ Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]▪ Given any preferences satisfying these constraints, there exists a real-valued function U such that:
▪ I.e. values assigned by U preserve preferences of both prizes and lotteries!
▪ Maximum expected utility (MEU) principle:▪ Choose the action that maximizes expected utility▪ Note: an agent can be entirely rational (consistent with MEU) without ever representing or
manipulating utilities and probabilities▪ E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
MEU Principle
![Page 96: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/96.jpg)
Human Utilities
![Page 97: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/97.jpg)
Utility Scales
▪ Normalized utilities: u+ = 1.0, u- = 0.0
![Page 98: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/98.jpg)
Utility Scales
▪ Normalized utilities: u+ = 1.0, u- = 0.0
▪ Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
![Page 99: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/99.jpg)
Utility Scales
▪ Normalized utilities: u+ = 1.0, u- = 0.0
▪ Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
▪ QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
![Page 100: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/100.jpg)
Utility Scales
▪ Normalized utilities: u+ = 1.0, u- = 0.0
▪ Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
▪ QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
▪ Note: behavior is invariant under positive linear transformation
![Page 101: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/101.jpg)
Utility Scales
▪ Normalized utilities: u+ = 1.0, u- = 0.0
▪ Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
▪ QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
▪ Note: behavior is invariant under positive linear transformation
▪ With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., total order on prizes
![Page 102: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/102.jpg)
▪ Utilities map states to real numbers. Which numbers?
Human Utilities
![Page 103: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/103.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
Human Utilities
![Page 104: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/104.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
Human Utilities
![Page 105: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/105.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
▪ Compare a prize A to a standard lottery Lp between
▪ “best possible prize” u+ with probability p
▪ “worst possible catastrophe” u- with probability 1-p
Human Utilities
![Page 106: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/106.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
▪ Compare a prize A to a standard lottery Lp between
▪ “best possible prize” u+ with probability p
▪ “worst possible catastrophe” u- with probability 1-p
Human Utilities
Pay $30
![Page 107: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/107.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
▪ Compare a prize A to a standard lottery Lp between
▪ “best possible prize” u+ with probability p
▪ “worst possible catastrophe” u- with probability 1-p
Human Utilities
Pay $30
![Page 108: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/108.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
▪ Compare a prize A to a standard lottery Lp between
▪ “best possible prize” u+ with probability p
▪ “worst possible catastrophe” u- with probability 1-p
Human Utilities
No change
Pay $30
![Page 109: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/109.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
▪ Compare a prize A to a standard lottery Lp between
▪ “best possible prize” u+ with probability p
▪ “worst possible catastrophe” u- with probability 1-p
Human Utilities
No change
Pay $30
Instant death
![Page 110: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/110.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
▪ Compare a prize A to a standard lottery Lp between
▪ “best possible prize” u+ with probability p
▪ “worst possible catastrophe” u- with probability 1-p
▪ Adjust lottery probability p until indifference: A ~ Lp
▪ Resulting p is a utility in [0,1]
Human Utilities
No change
Pay $30
Instant death
![Page 111: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/111.jpg)
▪ Utilities map states to real numbers. Which numbers?▪ Standard approach to assessment (elicitation) of human utilities:
▪ Compare a prize A to a standard lottery Lp between
▪ “best possible prize” u+ with probability p
▪ “worst possible catastrophe” u- with probability 1-p
▪ Adjust lottery probability p until indifference: A ~ Lp
▪ Resulting p is a utility in [0,1]
Human Utilities
0.999999 0.000001
No change
Pay $30
Instant death
![Page 112: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/112.jpg)
Money▪ Money does not behave as a utility function, but we can talk
about the utility of having money (or being in debt)▪ Given a lottery L = [p, $X; (1-p), $Y]
![Page 113: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/113.jpg)
Money▪ Money does not behave as a utility function, but we can talk
about the utility of having money (or being in debt)▪ Given a lottery L = [p, $X; (1-p), $Y]
▪ The expected monetary value EMV(L) is p*X + (1-p)*Y▪ U(L) = p*U($X) + (1-p)*U($Y)▪ Typically, U(L) < U( EMV(L) )
![Page 114: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/114.jpg)
Money▪ Money does not behave as a utility function, but we can talk
about the utility of having money (or being in debt)▪ Given a lottery L = [p, $X; (1-p), $Y]
▪ The expected monetary value EMV(L) is p*X + (1-p)*Y▪ U(L) = p*U($X) + (1-p)*U($Y)▪ Typically, U(L) < U( EMV(L) )▪ In this sense, people are risk-averse▪ When deep in debt, people are risk-prone
![Page 115: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/115.jpg)
Money▪ Money does not behave as a utility function, but we can talk
about the utility of having money (or being in debt)▪ Given a lottery L = [p, $X; (1-p), $Y]
▪ The expected monetary value EMV(L) is p*X + (1-p)*Y▪ U(L) = p*U($X) + (1-p)*U($Y)▪ Typically, U(L) < U( EMV(L) )▪ In this sense, people are risk-averse▪ When deep in debt, people are risk-prone
![Page 116: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/116.jpg)
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]
![Page 117: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/117.jpg)
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]▪ What is its expected monetary value? ($500)
![Page 118: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/118.jpg)
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]▪ What is its expected monetary value? ($500)▪ What is its certainty equivalent?
![Page 119: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/119.jpg)
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]▪ What is its expected monetary value? ($500)▪ What is its certainty equivalent?
▪ Monetary value acceptable in lieu of lottery
![Page 120: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/120.jpg)
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]▪ What is its expected monetary value? ($500)▪ What is its certainty equivalent?
▪ Monetary value acceptable in lieu of lottery▪ $400 for most people
![Page 121: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/121.jpg)
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]▪ What is its expected monetary value? ($500)▪ What is its certainty equivalent?
▪ Monetary value acceptable in lieu of lottery▪ $400 for most people
▪ Difference of $100 is the insurance premium▪ There’s an insurance industry because people
will pay to reduce their risk▪ If everyone were risk-neutral, no insurance
needed!
![Page 122: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/122.jpg)
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]▪ What is its expected monetary value? ($500)▪ What is its certainty equivalent?
▪ Monetary value acceptable in lieu of lottery▪ $400 for most people
▪ Difference of $100 is the insurance premium▪ There’s an insurance industry because people
will pay to reduce their risk▪ If everyone were risk-neutral, no insurance
needed!
▪ It’s win-win: you’d rather have the $400 and the insurance company would rather have the lottery (their utility curve is flat and they have many lotteries)
![Page 123: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/123.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
![Page 124: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/124.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
▪ A: [0.8, $4k; 0.2, $0]▪ B: [1.0, $3k; 0.0, $0]
![Page 125: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/125.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
▪ A: [0.8, $4k; 0.2, $0]▪ B: [1.0, $3k; 0.0, $0]
▪ C: [0.2, $4k; 0.8, $0]▪ D: [0.25, $3k; 0.75, $0]
![Page 126: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/126.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
▪ A: [0.8, $4k; 0.2, $0]▪ B: [1.0, $3k; 0.0, $0]
▪ C: [0.2, $4k; 0.8, $0]▪ D: [0.25, $3k; 0.75, $0]
▪ Most people prefer B > A, C > D
![Page 127: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/127.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
▪ A: [0.8, $4k; 0.2, $0]▪ B: [1.0, $3k; 0.0, $0]
▪ C: [0.2, $4k; 0.8, $0]▪ D: [0.25, $3k; 0.75, $0]
▪ Most people prefer B > A, C > D
▪ But if U($0) = 0, then▪ B > A ⇒ U($3k) > 0.8 U($4k)
![Page 128: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/128.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
▪ A: [0.8, $4k; 0.2, $0]▪ B: [1.0, $3k; 0.0, $0]
▪ C: [0.2, $4k; 0.8, $0]▪ D: [0.25, $3k; 0.75, $0]
▪ Most people prefer B > A, C > D
▪ But if U($0) = 0, then▪ B > A ⇒ U($3k) > 0.8 U($4k)▪ C > D ⇒ 0.8 U($4k) > U($3k)
![Page 129: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/129.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
▪ A: [0.8, $4k; 0.2, $0]▪ B: [1.0, $3k; 0.0, $0]
▪ C: [0.2, $4k; 0.8, $0]▪ D: [0.25, $3k; 0.75, $0]
▪ Most people prefer B > A, C > D
▪ But if U($0) = 0, then▪ B > A ⇒ U($3k) > 0.8 U($4k)▪ C > D ⇒ 0.8 U($4k) > U($3k)
![Page 130: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/130.jpg)
Example: Human Rationality?
▪ Famous example of Allais (1953)
▪ A: [0.8, $4k; 0.2, $0]▪ B: [1.0, $3k; 0.0, $0]
▪ C: [0.2, $4k; 0.8, $0]▪ D: [0.25, $3k; 0.75, $0]
▪ Most people prefer B > A, C > D
▪ But if U($0) = 0, then▪ B > A ⇒ U($3k) > 0.8 U($4k)▪ C > D ⇒ 0.8 U($4k) > U($3k)
![Page 131: Uncertainty and Utilities - GitHub Pagesaritter.github.io/courses/5522_slides/expectimax_utilities.pdf · outcomes, not worst-case (minimax) outcomes Expectimax search: compute the](https://reader035.vdocuments.net/reader035/viewer/2022070707/5ea61734a0035a42982b4b28/html5/thumbnails/131.jpg)
Next Time: MDPs!