adversarial search chapter 5. outline optimal decisions for deterministic, zero-sum game of perfect...
TRANSCRIPT
![Page 1: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/1.jpg)
Adversarial Search
Chapter 5
![Page 2: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/2.jpg)
Outline
Optimal decisions for deterministic, zero-sum game of perfect information: minimax
α-β pruning Imperfect, real-time decisions: cut-off Stochastic games
![Page 3: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/3.jpg)
Games vs. search problems
"Unpredictable" opponent specifying a move for every possible opponent reply
Time limits unlikely to find goal, must approximate
![Page 4: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/4.jpg)
Game tree (2-player, deterministic, turns): tic-tac-toe
![Page 5: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/5.jpg)
Minimax
Perfect play for deterministic games Idea: choose move to position with highest minimax
value = best achievable payoff against best play
E.g., 2-ply game:
![Page 6: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/6.jpg)
Minimax algorithm
![Page 7: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/7.jpg)
Properties of minimax
Complete? Yes (if tree is finite) Optimal? Yes (against an optimal opponent) Time complexity? O(bm) Space complexity? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈100 for "reasonable" games exact solution completely infeasible
![Page 8: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/8.jpg)
More than two players
3 players: utility(s) is a vector <u1, u2, u3> representing the value of state s for p1, p2, p3
![Page 9: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/9.jpg)
α-β pruning example
![Page 10: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/10.jpg)
α-β pruning example
![Page 11: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/11.jpg)
α-β pruning example
![Page 12: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/12.jpg)
α-β pruning example
![Page 13: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/13.jpg)
α-β pruning example
![Page 14: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/14.jpg)
Properties of α-β
Pruning does not affect final result
Good move ordering improves effectiveness of pruning
With "perfect ordering," time complexity = O(bm/2) doubles depth of search
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
![Page 15: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/15.jpg)
Why is it called α-β?
α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max
If v is worse than α, max will avoid it prune that branch
Define β similarly for min
![Page 16: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/16.jpg)
The α-β algorithm
![Page 17: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/17.jpg)
The α-β algorithm
![Page 18: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/18.jpg)
Resource limits – imperfect real-time decisionsSuppose we have 100 secs, explore 104
nodes/sec 106 nodes per move
Standard approach: cutoff test:
e.g., depth limit (perhaps add quiescence search)
evaluation function Eval(s)= estimated desirability of position s
![Page 19: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/19.jpg)
Cutting off search
H-Minimax is identical to Minimax except1. Terminal? is replaced by Cutoff?2. Utility is replaced by Eval
Does it work in practice?bm = 106, b=35 m=4
4-ply lookahead is a hopeless chess player! 4-ply ≈ human novice 8-ply ≈ typical PC, human master 12-ply ≈ Deep Blue, Kasparov
![Page 20: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/20.jpg)
Evaluation functions
For chess, typically linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
e.g., w1 = 9 with
f1(s) = (number of white queens) – (number of black queens), etc.
![Page 21: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/21.jpg)
How to get a good evaluation function
Get good features (patterns, counts of pieces, etc.) Find good ways to combine the features (linear
combination? Non-linear combination?) Machine learning could be used to learn
New features Good ways (or good coefficients, like w1, …, wn) to
combine the known features
![Page 22: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/22.jpg)
Search versus look-up
Game playing programs often use table look-ups (instead of game tree search) for:
Opening moves – good moves from game books, human expert experience
End-game (when there are a few pieces left) For chess: tables for all end-games with at most 5
pieces are available (online)
![Page 23: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/23.jpg)
Deterministic games in practice Checkers: Chinook ended 40-year-reign of human world
champion Marion Tinsley in 1994. Used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.
Othello: human champions refuse to compete against computers, who are too good.
Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
![Page 24: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/24.jpg)
Stochastic games Game of Backgammon
![Page 25: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/25.jpg)
Game tree for Backgammon
![Page 26: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/26.jpg)
Order-preserving transform of leaf values could change the move selection
![Page 27: Adversarial Search Chapter 5. Outline Optimal decisions for deterministic, zero-sum game of perfect information: minimax α-β pruning Imperfect, real-time](https://reader030.vdocuments.net/reader030/viewer/2022032721/56649cd85503460f949a0f7a/html5/thumbnails/27.jpg)
Summary
Games are fun to work on! They illustrate several important points about
AI perfection is unattainable must
approximate good idea to think about what to think about