Adversarial Search



Page 1: Adversarial Search

Adversarial Search

Page 2: Adversarial Search

Adversarial Search

• Game playing
• Perfect play
• The minimax algorithm
• Alpha–beta pruning
• Resource limitations
• Elements of chance
• Imperfect information

Page 3: Adversarial Search

Game Playing State-of-the-Art

• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!

• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

• Othello: Human champions refuse to compete against computers, which are too good.

• Go: Human champions are beginning to be challenged by machines, though the best humans still beat the best machines. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves, along with aggressive pruning.

• Pacman: unknown

Page 4: Adversarial Search

What kind of games?

• Abstraction: to describe a game we must capture every relevant aspect of the game, such as:

◦ Chess
◦ Tic-tac-toe
◦ …

• Accessible environments: Such games are characterized by perfect information

• Search: game-playing then consists of a search through possible game positions

• Unpredictable opponent: introduces uncertainty; thus game-playing must deal with contingency problems

Page 5: Adversarial Search

Type of games

Page 6: Adversarial Search

Type of games

Page 7: Adversarial Search

Game Playing

• Many different kinds of games!
• Axes:

◦ Deterministic or stochastic?
◦ One, two, or more players?
◦ Perfect information (can you see the state)?
◦ Turn taking or simultaneous action?

• Want algorithms for calculating a strategy (policy) which recommends a move in each state

Page 8: Adversarial Search

Deterministic Games

• Deterministic, single player, perfect information:

◦ Know the rules
◦ Know what actions do
◦ Know when you win
◦ E.g. Freecell, 8-Puzzle, Rubik's cube

• … it's just search!
• Slight reinterpretation:

◦ Each node stores a value: the best outcome it can reach
◦ This is the maximal outcome of its children (the max value)
◦ Note that we don't have path sums as before (utilities are at the end)

• After search, we can pick the move that leads to the best node (a minimal sketch follows below)
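A minimal sketch of this reinterpretation, assuming hypothetical helpers `is_terminal(state)`, `utility(state)`, and `successors(state)` for whatever single-player game is being solved (these names are not from the slides):

```python
def best_value(state, is_terminal, utility, successors):
    """Best outcome reachable from `state`: the utility at a terminal state,
    otherwise the maximum best_value over its children (the max value)."""
    if is_terminal(state):
        return utility(state)
    return max(best_value(s, is_terminal, utility, successors)
               for s in successors(state))
```

After the search, the move to pick is simply the one leading to the child with the highest `best_value`.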

Page 9: Adversarial Search

Deterministic Two-Player

• E.g. tic-tac-toe, chess, checkers
• Zero-sum games

◦ One player maximizes the result
◦ The other minimizes the result

• Minimax search

◦ A state-space search tree
◦ Players alternate turns
◦ Each layer, or ply, consists of a round of moves
◦ Choose the move to the position with the highest minimax value = the best achievable utility against best play

Page 10: Adversarial Search

Games vs. search problems

“Unpredictable” opponent ⇒ the solution is a strategy specifying a move for every possible opponent reply

Time limits ⇒ unlikely to find the goal, must approximate

Plan of attack:

◦ Computer considers possible lines of play (Babbage, 1846)
◦ Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
◦ Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
◦ First chess program (Turing, 1951)
◦ Machine learning to improve evaluation accuracy (Samuel, 1952–57)
◦ Pruning to allow deeper search (McCarthy, 1956)

Page 11: Adversarial Search

Searching for the next move

Complexity: many games have a huge search space

• Chess: b ≈ 35, m ≈ 100, so the game tree has about 35^100 nodes
• If each node takes about 1 ns to explore, then each move would take about 10^50 millennia to calculate.

Resource (e.g., time, memory) limits: the optimal solution is not feasible, so we must approximate:

1. Pruning: makes the search more efficient by discarding portions of the search tree that cannot improve the quality of the result.
2. Evaluation functions: heuristics to estimate the utility of a state without exhaustive search.

Page 12: Adversarial Search

Two-player games

• A game formulated as a search problem:

◦ Initial state: board position and whose turn it is
◦ Operators: definition of the legal moves
◦ Terminal state: conditions for when the game is over
◦ Utility function: a numeric value that describes the outcome of the game, e.g. -1, 0, 1 for loss, draw, win (AKA payoff function)
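This formulation can be written down as a tiny interface; the class and method names below are illustrative assumptions, not something defined in the slides:

```python
from abc import ABC, abstractmethod

class Game(ABC):
    """A two-player game cast as a search problem (illustrative sketch)."""

    @abstractmethod
    def initial_state(self):
        """Board position and whose turn it is."""

    @abstractmethod
    def legal_moves(self, state):
        """Operators: the moves that are legal in `state`."""

    @abstractmethod
    def result(self, state, move):
        """The state reached by applying `move` in `state`."""

    @abstractmethod
    def is_terminal(self, state):
        """Terminal test: is the game over in `state`?"""

    @abstractmethod
    def utility(self, state):
        """Payoff of a terminal state, e.g. -1, 0, 1 for loss, draw, win."""
```

The later sketches for minimax, depth-limited search, and α–β pruning are written against this assumed interface.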

Page 13: Adversarial Search

Example: Tic-Tac-Toe


Page 14: Adversarial Search

The minimax algorithm

Perfect play for deterministic environments with perfect information Basic idea: choose move with highest minimax value

• = best achievable payoff against best play Algorithm:

1. Generate game tree completely2. Determine utility of each terminal state3. Propagate the utility values upward in the three by applying

• MIN and MAX operators on the nodes in the current level4. At the root node use minimax decision to select the move

• with the max (of the min) utility value

Steps 2 and 3 in the algorithm assume that the opponent will play perfectly.

Page 15: Adversarial Search

Generate Game Tree

Page 16: Adversarial Search

Generate Game Tree

(Figure: the first level of the tic-tac-toe game tree, showing X's possible opening moves.)

Page 17: Adversarial Search

Generate Game Tree

(Figure: the game tree expanded one more level, showing O's possible replies.)

Page 18: Adversarial Search

Generate Game Tree

(Figure: the same game tree, annotated with "1 ply" and "1 move" markers on its levels.)

Page 19: Adversarial Search

(Figure: a sub-tree of the full tic-tac-toe game tree, with terminal states labeled win, lose, and draw.)

Page 20: Adversarial Search

What is a good move?

(Figure: the same tic-tac-toe sub-tree with win/lose/draw outcomes, posing the question above.)

Page 21: Adversarial Search

MiniMax Example

(Figure: a small game tree with alternating MAX and MIN levels; the node values are not reproduced in the transcript.)

Page 22: Adversarial Search


MiniMax: Recursive Implementation
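The implementation figure from this slide is not reproduced in the transcript; the following is a minimal recursive sketch in its spirit, written against the hypothetical `Game` interface from page 12 (it is not the lecture's own code):

```python
def minimax_decision(game, state):
    """MAX chooses the move whose resulting state has the highest minimax value."""
    return max(game.legal_moves(state),
               key=lambda m: min_value(game, game.result(state, m)))

def max_value(game, state):
    """Value of a state with MAX to move."""
    if game.is_terminal(state):
        return game.utility(state)
    return max(min_value(game, game.result(state, m))
               for m in game.legal_moves(state))

def min_value(game, state):
    """Value of a state with MIN to move."""
    if game.is_terminal(state):
        return game.utility(state)
    return min(max_value(game, game.result(state, m))
               for m in game.legal_moves(state))
```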

Page 23: Adversarial Search

Minimax Properties

• Optimal against a perfect player. Otherwise?
• Time complexity?

◦ O(b^m)

• Space complexity?

◦ O(bm)

• For chess, b ≈ 35, m ≈ 100

◦ Exact solution is completely infeasible
◦ But do we need to explore the whole tree?

(Figure: a small two-level max/min example tree with leaf values 10, 10, 9, and 100.)

Page 24: Adversarial Search

Resource Limits

• Cannot search to the leaves
• Depth-limited search (sketched below)

◦ Instead, search to a limited depth of the tree
◦ Replace terminal utilities with an eval function for non-terminal positions

• The guarantee of optimal play is gone
• More plies make a BIG difference
• Example:

◦ Suppose we have 100 seconds and can explore 10K nodes/sec
◦ So we can check 1M nodes per move
◦ α–β reaches about depth 8: a decent chess program

(Figure: a two-ply max/min tree cut off at the depth limit; the leaves shown have values -1, -2, 4, and 9, the min nodes take values -2 and 4, max takes 4, and the positions below the cutoff are marked "?".)
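A sketch of the depth-limited idea described above, again against the hypothetical `Game` interface; `eval_fn` stands for the evaluation function substituted for true utilities at the cutoff:

```python
def depth_limited_value(game, state, depth, eval_fn, maximizing=True):
    """Minimax value with a depth cutoff: with no remaining plies,
    non-terminal positions are scored by `eval_fn` instead of searched."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return eval_fn(state)   # cutoff: the optimality guarantee is gone here
    values = (depth_limited_value(game, game.result(state, m),
                                  depth - 1, eval_fn, not maximizing)
              for m in game.legal_moves(state))
    return max(values) if maximizing else min(values)
```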

Page 25: Adversarial Search

Evaluation Functions

• Function which scores non-terminal positions

Ideal function: returns the true utility of the position.
In practice: typically a weighted linear sum of features:

Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

e.g. f1(s) = (number of white queens) − (number of black queens), etc.
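A sketch of such a weighted linear evaluation; the weights and features in the comment are made-up chess-flavoured examples, not values from the lecture:

```python
def linear_eval(state, features, weights):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical usage:
#   features = [lambda s: s.num_white_queens - s.num_black_queens,
#               lambda s: s.white_mobility - s.black_mobility]
#   weights  = [9.0, 0.1]
#   score = linear_eval(state, features, weights)
```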

Page 26: Adversarial Search

Evaluation Functions

Page 27: Adversarial Search

Why Pacman starves

• He knows his score will go up by eating the dot now
• He knows his score will go up just as much by eating the dot later on
• There are no point-scoring opportunities after eating the dot
• Therefore, waiting seems just as good as eating

Page 28: Adversarial Search

α–β pruning: general principle

(Figure: alternating Player and Opponent levels; MAX has an alternative move m higher in the tree, and a deeper node n has value v.)

If α > v, then MAX will choose m, so the tree under n can be pruned. Similarly, β is used to prune for MIN.

Pages 29–38: Adversarial Search

α–β pruning: example 1

(These slides step through α–β pruning on a two-ply tree: MAX over three MIN nodes with leaf values 3, 12, 8 | 2, … | 14, 5, 2. The left MIN node resolves to 3, so MAX's bound becomes [3, +∞]; the middle MIN node is pruned as soon as the leaf 2 is seen, since its interval [3, 2] is empty; the right MIN node narrows from [3, 14] to [3, 5] and finally to 2, and is also rejected. MAX selects the left move, with minimax value 3.)

Pages 39–48: Adversarial Search

α–β pruning: example 2

(These slides step through a second example. The left MIN node resolves to 2, giving MAX the bound [2, +∞]; the middle MIN node examines leaves 14 and 5, then 1, and is cut off once its interval [2, 1] becomes empty; the right MIN node examines 8, 12, and 3, resolving to 3 and raising MAX's bound to [3, +∞]. The move with value 3 is selected.)

Pages 49–57: Adversarial Search

α–β pruning: example 3

(These slides work a deeper example with two MIN levels below MAX. The left branch has already resolved, giving MAX the bound [6, +∞]; the right branch is then expanded over leaves 14, 5, 9, 1, and 4, and each sub-branch is cut off as soon as its interval (e.g. [5, 1], [5, 4]) shows it cannot beat 6. The move worth 6 is selected.)

Pages 58–63: Adversarial Search

α–β pruning: example 4

(These slides repeat the previous tree with a leaf value of 7 in place of 5: after leaves 14, 7, and 9 the relevant interval becomes [7, 6], which is empty, so the remaining leaves are pruned and the search ends earlier than in example 3; the final slide marks the selected move at the root.)

Page 64: Adversarial Search

α–β pruning: general principle (recap)

(Same figure and rule as on page 28: if α > v, then MAX will choose m, so the tree under n can be pruned; similarly, β is used to prune for MIN.)

Page 65: Adversarial Search

The α – β algorithm
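The algorithm figure itself is not in the transcript; below is a minimal α–β sketch consistent with the pruning rule stated on the earlier slides, using the same hypothetical `Game` interface:

```python
import math

def alphabeta_decision(game, state):
    """MAX's best move, pruning branches that cannot affect the decision."""
    best_move, alpha, beta = None, -math.inf, math.inf
    for m in game.legal_moves(state):
        v = min_value_ab(game, game.result(state, m), alpha, beta)
        if v > alpha:
            alpha, best_move = v, m
    return best_move

def max_value_ab(game, state, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = -math.inf
    for m in game.legal_moves(state):
        v = max(v, min_value_ab(game, game.result(state, m), alpha, beta))
        if v >= beta:            # MIN above would never allow this branch
            return v
        alpha = max(alpha, v)
    return v

def min_value_ab(game, state, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = math.inf
    for m in game.legal_moves(state):
        v = min(v, max_value_ab(game, game.result(state, m), alpha, beta))
        if v <= alpha:           # MAX above would never choose this branch
            return v
        beta = min(beta, v)
    return v
```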

Page 66: Adversarial Search

Properties of the α–β algorithm

Page 67: Adversarial Search

Resource limits

Standard approach:

◦ Use CUTOFF-TEST instead of TERMINAL-TEST, e.g., a depth limit (perhaps with quiescence search added)
◦ Use EVAL instead of UTILITY, i.e., an evaluation function that estimates the desirability of a position

Suppose we have 100 seconds and can explore 10^4 nodes/second
⇒ 10^6 nodes per move ≈ 35^{8/2}
⇒ α–β reaches depth 8: a pretty good chess program
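A quick check of that arithmetic, using b ≈ 35 from the chess slides and the standard property that with good move ordering α–β examines roughly b^{m/2} nodes:

```latex
10^{4}\ \text{nodes/s} \times 100\ \text{s} = 10^{6}\ \text{nodes per move},
\qquad
35^{8/2} = 35^{4} = 1{,}500{,}625 \approx 1.5 \times 10^{6} \approx 10^{6}.
```

So a budget of about 10^6 nodes buys roughly 8 plies of α–β search in chess.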

Page 68: Adversarial Search

Evaluation Functions

• Function which scores non-terminal positions

Ideal function: returns the true utility of the position.
In practice: typically a weighted linear sum of features:

Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

e.g. f1(s) = (number of white queens) − (number of black queens), etc.

Page 69: Adversarial Search

Digression: Exact values don't matter

• Behavior is preserved under any monotonic transformation of Eval
• Only the order matters: payoff in deterministic games acts as an ordinal utility function
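A tiny self-contained check of this claim on a made-up two-ply tree (the leaf values and the cubing transformation are arbitrary illustrations):

```python
def minimax(node, maximizing=True):
    """`node` is either a number (a leaf value) or a list of child nodes."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]      # MAX node over three MIN nodes
best = max(range(len(tree)), key=lambda i: minimax(tree[i], maximizing=False))

cube = lambda x: x ** 3                          # any monotonic transformation
tree2 = [[cube(v) for v in leaves] for leaves in tree]
best2 = max(range(len(tree2)), key=lambda i: minimax(tree2[i], maximizing=False))

assert best == best2   # the chosen move is unchanged: only the ordering of Eval matters
```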

Page 70: Adversarial Search

STOCHASTIC GAMES

• Dice rolls increase b: 21 possible rolls with 2 dice

◦ Backgammon: about 20 legal moves
◦ Depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes

• As depth increases, the probability of reaching a given node shrinks

◦ So the value of lookahead is diminished
◦ So limiting depth is less damaging
◦ α–β pruning is much less effective

• TD-Gammon uses depth-2 search + a very good EVAL function + reinforcement learning

◦ World-champion level play

Page 71: Adversarial Search

Nondeterministic games in general

• In nondeterministic games, chance is introduced by dice or card-shuffling
• Simplified example with coin-flipping:

Page 72: Adversarial Search

Algorithm for nondeterministic games

• Expectiminimax gives perfect play
• Just like Minimax, except we must also handle chance nodes:

◦ …
◦ if state is a MAX node then return the highest ExpectiMinimax-Value of Successors(state)
◦ if state is a MIN node then return the lowest ExpectiMinimax-Value of Successors(state)
◦ if state is a chance node then return the average of ExpectiMinimax-Value of Successors(state)
◦ …
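A sketch of the chance-node case alongside MAX and MIN; the `node_type`, `successors`, and `chance_outcomes` methods are assumed extensions of the hypothetical `Game` interface, not an API from the lecture:

```python
def expectiminimax(game, state):
    """Value of `state` in a game with chance nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    kind = game.node_type(state)   # assumed to return 'max', 'min', or 'chance'
    if kind == 'max':
        return max(expectiminimax(game, s) for s in game.successors(state))
    if kind == 'min':
        return min(expectiminimax(game, s) for s in game.successors(state))
    # Chance node: probability-weighted average of successor values
    # (for equally likely dice rolls this is the plain average on the slide).
    return sum(p * expectiminimax(game, s)
               for p, s in game.chance_outcomes(state))
```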

Page 73: Adversarial Search

Expectiminimax

Page 74: Adversarial Search

Digression: Exact values DO matter

• Behavior is preserved only by positive linear transformations of Eval
• Hence Eval should be proportional to the expected payoff

Page 75: Adversarial Search

Games of imperfect information

• E.g., card games, where the opponent's initial cards are unknown
• Typically we can calculate a probability for each possible deal
• Seems just like having one big dice roll at the beginning of the game
• Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals (a sketch follows below)
• Special case: if an action is optimal for all deals, it's optimal
• GIB, the current best bridge program, approximates this idea by

1) generating 100 deals consistent with the bidding information
2) picking the action that wins the most tricks on average
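A sketch of the averaging idea only (not GIB's actual implementation): sample deals consistent with what is known, score each candidate action by minimax in each sampled deal, and pick the action with the best average. `sample_consistent_deal` and `minimax_value` are assumed helpers. The "Proper analysis" slide below explains why this averaging intuition is not correct in general.

```python
from collections import defaultdict

def choose_action(actions, sample_consistent_deal, minimax_value, n_deals=100):
    """Monte Carlo over hidden information: average each action's minimax
    value over `n_deals` sampled deals and return the best on average."""
    totals = defaultdict(float)
    for _ in range(n_deals):
        deal = sample_consistent_deal()           # a full state consistent with observations
        for a in actions:
            totals[a] += minimax_value(deal, a)   # value of playing `a` in this deal
    return max(actions, key=lambda a: totals[a] / n_deals)
```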

Page 76: Adversarial Search

Example

• Four-card bridge/whist/hearts hand, Max to play first

Page 77: Adversarial Search

Commonsense example

Page 78: Adversarial Search

Proper analysis

• Intuition that the value of an action is the average of its values in all actual states is WRONG
• With partial observability, the value of an action depends on the information state or belief state the agent is in
• Can generate and search a tree of information states
• Leads to rational behaviors such as:

◦ Acting to obtain information
◦ Signalling to one's partner
◦ Acting randomly to minimize information disclosure