
Page 1:

CSC 412: AI Adversarial Search

Bikramjit Banerjee

Partly based on material available from the internet

Page 2:

Game search problems

Search problems: only the problem solver's actions can change the state of the environment

Game search problems: multiple problem solvers (players) act on the same environment

Players' actions can be
- Cooperative: common goal state
- Adversarial: a win for one player is a loss for the other. Example: zero-sum games like chess, tic-tac-toe

A whole spectrum lies between purely adversarial and purely cooperative games

We first look at adversarial two-player games with turn-taking

Page 3:

Game Playing: State of the art

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply

Othello: human champions refuse to compete against computers, which are too good.

Go: human champions refuse to compete against computers, which are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Page 4:

Two Player Games

Max always moves first; Min is the opponent

States? Boards faced by Max/Min
Actions? Players' moves
Goal test? Terminal board test
Path cost? Utility function for each player

Max vs. Min
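This formulation maps naturally onto a small game interface. The sketch below is not from the slides; the method names are hypothetical, chosen only to mirror the bullets above, and the later search sketches in this transcript are written in the same spirit.

```python
# A minimal game-formulation sketch (assumption: not from the slides; the
# method names below are hypothetical).

from typing import Any, List, Protocol

class Game(Protocol):
    def actions(self, state: Any) -> List[Any]:
        """Legal moves for the player to move in `state`."""
        ...

    def result(self, state: Any, action: Any) -> Any:
        """The board reached by applying `action` to `state`."""
        ...

    def is_terminal(self, state: Any) -> bool:
        """Goal test: is `state` a terminal board?"""
        ...

    def utility(self, state: Any, player: Any) -> float:
        """Payoff of a terminal `state` for `player` (e.g. +1 / 0 / -1)."""
        ...
```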

Page 5:

Search tree: Alternate-move games

[Figure: partial game tree for tic-tac-toe. Max places X and Min places O; the levels alternate Max, Min, Max, ... down to the terminal states, which are labelled with utilities -1, 0, or +1 for Max.]

Page 6:

An action by one player is called a ply; two plies (an action and a counter-action) make up a move.

A simple abstract game

[Figure: game tree for a simple abstract game. Max chooses among moves A1, A2, A3; Min then replies with A11-A13, A21-A23, or A31-A33, leading to terminal values 3, 12, 8 under A1; 2, 4, 6 under A2; and 14, 5, 2 under A3.]

Page 7:

The Minimax Algorithm

Generate the game tree down to the terminal nodes
Apply the utility function to the terminal nodes
For a set S of sibling nodes, pass up to the parent:
- the lowest value in S if the parent is a Min node
- the largest value in S if the parent is a Max node
Recursively do the above until the backed-up values reach the initial state

The value of the initial state is the minimum score Max is guaranteed to achieve (assuming Min plays optimally)
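A minimal recursive sketch of this procedure (an assumption, not the slides' code; `successors` and `utility` are hypothetical callables describing the game tree). The worked example reproduces the abstract game from the previous slide and backs up 3 at the root.

```python
# Minimal minimax sketch (assumption: not from the slides).

def minimax(state, successors, utility, is_max=True):
    """Back up terminal utilities: max at Max nodes, min at Min nodes."""
    children = successors(state)
    if not children:                       # terminal node: apply the utility function
        return utility(state)
    values = [minimax(c, successors, utility, not is_max) for c in children]
    return max(values) if is_max else min(values)


# The simple abstract game: Max picks A1/A2/A3, Min replies, leaves hold utilities.
tree = {"root": ["A1", "A2", "A3"],
        "A1": ["A11", "A12", "A13"],
        "A2": ["A21", "A22", "A23"],
        "A3": ["A31", "A32", "A33"]}
leaves = {"A11": 3, "A12": 12, "A13": 8,
          "A21": 2, "A22": 4, "A23": 6,
          "A31": 14, "A32": 5, "A33": 2}

print(minimax("root", lambda s: tree.get(s, []), leaves.get))   # -> 3
```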

Page 8:

Minimax Decision

[Figure: the abstract game tree with backed-up minimax values. The Min nodes reached by A1, A2, A3 back up 3, 2, and 2 from their leaves (3, 12, 8; 2, 4, 6; 14, 5, 2), and the Max root backs up 3.]

In this game Max's best move is A1, because he is guaranteed a score of at least 3

Page 9:

Page 10:

Properties of Minimax

Complete? Yes (if the tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^m)
Space complexity? O(bm) (depth-first exploration)

For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so finding the optimal solution with Minimax is infeasible

Potential improvements to Minimax running time:
- Depth-limited search
- Pruning

Page 11:

Depth-limited Minimax

We would like to do Minimax on the full game tree... but we don't have time, so we explore it only to some manageable depth.

[Figure: a game tree with a "cutoff" line marking the depth to which search actually proceeds.]

One possible solution is depth-limited Minimax search (sketched below):
- Search the game tree as deep as you can in the given time
- Evaluate the fringe nodes with the utility function
- Back up the values to the root
- Choose the best move, repeat
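A minimal depth-limited variant of the earlier minimax sketch (again an assumption, not the slides' code): search to a fixed depth and apply the evaluation function at the cutoff.

```python
# Depth-limited minimax sketch (assumption: not from the slides).
# `evaluate` plays the role of the utility/evaluation function, applied at the
# fringe (cutoff) nodes as well as at genuine terminal nodes.

def depth_limited_minimax(state, successors, evaluate, depth, is_max=True):
    children = successors(state)
    if depth == 0 or not children:         # cutoff reached, or terminal node
        return evaluate(state)
    values = [depth_limited_minimax(c, successors, evaluate, depth - 1, not is_max)
              for c in children]
    return max(values) if is_max else min(values)
```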

Page 12:

Example Utility Function: Tic-Tac-Toe

Assume Max is using "X". Then

e(n) =
- +∞ if n is a win for Max
- -∞ if n is a win for Min
- otherwise, (number of rows, columns and diagonals available to Max) - (number of rows, columns and diagonals available to Min)

[Figure: two example boards, one with e(n) = 6 - 4 = 2 and one with e(n) = 4 - 3 = 1.]
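A sketch of this evaluation function in Python (an assumption, not the slides' code; the board is taken to be a 3x3 list of 'X', 'O', or None, and a line counts as available to a player if the opponent occupies none of its squares).

```python
# Tic-tac-toe evaluation sketch (assumption: not from the slides). Max plays 'X'.
import math

# All 8 lines: 3 rows, 3 columns, 2 diagonals, as lists of (row, col) squares.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
         [[(r, c) for r in range(3)] for c in range(3)] +
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def e(board):
    def wins(player):
        return any(all(board[r][c] == player for r, c in line) for line in LINES)
    def lines_available(player, opponent):
        return sum(all(board[r][c] != opponent for r, c in line) for line in LINES)
    if wins('X'):
        return math.inf        # win for Max
    if wins('O'):
        return -math.inf       # win for Min
    return lines_available('X', 'O') - lines_available('O', 'X')
```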

Page 13:

Example Utility Function: Chess I

Assume Max is "White"

Assume each piece has the following values: pawn = 1; knight = 3; bishop = 3; rook = 5; queen = 9

Let w = sum of the values of the White pieces
Let b = sum of the values of the Black pieces

e(n) = (w - b) / (w + b)

Note that this value ranges between -1 and 1
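A sketch of this material-balance evaluation (an assumption, not the slides' code; a position is represented simply as two lists of piece names).

```python
# Material-balance evaluation sketch (assumption: not from the slides).
PIECE_VALUE = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(white_pieces, black_pieces):
    """e(n) = (w - b) / (w + b); always lies between -1 and 1."""
    w = sum(PIECE_VALUE.get(p, 0) for p in white_pieces)   # kings contribute 0
    b = sum(PIECE_VALUE.get(p, 0) for p in black_pieces)
    return (w - b) / (w + b) if w + b else 0.0
```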

Page 14:

Example Utility Function: Chess II

The previous evaluation function naively gave the same weight to a piece regardless of its position on the board

Let X_i be the number of squares the i-th piece attacks

e(n) = same as before, but now

w = (piece 1 value) * X_1 + (piece 2 value) * X_2 + ...

(and b is computed the same way for Black)
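A sketch of this mobility-weighted variant (an assumption, not the slides' code; it leans on a hypothetical squares_attacked(board, piece) helper, since counting attacked squares needs a real move generator, and each piece is assumed to carry a .kind attribute).

```python
# Mobility-weighted evaluation sketch (assumption: not from the slides).
# `squares_attacked(board, piece)` is a hypothetical helper returning how many
# squares the piece attacks in this position.

def mobility_eval(board, white_pieces, black_pieces, squares_attacked):
    def weighted_sum(pieces):
        return sum(PIECE_VALUE.get(piece.kind, 0) * squares_attacked(board, piece)
                   for piece in pieces)
    w = weighted_sum(white_pieces)
    b = weighted_sum(black_pieces)
    return (w - b) / (w + b) if w + b else 0.0
```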

Page 15:

Utility Functions

The ability to play a good game is highly dependent on the evaluation function

How do we come up with good evaluation functions?
- Interview an expert
- Machine learning

[Figure: analogy between classification and position evaluation. Given labelled examples of class A and class B, a learner is asked "What class is this object?" for new objects; likewise, given example positions labelled as wins for White or wins for Black, a learner is asked "Who would win this game?" for a new position.]

Page 16:

α-β Pruning

We have seen how to use Minimax search to play an optimal game. We have also seen that, because of time limitations, we may have to use a cutoff depth to make the search tractable. Using a cutoff causes problems because of the "horizon" effect.

Is there some way we can search deeper in the same amount of time? Yes! Use alpha-beta (α-β) pruning.

[Figure: the horizon effect. The best move found before the cutoff looks attractive, but all of its children are losing moves, while the game-winning move lies beyond the cutoff.]

Page 17:

α-β Pruning

[Figure: α-β pruning on the simple abstract game. The subtree under A1 backs up 3; under A2, the first leaf 2 shows that Min can force a value of at most 2 there, so the remaining leaves are never examined; under A3 the leaves 14, 5, 2 back up 2; the Max root still backs up 3.]

"If you have an idea that is surely bad, don't take the time to see how truly awful it is"

-- Pat Winston

Page 18:

Page 19:

Page 20:

α-β pruning: Another example

Example courtesy of Dr. Milos Hauskrecht

[Figure: a three-level game tree (MAX at the root, MIN below it, MAX above the leaves) with leaf values 4 3 6 2 2 1 9 3 5 1 5 4 7 5, used for the step-by-step α-β walkthrough on the following slides.]

Page 21:

α-β pruning

[Figure: α-β walkthrough — the leftmost node above the leaves backs up 4.]

Page 22:

α-β pruning

[Figure: α-β walkthrough — the 4 is passed up a level, giving backed-up values 4, 4.]

Page 23:

α-β pruning

[Figure: α-β walkthrough — backed-up values 4, 6, 4; the !! marks a subtree that gets pruned.]

Page 24:

α-β pruning

[Figure: α-β walkthrough — backed-up values 4, 6, 4, 4.]

Page 25:

α-β pruning

[Figure: α-β walkthrough — backed-up values 4, 6, 4, 4, 2.]

Page 26:

α-β pruning

[Figure: α-β walkthrough — backed-up values 4, 6, 4, 4, 2, 2.]

Page 27:

α-β pruning

[Figure: α-β walkthrough — backed-up values 4, 6, 4, 4, 2, 2; another !! marks a further pruned subtree.]

Page 28:

α-β pruning

[Figure: α-β walkthrough — backed-up values 4, 6, 4, 4, 2, 2, 5.]

Page 29:

α-β pruning

[Figure: the completed walkthrough, with backed-up values 4, 6, 4, 5, 2, 2, 5, 5, 7; some subtrees are marked as nodes that were never explored.]

Higher values first below a MAX level; lower values first below a MIN level.

Page 30:

α-β Pruning

Guaranteed to compute the same value for the root as Minimax

In the worst case α-β does NO pruning, examining b^d leaf nodes, where each node has b children and a d-ply search is performed

In the best case, α-β will examine only 2b^(d/2) leaf nodes. Hence, if you hold the number of leaf nodes fixed, you can search twice as deep as Minimax

The best case occurs when each player's best move is the leftmost alternative (i.e., the first child generated). So, at MAX nodes the child with the largest value is generated first, and at MIN nodes the child with the smallest value is generated first -> order the operators carefully

In the chess program Deep Blue, it was found empirically that α-β pruning brought the average branching factor at each node down to ~6 instead of ~35-40
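For reference, here is a minimal α-β sketch (an assumption, not the slides' code; it reuses the hypothetical successors/evaluate style of the earlier sketches).

```python
import math

# Alpha-beta pruning sketch (assumption: not from the slides).
# alpha = best value Max can guarantee so far; beta = best value Min can guarantee.

def alphabeta(state, successors, evaluate, depth,
              alpha=-math.inf, beta=math.inf, is_max=True):
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    if is_max:
        value = -math.inf
        for c in children:
            value = max(value, alphabeta(c, successors, evaluate, depth - 1,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # Min already has a better alternative: prune
                break
        return value
    else:
        value = math.inf
        for c in children:
            value = min(value, alphabeta(c, successors, evaluate, depth - 1,
                                         alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:        # Max already has a better alternative: prune
                break
        return value
```

On the toy tree from the minimax sketch (searched to depth 2), this returns 3 while the leaves 4 and 6 under A2 are never evaluated, matching the pruning shown on Page 17.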

Page 31:

Non-zero-sum games

Similar to minimax: utilities are now tuples (one entry per player)

Each player maximizes their own entry at each node

Propagate (or back up) values from children
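A minimal sketch of this backup rule (an assumption, not the slides' code): utilities are tuples, and each node's player picks the child whose tuple is best in that player's own entry.

```python
# Non-zero-sum backup sketch (assumption: not from the slides).
# Terminal utilities are tuples, one entry per player; `player_to_move(state)`
# is a hypothetical callable giving the index of the player who moves at `state`.

def backup(state, successors, utility, player_to_move):
    children = successors(state)
    if not children:
        return utility(state)                        # tuple of payoffs
    child_values = [backup(c, successors, utility, player_to_move) for c in children]
    p = player_to_move(state)
    return max(child_values, key=lambda v: v[p])     # maximize own entry
```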

Page 32:

Stochastic 2-player games

E.g. backgammon

Expectiminimax:
- The environment is an extra player that moves after each agent
- At chance nodes take expectations; otherwise like minimax
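A minimal expectiminimax sketch (an assumption, not the slides' code; node kinds and chance-node probabilities are supplied by hypothetical callables).

```python
# Expectiminimax sketch (assumption: not from the slides).
# `kind(state)` is a hypothetical callable returning "max", "min", or "chance";
# for chance nodes, `successors` returns (probability, child) pairs.

def expectiminimax(state, kind, successors, utility):
    children = successors(state)
    if not children:
        return utility(state)
    k = kind(state)
    if k == "chance":                                  # take expectation over outcomes
        return sum(p * expectiminimax(c, kind, successors, utility)
                   for p, c in children)
    values = [expectiminimax(c, kind, successors, utility) for c in children]
    return max(values) if k == "max" else min(values)
```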

Page 33:

Stochastic 2-player games

Dice rolls increase b: 21 possible rolls with 2 dice. Backgammon has ≈ 20 legal moves, so at depth 4 there are 20 x (21 x 20)^3 ≈ 1.2 x 10^9 nodes

As depth increases, the probability of reaching a given node shrinks
- So the value of lookahead is diminished
- So limiting depth is less damaging
- But pruning is less possible

TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning: world-champion level play