
Game Playing

Chapter 5

Game playing

Search applied to a problem against an adversary

• some actions are not under the control of the problem-solver

• there is an opponent (hostile agent)

Since it is a search problem, we must specify states & operations/actions

• initial state = current board

• operators = legal moves

• goal state = game over

• utility function = value for the outcome of the game

Usually, (board) games have well-defined rules & the entire state is accessible

Basic idea

Consider all possible moves for yourself

Consider all possible moves for your opponent

Continue this process until a point is reached where we know the outcome of the game

From this point, propagate the best move back

• choose the best move for yourself at every turn

• assume your opponent will make the optimal move on their turn

Example

Tic-tac-toe (Nilsson’s book)

Problem

For interesting games, it is simply not computationally possible to look at all possible moves

• in chess, there are on average 35 choices per turn

• on average, there are about 50 moves per player

• thus, the number of possibilities to consider is 35^100
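As a rough check of the arithmetic above, the following sketch computes the size of that number directly. The variable names are illustrative and not from the slides; the only inputs are the two averages quoted in the text.

```python
import math

branching_factor = 35          # average legal moves per turn in chess (from the text)
moves_per_player = 50          # average number of moves per player (from the text)
plies = 2 * moves_per_player   # both players move, so 100 plies in total

positions = branching_factor ** plies   # 35^100 as an exact Python integer
digits = len(str(positions))            # how many decimal digits that number has

print(f"35^{plies} has {digits} digits (about 10^{math.log10(positions):.1f})")
```

Even at a billion positions per second, a number of this size is far beyond exhaustive enumeration, which is why the next slides introduce a heuristic.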

Solution

Given that we can only look ahead k moves and that we can’t see all the way to the end of the game, we need a heuristic function that substitutes for looking to the end of the game

• this is usually called a static board evaluator (SBE)

• a perfect static board evaluator would tell us for which moves we would win, lose or draw

• possible for tic-tac-toe, but not for chess

Creating an SBE approximation

Typically, made up of rules of thumb

• for example, in most chess books each piece is given a value: pawn = 1; rook = 5; queen = 9; etc.

• further, there are other important characteristics of a position, e.g., center control

We put all of these factors into one function, potentially weighting each aspect differently, to determine the value of a position

• board_value = w1 * material_balance + w2 * center_control + … [the coefficients w1, w2, … might change as the game goes on]
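A minimal sketch of such a weighted evaluator is shown here. It assumes pieces are given as lists of one-letter codes and that center control has already been reduced to a single number; the function names, the default weights, and the input format are all hypothetical. Only the three piece values quoted above are included.

```python
# Piece values quoted in the text; other pieces are left out of this sketch.
PIECE_VALUES = {"P": 1, "R": 5, "Q": 9}

def material_balance(my_pieces, opp_pieces):
    """Sum of my piece values minus the sum of my opponent's."""
    mine = sum(PIECE_VALUES.get(p, 0) for p in my_pieces)
    theirs = sum(PIECE_VALUES.get(p, 0) for p in opp_pieces)
    return mine - theirs

def board_value(my_pieces, opp_pieces, center_control,
                w_material=1.0, w_center=0.5):
    """Weighted sum of features; the weights here are arbitrary assumptions."""
    return (w_material * material_balance(my_pieces, opp_pieces)
            + w_center * center_control)

# I hold a queen and a pawn, my opponent holds a rook, and I control
# two more center squares than they do.
score = board_value(["Q", "P"], ["R"], center_control=2)
print(score)
```

Passing different weights at different stages of the game is one simple way to let the coefficients change as the game goes on.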

Compromise

If we could search to the end of the game, then choosing a move would be relatively easy

just use minimax

Or, if we had a perfect scoring function (SBE), we wouldn’t have to do any search (just choose best move from current state -- one step look ahead)

Since neither is feasible for interesting games, we combine the two ideas

Basic idea

Build the game tree as deep as possible given the time constraints

Apply an approximate SBE to the leaves

Propagate scores back up to the root & use this information to choose a move

Example

Score percolation: MINIMAX

When it is my turn, I will choose the move that maximizes the (approximate) SBE score

When it is my opponent’s turn, they will choose the move that minimizes the SBE score

• because we are dealing with competitive games, what is good for me is bad for my opponent & what is bad for me is good for my opponent

• assume the opponent plays optimally [worst-case assumption]

MINIMAX algorithm

Start at the leaves of the tree and apply the SBE

If it is my turn, choose the maximum SBE score for each sub-tree

If it is my opponent’s turn, choose the minimum score for each sub-tree

The scores on the leaves are how good the board appears from that point
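The procedure above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the textbook's code: the game tree is given explicitly as nested lists, and each number at a leaf stands for an SBE score that has already been computed. The tree values are made up.

```python
def minimax(node, maximizing):
    """Back up leaf scores: take the max on my turn, the min on my opponent's."""
    if not isinstance(node, list):      # a leaf: its SBE score is the value
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# A two-ply tree: I have three moves, and my opponent has three replies to each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

best_value = minimax(tree, maximizing=True)
# The move to play is the child whose backed-up (opponent-minimized) value is largest.
best_move = max(range(len(tree)), key=lambda i: minimax(tree[i], maximizing=False))
print(best_value, best_move)
```

In this tree the opponent can hold the three moves to 3, 2 and 2 respectively, so the first move is chosen with a backed-up value of 3.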

Example

Example

Alpha-beta pruning

While minimax is an effective algorithm, it can be inefficient

• one reason for this is that it does unnecessary work

• it evaluates sub-trees where the value of the sub-tree is irrelevant

• alpha-beta pruning gets the same answer as minimax but it eliminates some useless work

Example

Alpha-Beta Algorithm

• Traverse the search tree in depth-first order

• Assuming we stop the search at ply d, we apply the static evaluation function to each node generated at that depth and return this value to the node's parent

• At each non-leaf node, store a value indicating the best backed-up value found so far. At MAX nodes we'll call this alpha; at MIN nodes we'll call it beta.

• alpha = best (i.e., maximum) value found so far at a MAX node (based on its descendants' values).

• beta = best (i.e., minimum) value found so far at a MIN node (based on its descendants' values).

• The alpha value (of a MAX node) is monotonically non-decreasing

• The beta value (of a MIN node) is monotonically non-increasing

• Given a node n, cut off the search below n (i.e., don't generate any more of n's children) if:

    • beta cutoff: n is a MAX node and alpha(n) >= beta(i) for some MIN node ancestor i of n.

    • alpha cutoff: n is a MIN node and beta(n) <= alpha(i) for some MAX node ancestor i of n.

In the example shown above an alpha cutoff

An example of a beta cutoff at node B is shown below: (because alpha(B) = 25 > beta(S) = 20)

MIN                  S                  beta = 20
                   /   \
                 /       \
MAX             A         B             alpha = 25
               20        / \
               /|\       / \
              / | \     /   \
                       D     E
            20 -10 -20 25

• To avoid searching for the ancestor nodes in order to make the above tests, we can carry down the tree the best values found so far at the ancestors.

    • at a MAX node n: beta = min of all the beta values at MIN node ancestors of n.

    • at a MIN node n: alpha = max of all the alpha values at MAX node ancestors of n.

• At each non-leaf node we'll store both an alpha and a beta value.

• Initially, the root has alpha = -inf and beta = +inf

See the text for the Alpha-Beta algorithm
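The bookkeeping described above (carrying alpha and beta down the tree and cutting off when they cross) can be sketched as follows. This is one common formulation, not necessarily the one in the text. It assumes the tree is given as nested lists with SBE scores at the leaves, and the `visited` list is a hypothetical addition used only to show which leaves were evaluated.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of node, skipping children that cannot affect the result."""
    if not isinstance(node, list):          # leaf: return its SBE score
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)       # alpha never decreases at a MAX node
            if alpha >= beta:               # beta cutoff
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)             # beta never increases at a MIN node
        if beta <= alpha:                   # alpha cutoff
            break
    return value

# The beta-cutoff example: S is a MIN node whose MAX children are A and B.
# A's leaves are 20, -10, -20; B's first leaf is 25. The 99 is a made-up
# value standing in for B's remaining child, which should never be looked at.
S = [[20, -10, -20], [25, 99]]
visited = []
value_of_S = alphabeta(S, maximizing=False, visited=visited)
print(value_of_S, visited)
```

After A returns 20, S passes beta = 20 down to B. B's first leaf raises alpha(B) to 25, which is at least 20, so B's remaining child is skipped and S keeps the value 20.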

Use

We project ahead k moves, but we only make one (the best) move then

After our opponent moves, we project ahead k moves again, so we are possibly repeating some work

However, since most of the work is at the leaves anyway, the amount of work we redo isn’t significant (think of iterative deepening)

Alpha-beta performance

Best-case: can search to twice the depth during a fixed amount of time [O(b^(d/2)) vs. O(b^d)]

Worst-case: no savings

• alpha-beta pruning & minimax always return the same answer

• the difference is the amount of work they do

• effectiveness depends on the order in which successors are examined: want to examine the best first
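To get a feel for the best-case figure, the sketch below compares leaf counts for a uniform tree with branching factor b and depth d. The closed form for perfectly ordered alpha-beta, b^ceil(d/2) + b^floor(d/2) - 1, is the classical result of Knuth and Moore; it does not appear in the slides and is brought in here only as an illustration. The function names are hypothetical.

```python
def minimax_leaves(b, d):
    """Leaves examined by plain minimax on a uniform tree: b^d."""
    return b ** d

def alphabeta_best_case_leaves(b, d):
    """Leaves examined by alpha-beta when the best move is always tried first."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1

b, d = 35, 4   # chess-like branching factor, four plies
full = minimax_leaves(b, d)
pruned = alphabeta_best_case_leaves(b, d)
print(full, pruned)
```

With perfect ordering, a four-ply search costs roughly what a two-ply minimax search costs, which is the sense in which alpha-beta can search to twice the depth in the same time.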

Refinements

Waiting for quiescence

Avoids the horizon effect

• disaster is lurking just beyond our search depth

• on the nth move (the maximum depth I can see) I take your rook, but on the (n+1)th move (a depth to which I don’t look) you checkmate me

Solution

• when predicted values are changing frequently, search deeper in that part of the tree (quiescence search)
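The rook example can be turned into a toy illustration. This sketch assumes each position is a dictionary carrying a static `value`, an optional list of `children`, and a `quiet` flag. The flag is a hypothetical stand-in for whatever test a real program uses to decide that predicted values are still changing; all the numbers are made up.

```python
def search(node, depth, maximizing, use_quiescence=True):
    """Depth-limited minimax that keeps searching past the horizon at non-quiet nodes."""
    children = node.get("children", [])
    at_horizon = depth <= 0
    # Without quiescence, every position at the horizon is treated as quiet.
    quiet = node.get("quiet", True) or not use_quiescence
    if not children or (at_horizon and quiet):
        return node["value"]                # stop and trust the static score
    values = [search(c, depth - 1, not maximizing, use_quiescence) for c in children]
    return max(values) if maximizing else min(values)

# I can take a rook (static score +5), but the position is not quiet:
# one ply later my opponent checkmates me (-100).
take_rook = {"value": 5, "quiet": False, "children": [{"value": -100}]}
quiet_move = {"value": 1, "quiet": True, "children": [{"value": 0}]}
root = {"value": 0, "children": [take_rook, quiet_move]}

naive = search(root, depth=1, maximizing=True, use_quiescence=False)
careful = search(root, depth=1, maximizing=True, use_quiescence=True)
print(naive, careful)
```

The fixed-depth search values the position at 5 because it stops right after the capture. The quiescence version looks one ply further below the unstable node, sees the checkmate, and settles for the quiet move instead.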

Secondary search

Find the best move by looking to depth d

Look k steps beyond this best move to see if it still looks good

No? Look further at the second best move, etc.

• in general, do a deeper search at parts of the tree that look “interesting”

Picture

Book moves

Build a database of opening moves, end games, tough examples, etc.

If the current state is in the database, use the knowledge in the database to determine the quality of a state

If it’s not in the database, just do alpha-beta pruning
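The lookup-then-search policy above might be sketched like this. The book contents, the string keys used to identify positions, and the `search` callback are all hypothetical; a real program would key the database on an actual board representation and plug in its alpha-beta search.

```python
# A tiny hypothetical "book": position key -> recommended move.
OPENING_BOOK = {"start": "e2e4"}

def choose_move(state_key, search):
    """Use the book when the position is known, otherwise fall back to search."""
    book_move = OPENING_BOOK.get(state_key)
    if book_move is not None:
        return book_move
    return search(state_key)

def fake_search(state_key):
    # Placeholder standing in for an alpha-beta search routine.
    return "searched:" + state_key

print(choose_move("start", fake_search))
print(choose_move("some-middlegame", fake_search))
```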

AI & games

Initially felt to be a great AI testbed

It turned out, however, that brute-force search is better than a lot of knowledge engineering

• scaling up by dumbing down

• perhaps then intelligence doesn’t have to be human-like

• more high-speed hardware issues than AI issues

• however, still good test-beds for learning
