games tamara berg cs 560 artificial intelligence many slides throughout the course adapted from...

Games

Tamara Berg

CS 560 Artificial Intelligence

Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer

Announcements

• I sent out an email to the class mailing list on Thurs (making the medium and big cheese mazes extra credit).

• HW2 will be released later this week.

Adversarial search(Chapter 5)

World Champion chess player Garry Kasparov is defeated by IBM’s Deep

Blue chess-playing computer in a six-game match in May, 1997

(link)

© Telegraph Group Unlimited 1997

http://www.computerhistory.org/chess/full_record.php?iid=stl-431e1a07b22e1&mainImage=1

Why study games?

• Games are a traditional hallmark of intelligence• Games are easy to formalize• Games can be a good model of real-world

competitive or cooperative activities– Military confrontations, negotiation, auctions, etc.

Types of game environments

Deterministic Stochastic

Perfect information(fully observable)

Imperfect information(partially observable)

Chess, checkers, go Backgammon, monopoly

Battleship Scrabble, poker, bridge

Alternating two-player zero-sum games

• Players take turns• Each game outcome or terminal state has a

utility for each player (e.g., 1 for win, 0 for loss)• The sum of both players’ utilities is a constant

Games vs. single-agent search

• We don’t know how the opponent will act– The solution is not a fixed sequence of actions from start

state to goal state, but a strategy or policy (a mapping from state to best move in that state)

• Efficiency is critical to playing well– The time to make a move is limited– The branching factor, search depth, and number of

terminal configurations are huge• In chess, branching factor ≈ 35 and depth ≈ 100, giving a search

tree of 10154 nodes– Number of atoms in the observable universe ≈ 1080

– This rules out searching all the way to the end of the game

Search problem components• Initial state• Actions• Transition model

– What state results fromperforming a given action in a given state?

• Goal state• Path cost

– Assume that it is a sum of nonnegative step costs

• The optimal solution is the sequence of actions that gives the lowest path cost for reaching the goal

Initialstate

Goal state

Deterministic Games

Many possible formulations, e.g.:• States: S (start at initial state)• Players: P={1…N} (usually take turns)• Actions: A (may depend on player/state)• Transition Model: SxA -> S• Terminal Test: S->{t,f}• Terminal Utilities: SxP -> R

Solution for a player is a policy: S->A

Deterministic Single-Player

Deterministic, single player, perfect information (e.g. 8-puzzle, rubik’s cube):• Know the rules, know what actions do,

know when you win

… it’s just search!

Slight reinterpretation:• Each node stores a value: the best

outcome it can reach• This is the maximal value of its children • Note we don’t have path sums as

before (utilities at end)

After search, pick moves that lead to best outcome

Game tree• A game of tic-tac-toe between two players, “MAX” (X) and “MIN” (O)

A more abstract game tree

Terminal utilities (for MAX)

3 2 2

3

A two-ply game

A more abstract game tree

• Minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming perfect play on both sides

• Minimax strategy: Choose the move that gives the best worst-case payoff

3 2 2

3

Computing the minimax value of a node

• Minimax(node) = Utility(node) if node is terminal maxaction Minimax(Succ(node, action)) if player = MAX

minaction Minimax(Succ(node, action)) if player = MIN

3 2 2

3

Optimality of minimax

• The minimax strategy is optimal against an optimal opponent

• What if your opponent is suboptimal?– Your utility can only be higher than if

you were playing an optimal opponent!– A different strategy may work better for

a sub-optimal opponent, but it will necessarily be worse against an optimal opponent

11

Example from D. Klein and P. Abbeel

More general games

• More than two players, non-zero-sum• Utilities are now tuples• Each player maximizes their own utility at their node• Utilities get propagated (backed up) from children to parents

4,3,2 7,4,1

4,3,2

1,5,2 7,7,1

1,5,2

4,3,2

10 9

x

Alpha-beta pruning• It is possible to compute the exact minimax decision

without expanding every node in the game tree



3

3



3

3

2



3

3

2 14



3

3

2 5



3

3

2 2

General alpha-beta pruning• Consider a node n in the tree

---

• If player has a better choice at:– Parent node of n– Or any choice point further up

• Then n will never be reached in play.

• Hence, when that much is known about n, it can be pruned.

Alpha-beta Algorithm• Depth first search

– only considers nodes along a single path from root at any time

a = highest-value choice found at any choice point of path for MAX

(initially, a = −infinity)

b = lowest-value choice found at any choice point of path for MIN

(initially, = +infinity)

• Pass current values of a and b down to child nodes during search.• Update values of a and b during search as:

– MAX updates at MAX nodes– MIN updates at MIN nodes

• Prune remaining branches at a node when a ≥ b

Alpha-Beta Example Revisited

, , initial valuesDo DF-search until first leaf

=− =+

=− =+

, , passed to kids

Alpha-Beta Example (continued)

MIN updates , based on kids

=− =+

=− =3


=− =3

MIN updates , based on kids.No change.

=− =+


MAX updates , based on kids.=3 =+

3 is returnedas node value.


=3 =+

=3 =+

, , passed to kids


=3 =+

=3 =2

MIN updates ,based on kids.


=3 =2

≥ ,so prune.

=3 =+


2 is returnedas node value.

MAX updates , based on kids.No change. =3

=+


,=3 =+

=3 =+

, , passed to kids


,

=3 =14

=3 =+



,

=3 =5

=3 =+



=3 =+

2 =3 =2



=3 =+ 2 is returned

as node value.

2


2

Alpha-beta pruning

• Pruning does not affect final result• Amount of pruning depends on move ordering

Example

3 4 1 2 7 8 5 6

-which nodes can be pruned?

Answer to Example

3 4 1 2 7 8 5 6


Answer: NONE! Because the most favorable nodes for both are explored last (i.e., in the diagram, are on the right-hand side).

Max

Min

Max

Second Example(the exact mirror image of the first example)

6 5 8 7 2 1 3 4


Answer to Second Example(the exact mirror image of the first example)

6 5 8 7 2 1 3 4


Min

Max

Max

Answer: LOTS! Because the most favorable nodes for both are explored first (i.e., in the diagram, are on the left-hand side).

Alpha-beta pruning

• Pruning does not affect final result• Amount of pruning depends on move ordering

– Should start with the “best” moves (highest-value for MAX or lowest-value for MIN)

– For chess, can try captures first, then threats, then forward moves, then backward moves

– Can also try to remember “killer moves” from other branches of the tree

• With perfect ordering, the time to find the best move is reduced to O(bm/2) from O(bm)– Depth of search you can afford is effectively doubled

games tamara berg cs 560 artificial intelligence many slides throughout the course adapted from...

Documents

game state

terminal state

search depth

search tree

game match

study games

observable universe

hugein chess