cs540 introduction to artificial intelligence lecture...

Adversarial Search Alpha Beta Pruning Heuristic

CS540 Introduction to Artificial IntelligenceLecture 19

Young WuBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles

Dyer

July 23, 2020


Lion Game ExampleMotivation


Nim Game ExampleMotivation


Game TreeMotivation

The initial state is the beginning of the game.

There are no goal states, but there are multiple terminalstates in which the game ends.

Each successor of a state represents a feasible action (or amove) in the game.

The search problem is to find the terminal state with thelowest cost (or usually the highest reward).


Adversarial SearchMotivation

The main difference between finding solutions of games andstandard search problems or local search problems is that partof the search is performed by an opponent adversarially.

Usually, the opponent wants to maximize the cost or minimizethe reward from the search. This type of search problems iscalled adversarial search.

In game theory, the solution of a game is called anequilibrium. It is a path in which both players do not want tochange actions.


Backward InductionMotivation

Games are usually solved backward starting from the terminalstates.

Each player chooses the best action (successor) given the(already solved) optimal actions of all players in the subtrees(called subgames).


Zero-Sum GamesMotivation

If the sum of the reward or cost over all players at eachterminal state is 0, the game is called a zero-sum game.

Usually, for games with one winner: the reward for winningand the cost of losing are both 1. If the game ends with a tie,both players get 0.


Tic Tac Toe ExampleMotivation


Nim Game ExampleMotivation

Ten objects. Pick 1 or 2 each time. Pick the last one to win.

A: Pick 1.

B: Pick 2.

C, D, E: Don’t choose.


Minimax AlgorithmDescription

Use DFS on the game tree.


Minimax AlgorithmAlgorithm

Input: a game tree pV ,E , cq, and the current state s.

Output: the value of the game at s.

If s is a terminal state, return c psq.

If the player is MAX, return the maximum value over allsuccessors.

α psq � maxs 1Ps 1psq

β�s 1�

If the player is MIN, return the minimum value over allsuccessors.

β psq � mins 1Ps 1psq

α�s 1�


BacktrackingDiscussion

The optimal actions (solution paths) can be found bybacktracking from all terminal states as in DFS.

s� psq � arg maxs 1Ps 1psq

β�s 1�

for MAX

s� psq � arg mins 1Ps 1psq

α�s 1�

for MIN


Minimax PerformanceDiscussion

The time and space complexity is the same as DFS. Note thatD � d is the maximum depth of the terminal states.

T � 1 � b � b2 � ...� bd

S � pb � 1q � d


Non-deterministic GameDiscussion

For non-deterministic games in which chance can make amove (dice roll or coin flip), use expected reward or costinstead.

The algorithm is also called expectiminimax.


PruningMotivation

Time complexity is a problem because the computer usuallyhas a limited amount of time to ”think” and make a move.

It is possible to reduce the time complexity by removing thebranches that will not lead the current player to win. It iscalled the Alpha-Beta pruning.


Alpha Beta PruningDescription

During DFS, keep track of both α and β for each vertex.

Prune the subtree with α ¥ β.


Alpha Beta Pruning Simple ExampleDefinition


Alpha Beta Pruning Algorithm, Part IAlgorithm

Input: a game tree pV ,E , cq, and the current state s.

Output: the value of the game at s.

If s is a terminal state, return c psq.


Alpha Beta Pruning Algorithm, Part IIAlgorithm

If the player is MAX, return the maximum value over allsuccessors.

α psq � maxs 1Ps 1psq

β�s 1�

β psq � β p parent psqq

Stop and return β if α ¥ β.

If the player is MIN, return the minimum value over allsuccessors.

β psq � mins 1Ps 1psq

α�s 1�

α psq � α p parent psqq

Stop and return α if α ¥ β.


Alpha Beta PerformanceDiscussion

In the best case, the best action of each player is the leftmostchild.

In the worst case, Alpha Beta is the same as minimax.


Static Evaluation FunctionDefinition

A static board evaluation function is a heuristics to estimatethe value of non-terminal states.

It should reflect the player’s chances of winning from thatvertex.

It should be easy to compute from the board configuration.


Evaluation Function PropertiesDefinition

If the SBE for one player is x , then the SBE for the otherplayer should be �x .

The SBE should agree with the cost or reward at terminalvertices.


Linear Evaluation Function ExampleDefinition

For Chess, an example of an evaluation function can be alinear combination of the following variables.

1 Material.

2 Mobility.

3 King safety.

4 Center control.

These are called the features of the board.


Iterative Deepening SearchDiscussion

IDS could be used with SBE.

In iteration d , the depth is limited to d , and the SBE of thenon-terminal vertices are used as their cost or reward.


Non Linear Evaluation FunctionDiscussion

The SBE can be estimated given the features using a neuralnetwork.

The features are constructed using domain knowledge, or apossibly a convolutional neural network.

The training data are obtained from games betweenprofessional players.


Monte Carlo Tree SearchDiscussion

Simulate random games by selecting random moves for bothplayers.

Exploitation by keeping track of average win rate for eachsuccessor from previous searches and picking the successorsthat lead to more wins.

Exploration by allowing random choices of unvisitedsuccessors.


Monte Carlo Tree Search DiagramDiscussion


Upper Confidence BoundDiscussion

Combine exploitation and exploration by picking successorsusing upper confidence bound for tree.

ws

ns� c

clog t

ns

ws is the number of wins after successor s, and ns the numberof simulations after successor s, and t is the total number ofsimulations.

Similar to the UCB algorithm for MAB.


Alpha GO ExampleDiscussion

MCTS with ¡ 105 play-outs.

Deep neural network to compute SBE.

cs540 introduction to artificial intelligence lecture...

Documents