cs540 introduction to artificial intelligence lecture...

Post on 13-Oct-2020

30 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Adversarial Search Alpha Beta Pruning Heuristic

CS540 Introduction to Artificial IntelligenceLecture 19

Young WuBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles

Dyer

July 23, 2020

Adversarial Search Alpha Beta Pruning Heuristic

Lion Game ExampleMotivation

Adversarial Search Alpha Beta Pruning Heuristic

Nim Game ExampleMotivation

Adversarial Search Alpha Beta Pruning Heuristic

Game TreeMotivation

The initial state is the beginning of the game.

There are no goal states, but there are multiple terminalstates in which the game ends.

Each successor of a state represents a feasible action (or amove) in the game.

The search problem is to find the terminal state with thelowest cost (or usually the highest reward).

Adversarial Search Alpha Beta Pruning Heuristic

Adversarial SearchMotivation

The main difference between finding solutions of games andstandard search problems or local search problems is that partof the search is performed by an opponent adversarially.

Usually, the opponent wants to maximize the cost or minimizethe reward from the search. This type of search problems iscalled adversarial search.

In game theory, the solution of a game is called anequilibrium. It is a path in which both players do not want tochange actions.

Adversarial Search Alpha Beta Pruning Heuristic

Backward InductionMotivation

Games are usually solved backward starting from the terminalstates.

Each player chooses the best action (successor) given the(already solved) optimal actions of all players in the subtrees(called subgames).

Adversarial Search Alpha Beta Pruning Heuristic

Zero-Sum GamesMotivation

If the sum of the reward or cost over all players at eachterminal state is 0, the game is called a zero-sum game.

Usually, for games with one winner: the reward for winningand the cost of losing are both 1. If the game ends with a tie,both players get 0.

Adversarial Search Alpha Beta Pruning Heuristic

Tic Tac Toe ExampleMotivation

Adversarial Search Alpha Beta Pruning Heuristic

Nim Game ExampleMotivation

Ten objects. Pick 1 or 2 each time. Pick the last one to win.

A: Pick 1.

B: Pick 2.

C, D, E: Don’t choose.

Adversarial Search Alpha Beta Pruning Heuristic

Minimax AlgorithmDescription

Use DFS on the game tree.

Adversarial Search Alpha Beta Pruning Heuristic

Minimax AlgorithmAlgorithm

Input: a game tree pV ,E , cq, and the current state s.

Output: the value of the game at s.

If s is a terminal state, return c psq.

If the player is MAX, return the maximum value over allsuccessors.

α psq � maxs 1Ps 1psq

β�s 1�

If the player is MIN, return the minimum value over allsuccessors.

β psq � mins 1Ps 1psq

α�s 1�

Adversarial Search Alpha Beta Pruning Heuristic

BacktrackingDiscussion

The optimal actions (solution paths) can be found bybacktracking from all terminal states as in DFS.

s� psq � arg maxs 1Ps 1psq

β�s 1�

for MAX

s� psq � arg mins 1Ps 1psq

α�s 1�

for MIN

Adversarial Search Alpha Beta Pruning Heuristic

Minimax PerformanceDiscussion

The time and space complexity is the same as DFS. Note thatD � d is the maximum depth of the terminal states.

T � 1 � b � b2 � ...� bd

S � pb � 1q � d

Adversarial Search Alpha Beta Pruning Heuristic

Non-deterministic GameDiscussion

For non-deterministic games in which chance can make amove (dice roll or coin flip), use expected reward or costinstead.

The algorithm is also called expectiminimax.

Adversarial Search Alpha Beta Pruning Heuristic

PruningMotivation

Time complexity is a problem because the computer usuallyhas a limited amount of time to ”think” and make a move.

It is possible to reduce the time complexity by removing thebranches that will not lead the current player to win. It iscalled the Alpha-Beta pruning.

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta PruningDescription

During DFS, keep track of both α and β for each vertex.

Prune the subtree with α ¥ β.

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta Pruning Simple ExampleDefinition

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta Pruning Algorithm, Part IAlgorithm

Input: a game tree pV ,E , cq, and the current state s.

Output: the value of the game at s.

If s is a terminal state, return c psq.

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta Pruning Algorithm, Part IIAlgorithm

If the player is MAX, return the maximum value over allsuccessors.

α psq � maxs 1Ps 1psq

β�s 1�

β psq � β p parent psqq

Stop and return β if α ¥ β.

If the player is MIN, return the minimum value over allsuccessors.

β psq � mins 1Ps 1psq

α�s 1�

α psq � α p parent psqq

Stop and return α if α ¥ β.

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta PerformanceDiscussion

In the best case, the best action of each player is the leftmostchild.

In the worst case, Alpha Beta is the same as minimax.

Adversarial Search Alpha Beta Pruning Heuristic

Static Evaluation FunctionDefinition

A static board evaluation function is a heuristics to estimatethe value of non-terminal states.

It should reflect the player’s chances of winning from thatvertex.

It should be easy to compute from the board configuration.

Adversarial Search Alpha Beta Pruning Heuristic

Evaluation Function PropertiesDefinition

If the SBE for one player is x , then the SBE for the otherplayer should be �x .

The SBE should agree with the cost or reward at terminalvertices.

Adversarial Search Alpha Beta Pruning Heuristic

Linear Evaluation Function ExampleDefinition

For Chess, an example of an evaluation function can be alinear combination of the following variables.

1 Material.

2 Mobility.

3 King safety.

4 Center control.

These are called the features of the board.

Adversarial Search Alpha Beta Pruning Heuristic

Iterative Deepening SearchDiscussion

IDS could be used with SBE.

In iteration d , the depth is limited to d , and the SBE of thenon-terminal vertices are used as their cost or reward.

Adversarial Search Alpha Beta Pruning Heuristic

Non Linear Evaluation FunctionDiscussion

The SBE can be estimated given the features using a neuralnetwork.

The features are constructed using domain knowledge, or apossibly a convolutional neural network.

The training data are obtained from games betweenprofessional players.

Adversarial Search Alpha Beta Pruning Heuristic

Monte Carlo Tree SearchDiscussion

Simulate random games by selecting random moves for bothplayers.

Exploitation by keeping track of average win rate for eachsuccessor from previous searches and picking the successorsthat lead to more wins.

Exploration by allowing random choices of unvisitedsuccessors.

Adversarial Search Alpha Beta Pruning Heuristic

Monte Carlo Tree Search DiagramDiscussion

Adversarial Search Alpha Beta Pruning Heuristic

Upper Confidence BoundDiscussion

Combine exploitation and exploration by picking successorsusing upper confidence bound for tree.

ws

ns� c

clog t

ns

ws is the number of wins after successor s, and ns the numberof simulations after successor s, and t is the total number ofsimulations.

Similar to the UCB algorithm for MAB.

Adversarial Search Alpha Beta Pruning Heuristic

Alpha GO ExampleDiscussion

MCTS with ¡ 105 play-outs.

Deep neural network to compute SBE.

top related