cs540 introduction to artificial intelligence lecture...

29
Adversarial Search Alpha Beta Pruning Heuristic CS540 Introduction to Artificial Intelligence Lecture 19 Young Wu Based on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July 23, 2020

Upload: others

Post on 13-Oct-2020

27 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

CS540 Introduction to Artificial IntelligenceLecture 19

Young WuBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles

Dyer

July 23, 2020

Page 2: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Lion Game ExampleMotivation

Page 3: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Nim Game ExampleMotivation

Page 4: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Game TreeMotivation

The initial state is the beginning of the game.

There are no goal states, but there are multiple terminalstates in which the game ends.

Each successor of a state represents a feasible action (or amove) in the game.

The search problem is to find the terminal state with thelowest cost (or usually the highest reward).

Page 5: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Adversarial SearchMotivation

The main difference between finding solutions of games andstandard search problems or local search problems is that partof the search is performed by an opponent adversarially.

Usually, the opponent wants to maximize the cost or minimizethe reward from the search. This type of search problems iscalled adversarial search.

In game theory, the solution of a game is called anequilibrium. It is a path in which both players do not want tochange actions.

Page 6: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Backward InductionMotivation

Games are usually solved backward starting from the terminalstates.

Each player chooses the best action (successor) given the(already solved) optimal actions of all players in the subtrees(called subgames).

Page 7: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Zero-Sum GamesMotivation

If the sum of the reward or cost over all players at eachterminal state is 0, the game is called a zero-sum game.

Usually, for games with one winner: the reward for winningand the cost of losing are both 1. If the game ends with a tie,both players get 0.

Page 8: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Tic Tac Toe ExampleMotivation

Page 9: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Nim Game ExampleMotivation

Ten objects. Pick 1 or 2 each time. Pick the last one to win.

A: Pick 1.

B: Pick 2.

C, D, E: Don’t choose.

Page 10: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Minimax AlgorithmDescription

Use DFS on the game tree.

Page 11: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Minimax AlgorithmAlgorithm

Input: a game tree pV ,E , cq, and the current state s.

Output: the value of the game at s.

If s is a terminal state, return c psq.

If the player is MAX, return the maximum value over allsuccessors.

α psq � maxs 1Ps 1psq

β�s 1�

If the player is MIN, return the minimum value over allsuccessors.

β psq � mins 1Ps 1psq

α�s 1�

Page 12: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

BacktrackingDiscussion

The optimal actions (solution paths) can be found bybacktracking from all terminal states as in DFS.

s� psq � arg maxs 1Ps 1psq

β�s 1�

for MAX

s� psq � arg mins 1Ps 1psq

α�s 1�

for MIN

Page 13: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Minimax PerformanceDiscussion

The time and space complexity is the same as DFS. Note thatD � d is the maximum depth of the terminal states.

T � 1 � b � b2 � ...� bd

S � pb � 1q � d

Page 14: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Non-deterministic GameDiscussion

For non-deterministic games in which chance can make amove (dice roll or coin flip), use expected reward or costinstead.

The algorithm is also called expectiminimax.

Page 15: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

PruningMotivation

Time complexity is a problem because the computer usuallyhas a limited amount of time to ”think” and make a move.

It is possible to reduce the time complexity by removing thebranches that will not lead the current player to win. It iscalled the Alpha-Beta pruning.

Page 16: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta PruningDescription

During DFS, keep track of both α and β for each vertex.

Prune the subtree with α ¥ β.

Page 17: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta Pruning Simple ExampleDefinition

Page 18: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta Pruning Algorithm, Part IAlgorithm

Input: a game tree pV ,E , cq, and the current state s.

Output: the value of the game at s.

If s is a terminal state, return c psq.

Page 19: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta Pruning Algorithm, Part IIAlgorithm

If the player is MAX, return the maximum value over allsuccessors.

α psq � maxs 1Ps 1psq

β�s 1�

β psq � β p parent psqq

Stop and return β if α ¥ β.

If the player is MIN, return the minimum value over allsuccessors.

β psq � mins 1Ps 1psq

α�s 1�

α psq � α p parent psqq

Stop and return α if α ¥ β.

Page 20: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Alpha Beta PerformanceDiscussion

In the best case, the best action of each player is the leftmostchild.

In the worst case, Alpha Beta is the same as minimax.

Page 21: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Static Evaluation FunctionDefinition

A static board evaluation function is a heuristics to estimatethe value of non-terminal states.

It should reflect the player’s chances of winning from thatvertex.

It should be easy to compute from the board configuration.

Page 22: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Evaluation Function PropertiesDefinition

If the SBE for one player is x , then the SBE for the otherplayer should be �x .

The SBE should agree with the cost or reward at terminalvertices.

Page 23: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Linear Evaluation Function ExampleDefinition

For Chess, an example of an evaluation function can be alinear combination of the following variables.

1 Material.

2 Mobility.

3 King safety.

4 Center control.

These are called the features of the board.

Page 24: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Iterative Deepening SearchDiscussion

IDS could be used with SBE.

In iteration d , the depth is limited to d , and the SBE of thenon-terminal vertices are used as their cost or reward.

Page 25: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Non Linear Evaluation FunctionDiscussion

The SBE can be estimated given the features using a neuralnetwork.

The features are constructed using domain knowledge, or apossibly a convolutional neural network.

The training data are obtained from games betweenprofessional players.

Page 26: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Monte Carlo Tree SearchDiscussion

Simulate random games by selecting random moves for bothplayers.

Exploitation by keeping track of average win rate for eachsuccessor from previous searches and picking the successorsthat lead to more wins.

Exploration by allowing random choices of unvisitedsuccessors.

Page 27: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Monte Carlo Tree Search DiagramDiscussion

Page 28: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Upper Confidence BoundDiscussion

Combine exploitation and exploration by picking successorsusing upper confidence bound for tree.

ws

ns� c

clog t

ns

ws is the number of wins after successor s, and ns the numberof simulations after successor s, and t is the total number ofsimulations.

Similar to the UCB algorithm for MAB.

Page 29: CS540 Introduction to Artificial Intelligence Lecture 19pages.cs.wisc.edu/~yw/CS540/CS540_Lecture_19_P.pdfBased on lecture slides by Jerry Zhu, Yingyu Liang, and Charles Dyer July

Adversarial Search Alpha Beta Pruning Heuristic

Alpha GO ExampleDiscussion

MCTS with ¡ 105 play-outs.

Deep neural network to compute SBE.