Upper Confidence Trees for Game AI Chahine Koleejan

Post on 16-Jan-2016

Upper Confidence Trees for Game AI

Chahine Koleejan

Background on Game AI

• For many years, computer chess was considered an ideal sandbox for testing AI algorithms

• Simple rules and clear benchmarks of performance against human intelligence

• The dominance of alpha-beta search programs over human players changed this

The Game of Go

• Researchers moved on to Go as their new challenge

• The game of Go is much harder to crack:
1. Massive search space
– 19x19 board -> up to 361 possible moves per turn
– More than 10^170 possible states
2. Game itself is very complex
– Hard to find good heuristics
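
To get a feel for these numbers, here is a rough back-of-the-envelope check in Python. It counts every way of marking each point as empty, black or white, which includes illegal positions, so it is only a crude upper bound on the number of board states and not the exact figure behind the slide.

```python
# Crude upper bound on the number of Go board configurations.
# Assumption: every point is independently empty, black or white,
# so illegal positions are counted too.
points = 19 * 19                # intersections on a 19x19 board
upper_bound = 3 ** points       # Python integers have arbitrary precision

print(points)                   # 361
print(len(str(upper_bound)))    # 173 decimal digits, i.e. roughly 1.7 x 10^172
```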

Example of a Game of Go

Honinbo Shusaku (Black) vs Gennan Inseki (White), 1846

The Multi-arm Bandit Setting

• Hypothetical probability setting

• Gambler is at a row of k "bandits" (slot machines)

• When a bandit is pulled the gambler gets some amount of money

• Each bandit has a different probability distribution

• The gambler must decide which bandits to pull to maximise his reward

Exploitation and Exploration

• We need to balance the exploitation of the action currently believed to be optimal with the exploration of other actions that may be better in the long run

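• Upper Confidence Bound:
– We want to maximise this value for an arm j:

$$\mathrm{UCB1}_j = \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}$$

– Here $\bar{x}_j$ is the average reward observed from arm j, $n_j$ is the number of times arm j has been played, and $n$ is the total number of plays so far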
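
As a minimal sketch of how the UCB1 value above could drive a gambler's choices, the Python below plays a row of bandits that pay 1 with a fixed probability and 0 otherwise. The Bernoulli payouts, the function names `ucb1_scores` and `run_bandit`, and the rule of trying every unplayed arm first are assumptions made for illustration; they are not taken from the slides.

```python
import math
import random

def ucb1_scores(counts, totals):
    """UCB1 value of each arm: average reward plus an exploration bonus."""
    n = sum(counts)  # total number of pulls so far
    return [
        float("inf") if n_j == 0  # assumption: unplayed arms are tried first
        else totals[j] / n_j + math.sqrt(2 * math.log(n) / n_j)
        for j, n_j in enumerate(counts)
    ]

def run_bandit(win_probs, pulls, seed=0):
    """Pull the arm with the highest UCB1 value `pulls` times; return pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(win_probs)    # n_j for each arm
    totals = [0.0] * len(win_probs)  # summed reward for each arm
    for _ in range(pulls):
        scores = ucb1_scores(counts, totals)
        arm = scores.index(max(scores))
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

# Hypothetical row of three bandits; the third pays out most often.
counts = run_bandit([0.2, 0.5, 0.8], pulls=2000)
print(counts)
```

The first term rewards arms that have paid well so far (exploitation), while the square-root term grows for arms that have been pulled rarely (exploration), so poorly sampled arms keep getting an occasional second look.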

Why do we care?

• Sequential decision-making games are basically a multi-arm bandit problem!

• …But worse.

• …But it’s close enough so we can use the math.

Monte Carlo Tree Search (MCTS)

• A tree search method which has revolutionised computer Go

• Works by simulating thousands of random games

• Does not need any prior knowledge of the game

• Does not need heuristics or evaluation functions, just observes the outcome of the simulation

UCT Algorithm

• We have a tree where each node has a value given by the UCB1 bound

• Steps of the algorithm:
1. Selection
2. Expansion
3. Simulation
4. Backpropagation
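
The four steps repeat in a loop, once per simulated game. The skeleton below is one possible way to express that control flow in Python; the function `uct_search` and its four callback parameters are hypothetical names chosen for this sketch, and the stub callbacks only log the order in which the steps run.

```python
def uct_search(root, iterations, select, expand, simulate, backpropagate):
    """Run the four UCT steps `iterations` times, starting from `root` each time."""
    for _ in range(iterations):
        leaf = select(root)            # 1. Selection: walk down the tree
        child = expand(leaf)           # 2. Expansion: add one new node
        result = simulate(child)       # 3. Simulation: play the game out
        backpropagate(child, result)   # 4. Backpropagation: update the path
    return root

# Stub callbacks (illustration only) that record the order of the steps.
log = []

def select(node):
    log.append("selection")
    return node

def expand(node):
    log.append("expansion")
    return node

def simulate(node):
    log.append("simulation")
    return 1

def backpropagate(node, result):
    log.append("backpropagation")

uct_search("root", 2, select, expand, simulate, backpropagate)
print(log)
```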

Selection and Expansion

• Starting at the root node, recursively choose the child with the highest value until we reach an expandable node

• A node is expandable if it is non-terminal and has unvisited children

• One of its unvisited children is then added to our tree
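
A minimal Python sketch of selection and expansion follows. It assumes a toy game that is not in the slides: a single pile of stones from which each player removes 1 to 3 stones per turn. The `Node` class, its fields and the function `select_and_expand` are likewise hypothetical. The sketch also assumes every child already in the tree has been visited at least once, which holds when each expansion is followed by a simulation and backup.

```python
import math

class Node:
    """Tree node for the toy stone-taking game (an assumption of this sketch)."""
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]  # unvisited children
        self.visits = 0
        self.total = 0.0  # summed simulation results

    def is_terminal(self):
        return self.stones == 0

    def is_expandable(self):
        return not self.is_terminal() and bool(self.untried)

def ucb1(child, parent_visits):
    """UCB1 value of a child; requires child.visits >= 1."""
    mean = child.total / child.visits
    return mean + math.sqrt(2 * math.log(parent_visits) / child.visits)

def select_and_expand(root):
    """Descend by highest UCB1 until an expandable node, then add one child."""
    node = root
    while not node.is_terminal():
        if node.is_expandable():
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            return child
        node = max(node.children, key=lambda c: ucb1(c, node.visits))
    return node  # reached a finished game; nothing left to expand
```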

Simulation

• A simulation is run from the new node to the end of the game according to our defined default policy

• At the most basic level the default policy is just random legal play
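
As an illustration of a purely random default policy, the Python sketch below plays out a toy game that is not in the slides: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The function name `random_playout` and the +1/-1 scoring from the starting player's point of view are assumptions of this sketch.

```python
import random

def random_playout(stones, rng):
    """Play uniformly random legal moves until the pile is empty.

    Returns +1 if the player to move at the start takes the last stone,
    and -1 otherwise.
    """
    player = 0  # 0 is the player to move at the start of the playout
    while stones > 0:
        move = rng.randint(1, min(3, stones))  # any legal move, chosen at random
        stones -= move
        if stones == 0:
            return 1 if player == 0 else -1
        player = 1 - player
    return -1  # empty pile at the start: the opponent already took the last stone

# Averaging many playouts gives a rough value estimate for a position.
rng = random.Random(0)
results = [random_playout(10, rng) for _ in range(1000)]
estimate = sum(results) / len(results)
print(estimate)
```

No evaluation function is involved: the only game knowledge used is which moves are legal and who won at the end.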

Backpropagation

• The simulation result is “backed up” (i.e. backpropagated) up the tree through the selected nodes to update their values

• For example, +1 if we won and -1 if we lost
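
The backup step can be sketched in a few lines of Python. The `Node` class and `backpropagate` function are hypothetical names for this illustration. Flipping the sign at each level is an assumption for two-player, zero-sum games where the players alternate moves; the slides only state the +1/-1 scoring.

```python
class Node:
    """Minimal node holding only what the backup step needs."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total = 0.0  # summed simulation results seen through this node

def backpropagate(node, result):
    """Update every node on the path from `node` back up to the root."""
    while node is not None:
        node.visits += 1
        node.total += result
        result = -result  # assumption: a win for one player is a loss for the other
        node = node.parent

# A three-node path: root -> child -> leaf, with one simulated win at the leaf.
root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, 1)
print(leaf.total, child.total, root.total)  # 1.0 -1.0 1.0
```

The average `total / visits` of a node is the mean reward that the UCB1 value uses during the next selection step.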

Example

References

• Cameron B. Browne et al., "A Survey of Monte Carlo Tree Search Methods", IEEE Transactions on Computational Intelligence and AI in Games, 2012

• Sylvain Gelly and David Silver, "Monte-Carlo tree search and rapid action value estimation in computer Go", Artificial Intelligence 175, 2011

• If you’re interested in Go, talk to me!

• It’s really cool!

Othello Demo