upper confidence trees for game ai chahine koleejan

Upper Confidence Trees for Game AI

Chahine Koleejan

Background on Game AI

• For many years, computer chess was considered an ideal sandbox for testing AI algorithms

• Simple rules and a clear benchmark: performance against human players

• The dominance of alpha-beta search programs over human players changed this

The Game of Go

• Researchers moved on to Go as their new challenge

• The game of Go is much harder to crack:
  1. Massive search space
     – 19x19 board -> up to 361 possible moves per turn
     – More than 10^170 possible states
  2. The game itself is very complex
     – Hard to find good heuristics

Example of a Game of Go

Honinbo Shusaku (Black) vs Gennan Inseki (White), 1846

The Multi-arm Bandit Setting

• A hypothetical probability setting

• A gambler stands at a row of k "bandits" (slot machines)

• When a bandit is pulled, the gambler receives some amount of money

• Each bandit pays out according to its own probability distribution, which the gambler does not know

• The gambler must decide which bandits to pull to maximise his total reward
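To make the setting concrete, here is a minimal Python sketch of a row of bandits, assuming Bernoulli arms: each arm pays out 1 or 0 with its own fixed, hidden probability. The class name `BernoulliBandit` and the payout probabilities are illustrative choices, not part of the original setting.

```python
import random


class BernoulliBandit:
    """A row of k hypothetical 'bandits' (slot machines).

    Arm j pays out 1 with probability probs[j] and 0 otherwise.
    The gambler never sees probs; he only observes the payouts.
    """

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    @property
    def k(self):
        return len(self.probs)

    def pull(self, j):
        """Pull arm j and return the reward (1 or 0)."""
        return 1 if self.rng.random() < self.probs[j] else 0


# Usage: three arms with hidden payout probabilities 0.2, 0.5 and 0.8.
bandit = BernoulliBandit([0.2, 0.5, 0.8], seed=1)
rewards = [bandit.pull(2) for _ in range(10)]
print(rewards)
```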

Exploitation and Exploration

• We need to balance the exploitation of the action currently believed to be optimal with the exploration of other actions that may be better in the long run

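• Upper Confidence Bound (UCB1):
  – At each step, pull the arm j that maximises

$$\text{UCB1}_j = \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}$$

  – Here $\bar{x}_j$ is the average reward obtained from arm j, $n_j$ is the number of times arm j has been pulled, and $n$ is the total number of pulls so far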
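A minimal Python sketch of UCB1, assuming Bernoulli arms with rewards in [0, 1] (the range the bound is designed for). The helper names `ucb1_select` and `run_ucb1` and the arm probabilities are hypothetical. Each arm is pulled once before the bound is used, because the bonus term is undefined when n_j = 0.

```python
import math
import random


def ucb1_select(counts, means, n):
    """Return the arm j that maximises means[j] + sqrt(2 ln n / counts[j]).

    counts[j] is n_j (times arm j was pulled), means[j] is its average
    reward x̄_j, and n is the total number of pulls so far.
    Arms that have never been pulled are tried first.
    """
    for j, n_j in enumerate(counts):
        if n_j == 0:
            return j
    return max(
        range(len(counts)),
        key=lambda j: means[j] + math.sqrt(2 * math.log(n) / counts[j]),
    )


def run_ucb1(probs, steps, seed=0):
    """Play UCB1 for `steps` pulls on hypothetical Bernoulli arms `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(steps):
        j = ucb1_select(counts, means, sum(counts))
        reward = 1 if rng.random() < probs[j] else 0
        counts[j] += 1
        # Incremental update of the running average reward of arm j.
        means[j] += (reward - means[j]) / counts[j]
    return counts, means


counts, means = run_ucb1([0.2, 0.5, 0.8], steps=5000)
print(counts)  # the last (best) arm should receive most of the pulls
```

Over time most pulls go to the best arm, while the exploration bonus, which shrinks as n_j grows, keeps occasionally retrying the others.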

Why do we care?

• Choosing a move in a sequential decision-making game is basically a multi-arm bandit problem!

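• …But worse: a move's reward is only revealed at the end of the game, and it depends on all the moves that follow.

• …But it's close enough that we can use the same math.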

Monte Carlo Tree Search (MCTS)

• A tree search method which has revolutionised computer Go

• Works by simulating thousands of random games

• Needs no prior knowledge of the game beyond its rules (legal moves and final outcomes)

• Does not need heuristics or evaluation functions, just observes the outcome of the simulation
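The core idea of scoring moves purely by the outcomes of random games can be illustrated without a tree. The sketch below is plain (flat) Monte Carlo evaluation rather than MCTS itself: each legal move is scored by the fraction of random continuations it goes on to win. It assumes a hypothetical toy game (players alternately take 1 to 3 stones from a pile; whoever takes the last stone wins); the game and all function names are illustrative.

```python
import random


def legal_moves(stones):
    """Hypothetical toy game: take 1-3 stones; whoever takes the last stone wins."""
    return list(range(1, min(3, stones) + 1))


def random_playout(stones, rng):
    """Play uniformly random legal moves until the game ends.

    Returns 1 if the player to move at `stones` wins, else 0.
    """
    player = 0  # 0 = player to move at the start of the playout
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player


def flat_monte_carlo(stones, playouts_per_move=1000, seed=0):
    """Score each move by the fraction of random games it goes on to win."""
    rng = random.Random(seed)
    scores = {}
    for move in legal_moves(stones):
        left = stones - move
        if left == 0:
            scores[move] = 1.0  # taking the last stone wins outright
            continue
        # After our move the opponent is to move, so we win when they lose.
        wins = sum(1 - random_playout(left, rng) for _ in range(playouts_per_move))
        scores[move] = wins / playouts_per_move
    return max(scores, key=scores.get), scores


best, scores = flat_monte_carlo(5)
print(best, scores)
```

MCTS improves on this by growing a tree and spending more simulations on promising moves instead of splitting them evenly.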

UCT Algorithm

• We build a search tree in which the children of each node are treated as the arms of a bandit, and each child is scored with the UCB1 bound

• Steps of the algorithm:
  1. Selection
  2. Expansion
  3. Simulation
  4. Backpropagation
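Putting the four steps together, a compact UCT sketch in Python might look like the following. It assumes a hypothetical toy game (take 1 to 3 stones, last stone wins), rewards of 1 for a win and 0 for a loss, and picks the most visited root move at the end; these are common choices, not necessarily the ones used in the talk.

```python
import math
import random


def legal_moves(stones):
    """Hypothetical toy game: take 1-3 stones; whoever takes the last stone wins."""
    return list(range(1, min(3, stones) + 1))


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones              # stones left in this position
        self.parent = parent
        self.move = move                  # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                   # wins for the player who made `move`


def ucb1(parent, child):
    # x̄_j + sqrt(2 ln n / n_j), with n = parent visits and n_j = child visits
    return child.wins / child.visits + math.sqrt(
        2 * math.log(parent.visits) / child.visits)


def random_playout(stones, rng):
    """Default policy: random legal play. Returns 1 if the player to move wins."""
    player = 0
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player


def uct_search(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            node = max(node.children, key=lambda c: ucb1(node, c))
        # 2. Expansion: add one unvisited child to the tree.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: result (1 = win, 0 = loss) for the player who moved into `node`.
        if node.stones == 0:
            result = 1  # that player took the last stone
        else:
            result = 1 - random_playout(node.stones, rng)
        # 4. Backpropagation: players alternate, so flip the result at each level.
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1 - result
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda c: c.visits).move


print(uct_search(5))
```

With 5 stones the search should favour taking 1, which leaves the opponent 4 stones, a losing position under perfect play.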

Selection and Expansion

• Starting at the root node, recursively choose the child with the highest UCB1 value until we reach an expandable node (or a terminal one)

• A node is expandable if it is non-terminal and has unvisited children

• One of its unvisited children is then added to our tree
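A sketch of just the selection and expansion steps, using the same hypothetical toy game (take 1 to 3 stones, last stone wins). It assumes every node already in the tree has been visited at least once, which backpropagation guarantees in a full UCT loop, since the UCB1 bonus divides by the child's visit count. Names such as `select_and_expand` are illustrative.

```python
import math
import random


class Node:
    """Search-tree node for a hypothetical toy game: take 1-3 stones, last stone wins."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = list(range(1, min(3, stones) + 1))  # moves not yet in the tree
        self.visits = 0
        self.wins = 0.0

    def is_terminal(self):
        return self.stones == 0

    def is_expandable(self):
        # Non-terminal and still has unvisited children.
        return not self.is_terminal() and len(self.untried) > 0


def ucb1(parent, child):
    return child.wins / child.visits + math.sqrt(
        2 * math.log(parent.visits) / child.visits)


def select_and_expand(root, rng):
    node = root
    # Selection: follow the highest-UCB1 child until we reach an
    # expandable node (or a terminal one, which cannot be expanded).
    while not node.is_terminal() and not node.is_expandable():
        parent = node
        node = max(parent.children, key=lambda c: ucb1(parent, c))
    # Expansion: add exactly one unvisited child to the tree and return it.
    if node.is_expandable():
        move = node.untried.pop(rng.randrange(len(node.untried)))
        child = Node(node.stones - move, parent=node, move=move)
        node.children.append(child)
        node = child
    return node


rng = random.Random(0)
root = Node(5)
leaf = select_and_expand(root, rng)
print(leaf.move, len(root.children), root.untried)
```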

Simulation

• A simulation is run from the new node to the end of the game according to a chosen default policy

• At the most basic level the default policy is just random legal play
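A random default policy for a hypothetical toy game (take 1 to 3 stones, last stone wins) could look like this. It returns 1 if the player to move at the start of the simulation wins and 0 otherwise; the game and function names are illustrative.

```python
import random


def legal_moves(stones):
    """Hypothetical toy game: take 1-3 stones; whoever takes the last stone wins."""
    return list(range(1, min(3, stones) + 1))


def default_policy(stones, rng):
    """Play uniformly random legal moves until the game ends.

    Returns 1 if the player to move at `stones` wins, else 0.
    With 0 stones the previous player already took the last stone,
    so the player to move has lost.
    """
    player = 0       # 0 = player to move when the simulation starts
    last_mover = 1   # who made the most recent move
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        last_mover = player
        player = 1 - player
    return 1 if last_mover == 0 else 0


rng = random.Random(42)
runs = 10_000
estimate = sum(default_policy(4, rng) for _ in range(runs)) / runs
print(round(estimate, 3))  # should be close to 1/3 for this toy game
```

Averaged over many runs, the result estimates how often the player to move wins under purely random play (exactly 1/3 from 4 stones in this toy game), which is the only signal the search gets from a newly added node.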

Backpropagation

• The simulation result is "backed up" (i.e. backpropagated) through the selected nodes, all the way to the root, updating each node's visit count and value

• For example, +1 if we won and -1 if we lost
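A minimal sketch of backpropagation using the +1/-1 convention above, assuming a two-player game with alternating turns: each node stores results from the point of view of the player who moved into it, so the sign flips at every level on the way up. The `Node` fields are illustrative.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        # Sum of results from the view of the player who moved into this node.
        self.total = 0.0


def backpropagate(node, result):
    """Back up a simulation result from `node` to the root.

    `result` is +1 if the player who moved into `node` won the simulation
    and -1 if they lost; turns alternate, so the sign flips at each level.
    """
    while node is not None:
        node.visits += 1
        node.total += result
        result = -result
        node = node.parent


# Usage: a path root -> a -> b, where b is the node the simulation started from.
root = Node()
a = Node(parent=root)
b = Node(parent=a)
backpropagate(b, +1)
print(b.total, a.total, root.total)  # 1.0 -1.0 1.0
```

A node's value is then `total / visits`, which lies between -1 and 1.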

• If you're interested in Go, talk to me!

• It’s really cool!

Othello Demo
