MctsAi
Team FightingICE
March 27, 2016
http://www.ice.ci.ritsumei.ac.jp/~ftgaic/
Outline of MctsAi
A sample fighting game AI for the FightingICE platform that implements Upper Confidence Bounds applied to Trees (UCT) [1]
UCT is a typical Monte-Carlo Tree Search (MCTS) algorithm [2]
[1] L. Kocsis and C. Szepesvári, “Bandit based Monte-Carlo Planning”
[2] R. Coulom, “Efficient selectivity and backup operators in Monte-Carlo tree search”
UCT
Repeat Selection → Expansion → Playout → Backpropagation until the predefined maximum time or the maximum number of playouts is reached
Use the UCB1 value in Selection
Finally, select the action associated with the root's child node that has the maximum number of visits
[Figure: the four steps of one UCT iteration: Selection, Expansion, Playout, Backpropagation]
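Upper Confidence Bound (UCB1) [3]
$$\mathrm{UCB1}_i = \bar{X}_i + C\sqrt{\frac{2\ln N}{N_i}}$$
$\bar{X}_i$: average evaluation value of node i
$C$: balancing parameter (empirically set to 3 in the sample AI)
$N$: number of visits of the parent of node i
$N_i$: number of visits of node i
UCB1 favors nodes with a high average evaluation value as well as nodes that have been visited only a few times
[3] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem”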
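A minimal sketch of the UCB1 formula in Python. The function name `ucb1` is hypothetical; the constant C = 3 comes from the slides. Fed with the root children of the selection example (root visited 17 times), it reproduces the UCB1 values 4.42, 4.76, and 4.07 shown in the figure.

```python
import math

# Balancing parameter C; the slides report it was set empirically to 3.
UCB_C = 3.0


def ucb1(avg_eval: float, parent_visits: int, node_visits: int, c: float = UCB_C) -> float:
    """UCB1 value of node i: average evaluation + C * sqrt(2 ln N / N_i).

    Undefined for node_visits == 0, which is why unvisited nodes get a large
    random initial value instead of a value from this formula.
    """
    exploration = c * math.sqrt(2.0 * math.log(parent_visits) / node_visits)
    return avg_eval + exploration


# Root children from the selection example: the root has N = 17 visits.
for avg, visits in [(0.3, 3), (2.5, 10), (0.5, 4)]:
    print(f"avg={avg}, visits={visits}, UCB1={ucb1(avg, 17, visits):.2f}")
```

The second term shrinks as a node accumulates visits, so a rarely visited node can outrank a well-explored one with a higher average.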
MctsAi Procedure
1. Expand all child nodes of the root node at once
2. Repeat iterations of Selection, Expansion, Playout, and Backpropagation as many times as possible within 16.5 ms (also set empirically)
3. Select the action to perform
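The three steps can be sketched as a time-budgeted driver. This is a hypothetical outline, not the sample AI's code: `decide`, `expand_root`, `iterate`, and `select_action` are illustrative names, and the 16.5 ms budget is the only value taken from the slides.

```python
import time
from typing import Callable, TypeVar

A = TypeVar("A")

# Per-decision search budget from the slides: 16.5 ms.
SEARCH_BUDGET_S = 0.0165


def decide(
    expand_root: Callable[[], None],
    iterate: Callable[[], None],
    select_action: Callable[[], A],
    budget_s: float = SEARCH_BUDGET_S,
) -> A:
    """Run the three MctsAi steps for one decision (hypothetical helper)."""
    expand_root()                               # 1. expand all root children at once
    deadline = time.perf_counter() + budget_s
    while time.perf_counter() < deadline:       # 2. as many iterations as time allows
        iterate()                               #    Selection, Expansion, Playout, Backpropagation
    return select_action()                      # 3. pick the action to perform
```

Using a wall-clock deadline rather than a fixed iteration count lets the number of playouts adapt to how expensive each simulation is on the current machine.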
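1 Expansion of all child nodes of the root node
Assign a very large random value to each unvisited node as its initial UCB1 value (UCB1 is undefined while a node has 0 visits), so that every unvisited node is selected before its visited siblings, in random order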
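[Figure: the root node and its child nodes right after expansion. Each node is labeled with its UCB1 value, average evaluation value, and number of visits; every child has 0 visits, an undefined (NaN) average evaluation value, and a large random initial UCB1 value such as 10002, 10010, or 9999]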
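A minimal sketch of a tree node and of expanding all children at once. `Node`, `expand_all`, and the action labels are hypothetical. The exact random range (9999 plus an integer in [0, 50)) is an assumption chosen to match the example values in the figure; the slides only say "very large random value".

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    action: Optional[str] = None
    depth: int = 0
    visits: int = 0
    total_eval: float = 0.0
    ucb: float = 0.0
    children: List["Node"] = field(default_factory=list)

    @property
    def avg_eval(self) -> float:
        # Undefined before the first visit; the slides show NaN here.
        return self.total_eval / self.visits if self.visits else math.nan


def expand_all(node: Node, actions: List[str], rng: random.Random) -> None:
    """Create one child per action, all at once."""
    for action in actions:
        # Assumption: 9999 + an integer in [0, 50); the slides only say "very large random value".
        initial_ucb = 9999 + rng.randrange(50)
        node.children.append(Node(action=action, depth=node.depth + 1, ucb=initial_ucb))


root = Node()
expand_all(root, ["ACTION_1", "ACTION_2", "ACTION_3"], random.Random(0))
for child in root.children:
    print(child.action, child.ucb, child.avg_eval, child.visits)
```

Because any visited node's UCB1 value is far below 9999, the random initial values act as a random tie-break among untried actions.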
2.1 Selection
At each level, select the child node with the highest UCB1 value, all the way down to a leaf node
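[Figure, Example 1: right after the root expansion all children are unvisited, so the child with the largest initial UCB1 value (10010) is selected. Example 2: the root has 17 visits and its children have (average evaluation value, visits, UCB1) = (0.3, 3, 4.42), (2.5, 10, 4.76), and (0.5, 4, 4.07); the child with UCB1 4.76 is selected, followed by its unvisited child with the largest UCB1 value (10030)]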
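A minimal sketch of the Selection descent, assuming each node caches its current UCB1 value in a `ucb` field. The `Node` class, `select_path`, and the node names are hypothetical; the UCB1 values are those of Example 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    ucb: float = 0.0  # cached UCB1 value (large random number while unvisited)
    children: List["Node"] = field(default_factory=list)


def select_path(root: Node) -> List[Node]:
    """Follow the child with the highest UCB1 value until a leaf is reached."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.ucb)
        path.append(node)
    return path


# Example 2 from the slides (node names are hypothetical labels).
b = Node("b", 4.76, [Node("b1", 10030), Node("b2", 10028), Node("b3", 10020)])
root = Node("root", children=[Node("a", 4.42), b, Node("c", 4.07)])
print([n.name for n in select_path(root)])  # ['root', 'b', 'b1']
```

Returning the whole path is convenient because the later steps (playout from the leaf, backpropagation to the root) both need it.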
2.2 Expansion
When Selection reaches a leaf node at depth 1 that has been visited 10 times, expand all of its child nodes at once
[Figure: the leaf node with 10 visits (average evaluation value 2.5, UCB1 4.76) is expanded, and its new child nodes receive large random initial UCB1 values (10030, 10028, 10020)]
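A minimal sketch of the expansion rule. `should_expand`, `maybe_expand`, and the constant names are hypothetical. Reading "having 10 visits" as "at least 10 visits" is an assumption, as is the random range for the new children's initial UCB1 values.

```python
import random
from dataclasses import dataclass, field
from typing import List

# Values from the slides; treating the visit test as ">= 10" is an assumption.
EXPANSION_VISITS = 10
EXPANSION_DEPTH = 1


@dataclass
class Node:
    depth: int
    visits: int = 0
    ucb: float = 0.0
    children: List["Node"] = field(default_factory=list)


def should_expand(node: Node) -> bool:
    return (not node.children
            and node.depth == EXPANSION_DEPTH
            and node.visits >= EXPANSION_VISITS)


def maybe_expand(node: Node, n_actions: int, rng: random.Random) -> bool:
    """Expand all children at once if the leaf qualifies; return True if expanded."""
    if not should_expand(node):
        return False
    for _ in range(n_actions):
        # New children start unvisited with a large random UCB1 value (range assumed).
        node.children.append(Node(depth=node.depth + 1, ucb=9999 + rng.randrange(50)))
    return True
```

Restricting expansion to depth 1 keeps the tree at most two levels deep, and waiting for 10 visits avoids growing subtrees under actions that were only tried once or twice.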
2.3 Playout
Perform a random simulation for 60 frames ahead
[Figure, Examples 1 and 2: the playout starts from the leaf node reached by Selection in each example]
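A minimal sketch of a 60-frame random playout on a toy game. Everything except the frame count is an assumption: the `(my_hp, opponent_hp)` state, `toy_step`, and `hp_diff_change` are hypothetical stand-ins, not FightingICE's simulator, and picking a fresh random action every frame is a simplification (real actions last several frames). The starting state is assumed to be the game state after the actions on the selected path.

```python
import random
from typing import Callable, Sequence, Tuple

PLAYOUT_FRAMES = 60  # from the slides

# Hypothetical toy state: (my_hp, opponent_hp). FightingICE's real simulator is not modeled here.
State = Tuple[int, int]


def random_playout(
    state: State,
    actions: Sequence[str],
    step: Callable[[State, str, str], State],
    evaluate: Callable[[State, State], float],
    rng: random.Random,
    frames: int = PLAYOUT_FRAMES,
) -> float:
    """Advance `frames` frames with random actions for both players, then score the result."""
    start = state
    for _ in range(frames):
        state = step(state, rng.choice(actions), rng.choice(actions))
    return evaluate(start, state)


def toy_step(state: State, mine: str, theirs: str) -> State:
    my_hp, opp_hp = state
    if mine == "PUNCH":
        opp_hp -= 1
    if theirs == "PUNCH":
        my_hp -= 1
    return my_hp, opp_hp


def hp_diff_change(start: State, end: State) -> float:
    # Assumed evaluation: change in (my HP - opponent HP) over the playout.
    return (end[0] - end[1]) - (start[0] - start[1])


value = random_playout((100, 100), ["PUNCH", "GUARD"], toy_step, hp_diff_change, random.Random(42))
print(value)
```

Passing `step` and `evaluate` in as functions keeps the playout loop independent of any particular game engine.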
2.4 Backpropagation
[Figure: the tree of Example 2 before backpropagation; the root has 17 visits]
Backpropagate the newly obtained evaluation value: update the number of visits and the average evaluation value of every node on the path back to the root, and recompute all UCB1 values that depend on them
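[Figure: the same tree after backpropagation. The root now has 18 visits; the selected child has 11 visits, an average evaluation value of 2.27, and UCB1 4.44; its siblings' UCB1 values become 4.46 and 4.10; the newly visited leaf has 1 visit, an average evaluation value of 0, and UCB1 6.57, while its unvisited siblings keep 10028 and 10020]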
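A minimal sketch of backpropagation with UCB1 refresh, replayed on the Example 2 tree. `Node`, `add_child`, and `backpropagate` are hypothetical names. The backpropagated value of 0 is inferred from the figure (the selected child's average drops from 2.5 to 2.27, i.e. 25/11), and the resulting UCB1 values agree with the figure to within rounding.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

UCB_C = 3.0


@dataclass
class Node:
    visits: int = 0
    total_eval: float = 0.0
    ucb: float = 0.0
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, **kwargs) -> "Node":
        child = Node(parent=self, **kwargs)
        self.children.append(child)
        return child


def backpropagate(leaf: Node, value: float) -> None:
    path = []
    node: Optional[Node] = leaf
    while node is not None:          # walk leaf -> root, updating statistics
        node.visits += 1
        node.total_eval += value
        path.append(node)
        node = node.parent
    for parent in path:              # parent visit counts changed, so refresh children's UCB1
        for child in parent.children:
            if child.visits > 0:     # unvisited children keep their large initial value
                avg = child.total_eval / child.visits
                child.ucb = avg + UCB_C * math.sqrt(2 * math.log(parent.visits) / child.visits)


# Example 2 tree before backpropagation.
root = Node(visits=17, total_eval=27.9)
a = root.add_child(visits=3, total_eval=0.9, ucb=4.42)
b = root.add_child(visits=10, total_eval=25.0, ucb=4.76)
c = root.add_child(visits=4, total_eval=2.0, ucb=4.07)
b1 = b.add_child(ucb=10030)
b2 = b.add_child(ucb=10028)
b3 = b.add_child(ucb=10020)
backpropagate(b1, 0.0)
print(round(a.ucb, 2), round(b.ucb, 2), round(c.ucb, 2), round(b1.ucb, 2))
```

Siblings of path nodes must be refreshed too: their own statistics are unchanged, but their parent's visit count $N$ grew, which raises their exploration term.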
3 Selection of an action
[Figure: a tree whose root has 56 visits; the root's children have 28, 22, and 6 visits (average evaluation values 2.53, 1.95, and 0.33), so the action of the child with 28 visits is chosen]
Finally, select the action associated with the child node having the highest number of visits
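A minimal sketch of the final choice, using the visit counts of the root's children from the figure. The function name and action labels are hypothetical.

```python
from typing import Dict


def most_visited_action(visit_counts: Dict[str, int]) -> str:
    """Return the action of the root child with the highest visit count."""
    return max(visit_counts, key=visit_counts.get)


# Root children from the slide's example (action names are hypothetical).
print(most_visited_action({"ACTION_1": 28, "ACTION_2": 22, "ACTION_3": 6}))  # ACTION_1
```

Choosing by visit count rather than by UCB1 value is a common MCTS convention: UCB1 includes an exploration bonus that inflates rarely tried nodes, whereas a high visit count means the search kept returning to that action.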
That’s all folks!