mcts ai

MctsAi

Team FightingICE

March 27, 2016

http://www.ice.ci.ritsumei.ac.jp/~ftgaic/

Outline of MctsAi

A sample fighting game AI implementing UCB applied to trees (UCT) [1] for the FightingICE platform

A typical Monte-Carlo Tree Search (MCTS) algorithm [2]

[1] Levente Kocsis and Csaba Szepesvari, “Bandit based Monte-Carlo Planning”

[2] R. Coulom, “Efficient selectivity and backup operators in Monte-Carlo tree search”

UCT

Repeat Selection → Expansion → Playout → Backpropagation until the predefined maximum time length or the maximum number of playouts is reached

Use the UCB1 value in Selection

Finally, select the action associated with the child node, directly below the root node, that has the maximum number of visits

[Figure: the four phases of one iteration: selection, expansion, playout, backpropagation]

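Upper Confidence Bound (UCB1) [3]

$$\mathrm{UCB1}_i = \bar{X}_i + C \sqrt{\frac{2 \ln N}{N_i}}$$

$\bar{X}_i$: Average evaluation value of node i

$C$: Balancing parameter (empirically set to 3 in the sample AI)

$N_i$: Number of visits of node i

$N$: Number of visits of the parent node of node i

UCB1 favors nodes with a high average evaluation value and nodes that have been visited less often

[3] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem”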
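As a quick check of the UCB1 formula, the following minimal sketch recomputes the UCB1 values shown in the Selection example (root node with 17 visits). The function name `ucb1` and its parameter names are chosen here for illustration and are not taken from the sample AI.

```python
import math

C = 3.0  # balancing parameter; the slides state it is empirically set to 3


def ucb1(avg_value: float, visits: int, parent_visits: int, c: float = C) -> float:
    """Average evaluation value plus an exploration bonus that shrinks with visits."""
    return avg_value + c * math.sqrt(2.0 * math.log(parent_visits) / visits)


# (average evaluation value, number of visits) of the three child nodes
# in the Selection example, whose parent (the root) has 17 visits.
for avg, n in [(0.3, 3), (2.5, 10), (0.5, 4)]:
    print(f"avg={avg} visits={n} ucb1={ucb1(avg, n, 17):.2f}")
# prints 4.42, 4.76 and 4.07, matching the values in the slides
```

The node with the best average (2.5) still wins here, but the rarely visited nodes are close behind because their exploration bonus is larger.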

MctsAi Procedure

1. Expand all adjacent child nodes at once from the root node

2. Repeat an iteration of Selection, Expansion, Playout, and Backpropagation as many times as possible for 16.5 ms (also empirically set)

3. Select an action to perform
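Step 2 of the procedure is a time-budgeted loop. The sketch below shows one way to write such a loop; `run_for_budget` and `dummy_iteration` are hypothetical names, and the dummy callback only stands in for one Selection, Expansion, Playout, and Backpropagation pass.

```python
import time

BUDGET_SECONDS = 0.0165  # 16.5 ms, the empirically chosen budget from the slides


def run_for_budget(iterate, budget_seconds: float = BUDGET_SECONDS) -> int:
    """Call iterate() repeatedly until the time budget is spent; return the count."""
    deadline = time.perf_counter() + budget_seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterate()  # one Selection -> Expansion -> Playout -> Backpropagation pass
        iterations += 1
    return iterations


counter = {"calls": 0}


def dummy_iteration() -> None:
    # Placeholder for a real MCTS iteration (assumption for illustration).
    counter["calls"] += 1


done = run_for_budget(dummy_iteration)
print(done == counter["calls"])
```

The number of completed iterations depends on the machine, which is why the final action is chosen from visit counts rather than from a fixed number of playouts.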

1 Expansion of all adjacent child nodes from the root node

Assign a very large random value to non-visited nodes as their initial UCB1 value

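[Figure: the root node (0 visits) with three non-visited child nodes. Each child node has an average evaluation value of NaN, 0 visits, and a large random initial UCB1 value (10002, 10010, and 9999). Legend: each node shows its UCB1 value, average evaluation value, and number of visits.]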
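The initialization of non-visited nodes could be sketched as follows. The base value 9999 and the random range of 50 are assumptions chosen to be consistent with the initial UCB1 values shown in the slides (9999 to 10030); the action names are placeholders.

```python
import random


def initial_ucb1(rng: random.Random, base: int = 9999, spread: int = 50) -> int:
    """Large random value so non-visited nodes are tried before any visited node.

    base and spread are assumptions, not values stated in the slides.
    """
    return base + rng.randrange(spread)


rng = random.Random(0)
children = [
    {"action": action, "visits": 0, "avg": float("nan"), "ucb1": initial_ucb1(rng)}
    for action in ("A", "B", "C")  # hypothetical action names
]
for child in children:
    print(child)
```

Because every initial value exceeds any UCB1 value a visited node is likely to reach, each child is visited once before the formula takes over, and the random part breaks ties among siblings.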

2.1 Selection

Select the node with the highest UCB1 value at each level, all the way down to a leaf node

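[Figure, Example 1: the root node has 0 visits and three non-visited child nodes with initial UCB1 values 10002, 10010, and 9999.]

[Figure, Example 2: the root node has 17 visits. Its child nodes have (average evaluation value, number of visits, UCB1 value) = (0.3, 3, 4.42), (2.5, 10, 4.76), and (0.5, 4, 4.07). The child node with 10 visits has three non-visited child nodes with initial UCB1 values 10030, 10028, and 10020.]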
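Selection can be sketched as a greedy descent on the UCB1 value. The tree below reuses the UCB1 values from Example 2; the `Node` class and the node names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    ucb1: float = 0.0
    children: List["Node"] = field(default_factory=list)


def select_leaf(root: Node) -> List[Node]:
    """Follow the child with the highest UCB1 value until a leaf is reached."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.ucb1)
        path.append(node)
    return path


# UCB1 values from Example 2; the middle child already has three children.
middle = Node("b", 4.76, [Node("b1", 10030), Node("b2", 10028), Node("b3", 10020)])
root = Node("root", children=[Node("a", 4.42), middle, Node("c", 4.07)])

print([node.name for node in select_leaf(root)])  # ['root', 'b', 'b1']
```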

2.2 Expansion

If a leaf node at depth 1 that has 10 visits is reached, expand all of its child nodes at once

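[Figure: the tree before and after expansion. Before: the root node has 17 visits and three leaf child nodes with (average evaluation value, number of visits, UCB1 value) = (0.3, 3, 4.42), (2.5, 10, 4.76), and (0.5, 4, 4.07). After: the child node with 10 visits has three new non-visited child nodes with initial UCB1 values 10030, 10028, and 10020.]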
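The expansion rule can be sketched as a simple condition on a node's depth and visit count. The constant names, the dictionary layout, and the action names are hypothetical; using `>=` rather than an exact match on 10 visits is also an assumption.

```python
EXPAND_VISITS = 10  # visit threshold from the slides
EXPAND_DEPTH = 1    # only nodes directly below the root are expanded
ACTIONS = ("A", "B", "C")  # hypothetical action names


def new_node(action: str, depth: int) -> dict:
    return {"action": action, "depth": depth, "visits": 0, "children": []}


def maybe_expand(node: dict) -> bool:
    """Create all child nodes at once when the expansion condition holds."""
    is_leaf = not node["children"]
    if is_leaf and node["depth"] == EXPAND_DEPTH and node["visits"] >= EXPAND_VISITS:
        node["children"] = [new_node(a, node["depth"] + 1) for a in ACTIONS]
        return True
    return False


busy = new_node("A", 1)
busy["visits"] = 10
quiet = new_node("B", 1)
quiet["visits"] = 4

print(maybe_expand(busy), len(busy["children"]))    # True 3
print(maybe_expand(quiet), len(quiet["children"]))  # False 0
```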

2.3 Playout

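Perform a random simulation for 60 frames ahead

[Figure, Example 1: the playout starts from a non-visited child node of the root node (initial UCB1 values 10002, 10010, and 9999).]

[Figure, Example 2: the playout starts from a non-visited node at depth 2 (initial UCB1 values 10030, 10028, and 10020), below the child node with (average evaluation value, number of visits, UCB1 value) = (2.5, 10, 4.76); the root node has 17 visits.]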
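The slides do not describe the simulator or the evaluation function, so the following is only a toy sketch of a 60-frame random playout. The one-frame transition `toy_step`, the action names, and the score (change in HP difference over the simulated frames) are all assumptions made for illustration.

```python
import random

PLAYOUT_FRAMES = 60  # simulation length from the slides
ACTIONS = ("attack", "guard", "move")  # hypothetical action set


def toy_step(state, my_action, opp_action):
    """Hypothetical one-frame transition: an unguarded attack costs 1 HP."""
    my_hp, opp_hp = state
    if my_action == "attack" and opp_action != "guard":
        opp_hp -= 1
    if opp_action == "attack" and my_action != "guard":
        my_hp -= 1
    return my_hp, opp_hp


def playout(state, rng, frames=PLAYOUT_FRAMES):
    """Play random actions for both sides and return the change in HP difference."""
    start_my, start_opp = state
    for _ in range(frames):
        state = toy_step(state, rng.choice(ACTIONS), rng.choice(ACTIONS))
    end_my, end_opp = state
    return (end_my - start_my) - (end_opp - start_opp)


score = playout((100, 100), random.Random(42))
print(score)
```

The returned score is the evaluation value that the next phase backpropagates along the selected path.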

2.4 Backpropagation

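Backpropagate a newly obtained evaluation value and modify the UCB1 value and number of visits of all related nodes

[Figure: the tree before and after backpropagation. Before: the root node has 17 visits; its child nodes have (average evaluation value, number of visits, UCB1 value) = (0.3, 3, 4.42), (2.5, 10, 4.76), and (0.5, 4, 4.07); the node with 10 visits has three non-visited child nodes with initial UCB1 values 10030, 10028, and 10020. After: the root node has 18 visits; its child nodes have (0.3, 3, 4.46), (2.27, 11, 4.44), and (0.5, 4, 4.10); the newly visited node at depth 2 has (0, 1, 6.57), and its two siblings remain non-visited with UCB1 values 10028 and 10020.]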
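Backpropagation can be sketched as a walk from the leaf back to the root that updates visit counts and running totals, followed by a UCB1 refresh along the same path. The `Node` class is hypothetical, and siblings of the path nodes (whose UCB1 values also change because their parent's visit count grew) are not refreshed in this sketch. The numbers reuse the example from the slides, with a new evaluation value of 0.

```python
import math
from dataclasses import dataclass
from typing import Optional

C = 3.0  # balancing parameter from the slides


@dataclass
class Node:
    visits: int = 0
    total: float = 0.0  # sum of evaluation values seen at this node
    ucb1: float = 0.0
    parent: Optional["Node"] = None

    @property
    def avg(self) -> float:
        return self.total / self.visits if self.visits else float("nan")


def backpropagate(leaf: Node, value: float) -> None:
    # First pass: add the new evaluation value to every node on the path.
    node = leaf
    while node is not None:
        node.visits += 1
        node.total += value
        node = node.parent
    # Second pass: refresh UCB1 on the path using the updated parent counts.
    node = leaf
    while node.parent is not None:
        bonus = C * math.sqrt(2.0 * math.log(node.parent.visits) / node.visits)
        node.ucb1 = node.avg + bonus
        node = node.parent


root = Node(visits=17)
child = Node(visits=10, total=25.0, parent=root)  # average 2.5 over 10 visits
leaf = Node(parent=child)                         # newly expanded, non-visited

backpropagate(leaf, 0.0)
print(root.visits, child.visits, round(child.avg, 2), round(leaf.ucb1, 2))
# prints: 18 11 2.27 6.57
```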

3 Selection of an action

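[Figure: the final tree. The root node has 56 visits. Its child nodes have (average evaluation value, number of visits, UCB1 value) = (2.53, 28, 4.14), (1.95, 22, 3.76), and (0.33, 6, 3.81). The child node with 28 visits has child nodes (0.5, 2, 5.98), (2.2, 5, 5.66), and (4.09, 11, 6.43). The child node with 22 visits has child nodes (0.33, 3, 4.64), (0.33, 3, 4.64), and (2.66, 6, 5.71). The child node with 6 visits has not been expanded.]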

Finally, select the action associated with the child node having the highest number of visits
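The final choice looks only at the visit counts of the root's child nodes. A minimal sketch using the numbers from the last figure follows; the action names are placeholders.

```python
# (average evaluation value, number of visits) of the root's child nodes
# in the final tree; the action names are hypothetical.
children = [
    {"action": "A", "avg": 2.53, "visits": 28},
    {"action": "B", "avg": 1.95, "visits": 22},
    {"action": "C", "avg": 0.33, "visits": 6},
]

best = max(children, key=lambda child: child["visits"])
print(best["action"], best["visits"])  # A 28
```

Choosing by visit count rather than by average evaluation value makes the decision less sensitive to a node whose average rests on only a few playouts.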

That’s all folks!
