TRANSCRIPT
EnHiC: An Enforced Hill Climbing Based System for General Game Playing 1 of 32
In The Name Of God
EnHiC: An Enforced Hill Climbing Based System
for General Game Playing
Amin Babadi[1], Behnaz Omoomi[2], Graham Kendall[3]
[1,2] Isfahan University of Technology, [3] University of Nottingham
2015 IEEE Conference on Computational Intelligence and Games (IEEE CIG 2015)
Outline
Introduction
The Heuristic Function
Enforced Hill Climbing Search
Experimental Results
Conclusions
General Game Playing
Most efforts in video game AI are limited to a single game.
We would like agents that are able to perform well in any arbitrary game environment.
One Ring to Rule Them All
Why Is GGP So Hard?
Environment Properties
o Unknown
o Dynamic
o Non-deterministic
o Multi-agent
Why Is GGP So Interesting?
Competitions
There are currently two GGP competitions running:
o GGP (since 2005)
o GVG-AI (since 2014)
We focus on the GVG-AI competition.
GVG-AI Framework
[Diagram: at each game tick, the framework provides the agent with the elapsed time, the available actions, observations of the game world, the history of events, and the game status; the agent returns the next action.]
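This interface can be sketched as follows. A minimal, hypothetical Python mirror of the agent's contract with the framework (the real GVG-AI framework is Java, and the class and field names below are my own stand-ins, not the framework's):

```python
import random

class StateObservation:
    """Toy stand-in for the framework's state observation."""
    def __init__(self, available_actions):
        self.available_actions = available_actions  # actions legal in this state

class RandomAgent:
    """A controller receives the state and a time budget, and must return
    one of the available actions before the budget expires."""
    def act(self, state, time_left_ms):
        return random.choice(state.available_actions)

state = StateObservation(["LEFT", "RIGHT", "NIL"])
action = RandomAgent().act(state, time_left_ms=40)
```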
Preservation of a State
The amount of reward or penalty the controller would receive if it tried to preserve its current situation (i.e., kept performing the NIL action).
Heuristic Function
hEnHiC(s) = AverageDistanceToPortals(s)
          + AverageDistanceToResources(s)
          + NumberOfNPCs(s) × kNPC
          − NumberOfControllerResources(s) × kResource
          − Score(s) × kScore
          − ComputePreservation(s)    (1)
In this work, (kNPC, kResource, kScore) = (5×10^5, 10^3, 10^6).
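Eq. (1) can be sketched as code, assuming the six feature values have already been extracted from the state observation (the feature names below are my own stand-ins, not the framework's):

```python
# Weights from the paper: (kNPC, kResource, kScore) = (5e5, 1e3, 1e6).
K_NPC, K_RESOURCE, K_SCORE = 5e5, 1e3, 1e6

def h_enhic(f):
    """f: dict holding the six quantities that appear in Eq. (1).
    Lower values are better (the search minimizes this heuristic)."""
    return (f["avg_dist_to_portals"]
            + f["avg_dist_to_resources"]
            + f["num_npcs"] * K_NPC
            - f["num_controller_resources"] * K_RESOURCE
            - f["score"] * K_SCORE
            - f["preservation"])

# Illustrative feature values (made up for the example).
example = {"avg_dist_to_portals": 10, "avg_dist_to_resources": 5,
           "num_npcs": 2, "num_controller_resources": 1,
           "score": 3, "preservation": 100}
value = h_enhic(example)
```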
Enforced Hill Climbing
Introduced in the FF planning system, winner of the AIPS-2000 planning competition.
A local search method based on standard hill climbing.
If no immediate successor with better heuristic is found, EHC uses a breadth-first search until it finds a heuristically better state.
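The idea can be sketched on an explicit toy graph; `succ` (successor function) and `h` (heuristic) are assumed inputs for the sketch, not part of the original system:

```python
from collections import deque

def ehc(start, goal, succ, h):
    """Enforced hill climbing: from the current state, run breadth-first
    search until a strictly better state (lower h) is found, commit to it,
    and repeat. Returns the action plan, or None on failure."""
    plan, s = [], start
    while s != goal:
        frontier = deque([(s, [])])
        visited = {s}
        improved = None
        while frontier:
            cur, path = frontier.popleft()
            if h(cur) < h(s):              # first strictly better state wins
                improved = (cur, path)
                break
            for action, nxt in succ(cur):
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [action]))
        if improved is None:
            return None                    # no better state reachable: EHC fails
        s, path = improved
        plan.extend(path)                  # commit to the whole BFS path
    return plan
```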
Enforced Hill Climbing
[Example search tree: root h = 75; children h = 94, 76, 110; grandchildren h = 77, 70, 150, 123, 50, 80.]
Adaptations to EHC Method (1)
In the example tree, original EHC stops as soon as it finds the first node better than the root (h = 70) and returns it. With time remaining, we may be able to keep searching and find an even better node (h = 50).
Adaptations to EHC Method (2)
Only the first action on the path to the better node is stored.
Adaptations to EHC Method (3)
If no better state can be found, EnHiC chooses a random action to escape plateaus.
Controllers
4 sample controllers from the GVG-AI framework:
o Random
o One-Step-Look-Ahead
o GA
o MCTS
KB-FE-MCTS: D. Perez, S. Samothrakis, and S. Lucas, "Knowledge-based fast evolutionary MCTS for general video game playing," in Proceedings of CIG 2014, pp. 1–8.
EnHiC: the main EHC-based system with all adaptations.
Setup
All results, except for the (best) results of KB-FE-MCTS, were recorded on a computer with:
o Microsoft Windows 7 OS,
o 6 GB RAM, and
o 2.30 GHz Core i7 CPU.
CIG 2014 Games
3 Game Sets
10 Games Per Set
5 Levels Per Game
5 Trials Per Level
Table 1. Percentage of victories and average score per game.
Game | Victories: MCTS, KB-FE-MCTS, EnHiC | Avg. score: MCTS, KB-FE-MCTS, EnHiC
Aliens 8% 100% 100% 36.72 56.52 67.0
Boulderdash 0% 23.3% 0% 9.96 18.24 3.8
Butterflies 88% 100% 100% 27.84 31.76 26.96
Chase 12% 97.4% 88% 4.04 9.78 8.56
Frogs 24% 28% 100% -0.88 -0.48 1
Missile Command 20% 65.9% 72% -1.44 4.54 4.44
Portals 12% 37% 20% 0.12 0.37 0.2
Sokoban 0% 13.4% 20% 0.16 0.7 1.2
Survive Zombies 44% 53.9% 36% 13.28 24.66 50.68
Zelda 8% 37% 28% 0.08 0.9 4.84
Overall 22% 55.6% 56% 9 14.7 16.87
EnHiC Variations
EnHiC: the main EHC-based system with all adaptations.
Fast EnHiC: EHC search is stopped once the first better state is found.
Random-Free EnHiC: when EHC search has failed to find a better state, this version returns a NIL action instead of a random one.
Preservation-Free EnHiC: computation of preservation value is removed from heuristic function.
Table 2. Percentage of victories and average score for the EnHiC variations.
Game | Victories: Fast EnHiC, Random-Free EnHiC, Preservation-Free EnHiC, EnHiC | Avg. score: Fast EnHiC, Random-Free EnHiC, Preservation-Free EnHiC, EnHiC
Aliens 100% 100% 64% 100% 58.16 62.24 61.52 67.0
Boulderdash 0% 0% 0% 0% 0.56 2.4 4.44 3.8
Butterflies 64% 100% 100% 100% 24.72 29.76 26 26.96
Chase 52% 12% 84% 88% 6.12 4.2 8.84 8.56
Frogs 8% 100% 84% 100% -0.24 1 0.52 1
Missile Command 52% 40% 68% 72% 1.68 2.2 5.04 4.44
Portals 0% 20% 20% 20% 0 0.2 0.2 0.2
Sokoban 8% 0% 20% 20% 1.08 0.4 1.32 1.2
Survive Zombies 16% 44% 32% 36% 6.16 49.48 55.32 50.68
Zelda 0% 16% 20% 28% 1.88 4 4 4.84
Overall 30% 43% 49% 56% 10.01 15.59 16.72 16.87
Summary
There are many similarities between GGP and automated planning.
We can put these similarities to good use.
We adapted a well-known planning algorithm, enforced hill climbing, to the general game playing problem.
Thank You Very Much!
Any questions?
Dr. Behnaz Omoomi www.bomoomi.iut.ac.ir
Prof. Graham Kendall www.graham-kendall.com
Amin Babadi www.ababadi.ece.iut.ac.ir
Function ComputePreservation
1: Input: current state observation s
2: Output: preservation p
3: prev ← s
4: p ← 0
5: For i = 1 to K Do
6:   next ← Adv(prev, ACTION_NIL)
7:   If player has lost in next
8:     Return p ← −α × ε^i
9:   End If
10:  If player has won in next
11:    Return p ← α × ε^i
12:  End If
13:  diff ← Score(next) − Score(prev)
14:  If diff ≠ 0
15:    p ← p + diff × β × ε^i
16:  End If
17:  prev ← next
18: End For
19: Return p
In this work, (K, α, β, ε) = (5, 2×10^7, 10^6, 0.9).
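A runnable sketch of the function above, with a toy forward model standing in for the framework's Adv(state, ACTION_NIL); the dict-based states are my simplification:

```python
# Constants from the paper: (K, alpha, beta, eps) = (5, 2e7, 1e6, 0.9).
K, ALPHA, BETA, EPS = 5, 2e7, 1e6, 0.9

def compute_preservation(s, advance):
    """Simulate K ticks of doing nothing; reward/penalize outcomes that
    occur while idling, discounted by eps^i at depth i."""
    prev, p = s, 0.0
    for i in range(1, K + 1):
        nxt = advance(prev)                # apply the NIL action
        if nxt["lost"]:
            return -ALPHA * EPS ** i       # losing while idle: large penalty
        if nxt["won"]:
            return ALPHA * EPS ** i        # winning while idle: large reward
        diff = nxt["score"] - prev["score"]
        if diff != 0:
            p += diff * BETA * EPS ** i    # score drift while idling
        prev = nxt
    return p

# Toy model: idling gains 1 point per tick and never ends the game.
drift = lambda st: {"score": st["score"] + 1, "lost": False, "won": False}
p_drift = compute_preservation({"score": 0, "lost": False, "won": False}, drift)
```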
Function EnforcedHillClimbing
1: Input: current state observation so0
2: Output: an action sequence for transforming so0 into a goal state
3: sequence ← <> (empty action sequence)
4: s ← so0
5: While s is not a goal state
6: Perform breadth-first search for a state s’ such that h(s’)<h(s)
7: If no better state is found
8: Return “Failure”
9: End If
10: Add action sequence from s to s’ to the end of sequence
11: s ← s’
12: End While
13: Return sequence
Function EnHiC_act
1: Input: current state observation so0
2: Output: next action to be performed
3: bestAct ← a randomly chosen action
4: If player loses in Adv(so0, bestAct)
5: bestAct ← ACTION_NIL
6: End If
7: bestScore ← hEnHiC(so0)
8: Initialize openSet to be a queue with one element <so0, ACTION_NIL>
9: While openSet is not empty AND time is not over
10: Continue breadth-first search for a state s’ such that hEnHiC(s’)<bestScore
11: If such s’ is found
12: bestScore ← hEnHiC(s'); bestAct ← the first action on the path from so0 to s'
13: End If
14: End While
15: Return bestAct
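The anytime loop above can be sketched on a toy one-dimensional world; the forward model, action set, and time handling below are my stand-ins for the framework's:

```python
from collections import deque
import random, time

def enhic_act(s0, h, actions, advance, budget=0.02):
    """Anytime EnHiC step: BFS from s0 within the time budget, tracking the
    best heuristic seen and the FIRST action on the path that reached it."""
    deadline = time.monotonic() + budget
    best_act = random.choice(actions(s0))
    best_score = h(s0)
    # Frontier stores (state, first action on the path from s0).
    frontier = deque((advance(s0, a), a) for a in actions(s0))
    visited = {s0}
    while frontier and time.monotonic() < deadline:
        s, first = frontier.popleft()
        if s in visited:
            continue
        visited.add(s)
        if h(s) < best_score:              # better state: remember its first action
            best_score, best_act = h(s), first
        for a in actions(s):
            frontier.append((advance(s, a), first))
    return best_act

# Toy world: integer positions, goal at 10, h = distance to goal.
act = enhic_act(5, lambda s: abs(10 - s), lambda s: ["+", "-"],
                lambda s, a: s + 1 if a == "+" else s - 1, budget=0.01)
```

Note that only `best_score` and `best_act` are kept, matching adaptation (2): the controller commits to a single first action per decision rather than a full plan.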