EnHiC: An Enforced Hill Climbing Based System for General Game Playing

In The Name Of God

Amin Babadi (1), Behnaz Omoomi (2), Graham Kendall (3)
(1, 2) Isfahan University of Technology, (3) University of Nottingham

2015 IEEE Conference on Computational Intelligence and Games (IEEE CIG 2015)


Outline

Introduction

The Heuristic Function

Enforced Hill Climbing Search

Experimental Results

Conclusions

General Game Playing

Most efforts in video game AI are limited to a single game.

We would like agents that perform well in any arbitrary game environment.

One Ring to Rule Them All

Why Is GGP So Hard?

Environment Properties:
o Unknown
o Dynamic
o Non-Deterministic
o Multi-Agent

Why Is GGP So Interesting?

Competitions

There are currently two GGP competitions running:
o GGP (since 2005)
o GVG-AI (since 2014)

We focus on the GVG-AI competition.

GVG-AI Framework

[Diagram: the GVG-AI framework gives the agent the passed time, the available actions, observations, the history of events, and the game status; the agent returns the next action.]

Preservation of a State

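The preservation of a state is the amount of reward or penalty the controller receives if it tries to preserve its current situation, that is, if it keeps applying ACTION_NIL for the next few steps.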

Heuristic Function

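\[
\begin{aligned}
h_{\mathrm{EnHiC}}(s) ={} & \mathrm{AverageDistanceToPortals}(s) \\
 & + \mathrm{AverageDistanceToResources}(s) \\
 & + \mathrm{NumberOfNPCs}(s) \times k_{\mathrm{NPC}} \\
 & - \mathrm{NumberOfControllerResources}(s) \times k_{\mathrm{Resource}} \\
 & - \mathrm{Score}(s) \times k_{\mathrm{Score}} \\
 & - \mathrm{ComputePreservation}(s)
\end{aligned}
\tag{1}
\]

In this work, $(k_{\mathrm{NPC}}, k_{\mathrm{Resource}}, k_{\mathrm{Score}}) = (5 \times 10^{5}, 10^{3}, 10^{6})$.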
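As a rough illustration of how Eq. (1) combines its terms, the sketch below evaluates the heuristic from already-extracted feature values. The `Features` container and its field names are hypothetical; the slides do not say how each feature is read from a game state. Lower values are better, since the search looks for states with a smaller heuristic value.

```python
from dataclasses import dataclass

# Weights as given on the slide: (k_NPC, k_Resource, k_Score).
K_NPC, K_RESOURCE, K_SCORE = 5e5, 1e3, 1e6


@dataclass
class Features:
    """Hypothetical container for the per-state quantities used in Eq. (1)."""
    avg_dist_portals: float
    avg_dist_resources: float
    num_npcs: int
    num_controller_resources: int
    score: float
    preservation: float


def h_enhic(f: Features) -> float:
    """Evaluate Eq. (1); smaller results mean more promising states."""
    return (
        f.avg_dist_portals
        + f.avg_dist_resources
        + f.num_npcs * K_NPC
        - f.num_controller_resources * K_RESOURCE
        - f.score * K_SCORE
        - f.preservation
    )


before = Features(10, 5, 2, 1, 3, 0.0)
after = Features(10, 5, 2, 1, 4, 0.0)  # same state, one more point of score
```

With these weights a single extra point of score lowers the heuristic by 10^6, which outweighs any realistic change in the two distance terms.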

Enforced Hill Climbing

Introduced by the FF planning system, the winner of the AIPS-2000 planning competition.

A local search method based on standard hill climbing.

If no immediate successor with a better heuristic value is found, EHC runs a breadth-first search until it finds a heuristically better state.

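[Figure: a search tree labelled with heuristic values. Root: 75. Its three children: 94, 76, 110. Nodes at depth 2: 77, 70, 150, 123, 50, 80.]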

Adaptations to EHC Method

Adaptations to EHC Method (1)

[Figure: the same search tree with two nodes highlighted.]

Original EHC returns the first better node it finds.

If time remains, EnHiC keeps searching and may find an even better node.

Adaptations to EHC Method (2)

Only the first action on the path to each node is stored.

Adaptations to EHC Method (3)

If no better state can be found, EnHiC chooses a random action to escape plateaus.

Controllers

Four sample controllers from the GVG-AI framework:
o Random
o One-Step-Look-Ahead
o GA
o MCTS

KB-FE-MCTS: D. Perez, S. Samothrakis, and S. Lucas, "Knowledge-based fast evolutionary MCTS for general video game playing," in Proceedings of CIG'14, 2014, pp. 1-8.

EnHiC: the main EHC-based system with all adaptations.

Setup

All results, except for the (best) results of KB-FE-MCTS, were recorded on a computer with:
o Microsoft Windows 7 OS,
o 6 GB RAM, and
o 2.30 GHz Core i7 CPU.

CIG 2014 games:
o 3 game sets
o 10 games per set
o 5 levels per game
o 5 trials per level

Table 1

                 Percentage of Victories      Average Score
Game             MCTS   KB-FE-MCTS   EnHiC    MCTS    KB-FE-MCTS   EnHiC
Aliens           8%     100%         100%     36.72   56.52        67.0
Boulderdash      0%     23.3%        0%       9.96    18.24        3.8
Butterflies      88%    100%         100%     27.84   31.76        26.96
Chase            12%    97.4%        88%      4.04    9.78         8.56
Frogs            24%    28%          100%     -0.88   -0.48        1
Missile Command  20%    65.9%        72%      -1.44   4.54         4.44
Portals          12%    37%          20%      0.12    0.37         0.2
Sokoban          0%     13.4%        20%      0.16    0.7          1.2
Survive Zombies  44%    53.9%        36%      13.28   24.66        50.68
Zelda            8%     37%          28%      0.08    0.9          4.84
Overall          22%    55.6%        56%      9       14.7         16.87
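The slides do not state how the Overall row is computed. It appears to be the unweighted mean over the ten games, and the short check below (victory percentages copied from Table 1) is consistent with that reading. Treat it as a sanity check on the table rather than as the authors' stated procedure.

```python
from statistics import mean

# Percentage of victories per game, in the row order of Table 1.
victories = {
    "MCTS": [8, 0, 88, 12, 24, 20, 12, 0, 44, 8],
    "KB-FE-MCTS": [100, 23.3, 100, 97.4, 28, 65.9, 37, 13.4, 53.9, 37],
    "EnHiC": [100, 0, 100, 88, 100, 72, 20, 20, 36, 28],
}

# Unweighted mean across games, rounded to one decimal place.
overall = {name: round(mean(values), 1) for name, values in victories.items()}

for name, value in overall.items():
    print(f"{name}: {value}%")
```

The means come out as 21.6, 55.6 and 56.4, which match the reported 22%, 55.6% and 56% after rounding.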

EnHiC Variations

EnHiC: the main EHC-based system with all adaptations.

Fast EnHiC: EHC search is stopped once the first better state is found.

Random-Free EnHiC: when EHC search has failed to find a better state, this version returns a NIL action instead of a random one.

Preservation-Free EnHiC: the preservation term is removed from the heuristic function.

Table 2

Percentage of Victories

Game             Fast EnHiC   Random-Free EnHiC   Preservation-Free EnHiC   EnHiC
Aliens           100%         100%                64%                       100%
Boulderdash      0%           0%                  0%                        0%
Butterflies      64%          100%                100%                      100%
Chase            52%          12%                 84%                       88%
Frogs            8%           100%                84%                       100%
Missile Command  52%          40%                 68%                       72%
Portals          0%           20%                 20%                       20%
Sokoban          8%           0%                  20%                       20%
Survive Zombies  16%          44%                 32%                       36%
Zelda            0%           16%                 20%                       28%
Overall          30%          43%                 49%                       56%

Average Score

Game             Fast EnHiC   Random-Free EnHiC   Preservation-Free EnHiC   EnHiC
Aliens           58.16        62.24               61.52                     67.0
Boulderdash      0.56         2.4                 4.44                      3.8
Butterflies      24.72        29.76               26                        26.96
Chase            6.12         4.2                 8.84                      8.56
Frogs            -0.24        1                   0.52                      1
Missile Command  1.68         2.2                 5.04                      4.44
Portals          0            0.2                 0.2                       0.2
Sokoban          1.08         0.4                 1.32                      1.2
Survive Zombies  6.16         49.48               55.32                     50.68
Zelda            1.88         4                   4                         4.84
Overall          10.01        15.59               16.72                     16.87

Summary

There are many similarities between GGP and automated planning.

We can put these similarities to good use.

We adapted a well-known planning algorithm, enforced hill climbing, to the general game playing problem.

Thank You Very Much!

Any questions?

Dr. Behnaz Omoomi www.bomoomi.iut.ac.ir

Prof. Graham Kendall www.graham-kendall.com

Amin Babadi www.ababadi.ece.iut.ac.ir

Function ComputePreservation

Input: current state observation s
Output: preservation p

prev ← s
p ← 0
For i = 1 to K Do
    next ← Adv(prev, ACTION_NIL)
    If player is lost in next
        Return p ← -α·ε^i
    End If
    If player is winner in next
        Return p ← α·ε^i
    End If
    diff ← Score(next) - Score(prev)
    If diff ≠ 0
        p ← p + diff × β·ε^i
    End If
    prev ← next
End For
Return p

In this work, (K, α, β, ε) = (5, 2×10^7, 10^6, 0.9).
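A minimal Python sketch of the function above, assuming the garbled exponent reads as α·ε^i (a discount that shrinks with the look-ahead step i). The forward-model interface (`advance`, `is_loss`, `is_win`, `score`) and the `ToyState` class are hypothetical stand-ins for illustration; they are not the GVG-AI API.

```python
from dataclasses import dataclass

ACTION_NIL = "NIL"  # assumed name for the "do nothing" action


def compute_preservation(s, K=5, alpha=2e7, beta=1e6, eps=0.9):
    """Reward or penalty for standing still for up to K steps from state s."""
    prev, p = s, 0.0
    for i in range(1, K + 1):
        nxt = prev.advance(ACTION_NIL)
        if nxt.is_loss():
            return -alpha * eps ** i      # losing soon is penalised most
        if nxt.is_win():
            return alpha * eps ** i
        diff = nxt.score() - prev.score()
        if diff != 0:
            p += diff * beta * eps ** i   # discounted score change
        prev = nxt
    return p


@dataclass(frozen=True)
class ToyState:
    """Hypothetical state: an enemy reaches the idle player after some ticks."""
    ticks_until_hit: int
    points: float = 0.0
    gain_per_tick: float = 0.0

    def advance(self, action):
        return ToyState(self.ticks_until_hit - 1,
                        self.points + self.gain_per_tick,
                        self.gain_per_tick)

    def is_loss(self):
        return self.ticks_until_hit <= 0

    def is_win(self):
        return False

    def score(self):
        return self.points


danger = compute_preservation(ToyState(ticks_until_hit=2))
safe = compute_preservation(ToyState(ticks_until_hit=10))
```

In the toy run, standing still two ticks away from the enemy gives a large negative preservation value, while a state with no threat inside the K-step horizon gives zero.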

Function EnforcedHillClimbing

Input: current state observation so0
Output: an action sequence for transforming so0 into a goal state

sequence ← <> (empty action sequence)
s ← so0
While s is not a goal state
    Perform breadth-first search for a state s' such that h(s') < h(s)
    If no better state is found
        Return "Failure"
    End If
    Add action sequence from s to s' to the end of sequence
    s ← s'
End While
Return sequence
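The pseudocode above can be sketched in Python as follows. This is an illustrative version for a small deterministic state space: the `successors` callback (returning `(action, next_state)` pairs), the visited set, and the toy corridor are assumptions added for the example, and `None` stands in for "Failure".

```python
from collections import deque


def bfs_better(s, h, successors):
    """Breadth-first search from s for any state with a lower h value."""
    target = h(s)
    queue = deque([(s, [])])
    seen = {s}
    while queue:
        state, path = queue.popleft()
        for action, nxt in successors(state):
            if nxt in seen:
                continue
            seen.add(nxt)
            if h(nxt) < target:
                return nxt, path + [action]
            queue.append((nxt, path + [action]))
    return None


def enforced_hill_climbing(start, h, successors, is_goal):
    """Return an action sequence to a goal state, or None on failure."""
    sequence, s = [], start
    while not is_goal(s):
        found = bfs_better(s, h, successors)
        if found is None:
            return None
        s, path = found
        sequence.extend(path)
    return sequence


# Toy corridor: positions 0..4, goal at 4, with a plateau at positions 0..2.
H = {0: 3, 1: 3, 2: 3, 3: 1, 4: 0}


def corridor(x):
    return [(a, x + d) for a, d in (("L", -1), ("R", 1)) if x + d in H]


plan = enforced_hill_climbing(0, H.get, corridor, lambda x: x == 4)
```

On the plateau, plain hill climbing would stall at position 0 because no neighbour is better; the breadth-first phase looks three steps ahead and finds position 3.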

Function EnHiC_act

Input: current state observation so0
Output: next action to be performed

bestAct ← a randomly chosen action
If player loses in Adv(so0, bestAct)
    bestAct ← ACTION_NIL
End If
bestScore ← hEnHiC(so0)
Initialize openSet to be a queue with one element <so0, ACTION_NIL>
While openSet is not empty AND time is not over
    Continue breadth-first search for a state s' such that hEnHiC(s') < bestScore
    If such s' is found
        bestScore ← hEnHiC(s')
        bestAct ← the first action stored with s'
    End If
End While
Return bestAct
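The sketch below puts the three adaptations together: the search keeps improving until time runs out, each queue entry carries only the first action of its path, and a random action is the fallback. It is a simplified illustration, not the authors' implementation. The `successors`, `h` and `is_loss` callbacks, the `budget_s` parameter, and the toy tree (its values and shape) are assumptions made for the example.

```python
import random
import time
from collections import deque

ACTION_NIL = "NIL"  # assumed name for the "do nothing" action


def enhic_act(s0, successors, h, is_loss, budget_s=0.03, rng=random):
    """Choose the next action with an anytime breadth-first search."""
    moves = successors(s0)  # list of (action, next_state) pairs
    if not moves:
        return ACTION_NIL
    # Fallback: a random action, replaced by NIL if it loses immediately.
    best_act, after = rng.choice(moves)
    if is_loss(after):
        best_act = ACTION_NIL
    best_score = h(s0)
    # Each entry stores a state and only the first action on its path from s0.
    open_set = deque([(s0, None)])
    deadline = time.monotonic() + budget_s
    while open_set and time.monotonic() < deadline:
        state, first = open_set.popleft()
        for action, nxt in successors(state):
            root_act = action if first is None else first
            score = h(nxt)
            if score < best_score:  # keep searching after the first improvement
                best_score, best_act = score, root_act
            if not is_loss(nxt):
                open_set.append((nxt, root_act))
    return best_act


# Hand-made toy tree; heuristic values are illustrative.
CHILDREN = {"root": ["A", "B", "C"], "A": ["A1", "A2"],
            "B": ["B1", "B2"], "C": ["C1", "C2"]}
H = {"root": 75, "A": 94, "B": 76, "C": 110,
     "A1": 77, "A2": 70, "B1": 150, "B2": 123, "C1": 50, "C2": 80}


def toy_successors(state):
    return [(f"to_{c}", c) for c in CHILDREN.get(state, [])]


choice = enhic_act("root", toy_successors, H.get, lambda s: False, budget_s=1.0)
```

In this toy tree a search that stopped at the first improvement would settle on the branch leading to the node valued 70. With time left, the search also reaches the node valued 50 and returns the first action of that path.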