EnHiC: An Enforced Hill Climbing Based System for General Game Playing 1 of 32

In The Name Of God

EnHiC: An Enforced Hill Climbing Based System for General Game Playing

Amin Babadi¹, Behnaz Omoomi², Graham Kendall³

¹,²Isfahan University of Technology, ³University of Nottingham

2015 IEEE Conference on Computational Intelligence and Games (IEEE CIG 2015)

Outline

Introduction

The Heuristic Function

Enforced Hill Climbing Search

Experimental Results

Conclusions

General Game Playing

Most efforts in video game AI are limited to a single game.

We would like agents that perform well in any arbitrary game environment.

One Ring to Rule Them All

Why Is GGP So Hard?

Environment Properties

o Unknown
o Dynamic
o Non-Deterministic
o Multi-Agent

Why Is GGP So Interesting?

Competitions

There are currently two GGP competitions running:

o GGP (since 2005)
o GVG-AI (since 2014)

We focus on the GVG-AI competition.

GVG-AI Framework

[Diagram: the framework passes the agent the elapsed time, available actions, observations, history of events, and game status; the agent returns the next action.]

Preservation of a State

The reward or penalty that the controller receives if it tries to preserve its current situation (that is, if it keeps applying ACTION_NIL).

Heuristic Function

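$$
\begin{aligned}
h_{\mathrm{EnHiC}}(s) ={} & \mathrm{AverageDistanceToPortals}(s) \\
& + \mathrm{AverageDistanceToResources}(s) \\
& + \mathrm{NumberOfNPCs}(s) \times k_{\mathrm{NPC}} \\
& - \mathrm{NumberOfControllerResources}(s) \times k_{\mathrm{Resource}} \\
& - \mathrm{Score}(s) \times k_{\mathrm{Score}} \\
& - \mathrm{ComputePreservation}(s)
\end{aligned}
\tag{1}
$$

In this work, $(k_{\mathrm{NPC}}, k_{\mathrm{Resource}}, k_{\mathrm{Score}}) = (5 \times 10^{5}, 10^{3}, 10^{6})$.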
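As a rough illustration of Eq. (1), the Python sketch below combines the six terms using the weights reported on this slide. The `StateFeatures` container and its field names are hypothetical. The slides do not show how each feature is extracted from a GVG-AI state observation, so the sketch assumes they are computed elsewhere.

```python
from dataclasses import dataclass

# Weights reported on the slide.
K_NPC = 5e5
K_RESOURCE = 1e3
K_SCORE = 1e6


@dataclass
class StateFeatures:
    """Hypothetical container for the quantities that appear in Eq. (1)."""
    avg_distance_to_portals: float
    avg_distance_to_resources: float
    number_of_npcs: int
    number_of_controller_resources: int
    score: float
    preservation: float  # value of ComputePreservation(s), computed elsewhere


def h_enhic(f: StateFeatures) -> float:
    """Evaluate Eq. (1). The search looks for states with a lower value."""
    return (
        f.avg_distance_to_portals
        + f.avg_distance_to_resources
        + f.number_of_npcs * K_NPC
        - f.number_of_controller_resources * K_RESOURCE
        - f.score * K_SCORE
        - f.preservation
    )


# Made-up feature values, only to show how the terms combine.
example = StateFeatures(10.0, 4.0, 2, 3, 1.0, 0.0)
value = h_enhic(example)
```

With these weights, a single point of score outweighs two NPCs, and distances only matter as tie-breakers between otherwise similar states.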

Enforced Hill Climbing

Introduced by the FF planning system, the winner of the AIPS-2000 planning competition

A local search method based on standard hill climbing

If no immediate successor with a better heuristic value is found, EHC runs a breadth-first search until it finds a heuristically better state.

[Figure: example search tree labeled with heuristic values. Root: 75. Depth 1: 94, 76, 110. Depth 2: 77, 70, 150, 123, 50, 80.]

Adaptations to EHC Method

Adaptations to EHC Method (1)

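[Figure: search tree labeled with heuristic values. Root: 75. Depth 1: 94, 76, 110. Depth 2: 77, 70, 150, 123, 50, 80.]

Original EHC returns the first node that improves on the root value of 75, which is the node with value 70 in breadth-first order.

If time remains, EnHiC keeps searching and may find the better node with value 50.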

Adaptations to EHC Method (2)

Only the first action on the path to each node is stored, because the controller returns a single action per call.

Adaptations to EHC Method (3)

If no better state can be found, EnHiC chooses a random action to escape plateaus.

Controllers

Four sample controllers from the GVG-AI framework:

o Random
o One-Step-Look-Ahead
o GA
o MCTS

KB-FE-MCTS: D. Perez, S. Samothrakis, and S. Lucas, “Knowledge-based fast evolutionary MCTS for general video game playing,” in Proceedings of CIG’14, 2014, pp. 1–8.

EnHiC: the main EHC-based system with all adaptations.

Setup

All results, except for the (best) results of KB-FE-MCTS, have been recorded using a computer with:

o Microsoft Windows 7 OS,
o 6 GB RAM, and
o 2.30 GHz Core i7 CPU.

CIG 2014 Games

3 Game Sets

10 Games Per Set

5 Levels Per Game

5 Trials Per Level

Table 1

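| Game | % Victories: MCTS | % Victories: KB-FE-MCTS | % Victories: EnHiC | Avg. Score: MCTS | Avg. Score: KB-FE-MCTS | Avg. Score: EnHiC |
|---|---|---|---|---|---|---|
| Aliens | 8% | 100% | 100% | 36.72 | 56.52 | 67.0 |
| Boulderdash | 0% | 23.3% | 0% | 9.96 | 18.24 | 3.8 |
| Butterflies | 88% | 100% | 100% | 27.84 | 31.76 | 26.96 |
| Chase | 12% | 97.4% | 88% | 4.04 | 9.78 | 8.56 |
| Frogs | 24% | 28% | 100% | -0.88 | -0.48 | 1 |
| Missile Command | 20% | 65.9% | 72% | -1.44 | 4.54 | 4.44 |
| Portals | 12% | 37% | 20% | 0.12 | 0.37 | 0.2 |
| Sokoban | 0% | 13.4% | 20% | 0.16 | 0.7 | 1.2 |
| Survive Zombies | 44% | 53.9% | 36% | 13.28 | 24.66 | 50.68 |
| Zelda | 8% | 37% | 28% | 0.08 | 0.9 | 4.84 |
| Overall | 22% | 55.6% | 56% | 9 | 14.7 | 16.87 |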

EnHiC Variations

EnHiC: the main EHC-based system with all adaptations.

Fast EnHiC: EHC search is stopped once the first better state is found.

Random-Free EnHiC: when EHC search has failed to find a better state, this version returns a NIL action instead of a random one.

Preservation-Free EnHiC: the preservation term is removed from the heuristic function.

Table 2

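| Game | % Victories: Fast EnHiC | % Victories: Random-Free EnHiC | % Victories: Preservation-Free EnHiC | % Victories: EnHiC | Avg. Score: Fast EnHiC | Avg. Score: Random-Free EnHiC | Avg. Score: Preservation-Free EnHiC | Avg. Score: EnHiC |
|---|---|---|---|---|---|---|---|---|
| Aliens | 100% | 100% | 64% | 100% | 58.16 | 62.24 | 61.52 | 67.0 |
| Boulderdash | 0% | 0% | 0% | 0% | 0.56 | 2.4 | 4.44 | 3.8 |
| Butterflies | 64% | 100% | 100% | 100% | 24.72 | 29.76 | 26 | 26.96 |
| Chase | 52% | 12% | 84% | 88% | 6.12 | 4.2 | 8.84 | 8.56 |
| Frogs | 8% | 100% | 84% | 100% | -0.24 | 1 | 0.52 | 1 |
| Missile Command | 52% | 40% | 68% | 72% | 1.68 | 2.2 | 5.04 | 4.44 |
| Portals | 0% | 20% | 20% | 20% | 0 | 0.2 | 0.2 | 0.2 |
| Sokoban | 8% | 0% | 20% | 20% | 1.08 | 0.4 | 1.32 | 1.2 |
| Survive Zombies | 16% | 44% | 32% | 36% | 6.16 | 49.48 | 55.32 | 50.68 |
| Zelda | 0% | 16% | 20% | 28% | 1.88 | 4 | 4 | 4.84 |
| Overall | 30% | 43% | 49% | 56% | 10.01 | 15.59 | 16.72 | 16.87 |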

Summary

There are many similarities between GGP and automated planning.

We can put these similarities to good use.

We adapted a well-known planning algorithm, enforced hill climbing, to the general game playing problem.

Thank You Very Much!

Any questions?

Dr. Behnaz Omoomi www.bomoomi.iut.ac.ir

Prof. Graham Kendall www.graham-kendall.com

Amin Babadi www.ababadi.ece.iut.ac.ir

Function ComputePreservation

    Input: current state observation s
    Output: preservation p
    prev ← s
    p ← 0
    For i = 1 to K Do
        next ← Adv(prev, ACTION_NIL)
        If player is lost in next
            Return p ← −α × ε^i
        End If
        If player is winner in next
            Return p ← α × ε^i
        End If
        diff ← Score(next) − Score(prev)
        If diff ≠ 0
            p ← p + diff × β × ε^i
        End If
        prev ← next
    End For
    Return p

In this work, $(K, \alpha, \beta, \varepsilon) = (5, 2 \times 10^{7}, 10^{6}, 0.9)$.
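The ComputePreservation pseudocode can be sketched in Python roughly as follows. The callables `advance_nil`, `is_loss`, `is_win`, and `score` are hypothetical stand-ins for the forward model and state queries of the game framework. The sketch also assumes that the garbled "εi" in the pseudocode means ε raised to the power i, a discount that shrinks with look-ahead depth.

```python
from typing import Callable, TypeVar

S = TypeVar("S")

# Parameter values reported on the slide.
K = 5
ALPHA = 2e7
BETA = 1e6
EPSILON = 0.9


def compute_preservation(
    s: S,
    advance_nil: Callable[[S], S],  # hypothetical: Adv(state, ACTION_NIL)
    is_loss: Callable[[S], bool],   # hypothetical: has the player lost?
    is_win: Callable[[S], bool],    # hypothetical: has the player won?
    score: Callable[[S], float],    # hypothetical: game score of a state
) -> float:
    """Estimate the reward or penalty of doing nothing for K steps."""
    prev = s
    p = 0.0
    for i in range(1, K + 1):
        nxt = advance_nil(prev)
        if is_loss(nxt):
            return -ALPHA * EPSILON ** i  # losing soon is penalized most
        if is_win(nxt):
            return ALPHA * EPSILON ** i   # winning soon is rewarded most
        diff = score(nxt) - score(prev)
        if diff != 0:
            p += diff * BETA * EPSILON ** i  # discounted score change
        prev = nxt
    return p
```

Because ε < 1, outcomes that occur after more idle steps contribute less, so an imminent loss weighs more heavily than a distant one.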

Function EnforcedHillClimbing

    Input: current state observation so0
    Output: an action sequence for transforming so0 into a goal state
    sequence ← <> (empty action sequence)
    s ← so0
    While s is not a goal state
        Perform breadth-first search for a state s' such that h(s') < h(s)
        If no better state is found
            Return "Failure"
        End If
        Add action sequence from s to s' to the end of sequence
        s ← s'
    End While
    Return sequence
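A minimal Python sketch of the EnforcedHillClimbing pseudocode is given below for a generic deterministic search problem. The `successors`, `is_goal`, and `h` callables and the toy number-line problem are illustrative assumptions. The `visited` set is also an addition that is not in the pseudocode: it makes the breadth-first search terminate on finite state spaces.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

State = Hashable
Action = str
Successors = Callable[[State], Iterable[Tuple[Action, State]]]


def bfs_for_better(
    s: State, successors: Successors, h: Callable[[State], float]
) -> Optional[Tuple[List[Action], State]]:
    """Breadth-first search from s for the first state with a lower h value."""
    target = h(s)
    queue = deque([(s, [])])
    visited = {s}  # assumption: not in the pseudocode, guarantees termination
    while queue:
        state, path = queue.popleft()
        for action, nxt in successors(state):
            if nxt in visited:
                continue
            visited.add(nxt)
            if h(nxt) < target:
                return path + [action], nxt
            queue.append((nxt, path + [action]))
    return None


def enforced_hill_climbing(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Successors,
    h: Callable[[State], float],
) -> Optional[List[Action]]:
    """Return an action sequence to a goal state, or None on failure."""
    sequence: List[Action] = []
    s = start
    while not is_goal(s):
        found = bfs_for_better(s, successors, h)
        if found is None:
            return None  # corresponds to Return "Failure"
        actions, s = found
        sequence.extend(actions)
    return sequence


# Toy problem: walk along the integers 0..10 to reach 7.
def toy_successors(x: int) -> List[Tuple[str, int]]:
    moves = (("+1", 1), ("-1", -1))
    return [(name, x + d) for name, d in moves if 0 <= x + d <= 10]


def toy_h(x: int) -> int:
    # The bump at x = 4 means state 3 has no better immediate successor,
    # so the breadth-first search has to look two steps ahead there.
    return 5 if x == 4 else abs(7 - x)


plan = enforced_hill_climbing(0, lambda x: x == 7, toy_successors, toy_h)
```

In the toy problem, plain hill climbing would get stuck at state 3, whereas the breadth-first step carries the search past the bump to state 5.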

Function EnHiC_act

    Input: current state observation so0
    Output: next action to be performed
    bestAct ← a randomly chosen action
    If player loses in Adv(so0, bestAct)
        bestAct ← ACTION_NIL
    End If
    bestScore ← hEnHiC(so0)
    Initialize openSet to be a queue with one element <so0, ACTION_NIL>
    While openSet is not empty AND time is not over
        Continue breadth-first search for a state s' such that hEnHiC(s') < bestScore
        If such s' is found
            bestScore ← hEnHiC(s')
            bestAct ← the first action stored with s'
        End If
    End While
    Return bestAct
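The EnHiC_act loop can be sketched in Python as a time-bounded breadth-first search that remembers only the first action on each path. This is a sketch under stated assumptions, not the authors' implementation: `advance`, `is_loss`, and `h` are hypothetical callables, the time limit is a plain parameter, and the root entry uses `None` rather than ACTION_NIL as its stored action so that a genuine ACTION_NIL first move stays distinguishable.

```python
import random
import time
from collections import deque
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")
A = TypeVar("A")


def enhic_act(
    so0: S,
    actions: Sequence[A],
    action_nil: A,
    advance: Callable[[S, A], S],   # hypothetical forward model Adv
    is_loss: Callable[[S], bool],   # hypothetical: has the player lost?
    h: Callable[[S], float],        # heuristic such as hEnHiC, lower is better
    time_budget_s: float,
    rng: Optional[random.Random] = None,
) -> A:
    """Pick the next action with a time-bounded breadth-first search."""
    rng = rng or random.Random()
    deadline = time.monotonic() + time_budget_s

    # Fallback: a random action, replaced by ACTION_NIL if it loses at once.
    best_act = rng.choice(list(actions))
    if is_loss(advance(so0, best_act)):
        best_act = action_nil
    best_score = h(so0)

    # Each entry is (state, first action on the path from so0).
    open_set = deque([(so0, None)])
    while open_set and time.monotonic() < deadline:
        state, first = open_set.popleft()
        for a in actions:
            nxt = advance(state, a)
            root_action = a if first is None else first
            value = h(nxt)
            if value < best_score:
                # Unlike original EHC, keep searching for even better states.
                best_score = value
                best_act = root_action
            open_set.append((nxt, root_action))
    return best_act
```

If the budget runs out before any better state is found, the function falls back to the random (or NIL) action chosen at the start, which mirrors the plateau-escape adaptation.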
