justin karneeb. a case based reasoning system developed for use in dom, a domination style game a...

JuKeCB

Justin Karneeb

A Case Based Reasoning system developed for use in DOM, a Domination style game

A research project that has been under development over the past three years by:
◦ Justin Karneeb
◦ Kellen Gillespie
◦ Stephan Lee-Urban
◦ Professor Munoz Avila

What is JuKeCB?

JuKeCB is a CBR system that learns stochastic policies by observation
◦ Stochastic Policy: A non-deterministic “strategy”

Imitates winning strategies that have been observed in the past to win future battles

Continues to learn as it observes more games

How about some more detail

In order to understand JuKeCB, you first need to understand the game it is playing.

SCREEN SHOT OF DOM

A short aside: DOM Game

DOM is a Domination style game
◦ Team based gameplay
◦ Scoring based on holding or “dominating” key points on the map
◦ Easily visible abstract strategies

Basic Strategy
◦ Capture enemy Dom points
◦ Defend owned Dom points
◦ Own more Dom points than opponent

DOM: The Rules

Score is updated every five game turns
◦ Each team is awarded 1*NumberDomPointsHeld points

Two possible game modes
◦ Score Limit: Game ends when one team’s score exceeds X points
◦ Turn Limit: Game ends when X number of turns have passed

DOM: Winning
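The scoring rule and the two end conditions can be sketched as follows. This is a minimal illustration, not DOM's actual code: the function name `play`, its parameters, and the per-turn holdings input format are assumptions made for the example.

```python
def play(holdings_per_turn, score_limit=None, turn_limit=None):
    """Simulate scoring for two teams.

    holdings_per_turn: list of (team0_held, team1_held) tuples, one per turn
    (a hypothetical input format chosen for this sketch).
    Returns (scores, last_turn_played).
    """
    scores = [0, 0]
    turn = 0
    for turn, (held0, held1) in enumerate(holdings_per_turn, start=1):
        if turn % 5 == 0:  # score is updated every five game turns
            scores[0] += 1 * held0
            scores[1] += 1 * held1
        # Score Limit mode: stop once a team's score exceeds the limit
        if score_limit is not None and max(scores) > score_limit:
            break
        # Turn Limit mode: stop once the given number of turns has passed
        if turn_limit is not None and turn >= turn_limit:
            break
    return scores, turn
```

For instance, a team that holds two points against one for twenty turns scores 8 to 4 under a 20-turn limit.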

DomOneHuggerTeam
◦ All bots go to domination point 1

FirstHalfOfDomPointsTeam
◦ Evenly distribute all bots to go to the first half of domination points

SecondHalfOfDomPointsTeam
◦ Evenly distribute all bots to go to the second half of domination points

DOM: Meet the easy teams!
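The three easy teams are fixed bot-to-point assignments. A rough sketch, assuming zero-based indices (so "domination point 1" is index 0), round-robin as the meaning of "evenly distribute", and integer division to split the points in half; none of these details come from the DOM source code.

```python
def dom_one_hugger(num_bots, num_doms):
    # Every bot goes to the first domination point (index 0 in this sketch).
    return [0] * num_bots

def first_half(num_bots, num_doms):
    # Round-robin over the first half of the points (at least one point).
    half = list(range(max(1, num_doms // 2)))
    return [half[bot % len(half)] for bot in range(num_bots)]

def second_half(num_bots, num_doms):
    # Round-robin over the remaining points.
    half = list(range(num_doms // 2, num_doms))
    return [half[bot % len(half)] for bot in range(num_bots)]
```

Each function returns one destination index per bot, so `first_half(4, 4)` sends bots 0 and 2 to point 0 and bots 1 and 3 to point 1.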

EachBotToOneDomTeam
◦ Send each bot to a different domination point

GreedyDistanceTeam
◦ Send all bots to their closest unowned domination point

SmartOpportunisticTeam
◦ Sends each bot to a different unowned domination point

DOM: Meet the tough teams…

DOM PICTURE

Questions about DOM Game?

What was all that stochastic policy nonsense you were talking about?
◦ Each case in JuKeCB stores two stochastic policies: WinningStrategy and LosingStrategy
◦ JuKeCB can employ a winning strategy against similar losing strategies (example: SecondHalf beats Dom1Hugger)

Why use stochastic policies at all?
◦ Why not plans, single actions, scripts, etc.?

Meanwhile… Back at JuKeCB

Feature selection can be very difficult
◦ It took us almost a full semester to get a set of features that seemed to work

Each feature must supply information about the strategy or game state
◦ Do not include unnecessary information

Each feature must be reproducible
◦ Able to reproduce similar results when rerun

As a whole, features must completely identify a strategy and the game state (ideally)

So what makes a good Policy?

SCREENSHOT OF DOM GAME

Brainstorm!

All features are based on a timeframe or window

Domination Point Destinations
◦ Probability each bot went to specific dom points

Unowned
◦ Probability each bot went to an unowned point

Closest
◦ Probability each bot went to its closest point

Score Difference
◦ Difference in scores during the time window

Is that enough?

Here’s what we came up with
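The destination, unowned, and closest features are frequencies over the observation window. A minimal sketch for a single bot, assuming a hypothetical observation format of `(destination, was_unowned, was_closest)` tuples recorded each time the bot picks a destination; the function name and dictionary keys are illustrative only.

```python
from collections import Counter

def window_features(observations, num_doms):
    """Turn one bot's observations in a window into probability features.

    observations: non-empty list of (destination, was_unowned, was_closest)
    tuples (an assumed format, not taken from JuKeCB).
    """
    n = len(observations)
    dest_counts = Counter(dest for dest, _, _ in observations)
    return {
        # Probability the bot went to each specific dom point
        "dest": [dest_counts[d] / n for d in range(num_doms)],
        # Probability the bot went to an unowned point
        "to_unowned": sum(1 for _, unowned, _ in observations if unowned) / n,
        # Probability the bot went to its closest point
        "to_closest": sum(1 for _, _, closest in observations if closest) / n,
    }
```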

The DomOneHugger issue
◦ Who owns those other points?
◦ Not enough information on game state
◦ Makes us think the strategy is similar when in fact it is not

Needed more features!
◦ Domination Point Held Ratios: the probability that Team0 held a given dom point

Still not perfect, but good enough

Features Failure
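The held-ratio feature can be read as the fraction of turns in the window during which a team owned each point. A small sketch under that reading; the per-turn ownership list is an assumed input format, and `None` marks an unowned point.

```python
def held_ratios(owner_per_turn, num_doms, team=0):
    """Fraction of turns in the window that `team` held each dom point.

    owner_per_turn: non-empty list with one entry per turn; each entry is a
    list giving the owning team id (or None) for every dom point.
    """
    turns = len(owner_per_turn)
    return [
        sum(1 for owners in owner_per_turn if owners[d] == team) / turns
        for d in range(num_doms)
    ]
```

With this feature, two windows in which the enemy plays DomOneHugger no longer look identical if the other points changed hands differently.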

Retain
◦ Observe game state
◦ Store in case base

Retrieve
◦ Observe game state
◦ Forward similar case to JuKeCBTeam for reuse

Reuse
◦ Enact strategy found in case

The Case Base Cycle

JuKeCB does all of its learning by observation

Game Window Monitoring
◦ Most features are built over the course of the window: DomPointDest, DomHeldRatios, Unowned/Closest
◦ Some features are created at the very end: DeltaScore
◦ Some are static for a game: Num Domination points, Num Bots per team, Dom Point Distances

Observation: Retain

Once the window ends, the case is created

Retain Continued

JuKeCB uses a three-stage retrieval process to reduce search time

Stage One
◦ Runs only at game start
◦ Remove all cases that do not pertain to the map

Stage Two
◦ Runs at every retrieve update
◦ Get all cases that are 95% similar or higher

Stage Three
◦ Runs after stage two
◦ Gets the case with the highest delta score

Case Retrieval

Left out some features earlier…
◦ Number of Domination points
◦ Number of Bots per team
◦ Distance between Domination points

These features only pertain to Stage One similarity

Temporarily remove all cases that
◦ Have a different number of dom points
◦ Have a different number of bots per team
◦ Do not have a similar set of dom point distances

Stage One
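Stage One amounts to a filter over the case base. The sketch below assumes cases are dictionaries with `num_doms`, `num_bots`, and `dom_distances` keys, and it interprets "similar set of dom point distances" as sorted distances agreeing within a relative tolerance. The slides do not define that test, so the `tolerance` parameter is purely an assumption.

```python
def stage_one(cases, num_doms, num_bots, dom_distances, tolerance=0.1):
    """Keep only the cases that pertain to the current map."""

    def distances_similar(a, b):
        # Assumed test: same count, and each sorted pair within `tolerance`.
        if len(a) != len(b):
            return False
        return all(
            abs(x - y) <= tolerance * max(x, y, 1e-9)
            for x, y in zip(sorted(a), sorted(b))
        )

    return [
        case for case in cases
        if case["num_doms"] == num_doms
        and case["num_bots"] == num_bots
        and distances_similar(case["dom_distances"], dom_distances)
    ]
```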

Responsible for finding cases pertinent to the situation

Computes the similarity between the enemy’s current strategy and all stored losing strategies

If no case is more than 60% similar, run randomly

If no case is more than 95% similar, return most similar case

Otherwise, return all cases more than 95% similar

Stage Two
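The threshold logic of Stage Two can be sketched as a function over precomputed similarities. The dictionary input and the convention that an empty result means "act randomly" are assumptions made for this example.

```python
def stage_two(similarities, low=0.60, high=0.95):
    """Select candidate cases from a dict of case_id -> similarity in [0, 1].

    Returns a list of case ids; an empty list means no case is similar
    enough and the team should act randomly.
    """
    if not similarities or max(similarities.values()) <= low:
        return []  # no case is more than 60% similar
    above = [cid for cid, s in similarities.items() if s > high]
    if above:
        return above  # all cases more than 95% similar
    # Otherwise fall back to the single most similar case
    return [max(similarities, key=similarities.get)]
```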

Compares only the following features
◦ Dom point destinations (per bot)
◦ ToUnowned (per bot)
◦ ToClosest (per bot)
◦ Dom Held Ratios

All features are real numbers

Similarity formula:
◦ Weight*( 1-|(V1-V2)/(VMax-VMin)| )
◦ VMax and VMin are 1 and 0 for all features

Stage Two Similarity
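The per-feature formula translates directly into code. The function name is illustrative; the defaults reflect the slide's statement that VMax and VMin are 1 and 0 for all features.

```python
def feature_similarity(v1, v2, weight, v_min=0.0, v_max=1.0):
    """Weight * (1 - |(V1 - V2) / (VMax - VMin)|) for one feature."""
    return weight * (1 - abs((v1 - v2) / (v_max - v_min)))
```

Two identical values contribute the full weight, while values at opposite ends of the range contribute nothing. For example, probabilities 0.2 and 0.6 with weight 0.5 give 0.5 * (1 - 0.4) = 0.3.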

Weights were something we tweaked for a long time

Currently, they are as follows (per-feature weight in parentheses)
◦ Destinations: 40% (.4/numdompoints/numbots)
◦ Dom Point Ratios: 20% (.2/numdompoints)
◦ ToUnowned: 20% (.2/numbots)
◦ ToClosest: 20% (.2/numbots)

Could probably still be further refined

Stage Two Similarity Cont
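Summing the weighted per-feature similarities gives one score per case. The sketch below assumes each strategy is a dictionary with per-bot `dest`, `to_unowned`, and `to_closest` entries and a per-point `held` list. It also assumes the per-feature weight for held ratios is 0.2 divided by the number of dom points, so that each feature group adds up to its stated percentage.

```python
def case_similarity(a, b, num_doms, num_bots):
    """Weighted similarity in [0, 1] between two observed strategies."""

    def sim(v1, v2, weight):
        return weight * (1 - abs(v1 - v2))  # VMax - VMin = 1 for all features

    w_dest = 0.4 / num_doms / num_bots  # destinations: 40% in total
    w_held = 0.2 / num_doms             # held ratios: 20% in total (assumed split)
    w_unowned = 0.2 / num_bots          # ToUnowned: 20% in total
    w_closest = 0.2 / num_bots          # ToClosest: 20% in total

    total = 0.0
    for bot in range(num_bots):
        for d in range(num_doms):
            total += sim(a["dest"][bot][d], b["dest"][bot][d], w_dest)
        total += sim(a["to_unowned"][bot], b["to_unowned"][bot], w_unowned)
        total += sim(a["to_closest"][bot], b["to_closest"][bot], w_closest)
    for d in range(num_doms):
        total += sim(a["held"][d], b["held"][d], w_held)
    return total
```

Under these weights, two strategies that agree on everything except fully opposite held ratios come out 80% similar.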

Only run if stage two returns more than one case

Looks at the DeltaScoreWinning feature of each returned case

Return case with highest score

This prevents a perfectly similar case from beating a very similar case that had superior results

Stage Three
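Stage Three is a simple maximum over the candidates returned by Stage Two. In this sketch each candidate is a hypothetical `(case_id, delta_score_winning)` pair.

```python
def stage_three(candidates):
    """Return the id of the candidate with the highest DeltaScoreWinning.

    candidates: non-empty list of (case_id, delta_score_winning) pairs.
    """
    if len(candidates) == 1:
        return candidates[0][0]  # nothing to compare
    return max(candidates, key=lambda pair: pair[1])[0]
```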

JuKeCBTeam receives a strategy as input
◦ Not a full case, just the winning policy of the case

Uses a random number generator to follow the given distribution as closely as possible
◦ Randomly roll numbers for each feature and act accordingly
◦ Rank each destination by how many criteria it meets
◦ If tied, choose one at random

Reuse: JuKeCB Team

Bot0:
◦ To Dom0: 20% -- Is owned
◦ To Dom1: 20% -- Is unowned
◦ To Dom2: 60% -- Is owned
◦ To Unowned: 80%
◦ To Closest: 10%

Dom Roll (0-100): 68
Unowned Roll (0-100): 27
Closest Roll (0-100): 92

Dom0 score = 0
Dom1 score = 1
Dom2 score = 1

Reuse: Example
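One reading of this example that reproduces the slide's scores: the Dom roll selects a destination by cumulative percentage, the Unowned and Closest rolls each succeed when they fall below their percentage, and every satisfied criterion adds one point to the matching destinations. The sketch below follows that reading. The function names are invented, and the bot's closest point is not given on the slide, so Dom0 is assumed in the usage below (it does not affect the result because the Closest roll fails).

```python
import random

def rank_destinations(dest_percents, unowned, closest_dom,
                      pct_unowned, pct_closest,
                      dom_roll, unowned_roll, closest_roll):
    """Score each dom point by how many rolled criteria it meets."""
    scores = [0] * len(dest_percents)
    # Destination roll: find the dom whose cumulative range contains the roll
    cumulative = 0
    for dom, pct in enumerate(dest_percents):
        cumulative += pct
        if dom_roll < cumulative:
            scores[dom] += 1
            break
    # Unowned roll: on success, every unowned dom meets this criterion
    if unowned_roll < pct_unowned:
        for dom, is_unowned in enumerate(unowned):
            if is_unowned:
                scores[dom] += 1
    # Closest roll: on success, the bot's closest dom meets this criterion
    if closest_roll < pct_closest:
        scores[closest_dom] += 1
    return scores

def choose(scores, rng=random):
    """Pick the top-ranked destination, breaking ties at random."""
    best = max(scores)
    return rng.choice([dom for dom, s in enumerate(scores) if s == best])
```

With the slide's numbers, `rank_destinations([20, 20, 60], [False, True, False], 0, 80, 10, 68, 27, 92)` yields scores of 0, 1, and 1, so `choose` picks Dom1 or Dom2 at random.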

At the end of every game, JuKeCB compiles the list of all recently created cases

Attempts to add them to the case base
◦ If no case is 95% similar, add it
◦ If a case is 95% similar and the new case has a higher delta score, swap them

On demand, run a full check
◦ Over time, swapping cases can cause redundant cases in the case base. Running a full check can be very time intensive

Maintenance
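The end-of-game update can be sketched as below. It assumes cases are dictionaries with a `delta_score` key, takes the similarity function as a parameter, and compares each new case against the first stored case that reaches the threshold. That last choice is an assumption, since the slides do not say which match is used when several qualify.

```python
def maintain(case_base, new_cases, similarity, threshold=0.95):
    """Add or swap recently created cases into the case base (in place)."""
    for new in new_cases:
        match = None
        for i, old in enumerate(case_base):
            if similarity(old, new) >= threshold:
                match = i  # first sufficiently similar stored case
                break
        if match is None:
            case_base.append(new)  # nothing 95% similar: add it
        elif new["delta_score"] > case_base[match]["delta_score"]:
            case_base[match] = new  # similar but better: swap them
    return case_base
```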

Able to beat all ‘easy’ teams with ease
◦ DomOneHugger
◦ FirstHalfDomPoints
◦ SecondHalfDomPoints

Able to win or be competitive against ‘hard’ teams
◦ EachBotToOneDom
◦ GreedyDistance
◦ SmartOpportunistic

Performance

The following results were obtained on this map
◦ IMAGE OF MAP

Performance

Untrained GreedyDistance

[Chart: GreedyDistanceTeam vs JuKeCB (Untrained). X axis: Game Time (Turns). Y axis: Game Score (Points). Series: Greedy, JuKeCB.]

Trained GreedyDistance

[Chart. Series: Greedy, JuKeCB.]

Untrained DynamicTeam

[Chart: DynamicTeam vs JuKeCB (Untrained). X axis: Game Time (Turns). Y axis: Game Score (Points). Series: DynamicTeam, JuKeCB.]

Trained DynamicTeam

[Chart: DynamicTeam vs JuKeCB (Trained). X axis: Game Time (Turns). Y axis: Game Score (Points). Series: Dynamic, JuKeCB.]

Retrieval can take a long time
◦ Num Cases: 129, Average Time Taken: 221ms
◦ Num Cases: 258, Average Time Taken: 721ms
◦ Num Cases: 516, Average Time Taken: 3,063ms
◦ Num Cases: 1032, Average Time Taken: 12,231ms

Can’t beat SmartOpportunistic
◦ We lack the features to properly define its strategy
◦ No ‘defend’ features
◦ All cases appear to be random

Problems

Parallelizing retrieval
◦ Direct speed up by using more CPUs

Clustering the Case Base
◦ Greedy clustering
◦ Similarity clustering

Using Asynchronous retrieval
◦ Hide the delay

Additional Work

Speed up retrieval by dividing up the parts of the case base needed for any given retrieval

Greedy Clustering
◦ Create new ‘clusters’ depending on the greedy policy (e.g. 010: Bot0-Dom0, Bot1-Dom1, Bot2-Dom0)
◦ This clustering scheme got us very poor results
◦ Too much data loss

Clustering JuKeCB

Similarity Clustering
◦ Each cluster gets a representative case
◦ New cases are added to a cluster if the similarity is over a certain threshold
◦ New clusters are created if no similar clusters are found

Quite good results
◦ Moderate speedup (sorry, can’t find numbers!)
◦ Only slight performance drop

Clustering JuKeCB
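Similarity clustering can be sketched as follows. It assumes the first case placed in a cluster serves as its representative and uses a placeholder threshold of 0.9; the slides state neither detail.

```python
def add_to_clusters(clusters, case, similarity, threshold=0.9):
    """Add `case` to the first cluster whose representative is similar enough.

    clusters: list of {"rep": case, "members": [cases]} dicts, modified in
    place. A new cluster is created when no representative passes the
    threshold.
    """
    for cluster in clusters:
        if similarity(cluster["rep"], case) > threshold:
            cluster["members"].append(case)
            return cluster
    new_cluster = {"rep": case, "members": [case]}
    clusters.append(new_cluster)
    return new_cluster
```

At retrieval time, only the members of clusters whose representative resembles the query would need to be compared, which is where the speedup would come from.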

Divide up the Case Base into X number of chunks
◦ X = number of processors on the machine
◦ Have each processor run stage two on its own chunk
◦ Run stage three on the results of all chunks

Speedup was almost optimal
◦ OldRetrievalTime/NumProcessors

Parallelizing JuKeCB
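The chunk-and-merge structure looks roughly like this. The sketch uses Python's `ThreadPoolExecutor` to keep the example short and portable. In CPython a real CPU-bound speedup would need processes (for example `ProcessPoolExecutor`) rather than threads, so treat this as an illustration of the structure only. The case format and the similarity function are assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def chunked(items, n):
    """Split items into n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def parallel_retrieve(case_base, query, similarity, workers=None, high=0.95):
    """Run stage two per chunk in parallel, then stage three on the merge."""
    workers = workers or os.cpu_count() or 1

    def stage_two(chunk):
        return [case for case in chunk if similarity(case, query) > high]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(stage_two, chunked(case_base, workers)))
    candidates = [case for part in partial for case in part]
    if not candidates:
        return None
    # Stage three: highest delta score among all chunk results
    return max(candidates, key=lambda case: case["delta_score"])
```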

Hide the retrieval delay by running retrieval in a new thread

Works only if the game is running at ‘human playable speed’

Gets near identical results to normal system with no visible delay

Similar to parallelizing, sacrifice slight speed up gain for better responsiveness

Asynchronous Retrieval
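A minimal sketch of asynchronous retrieval: the game loop keeps acting on the most recently retrieved strategy while a background thread computes the next one. The class and method names are invented for this example, and `retrieve` stands in for the slow retrieval function.

```python
import threading

class AsyncRetriever:
    """Run a slow retrieval function in the background."""

    def __init__(self, retrieve, initial=None):
        self._retrieve = retrieve  # slow retrieval function (hypothetical)
        self._latest = initial     # strategy used until a result arrives
        self._lock = threading.Lock()
        self._thread = None

    def request(self, game_state):
        """Start a retrieval unless one is already running."""
        if self._thread is not None and self._thread.is_alive():
            return False
        self._thread = threading.Thread(
            target=self._run, args=(game_state,), daemon=True
        )
        self._thread.start()
        return True

    def _run(self, game_state):
        result = self._retrieve(game_state)
        with self._lock:
            self._latest = result

    def current(self):
        """Return the latest finished result without blocking on retrieval."""
        with self._lock:
            return self._latest

    def wait(self):
        """Block until the running retrieval, if any, has finished."""
        if self._thread is not None:
            self._thread.join()
```

The game loop would call `request` at each retrieve update and `current` whenever it needs a strategy, so it never waits on the case base.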

Combining all the previous methods into one ultra fast case based reasoning machine!
◦ A clustered case base whose retrieval is done asynchronously in parallel

Optimising JuKeCB, JuKeCBTeam, and DOM Game
◦ Some things were coded somewhat sloppily and could easily be improved, such as the reuse phase

Adding more features
◦ As discussed earlier, we do not have enough features to properly define some strategies

Possible additional work

Overall JuKeCB was a great system for me to work on

It gave me substantial knowledge in the CBR field

A paper, “Imitating Inscrutable Enemies: Learning from Stochastic Policy Observation, Retrieval and Reuse”, was published and presented at the ICCBR 2010 conference

….I swear I did not name the system….

Closing/Questions
