JuKeCB
Justin Karneeb
A Case Based Reasoning system developed for use in DOM, a Domination style game
A Research Project which has been under development over the past three years by:
◦ Justin Karneeb
◦ Kellen Gillespie
◦ Stephan Lee-Urban
◦ Professor Munoz Avila
What is JuKeCB?
JuKeCB is a CBR system that learns stochastic policies by observation
◦ Stochastic Policy: A non-deterministic “strategy”
Imitates winning strategies that have been observed in the past to win future battles
Continues to learn as it observes more games
How about some more detail
In order to understand JuKeCB, you first need to understand the game it is playing.
SCREEN SHOT OF DOM
A short aside: DOM Game
DOM is a Domination style game
◦ Team based gameplay
◦ Scoring based on holding or “dominating” key points on the map
◦ Easily visible abstract strategies
Basic Strategy
◦ Capture enemy Dom points
◦ Defend owned Dom points
◦ Own more Dom points than opponent
DOM: The Rules
Score is updated every five game turns
◦ Each team is awarded 1*NumberDomPointsHeld points
Two possible game modes
◦ Score Limit: Game ends when one team’s score exceeds X points
◦ Turn Limit: Game ends when X number of turns have passed
DOM: Winning
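The scoring rule and the two game modes can be sketched in Python. This is only an illustration under assumptions: the function names, the team ids 0 and 1, and the representation of ownership as one list of owner ids per turn are hypothetical and not taken from the DOM code.

```python
SCORE_INTERVAL = 5  # score is updated every five game turns


def award_points(scores, owners):
    """Add 1 * NumberDomPointsHeld to each team's score.

    owners holds the owning team id of each dom point (None = unowned).
    """
    for owner in owners:
        if owner is not None:
            scores[owner] += 1
    return scores


def play(ownership_by_turn, score_limit=None, turn_limit=None):
    """Replay per-turn ownership lists and return the final scores."""
    scores = {0: 0, 1: 0}
    for turn, owners in enumerate(ownership_by_turn, start=1):
        if turn % SCORE_INTERVAL == 0:
            award_points(scores, owners)
        # Score Limit mode: stop once one team's score exceeds the limit.
        if score_limit is not None and max(scores.values()) > score_limit:
            break
        # Turn Limit mode: stop after the given number of turns.
        if turn_limit is not None and turn >= turn_limit:
            break
    return scores
```

For example, a team holding two of three points for ten turns scores twice, at turns 5 and 10, gaining two points each time.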
DomOneHuggerTeam
◦ All Bots go to domination Point 1
FirstHalfOfDomPointsTeam
◦ Evenly distribute all bots to go to the first half of domination points
SecondHalfOfDomPointsTeam
◦ Evenly distribute all bots to go to the second half of domination points
DOM: Meet the easy teams!
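A rough sketch of the two "half" teams, assuming dom points are numbered from 0 and bots are spread round-robin. How the real teams split an odd number of points is not stated on the slide; here the middle point goes to the second half, and all function names are hypothetical.

```python
def distribute(num_bots, targets):
    """Assign bots to targets round-robin so counts differ by at most one."""
    return [targets[bot % len(targets)] for bot in range(num_bots)]


def first_half_targets(num_dom_points):
    """Indices of the first half of the dom points (at least one point)."""
    return list(range(max(1, num_dom_points // 2)))


def second_half_targets(num_dom_points):
    """Indices of the second half of the dom points."""
    return list(range(num_dom_points // 2, num_dom_points))
```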
EachBotToOneDomTeam
◦ Send each bot to a different domination point
GreedyDistanceTeam
◦ Send all bots to their closest unowned domination point
SmartOpportunisticTeam
◦ Sends each bot to a different unowned domination point
DOM: Meet the tough teams…
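The GreedyDistance idea can be sketched as follows. Two assumptions are made that the slide does not confirm: positions are 2D coordinates compared by straight-line distance, and "unowned" means "not owned by the bot's own team". The function name is hypothetical.

```python
import math


def greedy_distance_targets(bot_positions, dom_positions, owners, team):
    """For each bot, pick the closest dom point not owned by `team`.

    Returns None for a bot when the team already owns every point.
    """
    candidates = [i for i, owner in enumerate(owners) if owner != team]
    targets = []
    for bot in bot_positions:
        if not candidates:
            targets.append(None)
            continue
        targets.append(min(candidates, key=lambda i: math.dist(bot, dom_positions[i])))
    return targets
```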
DOM PICTURE
Questions about DOM Game?
What was all that stochastic policy nonsense you were talking about?
◦ Each case in JuKeCB stores two stochastic policies
   WinningStrategy
   LosingStrategy
◦ JuKeCB can employ a winning strategy against similar losing strategies
   Example: SecondHalf beats Dom1Hugger
Why use stochastic policies at all?
◦ Why not plans/single actions/scripts, etc.?
Meanwhile… Back at JuKeCB
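To make the idea of a case holding two stochastic policies concrete, here is a minimal sketch. It simplifies a policy to a single probability distribution over destinations; the class and field names are hypothetical, and the real JuKeCB case holds more features than this.

```python
import random
from dataclasses import dataclass
from typing import Dict

Policy = Dict[str, float]  # destination name -> probability


@dataclass
class Case:
    winning_strategy: Policy  # what the winner was observed doing
    losing_strategy: Policy   # what the loser was observed doing


def sample_destination(policy, rng):
    """Draw one destination according to the policy's probabilities."""
    names = list(policy)
    weights = [policy[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Unlike a fixed plan or script, sampling from the policy can give a different destination on each call, which is what makes the strategy non-deterministic.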
Feature selection can be very difficult
◦ It took us almost a full semester to get a set of features that seemed to work
Each feature must supply information about the strategy or game state
◦ Do not include unnecessary information
Each feature must be reproducible
◦ Able to reproduce similar results when run again
As a whole, features must completely identify a strategy and the game state (ideally)
So what makes a good Policy?
SCREENSHOT OF DOM GAME
Brainstorm!
All features based on a timeframe or window
Domination Point Destinations
◦ Probability each bot went to specific dom points
Unowned
◦ Probability each bot went to an unowned point
Closest
◦ Probability each bot went to its closest point
Score Difference
◦ Difference in scores during the time window
Is that enough?
Here’s what we came up with
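The window features can be read as simple frequencies over what was observed. The sketch below assumes a hypothetical log format, one (destination, was_unowned, was_closest) tuple per observed destination choice of a single bot, and reads the score difference as the observed team's gain minus the opponent's gain during the window. Both are assumptions, not the JuKeCB implementation.

```python
from collections import Counter


def window_features(observations, num_dom_points):
    """Turn one bot's observed moves in a window into probabilities."""
    if not observations:
        raise ValueError("window contains no observations")
    n = len(observations)
    dest_counts = Counter(dest for dest, _, _ in observations)
    return {
        # Probability the bot went to each specific dom point.
        "dest": [dest_counts[d] / n for d in range(num_dom_points)],
        # Probability the bot went to an unowned point.
        "to_unowned": sum(1 for _, unowned, _ in observations if unowned) / n,
        # Probability the bot went to its closest point.
        "to_closest": sum(1 for _, _, closest in observations if closest) / n,
    }


def delta_score(own_start, own_end, enemy_start, enemy_end):
    """Score difference over the window: own gain minus enemy gain."""
    return (own_end - own_start) - (enemy_end - enemy_start)
```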
The DomOneHugger issue
◦ Who owns those other points?
◦ Not enough information on game state
◦ Makes us think the strategy is similar when in fact it is not
Needed more features!
◦ Domination Point Held Ratios
   The probability that Team0 held a given dom point
Still not perfect, but good enough
Features Failure
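A held ratio can be estimated as the fraction of observed turns in which Team0 owned a point. The sketch assumes ownership is logged as one list of owner ids per turn (a hypothetical format).

```python
def dom_held_ratios(ownership_by_turn, team=0):
    """Fraction of turns in the window that `team` held each dom point."""
    if not ownership_by_turn:
        raise ValueError("window contains no turns")
    turns = len(ownership_by_turn)
    num_points = len(ownership_by_turn[0])
    return [
        sum(1 for owners in ownership_by_turn if owners[point] == team) / turns
        for point in range(num_points)
    ]
```

Two windows in which every bot heads for the same point now look different if the other points changed hands, which is the information the destination features alone were missing.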
Retain
◦ Observe game state
◦ Store in case base
Retrieve
◦ Observe game state
◦ Forward similar case to JuKeCBTeam for reuse
Reuse
◦ Enact strategy found in case
The Case Base Cycle
JuKeCB does all of its learning by observation
Game Window Monitoring
◦ Most features built over the course of the window
   DomPointDest
   DomHeldRatios
   Unowned/Closest
◦ Some features are created at the very end
   DeltaScore
◦ Some are static for a game
   Num Domination points
   Num Bots per team
   Dom Point Distances
Observation: Retain
Once the window ends, the case is created
Retain Continued
JuKeCB uses a three-stage retrieval process to reduce search time
Stage One
◦ Runs only at game start
◦ Remove all cases that do not pertain to the map
Stage Two
◦ Runs at every retrieve update
◦ Get all cases that are 95% similar or higher
Stage Three
◦ Runs after stage two
◦ Gets the case with the highest delta score
Case Retrieval
Left out some features earlier…
◦ Number of Domination points
◦ Number of Bots per team
◦ Distance between Domination points
These features only pertain to Stage One similarity
Temporarily remove all cases that
◦ Have a different number of dom points
◦ Have a different number of bots per team
◦ Do not have a similar set of dom point distances
Stage One
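Stage One is essentially a filter on the static map features. A minimal sketch, assuming cases are dictionaries with hypothetical keys and that "similar" distances means "equal within a relative tolerance" (the slides do not say how distance similarity is measured, so the tolerance is an assumption):

```python
def distances_similar(a, b, tolerance):
    """True if two equally long distance lists match within a relative tolerance."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tolerance * max(abs(x), abs(y)) for x, y in zip(a, b))


def stage_one(cases, num_dom_points, num_bots, distances, tolerance=0.1):
    """Keep only the cases that pertain to the current map."""
    return [
        case
        for case in cases
        if case["num_dom_points"] == num_dom_points
        and case["num_bots"] == num_bots
        and distances_similar(case["distances"], distances, tolerance)
    ]
```

Because these features are static for a game, the filter only needs to run once, at game start.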
Responsible for finding cases pertinent to the situation
Computes similarity between the enemy’s current strategy and all losing strategies
If no case is more than 60% similar, run randomly
If no case is more than 95% similar, return most similar case
Otherwise, return all cases more than 95% similar
Stage Two
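The three outcomes of Stage Two can be sketched as below. The similarity function is passed in as a parameter, cases are dictionaries with a hypothetical "losing" key, and an empty list is used here as the signal to "run randomly"; these are choices made for the sketch only.

```python
def stage_two(cases, enemy_strategy, similarity, low=0.60, high=0.95):
    """Return the cases pertinent to the enemy's current strategy.

    []            -> no case is more than `low` similar: act randomly
    [best_case]   -> no case is more than `high` similar: most similar case
    [case, ...]   -> all cases more than `high` similar
    """
    scored = [(similarity(enemy_strategy, case["losing"]), case) for case in cases]
    if not scored:
        return []
    best_sim, best_case = max(scored, key=lambda pair: pair[0])
    if best_sim <= low:
        return []
    close = [case for sim, case in scored if sim > high]
    return close if close else [best_case]
```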
Compares only the following features
◦ Dom point destinations (per bot)
◦ ToUnowned (per bot)
◦ ToClosest (per bot)
◦ Dom Held Ratios
All features are real numbers
Similarity formula:
◦ Weight*( 1-|(V1-V2)/(VMax-VMin)| )
◦ VMax and VMin are 1 and 0 for all features
Stage Two Similarity
Weights were something we tweaked for a long time
Currently, they are as follows
◦ Destinations: 40% (.4/numdompoints/numbots)
◦ Dom Point Ratios: 20% (.2/numdompoints)
◦ ToUnowned: 20% (.2/numbots)
◦ ToClosest: 20% (.2/numbots)
Could probably still be further refined
Stage Two Similarity Cont
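Putting the per-feature formula and the weights together gives a weighted sum. The sketch below assumes a strategy is a dictionary with hypothetical keys, and it assumes each group's share is split evenly over that group's features so that all weights sum to 1 (so the held-ratio weight is taken as .2 divided by the number of dom points).

```python
def feature_similarity(v1, v2, weight, v_max=1.0, v_min=0.0):
    """Weight*( 1-|(V1-V2)/(VMax-VMin)| ) for a single feature."""
    return weight * (1 - abs((v1 - v2) / (v_max - v_min)))


def stage_two_similarity(a, b):
    """Weighted similarity in [0, 1] between two observed strategies."""
    num_bots = len(a["dest"])
    num_dom = len(a["held"])
    total = 0.0
    for bot in range(num_bots):
        for dom in range(num_dom):
            # Destinations: 40% spread over every (bot, dom point) pair.
            total += feature_similarity(
                a["dest"][bot][dom], b["dest"][bot][dom], 0.4 / num_dom / num_bots
            )
        # ToUnowned and ToClosest: 20% each, spread over the bots.
        total += feature_similarity(a["to_unowned"][bot], b["to_unowned"][bot], 0.2 / num_bots)
        total += feature_similarity(a["to_closest"][bot], b["to_closest"][bot], 0.2 / num_bots)
    for dom in range(num_dom):
        # Dom held ratios: 20% spread over the dom points.
        total += feature_similarity(a["held"][dom], b["held"][dom], 0.2 / num_dom)
    return total
```

With these weights, two identical strategies score 1.0, and two strategies that differ only in who held the points can lose at most 0.2.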
Only run if stage two returns more than one case
Looks at the DeltaScoreWinning feature of each returned case
Return case with highest score
This prevents a perfectly similar case from beating a very similar case with superior results
Stage Three
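Stage Three is a tie-break among the Stage Two candidates. A minimal sketch, assuming each case is a dictionary that carries its DeltaScoreWinning value under a hypothetical key:

```python
def stage_three(candidates):
    """Pick the candidate with the highest winning delta score.

    Returns None for no candidates and the single case when there is only one,
    since the comparison is only needed for more than one case.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return max(candidates, key=lambda case: case["delta_score_winning"])
```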
JuKeCBTeam receives as input a strategy
◦ Not a full case, just the winning policy of the case
Uses a random number generator to try to follow the given distribution as best as possible
◦ Randomly roll numbers for each feature and act accordingly
◦ Rank each destination by how many criteria it meets
◦ If tied, choose one at random
Reuse: JuKeCB Team
Bot0:
◦ To Dom0: 20% -- Is owned
◦ To Dom1: 20% -- Is unowned
◦ To Dom2: 60% -- Is owned
◦ To Unowned: 80%
◦ To Closest: 10%
Dom Roll (0-100): 68
Unowned Roll (0-100): 27
Closest Roll (0-100): 92
Dom0 score = 0
Dom1 score = 1
Dom2 score = 1
Reuse: Example
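One reading of the rolls that reproduces the scores in this example: the Dom roll selects a destination through the cumulative percentages (0-20 Dom0, 20-40 Dom1, 40-100 Dom2), a roll below the ToUnowned or ToClosest percentage awards a point to the matching destinations, and a failed roll awards nothing. The sketch below implements that reading; the function names and boundary handling are assumptions, and since the example does not say which point is closest, the caller supplies it.

```python
import random


def rank_destinations(dest_pct, owned, closest_dom, to_unowned_pct,
                      to_closest_pct, dom_roll, unowned_roll, closest_roll):
    """Score each destination by how many rolled criteria it meets."""
    scores = [0] * len(dest_pct)
    cumulative = 0
    for dom, pct in enumerate(dest_pct):
        cumulative += pct
        if dom_roll < cumulative:  # destination picked by the Dom roll
            scores[dom] += 1
            break
    if unowned_roll < to_unowned_pct:  # roll says "go to an unowned point"
        for dom, is_owned in enumerate(owned):
            if not is_owned:
                scores[dom] += 1
    if closest_roll < to_closest_pct:  # roll says "go to the closest point"
        scores[closest_dom] += 1
    return scores


def choose_destination(scores, rng):
    """Pick the top-ranked destination, breaking ties at random."""
    best = max(scores)
    return rng.choice([dom for dom, score in enumerate(scores) if score == best])
```

With the values above, the Dom roll of 68 lands on Dom2, the Unowned roll of 27 is under 80 so unowned Dom1 gets a point, and the Closest roll of 92 fails, leaving Dom1 and Dom2 tied.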
At the end of every game, JuKeCB compiles the list of all recently created cases
Attempts to add them to the case base
◦ If no case is 95% similar, add it
◦ If a case is 95% similar and the new case has a higher delta score, swap them
On demand, run a full check
◦ Over time, swapping cases can cause redundant cases in the case base
◦ Running a full check can be very time intensive
Maintenance
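The add-or-swap rule can be sketched as follows. The similarity function is passed in, cases are dictionaries with a hypothetical "delta_score" key, and comparing only against the first sufficiently similar case found is an assumption of this sketch.

```python
def maintain(case_base, new_case, similarity, threshold=0.95):
    """Add `new_case`, or swap it in for a similar case with a lower delta score."""
    for index, old_case in enumerate(case_base):
        if similarity(old_case, new_case) >= threshold:
            if new_case["delta_score"] > old_case["delta_score"]:
                case_base[index] = new_case  # swap: same situation, better result
            return case_base
    case_base.append(new_case)  # nothing similar enough: retain the new case
    return case_base
```

Since a swap replaces one case without re-checking it against the others, near-duplicates can build up over time, which is what the on-demand full check is meant to clean out.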
Able to beat all ‘easy’ teams with ease
◦ DomOneHugger
◦ FirstHalfDomPoints
◦ SecondHalfDomPoints
Able to win or be competitive against ‘hard’ teams
◦ EachBotToOneDom
◦ GreedyDistance
◦ SmartOpportunistic
Performance
The following results were run on this map◦ IMAGE OF MAP
Performance
Untrained GreedyDistance
CHART: GreedyDistanceTeam vs JuKeCB (Untrained). Series: Greedy, JuKeCB. X-axis: Game Time (Turns). Y-axis: Game Score (Points), 0 to 25000.
Trained GreedyDistance
CHART: Series: Greedy, JuKeCB. Y-axis ticks run from 0 to 20000.
Untrained DynamicTeam
CHART: DynamicTeam vs JuKeCB (Untrained). Series: DynamicTeam, JuKeCB. X-axis: Game Time (Turns). Y-axis: Game Score (Points), 0 to 25000.
Trained DynamicTeam
CHART: DynamicTeam vs JuKeCB (Trained). Series: Dynamic, JuKeCB. X-axis: Game Time (Turns). Y-axis: Game Score (Points), 0 to 25000.
Retrieval can take a long time
◦ Num Cases: 129, Average Time Taken: 221ms
◦ Num Cases: 258, Average Time Taken: 721ms
◦ Num Cases: 516, Average Time Taken: 3,063ms
◦ Num Cases: 1032, Average Time Taken: 12,231ms
Can’t beat SmartOpportunistic
◦ We lack the features to properly define its strategy
◦ No ‘defend’ features
◦ All cases appear to be random
Problems
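A quick calculation on the retrieval timings shows how fast the cost grows: each doubling of the case base multiplies the average time by roughly 3.3x, 4.2x and 4.0x, so growth is well above linear and close to quadratic at the larger sizes. The slides do not explain why; the snippet below only redoes the arithmetic on the four measurements, and the function name is hypothetical.

```python
import math

# (number of cases, average retrieval time in ms), as reported above
TIMINGS = [(129, 221), (258, 721), (516, 3063), (1032, 12231)]


def growth(timings):
    """For each consecutive pair: (case ratio, time ratio, implied exponent).

    The exponent k solves time_ratio = case_ratio ** k; k = 1 would be
    linear growth and k = 2 quadratic growth.
    """
    rows = []
    for (n1, t1), (n2, t2) in zip(timings, timings[1:]):
        case_ratio = n2 / n1
        time_ratio = t2 / t1
        rows.append((case_ratio, time_ratio, math.log(time_ratio) / math.log(case_ratio)))
    return rows


for case_ratio, time_ratio, exponent in growth(TIMINGS):
    print(f"cases x{case_ratio:.0f} -> time x{time_ratio:.2f} (exponent {exponent:.2f})")
```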
Parallelizing retrieval
◦ Direct speed up by using more CPUs
Clustering the Case Base
◦ Greedy clustering
◦ Similarity clustering
Using Asynchronous retrieval
◦ Hide the delay
Additional Work
Speed up retrieval by dividing up the parts of the case base needed for any given retrieval
Greedy Clustering
◦ Create new ‘clusters’ depending on the greedy policy
   010: Bot0-Dom0, Bot1-Dom1, Bot2-Dom0
◦ This clustering scheme got us very poor results
◦ Too much data loss
Clustering JuKeCB
Similarity Clustering
◦ Each cluster gets a representative case
◦ New cases are added to a cluster if the similarity is over a certain threshold
◦ New clusters are created if no similar clusters are found
Quite good results
◦ Moderate speedup (sorry, can’t find numbers!)
◦ Only slight performance drop
Clustering JuKeCB
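Similarity clustering can be sketched as below. The threshold value is not given in the slides, so it is a required argument here; treating the first case of a cluster as its representative and joining the first matching cluster are assumptions of this sketch.

```python
def add_to_clusters(clusters, case, similarity, threshold):
    """Add `case` to the first cluster whose representative is similar enough.

    A new cluster, represented by `case`, is created when none matches.
    Returns the cluster the case ended up in.
    """
    for cluster in clusters:
        if similarity(cluster["representative"], case) > threshold:
            cluster["members"].append(case)
            return cluster
    cluster = {"representative": case, "members": [case]}
    clusters.append(cluster)
    return cluster
```

Retrieval can then compare the enemy strategy against the representatives first and search only inside the best matching clusters, which is where the speedup would come from.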
Divide up the Case Base into X number of chunks
◦ X = number of processors on the machine
◦ Have each processor run stage two on its own chunk
◦ Run stage three on the results of all chunks
Speedup was almost optimal
◦ OldRetrievalTime/NumProcessors
Parallelizing JuKeCB
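The chunking scheme can be sketched with Python's standard concurrent.futures module. This is not how JuKeCB itself is implemented: the stage functions are passed in as parameters, and a thread pool is used to keep the sketch simple. For CPU-bound similarity code in CPython, a ProcessPoolExecutor would be the closer match to "one chunk per processor".

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(cases, num_chunks):
    """Deal the cases out into `num_chunks` roughly equal chunks."""
    return [cases[i::num_chunks] for i in range(num_chunks)]


def parallel_retrieve(cases, stage_two, stage_three, num_workers=None):
    """Run stage two on each chunk in parallel, then stage three on the union."""
    num_workers = num_workers or os.cpu_count() or 1
    chunks = [chunk for chunk in split_into_chunks(cases, num_workers) if chunk]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(stage_two, chunks))
    candidates = [case for part in partial_results for case in part]
    return stage_three(candidates) if candidates else None
```

One design point to watch: any per-chunk fallback in stage two (such as "return the most similar case") would need to be reconciled across chunks before stage three, which this sketch leaves to the functions passed in.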
Hide retrieval delay by running retrieval in a new thread
Works only if the game is running at ‘human playable speed’
Gets near identical results to normal system with no visible delay
Similar to parallelizing, sacrifice slight speed up gain for better responsiveness
Asynchronous Retrieval
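A minimal sketch of asynchronous retrieval using Python's threading module: the game loop keeps acting on the last retrieved strategy while a background thread fetches the next one. The class and method names are hypothetical, and the retrieval function is passed in.

```python
import threading


class AsyncRetriever:
    """Runs retrieval in a background thread and exposes the latest result."""

    def __init__(self, retrieve, initial_strategy=None):
        self._retrieve = retrieve
        self._lock = threading.Lock()
        self._latest = initial_strategy
        self._worker = None

    def request(self, game_state):
        """Start a background retrieval unless one is already running."""
        if self._worker is not None and self._worker.is_alive():
            return False
        self._worker = threading.Thread(target=self._run, args=(game_state,), daemon=True)
        self._worker.start()
        return True

    def _run(self, game_state):
        result = self._retrieve(game_state)
        with self._lock:
            self._latest = result

    def latest(self):
        """Strategy to use right now; never blocks on a running retrieval."""
        with self._lock:
            return self._latest

    def wait(self, timeout=None):
        """Block until the current retrieval finishes (useful in tests)."""
        if self._worker is not None:
            self._worker.join(timeout)
```

The game loop would call request() at each retrieve update and latest() whenever a bot needs orders, so a slow retrieval only means acting on a slightly older strategy for a few turns.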
Combining all the previous methods into one ultra fast case based reasoning machine!
◦ A clustered case base whose retrieval is done asynchronously in parallel
Optimising JuKeCB, JuKeCBTeam, and DOM Game
◦ Some things were coded somewhat sloppily and could easily be improved, such as the reuse phase
Adding more features
◦ Like we discussed earlier, we do not have enough features to properly define some strategies
Possible additional work
Overall JuKeCB was a great system for me to work on
It gave me substantial knowledge in the CBR field
A paper: Imitating Inscrutable Enemies: Learning from Stochastic Policy Observation, Retrieval and Reuse was published and presented at the ICCBR 2010 conference
….I swear I did not name the system….
Closing/Questions