Evaluating Simulation Software Components with Player Rating Systems (SIMUTools 2013)
DESCRIPTION
In component-based simulation systems, simulation runs are usually executed by combinations of distinct components, each solving a particular sub-task. If multiple components are available for a given sub-task (e.g., different event queue implementations), a simulation system may rely on an automatic selection mechanism, on a user decision, or, if neither is available, on a predefined default component. However, deciding upon a default component for each kind of sub-task is difficult: such a component should work well across various application domains and in various combinations with other components. Furthermore, the performance of individual components cannot be evaluated easily, since performance is typically measured for component combinations as a whole (e.g., the execution time of a simulation run). Finally, the selection of default components should be dynamic, as new and potentially superior components may be deployed to the system over time. We illustrate how player rating systems for team-based games can solve the above problems and evaluate our approach with an implementation of the TrueSkill(tm) rating system (Herbrich et al., 2007), applied in the context of the open-source modeling and simulation framework JAMES II. We also show how such systems can be used to steer performance analysis experiments for component ranking. The paper can be found here: https://docs.google.com/file/d/0BxPrl7QoBqmoUDVXNmZUc29Nbmc/edit?usp=sharing
TRANSCRIPT
Evaluating Simulation Software Components with Player Rating Systems
6. 3. 2013, SIMUTools 2013
Jonathan Wienß, Michael Stein, Roland Ewald
6. 3. 2013 © 2013 UNIVERSITÄT ROSTOCK | MODELING & SIMULATION RESEARCH GROUP 1
Component-Based Simulation Systems
• Simulator: combination of components
• Typical components:
  • Event management
  • Collision detection
  • State saving
  • Result storage
  • Random number generation
  • etc.
• Example: JAMES II
http://flickr.com/photos/jdhancock/7239958506, cc-by
Problem: Evaluating Individual Components
https://commons.wikimedia.org/wiki/File:Rowing_-_USA_Lwt_4_@_World_Champs_2003.jpg
• Only component combinations are comparable
• Dedicated performance studies are expensive & difficult
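Solution: Player Rating Systems
[Figure: "Performance Comparison" next to "Multiplayer Team Results". Simulators SC, SD, SE are combined with further components (e.g., event queues A, B). Execution times: SC 15 s, SE + B 17 s, SD + A 25 s. Resulting ranking: 1. SC, 2. SE + B, 3. SD + A.]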
1. Component Combination = Team of Players
2. Record results (of multiple combinations)
3. Update global component rating
⇒ Component Rating Systems, e.g. to find good default components.
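The first two steps above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function name `rank_teams` and the use of `frozenset` for teams are assumptions, and ties between execution times are not handled. The times are the ones shown on the slide.

```python
from typing import Dict, FrozenSet, List, Tuple

Team = FrozenSet[str]


def rank_teams(times: Dict[Team, float]) -> List[Tuple[int, Team, float]]:
    """Sort component combinations by execution time; rank 1 is the fastest.

    Hypothetical helper: each combination is treated as a 'team' whose
    'players' are its components. Ties are not given equal ranks here.
    """
    ordered = sorted(times.items(), key=lambda item: item[1])
    return [(rank, team, t) for rank, (team, t) in enumerate(ordered, start=1)]


# Execution times (in seconds) taken from the slide's example.
example_times = {
    frozenset({"SD", "A"}): 25.0,
    frozenset({"SE", "B"}): 17.0,
    frozenset({"SC"}): 15.0,
}

ranking = rank_teams(example_times)
for rank, team, seconds in ranking:
    print(rank, sorted(team), seconds)
```

The resulting list of ranked teams is the kind of input a team-based rating system needs for step 3.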
Component Rating Systems
• What is required?
• How does it work?
• How well does it work?
Component Rating Systems: Requirements
• Re-usable (system-independent)
• Inexpensive (memory, execution time)
• Scalable (w.r.t. components / component combinations)
• Robust (w.r.t. ‘outlier problems’)
• Adaptive (component updates)
Microsoft’s TrueSkill™ Approach¹ (used for Xbox Live™)
• Input:
  • Team defined by player indices, e.g., $A_i = \{4, 8, 125\}$
  • Team assignment $A = \{A_1, \ldots, A_k\}$ (pairwise disjoint)
  • Team ranking $r$ (game result)
• Output: player skill ratings $\mu_i$
• Assumptions:
  • Player skill $s_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$
  • Player performance $p_i \sim \mathcal{N}(s_i, \beta^2)$
  • Team performance $t_j = \sum_{i \in A_j} p_i$
1: Herbrich, Minka, and Graepel: TrueSkill(tm): A Bayesian Skill Rating System, Adv. in Neural Information Processing Systems 19, 2007
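The three assumptions can be read as a generative model: draw a skill, draw a performance around it, and sum the performances per team. The sketch below samples from that model once. The function name and the numeric values (mean 25, standard deviation 25/3, β = 25/6) are illustrative assumptions, not values taken from the slides.

```python
import random
from typing import Dict, List, Sequence


def sample_team_performances(
    mu: Dict[int, float],
    sigma: Dict[int, float],
    teams: Sequence[Sequence[int]],
    beta: float,
    rng: random.Random,
) -> List[float]:
    """Draw one team performance t_j per team under the stated assumptions."""
    performances = []
    for team in teams:
        total = 0.0
        for i in team:
            skill = rng.gauss(mu[i], sigma[i])  # s_i ~ N(mu_i, sigma_i^2)
            total += rng.gauss(skill, beta)     # p_i ~ N(s_i, beta^2)
        performances.append(total)              # t_j = sum over i in A_j of p_i
    return performances


# Illustrative values only (assumed, not from the slides).
players = [4, 8, 125, 7]
mu = {i: 25.0 for i in players}
sigma = {i: 25.0 / 3.0 for i in players}
teams = [[4, 8, 125], [7]]

sampled = sample_team_performances(mu, sigma, teams, beta=25.0 / 6.0,
                                   rng=random.Random(42))
print(sampled)
```

Under the additive assumption the three-player team tends to score higher than the single player simply because it has more members, which is relevant when combinations differ in size.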
Bayesian Inference in TrueSkill
$$p(s \mid r, A) = \frac{P(r \mid s, A)\, p(s)}{P(r \mid A)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(s, p, t \mid r, A)\, dp\, dt$$

• r: ranking
• A: team assignment
• t: team performances
• p: player performances
• s: player skills
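For intuition, the posterior has a well-known closed-form approximation in the simplest case: two single-player teams, no draws, and a Gaussian approximation of the truncated posterior. The sketch below implements that special case only. It is not the multi-team procedure used in the talk, it omits any skill dynamics between games, and the function names are assumptions.

```python
import math
from typing import Tuple

Rating = Tuple[float, float]  # (mu, sigma)


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_two_players(winner: Rating, loser: Rating,
                       beta: float) -> Tuple[Rating, Rating]:
    """Approximate posterior ratings after one game without draws."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Variance of the performance difference p_w - p_l.
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # mean correction of the truncated Gaussian
    w = v * (v + t)         # variance correction, always in (0, 1)
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


prior = (25.0, 25.0 / 3.0)  # illustrative prior, assumed
post_w, post_l = update_two_players(prior, prior, beta=25.0 / 6.0)
print(post_w, post_l)
```

The winner's mean moves up, the loser's moves down, and both uncertainties shrink; an unexpected result (small or negative t) produces a larger shift than an expected one.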
Factor Graphs & Message Passing
[Figure: Performance Comparison (SC: 15 s, SD + A: 25 s, SE + B: 17 s) and the corresponding Multi-Player Team Results (1. SC, 2. SE + B, 3. SD + A).]
1. Pass messages downwards: s → p → t
2. Expectation propagation (≈): t ↔ d(r)
3. Pass messages upwards: t → p → s
[Figure: factor graph with four layers. Skills: s_SC, s_SE, s_B, s_SD, s_A. Performance: p_SC, p_SE, p_B, p_SD, p_A. Team Performance: t_SC, t_SE+B, t_SD+A. Team Performance Difference: d1, d2.]
Limitations & Adaptations
• Strong assumptions that may not hold:
• Player performance independence
• Normally distributed performance
• Team performance is not additive → use the average instead
• A player may play in more than one team
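The effect of replacing the sum by the average can be seen in the first two moments of the team performance. The sketch below assumes independent player performances (one of the strong assumptions listed above) and marginalizes the skills, so each performance has variance σ_i² + β². The function name and the numbers are illustrative assumptions.

```python
from typing import Sequence, Tuple


def team_performance_moments(
    mus: Sequence[float],
    sigmas: Sequence[float],
    beta: float,
    average: bool = False,
) -> Tuple[float, float]:
    """Return (mean, variance) of the team performance.

    With average=False the team performance is the sum of the player
    performances; with average=True it is their arithmetic mean.
    Independence between players is assumed.
    """
    n = len(mus)
    mean = float(sum(mus))
    variance = sum(s * s + beta * beta for s in sigmas)
    if average:
        mean /= n
        variance /= n * n
    return mean, variance


summed = team_performance_moments([20.0, 30.0], [3.0, 4.0], beta=0.0)
averaged = team_performance_moments([20.0, 30.0], [3.0, 4.0], beta=0.0,
                                    average=True)
print(summed, averaged)
```

With the average, a combination does not gain expected performance merely by containing more components, which matters when combinations of different sizes are compared.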
Ranking Event Queues in JAMES II: Reference Data
Event Queues / Models   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  Sum
MList                   4  4  5  2  3  5  7  3  5  4  4  4  4  4  4  5  1  1  1  6   76
LinkedList              1  1  1  5  1  3  3  5  3  7  6  6  6  6  6  2  5  5  5  3   80
TwoList                 2  2  2  6  2  2  2  7  2  3  7  7  7  7  7  4  6  6  6  4   91
CalendarQueue           7  7  8  3  6  8  8  1  6  5  2  2  2  3  2  9  2  2  2  7   92
BucketsThreshold       10  9  9  1  8  1  4  6 10  8  1  1  1  1  1  8  4  4  4  2   93
MPLinkedList            3  3  3  7  4  6  5  4  4  1  9  9  9  9  9  3  7  7  7  5  114
CalendarReQueue         9 10 10  4  7  9  6 10  9  9  3  3  3  2  3  7  3  3  3  8  121
Heap                    5  5  4  8  5  4  1  9  8  6  8  8  8  8  8  6  9  9  9  1  129
Simple                  8  6  6  9  9  7  9  2  1 10  5  5  5  5  5  1 10 10 10  9  132
DynamicCalendarQueue    6  8  7 10 10  9 10  8  7  2 10 10 10 10 10 10  8  8  8 10  171
Sum                    55 55 55 55 55 54 55 55 55 55 55 55 55 55 55 55 55 55 55 55
• Five models for each formalism: SRS, stoch-π, PDEVS, SR
• Per formalism: (1 + 3 + 1 + 3 = 8 simulators) × 10 event queues
• 80 component combinations × 20 replications × 5 models = 8,000 runs
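The run count and an overall ordering can be reproduced from the numbers on this slide. The sketch below assumes that a lower rank sum means a better event queue overall; the slide does not state how the reference ranking is derived, so that ordering rule is an assumption. The rank sums are copied from the Sum column.

```python
# Experiment size, using the figures given on the slide.
simulators_per_formalism = [1, 3, 1, 3]
event_queues = 10
replications = 20
models_per_formalism = 5

component_combinations = sum(simulators_per_formalism) * event_queues
total_runs = component_combinations * replications * models_per_formalism

# Rank sums per event queue, copied from the table's Sum column.
rank_sums = {
    "MList": 76,
    "LinkedList": 80,
    "TwoList": 91,
    "CalendarQueue": 92,
    "BucketsThreshold": 93,
    "MPLinkedList": 114,
    "CalendarReQueue": 121,
    "Heap": 129,
    "Simple": 132,
    "DynamicCalendarQueue": 171,
}

# Assumption: order event queues by ascending rank sum (lower is better).
reference_ranking = sorted(rank_sums, key=rank_sums.get)

print(component_combinations, total_runs)
print(reference_ranking)
```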
Experiment Setup
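[Figure: experiment workflow. Simulation problems are run with eligible component combinations (e.g., SD + A, SE + B); the measured execution times give a team assignment A and ranking r, which are passed to the component rating system. The current event queue ranking (positions 1 to 10) is then evaluated by counting inversions.]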
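Counting inversions between two rankings means counting the pairs of items whose relative order differs. A minimal sketch follows, assuming both rankings contain the same items exactly once; the function name is an assumption, and the quadratic pair enumeration is adequate for ten event queues.

```python
from itertools import combinations
from typing import Sequence


def count_inversions(current: Sequence[str], reference: Sequence[str]) -> int:
    """Number of item pairs ordered differently in `current` and `reference`."""
    position = {name: idx for idx, name in enumerate(reference)}
    mapped = [position[name] for name in current]
    # combinations() keeps the order of `mapped`, so a > b marks an inversion.
    return sum(1 for a, b in combinations(mapped, 2) if a > b)


reference = ["MList", "LinkedList", "TwoList", "CalendarQueue"]
current = ["LinkedList", "MList", "TwoList", "CalendarQueue"]
print(count_inversions(current, reference))
```

For n items the count ranges from 0 (identical rankings) to n(n − 1)/2 (reversed ranking), which is 45 for ten event queues.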
Evaluation: Ranking Event Queues
[Plot: average number of inversions (y-axis, 0 to 25) over component combination comparisons (x-axis, 0 to 10000). Legend: Default Setup; β = 833.3.]
Summary
Problem: How to evaluate individual components of a simulation system?
Solution: A scalable and robust component rating system.
Method: Bayesian inference (MS TrueSkill algorithm).
Outlook:
• Global component rankings
• Consider ‘margin of victory’
• Improve usage for experiment steering
http://bitbucket.org/alesia
(License: Apache 2.0)
Thank you.
Questions?
Operation Modes
[Figure: two operation modes of the component rating system. Passive Mode: Users, Simulation Software, and Component Rating System, with the labels Problem, Results, Component Comparisons, and Component Ranks. Active Mode: Simulation Software and Component Rating System, with the labels Problem & Component Combinations, Performance Results, Component Comparisons, Match Selection & Experiment Control, and Component Ranks etc.]
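In active mode the rating system itself chooses which combination to run next. The slides do not say how matches are selected, so the sketch below shows one hypothetical heuristic only: pick the eligible combination whose components currently have the largest total rating uncertainty σ. The function name, the heuristic, and all values are assumptions.

```python
from typing import Dict, Sequence, Tuple

Combination = Tuple[str, ...]


def select_next_combination(
    eligible: Sequence[Combination],
    sigma: Dict[str, float],
) -> Combination:
    """Hypothetical heuristic: prefer the combination with the highest
    summed uncertainty, since its result should be the most informative."""
    return max(eligible, key=lambda combo: sum(sigma[c] for c in combo))


# Illustrative uncertainties (assumed values).
sigma = {"SD": 2.0, "SE": 2.0, "A": 8.0, "B": 1.0}
eligible = [("SD", "A"), ("SE", "B")]
chosen = select_next_combination(eligible, sigma)
print(chosen)
```

Other criteria, such as how evenly matched two combinations are expected to be, would fit the same interface.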
Evaluation: Ranking Event Queues
[Plot: average number of inversions (y-axis, 0 to 30) over component combination comparisons (x-axis, 0 to 10000). Legend: Passive Mode; Active Mode.]