evaluating simulation software components with player rating systems (simutools 2013)

Evaluating Simulation Software Components with Player Rating Systems

6. 3. 2013, SIMUTools 2013

Jonathan Wienß, Michael Stein, Roland Ewald


6. 3. 2013 © 2013 UNIVERSITÄT ROSTOCK | MODELING & SIMULATION RESEARCH GROUP

Component-Based Simulation Systems

• Simulator: combination of components

• Typical components:

  • Event management

  • Collision detection

  • State saving

  • Result storage

  • Random number generation

  • etc.

• Example: JAMES II

http://flickr.com/photos/jdhancock/7239958506, cc-by


Problem: Evaluating Individual Components

https://commons.wikimedia.org/wiki/File:Rowing_-_USA_Lwt_4_@_World_Champs_2003.jpg

• Only component combinations are comparable

• Dedicated performance studies are expensive & difficult


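Solution: Player Rating Systems

[Figure: a performance comparison interpreted as multiplayer team results. Execution times: SC 15 s, SE + B 17 s, SD + A 25 s. Resulting ranking: 1. SC, 2. SE + B, 3. SD + A. SC, SD, and SE are simulators; A and B are components, e.g., event queues.]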

1. Component Combination = Team of Players

2. Record results (of multiple combinations)

3. Update global component rating

⇒ Component Rating Systems, e.g. to find good default components.
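The first two steps can be sketched in a few lines of Python. This is only an illustration using the execution times from the slide's example; the function name `rank_combinations` and the data layout are hypothetical and not taken from the authors' implementation. Ties are not handled here.

```python
from typing import Dict, FrozenSet, List, Tuple

Team = FrozenSet[str]  # a component combination, seen as a team of players


def rank_combinations(runtimes: Dict[Team, float]) -> List[Tuple[int, Team]]:
    """Order component combinations by execution time (fastest gets rank 1)."""
    ordered = sorted(runtimes.items(), key=lambda item: item[1])
    return [(rank, team) for rank, (team, _) in enumerate(ordered, start=1)]


# Execution times in seconds, as in the example figure.
runtimes = {
    frozenset({"SD", "A"}): 25.0,
    frozenset({"SE", "B"}): 17.0,
    frozenset({"SC"}): 15.0,
}

ranking = rank_combinations(runtimes)
for rank, team in ranking:
    print(rank, sorted(team))
```

The resulting ranking, together with the team assignment, is what a rating system would consume in step 3.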


Component Rating Systems

• What is required?

• How does it work?

• How well does it work?


Component Rating Systems: Requirements

• Re-usable (system-independent)

• Inexpensive (memory, execution time)

• Scalable (w.r.t. components / component combinations)

• Robust (w.r.t. ‘outlier problems’)

• Adaptive (component updates)


Microsoft’s TrueSkill™ Approach [1] (used for Xbox Live™)

• Input:

  • Team defined by player indices, e.g., $A_i = \{4, 8, 125\}$

  • Team assignment $A = \{A_1, \ldots, A_k\}$ (pairwise disjoint)

  • Team ranking $r$ (game result)

• Output: player skill ratings $\mu_i$

• Assumptions:

  • Player skill $s_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$

  • Player performance $p_i \sim \mathcal{N}(s_i, \beta^2)$

  • Team performance $t_j = \sum_{i \in A_j} p_i$

[1] Herbrich, Minka, and Graepel: TrueSkill™: A Bayesian Skill Rating System, Adv. in Neural Information Processing Systems 19, 2007
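The three modelling assumptions describe a generative process, which can be sampled directly. The following is a minimal sketch of that process only; the helper name `sample_team_performances` and all numeric values are illustrative assumptions, not part of the slides.

```python
import random
from typing import Dict, List, Tuple


def sample_team_performances(
    mu: Dict[int, float],
    sigma: Dict[int, float],
    teams: List[List[int]],
    beta: float,
    rng: random.Random,
) -> Tuple[Dict[int, float], List[float]]:
    """Draw one game outcome from the assumed generative model."""
    # Skill s_i ~ N(mu_i, sigma_i^2)
    skills = {i: rng.gauss(mu[i], sigma[i]) for i in mu}
    # Performance p_i ~ N(s_i, beta^2)
    perfs = {i: rng.gauss(skills[i], beta) for i in skills}
    # Team performance t_j = sum of the performances of its players
    team_perfs = [sum(perfs[i] for i in team) for team in teams]
    return perfs, team_perfs


# Illustrative values (assumptions): two teams, identical priors for all players.
teams = [[4, 8, 125], [7]]
players = [i for team in teams for i in team]
mu = {i: 25.0 for i in players}
sigma = {i: 25.0 / 3 for i in players}

perfs, team_perfs = sample_team_performances(
    mu, sigma, teams, beta=25.0 / 6, rng=random.Random(42)
)
```

Sorting `team_perfs` in descending order would give the ranking $r$ that this model predicts for the sampled game.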


Bayesian Inference in TrueSkill

$$p(s \mid r, A) = \frac{P(r \mid s, A)\, p(s)}{P(r \mid A)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(s, p, t \mid r, A)\, dp\, dt$$

• $r$: ranking

• $A$: team assignment

• $t$: team performances

• $p$: player performances

• $s$: player skills


Factor Graphs & Message Passing

[Figure: the performance comparison (SC 15 s, SE + B 17 s, SD + A 25 s) interpreted as multi-player team results (1. SC, 2. SE + B, 3. SD + A).]

1. Pass messages downwards: s → p → t

2. Expectation propagation (≈): t ↔ d(r)

3. Pass messages upwards: t → p → s

[Figure: factor graph with four layers. Skills: s_SC, s_SE, s_B, s_SD, s_A. Performance: p_SC, p_SE, p_B, p_SD, p_A. Team Performance: t_SC, t_SE+B, t_SD+A. Team Performance Difference: d_1, d_2.]
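For exactly two teams and no draws, the message passing scheme reduces to a well-known closed-form update, which makes the effect of a single comparison easy to see. The sketch below covers only that special case under stated assumptions (additive team performance, draw margin 0, no skill dynamics); it is not the general factor graph algorithm with expectation propagation, and the function name `update_two_teams` is hypothetical.

```python
import math
from typing import List, Tuple

Rating = Tuple[float, float]  # (mu, sigma) of one player/component


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_two_teams(
    winners: List[Rating], losers: List[Rating], beta: float
) -> Tuple[List[Rating], List[Rating]]:
    """Closed-form two-team update (no draws); returns new (mu, sigma) lists."""
    everyone = winners + losers
    # Total variance of the team performance difference.
    c2 = sum(s * s + beta * beta for _, s in everyone)
    c = math.sqrt(c2)
    # Normalised difference of the summed team means (winner minus loser).
    t = (sum(m for m, _ in winners) - sum(m for m, _ in losers)) / c
    v = _pdf(t) / _cdf(t)  # mean correction; large if the result was a surprise
    w = v * (v + t)        # variance correction, always between 0 and 1

    def shift(team: List[Rating], sign: float) -> List[Rating]:
        return [
            (m + sign * (s * s / c) * v, s * math.sqrt(1.0 - (s * s / c2) * w))
            for m, s in team
        ]

    return shift(winners, +1.0), shift(losers, -1.0)


# Illustrative values (assumptions): SC beats the team SE + B.
prior = (25.0, 25.0 / 3)
new_winners, new_losers = update_two_teams([prior], [prior, prior], beta=25.0 / 6)
```

Winners' means move up, losers' means move down, and every involved player's uncertainty shrinks. For very lopsided expectations `_cdf(t)` can underflow, so a production implementation would need a numerically safer formulation.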


Limitations & Adaptations

• Strong assumptions that may not hold:

• Player performance independence

• Normally distributed performance

  • Team performance is not additive → use the average instead of the sum

• A player may play in more than one team


Ranking Event Queues in JAMES II: Reference Data

Event Queue / Model    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  Sum
MList                  4  4  5  2  3  5  7  3  5  4  4  4  4  4  4  5  1  1  1  6   76
LinkedList             1  1  1  5  1  3  3  5  3  7  6  6  6  6  6  2  5  5  5  3   80
TwoList                2  2  2  6  2  2  2  7  2  3  7  7  7  7  7  4  6  6  6  4   91
CalendarQueue          7  7  8  3  6  8  8  1  6  5  2  2  2  3  2  9  2  2  2  7   92
BucketsThreshold      10  9  9  1  8  1  4  6 10  8  1  1  1  1  1  8  4  4  4  2   93
MPLinkedList           3  3  3  7  4  6  5  4  4  1  9  9  9  9  9  3  7  7  7  5  114
CalendarReQueue        9 10 10  4  7  9  6 10  9  9  3  3  3  2  3  7  3  3  3  8  121
Heap                   5  5  4  8  5  4  1  9  8  6  8  8  8  8  8  6  9  9  9  1  129
Simple                 8  6  6  9  9  7  9  2  1 10  5  5  5  5  5  1 10 10 10  9  132
DynamicCalendarQueue   6  8  7 10 10  9 10  8  7  2 10 10 10 10 10 10  8  8  8 10  171
Column sum            55 55 55 55 55 54 55 55 55 55 55 55 55 55 55 55 55 55 55 55

• Five models for each formalism: SRS, stoch-π, PDEVS, SR

• Per formalism: (1 + 3 + 1 + 3 = 8 simulators) × 10 event queues

• 80 comp. combinations × 20 replications × 5 models = 8,000 runs
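The "Sum" column and the run count can be reproduced with a few lines of Python. The sketch below uses the first three rows of the table; treating the rank sum as the criterion for the reference ordering is an assumption suggested by the table being sorted by that column.

```python
# Per-model ranks (models 1..20) for the first three event queues in the table.
ranks = {
    "MList":      [4, 4, 5, 2, 3, 5, 7, 3, 5, 4, 4, 4, 4, 4, 4, 5, 1, 1, 1, 6],
    "LinkedList": [1, 1, 1, 5, 1, 3, 3, 5, 3, 7, 6, 6, 6, 6, 6, 2, 5, 5, 5, 3],
    "TwoList":    [2, 2, 2, 6, 2, 2, 2, 7, 2, 3, 7, 7, 7, 7, 7, 4, 6, 6, 6, 4],
}

# Rank sum per event queue; a lower sum means better overall performance.
rank_sums = {queue: sum(values) for queue, values in ranks.items()}

# Assumed reference ordering: ascending rank sum.
reference_order = sorted(rank_sums, key=rank_sums.get)

# Number of simulation runs behind the reference data.
total_runs = 80 * 20 * 5  # combinations x replications x models
```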


Experiment Setup

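[Figure: experiment setup. Simulation problems and eligible component combinations (e.g., SD + A, SE + B) yield execution times. The team assignment and ranking (A, r) are passed to the component rating system. Its current event queue ranking (1. ... 10.) is checked against the reference data by counting inversions.]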
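Counting inversions measures how far the current ranking is from the reference ranking: each pair of event queues that appears in opposite order in the two rankings counts as one inversion. A minimal sketch, assuming both rankings contain the same items and no ties (the function name `count_inversions` is hypothetical):

```python
from itertools import combinations
from typing import Sequence


def count_inversions(current: Sequence[str], reference: Sequence[str]) -> int:
    """Number of item pairs ordered differently in `current` and `reference`."""
    position = {name: index for index, name in enumerate(reference)}
    mapped = [position[name] for name in current]
    # A pair is inverted if the earlier item has the later reference position.
    return sum(1 for a, b in combinations(mapped, 2) if a > b)


reference = ["MList", "LinkedList", "TwoList", "CalendarQueue"]
current = ["LinkedList", "MList", "TwoList", "CalendarQueue"]
inversions = count_inversions(current, reference)
```

This quadratic version is sufficient for ten event queues, where the maximum is 10 · 9 / 2 = 45 inversions; a merge-sort based count would scale better for long rankings.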


Evaluation: Ranking Event Queues

[Plot: average number of inversions (y-axis, 0 to 25) over the number of component combination comparisons (x-axis, 0 to 10000). Legend: Default Setup, β = 833.3.]


Summary

Problem: How to evaluate individual components of a simulation system?

Solution: A scalable and robust component rating system.

Method: Bayesian inference (MS TrueSkill algorithm).

Outlook:

• Global component rankings

• Consider ‘margin of victory’

• Improve usage for experiment steering


http://bitbucket.org/alesia

(License: Apache 2.0)


Thank you.

Questions?


Operation Modes

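[Figure: two operation modes. Passive Mode: users send problems to the simulation software and receive results; the rating system observes the component comparisons and provides component ranks. Active Mode: the rating system performs match selection and experiment control, sends problems and component combinations to the simulation software, receives performance results, and provides component ranks etc.]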


Evaluation: Ranking Event Queues

[Plot: average number of inversions (y-axis, 0 to 30) over the number of component combination comparisons (x-axis, 0 to 10000). Legend: Passive Mode, Active Mode.]

