Evaluating Simulation Software Components with Player Rating Systems (SIMUTools 2013)


DESCRIPTION

In component-based simulation systems, simulation runs are usually executed by combinations of distinct components, each solving a particular sub-task. If multiple components are available for a given sub-task (e.g., different event queue implementations), a simulation system may rely on an automatic selection mechanism, on a user decision, or, if neither is available, on a predefined default component. However, deciding upon a default component for each kind of sub-task is difficult: such a component should work well across various application domains and various combinations with other components. Furthermore, the performance of individual components cannot be evaluated easily, since performance is typically measured for component combinations as a whole (e.g., the execution time of a simulation run). Finally, the selection of default components should be dynamic, as new and potentially superior components may be deployed to the system over time. We illustrate how player rating systems for team-based games can solve the above problems and evaluate our approach with an implementation of the TrueSkill™ rating system (Herbrich et al., 2007), applied in the context of the open-source modeling and simulation framework JAMES II. We also show how such systems can be used to steer performance analysis experiments for component ranking. The paper can be found here: https://docs.google.com/file/d/0BxPrl7QoBqmoUDVXNmZUc29Nbmc/edit?usp=sharing

TRANSCRIPT

Page 1:

Evaluating Simulation Software Components with Player Rating Systems
March 6, 2013, SIMUTools 2013

Jonathan Wienß Michael Stein Roland Ewald

Sponsored by:

Page 2:

Component-Based Simulation Systems

• Simulator: combination of components

• Typical components:

• Event management
• Collision detection
• State saving
• Result storage
• Random number generation
• etc.

• Example: JAMES II

Image: http://flickr.com/photos/jdhancock/7239958506 (CC BY)

Page 3:

Problem: Evaluating Individual Components

Image: https://commons.wikimedia.org/wiki/File:Rowing_-_USA_Lwt_4_@_World_Champs_2003.jpg

• Only component combinations are comparable

• Dedicated performance studies are expensive & difficult

Page 4:

Solution: Player Rating Systems

[Diagram: a performance comparison of component combinations (e.g., event queues SC, SD, SE paired with simulators A, B) is read as a multiplayer team result: 1. SC (15 s), 2. SE + B (17 s), 3. SD + A (25 s)]

1. Component Combination = Team of Players

2. Record results (of multiple combinations)

3. Update global component rating

⇒ Component Rating Systems, e.g. to find good default components.
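To make these three steps concrete, here is a minimal sketch using the open-source Python `trueskill` package, replaying the comparison from the diagram above. This is an illustrative assumption on my part: the paper's rating system is implemented inside JAMES II, and all names here are made up.

```python
# Minimal sketch with the Python `trueskill` package (illustrative only;
# the paper's rating system is implemented inside JAMES II).
from trueskill import Rating, rate

# 1. Component combination = team of players: one rating per component.
ratings = {name: Rating() for name in ("SC", "SD", "SE", "A", "B")}

# 2. Record the result of one comparison: SC ran in 15 s,
#    SE + B in 17 s, SD + A in 25 s (lower rank = faster).
teams = [("SC",), ("SE", "B"), ("SD", "A")]
ranks = [0, 1, 2]

# 3. Update the global component ratings from this "match".
groups = [tuple(ratings[c] for c in team) for team in teams]
for team, new_group in zip(teams, rate(groups, ranks=ranks)):
    for component, new_rating in zip(team, new_group):
        ratings[component] = new_rating

# Rank components by a conservative skill estimate (mu - 3 * sigma).
for name, r in sorted(ratings.items(),
                      key=lambda kv: kv[1].mu - 3 * kv[1].sigma,
                      reverse=True):
    print(f"{name}: mu={r.mu:.2f} sigma={r.sigma:.2f}")
```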

Page 5:

Component Rating Systems

• What is required?

• How does it work?

• How well does it work?

Page 6:

Component Rating Systems: Requirements

• Re-usable (system-independent)

• Inexpensive (memory, execution time)

• Scalable (w.r.t. components / component combinations)

• Robust (w.r.t. ‘outlier problems’)

• Adaptive (component updates)

Page 7:

Microsoft’s TrueSkill™ Approach¹ (used for Xbox Live™)

• Input:

• Team defined by player indices, e.g., $A_i = \{4, 8, 125\}$

• Team assignment $A = \{A_1, \dots, A_k\}$ (pairwise disjoint)

• Team ranking $r$ (game result)

• Output: player skill ratings $\mu_i$

• Assumptions:

• Player skill $s_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$

• Player performance $p_i \sim \mathcal{N}(s_i, \beta^2)$

• Team performance $t_j = \sum_{i \in A_j} p_i$

1: Herbrich, Minka, and Graepel: TrueSkill™: A Bayesian Skill Rating System. Advances in Neural Information Processing Systems 19, 2007.
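These assumptions fully specify a generative model, which a few lines of NumPy can simulate. A sketch to fix notation; the concrete numbers are made up:

```python
# Sampling once from TrueSkill's generative model (symbols as above;
# the concrete numbers are made up for illustration).
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([25.0, 23.0, 28.0])     # skill means mu_i
sigma = np.array([8.3, 4.0, 2.1])     # skill uncertainties sigma_i
beta = 25.0 / 6                       # performance variability

s = rng.normal(mu, sigma)             # s_i ~ N(mu_i, sigma_i^2)
p = rng.normal(s, beta)               # p_i ~ N(s_i, beta^2)

teams = [[0], [1, 2]]                 # team assignment A = {A_1, A_2}
t = [p[idx].sum() for idx in teams]   # t_j = sum over i in A_j of p_i
print("team performances:", t)        # the game result r orders these
```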

Page 8:

Bayesian Inference in TrueSkill

$$p(s \mid r, A) = \frac{P(r \mid s, A)\, p(s)}{P(r \mid A)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(s, p, t \mid r, A)\, dp\, dt$$

$r$: ranking
$A$: team assignment
$t$: team performances
$p$: player performances
$s$: player skills
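TrueSkill evaluates this marginalization efficiently with message passing on a factor graph (next slide). As a conceptual cross-check only, the same posterior can be approximated by naive rejection sampling: draw from the prior and keep the samples that reproduce the observed ranking. A sketch under the previous slide's assumptions, for two single-player teams:

```python
# Naive Monte Carlo approximation of p(s | r, A): draw from the prior,
# keep samples consistent with the observed ranking. Illustration only;
# TrueSkill computes this far more efficiently via message passing.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
mu, sigma, beta = 25.0, 25.0 / 3, 25.0 / 6

s = rng.normal(mu, sigma, size=(n, 2))  # skills for players 0 and 1
p = rng.normal(s, beta)                 # performances given skills

# Observed ranking r: player 0's team beat player 1's team
# (with one player per team, t_j = p_i).
consistent = p[:, 0] > p[:, 1]

print("posterior skill means:", s[consistent].mean(axis=0))
print("prior skill mean:     ", mu)     # winner shifts up, loser down
```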

Page 9:

Factor Graphs & Message Passing

[Diagram: the performance comparison (SC: 15 s, SE + B: 17 s, SD + A: 25 s) and the resulting multi-player team ranking 1. SC, 2. SE + B, 3. SD + A, expressed as a factor graph: skill nodes $s_{SC}, s_{SE}, s_B, s_{SD}, s_A$ feed performance nodes $p_{SC}, p_{SE}, p_B, p_{SD}, p_A$, which are combined into team performance nodes $t_{SC}, t_{SE+B}, t_{SD+A}$; difference nodes $d_1, d_2$ connect adjacent teams in the ranking]

1. Pass messages downwards: $s \to p \to t$

2. Expectation propagation (≈): $t \leftrightarrow d(r)$

3. Pass messages upwards: $t \to p \to s$
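Step 1 of this schedule is exact Gaussian bookkeeping; the approximation only enters in step 2, at the difference factors. A hedged sketch of the downward pass for one team, under the model assumptions above:

```python
# Downward pass in closed form: with s_i ~ N(mu_i, sigma_i^2) and
# p_i ~ N(s_i, beta^2), the prior on team performance t_j is Gaussian
# with mean sum(mu_i) and variance sum(sigma_i^2 + beta^2).
def team_performance_prior(team, mu, sigma, beta):
    mean = sum(mu[c] for c in team)
    var = sum(sigma[c] ** 2 + beta ** 2 for c in team)
    return mean, var

mu = {"SE": 25.0, "B": 25.0}
sigma = {"SE": 25.0 / 3, "B": 25.0 / 3}
print(team_performance_prior(["SE", "B"], mu, sigma, beta=25.0 / 6))
```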

Page 10:

Limitations & Adaptations

• Strong assumptions that may not hold:

• Player performance independence

• Normally distributed performance

• No additive team performance → use the average instead

• A player may play in more than one team
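Averaging the team performance over n members is equivalent to weighting each member's performance by 1/n. In the Python `trueskill` package this can be emulated with its partial-play weights; an illustrative assumption again, since the paper's adaptation is part of JAMES II:

```python
# Averaging instead of summing team performance: weight each of the
# n team members by 1/n (partial-play weights in the `trueskill`
# package). Illustrative; not the paper's own implementation.
from trueskill import Rating, rate

team1 = (Rating(), Rating())            # two components acting as a team
team2 = (Rating(),)                     # a single-component "team"
weights = [(0.5, 0.5), (1.0,)]          # 1/n per member

team1, team2 = rate([team1, team2], ranks=[0, 1], weights=weights)
```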

Page 11:

Ranking Event Queues in JAMES II: Reference Data

Event Queues / Models    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  Sum
MList                    4  4  5  2  3  5  7  3  5  4  4  4  4  4  4  5  1  1  1  6   76
LinkedList               1  1  1  5  1  3  3  5  3  7  6  6  6  6  6  2  5  5  5  3   80
TwoList                  2  2  2  6  2  2  2  7  2  3  7  7  7  7  7  4  6  6  6  4   91
CalendarQueue            7  7  8  3  6  8  8  1  6  5  2  2  2  3  2  9  2  2  2  7   92
BucketsThreshold        10  9  9  1  8  1  4  6 10  8  1  1  1  1  1  8  4  4  4  2   93
MPLinkedList             3  3  3  7  4  6  5  4  4  1  9  9  9  9  9  3  7  7  7  5  114
CalendarReQueue          9 10 10  4  7  9  6 10  9  9  3  3  3  2  3  7  3  3  3  8  121
Heap                     5  5  4  8  5  4  1  9  8  6  8  8  8  8  8  6  9  9  9  1  129
Simple                   8  6  6  9  9  7  9  2  1 10  5  5  5  5  5  1 10 10 10  9  132
DynamicCalendarQueue     6  8  7 10 10  9 10  8  7  2 10 10 10 10 10 10  8  8  8 10  171

Column sum              55 55 55 55 55 54 55 55 55 55 55 55 55 55 55 55 55 55 55 55

(Each column ranks the ten event queues on one model, 1 = best; Sum is the overall score, lower is better. Column 6 sums to 54 because of a tied rank.)

• Five models for each formalism: SRS, stoch-π, PDEVS, SR

• Per formalism: (1 + 3 + 1 + 3 = 8 simulators) × 10 event queues

• 80 component combinations × 20 replications × 5 models = 8,000 runs

Page 12:

Experiment Setup

[Diagram: simulation problems are run with the eligible component combinations (e.g., event queue SD with simulator A, event queue SE with simulator B); the measured execution times define a team assignment A and ranking r, which are (1.) fed into the component rating system; (2.) the system's current event queue ranking (1. ... 10.) is compared to the reference ranking by counting inversions]
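The evaluation metric, counting inversions, measures how many pairs of event queues the rating system orders differently from the reference ranking. A simple sketch (quadratic, which is fine for ten items; the names are taken from the table above):

```python
# Count pairwise inversions between the rating system's current ranking
# and the reference ranking (0 = identical order).
def count_inversions(predicted, reference):
    pos = {item: i for i, item in enumerate(reference)}
    seq = [pos[item] for item in predicted]
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq))
                 if seq[i] > seq[j])

reference = ["MList", "LinkedList", "TwoList"]
predicted = ["LinkedList", "MList", "TwoList"]
print(count_inversions(predicted, reference))   # -> 1
```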

Page 13:

Evaluation: Ranking Event Queues

[Plot: average number of inversions (y-axis, 0 to 25) over the number of component combination comparisons (x-axis, 0 to 10,000), for the default setup and for β = 833.3]

Page 14:

Summary

Problem: How to evaluate individual components of a simulation system?

Solution: A scalable and robust component rating system.

Method: Bayesian inference (Microsoft’s TrueSkill algorithm).

Outlook:

• Global component rankings

• Consider ‘margin of victory’

• Improve usage for experiment steering

Page 15:

http://bitbucket.org/alesia

(License: Apache 2.0)

Page 16:

Thank you.

Questions?

Page 17:

Operation Modes

[Diagram, two panels:

Passive mode: users pose a problem to the simulation software and receive results; the simulation software reports component comparisons to the component rating system and receives component ranks in return.

Active mode: the component rating system additionally takes over match selection & experiment control, sending problems & component combinations to the simulation software and receiving performance results; it returns component ranks etc.]
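In active mode the rating system chooses which "match" to run next. One plausible heuristic (an assumption for illustration, not necessarily the paper's exact strategy) is to prefer combinations containing the component whose rating is still most uncertain:

```python
# One possible active-mode policy: run the combination whose most
# uncertain component has the largest sigma (illustrative heuristic).
from trueskill import Rating

def select_next_match(ratings, eligible_combinations):
    return max(eligible_combinations,
               key=lambda combo: max(ratings[c].sigma for c in combo))

ratings = {"SC": Rating(sigma=2.0), "SD": Rating(), "A": Rating(sigma=1.5)}
print(select_next_match(ratings, [("SC", "A"), ("SD", "A")]))  # ('SD', 'A')
```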

Page 18:

Evaluation: Ranking Event Queues

[Plot: average number of inversions (y-axis, 0 to 30) over the number of component combination comparisons (x-axis, 0 to 10,000), comparing passive mode and active mode]
