Scalable and Secure Architectures for Online Multiplayer Games
Thesis Proposal
Ashwin Bharambe
May 15, 2006

Online Games are Huge!
[Figure: number of subscribers (1 to 8 million) by year, 1997 to 2005, for World of Warcraft, Final Fantasy XI, Everquest, and Ultima Online. Source: http://www.mmogchart.com/]
Some more facts:
1. These MMORPGs have client-server architectures
2. They accommodate ~0.5 million players at a time

Why MMORPGs Scale
Role-playing games have been slow-paced
  Players interact with the server relatively infrequently
Maintain multiple independent game-worlds
  Each hosted on a different server
Not true of other game genres
  FPS, or first-person shooters (e.g., Quake)
  Demand high interactivity
  Need a single game-world

FPS Games Don’t Scale
[Figure: bandwidth (kbps) of a Quake II server]
Bandwidth and computation both become bottlenecks

Goal: Cooperative Server Architecture
Focus on fast-paced FPS games

Distributing Games: Challenges
Tight latency constraints
  As players or missiles move, updates must be disseminated very quickly
  < 150 ms for FPS games
High write-sharing in the workload
Cheating
  Execution and state maintenance spread over untrustworthy nodes

Talk Outline
Problem
Background
  Game Model
  Related Work
Colyseus Architecture
Expected Contributions

Game Model
[Figure: screenshot of Serious Sam, annotated]
Immutable state: interactive 3-D environment (maps, models, textures)
Mutable state: player, game status, monsters, ammo

Game Execution in Client-Server Model
void RunGameFrame()  // every 50-100 ms
{
    // every object in the world
    // thinks once every game frame
    foreach (obj in mutable_objs) {
        if (obj->think)
            obj->think();
    }
    send_world_update_to_clients();
}

Object Partitioning
[Figure labels: Player, Monster]

Distributed Game Execution
class CruzMissile {
    // every object in the world
    // thinks once every game frame
    void think() {
        update_pos();
        if (dist_to_ground() < EPSILON)
            explode();
    }
    void explode() {
        foreach (p in get_nearby_objects()) {
            if (p.type == "player")
                p.health -= 50;
        }
    }
};
[Figure annotations: Object Discovery, Replica Synchronization; objects shown: Missile, Monster, Item]

Talk Outline
Problem
Background
  Game Model
  Related Work
Colyseus Architecture
Expected Contributions

Related Work
Distributed designs
  Distributed Interactive Simulation (DIS), e.g., HLA, DIVE, MASSIVE
    Use region-based partitioning, IP multicast
  Butterfly, Second Life, SimMUD [INFOCOM 04]
    Use region-based partitioning, DHT multicast
Cheat-proofing
  Lock-step synchronization with commitment

Related Work: Techniques
Region-based Partitioning
Parallel Simulation
Area-of-Interest Management with Multicast

Related Work: Techniques
Region-based Partitioning
  Divide the game-world into a fixed number of regions
  Assign objects in one region to one server
  + Simple to place and discover objects
  – High migration rates, especially for FPS games
  – Regions exhibit very high skews in popularity, which can result in severe load imbalance
Parallel Simulation
Area-of-Interest Management with Multicast
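The two drawbacks of region-based partitioning listed above can be seen in a small sketch. This is only an illustration under stated assumptions: a hypothetical square 2-D world, a fixed 4 x 4 grid, and round-robin assignment of regions to servers. The names `region_of`, `server_of`, and `load_per_server` are invented for this example and do not come from any of the cited systems.

```python
from collections import Counter

WORLD_SIZE = 100.0   # hypothetical square world, coordinates in [0, 100)
GRID = 4             # fixed 4 x 4 grid of regions (assumption)
NUM_SERVERS = 4      # regions are assigned to servers round-robin (assumption)


def region_of(x: float, y: float) -> int:
    """Map a position to the index of the fixed region that contains it."""
    cell = WORLD_SIZE / GRID
    col = min(int(x // cell), GRID - 1)
    row = min(int(y // cell), GRID - 1)
    return row * GRID + col


def server_of(x: float, y: float) -> int:
    """Every object in a region lives on that region's server."""
    return region_of(x, y) % NUM_SERVERS


def load_per_server(positions):
    """Count how many objects each server hosts."""
    return Counter(server_of(x, y) for x, y in positions)


# Skewed popularity: most players crowd into one corner of the map,
# so one server carries almost all of the load.
players = [(5.0, 5.0), (6.0, 7.0), (8.0, 3.0), (10.0, 12.0), (90.0, 90.0)]
load = load_per_server(players)

# Migration: a short move across a region boundary changes the hosting server.
before = server_of(24.0, 5.0)
after = server_of(26.0, 5.0)
print(load, before, after)
```

In a fast-paced game, boundary crossings like the last one happen constantly, which is why the slide flags migration rate as a cost.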

Related Work: Techniques
Region-based Partitioning
Parallel Simulation
  Peer-to-peer: each peer maintains full state
  Writes to objects are sent to all peers
  + Point-to-point links, so updates go fastest
  – Needs lock-step + bucket synchronization
  – No conflict resolution, so inconsistency never heals
Area-of-Interest Management with Multicast

Related Work: Techniques
Region-based Partitioning
Parallel Simulation
Area-of-Interest Management with Multicast
  Players only need updates from nearby regions
  1 region == 1 multicast group; use one shared multicast tree per group
  – Bandwidth load imbalance due to skews in region popularity
  – Updates need multiple hops, bad for FPS games

Talk Outline
Problem
Background
Colyseus Architecture
  Scalability [NSDI 2006]
  Evaluation
  Security
Expected Contributions

Colyseus Components
[Figure: servers S1, S2, and S3; objects P1 to P4 and replicas R3, R4; components: Object Discovery, Replica Management, Object Placement; call shown: get_nearby_objects()]

Object Placement
Flexible and dynamic object placement
  Permits use of clustering algorithms
  Not tied to “regions”
Previous systems use region-based placement
  Frequent, disruptive migration for fast games
  Regions in a game have very skewed popularity
[Figure: region popularity vs. region rank]

Replication Model
Writes are serialized at the primary
  Primary responsible for executing think code
Replica trails the primary by one hop
  Weakly consistent
  Low latency is critical
[Figure: primary-backup replication, with a single primary and read-only replicas one hop away]
Object Discovery
Most objects only need other “nearby” Most objects only need other “nearby” objects for executing think functions objects for executing think functions
get_nearby_objects ()

Distributed Object Discovery
Publication (P):
  My position is x=x1, y=y1, z=z1
  Located on 128.2.255.255
Subscription (S):
  Find all objects with
  obj.x ∈ [x1, x2]
  obj.y ∈ [y1, y2]
  obj.z ∈ [z1, z2]
Use a structured overlay to achieve this
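The matching of publications against range subscriptions can be sketched as follows. This sketch runs the match locally in one process; in the design above the same match would be carried out by the overlay. The `Publication` and `Subscription` classes and the `discover` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    obj_id: str
    pos: tuple    # (x, y, z) position of the object
    host: str     # address of the node holding the object


@dataclass(frozen=True)
class Subscription:
    lo: tuple     # (x1, y1, z1), lower corner of the area of interest
    hi: tuple     # (x2, y2, z2), upper corner of the area of interest

    def matches(self, pub: Publication) -> bool:
        # A publication matches when every coordinate falls inside its range.
        return all(l <= p <= h for l, p, h in zip(self.lo, pub.pos, self.hi))


def discover(sub, pubs):
    """Return (object id, host) for every publication inside the subscription."""
    return [(p.obj_id, p.host) for p in pubs if sub.matches(p)]


pubs = [
    Publication("monster", (10, 20, 0), "128.2.9.200"),
    Publication("ammo", (300, 40, 0), "128.2.9.100"),
]
sub = Subscription(lo=(0, 0, 0), hi=(50, 50, 10))
nearby = discover(sub, pubs)
print(nearby)
```

The subscriber learns which host holds each nearby object, which is what it needs to fetch a replica directly from that host.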

Mercury: Range Queriable DHT [SIGCOMM 2004]
Supports range queries vs. exact matches
  No need for partitioning into “regions”
Places data contiguously
  Can utilize spatial locality in games
Dynamically balances load
  Control traffic does not cause hotspots
Provides O(log n)-hop lookup
  About 200 ms for 225 nodes in our setup
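To see why contiguous placement helps range queries, consider a simplified single-attribute sketch. It is not Mercury's protocol: routing, load balancing, and multiple attributes are left out, and the boundaries, node names, and the helpers `owner` and `nodes_for_range` are made up for illustration.

```python
import bisect

# Node i owns the contiguous range [BOUNDARIES[i], BOUNDARIES[i + 1]).
# Values at or beyond the last boundary fall to the last node.
BOUNDARIES = [0, 100, 200, 300, 400]
NODES = ["n0", "n1", "n2", "n3"]


def owner(x):
    """Node responsible for storing the value x."""
    i = bisect.bisect_right(BOUNDARIES, x) - 1
    return NODES[min(max(i, 0), len(NODES) - 1)]


def nodes_for_range(lo, hi):
    """Nodes whose ranges overlap the query [lo, hi]."""
    first = NODES.index(owner(lo))
    last = NODES.index(owner(hi))
    return NODES[first:last + 1]


narrow = nodes_for_range(120, 130)   # a small area of interest hits one node
wide = nodes_for_range(90, 210)      # a wider one hits a run of neighbours
print(narrow, wide)
```

With a hashing DHT, adjacent values would scatter across unrelated nodes, so a range query would have to be broken into many exact-match lookups.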

Object Discovery Optimizations
Pre-fetch soon-to-be required objects
  Use game physics for prediction
Pro-active replication
  Piggyback object creation on update messages
Soft-state subscriptions and publications
  Add object-specific TTLs to pubs and subs

Colyseus Design: Recap
[Figure: nodes 128.2.9.100 and 128.2.9.200 connected through Mercury. Messages shown: “Find me nearby objects” and “Monster on 128.2.9.200”. A replica is kept over a direct point-to-point connection.]

Putting It All Together

Talk Outline
Problem
Background
Colyseus Architecture
  Scalability
  Evaluation [NSDI 2006]
  Security
Expected Contributions

Evaluation Goals
Bandwidth scalability
  Per-node bandwidth usage should scale with the number of nodes
View inconsistency due to object discovery latency should be small
Discovery latency, pre-fetching overhead in [NSDI 2006]

Experimental Setup
Emulab-based evaluation
Synthetic game
  Workload based on Quake III traces
P2P scenario
  1 player per server
  Unlimited bandwidth
  Modeled end-to-end latencies
More results, including a Quake II evaluation, in [NSDI 2006]

Per-node Bandwidth Scaling
[Figure: mean outgoing bandwidth (kbps) vs. number of nodes]

View Inconsistency
[Figure: average fraction of mobile objects missing vs. number of nodes, for no delay, 100 ms delay, and 400 ms delay]

Planned Work
Consistency models
  Game operations demand differing levels of consistency and latency response
  Causal ordering of events
  Atomicity
Deployment
  Performance metrics depend crucially on the workload
  A real game workload would be useful for future research

Talk Outline
Problem
Background
Colyseus Architecture
  Scalability
  Evaluation
  Security [Planned Work]
Expected Contributions

Cheating in Online Games
Why do cheats arise?
  Distributed system (client-server or P2P)
  Bugs in the game implementation
Possible cheats in Colyseus
  Object Discovery: map-hack, subscription-hijack
  Replication: god-mode, event-ordering, etc.
  Object Placement: god-mode

Object Discovery Cheats
map-hack cheat [information overexposure]
  Subscribe to arbitrary areas in the game
  Discover all objects, which may be against game rules
Subscription-hijack cheat
  Incorrectly route subscriptions of your enemy
  Enemy cannot discover (see) other players
  Other players can see her and can shoot her

Replication Cheats
god-mode cheat
  Primary node has arbitrary control over writes to the object
Timestamp cheat
  Primary node decides the serialized write order
[Figure: Node A says “You die!”; Node B says “No, I don’t!”]

Replication Cheats
Suppress-update cheat
  Primary does not send updates to the replicas
Inconsistency cheat
  Primary sends incorrect or conflicting updates to the replicas
[Figure: Players A, B, C, and D, with speech bubbles “Hide from this guy”, “I am dead”, and “I moved to another room”]

Related Work
NEO protocol [GauthierDickey 04]
  Lock-step synchronization with commitment
    Send encrypted update in round 1
    Send decryption key in round 2, only after you receive updates from everybody
  + Addresses suppress-update cheat and timestamp cheat
  – Lock-step synchronization increases “lag”
  – Does not address god-mode cheat, among others
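The two-round commitment idea can be illustrated with a short sketch. Note the substitution: instead of encrypting the update and revealing the key, as the slide describes, this example uses a salted SHA-256 hash commitment, which gives the same "commit first, reveal later" property in less code. It is not the NEO protocol itself, and the `commit` and `verify` helpers are hypothetical.

```python
import hashlib
import secrets


def commit(update: bytes):
    """Round 1: produce a commitment that hides the update."""
    nonce = secrets.token_bytes(16)          # random salt prevents guessing
    digest = hashlib.sha256(nonce + update).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, update: bytes) -> bool:
    """Round 2: check the revealed update against the round-1 commitment."""
    return hashlib.sha256(nonce + update).hexdigest() == digest


# Round 1: the player sends only `digest` to everybody.
digest, nonce = commit(b"move north")

# Round 2: after receiving all other commitments, the player reveals
# the nonce and the update. Peers recompute the hash to check it.
honest = verify(digest, nonce, b"move north")
changed = verify(digest, nonce, b"move south")
print(honest, changed)
```

Because nobody can change an update after seeing the others' moves, late or reordered updates gain nothing, but every frame now costs two message rounds, which is the "lag" the slide warns about.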

Solution Approach
Philosophy: detection rather than prevention
  Preventing cheating ≈ Byzantine fault tolerance
  Known protocols emphasize strict consistency and assume weak synchrony
  Multiple rounds unsuitable for game-play
High-level decisions
  1. Make players leave an audit trail
  2. Make peers police each other
  3. Always keep detection out of the critical path

Distributed Audit
[Figure: three logs, a randomly chosen witness, and a centralized auditor]

Logging Using Witnesses
[Figure: a player node (think code, player log) and a witness node (witness log); labels: optimistic update path, serialized updates]

Using Witnesses: Good and Bad
+ Player and witness logs can be used for audits
    Potentially address timestamp, god-mode, and inconsistency cheats
+ Witness can generate pubs + subs
    Addresses map-hack cheat
– Bandwidth overhead
– Does not handle the suppress-update cheat or the subscription-hijack cheat
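How an audit over the two logs might work can be sketched as a simple comparison. The log format here (sequence number mapped to an update tuple) and the `audit` function are assumptions for illustration; the slides do not specify what the logs contain.

```python
def audit(player_log, witness_log):
    """Return the sequence numbers at which the two logs disagree.

    Each log maps a sequence number to the update recorded for it.
    An entry missing from one log counts as a disagreement.
    """
    suspicious = []
    for seq in sorted(set(player_log) | set(witness_log)):
        if player_log.get(seq) != witness_log.get(seq):
            suspicious.append(seq)
    return suspicious


# The player claims its health never dropped; the witness saw a hit land.
player_log = {1: ("health", 100), 2: ("health", 100), 3: ("pos", (4, 5))}
witness_log = {1: ("health", 100), 2: ("health", 50), 3: ("pos", (4, 5))}

flagged = audit(player_log, witness_log)
print(flagged)
```

The comparison runs after the fact, so it adds no latency to game-play, which matches the goal of keeping detection out of the critical path.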

Using Witnesses: Alternate Design
Move the primary directly to the witness node
  Code execution and writes directly applied at the witness
– Primary-to-replica updates go through the witness
– Witness gets arbitrary power
    Player cannot complain to anybody
[Figure: witness node has primary copy of player]

Challenges
Balance power between player and witness
  Use cryptographic techniques
How do players detect somebody is cheating?
  Extraction of rules from the game code
Securing the object discovery layer
  Leverage DHT security research
Keep bandwidth overhead minimal

Talk Outline
Problem
Background
Colyseus Architecture
  Scalability
  Evaluation
  Security
Expected Contributions

Expected Contributions
Mercury range-queriable DHT
Design and evaluation of Colyseus
Real-world measurement of game workloads
Anti-cheating protocols

Expected Contributions
Mercury range-queriable DHT
  First structured overlay to support range queries and dynamic load balancing
  Implementation used in other systems
Design and evaluation of Colyseus
Real-world measurement of game workloads
Anti-cheating protocols

Expected Contributions
Mercury range-queriable DHT
Design and evaluation of Colyseus
  First distributed design to be successfully applied for scaling FPS games
  Demonstrated that low-latency game-play is feasible
  Flexible architecture for adapting to various types of games
Real-world measurement of game workloads
Anti-cheating protocols

Expected Contributions
Mercury range-queriable DHT
Design and evaluation of Colyseus
Real-world measurement of game workloads
  Deployment of Quake III
Anti-cheating protocols

Expected Contributions
Mercury range-queriable DHT
Design and evaluation of Colyseus
Real-world measurement of game workloads
Anti-cheating protocols
  Encourage real-world deployments
  Lead towards lighter-weight fault-tolerance protocols

Summary of Thesis Statement
Design of scalable, secure architectures for games utilizing key properties
  Game workload is predictable
  Players tolerate loose, eventual consistency

Differences from Related Work
Avoid region-based object placement
  Frequent migration when objects move
  Load imbalance due to skewed region popularity
1-hop unicast update path between primaries and replicas
  Previous systems used overlay multicast
Replication model with eventual consistency
  Avoid parallel execution

Timeline
Development of newer consistency and anti-cheat protocols: May 06 to Jul 06
Integration of Colyseus with Quake III: May 06 to Jul 06
Implementation of consistency and anti-cheat protocols: Jul 06 to Sep 06
Deployment and evaluation: Jul 06 to Dec 06
Thesis writing: Dec 06 to Mar 07

Thanks

Object Discovery Latency
[Figure: mean object discovery latency (ms) vs. number of nodes]

Object Discovery Latency
Observations:
1. Routing delay scales similarly for both types of DHTs: both exploit caching effectively. Median hop-count = 3.
2. DHT gains a small advantage because it does not have to “spread” subscriptions

Bandwidth Breakdown
[Figure: mean outgoing bandwidth (kbps) vs. number of nodes]

Bandwidth Breakdown
Observations:
1. Object discovery forms a significant part of the total bandwidth consumed
2. A range-queriable DHT scales better vs. a normal DHT (with linearized maps)

Goals and Challenges
1. Relieve the computational bottleneck
   Challenge: partition code execution effectively
2. Relieve the bandwidth bottleneck
   Challenge: minimize bandwidth overhead due to object replication
3. Enable low-latency game-play
   Challenge: replicas should be updated as quickly as possible

Key Design Elements
Primary-backup replication model
  Read-only replicas
Flexible object placement
  Allow objects to be placed on any node
Scalable object lookup
  Use structured overlays for discovering objects

View Consistency
Object discovery should succeed as quickly as possible
  Missing objects lead to an incorrect rendered view
Challenges
  O(log n) hops for the structured overlay
    Not good enough for fast games
  Objects like missiles travel fast and are short-lived

Distributed Architectures: Motivation
Server farms? $$$
  Significant barrier to entry
Motivating factors
  Most game publishers are small
  Games grow old very quickly
What if you are ~1000 university students wanting to host and play a large game?

Colyseus Components
[Figure: server s1 (object store holding P1, P2, R3, R4) and server s2 (holding P3, P4), connected through Mercury]
Object Location
  1. Specify predicted interests: (5 < x < 60 & 10 < y < 200), TTL 30 sec
  2. Locate remote objects: P3 on s2, P4 on s2
Replica Management
  3. Register replicas: R3 (to s2), R4 (to s2)
  4. Synch replicas: R3, R4
Object Placement
  5. Optimize placement: migrate P1 to server s2

Object Pre-fetching
On-demand object discovery can cause stalls or render an incorrect view
Use game physics for prediction
  Predict which areas objects will move to
  Subscribe to object publications in those areas
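One simple way to turn a physics prediction into a subscription is to cover the object's current and extrapolated positions with a bounding box. The sketch below assumes straight-line motion in 2-D; the function `predicted_interest` and its parameters (`horizon`, `radius`) are hypothetical, and a real game would use its own movement rules.

```python
def predicted_interest(pos, vel, horizon, radius):
    """Bounding box the object may need to see within `horizon` seconds.

    pos, vel: (x, y) position and velocity; linear motion is assumed.
    radius:   how far the object can see around itself.
    Returns (lo, hi) corners suitable for a range subscription.
    """
    future = tuple(p + v * horizon for p, v in zip(pos, vel))
    lo = tuple(min(p, f) - radius for p, f in zip(pos, future))
    hi = tuple(max(p, f) + radius for p, f in zip(pos, future))
    return lo, hi


# A player at (10, 20) moving at (5, -2) units per second, looking 2 s ahead.
lo, hi = predicted_interest(pos=(10.0, 20.0), vel=(5.0, -2.0),
                            horizon=2.0, radius=3.0)
print(lo, hi)
```

Subscribing to the larger box costs extra bandwidth for objects that are never needed, so the horizon trades pre-fetching overhead against the risk of a stall.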

Pro-active Replication
Normal object discovery and replica instantiation are slow for short-lived objects
Piggyback object-creation messages on updates of other objects
  Replicate missile pro-actively wherever creator is replicated

Soft-state Storage
Objects need to tailor publication rate to speed
  Ammo or health-packs don’t move much
Add TTLs to subscriptions and publications
  Stored pubs act like triggers to incoming subs
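The soft-state idea can be sketched as a small store in which every publication and subscription carries its own TTL. This is an illustration only: it uses a single coordinate, takes the current time as an explicit `now` argument so the behaviour is deterministic, and the `SoftStateStore` class is a hypothetical name.

```python
class SoftStateStore:
    """Holds publications and subscriptions that expire after a TTL."""

    def __init__(self):
        self.pubs = {}   # obj_id -> (x, expires_at)
        self.subs = {}   # sub_id -> (lo, hi, expires_at)

    def _expire(self, now):
        # Soft state: entries vanish on their own unless refreshed.
        self.pubs = {k: v for k, v in self.pubs.items() if v[1] > now}
        self.subs = {k: v for k, v in self.subs.items() if v[2] > now}

    def publish(self, obj_id, x, ttl, now):
        """Store a publication; return the live subscriptions it matches."""
        self._expire(now)
        self.pubs[obj_id] = (x, now + ttl)
        return [s for s, (lo, hi, _) in self.subs.items() if lo <= x <= hi]

    def subscribe(self, sub_id, lo, hi, ttl, now):
        """Store a subscription; stored publications trigger immediately."""
        self._expire(now)
        self.subs[sub_id] = (lo, hi, now + ttl)
        return [o for o, (x, _) in self.pubs.items() if lo <= x <= hi]


store = SoftStateStore()
store.publish("health-pack", x=40, ttl=60, now=0)   # rarely moves: long TTL
store.publish("missile", x=45, ttl=1, now=0)        # fast object: short TTL

early = store.subscribe("player-1", lo=30, hi=50, ttl=5, now=0.5)
late = store.subscribe("player-2", lo=30, hi=50, ttl=5, now=2)
print(early, late)
```

A slow object with a long TTL publishes rarely yet is still found by later subscribers, while the missile's entry disappears without any explicit delete message.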

Per-node Bandwidth Scaling
Observations:
1. Colyseus bandwidth costs scale well with the number of nodes
2. Feasible for P2P deployment (compare single-server or broadcast)
3. In aggregate, Colyseus bandwidth costs are 4-5 times higher, so there is overhead

View Inconsistency
Observations:
1. View inconsistency is small and gets repaired quickly
2. Missing objects on the periphery

Cheating in Games
Examples of some cheats
  Information overexposure (maphack)
  Get arbitrary health, weapons (god-mode)
  Precise and automatic weapons (aimbot)
  Event ordering
    Did I shoot you first or did you move first?
  Exploiting bugs inside the game (duping)

Distributed Design Components
[Figure labels: Object, Replica, Object Discovery, Instantiate Replicas]