Scalable and transparent parallelization of multiplayer games


1

Scalable and transparent parallelization of multiplayer games

Bogdan Simion, MASc thesis

Department of Electrical and Computer Engineering

2

Multiplayer games

- Captivating, highly popular
- Dynamic artifacts


6

Multiplayer games

- Long playing times:
  1. World of Warcraft: "I've been playing this Mage for 3 and a half years now, and I've invested too much time, blood, sweat and tears to quit now."
  2. Halo 2: "My longest playing streak was last summer, about 19 hours playing Halo2 on my XBox."
- More than 100k concurrent players
- Game server is the bottleneck

7

Server scaling

Game code parallelization is hard:
- Complex and highly dynamic code
- Concurrency issues (data races) require conservative synchronization
- Deadlocks

8

State-of-the-art

Parallel programming paradigms:
- Lock-based (pthreads)
- Transactional memory

Previous parallelizations of Quake:
- Lock-based [Abdelkhalek et al. '04] shows that false sharing is a challenge


10

Transactional Memory vs. Locks

Advantages:
- Simpler programming task
- Transparently ensures correct execution
- Shared data access tracking
- Detects conflicts and aborts conflicting transactions

Disadvantages:
- Software (STM) access tracking overheads
- Never shown to be practical for real applications

11

Contributions

- Case study of parallelization for games: a synthetic version of Quake (SynQuake)
- We compare two approaches: lock-based and STM parallelizations
- We showcase the first realistic application where STM outperforms locks

12

Outline

- Application environment: SynQuake game (data structures, server architecture)
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions

13

Environment: SynQuake game

Simplified version of Quake

- Entities: players, resources (apples), walls
- Emulated quests

14

SynQuake

- Players: can move and interact (eat, attack, flee, go to quest)
- Apples: food objects, increase life
- Walls: immutable, limit movement

Contains all the features found in Quake

15

Game map representation

Fast retrieval of game objects

Spatial data structure: areanode tree

16-20

Areanode tree

[Figure sequence: the game map is recursively split, first into halves A and B, then into leaves A1, A2, B1, B2; the areanode tree mirrors this with a root node, internal nodes A and B, and leaves A1, A2, B1, B2.]
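A minimal C sketch of how such a spatial structure might be represented and queried. The names and the simple two-way split are illustrative assumptions, not SynQuake's exact code; the idea is that a range query only descends into subtrees whose region overlaps the query box.

/* Illustrative areanode tree (hypothetical names, not SynQuake's structures). */
typedef struct { float min_x, min_y, max_x, max_y; } box_t;

typedef struct entity {
    box_t bounds;
    struct entity *next;          /* linked into the node that stores it */
} entity_t;

typedef struct areanode {
    box_t region;                 /* map area covered by this node */
    struct areanode *child[2];    /* two children per split; NULL for leaves */
    entity_t *entities;           /* entities stored at this node */
} areanode_t;

static int boxes_overlap(const box_t *a, const box_t *b) {
    return a->min_x <= b->max_x && a->max_x >= b->min_x &&
           a->min_y <= b->max_y && a->max_y >= b->min_y;
}

/* Visit every entity whose bounds overlap the query box: check the entities
 * stored here, then descend only into children that overlap the query. */
static void areanode_query(const areanode_t *node, const box_t *query,
                           void (*visit)(entity_t *)) {
    if (node == NULL || !boxes_overlap(&node->region, query))
        return;
    for (entity_t *e = node->entities; e != NULL; e = e->next)
        if (boxes_overlap(&e->bounds, query))
            visit(e);
    areanode_query(node->child[0], query, visit);
    areanode_query(node->child[1], query, visit);
}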

21-24

Server frame

[Figure sequence: one server frame is divided by barriers into stages: (1) receive & process client requests, (2) an admin stage run by a single thread, (3) form & send replies (client updates).]
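A minimal pthreads sketch of this frame structure, assuming a worker-thread model; the function names are hypothetical stand-ins for the real stage code, and pthread_barrier_init(&frame_barrier, NULL, NUM_THREADS) is assumed to have been called once at startup.

#include <pthread.h>

#define NUM_THREADS 4

/* Hypothetical per-stage work functions; the real server fills these in. */
void receive_and_process_requests(int tid);   /* stage 1: parallel      */
void run_admin_stage(void);                   /* stage 2: single thread */
void form_and_send_replies(int tid);          /* stage 3: parallel      */

static pthread_barrier_t frame_barrier;       /* initialized for NUM_THREADS */

static void *server_thread(void *arg) {
    int tid = (int)(long)arg;
    for (;;) {                                /* one loop iteration = one frame */
        receive_and_process_requests(tid);
        pthread_barrier_wait(&frame_barrier);
        if (tid == 0)
            run_admin_stage();                /* only one thread runs admin */
        pthread_barrier_wait(&frame_barrier);
        form_and_send_replies(tid);
        pthread_barrier_wait(&frame_barrier); /* frame boundary */
    }
    return NULL;
}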

25

Parallelization in games

Quake - Lock-based synchronization [Abdelkhalek et al. 2004]

26

Parallelization: request processing

[Figure: the server frame pipeline from before, with the receive & process requests stage marked "Parallelization in this stage".]

27

Outline

- Application environment: SynQuake game
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions

28

Parallelization overview

- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies

29

Collision detection

• Player actions: move, shoot, etc.
• Calculate action bounding box

30-33

Action bounding box

[Figure sequence: players P1, P2, P3 on the map with their action bounding boxes; P1 performs a short-range action, P2 a long-range one; the overlapping regions between the boxes are marked "Overlap P1" and "Overlap P2".]
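A hedged C sketch of computing an action bounding box; the range constants and helper names are illustrative assumptions. The box is the player's position extended by the action's reach, so a long-range action (shoot) produces a larger box, and therefore more overlaps, than a short move.

typedef struct { float x, y; } vec2_t;
typedef struct { float min_x, min_y, max_x, max_y; } box_t;

/* Illustrative reach values; a real game derives them from the action. */
#define MOVE_RANGE   8.0f
#define SHOOT_RANGE 64.0f

/* Everything the action could possibly touch. */
static box_t action_bounding_box(vec2_t pos, float range) {
    box_t b = { pos.x - range, pos.y - range, pos.x + range, pos.y + range };
    return b;
}

/* If two boxes overlap and the players are handled by different threads,
 * the threads must synchronize (locks) or may conflict and abort (STM). */
static int boxes_overlap(box_t a, box_t b) {
    return a.min_x <= b.max_x && a.max_x >= b.min_x &&
           a.min_y <= b.max_y && a.max_y >= b.min_y;
}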

34

Player assignment

[Figure: players P1, P2, P3 handled by threads T1, T2, T3.]

If players P1, P2, P3 are assigned to distinct threads → synchronization required

Long-range actions have a higher probability of causing conflicts

35

False sharing

36-38

False sharing

[Figure sequence: a player's move range and shoot range, with the action bounding box used by the lock-based version compared against the area accessed by the TM version.]

39

Parallelization overview

- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies

40

Synchronization algorithm: Locks

- Hold locks on parents as little as possible
- Deadlock-free algorithm

41-44

Synchronization algorithm: Locks

[Figure sequence: a game map with players P1-P6 and the corresponding areanode tree (root, internal nodes A and B, leaves A1, A2, B1, B2). The action's area of interest is computed, the leaves it overlaps are identified and locked, and parent nodes are locked only temporarily.]


46

Synchronization: Locks vs. STM

Locks:
1. Determine overlapping leaves (L)
2. LOCK(L)
3. Process L
4. For each node P in overlapping parents:
       LOCK(P)
       Process P
       UNLOCK(P)
5. UNLOCK(L)

STM:
1. BEGIN_TRANSACTION
2. Determine overlapping leaves (L)
3. Process L
4. For each node P in overlapping parents:
       Process P
5. COMMIT_TRANSACTION

STM acquires ownership gradually → reduced false sharing
Consistency ensured transparently by the STM
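A hedged C sketch of the two schemes above, assuming a per-node pthread mutex and a generic STM interface; stm_begin/stm_commit are placeholders for whatever STM library's begin/commit calls are used, and all names are illustrative rather than the thesis's exact code. Acquiring the leaf locks in a fixed order is one common way to keep the lock-based version deadlock-free.

#include <pthread.h>

/* Hypothetical tree node carrying its own lock. */
typedef struct node {
    int id;
    pthread_mutex_t lock;
} node_t;

void process_node(node_t *n);   /* apply the player's action to this node */

/* Lock-based version: lock all overlapped leaves up front (assumed sorted
 * in a fixed order), then lock each overlapped parent only briefly. */
void update_with_locks(node_t **leaves, int nl, node_t **parents, int np) {
    for (int i = 0; i < nl; i++) pthread_mutex_lock(&leaves[i]->lock);
    for (int i = 0; i < nl; i++) process_node(leaves[i]);
    for (int i = 0; i < np; i++) {
        pthread_mutex_lock(&parents[i]->lock);
        process_node(parents[i]);
        pthread_mutex_unlock(&parents[i]->lock);
    }
    for (int i = nl - 1; i >= 0; i--) pthread_mutex_unlock(&leaves[i]->lock);
}

/* Placeholders for an STM library's transaction begin/commit. */
void stm_begin(void);
void stm_commit(void);

/* STM version: the transaction tracks accesses and aborts on conflict,
 * so ownership of nodes is acquired gradually as they are touched. */
void update_with_stm(node_t **leaves, int nl, node_t **parents, int np) {
    stm_begin();
    for (int i = 0; i < nl; i++) process_node(leaves[i]);
    for (int i = 0; i < np; i++) process_node(parents[i]);
    stm_commit();
}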

47

Parallelization overview

- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies

48

Load balancing issues

- Assign tasks to threads
- Balance workload

[Figure: the map divided among threads T1-T4, with players P1-P4.]

49

Load balancing issues

- Assign tasks to threads
- Cross-border conflicts → synchronization

[Figure: the same map; a move action and a shoot action cross thread boundaries.]

50

Load balancing goals

Tradeoff:
- Balance workload among threads
- Reduce synchronization/true sharing

51-53

Load balancing policies

[Figure sequence: a 1024 x 1024 map divided among Threads 1-4 under three static policies: a) round-robin, b) spread, c) static locality-aware.]
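A minimal C sketch of two of the static policies shown above, assuming the map is split into a grid of equal regions; the grid size and function names are illustrative. Round-robin cycles regions through the threads regardless of position, while the static locality-aware policy gives each thread one contiguous quadrant so that neighbouring players usually stay on the same thread (the spread policy, which scatters interleaved regions across threads, is omitted here).

#define GRID        8   /* illustrative: map divided into GRID x GRID regions */
#define NUM_THREADS 4

/* a) Round-robin: cycle regions through the threads in index order. */
int assign_round_robin(int rx, int ry) {
    return (ry * GRID + rx) % NUM_THREADS;
}

/* c) Static locality-aware: each thread owns one contiguous quadrant. */
int assign_quadrant(int rx, int ry) {
    return (ry >= GRID / 2 ? 2 : 0) + (rx >= GRID / 2 ? 1 : 0);
}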

54

Locality-aware load balancing

Dynamically detect player hotspots and adjust workload assignments

Compromise between load balancing and reducing synchronization

55-56

Dynamic locality-aware LB

[Figure sequence: the game map and its graph representation.]
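Purely as an illustration of the balance-versus-locality tradeoff (this is not the thesis's algorithm), a greedy C sketch: each map region is weighted by its current player count, and regions are handed to the least-loaded thread unless the thread owning an adjacent region is not much more loaded, in which case the adjacent owner is preferred so hotspots stay together.

#define GRID        8
#define NUM_THREADS 4

/* players[y][x] = current player count in that region (its hotspot weight);
 * owner[y][x] receives the thread assigned to the region. */
void assign_dynamic(const int players[GRID][GRID], int owner[GRID][GRID]) {
    int load[NUM_THREADS] = {0};

    for (int y = 0; y < GRID; y++) {
        for (int x = 0; x < GRID; x++) {
            /* Thread owning the left/top neighbour, if any (locality). */
            int pref = -1;
            if (x > 0) pref = owner[y][x - 1];
            else if (y > 0) pref = owner[y - 1][x];

            /* Least-loaded thread (balance). */
            int best = 0;
            for (int t = 1; t < NUM_THREADS; t++)
                if (load[t] < load[best]) best = t;

            /* Keep locality unless the preferred thread is clearly overloaded. */
            int pick = (pref >= 0 && load[pref] <= load[best] + players[y][x])
                           ? pref : best;
            owner[y][x] = pick;
            load[pick] += players[y][x];
        }
    }
}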

57

Experimental results

- Test scenarios
- Scaling: with and without physics computation
- The effect of load balancing on scaling
- The influence of locality-awareness

58

Quest scenarios

[Figure: a 1024 x 1024 map with Quest 1 marked.]

59

Quest scenarios

[Figure: a 1024 x 1024 map with Quests 1-4 marked.]

60

Scalability

61

Processing times – without physics

62

Processing times – with physics

63

Load balancing

64

Quest scenarios (4 quadrants)

[Figure: static vs. dynamic assignment of the map to Threads 1-4.]

65

Quest scenarios (4 splits)

[Figure: static vs. dynamic assignment of the map to Threads 1-4.]

66

Quest scenarios (1 quadrant)

[Figure: static vs. dynamic assignment of the map to Threads 1-4.]

67

Locality-aware load balancing (locks)

68

Conclusions

First application where STM outperforms locks:
- Overall performance of STM is better at 4 threads in all scenarios
- Reduced false sharing through on-the-fly collision detection

Locality-aware load balancing reduces true sharing, but only for STM

69

Thank you!

70

Splitting components (1 center quest)

71

Load balancing (short range actions)

72

Locality-aware load balancing (STM)