Software Distributed Shared Memory (SDSM): MultiView


Post on 31-Jan-2016


DSM Innovations - MultiView

Software Distributed Shared Memory (SDSM):

MultiView

1. SDSM, false sharing.

2. Solution: MultiView.

3. Granularity adaptation.

4. Integrated services.

Ayal Itzkovitz, Assaf Schuster


A multi-core system (simplified)

A parallel program may spawn processes (threads) in order to utilize all computing units

Processes communicate through shared memory, physically located on the local machine

[Figure: four cores sharing one local memory]

A distributed system

[Figure: three hosts, each with a core and its own local memory, connected by a network; a virtual shared memory spans the hosts]

Emulation of the same programming paradigm

Ultimately: no changes to source/binary code


The First SDSM System

The first SDSM system: Ivy [Li & Hudak, Yale, ‘86]

Strict memory semantics (Lamport’s sequential consistency)

Page-based: memory pages as units of sharing

The major performance limitation: page size, which causes false sharing

Page size: 4K (and more)

Average object size: 28 bytes

About 150 objects on a page
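As a quick check of the figure above, dividing the page size by the average object size gives the number of objects that end up sharing one page (taking 4K to mean 4096 bytes, which is an assumption about the slide's shorthand):

```latex
\[
\frac{4096\ \text{bytes per page}}{28\ \text{bytes per object}} \approx 146 \approx 150\ \text{objects per page}
\]
```

With page-granularity sharing, a write to any one of these objects affects the page that holds all of them.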


Object Distribution

Object Distribution – Memory View

[Figure: hosts connected by a network]

False Sharing

“…the conventional wisdom remains that the overhead of false sharing […] in page-based consistency protocols is the primary factor limiting the performance of software SDSM”

[Amza, Cox, Rajamani, and Zwaenepoel, PPoPP ‘97]

“[The] conventional wisdom holds that fine-grain performance and false sharing doom page-based approaches”

[Buck and Keleher, IPPS ‘98]


Solution: The MultiView Approach

“MultiView and Millipage – Fine-grain Sharing in Page-based SDSMs” [Itzkovitz and Schuster, OSDI ‘99]

Implement small-size pages through special memory configuration

Other goals:

Without compromising the strict memory consistency [ICS’04, EuroPar’04]

Utilizing low-latency networks (Myrinet, VIA/ServerNet-II, InfiniBand) [Hot-Interconnects’03, IPDPS’04]

Transparency [EuroPar’03]

Adaptive sharing granularity [ICPP’00, IPDPS’01 best paper]

Maximize locality through migration and load sharing [DISC’01]

Additional “service layers” (garbage collection, data-race detection) [JPDC’01, JPDC’02]


The Traditional Memory Layout

[Figure: traditional layout, with x, y, z, w, v and u placed in the same region of memory]

    struct a { … };
    struct b { … };
    int x, y, z;

    main() {
        w = malloc(sizeof(struct a));
        v = malloc(sizeof(struct a));
        u = malloc(sizeof(struct b));
        …
    }

The MultiView Technique

[Figure: the traditional layout next to the MultiView layout of x, y, z, w, v and u]

Protection is now set independently

[Figure: traditional vs. MultiView layout; in MultiView each variable carries its own protection (RW, NA, R)]

Variables reside in the same page but are not shared

[Figure: View 1, View 2 and View 3 all map the same memory object]

Memory Layout

[Figure: the memory object holding x, y, z, w, v and u, mapped by View 1, View 2 and View 3]

[Figure: Host A and Host B each map their copy of the memory object through View 1, View 2 and View 3; on each host every view has its own protection (R, RW or NA)]



Enabling Technology

Memory-mapped I/O, created for inter-process communication

[Figure: a shared memory object]


Implementation: Millipage

A shared memory object can be used by a single process to provide the desired functionality

• Windows-NT (Solaris, BSD, Linux)

• CreateFileMapping(), MapViewOfFileEx() for allocating views
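The idea of several views onto one memory object can be sketched with the POSIX counterparts of the calls above. This is a minimal illustration, not Millipage code: it assumes a POSIX system, uses an unlinked temporary file as the memory object, and the helper names memobj_create and view_map are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temporary file of `size` bytes
 * to act as the memory object. Returns a file descriptor, or -1 on error. */
static int memobj_create(size_t size)
{
    char path[] = "/tmp/multiview-XXXXXX";
    int fd;

    assert(size > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* the object lives on through fd */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: map one more view of the whole object with its own
 * protection. Each call returns a different virtual address range backed
 * by the same physical pages. Returns NULL on failure. */
static void *view_map(int fd, size_t size, int prot)
{
    void *p;

    assert(size > 0);
    p = mmap(NULL, size, prot, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would map the object once per view and then change the protection of each view separately with mprotect(); a store through a read-write view becomes visible through every other view, while a view set to PROT_NONE faults on access even though its neighbours stay accessible.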


Transparency

1999: Minipages are allocated at malloc time (via a malloc-like API)

Allocation routines should be slightly modified

Original allocation:

    mat = malloc(lines*cols*sizeof(int));
    …
    mat[i][j] = mat[i-1][j]+mat[i][j-1];
    …

Modified allocation (one malloc per row):

    mat = malloc(lines*sizeof(int*));
    for(i=0;i<lines;i++)
        mat[i] = malloc(cols*sizeof(int));
    …
    mat[i][j] = mat[i-1][j]+mat[i][j-1];
    …

SOR and LU have not been modified at all

WATER: changed ~20 lines out of 783 lines

IS: changed 5 lines out of 93 lines

TSP: changed ~15 lines out of ~400 lines

2003: complete transparency, through binary instrumentation and interception of OS calls

SOR SPLASH-II Benchmark

[Figure: SOR speedup (0 to 8) vs. number of threads (0 to 10). Series: Transparent DSM, Millipede 4.0, Transparent+Barrier, SMP (2 processors)]

Performance with Fixed Granularity (NBodyW on 8 nodes)

[Figure: run time [s] (50 to 62) vs. allocation granularity]

False Sharing vs. Prefetching (WATER)

[Figure: compete requests (x 10) and read/write faults (0 to 10000), together with efficiency (0.50 to 1.10), vs. chunking level (1 to 6, none), each for 4 and 8 hosts]

Adapting Granularity

[Figure: shared data elements vs. application run time]

Adaptation is dynamic, automatic, transparent

Performance (VIA/ServerNet-II, 2004)

[Figure, two panels: Water-nsq speedup vs. nodes (1 to 12), with one thread per node and with two threads per node. Series: SC/MV - fine granularity, HLRC, Mixed consistency, SC/MV - best static granularity, SC/MV - dynamic granularity]

Integrating Data Race Detection

Detection in application variable granularity

[Figure, two panels: overheads with 1 processor and with 8 processors, as normalized execution time (0.8 to 1.7) for SOR, LU, IS, TSP and WATER. Series: NO_DR, BAS, PCT, OPT]

Integrating Distributed Garbage Collection (Remote Reference Counting)

Collection in native application granularity.

[Figure: overhead (0% to 40%) vs. garbage creation ratio (obj/sec)]

IS (0.8 obj/sec): 0.20%

LU (30 obj/sec): 2.50%

WATER (31 obj/sec): 2.60%

SOR (1140 obj/sec): 37.70%


Questions?

Types of Parallel Systems

1. In-core multi-threading

2. Multi-core/SMP multi-threading

3. Tightly-coupled cluster, customized interconnect (SGI’s Altix)

4. Tightly-coupled cluster, off-the-shelf interconnect (InfiniBand)

5. WAN, Internet, Grid, peer-to-peer

[Figure: the five types arranged along two axes, Scalability and Communication Efficiency]

Traditionally: 1+2 are programmable using shared memory, 3+4 are programmable using message passing, and in 5 peer processes communicate with central control only.

HDSM: systems in 3 move towards presenting a shared memory interface to a physically distributed system.

What about 4, 5? Software Distributed Shared Memory = SDSM


Matrix Multiplication

[Figure: two threads; A and B are read-only matrices (R), C is the write matrix (W)]

    A = malloc(MATSIZE);
    B = malloc(MATSIZE);
    C = malloc(MATSIZE);

    parfor(n) mult(A, B, C);

    mult(id):
        for (line = N*id .. N*(id+1))
            for (col = 0..N)
                C[line,col] = multline(A[line], B[col]);
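The pseudocode above can be sketched in plain C as follows. This is an illustration only: it assumes square n x n matrices stored row-major as double, and the helper names band_of and mult_band are hypothetical, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range of rows [*first, *last) owned by thread `id`
 * when n rows are divided among `nthreads` threads. */
static void band_of(size_t n, size_t nthreads, size_t id,
                    size_t *first, size_t *last)
{
    assert(nthreads > 0 && id < nthreads);
    *first = n * id / nthreads;
    *last  = n * (id + 1) / nthreads;
}

/* Compute rows [first, last) of C = A x B for n x n row-major matrices.
 * A and B are only read; only the thread's own rows of C are written. */
static void mult_band(const double *A, const double *B, double *C,
                      size_t n, size_t first, size_t last)
{
    for (size_t i = first; i < last; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

Each thread writes a disjoint band of C, so there is no true sharing of the written data. Rows that belong to different threads can still fall on the same page, which is the case the false sharing slides illustrate.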

[Figure: two hosts on a network, each computing A x B = C. A and B are read-only (RO) on both hosts and are sent once; the pages of C are read-write (RW)]

Matrix Multiplication - False Sharing

[Figure sequence: the same two hosts computing A x B = C. A and B stay read-only (RO) and are sent once; a page of C is read-write (RW) on one host and not accessible (NA) on the other, and these protections change as the sequence advances]


First Approach: Weak Semantics

Example - Release Consistency:

Allow multiple writers to a page (assume exclusive update for any portion of the page)

Each page has a twin copy

At synchronization time, all pages perform a “diff” with their twins, and send the diffs to managers

Managers hold master copies

[Figure: two hosts, each holding an RW page and its twin; both apply their diffs]
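The twin and diff steps listed above can be sketched as follows. This is a simplified byte-wise diff under stated assumptions, not the encoding of any particular system: the page size constant DSM_PAGE_SIZE, the struct diff_entry layout and the function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DSM_PAGE_SIZE 4096   /* assumed page size */

/* One modified byte: its offset in the page and its new value. */
struct diff_entry {
    size_t offset;
    unsigned char value;
};

/* Save a copy of the page before the first local write.
 * The caller frees the twin. Returns NULL if allocation fails. */
static unsigned char *make_twin(const unsigned char *page)
{
    unsigned char *twin = malloc(DSM_PAGE_SIZE);
    if (twin != NULL)
        memcpy(twin, page, DSM_PAGE_SIZE);
    return twin;
}

/* At synchronization time, record every byte that differs from the twin.
 * `out` must have room for DSM_PAGE_SIZE entries. Returns the entry count. */
static size_t make_diff(const unsigned char *page, const unsigned char *twin,
                        struct diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < DSM_PAGE_SIZE; i++) {
        if (page[i] != twin[i]) {
            out[n].offset = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* On the manager, merge a received diff into the master copy. */
static void apply_diff(unsigned char *master,
                       const struct diff_entry *diff, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(diff[i].offset < DSM_PAGE_SIZE);
        master[diff[i].offset] = diff[i].value;
    }
}
```

Two writers that touch different bytes of the same page produce diffs that do not overlap, so the manager can apply them in either order. Note that make_diff scans the whole page even when only a few bytes changed.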

Allow memory to reside in an inconsistent state for time intervals

Enforce consistency only at synchronization points

Reaching a consistent view of the memory requires computation

Reduces (but does not always eliminate) false sharing

Reduces the number of protocol messages

Weak memory semantics

Involves both memory and processing time overhead

Still: coarse-grain sharing (why diff at locations not touched?)

Software DSM Evolution - Weak Semantics

Page-grain, relaxed consistency:

IVY (Li & Hudak), ‘86, Yale

Munin, ‘92, release consistency, Rice

Midway, ‘93, entry consistency, CMU

TreadMarks, ‘94, lazy release consistency, Rice

Brazos, ‘97, scope consistency, Rice

Software DSM Evolution - Multithreading

Multithreading:

CVM (Maryland) and Millipede (Technion), ‘96, multi-protocol

Quarks, ‘98, protocol latency hiding, Utah

Second Approach: Code Instrumentation

Example - Binary Rewriting: wrap each load and store with instructions that check whether the data is available locally

Source:

    line += 3;
    v = v - line;

Compiled:

    load r1, ptr[line]
    load r2, ptr[v]
    add r1, 3h
    store r1, ptr[line]
    sub r2, r1
    store r2, ptr[v]

After code instrumentation:

    push ptr[line]
    call __check_r
    load r1, ptr[line]
    push ptr[v]
    call __check_r
    load r2, ptr[v]
    add r1, 3h
    push ptr[line]
    call __check_w
    store r1, ptr[line]
    push ptr[line]
    call __done
    sub r2, r1
    push ptr[v]
    call __check_w
    store r2, ptr[v]
    push ptr[v]
    call __done

After optimization:

    push ptr[line]
    call __check_w
    load r1, ptr[line]
    push ptr[v]
    call __check_w
    load r2, ptr[v]
    add r1, 3h
    store r1, ptr[line]
    push ptr[line]
    call __done
    sub r2, r1
    store r2, ptr[v]
    push ptr[v]
    call __done
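What an inserted check might do can be sketched in C. This is a toy model under stated assumptions, not the code of any instrumentation system: the block size, the state table, and the names check_r, check_w and fetch_block are hypothetical, and fetch_block only counts requests where a real system would contact the owner of the data.

```c
#include <assert.h>
#include <stddef.h>

/* Local access rights held for one block of shared data. */
enum access { ACC_NONE, ACC_READ, ACC_WRITE };

#define BLOCK_SHIFT 6        /* assumed 64-byte blocks */
#define NUM_BLOCKS  1024     /* assumed size of the shared region, in blocks */

static enum access block_state[NUM_BLOCKS];   /* starts as ACC_NONE */
static unsigned long remote_fetches;          /* protocol requests issued */

/* Stand-in for the protocol: obtain the block with the wanted rights. */
static void fetch_block(size_t block, enum access want)
{
    remote_fetches++;
    block_state[block] = want;
}

/* Check placed before a load: any local copy is good enough. */
static void check_r(size_t addr)
{
    size_t block = addr >> BLOCK_SHIFT;
    assert(block < NUM_BLOCKS);
    if (block_state[block] == ACC_NONE)
        fetch_block(block, ACC_READ);
}

/* Check placed before a store: write rights are required. */
static void check_w(size_t addr)
{
    size_t block = addr >> BLOCK_SHIFT;
    assert(block < NUM_BLOCKS);
    if (block_state[block] != ACC_WRITE)
        fetch_block(block, ACC_WRITE);
}
```

Every load and store pays for the table lookup even when the data is already local, which matches the overhead on a single machine that the cost list for this approach mentions. A write check also satisfies a later read of the same block, which is the kind of redundancy the optimized listing removes.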

Provides fine-grain access control, thus avoids false sharing

Bypasses the page protection mechanism

Usually, fixed granularity for all application data (still, false sharing)

Needs a special compiler or binary-level rewriting tools

Cost:

High overheads (even on a single machine)

Inflated code

Not portable (among architectures)

Software DSM Evolution

Fine-grain, code instrumentation:

Blizzard, ‘94, binary instrumentation, Wisconsin

Shasta, ‘97, transparent, works for commercial apps, Digital WRL


MultiView - Overheads

Application: traverse an array of integers, all packed up in minipages

The number of minipages is derived from the value of max views in page

Limitations of the experiments:

1.63 GB contiguous address space available

Up to 1664 views

Need 64 bits!
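One way to read these two limits together, assuming each view maps a memory object of 1 MB (the object size is an assumption here, not stated on the slide): every view consumes virtual address space equal to the size of the object, so

```latex
\[
1664\ \text{views} \times 1\ \text{MB per view} = 1664\ \text{MB} = 1.625\ \text{GB} \approx 1.63\ \text{GB}
\]
```

Under that assumption the number of views is capped by the contiguous address space rather than by physical memory, which is consistent with the call for 64 bits.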

As expected, committed (physical) memory is constant

Only a negligible overhead (< 4%), due to TLB misses

[Figure: slowdown (0.96 to 1.08) vs. number of views (1 to 32) for array sizes 512 KB, 1 MB, 2 MB, 4 MB, 8 MB and 16 MB]


MultiView - Taking it to the extreme

Beyond critical points overhead becomes substantial

[Figure: slowdown (0 to 20) vs. number of views for array sizes 512 KB, 1 MB, 2 MB, 4 MB, 8 MB and 16 MB, with critical points marked on the 1 MB, 2 MB, 4 MB and 8 MB curves]

Number of minipages at critical points is 128K

Slowdown due to L2 cache exhausted by PTEs


The Transparent DSM: System Initialization

For most DSM systems, initialization is an almost trivial task

The transparent DSM system cannot use such a simple solution

In order to initialize a DSM system transparently we have to inject the initialization code into the loaded application


Standard Initialization

Standard C application:

    crtStartup:
        …
        call c_init
        …
        call main
        …

    main:
        …application code…

crtStartup is startup code from the C standard library. This code is identical for all C applications. crtStartup is the entry point of the executable.

call c_init: initialize the C runtime library

call main: start the application. This instruction lies at a fixed offset from crtStartup. We denote this offset as main_call_offset.


Transparent DSM System Initialization

Application:

    crtStartup:
        …
        call c_init
        …
        call main        (patched by the injected DLL to call hookedMain)
        …

    main:
        …application code…

Injected DLL:

    mainPtr dd NULL

    DllMain:
        …
        crtStartup = get_entry_point();
        mainPtr = *(crtStartup + main_call_offset);
        *(crtStartup + main_call_offset) = hookedMain;
        …

    hookedMain:
        dsm_init(…);
        dsm_create_thread(…,mainPtr,…);
        …

1. The OS passes control to DllMain() after the DLL has been loaded

2. The main thread is resumed

3. Initialize the C runtime library

4. Initialize the DSM system (the OS API is intercepted, globals are moved to the DSM)

5. The application main thread is created using the DSM system thread creation API


SDSMs on Emerging Fast Networks

Fast networking is an emerging technology

MultiView provides only one aspect: reducing message sizes

The next magnitude of improvement shifts from the network layer to the system architectures and protocols that use those networks

Challenges:

Efficiently employ and integrate fast networks

Provide a “thin” protocol layer: reduce protocol complexity, eliminate buffer copying, use home-based management, etc.


Adding the Privileged View

Constant Read/Write permissions

Separate application threads from SDSM injected threads

Atomic updates: DSM threads can access (and update) memory while application threads are prohibited

Direct send/receive: memory-to-memory, no buffer copying

[Figure: the application views of x, y, z with protections RW, NA and R, and the privileged view with RW, all mapping the same memory object]

Coarse Granularity

[Figure: Host 1, Host 2, Host 3 and a manager, each host showing minipages 1 to 6. A memory access request for minipages 1-6 goes to the manager, which sends requests on to the other hosts; the replies carry data 2, 4, 5 and data 1, 3]

Automatic Adaptation of Granularity

Recompose (fine to coarse granularity): when the same host accesses consecutive minipages

Split (coarse to fine granularity): when different hosts update different minipages

[Figure: minipages 1 to 6 held by Host A as one coarse-grain unit, and the same minipages divided between Host A and Host B at fine granularity]

Memory Faults (Barnes)

[Figure: number of read faults and write faults (0 to 50000); label: Millipede]

Water-nsq Performance (cont’d)

[Figure: Water-nsquared run-time breakdown (sec, 0 to 200) into computation, read faults, write faults, barriers and locks, for SC/MV-f.g., HLRC, Mixed and SC/MV-b.g.]

[Figure: the effect of chunking in Water-nsquared: protocol overhead (read faults, write faults, compete requests; 0 to 4 x 10^4) and run-time (162 to 180 sec) vs. chunking level (1 to 8 molecules)]

Basic Costs in Millipage (Myrinet interconnect, 1998)

Access fault: 26 usec

Get protection: 7 usec

Set protection: 12 usec

Messages (one way):

Header msg: 12 usec

Data msg (1/2 KB): 22 usec

Data msg (1 KB): 34 usec

Data msg (4 KB): 90 usec

MPT translation: 7 usec

Message sizes directly influence latency

The most compute-demanding operation: minipage translation, 7 usec

In relaxed consistency systems, protocol operations might take hundreds of usecs

Example: run-length diff for a 4 KB page: 250 usec

Scalability (IB vs. VIA interconnects, 2003)

[Figure: application speedups on 8 nodes (0 to 16). Series: VIA/ServerNet - 1 thread, Kernel/IB - 1 thread, Kernel/IB - 2 threads]