michael bond milind kulkarni man cao minjia zhang meisam fathi salmi swarnendu biswas aritra...

34
Octet: Capturing and Controlling Cross-Thread Dependences Efficiently Michael Bond Milind Kulkarni Man Cao Minjia Zhang Meisam Fathi Salmi Swarnendu Biswas Aritra Sengupta Jipeng Huang Ohio State Purdue

Upload: lizeth-hemenway

Post on 14-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Octet: Capturing and Controlling Cross-Thread Dependences Efficiently

Michael BondMilind KulkarniMan CaoMinjia ZhangMeisam Fathi SalmiSwarnendu BiswasAritra SenguptaJipeng Huang

Ohio State

Purdue

Parallel programming is mainstream

Shared memory with locks

Challenge:performance & correctness

• Help express parallelism better• Eliminate concurrency errors• Diagnose production bugs• Deal with nondeterminism

Need practical runtime support

• Atomicity checking• Data race

detection• Record & replay

• Transactional memory• DRF/SC enforcement• Deterministic

execution

Need practical runtime support

• Atomicity checking• Data race

detection• Record & replay

• Transactional memory• DRF/SC enforcement• Deterministic

execution

Track dependences Control dependences

Need practical runtime support

o.f = …

… = o.f

• Atomicity checking• Data race

detection• Record & replay

• Transactional memory• DRF/SC enforcement• Deterministic

execution

Track dependences Control dependences

Need practical runtime support

o.f = …

… = o.f

• Atomicity checking• Data race

detection• Record & replay

• Transactional memory• DRF/SC enforcement• Deterministic

execution

Track dependences Control dependences

Need practical runtime support

o.f = …

… = o.f

Commodity (software-only) approachesslow programs by several times

• Atomicity checking• Data race

detection• Record & replay

• Transactional memory• DRF/SC enforcement• Deterministic

execution

Track dependences Control dependences

o.f = …

check

… = o.f

check

Need practical runtime support

Commodity (software-only) approachesslow programs by several times

• Atomicity checking• Data race

detection• Record & replay

• Transactional memory• DRF/SC enforcement• Deterministic

execution

Track dependences Control dependences

o.f = …

check

… = o.f

check

Need practical runtime support

Any access could race add synchronization at every access

Octet

Framework for runtime supportHB edges all dependencesAtomicity of analysis & access

Concurrency control mechanismSynchronization cross-thread dependence Qualitative performance improvement

Octet

Framework for runtime supportHB edges all dependencesAtomicity of analysis & access

Concurrency control mechanismSynchronization cross-thread dependence Qualitative performance improvement

Proofs!

Octet tracks ownership

Each object’s state Є { WrExT , RdExT , RdShc }

wr o.f

T1 T2

write check

o’s state = WrExT1Ti

me

wr o.f

T1 T2

read check

write check

o’s state = WrExT1Ti

me

wr o.f

T1 T2

read check

write check

o’s state = WrExT1Ti

me

wr o.f

T1 T2

safe point

write check

read check

o’s state = WrExT1Ti

me

wr o.f

T1 T2

safe point

write check

read check

o’s state = WrExT1

Implicit safe point

Tim

e

wr o.f

T1 T2

safe point

write check

read check

o’s state = WrExT1Ti

me

wr o.f

T1 T2

safe point

write check

read check

o’s state = RdExT2Ti

me

wr o.f

T1 T2

rd o.f

safe point

write check

read check

o’s state = RdExT2Ti

me

wr o.f

T1 T2

rd o.f

safe point

read check

write check

o’s state = RdExT2

T3 T4

wr o.f

T1 T2

rd o.f

T3

safe point

T4

read check

write check

read check

o’s state = RdExT2

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

T4

read check

write check

read check

o’s state = RdShc

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

T4

read check

write check

read check

read check

o’s state = RdShc

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

rd o.f

T4

read check

write check

read check

read check

o’s state = RdShc

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

rd o.f

T4

read check

write check

read check

read check

o’s state = RdShc

Sharing detection[von Praun & Gross ’01]Comparison in our paper

Distributed shared memoryShasta [Scales et al. ’96]

Biased locking[Kawachiya et al. ’02][Russell & Detlefs ’06][Hindman & Grossman ’06]

• Atomicity checking• Data race

detection• Record & replay

• Transactional memory• DRF/SC enforcement• Deterministic

execution

Practical runtime support

Track dependences Control dependences

Framework for runtime supportConcurrency control mechanism

Oct

et

wr o.f

T1 T2

rd o.f

rd o.f

T3

safe point

rd o.f

T4

read check

write check

read check

read check

Dependence recorder records happens-before edges

Implementation in Jikes RVMPublicly availablehttp://jikesrvm.org/Research+Archive

Parallel programsDaCapo Benchmarks 2006 & 2009SPEC JBB 2000 & 2005

Parallel platform32 cores (AMD Opteron 6272)

eclip

se6

hsql

db6

luse

a...

xala

n6

avro

ra9

jyth

on9

luin

dex9

luse

a...

pmd9

sunfl

ow9

xala

n9

jbb2

000

jbb2

005

geo

0

100

200

300

400

500

600

700

800

900

1000

Pessimi...

Overh

ead

(%

)34,600% 3,000%

eclip

se6

hsql

db6

luse

a...

xala

n6

avro

ra9

jyth

on9

luin

dex9

luse

a...

pmd9

sunfl

ow9

xala

n9

jbb2

000

jbb2

005

geo

0

20

40

60

80

100

120 Octet w/o coord

Octet w/o coord

Overh

ead

(%

)

eclip

se6

hsql

db6

luse

a...

xala

n6

avro

ra9

jyth

on9

luin

dex9

luse

a...

pmd9

sunfl

ow9

xala

n9

jbb2

000

jbb2

005

geo

0

20

40

60

80

100

120

OctetOctet w/o coord

Overh

ead

(%

)

eclip

se6

hsql

db6

luse

a...

xala

n6

avro

ra9

jyth

on9

luin

dex9

luse

a...

pmd9

sunfl

ow9

xala

n9

jbb2

000

jbb2

005

geo

0

20

40

60

80

100

120

RecorderOctetOctet w/o coord

Overh

ead

(%

)

Octet helps enable practical runtime support for reliable, scalable concurrency

Framework for runtime supportHB edges all dependencesAtomicity of analysis & access

Concurrency control mechanismSynchronization cross-thread dependence Qualitative performance improvement