![Page 1: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/1.jpg)
Scalable and Precise Dynamic Datarace Detection for Structured Parallelism
Raghavan Raman, Jisheng Zhao, Vivek Sarkar (Rice University)
Martin Vechev (ETH Zürich), Eran Yahav (Technion)
June 13, 2012
![Page 2: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/2.jpg)
Parallel Programming
• Parallel programming is inherently hard
  – Need to reason about a large number of interleavings
• Dataraces are a major source of errors
  – Manifest only in some of the possible schedules
  – Hard to detect, reproduce, and correct
![Page 3: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/3.jpg)
Limitations of Past Work
• Worst-case linear space and time overhead per memory access
• Report false positives and/or false negatives
• Dependent on scheduling techniques
• Require sequentialization of input programs
![Page 4: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/4.jpg)
Structured Parallelism
• Trend in newer programming languages: Cilk, X10, Habanero Java (HJ), ...
  – Simplifies reasoning about parallel programs
  – Benefits: deadlock freedom, simpler analysis
• Datarace detection for structured parallelism
  – Different from that for unstructured parallelism
  – Logical parallelism is much larger than the number of processors
![Page 5: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/5.jpg)
Contribution
• First practical datarace detector that runs in parallel with constant space overhead per memory location
  – Scalable
  – Sound and Precise
![Page 6: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/6.jpg)
Structured Parallelism in X10, HJ
• async <stmt>
  – Creates a new task that executes <stmt>
• finish <stmt>
  – Waits for all tasks spawned in <stmt> to complete
![Page 7: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/7.jpg)
SPD3: Scalable Precise Dynamic Datarace Detection
• Identifying parallel accesses
  – Dynamic Program Structure Tree (DPST)
• Identifying interfering accesses
  – Access Summary
![Page 8: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/8.jpg)
Dynamic Program Structure Tree (DPST)
• Maintains parent-child relationships among async, finish, and step instances
  – A step is a maximal sequence of statements with no async or finish
![Page 9: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/9.jpg)
DPST Example

finish { // F1
  S1;
  async { // A1
    async { // A2
      S2;
    }
    async { // A3
      S3;
    }
    S4;
  }
  S5;
  async { // A4
    S6;
  }
}

Resulting DPST (left-to-right ordering of children):

F1
  S1
  A1
    A2
      S2
    A3
      S3
    S4
  S5
  A4
    S6

Nodes are added to the tree in this order: F1, S1, A1, S5, A2, A3, S4, S2, S3, A4, S6.
![Page 20: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/20.jpg)
DPST Operations
• InsertChild (Node n, Node p)
  – O(1) time
  – No synchronization needed
• DMHP (Node n1, Node n2)
  – O(H) time
    • H = height(LCA(n1, n2))
  – DMHP = Dynamic May Happen in Parallel
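
As a rough illustration of why InsertChild can be constant time, the sketch below stores only a parent pointer, a depth, a child counter, and a sibling position in each node. The `Node` class, its field names, and `insert_child` are assumptions made for this example rather than the authors' implementation. The tree built here is the one from the DPST example, with nodes inserted in the order F1, S1, A1, S5, A2, A3, S4, S2, S3, A4, S6.

```python
class Node:
    """One DPST node: a finish, async, or step instance (illustrative layout)."""

    def __init__(self, kind, name):
        self.kind = kind          # "finish", "async", or "step"
        self.name = name
        self.parent = None
        self.depth = 0
        self.num_children = 0
        self.seq_no = 0           # 1-based position among siblings, left to right

def insert_child(n, p):
    """Attach n as the rightmost child of p; constant work, no traversal."""
    n.parent = p
    n.depth = p.depth + 1
    p.num_children += 1
    n.seq_no = p.num_children
    return n

F1 = Node("finish", "F1")
S1 = insert_child(Node("step", "S1"), F1)
A1 = insert_child(Node("async", "A1"), F1)
S5 = insert_child(Node("step", "S5"), F1)
A2 = insert_child(Node("async", "A2"), A1)
A3 = insert_child(Node("async", "A3"), A1)
S4 = insert_child(Node("step", "S4"), A1)
S2 = insert_child(Node("step", "S2"), A2)
S3 = insert_child(Node("step", "S3"), A3)
A4 = insert_child(Node("async", "A4"), F1)
S6 = insert_child(Node("step", "S6"), A4)

# Depth and sibling position are enough to recover the left-to-right order.
for n in (S1, A1, S5, A4, A2, A3, S4):
    print(n.name, "depth", n.depth, "position", n.seq_no)
```

Because children never need to be enumerated, a node does not keep a child list in this sketch; the counter on the parent is the only shared state an insert touches.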
![Page 21: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/21.jpg)
Identifying Parallel Accesses using DPST

DMHP (S1, S2), where S1 is the step further to the left in the DPST
1) L = LCA (S1, S2)
2) C = child of L that is ancestor of S1
3) If C is async return true
   Else return false

DPST from the example:

F1
  S1
  A1
    A2
      S2
    A3
      S3
    S4
  S5
  A4
    S6

Example: DMHP (S3, S6)
1) LCA (S3, S6) = F1
2) Child of F1 that is ancestor of S3 = A1
3) A1 is an async => DMHP (S3, S6) = true

Example: DMHP (S5, S6)
1) LCA (S5, S6) = F1
2) Child of F1 that is ancestor of S5 = S5
3) S5 is NOT an async => DMHP (S5, S6) = false
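
The DMHP check can be sketched in a few lines. This is an illustrative Python version, not the authors' code: the `Node` layout and the `dmhp` helper are assumptions, and the function picks the left-hand child of the LCA by sibling position so that callers may pass the two steps in either order. It assumes both arguments are step nodes, which are always leaves.

```python
class Node:
    """Illustrative DPST node; children are numbered left to right on creation."""

    def __init__(self, kind, name, parent=None):
        self.kind = kind              # "finish", "async", or "step"
        self.name = name
        self.parent = parent
        self.num_children = 0
        if parent is None:
            self.depth, self.seq_no = 0, 0
        else:
            self.depth = parent.depth + 1
            parent.num_children += 1
            self.seq_no = parent.num_children

def dmhp(s1, s2):
    """Dynamic May Happen in Parallel for two step nodes (leaves)."""
    if s1 is s2:
        return False
    a, b = s1, s2
    # Bring both nodes to the same depth, then climb together until they
    # are siblings; their common parent is the LCA.
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a.parent is not b.parent:
        a, b = a.parent, b.parent
    # The child of the LCA on the left-hand side decides the answer.
    left = a if a.seq_no < b.seq_no else b
    return left.kind == "async"

# DPST from the example, created in left-to-right order under each parent.
F1 = Node("finish", "F1")
S1 = Node("step", "S1", F1)
A1 = Node("async", "A1", F1)
S5 = Node("step", "S5", F1)
A4 = Node("async", "A4", F1)
A2 = Node("async", "A2", A1)
A3 = Node("async", "A3", A1)
S4 = Node("step", "S4", A1)
S2 = Node("step", "S2", A2)
S3 = Node("step", "S3", A3)
S6 = Node("step", "S6", A4)

print(dmhp(S3, S6))   # child of F1 above S3 is A1, an async
print(dmhp(S5, S6))   # child of F1 above S5 is S5 itself, a step
```

The climb visits at most as many nodes as the depth of the two steps, and no per-step vector or list of parallel tasks is kept.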
![Page 30: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/30.jpg)
Access Summary
Each location M in program memory has a shadow memory entry Ms with three fields:
• Ms.w: a step instance that wrote M
• Ms.r1, Ms.r2: two step instances that read M
![Page 33: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/33.jpg)
Access Summary Operations
• WriteCheck (Step S, Memory M)
  – Check for access that interferes with a write of M by S
• ReadCheck (Step S, Memory M)
  – Check for access that interferes with a read of M by S
![Page 34: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/34.jpg)
SPD3 Example

Steps S2, S3, and S4 read M (... = M); step S6 writes M (M = ...).

F1
  S1
  A1
    A2
      S2    (... = M)
    A3
      S3    (... = M)
    S4      (... = M)
  S5
  A4
    S6      (M = ...)

| Executing Step | Ms.r1 | Ms.r2 | Ms.w | Action |
|---|---|---|---|---|
| S1 | null | null | null | |
| S4 (Read M) | S4 | null | null | Update Ms.r1 |
| S3 (Read M) | S4 | S3 | null | Update Ms.r2; S3, S4 stand for subtree under LCA(S3, S4) |
| S2 (Read M) | S4 | S3 | null | S2 is in the subtree under LCA(S3, S4) => Ignore S2 |
| S6 (Write M) | | | | Report a Read-Write Datarace between steps S4 and S6 |
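
The trace in this example can be replayed with a small sequential sketch. Everything named here (`Node`, `Shadow`, `lca_split`, `read_check`, `write_check`) is an assumption made for illustration, and the update rules are a simplification chosen to reproduce the trace shown on the slide rather than a statement of the full algorithm. In particular, the sketch fills Ms.r2 when only one reader is recorded, replaces a reader only when the new step falls outside the subtree under the LCA of the two recorded readers, and stops at the first race it finds.

```python
class Node:
    def __init__(self, kind, name, parent=None):
        self.kind, self.name, self.parent = kind, name, parent
        self.num_children = 0
        self.depth = 0 if parent is None else parent.depth + 1
        self.seq_no = 0
        if parent is not None:
            parent.num_children += 1
            self.seq_no = parent.num_children

def lca_split(a, b):
    """Return (lca, child of lca above a, child of lca above b) for two leaves."""
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a.parent is not b.parent:
        a, b = a.parent, b.parent
    return a.parent, a, b

def dmhp(s1, s2):
    if s1 is None or s2 is None or s1 is s2:
        return False
    _, c1, c2 = lca_split(s1, s2)
    left = c1 if c1.seq_no < c2.seq_no else c2
    return left.kind == "async"

class Shadow:
    def __init__(self):
        self.w = self.r1 = self.r2 = None

def read_check(s, ms, races):
    if dmhp(ms.w, s):
        races.append(("write-read", ms.w.name, s.name))
        return
    p1, p2 = dmhp(ms.r1, s), dmhp(ms.r2, s)
    if not p1 and not p2:
        ms.r1, ms.r2 = s, None            # s supersedes the recorded readers
    elif ms.r2 is None:
        ms.r2 = s                         # second parallel reader
    elif p1 and p2:
        lca12 = lca_split(ms.r1, ms.r2)[0]
        lca1s = lca_split(ms.r1, s)[0]
        lca2s = lca_split(ms.r2, s)[0]
        # A shallower LCA means s lies outside the subtree under lca12.
        if lca1s.depth < lca12.depth or lca2s.depth < lca12.depth:
            ms.r1 = s

def write_check(s, ms, races):
    for kind, prev in (("read-write", ms.r1), ("read-write", ms.r2),
                       ("write-write", ms.w)):
        if dmhp(prev, s):
            races.append((kind, prev.name, s.name))
            return                        # assumption: stop at the first race
    ms.w = s

F1 = Node("finish", "F1")
S1 = Node("step", "S1", F1); A1 = Node("async", "A1", F1)
S5 = Node("step", "S5", F1); A4 = Node("async", "A4", F1)
A2 = Node("async", "A2", A1); A3 = Node("async", "A3", A1)
S4 = Node("step", "S4", A1)
S2 = Node("step", "S2", A2); S3 = Node("step", "S3", A3)
S6 = Node("step", "S6", A4)

ms, races, history = Shadow(), [], []
name = lambda n: None if n is None else n.name
for step in (S4, S3, S2):                 # the three reads, in trace order
    read_check(step, ms, races)
    history.append((step.name, name(ms.r1), name(ms.r2), name(ms.w)))
write_check(S6, ms, races)                # the write by S6
print(history)
print(races)
```

Comparing LCA depths is enough to decide whether one LCA is a proper ancestor of another here, because both candidates lie on the path from the same recorded reader to the root. A real detector would perform each check and update atomically per memory location; this replay is single-threaded.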
![Page 41: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/41.jpg)
SPD3 Algorithm
• At async, finish, and step boundaries
  – Update the DPST
• On every access to a memory location M, atomically
  – Read the fields of its shadow memory, Ms
  – Perform ReadCheck or WriteCheck as appropriate
  – Update the fields of Ms, if necessary
![Page 42: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/42.jpg)
Space Overhead
• DPST: O(a+f)
  – ‘a’ is the number of async instances
  – ‘f’ is the number of finish instances
• Shadow Memory: O(1) per memory location
![Page 43: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/43.jpg)
Empirical Evaluation
• Experimental Setup
  – 16-core (4x4) Intel Xeon 2.4GHz system
    • 30 GB memory
    • Red Hat Linux (RHEL 5)
    • Sun Hotspot JDK 1.6
  – All benchmarks written in HJ using only Finish/Async constructs
    • Executed using the adaptive work-stealing runtime
  – SPD3 algorithm
    • Implemented in Java with static optimizations
![Page 44: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/44.jpg)
SPD3 is Scalable
[Figure: bar chart of slowdown relative to the respective (w.r.t. number of threads) uninstrumented execution, with one bar each for 1, 2, 4, 8, and 16 threads. Benchmarks: Crypt, LUFact, MolDyn, MonteCarlo, RayTracer, Series, SOR, SparseMatMult, FFT, Health, Nqueens, Strassen, Fannkuch, Mandelbrot, Matmul, and GeoMean. Y-axis range: 0.00 to 18.00.]
![Page 45: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/45.jpg)
Estimated Peak Heap Memory Usage on 16 threads
[Figure: bar chart of estimated peak heap memory usage (MB) for Eraser, FastTrack, and SPD3. Benchmarks: Crypt, LUFact, MolDyn, MonteCarlo, RayTracer, Series, SOR, SparseMatMult. Y-axis range: 0 to 10000 MB.]
![Page 46: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/46.jpg)
Estimated Peak Heap Memory Usage: LUFact Benchmark
[Figure: estimated peak heap memory usage (MB) versus number of threads (1, 2, 4, 8, 16) for Eraser, FastTrack2, and SPD3. Y-axis range: 0 to 3000 MB.]
![Page 47: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/47.jpg)
Related Work: A Comparison

| Properties | OTFDAA | Offset-Span | SP-bags | SP-hybrid | FastTrack | ESP-bags | SPD3 |
|---|---|---|---|---|---|---|---|
| Target Language | Nested Fork-Join & Synchronization operations | Nested Fork-Join | Spawn-Sync | Spawn-Sync | Unstructured Fork-Join | Async-Finish | Async-Finish |
| Space Overhead per memory location | O(n) | O(1) | O(1) | O(1) | O(N) | O(1) | O(1) |
| Guarantees | Per-Schedule | Per-Input | Per-Input | Per-Input | Per-Input | Per-Input | Per-Input |
| Empirical Evaluation | No | Minimal | Yes | No | Yes | Yes | Yes |
| Execute Program in Parallel | Yes | Yes | No | Yes | Yes | No | Yes |
| Dependent on Scheduling technique | No | No | No | Yes | No | No | No |

OTFDAA – On the fly detection of access anomalies (PLDI ’89)
n – number of threads executing the program
N – maximum logical concurrency in the program
![Page 48: Scalable and Precise Dynamic Datarace Detection for Structured Parallelism](https://reader036.vdocuments.net/reader036/viewer/2022062501/568161df550346895dd1f05b/html5/thumbnails/48.jpg)
Summary
• First practical datarace detector that runs in parallel with constant space overhead per memory location
  – Dynamic Program Structure Tree
  – Access Summary