Parallelizing Data Race Detection
Benjamin Wester, Facebook
David Devecsery, Peter Chen, Jason Flinn, Satish Narayanasamy, University of Michigan
Data Races
Benjamin Wester - ASPLOS '13 2
Timeline of two threads:
t1: lock(l); x = 1; unlock(l)
t2: x = 3; lock(l); x = 2; unlock(l)
Race Detection
Figure: timeline comparing normal run time with run time under a race detector (instrumentation + analysis).
Detection is slow! 8x–200x
• Managed vs. binary
• Granularity
• Completeness
• Debug info gathered
Matches app concurrency
Figure: the same timeline with a parallel race detector added.
How to use parallel hardware?
• Scale program?
– Performance tied to app
• Parallel algorithm?
– Lots of fine-grained dependencies
• Uniparallelism
– Converts execution into parallel pipeline
– Works for instrumentation
[Wallace CGO’07, Nightingale ASPLOS’08]
Uniparallelism
Figure: timeline showing the epoch-parallel execution and the epoch-sequential execution.
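A minimal sketch of the pipeline idea, under stated assumptions: a fast pass runs the epochs in order and checkpoints the state at the start of each one, then a slow, instrumented pass replays every epoch from its checkpoint independently. The toy program (adding integers), the names `epoch_sequential` and `instrumented_epoch`, and the use of a thread pool are all illustrative and not taken from the talk. Python threads show the structure only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def step(state, op):
    """Toy 'program': each op is an integer added to the running state."""
    return state + op

def epoch_sequential(ops, epoch_len):
    """Fast pass: run epochs in order, checkpointing the state at each epoch start."""
    checkpoints, state = [], 0
    for i in range(0, len(ops), epoch_len):
        epoch_ops = ops[i:i + epoch_len]
        checkpoints.append((state, epoch_ops))
        for op in epoch_ops:
            state = step(state, op)
    return checkpoints, state

def instrumented_epoch(checkpoint):
    """Slow pass over one epoch, started from its checkpoint."""
    state, epoch_ops = checkpoint
    trace = []
    for op in epoch_ops:
        state = step(state, op)
        trace.append(state)  # stand-in for expensive per-operation analysis
    return trace

ops = list(range(1, 9))
checkpoints, final = epoch_sequential(ops, epoch_len=3)

# Epochs depend only on their checkpoint, so they can be replayed concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    traces = list(pool.map(instrumented_epoch, checkpoints))

full_trace = [s for t in traces for s in t]
print(full_trace)
```

Because `pool.map` returns results in submission order, concatenating the per-epoch traces gives the same trace a single instrumented run would produce.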
Uniparallelism
• Scales well: epochs are independent execution units
• More efficient: synchronization
Add Detector…
• Scales well: epochs are independent execution units
• More efficient: synchronization & lock elision
Race detection depends on state!
Figure: timeline with two "Here!" callouts.
Architecture
Phases: Epoch-sequential, Epoch-parallel, Commit
Moving Work
Identify meaningful fast subset of analysis state
– Low frequency, lightweight, self-contained
– Run in epoch-sequential phase
– Predicted and usable in parallel phase
Symbolic execution
– Replace missing state with symbols
– Defer some computation to commit phase
– Commit: replace symbols with concrete values
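The symbolic-execution idea above can be sketched as follows. This is a toy analysis under stated assumptions, not the talk's implementation: the analysis keeps one integer of metadata per variable and checks that each update is not smaller than the previous one. An epoch analyzed in isolation does not know the metadata left by earlier epochs, so it stands in a `Symbol` and defers the check. The commit phase walks the epochs in order and resolves each symbol to a concrete value. All names (`Symbol`, `EpochResult`, `analyze_epoch`, `commit`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Symbol:
    """Placeholder for a variable's metadata at the start of an epoch."""
    var: str

@dataclass
class EpochResult:
    violations: list = field(default_factory=list)  # found inside the epoch
    deferred: list = field(default_factory=list)    # (Symbol, new_value) pairs
    final: dict = field(default_factory=dict)       # metadata at epoch end

def analyze_epoch(events):
    """Parallel phase: events is a list of (var, value) metadata updates."""
    res = EpochResult()
    for var, value in events:
        prev = res.final.get(var, Symbol(var))
        if isinstance(prev, Symbol):
            res.deferred.append((prev, value))  # prior state unknown: defer
        elif not prev <= value:
            res.violations.append((var, prev, value))
        res.final[var] = value
    return res

def commit(results):
    """Commit phase: visit epochs in order and replace symbols with values."""
    state, violations = {}, []
    for res in results:
        for sym, value in res.deferred:
            prev = state.get(sym.var, 0)
            if not prev <= value:
                violations.append((sym.var, prev, value))
        violations.extend(res.violations)
        state.update(res.final)
    return violations

epochs = [[("x", 1), ("y", 2)], [("x", 3), ("y", 1)], [("x", 2)]]
results = [analyze_epoch(e) for e in epochs]  # independent of one another
found = commit(results)
print(found)
```

Only the first touch of each variable in an epoch is deferred, so the sequential commit phase does a small fraction of the total checking work.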
Architecture
Epoch-sequential: instrument and predict fast subset of analysis state
Epoch-parallel: full instrumentation, symbolic execution
Commit: perform deferred work on concrete values
Parallel FastTrack
State: vector clocks (e.g. ⟨0, 1, 0⟩) for each:
– Thread
– Lock
– Variable (×2: last read(s), last write)
Slide labels on the state list: Fast, Slow
Example computation: Check(read_x ≤ thread_i); write_x = thread_i
Transitive reduction: use knowledge of happens-before relationship
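To make the example computation concrete, here is a minimal sketch of a sequential happens-before detector that keeps a vector clock per thread, per lock, and two per variable. It uses plain full vector clocks everywhere. It does not include FastTrack's compact representation, the transitive reduction, or any parallelization. The class and method names are illustrative assumptions. The trace replays one possible schedule of the two-thread example from the Data Races slide, with t1 running first.

```python
def join(a, b):
    """Component-wise maximum of two vector clocks."""
    return [max(x, y) for x, y in zip(a, b)]

def leq(a, b):
    """True if clock a is less than or equal to clock b in every component."""
    return all(x <= y for x, y in zip(a, b))

class VectorClockDetector:
    def __init__(self, n_threads):
        self.n = n_threads
        # Each thread starts with 1 in its own component.
        self.thread = [[int(i == t) for i in range(n_threads)]
                       for t in range(n_threads)]
        self.lock, self.read, self.write = {}, {}, {}
        self.races = []

    def _zero(self):
        return [0] * self.n

    def acquire(self, t, lock):
        # Learn everything that happened before the last release of this lock.
        self.thread[t] = join(self.thread[t], self.lock.get(lock, self._zero()))

    def release(self, t, lock):
        self.lock[lock] = list(self.thread[t])
        self.thread[t][t] += 1

    def on_read(self, t, var):
        if not leq(self.write.get(var, self._zero()), self.thread[t]):
            self.races.append(("read", t, var))
        self.read[var] = join(self.read.get(var, self._zero()), self.thread[t])

    def on_write(self, t, var):
        now = self.thread[t]
        # Check(read_x <= thread_i) and the same check for the last write.
        ordered = (leq(self.read.get(var, self._zero()), now)
                   and leq(self.write.get(var, self._zero()), now))
        if not ordered:
            self.races.append(("write", t, var))
        self.write[var] = list(now)  # write_x = thread_i

T1, T2 = 0, 1
d = VectorClockDetector(2)
d.acquire(T1, "l"); d.on_write(T1, "x"); d.release(T1, "l")  # x = 1
d.on_write(T2, "x")                                          # x = 3, no lock
d.acquire(T2, "l"); d.on_write(T2, "x"); d.release(T2, "l")  # x = 2
print(d.races)
```

The unlocked write by t2 is flagged because t2 has not yet acquired `l` and so has no ordering with t1's write. The later locked write is ordered after both earlier writes and passes the check.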
Parallel Eraser
State:
– Set of locks held by thread (lockset)
– Lockset for each variable
– Position in state machine for each variable
Slide labels on the state list: Fast, Slow
Example computation: if state_x == SHARED then ls_x = ls_x ∩ {L, M}
Lockset factorization: factor out common behavior
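The lockset computation can be illustrated with a minimal sequential sketch. It simplifies the Eraser state machine (any first access moves a variable to the exclusive state) and does not include lockset factorization or any parallelization. Class, method, and state names are illustrative assumptions. The trace is the two-thread example from the Data Races slide, plus a hypothetical variable `y` that is always accessed under lock `l`.

```python
VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED = range(4)

class LocksetDetector:
    def __init__(self):
        self.held = {}     # thread -> set of locks currently held
        self.state = {}    # variable -> position in the state machine
        self.owner = {}    # variable -> first thread to access it
        self.lockset = {}  # variable -> candidate locks protecting it
        self.racy = set()  # variables with an empty candidate set

    def acquire(self, t, lock):
        self.held.setdefault(t, set()).add(lock)

    def release(self, t, lock):
        self.held.setdefault(t, set()).discard(lock)

    def access(self, t, var, is_write):
        st = self.state.get(var, VIRGIN)
        if st == VIRGIN:
            self.state[var], self.owner[var] = EXCLUSIVE, t
            return
        if st == EXCLUSIVE:
            if t == self.owner[var]:
                return  # still private to one thread: nothing to refine
            st = SHARED_MODIFIED if is_write else SHARED
        elif st == SHARED and is_write:
            st = SHARED_MODIFIED
        self.state[var] = st
        # Refine the candidate set: ls_x = ls_x intersected with locks held.
        held = self.held.get(t, set())
        if var in self.lockset:
            self.lockset[var] &= held
        else:
            self.lockset[var] = set(held)
        if st == SHARED_MODIFIED and not self.lockset[var]:
            self.racy.add(var)

d = LocksetDetector()
d.acquire("t1", "l"); d.access("t1", "x", True); d.release("t1", "l")  # x = 1
d.access("t2", "x", True)                                              # x = 3
d.acquire("t2", "l"); d.access("t2", "x", True); d.release("t2", "l")  # x = 2
d.acquire("t1", "l"); d.access("t1", "y", True); d.release("t1", "l")
d.acquire("t2", "l"); d.access("t2", "y", True); d.release("t2", "l")
print(d.racy)
```

Unlike a happens-before check, the result does not depend on the observed schedule: `x` is reported because no single lock is held on every shared access to it.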
Parallel Performance
Efficiency
Scalability
2 worker threads as baseline
8-CPU platform
Benchmarks:
– SPLASH-2 (water, lu, ocean, fft, radix)
– Parallel app: pbzip2
Efficiency
Figure: efficiency results on exactly 2 cores. Callouts: median 13% faster; median 8% faster.
Scaling Parallel FastTrack
Figure: scaling results (2 worker threads). Callout: median 4.4x with 8 cores.
Scaling Parallel Eraser
Figure: scaling results (2 worker threads). Callout: median 3.3x with 8 cores.
Conclusion
3-phase uniparallel architecture
– Parallel FastTrack
– Parallel Eraser
Parallel algorithms:
– Are more efficient
– Scale better
Questions?