Light64: Lightweight Hardware Support for Data Race Detection during Systematic Testing of Parallel Programs A. Nistor, D. Marinov and J. Torellas to appear MICRO’09 LBA reading group – 09/29/09 (by Evangelos)


Light64: Lightweight Hardware Support for Data Race Detection during Systematic Testing of Parallel Programs

A. Nistor, D. Marinov and J. Torellas

to appear in MICRO’09

LBA reading group – 09/29/09

(by Evangelos)


Introduction – Context

• Debugging of parallel applications: even for a single input there are too many interleavings
• Systematic testing
  – Execute the program many times to explore all interleavings
  – Assumptions: the input is provided; thread interleaving is the only source of non-determinism

Goal: hardware support for data race detection under systematic testing
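To put “too many interleavings” in numbers: if, purely for illustration, n threads each run k segments with no synchronization between them, every schedule that preserves each thread's program order is a distinct interleaving, and their count is the multinomial coefficient (nk)!/(k!)^n. A minimal sketch (the segment model is an assumption, not taken from the paper):

```python
from math import factorial
from typing import List


def count_interleavings(segments_per_thread: List[int]) -> int:
    """Number of ways to interleave the threads' segments while keeping each
    thread's own segments in program order (a multinomial coefficient).
    Assumes no synchronization restricts the order (illustrative only)."""
    total = sum(segments_per_thread)
    result = factorial(total)
    for k in segments_per_thread:
        result //= factorial(k)  # exact at every step
    return result


print(count_interleavings([2, 2]))        # 6
print(count_interleavings([5, 5, 5, 5]))  # 11732745024
```

Four threads with five segments each already yield over 11 billion schedules, which is why systematic testers depend on exploration strategies and pruning.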


Background of Systematic Testing

• Serialization of threads (multiplexing)

• New scheduler implementation

• Happens-before definition

• Segment-based interleaving
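One common way to make the happens-before relation concrete is with vector clocks. The sketch below is a generic illustration that assumes each segment is stamped with a vector clock (one entry per thread); it is not necessarily the definition used by the systematic tester in the paper. Two segments are concurrent, and therefore potential race partners, when neither happens before the other.

```python
from typing import List

VectorClock = List[int]  # one logical-clock entry per thread


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b iff a <= b component-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither segment is ordered before the other."""
    return not happens_before(a, b) and not happens_before(b, a)


def join(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise max, e.g., applied when a thread acquires a lock
    that another thread released."""
    return [max(x, y) for x, y in zip(a, b)]


# First segments of T0 and T1, with no synchronization between them:
s0, s1 = [1, 0], [0, 1]
print(concurrent(s0, s1))            # True
# T1 acquires a lock released after s0: join, then tick T1's own entry.
s1_after = join(s1, s0)
s1_after[1] += 1                     # [1, 2]
print(happens_before(s0, s1_after))  # True
```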


Background of Systematic Testing

State: represented by a serial log, i.e., an ordered list of segments
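Treating the serial log as the state suggests a simple depth-first exploration: from each state, append the next segment of any thread that has not finished. This is a minimal sketch assuming segments are independent (no synchronization constraints and no pruning); the name `explore` is hypothetical.

```python
from typing import List, Tuple

SerialLog = Tuple[str, ...]  # ordered list of executed segments = one state


def explore(threads: List[List[str]]) -> List[SerialLog]:
    """Depth-first enumeration of every final serial log reachable by
    multiplexing the threads one segment at a time."""
    finals: List[SerialLog] = []

    def dfs(log: SerialLog, pcs: Tuple[int, ...]) -> None:
        progressed = False
        for tid, segs in enumerate(threads):
            if pcs[tid] < len(segs):
                progressed = True
                # advance only this thread's segment counter
                nxt = pcs[:tid] + (pcs[tid] + 1,) + pcs[tid + 1:]
                dfs(log + (segs[pcs[tid]],), nxt)
        if not progressed:  # every thread finished: a final state
            finals.append(log)

    dfs((), tuple(0 for _ in threads))
    return finals


for log in explore([["a1", "a2"], ["b1"]]):
    print(log)
```

For two threads with segments `a1, a2` and `b1`, this yields the three serial logs `(a1, a2, b1)`, `(a1, b1, a2)` and `(b1, a1, a2)`.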


Background of Systematic Testing


Light64 – The Idea

“Two different thread interleavings that have the same happens-before graph but a flipped data race will very likely have at least a small deviation in the execution history.”
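The idea can be illustrated in software. Assume, for this sketch only, that the execution history is the stream of values returned by loads, hashed with zlib's CRC-32 (the slides say only that CRC logic is used). Thread A stores to `x` and thread B loads `x` with no synchronization between them, so both orders share the same happens-before graph; flipping the race changes the loaded value and therefore the hash.

```python
import zlib
from typing import List


def run(order: List[str]) -> int:
    """Execute two racy segments in the given order and return the CRC-32
    of the values loaded (assumed model of the execution history)."""
    mem = {"x": 0}
    loaded: List[int] = []  # execution history: values returned by loads

    def seg_a() -> None:    # thread A: x = 1 (store only)
        mem["x"] = 1

    def seg_b() -> None:    # thread B: r = x (load, races with A's store)
        loaded.append(mem["x"])

    segs = {"A": seg_a, "B": seg_b}
    for name in order:
        segs[name]()
    return zlib.crc32(bytes(loaded))


h1 = run(["A", "B"])  # B loads 1
h2 = run(["B", "A"])  # B loads 0: the race is flipped
print(h1 != h2)       # True: same happens-before graph, different hash
```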


Corner cases?

• No false positives; few false negatives
  – The systematic tester environment is highly deterministic
  – It is extremely improbable for two different streams of values to generate the same hash
• Cannot identify benign races: races on data that will never be consumed

By construction…
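The false-negative case can be sketched with a similar software model, again assuming the history is the stream of loaded values hashed with CRC-32. Threads A and B race on `z`, but if nothing ever loads `z`, flipping the race leaves the history, and thus the hash, unchanged; once a later segment consumes `z`, the deviation becomes visible. The name `history_hash` is hypothetical.

```python
import zlib
from typing import List


def history_hash(order: List[str], consume: bool) -> int:
    """CRC-32 of the loaded-value stream after running threads A and B in
    `order`; if `consume` is True, a later segment loads z."""
    mem = {"z": 0}
    loaded: List[int] = []  # assumed history: values returned by loads

    def seg_a() -> None:    # thread A: z = 1
        mem["z"] = 1

    def seg_b() -> None:    # thread B: z = 2 (write-write race with A)
        mem["z"] = 2

    segs = {"A": seg_a, "B": seg_b}
    for name in order:
        segs[name]()
    if consume:             # a later segment reads z
        loaded.append(mem["z"])
    return zlib.crc32(bytes(loaded))


# Race on data that is never consumed: no deviation, so it goes undetected.
print(history_hash(["A", "B"], False) == history_hash(["B", "A"], False))  # True
# Once z is loaded, the flipped race shows up in the hash.
print(history_hash(["A", "B"], True) == history_hash(["B", "A"], True))    # False
```

Separately, for an ideal n-bit hash two different value streams collide with probability of roughly 2^-n, which is the intuition behind “extremely improbable”.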


Design

• Small hardware modifications
  – CRC logic at the head of the ROB
  – ISA extensions: start/stop, save/load hash history
• Two modes of execution
  – Passive mode
  – Active mode
  – Tradeoff between accuracy and performance
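A software model of the hashing hardware may help fix the idea. In this sketch, which assumes semantics the slides do not spell out, each instruction retiring at the ROB head folds a 64-bit value into a running CRC, and the methods stand in for the start/stop and save/load ISA extensions. `HashUnit` and `retire` are hypothetical names, and zlib's CRC-32 stands in for whatever CRC the hardware actually implements.

```python
import zlib

MASK64 = (1 << 64) - 1


class HashUnit:
    """Hypothetical model of a per-core execution-history hash register."""

    def __init__(self) -> None:
        self.crc = 0
        self.enabled = False

    def start(self) -> None:
        self.enabled = True

    def stop(self) -> None:
        self.enabled = False

    def save(self) -> int:
        """Return the current hash (e.g., to store with a tester state)."""
        return self.crc

    def load(self, value: int) -> None:
        """Restore a saved hash (e.g., when resuming from that state)."""
        self.crc = value

    def retire(self, value: int) -> None:
        """Called as an instruction retires at the ROB head: fold its 64-bit
        value into the running CRC while hashing is enabled."""
        if self.enabled:
            data = (value & MASK64).to_bytes(8, "little")
            self.crc = zlib.crc32(data, self.crc)


hu = HashUnit()
hu.start()
for v in (1, 2, 3):
    hu.retire(v)
checkpoint = hu.save()  # stored alongside the tester's state
hu.retire(4)
hu.load(checkpoint)     # roll back when resuming from that state
hu.retire(4)
hu.stop()
hu.retire(5)            # ignored: hashing is stopped
expected = zlib.crc32(b"".join(v.to_bytes(8, "little") for v in (1, 2, 3, 4)))
print(hu.save() == expected)  # True
```

Because CRC-32 can be updated incrementally (`zlib.crc32(data, value)` continues from a previous checksum), saving the hash with a state and loading it later lets re-execution resume hashing from that state.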


Passive Mode

• During step 4: augment each state with the execution history hash
• Check whether executions with the same happens-before graph have the same hash value (e.g., S2 & S11)
• No guarantees on coverage: dependent on the systematic tester’s exploration strategy and pruning heuristics
• No practical overhead
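The passive-mode check amounts to grouping the explored states by happens-before graph and flagging any group whose states disagree on the hash. A minimal sketch, assuming the tester can expose each state's happens-before graph as a hashable set of edges; the function name and encoding are hypothetical, the hash values are illustrative, and the state labels merely echo the slide's example.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

HBGraph = FrozenSet[Tuple[str, str]]  # assumed encoding: set of HB edges


def passive_check(states: Iterable[Tuple[str, HBGraph, int]]) -> List[HBGraph]:
    """Given (state_id, hb_graph, history_hash) triples collected during
    exploration, return the HB graphs whose states disagree on the hash,
    i.e., where a flipped race changed the execution history."""
    seen: Dict[HBGraph, Set[int]] = defaultdict(set)
    for _state_id, hb, h in states:
        seen[hb].add(h)
    return [hb for hb, hashes in seen.items() if len(hashes) > 1]


hb = frozenset({("a1", "b2")})
states = [("S2", hb, 0x1234), ("S11", hb, 0x5678), ("S3", frozenset(), 0x9ABC)]
print(passive_check(states) == [hb])  # True: S2 and S11 disagree
```

Only states the tester actually visits are compared, so, as the slide notes, coverage depends on the exploration strategy.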


Active Mode

• During step 2: while re-executing to reach the selected state ‘S’, flip as many segments as possible
• Compare the execution history hash against that of the original execution
• Heuristic 1 – efficient segment reordering
  – Smallest-ID thread first during the first run
  – Biggest-ID thread first during re-execution
• Heuristic 2 – additional re-executions to increase coverage
  – ActiveFIN: re-execute all final states
  – ActiveFULL: re-execute all states
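Heuristic 1 can be sketched as a scheduling policy: the first run always picks the runnable thread with the smallest ID and the re-execution picks the biggest, so concurrent segments from different threads run in opposite orders. This simplified model assumes no synchronization (every unfinished thread is runnable) and a history made of loaded values hashed with CRC-32; `execute` and the segment functions are hypothetical.

```python
import zlib
from typing import Callable, Dict, List

Segment = Callable[[Dict[str, int], List[int]], None]


def execute(threads: List[List[Segment]], pick_max: bool) -> int:
    """Run all segments, always scheduling the smallest (first run) or the
    biggest (re-execution) runnable thread ID; return the CRC-32 of the
    values loaded."""
    mem: Dict[str, int] = {"x": 0}
    loaded: List[int] = []
    pcs = [0] * len(threads)
    while True:
        runnable = [t for t in range(len(threads)) if pcs[t] < len(threads[t])]
        if not runnable:
            break
        tid = max(runnable) if pick_max else min(runnable)
        threads[tid][pcs[tid]](mem, loaded)
        pcs[tid] += 1
    return zlib.crc32(bytes(loaded))


def t0_store(mem: Dict[str, int], loaded: List[int]) -> None:
    mem["x"] = 1               # thread 0: x = 1


def t1_load(mem: Dict[str, int], loaded: List[int]) -> None:
    loaded.append(mem["x"])    # thread 1: r = x (races with thread 0)


threads = [[t0_store], [t1_load]]
first = execute(threads, pick_max=False)  # T0 then T1: the load sees 1
rerun = execute(threads, pick_max=True)   # T1 then T0: the load sees 0
print("race detected" if first != rerun else "no deviation")
```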


Experimental Setup

Used Pin to model a system running a systematic tester

Instruction count as a performance metric

SPLASH-2 benchmarks (modified & unmodified)

6 versions of the system: Plain, Plain+RD, ActiveNO, ActiveFIN, ActiveFULL, Passive


State Space Characterization


Race Detection Capability


Runtime Overhead


Runtime Overhead – Software-based


Conclusions

Lightweight support for data race detection in a Systematic Tester world

Relatively low overhead for systematic testing (S.T.)

Not a conventional MICRO paper