rtr: 1 byte/kilo- i nstruction race recording

23
RTR: 1 Byte/Kilo- Instruction Race Recording Min Xu Rastislav BodikMark D. Hill

Upload: ata

Post on 31-Jan-2016

47 views

Category:

Documents


0 download

DESCRIPTION

RTR: 1 Byte/Kilo- I nstruction Race Recording. Rastislav Bodik. Mark D. Hill. Min Xu. Why Do You Need a Recorder?. % gdb a.out gdb> run Program received SIGSEGV. In get() at hash.c:45 45 a = bucket->d;. % gcc sim .c % a.out Segmentation fault %. % gcc para- sim .c % a.out - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: RTR: 1 Byte/Kilo- I nstruction Race Recording

RTR: 1 Byte/Kilo-InstructionRace Recording

Min Xu Rastislav Bodik Mark D. Hill

Page 2: RTR: 1 Byte/Kilo- I nstruction Race Recording

2

% gcc sim.c% a.outSegmentation fault%

% gdb a.outgdb> runProgram received SIGSEGV.In get() at hash.c:4545 a = bucket->d;

% gdb a.outgdb> runProgram exited normally.gdb>

% gcc para-sim.c% a.outSegmentation fault%

Why Do You Need a Recorder?

% gdb a.out loggdb> runProgram received SIGSEGV.In get() at para-hash.c:6767 a = bucket->d;

% gcc para-sim.c% a.outSegmentation faultRace recorded in “log”%

Page 3: RTR: 1 Byte/Kilo- I nstruction Race Recording

3Ideally …

% gdb a.out loggdb> runProgram received SIGSEGV.In get() at para-hash.c:6767 a = bucket->d;

% gcc para-sim.c% a.outSegmentation faultRace recorded in “log”%

Long recording:small logLow runtime

overheadLow cost

Applicability:Programs – data race

Systems – non-SC

Page 4: RTR: 1 Byte/Kilo- I nstruction Race Recording

4Better and Better Recorders

Low Cost

Low Overhead

Small Log Size

Applicability

Data Race Non-SC

InstRply ’87 Y Y

B. & G. ’91 YNetzer’93 YDéjà Vu

’98 YRecPlay

’00JaRec ’04

FDR ’03BugNet ’05

YReEnact

‘03 Y

CORD ‘06 YStrata ’06 YRTR ‘06 Y Y

Page 5: RTR: 1 Byte/Kilo- I nstruction Race Recording

1 Byte/Kilo-Instruction[ASPLOS’06]

A New Recorder

HardwareAcceleration

[ISCA’03]

LessHardware[ASPLOS’06]

SC & TSO[ASPLOS’06]

Result: One more step toward practical

This talk covers only RTR• Regulated Transitive Reduction algorithm

Page 6: RTR: 1 Byte/Kilo- I nstruction Race Recording

6Outline

Race Recording

RTR Algorithm

Results with Commercial Workloads

Compress log during recording replay more “regularly”

Conclusion

Page 7: RTR: 1 Byte/Kilo- I nstruction Race Recording

Technically, what’s race recording?

Page 8: RTR: 1 Byte/Kilo- I nstruction Race Recording

8Race Recording

X=6

X = 1

X++

print(X)

X = 1

X++

print(X)

-X = X*5

--

---

X = X*5-

Thread IThread J

Original Replay

X=10

Recording

X= 6

-X = X*5

--

Log

Thread IThread J

Page 9: RTR: 1 Byte/Kilo- I nstruction Race Recording

9

Goal: Reproduce same conflicts with minimum log data

Terminologies and Assumptions

ld A

Thread I Thread J

Recording

st B

st C

sub

ld B

add

st C

ld B

st A

st C

Thread I Thread J

Replay

Log

ld D

st D

ld A

st B

st C

sub

ld B

add

st C

ld B

st A

st C

ld D

st D

Conflicts(red)

Dependence(black)

Page 10: RTR: 1 Byte/Kilo- I nstruction Race Recording

Regulated Transitive Reduction (RTR)

Page 11: RTR: 1 Byte/Kilo- I nstruction Race Recording

11Log All Conflicts

1

2

3

4

5

6

1

2

3

4

5

6

ld A

Thread I Thread J

Replay

st B

st C

sub

ld B

add

st C

ld B

st A

st C

ld D

st D

Log J: 23 14 35 46

Log I: 23

Log Size: 5*16=80 bytes(10 integers)

Dependence Log

16 bytes

But too many conflicts

Page 12: RTR: 1 Byte/Kilo- I nstruction Race Recording

12Netzer’s Transitive Reduction (TR)

1

2

3

4

5

6

1

2

3

4

5

6

ld A

Thread I Thread J

Replay

st B

st C

sub

ld B

add

st C

ld B

st A

st C

ld D

st D

TR reduced Log J: 23

35 46

Log I: 23

Log Size: 64 bytes(8 integers)

TR Reduced Log

How to further reduce log size?

Page 13: RTR: 1 Byte/Kilo- I nstruction Race Recording

13The Intuition of the RTR Algorithm

After Reduction

From I to J

From J to I

Vectors

Vectors“Regulate” Replay

Page 14: RTR: 1 Byte/Kilo- I nstruction Race Recording

14

Stricter Dependences to Aid Vectorization

1

2

3

4

1

2

3

4

ld A

Thread I Thread J

Replay

st B

st C

add

st C

ld B

st Ald D

5 5sub st C

6 6ld B st D

Log J: 23 45

Log I: 23

Log Size: 48 bytes(6 integers)

New Reduced Log

stricter

Reduced

Fewer dependencies to log

Page 15: RTR: 1 Byte/Kilo- I nstruction Race Recording

15Compress Vectorized Dependencies

1

2

3

4

5

6

1

2

3

4

5

6

ld A

Thread I Thread J

Replay

st B

st C

sub

ld B

add

st C

ld B

st A

st C

ld D

st D

Log J: x=3,5, ∆=1

Log I: x=3, ∆=1

Log Size: 40 bytes(5 integers)

Vectorized Log

VectorDeps.

TRRTR: fewer deps + fewer byte/dep

Page 16: RTR: 1 Byte/Kilo- I nstruction Race Recording

16Deadlock Avoidance of RTR

1

2

3

4

5

6

1

2

3

4

5

6

ld A

Thread I Thread J

Recording

st B

st C

sub

ld B

add

st C

ld B

st A

st C

ld D

st D

Limit the strict dependencies (see paper)

i:4j:1 j:2 i:3 i:4

Replay Cycle

Page 17: RTR: 1 Byte/Kilo- I nstruction Race Recording

Results with Commercial Workloads

Page 18: RTR: 1 Byte/Kilo- I nstruction Race Recording

18Full-system Simulation Method

Commercial server hardware• GEMS: http://www.cs.wisc.edu/gems• Full-system (OS + application) executions• 4-core CMP (Sequential Consistent)

• 1-way in-order issue, 2 GHz, • 64KB I/D L1, 4MB L2, 64byte lines, MOSI directory

Commercial server software• Apache – static web serving• SpecJBB – middleware• OLTP – TPC-C like• Zeus – static web serving

L2C0

C1

C3

C2

Page 19: RTR: 1 Byte/Kilo- I nstruction Race Recording

19Log Size: 1 byte/KI

0.0

0.5

1.0

1.5

2.0byte/core/KI

ApacheJBB OLTP Zeus AVG0

50

100

150

200KB/core/s

ApacheJBB OLTP Zeus AVG

Less buffer, longer recording, smaller logs

Page 20: RTR: 1 Byte/Kilo- I nstruction Race Recording

20RTR vs. Netzer’s TR

TR

RTR

0

20

40

60

80

100

ApacheJBB OLTP ZeusAVG

Lo

g

Siz

e 28% smaller log• TR was “optimal”

Page 21: RTR: 1 Byte/Kilo- I nstruction Race Recording

21Why Does RTR Work Well?

RTR•Instructions execute at similar speed•Dependencies are often “vectorizable”

Page 22: RTR: 1 Byte/Kilo- I nstruction Race Recording

A New Recorder

HardwareAcceleration

[ISCA’03]

1 Byte/Kilo-Instruction[ASPLOS’06]

LessHardware[ASPLOS’06]

SC & TSO[ASPLOS’06]

Result: One more step toward practical

“Less hardware” & “TSO” not covered• Equally important• More details in the paper

Page 23: RTR: 1 Byte/Kilo- I nstruction Race Recording

23Conclusion

Race recording Counter nondeterminism

RTR 1 byte/kilo-instruction•Based on Netzer’s transitive reduction•Create stricter dependencies•Vectorize dependencies to compress log•Avoid overly-strict hence no deadlock

Future work•Support snooping, SMT, replayer