
Exploiting Streams in Instruction and Data Address Trace Compression

Aleksandar Milenković, Milena Milenković
Laboratory for Advanced Computer Architectures and Systems at Alabama (LaCASA)
ECE Department, The University of Alabama in Huntsville

{milenka | milenkm}@ece.uah.edu


WWC-06 2/29

Outline

Introduction
Related work
Stream-based compression
Evaluation
Conclusion


Why Program Execution Traces?

Trace-driven simulation in computer architecture research

Performance tuning

System validation

Introduction


Trace Issues

Trace collection, reduction, and processing
Traces must be large to offer a faithful representation of the system workload
An example:
– 1 billion instructions at 10 B/instruction: 10 GB
– SPEC CPU2000 benchmarks, reference input: hundreds of billions of instructions
Requirements for an effective reduction technique:
– lossless
– high compression ratio
– fast decompression

Introduction


Trace Types

Basic block traces for control flow analysis

Address traces for cache studies
Instruction words for processor studies
Operands for arithmetic unit studies

Introduction


Related Work

Ziv-Lempel algorithm (gzip utility)
WPP - Whole Program Path (J. Larus, 1999)
– program instrumentation, only instruction traces
– a trace of acyclic paths compressed with Sequitur
Timestamped WPP (Y. Zhang, R. Gupta, 2001)
– path traces for a function stored in one block
PDATS, PDI (E. E. Johnson, 2001)
– PDATS: stores address differences with an optional repetition count
– PDI: each of the N most frequently used instruction words in the trace is replaced with its dictionary index, while other words are left unchanged
Loop detection (E. N. Elnozahy, 1999)
– links information about data addresses with the loop
Using value predictors (M. Burtscher, 2003)
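The PDATS idea listed above stores address differences plus an optional repetition count. The C sketch below illustrates it by collapsing each run of equal address differences into one record. It simplifies under our own assumptions: there is a single address stream, the record type `delta_rec` and the function `delta_encode` are hypothetical names, and no variable-length byte packing is done. It is not Johnson's actual PDATS format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one address difference and how many times in a row
   it occurs. */
typedef struct {
    int64_t delta;
    uint64_t count;
} delta_rec;

/* Encode n addresses as runs of equal differences from the previous address
   (the first address is taken as a difference from 0). out must have room
   for n records; returns the number of records written. */
size_t delta_encode(const uint64_t *addr, size_t n, delta_rec *out) {
    assert(n == 0 || (addr != NULL && out != NULL));
    size_t nrec = 0;
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t d = (int64_t)(addr[i] - prev);
        prev = addr[i];
        if (nrec > 0 && out[nrec - 1].delta == d)
            out[nrec - 1].count++;              /* same difference: extend the run */
        else
            out[nrec++] = (delta_rec){ d, 1 };  /* new difference: new record */
    }
    return nrec;
}
```

A strided walk such as 0x1000, 0x1008, 0x1010, 0x1018 costs only two records: the first address, then the difference 8 repeated three times.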


Stream Based Compression (SBC)

SBC targets combined address + instruction traces
SBC exploits inherent characteristics of traces:
– a limited number of instruction streams
– locality of data addresses
Instructions from a stream are replaced by the stream's ID
Information about data addresses is linked to the corresponding instruction stream
Resulting files:
– Stream Table File (STF)
– Stream-Based Instruction Trace (SBIT)
– Stream-Based Data Trace (SBDT)
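The C sketch below shows one way instructions could be replaced by stream IDs. It assumes a stream is identified by its starting address and length, the SA and L fields stored in the STF. In the example that follows, streams 2 and 3 share a start address and differ only in length, so both fields form the key. The linear search, the capacity `MAX_STREAMS`, and the names `stream_table` and `stream_id` are illustrative assumptions, not the authors' data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS 4096  /* illustrative capacity, not from the slides */

typedef struct {
    uint64_t sa;   /* stream starting address (S.SA) */
    uint32_t len;  /* stream length in instructions (S.L) */
} stream_entry;

typedef struct {
    stream_entry e[MAX_STREAMS];
    size_t n;      /* number of distinct streams seen so far */
} stream_table;

/* Return the 1-based ID of stream (sa, len), appending it if it is new.
   *is_new is set to 1 when the caller should also write the stream to the
   STF. Returns 0 if the table is full. */
uint32_t stream_id(stream_table *t, uint64_t sa, uint32_t len, int *is_new) {
    assert(len > 0);
    *is_new = 0;
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].sa == sa && t->e[i].len == len)
            return (uint32_t)(i + 1);
    if (t->n == MAX_STREAMS)
        return 0;
    t->e[t->n++] = (stream_entry){ sa, len };
    *is_new = 1;
    return (uint32_t)t->n;
}
```

Run over the example trace, the lookups produce the SBIT sequence 1, 2, 2, …, 3. Only the first occurrence of each stream reaches the STF.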


Compression Flow

[Figure: Dinero+ trace records (H, A, Iw for instructions; DA for data) enter an instruction buffer (IBuffer) and a data buffer (DBuffer). Each completed stream, described by S.SA and S.L, is checked against the Stream Table (entries 1 … n, each holding SA, L, and the stream's T Iw pairs). Outputs: the SBIT (stream IDs); the STF (SA L T1 Iw1 … Tk Iwk per stream); and, through a Data FIFO Buffer whose entries hold Sid, Mid, Rdy, Aoff, Stride, Count, and Ca, the SBDT (dH Aoff Stride Count records).]

Legend: H – Header; A – Address; Iw – Instruction Word; T – Type; DA – Data Address; S.SA – Stream Starting Address; S.L – Stream Length; Ca – Current Data Address; Sid – Stream Id; Mid – Memory Ref Id; Aoff – Address Offset; Rdy – Ready for Commit; dH – Data Header

Stream Based Compression


SBC Data Trace Format

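SBDT record fields: DataHeader (1 B), Stride (0, 1, 2, 4, or 8 B), AddrOffset (1, 2, 4, or 8 B), RepCount (0, 1, 2, 4, or 8 B)

DataHeader bits:
– Bits 7-5: RepCount size
– Bits 4-2: Stride size
– Bits 1-0: AddrOffset size

RepCount size codes: 000: 0 B (=0); 001: 1 B; 010: 2 B; 011: 4 B; 100: 8 B; 101: 0 B (=1); 110: unused; 111: unused
Stride size codes: 000: 0 B (=0); 001: 1 B; 010: 2 B; 011: 4 B; 100: 8 B; 101: 0 B (=1); 110: 0 B (=4); 111: 0 B (=8)
AddrOffset size codes: 00: 1 B; 01: 2 B; 10: 4 B; 11: 8 B

A 0 B code stores no field bytes; the field takes the value given in parentheses.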

Stream Based Compression
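The C sketch below shows how a compressor might choose the size codes and pack the 1-byte data header, following the bit layout above. Two choices are our assumptions, since the slide does not say how values are sized: Stride and AddrOffset are treated as signed two's-complement values, and each field gets the smallest size that fits. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest of 1, 2, 4, 8 bytes that holds v as a signed value (assumption). */
static unsigned field_bytes(int64_t v) {
    if (v >= INT8_MIN && v <= INT8_MAX) return 1;
    if (v >= INT16_MIN && v <= INT16_MAX) return 2;
    if (v >= INT32_MIN && v <= INT32_MAX) return 4;
    return 8;
}

/* 3-bit codes 001, 010, 011, 100 for explicit 1-, 2-, 4-, 8-byte fields. */
static unsigned code3(unsigned bytes) {
    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
    return bytes == 1 ? 1u : bytes == 2 ? 2u : bytes == 4 ? 3u : 4u;
}

/* Stride size code (bits 4-2); strides 0, 1, 4, 8 need no stride bytes. */
unsigned stride_code(int64_t stride) {
    switch (stride) {
    case 0: return 0;   /* 000: 0 B (=0) */
    case 1: return 5;   /* 101: 0 B (=1) */
    case 4: return 6;   /* 110: 0 B (=4) */
    case 8: return 7;   /* 111: 0 B (=8) */
    default: return code3(field_bytes(stride));
    }
}

/* RepCount size code (bits 7-5); counts 0 and 1 need no bytes. */
unsigned repcount_code(uint64_t count) {
    if (count == 0) return 0;   /* 000: 0 B (=0) */
    if (count == 1) return 5;   /* 101: 0 B (=1) */
    if (count <= UINT8_MAX) return code3(1);
    if (count <= UINT16_MAX) return code3(2);
    if (count <= UINT32_MAX) return code3(4);
    return code3(8);
}

/* AddrOffset size code (bits 1-0): 00, 01, 10, 11 for 1, 2, 4, 8 bytes. */
unsigned addroff_code(int64_t aoff) {
    return code3(field_bytes(aoff)) - 1;
}

/* Pack the three size codes into the 1-byte data header. */
uint8_t data_header(int64_t aoff, int64_t stride, uint64_t count) {
    return (uint8_t)((repcount_code(count) << 5) |
                     (stride_code(stride) << 2) |
                     addroff_code(aoff));
}
```

For the example record 11ff97028 8 1b this gives 0x3F:
– RepCount code 001;
– Stride code 111 (a stride of 8 needs no bytes);
– AddrOffset code 11 (8 bytes).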


SBC: An Example

Dinero+ Trace (Type Address IWord)

Stream1 (It. 0)
2 120026a60 223e0018
1 11ff96ff8
2 120026a64 b7fe0008
2 120026a68 42110652
2 120026a6c 42411412
2 120026a70 23bd19a4
2 120026a74 46520413
2 12002678 a4330000
0 11ff97020
2 1200267c 42611413
2 12002680 f43ffffd

Stream2 (It. 1)
2 12002678 a4330000
0 11ff97028
2 1200267c 42611413
2 12002680 f43ffffd

Stream2 (It. 2)
2 12002678 a4330000
0 11ff97030
2 1200267c 42611413
2 12002680 f43ffffd

… … …

Stream2 (It. 28)
2 12002678 a4330000
0 11ff97100
2 1200267c 42611413
2 12002680 f43ffffd

Stream3 (It. 29)
2 12002678 a4330000
0 11ff97108
2 1200267c 42611413
2 12002680 f43ffffd
2 120026a84 23defff0

Stream Based Compression

for (i=0; i<30;++i){ … a += c[i]; …} …


SBC: An Example

Stream-based Instruction Trace (SBIT)
1
2
2
..
3

Stream-based Data Trace (SBDT)
AddrOffset Stride RepCount
11ff96ff8 0 0
11ff97020 0 0
11ff97028 8 1b
11ff97108 0 0

Stream Table File (STF)
AddrOffset Length | T Iw pairs
120026a60 9 | 1 223e0018 … 2 f43ffffd..
12002678 3 | 0 a4330000 2 f43ffffd..
12002678 4 | 0 a4330000 2 f43ffffd..

Stream Based Compression
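On the decompression side, each SBDT record expands back into RepCount + 1 addresses. The C sketch below assumes, as the example values suggest, that AddrOffset holds the first address of the run and that RepCount counts the accesses after the first. `sbdt_expand` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one SBDT record into count + 1 addresses: aoff, aoff + stride, ...
   out needs room for count + 1 entries; returns the number written. */
size_t sbdt_expand(uint64_t aoff, int64_t stride, uint64_t count, uint64_t *out) {
    assert(out != NULL);
    for (uint64_t k = 0; k <= count; k++)
        out[k] = aoff + (uint64_t)stride * k;  /* unsigned math also handles negative strides */
    return (size_t)(count + 1);
}
```

For 11ff97028 8 1b it yields 28 addresses, 11ff97028 through 11ff97100. These are the Stream 2 load addresses in iterations 1 to 28 of the example trace.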

SBC: How It Works

[Animation frame: the Stream 1 (It. 0) records of the example trace, from 2 120026a60 223e0018 to 2 12002680 f43ffffd, are processed. Panels: the Stream Table (in memory) with entries 1-3; the per-reference fields Current Address, Stride, and Repetition Count (values shown include 11ff96ff8, 11ff97020, and 0); and the same SBIT, SBDT, and STF as in the example.]

Stream Based Compression

SBC: How It Works

[Animation frame: the Stream 2 (It. 1) records are processed: 2 12002678 a4330000, 0 11ff97028, 2 1200267c 42611413, 2 12002680 f43ffffd. The Stream Table panel shows Current Address 11ff97028 next to the earlier addresses 11ff96ff8 and 11ff97020, with Stride 8 and RepCount 1b; the SBIT, SBDT, and STF are as in the example.]

Stream Based Compression

SBC: How It Works

[Animation frame: the remaining records are processed. These are Stream 2 iterations 2 through 28 (data addresses 11ff97030 … 11ff97100) and Stream 3 (It. 29, data address 11ff97108, ending with 2 120026a84 23defff0). The Stream Table panel shows Current Address values 11ff97028, 11ff97030, and 11ff97108, with Stride 8 and RepCount values 1a and 1b; the SBIT, SBDT, and STF are as in the example.]

Stream Based Compression
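The instruction side can be reconstructed in a similar way: each stream ID read from the SBIT indexes the STF, which gives the stream's starting address and length. This C sketch regenerates only instruction addresses. It assumes fixed 4-byte instructions laid out sequentially within a stream, which holds for the Alpha ISA. A full decoder would also emit the T and Iw fields stored in the STF. The names `stf_entry` and `sbit_expand` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t sa;   /* stream starting address (STF) */
    uint32_t len;  /* stream length in instructions (STF) */
} stf_entry;

/* Rebuild the instruction address sequence from SBIT stream IDs (1-based)
   and STF entries, assuming 4-byte instructions placed sequentially within a
   stream. Writes at most cap addresses to out; returns how many it wrote. */
size_t sbit_expand(const uint32_t *ids, size_t nids, const stf_entry *stf,
                   size_t nstf, uint64_t *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < nids; i++) {
        assert(ids[i] >= 1 && ids[i] <= nstf);
        const stf_entry *s = &stf[ids[i] - 1];
        for (uint32_t k = 0; k < s->len && n < cap; k++)
            out[n++] = s->sa + 4u * (uint64_t)k;
    }
    return n;
}
```

Because a whole stream costs a single ID in the SBIT, this expansion step is a simple table walk, with no per-instruction decoding of addresses.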


Experimentation

SPEC CPU2000 traces for the Alpha ISA
– First 2 billion instructions (F2B)
– Mid 2 billion instructions (M2B): skip 50 billion, then collect 2 billion
Collection: modified SimpleScalar
Measured: compression ratio and decompression time relative to Dinero+, for
– Gzipped only
– mPDI
– SBC
– SBC.gz: SBC combined with Gzip
– SBC.seq: SBC combined with Sequitur

Evaluation


Stream Statistics: CINT

Fewer than 7000 instruction streams for most applications

Evaluation

                # of Streams              MaxStreamLen           AvrStreamLen
Benchmark       F2B     M2B     All      F2B    M2B    All      F2B   M2B   All
164.gzip        751     336     1437     229    229    229      13.9  13.8  13.6
176.gcc         25416   22222   30162    272    254    315      11.8  10.7  11.4
181.mcf         744     308     1181     88     64     88       8.9   6.0   7.4
186.crafty      4122    1892    5347     191    100    191      13.1  13.4  13.3
197.parser      4767    4200    6116     157    157    189      9.4   9.9   10.0
252.eon         3486    588     4389     169    168    169      13.8  14.1  13.7
253.perlbmk     9034    6344    11542    84     868    868      10.1  12.0  11.8
254.gap         3218    476     3530     284    75     284      24.3  10.3  11.1
255.vortex      5496    2644    8254     126    110    126      11.1  11.2  11.0
300.twolf       2399    1014    4902     163    185    185      12.3  14.5  14.4
Average         5943.3  4002.4  7686.0   176.3  221.0  264.4    12.9  11.6  11.8


Stream Statistics: CFP

                # of Streams              MaxStreamLen           AvrStreamLen
Benchmark       F2B     M2B     All      F2B    M2B    All      F2B    M2B    All
168.wupwise     1563    234     1912     229    229    229      23.9   27.5   27.4
171.swim        1582    496     1839     707    707    707      93.6   132.3  130.8
172.mgrid       1457    875     1725     1944   1944   1944     240.1  159.6  420.8
173.applu       1470    506     1752     3162   3162   3162     411.5  448.9  462.4
177.mesa        1637    593     1938     550    266    550      14.8   18.5   18.15
178.galgel      1818    81      4153     264    206    264      18.4   23.0   21.8
179.art         435     341     976      168    561    561      10.3   8.7    9.0
183.equake      517     260     1355     44     623    623      8.6    28.3   27.7
188.ammp        955     502     1810     168    561    422      12.5   35.2   38.5
189.lucas       964     317     1414     427    427    427      27.1   127.9  113.3
191.fma3d       2083    841     5007     383    1158   1158     10.7   43.6   34.3
200.sixtrack    3532    82      6515     264    580    580      20.1   192.9  170.5
301.apsi        2439    389     2989     729    729    894      34.0   51.5   50.7
Average         1573.2  424.4   2568.1   695.3  857.9  886.2    71.2   99.8   117.3

Fewer than 7000 instruction streams for all applications

Evaluation


Compression Ratio: CINT, F2B

CINT, F2B       mPDI   SBC     Din.gz  mPDI.gz  SBC.gz   SBC.seq
164.gzip        4.4    61.5    40.6    47.9     214.5    197.5
176.gcc         3.2    31.9    9.7     20.0     173.8    198.8
181.mcf         3.4    47.7    24.9    56.9     513.2    612.3
186.crafty      3.0    40.9    7.2     22.8     233.7    253.7
197.parser      3.7    34.4    28.2    33.1     187.3    356.1
252.eon         3.5    22.5    6.2     27.4     408.3    797.6
253.perlbmk     3.2    31.4    6.0     16.8     349.4    327.1
254.gap         4.0    51.0    13.3    36.3     783.4    888.6
255.vortex      3.5    21.4    7.0     14.6     118.3    340.9
300.twolf       3.4    28.8    7.6     23.9     107.9    90.2
Average         3.54   37.15   15.06   29.97    308.99   406.28

Evaluation


Compression Ratio: CINT, M2B

CINT, M2B       mPDI   SBC     Din.gz  mPDI.gz  SBC.gz   SBC.seq
164.gzip        3.8    61.8    42.4    49.2     222.3    204.4
176.gcc         3.1    41.5    15.1    21.5     268.3    300.0
181.mcf         2.4    16.6    21.4    20.7     59.9     84.8
186.crafty      3.0    45.1    7.1     25.5     263.1    285.2
197.parser      3.5    33.8    28.7    33.4     170.7    340.9
252.eon         3.5    22.0    6.1     28.9     395.6    774.9
253.perlbmk     2.9    43.1    35.8    48.2     755.6    1132.7
254.gap         3.0    35.8    34.4    39.3     1142.0   1957.6
255.vortex      3.4    27.4    12.1    25.4     234.2    411.8
300.twolf       3.3    24.9    6.6     19.8     80.0     66.3
Average         3.2    35.2    21.0    31.2     359.2    555.9

Evaluation


Compression Ratio: CFP, F2B

CFP, F2B        mPDI   SBC     Din.gz  mPDI.gz  SBC.gz    SBC.seq
168.wupwise     4.0    79.0    34.3    99.7     2878.9    4811.3
171.swim        3.1    410.7   24.4    179.6    43946.9   43522.9
172.mgrid       2.9    74.9    12.2    38.3     8976.3    16329.6
173.applu       2.9    66.3    13.0    23.1     2708.6    31370.8
177.mesa        3.0    74.7    10.3    56.9     1238.6    1775.6
178.galgel      3.5    99.9    21.1    29.0     11829.7   44227.4
179.art         4.2    80.9    24.2    30.6     12606.5   24796.3
183.equake      3.8    54.4    30.7    153.2    1929.8    3353.1
188.ammp        4.6    79.6    24.9    49.2     2624.8    3571.9
189.lucas       3.6    151.7   69.6    182.1    31181.3   78054.0
191.fma3d       4.3    48.0    12.7    23.7     3617.7    17601.0
200.sixtrack    3.2    68.5    20.0    50.7     1292.0    1951.1
301.apsi        3.0    35.2    8.5     20.0     2295.1    11320.8
Average         3.5    101.8   23.5    72.0     9778.9    21745.1

Evaluation


Compression Ratio: CFP, M2B

CFP, M2B        mPDI   SBC     Din.gz  mPDI.gz  SBC.gz    SBC.seq
168.wupwise     2.7    42.9    18.0    37.9     2047.5    3741.7
171.swim        2.8    505.6   21.0    155.7    99989.2   189501.1
172.mgrid       2.9    76.9    12.6    38.6     9582.5    17525.1
173.applu       2.8    77.7    14.2    24.9     3523.7    45522.8
177.mesa        2.9    83.6    10.7    50.9     1081.5    1508.0
178.galgel      2.5    55.9    27.9    38.6     9421.5    76728.1
179.art         2.9    68.5    26.2    36.7     20895.7   94731.9
183.equake      2.5    34.8    27.2    27.0     374.4     436.8
188.ammp        2.5    41.8    22.7    28.5     445.0     442.8
189.lucas       2.6    270.4   37.9    77.3     29332.7   58094.7
191.fma3d       2.6    111.7   4.9     9.7      11987.6   34224.3
200.sixtrack    2.6    130.8   13.5    32.5     7433.1    15566.1
301.apsi        2.9    34.8    8.1     18.6     2290.8    13523.0
Average         2.7    118.1   18.8    44.4     15261.9   42426.7

Evaluation


Decompression Speedup, F2B

[Figure: decompression speedup of modPDI.gz, SBC.gz, and SBC.seq relative to Dinero+.gz, F2B traces, logarithmic axis]

Evaluation


Decompression Speedup, M2B

[Figure: decompression speedup of modPDI.gz, SBC.gz, and SBC.seq relative to Dinero+.gz, M2B traces, logarithmic axis]

Evaluation


Compressibility of Instruction/Data Components

The instruction component (instruction address + instruction word) compresses much better than the data address component

It makes up only 5% of the whole compressed trace for CINT and 10% for CFP

Further research efforts should improve data address compression

Evaluation


Compressibility of Instruction/Data Components

[Figure, two panels for the CINT benchmarks 164.gzip through 300.twolf: compression ratio on a logarithmic axis for SBC.gz, SBC.seq, mPDI.gz, and Din.gz; first panel: instruction address + instruction word trace component; second panel: data address trace component]

Evaluation


Data Address Compression

A good indicator of the data address compression ratio is the number of memory references in the trace divided by the number of records in the SBDT file, NMEM/NSBDT

The ratio also depends on the lengths of the repetition count, stride, and address offset fields

E.g., 176.gcc and 300.twolf in F2B: NMEM/NSBDT = 4.6 (176.gcc), 4.5 (300.twolf)

Compression ratio: 10.7 (176.gcc), 6.9 (300.twolf)

Reason: different lengths of the record fields

Evaluation


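Data Address Compression: Components

$$|\mathrm{Din{+}Data}| = 8\,N_{MEM}$$

$$|\mathrm{SBDT}| = N_{SBDT} + \sum_{i \in \{0,1,2,4,8\}} i\,\left(\mathrm{AddrOff}_i + \mathrm{Stride}_i + \mathrm{RepCount}_i\right)$$

where $\mathrm{AddrOff}_i$, $\mathrm{Stride}_i$, and $\mathrm{RepCount}_i$ are the numbers of fields stored in $i$ bytes, and the $N_{SBDT}$ term counts one 1-byte data header per record

$$\mathrm{ComprRatio} = \frac{8\,N_{MEM}}{N_{SBDT}\left(1 + \sum_{i \in \{0,1,2,4,8\}} i\,\left(P_{\mathrm{AddrOff},i} + P_{\mathrm{Stride},i} + P_{\mathrm{RepCount},i}\right)\right)}$$

where $P$ is the fraction of SBDT records whose field is stored in $i$ bytes (the percentages below divided by 100)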

Percentage        176.gcc   300.twolf
AddrOffsetByte1   67.53     37.73
AddrOffsetByte2   27.82     33.30
AddrOffsetByte4   4.60      28.97
AddrOffsetByte8   0.05      0.01
StrideByte0       49.68     32.64
StrideByte1       28.12     20.72
StrideByte2       19.03     28.76
StrideByte4       3.16      24.24
StrideByte8       0.00      0.00
RepCountByte0     77.24     74.71
RepCountByte1     22.58     24.97
RepCountByte2     0.18      0.33
RepCountByte4     0.00      0.00

Evaluation
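The indicator and the field-size percentages above can be checked numerically. The C sketch below computes the data-address compression ratio from NMEM/NSBDT and the table's percentages. It counts one header byte per SBDT record, the 1-byte DataHeader of the SBC data trace format. With that byte included, the 176.gcc and 300.twolf columns give about 10.7 and 6.9, the ratios quoted on the previous slide. The type and function names are hypothetical.

```c
#include <assert.h>

/* Percentage of SBDT records whose field is stored in 0, 1, 2, 4, 8 bytes. */
typedef struct { double pct[5]; } size_mix;

static const double FIELD_BYTES[5] = { 0, 1, 2, 4, 8 };

/* Expected bytes per field: sum of i * P_i over i in {0, 1, 2, 4, 8}. */
static double avg_field_bytes(const size_mix *m) {
    double sum = 0.0;
    for (int i = 0; i < 5; i++)
        sum += FIELD_BYTES[i] * m->pct[i] / 100.0;
    return sum;
}

/* Data-address compression ratio: 8 bytes per reference in Dinero+ versus
   a 1-byte header plus the three variable-length fields per SBDT record.
   refs_per_rec is NMEM/NSBDT. */
double sbdt_ratio(double refs_per_rec, const size_mix *aoff,
                  const size_mix *stride, const size_mix *rep) {
    assert(refs_per_rec >= 1.0);   /* every record covers at least one reference */
    double rec_bytes = 1.0 + avg_field_bytes(aoff) + avg_field_bytes(stride)
                     + avg_field_bytes(rep);
    return 8.0 * refs_per_rec / rec_bytes;
}
```

Dropping the header term would give about 15.1 for 176.gcc, well above the quoted 10.7. That is why the header byte is counted here.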


Conclusions

SBC: a new technique for compressing combined data address and instruction traces
– Reduces trace size and decompression time
– Can be successfully combined with other compression techniques such as Gzip and Sequitur
– One-pass algorithm => could migrate into hardware
– Does not require program instrumentation
– Stream Table + Stream Frequency enable fast workload characterization
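The claim that the Stream Table plus stream frequencies enable fast workload characterization can be illustrated as follows. Counting stream IDs in the SBIT gives per-stream execution counts. Weighting those counts by the stream lengths in the STF gives the dynamic instruction count without expanding the trace. The C sketch below is hypothetical in its function name and argument layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how often each stream ID (1-based) occurs in the SBIT and return the
   dynamic instruction count implied by the stream lengths from the STF.
   freq must have room for nstreams entries. */
uint64_t stream_profile(const uint32_t *ids, size_t nids, const uint32_t *len,
                        size_t nstreams, uint64_t *freq) {
    uint64_t instrs = 0;
    for (size_t s = 0; s < nstreams; s++)
        freq[s] = 0;
    for (size_t i = 0; i < nids; i++) {
        assert(ids[i] >= 1 && ids[i] <= nstreams);
        freq[ids[i] - 1]++;            /* execution count of this stream */
        instrs += len[ids[i] - 1];     /* instructions it contributes */
    }
    return instrs;
}
```

For a shortened SBIT of 1, 2, 2, 3 with the example's stream lengths 9, 3, and 4, it returns 19 instructions and shows stream 2 as the most frequent.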


Conclusions

Future directions
– 2-level SBT referencing a BBT (Basic Block Table)
– Study what happens when other trace information is included (time, data value)
– Possible hardware implementation
– Can SBC trace-driven simulation beat execution-driven simulation?


Backup Slides


Compressibility of Instruction/Data Components

Compressibility is not the same throughout the trace

Evaluation

[Figure, two panels for 171.swim (F2B), compression ratio on a logarithmic axis, plotted over the trace in intervals of 20 million instructions.
Instructions panel: DineroI.raw/DineroI.gzip, DineroI.raw/SbcI.raw, Dinero.raw/SbcI.gzip.
Data panel: DineroD.raw/DineroD.gzip, DineroD.raw/SbcD.raw, DineroD.raw/SbcD.gzip.]


FIFO Size Influence?

For most applications, the FIFO size has little influence beyond 4000 entries

Evaluation

[Figure, two panels: size decrease of the SBDT (first panel) and of SBDT.gz (second panel) relative to a 1000-entry FIFO, for FIFO sizes of 1000, 2000, 4000, 8000, and 16000 entries; series 301.apsi and 189.lucas; vertical axis from 0 to 1]


Trace Size: CINT

                Load+Store%       Dinero+ [GB]
Benchmark       F2B     M2B       F2B     M2B
164.gzip        33.17   32.07     29.16   28.99
176.gcc         50.94   52.10     31.80   31.98
181.mcf         41.36   37.98     30.38   29.87
186.crafty      37.74   36.71     29.84   29.68
197.parser      37.94   35.06     29.87   29.44
252.eon         48.59   48.58     31.45   31.45
253.perlbmk     45.02   46.88     30.92   31.20
254.gap         37.36   38.36     29.78   29.93
255.vortex      44.40   38.95     30.83   30.02
300.twolf       33.77   33.00     29.25   29.13
Average         41.03   39.97     30.33   30.17

Evaluation


Trace Size: CFP

                Load+Store%       Dinero+ [GB]
Benchmark       F2B     M2B       F2B     M2B
168.wupwise     19.76   30.96     27.16   28.83
171.swim        31.02   32.86     28.84   29.11
172.mgrid       36.66   36.43     29.68   29.64
173.applu       37.75   38.20     29.84   29.91
177.mesa        37.53   38.09     29.81   29.89
178.galgel      41.80   41.27     30.44   30.36
179.art         37.81   34.12     29.85   29.30
183.equake      36.00   45.04     29.58   30.93
188.ammp        31.13   37.23     28.85   29.76
189.lucas       18.73   22.20     27.01   27.52
191.fma3d       18.71   45.70     27.00   31.02
200.sixtrack    32.09   24.69     29.00   27.89
301.apsi        37.24   37.29     29.76   29.77
Average         32.02   35.70     28.99   29.53

Evaluation