isa-independent workload characterization and implications ... · implications for specialized...

35
ISA-Independent Workload Characterization and Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Upload: others

Post on 02-Nov-2019

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

ISA-Independent Workload Characterization and Implications for Specialized Architectures

Yakun Sophia Shao and David Brooks Harvard University

{shao,dbrooks}@eecs.harvard.edu

Page 2: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Specialized architectures are decoupled from legacy ISAs.

2

Spectrum of Specialization:

General-Purpose CPU GPU Fixed-Function

ASIC

High Efficiency Low Efficiency

Low Programmability

High Programmability

No ISA Tied to a Specific ISA

Page 3: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Specialization requires workload intrinsic characteristics.

Specialized architecture is tailored to applications.

•  e.g. special data path, memory access patterns.

3

I want to design specialized architectures for applications.

You need to first understand their characteristics.

Where should I start first?

Page 4: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

4

Yeah, good point! What should I do to understand

those characteristics?

Hmmm…it’s what you used to do for CPU designs.

Specialization requires workload intrinsic characteristics.

but is what you get the true program characteristic?

How about I run the program and collect performance-

counter stats?

Page 5: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Performance-Counter Based Workload Characterization

•  Metrics –  IPC –  Cache miss rates –  Branch mis-prediction rates –  …

•  Microarchitecture-dependent –  What if there is a bigger cache/a better branch predictor? –  Not program intrinsic characteristics

5

Page 6: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

6

Specialization requires workload intrinsic characteristics.

Oh I also heard about microarchitecture-independent

workload characterization.

hmmm…that removes microarchitecture dependency.

But it still ties to a specific ISA.

We can perform the profiling analysis just using the instruction

trace.

Page 7: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

7

Specialization requires workload intrinsic characteristics.

“Ties to a specific ISA”? Will that be a problem?

Yes for specialized architectures!

Page 8: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

ISA impacts program behaviors.

Stack Overhead •  Limited Registers •  Additional Load/Store

Complex Operations •  Memory Operands •  Vector Operations

Calling Conventions

8

Page 9: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

9

Specialization requires workload intrinsic characteristics.

I see. So is there a way to get ISA-independent

program characteristics?

That’s a good question. I found a paper in ISPASS this year which seems to

answer this question. Let’s take a look!

Page 10: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Paper Summary

Goal: •  An analysis tool to characterize workloads ISA-Independent

characteristics for specialized architectures

10

Methods: •  Leverage compiler’s intermediate representation (IR) •  Categorize characteristics into compute, memory, and control Takeaways: •  ISA-dependent characterization is misleading for specialization. •  ISA-independent characterization allows designers to quickly

identify opportunities for specialization.

Page 11: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Tool Overview

Program

IR Trace

x86 Trace

Characterization for Specialized Architecture

Compute Memory Control

ISA-Independent

Design of Specialized Architecture

11

ISA-Dependent

Page 12: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Program Representations

12

Program

IR Trace

x86 Trace

ILDJIT

LLVM

Page 13: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Program Representations

•  SPEC CPU2000

13

Program

IR Trace

x86 Trace

ILDJIT

LLVM

Page 14: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Program Representations

ILDJIT •  A modular compilation framework •  Performs machine-independent

classical optimizations at the IR level •  Uses LLVM’s back end to

–  Do machine-dependent optimizations –  Generate machine code

14

Program

IR Trace

x86 Trace

ILDJIT

LLVM

Campanoni, et al., A Highly Flexible, Parallel Virtual Machine: Design and Experience of ILDJIT, Software Practice Experience, 2010

Page 15: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Program Representations

ILDJIT IR

•  High-level IR •  Machine-, ISA-, and system-library-

independent •  Features:

–  80 instructions –  Unlimited registers –  Only loads/stores access memory –  No vector operations –  Parameters are passed by variables

15

Program

IR Trace

x86 Trace

ILDJIT

LLVM

Page 16: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Program Representations

x86 Trace •  Used for ISA-dependent analysis •  Semantically equivalent to the IR

code •  Collected with Pin instrumentation

16

Program

IR Trace

x86 Trace

ILDJIT

LLVM

Page 17: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Tool Overview

Program

IR Trace

x86 Trace

Characterization for Specialized Architecture

Compute Memory Control

ISA-Independent

Design of Specialized Architecture

17

ISA-Dependent

Page 18: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

ISA-Independent Workload Characteristics

18

Compute

Memory

Control

•  Opcode Diversity •  Static Instructions (I-MEM)!

•  Memory Footprint (D-MEM) •  Global Address Entropy •  Local Address Entropy

•  Branch Instruction Counts •  Branch Entropy

Page 19: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Compute::Static Instructions

19

Page 20: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

20

Compute::Static Instructions

I will think those stack operations are part of the

“hot code”.

So if you use x86 trace instead of

IR trace…

Page 21: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

ISA-Independent Workload Characteristics

21

Compute

Memory

Control

•  Opcode Diversity •  Static Instructions (I-MEM)

•  Memory Footprint (D-MEM) •  Global Address Entropy!•  Local Address Entropy!

•  Branch Instruction Counts •  Branch Entropy

Page 22: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Memory::Entropy

Entropy: a measure of the randomness

22

Entropy = − p(xi )* log2i=1

N

∑ p(xi )

Case 1: X is always a constant. p(X) =1

log2 p(X) = 0Entropy = 0

Case 2: N possible outcomes of X occur equally.

p(X) = 1N

log2 p(X) = log2 N−1

Entropy = −N * 1N* log2 N

−1

Entropy = log2 N

Page 23: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Memory::Global Address Entropy

23

Temporal Locality

Address Stream A Address Stream B (less temporal locality) (more temporal locality)

0 0 0 0 !0 0 0 1 !0 0 1 0 !0 0 1 1 !

Entropy = 2! Entropy = 0"

Yen, Draper, and Hill. Notary: Hardware Techniques to Enhance Signatures. MICRO 08

Page 24: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Memory::Global Address Entropy

24

Temporal Locality

Address Stream A Address Stream B (less temporal locality) (more temporal locality)

0 0 0 0 !0 0 0 1 !0 0 1 0 !0 0 1 1 !

Entropy = 2! Entropy = 0"

Yen, Draper, and Hill. Notary: Hardware Techniques to Enhance Signatures. MICRO 08

Page 25: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Memory::Global Address Entropy

25

Temporal Locality

I will have wrong locality estimate for workloads!

So if you use x86 trace instead of

IR trace…

Page 26: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Memory::Local Address Entropy

Address Stream A Address Stream B

0 0 0 0

0 1 0 0

1 0 0 0

1 1 0 0

0 0 0 0

0 0 0 1

0 0 1 0

0 0 1 1

0 421 3

# of Bits Skipped

Local Entropy

1

2

AB

(less spatial locality) (more spatial locality)

26

Spatial Locality

Address Stream A Address Stream B (less spatial locality) (more spatial locality)

0 0 0 0 !0 1 0 0 !1 0 0 0 !1 1 0 0 !

Page 27: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Memory::Local Address Entropy

27

Spatial Locality

I will think program has more spatial locality than

it really has.

So if you use x86 trace instead of

IR trace…

Page 28: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

ISA-Independent Workload Characteristics

28

Compute

Memory

Control

•  Opcode Diversity •  Static Instructions (I-MEM)

•  Memory Footprint (D-MEM) •  Global Address Entropy •  Local Address Entropy

•  Branch Instruction Counts •  Branch Entropy!

Yokota, et all, Introducing Entropies for Representing Program Behavior and Branch Predictor Performance, 07

Page 29: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Control::Branch Entropy

29

Page 30: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Control::Branch Entropy

30

I won’t get much wrong for control.

So if you use x86 trace instead of

IR trace…

Page 31: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Tool Overview

Program

IR Trace

x86 Trace

Characterization for Specialized Architecture

Compute Memory Control

ISA-Independent

Design of Specialized Architecture

31

ISA-Dependent

Page 32: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

ISA-Independent Workload Characteristics

32

Compute

Memory

Control

•  Opcode Diversity •  Static Instructions (I-MEM)

•  Memory Footprint (D-MEM) •  Global Address Entropy •  Local Address Entropy

•  Branch Instruction Counts •  Branch Entropy

Is there a way to compare those

across workloads?

Yes, Kiviat plot!

Page 33: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

ISA-Independent Workload Characteristics

33

Compute

Memory

Control

•  Opcode Diversity!•  Static Instructions (I-MEM)!

•  Memory Footprint (D-MEM)!•  Global Address Entropy!•  Local Address Entropy

•  Branch Instruction Counts •  Branch Entropy!

Page 34: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Workload Characterization

34

Page 35: ISA-Independent Workload Characterization and Implications ... · Implications for Specialized Architectures Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

Conclusions

•  We demonstrate that ISA-dependent analysis can be misleading for specialized architectures.

•  We present an analysis tool to characterize ISA-independent characteristics for specialization.

•  We show that our tool provides opportunities for designers to compare workloads’ characteristics.

35