ABACUS: A Hardware-Based Software Profiler for Modern Processors Eric Matthews • Lesley Shannon School of Engineering Science Sergey Blagodurov • Sergey Zhuravlev • Alexandra Fedorova School of Computing Science Simon Fraser University, Vancouver, BC, Canada


ABACUS: A Hardware-Based Software Profiler for Modern Processors

Eric Matthews • Lesley Shannon

School of Engineering Science

Sergey Blagodurov • Sergey Zhuravlev • Alexandra Fedorova

School of Computing Science

Simon Fraser University, Vancouver, BC, Canada


Overview

Legendary Introduction to ABACUS

Delicious Profiling Units

Epic Conclusion


Introduction to ABACUS


ABACUS


ASPLOS rocks!


Performance comparison

Memory Reuse Profile

ABACUS avg runtime: 48.5 seconds

Simics avg runtime: 1 hour 6 minutes

[Figure: memory reuse profiles for namd and hmmer, ABACUS vs. Simics. x-axis: miss, Reuse 0, Reuse 1; y-axis: Counts (in Millions)]
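
For a rough sense of scale, the two average runtimes quoted above can be turned into a ratio. This is only a back-of-the-envelope sketch; the variable names are illustrative and not part of ABACUS.

```python
# Average runtimes quoted on the slide, converted to seconds.
abacus_seconds = 48.5
simics_seconds = 1 * 3600 + 6 * 60  # 1 hour 6 minutes

# Ratio of simulation time to hardware-profiled time.
speedup = simics_seconds / abacus_seconds
print(f"Simics / ABACUS runtime ratio: {speedup:.1f}x")
```

On these averages, the hardware profiler finishes roughly 80 times sooner than the simulated run.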


Conclusion

ABACUS is a generic profiler that can be easily integrated into modern processors

It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments


Thank you! Questions?


Motivation

Future systems will be multi-core and heterogeneous

How does the OS place threads on this architecture?

Characterize thread behaviour

Instruction Mix

Memory Reuse Profile

Effectiveness of pre-fetching

Memory bandwidth utilization


Motivation (cont'd)

How are these metrics collected?

Offline analysis

Code Instrumentation

Simulation (e.g., Simics)

Software-based instruction set simulator

Models systems with full OS support


Motivation (cont'd)

Why not use current hardware counters?

Architecture-specific

Not all desired metrics provided

Help detect symptoms, not causes

Limited in number and in concurrent use


Goal

Create a hardware profiler to collect thread characteristics at runtime

Imposed constraints

External to processor

Minimally invasive

Cycle accurate

OS controllable


ABACUS

hArdware-Based Analyzer for the Characterization of User Software

A collection of runtime configurable profiling units

Collects metrics useful for thread placement

Controllable through the O/S


Hardware Platform


Proof-of-concept System

LEON3 (SPARC v8 Instruction Set Architecture)

Single core, single threaded

Test System

OpenSPARC T1 (Niagara) soft processor

1 to 4 hardware threads

Multi-core, multi-board support


Hardware Platform (cont'd)


ABACUS


External Interface

Bus slave and master modules

Processing required on processor signals

Designed such that only external interface changes with different processor/system


Portability

Previously integrated with a LEON3 (SPARC v8 ISA) based system

Differences:

AMBA Advanced High-performance Bus (AHB) vs Processor Local Bus (PLB)

Processor internals


Controller

Starts or stops profiling

Can limit profiling to a specific address range

DMA interface for retrieving collected data

Linux device driver support
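
The start/stop and address-range behaviour listed above can be pictured with a small software model. This is a sketch under stated assumptions, not the ABACUS hardware or its driver interface: the class name `ControllerModel`, its methods, and the choice to filter on the program counter with an inclusive range are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ControllerModel:
    """Hypothetical software model of start/stop control with an address filter."""
    enabled: bool = False
    # Inclusive (low, high) address limits; None means "profile everywhere".
    addr_range: Optional[Tuple[int, int]] = None

    def start(self) -> None:
        self.enabled = True

    def stop(self) -> None:
        self.enabled = False

    def should_profile(self, pc: int) -> bool:
        """Return True if an event at this address should be recorded."""
        if not self.enabled:
            return False
        if self.addr_range is None:
            return True
        low, high = self.addr_range
        return low <= pc <= high


# Example: only addresses inside the configured window are recorded.
ctrl = ControllerModel(addr_range=(0x4000_0000, 0x4000_0FFF))
ctrl.start()
samples = [0x3FFF_FFFC, 0x4000_0000, 0x4000_0800, 0x4000_1000]
recorded = [pc for pc in samples if ctrl.should_profile(pc)]
```

Keeping the gating decision in one place means every profiling unit can share the same enable signal, which is one plausible reading of why the slide lists these features under a single controller.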


Profiling Units

Operate on one or more processor signals:

Instruction

PC

Cache Reuse Distance

etc.

Store data in a collection of counters


Profiling Units (cont'd)

Focus on two-dimensional metrics

Gives bigger picture / greater insight

Aim to be as architecture independent as possible


Profile Unit

Behaves like a traditional software profiler

Operates on Program Counter

[Figure: code space diagram with labels Range, Trace, Overlap, Non-Overlap]
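
One way to picture a unit that operates on the program counter like a software profiler is a set of address ranges, each with its own counter. The sketch below is an assumption about the general idea, not the actual unit: the function `profile_pcs`, the half-open ranges, and the range names are hypothetical.

```python
from typing import Dict, List, Tuple


def profile_pcs(pcs: List[int], ranges: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Count how many observed PCs fall inside each named [low, high) range."""
    counts = {name: 0 for name in ranges}
    for pc in pcs:
        for name, (low, high) in ranges.items():
            if low <= pc < high:
                counts[name] += 1  # overlapping ranges are each credited
    return counts


# Hypothetical ranges: two adjacent functions and a loop nested in the first.
ranges = {
    "func_a": (0x1000, 0x1100),
    "func_b": (0x1100, 0x1200),
    "hot_loop": (0x1040, 0x1080),
}
trace = [0x1000, 0x1044, 0x1048, 0x1104, 0x2000]
counts = profile_pcs(trace, ranges)
```

In this model a PC that lies in two overlapping ranges increments both counters, and a PC outside every range is ignored.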


Memory Reuse Unit

Collects a measure of code or data reuse

Utilizes Least Recently Used (LRU) stack

Reuse distance is movement in the LRU stack or a miss

Useful for cache contention management


Memory Reuse Unit (cont'd)

Creates histogram of cache reuse pattern

Range: [0, set associativity – 1] or cache miss

[Figure: 4-way set-associative reuse profile; x-axis: Reuse Distance]
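
The reuse-distance histogram described above can be modelled in software with one LRU stack per cache set. This is a minimal sketch, assuming a simple modulo set index and made-up parameters (`num_sets`, `ways`, `line_bytes`); it does not reflect how the hardware unit observes the cache.

```python
from collections import Counter
from typing import Dict, Iterable, List


def reuse_histogram(addresses: Iterable[int], num_sets: int = 4,
                    ways: int = 4, line_bytes: int = 32) -> Counter:
    """Histogram of LRU-stack reuse distances (0 .. ways-1) plus a 'miss' bin."""
    stacks: Dict[int, List[int]] = {s: [] for s in range(num_sets)}
    hist: Counter = Counter()
    for addr in addresses:
        line = addr // line_bytes
        index, tag = line % num_sets, line // num_sets
        stack = stacks[index]          # position 0 is the most recently used tag
        if tag in stack:
            distance = stack.index(tag)
            hist[distance] += 1
            stack.pop(distance)
        else:
            hist["miss"] += 1
            if len(stack) == ways:
                stack.pop()            # evict the least recently used tag
        stack.insert(0, tag)           # the accessed tag becomes most recent
    return hist


# Example with a single set so every access shares one LRU stack.
hist = reuse_histogram([0, 32, 0, 0, 64, 32], num_sets=1, ways=4)
```

The bins match the range given on the slide: a hit lands in a bin between 0 and the set associativity minus 1, and anything not found in the stack is counted as a miss.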


Instruction Mix


Identify current instruction subset in use

Divide instructions into logical categories

Load/Store

Floating Point

Control Flow

Opcode-based table lookup
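
A table lookup of this kind can be sketched in a few lines. The table below is hypothetical: it keys on a handful of SPARC mnemonics for readability, whereas a hardware unit would presumably index on opcode bits, and the "Other" category is an assumption added for unlisted instructions.

```python
from collections import Counter
from typing import Iterable

# Hypothetical lookup table from instruction mnemonic to category.
CATEGORY_TABLE = {
    "ld": "Load/Store",
    "st": "Load/Store",
    "fadds": "Floating Point",
    "fmuls": "Floating Point",
    "ba": "Control Flow",
    "call": "Control Flow",
}


def instruction_mix(mnemonics: Iterable[str]) -> Counter:
    """Count executed instructions per category; unknown ones go to 'Other'."""
    return Counter(CATEGORY_TABLE.get(m, "Other") for m in mnemonics)


mix = instruction_mix(["ld", "add", "fadds", "st", "ba", "add", "ld"])
```

Because classification is a single lookup per instruction, the set of categories can be changed by reloading the table rather than changing the counting logic.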


Latency Unit


Break down miss latency into constituent sources

Bus contention

DRAM latency

etc.

For each category create a histogram of latency in cycles
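
The per-category histograms can be illustrated with a small software model. This is a sketch only: the event format, the fixed bucket width `bucket_cycles`, and the sample latencies are assumptions made up for the example, not measurements.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple


def latency_histograms(events: Iterable[Tuple[str, int]],
                       bucket_cycles: int = 10) -> DefaultDict[str, Counter]:
    """Build one histogram per latency source, bucketed by cycle count."""
    hists: DefaultDict[str, Counter] = defaultdict(Counter)
    for category, cycles in events:
        bucket = (cycles // bucket_cycles) * bucket_cycles  # lower edge of bucket
        hists[category][bucket] += 1
    return hists


# Made-up (category, latency in cycles) events for illustration.
events = [
    ("bus contention", 4),
    ("DRAM latency", 37),
    ("DRAM latency", 31),
    ("bus contention", 12),
]
hists = latency_histograms(events)
```

Each source gets its own distribution, so a long tail in one category is not hidden by the average of the others.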


Stall Unit


Break down Cycles Per Instruction

Attribute cycles to their sources

Cache miss

Translation Lookaside Buffer (TLB) miss

Floating Point busy stalls

etc.
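
Breaking down CPI amounts to dividing the cycles attributed to each source by the instruction count, so the parts sum to the overall CPI. The sketch below uses made-up cycle counts, and the function name `cpi_breakdown` and the "base" category are assumptions for illustration.

```python
from typing import Dict


def cpi_breakdown(instructions: int, cycles_by_source: Dict[str, int]) -> Dict[str, float]:
    """Return each source's contribution to cycles per instruction."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return {source: cycles / instructions
            for source, cycles in cycles_by_source.items()}


# Made-up cycle counts, for illustration only.
cycles = {"base": 1000, "cache miss": 600, "TLB miss": 150, "FP busy": 250}
parts = cpi_breakdown(1000, cycles)
total_cpi = sum(parts.values())  # overall CPI is the sum of the contributions
```

Reading the result per source shows which kind of stall dominates, which is the information a single aggregate CPI figure cannot give.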


Verification


Run a subset of the SPEC CPU2006 benchmarks

Those with memory usage within board specs

Collect metrics with ABACUS and Simics

Profile for a few billion instructions

Limited by Simics performance


Test Platform

Proof-of-concept System

Single core, single threaded

XUP V2Pro: 90% slice utilization

Processor: LEON3 (SPARC v8 ISA), 50 MHz

Memory: 256 MB DDR RAM

OS: Debian Etch (4.0)


Simulation Platform

Simics System:

Processor: UltraSPARC II (SPARC v9 ISA)

Memory: 256 MB DDR RAM

OS: Debian Etch (4.0)

Differences:

SPARC v9 ISA (64-bit processor)

Local filesystem vs NFS


LEON3 Comparison

[Figure: reuse profiles for namd and hmmer, ABACUS vs. Simics. x-axis: miss, Reuse 0, Reuse 1; y-axis: Counts (in Millions)]


LEON3 Comparison (cont'd)

DC Memory Reuse Profile

[Figure: DC memory reuse profiles for namd and hmmer, ABACUS vs. Simics. x-axis: miss, Reuse 0, Reuse 1; y-axis: Counts (in Millions)]


Resource Usage

Default:

2-way LRU Instruction Cache

2-way LRU Data Cache

5 Instruction Types

[Figure: resource usage chart. Resources: LUT (V2p), LUT (V5), FF. Configurations: 32-bit counters, 40-bit counters, 32-bit counters with Profile Unit added. Axis range: 0 to 1600]


Conclusion

ABACUS is a generic profiler that can be easily integrated into modern processors

It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments


Future Plans

Move to multi-core/multi-threaded system

Memory reuse distance independent of existing cache implementation

Process tracking

Integrate results into OS scheduler


Questions?