The PinPoints Toolkit for Finding Representative Regions of Large Programs Harish Patil Platform Technology & Architecture Development Enterprise Platform Group Intel Corporation Presented as part of the Pin tutorial at ASPLOS 2004, Boston, MA 10/09/2004


The PinPoints Toolkit for Finding Representative Regions of Large Programs

Harish Patil

Platform Technology & Architecture Development
Enterprise Platform Group
Intel Corporation

Presented as part of the Pin tutorial at ASPLOS 2004, Boston, MA

10/09/2004

ASPLOS’04 2PinPoints

People

PinPoints: Harish Patil, Robert Cohn, Mark Charney, Andrew Sun, Rajiv Kapoor, Anand Karunanidhi

Pin: Robert Cohn, Artur Klauser, Geoff Lowney, CK Luk, Robert Muth, Harish Patil, Vijay Janapa Reddi, Steven Wallace

Acknowledgements: Brad Calder, Michael Greenfield, Geoff Lowney, Joel Emer, Chris Weaver, Michael Adler, Kim Hazelwood, James Vash, Ram Ramanujam, Roger Golliver, Timothy Prince, Allan Knies, Youngsoo Choi, Nechama Katan, Chris Gianos, Hideki Saito, Mahesh Madhav …


PinPoints

Representative Regions of Programs
– Automatically chosen
– Validated (represent whole-program behavior)
– For trace-driven or execution-driven simulation

• Pin (Intel): http://rogue.colorado.edu/Pin + SimPoint (UCSD): http://www.cse.ucsd.edu/~calder/simpoint/

Found/validated PinPoints for long-running (trillions of instructions) programs [IPF & x86]


Outline Of the Talk

• Why PinPoints?

• PinPoints methodology: How to find and validate representative regions for simulation

Reference: Paper in MICRO-37: “Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation”, Patil et al.

http://rogue.colorado.edu/Pin/links.php PinPoints [download]


Motivation: Simulating Large Programs

Problem: Whole-program simulation is very slow (can take months)

Solution: Find representative simulation points
– Programs have phases: random/blind selection may miss them
– SimPoint approach: Find phases using basic block profile: one simulation point (PinPoint) per phase

PinPoints : < 1% of program execution
– Capture whole-program behavior


Motivation: Simulating Large Programs (continued)

Problem: Porting programs to simulators is often not practical
– license issues
– extra resources (disks etc.)

Solution: Drive simulation from native environment
– Run under Pin

Pin runs programs “out-of-the-box” (no porting required)


The PinPoints Methodology

[Flow diagram]

Phase Detection + PinPoint Selection
– isimpoint: Generate dynamic basic block profile
– SimPoint tools: Analyze basic block profile to find phases
– Scripts: Generate PinPoints files

PinPoints file
– H/W counters-based validation: sample counters; does the whole-program value match the weighted sum for PinPoints?
– Trace generation/simulation


Phases in gzip’s Execution

[Figure: stacked plots over gzip's execution (x-axis: instructions): performance (IPC), energy used per interval, instruction cache misses, data cache misses, 2nd level cache misses, branch misprediction.]


SimPoint: You are what you execute

• Goal: track behavior of a program
– Behavior caused by the path through code

• How: track the code that is executing
– Detect changes and similarities in code

• Basic Block Distribution Analysis
– Generate and compare code signatures


Basic-Block Distribution Analysis

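[Figure: control-flow graph with basic blocks A, B, C, D, E, each annotated with its execution count for one interval.]

Basic block vector for (A, B, C, D, E): < 3, 1, 2, 3, 1 >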
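The vector above is a per-interval count of how often each basic block executed. A minimal sketch of that counting step follows; the trace is hypothetical, chosen only so that the counts match the slide's vector, and the function name is an illustration rather than anything from isimpoint. SimPoint's basic block vectors additionally scale each count by the number of instructions in the block, which this sketch leaves out.

```python
from collections import Counter

def basic_block_vector(trace, blocks):
    """Count how often each basic block executes in one interval."""
    counts = Counter(trace)
    return [counts[b] for b in blocks]

# Hypothetical trace for one interval (not from the slides); it follows a
# loop A -> (B or C) -> D and exits through E.
trace = ["A", "B", "D", "A", "C", "D", "A", "C", "D", "E"]
bbv = basic_block_vector(trace, blocks=["A", "B", "C", "D", "E"])
print(bbv)  # [3, 1, 2, 3, 1]
```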


Basic-Block Distribution Analysis

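[Figure: the same control-flow graph (basic blocks A, B, C, D, E) annotated with execution counts for three intervals.]

Basic block vectors for (A, B, C, D, E), one per interval:
< 3, 1, 2, 3, 1 >
< 3, 2, 1, 3, 1 >
< 2, 0, 2, 2, 2 >

• Capture using isimpoint
• Compare vectors
• Group similar vectors in clusters
• Choose one PinPoint per cluster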
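The compare-and-group steps can be illustrated with the three vectors from this slide. This is a sketch under assumptions: the vectors are normalized and compared with Manhattan distance, and a simple threshold-based grouping stands in for the clustering that SimPoint actually performs (k-means on projected vectors). The threshold value and function names are hypothetical.

```python
def normalize(vec):
    """Scale a vector so its entries sum to 1."""
    total = sum(vec)
    return [x / total for x in vec]

def manhattan(a, b):
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def group_similar(vectors, threshold):
    """Greedy grouping: join the first cluster whose representative is close
    enough, otherwise start a new cluster. A stand-in for k-means."""
    reps, labels = [], []
    for vec in map(normalize, vectors):
        for cluster_id, rep in enumerate(reps):
            if manhattan(vec, rep) <= threshold:
                labels.append(cluster_id)
                break
        else:
            reps.append(vec)
            labels.append(len(reps) - 1)
    return labels

vectors = [[3, 1, 2, 3, 1], [3, 2, 1, 3, 1], [2, 0, 2, 2, 2]]
labels = group_similar(vectors, threshold=0.3)  # threshold is an assumption
print(labels)  # [0, 0, 1]
```

The first two intervals execute nearly the same mix of blocks and land in one cluster; the third skips block B entirely and forms its own.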


Phase Detection + PinPoint Selection

[Figure: the program is profiled with isimpoint, producing profiles (vectors) for program slices (100 million instructions each). The profiles are analyzed with SimPoint, which writes the selected PinPoints to pinpoints.pp. In the example shown: PinPoint 1: Weight 30%, PinPoint 2: Weight 70%.]
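The slide does not say how the weights are derived. In SimPoint, the weight of a simulation point is the fraction of slices that fall in its cluster; the sketch below assumes that definition. The cluster labels are hypothetical, picked to reproduce the 30%/70% split in the example.

```python
from collections import Counter

def pinpoint_weights(labels):
    """Weight of each cluster = fraction of slices assigned to it."""
    total = len(labels)
    counts = Counter(labels)
    return {cluster: n / total for cluster, n in sorted(counts.items())}

# Hypothetical cluster assignment for 10 slices (not from the slides).
labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
weights = pinpoint_weights(labels)
for cluster, weight in weights.items():
    print(f"PinPoint {cluster + 1}: Weight {weight:.0%}")
```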


PinPoints Generated for Some Programs (Commercial and SPEC2000)

Program      # Retired Instructions (billions)   # Slices (250 million insts.)   # PinPoints
AMBER-rt     3994                                15975                           6
Fluent-m3    2625                                10499                           8
LS-DYNA      4932                                19729                           6
SPECINT      142                                 567                             4
SPECFP       373                                 1491                            5

PinPoints : < 1% of program execution
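The "< 1%" figure can be checked against the table with a little arithmetic. The sketch below assumes each PinPoint covers exactly one 250-million-instruction slice; the function name is hypothetical.

```python
SLICE_INSTS = 250e6  # slice size from the table header

# (retired instructions in billions, number of PinPoints), from the table
programs = {
    "AMBER-rt": (3994, 6),
    "Fluent-m3": (2625, 8),
    "LS-DYNA": (4932, 6),
    "SPECINT": (142, 4),
    "SPECFP": (373, 5),
}

def coverage_percent(retired_billions, num_pinpoints, slice_insts=SLICE_INSTS):
    """Percent of retired instructions that fall inside the PinPoints."""
    return 100.0 * num_pinpoints * slice_insts / (retired_billions * 1e9)

coverage = {name: coverage_percent(*row) for name, row in programs.items()}
for name, pct in coverage.items():
    print(f"{name}: {pct:.3f}% of execution")
```

Under that assumption, every row comes out below 1%, with the long-running commercial programs well below 0.1%.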


PinPoints: Validation

• Do PinPoints capture whole-program behavior?
– Whole-program CPI: Actual-CPI
– Predict using CPI for PinPoints: Predicted-CPI

Predicted-CPI = Σ_i (Weight_i × CPI_i)
% Delta = (Actual-CPI - Predicted-CPI) × 100 / Actual-CPI

• Do they work across micro-architectures?
– Predict performance on different configurations with the same binary/PinPoints: compare with actual performance
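A worked example of the validation arithmetic, taking Predicted-CPI as the weighted sum of the per-PinPoint CPIs. All CPI values below are hypothetical; only the 30%/70% weights echo the earlier example.

```python
def predicted_cpi(pinpoints):
    """Weighted sum of per-PinPoint CPIs; pinpoints is a list of (weight, cpi)."""
    return sum(weight * cpi for weight, cpi in pinpoints)

def percent_delta(actual_cpi, predicted):
    """Signed prediction error as a percentage of the actual CPI."""
    return (actual_cpi - predicted) * 100 / actual_cpi

# Hypothetical measurements, not taken from the slides.
pinpoints = [(0.30, 1.2), (0.70, 0.8)]
actual_cpi = 0.95

pred = predicted_cpi(pinpoints)
delta = percent_delta(actual_cpi, pred)
print(f"Predicted-CPI = {pred:.2f}, % Delta = {delta:.2f}")
```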


Predicting Whole-program CPI with PinPoints (Itanium 2: 1.3 GHz)

[Chart: CPI: Some IPF SPEC2000 programs: Actual vs. Predicted. Left axis: CPI (0.1 to 2.1), series Whole_pgm_CPI and PinPoints_CPI. Right axis: % Abs(Delta in CPI) (0% to 100%), series %Delta.]


Predicting Whole-program CPI with PinPoints (Pentium 4: 2.8 GHz)

[Chart: CPI: Some x86 SPEC2000 programs: Actual vs. Predicted. Left axis: CPI (0 to 25), series Whole_pgm_CPI and PinPoints_CPI. Right axis: Abs(% Delta in CPI) (0% to 100%), series %Delta.]


Predicting Whole-program L2 Misses with PinPoints (Itanium 2: 900 MHz)

[Chart: L2 Misses Per Thousand Insts for Some SPEC2000 Programs: Actual vs. Predicted. Left axis: MPKI (0 to 90), series Whole_pgm_L2MPKI and PinPoints_L2MPKI. Right axis: Abs(Delta in MPKI) (0.0 to 14.0), series Delta.]


Speedup Prediction with PinPoints (Itanium 1, 2 varying Frequency)

Same binaries / same set of PinPoints : different microarchitectures

[Chart: Speedup over Config1: Some IPF SPEC2000 programs: Actual vs. Predicted. Axis: Speedup (0 to 9). Series: Config2:Actual, Config2:Predicted, Config4:Actual, Config4:Predicted.]
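The slides do not spell out how speedup is computed. One plausible reading, sketched below as an assumption: because the same binary retires the same number of instructions on every configuration, execution time is proportional to CPI divided by clock frequency, and speedup over Config1 is the ratio of those times. The predicted speedup would then use the PinPoints-based CPIs in place of the whole-program ones. All numbers and names here are hypothetical.

```python
def relative_time(cpi, freq_ghz):
    """Execution time per instruction, up to a constant factor."""
    return cpi / freq_ghz

def speedup_over_baseline(baseline, config):
    """Speedup of config over baseline; each is a (cpi, freq_ghz) pair."""
    return relative_time(*baseline) / relative_time(*config)

# Hypothetical (CPI, frequency in GHz) pairs, not taken from the slides.
config1 = (1.5, 0.8)
config2 = (1.2, 1.3)

speedup = speedup_over_baseline(config1, config2)
print(f"Speedup of Config2 over Config1: {speedup:.2f}x")
```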


Relevant Pin Tools

• isimpoint : generates basic block vectors in a format suitable for SimPoint analysis

• controller : allows fast-forwarding until a region of interest is reached

Specifying a region of interest:
– Skip N instructions
– Specific code address + count
– PinPoints file + PinPoint number

Available as “class CONTROL” in a Pin kit


Summary

Finding simulation points : The Pin Advantage
• No special compiler/link flags or porting required
• Allows analysis of programs as they run

PinPoints : < 1% of program execution
• Predict whole-program behavior
• Work across microarchitectures


Resources

• Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. “Automatically Characterizing Large Scale Program Behavior”, ASPLOS’02.

• SimPoint toolkit: http://www-cse.ucsd.edu/~calder/simpoint/

• Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, and Anand Karunanidhi. “Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation”, MICRO-37 (2004).

• PinPoints toolkit: To be released soon (available upon request)