barrierwatch: characterizing multithreaded workloads across and within program-defined epochs...

Post on 19-Jan-2016

217 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

BarrierWatch:Characterizing Multithreaded Workloads across and within

Program-Defined Epochs

Socrates Demetriades and

Sangyeun Cho

Computer Frontiers 2011, Ischia, Italy.

Program’s time-varying behavior.

Bodytrack / 16-threads parallel execution

Time

Challenge: How to detect behavioral changes?

NoC

Tra

ffic

Adaptive CMP architectures can take advantage of this time varying behavior.

Tracking program behavior

Traditionally, two methods for tracking program phases

1. Run-time monitoring of the program execution.– Observations are limited by the monitoring metric. – Cost of monitoring mechanisms. – Granularity of monitoring intervals? Fine- vs coarse- grain?

2. Profile based analysis. – Static program analysis, complicated algorithms. – Binary rewriting– Architectural support.

Code-based metrics: not directly suitable for parallel workloads.

Overview of our proposal

Track the program behavior at Run Time.

Effective

Simple

Low-cost

View the program execution on ‘epoch’ granularity.

Outline

Introduction

Program epochs and characterization

Run-time epoch change detection.

Case study

Summary

Observation / Motivation

• Natural alignment of barriers with the changes in program behavior.

• Intervals enclosed by barriers repeat with consistent behavior.

No

C T

raffi

c

Time

No

C T

raffi

c

Time

Program epochs

• Epoch: An execution interval between two consecutive barriers.

A B

ep

och

BarriersA

B

epoch

AB

ep

och AB

ep

och AB

ep

och AB

ep

och A B

ep

och

No

C T

raffi

c

Time

Program epochs

BarriersA

B

epoch

BA

ep

och BA

ep

och BA

ep

och BA

ep

och BA

ep

och

• Epoch: An execution interval between two consecutive barriers.

No

C T

raffi

c

Time

Program epochs

DCepoch

DCepoch

• Epoch: An execution interval between two consecutive barriers.

No

C T

raffi

c

Time

Program epochs

• Epoch: An execution interval between two consecutive barriers.

Epochs’ effectiveness: characterization

Are epochs effective in characterizing the variability of program behavior?

How similar is program behavior among the different dynamic instances of the same epoch?

How different is the behavior across different epochs?

How the program behaves within the epochs?

Characterization across epochs.

Error bars:variability across the dynamic instances of an epoch

Dispersion across points:variability across different epochs

LOW variability

HIGH variability

NoC

Tra

ffic

Characterization across epochs.

• fundamental correlation between epoch boundaries and changes in program behavior

• High predictability of behavior across epoch instances

NoC

Tra

ffic

L2 M

iss

Rat

io

Glo

bal

IP

C

C2C

Tra

nfer

s

Low variability across instances of an epochHigh variability across different epochs

Characterization across epochs.

Low variability across instances of an epochHigh variability across different epochs

Ratio =

The smaller the ratio, • the sharper the behavioral shifts on epoch

boundaries• the more predictable the program behavior across

repeating epoch instances.

PARSEC and SPLASH2 programs.

bodytrack fluidanimatestreamcluster barnes fmm lu ocean radiosity water-ns0.0

0.2

0.4

0.6

0.8

1.0

Global IPC L2 Miss Ratio Traffic Volume C2C-tranfer Hit Ratio

Varia

tion

Ratio

Less than 0.2 for most benchmarks.

Epochs’ effectiveness: characterization

Are epochs effective in characterizing the variability of program behavior?

How similar is program behavior among the different dynamic instances of the same epoch?

How different is the behavior across different epochs?

How the program behaves within the epochs?

Characterization within epochs.

• Epochs may exhibit stable or other behavioral patterns within their boundaries.

• Internal behavior patterns reoccur and thus can be accurately predicted.

Stable

Unstable

Multiphase

Characterization within epochs.

0%

20%

40%

60%

80%

100%

bodytrack fluidan. streamcl. barnes fmm lu ocean radiosity water-ns average

0%

20%

40%

60%

80%

100%Stable epochs Unstable epochs Multi-phase epochs

Characterization within epochs.

0%

20%

40%

60%

80%

100%Stable epochs Unstable epochs Multi-phase epochs

bodytrack fluidan. streamcl. barnes fmm lu ocean radiosity water-ns average

Characterization within epochs.

• Most epochs exhibit stable behavior within their boundaries.

• Close relation to classic definition of program phase.

• Reoccurring Internal patterns can be predictable.

0%

20%

40%

60%

80%

100%Stable epochs Unstable epochs Multi-phase epochs

bodytrack fluidan. streamcl. barnes fmm lu ocean radiosity water-ns average

Epoch characterization summary

Epochs repeat in a consistent and predictable way providing a reliable granularity of the cyclic pattern of program behavior.

Epoch boundaries are likely to naturally indicate changes of program behavior

Most epochs exhibit stable behavior within their boundaries or other reoccurring predictable patterns.

Epochs: Advantages

Independent from the underlying architecture.

Naturally adopting variable-length intervals

Deterministic boundaries (global sync points).

Barriers can be easily captured at run time.

Many multithreaded workloads are written with barrier synchronizations.

Outline

Introduction

Program epochs and characterization

Run-time epoch change detection.

Case study

Summary

...barrier_wait(barrier)......barrier_wait(barrier)...

Application’s source code

Application’s Instruction stream

Run-time epoch change detection.

Reconfiguration

units

EPOCH ID Decision signature

F bit

Barrier A Barrier ABarrier A

Epoch Table

Barrier B

Barrier B

Barrier A TBarrier B Config ABBarrier B TConfig ABBarrier A TBarrier B Config AB

Outline

Introduction

Program epochs and characterization

Run-time epoch change detection.

Case study.

Summary

Case study: Overview

Purpose: Demonstrate the applicability of the BarrierWatch approach in the context of dynamic adaptation.

Goal: Optimize energy/performance trade-off in a CMP architecture using BarrierWatch.

Adaptation Technique: DVFS applied to the NoC. (@ epoch granularity)

Experimental methodology

Benchmarks: From PARSEC & Splash2 suites (pthread).

Architectural ModelFull system simulator (simics) augmented with a cycle accurate memory hierarchy model.Tile-based CMP model / 16 in-order cores / 2-issue widthShared, physically distributed L2 Cache. Mesh NoC, x-y routing.Two-stage router pipeline, buffer size 2 per VC.

On-Chip DVFS

On-Chip Power consumption Model NoC power + Background power NoC Voltage/Frequency levels:

Frequency (GHz)

Voltage (V) alias

3 0.8 f100%

2.25 0.65 f75%

1.5 0.5 f50%

0.75 0.35 f25%

Evaluated schemes

• Schemes with fixed/static NoC frequency.1. f100% (baseline).2. f75% 3. f50% 4. f25%

• Epoch-based DVFS schemes (adaptive architectures) 1. f-DVFS dyn (Run-time DVFS)2. F-DVFS stat (off-line predefined DVFS settings).

• Best frequency: The one that minimizes the Energy x Delay (ED product).

bodytrack streamcluster barnes average-25

-20

-15

-10

-5

0

5

10

15

20

25

f-75% f-50% f-25% f-DVFS dyn f-DVFS stat

% E

nerg

y Sa

ving

s

bodytrack streamcluster barnes average0

10

20

30

40

% S

low

dow

n

Case study: Results

bodytrack streamcluster barnes average-25

-20

-15

-10

-5

0

5

10

15

20

25

f-75% f-50% f-25% f-DVFS dyn f-DVFS stat

% E

nerg

y Sa

ving

s

bodytrack streamcluster barnes average0

10

20

30

40

% S

low

dow

n

Case study: Results

bodytrack streamcluster barnes average-25

-20

-15

-10

-5

0

5

10

15

20

25

f-75% f-50% f-25% f-DVFS dyn f-DVFS stat

% E

nerg

y Sa

ving

s

bodytrack streamcluster barnes average0

10

20

30

40

% S

low

dow

n

Case study: Results

bodytrack streamcluster barnes average-25

-20

-15

-10

-5

0

5

10

15

20

25

f-75% f-50% f-25% f-DVFS dyn f-DVFS stat

% E

nerg

y Sa

ving

s

bodytrack streamcluster barnes average0

10

20

30

40

% S

low

dow

n

Case study: Results

-38.5

-83.2

bodytrack streamcluster barnes average-25

-20

-15

-10

-5

0

5

10

15

20

25

f-75% f-50% f-25% f-DVFS dyn f-DVFS stat

% E

nerg

y Sa

ving

s

bodytrack streamcluster barnes average0

10

20

30

40

% S

low

dow

n

Case study: Results

-38.5

-83.2

Run-time Epoch-based DVFS:

12.5% energy savings for 2.7% slowdown

Case study: Results

Epoch-based dynamic schemes outperform all static scheme.

bodytrack fluidanimate streamcluster barnes fmm ocenan radiosity water-ns average0

0.2

0.4

0.6

0.8

1

1.2

1.4

f-75% f-50% f-25% f-DVFS dyn f-DVFS stat

ED Im

prov

emen

t

Outline

Introduction

Program epochs and characterization

Run-time epoch change detection.

Case study.

Summary.

Summary

Program-defined epochs represent well the repetitive and varying behavior of multithreaded programs.

BarrierWatch prominent method for effective run-time management in CMPs.

Desirable properties: 1. Simple and lightweight. 2. Effective at run-time. 3. Independent of the underlying architecture. 4. Well suited for Parallel applications.

Thank you!

Computer Frontiers 2011, Ischia, Italy.

top related