BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs Socrates Demetriades and Sangyeun Cho Computer Frontiers 2011, Ischia, Italy.

Page 1:

BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs

Socrates Demetriades and Sangyeun Cho

Computer Frontiers 2011, Ischia, Italy.

Page 2:

Program’s time-varying behavior.

[Figure: NoC traffic over time for bodytrack, 16-thread parallel execution.]

Adaptive CMP architectures can take advantage of this time-varying behavior.

Challenge: How to detect behavioral changes?

Page 3:

Tracking program behavior

Traditionally, two methods for tracking program phases:

1. Run-time monitoring of the program execution.
– Observations are limited by the monitoring metric.
– Cost of monitoring mechanisms.
– Granularity of monitoring intervals? Fine- vs. coarse-grain?

2. Profile-based analysis.
– Static program analysis, complicated algorithms.
– Binary rewriting.
– Architectural support.

Code-based metrics: not directly suitable for parallel workloads.

Page 4:

Overview of our proposal

Track the program behavior at run time.

Effective

Simple

Low-cost

View the program execution at ‘epoch’ granularity.

Page 5:

Outline

Introduction

Program epochs and characterization

Run-time epoch change detection.

Case study

Summary

Page 6:

Observation / Motivation

• Natural alignment of barriers with the changes in program behavior.

• Intervals enclosed by barriers repeat with consistent behavior.

[Figure: NoC traffic over time.]

Page 7:

Program epochs

• Epoch: An execution interval between two consecutive barriers.

[Figure: NoC traffic over time. Barriers A and B are marked, and the repeating intervals between them are labeled epoch AB.]
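The epoch definition can be made concrete with a small sketch. It assumes a hypothetical trace in which every element carries one metric sample (for example NoC traffic) and, when a global barrier is reached right after that sample, a barrier identifier. The trace format and the names `Epoch` and `split_into_epochs` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Epoch:
    # An epoch is identified here by the pair of barriers that enclose it
    # (an assumption made for this sketch).
    start_barrier: str
    end_barrier: str
    samples: List[float] = field(default_factory=list)


def split_into_epochs(trace: List[Tuple[float, Optional[str]]]) -> List[Epoch]:
    """Cut a trace into epochs at barrier events.

    Each trace element is (metric_sample, barrier_id_or_None); a non-None
    barrier id means a global barrier was reached right after that sample.
    Samples before the first barrier and after the last one are ignored,
    because they are not enclosed by two barriers.
    """
    epochs: List[Epoch] = []
    prev_barrier: Optional[str] = None
    current: List[float] = []
    for sample, barrier in trace:
        current.append(sample)
        if barrier is not None:
            if prev_barrier is not None:
                epochs.append(Epoch(prev_barrier, barrier, current))
            prev_barrier = barrier
            current = []
    return epochs


# Hypothetical trace: made-up traffic samples with barriers A and B alternating.
trace = [(5.0, "A"), (9.0, None), (8.5, "B"), (2.0, "A"), (9.2, None), (8.8, "B")]
epochs = split_into_epochs(trace)
labels = [e.start_barrier + e.end_barrier for e in epochs]
```

With this made-up trace, the intervals come out as alternating instances of epoch AB and epoch BA, mirroring the labels used in the figure.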

Page 8:

Program epochs

• Epoch: An execution interval between two consecutive barriers.

[Figure: NoC traffic over time. Barriers A and B are marked, and the repeating intervals labeled epoch BA are highlighted.]

Page 9:

Program epochs

• Epoch: An execution interval between two consecutive barriers.

[Figure: NoC traffic over time. Two further intervals are labeled as epochs bounded by barriers C and D.]

Page 11:

Epochs’ effectiveness: characterization

Are epochs effective in characterizing the variability of program behavior?

How similar is program behavior among the different dynamic instances of the same epoch?

How different is the behavior across different epochs?

How does the program behave within the epochs?

Page 12:

Characterization across epochs.

[Figure: NoC traffic per epoch, plotted as points with error bars.]

Error bars: variability across the dynamic instances of an epoch (LOW variability).

Dispersion across points: variability across different epochs (HIGH variability).

Page 13:

Characterization across epochs.

• Fundamental correlation between epoch boundaries and changes in program behavior.

• High predictability of behavior across epoch instances.

[Figure: per-epoch plots of NoC traffic, L2 miss ratio, global IPC, and C2C transfers.]

Low variability across instances of an epoch. High variability across different epochs.

Page 14:

Characterization across epochs.

Low variability across instances of an epoch. High variability across different epochs.

Ratio = (variability across instances of an epoch) / (variability across different epochs)

The smaller the ratio,
• the sharper the behavioral shifts on epoch boundaries;
• the more predictable the program behavior across repeating epoch instances.
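As a rough illustration of such a ratio, the sketch below divides the spread among instances of the same epoch by the spread among different epochs. The choice of the population standard deviation as the variability measure, the function name `variation_ratio`, and the sample values are all assumptions; the slides do not state which statistic was used.

```python
from statistics import mean, pstdev
from typing import Dict, List


def variation_ratio(instances_by_epoch: Dict[str, List[float]]) -> float:
    """Variability across instances of an epoch divided by variability
    across different epochs.

    ASSUMPTION: variability is measured with the population standard
    deviation; the slides do not say which statistic is used.
    """
    # Numerator: average spread among the dynamic instances of each epoch.
    within = mean(pstdev(values) for values in instances_by_epoch.values())
    # Denominator: spread of the per-epoch averages across different epochs.
    across = pstdev([mean(values) for values in instances_by_epoch.values()])
    if across == 0:
        raise ValueError("all epochs have the same average; ratio is undefined")
    return within / across


# Hypothetical per-instance NoC traffic values, made up for illustration.
traffic = {
    "AB": [9.0, 9.2, 8.8],
    "BA": [2.0, 2.1, 1.9],
}
ratio = variation_ratio(traffic)
```

In this made-up case each epoch's instances sit close to their own average while the two epochs differ widely, so the ratio is small (about 0.035).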

Page 15:

PARSEC and SPLASH2 programs.

[Figure: variation ratio (y-axis, 0.0 to 1.0) per benchmark (bodytrack, fluidanimate, streamcluster, barnes, fmm, lu, ocean, radiosity, water-ns). Legend: global IPC, L2 miss ratio, traffic volume, C2C-transfer hit ratio.]

Less than 0.2 for most benchmarks.

Page 17:

Characterization within epochs.

• Epochs may exhibit stable or other behavioral patterns within their boundaries.

• Internal behavior patterns reoccur and thus can be accurately predicted.

[Figure: example epochs labeled Stable, Unstable, and Multiphase.]

Page 20:

Characterization within epochs.

• Most epochs exhibit stable behavior within their boundaries.

• Close relation to the classic definition of a program phase.

• Reoccurring internal patterns can be predictable.

[Figure: percentage (0% to 100%) of stable, unstable, and multi-phase epochs per benchmark: bodytrack, fluidanimate, streamcluster, barnes, fmm, lu, ocean, radiosity, water-ns, and the average.]

Page 21:

Epoch characterization summary

Epochs repeat in a consistent and predictable way, providing a reliable granularity for capturing the cyclic pattern of program behavior.

Epoch boundaries are likely to naturally indicate changes of program behavior.

Most epochs exhibit stable behavior within their boundaries, or other reoccurring, predictable patterns.

Page 22:

Epochs: Advantages

Independent of the underlying architecture.

Naturally adopt variable-length intervals.

Deterministic boundaries (global sync points).

Barriers can be easily captured at run time.

Many multithreaded workloads are written with barrier synchronizations.

Page 24:

Run-time epoch change detection.

Application’s source code:

...
barrier_wait(barrier)
...
barrier_wait(barrier)
...

[Figure: barriers A and B appear in the application’s instruction stream and are looked up in an Epoch Table with the columns EPOCH ID, Decision signature, and F bit. The example entry pairs Barrier A and Barrier B with Config AB. The table output drives the reconfiguration units.]
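One possible reading of this diagram is a lookup table consulted at every barrier. The sketch below is a toy software model of that idea, not the mechanism from the talk. It assumes, for simplicity, that the barrier opening an epoch is enough to identify it (the slide's table appears to record both enclosing barriers), that a decision is learned once from samples monitored during the first instance of an epoch, and that unseen epochs run at a default setting. The class `EpochTable`, the `policy` function, and the threshold are hypothetical.

```python
from typing import Callable, Dict, List, Optional

DEFAULT_CONFIG = "f100%"  # hypothetical default setting used for unseen epochs


class EpochTable:
    """Toy software model of an epoch table (illustrative only)."""

    def __init__(self, choose_config: Callable[[List[float]], str]) -> None:
        self._decisions: Dict[str, str] = {}   # epoch id -> stored decision
        self._choose_config = choose_config    # policy supplied by the caller
        self._open_epoch: Optional[str] = None
        self._samples: List[float] = []

    def record(self, sample: float) -> None:
        """Collect a monitored metric sample for the epoch in flight."""
        self._samples.append(sample)

    def on_barrier(self, barrier_id: str) -> str:
        """Called when a barrier is detected; returns the config to apply."""
        # Close the epoch that just ended and learn a decision for it once.
        if self._open_epoch is not None and self._open_epoch not in self._decisions:
            self._decisions[self._open_epoch] = self._choose_config(self._samples)
        # Open the next epoch, identified here by its opening barrier.
        self._open_epoch = barrier_id
        self._samples = []
        return self._decisions.get(barrier_id, DEFAULT_CONFIG)


# Hypothetical policy: slow the NoC down when the observed traffic is light.
def policy(samples: List[float]) -> str:
    return "f50%" if samples and max(samples) < 5.0 else "f100%"


table = EpochTable(policy)
applied = []
for barrier, traffic in [("A", [9.0, 8.5]), ("B", [2.0]), ("A", [9.1]), ("B", [2.2])]:
    applied.append(table.on_barrier(barrier))
    for value in traffic:
        table.record(value)
```

In this run the first instance of each epoch executes at the default setting while it is being observed, and the stored decision is reused when the same barrier is seen again.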

Page 26:

Case study: Overview

Purpose: Demonstrate the applicability of the BarrierWatch approach in the context of dynamic adaptation.

Goal: Optimize energy/performance trade-off in a CMP architecture using BarrierWatch.

Adaptation Technique: DVFS applied to the NoC at epoch granularity.

Page 27:

Experimental methodology

Benchmarks: From PARSEC & SPLASH2 suites (pthread).

Architectural model:
– Full-system simulator (Simics) augmented with a cycle-accurate memory hierarchy model.
– Tile-based CMP model / 16 in-order cores / 2-issue width.
– Shared, physically distributed L2 cache.
– Mesh NoC, x-y routing.
– Two-stage router pipeline, buffer size 2 per VC.

Page 28:

On-Chip DVFS

On-chip power consumption model: NoC power + background power.

NoC voltage/frequency levels:

Frequency (GHz)   Voltage (V)   Alias
3                 0.8           f100%
2.25              0.65          f75%
1.5               0.5           f50%
0.75              0.35          f25%
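To get a feel for what these operating points imply, the sketch below applies the textbook CMOS dynamic-power relation (power proportional to V² × f) to the four levels listed on this slide. That relation is an assumption brought in for illustration: the slide only says the model combines NoC power and background power, and static (leakage) power is ignored here. The function name `relative_dynamic_power` is hypothetical.

```python
# (frequency in GHz, voltage in V) per level, as listed on the slide.
LEVELS = {
    "f100%": (3.0, 0.8),
    "f75%": (2.25, 0.65),
    "f50%": (1.5, 0.5),
    "f25%": (0.75, 0.35),
}


def relative_dynamic_power(level: str, baseline: str = "f100%") -> float:
    """Dynamic power of `level` relative to `baseline`, assuming P ~ V^2 * f."""
    freq, volt = LEVELS[level]
    base_freq, base_volt = LEVELS[baseline]
    return (volt ** 2 * freq) / (base_volt ** 2 * base_freq)


relative = {name: round(relative_dynamic_power(name), 3) for name in LEVELS}
```

Under this assumption, f75% draws roughly half of the baseline dynamic power, f50% about a fifth, and f25% about a twentieth, which is why lowering the NoC frequency during low-traffic epochs is attractive.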

Page 29:

Evaluated schemes

• Schemes with fixed/static NoC frequency:
1. f100% (baseline)
2. f75%
3. f50%
4. f25%

• Epoch-based DVFS schemes (adaptive architectures):
1. f-DVFS dyn (run-time DVFS)
2. f-DVFS stat (off-line predefined DVFS settings)

• Best frequency: the one that minimizes the Energy x Delay (ED) product.
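The "best frequency" rule can be sketched as a simple minimization over the candidate levels. The function name `best_frequency` and the per-epoch numbers below are hypothetical; the numbers are invented purely to show the selection and are not results from the talk.

```python
from typing import Dict, Tuple


def best_frequency(measurements: Dict[str, Tuple[float, float]]) -> str:
    """Return the level whose energy x delay product is smallest.

    `measurements` maps a level name to (energy, delay) for one epoch,
    in any consistent units.
    """
    return min(
        measurements,
        key=lambda level: measurements[level][0] * measurements[level][1],
    )


# Hypothetical numbers for a single epoch, normalized to f100%.
# They are made up for illustration and are not results from the talk.
epoch_measurements = {
    "f100%": (1.00, 1.00),
    "f75%": (0.80, 1.05),
    "f50%": (0.70, 1.30),
    "f25%": (0.65, 2.10),
}
choice = best_frequency(epoch_measurements)
```

With these invented values the ED products are 1.0, 0.84, 0.91, and 1.365, so f75% would be selected for this epoch. The lowest frequency saves the most energy but loses on the product because of the added delay.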

Page 34:

Case study: Results

[Figure: two bar charts for bodytrack, streamcluster, barnes, and the average, comparing f-75%, f-50%, f-25%, f-DVFS dyn, and f-DVFS stat. One chart shows % energy savings (axis from -25 to 25, with two off-scale bars annotated -38.5 and -83.2). The other shows % slowdown (axis from 0 to 40).]

Run-time epoch-based DVFS: 12.5% energy savings for 2.7% slowdown.
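As a back-of-the-envelope check, the two headline numbers can be combined into an energy x delay figure. The sketch assumes both percentages are relative to the same f100% baseline and multiplies them directly. Because they appear to be averages, the product of the averages need not equal the average of per-benchmark ED products, so this is only an approximation.

```python
energy_savings = 0.125   # 12.5% energy savings, from the slide
slowdown = 0.027         # 2.7% slowdown, from the slide

relative_energy = 1.0 - energy_savings       # energy relative to baseline
relative_delay = 1.0 + slowdown              # execution time relative to baseline
relative_ed = relative_energy * relative_delay
ed_reduction_percent = round((1.0 - relative_ed) * 100, 1)
```

Under these assumptions the ED product drops to about 0.90 of the baseline, a reduction of roughly 10%.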

Page 35:

Case study: Results

Epoch-based dynamic schemes outperform all static schemes.

[Figure: ED improvement (axis from 0 to 1.4) per benchmark (bodytrack, fluidanimate, streamcluster, barnes, fmm, ocean, radiosity, water-ns, average) for f-75%, f-50%, f-25%, f-DVFS dyn, and f-DVFS stat.]

Page 37:

Summary

Program-defined epochs represent the repetitive and varying behavior of multithreaded programs well.

BarrierWatch is a prominent method for effective run-time management in CMPs.

Desirable properties:
1. Simple and lightweight.
2. Effective at run time.
3. Independent of the underlying architecture.
4. Well suited for parallel applications.

Page 38:

Thank you!

Computer Frontiers 2011, Ischia, Italy.