Post on 19-Jan-2016
BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs
Socrates Demetriades and
Sangyeun Cho
Computing Frontiers 2011, Ischia, Italy.
Program’s time-varying behavior.
[Figure: NoC traffic over time, Bodytrack, 16-thread parallel execution]
Challenge: How to detect behavioral changes?
Adaptive CMP architectures can take advantage of this time-varying behavior.
Tracking program behavior
Traditionally, two methods for tracking program phases:
1. Run-time monitoring of the program execution.
 – Observations are limited by the monitoring metric.
 – Cost of monitoring mechanisms.
 – Granularity of monitoring intervals: fine- vs. coarse-grain?
2. Profile-based analysis.
 – Static program analysis, complicated algorithms.
 – Binary rewriting.
 – Architectural support.
Code-based metrics: not directly suitable for parallel workloads.
Overview of our proposal
Track the program behavior at run time.
Effective
Simple
Low-cost
View the program execution at ‘epoch’ granularity.
Outline
Introduction
Program epochs and characterization
Run-time epoch change detection.
Case study
Summary
Observation / Motivation
• Natural alignment of barriers with the changes in program behavior.
• Intervals enclosed by barriers repeat with consistent behavior.
[Figure: two plots of NoC traffic over time]
Program epochs
• Epoch: An execution interval between two consecutive barriers.
[Figure: NoC traffic over time, annotated with barriers (A, B, C, D) and the epochs they delimit]
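The epoch definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the trace format (time-stamped samples of a metric such as NoC traffic) and the names `split_into_epochs`, `samples`, and `barrier_times` are assumptions made for the example.

```python
from bisect import bisect_right

def split_into_epochs(samples, barrier_times):
    """Group (time, value) samples into epochs.

    An epoch is the interval between two consecutive barriers, so
    N barriers delimit N - 1 epochs. Samples before the first barrier
    or at/after the last one belong to no epoch and are dropped.
    """
    barrier_times = sorted(barrier_times)
    epochs = [[] for _ in range(max(len(barrier_times) - 1, 0))]
    for t, value in samples:
        # Index of the last barrier at or before time t.
        i = bisect_right(barrier_times, t) - 1
        if 0 <= i < len(epochs):
            epochs[i].append(value)
    return epochs

# Hypothetical trace: one sample per time unit, barriers at t = 0, 3, 5.
samples = [(0, 10), (1, 12), (2, 11), (3, 40), (4, 42), (5, 9)]
epochs = split_into_epochs(samples, [0, 3, 5])
```

With this made-up trace the first epoch collects the three low-traffic samples and the second collects the two high-traffic ones, which mirrors the alignment between barriers and behavior changes that the slides describe.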
Epochs’ effectiveness: characterization
Are epochs effective in characterizing the variability of program behavior?
How similar is program behavior among the different dynamic instances of the same epoch?
How different is the behavior across different epochs?
How does the program behave within the epochs?
Characterization across epochs.
[Figure: NoC traffic per epoch]
Error bars: variability across the dynamic instances of an epoch (LOW variability).
Dispersion across points: variability across different epochs (HIGH variability).
Characterization across epochs.
• Fundamental correlation between epoch boundaries and changes in program behavior.
• High predictability of behavior across epoch instances.
[Figure: per-epoch NoC traffic, L2 miss ratio, global IPC, and C2C transfers]
Low variability across instances of an epoch. High variability across different epochs.
Characterization across epochs.
Low variability across instances of an epoch. High variability across different epochs.
Ratio = (variability across instances of an epoch) / (variability across different epochs)
The smaller the ratio,
• the sharper the behavioral shifts on epoch boundaries
• the more predictable the program behavior across repeating epoch instances.
PARSEC and SPLASH2 programs.
[Figure: variation ratio (axis 0.0 to 1.0) of global IPC, L2 miss ratio, traffic volume, and C2C-transfer hit ratio for bodytrack, fluidanimate, streamcluster, barnes, fmm, lu, ocean, radiosity, water-ns]
Less than 0.2 for most benchmarks.
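One way such a variation ratio could be computed is sketched below. The slides do not state the exact variability measure, so this example assumes population standard deviation: the numerator averages the per-epoch spread across dynamic instances, and the denominator is the spread of the per-epoch means. The function name and input layout are hypothetical.

```python
from statistics import mean, pstdev

def variation_ratio(instances_by_epoch):
    """instances_by_epoch maps an epoch label to the metric values
    observed in each dynamic instance of that epoch."""
    groups = [list(values) for values in instances_by_epoch.values()]
    # Variability across instances of the same epoch, averaged over epochs.
    within = mean([pstdev(values) for values in groups])
    # Variability across different epochs: spread of the per-epoch means.
    across = pstdev([mean(values) for values in groups])
    if across == 0:
        raise ValueError("all epochs have the same mean; ratio is undefined")
    return within / across

# Hypothetical measurements: two epochs, two dynamic instances each.
ratio = variation_ratio({"AB": [9, 11], "BA": [29, 31]})
```

A small ratio means instances of one epoch resemble each other far more than different epochs do, which is the property the slides rely on.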
Characterization within epochs.
• Epochs may exhibit stable or other behavioral patterns within their boundaries.
• Internal behavior patterns reoccur and thus can be accurately predicted.
[Figure: examples of stable, unstable, and multiphase epochs]
Characterization within epochs.
[Figure: share of stable, unstable, and multi-phase epochs (0% to 100%) per benchmark: bodytrack, fluidanimate, streamcluster, barnes, fmm, lu, ocean, radiosity, water-ns, and the average]
• Most epochs exhibit stable behavior within their boundaries.
• Close relation to the classic definition of a program phase.
• Reoccurring internal patterns can be predictable.
Epoch characterization summary
Epochs repeat in a consistent and predictable way, providing a reliable granularity for capturing the cyclic pattern of program behavior.
Epoch boundaries are likely to naturally indicate changes in program behavior.
Most epochs exhibit stable behavior within their boundaries, or other reoccurring, predictable patterns.
Epochs: Advantages
Independent from the underlying architecture.
Naturally adopt variable-length intervals.
Deterministic boundaries (global sync points).
Barriers can be easily captured at run time.
Many multithreaded workloads are written with barrier synchronizations.
Run-time epoch change detection.
Application’s source code: ... barrier_wait(barrier) ... barrier_wait(barrier) ...
[Figure: barriers A and B in the application’s instruction stream are looked up in an Epoch Table (fields: epoch ID, decision signature, F bit) that feeds the reconfiguration units; successive animation steps show entries marked T and Config AB]
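The table-driven detection can be illustrated with a toy software model. Everything below is an assumption made for the sketch, not the hardware described in the talk: an epoch is identified by the barrier that opens it, an epoch seen for the first time runs with a default configuration while it is observed, and the `decide` callback stands in for whatever produces the decision signature. The F bit is not modeled.

```python
class EpochTable:
    """Toy model of an epoch table (hypothetical design)."""

    def __init__(self, default_config, decide):
        self.default_config = default_config
        self.decide = decide   # maps measured stats -> configuration
        self.entries = {}      # epoch ID -> stored configuration
        self.current = None    # ID of the epoch now executing

    def on_barrier(self, barrier_id, stats=None):
        """Handle a barrier crossing; stats describe the epoch that just ended."""
        # Close the finished epoch: if it had no entry yet (it ran in
        # training mode), record a decision for its next occurrence.
        if self.current is not None and self.current not in self.entries:
            self.entries[self.current] = self.decide(stats)
        # Open the next epoch, identified here by its opening barrier.
        self.current = barrier_id
        return self.entries.get(barrier_id, self.default_config)

# Hypothetical policy and barrier trace.
table = EpochTable(
    "f100%",
    decide=lambda stats: "f50%" if stats["traffic"] < 0.3 else "f100%",
)
trace = [("A", None), ("B", {"traffic": 0.1}),
         ("A", {"traffic": 0.9}), ("B", {"traffic": 0.1})]
configs = [table.on_barrier(barrier, stats) for barrier, stats in trace]
```

In this model the first pass through each epoch is spent observing, and the stored configuration is applied from the second occurrence onward, so the benefit depends on epochs repeating.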
Case study: Overview
Purpose: Demonstrate the applicability of the BarrierWatch approach in the context of dynamic adaptation.
Goal: Optimize energy/performance trade-off in a CMP architecture using BarrierWatch.
Adaptation technique: DVFS applied to the NoC at epoch granularity.
Experimental methodology
Benchmarks: From the PARSEC and SPLASH2 suites (pthread).
Architectural model:
Full-system simulator (Simics) augmented with a cycle-accurate memory hierarchy model.
Tile-based CMP model / 16 in-order cores / 2-issue width.
Shared, physically distributed L2 cache. Mesh NoC, x-y routing.
Two-stage router pipeline, buffer size 2 per VC.
On-Chip DVFS
On-chip power consumption model: NoC power + background power.
NoC voltage/frequency levels:
Frequency (GHz)   Voltage (V)   Alias
3                 0.8           f100%
2.25              0.65          f75%
1.5               0.5           f50%
0.75              0.35          f25%
Evaluated schemes
• Schemes with fixed/static NoC frequency.
 1. f100% (baseline)
 2. f75%
 3. f50%
 4. f25%
• Epoch-based DVFS schemes (adaptive architectures).
 1. f-DVFS dyn (run-time DVFS)
 2. f-DVFS stat (off-line predefined DVFS settings)
• Best frequency: the one that minimizes the Energy x Delay (ED) product.
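The "best frequency" criterion reduces to an argmin over the available levels. The sketch below assumes per-epoch energy and delay have already been measured at each level; the numbers are invented for illustration and are not results from the talk, and `best_frequency` is a hypothetical helper.

```python
def best_frequency(measurements):
    """Return the frequency alias with the smallest energy x delay product.

    measurements maps a frequency alias to an (energy, delay) pair.
    """
    return min(measurements,
               key=lambda f: measurements[f][0] * measurements[f][1])

# Hypothetical (energy, delay) pairs for one epoch, normalized to f100%.
epoch_profile = {
    "f100%": (1.00, 1.00),
    "f75%":  (0.80, 1.05),
    "f50%":  (0.70, 1.30),
    "f25%":  (0.65, 1.90),
}
choice = best_frequency(epoch_profile)
```

For this made-up profile the ED products are 1.00, 0.84, 0.91, and 1.235, so f75% is selected: lowering the frequency further saves energy but the added delay outweighs it.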
Case study: Results
[Figure: % energy savings (axis -25 to 25; two off-scale bars labeled -38.5 and -83.2) and % slowdown (axis 0 to 40) for bodytrack, streamcluster, barnes, and the average, under f-75%, f-50%, f-25%, f-DVFS dyn, and f-DVFS stat]
Run-time epoch-based DVFS: 12.5% energy savings for a 2.7% slowdown.
Case study: Results
Epoch-based dynamic schemes outperform all static schemes.
[Figure: ED improvement (axis 0 to 1.4) for bodytrack, fluidanimate, streamcluster, barnes, fmm, ocean, radiosity, water-ns, and the average, under f-75%, f-50%, f-25%, f-DVFS dyn, and f-DVFS stat]
Summary
Program-defined epochs capture the repetitive and varying behavior of multithreaded programs well.
BarrierWatch: a prominent method for effective run-time management in CMPs.
Desirable properties:
1. Simple and lightweight.
2. Effective at run time.
3. Independent of the underlying architecture.
4. Well suited for parallel applications.
Thank you!