SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS Sukhdeep Sodhi Microsoft Corp Jaspal Subhlok University of Houston

SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS

Sukhdeep Sodhi

Microsoft Corp

Jaspal Subhlok

University of Houston

Resource Selection for Network/Grid Applications

[Figure: a distributed application (components Data, Sim 1, GUI, Model, Pre, Stream) to be placed on a network. Question posed: where is the best performance?]

Current approaches to Node Selection

1. Measure and model network properties, such as available bandwidth and CPU loads (with tools like NWS)

2. Find “best” nodes for execution based on network status

But expected application performance based on measured resource status may not be accurate:

• depends on application characteristics, which are hard to model

• translation is needed, e.g., unused bandwidth vs. expected throughput

• data may be stale, as frequent measurements are expensive

Our Approach

Predict application performance by running a small program that is representative of the actual distributed application.

Performance Skeleton

A performance skeleton is a synthetic, short-running program whose execution characteristics mirror those of the application it represents.

An application and its skeleton have similar

• communication pattern

• CPU usage

• memory usage

• synchronization pattern

Goal: The performance of a skeleton is directly related to the performance of the application under any condition

• e.g., a skeleton executes in 0.1% of the time the application takes to execute on any part of a shared network
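
As a minimal sketch of how this goal could be used for resource selection: run the skeleton on each candidate node set, pick the fastest, and scale its run time back up by the known ratio. The function names, node-set labels, and timings below are hypothetical and only illustrate the arithmetic.

```python
def predict_app_time(skeleton_seconds: float, scale_k: float) -> float:
    """Scale a measured skeleton run time up to a predicted application run time."""
    return skeleton_seconds * scale_k


def pick_best_nodes(skeleton_times: dict, scale_k: float) -> tuple:
    """Return the candidate with the shortest skeleton run and its predicted app time."""
    best = min(skeleton_times, key=skeleton_times.get)
    return best, predict_app_time(skeleton_times[best], scale_k)


# Hypothetical skeleton run times (seconds) on three candidate node sets.
measured = {"nodes_A": 0.42, "nodes_B": 0.31, "nodes_C": 0.55}

# A skeleton that runs in 0.1% of the application time corresponds to K = 1000.
best, predicted = pick_best_nodes(measured, scale_k=1000)
print(best, round(predicted, 1))
```

The selection only needs the relative ordering of skeleton times; the scale factor matters when an absolute run-time estimate is wanted.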

Central Contribution of This Paper

A framework for automatic construction of performance skeletons

[Figure: an application (components Data, Sim 1, GUI, Model, Pre, Stream) passes through a CREATE SKELETON step to yield a skeleton with the same components]

Automatic Construction of Skeletons

1. Record execution trace

2. Compress execution trace into execution signature

3. Construct skeleton program from execution signature

Recording of Execution Trace

• Implemented for MPI applications

• Link MPI application with PMPI-based profiling library

– no source code modification / analysis required

• Execute on a dedicated testbed

• Records all MPI function calls

– Call name, start time, stop time, parameters passed

– Timing done to microsecond granularity

• CPU busy = time between two consecutive MPI calls
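
The last bullet can be illustrated with a short Python sketch. It does not perform any MPI interception (in the real system that is the job of the PMPI wrapper library); it only shows one plausible shape for the recorded entries and how compute time could be derived from them. The `MpiCall` record, its field names, and the sample timings are assumptions, as is the reading of "between two consecutive calls" as stop-of-one to start-of-next.

```python
from dataclasses import dataclass, field


@dataclass
class MpiCall:
    """One trace record: call name, start/stop timestamps in microseconds, parameters."""
    name: str
    start_us: int
    stop_us: int
    params: dict = field(default_factory=dict)


def cpu_busy_intervals(trace):
    """Gap between the end of each MPI call and the start of the next one."""
    return [nxt.start_us - cur.stop_us for cur, nxt in zip(trace, trace[1:])]


# Hypothetical trace of one process.
trace = [
    MpiCall("MPI_Send", 0, 120, {"bytes": 4096, "dest": 1}),
    MpiCall("MPI_Recv", 900, 1100, {"bytes": 4096, "source": 1}),
    MpiCall("MPI_Barrier", 1500, 1540),
]
busy = cpu_busy_intervals(trace)
print(busy)
```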

Generation of Execution Signature …1

Application execution typically follows cyclic patterns

Goal: Determine cyclic patterns and form loop structure by identifying repeating execution behavior.

– Repeating patterns should be broadly similar

Step 1: Execution trace to symbol strings

– Cluster similar execution events

• Replace all events in cluster by average event

– Each cluster is then assigned a unique symbol

– Execution trace is replaced by string of symbols:

,,,,,,,,,,, , ,,, , ,,, …
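
Step 1 can be sketched in Python as follows. This is an illustrative greedy clustering, not the authors' method: an event joins the first cluster of the same kind whose running average is within a relative tolerance, otherwise it starts a new cluster. The event representation `(kind, value)`, the `tolerance` parameter, and the use of lowercase letters as symbols are all assumptions.

```python
from string import ascii_lowercase


def to_symbols(events, tolerance):
    """Map (kind, value) events to a symbol string plus one average event per symbol."""
    clusters = []  # each cluster: {"kind", "values", "symbol"}
    symbols = []
    for kind, value in events:
        for cluster in clusters:
            mean = sum(cluster["values"]) / len(cluster["values"])
            if cluster["kind"] == kind and abs(value - mean) <= tolerance * mean:
                cluster["values"].append(value)
                symbols.append(cluster["symbol"])
                break
        else:
            # No similar cluster found: open a new one (this sketch supports 26 at most).
            symbol = ascii_lowercase[len(clusters)]
            clusters.append({"kind": kind, "values": [value], "symbol": symbol})
            symbols.append(symbol)
    averages = {
        c["symbol"]: (c["kind"], sum(c["values"]) / len(c["values"])) for c in clusters
    }
    return "".join(symbols), averages


# Hypothetical events: compute durations in microseconds, message sizes in bytes.
events = [("compute", 100), ("send", 4096), ("compute", 104),
          ("send", 4096), ("compute", 98), ("send", 8192)]
symbol_string, averages = to_symbols(events, tolerance=0.1)
print(symbol_string)
```

Raising `tolerance` merges more events into the same cluster, which yields fewer distinct symbols and therefore more repetition for the later compression step to exploit.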

Generation of Execution Signature …2

Step 2: Compress string by identifying cycles

– Similar to longest substring matching problem

– Algorithm builds loop structure recursively from symbol strings

e.g. ,,,,,,,,,,, , ,,, , ,,, is replaced by

[,,]4, [,[]2,]2

– Typically signature is multiple orders of magnitude smaller than trace

Step 3: Adaptively increase degree of clustering

– until signature is compact enough
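
A rough Python sketch of the cycle compression in Step 2 is given below. It is a simple greedy heuristic written for illustration and should not be read as the algorithm used in the paper: at each position it looks for the repeating unit that covers the most symbols, emits it as `[body]count`, and compresses the body recursively. The function name and the input string are made up.

```python
def compress(s: str) -> str:
    """Greedily rewrite consecutive repeats in s as nested [body]count loops."""
    out = []
    i, n = 0, len(s)
    while i < n:
        best_p, best_count = 0, 1
        # Try every candidate period that could repeat at least twice from position i.
        for p in range(1, (n - i) // 2 + 1):
            unit = s[i:i + p]
            count = 1
            while s[i + count * p:i + (count + 1) * p] == unit:
                count += 1
            # Keep the period covering the most symbols; ties go to the shorter period.
            if count >= 2 and count * p > best_p * best_count:
                best_p, best_count = p, count
        if best_p:
            body = compress(s[i:i + best_p])  # the loop body may contain inner loops
            out.append(f"[{body}]{best_count}")
            i += best_p * best_count
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


signature = compress("abcabcabcabcdeefdeef")
print(signature)
```

For Step 3, one could wrap this in a loop that re-runs the clustering with a looser similarity threshold until the compressed string falls below a target length.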

Generate Performance Skeleton Program

Goal: Execution time of performance skeleton should be a fixed factor K less than application execution time

Reduce iterations of each loop by a factor K

– Add remainder iterations to events outside of all loops

Process events outside loops as follows:

– Reduce execution time of compute operations by a factor K

– Reduce execution time of message exchanges by reducing bytes exchanged by a factor K

• Communication operations not scaled linearly due to latency.

• Considering latency would make approach architecture-specific

Replace symbols by C language statements
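
The scaling rules above can be sketched in Python. The `Loop` class, the `(kind, value)` event tuples, and the helper names are assumptions made for this example; it scales only outermost loops (how nested loops are treated is not stated on the slide) and it stops short of emitting C statements.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    body: list   # events or nested Loop objects
    count: int   # number of iterations


def flatten(items):
    """Expand loops into a flat sequence of (kind, value) events."""
    for item in items:
        if isinstance(item, Loop):
            for _ in range(item.count):
                yield from flatten(item.body)
        else:
            yield item


def scale_event(event, k):
    """Divide compute time or message bytes by K."""
    kind, value = event
    return (kind, value / k)


def scale_signature(signature, k):
    skeleton = []
    for item in signature:
        if isinstance(item, Loop):
            iterations, remainder = divmod(item.count, k)
            if iterations:
                skeleton.append(Loop(item.body, iterations))
            # Remainder iterations become events outside all loops, scaled like any other.
            for _ in range(remainder):
                skeleton.extend(scale_event(e, k) for e in flatten(item.body))
        else:
            skeleton.append(scale_event(item, k))
    return skeleton


def total(items, kind):
    """Sum the values of all events of one kind, counting loop iterations."""
    result = 0.0
    for item in items:
        if isinstance(item, Loop):
            result += item.count * total(item.body, kind)
        elif item[0] == kind:
            result += item[1]
    return result


# Hypothetical signature: compute in microseconds, sends in bytes.
signature = [("compute", 500.0), Loop([("compute", 100.0), ("send", 4000.0)], 25)]
skeleton = scale_signature(signature, k=10)
print(total(signature, "compute"), total(skeleton, "compute"))
```

In this example the total compute time and total bytes both shrink by exactly K. As the slide notes, message time will not shrink by exactly K because per-message latency does not scale with size.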

Experimental Validation

Skeletons constructed for Class B NAS MPI benchmarks are executed in the following sharing scenarios

• Competing processes on one node

• Competing processes on all nodes

• Competing traffic on one link

• Competing traffic on all links

• Competing process and traffic on one node and link

Skeleton execution time is used to predict application execution time.

Setup: Intel Xeon dual CPU 1.7 GHz nodes running Linux 2.4.7. Gigabit crossbar switch. iproute to simulate link sharing

Prediction Accuracy

Graph shows error between predicted and measured application execution time

Skeleton execution time is 1/10th of application execution time

Average error: 6%; maximum error: 18%

Error is higher for scenarios with competing traffic
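
The error metric is not defined on the slide. A common choice, assumed here, is the absolute difference between predicted and measured time, relative to the measured time. The Python sketch below uses hypothetical timings and takes K = 10 to match the 1/10th ratio stated above.

```python
def prediction_error_pct(skeleton_s: float, scale_k: float, measured_app_s: float) -> float:
    """Percentage error of the skeleton-based prediction against the measured run time."""
    predicted = skeleton_s * scale_k
    return abs(predicted - measured_app_s) / measured_app_s * 100.0


# Hypothetical run under sharing: skeleton took 12.6 s, application took 120 s.
error = prediction_error_pct(skeleton_s=12.6, scale_k=10, measured_app_s=120.0)
print(round(error, 1))
```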

Comparison with other methods

[Figure: bar chart of percentage error (y-axis, 0 to 180) for benchmarks BT, CG, IS, LU, MG, SP, and their average (x-axis), comparing three methods: Performance Skeleton, Average Prediction, Class S]

Average Prediction: The average slowdown across the entire benchmark suite is used to predict execution time for each program.

Class S Prediction: Class S benchmark programs (~1 s) are used as skeletons for the Class B benchmarks (30-900 s).
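
The Average Prediction baseline can be sketched in Python under one reading of the description: compute each benchmark's slowdown (shared time over dedicated time), average those, and apply the single average to every benchmark. The function name and all timings are hypothetical.

```python
def average_prediction(dedicated: dict, shared: dict) -> dict:
    """Predict each program's shared-network time from the suite-wide average slowdown."""
    slowdowns = [shared[name] / dedicated[name] for name in dedicated]
    average_slowdown = sum(slowdowns) / len(slowdowns)
    return {name: dedicated[name] * average_slowdown for name in dedicated}


# Hypothetical run times in seconds.
dedicated = {"BT": 100.0, "CG": 50.0, "IS": 20.0}
shared = {"BT": 150.0, "CG": 100.0, "IS": 50.0}

predicted = average_prediction(dedicated, shared)
print(predicted)
```

Because one slowdown is applied to every program, this baseline cannot capture that a communication-heavy benchmark suffers more from competing traffic than a compute-heavy one, which is the gap a per-application skeleton is meant to close.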

Preliminary Conclusions

Performance estimation with skeleton has high accuracy

Need to incorporate memory access patterns and fine grain CPU behavior for execution across architectures

Implementation limited to MPI applications

– basic approach should work for other paradigms

Skeletons may have other uses as a fast way of estimating application performance

– e.g. on a slow simulated future system