Programs Performance Analysis Toolkit Adaptor

DESCRIPTION

The Adaptor framework automates experimentation, data collection, and analysis in the field of program performance and tuning. It can be used, for example, for estimating computer system performance during its design, or for finding optimal compiler settings by methods of iterative compilation and machine learning-driven techniques.

Contact information: Michael K. Pankov • [email protected] • michaelpankov.com
Source on GitHub: https://github.com/constantius9/adaptor

This is an extended and edited version of my diploma defense keynote, given on June 19, 2013.

TRANSCRIPT
Introduction
Methodology
Implementation (general info)
Implementation (client)
Evaluation of implementation
Programs Performance Analysis Toolkit Adaptor
Michael K. Pankov
Advisor: Anatoly P. Karpenko
Bauman Moscow State Technical University
October 11, 2013
Goal, tasks, and importance of the work
Goal
Develop a method and software toolkit for modeling of program performance on general-purpose computers

Tasks

1 Develop a method of program performance modeling
2 Implement the performance analysis & modeling toolkit
3 Study the efficiency of the toolkit on a set of benchmarks

Importance

1 Estimation of computer performance during its design
2 Search for optimal compiler settings by methods of iterative compilation and machine learning-driven techniques
Overview
A lot of recent research: see C. Dubach, G. Fursin, B. C. Lee, W. Wu
In particular, there is the cTuning public repository for research and the corresponding Collective Mind program, run by G. Fursin
This work is about modeling the performance of general-purpose computer programs, with feature ranking by means of Earth Importance and regression by means of k-Nearest Neighbors and Earth Regression. We try to accomplish automatic detection of relevant features.
Method of statistical program performance analysis: Velocitas
1 Perform a series of experiments measuring program execution time and form a set U:

U = {(X_i, y_i)}, X_i = (x_ij), i ∈ [1; m], j ∈ [1; n]

X_i — feature vector (CPU frequency, number of rows of the processed matrix, etc.), y_i — response (execution time), m — number of experiments, n — number of features
2 Split the set U into a training sample D and a test sample C by randomly assigning 70% of the experiments to D:

D = {d_i | f_rand(d_i) > 0.3},  (1)
d_i = (X_i, y_i),  (2)
f_rand(d) ∈ [0; 1],  (3)
i ∈ [1; m],  (4)
C = U \ D  (5)
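Step 2 can be sketched in Python, the language of the Adaptor client. This is a minimal illustration, not the framework's actual code; `split_dataset` and the toy `U` are hypothetical names:

```python
import random

def split_dataset(U, seed=None):
    """Randomly assign ~70% of experiments to the training sample D;
    the rest form the test sample C = U \\ D."""
    rng = random.Random(seed)
    D, C = [], []
    for d in U:
        # f_rand(d) is uniform on [0; 1]; d goes to D when f_rand(d) > 0.3
        (D if rng.random() > 0.3 else C).append(d)
    return D, C

# U is a list of (X_i, y_i) pairs: feature vector and measured time
U = [((i, 2 * i), float(i)) for i in range(100)]
D, C = split_dataset(U, seed=42)
```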
3 Extract additional features x_ik:

x_ik = f(X_i),  (6)
X'_i = (x_ij),  (7)
D' = {(X'_i, y_i)},  (8)
i ∈ [1; m],  (9)
j ∈ [1; n + r],  (10)
k ∈ [n + 1; n + r]  (11)

r — number of additional features (e.g. "size of input data")
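A minimal sketch of the feature-extraction step; `extract_features` and `size_of_input` are illustrative names, not part of the toolkit:

```python
def extract_features(D, extractors):
    """Append r derived features x_{i,n+1}..x_{i,n+r} to each feature
    vector X_i; `extractors` is a list of functions f(X_i) -> scalar."""
    return [(X + tuple(f(X) for f in extractors), y) for X, y in D]

# Hypothetical derived feature: "size of input data" as width * height
size_of_input = lambda X: X[0] * X[1]

D = [((3, 4), 1.0), ((5, 6), 2.0)]
D_prime = extract_features(D, [size_of_input])
```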
4 Filter the training set D' to remove noise and incorrect measurements:

D'' = D' \ {(X'_i, y_i) | P(X'_i, y_i)}

P — experiment selection predicate (we remove all experiments where the measured execution time is less than t_min)
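The filtering predicate can be sketched as follows; `filter_experiments` is an illustrative name, and the t_min value here is arbitrary:

```python
def filter_experiments(D, t_min):
    """Predicate P: drop experiments whose measured execution time y_i
    is below t_min (likely noise or an incorrect measurement)."""
    return [(X, y) for X, y in D if y >= t_min]

D = [((1,), 0.0001), ((2,), 0.5), ((3,), 1.2)]
D_filtered = filter_experiments(D, t_min=0.001)
```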
5 Rank the features and select only those with non-zero importance:

s_j = f_rank(D''),  (12)
j ∈ [1; n],  (13)
D''' = {(X'_i, y_i) | s_j > 0}  (14)

s_j — scalar importance of a particular feature, f_rank — feature ranking function (we used MSE, ReliefF, and Earth Importance)
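The selection of non-zero-importance features (14) can be sketched as below; `select_features` is an illustrative name, and the scores shown are examples in the spirit of the Earth Importance values reported later:

```python
def select_features(D, scores):
    """Keep only the feature columns whose importance score s_j > 0."""
    keep = [j for j, s in enumerate(scores) if s > 0]
    return [(tuple(X[j] for j in keep), y) for X, y in D]

# Per-feature scores as produced by a ranking function such as Earth Importance
scores = [4.9, 3.3, 0.0]
D = [((1, 2, 3), 0.5), ((4, 5, 6), 1.5)]
D_selected = select_features(D, scores)
```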
6 Fit a regression model of one of 4 kinds (linear, random forest, Earth, k nearest neighbors):

M_p = {f_pred, B},  (15)
B = f_fit(D'''), p ∈ [1; 4]  (16)

B — vector of model parameters, f_fit — learning function, f_pred — prediction function (they are defined separately for each model)
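As an illustration of the {f_pred, B} pair, here is a toy k Nearest Neighbors regressor. The actual toolkit uses the Orange framework's implementations; `knn_fit` and `knn_predict` are hypothetical names:

```python
def knn_fit(D, k=3):
    """For k Nearest Neighbors, 'fitting' (f_fit) just stores the
    training sample; the parameters B are the stored sample plus k."""
    return {"sample": list(D), "k": k}

def knn_predict(B, X):
    """f_pred: mean response of the k training points nearest to X
    (Euclidean distance over the feature vector)."""
    dist = lambda A: sum((a - b) ** 2 for a, b in zip(A, X)) ** 0.5
    nearest = sorted(B["sample"], key=lambda d: dist(d[0]))[: B["k"]]
    return sum(y for _, y in nearest) / len(nearest)

D = [((1.0,), 1.0), ((2.0,), 2.0), ((3.0,), 3.0), ((10.0,), 10.0)]
B = knn_fit(D, k=3)
```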
7 Test the model using the RRSE metric:

C = U \ D  (17)
  = {(X'_i, y_i)},  (18)
i ∈ [1; m],  (19)
X'_i = (x_ik),  (20)
k ∈ [1; n + r],  (21)
ŷ_i = f_pred(X'_i, B),  (22)

RRSE = √( Σ_{i=1}^{m} (ŷ_i − y_i)² / Σ_{i=1}^{m} (y_i − ȳ)² )  (23)

ŷ_i — predicted value of the response, ȳ — average value of the response in the test sample
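Equation (23) translates directly into Python; `rrse` is an illustrative name:

```python
def rrse(y_true, y_pred):
    """Root Relative Squared Error: the residual error of the model
    relative to that of the trivial mean predictor."""
    y_mean = sum(y_true) / len(y_true)
    num = sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred))
    den = sum((yt - y_mean) ** 2 for yt in y_true)
    return (num / den) ** 0.5
```

A perfect model gives RRSE = 0; a model no better than predicting the mean gives RRSE = 1.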
Architecture of Adaptor Framework
Database server
Data views
Client
Database interaction module
Program building module
Experimentation module
Information retrieval module
Information analysis module
Technology stack
Database server
Distributed client-server document-oriented storage CouchDB
Cloud platform Cloudant
Client
Python
Statistical framework Orange
GNU/Linux on x86 platform
Database interaction module
Provides a high-level API for storing Python objects as database documents
Uses a local CouchDB server as a fall-back if the remote one isn't available
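The fall-back behaviour can be sketched as follows. This is an assumption-laden illustration: `connect` and `open_server` are hypothetical names, and the URLs are placeholders; the real module talks to CouchDB (e.g. via a client library such as python-couchdb):

```python
def connect(urls, open_server):
    """Return a handle to the first server that responds.
    `open_server` is any function that connects to a URL or raises
    on failure (e.g. a CouchDB client library's Server constructor)."""
    for url in urls:
        try:
            return open_server(url)
        except Exception:
            continue  # server unavailable: fall through to the next URL
    raise RuntimeError("no CouchDB server available")

# Remote Cloudant instance first, local CouchDB as fall-back (placeholder URLs)
urls = ["https://example.cloudant.com/", "http://127.0.0.1:5984/"]
```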
Program building module
Manages paths to source files of experimental programs
Sources are kept in a hierarchical directory structure; specifying only the name of the program to build is enough for its sources to be found
Manages build tools andtheir settings
Experimentation module
Calibrates the program execution time measurement before every series of runs
Subtracts the execution time of the "simplest" program to avoid systematic error
Runs the program being studied until the relative dispersion of the time measurements becomes sufficiently low (d_rel < 5%)
Passes experiment data to the database interaction module
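The calibration and repetition logic can be sketched as below; `measure` and `time_once` are illustrative names, and the exact thresholds and run counts of the real module may differ:

```python
import time

def time_once(f):
    """Wall-clock time of a single call to f."""
    start = time.perf_counter()
    f()
    return time.perf_counter() - start

def measure(run, calibrate, max_runs=100, d_rel_max=0.05):
    """Subtract the execution time of the 'simplest' program (the
    `calibrate` callable) from each measurement, and repeat timed runs
    of `run` until their relative dispersion falls below d_rel_max."""
    baseline = min(time_once(calibrate) for _ in range(5))
    times = []
    while len(times) < max_runs:
        times.append(time_once(run) - baseline)
        if len(times) >= 3:
            mean = sum(times) / len(times)
            std = (sum((t - mean) ** 2 for t in times) / (len(times) - 1)) ** 0.5
            if mean > 0 and std / mean < d_rel_max:
                break
    return sum(times) / len(times)
```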
Information retrieval module
Collects information on the platform used and the experiment being carried out
CPU
  Frequency
  Cache size
  Instruction set extensions
  etc.
Compiler
Experiment
  Studied program
  Size of input data
Data analysis module
Receives data from the database and saves it to CSV files for input to the Orange statistical analysis system
Graphs results using the Python library matplotlib
Two groups of program performance models
  Simplest (1 feature)
  More complex (3-5 features)
Four regression models in both groups
  Linear
  k Nearest Neighbors
  Multivariate Adaptive Regression Splines
  Random Forest
Data analysis module (cont.)
Scheme of 40 data analysis components in the Orange system
  Reading in
  Preprocessing
  Filtering
  Feature extraction
  Feature ranking
  Predictor fitting
  Prediction results evaluation
  Saving predictions to CSV file
Platform
  Intel CPUs
    Core 2 Quad Q8200, 2.33 GHz, 2 MB cache
    Core i5 M460, 2.53 GHz, 3 MB cache
    Xeon E5430, 2.66 GHz, 6 MB cache
  Ubuntu 12.04, gcc and llvm compilers
Polybench/C 3.2 benchmark set, 28 programs in total
  Linear algebra, solution of systems of linear algebraic equations and ordinary differential equations
  Input data is generated by deterministic algorithms
Performance of chosen programs from the benchmark set is modeled using the Adaptor framework
  symm — multiplication of symmetric matrices; square matrices of dimensionality 2^i, i = f″_rand(1, 10)
  ludcmp — LU decomposition; square matrices of dimensionality f″_rand(2, 1024)
1000 experiments per CPU
Feature ranking: symm program

Attribute   Relief F   Mean Square Error   Earth Importance
size        0.268      0.573               4.9
cpu mhz     0.000      0.006               3.3
width       0.130      0.573               0.7
cpu cache   0.000      0.006               0.5
height      0.130      0.573               0.0
Earth Importance selected only relevant features
Feature ranking: symm program (cont.)

428 experiments
1 feature: matrix dimensionality

Model                 RMSE     RRSE    R²
k Nearest Neighbors   5.761    0.051   0.997
Random Forest         5.961    0.052   0.997
Linear Regression     15.869   0.139   0.981

Root Relative Squared Error of k Nearest Neighbors — approx. 5%
Resulting model of performance
k Nearest Neighbors model of performance of the symm program on the Intel Core 2 Quad Q8200 CPU
Resulting model of performance
Comparison of performance models of the ludcmp program

468 experiments
2 features: width of matrix, CPU frequency

Model                 RMSE    RRSE    R²
k Nearest Neighbors   1.093   0.048   0.998
Linear Regression     9.067   0.394   0.845
Where models fail
Amazon throttles its micro servers: the data is split into two "curves"
Earth Regression at least tries to follow the "main curve"
k Nearest Neighbors is much worse in this situation
Results of evaluation
Most suitable Feature Ranking method — Earth Importance
Most suitable Regression method — k Nearest Neighbors
Further work
The Velocitas method is promising and scales to larger feature sets
Data filtering to reduce noise can help it get even better
Orange is a decent statistical framework, but interactive work with it limits batch processing
For larger data sets and increased automation of the Adaptor framework, either its API or other libraries (e.g. sklearn) should be used
Custom research scenario support is required
It would be interesting to perform experiments on GPUs to study the effects of massively parallel execution
Thank you!
Contact information: Michael K. Pankov
This is an extended and edited version of my diploma defense keynote, given on June 19, 2013.