Programs Performance Analysis Toolkit Adaptor

DESCRIPTION

The Adaptor framework automates experimentation, data collection, and analysis in the field of program performance and tuning. It can be used, for example, for estimating computer system performance during its design, or for finding optimal compiler settings by methods of iterative compilation and machine learning-driven techniques.

Contact information: Michael K. Pankov • [email protected] • michaelpankov.com
Source on GitHub: https://github.com/constantius9/adaptor

This is an extended and edited version of my diploma defense keynote, given on June 19, 2013.

TRANSCRIPT
Introduction
Methodology
Implementation (general info)
Implementation (client)
Evaluation of implementation
Programs Performance Analysis Toolkit Adaptor
Michael K. Pankov
Advisor: Anatoly P. Karpenko
Bauman Moscow State Technical University
October 11, 2013
Goal, tasks, and importance of the work
Goal
Develop a method and software toolkit for modeling of program performance on general-purpose computers

Tasks

1 Develop a method of program performance modeling
2 Implement the performance analysis & modeling toolkit
3 Study the efficiency of the toolkit on a set of benchmarks

Importance

1 Estimation of computer performance during its design
2 Search for optimal compiler settings by methods of iterative compilation and machine learning-driven techniques
Overview
A lot of recent research: see C. Dubach, G. Fursin, B. C. Lee, W. Wu
In particular, there is the cTuning public repository for research and the corresponding Collective Mind program, run by G. Fursin
This work is about modeling the performance of general-purpose computer programs, with feature ranking by means of Earth Importance and regression by means of k-Nearest Neighbors and Earth Regression. We try to accomplish automatic detection of relevant features.
Method of statistical program performance analysis: Velocitas
1 Perform a series of experiments measuring program execution time and form a set U:

U = {(X_i, y_i)}, X_i = (x_ij), i ∈ [1; m], j ∈ [1; n]

X_i — feature vector (CPU frequency, number of rows of the processed matrix, etc.), y_i — response (execution time), m — number of experiments, n — number of features
2 Split the set U into a training sample D and a test sample C by randomly assigning 70% of the experiments to D:

D = {d_i | f_rand(d_i) > 0.3},  (1)
d_i = (X_i, y_i),  (2)
f_rand(d) ∈ [0; 1],  (3)
i ∈ [1; m],  (4)
C = U \ D  (5)
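Step 2 can be sketched in Python, the language of the Adaptor client. This is a minimal illustration, not the framework's actual code; `split_dataset` and the toy `U` are hypothetical names:

```python
import random

def split_dataset(U, seed=None):
    """Randomly assign ~70% of experiments to the training sample D;
    the rest form the test sample C = U \\ D."""
    rng = random.Random(seed)
    D, C = [], []
    for d in U:
        # f_rand(d) is uniform on [0; 1]; d goes to D when f_rand(d) > 0.3
        (D if rng.random() > 0.3 else C).append(d)
    return D, C

# U is a list of (X_i, y_i) pairs: feature vector and measured time
U = [((i, 2 * i), float(i)) for i in range(100)]
D, C = split_dataset(U, seed=42)
```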
3 Extract additional features x_ik:

x_ik = f(X_i),  (6)
X'_i = (x_ij),  (7)
D' = {(X'_i, y_i)},  (8)
i ∈ [1; m],  (9)
j ∈ [1; n + r],  (10)
k ∈ [n + 1; n + r]  (11)

r — number of additional features (e.g. "size of input data")
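A minimal sketch of the feature-extraction step; `extract_features` and `size_of_input` are illustrative names, not part of the toolkit:

```python
def extract_features(D, extractors):
    """Append r derived features x_{i,n+1}..x_{i,n+r} to each feature
    vector X_i; `extractors` is a list of functions f(X_i) -> scalar."""
    return [(X + tuple(f(X) for f in extractors), y) for X, y in D]

# Hypothetical derived feature: "size of input data" as width * height
size_of_input = lambda X: X[0] * X[1]

D = [((3, 4), 1.0), ((5, 6), 2.0)]
D_prime = extract_features(D, [size_of_input])
```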
4 Filter the training set D' to remove noise and incorrect measurements:

D'' = D' \ {(X'_i, y_i) | P(X'_i, y_i)}

P — experiment selection predicate (we remove all experiments where the measured execution time is less than t_min)
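The filtering predicate can be sketched as follows; `filter_experiments` is an illustrative name, and the t_min value here is arbitrary:

```python
def filter_experiments(D, t_min):
    """Predicate P: drop experiments whose measured execution time y_i
    is below t_min (likely noise or an incorrect measurement)."""
    return [(X, y) for X, y in D if y >= t_min]

D = [((1,), 0.0001), ((2,), 0.5), ((3,), 1.2)]
D_filtered = filter_experiments(D, t_min=0.001)
```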
5 Rank the features and select only those with non-zero importance:

s_j = f_rank(D''),  (12)
j ∈ [1; n],  (13)
D''' = {(X'_i, y_i) | s_j > 0}  (14)

s_j — scalar importance of a particular feature, f_rank — feature ranking function (we used MSE, ReliefF, and Earth Importance)
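The selection of non-zero-importance features (14) can be sketched as below; `select_features` is an illustrative name, and the scores shown are examples in the spirit of the Earth Importance values reported later:

```python
def select_features(D, scores):
    """Keep only the feature columns whose importance score s_j > 0."""
    keep = [j for j, s in enumerate(scores) if s > 0]
    return [(tuple(X[j] for j in keep), y) for X, y in D]

# Per-feature scores as produced by a ranking function such as Earth Importance
scores = [4.9, 3.3, 0.0]
D = [((1, 2, 3), 0.5), ((4, 5, 6), 1.5)]
D_selected = select_features(D, scores)
```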
6 Fit a regression model of one of 4 kinds (linear, random forest, Earth, k nearest neighbors):

M_p = {f_pred, B},  (15)
B = f_fit(D'''), p ∈ [1; 4]  (16)

B — vector of model parameters, f_fit — learning function, f_pred — prediction function (they are defined separately for each model)
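As an illustration of the {f_pred, B} pair, here is a toy k Nearest Neighbors regressor. The actual toolkit uses the Orange framework's implementations; `knn_fit` and `knn_predict` are hypothetical names:

```python
def knn_fit(D, k=3):
    """For k Nearest Neighbors, 'fitting' (f_fit) just stores the
    training sample; the parameters B are the stored sample plus k."""
    return {"sample": list(D), "k": k}

def knn_predict(B, X):
    """f_pred: mean response of the k training points nearest to X
    (Euclidean distance over the feature vector)."""
    dist = lambda A: sum((a - b) ** 2 for a, b in zip(A, X)) ** 0.5
    nearest = sorted(B["sample"], key=lambda d: dist(d[0]))[: B["k"]]
    return sum(y for _, y in nearest) / len(nearest)

D = [((1.0,), 1.0), ((2.0,), 2.0), ((3.0,), 3.0), ((10.0,), 10.0)]
B = knn_fit(D, k=3)
```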
7 Test the model using the RRSE metric:

C = U \ D  (17)
  = {(X'_i, y_i)},  (18)
i ∈ [1; m],  (19)
X'_i = (x_ik),  (20)
k ∈ [1; n + r],  (21)
ŷ_i = f_pred(X'_i, B),  (22)

RRSE = √( Σ_{i=1}^{m} (ŷ_i − y_i)² / Σ_{i=1}^{m} (y_i − ȳ)² )  (23)

ŷ_i — predicted value of the response, ȳ — average value of the response in the test sample
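Equation (23) translates directly into Python; `rrse` is an illustrative name:

```python
def rrse(y_true, y_pred):
    """Root Relative Squared Error: the residual error of the model
    relative to that of the trivial mean predictor."""
    y_mean = sum(y_true) / len(y_true)
    num = sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred))
    den = sum((yt - y_mean) ** 2 for yt in y_true)
    return (num / den) ** 0.5
```

A perfect model gives RRSE = 0; a model no better than predicting the mean gives RRSE = 1.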
Architecture of Adaptor Framework
Database server
Data views
Client
Database interaction module
Program building module
Experimentation module
Information retrieval module
Information analysis module
Technology stack
Database server
Distributed client-server document-oriented storage CouchDB
Cloud platform Cloudant
Client
Python
Statistical framework Orange
GNU/Linux on x86 platform
Database interaction module
Provides a high-level API for storing Python objects as database documents
Uses a local CouchDB server as a fall-back if the remote one isn't available
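The fall-back behaviour can be sketched as follows. This is an assumption-laden illustration: `connect` and `open_server` are hypothetical names, and the URLs are placeholders; the real module talks to CouchDB (e.g. via a client library such as python-couchdb):

```python
def connect(urls, open_server):
    """Return a handle to the first server that responds.
    `open_server` is any function that connects to a URL or raises
    on failure (e.g. a CouchDB client library's Server constructor)."""
    for url in urls:
        try:
            return open_server(url)
        except Exception:
            continue  # server unavailable: fall through to the next URL
    raise RuntimeError("no CouchDB server available")

# Remote Cloudant instance first, local CouchDB as fall-back (placeholder URLs)
urls = ["https://example.cloudant.com/", "http://127.0.0.1:5984/"]
```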
Program building module
Manages paths to source files of experimental programs
Sources are kept in a hierarchical directory structure; specifying only the name of the program to build is enough for its sources to be found
Manages build tools andtheir settings
Experimentation module
Calibrates the program execution time measurement before every series of runs
Subtracts the execution time of the "simplest" program to avoid systematic error
Runs the program being studied until the relative dispersion of the time measurements becomes sufficiently low (d_rel < 5%)
Passes experiment data to the database interaction module
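The calibration and repetition logic can be sketched as below; `measure` and `time_once` are illustrative names, and the exact thresholds and run counts of the real module may differ:

```python
import time

def time_once(f):
    """Wall-clock time of a single call to f."""
    start = time.perf_counter()
    f()
    return time.perf_counter() - start

def measure(run, calibrate, max_runs=100, d_rel_max=0.05):
    """Subtract the execution time of the 'simplest' program (the
    `calibrate` callable) from each measurement, and repeat timed runs
    of `run` until their relative dispersion falls below d_rel_max."""
    baseline = min(time_once(calibrate) for _ in range(5))
    times = []
    while len(times) < max_runs:
        times.append(time_once(run) - baseline)
        if len(times) >= 3:
            mean = sum(times) / len(times)
            std = (sum((t - mean) ** 2 for t in times) / (len(times) - 1)) ** 0.5
            if mean > 0 and std / mean < d_rel_max:
                break
    return sum(times) / len(times)
```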
Information retrieval module
Collects information on the platform used and the experiment being carried out
CPU
  Frequency
  Cache size
  Instruction set extensions
  etc.
Compiler
Experiment
  Studied program
  Size of input data
Data analysis module
Receives data from the database and saves it to CSV files for input to the Orange statistical analysis system
Graphs results using the Python library matplotlib
Two groups of program performance models
  Simplest (1 feature)
  More complex (3-5 features)
Four regression models in both groups
  Linear
  k Nearest Neighbors
  Multivariate Adaptive Regression Splines
  Random Forest
Data analysis module (cont.)
Scheme of 40 data analysis components in the Orange system
  Reading in
  Preprocessing
  Filtering
  Feature extraction
  Feature ranking
  Predictor fitting
  Prediction results evaluation
  Saving predictions to CSV file
Platform
  Intel CPUs
    Core 2 Quad Q8200, 2.33 GHz, 2 MB cache
    Core i5 M460, 2.53 GHz, 3 MB cache
    Xeon E5430, 2.66 GHz, 6 MB cache
  Ubuntu 12.04, gcc and llvm compilers
Polybench/C 3.2 benchmark set, 28 programs in total
  Linear algebra, solution of systems of linear algebraic equations and ordinary differential equations
  Input data is generated by deterministic algorithms
Performance of chosen programs from the benchmark set is modeled using the Adaptor framework
  symm — multiplication of symmetric matrices; square matrices of dimensionality 2^i, i = f″_rand(1, 10)
  ludcmp — LU decomposition; square matrices of dimensionality f″_rand(2, 1024)
1000 experiments per CPU
Feature ranking: symm program

Attribute   Relief F   Mean Square Error   Earth Importance
size        0.268      0.573               4.9
cpu mhz     0.000      0.006               3.3
width       0.130      0.573               0.7
cpu cache   0.000      0.006               0.5
height      0.130      0.573               0.0
Earth Importance selected only relevant features
Feature ranking: symm program (cont.)

428 experiments
1 feature: matrix dimensionality

Model                 RMSE     RRSE    R²
k Nearest Neighbors   5.761    0.051   0.997
Random Forest         5.961    0.052   0.997
Linear Regression     15.869   0.139   0.981

Root Relative Squared Error of k Nearest Neighbors — approx. 5%
Resulting model of performance
k Nearest Neighbors model of performance of the symm program on the Intel Core 2 Quad Q8200 CPU
Resulting model of performance
Comparison of performance models of the ludcmp program

468 experiments
2 features: width of matrix, CPU frequency

Model                 RMSE    RRSE    R²
k Nearest Neighbors   1.093   0.048   0.998
Linear Regression     9.067   0.394   0.845
Where models fail
Amazon throttles its micro servers: the data is split into two "curves"
Earth Regression at least tries to follow the "main curve"
k Nearest Neighbors is much worse in this situation
Results of evaluation
Most suitable Feature Ranking method — Earth Importance
Most suitable Regression method — k Nearest Neighbors
Further work
The Velocitas method is promising and scales to larger feature sets
Data filtering to reduce noise can help it get even better
Orange is a decent statistical framework, but interactive work with it limits batch processing
For larger data sets and increased automation of the Adaptor framework, either its API or other libraries (e.g. sklearn) should be used
Custom research scenario support is required
It would be interesting to perform experiments on GPUs to study the effects of massively parallel execution
Thank you!
Contact information: Michael K. Pankov
This is an extended and edited version of my diploma defense keynote, given on June 19, 2013.