TRANSCRIPT
Performance Analysis in Parallel Programming
Ezio Bartocci, PhD
SUNY at Stony Brook
January 13, 2011
Today’s agenda
Serial Program Performance: an example, I/O statements, taking timings
Parallel Program Performance Analysis: the cost of communication, Amdahl's Law, sources of overhead, scalability
Performance of a Serial Program
Definition of running time:
Let T(n) denote the running time of a program on an input of size n.
The actual time depends on:
- the hardware being used
- the programming language and compiler
- details of the input other than its size
Example: the trapezoidal rule

[Figure: the interval [a, b] divided into trapezoids of width h]

h = (b-a)/n;                     /* c1: statements outside the loop */
integral = (f(a)+f(b))/2.0;
x = a;
for (i = 1; i <= n-1; i++) {
    x = x + h;                   /* c2: cost of one iteration */
    integral = integral + f(x);
}
integral = integral * h;

Counting a constant cost c1 for the statements outside the loop and a cost c2 for each of the n-1 iterations:

T(n) = c1 + c2(n-1) ≈ k1 n + k2 ≈ k1 n   (when k1 n ≫ k2)

so T(n) is a linear polynomial in n.
What about the I/O?
On a 33 MHz 486 PC running Linux with the gcc compiler, a multiplication takes about a microsecond, while a printf takes about 300 microseconds.
On faster systems the ratio may be even worse, with the arithmetic time decreasing and the I/O times remaining the same.
Estimation of the time:

T(n) = Tcalc(n) + Ti/o(n)
Parallel Program Performance Analysis
T(n, p) denotes the runtime of the parallel solution with p processes.
Speedup of a parallel program:

S(n, p) = T(n) / T(n, p)

Efficiency of a parallel program:

E(n, p) = S(n, p) / p = T(n) / (p T(n, p))

The definition of T(n) is ambiguous: is it the runtime of the fastest known serial program, or is T(n) = T(n, 1)?

In general 0 < S(n, p) ≤ p; S(n, p) = p is called linear speedup.

Efficiency is a measure of process utilization in a parallel program, relative to the serial program: 0 < E(n, p) ≤ 1, with E(n, p) = 1 for linear speedup and E(n, p) < 1/p indicating a slowdown.
The Cost of Communication

For a reasonable estimate of the performance of a parallel program, we should also count the cost of communication:

T(n, p) = Tcalc(n, p) + Ti/o(n, p) + Tcomm(n, p)

The cost of sending a single message containing k units of data is:

ts + k tc

where ts is called the message latency and 1/tc is called the bandwidth.
Taking Timings

#include <time.h>
....
clock_t c1, c2;
double serial_comp;
....
c1 = clock();
for (j = 0; j < MATRIX_SIZE; j++) {
    for (i = 0; i < MATRIX_SIZE; i++) {
        B[i * MATRIX_SIZE + j] = A[j * MATRIX_SIZE + i] + A[j * MATRIX_SIZE + i];
    }
}
c2 = clock();
serial_comp = c2 - c1;   /* clock ticks, stored in a double for the %f format */
printf("Time serial computation: %.0f milliseconds\n",
       serial_comp / (CLOCKS_PER_SEC / 1000));
Amdahl's Law

If 0 ≤ r ≤ 1 is the fraction of the program that is perfectly parallelizable, then

S(p) = T / ((1-r)T + rT/p) = 1 / ((1-r) + r/p)

dS/dp = r / [(1-r)p + r]^2 ≥ 0

lim p→∞ S(p) = 1 / (1-r)

S(p) is an increasing function of p.

Example: if r = 0.5 the maximum speedup is 2; if r = 0.99 the maximum speedup is 100, even if we use 10000 processes.
Work and Overhead

The amount of work done by a serial program is simply its runtime:

W(n) = T(n)

The amount of work done by a parallel program is the sum of the amounts of work done by each process:

W(n, p) = Σ_{q=0}^{p-1} Wq(n, p)

Thus, an alternative definition of efficiency is:

E(n, p) = T(n) / (p T(n, p)) = W(n) / W(n, p)
Overhead is the amount of work done by the parallel program that is not done by the serial program:

To(n, p) = W(n, p) - W(n) = p T(n, p) - T(n)

The per-process overhead is the difference between the parallel runtime and the ideal parallel runtime that would be obtained with linear speedup:

T'o(n, p) = T(n, p) - T(n)/p,  so that  To(n, p) = p T'o(n, p)

The main sources of overhead are communication, idle time, and extra computation.

In terms of Amdahl's law, the efficiency achievable by a parallel program on a given problem instance approaches zero as the number of processes is increased.
Scalability

Amdahl's law defines an upper bound on the speedup. In terms of efficiency we have:

E = T / (p((1-r)T + rT/p)) = 1 / (p(1-r) + r)
Looking at this in terms of overhead, with

To(n, p) = p T(n, p) - T(n)

the efficiency can be written as:

E(n, p) = T(n) / (p T(n, p)) = T(n) / (T(n) + To(n, p)) = 1 / (1 + To(n, p)/T(n))
The behavior of the efficiency as p is increased is completely determined by the overhead function To(n, p).
A parallel program is scalable if, as the number of processes p is increased, we can find a rate of increase for the problem size n such that the efficiency remains constant.