TRANSCRIPT
Performance Analysis in Parallel Programming
Ezio Bartocci, PhD
SUNY at Stony Brook
January 13, 2011
Today’s agenda
Serial Program Performance: an example, I/O statements, taking timings
Parallel Program Performance Analysis: the cost of communication, Amdahl's Law, sources of overhead, scalability
Performance of a Serial Program
Definition of running time:
Let T(n) denote the running time of a program on an input of size n.
The actual time depends on:
- the hardware being used
- the programming language and compiler
- details of the input other than its size
Example: the trapezoidal rule

[Figure: the interval [a, b] divided into trapezoids of width h]

h = (b-a)/n;                     /* c1: statements outside the loop */
integral = (f(a)+f(b))/2.0;
x = a;
for (i = 1; i <= n-1; i++) {
    x = x + h;                   /* c2: cost of one iteration */
    integral = integral + f(x);
}
integral = integral * h;

Counting a constant cost c1 for the statements outside the loop and a cost c2 for each of the n-1 iterations:

T(n) = c1 + c2(n-1) ≈ k1 n + k2 ≈ k1 n   (when k1 n ≫ k2)

so T(n) is a linear polynomial in n.
What about the I/O?
On a 33 MHz 486 PC running Linux with the gcc compiler, a multiplication takes about a microsecond, while a printf takes about 300 microseconds.
On faster systems the ratio may be even worse, with the arithmetic time decreasing and the I/O times remaining the same.
Estimation of the time:

T(n) = Tcalc(n) + Ti/o(n)
Parallel Program Performance Analysis
T(n, p) denotes the runtime of the parallel solution with p processes.
Speedup of a parallel program:

S(n, p) = T(n) / T(n, p)

Efficiency of a parallel program:

E(n, p) = S(n, p) / p = T(n) / (p T(n, p))

The definition of T(n) is ambiguous: is it the runtime of the fastest known serial program, or is T(n) = T(n, 1)?

In general 0 < S(n, p) ≤ p; S(n, p) = p is called linear speedup.

Efficiency is a measure of process utilization in a parallel program, relative to the serial program: 0 < E(n, p) ≤ 1, with E(n, p) = 1 for linear speedup and E(n, p) < 1/p indicating a slowdown.
The Cost of Communication

For a reasonable estimate of the performance of a parallel program, we should also count the cost of communication:

T(n, p) = Tcalc(n, p) + Ti/o(n, p) + Tcomm(n, p)

The cost of sending a single message containing k units of data is:

ts + k tc

where ts is called the message latency and 1/tc is called the bandwidth.
Taking Timings

#include <time.h>
....
clock_t c1, c2;
double serial_comp;
....
c1 = clock();
for (j = 0; j < MATRIX_SIZE; j++) {
    for (i = 0; i < MATRIX_SIZE; i++) {
        B[i * MATRIX_SIZE + j] = A[j * MATRIX_SIZE + i] + A[j * MATRIX_SIZE + i];
    }
}
c2 = clock();
serial_comp = c2 - c1;   /* clock ticks, stored in a double for the %f format */
printf("Time serial computation: %.0f milliseconds\n",
       serial_comp / (CLOCKS_PER_SEC / 1000));
Amdahl's Law

If 0 ≤ r ≤ 1 is the fraction of the program that is perfectly parallelizable, then

S(p) = T / ((1-r)T + rT/p) = 1 / ((1-r) + r/p)

dS/dp = r / [(1-r)p + r]^2 ≥ 0

lim p→∞ S(p) = 1 / (1-r)

S(p) is an increasing function of p.

Example: if r = 0.5 the maximum speedup is 2; if r = 0.99 the maximum speedup is 100, even if we use 10000 processes.
Work and Overhead

The amount of work done by a serial program is simply its runtime:

W(n) = T(n)

The amount of work done by a parallel program is the sum of the amounts of work done by each process:

W(n, p) = Σ_{q=0}^{p-1} Wq(n, p)

Thus, an alternative definition of efficiency is:

E(n, p) = T(n) / (p T(n, p)) = W(n) / W(n, p)
Overhead is the amount of work done by the parallel program that is not done by the serial program:

To(n, p) = W(n, p) - W(n) = p T(n, p) - T(n)

The per-process overhead is the difference between the parallel runtime and the ideal parallel runtime that would be obtained with linear speedup:

T'o(n, p) = T(n, p) - T(n)/p,  so that  To(n, p) = p T'o(n, p)

The main sources of overhead are communication, idle time, and extra computation.

In terms of Amdahl's law, the efficiency achievable by a parallel program on a given problem instance approaches zero as the number of processes is increased.
Scalability

Amdahl's law defines an upper bound on the speedup. In terms of efficiency we have:

E = T / (p((1-r)T + rT/p)) = 1 / (p(1-r) + r)
Looking at this in terms of overhead, with

To(n, p) = p T(n, p) - T(n)

the efficiency can be written as:

E(n, p) = T(n) / (p T(n, p)) = T(n) / (T(n) + To(n, p)) = 1 / (1 + To(n, p)/T(n))
The behavior of the efficiency as p is increased is completely determined by the overhead function To(n, p).
A parallel program is scalable if, as the number of processes p is increased, we can find a rate of increase for the problem size n such that the efficiency remains constant.