Parallel Processing: Why, When, How?
SimLab2010, Belgrade
October 5, 2010
Rodica Potolea
Parallel Processing: Why, When, How?
• Why?
  • Problems "too" costly to be solved with the classical approach
  • The need for results in a specific (or reasonable) time
• When?
  • Are there sequences better suited for parallel implementation?
  • Are there situations when it is better NOT to parallelize?
• How?
  • What are the constraints (if any) when we pass from a sequential to a parallel implementation?
5 Oct. 10 Rodica Potolea - CS Dept., T.U. Cluj-Napoca, Romania 2
Outline
• Computation paradigms
• Constraints in parallel computation
• Dependence analysis
• Techniques for dependence removal
• Systolic architectures & simulations
• Increase the parallel potential
• Examples on GRID
Computation paradigms
• Sequential model
  • referred to as the Von Neumann architecture
  • the basis of comparison
  • von Neumann simultaneously proposed other models
  • due to the technological constraints of the time, only the sequential model was implemented
  • Von Neumann could be considered the founder of parallel computers as well (conceptual view)
Computation paradigms contd.
• Flynn's taxonomy (according to the instruction/data flow, single/multiple)
  • SISD
  • SIMD: a single control unit dispatches instructions to each processing unit
  • MISD: the same data is analyzed from different points of view (MI)
  • MIMD: different instructions run on different data
Parallel computation
• Why parallel solutions?
  • Increase speed
  • Process huge amounts of data
  • Solve problems in real time
  • Solve problems in due time
Parallel computation: Efficiency
• Parameters to be evaluated (in the sequential view)
  • Time
  • Memory
• Cases to be considered
  • Best
  • Worst
  • Average
• Optimal algorithms
  • Ω() = O()
  • An algorithm is optimal if its worst-case running time is of the same order of magnitude as the function Ω of the problem to be solved
Parallel computation: Efficiency contd.
• Order of magnitude: a function expressing the running time and Ω, respectively
  • if it is a polynomial, only its maximum degree matters, regardless of the multiplicative constant
  • Ex: for Ω() = c1*n^p and O() = c2*n^p we consider Ω = O, written Ω(n^p) = O(n^p)
• Algorithms have O() >= Ω in the worst case
  • O() <= Ω can hold in the best-case scenario, or even in the average case
  • Ex. Sorting: Ω(n lg n), yet O(n) in the best-case scenario for some algorithms
Parallel computation: Efficiency contd.
• Do we really need parallel computers?
• Are "computing-greedy" algorithms better suited for parallel implementation?
• Consider 2 classes of algorithms:
  • Alg1: polynomial
  • Alg2: exponential
• Assume the design of a new system which increases the speed of execution V times
• How does the maximum dimension of the problem that can be solved on the new system increase? i.e. express n = f(V)
Parallel computation: Efficiency contd.
Alg1: O(n^k)
          Oper.    Time
M1 (old): n1^k     T
M2 (new): n1^k     T/V, i.e. V*n1^k operations in time T
(n2)^k = V*n1^k = (V^(1/k)*n1)^k
n2 = V^(1/k)*n1  =>  n2 = c*n1
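The derivation above can be checked numerically; a minimal sketch (the function name and sample values are illustrative, not from the slides):

```python
# For a polynomial O(n^k) algorithm, a machine V times faster solves,
# in the same time T, a problem only a constant factor larger:
# n2^k = V * n1^k  =>  n2 = V**(1/k) * n1
def max_size_polynomial(n1, V, k):
    return V ** (1.0 / k) * n1

# Doubling the speed (V=2) for a cubic algorithm (k=3) grows the
# solvable size by only 2**(1/3), about 26%.
print(max_size_polynomial(1000, 2, 3))
```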
Parallel computation: Efficiency contd.
Alg2: O(2^n)
          Oper.    Time
M1 (old): 2^n1     T
M2 (new): 2^n1     T/V, i.e. V*2^n1 operations in time T
2^n2 = V*2^n1 = 2^(lg V + n1)
n2 = n1 + lg V  =>  n2 = c + n1
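The exponential case can be sketched the same way (function name and values are illustrative):

```python
import math

# For an exponential O(2^n) algorithm the gain is only additive:
# 2**n2 = V * 2**n1  =>  n2 = n1 + lg(V)
def max_size_exponential(n1, V):
    return n1 + math.log2(V)

# A machine 1024 times faster adds just lg(1024) = 10 to the input size.
print(max_size_exponential(50, 1024))
```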
Parallel computation: Efficiency contd.
Speed of the new machine in terms of the speed of the old machine: V2 = V*V1
Alg1: O(n^k): n2 = V^(1/k)*n1
Alg2: O(2^n): n2 = n1 + lg V
Conclusion: BAD NEWS!! A faster machine enlarges the solvable problem size only by a constant factor (polynomial case) or a constant term (exponential case).
Parallel computation: Efficiency contd.
Are there algorithms that really need exponential running time?
  backtracking: avoid it! "Never" use it!
Are there problems that are inherently exponential?
  NP-complete and NP-hard problems
  "difficult" problems: do we not face them in real-world situations often?
  Hamiltonian cycle, sum problem, …
Parallel computation: Efficiency contd.
• Parameters to be evaluated for parallel processing
  • Time
  • Memory
  • Number of processors involved
• Cost = Time * Number of processors (c = t * n)
• Efficiency
  • E = O()/cost
  • E ≤ 1
Parallel computation: Efficiency contd.
• Speedup
  • Δ = best sequential execution time / parallel execution time
  • Δ ≤ n
• Optimal algorithms
  • E = 1, Δ = n
  • Cost = O()
  • A parallel algorithm is optimal if its cost is of the same order of magnitude as the best known algorithm solving the problem in the worst case
• Other
Efficiency contd.
• Parallel running time
  • Computation time
  • Communication time (data transfer, partial results transfer, information communication)
• Computation time
  • As the number of processors involved increases, the computation time decreases
• Communication time
  • Quite the opposite!!!
Efficiency contd.
• Diagram of running time
  [figure: total running time T as a function of the number of processors N]
Outline
• Computation paradigms
• Constraints in parallel computation
• Dependence analysis
• Techniques for dependence removal
• Systolic architectures & simulations
• Increase the parallel potential
• Examples on GRID
Constraints in parallel computation
• Data/code to be distributed among the different processing units in the system
• Observe the precedence relations between statements/variables
• Avoid conflicts (data and control dependencies)
• Remove constraints (where possible)
Constraints in parallel computation contd.
• Data parallel
  • Input data is split to be processed by different CPUs
  • Characterizes the SIMD model of computation
• Control parallel
  • Different sequences are processed by different CPUs
  • Characterizes the MISD and MIMD models of computation
• Conflicts should be avoided
    if constraint removed then parallel sequence of code
    else sequential sequence of code
Outline
• Computation paradigms
• Constraints in parallel computation
• Dependence analysis
• Techniques for dependence removal
• Systolic architectures & simulations
• Increase the parallel potential
• Examples on GRID
Dependence analysis
• Dependences represent the major limitation to parallelism
• Data dependencies
  • Refer to read/write conflicts
  • Data flow dependence
  • Data antidependence
  • Output dependence
• Control dependencies
  • Exist between statements
Dependence analysis contd.
• Data flow dependence
  • Indicates a write-before-read ordering of operations which must be satisfied
  • Can NOT be avoided; therefore it represents the limit of the parallelism's potential
Ex: (assume the sequence in a loop on i)
  S1: a(i) = x(i) - 3    //assign a(i), i.e. write in its memory location
  S2: b(i) = a(i) / c(i) //use a(i), i.e. read from the corresponding memory location
• a(i) must first be calculated in S1, and only then used in S2. Therefore, S1 and S2 cannot be processed in parallel!!!
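The loop above can be sketched in Python (the input arrays x and c are illustrative):

```python
# S2 reads a(i) that S1 has just written: a write-before-read
# (flow) dependence, so S1 and S2 cannot run in parallel.
n = 4
x = [5.0, 6.0, 7.0, 8.0]
c = [1.0, 2.0, 2.0, 4.0]
a = [0.0] * n
b = [0.0] * n
for i in range(n):
    a[i] = x[i] - 3      # S1: write a(i)
    b[i] = a[i] / c[i]   # S2: read a(i); must follow S1
print(b)  # [2.0, 1.5, 2.0, 1.25]
```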
Dependence analysis contd.
• Data antidependence
  • Indicates a read-before-write ordering of operations which must be satisfied
  • Can be avoided (there are techniques to remove this type of dependence)
Ex: (assume the sequence in a loop on i)
  S1: a(i) = x(i) - 3        // write later
  S2: b(i) = c(i) + a(i+1)   // read first
• a(i+1) must first be used (with its former value) in S2, and only then computed in S1, at the next execution of the loop. Therefore, several iterations of the loop cannot be processed in parallel!
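A Python sketch of this antidependence, with illustrative data:

```python
# S2 must read the OLD a(i+1) before iteration i+1 overwrites it,
# so the iterations cannot naively run in parallel.
n = 4
x = [5, 6, 7, 8]
c = [1, 1, 1, 1]
a = [10, 20, 30, 40, 50]   # one extra element for a(i+1) at i = n-1
b = [0] * n
for i in range(n):
    a[i] = x[i] - 3        # S1: writes a(i) ...
    b[i] = c[i] + a[i + 1] # S2: ... but reads a(i+1) before the
                           # next iteration's S1 overwrites it
print(b)  # [21, 31, 41, 51]
```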
Dependence analysis contd.
• Output dependence
  • Indicates a write-before-write ordering of operations which must be satisfied
  • Can be avoided (there are techniques to remove this type of dependence)
Ex: (assume the sequence in a loop on i)
  S1: c(i+4) = b(i) + a(i+1)   //assign first here
  S2: c(i+1) = x(i)            //and here, after 3 executions of the loop
• c(i+4) is assigned in S1; after 3 executions of the loop, the same element of the array is assigned in S2. Therefore, several iterations of the loop cannot be processed in parallel!
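A Python sketch of this output dependence, with illustrative data:

```python
# At iteration i, S1 writes c(i+4); three iterations later S2 writes
# the same cell, so the program order of the two writes must be kept.
n = 6
b = [1] * n
a = [2] * (n + 2)
x = [9] * n
c = [0] * (n + 5)
for i in range(n):
    c[i + 4] = b[i] + a[i + 1]  # S1 writes c(i+4)
    c[i + 1] = x[i]             # S2 writes c(i+1): at iteration i+3
                                # it overwrites what S1 wrote at i
print(c)  # [0, 9, 9, 9, 9, 9, 9, 3, 3, 3, 0]
```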
Dependence analysis contd.
• Each dependence defines a dependence vector
  • Defined by the distance between the loop iterations where they interfere
(data flow)     S1: a(i) = x(i) - 3           d = 0 on var a
                S2: b(i) = a(i) / c(i)
(data antidep.) S1: a(i) = x(i) - 3           d = 1 on var a
                S2: b(i) = c(i) + a(i+1)
(output dep.)   S1: c(i+4) = b(i) + a(i+1)    d = 3 on var c
                S2: c(i+1) = x(i)
• All dependence vectors within a sequence define a dependence matrix
Dependence analysis contd.
• Data dependencies
  • Data flow dependence (write before read) cannot be removed
  • Data antidependence (read before write) can be removed
  • Output dependence (write before write) can be removed
Outline
• Computation paradigms
• Constraints in parallel computation
• Dependence analysis
• Techniques for dependence removal
• Systolic architectures & simulations
• Increase the parallel potential
• Examples on GRID
Techniques for dependence removal
• Many dependencies are introduced artificially by programmers
• Output dependences and antidependences = false dependencies
  • They occur not because data is passed, but because the same memory location is used in more than one place
• Removal of (some of) those dependencies increases the parallelization potential
Techniques for dependence removal contd.
• Renaming
  S1: a = b * c
  S2: d = a + 1
  S3: a = a * d
• Dependence analysis:
  • S1->S2 (data flow on a)
  • S2->S3 (data flow on d)
  • S1->S3 (data flow on a)
  • S1->S3 (output dep. on a)
  • S2->S3 (antidependence on a)
Techniques for dependence removal contd.
• Renaming
  S1: a = b * c
  S2: d = a + 1
  S'3: newa = a * d
• Dependence analysis:
  • S1->S2 (data flow on a) holds
  • S2->S'3 (data flow on d) holds
  • S1->S'3 (data flow on a) holds
  • S1->S'3 (output dep. on a) removed
  • S2->S'3 (antidependence on a) removed
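Renaming can be sketched in Python (the values for b and c are illustrative):

```python
# Writing the result of S3 to a fresh name removes the output and
# anti-dependences on a; only true flow dependences remain.
b, c = 4, 5
a = b * c        # S1
d = a + 1        # S2
newa = a * d     # S'3: renamed target instead of reusing a
print(newa)  # 420
```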
Techniques for dependence removal contd.
• Expansion
  for i = 1 to n
    S1: x = b(i) - 2
    S2: y = c(i) * b(i)
    S3: a(i) = x + y
• Dependence analysis:
  • S1->S3 (data flow on x, same iteration)
  • S2->S3 (data flow on y, same iteration)
  • S1->S1 (output dep. on x, consecutive iterations)
  • S2->S2 (output dep. on y, consecutive iterations)
  • S3->S1 (antidep. on x, consecutive iterations)
  • S3->S2 (antidep. on y, consecutive iterations)
Techniques for dependence removal contd.
• Expansion (loop iterations become independent)
  for i = 1 to n
    S'1: x(i) = b(i) - 2
    S'2: y(i) = c(i) * b(i)
    S'3: a(i) = x(i) + y(i)
• Dependence analysis:
  • S'1->S'3 (data flow on x) holds
  • S'2->S'3 (data flow on y) holds
  • S'1->S'1 (output dep. on x) removed
  • S'2->S'2 (output dep. on y) removed
  • S'3->S'1 (antidep. on x) replaced by the same data flow on x
  • S'3->S'2 (antidep. on y) replaced by the same data flow on y
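Expansion can be sketched in Python (the arrays b and c are illustrative):

```python
# The scalars x and y are expanded into arrays with one slot per
# iteration, so the iterations no longer share memory locations.
n = 4
b = [3, 4, 5, 6]
c = [1, 2, 3, 4]
x = [0] * n   # expanded scalar
y = [0] * n   # expanded scalar
a = [0] * n
for i in range(n):       # iterations are now independent
    x[i] = b[i] - 2      # S'1
    y[i] = c[i] * b[i]   # S'2
    a[i] = x[i] + y[i]   # S'3
print(a)  # [4, 10, 18, 28]
```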
Techniques for dependence removal contd.
• Splitting
  for i = 1 to n
    S1: a(i) = b(i) + c(i)
    S2: d(i) = a(i) + 2
    S3: f(i) = d(i) + a(i+1)
• Dependence analysis:
  • S1->S2 (data flow on a(i), same iteration)
  • S2->S3 (data flow on d(i), same iteration)
  • S3->S1 (antidep. on a(i+1), consecutive iterations)
• We have a cycle of dependencies: S1->S2->S3->S1
Techniques for dependence removal contd.
• Splitting (cycle removed)
  for i = 1 to n
    S0: newa(i) = a(i+1)   //copy the value obtained at the previous step
    S1: a(i) = b(i) + c(i)
    S2: d(i) = a(i) + 2
    S'3: f(i) = d(i) + newa(i)
• Dependence analysis:
  • S1->S2 (data flow on a(i), same iteration)
  • S2->S'3 (data flow on d(i), same iteration)
  • S0->S'3 (data flow on a(i+1), same iteration)
  • S0->S1 (antidep. on a(i+1), consecutive iterations)
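Splitting can be sketched in Python (the input arrays are illustrative); here the copy statement S0 is split into its own loop, which breaks the S1->S2->S3->S1 cycle:

```python
# Copying the old a(i+1) values first means the main loop no longer
# has a cross-iteration dependence cycle.
n = 4
a = [1, 2, 3, 4, 5]   # one extra element for a(i+1)
b = [10] * n
c = [1] * n
d = [0] * n
f = [0] * n
newa = [0] * n
for i in range(n):        # S0: copy the old values of a(i+1)
    newa[i] = a[i + 1]
for i in range(n):        # cycle-free loop
    a[i] = b[i] + c[i]    # S1
    d[i] = a[i] + 2       # S2
    f[i] = d[i] + newa[i] # S'3
print(f)  # [15, 16, 17, 18]
```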
Outline
• Computation paradigms
• Constraints in parallel computation
• Dependence analysis
• Techniques for dependence removal
• Systolic architectures & simulations
• Increase the parallel potential
• Examples on GRID
Systolic architectures & simulations
• Systolic architecture
  • Special-purpose architecture
  • A pipe of processing units called cells
• Advantages
  • Faster
  • Scalable
• Limitations
  • Expensive
  • Highly specialized for particular applications
  • Difficult to build => simulations
Systolic architectures & simulations contd.
• Goals for simulation:
  • Verification and validation
  • Tests and availability of solutions
  • Efficiency
  • Proof of concept
Systolic architectures & simulations contd.
• Models of systolic architectures
  • Many problems stated in terms of systems of linear equations
  • Linear arrays (from special-purpose to multi-purpose architectures)
    • Computing the values of a polynomial
    • Convolution product
    • Vector-matrix multiplication
  • Mesh arrays
    • Matrix multiplication
    • Speech recognition (simplified mesh array)
Systolic architectures & simulations contd.
• All but one simulated within a software model in PARLOG
  • Easy to implement
  • Ready to run
  • Specification language
• Speech recognition: Pascal/C submesh array
  • Vocabulary of thousands of words
  • Real-time recognition
Systolic architectures & simulations contd.
• PARLOG example (convolution product)

  cel([],[],_,_,[],[]).
  cel([X|Tx],[Y|Ty],B,W,[Xo|Txo],[Yo|Tyo]) <-
      Yo is Y + W*X,
      Xo is B,
      cel(Tx,Ty,X,W,Txo,Tyo).
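A Python sketch of the cell above (its semantics as I read them from the PARLOG clause; the chaining function and test values are illustrative): each cell multiplies the passing x stream by its weight W, adds the result to the y stream, and forwards x delayed by one step through register B.

```python
# One systolic cell: consumes the x and y streams element by element.
def cel(xs, ys, b, w):
    xo, yo = [], []
    for x, y in zip(xs, ys):
        yo.append(y + w * x)  # Yo is Y + W*X
        xo.append(b)          # Xo is B (the previous x)
        b = x
    return xo, yo

# Chaining one cell per weight yields the convolution product.
def convolution(xs, ws):
    xs = xs + [0] * (len(ws) - 1)  # flush the pipe
    ys = [0] * len(xs)
    for w in ws:
        xs, ys = cel(xs, ys, 0, w)
    return ys

print(convolution([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```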
Outline
• Computation paradigms
• Constraints in parallel computation
• Dependence analysis
• Techniques for dependence removal
• Systolic architectures & simulations
• Increase the parallel potential
• Examples on GRID
Program transformations
• Loop transformation
  • To allow execution of parallel loops
  • Techniques borrowed from compilers
Program transformations contd.
• Cycle shrinking
• Applies when the dependence cycle of a loop involves dependencies between iterations far apart (say, at distance d)
• The loop is executed in parallel within that distance d

for i = 1 to n
    a(i+d) = b(i) - 1
    b(i+d) = a(i) + c(i)

Transforms into

for i1 = 1 to n step d
    parallel for i = i1 to i1+d-1
        a(i+d) = b(i) - 1
        b(i+d) = a(i) + c(i)
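The transformation can be checked with a small executable sketch (Python, purely illustrative): the blocked version produces the same arrays as the serial loop, and within each block of d iterations the writes (indices i1+d .. i1+2d-1) and the reads (indices i1 .. i1+d-1) touch disjoint elements, so the inner loop is safe to run in parallel.

```python
def serial(n, d, a, b, c):
    # Original loop: dependence distance d between writes and reads
    for i in range(1, n + 1):
        a[i + d] = b[i] - 1
        b[i + d] = a[i] + c[i]
    return a, b

def shrunk(n, d, a, b, c):
    # Cycle shrinking: each block of d iterations is dependence-free
    for i1 in range(1, n + 1, d):
        for i in range(i1, min(i1 + d, n + 1)):   # 'parallel for' in spirit
            a[i + d] = b[i] - 1
            b[i + d] = a[i] + c[i]
    return a, b
```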
Program transformations contd.
• Loop interchanging
• Consists of interchanging the loop indexes
• It requires the new dependence vector to be positive

for j = 1 to n
    for i = 1 to n
        a(i,j) = a(i-1,j) + b(i-2,j)

The inner loop cannot be vectorised (executed in parallel)

for i = 1 to n
    parallel for j = 1 to n
        a(i,j) = a(i-1,j) + b(i-2,j)
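A small Python check (illustrative only) confirms that the two loop orders compute the same values here, and that after the interchange every j-iteration of the inner loop reads only rows i-1 and i-2, written in earlier outer iterations, so the inner loop is parallelizable.

```python
def sweep(n, a, b, interchanged):
    """a(i,j) = a(i-1,j) + b(i-2,j); rows a[0], a[1] and b[0..n-2] are
    pre-initialised boundary data, and i runs from 2 to n."""
    if interchanged:
        pairs = [(i, j) for i in range(2, n + 1) for j in range(1, n + 1)]
    else:  # original order: j outer, i inner
        pairs = [(i, j) for j in range(1, n + 1) for i in range(2, n + 1)]
    for i, j in pairs:
        a[i][j] = a[i - 1][j] + b[i - 2][j]
    return a
```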
Program transformations contd.
• Loop fusion
• Transforms 2 adjacent loops into a single one
• Reduces the overhead of loops
• It requires not violating the ordering imposed by data dependencies

for i = 1 to n
    a(i) = b(i) + c(i+1)
for i = 1 to n
    c(i) = a(i-1)   // write before read

for i = 1 to n
    a(i) = b(i) + c(i+1)
    c(i) = a(i-1)
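The legality argument can be made concrete (Python, illustrative): in the fused loop, the c(i+1) read by iteration i has not yet been overwritten (it is only written at iteration i+1), and the a(i-1) read by c(i) was written one iteration earlier, so the fused loop reproduces the two-loop result.

```python
def separate_loops(n, b, c):
    a, c = [0] * (n + 1), list(c)
    for i in range(1, n + 1):
        a[i] = b[i] + c[i + 1]
    for i in range(1, n + 1):
        c[i] = a[i - 1]
    return a, c

def fused_loop(n, b, c):
    a, c = [0] * (n + 1), list(c)
    for i in range(1, n + 1):
        a[i] = b[i] + c[i + 1]   # c(i+1) still holds its original value
        c[i] = a[i - 1]          # a(i-1) already written: write before read
    return a, c
```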
Program transformations contd.
• Loop collapsing
• Combines 2 nested loops into a single one
• Increases the vector length

for i = 1 to n
    for j = 1 to m
        a(i,j) = a(i,j-1) + 1

for ij = 0 to n*m - 1
    i = ij div m + 1
    j = ij mod m + 1
    a(i,j) = a(i,j-1) + 1
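An executable sketch of the index recovery (Python, illustrative; the slide's original `i = i*j div m` reads as a typo for `ij div m`), using 0-based indices:

```python
def nested(n, m, a):
    for i in range(n):
        for j in range(1, m):         # column 0 holds the initial values
            a[i][j] = a[i][j - 1] + 1
    return a

def collapsed(n, m, a):
    # One loop of length n*m; the original indices are recovered from the
    # collapsed index with integer division and remainder
    for ij in range(n * m):
        i, j = ij // m, ij % m
        if j > 0:
            a[i][j] = a[i][j - 1] + 1
    return a
```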
Program transformations contd.
• Loop partitioning
• Partitions a loop into independent problems

for i = 1 to n
    a(i) = a(i-g) + 1

The loop can be split into g independent problems:

for i = k to n step g
    a(i) = a(i-g) + 1

where k = 1, ..., g
Each of the g loops can be executed in parallel
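A quick check of the split (Python, illustrative): the recurrence a(i) = a(i-g) + 1 only links indices that are congruent modulo g, so the g strided loops are mutually independent and could each run on its own processor.

```python
def serial(n, g, a):
    for i in range(1, n + 1):
        if i - g >= 0:               # indices below g keep their initial values
            a[i] = a[i - g] + 1
    return a

def partitioned(n, g, a):
    for k in range(1, g + 1):        # each k is an independent subproblem
        for i in range(k, n + 1, g):
            if i - g >= 0:
                a[i] = a[i - g] + 1
    return a
```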
Outline
• Computation paradigms
• Constraints in parallel computation
• Dependence analysis
• Systolic architectures & simulations
• Techniques for dependences removal
• Increase the parallel potential
• Examples on GRID
GRID
• It provides access to
  • computational power
  • storage capacity
Examples on GRID
• Matrix multiplication
• Matrix power
• Gaussian elimination
• Sorting
• Graph algorithms
Examples on GRID: Matrix multiplication cont.
Analysis 1D (absolute analysis)

[Chart: Matrix Multiplication 1D — running time (log scale) vs. number of processors (0-40), for matrix dimensions 100, 300, 700, 1000]

• Optimal solution: p = f(n)
Examples on GRID: Matrix multiplication cont.
Analysis 1D (relative analysis)

[Chart: Matrix Multiplication 1D — relative time (%) vs. number of processors (0-40), for matrix dimensions 100, 300, 700, 1000]

Computation time (for small dimensions) is small compared to communication time; hence the 1D parallel implementation is efficient only for very large matrix dimensions.
Examples on GRID: Matrix multiplication cont.
Analysis 2D (absolute analysis)

[Chart: Matrix Multiplication 2D — running time (log scale) vs. number of processors (1, 4, 9, 16, 25, 36), for matrix dimensions 100x100, 300x300, 700x700, 1000x1000]
Examples on GRID: Matrix multiplication cont.
Analysis 2D (relative analysis)

[Chart: Matrix Multiplication 2D — relative time (%) vs. number of processors (0-40), for matrix dimensions 100, 300, 700, 1000]
Examples on GRID: Matrix multiplication cont.
Comparison 1D - 2D

[Chart: running time (log scale) vs. number of processors (0-40), comparing 1D and 2D decompositions for matrix dimensions 100, 300, 700, 1000]
Examples on GRID: Graph algorithms
• Minimum Spanning Trees

[Figure: a weighted graph with 7 vertices; edge weights include 1, 2, 3, 5, 7, 8, 10, 15]

A graph with 7 vertices: the MST will have 6 edges, with the sum of their weights minimum.
Possible solutions: MST weight = 20
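As a reference point for the parallel measurements that follow, a sequential MST can be computed with Kruskal's algorithm; the sketch below (Python, run on a hypothetical 7-vertex graph, not necessarily the one pictured) returns the total MST weight.

```python
def mst_weight(n, edges):
    """Kruskal: scan edges by increasing weight, keep those joining two
    distinct components (union-find with path halving)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = picked = 0
    for w, u, v in sorted(edges):          # edges given as (weight, u, v)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
            picked += 1
    return total if picked == n - 1 else None   # None: graph not connected
```

On a connected 7-vertex graph the MST always has 6 edges; the function returns the minimum total weight.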
Examples on GRID: Graph algorithms cont.
• MST absolute analysis / • MST relative analysis

[Charts: Minimum Spanning Tree — running time (log scale) and relative time (%) vs. number of processors (1-25), for graphs with 500, 1000, 2000, 5000 vertices]

• Observe that the minimum value shifts to the right as the graph's dimension increases
• Observe the running-time improvement of one order of magnitude (compared to the sequential approach) for graphs with n = 5000
Examples on GRID: Graph algorithms cont.
• Transitive closure: source-partitioned
• Each vertex vi is assigned to a process, and each process computes the shortest paths from each vertex in its set to every other vertex in the graph, using the sequential algorithm (hence, data partitioning)
• Requires no inter-process communication
[Chart: Transitive closure, source-partitioned — running time (log scale) vs. number of processors (1-20), for graphs with 200, 500, 1000, 2000 vertices]
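The source-partitioned scheme can be sketched as follows (Python, illustrative; the worker loop is a serial stand-in for real processes): vertices are split among workers, each worker solves its own single-source problems with the sequential algorithm, and no data is exchanged between workers.

```python
from collections import deque

def reachable_from(adj, s):
    """Sequential single-source reachability (BFS)."""
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def transitive_closure(adj, n_workers):
    """Source-partitioned: each worker owns a slice of the sources and runs
    the sequential algorithm on them; no inter-worker communication."""
    vertices = list(adj)
    closure = {}
    for p in range(n_workers):            # each p could be a real process
        for s in vertices[p::n_workers]:  # this worker's sources
            closure[s] = reachable_from(adj, s)
    return closure
```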
Examples on GRID: Graph algorithms cont.
• Transitive closure: source-parallel
• Assigns each vertex to a set of processes and uses the parallel formulation of the single-source algorithm (similar to MST)
• Requires inter-process communication
[Chart: Transitive closure, source-parallel — running time (log scale) vs. number of processors (1-16), for graphs with 200, 500, 1000, 2000 vertices]
Examples on GRID: Graph algorithms cont.
• Graph isomorphism: exponential running time!
• Search whether 2 graphs match
• Trivial solution: generate all n! permutations of one graph and match each against the other graph
• The n! permutations split as n! = (n-1)! + (n-1)! + ... + (n-1)!, so each process is responsible for computing (n-1)! of them (data-parallel approach)
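The data-parallel split can be illustrated like this (Python, a sketch only, simulated serially): fixing the image of vertex 0 yields n independent chunks of (n-1)! permutations, one per process.

```python
from itertools import permutations

def edge_set(edges):
    return {frozenset(e) for e in edges}

def isomorphic(n, g1, g2):
    """Brute force over all n! vertex mappings, split into n chunks of
    (n-1)! by fixing where vertex 0 is sent (one chunk per process)."""
    e2 = edge_set(g2)
    if len(edge_set(g1)) != len(e2):
        return False
    for first in range(n):                    # one process's share
        rest = [v for v in range(n) if v != first]
        for tail in permutations(rest):       # (n-1)! candidate mappings
            m = dict(enumerate((first,) + tail))
            if {frozenset((m[u], m[v])) for u, v in g1} == e2:
                return True
    return False
```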
Examples on GRID: Graph algorithms cont.
[Chart: Graph Isomorphism — running time (logarithmic scale) vs. number of processors (0-14), for graphs with 9, 10, 11, 12 vertices]
Conclusions

• The need for parallel processing (why)
  Existence of problems that require a large amount of time to be solved (like NP-complete and NP-hard problems)
• The constraints in parallel processing (how)
  Dependence analysis: defines the boundaries of the problem
• The particularities of the solution give the amount of parallelism (when)
  The parabolic shape: more significant when large communication is involved
  Shift of the optimal number of processors
  How parallelization is done matters
  Time decreases by up to 2 orders of magnitude
Thank you for your attention!
Questions?