PARALLEL PROCESSING COMPARATIVE STUDY

Upload: baldwin-sims

Post on 18-Dec-2015


CONTEXT

How can a job be finished in a short time?

Solution

Use a faster worker.

Drawbacks:

A worker's speed has a limit

Inadequate for long jobs


CONTEXT

How can a calculation be finished in a short time?

Solution

Use a faster calculator (processor). [1960-2000]

Drawbacks:

Processor speed has reached a limit

Inadequate for long calculations

CONTEXT

How can a job be finished in a short time?

Solution

1. Use a faster worker (inadequate for long jobs)

2. Use more than one worker concurrently

CONTEXT

How can a calculation be finished in a short time?

Solution

1. Use a faster processor (inadequate for long calculations)

2. Use more than one processor concurrently

Parallelism


CONTEXT

Definition

Parallelism is the concurrent use of more than one processing unit (CPUs, processor cores, GPUs, or combinations of them) in order to carry out calculations more quickly


PROJECT GOAL

Parallelism needs

1. A parallel computer (more than one processor)

2. Adapting the calculation to the parallel computer

THE GOAL

Parallel Computer

Several parallel computers are available on the hardware market

They differ in their architecture

Several classifications exist:

Based on the instruction and data streams (Flynn classification)

Based on the degree of memory sharing

….

THE GOAL

Flynn Classification

A. Single Instruction, Single Data stream (SISD)

B. Single Instruction, Multiple Data streams (SIMD)

C. Multiple Instruction, Single Data stream (MISD)

D. Multiple Instruction, Multiple Data streams (MIMD)

THE GOAL

Memory Sharing Degree Classification

A. Shared memory

B. Distributed memory

C. Hybrid distributed-shared memory

THE GOAL

Parallelism needs

1. A parallel computer (more than one processor)

2. Adapting the calculation to the parallel computer

Dividing the calculation and data between the processors

Defining the execution scenario (how the processors cooperate)
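The two steps of adapting a calculation (dividing the work and data, then defining how the processors cooperate) can be sketched roughly as follows. This is only an illustration: the function names (`block_partition`, `partial_sum_of_squares`, `parallel_sum_of_squares`) and the use of a Python thread pool as a stand-in for processors are assumptions made for this example, not part of the study.

```python
from concurrent.futures import ThreadPoolExecutor


def block_partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous blocks of near-equal size."""
    base, extra = divmod(n_items, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # the first `extra` workers get one more item
        blocks.append(range(start, start + size))
        start += size
    return blocks


def partial_sum_of_squares(block):
    """Work done by one processor on its own share of the data."""
    return sum(i * i for i in block)


def parallel_sum_of_squares(n_items, n_workers=3):
    """Execution scenario: each worker computes a partial result, then the results are combined."""
    blocks = block_partition(n_items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, blocks))
    return sum(partials)
```

The thread pool here only illustrates the structure of the decomposition; it is not meant to demonstrate a speedup.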


THE GOAL

Adapting a calculation to a parallel computer

Is called parallel processing

Depends closely on the architecture


THE GOAL

Goal: A comparative study between

1. Shared Memory Parallel Processing approach

2. Distributed Memory Parallel Processing approach


PLAN

1. Distributed Memory Parallel Processing approach

2. Shared Memory Parallel Processing approach

3. Case study problems

4. Comparison results and discussion

5. Conclusion


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

Distributed-Memory Computer (DMC) = Distributed-Memory System (DMS) = Massively Parallel Processor (MPP)


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

• Architecture of distributed-memory computers


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

• Architecture of nodes

Nodes can be:

identical processors → Pure DMC

different types of processors → Hybrid DMC

different types of nodes with different architectures → Heterogeneous DMC


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

• Architecture of the interconnection network

No memory space is shared between nodes

The network is the only means of communication between nodes

Network performance directly influences the performance of a parallel program on a DMC

Network performance depends on:

1. Topology

2. Physical connectors (such as wires…)

3. Routing technique

The evolution of DMCs depends closely on the evolution of networking


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

The DMC used in our comparative study

• Heterogeneous DMC

• Modest cluster of workstations

• Three nodes:

• Sony laptop: i3 processor

• HP laptop: i3 processor

• HP laptop: Core 2 Duo processor

• Communication network: 100 MByte Ethernet


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

Parallel Software Development for DMC

The designer's main tasks:

1. Decomposition of the global calculation and assignment of tasks

2. Data decomposition

3. Definition of the communication scheme

4. Synchronization study


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

Parallel Software Development for DMC

Important considerations for efficiency:

1. Minimize Communication

2. Avoid barrier synchronization


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

Implementation environments

Several implementation environments exist:

PVM (Parallel Virtual Machine)

MPI (Message Passing Interface)


DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

MPI Application Anatomy

All the nodes execute the same code

Not all the nodes do the same work (an apparent contradiction)

This is possible using the SPMD application form

SPMD: ....

The processes are organized as one controller and several workers
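How the same code can lead to different work on each node can be sketched as below. This is a single-machine simulation that uses Python threads and queues in place of MPI processes and messages; it is not MPI code, and the names `spmd_program`, `run_spmd`, `rank` and `inboxes` are assumptions chosen for the illustration.

```python
import queue
import threading


def spmd_program(rank, size, inboxes, results):
    """The same program runs on every rank; the rank decides which role is played."""
    if rank == 0:
        # Controller: send one slice of the data to each worker, then collect partial sums.
        data = list(range(1, 13))
        workers = size - 1
        for w in range(1, size):
            inboxes[w].put(data[(w - 1)::workers])
        results["total"] = sum(inboxes[0].get() for _ in range(workers))
    else:
        # Worker: receive a slice, compute on it, send the partial result back.
        chunk = inboxes[rank].get()
        inboxes[0].put(sum(chunk))


def run_spmd(size=4):
    """Start `size` copies of the same program, one per simulated node."""
    inboxes = [queue.Queue() for _ in range(size)]
    results = {}
    threads = [
        threading.Thread(target=spmd_program, args=(rank, size, inboxes, results))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]
```

Every simulated node runs `spmd_program`, yet only rank 0 distributes and collects, which is the controller and workers organization described above.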


SHARED MEMORY PARALLEL PROCESSING APPROACH

Several shared-memory parallel computers (SMPC) are on the market

Multi-core PCs: Intel i3, i5, i7; AMD

Which SMPC do we use?

- The GPU, originally designed for image processing

- The GPU now: a domestic supercomputer

Characteristics:

• Cheapest and fastest shared-memory parallel computer

• Hard parallel design


SHARED MEMORY PARALLEL PROCESSING APPROACH

The GPU Architecture

The implementation environment


SHARED MEMORY PARALLEL PROCESSING APPROACH

GPU Architecture

Like a classical processing unit, the Graphics Processing Unit (GPU) is composed of two main components:

A. Calculation units

B. Storage unit


SHARED MEMORY PARALLEL PROCESSING

The GPU Architecture

The implementation environment

1. CUDA: for GPUs manufactured by NVIDIA

2. OpenCL: independent of the GPU architecture


SHARED MEMORY PARALLEL PROCESSING

CUDA Program Anatomy


SHARED MEMORY PARALLEL PROCESSING

Q: How are the code fragments to be parallelized executed on the GPU?

A: By calling a kernel

Q: What is a kernel?

A: A kernel is a function callable from the host and executed on the device by many threads in parallel
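The idea of one function body executed once per thread can be mimicked in plain Python. The sketch below is a sequential emulation, not CUDA: the `launch` helper, its `grid_dim` and `block_dim` parameters, and `vector_add_kernel` are hypothetical names that only imitate how a thread derives its global index from its block and thread indices.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by every (emulated) thread: each thread handles one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # guard: the last block may contain surplus threads
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulated kernel launch: run the kernel once per thread of every block, sequentially."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


def vector_add(a, b, block_dim=4):
    """Host-side code: allocate the output, choose the grid size, launch the kernel."""
    n = len(a)
    out = [0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
    launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
    return out
```

On a real device the iterations of the two loops in `launch` would run concurrently as threads rather than one after another.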

SHARED MEMORY PARALLEL PROCESSING

KERNEL LAUNCH

SHARED MEMORY PARALLEL PROCESSING

Design recommendations

Use the shared memory to reduce the time spent accessing the global memory.

Reduce the number of idle threads (control divergence) to make full use of the GPU's resources.
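The benefit of the first recommendation can be illustrated by counting memory reads. The sketch below is plain Python, not GPU code: the local lists `a_tile` and `b_tile` merely stand in for shared memory, and the function names and the `tile` parameter are assumptions made for this example. It counts how many matrix elements are read from the "global" matrices when tiles are staged once and reused.

```python
def global_loads_naive(n):
    """Reads from global memory when each of the n*n outputs reads a full row and column."""
    return 2 * n * n * n


def tiled_matmul(a, b, tile):
    """Multiply square matrices tile by tile, counting the elements staged from global memory."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    loads = 0
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):
                # Stage one tile of A and one tile of B (stand-in for shared memory).
                a_tile = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                loads += sum(len(r) for r in a_tile) + sum(len(r) for r in b_tile)
                # Every staged element is then reused by several outputs of the tile.
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = 0.0
                        for k in range(len(b_tile)):
                            acc += a_row[k] * b_tile[k][j]
                        c[bi + i][bj + j] += acc
    return c, loads
```

For a 4 × 4 problem with 2 × 2 tiles, the tiled version stages 64 elements where the naive count is 128, so in this model the number of global reads drops by a factor equal to the tile width.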


CASE STUDY PROBLEM

Square Matrix multiplication problem

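• ALGORITHM:

// Input: Two $n \times n$ matrices $A$ and $B$
// Output: Matrix $C = AB$
for $i \leftarrow 0$ to $n-1$ do
    for $j \leftarrow 0$ to $n-1$ do
        $C[i,j] \leftarrow 0$
        for $k \leftarrow 0$ to $n-1$ do
            $C[i,j] \leftarrow C[i,j] + A[i,k] \times B[k,j]$
return $C$

• Complexity: $n \times n \times n = n^3$ multiplications

In big-O notation, the complexity is $O(n^3)$.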
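A direct sequential version of the square matrix multiplication can be written in a few lines of Python. This is a minimal reference sketch, not the implementation measured in the study; the name `matrix_multiply` and the extra multiplication counter are assumptions added to make the cubic growth visible.

```python
def matrix_multiply(a, b):
    """Multiply two n x n matrices with three nested loops; also count the multiplications."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    multiplications = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                multiplications += 1
    return c, multiplications
```

Doubling the matrix size multiplies the counter by eight, which is the cubic behavior of the three nested loops.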


CASE STUDY PROBLEM

Pi approximation

• ALGORITHM: PiApprox($n$)

// Input: number of bins $n$
// Output: approximation of $\pi$
for each of the $n$ bins do
    …
return the approximation

• Complexity: $n$ iterations

In big-O notation, the complexity is $O(n)$.
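The slide does not show which quantity PiApprox accumulates for each bin. One common bin-based scheme, assumed here purely for illustration, is the midpoint rule applied to the integral of 4/(1 + x²) over [0, 1], whose value is π. The names `pi_approx` and `approximation_error` are hypothetical.

```python
import math


def pi_approx(bins):
    """Approximate pi by summing 4 / (1 + x^2) at the midpoint of each bin on [0, 1]."""
    width = 1.0 / bins
    total = 0.0
    for i in range(bins):
        x = (i + 0.5) * width  # midpoint of bin i
        total += 4.0 / (1.0 + x * x)
    return total * width


def approximation_error(bins):
    """Absolute difference between the approximation and math.pi."""
    return abs(pi_approx(bins) - math.pi)
```

Each bin contributes independently of the others, which is why this kind of loop is easy to split between threads or nodes: every worker sums its own bins and the partial sums are added at the end.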


COMPARISON

• Comparison criteria

• Analysis and conclusion


COMPARISON

Criterion 1: Time-Cost Factor

$TCF = PET \times HC$

$PET$: Parallel Execution Time (in milliseconds)

$HC$: Hardware Cost (in Saudi Arabian riyals, SAR)

The hardware costs ($HC$):

GPU: 5000 SAR

Cluster of workstations: 9630 SAR
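The Time-Cost Factor is simple to compute once a parallel execution time has been measured. The sketch below uses the two hardware costs quoted above; the function names and the execution times in the usage note are hypothetical and are not measurements from the study.

```python
# Hardware costs in SAR, as quoted on the slide.
HARDWARE_COST_SAR = {"gpu": 5000, "cluster": 9630}


def time_cost_factor(parallel_time_ms, platform):
    """TCF = PET * HC for the given platform ("gpu" or "cluster")."""
    return parallel_time_ms * HARDWARE_COST_SAR[platform]


def better_platform(gpu_time_ms, cluster_time_ms):
    """Return the platform with the lower (better) Time-Cost Factor."""
    gpu_tcf = time_cost_factor(gpu_time_ms, "gpu")
    cluster_tcf = time_cost_factor(cluster_time_ms, "cluster")
    return "gpu" if gpu_tcf <= cluster_tcf else "cluster"
```

With these costs, the cluster only wins on this criterion when its execution time is below roughly 52% of the GPU's (5000 / 9630); for example, `better_platform(2.0, 1.5)` still favors the GPU.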


COMPARISON

[Figure: Time-Cost Factor for the matrix multiplication problem. Series: GPU, cluster. x-axis: matrix size; y-axis: TCF.]

[Figure: Time-Cost Factor for the Pi approximation problem. Series: GPU, cluster. x-axis: number of bins; y-axis: TCF.]


COMPARISON

Conclusion:

The GPU is better when we need to perform a large number of calculations that each involve few iterations.

However, when we need to perform a calculation with a large number of iterations, the cluster of workstations is the better choice.


COMPARISON

Criterion 2: Required memory

Matrix multiplication problem

Graphics Processing Unit

The global-memory-based method requires:

Total required memory $= 6 \times n \times n \times \mathrm{sizeof}(\mathrm{float})$

The shared-memory-based method requires:

Total required memory $= 8 \times n \times n \times \mathrm{sizeof}(\mathrm{float})$

Cluster of workstations

The cluster used contains three nodes

Total required memory $= \frac{19}{3} \times n \times n \times \mathrm{sizeof}(\mathrm{float})$
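To get a feel for these three requirements, they can be evaluated for a concrete matrix size. The sketch below assumes that `sizeof(float)` is 4 bytes, which is typical but not stated in the slides; the dictionary keys and the function name are hypothetical.

```python
from fractions import Fraction

FLOAT_BYTES = 4  # assumption: a 4-byte single-precision float

# Multipliers of n * n * sizeof(float), taken from the three formulas above.
MEMORY_FACTORS = {
    "gpu_global": Fraction(6),
    "gpu_shared": Fraction(8),
    "cluster_3_nodes": Fraction(19, 3),
}


def required_memory_bytes(n):
    """Total required memory in bytes for an n x n problem, for each method."""
    return {
        name: float(factor * n * n * FLOAT_BYTES)
        for name, factor in MEMORY_FACTORS.items()
    }
```

For n = 1000, this gives about 24 MB for the global-memory method, 32 MB for the shared-memory method, and about 25.3 MB in total across the three-node cluster.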


COMPARISON

Criterion 2: Required memory

Pi approximation problem

• Graphics Processing Unit

The size of the arrays used depends on the number of threads used

Required memory $= 2 \times \text{number of threads} \times \mathrm{sizeof}(\mathrm{double})$

• Cluster of workstations

A small amount of memory is used on each node: almost $15 \times \mathrm{sizeof}(\mathrm{double})$


COMPARISON

Criterion 2: Required memory

Conclusion:

We cannot judge which parallel approach is better on the required-memory criterion. This criterion depends on the intrinsic characteristics of the problem at hand.


COMPARISON

Criterion 3: The gap between the theoretical complexity and the effective complexity

• The gap between the theoretical complexity and the effective complexity is calculated by:

$Gap = \left(\dfrac{EPT}{TPT} - 1\right) \times 100$

$EPT$: Experimental Parallel Time

$TPT$: Theoretical Parallel Time

$TPT = \dfrac{ST}{N}$

$ST$: Sequential Time

$N$: Number of processing units
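The two formulas combine into a short calculation. The sketch below follows the definitions above term by term; the function names are hypothetical and the timings in the usage note are made-up values, not results from the study.

```python
def theoretical_parallel_time(sequential_time, n_units):
    """TPT = ST / N: the ideal time if the work were split perfectly over N units."""
    return sequential_time / n_units


def gap_percent(experimental_parallel_time, sequential_time, n_units):
    """Gap = ((EPT / TPT) - 1) * 100, in percent."""
    tpt = theoretical_parallel_time(sequential_time, n_units)
    return (experimental_parallel_time / tpt - 1.0) * 100.0
```

For example, with a sequential time of 30 ms on 3 units the theoretical parallel time is 10 ms; a measured 15 ms gives a gap of 50%, a measured 10 ms gives 0%, and a measured 5 ms gives a negative gap of -50%, meaning the run was faster than the theoretical expectation.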

COMPARISON

Criterion 3: The gap between the theoretical complexity and the effective complexity

CLUSTER OF WORKSTATIONS

[Figure: The gap between the theoretical complexity and the effective complexity for the matrix multiplication problem, cluster of workstations. x-axis: matrix size; y-axis: gap.]

[Figure: The gap between the theoretical complexity and the effective complexity for the Pi approximation problem, cluster of workstations. x-axis: bins; y-axis: gap.]

COMPARISON

Criterion 3: The gap between the theoretical complexity and the effective complexity

GRAPHICS PROCESSING UNIT

[Figure: The gap between the theoretical complexity and the effective complexity for the matrix multiplication problem, GPU. x-axis: matrix size; y-axis: gap.]

[Figure: The gap between the theoretical complexity and the effective complexity for the Pi approximation problem, GPU. x-axis: bins; y-axis: gap.]


COMPARISON

Criterion 3: The gap between the theoretical complexity and the effective complexity

Conclusion

On the GPU, the measured execution time of a parallel program can be lower than the theoretically expected time. This is impossible to achieve on a cluster of workstations because of the communication overhead.

To minimize the gap, or keep it constant, on the cluster of workstations, the designer has to keep the number and sizes of the communicated messages as constant as possible when the problem size increases.


COMPARISON

Criterion 4: Efficiency

$Efficiency = \dfrac{ST}{PT \times N}$

$ST$: Sequential Time.

$PT$: Parallel Time.

$N$: Number of processing units

COMPARISON

CRITERION 4: EFFICIENCY

[Figure: Efficiency for the matrix multiplication problem. Series: cluster, GPU. x-axis: matrix size; y-axis: efficiency.]

[Figure: Efficiency for the Pi approximation problem. Series: cluster, GPU. x-axis: number of bins; y-axis: efficiency.]


COMPARISON

Criterion 4: Efficiency

• Conclusion: The efficiency (speedup) is much better on the GPU than on the cluster of workstations.

COMPARISON

IMPORTANT NOTE

[Figure: Sequential solution of the matrix multiplication problem, time in ms, one process (CPU) versus one thread (GPU), for matrix sizes 32×32, 128×128, 512×512, 1000×1000 and 1805×1805.]

[Figure: Sequential solution of the Pi approximation problem, time in ms, one process (CPU) versus one thread (GPU), for 100, 1000 and 10000 bins.]


COMPARISON

• Criterion 5: Difficulty of development

• CUDA

• MPI


COMPARISON

• Criterion 6: Necessary hardware and software

• GPU (NVIDIA GT 525M)

• Cluster of workstations (3 PCs, switch, internet modem and cables)


CONCLUSION


Parallel Processing Comparative Study

GPUs and clusters are the two main components of the world's fastest computers (such as Shahin)

To compare, we use: two different problems (matrix multiplication and Pi approximation) and six measurement criteria

| Shared Memory Parallel Processing Approach | Distributed Memory Parallel Processing Approach |
|---|---|
| Graphics Processing Unit (GPU) | Cluster of workstations |
| More adequate for data-level parallelism | More adequate for task-level parallelism |
| A large number of small calculations | One big calculation |
| Memory requirement ~ problem characteristics | Memory requirement ~ problem characteristics |
| Run time can be better than expected | A null or negative gap is impossible |
| Complicated design and programming | Less complicated design and programming |
| Implementation environment very practical | Implementation environment complicated |
