    MODULE 5 PARALLEL PROCESSING

    What is Parallel Computing?

Traditionally, software has been written for serial computation:

o To be run on a single computer having a single Central Processing Unit (CPU);
o A problem is broken into a discrete series of instructions.

o Instructions are executed one after another.
o Only one instruction may execute at any moment in time.

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:

    o To be run using multiple CPUs

    o A problem is broken into discrete parts that can be solved concurrently

    o Each part is further broken down to a series of instructions

    o Instructions from each part execute simultaneously on different CPUs

The compute resources can include:

    o A single computer with multiple processors;

    o An arbitrary number of computers connected by a network;

    o A combination of both.

    The computational problem usually demonstrates characteristics such as the ability to be:


    o Broken apart into discrete pieces of work that can be solved simultaneously;

o Executed as multiple program instructions at any moment in time;

o Solved in less time with multiple compute resources than with a single compute resource.

    DISADVANTAGES:

1. Cache coherence problem:

When multiple processors are connected to form a high-performance system, a cache coherence problem can arise. The processors in such a system may have their own cache memories; when a processor updates its own cache, the change is not immediately reflected in main memory or in the other caches, which leads to the cache coherence problem. It can be handled with a write-through or write-back policy, or with the MESI (Modified, Exclusive, Shared, Invalid) protocol.
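As a rough illustration, the per-line state changes of the MESI protocol can be written down as a small lookup table. The sketch below is a simplified model only (the event names are illustrative, and write-backs, bus arbitration, and actual data transfer are ignored):

```python
# Simplified MESI state machine for one cache line in one cache.
# Events: local processor reads/writes, and snooped bus traffic from other caches.
MESI_NEXT = {
    # (current state, event) -> next state
    ("I", "local_read_miss_shared"):    "S",  # another cache already holds the line
    ("I", "local_read_miss_exclusive"): "E",  # no other cache holds the line
    ("I", "local_write"):               "M",  # read-for-ownership, others invalidated
    ("S", "local_read"):                "S",
    ("S", "local_write"):               "M",  # invalidate the other sharers
    ("E", "local_read"):                "E",
    ("E", "local_write"):               "M",  # silent upgrade, no bus traffic needed
    ("M", "local_read"):                "M",
    ("M", "local_write"):               "M",
    # Snooped requests from other processors:
    ("M", "bus_read"):  "S",  # supply the data, keep a shared copy
    ("E", "bus_read"):  "S",
    ("S", "bus_read"):  "S",
    ("M", "bus_write"): "I",  # another cache wants ownership
    ("E", "bus_write"): "I",
    ("S", "bus_write"): "I",
    ("I", "bus_read"):  "I",
    ("I", "bus_write"): "I",
}

def next_state(state, event):
    return MESI_NEXT[(state, event)]

# Example: a line is read (exclusively), written locally, then another CPU reads it.
s = "I"
for ev in ["local_read_miss_exclusive", "local_write", "bus_read"]:
    s = next_state(s, ev)
    print(ev, "->", s)   # I -> E -> M -> S
```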

PIPELINE:

Pipelining is a CPU implementation technique in which the execution of multiple instructions is overlapped.

Time in clock cycles:

Clock number       1    2    3    4    5    6    7    8    9
Instruction I      IF   ID   EX   MEM  WB
Instruction I+1         IF   ID   EX   MEM  WB
Instruction I+2              IF   ID   EX   MEM  WB
Instruction I+3                   IF   ID   EX   MEM  WB
Instruction I+4                        IF   ID   EX   MEM  WB

MIPS Pipeline Stages:

IF = Instruction Fetch: fetch the instruction from the Instruction Memory
ID = Instruction Decode
EX = Execution
MEM = Memory Access: read the data from the Data Memory
WB = Write Back
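The timing table above can be reproduced with a few lines of code. The sketch below assumes an ideal five-stage pipeline with no stalls and simply computes which stage each instruction occupies in each clock cycle:

```python
# Print the stage occupied by each instruction in each clock cycle of an
# ideal 5-stage pipeline (one new instruction issued per cycle, no hazards).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(num_instructions):
    total_cycles = num_instructions + len(STAGES) - 1
    header = "cycle:".ljust(18) + "".join(f"{c:>5}" for c in range(1, total_cycles + 1))
    print(header)
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage            # instruction i enters stage s at cycle i+s+1
        print(f"instruction I+{i}".ljust(18) + "".join(f"{cell:>5}" for cell in row))

pipeline_table(5)   # reproduces the table above: 5 instructions finish in 9 cycles
```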


Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask is performed by a given functional unit. The units are connected in a serial fashion and all of them operate simultaneously. The use of pipelining improves performance compared to the traditional sequential execution of tasks. The figure above illustrates the basic difference between executing four subtasks of a given instruction (in this case fetching F, decoding D, execution E, and writing the results W) using pipelining and sequential processing.

Pipeline vs. sequential operation:

It is clear from the figure that the total time required to process three instructions (I1, I2, I3) is only six time units if four-stage pipelining is used, compared to 12 time units if sequential processing is used. A possible saving of up to 50% in the execution time of these three instructions is obtained.
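The 6-versus-12 figure follows from the usual timing formulas: a k-stage pipeline finishes n tasks in k + (n - 1) stage-time units, while purely sequential execution needs n*k. A quick check (illustrative code):

```python
# Time to finish n tasks on a k-stage pipeline vs. purely sequential execution,
# measured in stage-time units and assuming no stalls.
def pipelined_time(n, k):
    return k + (n - 1)      # first task takes k units, then one finishes per unit

def sequential_time(n, k):
    return n * k

n, k = 3, 4
print(pipelined_time(n, k))                            # 6 time units
print(sequential_time(n, k))                           # 12 time units
print(sequential_time(n, k) / pipelined_time(n, k))    # speedup = 2.0 (50% saving)
```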

Pipeline Hazards:

Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during its designated clock cycle, possibly resulting in one or more stall (or wait) cycles.

Hazards reduce the ideal speedup gained from pipelining (they increase CPI above 1) and are classified into three classes:

Structural hazards: arise from hardware resource conflicts when the available hardware cannot support all possible combinations of instructions.


Data hazards: arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline (see the sketch after this list).

Control hazards: arise from the pipelining of conditional branches and other instructions that change the PC.
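As a toy example of a data hazard, consider an instruction that reads a register written by the instruction just before it. The sketch below scans a short instruction sequence for such read-after-write dependences and counts the stall cycles, assuming a classic five-stage pipeline with in-order issue, no forwarding hardware, and a register file that can be written and read in the same cycle (the instruction encoding used here is illustrative):

```python
# Toy RAW (read-after-write) hazard check. Under the assumptions above, a
# dependent instruction issued d positions after the producer needs
# max(0, 3 - d) stall cycles.
program = [
    ("r1", ["r2", "r3"]),   # add r1, r2, r3
    ("r4", ["r1", "r5"]),   # sub r4, r1, r5   <- reads r1 right after it is produced
    ("r6", ["r1", "r4"]),   # and r6, r1, r4
]

def raw_stalls(program):
    total = 0
    for i, (_, sources) in enumerate(program):
        needed = 0
        for j in range(max(0, i - 2), i):           # only the two previous instructions matter
            dest, _ = program[j]
            if dest in sources:
                print(f"RAW hazard: instruction {i} reads {dest} written by instruction {j}")
                needed = max(needed, 3 - (i - j))   # distance 1 -> 2 stalls, distance 2 -> 1
        total += needed
    return total

print("total stall cycles:", raw_stalls(program))   # 2 (for r1) + 2 (for r4) = 4
```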

Pipeline Stall Due to Instruction Dependency:

Correct operation of a pipeline requires that the operation performed by a stage MUST NOT depend on the operation(s) performed by other stage(s).

Instruction dependency refers to the case where fetching an instruction depends on the results of executing a previous instruction.

Instruction dependency manifests itself in the execution of a conditional branch instruction.

Consider, for example, the case of a branch-if-negative instruction. In this case, the next instruction to fetch will not be known until the result of executing that branch-if-negative instruction is known.

FLYNN'S CLASSIFICATION OF COMPUTERS:

The idea of using multiple processors both to increase performance and to improve availability dates back to the earliest electronic computers. About 30 years ago, Flynn proposed a simple model for categorizing all computers that is still useful today. He looked at the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor, and placed all computers in one of four categories:

SISD (single instruction stream, single data stream): the uniprocessor.

SIMD (single instruction stream, multiple data streams). Examples: array processors and vector processors.

MIMD (multiple instruction streams, multiple data streams). Examples: SMP and NUMA processors.

MISD (multiple instruction streams, single data stream): commercially not implemented.

    SISD:


It uses only one control unit and one processing unit. Normal uniprocessors fall in this category.

    SIMD:

The same instruction is executed by multiple processors using different data streams. Each processor has its own data memory (hence multiple data), but there is a single instruction memory and control processor, which fetches and dispatches instructions. Each processor works only on the portion of the data that is assigned to it. Of course, the processors may need to communicate periodically in order to exchange data.

A data parallel algorithm consists of a sequence of elementary instructions applied to the data: an instruction is initiated only after the previous instruction has ended. SIMD execution follows this model, where the code is identical on all processors.
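As a software analogy (not the SIMD hardware itself), an element-wise NumPy operation behaves like the SIMD model: one operation is applied to many data elements at once, while the plain loop below applies it to one element at a time. The example is illustrative:

```python
import numpy as np

# Serial version: one instruction stream handles one data element at a time.
def scale_serial(data, factor):
    out = []
    for x in data:              # one multiply per loop iteration
        out.append(x * factor)
    return out

# Data-parallel (SIMD-style) version: the same multiply is applied to every
# element of the array in one operation.
def scale_simd(data, factor):
    return np.asarray(data) * factor

data = [1.0, 2.0, 3.0, 4.0]
print(scale_serial(data, 10))   # [10.0, 20.0, 30.0, 40.0]
print(scale_simd(data, 10))     # [10. 20. 30. 40.]
```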

    MIMD:

Each processor fetches its own instructions and operates on its own data. The processors are often off-the-shelf microprocessors.

MIMDs offer flexibility. With the correct hardware and software support, MIMDs can function as single-user multiprocessors focusing on high performance for one application, as multi-programmed multiprocessors running many tasks simultaneously, or as some combination of these functions.

MIMDs can build on the cost/performance advantages of off-the-shelf microprocessors. In fact, nearly all multiprocessors built today use the same microprocessors found in workstations and single-processor servers. MIMD machines come in two forms: shared memory and distributed memory.
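A rough software illustration of the MIMD idea, using Python's multiprocessing module: the two worker processes below run different instruction streams on different data at the same time (the worker functions are illustrative):

```python
from multiprocessing import Process, Queue

# Two different instruction streams (functions) operating on different data.
def sum_worker(numbers, out):
    out.put(("sum", sum(numbers)))

def max_worker(numbers, out):
    out.put(("max", max(numbers)))

if __name__ == "__main__":
    results = Queue()
    p1 = Process(target=sum_worker, args=([1, 2, 3, 4], results))
    p2 = Process(target=max_worker, args=([10, 7, 42, 3], results))
    p1.start(); p2.start()                 # both processes run simultaneously
    p1.join(); p2.join()
    print(results.get(), results.get())    # order may vary between runs
```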


Examples: SMP and NUMA processors (shared-memory MIMD); clusters (distributed-memory MIMD).

SMP: This is referred to as symmetric multiprocessing. It provides uniform memory access: the processors in the system communicate with the memory through an interconnection network.

o Two or more similar processors of comparable capacity
o Processors share the same memory and I/O
o Processors are connected by a bus or other internal connection
o Memory access time is approximately the same for each processor
o All processors share access to I/O

Advantages:

o Performance: if some of the work can be done in parallel, an SMP yields better performance than a single processor of the same type.
o Availability: since all processors can perform the same functions, failure of a single processor does not halt the system.


o Incremental growth: a user can enhance performance by adding additional processors.
o Scaling: vendors can offer a range of products based on the number of processors.

NUMA:

NUMA stands for non-uniform memory access.

o Access times to different regions of memory may differ.
o The system consists of several processors, each connected to its own memory element.
o Any processor can communicate with any other memory element through the interconnection network.
o The access time therefore differs depending on whether a processor accesses its own immediate memory or a remote memory through the interconnection network.

Distributed memory:

A collection of independent uniprocessors or SMPs, interconnected to form a cluster. Communication is via fixed paths or network connections, usually a proprietary high-speed communication network. Data are exchanged between nodes as messages over the network.

MISD:

No commercial multiprocessor of this type has been built to date, but one may be built in the future.

[Figure: processing elements PE1 ... PEn connected through an interconnection network to memory elements ME1 ... MEn]


    Some special purpose stream processors approximate a limited form of this (there is only

    a single data stream that is operated on by successive functional units).

INTERCONNECTION NETWORKS:

According to the mode of operation, INs are classified as synchronous versus asynchronous. In the synchronous mode of operation, a single global clock is used by all components in the system, so that the whole system operates in a lockstep manner. The asynchronous mode of operation, on the other hand, does not require a global clock.

According to the control strategy, INs can be classified as centralized versus decentralized. In centralized control systems, a single central control unit is used to oversee and control the operation of the components of the system. In decentralized control, the control function is distributed among the different components in the system.

According to their topology, INs are classified as static versus dynamic networks. In dynamic networks, connections among inputs and outputs are made using switching elements; depending on the switch settings, different interconnections can be established. In static networks, direct fixed paths exist between nodes, and there are no switching elements.

Some interconnection types are shown below.


Linear Array

N nodes, N-1 edges
Node degree: 2 (1 at the two end nodes)
Diameter: N-1
Cost (number of links): N-1
Fault tolerance: none; a single faulty node or link disconnects the network


Ring

N nodes, N edges
Node degree: 2
Diameter: floor(N/2)
Cost (number of links): N
Fault tolerance: a single faulty link reduces the ring to a linear array

Mesh and Torus

Mesh (n x n, N = n*n nodes):
Node degree: 4 (internal nodes), 3 (edge nodes), 2 (corner nodes)
Diameter: 2(n-1)

Torus (n x n, N = n*n nodes):
Node degree: 4
Diameter: 2*floor(n/2)
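The degree and diameter figures above, together with those for the linear array and the ring, can be collected into one small helper (illustrative code; N is the number of nodes and n = sqrt(N) for the mesh and torus):

```python
import math

# Maximum node degree and diameter for the static topologies discussed above.
def degree_and_diameter(topology, N):
    n = math.isqrt(N)                           # side length for mesh/torus
    if topology == "linear_array":
        return 2, N - 1
    if topology == "ring":
        return 2, N // 2
    if topology == "mesh":
        return 4, 2 * (n - 1)
    if topology == "torus":
        return 4, 2 * (n // 2)
    if topology == "hypercube":
        d = int(math.log2(N))
        return d, d
    raise ValueError(topology)

for topo in ["linear_array", "ring", "mesh", "torus", "hypercube"]:
    print(topo, degree_and_diameter(topo, 16))
# linear_array (2, 15), ring (2, 8), mesh (4, 6), torus (4, 4), hypercube (4, 4)
```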


Hypercubes

N = 2^d nodes, where d is the number of dimensions (d = log2 N).
A cube of dimension d is made out of two cubes of dimension d-1.
The topology is symmetric: it looks the same from every node.
Properties of interest: degree, diameter, cost, and fault tolerance.
Node labeling: each node is labeled with a d-bit binary number.

[Figure: hypercubes of dimension d = 0, 1, 2, 3, with each node labeled by a d-bit binary number]


Hypercube of dimension d:

N = 2^d nodes (d = log2 N)
Node degree = d
Number of bits to label a node = d
Diameter = d
Number of edges = N*d/2
Routing: the minimum number of hops between two nodes equals the Hamming distance between their d-bit labels.
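The Hamming-distance remark means that a message can be routed by flipping, one dimension at a time, the bits in which the source and destination labels differ. A minimal sketch of this dimension-order (e-cube) routing, with illustrative function names:

```python
# E-cube (dimension-order) routing in a d-dimensional hypercube:
# repeatedly flip the lowest differing bit between the current node and the
# destination. The path length equals the Hamming distance of the two labels.
def hypercube_route(src, dst, d):
    path = [src]
    current = src
    for bit in range(d):                    # visit dimensions in a fixed order
        if (current ^ dst) & (1 << bit):    # labels still differ in this bit
            current ^= (1 << bit)           # move along that dimension
            path.append(current)
    return path

d = 3
route = hypercube_route(0b000, 0b110, d)
print([format(node, f"0{d}b") for node in route])   # ['000', '010', '110']
```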

Cross bar networks:

A crossbar network connects N inputs to N outputs through an N x N grid of switching points, so any input can be connected to any idle output; the number of switching points grows as N^2.


Dynamic Networks

The basic building block is the 2x2 switching element (SE). It has four different settings: straight, exchange, upper broadcast, and lower broadcast.

Multi-stage network:

[Figure: an 8x8 multi-stage interconnection network built from 2x2 switching elements, connecting inputs 000-111 to outputs 000-111]
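Assuming the multi-stage network in the figure is the usual omega (perfect-shuffle) network built from 2x2 switching elements, it can be self-routed by destination-tag routing: at stage i the message leaves the switch through the upper output if bit i of the destination address (counting from the most significant bit) is 0, and through the lower output if it is 1. A minimal sketch under that assumption:

```python
# Destination-tag routing in an 8x8 omega network (3 stages of 2x2 switching
# elements). Each stage applies a perfect shuffle of the links, then the switch
# forwards the message to its upper (0) or lower (1) output according to the
# corresponding destination bit, most significant bit first.
def omega_route(src, dst, n_bits=3):
    node = src
    ports = []
    for stage in range(n_bits):
        node = ((node << 1) | (node >> (n_bits - 1))) & ((1 << n_bits) - 1)  # perfect shuffle
        out_bit = (dst >> (n_bits - 1 - stage)) & 1       # destination tag for this stage
        node = (node & ~1) | out_bit                      # exit on upper (0) or lower (1) port
        ports.append("lower" if out_bit else "upper")
    return node, ports

final, ports = omega_route(0b010, 0b101)
print(format(final, "03b"), ports)    # '101' ['lower', 'upper', 'lower']
```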
