MODULE 5 PARALLEL PROCESSING
TRANSCRIPT
8/8/2019 - SNGCE
What is Parallel Computing?
Traditionally, software has been written for serial computation:
o To be run on a single computer having a single Central Processing Unit (CPU);
o A problem is broken into a discrete series of instructions;
o Instructions are executed one after another;
o Only one instruction may execute at any moment in time.
In the simplest sense, parallel computing is the simultaneous use of multiple compute
resources to solve a computational problem:
o To be run using multiple CPUs
o A problem is broken into discrete parts that can be solved concurrently
o Each part is further broken down to a series of instructions
o Instructions from each part execute simultaneously on different CPUs
The compute resources can include:
o A single computer with multiple processors;
o An arbitrary number of computers connected by a network;
o A combination of both.
The computational problem usually demonstrates characteristics such as the ability to be:
o Broken apart into discrete pieces of work that can be solved simultaneously;
o Able to execute multiple program instructions at any moment in time;
o Solved in less time with multiple compute resources than with a single compute resource.
DISADVANTAGES:
1. Cache coherence problem:
When multiple processors are connected to form a high-performance system, there is a
possibility of a cache coherence problem. The processors in such a system may have their
own cache memories; when a processor updates its own cache, the change is not reflected in
main memory or in the other caches, which leads to the cache coherence problem. This can
be overcome by a write-through or write-back protocol, or by the MESI (Modified,
Exclusive, Shared, Invalid) protocol.
PIPELINE:
Pipelining is a CPU implementation technique in which operations on multiple
instructions are overlapped.
Time in clock cycles:
Clock number       1    2    3    4    5    6    7    8    9
Instruction I      IF   ID   EX   MEM  WB
Instruction I+1         IF   ID   EX   MEM  WB
Instruction I+2              IF   ID   EX   MEM  WB
Instruction I+3                   IF   ID   EX   MEM  WB
Instruction I+4                        IF   ID   EX   MEM  WB
MIPS Pipeline Stages:
IF = Instruction Fetch - fetch the instruction from the Instruction Memory
ID = Instruction Decode
EX = Execution
MEM = Memory Access - read the data from the Data Memory
WB = Write Back
Pipelining refers to the technique in which a given task is divided into a number of
subtasks that need to be performed in sequence. Each subtask is performed by a given
functional unit. The units are connected in a serial fashion and all of them operate
simultaneously. The use of pipelining improves performance compared to the traditional
sequential execution of tasks. The figures above illustrate the basic difference between
executing four subtasks of a given instruction (in this case fetching F, decoding D,
executing E, and writing the results W) using pipelining and sequential processing.
Pipeline vs sequential operation:
It is clear from the figure that the total time required to process three instructions
(I1, I2, I3) is only six time units if four-stage pipelining is used, compared to 12 time
units if sequential processing is used. A saving of up to 50% in the execution time of
these three instructions is obtained.
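The timing arithmetic above can be sketched as a short calculation (a minimal sketch; the
stage count and instruction count are taken from the example in the text):

```python
def pipelined_cycles(n_instructions, n_stages):
    # The first instruction takes n_stages cycles; each subsequent
    # instruction completes one cycle after the one before it.
    return n_stages + (n_instructions - 1)

def sequential_cycles(n_instructions, n_stages):
    # Without overlap, every instruction pays the full stage cost.
    return n_instructions * n_stages

# The example from the text: 3 instructions on a 4-stage pipeline.
print(pipelined_cycles(3, 4))   # 6 time units
print(sequential_cycles(3, 4))  # 12 time units
```

As the instruction count grows, the pipelined time approaches one instruction per cycle,
which is where the ideal speedup of (number of stages) comes from.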
Pipeline Hazards:
Hazards are situations in pipelining which prevent the next instruction in the instruction
stream from executing during its designated clock cycle, possibly resulting in one or more
stall (or wait) cycles. Hazards reduce the ideal speedup gained from pipelining (increasing
CPI above 1) and are classified into three classes:
Structural hazards: Arise from hardware resource conflicts when the available hardware
cannot support all possible combinations of instructions.
Data hazards: Arise when an instruction depends on the result of a previous instruction in
a way that is exposed by the overlapping of instructions in the pipeline.
Control hazards: Arise from the pipelining of conditional branches and other instructions
that change the PC.
Pipeline Stall Due to Instruction Dependency:
Correct operation of a pipeline requires that the operation performed by a stage MUST NOT
depend on the operation(s) performed by other stage(s).
Instruction dependency refers to the case whereby fetching of an instruction depends on
the results of executing a previous instruction.
Instruction dependency manifests itself in the execution of a conditional branch
instruction.
Consider, for example, the case of a branch-if-negative instruction. In this case, the
next instruction to fetch will not be known until the result of executing that
branch-if-negative instruction is known.
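A rough way to see the cost of this dependency: if every conditional branch stalls
instruction fetch until the branch resolves, each branch adds a fixed number of stall
cycles on top of the ideal pipelined time. This is a simplified model with hypothetical
numbers, not taken from the text:

```python
def cycles_with_branch_stalls(n_instructions, n_stages,
                              n_branches, stall_per_branch):
    # Ideal pipelined time plus a fixed stall penalty per branch,
    # since the next fetch must wait for the branch outcome.
    ideal = n_stages + (n_instructions - 1)
    return ideal + n_branches * stall_per_branch

# 10 instructions on a 5-stage pipeline with 2 branches, each
# stalling fetch for 3 cycles (hypothetical figures).
print(cycles_with_branch_stalls(10, 5, 2, 3))  # 20 cycles instead of the ideal 14
```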
FLYNN'S CLASSIFICATION OF COMPUTERS:
The idea of using multiple processors both to increase performance and to improve
availability dates back to the earliest electronic computers. About 30 years ago, Flynn
proposed a simple model for categorizing all computers that is still useful today. He
looked at the parallelism in the instruction and data streams called for by the
instructions at the most constrained component of the multiprocessor, and placed all
computers in one of four categories:
o SISD (Single Instruction stream, Single Data stream): the uniprocessor.
o SIMD (Single Instruction stream, Multiple Data stream). Examples: array processors and
vector processors.
o MIMD (Multiple Instruction stream, Multiple Data stream). Examples: SMP and NUMA
processors.
o MISD (Multiple Instruction stream, Single Data stream): commercially not implemented.
SISD:
It uses only one control unit and one processing unit. Normal uniprocessors fall into
this category.
SIMD:
The same instruction is executed by multiple processors using different data streams. Each
processor has its own data memory (hence multiple data), but there is a single instruction
memory and control processor, which fetches and dispatches instructions. Each processor
works only on the portion of the data that is assigned to it. Of course, the processors
may need to communicate periodically in order to exchange data.
A data-parallel algorithm consists of a sequence of elementary instructions applied to the
data: an instruction is initiated only when the previous instruction has ended.
Single-Program Multiple-Data (SPMD) follows this model, where the code is identical on all
processors.
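The SIMD idea - one instruction stream applied to many data elements, with each processor
owning a slice of the data - can be mimicked in plain Python. This is only a sketch of the
partitioning; real SIMD hardware applies the operation to all elements in lockstep:

```python
def simd_apply(instruction, data, n_processors):
    # Partition the data so each "processor" owns a contiguous slice.
    chunk = (len(data) + n_processors - 1) // n_processors
    slices = [data[i * chunk:(i + 1) * chunk] for i in range(n_processors)]
    # Every processor runs the identical instruction on its own slice.
    results = [[instruction(x) for x in s] for s in slices]
    # Gather the partial results back into one sequence.
    return [x for s in results for x in s]

doubled = simd_apply(lambda x: 2 * x, [1, 2, 3, 4, 5, 6, 7, 8], 4)
print(doubled)  # [2, 4, 6, 8, 10, 12, 14, 16]
```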
MIMD:
Each processor fetches its own instructions and operates on its own data. The processors
are often off-the-shelf microprocessors.
MIMDs offer flexibility. With the correct hardware and software support, MIMDs can
function as single-user multiprocessors focusing on high performance for one application,
as multi-programmed multiprocessors running many tasks simultaneously, or as some
combination of these functions.
MIMDs can build on the cost/performance advantages of off-the-shelf microprocessors. In
fact, nearly all multiprocessors built today use the same microprocessors found in
workstations and single-processor servers. MIMD comes in two forms: shared memory and
distributed memory.
Examples: SMP and NUMA processors - shared memory MIMD;
clusters - distributed memory MIMD.
SMP: This is referred to as a Symmetric Multiprocessor. It has uniform access to memory.
The processors in this system communicate with the memory through interconnection
networks.
o Two or more similar processors of comparable capacity
o Processors share the same memory and I/O
o Processors are connected by a bus or other internal connection
o Memory access time is approximately the same for each processor
o All processors share access to I/O
Advantages:
o Performance: if some work can be done in parallel
o Availability: since all processors can perform the same functions, failure of a single
processor does not halt the system
[Figure: processing elements PE1 ... PEn connected through an interconnection network to
shared memory]
o Incremental growth: users can enhance performance by adding additional processors
o Scaling: vendors can offer a range of products based on the number of processors
NUMA:
This is called a Non-Uniform Memory Access processor.
o Access times to different regions of memory may differ.
o It consists of several processors connected to their corresponding memory elements.
o Any processor can communicate with any other memory element through interconnection
networks.
o The access time thus differs when a processor accesses its own immediate memory versus
other memories reached through the interconnection networks.
Distributed memory:
A collection of independent uniprocessors or SMPs, interconnected to form a cluster.
Communication is via fixed paths or network connections, usually a proprietary high-speed
communications network. Data are exchanged between nodes as messages over the network.
MISD:
No commercial multiprocessor of this type has been built to date, but some may appear in
the future.
[Figure: processing elements PE1 ... PEn connected through an interconnection network to
memory elements ME1 ... MEn]
Some special purpose stream processors approximate a limited form of this (there is only
a single data stream that is operated on by successive functional units).
INTERCONNECTION NETWORKS:
According to the mode of operation, INs are classified as synchronous versus asynchronous.
In the synchronous mode of operation, a single global clock is used by all components in
the system, such that the whole system operates in a lockstep manner. The asynchronous
mode of operation, on the other hand, does not require a global clock.
According to the control strategy, INs can be classified as centralized versus
decentralized. In centralized control systems, a single central control unit is used to
oversee and control the operation of the components of the system. In decentralized
control, the control function is distributed among the different components in the system.
According to their topology, INs are classified as static versus dynamic networks. In
dynamic networks, connections among inputs and outputs are made using switching elements;
depending on the switch settings, different interconnections can be established. In static
networks, direct fixed paths exist between nodes; there are no switching elements in
static networks.
Some interconnection types are shown below.
Linear Array
N nodes, N-1 edges
Node Degree: 2 (internal nodes), 1 (end nodes)
Diameter: N-1
Cost: N-1 links
Fault Tolerance: 0 (a single link failure disconnects the network)
Ring
N nodes, N edges
Node Degree: 2
Diameter: floor(N/2)
Cost: N links
Fault Tolerance: 1 (a single link failure degrades the ring to a linear array)
Mesh and Torus
N = n*n nodes
Mesh: Node Degree: 4 (internal), 3 (edge), 2 (corner)
Mesh Diameter: 2(n-1)
Torus: Node Degree: 4
Torus Diameter: 2*floor(n/2)
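The diameter figures for these static topologies can be collected in one place (a sketch
using the standard formulas above, with N nodes for the array and ring and an n x n grid
for the mesh and torus):

```python
def diameter(topology, n):
    # Longest shortest path between any pair of nodes.
    if topology == "linear_array":   # n nodes in a line
        return n - 1
    if topology == "ring":           # n nodes in a cycle
        return n // 2
    if topology == "mesh":           # n x n grid, no wraparound links
        return 2 * (n - 1)
    if topology == "torus":          # n x n grid with wraparound links
        return 2 * (n // 2)
    raise ValueError(topology)

print(diameter("linear_array", 8))  # 7
print(diameter("ring", 8))          # 4
print(diameter("mesh", 4))          # 6
print(diameter("torus", 4))         # 4
```

The wraparound links of the ring and torus roughly halve the diameter of their
non-wraparound counterparts at the cost of extra links.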
Prepared by KARTHIK.S
Hypercubes
N = 2^d nodes, where d is the number of dimensions (d = log2 N)
A cube of dimension d is made out of two cubes of dimension d-1
Symmetric: all nodes are equivalent with respect to degree, diameter, cost, and fault
tolerance
Node labeling requires d bits
[Figure: hypercubes of dimension d = 0, 1, 2, 3; nodes are labeled with d-bit binary
strings (0, 1; 00 ... 11; 000 ... 111)]
Hypercube of dimension d:
N = 2^d nodes (d = log2 N)
Node degree = d
Number of bits to label a node = d
Diameter = d
Number of edges = N*d/2
The distance between two nodes equals the Hamming distance between their binary labels.
Routing: at each hop, flip one of the bits in which the current node's label differs from
the destination's.
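Hypercube routing follows directly from the Hamming-distance property: at each hop, flip
one bit in which the current node's label still differs from the destination's. A minimal
sketch:

```python
def hypercube_route(src, dst, d):
    # Labels are d-bit integers; neighbors differ in exactly one bit.
    path = [src]
    current = src
    for bit in range(d):
        mask = 1 << bit
        if (current ^ dst) & mask:   # this bit still differs
            current ^= mask          # hop along that dimension
            path.append(current)
    return path

# Route from 000 to 110 in a 3-dimensional hypercube.
print([format(n, "03b") for n in hypercube_route(0b000, 0b110, 3)])
# ['000', '010', '110']
```

The number of hops equals the Hamming distance between source and destination, which is
at most d, matching the diameter above.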
Crossbar networks:
A crossbar connects N inputs to N outputs through an N x N grid of switching points, so
any input can be connected to any free output without blocking.
Dynamic Networks
A 2x2 switching element (SE) has four possible settings: straight, exchange, upper
broadcast, and lower broadcast.
Multi-stage network
[Figure: an 8x8 multi-stage network built from 2x2 SEs, connecting inputs 000-111 to
outputs 000-111]
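In a multi-stage network built from 2x2 SEs, a common scheme is destination-tag routing:
at stage i the SE inspects bit i of the destination address (most significant bit first)
and forwards to its upper output for 0 or its lower output for 1. A sketch under that
assumption:

```python
def destination_tag_settings(dst, n_stages):
    # For each stage, pick "upper" when the corresponding destination
    # bit is 0 and "lower" when it is 1, most significant bit first.
    settings = []
    for stage in range(n_stages):
        bit = (dst >> (n_stages - 1 - stage)) & 1
        settings.append("lower" if bit else "upper")
    return settings

# Routing to output 101 in an 8x8 (3-stage) network.
print(destination_tag_settings(0b101, 3))  # ['lower', 'upper', 'lower']
```

Note that the path depends only on the destination, not on the source, which is what
makes this scheme self-routing.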