    MODULE 5 PARALLEL PROCESSING

    What is Parallel Computing?

Traditionally, software has been written for serial computation:

o To be run on a single computer having a single Central Processing Unit (CPU);
o A problem is broken into a discrete series of instructions.

o Instructions are executed one after another.
o Only one instruction may execute at any moment in time.

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:

    o To be run using multiple CPUs

    o A problem is broken into discrete parts that can be solved concurrently

    o Each part is further broken down to a series of instructions

    o Instructions from each part execute simultaneously on different CPUs

The compute resources can include:

    o A single computer with multiple processors;

    o An arbitrary number of computers connected by a network;

    o A combination of both.

    The computational problem usually demonstrates characteristics such as the ability to be:


    o Broken apart into discrete pieces of work that can be solved simultaneously;

o Executed as multiple program instructions at any moment in time;

o Solved in less time with multiple compute resources than with a single compute resource.

    DISADVANTAGES:

1. Cache coherence problem:

When multiple processors are connected to form a high-performance system, a cache coherence problem can arise. The processors in such a system may have their own cache memories; when a processor updates its own cache, the change is not immediately reflected in main memory or in the other caches, which leads to the cache coherence problem. It can be handled with a write-through or write-back policy, or with the MESI (Modified, Exclusive, Shared, Invalid) protocol.
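As a rough illustration, the per-line state changes of the MESI protocol can be written down as a small lookup table. The sketch below is a simplified model only (the event names are illustrative, and write-backs, bus arbitration, and actual data transfer are ignored):

```python
# Simplified MESI state machine for one cache line in one cache.
# Events: local processor reads/writes, and snooped bus traffic from other caches.
MESI_NEXT = {
    # (current state, event) -> next state
    ("I", "local_read_miss_shared"):    "S",  # another cache already holds the line
    ("I", "local_read_miss_exclusive"): "E",  # no other cache holds the line
    ("I", "local_write"):               "M",  # read-for-ownership, others invalidated
    ("S", "local_read"):                "S",
    ("S", "local_write"):               "M",  # invalidate the other sharers
    ("E", "local_read"):                "E",
    ("E", "local_write"):               "M",  # silent upgrade, no bus traffic needed
    ("M", "local_read"):                "M",
    ("M", "local_write"):               "M",
    # Snooped requests from other processors:
    ("M", "bus_read"):  "S",  # supply the data, keep a shared copy
    ("E", "bus_read"):  "S",
    ("S", "bus_read"):  "S",
    ("M", "bus_write"): "I",  # another cache wants ownership
    ("E", "bus_write"): "I",
    ("S", "bus_write"): "I",
    ("I", "bus_read"):  "I",
    ("I", "bus_write"): "I",
}

def next_state(state, event):
    return MESI_NEXT[(state, event)]

# Example: a line is read (exclusively), written locally, then another CPU reads it.
s = "I"
for ev in ["local_read_miss_exclusive", "local_write", "bus_read"]:
    s = next_state(s, ev)
    print(ev, "->", s)   # I -> E -> M -> S
```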

PIPELINE:

Pipelining is a CPU implementation technique in which the execution of multiple instructions is overlapped.

Time in clock cycles:

Clock number       1    2    3    4    5    6    7    8    9
Instruction I      IF   ID   EX   MEM  WB
Instruction I+1         IF   ID   EX   MEM  WB
Instruction I+2              IF   ID   EX   MEM  WB
Instruction I+3                   IF   ID   EX   MEM  WB
Instruction I+4                        IF   ID   EX   MEM  WB

MIPS Pipeline Stages:

IF = Instruction Fetch: fetch the instruction from the Instruction Memory
ID = Instruction Decode
EX = Execution
MEM = Memory Access: read the data from the Data Memory
WB = Write Back
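The timing table above can be reproduced with a few lines of code. The sketch below assumes an ideal five-stage pipeline with no stalls and simply computes which stage each instruction occupies in each clock cycle:

```python
# Print the stage occupied by each instruction in each clock cycle of an
# ideal 5-stage pipeline (one new instruction issued per cycle, no hazards).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(num_instructions):
    total_cycles = num_instructions + len(STAGES) - 1
    header = "cycle:".ljust(18) + "".join(f"{c:>5}" for c in range(1, total_cycles + 1))
    print(header)
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage            # instruction i enters stage s at cycle i+s+1
        print(f"instruction I+{i}".ljust(18) + "".join(f"{cell:>5}" for cell in row))

pipeline_table(5)   # reproduces the table above: 5 instructions finish in 9 cycles
```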


Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask is performed by a given functional unit. The units are connected in a serial fashion and all of them operate simultaneously. The use of pipelining improves performance compared to the traditional sequential execution of tasks. The figure above illustrates the basic difference between executing four subtasks of a given instruction (in this case fetching F, decoding D, execution E, and writing the results W) using pipelining and sequential processing.

Pipeline vs. sequential operation:

It is clear from the figure that the total time required to process three instructions (I1, I2, I3) is only six time units if four-stage pipelining is used, compared to 12 time units if sequential processing is used. A possible saving of up to 50% in the execution time of these three instructions is obtained.
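The 6-versus-12 figure follows from the usual timing formulas: a k-stage pipeline finishes n tasks in k + (n - 1) stage-time units, while purely sequential execution needs n*k. A quick check (illustrative code):

```python
# Time to finish n tasks on a k-stage pipeline vs. purely sequential execution,
# measured in stage-time units and assuming no stalls.
def pipelined_time(n, k):
    return k + (n - 1)      # first task takes k units, then one finishes per unit

def sequential_time(n, k):
    return n * k

n, k = 3, 4
print(pipelined_time(n, k))                            # 6 time units
print(sequential_time(n, k))                           # 12 time units
print(sequential_time(n, k) / pipelined_time(n, k))    # speedup = 2.0 (50% saving)
```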

Pipeline Hazards:

Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during its designated clock cycle, possibly resulting in one or more stall (or wait) cycles.

Hazards reduce the ideal speedup gained from pipelining (they increase CPI above 1) and are classified into three classes:

Structural hazards: arise from hardware resource conflicts when the available hardware cannot support all possible combinations of instructions.


Data hazards: arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline (see the sketch after this list).

Control hazards: arise from the pipelining of conditional branches and other instructions that change the PC.
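As a toy example of a data hazard, consider an instruction that reads a register written by the instruction just before it. The sketch below scans a short instruction sequence for such read-after-write dependences and counts the stall cycles, assuming a classic five-stage pipeline with in-order issue, no forwarding hardware, and a register file that can be written and read in the same cycle (the instruction encoding used here is illustrative):

```python
# Toy RAW (read-after-write) hazard check. Under the assumptions above, a
# dependent instruction issued d positions after the producer needs
# max(0, 3 - d) stall cycles.
program = [
    ("r1", ["r2", "r3"]),   # add r1, r2, r3
    ("r4", ["r1", "r5"]),   # sub r4, r1, r5   <- reads r1 right after it is produced
    ("r6", ["r1", "r4"]),   # and r6, r1, r4
]

def raw_stalls(program):
    total = 0
    for i, (_, sources) in enumerate(program):
        needed = 0
        for j in range(max(0, i - 2), i):           # only the two previous instructions matter
            dest, _ = program[j]
            if dest in sources:
                print(f"RAW hazard: instruction {i} reads {dest} written by instruction {j}")
                needed = max(needed, 3 - (i - j))   # distance 1 -> 2 stalls, distance 2 -> 1
        total += needed
    return total

print("total stall cycles:", raw_stalls(program))   # 2 (for r1) + 2 (for r4) = 4
```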

Pipeline Stall Due to Instruction Dependency:

Correct operation of a pipeline requires that the operation performed by a stage MUST NOT depend on the operation(s) performed by other stage(s).

Instruction dependency refers to the case where fetching an instruction depends on the results of executing a previous instruction.

Instruction dependency manifests itself in the execution of a conditional branch instruction.

Consider, for example, the case of a branch-if-negative instruction. In this case, the next instruction to fetch will not be known until the result of executing that branch-if-negative instruction is known.

FLYNN'S CLASSIFICATION OF COMPUTERS:

The idea of using multiple processors both to increase performance and to improve availability dates back to the earliest electronic computers. About 30 years ago, Flynn proposed a simple model for categorizing all computers that is still useful today. He looked at the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor, and placed all computers in one of four categories:

SISD (single instruction stream, single data stream): the uniprocessor.

SIMD (single instruction stream, multiple data streams). Examples: array processors and vector processors.

MIMD (multiple instruction streams, multiple data streams). Examples: SMP and NUMA processors.

MISD (multiple instruction streams, single data stream): commercially not implemented.

    SISD:


It uses only one control unit and one processing unit. Normal uniprocessors fall in this category.

    SIMD:

The same instruction is executed by multiple processors using different data streams. Each processor has its own data memory (hence multiple data), but there is a single instruction memory and control processor, which fetches and dispatches instructions. Each processor works only on the portion of the data that is assigned to it. Of course, the processors may need to communicate periodically in order to exchange data.

A data parallel algorithm consists of a sequence of elementary instructions applied to the data: an instruction is initiated only after the previous instruction has ended. SIMD execution follows this model, where the code is identical on all processors.
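As a software analogy (not the SIMD hardware itself), an element-wise NumPy operation behaves like the SIMD model: one operation is applied to many data elements at once, while the plain loop below applies it to one element at a time. The example is illustrative:

```python
import numpy as np

# Serial version: one instruction stream handles one data element at a time.
def scale_serial(data, factor):
    out = []
    for x in data:              # one multiply per loop iteration
        out.append(x * factor)
    return out

# Data-parallel (SIMD-style) version: the same multiply is applied to every
# element of the array in one operation.
def scale_simd(data, factor):
    return np.asarray(data) * factor

data = [1.0, 2.0, 3.0, 4.0]
print(scale_serial(data, 10))   # [10.0, 20.0, 30.0, 40.0]
print(scale_simd(data, 10))     # [10. 20. 30. 40.]
```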

    MIMD:

Each processor fetches its own instructions and operates on its own data. The processors are often off-the-shelf microprocessors.

MIMDs offer flexibility. With the correct hardware and software support, MIMDs can function as single-user multiprocessors focusing on high performance for one application, as multi-programmed multiprocessors running many tasks simultaneously, or as some combination of these functions.

MIMDs can build on the cost/performance advantages of off-the-shelf microprocessors. In fact, nearly all multiprocessors built today use the same microprocessors found in workstations and single-processor servers. MIMD machines come in two forms: shared memory and distributed memory.
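A rough software illustration of the MIMD idea, using Python's multiprocessing module: the two worker processes below run different instruction streams on different data at the same time (the worker functions are illustrative):

```python
from multiprocessing import Process, Queue

# Two different instruction streams (functions) operating on different data.
def sum_worker(numbers, out):
    out.put(("sum", sum(numbers)))

def max_worker(numbers, out):
    out.put(("max", max(numbers)))

if __name__ == "__main__":
    results = Queue()
    p1 = Process(target=sum_worker, args=([1, 2, 3, 4], results))
    p2 = Process(target=max_worker, args=([10, 7, 42, 3], results))
    p1.start(); p2.start()                 # both processes run simultaneously
    p1.join(); p2.join()
    print(results.get(), results.get())    # order may vary between runs
```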


Examples: SMP and NUMA processors (shared-memory MIMD); clusters (distributed-memory MIMD).

SMP: This is referred to as symmetric multiprocessing. It provides uniform memory access: the processors in the system communicate with the memory through an interconnection network.

o Two or more similar processors of comparable capacity
o Processors share the same memory and I/O
o Processors are connected by a bus or other internal connection
o Memory access time is approximately the same for each processor
o All processors share access to I/O

Advantages:

o Performance: if some of the work can be done in parallel, an SMP yields better performance than a single processor of the same type.
o Availability: since all processors can perform the same functions, failure of a single processor does not halt the system.


o Incremental growth: a user can enhance performance by adding additional processors.
o Scaling: vendors can offer a range of products based on the number of processors.

NUMA:

NUMA stands for non-uniform memory access.

o Access times to different regions of memory may differ.
o The system consists of several processors, each connected to its own memory element.
o Any processor can communicate with any other memory element through the interconnection network.
o The access time therefore differs depending on whether a processor accesses its own immediate memory or a remote memory through the interconnection network.

Distributed memory:

A collection of independent uniprocessors or SMPs, interconnected to form a cluster. Communication is via fixed paths or network connections, usually a proprietary high-speed communication network. Data are exchanged between nodes as messages over the network.

MISD:

No commercial multiprocessor of this type has been built to date, but one may be built in the future.

[Figure: processing elements PE1 ... PEn connected through an interconnection network to memory elements ME1 ... MEn]


    Some special purpose stream processors approximate a limited form of this (there is only

    a single data stream that is operated on by successive functional units).

INTERCONNECTION NETWORKS:

According to the mode of operation, INs are classified as synchronous versus asynchronous. In the synchronous mode of operation, a single global clock is used by all components in the system, so that the whole system operates in a lockstep manner. The asynchronous mode of operation, on the other hand, does not require a global clock.

According to the control strategy, INs can be classified as centralized versus decentralized. In centralized control systems, a single central control unit is used to oversee and control the operation of the components of the system. In decentralized control, the control function is distributed among the different components in the system.

According to their topology, INs are classified as static versus dynamic networks. In dynamic networks, connections among inputs and outputs are made using switching elements; depending on the switch settings, different interconnections can be established. In static networks, direct fixed paths exist between nodes, and there are no switching elements.

Some interconnection types are shown below.


Linear Array

N nodes, N-1 edges
Node degree: 2 (1 at the two end nodes)
Diameter: N-1
Cost (number of links): N-1
Fault tolerance: none; a single faulty node or link disconnects the network


Ring

N nodes, N edges
Node degree: 2
Diameter: floor(N/2)
Cost (number of links): N
Fault tolerance: a single faulty link reduces the ring to a linear array

Mesh and Torus

Mesh (n x n, N = n*n nodes):
Node degree: 4 (internal nodes), 3 (edge nodes), 2 (corner nodes)
Diameter: 2(n-1)

Torus (n x n, N = n*n nodes):
Node degree: 4
Diameter: 2*floor(n/2)
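The degree and diameter figures above, together with those for the linear array and the ring, can be collected into one small helper (illustrative code; N is the number of nodes and n = sqrt(N) for the mesh and torus):

```python
import math

# Maximum node degree and diameter for the static topologies discussed above.
def degree_and_diameter(topology, N):
    n = math.isqrt(N)                           # side length for mesh/torus
    if topology == "linear_array":
        return 2, N - 1
    if topology == "ring":
        return 2, N // 2
    if topology == "mesh":
        return 4, 2 * (n - 1)
    if topology == "torus":
        return 4, 2 * (n // 2)
    if topology == "hypercube":
        d = int(math.log2(N))
        return d, d
    raise ValueError(topology)

for topo in ["linear_array", "ring", "mesh", "torus", "hypercube"]:
    print(topo, degree_and_diameter(topo, 16))
# linear_array (2, 15), ring (2, 8), mesh (4, 6), torus (4, 4), hypercube (4, 4)
```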


Hypercubes

N = 2^d nodes, where d is the number of dimensions (d = log2 N).
A cube of dimension d is made out of two cubes of dimension d-1.
The topology is symmetric: it looks the same from every node.
Properties of interest: degree, diameter, cost, and fault tolerance.
Node labeling: each node is labeled with a d-bit binary number.

[Figure: hypercubes of dimension d = 0, 1, 2, 3, with each node labeled by a d-bit binary number]


Hypercube of dimension d:

N = 2^d nodes (d = log2 N)
Node degree = d
Number of bits to label a node = d
Diameter = d
Number of edges = N*d/2
Routing: the minimum number of hops between two nodes equals the Hamming distance between their d-bit labels.
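The Hamming-distance remark means that a message can be routed by flipping, one dimension at a time, the bits in which the source and destination labels differ. A minimal sketch of this dimension-order (e-cube) routing, with illustrative function names:

```python
# E-cube (dimension-order) routing in a d-dimensional hypercube:
# repeatedly flip the lowest differing bit between the current node and the
# destination. The path length equals the Hamming distance of the two labels.
def hypercube_route(src, dst, d):
    path = [src]
    current = src
    for bit in range(d):                    # visit dimensions in a fixed order
        if (current ^ dst) & (1 << bit):    # labels still differ in this bit
            current ^= (1 << bit)           # move along that dimension
            path.append(current)
    return path

d = 3
route = hypercube_route(0b000, 0b110, d)
print([format(node, f"0{d}b") for node in route])   # ['000', '010', '110']
```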

Cross bar networks:

A crossbar network connects N inputs to N outputs through an N x N grid of switching points, so any input can be connected to any idle output; the number of switching points grows as N^2.


Dynamic Networks

The basic building block is the 2x2 switching element (SE). It has four different settings: straight, exchange, upper broadcast, and lower broadcast.

Multi-stage network:

[Figure: an 8x8 multi-stage interconnection network built from 2x2 switching elements, connecting inputs 000-111 to outputs 000-111]
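Assuming the multi-stage network in the figure is the usual omega (perfect-shuffle) network built from 2x2 switching elements, it can be self-routed by destination-tag routing: at stage i the message leaves the switch through the upper output if bit i of the destination address (counting from the most significant bit) is 0, and through the lower output if it is 1. A minimal sketch under that assumption:

```python
# Destination-tag routing in an 8x8 omega network (3 stages of 2x2 switching
# elements). Each stage applies a perfect shuffle of the links, then the switch
# forwards the message to its upper (0) or lower (1) output according to the
# corresponding destination bit, most significant bit first.
def omega_route(src, dst, n_bits=3):
    node = src
    ports = []
    for stage in range(n_bits):
        node = ((node << 1) | (node >> (n_bits - 1))) & ((1 << n_bits) - 1)  # perfect shuffle
        out_bit = (dst >> (n_bits - 1 - stage)) & 1       # destination tag for this stage
        node = (node & ~1) | out_bit                      # exit on upper (0) or lower (1) port
        ports.append("lower" if out_bit else "upper")
    return node, ports

final, ports = omega_route(0b010, 0b101)
print(format(final, "03b"), ports)    # '101' ['lower', 'upper', 'lower']
```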
