COMPUTER ARCHITECTURE

Uploaded by arjoghosh; posted 12-Oct-2015.

TRANSCRIPT



    MICROPROCESSOR

    IT IS ONE OF THE GREATEST

ACHIEVEMENTS OF THE 20TH CENTURY.

    IT USHERED IN THE ERA OF WIDESPREAD

    COMPUTERIZATION.


    EARLY ARCHITECTURE

VON NEUMANN ARCHITECTURE, 1940s

    PROGRAM IS STORED IN MEMORY.

    SEQUENTIAL OPERATION

ONE INST IS RETRIEVED AT A TIME, DECODED AND EXECUTED

    LIMITED SPEED


Conventional 32 bit Microprocessors

Higher data throughput with 32 bit wide data bus

    Larger direct addressing range

Higher clock frequencies and operating speeds as a result of improvements in semiconductor technology

Higher processing speeds because larger registers require fewer calls to memory and reg-to-reg transfers are 5 times faster than reg-to-memory transfers


Conventional 32 bit Microprocessors

More insts and addressing modes to improve software efficiency

    More registers to support High-level languages

More extensive memory management & coprocessor capabilities

Cache memories and inst pipelines to increase processing speed and reduce peak bus loads


Conventional 32 bit Microprocessors

To construct a complete general purpose 32 bit microprocessor, five basic functions are necessary:

    ALU

    MMU

    FPU

INTERRUPT CONTROLLER

TIMING CONTROL


    Conventional Architecture

    KNOWN AS VON NEUMANN ARCHITECTURE

    ITS MAIN FEATURES ARE:

A SINGLE COMPUTING ELEMENT INCORPORATING A PROCESSOR, COMM. PATH AND MEMORY

A LINEAR ORGANIZATION OF FIXED SIZE MEMORY CELLS

A LOW-LEVEL MACHINE LANGUAGE WITH INSTS PERFORMING SIMPLE OPERATIONS ON ELEMENTARY OPERANDS

SEQUENTIAL, CENTRALIZED CONTROL OF COMPUTATION


    Conventional Architecture

    Single processor configuration:

[Diagram: PROCESSOR connected to MEMORY and INPUT-OUTPUT.]


    Conventional Architecture

    Multiple processor configuration with a global bus:

[Diagram: PROCESSORS WITH LOCAL MEM & I/O sharing a global bus with SYSTEM INPUT-OUTPUT and GLOBAL MEMORY.]


    Conventional Architecture

    THE EARLY COMP ARCHITECTURES WERE

    DESIGNED FOR SCIENTIFIC AND COMMERCIAL

CALCULATIONS AND WERE DEVELOPED TO

INCREASE THE SPEED OF EXECUTION OF SIMPLE INSTRUCTIONS.

    TO MAKE THE COMPUTERS PERFORM MORE

    COMPLEX PROCESSES, MUCH MORE COMPLEX

    SOFTWARE WAS REQUIRED.


    Conventional Architecture

    ADVANCEMENT OF TECHNOLOGY ENHANCED

    THE SPEED OF EXECUTION OF PROCESSORS BUT

    A SINGLE COMM PATH HAD TO BE USED TO

TRANSFER INSTS AND DATA BETWEEN THE PROCESSOR AND THE MEMORY.

MEMORY SIZE INCREASED. THE RESULT WAS THAT

    THE DATA TRANSFER RATE ON THE MEMORY

    INTERFACE ACTED AS A SEVERE CONSTRAINT

    ON THE PROCESSING SPEED.


    Conventional Architecture

    AS HIGHER SPEED MEMORY BECAME

    AVAILABLE, THE DELAYS INTRODUCED BY THE

    CAPACITANCE AND THE TRANSMISSION LINE

DELAYS ON THE MEMORY BUS AND THE PROPAGATION DELAYS IN THE BUFFER AND

    ADDRESS DECODING CIRCUITRY BECAME MORE

    SIGNIFICANT AND PLACED AN UPPER LIMIT ON

    THE PROCESSING SPEED.


    Conventional Architecture

THE USE OF MULTIPROCESSOR BUSES WITH THE NEED FOR ARBITRATION BETWEEN COMPUTERS REQUESTING CONTROL OF THE BUS REDUCED THE PROBLEM BUT INTRODUCED SEVERAL WAIT CYCLES WHILE THE DATA OR INSTS WERE FETCHED FROM MEMORY.

ONE METHOD OF INCREASING PROCESSING SPEED AND DATA THROUGHPUT ON THE MEMORY BUS WAS TO INCREASE THE NUMBER OF PARALLEL BITS TRANSFERRED ON THE BUS.


    Conventional Architecture

THE GLOBAL BUS AND THE GLOBAL MEMORY CAN ONLY SERVE ONE PROCESSOR AT A TIME.

AS MORE PROCESSORS ARE ADDED TO INCREASE THE PROCESSING SPEED, THE GLOBAL BUS BOTTLENECK BECOMES WORSE.

IF THE PROCESSING CONSISTS OF SEVERAL INDEPENDENT TASKS, EACH PROC WILL COMPETE FOR GLOBAL MEMORY ACCESS AND GLOBAL BUS TRANSFER TIME.


    Conventional Architecture

    TYPICALLY, ONLY 3 OR 4 TIMES THE SPEED OF A SINGLE

    PROCESSOR CAN BE ACHIEVED IN MULTIPROCESSOR

    SYSTEMS WITH GLOBAL MEMORY AND A GLOBAL BUS.

TO REDUCE THE EFFECT OF THE GLOBAL MEMORY BUS AS A BOTTLENECK, (1) THE LENGTH OF THE PIPELINE WAS INCREASED SO THAT THE INST COULD BE BROKEN DOWN INTO BASIC ELEMENTS TO BE MANIPULATED SIMULTANEOUSLY AND (2) CACHE MEM WAS INTRODUCED SO THAT INSTS AND/OR DATA COULD BE PREFETCHED FROM GLOBAL MEMORY AND STORED IN HIGH SPEED LOCAL MEMORY.


    PIPELINING

    THE PROC. SEPARATES EACH INST INTO ITS BASIC

    OPERATIONS AND USES DEDICATED EXECUTION UNITS

    FOR EACH TYPE OF OPERATION.

THE MOST BASIC FORM OF PIPELINING IS TO PREFETCH THE NEXT INSTRUCTION WHILE SIMULTANEOUSLY EXECUTING THE PREVIOUS INSTRUCTION.

THIS MAKES USE OF THE BUS TIME WHICH WOULD OTHERWISE BE WASTED AND REDUCES INSTRUCTION EXECUTION TIME.


    PIPELINING

TO SHOW THE USE OF A PIPELINE, CONSIDER THE MULTIPLICATION OF 2 DECIMAL NOS: 3.8 x 10^2 AND 9.6 x 10^3. THE PROC. PERFORMS 3 OPERATIONS:

A: MULTIPLIES THE MANTISSAS

B: ADDS THE EXPONENTS

C: NORMALISES THE RESULT TO PLACE THE DECIMAL POINT IN THE CORRECT POSITION.

IF 3 EXECUTION UNITS PERFORMED THESE OPERATIONS, UNITS A & B WOULD DO NOTHING WHILE C IS BEING PERFORMED. IF A PIPELINE WERE IMPLEMENTED, THE NEXT NUMBER COULD BE PROCESSED IN EXECUTION UNITS A AND B WHILE C WAS BEING PERFORMED.
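The three-stage decomposition above can be sketched in code; the (mantissa, exponent) pair representation and the stage functions are illustrative, not taken from the slides.

```python
# Sketch of the three operations for multiplying 3.8 x 10^2 by
# 9.6 x 10^3. Numbers are (mantissa, exponent) pairs; in a pipeline,
# units A and B would start on the next pair while C normalises.

def stage_a(x, y):                 # A: multiply the mantissas
    return x[0] * y[0]

def stage_b(x, y):                 # B: add the exponents
    return x[1] + y[1]

def stage_c(mantissa, exponent):   # C: normalise so 1 <= mantissa < 10
    while mantissa >= 10:
        mantissa /= 10
        exponent += 1
    return mantissa, exponent

def multiply(x, y):
    return stage_c(stage_a(x, y), stage_b(x, y))

mantissa, exponent = multiply((3.8, 2), (9.6, 3))
print(mantissa, exponent)          # roughly 3.648 and 6, i.e. 3.648 x 10^6
```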


    PIPELINING

    TO GET A ROUGH INDICATION OF PERFORMANCE

    INCREASE THROUGH PIPELINE, THE STAGE EXECUTION

    INTERVAL MAY BE TAKEN TO BE THE EXECUTION TIME

    OF THE SLOWEST PIPELINE STAGE.

    THE PERFORMANCE INCREASE FROM PIPELINING IS

    ROUGHLY EQUAL TO THE SUM OF THE AVERAGE

    EXECUTION TIMES FOR ALL STAGES OF THE PIPELINE,

DIVIDED BY THE AVERAGE VALUE OF THE EXECUTION TIME OF THE SLOWEST PIPELINE STAGE FOR THE INST

    MIX CONSIDERED.
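The rule above can be written out directly; the stage times below are made-up illustrative values, not measurements.

```python
# Rough pipeline speedup per the rule above: sum of the average stage
# execution times divided by the time of the slowest stage.
# Stage times (ns) are illustrative, not from any real processor.

stage_times_ns = [10, 25, 15, 20]      # e.g. fetch, decode, execute, write

speedup = sum(stage_times_ns) / max(stage_times_ns)
print(speedup)                          # 70 / 25 = 2.8
```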


    PIPELINING

NON-SEQUENTIAL INSTS CAUSE THE INSTRUCTIONS BEHIND THEM IN THE PIPELINE TO BE EMPTIED AND FILLING TO BE RESTARTED.

NON-SEQUENTIAL INSTS. TYPICALLY COMPRISE 15 TO 30% OF INSTRUCTIONS AND THEY REDUCE PIPELINE PERFORMANCE BY A GREATER PERCENTAGE THAN THEIR PROBABILITY OF OCCURRENCE.


    CACHE MEMORY

VON-NEUMANN SYSTEM PERFORMANCE IS CONSIDERABLY AFFECTED BY MEMORY ACCESS TIME AND MEMORY BW (MAXIMUM MEMORY TRANSFER RATE).

THESE LIMITATIONS ARE ESPECIALLY TIGHT FOR 32 BIT PROCESSORS WITH HIGH CLOCK SPEEDS.

WHILE STATIC RAMS WITH 25ns ACCESS TIMES ARE CAPABLE OF KEEPING PACE WITH PROC SPEED, THEY MUST BE LOCATED ON THE SAME BOARD TO MINIMISE DELAYS, THUS LIMITING THE AMOUNT OF HIGH SPEED MEMORY AVAILABLE.


    CACHE MEMORY

DRAM HAS A GREATER CAPACITY PER CHIP AND A LOWER COST, BUT EVEN THE FASTEST DRAM CANNOT KEEP PACE WITH THE PROCESSOR, PARTICULARLY WHEN IT IS LOCATED ON A SEPARATE BOARD ATTACHED TO A MEMORY BUS.

WHEN A PROC REQUIRES INST/DATA FROM/TO MEMORY, IT ENTERS A WAIT STATE UNTIL IT IS AVAILABLE. THIS REDUCES PROCESSOR PERFORMANCE.


    CACHE MEMORY

CACHE ACTS AS A FAST LOCAL STORAGE BUFFER BETWEEN THE PROC AND THE MAIN MEMORY.

OFF-CHIP BUT ON-BOARD CACHE MAY REQUIRE SEVERAL MEMORY CYCLES WHEREAS ON-CHIP CACHE MAY ONLY REQUIRE ONE MEMORY CYCLE, BUT ON-BOARD CACHE CAN PREVENT THE EXCESSIVE NO. OF WAIT STATES IMPOSED BY MEMORY ON THE SYSTEM BUS AND IT REDUCES THE SYSTEM BUS LOAD.


    CACHE MEMORY

    THE COST OF IMPLEMENTING AN ON-BOARD

    CACHE IS MUCH LOWER THAN THE COST OF

    FASTER SYSTEM MEMORY REQUIRED TO

    ACHIEVE THE SAME MEMORY PERFORMANCE.

    CACHE PERFORMANCE DEPENDS ON ACCESS

    TIME AND HIT RATIO, WHICH IS DEPENDENT ON

THE SIZE OF THE CACHE AND THE NO. OF BYTES BROUGHT INTO CACHE ON ANY FETCH FROM

    THE MAIN MEMORY (THE LINE SIZE).


    CACHE MEMORY

    INCREASING THE LINE SIZE INCREASES THE

    CHANCE THAT THERE WILL BE A CACHE HIT ON

    THE NEXT MEMORY REFERENCE.

    IF A 4K BYTE CACHE WITH A 4 BYTE LINE SIZE

    HAS A HIT RATIO OF 80%, DOUBLING THE LINE

    SIZE MIGHT INCREASE THE HIT RATIO TO 85%

BUT DOUBLING THE LINE SIZE AGAIN MIGHT ONLY INCREASE THE HIT RATIO TO 87%.


    CACHE MEMORY

    OVERALL MEMORY PERFORMANCE IS A

    FUNCTION OF CACHE ACCESS TIME, CACHE HIT

    RATIO AND MAIN MEMORY ACCESS TIME FOR

    CACHE MISSES.

    A SYSTEM WITH 80% CACHE HIT RATIO AND

    120ns CACHE ACCESS TIME ACCESSES MAIN

MEMORY 20% OF THE TIME WITH AN ACCESS TIME OF 600 ns. THE AV ACCESS TIME IN ns WILL

BE (0.8 x 120) + [0.2 x (600 + 120)] = 240
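The worked example above can be checked directly:

```python
# Average access time from the example above: 80% of accesses hit a
# 120 ns cache; the remaining 20% miss and also pay a 600 ns
# main-memory access on top of the cache access.

hit_ratio, miss_ratio = 0.8, 0.2
cache_ns, main_ns = 120, 600

avg_ns = hit_ratio * cache_ns + miss_ratio * (main_ns + cache_ns)
print(avg_ns)   # 240.0
```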


    CACHE DESIGN

    PROCESSORS WITH DEMAND PAGED VIRTUAL

    MEMORY SYSTEMS REQUIRE AN ASSOCIATIVE

    CACHE.

    VIRTUAL MEM SYSTEMS ORGANIZE ADDRESSES

    BY THE START ADDRESSES FOR EACH PAGE AND

    AN OFFSET WHICH LOCATES THE DATA WITHIN

    THE PAGE.

AN ASSOCIATIVE CACHE ASSOCIATES THE OFFSET WITH THE PAGE ADDRESS TO FIND THE

    DATA NEEDED.


    CACHE DESIGN

WHEN ACCESSED, THE CACHE CHECKS TO SEE IF IT CONTAINS THE PAGE ADDRESS (OR TAG FIELD); IF SO, IT ADDS THE OFFSET AND, IF A CACHE HIT IS DETECTED, THE DATA IS FETCHED IMMEDIATELY FROM THE CACHE.

PROBLEMS CAN OCCUR IN A SINGLE SET-ASSOCIATIVE CACHE IF WORDS WITHIN DIFFERENT PAGES HAVE THE SAME OFFSET.

TO MINIMISE THIS PROBLEM A 2-WAY SET-ASSOCIATIVE CACHE IS USED. THIS IS ABLE TO ASSOCIATE MORE THAN ONE SET OF TAGS AT A TIME, ALLOWING THE CACHE TO STORE THE SAME OFFSET FROM TWO DIFFERENT PAGES.
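A minimal sketch of the 2-way idea, assuming the tag/offset split described above; the class name, set sizing, and evict-oldest replacement policy are illustrative choices, not from the slides.

```python
# Each set holds up to two (tag, data) entries, so two pages that share
# the same offset can be resident at the same time.

class TwoWayCache:
    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.sets = [[] for _ in range(num_sets)]   # per set: [(tag, data)]

    def lookup(self, tag, offset):
        """Return cached data on a hit, or None on a miss."""
        for stored_tag, data in self.sets[offset % self.num_sets]:
            if stored_tag == tag:
                return data
        return None

    def fill(self, tag, offset, data):
        ways = self.sets[offset % self.num_sets]
        if len(ways) == 2:          # set full: evict the oldest entry
            ways.pop(0)
        ways.append((tag, data))

cache = TwoWayCache(num_sets=16)
cache.fill(tag=1, offset=4, data="page1-word")
cache.fill(tag=2, offset=4, data="page2-word")   # same offset, other page
print(cache.lookup(1, 4), cache.lookup(2, 4))    # both hit
```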


    CACHE DESIGN

A FULLY ASSOCIATIVE CACHE ALLOWS ANY NUMBER OF PAGES TO USE THE CACHE SIMULTANEOUSLY.

A CACHE REQUIRES A REPLACEMENT ALGORITHM TO FIND REPLACEMENT CACHE LINES WHEN A MISS OCCURS.

PROCESSORS THAT DO NOT USE DEMAND PAGED VIRTUAL MEMORY CAN EMPLOY A DIRECT MAPPED CACHE WHICH CORRESPONDS EXACTLY TO THE PAGE SIZE AND ALLOWS DATA FROM ONLY ONE PAGE TO BE STORED AT A TIME.


    MEMORY ARCHITECTURES

    32 BIT PROCESSORS HAVE INTRODUCED 3 NEW

    CONCEPTS IN THE WAY THE MEMORY IS

    INTERFACED:

    1. LOCAL MEMORY BUS EXTENSIONS

2. MEMORY INTERLEAVING

    3. VIRTUAL MEMORY MANAGEMENT


    LOCAL MEM BUS EXTENSIONS

IT PERMITS LARGER LOCAL MEMORIES TO BE CONNECTED WITHOUT THE DELAYS CAUSED BY BUS REQUESTS AND BUS ARBITRATION FOUND ON MULTIPROCESSOR BUSES.

IT HAS BEEN PROVIDED TO INCREASE THE SIZE OF THE LOCAL MEMORY ABOVE THAT WHICH CAN BE ACCOMMODATED ON THE PROCESSOR BOARD.

BY OVERLAPPING THE LOCAL MEM BUS AND THE SYSTEM BUS CYCLES IT IS POSSIBLE TO ACHIEVE HIGHER MEM ACCESS RATES FROM PROCESSORS WITH PIPELINES WHICH PERMIT THE ADDRESS OF THE NEXT MEMORY REFERENCE TO BE GENERATED WHILE THE PREVIOUS DATA WORD IS BEING FETCHED.


    MEMORY INTERLEAVING

    PIPELINED PROCESSORS WITH THE ABILITY TO

    GENERATE THE ADDRESS OF THE NEXT MEMORY

    REFERENCE WHILE FETCHING THE PREVIOUS

DATA WORD WOULD BE SLOWED DOWN IF THE MEMORY WERE UNABLE TO BEGIN THE NEXT MEMORY ACCESS UNTIL THE PREVIOUS MEM CYCLE HAD BEEN COMPLETED.

THE SOLUTION IS TO USE TWO-WAY MEMORY INTERLEAVING. IT USES 2 MEM BOARDS: 1 FOR ODD ADDRESSES AND 1 FOR EVEN ADDRESSES.
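The odd/even split above amounts to selecting a board by the low address bit; the function below is a sketch of that mapping.

```python
# Two-way interleaving: even addresses live on one memory board, odd
# addresses on the other, so sequential accesses alternate boards and
# each board can overlap its cycle with the other's.

def board_for(address):
    return address % 2          # board 0: even addresses, board 1: odd

print([board_for(a) for a in range(100, 105)])   # [0, 1, 0, 1, 0]
```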


    MEMORY INTERLEAVING

    ONE BOARD CAN BEGIN THE NEXT MEM CYCLE

    WHILE THE OTHER BOARD COMPLETES THE

    PREVIOUS CYCLE.

    THE SPEED ADV IS GREATEST WHEN MULTIPLE

    SEQUENTIAL MEM ACCESSES ARE REQUIRED

    FOR BURST I/O TRANSFERS BY DMA.

    DMA DEFINES A BLOCK TRANSFER IN TERMS OF

A STARTING ADDRESS AND A WORD COUNT FOR SEQUENTIAL MEM ACCESSES.


    MEMORY INTERLEAVING

    TWO WAY INTERLEAVING MAY NOT PREVENT

    MEM WAIT STATES FOR SOME FAST SIGNAL

PROCESSING APPLICATIONS, AND SYSTEMS HAVE

BEEN DESIGNED WITH 4 OR MORE WAY INTERLEAVING IN WHICH THE MEM BOARDS ARE

    ASSIGNED CONSECUTIVE ADDRESSES BY A

    MEMORY CONTROLLER.


    Conventional Architecture

    EVEN WITH THESE ENHANCEMENTS, THE SEQUENTIAL

    VON NEUMANN ARCHITECTURE REACHED THE LIMITS

    IN PROCESSING SPEED BECAUSE THE SEQUENTIAL

    FETCHING OF INSTS AND DATA THROUGH A COMMON

    MEMORY INTERFACE FORMED THE BOTTLENECK.

    THUS, PARALLEL PROC ARCHITECTURES CAME INTO

BEING WHICH PERMIT A LARGE NUMBER OF COMPUTING

ELEMENTS TO BE PROGRAMMED TO WORK TOGETHER SIMULTANEOUSLY. THE USEFULNESS OF PARALLEL

    PROCESSOR DEPENDS UPON THE AVAILABILITY OF

    SUITABLE PARALLEL ALGORITHMS.


    HOW TO INCREASE THE SYSTEM SPEED?

1. USING FASTER COMPONENTS. THEY COST MORE AND

DISSIPATE CONSIDERABLE HEAT.

    THE RATE OF GROWTH OF SPEED USING

BETTER TECHNOLOGY IS VERY SLOW. e.g., IN THE

80S THE BASIC CLOCK RATE WAS 50 MHz AND

TODAY IT IS AROUND 2 GHz. DURING THIS

PERIOD THE SPEED OF COMPUTERS IN SOLVING

INTENSIVE PROBLEMS HAS GONE UP BY A FACTOR OF 100,000. THIS IS LARGELY DUE TO

IMPROVED ARCHITECTURE.


    HOW TO INCREASE THE SYSTEM SPEED?

    2. ARCHITECTURAL METHODS:

A. USE PARALLELISM IN A SINGLE PROCESSOR

[OVERLAPPING EXECUTION OF A NO OF INSTS

(PIPELINING)]

    B. OVERLAPPING OPERATION OF DIFFERENT

    UNITS

    C. INCREASE SPEED OF ALU BY EXPLOITING

    DATA/TEMPORAL PARALLELISM

    D. USING NO OF INTERCONNECTED PROCESSORS

    TO WORK TOGETHER


    PARALLEL COMPUTERS

THE IDEA EMERGED AT CALTECH IN 1981.

A GROUP HEADED BY CHARLES SEITZ AND

GEOFFREY FOX BUILT A PARALLEL COMPUTER IN 1982.

16 INTEL 8085s WERE CONNECTED IN A HYPERCUBE

CONFIGURATION.

    ADV WAS LOW COST PER MEGAFLOP


    PARALLEL COMPUTERS

    BESIDES HIGHER SPEED, OTHER FEATURES OF

    PARALLEL COMPUTERS ARE:

BETTER SOLUTION QUALITY: WHEN ARITHMETIC

OPS ARE DISTRIBUTED, EACH PE DOES A SMALLER

NO OF OPS, THUS ROUNDING ERRORS ARE

REDUCED

    BETTER ALGORITHMS

    BETTER AND FASTER STORAGE

    GREATER RELIABILITY


    CLASSIFICATION OF COMPUTER

    ARCHITECTURE

FLYNN'S TAXONOMY: IT IS BASED UPON HOW THE COMPUTER

RELATES ITS INSTRUCTIONS TO THE DATA BEING PROCESSED

    SISD

    SIMD

    MISD

    MIMD


FLYNN'S TAXONOMY

    SISD: CONVENTIONAL VON-NEUMANN SYSTEM.

[Diagram: CONTROL UNIT -- inst stream --> PROCESSOR -- data stream.]


FLYNN'S TAXONOMY

SIMD: IT HAS A SINGLE STREAM OF VECTOR INSTS THAT INITIATE MANY OPERATIONS. EACH ELEMENT OF A VECTOR IS REGARDED AS A MEMBER OF A SEPARATE DATA STREAM, GIVING MULTIPLE DATA STREAMS.

[Diagram: one CONTROL UNIT sends a single inst stream to several PROCESSORS, each operating on its own data stream (synchronous multiprocessor).]
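The SIMD idea, one instruction applied across many data elements, can be shown in miniature; the list comprehension here only models the concept, it is not hardware-parallel.

```python
# One logical vector instruction ("add 1") is applied to every element
# of the data streams, where an SISD machine would issue one scalar
# instruction per element in sequence.

data_streams = [10, 20, 30, 40]
result = [x + 1 for x in data_streams]
print(result)   # [11, 21, 31, 41]
```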


FLYNN'S TAXONOMY

    MISD: NOT POSSIBLE

[Diagram: CUs 1-3 each feed an inst stream to PUs 1-3, which operate in turn on a single data stream DS.]


FLYNN'S TAXONOMY

    MIMD: MULTIPROCESSOR CONFIGURATION AND

    ARRAY OF PROCESSORS.

[Diagram: CUs 1-3 issue independent inst streams IS 1-3 to processors operating on separate data streams DS 1-3.]


FLYNN'S TAXONOMY

MIMD COMPUTERS COMPRISE INDEPENDENT COMPUTERS, EACH WITH ITS OWN MEMORY, CAPABLE OF PERFORMING SEVERAL OPERATIONS SIMULTANEOUSLY.

MIMD COMPS MAY COMPRISE A NUMBER OF SLAVE PROCESSORS WHICH MAY BE INDIVIDUALLY CONNECTED TO MULTI-ACCESS GLOBAL MEMORY BY A SWITCHING MATRIX UNDER THE CONTROL OF A MASTER PROCESSOR.


FLYNN'S TAXONOMY

    THIS CLASSIFICATION IS TOO BROAD.

    IT PUTS EVERYTHING EXCEPT

    MULTIPROCESSORS IN ONE CLASS.

    IT DOES NOT REFLECT THE CONCURRENCY

    AVAILABLE THROUGH THE PIPELINE

    PROCESSING AND THUS PUTS VECTOR

    COMPUTERS IN SISD CLASS.


SHORE'S CLASSIFICATION

SHORE CLASSIFIED THE COMPUTERS ON THE BASIS OF ORGANIZATION OF THE CONSTITUENT ELEMENTS OF THE COMPUTER.

SIX DIFFERENT KINDS OF MACHINES WERE RECOGNIZED:

1. CONVENTIONAL VON NEUMANN ARCHITECTURE WITH 1 CU, 1 PU, IM AND DM. A SINGLE DM READ PRODUCES ALL BITS FOR PROCESSING BY THE PU. THE PU MAY CONTAIN MULTIPLE FUNCTIONAL UNITS WHICH MAY OR MAY NOT BE PIPELINED. SO, IT INCLUDES BOTH THE SCALAR COMPS (IBM 360/91, CDC 7600) AND PIPELINED VECTOR COMPUTERS (CRAY 1, CYBER 205)


SHORE'S CLASSIFICATION

    TYPE I:

TYPE I:

[Diagram: IM --> CU --> HORIZONTAL PU --> WORD-SLICE DM.]

NOTE THAT THE PROCESSING IS CHARACTERISED AS HORIZONTAL (A NO OF BITS PROCESSED IN PARALLEL AS A WORD).


SHORE'S CLASSIFICATION

MACHINE 2: SAME AS MACHINE 1 EXCEPT THAT THE DM FETCHES A BIT SLICE FROM ALL THE WORDS IN THE MEMORY AND THE PU IS ORGANIZED TO PERFORM THE OPERATIONS IN A BIT-SERIAL MANNER ON ALL THE WORDS.

IF THE MEMORY IS REGARDED AS A 2D ARRAY OF BITS WITH ONE WORD STORED PER ROW, THEN MACHINE 2 READS A VERTICAL SLICE OF BITS AND PROCESSES THE SAME, WHEREAS MACHINE 1 READS AND PROCESSES A HORIZONTAL SLICE OF BITS. EX. MPP, ICL DAP


SHORE'S CLASSIFICATION

MACHINE 2:

[Diagram: IM --> CU --> VERTICAL PU --> BIT-SLICE DM.]


SHORE'S CLASSIFICATION

MACHINE 3: COMBINATION OF 1 AND 2.

IT COULD BE CHARACTERISED AS HAVING A MEMORY AS AN ARRAY OF BITS WITH BOTH HORIZONTAL AND VERTICAL READING AND PROCESSING POSSIBLE.

SO, IT WILL HAVE BOTH VERTICAL AND HORIZONTAL PROCESSING UNITS.

AN EXAMPLE IS THE OMEN-60 (1973)


SHORE'S CLASSIFICATION

MACHINE 3:

[Diagram: IM --> CU --> VERTICAL PU AND HORIZONTAL PU, both sharing DM.]


SHORE'S CLASSIFICATION

MACHINE 4: IT IS OBTAINED BY REPLICATING THE PU AND DM OF MACHINE 1.

AN ENSEMBLE OF PU AND DM IS CALLED A PROCESSING ELEMENT (PE).

THE INSTS ARE ISSUED TO THE PEs BY A SINGLE CU. PEs COMMUNICATE ONLY THROUGH THE CU.

ABSENCE OF COMM BETWEEN PEs LIMITS ITS APPLICABILITY.

EX: PEPE (1976)


SHORE'S CLASSIFICATION

MACHINE 4:

[Diagram: IM --> CU issuing to several PU/DM pairs, with no links between PEs.]


SHORE'S CLASSIFICATION

MACHINE 5: SIMILAR TO MACHINE 4 WITH THE ADDITION OF COMMUNICATION BETWEEN PEs. EXAMPLE: ILLIAC IV

[Diagram: IM --> CU issuing to several PU/DM pairs, with links between neighbouring PEs.]


SHORE'S CLASSIFICATION

    MACHINE 6:

MACHINES 1 TO 5 MAINTAIN SEPARATION BETWEEN DM AND PU WITH SOME DATA BUS OR CONNECTION UNIT PROVIDING THE COMMUNICATION BETWEEN THEM.

MACHINE 6 INCLUDES THE LOGIC IN THE MEMORY ITSELF AND IS CALLED AN ASSOCIATIVE PROCESSOR.

MACHINES BASED ON SUCH ARCHITECTURES SPAN A RANGE FROM SIMPLE ASSOCIATIVE MEMORIES TO COMPLEX ASSOCIATIVE PROCS.


SHORE'S CLASSIFICATION

    MACHINE 6:

[Diagram: IM --> CU --> combined PU + DM.]


FENG'S CLASSIFICATION

FENG PROPOSED A SCHEME ON THE BASIS OF DEGREE OF PARALLELISM TO CLASSIFY COMPUTER ARCHITECTURE.

THE MAXIMUM NO OF BITS THAT CAN BE PROCESSED PER UNIT OF TIME BY THE SYSTEM IS CALLED THE MAXIMUM DEGREE OF PARALLELISM


FENG'S CLASSIFICATION

BASED ON FENG'S SCHEME, WE HAVE SEQUENTIAL AND PARALLEL OPERATIONS AT BIT AND WORD LEVELS TO PRODUCE THE FOLLOWING CLASSIFICATION:

WSBS NO CONCEIVABLE IMPLEMENTATION

WPBS STARAN

WSBP CONVENTIONAL COMPUTERS

WPBP ILLIAC IV

o THE MAX DEGREE OF PARALLELISM IS GIVEN BY THE PRODUCT OF THE NO OF BITS IN THE WORD AND THE NO OF WORDS PROCESSED IN PARALLEL
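The product rule above can be tabulated; the (bits, words) figures below are illustrative ballpark values, not published machine parameters.

```python
# Feng's measure: maximum degree of parallelism = word length x number
# of words processed in parallel, for the four classes listed above.

machines = {
    "WSBS": (1, 1),        # word-serial, bit-serial
    "WPBS": (1, 256),      # bit-slice over many words (STARAN-like)
    "WSBP": (32, 1),       # conventional word-parallel computer
    "WPBP": (64, 64),      # fully parallel (ILLIAC IV-like)
}

for name, (bits, words) in machines.items():
    print(name, "-> max degree of parallelism:", bits * words)
```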


HANDLER'S CLASSIFICATION

FENG'S SCHEME, WHILE INDICATING THE DEGREE OF PARALLELISM, DOES NOT ACCOUNT FOR THE CONCURRENCY HANDLED BY PIPELINED DESIGNS.

HANDLER'S SCHEME ALLOWS THE PIPELINING TO BE SPECIFIED.

IT ALLOWS THE IDENTIFICATION OF PARALLELISM AND THE DEGREE OF PIPELINING BUILT INTO THE HARDWARE STRUCTURE


HANDLER'S CLASSIFICATION

HANDLER DEFINED SOME OF THE TERMS AS:

PCU: PROCESSOR CONTROL UNITS

ALU: ARITHMETIC LOGIC UNITS

BLC: BIT LEVEL CIRCUITS

PE: PROCESSING ELEMENTS

A COMPUTING SYSTEM C CAN THEN BE CHARACTERISED BY A TRIPLE AS T(C) = (KxK', DxD', WxW')

WHERE K = NO OF PCU, K' = NO OF PROCESSORS THAT ARE PIPELINED, D = NO OF ALU, D' = NO OF PIPELINED ALU, W = WORDLENGTH OF ALU OR PE AND W' = NO OF PIPELINE STAGES IN ALU OR PE
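The triple can be written down for a hypothetical machine; the numbers below are invented for illustration, and taking the product of all six components as a crude overall figure of merit is my addition, not part of the scheme as stated above.

```python
# Handler's triple T(C) = (K x K', D x D', W x W') for an invented machine.

K, Kp = 1, 1          # PCUs, and processors that are pipelined
D, Dp = 4, 1          # ALUs, and pipelined ALUs
W, Wp = 64, 8         # ALU word length, and pipeline stages per ALU

triple = ((K, Kp), (D, Dp), (W, Wp))
figure_of_merit = K * Kp * D * Dp * W * Wp   # crude combined measure
print(triple, figure_of_merit)               # 1*1*4*1*64*8 = 2048
```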


    COMPUTER PROGRAM ORGANIZATION

    BROADLY, THEY MAY BE CLASSIFIED AS:

CONTROL FLOW PROGRAM ORGANIZATION

DATAFLOW PROGRAM ORGANIZATION

REDUCTION PROGRAM ORGANIZATION


    COMPUTER PROGRAM ORGANIZATION

CONTROL FLOW COMPUTERS USE EXPLICIT FLOWS OF CONTROL INFO TO

CAUSE THE EXECUTION OF INSTS.

    DATAFLOW COMPS USE THE AVAILABILITY OF

    OPERANDS TO TRIGGER THE EXECUTION OF

    OPERATIONS.

    REDUCTION COMPUTERS USE THE NEED FOR A

    RESULT TO TRIGGER THE OPERATION WHICH

    WILL GENERATE THE REQUIRED RESULT.


    COMPUTER PROGRAM ORGANIZATION

    THE THREE BASIC FORMS OF COMP PROGRAM

    ORGANIZATION MAY BE DESCRIBED IN TERMS

    OF THEIR DATA MECHANISM (WHICH DEFINES

THE WAY A PARTICULAR ARGUMENT IS USED BY A NUMBER OF INSTRUCTIONS) AND THE

CONTROL MECHANISM (WHICH DEFINES HOW

ONE INST CAUSES THE EXECUTION OF ONE OR

MORE OTHER INSTS AND THE RESULTING CONTROL PATTERN).


    COMPUTER PROGRAM ORGANIZATION

CONTROL FLOW PROCESSORS HAVE A BY-REFERENCE DATA MECHANISM (WHICH USES REFERENCES EMBEDDED IN THE INSTS BEING EXECUTED TO ACCESS THE CONTENTS OF THE SHARED MEMORY) AND TYPICALLY A SEQUENTIAL CONTROL MECHANISM (WHICH PASSES A SINGLE THREAD OF CONTROL FROM INSTRUCTION TO INSTRUCTION).


    COMPUTER PROGRAM ORGANIZATION

DATAFLOW COMPUTERS HAVE A BY-VALUE DATA MECHANISM (WHICH GENERATES AN ARGUMENT AT RUN-TIME WHICH IS REPLICATED AND GIVEN TO EACH ACCESSING INSTRUCTION FOR STORAGE AS A VALUE) AND A PARALLEL CONTROL MECHANISM.

BOTH MECHANISMS ARE SUPPORTED BY DATA TOKENS WHICH CONVEY DATA FROM PRODUCER TO CONSUMER INSTRUCTIONS AND CONTRIBUTE TO THE ACTIVATION OF CONSUMER INSTS.
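The token-driven activation described above boils down to a firing rule: an instruction runs only once tokens carrying all its operands have arrived. A minimal sketch, with made-up names:

```python
# Dataflow firing rule: fire the operation only when tokens for all of
# its operands are present; otherwise keep waiting for more tokens.

def try_fire(op, tokens, operands_needed):
    if len(tokens) == operands_needed:   # all operand tokens arrived
        return op(*tokens)
    return None                          # not ready yet

add = lambda a, b: a + b
print(try_fire(add, [3], 2))       # None - only one token has arrived
print(try_fire(add, [3, 4], 2))    # 7 - both tokens present, fires
```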


    COMPUTER PROGRAM ORGANIZATION

TWO BASIC TYPES OF REDUCTION PROGRAM ORGANIZATIONS HAVE BEEN DEVELOPED:

A. STRING REDUCTION, WHICH HAS A BY-VALUE DATA MECHANISM AND HAS ADVANTAGES WHEN MANIPULATING SIMPLE EXPRESSIONS.

B. GRAPH REDUCTION, WHICH HAS A BY-REFERENCE DATA MECHANISM AND HAS ADVANTAGES WHEN LARGER STRUCTURES ARE INVOLVED.


    COMPUTER PROGRAM ORGANIZATION

CONTROL-FLOW AND DATA-FLOW PROGRAMS ARE BUILT FROM FIXED SIZE PRIMITIVE INSTS, WITH HIGHER LEVEL PROGRAMS CONSTRUCTED FROM SEQUENCES OF THESE PRIMITIVE INSTRUCTIONS AND CONTROL OPERATIONS.

REDUCTION PROGRAMS ARE BUILT FROM HIGH LEVEL PROGRAM STRUCTURES WITHOUT THE NEED FOR CONTROL OPERATORS.


    COMPUTER PROGRAM ORGANIZATION

THE RELATIONSHIP OF THE DATA AND CONTROL MECHANISMS TO THE BASIC COMPUTER PROGRAM ORGANIZATIONS CAN BE SHOWN AS UNDER:

                        DATA MECHANISM
CONTROL MECHANISM    BY VALUE            BY REFERENCE
SEQUENTIAL           -                   VON NEUMANN CONTROL FLOW
PARALLEL             DATAFLOW            PARALLEL CONTROL FLOW
RECURSIVE            STRING REDUCTION    GRAPH REDUCTION


    MACHINE ORGANIZATION

MACHINE ORGANIZATION CAN BE CLASSIFIED AS FOLLOWS:

CENTRALIZED: CONSISTING OF A SINGLE PROCESSOR, COMM PATH AND MEMORY. A SINGLE ACTIVE INST PASSES EXECUTION TO A SPECIFIC SUCCESSOR INSTRUCTION.

o TRADITIONAL VON-NEUMANN PROCESSORS HAVE A CENTRALIZED MACHINE ORGANIZATION AND A CONTROL FLOW PROGRAM ORGANIZATION.


    MACHINE ORGANIZATION

    PACKET COMMUNICATION: USING A CIRCULAR

    INST EXECUTION PIPELINE IN WHICH

    PROCESSORS, COMMUNICATIONS AND

    MEMORIES ARE LINKED BY POOLS OF WORK.

    o NEC 7281 HAS A PACKET COMMUNICATION

    MACHINE ORGANIZATION AND DATAFLOW

    PROGRAM ORGANIZATION.


    MACHINE ORGANIZATION

    EXPRESSION MANIPULATION WHICH USES IDENTICAL

    RESOURCES IN A REGULAR STRUCTURE, EACH

    RESOURCE CONTAINING A PROCESSOR,

    COMMUNICATION AND MEMORY. THE PROGRAM

CONSISTS OF ONE LARGE STRUCTURE, PARTS OF WHICH ARE ACTIVE WHILE OTHER PARTS ARE

TEMPORARILY SUSPENDED.

    AN EXPRESSION MANIPULATION MACHINE MAY BE

    CONSTRUCTED FROM A REGULAR STRUCTURE OF T414

    TRANSPUTERS, EACH CONTAINING A VON-NEUMANN

    PROCESSOR, MEMORY AND COMMUNICATION LINKS.


    MULTIPROCESSING SYSTEMS

IT MAKES USE OF SEVERAL PROCESSORS, EACH OBEYING ITS OWN INSTS, USUALLY COMMUNICATING VIA A COMMON MEMORY.

ONE WAY OF CLASSIFYING THESE SYSTEMS IS BY THEIR DEGREE OF COUPLING.

TIGHTLY COUPLED SYSTEMS HAVE PROCESSORS INTERCONNECTED BY A MULTIPROCESSOR SYSTEM BUS, WHICH BECOMES A PERFORMANCE BOTTLENECK.


    MULTIPROCESSING SYSTEMS

    INTERCONNECTION BY A SHARED MEMORY IS LESS

    TIGHTLY COUPLED AND A MULTIPORT MEMORY MAY

    BE USED TO REDUCE THE BUS BOTTLENECK.

    THE USE OF SEVERAL AUTONOMOUS SYSTEMS, EACH

    WITH ITS OWN OS, IN A CLUSTER IS MORE LOOSELY

    COUPLED.

THE USE OF A NETWORK TO INTERCONNECT SYSTEMS, USING COMM SOFTWARE, IS THE MOST LOOSELY

COUPLED ALTERNATIVE.


    MULTIPROCESSING SYSTEMS

DEGREE OF COUPLING:

[Diagram, tightest to loosest: CPUs sharing a multiprocessor bus; systems with memory on a system bus; OS-to-OS cluster link; network link via network software.]


    MULTIPROCESSING SYSTEMS

    MULTIPROCESSORS MAY ALSO BE CLASSIFIED

    AS AUTOCRATIC OR EGALITARIAN.

AUTOCRATIC CONTROL EXISTS WHERE A

    MASTER-SLAVE RELATIONSHIP EXISTS BETWEEN

    THE PROCESSORS.

    EGALITARIAN CONTROL GIVES ALL PROCESSORS

    EQUAL CONTROL OF SHARED BUS ACCESS.


    MULTIPROCESSING SYSTEMS

    MULTIPROCESSING SYSTEMS WITH SEPARATE

    PROCESSORS AND MEMORIES MAY BE

CLASSIFIED AS DANCE HALL CONFIGURATIONS,

IN WHICH THE PROCESSORS ARE LINED UP ON ONE SIDE WITH THE MEMORIES FACING THEM.

    CROSS CONNECTIONS ARE MADE BY A

    SWITCHING NETWORK.


    MULTIPROCESSING SYSTEMS

DANCE HALL CONFIGURATION:

[Diagram: CPUs 1-4 on one side of a SWITCHING NETWORK, MEMs 1-4 facing them on the other.]


    MULTIPROCESSING SYSTEMS

ANOTHER CONFIGURATION IS THE BOUDOIR CONFIG, IN WHICH EACH PROCESSOR IS CLOSELY COUPLED WITH ITS OWN MEMORY AND A NETWORK OF SWITCHES IS USED TO LINK THE PROCESSOR-MEMORY PAIRS.

[Diagram: CPU/MEM pairs 1-4, each pair attached to a SWITCHING NETWORK.]


    ANOTHER TERM, WHICH IS USED TO DESCRIBE A

    FORM OF PARALLEL COMPUTING IS

    CONCURRENCY.

    IT DENOTES INDEPENDENT, ASYNCHRONOUS

    OPERATION OF A COLLECTION OF PARALLEL

    COMPUTING DEVICES RATHER THAN THE

    SYNCHRONOUS OPERATION OF DEVICES IN A MULTIPROCESSOR SYSTEM.


    SYSTOLIC ARRAYS

    IT MAY BE TERMED AN MISD SYSTEM.

    IT IS A REGULAR ARRAY OF PROCESSING ELEMENTS, EACH COMMUNICATING WITH ITS NEAREST

    NEIGHBOURS AND OPERATING SYNCHRONOUSLY UNDER THE CONTROL OF A COMMON CLOCK, WITH A RATE LIMITED BY THE SLOWEST PROCESSOR IN THE ARRAY.

    THE TERM SYSTOLIC IS DERIVED FROM THE RHYTHMIC CONTRACTION OF THE HEART, ANALOGOUS TO THE RHYTHMIC PUMPING OF DATA THROUGH AN ARRAY OF PROCESSING ELEMENTS.
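A minimal sketch (not from the slides) of one common systolic arrangement: an output-stationary linear array computing a matrix-vector product. The vector is pumped through the PEs one step per common clock tick; `systolic_matvec` and its shift-per-tick model are illustrative assumptions, not a definitive design.

```python
def systolic_matvec(A, x):
    # Linear systolic array: x values march rightward one PE per clock
    # tick; PE i holds the accumulating output y[i].  PE i sees x[j]
    # exactly at tick t = i + j, so it multiplies by A[i][j] then.
    n, m = len(A), len(x)
    pipe = [None] * n          # x value currently held in each PE
    y = [0] * n                # stationary accumulators, one per PE
    for t in range(m + n - 1):
        x_in = x[t] if t < m else None   # feed, then flush the pipe
        pipe = [x_in] + pipe[:-1]        # shift data right: one tick
        for i, xv in enumerate(pipe):    # all PEs fire synchronously
            if xv is not None:
                y[i] += A[i][t - i] * xv # A column index = element seen
    return y

print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))  # → [17, 39]
```

After m + n - 1 ticks every PE has seen the whole vector, matching the slide's point that the array runs under a single common clock.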


    WAVEFRONT ARRAY

    IT IS A REGULAR ARRAY OF PROCESSING ELEMENTS,

    EACH COMMUNICATING WITH ITS NEAREST

    NEIGHBOURS BUT OPERATING WITH NO GLOBAL

    CLOCK.

    IT EXHIBITS CONCURRENCY AND IS DATA DRIVEN.

    THE OPERATION OF EACH PROCESSOR IS CONTROLLED

    LOCALLY AND IS ACTIVATED BY THE ARRIVAL OF DATA AFTER ITS PREVIOUS OUTPUT HAS BEEN DELIVERED TO

    THE APPROPRIATE NEIGHBOURING PROCESSOR.


    PROCESSING WAVEFRONTS DEVELOP ACROSS

    THE ARRAY AS PROCESSORS PASS ON THE

    OUTPUT DATA TO THEIR NEIGHBOUR. HENCE

    THE NAME.


    GRANULARITY OF PARALLELISM

    PARALLEL PROCESSING EMPHASIZES THE USE OF SEVERAL PROCESSING ELEMENTS WITH THE MAIN OBJECTIVE OF GAINING SPEED IN CARRYING OUT A TIME CONSUMING COMPUTING JOB.

    A MULTI-TASKING OS EXECUTES JOBS CONCURRENTLY, BUT THE OBJECTIVE IS TO

    EFFECT THE CONTINUED PROGRESS OF ALL THE TASKS BY SHARING THE RESOURCES IN AN ORDERLY MANNER.


    PARALLEL PROCESSING EMPHASIZES THE EXPLOITATION OF CONCURRENCY AVAILABLE IN A PROBLEM FOR CARRYING OUT THE COMPUTATION BY EMPLOYING MORE THAN ONE PROCESSOR TO ACHIEVE BETTER SPEED AND/OR THROUGHPUT.

    THE CONCURRENCY IN THE COMPUTING

    PROCESS COULD BE LOOKED UPON FOR PARALLEL PROCESSING AT VARIOUS LEVELS (GRANULARITY OF PARALLELISM) IN THE

    SYSTEM.


    THE FOLLOWING GRANULARITIES OF PARALLELISM MAY BE IDENTIFIED IN ANY EXISTING SYSTEM:

    o PROGRAM LEVEL PARALLELISM

    o PROCESS OR TASK LEVEL PARALLELISM

    o PARALLELISM AT THE LEVEL OF A GROUP OF STATEMENTS

    o STATEMENT LEVEL PARALLELISM

    o PARALLELISM WITHIN A STATEMENT

    o INSTRUCTION LEVEL PARALLELISM

    o PARALLELISM WITHIN AN INSTRUCTION

    o LOGIC AND CIRCUIT LEVEL PARALLELISM


    THE GRANULARITIES ARE LISTED IN THE

    INCREASING DEGREE OF FINENESS.

    GRANULARITIES AT LEVELS 1, 2 AND 3 CAN BE EASILY IMPLEMENTED ON A CONVENTIONAL

    MULTIPROCESSOR SYSTEM.

    MOST MULTI-TASKING OSs ALLOW CREATION AND SCHEDULING OF PROCESSES ON THE

    AVAILABLE RESOURCES.


    SINCE A PROCESS REPRESENTS A SIZABLE CODE IN TERMS OF EXECUTION TIME, THE OVERHEADS IN EXPLOITING THE PARALLELISM AT THESE GRANULARITIES ARE NOT EXCESSIVE.

    IF THE SAME PRINCIPLE IS APPLIED TO THE NEXT FEW LEVELS, INCREASED SCHEDULING OVERHEADS MAY NOT WARRANT PARALLEL EXECUTION.

    IT IS SO BECAUSE THE UNIT OF WORK OF A MULTI-PROCESSOR IS CURRENTLY MODELLED AT THE LEVEL OF A PROCESS OR TASK AND IS REASONABLY SUPPORTED ON THE CURRENT ARCHITECTURES.


    THE LAST THREE LEVELS ARE BEST HANDLED BY HARDWARE. SEVERAL MACHINES HAVE BEEN BUILT TO PROVIDE THE FINE GRAIN PARALLELISM IN VARYING DEGREES.

    A MACHINE HAVING INST LEVEL PARALLELISM EXECUTES SEVERAL INSTS SIMULTANEOUSLY. EXAMPLES ARE PIPELINE INST PROCESSORS, SYNCHRONOUS ARRAY PROCESSORS, ETC.

    CIRCUIT LEVEL PARALLELISM EXISTS IN MOST MACHINES IN THE FORM OF PROCESSING MULTIPLE BITS/BYTES SIMULTANEOUSLY.


    PARALLEL ARCHITECTURES

    THERE ARE NUMEROUS ARCHITECTURES THAT HAVE

    BEEN USED IN THE DESIGN OF HIGH SPEED COMPUTERS.

    THEY FALL BASICALLY INTO 2 CLASSES:

    GENERAL PURPOSE &

    SPECIAL PURPOSE

    o GENERAL PURPOSE ARCHITECTURES ARE DESIGNED TO

    PROVIDE THE RATED SPEEDS AND OTHER COMPUTING REQUIREMENTS FOR A VARIETY OF PROBLEMS WITH

    THE SAME PERFORMANCE.


    THE IMPORTANT ARCHITECTURAL IDEAS BEING

    USED IN DESIGNING GEN PURPOSE HIGH SPEED

    COMPUTERS ARE:

    PIPELINED ARCHITECTURES

    ASYNCHRONOUS MULTI-PROCESSORS

    DATA-FLOW COMPUTERS


    THE SPECIAL PURPOSE MACHINES HAVE TO EXCEL AT

    WHAT THEY HAVE BEEN DESIGNED FOR. THEY MAY OR MAY

    NOT DO SO FOR OTHER APPLICATIONS. SOME OF THE

    IMPORTANT ARCHITECTURAL IDEAS FOR DEDICATED

    COMPUTERS ARE:

    SYNCHRONOUS MULTI-PROCESSORS (ARRAY

    PROCESSORS)

    SYSTOLIC ARRAYS

    NEURAL NETWORKS


    ARRAY PROCESSORS

    IT CONSISTS OF SEVERAL PE, ALL OF WHICH EXECUTE

    THE SAME INST ON DIFFERENT DATA.

    THE INSTS ARE FETCHED AND BROADCAST TO ALL THE

    PE BY A COMMON CU.

    THE PE EXECUTE INSTS ON DATA RESIDING IN THEIR

    OWN MEMORY.

    THE PE ARE LINKED VIA AN INTERCONNECTION

    NETWORK TO CARRY OUT DATA COMMUNICATION

    BETWEEN THEM.
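The SIMD behaviour described above — one CU broadcasting each inst, every PE applying it to its own local memory in lockstep — can be sketched as follows. The names `broadcast` and `pes`, and the dict-as-local-memory model, are illustrative assumptions, not an actual machine's interface.

```python
# Hypothetical SIMD/array-processor sketch: the CU broadcasts one
# instruction per step; every PE executes it on data residing in its
# own local memory, conceptually simultaneously (lockstep).

def broadcast(pe_memories, instruction):
    for mem in pe_memories:      # in real HW: all PEs fire at once
        instruction(mem)

# 4 PEs, each with a private local memory holding different data
pes = [{"a": i, "b": 10 * i, "c": 0} for i in range(4)]

# Same instruction (c = a + b) broadcast to all PEs, different data
broadcast(pes, lambda m: m.update(c=m["a"] + m["b"]))
print([m["c"] for m in pes])     # → [0, 11, 22, 33]
```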


    THERE ARE SEVERAL WAYS OF CONNECTING PE

    THESE MACHINES REQUIRE SPECIAL

    PROGRAMMING EFFORTS TO ACHIEVE THE SPEED ADVANTAGE

    THE COMPUTATIONS ARE CARRIED OUT

    SYNCHRONOUSLY BY THE HW AND THEREFORE SYNC IS NOT AN EXPLICIT PROBLEM


    USING AN INTERCONNECTION NETWORK:

    CU AND SCALAR PROCESSOR -> PE1, PE2, PE3, PE4, …, PEn <-> INTERCONNECTION NETWORK


    USING AN ALIGNMENT NETWORK:

    CONTROL UNIT AND SCALAR PROCESSOR -> PE0, PE1, PE2, …, PEn <-> ALIGNMENT NETWORK <-> MEM 0, MEM 1, MEM 2, …, MEM k


    CONVENTIONAL MULTI-PROCESSORS

    AN ASYNCHRONOUS MULTIPROCESSOR,

    BASED ON MULTIPLE CPUs AND MEM BANKS

    CONNECTED THROUGH EITHER A BUS OR A CONNECTION NETWORK, IS A COMMONLY USED

    TECHNIQUE TO PROVIDE INCREASED

    THROUGHPUT AND/OR RESPONSE TIME IN A

    GENERAL PURPOSE COMPUTING ENVIRONMENT.


    IN SUCH SYSTEMS, EACH CPU OPERATES

    INDEPENDENTLY ON THE QUANTUM OF WORK GIVEN

    TO IT

    IT HAS BEEN HIGHLY SUCCESSFUL IN PROVIDING

    INCREASED THROUGHPUT AND/OR RESPONSE TIME IN

    TIME SHARED SYSTEMS.

    EFFECTIVE REDUCTION OF THE EXECUTION TIME OF A GIVEN JOB REQUIRES THE JOB TO BE BROKEN INTO

    SUB-JOBS THAT ARE TO BE HANDLED SEPARATELY BY

    THE AVAILABLE PHYSICAL PROCESSORS.


    IT WORKS WELL FOR TASKS RUNNING MORE OR

    LESS INDEPENDENTLY, i.e., FOR TASKS HAVING

    LOW COMMUNICATION AND SYNCHRONIZATION

    REQUIREMENTS.

    COMM AND SYNC ARE IMPLEMENTED EITHER

    THROUGH THE SHARED MEMORY, OR BY A

    MESSAGE SYSTEM, OR THROUGH A HYBRID APPROACH.


    SHARED MEMORY ARCHITECTURE:

    COMMON BUS ARCHITECTURE: CPU, CPU, CPU AND A SHARED MEMORY ON A SINGLE BUS

    SWITCH-BASED MULTIPROCESSOR: CPU, CPU, CPU <-> PROCESSOR-MEMORY SWITCH <-> MEM0, MEM1, …, MEMn


    MESSAGE BASED ARCHITECTURE:

    PE 1, PE 2, …, PE n <-> CONNECTION NETWORK


    HYBRID ARCHITECTURE:

    PE 1, PE 2, …, PE n <-> CONNECTION NETWORK <-> MEM 1, MEM 2, …, MEM k


    ON A SINGLE BUS SYSTEM, THERE IS A LIMIT ON THE NUMBER OF PROCESSORS THAT CAN BE OPERATED IN PARALLEL.

    IT IS USUALLY OF THE ORDER OF 10.

    A COMM NETWORK HAS THE ADVANTAGE THAT THE NO OF PROCESSORS CAN GROW WITHOUT LIMIT, BUT THE CONNECTION AND COMM COST MAY DOMINATE AND THUS SATURATE THE PERFORMANCE GAIN.

    DUE TO THIS REASON, A HYBRID APPROACH MAY BE FOLLOWED.

    MANY SYSTEMS USE A COMMON BUS ARCH FOR GLOBAL MEM, DISK AND I/O, WHILE THE PROC-MEM TRAFFIC IS HANDLED BY A SEPARATE BUS.


    DATA FLOW COMPUTERS

    A NEW FINE GRAIN PARALLEL PROCESSING APPROACH BASED ON THE DATAFLOW COMPUTING MODEL WAS SUGGESTED BY JACK DENNIS IN 1975.

    HERE, A NO OF DATA FLOW OPERATORS, EACH CAPABLE OF DOING AN OPERATION, ARE EMPLOYED.

    A PROGRAM FOR SUCH A MACHINE IS A CONNECTION GRAPH OF THE OPERATORS.


    THE OPERATORS FORM THE NODES OF THE

    GRAPH WHILE THE ARCS REPRESENT THE DATA

    MOVEMENT BETWEEN NODES.

    AN ARC IS LABELED WITH A TOKEN TO INDICATE

    THAT IT CONTAINS THE DATA.

    A TOKEN IS GENERATED ON THE OUTPUT OF A NODE WHEN IT COMPUTES THE FUNCTION

    BASED ON THE DATA ON ITS INPUT ARCS.


    THIS IS KNOWN AS FIRING OF THE NODE.

    A NODE CAN FIRE ONLY WHEN ALL OF ITS INPUT

    ARCS HAVE TOKENS AND THERE IS NO TOKEN

    ON THE OUTPUT ARC. WHEN A NODE FIRES, IT REMOVES THE INPUT

    TOKENS TO SHOW THAT THE DATA HAS BEEN

    CONSUMED.

    USUALLY, COMPUTATION STARTS WITH THE ARRIVAL OF DATA ON THE INPUT NODES OF THE

    GRAPH.
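The firing rule above can be sketched as a tiny simulator: a node fires only when every input arc carries a token and its output arc is empty, and firing consumes the input tokens. The `Node`/`Const` classes and the (5 + 3) - 2 example graph are illustrative assumptions, not the slides' machine.

```python
import operator

class Node:
    def __init__(self, fn, inputs):
        self.fn, self.inputs, self.out = fn, inputs, None  # out = token on output arc

    def try_fire(self):
        # Fire only if every input arc has a token and the output arc is empty
        if self.out is None and all(a.out is not None for a in self.inputs):
            self.out = self.fn(*(a.out for a in self.inputs))
            for a in self.inputs:
                a.out = None       # firing consumes the input tokens
            return True
        return False

class Const(Node):   # source node: starts with a ready token on its output arc
    def __init__(self, v):
        self.inputs, self.out = [], v

# Connection graph for (5 + 3) - 2: tokens flow in, nodes fire as data arrives
add = Node(operator.add, [Const(5), Const(3)])
sub = Node(operator.sub, [add, Const(2)])
while sub.try_fire() or add.try_fire():   # keep firing any enabled node
    pass
print(sub.out)   # (5 + 3) - 2 = 6
```

Note that no node is ever told *when* to run; the availability of tokens alone drives the computation, which is the point of the dataflow model.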


    DATA FLOW GRAPH FOR THE COMPUTATION:

    A = 5 + C - D (INPUT TOKENS 5, C AND D FEED THE + AND - OPERATOR NODES; COMPUTATION PROGRESSES AS PER DATA AVAILABILITY)


    MANY CONVENTIONAL MACHINES EMPLOYING

    MULTIPLE FUNCTIONAL UNITS EMPLOY THE DATA

    FLOW MODEL FOR SCHEDULING THE FUNCTIONAL

    UNITS.

    EXAMPLE EXPERIMENTAL MACHINES ARE

    THE MANCHESTER MACHINE (1984) AND THE MIT MACHINE.

    THE DATA FLOW COMPUTERS PROVIDE FINE

    GRANULARITY OF PARALLEL PROCESSING, SINCE THE

    DATA FLOW OPERATORS ARE TYPICALLY

    ELEMENTARY ARITHMETIC AND LOGIC OPERATORS.


    IT MAY PROVIDE AN EFFECTIVE SOLUTION FOR USING

    VERY LARGE NUMBER OF COMPUTING ELEMENTS IN

    PARALLEL.

    WITH ITS ASYNCHRONOUS DATA DRIVEN CONTROL, IT

    HAS A PROMISE FOR EXPLOITATION OF THE

    PARALLELISM AVAILABLE BOTH IN THE PROBLEM AND

    THE MACHINE.

    CURRENT IMPLEMENTATIONS ARE NO BETTER THAN

    CONVENTIONAL PIPELINED MACHINES EMPLOYING

    MULTIPLE FUNCTIONAL UNITS.


    SYSTOLIC ARCHITECTURES

    THE ADVENT OF VLSI HAS MADE IT POSSIBLE TO DEVELOP SPECIAL ARCHITECTURES SUITABLE FOR DIRECT IMPLEMENTATION IN VLSI.

    SYSTOLIC ARCHITECTURES ARE BASICALLY PIPELINES OPERATING IN ONE OR MORE DIMENSIONS.

    THE NAME SYSTOLIC HAS BEEN DERIVED FROM THE ANALOGY OF THE OPERATION OF THE BLOOD CIRCULATION SYSTEM THROUGH THE HEART.


    CONVENTIONAL ARCHITECTURES OPERATE ON THE DATA USING LOAD AND STORE OPERATIONS FROM THE MEMORY.

    PROCESSING USUALLY INVOLVES SEVERAL OPERATIONS.

    EACH OPERATION ACCESSES THE MEMORY FOR

    DATA, PROCESSES IT AND THEN STORES THE RESULT. THIS REQUIRES A NO OF MEM REFERENCES.


    CONVENTIONAL PROCESSING: EACH FUNCTION F1, F2, …, Fn LOADS FROM AND STORES TO MEMORY SEPARATELY.

    SYSTOLIC PROCESSING: MEMORY -> F1 -> F2 -> … -> Fn -> MEMORY


    IN SYSTOLIC PROCESSING, DATA TO BE PROCESSED

    FLOWS THROUGH VARIOUS OPERATION STAGES AND

    THEN FINALLY IT IS PUT IN THE MEMORY.

    SUCH AN ARCHITECTURE CAN PROVIDE VERY HIGH

    COMPUTING THROUGHPUT DUE TO REGULAR

    DATAFLOW AND PIPELINE OPERATION.

    IT MAY BE USEFUL IN DESIGNING SPECIAL PROCESSORS

    FOR GRAPHIC, SIGNAL & IMAGE PROCESSING.


    PERFORMANCE OF PARALLEL COMPUTERS

    AN IMPORTANT MEASURE OF A PARALLEL ARCHITECTURE IS SPEEDUP.

    LET n = NO. OF PROCESSORS; Ts = SINGLE PROC.

    EXEC TIME; Tn = n PROC. EXEC. TIME,

    THEN

    SPEEDUP S = Ts/Tn


    AMDAHL'S LAW

    1967

    BASED ON A VERY SIMPLE OBSERVATION.

    A PROGRAM REQUIRING TOTAL TIME T FOR

    SEQUENTIAL EXECUTION SHALL HAVE SOME

    PART WHICH IS INHERENTLY SEQUENTIAL.

    IN TERMS OF TOTAL TIME TAKEN TO SOLVE THE

    PROBLEM, THIS FRACTION OF COMPUTING TIME

    IS AN IMPORTANT PARAMETER.


    LET f = SEQ. FRACTION FOR A GIVEN PROGRAM.

    AMDAHL'S LAW STATES THAT THE SPEEDUP OF

    A PARALLEL COMPUTER WITH n PROCESSORS IS LIMITED BY

    S = 1/(f + (1-f)/n) <= 1/f

    CONSIDER TWO PARALLEL COMPS. Me AND Mi. Me IS BUILT USING POWERFUL PROCS. CAPABLE OF EXECUTING AT A SPEED OF M MEGAFLOPS.

    THE COMP Mi IS BUILT USING CHEAP PROCS., AND EACH PROC. OF Mi EXECUTES r.M MEGAFLOPS, WHERE 0 < r < 1.

    IF THE MACHINE Mi ATTEMPTS A COMPUTATION

    WHOSE INHERENTLY SEQ. FRACTION f > r, THEN Mi WILL EXECUTE THE COMP. MORE SLOWLY THAN A SINGLE PROC. OF Me.


    PROOF:

    LET W = TOTAL WORK; M = SPEED OF A PROC. OF Me (IN Mflops);

    r.M = SPEED OF A PE OF Mi; f.W = SEQ WORK OF THE JOB;

    T(Me) = TIME TAKEN BY Me FOR THE WORK W,

    T(Mi) = TIME TAKEN BY Mi FOR THE WORK W, THEN

    TIME TAKEN BY ANY COMP =

    T = AMOUNT OF WORK/SPEED


    T(Mi) = TIME FOR SEQ PART + TIME FOR

    PARALLEL PART

    = ((f.W)/(r.M)) + [((1-f).W/n)/(r.M)] = (W/M).(f/r) IF n IS

    INFINITELY LARGE. T(Me) = (W/M) [ASSUMING ONLY 1 PROC.]

    SO IF f > r, THEN T(Mi) > T(Me)


    THE THEOREM IMPLIES THAT A SEQ COMPONENT

    FRACTION ACCEPTABLE FOR THE MACHINE Me

    MAY NOT BE ACCEPTABLE FOR THE MACHINE

    Mi.

    IT IS NOT GOOD TO HAVE A LARGER PROCESSING

    POWER THAT GOES TO WASTE. PROCS MUST

    MAINTAIN SOME LEVEL OF EFFICIENCY.


    RELATION BETWEEN EFFICIENCY e AND SEQ FRACTION f:

    e = S/n <= 1/(n.f + (1-f))


    MINSKY'S CONJECTURE

    1970

    FOR A PARALLEL COMPUTER WITH n PROCS, THE

    SPEEDUP S SHALL BE PROPORTIONAL TO log2n.

    MINSKY'S CONJECTURE WAS VERY BAD NEWS FOR THE

    PROPONENTS OF LARGE SCALE PARALLEL

    ARCHITECTURES.

    FLYNN & HENNESSY (1980) THEN GAVE THAT THE SPEEDUP OF AN n PROCESSOR PARALLEL SYSTEM IS

    LIMITED BY S <= n/ln(n)
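A quick numeric comparison of the two estimates (a sketch; the helper name `bounds` is illustrative) shows why the Flynn-Hennessy limit was better news than Minsky's log2(n):

```python
import math

def bounds(n):
    # (Minsky's conjecture, Flynn-Hennessy limit) for n processors
    return math.log2(n), n / math.log(n)

for n in (16, 256, 4096):
    minsky, flynn = bounds(n)
    print(n, round(minsky, 1), round(flynn, 1))
```

For every n > e the n/ln(n) bound exceeds log2(n), and the gap widens rapidly as n grows, so large processor counts are not hopeless under the later estimate.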


    PARALLEL ALGORITHMS

    AN IMP MEASURE OF THE PERFORMANCE OF ANY ALGO IS ITS TIME AND SPACE COMPLEXITY. THEY ARE SPECIFIED AS SOME FUNCTION OF THE PROBLEM SIZE.

    MANY TIMES, THEY DEPEND UPON THE DATA STRUCTURE USED.

    SO, ANOTHER IMP MEASURE IS THE PREPROCESSING TIME COMPLEXITY TO GENERATE THE DESIRED DATA STRUCTURE.


    PARALLEL ALGOS ARE THE ALGOS TO BE RUN

    ON A PARALLEL MACHINE.

    SO, THE COMPLEXITY OF COMM AMONGST PROCESSORS ALSO BECOMES AN IMPORTANT

    MEASURE.

    SO, AN ALGO MAY FARE BADLY ON ONE MACHINE AND MUCH BETTER ON ANOTHER.


    DUE TO THIS REASON, MAPPING OF THE ALGO

    ON THE ARCHITECTURE IS AN IMP ACTIVITY IN

    THE STUDY OF PARALLEL ALGOS.

    SPEEDUP AND EFFICIENCY ARE ALSO IMP

    PERFORMANCE MEASURES FOR A PARALLEL

    ALGO WHEN MAPPED ON TO A GIVEN ARCHITECTURE.


    A PARALLEL ALGO FOR A GIVEN PROBLEM

    MAY BE DEVELOPED USING ONE OR MORE OF

    THE FOLLOWING:

    1. DETECT AND EXPLOIT THE INHERENT PARALLELISM AVAILABLE IN THE EXISTING SEQUENTIAL ALGORITHM

    2. INDEPENDENTLY INVENT A NEW PARALLEL ALGORITHM

    3. ADAPT AN EXISTING PARALLEL ALGO THAT SOLVES A SIMILAR PROBLEM.


    DISTRIBUTED PROCESSING

    PARALLEL PROCESSING DIFFERS FROM DISTRIBUTED

    PROCESSING IN THE SENSE THAT IT HAS (1) CLOSE

    COUPLING BETWEEN THE PROCESSORS AND (2)

    COMMUNICATION FAILURES THAT MATTER A LOT.

    PROBLEMS MAY ARISE IN DISTRIBUTED PROCESSING

    BECAUSE OF (1) TIME UNCERTAINTY DUE TO DIFFERING

    TIMES IN LOCAL CLOCKS, (2) INCOMPLETE INFO ABOUT

    OTHER NODES IN THE SYSTEM, (3) DUPLICATE INFO

    WHICH MAY NOT ALWAYS BE CONSISTENT.


    PIPELINING PROCESSING

    A PIPELINE CAN WORK WELL WHEN:

    1. THE TIME TAKEN BY EACH STAGE IS NEARLY THE SAME.

    2. IT REQUIRES A STEADY STREAM OF JOBS, OTHERWISE UTILIZATION WILL BE POOR.

    3. IT HONOURS THE PRECEDENCE CONSTRAINTS OF SUB-STEPS OF JOBS.

    THIS IS THE MOST IMP PROPERTY OF A PIPELINE. IT

    ALLOWS PARALLEL EXECUTION OF JOBS WHICH HAVE NO PARALLELISM WITHIN INDIVIDUAL JOBS THEMSELVES.


    IN FACT, A JOB WHICH CAN BE BROKEN INTO A NO OF SEQUENTIAL STEPS IS THE BASIS OF PIPELINE PROCESSING.

    THIS IS DONE BY INTRODUCING TEMPORAL PARALLELISM, WHICH MEANS EXECUTING DIFFERENT STEPS OF DIFFERENT JOBS INSIDE THE PIPELINE.

    THE PERFORMANCE IN TERMS OF THROUGHPUT IS GUARANTEED IF THERE ARE ENOUGH JOBS TO BE

    STREAMED THROUGH THE PIPELINE, ALTHOUGH AN INDIVIDUAL JOB FINISHES WITH A DELAY EQUALLING THE TOTAL DELAY OF ALL THE STAGES.


    THE FOURTH IMP THING IS THAT THE STAGES IN THE PIPELINE ARE SPECIALIZED TO DO PARTICULAR SUBFUNCTIONS, UNLIKE IN CONVENTIONAL PARALLEL PROCESSORS WHERE

    EQUIPMENT IS REPLICATED.

    IT AMOUNTS TO SAYING THAT, DUE TO SPECIALIZATION, THE STAGE PROC COULD BE

    DESIGNED WITH BETTER COST AND SPEED, OPTIMISED FOR THE SPECIALISED FUNCTION OF THE STAGE

    PERFORMANCE MEASURES OF A PIPELINE: EFFICIENCY, SPEEDUP AND THROUGHPUT

    EFFICIENCY: LET n BE THE LENGTH OF THE PIPE AND

    m BE THE NO OF TASKS RUN ON THE PIPE. THEN EFFICIENCY e CAN BE DEFINED AS

    e = [(m.n)/((m+n-1).(n))] = m/(m+n-1)

    WHEN n>>m, e TENDS TO m/n (A SMALL FRACTION); WHEN m>>n, e TENDS TO 1.


    SPEEDUP: S = [((n.ts).m)/((m+n-1).ts)]

    = [(m.n)/(n+m-1)]

    WHEN n>>m, S TENDS TO m (NO. OF TASKS RUN); WHEN m>>n, S TENDS TO n (NO. OF STAGES).


    THROUGHPUT: Th = [m/((n+m-1).ts)] = e/ts, WHERE ts IS THE

    TIME THAT ELAPSES AT 1 STAGE.

    WHEN n>>m, Th TENDS TO m/(n.ts); WHEN m>>n, Th TENDS TO 1/ts.
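The three measures above can be computed together for a linear pipe of n stages, per-stage time ts, and m streamed tasks, using total time (n + m - 1).ts (the function name `pipeline_metrics` is illustrative):

```python
# Pipeline performance: speedup vs. running each of the m tasks
# sequentially (n*ts per task), efficiency = S/n, throughput = m/total.

def pipeline_metrics(n, m, ts):
    total = (n + m - 1) * ts              # fill time + drain time
    speedup = (n * ts * m) / total        # = m*n / (m+n-1)
    efficiency = speedup / n              # = m / (m+n-1)
    throughput = m / total                # = efficiency / ts
    return speedup, efficiency, throughput

s, e, th = pipeline_metrics(n=4, m=100, ts=1.0)
print(round(s, 2), round(e, 2), round(th, 3))   # m >> n: S→n, e→1, Th→1/ts
```

With m = 100 tasks on a 4-stage pipe, the speedup is already close to n = 4, matching the asymptotic cases listed above.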


    OPTIMAL PIPE SEGMENTATION

    INTO HOW MANY SUBFUNCTIONS SHOULD A FUNCTION BE DIVIDED?

    LET n = NO OF STAGES, T = TIME FOR NON-PIPELINED IMPLEMENTATION, D = LATCH DELAY AND c = COST OF EACH STAGE

    STAGE COMPUTE TIME = T/n (SINCE T IS DIVIDED

    EQUALLY AMONG n STAGES). PIPELINE COST = c.n + k, WHERE k IS A CONSTANT

    REFLECTING SOME COST OVERHEAD.


    SPEED (TIME PER OUTPUT) = (T/n + D)

    ONE OF THE IMPORTANT PERFORMANCE

    MEASURES IS THE PRODUCT OF SPEED AND COST,

    DENOTED BY p: p = (T/n + D).(c.n + k) = T.c + D.c.n + (k.T)/n + k.D

    TO OBTAIN THE VALUE OF n WHICH GIVES BEST

    PERFORMANCE, WE DIFFERENTIATE p WITH RESPECT TO n AND

    EQUATE IT TO ZERO: dp/dn = D.c - (k.T)/n^2 = 0

    n = SQRT [(k.T)/(D.c)]
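The optimum above can be sanity-checked numerically: a brute-force search over integer stage counts should land on the analytic sqrt(k.T/(D.c)). The parameter values below are illustrative, not from the slides:

```python
import math

# Illustrative parameters: non-pipelined time T, latch delay D,
# per-stage cost c, cost overhead k (none of these are from the slides).
T, D, c, k = 64.0, 1.0, 2.0, 8.0

def p(n):
    # speed-cost product p(n) = (T/n + D) * (c*n + k)
    return (T / n + D) * (c * n + k)

n_opt = math.sqrt(k * T / (D * c))     # analytic optimum = sqrt(512/2) = 16
best = min(range(1, 100), key=p)       # brute-force over integer n
print(n_opt, best)                     # analytic and brute-force agree at 16
```

When the analytic n_opt is not an integer, the true best stage count is one of the two neighbouring integers, so a short search like this is still useful in practice.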


    PIPELINE CONTROL

    IN A NON-PIPELINED SYSTEM, ONE INST IS FULLY EXECUTED

    BEFORE THE NEXT ONE STARTS, THUS MATCHING THE ORDER

    OF EXECUTION.

    IN A PIPELINED SYSTEM, INST EXECUTION IS OVERLAPPED. SO, IT CAN CAUSE PROBLEMS IF NOT CONSIDERED PROPERLY IN

    THE DESIGN OF CONTROL.

    EXISTENCE OF SUCH DEPENDENCIES CAUSES HAZARDS

    CONTROL STRUCTURE PLAYS AN IMP ROLE IN THE

    OPERATIONAL EFFICIENCY AND THROUGHPUT OF THE

    MACHINE.


    THERE ARE 2 TYPES OF CONTROL STRUCTURES IMPLEMENTED ON COMMERCIAL SYSTEMS.

    THE FIRST ONE IS CHARACTERISED BY A STREAMLINED FLOW OF THE INSTS IN THE PIPE.

    IN THIS, INSTS FOLLOW ONE AFTER ANOTHER SUCH THAT THE COMPLETION ORDERING IS THE SAME AS THE ORDER OF INITIATION.

    THE SYSTEM IS CONCEIVED AS A SEQUENCE OF FUNCTIONAL MODULES THROUGH WHICH THE INSTS FLOW ONE AFTER ANOTHER, WITH AN INTERLOCK BETWEEN THE ADJACENT STAGES TO ALLOW THE TRANSFER OF DATA FROM ONE STAGE TO ANOTHER.


    THE INTERLOCK IS NECESSARY BECAUSE THE PIPE IS

    ASYNCHRONOUS DUE TO VARIATIONS IN THE SPEEDS

    OF DIFFERENT STAGES.

    IN THESE SYSTEMS, BOTTLENECKS APPEAR DYNAMICALLY AT ANY STAGE AND THE INPUT TO IT IS

    HALTED TEMPORARILY.

    THE SECOND TYPE OF CONTROL IS MORE FLEXIBLE

    AND POWERFUL, BUT EXPENSIVE.


    IN SUCH SYSTEMS, WHEN A STAGE HAS TO SUSPEND THE

    FLOW OF A PARTICULAR INSTRUCTION, IT ALLOWS OTHER

    INSTS TO PASS THROUGH THE STAGE RESULTING IN AN OUT-

    OF-TURN EXECUTION OF THE INSTS.

    THE CONTROL MECHANISM IS DESIGNED SUCH THAT EVEN

    THOUGH THE INSTS ARE EXECUTED OUT-OF-TURN, THE

    BEHAVIOUR OF THE PROGRAM IS SAME AS IF THEY WERE

    EXECUTED IN THE ORIGINAL SEQUENCE.

    SUCH CONTROL IS DESIRABLE IN A SYSTEM HAVING MULTIPLE ARITHMETIC PIPELINES OPERATING IN PARALLEL.


    PIPELINE HAZARDS

    THE HARDWARE TECHNIQUE THAT DETECTS AND

    RESOLVES HAZARDS IS CALLED INTERLOCK.

    A HAZARD OCCURS WHENEVER AN OBJECT WITHIN

    THE SYSTEM (REG, FLAG, MEM LOCATION) IS ACCESSED OR MODIFIED BY 2 SEPARATE INSTS THAT ARE CLOSE

    ENOUGH IN THE PROGRAM THAT THEY MAY BE

    ACTIVE SIMULTANEOUSLY IN THE PIPELINE.

    HAZARDS ARE OF 3 KINDS: RAW, WAR AND WAW


    ASSUME THAT AN INST j LOGICALLY FOLLOWS AN INST i.

    RAW HAZARD: IT OCCURS BETWEEN 2 INSTS WHEN INST j

    ATTEMPTS TO READ SOME OBJECT THAT IS BEING MODIFIED

    BY INST i.

    WAR HAZARD: IT OCCURS BETWEEN 2 INSTS WHEN THE INST j

    ATTEMPTS TO WRITE ONTO SOME OBJECT THAT IS BEING

    READ BY THE INST i.

    WAW HAZARD: IT OCCURS WHEN THE INST j ATTEMPTS TO

    WRITE ONTO SOME OBJECT THAT IS ALSO REQUIRED TO BE

    MODIFIED BY THE INST i.


    THE DOMAIN (READ SET) OF AN INST k, DENOTED BY Dk, IS THE SET OF ALL OBJECTS WHOSE CONTENTS ARE ACCESSED BY THE INST k.

    THE RANGE (WRITE SET) OF AN INST k, DENOTED BY Rk, IS THE SET OF ALL OBJECTS UPDATED BY THE INST k.

    A HAZARD BETWEEN 2 INSTS i AND j (WHERE j FOLLOWS i) OCCURS WHENEVER ANY OF THE FOLLOWING HOLDS:

    Ri * Dj ≠ { } (RAW)

    Di * Rj ≠ { } (WAR)

    Ri * Rj ≠ { } (WAW), WHERE * IS THE INTERSECTION OPERATION AND { } IS THE EMPTY SET.
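The three set conditions translate directly into code; this sketch (the function name `hazards` is illustrative) checks a pair of insts where i precedes j in program order:

```python
# Hazard detection via domain (read set) and range (write set)
# intersections, exactly as the three conditions above.

def hazards(Di, Ri, Dj, Rj):
    found = []
    if Ri & Dj: found.append("RAW")   # j reads an object i writes
    if Di & Rj: found.append("WAR")   # j writes an object i reads
    if Ri & Rj: found.append("WAW")   # both insts write the same object
    return found

# i: R1 = R2 + R3     j: R4 = R1 + R5   (j reads i's result)
print(hazards(Di={"R2", "R3"}, Ri={"R1"},
              Dj={"R1", "R5"}, Rj={"R4"}))   # → ['RAW']
```

This is essentially what a centralized interlock in the IU does: compare the fetched inst's sets against those of every inst still inside the pipeline.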


    HAZARD DETECTION & REMOVAL

    TECHNIQUES USED FOR HAZARD DETECTION CAN BE CLASSIFIED INTO 2 CLASSES:

    CENTRALIZE ALL THE HAZARD DETECTION IN ONE

    STAGE (USUALLY THE IU) AND COMPARE THE DOMAIN AND RANGE SETS WITH THOSE OF ALL THE INSTS INSIDE THE PIPELINE

    ALLOW THE INSTS TO TRAVEL THROUGH THE PIPELINE UNTIL AN OBJECT EITHER FROM THE DOMAIN OR

    RANGE IS REQUIRED BY THE INST. AT THIS POINT, A CHECK IS MADE FOR A POTENTIAL HAZARD WITH ANY OTHER INST INSIDE THE PIPELINE.


    FIRST APPROACH IS SIMPLE BUT SUSPENDS THE

    INST FLOW IN THE IU ITSELF, IF THE INST

    FETCHED IS IN HAZARD WITH THOSE INSIDE THE

    PIPELINE.

    THE SECOND APPROACH IS MORE FLEXIBLE BUT

    THE HARDWARE REQUIRED GROWS AS A

    SQUARE OF THE NO OF STAGES.


    THERE ARE 2 APPROACHES FOR HAZARD REMOVAL:

    SUSPEND THE PIPELINE INITIATION AT THE POINT OF

    HAZARD. THUS, IF AN INST j DISCOVERS THAT THERE IS

    A HAZARD WITH THE PREVIOUSLY INITIATED INST i, THEN ALL THE INSTS j+1, j+2, … ARE STOPPED IN THEIR

    TRACKS TILL THE INST i HAS PASSED THE POINT OF

    HAZARD.

    SUSPEND j BUT ALLOW THE INSTS j+1, j+2, … TO FLOW.


    THE FIRST APPROACH IS SIMPLE BUT PENALIZES ALL THE INSTS FOLLOWING j.

    THE SECOND APPROACH IS EXPENSIVE.

    IF THE PIPELINE STAGES HAVE ADDITIONAL BUFFERS BESIDES A STAGING LATCH, THEN IT IS POSSIBLE TO SUSPEND AN INST BECAUSE OF A HAZARD.

    AT EACH POINT IN THE PIPELINE WHERE DATA IS TO BE ACCESSED AS AN INPUT TO SOME STAGE AND THERE IS A RAW HAZARD, ONE CAN LOAD ONE OF THE STAGING LATCHES NOT WITH THE DATA BUT WITH THE ID OF THE STAGE THAT WILL PRODUCE IT.


    THE WAITING INST THEN IS FROZEN AT THIS STAGE UNTIL

    THE DATA IS AVAILABLE.

    SINCE THE STAGE HAS MULTIPLE STAGING LATCHES, IT CAN

    ALLOW OTHER INSTS TO PASS THROUGH IT WHILE THE RAW-DEPENDENT ONE IS FROZEN.

    ONE CAN INCLUDE LOGIC IN THE STAGE TO FORWARD THE

    DATA WHICH WAS IN RAW HAZARD TO THE WAITING STAGE.

    THIS FORM OF CONTROL ALLOWS HAZARD RESOLUTION

    WITH THE MINIMUM PENALTY TO OTHER INSTS.


    THIS TECHNIQUE IS KNOWN BY THE NAME INTERNAL

    FORWARDING SINCE THE STAGES ARE DESIGNED TO

    CARRY OUT AUTOMATIC ROUTING OF THE DATA TO

    THE REQUIRED PLACE USING IDENTIFICATION CODES

    (IDs).

    IN FACT, MANY OF THE DATA DEPENDENT

    COMPUTATIONS ARE CHAINED BY MEANS OF ID TAGS

    SO THAT UNNECESSARY ROUTING IS ALSO AVOIDED.


    MULTIPROCESSOR SYSTEMS

    IT IS A COMPUTER SYSTEM COMPRISING TWO OR MORE PROCESSORS.

    AN INTERCONNECTION NETWORK LINKS THESE

    PROCESSORS.

    THE MAIN OBJECTIVE IS TO ENHANCE THE PERFORMANCE BY MEANS OF PARALLEL PROCESSING.

    IT FALLS UNDER THE MIMD ARCHITECTURE.

    BESIDES HIGH PERFORMANCE, IT PROVIDES THE FOLLOWING BENEFITS: FAULT TOLERANCE & GRACEFUL DEGRADATION; SCALABILITY & MODULAR GROWTH


    CLASSIFICATION OF MULTI-PROCESSORS

    MULTI-PROCESSOR ARCHITECTURE:

    TIGHTLY COUPLED: UMA (INCLUDING SYMMETRIC) AND NUMA

    LOOSELY COUPLED: NORMA (NO REMOTE MEMORY ACCESS), THE DISTRIBUTED MEM MULTI-PROCESSOR SYSTEM

    IN A TIGHTLY COUPLED MULTI-PROCESSOR, MULTIPLE PROCS SHARE INFO VIA COMMON MEM. HENCE, IT IS ALSO KNOWN AS A SHARED MEM MULTI-PROCESSOR SYSTEM. BESIDES GLOBAL MEM, EACH PROC CAN ALSO HAVE LOCAL MEM DEDICATED TO IT.


    IN A UMA SYSTEM, THE ACCESS TIME FOR MEM IS EQUAL

    FOR ALL THE PROCESSORS.

    AN SMP SYSTEM IS A UMA SYSTEM WITH IDENTICAL

    PROCESSORS, EQUALLY CAPABLE OF PERFORMING SIMILAR FUNCTIONS IN AN IDENTICAL MANNER.

    ALL THE PROCS. HAVE EQUAL ACCESS TIME FOR THE MEM AND I/O RESOURCES.

    FOR THE OS, ALL PROCESSORS ARE SIMILAR AND ANY PROC. CAN EXECUTE IT