computerarchitecture abhishekmail 130520052349 phpapp02
DESCRIPTION: Computer architecture
TRANSCRIPT
5/21/2018 Computerarchitecture Abhishekmail 130520052349 Phpapp02
COMPUTER ARCHITECTURE
MICROPROCESSOR
IT IS ONE OF THE GREATEST
ACHIEVEMENTS OF THE 20TH CENTURY.
IT USHERED IN THE ERA OF WIDESPREAD
COMPUTERIZATION.
EARLY ARCHITECTURE
VON NEUMANN ARCHITECTURE, 1945
PROGRAM IS STORED IN MEMORY.
SEQUENTIAL OPERATION:
ONE INST IS RETRIEVED AT A TIME, DECODED AND EXECUTED
LIMITED SPEED
Conventional 32 bit Microprocessors
Higher data throughput with 32 bit wide data bus
Larger direct addressing range
Higher clock frequencies and operating speeds as a result of improvements in semiconductor technology
Higher processing speeds because larger registers require fewer calls to memory and reg-to-reg transfers are 5 times faster than reg-to-memory transfers
Conventional 32 bit Microprocessors
More insts and addressing modes to improve software efficiency
More registers to support high-level languages
More extensive memory management & coprocessor capabilities
Cache memories and inst pipelines to increase processing speed and reduce peak bus loads
Conventional 32 bit Microprocessors
To construct a complete general purpose 32 bit microprocessor, five basic functions are necessary:
ALU
MMU
FPU
INTERRUPT CONTROLLER
TIMING CONTROL
Conventional Architecture
KNOWN AS VON NEUMANN ARCHITECTURE
ITS MAIN FEATURES ARE:
A SINGLE COMPUTING ELEMENT INCORPORATING A PROCESSOR, COMM. PATH AND MEMORY
A LINEAR ORGANIZATION OF FIXED SIZE MEMORY CELLS
A LOW-LEVEL MACHINE LANGUAGE WITH INSTS PERFORMING SIMPLE OPERATIONS ON ELEMENTARY OPERANDS
SEQUENTIAL, CENTRALIZED CONTROL OF COMPUTATION
Conventional Architecture
Single processor configuration:
[Diagram: PROCESSOR connected to MEMORY and INPUT-OUTPUT]
Conventional Architecture
Multiple processor configuration with a global bus:
[Diagram: PROCESSORS WITH LOCAL MEM & I/O sharing a GLOBAL BUS with GLOBAL MEMORY and SYSTEM INPUT-OUTPUT]
Conventional Architecture
THE EARLY COMP ARCHITECTURES WERE
DESIGNED FOR SCIENTIFIC AND COMMERCIAL
CALCULATIONS AND WERE DEVELOPED TO
INCREASE THE SPEED OF EXECUTION OF SIMPLE INSTRUCTIONS.
TO MAKE THE COMPUTERS PERFORM MORE
COMPLEX PROCESSES, MUCH MORE COMPLEX
SOFTWARE WAS REQUIRED.
Conventional Architecture
ADVANCEMENT OF TECHNOLOGY ENHANCED
THE SPEED OF EXECUTION OF PROCESSORS, BUT
A SINGLE COMM PATH HAD TO BE USED TO
TRANSFER INSTS AND DATA BETWEEN THE PROCESSOR AND THE MEMORY.
MEMORY SIZE INCREASED. THE RESULT WAS THAT
THE DATA TRANSFER RATE ON THE MEMORY
INTERFACE ACTED AS A SEVERE CONSTRAINT
ON THE PROCESSING SPEED.
Conventional Architecture
AS HIGHER SPEED MEMORY BECAME
AVAILABLE, THE DELAYS INTRODUCED BY THE
CAPACITANCE AND THE TRANSMISSION LINE
DELAYS ON THE MEMORY BUS AND THE PROPAGATION DELAYS IN THE BUFFER AND
ADDRESS DECODING CIRCUITRY BECAME MORE
SIGNIFICANT AND PLACED AN UPPER LIMIT ON
THE PROCESSING SPEED.
Conventional Architecture
THE USE OF MULTIPROCESSOR BUSES, WITH THE NEED FOR ARBITRATION BETWEEN COMPUTERS REQUESTING CONTROL OF THE BUS, REDUCED
THE PROBLEM BUT INTRODUCED SEVERAL WAIT CYCLES WHILE THE DATA OR INST WERE FETCHED FROM MEMORY.
ONE METHOD OF INCREASING PROCESSING SPEED AND DATA THROUGHPUT ON THE MEMORY BUS WAS TO INCREASE THE NUMBER OF PARALLEL BITS TRANSFERRED ON THE BUS.
Conventional Architecture
THE GLOBAL BUS AND THE GLOBAL MEMORY CAN ONLY SERVE ONE PROCESSOR AT A TIME.
AS MORE PROCESSORS ARE ADDED TO INCREASE THE PROCESSING SPEED, THE GLOBAL BUS BOTTLENECK BECOMES WORSE.
IF THE PROCESSING CONSISTS OF SEVERAL INDEPENDENT TASKS, EACH PROC WILL COMPETE FOR GLOBAL MEMORY ACCESS AND GLOBAL BUS TRANSFER TIME.
Conventional Architecture
TYPICALLY, ONLY 3 OR 4 TIMES THE SPEED OF A SINGLE
PROCESSOR CAN BE ACHIEVED IN MULTIPROCESSOR
SYSTEMS WITH GLOBAL MEMORY AND A GLOBAL BUS.
TO REDUCE THE EFFECT OF THE GLOBAL MEMORY BUS AS A
BOTTLENECK, (1) THE LENGTH OF THE PIPELINE WAS
INCREASED SO THAT THE INST COULD BE BROKEN
DOWN INTO BASIC ELEMENTS TO BE MANIPULATED
SIMULTANEOUSLY AND (2) CACHE MEM WAS INTRODUCED SO THAT INSTS AND/OR DATA COULD BE
PREFETCHED FROM GLOBAL MEMORY AND STORED IN
HIGH SPEED LOCAL MEMORY.
PIPELINING
THE PROC. SEPARATES EACH INST INTO ITS BASIC
OPERATIONS AND USES DEDICATED EXECUTION UNITS
FOR EACH TYPE OF OPERATION.
THE MOST BASIC FORM OF PIPELINING IS TO PREFETCH
THE NEXT INSTRUCTION WHILE SIMULTANEOUSLY
EXECUTING THE PREVIOUS INSTRUCTION.
THIS MAKES USE OF THE BUS TIME WHICH WOULD OTHERWISE BE WASTED AND REDUCES INSTRUCTION
EXECUTION TIME.
PIPELINING
TO SHOW THE USE OF A PIPELINE, CONSIDER THE MULTIPLICATION OF 2 DECIMAL NOS: 3.8 x 10^2 AND 9.6 x 10^3. THE PROC. PERFORMS 3 OPERATIONS:
A: MULTIPLIES THE MANTISSAS
B: ADDS THE EXPONENTS
C: NORMALISES THE RESULT TO PLACE THE DECIMAL POINT IN THE CORRECT POSITION.
IF 3 EXECUTION UNITS PERFORMED THESE OPERATIONS, UNITS A & B WOULD DO NOTHING WHILE C IS BEING PERFORMED.
IF A PIPELINE WERE IMPLEMENTED, THE NEXT NUMBER COULD BE PROCESSED IN EXECUTION UNITS A AND B WHILE C WAS BEING PERFORMED.
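The three operations above can be written as separate stage functions. This is a toy software model of the stages, not overlapped hardware; the (mantissa, exponent) representation and the simple normalisation rule are assumptions made for illustration.

```python
# Toy model of the 3 operations used to multiply numbers held as
# (mantissa, exponent) pairs. Stage A multiplies mantissas, stage B adds
# exponents, stage C normalises the result.

def stage_a(x, y):
    return x[0] * y[0]          # multiply the mantissas

def stage_b(x, y):
    return x[1] + y[1]          # add the exponents

def stage_c(mantissa, exponent):
    # normalise: keep one digit before the decimal point (only the
    # "too large" case is handled in this sketch)
    while abs(mantissa) >= 10:
        mantissa /= 10
        exponent += 1
    return (mantissa, exponent)

def multiply(x, y):
    return stage_c(stage_a(x, y), stage_b(x, y))

# 3.8 x 10^2 times 9.6 x 10^3 = 36.48 x 10^5, normalised to ~3.648 x 10^6
print(multiply((3.8, 2), (9.6, 3)))
```

In a pipelined implementation the three stage functions would run on different operand pairs at the same time, as the slide describes.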
PIPELINING
TO GET A ROUGH INDICATION OF PERFORMANCE
INCREASE THROUGH PIPELINE, THE STAGE EXECUTION
INTERVAL MAY BE TAKEN TO BE THE EXECUTION TIME
OF THE SLOWEST PIPELINE STAGE.
THE PERFORMANCE INCREASE FROM PIPELINING IS
ROUGHLY EQUAL TO THE SUM OF THE AVERAGE
EXECUTION TIMES FOR ALL STAGES OF THE PIPELINE,
DIVIDED BY THE AVERAGE VALUE OF THE EXECUTION TIME OF THE SLOWEST PIPELINE STAGE FOR THE INST
MIX CONSIDERED.
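The rough estimate above can be put into numbers; the stage times below are invented purely for illustration.

```python
# Rough pipeline speedup estimate from the slide: sum of the average stage
# execution times divided by the time of the slowest stage.
# (Stage times in ns are invented for illustration.)
stage_times = [40, 25, 35, 30]      # e.g. fetch, decode, execute, write-back

speedup = sum(stage_times) / max(stage_times)
print(round(speedup, 2))            # 130 / 40 = 3.25
```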
PIPELINING
NON-SEQUENTIAL INSTS (BRANCHES, JUMPS) CAUSE THE
INSTRUCTIONS BEHIND THEM IN THE PIPELINE TO BE
EMPTIED AND FILLING TO BE RESTARTED.
NON-SEQUENTIAL INSTS. TYPICALLY COMPRISE
15 TO 30% OF INSTRUCTIONS AND THEY REDUCE
PIPELINE PERFORMANCE BY A GREATER
PERCENTAGE THAN THEIR PROBABILITY OF OCCURRENCE.
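The disproportionate cost can be seen with a simple model: assume each non-sequential instruction empties a pipeline of a given depth and refilling costs roughly one cycle per emptied stage. The depth and branch fraction below are assumed values, not measurements.

```python
# Effective cycles per instruction (CPI) when a fraction of instructions
# flush the pipeline. An ideal pipeline completes one inst per cycle;
# each non-sequential inst is modelled as adding (depth - 1) refill cycles.

def effective_cpi(branch_fraction, depth):
    return 1 + branch_fraction * (depth - 1)

# 20% non-sequential insts in an assumed 5-stage pipeline:
cpi = effective_cpi(0.20, 5)
print(cpi)   # 1.8 -> throughput falls to 1/1.8, i.e. roughly a 44% loss
```

So 20% of the instructions cost about 44% of the throughput, a greater percentage than their probability of occurrence, as the slide states.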
CACHE MEMORY
VON-NEUMANN SYSTEM PERFORMANCE IS CONSIDERABLY AFFECTED BY MEMORY ACCESS TIME AND MEMORY BW (MAXIMUM MEMORY TRANSFER RATE).
THESE LIMITATIONS ARE ESPECIALLY TIGHT FOR 32 BIT PROCESSORS WITH HIGH CLOCK SPEEDS.
WHILE STATIC RAMS WITH 25ns ACCESS TIMES ARE
CAPABLE OF KEEPING PACE WITH PROC SPEED, THEY MUST BE LOCATED ON THE SAME BOARD TO MINIMISE DELAYS, THUS LIMITING THE AMOUNT OF HIGH SPEED MEMORY AVAILABLE.
CACHE MEMORY
DRAM HAS A GREATER CAPACITY PER CHIP AND A LOWER COST, BUT EVEN THE FASTEST DRAM CAN'T KEEP PACE WITH THE PROCESSOR,
PARTICULARLY WHEN IT IS LOCATED ON A SEPARATE BOARD ATTACHED TO A MEMORY BUS.
WHEN A PROC REQUIRES INST/DATA FROM MEMORY, IT ENTERS A WAIT STATE UNTIL IT IS AVAILABLE. THIS REDUCES PROCESSOR PERFORMANCE.
CACHE MEMORY
CACHE ACTS AS A FAST LOCAL STORAGE BUFFER BETWEEN THE PROC AND THE MAIN MEMORY.
OFF-CHIP BUT ON-BOARD CACHE MAY REQUIRE SEVERAL MEMORY CYCLES WHEREAS ON-CHIP CACHE MAY ONLY REQUIRE ONE MEMORY CYCLE, BUT ON-BOARD CACHE CAN PREVENT THE EXCESSIVE NO. OF WAIT STATES IMPOSED BY MEMORY ON THE SYSTEM BUS AND IT REDUCES THE SYSTEM BUS LOAD.
CACHE MEMORY
THE COST OF IMPLEMENTING AN ON-BOARD
CACHE IS MUCH LOWER THAN THE COST OF
FASTER SYSTEM MEMORY REQUIRED TO
ACHIEVE THE SAME MEMORY PERFORMANCE.
CACHE PERFORMANCE DEPENDS ON ACCESS
TIME AND HIT RATIO, WHICH IS DEPENDENT ON
THE SIZE OF THE CACHE AND THE NO. OF BYTES BROUGHT INTO CACHE ON ANY FETCH FROM
THE MAIN MEMORY (THE LINE SIZE).
CACHE MEMORY
INCREASING THE LINE SIZE INCREASES THE
CHANCE THAT THERE WILL BE A CACHE HIT ON
THE NEXT MEMORY REFERENCE.
IF A 4K BYTE CACHE WITH A 4 BYTE LINE SIZE
HAS A HIT RATIO OF 80%, DOUBLING THE LINE
SIZE MIGHT INCREASE THE HIT RATIO TO 85%,
BUT DOUBLING THE LINE SIZE AGAIN MIGHT ONLY INCREASE THE HIT RATIO TO 87%.
CACHE MEMORY
OVERALL MEMORY PERFORMANCE IS A
FUNCTION OF CACHE ACCESS TIME, CACHE HIT
RATIO AND MAIN MEMORY ACCESS TIME FOR
CACHE MISSES.
A SYSTEM WITH 80% CACHE HIT RATIO AND
120ns CACHE ACCESS TIME ACCESSES MAIN
MEMORY 20% OF THE TIME WITH AN ACCESS TIME OF 600ns. THE AV ACCESS TIME IN ns WILL
BE (0.8 x 120) + [0.2 x (600 + 120)] = 240
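The arithmetic on this slide can be packaged as a small helper. This is a sketch of the slide's own formula, in which a miss pays the cache lookup time plus the main-memory access time.

```python
# Average memory access time: hits are served by the cache; misses pay the
# cache lookup plus the main-memory access (figures from the slide).

def avg_access_time(hit_ratio, cache_ns, memory_ns):
    miss_ratio = 1 - hit_ratio
    return hit_ratio * cache_ns + miss_ratio * (memory_ns + cache_ns)

print(round(avg_access_time(0.8, 120, 600), 1))   # (0.8*120) + 0.2*(600+120) = 240.0
```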
CACHE DESIGN
PROCESSORS WITH DEMAND PAGED VIRTUAL
MEMORY SYSTEMS REQUIRE AN ASSOCIATIVE
CACHE.
VIRTUAL MEM SYSTEMS ORGANIZE ADDRESSES
BY THE START ADDRESSES FOR EACH PAGE AND
AN OFFSET WHICH LOCATES THE DATA WITHIN
THE PAGE.
AN ASSOCIATIVE CACHE ASSOCIATES THE OFFSET WITH THE PAGE ADDRESS TO FIND THE
DATA NEEDED.
CACHE DESIGN
WHEN ACCESSED, THE CACHE CHECKS TO SEE IF IT CONTAINS THE PAGE ADDRESS (OR TAG FIELD); IF SO, IT ADDS THE OFFSET AND, IF A CACHE HIT IS DETECTED, THE DATA IS FETCHED IMMEDIATELY FROM THE CACHE.
PROBLEMS CAN OCCUR IN A SINGLE SET-ASSOCIATIVE CACHE IF WORDS WITHIN DIFFERENT PAGES HAVE THE SAME OFFSET.
TO MINIMISE THIS PROBLEM A 2-WAY SET-ASSOCIATIVE CACHE IS USED. THIS IS ABLE TO ASSOCIATE MORE THAN ONE SET OF TAGS AT A TIME, ALLOWING THE CACHE TO STORE THE SAME OFFSET FROM TWO DIFFERENT PAGES.
CACHE DESIGN
A FULLY ASSOCIATIVE CACHE ALLOWS ANY NUMBER OF PAGES TO USE THE CACHE SIMULTANEOUSLY.
A CACHE REQUIRES A REPLACEMENT ALGORITHM TO FIND REPLACEMENT CACHE LINES WHEN A MISS OCCURS.
PROCESSORS THAT DO NOT USE DEMAND PAGED VIRTUAL MEMORY CAN EMPLOY A DIRECT MAPPED CACHE WHICH CORRESPONDS EXACTLY TO THE PAGE SIZE AND ALLOWS DATA FROM ONLY ONE PAGE TO BE STORED AT A TIME.
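A minimal software sketch of the 2-way set-associative lookup described above; the field split, FIFO replacement, and dictionary-backed "main memory" are assumptions for illustration, since real caches do all of this in hardware.

```python
# Toy 2-way set-associative cache: each set holds up to two (tag, data)
# lines, so two different "pages" with the same index can coexist.

class TwoWayCache:
    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.sets = [[] for _ in range(num_sets)]   # each set: list of (tag, data)

    def access(self, address, memory):
        index = address % self.num_sets             # offset field
        tag = address // self.num_sets              # page-address field
        ways = self.sets[index]
        for t, data in ways:
            if t == tag:
                return data, True                   # cache hit
        data = memory[address]                      # miss: fetch from main memory
        if len(ways) == 2:
            ways.pop(0)                             # replace the oldest line (FIFO)
        ways.append((tag, data))
        return data, False

memory = {a: a * 10 for a in range(64)}
cache = TwoWayCache(num_sets=8)
print(cache.access(5, memory))    # (50, False)  - first touch misses
print(cache.access(13, memory))   # (130, False) - same index as 5, different tag
print(cache.access(5, memory))    # (50, True)   - both lines coexist: hit
```

In a single (1-way) set-associative cache the second access would have evicted the first line, and the third access would miss again.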
MEMORY ARCHITECTURES
32 BIT PROCESSORS HAVE INTRODUCED 3 NEW
CONCEPTS IN THE WAY THE MEMORY IS
INTERFACED:
1. LOCAL MEMORY BUS EXTENSIONS
2. MEMORY INTERLEAVING
3. VIRTUAL MEMORY MANAGEMENT
LOCAL MEM BUS EXTENSIONS
IT PERMITS LARGER LOCAL MEMORIES TO BE CONNECTED WITHOUT THE DELAYS CAUSED BY BUS REQUESTS AND BUS ARBITRATION FOUND ON MULTIPROCESSOR BUSES.
IT HAS BEEN PROVIDED TO INCREASE THE SIZE OF THE LOCAL MEMORY ABOVE THAT WHICH CAN BE ACCOMMODATED ON THE PROCESSOR BOARD.
BY OVERLAPPING THE LOCAL MEM BUS AND THE SYSTEM BUS CYCLES IT IS POSSIBLE TO ACHIEVE
HIGHER MEM ACCESS RATES FROM PROCESSORS WITH PIPELINES WHICH PERMIT THE ADDRESS OF THE NEXT MEMORY REFERENCE TO BE GENERATED WHILE THE PREVIOUS DATA WORD IS BEING FETCHED.
MEMORY INTERLEAVING
PIPELINED PROCESSORS WITH THE ABILITY TO
GENERATE THE ADDRESS OF THE NEXT MEMORY
REFERENCE WHILE FETCHING THE PREVIOUS
DATA WORD WOULD BE SLOWED DOWN IF THE MEMORY WERE UNABLE TO BEGIN THE NEXT
MEMORY ACCESS UNTIL THE PREVIOUS MEM
CYCLE HAD BEEN COMPLETED.
THE SOLUTION IS TO USE TWO-WAY MEMORY INTERLEAVING. IT USES 2 MEM BOARDS: 1 FOR
ODD ADDRESSES AND 1 FOR EVEN ADDRESSES.
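The odd/even split is just an address-to-board mapping. The sketch below uses word addresses for simplicity; a real memory controller maps physical addresses in hardware.

```python
# Two-way interleaving: even word addresses go to board 0, odd addresses to
# board 1, so consecutive accesses alternate boards and one board can begin
# its next cycle while the other completes the previous one.

def board_for(address, ways=2):
    return address % ways

accesses = [100, 101, 102, 103, 104]
print([board_for(a) for a in accesses])   # [0, 1, 0, 1, 0] - boards alternate
```

With 4-way interleaving (`ways=4`) four consecutive addresses land on four different boards.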
MEMORY INTERLEAVING
ONE BOARD CAN BEGIN THE NEXT MEM CYCLE
WHILE THE OTHER BOARD COMPLETES THE
PREVIOUS CYCLE.
THE SPEED ADV IS GREATEST WHEN MULTIPLE
SEQUENTIAL MEM ACCESSES ARE REQUIRED
FOR BURST I/O TRANSFERS BY DMA.
DMA DEFINES A BLOCK TRANSFER IN TERMS OF
A STARTING ADDRESS AND A WORD COUNT FOR SEQUENTIAL MEM ACCESSES.
MEMORY INTERLEAVING
TWO WAY INTERLEAVING MAY NOT PREVENT
MEM WAIT STATES FOR SOME FAST SIGNAL
PROCESSING APPLICATIONS, AND SYSTEMS HAVE
BEEN DESIGNED WITH 4 OR MORE WAY INTERLEAVING IN WHICH THE MEM BOARDS ARE
ASSIGNED CONSECUTIVE ADDRESSES BY A
MEMORY CONTROLLER.
Conventional Architecture
EVEN WITH THESE ENHANCEMENTS, THE SEQUENTIAL
VON NEUMANN ARCHITECTURE REACHED ITS LIMITS
IN PROCESSING SPEED BECAUSE THE SEQUENTIAL
FETCHING OF INSTS AND DATA THROUGH A COMMON
MEMORY INTERFACE FORMED THE BOTTLENECK.
THUS, PARALLEL PROC ARCHITECTURES CAME INTO
BEING WHICH PERMIT A LARGE NUMBER OF COMPUTING
ELEMENTS TO BE PROGRAMMED TO WORK TOGETHER SIMULTANEOUSLY. THE USEFULNESS OF A PARALLEL
PROCESSOR DEPENDS UPON THE AVAILABILITY OF
SUITABLE PARALLEL ALGORITHMS.
HOW TO INCREASE THE SYSTEM SPEED?
1. USING FASTER COMPONENTS: THEY COST MORE AND
DISSIPATE CONSIDERABLE HEAT.
THE RATE OF GROWTH OF SPEED USING
BETTER TECHNOLOGY IS VERY SLOW. e.g., IN THE
80S THE BASIC CLOCK RATE WAS 50 MHz AND
TODAY IT IS AROUND 2 GHz; DURING THIS
PERIOD THE SPEED OF COMPUTERS IN SOLVING
INTENSIVE PROBLEMS HAS GONE UP BY A FACTOR OF 100,000. THIS IS DUE TO
IMPROVED ARCHITECTURE.
HOW TO INCREASE THE SYSTEM SPEED?
2. ARCHITECTURAL METHODS:
A. USE PARALLELISM IN A SINGLE PROCESSOR
[OVERLAPPING EXECUTION OF A NUMBER OF INSTS
(PIPELINING)]
B. OVERLAPPING OPERATION OF DIFFERENT
UNITS
C. INCREASE SPEED OF ALU BY EXPLOITING
DATA/TEMPORAL PARALLELISM
D. USING A NUMBER OF INTERCONNECTED PROCESSORS
TO WORK TOGETHER
PARALLEL COMPUTERS
THE IDEA EMERGED AT CIT IN 1981
A GROUP HEADED BY CHARLES SEITZ AND
GEOFFREY FOX BUILT A PARALLEL COMPUTER IN 1982
16 8085 PROCESSORS WERE CONNECTED IN A HYPERCUBE
CONFIGURATION
THE ADV WAS LOW COST PER MEGAFLOP
PARALLEL COMPUTERS
BESIDES HIGHER SPEED, OTHER FEATURES OF
PARALLEL COMPUTERS ARE:
BETTER SOLUTION QUALITY: WHEN ARITHMETIC
OPS ARE DISTRIBUTED, EACH PE DOES A SMALLER
NO OF OPS, THUS ROUNDING ERRORS ARE
REDUCED
BETTER ALGORITHMS
BETTER AND FASTER STORAGE
GREATER RELIABILITY
CLASSIFICATION OF COMPUTER
ARCHITECTURE
FLYNN'S TAXONOMY: IT IS BASED UPON HOW THE COMPUTER
RELATES ITS INSTRUCTIONS TO THE DATA BEING PROCESSED
SISD
SIMD
MISD
MIMD
FLYNN'S TAXONOMY
SISD: CONVENTIONAL VON-NEUMANN SYSTEM.
[Diagram: CONTROL UNIT sends an INST STREAM to the PROCESSOR, which operates on a single DATA STREAM]
FLYNN'S TAXONOMY
SIMD: IT HAS A SINGLE STREAM OF VECTOR
INSTS THAT INITIATE MANY OPERATIONS. EACH
ELEMENT OF A VECTOR IS REGARDED AS A
MEMBER OF A SEPARATE DATA STREAM, GIVING MULTIPLE DATA STREAMS.
[Diagram: ONE CONTROL UNIT BROADCASTS THE INST STREAM TO SEVERAL PROCESSORS, EACH HANDLING ITS OWN DATA STREAM (SYNCHRONOUS MULTIPROCESSOR)]
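In software terms, the SIMD idea is one instruction applied across several data streams in lock step. The sketch below is a conceptual model only, not real vector hardware.

```python
# SIMD in miniature: a single "instruction" (here, a function) is broadcast
# to every data stream and applied to all of them in the same step.

def simd_step(instruction, data_streams):
    return [instruction(x) for x in data_streams]

streams = [1, 2, 3]                           # one element from each data stream
print(simd_step(lambda x: x + 10, streams))   # [11, 12, 13]
```

A SISD machine would instead loop over the three elements one at a time under separate instruction fetches.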
FLYNN'S TAXONOMY
MISD: NOT POSSIBLE
[Diagram: CONTROL UNITS CU 1-3 EACH SEND AN INST STREAM TO PROCESSING UNITS PU 1-3, ALL SHARING A SINGLE DATA STREAM]
FLYNN'S TAXONOMY
MIMD: MULTIPROCESSOR CONFIGURATION AND
ARRAY OF PROCESSORS.
[Diagram: CONTROL UNITS CU 1-3 ISSUE INST STREAMS IS 1-3 TO SEPARATE PROCESSORS, EACH WITH ITS OWN DATA STREAM DS 1-3]
FLYNN'S TAXONOMY
MIMD COMPUTERS COMPRISE INDEPENDENT
COMPUTERS, EACH WITH ITS OWN MEMORY,
CAPABLE OF PERFORMING SEVERAL
OPERATIONS SIMULTANEOUSLY.
MIMD COMPS MAY COMPRISE A NUMBER OF
SLAVE PROCESSORS WHICH MAY BE
INDIVIDUALLY CONNECTED TO MULTI-ACCESS GLOBAL MEMORY BY A SWITCHING MATRIX
UNDER THE CONTROL OF A MASTER PROCESSOR.
FLYNN'S TAXONOMY
THIS CLASSIFICATION IS TOO BROAD.
IT PUTS EVERYTHING EXCEPT
MULTIPROCESSORS IN ONE CLASS.
IT DOES NOT REFLECT THE CONCURRENCY
AVAILABLE THROUGH PIPELINE
PROCESSING AND THUS PUTS VECTOR
COMPUTERS IN THE SISD CLASS.
SHORE'S CLASSIFICATION
SHORE CLASSIFIED COMPUTERS ON THE BASIS OF THE ORGANIZATION OF THE CONSTITUENT ELEMENTS OF THE COMPUTER.
SIX DIFFERENT KINDS OF MACHINES WERE RECOGNIZED:
1. CONVENTIONAL VON NEUMANN ARCHITECTURE WITH 1 CU, 1 PU, IM AND DM. A SINGLE DM READ PRODUCES ALL BITS FOR PROCESSING BY THE PU. THE PU
MAY CONTAIN MULTIPLE FUNCTIONAL UNITS WHICH MAY OR MAY NOT BE PIPELINED. SO, IT INCLUDES BOTH THE SCALAR COMPS (IBM 360/91, CDC 7600) AND PIPELINED VECTOR COMPUTERS (CRAY 1, CYBER 205)
SHORE'S CLASSIFICATION
MACHINE 1:
[Diagram: IM feeds the CU; a HORIZONTAL PU processes a WORD SLICE from DM]
NOTE THAT THE PROCESSING IS
CHARACTERISED AS
HORIZONTAL (NO OF BITS IN
PARALLEL AS A WORD)
SHORE'S CLASSIFICATION
MACHINE 2: SAME AS MACHINE 1 EXCEPT THAT THE DM FETCHES A BIT SLICE FROM ALL THE WORDS IN THE MEMORY AND THE PU IS ORGANIZED TO
PERFORM THE OPERATIONS IN A BIT SERIAL MANNER ON ALL THE WORDS.
IF THE MEMORY IS REGARDED AS A 2D ARRAY OF BITS WITH ONE WORD STORED PER ROW, THEN MACHINE 2 READS A VERTICAL SLICE
OF BITS AND PROCESSES THE SAME, WHEREAS MACHINE 1 READS AND PROCESSES A HORIZONTAL SLICE OF BITS. EX. MPP, ICL DAP
SHORE'S CLASSIFICATION
MACHINE 2:
[Diagram: IM feeds the CU; a VERTICAL PU processes a BIT SLICE from DM]
SHORE'S CLASSIFICATION
MACHINE 3: COMBINATION OF 1 AND 2.
IT COULD BE CHARACTERISED AS HAVING A
MEMORY AS AN ARRAY OF BITS WITH BOTH
HORIZONTAL AND VERTICAL READING AND PROCESSING POSSIBLE.
SO, IT WILL HAVE BOTH VERTICAL AND
HORIZONTAL PROCESSING UNITS.
AN EXAMPLE IS THE OMEN 60 (1973)
SHORE'S CLASSIFICATION
MACHINE 3:
[Diagram: IM feeds the CU; both a VERTICAL PU and a HORIZONTAL PU access DM]
SHORE'S CLASSIFICATION
MACHINE 4: IT IS OBTAINED BY REPLICATING THE PU AND DM OF MACHINE 1.
AN ENSEMBLE OF A PU AND A DM IS CALLED A PROCESSING
ELEMENT (PE).
THE INSTS ARE ISSUED TO THE PEs BY A SINGLE CU. PEs COMMUNICATE ONLY THROUGH THE CU.
ABSENCE OF COMM BETWEEN PEs LIMITS ITS APPLICABILITY
EX: PEPE (1976)
SHORE'S CLASSIFICATION
MACHINE 4:
[Diagram: IM feeds a single CU, which drives several PUs, each with its own DM]
SHORE'S CLASSIFICATION
MACHINE 5: SIMILAR TO MACHINE 4 WITH THE
ADDITION OF COMMUNICATION BETWEEN
PEs. EXAMPLE: ILLIAC IV
[Diagram: IM feeds a single CU driving several PUs with their own DMs, with links between neighbouring PEs]
SHORE'S CLASSIFICATION
MACHINE 6:
MACHINES 1 TO 5 MAINTAIN SEPARATION BETWEEN DM AND PU, WITH SOME DATA BUS OR
CONNECTION UNIT PROVIDING THE COMMUNICATION BETWEEN THEM.
MACHINE 6 INCLUDES THE LOGIC IN THE MEMORY ITSELF AND IS CALLED AN ASSOCIATIVE PROCESSOR.
MACHINES BASED ON SUCH ARCHITECTURES SPAN A RANGE FROM SIMPLE ASSOCIATIVE MEMORIES TO COMPLEX ASSOCIATIVE PROCS.
SHORE'S CLASSIFICATION
MACHINE 6:
[Diagram: IM feeds the CU, which drives a combined PU + DM]
FENG'S CLASSIFICATION
FENG PROPOSED A SCHEME ON THE BASIS OF
DEGREE OF PARALLELISM TO CLASSIFY
COMPUTER ARCHITECTURE.
THE MAXIMUM NO OF BITS THAT CAN BE PROCESSED
PER UNIT OF TIME BY THE SYSTEM IS CALLED THE
MAXIMUM DEGREE OF PARALLELISM
FENG'S CLASSIFICATION
BASED ON FENG'S SCHEME, WE HAVE SEQUENTIAL AND PARALLEL OPERATIONS AT BIT AND WORD LEVELS TO PRODUCE THE FOLLOWING CLASSIFICATION:
WSBS (WORD-SERIAL, BIT-SERIAL): NO CONCEIVABLE IMPLEMENTATION
WPBS (WORD-PARALLEL, BIT-SERIAL): STARAN
WSBP (WORD-SERIAL, BIT-PARALLEL): CONVENTIONAL COMPUTERS
WPBP (WORD-PARALLEL, BIT-PARALLEL): ILLIAC IV
o THE MAX DEGREE OF PARALLELISM IS GIVEN BY THE PRODUCT OF THE NO OF BITS IN THE WORD AND THE NO OF WORDS PROCESSED IN PARALLEL
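The product rule above is easy to compute directly; the word-length and word-count figures used below for ILLIAC IV and STARAN are commonly quoted values, stated here as assumptions rather than taken from this deck.

```python
# Feng's maximum degree of parallelism: word length in bits multiplied by
# the number of words processed in parallel.
# (Machine figures below are commonly quoted values, assumed here.)

def degree_of_parallelism(word_length_bits, words_in_parallel):
    return word_length_bits * words_in_parallel

print(degree_of_parallelism(64, 64))   # ILLIAC IV (WPBP, 64-bit words, 64 PEs): 4096
print(degree_of_parallelism(1, 256))   # STARAN (WPBS, bit-serial over 256 words): 256
```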
HANDLER'S CLASSIFICATION
FENG'S SCHEME, WHILE INDICATING THE DEGREE OF PARALLELISM, DOES NOT ACCOUNT FOR THE CONCURRENCY HANDLED BY
PIPELINED DESIGNS.
HANDLER'S SCHEME ALLOWS THE PIPELINING TO BE SPECIFIED.
IT ALLOWS THE IDENTIFICATION OF PARALLELISM AND THE DEGREE OF PIPELINING BUILT INTO THE HARDWARE STRUCTURE
HANDLER'S CLASSIFICATION
HANDLER DEFINED SOME OF THE TERMS AS:
PCU: PROCESSOR CONTROL UNITS
ALU: ARITHMETIC LOGIC UNITS
BLC: BIT LEVEL CIRCUITS
PE: PROCESSING ELEMENTS
A COMPUTING SYSTEM C CAN THEN BE CHARACTERISED
BY A TRIPLE AS T(C) = (K x K', D x D', W x W')
WHERE K = NO OF PCUs, K' = NO OF PROCESSORS THAT
ARE PIPELINED, D = NO OF ALUs, D' = NO OF PIPELINED
ALUs, W = WORD LENGTH OF ALU OR PE AND W' = NO OF
PIPELINE STAGES IN ALU OR PE
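The triple can be carried around as plain data. The machine described below is hypothetical, used only to show how the notation composes; it is not any system named in this deck.

```python
# Handler's characterisation T(C) = (K x K', D x D', W x W') as a record.
# The machine here is HYPOTHETICAL, purely to illustrate the notation:
# 1 PCU, 4 ALUs each pipelined, 32-bit words, 5 pipeline stages per ALU.

from collections import namedtuple

Triple = namedtuple(
    "Triple", "pcus pipelined_pcus alus pipelined_alus word_length stages")

hypothetical = Triple(pcus=1, pipelined_pcus=1, alus=4, pipelined_alus=4,
                      word_length=32, stages=5)

def describe(t):
    return (f"T(C) = ({t.pcus} x {t.pipelined_pcus}, "
            f"{t.alus} x {t.pipelined_alus}, "
            f"{t.word_length} x {t.stages})")

print(describe(hypothetical))   # T(C) = (1 x 1, 4 x 4, 32 x 5)
```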
COMPUTER PROGRAM ORGANIZATION
BROADLY, THEY MAY BE CLASSIFIED AS:
CONTROL FLOW PROGRAM ORGANIZATION
DATAFLOW PROGRAM ORGANIZATION
REDUCTION PROGRAM ORGANIZATION
COMPUTER PROGRAM ORGANIZATION
CONTROL FLOW COMPUTERS USE EXPLICIT FLOWS OF CONTROL INFO TO
CAUSE THE EXECUTION OF INSTS.
DATAFLOW COMPS USE THE AVAILABILITY OF
OPERANDS TO TRIGGER THE EXECUTION OF
OPERATIONS.
REDUCTION COMPUTERS USE THE NEED FOR A
RESULT TO TRIGGER THE OPERATION WHICH
WILL GENERATE THE REQUIRED RESULT.
COMPUTER PROGRAM ORGANIZATION
THE THREE BASIC FORMS OF COMP PROGRAM
ORGANIZATION MAY BE DESCRIBED IN TERMS
OF THEIR DATA MECHANISM (WHICH DEFINES
THE WAY A PARTICULAR ARGUMENT IS USED BY A NUMBER OF INSTRUCTIONS) AND THE
CONTROL MECHANISM (WHICH DEFINES HOW
ONE INST CAUSES THE EXECUTION OF ONE OR
MORE OTHER INSTS AND THE RESULTING CONTROL PATTERN).
COMPUTER PROGRAM ORGANIZATION
CONTROL FLOW PROCESSORS HAVE A BY-
REFERENCE DATA MECHANISM (WHICH USES
REFERENCES EMBEDDED IN THE INSTS BEING
EXECUTED TO ACCESS THE CONTENTS OF THE SHARED MEMORY) AND TYPICALLY A
SEQUENTIAL CONTROL MECHANISM (WHICH
PASSES A SINGLE THREAD OF CONTROL FROM
INSTRUCTION TO INSTRUCTION).
COMPUTER PROGRAM ORGANIZATION
DATAFLOW COMPUTERS HAVE A BY-VALUE DATA MECHANISM (WHICH GENERATES AN ARGUMENT AT RUN-TIME WHICH IS REPLICATED AND GIVEN TO EACH ACCESSING INSTRUCTION FOR STORAGE AS A VALUE) AND A PARALLEL CONTROL MECHANISM.
BOTH MECHANISMS ARE SUPPORTED BY DATA
TOKENS WHICH CONVEY DATA FROM PRODUCER TO CONSUMER INSTRUCTIONS AND CONTRIBUTE TO THE ACTIVATION OF CONSUMER INSTS.
COMPUTER PROGRAM ORGANIZATION
TWO BASIC TYPES OF REDUCTION PROGRAM ORGANIZATIONS HAVE BEEN DEVELOPED:
A. STRING REDUCTION, WHICH HAS A BY-VALUE DATA MECHANISM AND HAS ADVANTAGES WHEN MANIPULATING SIMPLE EXPRESSIONS.
B. GRAPH REDUCTION, WHICH HAS A BY-REFERENCE DATA MECHANISM AND HAS ADVANTAGES WHEN LARGER STRUCTURES ARE INVOLVED.
COMPUTER PROGRAM ORGANIZATION
CONTROL-FLOW AND DATA-FLOW PROGRAMS
ARE BUILT FROM FIXED SIZE PRIMITIVE INSTS,
WITH HIGHER LEVEL PROGRAMS CONSTRUCTED
FROM SEQUENCES OF THESE PRIMITIVE INSTRUCTIONS AND CONTROL OPERATIONS.
REDUCTION PROGRAMS ARE BUILT FROM HIGH
LEVEL PROGRAM STRUCTURES WITHOUT THE NEED FOR CONTROL OPERATORS.
COMPUTER PROGRAM ORGANIZATION
THE RELATIONSHIP OF THE DATA AND CONTROL MECHANISMS TO THE BASIC COMPUTER PROGRAM ORGANIZATIONS CAN BE SHOWN AS UNDER:

CONTROL MECHANISM    DATA MECHANISM: BY VALUE    DATA MECHANISM: BY REFERENCE
SEQUENTIAL           -                           VON-NEUMANN CONTROL FLOW
PARALLEL             DATA FLOW                   PARALLEL CONTROL FLOW
RECURSIVE            STRING REDUCTION            GRAPH REDUCTION
MACHINE ORGANIZATION
MACHINE ORGANIZATION CAN BE CLASSIFIED AS FOLLOWS:
CENTRALIZED: CONSISTING OF A SINGLE
PROCESSOR, COMM PATH AND MEMORY. A SINGLE ACTIVE INST PASSES EXECUTION TO A SPECIFIC SUCCESSOR INSTRUCTION.
o TRADITIONAL VON-NEUMANN PROCESSORS
HAVE A CENTRALIZED MACHINE ORGANIZATION AND A CONTROL FLOW PROGRAM ORGANIZATION.
MACHINE ORGANIZATION
PACKET COMMUNICATION: USING A CIRCULAR
INST EXECUTION PIPELINE IN WHICH
PROCESSORS, COMMUNICATIONS AND
MEMORIES ARE LINKED BY POOLS OF WORK.
o THE NEC 7281 HAS A PACKET COMMUNICATION
MACHINE ORGANIZATION AND A DATAFLOW
PROGRAM ORGANIZATION.
MACHINE ORGANIZATION
EXPRESSION MANIPULATION: USES IDENTICAL
RESOURCES IN A REGULAR STRUCTURE, EACH
RESOURCE CONTAINING A PROCESSOR,
COMMUNICATION AND MEMORY. THE PROGRAM
CONSISTS OF ONE LARGE STRUCTURE, PARTS OF WHICH ARE ACTIVE WHILE OTHER PARTS ARE
TEMPORARILY SUSPENDED.
AN EXPRESSION MANIPULATION MACHINE MAY BE
CONSTRUCTED FROM A REGULAR STRUCTURE OF T414
TRANSPUTERS, EACH CONTAINING A VON-NEUMANN
PROCESSOR, MEMORY AND COMMUNICATION LINKS.
MULTIPROCESSING SYSTEMS
IT MAKES USE OF SEVERAL PROCESSORS, EACH OBEYING ITS OWN INSTS, USUALLY COMMUNICATING VIA A COMMON MEMORY.
ONE WAY OF CLASSIFYING THESE SYSTEMS IS BY THEIR DEGREE OF COUPLING.
TIGHTLY COUPLED SYSTEMS HAVE PROCESSORS INTERCONNECTED BY A MULTIPROCESSOR SYSTEM BUS, WHICH BECOMES A PERFORMANCE BOTTLENECK.
MULTIPROCESSING SYSTEMS
INTERCONNECTION BY A SHARED MEMORY IS LESS
TIGHTLY COUPLED AND A MULTIPORT MEMORY MAY
BE USED TO REDUCE THE BUS BOTTLENECK.
THE USE OF SEVERAL AUTONOMOUS SYSTEMS, EACH
WITH ITS OWN OS, IN A CLUSTER IS MORE LOOSELY
COUPLED.
THE USE OF A NETWORK TO INTERCONNECT SYSTEMS, USING COMM SOFTWARE, IS THE MOST LOOSELY
COUPLED ALTERNATIVE.
MULTIPROCESSING SYSTEMS
DEGREE OF COUPLING:
[Diagram, from most loosely to most tightly coupled: SYSTEMS JOINED BY A NETWORK LINK VIA NETWORK SW; OSes JOINED BY A CLUSTER LINK; SYSTEM MEMORIES SHARED OVER A SYSTEM BUS; CPUs ON A MULTIPROCESSOR BUS]
MULTIPROCESSING SYSTEMS
MULTIPROCESSORS MAY ALSO BE CLASSIFIED
AS AUTOCRATIC OR EGALITARIAN.
AUTOCRATIC CONTROL EXISTS WHERE A
MASTER-SLAVE RELATIONSHIP EXISTS BETWEEN
THE PROCESSORS.
EGALITARIAN CONTROL GIVES ALL PROCESSORS
EQUAL CONTROL OF SHARED BUS ACCESS.
MULTIPROCESSING SYSTEMS
MULTIPROCESSING SYSTEMS WITH SEPARATE
PROCESSORS AND MEMORIES MAY BE
CLASSIFIED AS DANCE HALL CONFIGURATIONS,
IN WHICH THE PROCESSORS ARE LINED UP ON ONE SIDE WITH THE MEMORIES FACING THEM.
CROSS CONNECTIONS ARE MADE BY A
SWITCHING NETWORK.
MULTIPROCESSING SYSTEMS
DANCE HALL CONFIGURATION:
[Diagram: CPU 1-4 ON ONE SIDE CONNECTED THROUGH A SWITCHING NETWORK TO MEM 1-4 ON THE OTHER]
MULTIPROCESSING SYSTEMS
ANOTHER CONFIGURATION IS THE BOUDOIR CONFIG, IN WHICH EACH
PROCESSOR IS CLOSELY COUPLED WITH ITS OWN MEMORY AND A
NETWORK OF SWITCHES IS USED TO LINK THE PROCESSOR-MEMORY
PAIRS.
[Diagram: CPU-MEM PAIRS (CPU 1/MEM 1 THROUGH CPU 4/MEM 4) LINKED BY A SWITCHING NETWORK]
MULTIPROCESSING SYSTEMS
ANOTHER TERM WHICH IS USED TO DESCRIBE A
FORM OF PARALLEL COMPUTING IS
CONCURRENCY.
IT DENOTES INDEPENDENT, ASYNCHRONOUS
OPERATION OF A COLLECTION OF PARALLEL
COMPUTING DEVICES, RATHER THAN THE
SYNCHRONOUS OPERATION OF DEVICES IN A MULTIPROCESSOR SYSTEM.
SYSTOLIC ARRAYS
IT MAY BE TERMED AN MISD SYSTEM.
IT IS A REGULAR ARRAY OF PROCESSING ELEMENTS, EACH COMMUNICATING WITH ITS NEAREST
NEIGHBOURS AND OPERATING SYNCHRONOUSLY UNDER THE CONTROL OF A COMMON CLOCK, WITH A RATE LIMITED BY THE SLOWEST PROCESSOR IN THE ARRAY.
THE TERM SYSTOLIC IS DERIVED FROM THE RHYTHMIC CONTRACTION OF THE HEART, ANALOGOUS TO THE RHYTHMIC PUMPING OF DATA THROUGH AN ARRAY OF PROCESSING ELEMENTS.
WAVEFRONT ARRAY
IT IS A REGULAR ARRAY OF PROCESSING ELEMENTS,
EACH COMMUNICATING WITH ITS NEAREST
NEIGHBOURS BUT OPERATING WITH NO GLOBAL
CLOCK.
IT EXHIBITS CONCURRENCY AND IS DATA DRIVEN.
THE OPERATION OF EACH PROCESSOR IS CONTROLLED
LOCALLY AND IS ACTIVATED BY THE ARRIVAL OF DATA AFTER ITS PREVIOUS OUTPUT HAS BEEN DELIVERED TO
THE APPROPRIATE NEIGHBOURING PROCESSOR.
WAVEFRONT ARRAY
PROCESSING WAVEFRONTS DEVELOP ACROSS
THE ARRAY AS PROCESSORS PASS ON THE
OUTPUT DATA TO THEIR NEIGHBOUR. HENCE
THE NAME.
GRANULARITY OF PARALLELISM
PARALLEL PROCESSING EMPHASIZES THE USE OF SEVERAL PROCESSING ELEMENTS WITH THE MAIN OBJECTIVE OF GAINING SPEED IN CARRYING OUT A TIME CONSUMING COMPUTING JOB
A MULTI-TASKING OS EXECUTES JOBS CONCURRENTLY, BUT THE OBJECTIVE IS TO
EFFECT THE CONTINUED PROGRESS OF ALL THE TASKS BY SHARING THE RESOURCES IN AN ORDERLY MANNER.
GRANULARITY OF PARALLELISM
PARALLEL PROCESSING EMPHASIZES THE EXPLOITATION OF CONCURRENCY AVAILABLE IN A PROBLEM FOR CARRYING OUT THE COMPUTATION BY EMPLOYING MORE THAN ONE PROCESSOR TO ACHIEVE BETTER SPEED AND/OR THROUGHPUT.
THE CONCURRENCY IN THE COMPUTING
PROCESS COULD BE LOOKED UPON FOR PARALLEL PROCESSING AT VARIOUS LEVELS (GRANULARITY OF PARALLELISM) IN THE
SYSTEM.
GRANULARITY OF PARALLELISM
THE FOLLOWING GRANULARITIES OF PARALLELISM MAY BE IDENTIFIED IN ANY EXISTING SYSTEM:
o PROGRAM LEVEL PARALLELISM
o PROCESS OR TASK LEVEL PARALLELISM
o PARALLELISM AT THE LEVEL OF A GROUP OF STATEMENTS
o STATEMENT LEVEL PARALLELISM
o PARALLELISM WITHIN A STATEMENT
o INSTRUCTION LEVEL PARALLELISM
o PARALLELISM WITHIN AN INSTRUCTION
o LOGIC AND CIRCUIT LEVEL PARALLELISM
GRANULARITY OF PARALLELISM
THE GRANULARITIES ARE LISTED IN THE
INCREASING DEGREE OF FINENESS.
GRANULARITIES AT LEVELS 1, 2 AND 3 CAN BE EASILY IMPLEMENTED ON A CONVENTIONAL
MULTIPROCESSOR SYSTEM.
MOST MULTI-TASKING OSes ALLOW CREATION AND SCHEDULING OF PROCESSES ON THE
AVAILABLE RESOURCES.
GRANULARITY OF PARALLELISM
SINCE A PROCESS REPRESENTS A SIZABLE CODE IN TERMS OF EXECUTION TIME, THE OVERHEADS IN EXPLOITING THE PARALLELISM AT THESE GRANULARITIES ARE NOT EXCESSIVE.
IF THE SAME PRINCIPLE IS APPLIED TO THE NEXT FEW LEVELS, INCREASED SCHEDULING OVERHEADS MAY NOT WARRANT PARALLEL EXECUTION
IT IS SO BECAUSE THE UNIT OF WORK OF A MULTI-PROCESSOR IS CURRENTLY MODELLED AT THE LEVEL OF A PROCESS OR TASK AND IS REASONABLY SUPPORTED ON THE CURRENT ARCHITECTURES.
THE LAST THREE LEVELS ARE BEST HANDLED BY HARDWARE. SEVERAL MACHINES HAVE BEEN BUILT TO PROVIDE THE FINE GRAIN PARALLELISM IN VARYING DEGREES.
A MACHINE HAVING INST LEVEL PARALLELISM EXECUTES SEVERAL INSTS SIMULTANEOUSLY. EXAMPLES ARE PIPELINED INST PROCESSORS, SYNCHRONOUS ARRAY PROCESSORS, ETC.
CIRCUIT LEVEL PARALLELISM EXISTS IN MOST MACHINES IN THE FORM OF PROCESSING MULTIPLE BITS/BYTES SIMULTANEOUSLY.
PARALLEL ARCHITECTURES
THERE ARE NUMEROUS ARCHITECTURES THAT HAVE BEEN USED IN THE DESIGN OF HIGH SPEED COMPUTERS.
THEY FALL BASICALLY INTO 2 CLASSES:
GENERAL PURPOSE &
SPECIAL PURPOSE
o GENERAL PURPOSE ARCHITECTURES ARE DESIGNED TO PROVIDE THE RATED SPEEDS AND OTHER COMPUTING REQUIREMENTS FOR A VARIETY OF PROBLEMS WITH THE SAME PERFORMANCE.
THE IMPORTANT ARCHITECTURAL IDEAS BEING USED IN DESIGNING GEN PURPOSE HIGH SPEED COMPUTERS ARE:
PIPELINED ARCHITECTURES
ASYNCHRONOUS MULTI-PROCESSORS
DATA-FLOW COMPUTERS
THE SPECIAL PURPOSE MACHINES HAVE TO EXCEL AT WHAT THEY HAVE BEEN DESIGNED FOR. THEY MAY OR MAY NOT DO SO FOR OTHER APPLICATIONS. SOME OF THE IMPORTANT ARCHITECTURAL IDEAS FOR DEDICATED COMPUTERS ARE:
SYNCHRONOUS MULTI-PROCESSORS (ARRAY PROCESSORS)
SYSTOLIC ARRAYS
NEURAL NETWORKS
ARRAY PROCESSORS
IT CONSISTS OF SEVERAL PEs, ALL OF WHICH EXECUTE THE SAME INST ON DIFFERENT DATA.
THE INSTS ARE FETCHED AND BROADCAST TO ALL THE PEs BY A COMMON CU.
THE PEs EXECUTE INSTS ON DATA RESIDING IN THEIR OWN MEMORY.
THE PEs ARE LINKED VIA AN INTERCONNECTION NETWORK TO CARRY OUT DATA COMMUNICATION BETWEEN THEM.
THERE ARE SEVERAL WAYS OF CONNECTING PEs.
THESE MACHINES REQUIRE SPECIAL PROGRAMMING EFFORTS TO ACHIEVE THE SPEED ADVANTAGE.
THE COMPUTATIONS ARE CARRIED OUT SYNCHRONOUSLY BY THE HW AND THEREFORE SYNC IS NOT AN EXPLICIT PROBLEM.
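THE LOCKSTEP EXECUTION DESCRIBED ABOVE CAN BE SKETCHED IN A FEW LINES. THIS IS A TOY MODEL, NOT ANY REAL MACHINE: THE `PE` CLASS AND `broadcast` FUNCTION ARE ILLUSTRATIVE ASSUMPTIONS THAT MIMIC A CU BROADCASTING ONE INST TO ALL PEs, EACH OF WHICH APPLIES IT TO DATA IN ITS OWN LOCAL MEMORY.

```python
# Toy SIMD array processor: one control unit broadcasts a single
# instruction; every PE applies it to its own local data in lockstep.
# (Illustrative sketch only; PE and broadcast are invented names.)

class PE:
    """Processing element with private local memory (one operand cell)."""
    def __init__(self, value):
        self.mem = value

    def execute(self, op, operand):
        self.mem = op(self.mem, operand)

def broadcast(pes, op, operand):
    """Control unit: fetch one instruction, broadcast it to all PEs."""
    for pe in pes:
        pe.execute(op, operand)

pes = [PE(v) for v in [1, 2, 3, 4]]
broadcast(pes, lambda a, b: a + b, 10)   # same inst, different data
result = [pe.mem for pe in pes]          # [11, 12, 13, 14]
```

NOTE HOW SYNCHRONIZATION IS IMPLICIT: EVERY PE FINISHES THE SAME INST BEFORE THE NEXT ONE IS BROADCAST, WHICH IS WHY SYNC IS NOT AN EXPLICIT PROBLEM ON SUCH MACHINES.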
USING AN INTERCONNECTION NETWORK (FIGURE): PE1 ... PEn LINKED THROUGH AN INTERCONNECTION NETWORK, WITH A COMMON CU AND SCALAR PROCESSOR.
USING AN ALIGNMENT NETWORK (FIGURE): PE0 ... PEn CONNECTED TO MEM0 ... MEMk THROUGH AN ALIGNMENT NETWORK, WITH A CONTROL UNIT AND SCALAR PROCESSOR.
CONVENTIONAL MULTI-PROCESSORS
ASYNCHRONOUS MULTIPROCESSING, BASED ON MULTIPLE CPUs AND MEM BANKS CONNECTED THROUGH EITHER A BUS OR A CONNECTION NETWORK, IS A COMMONLY USED TECHNIQUE TO PROVIDE INCREASED THROUGHPUT AND/OR RESPONSE TIME IN A GENERAL PURPOSE COMPUTING ENVIRONMENT.
IN SUCH SYSTEMS, EACH CPU OPERATES INDEPENDENTLY ON THE QUANTUM OF WORK GIVEN TO IT.
THIS HAS BEEN HIGHLY SUCCESSFUL IN PROVIDING INCREASED THROUGHPUT AND/OR RESPONSE TIME IN TIME SHARED SYSTEMS.
EFFECTIVE REDUCTION OF THE EXECUTION TIME OF A GIVEN JOB REQUIRES THE JOB TO BE BROKEN INTO SUB-JOBS THAT ARE TO BE HANDLED SEPARATELY BY THE AVAILABLE PHYSICAL PROCESSORS.
IT WORKS WELL FOR TASKS RUNNING MORE OR LESS INDEPENDENTLY, i.e., FOR TASKS HAVING LOW COMMUNICATION AND SYNCHRONIZATION REQUIREMENTS.
COMM AND SYNC ARE IMPLEMENTED EITHER THROUGH THE SHARED MEMORY, OR BY A MESSAGE SYSTEM, OR THROUGH A HYBRID APPROACH.
SHARED MEMORY ARCHITECTURE (FIGURES): IN A COMMON BUS ARCHITECTURE, SEVERAL CPUs SHARE ONE MEMORY OVER A SINGLE BUS; IN A SWITCH BASED MULTIPROCESSOR, THE CPUs REACH MEM0 ... MEMn THROUGH A PROCESSOR-MEMORY SWITCH.
MESSAGE BASED ARCHITECTURE (FIGURE): PE1 ... PEn LINKED BY A CONNECTION NETWORK.
HYBRID ARCHITECTURE (FIGURE): PE1 ... PEn LINKED BY A CONNECTION NETWORK THAT ALSO CONNECTS MEM1 ... MEMk.
ON A SINGLE BUS SYSTEM, THERE IS A LIMIT ON THE NUMBER OF PROCESSORS THAT CAN BE OPERATED IN PARALLEL.
IT IS USUALLY OF THE ORDER OF 10.
A CONNECTION NETWORK HAS THE ADVANTAGE THAT THE NO OF PROCESSORS CAN GROW WITHOUT LIMIT, BUT THE CONNECTION AND COMM COST MAY DOMINATE AND THUS SATURATE THE PERFORMANCE GAIN.
FOR THIS REASON, A HYBRID APPROACH MAY BE FOLLOWED.
MANY SYSTEMS USE A COMMON BUS ARCH FOR GLOBAL MEM, DISK AND I/O, WHILE THE PROC-MEM TRAFFIC IS HANDLED BY A SEPARATE BUS.
DATA FLOW COMPUTERS
A NEW FINE GRAIN PARALLEL PROCESSING APPROACH BASED ON THE DATAFLOW COMPUTING MODEL WAS SUGGESTED BY JACK DENNIS IN 1975.
HERE, A NO OF DATA FLOW OPERATORS, EACH CAPABLE OF DOING AN OPERATION, ARE EMPLOYED.
A PROGRAM FOR SUCH A MACHINE IS A CONNECTION GRAPH OF THE OPERATORS.
THE OPERATORS FORM THE NODES OF THE GRAPH WHILE THE ARCS REPRESENT THE DATA MOVEMENT BETWEEN NODES.
AN ARC IS LABELED WITH A TOKEN TO INDICATE THAT IT CONTAINS DATA.
A TOKEN IS GENERATED ON THE OUTPUT OF A NODE WHEN IT COMPUTES THE FUNCTION BASED ON THE DATA ON ITS INPUT ARCS.
THIS IS KNOWN AS FIRING OF THE NODE.
A NODE CAN FIRE ONLY WHEN ALL OF ITS INPUT ARCS HAVE TOKENS AND THERE IS NO TOKEN ON THE OUTPUT ARC.
WHEN A NODE FIRES, IT REMOVES THE INPUT TOKENS TO SHOW THAT THE DATA HAS BEEN CONSUMED.
USUALLY, COMPUTATION STARTS WITH THE ARRIVAL OF DATA ON THE INPUT NODES OF THE GRAPH.
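THE FIRING RULE ABOVE CAN BE SKETCHED AS A SMALL SIMULATION. THE `Node` CLASS AND THE TOKEN REPRESENTATION (None = EMPTY ARC) ARE ILLUSTRATIVE ASSUMPTIONS, NOT ANY REAL DATAFLOW MACHINE.

```python
# Toy dataflow node: fires only when all input arcs hold tokens and
# the output arc is empty; firing consumes the inputs and produces
# one output token. (Illustrative sketch of the firing rule only.)

class Node:
    def __init__(self, fn, n_inputs):
        self.fn = fn
        self.inputs = [None] * n_inputs   # None = no token on the arc
        self.output = None

    def can_fire(self):
        return all(t is not None for t in self.inputs) and self.output is None

    def fire(self):
        if self.can_fire():
            self.output = self.fn(*self.inputs)      # compute the function
            self.inputs = [None] * len(self.inputs)  # consume input tokens
            return True
        return False

add = Node(lambda a, b: a + b, 2)
add.inputs[0] = 5
fired_early = add.fire()   # False: one input arc is still empty
add.inputs[1] = 7
fired = add.fire()         # True: a token (12) appears on the output arc
```

A FULL GRAPH WOULD CONNECT SEVERAL SUCH NODES AND REPEATEDLY FIRE EVERY FIREABLE NODE, SO COMPUTATION PROGRESSES PURELY BY DATA AVAILABILITY.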
DATA FLOW GRAPH FOR THE COMPUTATION A = 5 + C.D (FIGURE): TOKENS 5, C AND D FEED THE OPERATOR NODES; COMPUTATION PROGRESSES AS PER DATA AVAILABILITY.
MANY CONVENTIONAL MACHINES EMPLOYING MULTIPLE FUNCTIONAL UNITS EMPLOY THE DATA FLOW MODEL FOR SCHEDULING THE FUNCTIONAL UNITS.
EXAMPLE EXPERIMENTAL MACHINES ARE THE MANCHESTER MACHINE (1984) AND THE MIT MACHINE.
THE DATA FLOW COMPUTERS PROVIDE FINE GRANULARITY OF PARALLEL PROCESSING, SINCE THE DATA FLOW OPERATORS ARE TYPICALLY ELEMENTARY ARITHMETIC AND LOGIC OPERATORS.
THEY MAY PROVIDE AN EFFECTIVE SOLUTION FOR USING A VERY LARGE NUMBER OF COMPUTING ELEMENTS IN PARALLEL.
WITH THEIR ASYNCHRONOUS DATA DRIVEN CONTROL, THEY HOLD PROMISE FOR EXPLOITATION OF THE PARALLELISM AVAILABLE BOTH IN THE PROBLEM AND THE MACHINE.
CURRENT IMPLEMENTATIONS ARE NO BETTER THAN CONVENTIONAL PIPELINED MACHINES EMPLOYING MULTIPLE FUNCTIONAL UNITS.
SYSTOLIC ARCHITECTURES
THE ADVENT OF VLSI HAS MADE IT POSSIBLE TO DEVELOP SPECIAL ARCHITECTURES SUITABLE FOR DIRECT IMPLEMENTATION IN VLSI.
SYSTOLIC ARCHITECTURES ARE BASICALLY PIPELINES OPERATING IN ONE OR MORE DIMENSIONS.
THE NAME SYSTOLIC HAS BEEN DERIVED FROM THE ANALOGY WITH THE OPERATION OF THE BLOOD CIRCULATION SYSTEM THROUGH THE HEART.
CONVENTIONAL ARCHITECTURES OPERATE ON THE DATA USING LOAD AND STORE OPERATIONS FROM THE MEMORY.
PROCESSING USUALLY INVOLVES SEVERAL OPERATIONS.
EACH OPERATION ACCESSES THE MEMORY FOR DATA, PROCESSES IT AND THEN STORES THE RESULT. THIS REQUIRES A NO OF MEM REFERENCES.
CONVENTIONAL PROCESSING (FIGURE): EACH STAGE F1 ... Fn LOADS FROM AND STORES TO MEMORY.
SYSTOLIC PROCESSING (FIGURE): DATA PASSES THROUGH F1 ... Fn IN SEQUENCE, WITH A SINGLE FINAL STORE TO MEMORY.
IN SYSTOLIC PROCESSING, DATA TO BE PROCESSED FLOWS THROUGH VARIOUS OPERATION STAGES AND THEN FINALLY IT IS PUT IN THE MEMORY.
SUCH AN ARCHITECTURE CAN PROVIDE VERY HIGH COMPUTING THROUGHPUT DUE TO REGULAR DATAFLOW AND PIPELINE OPERATION.
IT MAY BE USEFUL IN DESIGNING SPECIAL PROCESSORS FOR GRAPHICS, SIGNAL & IMAGE PROCESSING.
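THE CONTRAST ABOVE (ONE MEMORY TRIP PER OPERATION VS ONE FINAL STORE) CAN BE SKETCHED AS FOLLOWS. THE STAGE FUNCTIONS ARE ARBITRARY PLACEHOLDERS, NOT A REAL SYSTOLIC ALGORITHM.

```python
# Systolic-style processing: each datum flows through every stage and
# is stored once at the end, instead of a load/store around every
# operation. (Illustrative sketch; f1..f3 are placeholder stages.)

def systolic(data, stages):
    """Stream each item through all stages; store only the final result."""
    out = []
    for x in data:
        for f in stages:
            x = f(x)       # result passes stage-to-stage, no memory trip
        out.append(x)      # one store per item
    return out

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results = systolic([1, 2, 3], stages)   # [1, 3, 5]
```

IN A REAL SYSTOLIC ARRAY ALL STAGES WORK ON DIFFERENT ITEMS AT THE SAME TIME; THE SEQUENTIAL LOOP HERE ONLY SHOWS THE DATA PATH, NOT THE TIMING.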
PERFORMANCE OF PARALLEL COMPUTERS
AN IMPORTANT MEASURE OF A PARALLEL ARCHITECTURE IS SPEEDUP.
LET n = NO. OF PROCESSORS; Ts = SINGLE PROC. EXEC. TIME; Tn = n PROC. EXEC. TIME.
THEN SPEEDUP S = Ts/Tn
AMDAHL'S LAW (1967)
IT IS BASED ON A VERY SIMPLE OBSERVATION.
A PROGRAM REQUIRING TOTAL TIME T FOR SEQUENTIAL EXECUTION SHALL HAVE SOME PART WHICH IS INHERENTLY SEQUENTIAL.
IN TERMS OF TOTAL TIME TAKEN TO SOLVE THE PROBLEM, THIS FRACTION OF COMPUTING TIME IS AN IMPORTANT PARAMETER.
LET f = SEQ. FRACTION FOR A GIVEN PROGRAM.
AMDAHL'S LAW STATES THAT THE SPEEDUP OF A PARALLEL COMPUTER WITH n PROCESSORS IS LIMITED BY
S <= 1 / (f + (1 - f)/n)
SO EVEN AS n GROWS WITHOUT BOUND, S CANNOT EXCEED 1/f.
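A QUICK NUMERICAL CHECK OF AMDAHL'S LAW, S = 1/(f + (1-f)/n): EVEN A SMALL SEQUENTIAL FRACTION CAPS THE ACHIEVABLE SPEEDUP AT 1/f. THE VALUES BELOW ARE ARBITRARY EXAMPLES.

```python
# Amdahl's law: speedup of an n-processor machine for a program whose
# inherently sequential fraction is f.

def amdahl_speedup(f, n):
    return 1.0 / (f + (1.0 - f) / n)

s_10 = amdahl_speedup(0.1, 10)        # ~5.26 with 10 processors
s_huge = amdahl_speedup(0.1, 10**9)   # just under 10: the 1/f ceiling
```

WITH f = 10%, A THOUSAND-FOLD INCREASE IN PROCESSORS BUYS ALMOST NOTHING ONCE S APPROACHES 1/f = 10.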
CONSIDER TWO PARALLEL COMPS. Me AND Mi. Me IS BUILT USING POWERFUL PROCS. CAPABLE OF EXECUTING AT A SPEED OF M MEGAFLOPS.
THE COMP Mi IS BUILT USING CHEAP PROCS., AND EACH PROC. OF Mi EXECUTES r.M MEGAFLOPS, WHERE 0 < r < 1.
IF THE MACHINE Mi ATTEMPTS A COMPUTATION WHOSE INHERENTLY SEQ. FRACTION f > r, THEN Mi WILL EXECUTE THE COMPUTATION MORE SLOWLY THAN A SINGLE PROC. OF Me.
PROOF:
LET W = TOTAL WORK; M = SPEED OF THE PROC. OF Me (IN Mflops);
r.M = SPEED OF EACH PE OF Mi; f.W = SEQ WORK OF THE JOB;
T(Me) = TIME TAKEN BY Me FOR THE WORK W;
T(Mi) = TIME TAKEN BY Mi FOR THE WORK W. THEN
TIME TAKEN BY ANY COMP = T = AMOUNT OF WORK / SPEED
T(Mi) = TIME FOR SEQ PART + TIME FOR PARALLEL PART
= ((f.W)/(r.M)) + [((1-f).W/n)/(r.M)] = (W/M).(f/r) IF n IS INFINITELY LARGE.
T(Me) = W/M [ASSUMING ONLY 1 PE]
SO IF f > r, THEN T(Mi) > T(Me)
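A NUMERICAL ILLUSTRATION OF THE THEOREM: Me HAS ONE PROCESSOR OF SPEED M, Mi HAS n PROCESSORS OF SPEED r.M EACH. WITH f > r, Mi STAYS SLOWER THAN Me EVEN FOR HUGE n. THE CONSTANTS BELOW ARE ARBITRARY.

```python
# Execution times of the two machines from the proof above.

def t_me(W, M):
    return W / M                        # one fast processor of speed M

def t_mi(W, M, r, f, n):
    seq = (f * W) / (r * M)             # sequential part on one slow PE
    par = ((1 - f) * W / n) / (r * M)   # parallel part spread over n PEs
    return seq + par

W, M, r, f = 1000.0, 100.0, 0.05, 0.1   # note f > r
slow = t_mi(W, M, r, f, n=10**6)        # ~20: dominated by (W/M)*(f/r)
fast = t_me(W, M)                       # 10
```

THE SEQUENTIAL TERM ALONE IS (W/M).(f/r) = 20, ALREADY TWICE T(Me); ADDING PROCESSORS TO Mi ONLY SHRINKS THE PARALLEL TERM.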
THE THEOREM IMPLIES THAT A SEQ COMPONENT FRACTION ACCEPTABLE FOR THE MACHINE Me MAY NOT BE ACCEPTABLE FOR THE MACHINE Mi.
IT IS NOT GOOD TO HAVE A LARGER PROCESSING POWER THAT GOES TO WASTE. PROCS MUST MAINTAIN SOME LEVEL OF EFFICIENCY.
RELATION BETWEEN EFFICIENCY e AND SEQ FRACTION f:
e = S/n = 1 / (1 + (n - 1).f)
MINSKY'S CONJECTURE (1970)
FOR A PARALLEL COMPUTER WITH n PROCS, THE SPEEDUP S SHALL BE PROPORTIONAL TO log2 n.
MINSKY'S CONJECTURE WAS VERY BAD FOR THE PROPONENTS OF LARGE SCALE PARALLEL ARCHITECTURES.
FLYNN & HENNESSY (1980) THEN GAVE THAT THE SPEEDUP OF AN n PROCESSOR PARALLEL SYSTEM IS LIMITED BY
S <= n / ln n
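THE TWO ESTIMATES ABOVE DIFFER ENORMOUSLY FOR LARGE n, AS A QUICK COMPUTATION SHOWS (n = 1024 IS AN ARBITRARY EXAMPLE):

```python
# Minsky's conjecture (log2 n) versus the Flynn-Hennessy bound
# (n / ln n) for an n-processor machine.

import math

n = 1024
minsky = math.log2(n)             # 10.0
flynn_hennessy = n / math.log(n)  # ~147.7, far more optimistic
```

FOR 1024 PROCESSORS, MINSKY PREDICTS A SPEEDUP OF ONLY 10, WHILE THE FLYNN-HENNESSY BOUND ALLOWS NEARLY 148.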
PARALLEL ALGORITHMS
AN IMP MEASURE OF THE PERFORMANCE OF ANY ALGO IS ITS TIME AND SPACE COMPLEXITY. THEY ARE SPECIFIED AS SOME FUNCTION OF THE PROBLEM SIZE.
MANY TIMES, THEY DEPEND UPON THE DATA STRUCTURE USED.
SO, ANOTHER IMP MEASURE IS THE PREPROCESSING TIME COMPLEXITY TO GENERATE THE DESIRED DATA STRUCTURE.
PARALLEL ALGOS ARE THE ALGOS TO BE RUN ON A PARALLEL MACHINE.
SO, THE COMPLEXITY OF COMM AMONGST PROCESSORS ALSO BECOMES AN IMPORTANT MEASURE.
SO, AN ALGO MAY FARE BADLY ON ONE MACHINE AND MUCH BETTER ON THE OTHER.
DUE TO THIS REASON, MAPPING OF THE ALGO ON THE ARCHITECTURE IS AN IMP ACTIVITY IN THE STUDY OF PARALLEL ALGOS.
SPEEDUP AND EFFICIENCY ARE ALSO IMP PERFORMANCE MEASURES FOR A PARALLEL ALGO WHEN MAPPED ON TO A GIVEN ARCHITECTURE.
A PARALLEL ALGO FOR A GIVEN PROBLEM MAY BE DEVELOPED USING ONE OR MORE OF THE FOLLOWING:
1. DETECT AND EXPLOIT THE INHERENT PARALLELISM AVAILABLE IN THE EXISTING SEQUENTIAL ALGORITHM
2. INDEPENDENTLY INVENT A NEW PARALLEL ALGORITHM
3. ADAPT AN EXISTING PARALLEL ALGO THAT SOLVES A SIMILAR PROBLEM.
DISTRIBUTED PROCESSING
PARALLEL PROCESSING DIFFERS FROM DISTRIBUTED PROCESSING IN THE SENSE THAT IT HAS (1) CLOSE COUPLING BETWEEN THE PROCESSORS & (2) COMMUNICATION FAILURES MATTER A LOT.
PROBLEMS MAY ARISE IN DISTRIBUTED PROCESSING BECAUSE OF (1) TIME UNCERTAINTY DUE TO DIFFERING TIME IN LOCAL CLOCKS, (2) INCOMPLETE INFO ABOUT OTHER NODES IN THE SYSTEM, (3) DUPLICATE INFO WHICH MAY NOT ALWAYS BE CONSISTENT.
PIPELINING PROCESSING
A PIPELINE CAN WORK WELL WHEN:
1. THE TIME TAKEN BY EACH STAGE IS NEARLY THE SAME.
2. THERE IS A STEADY STREAM OF JOBS, OTHERWISE UTILIZATION WILL BE POOR.
3. IT HONOURS THE PRECEDENCE CONSTRAINTS OF SUB-STEPS OF JOBS.
THE MOST IMP PROPERTY OF A PIPELINE IS THAT IT ALLOWS PARALLEL EXECUTION OF JOBS WHICH HAVE NO PARALLELISM WITHIN THE INDIVIDUAL JOBS THEMSELVES.
IN FACT, A JOB WHICH CAN BE BROKEN INTO A NO OF SEQUENTIAL STEPS IS THE BASIS OF PIPELINE PROCESSING.
THIS IS DONE BY INTRODUCING TEMPORAL PARALLELISM, WHICH MEANS EXECUTING DIFFERENT STEPS OF DIFFERENT JOBS INSIDE THE PIPELINE.
THE PERFORMANCE IN TERMS OF THROUGHPUT IS GUARANTEED IF THERE ARE ENOUGH JOBS TO BE STREAMED THROUGH THE PIPELINE, ALTHOUGH AN INDIVIDUAL JOB FINISHES WITH A DELAY EQUALLING THE TOTAL DELAY OF ALL THE STAGES.
THE FOURTH IMP THING IS THAT THE STAGES IN THE PIPELINE ARE SPECIALIZED TO DO PARTICULAR SUBFUNCTIONS, UNLIKE IN CONVENTIONAL PARALLEL PROCESSORS WHERE EQUIPMENT IS REPLICATED.
IT AMOUNTS TO SAYING THAT DUE TO SPECIALIZATION, THE STAGE PROC COULD BE DESIGNED WITH BETTER COST AND SPEED, OPTIMISED FOR THE SPECIALISED FUNCTION OF THE STAGE.
PERFORMANCE MEASURES OF PIPELINE: EFFICIENCY, SPEEDUP AND THROUGHPUT
EFFICIENCY: LET n BE THE LENGTH OF THE PIPE AND m BE THE NO OF TASKS RUN ON THE PIPE. THEN EFFICIENCY e CAN BE DEFINED AS
e = (m.n)/((m+n-1).n) = m/(m+n-1)
WHEN n>>m, e TENDS TO m/n (A SMALL FRACTION)
WHEN m>>n, e TENDS TO 1 (FULL UTILIZATION)
SPEEDUP: S = ((n.ts).m)/((m+n-1).ts) = (m.n)/(m+n-1)
WHEN n>>m, S TENDS TO m (NO. OF TASKS RUN)
WHEN m>>n, S TENDS TO n (LENGTH OF THE PIPE)
THROUGHPUT: Th = m/((m+n-1).ts) = e/ts, WHERE ts IS THE TIME THAT ELAPSES AT ONE STAGE.
WHEN n>>m, Th TENDS TO m/(n.ts)
WHEN m>>n, Th TENDS TO 1/ts (ONE RESULT PER STAGE TIME)
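THE THREE MEASURES CAN BE COMPUTED DIRECTLY FROM m, n AND ts. THE SAMPLE VALUES BELOW (100 TASKS, 5 STAGES, 1 ns PER STAGE) ARE ARBITRARY.

```python
# Pipeline efficiency, speedup and throughput for m tasks on an
# n-stage pipe with per-stage time ts.

def efficiency(m, n):
    return m / (m + n - 1)

def speedup(m, n):
    return (m * n) / (m + n - 1)

def throughput(m, n, ts):
    return m / ((m + n - 1) * ts)

m, n, ts = 100, 5, 1e-9
e = efficiency(m, n)        # ~0.96: pipe almost fully utilized
s = speedup(m, n)           # ~4.8: approaching n as m grows
th = throughput(m, n, ts)   # approaching 1/ts = one result per ns
```

WITH ONLY 5 TASKS ON THE SAME PIPE, e DROPS TO 5/9 AND S TO 25/9, SHOWING WHY A STEADY STREAM OF JOBS MATTERS.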
OPTIMAL PIPE SEGMENTATION
INTO HOW MANY SUBFUNCTIONS SHOULD A FUNCTION BE DIVIDED?
LET n = NO OF STAGES, T = TIME FOR NON-PIPELINED IMPLEMENTATION, D = LATCH DELAY AND c = COST OF EACH STAGE.
STAGE COMPUTE TIME = T/n (SINCE T IS DIVIDED EQUALLY AMONG n STAGES)
PIPELINE COST = c.n + k, WHERE k IS A CONSTANT REFLECTING SOME COST OVERHEAD.
SPEED (TIME PER OUTPUT) = (T/n + D)
AN IMPORTANT PERFORMANCE MEASURE IS THE PRODUCT OF SPEED AND COST, DENOTED BY p:
p = ((T/n) + D).(c.n + k) = T.c + D.c.n + (k.T)/n + k.D
TO OBTAIN THE VALUE OF n WHICH GIVES BEST PERFORMANCE, WE DIFFERENTIATE p w.r.t. n AND EQUATE IT TO ZERO:
dp/dn = D.c - (k.T)/n^2 = 0
n = SQRT [(k.T)/(D.c)]
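A NUMERICAL CHECK OF THE RESULT: p(n) = (T/n + D)(c.n + k) IS INDEED SMALLEST AT n = SQRT(k.T/(D.c)). THE CONSTANTS BELOW ARE ARBITRARY SAMPLE VALUES.

```python
# Speed-cost product for an n-stage pipe, minimized at the analytic
# optimum derived above.

import math

def p(n, T, D, c, k):
    return (T / n + D) * (c * n + k)

T, D, c, k = 64.0, 1.0, 1.0, 4.0
n_opt = math.sqrt(k * T / (D * c))      # sqrt(256) = 16 stages

# p at the analytic optimum is no worse than at nearby stage counts
better = all(p(n_opt, T, D, c, k) <= p(n, T, D, c, k)
             for n in [8, 12, 20, 32])
```

HERE p(16) = (4+1)(16+4) = 100, WHILE p(8) = 108 AND p(32) = 108, CONFIRMING THE MINIMUM.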
PIPELINE CONTROL
IN A NON-PIPELINED SYSTEM, ONE INST IS FULLY EXECUTED BEFORE THE NEXT ONE STARTS, THUS MATCHING THE ORDER OF EXECUTION.
IN A PIPELINED SYSTEM, INST EXECUTION IS OVERLAPPED. SO, IT CAN CAUSE PROBLEMS IF NOT CONSIDERED PROPERLY IN THE DESIGN OF CONTROL.
EXISTENCE OF SUCH DEPENDENCIES CAUSES HAZARDS.
THE CONTROL STRUCTURE PLAYS AN IMP ROLE IN THE OPERATIONAL EFFICIENCY AND THROUGHPUT OF THE MACHINE.
THERE ARE 2 TYPES OF CONTROL STRUCTURES IMPLEMENTED ON COMMERCIAL SYSTEMS.
THE FIRST ONE IS CHARACTERISED BY A STREAMLINE FLOW OF THE INSTS IN THE PIPE.
IN THIS, INSTS FOLLOW ONE AFTER ANOTHER SUCH THAT THE COMPLETION ORDERING IS THE SAME AS THE ORDER OF INITIATION.
THE SYSTEM IS CONCEIVED AS A SEQUENCE OF FUNCTIONAL MODULES THROUGH WHICH THE INSTS FLOW ONE AFTER ANOTHER, WITH AN INTERLOCK BETWEEN THE ADJACENT STAGES TO ALLOW THE TRANSFER OF DATA FROM ONE STAGE TO ANOTHER.
THE INTERLOCK IS NECESSARY BECAUSE THE PIPE IS ASYNCHRONOUS DUE TO VARIATIONS IN THE SPEEDS OF DIFFERENT STAGES.
IN THESE SYSTEMS, BOTTLENECKS APPEAR DYNAMICALLY AT ANY STAGE AND THE INPUT TO IT IS HALTED TEMPORARILY.
THE SECOND TYPE OF CONTROL IS MORE FLEXIBLE AND POWERFUL, BUT EXPENSIVE.
IN SUCH SYSTEMS, WHEN A STAGE HAS TO SUSPEND THE FLOW OF A PARTICULAR INSTRUCTION, IT ALLOWS OTHER INSTS TO PASS THROUGH THE STAGE, RESULTING IN AN OUT-OF-TURN EXECUTION OF THE INSTS.
THE CONTROL MECHANISM IS DESIGNED SUCH THAT EVEN THOUGH THE INSTS ARE EXECUTED OUT-OF-TURN, THE BEHAVIOUR OF THE PROGRAM IS THE SAME AS IF THEY WERE EXECUTED IN THE ORIGINAL SEQUENCE.
SUCH CONTROL IS DESIRABLE IN A SYSTEM HAVING MULTIPLE ARITHMETIC PIPELINES OPERATING IN PARALLEL.
PIPELINE HAZARDS
THE HARDWARE TECHNIQUE THAT DETECTS AND RESOLVES HAZARDS IS CALLED AN INTERLOCK.
A HAZARD OCCURS WHENEVER AN OBJECT WITHIN THE SYSTEM (REG, FLAG, MEM LOCATION) IS ACCESSED OR MODIFIED BY 2 SEPARATE INSTS THAT ARE CLOSE ENOUGH IN THE PROGRAM THAT THEY MAY BE ACTIVE SIMULTANEOUSLY IN THE PIPELINE.
HAZARDS ARE OF 3 KINDS: RAW, WAR AND WAW.
ASSUME THAT AN INST j LOGICALLY FOLLOWS AN INST i.
RAW HAZARD: IT OCCURS BETWEEN 2 INSTS WHEN INST j ATTEMPTS TO READ SOME OBJECT THAT IS BEING MODIFIED BY INST i.
WAR HAZARD: IT OCCURS BETWEEN 2 INSTS WHEN THE INST j ATTEMPTS TO WRITE ONTO SOME OBJECT THAT IS BEING READ BY THE INST i.
WAW HAZARD: IT OCCURS WHEN THE INST j ATTEMPTS TO WRITE ONTO SOME OBJECT THAT IS ALSO REQUIRED TO BE MODIFIED BY THE INST i.
THE DOMAIN (READ SET) OF AN INST k, DENOTED BY Dk, IS THE SET OF ALL OBJECTS WHOSE CONTENTS ARE ACCESSED BY THE INST k.
THE RANGE (WRITE SET) OF AN INST k, DENOTED BY Rk, IS THE SET OF ALL OBJECTS UPDATED BY THE INST k.
A HAZARD BETWEEN 2 INSTS i AND j (WHERE j FOLLOWS i) OCCURS WHENEVER ANY OF THE FOLLOWING HOLDS, WHERE * IS THE INTERSECTION OPERATION AND { } IS THE EMPTY SET:
Ri * Dj != { } (RAW)
Di * Rj != { } (WAR)
Ri * Rj != { } (WAW)
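THE SET FORMULATION TRANSLATES DIRECTLY INTO CODE: INTERSECT THE DOMAIN AND RANGE SETS OF THE TWO INSTS. THE REGISTER NAMES IN THE EXAMPLE ARE ILLUSTRATIVE.

```python
# Hazard detection via set intersection, following the Dk/Rk
# formulation: i precedes j in program order.

def hazards(Di, Ri, Dj, Rj):
    found = []
    if Ri & Dj:
        found.append("RAW")   # j reads what i writes
    if Di & Rj:
        found.append("WAR")   # j writes what i reads
    if Ri & Rj:
        found.append("WAW")   # both write the same object
    return found

# i: R1 = R2 + R3      j: R4 = R1 + R5   -> j reads i's result
detected = hazards(Di={"R2", "R3"}, Ri={"R1"},
                   Dj={"R1", "R5"}, Rj={"R4"})   # ["RAW"]
```

THIS IS ESSENTIALLY WHAT THE CENTRALIZED DETECTION SCHEME DESCRIBED NEXT DOES FOR EVERY FETCHED INST AGAINST ALL INSTS ALREADY IN THE PIPE.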
HAZARD DETECTION & REMOVAL
TECHNIQUES USED FOR HAZARD DETECTION CAN BE CLASSIFIED INTO 2 CLASSES:
CENTRALIZE ALL THE HAZARD DETECTION IN ONE STAGE (USUALLY THE IU) AND COMPARE THE DOMAIN AND RANGE SETS WITH THOSE OF ALL THE INSTS INSIDE THE PIPELINE.
ALLOW THE INSTS TO TRAVEL THROUGH THE PIPELINE UNTIL AN OBJECT EITHER FROM THE DOMAIN OR THE RANGE IS REQUIRED BY THE INST. AT THIS POINT, A CHECK IS MADE FOR A POTENTIAL HAZARD WITH ANY OTHER INST INSIDE THE PIPELINE.
THE FIRST APPROACH IS SIMPLE BUT SUSPENDS THE INST FLOW IN THE IU ITSELF IF THE INST FETCHED IS IN HAZARD WITH THOSE INSIDE THE PIPELINE.
THE SECOND APPROACH IS MORE FLEXIBLE, BUT THE HARDWARE REQUIRED GROWS AS THE SQUARE OF THE NO OF STAGES.
THERE ARE 2 APPROACHES FOR HAZARD REMOVAL:
SUSPEND THE PIPELINE INITIATION AT THE POINT OF HAZARD. THUS, IF AN INST j DISCOVERS THAT THERE IS A HAZARD WITH THE PREVIOUSLY INITIATED INST i, THEN ALL THE INSTS j+1, j+2, ... ARE STOPPED IN THEIR TRACKS TILL THE INST i HAS PASSED THE POINT OF HAZARD.
SUSPEND j BUT ALLOW THE INSTS j+1, j+2, ... TO FLOW.
THE FIRST APPROACH IS SIMPLE BUT PENALIZES ALL THE INSTS FOLLOWING j.
THE SECOND APPROACH IS EXPENSIVE.
IF THE PIPELINE STAGES HAVE ADDITIONAL BUFFERS BESIDES A STAGING LATCH, THEN IT IS POSSIBLE TO SUSPEND AN INST BECAUSE OF A HAZARD.
AT EACH POINT IN THE PIPELINE WHERE DATA IS TO BE ACCESSED AS AN INPUT TO SOME STAGE AND THERE IS A RAW HAZARD, ONE CAN LOAD ONE OF THE STAGING LATCHES NOT WITH THE DATA BUT WITH THE ID OF THE STAGE THAT WILL PRODUCE IT.
THE WAITING INST IS THEN FROZEN AT THIS STAGE UNTIL THE DATA IS AVAILABLE.
SINCE THE STAGE HAS MULTIPLE STAGING LATCHES, IT CAN ALLOW OTHER INSTS TO PASS THROUGH IT WHILE THE RAW DEPENDENT ONE IS FROZEN.
ONE CAN INCLUDE LOGIC IN THE STAGE TO FORWARD THE DATA WHICH WAS IN RAW HAZARD TO THE WAITING STAGE.
THIS FORM OF CONTROL ALLOWS HAZARD RESOLUTION WITH THE MINIMUM PENALTY TO OTHER INSTS.
HAZARD DETECTION & REMOVAL
THIS TECHNIQUE IS KNOWN BY THE NAME INTERNAL FORWARDING, SINCE THE STAGES ARE DESIGNED TO CARRY OUT AUTOMATIC ROUTING OF THE DATA TO THE REQUIRED PLACE USING IDENTIFICATION CODES (IDs).
IN FACT, MANY OF THE DATA DEPENDENT COMPUTATIONS ARE CHAINED BY MEANS OF ID TAGS SO THAT UNNECESSARY ROUTING IS ALSO AVOIDED.
MULTIPROCESSOR SYSTEMS
IT IS A COMPUTER SYSTEM COMPRISING TWO OR MORE PROCESSORS.
AN INTERCONNECTION NETWORK LINKS THESE PROCESSORS.
THE MAIN OBJECTIVE IS TO ENHANCE PERFORMANCE BY MEANS OF PARALLEL PROCESSING.
IT FALLS UNDER THE MIMD ARCHITECTURE.
BESIDES HIGH PERFORMANCE, IT PROVIDES THE FOLLOWING BENEFITS: FAULT TOLERANCE & GRACEFUL DEGRADATION; SCALABILITY & MODULAR GROWTH.
CLASSIFICATION OF MULTI PROCESSORS
MULTI-PROCESSOR ARCHITECTURE (FIGURE): TIGHTLY COUPLED OR LOOSELY COUPLED; TIGHTLY COUPLED SYSTEMS ARE FURTHER CLASSIFIED AS UMA, NUMA AND NORMA (NO REMOTE MEMORY ACCESS).
IN A TIGHTLY COUPLED MULTI-PROCESSOR, MULTIPLE PROCS SHARE INFO VIA COMMON MEM. HENCE, IT IS ALSO KNOWN AS A SHARED MEM MULTI-PROCESSOR SYSTEM. BESIDES GLOBAL MEM, EACH PROC CAN ALSO HAVE LOCAL MEM DEDICATED TO IT.
A LOOSELY COUPLED SYSTEM IS ALSO KNOWN AS A DISTRIBUTED MEM MULTI-PROCESSOR SYSTEM.
SYMMETRIC MULTIPROCESSOR
IN A UMA SYSTEM, THE ACCESS TIME FOR MEM IS EQUAL FOR ALL THE PROCESSORS.
AN SMP SYSTEM IS A UMA SYSTEM WITH IDENTICAL PROCESSORS, EQUALLY CAPABLE OF PERFORMING SIMILAR FUNCTIONS IN AN IDENTICAL MANNER.
ALL THE PROCS. HAVE EQUAL ACCESS TIME FOR THE MEM AND I/O RESOURCES.
FOR THE OS, ALL PROCESSORS LOOK ALIKE AND ANY PROC. CAN EXECUTE THE OS.