computer organization & architecture rcs-302

166
RCS-302 16/08/19 Department of Computer Science & Engineering 1 COMPUTER ORGANIZATION & ARCHITECTURE Faculties : 1. Mr. Ravi Tiwari (Subject Coordinator) 2. Mr. Gaurav Rajput 3. Mr. Paras Bassi 4. Mr. Vimal Singh

Upload: others

Post on 03-Nov-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

RCS-302

16/08/19Department of Computer Science &

Engineering1

COMPUTER ORGANIZATION

& ARCHITECTURE

Faculties :1. Mr. Ravi Tiwari (Subject Coordinator)2. Mr. Gaurav Rajput3. Mr. Paras Bassi4. Mr. Vimal Singh

Page 2: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

UNIT 1Introduction to Computer

Organization

16/08/19Department of Computer Science &

Engineering2

COMPUTER ORGANIZATION

& ARCHITECTURE

Page 3: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

VISIONTo develop competent IT professionals catering to the needs of Industry and society in a global perspective.

MISSIONTo attain academic & professional excellence with collective efforts of all stake holders through:M1: Dissemination of basic concepts and analytical skills. M2: Exposure to new tools in the area of Information technology.M3: Effective interaction with industry for better employability.M4: Inculcating values and professional ethics with social responsibility.

Department of Computer Science & Engineering

316/08/19Department of Computer Science &

Engineering

Page 4: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Arithmetic Signed and unsigned numbers Addition and Subtraction Logical operations ALU: arithmetic and logic unit Multiply Divide Floating Point

notation add multiply

16/08/19 4Department of Computer Science &

Engineering

Page 5: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Learning Objectives

The objectives of the following slide is to make student aware about the :

Arithmetic Signed and unsigned numbers Addition and Subtraction Logical operations ALU: Arithmetic and Logic Unit Multiply Divide Floating Point

516/08/19 Department of Computer Science & Engineering

Page 6: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

6

32

32

32

operation

result

a

b

ALU

Arithmetic

• Where we've been:– Performance (seconds, cycles,

instructions)– Abstractions:

Instruction Set Architecture Assembly Language and Machine

Language

• What's up ahead:– Implementing the Architecture

Page 7: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

7

• Bits have no inherent meaning (no semantics)• Decimal number system, e.g.:

4382 = 4x103 + 3x102 + 8x101 + 2x100

• Can use arbitrary base g; value of digit c at position i:c x gi

• Binary numbers (base 2)n-1 n-2 … 1 0

an-1 an-2 … a1 a0

2n-1 2n-2 … 21 20

• (an-1 an-2... a1 a0) two = an-1 x 2n-1 + an-2 x 2n-2 + … + a0 x 20

Binary numbers (1)

position

digit

weight

Page 8: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

8

Binary numbers (2)

• So far numbers are unsigned• With n bits 2n possible combinations

– a0 : least significant bit (lsb)

– an-1: most significant bit (msb)

1 bit 2 bits 3 bits 4 bits decimal value 0 00 0000000 0 1 01 0010001 1

10 0100010 211 0110011 3

1000100 41010101 51100110 61110111 7

1000 81001 9

Page 9: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

9

• Binary numbers (base 2)0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...

decimal: 0...2n-1

• Of course it gets more complicated:- numbers are finite (overflow)- fractions and real numbers- negative numbers

– e.g., no MIPS subi instruction; – however, addi can add a negative number

How do we represent negative numbers?i.e., which bit patterns will represent which numbers?

Binary numbers (3)

Page 10: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

TU/e Processor Design 5Z032 10

Conversion• Decimaal -> binair

Divide by 2 Remainder

• Hexadecimal: base 16. Octal: base 81010 1011 0011 1111two = ab3fhex

43822191 01095 1547 1273 1136 168 034 017 08 14 02 01 00 1

4382ten =1 0001 0001 1110two

Page 11: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

11

• Sign Magnitude: One's Complement Two's Complement

000 = +0 000 = +0 000 = +0001 = +1 001 = +1 001 = +1010 = +2 010 = +2 010 = +2011 = +3 011 = +3 011 = +3100 = -0100 = -3100 = -4101 = -1101 = -2101 = -3110 = -2110 = -1110 = -2111 = -3111 = -0111 = -1

• Issues: balance, number of zeros, ease of operations

• Which one is best? Why?

Signed binary numbersPossible representations:

Page 12: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

12

• 32 bit signed numbers:

0000 0000 0000 0000 0000 0000 0000 0000two = 0ten

0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten

0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten

...0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten

0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten

1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten

1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten

1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten

...1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten

1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten

1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten

– Range [-2 31 .. 2 31 -1]

• (an-1 an-2... a1 a0) 2’s-compl = -an-1 x 2n-1 + an-2 x 2n-2 + … + a0 x 20 = - 2n + an-1 x 2n-1 + … + a0 x 20

maxint

minint

Two’s complement

Page 13: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

13

• Negating a two's complement number: invert

all bits and add 1

– remember: “negate” and “invert” are quite different!

• Proof:

a + a = 1111.1111b = -1 d =>

-a = a + 1

Two's Complement Operations

Page 14: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

14

Two's Complement Operations

Converting n bit numbers into numbers with more than n bits:

– MIPS 8 bit, 16 bit values / immediates converted to 32 bits

• Copy the most significant bit (the sign bit) into the other bits

0010 -> 0000 0010

1010 -> 1111 1010

• MIPS "sign extension" example instructions:lb load byte (signed)lbu load byte (unsigned)slti set less than immediate (signed)sltiu set less than immediate (unsigned)

Page 15: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

15

Addition & Subtraction

• Just like in grade school (carry/borrow 1s) 0111 0111 0110

+ 0110 - 0110 - 0101

• Two's complement operations easy– subtraction using addition of negative numbers

0110 0110 - 0101 + 1010

• Overflow (result too large for finite computer word):– e.g., adding two n-bit numbers does not yield an n-bit

number 0111+ 0001 note that overflow term is somewhat misleading,

1000 it does not mean a carry “overflowed”

Page 16: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

16

• No overflow when adding a positive and a negative number

• No overflow when signs are the same for subtraction

• Overflow occurs when the value affects the sign:– overflow when adding two positives yields a negative – or, adding two negatives gives a positive– or, subtract a negative from a positive and get a negative– or, subtract a positive from a negative and get a positive

• Consider the operations A + B, and A – B– Can overflow occur if B is 0 ?– Can overflow occur if A is 0 ?

Detecting Overflow

Page 17: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

17

• When an exception (interrupt) occurs:– Control jumps to predefined address for exception

(interrupt vector)– Interrupted address is saved for possible resumption in

exception program counter (EPC); new instruction: mfc0(move from coprocessor0)

– Interrupt handler handles exception (part of OS).registers $k0 and $k1 reserved for OS

• Details based on software system / language– C ignores integer overflow; FORTRAN not

• Don't always want to detect overflow— new MIPS instructions: addu, addiu, subunote: addiu and sltiu still sign-extends!

Effects of Overflow

Page 18: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

18

Logic operations

• Sometimes operations on individual bits needed:

Logic operation C operationMIPS instructionShift left logical << sllShift right logical >> srlBit-by-bit AND & and, andiBit-by-bit OR | or, ori

• and and andi can be used to turn off some bits; or and ori turn on certain bits

• Of course, AND en OR can be used for logic operations. – Note: Language C’s logical AND (&&) and OR (||) are

conditional

• andi and ori perform no sign extension !

Page 19: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

19

• Let's build an ALU to support the andi and ori instructions– we'll just build a 1 bit ALU, and use 32 of them

b

a

operation

result

An ALU (arithmetic logic unit)

Page 20: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

20

• Selects one of the inputs to be the output, based on a control input

• Lets build our ALU and use a MUX to select theoutcome for the chosen operation

S

CA

B0

1

Review: The Multiplexor

note: we call this a 2-input mux even though it has 3 inputs!

Page 21: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

21

• Not easy to decide the “best” way to build something– Don't want too many inputs to a single gate– Don’t want to have to go through too many gates– For our purposes, ease of comprehension is important

• Let's look at a 1-bit ALU for addition (= full-adder):

Different Implementations

cout = a b + a cin + b cin

sum = a xor b xor cin

How could we build a 1-bit ALU for add, and, and or? How could we build a 32-bit ALU?

CarryOut

CarryIn

Suma

b

+

Page 22: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

22

Building a 32 bit ALU

b

0

2

Result

Operation

a

1

CarryIn

CarryOut

Result31a31

b31

Result0

CarryIn

a0

b0

Result1a1

b1

Result2a2

b2

Operation

ALU0

CarryIn

CarryOut

ALU1

CarryIn

CarryOut

ALU2

CarryIn

CarryOut

ALU31

CarryIn

Page 23: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

23

• Two's complement approach: just negate b and add

• How do we negate?

• A very clever solution:

What about subtraction (a – b) ?

0

2

Result

Operation

a

1

CarryIn

CarryOut

0

1

Binvert

b

Page 24: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

24

ALU symbol

ALU

zero

result

overflow

operation

a

b

carry-out

32

32

32

Page 25: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

25

Conclusions

• We can build an ALU to support the MIPS instruction set– key idea: use multiplexor to select the output we want

– we can efficiently perform subtraction using two’s complement

– we can replicate a 1-bit ALU to produce a 32-bit ALU

• Important points about hardware– all of the gates are always working

• not efficient from energy perspective !!

– the speed of a gate is affected by the number of connected outputs it has to drive (so-called Fan-Out)

– the speed of a circuit is affected by the number of gates in series

(on the “critical path” or the “deepest level of logic”)• Unit of measure: FO4 = inverter with Fan-Out of 4

• P4 (heavily superpipelined) has about 15 FO4 critical path

Page 26: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

26

• Is a 32-bit ALU as fast as a 1-bit ALU?• Is there more than one way to do addition?

– Two extremes: ripple carry and sum-of-products– How many logic layers do we need for these two extremes?

Can you see the ripple? How could you get rid of it?

c1 = b0c0 + a0c0 + a0b0

c2 = b1c1 + a1c1 + a1b1 c2 = (..subst c1..) c3 = b2c2 + a2c2 + a2b2 c3 = c4 = b3c3 + a3c3 + a3b3 c4 =

Not feasible! Why not?

Problem: Ripple carry adder is slow

Page 27: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

27

• An approach in-between our two extremes

• Motivation: – If we didn't know the value of carry-in,

what could we do?– When would we always generate a

carry? gi = ai bi

– When would we propagate the carry? pi = ai + bi

Carry-lookahead adder (1)

Cin

CoutCout = Gi + Pi Cin

a

b

Page 28: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

28

Carry-lookahead adder (2)

• Did we get rid of the ripple?

c1 = g0 + p0c0

c2 = g1 + p1c1 c2 = g1 + p1(g0 + p0c0)

c3 = g2 + p2c2 c3 = c4 = g3 + p3c3 c4 =

Feasible ?a0b0a1b1a2b2a3b3

Cin

P0

G0

ALU

Result0-34

P0 = p0.p1.p2.p3

G0= g3+(p3.g2)+(p3.p2.g1)+(p3.p2.p1.g0)

Page 29: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

29

• Use principle to build bigger adders

• Can’t build a 16 bit adder this way... (too big)

• Could use ripple carry of 4-bit CLA adders

• Better: use the CLA principle again!

C arr y-l ookahead ad d er (3 )

CarryIn

Result0--3

ALU0

CarryIn

Result4--7

ALU1

CarryIn

Result8--11

ALU2

CarryIn

CarryOut

Result12--15

ALU3

CarryIn

C1

C2

C3

C4

P0G0

P1G1

P2G2

P3G3

pigi

pi + 1gi + 1

ci + 1

ci + 2

ci + 3

ci + 4

pi + 2gi + 2

pi + 3gi + 3

a0b0a1b1a2b2a3b3

a4b4a5b5a6b6a7b7

a8b8a9b9

a10b10a11b11

a12b12a13b13a14b14a15b15

Carry-lookahead unit

Page 30: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

30

• More complicated than addition– accomplished via shifting and addition

• More time and more area• Let's look at 3 versions based on gradeschool

algorithm

0010 (multiplicand)

__*_1011 (multiplier)

• Negative numbers: convert and multiply– there are better techniques, we won’t look at them now

Multiplication (1)

Page 31: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

31

Multiplication (2)

Done

1. TestMultiplier0

1a. Add multiplicand to product andplace the result in Product register

2. Shift the Multiplicand register left 1 bit

3. Shift the Multiplier register right 1 bit

32nd repetition?

Start

Multiplier0 = 0Multiplier0 = 1

No: < 32 repetitions

Yes: 32 repetitions

64-bit ALU

Control test

MultiplierShift right

ProductWrite

MultiplicandShift left

64 bits

64 bits

32 bits

First implementationProduct initialized to 0

Page 32: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

32

Multiplication (3)

MultiplierShift right

Write

32 bits

64 bits

32 bits

Shift right

Multiplicand

32-bit ALU

Product Control test

Done

1. TestMultiplier0

1a. Add multiplicand to the left half ofthe product and place the result inthe left half of the Product register

2. Shift the Product register right 1 bit

3. Shift the Multiplier register right 1 bit

32nd repetition?

Start

Multiplier0 = 0Multiplier0 = 1

No: < 32 repetitions

Yes: 32 repetitions

Second version

Page 33: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

33

Multiplication (4)

ControltestWrite

32 bits

64 bits

Shift rightProduct

Multiplicand

32-bit ALU

Done

1. TestProduct0

1a. Add multiplicand to the left half ofthe product and place the result inthe left half of the Product register

2. Shift the Product register right 1 bit

32nd repetition?

Start

Product0 = 0Product0 = 1

No: < 32 repetitions

Yes: 32 repetitions

Final versionProduct initialized with multiplier

Page 34: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

34

Fast multiply: Booth’s Algorithm

• Exploit the fact that: 011111 = 100000 - 1Therefore we can replace multiplier, e.g.: 0001111100 = 0010000000 - 100

• Rules:Currentbit

Bit to theright

Explanation Operation

1 0 Begin 1s Subtractmultiplicand

1 1 Middle of 1s nothing

0 1 End of 1s Addmultiplicand

0 0 Middle of 0s nothing

Page 35: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

35

Booth’s Algorithm (2)

• Booth’s algorithm works for signed 2’s complement as well (without any modification)

• Proof: let’s multiply b * a(ai-1 - ai ) indicates what to do:

0 : do nothing+1: add b-1 : subtract

We get b*a =

This is exactly what we need !

i

ii

ii

ii

aab

baa

22

2)(

30

0

3131

31

01

Page 36: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

36

Division

• Similar to multiplication: repeated subtract

• The book discusses again three versions

Page 37: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

37

Divide (1)

• Well known algorithm: DividendDivisor 1000/1001010\1001 Quotient -1000 10

101 1010 -1000 10 Remainder

Page 38: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

38

Division (2)

• Implementation:

64-bit ALU

Control testWrite

64 bits

64 bits

32 bits

DivisorShift right

Remainder

QuotientShift left

Start

1. Substract the Divisor register from theRemainder register and place theresult in the Remainder register

Test Remainder

2.a Shift the Quotient registerto the left, setting the rightmost bit to 1

2.b Restore the original value by adding the Divisor register. Also, shift a 1 into the Quotient register

Shift Divisor Register right 1 bit

Done

33rd repetition?

>= 0 < 0

yes

no

Page 39: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

39

Multiply / Divide in MIPS

• MIPS provides a separate pair of 32-bit registers for the result of a multiply and divide: Hi and Lo

mult $s2,$s3 # Hi,Lo = $s2 * $s3div $s2,$s3 # Hi,Lo = $s2 mod $s3, $s2 / $s3

• Copy result to general purpose registermfhi $s1 # $s1 = Himflo $s1 # $s1 = Lo

• There are also unsigned variants of mult and div:multu and divu

Page 40: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

40

Shift instructions

• sll• srl• sra

• Why not ‘sla’ instruction ?

Shift: a quick way to multiply and divide with power of 2 (strength reduction). Is this always allowed?

Page 41: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

41

Floating Point (a brief look)

• We need a way to represent– numbers with fractions, e.g., 3.1416

– very small numbers, e.g., .000000001

– very large numbers, e.g., 3.15576 109

• Representation:– sign, exponent, significand: (–1)sign significand

2exponent

– more bits for significand gives more accuracy

– more bits for exponent increases range

• IEEE 754 floating point standard: – single precision : 8 bit exponent, 23 bit significand

– double precision: 11 bit exponent, 52 bit significand

Page 42: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

42

IEEE 754 floating-point standard

• Leading “1” bit of significand is implicit

• Exponent is “biased” to make sorting easier– all 0s is smallest exponent all 1s is largest– bias of 127 for single precision and 1023 for double

precision– summary: (–1)sign significand) 2exponent – bias

• Example:– decimal: -.75 = -3/4 = -3/22

– binary : -.11 = -1.1 x 2-1

– floating point: exponent = -1+bias = 126 = 01111110– IEEE single precision:

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 0 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Page 43: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

43

Floating Point Complexities

• Operations more complicated: align, renormalize, ...

• In addition to overflow we can have “underflow”

• Accuracy can be a big problem– IEEE 754 keeps two extra bits, guard and round, and

additional sticky bit (indicating if one of the remaining bits unequal zero)

– four rounding modes

– positive divided by zero yields “infinity”

– zero divide by zero yields “not a number”

– other complexities

• Implementing the standard can be tricky• Not using the standard can be even worse

– see text for description of 80x86 and Pentium bug!

Page 44: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

44

Conversion: decimal IEEE 754 FP

• Decimal number (base 10)123.456 = 1x102+2x101+3x100+4x10-1+5x10-2+6x10-3

• Binary number (base 2)101.011 = 1x22+0x21+1x20+0x2-1+1x2-2+1x2-3

• Example conversion: 5.375– Multiply with power of 2, to get rid of fraction:

5.375 = 5.375x16 / 16 = 86 x 2-4

– Convert to binary, and normalize to 1.xxxxx86 x 2-4 = 1010110 x 2-4 = 1.01011 x 22

– Add bias (127 for single precision) to exponent:exponent field = 2 + 127 = 129 = 1000 0001

– IEEE single precision format (remind the leading “1” bit):

sign exponent significand

0 10000001 01011000000000000000000

Page 45: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Assignment 1

Q1. What is Computer Architecture?

Q2. Convert the number (-16.125) into IEEE 754 single precision format.

Q3. Discuss about hardware used for signed operand multiplication.

Q4. Explain advantages and disadvantages of daisy chaining and polling bus arbitration scheme.

Q5.Discuss the importance of Array Multiplier. Explain your answer with a 3 bit to 2 bit array multipliers.

Page 46: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Tutorial 1

Q1. Convert the following arithmetic expressions from infix to reverse polish notation

i) A* B + C * D + E * F

ii) AB + A (BD + CE)

Q2. Convert the number (-16.125) into IEEE 754 single precision format.

Q3.Explain the floating-point addition/subtraction algorithm with flow chart.

Q4. What do you mean by Array Multiplier? Explain 3bit by 2 bit array multiplier.

Q5. What do you mean by Booth’s Algorithm? Discuss the required hardware and iterate your algorithm for the product (+8)*(-3).

Page 47: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Outcomes

After reading above topics students will be able to:

• An ability to apply the knowledge of mathematics, science, and

engineering in the field of Information Technology for the

solution of engineering problems.

• Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.

4716/08/19

Page 48: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

UNIT 2Instructions Cycle &

Control Unit

16/08/19Department of Computer Science &

Engineering48

COMPUTER ORGANIZATION

& ARCHITECTURE

Page 49: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Instructions• Types & Format

Cycles Control Unit

Hardwired Microprogram

16/08/19 49Department of Computer Science &

Engineering

Page 50: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Learning Objectives

The objectives of the following slide is to make student aware about the :

Instructions & their types . Instruction Cycles Control Unit

Hardwired Microprogram

5016/08/19 Department of Computer Science & Engineering

Page 51: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

INSTRUCTION

5116/08/19 Department of Computer Science & Engineering

Instruction is command which is givenby the user to computer.

Page 52: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Instruction cycle

5216/08/19 Department of Computer Science & Engineering

• The time period during which one instruction is fetched from memory and execute when a computer given an instruction in machine language.

• Each instruction is further divided into sequence of phases.

• After the execution the program counter is incremented to point to the next instruction.

Page 53: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Phases

5316/08/19 Department of Computer Science & Engineering

• Fetch an instruction from memory• Decode the instruction• Execute the instruction

Page 54: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Fetch cycle

• In this phase the sequence counter is initialized to 0.

• The address of first instruction from PC is loaded into address register during the first clock cycle.

16/08/1954

Page 55: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

The Fetch Cycle

• Consists of three time units and four micro-operations.

• Each micro-operation involves the movement of data into or out of a register.

IRMDR

PCPCt

MDRMEMORYt

MARPCt

1:

:

:

3

2

1

Page 56: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Decode cycle

16/08/1956

• The instruction is decoded by the instruction decoder of a processor.

• All the bits of the instruction under execution stored in IR are analyzed and decode in third clock cycle.

Page 57: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

16/08/1957

Page 58: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Micro-operations

• Are the functional, or atomic, operations of a processor.

• A single micro-operation generally involves a transfer between registers, transfer between registers and external bus, or a simple ALU operation.

Page 59: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Micro-operations and the Clock

• Each clock pulse defines a time unit, which are of equal duration.

• Micro-operations are performed within this time unit.

• If multiple micro-operations do not interfere with one another then grouping of micro-operations can be performed within one time unit.

• Grouping can be performed as long as; – Proper sequence of events are followed

• PC MAR must be done first in order for MEMORY MDR

– Conflicts are avoided• MEMORY MDR can not be in the same time unit as MDR

IR

Page 60: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

The Indirect Cycle

• Occurs if the instruction specifies an indirect address.

• Consists of three time unit and three micro-operations.

• Data is transferred to the MAR from the IR, which is used to fetch the address of the operand, the IR is then updated from MDR so it contains a direct address rather than indirect.

IRMDRt

MDRMEMORYt

MARIRt

:

:

:

3

2

1

Page 61: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Types of Micro-operation

• Transfer data between registers

• Transfer data from register to external

• Transfer data from external to register

• Perform arithmetic or logical operations

Functions of Control Unit• Sequencing

Causes the processor to step through a series of micro-operations

• ExecutionCauses the performance of each micro-operation

• This is done using Control Signals

Page 62: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Model of Control Unit

Page 63: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Control Signals - Input

• Clock– One micro-instruction (or set of simultaneous micro

instructions) per clock pulse.

• Instruction register– Op-code of the current instruction– Determines which micro-instructions are performed

• Flags– Determines the status of the processor– Results of previous ALU operations

• Control Signals from control bus– Interrupts– Acknowledgements

Page 64: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Data Paths and Control Signals

Page 65: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

PROCESSOR ORGANIZATION

• Organization is how features are implemented– Control signals, interfaces, memory

technology.– e.g. Is there a hardware multiply unit or

is it done by repeated addition

Page 66: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Internal Organization

• Usually a single internal bus• Using single bus simplifies & saves space• Gates control movement of data onto

and off the bus• Control signals control data transfer to

and from external systems bus• Temporary registers needed for proper

operation of ALU

Page 67: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Hardwired Implementation

• Control unit inputs• Flags and control bus

– Each bit means something• Instruction register

– Op-code causes different control signals for each different instruction

– Decoder takes encoded input and produces single output

– n binary inputs and 2n outputs

Page 68: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Control Unit with Decoded Inputs

Page 69: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Hardwired Logic

• Logic Gates Hardwired Internally– Functions predefined– Truth Tables– Boolean Logic used to define timing– Connect Instructions – Unique logic for each set of op-codes

Page 70: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Problems With Hard Wired Designs

• Complex sequencing & micro-operation logic

• Difficult to design and test• Inflexible design• Difficult to add new instructions

Page 71: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Assignment 2

Q1. List and explain different type of shift micro operation.

Q2. Draw the flow chart for the execution of a complete instruction in a basic computer.

Q3. An instruction format, there are 16 bits in an instruction word. Bit 0 to 11 convey the address of memory location for memory related instructions. For non-memory instructions these bits convey various registers or I/O operations. Bits 12 to 14 show the various memory operations such as AND, ADD, LDA etc. Bit 15 shows if the memory accessed directly or indirectly. For such an instruction format draw the block diagram of control unit of a computer and briefly explain how an instruction decoded and executed by this control unit.

Q4. What is subroutine call ? Explain with an example.

Q5. What is micro instruction. Explain the working of micro program sequencer with diagram.

Page 72: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Tutorial 2

Q1. Write a program to evaluate the arithmetic statement

X=(A-B+C*(D*E-F))/(G+H*K)

i. Using a general register computer with one address instructions.

ii. Using an accumulator type computer with zero address instruction.

Q2. An instruction is stored at location 300 with its adder field at 301 with its adder field at 301. The adder field has the value 400. A processor register R1 contains the number 200. Evaluate the effective address if the addressing mode of the instruction is: i) Direct ii) Immediate iii) Relative

Q3. What is addressing mode? Explain different types with diagram.

Q4. Differentiate horizontal and vertical microprogramming .

Q5. What is control word?

Page 73: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Outcomes

After reading above topics students will be able to:

• Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.

7316/08/19

Page 74: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

UNIT 3Memory Organization

16/08/19Department of Computer Science &

Engineering74

COMPUTER ORGANIZATION

& ARCHITECTURE

Page 75: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Memory Types Concepts Hierarchy

RAM Organization Cache Memory & Performance Mapping Techniques Virtual Memory

16/08/19 75Department of Computer Science &

Engineering

Page 76: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Learning Objectives

The objectives of the following slide is to make student aware about the :

Memory Types, Concepts, Hierarchy RAM Organization Cache Memory & Performance Mapping Techniques Virtual Memory Implementation

7616/08/19 Department of Computer Science & Engineering

Page 77: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Multiple-Chip SRAM

Fig. 17.2 Eight 128K 8 SRAM chips forming a 256K 32 memory unit.

/

WE

CS

OE

D in D out

Addr

WE

CS

OE

D in D out

Addr

WE

CS

OE

D in D out

Addr

WE

CS

OE

D in D out

Addr

WE

CS

OE

D in D out

Addr

WE

CS

OE

D in D out

Addr

WE

CS

OE

D in D out

Addr

18

/

17

32 WE

CS

OE

D in D out

Addr

Data in

Data out, byte 3

Data out, byte 2

Data out, byte 1

Data out, byte 0

MSB

Address

Page 78: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

SRAM with Bidirectional Data Bus

Fig. 17.3 When data input and output of an SRAM chip are shared or connected to a bidirectional data bus, output must be disabled during write operations.

/ h

/

g

Write enable

Data in/out

Chip select

Output enable

Address Data in Data out

Page 79: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

DRAM and Refresh Cycles

DRAM vs. SRAM Memory Cell Complexity

Fig. 17.4 Single-transistor DRAM cell, which is considerably simpler than SRAM cell, leads to dense, high-capacity DRAM memory chips.

Word line

Capacitor

Bit line

Pass transistor

Word line

Bit line

Compl. bit line

Vcc

(a) DRAM cell (b) Typical SRAM cell

Page 80: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Bridging the CPU-Memory Speed Gap

Idea: Retrieve more data from memory with each access

Fig. 17.9 Two ways of using a wide-access memory to bridge the speed gap between the processor and memory.

Wide-access

memory

.

.

.

Narrow bus to

processor Mux

Wide-access

memory

. . .

Wide bus to

processor

.

.

. Mux

(a) Buffer and mult iplexer at the memory side

(a) Buffer and mult iplexer at the processor side

. . .

Page 81: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Nonvolatile Memory

ROM PROM

EPROM

Read-only memory organization, with the fixed

contents shown on the right.

B i t l i n e s

Word lines

Word contents

1 0 1 0

1 0 0 1

0 0 1 0

1 1 0 1

S u p p l y v o l t a g e

Page 82: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

The Need for a Memory Hierarchy

The widening speed gap between CPU and main memory

Processor operations take of the order of 1 ns

Memory access requires 10s or even 100s of ns

Memory bandwidth limits the instruction execution rate

Each instruction executed involves at least one memory access

Hence, a few to 100s of MIPS is the best that can be achieved

A fast buffer memory can help bridge the CPU-memory gap

The fastest memories are expensive and thus not very large

A second (third?) intermediate cache level is thus often used

Page 83: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Nov. 2014Computer Architecture, Memory System Design

Slide 83

Typical Levels in a Hierarchical Memory

Fig. 17.14 Names and key characteristics of levels in a memory hierarchy.

Tertiary Secondary 

Main 

Cache 2 

Cache 1 

Reg’s  $Millions $100s Ks 

$10s Ks 

$1000s  

$10s  

$1s  

Cost per GB Access latency  Capacity 

TBs 10s GB 

100s MB 

 MBs  

 10s KB  

 100s B  

min+ 10s ms 

100s ns 

10s ns  

 a few ns  

  ns  

Speed gap  

Page 84: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Nov. 2014Computer Architecture, Memory System Design

Slide 84

Cache Memory Organization

Processor speed is improving at a faster rate than memory’s• Processor-memory speed gap has been widening• Cache is to main as desk drawer is to file cabinet

Page 85: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Cache, Hit/Miss Rate, and Effective Access Time

One level of cache with hit rate h

Ceff = hCfast + (1 – h)(Cslow + Cfast) = Cfast + (1 – h)Cslow

CPUCache(fast)

memory

Main(slow)

memory

Reg file

Word

Line

Data is in the cache fraction h of the time(say, hit rate of 98%)

Go to main 1 – h of the time(say, cache miss rate of 2%)

Cache is transparent to user;transfers occur automatically

Page 86: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Multiple Cache Levels

Fig. 18.1 Cache memories act as intermediaries between the superfast processor and the much slower main memory.

Cleaner and easier to analyze

Level-2 cache

Main memory

CPU CPU registers

Level-1 cache

Level-2 cache

Main memory

CPU CPU registers

Level-1 cache

(a) Level 2 between level 1 and main (b) Level 2 connected to “backside” bus

Page 87: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Cache Memory Design Parameters

Cache size (in bytes or words). A larger cache can hold more of the program’s useful data but is more costly and likely to be slower.

Block or cache-line size (unit of data transfer between cache and main). With a larger cache line, more data is brought in cache with each miss. This can improve the hit rate but also may bring low-utility data in.

Placement policy. Determining where an incoming cache line is stored. More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location).

Replacement policy. Determining which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten. Typical policies: choosing a random or the least recently used block.

Write policy. Determining if updates to cache words are immediately forwarded to main (write-through) or modified blocks are copied back to main if and when they must be replaced (write-back or copy-back).

Page 88: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

What Makes a Cache Work?

Assuming no conflict in address mapping, the cache will hold a small program loop in its entirety, leading to fast execution.

Temporal localitySpatial locality

9-instruction program loop

Address mapping (many-to-one)

Cache memory

Main memory

Cache line/block (unit of t rans fer between main and cache memories)

Page 89: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Direct-Mapped Cache

Fig. 18.4 Direct-mapped cache holding 32 words within eight 4-word lines. Each line is associated with a tag and a valid bit.

3-bit line index in cache

2-bit word offset in line Main memory locations

0-3 4-7

8-11

36-39 32-35

40-43

68-71 64-67 72-75

100-103 96-99 104-107

Tag Word

address

Valid bits

Tags

Read tag and specified word

Com-pare

1,Tag

Data out

Cache miss

1 if equal

Page 90: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Accessing a Direct-Mapped Cache

Example

Fig. 18.5 Components of the 32-bit address in an example direct-mapped cache with byte addressing.

Show cache addressing for a byte-addressable memory with 32-bit addresses. Cache line W = 16 B. Cache size L = 4096 lines (64 KB).

Solution

Byte offset in line is log216 = 4 b. Cache line index is log24096 = 12 b.

This leaves 32 – 12 – 4 = 16 b for the tag.

12-bit line index in cache

4-bit byte offset in line

Byte address in cache

16-bit line tag

32-bit address

Page 91: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Set-Associative Cache

Two-way set-associative cache holding 32 words of data within 4-word lines and 2-line sets.

Main memory locations

0-3

16-19

32-35

48-51

64-67

80-83

96-99

112-115

Valid bits

Tags

1

0

2-bit set index in cache

2-bit word offset in line

Tag

Word address

Option 0

Option 1

Read tag and specified word from each option

Com-pare

1,Tag

Com-pare

Data out

Cache

miss

1 if equal

Page 92: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Accessing a Set-Associative Cache

Example

Components of the 32-bit address in an example two-way set-associative cache.

Show cache addressing scheme for a byte-addressable memory with 32-bit addresses. Cache line width 2W = 16 B. Set size 2S = 2 lines. Cache size 2L = 4096 lines (64 KB).

Solution

Byte offset in line is log216 = 4 b. Cache set index is (log24096/2) = 11 b.

This leaves 32 – 11 – 4 = 17 b for the tag.11-bit set index in cache

4-bit byte offset in line

Address in cache used to read out two candidate

items and their control info

17-bit line tag

32-bit address

Page 93: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Disk Memory Basics

Disk memory elements and key terms.

Track 0 Track 1

Track c – 1

Sector

Recording area

Spindle

Direction of rotation

Platter

Read/write head

Actuator

Arm

Track 2

Page 94: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Disk Drives

Typically

2 - 8 cm

Typically2-8 cm

Page 95: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Access Time for a Disk

The three components of disk access time. Disks that spin faster have a shorter average and worst-case access time.

1. Head movement from current position to desired cylinder: Seek time (0-10s ms)

Rotation

2. Disk rotation until the desired sector arrives under the head: Rotational latency (0-10s ms)

3. Disk rotation until sector has passed under the head: Data transfer time (< 1 ms)

Sector

1 2

3

Page 96: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Disk Performance

Reducing average seek time and rotational latency by performing disk accesses out of order.

Seek time = a + b(c – 1) + (c – 1)1/2

Average rotational latency = (30 / rpm) s = (30 000 / rpm) ms

Arrival order of access requests: A, B, C, D, E, F Possible out-of-order reading: C, F, D, E, B, A

A

B

C

D

E F

Rotation

Page 97: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Nov. 2014Computer Architecture, Memory System Design

Slide 97

The Need for Virtual Memory

Program segments in main memory and on disk.

Program and data on several disk tracks

System

Stack

Active pieces of program and data in memory

Unused space

Page 98: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Nov. 2014Computer Architecture, Memory System Design

Slide 98

Fig. Data movement in a memory hierarchy.

Memory Hierarchy: The Big Picture

Pages Lines

Words

Registers

Main memory

Cache

Virtual memory

(transferred explicitly

via load/store) (transferred automatically

upon cache miss) (transferred automatically

upon page fault)

Page 99: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Address Translation in Virtual Memory

Fig.Virtual-to-physical address translation parameters.

ExampleDetermine the parameters in Fig. 20.3 for 32-bit virtual addresses, 4 KB pages, and 128 MB byte-addressable main memory.

Solution: Physical addresses are 27 b, byte offset in page is 12 b; thus, virtual (physical) page numbers are 32 – 12 = 20 b (15 b)

Virtual address

Physical address

Physical page number

Virtual page number Offset in page

Offset in page

Address translation

P bits

P bits

V P bits

M P bits

Page 100: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Page Tables and Address Translation

Fig. The role of page table in the virtual-to-physical address translation process.

Page table

Main memory

Valid bits

Page table register

Virtual page

number

Other f lags

Page 101: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Protection and Sharing in Virtual Memory

Fig. Virtual memory as a facilitator of sharing and memory protection.

01234567

01234567

Page table for process 1

Main memory

Permission bits

Pointer Flags

Page table for process 2

To disk memory

Only read accesses allow ed

Read & w rite accesses allowed

Page 102: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

The Latency Penalty of Virtual Memory

Virtual address

Memory access 1

Fig.

Physical address

Memory access 2

Page table

Main memory

Valid bits

Page table register

Virtual page

number

Other f lags

Page 103: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Virtual- or Physical-Address Cache?

Fig.Options for where virtual-to-physical address translation occurs.

TLB access may form an extra pipeline stage, thus the penalty in throughput can be insignificant

Cache may be accessed with part of address that is common between virtual and physical addresses

TLB Main memory Virtual-address cache

TLB Main memory Physical-address cache

TLB

Main memory Hybrid-address

cache

Page 104: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Page Replacement Policies

Least-recently used (LRU) policy

Implemented by maintaining a stack

Pages A B A F B E A

LRU stackMRU D A B A F B E A

B D A B A F B EE B D D B A F B

LRU C E E E D D A F

Page 105: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Approximate LRU Replacement Policy

Fig. 20.8 A scheme for the approximate implementation of LRU .

Least-recently used policy: effective, but hard to implement

Approximate versions of LRU are more easily implemented Clock policy: diagram below shows the reason for name Use bit is set to 1 whenever a page is accessed

Page slot 0

Page slot 1Page slot 70

1

0

0

1

1

0

1

0

1

0

1

0

0

0

1

(a) Before replacement (b) After replacement

Page 106: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Improving Virtual Memory Performance

Table 20.1 Memory hierarchy parameters and their effects on performance

Parameter variation Potential advantages Possible disadvantages

Larger main or cache size

Fewer capacity misses Longer access time

Larger pages or longer lines

Fewer compulsory misses (prefetching effect)

Greater miss penalty

Greater associativity (for cache only)

Fewer conflict misses Longer access time

More sophisticated replacement policy

Fewer conflict misses Longer decision time, more hardware

Write-through policy (for cache only)

No write-back time penalty, easier write-miss handling

Wasted memory bandwidth, longer access time

Page 107: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Fig. Data movement in a memory hierarchy.

Summary of Memory Hierarchy

Cache memory: provides illusion of very high speed

Virtual memory: provides illusion of very large size

Main memory: reasonable cost, but slow & small

Locality makes the illusions work Pages

Lines Words

Registers

Main memory

Cache

Virtual memory

(transferred explicitly

via load/store) (transferred automatically

upon cache miss) (transferred automatically

upon page fault)

Page 108: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Assignment 3

Q1. What do you mean by memory hierarchy?

Q2.Discuss semiconductor RAM. Differentiate SRAM and DRAM.

Q3. What is cache memory? Why it is implemented?

Q4.Explain the concept of virtual memory.

Q5. Write short notes on following:

i. Optical Disk

ii. Magnetic Tape

Page 109: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Tutorial 3

Q1. A computer uses RAM chip of 1024*1 capacity. How many chips are needed and how should there address lines be connected to provide a memory capacity of 1024 bytes?

Q2.A ROM chip of 1024*8 bits has four select inputs and operates on a 5 volt power supply. How many pins are needed for the IC package? Draw a block diagram and label all inputs and output terminals in the ROM.

Q3. Explain various cache mapping techniques. A computer system has 4K word cache organized in block set associative manner with 4 blocks per set, 64 words per block. The main memory contains 65536 blocks. How many bits are there in each of TAG, SET and WORD fields?

Page 110: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Q4. Define the term address space and memory space. An address space is specified by 24 bits and corresponding memory space specified by 16 bits. Find the following:

i. How many words are there in address space?

ii. How many words are there in memory space?

iii. If a page consists of 2K word, how many pages and blocks are there in the system?

Q5. A Computer employs RAM chips of 2568 and ROM chips of 1024*8. The computer needs 2KB of RAM and

4KB of ROM and 4 interface units, each of 4 registers:

i. How many RAM and ROM chips are needed

ii. Draw memory address map for the system.

Page 111: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Outcomes

After reading above topics students will be able to:

• Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

11116/08/19

Page 112: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

UNIT 4I/O Organization

16/08/19Department of Computer Science &

Engineering112

COMPUTER ORGANIZATION

& ARCHITECTURE

Page 113: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Peripheral Devices I/O

Modules Interfaces Processor Port

Modes of Transfer Programmed I/O Interrupt Driven I/O DMA

Asynchronous & Synchronous Communication

16/08/19 113Department of Computer Science &

Engineering

Page 114: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Learning Objectives

The objectives of the following slide is to make student aware about the :

Peripheral Devices I/O

Modules Interfaces Processor Port

Modes of Transfer Programmed I/O Interrupt Driven I/O DMA

Asynchronous & Synchronous Communication

11416/08/19 Department of Computer Science & Engineering

Page 115: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Input-Output Organization

• 11-1 Peripheral Devices– I/O Subsystem

• Provides an efficient mode of communication between the central system and the outside environment

– Peripheral (or I/O Device)• Input or Output devices attached to the computer

– Monitor (Visual Output Device) : CRT, LCD– KBD (Input Device) : light pen, mouse, touch screen, joy stick, digitizer– Printer (Hard Copy Device) : Dot matrix (impact), thermal, ink jet, laser (non-impact)– Storage Device : Magnetic tape, magnetic disk, optical disk

– ASCII (American Standard Code for Information Interchange) Alphanumeric Characters• I/O communications are usually involved in the transfer of ASCII information• ASCII Code • 11-2 Input-Output Interface

– Interface• 1) A conversion of signal values may be required

Page 116: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• 2) A synchronization mechanism may be needed– The data transfer rate of peripherals is usually slower than the transfer rate of the CPU

• 3) Data codes and formats in peripherals differ from the word format in the CPU and Memory

• 4) The operating modes of peripherals are different from each other– Each peripherals must be controlled so as not to disturb the operation of other peripherals

connected to the CPU

– Interface• Special hardware components between the CPU and peripherals• Supervise and Synchronize all input and output transfers

– I/O Bus and Interface Modules : Fig. 11-1

• I/O Bus– Data lines– Address lines– Control lines

• Interface Modules : 주로 VLSI Chip 사용

– SCSI (Small Computer System Interface)– IDE (Integrated Device Electronics)– Centronics– RS-232– IEEE-488 (GPIB)

I n te r f a c e

K e y b o a rda n d

d is p la yt e r m in a l

I n te r f a c e

M a g n e t ict a p e

I n te r f a c e

M a g n e t i cd is k

I n te r f a c e

P r in te r

P r o c e s s o r

D a ta

C o n t r o l

A d d r e s s

I / O b u s

Page 117: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• I/O command : 8251 SIO – Control Command– Status Command– Input Command– Output Command

– I/O Bus versus Memory Bus• Computer buses can be used to communicate with memory and I/O

– 1) Use two separate buses, one for memory and the other for I/O : Fig. 11-19, p. 421» I/O Processor

– 2) Use one common bus for both memory and I/O but have separate control lines for each : Isolated I/O or I/O Mapped I/O

» IN, OUT : I/O Instruction» MOV or LD : Memory read/write Instruction

– 3) Use one common bus for memory and I/O with common control lines : Memory Mapped I/O

» MOV or LD : I/O and Memory read/write Instruction

Intel, Zilog

Motorola* Control Lines

I/O Request, Mem Request, Read/Write

* Control LinesRead/Write

Page 118: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Example of I/O Interface : Fig. 11-2

• 4 I/O port : Data port A, Data port B, Control, Status

– 비교 : 8255 PIO ( port A, B, C, Control/Status )

• Address Decode : CS, RS1, RS0

• 11-3 Asynchronous Data Transfer– Synchronous Data Transfer

• All data transfers occur simultaneously during the occurrence of a clock pulse

• Registers in the interface share a common clock with CPU registers

– Asynchronous Data Transfer• Internal timing in each unit (CPU and

Interface) is independent• Each unit uses its own private clock for

internal registers

T im in g a n d

C o n t ro l

C S

W R

R D

R S0

R S1

B u sb u ff e r s

S t a tu sr e g i s te r

C o n t ro lr e g i s te r

P o r t Br e g i s te r

P o r t Ar e g i s te rB id i re c t io n a l

d a t a b u s

S ta t u s

T o C P U

I / O r e a d

R e g is te r s e le c t

C h ip s e le c t

I / O w r i te

I / O d a t a

I / O d a t a

C o n t r o l

T o I / O d e v ic e

Inte

rnal bus

C S R S0

R S1

R e g is te r s e le c te d

N o n e : d a ta b u s in h ig h - im p e d a n c e

P o r t A r e g is te r

P o r t B r e g is te r

C o n tro l r e g is t e r

S t a t u s r e g is te r

0

1

1

1

1

1 0

0

1

1

0

1

0

× ×

Page 119: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Strobe : Control signal to indicate the time at which data is being transmitted

• 1) Source-initiated strobe : Fig. 11-3• 2) Destination-initiated strobe : Fig. 11-4

• Disadvantage of strobe method– Destination 이 Data 가 가갔 가 를아무이상없이잘 져 는지알수 없다 .– 따라서 Handshake method 를사용하여 Data 전송을확인함

Fig. 11-3 Source-initiated strobeFig. 11-4 Destination-initiated strobe

S o u r c eu n it

D e s t in a t io nu n it

( a ) B lo c k d ia g r a m

V a lid d a t aD a ta

S t r o b e

( b ) T im in g d ia g r a m

D a ta b u s

S tr o b eS o u r c e

u n i tD e s t in a t io n

u n i t

( a ) B lo c k d ia g r a m

V a l id d a taD a t a

S t r o b e

( b ) T im in g d ia g r a m

D a ta b u s

S tr o b e

Page 120: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Handshake : Agreement between two independent units

• 1) Source-initiated handshake : Fig. 11-5• 2) Destination-initiated handshake : Fig. 11-6

• Timeout : If the return handshake signal does not respond within a given time period, the unit assumes that an error has occurred.

Fig. 11-5 Source-initiated handshakeFig. 11-6 Destination-initiated handshake

S o u rc eu n i t

D e s t in a t io nu n i t

( a ) B lo c k d i a g r a m

V a l id d a t aD a t a

( b ) T im in g d ia g ra m

D a ta b u s

D a t a v a lid

D a ta a c c e p t e d

D a ta v a l id

D a t a a c c e p t e d

P la c e d a t a o n b u sE n a b le d a ta v a lid .

D is a b le d a ta v a l idI n v a l id a te d a t a o n b u s

A c c e p t d a t a f ro m b u sE n a b le d a ta a c c e p te d

D is a b le d a ta a c c e p t e dR e a d y to a c c e p t d a ta

( in it ia l s t a t e )

( c ) S e q u e n c e o f e v e n ts

S o u r c e u n i t D e s t in a t i o n u n i t

S o u r c eu n it

D e s t i n a t io nu n it

( a ) B lo c k d i a g r a m

V a l id d a ta

( b ) T i m i n g d i a g ra m

D a t a b u s

D a t a v a l id

R e a d y f o r d a t a

D a t a v a l id

D a ta b u s

P la c e d a t a o n b u sE n a b le d a ta v a l id .

D i s a b le d a t a v a l i dIn v a l id a te d a t a o n b u s

( in i t ia l s ta t e )

A c c e p t d a ta f r o m b u sD is a b le re d a y f o r d a t a

R e a d y t o a c c e p t d a t a .

E n a b le r e a d y f o r d a ta

( c ) S e q u e n c e o f e v e n t s

R e a d y f o r d a t a

S o u r c e u n i t D e s t in a t io n u n i t

Page 121: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Asynchronous Serial Transfer• Synchronous transmission : Sec. 11-8

– The two unit share a common clock frequency – Bits are transmitted continuously at the rate dictated by the clock pulses

• Asynchronous transmission : Fig. 11-7– Special bits are inserted at both ends of the character code– Each character consists of three parts :

» 1) start bit : always “0”, indicate the beginning of a character» 2) character bits : data» 3) stop bit : always “1”

• Asynchronous transmission rules :– When a character is not being sent, the line is kept in the 1-state– The initiation of a character transmission is detected from the start bit,

which is always “0”– The character bits always follow the start bit– After the last bit of the character is transmitted, a stop bit is detected

when the line returns to the 1-state for at least one bit time

1 1 11 0000

S t a r tb i t C h a ra c te r b i t s

S to pb i t

Page 122: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• Baud Rate : Data transfer rate in bits per second– 10 character per second with 11 bit format = 110 bit per second

• UART (Universal Asynchronous Receiver Transmitter) : 8250• UART (Universal Synchronous/Asynchronous Receiver Transmitter) : 8251

– Asynchronous Communication Interface : Fig. 11-8

• 8250 SIO– 80 : Data Write/Read (Transmit/Receive)– 81 : Control Write/ Status Read

» A0 = RS (register select)

• Double Buffered (in transmit register)– New character can be loaded as soon as the previous one starts transmission

• 3 possible errors (in status register)– 1) parity error

» Even or Odd parity error– 2) framing error

» right number of stop bits is not detected at the end of the received character

– 3) overrun error » CPU does not read the character from the receiver register before the next one is available

T im in g a n d

C o n t r o l

C S

W R

R D

R S

B u sb u ff e r s

S ta tu sr e g is t e r

C o n t ro lr e g is t e r

R e c e iv e rr e g is t e r

T r a n s m it te rr e g is t e rB id i r e c t io n a l

d a t a b u s

I / O re a d

R e g is t e r s e le c t

C h ip s e le c t

I / O w r i te

Inte

rnal bus

C S R S R e g is t e r s e le c te d

N o n e : d a ta b u s in h ig h - im p e d a n c e

T r a n s m it te r r e g is te r

0

1

1

1

1 1

0

1

0

×

S h i f tr e g is te r

S h i f tr e g is te r

T r a n s m it te rc o n tr o l

a n d c lo c k

R e c e iv e rc o n tr o l

a n d c lo c k

T r a n s m itd a ta

T r a n s m it te rc lo c k

R e c e i v e rd a ta

R e c e i v e rc lo c k

R D

W R

R D

W R

O p e r a t io n

×

C o n tr o l r e g i s te r

R e c e i v e r r e g is t e r

S ta tu s re g i s t e r

Page 123: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• 11-4 Modes of Transfer– Data transfer to and from peripherals

• 1) Programmed I/O : in this section• 2) Interrupt-initiated I/O : in this section and sec. 11-5• 3) Direct Memory Access (DMA) : sec. 11-6• 4) I/O Processor (IOP) : sec. 11-7

– Example of Programmed I/O : Fig. 11-10, 11-11

– Interrupt-initiated I/O• 1) Non-vectored : fixed branch address• 2) Vectored : interrupt source supplies the branch address (interrupt vector)

I n te r f a c e

C P UI / O

d e v ic e

D a ta re g i s te r

S ta tu sr e g is te r

F

D a ta b u s

I / O w r i te

I / O r e a d

A d d re s s b u s

D a ta a c c e p te d

D a ta v a lid

I / O b u s

F = F la g b i t

R e a d s ta tu s re g i s te r

C h e c k fl a g b i t

R e a d d a ta re g i s te r

T ra n s f e r d a ta to m e m o ry

C o n t in u ew i th

p ro g r a m

F l a g

O p e r a t i o nc o m p l e t e ?

= 0

= 1

y e s

n o

Page 124: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Software Considerations• I/O routines

– software routines for controlling peripherals and for transfer of data between the processor and peripherals

• I/O routines for standard peripherals are provided by the manufacturer (Device driver, OS or BIOS)

• I/O routines are usually included within the operating system• I/O routines are usually available as operating system procedures ( OS or

BIOS function call)

• 11-5 Priority Interrupt– Priority Interrupt

• Identify the source of the interrupt when several sources will request service simultaneously

• Determine which condition is to be serviced first when two or more requests arrive simultaneously

– 1) Software : Polling– 2) Hardware : Daisy chain, Parallel priority

Page 125: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Polling• Identify the highest-priority source by

software means– One common branch address is used for all

interrupts– Program polls the interrupt sources in sequence– The highest-priority source is tested first

• Polling priority interrupt 의단점– If there are many interrupt sources, the time

required to poll them can exceed the time available to service the I/O device

– 따라서 Hardware priority interrupt 를사용

– Daisy-Chaining : Fig. 11-12

“1” “1” “0”Device 2 Interrupt Request

D e v i c e 1P I P O

D e v ic e 3P I P O

D e v ic e 2P I P O T o n e x t

D e v ic e

C P U

I N T

I N T A C K

I n te r r u p t r e q u e s t

I n te r r u p t a c k n o w le d g e

P r o c e s s o r d a ta b u s

V A D 1 V A D 3V A D 2

Page 126: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• One stage of the daisy-chain priority arrangement : Fig. 11-13

No interrupt request Invalid : interrupt request, but no acknowledge No interrupt request : Pass to other device (other device requested interrupt ) Interrupt request

INTACK

INTS Q

R

V e c to r a d d r e s s

D e la y

E n a b le

R F

P IP r io r i ty in

I n te r r u p tr e q u e s t

f r o m d e v ic e

O p e n - c o l le c to rin v e r t e r I n te r r u p t r e q u e s t to C P U

P r io r i t y o u tP O

V A D

R FP I P O E n a b le0011

0

01

1

0011

0001

Page 127: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Parallel Priority • Priority Encoder Parallel

Priority : Fig. 11-14– Interrupt Enable F/F (IEN) : set or

cleared by the program– Interrupt Status F/F (IST) : set or

cleared by the encoder output• Priority Encoder Truth Table :

Tab. 11-2– Interrupt Cycle

• At the end of each instruction cycle, CPU checks IEN and IST

• if both IEN and IST equal to “1”• CPU goes to an Instruction

Cycle– Sequence of microoperation

during Instruction Cycle

: Decrement stack point: Push PC into stack: Enable INTACK: Transfer VAD to PC: Disable further interrupts

Branch to ISR

0

3

2

1

0

3

2

1

P r io r i ty e n c o d e r

I 0

I 2

I 3

I 1

d is k

K e y b o a r d

R e a d er

P r i n t e r

I n te r r u p tr e g is t e r

y

0

0

0

0

0

0

x

I S TI E N

V A Dto C P U

E n a b le

I n t e r r u p tto C P U

I N T A C Kf r o m C P U

M a s kr e g is t e r

ninstructionext Fetch to

0

1

][

1

Go

IEN

VADPC

INTACK

PCSPM

SPSP

Page 128: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Software Routines• CPU 가 현재 main program 의 749 번지를실행

도중에 KBD interrupt 발생• KBD service program 의 255 번지를실행도중

에 DISK interrupt 발생

KBD Int. Here 749

DISK Int. Here 255

J M P D I S K

M a in p r o g r a m

J M P K B D

J M P R D R

J M P P D R

S t a c k

7 5 02 5 6

M e m o r y

0

3

2

1

A d d r e s s

7 5 0

P r o g r a m t o s e r v ic em a g n e t i c d i s k

P r o g r a m t o s e r v ic eK e y b o a r d

P r o g r a m t o s e r v ic ec h a r a c t e r r e a d e r

P r o g r a m t o s e r v ic el i n e p r i n te r

D I S K

2 5 6

K B D

R D R

P T R

I / O s e r v ic e p r o g r a m s

Page 129: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• 11-6 Direct Memory Access (DMA)– DMA

• DMA controller takes over the buses to manage the transfer directly between the I/O device and memory (Bus Request/Grant)

Fig. 11-14

C P U

B R

B G

D B U S

W R

A B U S

R D

B u s r e q u e s t

B u s g r a n t

A d d r e s s b u s

W r i t e

R e a d

D a t a b u s H ig h - im p e d a n c e

( d is a b le )w h e n B G is

e n a b le d

D M AC o n t r o l le r

B R

B G

Page 130: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Transfer Modes• 1) Burst transfer :• 2) Cycle stealing transfer :

– DMA Controller ( Intel 8237 DMAC ) : Fig. 11-17

• DMA Initialization Process– 1) Set Address register :

» memory address for read/write– 2) Set Word count register :

» the number of words to transfer – 3) Set transfer mode :

» read/write, » burst/cycle stealing, » I/O to I/O, » I/O to Memory, » Memory to Memory» Memory search» I/O search

– 4) DMA transfer start : next section– 5) EOT (End of Transfer) :

» Interrupt

C o n t r o l lo g i c

C S

D a ta b u sb u ff e r s

C o n t r o l r e g is te r

D a ta b u s

D M A s e le c t

Inte

rnal bus

R S

I n te r r u p t

B G

B R

R D

W R

R e g is te r s e le c t

R e a d

W r i te

B u s r e q u e s t

B u s g r a n t

I n te r r u p t

A d d r e s s r e g is t e r

W o r d c o u n t r e g is te r

A d d r e s s b u sb u ff e r s

A d d r e s s b u s

D M A r e q u e s t

D M A A c k n o w le d g et o I / O d e v ic e

Page 131: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– DMA Transfer (I/O to Memory)• 1) I/O Device sends a DMA request• 2) DMAC activates the BR line• 3) CPU responds with BG line• 4) DMAC sends a DMA acknowledge to the I/O device• 5) I/O device puts a word in the data bus (for memory write)• 6) DMAC write a data to the address specified by Address register• 7) Decrement Word count register• 8) Word count register EOT interrupt CPU• 9) Word count register• DMAC checks the DMA request from I/O device

I / OP e r ip h e r a l

d e v ic e

D M A a c k n o w le d g e

A d d r e s ss e le c t

C P U

I n te r r u p t

A d d r e s s D a t a

B G

B R

R D W R

R a n d o m a c c e s sm e m o r y ( R A M )

A d d r e s s D a taR D W R

D i r e c t m e m o r y a c c e s s ( D A M )

c o n tr o l le r

I n te r r u p t

A d d r e s s D a t aR D W R

B G

R S

D S

B RD M A r e q u e s t

R e a d c o n tr o l

W r i te c o n t r o l

A d d r e s s b u s

D a ta b u s

Page 132: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• 11-7 Input-Output Processor (IOP)– IOP : Fig. 11-19

• Communicate directly with all I/O devices• Fetch and execute its own instruction

– IOP instructions are specifically designed to facilitate I/O transfer– DMAC must be set up entirely by the CPU

• Designed to handle the details of I/O processing

– Command• Instruction that are read form memory by an IOP

– Distinguish from instructions that are read by the CPU– Commands are prepared by experienced programmers and are stored in memory– Command word = IOP program

M e m o r y u n i t

C e n t r a l P ro c e s s in gu n i t ( C P U )

I n p u t - o u tp u tp r o c e s s o r ( I O P )

Mem

ory

bus

P D P DP DP D

P e r ip h e r a l d e v ic e s

I / O b u s

Page 133: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– CPU - IOP Communication : Fig. 11-20

• Memory units acts as a message center : Information

– each processor leaves information for the other

Message Center

IOP ProgramCPU Program

C P U o p e r a t io n s I O P o p e r a t io n s

S e n d in s tr u c t i o nto t e s t I O P p a t h T r a n s f e r s t a t u s w o r d

to m e m o r y lo c a t io n

I f s t a t u s O K . , s e n ds t a r t I / O in s t ru c t io n

t o I O PA c c e s s m e m o r y f o r

I O P p r o g r a m

C P U c o n t in u e s w i t ha n o t h e r p r o g r a m

C o n d u c t I / O t ra n s f e ru s in g D M A ; p re p a r e

s t a t u s r e p o r t

I / O t r a n s f e r c o m p le t e din te r ru p t C P U

R e q u e s t I O P s t a t u s

T r a n s f e r s t a t u s w o r dto m e m o r y lo c a t io n

C h e c k s t a t u s w o r df o r c o r r e c t t ra n s f e r

C o n t in u e

Page 134: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

• 11-8 Serial Communication– Difference between I/O Processor and Data Communication Processor

• I/O Processor– communicate with peripherals through a common I/O bus (data, address, control bus)

• Data Communication Processor– communicate with each terminal through a single pair of wires

– Modem ( = Data Sets, Acoustic Couplers )• Convert digital signals into audio tones to be transmitted over telephone lines• Various modulation schemes are used (FM, AM, PCM)

– Block transfer• An entire block of characters is transmitted in synchronous transmission• Transmitter sends one more character (error check) after the entire block is sent

– Error Check• LRC (Longitudinal Redundancy Check) : XOR• CRC (Cyclic Redundancy Check) : Polynomial

– 3 Transmission System• Simplex : one direction only• Half-duplex : both directions but only one direction at a time• Full-duplex : both directions simultaneously

Page 135: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

– Data Link• The communication lines, modems, and other equipment used in the transmission of

information between two or more stations

– Data Link Protocol• 1) Character-Oriented Protocol• 2) Bit-Oriented Protocol

– Character-Oriented Protocol• Message format for Character-Oriented Protocol : Fig. 11-25

– TEXT :– BCC : Block Check Character (LRC or CRC)

• ASCII Communication Control Character : Tab. 11-4– SYN (0010110) : Establishes synchronism– SOH (0000001) : Start of Header (address or control information)– STX (0000010) : Start of Text– ETX (0000011) : End of Text

• Transmission Example : Tab. 11-5, 11-6

S Y N S O HS Y N H e a d e r S T X T e x t B C CE T X

Page 136: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Assignment 4

Q1. Differentiate synchronous and asynchronous transmission.Q2. What is CAM? Q3. Give the block diagram of DMA controller. Why are the read and write control lines in a DMA controller bidirectional? Q4. Explain the working principle of I/O processors.Q5. Discuss the Programmed I/O method for controlling input output operations.

Page 137: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Tutorial 4

Q1. Differentiate synchronous and asynchronous communication.Q2. What are various asynchronous communication protocols explain.Q3. What is interrupt? Explain priority interrupt.Q4. What are various mode of data transfer?Q5. What do you mean by I/O processors?

Page 138: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Outcomes

After reading above topics students will be able to:

• Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

13816/08/19

Page 139: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

UNIT 5Pipelining & Speed Up

Concept

16/08/19Department of Computer Science &

Engineering139

COMPUTER ORGANIZATION

& ARCHITECTURE

Page 140: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Architectural Classification Flynn’s Feng’s

Pipelining Concepts Performance

Speed Laws Pipelining Hazards

16/08/19 140Department of Computer Science &

Engineering

Page 141: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Learning Objectives

The objectives of the following slide is to make student aware about the :

Architectural Classification Flynn’s Feng’s

Pipelining Concepts Performance

14116/08/19 Department of Computer Science & Engineering

Page 142: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Characterize Pipelines

1) Hardware or software implementation – pipelining can be implemented in either software or hardware.

2) Large or Small Scale – Stations in a pipeline can range from simplistic to powerful, and a pipeline can range in length from short to long.

3) Synchronous or asynchronous flow – A synchronous pipeline operates like an assembly line: at a given time, each station is processing some amount of information. A asynchronous pipeline, allow a station to forward information at any time.

4) Buffered or unbuffered flow – One stage of pipeline sends data directly to another one or a buffer is place between each pairs of stages.

5) Finite Chunks or Continuous Bit Streams – The digital information that passes though a pipeline can consist of a sequence or small data items or an arbitrarily long bit stream.

6) Automatic Data Feed Or Manual Data Feed – Some implementations of pipelines use a separate mechanism to move information, and other implementations require each stage to participate in moving information.

Page 143: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

What is Pipelining

• A technique used in advanced microprocessors where the microprocessor begins executing a second instruction before the first has been completed.

- A Pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.

• With pipelining, the computer architecture allows the next instructions to be fetched while the processor is performing arithmetic operations, holding them in a buffer close to the processor until each instruction operation can performed.

Page 144: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

How Pipelines Works

• The pipeline is divided into segments and each segment can execute it operation concurrently with the other segments. Once a segment completes an operations, it passes the result to the next segment in the pipeline and fetches the next operations from the preceding segment.

Page 145: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Example

Page 146: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302
Page 147: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Instructions Fetch

• The instruction Fetch (IF) stage is responsible for obtaining the requested instruction from memory. The instruction and the program counter (which is incremented to the next instruction) are stored in the IF/ID pipeline register as temporary storage so that may be used in the next stage at the start of the next clock cycle.

Page 148: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Instruction Decode

• The Instruction Decode (ID) stage is responsible for decoding the instruction and sending out the various control lines to the other parts of the processor. The instruction is sent to the control unit where it is decoded and the registers are fetched from the register file.

Page 149: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Execution

• The Execution (EX) stage is where any calculations are performed. The main component in this stage is the ALU. The ALU is made up of arithmetic, logic and capabilities.

Page 150: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Memory and IO

• The Memory and IO (MEM) stage is responsible for storing and loading values to and from memory. It also responsible for input or output from the processor. If the current instruction is not of Memory or IO type than the result from the ALU is passed through to the write back stage.

Page 151: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Write Back

• The Write Back (WB) stage is responsible for writing the result of a calculation, memory access or input into the register file.

Page 152: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Operation Timings

• Estimated timings for each of the stages: Instructio

n Fetch2ns

Instruction Decode

1ns

Execution 2ns

Memory and IO

2ns

Write Back

1ns

Page 153: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Advantages/Disadvantages

Advantages: • More efficient use of processor• Quicker time of execution of large

number of instructions

Disadvantages:• Pipelining involves adding hardware to

the chip• Inability to continuously run the pipeline

at full speed because of pipeline hazards

which disrupt the smooth execution of

the pipeline.

Page 154: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Pipeline Hazards

• Data Hazards – an instruction uses the result of the previous instruction. A hazard occurs exactly when an instruction tries to read a register in its ID stage that an earlier instruction intends to write in its WB stage.

• Control Hazards – the location of an instruction depends on previous instruction

• Structural Hazards – two instructions need to access the same resource

Page 155: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Data Hazards

Page 156: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Stalling

• Stalling involves halting the flow of instructions until the required result is ready to be used. However stalling wastes processor time by doing nothing while waiting for the result.

Page 157: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302
Page 158: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Type of Pipelining

• Software Pipelining 1) Can Handle Complex Instructions 2) Allows programs to be reused

• Hardware Pipelining 1) Help designer manage complexity – a complex task can be divided into smaller, more manageable pieces. 2) Hardware pipelining offers higher performance

Page 159: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Type of Hardware Pipelines

• Instruction Pipeline - An instruction pipeline is very similar to a manufacturing assembly line.

1st stage receives some parts, performs its assembly task, and passes the results to the second stage;

2nd stage takes the partially assembled product from the first stage, performs its task, and passes its work to the third stage;

3rd stage does its work, passing the results to the last stage, which completes the task and outputs its results.

• Data Pipeline – data pipeline is designed to pass data from stage to stage.

Page 160: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Instruction Pipelines Conflict

• It divided into two categories.– Data Conflicts– Branch Conflicts

• When the current instruction changes a register that the next one needed, data conflicts happens.

• When the current instruction make a jump, branch conflicts happens.

Page 161: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

ARCHITECTURAL CLASSIFICATION

• Flynn classification: (1966) is based on multiplicity of instruction streams and the data streams in computer systems.

• Feng’s classification: (1972) is based on serial versus parallel processing.

Page 162: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Flynn classification:• It Is based on multiplicity of instruction streams and the data

streams in computer systems. • The most popular taxonomy of computer architecture was

defined by Flynn. – Flynn’s classification scheme is based on the notion of a stream of

information. Two types of information flow into a processor: instructions and data.

– The instruction stream is defined as the sequence of instructions performed by the processing unit.

– The data stream is defined as the data traffic exchanged between the memory and the processing unit

• Computer architecture can be classified into the following four distinct categories: – single-instruction single-data streams (SISD); – single-instruction multiple-data streams (SIMD); – multiple-instruction single-data streams (MISD); and – multiple-instruction multiple-data streams (MIMD).

Page 163: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

FENG’S CLASSIFICATION •Tse-yun Feng suggested the use of degree of parallelism to classify various computer

architectures.

•The maximum number of binary digits that can be processed within a unit time by a

computer system is called the maximum parallelism degree P.

•A bit slice is a string of bits one from each of the words at the same vertical position.

•Under above classification

–Word Serial and Bit Serial (WSBS)

–Word Parallel and Bit Serial (WPBS)

–Word Serial and Bit Parallel(WSBP)

–Word Parallel and Bit Parallel (WPBP)  

•WSBS has been called bit parallel processing because one bit is processed at a time.

•WPBS has been called bit slice processing because m-bit slice is processes at a time.

•WSBP is found in most existing computers and has been called as Word Slice processing

because one word of n bit processed at a time.

•WPBP is known as fully parallel processing in which an array on n x m bits is processes at

one time.

Page 164: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Assignment 5

Q1. Explain pipeline concept with performance matrices. Q2. What do you mean by delayed branch?Q3. Explain Flynn’s and Feng’s classification.Q4. What is Amdahl's Law & Gustafson's Law for speed up.Q5. What are the various hazard in pipelining? How they can be resolved?

Page 165: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Tutorial 5

Q1. Differentiate linear and non linear pipeline processors.Q2. What are various pipeline conflicts?Q3. A non-pipeline system takes 50 ns to process a task. The same task can processed in a six-segment pipeline with a dock cycle of 10 ns. Determine the speedup ratio of the pipeline for 100 tasks. What is the maximum speedup that can achieved? Q4.Give an example of program that will cause data conflict in the three segment pipeline.Q5. Discuss various issues in instruction pipelining.

Page 166: COMPUTER ORGANIZATION & ARCHITECTURE RCS-302

Outcomes

After reading above topics students will be able to:

• Ability to Identify, formulates, review research literature

and analyze complex engineering problems reaching substantiated conclusions using first principles

mathematics, natural science and engineering science.

• Design solutions for complex engineering problems and

design system components or processes that meet the specified needs with appropriate consideration for the

public health and safety, and the cultural, societal, and environmental considerations.

16616/08/19