lectures for 3rd edition
![Page 1: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/1.jpg)
2004 Morgan Kaufmann Publishers
Lectures for 3rd Edition
Note: these lectures are often supplemented with other materials and also problems from the text worked out on the blackboard. You’ll want to customize these lectures for your class. The student audience for these lectures has had exposure to logic design and attends a hands-on assembly language programming lab that does not follow a typical lecture format.
![Page 2: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/2.jpg)
Chapter Three
![Page 3: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/3.jpg)
• Bits are just bits (no inherent meaning)
– conventions define relationship between bits and numbers
• Binary numbers (base 2)
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ...
decimal: 0 ... 2^n - 1
• Of course it gets more complicated:
– numbers are finite (overflow)
– fractions and real numbers
– negative numbers
e.g., no MIPS subi instruction; addi can add a negative number
• How do we represent negative numbers?
i.e., which bit patterns will represent which numbers?
Numbers
![Page 4: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/4.jpg)
Sign Magnitude    One's Complement    Two's Complement
000 = +0          000 = +0            000 = +0
001 = +1          001 = +1            001 = +1
010 = +2          010 = +2            010 = +2
011 = +3          011 = +3            011 = +3
100 = -0          100 = -3            100 = -4
101 = -1          101 = -2            101 = -3
110 = -2          110 = -1            110 = -2
111 = -3          111 = -0            111 = -1
• Issues: balance, number of zeros, ease of operations
• Which one is best? Why?
Possible Representations
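The three encodings in the table can be checked with a short sketch. The function names below are illustrative only (they do not come from the slides), and Python has no negative zero for integers, so the two "-0" patterns decode to plain 0.

```python
def sign_magnitude(bits, n=3):
    """Top bit is the sign; the remaining n-1 bits are the magnitude."""
    magnitude = bits & ((1 << (n - 1)) - 1)
    return -magnitude if bits >> (n - 1) else magnitude


def ones_complement(bits, n=3):
    """A negative value is stored as the bitwise inverse of its magnitude."""
    if bits >> (n - 1):
        return -(~bits & ((1 << n) - 1))
    return bits


def twos_complement(bits, n=3):
    """The top bit carries weight -2**(n-1)."""
    return bits - (1 << n) if bits >> (n - 1) else bits


for pattern in range(8):
    print(f"{pattern:03b}",
          sign_magnitude(pattern),
          ones_complement(pattern),
          twos_complement(pattern))
```

Only two's complement gives every bit pattern a distinct value, which bears on the "number of zeros" issue raised above.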
![Page 5: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/5.jpg)
• 32 bit signed numbers:
0000 0000 0000 0000 0000 0000 0000 0000_two = 0_ten
0000 0000 0000 0000 0000 0000 0000 0001_two = +1_ten
0000 0000 0000 0000 0000 0000 0000 0010_two = +2_ten
...
0111 1111 1111 1111 1111 1111 1111 1110_two = +2,147,483,646_ten
0111 1111 1111 1111 1111 1111 1111 1111_two = +2,147,483,647_ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000_two = –2,147,483,648_ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001_two = –2,147,483,647_ten
1000 0000 0000 0000 0000 0000 0000 0010_two = –2,147,483,646_ten
...
1111 1111 1111 1111 1111 1111 1111 1101_two = –3_ten
1111 1111 1111 1111 1111 1111 1111 1110_two = –2_ten
1111 1111 1111 1111 1111 1111 1111 1111_two = –1_ten
MIPS
![Page 6: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/6.jpg)
• Negating a two's complement number: invert all bits and add 1
– remember: “negate” and “invert” are quite different!
• Converting n bit numbers into numbers with more than n bits:
– MIPS 16 bit immediate gets converted to 32 bits for arithmetic
– copy the most significant bit (the sign bit) into the other bits
0010 -> 0000 0010
1010 -> 1111 1010
– "sign extension" (lbu vs. lb)
Two's Complement Operations
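A minimal sketch of the two operations on this slide, assuming bit patterns are held as non-negative Python integers. The helper names `negate` and `sign_extend` are made up for illustration.

```python
def negate(x, n):
    """Two's complement negation: invert all n bits, then add 1."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask


def sign_extend(x, from_bits, to_bits):
    """Copy the sign bit of a from_bits-wide pattern into the new upper bits."""
    if (x >> (from_bits - 1)) & 1:
        upper = ((1 << (to_bits - from_bits)) - 1) << from_bits
        return x | upper
    return x


print(f"{negate(0b0010, 4):04b}")          # prints 1110
print(f"{sign_extend(0b0010, 4, 8):08b}")  # prints 00000010
print(f"{sign_extend(0b1010, 4, 8):08b}")  # prints 11111010
```

Inverting alone would turn 0010 into 1101, which is why "negate" and "invert" are different operations.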
![Page 7: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/7.jpg)
• Just like in grade school (carry/borrow 1s)
      0111      0111      0110
    + 0110    - 0110    - 0101
• Two's complement operations easy
– subtraction using addition of negative numbers
      0111
    + 1010
• Overflow (result too large for finite computer word):
– e.g., adding two n-bit numbers does not yield an n-bit number
      0111
    + 0001     note that overflow term is somewhat misleading,
      1000     it does not mean a carry “overflowed”
Addition & Subtraction
![Page 8: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/8.jpg)
• No overflow when adding a positive and a negative number
• No overflow when signs are the same for subtraction
• Overflow occurs when the value affects the sign:
– overflow when adding two positives yields a negative
– or, adding two negatives gives a positive
– or, subtract a negative from a positive and get a negative
– or, subtract a positive from a negative and get a positive
• Consider the operations A + B, and A – B
– Can overflow occur if B is 0 ?
– Can overflow occur if A is 0 ?
Detecting Overflow
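The sign rules above can be written as a small sketch. It assumes operands are ordinary signed Python integers that already fit in n bits; `to_signed`, `add_overflows`, and `sub_overflows` are illustrative names, not MIPS or textbook definitions.

```python
def to_signed(x, n):
    """Interpret the low n bits of x as a two's complement value."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x


def add_overflows(a, b, n=32):
    """Overflow on a + b: operands share a sign that the result lacks."""
    result = to_signed(a + b, n)
    return (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)


def sub_overflows(a, b, n=32):
    """Overflow on a - b: operand signs differ and the result's sign differs from a's."""
    result = to_signed(a - b, n)
    return (a >= 0) != (b >= 0) and (result >= 0) != (a >= 0)


print(add_overflows(7, 1, n=4))    # 0111 + 0001 wraps to 1000
print(sub_overflows(3, -2, n=4))   # 3 - (-2) = 5 still fits in 4 bits
```

Trying the functions with A or B set to 0 is one way to explore the two questions on the slide.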
![Page 9: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/9.jpg)
• An exception (interrupt) occurs
– Control jumps to predefined address for exception
– Interrupted address is saved for possible resumption
• Details based on software system / language
– example: flight control vs. homework assignment
• Don't always want to detect overflow
– new MIPS instructions: addu, addiu, subu
note: addiu still sign-extends!
note: sltu, sltiu for unsigned comparisons
Effects of Overflow
![Page 10: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/10.jpg)
• More complicated than addition
– accomplished via shifting and addition
• More time and more area
• Let's look at 3 versions based on a grade-school algorithm
      0010   (multiplicand)
    x 1011   (multiplier)
• Negative numbers: convert and multiply
– there are better techniques, but we won’t look at them
Multiplication
![Page 11: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/11.jpg)
Multiplication: Implementation
[Figure: first-version multiplier datapath and control: 64-bit Multiplicand register (shift left), 64-bit ALU, 64-bit Product register (write), 32-bit Multiplier register (shift right), and control test]

Flowchart:
Start
1. Test Multiplier0
– Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
– Multiplier0 = 0: continue with step 2
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
– No: < 32 repetitions, go back to step 1
– Yes: 32 repetitions, Done
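The first-version algorithm can be sketched in software as follows. This is an illustration for unsigned operands only; the function name `multiply_v1` and the parameter `n` (register width, 32 on the slide) are assumptions made for the example.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Unsigned shift-and-add multiply following the three flowchart steps."""
    mask_2n = (1 << (2 * n)) - 1
    multiplicand &= (1 << n) - 1   # held in a 2n-bit register, initially the right half
    multiplier &= (1 << n) - 1
    product = 0
    for _ in range(n):                                   # n repetitions
        if multiplier & 1:                               # 1. test Multiplier0
            product = (product + multiplicand) & mask_2n  # 1a. add to product
        multiplicand = (multiplicand << 1) & mask_2n     # 2. shift multiplicand left
        multiplier >>= 1                                 # 3. shift multiplier right
    return product


print(multiply_v1(0b0010, 0b1011, n=4))  # prints 22
```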
![Page 12: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/12.jpg)
Final Version
[Figure: final-version multiplier datapath: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), and control test]

Flowchart:
Start
1. Test Product0
– Product0 = 1: What goes here?
– Product0 = 0: continue with step 3
3. Shift the Product register right 1 bit
32nd repetition?
– No: < 32 repetitions, go back to step 1
– Yes: 32 repetitions, Done
• Multiplier starts in right half of product
![Page 13: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/13.jpg)
Floating Point (a brief look)
• We need a way to represent
– numbers with fractions, e.g., 3.1416
– very small numbers, e.g., .000000001
– very large numbers, e.g., $3.15576 \times 10^{9}$
• Representation:
– sign, exponent, significand: $(-1)^{\text{sign}} \times \text{significand} \times 2^{\text{exponent}}$
– more bits for significand gives more accuracy
– more bits for exponent increases range
• IEEE 754 floating point standard:
– single precision: 8 bit exponent, 23 bit significand
– double precision: 11 bit exponent, 52 bit significand
![Page 14: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/14.jpg)
IEEE 754 floating-point standard
• Leading “1” bit of significand is implicit
• Exponent is “biased” to make sorting easier
– all 0s is smallest exponent all 1s is largest
– bias of 127 for single precision and 1023 for double precision
– summary: $(-1)^{\text{sign}} \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$
• Example:
– decimal: -.75 = - ( ½ + ¼ )
– binary: -.11 = -1.1 x 2^-1
– floating point: exponent = 126 = 01111110
– IEEE single precision: 10111111010000000000000000000000
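The example encoding can be reproduced with Python's standard `struct` module, which packs a float into IEEE 754 single precision. The helper name `float_to_bits` is made up for this sketch.

```python
import struct


def float_to_bits(value):
    """Return the 32-bit IEEE 754 single-precision pattern as a bit string."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    return f"{word:032b}"


bits = float_to_bits(-0.75)
sign, exponent, fraction = bits[0], bits[1:9], bits[9:]
print(sign, exponent, fraction)  # prints 1 01111110 10000000000000000000000

# Rebuild the value from the fields: (-1)^sign * (1 + fraction) * 2^(exponent - 127)
value = (-1) ** int(sign) * (1 + int(fraction, 2) / 2**23) * 2.0 ** (int(exponent, 2) - 127)
print(value)  # prints -0.75
```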
![Page 15: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/15.jpg)
Floating point addition
Flowchart:
Start
1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent
2. Add the significands
3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent
Overflow or underflow?
– Yes: Exception
– No: continue with step 4
4. Round the significand to the appropriate number of bits
Still normalized?
– No: go back to step 3
– Yes: Done

[Figure: floating-point adder datapath: small ALU computing the exponent difference, control, multiplexors selecting among the sign, exponent, and fraction fields of the two operands, shift right, big ALU, increment or decrement, shift left or right, rounding hardware, and the sign, exponent, and fraction of the result]
![Page 16: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/16.jpg)
Floating Point Complexities
• Operations are somewhat more complicated (see text)
• In addition to overflow we can have “underflow”
• Accuracy can be a big problem
– IEEE 754 keeps two extra bits, guard and round
– four rounding modes
– positive divided by zero yields “infinity”
– zero divided by zero yields “not a number”
– other complexities
• Implementing the standard can be tricky
• Not using the standard can be even worse
– see text for description of 80x86 and Pentium bug!
![Page 17: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/17.jpg)
Chapter Three Summary
• Computer arithmetic is constrained by limited precision
• Bit patterns have no inherent meaning but standards do exist
– two’s complement
– IEEE 754 floating point
• Computer instructions determine “meaning” of the bit patterns
• Performance and accuracy are important so there are many complexities in real machines
• Algorithm choice is important and may lead to hardware optimizations for both space and time (e.g., multiplication)
• You may want to look back (Section 3.10 is great reading!)
![Page 18: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/18.jpg)
Let's Build a Processor
• Almost ready to move into chapter 5 and start building a processor
• First, let’s review Boolean Logic and build the ALU we’ll need (Material from Appendix B)
[Figure: ALU symbol with 32-bit inputs a and b, an operation control input, and a 32-bit result]
![Page 19: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/19.jpg)
• Problem: Consider a logic function with three inputs: A, B, and C.
Output D is true if at least one input is true
Output E is true if exactly two inputs are true
Output F is true only if all three inputs are true
• Show the truth table for these three functions.
• Show the Boolean equations for these three functions.
• Show an implementation consisting of inverters, AND, and OR gates.
Review: Boolean Algebra & Gates
![Page 20: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/20.jpg)
• Let's build an ALU to support the andi and ori instructions
– we'll just build a 1 bit ALU, and use 32 of them
• Possible Implementation (sum-of-products):
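[Figure: 1-bit ALU for AND and OR with inputs a, b, and operation and output result, next to a truth table with columns op, a, b, res]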
An ALU (arithmetic logic unit)
![Page 21: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/21.jpg)
• Selects one of the inputs to be the output, based on a control input
• Let's build our ALU using a MUX:
[Figure: multiplexor with data inputs A and B (labeled 0 and 1), select input S, and output C]
Review: The Multiplexor
note: we call this a 2-input mux even though it has 3 inputs!
![Page 22: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/22.jpg)
• Not easy to decide the “best” way to build something
– Don't want too many inputs to a single gate
– Don’t want to have to go through too many gates
– for our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
• How could we build a 1-bit ALU for add, and, and or?
• How could we build a 32-bit ALU?
Different Implementations
cout = a b + a cin + b cin
sum = a xor b xor cin
[Figure: 1-bit adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
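The two equations for the 1-bit adder translate directly into code, and chaining the carries gives one possible answer to the 32-bit question (a ripple-carry arrangement). The names `full_adder` and `ripple_add` are chosen for this sketch only.

```python
def full_adder(a, b, cin):
    """1-bit adder: sum = a xor b xor cin, cout = a b + a cin + b cin."""
    total = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return total, cout


def ripple_add(a, b, n=32, cin=0):
    """Chain n 1-bit adders; each CarryOut feeds the next CarryIn."""
    result, carry = 0, cin
    for i in range(n):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry


print(ripple_add(0b0111, 0b0110, n=4))  # prints (13, 0)
```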
![Page 23: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/23.jpg)
Building a 32 bit ALU
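[Figure: left, a 1-bit ALU with inputs a, b, and CarryIn, in which Operation selects Result from multiplexor inputs 0, 1, and 2, plus a CarryOut output; right, a 32-bit ALU built from ALU0 through ALU31 with inputs a0..a31 and b0..b31, outputs Result0..Result31, a shared Operation input, and each CarryOut connected to the next CarryIn]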
![Page 24: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/24.jpg)
• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution:
What about subtraction (a – b) ?
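[Figure: 1-bit ALU with a Binvert control that selects b or its inverse through a multiplexor (inputs 0 and 1); Operation selects Result from inputs 0, 1, and 2; CarryIn and CarryOut as before]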
![Page 25: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/25.jpg)
Adding a NOR function
• Can also choose to invert a. How do we get “a NOR b” ?
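[Figure: 1-bit ALU with Ainvert and Binvert controls, each selecting an input or its inverse through a multiplexor (inputs 0 and 1); an adder with CarryIn and CarryOut; Operation selects Result from inputs 0, 1, and 2]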
![Page 26: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/26.jpg)
• Need to support the set-on-less-than instruction (slt)
– remember: slt is an arithmetic instruction
– produces a 1 if rs < rt and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, Label)
– use subtraction: (a-b) = 0 implies a = b
Tailoring the ALU to the MIPS
![Page 27: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/27.jpg)
Supporting slt
• Can we figure out the idea?
[Figure: two 1-bit ALUs, each with Ainvert, Binvert, CarryIn, Operation, and a Less input on multiplexor input 3. One version has a CarryOut output; the other adds overflow detection with Set and Overflow outputs. Caption: use this ALU for most significant bit / all other bits]
![Page 28: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/28.jpg)
[Figure: 32-bit ALU built from ALU0 through ALU31 with inputs a0..a31 and b0..b31, outputs Result0..Result31, shared Ainvert, Binvert, and Operation lines, chained CarryIn/CarryOut, Less inputs tied to 0 on the upper bits, and Set and Overflow outputs on ALU31]
Supporting slt
![Page 29: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/29.jpg)
Test for equality
• Notice control lines:
0000 = and
0001 = or
0010 = add
0110 = subtract
0111 = slt
1100 = NOR
• Note: Zero is a 1 when the result is zero!
[Figure: 32-bit ALU built from ALU0 through ALU31 with inputs a0..a31 and b0..b31, outputs Result0..Result31, shared Ainvert, Bnegate, and Operation lines, chained CarryIn/CarryOut, Less inputs tied to 0 on the upper bits, Set and Overflow outputs on ALU31, and a Zero output derived from all result bits]
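A behavioral sketch of the control encoding listed above. It assumes the four control bits are, from left to right, Ainvert, Bnegate, and a 2-bit Operation, which matches the six codes on the slide. The function name `alu` is illustrative, and the slt result here is corrected for overflow, which is an assumption rather than something the slide states.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def alu(control, a, b):
    """Return (result, zero, overflow) for a 4-bit control value."""
    ainvert = (control >> 3) & 1
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    a &= MASK
    b &= MASK
    if ainvert:
        a ^= MASK
    if bnegate:
        b ^= MASK
    # Bnegate also supplies the CarryIn of bit 0, so a + ~b + 1 = a - b.
    sum32 = (a + b + bnegate) & MASK
    # Adder overflow: both adder inputs share a sign that the sum lacks.
    # Only meaningful for add and subtract.
    overflow = (((a ^ sum32) & (b ^ sum32)) >> 31) & 1
    if operation == 0:
        result = a & b
    elif operation == 1:
        result = a | b
    elif operation == 2:
        result = sum32
    else:
        # slt: sign bit of (a - b), flipped when the subtraction overflowed.
        result = (sum32 >> 31) ^ overflow
    zero = int(result == 0)
    return result, zero, overflow


print(alu(0b0110, 7, 6))  # subtract: prints (1, 0, 0)
print(alu(0b0111, 3, 5))  # slt: prints (1, 0, 0)
```

NOR falls out of the same datapath: with both inputs inverted, the AND path computes (not a) AND (not b), which equals a NOR b.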
![Page 30: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/30.jpg)
Conclusion
• We can build an ALU to support the MIPS instruction set
– key idea: use multiplexor to select the output we want
– we can efficiently perform subtraction using two’s complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– all of the gates are always working
– the speed of a gate is affected by the number of inputs to the gate
– the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus: comprehension, however,
– Clever changes to organization can improve performance (similar to using better algorithms in software)
– We saw this in multiplication, let’s look at addition now
![Page 31: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/31.jpg)
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1        c2 = ______
c3 = b2c2 + a2c2 + a2b2        c3 = ______
c4 = b3c3 + a3c3 + a3b3        c4 = ______
Not feasible! Why?
Problem: ripple carry adder is slow
![Page 32: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/32.jpg)
• An approach in-between our two extremes
• Motivation:
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry? gi = ai bi
– When would we propagate the carry? pi = ai + bi
• Did we get rid of the ripple?
c1 = g0 + p0c0
c2 = g1 + p1c1        c2 = ______
c3 = g2 + p2c2        c3 = ______
c4 = g3 + p3c3        c4 = ______
Feasible! Why?
Carry-lookahead adder
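One way to see why this is feasible is to substitute each carry into the next, so every carry depends only on the g and p signals and c0. The sketch below does that for a 4-bit adder; the function name `lookahead_carries` is an assumption for illustration, and the expanded expressions are derived by substitution from the equations above.

```python
def lookahead_carries(a, b, c0=0):
    """Return [c1, c2, c3, c4] for 4-bit operands using generate/propagate terms."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # gi = ai bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # pi = ai + bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


print(lookahead_carries(0b0111, 0b0001))  # prints [1, 1, 1, 0]
```

Each carry is now a two-level AND/OR expression, so no carry has to wait for the one before it.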
![Page 33: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/33.jpg)
• Can’t build a 16 bit adder this way... (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!
Use principle to build bigger adders
[Figure: 16-bit adder built from four 4-bit ALUs (ALU0 to ALU3, producing Result0–3, Result4–7, Result8–11, and Result12–15). Each ALU sends its P and G signals (P0..P3, G0..G3) to a carry-lookahead unit, which returns the carries C1, C2, C3 to the CarryIn inputs and produces C4 as CarryOut]
![Page 34: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/34.jpg)
ALU Summary
• We can build an ALU to support MIPS addition
• Our focus is on comprehension, not performance
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
![Page 35: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/35.jpg)
Chapter 4
![Page 36: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/36.jpg)
• Measure, Report, and Summarize
• Make intelligent choices
• See through the marketing hype
• Key to understanding underlying organizational motivation
Why is some hardware better than others for different programs?
What factors of system performance are hardware related?(e.g., Do we need a new machine, or a new operating system?)
How does the machine's instruction set affect performance?
Performance
![Page 37: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/37.jpg)
Which of these airplanes has the best performance?
Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100      101          630          598
Boeing 747          470          4150         610
BAC/Sud Concorde    132          4000         1350
Douglas DC-8-50     146          8720         544
• How much faster is the Concorde compared to the 747?
• How much bigger is the 747 than the Douglas DC-8?
![Page 38: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/38.jpg)
• Response Time (latency)
— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
• Throughput
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?
• If we upgrade a machine with a new processor what do we increase?
• If we add a new machine to the lab what do we increase?
Computer Performance: TIME, TIME, TIME
![Page 39: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/39.jpg)
• Elapsed Time
– counts everything (disk and memory accesses, I/O, etc.)
– a useful number, but often not good for comparison purposes
• CPU time
– doesn't count I/O or time spent running other programs
– can be broken up into system time, and user time
• Our focus: user CPU time
– time spent executing the lines of code that are "in" our program
Execution Time
![Page 40: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/40.jpg)
• For some program running on machine X,
$\text{Performance}_X = 1 / \text{Execution time}_X$
• "X is n times faster than Y"
$\text{Performance}_X / \text{Performance}_Y = n$
• Problem:
– machine A runs a program in 20 seconds
– machine B runs the same program in 25 seconds
Book's Definition of Performance
![Page 41: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/41.jpg)
Clock Cycles
• Instead of reporting execution time in seconds, we often use cycles
• Clock “ticks” indicate when to start activities (one abstraction):
• cycle time = time between ticks = seconds per cycle
• clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec)
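$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

[Figure: clock ticks along a time axis]

A 4 GHz clock has a cycle time of

$$\frac{1}{4 \times 10^{9}} \times 10^{12} = 250 \text{ picoseconds (ps)}$$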
![Page 42: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/42.jpg)
So, to improve performance (everything else being equal) you can either
(increase or decrease?)
________ the # of required cycles for a program, or
________ the clock cycle time or, said another way,
________ the clock rate.
How to Improve Performance
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
![Page 43: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/43.jpg)
• Could assume that number of cycles equals number of instructions
This assumption is incorrect: different instructions take different amounts of time on different machines.
Why? hint: remember that these are machine instructions, not lines of C code
[Figure: timeline of the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions along the time axis]
How many cycles are required for a program?
![Page 44: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/44.jpg)
• Multiplication takes more time than addition
• Floating point operations take longer than integer ones
• Accessing memory takes more time than accessing registers
• Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)
Different numbers of cycles for different instructions
![Page 45: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/45.jpg)
• "Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?"
• Don't panic: we can easily work this out from basic principles
Example
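A sketch of the "basic principles" calculation, using only seconds = cycles / clock rate. The variable names are illustrative.

```python
time_a = 10.0        # seconds for the program on machine A
clock_rate_a = 4e9   # machine A clock rate in Hz (4 GHz)
time_b = 6.0         # target seconds on machine B
cycle_factor = 1.2   # machine B needs 1.2 times as many cycles

cycles_a = time_a * clock_rate_a   # cycles = seconds * cycles per second
cycles_b = cycle_factor * cycles_a
clock_rate_b = cycles_b / time_b   # cycles per second needed on machine B

print(f"Target clock rate: {clock_rate_b / 1e9:.1f} GHz")  # prints Target clock rate: 8.0 GHz
```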
![Page 46: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/46.jpg)
• A given program will require
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds
• We have a vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– CPI (cycles per instruction)
a floating point intensive application might have a higher CPI
– MIPS (millions of instructions per second)
this would be higher for a program using simple instructions
Now that we understand cycles
![Page 47: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/47.jpg)
Performance
• Performance is determined by execution time
• Do any of the other variables equal performance?
– # of cycles to execute program?
– # of instructions in program?
– # of cycles per second?
– average # of cycles per instruction?
– average # of instructions per second?
• Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.
![Page 48: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/48.jpg)
• Suppose we have two implementations of the same instruction set architecture (ISA).
For some program,
Machine A has a clock cycle time of 250 ps and a CPI of 2.0
Machine B has a clock cycle time of 500 ps and a CPI of 1.2
What machine is faster for this program, and by how much?
• If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
CPI Example
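Because both machines run the same instruction count for this program, comparing time per instruction (CPI times cycle time) is enough. A minimal sketch, with illustrative names:

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average picoseconds per instruction = cycles/instruction * ps/cycle."""
    return cpi * cycle_time_ps


machine_a = time_per_instruction_ps(2.0, 250)
machine_b = time_per_instruction_ps(1.2, 500)

print(f"A: {machine_a:.0f} ps, B: {machine_b:.0f} ps")  # prints A: 500 ps, B: 600 ps
print(f"A is {machine_b / machine_a:.1f} times faster than B")  # prints A is 1.2 times faster than B
```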
![Page 49: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/49.jpg)
• A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
Which sequence will be faster? How much?
What is the CPI for each sequence?
# of Instructions Example
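The cycle counts and CPI for the two sequences can be tallied as below. The dictionary layout and the name `cycles_and_cpi` are choices made for this sketch.

```python
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def cycles_and_cpi(counts):
    """Return (total cycles, average cycles per instruction) for a class mix."""
    instructions = sum(counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles, cycles / instructions


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

print(cycles_and_cpi(sequence_1))  # prints (10, 2.0)
print(cycles_and_cpi(sequence_2))  # prints (9, 1.5)
```

The second sequence has more instructions but fewer cycles, so instruction count alone does not decide which is faster.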
![Page 50: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/50.jpg)
• Two different compilers are being tested for a 4 GHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.
The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?
MIPS example
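A sketch that computes both metrics for the two compilers, assuming MIPS means instruction count divided by (execution time times 10^6). The function name `summarize` is illustrative.

```python
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 4e9  # 4 GHz machine


def summarize(counts):
    """Return (execution time in seconds, MIPS rating) for an instruction mix."""
    instructions = sum(counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    seconds = cycles / CLOCK_RATE_HZ
    mips = instructions / seconds / 1e6
    return seconds, mips


compiler_1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
compiler_2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

for name, mix in (("compiler 1", compiler_1), ("compiler 2", compiler_2)):
    seconds, mips = summarize(mix)
    print(f"{name}: {seconds * 1e3:.2f} ms, {mips:.0f} MIPS")
```

Under these assumptions the second compiler's code earns the higher MIPS rating while taking longer to run, which is the pitfall the slide is pointing at.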
![Page 51: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/51.jpg)
• Performance best determined by running a real application
– Use programs typical of expected workload
– Or, typical of expected class of applications
e.g., compilers/editors, scientific applications, graphics, etc.
• Small benchmarks
– nice for architects and designers
– easy to standardize
– can be abused
• SPEC (System Performance Evaluation Cooperative)
– companies have agreed on a set of real programs and inputs
– valuable indicator of performance (and compiler technology)
– can still be abused
Benchmarks
![Page 52: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/52.jpg)
Benchmark Games
• An embarrassed Intel Corp. acknowledged Friday that a bug in a software program known as a compiler had led the company to overstate the speed of its microprocessor chips on an industry benchmark by 10 percent. However, industry analysts said the coding error…was a sad commentary on a common industry practice of “cheating” on standardized performance tests…The error was pointed out to Intel two days ago by a competitor, Motorola …came in a test known as SPECint92…Intel acknowledged that it had “optimized” its compiler to improve its test scores. The company had also said that it did not like the practice but felt compelled to make the optimizations because its competitors were doing the same thing…At the heart of Intel’s problem is the practice of “tuning” compiler programs to recognize certain computing problems in the test and then substituting special handwritten pieces of code…
Saturday, January 6, 1996 New York Times
![Page 53: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/53.jpg)
SPEC ‘89
• Compiler “enhancements” and performance
[Figure: bar chart of SPEC performance ratio (axis 0 to 800) by benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing the compiler with the enhanced compiler]
![Page 54: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/54.jpg)
SPEC CPU2000
![Page 55: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/55.jpg)
SPEC 2000
Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?
[Figure: CINT2000 and CFP2000 ratings (axis 0 to 1400) versus clock rate in MHz (500 to 3500) for the Pentium III and Pentium 4]
[Figure: SPECINT2000 and SPECFP2000 results (axis 0.0 to 1.6) by benchmark and power mode (always on/maximum clock, laptop mode/adaptive clock, minimum power/minimum clock) for the Pentium M @ 1.6/0.6 GHz, Pentium 4-M @ 2.4/1.2 GHz, and Pentium III-M @ 1.2/0.8 GHz]
![Page 56: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/56.jpg)
Experiment
• Phone a major computer retailer and tell them you are having trouble deciding between two different computers; specifically, you are confused about the processors' strengths and weaknesses
(e.g., Pentium 4 at 2 GHz vs. Celeron M at 1.4 GHz)
• What kind of response are you likely to get?
• What kind of response could you give a friend with the same question?
![Page 57: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/57.jpg)
Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
• Example:
"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
How about making it 5 times faster?
• Principle: Make the common case fast
Amdahl's Law
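The formula can be rearranged to solve for the required improvement. The sketch below does so for the multiply example; `required_improvement` is a hypothetical helper that returns None when no finite improvement reaches the target.

```python
def time_after(unaffected, affected, improvement):
    """Execution time after improvement, per the formula above."""
    return unaffected + affected / improvement


def required_improvement(total, affected, target_speedup):
    """Improvement factor needed on the affected part, or None if unreachable."""
    unaffected = total - affected
    remaining = total / target_speedup - unaffected  # time left for the affected part
    if remaining <= 0:
        return None
    return affected / remaining


print(required_improvement(100, 80, 4))  # prints 16.0
print(required_improvement(100, 80, 5))  # prints None
```

With 20 seconds unaffected, the program can never run in 20 seconds or less, so a 5 times speedup is out of reach however fast multiply becomes.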
![Page 58: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/58.jpg)
• Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?
• We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
Example
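Both questions follow from the same Amdahl's Law relation. A sketch, with illustrative function names, where the second function solves total / target = (total - x) + x / improvement for the affected time x:

```python
def speedup(total, affected, improvement):
    """Overall speedup when `affected` seconds run `improvement` times faster."""
    return total / ((total - affected) + affected / improvement)


def affected_time_needed(total, improvement, target_speedup):
    """Seconds that must be affected to reach the target overall speedup."""
    return (total - total / target_speedup) / (1 - 1 / improvement)


print(f"{speedup(10, 5, 5):.2f}")                 # prints 1.67
print(f"{affected_time_needed(100, 5, 3):.1f}")   # prints 83.3
```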
![Page 59: Lectures for 3rd Edition](https://reader035.vdocuments.net/reader035/viewer/2022062721/568136c6550346895d9e61de/html5/thumbnails/59.jpg)
• Performance is specific to a particular program or set of programs
– Total execution time is a consistent summary of performance
• For a given architecture, performance increases come from:
– increases in clock rate (without adverse CPI effects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count
– Algorithm/Language choices that affect instruction count
• Pitfall: expecting improvement in one aspect of a machine’s
performance to affect the total performance
Remember