OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Upload: lucinda-hines

Post on 13-Jan-2016


Page 1: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

OVERVIEW & COMPUTER

PERFORMANCE

SCR1043 - Module 1 1

Page 2: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

- Organization and Architecture
- Structure and Function

Reference: William Stallings – Computer Organization & Architecture


Page 3: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Computer Architecture refers to those attributes of a system that are visible to the programmer. Examples:
- the instruction set
- the number of bits used to represent various data types
- I/O mechanisms
- memory addressing techniques

Computer Organization refers to how those features are implemented. Examples:
- control signals
- interfaces between the computer and peripherals
- the memory technology being used

So, for example, the fact that a multiply instruction is available is a computer architecture issue. How that multiply is implemented is a computer organization issue.
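As a rough illustration of this distinction (not taken from the slides), the sketch below gives one multiply operation two interchangeable implementations. The function names `multiply_repeated_add` and `multiply_shift_add` are assumptions made for this example only. A caller sees the same result from either one, which mirrors how one architecture can have several organizations.

```python
def multiply_repeated_add(a: int, b: int) -> int:
    """Multiply non-negative integers by adding a to itself b times."""
    result = 0
    for _ in range(b):
        result += a
    return result


def multiply_shift_add(a: int, b: int) -> int:
    """Multiply non-negative integers with the shift-and-add method."""
    result = 0
    while b > 0:
        if b & 1:        # lowest bit of the multiplier is set
            result += a  # add the current shifted multiplicand
        a <<= 1          # shift the multiplicand left by one bit
        b >>= 1          # move on to the next multiplier bit
    return result


# Same visible behaviour (architecture), different internals (organization).
print(multiply_repeated_add(6, 7), multiply_shift_add(6, 7))
```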


Page 4: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Many computer manufacturers offer a family of computer models, all with the same architecture but with differences in organization.

All members of the Intel x86 family share the same basic architecture.

The IBM System/370 architecture, first introduced in 1970, included a number of models that share the same basic architecture, and it has survived to this day as the architecture of IBM’s mainframe product line.

The newer models retained the same architecture so that the customer’s software investment was protected (code compatibility).

Page 5: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

A computer is a complex system: a hierarchy of interrelated subsystems at different levels.

At each level, the designer is concerned with structure and function:
- Structure: the way in which the components are interrelated.
- Function: the operation of each individual component as part of the structure.

The computer system in this course will be described from the top down, instead of bottom-up.


Page 6: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Four main structural components:
- Central processing unit (CPU): Controls the operation of the computer and performs its data processing functions. Its major structural components are:
  - Control unit: Controls the operation of the CPU
  - Arithmetic and logic unit (ALU): Performs the computer’s data processing functions
  - Registers: Provide storage internal to the CPU
  - CPU interconnection: Some mechanism that provides for communication among the control unit, ALU, and registers
- Main memory: Stores data
- I/O: Moves data between the computer and its external environment
- System interconnection: Some mechanism that provides for communication among CPU, main memory, and I/O


Page 7: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


[Figure: top-level structure of the computer. Labels: Central Processing Unit, Main Memory, Input/Output, Systems Interconnection, Peripherals, Communication lines]

Page 8: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


[Figure: structure of the CPU. Computer-level labels: CPU, I/O, Memory, System Bus. CPU-level labels: Arithmetic and Logic Unit, Control Unit, Registers, Internal CPU Interconnection]

Page 9: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


[Figure: structure of the control unit. CPU-level labels: Control Unit, ALU, Registers, Internal Bus. Control-unit-level labels: Sequencing Logic, Control Unit Registers and Decoders, Control Memory]

Page 10: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

A computer performs only four basic functions:
- Data processing: process data in a variety of forms and requirements
- Data storage: short-term and long-term data storage for retrieval and update
- Data movement: move data between the computer and the outside world
- Control: control the processing, movement, and storage of data using instructions

How are these functions performed? Through a PROGRAM.


Page 11: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

A program is a sequence of steps:
- For each step, a computer function is executed
- For each operation, a different/new set of control signals is needed
- For each operation, a unique code (instruction) is provided, e.g. ADD, MOVE
- A hardware segment accepts the code and issues the control signals


Page 12: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Approach 1: Hardwired program
- Connect/combine various logic components to store data and perform arithmetic and logic operations
- Hardwired systems are inflexible


Page 13: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Approach 2: Software
- General-purpose hardware can do different tasks, given the correct control signals
- Instead of re-wiring, supply a new set of control signals through instruction codes
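The software approach can be sketched in a few lines. This is only a toy model under stated assumptions: the `OPERATIONS` table and the `execute` function are hypothetical names, and the opcodes SUB and AND are added for illustration (the slides mention only ADD and MOVE as examples). The point is that the same general-purpose function does different work when it is handed a different sequence of instruction codes.

```python
# Hypothetical instruction codes mapped to the operation each one selects.
OPERATIONS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
}


def execute(opcode: str, a: int, b: int) -> int:
    """Decode an instruction code and apply the operation it selects."""
    if opcode not in OPERATIONS:
        raise ValueError(f"unknown instruction code: {opcode}")
    return OPERATIONS[opcode](a, b)


# Changing the program changes the task; the "hardware" (execute) is unchanged.
program = [("ADD", 2, 3), ("SUB", 9, 4), ("AND", 6, 3)]
results = [execute(op, a, b) for op, a, b in program]
print(results)
```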


Page 14: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

- A Brief History of Computers

- Designing for Performance

- Pentium and PowerPC Evolution

Reference: William Stallings – Computer Organization & Architecture


Page 15: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 16: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

1943-1946: ENIAC (Electronic Numerical Integrator And Computer)
- First general-purpose computer
- Designed by Mauchly and Eckert
- Designed to create ballistics tables for WWII, but completed too late; it helped determine H-bomb feasibility instead. General purpose!
- 30 tons + 15,000 sq. ft. + 18,000 vacuum tubes + 140 kW = 5,000 additions/sec


Page 17: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 18: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

1945: stored-program concept first proposed for the EDVAC (Electronic Discrete Variable Computer).

Key concepts:

Data and instructions are stored in a single read-write memory.

The contents of this memory are addressable by location, without regard to the type of data contained there

Execution occurs in a sequential fashion from one instruction to the next
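The three key concepts can be illustrated with a toy stored-program machine. This is a minimal sketch, not a model of the EDVAC or any real machine: the opcodes LOAD, ADD, STORE and HALT, the single accumulator, and the function name `run` are all assumptions made for this example. Instructions and data share one memory, cells are reached by address alone, and execution steps from one instruction to the next.

```python
def run(memory: list) -> list:
    """Fetch and execute instructions held in the same memory as the data."""
    pc = 0   # program counter: address of the next instruction
    acc = 0  # accumulator register
    while True:
        opcode, address = memory[pc]  # fetch by location
        pc += 1                       # continue with the next instruction
        if opcode == "LOAD":
            acc = memory[address]
        elif opcode == "ADD":
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


memory = [
    ("LOAD", 4),   # address 0: instruction
    ("ADD", 5),    # address 1: instruction
    ("STORE", 6),  # address 2: instruction
    ("HALT", 0),   # address 3: instruction
    20,            # address 4: data
    22,            # address 5: data
    0,             # address 6: the result is stored here
]
run(memory)
print(memory[6])
```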


Page 19: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 20: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

The IAS computer is the prototype for all subsequent general-purpose computers. With rare exceptions, all of today’s computers have this same general structure and are referred to as von Neumann machines.

The general IAS structure consists of:
- A main memory, which stores both data and instructions
- An ALU capable of operating on binary data
- A control unit, which interprets the instructions in memory and causes them to be executed
- I/O equipment operated by the control unit


Page 21: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 22: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

1950: UNIVAC – commissioned by Census Bureau for 1950 calculations

Late 1950’s: UNIVAC II
- Greater memory and higher performance
- Same basic architecture as UNIVAC
- First example of upward compatibility

1953: IBM 701 – primarily for science

1955: IBM 702 – primarily for business


Page 23: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

1947: Transistor developed at Bell Labs
- Introduction of more complex ALU and control units
- High-level programming languages

The data channel – an independent I/O module with its own processor and instruction set

The multiplexor – a central termination point for data channels, CPU, and memory. Precursor to idea of data bus.

DEC (Digital Equipment Corporation), founded in 1957, delivered its first computer, the PDP-1, which began the minicomputer phenomenon.


Page 24: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 25: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

1958: Integrated circuit developed

1964: Introduction of IBM System/360

First planned family of computer products. Characteristics of a family:

- Similar or identical instruction set and operating system
- Increasing speed
- Increasing number of I/O ports
- Increasing memory size
- Increasing cost

Different models could all run the same software, but with different price/performance.


Page 26: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Microelectronics: literally “small electronics”

A computer is made up of gates, memory cells and interconnections

These can be manufactured on a semiconductor, e.g. a silicon wafer

Page 27: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

With microelectronics, the density of components on a chip keeps increasing.

Moore’s law, from Gordon Moore, co-founder of Intel:
- The number of transistors on a chip will double every year
- Since the 1970’s development has slowed a little, giving a modified law: the number of transistors on a chip doubles every 18 months
- Therefore, more circuits can be packed on the same size chip

Consequences:
- Higher packing density means shorter electrical paths, giving higher performance
- Smaller size gives increased flexibility
- Reduced power and cooling requirements
- Fewer interconnections increase reliability
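To see how much the doubling period matters, the sketch below projects a transistor count under both versions of the law. The starting count of 1,000 and the 6-year horizon are hypothetical values chosen only to make the arithmetic easy to follow; the function name `projected_transistors` is likewise an assumption.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_period_years: float) -> float:
    """Project a count that doubles once per doubling period."""
    return start_count * 2 ** (years / doubling_period_years)


START = 1_000  # hypothetical starting transistor count

yearly = projected_transistors(START, 6, 1.0)  # doubling every year
slower = projected_transistors(START, 6, 1.5)  # doubling every 18 months
print(yearly, slower)
```

Over the same 6 years, yearly doubling gives six doublings while 18-month doubling gives four, so the two projections differ by a factor of 4.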


Page 28: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


[Figure: Moore’s law plot with curves labelled “Moore prediction” and “Actual”]

Page 29: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

IBM System/360 (1964)
- Replaced (and was not compatible with) the 7000 series
- First planned “family” of computers
  - Similar or identical instruction sets
  - Similar or identical O/S
  - Increasing speed
  - Increasing number of I/O ports (i.e. more terminals)
  - Increased memory size
  - Increased cost


Page 30: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 31: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

1964: First PDP-8 shipped
- First minicomputer
- Started OEM market
- Introduced the bus structure
- Did not need an air-conditioned room
- Small enough to sit on a lab bench
- $16,000 compared to $100k++ for IBM 360


Page 32: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Semiconductor memory
- Replaced bulky core memory
- Goes through its own generations in size, increasing by a factor of 4 each time: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M on a single chip, with declining cost and access time

Microprocessor and personal computers

Distributed computing

Larger and larger scales of integration


Page 33: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 34: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Microprocessor : all CPU components on a single chip

1971 - 4004
- First microprocessor
- 4 bit

Followed in 1972 by 8008
- 8 bit
- Both designed for specific applications

1974 - 8080
- Intel’s first general purpose microprocessor
- Designed to be the CPU of a general purpose microcomputer


Page 35: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

8080
- First general purpose microprocessor
- 8 bit data path
- Used in the first personal computer, the Altair

8086
- Much more powerful
- 16 bit
- Instruction cache that prefetches a few instructions
- 8088 (8 bit external bus) used in the first IBM PC

80286
- 16 MB of memory addressable

80386
- First 32 bit design
- Support for multitasking: run multiple programs at the same time


Page 36: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

80486
- Sophisticated, powerful cache and instruction pipelining
- Built-in maths co-processor

Pentium
- Superscalar technique: multiple instructions executed in parallel

Pentium Pro
- Increased superscalar organization
- Aggressive register renaming
- Branch prediction
- Data flow analysis
- Speculative execution


Page 37: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Pentium II
- MMX technology
- Graphics, video & audio processing

Pentium III
- Additional floating point instructions for 3D graphics

Pentium 4
- Further floating point and multimedia enhancements

Itanium
- 64 bit

Core Duo
- Start of multicore processors


Page 38: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1
Page 39: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1
Page 40: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

1975, 801 minicomputer project (IBM): RISC

Berkeley RISC I processor

1986, IBM commercial RISC workstation product, RT PC
- Not a commercial success
- Many rivals with comparable or better performance

1990, IBM RISC System/6000
- RISC-like superscalar machine
- POWER architecture

IBM alliance with Motorola (68000 microprocessors) and Apple (used 68000 in Macintosh)

Result is the PowerPC architecture
- Derived from the POWER architecture
- Superscalar RISC
- Apple Macintosh
- Embedded chip applications

Page 41: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1
Page 42: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 43: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Price/performance
- Price drops every year
- Performance increases almost yearly
- Memory goes up by a factor of 4 every 3 years or so

The basic building blocks for today’s computers are the same as those of the IAS computer nearly 50 years ago.


Page 44: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Density of integrated circuits increases by a factor of 4 every 3 years (e.g. memory evolution)

Also results in a performance boost of 4-5 times every 3 years

Requires more elaborate ways of feeding instructions to the processor quickly enough. Some techniques:
- Branch prediction
- Data-flow analysis
- Speculative execution


Page 45: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Not all components improve in performance at the same rate as the processor

Results in a need to adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components.


Page 46: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

The interface between processor and main memory must carry a constant flow of program instructions and data between memory chips and processor

Processor speed and memory capacity have grown rapidly

Speed with which data can be transferred between processor and main memory has lagged badly

DRAM density goes up faster than the amount of main memory needed
- Number of DRAM’s goes down
- With fewer DRAM’s, less opportunity for parallel data transfer


Page 47: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Increase number of bits retrieved at one time
- Make DRAM “wider” rather than “deeper”

Change DRAM interface
- Include cache in DRAM chip

Reduce frequency of memory access
- More complex and efficient cache between processor and memory
- Cache on chip/processor

Increase interconnection bandwidth between processor and memory
- High speed buses
- Hierarchy of buses

I/O devices also become increasingly demanding


Page 48: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Peripherals with intensive I/O demands
- Large data throughput demands
- Processors can handle this
- The problem is moving the data

Solutions:
- Caching
- Buffering
- Higher-speed interconnection buses
- More elaborate bus structures
- Multiple-processor configurations

Page 49: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Peripherals (I/O devices) vary to extremes:
• in speed: from < 1 Hz to GHz
• in amount of data transferred: from < 1 bit/sec to Gb/sec

Page 50: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Because of constant and unequal changes in:
- Processor components
- Main memory
- I/O devices
- Interconnection structures

designers must constantly strive to balance their throughput and processing demands.


Page 51: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Increase hardware speed of processor
- Fundamentally due to shrinking logic gate size
  - More gates, packed more tightly, increasing clock rate
  - Propagation time for signals reduced

Increase size and speed of caches
- Dedicating part of processor chip
  - Cache access times drop significantly

Change processor organization and architecture
- Increase effective speed of execution
- Parallelism

Page 52: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Power
- Power density increases with density of logic and clock speed
- Dissipating heat

RC delay
- Speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them
- With increased density:
  - Interconnect wires become thinner, increasing resistance (R)
  - Wires are closer together, increasing capacitance (C)
- Therefore, delay increases as the RC product increases

Memory latency
- Memory speeds lag behind processor speeds

Solution: more emphasis on organizational and architectural approaches

Page 53: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Performance improves more through improvements in CPU architecture than through raw processing speed (technology) alone.

Page 54: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Typically two or three levels of cache between processor and main memory (L1,L2,L3)

Chip density increased
- More cache memory on chip
- Faster cache access

Pentium chip devoted about 10% of chip area to cache

Pentium 4 devotes about 50%

Page 55: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Enable parallel execution of instructions

Pipeline works like an assembly line
- Different stages of execution of different instructions at the same time along the pipeline

Superscalar allows multiple pipelines within a single processor
- Instructions that do not depend on one another can be executed in parallel
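The assembly-line picture can be turned into a quick cycle count. The sketch below assumes an idealized pipeline that the slides do not spell out: every stage takes one clock cycle, and there are no stalls, hazards, or branches. The function names and the 5-stage, 100-instruction figures are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through all stages before the next starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline, then one finishes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


n, k = 100, 5  # hypothetical workload: 100 instructions, 5 stages
plain = cycles_unpipelined(n, k)
piped = cycles_pipelined(n, k)
print(plain, piped, round(plain / piped, 2))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows, which is why real pipelines, with stalls and dependencies, fall short of that bound.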

Page 56: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Both of these approaches are reaching a point of diminishing returns.
- Internal organization of processors is complex
  - Can get a great deal of parallelism
  - Further significant increases likely to be relatively modest
- Benefits from cache are reaching a limit
- Increasing clock rate runs into the power dissipation problem
  - Some fundamental physical limits are being reached

We can use Amdahl’s law to estimate maximum expected performance improvements to an overall system when only part of the system is improved.
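Amdahl’s law is commonly written as speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time affected by the improvement and s is how much faster that fraction becomes. The sketch below applies it; the function name and the example figures (40% of the time, made 10 times faster) are assumptions for illustration.

```python
def amdahl_speedup(fraction_improved: float, improvement_factor: float) -> float:
    """Overall speedup when only part of the execution time is improved."""
    unaffected = 1.0 - fraction_improved
    return 1.0 / (unaffected + fraction_improved / improvement_factor)


# Hypothetical case: 40% of the run time is made 10 times faster.
overall = amdahl_speedup(0.4, 10.0)

# Even an unbounded improvement of that 40% cannot beat 1 / (1 - 0.4).
upper_bound = 1.0 / (1.0 - 0.4)
print(round(overall, 4), round(upper_bound, 4))
```

The gap between the improved part's speedup (10 times) and the overall result shows why improving one component gives diminishing returns for the whole system.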

Page 57: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Within a processor, increase in performance is proportional to the square root of the increase in complexity

If software can use multiple processors, doubling the number of processors almost doubles performance
- So, use two simpler processors on the chip rather than one more complex processor

Multiple processors on a single chip
- With large shared cache
- With two processors, larger caches are justified
  - Power consumption of memory logic (for cache) is less than that of processing logic
- Example: IBM POWER4
  - Two cores based on PowerPC
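The argument for two simpler processors can be checked with rough numbers. The sketch below takes the square-root rule at face value and assumes, optimistically, that software scales perfectly across two cores (the slides say performance "almost" doubles, so treat the second figure as an upper bound). All names and the unit of "complexity" are hypothetical.

```python
import math


def single_core_performance(complexity: float) -> float:
    """Relative performance of one core under the square-root rule."""
    return math.sqrt(complexity)


# Spend a doubled complexity budget in two different ways.
one_complex_core = single_core_performance(2.0)      # one core, twice as complex
two_simple_cores = 2 * single_core_performance(1.0)  # two cores of the original design

print(round(one_complex_core, 3), two_simple_cores)
```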

Page 58: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1
Page 59: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

- CPU Performance and its factors
- Evaluating Performance

Reference: David A. Patterson & John L. Hennessy – Computer Organization And Design


Page 60: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Hardware performance is often key to the effectiveness of an entire system of hardware and software.

For different types of applications, different performance metrics may be appropriate, and different aspects of a computer system may be the most significant factor in determining overall performance.

Understanding how best to measure performance, and the limitations of performance, is important when selecting a computer system.

To understand the issues of assessing performance, ask:
- Why does a piece of software perform as it does?
- Why can one instruction set be implemented to perform better than another?
- How does some hardware feature affect performance?


Page 61: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Performance is important!

Identify HW/SW performance problems

Comparisons:
- Which machine is faster?
- Which ISA is better?
- Which implementation (of an ISA) is faster?

Expose significant performance issues (enable us to ignore unimportant issues)


Page 62: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Which of these airplanes has the best performance?

• How do we say one computer has better performance than another?
• Performance based on speed
  • To take a single passenger from one point to another in the least time: Concorde
• Performance based on throughput
  • To transport 450 passengers from one point to another: 747


Page 63: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Response Time and Throughput
- Response time: time to respond (complete an operation)
- Throughput: jobs completed per unit time
- Often can trade one for the other


Page 64: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

MB/s, Mb/s: Megabytes, Megabits Per Second

MIPS: Millions of Instructions Per Second

CPI: Clock Cycles Per Instruction

IPC: Instructions Per Clock cycle

Hz: (processor clock frequency) cycles Per Second

LIPS: Logical Inferences Per Second

FLOPS: Floating-Point arithmetic Operations Per Second


Page 65: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Real time: “Wall Clock” time, always ticking

CPU execution time (CPU time): ticks only when the CPU is working for you
- User: CPU time spent in the program
- System: CPU time spent in the operating system performing tasks on behalf of the program

Clock cycle: Also called tick, clock tick, clock period, clocks, cycle (e.g. 0.25 nanosecond). The time for one clock period, usually of the processor clock, which runs at a constant rate

Clock rate: the inverse of the clock cycle. Frequency (e.g. 4 GHz)
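The difference between real time and CPU time can be observed directly. The sketch below uses two standard-library timers: `time.perf_counter()` for wall-clock time and `time.process_time()` for the user plus system CPU time of the current process. The helper names `measure` and `busy_then_idle`, and the workload sizes, are assumptions chosen for the example; exact figures will vary from machine to machine.

```python
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent running work()."""
    wall_start = time.perf_counter()  # wall clock: always ticking
    cpu_start = time.process_time()   # CPU time: ticks only while computing
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def busy_then_idle():
    sum(i * i for i in range(200_000))  # CPU-bound part
    time.sleep(0.05)                    # idle part: no CPU time is consumed


wall, cpu = measure(busy_then_idle)
print(f"wall = {wall:.3f} s, cpu = {cpu:.3f} s")
```

The sleep adds to the wall-clock figure but not to the CPU figure, which is the distinction the definitions above draw.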


Page 66: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

CPU execution time for a program = Seconds for the program

Clock cycle time = Seconds per clock cycle

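The clock ticks at a constant rate, so measure time in clock cycles:

\[ \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}} \]

Prefer clock frequency? Divide by Hz:

\[ \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}/\text{Program}}{\text{Clock rate (Freq)}} \]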


Page 67: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


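A simple formula relates the most basic metrics (i.e., clock cycles and clock cycle time) to CPU time:

\[ \text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}} \]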

Page 68: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. Computer B will run this program in 6 seconds, given that computer B requires 1.2 times as many clock cycles as computer A for this program. What is computer B’s clock rate?

CPU Time(A) = CPU Clock Cycles(A) / Clock Rate(A)
10 s = CPU Clock Cycles(A) / 4 GHz
10 s = CPU Clock Cycles(A) / (4 × 10^9 Hz)
CPU Clock Cycles(A) = 40 × 10^9 cycles

CPU Time(B) = 1.2 × CPU Clock Cycles(A) / Clock Rate(B)
6 s = 1.2 × CPU Clock Cycles(A) / Clock Rate(B)
Clock Rate(B) = 1.2 × 40 × 10^9 cycles / 6 seconds
Clock Rate(B) = 48 × 10^9 cycles / 6 seconds
Clock Rate(B) = 8 × 10^9 cycles / second
Clock Rate(B) = 8 GHz
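The same calculation can be checked in a few lines. The variable names are chosen for this sketch; the numbers are the ones given in the problem statement.

```python
time_a = 10.0       # seconds on computer A
clock_rate_a = 4e9  # Hz (4 GHz)
time_b = 6.0        # seconds on computer B
cycle_ratio = 1.2   # B needs 1.2 times as many clock cycles as A

cycles_a = time_a * clock_rate_a  # CPU time = cycles / clock rate
cycles_b = cycle_ratio * cycles_a
clock_rate_b = cycles_b / time_b

print(f"{clock_rate_b / 1e9:.1f} GHz")  # prints 8.0 GHz
```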


Page 69: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

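Instruction count = Instructions executed for the program

Clock cycles per instruction (CPI) = Average number of clock cycles per instruction

Programs are made of instructions:

\[ \frac{\text{Cycles}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \]

Using CPI:

\[ \frac{\text{Cycles}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \text{CPI} \]

Or, using Instructions Per Clock (IPC):

\[ \frac{\text{Cycles}}{\text{Program}} = \frac{\text{Instructions}/\text{Program}}{\text{IPC}} \]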


Page 70: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

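\[ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]

In other words:

\[ \text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} \]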


Page 71: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Suppose we have two implementations of the same instruction set architecture, running the same program. Which computer is faster and by how much?
- Computer A: clock cycle time = 250 ps and CPI = 2.0
- Computer B: clock cycle time = 500 ps and CPI = 1.2

Let I = number of instructions for the program, and find the number of clock cycles for A and B:

CPU Clock Cycles(A) = I × CPI(A)
CPU Clock Cycles(A) = I × 2.0
CPU Clock Cycles(B) = I × CPI(B)
CPU Clock Cycles(B) = I × 1.2


Page 72: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Compute CPU Time for A and B:

CPU Time(A) = CPU Clock Cycles(A) × Clock Cycle Time(A)
CPU Time(A) = I × 2.0 × 250 ps = I × 500 ps
CPU Time(B) = CPU Clock Cycles(B) × Clock Cycle Time(B)
CPU Time(B) = I × 1.2 × 500 ps = I × 600 ps

Clearly A is faster. The amount faster is the ratio of the execution times:

\[ \frac{\text{Performance(A)}}{\text{Performance(B)}} = \frac{\text{Execution time(B)}}{\text{Execution time(A)}} = \frac{I \times 600\ \text{ps}}{I \times 500\ \text{ps}} = 1.2 \]

We can conclude that A is 1.2 times faster than B for this program.
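This comparison follows directly from CPU time = instruction count × CPI × clock cycle time. In the sketch below the instruction count `I` is a hypothetical placeholder (it cancels in the ratio), and the function name `cpu_time` is an assumption for the example.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time: float) -> float:
    """CPU time in seconds for a program."""
    return instruction_count * cpi * clock_cycle_time


I = 1_000_000  # hypothetical instruction count; any positive value gives the same ratio

time_a = cpu_time(I, 2.0, 250e-12)  # computer A: CPI 2.0, 250 ps cycle
time_b = cpu_time(I, 1.2, 500e-12)  # computer B: CPI 1.2, 500 ps cycle

ratio = time_b / time_a  # how many times faster A is than B
print(f"A is {ratio:.1f} times faster than B")
```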


Page 73: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Sometimes it is possible to compute the CPU clock cycles by looking at the different types of instructions and using their individual clock cycle counts:

\[ \text{CPU Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i) \]

where
- C_i = count of the number of instructions of class i executed
- CPI_i = average number of cycles per instruction for that instruction class
- n = number of instruction classes

Remember that overall CPI for a program will depend on both the number of cycles for each instruction type and the frequency of each instruction type in the program execution


Page 74: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1


Page 75: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the following facts:

Instruction class      A    B    C
CPI for this class     1    2    3

For a particular high-level-language statement, the compiler writer is considering two code sequences that require the following instruction counts:

Code sequence          A    B    C
1                      2    1    2
2                      4    1    1

Which code sequence executes the most instructions? Which will be faster? What is the CPI for each sequence?


Page 76: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

Sequence 1: Instruction Count(1) = 2 + 1 + 2 = 5 instructions
Sequence 2: Instruction Count(2) = 4 + 1 + 1 = 6 instructions

CPU Clock Cycles(1) = (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles
CPU Clock Cycles(2) = (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles

So code sequence 2 is faster, even though it executes 1 extra instruction. Because code sequence 2 uses fewer clock cycles while executing more instructions, it must have a lower CPI.

CPI = CPU Clock Cycles / Instruction Count
CPI(1) = CPU Clock Cycles(1) / Instruction Count(1) = 10 / 5 = 2
CPI(2) = CPU Clock Cycles(2) / Instruction Count(2) = 9 / 6 = 1.5
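The solution above can be reproduced with a short script. The per-class CPI values (1, 2, 3) and the instruction counts are read off the arithmetic in the solution; the names `CPI_BY_CLASS`, `clock_cycles` and `average_cpi` are assumptions made for this sketch.

```python
CPI_BY_CLASS = [1, 2, 3]  # cycles per instruction for the three instruction classes


def clock_cycles(counts: list) -> int:
    """Sum of (instruction count × CPI) over all instruction classes."""
    return sum(count * cpi for count, cpi in zip(counts, CPI_BY_CLASS))


def average_cpi(counts: list) -> float:
    """Overall CPI: total clock cycles divided by total instruction count."""
    return clock_cycles(counts) / sum(counts)


sequence_1 = [2, 1, 2]  # instruction counts per class
sequence_2 = [4, 1, 1]

for name, counts in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, sum(counts), clock_cycles(counts), average_cpi(counts))
```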

Page 77: OVERVIEW & COMPUTER PERFORMANCE SCR1043 - Module 1 1

The evolution of computers has been characterized by increasing processor speed, decreasing component size, increasing memory size, and increasing I/O capacity and speed.

All computer designers must balance performance and cost.

Using the execution time of real programs as the metric is a reliable method of determining and reporting performance.
