Chapter 1 Computer Abstractions and Technology Part II

Upload: gervase-gray

Post on 20-Jan-2016


Page 1: Chapter 1 Computer Abstractions and Technology Part II

Chapter 1

Computer Abstractions and Technology, Part II

Page 2: Chapter 1 Computer Abstractions and Technology Part II

Florida A & M University - Department of Computer and Information Sciences

CPU Clock and Instructions

Multiplication and division take more time than addition and subtraction

Floating-point operations take longer than integer operations

Accessing memory takes more time than accessing registers

Important: changing the cycle time often changes the number of cycles required for various instructions

Page 3: Chapter 1 Computer Abstractions and Technology Part II

CPU Clock and Instructions

One could assume that the number of clock cycles equals the number of instructions, i.e., one clock cycle per instruction:

[Figure: clock-cycle timeline labeled 1st instruction, 2nd instruction, 3rd instruction, 4th, 5th, 6th, ...]

BUT different instructions take different numbers of cycles.

Page 4: Chapter 1 Computer Abstractions and Technology Part II

Cycles Per Instruction (CPI)

Execution time depends on the total number of executed instructions in a program

CPI is the average number of clock cycles each instruction takes to execute

Useful way to compare two different implementations of the same ISA

Page 5: Chapter 1 Computer Abstractions and Technology Part II

Instruction Count and CPI

Instruction count for a program
- Determined by program, ISA, and compiler

Average clock cycles per instruction (CPI)
- Determined by CPU hardware
- Determined by instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$
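The CPU time relation on this slide can be sketched as a small function. This is a minimal illustration, not part of the original slides; the function name and the sample workload (10 billion instructions, CPI of 2.0, 4 GHz clock) are assumptions chosen only to make the arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI / clock rate."""
    clock_cycles = instruction_count * cpi  # total cycles the program needs
    return clock_cycles / clock_rate_hz     # cycles divided by cycles per second


# Hypothetical workload: 10 billion instructions, CPI 2.0, 4 GHz clock.
example_time = cpu_time_seconds(10e9, 2.0, 4e9)
print(f"{example_time:.1f} s")  # 5.0 s
```

Halving the CPI or doubling the clock rate would each halve the result, which is why no single factor describes performance by itself.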

Page 6: Chapter 1 Computer Abstractions and Technology Part II

CPI Example 1

Suppose we have two implementations of the same instruction set architecture (ISA), and some program.

- Computer A: clock cycle time = 250 ps, CPI = 2.0
- Computer B: clock cycle time = 500 ps, CPI = 1.2

Which machine is faster for this program? By how much?

Question: If two machines have the same ISA, which quantities (e.g., clock rate, CPI, execution time, number of instructions) will always be identical?

Page 7: Chapter 1 Computer Abstractions and Technology Part II

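CPI Example 1

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

A is faster…

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

…by this much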
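The comparison of Computer A and Computer B can be checked numerically. The sketch below assumes an arbitrary instruction count, since both machines run the same program on the same ISA and the count cancels in the ratio; the function and variable names are illustrative only.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Arbitrary instruction count (assumption); it cancels in the ratio below.
instruction_count = 1_000_000

time_a = cpu_time_ps(instruction_count, 2.0, 250)  # Computer A
time_b = cpu_time_ps(instruction_count, 1.2, 500)  # Computer B

speedup_of_a = time_b / time_a
print(f"A is {speedup_of_a:.2f}x faster than B")  # A is 1.20x faster than B
```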

Page 8: Chapter 1 Computer Abstractions and Technology Part II

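Same ISA => same number of instructions ($I$) for this program, so $I$ cancels in the ratio:

$$\text{CPU Time}_A = I \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = I \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{600\,\text{ps}}{500\,\text{ps}} = 1.2$$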

Page 9: Chapter 1 Computer Abstractions and Technology Part II

CPI in More Detail

If different instructions take different numbers of cycles:

$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Count}_i)$$

- $\text{Count}_i$: number of instructions in class $i$
- $\text{CPI}_i$: number of cycles per class-$i$ instruction
- $n$: number of instruction classes

Page 10: Chapter 1 Computer Abstractions and Technology Part II

CPI in More Detail

Weighted average CPI (instruction mix):

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Count}_i}{\text{Instruction Count}} \right)$$

where $\text{Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.

Example: 20% of instructions take 3 cycles, 50% take 4 cycles, and 30% take 6 cycles.

CPI = 3 × 0.20 + 4 × 0.50 + 6 × 0.30 = 0.6 + 2.0 + 1.8 = 4.4
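The weighted-average CPI from the example can be reproduced in a few lines. This is a sketch; the function name and the list-of-pairs representation of the instruction mix are assumptions made for illustration.

```python
def weighted_cpi(mix):
    """Average CPI for a mix given as (relative_frequency, cycles) pairs.

    The relative frequencies are expected to sum to 1.
    """
    return sum(frequency * cycles for frequency, cycles in mix)


# Instruction mix from the example: 20% at 3 cycles, 50% at 4, 30% at 6.
example_mix = [(0.20, 3), (0.50, 4), (0.30, 6)]
print(f"{weighted_cpi(example_mix):.1f}")  # 4.4
```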

Page 11: Chapter 1 Computer Abstractions and Technology Part II

CPI Example 2

A compiler designer is trying to decide between two code sequences for a machine that has three classes of instructions, grouped by the number of clock cycles each requires to execute.

Analyze:
- Which sequence requires more instructions?
- Which sequence will be faster? By how much?
- What is the average CPI for each sequence?

Page 12: Chapter 1 Computer Abstractions and Technology Part II

CPI Example 2

| Class | A | B | C |
|---|---|---|---|
| CPI for class | 1 | 2 | 3 |
| IC in sequence 1 | 2 | 1 | 2 |
| IC in sequence 2 | 4 | 1 | 1 |

Sequence 1: IC = 5
- Clock Cycles = 2×1 + 1×2 + 2×3 = 10
- Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6
- Clock Cycles = 4×1 + 1×2 + 1×3 = 9
- Avg. CPI = 9/6 = 1.5
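The analysis of the two code sequences can be sketched as follows. The dictionary layout and function name are illustrative assumptions; the class CPIs and instruction counts come from the table. Both sequences run on the same machine, so the ratio of clock cycles is also the ratio of execution times.

```python
# Cycles per instruction for each class, from the table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def analyze(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instruction_count = sum(counts.values())
    clock_cycles = sum(n * CPI_BY_CLASS[cls] for cls, n in counts.items())
    return instruction_count, clock_cycles, clock_cycles / instruction_count


seq1 = analyze({"A": 2, "B": 1, "C": 2})
seq2 = analyze({"A": 4, "B": 1, "C": 1})

print(seq1)  # (5, 10, 2.0)
print(seq2)  # (6, 9, 1.5)

# Same clock for both sequences, so fewer cycles means less time.
print(f"Sequence 2 is {seq1[1] / seq2[1]:.2f}x faster")  # Sequence 2 is 1.11x faster
```

Sequence 2 executes more instructions yet finishes sooner, which is the point of the example: instruction count alone does not determine performance.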

Page 13: Chapter 1 Computer Abstractions and Technology Part II

Performance Summary

Performance is determined by execution time.

Do any of the other variables equal performance?
- Number of cycles to execute the program?
- Number of instructions in the program?
- Number of cycles per second?
- Average number of cycles per instruction?
- Average number of instructions per second?

Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.

Page 14: Chapter 1 Computer Abstractions and Technology Part II

Performance Summary

Time is the only complete and reliable measurement.

- Changing the instruction set to lower the instruction count may lead to a computer with a slower clock cycle time
- Code that executes fewer instructions may not be faster, because CPI depends on the type of instructions executed

The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Page 15: Chapter 1 Computer Abstractions and Technology Part II

Performance Summary

Performance depends on:
- Algorithm: affects IC, possibly CPI
- Programming language: affects IC, CPI
- Compiler: affects IC, CPI
- Instruction set architecture: affects IC, CPI, Tc (clock cycle time)

The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Page 16: Chapter 1 Computer Abstractions and Technology Part II

Power Trends

Growth in clock rate and power slowed rapidly because of the practical power limit for cooling microprocessors

Page 17: Chapter 1 Computer Abstractions and Technology Part II

Power Trends

In CMOS IC technology:

Dynamic power
- Primary source of power dissipation
- Power consumed during switching
- Depends on
  - capacitive loading of each transistor
  - voltage applied
  - frequency of transistor switching

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Annotations on the equation: Power ×30, Voltage 5 V → 1 V, Frequency ×1000

Page 18: Chapter 1 Computer Abstractions and Technology Part II

Reducing Power

Suppose a new CPU has
- 85% of the capacitive load of the old CPU
- 15% voltage reduction and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{(C_{old} \times 0.85) \times (V_{old} \times 0.85)^2 \times (F_{old} \times 0.85)}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$

The power wall
- We can’t reduce voltage further
- We can’t remove more heat

How else can we improve performance?
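The power ratio computed on this slide follows directly from the dynamic power relation (capacitive load × voltage² × frequency). A minimal sketch, assuming normalized values of 1.0 for the old CPU's load, voltage, and frequency; only the ratio matters, so the units drop out.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power = capacitive load x voltage^2 x frequency."""
    return capacitive_load * voltage ** 2 * frequency


# Old CPU normalized to 1.0 in every factor (assumption for illustration).
old_power = dynamic_power(1.0, 1.0, 1.0)

# New CPU: 85% of the load, and voltage and frequency each reduced by 15%.
new_power = dynamic_power(0.85, 0.85, 0.85)

power_ratio = new_power / old_power
print(f"{power_ratio:.2f}")  # 0.52
```

Because voltage enters squared, the three 15% reductions compound to 0.85 to the fourth power, roughly half the original power.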

Page 19: Chapter 1 Computer Abstractions and Technology Part II

Uniprocessor Performance

(§1.6 The Sea Change: The Switch to Multiprocessors)

Constrained by power, instruction-level parallelism, and memory latency

Page 20: Chapter 1 Computer Abstractions and Technology Part II

Multiprocessors

Multicore microprocessors
- More than one processor per chip

Requires explicit parallel programming
- Compare with instruction-level parallelism
  - Hardware executes multiple instructions at once
  - Hidden from the programmer
- Hard to do
  - Programming for performance
  - Load balancing
  - Optimizing communication and synchronization

Page 21: Chapter 1 Computer Abstractions and Technology Part II

SPEC Benchmarks

Programs used to measure performance
- Ideally reflect a typical actual workload or expected class of applications (e.g., compiler, graphics)

Standard Performance Evaluation Corporation (SPEC)
- Mission: establish, maintain, and endorse a standardized set of relevant benchmarks and metrics for performance evaluation of modern computer systems
- Develops benchmarks for CPU, I/O, Web, …

Page 22: Chapter 1 Computer Abstractions and Technology Part II

SPEC

SPEC is an umbrella non-profit organization that covers three groups, each with its own benchmarks:
- Open Systems Group (OSG): component- and system-level benchmarks in a UNIX / NT / VMS environment
- High Performance Group (HPG): benchmarking in a numeric computing environment, with emphasis on high-performance numeric computing
- Graphics Performance Characterization Group (GPCG): benchmarks for graphical subsystems, OpenGL, and X Windows

Page 23: Chapter 1 Computer Abstractions and Technology Part II

SPEC

- SPEC 1989, 1992, 1995
- SPEC CPU2000: retired February 2007
- SPEC CPU2006: version 1.0 released August 2006

SPEC provides benchmark sets for:
- graphics
- high-performance scientific computing: HPC2002, OMP2001, MPI2006
- file systems: SPECsfs2008
- Web servers and clients: WEB2005
- Java client/server: jAppServer2004
- engineering CAD applications

Page 24: Chapter 1 Computer Abstractions and Technology Part II

SPEC CPU2006 v1.0

- Next-generation, industry-standardized, CPU-intensive benchmark suite
- Emphasizes performance of the processor (CPU), memory, and compiler
- Comparative measure of compute-intensive performance across the widest practical range of hardware
- Source code from real user applications

Page 25: Chapter 1 Computer Abstractions and Technology Part II

SPEC CPU2006

Two benchmark suites: CINT2006 for measuring compute-intensive integer performance, and CFP2006 for compute-intensive floating-point performance
- CINT2006 includes 12 application-based benchmarks written in C and C++
- CFP2006 includes 17 CPU-intensive benchmarks written in C, C++, Fortran, and a mixture of C and Fortran

Page 26: Chapter 1 Computer Abstractions and Technology Part II

SPEC CPU Benchmark

SPEC CPU2006
- Elapsed time to execute a collection/mix of programs
  - Negligible I/O, so focuses on CPU performance
- Normalize relative to reference machine
- Summarize as geometric mean of performance ratios

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
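The geometric mean used to summarize the performance ratios is the n-th root of their product. A minimal sketch, with made-up ratios (2.0 and 8.0) chosen only so the result is easy to verify by hand; the function name is an assumption.

```python
import math


def geometric_mean_of_ratios(ratios):
    """n-th root of the product of n execution time ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical ratios: the product is 16, and its square root is 4.
sample_mean = geometric_mean_of_ratios([2.0, 8.0])
print(f"{sample_mean:.1f}")  # 4.0
```

For comparison, the arithmetic mean of the same two ratios would be 5.0; the geometric mean is less dominated by a single large ratio.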

Page 27: Chapter 1 Computer Abstractions and Technology Part II

CINT2006 for Opteron X4 2356

| Name | Description | IC ×10⁹ | CPI | Tc (ns) | Exec time (s) | Ref time (s) | SPECratio |
|---|---|---|---|---|---|---|---|
| perl | Interpreted string processing | 2,118 | 0.75 | 0.40 | 637 | 9,777 | 15.3 |
| bzip2 | Block-sorting compression | 2,389 | 0.85 | 0.40 | 817 | 9,650 | 11.8 |
| gcc | GNU C Compiler | 1,050 | 1.72 | 0.40 | 724 | 8,050 | 11.1 |
| mcf | Combinatorial optimization | 336 | 10.00 | 0.40 | 1,345 | 9,120 | 6.8 |
| go | Go game (AI) | 1,658 | 1.09 | 0.40 | 721 | 10,490 | 14.6 |
| hmmer | Search gene sequence | 2,783 | 0.80 | 0.40 | 890 | 9,330 | 10.5 |
| sjeng | Chess game (AI) | 2,176 | 0.96 | 0.40 | 837 | 12,100 | 14.5 |
| libquantum | Quantum computer simulation | 1,623 | 1.61 | 0.40 | 1,047 | 20,720 | 19.8 |
| h264avc | Video compression | 3,102 | 0.80 | 0.40 | 993 | 22,130 | 22.3 |
| omnetpp | Discrete event simulation | 587 | 2.94 | 0.40 | 690 | 6,250 | 9.1 |
| astar | Games/path finding | 1,082 | 1.79 | 0.40 | 773 | 7,020 | 9.1 |
| xalancbmk | XML parsing | 1,058 | 2.70 | 0.40 | 1,143 | 6,900 | 6.0 |
| Geometric mean | | | | | | | 11.7 |

SPECratio is inversely proportional to execution time (SPECratio = Ref time / Exec time)
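The table's columns are tied together by two relations: execution time = IC × CPI × Tc, and SPECratio = reference time / execution time. The sketch below checks three rows (perl, mcf, xalancbmk) and recomputes the geometric mean of the SPECratio column. The function names are illustrative; small differences from the table are expected because the published CPI and Tc values are rounded.

```python
from statistics import geometric_mean

# (name, IC in 10^9, CPI, Tc in ns, exec time s, ref time s, SPECratio), from the table.
ROWS = [
    ("perl", 2118, 0.75, 0.40, 637, 9777, 15.3),
    ("mcf", 336, 10.00, 0.40, 1345, 9120, 6.8),
    ("xalancbmk", 1058, 2.70, 0.40, 1143, 6900, 6.0),
]


def exec_time_seconds(ic_billions, cpi, tc_ns):
    """(IC x 10^9) x CPI x (Tc x 10^-9 s); the powers of ten cancel."""
    return ic_billions * cpi * tc_ns


def spec_ratio(ref_time_s, exec_time_s):
    """SPECratio = reference time / measured execution time."""
    return ref_time_s / exec_time_s


for name, ic, cpi, tc, exec_t, ref_t, ratio in ROWS:
    computed_time = exec_time_seconds(ic, cpi, tc)
    computed_ratio = spec_ratio(ref_t, exec_t)
    print(f"{name}: {computed_time:.0f} s (table {exec_t}), "
          f"ratio {computed_ratio:.1f} (table {ratio})")

# SPECratio column for all 12 benchmarks, in table order.
SPEC_RATIOS = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]
overall = geometric_mean(SPEC_RATIOS)
print(f"Geometric mean: {overall:.1f}")
```

The recomputed geometric mean should land close to the 11.7 reported in the last row of the table.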

Page 28: Chapter 1 Computer Abstractions and Technology Part II

SPEC Power Benchmark

Power consumption of server at different workload levels
- Performance: ssj_ops/sec
- Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$
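The overall ssj_ops per Watt metric divides the summed throughput across the eleven measured workload levels (i = 0 to 10) by the summed power at those levels. The measurements below are invented purely for illustration and do not come from any real server; the function name is also an assumption.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by the sum of power readings."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per workload level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical measurements at 11 workload levels (i = 0..10).
hypothetical_ops = [level * 1000 for level in range(11)]        # sums to 55,000
hypothetical_power = [100 + 10 * level for level in range(11)]  # sums to 1,650 W

metric = overall_ssj_ops_per_watt(hypothetical_ops, hypothetical_power)
print(f"{metric:.1f} ssj_ops per Watt")  # 33.3 ssj_ops per Watt
```

Note that the metric is a ratio of sums, not an average of per-level ratios, so power drawn at low load (where throughput is small) still counts fully against the server.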

Page 29: Chapter 1 Computer Abstractions and Technology Part II

Concluding Remarks

- Cost/performance is improving
  - Due to underlying technology development
- Hierarchical layers of abstraction
  - In both hardware and software
- Instruction set architecture
  - The hardware/software interface
- Execution time: the best performance measure
- Power is a limiting factor
  - Use parallelism to improve performance