Software School, Fudan University 2015 The Role of Performance To tell which system is faster

Upload: marvin-cunningham

Post on 06-Jan-2018



TRANSCRIPT

Page 1

Software School, Fudan University 2015

The Role of Performance To tell which system is faster

Page 2

Performance: Two notions of “performance”

° Time to do the task (execution time): execution time, response time, latency

° Tasks per day, hour, week, sec, ns, ... (performance): throughput, bandwidth

Plane               Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747          610 mph    6.5 hours     470          286,700
BAC/Sud Concorde    1350 mph   3 hours       132          178,200

(pmph = passenger-miles per hour = passengers x speed)

Which has higher performance?

Page 3

Example - 1 (cont.)

• Time of Concorde vs. Boeing 747?
• Concorde is 1350 mph / 610 mph = 2.2 times faster (≈ 6.5 hours / 3 hours)

• Throughput of Concorde vs. Boeing 747?
• Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
• Boeing is 286,700 pmph / 178,200 pmph = 1.61 “times faster”

• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time

We will focus primarily on execution time for a single job.
Lots of instructions in a program => instruction throughput is important!
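The ratios on these two slides can be checked with a few lines of Python. This is a minimal sketch that uses only the figures from the table; throughput is taken as passengers x speed, which reproduces the table's pmph values.

```python
# Figures from the Boeing 747 vs. Concorde table.
planes = {
    "Boeing 747": {"speed_mph": 610, "hours_dc_paris": 6.5, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "hours_dc_paris": 3.0, "passengers": 132},
}

def throughput_pmph(plane):
    """Passenger-miles per hour = passengers x speed."""
    return plane["passengers"] * plane["speed_mph"]

boeing, concorde = planes["Boeing 747"], planes["Concorde"]

# Execution-time view: ratio of speeds, or equivalently of flight times.
speed_ratio = concorde["speed_mph"] / boeing["speed_mph"]
time_ratio = boeing["hours_dc_paris"] / concorde["hours_dc_paris"]

# Throughput view: ratio of passenger-miles per hour.
throughput_ratio = throughput_pmph(boeing) / throughput_pmph(concorde)

print(f"Boeing 747 throughput: {throughput_pmph(boeing):,} pmph")
print(f"Concorde throughput:   {throughput_pmph(concorde):,} pmph")
print(f"Concorde: {speed_ratio:.2f}x by speed, {time_ratio:.2f}x by flight time")
print(f"Boeing 747: {throughput_ratio:.2f}x by throughput")
```

The speed ratio and the flight-time ratio differ slightly (2.21 vs. 2.17) because the two planes' implied route distances are not identical; both round to the slide's 2.2.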

Page 4

Defining Performance

Response time
° The computer user cares about it
° Equals time_end - time_start

Throughput
° The computer manager cares about it
° Equals the number of jobs completed per second

Throughput = 1 / Response time?

Page 5

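Response Time vs. Throughput

° Throughput = 1 / Response time holds only if the jobs in the system do not overlap
° Example: Job A and Job B each take 5 s

° No overlap: Job A runs for 5 s, then Job B runs for 5 s
Throughput = 2/10 = 0.2 = 1/5

° Overlap: 3 s of Job A alone, 2 s of both jobs together, 3 s of Job B alone
Throughput = 2/8 = 0.25 ≠ 1/5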
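A small sketch of the two timelines shows why throughput and 1 / response time diverge once jobs overlap. It assumes each job is described by a hypothetical (start, end) pair in seconds and measures throughput over the span from the first start to the last finish.

```python
def summarize(jobs):
    """jobs: list of (start_s, end_s) pairs, one per job.

    Returns (throughput in jobs/s, mean response time in s).
    """
    first_start = min(start for start, _ in jobs)
    last_end = max(end for _, end in jobs)
    throughput = len(jobs) / (last_end - first_start)
    mean_response = sum(end - start for start, end in jobs) / len(jobs)
    return throughput, mean_response

no_overlap = [(0, 5), (5, 10)]  # Job A, then Job B
overlap = [(0, 5), (3, 8)]      # Job B starts 3 s in; the jobs share 2 s

for name, jobs in [("no overlap", no_overlap), ("overlap", overlap)]:
    throughput, mean_response = summarize(jobs)
    print(f"{name}: throughput = {throughput:.2f} jobs/s, "
          f"1 / response time = {1 / mean_response:.2f}")
```

Each job still has a 5 s response time in both cases; only the overlap raises throughput.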

Page 6

What Do We Improve?

Example
° Make the CPU faster: improves both response time and throughput
° Add more CPUs: improves only throughput
° Why?

Page 7

More Definitions

° Elapsed time = CPU execution time + waiting time (e.g., I/O or task switching)

° CPU execution time
• Time spent running the program
• Can be split into two parts:
- User CPU time
- System CPU time

e.g., the Unix time command prints

90.7u 12.9s 2:39 65%

which means user CPU time 90.7 s, system CPU time 12.9 s, and elapsed time 2 minutes 39 seconds (159 s). The 65% is the fraction of elapsed time spent on the CPU: (90.7 + 12.9) / 159 ≈ 0.65.

We will concentrate mostly on the CPU execution time
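The breakdown of the time output can be reproduced with a short parser. This is a sketch that assumes the csh-style format shown on the slide ("user u, system s, m:ss, percent"); other shells and the standalone time program print different formats, and elapsed times of an hour or more (h:mm:ss) are not handled here.

```python
def parse_csh_time(line):
    """Parse csh-style `time` output such as '90.7u 12.9s 2:39 65%'.

    Returns (user_s, system_s, elapsed_s). Assumes elapsed time is m:ss.
    """
    user, system, elapsed, _percent = line.split()
    user_s = float(user.rstrip("u"))
    system_s = float(system.rstrip("s"))
    minutes, seconds = elapsed.split(":")
    elapsed_s = int(minutes) * 60 + float(seconds)
    return user_s, system_s, elapsed_s

user_s, system_s, elapsed_s = parse_csh_time("90.7u 12.9s 2:39 65%")
cpu_share = (user_s + system_s) / elapsed_s  # CPU time / elapsed time
print(f"CPU time {user_s + system_s:.1f} s of {elapsed_s:.0f} s elapsed "
      f"= {cpu_share:.0%}")
```

The remaining 35% of the elapsed time is waiting time (I/O, other processes), which CPU execution time deliberately excludes.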

Page 8

• Machine X runs a program in 10 sec

• Machine Y runs the same program in 15 sec

° How many times is X faster than Y?

Page 9

Performance Comparison

° Performance = 1 / Response time
° Machine X is n times faster than machine Y when

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Response time}_Y}{\text{Response time}_X} = n$$

Example
• Machine X runs a program in 10 sec
• Machine Y runs the same program in 15 sec
• 15 / 10 = 1.5, so X is 1.5 times faster than Y

Page 10

Performance Comparison

° Machine X is m% faster than Y when

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Response time}_Y}{\text{Response time}_X} = 1 + \frac{m}{100}$$

° Example
• Machine X runs a program in 10 sec
• Machine Y runs the same program in 15 sec
• 15 / 10 = 1.5 = 1 + 50/100, so X is 50% faster than Y
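Both comparison rules from the last two slides fit in two small functions. This is a sketch; the function names are illustrative, and it uses Performance = 1 / Response time, so the performance ratio reduces to the inverse ratio of response times.

```python
def times_faster(time_x, time_y):
    """Return n such that machine X is n times faster than machine Y.

    Performance_X / Performance_Y = (1 / time_x) / (1 / time_y) = time_y / time_x.
    """
    return time_y / time_x

def percent_faster(time_x, time_y):
    """Return m such that machine X is m% faster than Y, i.e. n = 1 + m / 100."""
    return (times_faster(time_x, time_y) - 1) * 100

print(times_faster(10, 15))    # 1.5
print(percent_faster(10, 15))  # 50.0
```

Swapping the arguments gives a ratio below 1 (Y is 0.67 "times faster" than X), which is why the slides always put the slower machine's time in the numerator.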

Page 11

Performance and Its Factors

° CPU execution time = CPU clock cycles X Clock cycle time

° CPU execution time = CPU clock cycles / Clock rate

° These formulas make it clear that the hardware designer can improve performance by

• Reducing the length of the clock cycle

• Or reducing the number of clock cycles required for a program

° The designer often faces a trade-off between the number of clock cycles and the length of each cycle

Page 12

Example -2

° Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a computer, B, that will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.5 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?

Page 13

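Example - 2 (cont.)

° $\text{CPU time}_A = \text{CPU clock cycles}_A / \text{Clock rate}_A$

° $10\ \text{s} = \text{CPU clock cycles}_A / (4 \times 10^9\ \text{cycles/s})$

° $\text{CPU clock cycles}_A = 40 \times 10^9\ \text{cycles}$

° $\text{CPU time}_B = \text{CPU clock cycles}_B / \text{Clock rate}_B$

° $6\ \text{s} = 1.5 \times 40 \times 10^9\ \text{cycles} / \text{Clock rate}_B$

° $\text{Clock rate}_B = 1.5 \times 40 \times 10^9\ \text{cycles} / 6\ \text{s} = 10 \times 10^9\ \text{cycles/s} = 10\ \text{GHz}$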
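The three steps of Example 2 (recover A's cycle count, scale it for B, divide by B's target time) can be sketched as one function. The function and parameter names are illustrative, not from the slides.

```python
def required_clock_rate(time_a_s, clock_rate_a_hz, cycle_factor_b, target_time_b_s):
    """Clock rate B needs to hit target_time_b_s, given A's time and clock rate."""
    cycles_a = time_a_s * clock_rate_a_hz   # CPU time = cycles / clock rate
    cycles_b = cycle_factor_b * cycles_a    # B needs cycle_factor_b times as many cycles
    return cycles_b / target_time_b_s       # clock rate = cycles / CPU time

rate_b = required_clock_rate(time_a_s=10, clock_rate_a_hz=4e9,
                             cycle_factor_b=1.5, target_time_b_s=6)
print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
```

Note that B must run 2.5 times faster in clock rate (10 GHz vs. 4 GHz) to be only 10/6 ≈ 1.67 times faster overall, because the extra cycles eat part of the gain.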

Page 14

Hardware Software Interface

° The previous example does not include any reference to the number of instructions needed for the program

° The execution time must depend on the number of instructions in a program

° CPU clock cycles = Instructions for a program X Average clock cycles per instruction (CPI)

° => CPU time = Instruction count X CPI X Clock cycle time = Instruction count X CPI / Clock rate

Page 15

Example -3
° Suppose we have two implementations of the same ISA. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much?

Page 16

Example - 3 (cont.)

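° Let $I$ = instruction count
• $\text{CPU clock cycles}_A = I \times 2.0$
• $\text{CPU clock cycles}_B = I \times 1.2$

° Now
• $\text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$
• $\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$

° $\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{Execution time}_B}{\text{Execution time}_A} = \dfrac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$

° So computer A is 1.2 times faster than computer B for this program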
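Plugging the numbers into CPU time = Instruction count x CPI x Clock cycle time confirms the answer. This is a sketch; the instruction count `I` is a hypothetical value, and any positive value gives the same ratio because it cancels.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = Instruction count x CPI x Clock cycle time (here in ps)."""
    return instruction_count * cpi * cycle_time_ps

I = 1_000_000  # hypothetical instruction count; it cancels out of the ratio

time_a = cpu_time_ps(I, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time_ps(I, cpi=1.2, cycle_time_ps=500)

print(f"A: {time_a / I:.0f} ps per instruction, B: {time_b / I:.0f} ps per instruction")
print(f"A is {time_b / time_a:.1f} times faster than B")
```

Computer A wins despite its higher CPI because its clock cycle is half as long; neither factor alone decides the comparison.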

Page 17

The Basic Components of Performance

Components of performance             Units of measure
CPU execution time for a program      Seconds for the program
Instruction count                     Instructions executed for the program
Clock cycles per instruction (CPI)    Average number of clock cycles per instruction
Clock cycle time                      Seconds per clock cycle

Page 18

Aspects of CPU Performance

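$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                instr count    CPI    clock rate
Program
Compiler
Instr. Set
Organization
Technology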

Page 19

Aspects of CPU Performance

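$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                instr count    CPI    clock rate
Program             X           X
Compiler            X           X
Instr. Set          X           X         X
Organization                    X         X
Technology                                X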

Page 20

CPI: Average Cycles per Instruction

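$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \quad \text{where } F_i = \frac{I_i}{\text{Instruction Count}}$$

($I_i$ is the number of executed instructions of type $i$ and $\text{CPI}_i$ their cycles per instruction.)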

CPI = (CPU Time * Clock Rate) / Instruction Count = Clock Cycles / Instruction Count

CPI = ideal CPI + Memory_Stalls/Inst + Other_Stalls/Inst

Memory_Stalls/Inst = Instruction Miss Rate x Instruction Miss Penalty + Loads/Inst x Load Miss Rate x Load Miss Penalty + Stores/Inst x Store Miss Rate x Store Miss Penalty
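The weighted-CPI formula and the memory-stall breakdown compose directly. The sketch below reuses the A/B/C instruction mix from the MIPS example later in this deck for the ideal CPI, but every miss rate, penalty, and loads/stores-per-instruction figure is a hypothetical value chosen only for illustration; other stalls are taken as zero.

```python
def average_cpi(mix):
    """mix: list of (CPI_i, I_i) pairs. CPI = sum(CPI_i * F_i), F_i = I_i / IC."""
    instruction_count = sum(count for _, count in mix)
    return sum(cpi_i * count / instruction_count for cpi_i, count in mix)

def memory_stalls_per_inst(i_miss_rate, i_miss_penalty,
                           loads_per_inst, load_miss_rate, load_miss_penalty,
                           stores_per_inst, store_miss_rate, store_miss_penalty):
    """Memory stall cycles per instruction, term by term as on the slide."""
    return (i_miss_rate * i_miss_penalty
            + loads_per_inst * load_miss_rate * load_miss_penalty
            + stores_per_inst * store_miss_rate * store_miss_penalty)

# Ideal CPI from a mix of classes taking 1, 2, 3 cycles (counts 10, 1, 1).
ideal = average_cpi([(1, 10), (2, 1), (3, 1)])

# Hypothetical memory behaviour: 2% I-miss, 0.3 loads and 0.1 stores per
# instruction, 5% data miss rate, 100-cycle penalty everywhere.
stalls = memory_stalls_per_inst(0.02, 100, 0.3, 0.05, 100, 0.1, 0.05, 100)

other_stalls = 0.0  # assumed zero for this sketch
cpi = ideal + stalls + other_stalls
print(f"ideal CPI = {ideal:.2f}, memory stalls/inst = {stalls:.2f}, CPI = {cpi:.2f}")
```

With these assumed numbers the memory stalls dominate the ideal CPI, which is the usual motivation for caring about miss rates.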

Page 21

Other Metrics (1)

° MIPS (million instructions per second)

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count} \times \text{Clock rate}}{\text{Instruction count} \times \text{CPI} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

° VAX-11/780 = 1 MIPS
• But was it?

° The larger the better

Is MIPS a good metric?

Page 22

Shortcoming of MIPS

MIPS can vary inversely with performance
° Happens when the instruction count changes
° Example (same clock rate, R)
° 3 types of instructions, A, B, C, taking 1, 2, 3 cycles respectively
° Before: instruction count A = 10, B = 1, C = 1
° After: instruction count A = 5, B = 1, C = 1

CPI (before) = (10*1 + 1*2 + 1*3) / (10 + 1 + 1) = 15/12 = 5/4
CPI (after) = (5*1 + 1*2 + 1*3) / (5 + 1 + 1) = 10/7
MIPS (before) = R / (15/12) = 12R/15 = 0.8 R
MIPS (after) = R / (10/7) = 7R/10 = 0.7 R

1) Before is faster. WRONG!!!
"Before" needs 15 clock cycles and "After" only 10, so "After" is 1.5 times faster despite its lower MIPS rating.
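Computing both metrics side by side makes the inversion concrete. This sketch uses the instruction mix from the slide and a hypothetical clock rate `R` of 100 MHz; any clock rate gives the same conclusion because it scales both versions equally.

```python
CYCLES = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class (from the slide)

def stats(counts, clock_rate_hz):
    """Return (execution time in s, MIPS) for an instruction-count mix."""
    cycles = sum(CYCLES[cls] * n for cls, n in counts.items())
    instructions = sum(counts.values())
    cpi = cycles / instructions
    exec_time_s = cycles / clock_rate_hz
    mips = clock_rate_hz / (cpi * 1e6)
    return exec_time_s, mips

R = 100e6  # hypothetical clock rate (100 MHz)
before = stats({"A": 10, "B": 1, "C": 1}, R)
after = stats({"A": 5, "B": 1, "C": 1}, R)

for name, (t, mips) in [("before", before), ("after", after)]:
    print(f"{name}: time = {t * 1e9:.0f} ns, MIPS = {mips:.1f}")
```

"After" has the lower MIPS rating yet finishes in two thirds of the time, because it removed cheap instructions and kept the expensive ones.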

Page 23

Shortcoming of MIPS

A machine cannot have a single MIPS rating
° MIPS varies between programs on the same machine

Cannot compare two different ISAs
° Different ISAs have different instruction counts

Page 24

Other Metrics (2)

° MFLOPS (million floating-point operations per second)

$$\text{MFLOPS} = \frac{\#\ \text{of floating-point operations in a program}}{\text{Execution time} \times 10^6}$$

° The larger the better
° What's wrong with MFLOPS?

Page 25

Shortcoming of MFLOPS

° Not applicable to integer applications
• MFLOPS = 0

° # of floating-point operations depends on
• Compiler
• ISA (may not support FP division)

° Different FP operations have different execution times
• FP multiplication takes longer than FP addition

° Different programs have different mixtures of FP operations

Page 26

Comparing and Summarizing Performance
° Fair way to summarize performance?
° Capture in a single number?
° Example:

° Which computer is better?
° By how much?
° Which program is more important?

                 Computer A   Computer B   Computer C
Program 1             1           10           20
Program 2          1000          100           20
Total Time         1001          110           40

Page 27

Comparing and Summarizing Performance

° All of these are true:
- A is 10 times faster than B for program P1
- B is 10 times faster than A for program P2
- A is 20 times faster than C for program P1
- C is 50 times faster than A for program P2
- B is 2 times faster than C for program P1
- C is 5 times faster than B for program P2

° So which machine is faster?
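The per-program claims and the total times can all be derived from the same table. This sketch checks each claim and then shows what one particular summary, total execution time, would conclude; choosing that summary is itself an assumption, since it implicitly weights both programs equally.

```python
# Execution times from the table (same units for every entry).
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}
totals = {machine: sum(t.values()) for machine, t in times.items()}

def times_faster(faster, slower, program):
    """How many times `faster` beats `slower` on one program."""
    return times[slower][program] / times[faster][program]

claims = [("A", "B", "P1"), ("B", "A", "P2"), ("A", "C", "P1"),
          ("C", "A", "P2"), ("B", "C", "P1"), ("C", "B", "P2")]
for faster, slower, program in claims:
    n = times_faster(faster, slower, program)
    print(f"{faster} is {n:g} times faster than {slower} for {program}")

# One possible single-number summary: total time (lower is better).
for machine, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{machine}: total time {total}")
```

Under total time C ranks first, but a workload that runs P1 far more often than P2 would favor A instead, which is exactly the "which program is more important?" question.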

Page 28

Page 29

Page 30

Page 31

Metrics of performance

[Figure: levels of a computer system (Application, Programming Language, Compiler, ISA, Datapath/Control, Function Units, Transistors/Wires/Pins), annotated with metrics:]
° Answers per month
° Useful operations per second
° (Millions of) instructions per second: MIPS
° (Millions of) floating-point operations per second: MFLOP/s
° Megabytes per second
° Cycles per second (clock rate)

Each metric has a place and a purpose, and each can be misused

Page 32

Evaluating Performance of Two Computers

What do you execute?

Ideally
° Real applications you use every day

In reality: benchmarks
+ Save money and effort
+ Smaller than real programs, easier to standardize
- Not representative of the real workload

To improve the quality of evaluation
° Run a set of benchmarks

Page 33

Other Evaluation Tools

° Simulator
• Speed

• Accuracy

° Trace
• Replay recorded accesses

• Cache, branch, register

• File/network access

• …….

° Analysis methods

Page 34

Benchmark Examples

° CPU Benchmark
• SPEC89/92/95/2000
• Berkeley Multimedia Workload

° Transaction Benchmark
• TPC-C / TPC-D

° 3D Benchmark
• 3DMark 2001

° Kernel Benchmark
• Linpack or Livermore loops

° Synthetic Benchmark
• Whetstone and Dhrystone
• Try to match real application characteristics

Page 35

Be careful what you report (and what others report…)

° Killer Application takes X seconds on machine Y
• What implementation of the application?

• What is the input? What were the options?

• What compiler? What optimizations?

• What machine configuration? Disk speed? Memory capacity? Etc.

° Could you (or someone else) reproduce the results?
° You can always reproduce the results of a car magazine’s performance review, so why not those of a system experiment?

Page 36

Improving Performance: Fundamentals

° Suppose we have a machine with two instructions
• Instruction A executes in 100 cycles
• Instruction B executes in 2 cycles

° We want better performance...
• Which instruction do we improve?

Page 37

Our Goal: Improve Performance

Minimize time, which is a product, NOT the isolated terms
° Why?
° These terms are not necessarily independent of each other

Example
° Change the ISA to make an instruction do more work
° This decreases the instruction count
° But CPI goes up due to the longer instruction execution time

Page 38

Amdahl's Law

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction P of the task by a factor S and the remainder of the task is unaffected. Then

$$\text{ExTime(with } E) = \left((1-P) + \frac{P}{S}\right) \times \text{ExTime(without } E)$$

$$\text{Speedup(with } E) = \frac{1}{(1-P) + P/S}$$
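Amdahl's Law translates directly into a one-line function. This is a sketch with illustrative names; the example fraction and factor (40% of the run time sped up 10x) are hypothetical values, not from the slides.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of execution time is sped up by factor s.

    Speedup = 1 / ((1 - p) + p / s); the unaffected part (1 - p) stays as is.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1 / ((1 - p) + p / s)

# Hypothetical enhancement: 40% of the run time becomes 10x faster.
print(f"speedup with s = 10: {amdahl_speedup(0.4, 10):.2f}")
# Even an enormous s cannot beat the bound 1 / (1 - p).
print(f"speedup with s = 1e12: {amdahl_speedup(0.4, 1e12):.2f}")
print(f"bound 1 / (1 - p): {1 / (1 - 0.4):.2f}")
```

The bound explains the answer to the "which instruction do we improve?" question on Page 36: the payoff depends on how much of the total time an instruction accounts for, not on how slow it is in isolation.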

Page 39

Page 40

Improving Performance

° Locality
• Rule of thumb: a program spends 90% of its execution time in only 10% of the code

• Temporal: recently accessed items are likely to be accessed again in the near future

• Spatial: items located near each other tend to be accessed close together in time

° Concurrency
• One of the most important ways to improve performance

• Reduces CPI by overlapping execution

• Threads, instructions, circuits, etc.

Page 41

Evaluating Systems?

Design-time metrics:

° Can it be implemented, in how long, at what cost?

° Can it be programmed? Ease of compilation?

Static Metrics:

° How many bytes does the program occupy in memory?

Dynamic Metrics:

° How many instructions are executed?

° How many bytes does the processor fetch to execute the program?

° How many clocks are required per instruction?

Best Metric: Time to execute the program!

NOTE: this depends on instruction set, processor organization, and compilation techniques.

[Figure: Inst. Count, CPI, Cycle Time]

Page 42

Instruction Set Architecture (ISA)

So what is ISA?

° ISA: an interface between hardware and software
° What is it?
• Assembly language abstraction
• Machine language abstraction

° What does it provide?
• An abstraction of the real computer that hides the details of implementation
- The syntax of computer instructions
- The semantics of instructions
- The execution model
- Programmer-visible computer state

Page 43

Instruction Set Architecture: What Must be Specified?

Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction

° Instruction Format or Encoding
• how is it decoded?

° Location of operands and result
• where other than memory?
• how many explicit operands?
• how are memory operands located?
• which can or cannot be in memory?

° Data type and Size

° Operations
• what are supported

° Successor instruction
• jumps, conditions, branches
• fetch-decode-execute is implicit!

Page 44

Instruction Set Architecture Category

° An ISA defines a processor family
• Two main modern kinds: RISC and CISC
- RISC (load/store): SPARC, MIPS, PowerPC
- CISC (GPR): x86 (also called IA-32)

• Another division: superscalar, vector, VLIW, and EPIC
- Superscalar: all of the above
- Vector: Cray-1
- VLIW: Philips TriMedia
- EPIC: IA-64

° Under the same ISA, there are many different processors
• From different manufacturers:
- x86 from Intel, AMD, and VIA
• Different models:
- 8086, 80386, Pentium, Pentium 4

Page 45

CISC Instruction Sets #1

° Complex Instruction Set Computer: the dominant style through the mid-1980s

° Philosophy
• Add instructions to perform “typical” programming tasks

° Stack-oriented instruction set
• Use the stack to pass arguments and save the program counter
• Explicit push and pop instructions

° Arithmetic instructions can access memory
• addl %eax, 12(%ebx,%ecx,4)
- requires a memory read and a memory write
- complex address calculation

° Condition codes
• Set as a side effect of arithmetic and logical instructions
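What the single CISC instruction `addl %eax, 12(%ebx,%ecx,4)` does can be modelled in a few lines: compute the address 12 + %ebx + %ecx x 4, read memory, add %eax, and write the result back. This is a toy sketch; the register and memory contents are hypothetical, and condition-code updates, 32-bit wraparound, and multi-byte memory layout are not modelled.

```python
def effective_address(disp, base, index, scale):
    """AT&T-syntax operand disp(base, index, scale) -> disp + base + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4, or 8")
    return disp + base + index * scale

# Hypothetical register and memory contents (memory as a word-addressed dict).
regs = {"eax": 7, "ebx": 0x1000, "ecx": 3}
memory = {0x1018: 5}

# addl %eax, 12(%ebx,%ecx,4): address calculation, memory read, add, memory write.
addr = effective_address(12, regs["ebx"], regs["ecx"], 4)
memory[addr] = memory[addr] + regs["eax"]
print(hex(addr), memory[addr])  # 0x1018 12
```

A load/store RISC machine would need separate instructions for the address arithmetic, the load, the add, and the store, which is the contrast drawn on the RISC slides that follow.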

Page 46

CISC Instruction Set #2

° Large number of instructions
• More than 100 instructions

° Instruction execution time varies greatly
• Some instructions perform very complex tasks and take a long time to execute, e.g., copying an entire block

° Variable-length instruction encoding
• IA-32 instructions vary from 1 to 15 bytes

° Implementation artifacts hidden from machine-level programs
• Clean abstraction

Page 47

RISC Instruction Sets #1

° Reduced Instruction Set Computer
• Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

° Fewer, simpler instructions
• Might take more instructions to get a given task done
• Can execute them with small and fast hardware

° Register-oriented instruction set
• Many more (typically 32) registers
• Use registers for arguments, return pointer, temporaries

° Only load and store instructions can access memory

° No condition codes
• Test instructions return 0/1 in a register

Page 48

RISC Instruction Set #2

° Instruction execution time does not vary much
• RISC has no complex operation instructions, e.g., floating-point divide

° Fixed-length encoding
• Easy to decode
• Less compact

° Simple addressing formats
• Only base and displacement addressing

Page 49

Summary

?