Introduction. © Avi Mendelson, 3/2005. MAMAS – Computer Architecture 234367. Dr. Avi Mendelson – Intel Israel. Some of the slides were taken from: (1) Lihu Rapoport, (2) Randi Katz, and (3) Patterson.



General course information

Grade:
– 20% exercises (most likely 5 assignments)
– 80% final exam

Textbooks:
– Computer Architecture: A Quantitative Approach, Hennessy & Patterson (preferably 3rd edition)
– Computer Organization and Design: The Hardware/Software Interface, Patterson & Hennessy

Other course information: see the course web site.


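Notes on the lecture and recitation structure

The lectures and recitations are coordinated with each other; one cannot be understood without the other.

Homework submission:
– In pairs or individually
– Typed work only (not handwritten)

We will use means to detect copying. If copying is detected (in exercises or in the exam), a final grade of 0 (zero) will be given to everyone involved.

In weeks when one of the teaching assistants is absent and/or one of the recitations falls on a holiday (while the other recitations that week are held as usual), the missed material will be made up as soon as possible, but students are advised to join another recitation.

The course material is not identical to what was taught in previous semesters. Therefore:
– If material was not taught (in a lecture or a recitation), it will not be asked about in the exam
– New material is included in the exam material

The slides mix English and Hebrew because they were borrowed from various sources.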


Before we start


The paradigm (Patterson)

Every Computer Scientist should master the “AAA”

Architecture Algorithms Applications


Computer Architecture

The goal of Computer Architecture

To build "cost-effective systems"
– How do we calculate the cost of a system?
– How do we evaluate the effectiveness of the system?

To optimize the system
– What are the optimization points?

Fact: most computer systems still use the Von Neumann principle of operation, even though, internally, they are very different from the computers of that time.

Why Computer Architecture?

We, computer architects, were lucky enough to have a real impact on computer technology.

We need to understand the hardware trends.

Most of our work is within the fields of performance evaluation and the algorithms that are implemented by the hardware.


Introduction


Computer System Structure

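[Figure: block diagram of a computer system. Labeled components: CPU, Cache, CPU bus, Bridge (North, South), Memory, Mem bus, Graphic adapter, Video buffer, I/O bus, LAN adapter, LAN, USB hub, Keyboard, Mouse, Scanner, SCSI/IDE adapter, SCSI bus, Hard disk.]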


Computer systems: how they look in real systems

“North” – CPU + memory subsystem

I/O slots


Class Focus

Performance: how to achieve it and how to measure it

CPU
– CPU design, pipeline, hazards
– Out-of-order and speculative execution

Memory hierarchy
– Main memory
– Cache
– Virtual memory

PC architecture
– Disks I/O

Advanced topics
– Software optimizations

We will not focus on:
– Low-level hardware details
– Parallel and distributed systems (although we mention some of their basic technologies)


Trends in Computer Technologies


Technology Trends

         Capacity          Speed
Logic    2x in 3 years     2x in 3 years
DRAM     4x in 3 years     1.4x in 10 years
Disk     2x in 3 years     1.4x in 10 years

CPU Performance Trends
– Logic speed: 2x per 3 years
– Logic capacity: 2x per 3 years
– Computing capacity: 4x per 3 years, BUT only if we could keep all the transistors busy all the time
– Actual: 3.3x per 3 years
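The arithmetic behind these trends can be sketched as follows. This is a minimal illustration, assuming that speed and capacity growth simply multiply; the helper name `per_year` is made up for this example and is not part of the original slides.

```python
# Growth factors quoted on the slide, each measured over a 3-year period.
logic_speed = 2.0      # 2x per 3 years
logic_capacity = 2.0   # 2x per 3 years
actual = 3.3           # computing capacity actually achieved per 3 years

# Ideal computing capacity: speed and capacity growth compound multiplicatively.
ideal = logic_speed * logic_capacity


def per_year(factor, years=3):
    """Convert a growth factor over `years` into the equivalent yearly factor."""
    return factor ** (1.0 / years)


print(f"ideal:  {ideal:.1f}x per 3 years, about {per_year(ideal):.2f}x per year")
print(f"actual: {actual:.1f}x per 3 years, about {per_year(actual):.2f}x per year")
```

The gap between the two yearly factors reflects the slide's point that not all transistors can be kept busy all the time.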


Technology and Computer Architecture

[Figure: SPECint92/MHz (vertical axis, 0.5 to 2.0) versus clock frequency in MHz (horizontal axis, 50 to 400) for Alpha (21064, 21164), x86 (Pentium, Pentium Pro) and PowerPC processors. Labels on the plot: "Speed demons", "SPECInt92 = 100", "50".]

Source: ISCA 95, p. 174


Can it last forever – or – new challenges are coming

[Figure, two panels.
Power: maximum power in W (log scale, 1 to 100) versus process (1.5, 1.0, 0.8, 0.6, 0.35, 0.25, 0.18 microns) for the 386, 486, Pentium, Pentium MMX, Pentium Pro and Pentium II.
Power density: Watts/cm² (log scale, 1 to 1000) for the i386, i486, Pentium, Pentium Pro, Pentium II and Pentium III processors, with reference levels: hot plate, nuclear reactor, rocket nozzle, Sun's surface.]


Considerations in computer design


Architecture & Microarchitecture

Architecture (ISA, Instruction Set Architecture): the collection of features of a processor (or a system) as they are seen by the "user"
– User: a binary executable running on the processor, or
– an assembly-level programmer

Microarchitecture (µarch, uarch): the collection of features, or the way of implementation, of a processor (or a system) that do not affect the user


Architecture & Microarchitecture Elements

Architecture:
– Register data width (8/16/32)
– Instruction set
– Addressing modes
– Addressing methods (segmentation, paging, etc.)

Microarchitecture:
– Physical memory size
– Cache size and structure
– Number of execution units, number of execution pipelines
– Branch prediction
– TLB

Timing is considered µarch (though it is user visible!)

Processors with the same arch may have different µarch


Compatibility

Backward compatibility
– New hardware can run existing software
– Example: Pentium 4 can run software originally written for Pentium III, Pentium II, Pentium, 486, 386, 286

Forward compatibility
– New software can run on existing hardware
– Example: new software written with MMX™ must still run on older Pentium processors which do not support MMX™
– Less important than backward compatibility

New ideas: architecture independence
– JIT (just-in-time compiler): Java and .NET
– Binary translation


How to compare between different systems?


Benchmarks – Programs for Evaluating Processor Performance

Toy benchmarks
– 10-100 line programs
– e.g.: sieve, puzzle, quicksort

Synthetic benchmarks
– Attempt to match average frequencies of real workloads
– e.g., Whetstone, Dhrystone

Real programs
– e.g., gcc, spice

SPEC: System Performance Evaluation Cooperative
– SPECint (8 integer programs)
– and SPECfp (10 floating point programs)


CPI – to compare systems with same instruction set architecture (ISA)

The CPU is synchronous: it works according to a clock signal.
– Clock cycle is measured in nsec ($10^{-9}$ of a second).
– Clock rate (= 1/clock cycle) is measured in MHz ($10^{6}$ cycles/second).

CPI: cycles per instruction
– Average #cycles per instruction (in a given program):

$$\text{CPI} = \frac{\text{\#cycles required to execute the program}}{\text{\#instructions executed in the program}}$$

– IPC (= 1/CPI): instructions per cycle

Clock rate is mainly affected by technology, CPI by the architecture.

CPI breakdown: how many cycles (on average) the program spends on different causes, e.g., executing, memory, I/O, etc.
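As a small illustration of these definitions, the sketch below computes CPI, IPC and clock rate. The cycle and instruction counts and the 2 ns clock cycle are made-up numbers, and the function names are chosen only for this example.

```python
def cpi(cycles, instructions):
    """Average number of cycles per instruction for one program run."""
    return cycles / instructions


def clock_rate_mhz(clock_cycle_ns):
    """Clock rate in MHz for a clock cycle given in nanoseconds.

    1 / (t * 1e-9 s) Hz equals 1e3 / t MHz.
    """
    return 1e3 / clock_cycle_ns


# Hypothetical measurements of one program run (not taken from the slides).
cycles = 3_000_000
instructions = 2_000_000

program_cpi = cpi(cycles, instructions)
ipc = 1 / program_cpi

print(f"CPI = {program_cpi}, IPC = {ipc:.3f}")
print(f"A 2 ns clock cycle corresponds to {clock_rate_mhz(2.0)} MHz")
```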


CPI (cont.)

$\text{CPI}_i$: #cycles to execute a given type of instruction
– e.g.: $\text{CPI}_{add} = 1$, $\text{CPI}_{mul} = 3$
– Independent of a program

Calculating the CPI of a program
– $IC_i$: #times an instruction of type $i$ was executed in the program
– $IC$: #instructions executed in the program:

$$IC = \sum_{i=1}^{n} IC_i$$

– $F_i$: relative frequency of instructions of type $i$: $F_i = IC_i / IC$
– #cyc: #cycles required to execute the program:

$$\#cyc = \text{CPI} \cdot IC = \sum_{i=1}^{n} \text{CPI}_i \cdot IC_i$$

– CPI:

$$\text{CPI} = \frac{\#cyc}{IC} = \frac{\sum_{i=1}^{n} \text{CPI}_i \cdot IC_i}{IC} = \sum_{i=1}^{n} \text{CPI}_i \cdot \frac{IC_i}{IC} = \sum_{i=1}^{n} \text{CPI}_i \cdot F_i$$

– This calculation does not take into account other delays such as memory, I/O
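The calculation above can be sketched in a few lines. The CPI values for `add` and `mul` are the ones quoted on the slide; the `load` entry and all instruction counts are assumptions invented for this example.

```python
# Cycles per instruction type. add and mul come from the slide;
# the load entry is a hypothetical value added for illustration.
cpi_by_type = {"add": 1, "mul": 3, "load": 2}

# Hypothetical dynamic instruction counts (IC_i) for one program.
ic_by_type = {"add": 600, "mul": 100, "load": 300}


def cpi_from_counts(cpi_by_type, ic_by_type):
    """CPI = #cyc / IC, with #cyc = sum(CPI_i * IC_i) and IC = sum(IC_i)."""
    ic = sum(ic_by_type.values())
    cycles = sum(cpi_by_type[t] * n for t, n in ic_by_type.items())
    return cycles / ic


def cpi_from_frequencies(cpi_by_type, ic_by_type):
    """CPI = sum(CPI_i * F_i), with F_i = IC_i / IC."""
    ic = sum(ic_by_type.values())
    return sum(cpi_by_type[t] * (n / ic) for t, n in ic_by_type.items())


print(cpi_from_counts(cpi_by_type, ic_by_type))
print(cpi_from_frequencies(cpi_by_type, ic_by_type))
```

Both functions compute the same quantity; the second only differs by floating point rounding, which mirrors the two equivalent forms of the CPI formula.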


CPU Time

CPU Time: the time required by the CPU to execute a given program:

$$\text{CPU Time} = \text{clock cycle} \times \#cyc = \text{clock cycle} \times \text{CPI} \times IC$$

Our goal: minimize CPU Time
– Minimize clock cycle: more MHz (process, circuit, µArch)
– Minimize CPI: µArch (e.g.: more execution units)
– Minimize IC: architecture (e.g.: MMX™ technology)

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
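A minimal sketch of the CPU time and speedup formulas follows. All numbers (2 ns clock cycle, CPI values, instruction count) are hypothetical, and the enhancement is assumed to lower only the CPI.

```python
def cpu_time_ns(clock_cycle_ns, cpi, ic):
    """CPU Time = clock cycle * CPI * IC, returned in nanoseconds."""
    return clock_cycle_ns * cpi * ic


def speedup(time_without, time_with):
    """Speedup(E) = ExTime without E / ExTime with E."""
    return time_without / time_with


# Hypothetical program: 1,000,000 instructions on a 2 ns clock.
baseline = cpu_time_ns(clock_cycle_ns=2.0, cpi=1.5, ic=1_000_000)

# Assumed enhancement: a microarchitecture change that lowers CPI from 1.5 to 1.2
# while the clock cycle and the instruction count stay the same.
enhanced = cpu_time_ns(clock_cycle_ns=2.0, cpi=1.2, ic=1_000_000)

print(f"baseline: {baseline / 1e6:.2f} ms, enhanced: {enhanced / 1e6:.2f} ms")
print(f"speedup:  {speedup(baseline, enhanced):.2f}")
```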


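Amdahl's Law

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then, with $\text{Fraction}_{enhanced} = F$ and $\text{Speedup}_{enhanced} = S$:

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$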


Amdahl's Law: Example

• Floating point instructions are improved to run 2x faster, but only 10% of the actual instructions are FP:

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$

Corollary: Make The Common Case Fast
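Amdahl's Law is easy to check numerically. The sketch below reproduces the floating point example and also shows the upper bound when the enhanced part becomes infinitely fast; the function name `amdahl_speedup` is chosen for this example only.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the task is accelerated by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The slide's example: 10% of the instructions are FP and run 2x faster.
fp_example = amdahl_speedup(0.10, 2.0)

# Upper bound: even an infinitely fast FP unit cannot beat 1 / (1 - 0.10).
fp_limit = amdahl_speedup(0.10, float("inf"))

print(f"FP 2x faster:          {fp_example:.3f}")
print(f"FP infinitely faster:  {fp_limit:.3f}")
```

The bound illustrates the corollary: accelerating a rarely used part of the task yields little, however large the local speedup.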


Instruction Set Design

[Figure: the instruction set shown as the interface between software and hardware.]

The ISA is what the user and the compiler see

The ISA is what the hardware needs to implement


Why is the ISA important?

Code size
– Long instructions may take more time to be fetched
– Requires larger memory (important in small devices, e.g., cell phones)

Number of instructions (IC)
– Reducing IC reduces execution time (assuming the same CPI and frequency)

Code "simplicity"
– Simple HW implementation, which leads to higher frequency and lower power
– Code optimization can better be applied to "simple code"


The impact of the ISA

RISC vs CISC


CISC Processors

CISC: Complex Instruction Set Computer

The idea: a high-level machine language

Characteristics
– Many instruction types, with many addressing modes
– Some of the instructions are complex:
   • Perform complex tasks
   • Require many cycles
– ALU operations directly on memory
   • Usually uses a limited number of registers
– Variable-length instructions
   • Common instructions get short codes, which saves code length

Example: x86


CISC Drawbacks

Compilers do not take advantage of the complex instructions and the complex indexing methods

Implementing complex instructions and complex addressing modes
– complicates the processor
– slows down the simple, common instructions
– contradicts the corollary of Amdahl's law: Make The Common Case Fast

Variable-length instructions are a real pain in the neck:
– It is difficult to decode several instructions in parallel
   • As long as an instruction is not decoded, its length is unknown
   • It is unknown where the instruction ends
   • It is unknown where the next instruction starts
– An instruction may not fit the "right behavior" of the memory hierarchy (will be discussed in the next lectures)
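The decoding problem can be illustrated with a toy model. The encoding below is purely hypothetical (the first byte of each instruction holds its total length in bytes) and is not x86 or any real ISA; it only shows why instruction boundaries must be found serially for variable-length code but can be computed directly for fixed-length code.

```python
def boundaries_variable(code):
    """Find instruction start offsets in a toy variable-length encoding.

    Assumed toy format: the first byte of every instruction is its total length.
    Each start offset depends on decoding the previous instruction first.
    """
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]
        if length == 0:
            raise ValueError(f"invalid zero-length instruction at offset {pc}")
        starts.append(pc)
        pc += length  # the next start is known only after reading this length
    return starts


def boundaries_fixed(code, width=4):
    """With fixed-length instructions every start offset is known up front."""
    return list(range(0, len(code), width))


# Four toy instructions with lengths 1, 3, 2 and 1 bytes.
variable_code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])
# Two toy fixed-length instructions of 4 bytes each.
fixed_code = bytes(8)

print(boundaries_variable(variable_code))
print(boundaries_fixed(fixed_code))
```

In the fixed-length case the offsets are independent of the instruction contents, so several decoders could start working at once.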


RISC Processors

RISC: Reduced Instruction Set Computer

The idea: simple instructions enable fast hardware

Characteristics
– A small instruction set, with only a few instruction formats
– Simple instructions
   • Execute simple tasks
   • Require a single cycle (with pipeline)
– A few indexing methods
– ALU operations on registers only
   • Memory is accessed using Load and Store instructions only
   • Many orthogonal registers
   • Three-address machine: Add dst, src1, src2
– Fixed-length instructions

Examples: MIPS™, Sparc™, Alpha™, PowerPC™


RISC Processors (Cont.)

Simple architecture, simple micro-architecture
– Simple, small and fast control logic
– Simpler to design and validate
– Room for on-die caches: instruction cache + data cache
   • Parallelize data and instruction access
– Shorter time-to-market

Using a smart compiler
– Better pipeline usage
– Better register allocation

Existing RISC processors are not "pure" RISC
– e.g., they support division, which takes many cycles


RISC and Amdahl's Law (Example)

Compared to the CISC architecture:
– 10% of the static code, which executes 90% of the dynamic code, has the same CPI
– 90% of the static code, which is only 10% of the dynamic code, has its CPI increased by 60%
– The number of instructions executed increases by 50%
– The speed of the processor is doubled
   • This was true at the time the RISC processors were invented

We get

$$\frac{\text{CPI}_{new}}{\text{CPI}_{old}} = (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} = 0.9 + 0.1 \times 1.6 = 1.06$$

And then

$$\text{Speedup}_{overall} = \frac{\text{CPU Time}_{old}}{\text{CPU Time}_{new}} = \frac{\text{clock}_{old} \times \text{CPI}_{old} \times IC_{old}}{\text{clock}_{new} \times \text{CPI}_{new} \times IC_{new}} = \frac{2}{1.06 \times 1.5} = 1.26$$
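The numbers in this example can be re-derived step by step. The sketch below expresses everything as new/old ratios; the variable names are chosen for this illustration, and "speed doubled" is read as the clock cycle being halved.

```python
# Ratios new/old taken from the example on the slide.
# 90% of the dynamic code keeps its CPI; 10% has a CPI that is 60% higher.
cpi_ratio = 0.9 * 1.0 + 0.1 * 1.6

# 50% more instructions are executed.
ic_ratio = 1.5

# The processor speed is doubled, so the clock cycle is halved.
clock_cycle_ratio = 0.5

# CPU Time = clock cycle * CPI * IC, so the ratios simply multiply.
time_ratio = clock_cycle_ratio * cpi_ratio * ic_ratio
speedup_overall = 1.0 / time_ratio

print(f"CPI_new / CPI_old = {cpi_ratio:.2f}")
print(f"overall speedup   = {speedup_overall:.2f}")
```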


So, what is better, RISC or CISC?

Today CISC architectures (x86) run as fast as RISC (or even faster)

The main reasons are:
– CISC instructions are translated into RISC instructions (ucode)
– CISC architectures use a "RISC-like engine"

We will discuss these kinds of solutions later in this course.


Virtual machines (JAVA)

Machine-independent ISA
– Can be run on different architectures
– Each architecture has an emulation (virtual machine) that forms a "system within the system"

The code can be compiled to native code "on the fly"
– This process is called JIT: Just-In-Time

.NET allows combining different formats of code
– e.g., different programming languages

Pros
– Portability, flexibility

Cons
– Efficiency
– The JIT can apply only very basic optimizations


backup


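Integrated Circuits Costs

$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer\_diam} / 2)^2}{\text{Die\_Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}} - \text{Test dies}$$

$$\text{Die yield} = \text{Wafer yield} \times \left\{ 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{-\alpha}$$

Die cost goes roughly with $(\text{die area})^4$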


Real World Examples

Chip          Metal    Line width   Wafer    Defects   Area    Dies/   Yield   Die
              layers   (microns)    cost     /cm2      (mm2)   wafer           cost
386DX         2        0.90         $900     1.0       43      360     71%     $4
486DX2        3        0.80         $1200    1.0       81      181     54%     $12
PowerPC 601   4        0.80         $1700    1.3       121     115     28%     $53
HP PA 7100    3        0.80         $1300    1.0       196     66      27%     $73
DEC Alpha     3        0.70         $1500    1.2       234     53      19%     $149
SuperSPARC    3        0.70         $1700    1.6       256     48      13%     $272
Pentium       3        0.80         $1500    1.5       296     40      9%      $417

– From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
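The die cost column can be cross-checked against the relation Die cost = Wafer cost / (Dies per wafer × Die yield). The sketch below uses only the wafer cost, dies per wafer, yield and die cost figures from the table; the function and variable names are chosen for this example, and small differences are expected because the yields in the table are rounded.

```python
# (chip, wafer cost in $, dies per wafer, die yield, quoted die cost in $)
chips = [
    ("386DX", 900, 360, 0.71, 4),
    ("486DX2", 1200, 181, 0.54, 12),
    ("PowerPC 601", 1700, 115, 0.28, 53),
    ("HP PA 7100", 1300, 66, 0.27, 73),
    ("DEC Alpha", 1500, 53, 0.19, 149),
    ("SuperSPARC", 1700, 48, 0.13, 272),
    ("Pentium", 1500, 40, 0.09, 417),
]


def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Die cost = wafer cost / (dies per wafer * die yield)."""
    return wafer_cost / (dies_per_wafer * die_yield)


for name, wafer_cost, dies, die_yield, quoted in chips:
    computed = die_cost(wafer_cost, dies, die_yield)
    print(f"{name:12s} computed ${computed:7.2f}   quoted ${quoted}")
```

The computed values track the quoted ones closely, and they show how strongly the cost grows as the die area increases and the yield drops.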