1 lecture 6 processor technology. 2 advance in hardware intel family: (8086/1978 -- pentium ii/1998)...
![Page 1: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/1.jpg)
1
Lecture 6: Processor Technology
![Page 2: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/2.jpg)
2
Advance in Hardware
Intel family (8086/1978 to Pentium II/1998): exponential performance improvement over time
• number of transistors: increased almost 260 times (29 K -> 7.5 M)
• clock rate: 45 times (10 MHz -> 450 MHz)
![Page 3: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/3.jpg)
3
Moore’s Law (1965)
• The number of transistors on a microchip doubles about every 18-24 months, assuming the price of the chip stays the same.
• The speed of a microprocessor doubles about every 18-24 months, assuming the price stays the same.
• The price of a microchip drops about 48% every 18-24 months, assuming the performance metric (processor speed or memory capacity) of the chip stays the same.
![Page 4: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/4.jpg)
4
Milestones of Chip Density
[Figure: number of transistors per chip (log scale, 1K to 1G) vs. year (1970 to 2002). Microprocessor and logic (•): 4004, 8080, 8085, 8086, 68000, 80286, 68020, 80386, LSI Logic gate array, 80486, IBM gate array, Pentium, Pentium Pro (MPU only; MPU and cache memory chip), P7. Memory, DRAM (▲): 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G.]
Memory increase = 1.5/year; MPU increase = 1.35/year
Source: ICE
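The two yearly growth factors quoted on the chart can be converted into doubling times, which makes them easier to compare with the 18-24 month figure on the Moore's Law slide. This is only a small sketch in Python; the helper name `doubling_time_months` is made up for illustration, and it assumes the growth is a constant factor per year.

```python
import math

def doubling_time_months(growth_per_year: float) -> float:
    """Months needed to double at a constant yearly growth factor."""
    return 12 * math.log(2) / math.log(growth_per_year)

memory_months = doubling_time_months(1.5)   # DRAM: 1.5x per year (from the chart)
mpu_months = doubling_time_months(1.35)     # microprocessors: 1.35x per year (from the chart)

print(f"memory doubles roughly every {memory_months:.1f} months")
print(f"MPU doubles roughly every {mpu_months:.1f} months")
```

Under this assumption the memory curve doubles in a little over 20 months and the microprocessor curve in a little under 28 months.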
![Page 5: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/5.jpg)
5
Outline
• Instruction Set Architecture (ISA)
• Pipelining Concepts
• Processor Technology: CISC, RISC, superscalar, VLIW
• Case Study
• Future processor
![Page 6: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/6.jpg)
6
Part 1: Instruction Set Architecture (ISA)
![Page 7: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/7.jpg)
7
Computer Architecture’s Changing Definition
• 1950s to 1960s: Computer Arithmetic
• 1970s to mid 1980s: Instruction Set Design, especially ISA appropriate for compilers
• 1990s: Design of CPU, memory system, I/O system, Multiprocessors
![Page 8: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/8.jpg)
8
Instruction Set Architecture (ISA)
[Figure: the instruction set as the interface between software and hardware]
![Page 9: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/9.jpg)
9
Interface Design
A good interface:
• Lasts through many implementations (portability, compatibility)
• Is used in many different ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
[Figure: one interface over time, with several uses on one side and successive implementations (imp 1, imp 2, imp 3) on the other]
![Page 10: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/10.jpg)
10
What Operations Should be in Instruction Set?
• How many are possible?
• Which ones do we need?
• Circuit complexity?
• How frequently is each used?
• How much slower would each be, if implemented in terms of simpler ones?
![Page 11: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/11.jpg)
11
Typically include:
• ALU (25-40% frequency of use)
• Data transfer (~15-40%)
• Control flow (~15-25%)
• System (~2%)
• Floating point (~15%)
• Decimal and string (~15%)
![Page 12: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/12.jpg)
12
4 types of control-flow operations (calls and returns are counted together below):
• Conditional branches (Branches): ~73%
• Unconditional branches (Jumps): ~14%
• Procedure calls + returns (Jumps): ~13%
![Page 13: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/13.jpg)
13
Data types and sizes?
• How many are possible?
• Which ones do we need?
• How frequently are they used?
• How much slower if implemented in software?
![Page 14: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/14.jpg)
14
Data Types
• Integer: short, long, extra long.
• Floating-point: single-, double-, quad-precision.
• Characters: char, strings.
• Bit fields.
• Binary coded decimal.
![Page 15: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/15.jpg)
15
Other Issues
What are the most common accesses (profile) ?
What should the instruction format be ?
![Page 16: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/16.jpg)
16
Conflicting goals:
• Code compactness: fewer instructions in the program (at machine level, after compilation), so less memory and less instruction-fetch bandwidth.
• Easy decoding: want a fixed format, so instruction decode is less expensive and faster.
![Page 17: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/17.jpg)
17
Ways to get code compactness: (An ideal case)
Huffman encoding, e.g.:
• 50% 'A': "0"
• 25% 'B': "10"
• 12.5% 'C': "110"
• 12.5% 'D': "111"
Variable length according to frequency. Easy to implement? Cost?
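To make the Huffman example concrete, the sketch below derives the code lengths for the four symbol frequencies on this slide and compares the average length with a fixed 2-bit encoding. It is an illustrative Python sketch, not part of the lecture; the function name `huffman_code_lengths` is an assumption, and it computes code lengths only, not the actual bit patterns.

```python
import heapq

def huffman_code_lengths(freqs: dict[str, float]) -> dict[str, int]:
    """Return the Huffman code length (in bits) for each symbol."""
    # Heap entries: (total frequency, unique tie-breaker, symbols in this subtree)
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper in the tree.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, counter, syms1 + syms2))
        counter += 1
    return lengths

freqs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = huffman_code_lengths(freqs)
average_bits = sum(freqs[s] * lengths[s] for s in freqs)

print(lengths)       # code length per symbol
print(average_bits)  # average bits per symbol, versus 2 bits for a fixed-length code
```

With these frequencies the lengths come out as 1, 2, 3 and 3 bits, matching the codes on the slide, for an average of 1.75 bits per symbol. The price is the variable-length decode that the slide's closing questions point at.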
![Page 18: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/18.jpg)
18
Evolution of Instruction Sets
Design decisions must take into account:
• technology
• machine organization
• programming languages
• compiler technology
• operating systems
And they in turn influence these.
![Page 19: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/19.jpg)
19
Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Inst Count CPI Clock Rate
Program X
Compiler X (X)
Inst. Set. X X
Organization X (X) X
Technology X
![Page 20: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/20.jpg)
20
Cycles Per Instruction (CPI)
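$$\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
“Instruction Frequency”:
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}}$$
Invest resources where time is spent!
“Average Cycles per Instruction”:
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$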
![Page 21: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/21.jpg)
21
Example: Calculating CPI
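Typical mix, base machine (Reg / Reg):

| Op | Freq | Cycles | CPI(i) | (% Time) |
|---|---|---|---|---|
| ALU | 50% | 1 | .5 | (33%) |
| Load | 20% | 2 | .4 | (27%) |
| Store | 10% | 2 | .2 | (13%) |
| Branch | 20% | 2 | .4 | (27%) |
| Total | | | 1.5 | |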
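The numbers in this example can be reproduced in a few lines. The sketch below (Python, for illustration only) recomputes the overall CPI and the share of time per instruction class from the frequencies and cycle counts given on the slide. The helper `cpu_time_seconds` and the instruction count and clock rate used with it are hypothetical values chosen just to exercise the CPU time formula.

```python
# (frequency, cycles) per instruction class, taken from the typical-mix example
mix = {
    "ALU":    (0.50, 1),
    "Load":   (0.20, 2),
    "Store":  (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI = sum over classes of CPI_i * F_i
cpi = sum(freq * cycles for freq, cycles in mix.values())

# Fraction of execution time spent in each class
time_share = {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}

def cpu_time_seconds(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x CPI x cycle time (cycle time = 1 / clock rate)."""
    return instruction_count * cpi / clock_hz

print(f"CPI = {cpi:.2f}")
for op, share in time_share.items():
    print(f"{op}: {share:.0%} of time")

# Hypothetical workload: one billion instructions on a 450 MHz clock
print(f"{cpu_time_seconds(10**9, cpi, 450e6):.2f} s")
```

Although ALU operations are half of the instructions, they account for only about a third of the time, which is the point behind "invest resources where time is spent".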
![Page 22: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/22.jpg)
22
Part 2: Pipelining
![Page 23: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/23.jpg)
23
Pipelining: It’s Natural!
Laundry Example
• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
![Page 24: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/24.jpg)
24
Sequential Laundry
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
[Figure: task order (A, B, C, D) vs. time, 6 PM to midnight; each load takes 30 + 40 + 20 minutes, one load after another]
![Page 25: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/25.jpg)
25
Pipelined Laundry: (Start work ASAP)
Pipelined laundry takes 3.5 hours for 4 loads
[Figure: task order (A, B, C, D) vs. time from 6 PM; the overlapped loads span intervals of 30, 40, 40, 40, 40 and 20 minutes]
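The 6-hour and 3.5-hour figures can be checked with a tiny simulation. The Python sketch below is an illustration under two assumptions that the slides do not state explicitly: each machine handles one load at a time, and a finished load can wait (for example in a basket) until the next machine is free. The function names are made up for this example.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stage_minutes)

def pipelined_minutes(stage_minutes, loads):
    """Each load enters a stage as soon as both the load and the machine are free."""
    stage_free_at = [0] * len(stage_minutes)  # time at which each machine is next free
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(t, stage_free_at[s])
            t = start + minutes
            stage_free_at[s] = t
        finish = t
    return finish

stages = [30, 40, 20]  # washer, dryer, folder
print(sequential_minutes(stages, 4) / 60)  # hours, sequential
print(pipelined_minutes(stages, 4) / 60)   # hours, pipelined
```

The pipelined total is 30 + 4 x 40 + 20 minutes: after the first wash, the 40-minute dryer sets the pace.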
![Page 26: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/26.jpg)
26
Computer Pipelining
Overlapping the execution of instructions:
• Instruction fetch (IF)
• Instruction decode (ID)
• Execute (EX)
• Write back (WB)
In each cycle, some operation (e.g., IF, ID, EX) is performed on every instruction in the pipeline.
![Page 27: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/27.jpg)
27
Computer Pipelining
• Pipelining increases the throughput
• Throughput = number of instructions executed in a given time period
• Hence, it reduces the average execution time per instruction (or CPI).
![Page 28: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/28.jpg)
28
Pipelining Speedup:
• k-stage linear pipeline, n tasks.
• Pipelined = k + (n - 1) cycles.
• Unpipelined = n x k cycles.
• See the laundry example.
Speedup:
$$S_k = \frac{nk}{k + n - 1}$$
$$S_k \to k \quad \text{as} \quad n \to \infty$$
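A quick numerical check shows how the speedup approaches k as the number of tasks grows. This is a minimal Python sketch of the formula on this slide; the function name `pipeline_speedup` and the choice of k = 5 are only for illustration.

```python
def pipeline_speedup(k: int, n: int) -> float:
    """Speedup of a k-stage linear pipeline over an unpipelined unit for n tasks."""
    unpipelined_cycles = n * k
    pipelined_cycles = k + (n - 1)
    return unpipelined_cycles / pipelined_cycles

for n in (1, 4, 100, 10_000):
    print(n, round(pipeline_speedup(5, n), 3))
```

A single task gains nothing (speedup 1), four tasks give 2.5, and only for very long instruction streams does the speedup get close to the stage count of 5.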
![Page 29: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/29.jpg)
29
Speedup (explanation)
At time t=0, the first pipeline operation enters the pipe.
After k pipeline clock cycles, the 1st result exits.
Then, 1 result exits per clock cycle.
![Page 30: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/30.jpg)
30
Pipelining Lessons
• Doesn’t help latency of a single task, but helps throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operating simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
[Figure: pipelined laundry, task order (A, B, C, D) vs. time from 6 PM to 9; intervals of 30, 40, 40, 40, 40 and 20 minutes]
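As a worked illustration of the last two lessons, the laundry numbers (30, 40 and 20 minute stages) can be put into a speedup expression for n loads. This derivation is not on the slides; it assumes, as in the laundry pictures, that the 40-minute dryer is the bottleneck once the first load has been washed.

$$\frac{T_{\text{sequential}}}{T_{\text{pipelined}}} = \frac{90\,n}{30 + 40\,n + 20} = \frac{90\,n}{40\,n + 50}$$

$$n = 4:\ \frac{360}{210} \approx 1.71 \qquad\qquad n \to \infty:\ \frac{90}{40} = 2.25 < 3$$

With three perfectly balanced 30-minute stages the limit would be 3, the number of stages; the unbalanced stages cap it at 2.25.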
![Page 31: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/31.jpg)
31
Computer Pipelines
• Execute billions of instructions, so throughput is what matters
• Desirable features: all instructions same length, registers located in same place in instruction format, memory operands only in loads or stores
![Page 32: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/32.jpg)
32
MIPS example: 5 Stage Pipelining
[Figure: 5-stage datapath, in order: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled registers: IR, LMD]
![Page 33: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/33.jpg)
33
Visualizing Pipelining
[Figure: pipeline diagram, instruction order vs. time (clock cycles)]
![Page 34: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/34.jpg)
34
Limits to pipelining: Hazards!
Hazards prevent the next instruction from executing during its designated clock cycle.
• Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
• Data hazards: instruction depends on the result of a prior instruction still in the pipeline (missing sock)
• Control hazards: pipelining of branches and other instructions that change the flow of control; the pipeline stalls until the hazard is resolved, leaving “bubbles” in the pipeline
![Page 35: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/35.jpg)
35
One Memory Port/Structural Hazards
[Figure: pipeline diagram, instruction order vs. time (clock cycles), for Load, Instr 1, Instr 2, Instr 3, Instr 4; annotation “Use reg A” appears twice]
![Page 36: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/36.jpg)
36
One Memory Port/Structural Hazards
[Figure: pipeline diagram, instruction order vs. time (clock cycles), for Load, Instr 1, Instr 2, stall, Instr 3; annotation: wait for one cycle]
![Page 37: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/37.jpg)
37
Data Hazard on R1
[Figure: pipeline diagram, instruction order vs. time (clock cycles), with stages IF, ID/RF, EX, MEM, WB]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
All the following instructions wait for the result in r1.
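The dependences on r1 in the five instructions on this slide can be found mechanically by tracking which instruction last wrote each register. The Python sketch below is only an illustration: the tuple representation of instructions and the name `raw_dependences` are assumptions, and it reports register dependences, not whether a given pipeline would actually have to stall for them.

```python
# Each instruction: (opcode, destination register, source registers)
program = [
    ("add", "r1",  ("r2", "r3")),
    ("sub", "r4",  ("r1", "r3")),
    ("and", "r6",  ("r1", "r7")),
    ("or",  "r8",  ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]

def raw_dependences(instructions):
    """Return (producer index, consumer index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}  # register -> index of the most recent instruction that writes it
    for j, (_, dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], j, reg))
        last_writer[dest] = j
    return deps

deps = raw_dependences(program)
for producer, consumer, reg in deps:
    print(f"{program[consumer][0]} reads {reg} written by {program[producer][0]}")
```

All four later instructions depend on the `add`, which is why the slide singles out r1.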
![Page 38: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/38.jpg)
38
3 Generic “Data Hazards”
Assume InstrI followed by InstrJ
• Read After Write (RAW): InstrJ tries to read operand before InstrI writes it
![Page 39: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/39.jpg)
39
Read After Write (RAW)
[Figure: Inst i and Inst j each pass through stages 1 to 5; Inst i writes in a late stage, Inst j reads in an early stage, so Inst j reads the old data.]
![Page 40: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/40.jpg)
40
Write After Read (WAR)
InstrJ tries to write operand before InstrI reads it
• Gets wrong operand
• Can’t happen in MIPS 5 stage pipeline because:
  • All instructions take 5 stages, and
  • Reads are always in stage 2, and
  • Writes are always in stage 5
![Page 41: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/41.jpg)
41
Write After Read (WAR)
[Figure: Inst i and Inst j each pass through stages 1 to 5; Inst i reads in an early stage, Inst j writes in a late stage, so Inst i always reads the correct data.]
![Page 42: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/42.jpg)
42
Write After Write (WAW)
InstrJ tries to write operand before InstrI writes it
• Leaves wrong result (InstrI, not InstrJ)
• Can’t happen in MIPS 5 stage pipeline because:
  • All instructions take 5 stages, and
  • Writes are always in stage 5
![Page 43: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/43.jpg)
43
Write After Write (WAW)
[Figure: Inst i and Inst j each pass through stages 1 to 5; both write in the same late stage, so Inst i writes before Inst j.]
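The argument for why only RAW hazards arise in the 5-stage pipeline comes down to comparing cycle numbers. The sketch below (Python, illustrative only) uses the stage numbers given on the WAR and WAW slides, reads in stage 2 and writes in stage 5, and assumes InstrJ issues a fixed number of cycles after InstrI with no stalls. The function name `hazards` is made up. When a read and a write fall in the same cycle, the sketch treats it as not a hazard, which is an assumption about the register file rather than something the slides state.

```python
READ_STAGE = 2   # register read happens in stage 2
WRITE_STAGE = 5  # register write happens in stage 5

def hazards(distance: int) -> dict[str, bool]:
    """Which hazards are possible when InstrJ issues `distance` cycles after InstrI.

    InstrI is in stage s during cycle s; InstrJ is in stage s during cycle s + distance.
    """
    read_i, write_i = READ_STAGE, WRITE_STAGE
    read_j, write_j = READ_STAGE + distance, WRITE_STAGE + distance
    return {
        "RAW": read_j < write_i,   # J reads before I has written
        "WAR": write_j < read_i,   # J writes before I has read
        "WAW": write_j < write_i,  # J writes before I has written
    }

for d in range(1, 5):
    print(d, hazards(d))
```

For every positive distance the WAR and WAW conditions are false, because InstrJ's write always comes later than both InstrI's read and InstrI's write; only RAW can be true, and only for the closest following instructions.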
![Page 44: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/44.jpg)
44
Part 3: Processor Technology
![Page 45: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/45.jpg)
45
Evolution of Processor Design
• Single Accumulator (EDSAC 1950)
• Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
• Separation of Programming Model from Implementation
  • High-level Language Based (B5000 1963)
  • Concept of a Family (IBM 360 1964)
• General Purpose Register Machines
  • Complex Instruction Sets (Vax, Intel 432 1977-80)
  • Load/Store Architecture (CDC 6600, Cray 1 1963-76)
• RISC (Mips, Sparc, HP-PA, IBM RS6000, . . . 1987)
• Mixed (1998)
![Page 46: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/46.jpg)
46
CISC: Complex Instruction Set Computer.
• ISA with more than 300 instructions
• variable instruction/data formats
• small set of 8 to 24 general-purpose registers
• allows many memory reference operations (addressing modes)
• CPI: 1 to 20 cycles, average CPI: 4 cycles
• Examples: Intel x86 series (Pentium, Pentium Pro, Pentium II), Motorola M680X0, Digital VAX 8600, IBM 390, AMD 486, Cyrix 686
![Page 47: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/47.jpg)
47
Intel CPU Family

| Year | Model | Features |
|---|---|---|
| 1981 | Intel 8088 | 16-bit, 29K, max speed 10 MHz |
| 1982 | Intel 80286 | 16-bit, 130K, max speed 12 MHz |
| 1985 | Intel 80386 | 32-bit, 275K, max speed 20 MHz |
| 1989 | Intel 80486 | 32-bit, 2M, 25 MHz |
| 1993- | Intel Pentium | 3.11M, 66 MHz, CMOS, 2-issue; 266 MHz with MMX (16/16 KB cache) |
| 1995- | Intel Pentium Pro (Socket 8) | 5.5M, 200 MHz, 8/8 KB cache, Bi-CMOS, 64-bit data bus, 3-issue, no MMX |
| 1997- | Intel Pentium II (Slot 1) | 16/16 KB L1 cache, 256/512 KB L2 cache, 300 MHz, 32-bit, 3-issue, with MMX |
| 1998 | | 512 KB L2 (half speed), 450 MHz, 7.5 M transistors |
| 1998 | Intel Pentium II Xeon | 512 KB L2 (full speed), 450 MHz, 32-bit |
![Page 48: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/48.jpg)
48
RISC: Reduced Instruction Set Computer.
Observation:
• Only 25% of a complex instruction set is frequently used (about 95% of the time)
• 75% of the hardware-supported instructions are rarely used
• All instructions are of the same length
• Push rarely used instructions into software
• Adding cache and Floating Point Units (FPU) in processor chips
![Page 49: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/49.jpg)
49
RISC: other features
• Instruction set: less than 100 instructions
• Fixed 32- or 64-bit instruction format
• Only 3 to 5 simple addressing modes
• Single address mode for load/store: base + displacement, no indirection
• Simple branch conditions
![Page 50: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/50.jpg)
50
RISC
• Large register files: 32 integer registers + 32 floating point registers, some have over 100
• Execute the majority of instructions in one cycle (average CPI: 1.5)
• Higher clock rate
• Easy for compiler optimization
![Page 51: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/51.jpg)
51
Examples of RISC processors
• SUN: SPARC, MicroSPARC, SuperSPARC, UltraSPARC
• MIPS: R2000/3000/4000/5000/8000, R10000
• INTEL: i860
• Digital: Alpha 21164, 21264, 21364
• IBM, Apple, Motorola: PowerPC 601, 603, 604e, 620, 630
• IBM: POWER2 (SP2), POWER3 (ASCI Blue Pacific)
• HP: HP PA-RISC PA-8000
![Page 52: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/52.jpg)
52
Example: MIPS
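[Figure: 32-bit instruction formats, bit 31 on the left and bit 0 on the right]
• Register-Register: Op [31:26], Rs1 [25:21], Rs2 [20:16], Rd [15:11], Opx [10:0] (with a field boundary marked between bits 6 and 5)
• Register-Immediate: Op [31:26], Rs1 [25:21], Rd [20:16], immediate [15:0]
• Branch: Op [31:26], Rs1 [25:21], Rs2/Opx [20:16], immediate [15:0]
• Jump / Call: Op [31:26], target [25:0]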
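Fixed field positions are what make these formats easy to decode: every field is a shift and a mask. The Python sketch below illustrates this for three of the formats, using the bit positions read from the figure on this slide. The function names are made up, and treating Opx as the whole range [10:0] is an assumption, since the figure marks an extra boundary inside that range.

```python
def bits(word: int, high: int, low: int) -> int:
    """Extract bits high..low (inclusive, bit 0 = least significant) from a 32-bit word."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)

def decode_register_register(word: int) -> dict[str, int]:
    return {
        "op":  bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rs2": bits(word, 20, 16),
        "rd":  bits(word, 15, 11),
        "opx": bits(word, 10, 0),   # assumption: Opx taken as the full 11-bit range
    }

def decode_register_immediate(word: int) -> dict[str, int]:
    return {
        "op":  bits(word, 31, 26),
        "rs1": bits(word, 25, 21),
        "rd":  bits(word, 20, 16),
        "imm": bits(word, 15, 0),
    }

def decode_jump(word: int) -> dict[str, int]:
    return {"op": bits(word, 31, 26), "target": bits(word, 25, 0)}

# Example word built by hand: op=0, rs1=2, rs2=3, rd=1, opx=0x20
example = (2 << 21) | (3 << 16) | (1 << 11) | 0x20
print(decode_register_register(example))
```

Because Op and Rs1 sit in the same place in every format, a decoder can start reading registers before it knows which format it is looking at.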
![Page 53: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/53.jpg)
53
Advanced Pipelining
[Figure: successive instructions vs. cycle for four pipeline organizations]
(a) Single-issue base pipeline
(b) 3-issue superscalar pipeline
(c) Single-issue superpipeline
(d) 3-issue superscalar superpipeline
![Page 54: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/54.jpg)
54
Superscalar Processors (RISC+CISC)
• Multiple instructions issued per cycle (IPC > 1).
• Clock rate matches that of generic scalar RISC.
• CPI is lower than generic scalar RISC.
Examples:
• Alpha 21064 (2-issue), 21164 (4-issue)
• PowerPC: 604e (4), 620 (4)
• HP PA-7200 (2), PA-8000 (4)
• MIPS: R5000 (2), R10000 (4)
• SUN: MicroSPARC (2), UltraSparc-2 (4)
• INTEL i860 (RISC, 2-issue), Pentium (CISC, 2), Pentium Pro (3), Pentium II (3)
![Page 55: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/55.jpg)
55
SUN SPARC (RISC)
• Scalable Processor ARChitecture, a specification, not a chip.
• Larger register set: SPARC 128-144.
• Generations: SPARC (1987), MicroSPARC, SuperSparc (1993), UltraSparc (1995), UltraSPARC II, Ultra III, Ultra IV, ..
• Machines:
  • CM-5: SPARC 33 MHz.
  • CS-2: SuperSparc 40 MHz (Viking).
  • SUN Sunfire (Enterprise 1000): UltraSparc
![Page 56: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/56.jpg)
56
UltraSPARC Roadmap
• UltraSPARC II: 64-bit, 0.25 micron (same as Pentium II, AMD K6-2), max 360 MHz, 30 W (400 MHz later in 1998)
• UltraSPARC III: 600 MHz late 1999, 1000 MHz
• UltraSPARC IV: mid-2000, 1000 MHz, 0.15 micron, Sun’s first copper-based chip
• UltraSPARC V: 1500 MHz, 0.07 micron, by 2002
![Page 57: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/57.jpg)
57
PowerPC (RISC)
• 1991, by Apple, IBM, and Motorola.
• OS: IBM AIX, Apple Mac OS, NetWare 4, OS/2, Sun Solaris, Windows NT, and MS-DOS.
• Technology update: September 1998: IBM 400-MHz PowerPC (copper wiring)
![Page 58: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/58.jpg)
58
PowerPC Family:
• First generation: PowerPC 601: desktop PCs.
• The 2nd generation:
  • PowerPC 603 (603e, 166 MHz, 3 W, 81 mm2): portable and battery-powered applications.
  • PowerPC 604 (604e, 5.6 M transistors, power 10 W, dynamic branch prediction logic, 4-issue, 6-stage): sophisticated PCs and servers.
  • PowerPC 620: integrated L2 controller and dedicated cache interface, 4-issue, 5-stage, 30 W, used in servers or supercomputers.
![Page 59: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/59.jpg)
59
PowerPC 3rd generation: G3
• L2 cache support, new bus architecture
• 32-bit processor, used in iMac (1998)
• 0.25-micron, 67 mm2, 250 MHz, 5 W, 6.35 M transistors.
• 5 execution units (similar to 603e): 1 floating point unit, 1 branch unit, 1 load/store unit, 2 single-cycle integer units (603e only 1), 1 system unit
![Page 60: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/60.jpg)
60
PowerPC 3rd generation: G3
• 4-stage pipeline: fetch, decode-dispatch, execute, complete-writeback
• fetch unit fetches 4 instructions per clock
• peak rate: complete 3 instructions per clock
• Two 32 KB on-chip L1 caches (data + instruction): same as 604e, 8-way set associative
![Page 61: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/61.jpg)
61
PowerPC 3rd generation: G3
On-chip L2 cache: 2-way set associative of sizes 256 KB, 512 KB or 1 MB
Performance: 250 MHz CPU clock, 50 MHz system bus, half-speed 1-MB L2 cache: 10 SPECint95
Bus protocol: MEI (modified, exclusive, invalid), used for single or dual-processor design
![Page 62: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/62.jpg)
62
IBM POWER
• Performance Optimized With Enhanced RISC
• POWER2, 66.7 MHz, 6-issue, used in IBM SP2
  • 4 floating-point operations in one cycle.
  • peak performance: 266 Mflops (66.7 x 4).
• ASCI Blue Pacific: using POWER3
  • 3.9 trillion calculations per second
  • 15,000 times faster than the desktop PC
  • at Lawrence Livermore National Lab.
  • 2.6 trillion bytes of memory
  • 1 second = 63,000 years using hand calculator
  • 4096 POWER3 CPUs (8 CPUs per node)
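The peak-performance figure for POWER2 and the hand-calculator comparison for ASCI Blue Pacific are both simple arithmetic on numbers from this slide. The short Python check below is only a back-of-the-envelope sketch; the 365.25-day year is an assumption made here, not a value from the lecture.

```python
# POWER2: peak rate = clock rate x floating-point operations per cycle
clock_mhz = 66.7
flops_per_cycle = 4
peak_mflops = clock_mhz * flops_per_cycle
print(f"POWER2 peak: {peak_mflops:.1f} Mflops")

# ASCI Blue Pacific: one second of work vs. 63,000 years with a hand calculator
calculations_per_second = 3.9e12
seconds_per_year = 365.25 * 24 * 3600   # assumption: average year length
hand_seconds = 63_000 * seconds_per_year
hand_rate = calculations_per_second / hand_seconds
print(f"implied hand-calculator rate: {hand_rate:.2f} calculations per second")
```

The product is 266.8, which the slide rounds down to 266 Mflops, and the comparison implies a person doing roughly two calculations per second without pause.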
![Page 63: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/63.jpg)
63
POWER3: Superscalar RISC
• 8-issue (most other processors are 4-issue)
• 200 MHz (slow but fast), 0.25 micron, 5 metal layers, 1088 pins
• IBM’s first 64-bit microprocessor.
• Used in ASCI Blue Pacific, 4096 nodes
• Memory subsystem: 6.4 GB/s
• POWER3 workstation, Oct. 23 1998: RS/6000 43P Model 260, single or dual-processor, up to 8 (in SMP form), 4 MB L2 instruction cache + 256 MB SDRAM memory
• Compatible with PowerPC design
![Page 64: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/64.jpg)
64
IBM
1999: 0.18 micron, copper wiring
2000: silicon-on-insulator
2001: “gigachip” (POWER4 ??)
![Page 65: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/65.jpg)
65
MIPS:
R4000 (1991, 64-bit), R8000 (1994), R10000 (1995 or 1996-)
R10000: 30 W, 5.9 M transistors, 32/32 KB cache, 5-7 pipeline stages, 4-issue
SGI Power Challenge: up to 18 x R8000 or 36 x R10000
![Page 66: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/66.jpg)
66
Other Commodity Processors
HP PA-RISC (Precision Architecture): PA-RISC 7200 (CONVEX Exemplar, 128 processors)
DEC: Alpha 21064 (CRAY T3D), 21164 (300 MHz, T3E), 21264 (T3E1200)
Intel 80860 (i860): “Cray on a Chip”, 66 MFLOPS (Cray 1S = 85 MFLOPS)
![Page 67: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/67.jpg)
67
VLIW: Very Long Instruction Word
Uses even more functional units than a superscalar processor
All instructions are the same length
The operations in each word are chosen by the compiler
CPI is even lower than for a superscalar processor; clock rate is slow
Microprogrammed control; synchronization of parallel operations is done entirely at compile time
No commodity processor is designed as VLIW (but it is coming back !! INTEL 64-bit Merced)
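The idea that the compiler chooses the operations in each long word can be sketched as a greedy packer. This is only an illustration under stated assumptions: the slot names, the `Op` record, and the `pack_bundles` function are hypothetical. It also assumes every operation has unit latency and that all operations in a word read their registers before any of them writes. It is not how any particular VLIW compiler works.

```python
from typing import NamedTuple

# Hypothetical slot layout: one operation per functional unit in each long word.
SLOTS = ("load_store", "int_alu", "fp_add", "fp_mul", "branch")

class Op(NamedTuple):
    name: str
    unit: str            # slot / functional unit that executes the operation
    dest: str            # register written
    srcs: tuple = ()     # registers read

def pack_bundles(ops):
    """Greedy in-order packing: an op joins the current word only if its slot
    is free and it neither reads nor rewrites a register written in that word."""
    bundles = []
    current, written = {}, set()
    for op in ops:
        depends = any(s in written for s in op.srcs) or op.dest in written
        if op.unit in current or depends:
            bundles.append(current)          # close the word, start a new one
            current, written = {}, set()
        current[op.unit] = op
        written.add(op.dest)
    if current:
        bundles.append(current)
    # Unused slots become explicit no-ops so every word has the same length.
    return [[b[s].name if s in b else "nop" for s in SLOTS] for b in bundles]

ops = [
    Op("ld r1", "load_store", "r1", ("r9",)),
    Op("add r2", "int_alu", "r2", ("r3", "r4")),
    Op("fadd f1", "fp_add", "f1", ("f2", "f3")),
    Op("fmul f4", "fp_mul", "f4", ("f1", "f5")),   # needs f1, so it starts a new word
    Op("add r5", "int_alu", "r5", ("r1", "r2")),
]
words = pack_bundles(ops)
for w in words:
    print(w)
```

Because all of the dependence checking happens in `pack_bundles`, the hardware in this model needs no dispatch logic: it simply issues each fixed-length word, no-ops included.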
![Page 68: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/68.jpg)
68
[Figure: (a) A VLIW processor architecture and instruction format: register file, load/store unit, integer ALU, FP unit, branch unit, and main memory; instruction word fields: Load/store, FP ADD, FP Multiply, Branch, ..., Integer ALU. (b) Pipeline execution of VLIW instructions, by cycle.]
![Page 69: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/69.jpg)
69
Summary

| | CISC | RISC | VLIW |
|---|---|---|---|
| Instruction complexity | Varies from simple to complex | One simple operation | Many simple independent operations |
| Instruction size | Varies | One size, usually 32 bits | One size |
| Instruction format | Field placement varies | Regular, consistent field placement | Regular, consistent field placement |
| Memory reference | Bundled with operations | Not bundled, load/store architecture | Not bundled, load/store architecture |
| Hardware design focus | Microcoded implementations; one or more pipelines | No microcode; one or more pipelines | Multiple pipelines, no microcode, no complex dispatch logic |
| Registers | Few, sometimes special | Many, general purpose | Many, general purpose |
![Page 70: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/70.jpg)
70
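Processor Performance

| Processor | Clock (MHz) | IPC | Stages | SPEC95int | SPEC95fp |
|---|---|---|---|---|---|
| Alpha 21164 | 500 | 4 (2+2) | 7-12 | 12.6 | 18.3 |
| PowerPC 620 | 200 | 4 | 5 | 9.0 | 9.0 |
| PowerPC 604e | 225 | 4 | 6 | 8.5 | 7.0 |
| UltraSPARC II | 250 | 4 | 6-9 | 8.5 | 15 |
| PA-8000 | 180 | 4 | 7-9 | 10.8 | 18.3 |
| MIPS R10000 | 200 | 4 | 5-7 | 8.9 | 17.2 |
| Pentium Pro | 200 | 3 (2+1) | 12-14 | 8.7 | 6.0 |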
Only 1 floating point unit active at a time
![Page 71: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/71.jpg)
71
Case Study 1: INTEL Processors
![Page 72: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/72.jpg)
72
Pentium (430 TX Mother Board)
[Figure: block diagram of a Pentium system on the 430TX chipset. Pentium processor and L2 cache (max 512 KB) on the host bus (3.3 V; 60-66 MHz; 32-bit address, 64-bit data, 500 MB/s (8x60)). Memory controller 82439TX (MTXC) with main memory (DRAM, max 256 MB). PCI bus (3.3 V or 5 V, 30/33 MHz) with PCI devices. 82371AB (PIIX4) connecting to the ISA/EIO bus (3.3 V; 5 V) with EISA/ISA devices, USB ports, and bus-master IDE (33 MB/s) for HD and CD-ROM. Other labels: Bus Master, (up to 5).]
![Page 73: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/73.jpg)
73
Pentium Pro
cache: 8 KB data + 8 KB instruction cache (L1); on-board L2 cache: 256 KB or 512 KB
40 general purpose registers
Data TLB: 64 entries
No MMX
up to 200 MHz, 35 W: integration of high-speed CPU with high-speed cache is not easy
![Page 74: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/74.jpg)
74
Execution in Pentium Pro
Five functional units:
Store data unit
Store address unit
Load address unit
Integer ALU unit
Floating point/integer unit
3-issue but only one floating point op
Peak flop rate = 200 MFLOPS at 200 MHz
![Page 75: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/75.jpg)
75
Pentium Pro (P6)
[Figure: Pentium Pro package. On one substrate: the Pentium Pro core with 8 KB L1 data cache, 8 KB L1 instruction cache, bus interface unit, and local APIC, plus a 256/512 KB unified L2 cache reached over the backside bus; the bus interface unit also connects to the external bus. Speed labels: half-speed, full-speed.]
![Page 76: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/76.jpg)
76
Pentium Pro Memory Subsystem
L1 cache (data cache, 8 KB): supports one load and one store per cycle (full-speed), peak bandwidth of 3.2 GB/s on a 200 MHz Pentium Pro
L2 cache: runs at full CPU clock speed, can transfer 64 bits per cycle (1.6 GB/s on a 200 MHz Pentium Pro)
External bus: 64-bit, 64 bits per bus cycle
SMP support: full cache coherence, up to 4 processors
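The cache bandwidth figures on this slide can be checked with a short calculation. The sketch below assumes 1 GB = 10^9 bytes and, for the L1 figure, that the load and the store each move 64 bits; the slide gives only the 3.2 GB/s total, so that width is an assumption. The function name is made up for illustration.

```python
def bandwidth_gb_per_s(clock_mhz: float, bits_per_transfer: int,
                       transfers_per_cycle: int = 1) -> float:
    """Peak bandwidth in GB/s, taking 1 GB = 10**9 bytes."""
    bytes_per_cycle = bits_per_transfer / 8 * transfers_per_cycle
    return clock_mhz * 1e6 * bytes_per_cycle / 1e9

# L2: one 64-bit transfer per cycle at the full 200 MHz CPU clock.
l2 = bandwidth_gb_per_s(200, 64)
# L1: one load plus one store per cycle, assumed to be 64 bits each.
l1 = bandwidth_gb_per_s(200, 64, transfers_per_cycle=2)
print(f"L2: {l2:.1f} GB/s, L1: {l1:.1f} GB/s")
```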
![Page 77: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/77.jpg)
77
INTEL Pentium Pro SMP
Processor bus: 532 MB/s (66.6 MHz x 64 bits)
Four-way interleaved DRAMs, EDO or synchronous DRAM
Interface to EISA or PCI
Bus operations: write-back cache, MESI protocol, pipeline depth: 8
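For readers unfamiliar with MESI, the state changes for a single cache line can be sketched as a small transition function. This is the simplified textbook form of the protocol, not the actual Pentium Pro bus implementation; the event names and the `others_have_copy` flag are hypothetical.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

M, E, S, I = State.MODIFIED, State.EXCLUSIVE, State.SHARED, State.INVALID

def next_state(state, event, others_have_copy=False):
    """Textbook MESI transition for one line in one cache.

    Events (hypothetical names): 'read' and 'write' come from the local CPU;
    'snoop_read' and 'snoop_write' are requests seen from another processor."""
    if event == "read":
        if state is I:                     # read miss: fetch the line
            return S if others_have_copy else E
        return state                       # read hit leaves M, E, S unchanged
    if event == "write":
        return M                           # other copies get invalidated over the bus
    if event == "snoop_read":
        return I if state is I else S      # a Modified line is written back first
    if event == "snoop_write":
        return I                           # another processor takes ownership
    raise ValueError(f"unknown event: {event}")

print(next_state(I, "read"), next_state(E, "write"), next_state(M, "snoop_read"))
```

The Exclusive state is what lets a write-back cache upgrade a private line to Modified without any bus traffic.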
![Page 78: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/78.jpg)
78
[Figure: Pentium Pro SMP system. Four P6 processors on the Pentium Pro processor bus (32-bit address, 64-bit data, 500 MB/s). Memory controller consisting of a DRAM controller, a data path, and four MICs (MIC: memory interface controller), with 72-bit memory data and 288-bit interleaved data. PCI bridge to the PCI bus (32-bit address, 32-bit data, 132 MB/s) with PCI devices. EISA/ISA bridge to the EISA/ISA bus with EISA/ISA devices.]
![Page 79: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/79.jpg)
79
Pentium II
Larger L1 cache: 16/16 KB
L2 cache: 512 KB unified cache through backside bus
Adds MMX features back (like Pentium MMX)
Slot 1 architecture (240 pins)
Clock speed improved: 233, 266, 300, ... 450 MHz
SMP support: up to TWO only
Deschutes version (1998): >= 333 MHz, 100 MHz external bus (440BX chipset), AGP, SMP support: 4 processors, Slot 2 (330 pins) ?
![Page 80: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/80.jpg)
80
Pentium II (P6)
[Figure: Pentium II package. On one substrate: the Pentium II core with 16 KB L1 data cache, 16 KB L1 instruction cache, bus interface unit, and local APIC, plus a 512 KB unified L2 cache; the bus interface unit also connects to the external bus. Speed labels: half-speed, full-speed.]
![Page 81: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/81.jpg)
81
AMD K6-2
3DNow technology: MMX support
4 floating point units (4-issue); Pentium II has only one floating point unit
300 MHz AMD K6-2: 1.2 GFLOPS > Pentium II 450 MHz
Socket 7, 100 MHz external bus, 0.25 micron
9.3 M transistors
K6-3 (Sharptooth): 350, 450 MHz, 256 KB on-chip L2 cache, 21.3 M transistors
![Page 82: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/82.jpg)
82
Intel Processors for Each Market Segment
[Figure: Intel processors by market segment, ’97 to ’98. Segments: Mid- to High-End Server/Workstation, Entry-level Server/Workstation, Performance Desktop, Basic PC Desktop, Mobile PC. Processors shown: Pentium® Pro, Pentium® II Xeon™, Pentium II, Pentium® with MMX™ Technology, Intel® Celeron™, Mobile Pentium II. Side label: 0.25 micron P6 Microarchitecture Core.]
![Page 83: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/83.jpg)
83
INTEL Merced (mid-2000)
64-bit processor
VLIW concept?
Need good compiler technique
run UNIX (more scalable than NT)
![Page 84: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/84.jpg)
84
INTEL McKinley (late-2001)
64-bit, 0.13 micron
More cache memory than any other INTEL processor
aims at 1000 MHz, 2x faster than Merced
Need good compiler technique
![Page 85: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/85.jpg)
85
INTEL Foster
32-bit, 0.13 micron
1000 MHz
high-end PC server
longer pipeline + “instruction trace cache”
![Page 86: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/86.jpg)
86
Intel Roadmap
32-bit:
P5: Pentium (1993)
P6: Pentium Pro (1995), Pentium II (450 MHz), Celeron 2, Pentium II Xeon (450 MHz), Tanner and Cascades chips (1999)
P7: Willamette (desktop), Foster (1000 MHz, high-end PC server)
64-bit: Merced and McKinley
![Page 87: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/87.jpg)
87
Intel vs. Compaq 64-bit roadmaps:
| Year | Intel's IA-64 | Compaq's Alpha |
|---|---|---|
| 1998 | in progress | 21264 at 575 MHz |
| 1999 | first samples | 21264 at 750 MHz to 1 GHz |
| mid-2000 | Merced at 800 MHz + | 21364 at 1 GHz + |
| late 2001 | McKinley at 1 GHz + | EV8 |
| 2002 | Madison | |
| 2003(?) | Deerfield | |
![Page 88: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/88.jpg)
88
Digital Alpha 21164
0.35 µm, 500 MHz, RISC
4-way issue superscalar: up to 2 integer and 2 floating point instructions issued per CPU cycle
Large on-chip L2 cache: 96 KB, writable, 3-way set associative
9.3 M transistors
Fully pipelined: 7-stage integer pipeline, 9-stage floating point pipeline
High-throughput memory subsystem (400 MB/s)
![Page 89: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/89.jpg)
89
Alpha 21164 Block Diagram
[Figure: Alpha 21164 block diagram. Inst Unit: instruction cache (8 KB) and 4-way issue unit. Exec Unit: two integer units, FP add, FP multiply. Memory Unit: merge logic, write-through data cache (8 KB), write-back L2 cache (96 KB). Bus interface unit to the L3 cache, with 128-bit data and 40-bit address. 128-bit internal data bus.]
![Page 90: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/90.jpg)
90
Current Status of Processor Technology
Processors still do not work well for some applications: databases, CAD tools, sparse matrices, ...
Alpha 21164, 300 MHz, 4-way superscalar, running Microsoft SQL Server database on Windows NT: it operates at 12% of peak performance
Caches don’t work. Speed is tied to memory bandwidth.
![Page 91: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/91.jpg)
91
Microprocessor-DRAM performance gap
full cache miss time = 100s of instructions
Alpha 7000 server: 340 ns / 5.0 ns = 68 clks (2-issue, x 2 = 136 insts)
Alpha 8400 server: 266 ns / 3.3 ns = 80 clks (21164 processor, 4-issue, x 4 = 320 insts)
Rely on locality + caches to bridge gap
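The two miss-cost figures above come from dividing the miss latency by the cycle time and multiplying by the issue width. A minimal sketch of that arithmetic follows; the function name is made up, and the cycle count is truncated to a whole number as the slide appears to do.

```python
def miss_cost(miss_ns: float, cycle_ns: float, issue_width: int):
    """Return (clock cycles, instruction issue slots) lost to one full cache miss."""
    clocks = int(miss_ns / cycle_ns)       # whole cycles, truncated
    return clocks, clocks * issue_width

alpha_7000 = miss_cost(340, 5.0, 2)        # 2-issue processor
alpha_8400 = miss_cost(266, 3.3, 4)        # 21164, 4-issue
print(alpha_7000, alpha_8400)
```

The faster, wider machine loses more than twice as many issue slots per miss even though its memory latency in nanoseconds is lower.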
![Page 92: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/92.jpg)
92
Processor-Memory Gap
[Figure: Processor-DRAM Memory Gap (latency). Performance (log scale, 1 to 1000) versus time, 1980 to 2000. CPU (µProc) curve: 60%/yr (2X/1.5 yr), “Moore’s Law”. DRAM curve: 9%/yr (2X/10 yrs). Processor-memory performance gap grows 50% / year.]
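The growth rates on the gap chart can be related to each other with compound-growth arithmetic. The sketch below assumes simple annual compounding; the function and variable names are made up for illustration.

```python
import math

def doubling_time_years(growth_per_year: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + growth_per_year)

cpu, dram = 0.60, 0.09                      # annual improvement rates from the chart
gap_growth = (1 + cpu) / (1 + dram) - 1     # yearly growth of the CPU/DRAM ratio

print(f"CPU doubles every {doubling_time_years(cpu):.2f} years")
print(f"DRAM doubles every {doubling_time_years(dram):.2f} years")
print(f"gap grows {gap_growth:.0%} per year")
```

Under these assumptions the gap grows about 47% per year, which the chart rounds to 50%, and 60%/yr matches a doubling every 1.5 years. A 9%/yr rate compounds to a doubling in roughly 8 years, so the chart's "2X/10 yrs" appears to be a rounder approximation.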
![Page 93: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/93.jpg)
93
Future Processors
Specialized, very long instruction word (VLIW) machines
Wide, simultaneous multithreaded (SMT) uniprocessor
Single-chip multiprocessor
Memory-centric computing engines (IRAM, PPRAM, CRAM)
![Page 94: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/94.jpg)
94
IRAM: Berkeley
Growing performance gap between CPU and memory access speed
Microprocessor and DRAM on single chip
Bridge the processor-memory performance gap via on-chip latency and bandwidth
improve power-performance
![Page 95: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/95.jpg)
95
CRAM (Univ. Toronto)
Computation moved from the CPU into the memory
CRAM = RAM + SIMD
PetaOPS performance (10^15 operations per second)
Bandwidth internal to memory: 2.9 TB/s
Cache/CPU: 800 MB/s
![Page 96: 1 Lecture 6 Processor Technology. 2 Advance in Hardware INTEL Family: (8086/1978 -- Pentium II/1998) exponential performance improvement over time number](https://reader036.vdocuments.net/reader036/viewer/2022070407/56649e1b5503460f94b08821/html5/thumbnails/96.jpg)
96
Other Projects
PPRAM Project (Kyushu Univ., Japan): Parallel Processing RAM Chip
CMP Project (Stanford)
billion-transistor processor architecture
single-chip multiprocessor (4 to 16)
New ISAs