Computer Architecture Chapter 1: Computer Abstractions and Technology


Page 1

Computer Architecture Chapter 1

Computer Abstractions and Technology

Yu-Lun Kuo 郭育倫
Department of Computer Science and Information Engineering
Tunghai University, Taichung, Taiwan R.O.C.

[email protected] http://www.csie.ntu.edu.tw/~d95037/

Page 2

This book: http://www.elsevierdirect.com/product.jsp?isbn=9780123744937

Page 3

Related Courses

• Computer Organization: how to build it; implementation details
• Computer Architecture: why; analysis; evaluation
• Parallel & Advanced Computer Architecture: parallel architectures; hardware-software interactions; system optimization
• Embedded Systems Software: RTOS; tool chains; I/O & device drivers; compilers
• Hardware-Software Co-design: how to make embedded systems better
• Software: OS; programming languages; system programming
• Special Topics on Computer Performance Optimization: performance tools; performance skills; compiler optimization tricks

Page 4

Computer Architecture and Organization

• Architecture refers to the attributes visible to the programmer
  – Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques
  – e.g., Is there a multiply instruction?

• Organization is how those features are implemented
  – Control signals, interfaces, memory technology
  – e.g., Is there a hardware multiply unit, or is multiplication done by repeated addition?
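The multiply question above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and Python's `*` operator merely stands in for a hardware multiply unit. Both functions implement the same architectural operation (multiply two integers), so a program sees identical results; they differ only in organization, that is, in how the work gets done.

```python
def mul_hardware(a: int, b: int) -> int:
    """Organization A (hypothetical): a dedicated multiply unit.

    Python's * operator stands in for the hardware multiplier.
    """
    return a * b


def mul_repeated_addition(a: int, b: int) -> int:
    """Organization B (hypothetical): no multiplier, so add `a` |b| times."""
    result = 0
    for _ in range(abs(b)):
        result += a
    # Fix up the sign when the multiplier is negative.
    return -result if b < 0 else result


# Same architecture-level answer, different organization underneath.
for a, b in [(6, 7), (-3, 4), (5, -2), (0, 9)]:
    assert mul_hardware(a, b) == mul_repeated_addition(a, b)
print(mul_repeated_addition(6, 7))  # prints 42
```

The repeated-addition version takes a number of steps proportional to |b|, which is why organization can change performance even when the architecture (and therefore the program) stays the same.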

Page 5

Computer Architecture and Organization

• All members of the Intel x86 family share the same basic architecture
• All members of the IBM System/370 family share the same basic architecture
• This gives code compatibility
  – At least backward compatibility
• Organization differs between versions

Page 7

Classes of Computing Applications (1/2)

• Desktop computers
  – Emphasize delivering good performance to a single user at low cost
  – Price-performance, graphics performance
  • Intel, AMD, Apple, Microsoft, Linux

• Servers
  – Accessed only via a network
  – Provide greater expandability of both computing and input/output capacity
  – Availability, scalability, throughput
  • IBM, HP-Compaq, Sun, Intel, Microsoft, Linux

Page 8

Classes of Computing Applications (2/2)

• Supercomputers
  – Consist of hundreds to thousands of processors
  – Usually gigabytes to terabytes of memory
  – Terabytes to petabytes of storage
  – Cost millions to hundreds of millions of dollars

• Embedded computers
  – A computer inside another device
  – Includes the microprocessors in devices such as:
  • Washing machines, cars, cell phones, video game consoles, PDAs, and digital TVs


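Page 10

Where is the Market?

[Figure 1.1; y-axis: millions of computers]

Figure 1.1 The number of processors of different types sold from 1998 to 2002. These counts were obtained in somewhat different ways, so some caution is needed in interpreting the results. For example, the totals for desktops and servers count complete computer systems; because some of these systems contain multiple processors, the number of processors sold is somewhat higher, but probably by no more than 10–20% in total (servers, which may average more than one processor per system, amount to only about 3% of desktop sales, and desktops are predominantly single-processor systems). The totals for embedded computers actually count processors; many of them are not even visible, and in some cases a single device contains multiple processors.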

Page 11

Instruction Set Architecture (ISA)

• ISA: an abstract interface between the hardware and the lowest-level software of a machine that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.

“... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” – Amdahl, Blaauw, and Brooks, 1964
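To make the definition concrete, here is a minimal sketch of a hypothetical three-instruction ISA. It is not MIPS or x86; the mnemonics LOAD, ADD, and STORE and the tuple encoding are assumptions for illustration. The interpreter pins down only what the ISA specifies (the instructions, 32 registers, and word-addressed memory) and says nothing about how hardware would implement them, so any organization that produces the same register and memory state would run the same program correctly.

```python
NUM_REGS = 32  # assumed register count for this toy ISA


def run(program, memory):
    """Execute a list of (opcode, operands...) tuples on a toy register machine.

    LOAD  rd, addr    : regs[rd] = memory[addr]
    ADD   rd, rs, rt  : regs[rd] = regs[rs] + regs[rt]
    STORE rs, addr    : memory[addr] = regs[rs]
    """
    regs = [0] * NUM_REGS
    for op, *args in program:
        if op == "LOAD":
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "ADD":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "STORE":
            rs, addr = args
            memory[addr] = regs[rs]
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return regs, memory


# c = a + b, with a, b, c stored at memory words 0, 1, 2
program = [("LOAD", 1, 0), ("LOAD", 2, 1), ("ADD", 3, 1, 2), ("STORE", 3, 2)]
regs, mem = run(program, [5, 7, 0])
print(mem)  # prints [5, 7, 12]
```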

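Page 12

[Figure 1.2; y-axis: millions of processors]

Figure 1.2 Processors sold for each instruction set architecture, 1998 to 2002. The "Other" category refers to application-specific or customized processors. In the case of ARM, about 80% of sales go into cell phones, where an ARM core is combined with application-specific logic on a single chip.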

Page 13

Hierarchical Layers

• System software
  – Sits between the hardware and application software
  – Includes operating systems, compilers, and assemblers

Page 14

Compilers & Assemblers

• Compilers
  – Translate a program written in a high-level language, such as C or Java, into instructions that the hardware can execute
• Assemblers
  – Translate a symbolic version of an instruction into its binary version
• Assembly language
  – A symbolic representation of machine instructions


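Page 16

High-level language program (in C) → Compiler → Assembly language program (for MIPS) → Assembler → Binary machine language program (for MIPS)

Figure 1.4 A C program compiled into assembly language and then assembled into binary machine language. Although the translation from high-level language to binary machine language is shown in two steps, some compilers skip the intermediate step and produce binary machine language directly. These languages and this program are examined in more detail in Chapter 2.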
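As a sketch of what the assembler step in Figure 1.4 does, the Python below encodes MIPS R-format `add` and `sub` instructions into 32-bit words using the standard MIPS field layout (op, rs, rt, rd, shamt, funct). Only a handful of registers and two instructions are supported; a real assembler also handles labels, other instruction formats, and pseudo-instructions, so treat this as an illustration rather than a usable tool. The function name `assemble_r` is hypothetical.

```python
# MIPS register numbers (subset) and funct codes for two R-format instructions.
REGS = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10, "$s0": 16, "$s1": 17, "$s2": 18}
FUNCT = {"add": 0x20, "sub": 0x22}
OPCODE_R = 0  # all R-format instructions use opcode 0


def assemble_r(line: str) -> int:
    """Encode e.g. 'add $t0, $s1, $s2' (rd, rs, rt) as a 32-bit machine word."""
    mnemonic, operands = line.split(maxsplit=1)
    rd, rs, rt = (REGS[r.strip()] for r in operands.split(","))
    shamt = 0
    # Fields: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
    return (OPCODE_R << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | FUNCT[mnemonic]


word = assemble_r("add $t0, $s1, $s2")
print(f"{word:#010x}")  # prints 0x02324020
print(f"{word:032b}")   # the same word as 32 binary digits
```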

Page 17

What is “Computer Architecture”?

[Diagram: levels of abstraction, from top to bottom: Applications; Compiler, Operating System; Firmware; Instruction Set Architecture (ISA); Instr. Set Proc., I/O system; Datapath & Control; Digital Design; Circuit Design; Layout & fab; Semiconductor Materials]

• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation

Page 18

Registers vs. Memory

• Operands of arithmetic instructions must be registers
  – Only 32 registers are provided
• The compiler associates variables with registers
• What about programs with lots of variables?
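One common answer to the question above is to keep the most frequently used variables in registers and spill the rest to memory, where they must be loaded and stored. The Python below is a deliberately simplified, hypothetical allocator that ranks variables by an assumed use count. Real compilers typically use liveness analysis and graph coloring, and in MIPS not all 32 registers are available for variables, so this only illustrates the idea of spilling.

```python
NUM_REGS = 32  # register count from the slide; real conventions reserve some of these


def allocate(use_counts, num_regs=NUM_REGS):
    """Return (variable -> register name, list of spilled variables).

    Hypothetical heuristic: the most frequently used variables get registers.
    """
    ranked = sorted(use_counts, key=use_counts.get, reverse=True)
    in_regs = ranked[:num_regs]
    spilled = ranked[num_regs:]
    return {var: f"r{i}" for i, var in enumerate(in_regs)}, spilled


# 40 variables: v0 is used most often, v39 least often
counts = {f"v{i}": 40 - i for i in range(40)}
assignment, spilled = allocate(counts)
print(len(assignment), "in registers;", len(spilled), "spilled:", spilled)
```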

Page 19

Impacts of Advancing Technology

• Processor
  – Logic capacity: increases about 30% per year
  – Performance: 2x every 1.5 years

ClockCycle = 1/ClockRate

500 MHz ClockRate = 2 nsec ClockCycle

1 GHz ClockRate = 1 nsec ClockCycle

4 GHz ClockRate = 250 psec ClockCycle
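The conversions above follow directly from ClockCycle = 1/ClockRate. A small Python check, assuming clock rates are given in hertz (the helper name `clock_cycle` is illustrative):

```python
def clock_cycle(clock_rate_hz: float) -> float:
    """Clock cycle time in seconds: ClockCycle = 1 / ClockRate."""
    return 1.0 / clock_rate_hz


for rate_hz in (500e6, 1e9, 4e9):
    cycle_s = clock_cycle(rate_hz)
    print(f"{rate_hz / 1e6:6.0f} MHz -> {cycle_s * 1e9:5.2f} ns ({cycle_s * 1e12:6.0f} ps)")
```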

Page 20

Impacts of Advancing Technology

• Memory
  – DRAM capacity: 4x every 3 years, now 2x every 2 years
  – Memory speed: 1.5x every 10 years
  – Cost per bit: decreases about 25% per year
• Disk
  – Capacity: increases about 60% per year
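The technology rates on these slides mix two forms, "N times every Y years" and "P% per year". The sketch below converts between them assuming smooth compound (exponential) growth, which idealizes the step-wise product generations real DRAM and disks follow; the helper names are hypothetical. It shows, for instance, that 4x every 3 years is about 59% per year (close to 60%), that 2x every 2 years is about 41% per year, and that 60% per year means capacity doubles roughly every 1.5 years.

```python
import math


def annual_rate(factor: float, years: float) -> float:
    """Compound annual growth rate equivalent to growing by `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


def years_to_double(rate: float) -> float:
    """Years needed to double at a compound annual growth `rate` (0.6 means 60%)."""
    return math.log(2.0) / math.log(1.0 + rate)


print(f"DRAM capacity, 4x / 3 yr:    {annual_rate(4, 3):.1%} per year")
print(f"DRAM capacity, 2x / 2 yr:    {annual_rate(2, 2):.1%} per year")
print(f"Processor perf, 2x / 1.5 yr: {annual_rate(2, 1.5):.1%} per year")
print(f"Memory speed, 1.5x / 10 yr:  {annual_rate(1.5, 10):.1%} per year")
print(f"Disk capacity, 60% per year: doubles every {years_to_double(0.60):.2f} years")
```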

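Page 21

Figure 1.6 A desktop computer. The liquid crystal display (LCD) screen is the primary output device, and the keyboard and mouse are the primary input devices. The box contains the processor as well as additional input/output devices. This system is a Dell Optiplex GX260.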

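Page 22

[Figure 1.8 labels: power supply; fan with cover; motherboard; DVD drive; Zip drive; hard drive]

Figure 1.8 Inside the personal computer of Figure 1.6 on page 15. This packaging is sometimes called a clamshell because of the way it opens, with hinges on one side. To see what is inside, let's start at the top left. The metal box at the top left is the power supply, and below it is a fan with a cover. To the lower right of the fan is a printed circuit (PC) board, called the motherboard in a computer, which contains most of the computer's electronics; Figure 1.10 is a close-up of such a board. The processor is the large raised rectangle to the right of the fan. On the right-hand side are the bays that hold the various drives: a DVD drive at the top, a Zip drive in the middle, and a hard drive at the bottom.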

Page 23

Example Machine Organization

• Workstation design target
  – 25% of cost on processor
  – 25% of cost on memory (minimum memory size)
  – Rest on I/O devices, power supplies, box

[Diagram: Computer = CPU (Control, Datapath) + Memory + Devices (Input, Output)]

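Page 24

[Figure 1.5 labels: Computer; Processor (Control, Datapath); Memory; Input; Output; Interface; Compiler; Evaluating performance]

Figure 1.5 The organization of a computer, showing the five classic components. The processor fetches instructions and data from memory. Input devices write data to memory, and output devices read data from memory. Control sends the signals that determine the operations of the datapath, memory, input, and output.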

Page 25

Inside the Pentium 4 Processor Chip

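Page 26

[Figure 1.9 block labels: control (several blocks); other interface logic; I/O interface; instruction cache; data cache; enhanced floating-point and multimedia unit; second-level (L2) cache and memory interface; advanced pipelining and multithreading support]

Figure 1.9 Inside the processor chip used on the board shown in Figure 1.8. The left-hand side is a photomicrograph of the Pentium 4 processor chip, and the right-hand side shows the major blocks in the processor.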

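Page 27

[Figure 1.10 labels: processor; memory; processor interface; I/O bus slots; graphics card; disk and USB interfaces]

Figure 1.10 Close-up of a PC motherboard. This board uses an Intel Pentium 4 processor, located at the top left of the board and covered by a set of metal fins. This heat sink helps the chip dissipate heat. The main memory consists of one or more small boards mounted vertically on the motherboard near the middle. The DRAM chips are mounted on these boards (called dual inline memory modules, or DIMMs), which plug into connectors. Much of the rest of the board connects external I/O devices: audio/MIDI and parallel/serial ports on the right, two peripheral component interconnect (PCI) card slots at the bottom, and an advanced technology attachment (ATA) connector for attaching the hard disk.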

Page 28

Safe Place for Data

• Memory
  – Primary memory (main memory)
  • Volatile: loses its contents when it loses power
  – Secondary memory
  • Nonvolatile memory
  • Magnetic disk (hard disk)
  • Floppy disks
  • Optical disks: CDs, DVDs, HD DVD, BD
  • Flash-based removable memory

Page 29

Figure 1.11 A disk drive showing 10 disk platters and the read/write heads.

Page 30

Total transistors in PCs

• 1971 – 4004 – 2,300 transistors
• 1974 – 8080 – 7,000 transistors
• 1978 – 8086 – 29,000 transistors
• 1982 – 286 – 134,000 transistors
• 1985 – 386 – 275,000 transistors
• 1989 – 486 – 1.2 million transistors
• 1993 – Pentium – 3.1 million transistors
• 1997 – Pentium II – 7.5 million transistors
• 1999 – Pentium III – 9.5 million transistors

Page 31

Moore’s Law

• In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 to 24 months (i.e., grow exponentially with time).
• Amazingly visionary: the million-transistor-per-chip barrier was crossed in the 1980s.
  – 2,300 transistors, 1 MHz clock (Intel 4004) – 1971
  – 16 million transistors (UltraSPARC III)
  – 42 million transistors, 2 GHz clock (Intel Xeon) – 2001
  – 55 million transistors, 3 GHz, 130 nm technology, 250 mm² die (Intel Pentium 4) – 2004
  – 140 million transistors (HP PA-8500)

Page 32

Moore’s Law

• “Cramming More Components onto Integrated Circuits” – Gordon Moore, Electronics, 1965
• The number of transistors on a cost-effective integrated circuit doubles every 18 months
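As a rough check of the doubling period, the Python below estimates the doubling time implied by two data points quoted on Page 31 (Intel 4004: 2,300 transistors in 1971; Intel Xeon: 42 million transistors in 2001), assuming pure exponential growth between them. The function name is illustrative. The result comes out at a little over two years, near the slow end of the 18-to-24-month range.

```python
import math


def doubling_time_years(count0: float, year0: float, count1: float, year1: float) -> float:
    """Doubling period implied by exponential growth from (year0, count0) to (year1, count1)."""
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


t_double = doubling_time_years(2_300, 1971, 42_000_000, 2001)
print(f"Implied doubling time: {t_double:.2f} years ({t_double * 12:.0f} months)")
```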

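Page 33

Silicon ingot → Slicer → Blank wafers → 20 to 40 processing steps → Patterned wafers → Wafer tester → Tested wafer → Dicer → Tested dies → Bond die to package → Packaged dies → Part tester → Tested packaged dies → Ship to customers

Figure 1.14 The chip manufacturing process. After being sliced from the silicon ingot, blank wafers are put through 20 to 40 steps to create patterned wafers (see Figure 1.15 on page 28). The patterned wafers are then tested with a wafer tester, and a map of the good parts is made. Next, the wafers are diced into dies (see Figure 1.9 on page 19). In this figure, one wafer produced 20 dies, of which 17 passed testing (X marks a bad die). The yield of good dies in this case is 17/20, or 85%. The good dies are then bonded into packages and tested one more time before the packaged parts are shipped to customers. In this example, one bad packaged part was found in the final test.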
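The yield arithmetic in the caption can be written out in Python. This is a sketch using the figure's numbers; `die_yield` is a hypothetical helper, and the "overall" figure here simply means shippable parts divided by dies on the wafer, which is one of several ways such a yield could be defined.

```python
def die_yield(good_dies: int, total_dies: int) -> float:
    """Fraction of dies on a wafer that pass testing."""
    if total_dies <= 0:
        raise ValueError("total_dies must be positive")
    if not 0 <= good_dies <= total_dies:
        raise ValueError("good_dies must be between 0 and total_dies")
    return good_dies / total_dies


wafer_dies = 20         # dies on the wafer in Figure 1.14
passed_wafer_test = 17  # dies marked good by the wafer tester
failed_final_test = 1   # packaged parts rejected by the part tester

print(f"Wafer-test yield: {die_yield(passed_wafer_test, wafer_dies):.0%}")
shippable = passed_wafer_test - failed_final_test
print(f"Shippable parts: {shippable} ({die_yield(shippable, wafer_dies):.0%} of the wafer)")
```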

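Page 34

Figure 1.15 An 8-inch (200 mm) wafer containing Intel Pentium 4 processors. At 100% yield, the wafer holds 165 Pentium dies. Figure 1.9 on page 19 is a photomicrograph of one of these Pentium 4 dies. The die area is 250 mm², and it contains 55 million transistors. The die uses a 0.18-micron process, meaning that the smallest transistors are approximately 0.18 microns in size, although they are typically somewhat smaller than the actual feature size, which refers to the size of the transistors as drawn versus their final manufactured size. The Pentium 4 is also made in a more advanced 0.13-micron process. The several dozen partially patterned dies around the edge of the wafer are useless; they are included because doing so makes it easier to create the masks used to pattern the wafer.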

Page 35

Figure 1.16 An Intel Pentium 4 (3.06 GHz) chip on its heat sink, which must dissipate the 82 watts of heat the chip produces.

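Page 36

Year | Technology used in computers | Relative performance/unit cost
1951 | Vacuum tube | 1
1965 | Transistor | 35
1975 | Integrated circuit | 900
1995 | Very large-scale integrated circuit | 2,400,000
2005 | Ultra large-scale integrated circuit | 6,200,000,000

Figure 1.12 Relative performance per unit cost of technologies used in computers over time. Source: Computer Museum, Boston, with 2005 extrapolated by the authors.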

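Page 37

[Figure 1.17; y-axis: performance]

Figure 1.17 Growth in workstation performance from 1978 to 2003. Performance is expressed as a multiple of the VAX-11/780's performance, a commonly used yardstick. Performance improved by a factor of 1.5 to 1.6 per year. These performance numbers are based on SPECint (see Chapter 2), adjusted over time to account for changes in the benchmark programs. For processor names listed as x/y, x is the model number and y is the speed in MHz.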

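Page 38

[Figure 1.13; x-axis: year of introduction; y-axis: Kbit capacity (K = 2^10)]

Figure 1.13 Growth of DRAM chip capacity over time. The y-axis is measured in Kbits, where K = 1024 (2^10). For two decades, the DRAM industry quadrupled capacity almost every three years, equivalent to about 60% per year. This "four times every three years" estimate became known as the DRAM growth rule. In recent years the growth rate has slowed somewhat, to closer to doubling every two years, or quadrupling every four years.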

Page 39

Disks: Archaic (Nostalgic) vs. Modern (Newfangled)

• Seagate 373453, 2003
  – 15000 RPM (4X)
  – 73.4 GBytes (2500X)
  – Tracks/Inch: 64000 (80X)
  – Bits/Inch: 533,000 (60X)
  – Four 2.5” platters (in 3.5” form factor)
  – Bandwidth: 86 MBytes/sec (140X)
  – Latency: 5.7 ms (8X)
  – Cache: 8 MBytes

• CDC Wren I, 1983
  – 3600 RPM
  – 0.03 GBytes capacity
  – Tracks/Inch: 800
  – Bits/Inch: 9550
  – Three 5.25” platters
  – Bandwidth: 0.6 MBytes/sec
  – Latency: 48.3 ms
  – Cache: none

Page 40

Latency Lags Bandwidth (for last ~20 years)

• Performance Milestones
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)

(latency = simple operation w/o contention; BW = best-case)

[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), with a reference line where latency improvement = bandwidth improvement; series: Disk]

Page 41

Memory: Archaic (Nostalgic) vs. Modern (Newfangled)

• 1980 DRAM (asynchronous)
  – 0.06 Mbits/chip
  – 64,000 xtors, 35 mm²
  – 16-bit data bus per module, 16 pins/chip
  – 13 Mbytes/sec
  – Latency: 225 ns
  – (no block transfer)

• 2000 Double Data Rate Synchronous (clocked) DRAM
  – 256.00 Mbits/chip (4000X)
  – 256,000,000 xtors, 204 mm²
  – 64-bit data bus per DIMM, 66 pins/chip (4X)
  – 1600 Mbytes/sec (120X)
  – Latency: 52 ns (4X)
  – Block transfers (page mode)

Page 42

Latency Lags Bandwidth (last ~20 years)

• Performance Milestones
• Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)

(latency = simple operation w/o contention; BW = best-case)

[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), with a reference line where latency improvement = bandwidth improvement; series: Memory, Disk]

Page 43

LANs: Archaic (Nostalgic) vs. Modern (Newfangled)

• Ethernet 802.3
  – Year of Standard: 1978
  – 10 Mbits/s link speed
  – Latency: 3000 µsec
  – Shared media
  – Coaxial cable

• Ethernet 802.3ae
  – Year of Standard: 2003
  – 10,000 Mbits/s (1000X) link speed
  – Latency: 190 µsec (15X)
  – Switched media
  – Category 5 copper wire

Coaxial cable: copper core, insulator, braided outer conductor, plastic covering

Twisted pair: copper, 1 mm thick, twisted to avoid the antenna effect; "Cat 5" is 4 twisted pairs in a bundle

Page 44

Latency Lags Bandwidth (last ~20 years)

• Performance Milestones
• Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)
• Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)

(latency = simple operation w/o contention; BW = best-case)

[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), with a reference line where latency improvement = bandwidth improvement; series: Memory, Network, Disk]

Page 45

CPUs: Archaic (Nostalgic) vs. Modern (Newfangled)

• 1982 Intel 80286
  – 12.5 MHz
  – 2 MIPS (peak)
  – Latency 320 ns
  – 134,000 xtors, 47 mm²
  – 16-bit data bus, 68 pins
  – Microcode interpreter, separate FPU chip
  – (no caches)

• 2001 Intel Pentium 4
  – 1500 MHz (120X)
  – 4500 MIPS (peak) (2250X)
  – Latency 15 ns (20X)
  – 42,000,000 xtors, 217 mm²
  – 64-bit data bus, 423 pins
  – 3-way superscalar, dynamic translate to RISC, superpipelined (22 stage), out-of-order execution
  – On-chip 8KB data caches, 96KB instr. trace cache, 256KB L2 cache

Page 46

Latency Lags Bandwidth (last ~20 years)

• Performance Milestones
• Processor: ‘286, ‘386, ‘486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
• Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)
• Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)

[Figure: log-log plot of relative bandwidth improvement (y-axis, 1 to 10000) against relative latency improvement (x-axis, 1 to 100), with a reference line where latency improvement = bandwidth improvement; series: Processor, Memory, Network, Disk]

CPU high, memory low (“Memory Wall”)
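The milestone ratios on these slides can be recomputed from the "Archaic vs. Modern" slides. The Python below does that, assuming (as those slides do) that peak MIPS stands in for processor bandwidth and link speed for network bandwidth, and taking the Ethernet latencies as microseconds, which is what the LAN slide's 15X ratio implies. Each ratio compares only the oldest and newest milestone; the dictionary layout and the `improvements` helper are illustrative choices.

```python
# (old latency, new latency, old bandwidth, new bandwidth) for each resource
MILESTONES = {
    "Processor": (320.0, 15.0, 2.0, 4500.0),    # latency ns; bandwidth peak MIPS
    "Network": (3000.0, 190.0, 10.0, 10000.0),  # latency µs; bandwidth Mbit/s
    "Memory": (225.0, 52.0, 13.0, 1600.0),      # latency ns; bandwidth MB/s
    "Disk": (48.3, 5.7, 0.6, 86.0),             # latency ms; bandwidth MB/s
}


def improvements(old_lat, new_lat, old_bw, new_bw):
    """Return (latency improvement, bandwidth improvement); ratios > 1 mean better."""
    return old_lat / new_lat, new_bw / old_bw


for name, data in MILESTONES.items():
    lat_x, bw_x = improvements(*data)
    print(f"{name:9s} latency {lat_x:5.1f}x   bandwidth {bw_x:7.1f}x")
```

In every row the bandwidth ratio is one to two orders of magnitude larger than the latency ratio, which is the point of the "latency lags bandwidth" plots.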

Page 47

Computing Devices Then…

EDSAC, University of Cambridge, UK, 1949

Page 48

1/22/2008 CS152-Spring’08

Computing Devices Now


• Robots
• Supercomputers
• Automobiles
• Laptops
• Set-top boxes
• Games
• Smart phones
• Servers
• Media players
• Sensor nets
• Routers
• Cameras