Topic 5 Processor Development AH Computing Computer Architecture

Upload: jasper-anderson

Post on 23-Dec-2015


SQA arrangements

Description of the evolution of the following microprocessor architectures: the PowerPC series, the Intel X86 series and the Intel IA-64, in terms (where appropriate) of the following features and techniques:
• increasing clock speeds
• data bus widths
• pipelining
• superscalar processing
• branch prediction
• speculative loading of data and executing of instructions
• predication
• the number and function of registers used
• SIMD
• RISC
• CISC

Explanation of the relationship between these developments and system performance.

Introduction

From the 1980s, microprocessor architecture has developed rapidly, as a result of:
• increasing miniaturisation of microelectronic circuitry, which means that more and more complex chip designs have become possible and economically viable
• pressure from software developers to design microprocessors with ever-increasing performance

The first microprocessors were not general-purpose processors; they were designed for specific applications.

Intel 4004 (1971)
• the first complete CPU on one chip
• the first commercially available microprocessor
• used in calculators, data terminals, numeric control systems etc.
• 16 4-bit general purpose registers
• 640 bytes of data memory and 4 Kbytes of instruction memory
• clock speed of 740 kHz
• 46 instructions

Intel 8080 (1974)
• 16-bit address bus, 8-bit data bus
• the program counter (PC) was 16 bits long
• 7 8-bit general purpose registers
• used in the Altair 8800, often described as the first personal computer
• other processors of the period: Zilog Z80, Motorola 6800, MOS Technology 6502

Processor Development

Look at the evolution of these families of processors:
• PowerPC
• Intel X86
• Intel IA-64

Compare the following features and techniques:
• increasing clock speeds
• data bus widths
• pipelining
• superscalar processing
• branch prediction
• speculative loading of data
• predication
• the number and function of registers used
• SIMD
• RISC
• CISC

8086 (1978) / 8088 (1979)

Pentium

Intel introduced superscalar architecture with the Pentium processor:
• 2 integer arithmetic and logic units
• 1 floating point unit
• 8 80-bit floating point registers
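To make "superscalar" concrete, here is a minimal Python sketch of a hypothetical two-way, in-order processor: each cycle it tries to issue two adjacent instructions together, and only does so when the second neither reads nor overwrites the register the first one writes. The instruction tuples and this single pairing rule are simplifications for illustration; the Pentium's actual rules for pairing instructions in its two integer pipelines (the U and V pipes) were more detailed.

```python
from typing import List, Tuple

# Hypothetical instruction format: (mnemonic, destination register, source registers)
Instr = Tuple[str, str, Tuple[str, ...]]


def can_pair(first: Instr, second: Instr) -> bool:
    """The second instruction may issue alongside the first only if it is
    independent: it must not read or overwrite the first one's result."""
    _, dest1, _ = first
    _, dest2, sources2 = second
    return dest1 not in sources2 and dest1 != dest2


def issue_cycles(program: List[Instr]) -> int:
    """Count issue cycles for an in-order processor with two integer units."""
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # both units busy: two instructions issue this cycle
        else:
            i += 1  # dependency (or end of program): only one issues
    return cycles


program = [
    ("ADD", "EAX", ("EAX", "EBX")),
    ("SUB", "ECX", ("ECX", "EDX")),  # independent of ADD, so they pair
    ("MOV", "ESI", ("EAX",)),
    ("INC", "ESI", ("ESI",)),        # needs the MOV result, so it waits
]
print(issue_cycles(program))  # 3 cycles rather than 4 on a scalar processor
```

The speed-up depends entirely on how many neighbouring instructions are independent, which is why compilers try to interleave unrelated work.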


X86 series evolution

Development of registers X86: 8086 and 80286

Development of registers X86: 80486

Development of registers X86: Pentium 3

Summary of X86

The X86 series of microprocessors can be characterised as having:
• a relatively small number of registers (8 GP, 8 FP and 8 SIMD)
• a large instruction set
• instructions of varying length
• many addressing modes

These characteristics are typical of CISC (complex instruction set computer) architecture. Other CISC-based processors include the IBM 370 and the VAX-11/780.

Questions (Scholar page 128)

5. Sketch a graph of the increase in clock speeds from the 8086 to the Pentium processor.
6. Which of the X86 processors was the first to use pipelining to improve performance?
7. How many registers has the (a) 8086, (b) 80286, (c) 80486, (d) Pentium?
8. Which X86 chip was the first to have a superscalar architecture?
9. The X86 series are considered to be CISC processors. Justify this claim.

PowerPC series

Background

Improvements in processor capability and operating systems led to the birth of the Wintel PC.

Wintel is a portmanteau of Windows and Intel. It usually means a computer based on an Intel x86-compatible processor and running the Microsoft Windows operating system.

The Wintel PC still dominates the laptop and desktop market.

Motorola

At the same time, Motorola was developing its own family of microprocessors, the 68000 series.

These were developed as 32-bit processors from the start.

As a result, Apple was able to develop its Macintosh computers with a true graphical OS from the start.

Motorola 68000 (1979)
• released around the same time as the Intel 8086
• 8MHz clock speed
• 32-bit architecture
• 16-bit data bus, 24-bit address bus
• 16 32-bit registers (8 data, 8 address)
• no segment registers required, as direct addressing was used
• used pre-fetching to speed up execution

Motorola 68020 (1984)
• 32-bit data and address buses
• pipeline had 3 stages
• 256-byte instruction cache added

Motorola 68040 (1991)
• 32-bit data and address buses
• pipeline had 6 stages
• floating point unit added on chip
• 4Kbyte caches for data and instructions added

Motorola 68060 (1994)
• superscalar: 3 execution units, 2 integer and 1 FP
• 10-stage pipelines
• 8Kbyte caches for data and instructions
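The benefit of the deeper pipelines listed above can be estimated with a simple idealised model, sketched in Python below. It assumes every stage takes exactly one clock cycle, that an unpipelined processor spends one cycle per stage on each instruction before starting the next, and that the pipeline never stalls; real chips fall short of this because of branches and dependencies.

```python
def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int) -> int:
    # The first instruction needs `stages` cycles to fill the pipeline;
    # after that, one instruction completes every cycle (no stalls assumed).
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


# Pipeline depths quoted for the 68020 (3), 68040 (6) and 68060 (10)
for stages in (3, 6, 10):
    n = 100
    before = unpipelined_cycles(n, stages)
    after = pipelined_cycles(n, stages)
    print(f"{stages}-stage: {before} cycles unpipelined, {after} pipelined, "
          f"speed-up {before / after:.1f}x")
```

As the number of instructions grows, the ideal speed-up approaches the number of stages, which is the motivation for deeper pipelines.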

Motorola series
• used in Sun workstations, Apple Macintosh computers, and later Atari computers
• no longer in use in the main computer market
• still used in embedded systems


Main Characteristics of Motorola series

In the final years of the 68000 processors, Apple, Motorola and IBM defined a specification for open system software and hardware, and Motorola and IBM designed the first PowerPC chip to meet this specification.

PowerPC

Acronym for "performance optimised with enhanced RISC". Compared with the CISC-based X86, it has:
• more registers
• a smaller, but more efficient, instruction set
• fewer addressing modes

PowerPC

First chip: the 601, in 1993
• 32-bit chip with a 64-bit data bus
• clock speed of 60MHz
• up to 4 GB of memory
• superscalar architecture: 3 independent execution units (integer, floating point and branch processing), each with a 6-stage pipeline

Used in the Nintendo Wii

Used in the Xbox 360

PowerPC overview

Used in:
• controllers in cars
• networking: routers and servers
• Honda's ASIMO
• the Vehicle-Management Computer for the F-35 fighter jet
• games consoles: PlayStation 3, Wii, Nintendo GameCube

All PowerPC processors have:
• two sets of 32 programmer-accessible registers (32 general purpose and 32 floating point, 64 bits wide)
• a small number of special purpose registers

Comparison of X86 with PowerPC

Direct addressing is used for Load, Store and Branch instructions. All other instructions address internal registers.


TRENDS important

Summary of table
• clock speeds have increased by a factor of 50 in 10 years
• bus speeds have increased by a factor of 20
• the complexity (number of transistors) has increased by a factor of 20
• on-chip cache has increased
• new features have been added
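A "factor of N in 10 years" is easier to compare as an average yearly rate. The Python sketch below assumes steady compound growth over a 10-year period for each factor (the summary attaches the 10-year period only to clock speed, so applying it to the other two is an assumption); real product releases improve in uneven steps.

```python
import math


def annual_growth(factor: float, years: float) -> float:
    """Average yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1 / years)


def doubling_time(factor: float, years: float) -> float:
    """Years needed to double at that same average rate."""
    return years * math.log(2) / math.log(factor)


for name, factor in [("clock speed", 50), ("bus speed", 20), ("transistor count", 20)]:
    rate = annual_growth(factor, 10)
    print(f"{name}: x{rate:.2f} per year "
          f"(about {100 * (rate - 1):.0f}% a year), "
          f"doubling every {doubling_time(factor, 10):.1f} years")
```

For clock speed this works out at roughly 48% a year, doubling about every 1.8 years.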

Clock speeds

PowerPC chips had lower clock speeds than CISC-based designs, but the more efficient RISC-based technology gave better performance. Clock speed alone cannot be used to compare processors.
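One standard way to see why clock speed alone is not enough: program run time depends on the number of instructions executed, the average number of clock cycles each instruction needs (CPI), and the clock rate. The figures in this Python sketch are hypothetical, chosen only to show a lower-clocked chip finishing first.

```python
def execution_time(instructions: float, cycles_per_instruction: float,
                   clock_hz: float) -> float:
    """Run time in seconds = instruction count x average CPI / clock rate."""
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical figures for the same program on two different processors:
# the RISC version needs more (simpler) instructions, but fewer cycles each.
cisc_time = execution_time(instructions=1.0e9, cycles_per_instruction=2.0, clock_hz=1.0e9)
risc_time = execution_time(instructions=1.2e9, cycles_per_instruction=1.0, clock_hz=0.7e9)

print(f"CISC at 1.0 GHz: {cisc_time:.2f} s")  # 2.00 s
print(f"RISC at 0.7 GHz: {risc_time:.2f} s")  # 1.71 s
```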

Questions (Page 133)

10. Which 3 companies cooperated in the design of the PowerPC specification?
11. What was the first PowerPC chip released, and when?
12. The 601 chip can be described as superscalar. How is this justified?
13. How many programmer-accessible registers are there in all PowerPC chips?
14. Compare the X86 and PowerPC architectures in terms of (a) instruction set, (b) instruction length, (c) addressing modes.
15. What new feature did the G3 chip have which improved performance?
16. Which was the first PowerPC chip to have SIMD instructions? (a) 601 (b) 604e (c) G3 (d) G4 (e) G5
17. Why is clock speed not a good way of comparing a Windows PC with an Apple Macintosh?
18. Other than in Apple computers, what are PowerPC chips used for?

Answers

Q10: Apple, Motorola, IBM
Q11: the 601, in 1993
Q12: it has 3 independent execution units: the integer unit, the floating point unit (FPU) and the branch processing unit
Q13: 2 sets of 32 registers, each 64 bits wide
Q14: (a) similar: X86 has 235 different instructions, PowerPC has 225; (b) X86 has varied instruction lengths (1-11 bytes), the PowerPC instructions are all exactly 4 bytes; (c) the X86 has 11 addressing modes, the PowerPC has only 2
Q15: L2 "backside" cache on chip
Q16: (d) G4
Q17: because the Mac uses the more efficient RISC architecture, a Mac with a lower clock speed may outperform a Windows PC with a higher clock speed
Q18: IBM servers, Nintendo GameCube, and a range of embedded applications

Intel IA-64

The X86 series reached its peak with the Pentium 3, Pentium 4 and Athlon processors. These are essentially CISC processors, using pipelining and superscalar processing, but with some RISC-like features.

In 1994, Intel and HP began work on designing a new 64-bit architecture to replace the X86 series.

EPIC

IA-64 combines RISC and CISC features, and is given the description EPIC: explicitly parallel instruction computing. There are 4 key features to the design:
• instruction-level parallelism: the compiler creates code which uses the many parallel execution units of the processor
• use of VLIW: very long instruction words
• use of predication: executing both branches of a program, then discarding the results of the "not chosen" branch
• use of speculative loading: use of a large, fast cache to load data and instructions in advance of when they will be required
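The difference between branching and predication can be sketched in Python. In the predicated version both candidate results are computed, and a pair of complementary true/false values (standing in for the predicate registers an IA-64 compare instruction sets) decides which result is kept, so there is no jump for the pipeline to mispredict. Python itself still runs this sequentially; it only models the idea.

```python
def with_branch(a: int, b: int) -> int:
    # Conventional code: a conditional jump selects one path to execute.
    if a > b:
        return a - b
    return b - a


def predicated(a: int, b: int) -> int:
    # A single compare sets two complementary predicates.
    p1 = a > b
    p2 = not p1
    # Both operations execute, whichever way the comparison went...
    result_if_p1 = a - b
    result_if_p2 = b - a
    # ...and only the result whose predicate is true is kept
    # (True == 1 and False == 0 in Python, so the other result is discarded).
    return p1 * result_if_p1 + p2 * result_if_p2


for a, b in [(7, 3), (3, 7), (5, 5)]:
    assert with_branch(a, b) == predicated(a, b)
```

The cost of predication is that work is done on the path that is thrown away; it pays off when branches are hard to predict and both paths are short.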

X86 compared with IA-64

VLIW

Very long instruction words:
• fetched from memory in bundles of 128 bits
• each bundle contains 3 instructions, each 41 bits long
• the remaining 5 bits are a pointer (Intel calls this field the template), which indicates to the processor to which of the many execution units each instruction should be assigned
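The bundle layout can be checked with a little bit arithmetic: 3 x 41 + 5 = 128. The Python sketch below packs and unpacks a bundle, assuming Intel's documented ordering in which the 5-bit template occupies the lowest-order bits (0 to 4) and the three 41-bit instruction slots follow it; the slot values used are arbitrary numbers, not real IA-64 instruction encodings.

```python
from typing import List, Tuple

TEMPLATE_BITS = 5
SLOT_BITS = 41
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS  # 128


def pack_bundle(template: int, slots: List[int]) -> int:
    """Combine a 5-bit template and three 41-bit slots into one 128-bit value."""
    if len(slots) != 3 or not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("need a 5-bit template and exactly 3 slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> Tuple[int, List[int]]:
    """Split a 128-bit bundle back into its template and three slots."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
             for i in range(3)]
    return template, slots


bundle = pack_bundle(0b11101, [1, 2, (1 << 41) - 1])
print(unpack_bundle(bundle))  # (29, [1, 2, 2199023255551])
```

Fetching one 128-bit bundle brings in three instructions at once, which is where the reduction in memory fetches comes from.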

IA-64 Execution Units
• I-unit (integer and logical operations)
• M-unit (load and store operations)
• B-unit (branch instructions)
• F-unit (floating point operations)

Pointer

5 bits = 32 different combinations, for example:
• 00000: send instruction 1 to the M-unit, instruction 2 to the I-unit, and instruction 3 to another I-unit
• 11101: send instruction 1 to the M-unit, instruction 2 to the F-unit, and instruction 3 to the B-unit

The pointer is created by the compiler, which determines in advance whether or not instructions can be executed in parallel.
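The compiler-chosen pointer works like a lookup key. The Python sketch below contains only the two pointer values given above (the real IA-64 table has more entries, and some of the 32 codes are unused), and the instruction strings are placeholders rather than real IA-64 assembly.

```python
from typing import List, Tuple

# Partial table: only the two examples from the slide.
TEMPLATES = {
    0b00000: ("M", "I", "I"),
    0b11101: ("M", "F", "B"),
}

UNITS = {
    "M": "M-unit (load and store)",
    "I": "I-unit (integer and logical)",
    "F": "F-unit (floating point)",
    "B": "B-unit (branch)",
}


def dispatch(template: int, instructions: List[str]) -> List[Tuple[str, str]]:
    """Pair each of a bundle's three instructions with the execution unit
    type that the compiler-chosen template assigns to its slot."""
    if template not in TEMPLATES:
        raise ValueError(f"template {template:05b} is not in this partial table")
    if len(instructions) != 3:
        raise ValueError("a bundle holds exactly 3 instructions")
    return [(instr, UNITS[unit]) for instr, unit in zip(instructions, TEMPLATES[template])]


for instr, unit in dispatch(0b11101, ["load x", "multiply y, z", "jump if done"]):
    print(f"{instr:15} -> {unit}")
```

Because the routing decision is already encoded in the bundle, the processor does not need run-time logic to work out which unit each instruction goes to.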

The Compiler

When the bundle arrives at the processor, the 3 instructions are directed to the appropriate execution units for processing.

Summary of IA-64

Performance is enhanced by:
• the use of VLIW, which reduces the number of relatively slow memory fetches
• the sequencing of instructions being determined by the compiler rather than being dealt with at run time

The Itanium processor

The first commercial version of the IA-64 architecture was massively superscalar, with 11 execution units:
• 4 integer units
• 2 floating point units
• 3 branch units
• 2 load/store units

It makes extensive use of:
• predication
• speculative loading of both data and instructions

It can execute up to 20 operations per cycle. A clock speed of 800MHz is the equivalent of an X86 or PowerPC running at several GHz.

It has:
• 128 64-bit registers for integer/logical/general purpose use
• 128 82-bit registers for floating point and graphics use
• a data bus 128 bits wide
• an address bus 64 bits wide (potentially 16 Exabytes, that is 2^64 bytes, of addressable memory)

1 Exabyte = 1024 Petabytes = 1024 x 1024 Terabytes = 1024 x 1024 x 1024 Gigabytes
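The number of bytes an address bus can reach is 2 raised to the bus width, assuming byte-addressable memory. This Python sketch applies that to address bus widths mentioned in these notes (Intel 8080: 16 bits, Motorola 68000: 24 bits, Motorola 68020: 32 bits, Itanium: 64 bits) and expresses the result in the same 1024-based units used above.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def addressable_bytes(address_bits: int) -> int:
    # Every extra address line doubles the number of distinct addresses.
    return 2 ** address_bits


def human_readable(n_bytes: int) -> str:
    """Express a byte count in 1024-based units while it divides exactly."""
    value, unit = n_bytes, 0
    while value >= 1024 and value % 1024 == 0 and unit < len(UNITS) - 1:
        value //= 1024
        unit += 1
    return f"{value} {UNITS[unit]}"


for chip, bits in [("Intel 8080", 16), ("Motorola 68000", 24),
                   ("Motorola 68020", 32), ("Itanium", 64)]:
    print(f"{chip}: {bits}-bit address bus -> "
          f"{human_readable(addressable_bytes(bits))}")
```

This prints 64 KB, 16 MB, 4 GB and 16 EB respectively.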

Questions (Page 137)

19. The IA-64 uses VLIW. What does this mean?
20. Can the Itanium be described as a superscalar architecture?
21. IA-64 chips use predication. Explain the difference between predication and branch prediction.
22. How can an 800MHz Itanium outperform a 2.5GHz Pentium?

Answers

Q19: VLIW = very long instruction word; the IA-64 fetches a 128-bit bundle containing 3 41-bit instructions during each memory fetch
Q20: yes, it has 11 execution units which can operate in parallel
Q21: branch prediction means "guessing" whether or not a branch will be taken, and executing the following instructions accordingly; if the prediction is wrong, the pipeline will stall. Predication means executing instructions from both branches simultaneously, and discarding the results from the branch which is not required
Q22: due to its parallel execution units, 10-stage pipeline, VLIW memory accessing and use of predication and speculative loading, the Itanium can process up to 20 operations per cycle

Intel Itanium

Intel has released two processor families using the brand: the original Itanium and the Itanium 2.

Starting November 1, 2007, new members of the second family are again called Itanium.

The processors are marketed for use in enterprise servers and high-performance computing systems.

Dual Core

Dual-core refers to a CPU that includes two complete execution cores per physical processor.


Parallel Computing

SQA arrangements

Description of how parallel computers function, referring to their use of:
• local (cache) as well as main memory
• pipelining
• local pathways and packet switching to achieve communication between CPUs

Description of the performance benefits of parallel computers.

Examples of parallel computing
• pipelining: executing one instruction while fetching the next
• superscalar architecture: multiple execution units all processing different operations simultaneously
• SIMD instructions: the same instruction being applied to several data items at the same time

Parallel Computing

Another approach is to have multiple processors. This is the basis of most mainframe computers and supercomputers.

Parallel computing means using multiple processing elements simultaneously to solve a problem. This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others.

The processing elements can include resources such as a single computer with multiple processors, several networked computers, specialised hardware, or any combination of the above.
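The "break the problem into independent parts" idea can be sketched with Python's standard concurrent.futures module: the data is split into chunks, each chunk is handed to a separate worker, and the partial results are combined at the end. Threads are used here only to keep the example self-contained; in CPython, CPU-bound threads do not run truly in parallel because of the global interpreter lock, so ProcessPoolExecutor would be the usual choice for real speed-ups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def sum_of_squares(chunk: List[int]) -> int:
    # The work done independently by one processing element.
    return sum(x * x for x in chunk)


def split(data: List[int], parts: int) -> List[List[int]]:
    """Divide the data into (at most) `parts` independent, similar-sized chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data: List[int], workers: int = 4) -> int:
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)  # combine the partial results


numbers = list(range(1_000))
print(parallel_sum_of_squares(numbers))  # same answer as the sequential version
```

The pattern only works because each chunk can be processed without knowing anything about the others; problems with that property are the ones that benefit most from parallel hardware.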

Multiprocessing, mainframes and supercomputers

The simplest arrangement is several processors connected to the same system bus.

Each processor has shared access to memory and to I/O devices.
• Master-slave: some systems have one processor controlling the others
• Symmetrical multiprocessing (SMP): in other systems all processors are equal (up to 10 processors)

Multiprocessing

Multiprocessing is not limited to mainframe systems: the Power Mac G5 dual-processor desktop system has 2 G5 processors.


Comparison

Massively Parallel Architectures

Massively parallel processing (MPP) is a term used in computer architecture to refer to a computer system with many independent arithmetic units or entire microprocessors that run in parallel.

The term massive connotes hundreds if not thousands of such units.

The processors are arranged in an interconnected array which serves as a network.

Early examples of such a system are the Distributed Array Processor, the Goodyear MPP, the Connection Machine, and the Ultracomputer.

Today's most powerful supercomputers are all MPP systems, such as Earth Simulator, Blue Gene, ASCI White, ASCI Red, ASCI Purple, and ASCI Thor's Hammer.

Memory
• each processor has access to its own local memory or cache
• all processors can access a main (global) memory by a system-wide bus

Processors are pipelined: the results from one processor can become the input for another processor.

As well as the system bus, there may be local pathways connecting groups of processors into clusters, and other pathways connecting clusters.

MP Architectures: communication

To achieve communication between processors, parallel computers use:
• data pathways (buses) to connect clusters of processors, as well as system buses to connect processors, and pipelines, enabling the results of one CPU to flow into another
• or packet switching techniques, similar to those used in networks, to manage the flow of data between processors

Packet switching techniques, similar to those on a network, are used, in which data packets are assigned the addresses of specific nodes (processors) on the array.

This enables any processor on the array to access the local memory of any other processor on the array, or to pass data or instructions to other processors.
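Addressing packets to nodes in a grid can be illustrated with dimension-order routing, one simple and common scheme (not necessarily the one used by any machine named in these notes): the packet first moves along the x axis until its x coordinate matches the destination, then along y, then along z. This Python sketch assumes a plain 3-dimensional grid without wrap-around links, and the Packet type is a hypothetical illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[int, int, int]  # (x, y, z) position of a node in the array


@dataclass
class Packet:
    source: Coord
    destination: Coord  # address of the target node (processor)
    payload: str


def route(packet: Packet) -> List[Coord]:
    """Return every node the packet passes through, correcting x, then y, then z."""
    current = list(packet.source)
    path: List[Coord] = [packet.source]
    for axis in range(3):
        target = packet.destination[axis]
        step = 1 if target > current[axis] else -1
        while current[axis] != target:
            current[axis] += step  # one hop to a neighbouring node
            path.append((current[0], current[1], current[2]))
    return path


p = Packet(source=(0, 0, 0), destination=(2, 1, 1), payload="partial result")
print(route(p))
# [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 1, 1)]
```

Because every node only needs the destination address to choose its next hop, no central controller is required to manage the traffic.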


MPP

Examples: Lucidor

It consists of 90 interconnected nodes. Each node has two 900MHz Itanium 2 processors accessing 16K of L1 cache and 256K of L2 cache.

Each node can access the system bus via a 128-port switch at a data transfer rate of 2Gbits per second.

In addition to the local memory, each node has shared access to 6GB of main memory.

As a result, the system can achieve data processing rates of over 600 GFLOPS.
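Where a figure like "over 600 GFLOPS" comes from can be checked with a peak-performance estimate: nodes x processors per node x clock rate x floating point operations per cycle. The Python sketch below assumes 900 MHz Itanium 2 processors completing 4 floating point operations per cycle (two fused multiply-add units); those two values are assumptions for this calculation. A peak figure is a theoretical ceiling, and sustained performance on real programs is lower.

```python
def peak_flops(nodes: int, processors_per_node: int,
               clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every processor completes its maximum number of
    floating point operations on every clock cycle."""
    return nodes * processors_per_node * clock_hz * flops_per_cycle


lucidor = peak_flops(nodes=90, processors_per_node=2,
                     clock_hz=900e6, flops_per_cycle=4)  # assumed clock and FLOPs/cycle
print(f"{lucidor / 1e9:.0f} GFLOPS")  # 648 GFLOPS
```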

Hitachi SR2201

The SR2201 scales from 8 up to 2048 processors. The processors (Hitachi RISC chips) are arranged in a 3-dimensional grid to maximise communication between them.

As with Lucidor, speeds of up to 600 GFLOPS can be achieved.

These systems are in use for a variety of applications, including structural and crash analysis, fluid dynamics research, quantum chemistry analysis and visualisation tools.

All of these can make use of the parallel architecture, as they require high speed processing of large amounts of data.

Cray

The Cray T3D is another example, with 2048 nodes arranged in a 3-dimensional grid. Each node has 2 Alpha processors, with access to individual cache and 8Mwords of memory.

Cray claims that this system can process 1 trillion floating point operations per second.


Cray 2

Blue Gene

Blue Gene is a computer architecture project to produce several supercomputers designed to reach operating speeds in the PFLOPS (petaFLOPS) range, and currently reaching sustained speeds of nearly 500 TFLOPS (teraFLOPS).

Blue Gene/L has 65,536 dual-processor chips (131,072 processors). Each is connected by 3 networks. At the time of writing, Blue Gene/L is the fastest computer in the world, achieving over 70 TFLOPS.


Blue Gene

Chip – 2 processors

Card – 2 chips

Node – 16 cards

Cabinet – 32 nodes

System – 64 cabinets
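Multiplying the packaging levels listed above gives the size of the full machine. This short Python calculation uses only the counts on the slide; note that the number of chips it gives, 65,536, is the figure usually quoted for Blue Gene/L, and at 2 processors per chip the processor total is twice that.

```python
from math import prod

# Packaging levels from the slide, smallest first: (what is counted, how many)
LEVELS = [
    ("processors per chip", 2),
    ("chips per card", 2),
    ("cards per node", 16),
    ("nodes per cabinet", 32),
    ("cabinets per system", 64),
]

chips = prod(count for _, count in LEVELS[1:])  # every level above the chip
processors = prod(count for _, count in LEVELS)

print(f"{chips:,} chips, {processors:,} processors")  # 65,536 chips, 131,072 processors
```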


Exercise

Research one of the following: Earth Simulator, Blue Gene, ASCI White, ASCI Red, ASCI Purple, and ASCI Thor's Hammer, in terms of:
• number of nodes
• number of processors at each node
• global memory
• processing power in teraflops (TFLOPS)
• applications

Past Paper Questions

2011 Q14; 2008 Q13; 2007 Q17a,b; 2006 Q15b

Past Paper 2008

JGT(37): if the flag is set, jump to location 37.

Describe the problem that instruction JGT(37) could cause for a processor using a pipeline.
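The problem can be made concrete with a small Python simulation of a hypothetical 3-stage pipeline (fetch, decode, execute) in which a jump is only resolved in the execute stage. By then the two instructions that sequentially follow JGT(37) have already been fetched; if the flag is set they are the wrong instructions and must be flushed, wasting cycles. The program, its addresses and its other mnemonics are invented for illustration, and the flag is passed in directly rather than set by an earlier instruction.

```python
from typing import Dict, List, Optional, Tuple

Slot = Optional[Tuple[int, str]]  # (address, instruction), or None for an empty stage


def simulate(program: Dict[int, str], start: int, flag_set: bool,
             max_cycles: int = 1000) -> Tuple[int, List[str], List[str]]:
    """Run the pipeline until it drains; return (cycles, executed, flushed)."""
    fetch: Slot = None
    decode: Slot = None
    execute: Slot = None
    pc: Optional[int] = start
    cycles = 0
    executed: List[str] = []
    flushed: List[str] = []
    while cycles < max_cycles:
        # Every instruction moves on one stage and a new one is fetched.
        execute, decode = decode, fetch
        fetch = (pc, program[pc]) if pc in program else None
        pc = pc + 1 if fetch is not None else None
        if fetch is None and decode is None and execute is None:
            return cycles, executed, flushed
        cycles += 1
        if execute is not None:
            _, text = execute
            executed.append(text)
            if text.startswith("JGT(") and flag_set:
                # Jump taken: the instructions fetched behind it are wrong.
                flushed += [s[1] for s in (decode, fetch) if s is not None]
                decode = fetch = None
                pc = int(text[4:-1])  # target address, e.g. 37
    raise RuntimeError("program did not finish within max_cycles")


program = {10: "LOAD", 11: "CMP", 12: "JGT(37)", 13: "ADD", 14: "SUB", 37: "STORE"}
print(simulate(program, start=10, flag_set=True))
# (8, ['LOAD', 'CMP', 'JGT(37)', 'STORE'], ['ADD', 'SUB'])
print(simulate(program, start=10, flag_set=False))
# (7, ['LOAD', 'CMP', 'JGT(37)', 'ADD', 'SUB'], [])
```

With the jump taken, only 4 instructions complete but the run takes 8 cycles rather than the ideal 6: the 2 extra cycles were spent on the flushed ADD and SUB.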

2009

Mediatrain is a company which uses a high performance computer system to produce multimedia training projects. The computer system has a PowerPC superscalar processor which has thirty-two 64-bit general purpose registers.

(a) The PowerPC is an example of a RISC processor. RISC processors have a large number of general purpose registers. Name three other features of a RISC processor that distinguish it from a CISC processor. (3)

2009 (cont.)

(c) Explain the benefit to the PowerPC processor of having so many general purpose registers. (2)

(d) Most of the instructions in the PowerPC processor instruction set have an op-code and an operand. Describe the function of the op-code and the operand. (2)

2009 (cont.)

(e) Superscalar processing involves the use of multiple pipelines. State a feature of the PowerPC processor which makes it suited to superscalar processing. Justify your answer. (4)

2009 (cont.)

(f) Branch instructions can cause a problem for processors which use pipelines. Branch prediction can reduce this problem. Describe how branch prediction operates. (3)
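One widely used dynamic branch prediction scheme is the 2-bit saturating counter, sketched in Python below; it is shown as a general example, not as the exact mechanism of any particular PowerPC chip. The processor keeps a small counter for a branch, predicts "taken" when the counter is in its upper half, and nudges the counter towards each actual outcome, so a single unusual outcome (such as a loop exiting) does not immediately flip the prediction.

```python
from typing import List


class TwoBitPredictor:
    """States 0-1 predict 'not taken'; states 2-3 predict 'taken'."""

    def __init__(self, state: int = 2) -> None:
        self.state = state  # start in 'weakly taken'

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the real outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes: List[bool]) -> float:
    """Fraction of branch outcomes the predictor guessed correctly."""
    if not outcomes:
        return 0.0
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1  # pipeline keeps flowing
        # otherwise the wrongly fetched instructions would be flushed
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken when the loop exits; run 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(f"{accuracy(loop_branch):.0%}")  # 90%
```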

2009 (cont.)

(g) The PowerPC processor makes use of Single Instruction Multiple Data (SIMD) instructions. Explain how the use of SIMD instructions improves performance, using a suitable multimedia example. (3)

2008, Question 15

The Pentium III processor has eight registers which can be operated on by SIMD instructions.

(a) Describe what is meant by a SIMD instruction. (1)

(b) Describe how the Pentium III could use SIMD instructions and registers when adjusting the brightness of a graphic. (3)
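The brightness example can be modelled in Python. Each group of 8 pixel values stands for one hypothetical 64-bit SIMD register holding eight 8-bit values, and a single "instruction" adds the same amount to every value in the register, clamping at 0 and 255 (saturation) rather than wrapping around. The register size and the clamping behaviour are assumptions of this sketch, and Python loops over the values internally, so it illustrates the data layout and instruction count, not real hardware speed.

```python
from typing import List, Tuple

LANES = 8  # assumed register width: eight 8-bit pixel values


def simd_add_saturate(register: List[int], amount: int) -> List[int]:
    """Model of ONE packed SIMD instruction: the same addition is applied to
    every value in the register, clamped to the 0-255 range."""
    return [max(0, min(255, value + amount)) for value in register]


def adjust_brightness(pixels: List[int], amount: int) -> List[int]:
    result: List[int] = []
    for i in range(0, len(pixels), LANES):
        register = pixels[i:i + LANES]  # load up to 8 pixels into one register
        result.extend(simd_add_saturate(register, amount))
    return result


def instruction_counts(n_pixels: int) -> Tuple[int, int]:
    scalar = n_pixels                # one add per pixel without SIMD
    simd = -(-n_pixels // LANES)     # one add per register of up to 8 pixels
    return scalar, simd


pixels = [0, 100, 250, 255, 10, 20, 30, 40, 50]
print(adjust_brightness(pixels, 10))  # [10, 110, 255, 255, 20, 30, 40, 50, 60]
print(instruction_counts(640 * 480))  # (307200, 38400)
```

For a 640 x 480 greyscale image, one add per 8 pixels cuts the number of add instructions by a factor of 8.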