comp arch ch1 ch2 ch3 ch4

8/2/2019 Comp Arch Ch1 Ch2 Ch3 Ch4

1/161

William Stallings

Computer Organization and Architecture, 8th Edition

CHAPTER 1

INTRODUCTION


2/161

Architecture and Organization

Architecture refers to those attributes visible to the programmer: instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.

e.g. Is there a multiply instruction?

Organization is how features are implemented: control signals, interfaces, memory technology.

e.g. Is there a hardware multiply unit, or is multiplication done by repeated addition?
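The organization question above can be made concrete with a minimal Python sketch: without a hardware multiply unit, a product can still be computed in software by repeated addition. The function name is hypothetical, and the sketch assumes non-negative integer operands.

```python
def multiply_by_repeated_addition(a: int, b: int) -> int:
    """Multiply two non-negative integers using only addition.

    Models an organization with no hardware multiply unit: the product
    a * b is built up by adding a to an accumulator b times.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    product = 0
    for _ in range(b):  # b additions, so the cost grows with the multiplier
        product += a
    return product


print(multiply_by_repeated_addition(7, 6))  # 42
```

Architecturally, a single multiply instruction could hide either implementation; the sketch shows why the repeated-addition organization is slower, since the number of additions grows with the value of `b`.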


3/161

Family Concept

All members of the Intel x86 family share the same basic architecture

All members of the IBM System/370 family share the same basic architecture

This gives code compatibility (at least backwards)

Organization differs between different versions

Architecture and Organization


4/161

Computer: a complex system. How can we design/describe it?

Hierarchical system: a set of interrelated subsystems, each subsystem hierarchic in structure until some lowest level of elementary subsystems is reached

At each level of the system, the designer is concerned with structure and function.



5/161

Structure and Function

Structure is the way in which components relate to each other

Function is the operation of individual components as part of the structure


6/161

Function

General computer functions:

Data processing

Data storage

Data movement

Control


7/161

Operations

Data movement

Ex., keyboard to screen

Functional View of the Computer


8/161

Operations

Storage

Ex., Internet download to disk

Playing an mp3 file stored in memory to earphones attached to the same PC.


9/161

Operations

Processing from/to storage

Any number-crunching application that takes data from memory and stores the result back in memory.

Ex., updating bank statement


10/161

Operations

Processing from storage to I/O

Receiving packets over a network interface, verifying their CRC, then storing them in memory.

Ex., printing a bank statement


11/161

Structure

Four main structural components

CPU

Main Memory

I/O Devices

System Interconnection


12/161

Structure

Four main structural components

1. Central Processing Unit (CPU)

Controls the operation of the computer and performs its data processing functions; often simply referred to as processor.

2. Main Memory

Stores data


13/161

Structure

Four main structural components

3. I/O

Moves data between the computer and its external environment.

4. System Interconnection

Some mechanism that provides for communication among CPU, main memory, and I/O. A common example of system interconnection is a system bus, consisting of a number of wires to which all other components attach.


14/161

Structure - Top Level

[Figure: the Computer comprises the Central Processing Unit, Main Memory, Input/Output, and Systems Interconnection; it connects to Peripherals and Communication lines]


15/161

Structure - The CPU

[Figure: the Computer comprises the CPU, Memory, and I/O, joined by the System Bus; the CPU comprises the Arithmetic and Logic Unit, Control Unit, Registers, and Internal CPU Interconnection]


16/161

Structure - The Control Unit

[Figure: the CPU comprises the ALU, Registers, Internal Bus, and Control Unit; the Control Unit comprises Sequencing Logic, Control Unit Registers and Decoders, and Control Memory]


17/161

Computer Evolution and Performance

CHAPTER 2


18/161

Brief History of Computers

The First Generation: Vacuum Tubes

ENIAC

o Electronic Numerical Integrator And Computer

o World's first general-purpose electronic digital computer

o John Mauchly and John Eckert

o It weighed 30 tons, occupied 1500 square feet of floor space, and contained more than 18,000 vacuum tubes.


19/161


The First Generation: Vacuum Tubes

Von Neumann/Turing

o Stored-program concept

o Main memory storing programs and data

o Attributed to John von Neumann, who was a consultant on the ENIAC project; Alan Turing developed the idea at about the same time

o Input and output equipment operated by control unit


20/161

o In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS computer.

o The IAS computer, although not completed until 1952, is the prototype of all subsequent general-purpose computers.


21/161


The IAS computer consists of:

o A main memory, which stores both data and instructions

o An arithmetic and logic unit (ALU) capable of operating on binary data

o A control unit, which interprets the instructions in memory and causes them to be executed

o Input and output (I/O) equipment operated by the control unit


22/161

Structure of the IAS computer


23/161

John von Neumann and the IAS machine, 1952


24/161

UNIVAC

o UNIVAC I (Universal Automatic Computer)

o 1947 - Eckert-Mauchly Computer Corporation

o First successful commercial computer. It was intended for both scientific and commercial applications.

o US Bureau of Census 1950 calculations

o Became part of Sperry-Rand Corporation

o Late 1950s - UNIVAC II

- Faster

- More memory


25/161

IBM

o Punched-card processing equipment

o 1953 - the 701

o IBM's first stored program computer

o Scientific calculations

o 1955 - the 702

o Business applications

o Led to 700/7000 series


26/161


The Second Generation: Transistors

Transistor

o is smaller, cheaper, and dissipates less heat than a vacuum tube but can be used in the same way as a vacuum tube to construct computers

o invented at Bell Labs in 1947 by William Shockley et al.

o IBM 7000

o DEC (Digital Equipment Corporation) was founded in 1957

o Produced PDP-1 in the same year


27/161


The Third Generation: Integrated Circuits

o A computer is made up of gates, memory cells, and interconnections

o A single, self-contained transistor is called a discrete component

o All these can be manufactured either separately (discrete components) or on the same piece of semiconductor


28/161


Generations of Computers

o Vacuum tube - 1946-1957

o Transistor - 1958-1964

o Small scale integration - 1965 on

- Up to 100 devices on a chip

o Medium scale integration - to 1971

- 100-3,000 devices on a chip

o Large scale integration - 1971-1977

- 3,000-100,000 devices on a chip

o Very large scale integration - 1978-1991

- 100,000-100,000,000 devices on a chip

o Ultra large scale integration - 1991 on

- Over 100,000,000 devices on a chip


29/161

Moore's Law

Increased density of components on chip

Gordon Moore, co-founder of Intel: number of transistors on a chip will double every year

Since the 1970s development has slowed a little

Number of transistors doubles every 18 months

Cost of a chip has remained almost unchanged

Higher packing density means shorter electrical paths, giving higher performance

Smaller size gives increased flexibility

Reduced power and cooling requirements

Fewer interconnections increase reliability
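The doubling rule is compound growth: the projected count is the starting count times 2 raised to the number of doubling periods elapsed. A small sketch, assuming a hypothetical chip that starts with 1,000 transistors and the 18-month doubling period quoted above:

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_period_months: float = 18) -> float:
    """Project a transistor count that doubles every doubling_period_months."""
    doublings = years * 12 / doubling_period_months
    return initial_count * 2 ** doublings


# Hypothetical starting point: 1,000 transistors.
for years in (3, 6, 15):
    print(years, projected_transistors(1000, years))
# 3 4000.0
# 6 16000.0
# 15 1024000.0
```

Passing `doubling_period_months=12` gives Moore's original one-year rule instead.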


30/161

Growth in CPU Transistor Count


31/161

IBM 360 Series

First planned family of computers

Similar or identical O/S

Increasing speed

Increasing number of I/O ports (i.e. more terminals)

Increased memory size

Increased cost

Multiplexed switch structure


32/161

DEC PDP-8 (1964)

First minicomputer (after miniskirt!)

Did not need air-conditioned room

Small enough to sit on a lab bench

$16,000

- $100k+ for IBM 360

Embedded applications & OEM

BUS STRUCTURE


33/161

DEC-PDP 8 Bus Structure


34/161

Semiconductor Memory

1970: Fairchild

Size of a single core

- i.e. 1 bit of magnetic core storage

Holds 256 bits

Non-destructive read

Much faster than core

Capacity approximately doubles each year


35/161

Microprocessors -Intel

1971 - 4004

First microprocessor

All CPU components on a single chip

4 bit

Multiplication by repeated addition, no hardware multiplier!

Followed in 1972 by 8008

8 bit

Both designed for specific applications

1974 - 8080

Intel's first general-purpose microprocessor


36/161

1970s Processors


37/161

1980s Processors


38/161

1990s Processors


39/161

Recent Processors


40/161

Designing for Performance

Year by year, the cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically.

The basic building blocks for today's computer miracles are virtually the same as those of the IAS computer from over 50 years ago, while on the other hand, the techniques for squeezing the last iota of performance out of the materials at hand have become increasingly sophisticated.


41/161


But many techniques have been invented to improve performance.

Some of the main techniques are the following:

Pipelining

On board cache

On board L1 and L2 cache

Branch Prediction - The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next. If the processor guesses right most of the time, it can pre-fetch the correct instructions and buffer them so that the processor is kept busy.


42/161


Data Flow Analysis - The processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.

Speculative Execution - Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations. This enables the processor to keep its execution engines as busy as possible by executing instructions that are likely to be needed.
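The slides do not say how a branch prediction is made. One common scheme, swapped in here purely as an illustration, is a 2-bit saturating counter kept per branch. The sketch below assumes a hypothetical loop branch that is taken 9 times and then falls through, with the loop executed 3 times.

```python
class TwoBitPredictor:
    """2-bit saturating counter for a single branch.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    A single misprediction from a strong state only weakens the prediction,
    so a loop-closing branch is mispredicted once per loop exit.
    """

    def __init__(self, state: int = 2) -> None:
        self.state = state  # start in "weakly taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def prediction_accuracy(outcomes: list[bool]) -> float:
    """Fraction of branch outcomes predicted correctly by a fresh predictor."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)  # learn from the actual outcome
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then not taken; the loop runs 3 times.
outcomes = ([True] * 9 + [False]) * 3
print(prediction_accuracy(outcomes))  # 0.9
```

A correct prediction lets the processor pre-fetch along the right path; each of the three loop exits here is the only misprediction in its loop.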


43/161

Performance Balance

While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up. The result is a need to look for performance balance: an adjusting of the organization and architecture to compensate for the mismatch among the capabilities of the various components.

Processor speed increased

Memory capacity increased


44/161

Logic and Memory Performance Gap


45/161

While processor speed has grown rapidly, the speed with which data can be transferred between main memory and the processor has lagged badly. The interface between processor and main memory is the most crucial pathway in the entire computer because it is responsible for carrying a constant flow of program instructions and data between memory chips and the processor. If memory or the pathway fails to keep pace with the processor's insistent demands, the processor stalls in a wait state, and valuable processing time is lost.


46/161

Solutions

Increase number of bits retrieved at one time

- Make DRAM "wider" rather than "deeper"

Change DRAM interface

- Cache

Reduce frequency of memory access

- More complex cache and cache on chip

Increase interconnection bandwidth

- High speed buses

- Hierarchy of buses


47/161

I/O Devices

As computers become faster and more capable, more sophisticated applications are developed that support the use of peripherals with intensive I/O demands.

Solutions:

Caching

Buffering

Higher-speed interconnection buses

More elaborate bus structures

Multiple processor configurations


48/161

Typical I/O Device Data Rates


49/161

The key is balance among:

Processor components

Main memory

I/O devices

Interconnection structures


50/161

The evolution of the Intel X86 Architecture

8080: The world's first general-purpose microprocessor. This was an 8-bit machine, with an 8-bit data path to memory. The 8080 was used in the first personal computer, the Altair.

8086: A far more powerful, 16-bit machine. In addition to a wider data path and larger registers, the 8086 sported an instruction cache, or queue, that pre-fetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM's first personal computer, securing the success of Intel. The 8086 is the first appearance of the x86 architecture.



51/161


80286: This extension of the 8086 enabled addressing a 16-MByte memory instead of just 1 MByte.

80386: Intel's first 32-bit machine, and a major overhaul of the product. With a 32-bit architecture, the 80386 rivaled the complexity and power of minicomputers and mainframes introduced just a few years earlier. This was the first Intel processor to support multitasking, meaning it could run multiple programs at the same time.



52/161


80486: The 80486 introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining. The 80486 also offered a built-in math coprocessor, offloading complex math operations from the main CPU.

Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.



53/161


Pentium Pro: The Pentium Pro continued the move into superscalar organization begun with the Pentium, with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.

Pentium II: The Pentium II incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently.

Pentium III: The Pentium III incorporates additional floating-point instructions to support 3D graphics software.



54/161


Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.

Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip.

Core 2: The Core 2 extends the architecture to 64 bits. The Core 2 Quad provides four processors on a single chip.


55/161

Embedded Systems and ARM

The ARM architecture refers to a processor architecture that has evolved from RISC design principles and is used in embedded systems.

The term embedded system refers to the use of electronics and software within a product, as opposed to a general-purpose computer, such as a laptop or desktop system.

Embedded system. A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.



56/161


Embedded Systems Requirements:

Small to large systems, implying very different cost constraints, thus different needs for optimization and reuse

Relaxed to very strict requirements and combinations of different quality requirements, for example, with respect to safety, reliability, real-time, flexibility, and legislation

Short to long life times

Different environmental conditions in terms of, for example, radiation, vibrations, and humidity


57/161

Possible Organization of an Embedded System


58/161


59/161

ARM Evolution


60/161

ARM processors are designed to meet the needs of three system categories:

Embedded real-time systems: Systems for storage, automotive body and power-train, industrial, and networking applications

Application platforms: Devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment and digital imaging applications

Secure applications: Smart cards, SIM cards, and payment terminals


61/161

Performance Assessment

In evaluating processor hardware and setting requirements for new systems, performance is one of the key parameters to consider, along with cost, size, security, reliability, and in some cases power consumption.

System clock speed

Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on, are governed by a system clock.

The speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or Hertz (Hz).


62/161

Performance Assessment

Clock signals are generated by a quartz crystal, which generates a constant signal wave while power is applied. This wave is converted into a digital voltage pulse stream that is provided in a constant flow to the processor circuitry.

The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle time.
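Because the cycle time is the time between pulses, it is the reciprocal of the clock rate. A one-line sketch (the function name is hypothetical):

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Cycle time in nanoseconds: the reciprocal of the clock rate."""
    return 1e9 / clock_rate_hz


print(cycle_time_ns(400e6))  # 2.5  (a 400-MHz clock ticks every 2.5 ns)
print(cycle_time_ns(1e9))    # 1.0  (a 1-GHz clock ticks every 1 ns)
```

An instruction that needs several clock cycles takes that many cycle times, which is why clock rate alone does not determine execution speed.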


63/161

System Clock


64/161

Instruction execution takes place in discrete steps:

Fetch, decode, load and store, arithmetic or logical

Usually requires multiple clock cycles per instruction

Pipelining gives simultaneous execution of instructions

Conclusion: clock speed is not the whole story about performance


65/161


66/161

Instruction execution rate

Let CPI_i be the number of cycles required for instruction type i, and I_i be the number of executed instructions of type i for a given program. Then we can calculate an overall CPI as follows, where I_c is the total number of executed instructions:

$$\mathrm{CPI} = \frac{\sum_{i=1}^{n} (\mathrm{CPI}_i \times I_i)}{I_c}$$
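The overall CPI is a weighted average: each instruction type contributes its cycle count in proportion to how often it executes. A minimal sketch; the instruction mix is hypothetical.

```python
def overall_cpi(mix: list[tuple[float, int]]) -> float:
    """Overall CPI from (CPI_i, I_i) pairs.

    CPI = sum(CPI_i * I_i) / I_c, where I_c = sum(I_i) is the total
    number of executed instructions.
    """
    total_cycles = sum(cpi_i * count_i for cpi_i, count_i in mix)
    instruction_count = sum(count_i for _, count_i in mix)
    return total_cycles / instruction_count


# Hypothetical program: 500 one-cycle, 300 two-cycle, 200 five-cycle instructions.
mix = [(1, 500), (2, 300), (5, 200)]
print(overall_cpi(mix))  # 2.1  (2100 cycles / 1000 instructions)
```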


67/161


68/161

Instruction execution rate

Millions of instructions per second (MIPS)

Millions of floating-point operations per second (MFLOPS)

Heavily dependent on:

instruction set

compiler design

processor implementation

cache & memory hierarchy

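We can express the MIPS rate in terms of the clock rate f and CPI as follows, where I_c is the number of executed instructions and T is the program execution time:

$$\text{MIPS rate} = \frac{I_c}{T \times 10^6} = \frac{f}{\mathrm{CPI} \times 10^6}$$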


69/161


70/161

The average CPI when the program is executed on a uniprocessor with the above trace results is

$$\mathrm{CPI} = 0.6 + (2 \times 0.18) + (4 \times 0.12) + (8 \times 0.1) = 2.24$$

The corresponding MIPS rate is

$$\frac{400 \times 10^6}{2.24 \times 10^6} \approx 178$$
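The worked example above can be checked with a short sketch. The mix is written as (CPI_i, fraction of executed instructions) pairs; a CPI of 1 for the first class is inferred from the bare 0.6 term, and the 400-MHz clock rate from the numerator.

```python
def average_cpi(mix: list[tuple[float, float]]) -> float:
    """Average CPI from (CPI_i, fraction of executed instructions) pairs."""
    return sum(cpi_i * fraction for cpi_i, fraction in mix)


def mips_rate(clock_rate_hz: float, cpi: float) -> float:
    """MIPS rate = f / (CPI * 10**6)."""
    return clock_rate_hz / (cpi * 1e6)


mix = [(1, 0.6), (2, 0.18), (4, 0.12), (8, 0.1)]
cpi = average_cpi(mix)
print(round(cpi, 2))               # 2.24
print(int(mips_rate(400e6, cpi)))  # 178
```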

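Floating-point performance is expressed as millions of floating-point operations per second (MFLOPS), defined as follows:

$$\text{MFLOPS rate} = \frac{\text{Number of executed floating-point operations in a program}}{\text{Execution time} \times 10^6}$$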


71/161

Benchmarks

Programs designed to test performance

A benchmark suite is a collection of programs, defined in a high-level language, that together attempt to provide a representative test of a computer in a particular application or system programming area.

The Standard Performance Evaluation Corporation (SPEC) defines and maintains the best-known collection of benchmark suites
Averaging Results


72/161

Averaging Results

To obtain a reliable comparison of the performance of various computers, it is preferable to run a number of different benchmark programs on each machine and then average the results. For example, given m different benchmark programs, a simple arithmetic mean can be calculated as follows:

$$R_A = \frac{1}{m} \sum_{i=1}^{m} R_i$$

where R_i is the high-level language instruction execution rate for the ith benchmark program.

Alternative: Harmonic Mean

$$R_H = \frac{m}{\sum_{i=1}^{m} \frac{1}{R_i}}$$
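A sketch comparing the two averages on the same per-benchmark rates; the three rates are hypothetical.

```python
def arithmetic_mean(rates: list[float]) -> float:
    """R_A = (1/m) * sum(R_i)."""
    return sum(rates) / len(rates)


def harmonic_mean(rates: list[float]) -> float:
    """R_H = m / sum(1 / R_i)."""
    return len(rates) / sum(1 / r for r in rates)


# Hypothetical execution rates (e.g. MIPS) on three benchmark programs:
rates = [100.0, 200.0, 400.0]
print(arithmetic_mean(rates))  # about 233.33
print(harmonic_mean(rates))    # about 171.43
```

If each benchmark executes the same number of instructions, the harmonic mean equals total instructions divided by total time, which is why it is often preferred for averaging rates; the arithmetic mean is pulled up by the fastest program.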



73/161

Amdahl's Law

Gene Amdahl

Potential speed-up of program using multiple processors

Concluded that:

Code needs to be parallelizable

Speed-up is bounded, giving diminishing returns for more processors

Task dependent

Servers gain by maintaining multiple connections on multiple processors

Databases can be split into parallel tasks


74/161

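Let T be the total execution time of the program using a single processor, where a fraction (1 - f) of that time involves code that is inherently serial and a fraction f involves code that is infinitely parallelizable with no scheduling overhead. Then the speedup using a parallel processor with N processors that fully exploits the parallel portion of the program is as follows:

$$\text{Speedup} = \frac{T}{(1 - f)T + \frac{fT}{N}} = \frac{1}{(1 - f) + \frac{f}{N}}$$

Two important conclusions can be drawn:

1. When f is small, the use of parallel processors has little effect.

2. As N approaches infinity, speedup is bounded by 1/(1 - f), so that there are diminishing returns for using more processors.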
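A short sketch of the diminishing returns; the parallel fraction f = 0.9 is a hypothetical value.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on N processors when a fraction f of single-processor time is parallelizable."""
    return 1.0 / ((1.0 - f) + f / n)


f = 0.9  # hypothetical: 90% of the execution time can run in parallel
for n in (1, 2, 4, 8, 16, 1024):
    print(f"N = {n:5d}: speedup = {amdahl_speedup(f, n):.2f}")
print(f"bound 1/(1 - f) = {1 / (1 - f):.2f}")
```

With f = 0.9, going from 16 to 1024 processors only raises the speedup from 6.40 to about 9.91, and no number of processors can exceed the bound of 10.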


75/161

Speedup

Suppose that a feature of the system is used during execution a fraction of the time f, before enhancement, and that the speedup of that feature after enhancement is SU_f. Then the overall speedup of the system is

$$\text{Speedup} = \frac{1}{(1 - f) + \frac{f}{SU_f}}$$


76/161

For example, suppose that a task makes extensive use of floating-point operations, with 40% of the time consumed by floating-point operations. With a new hardware design, the floating-point module is sped up by a factor of K. Then the overall speedup is:

$$\text{Speedup} = \frac{1}{0.6 + \frac{0.4}{K}}$$

Thus, independent of K, the maximum speedup is 1/0.6 ≈ 1.67.
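A sketch of the floating-point example above, evaluating the speedup for a few hypothetical values of K:

```python
def overall_speedup(f: float, su_f: float) -> float:
    """Overall speedup when a feature used a fraction f of the time is sped up by SU_f."""
    return 1.0 / ((1.0 - f) + f / su_f)


# Floating-point example: f = 0.4, floating-point module sped up by K.
for k in (2, 10, 100, 10_000):
    print(f"K = {k:6d}: speedup = {overall_speedup(0.4, k):.3f}")
print(f"limit 1/0.6 = {1 / 0.6:.3f}")
```

The speedup climbs quickly for small K and then flattens out just below 1.667, because the 60% of the time not spent in floating-point operations is unaffected.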


77/161

Top Level View of Computer Function and Interconnection

CHAPTER 3


78/161

Computer Components

The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit

An instruction interpreter and a module of general-purpose arithmetic and logic functions

Data and instructions must be put into the system

Taken together, these are referred to as I/O components

Memory/Main Memory

A place to store temporarily both instructions and data.


79/161

Top-Level View Components



80/161

Top-Level View

The CPU exchanges data with memory. For this purpose, it typically makes use of two internal (to the CPU) registers: a memory address register (MAR), which specifies the address in memory for the next read or write, and a memory buffer register (MBR), which contains the data to be written into memory or receives the data read from memory. Similarly, an I/O address register (I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the CPU.

An I/O module transfers data from external devices to CPU and memory, and vice versa. It contains internal buffers for temporarily holding these data until they can be sent on.
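The roles of MAR and MBR can be sketched as a tiny model in which every memory transfer goes through those two registers. The class name, the 1,024-word memory, and the value written are hypothetical.

```python
class CPUMemoryInterface:
    """Hypothetical model of CPU-memory exchange through MAR and MBR."""

    def __init__(self, memory_words: int) -> None:
        self.memory = [0] * memory_words  # main memory, one int per word
        self.mar = 0  # memory address register: address of the next read/write
        self.mbr = 0  # memory buffer register: data to write, or data just read

    def read(self) -> None:
        """Copy the word addressed by MAR into MBR."""
        self.mbr = self.memory[self.mar]

    def write(self) -> None:
        """Copy MBR into the word addressed by MAR."""
        self.memory[self.mar] = self.mbr


cpu = CPUMemoryInterface(memory_words=1024)
cpu.mar, cpu.mbr = 940, 3  # write the value 3 to address 940
cpu.write()
cpu.mbr = 0
cpu.read()                 # read it back through MBR
print(cpu.mbr)             # 3
```

The I/OAR and I/OBR pair would play the same roles for transfers with an I/O module.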



81/161

Computer Function

The basic function performed by a computer is execution of a program

The processor does the actual work by executing instructions specified in the program.

Instruction processing consists of two steps: the processor reads (fetches) instructions from memory one at a time and executes each instruction.

Program execution consists of repeating the process of instruction fetch and instruction execution


82/161

Instruction Fetch and Execute

Fetch Cycle

Program Counter (PC) holds address of next instruction to fetch

Processor fetches instruction from memory location pointed to by PC

Increment PC

- Unless told otherwise

Instruction loaded into Instruction Register (IR)

Processor interprets instruction and performs required actions


83/161

Computer Function

Instruction Cycle

processing required for a single instruction

The two steps are referred to as the fetch cycle and the execute cycle. Program execution halts only if the machine is turned off, some sort of unrecoverable error occurs, or a program instruction that halts the computer is encountered.

Instruction Fetch and Execute


84/161


Execute Cycle

Processor-memory: data transfer between CPU and main memory

Processor-I/O: data transfer between CPU and I/O module

Data processing: some arithmetic or logical operation on data

Control: alteration of sequence of operations, e.g. jump

Combination of above


85/161

Example of a Program Execution



86/161


In this example, three instruction cycles, each consisting of a fetch cycle and an execute cycle, are needed to add the contents of location 940 to the contents of 941.

With a more complex set of instructions, fewer cycles would be needed. Some older processors, for example, included instructions that contain more than one memory address. Thus the execution cycle for a particular instruction on such a processor could involve more than one reference to memory. Also, instead of memory references, an instruction may specify an I/O operation.
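The three-instruction example above can be sketched as a fetch-execute loop. The example's figure is not reproduced here, so the machine details are assumptions: 16-bit instructions with a 4-bit opcode and a 12-bit address, addresses written in hexadecimal, hypothetical opcodes 0x1 (load AC), 0x5 (add to AC), and 0x2 (store AC), and starting values 3 and 2 in locations 940 and 941.

```python
LOAD, STORE, ADD = 0x1, 0x2, 0x5  # assumed opcode assignments


def run_program(memory: dict[int, int], pc: int, cycles: int) -> int:
    """Run `cycles` instruction cycles starting at `pc`; return the final AC."""
    ac = 0
    for _ in range(cycles):
        # Fetch cycle: load the instruction addressed by PC into IR, then increment PC.
        ir = memory[pc]
        pc += 1
        # Execute cycle: split IR into opcode and address, then act on it.
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac += memory[address]
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at {pc - 1:#x}")
    return ac


memory = {
    0x300: 0x1940,  # LOAD  AC <- M[940]
    0x301: 0x5941,  # ADD   AC <- AC + M[941]
    0x302: 0x2941,  # STORE M[941] <- AC
    0x940: 0x0003,
    0x941: 0x0002,
}
run_program(memory, pc=0x300, cycles=3)
print(memory[0x941])  # 5
```

Each loop iteration is one instruction cycle, so three iterations perform the addition, matching the three cycles described above.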

Instruction Cycle State Diagram


87/161




88/161


States in the upper part of the diagram involve an exchange between the processor and either memory or an I/O module. States in the lower part of the diagram involve only internal processor operations. The OAC state appears twice, because an instruction may involve a read, a write, or both. However, the action performed during that state is fundamentally the same in both cases, and so only a single state identifier is needed.

Instruction Cycle State


89/161


The states can be described as follows:

Instruction address calculation (IAC): Determine the address of the next instruction to be executed.

Instruction fetch (IF): Read instruction from its memory location into the processor.

Instruction operation decoding (IOD): Analyze instruction to determine type of operation to be performed and operand(s) to be used.

Operand address calculation (OAC): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.



90/161


The states can be described as follows:

Operand fetch (OF): Fetch the operand from memory or read it in from I/O.

Data operation (DO): Perform the operation indicated in the instruction.

Operand store (OS): Write the result into memory or out to I/O.
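A simplified, linear walk through the states listed above, as a sketch: each memory or I/O operand gets its own address calculation (OAC) before it is fetched (OF) or stored (OS). The helper name is hypothetical.

```python
def instruction_states(operand_reads: int, operand_writes: int) -> list[str]:
    """Sequence of instruction-cycle states for one instruction (hypothetical helper).

    operand_reads / operand_writes: number of memory or I/O operands the
    instruction reads and writes.
    """
    states = ["IAC", "IF", "IOD"]
    for _ in range(operand_reads):
        states += ["OAC", "OF"]   # locate, then fetch, each source operand
    states.append("DO")           # perform the operation
    for _ in range(operand_writes):
        states += ["OAC", "OS"]   # locate, then store, each result
    return states


# An instruction that reads one memory operand and writes its result back:
print(" -> ".join(instruction_states(1, 1)))
# IAC -> IF -> IOD -> OAC -> OF -> DO -> OAC -> OS
```

The output shows OAC twice, once for the read and once for the write, which is why the state appears twice in the diagram.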



91/161

Interrupts

Mechanism by which other modules (e.g. I/O) may interrupt normal sequence of processing

Program

- e.g. overflow, division by zero

Timer

- Generated by internal processor timer

I/O

- From I/O controller

Hardware failure

- e.g. memory parity error


92/161

Program Flow Control



93/161

Interrupt Cycle

Added to instruction cycle

Processor checks for interrupt

- Indicated by an interrupt signal

If no interrupt, fetch next instruction

If interrupt pending:

- Suspend execution of current program

- Save context

- Set PC to start address of interrupt handler routine

- Process interrupt

- Restore context and continue interrupted program
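The interrupt cycle above can be sketched as a check added after each fetch and execute. Everything concrete here is hypothetical: instructions are just the labels "OP", "RET" (return from handler), and "HALT", the handler starts at address 100, and interrupts are disabled while the handler runs, so a signal arriving then stays pending.

```python
HANDLER_START = 100  # assumed start address of the interrupt handler routine


def run_with_interrupts(memory, start, signals):
    """Simulate instruction cycles with an added interrupt cycle.

    memory:  dict of address -> hypothetical instruction ("OP", "RET", "HALT")
    start:   address of the first user-program instruction
    signals: instruction cycles (counted from 0) in which an interrupt signal arrives
    Returns the list of addresses executed, in order.
    """
    pc, saved_pc, trace = start, None, []
    pending, enabled = False, True
    cycle = 0
    while True:
        instruction = memory[pc]          # fetch cycle: read the word at PC
        trace.append(pc)
        pc += 1                           # increment PC
        if instruction == "HALT":         # execute cycle
            return trace
        if instruction == "RET":          # handler finished:
            pc, enabled = saved_pc, True  #   restore context, re-enable interrupts
        if cycle in signals:
            pending = True                # signal stays pending until serviced
        if pending and enabled:           # interrupt cycle
            saved_pc = pc                 # save context: address of next instruction
            pc = HANDLER_START            # set PC to the handler's start address
            pending, enabled = False, False
        cycle += 1


memory = {0: "OP", 1: "OP", 2: "OP", 3: "HALT", 100: "OP", 101: "RET"}
print(run_with_interrupts(memory, start=0, signals={1}))  # [0, 1, 100, 101, 2, 3]
```

The user program (addresses 0 to 3) contains no interrupt-handling code; the trace simply resumes at address 2 once the handler returns.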

Transfer of Control via Interrupts


94/161




95/161


From the point of view of the user program, an interrupt is just that: an interruption of the normal sequence of execution. When the interrupt processing is completed, execution resumes.

Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the operating system are responsible for suspending the user program and then resuming it at the same point.

96/161

Instruction Cycle with Interrupts


97/161




98/161


The processor now proceeds to the fetch cycle and fetches the first instruction in the interrupt handler program, which will service the interrupt. The interrupt handler program is generally part of the operating system. Typically, this program determines the nature of the interrupt and performs whatever actions are needed. In the example we have been using, the handler determines which I/O module generated the interrupt and may branch to a program that will write more data out to that I/O module. When the interrupt handler routine is completed, the processor can resume execution of the user program at the point of interruption.



99/161


In the interrupt cycle, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal. If no interrupts are pending, the processor proceeds to the fetch cycle and fetches the next instruction of the current program. If an interrupt is pending, the processor does the following:

It suspends execution of the current program being executed and saves its context. This means saving the address of the next instruction to be executed (current contents of the program counter) and any other data relevant to the processor's current activity.

It sets the program counter to the starting address of an interrupt handler routine.


100/161

Program Timing Short I/O Wait



101/161

Program Timing Long I/O Wait

102/161

Instruction Cycle State Diagram with Interrupts



103/161

Multiple Interrupts

Disable Interrupts

- Processor will ignore further interrupts whilst processing one interrupt

- Interrupts remain pending and are checked after first interrupt has been processed

- Interrupts handled in sequence as they occur

Define Priorities

- Low priority interrupts can be interrupted by higher priority interrupts

- When higher priority interrupt has been processed, processor returns to previous interrupt
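The two approaches above can be compared with a small tick-by-tick simulation. The device names, arrival times, priorities (higher number = higher priority), and handler durations are all hypothetical, and the user program itself is not modelled.

```python
def simulate(arrivals, nested):
    """Return handler names in the order their processing completes.

    arrivals: list of (time, name, priority, duration) tuples; integer times
              and durations in ticks; higher number = higher priority.
    nested=False: interrupts disabled during a handler; pending ones served in arrival order.
    nested=True:  a higher-priority interrupt preempts the running handler.
    """
    queue = sorted(arrivals)  # interrupts that have not arrived yet
    pending = []              # arrived, waiting for service: [name, priority, remaining]
    stack = []                # handlers in progress; the top one is running
    finished = []
    t = 0
    while queue or pending or stack:
        while queue and queue[0][0] <= t:
            _, name, priority, duration = queue.pop(0)
            pending.append([name, priority, duration])
        if pending:
            if nested:
                candidate = max(pending, key=lambda irq: irq[1])
                if not stack or candidate[1] > stack[-1][1]:
                    pending.remove(candidate)
                    stack.append(candidate)     # preempt; handler below keeps its state
            elif not stack:
                stack.append(pending.pop(0))    # oldest pending interrupt first
        if stack:
            stack[-1][2] -= 1                   # run the active handler for one tick
            if stack[-1][2] == 0:
                finished.append(stack.pop()[0])  # return to the interrupted handler
        t += 1
    return finished


arrivals = [(0, "printer", 2, 5), (2, "comm", 5, 3), (3, "disk", 4, 2)]
print(simulate(arrivals, nested=False))  # ['printer', 'comm', 'disk']
print(simulate(arrivals, nested=True))   # ['comm', 'disk', 'printer']
```

With priorities, the communications handler preempts the printer handler; the disk interrupt waits until communications finishes, then runs before the printer handler resumes.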



104/161

Multiple Interrupts Nested



105/161

Multiple Interrupts Sequential

Interconnection Structures


106/161


The collection of paths connecting the various modules is called the interconnection structure.

The design of this structure will depend on the exchanges that must be made among modules.



107/161


The types of exchanges that are needed can be seen by indicating the major forms of input and output for each module type:

Memory: Typically, a memory module will consist of N words of equal length. Each word is assigned a unique numerical address (0, 1, ..., N - 1). A word of data can be read from or written into the memory.

I/O module: From an internal (to the computer system) point of view, I/O is functionally similar to memory. There are two operations, read and write. Further, an I/O module may control more than one external device. We can refer to each of the interfaces to an external device as a port and give each a unique address (e.g., 0, 1, ..., M - 1).



108/161

Interconnection Structures

Processor: The processor reads in instructions and data, writes out data after processing, and uses control signals to control the overall operation of the system. It also receives interrupt signals.

Computer Module



109/161

Memory Connection

Receives and sends data

Receives addresses (of locations)

Receives control signals

Read

Write

Timing

Input / Output Connection


110/161


Similar to memory from computer's viewpoint

Output

- Receive data from computer

- Send data to peripheral

Input

- Receive data from peripheral

- Send data to computer



111/161


Receive control signals from computer

Send control signals to peripherals

Ex. Spin disk

Receive addresses from computer

Send interrupt signals (control)



112/161

CPU Connection

Reads instruction and data

Writes out data (after processing)

Sends control signals to other units

Receives (& acts on) interrupts

Bus Interconnection


113/161


A bus is a communication pathway connecting two or more devices

Multiple devices connect to the bus, and a signal transmitted by any one device is available for reception by all other devices attached to the bus.

A bus that connects major computer components (processor, memory, I/O) is called a system bus.

Bus Structure


114/161

On any bus the lines can be classified into three functional groups:

The data lines provide a path for moving data among system modules. These lines, collectively, are called the data bus.

The address lines are used to designate the source or destination of the data on the data bus.

The control lines are used to control the access to and the use of the data and address lines.

Bus Structure


115/161

The operation of the bus is as follows. If one module wishes to send data to another, it must do two things: (1) obtain the use of the bus, and (2) transfer data via the bus. If one module wishes to request data from another module, it must (1) obtain the use of the bus, and (2) transfer a request to the other module over the appropriate control and address lines. It must then wait for that second module to send the data.

Bus Structure


116/161

Typical Physical Realization of a Bus Architecture


117/161

Traditional ISA with Cache


118/161

High Performance Bus


119/161

Bus Types


120/161


Dedicated

- Separate data & address lines

Multiplexed

- Shared lines

- Address valid or data valid control line

- Advantage: fewer lines

- Disadvantages: more complex control; lower ultimate performance

Bus Arbitration


121/161

More than one module controlling the bus, e.g. CPU and DMA controller

Only one module may control bus at one time

Arbitration may be centralized or distributed

Centralized and Distributed Arbitration


122/161

Centralized

- Single hardware device controlling bus access: bus controller, or arbiter

- May be part of CPU or separate

Distributed

- Each module may claim the bus

- Control logic on all modules
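A sketch of centralized arbitration. The slides leave the arbiter's policy open, so this assumes a fixed priority order in which a hypothetical DMA controller outranks the CPU. Each grant goes to exactly one module, which then completes its transfer and releases the bus.

```python
from typing import Optional


def centralized_arbiter(requests: set[str], priority: list[str]) -> Optional[str]:
    """Grant the bus to exactly one requesting module (assumed fixed-priority policy)."""
    for module in priority:  # highest priority first
        if module in requests:
            return module
    return None              # bus idle: no module is requesting


PRIORITY = ["DMA", "CPU"]  # assumed: DMA controller outranks the CPU

requests = {"CPU", "DMA"}
while requests:
    master = centralized_arbiter(requests, PRIORITY)
    print("bus granted to", master)
    requests.discard(master)  # master completes its transfer and releases the bus
# bus granted to DMA
# bus granted to CPU
```

In distributed arbitration there would be no single `centralized_arbiter`; equivalent decision logic would sit in every module.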

Timing


123/161

Co-ordination of events on bus

Synchronous

- Events determined by clock signals

- Control bus includes clock line

- A single 1-0 is a bus cycle

- All devices can read clock line

- Usually sync on leading edge

- Usually a single cycle for an event

Synchronous Timing Diagram


Asynchronous Timing Read Diagram


Asynchronous Timing Write Diagram


PCI Bus


Peripheral Component Interconnect

Intel released it to the public domain

32 or 64 bit

PCI Bus Lines (required)


System lines: including clock and reset

Address and data: 32 time-multiplexed lines for address/data; interrupt and validate lines

Interface control

Arbitration: not shared; direct connection to PCI bus arbiter

Error lines

PCI Bus Lines (optional)


Interrupt lines: not shared

Cache support

64-bit bus extension: additional 32 lines, time multiplexed; 2 lines to enable devices to agree to use 64-bit transfer

JTAG/Boundary Scan: for testing procedures

PCI Commands


Transaction between initiator (master) and target

Master claims bus

Determine type of transaction

Ex. I/O read/write

Address phase

One or more data phases
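The phase structure of a PCI transaction (one address phase followed by one or more data phases on the same multiplexed lines) can be sketched as below. This assumes 32-bit data phases at linearly increasing addresses and leaves out bus claiming and the actual signal timing; `pci_transaction` is a hypothetical helper, not a PCI API.

```python
def pci_transaction(command, start_address, data_words):
    """Phase sequence of one PCI transaction on the shared AD lines (sketch).

    Assumes 32-bit (4-byte) data phases with linearly incrementing addresses.
    """
    if not data_words:
        raise ValueError("a PCI transaction has one or more data phases")
    phases = [("address", start_address, command)]  # address phase: where, and what type
    for i, word in enumerate(data_words):           # one data phase per word
        phases.append(("data", start_address + 4 * i, word))
    return phases


for phase in pci_transaction("memory read", 0x2000, [0xA, 0xB, 0xC]):
    print(phase)
```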

PCI Read Timing Diagram


PCI Bus Arbiter


PCI Bus Arbitration


Cache Memory

Chapter 4

Terminology


Capacity: the amount of information that can be contained in a memory unit, usually in terms of words or bytes

Word: the natural unit of organization in the memory, typically the number of bits used to represent a number

Addressable unit: the fundamental data element size that can be addressed in the memory, typically either the word size or individual bytes

Unit of transfer: the number of data elements transferred at a time, usually bits in main memory and blocks in secondary memory

Transfer rate: rate at which data is transferred to/from the memory device

Terminology


Access time:

For RAM, the time to address the unit and perform the transfer

For non-random-access memory, the time to position the R/W head over the desired location

Memory cycle time: access time plus any other time required before a second access can be started

Access technique: how memory contents are accessed
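Access time and transfer rate combine in a standard relation for non-random-access memory: reading or writing n bits costs the access time plus the time to stream the bits at rate R. (For random-access memory, the transfer rate is simply 1/(memory cycle time).)

```latex
T_N = T_A + \frac{n}{R}
```

Here T_N is the average time to read or write n bits, T_A the average access time, n the number of bits, and R the transfer rate in bits per second. With hypothetical figures T_A = 10 ms, R = 10^6 bps and n = 8000 bits, T_N = 10 ms + 8 ms = 18 ms.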

Memory Hierarchy


Major design objective of any memory system: to provide adequate storage capacity at an acceptable level of performance and at a reasonable cost

Four interrelated ways to meet this goal:

Use a hierarchy of storage devices

Develop automatic space allocation methods for efficient use of the memory

Through the use of virtual memory techniques, free the user from memory management tasks

Design the memory and its related interconnection structure so that the processor can operate at or near its maximum rate

Memory Hierarchy

Basis of the memory hierarchy:

Registers internal to the CPU for temporary data storage (small in number but very fast)

External storage for data and programs (relatively large and fast)

External permanent storage (much larger and much slower)

Characteristics of the memory hierarchy:

Consists of distinct levels of memory components

Each level is characterized by its size, access time, and cost per bit

Each increasing level in the hierarchy consists of modules of larger capacity, slower access time, and lower cost per bit

Goal of the memory hierarchy:

Try to match the processor speed with the rate of information transfer from the lowest element in the hierarchy

Memory Hierarchy Diagram


Hierarchy List


Registers

L1 Cache

L2 Cache

Main memory

Disk cache

Disk

Optical

Tape

Cache Memory


Cache memory is a critical component of the memory hierarchy

Compared to the size of main memory, cache is relatively small

Operates at or near the speed of the processor

Very expensive per bit compared to main memory

Cache contains copies of sections of main memory

Cache Memory


Small amount of fast memory

Sits between normal main memory and CPU

May be located on CPU chip or module

Cache and Main Memory


Cache/Main Memory Structure


Cache Operation - Overview


CPU requests contents of memory location

Check cache for this data

If present, get from cache (fast)

If not present, read required block from main memory to cache

Then deliver from cache to CPU

Cache includes tags to identify which block of main memory is in each cache slot
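The read sequence above can be sketched as a small Python class, assuming word-addressed memory, 4-word blocks and 4 slots. Which slot receives a new block is decided by a simple FIFO pointer, an assumption here; mapping and replacement are separate design issues listed under Cache Design. `SimpleCache` is a hypothetical name.

```python
class SimpleCache:
    """Overview sketch: each slot holds (tag, block); the tag is the main-memory block number."""

    def __init__(self, memory, num_slots=4, block_size=4):
        self.memory = memory
        self.block_size = block_size
        self.slots = [None] * num_slots  # each entry: (tag, list_of_words) or None
        self.next_victim = 0             # FIFO pointer (assumed replacement policy)
        self.hits = self.misses = 0

    def read(self, address):
        block_no, offset = divmod(address, self.block_size)
        for slot in self.slots:
            if slot is not None and slot[0] == block_no:  # tag match: present in cache
                self.hits += 1
                return slot[1][offset]
        # Not present: read the required block from main memory into a cache slot...
        self.misses += 1
        start = block_no * self.block_size
        block = self.memory[start:start + self.block_size]
        self.slots[self.next_victim] = (block_no, block)
        self.next_victim = (self.next_victim + 1) % len(self.slots)
        # ...then deliver the requested word from the cache to the CPU.
        return block[offset]


memory = list(range(100, 164))       # hypothetical 64-word main memory
cache = SimpleCache(memory)
print(cache.read(5), cache.read(6))  # 105 106 (a miss, then a hit)
print(cache.hits, cache.misses)      # 1 1
```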

Locality of Reference


Cache memory works because of locality of reference

Memory references made by the processor, for both instructions and data, tend to cluster together

Instruction loops, subroutines

Data arrays, tables

Keep these clusters in high-speed memory to reduce the average delay in accessing data

Over time, the clusters being referenced will change, and memory management must deal with this
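The payoff of keeping these clusters in fast memory can be put in numbers with the usual two-level average access time. This assumes that H is the fraction of references found in the cache (the hit ratio) and that a miss costs the cache check (T1) plus the main memory access (T2). The figures in the example are hypothetical.

```python
def average_access_time(hit_ratio, t1, t2):
    """Two-level memory: t1 = cache access time, t2 = main memory access time.

    A hit costs t1; a miss costs t1 (the failed cache check) plus t2
    (fetching from main memory).
    """
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)


# Hypothetical figures: 0.01 us cache, 0.1 us main memory, 95% hits.
print(round(average_access_time(0.95, 0.01, 0.1), 4))  # 0.015
```

Strong locality pushes H towards 1, and the average access time towards the cache's own T1.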


Cache Design


Addressing

Size

Mapping Function

Replacement Algorithm

Write Policy

Block Size

Number of Caches

Cache Addressing


Where does cache sit?

Between processor and virtual memory management unit (MMU)

Between MMU and main memory

Logical cache (virtual cache) stores data using virtual addresses

Processor accesses cache directly, without going through the MMU

Cache access faster, before MMU address translation

Different applications use the same virtual address space, so the same virtual address can refer to different data

Must flush cache on each context switch

Physical cache stores data using main memory physical addresses
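Why a logical cache must be flushed on each context switch can be shown with a toy model, assuming each process has its own virtual-to-physical translation table and that the cache is looked up by virtual address alone. `LogicalCache` and the address values are hypothetical, and translation is word-by-word rather than page-based.

```python
class LogicalCache:
    """Toy logical (virtual) cache, indexed by virtual address only."""

    def __init__(self):
        self.lines = {}  # virtual address -> cached value

    def read(self, vaddr, page_table, memory):
        if vaddr in self.lines:    # hit: answered before any MMU translation
            return self.lines[vaddr]
        paddr = page_table[vaddr]  # miss: translate (the MMU's job), then fetch
        self.lines[vaddr] = memory[paddr]
        return self.lines[vaddr]

    def context_switch(self):
        self.lines.clear()         # flush: the next process reuses the same virtual addresses


memory = {100: "A's data", 200: "B's data"}  # physical address -> contents
proc_a = {0x10: 100}                          # hypothetical per-process translations
proc_b = {0x10: 200}

cache = LogicalCache()
print(cache.read(0x10, proc_a, memory))  # A's data
print(cache.read(0x10, proc_b, memory))  # A's data (stale hit: no flush yet)
cache.context_switch()
print(cache.read(0x10, proc_b, memory))  # B's data
```

A physical cache avoids the stale hit because its entries are keyed by physical address, at the cost of translating before every lookup.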

Mapping Function


Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines.

The choice of the mapping function dictates how the cache is organized.

Three techniques: direct, associative, and set-associative.


Direct Mapping
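In direct mapping, each main memory block j goes to exactly one cache line, i = j mod m, where m is the number of lines. With power-of-two sizes this amounts to splitting the address into tag, line and word fields, as the sketch below assumes; `direct_map_fields` and the example parameters (24-bit addresses, 4-byte blocks, 16K lines) are hypothetical.

```python
def direct_map_fields(address, word_bits, line_bits):
    """Split a main-memory address into (tag, line, word) fields for a direct-mapped cache.

    word_bits: log2(block size in addressable units)
    line_bits: log2(number of cache lines)
    """
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)  # block number mod number of lines
    tag = address >> (word_bits + line_bits)                 # remaining high-order bits
    return tag, line, word


# Example: 24-bit addresses, 4-byte blocks (2 word bits), 16K lines (14 line bits).
tag, line, word = direct_map_fields(0x16339C, word_bits=2, line_bits=14)
print(hex(tag), hex(line), word)  # 0x16 0xce7 0
```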


Set Associative Mapping


154/161

Set-associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.
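A k-way set-associative lookup can be sketched as below: the block number selects one set, as in direct mapping, and the tag is then searched across the k lines of that set, as in associative mapping. The sketch tracks tags only and assumes LRU replacement within a set; the class name and sizes are hypothetical.

```python
class SetAssociativeCache:
    """k-way set-associative cache that tracks only tags (no data), for hit/miss behaviour."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways                           # k lines per set
        self.sets = [[] for _ in range(num_sets)]  # each set: tags, least recently used first

    def access(self, block_no):
        set_index = block_no % self.num_sets  # direct-mapped step: a block maps to exactly one set
        tag = block_no // self.num_sets
        lines = self.sets[set_index]
        if tag in lines:                      # associative step: search every line in that set
            lines.remove(tag)
            lines.append(tag)                 # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.pop(0)                      # set full: evict least recently used (assumed policy)
        lines.append(tag)
        return False


cache = SetAssociativeCache(num_sets=2, ways=2)
print([cache.access(b) for b in [0, 2, 0, 4, 0, 2]])
# [False, False, True, False, True, False]
```

With two sets of two lines, blocks 0, 2 and 4 all map to set 0, yet two of them can be resident at once; in a direct-mapped cache with four lines, blocks 0 and 4 would share line 0.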



Fully Associative Mapping


Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache
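Because a block may be loaded into any line, a fully associative lookup has to compare the tag against every line. The sketch below tracks tags only and assumes FIFO replacement (the slides do not fix a policy); `FullyAssociativeCache` is a hypothetical name.

```python
from collections import deque


class FullyAssociativeCache:
    """Fully associative cache that tracks only tags, for hit/miss behaviour."""

    def __init__(self, num_lines):
        # Tags in load order; a full deque drops the oldest on append (FIFO, assumed policy).
        self.lines = deque(maxlen=num_lines)

    def access(self, block_no):
        tag = block_no          # no line field: the whole block number is the tag
        if tag in self.lines:   # compare against every line in the cache
            return True
        self.lines.append(tag)  # the block may be loaded into any line
        return False


cache = FullyAssociativeCache(num_lines=2)
print([cache.access(b) for b in [0, 4, 0, 8, 4]])  # [False, False, True, False, True]
```

Blocks 0, 4 and 8 would all compete for line 0 of a four-line direct-mapped cache; here any two of them can be resident together.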

Write Policy


Must not overwrite a cache block unless main memory is up to date

Multiple CPUs may have individual caches

I/O may address main memory directly

Write Through


All writes go to main memory as well as cache

Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date

Lots of traffic

Slows down writes

Remember bogus write through caches!
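A minimal write-through sketch, assuming a word-granularity cache with no capacity limit (replacement is not modelled). The counter shows the cost noted above: every write produces main memory traffic. The class name is hypothetical.

```python
class WriteThroughCache:
    """Toy word-granularity cache with a write-through policy."""

    def __init__(self, memory):
        self.memory = memory    # main memory, a list of words
        self.lines = {}         # address -> cached word
        self.memory_writes = 0  # bus traffic caused by writes

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value  # every write also goes to main memory
        self.memory_writes += 1

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.memory[address]
        return self.lines[address]


memory = [0] * 8
cache = WriteThroughCache(memory)
for value in (1, 2, 3):
    cache.write(5, value)
print(memory[5], cache.memory_writes)  # 3 3
```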

Write Back


Updates initially made in cache only

Update bit for cache slot is set when update occurs

If block is to be replaced, write to main memory only if update bit is set

Other caches get out of sync

I/O must access main memory through cache

N.B. 15% of memory references are writes
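A matching write-back sketch, assuming word-granularity lines, a two-line capacity and eviction of the oldest line; the update (dirty) bit decides whether an evicted line is written to main memory. The class name and sizes are hypothetical.

```python
class WriteBackCache:
    """Toy word-granularity cache with a write-back policy and an update bit per line."""

    def __init__(self, memory, num_lines=2):
        self.memory = memory
        self.num_lines = num_lines
        self.lines = {}         # address -> [value, update_bit]; dict keeps insertion order
        self.memory_writes = 0

    def _make_room(self):
        if len(self.lines) == self.num_lines:
            victim = next(iter(self.lines))  # oldest line (assumed replacement policy)
            value, updated = self.lines.pop(victim)
            if updated:                      # write to main memory only if update bit is set
                self.memory[victim] = value
                self.memory_writes += 1

    def write(self, address, value):
        if address not in self.lines:
            self._make_room()
        self.lines[address] = [value, True]  # update made in cache only; set update bit

    def read(self, address):
        if address not in self.lines:
            self._make_room()
            self.lines[address] = [self.memory[address], False]
        return self.lines[address][0]


memory = [0] * 8
cache = WriteBackCache(memory, num_lines=2)
for value in (1, 2, 3):
    cache.write(5, value)
print(memory[5], cache.memory_writes)  # 0 0 (main memory out of sync)
cache.read(6)
cache.read(7)                          # evicts address 5, whose update bit is set
print(memory[5], cache.memory_writes)  # 3 1
```

Between the writes and the eviction, main memory holds a stale value, which is why other caches and I/O can get out of sync.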
