comp arch ch1 ch2 ch3 ch4
TRANSCRIPT
-
8/2/2019 Comp Arch Ch1 Ch2 Ch3 Ch4
1/161
William Stallings
Computer Organization and Architecture, 8th Edition
CHAPTER 1
INTRODUCTION
-
Architecture and Organization
Architecture is those attributes visible to the programmer
Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
e.g. Is there a multiply instruction?
Organization is how features are implemented
Control signals, interfaces, memory technology.
e.g. Is there a hardware multiply unit or is it done by repeated addition?
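The multiply example can be sketched in code. In this illustrative Python sketch (the function names `multiply_hw` and `multiply_by_addition` are hypothetical, not from the slides), both functions offer the same programmer-visible operation, which is the architecture, while differing in how it is carried out, which is the organization.

```python
def multiply_hw(a: int, b: int) -> int:
    """Organization A: a dedicated multiply unit (modelled here by Python's * operator)."""
    return a * b


def multiply_by_addition(a: int, b: int) -> int:
    """Organization B: no multiplier, so add a to a running total |b| times."""
    result = 0
    for _ in range(abs(b)):
        result += a
    # Restore the sign if b was negative.
    return -result if b < 0 else result


# Same architecture (identical visible results), different organization.
for x, y in [(6, 7), (-3, 5), (4, -2), (0, 9)]:
    assert multiply_hw(x, y) == multiply_by_addition(x, y)
```

A program written against the multiply operation runs unchanged on either version; only the cost of each multiply differs.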
-
Family Concept
All members of the Intel x86 family share the same basic architecture
The IBM System/370 family share the same basic architecture
This gives code compatibility (at least backwards)
Organization differs between different versions
Architecture and Organization
-
A computer is a complex system: how can we design/describe it?
Hierarchical system: a set of interrelated subsystems, each subsystem hierarchic in structure until some lowest level of elementary subsystems is reached
At each level of the system, the designer is concerned with structure and function.
Structure and Function
-
Structure and Function
Structure is the way in which components relate to each other
Function is the operation of individual components as part of the structure
-
Function
General computer functions:
Data processing
Data storage
Data movement
Control
-
Operations
Data movement
Ex., keyboard to screen
Functional View of the Computer
-
Operations
Storage
Ex., Internet download to disk
Playing an mp3 file stored in memory to earphones attached to the same PC.
-
Operations
Processing from/to storage
Any number-crunching application that takes data from memory and stores the result back in memory.
ex., updating bank statement
-
Operations
Processing from storage to I/O
Receiving packets over a network interface, verifying their CRC, then storing them in memory.
ex., printing a bank statement
-
Structure
Four main structural components
CPU
Main Memory
I/O Devices
System Interconnection
-
Structure
Four main structural components
1. Central Processing Unit (CPU)
Controls the operation of the computer and performs its data processing functions; often simply referred to as the processor.
2. Main Memory
Stores data
-
Structure
Four main structural components
3. I/O
Moves data between the computer and its external environment.
4. System Interconnection
Some mechanism that provides for communication among CPU, main memory, and I/O. A common example of system interconnection is a system bus, consisting of a number of wires to which all the other components attach.
-
Structure - Top Level
[Figure: the computer at the top level, comprising the Central Processing Unit, Main Memory, Input/Output, and Systems Interconnection; peripherals and communication lines attach from outside the computer.]
-
Structure - The CPU
[Figure: the CPU, attached to I/O and memory via the system bus, comprising the Arithmetic and Logic Unit, Control Unit, Registers, and Internal CPU Interconnection.]
-
Structure - The Control Unit
[Figure: the control unit, attached to the ALU and registers via the internal bus, comprising Sequencing Logic, Control Unit Registers and Decoders, and Control Memory.]
-
CHAPTER 2
COMPUTER EVOLUTION AND PERFORMANCE
-
Brief History of Computers
The First Generation: Vacuum Tubes
ENIAC
o Electronic Numerical Integrator And Computer
o World's first general-purpose electronic digital computer
o John Mauchly and John Eckert
o It weighed 30 tons, occupied 1,500 square feet of floor space, and contained more than 18,000 vacuum tubes.
-
Brief History of Computers
The First Generation: Vacuum Tubes
Von Neumann/Turing
o Stored Program concept
o Main memory storing programs and data
o Attributed to John von Neumann, who worked on the ENIAC project; Alan Turing developed the idea at about the same time
o Input and output equipment operated by control unit
-
o In 1946, von Neumann and his colleagues began the design of a new stored program computer, referred to as the IAS computer.
o The IAS computer, although not completed until 1952, is the prototype of all subsequent general-purpose computers.
-
Brief History of Computers
The IAS computer consists of:
o A main memory, which stores both data and instructions
o An arithmetic and logic unit (ALU) capable of operating on binary data
o A control unit, which interprets the instructions in memory and causes them to be executed
o Input and output (I/O) equipment operated by the control unit
-
Structure of the IAS computer
-
John von Neumann and the IAS machine, 1952
-
UNIVAC
o UNIVAC I (Universal Automatic Computer)
o 1947 -Eckert-Mauchly Computer Corporation
o First successful commercial computer. It was intended for both scientific and commercial applications.
o US Bureau of Census 1950 calculations
o Became part of Sperry-Rand Corporation
o Late 1950s -UNIVAC II
-Faster
-More memory
-
IBM
o Punched-card processing equipment
o 1953 -the 701
-IBM's first stored program computer
-Scientific calculations
o 1955 -the 702
-Business applications
o Led to 700/7000 series
-
Brief History of Computers
The Second Generation: Transistors
Transistor
o Smaller, cheaper, and dissipates less heat than a vacuum tube, but can be used in the same way as a vacuum tube to construct computers
o Invented at Bell Labs in 1947 by William Shockley et al.
o IBM 7000
o DEC (Digital Equipment Corporation) was founded in 1957
o Produced PDP-1 in the same year
-
Brief History of Computers
The Third Generation: Integrated Circuits
o A computer is made up of gates, memory cells and interconnections
o A single, self-contained transistor is called a discrete component
o All these can be manufactured either separately (discrete components) or on the same piece of semiconductor
-
Brief History of Computers
Generations of Computers
o Vacuum tube -1946 to 1957
o Transistor -1958 to 1964
o Small scale integration -1965 on
-Up to 100 devices on a chip
o Medium scale integration -to 1971
-100 to 3,000 devices on a chip
o Large scale integration -1971 to 1977
-3,000 to 100,000 devices on a chip
o Very large scale integration -1978 to 1991
-100,000 to 100,000,000 devices on a chip
o Ultra large scale integration -1991 on
-Over 100,000,000 devices on a chip
-
Moore's Law
Increased density of components on chip
Gordon Moore, co-founder of Intel
Number of transistors on a chip will double every year
Since 1970s development has slowed a little
-Number of transistors doubles every 18 months
Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths, giving higher performance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increases reliability
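To see how much the doubling period matters, the growth can be modelled as a simple exponential. This is a minimal sketch: the function name `projected_count` and the starting figure of 1,000 transistors are assumptions chosen for illustration, not data from the slides.

```python
def projected_count(initial_count: float, years: float, doubling_period_years: float) -> float:
    """Exponential growth: the count doubles once every doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point: 1,000 transistors, projected 10 years ahead.
yearly = projected_count(1_000, 10, 1.0)   # doubling every year
slower = projected_count(1_000, 10, 1.5)   # doubling every 18 months

print(f"{yearly:,.0f}")  # 1,024,000
print(f"{slower:,.0f}")  # roughly 100,000
```

Over the same decade, the 18-month rate yields about one tenth of the growth that yearly doubling would give.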
-
Growth in CPU Transistor Count
-
IBM 360 Series
First planned family of computers
Similar or identical O/S
Increasing speed
Increasing number of I/O ports (i.e. more terminals)
Increased memory size
Increased cost
Multiplexed switch structure
-
DEC PDP-8 (1964)
First minicomputer (after miniskirt!)
Did not need air conditioned room
Small enough to sit on a lab bench
$16,000
-$100k+ for IBM 360
Embedded applications & OEM
BUS STRUCTURE
-
DEC-PDP 8 Bus Structure
-
Semiconductor Memory
1970: Fairchild
Size of a single core
-i.e. 1 bit of magnetic core storage
Holds 256 bits
Non-destructive read
Much faster than core
Capacity approximately doubles each year
-
Microprocessors - Intel
1971 -4004
-First microprocessor
-All CPU components on a single chip
-4 bit
-Multiplication by repeated addition, no hardware multiplier!
Followed in 1972 by 8008
-8 bit
-Both designed for specific applications
1974 -8080
-Intel's first general purpose microprocessor
-
1970s Processors
-
1980s Processors
-
1990s Processors
-
Recent Processors
-
Designing for Performance
Year by year, the cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically.
The basic building blocks for today's computer miracles are virtually the same as those of the IAS computer from over 50 years ago, while on the other hand, the techniques for squeezing the last iota of performance out of the materials at hand have become increasingly sophisticated.
-
Designing for Performance
Many techniques have been invented to improve performance.
Some of the main techniques are the following:
Pipelining
On board cache
On board L1 and L2 cache
Branch Prediction - The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next. If the processor guesses right most of the time, it can pre-fetch the correct instructions and buffer them so that the processor is kept busy.
-
Designing for Performance
Data Flow Analysis - The processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.
Speculative Execution - Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations. This enables the processor to keep its execution engines as busy as possible by executing instructions that are likely to be needed.
-
Performance Balance
While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up. The result is a need to look for performance balance: an adjusting of the organization and architecture to compensate for the mismatch among the capabilities of the various components.
Processor speed increased
Memory capacity increased
-
Logic and Memory Performance Gap
-
While processor speed has grown rapidly, the speed with which data can be transferred between main memory and the processor has lagged badly. The interface between processor and main memory is the most crucial pathway in the entire computer because it is responsible for carrying a constant flow of program instructions and data between memory chips and the processor. If memory or the pathway fails to keep pace with the processor's insistent demands, the processor stalls in a wait state, and valuable processing time is lost.
-
Solutions
Increase number of bits retrieved at one time
-Make DRAM wider rather than deeper
Change DRAM interface
-Cache
Reduce frequency of memory access
-More complex cache and cache on chip
Increase interconnection bandwidth
-High speed buses
-Hierarchy of buses
-
I/O Devices
As computers become faster and more capable, more sophisticated applications are developed that support the use of peripherals with intensive I/O demands.
Solutions
Caching
Buffering
Higher-speed interconnection buses
More elaborate bus structures
Multiple processor configurations
-
Typical I/O Device Data Rates
-
The key is balance among:
Processor components
Main memory
I/O Devices
Interconnection structures
-
The evolution of the Intel X86 Architecture
8080: The world's first general-purpose microprocessor. This was an 8-bit machine, with an 8-bit data path to memory. The 8080 was used in the first personal computer, the Altair.
8086: A far more powerful, 16-bit machine. In addition to a wider data path and larger registers, the 8086 sported an instruction cache, or queue, that pre-fetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM's first personal computer, securing the success of Intel. The 8086 is the first appearance of the x86 architecture.
-
The evolution of the Intel X86 Architecture
80286: This extension of the 8086 enabled addressing a 16-MByte memory instead of just 1 MByte.
80386: Intel's first 32-bit machine, and a major overhaul of the product. With a 32-bit architecture, the 80386 rivaled the complexity and power of minicomputers and mainframes introduced just a few years earlier. This was the first Intel processor to support multitasking, meaning it could run multiple programs at the same time.
-
The evolution of the Intel X86 Architecture
80486: The 80486 introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining. The 80486 also offered a built-in math coprocessor, offloading complex math operations from the main CPU.
Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.
-
The evolution of the Intel X86 Architecture
Pentium Pro: The Pentium Pro continued the move into superscalar organization begun with the Pentium, with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.
Pentium II: The Pentium II incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently.
Pentium III: The Pentium III incorporates additional floating-point instructions to support 3D graphics software.
-
The evolution of the Intel X86 Architecture
Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.
Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip.
Core 2: The Core 2 extends the architecture to 64 bits. The Core 2 Quad provides four processors on a single chip.
-
Embedded Systems and ARM
The ARM architecture refers to a processor architecture that has evolved from RISC design principles and is used in embedded systems.
The term embedded system refers to the use of electronics and software within a product, as opposed to a general-purpose computer, such as a laptop or desktop system.
Embedded system: A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an
-
Embedded Systems and ARM
Embedded Systems Requirements:
Small to large systems, implying very different cost constraints, thus different needs for optimization and reuse
Relaxed to very strict requirements and combinations of different quality requirements, for example, with respect to safety, reliability, real-time, flexibility, and legislation
Short to long life times
Different environmental conditions in terms of, for example, radiation, vibrations, and humidity
-
Possible Organization of an Embedded System
-
ARM Evolution
-
ARM processors are designed to meet the needs of three system categories:
Embedded real-time systems: Systems for storage, automotive body and power-train, industrial, and networking applications
Application platforms: Devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment and digital imaging applications
Secure applications: Smart cards, SIM cards, and payment terminals
-
Performance Assessment
In evaluating processor hardware and setting requirements for new systems, performance is one of the key parameters to consider, along with cost, size, security, reliability, and in some cases power consumption.
System clock speed
Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on are governed by a system clock.
The speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or Hertz (Hz).
-
Performance Assessment
Clock signals are generated by a quartz crystal, which generates a constant signal wave while power is applied. This wave is converted into a digital voltage pulse stream that is provided in a constant flow to the processor circuitry.
The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle time.
-
System Clock
-
Instruction execution takes place in discrete steps
Fetch, decode, load and store, arithmetic or logical
Usually requires multiple clock cycles per instruction
Pipelining gives simultaneous execution of instructions
Conclusion: clock speed is not the whole story about performance
-
Instruction execution rate
Let CPI_i be the number of cycles required for instruction type i, and I_i be the number of executed instructions of type i for a given program. Then we can calculate an overall CPI as follows:
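The formula itself did not survive in this transcript. A weighted average consistent with the definitions above would take the following form, where the symbol I_c (the total number of executed instructions) and the upper index n (the number of instruction types) are notational assumptions introduced here:

```latex
\mathrm{CPI} = \frac{\sum_{i=1}^{n} \left( \mathrm{CPI}_i \times I_i \right)}{I_c},
\qquad I_c = \sum_{i=1}^{n} I_i
```

Each instruction type contributes its cycle count weighted by how often it is executed.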
-
Instruction execution rate
Millions of instructions per second (MIPS)
Millions of floating point instructions per second (MFLOPS)
Heavily dependent on:
instruction set
compiler design
processor implementation
cache & memory hierarchy
We can express the MIPS rate in terms of the clock rate and CPI as follows:
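The expression is missing from this transcript. A form consistent with the surrounding definitions is sketched below; the symbols I_c (executed instruction count), T (execution time in seconds), and f (clock rate in Hz) are assumptions introduced here, with T = I_c × CPI / f:

```latex
\text{MIPS rate} = \frac{I_c}{T \times 10^{6}} = \frac{f}{\mathrm{CPI} \times 10^{6}}
```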
-
The average CPI when the program is executed on a uniprocessor with the above trace results is CPI = 0.6 + (2*0.18) + (4*0.12) + (8*0.1) = 2.24. The corresponding MIPS rate is (400*10^6)/(2.24*10^6) ≈ 178.
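The arithmetic above can be checked with a few lines of Python. This is a sketch only: the instruction mix is read off the terms of the CPI sum (the original trace table is not reproduced in this transcript), and the variable names are illustrative.

```python
# (fraction of executed instructions, cycles per instruction), taken from the
# terms of the CPI sum above; the instruction-type labels are not available.
mix = [(0.60, 1), (0.18, 2), (0.12, 4), (0.10, 8)]
clock_rate_hz = 400e6  # 400 MHz, from the MIPS calculation above

# Weighted-average CPI, then MIPS = clock rate / (CPI * 10^6).
cpi = sum(fraction * cycles for fraction, cycles in mix)
mips = clock_rate_hz / (cpi * 1e6)

print(f"CPI  = {cpi:.2f}")   # CPI  = 2.24
print(f"MIPS = {mips:.1f}")  # MIPS = 178.6
```

The exact quotient is about 178.6, which the slide reports as 178.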
Floating point performance is expressed as millions of floating-point operations per second (MFLOPS), defined as follows:
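The definition is missing from this transcript; the usual form, stated here as a reconstruction rather than a quotation, is:

```latex
\text{MFLOPS rate} = \frac{\text{number of executed floating-point operations in a program}}{\text{execution time} \times 10^{6}}
```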
-
Benchmarks
Programs designed to test performance
A benchmark suite is a collection of programs, defined in a high-level language, that together attempt to provide a representative test of a computer in a particular application or system programming area.
The Standard Performance Evaluation Corporation (SPEC) defines and maintains the best-known collection of benchmark suites
-
Averaging Results
To obtain a reliable comparison of the performance of various computers, it is preferable to run a number of different benchmark programs on each machine and then average the results. For example, given m different benchmark programs, a simple arithmetic mean can be calculated as follows:
R_A = \frac{1}{m} \sum_{i=1}^{m} R_i
where R_i is the high-level language instruction execution rate for the ith benchmark program.
Alternative: Harmonic Mean
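The two averages can be compared directly. In the sketch below the function names and the three sample rates are hypothetical; the harmonic mean is taken in its standard form, m divided by the sum of the reciprocals of the rates.

```python
def arithmetic_mean(rates):
    """Simple average of the per-benchmark execution rates."""
    return sum(rates) / len(rates)


def harmonic_mean(rates):
    """m divided by the sum of reciprocals; dominated by the slowest rates."""
    return len(rates) / sum(1.0 / r for r in rates)


# Hypothetical execution rates for three benchmark programs.
rates = [100.0, 200.0, 400.0]
print(round(arithmetic_mean(rates), 2))  # 233.33
print(round(harmonic_mean(rates), 2))    # 171.43
```

For rates that are not all equal, the harmonic mean is always the smaller of the two, so one fast benchmark cannot mask a slow one.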
-
Amdahl's Law
Gene Amdahl
Potential speed-up of program using multiple processors
Concluded that:
Code needs to be parallelizable
Speed-up is bounded, giving diminishing returns for more processors
Task dependent
Servers gain by maintaining multiple connections on multiple processors
Databases can be split into parallel tasks
-
Let T be the total execution time of the program using a single processor, and let f be the fraction of that time spent in code that can be parallelized. Then the speedup using a parallel processor with N processors that fully exploits the parallel portion of the program is as follows:
Speedup = \frac{T}{(1 - f)T + fT/N} = \frac{1}{(1 - f) + f/N}
Two important conclusions can be drawn:
1. When f is small, the use of parallel processors has little effect.
2. As N approaches infinity, speedup is bounded by 1/(1 - f), so that there are diminishing returns for using more processors.
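Both conclusions can be checked numerically. The sketch below assumes the usual form of Amdahl's law, speedup = 1 / ((1 - f) + f/N), with f the parallelizable fraction of execution time; the function names and sample values are illustrative.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup with n processors when a fraction f of the time is parallelizable."""
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as the number of processors grows without limit."""
    return 1.0 / (1.0 - f)


# Conclusion 1: with small f, even 1000 processors barely help.
print(round(amdahl_speedup(0.1, 1000), 3))  # 1.111

# Conclusion 2: with f = 0.9 the speedup approaches, but never reaches, 10.
for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Going from 100 to 1000 processors in the second case adds far less than going from 10 to 100, which is the diminishing return the slide describes.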
-
Speedup
Suppose that a feature of the system is used during execution a fraction of the time f, before enhancement, and that the speedup of that feature after enhancement is SU_f. Then the overall speedup of the system is
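The expression is missing from this transcript. Using the symbols defined above, the generalized form of Amdahl's law would read as follows (a reconstruction, with the enhanced feature taking the place of the parallel portion):

```latex
\text{Speedup} = \frac{1}{(1 - f) + \dfrac{f}{SU_f}}
```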
-
For example, suppose that a task makes extensive use of floating-point operations, with 40% of the time consumed by floating-point operations. With a new hardware design, the floating-point module is sped up by a factor of K. Then the overall speedup is:
Speedup = \frac{1}{0.6 + 0.4/K}
Thus, independent of K, the maximum speedup is 1/0.6 = 1.67.
-
CHAPTER 3
TOP LEVEL VIEW OF COMPUTER FUNCTION AND INTERCONNECTION
-
Computer Components
The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit
An instruction interpreter and a module of general-purpose arithmetic and logic functions
Data and instructions must be put into the system (input) and results reported (output)
Taken together, these are referred to as I/O components
Memory/Main Memory
A place to temporarily store both instructions and data.
-
Top-Level View Components
-
Top-Level View
The CPU exchanges data with memory. For this purpose, it typically makes use of two internal (to the CPU) registers: a memory address register (MAR), which specifies the address in memory for the next read or write, and a memory buffer register (MBR), which contains the data to be written into memory or receives the data read from memory. Similarly, an I/O address register (I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the CPU.
An I/O module transfers data from external devices to CPU and memory, and vice versa. It contains internal buffers for temporarily holding these data until they can be sent on.
-
Computer Function
The basic function performed by a computer is execution of a program
The processor does the actual work by executing instructions specified in the program.
Instruction processing consists of two steps: the processor reads (fetches) instructions from memory one at a time and executes each instruction.
Program execution consists of repeating the process of instruction fetch and instruction execution
-
Instruction Fetch and Execute
Fetch Cycle
Program Counter (PC) holds address of next instruction to fetch
Processor fetches instruction from memory location pointed to by PC
Increment PC
-Unless told otherwise
Instruction loaded into Instruction Register (IR)
Processor interprets instruction and performs required actions
-
Computer Function
Instruction Cycle
processing required for a single instruction
The two steps are referred to as the fetch cycle and the execute cycle. Program execution halts only if the machine is turned off, some sort of unrecoverable error occurs, or a program instruction that halts the computer is encountered.
-
Instruction Fetch and Execute
Execute Cycle
Processor-memory
-Data transfer between CPU and main memory
Processor-I/O
-Data transfer between CPU and I/O module
Data processing
-Some arithmetic or logical operation on data
Control
-Alteration of sequence of operations, e.g. jump
Combination of above
-
Example of a Program Execution
-
Instruction Fetch and Execute
In this example, three instruction cycles, each consisting of a fetch cycle and an execute cycle, are needed to add the contents of location 940 to the contents of 941.
With a more complex set of instructions, fewer cycles would be needed. Some older processors, for example, included instructions that contain more than one memory address. Thus the execution cycle for a particular instruction on such a processor could involve more than one reference to memory. Also, instead of memory references, an instruction may specify an I/O operation.
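The three instruction cycles can be traced with a small simulator. The figure for this example is not reproduced in the transcript, so everything concrete below is an assumption made for illustration: a hypothetical accumulator machine with 16-bit instructions (4-bit opcode, 12-bit address), opcodes 1 (load), 5 (add) and 2 (store), a program starting at hexadecimal address 300, and initial data values 3 and 2.

```python
# Opcodes of a hypothetical accumulator machine (assumed, not from the slides).
LOAD, STORE, ADD = 0x1, 0x2, 0x5


def run(memory: dict, pc: int, steps: int) -> int:
    """Execute `steps` instruction cycles starting at `pc`; return the accumulator."""
    ac = 0  # accumulator register
    for _ in range(steps):
        # Fetch cycle: load the instruction at PC into IR, then increment PC.
        ir = memory[pc]
        pc += 1
        # Execute cycle: split IR into opcode and address, then act on it.
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return ac


memory = {
    0x300: 0x1940,  # load AC from location 940
    0x301: 0x5941,  # add contents of location 941 to AC
    0x302: 0x2941,  # store AC to location 941
    0x940: 0x0003,  # assumed data value
    0x941: 0x0002,  # assumed data value
}
run(memory, pc=0x300, steps=3)
print(hex(memory[0x941]))  # 0x5
```

Each pass through the loop is one instruction cycle: a fetch from the address in PC followed by an execute step that touches memory at most once, which is why three cycles are needed.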
-
Instruction Cycle State Diagram
-
Instruction Cycle State Diagram
States in the upper part of the diagram involve an exchange between the processor and either memory or an I/O module. States in the lower part of the diagram involve only internal processor operations. The OAC state appears twice, because an instruction may involve a read, a write, or both. However, the action performed during that state is fundamentally the same in both cases, and so only a single state identifier is needed.
-
Instruction Cycle State
The states can be described as follows:
Instruction address calculation (IAC): Determine the address of the next instruction to be executed.
Instruction fetch (IF): Read instruction from its memory location into the processor.
Instruction operation decoding (IOD): Analyze instruction to determine type of operation to be performed and operand(s) to be used.
Operand address calculation (OAC): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.
-
Instruction Cycle State
The states can be described as follows:
Operand fetch (OF): Fetch the operand from memory or read it in from I/O.
Data operation (DO): Perform the operation indicated in the instruction.
Operand store (OS): Write the result into memory or out to I/O.
-
Interrupts
Mechanism by which other modules (e.g. I/O) may interrupt normal sequence of processing
Program
-e.g. overflow, division by zero
Timer
-Generated by internal processor timer
I/O
-from I/O controller
Hardware failure
-e.g. memory parity error
-
Program Flow Control
-
Interrupt Cycle
Added to instruction cycle
Processor checks for interrupt
-Indicated by an interrupt signal
If no interrupt, fetch next instruction
If interrupt pending:
-Suspend execution of current program
-Save context
-Set PC to start address of interrupt handler routine
-Process interrupt
-Restore context and continue interrupted program
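The steps above can be sketched as a loop that checks for a pending interrupt after every instruction. This is a simplified model, not a description of any real processor: instructions and handlers are represented by string labels, the saved context is just the PC, and the names `run`, `program` and `interrupts` are hypothetical.

```python
from collections import deque


def run(program, interrupts):
    """Trace execution of `program` with an interrupt check after each instruction.

    program    -- list of instruction labels, executed in order
    interrupts -- iterable of (arrival, name): the interrupt signal is raised
                  once `arrival` instructions have completed
    """
    pending = deque(sorted(interrupts))
    trace = []
    pc = 0
    completed = 0
    while pc < len(program):
        # Fetch and execute the instruction the PC points to.
        trace.append(program[pc])
        pc += 1
        completed += 1
        # Interrupt cycle: service each interrupt that has been signalled.
        while pending and pending[0][0] <= completed:
            _, name = pending.popleft()
            saved_pc = pc                      # save context
            trace.append(f"handler({name})")   # PC would point at the handler here
            pc = saved_pc                      # restore context, resume program
    return trace


print(run(["i1", "i2", "i3"], [(2, "io")]))
# ['i1', 'i2', 'handler(io)', 'i3']
```

The program itself contains no interrupt-related code; the check lives entirely in the loop that drives it, and the program resumes at exactly the instruction it would have executed next.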
-
Transfer of Control via Interrupts
-
Transfer of Control via Interrupts
From the point of view of the user program, an interrupt is just that: an interruption of the normal sequence of execution. When the interrupt processing is completed, execution resumes.
Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the operating system are responsible for suspending the user program and then resuming it at the same point.
-
Instruction Cycle with Interrupts
-
Instruction Cycle with Interrupts
The processor now proceeds to the fetch cycle and fetches the first instruction in the interrupt handler program, which will service the interrupt. The interrupt handler program is generally part of the operating system. Typically, this program determines the nature of the interrupt and performs whatever actions are needed. In the example we have been using, the handler determines which I/O module generated the interrupt and may branch to a program that will write more data out to that I/O module. When the interrupt handler routine is completed, the processor can resume execution of the user program at the point of interruption.
-
Instruction Cycle with Interrupts
In the interrupt cycle, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal. If no interrupts are pending, the processor proceeds to the fetch cycle and fetches the next instruction of the current program. If an interrupt is pending, the processor does the following:
It suspends execution of the current program being executed and saves its context. This means saving the address of the next instruction to be executed (current contents of the program counter) and any other data relevant to the processor's current activity.
It sets the program counter to the starting address of an interrupt handler routine.
-
Program Timing Short I/O Wait
-
Program Timing Long I/O Wait
-
Instruction Cycle State Diagram with Interrupts
-
Multiple Interrupts
Disable Interrupts
-Processor will ignore further interrupts whilst processing one interrupt
-Interrupts remain pending and are checked after first interrupt has been processed
-Interrupts handled in sequence as they occur
Define Priorities
-Low priority interrupts can be interrupted by higher priority interrupts
-When higher priority interrupt has been processed, processor returns to previous interrupt
-
Multiple Interrupts Nested
-
Multiple Interrupts Sequential
Interconnection Structures
The collection of paths connecting the various modules is called the interconnection structure.
The design of this structure will depend on the exchanges that must be made among modules.
The types of exchanges that are needed are indicated by the major forms of input and output for each module type:
Memory: Typically, a memory module will consist of N words of equal length. Each word is assigned a unique numerical address (0, 1, ..., N - 1). A word of data can be read from or written into the memory.
I/O module: From an internal (to the computer system) point of view, I/O is functionally similar to memory. There are two operations, read and write. Further, an I/O module may control more than one external device. We can refer to each of the interfaces to an external device as a port and give each a unique address (e.g., 0, 1, ..., M - 1).
Processor: The processor reads in instructions and data, writes out data after processing, and uses control signals to control the overall operation of the system. It also receives interrupt signals.
Figure: Computer Modules
Memory Connection
Receives and sends data
Receives addresses (of locations)
Receives control signals
Read
Write
Timing
Input / Output Connection
Similar to memory from the computer's viewpoint
Output
Receive data from computer
Send data to peripheral
Input
Receive data from peripheral
Send data to computer
Receive control signals from computer
Send control signals to peripherals
Ex. Spin disk
Receive addresses from computer
Send interrupt signals (control)
CPU Connection
Reads instructions and data
Writes out data (after processing)
Sends control signals to other units
Receives (& acts on) interrupts
Bus Interconnection
A bus is a communication pathway connecting two or more devices.
Multiple devices connect to the bus, and a signal transmitted by any one device is available for reception by all other devices attached to the bus.
A bus that connects major computer components (processor, memory, I/O) is called a system bus.
Bus Structure
On any bus the lines can be classified into three functional groups:
The data lines provide a path for moving data among system modules. These lines, collectively, are called the data bus.
The address lines are used to designate the source or destination of the data on the data bus.
The control lines are used to control the access to and the use of the data and address lines.
The operation of the bus is as follows. If one module wishes to send data to another, it must do two things: (1) obtain the use of the bus, and (2) transfer data via the bus. If one module wishes to request data from another module, it must (1) obtain the use of the bus, and (2) transfer a request to the other module over the appropriate control and address lines. It must then wait for that second module to send the data.
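The two-step pattern above (obtain the bus, then transfer) can be sketched as follows. This is a simplified model under assumptions: the `Bus` class, the `write` and `read` helpers, and the use of a plain dict as the memory module are hypothetical, and real buses handle ownership and waiting in hardware.

```python
class Bus:
    """Shared pathway: only one module may drive it at a time."""

    def __init__(self):
        self.owner = None
        self.log = []            # record of what was placed on the bus

    def acquire(self, module):
        if self.owner is not None:
            return False         # bus busy; the caller must retry later
        self.owner = module
        return True

    def release(self, module):
        assert self.owner == module
        self.owner = None


def write(bus, sender, memory, address, value):
    """Send data: (1) obtain the bus, (2) transfer data via the bus."""
    if not bus.acquire(sender):
        return False
    bus.log.append((sender, "write", address, value))
    memory[address] = value
    bus.release(sender)
    return True


def read(bus, requester, memory, address):
    """Request data: (1) obtain the bus, (2) send the request, then wait."""
    if not bus.acquire(requester):
        return None
    bus.log.append((requester, "read-request", address))
    bus.release(requester)
    # The second module now obtains the bus and sends the data back.
    if not bus.acquire("memory"):
        return None
    value = memory[address]
    bus.log.append(("memory", "data", address, value))
    bus.release("memory")
    return value


bus, memory = Bus(), {}
write(bus, "cpu", memory, 4, 99)
print(read(bus, "cpu", memory, 4))
```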
Figure: Bus Structure
Figure: Typical Physical Realization of a Bus Architecture
Figure: Traditional ISA with Cache
Figure: High Performance Bus
Bus Types
Dedicated
Separate data and address lines
Multiplexed
Shared lines
Address valid or data valid control line
Advantage: fewer lines
Disadvantages
More complex control
Potential reduction in ultimate performance
Bus Arbitration
More than one module controlling the bus
e.g., CPU and DMA controller
Only one module may control bus at one time
Arbitration may be centralised or distributed
Centralized and Distributed Arbitration
Centralised
Single hardware device controlling bus access
Bus controller
Arbiter
May be part of CPU or separate
Distributed
Each module may claim the bus
Control logic on all modules
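A centralised arbiter can be sketched as a single function that grants the bus to exactly one requester. The fixed-priority policy, the function name `arbitrate`, and the module names below are assumptions for illustration; the slides do not say which policy the arbiter uses.

```python
def arbitrate(requests, priority_order):
    """Grant the bus to at most one module.

    requests       -- set of module names currently requesting the bus
    priority_order -- list of module names, highest priority first (assumed policy)
    Returns the name of the module granted the bus, or None if nobody asked.
    """
    for module in priority_order:
        if module in requests:
            return module
    return None


# Hypothetical example: the DMA controller outranks the CPU.
print(arbitrate({"cpu", "dma"}, ["dma", "cpu"]))
```

In a distributed scheme the same decision would be reached by control logic replicated on every module rather than by one device.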
Timing
Co-ordination of events on bus
Synchronous
Events determined by clock signals
Control bus includes clock line
A single 1-0 transmission is a bus cycle
All devices can read clock line
Usually sync on leading edge
Usually a single cycle for an event
Figure: Synchronous Timing Diagram
Figure: Asynchronous Timing, Read Diagram
Figure: Asynchronous Timing, Write Diagram
PCI Bus
Peripheral Component Interconnect
Intel released to public domain
32 or 64 bit
PCI Bus Lines (required)
System lines
Including clock and reset
Address and data
32 time-multiplexed lines for address/data
Lines to interpret and validate the address/data signals
Interface control
Arbitration
Not shared
Direct connection to PCI bus arbiter
Error lines
PCI Bus Lines (optional)
Interrupt lines
Not shared
Cache support
64-bit bus extension
Additional 32 lines
Time multiplexed
2 lines to enable devices to agree to use 64-bit transfer
JTAG/boundary scan
For testing procedures
PCI Commands
Transaction between initiator (master) and target
Master claims bus
Determine type of transaction
Ex. I/O read/write
Address phase
One or more data phases
Figure: PCI Read Timing Diagram
Figure: PCI Bus Arbiter
Figure: PCI Bus Arbitration
CHAPTER 4
CACHE MEMORY
Terminology
Capacity: the amount of information that can be contained in a memory unit, usually in terms of words or bytes
Word: the natural unit of organization in the memory, typically the number of bits used to represent a number
Addressable unit: the fundamental data element size that can be addressed in the memory, typically either the word size or individual bytes
Unit of transfer: the number of data elements transferred at a time, usually bits in main memory and blocks in secondary memory
Transfer rate: the rate at which data is transferred to/from the memory device
Access time:
For RAM, the time to address the unit and perform the transfer
For non-random access memory, the time to position the R/W head over the desired location
Memory cycle time: access time plus any other time required before a second access can be started
Access technique: how memory contents are accessed
Memory Hierarchy
Major design objective of any memory system
To provide adequate storage capacity at an acceptable level of performance and at a reasonable cost
Four interrelated ways to meet this goal
Use a hierarchy of storage devices
Develop automatic space allocation methods for efficient use of the memory
Through the use of virtual memory techniques, free the user from memory management tasks
Design the memory and its related interconnection structure so that the processor can operate at or near its maximum rate
Basis of the memory hierarchy
Registers internal to the CPU for temporary data storage (small in number but very fast)
External storage for data and programs (relatively large and fast)
External permanent storage (much larger and much slower)
Characteristics of the memory hierarchy
Consists of distinct levels of memory components
Each level characterized by its size, access time, and cost per bit
Each increasing level in the hierarchy consists of modules of larger capacity, slower access time, and lower cost per bit
Goal of the memory hierarchy
Try to match the processor speed with the rate of information transfer from the lowest element in the hierarchy
Figure: Memory Hierarchy Diagram
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
Cache Memory
Cache memory is a critical component of the memory hierarchy
Compared to the size of main memory, cache is relatively small
Operates at or near the speed of the processor
Very expensive compared to main memory
Cache contains copies of sections of main memory
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module
Figure: Cache and Main Memory
Figure: Cache/Main Memory Structure
Cache Operation - Overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache slot
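The read sequence above can be sketched as a small class. The sizes `BLOCK_SIZE` and `NUM_SLOTS`, the class name `Cache`, and the rule used to pick a slot (block number modulo the number of slots) are assumptions made for this sketch; the overview itself does not fix a placement rule.

```python
BLOCK_SIZE = 4   # words per block (assumed)
NUM_SLOTS = 8    # number of cache slots (assumed)


class Cache:
    def __init__(self, main_memory):
        self.memory = main_memory
        self.tags = [None] * NUM_SLOTS   # which block each slot currently holds
        self.data = [None] * NUM_SLOTS   # the cached copy of that block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // BLOCK_SIZE
        slot = block % NUM_SLOTS          # placement rule assumed for the sketch
        if self.tags[slot] == block:
            self.hits += 1                # present: get from cache
        else:
            self.misses += 1              # not present: read block from memory
            start = block * BLOCK_SIZE
            self.data[slot] = self.memory[start:start + BLOCK_SIZE]
            self.tags[slot] = block
        # Either way, the word is delivered from the cache to the CPU.
        return self.data[slot][address % BLOCK_SIZE]


cache = Cache(list(range(100, 164)))   # 64 words of hypothetical memory
cache.read(5)
cache.read(6)
print(cache.hits, cache.misses)
```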
Locality of Reference
The cache memory works because of locality of reference
Memory references made by the processor, for both instructions and data, tend to cluster together
Instruction loops, subroutines
Data arrays, tables
Keep these clusters in high speed memory to reduce the average delay in accessing data
Over time, the clusters being referenced will change; memory management must deal with this
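The effect of locality on average delay can be illustrated with a simple cost model. This is an assumption, not something stated in the slides: a hit costs one cache access, and a miss costs a cache access plus a main memory access. The function name and the timing values are hypothetical.

```python
def average_access_time(hit_ratio, t_cache, t_main):
    """Average delay per reference under the assumed model:
    every reference checks the cache; misses also go to main memory."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return t_cache + (1.0 - hit_ratio) * t_main


# Hypothetical timings in arbitrary units: cache = 1, main memory = 10.
for h in (0.50, 0.90, 0.95):
    print(h, average_access_time(h, 1.0, 10.0))
```

The stronger the clustering of references, the higher the hit ratio and the closer the average delay gets to the cache access time.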
Cache Design
Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Cache Addressing
Where does cache sit?
Between processor and virtual memory management unit (MMU)
Between MMU and main memory
Logical cache (virtual cache) stores data using virtual addresses
Processor accesses cache directly, not through physical cache
Cache access faster, before MMU address translation
Virtual addresses use same address space for different applications
Must flush cache on each context switch
Physical cache stores data using main memory physical addresses
Mapping Function
Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines.
The choice of the mapping function dictates how the cache is organized.
3 techniques: direct, associative, and set associative.
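As an illustration of the first technique, direct mapping sends each main memory block to exactly one cache line (line = block number modulo the number of lines), which amounts to slicing the address into tag, line, and word fields. The field widths below (2 word bits, 14 line bits) and the function name `split_address` are assumptions for this sketch.

```python
WORD_BITS = 2    # 4 addressable units per block (assumed)
LINE_BITS = 14   # 16K cache lines (assumed)


def split_address(address):
    """Split a main memory address into (tag, line, word) for direct mapping."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_address(0x16339C)
print(hex(tag), hex(line), word)
```

Two addresses with the same line field but different tags compete for the same cache line, which is the disadvantage the associative schemes address.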
Figure: Direct Mapping
Set Associative Mapping
Set-associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.
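A minimal sketch of the compromise: the set is chosen directly from the block number, but within the set the block may occupy any line. The class name, the first-in-first-out replacement inside each set, and the sizes used in the example are assumptions, not details from the slides.

```python
from collections import deque


class SetAssociativeCache:
    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        # Each set holds at most `ways` block numbers, oldest first.
        self.sets = [deque(maxlen=ways) for _ in range(num_sets)]

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        lines = self.sets[block % self.num_sets]   # direct choice of set
        if block in lines:                         # associative search in set
            return True
        lines.append(block)   # a full deque drops its oldest entry (FIFO, assumed)
        return False


cache = SetAssociativeCache(num_sets=4, ways=2)
print([cache.access(b) for b in (0, 4, 0, 8, 0)])
```

Blocks 0 and 4 map to the same set yet can both be resident, which a direct-mapped cache with the same number of sets could not allow.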
Figure: Set Associative Mapping
Fully Associative Mapping
Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.
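Because any block may occupy any line, a lookup has to compare against every stored tag, and a replacement rule is needed once the cache is full. The sketch below assumes least-recently-used replacement; the class name and the sizes in the example are hypothetical.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()   # block number -> True, least recent first

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        if block in self.lines:
            self.lines.move_to_end(block)     # mark as most recently used
            return True
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)    # evict least recently used (assumed)
        self.lines[block] = True
        return False


cache = FullyAssociativeCache(num_lines=2)
print([cache.access(b) for b in (0, 8, 0, 16, 0)])
```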
Figure: Fully Associative Mapping
Write Policy
Must not overwrite a cache block unless main memory is up to date
Multiple CPUs may have individual caches
I/O may address main memory directly
Write Through
All writes go to main memory as well as cache
Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
Lots of traffic
Slows down writes
Remember bogus write through caches!
Write Back
Updates initially made in cache only
Update bit for cache slot is set when update occurs
If block is to be replaced, write to main memory only if update bit is set
Other caches get out of sync
I/O must access main memory through cache
N.B. 15% of memory references are writes
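The difference between write through and write back can be seen by counting main memory writes in a one-slot cache. This is a deliberately tiny sketch: the class `OneLineCache`, its single slot, and the use of a dict for main memory are assumptions made for illustration.

```python
class OneLineCache:
    """Single cache slot, enough to contrast the two write policies."""

    def __init__(self, memory, write_back):
        self.memory = memory          # dict: address -> value (hypothetical)
        self.write_back = write_back
        self.address = None           # address currently held in the slot
        self.value = None
        self.update_bit = False       # set when the slot differs from memory
        self.memory_writes = 0

    def _load(self, address):
        # Replace the slot; under write back, flush it first if it was updated.
        if self.write_back and self.update_bit:
            self.memory[self.address] = self.value
            self.memory_writes += 1
        self.address = address
        self.value = self.memory.get(address, 0)
        self.update_bit = False

    def write(self, address, value):
        if self.address != address:
            self._load(address)
        self.value = value
        if self.write_back:
            self.update_bit = True            # main memory is now out of date
        else:
            self.memory[address] = value      # write through: update memory too
            self.memory_writes += 1


through = OneLineCache({}, write_back=False)
back = OneLineCache({}, write_back=True)
for v in range(5):
    through.write(0, v)
    back.write(0, v)
print(through.memory_writes, back.memory_writes)
```

Under write back, main memory only sees the final value when the slot is replaced, which is why other caches and I/O can observe stale data in the meantime.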