basic computer architecture
DESCRIPTION
Basic Computer Architecture. CSCE 496/896: Embedded Systems Witawas Srisa-an. Review of Computer Architecture. Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook. I made some modifications to the note for clarity. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/1.jpg)
Basic Computer Architecture
CSCE 496/896: Embedded SystemsWitawas Srisa-an
![Page 2: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/2.jpg)
Review of Computer Architecture
Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook.
I made some modifications to the note for clarity. Assume some background information from CSCE 430 or equivalent
![Page 3: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/3.jpg)
von Neumann architecture
Memory holds data and instructions.Central processing unit (CPU) fetches instructions from memory. Separate CPU and memory distinguishes programmable computer.
CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
![Page 4: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/4.jpg)
von Neumann Architecture
MemoryUnit
CPUControl + ALU
OutputUnit
InputUnit
![Page 5: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/5.jpg)
CPU + memory
memoryCPU
PC
address
data
IRADD r5,r1,r3200
200
ADD r5,r1,r3
![Page 6: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/6.jpg)
Recalling Pipelining
![Page 7: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/7.jpg)
Recalling Pipelining
What is a potentialProblem with von NeumannArchitecture?
![Page 8: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/8.jpg)
Harvard architecture
CPU
PCdata memory
program memory
address
data
address
data
![Page 9: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/9.jpg)
von Neumann vs. Harvard
Harvard can’t use self-modifying code.Harvard allows two simultaneous memory fetches.
Most DSPs (e.g Blackfin from ADI) use Harvard architecture for streaming data: greater memory bandwidth. different memory bit depths between instruction and data.
more predictable bandwidth.
![Page 10: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/10.jpg)
Today’s Processors
Harvard or von Neumann?
![Page 11: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/11.jpg)
RISC vs. CISC
Complex instruction set computer (CISC): many addressing modes; many operations.
Reduced instruction set computer (RISC): load/store; pipelinable instructions.
![Page 12: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/12.jpg)
Instruction set characteristics
Fixed vs. variable length.Addressing modes.Number of operands.Types of operands.
![Page 13: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/13.jpg)
Tensilica Xtensa
RISC based variable length But not CISC
![Page 14: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/14.jpg)
Programming model
Programming model: registers visible to the programmer.
Some registers are not visible (IR).
![Page 15: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/15.jpg)
Multiple implementations
Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes, associativities, configurations;
local memory, etc.
![Page 16: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/16.jpg)
Assembly language
One-to-one with instructions (more or less).
Basic features: One instruction per line. Labels provide names for addresses (usually in first column).
Instructions often start in later columns.
Columns run to end of line.
![Page 17: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/17.jpg)
ARM assembly language example
label1 ADR r4,cLDR r0,[r4] ; a commentADR r4,dLDR r1,[r4]SUB r0,r0,r1 ; comment
destination
![Page 18: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/18.jpg)
Pseudo-ops
Some assembler directives don’t correspond directly to instructions: Define current address. Reserve storage. Constants.
![Page 19: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/19.jpg)
Pipelining
Execute several instructions simultaneously but at different stages.
Simple three-stage pipe:fe
tch
deco
de
exec
ute
mem
ory
![Page 20: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/20.jpg)
Pipeline complications
May not always be able to predict the next instruction: Conditional branch.
Causes bubble in the pipeline:fetch decode Execute
JNZ
fetch decode execute
fetch decode execute
![Page 21: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/21.jpg)
Superscalar
RISC pipeline executes one instruction per clock cycle (usually).
Superscalar machines execute multiple instructions per clock cycle. Faster execution. More variability in execution times. More expensive CPU.
![Page 22: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/22.jpg)
Simple superscalar
Execute floating point and integer instruction at the same time. Use different registers. Floating point operations use their own hardware unit.
Must wait for completion when floating point, integer units communicate.
![Page 23: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/23.jpg)
Costs
Good news---can find parallelism at run time. Bad news---causes variations in execution time.
Requires a lot of hardware. n2 instruction unit hardware for n-instruction parallelism.
![Page 24: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/24.jpg)
Finding parallelism
Independent operations can be performed in parallel:ADD r0, r0, r1ADD r3, r2, r3ADD r6, r4, r0
+ +
+
r0 r1 r2 r3
r0 r4
r6
r3
![Page 25: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/25.jpg)
Pipeline hazards
• Two operations that have data dependency cannotbe executed in parallel:
x = a + b;a = d + e;y = a - f;
-
+
+
x
a
b
d
e
a
y
f
![Page 26: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/26.jpg)
Order of execution
In-order: Machine stops issuing instructions when the next instruction can’t be dispatched.
Out-of-order: Machine will change order of instructions to keep dispatching.
Substantially faster but also more complex.
![Page 27: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/27.jpg)
VLIW architectures
Very long instruction word (VLIW) processing provides significant parallelism.
Rely on compilers to identify parallelism.
![Page 28: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/28.jpg)
What is VLIW?
Parallel function units with shared register file:
register file
functionunit
functionunit
functionunit
functionunit
...
instruction decode and memory
![Page 29: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/29.jpg)
VLIW cluster
Organized into clusters to accommodate available register bandwidth:
cluster cluster cluster...
![Page 30: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/30.jpg)
VLIW and compilers
VLIW requires considerably more sophisticated compiler technology than traditional architectures---must be able to extract parallelism to keep the instructions full.
Many VLIWs have good compiler support.
![Page 31: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/31.jpg)
Scheduling
a b
c
d
e f
g
a b e
f c
d g
nop
nop
expressions instructions
![Page 32: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/32.jpg)
EPIC
EPIC = Explicitly parallel instruction computing.
Used in Intel/HP Merced (IA-64) machine.
Incorporates several features to allow machine to find, exploit increased parallelism.
![Page 33: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/33.jpg)
IA-64 instruction format
Instructions are bundled with tag to indicate which instructions can be executed in parallel:
tag instruction 1 instruction 2 instruction 3
128 bits
![Page 34: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/34.jpg)
Memory system
CPU fetches data, instructions from a memory hierarchy:
Mainmemory
L2cache
L1cache CPU
![Page 35: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/35.jpg)
Memory hierarchy complications
Program behavior is much more state-dependent. Depends on how earlier execution left the cache.
Execution time is less predictable. Memory access times can vary by 100X.
![Page 36: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/36.jpg)
Memory Hierarchy Complication
Pentium 3-M Pentium 4-M Pentium M
Core P6 (Tualatin 0.13µ)
Netburst (Northwood 0.13µ)
"P6+" (Banias 0.13µ, Dothan
0.09µ)
L1 Cache(data + code) 16Kb + 16Kb 8Kb + 12Kµops (TC) 32Kb + 32Kb
L2 Cache 512Kb 512Kb 1024Kb
Instructions Sets MMX, SSE MMX, SSE, SSE2 MMX, SSE, SSE2
Max frequencies (CPU/FSB)
1.2GHz133MHz
2.4GHz400MHz (QDR)
2GHz400MHz (QDR)
Number of transistors 44M 55M 77M, 140M
SpeedStep 2nd generation 2nd generation 3rd generation
![Page 37: Basic Computer Architecture](https://reader035.vdocuments.net/reader035/viewer/2022070423/5681678d550346895ddcb1a7/html5/thumbnails/37.jpg)
End of Overview
Next class: Altera Nios II processors