COMPUTER SYSTEM ARCHITECTURE
Wayan Suparta, PhD https://wayansuparta.wordpress.com/
17 April 2018
Reduced Instruction Set
Computers (RISC)
• CISC – Complex Instruction Set Computer
• RISC – Reduced Instruction Set Computer
Some Major Advances in
Computers in 50 years
• VLSI
• The family concept
• Microprogrammed control unit
• Cache memory
• MiniComputers
• Microprocessors
• Pipelining
• PCs
• Multiple processors
• RISC processors
• Handhelds
RISC
• Reduced Instruction Set Computer
• Key features
– Large number of general purpose registers
(or use of compiler technology to optimize register use)
– Limited and simple instruction set
– Emphasis on optimising the instruction pipeline &
memory management
Comparison of processors
Driving force for CISC
• Software costs far exceed hardware costs
• Increasingly complex high level languages
• A “semantic” gap between HLL and machine language
• This Leads to:
– Large instruction sets
– More addressing modes
– Hardware implementations of HLL statements
• e.g. CASE (switch) on VAX (long, complex structure)
Intention of CISC
• Ease compiler writing
• Improve execution efficiency
– Complex operations in microcode
• Support more complex HLLs
Execution Characteristics Studied
What was studied?
• Operations performed
• Operands used
• Execution sequencing
How was it Studied?
• Studies were based on programs written in HLLs
• Dynamic studies are measured during the execution of the program
Operations
• Assignments
– Movement of data
• Conditional statements (IF, LOOP)
– Sequence control
Observations?
• Procedure call-return is very time consuming
• Some HLL instructions lead to very many
machine code operations
Weighted Relative Dynamic Frequency of HLL Operations [Patterson]
          Dynamic Occurrence    Machine-Instruction Weighted    Memory-Reference Weighted
          Pascal    C           Pascal    C                     Pascal    C
ASSIGN    45%       38%         13%       13%                   14%       15%
LOOP      5%        3%          42%       32%                   33%       26%
CALL      15%       12%         31%       33%                   44%       45%
IF        29%       43%         11%       21%                   7%        13%
GOTO      -         3%          -         -                     -         -
OTHER     6%        1%          3%        1%                    2%        1%
Operands Observations?
• Predominately local scalar variables
Implications?
• Optimization should concentrate on accessing local variables
                    Pascal    C      Average
Integer Constant    16%       23%    20%
Scalar Variable     58%       53%    55%
Array/Structure     26%       24%    25%
Procedure Calls
Observations?
• Context switching is quite time consuming
• Depends on number of parameters passed
• Depends on level of nesting
• Most programs do not do a lot of calls
followed by lots of returns
• Most variables used are local
Implications Characterize RISC
• Best support is provided by optimising:
– most utilized features and
– most time consuming features
• Conclusions:
– Large number of registers
• Used for operand referencing
– Careful design of pipelines
• Address branching - Branch prediction etc.
– Simplified instruction set
• Reduced length
• Reduced number
Register File
• Software solution
– Require compiler to allocate registers
– Allocate based on most used variables in a given
time
– Requires sophisticated program analysis
• Hardware solution
– Have more registers
– Thus more variables will be in registers
Using Registers for Local Variables
• Store local scalar variables in registers
– Reduces memory accesses
• Every procedure (function) call changes locality
– Parameters must be passed
– Partial context switch
– Results must be returned
– Variables from calling program must be restored
– Partial Context switch
Using “Register Windows”
Observations:
• Typically only a few local variables and passed parameters
• Typically limited range of depth of calls
Implications:
• Use multiple small sets of registers
• Calls switch to a different set of registers
• Returns switch back to a previously used set of registers
• Partition register set
Using “Register Windows” cont.
Partition register set into:
– Local registers
– Parameter registers (Passed Parameters)
– Temporary registers (Passing Parameters)
Then:
– Temporary registers from one set overlap parameter registers from the next
This provides parameter passing without moving data (just move one pointer)
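The overlap can be sketched as an index calculation. This is only an illustration: the window sizes (6 parameter, 10 local, 6 temporary registers, 8 windows) and the function name physical_index are assumptions chosen for the example, not values taken from these slides.

```python
# Hypothetical sizes, loosely modelled on early RISC designs (assumption).
N_PARAM = 6     # parameter registers (passed parameters)
N_LOCAL = 10    # local registers
N_TEMP = 6      # temporary registers (passing parameters); equals N_PARAM
N_WINDOWS = 8

STRIDE = N_PARAM + N_LOCAL          # physical registers consumed per window
N_PHYSICAL = N_WINDOWS * STRIDE     # size of the circular register file

def physical_index(window, kind, n):
    """Map (window, register kind, register number) to a physical register."""
    offsets = {"param": 0, "local": N_PARAM, "temp": N_PARAM + N_LOCAL}
    sizes = {"param": N_PARAM, "local": N_LOCAL, "temp": N_TEMP}
    if not 0 <= n < sizes[kind]:
        raise ValueError("register number out of range")
    # Temporaries of window w land on the parameters of window w + 1.
    return (window * STRIDE + offsets[kind] + n) % N_PHYSICAL

registers = [0] * N_PHYSICAL
# The caller (window 0) writes an argument into its temporary register 2 ...
registers[physical_index(0, "temp", 2)] = 42
# ... and the callee (window 1) reads it as parameter register 2, with no copy.
print(registers[physical_index(1, "param", 2)])   # 42
```

Under these assumptions a call only has to advance the window number by one; the argument is already where the callee expects it.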
Overlapping Register Windows
Picture of Calls & Returns:
Circular Buffer diagram
Operation of Circular Buffer
• When a call is made, a current window pointer is moved to show the currently active register window
• If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
• A saved window pointer indicates where the next saved window should be restored
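A minimal simulation of this behaviour is sketched below. The class name WindowBuffer and the rule that an N-window file keeps at most N - 1 activations resident (because neighbouring windows overlap) are assumptions of the sketch; the list named saved stands in for the memory area that the saved window pointer tracks.

```python
class WindowBuffer:
    """Toy model of a circular buffer of register windows (illustrative only)."""

    def __init__(self, n_windows):
        self.n = n_windows
        self.cwp = 0         # current window pointer
        self.resident = 1    # activations currently held in registers
        self.saved = []      # windows spilled to memory, oldest first
        self.overflows = 0
        self.underflows = 0

    def call(self):
        if self.resident == self.n - 1:
            # Overflow trap: save the oldest resident window to memory.
            oldest = (self.cwp - self.resident + 1) % self.n
            self.saved.append(oldest)
            self.resident -= 1
            self.overflows += 1
        self.cwp = (self.cwp + 1) % self.n
        self.resident += 1

    def ret(self):
        if self.resident == 1:
            if not self.saved:
                raise RuntimeError("return from the outermost procedure")
            # Underflow trap: restore the most recently saved window.
            self.saved.pop()
            self.resident += 1
            self.underflows += 1
        self.cwp = (self.cwp - 1) % self.n
        self.resident -= 1

buf = WindowBuffer(4)
for _ in range(4):      # nest four calls deep
    buf.call()
for _ in range(4):      # and return from all of them
    buf.ret()
print(buf.overflows, buf.underflows, buf.cwp)   # 2 2 0
```

With only shallow call nesting no trap occurs at all, which is the case the observations on call depth suggest is typical.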
Global Variables
How should we accommodate Global Variables?
• Allocate by the compiler to memory
• Have a static set of registers for global
variables
• Put them in cache
Registers v Cache – which is better?
Large Register File                              Cache
All local scalars                                Recently-used local scalars
Individual variables                             Blocks of memory
Compiler-assigned global variables               Recently-used global variables
Save/Restore based on procedure nesting depth    Save/Restore based on cache replacement algorithm
Register addressing                              Memory addressing
Referencing a Scalar - Window Based Register File
Referencing a Scalar - Cache
Compiler Based Register Optimization
Basis:
• Assuming relatively small number of registers (16-32)
• Assuming relatively small number of registers (16-32)
• Optimizing the use is up to compiler
• HLL programs have no explicit references to registers
Process:
• Assign symbolic or virtual register to each candidate variable
• Map (unlimited) symbolic registers to (limited) real registers
• Symbolic registers that do not overlap can share real registers
• If you run out of real registers some variables use memory
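One way to decide which symbolic registers "do not overlap" is to compare their live ranges. The sketch below assumes each symbolic register is live over a single half-open interval of instruction positions; the variable names and intervals are made up for illustration, and real compilers use more detailed liveness information.

```python
from itertools import combinations

def overlaps(a, b):
    """Half-open ranges [start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def interference_edges(live_ranges):
    """Return the pairs of symbolic registers that are live at the same time."""
    return {
        frozenset((x, y))
        for x, y in combinations(sorted(live_ranges), 2)
        if overlaps(live_ranges[x], live_ranges[y])
    }

# Hypothetical live ranges for four symbolic registers.
live = {"A": (0, 4), "B": (1, 3), "C": (4, 7), "D": (5, 8)}
edges = interference_edges(live)
print(sorted("".join(sorted(e)) for e in edges))   # ['AB', 'CD']
```

Here A and C never overlap, so they could share one real register, and likewise B and D; two real registers would be enough for all four symbolic ones.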
Graph Coloring Algorithm for Reg Assign
Given:
• A graph of nodes and edges
• Nodes are symbolic registers
• Two symbolic registers that are live in the same program fragment
are joined by an edge
Then:
• Assign a color to each node
• Adjacent nodes must have different colors
• Assign minimum number of colors
And then:
• Try to color the graph with n colors, where n is the number of real
registers
• Nodes that cannot be colored are placed in memory
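The steps above can be sketched with a simple greedy heuristic. Finding a true minimum coloring is hard in general, so this sketch does not attempt it: it visits the nodes with the most neighbours first and spills any node for which no color is free. The graph and the function name color_registers are hypothetical.

```python
def color_registers(graph, n_regs):
    """Greedy coloring: returns (symbolic -> real register, spilled nodes)."""
    assignment = {}
    spilled = []
    # Highest-degree nodes first; ties broken by name so the result is stable.
    for node in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        used = {assignment[nb] for nb in graph[node] if nb in assignment}
        free = [r for r in range(n_regs) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)      # no color left: keep this one in memory
    return assignment, spilled

# Hypothetical interference graph: an edge joins registers live at the same time.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

print(color_registers(graph, 3))   # every node gets a register, nothing spilled
print(color_registers(graph, 2))   # D and A are spilled to memory
```

With three real registers A and D end up sharing one, since they are never live together; with two, the heuristic has to place D and A in memory.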
Graph Coloring Algorithm Example
The Debate: Why CISC (1 of 2)?
• Compiler simplification?
– Dispute…
• Complex machine instructions are harder to exploit
• Optimization actually may be more difficult
• Smaller programs? (Memory is now cheap)
– Programs may take fewer instructions, but…
– May not occupy less memory, just look shorter in symbolic form
• More instructions require longer op-codes, more memory references
• Register references require fewer bits
The Debate: Why CISC (2 of 2)?
• Faster programs?
– More complex control unit
– Microprogram control store larger
– Thus instructions take longer to execute
• Bias towards use of simpler instructions?
• It is far from clear that CISC is the appropriate solution
Early RISC Computers
• MIPS – Microprocessor without Interlocked Pipeline Stages
Stanford (John Hennessy)
MIPS Technologies
• SPARC – Scalable Processor Architecture
Berkeley (David Patterson)
Sun Microsystems
• 801 – IBM Research (George Radin)
Concentrating on RISC
Major Characteristics:
• One instruction per cycle
• Register to register operations
• Few, simple addressing modes
• Few, simple instruction formats
Also:
• Hardwired design (no microcode)
• Fixed instruction format
But:
• More compile time/effort
Breadth of RISC Characteristics
Characteristics of Example Processors
Memory to memory vs Register to
memory Operations
Controversy: CISC vs RISC
• Challenges of comparison
– There is no pair of RISC and CISC machines that are directly comparable
– There is no definitive set of test programs
– It is difficult to separate hardware effects from compiler effects
– Most comparisons are done on “toy” rather than production machines
– Most commercial machines are a mixture
• Not clear cut
• Today’s designs borrow from both philosophies
RISC Pipelining basics
• Two phases of execution for register based instructions
– I: Instruction fetch
– E: Execute
• ALU operation with register input and output
• Load and store instructions need three phases
– I: Instruction fetch
– E: Execute
• Calculate memory address
– D: Memory
• Register to memory or memory to register operation
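To get a rough feel for what overlapping these phases buys, the sketch below compares purely sequential execution with an idealized pipeline in which a new instruction is fetched every cycle and nothing ever stalls. That ideal ignores memory-port conflicts and branches, so a real two-stage pipeline would fall somewhere between the two numbers; the instruction mix is hypothetical.

```python
# Phases per instruction class: I = instruction fetch, E = execute, D = memory.
PHASES = {"alu": "IE", "load": "IED", "store": "IED"}

def sequential_cycles(program):
    """No overlap: each instruction completes all phases before the next starts."""
    return sum(len(PHASES[kind]) for kind in program)

def ideal_pipelined_cycles(program):
    """Idealized overlap: instruction i is fetched in cycle i and never stalls."""
    return max((i + len(PHASES[kind]) for i, kind in enumerate(program)), default=0)

program = ["load", "load", "alu", "store"]    # hypothetical instruction mix
print(sequential_cycles(program))        # 11
print(ideal_pipelined_cycles(program))   # 6
```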
Effects of RISC Pipelining
(Allows 2 memory accesses per stage)
(E1 register read, E2 execute & register write)
Optimization of RISC Pipelining
• Delayed branch
– Leverages a branch that does not take effect until after execution of the following instruction
– This following instruction occupies the delay slot
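The effect of a delay slot can be seen in a toy interpreter. The two-operation instruction set (addi, jump, nop), the single register, and the programs below are invented for this sketch; they are not the example from the slides.

```python
def run(program, delayed):
    """Run a toy program; return (final register value, executed addresses)."""
    reg = 0
    pc = 0
    pending = None      # branch target waiting for the delay slot to finish
    trace = []
    while pc < len(program):
        op, arg = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # The delay-slot instruction is executing; the branch lands after it.
            next_pc, pending = pending, None
        if op == "addi":
            reg += arg
        elif op == "jump":
            if delayed:
                pending = arg        # takes effect after the next instruction
            else:
                next_pc = arg        # takes effect immediately
        pc = next_pc
    return reg, trace

# Normal branch: the jump follows the useful work.
normal = [("addi", 5), ("addi", 1), ("jump", 5),
          ("addi", 100), ("addi", 100), ("nop", None)]
# Delayed branch: the jump is moved up and "addi 1" fills the delay slot.
reordered = [("addi", 5), ("jump", 5), ("addi", 1),
             ("addi", 100), ("addi", 100), ("nop", None)]

print(run(normal, delayed=False))     # (6, [0, 1, 2, 5])
print(run(reordered, delayed=True))   # (6, [0, 1, 2, 5])
```

Both versions compute the same result in the same number of instructions, but in the reordered version the slot after the jump does useful work instead of being wasted.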
Normal vs Delayed Branch
Example of Delayed Branch (clever!)
What is wrong with this example? Why is there a Write back?
More Options for RISC
Architectures
• RISC Pipelining
– Superpipelined – more fine grained pipeline
(more stages in pipeline)
– Superscalar – replicates stages of pipeline
(multiple pipelines)
MIPS 4000 RISC Machine
• 64 bit architecture (4 Gig address space), (1 Terabyte of file mapping)
• Partitioned into CPU & MMU
• 32 registers (R0=0), but
• 128K Cache – ½ Instructions, ½ Data
• One 32 bit word for each instruction (94 Instructions)
• All operations are register to register
• No condition codes! Flags are stored in general registers for explicit use – simplifies branch optimization
• Only one load/store format (base + offset); extended addressing is synthesized with multiple instructions
• Uses branch prediction
• Especially designed for Embedded computing
• Has multiple FPUs – FP likely stalls pipeline
• See MIPS instructions - page 485
• See Formats – page 486