
COMPUTER SYSTEM ARCHITECTURE (ARSITEKTUR SISTEM KOMPUTER)

Wayan Suparta, PhD https://wayansuparta.wordpress.com/

17 April 2018

Reduced Instruction Set Computers (RISC)

• CISC – Complex Instruction Set Computer

• RISC – Reduced Instruction Set Computer

Some Major Advances in Computers in 50 years

• VLSI

• The family concept

• Microprogrammed control unit

• Cache memory

• MiniComputers

• Microprocessors

• Pipelining

• PCs

• Multiple processors

• RISC processors

• Handhelds

RISC

• Reduced Instruction Set Computer

• Key features

– Large number of general purpose registers (or use of compiler technology to optimize register use)

– Limited and simple instruction set

– Emphasis on optimising the instruction pipeline & memory management

Comparison of processors

Driving force for CISC

• Software costs far exceed hardware costs

• Increasingly complex high level languages

• A “semantic gap” between high-level languages (HLL) & machine language (ML)

• This Leads to:

– Large instruction sets

– More addressing modes

– Hardware implementations of HLL statements

• e.g. CASE (switch) on VAX (long, complex structure)

Intention of CISC

• Ease compiler writing

• Improve execution efficiency

– Complex operations in microcode

• Support more complex HLLs

Execution Characteristics Studied

What was studied?

• Operations performed

• Operands used

• Execution sequencing

How was it Studied?

• Studies were based on programs written in HLLs

• Dynamic studies measured during the execution of the program

Operations

• Assignments

– Movement of data

• Conditional statements (IF, LOOP)

– Sequence control

Observations?

• Procedure call-return is very time consuming

• Some HLL instructions lead to very many machine code operations

Weighted Relative Dynamic Frequency of HLL Operations [Patterson]

         Dynamic Occurrence    Machine-Instruction Weighted    Memory-Reference Weighted
         Pascal    C           Pascal    C                     Pascal    C
ASSIGN   45%       38%         13%       13%                   14%       15%
LOOP     5%        3%          42%       32%                   33%       26%
CALL     15%       12%         31%       33%                   44%       45%
IF       29%       43%         11%       21%                   7%        13%
GOTO     -         3%          -         -                     -         -
OTHER    6%        1%          3%        1%                    2%        1%

Operands

Observations?

• Predominately local scalar variables

Implications?

• Optimization should concentrate on accessing local variables

                   Pascal    C      Average
Integer Constant   16%       23%    20%
Scalar Variable    58%       53%    55%
Array/Structure    26%       24%    25%

Procedure Calls

Observations?

• Context switching is quite time consuming

• Depends on number of parameters passed

• Depends on level of nesting

• Most programs do not do a lot of calls followed by lots of returns

• Most variables used are local

Implications Characterize RISC

• Best support is provided by optimising:

– most utilized features and

– most time consuming features

• Conclusions:

– Large number of registers

• Used for operand referencing

– Careful design of pipelines

• Address branching: branch prediction etc.

– Simplified instruction set

• Reduced length

• Reduced number

Register File

• Software solution

– Require compiler to allocate registers

– Allocate based on most used variables in a given time

– Requires sophisticated program analysis

• Hardware solution

– Have more registers

– Thus more variables will be in registers

Using Registers for Local Variables

• Store local scalar variables in registers

– Reduces memory accesses

• Every procedure (function) call changes locality

– Parameters must be passed

– Partial context switch

– Results must be returned

– Variables from calling program must be restored

– Partial Context switch

Using “Register Windows”

Observations:

• Typically only a few local variables & passed parameters

• Typically limited range of depth of calls

Implications:

• Use multiple small sets of registers

• Calls switch to a different set of registers

• Returns switch back to a previously used set of registers

• Partition register set

Using “Register Windows” cont.

Partition register set into:

– Local registers

– Parameter registers (Passed Parameters)

– Temporary registers (Passing Parameters)

Then:

– Temporary registers from one set overlap parameter registers from the next

This provides parameter passing without moving data (just move one pointer)

Overlapping Register Windows

Picture of Calls & Returns:

Circular Buffer diagram

Operation of Circular Buffer

• When a call is made, a current window pointer is moved to show the currently active register window

• If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory

• A saved window pointer indicates where the next saved window should be restored
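
The overlap and the circular-buffer behaviour described above can be sketched in Python. This is a toy model under stated assumptions, not any real processor's implementation: the names `physical_index` and `WindowFile`, the window sizes (6 parameter, 10 local, 6 temporary registers) and the window count are all hypothetical. Because the last window's temporaries wrap around onto the first window's parameters, the model keeps at most one fewer procedure resident than there are windows.

```python
N_PARAM = 6                  # parameter registers per window (assumed size)
N_LOCAL = 10                 # local registers per window (assumed size)
STRIDE = N_PARAM + N_LOCAL   # physical registers added by each extra window


def physical_index(window, kind, i, n_windows=4):
    """Map a (window, kind, index) register name to a physical register."""
    # Temporaries start one stride further on, so they land exactly on
    # the parameter registers of the next window.
    offsets = {"param": 0, "local": N_PARAM, "temp": STRIDE}
    total = n_windows * STRIDE
    return (window * STRIDE + offsets[kind] + i) % total


class WindowFile:
    """Toy circular buffer of register windows."""

    def __init__(self, n_windows=4):
        self.n_windows = n_windows
        self.capacity = n_windows - 1   # one window is lost to the overlap
        self.cwp = 0          # current window pointer
        self.resident = 1     # windows currently held in registers
        self.saved = 0        # windows spilled to memory
        self.overflows = 0    # overflow interrupts taken
        self.underflows = 0   # underflow interrupts taken

    def call(self):
        if self.resident == self.capacity:
            # All windows in use: save the oldest one to memory.
            self.overflows += 1
            self.saved += 1
            self.resident -= 1
        self.cwp = (self.cwp + 1) % self.n_windows
        self.resident += 1

    def ret(self):
        if self.resident == 1:
            if self.saved == 0:
                raise RuntimeError("return without a matching call")
            # The caller's window was spilled earlier: restore it.
            self.underflows += 1
            self.saved -= 1
            self.resident += 1
        self.cwp = (self.cwp - 1) % self.n_windows
        self.resident -= 1


# Passing a parameter moves no data: the caller's temp register 2 and the
# callee's param register 2 are the same physical register.
print(physical_index(0, "temp", 2), physical_index(1, "param", 2))

wf = WindowFile(n_windows=4)
for _ in range(6):            # nest six calls deep
    wf.call()
print(wf.cwp, wf.overflows, wf.saved)
```

In this sketch a call or return only changes `cwp`; memory traffic happens only on the overflow and underflow paths, which is the point of the scheme when call depth stays within a limited range.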

Global Variables

How should we accommodate Global Variables?

• Allocate by the compiler to memory

• Have a static set of registers for global variables

• Put them in cache

Registers v Cache – which is better?

Large Register File                             Cache
All local scalars                               Recently-used local scalars
Individual variables                            Blocks of memory
Compiler-assigned global variables              Recently-used global variables
Save/Restore based on procedure nesting depth   Save/Restore based on cache replacement algorithm
Register addressing                             Memory addressing

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache

Compiler Based Register Optimization

Basis:

• Assuming relatively small number of registers (16-32)

• Optimizing the use is up to compiler

• HLL programs have no explicit references to registers

Process:

• Assign symbolic or virtual register to each candidate variable

• Map (unlimited) symbolic registers to (limited) real registers

• Symbolic registers that do not overlap can share real registers

• If you run out of real registers some variables use memory

Graph Coloring Algorithm for Reg Assign

Given:

• A graph of nodes and edges

• Nodes are symbolic registers

• Two symbolic registers that are live in the same program fragment are joined by an edge

Then:

• Assign a color to each node

• Adjacent nodes must have different colors

• Assign minimum number of colors

And then:

• Try to color the graph with n colors, where n is the number of real registers

• Nodes that can not be colored are placed in memory
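
The steps above can be sketched as a small Python program. It is a minimal illustration, assuming each symbolic register is live over one contiguous range of program steps; the function names `build_interference` and `color` and the live ranges are hypothetical. It also uses a simple greedy heuristic (highest-degree node first), which does not guarantee the minimum number of colors.

```python
def build_interference(live_ranges):
    """Join two symbolic registers by an edge if their live ranges overlap.

    live_ranges maps a name to an inclusive (start, end) pair of steps.
    """
    names = list(live_ranges)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live_ranges[a]
            b_start, b_end = live_ranges[b]
            if a_start <= b_end and b_start <= a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color(graph, n_regs):
    """Greedily assign one of n_regs real registers to each node.

    Returns (assignment, spilled); spilled nodes are kept in memory.
    """
    assignment, spilled = {}, []
    # Heuristic: handle the most constrained (highest-degree) nodes first.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[m] for m in graph[node] if m in assignment}
        free = [r for r in range(n_regs) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)   # could not be colored
    return assignment, spilled


# Six symbolic registers with made-up live ranges.
ranges = {"A": (0, 2), "B": (1, 4), "C": (1, 3),
          "D": (3, 6), "E": (5, 7), "F": (5, 8)}
graph = build_interference(ranges)

print(color(graph, 3))   # three real registers are enough here
print(color(graph, 2))   # with two, some symbolic registers go to memory
```

Symbolic registers whose live ranges never overlap (A and D in this example) end up sharing a real register, which is what lets a small register set serve many variables.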

Graph Coloring Algorithm Example

The Debate: Why CISC (1 of 2)?

• Compiler simplification?

– Disputed…

• Complex machine instructions are harder to exploit

• Optimization actually may be more difficult

• Smaller programs? (Memory is now cheap)

– Programs may take fewer instructions, but…

– May not occupy less memory, just look shorter in symbolic form

• More instructions require longer op-codes, more memory references

• Register references require fewer bits

The Debate: Why CISC (2 of 2)?

• Faster programs?

– More complex control unit

– Microprogram control store larger

– Thus instructions take longer to execute

• Bias towards use of simpler instructions?

• It is far from clear that CISC is the appropriate solution

Early RISC Computers

• MIPS – Microprocessor without Interlocked Pipeline Stages

– Stanford (John Hennessy)

– MIPS Technologies

• SPARC – Scalable Processor Architecture

– Berkeley (David Patterson)

– Sun Microsystems

• 801 – IBM Research (George Radin)

Concentrating on RISC

Major Characteristics:

• One instruction per cycle

• Register to register operations

• Few, simple addressing modes

• Few, simple instruction formats

Also:

• Hardwired design (no microcode)

• Fixed instruction format

But:

• More compile time/effort

Breadth of RISC Characteristics

Characteristics of Example Processors

Memory to memory vs Register to memory Operations

Controversy: CISC vs RISC

• Challenges of comparison

– There is no pair of RISC and CISC machines that are directly comparable

– There is no definitive set of test programs

– It is difficult to separate hardware effects from compiler effects

– Most comparisons are done on “toy” rather than production machines

– Most commercial machines are a mixture

• Not clear cut

• Today’s designs borrow from both philosophies

RISC Pipelining basics

• Two phases of execution for register based instructions

– I: Instruction fetch

– E: Execute

• ALU operation with register input and output

• Load and store instructions need three phases

– I: Instruction fetch

– E: Execute

• Calculate memory address

– D: Memory

• Register to memory or memory to register operation
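
The benefit of overlapping these phases can be estimated with a short Python sketch. It is an idealized model under stated assumptions, not a description of any specific machine: every phase takes one time unit, the pipelined case pushes every instruction through all three stages, and there are no stalls from branches, data dependencies or memory conflicts. The names `PHASES`, `sequential_time` and `ideal_pipelined_time` are hypothetical.

```python
# Phases per instruction type, as listed above.
PHASES = {
    "alu": ("I", "E"),          # register to register
    "load": ("I", "E", "D"),    # memory to register
    "store": ("I", "E", "D"),   # register to memory
}


def sequential_time(program):
    """Time units when each instruction finishes before the next starts."""
    return sum(len(PHASES[op]) for op in program)


def ideal_pipelined_time(program, stages=3):
    """Time units for a stall-free pipeline: fill it once, then one per step."""
    if not program:
        return 0
    return stages + len(program) - 1


program = ["load", "load", "alu", "store"]
print(sequential_time(program))        # 3 + 3 + 2 + 3 time units
print(ideal_pipelined_time(program))   # 3 to fill, then 1 per instruction
```

Real pipelines fall short of this bound because branches and dependent instructions force idle slots, which is what the optimizations on the following slides try to reduce.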

Effects of RISC Pipelining

(Allows 2 memory accesses per stage)

(E1: register read; E2: execute & register write)

Optimization of RISC Pipelining

• Delayed branch

– Leverages a branch that does not take effect until after execution of the following instruction

– This following instruction occupies the delay slot
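
A toy interpreter makes the delay slot concrete. This is a minimal sketch, assuming a made-up three-operation instruction set (`op`, `jump`, `halt`) and a hypothetical function `run`; it only records the order in which instructions execute.

```python
def run(program, delayed):
    """Return the indices of the instructions in the order they execute.

    program is a list of (operation, argument) pairs. With delayed=True a
    jump takes effect only after the following instruction has executed.
    """
    trace = []
    pc = 0
    pending = None   # jump target waiting for the delay slot to finish
    while pc < len(program):
        op, arg = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        next_pc = pc + 1
        if pending is not None:
            # The delay slot has just executed: now the jump takes effect.
            next_pc, pending = pending, None
        if op == "jump":
            if delayed:
                pending = arg
            else:
                next_pc = arg
        pc = next_pc
    return trace


program = [
    ("op", "load"),      # 0
    ("jump", 4),         # 1
    ("op", "add"),       # 2: sits in the delay slot
    ("op", "unused"),    # 3: skipped in both cases
    ("halt", None),      # 4
]

print(run(program, delayed=False))   # instruction 2 is skipped
print(run(program, delayed=True))    # instruction 2 runs before the jump lands
```

With a normal branch the slot after the jump would be wasted; with a delayed branch the compiler can move a useful instruction (here the `add`) into it, so the pipeline does no idle work.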

Normal vs Delayed Branch

Example of Delayed Branch (clever!)

What is wrong with this example? Why is there a write back?

More Options for RISC Architectures

• RISC Pipelining

– Superpipelined – more fine grained pipeline (more stages in pipeline)

– Superscalar – replicates stages of pipeline (multiple pipelines)

MIPS 4000 RISC Machine

• 64 bit architecture (4 Gig address space), (1 Terabyte of file mapping)

• Partitioned into CPU & MMU

• 32 registers (R0=0), but

• 128K Cache – ½ Instructions, ½ Data

• One 32 bit word for each instruction (94 Instructions)

• All operations are register to register

• No condition codes! Flags are stored in general registers for explicit use – simplifies branch optimization

• Only one load/store format

– Base, offset

– Extended addressing synthesized with multiple instructions

• Uses branch prediction

• Especially designed for Embedded computing

• Has multiple FPUs – FP likely stalls pipeline

• See MIPS instructions - page 485

• See Formats – page 486
