
Page 1:

RISC Architecture and Instruction-Level Parallelism (ILP)

based on “Computer Architecture: A Quantitative Approach” by Hennessy and Patterson, Morgan Kaufmann Publishing Co., 1996

CS480 Computer Science Seminar

Fall, 2002

Page 2:

CISC versus RISC: historical factors that affected processor architecture

• When the M6800 was introduced, 16K RAM chips cost $500 and 40 MB hard disks cost $55,000. When the MC68000 was introduced, 64K RAMs still cost several hundred dollars, and 10 MB hard disks cost $5,000. In that era, code size was among the top considerations, which led to CISC designs.

• As succeeding generations of processors were introduced, manufacturers continued to offer upward compatibility while adding more capability to the old design, which led to ever more complex designs. The complex instruction sets made it difficult to support higher clock rates.

• Furthermore, machine architects wanted to close the “semantic gap” between machine instruction sets and high-level languages, which encouraged CISC design.

Page 3:

The justification for RISC design

• Advances in VLSI technology drastically drove costs down (RAMs, hard disks, etc.).

• Research conducted in 1971 (Knuth) and 1982 (Patterson) showed that 85% of a program’s statements were assignments, conditional branches, or procedure calls, and nearly 80% of the assignment statements were MOVE instructions without arithmetic operations.

Page 4:

The bridge from CISC to RISC

• Instruction prefetching: fetch the next instruction(s) into an instruction queue before the current instruction is completed.

• Pipelining: execute the next instruction before the completion of the current instruction. (Each instruction is carried out in several stages; e.g., the PowerPC 601 has 20 stages.)

• Superscalar operation: the processor can issue more than one instruction simultaneously. The number of instructions issued may vary as the program executes.

• It is very difficult to implement the above speed-up techniques in CISC processors, because the instructions are long and of variable length, and there are usually many different addressing modes. Also, operand access often depends on complex address arithmetic.

Page 5:

RISC design philosophy

• One instruction issue per cycle.

• Fixed-length instructions.

• Only load and store instructions access memory.

• Simplified addressing modes: usually register indirect and indexed, where the index may be in a register or may be an immediate constant.

• Fewer, simpler operations (which means shorter clock cycles, since less is done in a given clock cycle).

• Delayed loads and branches: often these instructions take more than one cycle to complete. The processor is allowed to execute other instructions following the load or branch while it completes.

• Prefetching (instructions, operands, and branch targets) and speculative execution (guess the outcome of a condition and execute the code; if guessed wrong, the result is simply discarded).

• Let the compiler figure out the dependences among instructions and schedule the instructions so that the number of “delay slots” is minimized.

Page 6:

Pipeline concept: a simplified 5-stage pipeline
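
In Hennessy and Patterson’s simplified pipeline, the five stages are IF (instruction fetch), ID (instruction decode/register fetch), EX (execute/effective address calculation), MEM (memory access), and WB (write-back). One instruction enters the pipeline per cycle, so successive instructions overlap:

Cycle:           1    2    3    4    5    6
Instruction 1:   IF   ID   EX   MEM  WB
Instruction 2:        IF   ID   EX   MEM  WB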

Page 7:

Latencies of some operations to be used in the following example
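
The latencies Hennessy and Patterson use for this example (consistent with the cycle counts on the following slides) are:

Instruction producing result    Instruction using result    Latency (clock cycles)
FP ALU op                       Another FP ALU op           3
FP ALU op                       Store double                2
Load double                     FP ALU op                   1
Load double                     Store double                0

In addition, branches are assumed to have a delay of one clock cycle.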

Page 8:

How scheduling of instructions can reduce total execution time by exploiting ILP

• Example

R1 is initially the address of the element in the array with the highest address; F2 contains the scalar value s; for simplicity, the element of the array with the lowest address is assumed to be at address zero.
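
The loop itself, as given in Hennessy and Patterson, adds the scalar s to every element of a double-precision array x:

for (i = 1000; i > 0; i--)
    x[i] = x[i] + s;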

Note: the body of each iteration is independent

Page 9:

The straightforward assembler code of the above loop, without showing the “stall” cycles
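
Following Hennessy and Patterson’s DLX version of the loop:

Loop:  LD    F0, 0(R1)    ; F0 = array element
       ADDD  F4, F0, F2   ; add the scalar in F2
       SD    0(R1), F4    ; store the result
       SUBI  R1, R1, #8   ; decrement the pointer by 8 bytes (one double word)
       BNEZ  R1, Loop     ; branch while R1 != 0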

Page 10:

The straightforward assembler code with the “stall” machine/clock cycles indicated:

before scheduling, it takes 9 cycles per iteration
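
With the latencies above, the issue cycles work out as in Hennessy and Patterson (the final stall is the unfilled branch delay slot):

Loop:  LD    F0, 0(R1)    ; cycle 1
       stall              ; cycle 2 (load double -> FP ALU op)
       ADDD  F4, F0, F2   ; cycle 3
       stall              ; cycle 4 (FP ALU op -> store double)
       stall              ; cycle 5
       SD    0(R1), F4    ; cycle 6
       SUBI  R1, R1, #8   ; cycle 7
       BNEZ  R1, Loop     ; cycle 8
       stall              ; cycle 9 (branch delay slot)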

Page 11:

The straightforward assembler code with the “stall” machine/clock cycles indicated:

after scheduling, it takes only 6 cycles
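
In the scheduled version from Hennessy and Patterson, the SUBI moves up to fill the load delay, and the SD moves into the branch delay slot with its offset adjusted to compensate for the already-executed SUBI:

Loop:  LD    F0, 0(R1)    ; cycle 1
       SUBI  R1, R1, #8   ; cycle 2 (fills the load delay)
       ADDD  F4, F0, F2   ; cycle 3
       stall              ; cycle 4
       BNEZ  R1, Loop     ; cycle 5 (delayed branch)
       SD    8(R1), F4    ; cycle 6 (delay slot; offset adjusted for the SUBI)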

Page 12:

Loop unrolling technique – replicating the loop body multiple times – to further reduce the execution time

(before scheduling)
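
The unrolled loop, as in Hennessy and Patterson, with the intermediate SUBIs and BNEZs removed and their effect folded into the load/store offsets:

Loop:  LD    F0, 0(R1)
       ADDD  F4, F0, F2
       SD    0(R1), F4     ; drop SUBI and BNEZ
       LD    F6, -8(R1)
       ADDD  F8, F6, F2
       SD    -8(R1), F8    ; drop SUBI and BNEZ
       LD    F10, -16(R1)
       ADDD  F12, F10, F2
       SD    -16(R1), F12  ; drop SUBI and BNEZ
       LD    F14, -24(R1)
       ADDD  F16, F14, F2
       SD    -24(R1), F16
       SUBI  R1, R1, #32   ; 4 elements, 8 bytes each
       BNEZ  R1, Loop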

Note that there are 4 copies of the loop body, assuming R1 is initially a multiple of 32 (i.e., the number of loop iterations is a multiple of 4). Also note that registers are not reused. This loop will run in 27 cycles: each LD takes 2 cycles, each ADDD 3, the branch 2, and all other instructions 1, or approximately 6.8 cycles per array element.

Page 13:

Loop unrolling technique – replicating the loop body multiple times – to further reduce the execution time

(after scheduling)
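
The scheduled version in Hennessy and Patterson groups the loads, then the adds, then the stores, so no instruction waits on a latency; two stores move below the SUBI (offsets adjusted by 32) and one of them fills the branch delay slot:

Loop:  LD    F0, 0(R1)
       LD    F6, -8(R1)
       LD    F10, -16(R1)
       LD    F14, -24(R1)
       ADDD  F4, F0, F2
       ADDD  F8, F6, F2
       ADDD  F12, F10, F2
       ADDD  F16, F14, F2
       SD    0(R1), F4
       SD    -8(R1), F8
       SUBI  R1, R1, #32
       SD    16(R1), F12   ; 16 - 32 = -16
       BNEZ  R1, Loop
       SD    8(R1), F16    ; 8 - 32 = -24; fills the branch delay slot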

After scheduling, the loop runs in 14 cycles or 3.5 cycles per array element.

Page 14:

Complications of scheduling: interdependences among instructions

• Data dependences: instruction j is data dependent on instruction i if either of the following holds:
– instruction i produces a result that is used by instruction j, or
– instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.

• Name dependences

• Control dependences

Page 15:

Data dependence example
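
In Hennessy and Patterson, the example is the scalar loop from before: the data dependences run from LD to ADDD to SD (through F0 and F4), and from SUBI to BNEZ (through R1):

Loop:  LD    F0, 0(R1)    ; writes F0 ...
       ADDD  F4, F0, F2   ; ... reads F0; writes F4 ...
       SD    0(R1), F4    ; ... reads F4
       SUBI  R1, R1, #8   ; writes R1 ...
       BNEZ  R1, Loop     ; ... reads R1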

Page 16:

Unrolling a loop sometimes eliminates data dependences. In the example below, the arrows indicate dependences; but, as was discussed before, the SUBIs are not needed.
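
A sketch of the unrolled loop before the SUBIs are removed (following Hennessy and Patterson): each SUBI computes the R1 value consumed by the next copy’s LD and SD, chaining the copies together. Removing the intermediate SUBIs and folding the offsets into the loads and stores breaks the chains and makes the copies independent:

Loop:  LD    F0, 0(R1)
       ADDD  F4, F0, F2
       SD    0(R1), F4
       SUBI  R1, R1, #8   ; R1 flows into the next LD and SD
       LD    F6, 0(R1)
       ADDD  F8, F6, F2
       SD    0(R1), F8
       SUBI  R1, R1, #8   ; R1 flows into the next LD and SD
       ...
       SUBI  R1, R1, #8
       BNEZ  R1, Loop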

Page 17:

Name dependences

• A name dependence occurs when two instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name.

• Two types:
– Antidependence: instruction j writes a register or memory location that instruction i reads, and instruction i is executed first.
– Output dependence: instructions i and j write the same register or memory location.

• Instructions involved in a name dependence can execute simultaneously or be reordered (since no value is being transmitted between them) if the name (register number or memory location) used in the instructions is changed so that the instructions do not conflict.

Page 18:

Name dependences: an example showing both data dependences (light arrows) and name dependences (dark arrows)
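
Following Hennessy and Patterson, the example is the unrolled loop written with every copy reusing F0 and F4. The reuse creates antidependences and output dependences that force the copies to stay in order, even though no data flows between them:

Loop:  LD    F0, 0(R1)
       ADDD  F4, F0, F2
       SD    0(R1), F4
       LD    F0, -8(R1)    ; writes F0: output dependence on the first LD,
       ADDD  F4, F0, F2    ;   antidependence on the first ADDD (which reads F0)
       SD    -8(R1), F4
       LD    F0, -16(R1)
       ADDD  F4, F0, F2
       SD    -16(R1), F4
       LD    F0, -24(R1)
       ADDD  F4, F0, F2
       SD    -24(R1), F4
       SUBI  R1, R1, #32
       BNEZ  R1, Loop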

Page 19:

Name dependences removed by renaming the registers: only true data dependences (light arrows) are left
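
With the registers renamed (F0/F4, F6/F8, F10/F12, F14/F16 for the four copies), only the true data dependences within each copy remain:

Loop:  LD    F0, 0(R1)
       ADDD  F4, F0, F2
       SD    0(R1), F4
       LD    F6, -8(R1)
       ADDD  F8, F6, F2
       SD    -8(R1), F8
       LD    F10, -16(R1)
       ADDD  F12, F10, F2
       SD    -16(R1), F12
       LD    F14, -24(R1)
       ADDD  F16, F14, F2
       SD    -24(R1), F16
       SUBI  R1, R1, #32
       BNEZ  R1, Loop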

Page 20:

Control dependence

if p1 { s1; }

if p2 { s2; }

s1 is control dependent on p1, and s2 is control dependent on p2 but not on p1.

• There are two constraints:
– An instruction that is control dependent on a branch cannot be moved before the branch.
– An instruction that is not control dependent on a branch cannot be moved after the branch in such a way that its execution would be controlled by the branch.

Page 21:

Control dependence example
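
A minimal illustrative sketch (the instructions here are hypothetical, not taken from the slide): the SUBD below the branch is control dependent on BEQZ and must not be hoisted above it, while the ADDI, which must execute on both paths, must not be sunk below it:

       ADDI  R2, R2, #4    ; not control dependent: must not move below BEQZ
       BEQZ  R1, skip      ; branch if R1 == 0
       SUBD  F4, F6, F8    ; control dependent on BEQZ: must not move above it
skip:  SD    0(R2), F10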

Page 22:

VLIW