Post on 18-Dec-2015
Daddy! -- Where do instructions come from?
Program Sequencer controls program flow and provides the next instruction to be executed
Straight line code, jumps and loops
04/18/23 Program sequencer , Copyright M. Smith, ECE, University of Calgary, Canada
Tackled today
Program sequencer
Linear flow of instructions (Why not discuss the IDLE instruction here?)
Jumps
Software loops – normal and more efficient “down-counting” loops
Special Motorola MC68XXX software loop instructions
Loops – hardware loops
Subroutines – next lecture
Interrupts and Exceptions – next lecture
Idle – next lecture
Example code
Look at moving elements from array fooHere[ ] to farAway[ ] using various instruction modes:
Straight line coding
In a loop (please make sure that you understand the terminology – exam question)
   Software loop
   Hardware loop
In a subroutine
Via an interrupt
Linear program flow
Program flow on the chip is mainly linear
The processor fetches and executes program instructions sequentially
Non-sequential structures (instructions and supporting registers) direct the processor to execute an instruction that is not at the next sequential address
Array movement
.extern _fooHere, _farAway;             // extern long fooHere[5], farAway[5];
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
R0 = [P0]; [P1] = R0;                   // farAway[0] = fooHere[0];
R0 = [P0 + ?]; [P1 + ?] = R0;           // farAway[1] = fooHere[1];
R0 = [P0 + ??]; [P1 + ??] = R0;         // farAway[2] = fooHere[2];
                                        // farAway[3] = fooHere[3];
                                        // farAway[4] = fooHere[4];
Question – What goes in place of the ? and ?? here, when doing this in a loop, or when doing
[P1 + ?] = R0;  W[P1 + ?] = R1;  B[P1 + ?] = R2;
ANSWER 1: Find out the correct answer, and make sure you get it right every time.
ANSWER 2: Why worry? Write the code a different way and don’t worry.
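A minimal C sketch of the arithmetic behind the question, assuming (as on Blackfin) that addresses count bytes and that `long` is 32 bits, so `uint32_t` stands in for it. `byte_offset` is a hypothetical helper, not part of any toolchain: the constant in `[P1 + ?]` is the element index multiplied by the element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of `elem_size`-byte elements.
   The processor addresses memory in bytes, so the constant in [P0 + offset]
   must be the array index scaled by the element size. */
size_t byte_offset(size_t index, size_t elem_size) {
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4);
    return index * elem_size;
}
```

For example, `byte_offset(1, sizeof(uint32_t))` is 4 and `byte_offset(2, sizeof(uint32_t))` is 8 for 32-bit `[P1 + ?]` accesses; if the arrays instead held 16-bit or 8-bit elements, the `W[ ]` and `B[ ]` forms would scale by 2 and 1.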
Better solution – let the processor worry about getting the indexing correct!
.extern _fooHere; .extern _farAway;     // extern long fooHere[5], farAway[5];
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
R0 = [P0++]; [P1++] = R0;               // farAway[0] = fooHere[0];  replaces R0 = [P0]; [P1] = R0;
R0 = [P0++]; [P1++] = R0;               // farAway[1] = fooHere[1];  replaces R0 = [P0 + ?]; [P1 + ?] = R0;
R0 = [P0++]; [P1++] = R0;               // farAway[2] = fooHere[2];  replaces R0 = [P0 + ??]; [P1 + ??] = R0;
                                        // farAway[3] = fooHere[3];
                                        // farAway[4] = fooHere[4];
Remember -- after the last copy, P0 (and P1) will end up pointing PAST the end of the array
The C++ code we actually developed
.extern _fooHere; .extern _farAway;     // extern long fooHere[5], farAway[5];
P0.H = _fooHere; P0.L = _fooHere;       // long *pt0 = fooHere;   (actually pt0 = &fooHere[0];)
P1.H = _farAway; P1.L = _farAway;       // long *pt1 = farAway;   (actually pt1 = &farAway[0];)
R0 = [P0++]; [P1++] = R0;               // *pt1++ = *pt0++;   same as farAway[0] = fooHere[0];
R0 = [P0++]; [P1++] = R0;               // *pt1++ = *pt0++;   same as farAway[1] = fooHere[1];
R0 = [P0++]; [P1++] = R0;               // *pt1++ = *pt0++;   same as farAway[2] = fooHere[2];
                                        //                    farAway[3] = fooHere[3];
Remember -- after the last copy, P0 (and P1) will end up pointing PAST the end of the array
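A small C sketch of the same straight-line, post-increment copy for all five elements, written as a hypothetical function `copy5_straight_line` so the final pointer value can be inspected. It shows the "PAST the end" point concretely: after five `*dst++ = *src++;` steps the source pointer equals `src + 5`.

```c
#include <assert.h>
#include <stddef.h>

/* Straight-line copy of five elements using post-increment pointers,
   mirroring R0 = [P0++]; [P1++] = R0; repeated five times.
   Returns the final source pointer so the caller can see where it ended up. */
const long *copy5_straight_line(long *dst, const long *src) {
    assert(dst != NULL && src != NULL);
    *dst++ = *src++;   /* farAway[0] = fooHere[0]; */
    *dst++ = *src++;   /* farAway[1] = fooHere[1]; */
    *dst++ = *src++;   /* farAway[2] = fooHere[2]; */
    *dst++ = *src++;   /* farAway[3] = fooHere[3]; */
    *dst++ = *src++;   /* farAway[4] = fooHere[4]; */
    return src;        /* now one element PAST the end of the source array */
}
```

Pointing one past the end is legal in C as long as that pointer is never dereferenced, which is why the loop versions later can safely stop there.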
IDLE – Seems the next simplest!
The IDLE instruction is part of a sequence of instructions that places the processor in a quiescent state so that something can happen; for example, an external system can change the clock frequencies (power saving: a high clock frequency can mean high power consumption)
An SSYNC instruction MUST immediately follow the IDLE instruction
Getting out of the IDLE instruction sequence needs an understanding of interrupts; we will discuss IDLE more later
More info in instruction ref. manual p 11.3
Jump instruction
Both JUMP and CALL instructions transfer program flow to another memory location
The difference between JUMP and CALL is that CALL automatically loads the return address into the RETS register. The return address is the next sequential address after the CALL instruction.
JUMPs can be conditional (they depend on the CC bit in the ASTAT register).
Conditional JUMP instructions use static branch prediction to reduce the branch latency caused by the length of the Blackfin instruction pipeline. What does “static” branch prediction mean? What is “dynamic” branch prediction?
When possible, the assembler will use the short relative jump. The target instruction must then be within -4096 to +4094 bytes of the current instruction.
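The asymmetric range -4096 to +4094 is exactly what a signed count of 2-byte units (from -2048 to +2047) gives once doubled. The hypothetical C helper below assumes that encoding, which the range implies but this sketch does not take from the manual, and checks whether a byte offset could use the short form.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: can a jump of `offset` bytes (target minus current PC)
   use the short relative form? Assumes the offset is stored as a signed
   count of 2-byte units, which yields exactly the -4096..+4094 byte range. */
bool fits_short_jump(long offset) {
    if (offset % 2 != 0)          /* instructions start on 2-byte boundaries */
        return false;
    return offset >= -4096 && offset <= 4094;
}
```

When a target falls outside this range the assembler has to fall back to a longer jump form, which costs extra instruction bytes.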
Array movement
.extern _fooHere, _farAway;
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
R0 = [P0]; [P1] = R0;
R0 = [P0 + ?]; [P1 + ?] = R0;
R0 = [P0 + ??]; [P1 + ??] = R0;
…… and so on ….

extern long fooHere[5], farAway[5];
for (int num = 0; num < 5; num++) {
    farAway[num] = fooHere[num];
}
Linear code – straight line coding is STILL a viable way of implementing a loop:
You don’t waste any time incrementing a loop counter
You don’t waste time checking a loop counter
You don’t waste time upsetting the processor instruction pipeline by jumping back and throwing away all the prefetched instructions
Standard software Loop
.extern _fooHere; .extern _farAway;
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
R1 = 0; R2 = 5;
LOOP: CC = R2 <= R1; IF CC JUMP LOOP_END;
      R0 = [P0++]; [P1++] = R0;
      R1 += 1; JUMP LOOP;
LOOP_END:                               // outside loop

The C++ code we actually developed
extern long fooHere[5], farAway[5];
long *pt0 = fooHere;
long *pt1 = farAway;
int num = 0;
for ( /* empty */; num < 5; num++) {
    *pt1++ = *pt0++;
}
// or, with array indexing:
// for (int num = 0; num < 5; num++) {
//     farAway[num] = fooHere[num];
// }

IF CC JUMP LOOP_END is PREDICTED NOT TAKEN (no (BP) option)
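One way to see what "static" prediction buys is to count how often a fixed guess is wrong for a given sequence of branch outcomes. The C sketch below is a simplified model (the function name is hypothetical, and it counts mispredictions only, not the actual cycle penalty). For N = 5, the loop-exit branch above is evaluated 6 times and taken only once, so "predicted not taken" is wrong just once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Static prediction: the guess is fixed when the code is written or assembled
   (e.g. a (BP) "predict taken" option, or the default "predict not taken")
   and never changes at run time. Count how often such a fixed guess is wrong. */
size_t static_mispredictions(const bool *taken, size_t count, bool predict_taken) {
    assert(taken != NULL || count == 0);
    size_t wrong = 0;
    for (size_t i = 0; i < count; i++)
        if (taken[i] != predict_taken)
            wrong++;
    return wrong;
}
```

A dynamic predictor would instead update its guess from the branch's recent history while the program runs; the static guess here never changes.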
Program Loops
Most programs have 1 or 2 loops embedded inside each other, occasionally 3 or more
For all images in a list
    For each row in each image
        For each column (pixel) in each row
            For each colour in each pixel
It is important to get the maximum efficiency from the instructions that are executed most often!
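A quick C sketch of why the innermost loop matters most: the innermost body runs images x rows x columns x colours times, while the outer bodies run far less often. The function name and the dimensions used below are hypothetical.

```c
#include <assert.h>

/* How many times does the innermost loop body run for the nested
   image / row / column / colour loops? */
unsigned long innermost_executions(unsigned images, unsigned rows,
                                   unsigned cols, unsigned colours) {
    unsigned long count = 0;
    for (unsigned img = 0; img < images; img++)
        for (unsigned r = 0; r < rows; r++)
            for (unsigned c = 0; c < cols; c++)
                for (unsigned k = 0; k < colours; k++)
                    count++;   /* the "work" on one colour of one pixel */
    return count;
}
```

For 10 images of 480 x 640 pixels with 3 colours, the innermost body runs 9,216,000 times but the outermost body runs only 10 times. A single wasted instruction in the innermost loop therefore costs about 9.2 million instructions.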
Efficiency of Standard software Loop
Suppose we go round the loop N times
2 loop control instructions outside of loop + 4 * N loop control instructions inside the loop
2 * N “useful instructions” inside loop + 4 useful set up instructions
Loop efficiency =
$$\frac{4 + 2N}{4 + 2N + 2 + 4N} \times 100\%$$
If N is large:
$$\frac{2N}{6N} \times 100\% \approx 33\%$$
.extern _fooHere; .extern _farAway;
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
R1 = 0; R2 = 5;
LOOP: CC = R2 <= R1; IF CC JUMP LOOP_END;
      R0 = [P0++]; [P1++] = R0;
      R1 += 1; JUMP LOOP;
LOOP_END:                               // outside loop

extern long fooHere[5], farAway[5];
long *pt0 = fooHere;
long *pt1 = farAway;
int num = 0;
for ( /* empty */; num < 5; num++) {
    *pt1++ = *pt0++;
}
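The efficiency formula above can be written as a small C function. This is a sketch. The parameter names are hypothetical, and "efficiency" means useful instructions divided by all instructions executed, as on the slide. The same function covers the other loop styles in this lecture if you change the control-instruction counts.

```c
#include <assert.h>

/* Loop efficiency as defined on these slides:
   useful instructions / all instructions executed, as a percentage.
   setup_useful: useful set-up instructions (outside the loop)
   body_useful:  useful instructions per iteration
   ctrl_outside: loop-control instructions outside the loop
   ctrl_inside:  loop-control instructions per iteration
   n:            number of iterations */
double loop_efficiency(double setup_useful, double body_useful,
                       double ctrl_outside, double ctrl_inside, double n) {
    double useful = setup_useful + body_useful * n;
    double total  = useful + ctrl_outside + ctrl_inside * n;
    assert(total > 0.0);
    return 100.0 * useful / total;
}
```

With the counts above, `loop_efficiency(4, 2, 2, 4, 5)` is 14/36, about 38.9%. For very large N it approaches 33%. The set-up instructions only help when N is small.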
Down-counting software loop
.extern _fooHere; .extern _farAway;
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
R1 = 5; CC = R1 <= 0; IF CC JUMP DO_WHILE_END;
DO_WHILE:
      R0 = [P0++]; [P1++] = R0;
      R1 += -1; CC = R1 <= 0; IF !CC JUMP DO_WHILE (BP);
DO_WHILE_END:                           // outside loop

extern long fooHere[5], farAway[5];
long *pt0 = fooHere;
long *pt1 = farAway;
int num = 5;
if (num > 0) do {      // Test needed if the exact value of num is not known
    *pt1++ = *pt0++;
} while ((--num) > 0);
// compare with the up-counting version:
// for (int num = 0; num < 5; num++) { farAway[num] = fooHere[num]; }
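The down-counting C code above can be packaged as a function that takes the count as a parameter. That shows why the `if (num > 0)` guard is needed when the exact value of `num` is not known. `copy_down_counting` is a hypothetical name, and this is a sketch rather than the course's code.

```c
#include <assert.h>
#include <stddef.h>

/* Down-counting copy: the guard handles num <= 0, then the do-while
   tests (--num) > 0 at the bottom, the C shape of
   R1 += -1; CC = R1 <= 0; IF !CC JUMP DO_WHILE (BP); */
void copy_down_counting(long *pt1, const long *pt0, int num) {
    assert(num <= 0 || (pt1 != NULL && pt0 != NULL));
    if (num > 0)            /* test needed if the exact value of num is not known */
        do {
            *pt1++ = *pt0++;
        } while ((--num) > 0);
}
```

Without the guard, a call with `num == 0` would still copy one element, because a do-while body always runs at least once.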
Efficiency of Down-counting software Loop
Suppose we go round the loop N times
3 loop control instructions outside of loop + 3 * N loop control instructions inside the loop
2 * N “useful instructions” inside loop + 4 useful set up instructions
Loop efficiency =
$$\frac{4 + 2N}{4 + 2N + 3 + 3N} \times 100\%$$
If N is large:
$$\frac{2N}{5N} \times 100\% = 40\%$$
.extern _fooHere; .extern _farAway;
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
R1 = 5; CC = R1 <= 0; IF CC JUMP DO_WHILE_END;
DO_WHILE:
      R0 = [P0++]; [P1++] = R0;
      R1 += -1; CC = R1 <= 0; IF !CC JUMP DO_WHILE (BP);
DO_WHILE_END:                           // outside loop

extern long fooHere[5], farAway[5];
long *pt0 = fooHere;
long *pt1 = farAway;
int num = 5;
if (num > 0) do {      // Test needed if the exact value of num is not known
    *pt1++ = *pt0++;
} while ((--num) > 0);
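To check the loop-control counts, the C sketch below steps through both software loops and counts the control instructions they execute. The function names are hypothetical. One detail shows up: the standard loop also executes a final `CC = R2 <= R1; IF CC JUMP LOOP_END;` pair to leave, so it runs 4 + 4N control instructions rather than 2 + 4N. The down-counting loop runs exactly 3 + 3N. Neither difference changes the large-N percentages.

```c
#include <assert.h>

/* Standard (up-counting) loop: R1 = 0; R2 = n; test at the top. */
long control_ops_standard(long n) {
    assert(n >= 0);
    long ops = 2;                /* R1 = 0; R2 = n; */
    long r1 = 0;
    for (;;) {
        ops += 2;                /* CC = R2 <= R1; IF CC JUMP LOOP_END; */
        if (n <= r1) break;      /* branch taken: leave the loop */
        /* useful work: R0 = [P0++]; [P1++] = R0; (not counted) */
        ops += 2;                /* R1 += 1; JUMP LOOP; */
        r1 += 1;
    }
    return ops;
}

/* Down-counting loop: guard outside, test at the bottom. */
long control_ops_down_counting(long n) {
    assert(n >= 0);
    long ops = 3;                /* R1 = n; CC = R1 <= 0; IF CC JUMP DO_WHILE_END; */
    long r1 = n;
    if (r1 <= 0) return ops;
    do {
        /* useful work (not counted) */
        ops += 3;                /* R1 += -1; CC = R1 <= 0; IF !CC JUMP DO_WHILE (BP); */
        r1 -= 1;
    } while (!(r1 <= 0));
    return ops;
}
```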
Efficient loops
Motorola MC68XXX has a specialized loop instruction that, essentially:
Decrements the counter (a data register) and starts the jump occurring
While the decrement is occurring, tests whether the NEW COUNTER VALUE IS LESS THAN ZERO (i.e. the old counter was zero). If so, the jump is stopped
Motorola has specialized memory operations WHICH TAKE MANY PROCESSOR CYCLES
Motorola has the instruction [P1++] = [P0++], which performs all the following steps, each taking 4 clock cycles:
Fetch instruction; internReg.L = W[P0]; internReg.H = W[P0+2]; W[P1] = internReg.L; W[P1+2] = internReg.H; P0 += 4; P1 += 4;
TOTAL OF 24 cycles at 8 MHz
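For scale, the 24-cycle figure converts to time at the quoted 8 MHz clock (one cycle is 125 ns). This is simple arithmetic on the slide's own numbers:

```latex
$$t = 24 \times \frac{1}{8\ \text{MHz}} = 24 \times 125\ \text{ns} = 3\ \mu\text{s per copied 32-bit value}$$
```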
Efficiency of “Motorola-style” Down-counting software Loop with specialized branch instructions
Suppose we go round the loop N times
3 loop control instructions outside of loop + 1 * N loop control instructions inside the loop
1 * N “useful instructions” inside loop + 2 useful set up instructions
Loop efficiency =
$$\frac{6 + 5N}{6 + 5N + 4 + 1N} \times 100\%$$
If N is large:
$$\frac{5N}{6N} \times 100\% \approx 83\%$$
.extern _fooHere; .extern _farAway;
P0 = _fooHere; P1 = _farAway;
R1 = (5 - 1); CC = R1 < 0; IF CC JUMP DO_WHILE_END;
DO_WHILE:
      [P1++] = [P0++];
      (R1 += -1); IF (R1 < 0) THEN CONTINUE OTHERWISE JUMP DO_WHILE (BP);
DO_WHILE_END:                           // outside loop

extern long fooHere[5], farAway[5];
long *pt0 = fooHere;
long *pt1 = farAway;
int num = 5;
if (num > 0) do {      // Test needed if the exact value of num is not known
    *pt1++ = *pt0++;
} while ((--num) > 0);

NOTE: NOT AVAILABLE ON BLACKFIN
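Why is the counter loaded with (5 - 1)? The C model below answers this, assuming the MC68000 DBRA/DBcc behaviour of "decrement, then branch back unless the result is -1". `dbra_style_iterations` is a hypothetical name. With the counter preloaded to N - 1, the loop body runs exactly N times.

```c
#include <assert.h>

/* Model of a Motorola-style "decrement and branch" loop: the counter is
   preloaded with N - 1, and the branch back is skipped once the decremented
   counter reaches -1 (i.e. the old counter was zero). Returns how many times
   the loop body ran. */
int dbra_style_iterations(int n) {
    assert(n >= 0);
    int r1 = n - 1;               /* R1 = (N - 1) */
    int body_runs = 0;
    if (r1 < 0)                   /* CC = R1 < 0; IF CC JUMP DO_WHILE_END; */
        return 0;
    do {
        body_runs++;              /* [P1++] = [P0++]; */
        r1 -= 1;                  /* decrement done by the branch instruction */
    } while (r1 >= 0);            /* branch back unless the counter is now -1 */
    return body_runs;
}
```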
Blackfin Hardware Loops
Blackfin supports a mechanism for zero-overhead looping
Common design decision – the two inner-most loops are the most often executed, so make those the most efficient
The program sequencer contains TWO loop units, each containing three registers:
Loop Top registers – LT0, LT1
Loop Bottom registers – LB0, LB1
Loop Count registers – LC0, LC1
Blackfin Hardware Loops
The program sequencer contains TWO loop units, each containing three registers: Loop Top (LT0, LT1), Loop Bottom (LB0, LB1) and Loop Count (LC0, LC1)
When an instruction at address X is executed (meaning PC == X), and the address X matches the contents of LBn (meaning PC == LBn), and the count register is greater than or equal to 2 (LCn >= 2), THEN LCn is decremented and the next instruction is taken from address LTn
Note that if two loops end on the same instruction then loop 1 has the highest priority
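The rule above can be modelled in a few lines of C. This is a sketch. `LoopUnit`, `next_pc` and the addresses are hypothetical, every instruction is assumed to be 2 bytes, and the unit is assumed to fall through (leaving LC at 0) when LC is 1 at the loop bottom. Running the two-instruction copy loop with a count of 5 executes LOOP_END five times with no loop-control instructions at all.

```c
#include <assert.h>

/* Hypothetical model of one Blackfin loop unit (LT, LB, LC registers). */
typedef struct {
    unsigned lt;   /* address of first instruction in the loop */
    unsigned lb;   /* address of last instruction in the loop */
    unsigned lc;   /* remaining iteration count */
} LoopUnit;

/* Address of the next instruction after executing the one at pc. */
unsigned next_pc(LoopUnit *unit, unsigned pc, unsigned insn_size) {
    if (pc == unit->lb && unit->lc >= 2) {
        unit->lc -= 1;          /* one fewer pass to go */
        return unit->lt;        /* zero-overhead jump back to the loop top */
    }
    if (pc == unit->lb && unit->lc == 1)
        unit->lc = 0;           /* last pass: fall out of the loop */
    return pc + insn_size;      /* otherwise: linear program flow */
}

/* Run the copy loop (LOOP_START at 0x10, LOOP_END at 0x12) and count how
   often LOOP_END executes; 0x14 plays the role of OUTSIDE_LOOP. */
unsigned count_loop_end_executions(unsigned count) {
    LoopUnit unit = { 0x10, 0x12, count };
    unsigned pc = unit.lt, loop_end_runs = 0;
    assert(count >= 1);
    while (pc != 0x14) {
        if (pc == unit.lb)
            loop_end_runs++;
        pc = next_pc(&unit, pc, 2);
    }
    return loop_end_runs;
}
```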
Pseudo code example
Set LT0 = first instruction in loop    -- LOOP_START
Set LB0 = last instruction in loop     -- LOOP_END
Set LC0 = 5;
LOOP_START: R0 = [P0++];
LOOP_END:   [P1++] = R0;
Manual (p. 4-16) says: “Each loop register can be loaded individually with a register transfer, but this incurs a significant overhead if the loop count is non-zero (the loop is active) at the time of the transfer.”
That sounds unpleasant, so let’s find an easier way. Manual (p. 4-16) says: “The LSETUP instruction can be used to load all three registers of a loop unit at the same time.”
WARNING: in the standard software loop, the instruction at LOOP_END IS NOT EXECUTED INSIDE THE SOFTWARE LOOP; it is the first instruction after the loop
Efficiency of Hardware Loop
Suppose we go round the loop N times
2 loop control instructions outside of loop + 0 loop control instructions inside the loop – There are some pipeline overhead issues on leaving loop
2 * N “useful instructions” inside loop + 4 useful set up instructions
Loop efficiency =
$$\frac{4 + 2N}{4 + 2N + 2} \times 100\%$$
If N is large:
$$\frac{2N}{2N} \times 100\% = 100\%$$
.extern _fooHere; .extern _farAway;
P0.H = _fooHere; P0.L = _fooHere;
P1.H = _farAway; P1.L = _farAway;
P2 = 5; LSETUP (LOOP_START, LOOP_END) LC1 = P2;
LOOP_START: R0 = [P0++];
LOOP_END:   [P1++] = R0;
OUTSIDE_LOOP:

extern long fooHere[5], farAway[5];
long *pt0 = fooHere;
long *pt1 = farAway;
int num = 0;
for ( /* empty */; num < 5; num++) {
    *pt1++ = *pt0++;
}
WARNING: LOOP_END is an instruction that IS EXECUTED INSIDE THE HARDWARE LOOP
Big warning
SOFTWARE LOOP:
R1 = 0; R2 = 5;
LOOP: CC = R2 <= R1; IF CC JUMP LOOP_END;
      R0 = [P0++]; [P1++] = R0;
      R1 += 1; JUMP LOOP;
LOOP_END:                               // outside loop: NOT executed inside the software loop

HARDWARE LOOP:
LOOP_START: R0 = [P0++];
LOOP_END:   [P1++] = R0;                // LOOP_END is ALWAYS executed in the hardware loop
OUTSIDE_LOOP:
Warning and speed issues
The distance between the LSETUP instruction and the LOOP_START instruction MUST NOT BE MORE THAN 30 bytes (otherwise the offset will not fit into the instruction encoding)
There is a 4-clock-cycle advantage if LSETUP is the instruction immediately before the LOOP_START instruction
The distance between the LSETUP instruction and the LOOP_END instruction MUST NOT BE MORE THAN 2046 bytes (otherwise the offset will not fit into the instruction encoding)
The processor supports a four-location instruction loop buffer. If the loop code contains four or fewer instructions, then no fetches from instruction memory are necessary for any number of loop iterations, because the instructions are stored locally. This eliminates instruction fetch time (especially important when accessing external memory)
Really efficient loops are no more than 4 instructions long. I have requested information on whether this means 4 instructions, or 4 instructions which can be highly parallel (like 16 instructions in a non-parallel mode)
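The two LSETUP distance limits can be checked mechanically, for example in a hypothetical build-time tool. The C sketch below encodes the 30-byte and 2046-byte limits quoted above. The lower bounds and the even-alignment requirement are assumptions of this sketch, based on 16-bit instruction alignment and on the loop lying after LSETUP, and are not taken from the manual.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check of the LSETUP distance limits quoted above.
   Distances are in bytes, measured forward from the LSETUP instruction.
   Assumptions: both must be positive and even (16-bit instruction
   alignment), and LOOP_END cannot come before LOOP_START. */
bool lsetup_distances_ok(long to_loop_start, long to_loop_end) {
    bool start_ok = to_loop_start > 0 && to_loop_start <= 30
                    && to_loop_start % 2 == 0;
    bool end_ok = to_loop_end >= to_loop_start && to_loop_end <= 2046
                  && to_loop_end % 2 == 0;
    return start_ok && end_ok;
}
```

A loop whose body pushes LOOP_END beyond 2046 bytes cannot use a single LSETUP. It would need restructuring, for example by moving part of the body into a subroutine.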
Information taken from Analog Devices On-line Manuals with permission http://www.analog.com/processors/resources/technicalLibrary/manuals/
Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices. Copyright Analog Devices, Inc. All rights reserved.