Moving Arrays -- 1 Completion of ideas needed for a general and complete program Final concepts needed for Final Review for Final – Loop efficiency

Upload: hugh-adams

Post on 04-Jan-2016


Moving Arrays -- 1

Completion of ideas needed for a general and complete program

Final concepts needed for Final

Review for Final – Loop efficiency


04/20/23DMA , Copyright M. Smith, ECE, University of Calgary, Canada


Tackled today

Declaring and initializing arrays off the stack
– Review and a little bit of new
– Useful for background DMA tasks
– Useful for minimizing total memory used in a non-general program

Declaring arrays and variables on the stack
– Review and a little bit of new
– Re-entrant and thread-safe code

Demonstrating memory-to-memory DMA


Declaring fixed arrays in memory – not on the stack

short foo_startarray[40];
short far_finalarray[40];

void HalfWaveRectifyASM( ) {
    // Take the signal from foo_startarray[ ] and rectify the signal
    // Half wave rectify – if > 0 keep the same; if < 0 make zero
    // Full wave rectify – if > 0 keep the same; if < 0 then abs value
    // Rectify foo_startarray[ ] and place result in far_finalarray[ ]

    for (int count = 0; count < 40; count++) {
        if (foo_startarray[count] < 0) far_finalarray[count] = 0;
        else far_finalarray[count] = foo_startarray[count];
    }
}

The program code is the same – but the data part is not


First attempt to get correct answer

.section data1

Tells linker to place this stuff in memory map location data1

.align 4 – adjust address to end in 0, 4, 8 or C
We know the processor works best when we start things on a boundary between groups of 4 bytes

[N * 2] – We need N short ints

We know the processor works with addresses in bytes, so needing N * 2 bytes sounds sensible


“wrong approach” – does not match with what C / C++ does with memory

20 bytes (16 bits) for

N short values in C++ = N * 2 bytes


“Correct approach was NOT what I expected”

ASM array with space for N long ints: .var arrayASM[N]; better: .byte4 arrayASM[N];

ASM array with space for N short ints: .var arrayASM[N / 2]; better: .byte2 arrayASM[N];

ASM array with space for N chars: .var arrayASM[N / 4]; better: .byte arrayASM[N];
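
The sizing rule on this slide is simple arithmetic. The sketch below assumes, as the slide implies, that each .var element reserves one 32-bit word. It computes how many such words are needed for N elements of a given size. The helper name words_needed is hypothetical and is not part of any assembler or compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 32-bit words needed to hold
   n_elems elements of elem_bytes bytes each, rounded up.
   Assumes one .var element occupies 4 bytes, as the slide implies. */
size_t words_needed(size_t n_elems, size_t elem_bytes) {
    return (n_elems * elem_bytes + 3) / 4;
}
```

For the 40-element short arrays used earlier, words_needed(40, sizeof(int16_t)) gives 20 words, which is the N / 2 on the slide. For odd element counts the result rounds up to a whole word.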


Better answer is “Look at the assembler manual”


Improving what we did before

Big warning – external array initialization occurs on “reload” of your program code and NOT on “restart” of your program code (WHY?)

Understanding why this is true, and why it is a problem, will solve many issues when programming
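
One way to picture the warning is the C sketch below. It assumes that a "restart" only re-runs the code and does not rewrite the initial values that were placed in memory when the program image was loaded. Whether that holds depends on the tool chain and start-up code, so treat it as an assumption. The names samples, samples_init, run_program and reset_samples are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define N 4

/* Initial values are placed in memory when the program image is loaded. */
static int16_t samples[N] = {1, 2, 3, 4};

/* Hypothetical read-only copy used to re-initialize explicitly. */
static const int16_t samples_init[N] = {1, 2, 3, 4};

/* Stands in for the program body: it modifies the global array in place. */
void run_program(void) {
    for (int i = 0; i < N; i++) {
        samples[i] = (int16_t)(samples[i] * 2);
    }
}

/* Explicit initialization that a "restart" can rely on. */
void reset_samples(void) {
    memcpy(samples, samples_init, sizeof samples);
}

/* Small accessor so the effect can be observed. */
int16_t first_sample(void) {
    return samples[0];
}
```

Calling run_program twice without a reload leaves first_sample() at 4 rather than 2, because the second run starts from the modified data. Calling reset_samples first restores the expected starting point.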


When DMA might be useful-- Video manipulation

Program
– Wait for picture 1 to come in – video-in
– Process picture 1 – lots of mathematics perhaps
– Wait for picture 1 to be transmitted – video out

Spending a lot of time waiting rather than doing


When DMA might be useful-- Double Buffering

Program
1. Wait for picture 2 memory to fill – video-in
2. Picture 3 comes into memory – background DMA task from input
   Process picture 2 – place result into picture 0 location
3. Picture 4 comes into memory – background DMA task from input
   Process picture 3 – place result into picture 1 location
   Transmit picture 0 – background DMA task to output
4. Picture 0 comes into memory – background DMA task from input
   Process picture 4 – place result into picture 2 location
   Transmit picture 1 – background DMA task to output
5. Picture 1 comes into memory – background DMA task from input
   Process picture 0 – place result into picture 3 location
   Transmit picture 2 – background DMA task to output
6. Picture 2 comes into memory – background DMA task from input
   Process picture 1 – place result into picture 4 location
   Transmit picture 3 – background DMA task to output
7. REPEAT STEPS FOREVER
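
The rotation in the steps above follows a modulo-5 pattern. The sketch below is one way to write that pattern down in C. The struct BufferRoles and the helper roles_for_step are hypothetical names chosen for illustration, and the formulas are read off the listed steps rather than taken from any DMA driver.

```c
#define NUM_PICTURES 5

/* Buffer roles for one step of the five-picture rotation. */
typedef struct {
    int incoming;   /* picture being filled by the input DMA task */
    int process;    /* picture the processor works on */
    int result;     /* location where the processed result is placed */
    int transmit;   /* picture being sent by the output DMA task */
} BufferRoles;

/* Hypothetical helper: roles at a given step, valid for step >= 3.
   Step 2 also matches, except that nothing is transmitted yet. */
BufferRoles roles_for_step(int step) {
    BufferRoles r;
    r.incoming = (step + 1) % NUM_PICTURES;
    r.process  = step % NUM_PICTURES;
    r.result   = (step + 3) % NUM_PICTURES;
    r.transmit = (step + 2) % NUM_PICTURES;
    return r;
}
```

For step 3 this gives incoming 4, process 3, result 1, transmit 0, matching the list. Because everything is taken modulo 5, the same function keeps working when the steps repeat.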


We are only going to look at a simple DMA task

Normal code when trying to move data from one location to another. There are a number of simple examples in Lab. 3 using the SPI interface.
1) P0 = address of start_array[0];
2) P1 = address of final_array[0];
3) R0 = number of data items that need to be transferred
4) R1 = how many values already transferred

5) R1 = 0;
LOOP:
6) CC = R0 <= R1;
7) IF CC JUMP DONE;
8) R2 = [P0++];
9) [P1++] = R2;
   R1 += 1;          // one more value transferred
10) JUMP LOOP;
DONE: Do something else

VERY BIG PIPELINE LATENCY ISSUES

MANY INTERNAL PROCESSOR STALLS ON DATA BUS WHILE WAITING FOR R2 TO BE READ, STORED and then TRANSMITTED

Must wait to do something else

INSTRUCTION BUS STALLS EVERY TIME THE CODE JUMPS -- LOSE 4 CYCLES
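
For readers more comfortable in C, the processor-driven move can be sketched as below. The function name copy_words is hypothetical, and the register names in the comments only indicate which variable plays which role in the assembly version.

```c
#include <stddef.h>
#include <stdint.h>

/* Processor-driven copy: every element passes through a register,
   and the processor can do nothing else until the loop ends. */
void copy_words(const int32_t *start_array, int32_t *final_array, size_t count) {
    const int32_t *p0 = start_array;   /* plays the role of P0 */
    int32_t *p1 = final_array;         /* plays the role of P1 */
    for (size_t done = 0; done < count; done++) {   /* done ~ R1, count ~ R0 */
        int32_t r2 = *p0++;            /* R2 = [P0++]; */
        *p1++ = r2;                    /* [P1++] = R2; */
    }
}
```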


We are only going to look at a simple DMA task

DMA is special hardware that works without the processor
1) DMA_source_address_register = address of start_array[0];
2) DMA_destination_address_register = address of final_array[0];
3) DMA_max_count_register = max-value needed to transfer
4) DMA_count_register = how many values already transferred
5) DMA_enable = true

DMA transfers happen in the background
Minimized pipeline issues

The processor-driven loop, for comparison:
R1 = 0;
LOOP:
CC = R0 <= R1;
IF CC JUMP DONE;
R2 = [P0++];
[P1++] = R2;
R1 += 1;
JUMP LOOP;
DONE: Do something else

Processor can do something else immediately while DMA hardware handles all the memory transfers WITHOUT PROCESSOR HELP.
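
The five set-up steps can be mimicked in plain C with a simulated controller. This is only a software model to show the idea: the struct SimDma, its field names, and the functions dma_start and dma_tick are all hypothetical, follow the slide's descriptive names, and are NOT real Blackfin registers or driver calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated DMA controller. Field names follow the slide's descriptive
   names; they are NOT real Blackfin register names. */
typedef struct {
    const int32_t *source_address;
    int32_t *destination_address;
    size_t max_count;   /* values to transfer */
    size_t count;       /* values already transferred */
    bool enable;
} SimDma;

/* Steps 1-5: program the registers, then enable the transfer. */
void dma_start(SimDma *dma, const int32_t *src, int32_t *dst, size_t n) {
    dma->source_address = src;
    dma->destination_address = dst;
    dma->max_count = n;
    dma->count = 0;
    dma->enable = true;
}

/* Stands in for one hardware transfer happening "in the background".
   Returns true while the transfer is still in progress. */
bool dma_tick(SimDma *dma) {
    if (!dma->enable) return false;
    if (dma->count < dma->max_count) {
        dma->destination_address[dma->count] = dma->source_address[dma->count];
        dma->count++;
    }
    if (dma->count == dma->max_count) dma->enable = false;
    return dma->enable;
}
```

In real hardware nothing like dma_tick is called by the program. The point of the model is that after dma_start the processor code is free to do other work and only needs to check later whether the count has reached the maximum.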


Write some tests so we know how to proceed -- Test 1

Is DMA useful when the arrays being moved are in the processor’s internal memory and placed on the stack, as with this code?


Write some tests so we know how to proceed -- Test 2

Is DMA useful when both arrays are placed in external memory?
SDRAM is needed for large video images

SDRAM -- MANY MEGS AVAILABLE

SDRAM addresses hard-coded in this example


Write some tests so we know how to proceed -- Test 3

Most probable way to use DMA – Store video arrays in SLOW external memory
Move to FAST internal memory for processing, put result back into external memory

SDRAM addresses hard-coded in this example

WAIL -- Can use compiler section (“SDRAM”) syntax


Some results (code details later)

Cycles measured for each transfer:

Transfer                                 Compiler Debug Mode   Compiler Release Mode
L1 -> L1 (internal memory)               8748                  625
L1 -> L1 DMA                             6579                  6477 (DMA slower)
SDRAM -> SDRAM (external)                39132                 28200
SDRAM -> SDRAM DMA                       12175                 12090
SDRAM -> L1 DMA                          5265                  4836
SDRAM -> L1 DMA, then L1 -> SDRAM DMA    9792                  9276
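
A quick way to compare the rows is to divide the processor-copy cycle count by the DMA cycle count for the same memories. The helper below is a hypothetical convenience for that arithmetic and uses only the measured values listed in the results.

```c
/* Ratio of processor-copy cycles to DMA cycles for the same transfer.
   A value above 1 means the DMA version took fewer cycles. */
double dma_speedup(unsigned long cpu_cycles, unsigned long dma_cycles) {
    return (double)cpu_cycles / (double)dma_cycles;
}
```

With the measured numbers, SDRAM to SDRAM gives roughly 3.2 in debug mode (39132 / 12175) and roughly 2.3 in release mode (28200 / 12090). L1 to L1 in release mode gives roughly 0.1 (625 / 6477), which is the "DMA slower" case.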


Memory to memory move Debug Code


Review for final

A) What happened here?

B) What happened here?

C) What happened here?

D) Why did this happen?

E) What happened here?

F) Determine loop efficiency in terms of instructions and in terms of cycles / read_write op


Answer questions

A B C D E


Review for final
Internal memory to internal memory

F) Determine loop efficiency in terms of cycles / read_write op

internal memory -> internal memory

size was 300
Useful reads 300
Useful writes 300

Cycles 8748 as measured

8748 / 600 = 14.58

Why not an exact number?
Instructions in loop? 19
Total # of reads / writes 9 / loop
2700 reads / writes – around 3 cycles each
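
The two figures on this slide come from dividing the same cycle count by two different operation counts. The sketch below reproduces that arithmetic. The struct LoopCost and the function loop_cost are hypothetical names, and it assumes one useful read and one useful write per array element, as the slide does.

```c
/* Two views of the same measurement. */
typedef struct {
    double per_useful_op;      /* cycles per useful read or write */
    double per_memory_access;  /* cycles per memory access actually made */
} LoopCost;

/* size: number of array elements moved.
   accesses_per_iteration: reads plus writes executed per loop pass. */
LoopCost loop_cost(unsigned long cycles, unsigned long size,
                   unsigned long accesses_per_iteration) {
    LoopCost c;
    c.per_useful_op = (double)cycles / (double)(2 * size);
    c.per_memory_access = (double)cycles / (double)(accesses_per_iteration * size);
    return c;
}
```

For loop_cost(8748, 300, 9) the first field is 14.58 cycles per useful operation and the second is 3.24 cycles per access, which is the "around 3 cycles" on the slide.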


Review for final
SDRAM to SDRAM

F) Determine loop efficiency in terms of cycles / read_write op

SDRAM external -> SDRAM memory
Useful reads / writes 300 each

Cycles 39132 as measured
39132 / 600 = 65.22

Why not an exact number?
Instructions in loop? 19
Total # of reads / writes 9 / loop
7 * 300 reads / writes internal
2 * 300 reads / writes external

Time r/w external = 39132 – 2100 * 3, which is about 33000 cycles
33000 / 600 = 55 cycles per external read / write
Roughly a factor of 18 slower than the 3 cycles for an internal read / write
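
The estimate for an external access can be checked by subtracting the internal accesses at their assumed cost and dividing what is left by the number of external accesses. The function name external_access_cycles is hypothetical, and the 3-cycle internal cost is the rounded figure from the internal-memory measurement, so the result is only an estimate.

```c
/* Estimated cycles per external access, after removing the cycles
   attributed to internal accesses at an assumed fixed cost. */
double external_access_cycles(unsigned long total_cycles,
                              unsigned long internal_accesses,
                              double internal_cost,
                              unsigned long external_accesses) {
    double remaining = (double)total_cycles - (double)internal_accesses * internal_cost;
    return remaining / (double)external_accesses;
}
```

With 39132 total cycles, 2100 internal accesses at 3 cycles each, and 600 external accesses, this evaluates to 54.72, so an external read or write costs on the order of 55 cycles here.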


Memory to memory move Release Mode


Review for final

A) What happened here?

B) What happened here?

C) What happened here?

D) Why did this happen inside loop?

E) What happened here?

F) Determine loop efficiency in terms of instructions and in terms of cycles / read_write op


Answer questions

A B C D E


Release mode
internal to internal

F) Determine loop efficiency in terms of cycles / read_write op

internal memory -> internal memory

size was 300
Useful reads 300
Useful writes 300

Cycles 625 as measured

625 / 600 = 1.04

Why not an exact number?
Instructions in loop? 4
300 * 4 = 1200

WE WOULD EXPECT 1200 cycles!!!!
Where did the difference go?


Release mode
external to external

F) Determine loop efficiency in terms of cycles / read_write op

SDRAM -> SDRAM

size was 300
Useful reads 300
Useful writes 300

Cycles 28200 as measured

28200 / 600 = 47

SDRAM access 47 cycles
L1 memory 1 cycle

Would make sense to process in L1 memory – so move SDRAM to L1 to process


External to internal

F) Determine loop efficiency in terms of cycles / read_write op

SDRAM -> internal memory

size was 300
Useful reads 300
Useful writes 300

Cycles 4836 as measured
300 of those are L1 writes, leaving about 4500

4500 / 300 = 15

SDRAM read before: 47 cycles
SDRAM read now: 15 cycles
L1 -> L1: 1 cycle

Would make sense to process in L1 memory – so move SDRAM to L1 to process

Loads of overhead in SDRAM to SDRAM


Tackled today

Review of handling external arrays (global arrays) from assembly code
– Arrays declared in another file
– Arrays declared in this file -- NEW
– Needed for arrays used by ISRs

Arrays declared on the stack
– Pointers passed as parameters to a subroutine
– Can’t use arrays on the stack when used by ISR


Information taken from Analog Devices On-line Manuals with permission http://www.analog.com/processors/resources/technicalLibrary/manuals/

Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices. Copyright Analog Devices, Inc. All rights reserved.