Post on 18-Dec-2015
Specialized Video (8-bit) and Vector (16-bit) Instructions on the Blackfin
There is always a “MAKE-UP-YOUR-QUESTION-AND-ANSWER-IT” Question on a Dr. Smith Final.
Must be at an appropriate level for a third year course.
Expand on these ideas for Q9 question and answer on the final
04/18/23 Video, Copyright M. Smith, ECE, University of Calgary, Canada
Problem to solve
Using the video capability on Blackfin
Specialized instructions
Examine and explain examples in detail (working program) for Q9 format for final.
Take something we have done in a laboratory and vectorize it for example
Vectorize – take a 32-bit set of data operations and demonstrate the same concept with 8-bit data – but doing 4 operations at the same time.
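To make the "vectorize" idea concrete, here is a minimal sketch in plain C (a model, not Blackfin code). It contrasts one 32-bit add with four independent 8-bit adds on the same packed word. The names `lane`, `add_scalar` and `add_4x8` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extract byte lane i (0 = least significant) from a packed 32-bit word. */
static uint8_t lane(uint32_t w, unsigned i) {
    assert(i < 4);
    return (uint8_t)(w >> (8 * i));
}

/* Scalar version: one 32-bit add. Carries ripple across byte boundaries. */
static uint32_t add_scalar(uint32_t a, uint32_t b) {
    return a + b;
}

/* "Vectorized" version: four independent 8-bit adds on packed data.
   Each lane wraps modulo 256; no carry crosses into the next lane. */
static uint32_t add_4x8(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t s = (uint8_t)(lane(a, i) + lane(b, i));
        r |= (uint32_t)s << (8 * i);
    }
    return r;
}
```

For example, `add_4x8(0x000000FF, 0x00000001)` gives `0x00000000` because the low lane wraps, while `add_scalar` on the same inputs gives `0x00000100`.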
Blackfin Evaluation Board
8-bit values
Getting video information in
DMA activity
Values coming in using PPI (Parallel Port Interface), which shares many of its pins with PF
Stored video information in SDRAM
BF561 -- 2 core Blackfin, handles both video in and out at the same time
Possible Q9 question -- one whole chapter in Hardware book on this area
Video library for Blackfin developed in 2004 by Swiss International Internship students
www.enel.ucalgary.ca/People/Smith/ECE-ADI-Project/CourseInfo/VideoCourseInfoFrame.htm
Special 8-bit ALUs
Video image
Blanking information
Frame 1 - luminance + colour information
Blanking information
Frame 2 - luminance + colour information
Blanking information
Have ability to manipulate frame information without touching blanking information
Frame information
Pixel 1 uses G1 + CB1 + CR1
Pixel 2 uses G2 + CB1 + CR1
Pixel 3 uses G3 + CB3 + CR3
Pixel 4 uses G4 + CB3 + CR3
CB1 G1 CR1 G2 CB3 G3 CR3 G4 CB5 G5 CR5 G6
Image brightness decreasing
Frame information
R0 = [P0] brings in information on pixel 1 and 2 intensity and colour
One memory access, 2 pixel info
One memory access, 4 pixel intensity information
CB1 G1 CR1 G2 CB3 G3 CR3 G4 CB5 G5 CR5 G6
G1 G2 G3 G4 G5 G6 G7 G8
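As a rough illustration of "one memory access, 2 pixel info", the C sketch below unpacks one 32-bit word holding CB1 G1 CR1 G2 into two pixels that share the same colour bytes. The byte order is an assumption (CB in the lowest byte, as a little-endian load of the stream shown above would give), and `pixel_t` and `unpack_two_pixels` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pixel: intensity G plus the colour pair it shares with its neighbour. */
typedef struct {
    uint8_t g;
    uint8_t cb;
    uint8_t cr;
} pixel_t;

/* Unpack one 32-bit word into two pixels.
   ASSUMED byte order: B0 = CB, B1 = G(first), B2 = CR, B3 = G(second). */
static void unpack_two_pixels(uint32_t word, pixel_t out[2]) {
    assert(out != NULL);
    uint8_t cb = (uint8_t)(word);
    uint8_t g1 = (uint8_t)(word >> 8);
    uint8_t cr = (uint8_t)(word >> 16);
    uint8_t g2 = (uint8_t)(word >> 24);
    out[0].g = g1; out[0].cb = cb; out[0].cr = cr;
    out[1].g = g2; out[1].cb = cb; out[1].cr = cr;  /* same colour bytes */
}
```

Both output pixels carry the same CB and CR values; only the intensity differs, which matches the "Pixel 1 / Pixel 2" sharing described on the previous slide.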
BYTEOP16P (Quad 8-bit ADD)
Four 8-bit ADDs in a single cycle
8 pixel values stored in R1 and R0 -- 2 registers used at the same time
I0 register used to select which 4 pixel values are used in operations, called “byte alignment”
I0 = 0; use all bytes in R0
I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 used to select the 4 pixels used
If I0 = 1, I1 = 1; and
(R4, R6) = byteop16p(R3:2, R1:0); // sum -- 6 registers at the same time
then we get four 16-bit answers -- “my byte notation”
R4.H = R3.B0 + R1.B0; // Bottom byte
R4.L = R2.B3 + R0.B3; // Highest byte
R6.H = R2.B2 + R0.B2; // Next highest byte
R6.L = R2.B1 + R0.B1; // Next highest byte
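The byte alignment and the four sums can be checked with a small C model. This is only a sketch of the behaviour described on this slide, not the hardware definition: it follows the slide's statement that I0 selects bytes from R1:0 and I1 from R3:2, and the names `align_bytes` and `byteop16p_model` are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Select 4 consecutive bytes from the 8-byte pair hi:lo, starting at byte
   `align` (0..3). align = 1 gives: lowest byte of hi, top 3 bytes of lo. */
static uint32_t align_bytes(uint32_t hi, uint32_t lo, unsigned align) {
    assert(align < 4);
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (8 * align));
}

/* Model of (R4, R6) = byteop16p(R3:2, R1:0): four 8-bit adds, each
   producing a 16-bit result, so no sum can overflow. */
static void byteop16p_model(uint32_t r3, uint32_t r2, uint32_t r1, uint32_t r0,
                            unsigned i0, unsigned i1,
                            uint32_t *r4, uint32_t *r6) {
    uint32_t y = align_bytes(r3, r2, i1);  /* as on the slide: I1 for R3:2 */
    uint32_t z = align_bytes(r1, r0, i0);  /* as on the slide: I0 for R1:0 */
    uint32_t s[4];
    for (unsigned i = 0; i < 4; i++)
        s[i] = ((y >> (8 * i)) & 0xFF) + ((z >> (8 * i)) & 0xFF);
    *r4 = (s[3] << 16) | s[2];  /* R4.H, R4.L */
    *r6 = (s[1] << 16) | s[0];  /* R6.H, R6.L */
}
```

With I0 = I1 = 1, R3 = 0x00000010, R2 = 0x20304000, R1 = 0x000000F0 and R0 = 0x01020300, the model returns R4 = 0x01000021 and R6 = 0x00320043. The top sum 0x10 + 0xF0 = 0x100 shows why the answers need 16 bits.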
BYTEOP1P (Quad 8-bit Average)
Four 8-bit ADDs + average in a single cycle
8 pixel values stored in R1 and R0
I0 register used to select which 4 pixel values are used in operations, called “byte alignment”
I0 = 0; use all bytes in R0
I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 used to select the 4 pixels used
If I0 = 1, I1 = 1; and
R4 = byteop1p(R3:2, R1:0); // sum and average
then we get four 8-bit answers, packed into one register
R4.B3 = (R3.B0 + R1.B0) / 2; // Bottom byte
R4.B2 = (R2.B3 + R0.B3) / 2; // Highest byte
R4.B1 = (R2.B2 + R0.B2) / 2; // Next highest byte
R4.B0 = (R2.B1 + R0.B1) / 2; // Next highest byte
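A C sketch of the averaging step, under stated assumptions: each average is truncated, as the "/ 2" notation on the slide suggests, and any rounding behaviour of the real instruction is not modelled. `align_bytes` and `byteop1p_model` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Select 4 consecutive bytes from the pair hi:lo, starting at byte `align`. */
static uint32_t align_bytes(uint32_t hi, uint32_t lo, unsigned align) {
    assert(align < 4);
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (8 * align));
}

/* Model of R4 = byteop1p(R3:2, R1:0): four byte-wise averages, each of
   which fits in 8 bits, packed into one 32-bit result.
   ASSUMPTION: truncating divide by 2; hardware rounding is not modelled. */
static uint32_t byteop1p_model(uint32_t r3, uint32_t r2,
                               uint32_t r1, uint32_t r0,
                               unsigned i0, unsigned i1) {
    uint32_t y = align_bytes(r3, r2, i1);
    uint32_t z = align_bytes(r1, r0, i0);
    uint32_t r4 = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t avg = (((y >> (8 * i)) & 0xFF) + ((z >> (8 * i)) & 0xFF)) / 2;
        r4 |= avg << (8 * i);
    }
    return r4;
}
```

The average of two 8-bit values never exceeds 255, which is why all four answers fit back into a single 32-bit register.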
BYTEOP2P (Quad 8-bit Average Half word)
Six 8-bit ADDs + average in a single cycle
8 pixel values stored in R1 and R0
I0 register used to select which 4 pixel values are used in operations, called “byte alignment”
I0 = 0; use all bytes in R0
I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 used to select the 4 pixels used
If I0 = 1, I1 = 1; and
R4 = byteop2p(R3:2, R1:0); // sum and average
then we get two 8-bit answers, one in each half word
R4.B2 = (R3.B0 + R1.B0 + R2.B3 + R0.B3) / 4; // Highest 2 bytes of each aligned pair
R4.B0 = (R2.B2 + R0.B2 + R2.B1 + R0.B1) / 4; // Lowest 2 bytes of each aligned pair
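The same style of C model for the four-way average: three adds per answer, six in total, with a truncating divide by 4. This is a sketch under the slide's description only; it places the answers in bytes B2 and B0 as listed above and ignores any rounding or placement options of the real instruction. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Select 4 consecutive bytes from the pair hi:lo, starting at byte `align`. */
static uint32_t align_bytes(uint32_t hi, uint32_t lo, unsigned align) {
    assert(align < 4);
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (8 * align));
}

/* Byte i (0 = lowest) of a 32-bit word. */
static uint32_t byte_of(uint32_t w, unsigned i) {
    return (w >> (8 * i)) & 0xFF;
}

/* Model of R4 = byteop2p(R3:2, R1:0): average 4 bytes into each half word.
   ASSUMPTION: truncating divide by 4, results placed in B2 and B0. */
static uint32_t byteop2p_model(uint32_t r3, uint32_t r2,
                               uint32_t r1, uint32_t r0,
                               unsigned i0, unsigned i1) {
    uint32_t y = align_bytes(r3, r2, i1);
    uint32_t z = align_bytes(r1, r0, i0);
    uint32_t hi = (byte_of(y, 3) + byte_of(z, 3) +
                   byte_of(y, 2) + byte_of(z, 2)) / 4;
    uint32_t lo = (byte_of(y, 1) + byte_of(z, 1) +
                   byte_of(y, 0) + byte_of(z, 0)) / 4;
    return (hi << 16) | lo;
}
```

Averaging a 2 x 2 block of pixels this way is the kind of operation used when shrinking an image by a factor of two in each direction.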
BYTEOP16M -- 4 subtracts in 1 cycle (Quad 8-bit SUBTRACT)
8 pixel values stored in R1 and R0
I0 register used to select which 4 pixel values are used in operations, called “byte alignment”
I0 = 0; use all bytes in R0
I0 = 1; lowest byte in R1, top 3 in R0
8 pixel values stored in R3 and R2
I1 used to select the 4 pixels used
If I0 = 1, I1 = 1; and
(R4, R6) = byteop16M(R3:2, R1:0); // MINUS operation
then we get four 16-bit answers -- “my byte notation”
R4.H = R3.B0 - R1.B0; // Bottom byte
R4.L = R2.B3 - R0.B3; // Highest byte
R6.H = R2.B2 - R0.B2; // Next highest byte
R6.L = R2.B1 - R0.B1; // Next highest byte
Useful for video surveillance (differences between frames)
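A C model of the quad subtract, as a hedged sketch of the behaviour on this slide: the differences can be negative, so each one is stored as a 16-bit two's-complement value. `align_bytes` and `byteop16m_model` are invented names.

```c
#include <assert.h>
#include <stdint.h>

/* Select 4 consecutive bytes from the pair hi:lo, starting at byte `align`. */
static uint32_t align_bytes(uint32_t hi, uint32_t lo, unsigned align) {
    assert(align < 4);
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (8 * align));
}

/* Model of (R4, R6) = byteop16m(R3:2, R1:0): four 8-bit subtracts with
   signed 16-bit results (range -255 .. +255). */
static void byteop16m_model(uint32_t r3, uint32_t r2, uint32_t r1, uint32_t r0,
                            unsigned i0, unsigned i1,
                            uint32_t *r4, uint32_t *r6) {
    uint32_t y = align_bytes(r3, r2, i1);
    uint32_t z = align_bytes(r1, r0, i0);
    uint32_t d[4];
    for (unsigned i = 0; i < 4; i++) {
        int32_t diff = (int32_t)((y >> (8 * i)) & 0xFF) -
                       (int32_t)((z >> (8 * i)) & 0xFF);
        d[i] = (uint32_t)diff & 0xFFFF;  /* keep low 16 bits, two's complement */
    }
    *r4 = (d[3] << 16) | d[2];  /* R4.H, R4.L */
    *r6 = (d[1] << 16) | d[0];  /* R6.H, R6.L */
}
```

With the same inputs as the add example, the top difference 0x10 - 0xF0 = -224 appears as 0xFF20 in R4.H, so a static scene (identical frames) gives all zeros and any movement shows up as non-zero halves.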
SAA -- Quad 8-bit Subtract, Absolute and Accumulate
Take the differences between 2 images
R0 = [P0++]; -- 4 pixels -- image 1 -- reading from image 1
R1 = [I1++]; -- 4 pixels -- image 2 -- reading from image 2
R2 = 0; -- sum of differences
Loop N - 1 times -- do a zero-overhead loop over all the images
R2 = SAA(R1, R0) || R0 = [P0++] || R1 = [I1++];
R2 = SAA(R1, R0); -- finish off the operations in the Blackfin pipeline
Now demonstrate adding the 4 bytes together from R2
How to do this efficiently? Sounds like a Q9 question to me
R2 = SAA(R1, R0) || R0 = [P0++] || R1 = [I1++];
We have 4 subtracts done, 4 absolute values done and 8 pixel reads AND two pointer updates -- all in a single cycle -- parallel instructions denoted by ||
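The loop above can be modelled in C to check an answer by hand. This sketch follows the slide's simplified notation rather than the exact hardware syntax: it keeps four per-lane accumulators while looping over the packed pixels, then adds the four lanes together at the end. The name `sum_abs_diff` and the array-based interface are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences between two images stored as packed 8-bit
   pixels, 4 pixels per 32-bit word. Mirrors the SAA idea: subtract,
   take the absolute value, accumulate per lane, then combine the lanes. */
static uint32_t sum_abs_diff(const uint32_t *img1, const uint32_t *img2,
                             size_t nwords) {
    assert(img1 != NULL && img2 != NULL);
    uint32_t acc[4] = {0, 0, 0, 0};           /* one accumulator per lane */
    for (size_t n = 0; n < nwords; n++) {
        for (unsigned i = 0; i < 4; i++) {
            int32_t a = (int32_t)((img1[n] >> (8 * i)) & 0xFF);
            int32_t b = (int32_t)((img2[n] >> (8 * i)) & 0xFF);
            int32_t d = a - b;
            acc[i] += (uint32_t)(d < 0 ? -d : d);
        }
    }
    return acc[0] + acc[1] + acc[2] + acc[3];  /* add the 4 lanes together */
}
```

Two identical images give 0; the larger the result, the more the second image differs from the first.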
Worked Examples
Vector operations
Many of the operations give 16-bit results
Example: Quad 8-bit Add
(R4, R6) = byteop16P(R3:2, R1:0)
Now you want to add the results together
R5 = R4 +|+ R6; R4.H + R6.H with R4.L + R6.L
R5.L = R5.H + R5.L (NS);
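The two-step reduction (a dual 16-bit add, then adding the two halves of the result) can be traced in C. This is a sketch only: it treats the halves as unsigned 16-bit values that wrap on overflow, which is how "(NS)", no saturation, is read here, and the names `half`, `vadd` and `sum_four` are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Half word i of a 32-bit register: i = 1 is .H, i = 0 is .L */
static uint16_t half(uint32_t w, unsigned i) {
    assert(i < 2);
    return (uint16_t)(w >> (16 * i));
}

/* Dual 16-bit add: result.H = a.H + b.H, result.L = a.L + b.L (wrapping). */
static uint32_t vadd(uint32_t a, uint32_t b) {
    uint16_t h = (uint16_t)(half(a, 1) + half(b, 1));
    uint16_t l = (uint16_t)(half(a, 0) + half(b, 0));
    return ((uint32_t)h << 16) | l;
}

/* Add the four 16-bit results held in R4 and R6 into one 16-bit total. */
static uint16_t sum_four(uint32_t r4, uint32_t r6) {
    uint32_t r5 = vadd(r4, r6);                  /* two partial sums   */
    return (uint16_t)(half(r5, 1) + half(r5, 0)); /* fold H and L halves */
}
```

With R4 = 0x01000021 and R6 = 0x00320043 (the four sums 0x100, 0x21, 0x32 and 0x43), the partial result is 0x01320064 and the total is 0x196, that is 406.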
Vector operations
R0 = R1 +|+ R0; R1.H + R0.H, R1.L + R0.L
R0 = R1 +|+ R0 (co); same two sums, but stored in swapped halves: R0.H = R1.L + R0.L, R0.L = R1.H + R0.H
co -- word order “cross over”
Can be +|+, +|-, -|+ or -|-;
R3 = R1 + R0, R4 = R1 - R0; 32-bit add and subtract at same time
Must use same source registers
R3 = R1 +|+ R0, R4 = R1 -|- R0; 2 16-bit adds and 2 16-bit subtracts at same time
R3 = R1 +|+ R0, R4 = R1 -|- R0 (asr); 2 16-bit adds and 2 16-bit subtracts at same time, and then afterwards do an arithmetic shift right (add and average, subtract and average)
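The "add and average, subtract and average" case can be modelled in C as follows. This is a sketch under assumptions: the halves are treated as signed 16-bit values, the sum or difference is formed at full width and then halved with an arithmetic shift right, and saturation is not modelled. `half_s`, `pack`, `halve` and `add_sub_asr` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Signed value of half word i (1 = .H, 0 = .L) of a 32-bit register. */
static int32_t half_s(uint32_t w, unsigned i) {
    assert(i < 2);
    int32_t v = (int32_t)((w >> (16 * i)) & 0xFFFF);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Pack two signed values into .H and .L (low 16 bits of each kept). */
static uint32_t pack(int32_t h, int32_t l) {
    return (((uint32_t)h & 0xFFFF) << 16) | ((uint32_t)l & 0xFFFF);
}

/* Arithmetic shift right by 1, i.e. floor(x / 2), written portably. */
static int32_t halve(int32_t x) {
    return x >= 0 ? x >> 1 : -((-x + 1) >> 1);
}

/* Model of R3 = R1 +|+ R0, R4 = R1 -|- R0 (asr). */
static void add_sub_asr(uint32_t r1, uint32_t r0, uint32_t *r3, uint32_t *r4) {
    int32_t h1 = half_s(r1, 1), l1 = half_s(r1, 0);
    int32_t h0 = half_s(r0, 1), l0 = half_s(r0, 0);
    *r3 = pack(halve(h1 + h0), halve(l1 + l0));  /* two averaged sums        */
    *r4 = pack(halve(h1 - h0), halve(l1 - l0));  /* two averaged differences */
}
```

For R1 = 0x00640014 (100, 20) and R0 = 0x0032001E (50, 30) the model gives R3 = 0x004B0019 (75, 25) and R4 = 0x0019FFFB (25, -5).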
Vector operations
Normal max instruction (32-bit)
R0 = MAX(R1, R2);
R0 is largest of R1 and R2
Ditto MIN(R1, R2);
Vector max
R0 = MAX(R1, R2) (V)
R0.H is largest of R1.H and R2.H
R0.L is largest of R1.L and R2.L
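The difference between the 32-bit MAX and the vector MAX is easy to see in a C model. This is a sketch assuming signed comparisons on both the 32-bit values and the 16-bit halves; `max32`, `vmax`, `half_s` and `pack` are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Signed value of half word i (1 = .H, 0 = .L) of a 32-bit register. */
static int32_t half_s(uint32_t w, unsigned i) {
    assert(i < 2);
    int32_t v = (int32_t)((w >> (16 * i)) & 0xFFFF);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Pack two signed values into .H and .L (low 16 bits of each kept). */
static uint32_t pack(int32_t h, int32_t l) {
    return (((uint32_t)h & 0xFFFF) << 16) | ((uint32_t)l & 0xFFFF);
}

/* Model of R0 = MAX(R1, R2): one signed 32-bit comparison. */
static int32_t max32(int32_t a, int32_t b) {
    return a > b ? a : b;
}

/* Model of R0 = MAX(R1, R2) (V): two independent signed 16-bit comparisons. */
static uint32_t vmax(uint32_t r1, uint32_t r2) {
    int32_t h = max32(half_s(r1, 1), half_s(r2, 1));
    int32_t l = max32(half_s(r1, 0), half_s(r2, 0));
    return pack(h, l);
}
```

For R1 = 0x0005FFFF (5, -1) and R2 = 0x00030002 (3, 2) the vector form picks one half from each source and gives 0x00050002, whereas the 32-bit form simply returns R1.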
VIT_MAX – Compare and Select
R0 = VIT_MAX(R1, R2);
R1 = 0x23000002
R2 = 0x70000001
R2.H and R1.L are largest
R0 = 0x70000002
A0.W = binary 10 indicating R2.H and R1.L were largest
Other neat vector operations
Vector ABS
R2 = ABS R1 (V); absolute values of R1.H and R1.L stored in R2
Vector arithmetic shift
R2 = R1 >>> 3 (V) -- 2 16-bit shifts
Vector multiply and accumulate
Vector Negate -- etc
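A C sketch of the first two operations in this list, vector ABS and the vector arithmetic shift. It assumes signed 16-bit halves; clamping the most negative value to 0x7FFF in `abs_v` is an assumption of this sketch, not something stated on the slide. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed value of half word i (1 = .H, 0 = .L) of a 32-bit register. */
static int32_t half_s(uint32_t w, unsigned i) {
    assert(i < 2);
    int32_t v = (int32_t)((w >> (16 * i)) & 0xFFFF);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Pack two signed values into .H and .L (low 16 bits of each kept). */
static uint32_t pack(int32_t h, int32_t l) {
    return (((uint32_t)h & 0xFFFF) << 16) | ((uint32_t)l & 0xFFFF);
}

/* Absolute value of one half. ASSUMPTION: -32768 is clamped to +32767. */
static int32_t abs16(int32_t x) {
    if (x == -32768) return 32767;
    return x < 0 ? -x : x;
}

/* Model of R2 = ABS R1 (V): absolute value of each half. */
static uint32_t abs_v(uint32_t r1) {
    return pack(abs16(half_s(r1, 1)), abs16(half_s(r1, 0)));
}

/* Arithmetic shift right by n, i.e. floor(x / 2^n), written portably. */
static int32_t asr(int32_t x, unsigned n) {
    assert(n < 16);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Model of R2 = R1 >>> n (V): two independent 16-bit arithmetic shifts. */
static uint32_t asr_v(uint32_t r1, unsigned n) {
    return pack(asr(half_s(r1, 1), n), asr(half_s(r1, 0), n));
}
```

For example, `abs_v(0xFFFB0007)` gives 0x00050007 (5 and 7), and `asr_v(0xFFF00010, 3)` gives 0xFFFE0002 (-2 and 2): the sign of the upper half is preserved and nothing leaks from .H into .L.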
Worked examples
Problem to solve
Using the video capability on Blackfin
Specialized instructions
Examine and explain examples in detail (working program) for Q9 format for final.
Take something we have done in a laboratory and vectorize it for example
Information taken from Analog Devices On-line Manuals with permission http://www.analog.com/processors/resources/technicalLibrary/manuals/
Information furnished by Analog Devices is believed to be accurate and reliable. However, Analog Devices assumes no responsibility for its use or for any infringement of any patent or other rights of any third party which may result from its use. No license is granted by implication or otherwise under any patent or patent right of Analog Devices. Copyright Analog Devices, Inc. All rights reserved.