bengdsp notes
TRANSCRIPT
-
8/13/2019 BEngDSP Notes
1/181
1
U H
BEng
School of Engineering & Technology, University of Hertfordshire
Prof. Talib Alukaidey
Digital Signal Processing
-
Table of Contents
Outline of Digital Signal Processors
Digital vs. Analogue Signal Processing
Why process signals digitally?
What is Digital Signal Processing?
What are Digital Signal Processors?
What are the typical Applications for DSP?
What do you need to produce a Functional DSP Device?
The Efficiency of the Assemblers & the Goodies of the Simulators
High Level Languages and Their Advantages
Binary Notation in DSP's
Features Of ADSP-2100 Base Architecture
ADSP-2100 Family Base Internal Architecture
ALU
MAC
Shifter
Data Address Generator (DAG) Operations
Program Sequencer Operations
ADSP-2100 Family Peripherals
The Base Architecture of Floating-Point DSP Processor
The System Architecture
The Complete Architecture
What is a Real Time Application?
Real Time Operating Systems as an Ideal Environment for Embedded Applications
Compression Techniques and a Compressor and De-Compressor Generator
Performance Measures
Data Flow Bottle-necks & Solutions; Pipeline & Parallel Architectures With Examples
High Performance System Classification Scheme
SIMD Matrix Multiplication & SIMD FFT
How To Design SIMD DSP System From The Off-Shelf Fixed-Point DS Processors?
Multiprocessing With The SHARC
VLIW Compiler and the DSP Super Computer Architecture Goes Hand in Hand
-
Digital vs. Analogue Signal Processing
Filters are typically used to pick out signals of interest from noise by making use of their differing frequency characteristics.
Filters can be built from analogue components or digital components. The following figure shows simple analogue filters:
[Figure: simple filter circuits built from R, C and L with input x(t) and output y(t); their low-pass (LP), band-pass (BP) and high-pass (HP) responses Y(f) against the input spectrum X(f); and the time-domain outputs YLP, YBP and YHP for input data x(t) with a broad range of spectral content.]
-
The following figure shows the components required for digital filters:
[Figure: noisy analogue signal -> S/H (sampling at fs, discrete time) -> A/D (discrete value) -> digital processor (filter processing) -> D/A -> clean analogue signal.]
-
Why process signals digitally?
[Figure: bar chart (scale 0 to 90) comparing analogue and digital processing on Bandwidth, Aging, Temp Drift, Accuracy, Upgrade and Prediction.]
-
What is Digital Signal Processing?
For reasons of simplicity and flexibility associated with the binary nature of the electronics, signals are most conveniently processed digitally. This major area of electronics, information technology and control engineering is known as Digital Signal Processing.
Digital Signal Processing is the use of numerical techniques to extract information from discrete-time, discrete-valued signals.
-
What are Digital Signal Processors?
The rapid advances being made in digital component technology are having profound effects on all aspects of digital systems design. Nowhere are these effects felt more strongly than in the design of high-performance systems for applications such as digital signal processing.
This part of the DSP2 course brings together a wide variety of logical concepts that affect the design of such systems, which acknowledge and take advantage of modern component technology.
The term Digital Signal Processors may be interpreted as:
1- The design of VLSI components intended for use in digital signal processing applications, and
2- The design of digital signal processing systems that utilise VLSI components.
-
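[Figure: FIR filtering of successive samples for speech recognition (SPEECH RECOG.), with each sample interval annotated "< 300" (units lost in extraction). The filter equation is shown below; its summation limits are illegible in the source, which only indicates N terms.]
$$y(n) = \sum_{i} a(i)\, x(n - i)$$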
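The FIR equation on this slide is a weighted sum of the current and past input samples. The following is a minimal Python sketch, assuming the coefficient index runs from 0 to N-1 and that samples before the start of the record are zero (the slide's own summation limits did not survive extraction). The name `fir_filter` and the 2-tap averaging coefficients are illustrative, not taken from the notes.

```python
def fir_filter(coeffs, samples):
    """Return y(n) = sum over i of a(i) * x(n - i) for every n.

    Assumption: x(k) is taken as 0 for k < 0.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for i, a in enumerate(coeffs):
            if n - i >= 0:
                acc += a * samples[n - i]
        out.append(acc)
    return out


# Hypothetical 2-tap averaging filter applied to a short ramp.
smoothed = fir_filter([0.5, 0.5], [2.0, 4.0, 6.0])
print(smoothed)
```

The inner loop is one multiply and one accumulate per tap, which is the operation the MAC unit described later in these notes performs in hardware.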
-
What are the typical Applications for DSP?
Communications: Echo Cancellation, Scrambler-Descrambler, etc.
Radar
Imaging
Speech
Control
Geology
Medical
and more and more
-
SPEECH
Among the applications of DSP to speech are:
. VOCODERS . SYNTHESIS . ANALYSIS . RECOGNITION
One of the largest applications is in voice synthesis:
[Figure: voice synthesis model. Blocks: impulse train generator (set by the pitch period), random number generator, amplitude multiplier (X), and a time-varying digital filter controlled by the digital filter coefficients (vocal tract parameters), producing the speech samples.]
-
CONTROL
Control systems are finding applications for DSP:
. Lead/Lag Compensators . Transducer Linearisation . Large Multivariate Systems
For example: Feedback Control
[Figure: feedback control loop. Blocks: digital command, digital filter, D/A, dynamic system, and an A/D on the feedback path.]
-
COMMUNICATION
Communications applications of DSP include:
. PCM Generation . Tone Detection . Adaptive Echo Cancellers . SSB Generation
For example: SSB via Hilbert Filters
[Figure: x(t) is digitised by an A/D and split into a delay branch and a Hilbert filter branch; the branches are mixed with SIN and COS carriers and combined to give Y(n). Spectrum sketches show X(f) and Y(f).]
-
IMAGING
Image processing applications include:
. Deblurring . Data Compression . Scene Analysis . 3-D Reconstruction
For example: a moving camera blurs a picture and can be modelled as a low-pass filter. Deblurring requires the inverse linear operation.
[Figure: scene -> moving camera (point spread function) -> picture -> inverse 2-D filter -> deblurred output.]
-
MEDICAL
DSP is finding new applications in the medical field:
. Patient Monitoring . Tomography . Blood Flow Velocimeters . EKG Pattern Analysis . X-Ray Enhancement
For example: Micro-Based Monitor
[Figure: a commercial fetal monitor feeds a MUX, S/H and A/D; the samples go to a microprocessor (Micro) connected to a DSP, a display and a data recorder.]
-
What do you need to produce a Functional DSP device?
Answer: HARDWARE & SOFTWARE
Real-time DSP applications require choices in both hardware and software to produce a functional device.
-
[Figure: an APPLICATION is realised through HARDWARE (array processor, microprocessor, DSP chip, special device) and SOFTWARE (high-level language, assembly code, microcode), which ADVANCED CAD TOOLS bring together into a FUNCTIONAL DEVICE.]
-
Design Capture: Draw and Specify
Translator
Analog Devices Design Implementation
CODE GENERATOR
-
Library of DSP Primitive Functions
[Figure: palette of library blocks, each drawn with numbered ports: EQ, EXT_IN (IN), GP (INGP), AEXPAND (AEXP), ACOMPRES (ACOMP), FIR LMSE adaptive filter (AFIR, with inputs Xn and Dn and output Yn), NOISE, DELAY1 (z^-1), DELAY2 (z^-2), DELAYN (z^-n), MULT, DFT, MINUS and AMP.]
-
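Proportional Integral Derivative (PID) Compensation Filter
$$U(t) = K_p\, e(t) + K_d\, \frac{de(t)}{dt} + K_i \int e(t)\, dt$$
[Figure: block diagram of the PID filter built from library primitives. A profile generator source (SOURCE, rate=10000.0, trigger=cp0) and an encoder input (PAR_IN, ENCODER) feed a subtractor (MINUS1) that forms the error; the error passes through gain blocks AMP1 to AMP4 (gains shown: gdn=1.0, gdn=.7, gdn=.5, gdn=1.19), an integrator (INT1) and a differentiator (DFF1, d/dt); summers SUM1 and SUM3 combine the terms and drive the serial output (SER OUT1, des=port2).]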
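The PID law above can be evaluated sample by sample on a processor. The sketch below is one common discretisation, assuming rectangular integration and a backward difference for the derivative; the function name `pid_step`, the state dictionary and the gain values in the usage lines are hypothetical and are not the values used in the figure.

```python
def pid_step(state, error, kp, ki, kd, dt):
    """Advance a discrete PID controller by one sample and return U.

    state holds the running integral and the previous error.
    Assumptions: rectangular integration, backward-difference derivative.
    """
    state["integral"] += error * dt
    derivative = (error - state["prev_error"]) / dt
    state["prev_error"] = error
    return kp * error + kd * derivative + ki * state["integral"]


# Hypothetical gains and a unit error applied for one sample of length 1.0.
state = {"integral": 0.0, "prev_error": 0.0}
u = pid_step(state, error=1.0, kp=2.0, ki=0.5, kd=0.1, dt=1.0)
print(u)
```

Each call costs three multiplies and a few additions, which is why such compensators map well onto a multiply-accumulate architecture.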
-
The Assembler
The Assembler translates source code, written with an algebraic syntax, into object code. Variables, data buffers, and symbolic constants are defined with Assembler directives.
LCNTR=r15, Do end_bfly until LCE;
f8=f1*f6, f14=f11-f14, dm(i2,m0)=f10, f9=pm(i11,m8);
f11=f1*f7, f3=f9+f14, f9=f9-f14, dm(i2,m0)=f13, f7=pm(i8,m8);
f14=f0*f6, f13=f8+f12, f8=dm(i0,m0), pm(i10,m10)=f9;
end_bfly: f12=f0*f7, f13=f8+f12, f10=f8-f12, f6=dm(i0,m0), pm(i10,m10)=f3;
FFT Butterfly Core Example
-
Because of the following characteristics, highly efficient code can be achieved when an assembler is used:
Dedicated Purpose
Assembler is Hardware Slave
Moderate Data Size
Instruction Mnemonics, Address Labels
Simple Arithmetic Operations
High Speed
Moderate Ease of Writing and Development
Moderate Ease of Documentation
-
DSP Processor Development Cycle
[Figure: development flow chart. Stages: START, Define Target Hardware (System Builder), Assemble Module, Link, SIMULATE / EMULATE, PROM Splitter, Burn PROMs, Prototype Test, END, with two "Repeat as necessary" loops. File types shown: .sys, .ach, .dsp, .obj, .cde, .int, .exe. The tools are labelled CROSS-SOFTWARE PROGRAMS.]
-
The Simulator
Performs interactive, instruction-level simulation of the DSP processor code within the hardware configuration
Simulates interrupt and I/O handling
Flags illegal operations
Supports full symbolic assembly and disassembly
Displays the internal operations and status of the processor
Provides an easy-to-use, window-oriented graphical user interface with commands accessed from pull-down menus with a mouse
-
High Level Languages and Their Advantages
High-Level Languages are:
C
C++
DSP/C (Numerical C)
ADA
[Figure: each language is shown with its compiler.]
-
C Compiler and Runtime Library
Complies with the ANSI specification
Incorporates optimizing algorithms to speed up the execution of code
Includes an extensive runtime library with typically 100 standard and DSP-specific functions
Outputs DSP processor assembly language source code
-
DSP/C Compiler
Supports ANSI Standard (X3J11.1) Numerical C as defined by the Numerical C Extensions Group (NCEG)
Accepts C source input containing Numerical C extensions for:
Array Selection
Vector Math Operations
Complex Data Types
Circular Pointers
Variably Dimensioned Arrays
Outputs DSP processor assembly language source code
-
The advantages of DSP HLLs are:
Hardware Transparent (Portability)
High Level Arithmetic Operations (Complex Math) or Use of Library Routines, e.g. sin(), fir(), fft()
Loops, Arrays, Labels, I/O Format
Searching and Sorting
Peripheral Intensive System
Relatively Fast Writing & Development
Ease of Documentation
-
Binary - Hexadecimal - Decimal Number Conversion Table
Decimal  Hexadecimal  Binary
0        0            0000
1        1            0001
2        2            0010
3        3            0011
4        4            0100
5        5            0101
6        6            0110
7        7            0111
8        8            1000
9        9            1001
10       A            1010
11       B            1011
12       C            1100
13       D            1101
14       E            1110
15       F            1111
-
Signed / Unsigned
Bit layout: S/U U U U U U U U U U U U U U U U
Unsigned:
0000  0V   - FULL SCALE
FFFF  5V   + FULL SCALE
Signed:
8000  -5V  - FULL SCALE
0000  0V
7FFF  5V   + FULL SCALE
-
2's Complement Representation
For 2's complement representation, the scale factor for the sign bit of a number is -2^(M-1), where M is the number of bits to the left of the binary point. For a 4.2 number, the sign scale is -(2^3) = -8.
Bit weights for a 4.2 number (sign bit first, binary point after the fourth bit):
-(2^3)  2^2  2^1  2^0  .  2^-1  2^-2
Example: 0101.01 = 0 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)
= 5.25
1101.01 = 1 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)
= -2.75
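The two worked examples above can be checked mechanically. The sketch below, with the hypothetical helper name `twos_complement_value`, reads a bit string (sign bit first), subtracts 2^(total bits) when the sign bit is set, and then scales by the number of fractional bits; this is arithmetically the same as giving the sign bit the weight -2^(M-1).

```python
def twos_complement_value(bits, frac_bits):
    """Interpret a bit string (sign bit first) as a two's complement M.N number."""
    raw = int(bits, 2)
    if bits[0] == "1":
        raw -= 1 << len(bits)      # sign bit set: wrap into the negative range
    return raw / (1 << frac_bits)  # place the binary point


print(twos_complement_value("010101", 2))
print(twos_complement_value("110101", 2))
```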
-
Fractional versus Integer Notation
Fractional format is 1.15 notation (radix point after the sign bit):
S . F F F F F F F F F F F F F F F
Integer format is 16.0 notation (radix point after the least significant bit):
S I I I I I I I I I I I I I I I .
-
DSP is optimized for fractional notation
DSP supports integer notation
-
Ranges for 16 bit Formats
Format  Largest Positive Value (0x7FFF)  Largest Negative Value (0x8000)  Value of 1 LSB (0x0001)
        In Decimal                       In Decimal                       In Decimal
1.15    0.999969482421875                -1.0                             0.000030517578125    (Fractional)
2.14    1.999938964843750                -2.0                             0.000061035156250
3.13    3.999877929687500                -4.0                             0.000122070312500
4.12    7.999755859375000                -8.0                             0.000244140625000
5.11    15.999511718750000               -16.0                            0.000488281250000
6.10    31.999023437500000               -32.0                            0.000976562500000
7.9     63.998046875000000               -64.0                            0.001953125000000
8.8     127.996093750000000              -128.0                           0.003906250000000
9.7     255.992187500000000              -256.0                           0.007812500000000
10.6    511.984375000000000              -512.0                           0.015625000000000
11.5    1023.968750000000000             -1024.0                          0.031250000000000
12.4    2047.937500000000000             -2048.0                          0.062500000000000
13.3    4095.875000000000000             -4096.0                          0.125000000000000
14.2    8191.750000000000000             -8192.0                          0.250000000000000
15.1    16383.500000000000000            -16384.0                         0.500000000000000
16.0    32767.000000000000000            -32768.0                         1.000000000000000    (Integer)
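Every row of the range table follows from one rule: with N bits to the right of the binary point, one LSB is worth 2^-N, the largest positive word 0x7FFF is 32767 LSBs and the most negative word 0x8000 is -32768 LSBs. A short sketch, using the hypothetical helper name `format_range`:

```python
def format_range(frac_bits, total_bits=16):
    """Return (largest positive, largest negative, LSB value) for an M.N format."""
    lsb = 2.0 ** -frac_bits
    largest_positive = ((1 << (total_bits - 1)) - 1) * lsb
    largest_negative = -(1 << (total_bits - 1)) * lsb
    return largest_positive, largest_negative, lsb


# Print the rows from 1.15 (15 fractional bits) down to 16.0 (none).
for n in range(15, -1, -1):
    print(f"{16 - n}.{n}", format_range(n))
```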
-
Format Example
A signal spanning -5 V to +5 V maps onto the 16-bit range 0x8000 to 0x7FFF (0 V = 0x0000). The same five words read in 16.0 and 1.15 format:
     Hex     16.0 value  1.15 value         Voltage
1)   0x7FFF  32767       0.999969482...     5 V
2)   0x3FFF  16383       0.499969482...     2.5 V
3)   0x0000  0           0.0000000...       0 V
4)   0xCCCD  -13107      -0.399993896...    -2.0 V
5)   0x8000  -32768      -1.0000000...      -5.0 V
-
There are two methods for converting hexadecimal numbers to decimal numbers. One is easy and one is hard.
HARD WAY: Convert the hexadecimal number to binary. Place the binary point. Multiply each bit of the binary number by its associated scale factor.
Example: Convert 0x2A00 to a 1.15 twos-complement decimal value
0x2A00 = 0.010 1010 0000 0000
= 2^-2 + 2^-4 + 2^-6
= 0.25 + 0.0625 + 0.015625
= 0.328125 (approximately 0.33, or 1/3)
EASY WAY: Use a calculator to convert the hexadecimal number to decimal. Divide the decimal number by 2^N, where N is the number of bits to the right of the binary point.
Example: Convert 0x2A00 to a 1.15 twos-complement decimal value
0x2A00 = 10752; 10752 / 2^15 = 10752 / 32768 = 0.328125
Hexadecimal to Decimal Conversion
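The "easy way" is a two-step computation: reinterpret the 16-bit word as a signed integer, then divide by 2^N. A minimal sketch with the hypothetical helper name `q_to_float`; the default of 15 fractional bits corresponds to 1.15 format.

```python
def q_to_float(word, frac_bits=15, total_bits=16):
    """Convert a two's complement word in M.N format to its decimal value."""
    if word & (1 << (total_bits - 1)):   # sign bit set
        word -= 1 << total_bits
    return word / (1 << frac_bits)


print(q_to_float(0x2A00))               # the slide's example word, 1.15 format
print(q_to_float(0x4000, frac_bits=0))  # the same helper for 16.0 format
```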
-
There are two methods for converting decimal numbers to hexadecimal numbers. One is easy, and one is hard.
HARD WAY: Break the decimal number into its 2^N components.
Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format
Bit weight:  2^0  2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7
Value:       1    1/2   1/4   1/8   1/16  1/32  1/64  1/128
0.8125 =>    0    1     1     0     1     0     0     0      => 0x6800
EASY WAY: Multiply the decimal number by 2^N, where N is the number of bits to the right of the binary point. Then use a calculator to convert to hex.
Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format
0.8125 * 2^15 = 0.8125 * 32768 = 26624 = 0x6800
Decimal to Hexadecimal Conversion
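The reverse "easy way" scales by 2^N and then wraps negative results into 16 bits. The sketch below uses the hypothetical helper name `float_to_q`; rounding to the nearest integer and raising an error for out-of-range inputs are design choices of this sketch rather than something the notes specify.

```python
def float_to_q(value, frac_bits=15, total_bits=16):
    """Convert a decimal value to a two's complement word in M.N format."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    if not lowest <= scaled <= highest:
        raise ValueError("value does not fit in the requested format")
    return scaled & ((1 << total_bits) - 1)  # wrap negatives into total_bits


print(hex(float_to_q(0.8125)))
print(hex(float_to_q(-0.875)))
```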
-
Binary Notation Mini-Quiz
1) What is 0x4000 (1.15 format) in signed decimal notation?
2) What is 0x4000 (16.0 format) in signed decimal notation?
3) What is 0x4000 (0.16 format) in unsigned decimal notation?
4) What is .875 in hex 1.15 Format?
5) What is -.875 in hex 1.15 Format?
-
Binary Notation Mini-Quiz Answer
1) What is 0x4000 in 1.15 signed notation? 0.5
2) What is 0x4000 in 16.0 signed notation? 16384
3) What is 0x4000 in 0.16 unsigned notation? 0.25
4) What is .875 in 1.15 Format? 0x7000
5) What is -.875 in 1.15 Format? 0x9000
-
Features Of ADSP-2100 Base Architecture
Modified Harvard Architecture
2 Data Address Generators
Advanced Program Sequencer
3 Arithmetic Units (ALU/MAC/Shifter)
Result Bus
-
ADSP-2100 Family Base Internal Architecture
[Figure: block diagram. Data Address Generators #1 and #2 and the Program Sequencer drive the 14-bit PMA and DMA address buses; the Shifter, ALU and MAC, each with input and output registers, connect to the 24-bit PMD and 16-bit DMD data buses and share the 16-bit R bus.]
-
ALU
ALU Block Diagram
[Figure: AX registers (2 x 16) and AY registers (2 x 16) are loaded from the data buses (24-bit PMD, 16-bit DMD) and feed the ALU X and Y inputs through multiplexers; the 16-bit ALU result R goes to the AR register, which drives the R bus, and to the AF feedback register. Status outputs: AZ, AN, AC, AV, AS, AQ; carry input: CI.]
-
ALU Features
4 Input Registers ( AX0, AX1, AY0, AY1 )
Feedback Paths ( AF, AR, MR0, MR1, MR2, SR0, SR1 )
Six Status Flags
Saturation
Provisions For Double Precision
Background Registers
-
ALU Instruction Examples
(Programmer's Quick Reference pgs 4-5)
AR = AX0 + AY0;
AF = MR1 XOR AY1;
AR = AX0 + AF;
IF GE AR = -AR;
IF AV AR = AY1 + 1;
-
ALU Instructions
[IF Condition] dest = xop + yop ;
[IF Condition] dest = xop + C ;
[IF Condition] dest = xop + yop + C ;
[IF Condition] dest = xop - yop ;
[IF Condition] dest = xop - yop + C - 1 ;
[IF Condition] dest = yop - xop ;
[IF Condition] dest = yop - xop + C - 1;
[IF Condition] dest = xop AND yop;
[IF Condition] dest = xop OR yop;
[IF Condition] dest = xop XOR yop;
[IF Condition] dest = PASS xop ;
[IF Condition] dest = PASS yop ;
[IF Condition] dest = PASS 0;
[IF Condition] dest = PASS 1;
-
ALU Instructions
[IF Condition] dest = - xop ;
[IF Condition] dest = - yop ;
[IF Condition] dest = NOT xop ;
[IF Condition] dest = NOT yop ;
[IF Condition] dest = ABS xop ;
[IF Condition] dest = yop +/-1 ;
DIVS yop , xop ;
DIVQ xop ;
XOP = [AR, MR0, MR1, MR2, SR0, SR1, AX0, AX1]
YOP = [AY0, AY1, AF]
dest = [AR, AF]
Examples: AR = AX0 + AY0;
AF = NOT AR;
AF = AX1 + AY0 + C;
-
ALU Status Flags
Flag  Name      Definition
AZ    Zero      Logical NOR of all bits in ALU result register. True if ALU output equals 0
AN    Negative  Sign bit of ALU result. True if ALU output negative
AV    Overflow  X-OR of carry outputs of 2 most significant adder stages. True if ALU overflows
AC    Carry     Carry output from most significant adder stage
AS    Sign      Sign of ALU X input port. Affected only by ABS instruction
AQ    Quotient  Quotient bit generated only by DIVS and DIVQ
-
Arithmetic Conditions
EQ: ALU Result = 0
NE: ALU Result ≠ 0
GT: ALU Result > 0
GE: ALU Result ≥ 0
LT: ALU Result < 0
LE: ALU Result ≤ 0
NEG: XOP Input Negative (Absolute Value Instruction Only)
POS: XOP Input Positive (Absolute Value Instruction Only)
AV: ALU Overflow Bit Set
Not AV: ALU Overflow Bit Not Set
AC: ALU Carry Bit Set
Not AC: ALU Carry Bit Not Set
MV: MAC Overflow Bit Set
Not MV: MAC Overflow Bit Not Set
Not CE: Not Counter Expired
-
ALU Saturation
Sets ALU result to full-scale positive or full-scale negative if overflow or underflow occurs
Feature enabled by executing ena ar_sat (bit 3 of MSTAT)
Once enabled, affects every ALU operation
Only affects results sent to AR (AF - flags still get set)
Overflow or underflow determined by the following conditions:
Overflow (AV)  Carry (AC)  AR Contents
0              0           ALU Output
0              1           ALU Output
1              0           0x7FFF (full-scale positive)
1              1           0x8000 (full-scale negative)
ALU Overflow Latch Mode
Causes AV status flag to become sticky. Need to explicitly clear.
Feature enabled by executing ena av_latch (bit 2 of MSTAT)
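The saturation table can be exercised with a small model of a 16-bit add. The sketch below is an illustration of the table only, not a model of the real ALU: the function name `alu_add` is hypothetical, and AV is computed with the usual sign-comparison test, which for addition is equivalent to the carry XOR described in the status flag table.

```python
def alu_add(x, y, saturate=False):
    """Add two 16-bit words and return (AR, AV, AC)."""
    total = (x & 0xFFFF) + (y & 0xFFFF)
    result = total & 0xFFFF
    ac = total >> 16  # carry out of the most significant bit
    # Signed overflow: both operands have a sign that the result does not share.
    av = ((x ^ result) & (y ^ result) & 0x8000) >> 15
    if saturate and av:
        # Table rows with AV = 1: AC = 0 gives 0x7FFF, AC = 1 gives 0x8000.
        result = 0x8000 if ac else 0x7FFF
    return result, av, ac


print(alu_add(0x7000, 0x7000))                 # overflow, saturation disabled
print(alu_add(0x7000, 0x7000, saturate=True))  # clamps to full-scale positive
print(alu_add(0x8000, 0x8000, saturate=True))  # clamps to full-scale negative
```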
-
ALU Mini-Quiz
Write the ADSP-2100 code to perform the following operations:
1) Add 0x0030 to 0x0070 and store the result in AF.
Hint:
____ = 0x0070 ;
____ = 0x0030 ;
AF = ____ + ____ ;
2) Find the logical AND of 0x1234 and 0xF00F. Store the result in AR.
-
ALU Mini-Quiz
Write the ADSP-2100 code to perform the following operations:
1) Add 0x0030 to 0x0070 and store the result in AF.
Hint:
AX0 (or AX1) = 0x0070 ;
AY0 (or AY1) = 0x0030 ;
AF = AX0 + AY0 ;
2) Find the logical AND of 0x1234 and 0xF00F. Store the result in AR.
AY1 = 0x1234;
AR = 0xF00F;
AR = AR AND AY1;
-
MAC
MAC Block Diagram
[Figure: MX registers (2 x 16) and MY registers (2 x 16) feed the multiplier X and Y inputs through multiplexers; the 32-bit product P enters a 40-bit add/subtract unit whose result is held in MR2 (8 bits), MR1 (16 bits) and MR0 (16 bits), with overflow flag MV; the MF register provides a feedback path. Connections are shown to the 24-bit PMD bus, the 16-bit DMD bus and the R bus.]
-
MAC Features
40 Bit Accumulator
Saturation
Complete Set of Background Registers
Mixed Mode Input Operands for Multiprecision
Feedback Paths
Access to R-Bus, DM and PM
-
MAC Instruction Examples
(Programmer's Quick Reference pgs 6-7)
MR = MX1 * MY0 (SS);
MF = AR * MY1 (SS);
MR = MR + AR * MY1 (SS);
MR = 0;
IF MV SAT MR;
IF EQ MR = MX0 * MY0 (UU);
-
MAC Instructions
[IF condition] dest = xop * yop (format);
[IF condition] dest = MR + xop * yop (format);
[IF condition] dest = MR - xop * yop (format);
[IF condition] dest = 0;
[IF condition] dest = MR [ (RND)];
Where:
condition = arithmetic conditions
dest = {MR, MF}
format = {SS, US, SU, UU, RND}
XOP = {MX0, MX1, MR2, MR1, MR0, AR, SR0, SR1}
YOP = {MY0, MY1, MF}
-
Placement of Binary Point in Multiplication
Binary Integer Multiplication
M Bits x P Bits = M+P Bits
Example: 16.0 x 16.0 => 32.0
Mixed/Fractional Multiplication
M.N Bits x P.Q Bits = (M+P).(N+Q) Bits
Example: 1.15 x 1.15 => 2.30
4.12 x 1.15 => 5.27
-
Multiplication Modes on the ADSP-21xx
Mode 1: Fractional Mode
Multiplier assumes all numbers are in 1.15 format
Multiplier automatically left-shifts the product by 1 bit before accumulation (result forced to 1.31 format)
Example: MR = MX0 * MY1 (SS);
MX0 = 0x4000, MY1 = 0x4000
MR2 MR1 MR0 = 0x00 2000 0000
MR1 = 0x2000 (MR2 holds the overflow bits, MR0 the underflow bits)
-
Multiplication Modes on the ADSP-21xx
Mode 2: Integer Mode
Multiplier assumes all numbers are in 16.0 format
No automatic left shift necessary
Example: MR = MX0 * MY1 (SS);
MX0 = 0x4000, MY1 = 0x4000
MR2 MR1 MR0 = 0x00 1000 0000
MR0 = 0x0000 (MR2 and MR1 hold the overflow bits)
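The only difference between the two modes is the one-bit left shift applied to the signed product. A minimal sketch for signed (SS) operands, with hypothetical helper names `to_signed16` and `mac_multiply`; the result is returned as the 40-bit MR value.

```python
MASK40 = (1 << 40) - 1


def to_signed16(word):
    """Reinterpret a 16-bit word as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def mac_multiply(x, y, fractional=True):
    """Signed 16 x 16 multiply, returned as a 40-bit MR value."""
    product = to_signed16(x) * to_signed16(y)
    if fractional:
        product <<= 1  # fractional mode: 2.30 product becomes 1.31
    return product & MASK40


print(f"{mac_multiply(0x4000, 0x4000, fractional=True):010X}")
print(f"{mac_multiply(0x4000, 0x4000, fractional=False):010X}")
```

In fractional mode 0.5 x 0.5 lands in MR1 as 0x2000 (0.25 in 1.15 format), which is why the shift is useful: the upper word can be read directly as a 1.15 result.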
-
Multiplication on the ADSP-21xx
To switch modes:
ENA M_MODE; {Select Integer Mode} *
DIS M_MODE; {Select Fractional Mode}
MSTAT register holds the mode value
Fractional mode is the default on reset/power-up
* Integer mode not available on ADSP-2100A
-
Rounding in the MAC
Rounding can be specified as part of the multiply instruction (RND)
Rounding only applies to fixed-point fractional results
40-bit results are "rounded to nearest" 16-bit value
Rounded result can be placed in MR1 or MF register
Input: MX0 = 0x7FF9, MY0 = 0xEEEE
Command                  MR2  MR1   MR0
MR = MX0 * MY0 (SS);     FF   EEEE  EEFC
MR = MX0 * MY0 (RND);    FF   EEEF  6EFC
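The two table rows can be reproduced by adding 0x8000 to the 40-bit fractional product, so that a carry out of MR0 rounds MR1 to the nearest value. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and any special handling of the exact tie case (MR0 equal to 0x8000) is not modelled.

```python
MASK40 = (1 << 40) - 1


def to_signed16(word):
    """Reinterpret a 16-bit word as a signed integer."""
    return word - 0x10000 if word & 0x8000 else word


def multiply_fractional(x, y, rnd=False):
    """Fractional-mode signed multiply, optionally rounded at bit 15."""
    mr = (to_signed16(x) * to_signed16(y)) << 1
    if rnd:
        mr += 0x8000  # half an MR1 LSB; the tie case is not modelled here
    return mr & MASK40


def split_mr(mr):
    """Split a 40-bit MR value into (MR2, MR1, MR0)."""
    return (mr >> 32) & 0xFF, (mr >> 16) & 0xFFFF, mr & 0xFFFF


print([hex(part) for part in split_mr(multiply_fractional(0x7FF9, 0xEEEE))])
print([hex(part) for part in split_mr(multiply_fractional(0x7FF9, 0xEEEE, rnd=True))])
```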
-
Saturation and Overflow
Overflow occurs when the sign bit is corrupted during accumulation
Overflow status signal (MV) is updated every time a MAC operation is executed
MV is set when MSB of MR2 does not equal MSB of MR1
Saturation is performed by the following instruction:
IF MV SAT MR;
Input: MX0 = 0x7FFF, MY0 = 0x7FFF, MR = 00 7FFE 0002
Command                      MR2  MR1   MR0
MR = MR + MX0 * MY0 (SS);    00   FFFC  0004
IF MV SAT MR;                00   7FFF  FFFF
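The accumulate-then-saturate example can be traced with a few lines of code. This sketch uses the MV rule exactly as stated on the slide (MSB of MR2 compared with MSB of MR1). The function names are hypothetical, and the negative saturation value FF 8000 0000 is an assumption by symmetry with the positive case shown in the table.

```python
MASK40 = (1 << 40) - 1


def mv_flag(mr):
    """MV as stated on the slide: MSB of MR2 differs from MSB of MR1."""
    return ((mr >> 39) & 1) != ((mr >> 31) & 1)


def sat_mr(mr):
    """Model of IF MV SAT MR: clamp MR to the largest 32-bit signed value."""
    if not mv_flag(mr):
        return mr & MASK40
    # Assumption: the sign held in MR2 selects the direction of the clamp.
    return 0xFF80000000 if (mr >> 39) & 1 else 0x007FFFFFFF


mr = 0x007FFE0002                      # initial MR from the slide
mr = (mr + ((0x7FFF * 0x7FFF) << 1)) & MASK40  # MR = MR + MX0 * MY0 (SS), fractional mode
print(f"{mr:010X}", mv_flag(mr))
print(f"{sat_mr(mr):010X}")
```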
-
MAC Mini-Quiz
Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply the result by 0x20.
AX0 = 0x0020;
AY0 = 0x0010;
AR = _______________
___ = _______________
____=_______ * _________
-
Binary Multiply Mini-Quiz
What is the ADSP-21xx multiplier output? (Hint: the output is 32 bits wide)
Operation          Fractional Mode  Integer Mode
0x1240 * 0x0001
0x4000 * 0x4000
0x4000 * 0x0002
-
MAC Mini-Quiz
Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply the result by 0x20.
AX0 = 0x0020;
AY0 = 0x0010;
AR = AX0 + AY0;
MY0 = 0x20;
MR = AR * MY0 (SS);
-
Binary Multiply Mini-Quiz
What is the ADSP-21xx multiplier output? (Hint: the output is 32 bits wide)
Operation          Fractional Mode  Integer Mode
0x1240 * 0x0001    0x0000 2480      0x0000 1240
0x4000 * 0x4000    0x2000 0000      0x1000 0000
0x4000 * 0x0002    0x0001 0000      0x0000 8000
-
Shifter
Shifter Block Diagram
[Figure: the SI register (16 bits, loaded from the 16-bit DMD bus) or a value from the R bus feeds the shifter array, whose 32-bit output passes through OR/PASS logic into the SR1 and SR0 registers (16 bits each). Shift control comes from the SE register (8 bits, with NEGATE) or from the instruction; an exponent detector and block exponent logic support normalisation. Multiplexers select between the sources.]
-
Shifter Features
16-Bit Input Value Can Be Placed Anywhere in a 32-Bit Output Field
All Shift Instructions Execute in a Single Instruction Cycle
Shift Value Specified Immediately within the Instruction or Indirectly in the SE Register
Normalize, Denormalize, and Exponent Detect Instructions Used For Block Floating Point and Floating Point Operations
-
Shifter Instruction Examples
(Programmer's Quick Reference pgs 8-9)
SR = ASHIFT SI BY -3 (LO);
SR = LSHIFT AR BY 6 (HI);
SR = SR OR LSHIFT SR1 (LO);
-
Shifter Instructions
Shift Immediate Instructions
SR = [SR OR] ASHIFT xop BY data (alignment);
SR = [SR OR] LSHIFT xop BY data (alignment);
Shift By Value in SE Register
[IF condition] SR = [SR OR] ASHIFT xop (alignment);
[IF condition] SR = [SR OR] LSHIFT xop (alignment);
Where:
condition = Arithmetic Condition
xop = {SI, SR0, SR1, MR2, MR1, MR0, AR}
alignment = {HI, LO}
data = -32 ... 32
Arithmetic Shift Sign Extends Right Shifts
Logical Shift Zero Fills Right Shifts
Left Shifts Are Always Zero Filled
Positive SE or data Values Shift Left
Negative SE or data Values Shift Right
NO "+" for Positive Shifts
-
Using the Shift Instructions
Placement of output depends on the HI/LO modifier and on the SE register or immediate shift value
Refer to Table 2.4 in the ADSP-21xx User's Manual
Example 1: SR = LSHIFT SI BY -12 (LO);
Register  Before               After
SI        1110 1010 0011 0101  1110 1010 0011 0101
SE        xxxx xxxx            xxxx xxxx
SR1       xxxx xxxx xxxx xxxx  0000 0000 0000 0000
SR0       xxxx xxxx xxxx xxxx  0000 0000 0000 1110
-
Immediate Shift Instructions
Example 2: SR = LSHIFT SI BY -12 (HI);
Register  Before               After
SI        1110 1010 0011 0101  1110 1010 0011 0101
SE        xxxx xxxx            xxxx xxxx
SR1       xxxx xxxx xxxx xxxx  0000 0000 0000 1110
SR0       xxxx xxxx xxxx xxxx  1010 0011 0101 0000
-
Shift Instructions with SE Register
Example 3: SR = LSHIFT SI (HI);
Register  Before               After
SI        1110 1010 0011 0101  1110 1010 0011 0101
SE        1111 0100 (-12)      1111 0100 (-12)
SR1       xxxx xxxx xxxx xxxx  0000 0000 0000 1110
SR0       xxxx xxxx xxxx xxxx  1010 0011 0101 0000
-
Shift Instructions with OR Functionality
Example 4: SR = SR OR LSHIFT SI (HI);
Register  Before               After
SI        1110 1010 0011 0101  1110 1010 0011 0101
SE        1111 0100 (-12)      1111 0100 (-12)
SR1       0000 0000 0000 0000  0000 0000 0000 1110
SR0       0000 0000 0000 0101  1010 0011 0101 0101
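The four examples follow one pattern: place the 16-bit input in the upper (HI) or lower (LO) half of a 32-bit field, shift it, and optionally OR it into the old SR. A minimal sketch of the logical shift only; the function name `lshift` and its argument names are hypothetical, and ASHIFT sign extension is not modelled.

```python
def lshift(value, amount, hi, old_sr=0):
    """Model LSHIFT: return (SR1, SR0) for a 16-bit input.

    hi=True is the (HI) reference, hi=False the (LO) reference.
    amount > 0 shifts left, amount < 0 shifts right (zero filled).
    old_sr is the previous 32-bit SR, used for the SR OR form.
    """
    field = (value & 0xFFFF) << (16 if hi else 0)
    if amount >= 0:
        field = (field << amount) & 0xFFFFFFFF
    else:
        field >>= -amount
    field |= old_sr
    return field >> 16, field & 0xFFFF


print([hex(w) for w in lshift(0xEA35, -12, hi=False)])                 # Example 1
print([hex(w) for w in lshift(0xEA35, -12, hi=True)])                  # Examples 2 and 3
print([hex(w) for w in lshift(0xEA35, -12, hi=True, old_sr=0x0005)])   # Example 4
```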
-
Shifter Mini-Quiz
Write ADSP-2101 Code to:
Write 0x0034 into the AR register
Write 0x0012 into the SI register
Shift AR into the MS bits of SR0 (SR0 = 0x3400)
Shift SI into the LS bits of SR0
Hint: 4 Instructions. SR1 = 0x0000, SR0 = 0x3412 When Done
-
Shifter Mini-Quiz Answers
Solution 1:
AR = 0x0034;
SI = 0x0012;
SR = ASHIFT AR BY 8 (LO);
SR = SR OR ASHIFT SI BY 0 (LO);
Solution 2:
AR = 0x0034;
SI = 0x0012;
SR = LSHIFT AR BY -8 (HI);
SR = SR OR ASHIFT SI BY -16 (HI);
-
Data Address Generator (DAG) Operations
Registered Indirect Addressing
Automatic Post-Modify of Address
Circular Buffering
DAG 1 Fetches/Stores to Data Memory
DAG 2 Fetches/Stores to Data or Program Memory
Bit-Reverser For FFT Support (DAG 1 Only)
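-
Data Address Generator Block Diagram
[Figure: L registers (4 x 14), I registers (4 x 14) and M registers (4 x 14), loaded from the DMD bus and selected by 2-bit fields from the instruction; an adder and modulus logic compute the post-modified address, which is written back to the I register; the 14-bit address output passes through a bit-reverse stage that exists in DAG1 only.]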
-
DAG Features
Data Fetch/Store Executes Simultaneously With Arithmetic Instruction
2 DAGs In Processor
4 Index Address Registers Per DAG
4 Modify Registers Per DAG
4 Length Registers Per DAG
Any Modify Register in a DAG Can Be Used With Any Index Register in the Same DAG
-
Example DAG Instructions
(Programmer's Quick Reference pg 10)
AX0 = DM(0X3800);
AX0 = DM(I0, M3);
MODIFY (I4, M5);
AX1 = DM(I2,M3), AY0 = PM(I4,M7);
MR=MR+MX0 * MY0 (SS), MX0 = DM(I2,M2), MY0 = PM(I6,M6);
Note: L Registers Must Be 0 If Circular Buffers Are Not Used
-
Modulo Addressing Example
I0 = Current Address
M0 = Modify Value (3)
L0 = Buffer Length (8)
Base Address = H#0030 (the buffer occupies H#0030 to H#0037)
Address Sequence: 30, 33, 36, 31, 34, 37, 32, 35
-
Modulo Addressing Code Example
.VAR/DM/CIRC/ABS=0X30 Buff [8];   /* Define Buffer */
I0 = ^Buff;                       /* I0 = Start address of Buff */
L0 = %Buff;                       /* L0 = Length of Buff */
M0 = 3;                           /* Modify value = 3 */
AX0 = DM (I0, M0);                /* Fetch data at address 30 */
AY0 = DM (I0, M0);                /* Fetch data at address 33 */
AX1 = DM (I0, M0);                /* Fetch data at address 36 */
AY1 = DM (I0, M0);                /* Fetch data at address 31 */
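The address sequence in this example is the post-modify update wrapped back into the buffer. The sketch below reproduces it arithmetically; the function name `circular_addresses` is hypothetical, and it assumes the modify value is smaller than the buffer length, as it is here.

```python
def circular_addresses(base, length, modify, count):
    """Return the addresses used by `count` post-modify accesses."""
    addresses = []
    address = base
    for _ in range(count):
        addresses.append(address)
        # Post-modify, wrapping around inside [base, base + length).
        address = base + (address - base + modify) % length
    return addresses


sequence = circular_addresses(base=0x30, length=8, modify=3, count=8)
print([f"{a:02X}" for a in sequence])
```

Because 3 and 8 share no common factor, the eight accesses visit every location of the buffer exactly once before the pattern repeats.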
-
Bit Reversal with the ADSP-2100 Family
Only available with DAG1
Enabled by setting bit 1 of the MSTAT register or by using the instruction ENA BIT_REV
Reverses all 14 bits of the address
Normal order: 13 12 11 10 09 08 07 06 05 04 03 02 01 00
Bit-reversed: 00 01 02 03 04 05 06 07 08 09 10 11 12 13
For an FFT of size 2^N, set the M register to 2*2^(14-N) *
* x2 because FFT output has real and imaginary data interleaved
i.e. 256 FFT = 2^8 FFT, M = 2*2^(14-8) = 2*2^6 = 128
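Both rules on this slide are short enough to check in code: the 14-bit reversal itself and the modify value formula. The function names `bit_reverse14` and `fft_modifier` are hypothetical; the formula is the one given on the slide.

```python
def bit_reverse14(address):
    """Reverse the order of the 14 address bits."""
    reversed_address = 0
    for _ in range(14):
        reversed_address = (reversed_address << 1) | (address & 1)
        address >>= 1
    return reversed_address


def fft_modifier(log2_size):
    """Modify value from the slide: 2 * 2^(14 - N) for an FFT of size 2^N."""
    return 2 * 2 ** (14 - log2_size)


print(hex(bit_reverse14(0x0001)))  # bit 0 moves to bit 13
print(fft_modifier(8))             # 256-point FFT
```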
-
DAGS Mini-Quiz
Data Memory, starting at DM(0x3800): 0x1234, 0x1234, 0x1234, 0x1234, 0x1234
Write the ADSP-2101 instructions to find the sum of the N=5 numbers stored in data memory.
Hint:
Use Multifunction Instructions
Nine Instructions Total
3 Instructions are Repeated
Questions:
1) How many instruction cycles are required?
2) How many instruction cycles are required if N=100?
3) Is this an efficient use of the processor?
DAGS Mini-Quiz Answer
.module/boot = 0 dags_mini_quiz;
.var/dm/circ data_buf [5];
start:  i0 = ^data_buf;                     /*Load DAG Registers */
        l0 = %data_buf;
        m3 = 1;
        ar = dm (i0, m3);                   /*Load first data value */
        ay0 = dm (i0, m3);                  /*Load second data value */
        ar = ar + ay0, ay0 = dm (i0, m3);   /*Add and load third value */
        ar = ar + ay0, ay0 = dm (i0, m3);   /*Add and load fourth value */
        ar = ar + ay0, ay0 = dm (i0, m3);   /*Add and load fifth value */
        ar = ar + ay0;                      /*Last addition */
.endmod;
1) 9 Cycles
2) 104 Cycles
3) No, it would waste program memory
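The two cycle counts follow from counting instructions in the straight-line program. A minimal Python sketch, assuming every instruction executes in one cycle (the function name is illustrative):

```python
def straight_line_cycles(n):
    """Instruction cycles for the unrolled sum of n values (n >= 2).

    3 DAG register loads + 2 plain data loads + (n - 2) add-and-load
    multifunction instructions + 1 final add.
    """
    setup, first_loads, final_add = 3, 2, 1
    return setup + first_loads + (n - 2) + final_add


print(straight_line_cycles(5), straight_line_cycles(100))
```

The count grows as N + 4, and so does the number of program memory locations, which is why the unrolled form is a poor fit for large N.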
Program Sequencer Block Diagram
[Block diagram: INTERRUPT CONTROLLER (IRQ inputs), CONDITION LOGIC, STATUS LOGIC, COUNTER LOGIC (CE), LOOP STACK (4 x 18), LOOP COMPARATOR, PC STACK (16 x 14), PROGRAM COUNTER, INCREMENT, NEXT ADDRESS SOURCE SELECT and NEXT ADDRESS MUX, connected to the 16-bit DMD bus, the 14-bit PMA bus and the instruction register.]
Program Sequencer Operations
Zero Overhead Looping
Conditional/Unconditional Branches
Interrupt Handling
Counter and Status Stacks
Next Instruction Address Generation
Program Sequencer Features
Automatic Operation, Transparent to User
Single Cycle Conditional Branches
4-Deep Loop, Counter Stack
16-Deep PC Stack
Sequencer Instructions
(Programmer's Quick Reference pgs 12)
[IF condition] JUMP address;
[IF condition] CALL address;
[IF condition] RTS;
[IF condition] RTI;
IF flag_condition CALL address;
IF flag_condition JUMP address;
SET / TOGGLE / RESET FLAG_OUT;
Where:
address = Branch Address = {(I4), (I5), (I6), (I7), label}
flag_condition = {FLAG_IN, NOT FLAG_IN}
Program Loop Example
General Form:
DO LABEL UNTIL CONDITION
Example:
CNTR=10;
DO ENDLOOP UNTIL CE;
{ First Loop Instruction } ;           (Address Pushed On PC Stack)
{ Next Loop Instruction } ;
ENDLOOP: { Last Loop Instruction } ;   (Address Pushed On LOOP Stack)
{ First Instruction Outside Loop } ;
Interrupt Handling
Interrupts Can Be Generated By An External Interrupt Signal Or 2100 Family Peripherals (Timer, SPORT, HIP, etc.)
External Interrupts (IRQx) Can Be Level Or Edge Sensitive (ICNTL)
Interrupts Have Priority And Can Be Nested
Interrupts Can Be Masked (IMASK)
Interrupts Can Be Forced Or Cleared Under Software Control (IFC) *
Different Family Members Have Different Interrupt Vector Tables
Interrupt Vector Table Always Begins At PM Address 0x0000
* Except ADSP-2100A
Interrupts & Interrupt Vector Addresses
ADSP-2101
Interrupt Source            Interrupt Vector Address
Program startup at RESET    0x0000
IRQ2                        0x0004 (highest priority)
SPORT0 Transmit             0x0008
SPORT0 Receive              0x000C
SPORT1 Transmit / IRQ1      0x0010
SPORT1 Receive / IRQ0       0x0014
Timer                       0x0018 (lowest priority)
ADSP-2105
Interrupt Source            Interrupt Vector Address
Program startup at RESET    0x0000
IRQ2                        0x0004 (highest priority)
SPORT1 Transmit / IRQ1      0x0010
SPORT1 Receive / IRQ0       0x0014
Timer                       0x0018 (lowest priority)
Sequencer Mini-Quiz
Modify the answer of the DAGS Mini-Quiz to use a zero-overhead loop.
Assume N=100. Your program should require 9 Instruction Locations.
Data Memory at DM(0x3800): 0x1234, 0x1234, 0x1234, ... (100 values)
Write the ADSP-2101 Instructions to Find the Sum of the N=100 Numbers Stored in Data Memory
Sequencer Mini-Quiz Answer
.module/boot = 0 sequencer_mini_quiz;
.const buf_len = 100;
.var/dm/circ/abs=0x3800 data_buf [buf_len];
start:    i0 = ^data_buf;                     /*Load address of data buf */
          l0 = %data_buf;                     /*Load length of data buf */
          m3 = 1;
          cntr = buf_len - 2;                 /*Load counter */
          ar = dm (i0, m3);                   /*Load first data value */
          ay0 = dm (i0, m3);                  /*Load second data value */
          do add_loop until ce;
add_loop: ar = ar + ay0, ay0 = dm (i0, m3);   /*Add and load next value */
          ar = ar + ay0;                      /*Last addition */
          rts;
.endmod;
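The counted loop can be traced with a small Python model. This is a sketch, not ADSP-21xx code: it assumes the loop body is the single add-and-load multifunction instruction from the DAGS quiz, that every instruction takes one cycle, and it leaves out the final rts. The function name is illustrative.

```python
def sum_with_counted_loop(data):
    """Mimic the quiz answer: two plain loads, a counted loop, a final add."""
    reads = iter(data)              # stands in for dm(i0, m3) with m3 = 1
    ar = next(reads)                # ar  = dm(i0, m3)
    ay0 = next(reads)               # ay0 = dm(i0, m3)
    cntr = len(data) - 2            # cntr = buf_len - 2
    cycles = 7                      # 4 register loads + 2 data loads + the DO instruction
    while cntr > 0:                 # do add_loop until ce
        # Multifunction instruction: the add uses the old ay0, then ay0 is reloaded
        ar, ay0 = ar + ay0, next(reads)
        cntr -= 1
        cycles += 1
    ar = ar + ay0                   # last addition
    cycles += 1
    return ar, cycles


total, cycles = sum_with_counted_loop([0x1234] * 100)
print(total, cycles)
```

The loop runs N - 2 times, so the program still needs about N + 6 cycles, but the instruction count no longer grows with N. The model keeps the sum as an unbounded Python integer; the 16-bit AR register on the real part would overflow for this data.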
ADSP-2100 Family Peripherals
Memory Interfacing
Timer
Serial Ports
ADSP-21xx Family Memory Interface
ADSP-2101 Basic System Configuration
[Figure: ADSP-2101 basic system configuration. The ADSP-2101 (CLKIN/XTAL driven by a clock or crystal) shares its 14-bit ADDRESS and 24-bit DATA buses with optional DATA MEMORY & PERIPHERALS, optional PROGRAM MEMORY, and a BOOT EPROM (27C64, 27C128, 27C256 or 27C512, 150 ns), selected by DMS, PMS and BMS with RD and WR strobes. SERIAL PORT 0 (SCLK, RFS, TFS, DT, DR) and SERIAL PORT 1 (SCLK, RFS or IRQ0, TFS or IRQ1, DT or FO, DR or FI) connect to optional serial devices. Other pins shown: CLKOUT, MMAP, BR, BG, IRQ2, RESET, VDD, GND.]
ADSP-21xx Family Memory Architecture
Varied Memory Configurations Across Family Members*
Core Can Access PM Twice and DM Once Per Instruction
PM and DM Buses Multiplexed Off Chip*
Can Perform One Off-Chip Access with No Cycle Penalty
On Chip PM Can Be Initialized Through Boot EPROM or
Host Interface Port
External EPROM Can Store 8 Pages of Bootable Code.
Software Programmable Wait States
* Does not apply to ADSP-2100A
On Chip Memory Configurations For
ADSP-21xx Processors
Processor       Program Memory RAM   Program Memory ROM   Data Memory RAM
ADSP-2100A      -                    -                    -
ADSP-2101       2k                   -                    1k
ADSP-2103       2k                   -                    1k
ADSP-2105       1k                   -                    1/2k
ADSP-2111       2k                   -                    1k
ADSP-2115       1k                   -                    1/2k
ADSP-21msp5x    2k                   2k                   1k
ADSP-2161/63    -                    8k/4k                1/2k
ADSP-2171/73    2k                   8k                   2k
ADSP-2181       16k                  -                    16k
ADSP-2101 Program Memory Architecture
Address           MMAP = 0 (Boot)                                          MMAP = 1 (No Boot)
0x0000 - 0x07FF   Internal PM RAM, Booted From External Boot Memory        External Program Memory
0x0800 - 0x37FF   External Program Memory                                  External Program Memory
0x3800 - 0x3FFF   External Program Memory                                  Internal PM RAM, Not Booted
The Reset Vector is at 0x0000.
ADSP-21xx Data Memory Architecture
0x0000 - 0x03FF   1K External, DWAIT0
0x0400 - 0x07FF   1K External, DWAIT1
0x0800 - 0x2FFF   10K External, DWAIT2
0x3000 - 0x33FF   1K External, DWAIT3
0x3400 - 0x37FF   1K External, DWAIT4
0x3800 - 0x3BFF   Internal Data Memory RAM
0x3C00 - 0x3FFF   Memory Mapped and Reserved Registers
[The figure also marks the larger Internal Data Memory RAM region of the ADSP-2171.]
ADSP-21xx Memory Control Registers
Data Memory Wait State Control Register DM(0x3FFE)
Fields: DWAIT4, DWAIT3, DWAIT2, DWAIT1, DWAIT0
System Control Register DM(0x3FFF)
Fields include:
PWAIT - Program Memory Wait States
BWAIT - Boot Memory Wait States*
BPAGE - Boot Page Select
BFORCE - Boot Force Bit
* 7 wait states for Boot Memory on ADSP-2171
[The figure also shows the default bit values of both registers.]
Memory Mapped Control Registers vs. Status Registers
Memory Mapped Control Registers
> Physical locations in Data Memory
> Accessed by address
> Addresses 0x3C00 thru 0x3FFF (All Processors)
Status Registers (or Non-Memory Mapped Registers)
> Physical registers in the DSP
> Accessed by name
Memory Mapped Control Registers
> Mainly to set up the peripherals (i.e., mode of operation)
Status Registers
> Set up the operation of the DSP core (i.e., MAC, interrupts)
> Provide information about the DSP core (i.e., stacks, status flags)
Initialize Memory Mapped Registers before running (i.e., not on the fly)
Status Registers are meant to be used on the fly
Memory Mapped Control Registers
0x3FFF System Control Register - Wait states, Enable SPORTs
0x3FFE Data Memory Waitstate Control Register
0x3FFD-0x3FFB Timer Control Registers - Set Timer values
0x3FFA-0x3FF7 SPORT0 Multichannel Word Enable Registers
0x3FF6 SPORT0 Control Register - clock, frame and data modes
0x3FF5 SPORT0 SCLKDIV - Divide down register for SCLK
0x3FF4 SPORT0 RFSDIV - Divide down register for internal RFS
0x3FF3 SPORT0 Autobuffer Control Register
0x3FF2-0x3FEF SPORT1 Control and Setup (same as SPORT0)
0x3FEF-0x3FEC Analog Control Registers (No SPORT1 autobuffer on msp5x parts)
0x3FEB-0x3FE9 NO REGISTERS
0x3FE8 HMASK Register - HIP mask for interrupts
0x3FE7-0x3FE6 HIP Status Registers - HSR7 and HSR6
0x3FE5-0x3FE0 HIP Data Registers
Status Registers
ASTAT ALU Status Flags, MAC Overflow Flag, Shifter Input Flag
SSTAT Stacks Overflow and Empty (Read-Only)
MSTAT Computation Modes, Miscellaneous Functions
ICNTL External Interrupt Sensitivity (edge/level) and Nesting
  Bit 0: IRQ0 Sensitivity, Bit 1: IRQ1 Sensitivity, Bit 2: IRQ2 Sensitivity (1 = Edge, 0 = Level)
  Bit 3: 0
  Bit 4: Interrupt Nesting (1 = Enable, 0 = Disable)
IMASK Interrupt Enables - Masks the servicing of interrupts
  Bit 5: IRQ2
  Bit 4: SPORT0 Transmit
  Bit 3: SPORT0 Receive
  Bit 2: SPORT1 Transmit or IRQ1
  Bit 1: SPORT1 Receive or IRQ0
  Bit 0: Timer
  (1 = Enable, 0 = Disable)
IFC Interrupt Force/Clear (Write-Only)
Boot EPROM to Internal PM RAM
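[Figure: booting order. The BOOT EPROM is 8 bits wide and holds Boot Page 0 (8k x 8, 0x0000 to 0x1FFF) followed by Additional Boot Pages from 0x2000. Each page loads the 24-bit wide Internal PM RAM (2k x 24, 0x0000 to 0x07FF). The three bytes A, B, C of each 24-bit instruction word are stored as separate bytes in the EPROM, which also holds the page length.]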
ADSP-2101 Timer Block Diagram
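[Block diagram: TSCALE (8 bits), TPERIOD (16 bits) and the TCOUNT count register (16 bits) connect to the 16-bit DMD Bus. CLKOUT and Timer Enable drive the Timer Enable & Prescale Logic, which decrements TCOUNT; the Zero and Load Logic reloads TCOUNT and raises the Timer Interrupt.]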
ADSP-2100 Family Timer Features
The ADSP-21xx programmable interval timer can generate periodic interrupts based on multiples of the processor's cycle time. The timer is not available on the ADSP-2100.
TCOUNT = dedicated count-down register
TPERIOD = reloads TCOUNT at interrupt
TSCALE = # of Clock ticks before TCOUNT decrements - 1
TCOUNT is decremented every TSCALE+1 cycles. After TCOUNT
expires, it is reloaded with the value in TPERIOD. One interrupt
occurs every (TPERIOD + 1) * (TSCALE + 1) cycles.
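The interrupt interval formula is easy to evaluate for candidate register values. A minimal Python sketch; the function names are illustrative, and the conversion to seconds assumes one processor cycle per tick of the given clock frequency:

```python
def timer_period_cycles(tperiod, tscale):
    """Processor cycles between timer interrupts once TCOUNT reloads from TPERIOD."""
    return (tperiod + 1) * (tscale + 1)


def timer_period_seconds(tperiod, tscale, clock_hz):
    """The same interval in seconds, assuming one cycle per clock_hz tick."""
    return timer_period_cycles(tperiod, tscale) / clock_hz


print(timer_period_cycles(99, 0))
print(timer_period_seconds(0xFFFF, 0xFF, 12.5e6))
```

With TPERIOD = 99 and TSCALE = 0 the timer interrupts every 100 cycles. The largest register values (16-bit TPERIOD, 8-bit TSCALE) give 16,777,216 cycles, a little over 1.3 seconds at 12.5 MHz.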
ADSP-2101 Timer Registers
Address   Register                    Bits
0x3FFD    TPERIOD Period Register     15-0
0x3FFC    TCOUNT Counter Register     15-0
0x3FFB    TSCALE Scaling Register     7-0 (bits 15-8 are 0)
ENABLING THE TIMER
1. Set values for TCOUNT, TPERIOD, and TSCALE.
2. Set bit 0 in IMASK to 1 to enable interrupt.
3. Execute "ena timer" instruction to start counting down.
(Bit 5 in MSTAT register)
Example Setup Code for Timer
i0 = 0x3ffb; /*i0 points to TSCALE*/
m0 = 1; /*modify value is 1 */
l0 = 0; /*not a circular buffer */
dm(i0,m0) = 0; /*set TSCALE to decrement every cycle*/
dm(i0,m0) = 49; /*to generate first interrupt at 50 cycles*/
dm(i0,m0) = 99; /*to reload TCOUNT with 99 at interrupt*/
IMASK = 0x1; /*enables the timer interrupt*/
ena timer; /*starts the count down after executing this*/
TIMER MINI-QUIZ
1. Write code to generate a timer interrupt every 50 cycles the first time, and
75 cycles thereafter (any decrement that works).
2. Write code to generate a timer interrupt every 300 ms. Assume clock is
16.67MHz.
3. What is the longest time you can set the timer for if you have a 12.5 MHz clock? What would the values of TSCALE, TCOUNT, and TPERIOD be?
TIMER MINI-QUIZ ANSWER
1.
i0 = 0x3ffb;           /*same first 3 lines*/
m0 = 1;
l0 = 0;
dm(i0,m0) = 0;         /*TSCALE = 0: decrement every cycle*/
dm(i0,m0) = 49;        /*TCOUNT = 49: first interrupt after 50 cycles*/
dm(i0,m0) = 74;        /*TPERIOD = 74: 75 cycles thereafter*/
imask = 0x1;
ena timer;
2.
/*same first 3 lines*/
/*300 ms = 5,000,000 cycles*/
dm(i0,m0) = 0xF9;      /*TSCALE = 249: decrement every 250 cycles*/
dm(i0,m0) = 0x4E1F;    /*TCOUNT = 19,999: 20,000 counts*/
dm(i0,m0) = 0x4E1F;    /*TPERIOD = 19,999: 20,000 counts*/
imask = 0x1;
ena timer;
3.
/*same first 3 lines*/
/*A 12.5 MHz processor yields an 80 ns instruction cycle time. TCOUNT and TPERIOD are 16-bit registers, so the largest number they can represent is 65535. TSCALE is an 8-bit register, so the largest number it can represent is 255. Following the equation (TSCALE+1)*(TPERIOD+1) gives 0x100 0000 cycles per timer interrupt. This number multiplied by 80 ns is 1.3422 seconds.*/
dm(i0,m0) = 0xff;
dm(i0,m0) = 0xffff;
dm(i0,m0) = 0xffff;
imask = 0x1;
ena timer;
ADSP-21xx Serial Port
ADSP-21xx Serial Port UART
ADSP-2101 Serial Port Block Diagram
[Block diagram: the 16-bit DMD Bus connects through the Companding Hardware to the TXn Transmit Data Register and the RXn Receive Data Register. TXn feeds the Transmit Shift Register (DT output); the Receive Shift Register (DR input) feeds RXn. Serial Control handles TFS and RFS, and the Internal Serial Clock Generator drives SCLK.]
ADSP-21xx SPORT Features
ADSP-21xx SPORTs Are Used For Synchronous Communication
Full Duplex
Fully Programmable
Autobuffer Capability
Multi-Channel Capability
Data Rates Up To 13 Mbits/sec
2171 Data Rates Up To 20 Mbits/sec
Examples of Serial Port Implementation
Connecting a CODEC to the Serial Port
Connecting Two 2101's Together
Using the Serial Port as a UART
[Figures: a 2101 connected to a TP3053 CODEC; two 2101s connected together; a 2101 (with software UART) connected to a PC through an AD233 RS-232 driver.]
ADSP-21xx SPORT Hardware
SPORT Has 5 Wires
SCLK: Serial Clock
RX: Data Receive
TX: Data Transmit
TFS: Transmit Frame Sync
RFS: Receive Frame Sync
[Figure: the ADSP-21xx pins SCLK, TFS1, RFS1, RX and TX wired to a Serial Device with Serial Clock, Transmit Frame Sync, Receive Frame Sync, Receive Data and Transmit Data lines.]
ADSP-21xx SPORT Software
Access Serial Port Data By Accessing SPORT Data Registers:
TX0, TX1, RX0, RX1
Configure Serial Port Through Memory Mapped Control Registers:
System Control Register **
SPORT Control Register **
SPORT SCLKDIV Register
SPORT RFSDIV Register
SPORT Autobuffer Control Register
SPORT0 Multichannel Enable Registers
** Required to Configure SPORTs
Synchronize SPORT Transfers and Processor Operation With Interrupts
Each SPORT is Allocated a Transmit and Receive Interrupt
The Base Architecture of Floating-Point DSP Processor
[Block diagram: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), Program Sequencer, Cache (32 x 48), Timer, and JTAG Test & Emulation, linked by the PMA bus (24 bits), DMA bus (32 bits), PMD bus (48 bits) and DMD bus (40 bits) with a Bus Connect. The Register File (16 x 40) feeds the floating-point/fixed-point ALU, the floating-point/fixed-point Multiplier with fixed-point MAC, and the 32-bit Barrel Shifter.]
IEEE Compatibility (IEEE Floating Point Standard 754/854)
Data Formats
32-Bit Single-Precision IEEE Floating Point (23-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
40-Bit Extended Single-Precision IEEE Floating Point (31-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
32-Bit Fixed Point (Integer and Fractional) With 80-Bit Accumulation
Rounding
Round-to-Nearest (Unbiased Rounding)
Round-Toward-Zero (Truncation)
IEEE Exception Handling
Overflow
Underflow
Equals Zero
Divide-by-Zero
Interrupt on Exception or Latched Status
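The 32-bit single-precision layout (sign bit, 8-bit exponent, 23-bit mantissa) can be inspected on any host machine. The following Python sketch uses only the standard `struct` module; the function name is an illustrative assumption, and it shows the generic IEEE 754 layout rather than anything specific to the DSP.

```python
import struct


def ieee_single_fields(value):
    """Split a number, stored as 32-bit IEEE single precision, into its fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent (bias 127)
    mantissa = bits & 0x7FFFFF          # 23-bit fraction; the leading 1 is implied
    return sign, exponent, mantissa


print(ieee_single_fields(1.0))
print(ieee_single_fields(-2.5))
```

For 1.0 the exponent field holds the bias (127) and the fraction is zero; for -2.5 the sign bit is set, the exponent field is 128, and only the fraction bit worth 0.25 is set.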
Floating-Point Multiplier/MAC
[Block diagram: Register File (16 x 40), floating-point/fixed-point ALU, floating-point/fixed-point Multiplier with fixed-point MAC, and 32-bit Barrel Shifter.]
Example Multiplier/MAC Instructions
F1 = F5 * F7
R2 = R3 * R8 (SSF)
MRF = MRF + R5 * R0 (UUIR)
Example Multi-Function Instructions
IF EQ F1 = ABS F8, F9 = DM (I0,M4)
F8 =F1*F6, F3=F9+F14, F9=F9-F14,DM(I2,M0)=F10, PM(I10,M10)=F3
The System Architecture
[Figure: the DSP Processor (CLKIN from a 1x clock, RESET, IRQ3-0) connects to Program Memory through PMS1-0, PMRD, PMWR, PMA (24 bits), PMD (48 bits), PMPAGE, PMACK and PMTS, and to Data Memory and Peripherals through DMS3-0, DMRD, DMWR, DMA (32 bits), DMD (40 bits), DMPAGE, DMACK and DMTS. Each memory presents Selects, OE, WE, ADDR and DATA lines; the Peripherals also return ACK.]
The Complete Architecture
[Figure: a noisy analogue signal is sampled at rate fs by the S/H and A/D into discrete time values, filtered by the DIGITAL PROCESSOR (discrete filter processing), and converted by the D/A back into a clean analogue signal.]
What is a Real Time Application?
"Real Time" is a misleading expression. However, it means that the DSP system can process the required algorithm within a specified time.
[Figure: a RADAR SIGNAL x(t) is sampled at fs by the S/H and A/D; the DIGITAL PROCESSOR computes its Fourier Transform x(f), with components at f1 and f2, for the DISPLAY.]
Real Time Operating Systems as an Ideal Environment for Embedded Applications
The current DSP processors:
Are more than high-performance signal-processing engines
Provide a more regular instruction set, with plenty of address space to run large programs
Come with efficient C compilers that rival those of general purpose microprocessors
DSP Embedded Applications
[Figure: a DSP RTOS hosting Fax Tasks, Telephone Tasks, Speech Recognition Tasks, Sound Generation Tasks and Answering Machine Tasks.]
DSP RTOS ARCHITECTURE
[Figure: the RTOS provides DSP Memory Management, Real-Time Multi-Tasking, DSP Stream I/O and DSP Event Handling on top of Memory Segments, Processor Segments and Peripheral Devices.]
Features for Real Time Operating Systems
Operating System Features     BOS      Nucleus  RTXC     SPOX      Helios
Preemptive Task Scheduling    Yes      Yes      Yes      Yes       Yes
Time-Sliced Scheduling        Yes      Yes      Yes      No        Yes
Round-Robin Scheduling        ?        Yes      Yes      No        Yes
Parallel Processing           No       No       No       Optional  Yes
Inter-Task Messages           Yes      Yes      Yes      Yes       Yes
Memory Management             Yes      Yes      Yes      Yes       Yes
Interrupt Management          Yes      No       Yes      Yes       Yes
Timer Management              Yes      Yes      Yes      No        Yes
Device-Independent I/O        No       No       No       Yes       Yes
Stream I/O                    $495*    No       No       Yes       Yes
OS RAM/ROM Size (Bytes)       5K-40K   4K-20K   12K-16K  44K+      80K-200K
Please contact the vendors listed above for the best and most up-to-date information
Compression Techniques and a Compressor and De-Compressor Generator
The CCITT/ISO Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG) digital image compression algorithms are needed for:
Multimedia
Video Editing
Colour Publishing and Graphic Arts
Image Processing, Storage and Retrieval
Colour Printers, Scanners and Copiers
High-Speed Image Transmission Systems for LANs, Modem and Colour Facsimile
Digital Cameras
These algorithms may be implemented in real time as:
A) A dedicated Chip (Compressor)
Company        Product name                  Compression ratio   DCT Table   Huffman Table   Quantisation Table   Price
Fast Forward   Outlaw Digital Video          4:1 to 10:1         -           -               -                    Board: 4700; Disc 0.5 GByte: 950
C-Cube         CL 550 En-/Decoder            8:1 to 100:1        static      program         program              80
C-Cube         CL 650 En-/Decoder            1:1 to 50:1         static      program         program              200
Winbond        W9930 En-/Decoder             8:1 to 100:1        static      static          program              29
LSI Logic      L64702                        *                   program     program         program              60
B) DSP Processor + Compressor
[Figure: a DSP Processor paired with DCT and IDCT blocks converts between uncompressed DATA and compressed DATA in both directions.]
DCT: Discrete Cosine Transform
C) Software Solution (DSP C / Assembler code)
Company                    Processor type                 Data Bits   Operation frequency   Benchmarks
Optibase                   Motorola 56002                 24          40 MHz                *
Atlanta Signal Processor   Texas Instruments TMS320C31    32          16 MHz                64 KB Grey scale: 700 ms
Sonitech International     Texas Instruments TMS320C3x    32          16 MHz                400 Kbytes/s B & W, 540 Kbytes/s Colour
Atlanta Signal Processor   Analog Devices 21020           32          33 MHz                500 Kbytes/s B & W, 600 Kbytes/s Colour
Zoran Corp                 Zoran ZR38000                  16          25 MHz                440 Kbytes/s B & W, 500 Kbytes/s Colour
Compressor-De-Compressor Generator
[Figure: the Comp/Decomp Generator takes the Processing Rate (n Million Pixels/Second), the Compression Rate (1:1 to 80:1), the JPEG and MPEG parameters, the Quantizer & Huffman Tables, and the colour space I/O format (n Bit Gray Scale, RGB, CMYK, 4:4:4:4, YUV). The Code & Model Generator then produces C or Assembly code and models for Comdisco: SPW, Hyperception: HW and Momentum: FDAS.]
Performance Measures
Two measures are used commonly:
MIPS: Millions of Instructions Per Second
This is a measure of raw instruction execution rate without specifying the nature of the computations.
MFLOPS: Millions of Floating Point Operations Per Second
This is a measure useful in assessing computations in floating point format.
The difference between MIPS and MFLOPS can be appreciated by considering a simple DO LOOP high level language construction:
DO I = 1 TO 1000000 STEP 1
BEGIN
Z(I) = X(I) * Y(I) + C(I);
END
Each iteration accomplishes two floating point operations, yet depending on the host computer the compiled assembly language code could occupy many bytes. The speed of execution of the two floating point operations therefore depends on the MIPS of the processor; provided that each iteration could be completed in, say, a microsecond, the processor would then execute at the rate of two MFLOPS. A system of a giga (one thousand million) processors could conceivably do all the iterations at once and attain a performance of two giga MFLOPS.
Despite its widespread use, MIPS is perhaps the poorest definition of performance since it contains no quantifiable attributes for assessing useful processing.
The term FLOPS is widely used in signal processing applications and is a common measure of performance in comparing processors.
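The link between iteration time and MFLOPS for the loop above can be worked through numerically. A minimal Python sketch with an illustrative function name: each iteration performs one multiply and one add, so an iteration time of one microsecond corresponds to 2 MFLOPS and one nanosecond to 2000 MFLOPS.

```python
def mflops(flops_per_iteration, seconds_per_iteration):
    """Millions of floating point operations per second for a repeated loop body."""
    return flops_per_iteration / seconds_per_iteration / 1e6


# Z(I) = X(I) * Y(I) + C(I) is one multiply and one add per iteration
print(mflops(2, 1e-6))
print(mflops(2, 1e-9))
```

The figure says nothing about how many instructions each iteration needs, which is exactly the information a MIPS rating would add and a MFLOPS rating hides.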
Data Flow Bottle-necks & Solutions; Pipeline & Parallel Architectures With Examples
[Figure: Bottle-neck of a Shared Instruction/Data Bus in a Von Neumann Machine. INSTRUCTION, DATA IN and DATA OUTPUT all travel over one MEMORY BUS.]
The First Generation µP Architecture
[Figure: ALU with TMP and ACCUM registers, GENERAL PURPOSE REGISTERS, PROG CNTR, ADDR REG and CONTROL & TIMING, connected through the DATA BUS and ADDRESS BUS to a single MEMORY holding INSTRUCTIONS AND DATA.]
Each instruction is a new event; it is fetched, decoded, and executed.
The Assembly Language Commands Help To Execute Lengthy Manipulations On Designated Strings Of Data.
The Programmer Must Code Iterative Loops Or Use Other Mechanisms To Enhance Performance While Constrained By The Basic Limitations.
At The Algorithmic Level, Many Sequences Of Operations Have Little Or No Precedence Relationships.
Overview of the Pipeline Approach
The simplest view of a pipeline is that each stage consists of combinational logic driven by an input register. The output from a stage is captured by the input register of the following stage. Each stage has a delay for the initial data capture and subsequent processing.
It is possible to construct two types of pipeline system:
i) Synchronous Pipeline
If all stages have an equal delay, then a synchronous clock can transfer results into each input register. This is the simplest control problem.
ii) Asynchronous Pipeline
If there is a large discrepancy between the various delays in each stage, then an asynchronous data transfer might be in order. Here the intermediate registers are omitted. The design of such pipes requires careful timing of data input/output.
The following figure shows a simple Pipeline DSP System.
[Figure: Simple Pipeline DSP System. Stages j-1, j and j+1, each an Input Register followed by Combinatorial Logic (here an ADSP-2181 DSP), are chained between an AD Converter and a DA Converter.]
When can the Pipeline Approach be considered?
In general a pipeline can be considered if:
i) The procedure can be broken into a sequence of discrete steps,
ii) The steady state data flow matches the remainder of the system, &
iii) Components can be found which implement the steps with the desired response.
How can the performance of the pipeline be measured?
A synchronous pipeline produces a result every clock period t, i.e. a data-flow rate of 1/t outputs per second. An N-stage pipeline gives an apparent N-fold increase in performance. If the input to the pipeline is intermittent, however, then some stages will not be processing valid data, and this must be accounted for by the control mechanism. If, on the average, only a fraction P of the total stages are occupied, then the data flow falls to P/t outputs per second.
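The two rates quoted above can be put into numbers. A small Python sketch; the function names and the 100 ns stage time are illustrative assumptions, and the latency helper simply multiplies the stage time by the number of stages.

```python
def pipeline_output_rate(stage_time, occupancy=1.0):
    """Outputs per second of a synchronous pipeline clocked every stage_time seconds.

    occupancy is the fraction P of stages holding valid data (1.0 = always full).
    """
    return occupancy / stage_time


def pipeline_latency(stage_time, stages):
    """Time for one item to travel through all stages."""
    return stages * stage_time


print(pipeline_output_rate(100e-9))         # full pipeline
print(pipeline_output_rate(100e-9, 0.75))   # three quarters of the stages busy
print(pipeline_latency(100e-9, 4))
```

Adding stages raises the latency of a single item but leaves the steady-state output rate at 1/t, which is why the gain only appears when the input stream keeps the stages occupied.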
Question:
In the following figure, a sequence of procedures is assumed each to process data in time t, except for the FFT procedure which consumes 8t. Given that all the mechanisms for increasing throughput (i.e. for decreasing t) have been exhausted, what are the alternatives to enhance DSP performance?
[Figure: Sequential Data Flow. P1 (t), P2 = FFT (8t), P3 (t).]
Overview of the Parallel Approach
The simplest view of a parallel approach is that the input commutator feeds the input data to the units sequentially, and the output commutator collects the result data after the processors have executed simultaneously.
The following figure shows a simple Parallel DSP System.
[Figure: Simple Parallel DSP System. An Input Commutator distributes samples from the AD Converter to several ADSP-2181 DSPs working in parallel; an Output Commutator collects their results for the DA Converter.]
When can the Parallel Approach be considered?
In general a parallel approach can be considered if:
i) The procedure cannot be broken into a sequence of discrete steps, &
ii) The steady state data flow does not need to be constrained.
Note: The input/output commutation is usually difficult to implement and consumes
some overhead which lowers the effective throughput.
How can the performance of the parallel approach be measured?
A parallel array need not have an identical delay in each path, though this complicates the control problem. If each of N units has a delay ti, then the average delay could be used to compute data-flow. For N parallel paths the response will be shown to be the same as an N-stage pipeline. If a proportion P of units is unused then the output rate drops.
The overall behaviour is therefore identical to that of a pipeline, although implementation issues are widely different.
Answer (continued):
The final resort to enhance DSP performance is in the form of multiplicity:
b) Parallel Array of Processing Units
In this case the individual processors still operate with a response time of 8t. The input commutator sequentially allocates input data, which is collected 8t later by the output commutator.
[Figure: Parallel Data-Flow Solution. P1 (t) feeds an Input Commutator that distributes data to FFT units 1 to 8 (8t each); an Output Commutator feeds P3 (t). Bandwidth in = 1/t, Bandwidth out = 1/t.]
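The effect of replicating the slow FFT stage can be estimated with a simple throughput model. This Python sketch rests on stated assumptions: commutation overhead is ignored, each stage's effective time is its processing time divided by the number of parallel units, and the function names are illustrative.

```python
import math


def chain_throughput(stage_times, units_per_stage=None):
    """Results per unit time for a chain of stages.

    stage_times     -- processing time of each stage
    units_per_stage -- parallel units available at each stage (default: 1 each)
    The slowest effective stage sets the rate of the whole chain.
    """
    if units_per_stage is None:
        units_per_stage = [1] * len(stage_times)
    effective = [t / u for t, u in zip(stage_times, units_per_stage)]
    return 1 / max(effective)


def units_needed(stage_time, target_time):
    """Parallel units required so that a slow stage keeps up with target_time."""
    return math.ceil(stage_time / target_time)


t = 1.0
print(chain_throughput([t, 8 * t, t]))               # single FFT unit
print(chain_throughput([t, 8 * t, t], [1, 8, 1]))    # eight FFT units behind commutators
print(units_needed(8 * t, t))
```

With one FFT unit the chain delivers one result every 8t; with eight units the bandwidth returns to 1/t, although each individual result still takes 8t to pass through its FFT unit.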
Example: FFT with Serial, Pipelining and Parallel Butterflies
The FFT provides a good example of the use of alternative signal-processing architectures to improve throughput.
The key comparison is:
i) That of butterfly time, tB, &
ii) The time, (N/2) tB log2 N, to cycle through all butterflies of an FFT.
The interval, tB, includes the butterfly computation time and any overhead in address generation or looping.
Realistic alternatives to consider are:
Serial (direct)
Pipeline: log2 N stages deep, with N/2 steps
Parallel: N/2 butterfly processors, iterate log2 N times
Serial (direct)
A single processor computes each butterfly, one step at a time.
DO 20 J = 1, log2 N
DO 10 I = 1, N/2
10 CONTINUE
20 CONTINUE
[Figure: The Serial (Classic) Approach. The Computation Flow visits butterflies t1 to t12 of an 8-point FFT one after another.]
Pipeline: log2 N stages deep, with N/2 steps
Here there are log2 N butterfly processors, corresponding to the number of passes (3 in the case of 8 data points: B1, B2, B3); each is used to compute the butterflies pertinent to its pass in series; as each pass is computed, the processors are ready to accept a pair of inputs for the next pass, and when the pipeline is full (steady state), a set of outputs will be produced by each pass (N/2 computations).
[Figure: The Pipeline Approach. Log2 N BUTTERFLY PROCESSORS (B1 - B3) IN A PIPELINE, each running DO I = 1, N/2 over steps t1 to t4.]
Parallel: N/2 butterfly processors, iterate log2 N times
Here there is one processor for each of the N/2 steps per pass; all butterflies for that pass are computed at the same time; as soon as one pass is completed, all are ready for the next pass; in the steady state, there will be an output for every computation cycle.
[Figure: The Parallel Approach. N/2 BUTTERFLY PROCESSORS (B1 - B4) IN PARALLEL, each running DO J = 1, log2 N over passes t1 to t3.]
Q. Summarize the differences between the serial, pipeline, and parallel architectures for the FFT example in terms of the computation time and the number of butterfly processors. Consider a 1024-point FFT: what are the time and hardware costs for the three architectures?
A.
Architecture   Computation Time   Number of Butterfly Processors
Serial         (N/2) log2 N       1
Pipeline       N/2                log2 N
Parallel       log2 N             N/2
The 1024-point FFT costing:
Architecture   Computation Time   Number of Butterfly Processors
Serial         5,120              1
Pipeline       512                10
Parallel       10                 512
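The costing follows directly from the three formulas. A minimal Python sketch, assuming N is a power of two and measuring time in units of one butterfly time; the function name is illustrative.

```python
def fft_costs(n):
    """(computation time in butterfly times, butterfly processors) per architecture."""
    passes = n.bit_length() - 1          # log2(n) for a power-of-two n
    per_pass = n // 2                    # butterflies in each pass
    return {
        "serial": (per_pass * passes, 1),
        "pipeline": (per_pass, passes),
        "parallel": (passes, per_pass),
    }


for name, (time_units, processors) in fft_costs(1024).items():
    print(name, time_units, processors)
```

In every case the product of time and processor count is the same 5,120 butterfly operations; the architectures differ only in how that work is spread over hardware.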
High Performance System Classification Scheme
There have been many attempts to classify processor architectures. A standard classification scheme would be exceedingly useful both for discussion purposes and as a guide to processor designs. The requirements for such a scheme are at least that:
It be complete (i.e., include all architectures) and
Orthogonal (i.e., differentiate the key attributes).
Unfortunately, despite the attractiveness of the concept, no such scheme exists. Of the many proposals, one forms the basis of many others. It is neither complete nor orthogonal, yet its elegance and intrinsic simplicity are attractive and it does concentrate on data flow and control in a general way.
The basis of the scheme is that a processor processes data by a sequence of instructions regardless of the format and mechanisms whereby each arrives at the point of action. Based on the concept of a data stream and an instruction stream, four possibilities exist:
SISD - Single Instruction Single Data Stream
SIMD - Single Instruction Multiple Data Stream
MISD - Multiple Instruction Single Data Stream
MIMD - Multiple Instruction Multiple Data Stream
Q. With the aid of appropriate diagram(s), show how the four categories in Flynn's taxonomy can be emulated on a dual processor shared-memory system. Your diagrams must clearly show the IS and DS from and to the various units.
Answer:
Note that both the Babbage and Von Neumann architectures are SISD, although they differ greatly in implementation. The performance of such a configuration can be thought of as unity for purposes of comparison. Examples are shown in the following figures.
[Figures: SISD (one instruction stream I, one data stream D, data in to data out); SIMD (one instruction stream I applied to data streams D1 to D4); MISD, Version 1 and Version 2 (instruction streams I1 to I4 applied to a single data stream D); MIMD (instruction streams I1 to I4 paired with data streams D1 to D4).]
Discussion on the classification scheme
The SIMD architecture is an example of a parallel array in which each processing unit executes the same
instruction. With n units, it can achieve an n-fold increase in data flow bandwidth for each instruction,
provided that the units can be continuously utilized.
The original motivation for developing SIMD array processors was to perform parallel computations on
vector or matrix types of data. Parallel processing algorithms have been developed by many computer
scientists for SIMD computers. Important SIMD algorithms can be used to perform matrix multiplication,
Fast Fourier Transform (FFT), matrix transposition, summation of vector elements, matrix inversion,
parallel sorting, linear recurrence, Boolean matrix operations, and to solve partial differential equations.
The MIMD architecture is implemented by a multiple-processor system. Clearly implied is some form of
cooperative network to share a computational task (completely autonomous units being of little interest).
This is an example of a parallel array in which the task assigned to each processor can be different. The
performance enhancement potential is equal to the number of processors.
The MISD architecture is not widely implemented in practice, and substantial disagreement exists on its
exact structure. It is considered here as a pipeline in which a single data stream is modified at successive
stages, and its performance enhancement potential equals the number of stages, as shown in the previous
section.
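As a rough illustration of the three parallel categories discussed above, the following Python sketch emulates how instruction streams and data streams are paired in each case. The function names (simd, mimd, misd_pipeline) are hypothetical and chosen only for this example. Everything runs sequentially on one machine, so the sketch shows the structure of each category, not the speed-up.

```python
def simd(instruction, data_streams):
    """SIMD: one instruction stream applied to every data stream."""
    return [instruction(d) for d in data_streams]


def mimd(instructions, data_streams):
    """MIMD: instruction stream Ik is applied to its own data stream Dk."""
    return [instr(d) for instr, d in zip(instructions, data_streams)]


def misd_pipeline(stages, data_stream):
    """MISD viewed as a pipeline: each item of a single data stream
    is modified by every stage in turn."""
    results = []
    for item in data_stream:
        for stage in stages:
            item = stage(item)
        results.append(item)
    return results


def double(x):
    return 2 * x


def negate(x):
    return -x


def square(x):
    return x * x


def increment(x):
    return x + 1


simd_out = simd(double, [1, 2, 3, 4])
mimd_out = mimd([double, abs, negate, square], [1, -2, 3, 4])
misd_out = misd_pipeline([increment, double], [1, 2, 3])
print(simd_out, mimd_out, misd_out)
```

On real hardware the list comprehension in simd would correspond to n processing units working at the same time, which is where the n-fold bandwidth gain comes from.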
There is a relationship between these classifications and the structure of processing algorithms. An
algorithm may contain a collection of processing tasks which could optimally be assigned to different
processing configurations to achieve an overall higher performance. If components were of sufficiently
low cost, a solution might be to build a conglomerate of different processing architectures and utilize the
optimum one at appropriate points in the algorithm. The task assignment problem here is formidable; in
addition, the physical complexity and lowered reliability of such a conglomerate of components are major
limiting factors of such a scheme. This will be discussed in more detail later.
SIMD Matrix Multiplication & SIMD FFT
To be found in the following references:
*) G. Barnes et al., "The ILLIAC-IV Computer," IEEE Trans. on Computers,
Aug. 1968, pp. 746-756.
*) K. Hwang & F. Briggs, "Computer Architecture and Parallel Processing,"
McGraw-Hill Book Company, 1985.
*) B. Wilkinson, "Computer Architecture: Design and Performance,"
Prentice-Hall Int. Ltd, 1991.
How To Design SIMD DSP System From The Off-Shelf Fixed-Point DS Processors?
Here we will develop a SIMD DSP system with a processor-pair
architecture, based on a dual-port RAM. The design is easy to
implement and provides a significant computational boost over a single processor.
The off-the-shelf fixed-point DS processors are two ADSP-2101s,
each with its own private memories. The following figure shows
a block diagram of the system hardware architecture.
A processor pair almost doubles the speed of a single processor while
keeping the architecture and inter-processor co-ordination as simple as possible.
Hardware Architecture
[Figure: Processor Pair Block Diagram. Two ADSP-2101 processors, each with its own program memory and
private data memory, connected to a common data memory (dual-port RAM). Labels in the figure: DMA, PMD,
DMACK, BUSYL, BUSYR.]
Private memories are accessible to one processor only.
Common memory is accessed by both.
Each processor has a private memory of 32K words of 24-bit
program memory and 14K words of 16-bit data memory.
In addition, 2K words of 16-bit dual-port RAM is shared by both processors. This area of memory allows
inter-processor communication and data transfers.
Software Architecture
To complement the hardware design, a hypothetical application is
presented. Data is input and low-pass filtered by one processor,
then the second processor determines the peak location within a
filtered window.
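The two tasks of this hypothetical application can be sketched in Python as follows. This is only an illustration under stated assumptions: the notes do not give the filter coefficients or the definition of the peak, so a 4-tap moving average stands in for the low-pass filter and the peak is taken to be the largest sample value in the window. The names low_pass_fir and peak_location are made up for the example.

```python
def low_pass_fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k].
    The delay line starts at zero."""
    delay_line = [0.0] * len(coeffs)
    output = []
    for x in samples:
        # Shift the newest sample into the front of the delay line.
        delay_line = [x] + delay_line[:-1]
        output.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return output


def peak_location(window):
    """Index of the largest sample in the window (first one on ties).
    Assumption: 'peak' means maximum value, not maximum magnitude."""
    return max(range(len(window)), key=lambda i: window[i])


# Assumed coefficients: a 4-tap moving average used as a simple low-pass filter.
coeffs = [0.25, 0.25, 0.25, 0.25]
samples = [0, 0, 4, 4, 4, 4, 0, 0]

filtered = low_pass_fir(samples, coeffs)
peak = peak_location(filtered)
print(filtered, peak)
```

In the processor pair, the first function would run on the filter processor and the second on the peak locator, with the filtered window passed between them through the dual-port RAM.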
Although the software implementation is simplistic, it shows a technique for programming
in a multiprocessing environment: alternating buffers and flags.
The alternating buffers in this application are two identical buffers
located in dual-port RAM so both processors can access them:
The first processor fills buffer 1 with information, while the second processor works on the information in buffer 2.
Each buffer has a flag that indicates completion of operations on
that buffer.
When processor 1 has finished its operations on the buffer data,
it sets the flag, signalling processor 2 to begin operations on that buffer.
The sequence of operations is shown in the following table:
Processor 1 (Filter)                    Processor 2 (Peak Locator)
Initialise flags, coefficients,         Initialise pointers
delay line, pointers
Perform low pass filter operation       Check flag 1; wait if not set
on data in buffer 1
Set flag 1                              Check flag 1; if set, perform
Perform low pass filter operation       peak locating operation on
on data in buffer 2                     data in buffer 1
                                        Clear flag 1
Set flag 2                              Check flag 2; if set, perform
Perform low pass filter operation       peak locating operation on
on data in buffer 1                     data in buffer 2
                                        Clear flag 2
Set flag 1; etc.                        Check flag 1; etc.
The alternating buffer scheme is easier to implement than a single buffer scheme. If only one buffer were used, careful timing analysis or extensive
handshaking would be required to ensure that the processors did not use old or invalid data.
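The alternating buffers and flags scheme can be imitated in Python with two threads standing in for the two ADSP-2101s and a shared object standing in for the dual-port RAM. This is a sketch under assumptions, not the original assembly code: the class name SharedRam, the helper names and the 2-point averaging filter are all hypothetical. One step is added that the table above does not show: before refilling a buffer, the filter thread waits until that buffer's flag has been cleared, so that correctness does not depend on relative timing.

```python
import threading


class SharedRam:
    """Stand-in for the dual-port RAM: two buffers and one flag per buffer."""

    def __init__(self):
        self.buffers = [None, None]
        self.flags = [False, False]
        self._cond = threading.Condition()

    def wait_flag(self, k, value):
        # Block until flag k has the requested value.
        with self._cond:
            self._cond.wait_for(lambda: self.flags[k] == value)

    def set_flag(self, k, value):
        with self._cond:
            self.flags[k] = value
            self._cond.notify_all()


def smooth(block):
    """Assumed stand-in for the low-pass filter: 2-point average."""
    prev, out = 0.0, []
    for x in block:
        out.append((x + prev) / 2)
        prev = x
    return out


def filter_processor(ram, blocks):
    for n, block in enumerate(blocks):
        k = n % 2                      # alternate between buffer 0 and 1
        ram.wait_flag(k, False)        # added check: buffer must be free
        ram.buffers[k] = smooth(block)
        ram.set_flag(k, True)          # signal the peak locator


def peak_processor(ram, n_blocks, results):
    for n in range(n_blocks):
        k = n % 2
        ram.wait_flag(k, True)         # wait if flag not set
        window = ram.buffers[k]
        results.append(max(range(len(window)), key=lambda i: window[i]))
        ram.set_flag(k, False)         # clear flag: buffer may be refilled


def run(blocks):
    ram, results = SharedRam(), []
    t1 = threading.Thread(target=filter_processor, args=(ram, blocks))
    t2 = threading.Thread(target=peak_processor, args=(ram, len(blocks), results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


peaks = run([[0, 4, 0, 0], [0, 0, 0, 8], [6, 0, 0, 0]])
print(peaks)
```

While the peak locator works on one buffer, the filter thread is free to fill the other, which is the overlap that lets the processor pair approach twice the throughput of a single processor.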
Multiprocessing With The SHARC
The Modified Harvard Architecture
[Figure: a DSP processor connected to data storage (DM) and to program and data storage (PM), each through
its own address bus and data bus, together with a cache and I/O. Bus-width labels in the figure: 32, 24,
32/40, 48.]
Harvard Architecture: Simultaneous Access of Data and Instruction
Modified Harvard Architecture: Simultaneous Access of 2 Data Memories and Instruction from Cache Gives Three Bus Performance with only 2 Busses
SHARC: Complete Signal Computer On A Chip
ADSP-21000 Family High Performance Processor Core
- 25 ns = 40 MIPS / 120 MFLOPS
Large Efficient On-Chip Memory System
- 4 Megabits on ADSP-21060
- 2 Megabits on ADSP-21062
DMA Controller and I/O Processor
- Allows Flexible, Zero-Overhead, High-Speed Data Transfers
- 240 Mbytes/s
Host Interface
- Efficient Interface to 16- & 32-Bit Microprocessors
Two Serial Ports
- 40 Mbit/s Multichannel Serial Ports
Two Integrated Multiprocessing Interfaces
- Glueless Cluster Interface Tran