bengdsp notes

8/13/2019 BEngDSP Notes


U H

BEng

School of Engineering & Technology, University of Hertfordshire

Prof. Talib Alukaidey

Digital Signal Processing



Table of Contents

Outline of Digital Signal Processors

Digital vs. Analogue Signal Processing
Why process signals digitally?
What is Digital Signal Processing?
What are Digital Signal Processors?
What are the typical Applications for DSP?
What do you need to produce a Functional DSP Device?
The Efficiency of the Assemblers & the Goodies of the Simulators
High Level Languages and Their Advantages
Binary Notation in DSP's
Features Of ADSP-2100 Base Architecture
ADSP-2100 Family Base Internal Architecture
ALU
MAC
Shifter
Data Address Generator (DAG) Operations
Program Sequencer Operations
ADSP-2100 Family Peripherals
The Base Architecture of Floating-Point DSP Processor
The System Architecture
The Complete Architecture
What is a Real Time Application?
Real Time Operating Systems as an Ideal Environment for Embedded Applications
Compression Techniques and a Compressor and De-Compressor Generator
Performance Measures
Data Flow Bottle-necks & Solutions; Pipeline & Parallel Architectures With Examples
High Performance System Classification Scheme
SIMD Matrix Multiplication & SIMD FFT
How To Design SIMD DSP System From The Off-Shelf Fixed-Point DS Processors?
Multiprocessing With The SHARC
VLIW Compiler and the DSP Super Computer Architecture Goes Hand in Hand


Digital vs. Analogue Signal Processing

Filters are typically used to pick out signals of interest from noise by making use of their differing frequency characteristics.

Filters can be designed using analogue components or digital components. The following figure shows simple analogue filters:

[Figure: simple filters. An input x(t) with a broad range of spectral content is applied to low-pass (LP), band-pass (BP) and high-pass (HP) circuits built from R, L and C components, giving the outputs YLP, YBP and YHP; the responses Y(f)/X(f) are plotted against frequency f.]


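The following figure shows the components required for digital filters:

[Figure: digital filtering chain with the blocks S/H, A/D, DIGITAL PROCESSOR and D/A, sampled at rate fs. The input is a noisy analogue signal and the output is a clean signal. Other labels: discrete time, discrete value, analogue filter, discrete processing.]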


Why process signals digitally?

[Figure: bar chart (scale 0 to 90) comparing Analogue and Digital processing under the headings Bandwidth, Aging, Temp Drift, Accuracy, Upgrade and Prediction.]


What is Digital Signal Processing?

For reasons of simplicity and flexibility associated with the binary nature of the electronics, signals are most conveniently processed digitally. This major area of electronics, information technology and control engineering is known as Digital Signal Processing.

Digital Signal Processing comprises numerical techniques for extracting information from discrete-time, discrete-valued signals.


What are Digital Signal Processors?

The rapid advances being made in the field of digital component technology are having profound effects on all aspects of digital systems design. Nowhere are these effects felt more strongly than in the design of high-performance systems for applications such as digital signal processing.

This part of the DSP2 course brings together a wide variety of logical concepts that affect the design of such systems, which acknowledge and take advantage of modern component technology.

The term Digital Signal Processors may be interpreted as:

1- The design of VLSI components intended for use in digital signal processing applications, and

2- The design of digital signal processing systems that utilise VLSI components.


[Figure: FIR filter example for speech recognition. The samples are combined as y(n) = sum over i of a(i) * x(n - i) for N coefficients; the stages numbered 0, 1 and 2 are each annotated "< 300 s".]


What are the typical Applications for DSP?

Communications (Echo Cancellation, Scrambler-Descrambler, etc.)
Radar
Imaging
Speech
Control
Geology
Medical
and more


SPEECH

Among the applications of DSP to speech are:

. VOCODERS . SYNTHESIS . ANALYSIS . RECOGNITION

One of the largest applications is in voice synthesis:

[Figure: voice synthesis model. An impulse train generator (set by the pitch period) and a random number generator provide the excitation, which is scaled by an amplitude multiplier (X) and passed through a time-varying digital filter whose coefficients are the vocal tract parameters, producing the speech samples.]


CONTROL

Control systems are finding applications for DSP:

. Lead/Lag Compensators . Transducer Linearisation . Large Multivariate Systems

For Example: Feedback Control

[Figure: digital feedback control loop with the blocks Digital Command, D/A, Dynamic System, A/D and Digital Filter; the A/D and digital filter form the feedback path.]


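COMMUNICATION

Communications applications of DSP include:

. PCM Generation . Tone Detection . Adaptive Echo Cancellers . SSB Generation

For Example: SSB Via Hilbert Filters

[Figure: SSB generation. x(t) is converted by an A/D and split into a Delay branch and a Hilbert Filter branch; the branches are mixed with SIN and COS carriers and combined to give y(n). The spectra X(f) and Y(f) are plotted against f.]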


IMAGING

Image processing applications include:

. Deblurring . Data Compression . Scene Analysis . 3-D Reconstruction

For Example: A moving camera blurs a picture and can be modelled as a low-pass filter. Deblurring requires the inverse linear operation.

[Figure: Scene, Moving Camera (point spread function), Picture, 2-D Inverse Filter, Deblurred picture.]


MEDICAL

DSP is finding new applications in the medical field:

. Patient Monitoring . Tomography . Blood Flow Velocimeters . EKG Pattern Analysis . X-Ray Enhancement

For Example: Micro Based Monitor

[Figure: micro based monitor with the blocks Commercial Fetal Monitor, MUX, S/H, A/D, Micro, DSP, Display and Data Recorder.]


What do you need to produce a Functional DSP device?

Answer: HARDWARE & SOFTWARE

Real-time DSP applications require choices in both hardware and software to produce a functional device.


[Figure: from APPLICATION to FUNCTIONAL DEVICE. Hardware choices: array processor, microprocessor, DSP chip, special device. Software choices: high level, assembly code, microcode. Advanced CAD tools link the two.]


[Figure: CAD tool flow. Design Capture: Draw and Specify, then Translator, then Code Generator, then Analog Devices Design Implementation.]


Library of DSP Primitive Functions

[Figure: block symbols for the library primitives EQ, EXT_IN, GP, AEXPAND, ACOMPRES, AFIR (FIR, LMSE), NOISE, DELAY1, DELAY2 and DELAYN (z^-1, z^-2, z^-n), MULT, DFT, MINUS and AMP.]


Proportional Integral Derivative (PID) Compensation Filter

$$U(t) = K_p\, e(t) + K_d\, \frac{de(t)}{dt} + K_i \int e(t)\, dt$$

[Figure: PID filter drawn with the library primitives: SOURCE (PROFILE GEN, zlrate=10000.0, trigger=cp0), MINUS1, AMP1 to AMP4 (gain settings gdn = 1.0, 0.7, 0.5 and 1.19), INT1, DFF1 (d/dt), SUM1, SUM3, ENCODER (PAR_IN) and OUT1 (SER OUT, des=port2).]
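The PID law can be approximated in discrete time. The following is a minimal Python sketch, assuming a fixed sample period `dt`, rectangular integration and a backward-difference derivative. The class name `DiscretePID`, its parameters and the gain values are illustrative assumptions and are not taken from the design tools shown in the figure.

```python
class DiscretePID:
    """Discrete approximation of U(t) = Kp*e + Kd*de/dt + Ki*integral(e)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0      # running approximation of the integral of e(t)
        self.prev_error = 0.0    # previous error sample, used for the derivative

    def update(self, error):
        # Rectangular (Euler) integration of the error
        self.integral += error * self.dt
        # Backward-difference estimate of de/dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and error samples, chosen only to exercise the code
pid = DiscretePID(kp=1.0, ki=0.5, kd=0.25, dt=0.5)
outputs = [pid.update(e) for e in (1.0, 1.0, 0.0)]
print(outputs)
```

The derivative term reacts only when the error changes, which is why the second output is smaller than the first even though the error is the same.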


The Assembler

The Assembler translates source code, written with an algebraic syntax, into object code. Variables, data buffers, and symbolic constants are defined with the Assembler directives.

LCNTR=r15, Do end_bfly until LCE;
    f8=f1*f6, f14=f11-f14, dm(i2,m0)=f10, f9=pm(i11,m8);
    f11=f1*f7, f3=f9+f14, f9=f9-f14, dm(i2,m0)=f13, f7=pm(i8,m8);
    f14=f0*f6, f13=f8+f12, f8=dm(i0,m0), pm(i10,m10)=f9;
end_bfly: f12=f0*f7, f13=f8+f12, f10=f8-f12, f6=dm(i0,m0), pm(i10,m10)=f3;

FFT Butterfly Core Example


Due to the following characteristics, highly efficient code can be achieved when an assembler is used:

Dedicated Purpose

Assembler is Hardware Slave

Moderate Data Size

Instruction Mnemonics, Address Labels

Simple Arithmetic Operations

High Speed

Moderate Ease of Writing and Development

Moderate Ease of Documentation

DSP Processor Development Cycle

[Figure: development flow chart using the cross-software programs. START, Define Target Hardware (System Builder), Assemble Module, Link, SIMULATE / EMULATE (repeat as necessary), PROM Splitter, Burn PROMs, Prototype Test, END. File types shown: .sys, .ach, .dsp, .obj, .cde, .int, .exe.]


The Simulator

Performs interactive, instruction-level simulation of the DSP processor code within the hardware configuration

Simulates interrupt and I/O handling

Flags illegal operations

Supports full symbolic assembly and disassembly

Displays the internal operations and status of the processor

Provides an easy-to-use, window-oriented graphical user interface with commands accessed from pull-down menus with a mouse


High Level Languages and Their Advantages

High-Level Languages are:

C (Compiler)

C++

DSP/C Compiler (Numerical C)

ADA

C Compiler and Runtime Library

Complies with the ANSI specification

Incorporates optimizing algorithms to speed up the execution of code

Includes an extensive runtime library with typically 100 standard and DSP-specific functions

Outputs DSP processor assembly language source code


DSP/C Compiler

Supports ANSI Standard (X3J11.1) Numerical C as defined by the Numerical C Extensions Group (NCEG)

Accepts C source input containing Numerical C extensions for:

Array Selection
Vector Math Operations
Complex Data Types
Circular Pointers
Variably Dimensioned Arrays

Outputs DSP processor assembly language source code


DSP HLLs Advantages are:

Hardware Transparent (Portability)

High Level Arithmetic Operations (Complex Math) or Use Library Routines e.g. sin(), fir(), fft()

Loops, Arrays, Labels, I/O Format

Searching and Sorting

Peripheral Intensive System

Relatively Fast Writing & Development

Ease of Documentation


Binary - Hexadecimal - Decimal Number Conversion Table

Decimal   Hexadecimal   Binary
0         0             0000
1         1             0001
2         2             0010
3         3             0011
4         4             0100
5         5             0101
6         6             0110
7         7             0111
8         8             1000
9         9             1001
10        A             1010
11        B             1011
12        C             1100
13        D             1101
14        E             1110
15        F             1111


Signed / Unsigned

Unsigned:
0000   0V    - FULL SCALE
FFFF   5V    + FULL SCALE

Signed:
8000   -5V   - FULL SCALE
0000   0V
7FFF   5V    + FULL SCALE

Bit layout: S/U U U U U U U U U U U U U U U U


2's Complement Representation

For 2's complement representation, the scale factor for the sign bit of a number is -(2^(M-1)), where M is the number of bits to the left of the binary point. For a 4.2 number, the sign scale is -(2^3) = -8.

Bit weights for a 4.2 number (the sign bit is the leftmost bit, the binary point follows 2^0):

-(2^3)   2^2   2^1   2^0   .   2^-1   2^-2

Example: 0101.01 = 0 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)

= 5.25

1101.01 = 1 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)

= -2.75
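The weighting rule can be checked with a few lines of Python. This is a minimal sketch: the function name `decode_twos_complement` and the string input format are assumptions made for illustration only.

```python
def decode_twos_complement(bits: str) -> float:
    """Decode a two's complement string such as '1101.01' (M.N format)."""
    int_part, _, frac_part = bits.partition(".")
    m = len(int_part)                      # number of bits left of the binary point
    digits = int_part + frac_part
    value = 0.0
    for position, bit in enumerate(digits):
        weight = 2.0 ** (m - 1 - position)
        if position == 0:
            weight = -weight               # the sign bit carries -(2^(M-1))
        value += int(bit) * weight
    return value


print(decode_twos_complement("0101.01"))   # 5.25
print(decode_twos_complement("1101.01"))   # -2.75
```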


Fractional versus Integer Notation

Fractional format is 1.15 notation (radix point after the sign bit):

S . F F F F F F F F F F F F F F F

Integer format is 16.0 notation (radix point after the last bit):

S I I I I I I I I I I I I I I I .

DSP is optimized for fractional notation

DSP supports integer notation


Ranges for 16 bit Formats

Format   Largest Positive Value (0x7FFF)   Largest Negative Value (0x8000)   Value of 1 LSB (0x0001)
         In Decimal                        In Decimal                        In Decimal
1.15     0.999969482421875                 -1.0                              0.000030517578125     (Fractional)
2.14     1.999938964843750                 -2.0                              0.000061035156250
3.13     3.999877929687500                 -4.0                              0.000122070312500
4.12     7.999755859375000                 -8.0                              0.000244140625000
5.11     15.999511718750000                -16.0                             0.000488281250000
6.10     31.999023437500000                -32.0                             0.000976562500000
7.9      63.998046875000000                -64.0                             0.001953125000000
8.8      127.996093750000000               -128.0                            0.003906250000000
9.7      255.992187500000000               -256.0                            0.007812500000000
10.6     511.984375000000000               -512.0                            0.015625000000000
11.5     1023.968750000000000              -1024.0                           0.031250000000000
12.4     2047.937500000000000              -2048.0                           0.062500000000000
13.3     4095.875000000000000              -4096.0                           0.125000000000000
14.2     8191.750000000000000              -8192.0                           0.250000000000000
15.1     16383.500000000000000             -16384.0                          0.500000000000000
16.0     32767.000000000000000             -32768.0                          1.000000000000000     (Integer)
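Every row of the ranges table follows from the weight of one LSB, 2^-N. The sketch below recomputes a few rows; the function name `format_range` is a hypothetical helper, not part of any tool described in these notes.

```python
def format_range(m: int, n: int):
    """Return (largest positive, most negative, LSB) for a signed 16-bit M.N format."""
    if m + n != 16:
        raise ValueError("M.N must describe a 16-bit word")
    lsb = 2.0 ** -n
    largest_positive = 0x7FFF * lsb        # 0x7FFF scaled by the LSB weight
    most_negative = -0x8000 * lsb          # 0x8000 read as -32768
    return largest_positive, most_negative, lsb


for m in (1, 4, 16):
    print(f"{m}.{16 - m}", format_range(m, 16 - m))
```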


Format Example

[Figure: an input range of -5 V to +5 V mapped onto 0x8000 (-5 V), 0x0000 (0 V) and 0x7FFF (+5 V), with five sample points marked 1 to 5.]

Point   Hex      16.0 FORMAT   1.15 FORMAT        Voltage
1)      0x7FFF   32767         0.999969482...     5 V
2)      0x3FFF   16383         0.499969482...     2.5 V
3)      0x0000   0             0.0000000...       0 V
4)      0xCCCD   -13107        -0.399993896...    -2.0 V
5)      0x8000   -32768        -1.0000000...      -5.0 V


Hexadecimal to Decimal Conversion

There are two methods for converting hexadecimal numbers to decimal numbers. One is easy and one is hard.

HARD WAY: Convert the hexadecimal number to binary. Place the binary point. Multiply each bit of the binary number by its associated scale factor.

Example: Convert 0x2A00 to a 1.15 twos-complement decimal value

0x2A00 = 0.010 1010 0000 0000
= 2^-2 + 2^-4 + 2^-6
= 0.25 + 0.0625 + 0.015625
= 0.328125 (approximately 0.33, or 1/3)

EASY WAY: Use a calculator to convert the hexadecimal number to decimal. Divide the decimal number by 2^N, where N is the number of bits to the right of the binary point.

Example: Convert 0x2A00 to a 1.15 twos-complement decimal value

0x2A00 = 10752
10752 / 2^15 = 10752 / 32768 = 0.328125


Decimal to Hexadecimal Conversion

There are two methods for converting decimal numbers to hexadecimal numbers. One is easy, and one is hard.

HARD WAY: Break the decimal number into its 2^N components.

Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format

Weight:   2^0 (sign)   2^-1   2^-2   2^-3   2^-4   2^-5   2^-6   2^-7
Value:    1            1/2    1/4    1/8    1/16   1/32   1/64   1/128
Bit:      0            1      1      0      1      0      0      0

0.8125 = 1/2 + 1/4 + 1/16 => 0110 1000 0000 0000 => 0x6800

EASY WAY: Multiply the decimal number by 2^N, where N is the number of bits to the right of the binary point. Then use a calculator to convert to hex.

Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format

0.8125 * 2^15 = 0.8125 * 32768 = 26624 = 0x6800
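Both "easy way" conversions for the 1.15 format are a single scale by 2^15. The following Python sketch illustrates them; the function names are hypothetical, and `round` is used for values that do not land exactly on a 1.15 step, which is an assumption rather than something these notes specify.

```python
def q15_hex_to_decimal(word: int) -> float:
    """Interpret a 16-bit word as a signed 1.15 fraction."""
    if word & 0x8000:              # sign bit set: undo the two's complement wrap
        word -= 0x10000
    return word / 2 ** 15


def decimal_to_q15_hex(value: float) -> int:
    """Scale by 2^15 and wrap into a 16-bit two's complement word."""
    if not -1.0 <= value < 1.0:
        raise ValueError("value outside the 1.15 range")
    return round(value * 2 ** 15) & 0xFFFF


print(q15_hex_to_decimal(0x2A00))          # 0.328125
print(hex(decimal_to_q15_hex(0.8125)))     # 0x6800
print(hex(decimal_to_q15_hex(-0.875)))     # 0x9000
```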


Binary Notation Mini-Quiz

1) What is 0x4000 (1.15 format) in signed decimal notation?

2) What is 0x4000 (16.0 format) in signed decimal notation?

3) What is 0x4000 (0.16 format) in unsigned decimal notation?

4) What is .875 in hex 1.15 Format?

5) What is -.875 in hex 1.15 Format?



Binary Notation Mini-Quiz Answer

1) What is 0x4000 in 1.15 signed notation? 0.5

2) What is 0x4000 in 16.0 signed notation? 16384

3) What is 0x4000 in 0.16 unsigned notation? 0.25

4) What is .875 in 1.15 Format? 0x7000

5) What is -.875 in 1.15 Format? 0x9000



Features Of ADSP-2100 Base Architecture

Modified Harvard Architecture

2 Data Address Generators

Advanced Program Sequencer

3 Arithmetic Units (ALU/MAC/Shifter)

Result Bus

ADSP-2100 Family Base Internal Architecture

[Figure: base architecture block diagram. Data Address Generator #1, Data Address Generator #2 and the Program Sequencer connect to the PMA bus (14 bits) and DMA bus (14 bits). The ALU, MAC and Shifter, each with input and output registers, connect to the PMD bus (24 bits) and DMD bus (16 bits) and share the R bus (16 bits).]


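ALU

ALU Block Diagram

[Figure: ALU block diagram. The AX registers (2 x 16) and AY registers (2 x 16) are loaded from the DMD bus (16 bits) and PMD bus (24 bits) and, through multiplexers, feed the X and Y inputs of the ALU together with the carry input CI. The result R goes to the AF register and to the AR register (16 bits), which drives the R bus. Status outputs: AZ, AN, AC, AV, AS, AQ.]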



ALU Features

4 Input Registers ( AX0, AX1, AY0, AY1 )

Feedback Paths ( AF, AR, MR0, MR1, MR2, SR0, SR1 )

Six Status Flags

Saturation

Provisions For Double Precision

Background Registers



ALU Instruction Examples

(Programmer's Quick Reference pgs 4-5)

AR = AX0 + AY0;

AF = MR1 XOR AY1;

AR = AX0 + AF;

IF GE AR = -AR;

IF AV AR = AY1 + 1;


ALU Instructions

[IF Condition] dest = xop + yop ;

[IF Condition] dest = xop + C ;

[IF Condition] dest = xop + yop + C ;

[IF Condition] dest = xop - yop ;

[IF Condition] dest = xop - yop + C - 1 ;

[IF Condition] dest = yop - xop ;

[IF Condition] dest = yop - xop + C - 1;

[IF Condition] dest = xop AND yop;

[IF Condition] dest = xop OR yop;

[IF Condition] dest = xop XOR yop;

[IF Condition] dest = PASS xop ;

[IF Condition] dest = PASS yop ;

[IF Condition] dest = PASS 0;

[IF Condition] dest = PASS 1;

[IF Condition] dest = - xop ;

[IF Condition] dest = - yop ;

[IF Condition] dest = NOT xop ;

[IF Condition] dest = NOT yop ;

[IF Condition] dest = ABS xop ;

[IF Condition] dest = yop +/-1 ;

DIVS yop , xop ;

DIVQ xop ;

XOP = [AR, MR0, MR1, MR2, SR0, SR1, AX0, AX1]

YOP = [AY0, AY1, AF]

dest = [AR, AF]

Examples: AR = AX0 + AY0;

AF = NOT AR;

AF = AX1 + AY0 + C;

ALU Status Flags

Flag   Name       Definition
AZ     Zero       Logical NOR of all bits in ALU result register. True if ALU output equals 0
AN     Negative   Sign bit of ALU result. True if ALU output is negative
AV     Overflow   XOR of carry outputs of the 2 most significant adder stages. True if ALU overflows
AC     Carry      Carry output from most significant adder stage
AS     Sign       Sign of ALU X input port. Affected only by ABS instruction
AQ     Quotient   Quotient bit generated only by DIVS and DIVQ

Arithmetic Conditions

EQ: ALU Result = 0
NE: ALU Result ≠ 0
GT: ALU Result > 0
GE: ALU Result ≥ 0
LT: ALU Result < 0
LE: ALU Result ≤ 0
NEG: XOP Input Negative (Absolute Value Instruction Only)
POS: XOP Input Positive (Absolute Value Instruction Only)
AV: ALU Overflow Bit Set
Not AV: ALU Overflow Bit Not Set
AC: ALU Carry Bit Set
Not AC: ALU Carry Bit Not Set
MV: MAC Overflow Bit Set
Not MV: MAC Overflow Bit Not Set
Not CE: Not Counter Expired

ALU Saturation


Sets ALU result to full-scale positive or full-scale negative if overflow or underflow occurs

Feature enabled by executing ena ar_sat (bit 3 of MSTAT)

Once enabled, affects every ALU operation

Only affects results sent to AR (AF - flags still get set)

Overflow or underflow is determined by the following conditions:

Overflow (AV)   Carry (AC)   AR Contents
0               0            ALU Output
0               1            ALU Output
1               0            0x7FFF (full-scale positive)
1               1            0x8000 (full-scale negative)

ALU Overflow Latch Mode

Causes AV status flag to become sticky. Need to explicitly clear.

Feature enabled by executing ena av_latch (bit 2 of MSTAT)
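The AV/AC saturation table can be exercised with a small Python model. This is a sketch under stated assumptions, not ADSP-2100 code: the function name `alu_add` is hypothetical, its `saturate` argument stands in for the ar_sat mode bit, and only 16-bit addition is modelled.

```python
def alu_add(x: int, y: int, saturate: bool = False):
    """Add two 16-bit words and return (AR, AV, AC)."""
    raw = (x & 0xFFFF) + (y & 0xFFFF)
    result = raw & 0xFFFF
    ac = raw >> 16                                  # carry out of bit 15
    # Overflow: both operands have the same sign and the result's sign differs
    av = ((x ^ result) & (y ^ result) & 0x8000) >> 15
    if saturate and av:
        # AV=1, AC=0 gives full-scale positive; AV=1, AC=1 gives full-scale negative
        result = 0x8000 if ac else 0x7FFF
    return result, av, ac


print([hex(v) for v in alu_add(0x7000, 0x7000, saturate=True)])
print([hex(v) for v in alu_add(0x9000, 0x9000, saturate=True)])
```

With saturation disabled the same inputs return the wrapped sum, while the AV and AC flags are still reported.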



ALU Mini-Quiz

Write The ADSP-2100 Code To Perform The Following Operations:

1) Add 0x0030 to 0x0070 And Store Result in AF.

Hint:

= 0x0070 ;

= 0x0030 ;

AF = + ;

2) Find The Logical AND Of 0x1234 And 0xF00F.

Store The Result In AR.


ALU Mini-Quiz

Write The ADSP-2100 Code To Perform The Following Operations:

1) Add 0x0030 to 0x0070 And Store Result in AF.

Hint:

AX0 (or AX1) = 0x0070 ;

AY0 (or AY1)= 0x0030 ;

AF = AX0 + AY0 ;

2) Find The Logical AND Of 0x1234 And 0xF00F.

Store The Result In AR.

AY1 = 0x1234;

AR = 0xF00F;

AR = AR AND AY1;





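MAC

MAC Block Diagram

[Figure: MAC block diagram. The MX registers (2 x 16) and MY registers (2 x 16) are loaded from the DMD bus (16 bits) and PMD bus (24 bits) and, through multiplexers, feed the X and Y inputs of the multiplier. The 32-bit product P goes to a 40-bit add/subtract stage whose result is held in MR2 (8 bits), MR1 (16 bits) and MR0 (16 bits), with the overflow flag MV. The MF register feeds back to the multiplier input, and the MR registers drive the R bus.]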

MAC Features

40 Bit Accumulator

Saturation

Complete Set of Background Registers

Mixed Mode Input Operands for Multiprecision

Feedback Paths

Access to R-Bus, DM and PM

MAC Instruction Examples


MR = MX1 * MY0 (SS);

MF = AR * MY1 (SS);

MR = MR + AR * MY1 (SS);

MR = 0;

IF MV SAT MR;

IF EQ MR = MX0 * MY0 (UU);


MAC Instructions

[IF condition] dest = xop * yop (format);

[IF condition] dest = MR + xop * yop (format);

[IF condition] dest = MR - xop * yop (format);

[IF condition] dest = 0;

[IF condition] dest = MR [ (RND)];

Where:

condition = arithmetic conditions

dest = {MR, MF}

format = {SS, US, SU, UU, RND}

XOP = {MX0, MX1, MR2, MR1, MR0, AR, SR0, SR1}

YOP = {MY0, MY1, MF}


Placement of Binary Point in Multiplication

Binary Integer Multiplication

M Bits x P Bits => M+P Bits

Example: 16.0 x 16.0 => 32.0

Mixed/Fractional Multiplication

M.N Bits x P.Q Bits => (M+P).(N+Q) Bits

Example: 1.15 x 1.15 => 2.30

4.12 x 1.15 => 5.27

Multiplication Modes on the ADSP-21xx

Mode 1: Fractional Mode

Multiplier assumes all numbers are in 1.15 format

Multiplier automatically left-shifts the product by 1 bit before accumulation (result forced to 1.31 format)

Example: MR = MX0 * MY1 (SS);

MX0 = 0x4000, MY1 = 0x4000

MR2 MR1 MR0 = 0x00 2000 0000 (MR2: overflow, MR0: underflow)

MR1 = 0x2000

Mode 2: Integer Mode

Multiplier assumes all numbers are in 16.0 format

No automatic left shift necessary

Example: MR = MX0 * MY1 (SS);

MX0 = 0x4000, MY1 = 0x4000

MR2 MR1 MR0 = 0x00 1000 0000 (MR2 and MR1: overflow)

MR0 = 0x0000

Multiplication on the ADSP-21xx

To Switch Modes:

ENA M_MODE; {Select Integer Mode} *
DIS M_MODE; {Select Fractional Mode}

MSTAT register holds the mode setting

Fractional mode is the default on reset/power-up

* Integer Mode Not Available on ADSP-2100A
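The difference between the two modes is only the 1-bit left shift of the product. A minimal Python model is given below, assuming signed (SS) operands; the function names `to_signed16` and `multiply` are hypothetical, and the accumulator and MR2 are not modelled.

```python
def to_signed16(word: int) -> int:
    """Read a 16-bit word as a two's complement integer."""
    return word - 0x10000 if word & 0x8000 else word


def multiply(x: int, y: int, fractional: bool = True) -> int:
    """Signed 16 x 16 multiply; returns the 32-bit product as an unsigned word."""
    product = to_signed16(x) * to_signed16(y)
    if fractional:
        product <<= 1          # 1.15 x 1.15 = 2.30, shifted left to 1.31
    return product & 0xFFFFFFFF


for x, y in ((0x1240, 0x0001), (0x4000, 0x4000), (0x4000, 0x0002)):
    print(f"{multiply(x, y, True):08X} {multiply(x, y, False):08X}")
```

For 0x4000 * 0x4000 this gives 0x20000000 in fractional mode (0.5 * 0.5 = 0.25 in 1.31) and 0x10000000 in integer mode (16384 * 16384).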

Rounding in the MAC



Rounding can be specified as part of multiply instruction (RND)

Rounding only applies to fixed point fractional results

40-bit results "rounded to nearest" 16 bit value.

Rounded result can be placed in MR1 or MF register

Input: MX0 = 0x7FF9, MY0 = 0xEEEE

Command MR2 MR1 MR0

MR = MX0 * MY0 (SS); FF EEEE EEFC

MR = MX0 * MY0 (RND); FF EEEF 6EFC
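The two rows of the rounding example can be reproduced with a small Python model of a fractional-mode multiply. It is a sketch only: the names `mac_multiply` and `split_mr` are hypothetical, rounding is modelled as adding 0x8000 (half an LSB of MR1) to the 40-bit result, and any special handling of exact ties by the hardware is not modelled.

```python
def mac_multiply(x: int, y: int, rnd: bool = False) -> int:
    """Fractional-mode signed multiply into a 40-bit MR value."""
    def signed16(w):
        return w - 0x10000 if w & 0x8000 else w

    mr = (signed16(x) * signed16(y)) << 1      # fractional mode: 1-bit left shift
    if rnd:
        mr += 0x8000                           # add half an LSB of MR1
    return mr & 0xFFFFFFFFFF                   # keep 40 bits (MR2:MR1:MR0)


def split_mr(mr: int):
    """Return (MR2, MR1, MR0) from a 40-bit value."""
    return (mr >> 32) & 0xFF, (mr >> 16) & 0xFFFF, mr & 0xFFFF


for rnd in (False, True):
    mr2, mr1, mr0 = split_mr(mac_multiply(0x7FF9, 0xEEEE, rnd))
    print(f"{mr2:02X} {mr1:04X} {mr0:04X}")
```

After rounding, MR1 holds the 16-bit result; MR0 is left with the residue of the addition.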

Saturation and Overflow



Overflow occurs when sign bit is corrupted during accumulation

Overflow Status signal (MV) is updated every time a MAC operation is

executed

MV is set when MSB of MR2 does not equal MSB of MR1

Saturation is performed by following instruction:

IF MV SAT MR

Input: MX0 = 0x7FFF, MY0 = 0x7FFF, MR = 00 7FFE 0002

Command MR2 MR1 MR0

MR = MR + MX0 * MY0 (SS); 00 FFFC 0004

IF MV SAT MR; 00 7FFF FFFF
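The accumulate-then-saturate example can be traced in Python. The sketch below follows the MV rule as stated on this slide (MSB of MR2 differs from MSB of MR1); the function names are hypothetical, and the negative saturation value 0xFF 8000 0000 is an assumption by symmetry with the positive case shown.

```python
def mac_accumulate(mr: int, x: int, y: int) -> int:
    """MR = MR + x*y in fractional mode; MR is held as a 40-bit word."""
    def signed(w, bits):
        return w - (1 << bits) if w & (1 << (bits - 1)) else w

    total = signed(mr, 40) + ((signed(x, 16) * signed(y, 16)) << 1)
    return total & 0xFFFFFFFFFF


def mv_flag(mr: int) -> bool:
    """MV per the rule above: MSB of MR2 (bit 39) differs from MSB of MR1 (bit 31)."""
    return ((mr >> 39) & 1) != ((mr >> 31) & 1)


def saturate(mr: int) -> int:
    """Model of IF MV SAT MR."""
    if not mv_flag(mr):
        return mr
    negative = bool(mr & (1 << 39))            # sign taken from the MSB of MR2
    return 0xFF80000000 if negative else 0x007FFFFFFF


mr = mac_accumulate(0x007FFE0002, 0x7FFF, 0x7FFF)
print(f"{mr:010X} {saturate(mr):010X}")
```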

MAC Mini-Quiz



Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply

the result by 0x20.

AX0 = 0x0020;

AY0 = 0x0010;

AR = _______________

___ = _______________

____=_______ * _________

Binary Multiply Mini-Quiz


What is the ADSP-21xx Multiplier Output? (Hint: The Output is 32 Bits Wide)

                    Fractional Mode   Integer Mode
0x1240 * 0x0001
0x4000 * 0x4000
0x4000 * 0x0002

MAC Mini-Quiz



Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply

the result by 0x20.

AX0 = 0x0020;

AY0 = 0x0010;

AR = AX0 + AY0;

MY0 = 0x20;

MR = AR * MY0 (SS);

Binary Multiply Mini-Quiz


What is the ADSP-21xx Multiplier Output? (Hint: The Output is 32 Bits Wide)

                    Fractional Mode   Integer Mode
0x1240 * 0x0001     0x0000 2480       0x0000 1240
0x4000 * 0x4000     0x2000 0000       0x1000 0000
0x4000 * 0x0002     0x0001 0000       0x0000 8000





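Shifter

Shifter Block Diagram

[Figure: shifter block diagram. The SI register (16 bits, loaded from the DMD bus) or a value from the R bus is selected by a multiplexer and feeds the shifter array and the exponent detector. The shift amount comes from the SE register (8 bits, with optional negate) or from the instruction. The 32-bit array output passes through OR/PASS logic into the SR1 and SR0 registers (16 bits each). Block exponent logic is also shown.]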

Shifter Features



16 Bit Input Value Gets Stored Anywhere in a 32 Bit Output Field

All Shift Instructions Execute in a Single Instruction Cycle

Specify Immediate Shift Value within Instruction or indirectly in

the SE register

Normalize, Denormalize, and Exponent Detect Instructions Used

For Block Floating Point and Floating Point Operations

Shifter Instruction Examples




SR = ASHIFT SI BY -3 (LO);

SR = LSHIFT AR BY 6 (HI);

SR = SR OR LSHIFT SR1 (LO);

Shifter Instructions

Shift Immediate Instructions


SR = [SR OR] ASHIFT xop BY data (alignment);

SR = [SR OR] LSHIFT xop BY data (alignment);

Shift By Value in SE Register

[IF condition] SR = [SR OR] ASHIFT xop (alignment);

[IF condition] SR = [SR OR] LSHIFT xop (alignment);

Where:

condition = Arithmetic Condition

xop = {SI, SR0, SR1, MR2, MR1, MR0, AR}

alignment = {HI, LO}

data = -32 ... 32

Arithmetic Shift Sign Extends Right Shifts

Logical Shift Zero fills Right Shifts

Left Shifts Are Always Zero Filled

Positive SE or data Values Shift Left

Negative SE or data Values Shift Right

NO "+" for Positive Shifts

Using the Shift Instructions


Placement of output depends on the HI/LO modifier, the SE register and the data value

Refer to Table 2.4 in the ADSP-21xx User's Manual

Example 1: SR = LSHIFT SI BY -12 (LO);

Before:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = xxxx xxxx xxxx xxxx
SR0 = xxxx xxxx xxxx xxxx

After:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = 0000 0000 0000 0000
SR0 = 0000 0000 0000 1110

Immediate Shift Instructions


Example 2: SR = LSHIFT SI BY -12 (HI);

Before:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = xxxx xxxx xxxx xxxx
SR0 = xxxx xxxx xxxx xxxx

After:
SI  = 1110 1010 0011 0101
SE  = xxxx xxxx
SR1 = 0000 0000 0000 1110
SR0 = 1010 0011 0101 0000

Shift Instructions with SE Register


Example 3: SR = LSHIFT SI (HI);

Before:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = xxxx xxxx xxxx xxxx
SR0 = xxxx xxxx xxxx xxxx

After:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = 0000 0000 0000 1110
SR0 = 1010 0011 0101 0000

Shift Instructions with OR Functionality


Example 4: SR = SR OR LSHIFT SI (HI);

Before:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = 0000 0000 0000 0000
SR0 = 0000 0000 0000 0101

After:
SI  = 1110 1010 0011 0101
SE  = 1111 0100 (-12)
SR1 = 0000 0000 0000 1110
SR0 = 1010 0011 0101 0101
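The four shift examples can be reproduced with a short Python model. This is a sketch, assuming the (HI) modifier references the input to the upper half of the 32-bit field and (LO) to the lower half; the function name `shift` and its arguments are hypothetical and the result is returned as one 32-bit value SR1:SR0.

```python
def shift(xop: int, amount: int, hi: bool, arithmetic: bool = False) -> int:
    """Place a 16-bit input in a 32-bit field and shift it; returns SR1:SR0."""
    value = xop & 0xFFFF
    if arithmetic and value & 0x8000:
        value -= 0x10000                   # ASHIFT sign-extends the input
    if hi:
        value <<= 16                       # HI: input referenced to SR1
    # Positive amounts shift left, negative amounts shift right
    value = value << amount if amount >= 0 else value >> -amount
    return value & 0xFFFFFFFF


si = 0xEA35
print(f"{shift(si, -12, hi=False):08X}")                 # Example 1: 0000000E
print(f"{shift(si, -12, hi=True):08X}")                  # Examples 2 and 3: 000EA350
print(f"{0x00000005 | shift(si, -12, hi=True):08X}")     # Example 4: 000EA355
```

The [SR OR] option is modelled by OR-ing the previous SR contents with the newly shifted value, as in the last line.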

Shifter Mini-Quiz



Write ADSP-2101 Code to:

Write 0x0034 into the AR register

Write 0x0012 into the SI register

Shift AR into the MS bits of SR0 (SR0 = 0x3400)

Shift SI into the LS bits of SR0

Hint: 4 Instructions SR1 = 0x0000, SR0 = 0x3412 When Done

Shifter Mini-Quiz Answers



Solution 1:

AR = 0x0034;

SI = 0x0012;

SR = ASHIFT AR BY 8 (LO);

SR = SR OR ASHIFT SI BY 0 (LO);

Solution 2:

AR = 0x0034;

SI = 0x0012;

SR = LSHIFT AR BY -8 (HI);

SR = SR OR ASHIFT SI BY -16 (HI);




Data Address Generator (DAG) Operations

Registered Indirect Addressing

Automatic Post-Modify of Address

Circular Buffering

DAG 1 Fetches/Stores to Data Memory

DAG 2 Fetches/Stores to Data or Program Memory

Bit-Reverser For FFT Support (DAG 1 Only)

Data Address Generator Block Diagram


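[Figure: DAG block diagram. The L registers (4 x 14), I registers (4 x 14) and M registers (4 x 14) are loaded from the DMD bus and selected by 2-bit fields from the instruction. An adder and the modulus logic compute the post-modified address, and the 14-bit address output passes through a bit-reverse stage (DAG1 only).]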

DAG Features

Data Fetch/Store Executes Simultaneously With Arithmetic Instruction

2 DAGS In Processor

4 Index Address Registers Per DAG

4 Modify Registers Per DAG

4 Length Registers Per DAG

Any Modifier Register in DAG can be Used With Any

Index Register in DAG

Example DAG Instructions

(Programmer's Quick Reference pg 10)

AX0 = DM(0X3800);

AX0 = DM(I0, M3);

MODIFY (I4, M5);

AX1 = DM(I2,M3), AY0 = PM(I4,M7);

MR=MR+MX0 * MY0 (SS), MX0 = DM(I2,M2), MY0 = PM(I6,M6);

Note: L Registers Must Be 0 If Circular Buffers Are Not Used


Modulo Addressing Example

Buffer occupies addresses H#0030 to H#0037

I0 = Current Address

M0 = Modify Value (3)

L0 = Buffer Length (8)

Base Address = H#0030

Address Sequence: 30, 33, 36, 31, 34, 37, 32, 35
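The address sequence follows from post-modifying the index and wrapping it back into the buffer. The Python sketch below reproduces it; the generator name `circular_addresses` is hypothetical, and the wrap is written as a plain modulo on the offset from the base address, which is an illustration of the result rather than a description of how the DAG hardware computes it.

```python
def circular_addresses(base: int, length: int, modify: int, count: int):
    """Yield the addresses produced by post-modify circular addressing."""
    address = base
    for _ in range(count):
        yield address
        # Post-modify, wrapping back into [base, base + length)
        address = base + (address - base + modify) % length


sequence = [f"{a:02X}" for a in circular_addresses(0x30, 8, 3, 8)]
print(sequence)
```

Because 3 and 8 share no common factor, all eight locations are visited once before the sequence repeats.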

Modulo Addressing Code Example


.VAR/DM/CIRC/ABS=0X30 Buff [8];   /* Define Buffer */
I0 = ^Buff;                       /* I0 = Start address of Buff */
L0 = %Buff;                       /* L0 = Length of Buff */
M0 = 3;                           /* Modify value = 3 */
AX0 = DM (I0, M0);                /* Fetch data at address 30 */
AY0 = DM (I0, M0);                /* Fetch data at address 33 */
AX1 = DM (I0, M0);                /* Fetch data at address 36 */
AY1 = DM (I0, M0);                /* Fetch data at address 31 */

Bit Reversal with the ADSP-2100 Family


Only available with DAG1

Enabled by setting bit 1 of the MSTAT register or by using the instruction ENA BIT_REV

Reverses all 14 bits of the address

Normal order: 13 12 11 10 09 08 07 06 05 04 03 02 01 00
Bit-reversed: 00 01 02 03 04 05 06 07 08 09 10 11 12 13

For an FFT of size 2^N, set the M register to 2*2^(14-N) *

* x2 because the FFT output has real and imaginary data interleaved

e.g. 256-point FFT = 2^8 FFT, M = 2*2^(14-8) = 2*2^6 = 128
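The 14-bit reversal and the modify value formula can be illustrated in Python. This is a minimal sketch: the function names `bit_reverse_14` and `fft_modify_value` are hypothetical, and the code models only the address arithmetic, not the DAG itself.

```python
def bit_reverse_14(address: int) -> int:
    """Reverse the order of the 14 address bits (bit 13 swaps with bit 0, and so on)."""
    reversed_bits = 0
    for _ in range(14):
        reversed_bits = (reversed_bits << 1) | (address & 1)
        address >>= 1
    return reversed_bits


def fft_modify_value(n: int) -> int:
    """M register value for a 2^N-point FFT with interleaved real/imaginary data."""
    return 2 * 2 ** (14 - n)


print(hex(bit_reverse_14(0x0001)))   # 0x2000
print(fft_modify_value(8))           # 128
```

Reversing twice returns the original address, which is a convenient sanity check for any bit-reversal routine.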

DAGS Mini-Quiz


Data Memory: five locations starting at DM(0x3800), each holding 0x1234

Write the ADSP-2101 instructions to find the sum of the N=5 numbers stored in data memory.

Hint:

Use multifunction instructions

Nine instructions total

3 instructions are repeated

Questions:

1) How many instruction cycles are required?

2) How many instruction cycles are required if N=100?

3) Is this an efficient use of the processor?

DAGS Mini-Quiz Answer

.module/boot = 0 dags_mini_quiz;

.var/dm/circ data_buf [5];

start:  i0 = ^data_buf;                      /* Load DAG Registers */
        l0 = %data_buf;
        m3 = 1;
        ar = dm (i0, m3);                    /* Load first data value */
        ay0 = dm (i0, m3);                   /* Load second data value */
        ar = ar + ay0, ay0 = dm (i0, m3);    /* Add and load third value */
        ar = ar + ay0, ay0 = dm (i0, m3);    /* Add and load fourth value */
        ar = ar + ay0, ay0 = dm (i0, m3);    /* Add and load fifth value */
        ar = ar + ay0;                       /* Last addition */

.endmod;

1) 9 Cycles

2) 104 Cycles

3) No, it would waste program memory

Program Sequencer Block Diagram

[Figure: program sequencer block diagram with the blocks Interrupt Controller (IRQ inputs), Condition Logic, Status Logic, Counter Logic (CE), Loop Stack (4 x 18), Loop Comparator, PC Stack (16 x 14), Program Counter, Increment, Next Address Source Select and Next Address Mux driving the PMA bus (14 bits). Inputs come from the instruction register and the DMD bus (16 bits).]

Program Sequencer Operations

Zero Overhead Looping



Conditional/Unconditional Branches

Interrupt Handling

Counter and Status Stacks

Next Instruction Address Generation

Program Sequencer Features

Automatic Operation, Transparent to User

Single Cycle Conditional Branches

4-Deep Loop, Counter Stack

16-Deep PC Stack

Sequencer Instructions

(Programmer's Quick Reference pgs 12)


[IF condition] JUMP destination;

[IF condition] CALL destination;

[IF condition] RTS;

[IF condition] RTI;

IF flag_condition CALL address;

IF flag_condition JUMP address;

SET / TOGGLE / RESET FLAG_OUT;

Where:

condition = Branch Condition
destination = {(I4), (I5), (I6), (I7), address}
flag_condition = {FLAG_IN, NOT FLAG_IN}

Program Loop Example


General Form:

DO LABEL UNTIL CONDITION

Example:

CNTR=10;
DO ENDLOOP UNTIL CE;
{ First Loop Instruction } ; (Address Pushed On PC Stack)
{ Next Loop Instruction } ;
ENDLOOP: { Last Loop Instruction } ; (Address Pushed On LOOP Stack)
{ First Instruction Outside Loop } ;

Interrupt Handling

Interrupts Can Be Generated By An External Interrupt Signal Or 2100 Family Peripherals (Timer, SPORT, HIP, etc.)

External Interrupts (IRQx) Can Be Level Or Edge Sensitive (ICNTL)

Interrupts Have Priority And Can Be Nested

Interrupts Can Be Masked (IMASK)

Interrupts Can Be Forced Or Cleared Under Software Control (IFC) *

Different Family Members Have Different Interrupt Vector Tables

Interrupt Vector Table Always Begins At PM Address 0x0000

* Except ADSP-2100A



Interrupts & Interrupt Vector Addresses

ADSP-2101

Interrupt Source              Interrupt Vector Address
Program startup at RESET      0x0000
IRQ2                          0x0004 (highest priority)
SPORT0 Transmit               0x0008
SPORT0 Receive                0x000C
SPORT1 Transmit / IRQ1        0x0010
SPORT1 Receive / IRQ0         0x0014
Timer                         0x0018 (lowest priority)

ADSP-2105

Interrupt Source              Interrupt Vector Address
Program startup at RESET      0x0000
IRQ2                          0x0004 (highest priority)
SPORT1 Transmit / IRQ1        0x0010
SPORT1 Receive / IRQ0         0x0014
Timer                         0x0018 (lowest priority)

Sequencer Mini-Quiz


Modify the answer of the DAGS Mini-Quiz to use a zero-overhead loop.

Assume N=100. Your program should require 9 Instruction Locations.

Write the ADSP-2101 Instructions to Find the Sum of the N=100 Numbers Stored in Data Memory at DM(0x3800)

[Figure: Data Memory holding the values (each shown as 0x1234), located at DM(0x3800)]

Sequencer Mini-Quiz Answer


.module/boot = 0 sequencer_mini_quiz;

.const buf_len = 100;

.var/dm/circ/abs=0x3800 data_buf [buf_len];

start:
i0 = ^data_buf; /*Load address of data buf */
l0 = %data_buf; /*Load length of data buf */
m3 = 1;
cntr = buf_len - 2; /*Load counter */
ar = dm (i0, m3); /*Load first data value */
ay0 = dm (i0, m3); /*Load second data value */
do add_loop until ce;
add_loop: ar = ar + ay0, ay0 = dm (i0, m3); /*Add and load next value */
ar = ar + ay0; /*Last addition */
rts;

.endmod;
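The register flow of the looped answer can be traced in Python. This is a sketch only: the function name is hypothetical, 16-bit overflow in AR is not modelled, and the `while` loop stands in for the hardware DO ... UNTIL CE mechanism.

```python
def sum_with_counter_loop(data):
    """Mimic the ar / ay0 flow of the zero-overhead loop answer."""
    n = len(data)
    if n < 3:
        raise ValueError("sketch assumes at least three values")
    reads = iter(data)   # stands in for dm(i0, m3) with m3 = 1
    cntr = n - 2         # cntr = buf_len - 2
    ar = next(reads)     # load first data value
    ay0 = next(reads)    # load second data value
    while cntr > 0:      # do add_loop until ce
        ar = ar + ay0    # add ...
        ay0 = next(reads)  # ... and load the next value
        cntr -= 1
    ar = ar + ay0        # last addition
    return ar
```

The loop body runs N - 2 times, so every one of the N values is read exactly once and the final addition picks up the value loaded on the last pass.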

ADSP-2100 Family Peripherals



Memory Interfacing

Timer

Serial Ports



ADSP-21xx Family Memory Interface

ADSP-2101 Basic System Configuration

[Figure: ADSP-2101 basic system configuration. The ADSP-2101 (CLKIN and XTAL driven by a clock or crystal, CLKOUT, VDD, GND, RESET, IRQ2, BR, BG, MMAP) connects through a 14-bit address bus and a 24-bit data bus, with the PMS, DMS, BMS, RD and WR strobes, to a boot EPROM (27C64, 27C128, 27C256 or 27C512, 150 ns), an optional program memory, and optional data memory and peripherals. Serial port 0 (SCLK, RFS, TFS, DT, DR) and serial port 1 (SCLK, RFS or IRQ0, TFS or IRQ1, DT or FO, DR or FI) each connect to an optional serial device.]

ADSP-21xx Family Memory Architecture


Varied Memory Configurations Across Family Members*

Core Can Access PM Twice and DM Once Per Instruction

PM and DM Buses Multiplexed Off Chip*

Can Perform One Off-Chip Access with No Cycle Penalty

On Chip PM Can Be Initialized Through Boot EPROM or Host Interface Port

External EPROM Can Store 8 Pages of Bootable Code.

Software Programmable Wait States

* Does not apply to ADSP-2100A

On Chip Memory Configurations For ADSP-21xx Processors

Processor       Program Memory RAM   Program Memory ROM   Data Memory RAM
ADSP-2100A      -                    -                    -
ADSP-2101       2k                   -                    1k
ADSP-2103       2k                   -                    1k
ADSP-2105       1k                   -                    1/2k
ADSP-2111       2k                   -                    1k
ADSP-2115       1k                   -                    1/2k
ADSP-21msp5x    2k                   2k                   1k
ADSP-2161/63    -                    8k/4k                1/2k
ADSP-2171/73    2k                   8k                   2k
ADSP-2181       16k                  -                    16k

ADSP-2101 Program Memory Architecture

Address Range     MMAP = 0 (Boot)                                       MMAP = 1 (No Boot)
0x0000-0x07FF     Internal PM RAM Booted From External Boot Memory      External Program Memory
0x0800-0x37FF     External Program Memory                               External Program Memory
0x3800-0x3FFF     External Program Memory                               Internal PM RAM Not Booted

The Reset Vector is at 0x0000.

ADSP-21xx Data Memory Architecture


Address Range     Contents
0x0000-0x03FF     1K External (DWAIT0)
0x0400-0x07FF     1K External (DWAIT1)
0x0800-0x2FFF     10K External (DWAIT2)
0x3000-0x33FF     1K External (DWAIT3)
0x3400-0x37FF     1K External (DWAIT4)
0x3800-0x3BFF     Internal Data Memory RAM
0x3C00-0x3FFF     Memory Mapped and Reserved Registers

[Figure note: the diagram marks a separate Internal Data Memory RAM region for the ADSP-2171]

ADSP-21xx Memory Control Registers


Data Memory Wait State Control Register DM(0x3FFE)

Fields: DWAIT4, DWAIT3, DWAIT2, DWAIT1, DWAIT0 (the figure shows the wait-state bits set to 1)

System Control Register DM(0x3FFF)

Fields:
PWAIT: Program Memory Wait States
BWAIT: Boot Memory Wait States*
BPAGE: Boot Page Select
BFORCE: Boot Force Bit

* 7 wait states for Boot Memory on ADSP-2171

Memory Mapped Control Registers vs. Status Registers

Memory Mapped Control Registers

> Physical locations in Data Memory

> Accessed by address

> Addresses 0x3C00 thru 0x3FFF (All Processors)

Status Registers (or Non-Memory Mapped Registers)

> Physical registers in the DSP

> Accessed by name

Memory Mapped Control Registers

> Mainly to set up the peripherals (i.e., mode of operation)

Status Registers

> Set up the operation of the DSP core (i.e., MAC, interrupts)

> Provide information about the DSP core (i.e., stacks, status flags)

Initialize Memory Mapped Registers before running (i.e., not on the fly)

Status Registers are meant to be used on the fly

Memory Mapped Control Registers

0x3FFF System Control Register - Wait states, Enable SPORTs
0x3FFE Data Memory Waitstate Control Register
0x3FFD-0x3FFB Timer Control Registers - Set Timer values
0x3FFA-0x3FF7 SPORT0 Multichannel Word Enable Register
0x3FF6 SPORT0 Control Register - clock, frame and data modes
0x3FF5 SPORT0 SCLKDIV - Divide down register for SCLK
0x3FF4 SPORT0 RFSDIV - Divide down register for internal RFS
0x3FF3 SPORT0 Autobuffer Control Register
0x3FF2-0x3FEF SPORT1 Control and Setup (same as SPORT0)
0x3FEF-0x3FEC Analog Control Registers (No SPORT1 autobuffer on msp5x parts)
0x3FEB-0x3FE9 NO REGISTERS
0x3FE8 HMASK Register - HIP mask for interrupts
0x3FE7-0x3FE6 HIP Status Registers - HSR7 and HSR6
0x3FE5-0x3FE0 HIP Data Registers

Status Registers

ASTAT ALU Status Flags, MAC Overflow Flag, Shifter Input Flag

SSTAT Stacks Overflow and Empty (Read-Only)

MSTAT Computation Modes, Miscellaneous Functions

ICNTL External Interrupt Sensitivity (edge/level) and Nesting

Bit 4: Interrupt Nesting (1 = Enable, 0 = Disable)
Bit 3: 0
Bit 2: IRQ2 Sensitivity (1 = Edge, 0 = Level)
Bit 1: IRQ1 Sensitivity (1 = Edge, 0 = Level)
Bit 0: IRQ0 Sensitivity (1 = Edge, 0 = Level)

IMASK Interrupt Enables - Masks the servicing of interrupts (1 = Enable, 0 = Disable; all bits shown as 0)

Bit 5: IRQ2
Bit 4: SPORT0 Transmit
Bit 3: SPORT0 Receive
Bit 2: SPORT1 Transmit or IRQ1
Bit 1: SPORT1 Receive or IRQ0
Bit 0: Timer

IFC Interrupt Force/Clear (Write-Only)
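An IMASK value can be assembled from named bits rather than written as a bare constant. The sketch below assumes the ADSP-2101 bit order read from the figure (Timer in bit 0 up to IRQ2 in bit 5); the dictionary keys and function name are hypothetical labels, not assembler symbols.

```python
# Assumed ADSP-2101 IMASK layout: bit 0 = Timer ... bit 5 = IRQ2.
IMASK_BITS = {
    "TIMER": 0,
    "SPORT1_RX_OR_IRQ0": 1,
    "SPORT1_TX_OR_IRQ1": 2,
    "SPORT0_RX": 3,
    "SPORT0_TX": 4,
    "IRQ2": 5,
}


def imask_value(*sources):
    """Return the IMASK word that enables the named interrupt sources."""
    value = 0
    for source in sources:
        value |= 1 << IMASK_BITS[source]
    return value
```

Enabling only the timer gives 0x1, the value used when the timer interrupt is the only one unmasked.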

Boot EPROM to Internal PM RAM

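[Figure: Booting Order. The boot EPROM is 8 bits wide; boot page 0 occupies 8k x 8 (0x0000-0x1FFF) and additional boot pages follow from 0x2000. The internal PM RAM is 24 bits wide, 2k x 24 (0x0000-0x07FF). Each 24-bit instruction word (bytes A, B, C) is assembled from successive 8-bit EPROM locations; the EPROM page also holds the page length.]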

ADSP-2101 Timer Block Diagram

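[Figure: Timer Block Diagram. TSCALE (8 bits), TPERIOD (16 bits) and TCOUNT (16 bits) connect to the 16-bit DMD bus. CLKOUT and Timer Enable drive the timer enable and prescale logic, which decrements TCOUNT; when TCOUNT reaches zero, the count register load logic reloads it and the timer interrupt is generated.]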

ADSP-2100 Family Timer Features

The ADSP-21xx programmable interval timer can generate periodic interrupts based on multiples of the processor's cycle time. The timer is not available on the ADSP-2100.

TCOUNT = dedicated count-down register

TPERIOD = reloads TCOUNT at interrupt

TSCALE = (number of clock ticks before TCOUNT decrements) - 1

TCOUNT is decremented every TSCALE+1 cycles. After TCOUNT expires, it is reloaded with the value in TPERIOD. One interrupt occurs every (TPERIOD + 1) * (TSCALE + 1) cycles.
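The interrupt-period formula can be written as a pair of small Python helpers. This is a sketch with hypothetical function names; it assumes the first interval is set by the initial TCOUNT value in the same way that later intervals are set by TPERIOD.

```python
def timer_period_cycles(tperiod, tscale):
    """Cycles between interrupts once TCOUNT is being reloaded from TPERIOD."""
    return (tperiod + 1) * (tscale + 1)


def first_interrupt_cycles(tcount, tscale):
    """Cycles until the first interrupt, set by the initial TCOUNT value."""
    return (tcount + 1) * (tscale + 1)


def cycles_to_seconds(cycles, cycle_time_s):
    """Convert a cycle count to time for a given instruction cycle time."""
    return cycles * cycle_time_s
```

With TSCALE = 0, a TCOUNT of 49 gives a first interrupt after 50 cycles and a TPERIOD of 99 gives one interrupt every 100 cycles thereafter.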

ADSP-2101 Timer Registers


Address   Register   Description
0x3FFD    TPERIOD    Period Register (bits 15-0)
0x3FFC    TCOUNT     Counter Register (bits 15-0)
0x3FFB    TSCALE     Scaling Register (bits 7-0; bits 15-8 are 0)

ENABLING THE TIMER



1. Set values for TCOUNT, TPERIOD, and TSCALE.

2. Set bit 0 in IMASK to 1 to enable interrupt.

3. Execute "ena timer" instruction to start counting down.

(Bit 5 in MSTAT register)

Example Setup Code for Timer

i0 = 0x3ffb; /*i0 points to TSCALE*/



m0 = 1; /*modify value is 1 */

l0 = 0; /*not a circular buffer */

dm(i0,m0) = 0; /*set TSCALE to decrement every cycle*/

dm(i0,m0) = 49; /*to generate first interrupt at 50 cycles*/

dm(i0,m0) = 99; /*to reload TCOUNT with 99 at interrupt*/

IMASK = 0x1; /*enables the timer interrupt*/

ena timer; /*starts the count down after executing this*/

TIMER MINI-QUIZ


1. Write code to generate a timer interrupt after 50 cycles the first time, and every 75 cycles thereafter (any decrement that works).

2. Write code to generate a timer interrupt every 300 ms. Assume the clock is 16.67 MHz.

3. What is the longest time you can set the timer for if you have a 12.5 MHz clock? What would the values of TSCALE, TCOUNT, and TPERIOD be?

TIMER MINI-QUIZ ANSWER

1.

i0 = 0x3ffb; /*same first 3 lines*/
m0 = 1;
l0 = 0;
dm(i0,m0) = 0; /*set tscale = 0: decrement every cycle*/
dm(i0,m0) = 49; /*set tcount = 49: first interrupt after 50 cycles*/
dm(i0,m0) = 74; /*set tperiod = 74: interrupt every 75 cycles*/
imask = 0x1;
ena timer;

2.

/*same first 3 lines*/
/*300ms = 5,000,000 cycles*/
dm(i0,m0) = 0xF9; /*TSCALE = 249: decrement every 250 cycles*/
dm(i0,m0) = 0x4E1F; /*TCOUNT = 19,999: 20,000 counts*/
dm(i0,m0) = 0x4E1F; /*TPERIOD = 19,999: 20,000 counts*/
imask = 0x1;
ena timer;

3.

/*same first 3 lines*/
/*A 12.5 MHz processor yields an 80 ns instruction cycle time. TCOUNT and TPERIOD are 16 bit registers - the largest number they can represent is 65535. TSCALE is an 8 bit register, so the largest number it can represent is 255. Following the equation (TSCALE+1)*(TPERIOD+1) gives us 0x1000000 cycles per timer interrupt. This number multiplied by 80 ns is 1.3422 seconds*/
dm(i0,m0) = 0xff;
dm(i0,m0) = 0xffff;
dm(i0,m0) = 0xffff;
imask = 0x1;
ena timer;
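Register values for a required interval can also be searched for mechanically. The sketch below (hypothetical function name) looks for an exact factorisation of the cycle count into (TSCALE + 1) * (TPERIOD + 1), trying the smallest prescale first. It may return a different pair from a hand-picked answer, since any pair with the right product works.

```python
def timer_settings(cycles):
    """Return (tscale, tperiod) with (tscale+1)*(tperiod+1) == cycles, or None."""
    for tscale in range(256):            # TSCALE is an 8 bit register
        divisor = tscale + 1
        if cycles % divisor != 0:
            continue
        tperiod = cycles // divisor - 1
        if 0 <= tperiod <= 0xFFFF:       # TPERIOD is a 16 bit register
            return tscale, tperiod
    return None
```

For 5,000,000 cycles the search settles on TSCALE = 79 and TPERIOD = 62,499, which produces the same interval as 249 and 19,999.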



ADSP-21xx Serial Port



ADSP-21xx Serial Port UART

ADSP-2101 Serial Port Block Diagram

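[Figure: Serial Port Block Diagram. The 16-bit DMD bus connects, through the companding hardware, to the TXn transmit data register and the RXn receive data register. TXn loads the transmit shift register, which drives DT; DR feeds the receive shift register, which loads RXn. The serial control block handles TFS and RFS, and the internal serial clock generator drives SCLK.]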

ADSP-21xx SPORT Features



ADSP-21xx SPORTs Are Used For Synchronous Communication

Full Duplex

Fully Programmable

Autobuffer Capability

Multi-Channel Capability

Data Rates Up To 13 Mbits/sec

2171 Data Rates Up To 20 Mbits/sec

Examples of Serial Port Implementation

Connecting a CODEC to the Serial Port (2101 to TP3053 CODEC)

Connecting Two 2101's Together (2101 to 2101)

Using the Serial Port as a UART (2101 with software UART, through an AD233 RS-232 driver, to a PC)

ADSP-21xx SPORT Hardware

SPORT Has 5 Wires

SCLK: Serial Clock

RX: Data Receive

TX: Data Transmit

TFS: Transmit Frame Sync

RFS: Receive Frame Sync

[Figure: ADSP-21xx connected to a serial device by SCLK (serial clock), TFS (transmit frame sync), RFS (receive frame sync), RX (receive data) and TX (transmit data)]

ADSP-21xx SPORT Software

Access Serial Port Data By Accessing SPORT Data Registers:

TX0, TX1, RX0, RX1

Configure Serial Port Through Memory Mapped Control Registers:

System Control Register **

SPORT Control Register **

SPORT SCLKDIV Register

SPORT RFSDIV Register

SPORT Autobuffer Control Register

SPORT0 Multichannel Enable Registers

** Required to Configure SPORTs

Synchronize SPORT Transfers and Processor Operation With Interrupts

Each SPORT is Allocated a Transmit and Receive Interrupt

The Base Architecture of Floating-Point DSP Processor

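[Figure: base architecture of the floating-point DSP processor. DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), program sequencer, cache (32 x 48), timer, JTAG test & emulation and bus connect sit on the PMA bus (24 bits), DMA bus (32 bits), PMD bus (48 bits) and DMD bus (40 bits). The register file (16 x 40) feeds the Fl P/Fx P ALU, the multiplier / Fx P MAC and the Fl P/Fx P 32-bit barrel shifter.]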

IEEE Compatibility (IEEE Floating Point Standard 754/854)

Data Formats

32-Bit Single-Precision IEEE Floating Point (23-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
40-Bit Extended Single-Precision IEEE Floating Point (31-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
32-Bit Fixed Point (Integer and Fractional) With 80-Bit Accumulation

Rounding

Round-to-Nearest (Unbiased Rounding)
Round-Toward-Zero (Truncation)

IEEE Exception Handling

Overflow
Underflow
Equals Zero
Divide-by-Zero
Interrupt on Exception or Latched Status
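The 32-bit single-precision layout (sign bit, 8-bit exponent, 23-bit mantissa) can be inspected from Python with the standard `struct` module. This is an illustrative sketch with a hypothetical function name; it shows the IEEE 754 field split only, not the 40-bit extended format.

```python
import struct


def float32_fields(x):
    """Split x, encoded as IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits, hidden leading 1 not stored
    return sign, exponent, mantissa
```

For example, 1.0 has sign 0, a biased exponent of 127 and an all-zero mantissa.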

Floating-Point Multiplier/MAC

[Figure: register file (16 x 40) feeding the Fl P/Fx P ALU, the multiplier / Fx P MAC and the Fl P/Fx P 32-bit barrel shifter]

Example Multiplier/MAC Instructions

F1 = F5 * F7

R2 = R3 * R8 (SSF)

MRF = MRF + R5 * R0 (UUIR)


Example Multi-Function Instructions


IF EQ F1 = ABS F8, F9 = DM (I0,M4)

F8 = F1*F6, F3 = F9+F14, F9 = F9-F14, DM(I2,M0) = F10, PM(I10,M10) = F3

The System Architecture

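[Figure: system architecture. The DSP processor (CLKIN driven by a 1x clock, RESET, IRQ3-0) connects to program memory through PMS1-0, PMRD, PMWR, PM A (24 bits), PM D (48 bits), PMPAGE, PMACK and PMTS, and to data memory and peripherals through DMS3-0, DMRD, DMWR, DM A (32 bits), DM D (40 bits), DMPAGE, DMACK and DMTS. Each memory and peripheral presents Selects, OE, WE, ADDR and DATA pins; the peripherals also return ACK.]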

The Complete Architecture

[Figure: noisy analogue signal -> S/H -> A/D -> digital processor (discrete filter processing) -> D/A -> clean analogue signal. The sample-and-hold runs at fs and produces discrete-time values.]

What is a Real Time Application?

Real Time is a misleading expression. However, it means that the DSP system can process the required algorithm within a specified time.

[Figure: radar signal x(t) -> S/H -> A/D -> digital processor (Fourier Transform) -> display of x(f), showing components at f1 and f2; sampling at fs]
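For a sample-by-sample algorithm, the "specified time" is commonly taken to be one sample period, 1/fs. The check below is a sketch under that assumption; the function name and the example figures (cycle counts, clock, sample rate) are hypothetical.

```python
def meets_real_time(cycles_per_sample, clock_hz, sample_rate_hz):
    """True if one sample is fully processed before the next one arrives."""
    processing_time = cycles_per_sample / clock_hz  # seconds of work per sample
    sample_period = 1.0 / sample_rate_hz            # seconds between samples
    return processing_time <= sample_period
```

With a 12.5 MHz instruction rate and fs = 48 kHz there are about 260 cycles available per sample, so a 200-cycle algorithm fits and a 300-cycle one does not.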

Real Time Operating Systems as an Ideal Environment for Embedded Applications

The current DSP processors:

Are more than high-performance signal-processing engines

Provide a more regular instruction set, with plenty of address space to run large programs

Come with efficient C compilers that rival those of general-purpose microprocessors

[Figure: DSP embedded applications running on a DSP RTOS: fax tasks, telephone tasks, speech recognition tasks, sound generation tasks and answering machine tasks]

[Figure: DSP RTOS architecture. Services: DSP memory management, real-time multi-tasking, DSP stream I/O and DSP event handling. Resources: memory segments, processor segments and peripheral devices.]

Features for Real Time Operating Systems

Operating System Features     BOS       Nucleus   RXTC      SPOX       Helios
Preemptive Task Scheduling    Yes       Yes       Yes       Yes        Yes
Time-Sliced Scheduling        Yes       Yes       Yes       No         Yes
Round-Robin Scheduling        ?         Yes       Yes       No         Yes
Parallel Processing           No        No        No        Optional   Yes
Inter-Task Messages           Yes       Yes       Yes       Yes        Yes
Memory Management             Yes       Yes       Yes       Yes        Yes
Interrupt Management          Yes       No        Yes       Yes        Yes
Timer Management              Yes       Yes       Yes       No         Yes
Device-Independent I/O        No        No        No        Yes        Yes
Stream I/O                    $495*     No        No        Yes        Yes
OS RAM/ROM Size (Bytes)       5K-40K    4K-20K    12K-16K   44K+       80K-200K

Please contact the vendors listed above for the best and most up-to-date information

Compression Techniques and a Compressor and De-Compressor Generator

The CCITT/ISO Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG) digital image compression algorithms are in strong demand for:

Multimedia
Video Editing
Colour Publishing and Graphics Arts
Image-Processing, Storage and Retrieval
Colour Printers, Scanners and Copiers
High-Speed Image Transmission Systems for LANs, Modem and Colour Facsimile
Digital Cameras

These algorithms may be implemented in real time as:

A) A dedicated Chip (Compressor)

Company        Product name           Compression ratio    DCT Table   Huffman Table   Quantisation Table   Price
Fast Forward   Outlaw Digital Video   from 4:1 to 10:1     Board: / Disc 0.5 GByte                          4700 / 950
C-Cube         CL 550 En-/Decoder     from 8:1 to 100:1    static      program         program              80
C-Cube         CL 650 En-/Decoder     from 1:1 to 50:1     static      program         program              200
Winbond        W9930 En-/Decoder      from 8:1 to 100:1    static      static          program              29
LSI Logic      L64702                 *                    program     program         program              60

B) DSP Processor + Compressor


[Figure: uncompressed data <-> DSP processor with DCT / IDCT compressor <-> compressed data]

DCT: Discrete Cosine Transform

C) Software Solution (DSP C / Assembler code)

Company                    Processor type                Data Bits   Operation frequency   Benchmarks
Optibase                   Motorola 56002                24          40 MHz                *
Atlanta Signal Processor   Texas Instruments TMS320C31   32          16 MHz                64 KB Grey scale: 700 ms
Sonitech International     Texas Instruments TMS320C3x   32          16 MHz                400 Kbytes/s b & W; 540 Kbytes/s Colour
Atlanta Signal Processor   Analog Devices 21020          32          33 MHz                500 Kbytes/s b & W
Zoran Corp                 Zoran ZR38000                 16          25 MHz                440 Kbytes/s b & W


Compressor-De-Compressor Generator

[Figure: compressor/de-compressor generator. Inputs: JPEG parameters, MPEG parameters, processing rate (n million pixels/second), compression rate (1:1 to 80:1), quantizer and Huffman tables, and n-bit gray scale, RGB, CMYK, 4:4:4:4 or YUV colour space I/O. The code and model generator produces C or assembly code and models for Comdisco SPW, Hyperception HW and Momentum FDAS.]

Performance Measures

Two measures are used commonly:

MIPS: Millions of Instructions Per Second

This is a measure of raw instruction execution rate without specifying the nature of the computations.

MFLOPS: Millions of Floating Point Operations Per Second

This is a measure useful in assessing computations in floating point format.

The difference between MIPS and MFLOPS can be appreciated by considering a simple DO LOOP high level language construction:

DO I = 1 TO 1000000 STEP 1
BEGIN
Z(I) = X(I) * Y(I) + C(I);
END

Each iteration accomplishes two floating point operations, yet depending on the host computer the compiled assembly language code could occupy many bytes. The speed of execution of the two floating point operations therefore depends on the MIPS of the processor; provided that each iteration could be completed in, say, a microsecond, the processor would then execute at the rate of two MFLOPS. A system of a giga (one thousand million) processors could conceivably do all the iterations at once and attain a performance of two giga MFLOPS.

Despite its widespread use, MIPS is perhaps the poorest definition of performance since it contains no quantifiable attributes for assessing useful processing.

The term FLOPS is widely used in signal processing applications and is a common measure of performance in comparing processors.
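The relation between iteration time and MFLOPS is simple arithmetic, sketched here in Python (the function name is hypothetical). Two floating point operations per microsecond correspond to 2 MFLOPS; the same two operations per nanosecond would be 2000 MFLOPS.

```python
def mflops(flops_per_iteration, seconds_per_iteration):
    """Floating point operations per second, in millions."""
    return flops_per_iteration / seconds_per_iteration / 1e6
```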

Data Flow Bottle-necks & Solutions; Pipeline & Parallel Architectures With Examples

[Figure: Bottle-neck Of a Shared Instruction/Data Bus in a Von-Neumann Machine. Instructions and data held in memory share a single bus to the processor, which produces the data output.]

[Figure: The First Generation µP Architecture. An ALU with TMP and ACCUM registers, general purpose registers, program counter and address register, and a control & timing block, connected by a data bus and an address bus to a memory holding instructions and data.]

Each instruction is a new event; it is fetched, decoded, and executed.

The Assembly Language Commands Help To Execute Lengthy Manipulations On Designated Strings Of Data.

The Programmer Must Code Iterative Loops Or Use Other Mechanisms To Enhance Performance While Constrained By The Basic Limitations.

At The Algorithmic Level, Many Sequences Of Operations Have Little Or No Precedence Relationships.

Overview of the Pipeline Approach

The simplest view of a pipeline is that each stage consists of combinational logic driven by an input register. The output from a stage is captured by the input register of the following stage. Each stage has a delay for the initial data capture and subsequent processing.

It is possible to construct two types of pipeline system:

i) Synchronous Pipeline

If all stages have an equal delay, then a synchronous clock can transfer results into each input register. This is the simplest control problem.

ii) Asynchronous Pipeline

If there is a large discrepancy between the various delays in each stage, then an asynchronous data transfer might be in order. Here the intermediate registers are omitted. The design of such pipes requires careful timing of data input/output.

The following figure shows a simple Pipeline DSP System.

[Figure: Simple Pipeline DSP System. AD converter -> stage j-1 -> stage j -> stage j+1 -> DA converter. Each stage is a DSP (ADSP-2181) acting as the combinatorial logic behind an input register.]

When can the Pipeline Approach be considered?

In general a pipeline can be considered if:

i) The procedure can be broken into a sequence of discrete steps,

ii) The steady state data flow matches the remainder of the system, &

iii) Components can be found which implement the steps with the desired response.

How can the performance of the pipeline be measured?

A synchronous pipeline produces a result every clock period t, i.e. a data-flow rate of 1/t outputs per second. An N-stage pipeline gives an apparent N-fold increase in performance. If the input to the pipeline is intermittent, however, then some stages will not be processing valid data, and this must be accounted for by the control mechanism. If, on the average, only a fraction P of the total stages are occupied, then the data flow falls to P/t outputs per second.
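These throughput figures can be expressed as two small Python helpers. This is a sketch with hypothetical function names; the latency helper adds the assumption that one item needs N clock periods to traverse all N stages.

```python
def pipeline_output_rate(clock_period, occupancy=1.0):
    """Outputs per second: 1/t when full, P/t when a fraction P is occupied."""
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must lie between 0 and 1")
    return occupancy / clock_period


def pipeline_latency(stages, clock_period):
    """Time for one item to pass through every stage (assumed N * t)."""
    return stages * clock_period
```

Note that the N-fold gain applies to throughput only; the time for any single item to emerge is not reduced.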

Question:

In the following figure, a sequence of procedures is assumed each to process data in time t, except for the FFT procedure which consumes 8t. Given that all the mechanisms for increasing throughput (i.e. for decreasing t) have been exhausted, what are the alternatives to enhance DSP performance?

[Figure: Sequential Data Flow. P1 (time t) -> P2 = FFT (time 8t) -> P3 (time t)]



Overview of the Parallel Approach

The simplest view of a parallel approach is that the input data is fed to the units sequentially via the input commutator, and the output commutator collects the result data after the processors have executed simultaneously.

The following figure shows a simple Parallel DSP System.

[Figure: Simple Parallel DSP System. AD converter -> input commutator -> parallel DSP units (ADSP-2181) -> output commutator -> DA converter]

When can the Parallel Approach be considered?

In general a parallel approach can be considered if:

i) The procedure can not be broken into a sequence of discrete steps, &

ii) The steady state data flow does not need to be constrained.

Note: The input/output commutation is usually difficult to implement and consumes

some overhead which lowers the effective throughput.


How can the performance of the parallel approach be measured?

A parallel array need not have an identical delay in each path, though this complicates the control problem. If each of N units has a delay ti, then the average delay could be used to compute data-flow. For N parallel paths the response will be shown to be the same as an N-stage pipeline. If a proportion P of units is unused then the output rate drops.

The overall behaviour is therefore identical to that of a pipeline, although implementation issues are widely different.

Answer (continued):

The final resort to enhance DSP Performance is in the form of Multiplicity:

b) Parallel Array of Processing Units

In this case the individual processors still operate with a response time of 8t. The input commutator sequentially allocates input data, which is collected 8t later by the output commutator.

[Figure: Parallel Data-Flow Solution. P1 (time t) -> input commutator -> 8 parallel FFT units (time 8t each) -> output commutator -> P3 (time t). Bandwidth in = 1/t, bandwidth out = 1/t.]
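The commutation scheme can be checked with a short simulation. The sketch below (hypothetical function name, time measured in units of t) hands each arriving item to the next unit in round-robin order and records when its result appears.

```python
def simulate_commutator(items, units=8, unit_time=8, arrival_period=1):
    """Return completion times when inputs are dealt round-robin to the units."""
    free_at = [0] * units                # time at which each unit becomes idle
    completions = []
    for k in range(items):
        arrival = k * arrival_period     # one new input every t
        unit = k % units                 # input commutator position
        if free_at[unit] > arrival:
            raise RuntimeError("unit still busy: too few units for this rate")
        free_at[unit] = arrival + unit_time
        completions.append(free_at[unit])  # collected by the output commutator
    return completions
```

With eight units of response time 8t, results emerge once every t after an initial delay of 8t. With only four units the fifth input finds its unit still busy, which is the bottleneck reappearing.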

Example: FFT with Serial, Pipelining and Parallel Butterflies

The FFT provides a good example of the use of alternative signal-processing architectures to improve throughput.

The key comparison is:

i) That of butterfly time, tB, &

ii) The time, (N/2) tB log2N, to cycle through all butterflies of an FFT.

The interval, tB, includes the butterfly computation time and any overhead in address generation or looping.

Realistic alternatives to consider are:

Serial (direct)

Pipeline log2N stages deep, with N/2 steps

Parallel N/2 butterfly processors, iterate log2N times

Serial (direct)

A single processor computes each butterfly, one step at a time.

DO 20 J = 1, log2N
DO 10 I = 1, N/2
10 CONTINUE
20 CONTINUE

[Figure: The Serial (Classic) Approach. The computation flow for 8 data points: one processor computes butterflies t1 to t12 in sequence, running the inner loop DO I = 1, N/2 once for each pass.]

Pipeline log2N stages deep, with N/2 steps

Here there are log2N butterfly processors, corresponding to the number of passes (3 in the case of 8 data points: B1, B2, B3); each is used to compute the butterflies pertinent to its pass in series; as each pass is computed, the processors are ready to accept a pair of inputs for the next pass, and when the pipeline is full (steady state), a set of outputs will be produced by each pass (N/2 computations).

[Figure: The Pipeline Approach. log2N butterfly processors (B1 - B3) in a pipeline; each processor computes its N/2 butterflies at steps t1 to t4.]

Parallel N/2 butterfly processors, iterate log2N times

Here there is one processor for each of the N/2 steps per pass; all butterflies for that pass are computed at the same time; as soon as one pass is completed, all are ready for the next pass; in the steady state, there will be an output for every computation cycle.

[Figure: The Parallel Approach. N/2 butterfly processors (B1 - B4) in parallel; every processor runs DO J = 1, log2N, computing one butterfly per pass at steps t1 to t3.]

Q. Summarize the differences between the serial, pipeline, and parallel architectures for the FFT example in terms of the computation time and the number of butterfly processors. Consider a 1024-point FFT: what are the time and hardware costs for the three architectures?

A.

Architecture   Computation Time   Number of Butterfly Processors
Serial         (N/2) log2N        1
Pipeline       N/2                log2N
Parallel       log2N              N/2

The 1024-point FFT costing:

Architecture   Computation Time   Number of Butterfly Processors
Serial         5,120              1
Pipeline       512                10
Parallel       10                 512
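The costing can be reproduced for any power-of-two FFT size. In this Python sketch (hypothetical function name) times are counted in butterfly intervals and hardware in butterfly processors.

```python
def fft_costs(n):
    """Return {architecture: (butterfly times, butterfly processors)}."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    if n < 2 or (1 << stages) != n:
        raise ValueError("n must be a power of two, at least 2")
    half = n // 2
    return {
        "serial": (half * stages, 1),
        "pipeline": (half, stages),
        "parallel": (stages, half),
    }
```

In every case the product of time and processor count is (N/2) log2N, so the three architectures trade hardware for time without changing the total work.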

High Performance System Classification Scheme

There have been many attempts to classify processor architectures. A standard classification scheme would be exceedingly useful both for discussion purposes and as a guide to processor designs. The requirements for such a scheme are at least that:

It be complete (i.e., include all architectures) and

Orthogonal (i.e., differentiate the key attributes).

Unfortunately, despite the attractiveness of the concept, no such scheme exists. Of the many proposals, one forms the basis of many others. It is neither complete nor orthogonal, yet its elegance and intrinsic simplicity are attractive and it does concentrate on data flow and control in a general way.

The basis of the scheme is that a processor processes data by a sequence of instructions regardless of the format and mechanisms whereby each arrives at the point of action. Based on the concept of a data stream and an instruction stream, four possibilities exist:

SISD - Single Instruction Single Data Stream

SIMD - Single Instruction Multiple Data Stream

MISD - Multiple Instruction Single Data Stream

MIMD - Multiple Instruction Multiple Data Stream

Note that both the Babbage and Von Neumann architectures are SISD, although they differ greatly in implementation. The performance of such a configuration can be thought of as unity for purposes of comparison. Examples are shown in the following figures.

Q. With the aid of appropriate diagram(s), show how the four categories in Flynn's taxonomy can be emulated on a dual processor shared-memory system. Your diagrams must clearly show the IS and DS from and to the various units.

Answer:

[Figure: SISD, SIMD, MISD and MIMD configurations, drawn with instruction streams I (I1 to I4) and data streams D (D1 to D4) between data in and data out; two versions (Version 1 and Version 2) are shown.]

Discussion on the classification scheme

The SIMD architecture is an example of a parallel array in which each processing unit executes the same instruction. It can achieve an n-fold increase in data flow band-width for each instruction, provided that the units can be continuously utilized.

The original motivation for developing SIMD array processors was to perform parallel computations on vector or matrix types of data. Parallel processing algorithms have been developed by many computer scientists for SIMD computers. Important SIMD algorithms can be used to perform matrix multiplication, Fast Fourier Transform (FFT), matrix transposition, summation of vector elements, matrix inversion, parallel sorting, linear recurrence, Boolean matrix operations, and to solve partial differential equations.

The MIMD architecture is implemented by a multiple processor system. Clearly implied is some form of cooperative network to share a computational task (completely autonomous units being of little interest). This is an example of a parallel array in which the task assigned to each processor can be different. The performance enhancement potential is equal to the number of processors.

The MISD architecture is not widely implemented in practice and substantial disagreement exists on its exact structure. It is considered here as a pipeline in which a single data stream is modified at successive stages, and its performance enhancement potential equals the number of stages as shown in the previous section.

There is a relationship between these classifications and the structure of processing algorithms. An algorithm may contain a collection of processing tasks which could optimally be assigned to different processing configurations to achieve an overall higher performance. If components were of sufficiently low cost, a solution might be to build a conglomerate of different processing architectures and utilize the optimum one at appropriate points in the algorithm. The task assignment problem here is formidable; in addition, the physical complexity and lowered reliability of such a conglomerate of components is a major limiting factor of such a scheme. This will be discussed in more detail later.

SIMD Matrix Multiplication & SIMD FFT

To be found in the following References:

*) G Barnes, et al., "The ILLIAC-IV Computer," IEEE Trans. on Computers, Aug. 1968, pp. 746-756.

**) K Hwang & F Briggs, "Computer Architecture and Parallel Processing," McGraw-Hill Book Company, 1985.

*) B Wilkinson, "Computer Architecture: Design and Performance," Prentice-Hall Int. Ltd, 1991.

How To Design SIMD DSP System From The Off-Shelf Fixed-Point DS Processors?

Here we will develop a SIMD DSP system with a processor-pair

architecture, based on a dual-port RAM. The design is easy to

implement and provides a significant computational boost over a single processor.


The off-shelf Fixed-Point DS Processors are two ADSP-2101s,

each with its own private memories. The following figure shows

a block diagram of the system hardware architecture.

A processor pair almost doubles the speed of a single processor while

keeping the architecture and inter-processor co-ordination as simple as possible.

Hardware Architecture

[Figure: Processor Pair Block Diagram. Two ADSP-2101 processors, each with its own program memory and private data memory, are both connected to the common data memory (dual-port RAM). Signal labels in the figure: DMA, PMD, DMACK, BUSYL, BUSYR.]

Private memories are accessible to one processor only.

Common memory is accessed by both.

Each processor has private memory consisting of 32K of 24-bit

program memory and 14K of 16-bit data memory.

In addition, 2K of 16-bit dual-port RAM is shared by both processors.

This area of memory allows inter-processor communication and data

transfers.

Software Architecture

To complement the hardware design, a hypothetical application is

presented. Data is input and low-pass filtered by one processor,

then the second processor determines the peak location within a

filtered window.
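
The two tasks of the hypothetical application can be sketched on a single machine before they are split across processors. The notes do not specify the filter, so the moving-average FIR below is an assumed stand-in for the low-pass filter, and the names `low_pass_filter` and `locate_peak` are hypothetical.

```python
from typing import List, Sequence, Tuple


def low_pass_filter(samples: Sequence[float], taps: int = 4) -> List[float]:
    """Moving-average FIR filter (an assumed stand-in for the low-pass filter).

    Samples before the start of the input are treated as zero.
    """
    if taps < 1:
        raise ValueError("taps must be positive")
    delay_line = [0.0] * taps  # holds the most recent `taps` inputs
    filtered = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift the new sample in
        filtered.append(sum(delay_line) / taps)
    return filtered


def locate_peak(window: Sequence[float]) -> Tuple[int, float]:
    """Return (index, value) of the first largest sample in a filtered window."""
    if not window:
        raise ValueError("window must not be empty")
    index = max(range(len(window)), key=lambda i: window[i])
    return index, window[index]


if __name__ == "__main__":
    raw = [0.0, 0.0, 8.0, 8.0, 8.0, 8.0, 0.0, 0.0]
    print(locate_peak(low_pass_filter(raw)))
```

In the processor pair, the first function corresponds to the work of processor 1 and the second to the work of processor 2. The filtered window is what passes between them.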

Although the software implementation is simplistic, it shows a technique for programming

in a multiprocessing environment: alternating buffers and flags.


The alternating buffers in this application are two identical buffers

located in dual-port RAM so both processors can access them:

The first processor fills buffer 1 with information, while the second processor operates on the information in buffer 2.

Each buffer has a flag that indicates completion of operations on

that buffer.

When processor 1 has finished its operations on the buffer data,

it sets the flag, signalling processor 2 to begin operations on that buffer.

The sequence of operations is shown in the following table:

Processor 1 (Filter)                Processor 2 (Peak Locator)
Initialise flags, coefficients,     Initialise pointers
delay line, pointers
Perform low pass filter             Check flag 1; wait if not set
operation on data in buffer 1
Set flag 1                          Check flag 1; if set, perform
Perform low pass filter             peak locating operation on
operation on data in buffer 2       data in buffer 1
                                    Clear flag 1
Set flag 2                          Check flag 2; if set, perform
Perform low pass filter             peak locating operation on
operation on data in buffer 1       data in buffer 2
                                    Clear flag 2
Set flag 1; etc.                    Check flag 1; etc.

The alternating buffer scheme is easier to implement than a single buffer scheme. If only one buffer were used, careful timing analysis or extensive

handshaking would be required to ensure that the processors did not use old or invalid data.
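
The alternating buffer and flag scheme can be tried out on a desktop machine. The sketch below is a simulation under stated assumptions, not ADSP-2101 code: two Python threads stand in for the two processors, the class `DualPortRam` stands in for the shared dual-port RAM, and a condition variable replaces polling of the flag words. The one-pole low-pass filter, the buffer length and all names are hypothetical. The sketch also makes processor 1 wait until a buffer's flag is clear before refilling it, a guard that the table above does not show.

```python
import threading
from typing import List, Sequence

BUFFER_LEN = 8  # samples per buffer (hypothetical size)


class DualPortRam:
    """Stand-in for the shared dual-port RAM: two buffers, one flag each."""

    def __init__(self) -> None:
        self.buffers = [[0.0] * BUFFER_LEN for _ in range(2)]
        self.flags = [False, False]  # True means "filtered data ready"
        self._changed = threading.Condition()

    def wait_for_flag(self, index: int, value: bool) -> None:
        """Block until flag `index` equals `value` (replaces a polling loop)."""
        with self._changed:
            self._changed.wait_for(lambda: self.flags[index] == value)

    def write_flag(self, index: int, value: bool) -> None:
        """Set or clear flag `index` and wake any waiting processor."""
        with self._changed:
            self.flags[index] = value
            self._changed.notify_all()


def filter_processor(ram: DualPortRam, windows: Sequence[Sequence[float]]) -> None:
    """Processor 1: low-pass filter each window into alternating buffers."""
    state = 0.0  # filter memory, carried from one window to the next
    for n, window in enumerate(windows):
        b = n % 2  # alternate between buffer 0 and buffer 1
        ram.wait_for_flag(b, False)  # assumed guard, not in the table
        for i, x in enumerate(window):
            state = 0.5 * state + 0.5 * x  # assumed one-pole low-pass filter
            ram.buffers[b][i] = state
        ram.write_flag(b, True)  # "Set flag": buffer b is ready


def peak_processor(ram: DualPortRam, num_windows: int, peaks: List[int]) -> None:
    """Processor 2: locate the peak in each buffer once its flag is set."""
    for n in range(num_windows):
        b = n % 2
        ram.wait_for_flag(b, True)  # "Check flag; wait if not set"
        data = ram.buffers[b]
        peaks.append(max(range(BUFFER_LEN), key=lambda i: data[i]))
        ram.write_flag(b, False)  # "Clear flag": buffer b may be refilled


def run_pair(windows: Sequence[Sequence[float]]) -> List[int]:
    """Run both simulated processors; return the peak index of each window."""
    if any(len(w) != BUFFER_LEN for w in windows):
        raise ValueError("every window must hold BUFFER_LEN samples")
    ram = DualPortRam()
    peaks: List[int] = []
    p1 = threading.Thread(target=filter_processor, args=(ram, windows))
    p2 = threading.Thread(target=peak_processor, args=(ram, len(windows), peaks))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return peaks
```

Calling `run_pair` with a list of windows returns one peak index per window. While processor 2 examines one buffer, processor 1 is free to fill the other, which is where the near doubling of throughput comes from.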

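Multiprocessing With The SHARC

The Modified Harvard Architecture

[Figure: Modified Harvard architecture. A DSP processor is linked by separate address and data buses to data memory (DM, data storage) and to program memory (PM, program and data storage). Bus-width labels in the figure: 32 and 24.]
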
Harvard Architecture: Simultaneous Access of Data and Instruction

Modified Harvard Architecture: Simultaneous Access of 2 Data Memories and Instruction from Cache Gives Three Bus Performance with only 2 Busses
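
The "three bus performance with only 2 busses" claim can be illustrated with a toy cycle count for a multiply-accumulate loop. This is a simplified model under stated assumptions, not a cycle-accurate description of any device: each tap needs an instruction, a sample from data memory and a coefficient from program memory; each bus carries one fetch per cycle; and the loop body is a single repeated instruction. The function name `fir_cycles` is hypothetical.

```python
def fir_cycles(taps: int, has_instruction_cache: bool) -> int:
    """Cycles for a `taps`-long multiply-accumulate loop in a toy two-bus model.

    Each tap needs three fetches: the instruction, a sample over the DM bus
    and a coefficient over the PM bus. Each bus carries one fetch per cycle.
    """
    if taps < 1:
        raise ValueError("taps must be positive")
    if not has_instruction_cache:
        # Instruction and coefficient compete for the PM bus: two cycles per tap.
        return 2 * taps
    # First pass: the instruction is fetched over the PM bus and cached, which
    # costs one extra cycle. Later passes read it from the cache, leaving the
    # PM bus free for the coefficient: one cycle per tap.
    return taps + 1


if __name__ == "__main__":
    print(fir_cycles(32, has_instruction_cache=False))
    print(fir_cycles(32, has_instruction_cache=True))
```

In this model the cache removes the instruction fetch from the program memory bus after the first pass, so the loop runs at close to one tap per cycle, as it would with a third bus dedicated to instructions.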

[Figure labels, continued: storage bus widths 32/40 and 48; I/O; Cache.]

SHARC: Complete Signal Computer On A Chip

ADSP-21000 Family High Performance Processor Core
- 25 ns = 40 MIPS / 120 MFLOPS

Large Efficient On-Chip Memory System
- 4 Megabits on ADSP-21060
- 2 Megabits on ADSP-21062

DMA Controller and I/O Processor
- Allows Flexible, Zero-Overhead, High-Speed Data Transfers
- 240 Mbytes/s

Host Interface
- Efficient Interface to 16- & 32-Bit Microprocessors

Two Serial Ports
- 40 Mbit/s Multichannel Serial Ports

Two Integrated Multiprocessing Interfaces
- Glueless Cluster Interface Tran
