
Upload: others

Post on 26-Feb-2022

66 views

Category:

Documents


0 download

TRANSCRIPT

SPAlgo2Arch 1

Mapping Signal Processing Algorithms to Architecture

Sumam David S
Head, Dept of Electronics & Communication
National Institute of Technology Karnataka, Surathkal, India
[email protected]

Objectives
At the end of the lecture, you should
• have gained an overview of implementation aspects of signal processing algorithms
• have gained an understanding of the relationship between the parameters influencing the choice of implementation
• appreciate the approaches to efficiently mapping signal processing algorithms to architecture
• be familiar with a few transformations that help the designer develop different solutions for a given signal processing algorithm

SU2010-CS Algo2Arch 2

Organisation
• What is SP?
• Applications of SP
• Key algorithms
• Implementation options & trade-offs
• Key approaches in mapping algorithms to architecture

NITK Surathkal, India
• 12°N, 74°E
• West coast of India
• Arabian Sea
• Mangalore

SP: where do you find it?
• High performance

Typical DSP system

Examples of typical signals
Speech and music signals
• Represent air pressure as a function of time at a point in space

Biomedical signals

EEG

Seismic signal

Image
• I(x,y)

Video

Digital Camera
Image processing algorithms
• Bad pixel detection and masking
• Color interpolation
• Color balancing
• Contrast enhancement
• False color detection and masking
• Image and video compression

Bad pixel detection & masking

Color interpolation & balancing

Digital Sound Synthesis
• Guitar with nylon strings
• Marimba
• Tenor saxophone

Signal Compression
• Original music: audio format PCM, 16.000 kHz, 16 bit (data size 66206 bytes)
• Compressed music: audio format GSM 6.10, 22.05 kHz (data size 9295 bytes)

Image Compression
• 8 bpp versus 0.5 bpp

Signal Enhancement
• Noisy speech
• Noise removed

Contrast enhancement

Noise removal

Signal Processing
• Works on discrete samples of a continuous-time signal
• Real-time requirement
• Data driven
• Programmable or custom DSPs

Implementation choice
Depends on
• Sampling rate
• Throughput
• Power, energy
• Area
• Wordlength (precision)
• Flexibility
• Time to market
• Volume

Examples of DSP Primitives
• Convolution
• Digital filters
  - Finite Impulse Response (FIR)
  - Infinite Impulse Response (IIR)
• Correlation
• Discrete Fourier Transform / Fast Fourier Transform (FFT)
• Discrete Cosine Transform (DCT)
• Least Mean Square (LMS)
Applications are often composed of many primitives.

Two basic DSP structures
[Figure: FIR and IIR filter structures, with coefficient labels b0, b1, b2, -a1, -a2]
• Finite Impulse Response (FIR): no feedback; order 4
• Infinite Impulse Response (IIR): feedback; order 2

Different applications, different demands

Standard processors or special purpose?

Architectural options
• OTS (off-the-shelf) processors
  - Programmable microprocessors or DSPs
  - Based on generic computational units; for DSPs, usually a MAC (multiply-accumulate) unit
  - Prefabricated or IP cores
• Time-multiplexed application-specific processors
  - Several algorithmic operations performed on the same hardware unit
  - Trades reduced hardware for longer computation time
• Hardware-mapped architectures
  - One (or more) hardware unit per algorithmic operation
  - High hardware cost and high throughput

Hardware options
• FPGA: Field Programmable Gate Array
• ASIC: Application Specific Integrated Circuit

Key design issue today: energy
• Utilising the computational time
• Clock frequency
• Supply voltage
• Sleep modes

Algorithm 2 Architecture
• Which structure gives optimal performance, energy and area?
• How to get from a signal processing algorithm to an EFFICIENT implementation, in a structured way, using
  - Different number systems
  - Pipelining
  - Parallelism
  - Retiming, scheduling
  - Strength reduction, i.e. reducing the complexity of operations
  - etc.

An example
• Datapath and control path

DSP basic operations
• FIR: $y_n = \sum_{k=0}^{N-1} h_k \, x_{n-k}$
• IIR: $y_n = \sum_{k=0}^{M} a_k \, x_{n-k} + \sum_{k=1}^{N} b_k \, y_{n-k}$
• Transforms (FFT, DCT), e.g. the DFT: $X_m = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi n m / N}$
• Decomposition (SVD, LU, QR): matrix operations
• Arithmetic operations: ×, +, shifts
• Non-terminating: data flow, sampling rate
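The FIR and IIR sums above can be sketched directly in Python. This is only an illustration: the function names `fir` and `iir`, the zero initial conditions and the test signal are assumptions made here, not part of the lecture.

```python
def fir(h, x):
    """y[n] = sum_{k=0}^{N-1} h[k] * x[n-k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


def iir(a, b, x):
    """y[n] = sum_{k=0}^{M} a[k]*x[n-k] + sum_{k=1}^{N} b[k]*y[n-k].

    b[0] is unused so that list indices match the equation.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(a)):
            if n - k >= 0:
                acc += a[k] * x[n - k]
        for k in range(1, len(b)):
            if n - k >= 0:
                acc += b[k] * y[n - k]   # feedback from earlier outputs
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
fir_out = fir([1.0, 2.0, 3.0], impulse)     # response ends after 3 samples
iir_out = iir([1.0], [0.0, 0.5], impulse)   # response decays but never ends
```

The FIR output is just the coefficient list followed by zeros, while the feedback term keeps the IIR output non-zero for every later sample.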

Number representation
Binary number system
• Integers, e.g. 0110 = 6 (decimal)
• Two's complement (4 bit): 1010 = -6 (decimal)
• n bits: range $[-2^{n-1},\; 2^{n-1}-1]$
• 8 bits: ? (answer: [-128, 127])
Fractional values
• Divide by $2^{n-1}$: range [-1, +1)
• 8 bits, fractional: -1 to +127/128
• Resolution: 1/256 of the range [-1, +1), i.e. a step of 1/128
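The two's complement and fractional examples from the number representation slide can be checked with a small Python sketch. The helper names are hypothetical and chosen only for this illustration.

```python
def to_twos_complement(value, bits):
    """Return the bit pattern (as a string) of a signed integer."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern):
    """Interpret a bit string as a signed two's complement integer."""
    bits = len(pattern)
    raw = int(pattern, 2)
    return raw - (1 << bits) if raw >= (1 << (bits - 1)) else raw


def to_fraction(pattern):
    """Interpret a bit string as a fractional value in [-1, +1)."""
    return from_twos_complement(pattern) / (1 << (len(pattern) - 1))


six = to_twos_complement(6, 4)            # 4-bit pattern for +6
minus_six = to_twos_complement(-6, 4)     # 4-bit pattern for -6
largest_8bit = to_fraction("01111111")    # largest 8-bit fraction
smallest_8bit = to_fraction("10000000")   # most negative 8-bit fraction
```

With 8 bits the fractional values run from -1 up to 127/128 in steps of 1/128, which matches the resolution figure on the slide.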

Floating point numbers
32 bits: S (1) | Exponent (8) | Mantissa (23)
• Number magnitude: $(1.\text{mantissa}) \times 2^{\text{exponent}-127}$
• Smallest positive (normalised): $(1.00\ldots0) \times 2^{-126}$
• Largest positive: $(1.11\ldots1) \times 2^{127} \approx 2^{128}$
• Mantissa determines resolution
• Exponent determines dynamic range
• Much wider dynamic range, at the cost of energy, area and computation time
• For energy-efficient implementations, fixed point is used
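The 32-bit layout above (1 sign bit, 8 exponent bits, 23 mantissa bits) can be inspected with Python's standard `struct` module. The function names are assumptions for this sketch, and `encode_normalised` only covers normalised numbers.

```python
import struct


def decode_float32(value):
    """Split a number into the sign, exponent and mantissa fields of float32."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF
    mantissa = raw & 0x7FFFFF
    return sign, exponent, mantissa


def encode_normalised(sign, exponent, mantissa):
    """Rebuild the value as (-1)^sign * (1.mantissa) * 2^(exponent - 127).

    Valid for normalised numbers only (exponent field 1..254).
    """
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)


fields_one = decode_float32(1.0)                # 1.0 = 1.0 x 2^0
smallest = encode_normalised(0, 1, 0)           # smallest normalised value
largest = encode_normalised(0, 254, 0x7FFFFF)   # just under 2^128
```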

Finite word length effects
• Coefficient quantisation
• Signal quantisation
  - Round-off noise
  - Limit cycles
• Scaling
  - Adjust signal to fit the hardware
  - Saturation arithmetic
[Figure: gain in dB (0 to -80) versus normalised frequency ω/π (0 to 1); original response as a solid line, quantized response as a dashed line]
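Two of the effects listed above, coefficient quantisation and saturation arithmetic, can be illustrated with a short Python sketch. The function names, the 4-bit and 8-bit wordlengths and the sample values are assumptions chosen for the example.

```python
def quantise(coeff, bits):
    """Round a coefficient in [-1, +1) to a signed fixed-point grid of `bits` bits."""
    scale = 1 << (bits - 1)
    code = round(coeff * scale)
    code = max(-scale, min(scale - 1, code))   # clamp to representable codes
    return code / scale


def saturating_add(a, b, bits):
    """Add two signed integers, clamping at the range limits instead of wrapping."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


def wrapping_add(a, b, bits):
    """Add two signed integers with two's complement wrap-around."""
    mask = (1 << bits) - 1
    raw = (a + b) & mask
    return raw - (1 << bits) if raw > mask >> 1 else raw


q = quantise(0.3, 4)                  # 0.3 moves to the nearest multiple of 1/8
sat = saturating_add(100, 100, 8)     # clamps at the largest 8-bit value
wrap = wrapping_add(100, 100, 8)      # overflows into a negative value
```

Saturation turns an overflow into a bounded error, whereas wrap-around flips the sign of the result, which is why scaling and saturation are listed together on the slide.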

Implementation Flow
• High-level description, specifications
  - Equations
  - Matlab, C/C++
• Architecture
  - Building blocks, speed and area
  - Block diagram
  - Optimisations: number of bits, number of coefficients
  - Hardware description languages: Verilog, VHDL
• Compile (synthesize)
  - Mapping to gates
  - Place & route

Implementation of FIR filters
$y_n = \sum_{k=0}^{3} h_k \, x_{n-k}$
• Algorithm is non-terminating
• One execution of the loop = one iteration
• Iteration period = time for one iteration

Direct form 4 tap FIR filter

FIR with adder tree

Critical path
• Critical path: the longest computation path without delays, which may run
  - between delays
  - input to delay
  - delay to output
  - input to output
• Sampling rate = number of samples processed per second
• Latency = time difference between an output and the corresponding input
  - Combinational logic: measured in gate delays
  - Sequential logic: measured in number of clock cycles
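To compare the direct form and adder tree structures, the critical path of each can be estimated from the multiplier delay TM and adder delay TA. The sketch below assumes one multiplier followed either by a chain of adders or by a balanced adder tree; the delay values are arbitrary illustrative numbers.

```python
def critical_path_direct(taps, t_mult, t_add):
    """Direct form: one multiplier followed by a chain of (taps - 1) adders."""
    return t_mult + (taps - 1) * t_add


def critical_path_tree(taps, t_mult, t_add):
    """Adder tree: one multiplier followed by ceil(log2(taps)) adder levels."""
    levels = (taps - 1).bit_length()   # integer form of ceil(log2(taps))
    return t_mult + levels * t_add


def max_sample_rate(critical_path):
    """Highest sampling rate if one sample is processed per clock period."""
    return 1.0 / critical_path


T_M, T_A = 10.0, 4.0                         # assumed delays, arbitrary units
direct = critical_path_direct(4, T_M, T_A)   # TM + 3*TA
tree = critical_path_tree(4, T_M, T_A)       # TM + 2*TA
```

For four taps the tree saves one adder delay; the gap widens as the number of taps grows.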

Signal Flow Graph
• Nodes: represent computations and/or tasks
• Directed edge (j, k): denotes a linear transformation from the input signal at node j to the output signal at node k
  - in digital systems, usually limited to delays and constant multipliers
• Source: node with no entering edges
• Sink: node with only entering edges

Transposition
• Reverse the direction of all edges
• Exchange input and output

Transposed FIR filter
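Transposition changes the structure but not the function. A minimal Python sketch, with hypothetical function names and made-up coefficients, compares a direct form FIR with its transposed form.

```python
def fir_direct(h, x):
    """Direct form: delay line on the input, products summed at the output."""
    delay_line = [0.0] * len(h)
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]
        y.append(sum(hk * xk for hk, xk in zip(h, delay_line)))
    return y


def fir_transposed(h, x):
    """Transposed form: input broadcast to all multipliers, delays between adders."""
    state = [0.0] * (len(h) - 1)      # registers between the adders
    y = []
    for sample in x:
        products = [hk * sample for hk in h]
        y.append(products[0] + (state[0] if state else 0.0))
        # each register takes the next product plus the register after it
        state = [
            products[i + 1] + (state[i + 1] if i + 1 < len(state) else 0.0)
            for i in range(len(state))
        ]
    return y


h = [0.5, 0.25, -0.125, 1.0]          # illustrative 4-tap coefficients
x = [1.0, 2.0, -1.0, 0.5, 0.0, 3.0]
same = fir_direct(h, x) == fir_transposed(h, x)
```

In the transposed form each output needs only one multiplication and one addition after the last register, which is why it has a shorter critical path than the direct form.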

Fixed point implementation

Pipelining
• Pipelining increases latency
• More registers can bring the critical path delay down to max(TM, TA)

Data Flow Graph
• DFGs capture the data-driven property of DSP algorithms
• Nodes represent computations (or functions)
• Directed edges represent data flow, with a non-negative number of delays
• A node can execute whenever all its input data are available
• Each edge describes a precedence constraint between two nodes in the DFG:
  - Intra-iteration precedence constraint: if the edge has zero delays
  - Inter-iteration precedence constraint: if the edge has one or more delays

DFG

Dependence Graph
• Like a DFG, a combination of nodes (tasks) with directed edges which indicate precedence constraints
• Unlike a DFG, nodes are not reused; instead, a new node is created in each iteration
• A DG has an explicit time axis, hence delays are not indicated
• Extremely modular, regular arrays

FIR Filter: DG
y(n) = h0 x(n) + h1 x(n-1) + h2 x(n-2)
[Figure: dependence graph with inputs x(0), x(1), x(2), coefficient rows h0, h1, h2 and outputs y(0), y(1), y(2); each node computes x' = x, h' = h, y' = y + hx]

Recurrence equations
• Time for one complete cycle = TM + TA (the iteration period)
• Sampling time TS ≥ TM + TA

Precedence constraints

Loop Bound
$\text{Loop bound} = \dfrac{\text{Total execution time on loop}}{\text{Total delays in loop}}$

Iteration Bound
• Iteration bound = maximum loop bound
• Delay-free loops cannot be computed

Iteration bound
• Clock period is lower bounded by the critical path computation time
• Sampling period is lower bounded by the iteration bound, regardless of the computational resources available
• Iteration bound
  - Depends on technology
  - Depends on the derivation of the graph from the equations
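The loop bound and iteration bound definitions can be turned into a small brute-force calculation. The sketch below enumerates every loop of a small data flow graph; the graph encoding, the function name and the example graph are assumptions, execution times are taken to be integers, and the approach is only practical for small graphs.

```python
from fractions import Fraction


def iteration_bound(node_time, edges):
    """Maximum over all loops of (loop computation time / loop delay count).

    node_time: {node: integer execution time}
    edges: list of (source, destination, number_of_delays)
    """
    successors = {}
    for src, dst, delays in edges:
        successors.setdefault(src, []).append((dst, delays))

    order = sorted(node_time)
    bounds = []

    def walk(start, node, time, delays, visited):
        for nxt, d in successors.get(node, []):
            if nxt == start:
                if delays + d == 0:
                    raise ValueError("delay-free loop cannot be computed")
                bounds.append(Fraction(time, delays + d))
            elif nxt not in visited and order.index(nxt) > order.index(start):
                # only visit nodes "after" start so each loop is counted once
                walk(start, nxt, time + node_time[nxt], delays + d, visited | {nxt})

    for start in order:
        walk(start, start, node_time[start], 0, {start})
    return max(bounds) if bounds else Fraction(0)


# Assumed example: an adder A feeding two multipliers through 1 and 2 delays.
times = {"A": 1, "M1": 2, "M2": 2}
graph = [("A", "M1", 1), ("M1", "A", 0), ("A", "M2", 2), ("M2", "A", 0)]
bound = iteration_bound(times, graph)   # loops give 3/1 and 3/2
```

For the simple recursion with one adder and one multiplier in a single-delay loop, the same function returns TM + TA, in line with the sampling time limit on the recurrence slide.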

Pipelining and Parallelism
Pipelining
• Splits the logic path by introducing pipeline registers
• Leads to a reduction of the critical path, but introduces latency
• Either increases the clock speed (or sampling speed) or reduces the power consumption at the same speed in a DSP system
Parallel processing
• Multiple outputs are computed in parallel in a clock period
• The effective sampling speed is increased by the level of parallelism
• Can also be used to reduce the power consumption

Pipelining

Parallel processing

Pipelining
• Cutset: a set of edges that, if removed (cut), results in two disjoint graphs
• Feed-forward cutset: a cutset in which data moves in the forward direction on all edges
• Pipelining: adding delays at feed-forward cutsets
  - Reduces critical path delay
  - Introduces latency, with no change in functionality

Pipelining
• Example: Tnode = 1, CPold = 4, CPnew = 2

Parallel FIR filter
• SISO FIR: $y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2)$
• MIMO FIR

Parallel 3 tap FIR
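The step from the SISO filter to a MIMO (block) filter can be sketched in Python. The block size of 3, the function names and the requirement that the input length is a multiple of 3 are assumptions made for this illustration.

```python
def fir3(b, x, n):
    """SISO 3-tap FIR output y(n); samples before n = 0 are taken as zero."""
    def sample(i):
        return x[i] if 0 <= i < len(x) else 0.0
    return b[0] * sample(n) + b[1] * sample(n - 1) + b[2] * sample(n - 2)


def fir3_parallel(b, x):
    """Block-parallel version: iteration k produces y(3k), y(3k+1), y(3k+2).

    The three expressions do not depend on each other, so in hardware they
    could be evaluated by three copies of the filter in the same clock period.
    """
    y = []
    for k in range(len(x) // 3):
        y0 = fir3(b, x, 3 * k)
        y1 = fir3(b, x, 3 * k + 1)
        y2 = fir3(b, x, 3 * k + 2)
        y.extend([y0, y1, y2])
    return y


b = [1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
serial = [fir3(b, x, n) for n in range(len(x))]
parallel = fir3_parallel(b, x)
```

Both versions give the same samples; the parallel one needs three times the hardware but only a third as many iterations.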

Pipelining & Parallelism
• Critical path is unchanged in a parallel system: Tclock ≥ TM + 2TA
• Titer = Ts = Tclock / L
• Parallel system: Tclock ≠ Ts
• Pipelined system: Tclock = Ts
• By combining parallel processing (block size L) and pipelining (M pipelining stages), the sample period can be reduced to Titer = Ts = Tclock / (LM)

Retiming
• Moving delays in the system
• Modifies the critical path delay and the number of registers
• Retiming does not change
  - the number of delays in a loop
  - the iteration bound

Cutset retiming

Node retiming
• Cutset around a node

Algorithmic strength reduction
• Reduces the number of strong operations (e.g. multiplications), typically at the cost of more weak operations (e.g. additions)

Fixed coefficient multiplication

Complex multiplication
• (a+jb)(x+jy)
• Can you do it with 3 multipliers and a few extra adders?
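One well-known answer to the question above shares a single product between the real and imaginary parts. The Python sketch below compares it with the direct four-multiplier form; the function names are hypothetical.

```python
def complex_mul_4(a, b, x, y):
    """(a + jb)(x + jy) the direct way: 4 multiplications, 2 additions."""
    return a * x - b * y, a * y + b * x


def complex_mul_3(a, b, x, y):
    """Same product with 3 multiplications and 5 additions/subtractions."""
    shared = y * (a - b)          # multiplication 1, reused in both parts
    real = a * (x - y) + shared   # multiplication 2: ax - ay + ay - by
    imag = b * (x + y) + shared   # multiplication 3: bx + by + ay - by
    return real, imag


direct = complex_mul_4(1, 2, 3, 4)
reduced = complex_mul_3(1, 2, 3, 4)
```

One multiplier is traded for three extra adders, which pays off when a multiplier costs much more area or energy than an adder.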

Architectural synthesis
Given a DFG and resources:
• Allocation: allocate enough resources to solve the problem
• Binding: which operation happens on which resource
• Scheduling: when each operation should take place

6 tap FIR Filter

Cycle  Operations                Resources
1      m1, m2, m3, m4, m5, m6    6 multipliers
2      a1                        1 adder
3      a2                        1 adder
4      a3                        1 adder
5      a4                        1 adder
6      a5                        1 adder

Resource utilisation
• Adder: 5/6 ≈ 83%
• Multiplier: 6/36 ≈ 16.67%
• m1, m2 have to be in cycle 1

Alternate: 2M + 1A

Cycle  Operations    Resources
1      m1, m2        2 multipliers
2      a1, m3, m4    1 adder + 2 multipliers
3      a2, m5, m6    1 adder + 2 multipliers
4      a3            1 adder
5      a4            1 adder
6      a5            1 adder

Resource utilisation
• Adder: 5/6 ≈ 83%
• Multiplier: 6/12 = 50%

1M, 1A

Cycle  Operations  Resources
1      m1          1 multiplier
2      m2          1 multiplier
3      m3, a1      1 adder + 1 multiplier
4      m4, a2      1 adder + 1 multiplier
5      m5, a3      1 adder + 1 multiplier
6      m6, a4      1 adder + 1 multiplier
7      a5          1 adder

Resource utilisation
• Adder: 5/7 ≈ 71%
• Multiplier: 6/7 ≈ 86%

Scheduling
• ASAP (as soon as possible) or ALAP (as late as possible)
• Mobility = ALAP time - ASAP time, for each operation
• Time bound = total number of operations of a type / number of resources of that type
• System bound = maximum time bound over all resource types
• If resource constrained (1M, 1A): max(6, 5) = 6 time units
• Pipelining / retiming
• 6M, 5A: 2 cycles
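The ASAP, ALAP, mobility and bound definitions can be tried out on the 6-tap FIR. The sketch below assumes a chain of adders (a1 = m1 + m2, a2 = a1 + m3, and so on), unit-time operations, and rounds the time bound up to whole cycles; these modelling choices and all names are assumptions for illustration.

```python
import math

# Assumed dependencies of a 6-tap FIR built with a chain of adders.
DEPS = {
    "m1": [], "m2": [], "m3": [], "m4": [], "m5": [], "m6": [],
    "a1": ["m1", "m2"], "a2": ["a1", "m3"], "a3": ["a2", "m4"],
    "a4": ["a3", "m5"], "a5": ["a4", "m6"],
}


def asap(deps):
    """Earliest cycle (1-based) for each operation with unlimited resources."""
    times = {}

    def visit(op):
        if op not in times:
            times[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return times[op]

    for op in deps:
        visit(op)
    return times


def alap(deps, latency):
    """Latest cycle for each operation that still finishes within `latency`."""
    users = {op: [u for u in deps if op in deps[u]] for op in deps}
    times = {}

    def visit(op):
        if op not in times:
            times[op] = min((visit(u) for u in users[op]), default=latency + 1) - 1
        return times[op]

    for op in deps:
        visit(op)
    return times


def system_bound(op_counts, resources):
    """Max over resource types of ceil(operations / available units)."""
    return max(math.ceil(op_counts[t] / resources[t]) for t in op_counts)


early = asap(DEPS)
late = alap(DEPS, latency=max(early.values()))
mobility = {op: late[op] - early[op] for op in DEPS}
bound_1m_1a = system_bound({"mul": 6, "add": 5}, {"mul": 1, "add": 1})
```

Under these assumptions m1 and m2 have zero mobility, so they must sit in cycle 1, while later multiplications can be deferred; this is the freedom the 2M + 1A and 1M, 1A schedules exploit.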

IIR filter
• 4 cycles
• Iteration bound = 3

IIR filter
• 3 cycles
• 2A + 3M

Embedded System Design Dilemma
[Figure: chart with axes COMPUTE PERFORMANCE and SYSTEM TIME-TO-MARKET, spanning SW to HW; labels: ASIC, ASSP, FPGA?, In-System Configurable Processors, GPP, DSP, MEDIA, SECURITY, ETC.]

Combining The Best of HW and SW
[Figure: block diagram with control and compute parts; labels: FSM, Control, Memory (Program), Datapath, Registers, Storage]

Software Configurable Processor Concept
[Figure: block diagram; labels: FSM, Control, Memory (Program), Datapath, Registers, Storage, STRETCH ISEF datapath]

Conclusion
• Overview of the implementation of signal processing algorithms
• Relationship between parameters influencing the choice of implementation
• Looked at some approaches to efficiently map signal processing algorithms to architecture
• Hopefully, a thought on what happens at the hardware level and how to optimise algorithms to suit the architecture

References
• K.K. Parhi, VLSI Digital Signal Processing Systems, Wiley, 1999

18 March 2010 Embedded Processor Architecture 82

Thank you