
DSP—Why So Hard?

November 2010

Who ?

Peter.Eastty@Oxford-Digital.com

Design and sell processor cores and matching programming environments.

Program strange algorithms onto stranger processors with the strangest tools.

Customers. NDAs. Client lists.

You

Why ?

People expensive, Silicon cheap. People slow, Silicon fast. People slow, Computers fast.

Programmer efficiency is everything, it gets you time to market which gets you the market.

Why ?

Targets move, right up to the last minute. Never, ever build a fixed function device. Three stories.

Third party algorithms will have to be adopted whether public domain or highly secret.

What is an audio signal?

Known bandwidth

Known resolution

Known number of channels

So why don’t I enumerate them?

Where ?

Large mixing consoles, cell phones, hi-fi, TVs, iPod docks, PCs etc.

What processing do we want to do to the audio?

Continuously varying value against time

Filtering

Polynomial

Non-linear

Decisions

DSP = Low Delay

Not block structured

What hardware resources are available?

Memory

Multipliers

Connections

Adders etc.

Data types for DSP

See RBJ on headroom and floating point

Want a fixed point data type.

Word lengths of 16, 32 etc. have nothing to do with audio; set up the word length to suit the audio, not a computer.

Languages for DSP

C is not a DSP language, the data types are all wrong and it has no concept of time.

C++ could be a DSP language but it doesn’t want to be one, it too has no concept of time.

Languages for DSP

With modern hardware design and compiler technology there is never a need for assembler. NEVER EVER.

Of course if you’re tied to old hardware for legacy code reasons you still might have to hack in assembler.

Languages for DSP

main (…)
{
    ASM(…);
    ASM(…);
    ASM(…);
}

/* This is not C. */

Languages for DSP

main (…)
{
    Clear_Acc();
    MAC(…);
    Store_Acc_to_Register(…);
}

/* This isn’t either. */

Languages for DSP

main (…)
{
    Multiply_by_Coefficient();
    Biquad(…);
    Do_FFT(…);
}

/* Neither is this. */

Languages for DSP

Beware of ‘optional extensions’. They can become mandatory.

There is still at least one University teaching DSP using FORTRAN and assembler …

...sad to say they apologized about the FORTRAN.

Languages for DSP

I don’t know the perfect DSP language.

But any high level language is better than any machine specific language.

Multiple Memory Banks

If there are multiple memories then memory allocation is NOT the programmer’s job; the tool-chain should do this for you.

But it might be nice to be able to do some if you want to.

Multiply-Add

Source level individual operations (add, multiply etc.) should be independent; hardware instructions can combine multiple operations (like Multiply-Add).

Make sure the operations in a combined instruction are exactly the same as those in individual instructions.

Limiting

Whatever number system you use it will have a range, even floating point.

Limiting will be required after every operation that can exceed the range: multiply, add, subtract and absolute value.

This includes the multiply in a multiply-add.

-1 x -1 = -1 ????????
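The -1 x -1 case can be reproduced in a few lines. This is a minimal sketch assuming a 16-bit Q15 format (range -1 to just under +1); the format and the names `limit16`, `q15_mul_wrap`, `q15_mul`, `q15_add` and `q15_mac` are illustrative choices, not taken from the slides. The multiply-add is built from the same limited multiply and add, so the combined operation gives exactly the result of the individual ones.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the 16-bit range. */
static int16_t limit16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Q15 multiply with NO limiting: out-of-range results wrap around.
   Assumes >> on a negative value is an arithmetic shift. */
static int16_t q15_mul_wrap(int16_t a, int16_t b) {
    int32_t p = ((int32_t)a * b) >> 15;
    return (int16_t)(((p + 32768) & 0xFFFF) - 32768);  /* explicit 16-bit wrap */
}

/* Q15 multiply with limiting. */
static int16_t q15_mul(int16_t a, int16_t b) {
    return limit16(((int32_t)a * b) >> 15);
}

/* Q15 add with limiting. */
static int16_t q15_add(int16_t a, int16_t b) {
    return limit16((int32_t)a + b);
}

/* Multiply-add: limited after the multiply AND after the add. */
static int16_t q15_mac(int16_t acc, int16_t a, int16_t b) {
    return q15_add(acc, q15_mul(a, b));
}
```

With these definitions, `q15_mul_wrap(INT16_MIN, INT16_MIN)` returns `INT16_MIN` (that is, -1 x -1 = -1), while `q15_mul` returns `INT16_MAX`, the closest representable value to +1.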

Pipelines

The user should never have to think about pipelines.

Variable pipelines are wrong.

A pipeline is not a panacea for timing problems; it limits the processing in a loop.

Pushing code through a branch.

Using the pipe for parameter passing.

Pipelines

Definition of pipeline length: the count between the instruction that generates an item and an instruction that may use it.

Short circuiting the pipe. Useful, but not very useful.

Can unwind the execution by making the pipeline length relatively prime to the instruction count, but this adds to delay, which in turn adds to the storage requirement.

Branching

If you can find another way, avoid branches.

If you have to have jumps and a pipeline, keep it all away from the programmer.

If you do have jumps they’ll likely break the guaranteed timing.

Conditional Execution

Conditional execution doesn’t break pipeline etc.

But you’ll need as many condition code stores as you have pipeline length.

Timing is identical for conditional execution and multiplexer.

Multiplexer

y = (a < 0.0) ? b : c;

Timing is identical for conditional execution and multiplexer.

With a multiplexer you can use any variable as a control, so no condition code store is required.

y = (a <= 0.0) ? b : c;
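On a processor without a multiplexer instruction, the same select can be written without a branch. The following is a sketch only, assuming 32-bit two's-complement integer samples rather than the 0.0 comparison on the slide; `mux_neg` is a hypothetical name.

```c
#include <stdint.h>

/* y = (a < 0) ? b : c, computed without a branch.
   The control value a is an ordinary variable, not a condition code. */
static int32_t mux_neg(int32_t a, int32_t b, int32_t c) {
    int32_t mask = -(int32_t)(a < 0);   /* all ones when a < 0, else zero */
    return (b & mask) | (c & ~mask);    /* pick b or c bit by bit */
}
```

Both inputs are always evaluated, so the cost is the same whichever way the control goes, which is the fixed-timing property the slide is after.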

ABS

For simple bends in an input/output relationship, Absolute Value plus some Addition and Subtraction is more economical than most other methods.
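Two such bends, sketched under the assumption of integer samples small enough that the sums do not overflow. The names `half_wave` and `clip` are illustrative, not from the talk.

```c
#include <stdint.h>
#include <stdlib.h>   /* abs */

/* Half-wave rectifier: 0 for negative x, x otherwise.
   One absolute value and one add, then a halving. */
static int32_t half_wave(int32_t x) {
    return (x + abs(x)) / 2;
}

/* Symmetric clip to the range -t..+t (t >= 0).
   Two absolute values, one add and two subtracts, then a halving. */
static int32_t clip(int32_t x, int32_t t) {
    return (abs(x + t) - abs(x - t)) / 2;
}
```

For example, `clip(5, 3)` evaluates (8 - 2) / 2 = 3 and `clip(1, 3)` evaluates (4 - 2) / 2 = 1, with no comparison or branch involved.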

Truncation, Rounding, Dither and Noise Shaping

For every instruction that needs it …

… and just for Output

Assume fixed point

Floating point is hard

Truncation, Rounding and Truncation Towards Zero!

Truncation is easy but has DC offset

Truncation Towards Zero!

½ LSB offset number systems

Rounding wins and is not much more complex.
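The three options can be compared directly. This is a sketch assuming 32-bit integers from which n low bits are dropped; the function names and the `error_sum` helper are hypothetical and exist only to expose the DC offset.

```c
#include <stdint.h>

/* Plain truncation: drop n low bits, rounding toward minus infinity. */
static int32_t trunc_floor(int32_t x, int n) {
    int32_t step = (int32_t)1 << n;
    int32_t q = x / step;          /* C division rounds toward zero */
    if (x % step < 0) q -= 1;      /* so step negative inputs down */
    return q;
}

/* Truncation toward zero: leaves a double-width step around zero. */
static int32_t trunc_zero(int32_t x, int n) {
    return x / ((int32_t)1 << n);
}

/* Round to nearest (ties upward): one extra add before truncating. */
static int32_t round_nearest(int32_t x, int n) {
    return trunc_floor(x + ((int32_t)1 << (n - 1)), n);
}

/* Sum of (quantised - exact) over lo..hi, in input LSBs. */
static long error_sum(int32_t (*q)(int32_t, int), int n,
                      int32_t lo, int32_t hi) {
    long sum = 0;
    for (int32_t x = lo; x <= hi; ++x)
        sum += (long)q(x, n) * (1L << n) - x;
    return sum;
}
```

Over the inputs -16..15 with n = 4, plain truncation accumulates an error of -240 (close to half an output LSB per sample), while rounding accumulates only +16. Truncation toward zero sums to 0 over this range, but only because its positive and negative halves err in opposite directions.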

Dither

How do we make it? Truly random, pseudo random, hash?

What colour do we want it to be?

What PDF do we want?

Make sure it’s un-correlated.

Want repeatability for test.

Problems with infinite gain components.

Rounding wins.
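One possible answer to these questions, as a sketch: pseudo random, white, triangular PDF, and seeded so that test runs repeat exactly. The linear congruential generator and its constants are an assumption made here for brevity (a production design would want a better generator), and `dither_rng`, `rng_next`, `tpdf_dither` and `quantise_dithered` are hypothetical names.

```c
#include <stdint.h>

/* Seedable generator state: same seed, same dither sequence. */
typedef struct { uint32_t state; } dither_rng;

/* 32-bit linear congruential step (unsigned arithmetic wraps by definition). */
static uint32_t rng_next(dither_rng *r) {
    r->state = r->state * 1664525u + 1013904223u;
    return r->state;
}

/* Triangular-PDF dither: difference of two uniform n-bit values,
   so the result lies strictly inside +/- 2^n input LSBs. */
static int32_t tpdf_dither(dither_rng *r, int n) {
    int32_t u1 = (int32_t)(rng_next(r) >> (32 - n));
    int32_t u2 = (int32_t)(rng_next(r) >> (32 - n));
    return u1 - u2;
}

/* Add dither, then round away n low bits.
   Assumes >> on a negative value is an arithmetic shift. */
static int32_t quantise_dithered(dither_rng *r, int32_t x, int n) {
    int32_t v = x + tpdf_dither(r, n) + ((int32_t)1 << (n - 1));
    return v >> n;
}
```

Two instances initialised with the same seed produce identical sequences, which gives the repeatability asked for above.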

Noise shaping

What shape? What order?

Want repeatability for test.

Problems with infinite gain components.

Rounding wins.

Make sure your instruction set can do dither and noise shaping.
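As an illustration of what the instruction set has to support, here is a first-order error-feedback quantiser. It is a sketch under assumptions: the shape (1 - z^-1) and the order are chosen here only for simplicity, and `noise_shaper` and `ns_quantise` are hypothetical names.

```c
#include <stdint.h>

/* State: the quantisation error made on the previous sample. */
typedef struct { int32_t err; } noise_shaper;

/* Requantise x by dropping n low bits, feeding the last error back in.
   Assumes >> on a negative value is an arithmetic shift. */
static int32_t ns_quantise(noise_shaper *s, int32_t x, int n) {
    int32_t v = x - s->err;                          /* subtract last error */
    int32_t y = (v + ((int32_t)1 << (n - 1))) >> n;  /* round to nearest */
    s->err = y * ((int32_t)1 << n) - v;              /* error made this time */
    return y;
}
```

Feeding a constant 5 with n = 4 (a step of 16) gives an output of 1 on five samples out of every sixteen, so the average is preserved, whereas plain rounding would output 0 every time.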

Coefficient Interpolation

Coefficients as a sampled system

SRC called interpolation

HW or SW, 2-3 instructions to feed one.

Only in exceptional circumstances is it worth a hardware solution.

Linear is possible, first order filter is easy and works for many applications.
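The first order filter version fits the 2-3 instruction budget: a subtract, a multiply and an add per coefficient per sample. The sketch below uses `float` purely for readability, which is an assumption (the talk argues for fixed point); `coeff_smoother` and `coeff_next` are hypothetical names.

```c
/* One smoothed coefficient: moves a fraction 'rate' of the remaining
   distance toward 'target' on every sample. */
typedef struct {
    float current;   /* value used by the audio path this sample */
    float target;    /* value most recently set by the control side */
    float rate;      /* 0 < rate <= 1, larger is faster */
} coeff_smoother;

/* Subtract, multiply, add: run once per sample. */
static float coeff_next(coeff_smoother *s) {
    s->current += s->rate * (s->target - s->current);
    return s->current;
}
```

Starting at 0 with a target of 1 and a rate of 0.5, successive calls return 0.5, 0.75, 0.875 and so on, so a control change reaches the audio path as a smooth curve rather than a step.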

Coefficient Synchronization

Coefficient synchronisation.

Lots of people ignore it or treat it on a per use basis.

Can be done for linear or first order filters with ease.

This is really a synchronous sampling problem.

Coefficient Synchronization

Jitter

Scaling, multiprocessors, synchronisation & segmentation

Not all solutions fit in a single processor.

Automatic segmentation of programs across multiple processors is possible.

But it is hard.

If the processors are not identical, and identically connected, it’s very, very hard.

Scaling, multiprocessors, synchronisation & segmentation

If you have multiple processors and no branches then you can run them in lockstep, many examples.

For data transfer between processors simply send from one processor and receive by the other at the same time.

Disastrous for assembler, easy for compiler.

Scaling, multiprocessors, synchronisation & segmentation

How do you connect multiple processors, series or parallel?

If you choose either then you can’t do some algorithms.

Use mesh or router instead.

Small routers are actually cheap and relatively easy to generate code for.

Multiple processors I/O, dedicated processor connections or is I/O a full member of the clan?

Constant folding and common code removal.

Easy in a compiler, often missed by an assembly language programmer.

Keep everything as source until the last possible moment.

That way common parts can be taken advantage of, constants, but more importantly data and instructions.

Leads to documentation of library functions requiring “at most X data memories and Y instructions”.

Libraries

Binary libraries don’t work well with pipelined processors; the cost of getting into or out of them is usually too great.

A binary library (like a dll) is NOT a secure method of distributing intellectual property.

Encrypted source going through a trusted tool-chain to generate encrypted binaries is the way to go.

Hardware with problems….

Let’s just have one continuous data type (and maybe one integer type).

Different widths for different memories makes horrible problems.

Private instruction sets and ‘Useful’ instructions.

Hardware with problems….

Do not chisel a digital analogue of an analogue circuit out of digits.

Sample rate to silicon clock ratio

Hardware with problems….

Bi-quad coefficient ranges.

Feedback coefficient ranges need to be big enough.

Feed-forward coefficient ranges are not limited, they can get big.

If there’s nowhere in your system to make gain, you’re in trouble.

Hardware with problems….

The accumulator is dead. When hardware was expensive and DSP engineers were cheap it made sense to get performance this way, but that is no longer true.

Most of today’s algorithms aren’t sums of products anyway.

And it makes a high level description difficult.

Hardware with problems….

Double precision is probably not the right thing for LF filters.

Choosing the right filter structure and adding a few bits is a financially better solution.

Hardware with problems….

If you must have an accumulator make sure you can load and store it!

Hardware with problems….

Shifting is required to get gain into the system.

There are few reasons for a shift of greater than 2^7 and very few for more than 2^15.

Shift after the multiplier, it’s the only place where there are the bits to shift.

Shift in the wrong place is common.
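A small comparison of the two places, sketched for 16-bit samples and Q15 coefficients. The formats and the names `sat16`, `mul_shift_after` and `mul_shift_late` are assumptions made for illustration. A gain of 3 is expressed as the coefficient 0.75 (24576 in Q15) with a shift of 2.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the 16-bit range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Right place: shift the full 32-bit product, then cut to 16 bits.
   Valid for shift 0..15; assumes >> on a negative value is an
   arithmetic shift. */
static int16_t mul_shift_after(int16_t x, int16_t c, int shift) {
    int32_t p = (int32_t)x * c;
    return sat16(p >> (15 - shift));
}

/* Wrong place: cut to 16 bits first, then apply the gain.
   The low bits of the product are already gone. */
static int16_t mul_shift_late(int16_t x, int16_t c, int shift) {
    int32_t y = ((int32_t)x * c) >> 15;
    return sat16(y * ((int32_t)1 << shift));
}
```

For x = 1001 the exact answer is 3003. `mul_shift_after(1001, 24576, 2)` returns 3003, while `mul_shift_late(1001, 24576, 2)` returns 3000 because the fractional part was discarded before the gain was applied.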

Hardware with problems….

If a standard 5 coefficient bi-quad takes more than 5 instructions there’s something wrong.

A simple Z^-1 delay, and cascades thereof, should not consume instructions.

Simple rotating memory, and language support.
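In software the rotating memory idea looks like the sketch below: the data never moves, only a base index does, so a cascade of delays costs one index update per sample however long it is. On hardware with rotating memory even that update would be implicit. The length of 8 and the names `delay_line`, `delay_push` and `delay_tap` are hypothetical.

```c
#include <stdint.h>

#define DELAY_LEN 8u   /* power of two, so wrapping is a single mask */

typedef struct {
    int32_t  buf[DELAY_LEN];
    uint32_t base;     /* position of the newest sample */
} delay_line;

/* Rotate the memory by one place and store the new sample. */
static void delay_push(delay_line *d, int32_t x) {
    d->base = (d->base - 1u) & (DELAY_LEN - 1u);
    d->buf[d->base] = x;
}

/* Read the sample delayed by k periods (k = 0 is the newest). */
static int32_t delay_tap(const delay_line *d, uint32_t k) {
    return d->buf[(d->base + k) & (DELAY_LEN - 1u)];
}
```

After pushing 1, 2 and 3 into a zeroed line, `delay_tap(&d, 0)` is 3, `delay_tap(&d, 1)` is 2 and `delay_tap(&d, 2)` is 1.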

Hardware with problems….

A pipeline needs to be started cleanly. This is not always easy.

Debugging

Source level debugging is perfectly standard in almost every general purpose processor toolset. Why is it missing from DSP toolsets?

Debugging

If you do add a debugger, remember that the objects you are processing are signals, thus they vary with time.

A numerical display of a signal is generally useless, like using a DVM to analyse audio, necessary but not sufficient.

Provide a scope and signal generator.

Debugging

Debugging Input or Output is a signal.

Easiest done by the instruction NOT the location.

How do we make DSP easier?

Get the algorithm away from the hardware

Use DSP that is compiler compatible

DSP – Why So Hard?

Program Only Signals. Easy? Yes!

DSP—Made Easy!

November 2010
