
Introduction to

DSP Using FPGAs

Guillermo Güichal, UTN – FRBB

Program

Morning

Introduction to DSP: What is DSP? How is it done?

Why use FPGAs for DSP? Comments on DSP algorithms and FPGA implementations.

Issues related to DSP using FPGAs: Clock frequency, sampling, bit count, arithmetic operations.

FPGA design flow for DSP applications: Design alternatives (HDLs, dedicated tools, etc.)

Basic design examples

Program

Afternoon

Intro to Xilinx System Generator: Xilinx Sysgen and its interaction with Matlab, Simulink & ISE.

Use of Xilinx SysGen for DSP: Use of Sysgen for simulation and synthesis.

Examples

Additional Topics: Other tools and additional comments

References

“On the Roots of Digital Signal Processing, Parts 1 & 2”, IEEE Circuits and Systems Magazine, Vol. 7 Number 1 and 4

Berkeley Design Technology, Inc. whitepapers, www.bdti.com

DSP-FPGA.com articles, www.dsp-fpga.com

Andraka Consulting articles, www.andraka.com

Programmable Logic Design Line articles, www.pldesignline.com

FPGA and Structured ASIC Journal articles, www.fpgajournal.com

ACM Queue articles, www.acmqueue.com

“Applying Data Converters”, Texas Instruments

“The Scientist and Engineer’s Guide to DSP”, Steven W Smith, www.DSPGuide.com

IEEE papers, www.ieeexplore.ieee.org

“Digital Signal Processing with FPGAs”, Uwe Meyer-Baese, Springer

Xilinx, Altera and Lattice documentation, www.xilinx.com, www.altera.com, www.latticesemi.com

The Mathworks documentation, www.mathworks.com

etc.


Let’s go over some background information on DSP

A Propos of the “Treatise on Cubic Form" by Juan de Herrera

Salvador Dali, 1960

What is DSP?

From Wikipedia: Digital signal processing (DSP) is the study of signals in a digital representation and the processing methods of these signals. DSP and analog signal processing are subfields of signal processing. DSP includes subfields like: audio and speech signal processing, sonar and radar signal processing, sensor array processing, spectral estimation, statistical signal processing, digital image processing, signal processing for communications, biomedical signal processing, seismic data processing, etc.

What is DSP?

Digital Signal Processing (not Processors)

DIGITAL: Digital domain, as opposed to analog. Everything is digital nowadays…

SIGNAL: A physical quantity that changes over time.

PROCESSING: Do something with the signal, manipulate it in useful ways.

What is DSP?

We have always “processed” signals…

… to communicate

… to understand and summarize scientific data

… for entertainment

... etc.

Now we do it digitally, research new methods and algorithms and constantly find new challenging and complex applications.

Many of the mathematical methods and algorithms used for signal processing are well known, and were developed within other contexts.

Refer to the Circuits and Systems magazine series “On the Roots of DSP”, by Andreas Antoniou for a history of DSP.

What is DSP?

So…

We want to manipulate signals… which are usually real signals like audio, temperature, currents and voltages, seismic, sonar, RF waves (communications), images, biological, etc.

We take them into the digital domain because it makes life easier for us.

We manipulate (process) the signals using algorithms and methods to transform them in ways that are useful for our purposes.

What is DSP?

Real signals: analog signal conditioning (bandwidth, amplitude, etc.). Make this as simple as possible.

Digital domain: sampling (discretization and quantization). We try to do this as early as possible in the process.

Processing: signal processing algorithms and methods. Implies mathematical operations, delay lines. Lots of theory and tools… implementation issues! An interesting blend of theory and practice.

And we want to do all this in the simplest, cheapest manner…
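As a rough illustration of the sampling step, the sketch below discretizes a sine wave in time and quantizes it in amplitude. The function name, the 1 kHz tone, the 8 kHz sample rate and the 8-bit word are all assumptions chosen for the example, not values from the course.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, n_bits):
    """Sample a unit-amplitude sine and quantize it to signed n_bits integers."""
    max_code = 2 ** (n_bits - 1) - 1           # largest positive code, e.g. 127 for 8 bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                  # discretization in time
        x = math.sin(2 * math.pi * freq_hz * t)
        codes.append(round(x * max_code))       # quantization in amplitude
    return codes

# Hypothetical numbers: a 1 kHz tone sampled at 8 kHz with an 8-bit converter
codes = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, n_samples=8, n_bits=8)
```

Everything after this point in a design works on integer codes like these, so the word length chosen here sets the quantization noise floor.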

What is DSP?

DSP is made possible by mathematical research, the digital computer and IC technology

Discretization and interpolation have been part of mathematics since classical times

Work by Fourier, Poisson, Laurent and others during the 1700s and 1800s

Work during the 1900s by Nyquist, Shannon, Bode and others

Calculating machines, ENIAC and the modern digital computer

Integrated circuit technology in the 1950s

Numerical filtering methods during the 1960s

Creation of specific processors (DSPs), ADCs and DACs

Powerful processors, IC technology and alternatives to ASICs

Tools, compilers and simulators make our job easier

How is DSP Done?

DSP algorithms… Filters

FIR IIR

Discrete Fourier Transform

DSP algorithms… Direct Digital Synthesis (DDS)

Digital Up-converter

How is DSP Done?

DSP algorithms… OFDM Receiver (used in benchmark article)

How is DSP Done?

DSP applications (DSPs: Back to the Future, ACM Queue article)

How is DSP Done?

DSP algorithms… shape DSP architectures

Fast multiplication and other DSP tasks: single cycle multiply-accumulate (MAC), ALUs, shifter, wide accumulators

Flexible and efficient memory access: data delay lines, FIFOs, dedicated address generation (inverted, circular addressing), high bandwidth (multiple busses, coefficient )

Efficient looping: zero overhead looping, addressing and calculations in parallel

Real time, speed: high clock frequency, parallelism (MAC, ALU, address generation, SIMD), special instruction sets (low end DSP), multiple execution units (high end, VLIW)

Streamlined I/O and interfaces: must connect to ADCs and DACs and transfer data in and out in real time and with little overhead

Data formats: diverse precisions, accumulator guard bits. Support for rounding, saturation and shifting. Speed, cost & power -> fixed point; numeric fidelity -> floating point
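To make the MAC and circular addressing ideas concrete, here is a minimal software model of a direct-form FIR filter. It is only a sketch: `fir_mac` and the three example coefficients are hypothetical, and a real DSP or FPGA would use a fixed-width accumulator with guard bits rather than Python's unbounded integers.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR computed as one multiply-accumulate (MAC) per tap."""
    n_taps = len(coeffs)
    delay = [0] * n_taps          # delay line held in a circular buffer
    head = 0                      # index of the newest sample
    out = []
    for x in samples:
        head = (head + 1) % n_taps            # circular addressing: overwrite oldest sample
        delay[head] = x
        acc = 0                               # wide accumulator (Python ints never overflow)
        for k in range(n_taps):
            acc += coeffs[k] * delay[(head - k) % n_taps]   # one MAC operation
        out.append(acc)
    return out

# Feeding an impulse returns the coefficients, which is a quick sanity check
impulse_response = fir_mac([1, 0, 0, 0, 0], [3, -2, 5])
```

The inner loop is what a DSP executes as a zero-overhead loop of single-cycle MACs; on an FPGA the same loop can be unrolled into parallel multipliers.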

How is DSP Done?

What else does Wikipedia say… DSP algorithms have traditionally run on specialized processors called digital signal processors (DSPs). Algorithms requiring more performance than DSPs could provide were typically implemented using application-specific integrated circuits (ASICs). Today however there are a number of technologies used for digital signal processing.

These include more powerful general purpose microprocessors, field-programmable gate arrays (FPGAs), digital signal controllers (mostly for industrial apps such as motor control), and stream processors, among others.

How is DSP Done?

Nowadays there are several options for DSP applications

ASICs (Application Specific Integrated Circuits)

ASSP (Application Specific Standard Product)

DSP (Digital Signal Processor)

FPGA (Field Programmable Gate Array)

GPP and MCUs with DSP enhancements

High end CPUs

We have to choose the right platform for each problem!

Speed? Cost? Power? Tools? Time to market? Flexibility? Other tasks?

How is DSP Done?

How do we choose?

What are my needs? What are each option’s strengths? What are each option’s limitations? What are my strengths and weaknesses? Are there tools available? What will be around the DSP portion of my design?

Strengths and limitations of each option change over time… and they change quickly!

Update your information and don’t take anything for granted

How is DSP Done?

FPGA Technology Overview

Reading

Salvador Dali, 1981

FPGA Overview

An FPGA is a “sea of gates”. Lots of logic that can be connected together to form different combinational and sequential digital circuits.

An FPGA inside

Function generation (combinational logic) Registers and latches (sequential logic) Memory Clock management Power management

DSP functions!!!

FPGA Overview

Xilinx Spartan 3 FPGA – General FPGA Architecture

FPGA Overview

Xilinx Spartan 3 FPGA

CLB Structure

FPGA Overview

Xilinx Spartan 3 Memory

FPGA Overview

Xilinx Spartan 3 Clock Management

FPGA Overview

Xilinx Spartan 3 Routing

FPGA DSP Functions

High end FPGAs – Function generators & registers Xilinx Virtex 5

Altera Stratix III

DSP: Low cost FPGAs Xilinx Spartan 3 has multipliers

Altera Cyclone III

FPGA DSP Functions

DSP: Low cost FPGAs LatticeECP-DSP

FPGA DSP Functions

DSP: Low cost FPGAs LatticeECP-DSP vs Spartan 3 (Lattice Article)

FPGA DSP Functions

DSP: Low cost FPGAs Xilinx Spartan 3A-DSP (XtremeDSP DSP48 slices)

FPGA DSP Functions

DSP: High end FPGAs

Xilinx and Altera both have High End FPGAs with DSP enhancements

High speed multipliers

Flexible Multiply–Accumulate logic

DSP block cascading and interconnection

Rounding and saturation units

Barrel shifter

Support for floating point multiplication

Advanced clock and power management

Support for additional DSP Intellectual Property (IP)

FPGA DSP Functions

DSP: High end FPGAs Altera Stratix III DSP Blocks

FPGA DSP Functions

DSP: High end FPGAs Xilinx Virtex 5 DSP48 Slice

FPGA DSP Functions

Do we want to use an FPGA?

Tower of Enigmas

Salvador Dali, 1981

FPGA Overview

Remember… a DSP is essentially a sequential processing machine, with support to execute (although most DSPs do several things in parallel)

Analog Devices’ AD21xx architecture

FPGA Overview

… but some are very powerful processing machines!

TI’s C6712 AD’s Blackfin

When to use FPGAs

When do we choose FPGAs to do DSP?

FPGAs are good for…

Lots of parallel processing

Many simple and rigid, repetitive tasks

High sampling rates and data bandwidth

Fixed point operations

Implementing small DSP blocks within lots of digital logic

Prototyping or replacing ASICs

Flexible or dynamic hardware configuration

Mapping a block diagram directly into hardware

Multirate systems

Configurable word lengths and precision

What else?

When do we choose FPGAs to do DSP?

FPGAs are not that good for…

Sequential tasks (if we have C code available)

Complex tasks with lots of decision making and branching

Very low power applications (but that is changing)

Floating point operations

… What else?

When to use FPGAs

When to use FPGAs

From Xilinx slides..

FPGA vs DSP

From an ACM Queue article

From FPGA vendor’s article (Altera)

Platform   Time to Market   Performance   Price   Ease of Use   Power   Flexibility
ASIC       Longest          High          Low     Hardest       Low     Low
ASSP       Shortest         High          Low     Easiest       Low     Low
DSP        Short            High          Low     Easy          Low     High
FPGA       Short            High          High    Hard          High    High
MCU        Short            Lowest        Low     Easy          Low     High
RISC/GPP   Short            Low           High    Easy          High    High

When to use FPGAs

FPGA vs DSP

From FPGA vendor (Altera at FPGA-DSP.com article)

When to use FPGAs

FPGA vs DSP

From FPGA vendor (Xilinx at DSP Engineering article)

FPGAs for high end applications

Improved performance (parallelism)

Lower system power (compared to DSP clusters)

Reconfigurable hardware (evolving standards)

Custom bit precision

Optimization of computation hardware (not possible in DSPs) (distributed arithmetic, etc., see Andraka)

High I/O bandwidth

When to use FPGAs

FPGA vs DSP and other options

ASICs (Application Specific Integrated Circuits)

ASSP (Application Specific Standard Product)

DSP (Digital Signal Processor)

FPGA (Field Programmable Gate Array)

GPP and MCUs with DSP enhancements (dsPIC, ARM DSP extensions)

High end CPUs (Intel, AMD doing processing for audio and images)

Comments? Opinions? Other issues?

When to use FPGAs

We’ve decided to use an FPGA! What issues affect our implementation?

Portrait of Mrs. Mary Sigall

Salvador Dali, 1948

Implementation Issues

Issues that affect the implementation on an FPGA

Data frequency, sampling frequency, clock frequency

Number representation, word widths, precision, rounding

Arithmetic operations, parallel, serial, distributed, overflow, underflow, saturation

Look-Up tables, block ram or distributed memory, optimizations


Frequencies

Sampling frequency: Frequency at which samples of the data are taken and processed.

Clock frequency: Frequency of the system clock (Clock driving the FPGA registers)

Data rate: Rate at which new data arrives and needs to be processed

Multiple frequencies: several different data rates, multiple sampling frequencies and/or different clock domains

These rates and frequencies will affect and limit the possible architectures and solutions


Sampling frequency

Data sampling must meet the Nyquist criterion

External data must be band limited before it is sampled (filters)

Data can be oversampled or undersampled

Oversampling is used to increase SNR or reduce the effects of quantization noise

Undersampling is used for IF or RF signals

Multirate systems have several sampling frequencies; the relationship between them affects data transfers between them

The FPGA will drive the ADC control signals: ADC timing, relationship between system clock and ADC signals, meeting data setup and hold times in the FPGA
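A small sketch of what undersampling does to a tone: the function below folds a real input frequency into the first Nyquist zone. The name `alias_frequency` and the 70 MHz / 100 MHz figures are assumptions for illustration, and the model ignores the spectral inversion that occurs in some zones.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Frequency at which a real tone appears after sampling at f_sample_hz."""
    f = f_signal_hz % f_sample_hz           # fold into [0, fs)
    return min(f, f_sample_hz - f)          # fold into [0, fs/2]

# Hypothetical IF example: a 70 MHz tone sampled at 100 MHz shows up at 30 MHz
if_alias = alias_frequency(70_000_000, 100_000_000)

# A tone below fs/2 meets the Nyquist criterion and is left where it is
baseband = alias_frequency(1_000, 8_000)
```

This is why the band limiting filter in front of the ADC matters: anything outside the intended zone folds on top of the wanted signal.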


Sampling frequency

Manage synchronization to system clock

Integer division of system clock (Sample rate = Clock / 4 is easy!)

Manage multiple sample rates (Downsampling by 43 is hard)

Use asynchronous FIFOs to move data between the processing logic and the DAC and ADC

FPGA will probably drive ADC and DAC control signals

ADC and DAC timing

Relationship between system clock and converter signals

Must meet data setup and hold times for FPGA and DAC


Clock frequency

Higher clock frequencies will enable higher sampling rates

Higher clock frequencies will consume more power

Use as few clock domains as possible and control rates with “Clock Enable” input

Clock dividers generate “CE” signals at lower rates

Logic can be reused
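The clock enable idea can be modeled in a few lines. This is a behavioral sketch in Python rather than HDL, and `clock_enable` and `enabled_register` are hypothetical names: a free-running counter produces a one-cycle CE pulse every `divide_by` clocks, and a register only updates on cycles where CE is high.

```python
def clock_enable(divide_by, n_clocks):
    """Return one CE value per system clock cycle: 1 every divide_by cycles."""
    ce = []
    count = 0
    for _ in range(n_clocks):
        ce.append(1 if count == 0 else 0)
        count = (count + 1) % divide_by     # free-running modulo counter
    return ce

def enabled_register(data_in, ce):
    """Register value after each clock edge; it only loads when CE is 1."""
    q = 0
    out = []
    for d, en in zip(data_in, ce):
        if en:
            q = d
        out.append(q)
    return out

ce = clock_enable(divide_by=4, n_clocks=12)
held = enabled_register(list(range(12)), ce)
```

All registers still share the single system clock, which is what keeps the design in one clock domain.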


Clock frequency

If the sampling frequency is lower than the system clock frequency, several clock cycles can be used to process each sample
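The ratio between clock and sample rate sets a cycle budget per sample, which in turn suggests how much hardware can be shared. The sketch below assumes a FIR filter where each multiplier performs one tap per clock cycle; the function names and the 100 MHz / 25 MHz / 64-tap figures are made up for the example.

```python
import math

def cycles_per_sample(f_clock_hz, f_sample_hz):
    """Whole system clock cycles available between consecutive samples."""
    return f_clock_hz // f_sample_hz

def multipliers_needed(n_taps, f_clock_hz, f_sample_hz):
    """Multipliers required if each one computes one tap per clock cycle."""
    return math.ceil(n_taps / cycles_per_sample(f_clock_hz, f_sample_hz))

# Hypothetical 64-tap filter on a 100 MHz system clock
budget = cycles_per_sample(100_000_000, 25_000_000)
shared = multipliers_needed(64, 100_000_000, 25_000_000)
serial = multipliers_needed(64, 100_000_000, 1_000_000)
parallel = multipliers_needed(64, 100_000_000, 100_000_000)
```

With these numbers the design ranges from a single time-shared multiplier at 1 MHz to one multiplier per tap when data arrives at the clock rate.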


Data rate

Useful data might come at lower frequencies than the sampling frequency

Some data may not need to be sampled at the same rate as other data


Clock frequency and sample rate

Data at clock rate

Filter result available on every clock cycle

Data at CE rate

Filter result takes several clock cycles to complete

Note: Data has N bits and each FF represents N registers


Clock frequency and sample rate

Clock at high speed

Data at CE rate (sample frequency)

Each coefficient is multiplied at CE2 rate

Multiplication is implemented with shift-add logic, and takes several cycles to complete

Filter result takes several clock cycles to complete
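A behavioral sketch of a shift-add multiplier shows why the product takes several cycles. This is an assumed, simplified model for unsigned operands (`shift_add_multiply` is a hypothetical name): one bit of the multiplier is examined per clock cycle and the shifted multiplicand is added when that bit is 1.

```python
def shift_add_multiply(a, b, n_bits):
    """Multiply unsigned a by unsigned n_bits-wide b, one bit of b per clock cycle."""
    product = 0
    cycles = 0
    for i in range(n_bits):
        if (b >> i) & 1:            # examine bit i of the multiplier
            product += a << i       # add the shifted multiplicand
        cycles += 1                 # each bit costs one clock cycle
    return product, cycles

product, cycles = shift_add_multiply(13, 11, n_bits=4)
```

The hardware cost is roughly one adder and a shift register, traded against a latency of `n_bits` cycles per product.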


Clock frequency and sample rate

In the filter shown, timing between all signals must be synchronized to achieve results

Use SYNCHRONOUS logic, as recommended for FPGAs

Different CE signals control what parts of the process are activated by enabling FFs


Number representation, Bits and Word Widths

Fixed point or floating point. Number representation.

Operations on the data will change the word size to maintain full precision

Scaling

Overflow, underflow, rounding


Number systems for binary representations

Fixed point numbers on FPGAs (For now! High end FPGAs have support for some floating point operations)

Each system has advantages and disadvantages for implementations in digital circuits or arithmetic operations.

Our examples will use fixed point two’s complement representations

Fixed point

Traditional: Two’s complement, One’s complement, Sign-Magnitude, Diminished-1

Non traditional: Signed digit, RNS


Fixed point binary numbers

Fixed point sets the binary (radix) point at a fixed location within the binary word
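A minimal sketch of the idea, assuming a format with `frac_bits` bits to the right of the point: the stored integer is simply the real value scaled by a power of two. The helper names `to_fixed` and `to_float` are hypothetical.

```python
def to_fixed(value, frac_bits):
    """Encode a real number as an integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))

def to_float(code, frac_bits):
    """Decode the integer back to the real value it represents."""
    return code / (1 << frac_bits)

# 2.75 with 4 fractional bits is stored as 44, i.e. binary 10.1100
code = to_fixed(2.75, 4)

# 0.1 is not exactly representable and lands on the nearest step of 1/16
approx = to_float(to_fixed(0.1, 4), 4)
```

The hardware never sees the point; it is a convention the designer tracks through every operation.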


Operation results have longer word lengths
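The usual full-precision growth rules for two’s complement operands can be written down directly. The sketch below uses hypothetical helper names and an example of a 16 x 16 bit multiplier feeding a 64-term accumulation.

```python
def add_width(n_bits_a, n_bits_b):
    """Bits needed to hold the sum of two signed operands without overflow."""
    return max(n_bits_a, n_bits_b) + 1

def mul_width(n_bits_a, n_bits_b):
    """Bits needed to hold the full product of two signed operands."""
    return n_bits_a + n_bits_b

def accumulate_width(n_bits, n_terms):
    """Bits needed to sum n_terms signed values of n_bits each."""
    return n_bits + (n_terms - 1).bit_length()    # adds ceil(log2(n_terms)) bits

product_bits = mul_width(16, 16)
accumulator_bits = accumulate_width(product_bits, 64)
```

Under these assumptions the accumulator needs 6 bits beyond the product width, which is the kind of sizing guard bits are meant to cover.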


Overflow, saturation, rounding & scaling

Overflow results when an arithmetic operation requires more bits than are available in the result register

Rounding helps keep the number of bits low

May introduce offsets or cumulative errors

Scaling can be used to reduce the number of bits used

If all numbers are between -1 and 1, multiplication will result in a number between -1 and 1

Will result in larger round-off or quantization errors
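One common way to apply this scaling is a fractional format with 15 fractional bits in a 16-bit word (often called Q15). That format is an assumption here, not something the slides specify, and `q15_multiply` is a hypothetical helper that truncates the product back to 16 bits.

```python
Q15_ONE_HALF = 1 << 14        # 0.5 in a format with 15 fractional bits
Q15_MINUS_QUARTER = -(1 << 13)  # -0.25 in the same format

def q15_multiply(a, b):
    """Multiply two Q15 values and scale the 30-fractional-bit product back to Q15."""
    product = a * b               # full precision product, 30 fractional bits
    return product >> 15          # drop 15 LSBs (truncation adds round-off error)

quarter = q15_multiply(Q15_ONE_HALF, Q15_ONE_HALF)
minus_eighth = q15_multiply(Q15_ONE_HALF, Q15_MINUS_QUARTER)
```

The one case that does not fit is -1 times -1, whose result +1 is just outside the representable range and would need saturation.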


Overflow

Consider these 3-bit two’s complement numbers: 010 (2), 011 (3)

010 + 011 = 101, which reads as -3 in 3-bit two’s complement

Overflow! Maintaining the same number of bits gives an incorrect result.

An extra bit for the result will give the correct answer

Sign extension: 0011 + 0010 = 0101 (4 bit number = 5)

To avoid overflow we can use extra bits in the accumulator (guard bits)
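The example can be reproduced with a small model of a fixed-width two’s complement register. `wrap_twos_complement` is a hypothetical helper that keeps only the low `n_bits` of a result and reinterprets them as a signed number.

```python
def wrap_twos_complement(value, n_bits):
    """Keep the low n_bits of value and interpret them as a signed number."""
    mask = (1 << n_bits) - 1
    value &= mask                         # discard bits that do not fit
    if value & (1 << (n_bits - 1)):       # sign bit set: value is negative
        value -= 1 << n_bits
    return value

three_bit_sum = wrap_twos_complement(0b010 + 0b011, 3)   # wraps around
four_bit_sum = wrap_twos_complement(0b010 + 0b011, 4)    # one guard bit is enough
```

With 3 bits the sum wraps to a negative number, while a single extra bit gives the correct value.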


Saturation

In previous operation, 010 + 011:

Sign extension: 0011 + 0010 = 0101 (4 bit number = 5)

Result is OK but has an extra bit

Overflow is detected by comparing the old sign bit position (bit 3) with the new sign bit (bit 4). If bit 4 = bit 3 there is no overflow; here they differ (0 and 1), so the 3-bit result has overflowed

Saturate the result: 0101 saturated to 011 (maximum number that can be represented by 3 bits)

In filters, maybe we can saturate the result, but not the intermediate values
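A sketch of the detection and saturation steps, assuming the sum has been computed with one extra bit as in the example. Both function names are hypothetical, and bit positions are counted from zero in the code, so "bit 4" and "bit 3" in the text correspond to indices `n_bits` and `n_bits - 1`.

```python
def overflowed(wide_sum, n_bits):
    """Compare the new sign bit with the old sign position of an (n_bits+1)-bit sum."""
    wide = wide_sum & ((1 << (n_bits + 1)) - 1)
    new_sign = (wide >> n_bits) & 1
    old_sign = (wide >> (n_bits - 1)) & 1
    return new_sign != old_sign

def saturate(value, n_bits):
    """Clamp value to the range of a signed n_bits number."""
    hi = (1 << (n_bits - 1)) - 1
    lo = -(1 << (n_bits - 1))
    return max(lo, min(hi, value))

wide_sum = 0b0011 + 0b0010            # 5, computed with the extra bit
flag = overflowed(wide_sum, 3)
clamped = saturate(wide_sum, 3)
```

Saturating only the final output, while keeping guard bits on intermediate sums, lets temporary excursions cancel out inside the filter.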


Rounding

Get rid of least significant bits in result (multiplication)

Round coefficients and/or datapath

Different rounding methods: Truncation, round floor, round ceiling, round half-up, round half-even, etc.

Refer to Programmable Logic Design Line article (Jan 4, 2006)

“An introduction to different rounding algorithms”
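Three of the methods listed can be sketched for the case of dropping `k` least significant bits from a two’s complement integer. The function names are hypothetical; note that plain truncation of a two’s complement value behaves like round floor, because Python's `>>` is an arithmetic shift, as it would be in hardware.

```python
def truncate(value, k):
    """Drop k LSBs; for two's complement this rounds toward minus infinity."""
    return value >> k

def round_half_up(value, k):
    """Add half an LSB of the result before dropping k bits."""
    return (value + (1 << (k - 1))) >> k

def round_half_even(value, k):
    """Round to nearest; ties go to the even result to avoid a DC offset."""
    floor = value >> k
    remainder = value & ((1 << k) - 1)     # the k bits being discarded
    half = 1 << (k - 1)
    if remainder > half or (remainder == half and (floor & 1)):
        floor += 1
    return floor

# 10 with two fractional bits represents 2.5; 14 represents 3.5
results = [(truncate(v, 2), round_half_up(v, 2), round_half_even(v, 2))
           for v in (10, 14, -10)]
```

Truncation is free in hardware but biases results downward; the rounding variants cost an adder and remove most of that bias.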


Rounding


Arithmetic Operations and Implementation Structures

Operations can be done using different approaches

Different word widths, number representations, etc.

Different clock frequencies and data rates

Others (Distributed arithmetic, optimizations)

Several factors determine the type of implementation

Required precision

Required data rates and sampling frequency

Resources available in the FPGA

Dedicated DSP blocks, multipliers, logic, memory, etc

Others (Power. External logic.)


Arithmetic Operations: Some multipliers (from Andraka’s web site, www.andraka.com)

Scaling Accumulator

Ripple-Carry Array

Row Adder Tree

Computed Partial Product

Partial Product LUT



Structures, Calculations and Optimizations

Filter structures

Pipelining

Power of 2 operations

Cordic algorithms

Resource sharing

Look up tables for operations, look up table optimization

Type of memory used (block or distributed)

Etc.


Filter Structures (from Peled and Liu paper)


Pipelining


Operations and Structures

Refer to papers and articles:

“Multiplication in FPGAs”, Ray Andraka, www.andraka.com

“FPGAs: the high end alternative for DSP applications”, Chris Dick, DSP Engineering

“A New Hardware Realization of Digital Filters”, A. Peled and B. Liu, IEEE Trans. on Acoust., Speech, Signal Processing, Dec. 1974

“Application of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review“, Stanley White, IEEE ASSP Magazine, July 1989

“High Speed Binary Addition“, Robert Jackson, Sunil Talwar, IEEE Signals, Systems and Computers, 2004

Etc.


How do we go about doing DSP on an FPGA?

The Disintegration of the Persistence of Memory

Salvador Dali, 1954


Typical DSP design flow

Task Work on…

Design – Simulate Models

Code – Compile – Simulate Code (Assembly, C, HDLs)

Run - Test – Debug Platform

This is valid for CPU, DSP or FPGA based approaches… we are probably more careful if building an ASIC.


Possible design flows

Code-based

Model-based

Mixed

Tools?

A basic example: Filtering

Three Apparitions of the Visage of Gala

Salvador Dali, 1945
