digital signal processing dr. hugh blanton entc 4337/entc 5337

24
DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Upload: christine-madison-craig

Post on 29-Dec-2015

230 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

DIGITAL SIGNAL PROCESSING

Dr. Hugh Blanton

ENTC 4337/ENTC 5337

Page 2: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 2

OutlineOutline

• Signal processing applications

• Conventional DSP architecture

• TI TMS320C6000 DSP architecture introduction

• Signal processing on general-purpose processors

• Conclusion

Page 3: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 3

Signal Processing ApplicationsSignal Processing Applications• Embedded system demand: product volume matters

• 400 Million units/year: automobiles, PCs, and cell phones

• 30 Million units/year: ADSL modems and printers

• Embedded system cost and input/output rates• Low-cost, medium-throughput: low-end printers,

wireless handsets, sound cards, car audio, disk drives

• High-cost, high-throughput: high-end printers,wireless basestations, 3-D sonar, 3-D images from2-D X-rays (tomographic reconstruction)

• Embedded processor requirements• Inexpensive with small area and volume

• Predictable input/output (I/O) rates to/from processor

• Power constraints (severe for handheld devices)

Single DSP

Multiple DSPs

Page 4: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 4

Conventional DSP ProcessorsConventional DSP Processors• Low cost: $3/processor in volume

• Deterministic interrupt service routine latency guarantees predictable input/output rates• On-chip direct memory access (DMA) controllers

• Processes streaming input/output separately from CPU

• Sends interrupt to CPU when block has been read/written

• Ping-pong buffering• CPU reads/writes buffer 1 while DMA reads/writes buffer 2

• When DMA finishes with buffer 2, roles of buffer 1 & 2 switch

Page 5: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 5

Conventional DSP ProcessorsConventional DSP Processors• Low power consumption: 10-100 mW

• TI TMS320C54 0.32 mA/MIP 76.8 mW at 1.5 V, 160 MHz

• TI TMS320C55 0.05 mA/MIP 22.5 mW at 1.5 V, 300 MHz

Page 6: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 6

Conventional DSP ArchitectureConventional DSP Architecture• Multiply-accumulate (MAC) in one instruction cycle

• Harvard architecture for fast on-chip input/output

• Data memory/bus(es) separate from program memory/bus

• One read from program memory per instruction cycle

• Two reads/writes from/to data memory per instruction cycle

• Instructions to keep pipeline (3-6 stages) full

• Zero-overhead looping (one pipeline flush to set up)

• Delayed branches

• Special addressing modes supported in hardware

• Bit-reversed addressing (e.g. fast Fourier transforms)

• Modulo addressing for circular buffers (e.g. FIR filters)

Page 7: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 7

Conventional DSP Architecture (con’t)

xN-K+1 xN-K+2xN-1 xN

Data ShiftingTime Buffer contents Next sample

xN+1

xN+3

xN+2

n=N

n=N+1

n=N+2 xN-K+3 xN-K+4xN+1 xN+2

xN-K+2 xN-K+3xN xN+1

• Buffer of length K• Used in finite and

infinite impulse response filters

• Linear buffer• Order by time index• Data shifting update:

discard oldest data, copy old data left, insert new data

Page 8: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 8

Conventional DSP Architecture (con’t)

Modulo AddressingTime Buffer contents Next sample

n=N

n=N+1

n=N+2

xN-2 xN-1xN-K+1 xN-K+2

xN-K+4

xN+1

xN+2

xN+3

xN-2 xN-1xN+1 xN-K+2xN

xN-2 xN-1xN+1 xN+2xN

xN

xN

xN

xN-K+3

xN-K+3 xN-K+4

• Circular buffer• Index oldest sample • Modulo addressing update: insert new data at oldest index,

update oldest index

Page 9: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 9

Conventional DSP Processors Summary Conventional DSP Processors Summary Fixed-Point Floating-Point Cost/Unit $3 - $79 $3 - $381 Architecture Accumulator load-store or

memory-register Registers 2-4 data

8 address 8 or 16 data

8 or 16 address Data Words 16 or 24 bit integer

and fixed-point 32 bit integer and fixed/floating-point

On-Chip Memory

2-64 kwords data 2-64 kwords program

8-64 kwords data 8-64 kwords program

Address Space

16-128 kw data 16-64 kw program

16 Mw – 4Gw data 16 Mw – 4 Gw program

Compilers C, C++ compilers; poor code generation

C, C++ compilers; better code generation

Examples TI TMS320C5000; Motorola 56000

TI TMS320C30; Analog Devices SHARC

Page 10: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 10

Conventional DSP Processor Families

Conventional DSP Processor Families

• Floating-point DSPs

• Used in first pass prototyping of algorithms

• Resurgence due to professional and car audio

• Different on-chip configurations in each family• Size and map of data and program memory

• A/D, input/output buffers, interfaces, timers, and D/A

• Drawbacks to conventional DSP processors

• No byte addressing (needed for images and video)

• Limited on-chip memory

• Limited addressable memory on fixed-point DSPs (exceptions include Motorola 56300 and TI C5409)

• Non-standard C extensions for fixed-point data type

DSP MarketFixed-point 95%Floating-point 5%

Page 11: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 11

Program RAMData RAM

or Cache

Internal Buses

Control Regs

Regs (B

0-B

15

)

Regs (A

0-A

15

)

.D1

.M1

.L1

.S1

.D2

.M2

.L2

.S2

CPU

Addr

Data

ExternalMemory -Sync -Async

DMA

Serial Port

Host Port

Boot Load

Timers

Pwr Down

TI TMS320C6000 DSP Architecture TI TMS320C6000 DSP Architecture

Simplified Architectur

e

Page 12: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 12

TI TMS320C6000 DSP ArchitectureTI TMS320C6000 DSP Architecture

• Families: All support same C6000 instruction set

• C6200 fixed-point 150- 300 MHz ADSL, printers

• C6400 fixed-point 300-1,000 MHz video/comm. apps.

• C6700 floating-point 100- 225 MHz medical imaging, pro-audio

• TMS320C6211: 150 MHz, $21 in volume

• 300 million multiply-accumulates/s, 1200 RISC MIPS

• On-chip memory: 16 kwords program, 16 kwords data

• TMS320C6701 Evaluation Module Board 167 MHz

• 334 million multiply-accumulates/s, 1336 RISC MIPS

• On-chip memory: 16 kwords program, 16 kwords data

• External: one 133-MHz 64-kword, two 100-MHz 1-Mword

Page 13: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 13

TI TMS320C6000 DSP ArchitectureTI TMS320C6000 DSP Architecture

• Very long instruction word (VLIW) size of 256 bits

• Eight 32-bit functional units with single cycle throughput

• One instruction cycle per clock cycle

• Data word size is 32 bits

• 16 (32 on C64) 32-bit registers in each of two data paths

• 40 bits can be stored in adjacent even/odd registers

• Two parallel data paths

• Data unit - 32-bit address calculations (modulo, linear)

• Multiplier unit - 16 bit 16 bit with 32-bit result

• Logical unit - 40-bit (saturation) arithmetic & compares

• Shifter unit - 32-bit integer ALU and 40-bit shifter

Page 14: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 14

TI TMS320C6000 Instruction SetTI TMS320C6000 Instruction Set

.S UnitADD NEGADDK NOTADD2 ORAND SETB SHLCLR SHREXT SSHLMV SUBMVC SUB2MVK XORMVKH ZERO

.L UnitABS NOTADD ORAND SADDCMPEQ SATCMPGT SSUBCMPLT SUBLMBD SUBCMV XORNEG ZERONORM

.M UnitMPY SMPYMPYH SMPYH

.D UnitADD STADDA SUBLD SUBAMV ZERONEG

OtherNOP IDLE

C6000 Instruction Set by Functional Unit

Six of the eight functional units can perform integer add, subtract, and move operations

Page 15: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 15

TI TMS320C6000 Instruction SetTI TMS320C6000 Instruction SetArithmeti

cABSADD

ADDAADDKADD2MPY

MPYHNEGSMPY

SMPYHSADDSAT

SSUBSUB

SUBASUBCSUB2ZERO

LogicalAND

CMPEQCMPGTCMPLTNOTORSHLSHRSSHLXOR

BitManagement

CLREXT

LMBDNORMSET

DataManagement

LDMV

MVCMVK

MVKHST

ProgramControl

BIDLENOP

C6000 InstructionSet by Category

(un)signed multiplicationsaturation/packed arithmetic

Page 16: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 16

C6000 vs. C5000 Addressing Modes

ADD #0FFh add .L1 -13,A1,A6

TI C5000

TI C6000

(implied) add .L1 A7,A6,A7

ADD 010h not supported

ADD * ldw .L1 *A5++[8],A1

• Immediate• The operand is part of

the instruction

• Register• Operand is specified in

a register

• Direct• Address of operand is

part of the instruction (added to imply memory page)

• Indirect• Address of operand is

stored in a register

Page 17: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 17

TI TMS320C6000 DSP ArchitectureTI TMS320C6000 DSP Architecture

• Deep pipeline

• 7-11 stages in C6200: fetch 4, decode 2, execute 1-5

• 7-16 stages in C6700: fetch 4, decode 2, execute 1-10

• Pentium IV has an estimated 20 pipeline stages

• Avoid using branch instructions in code

• Branch instruction in pipeline disables interrupts: latency of a branch is 5 cycles

• Avoid branches by using conditional execution: every instruction can be conditionally executed

• No hardware protection against pipeline hazards

• Compiler and assembler must prevent pipeline hazards

Page 18: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 18

TI TMS320C6700 ExtensionsTI TMS320C6700 Extensions

.S UnitABSDP CMPLTSP

ABSSP RCPDPCMPEQDP RCPSP CMPEQSP RSARDP CMPGTDP RSQRSP CMPGTSP SPDPCMPLTDP

.L UnitADDDP INTSPADDSP SPINTDPINT SPTRUNCDPSP SUBDPDPTRUNC SUBSPINTDP

.M UnitMPYDP MPYIDMPYI MPYSP

.D UnitADDAD LDDW

C6700 Floating Point Extensions by Unit

Four functional units can perform IEEE single-precision (SP) and double-precision (DP) floating-point add, subtract, move.

Operations beginning with R are reciprocal calculations.

Page 19: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 19

Digital Signal Processor Cores

• ASIC with

• Programmable digital

signal processor core

• RAM

• ROM

• Standard cells

• Codec

• Peripherals

• Gate array

• Microcontroller core

Application Specific Integrated Circuit

Page 20: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 20

General Purpose ProcessorsGeneral Purpose Processors

• Multimedia applications on PCs• Video, audio, graphics and animation

• Repetitive parallel sequences of instructions

• Single Instruction Multiple Data (SIMD)

• One instruction acts on multiple data in parallel

• Well-suited for graphics

• Native signal processing extensions use SIMD

• Sun Visual Instruction Set [1995] (UltraSPARC 1/2)

• Intel MMX [1996] (Pentium I/II/III/IV)

• Intel Streaming SIMD Extensions (Pentium III)

Page 21: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 21

DSP on General Purpose Processors (con’t)DSP on General Purpose Processors (con’t)

• Programming is considerably tougher

• Ability of compilers to generate code for instruction set extensions may lag (e.g. four years for Pentium MMX)

• Libraries of routines using native signal processing

• Hand code in assembly for best performance

• Single-instruction multiple-data (SIMD) approach

• Pack/unpack data not aligned on SIMD word boundaries

• Saturation arithmetic in MMX; not supported in VIS

• Extended-precision accumulation in MMX; none in VIS

• Application speedup for Intel MMX and Sun VIS

• Signal and image processing: 1.5:1 to 2:1

• Graphics: 4:1 to 6:1 (no packing/unpacking)

Page 22: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 22

Concluding RemarksConcluding Remarks• Conventional digital signal processors

• High performance vs. power consumption/cost/volume• Excellent at one-dimensional processing• Per cycle: 1 16 16 MAC & 4 16-bit RISC instructions

• TMS320C6000 VLIW DSP family• High performance vs. cost/volume• Excellent at multidimensional signal processing• Per cycle: 2 16 16 MACs & 4 32-bit RISC instructions

• Native signal processing• Available on desktop computers• Excels at graphics• Per cycle: 2 8 16 MACs OR 8 8-bit RISC instructions

• Use assembly for computational kernels and C for main program (control code, interrupt def.)

Page 23: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 23

Concluding RemarksConcluding Remarks• Digital signal processor market

• 40% annual growth rate 1990-2000 fastest in semiconductor market• Revenue: $3.5B ‘98, $4.4B ‘99, $6.1B ‘00, $4.5B ‘01, $4.9B ‘02• 2000: 44% TI, 23% Agere, 13% Motorola, 10% Analog Devices• 2001: 40% TI, 16% Agere, 12% Motorola, 8% Analog Devices• 2002: 43% TI, 14% Motorola, 14% Agere, 9% Analog Devices

• Independent processor benchmarking by industry• Berkeley Design Technology Inc. http://www.bdti.com• EDN Embedded Microprocessor Benchmark Consortium

http://www.eembc.org• Web resources

• Newsgroup comp.dsp: FAQ http://www.bdti.com/faq/dsp_faq.html

• Embedded processors and systems: http://www.eg3.com• On-line courses: http://www.techonline.com

Page 24: DIGITAL SIGNAL PROCESSING Dr. Hugh Blanton ENTC 4337/ENTC 5337

Dr. Blanton - ENTC 4337 - General Architecture 24

Concluding RemarksConcluding Remarks• Web resources

• Newsgroup comp.dsp: FAQ http://www.bdti.com/faq/dsp_faq.html

• Embedded processors and systems: http://www.eg3.com• On-line courses: http://www.techonline.com