Data Path - Yonsei University · soc.yonsei.ac.kr/class/material/dsp/datapath.pdf · 2017-03-06 · dsp...

DSP VLSI Design

Data Path

Byungin Moon

Yonsei University


Outline

- Definition of DSP data paths
- Fixed-point data paths
  - Multiplier, accumulator registers, ALU, shifter
  - Overflow, saturation, rounding
  - Operand supply
- Floating-point data paths
  - Multiplier, accumulator registers, ALU, shifter
  - Overflow, saturation, rounding
  - Operand supply
- Special function units


What is the Data Path of a DSP Processor?

- The part of the processor where the essential arithmetic manipulations of signals take place
- Highly specialized to achieve high performance on the types of computation most common in DSP applications, such as multiply-accumulate
- The key feature of DSP processors
  - Along with the memory architecture, the feature that most clearly differentiates DSP processors from other kinds of processors
- Classifications
  - Fixed-point data path
  - Floating-point data path


Fixed-Point Data Paths

- Constituent components
  - A multiplier, an ALU, one or more shifters, operand registers, accumulators, and other specialized units
- ALU (Arithmetic Logic Unit)
  - In the book DSP Fundamentals, refers to the combined adder/subtractor/logic functional unit
  - Some vendors refer to the entire data path as the ALU or arithmetic unit
- AGU (Address Generation Unit)
  - Separate hardware unit for a rich variety of address calculations
    - Modulo, bit-reversed
  - Specialized addressing modes are one of the factors distinguishing DSPs from other kinds of processors
  - Exceptions
    - DSP32C and DSP32xx provide separate floating-point and fixed-point data paths, and the fixed-point data path is used for address calculations
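The two address calculations named above, modulo and bit-reversed, can be sketched in Python. This is only an illustrative software model of what an AGU computes; the function names `modulo_next` and `bit_reverse`, the buffer base address 100, and the buffer sizes are assumptions made for the example, not taken from any particular processor.

```python
def modulo_next(addr, step, base, length):
    """Advance addr by step inside the circular buffer [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (the ordering used by radix-2 FFTs)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Walk a 4-word circular buffer that starts at (hypothetical) address 100.
addr = 100
circular = []
for _ in range(6):
    circular.append(addr)
    addr = modulo_next(addr, 1, base=100, length=4)

# Bit-reversed access order for an 8-point FFT (3 index bits).
fft_order = [bit_reverse(i, 3) for i in range(8)]

print(circular)
print(fft_order)
```

In a DSP these updates happen in the AGU in parallel with the data path, so they cost no extra instruction cycles; in the sketch they are ordinary function calls.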


[Figure: Example of a fixed-point data path (DSP5600x, 24-bit fixed-point DSP)]


Multiplier

- Multiplication
  - Essential operation in virtually all DSP applications
  - Accounts for half or more of the instructions executed by the processor
- Single-cycle multiplier
  - Central to the definition of a programmable digital signal processor
  - Virtually all DSP processors contain a multiplier that can multiply two native-sized operands in a single instruction cycle
- Multiplier capabilities
  - There are important differences among DSP processors


Some Differences in Terms of Multiplier Capabilities

- Internal pipelining of the multiplier
  - All DSP multipliers can produce one new result per instruction cycle (one-cycle input latency)
  - Without pipelining
    - One-cycle output latency: one-cycle delay from the time inputs are presented to the multiplier until the time the result is available
    - Lower clock frequency
  - With pipelining
    - Multiple-cycle output latency
    - One or more cycles must be spent waiting for the multiplier result when a data dependency exists between instructions
    - Higher clock frequency
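The cost of a data dependency on a pipelined multiplier can be estimated with a rough cycle-count model. The sketch below is an assumption-laden simplification (one multiply issued per cycle, a dependent multiply waits for the full output latency of its predecessor, nothing else competes for the pipeline); `cycles_for_chain` is a hypothetical helper name.

```python
def cycles_for_chain(num_multiplies, output_latency, dependent):
    """Rough cycle count for a sequence of multiplies.

    Assumes one multiply can be issued per cycle (one-cycle input latency).
    If every multiply needs the previous result, it must wait the full
    output latency; otherwise the multiplies overlap in the pipeline.
    """
    if num_multiplies == 0:
        return 0
    if dependent:
        return num_multiplies * output_latency
    return num_multiplies + output_latency - 1


# Eight multiplies on a non-pipelined (latency 1) and a pipelined (latency 2) multiplier.
for latency in (1, 2):
    print(latency,
          cycles_for_chain(8, latency, dependent=False),
          cycles_for_chain(8, latency, dependent=True))
```

Under these assumptions the pipelined multiplier loses only one cycle on independent multiplies but doubles the cycle count of a fully dependent chain, which is why the higher clock frequency pays off only when dependencies are rare.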


Some Differences in Terms of Multiplier Capabilities (continued)

- Integrated or separate
  - Integrated multiplier
    - The multiplier is integrated with an adder to form a multiplier-accumulator (MAC) unit
    - MAC result is not delayed for accumulation
    - Lower clock frequency
    - Example: DSP5600x
  - Separate multiplier
    - The output is deposited into a product register, and from there can be sent to an adder for accumulation
    - MAC operation result is delayed by one cycle
    - Higher clock frequency
    - Two independent operations can be carried out simultaneously in the multiplier and the adder (I think)
    - Example: AT&T DSP16xx
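The one-cycle delay of the separate-multiplier arrangement can be illustrated with a small cycle-by-cycle model. This is a sketch under simplifying assumptions (unbounded integers, one multiply and one add per cycle); `mac_integrated` and `mac_separate` are hypothetical names, not vendor terminology.

```python
def mac_integrated(xs, hs):
    """Multiply and accumulate in the same cycle; returns (sum, cycles)."""
    acc = 0
    cycles = 0
    for x, h in zip(xs, hs):
        acc += x * h
        cycles += 1
    return acc, cycles


def mac_separate(xs, hs):
    """Multiplier fills a product register; the adder consumes it one cycle later."""
    pairs = list(zip(xs, hs))
    acc = 0
    product = None                      # product register, empty before the first multiply
    cycles = 0
    for i in range(len(pairs) + 1):     # one extra cycle drains the product register
        if product is not None:
            acc += product              # adder uses the product from the previous cycle
        if i < len(pairs):
            x, h = pairs[i]
            product = x * h             # multiplier runs in the same cycle as the adder
        else:
            product = None
        cycles += 1
    return acc, cycles


print(mac_integrated([1, 2, 3], [4, 5, 6]))
print(mac_separate([1, 2, 3], [4, 5, 6]))
```

Both versions produce the same sum; the separate arrangement needs one more cycle in total, and that overhead becomes negligible for long sums of products.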


Some Differences in Terms of Multiplier Capabilities (continued)

- The size of the product relative to the size of the input operands
  - Law of conservation of bits
    - When multiplying two n-bit fixed-point numbers, 2 × n bits are required to represent the resulting product without introducing any error
    - Example: -128 (8 bits) × -128 (8 bits) = 16,384, which needs a 16-bit output
  - A result that is twice the width of the input operands
    - The multiplier itself does not introduce any errors
    - In most fixed-point DSP processor multipliers
  - Narrower result
    - For the sake of speed and cost
    - Zilog Z893xx and the Clarkspur CD2400 core
      - 16-bit operands and a 24-bit result
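The "conservation of bits" rule can be checked exhaustively for small operand widths. The sketch below assumes two's-complement operands; `signed_bits_needed` is a hypothetical helper introduced only for this check.

```python
def signed_bits_needed(value):
    """Smallest two's-complement width that can hold value."""
    bits = 1
    while not (-(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


n = 8
lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1      # -128 .. 127

# Widest product over every pair of 8-bit signed operands.
widest = max(signed_bits_needed(a * b)
             for a in range(lo, hi + 1)
             for b in range(lo, hi + 1))

print(widest)                          # width required by the worst case
print(signed_bits_needed(-128 * -128)) # the example from the slide
```

Only the single case -128 × -128 needs the full 2n bits; every other pair of 8-bit operands fits in 15 bits.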


Selecting Multiplier Output Portion

- Passing the full-width multiplication result to the next step of computation
  - Usually impossible (results grow ever wider)
  - And not necessary (excessive dynamic range and precision are not useful)
- Selection before or after accumulation
  - Select a subset of the multiplier output bits to be passed on to the next computation
  - After a series of multiplications and accumulations, the accumulator result is reduced
- Selected portion of the result
  - Usually depends on the position of the radix point
  - The radix point of the selected result is in the same position as in the normal format


[Figure: Selecting result portions in integer multiplication]


[Figure: Selecting result portions in fractional multiplication]


Support for Fixed-point Multiplication

- Optimization in fractional multiplication
  - Fractional multiplication output (unmodified)
    - With x_{n-1} as the MSB of the product, the radix point lies between x_{n-2} and x_{n-3}, so x_{n-2} is an additional integer bit
    - So left-shift the result by one bit (x_{n-2} becomes the MSB)
    - Example: 0100 × 0100 = 00010000 unmodified (0.5 × 0.5 = 0.25 with two bits to the left of the radix point); after the left shift the result is 00100000
  - If both input operands are NMAX (the most negative value), the left shift produces an incorrect result
    - 1000 × 1000 = 01000000 -> 10000000 (incorrect: -1 × -1 yields -1 instead of +1)
    - Need to adjust the input operands
  - Sometimes integer multiplication also needs this adjustment
  - Automatic left shift by one bit in the case of fractional multiplication
- Hardware support for selecting the result portion
  - Treat the 2n-bit multiplier output register or accumulator register as two independently addressable n-bit registers
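The 4-bit examples above can be reproduced with a short Python model of a fractional multiply followed by the one-bit left shift. This is a sketch that assumes two's-complement fractional operands with one sign bit; `to_signed` and `fractional_multiply` are hypothetical helper names.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def fractional_multiply(a, b, n):
    """Multiply two n-bit fractional operands given as bit patterns.

    Returns the 2n-bit product pattern after the one-bit left shift, and the
    upper n bits that would be passed on as the n-bit fractional result.
    """
    raw = to_signed(a, n) * to_signed(b, n)          # 2n-bit product with two sign bits
    shifted = (raw << 1) & ((1 << (2 * n)) - 1)      # left shift drops the redundant sign bit
    high_half = shifted >> n                         # most significant n bits
    return shifted, high_half


shifted, high = fractional_multiply(0b0100, 0b0100, 4)      # 0.5 * 0.5
nmax_shifted, _ = fractional_multiply(0b1000, 0b1000, 4)    # -1.0 * -1.0

print(format(shifted, "08b"), format(high, "04b"))
print(format(nmax_shifted, "08b"), to_signed(nmax_shifted, 8))
```

The first call yields 00100000, whose upper half 0010 is 0.25 in the 4-bit fractional format. The second call shows the NMAX problem: the shifted pattern 10000000 reads as the most negative value rather than +1, which is not representable.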


Accumulator Registers

- Hold intermediate and final results of multiply-accumulate and other arithmetic operations
- Two or more accumulators are needed
- Guard bits
  - Extra bits that allow the programmer to accumulate a number of values without the risk of overflowing the accumulator and without the need to scale intermediate results to avoid overflow
  - n guard bits
    - Provide the capacity for up to 2^n values to be accumulated without the possibility of overflow
- Examples
  - AT&T DSP16xx provides four guard bits (36-bit accumulators and 32-bit multiplier product)
  - Analog Devices ADSP-21xx provides eight guard bits (40-bit accumulators and 32-bit multiplier product)
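The 2^n capacity provided by n guard bits can be confirmed by adding the worst-case product repeatedly until the accumulator would overflow. The sketch assumes two's-complement values and takes the most negative product as the worst case; `safe_accumulations` is a hypothetical helper name.

```python
def safe_accumulations(product_bits, guard_bits):
    """Count how many worst-case products fit in the accumulator.

    The worst case is the most negative product, -2**(product_bits - 1).
    The accumulator is product_bits + guard_bits wide.
    """
    worst = -(1 << (product_bits - 1))
    acc_min = -(1 << (product_bits + guard_bits - 1))
    acc = 0
    count = 0
    while acc + worst >= acc_min:      # stop before the add that would overflow
        acc += worst
        count += 1
    return count


print(safe_accumulations(32, 4))   # four guard bits
print(safe_accumulations(32, 8))   # eight guard bits
```

With four guard bits the count is 16 and with eight it is 256, matching 2^n for the two accumulator configurations listed above.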


Alternatives to Guard Bits

- Scaling multiplier outputs
  - Scale the multiplier result by shifting it right by a few bits to avoid overflow
  - Some processors support this shifting without requiring additional instruction cycles (TI TMS320C2x and TMS320C5x)
  - Results in a loss of precision
    - Unless the amount of scaling used is extreme or the number of products being accumulated is very large, the loss of precision is small
    - In fractional arithmetic, only the most significant half of the accumulator is retained after a series of multiply-accumulates, so the loss of precision due to scaling the multiplier output usually does not affect the final result
- Scaling multiplier inputs (TI TMS320C1x)
  - Significantly reduced precision
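How little precision is lost by scaling multiplier outputs can be seen in a small numeric sketch. The product values and the two-bit shift below are arbitrary illustrative choices, and `accumulate_scaled` is a hypothetical helper name.

```python
def accumulate_scaled(products, shift):
    """Shift each product right before adding it to the accumulator."""
    return sum(p >> shift for p in products)


products = [1001, 2003, 3005, 4007]        # arbitrary example products
shift = 2

exact = sum(products)                       # full-precision accumulation
scaled = accumulate_scaled(products, shift) # each product loses its two LSBs
error = exact - (scaled << shift)           # error in units of the original LSB

print(exact, scaled, error)
```

The error is bounded by the number of products times 2^shift, so it only reaches the retained upper bits when the shift is large or very many products are accumulated.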


Arithmetic Logic Unit (ALU)

- Implements basic arithmetic and logical operations
  - Add, subtract, increment, negate, and logical AND, OR, and NOT
- Addition for MAC
  - In some processors, the ALU is used to perform addition for multiply-accumulate operations
  - In other processors, a separate adder is provided
- Width of logical operations
  - Some processors perform logical operations on operands that are the full width of the accumulator (DSP16xx: 36-bit accumulator and 36-bit logical operations)
  - Other processors can perform logical operations only on native-width data words (DSP5600x: 56-bit accumulator and 24-bit logical operations)


Shifter

- Necessity of shifting
  - The programmer will want to choose a particular subset of the result bits to pass along to the next stage of processing
  - A shifter eases this selection by scaling its input by a power of two (2^n)

[Figure: Example of an FIR filter with a gain of 100]


Shifter (continued)

- Trade-off of scaling signals
  - To avoid/prevent overflow
  - Loss of precision and dynamic range
- Types of shifters
  - Shift by zero or one bit at a time
    - Multiple cycles are consumed for multibit shifts
  - Barrel shifter
    - Supports shifts by any number of bits in a single cycle
- Position of shifters
  - Between the multiplier and the adder/ALU
  - In the path from the accumulator to memory
    - Guard bits do not remove the need for scaling
  - Inside the accumulator (supporting logical/rotate-type shifts)
  - Some DSP processors provide multiple shifters
    - Example: DSP5600x


Overflow

- The event in which the magnitude of a value exceeds the maximum value that can be represented in storage with a specific format
- Examples
  - In filtering algorithms, the accumulator value may grow through a series of multiply-accumulates and eventually exceed the maximum value
  - In a 2-digit decimal number system, adding 50, 45, and 20 produces 15 instead of the correct result, 115
  - When the accumulator value is transferred to memory, if the accumulator provides guard bits
  - When a shifter is used to scale up the accumulator value as it is stored to memory
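The wraparound behaviour in the examples above can be modeled directly. The sketch reproduces the 2-digit decimal example and adds a 16-bit two's-complement case as a further illustration; the 16-bit operands and the helper names `add_wrapping` and `wrap_to_signed` are assumptions made for the example.

```python
def add_wrapping(values, modulus=100):
    """Add values in a number system that keeps only `modulus` distinct states."""
    total = 0
    for v in values:
        total = (total + v) % modulus
    return total


def wrap_to_signed(value, bits):
    """Keep the low `bits` bits of value and read them as two's complement."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


print(add_wrapping([50, 45, 20]))            # 2-digit decimal system
print(wrap_to_signed(30000 + 10000, 16))     # 16-bit two's-complement sum
```

In the 16-bit case the sum of two positive numbers wraps to a negative value, which in a signal path shows up as a large, abrupt error.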


Dealing with Overflow

- Scaling
  - Scale all computations to eliminate the possibility of overflow
  - Can be effective
  - But signal fidelity (dynamic range and precision) cannot be maintained
- Saturation arithmetic
  - A special circuit detects when overflow has occurred and replaces the erroneous output with the largest positive or negative number
  - In a 2-digit decimal system, the result of adding 50, 45, and 20 with saturation is 99
  - Useful and safe because it is often not practical or desirable to scale signals to eliminate the possibility of overflow
  - The hardware unit that performs saturation arithmetic automatically is called a limiter by some manufacturers
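Saturation can be sketched as a clamp applied after each addition. The code below is a simplified software model, not a description of any particular limiter circuit; `add_saturating` is a hypothetical name, and clamping after every step is an assumption (hardware may instead saturate only when the result is stored).

```python
def add_saturating(values, lo, hi):
    """Accumulate values, clamping the running total to [lo, hi] after each add."""
    total = 0
    for v in values:
        total = max(lo, min(hi, total + v))
    return total


print(add_saturating([50, 45, 20], 0, 99))                 # 2-digit decimal system
print(add_saturating([30000, 10000], -32768, 32767))       # 16-bit positive overflow
print(add_saturating([-30000, -10000], -32768, 32767))     # 16-bit negative overflow
```

The saturated result is still wrong, but it is the representable value closest to the true sum, so the error is far smaller than with wraparound.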


Methods to Reduce the Precision

- Truncation
  - Simplest, but the truncated value is always smaller than or equal to the original
    - Truncation adds a bias or offset to signals
- Round-to-nearest
  - Conventional type of rounding that we use in everyday arithmetic
  - Add a constant equal to one half the value of the LSB to the value to be rounded, and truncate the result
    - Performed automatically by hardware, by a special instruction that preloads the accumulator with the appropriate constant, by a normal move that preloads it, or by explicitly adding the constant to the accumulator
  - Numbers at the midpoint between the two nearest output values are rounded up to the higher (more positive) value
    - Adds a bias to signals, though it is much smaller than the bias introduced by truncation
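The difference in bias between truncation and round-to-nearest can be measured on a small example. The sketch assumes two's-complement integers from which the lowest four bits are dropped; the helper names are hypothetical.

```python
def truncate(value, drop_bits):
    """Discard the low bits (arithmetic shift, so the result never exceeds the input)."""
    return value >> drop_bits


def round_to_nearest(value, drop_bits):
    """Add half an LSB of the result, then truncate; midpoints go to the more positive value."""
    half_lsb = 1 << (drop_bits - 1)
    return (value + half_lsb) >> drop_bits


drop = 4
inputs = range(16)     # every possible pattern of the four discarded bits

# Error of each method, expressed in units of the original (pre-rounding) LSB.
trunc_error = sum((truncate(v, drop) << drop) - v for v in inputs)
round_error = sum((round_to_nearest(v, drop) << drop) - v for v in inputs)

print(trunc_error, round_error)
print(truncate(-24, drop), round_to_nearest(-24, drop))   # -1.5 in units of the result LSB
```

Summed over all sixteen patterns, truncation accumulates an error of -120 while round-to-nearest accumulates +8, which comes entirely from the single midpoint case being rounded up.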


Methods to Reduce the Precision (continued)

- Convergent rounding (round-to-even)
  - When a number to be rounded lies exactly at the midpoint between the two nearest values, it may be rounded higher or lower depending on the value of the LSB of the retained portion
    - If the LSB is zero, then round down; if the LSB is one, then round up
  - If the LSB is equally likely to be zero or one in midpoint cases, convergent rounding avoids the bias introduced by round-to-nearest
  - Not supported in hardware by most fixed-point DSPs
    - ADSP-21xx and DSP5600x families provide convergent rounding
- Other rounding techniques
  - Round toward zero, round toward negative infinity, etc.
  - In some applications, the rounding technique to be used is specified by a published technical standard


[Figure: Round-to-nearest vs. convergent rounding]
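The two rounding rules differ only at exact midpoints, which a short Python comparison makes visible. This is a sketch that assumes two's-complement integers with the lowest four bits dropped; both function names are hypothetical.

```python
def round_to_nearest(value, drop_bits):
    """Add half an LSB of the result and truncate (midpoints round up)."""
    return (value + (1 << (drop_bits - 1))) >> drop_bits


def round_convergent(value, drop_bits):
    """Round to nearest, sending exact midpoints to the even neighbour."""
    half_lsb = 1 << (drop_bits - 1)
    kept = value >> drop_bits                       # retained portion
    remainder = value & ((1 << drop_bits) - 1)      # discarded bits, always non-negative
    if remainder > half_lsb or (remainder == half_lsb and kept & 1):
        kept += 1
    return kept


midpoints = [8, 24, 40, 56]        # 0.5, 1.5, 2.5, 3.5 in units of the result LSB
nearest = [round_to_nearest(v, 4) for v in midpoints]
convergent = [round_convergent(v, 4) for v in midpoints]

print(nearest)
print(convergent)
```

Round-to-nearest moves every midpoint upward, while convergent rounding moves half of them up and half down, so the midpoint errors cancel on average.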


Operand Supply

- Load-store architecture
  - Operands of the data path are supplied from a small number of operand registers or from the accumulator(s)
  - Values must be loaded into the operand registers before they can be processed by the data path
  - Register-direct addressing
- Memory-oriented architecture
  - Operands can also be supplied directly from memory to the data path
  - Memory-direct, register-indirect, memory-indirect, etc.
  - TI TMS320C2x/C5x and DSP Group PineDSPCore
- Operands can be supplied from a field in the instruction
  - Immediate addressing


Floating-Point Data Paths

- Similar to those found in fixed-point DSP processors, but differ in several respects
- In most floating-point DSPs, the main data path is capable of both floating-point and fixed-point calculations
  - Can handle only one type of operation in a given instruction cycle
  - Floating-point DSPs from TI, Analog Devices, and Motorola
- Some floating-point processors provide two data paths
  - The fixed-point data path does not include a multiplier
  - AT&T DSP32xx
    - Control arithmetic unit handles both address calculations and general-purpose integer arithmetic
    - Data arithmetic unit handles floating-point calculations


[Figure: Typical floating-point data path (from the AT&T DSP3210)]


Multipliers and ALUs

- Multipliers
  - Unlike fixed-point DSP multipliers, generally do not produce an output word large enough to maintain full precision
  - Instead, the output format is commonly somewhat larger than the input format, usually providing an extra eight to twelve bits of mantissa precision
- ALUs
  - Provide addition, subtraction, absolute value, negate, minimum, maximum, and other specialized operations
  - Addition for multiply-accumulate operations
    - Some processors (for example, the AT&T DSP32C) provide a multiply-add operation
      - The result is written into a different accumulator
  - Bit-wise logical operations are usually not provided by floating-point-only ALUs


[Table: Examples of specialized floating-point ALU operations]


Exceptions

- Unusual conditions that may cause erroneous arithmetic results
  - Set the appropriate bit in a status register or cause an interrupt (optional on some processors)
- Overflow
  - Much less of a concern than with fixed-point, but still a real concern for some applications
  - Set a status flag and automatically saturate the result
- Underflow
  - Occurs when the result of an arithmetic operation is too small to be represented
  - Set the result to zero and set a flag in a status register
- Division by zero
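The overflow and underflow handling described above can be mimicked in software. The sketch below uses Python's double-precision floats as a stand-in for a DSP's native format, so the thresholds are those of IEEE 754 doubles rather than of any DSP; `checked_multiply` and the flag names are hypothetical.

```python
import math
import sys


def checked_multiply(a, b):
    """Multiply two floats, returning (result, flags) in the style of a status register."""
    result = a * b
    flags = set()
    if math.isinf(result) and math.isfinite(a) and math.isfinite(b):
        flags.add("overflow")
        result = math.copysign(sys.float_info.max, result)   # saturate to the largest magnitude
    elif a != 0.0 and b != 0.0 and abs(result) < sys.float_info.min:
        flags.add("underflow")
        result = 0.0                                         # too small to represent: flush to zero
    return result, flags


print(checked_multiply(1e200, 1e200))
print(checked_multiply(1e-200, 1e-200))
print(checked_multiply(2.0, 3.0))
```

A program can then test the returned flags after a block of computation, much as DSP code would poll the status register or take an interrupt.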


Others in Data Paths

- Rounding
  - Most floating-point processors automatically round arithmetic results to a 40- to 44-bit intermediate format
  - When final results are written to memory, they can be written either as extended-precision values or rounded to the native single-precision format
  - Most processors provide the simplest round-to-nearest
    - Some have more options: convergent rounding, round toward positive/negative infinity, and round toward zero
- Accumulator registers
  - In general, floating-point processors have more and larger registers than their fixed-point counterparts
- Shifter
  - With floating-point arithmetic, the hardware automatically scales the results to preserve the maximum precision possible
  - Not visible to programmers
- Operand supply: nothing to mention


Special Function Units

- As demand grows for DSP processor use in specialized applications, processor manufacturers have begun to incorporate specialized hardware into their processors’ data paths to improve performance
- AT&T 1610
  - Intended for speech coding in digital mobile radio applications
  - These applications involve a large number of bit-field operations (for example, inserting a series of n bits at a specified location within an m-bit word)
  - Adds a specialized bit manipulation unit (BMU) and instructions to access this functional unit
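The bit-field insertion mentioned above takes several mask-and-shift steps on a general-purpose data path, which is the work a bit manipulation unit does in one operation. The Python sketch below shows those steps; `insert_bits`, its argument order, and the default 16-bit word size are assumptions for illustration and do not describe the actual BMU instruction set.

```python
def insert_bits(word, field, position, n, m=16):
    """Insert the low n bits of field into an m-bit word, starting at bit `position` (LSB = 0)."""
    if position < 0 or n < 0 or position + n > m:
        raise ValueError("field does not fit inside the word")
    word_mask = (1 << m) - 1
    field_mask = ((1 << n) - 1) << position
    cleared = word & ~field_mask & word_mask          # clear the destination bits
    return cleared | ((field << position) & field_mask)


print(hex(insert_bits(0xFFFF, 0b101, 4, 3)))   # write 101 into bits 6..4
print(hex(insert_bits(0x0000, 0xF, 12, 4)))    # write 1111 into bits 15..12
```

Each call performs a mask construction, a clear, a shift, and an OR; speech coders repeat this for every packed parameter, which is why dedicated hardware is attractive.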
