data path - yonsei universitysoc.yonsei.ac.kr/class/material/dsp/datapath.pdf · 2017-03-06 · dsp...
YONSEI UNIVERSITY, DSP VLSI Design
Data Path
Outline
- Definition of DSP data paths
- Fixed-point data paths
  - Multiplier, accumulator registers, ALU, shifter
  - Overflow, saturation, rounding
  - Operand supply
- Floating-point data paths
  - Multiplier, accumulator registers, ALU, shifter
  - Overflow, saturation, rounding
  - Operand supply
- Special function units
What is the Data Path of a DSP Processor?
- The part of the processor where the vital arithmetic manipulations of signals take place
- Highly specialized to achieve high performance on the types of computation most common in DSP applications, such as multiply-accumulate
- The key feature of DSP processors
  - Along with the memory architecture, the feature that most clearly differentiates DSP processors from other kinds of processors
- Classifications
  - Fixed-point data path
  - Floating-point data path
Fixed-Point Data Paths
- Constituent components
  - A multiplier, an ALU, one or more shifters, operand registers, accumulators, and other specialized units
- ALU (Arithmetic Logic Unit)
  - In the book DSP Fundamentals, the term refers to the combined adder/subtractor/logical functional unit
  - Some vendors refer to the entire data path as the ALU or arithmetic unit
- AGU (Address Generation Unit)
  - Separate hardware unit for a rich variety of address calculations
    - Modulo, bit-reversed
  - Specialized addressing modes are one of the factors distinguishing DSPs from other kinds of processors
  - Exceptions
    - DSP32C and DSP32xx provide separate floating-point and fixed-point data paths, and the fixed-point data path is used for address calculations
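
The two addressing modes named above (modulo and bit-reversed) can be sketched in software. This is only an illustration of the address arithmetic an AGU performs in hardware; the function names and the buffer layout are assumptions made for the example, not taken from any particular processor.

```python
def modulo_increment(addr, step, base, length):
    """Advance addr by step, wrapping inside the circular buffer [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (used to reorder FFT data)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Circular buffer of 4 words starting at (hypothetical) address 100.
addr = 100
visited = []
for _ in range(6):
    visited.append(addr)
    addr = modulo_increment(addr, 1, base=100, length=4)
print(visited)  # [100, 101, 102, 103, 100, 101]

# Bit-reversed index order for an 8-point FFT (3 address bits).
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Modulo addressing keeps a delay line in place without copying samples, and bit-reversed addressing reads FFT data in the order the butterflies need it.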
Example of Fixed-point Data Path (DSP5600x, 24-bit Fixed-point DSP)
Multiplier
- Multiplication
  - Essential operation in virtually all DSP applications
  - Half or more of the instructions executed by the processor
- Single-cycle multiplier
  - Central to the definition of a programmable digital signal processor
  - Virtually all DSP processors contain a multiplier that can multiply two native-sized operands in a single instruction cycle
- Multiplier capabilities
  - There are important differences among DSP processors
Some Differences in terms of multiplier capabilities
- Internal pipelining of the multiplier
  - All DSP multipliers can produce one new result per instruction cycle
    - One-cycle input latency
  - Without pipelining
    - One-cycle output latency: a one-cycle delay from the time the inputs are presented to the multiplier until the time the result is available
    - Lower clock frequency
  - With pipelining
    - Multi-cycle output latency
    - One or more cycles must be spent waiting for the multiplier result when a data dependency exists between instructions
    - Higher clock frequency
Some Differences in terms of multiplier capabilities
- Integrated or separate
  - Integrated multiplier
    - The multiplier is integrated with an adder to form a multiplier-accumulator (MAC) unit
    - The MAC result is not delayed for accumulation
    - Lower clock frequency
    - Example: DSP5600x
  - Separate multiplier
    - The output is deposited into a product register, and from there can be sent to an adder for accumulation
    - The MAC result is delayed by one cycle
    - Higher clock frequency
    - Two independent operations can be carried out simultaneously in the multiplier and the adder (I think)
    - Example: AT&T DSP16xx
Some Differences in terms of multiplier capabilities
- The size of the product relative to the size of the input operands
  - Law of conservation of bits
    - When multiplying two n-bit fixed-point numbers, 2 × n bits are required to represent the resulting product without introducing any error
    - -128 (8 bits) × -128 (8 bits) = 16,384, which needs a 16-bit output
  - A result that is twice the width of the input operands
    - The multiplier itself does not introduce any errors
    - Found in most fixed-point DSP processor multipliers
  - Narrower result
    - For the sake of speed and cost
    - Zilog Z893xx and the Clarkspur CD2400 core: 16-bit operands and a 24-bit result
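
The "law of conservation of bits" can be checked numerically. The sketch below is an illustration only; `signed_bits_needed` is a helper name introduced here. It finds the smallest two's-complement width that holds the -128 × -128 example, and confirms that no product of two 8-bit operands needs more than 16 bits.

```python
def signed_bits_needed(value):
    """Smallest two's-complement width (in bits) that can represent value."""
    bits = 1
    while not (-(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


n = 8
most_negative = -(1 << (n - 1))           # -128
product = most_negative * most_negative   # 16384
print(product, signed_bits_needed(product))  # 16384 16

# Worst case over every pair of signed 8-bit operands.
operands = range(-(1 << (n - 1)), 1 << (n - 1))
worst = max(signed_bits_needed(a * b) for a in operands for b in operands)
print(worst)  # 16
```

Only the single case of both operands being the most negative value actually needs the full 2 × n bits; every other product fits in 2 × n - 1 bits.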
Selecting Multiplier Output Portion
- Passing the full-width multiplication result to the next step of computation
  - Usually impossible (results grow ever wider)
  - And not necessary (excessive dynamic range and precision are not useful)
- Selection before or after accumulation
  - Before: a subset of the multiplier output bits is selected and passed on to the next computation
  - After: following a series of multiplications and accumulations, the accumulator result is reduced
- Selected portion of the result
  - Usually depends on the position of the radix point
  - The radix point of the selected result is in the same position as the radix point in the normal format
Support for Fixed-point Multiplication
- Optimization in fractional multiplication
  - Fractional multiplication output (unmodified)
    - The radix point lies between x_{n-2} and x_{n-3} (x_{n-1} is the MSB), so x_{n-2} is an additional integer bit
    - So left-shift the result by one bit (x_{n-2} becomes the MSB)
    - Example: 0100 × 0100 = 00010000 unmodified; after the left shift, 00100000 is the correct fractional result (0.5 × 0.5 = 0.25)
  - If both input operands are NMAX (the most negative value), the left shift produces an incorrect result
    - 1000 × 1000 = 01000000 -> 10000000 (incorrect: -1 × -1 should be +1, but the shifted result reads as -1)
    - Need to adjust the input operands
  - Sometimes integer multiplication also needs this adjustment
  - Automatic left shift by one bit in the case of fractional multiplication
- Hardware support for selecting the result portion
  - Treat the 2n-bit multiplier output register or accumulator register as two independently addressable n-bit registers
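
The 4-bit examples on this slide can be reproduced with a small model of the fractional multiply and its one-bit left shift. This is a sketch under the assumption that operands are two's-complement fractions with the radix point just after the sign bit; the helper names are invented for the illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def fractional_multiply(a, b, n):
    """Multiply two n-bit fractional operands (given as signed integers).

    Returns the 2n-bit result after the one-bit left shift, as a signed integer.
    """
    raw = a * b                          # 2n-bit product with two sign bits
    return to_signed(raw << 1, 2 * n)    # left shift, keep only 2n bits


def as_fraction(value, bits):
    """Value of a `bits`-wide signed fraction with the radix point after the sign bit."""
    return value / (1 << (bits - 1))


n = 4
half = 0b0100                            # 0.5
r = fractional_multiply(half, half, n)
print(format(r & 0xFF, "08b"), as_fraction(r, 2 * n))  # 00100000 0.25

nmax = to_signed(0b1000, n)              # -1.0, the most negative value
r = fractional_multiply(nmax, nmax, n)
print(format(r & 0xFF, "08b"), as_fraction(r, 2 * n))  # 10000000 -1.0
```

The second case shows why NMAX × NMAX is special: the true product is +1.0, which the shifted format cannot represent.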
Accumulator Registers
- Hold intermediate and final results of multiply-accumulate and other arithmetic operations
- Two or more accumulators are needed
- Guard bits
  - Extra bits that allow the programmer to accumulate a number of values without the risk of overflowing the accumulator and without the need to scale intermediate results
  - n guard bits
    - Provide the capacity for up to 2^n values to be accumulated without the possibility of overflow
  - Examples
    - AT&T DSP16xx provides four guard bits (36-bit accumulators and 32-bit multiplier product)
    - Analog Devices ADSP-21xx provides eight guard bits (40-bit accumulators and 32-bit multiplier product)
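
The 2^n capacity of n guard bits can be verified with integer arithmetic. The sketch below assumes a 32-bit product and eight guard bits (the ADSP-21xx figures quoted above); `fits` is a helper introduced for the example.

```python
def fits(value, bits):
    """True if value is representable as a `bits`-wide two's-complement integer."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


product_bits = 32
guard_bits = 8
acc_bits = product_bits + guard_bits        # 40-bit accumulator

largest = (1 << (product_bits - 1)) - 1     # largest 32-bit value
smallest = -(1 << (product_bits - 1))       # most negative 32-bit value
count = 1 << guard_bits                     # 2^8 = 256 accumulations

# 256 worst-case values of either sign still fit in the accumulator.
print(fits(count * largest, acc_bits), fits(count * smallest, acc_bits))  # True True

# One more worst-case value can overflow.
print(fits((count + 1) * smallest, acc_bits))  # False
```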
Alternatives to Guard bits
- Scaling multiplier outputs
  - Scale the multiplier result by shifting it right by a few bits to avoid overflow
  - Some processors support this shifting without requiring additional instruction cycles (TI TMS320C2x and TMS320C5x)
  - Results in a loss of precision
    - Unless the amount of scaling is extreme or the number of products being accumulated is very large, the loss of precision is small
    - In fractional arithmetic, only the MSB half of the accumulator is retained after a series of multiply-accumulates
    - So the loss of precision due to scaling the multiplier output usually does not affect the final result
- Scaling multiplier inputs (TI TMS320C1x)
  - Significantly reduced precision
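
A rough feel for the precision lost by scaling multiplier outputs: the sketch below shifts each product right before accumulating and compares against the exact sum. The product values and the shift amount are arbitrary numbers chosen for the illustration.

```python
def accumulate_scaled(products, shift):
    """Accumulate products after an arithmetic right shift by `shift` bits."""
    acc = 0
    for p in products:
        acc += p >> shift        # the low `shift` bits of each product are dropped
    return acc


products = [1000003, -250001, 777777, 123457]   # arbitrary example products
shift = 4

exact = sum(products)
scaled = accumulate_scaled(products, shift)
error = exact - (scaled << shift)
print(exact, scaled << shift, error)  # 1651236 1651216 20
```

Each product loses at most 2^shift - 1 units, so the total error stays below the number of products times 2^shift. That is small compared with the accumulator value when only the upper half of the accumulator is kept.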
Arithmetic Logic Unit (ALU)
- Implements basic arithmetic and logical operations
  - Add, subtract, increment, negate, and logical AND, OR, and NOT
- Addition for MAC
  - In some processors, the ALU performs the addition for multiply-accumulate operations
  - In other processors, a separate adder is provided
- Width of logical operations
  - Some processors perform logical operations on operands that are the full width of the accumulator (DSP16xx: 36-bit accumulator and 36-bit logical operations)
  - Other processors can perform logical operations only on native-width data words (DSP5600x: 56-bit accumulator and 24-bit logical operations)
Shifter
- Necessity of shifting
  - The programmer will want to choose a particular subset of the result bits to pass along to the next stage of processing
  - A shifter eases this selection by scaling its input by a power of two (2^n)
- Example (FIR filter with a gain of 100)
Shifter
- Trade-off of scaling signals
  - Avoids overflow
  - Costs precision and dynamic range
- Types of shifters
  - Shift by zero or one bit at a time
    - Multiple cycles are consumed for multibit shifts
  - Barrel shifter
    - Supports shifts by any number of bits in a single cycle
- Position of shifters
  - Between the multiplier and the adder/ALU
  - In the path from the accumulator to memory
    - Guard bits do not remove the need for scaling
  - Inside the accumulator (supporting logical/rotate-type shifts)
  - Some DSP processors provide multiple shifters
    - Example: DSP5600x
Overflow
- The event in which the magnitude of a value exceeds the maximum that can be represented in storage of a given format
- Examples
  - In filtering algorithms, the accumulator value may grow through a series of multiply-accumulates and eventually exceed the maximum value
  - In a 2-digit decimal number system, adding 50, 45, and 20 produces 15 instead of the correct result, 115
  - When the accumulator value is transferred to memory, if the accumulator provides guard bits
  - When a shifter is used to scale up the accumulator value as it is stored to memory
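
The 2-digit decimal example, and its binary counterpart, can be modeled as wraparound arithmetic. The 16-bit case is an added illustration, not from the slides; the helper names are assumptions.

```python
def wrap_two_digit(value):
    """Keep only the two low decimal digits, as a 2-digit register would."""
    return value % 100


def wrap_twos_complement(value, bits):
    """Keep only the low `bits` bits and reinterpret them as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


total = 0
for x in (50, 45, 20):
    total = wrap_two_digit(total + x)
print(total)  # 15

# Same effect in a 16-bit two's-complement word: a large positive sum turns negative.
print(wrap_twos_complement(30000 + 10000, 16))  # -25536
```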
Dealing with Overflow
- Scaling
  - Scale all computations to eliminate the possibility of overflow
  - Can be effective
  - But signal fidelity (dynamic range and precision) cannot be maintained
- Saturation arithmetic
  - A special circuit detects when overflow has occurred and replaces the erroneous output with the largest positive or most negative representable number
  - In a 2-digit decimal system, the result of adding 50, 45, and 20 with saturation is 99
  - Useful and safe because it is often not practical or desirable to scale signals to eliminate the possibility of overflow
  - A hardware unit that performs saturation arithmetic automatically is called a limiter by some manufacturers
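
Saturation arithmetic for the same 2-digit example can be sketched as a clamp applied after each addition. The 16-bit variant is an added illustration; the function names are assumptions, not a vendor API.

```python
def saturate(value, lowest, highest):
    """Clamp value to the representable range [lowest, highest]."""
    return max(lowest, min(highest, value))


# 2-digit decimal system: results are limited to 0..99.
total = 0
for x in (50, 45, 20):
    total = saturate(total + x, 0, 99)
print(total)  # 99


def saturating_add16(a, b):
    """Add two values and limit the sum to the 16-bit two's-complement range."""
    return saturate(a + b, -32768, 32767)


print(saturating_add16(30000, 10000))    # 32767
print(saturating_add16(-30000, -10000))  # -32768
```

The saturated result is still wrong, but it is much closer to the true value than a wrapped result, which is why it is the safer failure mode for signals.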
Methods to Reduce the Precision
- Truncation
  - Simplest, but the truncated value is always smaller than or equal to the original
  - Truncation adds a bias or offset to signals
- Round-to-nearest
  - The conventional type of rounding used in everyday arithmetic
  - Performed automatically by hardware or by a special instruction
  - Add a constant equal to one half the value of the LSB of the output word to the value to be rounded, and truncate the result
  - The constant can be added automatically by hardware, by a special instruction that preloads the accumulator with the appropriate constant, by a normal move that preloads it, or by adding the constant to the accumulator
  - Numbers at the midpoint between the two nearest output values are rounded up to the higher (more positive) value
  - Adds a bias to signals, though it is much smaller than the bias introduced by truncation
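
Truncation and round-to-nearest can be compared on integers by dropping the low `drop` bits. The sketch assumes two's-complement values and an arithmetic right shift; the bias is measured in units of the output LSB, averaged over all patterns of the discarded bits.

```python
def truncate(value, drop):
    """Discard the low `drop` bits (rounds toward negative infinity)."""
    return value >> drop


def round_to_nearest(value, drop):
    """Add half an output LSB, then truncate; midpoints go to the more positive value."""
    return (value + (1 << (drop - 1))) >> drop


def mean_error(reduce, drop):
    """Average (reduced - original), in output LSBs, over all discarded-bit patterns."""
    errors = [(reduce(v, drop) << drop) - v for v in range(1 << drop)]
    return sum(errors) / len(errors) / (1 << drop)


drop = 4
for v in (0x17, 0x18, 0x19, -0x18):
    print(v, truncate(v, drop), round_to_nearest(v, drop))
# 23 1 1
# 24 1 2
# 25 1 2
# -24 -2 -1

print(mean_error(truncate, drop))          # -0.46875
print(mean_error(round_to_nearest, drop))  # 0.03125
```

Truncation is biased by almost half an LSB downward, while round-to-nearest keeps a small upward bias that comes entirely from the midpoint case.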
Methods to Reduce the Precision
- Convergent rounding (round-to-even)
  - When a number to be rounded lies exactly at the midpoint between the two nearest values, it is rounded higher or lower depending on the LSB of the part being kept
    - If the LSB is zero, round down; if the LSB is one, round up
  - If the LSB is equally likely to be zero or one in midpoint cases, convergent rounding avoids the bias introduced by round-to-nearest
  - Not supported in hardware by most fixed-point DSPs
    - The ADSP-21xx and DSP5600x families provide convergent rounding
- Other rounding techniques
  - Round toward zero, round toward negative infinity, etc.
  - In some applications, the rounding technique to be used is specified by a published technical standard
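
Convergent rounding differs from round-to-nearest only at exact midpoints. A minimal sketch on integers, assuming two's-complement values and that `drop` low bits are discarded:

```python
def round_convergent(value, drop):
    """Round to nearest; exact midpoints go to the neighbor whose LSB is zero."""
    half = 1 << (drop - 1)
    mask = (1 << drop) - 1
    result = (value + half) >> drop
    if (value & mask) == half:   # exact midpoint between two output values
        result &= ~1             # clear the LSB, i.e. pick the even neighbor
    return result


drop = 4
# 8 = 0.5, 24 = 1.5, 40 = 2.5, -24 = -1.5 in units of the output LSB.
print([round_convergent(v, drop) for v in (8, 24, 40, -24)])  # [0, 2, 2, -2]

# Over two full periods of the discarded bits the rounding errors cancel.
errors = [(round_convergent(v, drop) << drop) - v for v in range(2 << drop)]
print(sum(errors))  # 0
```

The midpoints with an even kept part round down and those with an odd kept part round up, so the two cases cancel on average.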
Operand Supply
- Load-store architecture
  - Operands of the data path are supplied from a small number of operand registers or from the accumulator(s)
  - Values must be loaded into the operand registers before they can be processed by the data path
  - Register-direct addressing
- Memory-oriented architecture
  - Operands can also be supplied directly from memory to the data path
  - Memory-direct, register-indirect, memory-indirect, etc.
  - TI TMS320C2x/C5x and DSP Group PineDSPCore
- Operands can be supplied from a field in the instruction
  - Immediate addressing
Floating-Point Data Paths
- Similar to those found in fixed-point DSP processors, but differ in several respects
- In most floating-point DSPs, the main data path is capable of both floating-point and fixed-point calculations
  - Can handle only one type of operation in a given instruction cycle
  - Floating-point DSPs from TI, Analog Devices, and Motorola
- Some floating-point processors provide two data paths
  - The fixed-point data path does not include a multiplier
  - AT&T DSP32xx
    - The control arithmetic unit handles both address calculations and general-purpose integer arithmetic
    - The data arithmetic unit handles floating-point calculations
Typical Floating-Point Data Path (From the AT&T DSP3210)
Multipliers and ALUs
- Multipliers
  - Unlike fixed-point DSP multipliers, generally do not produce an output word large enough to maintain full precision
  - Instead, the output format is commonly somewhat larger than the input format, usually providing an extra eight to twelve bits of mantissa precision
- ALUs
  - Provide addition, subtraction, absolute value, negate, minimum, maximum, and other specialized operations
  - Addition for multiply-accumulate operations
    - Some processors (for example, the AT&T DSP32C) provide a multiply-add operation
    - The result is written into a different accumulator
  - Bit-wise logical operations are usually not provided by floating-point-only ALUs
Exceptions
- Unusual conditions that may cause erroneous arithmetic results
  - The processor sets the appropriate bit in a status register or causes an interrupt (optional on some processors)
- Overflow
  - Much less of a concern than with fixed-point, but still a real concern for some applications
  - Set a status flag and automatically saturate the result
- Underflow
  - Occurs when the result of an arithmetic operation is too small to be represented
  - Set the result to zero and set a flag in a status register
- Division by zero
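
The overflow and underflow handling described above (saturate or flush to zero, and set a status flag) can be modeled as follows. This is a hypothetical sketch: the limits are those of the IEEE single-precision format, used here only as an assumed stand-in since DSP floating-point formats vary, and the flag names are invented.

```python
FLT_MAX = 3.4028234663852886e38    # largest finite IEEE single (assumed format)
FLT_MIN = 1.1754943508222875e-38   # smallest normalized IEEE single (assumed format)


def handle_exceptions(exact):
    """Return (result, flags) for an exact result, mimicking status-register flags."""
    flags = set()
    if abs(exact) > FLT_MAX:
        flags.add("overflow")                           # set flag and saturate
        result = FLT_MAX if exact > 0 else -FLT_MAX
    elif exact != 0.0 and abs(exact) < FLT_MIN:
        flags.add("underflow")                          # set flag and flush to zero
        result = 0.0
    else:
        result = exact
    return result, flags


print(handle_exceptions(1e38 * 10.0))    # too large: saturated, overflow flag set
print(handle_exceptions(1e-30 * 1e-30))  # too small: zero, underflow flag set
print(handle_exceptions(1.5))            # representable: unchanged, no flags
```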
Others in Data Paths
- Rounding
  - Most floating-point processors automatically round arithmetic results to a 40- to 44-bit intermediate format
  - When final results are written to memory, they can be written either as extended-precision values or rounded to the native single-precision format
  - Most processors provide the simplest form, round-to-nearest
  - Some have more options
    - Convergent rounding, round toward positive/negative infinity, and round toward zero
- Accumulator registers
  - In general, floating-point processors have more and larger registers than their fixed-point counterparts
- Shifter
  - With floating-point arithmetic, the hardware automatically scales the results to preserve the maximum precision possible
  - Not visible to programmers
- Operand supply: nothing to mention
Special Function Units
- As demand grows for DSP processor use in specialized applications, processor manufacturers have begun to incorporate specialized hardware into their processors’ data paths to improve performance
- AT&T 1610
  - Intended for speech coding in digital mobile radio applications
  - These applications involve a large number of bit field operations (for example, inserting a series of n bits at a specified location within an m-bit word)
  - Adds a specialized bit manipulation unit (BMU) and instructions to access this functional unit
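
The bit field operation mentioned above (inserting n bits at a given location in an m-bit word) looks like this in software. The function names and argument order are assumptions made for the sketch and do not describe the BMU's actual instructions; a BMU performs the equivalent masking and shifting in hardware.

```python
def insert_bit_field(word, field, position, n, m=16):
    """Insert the low n bits of field into the m-bit word at bit offset position."""
    if position < 0 or n < 0 or position + n > m:
        raise ValueError("field does not fit in the word")
    field_mask = ((1 << n) - 1) << position
    word_mask = (1 << m) - 1
    return ((word & ~field_mask) | ((field << position) & field_mask)) & word_mask


def extract_bit_field(word, position, n):
    """Read back the n-bit field that starts at bit offset position."""
    return (word >> position) & ((1 << n) - 1)


packed = insert_bit_field(0xFFFF, 0b101, position=4, n=3)
print(hex(packed))                             # 0xffdf
print(bin(extract_bit_field(packed, 4, 3)))    # 0b101
```

In software each insertion costs several mask, shift, and OR instructions, which is the overhead a dedicated unit is meant to remove.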