intel® stratix® 10 variable precision dsp blocks user guide · 2020-01-18 · 1. intel ® stratix...

Intel® Stratix® 10 Variable Precision DSP Blocks User Guide

Updated for Intel® Quartus® Prime Design Suite: 19.3


UG-S10-DSP | 2019.10.22

Latest document on the web:
PDF: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-10/ug-s10-dsp.pdf
HTML: https://www.intel.com/content/www/us/en/programmable/documentation/kly1436148709581.html

Contents

1. Intel® Stratix® 10 Variable Precision DSP Blocks Overview
  1.1. Features
  1.2. Supported Operational Modes in Intel Stratix 10 Devices
  1.3. Resources
2. Block Architecture Overview
  2.1. Input Register Bank for Fixed-Point and Floating-Point Arithmetic
  2.2. Pipeline Registers for Fixed-Point and Floating-Point Arithmetic
  2.3. Pre-adder for Fixed-Point Arithmetic
  2.4. Internal Coefficient for Fixed-Point Arithmetic
  2.5. Multipliers for Fixed-Point and Floating-Point Arithmetic
  2.6. Adder or Subtractor for Fixed-Point and Floating-Point Arithmetic
  2.7. Accumulator, Chainout Adder, and Preload Constant for Fixed-Point Arithmetic
  2.8. Systolic Register for Fixed-Point Arithmetic
  2.9. Double Accumulation Register for Fixed-Point Arithmetic
  2.10. Output Register Bank for Fixed-Point and Floating-Point Arithmetic
  2.11. Exception Handling for Floating-Point Arithmetic
3. Operational Mode Descriptions
  3.1. Operational Modes for Fixed-Point Arithmetic
    3.1.1. Independent Multiplier Mode
    3.1.2. Multiplier Adder Sum Mode
    3.1.3. Independent Complex Multiplier
    3.1.4. 18 × 19 Multiplication Summed with 36-Bit Input Mode
    3.1.5. Systolic FIR Mode
  3.2. Operational Modes for Floating-Point Arithmetic
    3.2.1. Single Floating-Point Arithmetic Functions
    3.2.2. Multiple Floating-Point Arithmetic Functions
4. Design Considerations
  4.1. Internal Coefficient and Pre-Adder for Fixed-Point Arithmetic
  4.2. Accumulator for Fixed-Point Arithmetic
  4.3. Chainout Adder
  4.4. Input Cascade for Fixed-Point Arithmetic
5. Intel Stratix 10 Variable Precision DSP Blocks Implementation Guide
6. Native Fixed Point DSP Intel Stratix 10 FPGA IP Core References
  6.1. Native Fixed Point DSP Intel Stratix 10 FPGA IP Release Information
  6.2. Supported Operational Modes
  6.3. Maximum Input Data Width for Fixed-Point Arithmetic
    6.3.1. Using Less Than 36-Bit Operand In 18 x 18 Plus 36 Mode Example
  6.4. Parameterizing Native Fixed Point DSP IP Core
    6.4.1. Native Fixed Point DSP Intel Stratix 10 FPGA IP Parameters
  6.5. Signals
7. Multiply Adder IP Core References
  7.1. Multiply Adder Intel FPGA IP Release Information
  7.2. Features
    7.2.1. Pre-adder
    7.2.2. Systolic Delay Register
    7.2.3. Pre-load Constant
    7.2.4. Double Accumulator
  7.3. Parameters
    7.3.1. General Tab
    7.3.2. Extra Modes
    7.3.3. Multipliers Tab
    7.3.4. Preadder Tab
    7.3.5. Accumulator Tab
    7.3.6. Systolic/Chainout Tab
    7.3.7. Pipelining Tab
  7.4. Signals
8. ALTMULT_COMPLEX Intel FPGA IP Core Reference
  8.1. ALTMULT_COMPLEX Intel FPGA IP Release Information
  8.2. Features
  8.3. Complex Multiplication
  8.4. Parameters
  8.5. Signals
9. LPM_MULT Intel FPGA IP Core References
  9.1. LPM_MULT Intel FPGA IP Release Information
  9.2. Features
  9.3. Parameters
    9.3.1. General Tab
    9.3.2. General 2 Tab
    9.3.3. Pipelining Tab
  9.4. Signals
10. Native Floating Point DSP Intel Stratix 10 FPGA IP References
  10.1. Native Floating Point DSP Intel Stratix 10 FPGA IP Release Information
  10.2. Native Floating Point DSP Intel Stratix 10 FPGA IP Core Supported Operational Modes
  10.3. Parameterizing the Native Floating Point DSP Intel Stratix 10 FPGA IP
    10.3.1. Native Floating Point DSP Intel Stratix 10 FPGA IP Parameters
  10.4. Native Floating Point DSP Intel Stratix 10 FPGA IP Core Signals
11. LPM_DIVIDE (Divider) Intel FPGA IP Core
  11.1. LPM_DIVIDE Intel FPGA IP Release Information
  11.2. Features
  11.3. Verilog HDL Prototype
  11.4. VHDL Component Declaration
  11.5. VHDL LIBRARY_USE Declaration
  11.6. Ports
  11.7. Parameters
    11.7.1. General Tab
    11.7.2. General1 Tab
12. Intel Stratix 10 Variable Precision DSP Blocks User Guide Document Archives
13. Document Revision History for Intel Stratix 10 Variable Precision DSP Blocks User Guide

1. Intel® Stratix® 10 Variable Precision DSP Blocks Overview

The variable-precision digital signal processing (DSP) blocks in Intel® Stratix® 10 devices can support fixed-point arithmetic and single-precision floating-point arithmetic. The Intel Stratix 10 DSP blocks provide high design flexibility and are optimized to support high-performance DSP applications.

Related Information

HyperFlex Core Architecture, Intel Stratix 10 Device Overview
Provides more information about Hyper-Registers and the HyperFlex core architecture. Hyper-Registers are additional registers available in every interconnect routing segment throughout the core fabric, including the routing segments connected to the DSP inputs and outputs.

1.1. Features

The Intel Stratix 10 fixed-point arithmetic features include:

• High-performance, power-optimized, and fully registered multiplication operations

• 18-bit and 27-bit word lengths

• Two 18 x 19 multipliers or one 27 x 27 multiplier per DSP block

• Built-in addition, subtraction, and 64-bit double accumulation register to combine multiplication results

• Cascading 19-bit or 27-bit and cascading 18-bit when pre-adder is used to form the tap-delay line for filtering applications

• Cascading 64-bit output bus to propagate output results from one block to the next block without external logic support

• Hard pre-adder supported in 18-bit and 27-bit DSP operation modes for symmetric filters

• Internal coefficient register bank in both 18-bit and 27-bit modes for filter implementation

• 18-bit and 27-bit systolic finite impulse response (FIR) filters with distributed output adder

• Biased rounding support

The Intel Stratix 10 floating-point arithmetic is a completely hardened architecture. Features for floating-point arithmetic include:

• Multiplication, addition, subtraction, multiply-add, and multiply-subtract

• Multiplication with accumulation capability and a dynamic accumulator reset control

• Multiplication with cascade summation and subtraction capability

• Complex multiplication

• Direct vector dot product

• Systolic vector dot product

• Sequential vector dot product

• Exception handling support using exception flags

1.2. Supported Operational Modes in Intel Stratix 10 Devices

Table 1. Supported Combinations of Operational Modes and Features for Variable Precision DSP Block in Intel Stratix 10 Devices

| Variable-Precision DSP Block Resource | Operation Mode | Supported Operation Instance | Pre-Adder Support | Coefficient Support | Input Cascade Support | Chainin Support | Chainout Support |
|---|---|---|---|---|---|---|---|
| 1 variable precision DSP block | Fixed-point independent 18 x 19 multiplication | 2 (1) | Yes | Yes | Yes (2) | No | No |
| 1 variable precision DSP block | Fixed-point independent 27 x 27 multiplication | 1 | Yes | Yes | Yes (3) | Yes | Yes |
| 1 variable precision DSP block | Fixed-point two 18 x 19 multiplier adder mode | 1 | Yes | Yes | Yes (2) | Yes | Yes |
| 1 variable precision DSP block | Fixed-point 18 x 18 multiplier adder summed with 36-bit input | 1 | No | No | No | Yes | Yes |
| 1 variable precision DSP block | Fixed-point 18 x 19 systolic mode | 1 | Yes | Yes | Yes (2) | Yes | Yes |
| 1 variable precision DSP block | Floating-point multiplication mode | 1 | No | No | No | No | Yes |
| 1 variable precision DSP block | Floating-point adder or subtract mode | 1 | No | No | No | No | Yes |
| 1 variable precision DSP block | Floating-point multiplier adder or subtract mode | 1 | No | No | No | Yes | Yes |
| 1 variable precision DSP block | Floating-point multiplier accumulate mode | 1 | No | No | No | No | Yes |
| 1 variable precision DSP block | Floating-point vector one mode | 1 | No | No | No | Yes | Yes |
| 1 variable precision DSP block | Floating-point vector two mode | 1 | No | No | No | Yes | Yes |
| 2 variable precision DSP blocks | Fixed-point complex 18 x 19 multiplication | 1 | No | No | No | No | No |
| 4 variable precision DSP blocks | Floating-point complex multiplication | 1 | No | No | No | No | No |

(1) The Intel Quartus® Prime software automatically determines whether to merge two independent multiplications when there are not enough DSP blocks on the device or within a Logic Lock (Standard) region.

(2) Each of the two inputs to a pre-adder has a maximum width of 18 bits. When the input cascade is used to feed one of the pre-adder inputs, the maximum width for the input cascade is 18 bits.

(3) When you enable the pre-adder feature, the input cascade support is not available.

Table 2. Supported Combinations of Operational Modes and Dynamic Control Features for Variable Precision DSP Blocks in Intel Stratix 10 Devices

| Variable-Precision DSP Block Resource | Operation Mode | Dynamic ACCUMULATE | Dynamic LOADCONST | Dynamic SUB | Dynamic NEGATE |
|---|---|---|---|---|---|
| 1 variable precision DSP block | Fixed-point independent 18 x 19 multiplication | No | No | No | No |
| 1 variable precision DSP block | Fixed-point independent 27 x 27 multiplication | Yes | Yes | No | Yes |
| 1 variable precision DSP block | Fixed-point two 18 x 19 multiplier adder mode | Yes | Yes | Yes | Yes |
| 1 variable precision DSP block | Fixed-point 18 x 18 multiplier adder summed with 36-bit input | Yes | Yes | Yes | Yes |
| 1 variable precision DSP block | Fixed-point 18 x 19 systolic mode | Yes | Yes | Yes | Yes |
| 1 variable precision DSP block | Floating-point multiplication mode | No | No | No | No |
| 1 variable precision DSP block | Floating-point adder or subtract mode | No | No | No | No |
| 1 variable precision DSP block | Floating-point multiplier adder or subtract mode | No | No | No | No |
| 1 variable precision DSP block | Floating-point multiplier accumulate mode | Yes | No | No | No |
| 1 variable precision DSP block | Floating-point vector one mode | No | No | No | No |
| 1 variable precision DSP block | Floating-point vector two mode | No | No | No | No |
| 2 variable precision DSP blocks | Fixed-point complex 18 x 19 multiplication | No | No | No | No |
| 4 variable precision DSP blocks | Floating-point complex multiplication | No | No | No | No |

Related Information

• Design Considerations on page 39

• Internal Coefficient and Pre-Adder for Fixed-Point Arithmetic on page 39

• Accumulator for Fixed-Point Arithmetic on page 39

• Chainout Adder on page 40

• Input Cascade for Fixed-Point Arithmetic on page 40

1.3. Resources

Table 3. Number of Multipliers in Intel Stratix 10 Devices

The 18 x 19 Multiplier and 27 x 27 Multiplier columns belong to the Independent Input and Output Multiplications Operator group.

| Product Line | Number of Variable-precision DSP Blocks | 18 x 19 Multiplier | 27 x 27 Multiplier | 18 x 19 Multiplier Adder Sum Mode | 18 x 18 Multiplier Adder Summed with 36 bit Input | Single-Precision Floating-Point Multiplier | Single-Precision Floating-Point Adders |
|---|---|---|---|---|---|---|---|
| GX 400/SX 400 | 648 | 1,296 | 648 | 648 | 648 | 648 | 648 |
| GX 650/SX 650 | 1,152 | 2,304 | 1,152 | 1,152 | 1,152 | 1,152 | 1,152 |
| GX 850/SX 850 | 2,016 | 4,032 | 2,016 | 2,016 | 2,016 | 2,016 | 2,016 |
| GX 1100/SX 1100 | 2,592 | 5,184 | 2,592 | 2,592 | 2,592 | 2,592 | 2,592 |
| GX 1650/SX 1650 | 3,145 | 6,290 | 3,145 | 3,145 | 3,145 | 3,145 | 3,145 |
| GX 2100/SX 2100 | 3,744 | 7,488 | 3,744 | 3,744 | 3,744 | 3,744 | 3,744 |
| GX 2500/SX 2500 | 5,011 | 10,022 | 5,011 | 5,011 | 5,011 | 5,011 | 5,011 |
| GX 2800/SX 2800 | 5,760 | 11,520 | 5,760 | 5,760 | 5,760 | 5,760 | 5,760 |
| GX 1660 | 3,326 | 6,652 | 3,326 | 3,326 | 3,326 | 3,326 | 3,326 |
| GX 2110 | 3,960 | 7,920 | 3,960 | 3,960 | 3,960 | 3,960 | 3,960 |
| TX 400 | 648 | 1,296 | 648 | 648 | 648 | 648 | 648 |
| TX 850 | 2,016 | 4,032 | 2,016 | 2,016 | 2,016 | 2,016 | 2,016 |
| TX 1100 | 2,592 | 5,184 | 2,592 | 2,592 | 2,592 | 2,592 | 2,592 |
| TX 1650 | 3,326 | 6,652 | 3,326 | 3,326 | 3,326 | 3,326 | 3,326 |
| TX 2100 | 3,960 | 7,920 | 3,960 | 3,960 | 3,960 | 3,960 | 3,960 |
| TX 2500 | 5,011 | 10,022 | 5,011 | 5,011 | 5,011 | 5,011 | 5,011 |
| TX 2800 | 5,760 | 11,520 | 5,760 | 5,760 | 5,760 | 5,760 | 5,760 |
| MX 1650 | 3,326 | 6,652 | 3,326 | 3,326 | 3,326 | 3,326 | 3,326 |
| MX 2100 | 3,960 | 7,920 | 3,960 | 3,960 | 3,960 | 3,960 | 3,960 |
| DX 1100 | 2,592 | 5,184 | 2,592 | 2,592 | 2,592 | 2,592 | 2,592 |
| DX 2100 | 3,960 | 7,920 | 3,960 | 3,960 | 3,960 | 3,960 | 3,960 |
| DX 2800 | 5,760 | 11,520 | 5,760 | 5,760 | 5,760 | 5,760 | 5,760 |




2. Block Architecture Overview

The Intel Stratix 10 variable precision DSP consists of the following blocks:

Table 4. Block Architecture

Fixed-Point Arithmetic:
• Input register bank
• Pipeline register
• Pre-adder/subtract
• Internal coefficient
• Multipliers
• Adder and Subtractor
• Accumulator, chainout adder, and Preload Constant
• Systolic registers
• Double accumulation register
• Output register bank

Floating-Point Arithmetic:
• Input register bank
• Pipeline register
• Multipliers
• Adder
• Accumulator
• Output register bank
• Exception Handling








Figure 1. Variable Precision DSP Block Architecture in 18 x 19 Mode for Fixed-Point Arithmetic in Intel Stratix 10 Devices

[Block diagram not reproduced. Blocks shown: input register bank, two pre-adders (+/-), two internal coefficient banks, two multipliers, 1st and 2nd pipeline registers, adder and subtractor, systolic registers, chainout adder/accumulator, constant, double accumulation register, and output register bank. Signals shown: ax[17..0], ay[18..0], az[17..0], bx[17..0], by[18..0], bz[17..0], COEFSELA[2..0], COEFSELB[2..0], scanin[18..0], scanout[18..0], chainin[63..0], chainout[63..0], resulta[36:0], resultb[36:0], LOADCONST, ACCUMULATE, NEGATE, SUB, CLK[2..0], ENA[2..0], CLR[1..0].]

* This block diagram shows the functional representation of the DSP block. The pipeline registers are embedded within the various circuits of the DSP block.

** Systolic registers are enabled in systolic mode only.

Figure 2. Variable Precision DSP Block Architecture in 27 x 27 Mode for Fixed-Point Arithmetic in Intel Stratix 10 Devices

[Block diagram not reproduced. Blocks shown: input register bank, pre-adder (+/-), internal coefficients, multiplier, 1st and 2nd pipeline registers, chainout adder/accumulator, constant, 64-bit double accumulation register, and output register bank. Signals shown: ax[26:0], ay[26:0], az[25:0], COEFSELA[2:0], scanin[26:0], scanout[26:0], chainin[63:0], chainout[63:0], resulta[63:0], LOADCONST, ACCUMULATE, NEGATE, clk[2:0], ena[2:0], clr[1:0].]



Figure 3. Variable Precision DSP Block Architecture for Floating-Point Arithmetic in Intel Stratix 10 Devices

[Block diagram not reproduced. Blocks shown: input register bank, multiplier, adder, pipeline registers, and output register bank. Signals shown: ax[31:0], ay[31:0], az[31:0], accumulate, chainin[31:0], chainout[31:0], resulta[31:0], mult_invalid, mult_inexact, mult_overflow, mult_underflow, adder_invalid, adder_inexact, adder_overflow, adder_underflow.]


Related Information

• Native Floating Point DSP Intel Stratix 10 FPGA IP References on page 90

• Native Fixed Point DSP Intel Stratix 10 FPGA IP Core References on page 44




2.1. Input Register Bank for Fixed-Point and Floating-Point Arithmetic

The input register banks in Intel Stratix 10 DSP blocks are available for the following input signals:

Table 5. Input Register Bank

Fixed-Point Arithmetic:
• Data
• Dynamic control signals: NEGATE, LOADCONST, ACCUMULATE, and SUB

Floating-Point Arithmetic:
• Data
• Dynamic ACCUMULATE control signal

All the registers in the DSP blocks are positive-edge triggered. These registers are not reset after power up and may hold unwanted data. Assert the CLR signal to clear the registers before starting an operation. Each multiplier operand can feed an input register or a multiplier directly, bypassing the input registers.

The following variable precision DSP block signals control the input registers within the variable precision DSP block:

• CLK[2..0]

• ENA[2..0]

• CLR[0]




Figure 4. Data Input Registers in Fixed-Point Arithmetic 18 x 19 Mode

[Diagram not reproduced. Elements shown: top delay registers and bottom delay registers. Signals shown: ax[17..0], ay[18..0], az[17..0], bx[17..0], by[18..0], bz[17..0], scanin[18..0], scanout[18..0], CLK[2..0], ENA[2..0], CLR[0].]




Figure 5. Data Input Registers in Fixed-Point Arithmetic 27 x 27 Mode

[Diagram not reproduced. Signals shown: ax[26..0], ay[26..0], az[25..0], scanin[26..0], scanout[26..0], CLK[2..0], ENA[2..0], CLR[0].]

2.2. Pipeline Registers for Fixed-Point and Floating-Point Arithmetic

In addition to the input and output registers, there are two columns of pipeline registers for fixed-point arithmetic. Use the pipeline registers to achieve the maximum Fmax performance. You can bypass the pipeline registers if a high Fmax is not needed.

The following variable precision DSP block signals control the pipeline registers within the variable precision DSP block:

• CLK[2..0]

• ENA[2..0]

• CLR[1]

Floating-point arithmetic has three latency layers of pipeline registers. You can bypass all latency layers of the pipeline registers or use any one, two, or three layers of pipeline registers.




2.3. Pre-adder for Fixed-Point Arithmetic

Each variable precision DSP block has two 19-bit pre-adders. You can configure these pre-adders in the following configurations:

• 18-bit (signed or unsigned) addition or 18-bit (signed) subtraction for 18 x 19 mode

• 26-bit addition or subtraction for 27 x 27 mode

For 18 x 19 mode, when both pre-adders within the same DSP block are used, they must share the same operation type (either addition or subtraction).
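To illustrate why a hard pre-adder helps symmetric filters, the following Python sketch models one multiplier path in 18 x 19 mode with its pre-adder enabled. It is a behavioral sketch only. The function names are hypothetical, and the choice of `ay` and `az` as the pre-adder inputs with `ax` as the remaining multiplier operand, all treated as signed 18-bit values, is an assumption made for this example rather than a statement about the IP interface.

```python
def check_signed(name, value, bits):
    """Raise ValueError if value does not fit in a signed field of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} does not fit in {bits} signed bits")


def preadd_multiply(ay, az, ax, subtract=False):
    """Behavioral model of (ay +/- az) * ax for one multiplier in 18 x 19 mode.

    Assumption for this sketch: ay and az feed the pre-adder, ax is the other
    multiplier operand, and all three are signed 18-bit values.
    """
    for name, value in (("ay", ay), ("az", az), ("ax", ax)):
        check_signed(name, value, 18)
    pre = ay - az if subtract else ay + az
    # Two 18-bit operands always produce a result that fits in 19 bits.
    check_signed("pre-adder output", pre, 19)
    return pre * ax


# Symmetric FIR taps x0 and x3 share coefficient c, so a single multiplier
# computes c * (x0 + x3) instead of two separate products.
x0, x3, c = 100, -40, 7
assert preadd_multiply(x0, x3, c) == c * x0 + c * x3
```

In a symmetric filter, each pair of taps with equal coefficients can use one multiplier in this way, which roughly halves the number of multipliers needed.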

2.4. Internal Coefficient for Fixed-Point Arithmetic

The Intel Stratix 10 variable precision DSP block has the flexibility of selecting the multiplicand from either the dynamic input or the internal coefficient.

The internal coefficient can support up to eight constant coefficients for the multiplicands in 18-bit and 27-bit modes. When you enable the internal coefficient feature, COEFSELA/COEFSELB are used to control the selection of the coefficient multiplexer.
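The coefficient selection described above can be pictured as a small multiplexer. The Python sketch below is a minimal behavioral model under stated assumptions: the function name, its arguments, and the example coefficient values are hypothetical, and only the eight-entry limit and the 3-bit select range come from the text and figures.

```python
def select_multiplicand(dynamic_input, coef_bank, coefsel, use_internal_coef):
    """Pick the multiplicand from the dynamic input or an internal coefficient.

    coef_bank holds up to eight constants; coefsel models a 3-bit
    COEFSELA/COEFSELB value (0 to 7). All names here are illustrative.
    """
    if len(coef_bank) > 8:
        raise ValueError("the internal coefficient bank holds at most eight values")
    if not use_internal_coef:
        return dynamic_input
    if not 0 <= coefsel < len(coef_bank):
        raise ValueError("coefsel must address a stored coefficient")
    return coef_bank[coefsel]


# Hypothetical coefficient values, chosen only for the example.
bank_a = [3, -1, 4, 1, -5, 9, 2, -6]
sample = 10
product = sample * select_multiplicand(0, bank_a, coefsel=4, use_internal_coef=True)
assert product == -50
```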

2.5. Multipliers for Fixed-Point and Floating-Point Arithmetic

A single variable precision DSP block can perform many multiplications in parallel, depending on the data width of the multiplier and implementation.

There are two multipliers per variable precision DSP block. You can configure these two multipliers in several operational modes:

Table 6. Operational Modes

• Two 18 (signed or unsigned) x 19 (signed) multipliers

• One 27 x 27 multiplier

• One floating-point arithmetic single precision multiplier




2.6. Adder or Subtractor for Fixed-Point and Floating-Point Arithmetic

Depending on the operational mode, you can use the adder or subtractor as follows:

• One 38-bit adder for fixed-point arithmetic addition and subtraction between two multipliers within a DSP block.

• One floating-point arithmetic single precision adder or subtractor.

Use the dynamic SUB port to select whether the adder performs an addition or a subtraction operation for fixed-point arithmetic.

Table 7. Adder Operations with SUB Dynamic Control Signal

| Operation | Description | SUB Signal |
|---|---|---|
| Addition | Adds the results of the two multipliers within one DSP block. | 0 |
| Subtraction | Subtracts the results between two multipliers within the same DSP block. | 1 |

The dynamic SUB port is not supported in floating-point arithmetic.
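For the fixed-point case, a short Python sketch can show why a 38-bit adder is sufficient for combining two 18 x 19 products. This is an illustrative model, not the IP behavior: the function names are hypothetical, the operands are assumed signed (`ax`, `bx` 18-bit and `ay`, `by` 19-bit), and the operand order of the subtraction (top product minus bottom product) is an assumption, so check the operational mode descriptions for the exact equation.

```python
def fits_signed(value, bits):
    """Return True if value fits in a signed two's-complement field of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def multiplier_adder(ax, ay, bx, by, sub=0):
    """Model the adder that combines two 18 x 19 products in one DSP block.

    Assumptions for this sketch: ax and bx are signed 18-bit, ay and by are
    signed 19-bit, and SUB = 1 computes (ax * ay) - (bx * by).
    """
    if not (fits_signed(ax, 18) and fits_signed(bx, 18)):
        raise ValueError("ax and bx must fit in 18 signed bits")
    if not (fits_signed(ay, 19) and fits_signed(by, 19)):
        raise ValueError("ay and by must fit in 19 signed bits")
    top, bottom = ax * ay, bx * by           # each product fits in 37 bits
    result = top - bottom if sub else top + bottom
    assert fits_signed(result, 38)           # the sum or difference fits in 38 bits
    return result


# Largest possible magnitude: both products equal 2**35, so the sum is 2**36.
worst_case = multiplier_adder(-(1 << 17), -(1 << 18), -(1 << 17), -(1 << 18))
assert worst_case == 1 << 36
```

The worst case shows that one bit beyond the 37-bit product width is enough to hold the combined result without overflow.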

2.7. Accumulator, Chainout Adder, and Preload Constant for Fixed-Point Arithmetic

The Intel Stratix 10 variable precision DSP block supports accumulator and adder up to 64 bits for fixed-point arithmetic.

The following signals can dynamically control the function of the accumulator and the chainout adder:

• NEGATE

• LOADCONST

• ACCUMULATE

The accumulator and chainout adder features are not available in two fixed-point arithmetic independent 18 x 19 modes.

Table 8. Accumulator Functions and Dynamic Control Signals

| Function | Description | NEGATE | LOADCONST | ACCUMULATE |
|---|---|---|---|---|
| Zeroing | Disables the accumulator. | 0 | 0 | 0 |
| Preload | The result is always added to the preload value. Only one bit of the 64-bit preload value can be "1". You can use this function to round the DSP result to any position of the 64-bit result. | 0 | 1 | 0 |
| Accumulation | Adds the current result to the previous accumulate result. | 0 | X | 1 |
| Decimation + Accumulation | This function takes the current result, converts it into two's complement, and adds it to the previous result. | 1 | X | 1 |
| Decimation + Chainout Adder | This function takes the current result, converts it into two's complement, and adds it to the output of previous DSP block. | 1 | 0 | 0 |
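The functions in Table 8 can be summarized in a small behavioral model. The Python sketch below is one possible reading of the table, not a description of the hardware implementation. The function names are hypothetical, and treating `chainin` as an addend that is always present (defaulting to 0) is an assumption of this sketch.

```python
MASK64 = (1 << 64) - 1


def wrap64(value):
    """Wrap an integer into the signed 64-bit range of the accumulator."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def accumulator_next(result, prev_acc=0, chainin=0, preload=0,
                     negate=0, loadconst=0, accumulate=0):
    """Return the next accumulator/chainout adder output for one clock cycle."""
    if preload < 0 or preload & (preload - 1):
        raise ValueError("only one bit of the preload value can be 1")
    operand = -result if negate else result      # NEGATE: two's complement
    if accumulate:
        feedback = prev_acc                      # LOADCONST is a don't-care
    elif loadconst:
        feedback = preload
    else:
        feedback = 0                             # accumulator disabled
    return wrap64(operand + feedback + chainin)


# Rounding with the preload constant: to drop the four LSBs with rounding,
# preload bit 3 (half of 2**4) and then discard the low four bits.
rounded = accumulator_next(24, preload=1 << 3, loadconst=1) >> 4
assert rounded == 2   # 24 / 16 = 1.5 rounds up to 2
```

Because the preload constant has a single bit set, it acts as the "half LSB" term of a round-half-up operation at whichever bit position you choose.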

2.8. Systolic Register for Fixed-Point Arithmetic

There are two sets of systolic registers per variable precision DSP block and each set supports up to 44 bits chain in and chain out adder. If the variable precision DSP block is not configured in fixed-point arithmetic systolic FIR mode, both sets of systolic registers are bypassed.

The first set of systolic registers consists of 18-bit and 19-bit registers that are used to register the 18-bit and 19-bit inputs of the upper multiplier, respectively.

The second set of systolic registers is used to delay the chainin input from the previous variable precision DSP block.

Follow these guidelines when implementing systolic registers in your design:

• The input and output registers must be enabled when using systolic registers.

• The first and second pipeline registers are optional when using systolic registers. If the second pipeline register is enabled, use the same clock as the input systolic register.

• The chainin systolic register always has the same clock source as the output register.

• Use the same clock source for all registers to ensure correct systolic operation.

2.9. Double Accumulation Register for Fixed-Point Arithmetic

The accumulator supports double accumulation by enabling the 64-bit double accumulation registers located between the output register bank and the accumulator feedback path.

If the double accumulation register is enabled, an extra clock cycle delay is added into the feedback path of the accumulator.

This register has the same CLK, ENA, and CLR settings as the output register bank.

By enabling this register, you can have two accumulator channels using the same number of variable precision DSP blocks. This is useful when processing interleaved complex data (I, Q).
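The effect of the extra feedback delay can be seen in a short cycle-by-cycle model. The Python sketch below is illustrative only: the function and variable names are hypothetical, and the model ignores input and pipeline register latency. It shows that with one additional register in the feedback path, each sample is added to the accumulated value from two cycles earlier, so interleaved I and Q samples accumulate in separate channels.

```python
def double_accumulate(samples):
    """Accumulate an interleaved stream (I0, Q0, I1, Q1, ...) in two channels.

    out_reg models the output register and dbl_reg models the double
    accumulation register that adds one clock cycle to the feedback path.
    """
    out_reg = 0
    dbl_reg = 0
    outputs = []
    for sample in samples:
        next_out = sample + dbl_reg   # feedback is the value from two cycles ago
        dbl_reg = out_reg             # both registers update on the same clock edge
        out_reg = next_out
        outputs.append(out_reg)
    return outputs


# I samples are 1, 2, 3 and Q samples are 10, 20, 30.
trace = double_accumulate([1, 10, 2, 20, 3, 30])
assert trace == [1, 10, 3, 30, 6, 60]   # running I sums at even slots, Q sums at odd slots
```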




2.10. Output Register Bank for Fixed-Point and Floating-Point Arithmetic

The positive edge of the clock signal triggers the 74-bit bypassable output register bank. The output register bank is not reset after power up and may hold unwanted data. Assert the CLR signal to clear the register before starting an operation.

The following variable precision DSP block signals control the output register per variable precision DSP block:

• CLK[2..0]

• ENA[2..0]

• CLR[1]




2.11. Exception Handling for Floating-Point Arithmetic

The Intel Stratix 10 floating-point arithmetic supports exception handling for the multiplier and adder blocks.

Table 9. Supported Exception Flags

| Exception Flags | Width | Description |
|---|---|---|
| Multiplication | | |
| mult_overflow | 1 | This signal indicates if the multiplier result is larger than the maximum representable value. 1: The multiplier result is larger than the maximum representable value and the result is cast to infinity. 0: The multiplier result is not larger than the maximum representable value. This signal is not available in Adder or Subtract Mode. |
| mult_underflow | 1 | This signal indicates if the multiplier result is smaller than the minimum representable value. 1: The multiplier result is smaller than the minimum representable value and the result is flushed to zero. 0: The multiplier result is larger than the minimum representable value. This signal is not available in Adder or Subtract Mode. |
| mult_inexact | 1 | This signal indicates if the multiplier result is an exact representation. 1: The multiplier result is a rounded value, a value smaller than the minimum representable value, or a value larger than the maximum representable value. 0: The multiplier result does not meet any of the criteria above. This signal is not available in Adder or Subtract Mode. |
| mult_invalid | 1 | This signal indicates if the multiplier operation is ill-defined and produces an invalid result. 1: The multiplier result is invalid and cast to qNaN. 0: The multiplier result is not an invalid number. This signal is not available in Adder or Subtract Mode. |
| Addition | | |
| adder_overflow | 1 | This signal indicates if the adder result is larger than the maximum representable value. 1: The adder result is larger than the maximum representable value and the result is cast to infinity. 0: The adder result is not larger than the maximum representable value. This signal is not available in Multiplication Mode. |
| adder_underflow | 1 | This signal indicates if the adder result is smaller than the minimum representable value. 1: The adder result is smaller than the minimum representable value and the result is flushed to zero. 0: The adder result is larger than the minimum representable value. This signal is not available in Multiplication Mode. |
| adder_inexact | 1 | This signal indicates if the adder result is an exact representation. 1: The adder result is a rounded value, a value smaller than the minimum representable value, or a value larger than the maximum representable value. 0: The adder result does not meet any of the criteria above. This signal is not available in Multiplication Mode. |
| adder_invalid | 1 | This signal indicates if the adder operation is ill-defined and produces an invalid result. 1: The adder result is invalid and cast to qNaN. 0: The adder result is not an invalid number. This signal is not available in Multiplication Mode. |

Table 10. Multiplier Exception Handling Possible Results

Input A | Input B | Result | Flags (4): Overflow/Underflow/Inexact/Invalid
--- | --- | --- | ---
Normalized | Normalized | Normalized value | 0/0/0/0
Normalized | Normalized | Normalized (rounded) value | 0/0/1/0
Normalized | Normalized | Positive/negative infinity value | 1/0/1/0
Normalized | Normalized | Subnormal (denormal) value | 0/1/1/0
0 or Subnormal (denormal) | Normalized | 0 value | 0/0/0/0
Positive/negative infinity | Normalized | Positive/negative infinity value | 0/0/0/0
Quiet Not A Number (qNaN) | Normalized | qNaN value | 0/0/0/0
0 or Subnormal (denormal) | 0 or Subnormal (denormal) | 0 value | 0/0/0/0
Positive/negative infinity | 0 or Subnormal (denormal) | qNaN value | 0/0/0/1
Quiet Not A Number (qNaN) | 0 or Subnormal (denormal) | qNaN value | 0/0/0/0
Positive/negative infinity | Positive/negative infinity | Positive/negative infinity value | 0/0/0/0
Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/0
Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/0
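The multiplier rows above can be approximated in software. The following is a minimal sketch, not the hardware implementation: the function names are hypothetical, inputs are assumed to already be single-precision values, subnormal inputs are treated as zero because the table groups them with 0, and underflow is detected before rounding, which is an assumption.

```python
import math
import struct

MIN_NORMAL = 2.0 ** -126  # smallest normalized single-precision magnitude

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def flush(x):
    """Treat a subnormal input as signed zero (assumption based on the table grouping)."""
    return math.copysign(0.0, x) if 0.0 < abs(x) < MIN_NORMAL else x

def fp32_mult(ay, az):
    """Return (result, flags) for ay * az, following the multiplier table rows."""
    flags = {"overflow": 0, "underflow": 0, "inexact": 0, "invalid": 0}
    ay, az = flush(ay), flush(az)
    if math.isnan(ay) or math.isnan(az):
        return math.nan, flags                      # qNaN in, qNaN out, no flags
    if math.isinf(ay) or math.isinf(az):
        if ay == 0.0 or az == 0.0:
            flags["invalid"] = 1                    # infinity times zero
            return math.nan, flags
        sign = math.copysign(1.0, ay) * math.copysign(1.0, az)
        return math.copysign(math.inf, sign), flags
    exact = ay * az   # exact in double precision for single-precision inputs
    if exact == 0.0:
        return exact, flags
    if abs(exact) < MIN_NORMAL:
        flags["underflow"] = flags["inexact"] = 1   # flushed to zero
        return math.copysign(0.0, exact), flags
    try:
        rounded = to_f32(exact)
    except OverflowError:                           # too large for single precision
        flags["overflow"] = flags["inexact"] = 1
        return math.copysign(math.inf, exact), flags
    flags["inexact"] = int(rounded != exact)
    return rounded, flags
```

For example, `fp32_mult(math.inf, 0.0)` returns a NaN with only the invalid flag set, matching the 0/0/0/1 row.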

Table 11. Adder Exception Handling Possible Results

Input A | Input B | Result | Flags (4): Overflow/Underflow/Inexact/Invalid
--- | --- | --- | ---
Normalized | Normalized | Normalized value | 0/0/0/0
Normalized | Normalized | Normalized (rounded) value | 0/0/1/0
Normalized | Normalized | Positive/negative infinity value | 1/0/1/0
Normalized | Normalized | 0 value. Sign bit = 0 | 0/0/0/0
Normalized | Normalized | Subnormal (denormal) value. The sign is preserved | 0/1/1/0
0 or Subnormal (denormal) | Normalized | Input b | 0/0/0/0
Positive/negative infinity | Normalized | Positive/negative infinity value | 0/0/0/0
Quiet Not A Number (qNaN) | Normalized | qNaN value | 0/0/0/0
0 or Subnormal (denormal) | 0 or Subnormal (denormal) | 0 value. For the (-0 + (-0)) equation, sign bit = 1. For any other equation, sign bit = 0. | 0/0/0/0
Positive/negative infinity | 0 or Subnormal (denormal) | Positive/negative infinity value | 0/0/0/0
Quiet Not A Number (qNaN) | 0 or Subnormal (denormal) | qNaN value | 0/0/0/0
Positive/negative infinity | Positive/negative infinity | qNaN value for invalid cases. Positive/negative infinity value for valid cases. | 0/0/0/1 for invalid cases. 0/0/0/0 for valid cases.
Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/0
Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/0

Valid cases for two infinity inputs are:
• Positive infinity value + positive infinity value
• Negative infinity value + negative infinity value
• Negative infinity value - positive infinity value
• Positive infinity value - negative infinity value

(4) Output exception flags. These flags do not change if exceptions are at input value.
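The special-value rows of the adder table can be sketched in the same way. This is an illustrative model only: the function names are hypothetical, only addition is shown, and rounding, overflow, and underflow of normalized operands are deliberately left out.

```python
import math

MIN_NORMAL = 2.0 ** -126  # smallest normalized single-precision magnitude

def is_zero_like(x):
    """True for zero or a single-precision subnormal, which the table groups together."""
    return abs(x) < MIN_NORMAL

def fp_add_special(a, b):
    """Return (result, invalid_flag) for a + b, following the special-value rows."""
    if math.isnan(a) or math.isnan(b):
        return math.nan, 0
    if math.isinf(a) and math.isinf(b):
        # Same-sign infinities are a valid case; opposite signs are invalid.
        return (a, 0) if a == b else (math.nan, 1)
    if math.isinf(a) or math.isinf(b):
        return (a if math.isinf(a) else b), 0
    if is_zero_like(a) and is_zero_like(b):
        both_negative = math.copysign(1.0, a) < 0 and math.copysign(1.0, b) < 0
        return (-0.0 if both_negative else 0.0), 0
    if is_zero_like(a):
        return b, 0
    if is_zero_like(b):
        return a, 0
    return a + b, 0  # normalized operands: rounding and range checks not modelled
```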

Related Information

Native Floating Point DSP Intel Stratix 10 FPGA IP Core Signals on page 94




3. Operational Mode Descriptions

This section describes how you can configure the Intel Stratix 10 variable precision DSP block to efficiently support the fixed-point arithmetic and floating-point arithmetic operational modes.

Table 12. Operational Modes

Fixed-Point Arithmetic
• Independent multiplier mode
• Multiplier adder sum mode
• Independent complex multiplier
• 18 × 18 multiplication summed with 36-bit input mode
• 18 × 18 systolic FIR mode

Floating-Point Arithmetic
• Multiplication mode
• Adder or subtract mode
• Multiply-add or multiply-subtract mode
• Multiply accumulate mode
• Vector one mode
• Vector two mode
• Direct vector dot product
• Complex multiplication

3.1. Operational Modes for Fixed-Point Arithmetic

3.1.1. Independent Multiplier Mode

In independent input and output multiplier mode, the variable precision DSP blocks perform individual multiplication operations for general purpose multipliers.

Table 13. Supported Independent Multiplier Modes in Intel Stratix 10 Variable Precision DSP Blocks

Configuration | Multipliers per Block
--- | ---
18 (unsigned) × 18 (unsigned) | 2
18 (signed) × 19 (signed) | 2
27 (signed or unsigned) × 27 (signed or unsigned) | 1

Related Information


• Supported Operational Modes on page 46

3.1.1.1. 18 × 18 or 18 × 19 Independent Multiplier

The 18 × 18 or 18 × 19 independent multiplier mode uses the following equations:

resulta = ax * ay

resultb = bx * by
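As a rough software model of these two equations, the sketch below multiplies both operand pairs for the signed 18 × 19 case and checks that each product fits in 37 bits. The function names and the range checks are illustrative assumptions and are not part of any Intel tool or IP.

```python
def check_signed(value, bits):
    """Return value unchanged, or raise if it does not fit in `bits` two's complement bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    return value

def independent_mult_18x19(ax, ay, bx, by):
    """Model resulta = ax * ay and resultb = bx * by for signed 18 x 19 operands."""
    resulta = check_signed(ax, 18) * check_signed(ay, 19)
    resultb = check_signed(bx, 18) * check_signed(by, 19)
    # The largest product, (-2**17) * (-2**18) = 2**35, still fits in 37 signed bits.
    return check_signed(resulta, 37), check_signed(resultb, 37)
```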








Figure 6. Two 18 × 18 or 18 × 19 Independent Multipliers per Variable Precision DSP Block for Intel Stratix 10 Devices

In this figure, the variables are defined as follows:

• n = 19 and m = 37 for 18 × 19 signed operands

• n = 18 and m = 36 for 18 × 18 unsigned operands

[Figure 6 diagram: inputs ax[17..0], ay[(n-1)..0], bx[17..0], and by[(n-1)..0] enter the input register bank and feed two multipliers, which produce resulta[(m-1)..0] and resultb[(m-1)..0]. The diagram also shows the first and second pipeline registers and the output register bank.]

3.1.1.2. 27 × 27 Independent Multiplier

The 27 x 27 independent multiplier mode uses the equation of resulta = ay * ax.

Figure 7. One 27 × 27 Independent Multiplier Mode per Variable Precision DSP Block for Intel Stratix 10 Devices

In this mode, the resulta can be up to 64 bits when combined with a chainout adder or accumulator.

[Figure 7 diagram: inputs ay[26..0] and ax[26..0] enter the input register bank and feed one multiplier, which produces the 54-bit resulta[53..0]. The diagram also shows the first and second pipeline registers and the output register bank.]


3.1.2. Multiplier Adder Sum Mode

The multiplier adder sum mode uses the equations:

• resulta = (bx * by) + (ax * ay) to calculate the sum of the two 18 × 19 multiplications.

• resulta = (bx * by) - (ax * ay) to calculate the difference of the two 18 × 19 multiplications.

Figure 8. One Sum of Two 18 × 18 or 18 × 19 Multipliers with One Variable Precision DSP Block for Intel Stratix 10 Devices

In this figure, the variable is defined as follows:

• n = 19 for 18 × 19 signed operands

• n = 18 for 18 × 18 unsigned operands

[Figure 8 diagram: inputs ay[(n-1)..0], ax[17..0], by[(n-1)..0], and bx[17..0] enter the input register bank and feed two multipliers. An adder (+/-) controlled by the SUB signal combines the two products into the 38-bit resulta[37..0]. The diagram also shows the first and second pipeline registers and the output register bank.]

Set the SUB dynamic control signal to high to calculate the difference of the two 18 × 19 multiplications.
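A minimal sketch of the multiplier adder sum equations follows. The function name and the `sub` keyword are hypothetical stand-ins for the SUB control signal, and the 38-bit range check reflects the resulta[37..0] width shown in Figure 8.

```python
def multiplier_adder_sum(ax, ay, bx, by, sub=False):
    """Model resulta = (bx * by) + (ax * ay), or the difference when sub is high."""
    top = ax * ay        # top multiplier product
    bottom = bx * by     # bottom multiplier product
    resulta = bottom - top if sub else bottom + top
    # The result bus is 38 bits wide for signed 18 x 19 operands.
    assert -(1 << 37) <= resulta <= (1 << 37) - 1, "result exceeds 38 signed bits"
    return resulta
```

With `sub=False` the call `multiplier_adder_sum(2, 3, 4, 5)` gives 26, and with `sub=True` it gives 14.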




3.1.3. Independent Complex Multiplier

The Intel Stratix 10 devices support the 18 × 19 complex multiplier mode using two variable precision DSP blocks, each in fixed-point multiplier adder sum mode.

Figure 9. Sample of Complex Multiplication Equation

(a + jb) × (c + jd) = [(a × c) - (b × d)] + j[(a × d) + (b × c)]

The imaginary part [(a × d) + (b × c)] is implemented in the first variable-precision DSP block, while the real part [(a × c) - (b × d)] is implemented in the second variable-precision DSP block.
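The split into two multiplier adder sum operations can be sketched as follows. The helper names are hypothetical, and the assignment of a, b, c, and d to the ax, ay, bx, and by ports is an assumption chosen so that the subtraction order matches resulta = (bx * by) - (ax * ay).

```python
def sum_of_two(ax, ay, bx, by, sub=False):
    """One DSP block in multiplier adder sum mode: (bx * by) +/- (ax * ay)."""
    top, bottom = ax * ay, bx * by
    return bottom - top if sub else bottom + top

def complex_mult(a, b, c, d):
    """Return (real, imag) of (a + jb) * (c + jd) using two sum-of-two blocks."""
    imag = sum_of_two(ax=b, ay=c, bx=a, by=d)             # first block: (a*d) + (b*c)
    real = sum_of_two(ax=b, ay=d, bx=a, by=c, sub=True)   # second block: (a*c) - (b*d)
    return real, imag
```

As a quick check, `complex_mult(1, 2, 3, 4)` agrees with Python's built-in `complex(1, 2) * complex(3, 4)`.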




Figure 10. One 18 × 19 Complex Multiplier with Two Variable Precision DSP Blocks for Intel Stratix 10 Devices

[Figure 10 diagram: Variable-Precision DSP Block 1 computes the imaginary part (ad + bc) from the products c[18..0] × b[17..0] and d[18..0] × a[17..0], summed by an adder (+) into a 38-bit result. Variable-Precision DSP Block 2 computes the real part (ac - bd) from the products d[18..0] × b[17..0] and c[18..0] × a[17..0], combined by an adder (-) into a 38-bit result. Each block also shows the input register bank, the first and second pipeline registers, and the output register bank.]



3.1.4. 18 × 19 Multiplication Summed with 36-Bit Input Mode

Intel Stratix 10 variable precision DSP blocks support one 18 × 19 multiplication summed to a 36-bit input.

The 18 × 19 multiplication summed with 36-bit input mode uses the equations:

• resulta = (ax * ay) + by to add the 36-bit input to the 18 × 19 multiplication result.

• resulta = (ax * ay) - by to subtract the 36-bit input from the 18 × 19 multiplication result.




Use the upper multiplier to provide the input for an 18 × 19 multiplication, while the bottom multiplier is bypassed. The by[17..0] and bx[35..18] signals are concatenated to produce a 36-bit input.

Use the SUB dynamic control signal to control the adder to perform addition or subtraction operation.
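The concatenation and the add or subtract step can be sketched as below. This is a sketch under stated assumptions: bx is taken to supply bits [35..18] and by bits [17..0], the concatenated value is interpreted as a 36-bit two's complement number, and the function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mult_plus_36(ax, ay, bx, by, sub=False):
    """Model resulta = (ax * ay) +/- the 36-bit input formed from bx and by."""
    # Assumption: bx is the upper 18 bits and by is the lower 18 bits.
    addend = to_signed(((bx & 0x3FFFF) << 18) | (by & 0x3FFFF), 36)
    product = ax * ay    # the bottom multiplier is bypassed in this mode
    return product - addend if sub else product + addend
```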

Figure 11. One 18 × 19 Multiplication Summed with 36-Bit Input Mode for Intel Stratix 10 Devices

In this figure, the variable is defined as follows:

• n = 19 for 18 × 19 signed operands

• n = 18 for 18 × 18 unsigned operands

[Figure 11 diagram: inputs ay[(n-1)..0] and ax[17..0] enter the input register bank and feed the multiplier. Inputs bx[35..18] and by[17..0] feed the adder (+/-), which is controlled by the SUB signal and produces the 38-bit resulta[37..0]. The diagram also shows the first and second pipeline registers and the output register bank.]



3.1.5. Systolic FIR Mode

The basic structure of a FIR filter consists of a series of multiplications followed by an addition.

Figure 12. Basic FIR Filter Equation

Depending on the number of taps and the input sizes, the delay through chaining a high number of adders can become quite large. To overcome the delay performance issue, the systolic form is used with additional delay elements placed per tap to increase the performance at the cost of increased latency.
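The trade-off between the two forms can be illustrated with a small simulation. The sketch below is an assumption-laden model rather than the device behavior: it uses the textbook direct form y[n] = sum of c[i] * x[n - i] with zero-based indices (the equation of Figure 12 is not reproduced in this text), and it models the systolic form with two sample delays and one partial-sum register per tap. The function names are hypothetical.

```python
def fir_direct(x, c):
    """Textbook direct form: y[n] = sum over i of c[i] * x[n - i]."""
    return [sum(ci * x[n - i] for i, ci in enumerate(c) if n - i >= 0)
            for n in range(len(x))]

def fir_systolic(x, c):
    """Systolic form: tap i sees x[n - 2*i]; partial sums are registered between taps."""
    k = len(c)
    sums = [0] * k               # registered partial sum at the output of each tap
    out = []
    for n in range(len(x)):
        prev = sums[:]           # register contents from the previous clock cycle
        for i in range(k):
            sample = x[n - 2 * i] if n - 2 * i >= 0 else 0
            carry = prev[i - 1] if i > 0 else 0
            sums[i] = carry + c[i] * sample
        out.append(sums[-1])
    return out
```

In this model the systolic output equals the direct-form output delayed by k - 1 cycles for k taps, which is the added latency that the shorter adder path is traded for.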




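Figure 13. Systolic FIR Filter Equivalent Circuit

[Figure 13 diagram: a systolic FIR filter with input x[n], coefficients c1, c2, ..., ck-1, ck, intermediate sums w1[n], w2[n], ..., wk-1[n], wk[n], and output y[n].]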

Intel Stratix 10 variable precision DSP blocks support the following systolic FIR structures:

• 18-bit

• 27-bit

In systolic FIR mode, the input of the multiplier can come from four different sets of sources:

• Two dynamic inputs

• One dynamic input and one coefficient input

• One coefficient input and one pre-adder output

• One dynamic input and one pre-adder output




3.1.5.1. Mapping Systolic Mode User View to Variable Precision Block Architecture View

The following figure shows the implementation of the systolic FIR filter (a) using the Intel Stratix 10 variable precision DSP blocks (d) by retiming the register and restructuring the adder. Register B can be retimed into systolic registers at the chainin, ay, and ax input paths as shown in (b). The end result of the register retiming is shown in (c). The location of the adder is then restructured to sum the outputs of both multipliers. The adder result is sent to the chainout adder to be summed with the chainin value from the previous DSP block as shown in (d).




Figure 14. Mapping Systolic Mode User View to Variable Precision Block Architecture View

[Figure 14 diagram with four panels: (a) Systolic FIR Filter User View; (b) Variable Precision Block Architecture View (Before Retiming); (c) Variable Precision Block Architecture View (After Retiming); (d) Variable Precision Block Architecture View (Adder Restructured). In panels (b) to (d), the first DSP block receives dataa_y0 = x[n], dataa_x0 = c1, datab_y1 = x[n-2], and datab_x1 = c2, and the second DSP block receives dataa_y0 = x[n-4], dataa_x0 = c3, datab_y1 = x[n-6], and datab_x1 = c4. The panels show the intermediate sums w1[n] to w4[n], the output y[n], Registers A, B, and C, the systolic registers, the multipliers, the adder, the chainout adder with chainin from the previous DSP block, and the output register banks.]

3.1.5.2. 18-bit Systolic FIR Mode

In 18-bit systolic FIR mode, the adders are configured as dual 44-bit adders, giving 7 bits of overhead when using an 18 × 19 operation mode, which produces a 37-bit result. This allows a total of sixteen 18 × 19 multipliers, or eight Intel Stratix 10 variable precision DSP blocks, to be cascaded as a systolic FIR structure.

Figure 15. 18-Bit Systolic FIR Mode for Intel Stratix 10 Devices

[Figure 15 diagram: inputs ay[18..0], az[17..0], ax[17..0], COEFSELA[2..0], by[18..0], bz[17..0], bx[17..0], and COEFSELB[2..0] enter the input register bank. Each half of the block has a pre-adder (+/-), an internal coefficient, and a multiplier. An adder (+/-) combines the two products, and a chainout adder or accumulator (+) combines the sum with chainin[43..0] to drive chainout[43..0] and resulta[43..0]. The diagram also shows the systolic registers, the first and second pipeline registers, and the output register bank.]


3.1.5.3. 27-Bit Systolic FIR Mode

In 27-bit systolic FIR mode, the chainout adder or accumulator is configured for a 64-bit operation, providing 10 bits of overhead when using 27-bit data (54-bit products). This allows a total of eleven 27 × 27 multipliers, or eleven Intel Stratix 10 variable precision DSP blocks, to be cascaded as a systolic FIR structure.

The 27-bit systolic FIR mode allows the implementation of a one-stage systolic filter per DSP block. Systolic registers are not required in this mode.
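The stated chain lengths can be sanity-checked with a few lines of arithmetic. The sketch below only confirms that the worst-case sums for sixteen signed 18 × 19 products and eleven signed 27 × 27 products fit in 44 and 64 bits; it does not derive those limits, and the variable names are hypothetical.

```python
def fits_signed(value, bits):
    """True if value fits in a two's complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

# Largest-magnitude product of signed operands: most negative times most negative.
worst_18x19 = (-(1 << 17)) * (-(1 << 18))   # 2**35
worst_27x27 = (-(1 << 26)) * (-(1 << 26))   # 2**52

# Worst-case accumulated sums for the chain lengths given in the text.
fits_18bit_chain = fits_signed(16 * worst_18x19, 44)   # sixteen 18 x 19 products
fits_27bit_chain = fits_signed(11 * worst_27x27, 64)   # eleven 27 x 27 products
```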

Figure 16. 27-Bit Systolic FIR Mode for Intel Stratix 10 Devices

[Figure 16 diagram: inputs ay[25..0], az[25..0], ax[26..0], and COEFSELA[2..0] enter the input register bank and feed a pre-adder (+/-), an internal coefficient, and one multiplier. A chainout adder or accumulator (+) combines the product with chainin[63..0] to drive chainout[63..0] and resulta[63..0]. The diagram also shows the first and second pipeline registers and the output register bank.]


3.2. Operational Modes for Floating-Point Arithmetic

3.2.1. Single Floating-Point Arithmetic Functions

One floating-point arithmetic DSP can perform the following:

• Multiplication mode

• Adder or subtract mode

• Multiply accumulate mode

Related Information

Native Floating Point DSP Intel Stratix 10 FPGA IP Core Supported Operational Modes on page 91

3.2.1.1. Multiplication Mode

This mode allows you to apply the basic floating-point multiplication equation:

result = ay*az

The floating-point multiplication mode supports the following exception flags:




• mult_invalid

• mult_inexact

• mult_overflow

• mult_underflow

Figure 17. Multiplication Mode for Intel Stratix 10 Devices

[Figure 17 diagram: ports chainin[31:0], chainout[31:0], accumulate, ax[31:0], ay[31:0], az[31:0], and resulta[31:0]. The diagram shows the input register bank, the multiplier, the adder, the pipeline register banks, and the output register bank, with the exception flags mult_invalid, mult_inexact, mult_overflow, and mult_underflow.]


3.2.1.2. Adder or Subtract Mode

This mode allows you to apply the following equations:

result = ax+ay

result = ay-ax

The floating-point adder or subtract mode supports the following exception flags:

• adder_invalid

• adder_inexact

• adder_overflow

• adder_underflow




Figure 18. Adder or Subtract Mode for Intel Stratix 10 Devices

[Figure 18 diagram: ports chainin[31:0], chainout[31:0], accumulate, ax[31:0], ay[31:0], az[31:0], and resulta[31:0]. The diagram shows the input register bank, the multiplier, the adder, the pipeline register banks, and the output register bank, with the exception flags adder_invalid, adder_inexact, adder_overflow, and adder_underflow.]


3.2.1.3. Multiply Accumulate Mode

This mode performs floating-point multiplication followed by floating-point addition or subtraction with the previous multiplication result.

When the ACCUMULATE signal is high, this mode uses the equation result = (ay*az) +/- previous value.

When the ACCUMULATE signal is low, this mode uses the equation result = (ay*az).
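The effect of the ACCUMULATE signal over a stream of operands can be sketched as follows. The function name and arguments are hypothetical, Python floats (double precision) stand in for the single-precision datapath, and pipeline latency is ignored.

```python
def multiply_accumulate(pairs, accumulate_flags, subtract=False):
    """Return the result after each cycle for a stream of (ay, az) pairs."""
    previous = 0.0
    results = []
    for (ay, az), accumulate in zip(pairs, accumulate_flags):
        product = ay * az
        if accumulate:
            # ACCUMULATE high: result = (ay*az) +/- previous value
            previous = product - previous if subtract else product + previous
        else:
            # ACCUMULATE low: result = (ay*az), which starts a new accumulation
            previous = product
        results.append(previous)
    return results
```

For example, driving ACCUMULATE low, high, low over the pairs (1, 2), (3, 4), (5, 6) gives the results 2.0, 14.0, 30.0.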

The floating-point multiply accumulate mode supports the following exception flags:

• mult_invalid

• mult_inexact

• mult_overflow

• mult_underflow

• adder_invalid

• adder_inexact

• adder_overflow

• adder_underflow




Figure 19. Multiply Accumulate Mode for Intel Stratix 10 Devices

[Figure 19 diagram: ports chainin[31:0], chainout[31:0], accumulate, ax[31:0], ay[31:0], az[31:0], and resulta[31:0]. The diagram shows the input register bank, the multiplier, the adder, the pipeline register banks, and the output register bank, with the exception flags adder_inexact, adder_invalid, adder_overflow, and adder_underflow.]


3.2.2. Multiple Floating-Point Arithmetic Functions

Two or more floating-point arithmetic DSP blocks can perform the following:

• Multiply-add or multiply-subtract mode, which uses a single floating-point arithmetic DSP block if the chainin parameter is turned off

• Vector one mode

• Vector two mode

• Direct vector dot product

• Complex multiplication

Related Information

Native Floating Point DSP Intel Stratix 10 FPGA IP Core Supported Operational Modes on page 91

3.2.2.1. Multiply-Add or Multiply-Subtract Mode

This mode performs floating-point multiplication followed by floating-point addition or floating-point subtraction. The chainin parameter allows you to enable a multiple-chain mode.

Table 14. Equations Applied to Multiply-Add or Multiply-Subtract Mode

Chainin Parameter | Multiply-Add Mode | Multiply-Subtract Mode
--- | --- | ---
Disable | result = (ay*az) + ax | result = (ay*az) - ax
Enable | result = (ay*az) + chainin | result = (ay*az) - chainin
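The four equations of Table 14 can be collapsed into one small function. This is an illustrative sketch: the function and keyword names are hypothetical, and Python floats stand in for single-precision values.

```python
def multiply_add(ay, az, ax=0.0, chainin=0.0, use_chainin=False, subtract=False):
    """Model Table 14: (ay*az) +/- ax, or (ay*az) +/- chainin when chainin is enabled."""
    addend = chainin if use_chainin else ax   # the chainin parameter selects the addend
    product = ay * az
    return product - addend if subtract else product + addend
```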

The floating-point multiply-add or multiply-subtract mode supports the following exception flags:




• mult_invalid

• mult_inexact

• mult_overflow

• mult_underflow

• adder_invalid

• adder_inexact

• adder_overflow

• adder_underflow

Figure 20. Multiply-Add or Multiply-Subtract Mode for Intel Stratix 10 Devices

[Figure 20 diagram: ports chainin[31:0], chainout[31:0], accumulate, ax[31:0], ay[31:0], az[31:0], and resulta[31:0]. The diagram shows the input register bank, the multiplier, the adder, the pipeline register banks, and the output register bank.]


3.2.2.2. Vector One Mode

This mode performs floating-point multiplication followed by floating-point addition or subtraction with the chainin input from the previous variable precision DSP block. Input ax is directly fed into chainout.

Table 15. Equations Applied to Vector One Mode

Chainin Parameter | Vector One with Floating-Point Addition | Vector One with Floating-Point Subtraction
--- | --- | ---
Disable | result = ay * az; chainout = ax | result = ay * az; chainout = ax
Enable | result = (ay * az) + chainin; chainout = ax | result = (ay * az) - chainin; chainout = ax

The floating-point vector one mode supports the following exception flags:

• mult_invalid

• mult_inexact

• mult_overflow

• mult_underflow




• adder_invalid

• adder_inexact

• adder_overflow

• adder_underflow

Figure 21. Vector One Mode for Intel Stratix 10 Devices

[Figure 21 diagram: ports chainin[31:0], chainout[31:0], accumulate, ax[31:0], ay[31:0], az[31:0], and resulta[31:0]. The diagram shows the input register bank, the multiplier, the adder, the pipeline register bank, and the output register bank.]

3.2.2.3. Vector Two Mode

This mode performs floating-point multiplication where the multiplication result is directly fed to chainout. The chainin input from the previous variable precision DSP block is then added to or subtracted from input ax as the output result.

Table 16. Equations Applied to Vector Two Mode

Chainin Parameter | Vector Two with Floating-Point Addition | Vector Two with Floating-Point Subtraction
--- | --- | ---
Disable | result = ax; chainout = ay * az | result = ax; chainout = ay * az
Enable | result = ax + chainin; chainout = ay * az | result = ax - chainin; chainout = ay * az

The floating-point vector two mode supports the following exception flags:

• mult_invalid

• mult_inexact

• mult_overflow

• mult_underflow

• adder_invalid

• adder_inexact

• adder_overflow

• adder_underflow




Figure 22. Vector Two Mode for Intel Stratix 10 Devices

[Figure 22 diagram: ports chainin[31:0], chainout[31:0], accumulate, ax[31:0], ay[31:0], az[31:0], and resulta[31:0]. The diagram shows the input register bank, the multiplier, the adder, and the output register bank.]


3.2.2.4. Direct Vector Dot Product

In the following figure, the direct vector dot product is implemented by several DSP blocks by setting the following DSP modes:

• Multiply-add or multiply-subtract mode with the chainin parameter turned on

• Vector one

• Vector two
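How these modes combine into a dot product can be sketched with three blocks. The wiring below is one possible arrangement chosen for illustration and is not necessarily the one drawn in Figure 23. The function names are hypothetical, only the addition variants are modelled, and Python floats stand in for single-precision values.

```python
def vector_two(ax, ay, az, chainin=None):
    """Vector two mode: result = ax (+ chainin if enabled), chainout = ay * az."""
    result = ax if chainin is None else ax + chainin
    return result, ay * az

def multiply_add(ay, az, ax=0.0, chainin=None):
    """Multiply-add mode: (ay*az) + ax, or (ay*az) + chainin when chainin is enabled."""
    return ay * az + (ax if chainin is None else chainin)

def dot3(a, b, c, d, e, f):
    """Compute a*b + c*d + e*f with three modelled DSP blocks."""
    _, ab = vector_two(ax=0.0, ay=a, az=b)    # block 0: the product leaves on chainout
    ab_cd = multiply_add(c, d, chainin=ab)    # block 1: c*d + chainin
    return multiply_add(e, f, ax=ab_cd)       # block 2: e*f + ax
```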


Figure 23. Direct Vector Dot Product

[Figure 23 diagram: a chain of floating-point DSP blocks configured in multiplication, vector one, and vector two modes. The operand pairs (A, B), (C, D), (E, F), (G, H), and (I, J) drive the ay[31:0] and az[31:0] inputs. The intermediate results AB + CD, EF + GH, and IJ + KL are routed through the ax[31:0], chainin[31:0], and chainout[31:0] paths to form AB + CD + EF + GH and the final result AB + CD + EF + GH + IJ + KL. Each block shows the input register bank, the multiplier, the adder, the pipeline register banks, and the output register bank.]

3.2.2.5. Complex Multiplication

The Intel Stratix 10 devices support the floating-point arithmetic single precision complex multiplier using four Intel Stratix 10 variable-precision DSP blocks.

Figure 24. Sample of Complex Multiplication Equation

(a + jb) × (c + jd) = [(a × c) - (b × d)] + j[(a × d) + (b × c)]

The imaginary part [(a × d) + (b × c)] is implemented in the first two variable-precision DSP blocks, while the real part [(a × c) - (b × d)] is implemented in the next two variable-precision DSP blocks.

Figure 25. Complex Multiplication with Imaginary Result

[Figure 25 diagram: two floating-point DSP blocks. The block in multiplication mode receives a on ay[31:0] and d on az[31:0]. The block in multiply-add mode receives b on ay[31:0] and c on az[31:0], and its resulta[31:0] output is the imaginary result. The blocks are connected through chainout[31:0] and chainin[31:0]. Each block shows the input register bank, the multiplier, the adder, and the output register bank.]


Figure 26. Complex Multiplication with Result Real

[Figure 26 diagram: two floating-point DSP blocks. The block in multiplication mode receives b on ay[31:0] and d on az[31:0]. The block in multiply-subtract mode receives a on ay[31:0] and c on az[31:0], and its resulta[31:0] output is the real result. The blocks are connected through chainout[31:0] and chainin[31:0]. Each block shows the input register bank, the multiplier, the adder or subtract unit, and the output register bank.]


4. Design Considerations

You should consider the following elements in your design:

Table 17. Design Considerations

DSP Functions and Design Elements

Fixed-point arithmetic
• Operational modes
• Internal coefficient and pre-adder
• Accumulator
• Chainout adder
• Input cascade

Floating-point arithmetic
• Operational modes
• Chainout adder

Related Information

Supported Operational Modes in Intel Stratix 10 Devices on page 5
For a summary of features supported per operational mode.

4.1. Internal Coefficient and Pre-Adder for Fixed-Point Arithmetic

In both 18-bit and 27-bit modes, you can use the coefficient feature and pre-adder feature independently.

When the pre-adder feature is enabled in 18-bit modes, you must enable both the top and bottom pre-adders.

When the internal coefficient feature is enabled in 18-bit modes, you must enable both the top and bottom coefficients.


4.2. Accumulator for Fixed-Point Arithmetic

The accumulator in the Intel Stratix 10 devices supports double accumulation by enabling the 64-bit double accumulation registers located between the output register bank and the accumulator.







4.3. Chainout Adder

Table 18. Chainout Adder


Fixed-point arithmetic: You can use the output chaining path to add results from another DSP block. Supported in all operational modes except the 18 × 18 or 18 × 19 independent multiplier and 27 × 27 independent multiplier modes.

Floating-point arithmetic: You can use the output chaining path to add results from another DSP block. Supported in certain operational modes:
• Multiply-add or multiply-subtract mode
• Vector one mode
• Vector two mode


4.4. Input Cascade for Fixed-Point Arithmetic

The input register bank in the Intel Stratix 10 variable precision DSP block supports the input cascade feature. This feature provides the capability of cascading the input bus within a DSP block and to another DSP block.

When you enable the input cascade feature in 18 × 19 mode:

• The top multiplier Y input drives the bottom multiplier Y input within a DSP block

• The bottom multiplier Y input of the first DSP block drives the top multiplier Y input of the subsequent DSP block

For 27 × 27 mode, the multiplier Y input of the first DSP block drives the multiplier Y input of the subsequent DSP block. This feature is not supported with the pre-adder enabled.
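The 18 × 19 cascade path can be pictured as a shift register that runs through every Y input in order. The sketch below assumes that each Y input is registered, so that each hop adds one clock cycle. It does not model the delay registers, and the function name is hypothetical.

```python
def input_cascade(samples, num_blocks):
    """Return, per clock cycle, the Y inputs seen along the cascade.

    The list order is: block 0 top (ay), block 0 bottom (by), block 1 top (ay), ...
    """
    taps = [0] * (2 * num_blocks)
    history = []
    for sample in samples:
        # Each registered Y input takes the value its upstream neighbour held.
        taps = [sample] + taps[:-1]
        history.append(list(taps))
    return history
```

With two blocks and the samples 1, 2, 3, the third clock cycle yields `[3, 2, 1, 0]`: the newest sample is at the top multiplier of the first block and older samples have moved down the chain.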

There are two delay registers that you can use to balance the latency requirements when you use both the input cascade and chainout features in fixed-point arithmetic 18 × 19 mode. These are the top delay registers and bottom delay registers. The ay input register must be enabled when the top delay register is enabled. The clock source for both registers must be the same. Similarly, the by input register must be enabled when the bottom delay register is enabled. The clock source for both registers must be the same.

The delay registers are only supported in 18 × 18 or 18 × 19 independent multiplier, multiplier adder sum mode, and 18-bit systolic FIR mode.


Figure 27. Input Cascade in Fixed-Point Arithmetic 18 x 19 Mode

[Figure 27 diagram: inputs ay[18..0], az[17..0], ax[17..0], by[18..0], bz[17..0], and bx[17..0], with the top delay registers, the bottom delay registers, the cascade ports scanin[18..0] and scanout[18..0], and the control signals CLK[2..0], ENA[2..0], and CLR[0].]


Figure 28. Input Cascade in Fixed-Point Arithmetic 27 x 27 Mode

[Figure 28 diagram: inputs ay[26..0], az[25..0], and ax[26..0], with the cascade ports scanin[26..0] and scanout[26..0] and the control signals CLK[2..0], ENA[2..0], and CLR[0].]


5. Intel Stratix 10 Variable Precision DSP Blocks Implementation Guide

The Intel Quartus Prime software contains tools for you to create and compile your design, and configure your device.

You can prepare for device migration, set pin assignments, define placement restrictions, set up timing constraints, and customize IP cores using the Intel Quartus Prime software.

The supported IP cores for Intel Stratix 10 variable precision DSP blocks include:

• Native Fixed Point DSP Intel Stratix 10 FPGA IP (5)

• Multiply Adder (5)

• ALTMULT_COMPLEX (5)

• LPM_MULT (5)

• Native Floating Point DSP Intel Stratix 10 FPGA IP (5)

Related Information

• Introduction to Intel FPGA IP Cores: Provides general information about all Intel FPGA IP cores, including parameterizing, generating, upgrading, and simulating IP cores.

• Creating Version-Independent IP and Qsys Simulation Scripts: Create simulation scripts that do not require manual updates for software or IP version upgrades.

• Project Management Best Practices: Guidelines for efficient management and portability of your project and IP files.

(5) Intel Stratix 10 variable precision DSP IP cores are only available in Intel Quartus Prime Pro Edition.



https://www.intel.com/content/www/us/en/programmable/documentation/mwh1409960636914.html#mwh1409958250601

https://www.intel.com/content/www/us/en/programmable/documentation/mwh1409960636914.html#mwh1409958301774

https://www.intel.com/content/www/us/en/programmable/documentation/mwh1409960181641.html#esc1444754592005





6. Native Fixed Point DSP Intel Stratix 10 FPGA IP Core References

The Native Fixed Point DSP Intel Stratix 10 FPGA IP core instantiates and controls a single Intel Stratix 10 Variable Precision DSP block.

Operational modes supported in this IP core include:

• 18 × 18 full mode

• 18 × 18 full top mode

• 18 × 18 sum-of-2 mode

• 18 × 18 plus 36 mode

• 18 × 18 systolic mode

• 27 × 27 mode








Figure 29. Native Fixed Point DSP Intel Stratix 10 FPGA IP Core Functional Block Diagram

[Figure 29 diagram: data inputs ax, ay, az, bx, by, bz, scanin, and chainin; control inputs clk, ena, clr, sub, negate, accumulate, loadconst, coefsela, and coefselb; outputs resulta, resultb, scanout, and chainout. The diagram shows the input registers, the top and bottom pre-adders (+/-), the internal coefficients, the top and bottom multipliers, the first and second pipeline registers, the top and bottom delay registers, the input systolic register, the systolic register, the adder (+/-), the chain adder, the double accumulator register, and the output register.]


Related Information

• Block Architecture Overview on page 9

• Independent Multiplier Mode on page 22

• Multiplier Adder Sum Mode on page 24

• Independent Complex Multiplier on page 24

• 18 × 19 Multiplication Summed with 36-Bit Input Mode on page 25

• Systolic FIR Mode on page 26

6.1. Native Fixed Point DSP Intel Stratix 10 FPGA IP Release Information

IP versions are the same as the Intel Quartus Prime Design Suite software versions up to v19.1. From Intel Quartus Prime Design Suite software version 19.2 or later, IP cores have a new IP versioning scheme.

The IP versioning scheme (X.Y.Z) number changes from one software version to another. A change in:

• X indicates a major revision of the IP. If you update your Intel Quartus Prime software, you must regenerate the IP.

• Y indicates the IP includes new features. Regenerate your IP to include these new features.

• Z indicates the IP includes minor changes. Regenerate your IP to include these changes.
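The X.Y.Z rules above can be expressed as a small comparison. This helper is purely illustrative: the function name and the returned strings are hypothetical and are not part of the Intel Quartus Prime software.

```python
def regeneration_advice(old_version, new_version):
    """Compare two X.Y.Z IP version strings and summarize the rule that applies."""
    old_x, old_y, old_z = (int(part) for part in old_version.split("."))
    new_x, new_y, new_z = (int(part) for part in new_version.split("."))
    if new_x != old_x:
        return "major revision: you must regenerate the IP"
    if new_y != old_y:
        return "new features: regenerate the IP to include them"
    if new_z != old_z:
        return "minor changes: regenerate the IP to include them"
    return "no change"
```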

Table 19. Native Fixed Point DSP Intel Stratix 10 FPGA IP Release Information

Item | Description
--- | ---
IP Version | 19.1.0
Intel Quartus Prime Version | 19.3
Release Date | 2019.09.30

6.2. Supported Operational Modes

Table 20. Operational Modes Supported by Native Fixed Point DSP Intel Stratix 10 FPGA IP Core

| Operational Modes | Description |
|---|---|
| 18 × 18 Full Mode | This mode operates as two independent 18 (signed) × 19 (signed) or 18 (unsigned) × 18 (unsigned) multipliers with 37-bit output. This mode applies the following equations: resulta = ax * ay; resultb = bx * by. |
| 18 × 18 Full Top Mode | This mode operates as a single 18 (signed) × 19 (signed) or 18 (unsigned) × 18 (unsigned) multiplier with 37-bit output. This mode applies the following equation: resulta = ax * ay. |
| 18 × 18 Sum of Two Mode | This mode operates as the sum of two 18 × 19 multiplications. This mode applies the following equations: resulta = [(bx * by) + (ax * ay)] when the sub signal is driven low; resulta = [(bx * by) - (ax * ay)] when the sub signal is driven high. The resulta output bus can support up to 64 bits when you enable the accumulator or chainout adder. |
| 18 × 18 Plus 36 Mode | This mode operates as one 18 × 19 multiplication summed with a 36-bit input. This mode applies the equation resulta = (ax * ay) + (bx, by). When the input bus is less than 36 bits wide in this mode, you must provide the necessary sign extension to fill the 36-bit input. When you enable the accumulator, the resulta output bus can support up to 64 bits. |
| 18 × 18 Systolic Mode | This mode operates as an 18-bit systolic FIR. Enable the input systolic register and the output register when using this operational mode. When you enable the chainout adder, the chainout and chainin width can support up to 44 bits. When you enable the accumulator, the resulta output bus can support up to 64 bits. |
| 27 × 27 Mode | This mode operates as one independent 27 (signed/unsigned) × 27 (signed/unsigned) multiplier. This mode applies the equation resulta = ax * ay. The resulta output bus can support up to 64 bits when you enable the accumulator or chainout adder. |
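The equations in Table 20 can be checked with a small behavioral model. The following Python sketch is illustrative only: the function name `dsp_fixed_point` is hypothetical, operands are plain Python integers assumed to be already within the supported widths, no output truncation is modeled, and the systolic mode is left out. For the Plus 36 mode it assumes that bx carries the signed upper 18 bits and by the unsigned lower 18 bits of the 36-bit addend, as described in the sign extension example in this chapter.

```python
def dsp_fixed_point(mode, ax=0, ay=0, bx=0, by=0, sub=False):
    """Return (resulta, resultb) for one evaluation of the Table 20 equations.

    Behavioral sketch only: no registers, no width checks, no overflow.
    resultb is None in modes where only resulta is produced.
    """
    if mode == "m18x18_full":
        return ax * ay, bx * by
    if mode == "m18x18_full_top":
        return ax * ay, None
    if mode == "m18x18_sumof2":
        top, bottom = ax * ay, bx * by
        # sub low: bottom + top, sub high: bottom - top
        return (bottom - top if sub else bottom + top), None
    if mode == "m18x18_plus36":
        # Assumption: (bx, by) is a 36-bit value, bx = signed upper half,
        # by = unsigned lower half in the range 0..2**18 - 1.
        addend = (bx << 18) + by
        return ax * ay + addend, None
    if mode == "m27x27":
        return ax * ay, None
    raise ValueError(f"mode not modeled in this sketch: {mode}")


print(dsp_fixed_point("m18x18_sumof2", ax=2, ay=3, bx=4, by=5, sub=True))
```

The model makes the operand order in Sum of Two mode explicit: the top product is the one that is subtracted when sub is driven high.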

Related Information

• Independent Multiplier Mode on page 22

• Multiplier Adder Sum Mode on page 24

• Independent Complex Multiplier on page 24

• 18 × 19 Multiplication Summed with 36-Bit Input Mode on page 25

• Systolic FIR Mode on page 26

6.3. Maximum Input Data Width for Fixed-Point Arithmetic

Table 21. Maximum Input Data Width for Fixed-Point Arithmetic Operational Modes

Operation Mode

Maximum Input Data Width

ax ay az bx by bz COEFSELA COEFSELB

Without Pre-adder or Internal Coefficient

m18×18_full 18(signed)18(unsigned)

19(signed)18(unsigned)

Not used 18(signed)18(unsigned)


Not used Not used Not used

m18x18_full_top



Not used Not used Not used Not used Not used Not used

m18×18_sumof2

18(signed)18(unsigned)(6)


Not used 18(signed)18(unsigned)(6)



m18×18_systolic



Not used 18(signed)18(unsigned)(6)




(6) Maximum width is 17 when negate is used.





m18×18_plus36




18(unsigned)(7)


m27×27 27(signed)27(unsigned)(8)


Not used Not used Not used Not used Not used Not used

With Pre-adder Feature Only

m18×18_full 18(signed)18(unsigned)






Not used Not used

m18x18_full_top




Not used Not used Not used Not used Not used

m18×18_sumof2







Not used Not used

m18×18_systolic







Not used Not used

m27×27 27(signed)27(unsigned)(8)



Not used Not used Not used Not used Not used

With Internal Coefficient Feature Only

m18×18_full Not used 19(signed)18(unsigned)

Not used Not used 19(signed)18(unsigned)

Not used 3 3

m18x18_full_top


Not used Not used Not used Not used 3 Not used

m18×18_sumof2



Not used 3 3


(7) When the input bus is less than 36 bits wide, you must fill the 36-bit input with sign extension.

(8) Maximum width is 26 when negate is used.





m18×18_systolic



Not used 3 3

m27×27 Not used 27(signed)27(unsigned)

Not used Not used Not used Not used 3 Not used

With Pre-adder and Internal Coefficient Features

m18×18_full Not used 18(signed)17(unsigned)




3 3

m18x18_full_top



Not used Not used Not used 3 Not used

m18×18_sumof2





3 3

m18×18_systolic





3 3

m27×27 Not used 26(signed)26(unsigned)


Not used Not used Not used 3 Not used



6.3.1. Using Less Than 36-Bit Operand In 18 x 18 Plus 36 Mode Example

This example shows how to configure the Native Fixed Point DSP Intel Stratix 10 FPGA IP core to use 18 × 18 Plus 36 operational mode with a signed 12-bit input data of 101010101010 (binary) instead of a 36-bit operand.

1. Set Representation format for bottom multiplier x operand to signed.

2. Set Representation format for bottom multiplier y operand to unsigned.

3. Set 'bx' input bus width to 18.

4. Set 'by' input bus width to 18.

5. Provide 18-bit signed representation data, for example '111111111111111111', to the bx input bus.

This step performs sign extension. The initial 12-bit input is extended to 36 bits, with bx representing the most significant 18 bits.

6. Provide 18-bit signed representation data, for example '111111101010101010', to the by input bus.
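The bit patterns used in steps 5 and 6 can be derived mechanically. The following Python sketch sign-extends a two's complement bit string to 36 bits and splits it into the upper 18 bits (for bx) and the lower 18 bits (for by). The helper name `split_plus36_operand` is hypothetical and is not part of the IP or of any Intel tool.

```python
def split_plus36_operand(bits):
    """Sign-extend a two's complement bit string to 36 bits.

    Returns (bx_bits, by_bits): the most significant and least
    significant 18 bits of the extended value, as bit strings.
    """
    if not 1 <= len(bits) <= 36 or set(bits) - {"0", "1"}:
        raise ValueError("expected a binary string of 1 to 36 bits")
    # Replicate the sign bit (the leftmost bit) to fill 36 bits.
    extended = bits[0] * (36 - len(bits)) + bits
    return extended[:18], extended[18:]


bx_bits, by_bits = split_plus36_operand("101010101010")
print(bx_bits)  # 111111111111111111
print(by_bits)  # 111111101010101010
```

Because the 12-bit input is negative (its leftmost bit is 1), every added bit is 1, which is why bx is all ones in this example.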

6.4. Parameterizing Native Fixed Point DSP IP Core

1. In Intel Quartus Prime Pro Edition, create a new project that targets an Intel Stratix 10 device.

2. In IP Catalog, click Library ➤ DSP ➤ Primitive DSP ➤ Native Fixed Point DSP. The Native Fixed Point DSP IP Core IP parameter editor opens.

3. In the New IP Variation dialog box, enter an Entity Name and click OK.

4. Under Parameters, select the operation mode, multiplier configuration, clear signal, port width, and internal coefficient configurations according to the variant of your IP core.

5. In the DSP Block View, switch the clock of each valid register.

6. Click the input and output ports in the GUI to select your desired inputs and outputs.

7. Click the Preadder symbols in the GUI to select addition or subtraction.

8. Click the Top delay register and Bottom delay register symbols in the GUI to enable the delay registers.

9. Click the multiplexer symbols in the GUI to enable the preadder modules and the internal coefficient modules.

10. Click the clken port symbols to create clock enable signal for each valid register.

11. Click the clr port symbols to create clear signal for each valid register.

12. Click Generate HDL.

13. Click Finish.



6.4.1. Native Fixed Point DSP Intel Stratix 10 FPGA IP Parameters

Table 22. General Parameters

| Parameter | IP Generated Parameter | Value | Default Value | Description |
|---|---|---|---|---|
| Operation Mode | | | | |
| Select the Operation Mode | operation_mode | m18×18_full, m18×18_full_top, m18×18_sumof2, m18×18_plus36, m18×18_systolic, m27×27 | m18×18_full | Select the desired operational mode. |
| Multiplier Configuration | | | | |
| Representation format for AX input bus | signed_max | signed, unsigned | unsigned | Specify the representation format for the top multiplier x operand. |
| Representation format for AY/AZ input buses | signed_may | signed, unsigned | unsigned | Specify the representation format for the top multiplier y operand. |
| Representation format for BX input bus | signed_mbx | signed, unsigned | unsigned | Specify the representation format for the bottom multiplier x operand. |
| Representation format for BY/BZ input buses | signed_mby | signed, unsigned | unsigned | Specify the representation format for the bottom multiplier y operand. Always select unsigned for m18×18_plus36. |
| Clear Signal Setting | | | | |
| Type of clear signal | clear_type | none, aclr, sclr | none | Select aclr to use asynchronous clear signal type for all registers. Select sclr to use synchronous clear signal type for all registers. |
| Port Width Setting | | | | |
| How wide should AX input bus be? | ax_width | 1–27 | 18 | Specify the width of ax input bus. Refer to Maximum Input Data Width for Fixed-Point Arithmetic on page 47. |
| How wide should BX input bus be? | bx_width | 1–18 | 18 | Specify the width of bx input bus. Set this parameter to 0 when using m18x18_full_top mode. Refer to Maximum Input Data Width for Fixed-Point Arithmetic on page 47. |
| How wide should AY input bus be? | ay_scan_in_width | 1–27 | 18 | Specify the width of ay or scanin input bus. Refer to Maximum Input Data Width for Fixed-Point Arithmetic on page 47. |
| How wide should BY input bus be? | by_width | 1–19 | 18 | Specify the width of by input bus. Set this parameter to 0 when using m18x18_full_top mode. Refer to Maximum Input Data Width for Fixed-Point Arithmetic on page 47. |
| How wide should AZ input bus be? | az_width | 0–18 | 0 | Specify the width of az input bus. Refer to Maximum Input Data Width for Fixed-Point Arithmetic on page 47. |
| How wide should BZ input bus be? | bz_width | 0–18 | 0 | Specify the width of bz input bus. Set this parameter to 0 when using m18x18_full_top mode. Refer to Maximum Input Data Width for Fixed-Point Arithmetic on page 47. |
| How wide should resultA width? | result_a_width | 1–64 | 37 | Specify the width of resulta output bus. |
| How wide should resultB width? | result_b_width | 1–37 | 37 | Specify the width of resultb output bus. This parameter is supported only in m18x18_full mode. |
| How wide should result scanout port (1) | scan_out_width | 1–27 | 0 | Specify the width of scanout output bus. |

Figure 30. DSP Block View

Each block is described in the DSP Block View Parameters table.

Table 23. DSP Block View Parameters

| Parameter | Value | Default Value | Description |
|---|---|---|---|
| loadconst | Disable, Enable | Disable | Click the port symbol to enable loadconst port and its input register. |
| accumulate port (2) | Disable, Enable | Disable | Click the port symbol to enable accumulate port and its input register. |
| negate port (3) | Disable, Enable | Disable | Click the port symbol to enable negate port and its input register. |
| sub port (4) | Disable, Enable | Disable | Click the port symbol to enable sub port and its input register. |
| Top delay register (5) | Disable, Enable | Disable | Click to enable the top delay register for ay input bus. This feature is not supported in m18×18_plus36 and m27x27 operational modes. |
| Bottom delay register (6) | Disable, Enable | Disable | Click to enable bottom delay register for by input bus. This feature is not supported in m18×18_plus36, m18x18_full_top, and m27x27 operational modes. |
| Scanout output bus (7) | Disable, Enable | Disable | Click to enable scanout output bus. |
| Input cascade for ay input (8) | Disable, Enable | Disable | Click to enable input cascade module for ay input. When you enable input cascade module, the Stratix 10 Native Fixed Point DSP IP core uses the scanin input signals as input instead of ay input signal. |
| Input cascade for by input (9) | Disable, Enable | Disable | Click to enable input cascade module for by input. When you enable input cascade module, the Stratix 10 Native Fixed Point DSP IP core uses the ay input signals as input instead of by input signal. |
| Register clock (10) | None, Clock 0, Clock 1, Clock 2 | Clock 0 | To bypass any register, switch the register clock to None. Switch the register clock to Clock 0 to use clk[0] signal as the clock source, Clock 1 to use clk[1] signal as the clock source, or Clock 2 to use clk[2] signal as the clock source. |
| Top pre-adder (11) | Disable, Enable | Disable | Click to enable top pre-adder module. This uses az input bus as one of the operand sources. To use pre-adder feature, both top and bottom pre-adder modules must be enabled. |
| Top Pre-adder operation (12) | +, - | + | Click to switch the operation of top pre-adder between addition and subtraction. |
| Top coefficient module (13) | Disable, Enable | Disable | Click to enable top internal coefficient module. To use internal coefficient feature, both top and bottom internal coefficient modules must be enabled. |
| Bottom pre-adder (14) | Disable, Enable | Disable | Click to enable bottom pre-adder module. This uses bz input bus as one of the operand sources. To use pre-adder feature, both top and bottom pre-adder modules must be enabled. |
| Bottom coefficient module (15) | Disable, Enable | Disable | Click to enable bottom internal coefficient module. To use internal coefficient feature, both top and bottom internal coefficient modules must be enabled. |
| Bottom Pre-adder operation (16) | +, - | + | Click to switch the operation of bottom pre-adder between addition and subtraction. |
| Chainin input bus (17) | Disable, Enable | Disable | Click to enable Chainin input bus. |
| Clock enable for clock 0 (18) | Disable, Enable | Disable | Click to create clock enable signal for clock 0. |
| Clear signal for input registers (21) | Disable, Enable | Disable | Click to create Clr[0] signal for all input registers. Use the Type of clear signal parameter to select asynchronous clear or synchronous clear for the input registers. |
| Clear signal for output and pipeline registers (22) | Disable, Enable | Disable | Click to create Clr[1] signal for all output and pipeline registers. Use the Type of clear signal parameter to select asynchronous clear or synchronous clear for the output and pipeline registers. |
| Double accumulator module (23) | Disable, Enable | Disable | Click to enable double accumulator feature. |
| Chainout output bus (24) | Disable, Enable | Disable | Click to enable Chainout output bus. |



Table 24. Coefficient Configuration

| Parameter | IP Generated Parameter | Value | Default Value | Description |
|---|---|---|---|---|
| Load Const Setting | | | | |
| What is the value for loadconst? | load_const_value | 0–63 | 0 | Specify the preset constant value. This value can be 2^N, where N is the preset constant value. |
| Coefficient A Storage Configuration | | | | |
| Coef_a_0 to Coef_a_7 | coef_a_0 to coef_a_7 | Integer | 0 | Specify the coefficient values for ax input bus. For 18-bit operation mode, the maximum input value is 2^18 - 1. For 27-bit operation, the maximum value is 2^27 - 1. |
| Coefficient B Storage Configuration | | | | |
| Coef_b_0 to Coef_b_7 | coef_b_0 to coef_b_7 | Integer | 0 | Specify the coefficient values for bx input bus. Set coefficient values to more than 67108864 when operand is set to unsigned and negate is enabled. |



6.5. Signals

The following figure shows the input and output signals of the Native Fixed Point DSP Intel Stratix 10 FPGA IP core.

Figure 31. Native Fixed Point DSP Intel Stratix 10 FPGA IP Core Signals

[Figure 31 groups the IP core signals as follows. Dynamic control signals: sub, negate, accumulate, loadconst. Data input signals: ax[17:0] or [26:0], ay[18:0] or [26:0], az[17:0] or [25:0], bx[17:0], by[18:0], bz[17:0]. Input cascade signals: scanin[26:0], scanout[26:0]. Internal coefficient signals: coefsela[2:0], coefselb[2:0]. Clock, enable, and clear signals: clk[2:0], ena[2:0], clr[1:0]. Data output signals: resulta[63:0], resultb[36:0]. Output cascade signals: chainin[63:0], chainout[63:0].]

Table 25. Data Input Signals

| Signal Name | Type | Width | Description |
|---|---|---|---|
| ax[26:0] | Input | 27 | Input data bus to top multiplier. This signal is not available when internal coefficient feature is enabled. |
| ay[26:0] | Input | 27 | Input data bus to top multiplier. When pre-adder is enabled, these signals serve as input to the top pre-adder. |
| az[25:0] | Input | 26 | These signals are input to the top pre-adder. These signals are only available when pre-adder is enabled and are not available in m18x18_plus36 operational mode. |
| bx[17:0] | Input | 18 | Input data bus to bottom multiplier. These signals are not available in m27×27 operational mode and when internal coefficient feature is enabled. |
| by[18:0] | Input | 19 | Input data bus to bottom multiplier. When pre-adder is enabled, these signals serve as input signals to the bottom pre-adder. These signals are not available in m27×27 operational mode. |
| bz[17:0] | Input | 18 | These signals are input signals to the bottom pre-adder. These signals are only available when pre-adder is enabled. These signals are not available in m18x18_plus36 and m27×27 operational modes. |

Table 26. Data Output Signals


| Signal Name | Type | Width | Description |
|---|---|---|---|
| resulta[63:0] | Output | 64 | Output data bus from top multiplier. Only in m18×18_full mode, these signals support up to 37 bits. |
| resultb[36:0] | Output | 37 | Output data bus from bottom multiplier. These signals are only available in m18×18_full operational mode. |

Table 27. Clock, Enable and Clear Signals


| Signal Name | Type | Width | Description |
|---|---|---|---|
| clk[2:0] | Input | 3 | Input clock for all registers. These clocks are only available if any of the input registers, pipeline registers, or output register is set to Clock0, Clock1, or Clock2. clk[0] = Clock0; clk[1] = Clock1; clk[2] = Clock2. |
| ena[2:0] | Input | 3 | Clock enable for clk[2:0]. These signals are active-High. ena[0] is for Clock0; ena[1] is for Clock1; ena[2] is for Clock2. |
| clr[1:0] | Input | 2 | These signals can be asynchronous or synchronous clear input signals for all registers. You may select the type of clear input signal using the Type of CLEAR signal parameter. These signals are active-High. Use clr[0] for all input registers and use clr[1] for all pipeline and output registers. By default, this signal is de-asserted. |

Table 28. Dynamic Control Signals

For a summary of the supported dynamic control features for each operational mode, refer to Table 2 on page 6.

| Signal Name | Type | Width | Description |
|---|---|---|---|
| sub | Input | 1 | Dynamic input signal to control the operation of the adder module. De-assert this signal to add the output of the top multiplier with the output of the bottom multiplier. Assert this signal to subtract the output of the top multiplier from the output of the bottom multiplier. By default, this signal is de-asserted. You can assert or de-assert this signal during run-time. This signal is not available in m18x18_full, m18x18_full_top, and m27x27 operational modes. |
| negate | Input | 1 | Dynamic input signal to control the operation of the chainout adder module. De-assert this signal to add the sum of the top and bottom multipliers with the chainin data input bus and accumulate loopback data. Assert this signal to subtract the sum of the top and bottom multipliers from the chainin data input bus and accumulate loopback data. By default, this signal is de-asserted. You can assert or de-assert this signal during run-time. This signal is not available in m18x18_full and m18x18_full_top operational modes. |
| accumulate | Input | 1 | Input signal to enable or disable the accumulator feature. De-assert this signal to generate the current result without accumulating the previous result. Assert this signal to add the current result to the previous result. By default, this signal is de-asserted. You can assert or de-assert this signal during run-time. This signal is not available in m18x18_full and m18x18_full_top operational modes. |
| loadconst | Input | 1 | Input signal to enable or disable the load constant feature. De-assert this signal to disable the load constant feature. Assert this signal to add a preload constant to the result to perform a biased rounding. By default, this signal is de-asserted. You can assert or de-assert this signal during run-time. This signal is not available in m18x18_full and m18x18_full_top operational modes. |

Table 29. Internal Coefficient Ports

For a summary of the supported features for each operational mode, refer to Table 1 on page 5.

| Signal Name | Type | Width | Description |
|---|---|---|---|
| coefsela[2:0] | Input | 3 | Input selection signals for 8 coefficient values defined by user for the top multiplier. The coefficient values are stored in the internal memory and specified by parameters coef_a_0 to coef_a_7. coefsela[2:0] = 000 refers to coef_a_0; coefsela[2:0] = 001 refers to coef_a_1; coefsela[2:0] = 010 refers to coef_a_2, and so forth. These signals are only available when the internal coefficient feature is enabled. These signals are not available in m18x18_plus36 operational mode. |
| coefselb[2:0] | Input | 3 | Input selection signals for 8 coefficient values defined by user for the bottom multiplier. The coefficient values are stored in the internal memory and specified by parameters coef_b_0 to coef_b_7. coefselb[2:0] = 000 refers to coef_b_0; coefselb[2:0] = 001 refers to coef_b_1; coefselb[2:0] = 010 refers to coef_b_2, and so forth. These signals are only available when the internal coefficient feature is enabled. These signals are not available in m18x18_full, m18x18_plus36 and m27x27 operational modes. |
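The coefficient selection described above amounts to a lookup indexed by the 3-bit select value. A minimal Python sketch, assuming the eight stored values are held in a list ordered from coef_a_0 to coef_a_7 (the helper name `select_coefficient` and the sample values are hypothetical):

```python
def select_coefficient(coefficients, coefsel_bits):
    """Return the stored coefficient addressed by a 3-bit select string."""
    if len(coefficients) != 8:
        raise ValueError("the coefficient storage holds exactly 8 values")
    if len(coefsel_bits) != 3 or set(coefsel_bits) - {"0", "1"}:
        raise ValueError("coefsel must be a 3-bit binary string")
    return coefficients[int(coefsel_bits, 2)]


# Example values standing in for coef_a_0 .. coef_a_7.
coef_a = [5, -3, 12, 0, 7, 1, -8, 64]
print(select_coefficient(coef_a, "010"))  # 12, the value of coef_a_2
```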

Table 30. Input Cascade Signals


| Signal Name | Type | Width | Description |
|---|---|---|---|
| scanin[26:0] | Input | 27 | Input data bus for input cascade module. Connect these signals to the scanout signals from the preceding DSP core. |
| scanout[26:0] | Output | 27 | Output data bus of the input cascade module. Connect these signals to the scanin signals of the next DSP core. |

Table 31. Output Cascade Signals


| Signal Name | Type | Width | Description |
|---|---|---|---|
| chainin[63:0] | Input | 64 | Input data bus for output cascade module. Connect these signals to the chainout signals from the preceding DSP core. In 18 × 18 systolic mode, only 44 bits of output cascade are supported. |
| chainout[63:0] | Output | 64 | Output data bus of the output cascade module. Connect these signals to the chainin signals of the next DSP core. In 18 × 18 systolic mode, only 44 bits of output cascade are supported. |



7. Multiply Adder IP Core References

The Multiply Adder Intel FPGA IP core allows you to implement a multiplier-adder.(9)

The following figure shows the ports for the Multiply Adder Intel FPGA IP core.

Figure 32. Multiply Adder Intel FPGA IP Ports

[Figure 32 shows four multipliers (Mult 1 to Mult 4) with their data registers, pipeline registers, N layers of pipeline register, systolic registers, and the output register. Labeled ports: dataa_0 to dataa_3, datab_0 to datab_3, datac_0 to datac_3, the coefsel select inputs, scanina, scanouta, chainin, chainout, signa, signb, addnsub1, addnsub3, negate, and accum_sload/sload_accum.]

A multiplier-adder accepts pairs of inputs, multiplies the values together and then adds to or subtracts from the products of all other pairs.

(9) Intel Stratix 10 variable precision DSP IP cores are only available in Intel Quartus Prime Pro Edition.








The DSP block uses 18 × 19-bit input multipliers to process data with widths up to 18 bits and 27 × 27-bit input multipliers to process data with widths between 18 and 27 bits. For data with widths of more than 27 bits, the DSP block uses a partial products algorithm to process the data.
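To illustrate what a partial products algorithm does in general, the following Python sketch splits two wide unsigned operands into 27-bit pieces, multiplies every pair of pieces (each pairing fits a 27 × 27 multiplier), and sums the shifted partial products. This shows the general technique only. It is not necessarily the decomposition that the IP generates, it handles unsigned values only, and the function names are hypothetical.

```python
LIMB_BITS = 27  # width handled natively by one 27 x 27 multiplier


def split_limbs(value, limb_bits=LIMB_BITS):
    """Split a non-negative integer into little-endian limbs of limb_bits."""
    if value < 0:
        raise ValueError("this sketch handles unsigned operands only")
    mask = (1 << limb_bits) - 1
    limbs = [value & mask]
    value >>= limb_bits
    while value:
        limbs.append(value & mask)
        value >>= limb_bits
    return limbs


def multiply_partial_products(a, b):
    """Multiply two wide unsigned integers using 27-bit partial products."""
    total = 0
    for i, limb_a in enumerate(split_limbs(a)):
        for j, limb_b in enumerate(split_limbs(b)):
            # Each limb product is weighted by its combined bit position.
            total += (limb_a * limb_b) << (LIMB_BITS * (i + j))
    return total


a, b = (1 << 40) + 12345, (1 << 35) + 999
print(multiply_partial_products(a, b) == a * b)  # True
```

Two 54-bit operands need four limb products in this scheme, which gives a sense of why multipliers larger than the native size cost more resources and may run slower.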

The registers and extra pipeline registers for the following signals are also placed inside the DSP block:

• Data input

• Signed or unsigned select

• Add or subtract select

• Products of multipliers

In the case of the output result, the first register is placed in the DSP block. However, the extra latency registers are placed in logic elements outside the block. Connections peripheral to the DSP block, including data inputs to the multiplier, control signal inputs, and outputs of the adder, use regular routing to communicate with the rest of the device. All connections in the function use dedicated routing inside the DSP block. This dedicated routing includes the shift register chains when you select the option to shift a multiplier's registered input data from one multiplier to an adjacent multiplier.

7.1. Multiply Adder Intel FPGA IP Release Information






Table 32. Multiply Adder Intel FPGA IP Release Information

Item Description

IP Version 19.1.0





7.2. Features

The Multiply Adder Intel FPGA IP core offers the following features:

• Generates a multiplier to perform multiplication operations of two numbers

Note: When you build multipliers larger than the natively supported size, there may be a performance impact resulting from the partial product implementation.

• Supports data widths of 1–256 bits

• Supports signed and unsigned data representation format

• Supports pipelining with configurable input latency

• Provides an option to dynamically switch between signed and unsigned data support

• Provides an option to dynamically switch between add and subtract operation

• Supports optional asynchronous and synchronous clear and clock enable input ports

• Supports systolic delay register mode

• Supports pre-adder with 8 pre-load coefficients per multiplier

• Supports pre-load constant to complement accumulator feedback

7.2.1. Pre-adder

With pre-adder, additions or subtractions are done prior to feeding the multiplier.

There are five pre-adder modes:

• Simple mode

• Coefficient mode

• Input mode

• Square mode

• Constant mode

Note: When pre-adder is used (pre-adder coefficient/input/square mode), all data inputs to the multiplier must have the same clock setting.

7.2.1.1. Pre-adder Simple Mode

In this mode, both operands derive from the input ports and the pre-adder is bypassed. This is the default mode.

Figure 33. Pre-adder Simple Mode

[Figure 33 shows inputs a0 and b0 feeding Mult0, which produces result.]

7.2.1.2. Pre-adder Coefficient Mode

In this mode, one multiplier operand derives from the pre-adder, and the other operand derives from the internal coefficient storage. The coefficient storage allows up to 8 preset constants. The coefficient selection signals are coefsel[0..3].

This mode is expressed in the following equation.

The following figure shows the pre-adder coefficient mode of a multiplier.

Figure 34. Pre-adder Coefficient Mode

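[Figure 34 shows inputs a0 and b0 feeding the +/- pre-adder. The pre-adder output and the coefficient coef, selected by coefsel0, feed Mult0, which produces result.]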

7.2.1.3. Pre-adder Input Mode

In this mode, one multiplier operand derives from the pre-adder, and the other operand derives from the datac[] input port.


The following shows the pre-adder input mode of a multiplier.

Figure 35. Pre-adder Input Mode

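[Figure 35 shows inputs a0 and b0 feeding the +/- pre-adder. The pre-adder output and input c0 feed Mult0, which produces result.]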

7.2.1.4. Pre-adder Square Mode




The following shows the pre-adder square mode of two multipliers.

Figure 36. Pre-adder Square Mode

[Figure 36 shows inputs a0 and b0 feeding the +/- pre-adder, whose output feeds Mult0 to produce result.]

7.2.1.5. Pre-adder Constant Mode

In this mode, one multiplier operand derives from the input port, and the other operand derives from the internal coefficient storage. The coefficient storage allows up to 8 preset constants. The coefficient selection signals are coefsel[0..3].


The following figure shows the pre-adder constant mode of a multiplier.

Figure 37. Pre-adder Constant Mode

[Figure 37 shows input a0 and the coefficient coef, selected by coefsel0, feeding Mult0, which produces result.]
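The five pre-adder modes can be compared side by side in a small behavioral sketch. The Python below is inferred from the mode descriptions and the signal names in Figures 33 to 37. It is not taken from the IP source, the function name `preadder_multiply` is hypothetical, and the square mode in particular is an assumption based on Figure 36, where the pre-adder output is the only operand source.

```python
def preadder_multiply(mode, a, b=0, c=0, coef=0, subtract=False):
    """Return the multiplier result for one pre-adder mode (no registers)."""
    pre = a - b if subtract else a + b   # output of the +/- pre-adder
    if mode == "simple":
        return a * b          # pre-adder bypassed
    if mode == "coefficient":
        return pre * coef     # pre-adder output times stored coefficient
    if mode == "input":
        return pre * c        # pre-adder output times datac input
    if mode == "square":
        return pre * pre      # assumed: both operands from the pre-adder
    if mode == "constant":
        return a * coef       # input port times stored coefficient
    raise ValueError(f"unknown pre-adder mode: {mode}")


for mode in ("simple", "coefficient", "input", "square", "constant"):
    print(mode, preadder_multiply(mode, a=3, b=2, c=4, coef=10))
```

With a = 3, b = 2, c = 4, and coef = 10, the five modes give 6, 50, 20, 25, and 30, which makes the difference in operand sources easy to see.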

7.2.2. Systolic Delay Register

In a systolic architecture, the input data is fed into a cascade of registers acting as a data buffer. Each register delivers an input sample to a multiplier where it is multiplied by the respective coefficient. The chain adder stores the gradually combined results from the multiplier and the previously registered result from the chainin[] input port to form the final result. Each multiply-add element must be delayed by a single cycle so that the results synchronize appropriately when added together. Each successive delay is used to address both the coefficient memory and the data buffer of their respective multiply-add elements. For example, a single delay for the second multiply-add element, two delays for the third multiply-add element, and so on.
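The effect of the delay registers can be checked with a small simulation. The Python sketch below models a textbook systolic FIR arrangement, which is an assumption and may differ from the exact register placement of the DSP block: the data path has two registers per tap, the partial-sum chain has one register per tap, and the first element's chainin is tied to zero. It compares the systolic output against a direct-form FIR. All function names are hypothetical.

```python
def direct_fir(samples, coeffs):
    """Reference FIR: y(t) = sum over k of coeffs[k] * samples[t - k]."""
    out = []
    for t in range(len(samples)):
        out.append(sum(c * samples[t - k]
                       for k, c in enumerate(coeffs) if t - k >= 0))
    return out


def systolic_fir(samples, coeffs):
    """Systolic FIR: the result equals direct_fir delayed by len(coeffs) - 1."""
    n = len(coeffs)
    data = [0] * (2 * n - 1)   # data delay line, tap k reads data[2 * k]
    sums = [0] * n             # registered partial sum of each element
    out = []
    for x in samples:
        data = [x] + data[:-1]                 # shift in the new sample
        new_sums = []
        for k in range(n):
            # chainin of element k is the previous element's registered sum
            chain_in = sums[k - 1] if k > 0 else 0
            new_sums.append(chain_in + coeffs[k] * data[2 * k])
        sums = new_sums                        # clock edge: update registers
        out.append(sums[-1])
    return out


samples, coeffs = [1, 2, 3, 4, 5, 0, 0], [1, 2, 3]
print(direct_fir(samples, coeffs))
print(systolic_fir(samples, coeffs))
```

In this model the extra data register per tap compensates for the register in the sum chain, so every product reaches the final adder in step with the partial sum it belongs to.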



Figure 38. Systolic Registers

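[Figure 38 shows the input x(t) passing through a chain of systolic registers (S^-1), with the taps multiplied by the coefficients c(0), c(1), c(2), …, c(N-1) and the products summed through a second chain of systolic registers to produce y(t).]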

x(t) represents the results from a continuous stream of input samples and y(t) represents the summation of a set of input samples over time, multiplied by their respective coefficients. Both the input and output results flow from left to right. The c(0) to c(N-1) denotes the coefficients. The systolic delay registers are denoted by S^-1, where the -1 represents a single clock delay. Systolic delay registers are added at the inputs and outputs for pipelining in a way that ensures the results from the multiplier operand and the accumulated sums stay in sync. This processing element is replicated to form a circuit that computes the filtering function. This function is expressed in the following equation.

N represents the number of cycles of data that has entered into the accumulator, y(t) represents the output at time t, A(t) represents the input at time t, and B(i) are the coefficients. The t and i in the equation correspond to a particular instant in time, so to compute the output sample y(t) at time t, a group of input samples at N different points in time, A(t), A(t-1), A(t-2), … A(t-N+1), is required. The group of N input samples is multiplied by N coefficients and summed together to form the final result y.
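Using the symbols defined above, the filtering function is the standard FIR sum. The following LaTeX form is reconstructed from the symbol definitions in the text, not copied from the original equation:

```latex
y(t) = \sum_{i=0}^{N-1} B(i)\, A(t-i)
```

For example, with N = 3 the output is y(t) = B(0)A(t) + B(1)A(t-1) + B(2)A(t-2).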

The systolic register architecture is available only for sum-of-2 and sum-of-4 modes.

The following figure shows the systolic delay register implementation of 2 multipliers.



Figure 39. Systolic Delay Register Implementation of 2 Multipliers

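[Figure 39 shows inputs a0 and b0 feeding Mult0 and inputs a1 and b1 feeding Mult1, with two +/- adders combining the products with chainin to produce result.]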

The sum of two multipliers is expressed in the following equation.

The following figure shows the systolic delay register implementation of 4 multipliers.



Figure 40. Systolic Delay Register Implementation of 4 Multipliers

[Figure 40 shows four multipliers, Mult0 to Mult3, with input pairs a0/b0, a1/b1, a2/b2, and a3/b3, and four +/- adders combining the products with chainin to produce result.]

The sum of four multipliers is expressed in the following equation.

The following lists the advantages of systolic register implementation:

• Reduces DSP resource usage

• Enables efficient mapping in the DSP block using the chain adder structure



7.2.3. Pre-load Constant

The pre-load constant controls the accumulator operand and complements the accumulator feedback. The valid LOADCONST_VALUE ranges from 0–64. The constant value is equal to 2^N, where N = LOADCONST_VALUE. When the LOADCONST_VALUE is set to 64, the constant value is equal to 0. This function can be used as biased rounding.
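As a worked illustration of biased rounding, the following Python sketch computes the constant from LOADCONST_VALUE as described above and uses it to round a product before discarding its low-order bits. The function names are hypothetical, and using the constant 2^(F-1) to round away F fractional bits is one common use, shown here as an assumption rather than a documented recipe.

```python
def preload_constant(loadconst_value):
    """Return 2**N for N in 0..63, and 0 when N is 64."""
    if not 0 <= loadconst_value <= 64:
        raise ValueError("LOADCONST_VALUE must be in the range 0 to 64")
    return 0 if loadconst_value == 64 else 1 << loadconst_value


def round_with_preload(value, frac_bits):
    """Drop frac_bits low-order bits after adding half an LSB as the bias."""
    if frac_bits < 1:
        raise ValueError("frac_bits must be at least 1")
    bias = preload_constant(frac_bits - 1)   # half of the retained LSB
    return (value + bias) >> frac_bits


print(preload_constant(4))        # 16
print(round_with_preload(23, 3))  # 23 / 8 = 2.875, rounds to 3
print(round_with_preload(19, 3))  # 19 / 8 = 2.375, rounds to 2
```

Without the bias, the right shift alone would truncate 23 / 8 to 2.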

The following figure shows the pre-load constant implementation.

Figure 41. Pre-load Constant

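[Figure 41 shows inputs a0 and b0 feeding Mult0 and inputs a1 and b1 feeding Mult1, followed by two +/- adders. The accum_sload and sload_accum signals select between the accumulator feedback and the constant as the accumulator operand that is combined into result.]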

7.2.4. Double Accumulator

The double accumulator feature adds an additional register in the accumulator feedback path that processes the interleaved complex data (I, Q). The double accumulator register follows the output register, which includes the clock, clock enable, and aclr. The additional accumulator register returns the result with a one-cycle delay. This feature enables you to have two accumulator channels with the same resource count.
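The two-channel behavior follows from the extra register in the feedback path: each new sample is added to the sum from two cycles earlier, so even and odd samples accumulate independently. The following Python sketch is a behavioral illustration under that assumption, with a hypothetical function name and no bit-width or clear handling.

```python
def double_accumulate(samples):
    """Accumulate an interleaved stream I0, Q0, I1, Q1, ... as two channels.

    Returns the output register value after every clock cycle.
    """
    output_reg = 0   # running sum written in the previous cycle
    double_reg = 0   # running sum written two cycles ago (feedback source)
    history = []
    for sample in samples:
        new_sum = sample + double_reg   # feedback is delayed by one extra cycle
        double_reg = output_reg         # output register feeds the double register
        output_reg = new_sum
        history.append(output_reg)
    return history


interleaved = [1, 10, 2, 20, 3, 30]    # I samples: 1, 2, 3; Q samples: 10, 20, 30
print(double_accumulate(interleaved))  # [1, 10, 3, 30, 6, 60]
```

The even positions of the output carry the running I sum (1, 3, 6) and the odd positions carry the running Q sum (10, 30, 60), using a single adder.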

The following figure shows the double accumulator implementation.



Figure 42. Double Accumulator

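[Figure 42 shows inputs a0 and b0 feeding Mult0 and inputs a1 and b1 feeding Mult1, followed by two +/- adders, the output register, and the double accumulator register. The accumulator feedback returns from the double accumulator register, and the output result is taken from the output register.]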

7.3. Parameters

You can customize the Multiply Adder Intel FPGA IP core by specifying the parameters using the parameter editor in the Intel Quartus Prime software.

7.3.1. General Tab

Table 33. General Tab


| Parameter | Value | Default Value | Description |
|---|---|---|---|
| What is the number of multipliers? | 1 - 4 | 1 | Number of multipliers to be added together. Values are 1 up to 4. |
| How wide should the A input buses be? | 1 - 256 | 16 | Specify the width of the dataa[] port. |
| How wide should the B input buses be? | 1 - 256 | 16 | Specify the width of the datab[] port. |
| How wide should the 'result' output bus be? | 1 - 256 | 32 | Specify the width of the result[] port. |
| Create an associated clock enable for each clock | On, Off | Off | Select this option to create clock enable for each clock. |

7.3.2. Extra Modes

Table 34. Extra Modes Tab


Outputs Configuration

Register output of the adder unit

On, Off

Off Turn on this option to enable output register of the adder module.

What is the source for clock input?

Clock0, Clock1, Clock2

Clock0 Select Clock0, Clock1, or Clock2 to enable and specify the clock source for output registers. You must select Register output of the adder unit to enable this parameter.

What is the source for asynchronous clear input?

NONE, ACLR0, ACLR1

NONE Specifies the asynchronous clear source for the adder output register. You must select Register output of the adder unit to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.

What is the source for synchronous clear input?

NONE, SCLR0, SCLR1

NONE Specifies the synchronous clear source for the adder output register. You must select Register output of the adder unit to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.

Adder Operation

What operation should be performed on outputs of the first pair of multipliers?

ADD, SUB, VARIABLE

ADD Select addition or subtraction operation to perform for the outputs between the first and second multipliers.
• Select ADD to perform addition operation.
• Select SUB to perform subtraction operation.
• Select VARIABLE to use addnsub1 port for dynamic addition/subtraction control.
When VARIABLE value is selected:
• Drive addnsub1 signal to high for addition operation.
• Drive addnsub1 signal to low for subtraction operation.
You must select more than two multipliers to enable this parameter.

Register 'addnsub1' input

On, Off

Off Turn on this option to enable input register for addnsub1 port. You must select VARIABLE for What operation should be performed on outputs of the first pair of multipliers to enable this parameter.


Clock0, Clock1, Clock2

Clock0 Select Clock0, Clock1, or Clock2 to specify the input clock signal for addnsub1 register. You must select Register 'addnsub1' input to enable this parameter.


NONE, ACLR0, ACLR1

NONE Specifies the asynchronous clear source for the addnsub1 register. You must select Register 'addnsub1' input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.


NONE, SCLR0, SCLR1

NONE Specifies the synchronous clear source for the addnsub1 register. You must select Register 'addnsub1' input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.

continued...


UG-S10-DSP | 2019.10.22


70



What operation should beperformed on outputs ofthe second pair ofmultipliers?

ADD,SUB,VARIABLE

ADD Select addition or subtraction operation toperform for the outputs between the third andfourth multipliers.• Select ADD to perform addition operation.• Select SUB to perform subtraction

operation.• Select VARIABLE to use addnsub1 port for

dynamic addition/subtraction control.When VARIABLE value is selected:• Drive addnsub1 signal to high for addition

operation.• Drive addnsub1 signal to low for

subtraction operation.You must select the value 4 for What is thenumber of multipliers? to enable thisparameter.

Register 'addnsub3' input OnOff

Off Turn on this option to enable input register foraddnsub3 signal.You must select VARIABLE for Whatoperation should be performed on outputsof the second pair of multipliers to enablethis parameter.


Clock0Clock1Clock2

Clock0 Select Clock0 , Clock1 or Clock2 to specifythe input clock signal for addnsub3 register.You must select Register 'addnsub3' input toenable this parameter.


NONEACLR0ACLR1

NONE Specifies the asynchronous clear source for theaddnsub3 register.You must select Register 'addnsub3' input toenable this parameter.The IP core supports either asynchronous orsynchronous clear but not both.


NONESCLR0SCLR1

NONE Specifies the synchronous clear source for theaddnsub3 register.You must select Register 'addnsub3' input toenable this parameter.The IP core supports either asynchronous orsynchronous clear but not both.

Polarity

Enable ‘use_subadd’ OnOff

Off Turn on this option to reverse the function ofaddnsub input port.When this option is turned on, do the following:• drive addnsub to high for subtraction

operation• drive addnsub to low for addition operation
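The adder operation and polarity settings above can be summarized with a small behavioral sketch. This is an illustration under stated assumptions rather than the IP core's implementation: the function name `adder_stage` is hypothetical, and the subtraction order (first product minus second product) is assumed because the table does not state it.

```python
def adder_stage(prod_first, prod_second, operation="ADD", addnsub=1, use_subadd=False):
    """Combine the outputs of one multiplier pair.

    operation  -- "ADD", "SUB", or "VARIABLE" (the GUI setting)
    addnsub    -- level of the addnsub1/addnsub3 port, used only for "VARIABLE"
    use_subadd -- True reverses the polarity of the addnsub port
    Assumption: subtraction computes prod_first - prod_second.
    """
    if operation == "ADD":
        return prod_first + prod_second
    if operation == "SUB":
        return prod_first - prod_second
    if operation != "VARIABLE":
        raise ValueError("operation must be ADD, SUB, or VARIABLE")
    # Default polarity: high adds, low subtracts. use_subadd swaps the two.
    add = (addnsub == 0) if use_subadd else (addnsub == 1)
    return prod_first + prod_second if add else prod_first - prod_second


print(adder_stage(12, 5, "VARIABLE", addnsub=1))                   # addition
print(adder_stage(12, 5, "VARIABLE", addnsub=1, use_subadd=True))  # subtraction
```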

7.3.3. Multipliers Tab

Table 35. Multipliers Tab


Parameter | Value | Default Value | Description
What is the representation format for Multipliers A inputs? | SIGNED, UNSIGNED, VARIABLE | UNSIGNED | Specify the representation format for the multiplier A input.
Register ‘signa’ input | On, Off | Off | Select this option to enable the signa register. You must select VARIABLE for What is the representation format for Multipliers A inputs? to enable this option.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to enable and specify the input clock signal for the signa register. You must select Register ‘signa’ input to enable this parameter.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the asynchronous clear source for the signa register. You must select Register ‘signa’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the synchronous clear source for the signa register. You must select Register ‘signa’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the representation format for Multipliers B inputs? | SIGNED, UNSIGNED, VARIABLE | UNSIGNED | Specify the representation format for the multiplier B input.
Register ‘signb’ input | On, Off | Off | Turn on this option to enable the signb register. You must select VARIABLE for What is the representation format for Multipliers B inputs? to enable this option.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to enable and specify the input clock signal for the signb register. You must select Register ‘signb’ input to enable this parameter.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the asynchronous clear source for the signb register. You must select Register ‘signb’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the synchronous clear source for the signb register. You must select Register ‘signb’ input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.

Input Configuration

Register input A of the multiplier | On, Off | Off | Turn on this option to enable the input register for the dataa input bus.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to enable and specify the register input clock signal for the dataa input bus. You must select Register input A of the multiplier to enable this parameter.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the register asynchronous clear source for the dataa input bus. You must select Register input A of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the register synchronous clear source for the dataa input bus. You must select Register input A of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
Register input B of the multiplier | On, Off | Off | Turn on this option to enable the input register for the datab input bus.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to enable and specify the register input clock signal for the datab input bus. You must select Register input B of the multiplier to enable this parameter.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the register asynchronous clear source for the datab input bus. You must select Register input B of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the register synchronous clear source for the datab input bus. You must select Register input B of the multiplier to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the input A of the multiplier connected to? | Multiplier input, Scan chain input | Multiplier input | Select the input source for input A of the multiplier. Select Multiplier input to use the dataa input bus as the source to the multiplier. Select Scan chain input to use the scanin input bus as the source to the multiplier and enable the scanout output bus. This parameter is available when you select 2, 3, or 4 for What is the number of multipliers?

Scanout A Register Configuration

Register output of the scan chain | On, Off | Off | Turn on this option to enable the output register for the scanouta output bus. You must select Scan chain input for What is the input A of the multiplier connected to? to enable this option.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to enable and specify the register input clock signal for the scanouta output bus. You must turn on Register output of the scan chain to enable this option.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the register asynchronous clear source for the scanouta output bus. You must turn on Register output of the scan chain to enable this option. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the register synchronous clear source for the scanouta output bus. You must turn on Register output of the scan chain to enable this option. The IP core supports either asynchronous or synchronous clear but not both.

7.3.4. Preadder Tab

Table 36. Preadder Tab


Parameter | Value | Default Value | Description
Select preadder mode | SIMPLE, COEF, INPUT, SQUARE, CONSTANT | SIMPLE | Specifies the operation mode for the preadder module. SIMPLE: This mode bypasses the preadder. This is the default mode. COEF: This mode uses the output of the preadder and the coefsel input bus as the inputs to the multiplier. INPUT: This mode uses the output of the preadder and the datac input bus as the inputs to the multiplier. SQUARE: This mode uses the output of the preadder as both inputs to the multiplier. CONSTANT: This mode uses the dataa input bus, with the preadder bypassed, and the coefsel input bus as the inputs to the multiplier.
Select preadder direction | ADD, SUB | ADD | Specifies the operation of the preadder. To enable this parameter, select COEF, INPUT, SQUARE, or CONSTANT for Select preadder mode.
How wide should the C input buses be? | 1 - 256 | 16 | Specifies the number of bits for the C input bus. You must select INPUT for Select preadder mode to enable this parameter.

Data C Input Register Configuration

Register datac input | On, Off | On | Turn on this option to enable the input register for the datac input bus. You must select INPUT for Select preadder mode to enable this option.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to specify the input clock signal for the datac input register. You must select Register datac input to enable this parameter.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the asynchronous clear source for the datac input register. You must select Register datac input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the synchronous clear source for the datac input register. You must select Register datac input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.

Coefficients

How wide should the coef width be? | 1 - 27 | 18 | Specifies the number of bits for the coefsel input bus. You must select COEF or CONSTANT for preadder mode to enable this parameter.

Coef Register Configuration

Register the coefsel input | On, Off | On | Select this option to enable the input register for the coefsel input bus. You must select COEF or CONSTANT for preadder mode to enable this parameter.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to specify the input clock signal for the coefsel input register. You must select Register the coefsel input to enable this parameter.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the asynchronous clear source for the coefsel input register. You must select Register the coefsel input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the synchronous clear source for the coefsel input register. You must select Register the coefsel input to enable this parameter. The IP core supports either asynchronous or synchronous clear but not both.
Coefficient_0 Configuration | 0x00000 – 0xFFFFFFF | 0x00000000 | Specifies the coefficient values for the first multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
Coefficient_1 Configuration | 0x00000 – 0xFFFFFFF | 0x00000000 | Specifies the coefficient values for the second multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
Coefficient_2 Configuration | 0x00000 – 0xFFFFFFF | 0x00000000 | Specifies the coefficient values for the third multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
Coefficient_3 Configuration | 0x00000 – 0xFFFFFFF | 0x00000000 | Specifies the coefficient values for the fourth multiplier. The number of bits must be the same as specified in the How wide should the coef width be? parameter. You must select COEF or CONSTANT for preadder mode to enable this parameter.
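The five preadder modes can be compared side by side with a short arithmetic sketch. This is a simplified model, assuming that the preadder combines the dataa and datab inputs, that SIMPLE mode multiplies dataa by datab, and that `coef` is the coefficient value already selected through coefsel. The function name `preadder_multiply` and its arguments are hypothetical and are not IP ports.

```python
def preadder_multiply(mode, a, b, c=0, coef=0, direction="ADD"):
    """Return the multiplier output for one preadder mode.

    Assumptions: the preadder operates on a (dataa) and b (datab);
    coef is the coefficient value chosen by coefsel.
    """
    pre = a + b if direction == "ADD" else a - b
    if mode == "SIMPLE":    # preadder bypassed
        return a * b
    if mode == "COEF":      # preadder output times coefficient
        return pre * coef
    if mode == "INPUT":     # preadder output times datac
        return pre * c
    if mode == "SQUARE":    # preadder output on both multiplier inputs
        return pre * pre
    if mode == "CONSTANT":  # dataa (preadder bypassed) times coefficient
        return a * coef
    raise ValueError("unknown preadder mode: " + mode)


for mode in ("SIMPLE", "COEF", "INPUT", "SQUARE", "CONSTANT"):
    print(mode, preadder_multiply(mode, a=3, b=4, c=5, coef=2))
```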

7.3.5. Accumulator Tab

Table 37. Accumulator Tab


Parameter | Value | Default Value | Description
Enable accumulator? | YES, NO | NO | Select YES to enable the accumulator. You must select Register output of adder unit when using the accumulator feature.
What is the accumulator operation type? | ADD, SUB | ADD | Specifies the operation of the accumulator: ADD for addition, SUB for subtraction. You must select YES for Enable accumulator? to enable this option.

Preload Constant

Enable preload constant | On, Off | Off | Enables the accum_sload or sload_accum signal and its input register to dynamically select the input to the accumulator. When accum_sload is low or sload_accum is high, the multiplier output is fed into the accumulator. When accum_sload is high or sload_accum is low, a user-specified preload constant is fed into the accumulator. You must select YES for Enable accumulator? to enable this option.
What is the input of accumulate port connected to? | ACCUM_SLOAD, SLOAD_ACCUM | ACCUM_SLOAD | Specifies the behavior of the accum_sload/sload_accum signal. ACCUM_SLOAD: Drive accum_sload low to load the multiplier output to the accumulator. SLOAD_ACCUM: Drive sload_accum high to load the multiplier output to the accumulator. You must select Enable preload constant to enable this parameter.
Select value for preload constant | 0 - 64 | 64 | Specifies the preset constant value. The constant is 2^N, where N is the value of this parameter. N=64 represents a constant zero. You must select Enable preload constant to enable this parameter.
What is the source for clock input? | Clock0, Clock1, Clock2 | Clock0 | Select Clock0, Clock1, or Clock2 to specify the input clock signal for the accum_sload/sload_accum register. You must select Enable preload constant to enable this parameter.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the asynchronous clear source for the accum_sload/sload_accum register. You must select Enable preload constant to enable this parameter.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the synchronous clear source for the accum_sload/sload_accum register. You must select Enable preload constant to enable this parameter.
Enable double accumulator | TRUE, FALSE | FALSE | Enables or disables the double accumulator feature.
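The preload constant encoding and the opposite polarities of accum_sload and sload_accum are easy to mix up, so the following sketch restates them as code. It models only what the table states: the constant is 2^N with N=64 meaning zero, and the select signal chooses between the multiplier output and the constant. The function names `preload_constant` and `accumulator_input` are hypothetical.

```python
def preload_constant(n):
    """Return the preload constant for setting N: 2**N, or 0 when N is 64."""
    if not 0 <= n <= 64:
        raise ValueError("N must be in the range 0 to 64")
    return 0 if n == 64 else 1 << n


def accumulator_input(mult_out, n, level, port="ACCUM_SLOAD"):
    """Select what is fed into the accumulator for a given signal level.

    ACCUM_SLOAD: low selects the multiplier output, high selects the constant.
    SLOAD_ACCUM: high selects the multiplier output, low selects the constant.
    """
    if port == "ACCUM_SLOAD":
        use_multiplier = (level == 0)
    elif port == "SLOAD_ACCUM":
        use_multiplier = (level == 1)
    else:
        raise ValueError("port must be ACCUM_SLOAD or SLOAD_ACCUM")
    return mult_out if use_multiplier else preload_constant(n)


print(preload_constant(3))                   # constant for N = 3
print(accumulator_input(5, 3, level=1))      # accum_sload high: constant selected
```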

7.3.6. Systolic/Chainout Tab

Table 38. Systolic/Chainout Adder Tab


Parameter | Value | Default Value | Description
Enable chainout adder | YES, NO | NO | Select YES to enable the chainout adder module.
What is the chainout adder operation type? | ADD, SUB | ADD | Specifies the chainout adder operation. For the subtraction operation, SIGNED must be selected for What is the representation format for Multipliers A inputs? and What is the representation format for Multipliers B inputs? in the Multipliers tab.
Enable ‘negate’ input for chainout adder? | PORT_USED, PORT_UNUSED | PORT_UNUSED | Select PORT_USED to enable the negate input signal. This parameter is invalid when the chainout adder is disabled.
Register ‘negate’ input? | UNREGISTERED, CLOCK0, CLOCK1, CLOCK2, CLOCK3 | UNREGISTERED | Enables the input register for the negate input signal and specifies the input clock signal for the negate register. Select UNREGISTERED if the negate input register is not needed. This parameter is invalid when you select NO for Enable chainout adder or PORT_UNUSED for Enable 'negate' input for chainout adder?
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the asynchronous clear source for the negate register. This parameter is invalid when you select NO for Enable chainout adder or PORT_UNUSED for Enable 'negate' input for chainout adder?
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the synchronous clear source for the negate register. This parameter is invalid when you select NO for Enable chainout adder or PORT_UNUSED for Enable 'negate' input for chainout adder?

Systolic Delay

Enable systolic delay registers | On, Off | Off | Select this option to enable systolic mode. This parameter is available when you select 2 or 4 for What is the number of multipliers? You must enable Register output of the adder unit to use the systolic delay registers.
What is the source for clock input? | CLOCK0, CLOCK1, CLOCK2 | CLOCK0 | Specifies the input clock signal for the systolic delay register. You must select Enable systolic delay registers to enable this option.
What is the source for asynchronous clear input? | NONE, ACLR0, ACLR1 | NONE | Specifies the asynchronous clear source for the systolic delay register. You must select Enable systolic delay registers to enable this option.
What is the source for synchronous clear input? | NONE, SCLR0, SCLR1 | NONE | Specifies the synchronous clear source for the systolic delay register. You must select Enable systolic delay registers to enable this option.

7.3.7. Pipelining Tab

Table 39. Pipelining Tab


Parameter | IP Generated Parameter | Value | Default Value | Description

Pipelining Configuration

Do you want to add pipeline register to the input? | gui_pipelining | No, Yes | No | Select Yes to enable an additional level of pipeline register on the input signals. You must specify a value greater than 0 for Please specify the number of latency clock cycles.
Please specify the number of latency clock cycles | latency | Any value greater than 0 | 0 | Specifies the desired latency in clock cycles. One level of pipeline register = 1 clock cycle of latency. You must select Yes for Do you want to add pipeline register to the input? to enable this option.
What is the source for clock input? | gui_input_latency_clock | CLOCK0, CLOCK1, CLOCK2 | CLOCK0 | Select Clock0, Clock1, or Clock2 to enable and specify the pipeline register input clock signal. You must select Yes for Do you want to add pipeline register to the input? to enable this option.
What is the source for asynchronous clear input? | gui_input_latency_aclr | NONE, ACLR0, ACLR1 | NONE | Specifies the register asynchronous clear source for the additional pipeline register. You must select Yes for Do you want to add pipeline register to the input? to enable this option.
What is the source for synchronous clear input? | gui_input_latency_sclr | NONE, SCLR0, SCLR1 | NONE | Specifies the register synchronous clear source for the additional pipeline register. You must select Yes for Do you want to add pipeline register to the input? to enable this option.

7.4. Signals

The following tables list the input and output signals of the Multiply Adder Intel FPGA IP core.

Table 40. Multiply Adder Intel FPGA IP Input Signals

Signal | Required | Description
dataa_0[]/dataa_1[]/dataa_2[]/dataa_3[] | Yes | Data input to the multiplier. Input port [NUMBER_OF_MULTIPLIERS * WIDTH_A - 1 … 0] wide.
datab_0[]/datab_1[]/datab_2[]/datab_3[] | Yes | Data input to the multiplier. Input signal [NUMBER_OF_MULTIPLIERS * WIDTH_B - 1 … 0] wide.
datac_0[]/datac_1[]/datac_2[]/datac_3[] | No | Data input to the multiplier. Input signal [NUMBER_OF_MULTIPLIERS * WIDTH_C - 1 … 0] wide. Select INPUT for the Select preadder mode parameter to enable these signals.
clock[1:0] | No | Clock input port to the corresponding register. This signal can be used by any register in the IP core.
aclr[1:0] | No | Asynchronous clear input to the corresponding register.
sclr[1:0] | No | Synchronous clear input to the corresponding register.
ena[1:0] | No | Enable signal input to the corresponding register.
signa | No | Specifies the numerical representation of the multiplier input A. If the signa signal is high, the multiplier treats the multiplier input A signal as a signed number. If the signa signal is low, the multiplier treats the multiplier input A signal as an unsigned number. Select VARIABLE for the What is the representation format for Multipliers A inputs? parameter to enable this signal.
signb | No | Specifies the numerical representation of the multiplier input B signal. If the signb signal is high, the multiplier treats the multiplier input B signal as a signed two's complement number. If the signb signal is low, the multiplier treats the multiplier input B signal as an unsigned number.
scanina[] | No | Input for scan chain A. Input signal [WIDTH_A - 1 … 0] wide. When the INPUT_SOURCE_A parameter has a value of SCANA, the scanina[] signal is required.
accum_sload | No | Dynamically specifies whether the accumulator value is constant. If the accum_sload signal is low, the multiplier output is loaded into the accumulator. Do not use accum_sload and sload_accum simultaneously.
sload_accum | No | Dynamically specifies whether the accumulator value is constant. If the sload_accum signal is high, the multiplier output is loaded into the accumulator. Do not use accum_sload and sload_accum simultaneously.
chainin[] | No | Adder result input bus from the preceding stage. Input signal [WIDTH_CHAININ - 1 … 0] wide.
addnsub1 | No | Performs addition or subtraction on the outputs from the first pair of multipliers. Drive 1 on the addnsub1 signal to add the outputs from the first pair of multipliers. Drive 0 on the addnsub1 signal to subtract the outputs from the first pair of multipliers.
addnsub3 | No | Performs addition or subtraction on the outputs from the second pair of multipliers. Drive 1 on the addnsub3 signal to add the outputs from the second pair of multipliers. Drive 0 on the addnsub3 signal to subtract the outputs from the second pair of multipliers.
coefsel0[] | No | Coefficient input signal [0:3] to the first multiplier.
coefsel1[] | No | Coefficient input signal [0:3] to the second multiplier.
coefsel2[] | No | Coefficient input signal [0:3] to the third multiplier.
coefsel3[] | No | Coefficient input signal [0:3] to the fourth multiplier.
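The effect of the signa and signb signals on how an operand is interpreted can be illustrated with a short sketch. It assumes two's complement encoding for the signed case on both inputs (the table states this explicitly only for signb). The function name `interpret_operand` is hypothetical.

```python
def interpret_operand(bits, width, sign_flag):
    """Interpret a raw bit pattern as the multiplier would see it.

    sign_flag high -- treat the value as signed (two's complement assumed)
    sign_flag low  -- treat the value as unsigned
    """
    bits &= (1 << width) - 1                 # keep only the bus width
    if sign_flag and (bits >> (width - 1)):  # sign bit set
        return bits - (1 << width)
    return bits


raw = 0xFFFF
print(interpret_operand(raw, 16, sign_flag=1))  # signed view of the pattern
print(interpret_operand(raw, 16, sign_flag=0))  # unsigned view of the pattern
```

The same 16-bit pattern therefore produces very different products depending on the sign control, which is why the representation format of both inputs should be checked whenever results look wrong by a large factor.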

Table 41. Multiply Adder Intel FPGA IP Output Signals


Signal | Required | Description
result[] | Yes | Multiplier output signal. Output signal [WIDTH_RESULT - 1 … 0] wide.
scanouta[] | No | Output of scan chain A. Output signal [WIDTH_A - 1 … 0] wide. Select more than 2 for the number of multipliers and choose Scan chain input for the What is the input A of the multiplier connected to? parameter to enable this signal.




8. ALTMULT_COMPLEX Intel FPGA IP Core Reference

You can use the ALTMULT_COMPLEX Intel FPGA IP core to implement the complex multiplier by instantiating two multipliers.

Figure 43. ALTMULT_COMPLEX Intel FPGA IP Block Diagram

[Block diagram. Inputs: dataa_real, datab_real, dataa_imaginary, datab_imaginary. Outputs: result_real, result_imaginary.]

8.1. ALTMULT_COMPLEX Intel FPGA IP Release Information













Table 42. ALTMULT_COMPLEX Intel FPGA IP Release Information

Item Description

IP Version 19.1.0



8.2. Features

The ALTMULT_COMPLEX Intel FPGA IP core offers the following features:

• Generates a multiplier to perform multiplication operations of two complex numbers

Note: When building multipliers larger than the natively supported size, there may be a performance impact resulting from the partial product calculations.

• Supports data width of 1–256 bits


• Supports pipelining with configurable output latency


8.3. Complex Multiplication

Complex numbers are numbers in the form of the following equation:

a + ib

Where:

• a and b are real numbers

• i is the imaginary unit, equal to the square root of -1.

Two complex numbers, x = a + ib and y = c + id, are multiplied as shown in the following equations.



Figure 44. Equation for Two Complex Numbers Multiplication
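Expanding (a + ib)(c + id) and using i^2 = -1 gives a real part of ac - bd and an imaginary part of ad + bc. The following sketch computes the product this way, using four real multiplications. It is a plain arithmetic illustration and does not describe which implementation the IP core selects; the function name `complex_multiply` is hypothetical.

```python
def complex_multiply(a, b, c, d):
    """Multiply x = a + ib by y = c + id and return (real, imaginary)."""
    real = a * c - b * d  # ac - bd
    imag = a * d + b * c  # ad + bc
    return real, imag


real, imag = complex_multiply(1, 2, 3, 4)  # (1 + 2i) * (3 + 4i)
print(real, imag)

# Cross-check against Python's built-in complex type.
z = complex(1, 2) * complex(3, 4)
print(z.real, z.imag)
```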

8.4. Parameters

Table 43. ALTMULT_COMPLEX Intel FPGA IP Parameters


Parameter | Value | Default Value | Description

General

How wide should the A input buses be? | 1–256 | 18 | Specifies the number of bits for the dataa_imag and dataa_real input buses.
How wide should the B input buses be? | 1–256 | 18 | Specifies the number of bits for the datab_imag and datab_real input buses.
How wide should the ‘result’ output bus be? | 1–256 | 36 | Specifies the number of bits for the ‘result’ output bus.

Input Representation

What is the representation format for A inputs? | Signed, Unsigned | Signed | Specifies the representation format for A inputs. Only the Signed representation format is supported in Intel Stratix 10 devices.
What is the representation format for B inputs? | Signed, Unsigned | Signed | Specifies the representation format for B inputs. Only the Signed representation format is supported in Intel Stratix 10 devices.

Implementation Style

Which implementation style should be used? | Automatically select a style for best trade-off for the current settings; Canonical (Minimize the number of simple multipliers); Conventional (Minimize the use of logic cells) | Automatically select a style for best trade-off for the current settings | Intel Stratix 10 devices support only the Automatically select a style for best trade-off for the current settings style. The Intel Quartus Prime software determines the best implementation based on the selected device family and input width.

Pipelining

Output latency | 0 - 11 | 4 | Specifies the number of clock cycles for output latency.
Create a Clear input? | NONE, ACLR, SCLR | NONE | Select this option to create an aclr or sclr signal for the complex multiplier.
Create a Clock Enable input? | On, Off | Off | Select this option to create an ena signal for the complex multiplier clock.




8.5. Signals

Table 44. ALTMULT_COMPLEX Intel FPGA IP Input Signals


Signal | Required | Description
aclr | No | Asynchronous clear for the complex multiplier. When the aclr signal is asserted high, the function is asynchronously cleared.
sclr | No | Synchronous clear for the complex multiplier. When the sclr signal is asserted high, the function is synchronously cleared.
clock | Yes | Clock input to the ALTMULT_COMPLEX function.
dataa_imag[] | Yes | Imaginary input value for the data A signal of the complex multiplier. The size of the input signal depends on the How wide should the A input buses be? parameter value.
dataa_real[] | Yes | Real input value for the data A signal of the complex multiplier. The size of the input signal depends on the How wide should the A input buses be? parameter value.
datab_imag[] | Yes | Imaginary input value for the data B signal of the complex multiplier. The size of the input signal depends on the How wide should the B input buses be? parameter value.
datab_real[] | Yes | Real input value for the data B signal of the complex multiplier. The size of the input signal depends on the How wide should the B input buses be? parameter value.
ena | No | Active high clock enable for the clock signal of the complex multiplier.

Table 45. ALTMULT_COMPLEX Intel FPGA IP Output Signals


Signal | Required | Description
result_imag | Yes | Imaginary output value of the multiplier. The size of the output signal depends on the WIDTH_RESULT parameter value.
result_real | Yes | Real output value of the multiplier. The size of the output signal depends on the WIDTH_RESULT parameter value.




9. LPM_MULT Intel FPGA IP Core References

The LPM_MULT Intel FPGA IP core implements a multiplier that multiplies two input data values to produce a product as an output.

Figure 45. LPM_MULT Intel FPGA IP Core Architecture

[Block diagram. Inputs: dataa[], datab[], aclr/sclr, clken, clock. Output: result[].]

9.1. LPM_MULT Intel FPGA IP Release Information






Table 46. LPM_MULT Intel FPGA IP Release Information

Item Description

IP Version 19.1.0



9.2. Features

The LPM_MULT Intel FPGA IP core offers the following features:








• Generates a multiplier that multiplies two input data values

• Supports data width of 1–256 bits


• Supports area or speed optimization

• Supports pipelining with configurable output latency

• Provides an option for implementation in dedicated digital signal processing (DSP) block circuitry or logic elements (LEs)

Note: When building multipliers larger than the natively supported size, there may be a performance impact resulting from the cascading of the DSP blocks.


9.3. Parameters

You can customize the Intel Stratix 10 LPM_MULT Intel FPGA IP core by specifying the parameters using the IP Parameter Editor in the Intel Quartus Prime software.

9.3.1. General Tab

Table 47. General Tab


Parameter | Value | Default Value | Description

Multiplier Configuration

Type | Multiply 'dataa' input by 'datab' input; Multiply 'dataa' input by itself (squaring operation) | Multiply 'dataa' input by 'datab' input | Select the desired configuration for the multiplier.

Data Port Widths

Dataa width | 1 - 256 bits | 8 bits | Specify the width of the dataa[] port.
Datab width | 1 - 256 bits | 8 bits | Specify the width of the datab[] port.

How should the width of the 'result' output be determined?

Type | Automatically calculate the width; Restrict the width | Automatically calculate the width | Select the desired method to determine the width of the result[] port.
Value | 1 - 512 bits | 16 bits | Specify the width of the result[] port. This value is effective only if you select Restrict the width in the Type parameter.
Result width | 1 - 512 bits | - | Displays the effective width of the result[] port.
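As a point of reference for choosing a result width, the full product of two operands never needs more bits than the sum of the two operand widths, which is consistent with the 8-bit, 8-bit, and 16-bit defaults in the table. The sketch below checks this for unsigned operands. It is an arithmetic illustration only; the exact rule applied by the Automatically calculate the width option is an assumption here, and the function name `full_result_width` is hypothetical.

```python
def full_result_width(width_a, width_b):
    """Bits needed to hold any product of a width_a-bit and a width_b-bit value."""
    return width_a + width_b


width_a, width_b = 8, 8
largest_product = (2 ** width_a - 1) * (2 ** width_b - 1)  # largest unsigned product
print(full_result_width(width_a, width_b))
print(largest_product.bit_length())
```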



9.3.2. General 2 Tab

Table 48. General 2 Tab


Parameter | Value | Default Value | Description

Datab Input

Does the 'datab' input bus have a constant value? | No, Yes | No | Select Yes to specify the constant value of the ‘datab’ input bus, if any.
Value | Any value greater than 0 | 0 | Specify the constant value of the datab[] port.

Multiplication Type

Which type of multiplication do you want? | Unsigned, Signed | Unsigned | Specify the representation format for both dataa[] and datab[] inputs.

Implementation Style

Which multiplier implementation should be used? | Use the default implementation; Use the dedicated multiplier circuitry (Not available for all families); Use logic elements | Use the default implementation | Select the desired multiplier implementation. When SCLR is selected for the Clear Signal Type parameter, only the Use the dedicated multiplier circuitry (Not available for all families) option is available.

9.3.3. Pipelining Tab

Table 49. Pipelining Tab


Parameter | Value | Default Value | Description

Do you want to pipeline the function?

Pipeline | No, Yes | No | Select Yes to enable the pipeline register on the multiplier's output. Enabling the pipeline register adds extra latency to the output.
Latency | Any value greater than 0 | 1 | Specify the desired output latency in clock cycles.
Clear Signal Type | NONE, ACLR, SCLR | NONE | Specify the type of reset for the pipeline register. Select NONE if you do not use any pipeline register. Select ACLR to use asynchronous clear for the pipeline register. This generates the ACLR port. Select SCLR to use synchronous clear for the pipeline register. This generates the SCLR port.
Create a 'clken' clock enable clock | Off, On | Off | Specifies an active high clock enable for the clock port of the pipeline register.

What type of optimization do you want?

Type | Default, Speed, Area | Default | Specify the desired optimization for the IP core. Select Default to let the Intel Quartus Prime software determine the best optimization for the IP core.
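The interaction between the Latency setting and the clock enable can be pictured with a simple cycle-level model. This is a sketch under stated assumptions: each call models one rising clock edge, the pipeline is a chain of registers after the multiplier, and uninitialized registers are represented by None. The clear inputs are omitted. The class name `PipelinedMultiplier` is hypothetical.

```python
class PipelinedMultiplier:
    """Cycle-level model of a multiplier followed by `latency` output registers."""

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be at least 1")
        self.stages = [None] * latency  # None models an undefined register

    def clock(self, a, b, clken=1):
        """Apply one rising clock edge and return the value now at the output."""
        if clken:  # registers only update while clken is high
            self.stages = [a * b] + self.stages[:-1]
        return self.stages[-1]


mult = PipelinedMultiplier(latency=2)
print(mult.clock(2, 3))           # product captured, output still undefined
print(mult.clock(4, 5))           # first product reaches the output
print(mult.clock(0, 0, clken=0))  # clken low: pipeline holds its state
print(mult.clock(0, 0))           # second product reaches the output
```

In this model a product appears at the output on the edge numbered `latency`, counting the edge that captured its operands as the first, and edges with clken low are not counted.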




9.4. Signals

Table 50. LPM_MULT Intel FPGA IP Core Input Signals

Signal Name | Required | Description
dataa[] | Yes | Data input. The size of the input signal depends on the Dataa width parameter value.
datab[] | Yes | Data input. The size of the input signal depends on the Datab width parameter value.
clock | No | Clock input for pipelined usage. For Latency values other than 1 (default), the clock signal must be enabled.
clken | No | Clock enable for pipelined usage. When the clken signal is asserted high, the multiplier operation takes place. When the signal is low, no operation occurs. If omitted, the default value is 1.
aclr | No | Asynchronous clear signal used at any time to reset the pipeline to all 0s, asynchronously to the clock signal. The pipeline initializes to an undefined (X) logic level. The outputs are a consistent, but non-zero value.
sclr | No | Synchronous clear signal used at any time to reset the pipeline to all 0s, synchronously to the clock signal. The pipeline initializes to an undefined (X) logic level. The outputs are a consistent, but non-zero value.

Table 51. LPM_MULT Intel FPGA IP Output signals

signal Name Required Description

result[] Yes Data output.The size of the output signals depends on the Result width parameter.




10. Native Floating Point DSP Intel Stratix 10 FPGA IP References

The Native Floating Point DSP Intel Stratix 10 FPGA IP instantiates and controls a single Intel Stratix 10 Variable Precision DSP block.

Related Information

Block Architecture Overview on page 9
More information related to functional blocks in Intel Stratix 10 Floating-Point DSP IP core.

10.1. Native Floating Point DSP Intel Stratix 10 FPGA IP Release Information

Table 52. Native Floating Point DSP Intel Stratix 10 FPGA IP Release Information

| Item | Description |
|---|---|
| IP Version | 19.1.0 |










10.2. Native Floating Point DSP Intel Stratix 10 FPGA IP Core Supported Operational Modes

Table 53. Operational Modes Supported by Native Floating Point DSP Intel Stratix 10 FPGA IP Core

| Operational Modes | Description | Supported Exception Flags |
|---|---|---|
| Multiply mode | This mode performs a single-precision multiplication operation. This mode applies the following equation: Out = Ay * Az | mult_overflow; mult_underflow; mult_inexact; mult_invalid |
| Add mode | This mode performs a single-precision addition or subtraction operation. This mode applies the following equations: Out = Ay + Ax; Out = Ay - Ax | adder_overflow; adder_underflow; adder_inexact; adder_invalid |
| Multiply Add mode | This mode performs single-precision multiplication, followed by an addition or subtraction operation. This mode applies the following equations: Out = (Ay * Az) - chainin; Out = (Ay * Az) + chainin; Out = (Ay * Az) - Ax; Out = (Ay * Az) + Ax | mult_overflow; mult_underflow; mult_inexact; mult_invalid; adder_overflow; adder_underflow; adder_inexact; adder_invalid |
| Multiply Accumulate mode | This mode performs floating-point multiplication followed by floating-point addition or subtraction with the previous multiplication result. This mode applies the following equations: Out(t) = [Ay(t) * Az(t)] - Out(t-1) when the accumulate signal is driven high; Out(t) = [Ay(t) * Az(t)] + Out(t-1) when the accumulate signal is driven high; Out(t) = Ay(t) * Az(t) when the accumulate signal is driven low. | mult_overflow; mult_underflow; mult_inexact; mult_invalid; adder_overflow; adder_underflow; adder_inexact; adder_invalid |
| Vector Mode 1 | This mode performs floating-point multiplication followed by floating-point addition or subtraction with the chainin input from the previous variable precision DSP block. This mode applies the following equations: Out = (Ay * Az) - chainin, chainout = Ax; Out = (Ay * Az) + chainin, chainout = Ax; Out = (Ay * Az), chainout = Ax | mult_overflow; mult_underflow; mult_inexact; mult_invalid; adder_overflow; adder_underflow; adder_inexact; adder_invalid |
| Vector Mode 2 | This mode performs floating-point multiplication where the multiplication result is fed directly to chainout. The chainin input from the previous variable precision DSP block is then added to or subtracted from input Ax to produce the output result. This mode applies the following equations: Out = Ax - chainin, chainout = Ay * Az; Out = Ax + chainin, chainout = Ay * Az; Out = Ax, chainout = Ay * Az | mult_overflow; mult_underflow; mult_inexact; mult_invalid; adder_overflow; adder_underflow; adder_inexact; adder_invalid |

Related Information

• Single Floating-Point Arithmetic Functions on page 29

• Multiple Floating-Point Arithmetic Functions on page 32
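
The equations listed for each operational mode can be sketched as a small software reference model. The Python below is an illustration only, not the IP core's implementation. All function and class names are hypothetical. It assumes that each multiplication and each addition is rounded separately to single precision with round-to-nearest (no fused operation), and it does not model pipeline latency, exception flags, or the DSP block's treatment of subnormal values.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply(ay: float, az: float) -> float:
    """Multiply mode: Out = Ay * Az."""
    return f32(f32(ay) * f32(az))


def add(ay: float, ax: float, subtract: bool = False) -> float:
    """Add mode: Out = Ay + Ax, or Out = Ay - Ax when subtract is True."""
    ay, ax = f32(ay), f32(ax)
    return f32(ay - ax if subtract else ay + ax)


def multiply_add(ay: float, az: float, addend: float, subtract: bool = False) -> float:
    """Multiply Add mode: Out = (Ay * Az) +/- addend (addend is chainin or Ax)."""
    return add(multiply(ay, az), addend, subtract)


class MultiplyAccumulate:
    """Multiply Accumulate mode; self.out holds Out(t-1) between steps."""

    def __init__(self) -> None:
        self.out = 0.0

    def step(self, ay: float, az: float, accumulate: bool, subtract: bool = False) -> float:
        product = multiply(ay, az)
        # accumulate high: Out(t) = product +/- Out(t-1); low: Out(t) = product
        self.out = add(product, self.out, subtract) if accumulate else product
        return self.out


def vector_mode_1(ax, ay, az, chainin, op="+"):
    """Vector Mode 1: returns (Out, chainout); op is "+", "-", or None."""
    product = multiply(ay, az)
    out = product if op is None else add(product, chainin, subtract=(op == "-"))
    return out, f32(ax)


def vector_mode_2(ax, ay, az, chainin, op="+"):
    """Vector Mode 2: returns (Out, chainout); op is "+", "-", or None."""
    out = f32(ax) if op is None else add(ax, chainin, subtract=(op == "-"))
    return out, multiply(ay, az)
```

In this sketch, chaining two blocks means passing the chainout value returned by one call as the chainin argument of the next, for example feeding the second element of vector_mode_2(...) into vector_mode_1(...).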



10.3. Parameterizing the Native Floating Point DSP Intel Stratix 10 FPGA IP

Select different parameters to create an IP core suitable for your design.

1. In Intel Quartus Prime Pro Edition, create a new project that targets an Intel Stratix 10 device.

2. In IP Catalog, click Library ➤ DSP ➤ Primitive DSP ➤ Native Floating Point DSP Intel Stratix 10 FPGA IP. The Native Floating Point DSP Intel Stratix 10 FPGA IP Core IP parameter editor opens.

3. In the New IP Variation dialog box, enter an Entity Name and click OK.

4. Under Parameters, select the DSP Template and the View you want for your IP core.

5. In the DSP Block View, switch the clock or reset of each valid register.

6. For Multiply Add or Vector Mode 1, click the Chain In multiplexer in the GUI to select input from the chainin port or the Ax port.

7. Click the Adder symbol in the GUI to select addition or subtraction.

8. Click the Chain Out multiplexer in the GUI to enable chainout port.

9. Click Generate HDL.

10. Click Finish.

10.3.1. Native Floating Point DSP Intel Stratix 10 FPGA IP Parameters

Table 54. Parameters

| Parameter | Value | Default Value | Description |
|---|---|---|---|
| DSP Template | Multiply; Add; Multiply Add; Multiply Accumulate; Vector Mode 1; Vector Mode 2 | Multiply | Select the desired operational mode for the DSP block. The selected operation is reflected in the DSP Block View. |
| View | Register Enables; Register Clears | Register Enables | Options to select the clocking scheme or the reset scheme for the registers view. The selected operation is reflected in the DSP Block View. Select Register Enables for the DSP Block View to show the register clocking scheme. You can change the clock for each register in this view. Select Register Clears for the DSP Block View to show the register reset scheme. Turn on Use Single Clear to change the register reset scheme. |
| Clear Type | None; Synchronous; Asynchronous | Synchronous | Options to select the reset type for all registers. Select None to not reset the registers. Select Synchronous to use a synchronous clear signal for all registers. Select Asynchronous to use an asynchronous clear signal for all registers. |
| Single Clear | On; Off | Off | Turn on this parameter if you want a single reset to reset all the registers in the DSP block. Turn off this parameter to use different reset ports to reset the registers. This parameter is disabled when you select None for Clear Type. |
| Connect Exception Flags | On; Off | Off | Click this parameter to use and generate exception flag output ports for the DSP block. When you turn off this parameter, the IP core does not generate exception flag output ports. |
| DSP View Block | | | |
| Chain In Multiplexer (1) | Enable; Disable | Disable | Click the multiplexer to enable the chainin port. |
| Chain Out Multiplexer (2) | Disable; Enable | Disable | Click the multiplexer to enable the chainout port. |
| Adder (3) | +; - | + | Click the Adder symbol to select addition or subtraction mode. |
| Register Clock (4) | None; Clock 0; Clock 1; Clock 2 | Clock 0 | To bypass any register, switch the register clock to None. Switch the register clock to Clock 0 to use the clk[0] signal as the clock source, Clock 1 to use the clk[1] signal as the clock source, or Clock 2 to use the clk[2] signal as the clock source. You can change these settings only when you select Register Enables in the View parameter. |
| Register Clear (4) | Clear 0; Clear 1 | Clear 0 for input registers; Clear 1 for output and pipeline registers | This view shows the IP core reset scheme. Clear 0 uses the clr[0] signal. Clear 1 uses the clr[1] signal. All input registers use the clr[0] reset signal. All output and pipeline registers use the clr[1] reset signal. |


Figure 46. DSP View Block

10.4. Native Floating Point DSP Intel Stratix 10 FPGA IP Core Signals

Figure 47. Native Floating Point DSP Intel Stratix 10 FPGA IP Core Signals

The figure shows the input and output signals of the Native Floating Point DSP Intel Stratix 10 FPGA IP core.

[Figure: block diagram of the Native Floating Point DSP Intel Stratix 10 FPGA IP. Data input signals: ax[31:0], ay[31:0], az[31:0], chainin[31:0]. Dynamic control signal: accumulate. Clock, enable, and clear signals: clk[2:0], ena[2:0], clr[1:0]. Data output signals: result[31:0], chainout[31:0]. Exception flags output signals: mult_overflow, mult_underflow, mult_invalid, mult_inexact, adder_overflow, adder_underflow, adder_invalid, adder_inexact.]

Table 55. Native Floating Point DSP Intel Stratix 10 FPGA IP Core Signals

| Signal Name | Type | Width | Default | Description |
|---|---|---|---|---|
| ax[31:0] | Input | 32 | Low | Input data bus to the multiplier. Available in: Add mode; Multiply-Add mode without chainin and chainout feature; Vector Mode 1; Vector Mode 2. |
| ay[31:0] | Input | 32 | Low | Input data bus to the multiplier. Available in all floating-point operational modes. |
| az[31:0] | Input | 32 | Low | Input data bus to the multiplier. Available in: Multiply; Multiply Add; Multiply Accumulate; Vector Mode 1; Vector Mode 2. |
| chainin[31:0] | Input | 32 | Low | Connect these signals to the chainout signals from the preceding floating-point DSP IP core. |
| clk[2:0] | Input | 3 | Low | Input clock signals for all registers. These clock signals are available only if any of the input registers, pipeline registers, or output register is set to Clock 0, Clock 1, or Clock 2. |
| ena[2:0] | Input | 3 | High | Clock enable for clk[2:0]. These signals are active-high. ena[0] is for Clock 0, ena[1] is for Clock 1, and ena[2] is for Clock 2. |
| clr[1:0] | Input | 2 | Low | These signals are active-high. Use clr[0] for all input registers and use clr[1] for all pipeline and output registers. |
| accumulate | Input | 1 | Low | Input signal to enable or disable the accumulator feature. Assert this signal to feed back the adder's output. De-assert this signal to disable the feedback mechanism. You can assert or de-assert this signal during run-time. Available in Multiply Accumulate mode. |
| chainout[31:0] | Output | 32 | — | Connect these signals to the chainin signals of the next floating-point DSP IP core. |
| result[31:0] | Output | 32 | — | Output data bus from the IP core. |
| mult_overflow | Output | 1 | — | This signal indicates whether the multiplier result is larger than the maximum representable value. 1: The multiplier result is larger than the maximum representable value and the result is cast to infinity. 0: The multiplier result is not larger than the maximum representable value. Not available in Add mode. |
| mult_underflow | Output | 1 | — | This signal indicates whether the multiplier result is smaller than the minimum representable value. 1: The multiplier result is smaller than the minimum representable value and the result is flushed to zero. 0: The multiplier result is larger than the minimum representable value. Not available in Add mode. |
| mult_inexact | Output | 1 | — | This signal indicates whether the multiplier result is an inexact representation. 1: The multiplier result is a rounded value, a value smaller than the minimum representable value, or a value larger than the maximum representable value. 0: The multiplier result does not meet any of the criteria above. Not available in Add mode. |
| mult_invalid | Output | 1 | — | This signal indicates whether the multiplier operation is ill-defined and produces an invalid result. 1: The multiplier result is invalid and cast to qNaN. 0: The multiplier result is not an invalid number. Not available in Add mode. |
| adder_overflow | Output | 1 | — | This signal indicates whether the adder result is larger than the maximum representable value. 1: The adder result is larger than the maximum representable value and the result is cast to infinity. 0: The adder result is not larger than the maximum representable value. Not available in Multiply mode. |
| adder_underflow | Output | 1 | — | This signal indicates whether the adder result is smaller than the minimum representable value. 1: The adder result is smaller than the minimum representable value and the result is flushed to zero. 0: The adder result is larger than the minimum representable value. Not available in Multiply mode. |
| adder_inexact | Output | 1 | — | This signal indicates whether the adder result is an inexact representation. 1: The adder result is a rounded value, a value smaller than the minimum representable value, or a value larger than the maximum representable value. 0: The adder result does not meet any of the criteria above. Not available in Multiply mode. |
| adder_invalid | Output | 1 | — | This signal indicates whether the adder operation is ill-defined and produces an invalid result. 1: The adder result is invalid and cast to qNaN. 0: The adder result is not an invalid number. Not available in Multiply mode. |

Related Information

Exception Handling for Floating-Point Arithmetic on page 19
More information related to exception flags for Intel Stratix 10 Floating-Point DSP block.
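
As a rough illustration of the multiplier flag definitions in the signals table, the following Python sketch derives the four mult_* flags from an exact product. It is a reading of the table wording, not the hardware's algorithm. The function name mult_flags is hypothetical. The sketch assumes that the "minimum representable value" is the smallest normal single-precision number (2^-126), that the overflow and underflow comparisons apply to the exact product before rounding, and that 0 multiplied by infinity is the only ill-defined case modeled. NaN inputs are not modeled.

```python
import math
import struct
from fractions import Fraction

# Largest finite single-precision value: (2 - 2**-23) * 2**127
MAX_F32 = Fraction((2**24 - 1) * 2**104)
# Assumption: the "minimum representable value" is the smallest normal number
MIN_NORMAL_F32 = Fraction(1, 2**126)


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mult_flags(ay: float, az: float) -> dict:
    """Return the four multiplier flags for Out = Ay * Az (illustrative only)."""
    if math.isnan(ay) or math.isnan(az):
        raise ValueError("NaN inputs are not modeled in this sketch")
    flags = {"mult_overflow": 0, "mult_underflow": 0,
             "mult_inexact": 0, "mult_invalid": 0}

    # Ill-defined operation: zero times infinity (result would be qNaN)
    if (ay == 0 and math.isinf(az)) or (math.isinf(ay) and az == 0):
        flags["mult_invalid"] = 1
        return flags
    if math.isinf(ay) or math.isinf(az):
        return flags  # infinity times a nonzero value: no flag in this sketch

    exact = Fraction(ay) * Fraction(az)  # exact rational product
    magnitude = abs(exact)
    if magnitude > MAX_F32:
        # Larger than the maximum representable value: cast to infinity
        flags["mult_overflow"] = 1
        flags["mult_inexact"] = 1
    elif 0 < magnitude < MIN_NORMAL_F32:
        # Smaller than the minimum representable value: flushed to zero
        flags["mult_underflow"] = 1
        flags["mult_inexact"] = 1
    elif Fraction(to_f32(float(exact))) != exact:
        # The product had to be rounded to fit single precision
        flags["mult_inexact"] = 1
    return flags
```

The adder flags could be modeled the same way by replacing the product with an exact sum or difference.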




11. LPM_DIVIDE (Divider) Intel FPGA IP Core

The LPM_DIVIDE Intel FPGA IP core implements a divider to divide a numerator input value by a denominator input value to produce a quotient and a remainder.

The following figure shows the ports for the LPM_DIVIDE IP core.

Figure 48. LPM_DIVIDE Ports

[Figure: LPM_DIVIDE block symbol (instance inst). Input ports: numer[], denom[], clock, clken, aclr. Output ports: quotient[], remain[].]

11.1. LPM_DIVIDE Intel FPGA IP Release Information






Table 56. LPM_DIVIDE Intel FPGA IP Release Information

| Item | Description |
|---|---|
| IP Version | 19.1 |










11.2. Features

The LPM_DIVIDE IP core offers the following features:

• Generates a divider that divides a numerator input value by a denominator input value to produce a quotient and a remainder.

• Supports data width of 1–256 bits.

• Supports signed and unsigned data representation format for both the numerator and denominator values.

• Supports area or speed optimization.

• Provides an option to specify a positive remainder output.

• Supports pipelining with configurable output latency.

• Supports optional asynchronous clear and clock enable ports.

11.3. Verilog HDL Prototype

The following Verilog HDL prototype is located in the Verilog Design File (.v) lpm.v in the <Intel Quartus Prime installation directory>\eda\synthesis directory.

    module lpm_divide (quotient, remain, numer, denom, clock, clken, aclr);
        parameter lpm_type = "lpm_divide";
        parameter lpm_widthn = 1;
        parameter lpm_widthd = 1;
        parameter lpm_nrepresentation = "UNSIGNED";
        parameter lpm_drepresentation = "UNSIGNED";
        parameter lpm_remainderpositive = "TRUE";
        parameter lpm_pipeline = 0;
        parameter lpm_hint = "UNUSED";
        input clock;
        input clken;
        input aclr;
        input [lpm_widthn-1:0] numer;
        input [lpm_widthd-1:0] denom;
        output [lpm_widthn-1:0] quotient;
        output [lpm_widthd-1:0] remain;
    endmodule

11.4. VHDL Component Declaration

The VHDL component declaration is located in the VHDL Design File (.vhd) LPM_PACK.vhd in the <Intel Quartus Prime installation directory>\libraries\vhdl\lpm directory.

    component LPM_DIVIDE
        generic (
            LPM_WIDTHN : natural;
            LPM_WIDTHD : natural;
            LPM_NREPRESENTATION : string := "UNSIGNED";
            LPM_DREPRESENTATION : string := "UNSIGNED";
            LPM_PIPELINE : natural := 0;
            LPM_TYPE : string := L_DIVIDE;
            LPM_HINT : string := "UNUSED"
        );
        port (
            NUMER : in std_logic_vector(LPM_WIDTHN-1 downto 0);
            DENOM : in std_logic_vector(LPM_WIDTHD-1 downto 0);
            ACLR : in std_logic := '0';
            CLOCK : in std_logic := '0';
            CLKEN : in std_logic := '1';
            QUOTIENT : out std_logic_vector(LPM_WIDTHN-1 downto 0);
            REMAIN : out std_logic_vector(LPM_WIDTHD-1 downto 0)
        );
    end component;

11.5. VHDL LIBRARY_USE Declaration

The VHDL LIBRARY-USE declaration is not required if you use the VHDL Component Declaration.

    LIBRARY lpm;
    USE lpm.lpm_components.all;

11.6. Ports

The following tables list the input and output ports for the LPM_DIVIDE IP core.

Table 57. LPM_DIVIDE Input Ports

| Port Name | Required | Description |
|---|---|---|
| numer[] | Yes | Numerator data input. The size of the input port depends on the LPM_WIDTHN parameter value. |
| denom[] | Yes | Denominator data input. The size of the input port depends on the LPM_WIDTHD parameter value. |
| clock | No | Clock input for pipelined usage. For LPM_PIPELINE values other than 0 (default), the clock port must be enabled. |
| clken | No | Clock enable for pipelined usage. When the clken port is asserted high, the division operation takes place. When the signal is low, no operation occurs. If omitted, the default value is 1. |
| aclr | No | Asynchronous clear port used at any time to reset the pipeline to all '0's asynchronously to the clock input. |

Table 58. LPM_DIVIDE Output Ports

| Port Name | Required | Description |
|---|---|---|
| quotient[] | Yes | Data output. The size of the output port depends on the LPM_WIDTHN parameter value. |
| remain[] | Yes | Data output. The size of the output port depends on the LPM_WIDTHD parameter value. |
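
The interaction of the clock, clken, and aclr ports with a pipelined divider can be sketched with a cycle-level software model. The Python below is a simplified illustration, not the IP core's internal structure. The class name and method names are hypothetical, only unsigned operands and a nonzero denominator are considered, and the pipeline is assumed to behave as a plain shift register of results that is lpm_pipeline stages deep.

```python
from collections import deque


class PipelinedDivider:
    """Behavioral model of an unsigned divider with lpm_pipeline >= 1 stages."""

    def __init__(self, lpm_pipeline: int) -> None:
        if lpm_pipeline < 1:
            raise ValueError("this sketch models pipelined configurations only")
        # Each stage holds a (quotient, remain) pair; start with all zeros
        self.stages = deque([(0, 0)] * lpm_pipeline, maxlen=lpm_pipeline)

    def aclr(self) -> None:
        """Asynchronous clear: reset every pipeline stage to all 0s."""
        for i in range(len(self.stages)):
            self.stages[i] = (0, 0)

    def tick(self, numer: int, denom: int, clken: int = 1) -> tuple:
        """Apply one rising clock edge and return the (quotient, remain) outputs."""
        if clken:
            # New result enters the pipeline; the oldest one drops off the far
            # end because the deque has a fixed maximum length
            self.stages.appendleft(divmod(numer, denom))
        # When clken is low, no operation occurs and the outputs hold
        return self.stages[-1]
```

With lpm_pipeline set to 2 in this model, the result for operands applied at one clock edge appears at the outputs on the following edge, and holding clken low freezes the outputs at their current values.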

11.7. Parameters

The following table lists the parameters for the LPM_DIVIDE Intel FPGA IP core.




11.7.1. General Tab

| Parameter Name | Value | Default Value | Description |
|---|---|---|---|
| How wide should the 'numerator' input bus be? | 1–64 | 8 | Specifies the widths of the numer[] and quotient[] ports. |
| How wide should the 'denominator' input bus be? | 1–64 | 8 | Specifies the widths of the denom[] and remain[] ports. Values are 1 to 64. |
| Numerator Representation | Unsigned; Signed | Unsigned | Sign representation of the numerator input. When this parameter is set to Signed, the divider interprets the numer[] input as signed two's complement. |
| Denominator Representation | Unsigned; Signed | Unsigned | Sign representation of the denominator input. When this parameter is set to Signed, the divider interprets the denom[] input as signed two's complement. |

11.7.2. General1 Tab

| Parameter Name | Value | Default Value | Description |
|---|---|---|---|
| Pipelining | | | |
| Output latency | 0–14 | 0 | Specifies the number of clock cycles of latency associated with the quotient[] and remain[] outputs. A value of zero (0) indicates that no latency exists, and that a purely combinational function is instantiated. If omitted, the default value is 0 (non-pipelined). You cannot specify a value for the Output latency parameter that is higher than the value specified in the How wide should the 'numerator' input bus be? parameter. |
| Create an asynchronous Clear input? | On; Off | Off | Select this option to create the aclr signal. |
| Create a Clock Enable Input? | On; Off | Off | Select this option to create the clken signal for the IP clock. |
| Optimization | | | |
| Which do you wish to optimize? | Default Optimization; Area; Speed | Default Optimization | Specify the type of optimization for a specific instance of the IP. Default Optimization: the Intel Quartus Prime software applies its default optimization technique to the specific instance of the IP. Area: the Intel Quartus Prime software optimizes routability for the specific instance of the IP. Speed: the Intel Quartus Prime software optimizes speed by using carry chains for the specific instance of the IP. |
| Remainder | | | |
| Always return a positive remainder? | Yes; No | Yes | To reduce area and improve speed, Intel recommends setting this parameter to Yes in operations where the remainder must be positive or is unimportant. |
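
The effect of the representation and remainder parameters, and the constraint on Output latency, can be illustrated with a short Python sketch. This is an assumption-laden model, not the IP core's documented arithmetic. The function names are hypothetical. It assumes that with the positive-remainder option off the quotient truncates toward zero (so the remainder takes the sign of the numerator), and that with the option on the quotient is adjusted so that the remainder is non-negative. In both cases numer = quotient * denom + remain holds. Division by zero is not modeled.

```python
def lpm_divide_model(numer: int, denom: int, remainder_positive: bool = True) -> tuple:
    """Return (quotient, remain) for signed integer operands (illustrative)."""
    if denom == 0:
        raise ZeroDivisionError("division by zero is not modeled")
    # Assumed baseline: truncate the quotient toward zero
    quotient = abs(numer) // abs(denom)
    if (numer < 0) != (denom < 0):
        quotient = -quotient
    remain = numer - quotient * denom
    if remainder_positive and remain < 0:
        # Shift the quotient by one so that the remainder becomes non-negative
        if denom > 0:
            quotient -= 1
            remain += denom
        else:
            quotient += 1
            remain -= denom
    return quotient, remain


def latency_is_valid(numerator_width: int, output_latency: int) -> bool:
    """Check Output latency against the 0-14 range and the numerator width."""
    return 0 <= output_latency <= 14 and output_latency <= numerator_width
```

For example, under these assumptions dividing -7 by 2 gives a quotient of -4 and a remainder of 1 with the positive-remainder option on, and a quotient of -3 and a remainder of -1 with it off.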




12. Intel Stratix 10 Variable Precision DSP Blocks User Guide Document Archives

If the table does not list a software version, the user guide for the previous software version applies.

| Intel Quartus Prime Version | User Guide |
|---|---|
| 18.1 | Intel Stratix 10 Variable Precision DSP Blocks User Guide (https://www.altera.com/en_US/pdfs/literature/hb/stratix-10/archives/ug-s10-dsp-18-1.pdf) |
| 17.1 | Intel Stratix 10 Variable Precision DSP Blocks User Guide (https://www.altera.com/en_US/pdfs/literature/hb/stratix-10/archives/ug-s10-dsp-17-1.pdf) |





13. Document Revision History for Intel Stratix 10 Variable Precision DSP Blocks User Guide

| Document Version | Intel Quartus Prime Version | Changes |
|---|---|---|
| 2019.10.22 | 19.3 | • Updated number of multipliers for Intel Stratix 10 TX 400, DX 1100, DX 2100, and DX 2800 devices in Resources section. • Added IP release information for: Native Fixed Point DSP Intel Stratix 10 version 19.1.0; ALTMULT_COMPLEX Intel FPGA IP version 19.1.0; Multiply Adder Intel FPGA IP version 19.1.0; LPM_MULT Intel FPGA IP version 19.1.0; Native Floating Point DSP Intel Stratix 10 version 19.1.0. • Added information about LPM_DIVIDE Intel FPGA IP version 19.1. • Updated input and output register bank reset behavior in Input Register Bank for Fixed-point and Floating-point Arithmetic and Output Register Bank for Fixed-point Arithmetic topics. |
| 2018.09.24 | 18.1 | • Updated resource count for device GX 1100, SX 1100, and MX 1100 in Number of Multipliers in Intel Stratix 10 Devices table. |
| 2018.05.07 | 18.0 | • Updated default value for How wide should result scanout width? parameter in General Parameters table for Native Fixed Point DSP Intel Stratix 10 FPGA IP core. • Added What is the value for loadconst? parameter in Coefficient Configuration table for Native Fixed Point DSP Intel Stratix 10 FPGA IP core. • Added a footnote to the supported operation instance for the fixed-point independent 18 x 19 multiplication operation mode in the Supported Combinations of Operational Modes and Features for Variable Precision DSP Block in the Intel Stratix 10 Devices table. • Added the subtract feature in the Fixed-Point Arithmetic DSP Implementation in the Block Architecture table. • Added subtractor to the one-point floating arithmetic in the Adder or Subtractor for Fixed-Point and Floating-Point Arithmetic section. • Added the Enable double accumulator parameter in the Accumulator Tab table. • Updated the figure title of the One Sum of Two 18 x 18 or 18 x 19 Multipliers with One Variable Precision DSP Block for Intel Stratix 10 Devices. • Minor editorial edits. • Updated all IP names as per Intel rebranding. |

| Date | Version | Changes |
|---|---|---|
| November 2017 | 2017.11.06 | • Updated resource count for Intel Stratix 10 MX and TX variants. • Updated Maximum Input Data Width for Fixed-Point Arithmetic. • Introduced new m18x18_full_top operation mode in Native Fixed Point DSP IP core. • Updated systolic registers implementation guidelines. • Added note to clarify the floating-point exception handling flags are referring to output value exceptions in Multiplier Exception Handling Possible Results and Adder Exception Handling Possible Results tables. • Updated description for exception handling flags in Multiplier Exception Handling Possible Results and Adder Exception Handling Possible Results tables. • Updated Native Fixed Point DSP Parameter settings with DSP Block View description. • Rebranded ALTERA_MULT_ADD IP core to Multiply Adder. • Removed aclr[1:0] signal from Native Floating Point DSP Intel Stratix 10 FPGA IP core. |
| May 2017 | 2017.05.08 | Updated the behavior description of the sload_accum and accum_sload signals in the ALTERA_MULT_ADD Input Signals table. |
| October 2016 | 2016.10.31 | Initial release. |
