Spring 2010 EECS150 - Lec03-FPGA Page
EECS150 - Digital DesignLecture 3 - Field Programmable
Gate Arrays (FPGAs)January 26, 2010
John Wawrzynek
1
Spring 2010 EECS150 - Lec03-FPGA Page
FPGA Overview• Basic idea: two-dimensional array of logic blocks and flip-flops with a means
for the user to configure (program): 1. the interconnection between the logic blocks, 2. the function of each block.
Simplified version of FPGA internal architecture:
2
Spring 2010 EECS150 - Lec03-FPGA Page
Die Photos: Virtex FPGA vs. Pentium IV
• FGPA Vertex chip looks remarkably structured– Very dense, very regular structure
• “Full-Custom” Pentium chip somewhat more random in structure– Large on-chip memories (caches) are visible
3
Spring 2010 EECS150 - Lec03-FPGA Page
FPGAs are in widespread use
4
INSIDE
Make MicroBlaze Processing Roar With Hardware Acceleration
FPGAs Help CERN TrackParticles Approaching Speed of Light
Hardware Trumps Software in Medical Device Design
Taming Power Draw inConsumer MPUs
INSIDE
Make MicroBlaze Processing Roar With Hardware Acceleration
FPGAs Help CERN TrackParticles Approaching Speed of Light
Hardware Trumps Software in Medical Device Design
Taming Power Draw inConsumer MPUs
FPGAs Power Net-CentricBattlefield on Many Fronts
FPGAs Power Net-CentricBattlefield on Many Fronts
INSIDE
Algorithm Developers PowerNew DA System on XilinxAutomotive FPGA Platform
Engineer Turns Blown Engine into Hot Startup
How to Beat Your Sonat Guitar Hero Using Xilinx FPGA
Tips and Tricks for Using FPGA Editor and SystemVerilog
INSIDE
Algorithm Developers PowerNew DA System on XilinxAutomotive FPGA Platform
Engineer Turns Blown Engine into Hot Startup
How to Beat Your Sonat Guitar Hero Using Xilinx FPGA
Tips and Tricks for Using FPGA Editor and SystemVerilog
Automotive Innovators Hit Top Gear in
Driver Assistance with FPGA Platforms
Automotive Innovators Hit High Gear in Driver Assistance
with FPGA Platforms
SUBSCRIBE
Plugging into High-VolumeConsumer Products
Plugging into High-VolumeConsumer Products
HIGH VOLUMESpartan-3E: A New Era
Multimedia for Automotive
DSP Algorithms
DESIGN TOOLSNew ISE 7.1i Software
Control Your Designs
SERIAL I/OExtend Your Reach
HIGH VOLUMESpartan-3E: A New Era
Multimedia for Automotive
DSP Algorithms
DESIGN TOOLSNew ISE 7.1i Software
Control Your Designs
SERIAL I/OExtend Your Reach
Far more designs are implemented in FPGA than in custom chips.
Spring 2010 EECS150 - Lec03-FPGA Page
Background (review) for upcoming
• A MUX or multiplexor is a combinational logic circuit that chooses between 2N inputs under the control of N control signals.
• A latch is a 1-bit memory (similar to a flip-flop).
5
Spring 2010 EECS150 - Lec03-FPGA Page
FPGA Variations• Families of FPGA’s differ in:
– physical means of implementing user programmability,
– arrangement of interconnection wires, and
– the basic functionality of the logic blocks.
• Most significant difference is in the method for providing flexible blocks and connections:
• Anti-fuse based (ex: Actel)
+ Non-volatile, relatively small
– fixed (non-reprogrammable)
6
Spring 2010 EECS150 - Lec03-FPGA Page
User Programmability
• Latches are used to:1. control a switch to make or
break cross-point connections in the interconnect
2. define the function of the logic blocks
3. set user options:
• within the logic blocks
• in the input/output blocks
• global reset/clock
• “Configuration bit stream” is loaded under user control
• Latch-based (Xilinx, Altera, …)
+ reconfigurable
– volatile
– relatively large.
7
Spring 2010 EECS150 - Lec03-FPGA Page
Idealized FPGA Logic Block
• 4-input look up table (LUT)– implements combinational logic functions
• Register– optionally stores output of LUT
8
Spring 2010 EECS150 - Lec03-FPGA Page
4-LUT Implementation• n-bit LUT is implemented as a 2n x
1 memory:– inputs choose one of 2n memory
locations.
– memory locations (latches) are normally loaded with values from user’s configuration bit stream.
– Inputs to mux control are the CLB inputs.
• Result is a general purpose “logic gate”.
– n-LUT can implement any function of n inputs!
9
Spring 2010 EECS150 - Lec03-FPGA Page
LUT as general logic gate• An n-lut as a direct implementation of
a function truth-table.
• Each latch location holds the value of the function corresponding to one input combination.
Example: 4-lut
Example: 2-lut
Implements any function of 2 inputs.
How many of these are there?How many functions of n inputs?
10
Spring 2010 EECS150 - Lec03-FPGA Page
FPGA Generic Design Flow
• Design Entry:– Create your design files using:
• schematic editor or • HDL (hardware description languages: Verilog, VHDL)
• Design Implementation:– Logic synthesis (in case of using HDL entry) followed by,– Partition, place, and route to create configuration bit-
stream file• Design verification:
– Optionally use simulator to check function,– Load design onto FPGA device (cable connects PC to
development board), optional “logic scope” on FPGA• check operation at full speed in real environment.
11
Spring 2010 EECS150 - Lec03-FPGA Page
Example Partition, Placement, and Route
• Example Circuit:– collection of gates and flip-flops
• Idealized FPGA structure:
Circuit combinational logic must be “covered” by 4-input 1-output LUTs.Flip-flops from circuit must map to FPGA flip-flops. (Best to preserve “closeness” to CL to minimize wiring.)Best placement in general attempts to minimize wiring.
12Vdd, GND, clock, and global resets are all “prewired”.
Spring 2010 EECS150 - Lec03-FPGA Page
Example Partition, Placement, and Route
• Example Circuit:– collection of gates and flip-flops
13
Two partitions. Each has single output, no more than 4 inputs, and no more than 1 flip-flop. In this case, inverter goes in both partitions. Note: the partition can be arbitrarily large as long as it has not more than 4 inputs and 1 output, and no more than 1 flip-flop.
A
A
B
B
INOUT
Spring 2010 EECS150 - Lec03-FPGA Page
Xilinx FPGAs (interconnect detail)
14
Spring 2010 EECS150 - Lec03-FPGA Page
Project platform: Xilinx ML505-110
15
Spring 2010 EECS150 - Lec03-FPGA Page
FPGA: Xilinx Virtex-5 XC5VLX110TVirtex-5 “die photo”
A die is an unpackaged part
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
16
Spring 2010 EECS150 - Lec03-FPGA Page
From die to PC board ...
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
2 www.xilinx.com XAPP426 (v1.3) March 3, 2006
Implementing Xilinx Flip-Chip BGA PackagesR
*Xilinx flip-chip packages are not hermetically sealed and exposure to cleaning solvents/ chemicals or excessive moisture during boards assembly could pose serious package reliability concerns. Small vents are kept by design between the heatspreader (lid) and the organic substrate to allow for outgassing and moisture evaporation. Solvents or other corrosive chemicals could seep through these vents and attack the organic materials and components inside the package and hence are strongly discouraged during board assembly of Xilinx flip-chip BGA packages.
Recommended PCB Design Rules
For Xilinx BGA packages, NSMD (Non Solder Mask Defined) pads on the board are suggested. This allows a clearance between the land metal (diameter L) and the solder mask opening (diameter M) as shown in Figure 3. The space between the NSMD pad and the solder mask, and the actual signal trace widths depends on the capability of the PCB vendor. The cost of the PCB is higher when the line widths and spaces are tighter.
Figure 2: Package Construction with Type II Lid
Copper Heatspreader
Solder Ball
Flip Chip Solder Bump
Organic Build-Up Substrate
Underfill Epoxy
Silicon Die
Thermal Interface Material
Adhesive Epoxy*
Figure 3: Suggested Board Layout of Soldered Pads for BGA
Ball Grid Array (BGA)
Flip-Chip Package
17
Spring 2010 EECS150 - Lec03-FPGA Page
Banks of I/O placed on chip floor plan
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Virtex-5 FPGA User Guide www.xilinx.com 217UG190 (v4.2) May 9, 2008
SelectIO Resources General GuidelinesR
Virtex-5 I/O Bank RulesIn Virtex-5 devices, with some exceptions in the center column, an I/O bank consists of 40 IOBs (20 CLBs high and a single clock region). There are always four half-sized banks (20 IOBs) and a single configuration bank in the center column. The number of banks depends upon the device size, and in larger devices, there are additional full-sized banks in the center column. In the Virtex-5 Overview the total number of I/O banks is listed by device type. The XC5VLX30 has 12 usable I/O banks and one configuration bank. Figure 6-3 is an example of a columnar floorplan showing the XC5VLX30 I/O banks.
Reference Voltage (VREF) Pins
Low-voltage, single-ended I/O standards with a differential amplifier input buffer require an input reference voltage (VREF). VREF is an external input into Virtex-5 devices. Within each I/O bank, one of every 20 I/O pins is automatically configured as a VREF input, if using a single-ended I/O standard that requires a differential amplifier input buffer.
Output Drive Source Voltage (VCCO) Pins
Many of the low-voltage I/O standards supported by Virtex-5 devices require a different output drive voltage (VCCO). As a result, each device often supports multiple output drive source voltages.
Output buffers within a given VCCO bank must share the same output drive source voltage. The following input buffers use the VCCO voltage: LVTTL, LVCMOS, PCI, LVDCI and other DCI standards.
Figure 6-3: Virtex-5 XC5VLX30 I/O Banks
ug190_6_03_021306
BANK40 I/O
BANK20 I/OBANK20 I/O
BANK20 I/O
BANK20 I/O
BANK40 I/O
BANK40 I/O
BANK40 I/O
CONFIG
BANK40 I/O
BANK40 I/O
BANK40 I/O
BANK40 I/O
Virtex-5 FPGA Packaging and Pinout Specification www.xilinx.com 275UG195 (v4.3) June 18, 2008
FF676 Package—LX30R
Figure 3-8: FF676 Package—LX30 SelectIO Bank Diagram
12
34
56
78
910
1112
1314
1516
1718
1920
2122
2324
2526
ug195_c3_06_031507
12
34
56
78
910
1112
1314
1516
1718
1920
2122
2324
2526
ABCDEFGHJKLMNPRTUVWY
AAABACADAEAF
ABCDEFGHJKLMNPRTUVWYAAABACADAEAF
16 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 3 3 3 15 15 15 15 15 15 15 15 1514 14 14 12 12 12 3 3 3 3 3 3 3 3 3 11 11 11 11 13 13
14 14 12 12 12 3 3 3 3 1 3 1 3 3 3 11 11 11 13 1314 14 12 12 12 12 1 1 1 1 1 1 1 1 1 11 11 11 13 13 1314 14 14 12 12 12 1 1 1 1 1 1 1 1 1 11 11 11 11 13 1314 14 12 12 12 11 11 11 11 13 13 1314 14 14 12 12 12 11 11 11 11 13 13
14 12 12 12 12 11 11 11 11 13 1314 14 12 12 12 12 11 11 11 11 13 13 1314 14 12 12 12 12 11 11 11 11 13 1314 12 12 12 12 11 11 11 11 13 13 1314 14 12 12 12 12 17 17 11 11 13 13
14 12 18 18 18 17 17 17 17 13 1314 14 18 18 18 18 17 17 17 17 13 13 1314 14 18 18 18 18 17 17 17 17 17 1314 18 18 18 18 17 17 17 17 17 13 1314 14 18 18 18 18 2 2 2 2 2 2 17 17 17 17 13 13
14 18 18 18 18 2 2 2 2 2 2 2 2 2 17 17 17 17 17 1314 14 18 18 18 18 4 4 2 4 2 2 2 2 17 17 17 17 17 13 1314 14 18 18 18 4 4 4 4 4 4 4 4 4 4 17 17 17 17 17 1314 18 18 18 18 4 4 4 4 4 4 4 1714 14 18 18 18
14 18 18 18
Colors on this package pinout map to banks.
18
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Colors represent different types of resources:
LogicBlock RAMDSP (ALUs)ClockingI/OSerial I/O + PCI
A routing fabric runs throughout the chip to wire everything together.
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Routing fabric requires many interconnect layers.
Spring 2010 EECS150 - Lec03-FPGA Page
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Configurable Logic Blocks (CLBs)
� �� �$� � � � �� � � � � !� �� www.xilinx.com ���UG190 (v4.2) May 9, 2008
�
Chapter 5
Configurable Logic Blocks (CLBs)
���� �"��"��#The Configurable Logic Blocks (CLBs) are the main logic resources for implementing sequential as well as combinatorial circuits. Each CLB element is connected to a switch matrix for access to the general routing matrix (shown in Figure 5-1). A CLB element contains a pair of slices. These two slices do not have direct connections to each other, and each slice is organized as a column. Each slice in a column has an independent carry chain. For each CLB, slices in the bottom of the CLB are labeled as SLICE(0), and slices in the top of the CLB are labeled as SLICE(1).
The Xilinx tools designate slices with the following definitions. An “X” followed by a number identifies the position of each slice in a pair as well as the column position of the slice. The “X” number counts slices starting from the bottom in sequence 0, 1 (the first CLB column); 2, 3 (the second CLB column); etc. A “Y” followed by a number identifies a row of slices. The number remains the same within a CLB, but counts up in sequence from one CLB row to the next CLB row, starting from the bottom. Figure 5-2 shows four CLBs located in the bottom-left corner of the die.
Figure 5-1: ���������� �����������#� ���� ������
SwitchMatrix
Slice(1)
COUTCOUT
CINCIN
Slice(0)
� � �
UG190_5_01_122605
Slices define regular connections to the switching fabric, and to slices in
CLBs above and below it on the die.
The LX110T has 17,280 slices.21
Spring 2010 EECS150 - Lec03-FPGA Page
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
X-Y naming convention for slices
172 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Slice DescriptionEvery slice contains four logic-function generators (or look-up tables), four storage elements, wide-function multiplexers, and carry logic. These elements are used by all slices to provide logic, arithmetic, and ROM functions. In addition to this, some slices support two additional functions: storing data using distributed RAM and shifting data with 32-bit registers. Slices that support these additional functions are called SLICEM; others are called SLICEL. SLICEM (shown in Figure 5-3) represents a superset of elements and connections found in all slices. SLICEL is shown in Figure 5-4.
Figure 5-2: Row and Column Relationship between CLBs and Slices
SliceX1Y1
COUTCOUT
CINCIN
SliceX0Y1
CLB
UG190_5_02_122605
SliceX1Y0
COUTCOUT
SliceX0Y0
CLB
SliceX3Y1
COUTCOUT
CINCIN
SliceX2Y1
CLB
SliceX3Y0
COUTCOUT
SliceX2Y0
CLB
Lower-left corner of the die.
X0, X2, ... are lower CLB slices.X1, X3, ... are upper CLB slices.
Y0, Y1, ... are CLB column positions.
22
Spring 2010 EECS150 - Lec03-FPGA Page
Atoms: 5-input Look Up Tables (LUTs)
ExpressFabric Technology
WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5
R
The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels
between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of
required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.
• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.
Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall
Figure 3: Block Diagram of a Virtex-5 6-Input LUT
WP245_03_051006
LUT5
A1
A2
A3
A4 D
A5
A6
A2
A3
A4 D
D6
D5
A5
A6
A2
A3
A4
A5
A6
LUT6
LUT5
A[6:2] D000000000100010
....
101
111011111011111
001
Q
Q
Q
Q
Q
Q
(1)
(1)
(1)
(0)
(0)
(0)
.... D
A[6:2]
Computes any 5-input logic function.
Timing is independent of function.
Latchesset during
configuration. 23
Spring 2010 EECS150 - Lec03-FPGA Page
Virtex-5 6-LUTs: Composition of 5-LUTsMay be used
as one 6-input LUT (D6 out) ...
Combinational logic
(post configuration)
... or as two 5-input LUTS (D6 and D5)
ExpressFabric Technology
WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5
R
The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels
between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of
required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.
• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.
Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall
Figure 3: Block Diagram of a Virtex-5 6-Input LUT
WP245_03_051006
LUT5
A1
A2
A3
A4 D
A5
A6
A2
A3
A4 D
D6
D5
A5
A6
A2
A3
A4
A5
A6
LUT6
LUT5
The LX110T has 69,120 6-LUTs6-LUT delay is 0.9 ns
24
Spring 2010 EECS150 - Lec03-FPGA Page
The simplest view of a slice
194 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Designing Large Multiplexers
4:1 Multiplexer
Each Virtex-5 LUT can be configured into a 4:1 MUX. The 4:1 MUX can be implemented with a flip-flop in the same slice. Up to four 4:1 MUXes can be implemented in a slice, as shown in Figure 5-21.
Figure 5-21: Four 4:1 Multiplexers in a Slice
UG190_5_21_050506
(D[6:1])
(C[6:1])
(B[6:1])
(A[6:1])
(CLK)CLK
6
SLICE
LUT
LUT
LUT
LUT
A[6:1]
O6
6 A[6:1]
O6
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(D)
(DQ)
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(C)
(CQ)
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(B)
(BQ)
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(A)
(AQ)
6 A[6:1]
O6
6 A[6:1]
O6
SEL D [1:0], DATA D [3:0]Input
SEL C [1:0], DATA C [3:0]Input
SEL B [1:0], DATA B [3:0]Input
SEL A [1:0], DATA A [3:0]Input
Four 6-LUTs
Four Flip-Flops
Switching fabric may see combinational and registered
outputs.
An actual Virtex-5 slice adds many small features to this
simplified diagram. We show them one by one ...
25
Spring 2010 EECS150 - Lec03-FPGA Page
Two 7-LUTs per slice ...
Extra multiplexers(F7AMUX,
F7BMUX)
Virtex-5 FPGA User Guide www.xilinx.com 195UG190 (v4.2) May 9, 2008
CLB OverviewR
8:1 Multiplexer
Each slice has an F7AMUX and an F7BMUX. These two muxes combine the output of two LUTs to form a combinatorial function up to 13 inputs (or an 8:1 MUX). Up to two 8:1 MUXes can be implemented in a slice, as shown in Figure 5-22.
Figure 5-22: Two 8:1 Multiplexers in a Slice
UG190_5_22_090806
(D[6:1])
(C[6:1])
(CX)
(B[6:1])
(A[6:1])
(AX)
SELF7(1)(CLK)
CLK
SELF7(2)
SEL D [1:0], DATA D [3:0]Input (1)
SEL C [1:0], DATA C [3:0]Input (1)
SEL B [1:0], DATA B [3:0]Input (2)
SEL A [1:0], DATA A [3:0]Input (2)
6
SLICE
LUT
LUT
LUT
LUT
A[6:1]
O6
6 A[6:1]
O6 RegisteredOutput
8:1 MUXOutput (1)
(Optional)
D Q
(CMUX)
(CQ)
RegisteredOutput
8:1 MUXOutput (2)
(Optional)
D Q
(AMUX)
(AQ)
6 A[6:1]
O6
6 A[6:1]
O6
F7BMUX
F7AMUX
Extra inputs (AX and CX)
26
Spring 2010 EECS150 - Lec03-FPGA Page
Or one 8-LUTs per slice ...
Third multiplexer(F8MUX)
Third input (BX)
196 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
16:1 Multiplexer
Each slice has an F8MUX. F8MUX combines the outputs of F7AMUX and F7BMUX to form a combinatorial function up to 27 inputs (or a 16:1 MUX). Only one 16:1 MUX can be implemented in a slice, as shown in Figure 5-23.
It is possible to create multiplexers wider than 16:1 across more than one SLICEM. However, there are no direct connections between slices to form these wide multiplexers.
Fast Lookahead Carry Logic
In addition to function generators, dedicated carry logic is provided to perform fast arithmetic addition and subtraction in a slice. The Virtex-5 CLB has two separate carry chains, as shown in Figure 5-1. The carry chains are cascadable to form wider add/subtract logic, as shown in Figure 5-2.
The carry chain in the Virtex-5 device is running upward and has a height of four bits per slice. For each bit, there is a carry multiplexer (MUXCY) and a dedicated XOR gate for adding/subtracting the operands with a selected carry bits. The dedicated carry path and
Figure 5-23: 16:1 Multiplexer in a Slice
UG190_5_23_050506
(D[6:1])
(C[6:1])
(CX)
(B[6:1])
(A[6:1])
(AX)(BX)
(CLK)
SELF7
SELF7SELF8
CLK
6
SLICE
LUT
LUT
LUT
LUT
A[6:1]
O6
6 A[6:1]
O6
RegisteredOutput
16:1 MUXOutput
(Optional)
D Q
(BMUX)
(B)
6 A[6:1]
O6
6 A[6:1]
O6
F7BMUX
F8MUX
F7AMUX
SEL D [1:0], DATA D [3:0]Input
SEL C [1:0], DATA C [3:0]Input
SEL B [1:0], DATA B [3:0]Input
SEL A [1:0], DATA A [3:0]Input Configuring the
“n” of an n-LUT ...27
Spring 2010 EECS150 - Lec03-FPGA Page
Extra muxes to chose LUT option ...
Virtex-5 FPGA User Guide www.xilinx.com 199UG190 (v4.2) May 9, 2008
CLB / Slice Timing ModelsR
General Slice Timing Model and ParametersA simplified Virtex-5 slice is shown in Figure 5-25. Some elements of the Virtex-5 slice are omitted for clarity. Only the elements relevant to the timing paths described in this section are shown.
Figure 5-25: Simplified Virtex-5 Slice
UG190_5_25_050506
LUT
O6
O5
6 D
FE/LAT
DCECLK
SR REV
Q
F7BMUX
F8MUX
DMUX
DQ
D Inputs
LUT
O6
O5
6 C
FE/LAT
DCECLK
SR REV
Q CQ
CMUX
C Inputs
DX
CX
LUT
O6
O5
6 B
FE/LAT
DCECLK
SR REV
Q BQ
BMUX
B Inputs
BX
FE/LAT
DCECLK
SR REV
Q AQ
F7AMUXLUT
O6
O5
6 A
AMUX
A Inputs
AX
CE
CLK
SRREV(DX)
From eight 5-LUTs ... to one 8-LUT.
Combinational or registered outs.
Flip-flops unused by LUTs can be used
standalone.
Flip-flops ...28
Spring 2010 EECS150 - Lec03-FPGA Page
Slice flip-flop properties ...
Virtex-5 FPGA User Guide www.xilinx.com 177UG190 (v4.2) May 9, 2008
CLB OverviewR
Table 5-3: Truth Table when SRLOW is Used (Default Condition)
SR REV Function
0 0 No Logic Change
0 1 1
1 0 0
1 1 0
Table 5-4: Truth Table when SRHIGH is Used
SR REV Function
0 0 No Logic Change
0 1 0
1 0 1
1 1 0
Figure 5-5: Register/Latch Configuration in a SliceUG190_5_05_071207
DX
CX
BX
CE
AX
DQ
CQ
BQ
AQ
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
DFFLUT D Output
LUT C Output
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
Q
Q
Q
SR
LUT B Output
LUT A Output AFF
BFF
CFF
CLK
Reset Type
Sync
Async
Each state element may be edge-triggered FF or latch.
Each state element may respond differently to
set/reset signal.
Clock enable, clock polarity, and set/reset lines in a slice
are shared.
Next: The vertical dimension ...29
Spring 2010 EECS150 - Lec03-FPGA Page
Reminder: arithmetic addition ...
One-bit full adder
[Co,S] = A + B + Ci
A, B, Ci: 1-bit number inputs.[Co,S]: 2-bit number output.“+”: Arithmetic addition.
Simplest multi-bit adder
“Ripple-carry adder”30
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Virtex-5 FPGA User Guide www.xilinx.com 197UG190 (v4.2) May 9, 2008
CLB OverviewR
carry multiplexer (MUXCY) can also be used to cascade function generators for implementing wide logic functions.
Figure 5-24 illustrates the carry chain with associated logic elements in a slice.
The carry chains carry lookahead logic along with the function generators. There are ten independent inputs (S inputs – S0 to S3, DI inputs – DI1 to DI4, CYINIT and CIN) and eight independent outputs (O outputs – O0 to O3, and CO outputs – CO0 to CO3).
The S inputs are used for the “propagate” signals of the carry lookahead logic. The “propagate” signals are sourced from the O6 output of a function generator. The DI inputs are used for the “generate” signals of the carry lookahead logic. The “generate” signals are sourced from either the O5 output of a function generator or the BYPASS input (AX, BX, CX, or DX) of a slice. The former input is used to create a multiplier, while the latter is used
Figure 5-24: Fast Carry Logic Path and Associated Elements
UG190_5_24_050506
O6 From LUTD
DMUX/DQ*
DMUX
DQO5 From LUTD
DX
S3MUXCY
DI3
CO3
O3
COUT (To Next Slice)
Carry Chain Block(CARRY4)
(Optional)
D Q
O6 From LUTC
CMUX/CQ*
CMUX
CQO5 From LUTC
CX
S2MUXCY
DI2
CO2
CO1
CO0
O2
(Optional)
D Q
O6 From LUTB
BMUX/BQ*
BMUX
BQO5 From LUTB
BX
S1MUXCY
DI1
O1
(Optional)
D Q
O6 From LUTA
AMUX/AQ*
AMUX
AQO5 From LUTA
AX
S0MUXCY
DI0
CIN
CIN (From Previous Slice)
* Can be used ifunregistered/registeredoutputs are free.
CYINIT
10
O0
(Optional)
D Q
We can map ripple-carry addition onto
carry-chain block.
The carry-chain block also useful for speeding up other adder structures
and counters.
Virtex 5 Verical Logic
Spring 2010 EECS150 - Lec03-FPGA Page
Putting it all together ... a SLICEL.
174 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Each CLB can contain zero or one SLICEM. Every other CLB column contains a SLICEMs. In addition, the two CLB columns to the left of the DSP48E columns both contain a SLICEL and a SLICEM.
Figure 5-4: Diagram of SLICEL
A6LUTROM
COUT
D
DX
C
CX
B
BX
A
AX
O6O5
UG190_5_04_032606
A5A4A3A2A1
D6
DMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
DX
D5D4D3D2D1
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CIN
0/1
A6LUTROM
O6O5
A5A4A3A2A1
C6
CX
C5C4C3C2C1
A6LUTROM
O6O5
A5A4A3A2A1
B6
BX
B5B4B3B2B1
A6LUTROM
O6O5
A5A4A3A2A1
A6
AXSRCE
CLK
A5A4A3A2A1
Q
Q
Q
Reset Type
Sync
Async The previous slides explain all
SLICEL features.
About 50% of the 17,280 slices in an LX110T are
SLICELs.
The other slices are SLICEMs, and have extra
features. 32
Spring 2010 EECS150 - Lec03-FPGA Page
Recall: 5-LUT architecture ...
ExpressFabric Technology
WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5
R
The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels
between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of
required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.
• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.
Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall
Figure 3: Block Diagram of a Virtex-5 6-Input LUT
WP245_03_051006
LUT5
A1
A2
A3
A4 D
A5
A6
A2
A3
A4 D
D6
D5
A5
A6
A2
A3
A4
A5
A6
LUT6
LUT5
A[6:2] D000000000100010
....
101
111011111011111
001
Q
Q
Q
Q
Q
Q
(1)
(1)
(1)
(0)
(0)
(0)
.... D
A[6:2]
32 Latches. Configured to 1 or 0.
Some parts of a logic design need many state elements.
SLICEMs replace normal 5-LUTs with circuits that can act like 5-LUTs, but can
alternatively use the 32 latches as RAM, ROM,
shift registers.33
Spring 2010 EECS150 - Lec03-FPGA Page
A SLICEM 6-LUT ...
Normal 6-LUT inputs.
Normal 5/6-LUT outputs.
Memory write
address
Memory data input
Memory data input.
Control output for chaining LUTs to
make larger memories.
Virtex-5 FPGA User Guide www.xilinx.com 173UG190 (v4.2) May 9, 2008
CLB OverviewR
Figure 5-3: Diagram of SLICEM
A6DI2
COUT
D
DX
C
CX
B
BX
A
AX
O6
DI1MC31
O5
UG190_5_03_041006
A5A4A3A2A1
D6
DIDMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
Reset Type
DX
D5D4D3D2D1
WA1-WA6WA7WA8
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CLKWSGEN
CIN
0/1
WE
Sync
Async
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
C6
CI
CX
C5C4C3C2C1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
B6
BI
BX
B5B4B3B2B1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
A6
AI
AXSRCE
CLK
WE
A5A4A3A2A1
Q
Q
Q
WA1-WA6WA7WA8
WA1-WA6WA7WA8
WA1-WA6WA7WA8
A 1.1 Mb distributed RAM can be made if all SLICEMs of an LX110T are used as RAM.
34
Spring 2010 EECS150 - Lec03-FPGA Page
Many RAM configurations possible ...
178 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
SRHIGH and SRLOW can be set individually for each storage element in a slice. The choice of synchronous (SYNC) or asynchronous (ASYNC) set/reset (SRTYPE) cannot be set individually for each storage element in a slice.
The initial state after configuration or global initial state is defined by separate INIT0 and INIT1 attributes. By default, setting the SRLOW attribute sets INIT0, and setting the SRHIGH attribute sets INIT1. Virtex-5 devices can set INIT0 and INIT1 independent of SRHIGH and SRLOW.
The configuration options for the set and reset functionality of a register or a latch are as follows:
! No set or reset
! Synchronous set
! Synchronous reset
! Synchronous set and reset
! Asynchronous set (preset)
! Asynchronous reset (clear)
! Asynchronous set and reset (preset and clear)
Distributed RAM and Memory (Available in SLICEM only)
Multiple LUTs in a SLICEM can be combined in various ways to store larger amount of data.
The function generators (LUTs) in SLICEMs can be implemented as a synchronous RAM resource called a distributed RAM element. RAM elements are configurable within a SLICEM to implement the following:
! Single-Port 32 x 1-bit RAM
! Dual-Port 32 x 1-bit RAM
! Quad-Port 32 x 2-bit RAM
! Simple Dual-Port 32 x 6-bit RAM
! Single-Port 64 x 1-bit RAM
! Dual-Port 64 x 1-bit RAM
! Quad-Port 64 x 1-bit RAM
! Simple Dual-Port 64 x 3-bit RAM
! Single-Port 128 x 1-bit RAM
! Dual-Port 128 x 1-bit RAM
! Single-Port 256 x 1-bit RAM
Distributed RAM modules are synchronous (write) resources. A synchronous read can be implemented with a storage element or a flip-flop in the same slice. By placing this flip-flop, the distributed RAM performance is improved by decreasing the delay into the clock-to-out value of the flip-flop. However, an additional clock latency is added. The distributed elements share the same clock input. For a write operation, the Write Enable (WE) input, driven by either the CE or WE pin of a SLICEM, must be set High.
Virtex-5 FPGA User Guide www.xilinx.com 187UG190 (v4.2) May 9, 2008
CLB OverviewR
Distributed RAM configurations greater than the provided examples require more than one SLICEM. There are no direct connections between slices to form larger distributed RAM configurations within a CLB or between slices.
Figure 5-14: Distributed RAM (RAM256X1S)
UG190_5_14_050506
DI1D
A[7:0]
WCLKWE
(CLK)(WE/CE)
68
SPRAM64
RAM256X1S
A[6:1]WA[8:1]CLKWE
O6
DI1
68
SPRAM64
A[6:1]WA[8:1]CLKWE
O6F7BMUX
F8MUXRegisteredOutput
Output
(Optional)
D Q
O
DI1
68
SPRAM64
A[6:1]WA[8:1]CLKWE
O6
DI1
68
SPRAM64
A[6:1]WA[8:1]CLKWE
O6F7AMUX
A6 (CX)
A6 (AX)
A7 (BX)
Example configuration: Single-port 256b x 1,
registered output.
A complete list:
A 128 x 32b LUT RAM has a 1.1ns access time.35
Spring 2010 EECS150 - Lec03-FPGA Page
SLICEM shift register (one of many).
Virtex-5 FPGA User Guide www.xilinx.com 189UG190 (v4.2) May 9, 2008
CLB OverviewR
or flip-flop is available to implement a synchronous read. In this case, the clock-to-out of the flip-flop determines the overall delay and improves performance. However, one additional cycle of clock latency is added. Any of the 32 bits can be read out asynchronously (at the O6 LUT outputs) by varying the 5-bit address. This capability is useful in creating smaller shift registers (less than 32 bits). For example, when building a 13-bit shift register, simply set the address to the 13th bit. Figure 5-15 is a logic block diagram of a 32-bit shift register.
Figure 5-16 illustrates an example shift register configuration occupying one function generator.
Figure 5-15: 32-bit Shift Register Configuration
Figure 5-16: Representation of a Shift Register
ug190_5_15_050506
Output (Q)
RegisteredOutput
(Optional)
(AQ)
DI1
D Q
(AX)
SHIFTIN (MC31 of Previous LUT)
SHIFTIN (D)
A[4:0]
CLKCE
(A[6:2])
(CLK)(WE/CE)
SRL32
SRLC32E
A[6:2]
CLKCE
O6
MC31 SHIFTOUT (Q31)5
UG190_5_16_050506
SHIFTIN (D)
SHIFTOUT(Q31)WE
CLK
Address (A[4:0])
32-bit Shift Register
MUX
Q
5
See Virtex-5 User Guide for an complete list of shift-register types.
36
Spring 2010 EECS150 - Lec03-FPGA Page
SLICEL vs SLICEM ...
Virtex-5 FPGA User Guide www.xilinx.com 173UG190 (v4.2) May 9, 2008
CLB OverviewR
Figure 5-3: Diagram of SLICEM
A6DI2
COUT
D
DX
C
CX
B
BX
A
AX
O6
DI1MC31
O5
UG190_5_03_041006
A5A4A3A2A1
D6
DIDMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
Reset Type
DX
D5D4D3D2D1
WA1-WA6WA7WA8
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CLKWSGEN
CIN
0/1
WE
Sync
Async
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
C6
CI
CX
C5C4C3C2C1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
B6
BI
BX
B5B4B3B2B1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
A6
AI
AXSRCE
CLK
WE
A5A4A3A2A1
Q
Q
Q
WA1-WA6WA7WA8
WA1-WA6WA7WA8
WA1-WA6WA7WA8
174 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Each CLB can contain zero or one SLICEM. Every other CLB column contains a SLICEMs. In addition, the two CLB columns to the left of the DSP48E columns both contain a SLICEL and a SLICEM.
Figure 5-4: Diagram of SLICEL
A6LUTROM
COUT
D
DX
C
CX
B
BX
A
AX
O6O5
UG190_5_04_032606
A5A4A3A2A1
D6
DMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
DX
D5D4D3D2D1
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CIN
0/1
A6LUTROM
O6O5
A5A4A3A2A1
C6
CX
C5C4C3C2C1
A6LUTROM
O6O5
A5A4A3A2A1
B6
BX
B5B4B3B2B1
A6LUTROM
O6O5
A5A4A3A2A1
A6
AXSRCE
CLK
A5A4A3A2A1
Q
Q
Q
Reset Type
Sync
Async
SLICEMSLICEL
SLICEM adds memory features to LUTs, + muxes.
37
Spring 2010 EECS150 - Lec03-FPGA Page
Virtex-5 DSP48E Slice
38
Efficient implementation of multiply, add, bit-wise logical.
LX110T has 64 in a single column.
Spring 2010 EECS150 - Lec03-FPGA Page 39
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
To be continued ...
Throughout the semester, we will look at different Virtex-5 features in-depth.
Switch fabricBlock RAMDSP48 (ALUs)ClockingI/OSerial I/O + PCI