BASICS SUMMARY FROM PREVIOUS COURSE
S. Reda EN2911X FALL’07
Algorithm execution methods

Hardware (Application-Specific Integrated Circuits)
Advantages:
• Very good efficiency and performance
Disadvantages:
• No flexibility (cannot be modified after fabrication)
• High cost

Software-programmed processors
Advantages:
• Software is very flexible to changes
Disadvantages:
• Performance depends on the clock
• Fixed set of instructions given by the hardware

Reconfigurable computing
Advantages:
• Midway between hardware and software solutions
• Better performance than software-programmed processors
• Much more flexible than a hardware solution
Definition for reconfigurable systems:
A set of reconfigurable resources which can be interconnected in different ways.
Reconfigurable technology architectures:
• Boolean function realization as a sum of products (PALs, PLAs, PLDs…)
• Cell arrays (FPGAs)
Programming technologies:
• Fuse
• Anti-fuse
• Static RAM (SRAM)
RECONFIGURABLE SYSTEMS
Characteristics of reconfigurable computing systems
• Particularized parallelism in order to correspond with design objectives
• Specialized logic to accomplish certain specific functions
• Functionalities which can change according to problem requirements
Parallelism, specialization, hardware level adaptation
Reconfigurable circuits usage
1. Small/medium-volume IC production
2. Logic prototyping and emulation
3. Acceleration of algorithms such as sorting and dynamic programming, with applications in domains such as signal and image processing, cryptography, bioinformatics, shape recognition, etc.
i. Reconfigurable functional units using one processor (specific instructions)
ii. Reconfigurable units used as a co-processor
iii. Reconfigurable units accessed through external I/O or networks
[Compton’02]
Temporal vs. spatial computation
Temporal execution (software)
Spatial execution (reconfigurable computing)
The key to accelerating computation is the ability to extract parallelism (concurrency) from algorithms. Reconfigurable systems make this possible; the FPGA is one such device.
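A minimal Python sketch of the difference, assuming a sum of 8 values as the workload (the function names are illustrative and not part of the course). The temporal version reuses one adder step after step; the spatial version models a tree of adders in which every level could run concurrently in hardware.

```python
def temporal_sum(values):
    """Software-style execution: one adder reused, one addition per step."""
    total, steps = 0, 0
    for v in values:
        total += v
        steps += 1
    return total, steps


def spatial_sum(values):
    """Hardware-style execution: a tree of adders, one tree level per step."""
    level, steps = list(values), 0
    while len(level) > 1:
        # All additions of one level are independent and could run concurrently.
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [sum(p) for p in pairs]
        steps += 1
    return level[0], steps


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(temporal_sum(data))  # (31, 8): 8 sequential steps
print(spatial_sum(data))   # (31, 3): 3 levels of parallel adders
```

The spatial version needs more adders (area) but fewer steps, which is the trade-off reconfigurable hardware exploits.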
PLA (Programmable Logic Array): a SOP (sum-of-products) logic circuit with a programmable array of AND gates followed by a programmable array of OR gates. Functions are implemented from the available product terms, and a product term can be shared by more than one function.
PAL (Programmable Array Logic): a SOP (sum-of-products) logic circuit with a programmable array of AND gates followed by a fixed array of OR gates. Each output function is implemented from its own dedicated product terms. (PAL is a registered trademark of Advanced Micro Devices.)
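A small Python model of a PLA may help, under the assumption that both planes are described as plain lists (the representation and the name `pla_eval` are hypothetical). The AND plane lists the product terms; the OR plane says which product terms feed each output.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a PLA: and_plane lists product terms, or_plane selects terms per output."""
    # A product term maps input index -> required value (omitted inputs are don't-care).
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    # Each output ORs together the product terms it is connected to.
    return [any(products[t] for t in terms) for terms in or_plane]


# Inputs are (a, b, c). Product terms: a·b, a'·c, b·c'
and_plane = [{0: 1, 1: 1}, {0: 0, 2: 1}, {1: 1, 2: 0}]
# f0 = a·b + a'·c ; f1 = a·b + b·c'  (the term a·b is shared by both outputs)
or_plane = [[0, 1], [0, 2]]

print(pla_eval((1, 1, 0), and_plane, or_plane))  # [True, True]
print(pla_eval((0, 0, 1), and_plane, or_plane))  # [True, False]
```

Sharing the term a·b between both outputs is what the programmable OR plane of a PLA allows.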
Each programmable circuit type contains:
• Interconnections
• Programmable switches
• Logic blocks
A CPLD contains several functional blocks, each with several macrocells, plus a global interconnect matrix.
FPGA (Field-Programmable Gate Array): arrays of reconfigurable blocks connected by programmable interconnections. The block architecture differs from manufacturer to manufacturer, but it is LUT-based.
Circuit types:
· Non-volatile, one time programmable (anti-fuse)
· Non-volatile, reprogrammable (flash)
· Volatile (SRAM)
A logic cell is composed of a LUT (look-up table) and a D-type flip-flop.
[Figure: 8:1 multiplexer]
The LUT contents can be stored in SRAM. A typical LUT has 4 to 6 inputs.
Function programming example in a LUT (Look-up Table)
A logic cell is built around a look-up table (LUT), whose number of inputs depends on the device. For example, the LUT has 6 inputs in Xilinx Virtex-6 and 4 inputs in Virtex-4. Several LUTs are grouped into a CLB (Configurable Logic Block). The name differs between manufacturers: CLB is the term used by Xilinx, while Altera uses adaptive logic modules (ALMs) for its devices (e.g. the Stratix family). Logic can also be implemented using multiplexers (mux).
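A Python sketch of how a function is programmed into a LUT, assuming the LUT is modelled as a list of configuration bits addressed by its inputs (the helper names `program_lut` and `read_lut` are illustrative, not vendor APIs).

```python
from itertools import product


def program_lut(func, n_inputs):
    """Return the configuration bits: func evaluated for every input combination."""
    return [int(func(*bits)) for bits in product((0, 1), repeat=n_inputs)]


def read_lut(config, *inputs):
    """The inputs form the address of the stored bit (first input is the MSB)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return config[address]


# Example: program a 4-input LUT with f(a, b, c, d) = (a AND b) XOR (c OR d)
config = program_lut(lambda a, b, c, d: (a & b) ^ (c | d), 4)
print(len(config))                   # 16 configuration bits
print(read_lut(config, 1, 1, 0, 0))  # 1
```

Any 4-input Boolean function fits in the same 16 bits; only the stored contents change.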
[Figure: programmable logic element with a 4-input LUT (inputs a, b, c, d), a flip-flop (clock, q, e), a mux (y) and a switchbox]
FPGA circuits also include other units: RAM, ALUs, multiplexers, adders.
Logic units are connected using reconfigurable interconnections.
FPGA Configuration
[Maxfield’04]
FPGAs can be dynamically reprogrammed before the logic runs (runtime) or while the logic is running (virtual hardware):
• fully
• partially
FPGA architecture
[Maxfield’04]
Programmable logic element
• Field-Programmable Gate Arrays (FPGAs): an example of reconfigurable circuits
• FPGAs are composed of an array of programmable logic blocks whose functionality is determined by programmable configuration bits
• Logic blocks are connected by a set of routing resources, which are also programmable.
• Logic can be mapped onto the reconfigurable resources (reconfigurable fabric)
FPGA design implementation involves decomposing the design and mapping it onto logic blocks (LBs).
Mapping onto the LUT of a CLB
[Figure: circuit compilation, from RTL (C = A + B) to a circuit (adder with inputs A, B and output C) to the array of LUTs]
Circuit compilation:
1) Technology mapping
Expressing the circuit in terms of LUTs.
2) Placement
Assigning a logic LUT to a physical location.
3) Routing
Switch and wire segment selection for interconnections.
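The placement step can be illustrated with a toy Python sketch. It assumes a hypothetical netlist of four already-mapped LUTs and a 2x2 array of sites, and it estimates routing cost as Manhattan distance; the exhaustive search is for illustration only, since real tools rely on heuristics for realistic design sizes.

```python
from itertools import permutations

# Hypothetical mapped netlist: each net connects a source LUT to a sink LUT.
nets = [("lut0", "lut1"), ("lut1", "lut2"), ("lut0", "lut3")]
luts = ["lut0", "lut1", "lut2", "lut3"]
sites = [(0, 0), (0, 1), (1, 0), (1, 1)]  # a 2x2 array of physical LUT locations


def wirelength(placement):
    """Estimate routing cost as the total Manhattan distance over all nets."""
    total = 0
    for src, dst in nets:
        (x0, y0), (x1, y1) = placement[src], placement[dst]
        total += abs(x0 - x1) + abs(y0 - y1)
    return total


def place():
    """Placement: try every assignment of LUTs to sites and keep the cheapest."""
    candidates = (dict(zip(luts, p)) for p in permutations(sites))
    return min(candidates, key=wirelength)


best = place()
print(best, wirelength(best))
```

Every net ends up between neighbouring sites, so the estimated wirelength is 3, one unit per net.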
Programmable routing resources
• Wires that connect the output of one computational node to the input of another
• Why are interconnect resources important?
– Routing resources occupy a very large area.
– Wire delay increases with length, so long wires should be avoided when they are not strictly necessary
– Technology scaling shrinks circuits, but wire delays become more significant.
General routing definitions
• Wire segment: a wire that is not interrupted by a programmable switch
• Track: a sequence of one or more wire segments in a line. The segments can be connected using switches at their ends.
• Routing channel: a group of parallel tracks. The channel width is the number of tracks in the channel.
[Figure: routing channel, wire segment and track running alongside a row of CLBs]
– Every logic element produces one data bit at its output.
– Between the elements there are programmable interconnections
– Interconnecting tracks are grouped in channels
[Figure: array of logic elements (LE) separated by routing tracks]
Block connections are formed where CLB input or output pins connect to routing channels.
It would be simpler if only the logic blocks in the same column or row had to communicate. The reality, however, is different!
[Figure: switch connecting bidirectional wire segments (channel, segment, track, CLBs)]
Switch blocks are formed at the intersections of vertical and horizontal channels.
The size of a switch box grows as the square of the number of input wires.
[Figure: switch box and bidirectional switch details]
Segmented and hierarchical routing
Segmented routing:
•Short wires for local traffic
•Short wires can be joined into long wires through the switch matrices
•Long wires are also provided, to allow efficient communication without passing through switches
Hierarchical routing:
•Routing is done locally within a group of logic blocks
•Long hierarchical wires connect the different groups
Xilinx XC4000 Routing
Reconfiguration types
• Static: the application is not running while the device is configured
• Partially static (= partial reconfiguration): different parts of the application execute in successive time slices on the same configurable hardware
• Dynamic: the application can be modified in response to external changes (during execution)
• Reconfigurable circuits can contain elements which are not reconfigurable and which interface with the logic blocks through programmable interconnections. Examples:
–Embedded memory
–Multipliers, adders, etc.
–Embedded processors
• It is expensive to implement memory using configurable logic blocks → RAM blocks are added
•Size and position vary depending on the device.
•Each block can be used independently, or blocks can be combined to form larger RAM memories
•RAM memories can be single-port or dual-port
Embedded multipliers and adders
•Multipliers are slower if they are implemented by connecting a large number of programmable logic blocks -> the solution is to add multiplier blocks onto the hardware
•Typically, these are found close to the embedded RAM blocks
•Some FPGAs use Multiply-And-Accumulate (MAC) blocks (used in DSP applications)
MAC = multiply-accumulate = compute the product of two numbers and add the product to the accumulator.
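A short Python sketch of the MAC operation and of why it suits DSP: one output of a FIR filter is a dot product, computed as a chain of MAC steps. The function names and the coefficient values are illustrative.

```python
def mac(accumulator, a, b):
    """One multiply-accumulate step: accumulator + a * b."""
    return accumulator + a * b


def fir_output(coefficients, samples):
    """Dot product computed as a chain of MAC operations (one FIR filter output)."""
    acc = 0
    for c, x in zip(coefficients, samples):
        acc = mac(acc, c, x)
    return acc


print(fir_output([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```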
Configuration memory determines how the logic blocks and interconnections are programmed.
Switch programming technology

Anti-fuse
The switch is OFF by default; once programmed, it is ON.
Advantages:
•Small delays
•Small size
Disadvantages:
•Not truly reconfigurable: they are one-time programmable

Flash
The switch is ON by default; once programmed, it is OFF.
Advantages:
•The programming is retained when the circuit is powered down
Disadvantages:
•Requires more steps in the manufacturing process

SRAM
SRAM cells store the device configuration.
Advantages:
•Can be reconfigured very quickly, as many times as needed
•Does not require additional manufacturing steps
Disadvantages:
•Larger size
•Programming is lost when power is removed
Anti-Fuse FPGA
– Small routing resources (fast)
– Security
– Good tolerance to manufacturing errors
– Approximately 100 Ω resistance
[Figure: anti-fuse technology switch (antifuse polysilicon, ONO dielectric, n+ antifuse diffusion)]
[Figure: metal-to-metal anti-fuse cross-section (Metal 3, Metal 2, via, Metal 1, contact, silicon)]
– Anti-fuses are “one-time programmable” (OTP).
• A voltage pulse breaks down the dielectric
• They require programming only once.
[Figure: typical anti-fuse interconnection for circuits produced by Actel (interconnection fabric, logic modules, horizontal tracks, vertical tracks, anti-fuses)]
“Anti-fuse” technology design security
• The technology is very good for securing the design
• The configuration file (bitstream) cannot be intercepted (there is no external configuration, so no bitstream is transferred)
– A Scanning Electron Microscope (SEM) is required to determine the state of an anti-fuse (example: the Actel AX2000 anti-fuse FPGA contains 53 million anti-fuses, of which only 2-5% are programmed in a medium-size design)
©ACTEL
Flash / EEPROM Devices
CoolRunner-II CPLD (courtesy: Xilinx)
•The technology derives from PLD technology
•Traditionally based on AND/OR technology
• Typically used for glue logic, state machines, etc.
Flash/EEPROM trends
•Logic elements (LUTs and flip-flops)
•Segmented routing
SRAM-FPGA
[Figure: SRAM programming bit (Read or Write, Data, Q and complemented Q outputs) and a 2-input LUT with inputs I1, I2, programming bits P1 to P4 and output Out]
•Can be programmed many times
•Every programming bit has about 5 transistors associated with it
•A large share of the device area is devoted to programming, which reduces speed compared to EPROM and anti-fuse.
– Some logic clusters are large (e.g. an Altera cluster contains 8 LUT-FF pairs)
– Three important ideas:
• Logic elements per cluster
• Cluster connectivity to the interconnection wires: connectivity flexibility (Fc)
• Interconnection matrix flexibility: switch box
[Figure: logic cluster with I/O connections and switch box]
Problems and implications in designing circuits
Logic element
•How much functionality does a logic element offer?
•How many routing tracks are there in a channel?
•What are the switch capabilities?
Logic cluster
Question: how many elementary logic blocks (ELBs) should a cluster contain?
Logic cluster size
•Interestingly, smaller logic clusters are more efficient (Betz – CICC’99)
•This result includes the necessary routing area
•Small clusters (e.g. a single ELB per cluster) are not “friendly” to CAD (Computer-Aided Design) tools
•Most commercial circuits have 4-8 ELBs per cluster
Number of cluster inputs
•Reducing the number of inputs makes the circuit smaller and faster
•Most FPGAs (Xilinx, Lucent) have 4 ELBs per cluster, with more inputs than needed
[Figure: logic cluster I/O pin connecting to routing tracks T0, T1, T2 (Fc = 3)]
– Fc: to how many routing tracks can a pin be connected?
– If the logic cluster is small, Fc is large (Fc = W)
– If the logic cluster is large, Fc can be small.
•Cluster connectivity to the interconnecting wires: connectivity flexibility (Fc factor)
•Interconnection matrix flexibility: switch box
Switch box flexibility: given by the Fs factor
[Figure: switch box connecting tracks 0 and 1 on each of its sides]
– Switch box: provides areas of optimized interconnection
– Switch box flexibility is not as important as Fc
– Example: 6 transistors are needed for Fs = 3
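The figure of 6 transistors can be checked with a short Python sketch. It assumes a switch point where one wire end arrives from each of the four sides and one bidirectional pass transistor joins each connected pair; these modelling assumptions are mine, not stated on the slide.

```python
from itertools import combinations

SIDES = ["north", "east", "south", "west"]


def switches_per_point(fs=3, sides=len(SIDES)):
    """Pass transistors at one switch point.

    Assumes one wire end per side and one bidirectional pass transistor
    per connected pair, so each switch is shared by the two ends it joins.
    """
    return sides * fs // 2


# With Fs = 3 every wire end reaches the 3 other sides: all pairs are connected.
pairs = list(combinations(SIDES, 2))
print(len(pairs), switches_per_point())  # 6 6
```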
Circuit examples
•Xilinx XC4000EX family
- Fs = 3
- Fc = 0.2
- I = 8
- Number of BLEs: about 2.5
•Altera Flex10K family
- Fs = 3
- Fc = 0.25
- I = 22
- Number of BLEs: about 28
• Altera does not use segmentation; all lines cross large portions of the circuit.
[Figure: switch box with tracks 0, 1, 2 on each side and rotated connections]
– Connections are rotated inside the switch box while maintaining Fs = 3
– 6 transistors are still needed for the switch matrix
Another problem: buffer placement
– FPGAs require buffers to isolate RC networks
– Architects need to decide where to place these buffers.
[Figure: interconnect path built from switches (S) and wire segments]
Interconnect paths built from switches and wires can be modeled as RC networks.
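To see why buffer placement matters, here is a Python sketch using the Elmore delay approximation for a chain of identical RC stages. The Elmore model, the normalized values for R, C and the buffer delay, and the function names are my own assumptions for illustration; buffer input capacitance and output resistance are ignored.

```python
def elmore_delay(n_stages, r_switch, c_segment):
    """Elmore delay at the far end of a chain of identical RC stages.

    The capacitance of stage i is charged through i switch resistances.
    """
    return sum(i * r_switch * c_segment for i in range(1, n_stages + 1))


def buffered_delay(n_stages, section, r_switch, c_segment, t_buffer):
    """Delay when a buffer is inserted after every `section` stages."""
    sections = n_stages // section  # assumes n_stages is a multiple of section
    return sections * (elmore_delay(section, r_switch, c_segment) + t_buffer)


R, C, T_BUF = 1.0, 1.0, 2.0  # illustrative normalized units, not device data
print(elmore_delay(8, R, C))              # 36.0
print(buffered_delay(8, 4, R, C, T_BUF))  # 24.0
```

The unbuffered delay grows roughly with the square of the number of stages, so splitting a long path into buffered sections can pay off even after adding the buffer delays.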
FPGA Memory Implementation
[Figure: two 16x1 LUT memories (LUT1, LUT2) sharing the address lines (Addr), each with address (A) and data (D) ports]
– A 4-input LUT provides 16 bits of storage
– LUTs can be chained through the programmable routing network
– Flexibility is a key aspect
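A Python sketch of this fine-grained memory, assuming each 4-input LUT is modelled as a 16x1 memory and two of them are combined into a 32x1 memory with the extra address bit selecting the LUT (the class names are hypothetical).

```python
class LutRam16x1:
    """A 4-input LUT used as a 16x1 memory: the 4 inputs act as the address."""

    def __init__(self):
        self.bits = [0] * 16

    def write(self, addr, value):
        self.bits[addr & 0xF] = value & 1

    def read(self, addr):
        return self.bits[addr & 0xF]


class Ram32x1:
    """Two 16x1 LUT memories combined; address bit 4 selects the LUT."""

    def __init__(self):
        self.luts = [LutRam16x1(), LutRam16x1()]

    def write(self, addr, value):
        self.luts[(addr >> 4) & 1].write(addr, value)

    def read(self, addr):
        # Both LUTs see the low 4 address bits; a 2:1 mux picks the output.
        return self.luts[(addr >> 4) & 1].read(addr)


ram = Ram32x1()
ram.write(5, 1)
ram.write(21, 1)
print(ram.read(5), ram.read(21), ram.read(6))  # 1 1 0
```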
Xilinx XC4000 Series Devices
– Ideal for storing small amounts of data
– No space is lost
Xilinx XC4000 dual-port memory
– Concurrent data access
– Fine-grained access
– Synchronous access
Coarse-grained memory
[Figure: SRAM cell with 5 V supply, word line and two bit lines]
– Large, dedicated SRAM blocks embedded in the FPGA array
– Lead to efficient memory implementations with predictable performance
– 6-transistor SRAM cells
Xilinx Virtex Block Memory
4096 bit SRAMs
- Example: Altera Flex10K
- Each EAB (Embedded Array Block) has 2K bits of storage
- The EABs are placed in the center of the circuit
– LEs and EABs are connected around the interconnection channel
EAB - Altera
– Optimized integrated area
– Data and addresses are handled with very good performance
– Scalable to larger sizes
•Memory growth rate
•Approximately 2400 transistors per CLB (1200 per LUT) to implement a 32x1 SRAM on the XC4000
•6 transistors per cell for Altera SRAM (2K bits per EAB)

          Altera 10K            Xilinx 4000E
Size      EABs  Transistors     CLBs  Transistors
32x1      1     12288           1     2400
32x8      1     12288           8     19200
128x8     1     12288           32    76800
512x8     2     24576           128   307200

For 512x8, the fine-grained implementation is more than 10X larger.
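The table can be reproduced with a few lines of Python from the per-block figures given above (2K bits and 6 transistors per cell for an EAB; one CLB of about 2400 transistors per 32x1 SRAM). The sketch assumes the number of blocks follows from capacity alone, which is a simplification.

```python
import math

EAB_BITS = 2048             # 2K bits per Altera EAB
SRAM_CELL_TRANSISTORS = 6   # per bit
CLB_BITS = 32               # one XC4000 CLB implements a 32x1 SRAM
CLB_TRANSISTORS = 2400      # approximate, per CLB


def cost(words, width):
    """Return (EABs, EAB transistors, CLBs, CLB transistors) for a words x width memory."""
    bits = words * width
    eabs = math.ceil(bits / EAB_BITS)
    clbs = math.ceil(bits / CLB_BITS)
    return eabs, eabs * EAB_BITS * SRAM_CELL_TRANSISTORS, clbs, clbs * CLB_TRANSISTORS


for words, width in [(32, 1), (32, 8), (128, 8), (512, 8)]:
    print(f"{words}x{width}", *cost(words, width))
```

For 512x8 the ratio is 307200 / 24576 = 12.5, consistent with the order-of-magnitude gap noted above.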
Example case study: Altera Cyclone II
•Two-dimensional array of Logic Array Blocks (LABs), with 16 Logic Elements (LEs) in each LAB.
•Embedded memory blocks (M4K) and multipliers (18x18)
•PLLs (Phase-Locked Loops): used to generate clock signals in a certain frequency range
•Example: the EP2C35 device of the Cyclone II family has 60 columns, 45 rows and 33216 LEs, plus 105 M4K blocks and 35 embedded multipliers.
An LE has 2 operating modes: normal and arithmetic.
Normal mode: general logic implementation
• 4-input LUT
• 6 input connections
• 3 output connections
Logic element organization (normal mode)
Arithmetic mode is suitable for implementing adders, counters, accumulators and comparators.
The LUT is divided into two 3-input LUTs (ideal for implementing a 2-bit adder)
Logic element organization in arithmetic mode
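A Python sketch of the arithmetic-mode idea, assuming one 3-input LUT produces the sum bit and the other the carry bit from (a, b, carry-in), with the carry passed along a chain of logic elements. This is a simplified model, not the exact Cyclone II LE wiring.

```python
from itertools import product

# Two 3-input LUTs sharing inputs (a, b, carry_in); contents indexed by a*4 + b*2 + cin.
SUM_LUT = [a ^ b ^ c for a, b, c in product((0, 1), repeat=3)]
CARRY_LUT = [(a & b) | (a & c) | (b & c) for a, b, c in product((0, 1), repeat=3)]


def le_arithmetic(a, b, cin):
    """One logic element in arithmetic mode: returns (sum, carry_out)."""
    index = (a << 2) | (b << 1) | cin
    return SUM_LUT[index], CARRY_LUT[index]


def ripple_add(x, y, width):
    """Chain `width` logic elements through their carry signals."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = le_arithmetic((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add(5, 6, 4))  # (11, 0)
print(ripple_add(9, 8, 4))  # (1, 1): 17 does not fit in 4 bits, carry out is set
```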
• Each LAB: 16 LEs, LAB control signals, LE carry chains, register chains and local interconnections.
• Local interconnections transfer signals between logic elements (LE) in the same LAB
• Neighboring LABs, PLLs, M4K RAM blocks and multipliers to the left and right can also use a LAB's local interconnect
Logic array block organisation
Register/carry chain – connection with LAB
•Multi-track: composed of rows (direct-link, R4, R24) and columns (register chain, C4, C16)
•R4/C4 interconnections span 4 blocks (left, right / up, down)
Multi-track interconnections
Embedded RAM (M4K block):
• 4608 RAM bits
• 250 MHz
• single or dual port
• configurable as a FIFO
Embedded multipliers:
• ideal for DSP applications
• 250 MHz
• configurable as one 18-bit multiplier or as two independent 9-bit multipliers
Embedded RAM and multipliers
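As a quick arithmetic check, the EP2C35 figures quoted in this case study (33216 LEs, 16 LEs per LAB, 105 M4K blocks of 4608 bits, 35 multipliers) give the following totals; the variable names are illustrative.

```python
LES, LES_PER_LAB = 33216, 16
M4K_BLOCKS, M4K_BITS = 105, 4608
MULTIPLIERS_18X18 = 35

labs = LES // LES_PER_LAB           # number of LABs
ram_bits = M4K_BLOCKS * M4K_BITS    # total embedded RAM bits
mult_9x9 = MULTIPLIERS_18X18 * 2    # if every multiplier is split into two 9-bit ones
print(labs, ram_bits, mult_9x9)     # 2076 483840 70
```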
• I/O element (IOE) structure: allows bidirectional signals
• 5 IOEs per row I/O block
IO (Input/Output) Element and structure (IOE)
ANNEXES:
Information on reconfigurable circuit (FPGA) prices: DIGI-KEY CORPORATION, http://www.digikey.com/
2 examples:
- Spartan-IIE (300K gates): approx. $25
- Virtex-4: approx. $2000 (contains a series of high-performance functional units: DSP, microcontroller, Ethernet interfaces, etc.)
Recent research areas in reconfigurable computing
• Power-efficient architectures and CAD techniques for FPGAs
• New applications for reconfigurable circuits (especially embedded applications and security)
•A better understanding of the roles of standard microprocessors and reconfigurable hardware
•Multiple parallelism types
•Coarse-grained reconfigurable architectures
Bibliography:
[1] Prof. Sherief Reda, Division of Engineering, Brown University, Reconfigurable Computing, Spring 2007
[2] Prof. Russell Tessier, Reconfigurable Computing, 2006
[3] Dr. Gilles Sassatelli, Lecture on Reconfigurable Technologies, 2009