ee109 fpgas and memories
TRANSCRIPT
18.1
Unit 18
Field Programmable Gate Arrays (FPGAs)
Implementing Logic Functions with Memories
18.2
HARDWARE IMPLEMENTATION TARGETS
18.3
Processing Logic Approaches
• Recall HW/SW designs sit on a continuum
• Suppose I want to implement: F = (X+Y)*(A+B)
• Custom Hardware (Faster, Less Power)
– Logic that directly implements a specific task
– Example above may use separate adders and a multiplier unit
• General Purpose (GP) Processor/Microcontroller (Design Time, Cost)
– Logic designed to execute SW instructions
– Provides basic processing resources that are reused by each instruction
• What if I want to perform: (X*Y) + (A*B)
– What's easiest to redesign?
[Figure: Custom HW implementation. Two adders compute X+Y and A+B and feed a multiplier that produces F.]
[Figure: Computing system continuum, from application specific hardware (no software) to a processor executing software. Axes: flexibility and design time; performance; cost.]
[Figure: GP processor implementation of (X+Y)*(A+B). The instruction store holds ADD T,X,Y; ADD S,A,B; MUL F,T,S. The data is in memory, and the processor contains the CPU control, an adder, and a multiplier.]
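The redesign question can be made concrete with a small sketch. The interpreter below is hypothetical (the `run` function and the tuple encoding of instructions are assumptions, not part of the slides); it only shows that on a GP processor, switching from (X+Y)*(A+B) to (X*Y)+(A*B) means changing the stored instructions while the hardware stays the same.

```python
def run(program, memory):
    """Execute (op, dest, src1, src2) tuples against a dict of named values."""
    ops = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    for op, dest, src1, src2 in program:
        memory[dest] = ops[op](memory[src1], memory[src2])
    return memory

# The three instructions from the slide: F = (X+Y)*(A+B)
sum_then_mul = [("ADD", "T", "X", "Y"), ("ADD", "S", "A", "B"), ("MUL", "F", "T", "S")]
# The redesigned function: F = (X*Y)+(A*B). Only the program changes.
mul_then_sum = [("MUL", "T", "X", "Y"), ("MUL", "S", "A", "B"), ("ADD", "F", "T", "S")]

data = {"X": 2, "Y": 3, "A": 4, "B": 5}  # example values chosen for illustration
print(run(sum_then_mul, dict(data))["F"])  # (2+3)*(4+5) = 45
print(run(mul_then_sum, dict(data))["F"])  # 2*3 + 4*5 = 26
```

The custom hardware version would instead need one adder swapped for a second multiplier and the remaining units rewired.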
18.4
Progression of HW Logic Density
• Our ability to design hardware components with greater numbers of gates/transistors has increased exponentially
• Small Scale Integrated (SSI) Circuits
– 1960’s and 1970’s
– A few gates on a chip (74LS00 has 4 NAND gates)
• Medium Scale Integrated (MSI) Circuits
– 1970’s
– Around a hundred gates per chip (4-bit adder)
• Large Scale Integrated (LSI) Circuits
• Very Large Scale Integrated (VLSI) Circuits
– 100’s of millions of gates
18.5
ASICs
• An Application Specific Integrated Circuit (ASIC) is another name for a typical "chip"
• Computer engineers determine the gates and their interconnections that perform a specific task/application
– Start with high level "behavioral" description
– Use CAD software tools to refine that to logic gates
– Use CAD software tools to refine that to transistors and where each should be located on the surface of the chip and how they should be wired together
– From there the chip is fabricated and mass-produced
• Design process is expensive, and once fabricated the design cannot be changed (but it is fast and uses less power)
In an ASIC design, a unique chip will be manufactured that implements our design, at which point the HW design is fixed & cannot be changed (example: Pentium, etc.)
18.6
ASICs
18.7
Motivation for Reconfigurable Logic
• Could we get some of the benefits of both hardware (speed/power) AND software (flexible/reusable)?
• Yes…enter Field Programmable Gate Arrays (FPGAs)
– Have prebuilt, generic hardware constructs that can be configured and interconnected based on one design and then reconfigured and interconnected later for another design
• Let's learn more about the secret ingredient to FPGAs…memories!
[Figure: Computing system continuum. Application specific hardware (no software / custom chip) at one end, a microcontroller/processor executing software at the other, and reconfigurable hardware (FPGAs) between them.]
FPGAs have “logic resources” on them that we can configure to implement our specific design. We can then reconfigure the FPGA to implement another design.
18.8
Where are FPGAs Used
• Datacenters
– Bing search engine
– Real-time data analytics
– Compression and encryption
– High-frequency trading
• Robots and Rovers
– JPL and the Mars Rovers
• Telecom
• Aerospace
18.9
USING MEMORIES TO BUILD COMBINATIONAL CIRCUITS
18.10
MEMORY BASICS
Dimensions and Operations
18.11
Memories
• Memories store (write) and retrieve (read) data
– Read-Only Memories (ROM’s): Can only retrieve data (contents are initialized and then cannot be changed)
– Read-Write Memories (RWM’s): Can retrieve data and change the contents to store new data
18.12
ROM’s
• Memories are just tables of data with rows and columns
• When data is read, one entire row of data is read out
• The row to be read is selected by putting a binary number on the address inputs
ROM contents (address inputs A2 A1 A0, data outputs D3 D2 D1 D0):
Row  D3 D2 D1 D0
0    0  0  1  1
1    1  0  1  0
2    0  1  0  0
3    0  1  1  1
4    1  1  0  1
5    1  0  0  0
6    0  1  1  0
7    1  0  1  1
18.13
ROM’s
• Example
– Address = 4₁₀ = 100₂ is provided as input
– ROM outputs data in that row (1101 bin.)
[Figure: the same ROM with address inputs A2 A1 A0 = 100 (100₂ = 4₁₀). Row 4 is output, so D3 D2 D1 D0 = 1 1 0 1.]
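The read operation can be sketched in a few lines. The list below holds the eight rows of the example ROM; the name `rom_read` and the use of bit strings for the data are choices made here for illustration.

```python
# Rows 0..7 of the 8x4 example ROM, written as D3 D2 D1 D0
ROM = ["0011", "1010", "0100", "0111", "1101", "1000", "0110", "1011"]

def rom_read(a2, a1, a0):
    """Form the row number from the address bits and return that entire row."""
    address = (a2 << 2) | (a1 << 1) | a0
    return ROM[address]

print(rom_read(1, 0, 0))  # address 100 (binary) = row 4 -> 1101
```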
18.14
Memory Dimensions
• Memories are named by their dimensions:
– Rows x Columns
• n rows and m columns => n x m ROM
• n rows => log2(n) address bits…or…2^k rows => k address bits
• m cols => m data outputs
[Figure: ROM with address inputs A(n-1)…A0, rows numbered 0 to 2^n - 1, and data outputs D(m-1)…D0.]
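As a quick check of the naming rule, the sketch below computes the number of address bits from the row count. The helper names are hypothetical, and the row count is assumed to be a power of 2.

```python
def address_bits(rows):
    """Address bits needed for `rows` rows (rows assumed to be a power of 2)."""
    return (rows - 1).bit_length()

def describe(rows, cols):
    """Summarize a rows x cols memory using the naming rule above."""
    return f"{rows}x{cols} memory: {address_bits(rows)} address bits, {cols} data outputs"

print(describe(8, 4))    # 8x4 memory: 3 address bits, 4 data outputs
print(describe(256, 8))  # 256x8 memory: 8 address bits, 8 data outputs
```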
18.15
RWM’s
• Writable memories provide a set of data inputs for write data (as opposed to the data outputs for read data)
• A control signal R/W (1=READ / 0 = WRITE) is provided to tell the memory what operation the user wants to perform
[Figure: 8x4 RWM holding the same eight rows as the ROM example, with address inputs A2 A1 A0, data inputs DI3 DI2 DI1 DI0, data outputs DO3 DO2 DO1 DO0, and the R/W control input.]
18.16
RWM’s
• Write example
– Address = 3₁₀ = 011₂
– DI = 12₁₀ = 1100₂
– R/W = 0 => Write op.
• Data in row 3 is overwritten with the new value of 1100₂.
[Figure: 8x4 RWM during the write. Address inputs A2 A1 A0 = 011, data inputs DI3 DI2 DI1 DI0 = 1100, and R/W = 0. Row 3 now holds 1 1 0 0, and the data outputs DO3 DO2 DO1 DO0 are shown as ? ? ? ?.]
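A minimal sketch of the read/write behavior, using the R/W convention from the slides (1 = READ, 0 = WRITE). The `RWM` class and its `access` method are hypothetical names, and returning `None` during a write is an assumption standing in for the "? ? ? ?" outputs in the figure.

```python
class RWM:
    """Read-write memory modeled as a list of row values."""

    def __init__(self, rows):
        self.rows = list(rows)

    def access(self, address, rw, data_in=None):
        if rw == 1:                    # READ: return the selected row
            return self.rows[address]
        self.rows[address] = data_in   # WRITE: overwrite the selected row
        return None                    # assumption: outputs undefined during a write

mem = RWM([0b0011, 0b1010, 0b0100, 0b0111, 0b1101, 0b1000, 0b0110, 0b1011])
mem.access(address=0b011, rw=0, data_in=0b1100)    # the write example: row 3 <- 1100
print(format(mem.access(0b011, rw=1), "04b"))      # 1100
```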
18.17
USING MEMORIES TO BUILD COMBINATIONAL FUNCTIONS
Look-up tables…
18.18
Memories as Look-Up Tables
• One major application of memories in digital design is to use them as LUT’s (Look-Up Tables) to implement logic functions
– This is the core technology used by FPGAs (Field-Programmable Gate Arrays)
• Idea: Use a memory to hold the truth table of a function and feed the inputs of the function to the address inputs to "look-up" the answer
18.19
Implementing Functions w/ Memories
Arbitrary logic function:
X Y Z F
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 0
1 1 1 1
[Figure: 8x1 memory holding the F column (rows 0 to 7 = 1, 0, 1, 1, 0, 0, 0, 1). X, Y, Z drive address inputs A2, A1, A0 and D0 is F. The X,Y,Z inputs “look up” the correct answer.]
Use a memory with the same dimensions as 'output' side of the truth table.
It's almost TOO easy.
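The look-up idea can be sketched directly: the memory contents are the F column of the truth table, and the inputs form the row number. The names `F_LUT` and `f` are chosen here for illustration.

```python
# F column of the truth table, stored as rows 0..7 of an 8x1 memory
F_LUT = [1, 0, 1, 1, 0, 0, 0, 1]

def f(x, y, z):
    """X, Y, Z drive A2, A1, A0; the selected row is the function output."""
    return F_LUT[(x << 2) | (y << 1) | z]

print(f(0, 1, 1))  # row 3 -> 1
print(f(1, 0, 1))  # row 5 -> 0
```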
18.20
Implementing Functions w/ Memories
Multi-bit function (one's count):
X Y Z C S
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
[Figure: 8x2 memory holding the C and S columns. X, Y, Z drive A2, A1, A0; D1 is C and D0 is S. Example: inputs 1, 0, 1 select row 5 and the memory outputs 10, since 1+0+1 = 10 in binary.]
Use a memory with the same dimensions as 'output' side of the truth table.
It's almost TOO easy.
18.21
3-bit Squaring Circuit
• Q: What size memory would you use to build our 3-bit squaring circuit?
• A: 8x6 memory
• Q: What would you connect to the address inputs of the memory?
• A: A[2:0]
• Q: What bits would you program into row 5 of the memory?
• A: 011001 (i.e. 25 = 5^2)
Memory contents to build the 3-bit squaring circuit:
    Inputs     Outputs
A   A2 A1 A0   B5 B4 B3 B2 B1 B0   B=A^2
0   0  0  0    0  0  0  0  0  0    0
1   0  0  1    0  0  0  0  0  1    1
2   0  1  0    0  0  0  1  0  0    4
3   0  1  1    0  0  1  0  0  1    9
4   1  0  0    0  1  0  0  0  0    16
5   1  0  1    0  1  1  0  0  1    25
6   1  1  0    1  0  0  1  0  0    36
7   1  1  1    1  1  0  0  0  1    49
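The memory contents do not have to be worked out by hand. A short sketch (the name `SQUARE_ROM` is chosen here) that generates the 8 rows of 6 bits each:

```python
# Row A of the 8x6 memory holds A*A written as 6 bits (B5..B0)
SQUARE_ROM = [format(a * a, "06b") for a in range(8)]

for a, row in enumerate(SQUARE_ROM):
    print(a, row)

print(SQUARE_ROM[5])  # 011001, i.e. 25
```

Six output bits are enough because the largest result, 7*7 = 49, is below 2^6 = 64.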
18.22
4x4 Multiplier Example
Determine the dimensions of the memory that would be necessary to implement a 4x4-bit unsigned multiplier with inputs X[3:0] and Y[3:0] and outputs P[??:0]
Question: How many bits are needed for P?
Question: What are the contents of the numbered rows?
Example:
X3X2X1X0 = 0010
Y3Y2Y1Y0 = 0001
P = X * Y = 2 * 1 = 2 = 00000010
[Figure: ROM with X3 X2 X1 X0 on address inputs A7 A6 A5 A4, Y3 Y2 Y1 Y0 on A3 A2 A1 A0, and data outputs P7…P0.]
Row   Address    X * Y               Contents (P7…P0)
0     00000000   0000 * 0000 = 0     0 0 0 0 0 0 0 0
2     00000010   0000 * 0010 = 0     0 0 0 0 0 0 0 0
20    00010100   0001 * 0100 = 4     0 0 0 0 0 1 0 0
39    00100111   0010 * 0111 = 14    0 0 0 0 1 1 1 0
255   11111111   1111 * 1111 = 225   1 1 1 0 0 0 0 1
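The numbered rows can be checked with a short sketch that fills all 256 rows, assuming the address layout in the figure (X on A7..A4, Y on A3..A0). The name `MULT_ROM` is chosen here for illustration.

```python
# Upper 4 address bits are X, lower 4 are Y; each row stores X*Y
MULT_ROM = [(addr >> 4) * (addr & 0xF) for addr in range(256)]

for row in (0, 2, 20, 39, 255):
    print(row, format(row, "08b"), format(MULT_ROM[row], "08b"))

# The largest product is 15*15 = 225, which fits in 8 bits
print(max(MULT_ROM))
```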
18.23
Implementing Functions w/ Memories
• To implement a function w/ n-variables and m outputs
• Just place the output truth table values in the memory
• Memory will have dimensions: 2^n rows and m columns
– Still does not scale terribly well (i.e. n-inputs requires memory w/ 2^n rows)
– But it is easy and since we can change the contents of memories it allows us to create "reconfigurable" logic
– This idea is at the heart of FPGAs
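The general recipe can be sketched as a helper that fills a memory from any function of n one-bit inputs. The name `build_lut` is hypothetical; the one's count function from earlier in the unit is used as the example, and overwriting the contents stands in for reconfiguration.

```python
from itertools import product

def build_lut(func, n):
    """Return 2**n rows; row i holds func applied to the n bits of i (MSB first)."""
    return [func(*bits) for bits in product((0, 1), repeat=n)]

def ones_count(x, y, z):
    """Two-bit count (C, S) of how many inputs are 1."""
    return divmod(x + y + z, 2)

lut = build_lut(ones_count, 3)
print(lut[0b101])   # (1, 0): two of the three inputs are 1

# "Reconfigure": same memory size, different contents, different function
lut = build_lut(lambda x, y, z: (x & y, y | z), 3)
print(lut[0b101])   # (0, 1)
```

The row count doubles with every added input, which is the scaling problem noted above: 3 inputs need 8 rows, 8 inputs need 256.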
18.24
FPGAS
18.25
Basis of FPGA’s
• Memories provide a universal way to implement any combinational logic function
– 2^n x m memory can implement a function of n-variables and m outputs
• If we use RWM (read/write memory) rather than ROM’s we can change what function the memory implements
• Memories are referred to as Look-up Tables (LUT’s)
[Figure: Full adder implementation in an 8x2 memory. X, Y, Cin drive A2, A1, A0; D1 is Cout and D0 is S. Rows 0 to 7 hold 00, 01, 01, 10, 01, 10, 10, 11.]
18.26
Configurable Logic Blocks (CLB’s)
• The memory allows for any combinational function
• Provided D-FF’s allow designs with sequential logic
– “Bypass” mux selects the pure combinational output of the LUT or the sequential/registered/D-FF output
• Blue boxes indicate configurable bits that control the operation and function of the logic
[Figure: CLB containing an 8x2 memory (any 3-input / 2-output combinational function) with address inputs A2 A1 A0 and outputs D1 D0. Each output feeds a D-FF (FF’s if sequential logic needed) and a 2-to-1 bypass mux that selects either the LUT output or the D-FF output.]
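A behavioral sketch of one CLB may help tie the pieces together. The `CLB` class is hypothetical, and the bypass encoding (1 = combinational LUT output, 0 = registered D-FF output) is an assumption, since the mux input order cannot be read from the figure.

```python
class CLB:
    """8x2 LUT followed by two D-FFs and two bypass muxes."""

    def __init__(self, lut_rows, bypass):
        self.lut = list(lut_rows)    # configurable: 8 rows of (D1, D0)
        self.bypass = tuple(bypass)  # configurable: per output, 1 = LUT, 0 = D-FF (assumed)
        self.ff = [0, 0]             # D-FF state, assumed to start at 0

    def _lookup(self, a2, a1, a0):
        return self.lut[(a2 << 2) | (a1 << 1) | a0]

    def outputs(self, a2, a1, a0):
        comb = self._lookup(a2, a1, a0)
        return tuple(comb[i] if self.bypass[i] else self.ff[i] for i in range(2))

    def clock(self, a2, a1, a0):
        """Clock edge: both D-FFs capture the current LUT outputs."""
        self.ff = list(self._lookup(a2, a1, a0))

FULL_ADDER = [(0,0), (0,1), (0,1), (1,0), (0,1), (1,0), (1,0), (1,1)]  # (Cout, S)

comb_clb = CLB(FULL_ADDER, bypass=(1, 1))
print(comb_clb.outputs(1, 1, 0))   # (1, 0): purely combinational full adder

reg_clb = CLB(FULL_ADDER, bypass=(0, 0))
reg_clb.clock(1, 1, 1)
print(reg_clb.outputs(0, 0, 0))    # (1, 1): registered result from the last clock
```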
18.27
Routing & Switch Matrices
• Inputs and outputs of neighboring CLB’s connect to a “switch matrix” (SM)
• Switch matrix is simply composed of muxes that allow us to “route” inputs and outputs to another CLB or further away
[Figure: 2x2 grid of switch matrices (SM), each connected to four neighboring CLBs. The CLB connections are labeled with widths 3 and 2.]
18.28
Routing & Switch Matrices
• Suppose we want the connection shown in green and purple, what select values would be used?
[Figure: a switch matrix (SM) connected to four CLBs and to/from the N, E, S, and W switch matrices. The routing mux inputs are labeled A through L, and each mux is configured by its select bits.]
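One way to picture a routing mux is as a selector over the labeled sources. The sketch below assumes the select value is simply the position of the label in alphabetical order (A = 0000, B = 0001, and so on); this matches the values given in the exercise later in the unit (B = 0001, D = 0011, E = 0100) but is otherwise an assumption, as are the function names.

```python
MUX_INPUTS = "ABCDEFGHIJKL"  # source labels from the figure; index order is assumed

def select_bits(label):
    """4-bit select value that would choose the given source label."""
    return format(MUX_INPUTS.index(label), "04b")

def route(select, signals):
    """Return the value of the source chosen by a 4-bit select string."""
    return signals[MUX_INPUTS[int(select, 2)]]

print(select_bits("B"))  # 0001
print(select_bits("E"))  # 0100

signals = {label: 0 for label in MUX_INPUTS}
signals["D"] = 1
print(route("0011", signals))  # 1: select 0011 picks source D
```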
18.29
Place and Route
• ASIC: Find where each gate should be placed on the chip and how to route the wires that connect to it
– Direct connections can be faster
• FPGA: Determine which LUT’s should be used and how to route through switch matrices
– Added delay to go through the routing muxes
[Figure: ASIC versus FPGA. The FPGA side shows the grid of CLBs and switch matrices (SM).]
18.30
Exercise
• Find the configuration bits to build a 3-bit free-running (always enabled) counter
[Figure: block diagram of the counter. A 3-bit register holds Q2 Q1 Q0, and a chain of half adders (HA) with carry-in Ci = 1 computes the next count. CLB 1 and CLB 2 are connected through the switch matrix.]
CLB 1 memory (Q0 on A0, the other address inputs at 0; D1 = Co, D0 = Q0* = Q0+1):
Q0 Co Q0*
0  0  1
1  1  0
Rows 2 to 7 are don't cares (d d).
CLB 2 memory (Q2, Q1, Co on A2, A1, A0; D1 = Q2*, D0 = Q1*):
Q2 Q1 Co Q2* Q1*
0  0  0  0   0
0  0  1  0   1
0  1  0  0   1
0  1  1  1   0
1  0  0  1   0
1  0  1  1   1
1  1  0  1   1
1  1  1  0   0
Switch matrix selects:
– Select to choose Q0 (B input label) = 0001
– Selects to choose A = 0000, D = 0011, E = 0100
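The two memory contents from the exercise can be checked with a short simulation. This is a sketch under simplifying assumptions: the switch matrix routing is modeled as direct wiring, Q0, Q1, and Q2 are taken from the D-FF outputs, and Co is taken combinationally from CLB 1. The names `CLB1_LUT`, `CLB2_LUT`, and `step` are chosen here.

```python
# CLB 1: Q0 -> (Co, Q0*); only rows 0 and 1 matter, the rest are don't cares
CLB1_LUT = {0: (0, 1), 1: (1, 0)}
# CLB 2: row (Q2 Q1 Co) -> (Q2*, Q1*)
CLB2_LUT = [(0,0), (0,1), (0,1), (1,0), (1,0), (1,1), (1,1), (0,0)]

def step(q2, q1, q0):
    """One clock edge: look up the next state in the two LUTs."""
    co, q0_next = CLB1_LUT[q0]
    q2_next, q1_next = CLB2_LUT[(q2 << 2) | (q1 << 1) | co]
    return q2_next, q1_next, q0_next

state = (0, 0, 0)
counts = []
for _ in range(9):
    counts.append(state[0] * 4 + state[1] * 2 + state[2])
    state = step(*state)

print(counts)  # [0, 1, 2, 3, 4, 5, 6, 7, 0]
```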
18.31
ASIC’s vs. FPGA’s
• ASIC’s
– Faster
– Handles Larger Designs
– More Expensive
– Less Flexible (Cannot be reconfigured to perform a new hardware function)
• FPGA’s
– Slower (extra logic to make it reconfigurable)
– Smaller Designs
– Less Expensive
– Extremely Flexible
18.32
Modern FPGA's
• SoC design (Xilinx Kintex [KU115])
– Quad-Core ARM cores
– DDR3 SDRAM Memory Interface
– ~800 I/O Pins
– ~15M gate equivalent FPGA fabric
• ~1M D-FFs + 552K LUTs
• 1968 dedicated DSP "slices" 18x18 multiply + adder