2/19/2016 fpgas k. elliott fleming computer science & artificial intelligence lab massachusetts...
05/04/23 http://csg.csail.mit.edu/6.375 L11-01
FPGAs
K. Elliott Fleming
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
FPGA: A Sea of Resources
[Figure: an FPGA as a sea of resources: processor, I/O pads, logic blocks, SRAM, multipliers, clock buffers, and PLLs]
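What can we build?
Resource         DE2-70     DE3         802.11ag    SMIPS V2
Logic Elements   68416      135200      85924       6501
Registers        70234      270400      42107       2841
SRAM             250 (4K)   1040 (9K)   265 (9K)    226 (4K)
Multipliers      300        576         321         0
Clock Buffers    16         32          7           5
PLL              4          8           1           0
Lines of Code    -          -           8762        1603
(The DE2-70 and DE3 columns list the resources each board's FPGA provides; the 802.11ag and SMIPS V2 columns list what each design uses.)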
- Very complex systems
Logic Block: Building functionality
[Figure: a logic block built from two look-up tables and an adder (++), with muxing logic, combinational input and output, and a carry chain (Carry In, Carry Out)]
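To make the carry chain concrete, here is a minimal Python sketch, not a model of any particular vendor's logic block. It assumes each bit position is one cell whose stored truth tables produce the sum bit and the carry-out from (a, b, carry-in); the names make_lut, lookup, and ripple_add are hypothetical. Real devices typically implement the carry path with dedicated logic rather than a LUT, which is what the separate Carry In/Carry Out wiring in the figure suggests.

```python
from typing import Callable, List, Tuple


def make_lut(fn: Callable[..., int], n_inputs: int) -> List[int]:
    """Program a LUT by storing fn's output bit for every input combination."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> k) & 1 for k in range(n_inputs)]
        table.append(fn(*bits) & 1)
    return table


def lookup(table: List[int], *bits: int) -> int:
    """The inputs form an address that selects one stored bit."""
    index = 0
    for k, bit in enumerate(bits):
        index |= (bit & 1) << k
    return table[index]


# Hypothetical per-cell configuration: a full adder split across two LUTs.
SUM_LUT = make_lut(lambda a, b, c: a ^ b ^ c, 3)
CARRY_LUT = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)


def ripple_add(x: int, y: int, width: int) -> Tuple[int, int]:
    """Add two width-bit numbers with a chain of width cells."""
    carry = 0  # Carry In of the first cell
    total = 0
    for k in range(width):
        a, b = (x >> k) & 1, (y >> k) & 1
        total |= lookup(SUM_LUT, a, b, carry) << k
        carry = lookup(CARRY_LUT, a, b, carry)  # Carry Out feeds the next Carry In
    return total, carry  # the final carry is the block's Carry Out
```

For example, ripple_add(15, 1, 4) returns (0, 1): the 4-bit sum wraps to zero and the overflow appears on Carry Out.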
Slice: Look-up Table
[Figure: look-up table detail, with muxing logic, combinational input and output, and an enable demux]
- Arbitrary logic: program the flip-flops, then use the inputs to select one of them
- Can we make a ROM? Can we make a RAM? Just add enable logic
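A LUT is a small memory read through its inputs, which is why the ROM and RAM questions above have easy answers. The Python sketch below is a hypothetical model (the class name LutRam and its methods are illustrative, not a vendor primitive): the configuration bits are the ROM contents, and adding a write-enable demux turns the same structure into a RAM.

```python
from typing import Optional, Sequence


class LutRam:
    """A 2**k-entry, 1-bit-wide memory modeled on a k-input LUT."""

    def __init__(self, n_inputs: int, contents: Optional[Sequence[int]] = None):
        size = 2 ** n_inputs
        # The configuration flip-flops, normally programmed once at configuration time.
        self.bits = [b & 1 for b in contents] if contents is not None else [0] * size
        assert len(self.bits) == size

    def read(self, addr: int) -> int:
        # Combinational read: the inputs drive the mux tree that selects one stored bit.
        return self.bits[addr]

    def write(self, addr: int, value: int, write_enable: bool) -> None:
        # The added "enable logic": a demux routes the write to the addressed bit,
        # and only when write_enable is asserted. Never enabling it gives a ROM.
        if write_enable:
            self.bits[addr] = value & 1


# Arbitrary logic as a ROM: a 4-input parity function stored as a truth table.
parity_rom = LutRam(4, [bin(i).count("1") & 1 for i in range(16)])

# The same structure used as a RAM.
ram = LutRam(4)
ram.write(5, 1, write_enable=True)
ram.write(6, 1, write_enable=False)  # ignored: enable not asserted
```

Reading parity_rom at address 0b0111 gives 1 (three ones), and the RAM holds a 1 only at address 5.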
Reconfigurable Wiring
[Figure: logic blocks connected to one another through switches]
- 2D mesh grid
- Local connections made by driving powerful transistors
- Switches route across dimensions
- Heterogeneous wire lengths
  - Many wires to nearby cells
  - Few long wires
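A rough way to see why even a few long wires matter is to count switch hops. This Python sketch assumes wires of just two lengths (1 cell and long_len cells), an unlimited supply of each, and routes that go along x, turn once at a switch, and never overshoot the target; hops_1d and hops_2d are hypothetical helpers. Real routing fabrics are far more constrained, so treat the numbers as illustration only.

```python
from typing import Tuple


def hops_1d(distance: int, long_len: int) -> int:
    """Fewest wire segments to cover |distance| cells in one dimension, using
    long wires of long_len cells and short wires of 1 cell, without overshooting.
    Each long wire replaces long_len short hops, so using as many as fit is optimal."""
    long_wires, short_wires = divmod(abs(distance), long_len)
    return long_wires + short_wires


def hops_2d(src: Tuple[int, int], dst: Tuple[int, int], long_len: int) -> int:
    """Route along x, turn at a switch, then route along y."""
    return hops_1d(dst[0] - src[0], long_len) + hops_1d(dst[1] - src[1], long_len)
```

For example, hops_2d((0, 0), (9, 2), 4) takes two long wires and one short wire in x plus two short wires in y, 5 hops in total, versus 11 hops with short wires only (long_len = 1).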
SMIPS System
SMIPS Infrastructure
SMIPS Infrastructure
- Bus interface logic
  - Avalon master/slave
- CBus devices
  - mkCBusWideRegRW(addr,reg);
  - Many interfaces (Get, RegFile, etc.)
  - Mechanism for building the memory map automatically
- Some C drivers included
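The idea of building the memory map automatically can be sketched in a few lines of Python. This is not the Bluespec mkCBusWideRegRW interface; it is a hypothetical model (CBus, add_reg, read, and write are illustrative names) in which each register added to the bus gets the next free word address. It also assumes one plausible reading of "wide": a register wider than the bus occupies several consecutive word addresses, least-significant word first.

```python
from typing import Dict, Tuple


class CBus:
    """Hypothetical register bus that assigns addresses as registers are added."""

    def __init__(self, bus_width: int = 32):
        self.bus_width = bus_width
        self.word_mask = (1 << bus_width) - 1
        self.next_addr = 0
        self.values: Dict[str, int] = {}                    # register name -> value
        self.widths: Dict[str, int] = {}                    # register name -> width in bits
        self.memory_map: Dict[int, Tuple[str, int]] = {}    # address -> (name, word index)

    def add_reg(self, name: str, width: int, init: int = 0) -> int:
        """Place a register at the next free address and return its base address."""
        words = -(-width // self.bus_width)  # ceiling division
        base = self.next_addr
        for w in range(words):
            self.memory_map[base + w] = (name, w)
        self.values[name] = init & ((1 << width) - 1)
        self.widths[name] = width
        self.next_addr += words
        return base

    def write(self, addr: int, data: int) -> None:
        """Bus write: replace one bus-width word of the addressed register."""
        name, w = self.memory_map[addr]
        shift = w * self.bus_width
        cleared = self.values[name] & ~(self.word_mask << shift)
        updated = cleared | ((data & self.word_mask) << shift)
        self.values[name] = updated & ((1 << self.widths[name]) - 1)

    def read(self, addr: int) -> int:
        """Bus read: return one bus-width word of the addressed register."""
        name, w = self.memory_map[addr]
        return (self.values[name] >> (w * self.bus_width)) & self.word_mask


bus = CBus()
ctrl = bus.add_reg("ctrl", 8)   # lands at address 0
key = bus.add_reg("key", 64)    # needs two 32-bit words: addresses 1 and 2
bus.write(key, 0xDEADBEEF)
bus.write(key + 1, 0x12345678)
```

After these writes the 64-bit register holds 0x12345678DEADBEEF, and a driver only needs the generated address assignments to reach it.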
Demonstration
- Synplify Pro
- Quartus II
- Nios-II IDE
Cryptosort: Think Different
- Large (0.5 GB) encrypted database
  - Decrypt the database
  - Sort the database on a key
  - Encrypt the database
- Do it fast, on an FPGA
- Design principles differ from ASIC
  - Must be aware of FPGA hardware
Joint with Myron King, Man Cheuk Ng
Cryptosorter (from the problem statement)
[Figure: encrypted records in external memory; decrypt the database with AES; sort the records in ascending order; encrypt the sorted records with AES. Each stage shows the same example records as hex values.]
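The three steps of the problem can be written as a software reference model, the kind of thing a hardware implementation is checked against. This Python sketch does not use AES: as a clearly labeled stand-in it XORs each record with a SHA-256-based keystream indexed by the record's position (standard-library hashlib only), so encryption and decryption are the same operation. The names keystream, xor_records, and cryptosort are hypothetical.

```python
import hashlib
from typing import List


def keystream(key: bytes, position: int) -> int:
    """64 pseudorandom bits for a record position (stand-in for an AES keystream)."""
    digest = hashlib.sha256(key + position.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def xor_records(key: bytes, records: List[int]) -> List[int]:
    """Encrypt or decrypt: XOR with the same keystream is its own inverse."""
    return [r ^ keystream(key, i) for i, r in enumerate(records)]


def cryptosort(key: bytes, encrypted: List[int]) -> List[int]:
    plain = xor_records(key, encrypted)  # 1. decrypt the database
    plain.sort()                         # 2. sort on key, ascending
    return xor_records(key, plain)       # 3. re-encrypt the sorted records


key = b"demo-key"  # hypothetical key
values = [0x002000000000, 0x000400000000, 0x100000000000, 0x000004000000]
encrypted = xor_records(key, values)
result = cryptosort(key, encrypted)
```

Because the stand-in keystream depends on position, the sorted records are re-encrypted at their new positions rather than moved around as ciphertext; decrypting result yields the values in ascending order.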
Cryptosort Architecture:
[Figure: DRAM and a PPC PLB master on the PLB; a function unit containing read memory logic, record input, two AES cores with xor stages, a sort tree feeder, a sort tree of level 1 through level 6 sorters, record output, and memory write logic]
- Use merge sort: O(n log n)
Engineering the Merge Tree
[Figure: a binary tree of comparators (<): four at the first level, two at the next, one at the root]
- Each level merges 2n streams into n streams
- Easy to parameterize and build the tree
- Probably optimal for an ASIC
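The level-by-level structure is what makes the tree easy to parameterize. This Python sketch is a software model, not the hardware design: merge2 stands in for one comparator node, and merge_tree applies one level at a time, halving the number of streams. It assumes the number of input streams is a power of two and that each stream is already sorted.

```python
from typing import List


def merge2(a: List[int], b: List[int]) -> List[int]:
    """One comparator node: repeatedly emit the smaller head of two sorted streams."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # one stream is exhausted; drain the other
    out.extend(b[j:])
    return out


def merge_tree(streams: List[List[int]]) -> List[int]:
    """Each level merges 2n sorted streams into n, until one stream remains."""
    assert streams and len(streams) & (len(streams) - 1) == 0, "need a power of two"
    level = streams
    while len(level) > 1:
        level = [merge2(level[k], level[k + 1]) for k in range(0, len(level), 2)]
    return level[0]
```

With the six sorter levels shown in the architecture figure, a binary tree of this shape would take 2**6 = 64 sorted input streams.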
Refining the Module
- Naïve implementation: exponential resource usage
  - Each comparator takes 3% of slices
  - At most, fit 3 levels
- Key observation: throughput is rate-limited by the final 2-to-1 merge step
  - This means each level only needs to perform one comparison per cycle
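The "exponential" claim is easy to check with the slide's 3%-per-comparator figure. This sketch counts comparator area only (FIFOs, AES cores, and memory logic are ignored, so it does not by itself reproduce the 3-level limit) and compares a fully replicated tree, where level k below the root has twice as many comparators as level k - 1, with one shared comparator per level.

```python
COMPARATOR_PERCENT = 3  # from the slide: each comparator takes 3% of slices


def naive_comparators(levels: int) -> int:
    # 1 + 2 + 4 + ... + 2**(levels - 1) comparators in a full binary tree.
    return 2 ** levels - 1


def shared_comparators(levels: int) -> int:
    # One comparator per level, time-multiplexed across that level's merges.
    return levels


for levels in range(1, 7):
    naive = naive_comparators(levels) * COMPARATOR_PERCENT
    shared = shared_comparators(levels) * COMPARATOR_PERCENT
    print(f"{levels} levels: naive {naive}% of slices, shared {shared}%")
```

At six levels the naive tree would need 63 comparators (189% of the slices), while sharing needs 6 (18%). Sharing is safe because the root emits at most one record per cycle, so no level ever needs more than one comparison per cycle.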
Sharing the Comparator: Idea
Loop:
- Choose a non-empty input pair whose output FIFO has room (scheduling)
- Compare the FIFO heads
- Dequeue the smaller one and put it on the output FIFO
We save area by having one comparator per level, but we introduce a comparator scheduling problem.
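The loop above can be simulated in software to see the scheduling problem. This Python sketch is an assumption-laden model, not the Cryptosort implementation: every input run is already fully present, the scheduler picks the next eligible pair round-robin, and a hypothetical downstream stage drains one output FIFO per cycle, also round-robin. The whole level performs at most one comparison per cycle.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def merge_step(left: Deque[int], right: Deque[int]) -> int:
    """Compare the FIFO heads and dequeue the smaller (an exhausted side loses)."""
    if not right or (left and left[0] <= right[0]):
        return left.popleft()
    return right.popleft()


def shared_comparator_level(
    pairs: Sequence[Tuple[List[int], List[int]]], out_capacity: int = 2
) -> Tuple[List[List[int]], int]:
    inputs = [(deque(l), deque(r)) for l, r in pairs]
    outputs: List[Deque[int]] = [deque() for _ in pairs]
    collected: List[List[int]] = [[] for _ in pairs]
    n = len(pairs)
    next_pair = 0  # round-robin pointer for the scheduler
    cycle = 0
    while any(l or r for l, r in inputs) or any(outputs):
        # Scheduling: first pair (from next_pair on) with input data and output room.
        for offset in range(n):
            i = (next_pair + offset) % n
            left, right = inputs[i]
            if (left or right) and len(outputs[i]) < out_capacity:
                outputs[i].append(merge_step(left, right))  # the single comparison
                next_pair = (i + 1) % n
                break
        # Hypothetical downstream stage: drains one output FIFO per cycle.
        j = cycle % n
        if outputs[j]:
            collected[j].append(outputs[j].popleft())
        cycle += 1
    return collected, cycle


merged, cycles = shared_comparator_level([([1, 4, 7], [2, 3, 9]), ([0, 5], [6, 8])])
```

Each pair ends up fully merged, and the cycle count is at least the number of records, since only one comparator serves the level.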
Sharing the Comparator: Physical Implementation Issues
- Not enough registers
  - Each BRAM contains multiple FIFOs
- Aggressive clock
  - Single-cycle scheduling is impossible
  - Enq happens several cycles after scheduling
  - Credit-based flow control
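Credit-based flow control lets the scheduler commit to an enqueue that lands several cycles later without overflowing the FIFO: it spends a credit (a reserved slot) when it schedules, and the consumer returns the credit on dequeue. The Python model below is a sketch under assumed parameters (capacity, latency, and a hypothetical consumer that drains once every drain_every cycles); CreditedFifo and simulate are illustrative names.

```python
from collections import deque
from typing import Deque, List, Tuple


class CreditedFifo:
    """FIFO whose producer reserves a slot (a credit) at schedule time."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.credits = capacity  # free slots not yet promised to an in-flight record
        self.items: Deque[int] = deque()

    def try_reserve(self) -> bool:
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def enq(self, value: int) -> None:
        # The slot was reserved when the record was scheduled, so this cannot overflow.
        assert len(self.items) < self.capacity
        self.items.append(value)

    def deq(self) -> int:
        value = self.items.popleft()
        self.credits += 1  # hand the freed slot back to the scheduler
        return value


def simulate(values: List[int], capacity: int = 2, latency: int = 3,
             drain_every: int = 2) -> Tuple[List[int], int]:
    fifo = CreditedFifo(capacity)
    in_flight: Deque[Tuple[int, int]] = deque()  # (arrival cycle, value)
    pending = deque(values)
    out: List[int] = []
    max_occupancy = 0
    cycle = 0
    while pending or in_flight or fifo.items:
        # Records scheduled `latency` cycles ago reach the FIFO now.
        while in_flight and in_flight[0][0] <= cycle:
            fifo.enq(in_flight.popleft()[1])
        # Schedule a new record only if a slot is guaranteed when it arrives.
        if pending and fifo.try_reserve():
            in_flight.append((cycle + latency, pending.popleft()))
        max_occupancy = max(max_occupancy, len(fifo.items))
        # Hypothetical slow consumer: one dequeue every `drain_every` cycles.
        if fifo.items and cycle % drain_every == 0:
            out.append(fifo.deq())
        cycle += 1
    return out, max_occupancy
```

Even with a multi-cycle gap between scheduling and enqueue, occupancy never exceeds capacity, because every record in flight already owns a slot.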
Layout:
[Figure: FPGA floorplan showing the DRAM/PLB/OPB interface, the PPC, AES cores 0 and 1, sort tree levels 1 through 6, and the sort tree read and write memory logic]