CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-1
From “Field-Programmable” to “Programmable”
James C. Hoe
Department of ECE
Carnegie Mellon University
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-2
Classic FPGA in a Nutshell
[Figure labels]
• I/O pins
• programmable lookup tables (LUT) and flip-flops (FF), aka “soft logic” or “fabric”
• interconnect: programmable routing
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-3
FPGAs as we knew them
“Traditionally, FPGAs have been the bastard step-brother of ASICs. They have been forced to act like ASICs and fit themselves into the ASIC development model. ... This has meant ignoring their unique strengths: reprogrammability, late binding and run-time reconfiguration.”
Andre DeHon, ISFPGA 2004
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-4
Perspective on FPGAs changed when
• Microsoft (and others) got desperate enough to do this
[www.microsoft.com/en-us/research/project/project-catapult]
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-5
New FPGAs are not RTL targets
“*”
• spatial data/compute
• highly concurrent
• finely controllable
• reprogrammable
Immediate Challenges
• killer apps
• ease of development
[Xilinx Versal] [Intel Agilex]
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-6
Greater break from ASIC mentality
• Dynamism ⎯ actually use the programmability
– support more functionality on same parts cost
– achieve better performance by specializing
• Shareability ⎯ multitenancy to consume “slack”
– too much logic: partition fabric spatially
– too much throughput: repurpose fabric temporally
• Manageability ⎯ bring FPGA under OS purview
– part of compute resource pool (CPU cycles, DRAM)
– seamless interface, virtualization and isolation (security and QoS)
Dynamic Partial Reconfiguration is a key capability
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-7
DPR: what is feasible today?
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-8
Dynamic Execution Framework for Interactive Vision
[Figure: plug-and-play architecture]
• FPGA: DPR regions, DMA engines, and I/O connected by a programmable crossbar
• embedded ARM core runtime: scheduler, mapper (module to RP), interconnect and DMA configurer
• input: vision stage IP library (modules MA to MI) + pipeline specifications
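The mapper's role (module to RP) can be illustrated with a small sketch. This is not the framework's actual runtime: the function name `map_pipeline`, the "reuse if resident, else first empty RP" policy, and the RP names are assumptions made for illustration. Only the module names (MA, MB, ...) come from the slide.

```python
from typing import Dict, List, Optional, Tuple


def map_pipeline(pipeline: List[str],
                 partitions: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Assign each module of a pipeline to a reconfigurable partition (RP).

    `partitions` maps an RP name to the module currently loaded in it
    (None if empty) and is updated in place. Hypothetical policy: reuse an
    RP that already holds the module, otherwise claim the first empty RP.
    """
    claimed = set()   # RPs already used by this pipeline
    placement = []
    for module in pipeline:
        # Prefer an RP where the module is already resident (no PR needed).
        rp = next((name for name, loaded in partitions.items()
                   if loaded == module and name not in claimed), None)
        if rp is None:
            # Otherwise take the first empty RP (a PR would be needed here).
            rp = next((name for name, loaded in partitions.items()
                       if loaded is None), None)
        if rp is None:
            raise RuntimeError(f"no free RP for module {module}")
        partitions[rp] = module
        claimed.add(rp)
        placement.append((module, rp))
    return placement
```

A real runtime would also have to configure the crossbar and DMA engines after placement, and could evict resident modules instead of failing when no RP is empty.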
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-9
Spatial and Temporal Multitenancy
[Figure labels]
• Flash, DRAM, PS
• AXI-PCAP Bridge: AXI Master Interface, FIFO, PCAP Interface
• ARM core (user code + SW management)
• FPGA: Overlay Crossbar connecting RPs, DMA engines, Camera, and Display
• three pipelines, each from Camera to Display: MA MB MC; MD ME MF; MG MH MI
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-10
We Actually Use This
https://www.cs.cmu.edu/smartheadlight
[Figure labels: Zynq SoC-FPGA, Camera, SLM, beamsplitter]
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-11
Time-Multiplexing Feasibility [FPL2018]
• Interleaving (a) 2 pipelines and (b) 3 pipelines
• Pipelines differ between 1 and 6 PR partitions
[Plots: 720p and 1080p panels; axis label: batch size]
It takes more than doing PR in quick succession!!!
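One way to see why batch size matters when interleaving pipelines is a back-of-the-envelope amortization model. The sketch below is an assumption-laden illustration, not the model from the FPL2018 work: it assumes the differing partitions are reconfigured serially once per batch, that each frame takes a fixed time, and all numbers in the usage note are hypothetical.

```python
def frames_per_second(batch_size: int,
                      frame_time_s: float,
                      reconfig_time_s: float,
                      partitions_to_swap: int) -> float:
    """Aggregate throughput when the fabric is reconfigured once per batch.

    Assumes (hypothetically) that the partitions are reconfigured one after
    another and that no processing overlaps with reconfiguration.
    """
    swap_time = reconfig_time_s * partitions_to_swap
    return batch_size / (swap_time + batch_size * frame_time_s)
```

With a hypothetical 10 ms frame time and 10 ms per partition reconfiguration, a batch of 1 gives about 50 frames/s while a batch of 9 gives about 90 frames/s, at the cost of added latency for the pipeline that waits. The slide's closing remark suggests that raw PR speed is only part of the picture, which this simple model does not capture.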
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-12
Cost and Energy/Power Benefits [FPL2019]
• Application case study with 6 modules
– color-based detector triggers follow-up processing
– meets throughput and latency requirements
– ~3x logic saving (7x $ saving in parts cost)
– ~30% power/energy saving in worst case
[Figure: two per-frame timelines]
• Static Mapping on a “large” FPGA: detect, stereo, SIFT, etc. all resident; detect is busy throughout while the other modules are mostly idle
• Timeshare with DPR on a “small” FPGA: det., stereo, det., SIFT, det. loaded in turn
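The logic saving comes from sizing the fabric for what is resident at one time rather than for everything at once. The sketch below illustrates that arithmetic only; the module areas are made-up placeholders (not the case study's numbers) and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def static_area(module_areas: Dict[str, int]) -> int:
    """Static mapping: every module is resident, so areas add up."""
    return sum(module_areas.values())


def timeshared_area(module_areas: Dict[str, int],
                    resident_sets: Iterable[Set[str]]) -> int:
    """Timesharing: fabric must fit the largest set resident at one time."""
    return max(sum(module_areas[m] for m in resident)
               for resident in resident_sets)


# Hypothetical areas in arbitrary logic units, NOT measured values.
areas = {"detect": 10, "stereo": 30, "SIFT": 40}
# One module at a time, as in the det./stereo/det./SIFT/det. timeline.
schedule = [{"detect"}, {"stereo"}, {"detect"}, {"SIFT"}, {"detect"}]

saving = static_area(areas) / timeshared_area(areas, schedule)
```

With these placeholder numbers the ratio is 2x; the ~3x figure on the slide depends on the actual module sizes and partitioning, which are not given here.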
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-13
Today’s Practical Constraints
• Number and size of PR partitions fixed a priori
– too few/too large: internal fragmentation
– too many/too small: external fragmentation
• Not all PR partitions are equal, even with the same interface and shape
– a module needs a different bitstream for each partition it goes into
– build and store up to M×N bitstreams for N partitions and M modules
• PR is not all that fast . . .
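The "up to M×N" bitstream count and the internal-fragmentation cost can be made concrete with a short sketch. Everything here is illustrative: the size units, the "module fits if its size is at most the partition's size" rule, and the function names are assumptions, not a description of any vendor tool flow.

```python
from itertools import product
from typing import Dict, List, Tuple


def required_bitstreams(module_sizes: Dict[str, int],
                        partition_sizes: Dict[str, int]) -> List[Tuple[str, str]]:
    """One bitstream per (module, partition) pair in which the module fits.

    At most M*N pairs for M modules and N partitions; fewer if some
    modules do not fit in some partitions.
    """
    return [(m, p) for m, p in product(module_sizes, partition_sizes)
            if module_sizes[m] <= partition_sizes[p]]


def internal_fragmentation(module_size: int, partition_size: int) -> int:
    """Logic left unused when a module occupies a larger partition."""
    return partition_size - module_size
```

For example, with hypothetical sizes `{"MA": 3, "MB": 5}` and partitions `{"RP0": 4, "RP1": 6}`, three of the four possible bitstreams are needed, and placing MA in RP1 leaves 3 units unused.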
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-14
PR could be a lot better if need be
• Reconfiguration time could be a lot faster
– increase raw speeds
– increase concurrency
• Tools could be more powerful and friendly
– more push-button GUI, less manual scripting
– higher-level interfaces (incorporating management and scheduling) in line with use models (e.g., AOC)
• More flexibility, e.g. soft partition boundaries, relocatable bitstreams
• Time-multiplex contextful modules?
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-15
Hopes and Dreams: Spatial/Temporal Multitasking Fabric?
• Manage and schedule FPGA fabric like one would with CPU cycles and memory
• First-class support of PR compute modules
– hard transport with standard, virtually private interfaces to memory, I/O & resources
– loadable modules designed independently of placement and surroundings
– security and QoS provisions for multitenant sharing
CMU/ECE/CALCM/Hoe ARM Research Summit, September 2019, slide-16
Sponsored by the CONIX Research Center, one of six centers administered by the
JUMP phase of the Focused Center Research Program (FCRP), a Semiconductor
Research Corporation program sponsored by MARCO and DARPA.