ISTC-EC @ Cornell
Accelerating Belief Propagation in Hardware
Skand Hurkat and José Martínez
Computer Systems Laboratory
Cornell University
http://www.csl.cornell.edu/
The Cornell Team
• Prof. José Martínez (PI), Prof. Rajit Manohar @ Computer Systems Lab
• Prof. Tsuhan Chen @ Advanced Multimedia Processing Lab
• MS/Ph.D. students
  – Yuan Tian, MS ’13
  – Skand Hurkat
  – Xiaodong Wang
The Cornell Graph
The Cornell Project
• Provide hardware accelerators for belief propagation algorithms on embedded SoCs (retail/car/home/mobile)
  – High speed
  – Very low power
  – Self-optimizing
  – Highly programmable
[Diagram labels: BP Accelerator within SoC; Graph Inference Algorithm; Result]
What is belief propagation?
Belief propagation is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks or Markov random fields (MRFs).
What is belief propagation?
• Labelling problem
• Energy as a measure of convergence
• Minimize energy (MAP label estimation)
• Exact results for trees
  – Converges in exactly two iterations (one pass from the leaves to the root and one pass back)
• Approximate results for graphs with loops
  – Yields “good” results in practice
    • Minimum over large neighbourhoods
    • Close to optimal solution
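The "exact on trees in two passes" point can be illustrated with a minimal min-sum sketch. Everything concrete here is an assumption made for illustration and does not come from the slides: the four-node tree, the three labels, the `UNARY` data costs, and the Potts `pairwise` cost. The downward pass is implemented as a backtrack over stored argmins, which is one common way to read off the MAP labelling.

```python
from itertools import product

# Hypothetical toy problem: 4 nodes, 3 labels each, tree rooted at node 0.
LABELS = range(3)
EDGES = [(0, 1), (1, 2), (1, 3)]                      # (parent, child) pairs
UNARY = [[0, 4, 4], [3, 0, 3], [5, 5, 0], [1, 0, 2]]  # data cost per node, per label


def pairwise(a, b):
    """Potts smoothness cost: neighbours pay 2 for disagreeing."""
    return 0 if a == b else 2


def energy(labels):
    """Total energy of a complete labelling (data costs plus smoothness costs)."""
    total = sum(UNARY[i][l] for i, l in enumerate(labels))
    return total + sum(pairwise(labels[p], labels[c]) for p, c in EDGES)


def min_sum_tree():
    """Two-pass min-sum BP: leaves to root, then root to leaves."""
    n = len(UNARY)
    children = {i: [c for p, c in EDGES if p == i] for i in range(n)}
    choice = {}  # choice[c][lp]: best label of child c given parent label lp

    def collect(i):
        """Return the minimum cost of the subtree rooted at i, per label of i."""
        cost = list(UNARY[i])
        for c in children[i]:
            sub = collect(c)
            choice[c] = []
            for lp in LABELS:
                best = min(LABELS, key=lambda lc: sub[lc] + pairwise(lp, lc))
                choice[c].append(best)
                cost[lp] += sub[best] + pairwise(lp, best)  # message from c to i
        return cost

    root_cost = collect(0)                        # pass 1: leaves -> root
    labels = [0] * n
    labels[0] = min(LABELS, key=lambda l: root_cost[l])

    def distribute(i):                            # pass 2: root -> leaves
        for c in children[i]:
            labels[c] = choice[c][labels[i]]
            distribute(c)

    distribute(0)
    return labels, root_cost[labels[0]]


labels, best = min_sum_tree()
# Exhaustive search over all 3**4 labellings, used only as a cross-check.
brute = min(energy(l) for l in product(LABELS, repeat=len(UNARY)))
print(labels, best, brute)
```

On a graph with loops the same message update would have to be iterated, and the result is then only approximate, as the slide notes.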
Not all “that” alien to embedded
[Figure: trellis diagram with start state s0, three candidate states at each of stages 1 to 4 (s11 … s43), and end state s5; beneath it, the chain s0 s1 s2 s3 s4 s5]
Remember the Viterbi algorithm?
• Used extensively in digital communications
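To make the connection concrete, here is a minimal sketch of Viterbi decoding written as a minimum-cost search through a trellis, which is min-sum message passing on a chain. The function name `viterbi`, the cost tables, and the three-state, four-stage trellis are hypothetical values chosen for illustration; they are not taken from the slide's figure.

```python
def viterbi(branch_cost, transition_cost):
    """Minimum-cost path through a trellis.

    branch_cost[t][s]     : cost of being in state s at stage t
    transition_cost[a][b] : cost of moving from state a to state b
    """
    n_states = len(transition_cost)
    cost = list(branch_cost[0])   # best cost of any path ending in each state
    back = []                     # back[t-1][s]: best predecessor of s at stage t
    for stage in branch_cost[1:]:
        prev, cost, ptr = cost, [], []
        for s in range(n_states):
            # Forward "message": cheapest way to arrive in state s.
            a = min(range(n_states), key=lambda p: prev[p] + transition_cost[p][s])
            cost.append(prev[a] + transition_cost[a][s] + stage[s])
            ptr.append(a)
        back.append(ptr)
    # Trace back from the cheapest final state.
    state = min(range(n_states), key=lambda s: cost[s])
    total = cost[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1], total


# Hypothetical trellis: 4 stages, 3 states per stage.
BRANCH = [[1, 5, 9], [7, 2, 8], [6, 6, 1], [0, 4, 4]]
TRANSITION = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]

path, total = viterbi(BRANCH, TRANSITION)
print(path, total)
```

The forward loop is the leaves-to-root pass of belief propagation restricted to a chain, and the traceback plays the role of the return pass.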
What does this mean?
• Every mobile device uses Viterbi decoders
  – Error correction codes (e.g., turbo codes)
  – Mitigating inter-symbol interference (ISI)
• An increasing number of mobile applications involve belief propagation
  – More general belief propagation accelerators can greatly improve user experience with mobile devices
Target markets
Retail/Car/Home/Mobile
• Image processing
  – De-noising
  – Segmentation
  – Object detection
  – Gesture recognition
• Handwriting recognition
  – Improved recognition through context identification
• Speech recognition
  – Hidden Markov models are key to speech recognition
Servers
• Data mining tasks
  – Part-of-speech tagging
  – Information retrieval
  – “Knowledge graph” like applications
• Machine learning based tasks
  – Constructive machine learning
  – Recommendation systems
• Scientific computing
  – Protein structure inference
Hardware accelerator for BP
[Diagram labels: BP Accelerator within SoC; Graph Inference Algorithm; Result]
Work done so far
Software
• General purpose MRF inference library
  – Support for arbitrary graphs
  – Floating point math
  – Parallel techniques for faster inference
• Library optimized for grid graphs
  – Optimized data structures
  – Template can use any data type
  – Multiple inference techniques optimized for early vision
  – Stereo matching in 200 ms
Hardware
• High level synthesis of message update unit
  – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard
  – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference
  – Fixed point math
• GraphGen collaboration
  – On-going work
  – Stereo matching task mapped to multiple platforms
  – 10x speedup on GPU w.r.t. CPU only implementation
Hierarchical belief propagation
Results – Stereo Matching
[Figure: Comparing inference algorithms on “Tsukuba” benchmark. Energy (axis ticks 440000 to 480000) and Updates (axis ticks 0 to 14000000) for BP-S, BP-M (logspace), Hierarchical, and Residual hierarchical.]
GraphGen synthesis of BP-M
• BP-M update (logspace messages) implemented using GraphGen (Intel/CMU/UW)
• GPU implementation 10x faster than CPU-based implementation
• Ongoing work on FPGA-based implementation and on implementing hierarchical update
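As a rough illustration of what "logspace messages" means in general, the sketch below computes one max-product message in probability space and the same message in negative-log space, where products become sums and the maximum becomes a minimum. It does not reproduce the BP-M schedule or the GraphGen implementation, neither of which is described in the slides; the function names and the `belief` and `compat` tables are hypothetical.

```python
import math


def max_product_message(belief, compat):
    """Probability space: m(xj) = max over xi of belief(xi) * compat(xi, xj)."""
    n = len(compat[0])
    return [max(b * row[j] for b, row in zip(belief, compat)) for j in range(n)]


def min_sum_message(cost, penalty):
    """Log space: m(xj) = min over xi of cost(xi) + penalty(xi, xj)."""
    n = len(penalty[0])
    return [min(c + row[j] for c, row in zip(cost, penalty)) for j in range(n)]


def neg_log(values):
    """Map probabilities to costs (energies)."""
    return [-math.log(v) for v in values]


# Hypothetical sender state: local evidence already combined with incoming messages.
belief = [0.6, 0.3, 0.1]
# Hypothetical compatibility table that favours equal labels.
compat = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

prob_msg = max_product_message(belief, compat)
log_msg = min_sum_message(neg_log(belief), [neg_log(row) for row in compat])
print(neg_log(prob_msg), log_msg)
```

The two printed lists agree up to floating-point rounding. Working with additions and comparisons instead of multiplications is generally friendlier to fixed-point arithmetic, which fits the fixed point math mentioned earlier for the hardware message update unit.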
Cornell Publications (2013 only)
• 3x Comp. Vision & Pattern Recognition (CVPR)
• 3x Asynchronous VLSI (ASYNC)
• 2x Intl. Symp. Computer Architecture (ISCA)
• 1x Intl. Conf. Image Processing (ICIP)
• 1x ASPLOS (w/ GraphGen folks, under review)
Year 3 Plans
• GraphGen extensions for BP applications
  – Multiple inference techniques
• Extraction of “BP ISA”
  – Ops on arbitrary graphs
  – Efficient representation
• Amplification work on UAV ensembles
  – Self-optimizing, collaborative SoCs
• One-day “graph” workshop with GraphGen+UIUC