
ISTC-EC @ Cornell

Accelerating Belief Propagation in Hardware

Skand Hurkat and José Martínez
Computer Systems Laboratory, Cornell University
http://www.csl.cornell.edu/


The Cornell Team

• Prof. José Martínez (PI), Prof. Rajit Manohar @ Computer Systems Lab
• Prof. Tsuhan Chen @ Advanced Multimedia Processing Lab
• MS/Ph.D. students
  – Yuan Tian, MS ’13
  – Skand Hurkat
  – Xiaodong Wang


The Cornell Graph


The Cornell Project

• Provide hardware accelerators for belief propagation algorithms on embedded SoCs (retail/car/home/mobile)
  – High speed
  – Very low power
  – Self-optimizing
  – Highly programmable

[Figure: BP Accelerator within SoC (labels: Graph Inference Algorithm, Result)]


What is belief propagation?

Belief propagation is a message passing algorithm for performing inference on graphical models, such as Bayesian networks or Markov Random Fields
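For reference, one common way to write the messages is the min-sum (energy) form. The notation below is an assumption and does not come from the slides: $\theta_i$ is a per-node cost, $\theta_{ij}$ a per-edge cost, $N(i)$ the neighbours of node $i$, and $m_{i \to j}$ the message sent from node $i$ to node $j$.

```latex
\begin{align}
E(x) &= \sum_{i \in \mathcal{V}} \theta_i(x_i)
      + \sum_{(i,j) \in \mathcal{E}} \theta_{ij}(x_i, x_j) \\
m_{i \to j}(x_j) &= \min_{x_i} \Big[ \theta_i(x_i) + \theta_{ij}(x_i, x_j)
      + \sum_{k \in N(i) \setminus \{j\}} m_{k \to i}(x_i) \Big] \\
b_j(x_j) &= \theta_j(x_j) + \sum_{k \in N(j)} m_{k \to j}(x_j),
      \qquad \hat{x}_j = \arg\min_{x_j} b_j(x_j)
\end{align}
```

Each node combines its own cost with the messages from all neighbours except the recipient, then minimizes over its own label. The label estimate at a node is the minimizer of its belief $b_j$.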


What is belief propagation?

• Labelling problem
• Energy as a measure of convergence
• Minimize energy (MAP label estimation)
• Exact results for trees
  – Converges in exactly two iterations (one pass from the leaves to the root, one pass back)
• Approximate results for graphs with loops
  – Yields “good” results in practice
    • Minimum over large neighbourhoods
    • Close to optimal solution
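The tree case can be sketched in a few lines of Python. This is a minimal illustration, not the authors' library: the function names, the four-node toy tree, its costs, and the absolute-difference smoothness term are all assumptions made up for the example. It runs one sweep of min-sum messages from the leaves to the root and one sweep back, then checks the result against exhaustive search.

```python
from itertools import product

def min_sum_tree(unary, edges, pairwise, root=0):
    """MAP labels on a tree from two sweeps of min-sum messages."""
    n, k = len(unary), len(unary[0])
    nbrs = {i: [] for i in range(n)}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)

    # Depth-first order from the root: every node appears after its parent.
    order, parent, stack = [], {root: None}, [root]
    while stack:
        i = stack.pop()
        order.append(i)
        for j in nbrs[i]:
            if j not in parent:
                parent[j] = i
                stack.append(j)

    msg = {}  # msg[(i, j)][xj] is the message from node i to node j

    def send(i, j):
        incoming = [msg[(h, i)] for h in nbrs[i] if h != j]
        msg[(i, j)] = [
            min(unary[i][xi] + pairwise(xi, xj) + sum(m[xi] for m in incoming)
                for xi in range(k))
            for xj in range(k)
        ]

    for i in reversed(order):      # sweep 1: leaves towards the root
        if parent[i] is not None:
            send(i, parent[i])
    for i in order:                # sweep 2: root towards the leaves
        if parent[i] is not None:
            send(parent[i], i)

    labels = []
    for i in range(n):
        belief = [unary[i][x] + sum(msg[(h, i)][x] for h in nbrs[i])
                  for x in range(k)]
        labels.append(belief.index(min(belief)))
    return labels

def energy(labels, unary, edges, pairwise):
    """Total cost of a labelling: node costs plus edge costs."""
    return (sum(unary[i][x] for i, x in enumerate(labels))
            + sum(pairwise(labels[a], labels[b]) for a, b in edges))

# Hypothetical toy problem: 4 nodes, 3 labels, node 1 is the hub.
UNARY = [[0, 3, 7], [4, 1, 4], [6, 3, 0], [1, 2, 6]]
EDGES = [(0, 1), (1, 2), (1, 3)]

def smooth(a, b):
    return 2 * abs(a - b)

MAP_LABELS = min_sum_tree(UNARY, EDGES, smooth)
BRUTE = min(product(range(3), repeat=4),
            key=lambda x: energy(x, UNARY, EDGES, smooth))
```

Reading labels off the per-node beliefs is only safe when the minimum-energy labelling is unique, as it is in this toy problem. With ties, a backtracking step would be needed to keep the labels consistent.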


Not all “that” alien to embedded

[Figure: trellis diagram with nodes s0; s1^1, s1^2, s1^3; s2^1, s2^2, s2^3; s3^1, s3^2, s3^3; s4^1, s4^2, s4^3; s5, drawn over stages s0 to s5]

Remember the Viterbi algorithm?
• Used extensively in digital communications
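The connection can be made concrete: Viterbi decoding is min-sum message passing along a chain, with a backpointer kept at each step. The sketch below is a generic textbook version, not code from the project. The two-state trellis and all of its costs are made-up values for illustration.

```python
def viterbi(init_cost, trans_cost, step_cost):
    """Minimum-cost state sequence through a trellis.

    init_cost[s]      cost of starting in state s
    trans_cost[r][s]  cost of moving from state r to state s
    step_cost[t][s]   cost of being in state s at step t
    """
    n_states = len(init_cost)
    cost = [init_cost[s] + step_cost[0][s] for s in range(n_states)]
    back = []  # back[t - 1][s] is the best predecessor of state s at step t
    for t in range(1, len(step_cost)):
        new_cost, ptr = [], []
        for s in range(n_states):
            best_r = min(range(n_states),
                         key=lambda r: cost[r] + trans_cost[r][s])
            ptr.append(best_r)
            new_cost.append(cost[best_r] + trans_cost[best_r][s]
                            + step_cost[t][s])
        cost = new_cost
        back.append(ptr)

    # Pick the cheapest final state, then follow the backpointers.
    state = min(range(n_states), key=lambda s: cost[s])
    best = cost[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best

# Hypothetical two-state trellis over four steps.
INIT = [0, 1]
TRANS = [[0, 2], [2, 0]]
STEP = [[0, 3], [2, 1], [4, 0], [0, 4]]

PATH, COST = viterbi(INIT, TRANS, STEP)
```

The forward loop is the same min-over-predecessor update as a min-sum message. The backpointers replace the second sweep that a general tree would need.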


What does this mean?

• Every mobile device uses Viterbi decoders
  – Error correction codes (e.g. turbo codes)
  – Mitigating inter-symbol interference (ISI)
• An increasing number of mobile applications involve belief propagation
  – More general belief propagation accelerators can greatly improve user experience with mobile devices


Target markets

Retail/Car/Home/Mobile
• Image processing
  – De-noising
  – Segmentation
  – Object detection
  – Gesture recognition
• Handwriting recognition
  – Improved recognition through context identification
• Speech recognition
  – Hidden Markov models are key to speech recognition

Servers
• Data mining tasks
  – Part-of-speech tagging
  – Information retrieval
  – “Knowledge graph” like applications
• Machine learning based tasks
  – Constructive machine learning
  – Recommendation systems
• Scientific computing
  – Protein structure inference


Hardware accelerator for BP

[Figure: BP Accelerator within SoC (labels: Graph Inference Algorithm, Result)]


Work done so far

Software
• General purpose MRF inference library
  – Support for arbitrary graphs
  – Floating point math
  – Parallel techniques for faster inference
• Library optimized for grid graphs
  – Optimized data structures
  – Template can use any data type
  – Multiple inference techniques optimized for early vision
  – Stereo matching in 200 ms

Hardware
• High level synthesis of message update unit
  – Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard
  – ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference
  – Fixed point math
• GraphGen collaboration
  – On-going work
  – Stereo matching task mapped to multiple platforms
  – 10x speedup on GPU w.r.t. CPU only implementation
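To give a feel for what a fixed-point message update unit computes, here is a small Python model in integer arithmetic. It is a sketch under stated assumptions, not the synthesized design: the 4 fractional bits, the 16-bit saturation limit, the min-normalization step, and all function names are hypothetical choices for illustration.

```python
FRAC_BITS = 4              # assumed number of fractional bits
MAX_VAL = (1 << 15) - 1    # assumed saturation limit (signed 16-bit maximum)

def to_fixed(x):
    """Quantize a non-negative real cost to a saturated fixed-point integer."""
    return min(MAX_VAL, max(0, round(x * (1 << FRAC_BITS))))

def sat_add(a, b):
    """Add two fixed-point values, clamping at MAX_VAL instead of overflowing."""
    return min(MAX_VAL, a + b)

def message_update_fixed(data_cost, incoming, smooth):
    """One min-sum message computed entirely on integers.

    data_cost[xi], incoming[m][xi] and smooth[xi][xj] are fixed-point ints.
    """
    k = len(data_cost)
    h = list(data_cost)
    for m in incoming:                       # accumulate incoming messages
        h = [sat_add(a, b) for a, b in zip(h, m)]
    out = [min(sat_add(h[xi], smooth[xi][xj]) for xi in range(k))
           for xj in range(k)]
    lo = min(out)
    return [v - lo for v in out]             # shift so the smallest entry is 0

# Hypothetical three-label example with a truncated-linear smoothness cost.
DATA = [to_fixed(v) for v in (0.5, 2.0, 3.25)]
INCOMING = [[to_fixed(v) for v in (1.0, 0.0, 0.5)]]
SMOOTH = [[to_fixed(min(abs(a - b), 2)) for b in range(3)] for a in range(3)]

MESSAGE = message_update_fixed(DATA, INCOMING, SMOOTH)
```

Subtracting the minimum after each update keeps message values in a bounded range. That matters more in fixed point than in floating point, because costs accumulated over many iterations would otherwise hit the saturation limit.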


Hierarchical belief propagation


Results – Stereo Matching

[Figure: Comparing inference algorithms on “Tsukuba” benchmark. Series: Energy and Updates. Algorithms: BP-S, BP-M (logspace), Hierarchical, Residual hierarchical. Energy axis ticks from 440000 to 480000; Updates axis ticks from 0 to 14000000.]



GraphGen synthesis of BP-M

• BP-M update (logspace messages) implemented using GraphGen (Intel/CMU/UW)

• GPU implementation 10x faster than CPU based implementation

• On-going work on FPGA based implementation and on implementing hierarchical update
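The value of log-space messages can be illustrated with a short Python sketch. It assumes the update is of the max-product kind, which the slides do not spell out. The function names and the potentials are invented for the example, and this is not GraphGen code. Taking negative logarithms turns products of probabilities into sums of costs and turns max into min.

```python
import math

def max_product_message(phi_i, psi, incoming):
    """Message in the probability domain: max over xi of a product."""
    k = len(phi_i)
    return [
        max(phi_i[xi] * psi[xi][xj] * math.prod(m[xi] for m in incoming)
            for xi in range(k))
        for xj in range(k)
    ]

def min_sum_message(cost_i, cost_ij, incoming):
    """The same message in negative-log space: min over xi of a sum."""
    k = len(cost_i)
    return [
        min(cost_i[xi] + cost_ij[xi][xj] + sum(m[xi] for m in incoming)
            for xi in range(k))
        for xj in range(k)
    ]

def neg_log(values):
    return [-math.log(v) for v in values]

# Hypothetical potentials for a three-label node.
PHI = [0.7, 0.2, 0.1]
PSI = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
INCOMING = [[0.5, 0.3, 0.2]]

PROB_MSG = max_product_message(PHI, PSI, INCOMING)
LOG_MSG = min_sum_message(neg_log(PHI),
                          [neg_log(row) for row in PSI],
                          [neg_log(m) for m in INCOMING])

# Long products of small probabilities underflow; sums of logs do not.
UNDERFLOW = math.prod([0.01] * 200)
LOG_SUM = sum(neg_log([0.01] * 200))
```

The two messages carry the same information, since `LOG_MSG` equals the element-wise negative log of `PROB_MSG` up to rounding. The log-space form needs only additions and comparisons, which tends to suit GPU and FPGA datapaths.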


Cornell Publications (2013 only)

• 3x Comp. Vision & Pattern Recognition (CVPR)
• 3x Asynchronous VLSI (ASYNC)
• 2x Intl. Symp. Computer Architecture (ISCA)
• 1x Intl. Conf. Image Processing (ICIP)

• 1x ASPLOS (w/ GraphGen folks, under review)


Year 3 Plans

• GraphGen extensions for BP applications
  – Multiple inference techniques

• Extraction of “BP ISA”
  – Ops on arbitrary graphs
  – Efficient representation

• Amplification work on UAV ensembles
  – Self-optimizing, collaborative SoCs

• One-day “graph” workshop with GraphGen+UIUC
