cortical column hierarchiesnice.sandia.gov/documents/mikko_lipasti_nice_sandia_feb_2013.pdf ·...

31
Cortically-Inspired Computing Neuro-Inspired Computing Elements Workshop, Albuquerque, NM Mikko H. Lipasti Professor, Electrical and Computer Engineering University of Wisconsin – Madison http://pharm.ece.wisc.edu Collaborators: Atif Hashmi, Andy Nere, Giulio Tononi , James Thomas (WI); Olivier Temam, Hugues Berry (INRIA); IBM Synapse team; Tianshi Chen, Yunji Chen (ICT); Marc Duranton (CEA); Qi Guo (IBM China); Shi Qiu (USTC); Michele Sebag (LRI); © Mikko Lipasti

Upload: nguyendang

Post on 16-Feb-2018

220 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Cortically-Inspired Computing Neuro-Inspired Computing Elements Workshop, Albuquerque, NM

Mikko H. Lipasti Professor, Electrical and Computer Engineering

University of Wisconsin – Madison

http://pharm.ece.wisc.edu

Collaborators: Atif Hashmi, Andy Nere, Giulio Tononi , James Thomas (WI); Olivier Temam, Hugues Berry (INRIA); IBM Synapse team; Tianshi Chen, Yunji Chen (ICT); Marc Duranton (CEA); Qi Guo (IBM China); Shi Qiu (USTC); Michele Sebag (LRI);

© Mikko Lipasti

Page 2: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Rely on abstraction layers to manage complexity

Von Neumann Machine

Applications

Technology

Computer Architecture

What Do I Do?

Quantum Physics

Transistors & Devices

Logic Gates & Memory

Von Neumann Machine

x86 Machine Primitives

Visual C++

Firefox, MS Excel

Windows 7

Page 3: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

End of Moore‟s Law We are running into physical limits

Ultimately, single molecule/atom/electron

Before we reach the atomic scale Manufacturing yield (working parts) Reliability (intermittent/permanent failure) Variability (each device has unique characteristics) Power (can‟t afford to use all devices all the time)

On the software side: multicore impact

Parallel software is very difficult to write

Need fundamentally new approaches

Von Neumann machines: too successful

Page 4: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Look to Biology? By no means a novel inspiration

“If I haven‟t seen further, it‟s from standing

in the footprints of giants.”

But, neuroscientific understanding has

improved substantially Detailed characterization of low-level primitives Structure and connectivity much better

understood Advances in measurement, analysis Etc.

Is the brain even an interesting candidate?

Page 5: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Ken Jennings vs. IBM Watson

Ken (“baseline”) Watson

Pretty good at Jeopardy (also, life) Pretty good at Jeopardy

400g gray matter 10 racks, 15TB DRAM, 2880 CPU cores, 80 TFLOPs

30W 200KW

1 lifetime of experience 100 person-years to develop

Page 6: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Talk Outline

Introduction & Motivation

Neuromorphic applications [IISWC‟12]

Semantic Gap in Neuromorphic Systems

Neuromorphic ISA proposal [ASPLOS‟11]

Digital LIF Spiking Neurons [HPCA‟13]

Conclusions & Future Work

Applications

Technology

Computer Architecture

Page 7: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

7

blackscholes bodytrack canneal

facesim

dedup

fluidanimate freqmine

stream cluster

ferret

swaptions vips x264

Classification Clustering Approximation Optimization Filtering

PARSEC [Intel, Princeton]

Emerging Applications: RMS

Page 8: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Application Accelerators

GreenDroid

FPGAs/CGRAs

GPUs

Multi-Purpose Accelerators

Loop Accelerators

Heterogeneous multi-cores

Flexibility/energy efficiency/robustness/performance

NNet Accelerators

Page 9: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

9

Technology trigger

Peak of inflated expectations

Trough of disillusionment

Plateau of productivity

Slope of enlightment

time

level of interest

NNets... Again?! [slide: O. Temam]

SVM Killer Micro

Scientific computing

1990s NN

1990

Page 10: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

10

blackscholes bodytrack canneal

facesim

dedup

fluidanimate freqmine

streamcluster

ferret

swaptions vips x264

Classification Clustering Approximation Optimization Filtering

PARSEC

PARSEC Benchmarks

✔ ✔ ✔

BenchNN: On the Broad Potential Application Scope of Hardware Neural Network Accelerators. T. Chen et al. In Proc. of the 2012 IISWC 2012, Nov 2012. Also: Neural Acceleration for General-Purpose Approximate Programs, H. Esmaeilzadeh et al., Proceedings of MICRO-45, December 2012.

Page 11: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Talk Outline

Introduction & Motivation

Neuromorphic applications [IISWC‟12]

Semantic Gap in Neuromorphic Systems

Neuromorphic ISA proposal [ASPLOS‟11]

Digital LIF Spiking Neurons [HPCA‟13]

Conclusions & Future Work

Applications

Technology

Computer Architecture

Page 12: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

A History Lesson

Before Instruction Set Architecture…

Software depended on hardware knowledge

No portability

Optimizations were SW / HW pair specific

New computer => all new software

Gene Amdahl introduces the ISA

Contract between SW / HW

IBM S/360 line from 1964 to present

Independently develop SW and HW

Safely optimize, transform SW

ISA

Page 13: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Biologically True

STDP / LIF

Cortical Column

ANN

NISA Abstraction

Multicore CPU

GPGPU Analog ANN

Memristor ANN

Digital ANN

Code Generation

“Software”

“Hardware”

NISA proposal [ASPLOS‟11]

Page 14: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Neuromorphic HW/SW Interface Neuromorphic Instruction Set Architecture (NISA)

Represents structure and state

Automatic deployment/code generation

Goals similar to HP Labs COG, PyNN

Online profiling tools

Monitor cortical network and optimize/restructure

Offline optimizations tools

Improve the networks for efficiency and robustness

1. Hashmi et al., A Case for Neuromorphic Instruction Set Architecture, ASPLOS, 2011

2. Nere and Hashmi et al., Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms, IPDPS, 2011 3. Nere and Hashmi et al., Simulating Cortical Networks on Heterogeneous Multi-GPU Systems, JPDC, 2012

Hardware-Software Interface

Hypercolumn Model

NVIDIA GPU CPU NETLIST

Page 15: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Talk Outline Introduction & Motivation

Neuromorphic applications [IISWC‟12]

Semantic Gap in Neuromorphic Systems

Neuromorphic ISA proposal [ASPLOS‟11]

Digital LIF Spiking Neurons [HPCA‟13]

Conclusions & Future Work

Applications

Technology

Computer Architecture

Page 16: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

IBM‟s Neurosynaptic Core Digital spiking Neurosynaptic Core

Neurons (NCNs)

LP CMOS, standard digital logic

256 neurons/core on 4.2mm2

“Biologically competitive” energy

Few parameters/neuron

Binary synapses

Linear, no transcendental functions

1kHz operating frequency of NCNs

45pJ/spike

Brain Models

Simple HW

Semantic Gap

Page 17: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Ak

A2

A1

N2

Gk

G2

G1

Deco

de

Select & Encode

Output spikes

N1

A3

Input spikes

Axons SRAM Synapses Type

1 0 1 1 A3 G3

N1 N3 NM Neurons

*Figure adapted from Merolla et al.

Page 18: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Visual Cortex

Com

ple

xity o

f Featu

res

Page 19: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

11 March 2013

Retina

LGN

Helicopter

IT

Car

V4

V1 (vertical)

V1 (horizontal)

Page 20: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Visual System NNet (VSNN) 100,000 modeled neurons

Applications

Invariant object recognition

Pattern completion

Motion detection/tracking/prediction

Noise filtering

Requires complex neuronal behaviors

Not implemented in NCN primitives!

Page 21: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

NRT

Retina Input

V2

LGN

V1

V4

IT Excitatory (23%)

Connection Type

Inhibitory (8%)

STP * (19%)

NMDA ** (40%)

Hebbian (10%)

VSNN Architecture

Complex Behaviors!

NCN Compatible

* Short Term Plasticity (STP) modulated synapse

** N-methyl D-aspartate (NMDA) modulated synapses

Page 22: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Neuromorphic Semantic Gap NCN neurons are very simple (for efficiency)

Biology incorporates numerous complex behaviors

NMDA receptor effects last much longer than 1ms

Presynaptic Neuron

Postsynaptic Neuron NA NB

NMDA Receptors

50ms

Page 23: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

NCN Assembly - NMDA Composable circuit of NCN emulates effect

Presynaptic Neuron

Postsynaptic Neuron NB NA

NSyn Random

Inhibitory Spikes

Page 24: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Ak

AB

AA

NB

Gk

Output spikes

Input spikes

Axons SRAM Synapses Type

Neuro

ns

AR Sparse

Random Spikes

NA

GB

GA

ASyn GSyn

NSyn NM

GR

Mapping to IBM NCNs

NB NA

NSyn

Page 25: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

1 extra NCN/presynaptic neuron area overhead

~50*45pJ power overhead (extra spikes)

NCN Assembly - NMDA

NB NA

NSyn

Complex Neuron/Synapse Model (software)

NCN Assembly (Neurosynaptic Core hardware)

NB NA

Page 26: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Semantic Gap – Plasticity IBM NCN does not support synaptic plasticity*

Hebbian learning – “fire together, wire together”

Presynaptic Neuron

Postsynaptic Neuron NA NB

*Seo et al. design features two simple online learning rules

Page 27: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Hebbian Learning Assembly

Presynaptic Neuron

Postsynaptic Neuron NA NB

NG

NSyn

2 extra NCNs/synapse

~1000*45pJ power overhead/learned synapse

Page 28: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

VSNN on Neurosynaptic Core “Compiler” replaces complex neurons/synapses with NCN

assemblies

Deployable on Neurosynaptic Core hardware

VSNN System Overheads

Neuron “Area” 100K

Regular Neurons

200K

NMDA Assemblies

40K

STP

24K

Hebbian

Dynamic Power

3.64x (~364K

Neurons) 10Hz

45pJ/Spike

45 uW 2.6 mW .82mW .27 mW 83.2x (~3.7mW)

Page 29: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Conclusions Many compelling applications map to neural nets [IISWC‟12]

Also: Neural Acceleration for General-Purpose Approximate Programs, H. Esmaeilzadeh et al., Proceedings of MICRO-45, December 2012.

Semantic gap between “software” and “hardware”

Biological neural networks – complex nonlinear behavior

Hardware substrates:

CPU, GPU, FPGA: compile & optimize [ASPLOS‟11]

IBM Neurosynaptic Core: map to composable neuronal assemblies

Details in [Nere et al. HPCA „13]

29

Page 30: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Open Questions Applications

RMS, Approximate computing, robotics/control, …

Finding the right abstractions/interfaces

HP COG? NISA? Multiple NISAs?

Theoretical foundations would be helpful

Building a software ecosystem

Compilers, runtimes, libraries, optimizers (static vs. runtime)

Finding the right hardware primitives

Digital LLIF? Analog? Memristor? Parameters, attributes, behavior

Online learning, HW vs. SW

Page 31: Cortical Column Hierarchiesnice.sandia.gov/documents/Mikko_Lipasti_nice_sandia_feb_2013.pdf · GreenDroid FPGAs/CGRAs GPUs Multi-Purpose Accelerators Loop Accelerators Heterogeneous

Questions? http://pharm.ece.wisc.edu