asynchronous logic: a computer systems perspective · async (2001) v. exploiting low noise and low...

23
Asynchronous Logic: A Computer Systems Perspective Rajit Manohar Computer Systems Lab, Cornell http://vlsi.cornell.edu/

Upload: others

Post on 01-Apr-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Asynchronous Logic: A Computer Systems Perspective

Rajit ManoharComputer Systems Lab, Cornellhttp://vlsi.cornell.edu/

Page 2: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Approaches to computation

Discrete Time Continuous Time

Discrete Value Digital, synchronous logic

Digital, asynchronous

logic

Continuous Value Switched-capacitor analog

General analog computation

Page 3: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Asynchronous logic

sender

receiver

clock

data timesigna

l val

ue

sampling window

sender

receiver

data 0

data 1acksender

receiver

datareq

ack

data 1

data 0

acknowledge

data

request

acknowledge

X X X X

Page 4: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Explicit versus implicit synchronization

time

A1

B2

B4

B3

B1

A2 C2

C1

A3

A4C4

C3

A B C

time

A1 B1

A2

C2

C1

A3

A4

C4

C3

A B C

B2

B3

B4

Synchronous computation

Asynchronous computation

Page 5: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Differences at a glance

Asynchronous Synchronous

Continuous time computation. Discrete time computation.

Observed delay is the average over possible circuit paths.

Observed delay is the maximum over possible circuit paths.

Local switching from data-driven computation. Can lead to low-power operation.

Global activity from clock-driven computation. Low power requires careful clock gating.

Throughput determined by device speed. (Margins needed for some circuit families.)

Throughput determined by device speed and additional operating margins.

Data must be encoded, requiring additional wires for signals.

Unencoded data acceptable due to global clock network.

Page 6: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

A little bit of (biased) history…

1940 1950 1960 1970

1980 1990 2000 2010

Binary addition (von Neumann)

Illiac II (UIUC/Muller)Macro modular systems (Clarke, Molnar, Stucki,…)

Flow tables (Huffman)

System Timing (Seitz, in Mead/Conway)

Switching theory (Miller)

Micro pipelines (Sutherland)

General hazard-free synthesis

Asynchronous CPU (Martin)

ARM with caches (Furber)

Commercial 80C51 (Phillips)

10G Ethernet switch (Fulcrum)

Event-driven CPU (Manohar)

Pipelined FPGAs (Manohar)

TrueNorth (IBM/Cornell)

Atlas, MU5 (Manchester)

SpiNNaker (Furber)

Neurogrid (Boahen)

AER (Mead lab)

Sensory systems (retina, cochlea, etc.)

17FO4 MIPS CPU (Martin)

UltraSparc IIIi mem controller (Sun)

Trace theory (Snepscheut)

Syntactic compilation (Martin, Rem)

HiAER (Cauwenberghs)

FACETS (Heidelberg)

Page 7: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

I. Exploiting the gap between average and max delay

• Slow part: computing carries❖ … but only in the worst-case!

1 1 0 0 0

1 0 1 1 0 0

+ 0 1 1 0 1 0

1 0 0 0 1 1 0

Theorem [von Neumann, 1946]. The average-case latency for a ripple carry binary adder is O(log N) for i.i.d. inputs

A.W. Burks, H.H. Goldstein, J. von Neumann. Preliminary discussion of the logical design of an electronic computing instrument. (1946)

Page 8: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

I. Exploiting the gap between average and max delay

• Standard adder tree❖ O(log N) latency

• Hybrid adder tree❖ Tree adder + ripple adder

Theorem [Winograd, 1965]. The worst-case latency for binary addition is Ω(log N)Theorem. The average-case latency of the hybrid asynchronous adder is O(log log N) for i.i.d. inputsTheorem. The average-case latency of the hybrid asynchronous adder is optimal for any input distribution

R. Manohar and J. Tierno. Asynchronous parallel prefix computation. IEEE Transactions on Computers (1998)

S.O. Winograd. On the time required to perform addition. JACM (1965)

Page 9: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

II. Exploiting data-driven power management

• Low power FIR filter❖ External, synchronous I/O❖ Monitor occupancy of FIFOs❖ Speed up/slow down using

adaptive power supply

• Break-point add/multiply❖ Detect when top half of

operand is required❖ Dynamically switch between

full width and half width arithmetic

FIR filter

State

DC/DC converter

Vdd

L. Nielsen and J. Sparso. A Low-power Asynchronous Datapath for a FIR filter bank. Proc. ASYNC (1996)

Page 10: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

II. Exploiting data-driven power management

• Width-adaptive numbers❖ Compress leading 0’s and 1’s

sender

receiver

sender

receiver

128−bit datapath

8−bits per VDW block

Number: 1.4⇥ 1014 ⇤ 247

Number: �29 ⇤ �25

R. Manohar. Width-Adaptive Data Word Architectures. Proc. ARVLSI (2001)

0 1 0 1 0 1

0 0 0 · · · 0 0 0 1+ 0 0 0 · · · 0 0 0 1

0 0 0 · · · 0 0 1 0

0 1+ 0 1

0 1 0

00

0 1 0 1 0 1

0 0 0 · · · 0 0 0 1+ 0 0 0 · · · 0 0 0 1

0 0 0 · · · 0 0 1 0

0 1+ 0 1

0 1 0

00

v/s

Average: 16.3

Average: 10.8

Average: 10.0Average: 8.4

Average: 10.5Average: 8.8

Average: 10.8Average: 13.4

Average: 10.9

Average: 12.6

Average: 11.8

Average: 10.3

Operand Widths

15 20 25 30

Cum

ulat

ive

Dist

ribut

ion

Func

tion

(%)

Operand Widths

SPEC: 124.m88ksim

no compactionsimple compaction

full compaction

SPEC: 132.ijpeg

no compactionsimple compaction

0

20

40

60

80

100

5 10 15 20 25 30

Cum

ulat

ive

Dist

ribut

ion

Func

tion

(%)

Operand Widths

SPEC: 129.compress

no compactionsimple compaction

0

10

20

30

40

50

60

70

80

90

100

5 10 15 20 25 30

Cum

ulat

ive

Dist

ribut

ion

Func

tion

(%)

Operand Widths

SPEC: 126.gcc

no compactionsimple compaction

full compaction

full compaction full compaction

5

100

90

80

70

60

50

40

30

20

10

0

0

20

40

60

80

100

5 10 15 20 25 30

Cum

ulat

ive

Dist

ribut

ion

Func

tion

(%)

10

Page 11: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

III. Exploiting elasticity of pipelines

gf h

gf h

*R. Manohar, A. J. Martin. Slack Elasticity in Concurrent Computing. Proc. Mathematics of Program Construction (1998)

Asynchronous computation is* robust to changes in circuit-

level pipelining!

Page 12: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

III. Exploiting elasticity of pipelines

• Asynchronous FIR filter• Data arrives at variable rates• Data rates are predictable

• Fixed amount of time for asynccomputation

• Variable number of clock cyclesfor the fixed time budget

M. Singh, J.A. Tierno, A. Rylyakov, S. Rylov, S.M. Nowick. An adaptively pipelined mixed synchronous-asynchronous digital FIR filter chip operating at 1.3 GHz. Proc. ASYNC (2002)

1 2 3 9

Page 13: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

III. Exploiting elasticity of pipelines

• FPGA throughput low due to overhead ofprogrammable interconnect

• ~3x improvement in peak throughput bytransparent interconnect pipelining CB

CB

CB

CB

SB

SB

CB

CB

CB

CB

SB

SB

LB

LB

LB

LB

J. Teifel and R. Manohar. Highly pipelined asynchronous FPGAs. Proc. FPGA (2004)

Page 14: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

IV. Exploiting timing robustness

77K

200

400

600

800

1000

1200

0.5 1 1.5 2 2.5Voltage (V)

Xilinx Virtex

400K

Asynchronous FPGA Test Data

Process: TSMC 0.18um

Thro

ughp

ut (M

Hz) Nominal Vdd: 1.8V

294K

12K

0

D. Fang, J. Teifel, and R. Manohar. A High-Performance Asynchronous FPGA: Test Results. Proc. FCCM (2005)

Page 15: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

IV. Exploiting timing robustness

• “GasP” logic family• 6-10 FO4

• No margins on controlsignals

• Standard datapath

• Latest 40nm chip operates>6 GHz

w

w

Self-reset

Single-trackcontrol wire

Track reset

TracksetWait for

L set,R reset

Datapathdriver

RA(i) RA(i+1)

I. Sutherland, S. Fairbanks. GasP: a minimal FIFO control. Proc. ASYNC (2001)

Page 16: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

V. Exploiting low noise and low EMI

J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous Circuits in Contactless Smart Cards. Proc. ASYNC (2000)

Synchronous 80C51 (Phillips)

Asynchronous 80C51 (Phillips)

0 100 200 300 400 MHz 0 100 200 300 400 MHz

frequency domain

time domain

Page 17: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

VI. Exploiting continuous-time information processing

• Continuous-timedigital signal processing

• Automatic adaptationto input bandwidth

• No aliasing from samplingprocess

Nyquist sampling Level-crossing sampling

B. Schell, Y. Tsividis. A clockless adc/dsp/dac system with activity-dependent power dissipation and no aliasing. Proc. ISSCC (2008)

Page 18: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

VI. Exploiting continuous-time information processing

• Address-event representation in neuromorphic systems❖ “Time represents itself”❖ Spike timing and coincidence for information processing

• Asynchronous logic❖ Temporal resolution is decoupled from throughput❖ Automatic power management

• Projects❖ Silicon retinas, cochleas, …❖ More recent: FACETS, HiAER, Neurogrid, neuroP, SpiNNaker, TrueNorth

Page 19: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

A note on oscillations in asynchronous logic• Well-known result in asynchronous circuit theory

❖ Signal transitions: look at the ith occurrence of transition t❖ = the actual time of this event❖

• Then:

❖ Bounded, independent of i, and t❖ If the circuit is strongly connected, this result holds

• More recently: stronger periodicity under more general conditions

S.M. Burns, A.J. Martin. Performance Analysis and Optimization of Asynchronous Circuits. Proc. ARVLSI (1990)

W. Hua and R. Manohar. Exact timing analysis for concurrent systems. To appear, work-in-progress session, DAC (2016)

J. Magott. Performance evaluation of concurrent systems using petri nets. IPL (1984)

|time(t, i)� time0(t, i)| < B

time(t, i)

time0(t, i) = f(t) + p⇤ ⇥ i

Page 20: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Asynchronous logic in TrueNorth

• Spikes trigger activity…❖ … so they should be asynchronous

• Neurons leak periodically…❖ … so add a periodic event to the system (the clock)

SRAM

Neuron

Token Control

Scheduler

Router

At the external timing “tick”: • Update each neuron, delivering spikes plus

any local state update • If the neuron spikes, send output spike to

destinations, with specified delay Spikes travel through asynchronous network

• Receiver buffers spikes until it is time to deliver them

Digital, asynchronous logic

Page 21: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Asynchronous logic in TrueNorth

• Spike communication network❖ Links are a shared resource❖ Links might be contended

• Spike delivery times can differ based on traffic❖ Different runs of the same network might

produce different spike patterns

• Instead, we use a spike scheduler❖ Guarantees deterministic computation❖ … and hence, repeatability

Page 22: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Summary

• Asynchronous logic has a long history

• Some key properties exploited in projects❖ Gap between average and maximum delay for a computation❖ Automatic data-driven power management❖ Elastic pipelining❖ Robustness to timing uncertainty/variation❖ Reduced EMI and noise❖ Continuous-time information representation

Page 23: Asynchronous Logic: A Computer Systems Perspective · ASYNC (2001) V. Exploiting low noise and low EMI J. Kessels, T. Kramer, G. den Besten, A. Peeters, V. Timm. Applying Asynchronous

Acknowledgments

• The “async” community• Students:

❖ Ned Bingham, Wenmian Hua, Sean Ogden, Praful Purohit, Tayyar Rzayev, Nitish Srivastava

❖ John Teifel, Clinton Kelly, Virantha Ekanayake, Song Peng, David Biermann, David Fang, Chris LaFrieda, Filipp Akopyan, Basit Riaz Sheikh, Benjamin Tang, Nabil Imam, Sandra Jackson, Carlos Otero, Benjamin Hill, Stephen Longfield, Jonathan Tse, Robert Karmazin

• Collaborators:❖ General: David Albonesi, Sunil Bhave, Francois Guimbretiere, Amit Lal, Yoram

Moses, Ivan Sutherland, Yannis Tsividis❖ Neuromorphic: John Arthur, Kwabena Boahen, Tobi Delbruck, Chris Eliasmith,

Giacomo Indiveri, Paul Merolla, Dharmendra Modha

• Support: AFRL, DARPA, IARPA, NSF, ONR, IBM, Lockheed, Qualcomm, Samsung, SRC