![Page 1: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/1.jpg)
Distribution A. Approved for public release; distribution is unlimitedDistribution A. Approved for public release; distribution is unlimited
NARESH
UNIVERSITY OF ILLINOIS AT
SHANBHAG
URBANA-CHAMPAIGN
![Page 2: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/2.jpg)
Distribution A. Approved for public release; distribution is unlimited
MRAM-BASED DEEP IN-MEMORY ARCHITECTURES
This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views,opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views orpolicies of the Department of Defense or the U.S. Government.
![Page 3: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/3.jpg)
Distribution A. Approved for public release; distribution is unlimited
THE TEAM
Mike Burkland Naveen Verma(Princeton)
Pavan Hanumolu(UIUC)
Naresh Shanbhag(UIUC)
[systems & ckts][mixed-signal ICs][systems & ckts]
• L.-Y. Chen• P. Deaville• B. Zhang
• A. Patil, H. Hua, S. Gonugondla, K.-H. Kim
Ajey Jacob
[devices, foundry]
• S. Soss, D. Brown
• N. Gaul, B. Paul, B. Lanza
• J. Broussard• L. Suantak• J. Goulding• S. Cole• T. Gangwer• R. Kelley• C. Contreras
[DoD systems]
Raytheon MS GLOBALFOUNDRIES Princeton University of Illinois at Urbana-Champaign
![Page 4: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/4.jpg)
Distribution A. Approved for public release; distribution is unlimited
REALIZING ARTIFICIAL INTELLIGENCE @ THE EDGE
AI at the Edge
model
data
AI in the Cloud(autonomous cognitive agent)
• client-server model• efficiency challenged• security + privacy issues
• energy-latency efficient, securebut……
• volume + resource constraineddecisions
[source: pexels.com][source: pexels.com]
![Page 5: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/5.jpg)
Distribution A. Approved for public release; distribution is unlimited
THE DATA MOVEMENT PROBLEM𝑬𝑬𝒎𝒎𝒎𝒎𝒎𝒎𝑬𝑬𝒎𝒎𝒎𝒎𝒎𝒎
≈ ~𝟏𝟏𝟏𝟏𝟏𝟏 × (SRAM) → ~𝟓𝟓𝟏𝟏𝟏𝟏 × (DRAM) → ~𝟏𝟏𝟏𝟏𝟏𝟏𝟏𝟏 × (Flash)
[Canziani, Arxiv’16]
The Memory Wall in von Neumann Architectures
![Page 6: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/6.jpg)
Distribution A. Approved for public release; distribution is unlimited
fundamental question
how do we design intelligent autonomous machinesthat operate at the limits of energy-delay-accuracy?
![Page 7: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/7.jpg)
Distribution A. Approved for public release; distribution is unlimited
Proceedings of IEEE, Special Issue on non von Neumann Computing, January 2019.
Systems on Nanoscale Information fabriCswww.sonic-center.org
[2013-’17](STARnet Program by Semiconductor Research Corporation & DARPA)
![Page 8: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/8.jpg)
Distribution A. Approved for public release; distribution is unlimited
SHANNON-INSPIRED MODEL OF COMPUTING
use information-based metrics e.g., mutual information 𝐼𝐼(𝑌𝑌𝑜𝑜; �𝑌𝑌)1design low SNR fabrics, e.g., deep in-memory architecture (DIMA)2
develop statistical error-compensation (SEC) techniques3
(noisy channel)
DIMA
[ISSCC’18]
![Page 9: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/9.jpg)
Distribution A. Approved for public release; distribution is unlimited
integratethe MRAM device
within a
deep in-memory architecture (DIMA)
using
Shannon-inspired compute models
FRANC OBJECTIVEGLOBALFOUNDRIES PRINCETON & UIUC RAYTHEON MISSILE SYSTEMS
thermally limited
MRAM device
deep in-memory arch.
Shannon-inspired compute models
autonomous platforms
![Page 10: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/10.jpg)
Distribution A. Approved for public release; distribution is unlimited
[ICASSP 2014 UIUC, JSSC 2017 Princeton, JSSC 2018 UIUC]
read functions of datastandard bit-cell array(preserves storage density)
THE DEEP IN-MEMORY ARCHITECTURE (DIMA)
![Page 11: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/11.jpg)
Distribution A. Approved for public release; distribution is unlimited
SRAM DIMA PROTOTYPES
Multi-functionalinference processor
(65nm CMOS)
Random forest processor
(65nm CMOS)
On-chip training processor
(65nm CMOS)
Fully (128) row-parallel compute
(130nm CMOS)
100X EDP reduction over von Neumann equivalent* @ iso-accuracy* 8b fixed-function digital architecture with identical SRAM size
[Feb. JSSC’18] [ESSCIRC’17, JSSC July’18] [ISSCC’18, JSSC Nov.’18] [VLSI’16, JSSC’17]
![Page 12: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/12.jpg)
Distribution A. Approved for public release; distribution is unlimited
MRAM OPPORTUNITIES & CHALLENGES
high conductance → large array currentssmall TMR → high sensitivity readouthuge 𝑅𝑅𝑀𝑀𝑀𝑀𝑀𝑀 − 𝑅𝑅𝑀𝑀𝑀𝑀𝑀𝑀 limits cell compute modelshigh cell density → severe pitch-matching const.high MTJ process variation restricts bit-line SNR
MTJ resistance distribution
Free Layer
Ref. Layer
Tunnel Barrier
Select Transistor
Top Electrode
BottomElectrode
WL
Source Line
pMTJ stack 40Mb MRAM demo
Challenges:Opportunities:high density → high on-chip compute densitynon-volatility → ultra low-power duty-cycled operationin production → enables system prototyping &
reduces time to DoD availability
![Page 13: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/13.jpg)
Distribution A. Approved for public release; distribution is unlimited
MRAM-BASED DEEP IN-MEMORY ARCHITECTURES
four DIMAs proposed – two selected for prototypingBit-line Sensing (DIMA-BLS) Current Drive Time-based
Sensing (DIMA-CDTS)
𝑥𝑥[𝑖]
𝑀𝑀 × 𝑁𝑁 single-shot matrix vector multiply Functional read: 5-bit × 4-bit dot products on SLs Compute cell: 1T-1MTJ ( 4 × 1 multiplication) SL current sensing via time-based 5-bit ADCs SNR vs. energy trade-off due to sensing circuits
𝑀𝑀 × 𝑁𝑁 single-shot matrix vector multiply Functional read: binary dot products on BLs Compute cell: 2T-2MTJ (1b XNOR/AND) BL voltage sensing via 4-bit SAR ADCs SNR vs. energy trade-off due to sensing circuits
![Page 14: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/14.jpg)
Distribution A. Approved for public release; distribution is unlimited
EDP GAINS OVER VON NEUMANN EQUIVALENT
(includes all peripherals)
43× EDP reduction (256-dimensional dot products)
250×-to-1000× EDP reduction (128-dimensional dot products)
DIMA-BLS
DIMA-CDTS
MVM: matrix-vector multiply
![Page 15: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/15.jpg)
Distribution A. Approved for public release; distribution is unlimited
SHANNON-INSPIRED COMPUTE MODELS
• relaxes MRAM-based DIMA’s compute SNR requirements → enables substantial energy savings
• two methods: stochastic data-driven hardware resilience & coded DIMA
MRAM-based DIMA
use information-based metrics develop error-compensation techniques
![Page 16: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/16.jpg)
Distribution A. Approved for public release; distribution is unlimited
STOCHASTIC DATA-DRIVEN HARDWARE RESILIENCE
• noise-tolerance enhanced well beyond MTJ variations
• can be exploited for energy-SNR tradeoff
MRAM-based DIMA InferenceModel Parameters𝜃𝜃𝑀𝑀𝑀𝑀𝑀𝑀𝑆𝑆𝑆𝑆(𝑥𝑥,𝐺𝐺)G (MRAM variation, ADC noise) gi
DIMA-aware Stochastic Training
[B. Zhang, ICASSP’19]
Simulated MRAM-based BNN (CIFAR-10):
![Page 17: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/17.jpg)
Distribution A. Approved for public release; distribution is unlimited
CODED DIMA - ENHANCING DIMA’S COMPUTE SNRMTJ variation aware coding enables HIGH system SNR in spite of LOW circuit SNR
part
ial d
ecim
atio
n-in
-tim
eA
rchi
tect
ure
for m
appi
ng
FFT
on D
IMA
N-point Input Sequence
N/ M-pointInput
Sequence
N/ M-pointInput
Sequence
Input shuffling
N/ M-pointDFT
N/ M-pointDFT
Merge N/ M-point DFTs
N-point DFTCoefficients
N- p
oint
DFT
DFT, DHT, QFT
Coefficient precision (𝐵𝐵𝑊𝑊)
Out
put S
NR(d
B)
Limited byinput SNR
Limited byMRAM device
variation
Limited byquantization
noise
Input SNR: 40 dB; = 8; = 128
18dB
61 dB
41 dB
MRAM-DI MAWrite data
Functional read
Calculate error
Calibrate
write-time compensation (WTC)of MRAM variations
WTC
MTJ
𝐵𝐵𝑥𝑥 𝑁𝑁
![Page 18: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/18.jpg)
Distribution A. Approved for public release; distribution is unlimited
CURRENT STATUS
Technology transition via SRAM-based DIMA DIMA prototype designs taped out
(2019) [single-bank] validate EDPgains & calibrate models
(2020) [multi-bank] enable scaling and system integration
DIMA technology transition to RMS foralgorithm prototyping & evaluation
facilitates future MRAM-based DIMAintegration by RMS
verified & quantified RMS’s mission capability enabled by MRAM-based DIMA MRAM-based DIMA demonstrates > 2 orders-of-magnitude EDP reduction
[H. Jia, arXiv:1811.04047, Princeton]
![Page 19: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/19.jpg)
Distribution A. Approved for public release; distribution is unlimited
NEXT STEPS & SUMMARY
Mission: To realize > 200X in EDP gains in DoD workloads by integrating MRAM devicewithin Deep In-memory Architectures using Shannon-inspired compute models
FRANC Program: “to provide the foundation for new materials technology and newintegration approaches to be exploited in pursuit of novel compute architectures”
Multi-functioned DIMAs for Diverse Applications Parameterized DIMAs for Platform-design Tools
Summary
[Srivastava, et al., ISCA’18]
![Page 20: naresh - ERI SummitNARESH UNIVERSITY OF ILLINOIS AT SHANBHAG. URBANA-CHAMPAIGN. Distribution A. Approved for public release; distribution is unlimited. MRAM-BASED DEEP IN-MEMORY ARCHITECTURES](https://reader033.vdocuments.net/reader033/viewer/2022042223/5ec9673bc26f4211407e49bf/html5/thumbnails/20.jpg)