NATURE: Non-Volatile Nanotube RAM based Field-Programmable Gate Arrays
Wei Zhang†, Niraj K. Jha† and Li Shang‡
†Dept. of Electrical Engineering, Princeton University
‡Dept. of Electrical and Computer Engineering, Queen’s University
A Hybrid CMOS/NAnoTUbe REconfigurable Architecture
Motivation
Background on CNT and NRAM
Architecture of NATURE
Logic Folding
Experimental Results
Conclusions

Motivation
Moore’s Law: What’s Next?
  - Carbon nanotubes (CNTs)
  - Nanowires
  - Single electron devices
  - ...
Challenges in nano-circuits/architectures
  - Lack of a mature fabrication process
  - Defects and run-time failures
Reconfigurable architectures, such as an FPGA, favored
  - Regular structures ease fabrication
  - Fault tolerance through reconfiguration

Motivation (Contd.)
Problems of existing reconfigurable architectures
  - High reconfiguration time overhead
  - Low area efficiency
Some recent works on programmable nanofabrics
  - Molecular logic array (Goldstein et al. [ICCAD 2002])
  - Nanowire PLA (Dehon et al. [FPGA 2004])
  - CMOS/nanowire hybrid architecture CMOL (Strukov et al. [Nanotechnology 2005])
Fabrication problem not yet solved

Advantages of NATURE
[Figure: NATURE, with labels: CMOS fabrication compatible; NRAM-based; run-time reconfiguration; temporal logic folding; design flexibility; logic density]
Hybrid design leverages beneficial aspects of both CMOS and CNT technologies
NRAMs are distributed in NATURE to store multi-context reconfiguration bits
  - Fine-grain reconfiguration (even cycle-by-cycle)
  - Enables temporal logic folding
Flexibility to perform area-performance trade-offs
One-to-two orders of magnitude increase in logic density

Background
Carbon nanotube (CNT)
  - Metallic or semiconducting
  - Single-wall or multi-wall
  - Diameter: 1-100 nm
  - Length: up to millimeters
  - Ballistic transport
  - Excellent thermal conductivity
  - Very high current density
  - High chemical stability
  - Robust to environment
Source: Euronanotrade

Background (Contd.)
Non-volatile nanotube random-access memory (NRAM)
  - Mechanically bent or not: determines bistable on/off states
  - Fully CMOS-compatible manufacturing process
  - Prototype chip: 10 Gbit NRAM
  - Will be ready for the market in the near future
Source: Nantero

NRAMs
Properties of NRAMs
  - Non-volatile
  - Similar speed to SRAM
  - Similar density to DRAM
  - Chemically and mechanically stable
NATURE not tied to NRAMs
  - Phase change RAM
  - Magnetoresistive RAM
  - Ferroelectric RAM

Architecture of NATURE
[Figure: island-style array of LBs, each with an SMB and a switch matrix, with connection blocks and switch blocks. Wire types: length-1 wire, length-4 wire, long wire, direct link. S1: switch box between length-1 wires. S2: switch box between length-4 wires. Switch matrix: local routing network.]
Island-style logic blocks (LBs) connected by various levels of interconnects
An LB contains a super macroblock (SMB) and a local switch matrix

Architecture of a Super Macroblock (SMB)
n1 macroblocks (MBs) comprise an SMB, here n1 = 4
[Figure: SMB block diagram. Labels: four MBs, each with a 48-to-16 crossbar, an NRAM and SRAM bits; reconfiguration bits; inputs from switch matrix; CLK and global signals; 32 outputs of SMB; bus widths 16, 8 and 144.]

Architecture of a Macroblock (MB)
n2 logic elements (LEs) comprise an MB, here n2 = 4
[Figure: MB block diagram. Labels: four LEs, each with a 12-to-4 crossbar, an NRAM and 48 SRAM bits; reconfiguration bits; inputs to MB; CLK and global signals; 8 outputs of MB; bus widths 4, 17, 2 and 48.]

Logic Element and Interconnect
An LE implements a computation and contains:
  - An m-input look-up table (LUT)
  - A flip-flop
  - A pass transistor
Interconnect
  - Mixed wire segment scheme
  - 25%, 50% and 25% distribution for length-1, length-4 and long wires
  - Direct links from one LB to its 4 neighbors
[Figure: (a) LE with an m-input LUT, a DFF, CLK and SRAM cells; SMB (four MBs and NRAM) with its routing tracks: direct link, 128 tracks; length-1, 64 tracks; length-4, 128 tracks; long wire, 64 tracks.]

Support for Reconfiguration
Reconfiguration time short: 160 ps
Area overhead of NRAMs
  - k: no. of reconfiguration sets per NRAM, assume k = 16
  - Area overhead: 20.5% per LB, assuming 100 nm technology for CMOS logic and nanotube length
  - Logic density = k (conf. copies) x area per configuration = 16 x (1 - 0.205) ≈ 12.7
Appropriate value for k obtained through design space exploration
[Figure: NRAM structure. Labels: word line decoder, bit line decoder, read voltage, electrode, pull-down resistor, SRAM cell.]
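The logic-density estimate on this slide can be checked numerically. The sketch below follows the slide's formula, where each of the k configuration copies contributes the fraction of LB area left after the NRAM overhead. The function name `logic_density` is hypothetical, and the 20.5% overhead is the slide's figure for k = 16 only; other values of k would need their own overhead estimate.

```python
def logic_density(k: int, area_overhead: float) -> float:
    """Relative logic density per the slide's formula:
    k configuration copies x (1 - NRAM area overhead)."""
    if k < 1 or not 0.0 <= area_overhead < 1.0:
        raise ValueError("need k >= 1 and 0 <= area_overhead < 1")
    return k * (1.0 - area_overhead)


if __name__ == "__main__":
    # Slide values: k = 16 reconfiguration sets, 20.5% area overhead per LB.
    print(round(logic_density(16, 0.205), 2))
```

Treating the overhead as an input keeps the sketch usable for the design space exploration over k that the slide mentions, provided a matching overhead is supplied for each k.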

Temporal Logic Folding
Basic idea: one can use NRAM-enabled run-time reconfiguration to realize different Boolean functions in the same logic element (LE) every few cycles
[Figure: a circuit with inputs a to h and output OUT built from three LUTs (LUT1, LUT2, LUT3) computing i = abc', l = (i'+e'+f')h' and OUT = d'g'+l, and the same circuit folded onto one LUT that is reconfigured from NRAM in cycle 1, cycle 2 and cycle 3.]
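A small simulation may make the folding idea concrete. The sketch below is an illustration, not the authors' tool: it takes the three Boolean functions shown on the slide, runs them one per cycle on a single LE whose flip-flop carries the intermediate result, and checks that the result matches the three-LUT spatial version for all inputs. The names `CONTEXTS`, `folded`, `unfolded` and `equivalent` are hypothetical.

```python
from itertools import product

# One LUT configuration per cycle. Each takes the flip-flop value `ff`
# (the previous cycle's result) and the primary inputs `x`.
CONTEXTS = [
    lambda ff, x: x["a"] and x["b"] and not x["c"],                     # i = abc'
    lambda ff, x: (not ff or not x["e"] or not x["f"]) and not x["h"],  # l = (i'+e'+f')h'
    lambda ff, x: (not x["d"] and not x["g"]) or ff,                    # OUT = d'g' + l
]


def folded(x):
    """Run the three configurations on a single LE, one per cycle."""
    ff = False
    for lut in CONTEXTS:  # load the next configuration, then compute
        ff = bool(lut(ff, x))
    return ff


def unfolded(x):
    """Reference: the same three LUTs laid out side by side."""
    i = x["a"] and x["b"] and not x["c"]
    l = (not i or not x["e"] or not x["f"]) and not x["h"]
    return bool((not x["d"] and not x["g"]) or l)


def equivalent():
    """Compare both versions over all 2**8 input combinations."""
    names = "abcdefgh"
    return all(
        folded(dict(zip(names, bits))) == unfolded(dict(zip(names, bits)))
        for bits in product([False, True], repeat=len(names))
    )


if __name__ == "__main__":
    print(equivalent())
```

The list of configurations plays the role of the NRAM contents, and the loop index plays the role of the cycle counter that selects which configuration is loaded.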

Example
Without logic folding
  - Num of LEs = 6
  - Delay = 4 LE delays + interconnect delay
With logic folding
  - Num of LEs = 2
  - Delay = 4 * clock_period
  - Clock period = LE delay + reconfiguration + interconnect delay
[Figure: a four-level circuit with inputs x0 to x3 and y0 to y3, intermediate signals a0, b0, c0 and output Out, mapped to six LEs (LE1 to LE6) without folding, and to two LEs (LE1, LE2) reused across reconfigurations with folding.]
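The two delay expressions in this example can be written out directly. The sketch below only encodes the slide's formulas; the function names are hypothetical, and the timing values in the usage lines are made-up placeholders chosen for illustration (only the 160 ps reconfiguration time comes from the slides). The interconnect delay is a separate parameter in each case because the two mappings route differently.

```python
def delay_without_folding(n_levels, le_delay, interconnect_delay):
    """Slide formula: n_levels LE delays plus the interconnect delay."""
    return n_levels * le_delay + interconnect_delay


def delay_with_folding(n_cycles, le_delay, reconfig_time, interconnect_delay):
    """Slide formula: n_cycles clock periods, where each clock period is
    LE delay + reconfiguration time + interconnect delay."""
    clock_period = le_delay + reconfig_time + interconnect_delay
    return n_cycles * clock_period


if __name__ == "__main__":
    # Placeholder timings in ns (assumptions, not measured values).
    le, reconf = 0.5, 0.16
    unfolded = delay_without_folding(4, le, interconnect_delay=2.0)
    folded = delay_with_folding(4, le, reconf, interconnect_delay=0.2)
    # Area-time product approximated as number of LEs x delay.
    print(6 * unfolded, 2 * folded)
```

With numbers like these the folded mapping can be slower in absolute delay yet still win on the LEs x delay product, which is the trade-off the slide points at.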

Folding Levels
Logic folding can be performed at different levels of granularity, providing flexibility to perform area-performance trade-offs
A level-p folding implies reconfiguration of the LE after the execution of p LUT computations
[Figure: the same LUT network (LUT nodes a0 to i0, output d) mapped with (a) level-1 folding, using Macroblock1 with a reconfiguration after every LUT level, and (b) level-2 folding, using Macroblock1 and Macroblock2 with a reconfiguration after every two LUT levels.]
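The definition of level-p folding suggests a simple count of how many configurations a circuit needs. The sketch below rests on an assumption not stated on the slide: a circuit whose longest path has `lut_depth` LUTs is cut into stages of p LUT levels each, so it needs ceil(lut_depth / p) configurations, and these must fit within the k reconfiguration sets an NRAM holds. The function names are hypothetical.

```python
import math


def folding_cycles(lut_depth: int, level: int) -> int:
    """Configurations needed when the LE is reconfigured after every
    `level` LUT computations (assumed: ceil(depth / level))."""
    if lut_depth < 1 or level < 1:
        raise ValueError("lut_depth and level must be >= 1")
    return math.ceil(lut_depth / level)


def fits_in_nram(lut_depth: int, level: int, k: int) -> bool:
    """True if the required configurations fit in k reconfiguration sets."""
    return folding_cycles(lut_depth, level) <= k


if __name__ == "__main__":
    # A depth-4 circuit, as in the earlier example slide.
    for p in (1, 2, 4):
        print(p, folding_cycles(4, p), fits_in_nram(4, p, k=16))
```

Lower folding levels need more configurations but fewer LEs at a time, which is why k and the folding level have to be chosen together.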

Choosing the Folding Level
Advantages of logic folding
  - Significant flexibility for performing area-performance trade-offs
  - Ability to map much larger circuits using the same number of LEs
  - Significant improvement in the area/circuit delay product
  - Reduction in the need for global routing
Effect of increasing the folding level
  - Clock period increases: routing delay increases
  - Number of clock cycles decreases
  - Reconfiguration time decreases
  - Total delay typically decreases
  - Number of LEs increases: area increases

Experimental Setup
Instance of architecture: 4 MBs in an SMB, 4 LEs in an MB, and LEs contain a 4-input LUT
Number of reconfiguration copies k varied in order to compare implementations corresponding to selected folding levels: level-1, level-2, level-4 and no logic folding
Results based on 100nm CMOS technology parameters

Experimental Results
[Figure: Delay (ns) for different folding levels (normalized to level-1). Series: Level-1, Level-2, Level-4, No-folding. Benchmarks: pm1, sct, cm163a, z4ml, cc, pcler8, cordic, lal, ldd, 9symml, alu2.]
[Figure: #LEs * Delay for different folding levels (normalized to level-1, log scale). Same series and benchmarks.]
Average area-time product advantage = 2X
Maximum area-time product advantage = 3X

Experimental Results (Contd.)
[Figure: Delay (ns) for different folding levels (normalized to level-1). Series: Level-1, Level-2, Level-4, No-folding. Benchmarks: 16-RCA, 32-RCA, 64-RCA, 16-CLA, 32-CLA, 64-CLA, 16-CSA, 32-CSA, 64-CSA, 8-MUL, 16-MUL, 32-MUL.]
[Figure: #LEs * Delay for different folding levels (normalized to level-1, log scale). Same series and benchmarks.]
16-RCA: 16-bit ripple carry adder
16-CLA: 16-bit carry lookahead adder
16-CSA: 16-bit carry select adder
8-MUL: 8-bit multiplier
Average area-time product advantage = 13X
Maximum area-time product advantage = 35X

Experimental Results (Contd.)
Flexibility in performing area-performance trade-off
For the area-time (AT) product, the larger the circuit depth, the greater the advantage of level-1 folding relative to no folding
For the 64-bit ripple-carry adder, this advantage is about 35X
LE utilization and logic density very high, with a reduced need for a deep interconnect hierarchy

Conclusions
NATURE: A novel high-performance run-time reconfigurable architecture
Introduction of NRAMs into the architecture enables cycle-by-cycle reconfiguration and logic folding
Choice of different folding levels allows the flexibility of performing area-performance trade-offs
Logic density and area-time product improved significantly
Can be very useful for cost-conscious embedded systems and future FPGA improvement