
Yiran Chen
Electrical and Computer Engineering, University of Pittsburgh
Sponsors: NSF, DARPA, AFRL, and HP Labs

Emerging NVM Enabled Storage Architecture: From Evolution to Revolution

Outline

• Introduction
• Evolution with eNVM:
– On-chip high speed storage;
– Off-chip secondary storage;
• Revolution with eNVM:
– Memristor-based neuromorphic accelerator
• Conclusion

Conventional Memory Scaling

2012–2013, 38nm–32nm: M: Stacked MIM; P: Planar; A: 6F2, bWL; G: poly/SiO2; C: Si; V: 1.35V

2014–2015, 29nm–22nm: M: Stacked MIM; P: Planar, HKMG; A: 6F2, bWL; G: HKMG; C: Si; V: 1.2V

2016–2017, 22nm–16nm: M: Stacked MIM; P: Planar; A: 6F2, bBL, LBL, 1T1C (VFET); G: HKMG; C: Si; V: 1.1V

2018–2019, 16nm–14nm: M: FBRAM, STT-RAM, RRAM, PCRAM; P: Planar; A: 4F2, 1T, 1T1R, 1TMTJ (VFET); G: HKMG; C: Si; V: ~1V

[Figure: aspect ratio (A/R, axis from 20 to 60) versus technology node, 1990 to 2010, with the Burj Khalifa (A/R = 6) for comparison and labels 11Å, 9Å, 8Å, 7Å; Mb/chip across SDRAM 133, DDR1 200-400, DDR2 400-800, and DDR3 800-1600.]

Sources: ASML, ITRS, IMEC, Hynix, IBM

Intrinsic difficulty of charge-based computing and storage!

Emerging Nonvolatile Memory

Memory Technologies Comparison

[Table: comparison of memory technologies, including STT-RAM and NAND FLASH, by data retention, memory cell size (F2), read time, write/erase time, number of rewrites, power consumption for read/write, and power consumption other than read/write (leakage current, refresh power, or none). Surviving cell values, not attributable to specific rows or columns: >10y, 120-140, 0.2 ns, 5-10ns, <10ns, <50ns, 1/0.1ms, Low, None.]

Source: ITRS ERD workshop presentation by Prof. Y. Chen

Challenges:

• Identifying the evolutionary applications that can
– Be easily and seamlessly integrated into the current memory hierarchy and computing platform;
– Fully leverage the advantages of emerging NVM;
– Not be easily replaced by other alternative technologies or architectures.
• Inventing a revolutionary computing and storage architecture that can
– Offer a high-performance, power-efficient, and scalable computing model;
– Provide a truly seamless integration between computing and memory.

Outline

– On-chip high speed storage;
  • STT-RAM based 3D cache for CPU.
  • Racetrack based register file for GPU.
– Memristor-based neuromorphic accelerator.
• Conclusion

STT-RAM based 3D cache
Spin-Transfer Torque Random Access Memory

[Figure: 1T-1MTJ STT-RAM schematic. The magnetic tunneling junction (reference layer, MgO layer, free layer) is connected to the bit-line, word-line, and source-line; the writing '0' and writing '1' cases are both shown.]

• A scalable technology
• Pros: Low leakage power, high density.
• Cons: Long write latency and large write power.

SRAM vs. MRAM (STT-RAM)

Parameter        SRAM       MRAM
Area (65nm)      3.66mm2    3.30mm2
Capacity/Bank    128KB      512KB
Read latency     2.25ns     2.32ns
Write latency    2.26ns     11.02ns
Read energy      0.90nJ     0.86nJ
Write energy     0.80nJ     5.00nJ

Cache configuration           Leakage power
2MB (16x128KB) SRAM cache     2.09W
8MB (16x512KB) MRAM cache     0.26W
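The numbers in the two tables above lend themselves to a quick back-of-the-envelope comparison. The sketch below is plain arithmetic on the quoted values; the dictionary layout and the helper names (`density_kb_per_mm2`, `ratio`) are illustrative assumptions and do not come from the slides. Note that the leakage figures compare a 2MB SRAM cache against an 8MB MRAM cache, so the saving is obtained despite the larger capacity.

```python
# Per-bank values quoted in the SRAM vs. MRAM table (65nm); leakage is per whole cache.
SRAM = {"area_mm2": 3.66, "capacity_kb": 128, "read_ns": 2.25, "write_ns": 2.26,
        "read_nj": 0.90, "write_nj": 0.80, "cache_leakage_w": 2.09}
MRAM = {"area_mm2": 3.30, "capacity_kb": 512, "read_ns": 2.32, "write_ns": 11.02,
        "read_nj": 0.86, "write_nj": 5.00, "cache_leakage_w": 0.26}


def density_kb_per_mm2(bank):
    """Storage density of one bank in KB per mm2."""
    return bank["capacity_kb"] / bank["area_mm2"]


def ratio(key):
    """MRAM value divided by SRAM value for one table row."""
    return MRAM[key] / SRAM[key]


density_gain = density_kb_per_mm2(MRAM) / density_kb_per_mm2(SRAM)
write_latency_penalty = ratio("write_ns")
write_energy_penalty = ratio("write_nj")
leakage_saving = 1.0 - ratio("cache_leakage_w")

print(f"density gain:          {density_gain:.2f}x")
print(f"write latency penalty: {write_latency_penalty:.2f}x")
print(f"write energy penalty:  {write_energy_penalty:.2f}x")
print(f"leakage saving:        {leakage_saving:.1%}")
```

With these inputs the MRAM bank is roughly 4.4x denser, its writes are about 4.9x slower and 6.25x more energy-hungry, and cache leakage drops by about 88%, which matches the pros and cons listed for STT-RAM.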

STT-RAM based 3D cache

• Baseline 3D Architecture
– Core Layer + Cache Layers.
– NUCA caches with NOC connections.

[Figure: 3D stack with Layer 1 and Layer 2, showing cache banks, routers, and a cache controller, with a horizontal hop and data migration marked.]

G. Sun, X. Dong, Y. Xie, J. Li, Y. Chen, HPCA, 2009.

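• Challenges: long write latency of STT-RAM.
• Solution 1 (S1): Read-Preemptive Write Buffer.

[Figure: STT-RAM caches fronted by a FIFO write buffer. Write requests pass through the buffer and are issued as write operations; read requests are issued as read operations and return read data. Two cases are labeled: "Write just begins." and "Write is almost done."]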
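A read-preemptive write buffer can be sketched as a small simulation. This is a toy model under stated assumptions, not the design from the cited paper: the class name, the cycle counts, and in particular the `preempt_threshold` policy (cancel an in-flight write only if less than a given fraction of it has completed) are hypothetical choices made to illustrate the two labeled cases, a write that has just begun and a write that is almost done.

```python
from collections import deque


class ReadPreemptiveWriteBuffer:
    """Toy model of one STT-RAM bank fed by a FIFO write buffer (assumed policy)."""

    def __init__(self, write_cycles=5, preempt_threshold=0.5):
        self.write_cycles = write_cycles            # assumed write latency in cycles
        self.preempt_threshold = preempt_threshold  # assumed policy knob
        self.pending = deque()                      # FIFO of (addr, data) requests
        self.in_flight = None                       # write currently using the bank
        self.progress = 0                           # cycles spent on in_flight
        self.preemptions = 0

    def write(self, addr, data):
        self.pending.append((addr, data))

    def tick(self, storage):
        """Advance the bank by one cycle of write draining."""
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()
            self.progress = 0
        if self.in_flight is not None:
            self.progress += 1
            if self.progress >= self.write_cycles:
                addr, data = self.in_flight
                storage[addr] = data
                self.in_flight = None

    def read(self, addr, storage):
        """Return (data, stall_cycles) for a read arriving right now."""
        stall = 0
        if self.in_flight is not None:
            if self.progress / self.write_cycles < self.preempt_threshold:
                # Write just began: cancel it and retry it later.
                self.pending.appendleft(self.in_flight)
                self.preemptions += 1
            else:
                # Write almost done: let it finish while the read waits.
                stall = self.write_cycles - self.progress
                w_addr, w_data = self.in_flight
                storage[w_addr] = w_data
            self.in_flight = None
            self.progress = 0
        # Forward the newest buffered value if the address is still queued.
        for w_addr, w_data in reversed(self.pending):
            if w_addr == addr:
                return w_data, stall
        return storage.get(addr), stall


storage = {0x10: "old"}
buf = ReadPreemptiveWriteBuffer(write_cycles=5, preempt_threshold=0.5)
buf.write(0x20, "new")
buf.tick(storage)                  # 1 of 5 cycles done: the write has just begun
early = buf.read(0x10, storage)    # preempts the write, no stall
for _ in range(4):
    buf.tick(storage)              # the write restarts and reaches 4 of 5 cycles
late = buf.read(0x10, storage)     # waits for the remaining cycle
print(early, late, buf.preemptions)
```

The design choice being illustrated is the trade-off in the threshold: preempting a nearly finished write wastes the energy already spent, while never preempting exposes every read to the full write latency.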

STT-RAM based 3D cache

• Solution 2 (S2): SRAM-MRAM Hybrid L2 Cache

[Figure: MRAM banks and SRAM banks; a 32-way STT-RAM organization compared with 31-way STT-RAM plus 1-way SRAM.]

• Result (S1 & S2):
– Performance is improved by 4.91% compared with the STT-RAM baseline.
– Power consumption is reduced by 73.5%.

[Figure legend: 2M-SRAM-DNUCA, 8M-MRAM-DNUCA, 8M Hybrid DNUCA]
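One way to picture a hybrid set with 31 STT-RAM ways and 1 SRAM way is to steer write-heavy blocks into the SRAM way. The sketch below is a hypothetical policy for illustration only: the `migrate_after` trigger, the swap behavior, and the class name `HybridSet` are assumptions, and the cost of the swap itself is deliberately not modelled.

```python
class HybridSet:
    """One cache set: way 0 is SRAM, the remaining ways are STT-RAM (MRAM)."""

    SRAM_WAY = 0

    def __init__(self, ways=32, migrate_after=2):
        self.tags = [None] * ways
        self.write_counts = [0] * ways      # writes since the block entered its way
        self.migrate_after = migrate_after  # assumed migration trigger
        self.mram_writes = 0
        self.sram_writes = 0

    def fill(self, tag, way):
        self.tags[way] = tag
        self.write_counts[way] = 0

    def write(self, tag):
        """Write to a resident block and return the way that absorbed the write."""
        way = self.tags.index(tag)          # assumes a hit; misses are out of scope
        if way != self.SRAM_WAY:
            self.write_counts[way] += 1
            if self.write_counts[way] >= self.migrate_after:
                # Swap the write-heavy block into the SRAM way (swap cost ignored).
                s = self.SRAM_WAY
                self.tags[way], self.tags[s] = self.tags[s], self.tags[way]
                self.write_counts[way] = 0
                way = s
        if way == self.SRAM_WAY:
            self.sram_writes += 1
        else:
            self.mram_writes += 1
        return way


cache_set = HybridSet()
cache_set.fill("A", 5)                              # block A starts in an MRAM way
ways_used = [cache_set.write("A") for _ in range(4)]
print(ways_used, cache_set.mram_writes, cache_set.sram_writes)
```

In this run only the first write pays the long STT-RAM write latency; the block then lives in the SRAM way and the later writes are fast.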

Outline

– On-chip high speed storage;
  • STT-RAM based 3D cache for CPU.
  • Racetrack based register file for GPU.

Racetrack for GPU

• Racetrack cell:
– Two fixed pinning regions: free region and fixed region
– Write '0'
– Write '1'
– Read

[Figure: racetrack cell with WWL and RWL, two pinning layers, a free layer, and a reference layer.]

• Racetrack
– Racetrack: magnetic track
– Inject current to move cell
– Access port

Racetrack for GPU

• Benefits from Racetrack:
– Extremely small cell size;
• Major challenges:
– Shifting-caused delay/energy.
• Warp register remapping (WRR)
– 60.0% of the RF is allocated during execution
– Non-optimal warp register mapping: max shift distance of 8 cells
– WRR interleaves the warp registers across the access ports: max shift distance of 4 cells

[Figure: racetrack register file array with SL/BL pairs, row decoder, write/read/shifter driver, column mux, sense amplifier arrays, shift controller, and arbitrator, with Warp 0 registers marked.]

M. Mao, W. Wen, Y. Zhang, Y. Chen, H. Li, DAC 2014
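The effect of interleaving registers across access ports can be seen in a toy model. The sketch assumes each access port sits at the start of a fixed-length segment of the track, that the track returns to its home position between accesses, and that only part of the track is allocated; these assumptions, the function names, and the sizes used are illustrative, and the resulting numbers are not meant to reproduce the 8-cell and 4-cell figures quoted above.

```python
def max_shift(mapping, segment_len):
    """Largest number of single-cell shifts needed to bring any mapped cell to its port.

    Assumes the port for a cell sits at offset 0 of the cell's segment.
    """
    return max(cell % segment_len for cell in mapping.values())


def sequential_mapping(num_regs):
    """Place register r in cell r, filling the track from one end."""
    return {r: r for r in range(num_regs)}


def interleaved_mapping(num_regs, num_ports, segment_len):
    """Spread registers round-robin over the ports, nearest cells first."""
    return {r: (r % num_ports) * segment_len + r // num_ports
            for r in range(num_regs)}


SEGMENT_LEN = 8   # cells served by one access port (assumed)
NUM_PORTS = 2     # access ports on the track (assumed)
NUM_REGS = 8      # allocated registers, half of the 16-cell track (assumed)

naive = sequential_mapping(NUM_REGS)
wrr = interleaved_mapping(NUM_REGS, NUM_PORTS, SEGMENT_LEN)

naive_shift = max_shift(naive, SEGMENT_LEN)
wrr_shift = max_shift(wrr, SEGMENT_LEN)
print(naive_shift, wrr_shift)
```

Because the register file is only partly allocated, clustering the live registers next to the ports shortens the worst-case shift, which is the intuition behind remapping.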

Racetrack for GPU

• Write buffer
– "Piggyback-write" to write back to the RF from the write buffer;
– Relies on the track movement triggered by the read requests;
– Positive side effect: filters redundant RF reads/writes by leveraging RAW and WAW.

[Figure: write buffer datapath to EXE/MEM; labels 32, 4, 8.]
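The write-buffer behavior described above can be sketched as follows. This is a simplified model under assumptions: a pending write is flushed when a read touches the same track (a stand-in for "the track movement triggered by the read requests"), and `track_of` is a hypothetical function mapping a register to its track. None of these names come from the slides.

```python
class PiggybackWriteBuffer:
    """Toy write buffer in front of a racetrack register file (RF)."""

    def __init__(self, track_of):
        self.track_of = track_of   # hypothetical register-to-track mapping
        self.buffer = {}           # register -> latest pending value
        self.rf = {}               # register -> value stored in the RF
        self.rf_reads = 0
        self.rf_writes = 0
        self.filtered = 0          # RF accesses avoided by the buffer

    def write(self, reg, value):
        if reg in self.buffer:
            self.filtered += 1     # WAW: the older pending write never reaches the RF
        self.buffer[reg] = value

    def read(self, reg):
        if reg in self.buffer:
            self.filtered += 1     # RAW: served directly from the buffer
            return self.buffer[reg]
        self.rf_reads += 1
        value = self.rf.get(reg)
        # Piggyback: the read already moved this track, so drain pending
        # writes that target the same track.
        same_track = [r for r in self.buffer
                      if self.track_of(r) == self.track_of(reg)]
        for pending in same_track:
            self.rf[pending] = self.buffer.pop(pending)
            self.rf_writes += 1
        return value


wb = PiggybackWriteBuffer(track_of=lambda reg: reg // 4)
wb.write(1, "a")
wb.write(1, "b")            # WAW on register 1
raw_value = wb.read(1)      # RAW hit in the buffer
miss_value = wb.read(2)     # RF read on track 0 piggybacks the write to register 1
print(raw_value, miss_value, wb.filtered, wb.rf)
```

Two RF accesses are filtered here (one overwritten write, one read served from the buffer), and the only RF write happens for free alongside a read that had to shift the track anyway.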

Racetrack for GPU

• Experiment results:
– Baseline: SRAM-based register files.
– Energy reduction: 59%.
– Performance improvement: 4%.

Outline

– Secondary storage;
  • PCRAM and NAND hybrid SSD;
• Revolution with eNVM:
– Memristor-based neuromorphic accelerator.
• Conclusion

Hybrid SSD

• Memory hierarchy
– On-chip memory: 1~30 cycles
– Off-chip memory: 100~300 cycles
– Solid State Disk (Flash): 25K~2M cycles
• Page mode → Random access
• Erase-before-write (EBW) → In-place-update (IPU)

Courtesy: Al Fazio (Intel)

[Figure: an erase unit holding pages PN=0 through PN=n, each marked V.]

PRAM (PCM) Cell

• One transistor/diode and one GST (GeSbTe).
• In-place updating (IPU)
• High resistance: '0'; Low resistance: '1'

[Figure: PRAM cell cross-sections in the amorphous and crystalline states, each showing top electrode, GST, heater, bottom electrode, and substrate (+N).]

Hybrid SSD

• Conventional SSD: FLASH.
• Promising candidate: PRAM (Phase change).
• To combine benefits of both technologies:
– Hybrid SSD.
• Two usages:
– Performance;
– Reliability.

Hybrid SSD: performance enhancement

[Figure: merge operation (time consuming) across Erase Units 1, 2, and 3. Pages PN=0 to PN=n are marked valid (V) or invalid (I); updated pages such as PN=2 and PN=n appear as both an invalid and a valid copy, and one erase unit provides empty pages.]

PN=Page Number; V=Valid; I=Invalid

Erase Unit = 128/256KB, Page = 512Bytes ~ 8KB

G. Sun, Y. Joo, Y. Chen, Y. Xie, Y. Chen, H. Li, HPCA, 2010.

Hybrid Architecture

[Figure: physical view and structural view. Labels: data region, log region, NAND flash, erase unit, in-place updating, sector (512Bytes), data buffer in memory.]
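Why a log region with in-place updating helps can be illustrated by counting merge operations. The sketch assumes a simplified model: an erase-before-write log can only append, so every update consumes a fresh log page and a full log forces a merge, whereas an in-place-update log overwrites the existing entry for a page. The function name `count_merges`, the log size, and the update trace are hypothetical.

```python
def count_merges(updates, log_pages, in_place):
    """Count merge operations for a trace of updated page numbers.

    in_place=False models an append-only log (erase-before-write);
    in_place=True models a log that supports in-place updating.
    """
    log = []       # page numbers currently held in the log region
    merges = 0
    for pn in updates:
        if in_place and pn in log:
            continue               # overwrite the existing log entry in place
        if len(log) == log_pages:
            merges += 1            # log full: merge it into the data region
            log = []
        log.append(pn)
    return merges


trace = [0, 1, 0, 1, 0, 1, 0, 1]   # two hot pages updated repeatedly (assumed)
ebw_merges = count_merges(trace, log_pages=2, in_place=False)
ipu_merges = count_merges(trace, log_pages=2, in_place=True)
print(ebw_merges, ipu_merges)
```

For a workload that keeps rewriting a few hot pages, the append-only log fills and merges repeatedly, while the in-place log absorbs the same updates without triggering the time-consuming merge.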

Different Log Assignments

• Static log assignment (fixed assignment)
• Group log assignment (organize log pages in groups)
• Dynamic log assignment (dynamic assignment)

[Figure: for each scheme, the erase units of the data region and their associated log region.]

Outline

– Secondary storage;
• Revolution with eNVM:

Computing: Present and Future

[Figure: clock frequency (MHz) and power density, 1990 to 2010, with power-density reference points for a hot plate, a nuclear reactor, and a rocket launch, and the transition to multi-core; a neural network is shown as the future direction.]

• New trend: multi-core, advanced power management, large on-chip storage.
• Future: heterogeneous system, brain-like computing.

Source: CPUDB, Intel

Brain: The Most Efficient Computing Machine

• Brain: 15–30B neurons; extremely complex organ; 4km/mm3
• Gray matter; white matter
• Neocortex: 6 layers; signals travel within and between layers
• Neuron: processes signals from other neurons.
• Synapse: memory; weights signals

Brain-like Neuromorphic Circuits

[Figure: neural network taking real-world input and producing human-friendly output.]

• Highly parallel
• Ultra power efficient
• Flexible
• Extremely robust
• Data friendly

Slow progress in neuromorphic hardware implementation
• Lack of efficient synapse design
• Not supportive of mass connection

[Plot axes: pulse number, 0 to 70.]

Memristor: Rebirth of Neuromorphic Circuits

Memristor ↔ Synapse
• Two terminal, high density
• Non-volatility
• Analog/multi-level states

Crossbar ↔ Network
• Natural matrix function
• A MIMO system
• Good combination with memristor
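The "natural matrix function" of a crossbar is a matrix-vector product: with a voltage applied to each row and a memristor conductance at each crosspoint, every column current is the sum of voltage times conductance along that column. The sketch below assumes an ideal crossbar (no wire resistance, no sneak paths, columns held at virtual ground); the function name and the example values are illustrative.

```python
def crossbar_output_currents(voltages, conductances):
    """Column currents of an ideal crossbar: I[j] = sum_i V[i] * G[i][j].

    voltages: one input voltage per row (volts).
    conductances: row-major matrix of memristor conductances (siemens).
    """
    rows = len(conductances)
    cols = len(conductances[0])
    if len(voltages) != rows:
        raise ValueError("need one voltage per crossbar row")
    return [sum(voltages[i] * conductances[i][j] for i in range(rows))
            for j in range(cols)]


# A 2x2 crossbar: each conductance plays the role of a synaptic weight.
V = [1.0, 0.5]
G = [[1e-3, 2e-3],
     [4e-3, 0.0]]
I = crossbar_output_currents(V, G)
print(I)
```

All multiply-accumulate operations happen in parallel in the analog domain, which is why the crossbar maps onto a network of synapses feeding neurons.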

[Figure panels: TaN1+x device (HP Lab, 2012); EI Lab, DAC'12; crossbar with lines indexed 1, 2, 3, ..., j-1, j, ..., n-1, n (EI Lab, APL'13); EI Lab & HP Lab TiN-TaOx device, with pulses growing linearly in amplitude.]

Conclusion

• Emerging nonvolatile memory (NVM) technologies such as STT-RAM, racetrack, and PRAM deliver significant improvements for various applications.

• Challenges exist and can be solved by architecture-level optimization.

• Revolutionary architectures promise multi-order speedup, power-efficiency improvement, and hardware cost reduction.
