20180806 mrambasedcaches using mram to create intelligent ... · 8/6/2018 · intelligent...
TRANSCRIPT
Using MRAM to Create Intelligent SSDs
Jérôme GaysseSenior Technology&Market Analyst
MRAM Developer Day 2018Santa Clara, CA 1
Study context
§ Analysis of system & application§ Performance modeling§ Emerging technologies analysis§ New architecture definition§ Performance simulation
MRAM Developer Day 2018Santa Clara, CA 2
Deep learning
§ Inference and training§ Training value = weights value
§ Training problem• Take a long time: time to market impact• Expansive hardware resources : TCO impact
§ How MRAM could solve it?
MRAM Developer Day 2018Santa Clara, CA 3
Training process
MRAM Developer Day 2018Santa Clara, CA 4
Training systemData set(100k to 1M images)
80% training20% verification
Batch
(128 images)
Forward
Backward
For each image:
All images = 1 epoch
1) Inference calculation2) Error calculation3) Weights update
Training = multiple epochs
Deep learning training
§ System architecture
MRAM Developer Day 2018Santa Clara, CA 5
GPU HBM
GPU HBM
CPU
SSD
NIC
SSD
All flash array
Performance analysis
§ Focus on ResNet50 neural network• 50 layers• 25M parameters• 3.9GMACs
§ Many benchmarks available• Nvidia, HPE, Dell…
MRAM Developer Day 2018Santa Clara, CA 6
Performance analysis
§ Throughput : 400 images per second/GPU• FP32 resolution• 25M parameters => 100MB model size
§ Checkpointing (about every 400 images)• =>100MB/s write
§ Data set : 100kB images @400fps• => 40MB/s read
MRAM Developer Day 2018Santa Clara, CA 7
No IO storage bottleneck
DL improvements
§ From FP32 to FP16 to INT8• Less computing requirements• Less memory bandwidth requirements
§ Pruning (less connexions between neurons)• Less computing requirements• Less memory bandwidth requirements
MRAM Developer Day 2018Santa Clara, CA 8
Training throughput to increase
Training performance increase
§ With DL optimization,§ and new Deep Learning Processor
developement
§ From 400 FPS to 10,000FPS (estimation)• x25
MRAM Developer Day 2018Santa Clara, CA 9
IO impact
§ If throughtput x25
• Read => 1GB/s (x25)
• Write => 1GB/s (x10)
• x25 accesses but less data to write
§ Average bandwidth, not so high
• The key bottleneck will be the latency
MRAM Developer Day 2018
Santa Clara, CA 10
Need new architecture
§ Huge data movement between training system and storage array• => High power consumption• => High silicon cost (AFA controller, NIC…)
MRAM Developer Day 2018Santa Clara, CA 11
All flash array Training system
8 GB/s
8 GB/s
Computational storage concept
§ Computational storage = computing capabilities in the SSD
MRAM Developer Day 2018Santa Clara, CA 12
SSD controller+
NandFlash+
Local computing
CPU
Reduce data movement:Less power, higher performance
Computational storage architecture
§ NV cache for DL weights checkpointing
MRAM Developer Day 2018Santa Clara, CA 13
Computational storage controller
NVMe
Local computing
Flashcontroller
NV Cache
Flash
NV Cachecontroller
Theory of operation
§ Namespaces• Weights• Data set
§ Vendor specific commands• Start training
MRAM Developer Day 2018Santa Clara, CA 14
NV cache technologies
§ MRAM standalone chipor§ NVDIMM-N like technology
(DRAM+FLASH+Energy source)
MRAM Developer Day 2018Santa Clara, CA 15
Is MRAM capacity ok?
§ Model size examples• 25MB to 100MB for RestNet50• 60MB To 230MB for RestNet152
§ MRAM capacity• 256Mb chip today• 1Gb this year• 4 Gb in 2-5 years
MRAM Developer Day 2018Santa Clara, CA 16
Capacity: ok
Is MRAM performance ok?
§ Weights Write : 1GB/s§ MRAM bandwidth:10GB/s
• 1333MT/s
MRAM Developer Day 2018Santa Clara, CA 17
Performance: ok
Is MRAM endurance ok?
§ Training setup• 50 epochs,• 1M images data set• Checkpoint every 400 images• =>125k write operations in 1.5h
§ MRAM: 1x1010 cycles• 80k trainings, 13 years
MRAM Developer Day 2018Santa Clara, CA 18
Endurance: ok
System benefit
MRAM Developer Day 2018Santa Clara, CA 19
All flash array GPU HBMCPU
SSD
NIC
CPUDeep
LearningProcessor
Flash
MRAM
§ Removing hardware cost & power
Computational storage
What about a more integratedsolution?
§ 3D§ Embedded
MRAM Developer Day 2018Santa Clara, CA 20
Embedded NVM or Cache?
§ Up to 32MB-64MB§ Not enough to store all the weights
§ New interconnect may solve it
MRAM Developer Day 2018Santa Clara, CA 21
Deep learning processor
eMRAM
Deep learning processor
eMRAM
GenZ, CCIX
Next studies
§ Simulation at system level§ Detailed storage access (latency)
§ Computational storage applied to NVDIMM?
MRAM Developer Day 2018Santa Clara, CA 22
MRAM Developer Day 2018Santa Clara, CA 23
Want to know more?
Intelligent Hybrid Flash ManagementTuesday Aug 7, 3:40-5:45 PM, SSDS-102-1: Enterprise SSDs (SSDs Track)
A Comparison of In-storage Processing Architectures and TechnologiesMonday Sept 24, 10:35 AM - 11:25 AM