wh why near-d i data processing might be real this time– enables integg, gy yration, bandwidth and...

19
Nanostores and Beyond Nanostores and Beyond Wh d i Why near-data processing might be real this time Jichuan Chang HP Labs / Dec 8 2013 HP Labs / Dec 8, 2013

Upload: others

Post on 13-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Nanostores and BeyondNanostores and Beyond

Wh d i Why near-data processing might be real this time

Jichuan ChangHP Labs / Dec 8 2013HP Labs / Dec 8, 2013

Page 2: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Resource-Efficient Data-Centric Computing

• Opportunities • Challenges

16X L1

CPU

Challenges

56Xonline data (7-year)

6Moore’s law L2

L3

eDRAMonline data (7-year)

F t T t l

DRAM memory

SLC NVM

[Dally 2011]

Big Data

FastResponse

TotalIntegration

MLC NVM

Flash CacheFreshInsight

DeepAnalysis

Flash Cache

StorageSlide 2

Page 3: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Key Design Principlesy g p• Move and match compute to data

R d i i h h / / hi h• Redesigning the cache/memory/storage hierarchy• Rethink the hardware/software interfaces• Resilience/-ilities as cross-cutting optimizations

Slide 3

Page 4: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

From Microprocessors to Nanostores

4

Slide 4

Page 5: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

What if Data Come with Their Own Compute? Leverage NVM to eliminate hard drive Flatten cache / memory hierarchy Flatten cache / memory hierarchy Co-locate compute with data stores Specialize compute to match data Specialize compute to match data Balance compute/communicate/store C t Hi h d A ti * Compute Hierarchy and Active-* Co-design software to exploit NVM M t h ith di t ib t d t di Match with distributed systems paradigm Optimize for simplicity, volume, and efficiency

5

Slide 5

Page 6: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

A Taxonomy of NVM-based Architectures

6

Slide 6

Page 7: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Anatomy of SSD vs. Nanostores

NVMSSD NVM NVM

Compute

NVM NVM NVM

Ctrl

DRAM NVM NVM NVM

Bandwidth Efficiency Parallelism Scaling and Flexibility Issues

DRAM

Nanostore Nanostore Nanostore

Bandwidth, Efficiency, Parallelism Scaling and Flexibility Issues

CPUGPU

Compute Hierarchy

7

Nanostore Nanostore NanostoreSlide 7

Page 8: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Evaluation Challenges and New Models Upload/download Search & indexing Transactions

/

Correlation Cubing/zoom-in Classification/prediction

dom

am

mutation

l-time

ch gh

w heavy

heavy

nced

:All

:Subset

ured ctured

media

Filtering/aggregation Social network analysis

Access:Rand

Access:Strea

Access:Perm

Latency:Real

Latency:Batc

Compute:Hi

Compute:Low

R/W:Write

R /W:Readh

R/W:balan

WorkingSet

WorkingSet

Data:Struct u

Data:Unstruc

Data:Rich

m

Sort X X X X X XSort X X X X X XSearch (web) X X X X X X XSearch (image) X X X X X X XSearch indexing X X X X X XRecommender X X X X X X X XChecksum/de-dup X X X X X X X XTransactionprocessing

X X X X X X X X

Decision support X X X X X X X X X X XV d d X X X X X X

8

Video transcoding X X X X X XMining and learning X X X X X X X X

Slide 8

Page 9: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Efficiency Benefits of Nanostores

60ty

40prov

emen

t

20ctor o

f Imp

0

Fac

Benchmark 1 Benchmark 2 Benchmark 3 Benchmark 4 Benchmark 5 Mean

• Balanced system design: across compute, memory, IO and network• Future bottlenecks: network/software scalability; power/thermal

9

Future bottlenecks: network/software scalability; power/thermal

Slide 9

Page 10: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Feedbacks from Our Customers • Positive validations on use-cases and workloads

Oil/ fi i l DB/N SQL St E t G h – Oil/gas, financial, DB/NoSQL, Storage, Events, Graph …• Excited customers want early access to technology

– HP Moonshot and new prototyping efforts• More importantly: system architecture roadmapp y y p

– Large investment in time/resource to co-design software– How to train programmers and what tools to use?How to train programmers and what tools to use?– How to solve the SW vs. HW “chicken and egg” problem?

Slide 10

Page 11: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Feedbacks from Our Colleaguesg• A good time to revisit PIM and Active Memory

P i k EXECUBE PIM IRAM S t M – Prior work: EXECUBE, PIM, IRAM, Smart Memory, DIVA, LIMA/LCMT, Active pages, FlexRAM, Pinnacle Impulse/AMO Pinnacle, Impulse/AMO, …

– Funding agencies: DOE Blackcomb, DRAPA OHPC, …

• So, why it might be real this time? • And what are the new research opportunities?

Slide 11

Page 12: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Thoughts on What’s Different This Timeg• Assumption: both HW and SW changes are needed!

• Caveat: my personal viewpoints• Insights from many collaborators and colleagues • Insights from many collaborators and colleagues

– Partha Ranganathan, Norm Jouppi, Mehul Shah, Sheng Li, Doe Hyun Yoon, Kevin Lim, Justin Meza, Greg Astfalk, J h S t P l F b hi J h B L John Sontag, Paolo Faraboschi, John Byrne, Laura Ramirez, Kim Keeton, Niraj Tolia, Rob Schreiber, Gilberto Ribeiro, Dwight Barron, Mitch Wright, Siamak Tavallaei, …

– Trevor Mudge, Tom Wenisch, David Roberts, Steven Pelley, Prateek Tandon, Ron Dreslinski

– Christos Kozyrakis, Mingyu Gao, …y , gy ,

Slide 12

Page 13: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Top 10 Important Reasons (1)p p ( )1. Necessity: the renewed focus on efficiency

R d d d h hi h h d– Reduces data movement and cache hierarchy overhead– Opportunities to rebalance compute-to-memory ratios

Opportunities for NDP appliances and accelerators– Opportunities for NDP appliances and accelerators Workloads and programming model are open issues

2 Technology: 3D/2 5D stacking2. Technology: 3D/2.5D stacking– No more “merged logic/memory” dilemma – Enables integration, bandwidth and energy efficiencyg , gy y– Allows integration of NIC and accelerators Cost and thermal issues are still open challenges

Slide 13

Page 14: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Top 10 Important Reasons (2)p p ( )3. Software: distributed software framework

– MapReduce as a primary example– Popularized the concept of moving compute to data Handles tough issues such as data layout and reliability

4. Interface: new host-memory interface support4. Interface: new host memory interface support– Mobile memory is the new commodity– What’s next? DDRx LPDDRx Wide I/O HMB HMC What s next? DDRx, LPDDRx, Wide I/O, HMB, HMC, …– Motivations to replace strict master/slave interfaces Great opportunity to add support for NDP and active * Great opportunity to add support for NDP and active-

Slide 14

Page 15: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Top 10 Important Reasons (3)p p ( )5. Hierarchy: non-volatile memory / storage

– Flattens the memory/storage hierarchy– Nanostores become a self-contained building block No more choking points for getting data into the memory

6. Balance: integrated and high-BW networking6. Balance: integrated and high BW networking– Renewed interest in new fabrics and photonics– SoC/3D enable integrated NIC and efficient stackSoC/3D enable integrated NIC and efficient stack– Potential new opportunity for NDP-focused networks Together can enable efficient and scalable platforms Together can enable efficient and scalable platforms

Slide 15

Page 16: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Top 10 Important Reasons (4)p p ( )7. Heterogeneity: recent trends have paved the road

– Heterogeneity needed for specialization and flexibility– This use to one of the key barriers to incorporate NDP GPU/APU, bigLITTLE, FPGA/SoC, … NDP

8. Capacity: key to lower cost and divisibility8. Capacity: key to lower cost and divisibility– Small capacity once required high || and data movement– New compute-to-capacity ratios enabled by NVM etcNew compute to capacity ratios enabled by NVM, etc Lower-cost memory is also important for adoption

Slide 16

Page 17: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Top 10 Important Factors (5)p p ( )9. Anchor workloads: co-designed big data systems

– Databases: IBM Netezza, Oracle Exadata, …– New frameworks: MapReduce, Impala, …– Other possibilities: Financial/Geo/Telco, graphs, events … Commercially-viable anchor market is critical y

10. Ecosystem: prototypes and tools now available– SW: OpenCL MapReduce SW: OpenCL, MapReduce, …– HW: Adapteva, Vinray, Micron’s AP, Samsung SSD … Tools: extend multicore & heterogeneous computing tools Tools: extend multicore & heterogeneous computing tools

Slide 17

Page 18: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Summary yFrom microprocessors to Nanostores

– Resource-efficient data center computing– Moving and matching compute to data– Promising results and feedbacks received

Why near-data processing might be real this time? – Top 10 reasons (my take on this)– Open questions remain, but exciting times ahead …– Your insights and feedbacks are most appreciated!

Slide 18

Page 19: Wh Why near-d i data processing might be real this time– Enables integg, gy yration, bandwidth and energy efficiency – Allows integration of NIC and accelerators

Top 10 Reasons for NDP2.0p1. Necessity: the renewed focus on efficiency 2 Technology: 3D/2 5D stacking2. Technology: 3D/2.5D stacking3. Software: distributed software framework4 Interface: new host memory interface support4. Interface: new host-memory interface support5. Hierarchy: non-volatile memory / storage6 Balance: integrated and high-BW networking6. Balance: integrated and high BW networking7. Heterogeneity: recent trends have paved the road8. Capacity: key to lower cost and divisibility8. Capacity: key to lower cost and divisibility9. Anchor workloads: co-designed big data systems

10. Ecosystem: prototypes and tools now availabley p yp

Slide 19