Lightning Talk: In-Memory Data Management Trends & Techniques
Greg Luck, CTO, Hazelcast

DESCRIPTION: www.hazelcast.com

TRANSCRIPT

Page 1: In-memory Data Management Trends & Techniques

Lightning Talk: In-Memory Data Management Trends & Techniques

Greg Luck, CTO, Hazelcast

Page 2: In-memory Data Management Trends & Techniques


In-Memory Hardware Trends

How to Use It

Page 3: In-memory Data Management Trends & Techniques

Von Neumann Architecture

Page 4: In-memory Data Management Trends & Techniques

Hardware Trends


Page 5: In-memory Data Management Trends & Techniques

Commodity Multi-core Servers

(Chart: cores per CPU over time, on a 0–20 cores/CPU scale.)

Page 6: In-memory Data Management Trends & Techniques

UMA -> NUMA (Uniform Memory Access -> Non-Uniform Memory Access)

Page 7: In-memory Data Management Trends & Techniques

Commodity 64-bit servers

Addressable memory: 32-bit = 4 GB; 64-bit = 18 EB

Page 8: In-memory Data Management Trends & Techniques

50 Years of RAM Prices: Historical and Projected

Page 9: In-memory Data Management Trends & Techniques

50 Years of Disk Prices

Page 10: In-memory Data Management Trends & Techniques

SSD Prices

Average price: $1/GB

Page 11: In-memory Data Management Trends & Techniques

Cost Comparison: USD/GB, 2012

Disk: $0.04/GB | SSD: $1/GB (25x disk) | DRAM: $21/GB (525x disk)

Cost of 100 TB: Disk ~$4k, SSD ~$100k, DRAM ~$2.1m

Page 12: In-memory Data Management Trends & Techniques

Max RAM Per Commodity Server

(Chart: maximum RAM per commodity server, 2010–2013, on a 0–9 TB scale.)

Page 13: In-memory Data Management Trends & Techniques

Latency across the network

(Chart: network latency, on a 0–70 µs scale.)

Page 14: In-memory Data Management Trends & Techniques

Access Times & Sizes

Level | RR Latency | Typical Size | Technology | Managed By
Registers | < 1 ns | 1 KB | Custom CMOS | Compiler
L1 Cache | 1 ns | 8 – 128 KB | SRAM | Hardware
L2 Cache | 3 ns | 0.5 – 8 MB | SRAM | Hardware
L3 Cache (on chip) | 10 – 15 ns | 4 – 30 MB | SRAM | Hardware
Main Memory | 60 ns | 16 GB – TBs | DRAM | OS/App
SSD | 50 – 100 µs | 400 GB – 6 TB | Flash Memory | OS/App
Main Memory over Network | 2 – 100 µs | Unbounded | DRAM/Ethernet/InfiniBand | OS/App
Disk | 4 – 7 ms | Multiple TBs | Magnetic Rotational Disk | OS/App
Disk over Network | 6 – 10 ms | Unbounded | Disk/Ethernet/InfiniBand | OS/App

Page 15: In-memory Data Management Trends & Techniques

Access Times & Sizes (continued)

(Same table as the previous slide.)

Cache is up to 30 times faster than memory. Memory is 10^6 times faster than disk. Memory over the network is 10^3 times faster than disk. SSD is 10^2 times faster than disk.

Page 16: In-memory Data Management Trends & Techniques

Techniques


Page 17: In-memory Data Management Trends & Techniques

Exploit Data Locality

Data is more likely to be read if:
•  It was recently read (temporal locality)
•  It is adjacent to other data, e.g. arrays, fields in an object (spatial locality)
•  It is part of a pattern, e.g. looping, relations
•  Some data is naturally accessed more frequently, e.g. following a Pareto distribution

Page 18: In-memory Data Management Trends & Techniques

Working with the CPU’s Cache Hierarchy

•  Memory is up to 30x slower than cache
•  Alleviated somewhat by NUMA, wide-channel/multi-channel memory, and larger caches
•  Vector instructions
•  Work with cache lines
•  Work with memory pages (TLBs)
•  Work with prefetching
•  Exploit NUMA with CPU affinity: numactl --physcpubind=0 --localalloc java …
•  Exploit natural data locality (see the sketch below)
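The effect of the last few bullets can be seen with a tiny, self-contained micro-benchmark (my sketch, not the speaker's): summing the same array sequentially and in a random order touches identical data, but the sequential pass works with cache lines and the hardware prefetcher while the random pass defeats them. Array sizes are arbitrary assumptions for illustration.

```java
import java.util.Random;

// Illustrative micro-benchmark: linear vs. random traversal of the same array.
public class CacheLineDemo {
    private static final int N = 16 * 1024 * 1024; // 16M ints = 64 MB per array

    public static void main(String[] args) {
        int[] data = new int[N];
        int[] order = new int[N];
        Random rnd = new Random(42);
        for (int i = 0; i < N; i++) {
            data[i] = i;
            order[i] = rnd.nextInt(N);      // random visit order
        }

        long t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < N; i++) sum += data[i];          // sequential: prefetch-friendly
        long linearNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < N; i++) sum += data[order[i]];   // random: cache-miss heavy
        long randomNs = System.nanoTime() - t0;

        System.out.printf("linear %d ms, random %d ms (sum=%d)%n",
                linearNs / 1_000_000, randomNs / 1_000_000, sum);
    }
}
```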

Page 19: In-memory Data Management Trends & Techniques

Data Locality Effects – intra machine

(Chart: access time for Linear, Random – Page, and Random – Heap traversal on Intel U4100, i7-860 and i7-2760QM CPUs; vertical scale 0–160.)

Page 20: In-memory Data Management Trends & Techniques

Tiered Storage

Local Storage:
Heap Store | 5,000,000+ TPS | 10 GB
Off-Heap Store | 1,000,000 TPS | 1,000+ GB
Local Disk, SSD and Rotational (Restartable) | 100,000 TPS | 2,000+ GB

Network Storage:
Network Accessible Memory | 10,000s TPS | 100,000+ GB

Page 21: In-memory Data Management Trends & Techniques

Data Locality Effects – inter machine

Compared with a hybrid in-process (L1) and distributed (L2) cache:

Latency = L1 speed × proportion + L2 speed × proportion

L1 = 0 ms (< 5 µs) for on-heap and 50 – 100 µs for off-heap; L2 = 1 ms

80% L1 Pareto model: latency = 0 × 0.8 + 1 × 0.2 = 0.2 ms
90% L1 Pareto model: latency = 0 × 0.9 + 1 × 0.1 = 0.1 ms
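The same weighted-latency arithmetic as a throwaway helper, fed with the slide's 80/20 and 90/10 hit ratios (my sketch, not from the talk):

```java
// Expected read latency for a two-tier cache: hits served by the near tier (L1),
// misses by the far tier (L2). Values mirror the slide: L1 ~ 0 ms, L2 ~ 1 ms.
public class HybridCacheLatency {
    static double expectedLatencyMs(double l1Ms, double l2Ms, double l1HitRatio) {
        return l1Ms * l1HitRatio + l2Ms * (1.0 - l1HitRatio);
    }

    public static void main(String[] args) {
        System.out.println(expectedLatencyMs(0.0, 1.0, 0.8)); // ≈ 0.2 ms
        System.out.println(expectedLatencyMs(0.0, 1.0, 0.9)); // ≈ 0.1 ms
    }
}
```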

Page 22: In-memory Data Management Trends & Techniques

Columnar Storage

•  Manipulate data locality
•  Sorted dictionary compression for finite values (see the sketch below)
•  Allows values to be held in cache for SSE instructions
•  Better cache line effectiveness
•  Fewer CPU cache misses for aggregate calculations
•  Cross-over point is around a few dozen columns
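To make the dictionary-compression idea concrete, a small hypothetical sketch (not from the talk): a string column with few distinct values is stored as integer codes into a sorted dictionary, so scans and aggregations touch far fewer bytes and stay cache-resident.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Sorted-dictionary compression of one column of a columnar store.
public class ColumnDictionary {
    public static void main(String[] args) {
        String[] column = {"DE", "US", "US", "AU", "DE", "US", "AU"};

        // Build the sorted dictionary of distinct values.
        String[] dictionary = new TreeSet<>(Arrays.asList(column)).toArray(new String[0]);

        // Encode the column as dictionary positions (fits in a byte/short/int).
        int[] codes = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            codes[i] = Arrays.binarySearch(dictionary, column[i]);
        }

        // Aggregate ("count rows where country = 'US'") by scanning the compact codes.
        int usCode = Arrays.binarySearch(dictionary, "US");
        long count = Arrays.stream(codes).filter(c -> c == usCode).count();

        System.out.println(Arrays.toString(codes) + " -> US count = " + count);
    }
}
```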

Page 23: In-memory Data Management Trends & Techniques

Parallelism

•  Multi-threading
•  Avoid synchronized: use CAS (see the sketch below)
•  Query using a scatter-gather pattern
•  Map/Reduce, e.g. Hazelcast Map/Reduce
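A minimal illustration of the "avoid synchronized, use CAS" point, using only java.util.concurrent (my example, not the speaker's): a lock-free running maximum updated with compareAndSet.

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free running maximum: threads retry with CAS instead of blocking on a lock.
public class CasCounter {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Records a new observation, keeping the running maximum without locks. */
    public void record(long value) {
        long current;
        do {
            current = max.get();
            if (value <= current) return;             // nothing to update
        } while (!max.compareAndSet(current, value)); // retry if another thread won the race
    }

    public long max() {
        return max.get();
    }

    public static void main(String[] args) {
        CasCounter c = new CasCounter();
        java.util.stream.LongStream.range(0, 1_000_000).parallel().forEach(c::record);
        System.out.println(c.max()); // 999999
    }
}
```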

Page 24: In-memory Data Management Trends & Techniques

Java: Will it make the cut?

Garbage collection limits heap usage. G1 and Balanced aim for < 100 ms pauses at 10 GB.

(Diagram: a 64 GB server on which Java apps are memory bound: available memory far exceeds the ~4 GB of heap usable before GC pause times reach ~4 s, leaving the rest as unused memory or off-heap storage.)

No low-level CPU access.

Java is challenged as an infrastructure language despite its newly popular usage for this.
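One common workaround the later slides rely on is off-heap storage. A minimal sketch, assuming nothing beyond standard Java NIO (this is not BigMemory's or Hazelcast's implementation): data parked in a direct ByteBuffer sits outside the garbage-collected heap, so it does not add to GC pause times, at the cost of manual serialization.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Off-heap storage via a direct buffer: the GC never scans these bytes.
public class OffHeapSlot {
    public static void main(String[] args) {
        // 64 MB of memory outside the garbage-collected heap.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);

        byte[] value = "some cached value".getBytes(StandardCharsets.UTF_8);
        offHeap.putInt(value.length);   // length prefix
        offHeap.put(value);             // payload

        // Read it back.
        offHeap.flip();
        byte[] read = new byte[offHeap.getInt()];
        offHeap.get(read);
        System.out.println(new String(read, StandardCharsets.UTF_8));
    }
}
```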

Page 25: In-memory Data Management Trends & Techniques

CEP/Stream Processing

•  Don't let data pool up and then process it with "pull queries"
•  Invert that and process the data as it streams in: "push queries"
•  Queries execute against "tables" that break the stream up into a current time window
•  Hold the window and intermediate results in memory
•  Results are in real time (see the sketch below)
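A toy push-style example (my sketch, not any particular CEP product): counts are folded into an in-memory tumbling window as events arrive, so the current window's "table" is always up to date.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Push query: aggregate events into in-memory tumbling time windows as they stream in.
public class TumblingWindowCount {
    private final long windowMillis;
    private final Map<Long, Map<String, Long>> windows = new ConcurrentHashMap<>();

    public TumblingWindowCount(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Update the current window's counts as each event arrives. */
    public void onEvent(String key, long timestampMillis) {
        long windowStart = timestampMillis - (timestampMillis % windowMillis);
        windows.computeIfAbsent(windowStart, w -> new ConcurrentHashMap<>())
               .merge(key, 1L, Long::sum);
    }

    /** The in-memory "table" for the window containing the given time. */
    public Map<String, Long> windowAt(long timestampMillis) {
        long windowStart = timestampMillis - (timestampMillis % windowMillis);
        return windows.getOrDefault(windowStart, Map.of());
    }

    public static void main(String[] args) {
        TumblingWindowCount counts = new TumblingWindowCount(60_000); // 1-minute windows
        long now = System.currentTimeMillis();
        counts.onEvent("login", now);
        counts.onEvent("login", now + 10);
        counts.onEvent("purchase", now + 20);
        System.out.println(counts.windowAt(now)); // e.g. {login=2, purchase=1}
    }
}
```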

Page 26: In-memory Data Management Trends & Techniques

In-Situ Processing

Rather than moving the data to be processed, you process it in-situ.

Examples:
- HANA Calculation Engine
- Google BigQuery
- Exadata Storage Servers
- Hazelcast EntryProcessor and Distributed Executor Service (sketched below)
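A minimal EntryProcessor sketch against the Hazelcast 3.x API that was current at the time of the talk (the map name and value type are my own): the update executes on the cluster member that owns the key, so the value never crosses the network to the caller.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.AbstractEntryProcessor;

import java.util.Map;

// In-situ processing: the increment runs on the partition owner, next to the data.
public class InSituIncrement {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, Integer> counters = hz.getMap("counters");
        counters.put("page-views", 0);

        counters.executeOnKey("page-views", new AbstractEntryProcessor<String, Integer>() {
            @Override
            public Object process(Map.Entry<String, Integer> entry) {
                entry.setValue(entry.getValue() + 1);
                return null;
            }
        });

        System.out.println(counters.get("page-views")); // 1
        hz.shutdown();
    }
}
```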

Page 27: In-memory Data Management Trends & Techniques

Souped-Up Von Neumann Architecture

(Diagram: the basic Von Neumann machine augmented with multi-processor, multi-core CPUs with vector/AES instructions, more cache, NUMA, wide/multi-channel memory and data locality, 64-bit DRAM, compression, PCI flash, SSD (flash and RAM), and memory over the network.)

Page 28: In-memory Data Management Trends & Techniques

The Data Management Landscape


Page 29: In-memory Data Management Trends & Techniques

The new data management world

Data Grid: Terracotta, Coherence, Gemfire, …

Page 30: In-memory Data Management Trends & Techniques

SAP HANA Relational | Analytical

•  "Appliance"
•  Aggressive IA64 optimisations
•  ACID, SQL and MDX
•  In-memory, SSD and disk
•  Row- and column-based storage
•  Fast aggregation on the column store
•  Single-instance 1 TB limit
•  Uses compression (est. 5x)
•  Parallel DB: round-robin, hash, or range partitioning of a table with shared storage
•  Updates as delta inserts
•  Data is fed from source systems in near real time, real time, or batch

Page 31: In-memory Data Management Trends & Techniques

VoltDB Relational | New SQL | Operational | Analytical

•  An all in-memory design
•  Full SQL and full ACID
•  Partitioned per core so that one thread owns its partition, avoiding locking and latching
•  Redundancy provided by multiple instances with writes being replicated
•  Claims to be 45x faster

Page 32: In-memory Data Management Trends & Techniques

Oracle Exadata Relational | Operational | Analytical | Appliance

•  Combines Oracle RAC with "Storage Servers"
•  Connected within the box by InfiniBand QDR
•  Storage Servers use PCI flash cards (not SSD) for a 22 TB hardware cache
•  In-situ computation on the Storage Servers with "Smart Scan"
•  Uses "Hybrid Columnar Compression", a compromise between row and column storage

Page 33: In-memory Data Management Trends & Techniques

Terracotta BigMemory Key-Value | Operational | Data Grid

•  In-memory
•  Key-value with the Ehcache and soon javax.cache APIs
•  In-process (L1) and server storage (L2)
•  Persistence via the log-forward Fast Restart Store: SSD or disk
•  Tiered storage: local on-heap, local off-heap, server on-heap, server off-heap
•  Partitions with consistent hashing
•  Search with parallel in-situ execution
•  Off-heap allows 2 TB uncompressed in each app server Java process and on each server partition
•  Compression
•  Speed ranging from < 1 µs to a few ms

Page 34: In-memory Data Management Trends & Techniques

Hazelcast Key-Value | Operational | Data Grid

•  In-memory
•  Key-value Map API and javax.cache API
•  Near cache and server data storage
•  Tiered storage: local on-heap, local off-heap, server on-heap, server off-heap
•  Partitions with consistent hashing
•  Search with parallel in-situ execution
•  In-situ processing with Entry Processors and Distributed Executors
•  Speed ranging from < 1 µs to a few ms (basic usage sketched below)
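For orientation, basic key-value usage against the Hazelcast 3.x Map API (the map name and values are made up for the example; the javax.cache flavour is analogous):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

// Members running this code form a cluster; the IMap is partitioned across them
// with consistent hashing.
public class HazelcastHello {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        IMap<String, String> capitals = hz.getMap("capitals"); // distributed map
        capitals.put("AU", "Canberra");
        capitals.put("DE", "Berlin");

        System.out.println(capitals.get("AU")); // served from the owning partition

        hz.shutdown();
    }
}
```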

Page 35: In-memory Data Management Trends & Techniques

Disk is the new tape


SSD is the new disk

Memory is the new operational store