AEROSPIKE USER SUMMIT 2018
Hybrid Memory Aerospike @ PayPal
Saibabu Devabhaktuni
Sr. Director of Database, Systems, and Storage
PayPal
Athreya Gopalakrishna
Sr. MTS Engineer, Database Engineering
PayPal
-
PayPal Fraud Detection System
• Analytical system
• Built on relational and KV systems
• Requires low latency and high throughput
• 1-200 KB average object sizes
• 1-2 ms latency at the 99th percentile
• Millions of transactions/sec
• Trillions of keys
• 100s of Terabytes of Storage
©2018 PayPal Inc. Confidential and proprietary.
-
Happy with the fast machine
Flight speed = 1300+ mph
Cost = US$200M
Passengers = 100/flight
(Analogy: In-Memory NoSQL DB)
-
STORAGE GROWTH TREND – FRAUD SYSTEMS
[Chart: Data Growth (TB) per year, In-Memory NoSQL :)]
2011: 0.5
2012: 3
2013: 12
2014: 18
-
KEY SPACE GROWTH TREND
[Chart: Keys (Billions) per year, In-Memory NoSQL :)]
2011: 0.1
2012: 4
2013: 16
2014: 40
Flight speed = 1300+ mph
Cost = US$200M
Passengers = 100/flight
-
Growth Estimates
[Chart: Keys (Billions) per year :(]
2011: 0.1
2012: 4
2013: 16
2014: 40
2015: 20
2016: 240
2017: 1200
2018: 2400
-
Flight speed = 650 mph
Cost = US$400M
Passengers = 850/flight
In search of a new machine (NoSQL DB)
-
Back to the drawing board (in-memory vs. memory-first vs. hybrid-memory)
-
Read Path – Latency and Consistency
In-Memory
• Read path: Client → Database Memory
• Latency: Low
• Throughput: Consistent
Memory-First
• Read path with cache hit: Client → Database Memory
• Read path with cache miss: Client → Database Memory → Database Disk
• Latency: Low and High
• Throughput: Inconsistent
Hybrid-Memory
• Read path: Client → Database Memory → Database Disk
• Latency: Low
• Throughput: Consistent
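The contrast between the memory-first and hybrid-memory read paths can be illustrated with a small latency model. This is a minimal sketch, not a measurement: the function name and the 100 us memory and 1000 us disk latencies are hypothetical values chosen for illustration and do not come from the slides.

```python
def mean_read_latency_us(hit_ratio, memory_us, disk_us):
    """Mean read latency when a fraction `hit_ratio` of reads is served from memory
    and the remainder falls through to disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * memory_us + (1.0 - hit_ratio) * disk_us


# Hypothetical device latencies (assumptions, not figures from the slides).
MEMORY_US = 100.0
DISK_US = 1000.0

for hit_ratio in (1.0, 0.9, 0.5):
    latency = mean_read_latency_us(hit_ratio, MEMORY_US, DISK_US)
    print(f"cache hit ratio {hit_ratio:.0%}: mean latency ~{latency:.0f} us")
```

Under this model a memory-first database drifts between the two extremes as its cache hit ratio changes, which is one way to read the "Low and High" latency label on the slide.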
-
E.g., Designing a 50TB A/A Database
DC1, RF=2: C1 (50TB), C2 (50TB)
DC2, RF=2: C3 (50TB), C4 (50TB)
DC3, RF=2: C5 (50TB), C6 (50TB)
X-DC replication between DC1, DC2, and DC3 (A/A)
-
In-Memory Database for 50TB
(Predictable performance)
Servers ~ 1024
Racks ~ 18
Price ~ $12M
DC1, DC2, DC3
-
Memory-First Database for 50TB
in-memory (Predictable performance)
Servers ~ 1024
Racks ~ 18
Price ~ $15M
DC1, DC2, DC3
3.2TB 3.2TB
-
Memory-First Database for 50TB
(memory first + disk) (Unpredictable performance)
Servers ~ 120
Racks ~ 3
Price ~ $1.8M
DC1, DC2, DC3
3.2TB 3.2TB
-
Hybrid Memory Database for 50TB
(Predictable performance)
Servers ~ 120
Racks ~ 3
Price ~ $1.8M
DC1, DC2, DC3
3.2TB 3.2TB
-
Aerospike as NoSQL Database
• Written in C
• Simple KV database
• Distributed shared-nothing architecture
• Operates in In-Memory or Hybrid-Memory modes
• Low write amplification
• SSD optimized for consistent performance
• High storage density
• Low CPU utilization
• UDF for server-side computations
-
Aerospike Architecture
• Distributed K/V (Key, Value)
• Shared nothing
• Auto-Failover
• Auto-Rebalancing
• Cross DC Replication
• UDF
[Diagram: cluster of nodes]
Key Differentiators in NoSQL space
• Designed from the ground up for SSDs (achieves even wear and tear on the device)
• Proprietary file system (achieves consistent device latency, follows device throughput)
• Hybrid storage with predictable capacity (enables huge storage on SSD)
-
Hybrid Memory
System configuration
• 40 cores
• 1.92TB x 1 SATA RI
Load
• 250M records
• 1KB value size
• Raw data = 250GB
• Max write = 100K TPS
System Efficiency
• CPU = 5%
• Mem = 15GB
• Disk = 268GB
Gauges
• Used = 15GB, 5%
• Used = 268GB, 5%
• Util 100%, Used = 384GB
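The memory and disk figures on this slide can be sanity-checked with simple arithmetic. The sketch below assumes a 64-byte primary index entry per record, a figure taken from Aerospike's capacity-planning documentation rather than from the slide. It also assumes that "250M" means 250 million records and that a 1KB value is 1000 bytes.

```python
RECORDS = 250_000_000      # "Load 250M", assumed to mean records
VALUE_BYTES = 1_000        # "1KB Value Size", taken as 1000 bytes
INDEX_ENTRY_BYTES = 64     # assumption: Aerospike primary index entry size
DISK_USED_GB = 268         # "Disk = 268GB" from the slide

# Raw payload size in decimal gigabytes.
raw_data_gb = RECORDS * VALUE_BYTES / 1e9
# Memory needed for the in-memory index, in binary gigabytes (GiB).
index_gib = RECORDS * INDEX_ENTRY_BYTES / 2**30
# Extra disk space used on top of the raw payload.
disk_overhead = (DISK_USED_GB - raw_data_gb) / raw_data_gb

print(f"raw data:      {raw_data_gb:.0f} GB")
print(f"index memory:  {index_gib:.1f} GiB")
print(f"disk overhead: {disk_overhead:.1%}")
```

Under these assumptions the index needs roughly 15 GiB, in line with the "Mem = 15GB" figure, and the on-disk footprint is about 7% above the raw data size.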
-
Before: In-Memory NoSQL (50TB)
• 18 racks, # of servers = 1024
• Total cost = $12.5M
• 99.5 ATB
• Inconsistent performance
• Unstable
• Average throughput: 200K TPS
• Low latency (~1ms)
After: Aerospike (Hybrid Memory NoSQL) (50TB)
• 3 racks, # of servers = 120
• Total cost = $3.5M
• 99.99+ ATB
• Consistent performance
• Stable
• Average throughput: 1M TPS
• Ultra low latency (~200us)
Improvements
• Performance cost: 3x
• Space/Power: 8x
• Availability: 10x
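The before and after figures on this slide can be turned into ratios with a few lines of arithmetic. This is a sketch under one assumption: the 200K TPS and ~1ms bullets are read as the "Before" (in-memory) system and the 1M TPS and ~200us bullets as the "After" (Aerospike) system. The slide's own 3x, 8x, and 10x labels are rounded and may have been derived differently.

```python
# Figures copied from the slide; the before/after grouping is an assumption.
before = {"servers": 1024, "racks": 18, "cost_musd": 12.5, "tps": 200_000, "latency_us": 1000}
after = {"servers": 120, "racks": 3, "cost_musd": 3.5, "tps": 1_000_000, "latency_us": 200}


def improvement(old, new, higher_is_better=False):
    """Return how many times better `new` is than `old`."""
    return new / old if higher_is_better else old / new


ratios = {
    "cost": improvement(before["cost_musd"], after["cost_musd"]),
    "servers": improvement(before["servers"], after["servers"]),
    "racks": improvement(before["racks"], after["racks"]),
    "throughput": improvement(before["tps"], after["tps"], higher_is_better=True),
    "latency": improvement(before["latency_us"], after["latency_us"]),
}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}x")
```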
-
Fraud Detection systems with Aerospike
[Chart: Data Growth (TB) per year; In-Memory NoSQL, then Hybrid-Memory with Aerospike]
2011: 0.5
2012: 3
2013: 12
2014: 18
2015: 30
2016: 90
2017: 210
2018: 450
-
AEROSPIKE USER SUMMIT | Proprietary & Confidential | All rights reserved. © 2018 Aerospike Inc
Concorde + Airbus 380
MISSION POSSIBLE
-
Operations@Scale
-
Cluster Size
Data Unavailable = m(m - 1) / (n(n - 1))
m = Total nodes unavailable.
n = Total nodes in the cluster.
• MAX = 20 Nodes
• RF=2
• Mode=AP
• 256GB RAM, 6.4TB SSD/Node
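The cluster-size formula above can be evaluated directly. The sketch below is a minimal illustration; the function name is hypothetical. One way to read the formula is as C(m, 2) / C(n, 2): with RF=2, the share of node pairs in which both nodes are down, assuming replica pairs are spread uniformly across the cluster.

```python
def data_unavailable_fraction(m, n):
    """Fraction of data unavailable when m of n nodes are down,
    using the slide's formula m(m - 1) / (n(n - 1))."""
    if n < 2:
        raise ValueError("the cluster needs at least 2 nodes")
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and n")
    return (m * (m - 1)) / (n * (n - 1))


# 20 nodes is the maximum cluster size stated on the slide.
for down in (1, 2, 3):
    fraction = data_unavailable_fraction(down, 20)
    print(f"{down} of 20 nodes down: {fraction:.2%} of data unavailable")
```

With a single node down the formula gives zero, since the second copy is still reachable. The fraction grows roughly quadratically with the number of failed nodes and shrinks as the cluster gets larger.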
-
Deployment Topologies (Active/Active or Active/Standby)
[Diagrams: two topologies, each spanning DC1, DC2, and DC3]
-
Monitoring at Scale
-
Database Lifecycle Automation
• 1000+ servers
• 48 clusters
• 3 datacenters
Ansible Automation API(s) – Programmatic or human interface
- prepare_new_node
- wipe_out_server
- create_cluster
- reconfigure_database
- add_node
- change_password
- reset_cluster_name
- apply_os_patch_rolling
- backup
- restore
- prepare_tools_node
- remove_node
- validate_cluster
- switch_paxos_protocol
- turn_off_clear_port
- apply_os_patch_single_node
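A programmatic interface over such operations could be as simple as a function that maps an operation name to an `ansible-playbook` command line. The sketch below is hypothetical and is not PayPal's implementation: the one-playbook-per-operation file naming, the inventory path, and the variable names are assumptions made for illustration. Only the operation names come from the slide.

```python
import json

# Operation names copied from the slide.
OPERATIONS = {
    "prepare_new_node", "wipe_out_server", "create_cluster", "reconfigure_database",
    "add_node", "change_password", "reset_cluster_name", "apply_os_patch_rolling",
    "backup", "restore", "prepare_tools_node", "remove_node", "validate_cluster",
    "switch_paxos_protocol", "turn_off_clear_port", "apply_os_patch_single_node",
}


def build_command(operation, inventory, extra_vars=None):
    """Build an ansible-playbook argument list for one lifecycle operation.

    Assumes a playbook named <operation>.yml exists (hypothetical layout).
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    command = ["ansible-playbook", "-i", inventory, f"{operation}.yml"]
    if extra_vars:
        # ansible-playbook accepts extra variables as a JSON string.
        command += ["--extra-vars", json.dumps(extra_vars)]
    return command


# Hypothetical inventory file and variable name.
print(build_command("add_node", "inventory.ini", {"target_host": "db-node-21"}))
```

Returning the argument list instead of running it keeps the sketch free of side effects; a caller could hand the list to `subprocess.run` or log it for review first.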
-
Database Reporting
Capacity Report
XDC/Latency Report
Inventory Report
-
Security
-
Cloud
• 1-click cluster provisioning
• 3-5 node cluster
• Dev/QA
Next
• NVMf
• DB on containers
• Strong consistency evaluation, possibly larger clusters
-
Thank You
-
Q & A