apexmeetup geode - talk1 2016-03-17
TRANSCRIPT
Apache Geode (incubating), and Pivotal's leadership role in open sourcing (GemFire)
Nitin Lamba
Agenda
• Pivotal’s Open Source strategy
• What is Apache Geode?
• History
• Differentiators
• Basic Concepts
• Resources
• Q & A
In 2015, Pivotal contributed the components of its Big Data Suite to open source
• 6 million lines of code
• 4 new open source communities
From GEMFIRE to GEODE…
[Timeline graphic: May 2015, Sept 2015, Sept 2015, Oct 2015]
What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture
Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting-edge use cases
[Timeline: <2000, 2004, 2008, 2012, 2016]
Early drivers
• Data volumes
• Margins/transactions
• IT maintenance costs
• Elasticity needs
Real-time needs
• Real-time response
• Time-to-market needs
• Flexible data models
• Persistent + in-memory
Global data
• Visibility across DC
• Fast ingest
• Device to enterprise
• Uptime (always on)
Open Source!
• Apache incubation
• GemFire > Geode
• Geode M1 release
• 1st Geode Summit
Customers and use cases: Financial Services, US DoD, Trade Clearing, Travel Portal, Online Gambling, Telcos, Manufacturing, Auto, Insurance, Payroll processing, Rail systems
…with both SCALE and SPEED, …
• 40K transactions per second
• 3 TB data in-memory
• 17B records in-memory
• 120K concurrent users
… and impacting a LOT of people!
• China Railway Corporation
• Indian Railways
• 17%, 19%, 36% of the world population
High-level Architecture
A Peer-2-Peer in-memory Distributed System
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies
[Diagram: a Locator plus four Servers, with REST access]
* Experimental and awaiting community feedback
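To make the sync and async persistence options on this slide concrete, here is a minimal Python sketch of read-through, write-through and write-behind. It is not Geode code: `BackingStore`, `Cache` and `flush` are hypothetical names invented for illustration, and a plain dict stands in for the filesystem or RDBMS.

```python
from collections import deque


class BackingStore:
    """Hypothetical stand-in for a filesystem or RDBMS behind the cache."""

    def __init__(self):
        self.rows = {}

    def load(self, key):
        return self.rows.get(key)

    def save(self, key, value):
        self.rows[key] = value


class Cache:
    """Toy cache: read-through on a miss, write-through or write-behind on put."""

    def __init__(self, store, write_behind=False):
        self.store = store
        self.write_behind = write_behind
        self.entries = {}
        self.pending = deque()  # writes queued for the store (write-behind only)

    def get(self, key):
        if key not in self.entries:
            value = self.store.load(key)  # read-through: a miss goes to the store
            if value is None:
                return None
            self.entries[key] = value
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        if self.write_behind:
            self.pending.append((key, value))  # async: the store is updated later
        else:
            self.store.save(key, value)        # sync: the store is updated now

    def flush(self):
        """Drain queued writes to the store in order; return how many were written."""
        written = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.store.save(key, value)
            written += 1
        return written


store = BackingStore()
store.save("a", 1)

sync_cache = Cache(store)
value_a = sync_cache.get("a")   # loaded from the store on first access
sync_cache.put("b", 2)          # visible in the store immediately

async_cache = Cache(store, write_behind=True)
async_cache.put("c", 3)         # only queued so far
queued_before_flush = "c" not in store.rows
flushed = async_cache.flush()
```

The trade-off the sketch tries to show: write-through keeps the store current at the cost of write latency, while write-behind returns quickly and leaves a window in which the store lags the cache.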
What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
Let’s talk about a few BASIC CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI
• A collection of Regions
What is a REGION?
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant across cache Members
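As a rough illustration of the "observable map" idea, the Python sketch below shows a key/value map that notifies listeners on every change. The `Region` class here is a hypothetical toy, not Geode's Java `Region` interface; the event names and the callback signature are assumptions made for the example.

```python
class Region:
    """Toy key/value map that notifies registered listeners about changes."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self._listeners = []

    def add_listener(self, callback):
        # callback(event, key, old_value, new_value); this signature is hypothetical
        self._listeners.append(callback)

    def _notify(self, event, key, old_value, new_value):
        for callback in self._listeners:
            callback(event, key, old_value, new_value)

    def put(self, key, value):
        event = "update" if key in self._data else "create"
        old_value = self._data.get(key)
        self._data[key] = value
        self._notify(event, key, old_value, value)

    def get(self, key):
        return self._data.get(key)

    def remove(self, key):
        if key in self._data:
            old_value = self._data.pop(key)
            self._notify("destroy", key, old_value, None)


events = []
region = Region("orders")
region.add_listener(lambda *event: events.append(event))

region.put("k1", "v1")
region.put("k1", "v2")
region.remove("k1")
```

After these three calls, `events` holds one tuple per change, in order, which is the essence of reacting to data changes rather than polling for them.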
Region: Types & Options
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region shortcuts:
LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW
PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT, PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
Persistent Regions
• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction
[Diagram: Server 1 … Server N]
What is a MEMBER?
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Diagram: Client, Locator, Server]
What is a CLIENT CACHE?
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on the servers
Persistence - Shared Nothing
[Diagram, built up over several slides: buckets B1, B2 and B3 stored as primary and secondary copies across Server 1, Server 2 and Server 3]
• Server 1 waits for others when it starts
• Fetches missed operations on restart
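The restart behaviour described here can be sketched in a few lines of Python. This is a deliberately simplified toy and not Geode's actual recovery protocol: `Member`, `replicate` and `restart` are hypothetical names, each member keeps its own log (shared nothing), and the sketch assumes that a stale member's log is always a prefix of an up-to-date peer's log.

```python
class Member:
    """Toy server that persists every operation to its own private log."""

    def __init__(self, name):
        self.name = name
        self.disk_log = []   # this member's own disk; nothing is shared
        self.online = True

    def apply(self, operation):
        self.disk_log.append(operation)

    def state(self):
        result = {}
        for key, value in self.disk_log:
            result[key] = value
        return result


def replicate(members, key, value):
    """Apply a write to every online copy; offline members miss it."""
    for member in members:
        if member.online:
            member.apply((key, value))


def restart(member, peers):
    """Bring a member back online by fetching the operations it missed."""
    online_peers = [peer for peer in peers if peer.online]
    if not online_peers:
        # Simplification: the restarting member has to wait for a peer to start.
        raise RuntimeError("no online peer to catch up from")
    donor = max(online_peers, key=lambda peer: len(peer.disk_log))
    missed = donor.disk_log[len(member.disk_log):]
    for operation in missed:
        member.apply(operation)
    member.online = True
    return missed


server1, server2, server3 = Member("Server 1"), Member("Server 2"), Member("Server 3")
cluster = [server1, server2, server3]

replicate(cluster, "k1", "v1")
server1.online = False             # Server 1 goes down
replicate(cluster, "k2", "v2")     # writes continue on the other servers
replicate(cluster, "k1", "v3")
missed = restart(server1, [server2, server3])
```

In this run Server 1 comes back holding only `k1->v1`, fetches the two operations it missed, and ends up with the same state as its peers.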
Persistence - Operational Logs
[Diagram: Member 1 receives "Put k6->v6" and appends it to the operation log]
• Operation log files: Oplog1.crf, Oplog2.crf
• Logged operations: Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6
Persistence - Operational Logs: Compaction
[Diagram: Member 1 receives "Put k6->v6" and appends it to the operation log]
• Operation log files: Oplog1.crf, Oplog2.crf
• Logged operations: Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6
• Compaction: copy live data forward
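A small Python sketch can show what "append to operation log" and "copy live data forward" mean in practice. It replays the operations from the slide, but everything else is an assumption for illustration: `OplogStore`, the three-records-per-log rollover and the in-memory lists are hypothetical and do not reflect Geode's on-disk `.crf` format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    seq: int     # global order of the write
    op: str      # "create" or "modify"
    key: str
    value: str


class OplogStore:
    """Toy append-only store: writes go to the newest log, old logs get compacted."""

    def __init__(self, max_records_per_log=3):
        self.max_records_per_log = max_records_per_log  # hypothetical rollover size
        self.logs = [[]]       # oldest log first, active log last
        self.live_seq = {}     # key -> seq of the record holding its current value
        self.next_seq = 0

    def _append(self, op, key, value):
        if len(self.logs[-1]) >= self.max_records_per_log:
            self.logs.append([])               # roll over to a new log
        record = Record(self.next_seq, op, key, value)
        self.next_seq += 1
        self.logs[-1].append(record)
        self.live_seq[key] = record.seq

    def put(self, key, value):
        op = "modify" if key in self.live_seq else "create"
        self._append(op, key, value)

    def compact_oldest(self):
        """Copy still-live records from the oldest log forward, then drop that log."""
        if len(self.logs) < 2:
            return                             # never compact the active log
        oldest = self.logs.pop(0)
        for record in oldest:
            if self.live_seq[record.key] == record.seq:
                self._append(record.op, record.key, record.value)

    def recover(self):
        """Rebuild the key/value state by replaying every log in order."""
        state = {}
        for log in self.logs:
            for record in log:
                state[record.key] = record.value
        return state


store = OplogStore()
for key, value in [("k1", "v1"), ("k2", "v2"), ("k1", "v3"),
                   ("k4", "v4"), ("k1", "v5"), ("k6", "v6")]:
    store.put(key, value)

before = store.recover()
records_before = sum(len(log) for log in store.logs)
store.compact_oldest()
after = store.recover()
records_after = sum(len(log) for log in store.logs)
```

In this run the first log holds two superseded values of `k1` and one live record for `k2`, so compaction copies only `k2->v2` forward and the recovered state stays the same while the record count drops from six to four.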
Functions
• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
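The Map/Reduce flavour of function execution can be sketched as follows: run the same function next to each member's local data, then combine the partial results. This is a Python toy under stated assumptions, not Geode's function execution API; `execute_on_members` and the sample data are hypothetical, and threads stand in for separate server processes.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_on_members(members, function, reduce_results):
    """Run `function` against each member's local data, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        partials = list(pool.map(function, members.values()))
    return reduce_results(partials)


# Hypothetical partitioned data: each server holds only its own slice.
members = {
    "server1": {"k1": 10, "k2": 5},
    "server2": {"k3": 7},
    "server3": {"k4": 20},
}

total = execute_on_members(
    members,
    lambda local_data: sum(local_data.values()),  # "map" step, runs per member
    sum,                                          # "reduce" step, runs on the caller
)
```

The point of the design is that only the small partial results travel back to the caller; the data itself stays where it is stored.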
Join the Community!
• Check out: http://geode.incubator.apache.org
• Subscribe: [email protected]
• Download: http://geode.incubator.apache.org/releases/
Thank you!
Additional Slides
Built for PERFORMANCE…
[Chart: YCSB workloads, Cassandra vs. Geode, operations per second (0 to 1,000,000) for A Reads, A Updates, B Reads, B Updates, C Reads, D Inserts, D Reads, F Reads and F Updates]
…and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % plotted against the number of server hosts (2 to 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
High Availability