Virtualizing Latency Sensitive Workloads and vFabric GemFire
Post on 01-Nov-2014
© 2011 VMware Inc. All rights reserved
VMware vFabric™ GemFire®
Virtualizing Latency Sensitive Workloads and vFabric GemFire – PEX 2012
Emad Benjamin – Staff Architect
2
Agenda
• The Data Challenge and Latency Sensitive Workloads
• VMware vFabric Cloud Application Platform
• High Performance Data with vFabric GemFire
• Primary GemFire Topologies and Usage
• Design and Sizing
• Best Practices
• Customer Case Study
• Next Steps
3
The Data Challenge and Latency Sensitive Workloads
4
Data Challenges in Modern Application Architectures
Explosive data growth
• 60% year over year
Bridging data supply with data demand
• Indeterminate user load, 24x7 access, new device types driving increased application use
Business challenges
• How to outpace competitors by delivering superior service and experience
IT challenges
• Scalability
• Performance
• Data reliability
• Geographic distribution
5
Latency Sensitive
Do 10ms to 100ms matter?
• Then this is a latency sensitive application
• High chatter between VMs – many small data packets – many updates
6
VMware vFabric Cloud Application Platform
7
VMware Cloud Application Platform
(Platform diagram, top to bottom:)
• Programming Model: Rich Web, Social and Mobile, Data Access, Integration Patterns, Batch Framework
• Tools and deployment: WaveMaker, Spring Tool Suite, Cloud Foundry
• Management: App Monitoring (Spring Insight), Performance Mgmt (Hyperic), Automated App Provisioning (AppDirector)
• Runtime services: Java Optimizations (EM4J, …), Java Runtime (tc Server), Web Runtime (ERS), Messaging (RabbitMQ), Global Data (GemFire), In-mem SQL (SQLFire)
• Foundation: Virtual Datacenter Cloud Infrastructure and Management
8
High Performance Data with vFabric GemFire
9
Your Apps Are Cloud-Friendly… but What About Your Data?
“The big glaring hole [with cloud] is data handling.”
– Adrian Kunzle, MD, Head of Engineering & Architecture, JPMorgan Chase
File Systems Databases Other Systems
10
What’s the Problem?
How do you scale the data tier?
11
What is VMware vFabric GemFire?
Data moves to the middle tier
• Closer to where it is needed
Scalability
• Easily accommodate more application users
High performance
• Dramatic application performance gains – execute from memory
Data reliability
• Data written through or behind to disk
Geographic distribution
• WAN connectivity
12
vFabric GemFire in a Nutshell
File Systems, Databases, Other Data Systems
Conventional Data Storage Systems
vFabric GemFire Data Fabric
High Throughput, Low Latency, High Scalability, Continuous Availability
Reliable Event Notification, Continuous Querying, Parallel Execution
WAN Distribution, Data Durability
Enterprise Data Consuming Applications
13
Enabling Extreme Data Scalability and Elasticity
Application data lives in the fabric; it sleeps in file systems, databases, and mainframes/other systems.
Primary Use Cases
• Web session cache, L2 cache – shopping cart state management
• App data cache, in-memory DB – high performance OLTP
• Grid data fabric: client compute – shared data grid accessed by many clients executing app logic
• Grid data fabric: fabric compute – shared data grid where app logic is executed within the data fabric itself
14
GemFire Features
• Rich objects (Java, C++, .NET)
• Replicated master data
• Partitioned active data
• Co-located active data
• Ultra-fast co-located transactions
• Distributed transactions*
• Server-side event listeners
• Ultra-low latency RAM durability
• Client-side durable subscriptions
• Parallel MapReduce function execution
• Redundancy for instant FT (redundant copies)
• Continuous queries
• Parallel OQL queries
• LRU overflow to disk in native format for fast retrieval
• Parallel, shared nothing persistence to disk with online backup
• Synchronous or asynchronous write-through, read-through
• Uni- or bi-directional cluster synchronization over WAN
• Java O-R Mapper integration
• Elastic growth without pausing
* Available in v7.0
(Diagram: regions such as Customers, Orders, Products, and Promotions hold rich objects – e.g. a Customer with Address, Street, City, and Preferences – with OQL queries, updates, and requests flowing through the regions and their redundant copies.)
15
Primary GemFire Topologiesand Usage
16
Primary GemFire Topologies
Peer-to-peer
• Intercommunicating set of vFabric GemFire servers that do not have clients accessing them
• For example, back office or backend type of processing
(Diagram: Peer GemFire Server 1 and Peer GemFire Server 2 forming a Distributed System.)
17
Primary GemFire Topologies
Client/Server is the most common topology used in practice
(Diagram: a Client Tier of Standalone Client Caches 1–4 connecting to a Server Tier of GemFire Servers 1 and 2 in a Distributed System.)
18
Primary GemFire Topologies – Global Multisite
(Diagram: three sites – New York, Tokyo, and London – each a Distributed System of GemFire servers with a primary Gateway and a Standby Gateway; primary and standby gateway paths connect the sites over the WAN.)
19
Primary GemFire Usage – Hibernate Cache
20
Primary GemFire Usage – HTTP Session Management
21
Design and Sizing
22
Design and Sizing – Three Basic Steps
Step 1: Determine the vFabric GemFire server JVM heap size needed to house region data for both RR and PR regions
Step 2: Benchmark vertical scalability to determine the VM size for the GemFire server needed for the building-block VM
Step 3: Benchmark horizontal scalability to determine how many vFabric GemFire servers are needed in a cluster
23
Design and Sizing – Understanding JVM Memory Segments
(Diagram: VM Memory for GemFire = Guest OS Memory + JVM Memory for GemFire. The JVM memory comprises the max heap (-Xmx, with -Xms initial heap), Perm Gen (-XX:MaxPermSize), Java thread stacks (-Xss per thread), and "other mem" – direct and non-direct native memory in the virtual address space.)
24
Design and Sizing – Understanding JVM Memory Segments
Guest OS memory: approximately 0.5–1GB (depends on OS/other processes)
Perm size is an area additional to the -Xmx (max heap) value and is not GC-ed because it contains class-level information
"Other mem" is additional memory required for NIO buffers, JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal info
VM Memory for GemFire = Guest OS Memory + JVM Memory for GemFire
JVM Memory for GemFire = JVM Max Heap (-Xmx value) + JVM Perm Size (-XX:MaxPermSize) + NumberOfConcurrentThreads * (-Xss) + "other mem"
25
Design and Sizing – Step 1: Calculating Region Data
Formula 1
NumberOfGemFireServers = NumberOfVMsInSystem = NumberOfJVMsInSystem = TotalMemoryPerGemFireSystemWithHeadRoom / 32GB
Formula 2
TotalMemoryPerGemFireSystemWithHeadRoom = TotalMemoryPerGemFireSystem * 1.5
Formula 3
TotalMemoryPerGemFireSystem = TotalOfAllMemoryForAllRegions + TotalOfAllMemoryForIndicesInAllRegions + TotalMemoryForSocketsAndThreads
26
Design and Sizing – Step 1: Calculating Region Data (cont.)
Formula 4
ApproxServerMachineRAM = TotalMemoryPerGemFireSystemWithHeadRoom * (DataLossTolerancePercentage / (NumberOfRedundantCopies + 1))
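Formulas 1–4 can be sketched as a small calculation. The input values below are illustrative assumptions, not figures from the deck:

```python
def gemfire_cluster_sizing(region_mem_gb, index_mem_gb, socket_thread_mem_gb,
                           redundant_copies, data_loss_tolerance_pct):
    """Sketch of sizing Formulas 1-4 (32GB per server VM, per Formula 1)."""
    # Formula 3: total memory for the GemFire system
    total_gb = region_mem_gb + index_mem_gb + socket_thread_mem_gb
    # Formula 2: add 50% headroom
    with_headroom_gb = total_gb * 1.5
    # Formula 1: number of GemFire servers = VMs = JVMs
    num_servers = with_headroom_gb / 32
    # Formula 4: approximate RAM per physical server machine
    approx_server_ram_gb = with_headroom_gb * (
        data_loss_tolerance_pct / (redundant_copies + 1))
    return with_headroom_gb, num_servers, approx_server_ram_gb

# Hypothetical system: 100GB of region data, 20GB of indices, 8GB for
# sockets/threads, one redundant copy, 50% data loss tolerance
print(gemfire_cluster_sizing(100, 20, 8, 1, 0.5))  # (192.0, 6.0, 48.0)
```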
27
Design and Sizing – Step 1: Calculating Region Data (cont.)
Worked example (per GemFire server VM):
• -Xmx 29696m, -Xms 29696m (initial heap = max heap)
• -XX:MaxPermSize 256m
• Java stacks: -Xss 192k per thread, ~100 threads
• Other mem = 1484m
• JVM Memory for GemFire = 31455m
• Guest OS memory: 500m used by the OS
• VM Memory for GemFire = 31955m
• Set the VM memory reservation to 31955m, or to the Active memory used by the VM, which could be lower
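The worked example's arithmetic can be reproduced directly from the sizing formula (values taken from the slide):

```python
# Reproduce the slide's worked example for one GemFire server VM (MB).
xmx_mb = 29696                  # -Xmx (and -Xms, since initial = max heap)
perm_mb = 256                   # -XX:MaxPermSize
stacks_mb = 192 * 100 / 1024    # -Xss 192k for ~100 threads
other_mb = 1484                 # NIO buffers, JIT code cache, etc.
guest_os_mb = 500               # memory used by the guest OS

jvm_mem_mb = xmx_mb + perm_mb + stacks_mb + other_mb
vm_mem_mb = jvm_mem_mb + guest_os_mb

print(round(jvm_mem_mb))  # 31455 -- "JVM Memory for GemFire"
print(round(vm_mem_mb))   # 31955 -- memory reservation for the VM
```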
28
What is the practical limit for JVM memory sizing? (not to scale)
• 64-bit Java theoretical limit: 16 exabytes
• Guest OS limit: 1 to 16TB
• ESXi 5 VM limit: 32 vCPU and 1TB RAM
• Physical server limit: ~256GB to <1TB RAM per NUMA node
• The most limiting practical sizing factor is the per-NUMA-node RAM
29
Design and Sizing – NUMA Considerations
NUMA Node Local Memory = Total RAM on Server / Number of NUMA nodes
Example 1
• Assume a 2-socket server with 8 cores (8 pCPU) and a total of 196GB RAM
• This server has 2 NUMA nodes
• Each NUMA node will have 196GB/2 => 98GB RAM
• Hence the largest virtual machine should not exceed 8 vCPU and 98GB RAM
Example 2
• 2 sockets, quad core on each socket (4 pCPU per socket), and a total of 64GB
• Each NUMA node would get 64/2 => 32GB
• Hence the largest GemFire virtual machine should be sized at 4 vCPU and 32GB RAM
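The two examples follow directly from the NUMA local-memory formula:

```python
def numa_node_local_mem_gb(total_ram_gb, numa_nodes):
    # NUMA Node Local Memory = Total RAM on server / number of NUMA nodes
    return total_ram_gb / numa_nodes

print(numa_node_local_mem_gb(196, 2))  # 98.0 -> Example 1: cap VMs at 98GB
print(numa_node_local_mem_gb(64, 2))   # 32.0 -> Example 2: cap VMs at 32GB
```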
(Diagrams: a 2-socket server with 128GB RAM – each NUMA node has 128/2 = 64GB. Panel 1: 2 vCPU VMs, each with less than 32GB RAM, placed by the ESX scheduler within NUMA boundaries. Panel 2: a 4 vCPU VM split by ESX into 2 NUMA clients on ESX 4.1. Panel 3: 4 vCPU VMs, each with less than 32GB RAM.)
33
Step 2 and Step 3: Establish Benchmark Vertical and Horizontal Scalability

ESTABLISH vFabric GemFire BUILDING BLOCK VM (vertical scalability test)
• Size within NUMA boundaries of the ESX host
• Establish the JVM heap size
• Size the building-block VM that houses the vFabric GemFire server

DETERMINE HOW MANY VMs (horizontal scalability / scale-out test)
• How many VMs do you need to meet your response time SLAs without reaching 70%–80% CPU saturation?
• Establish your horizontal scalability factor before bottlenecks appear in your application
• SLA OK? Test complete. If not, investigate the bottlenecked layer: network, storage, application configuration, and vSphere
• If horizontal scaling is bottlenecked, mitigate and iterate the scale-out test
• If it is a building-block app/VM configuration problem, adjust and iterate
34
Design and Sizing – Step 1: Calculating Region Data (cont.)
Formula 6 – for Global Multisite Topology
Maximum Throughput (bits/second) = TCP Window Size in Bits / Round Trip Latency in Seconds
Use WAN accelerators
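Formula 6 is the standard bandwidth-delay relation. A quick sketch, with an assumed 64KB TCP window and 100ms round trip (not figures from the deck), shows why WAN accelerators matter:

```python
def max_wan_throughput_bps(tcp_window_bytes, rtt_seconds):
    # Formula 6: maximum throughput = TCP window size in bits / RTT in seconds
    return tcp_window_bytes * 8 / rtt_seconds

# Assumed example: a 64KB TCP window over a 100ms round trip
print(max_wan_throughput_bps(64 * 1024, 0.100) / 1e6)  # ~5.24 Mbit/s
```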
35
Best Practices
36
vFabric GemFire on VMware – Best Practices
Best Practices paper here:
• http://www.vmware.com/resources/techresources/10231
vFabric GemFire on VMware
• Set an appropriate memory reservation
• Leave HT enabled; size based on vCPU = 1.25 pCPU if needed
• RHEL 6 and SLES 11 SP1 have a tickless kernel that does not rely on a high-frequency interrupt-based timer, and is therefore much friendlier to virtualized latency-sensitive workloads
• Do not overcommit memory
37
vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware
• Put vSphere Distributed Resource Scheduler (DRS) in manual mode
• Locator processes should not be migrated with VMware vSphere® vMotion®; doing so could otherwise lead to network split-brain problems
• Use vMotion over 10Gbps when doing scheduled maintenance
• Disable VMware HA
• Use affinity and anti-affinity rules to avoid placing redundant copies on the same VMware ESX®/ESXi host
38
vFabric GemFire on VMware – Best Practices
(Diagram: on a 2-socket NUMA server, the GemFire VM runs within its NUMA boundary, and the many enterprise app VMs consuming data from GemFire also run within NUMA boundaries.)
39
vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware
• Disable NIC interrupt coalescing on the physical and virtual NIC
• Extremely helpful in reducing latency for latency-sensitive virtual machines
• Disable virtual interrupt coalescing for VMXNET3
• This can lead to performance penalties for other virtual machines on the ESXi host, as well as higher CPU utilization to deal with the higher rate of interrupts from the physical NIC
• This implies it is best to use a dedicated ESX cluster for vFabric GemFire workloads
• All hosts are configured the same way for latency sensitivity, which ensures non-GemFire workloads are not negatively impacted
40
vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware – JVM tuning
• Size with 50% headroom
• Use -XX:+UseCompressedOops
• Use JDK 1.6.0_24 or later
• Set -Xms = -Xmx
• Use the -XX:+UseConcMarkSweepGC low-pause collector with the parallel young-generation collector
41
vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware – JVM tuning
• -XX:+DisableExplicitGC
• -XX:CMSInitiatingOccupancyFraction=<50-75>
• Set -Xmn to roughly 33% of -Xmx, and ideally no more than 2GB
42
vFabric GemFire on VMware – Best Practices
vFabric GemFire on VMware – General
• All peer-to-peer members of the distributed system must have the same version of vFabric GemFire
• Clients can be up to one major release behind. For example, any 6.x client interoperates with any 6.x or 7.x server, but not with an 8.x server
• Set cache-server max-connections and max-threads
• Use the GFMon and VSD tools for monitoring
• When troubleshooting performance problems, check that you are not impacted by SYN cookies
• SYN cookies are the key element of a technique used to guard against SYN flood attacks. Daniel J. Bernstein, the technique's primary inventor, defines SYN cookies as "particular choices of initial TCP sequence numbers by TCP servers"
43
Customer Case Study
44
Airline Industry
Client/Server topology
Re-architecture of their main Web store
• To speed up the search and checkout/book process
In 2010
• 80+ million passengers carried
• 12B in revenue
(Diagram: clients connecting to a tier of Next Gen Session Servers.)
Sizing per data center:
• Number of servers per data center: 4
• Number of JVMs per server: 1
• Heap size per JVM (-Xms34G and -Xmx34G): 34GB
• Available heap memory per JVM: 34GB
• Available RAM per JVM (includes 50% ratio for churn): 17GB
• Total RAM needed per data center: 136GB
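The case-study numbers are internally consistent; a quick cross-check using the figures from the table:

```python
# Cross-check the sizing table from the airline case study.
servers_per_dc = 4
jvms_per_server = 1
heap_per_jvm_gb = 34      # -Xms34G and -Xmx34G
churn_ratio = 0.5         # 50% headroom reserved for churn

usable_ram_per_jvm_gb = heap_per_jvm_gb * churn_ratio
total_ram_per_dc_gb = servers_per_dc * jvms_per_server * heap_per_jvm_gb

print(usable_ram_per_jvm_gb)  # 17.0
print(total_ram_per_dc_gb)    # 136
```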
45
Getting Started – vmware.com/go/gemfire
46
Thank you! Any Questions?
You can buy my book here: https://www.createspace.com/3632131
47
Backup Slides
48
Consistency Model
• Synchronous consistency within the fabric
• Eventual consistency with the archival database (archival, OLAP, and regulatory RDBMS, reached through a database node and storage device)
• Eventual consistency with other fabric instances
(Diagram: two rows of Data Fabric Nodes in front of the archival RDBMS.)
49
Memory-Based Performance
High Performance
vFabric GemFire uses memory on peer machines to make data updates durable, allowing the updating thread to return 10x to 100x faster than with updates written through to disk, without risking any data loss. Typical latencies are a few hundred microseconds instead of tens to hundreds of milliseconds.
vFabric GemFire can optionally write updates to disk, or to a data warehouse, asynchronously and reliably
50
Cloud Ready
Add or remove data servers dynamically
Elastic
Fabric is elastic so it can grow or shrink dynamically with no interruption of service or data loss
51
Distributed Events
• Targeted, guaranteed delivery, event notification, and continuous queries
Active
52
Partitioning and Co-Location Example
Replicated regions model many-to-many relationships (e.g., Counterparty Descriptions, Settlement Instructions, and Netting Agreements replicated across Data Fabric Nodes)
• Many-to-many, many-to-one, and one-to-many relationships can be modeled
• Co-location of related data eliminates distributed transactions
• All entities within the transaction are located on a single machine
• Targeted procedures have all the data entities they need locally
53
Partitioning and Co-Location Example
Partitioned regions model one-to-many and many-to-one relationships (e.g., Position Data, Trade Data, Market Data, Instrument Data, and Rating Information partitioned across Data Fabric Nodes)
• Many-to-many, many-to-one, and one-to-many relationships can be modeled
• Co-location of related data eliminates distributed transactions
• All entities within the transaction are located on a single machine
• Targeted procedures have all the data entities they need locally
54
Parallel Queries
Batch Controller or Client
Scatter-Gather (Map-Reduce) Queries and Functions
Parallel
55
Fault Tolerant, Data-Aware Function Routing
Targeted
vFabric GemFire provides “data aware function routing”—moving the behavior to the correct data instead of moving the data to the behavior
Batch Controller or Client
Data Aware Function
56
Multisite Capability
Data replication for disaster recovery is done with the fault-tolerant, bi-directional shared-nothing, store-and-forward gateways
Active Everywhere
57
Data Distribution
Distribute
vFabric GemFire can keep clusters that are distributed around the world “eventually consistent” in near real-time and can operate reliably in disconnected, intermittent, and low-bandwidth network environments
58
Design and Sizing – Step 1: Calculating Region Data (cont.)
Formula 5
TotalMemoryForSocketsAndThreads = TotalMemoryForSockets + TotalMemoryForThreadOverhead
TotalMemoryForThreadOverhead = MaxClientThreads * ThreadStackSize
TotalMemoryForSockets = TotalNumberOfSockets * SocketBufferSizeBytes
TotalNumberOfSockets = NumberOfServers * NumberOfThreadsOnServer + AppThreads + MaxClientThreads + MaxClientThreads * 2 * NumberOfServers * IfHostPartitionedRegionAndConserveSocketsIsFalse
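Formula 5 can be sketched as below; every example input is an assumption for illustration, not a figure from the deck:

```python
def sockets_and_threads_mem_bytes(num_servers, threads_per_server, app_threads,
                                  max_client_threads, thread_stack_bytes,
                                  socket_buffer_bytes,
                                  partitioned_and_conserve_sockets_false):
    """Sketch of Formula 5: memory for sockets and thread overhead."""
    thread_overhead = max_client_threads * thread_stack_bytes
    total_sockets = (num_servers * threads_per_server
                     + app_threads
                     + max_client_threads
                     + max_client_threads * 2 * num_servers
                       * (1 if partitioned_and_conserve_sockets_false else 0))
    return total_sockets * socket_buffer_bytes + thread_overhead

# Hypothetical: 2 servers x 10 threads, 5 app threads, 100 client threads,
# 192KB stacks, 32KB socket buffers, partitioned region, conserve-sockets=false
print(sockets_and_threads_mem_bytes(2, 10, 5, 100, 192 * 1024, 32 * 1024, True))
```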
(Diagram: on a 2-socket server with 128GB RAM – each NUMA node has 128/2 = 64GB – a 12 vCPU VM can be scheduled across NUMA nodes through 12 vCPU vSocket/vNUMA in ESX 5.)
60
Primary GemFire Usage – Hibernate Cache
Hibernate configuration (hibernate.cfg.xml)
• Enable the second-level cache:
<property name="hibernate.cache.use_second_level_cache">true</property>
• Set region.factory_class to GemFireRegionFactory (hibernate.cfg.xml, version 3.3+):
<property name="hibernate.cache.region.factory_class">
com.gemstone.gemfire.modules.hibernate.GemFireRegionFactory
</property>
61
Enabling Extreme Data Scalability and Elasticity
Application Data Lives Here
File Systems Databases Mainframes/other
Application Data Sleeps Here
Key Capabilities
Low-latency, linearly-scalable, memory-based data fabric
• Data distribution, replication, partitioning, and co-location
• Pools memory and disk across many nodes
Data-aware execution
• Move functionality to the data for peak performance
Active/continuous querying and event notification
• Changes are propagated to one or more "active" copies
62
GemFire in Mission Critical Wall Street Applications
Reference data (top 3 US-based bank)
• Large amounts of in-memory data, mostly static but with some intraday updates
• 5x–10x performance increase
• Global distribution – consistent global views
• Domain-specific and regional edge caches
Market data (top 3 Japan-based financial firm)
• Ultra-low latency for value-added "derived" market data
• Fault-tolerant store-and-forward global data distribution
• Global consistency
Risk processing system (top 3 US-based bank)
• Credit risk, market risk, trader risk
• Over 1TB of credit risk data processing
• Processing moving from batch toward real time
• Consistent snapshot of data across long-running calculation/analysis