NoSQL with MySQL
Yekesa Kosuru, Distinguished Architect, Nokia
Peter Zaitsev, Percona
Agenda
• Part 1: Problem statement
• Part 2: Background (What is, Why, ACID, CAP)
• Part 3: Building a Key-Value Store
• Part 4: NoSQL with MySQL
PART 1: PROBLEM STATEMENT
NoSQL: No Formal Definition
• No formal definition
  – Not Only SQL?
  – Don’t use SQL?
  – Limit expressive power?
• Underlying issues?
  – ACID (strong consistency? isolation?)
  – SQL
  – Something else?
• Facts
  – Hardware will fail
  – Software will have bugs
  – Traffic and data volumes grow, often unpredictably
  – No downtime, and no errors exposed to users
Problem to Solve
• Key-value store problem statement
  – A consumer-facing system with the following characteristics:
    • Hosts large volumes of data and queries
    • Responsive, with predictable performance
    • Always available (survives failures and software/hardware upgrades)
    • Low cost of ownership
    • Minimal administration
• Availability: how to serve customers 24x7 despite failures and upgrades
• Scalability: how to scale (capacity, transactions, costs …)
• Predictability: predictable responsiveness
• Not suited for all applications
PART 2: BACKGROUND
ACID
• Transactions simplify the developer's job
• Client/developer point of view
• Atomicity: bundled operations
  – All of the operations in the transaction (multiple updates, deletes, inserts) complete, or none do
• Consistency: the database is in a consistent state when the transaction begins and when it ends
  – Referential integrity and other constraints
  – All reads must see the latest data
• Isolation: the transaction behaves as if it were the only operation being performed on the database; updates are serialized, i.e., by locks
• Durability: once the transaction completes, its effects will not be reversed
• ACID in a single database is efficient
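To make atomicity concrete, here is a single-process illustration, not how a database implements it: a batch of puts and deletes is applied to a private copy and published in one step, so either every operation becomes visible or none does. AtomicBatchStore and Op are hypothetical names; the store is in memory, so it says nothing about durability.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of atomicity only (hypothetical class, in-memory, not durable).
final class AtomicBatchStore {
    // One operation: a put when value != null, a delete when value == null.
    static final class Op {
        final String key;
        final String value;
        Op(String key, String value) { this.key = key; this.value = value; }
    }

    private Map<String, String> data = new HashMap<>();

    synchronized String get(String key) {
        return data.get(key);
    }

    // Synchronized, so concurrent batches are also serialized (isolated) from each other.
    synchronized void applyAll(List<Op> ops) {
        Map<String, String> copy = new HashMap<>(data);
        for (Op op : ops) {
            if (op.key == null) {
                // Abort: the copy is discarded and the visible data is untouched.
                throw new IllegalArgumentException("null key; batch rejected");
            }
            if (op.value == null) copy.remove(op.key);
            else copy.put(op.key, op.value);
        }
        data = copy; // commit point: all changes become visible together
    }
}
```

A failed batch (here, one containing a null key) leaves no partial changes behind.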
Issues Scaling DBMS
• Databases extend ACID across multiple nodes using two-phase commit (2PC)
• Shared-disk and shared-nothing architectures
• Cost of 2PC (N nodes involved in a transaction)
  – Providing consistency and isolation over 2PC makes it expensive
  – 3N+1 message complexity
  – N+1 stable writes
  – Locks: participants hold locks on resources until the outcome is known
• Availability is a pain point
  – 2PC is a blocking protocol; it uses a single coordinator and is not fault tolerant
  – Coordinator failure can block a transaction indefinitely
  – Blocked transactions increase contention
• Rebalancing
• Upgrades
• Total cost of ownership
[Diagram: 2PC message flow. The client sends Request Commit to the coordinator; the coordinator sends Prepare to each participant; the participants reply Prepared; the coordinator then sends Commit.]
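The cost figures can be reproduced with a toy, single-process simulation. This is a sketch, not a protocol implementation: TwoPhaseCommitSim and Participant are hypothetical names, and the counting convention is an assumption chosen to match the slide (the client's commit request counts as one message, acknowledgments of the final Commit are not counted, and stable writes are each participant's forced "prepared" record plus the coordinator's forced decision).

```java
import java.util.List;

// Toy 2PC simulation that counts messages and forced (stable) log writes.
// With N participants that all vote yes: 1 + 3N messages and N + 1 stable writes.
final class TwoPhaseCommitSim {
    static final class Participant {
        final boolean willVoteYes;
        boolean committed;
        Participant(boolean willVoteYes) { this.willVoteYes = willVoteYes; }
    }

    int messages;
    int stableWrites;

    // Returns true if the transaction commits on all participants.
    boolean run(List<Participant> participants) {
        messages++;                                 // client -> coordinator: request commit
        boolean allYes = true;
        // Phase 1: prepare and vote.
        for (Participant p : participants) {
            messages++;                             // coordinator -> participant: prepare
            if (p.willVoteYes) stableWrites++;      // participant forces "prepared" to its log
            messages++;                             // participant -> coordinator: vote
            allYes &= p.willVoteYes;
        }
        stableWrites++;                             // coordinator forces its commit/abort decision
        // Phase 2: broadcast the decision. Participants that voted yes hold their
        // locks until this message arrives; a coordinator crash here blocks them.
        for (Participant p : participants) {
            messages++;                             // coordinator -> participant: commit or abort
            p.committed = allYes;
        }
        return allYes;
    }
}
```

With N = 4 participants the simulation counts 13 messages and 5 stable writes, i.e. 3N+1 and N+1.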
CAP
• Introduced by Prof. Eric Brewer in 2000
• Consistency, Availability, (Network) Partition tolerance
  – A distributed system can guarantee at most two of the three (later formally proven)
• System point of view
• Consistency: defines rules for the apparent order and visibility of updates
  – Variations on consistency
  – Eventual consistency: replicas are eventually updated; the order of updates may vary, but all reads will eventually see the update
• Availability: all clients can read and write data on some replica, even in the presence of failures
• Partition tolerance: operations complete even if the network is partitioned, i.e., some nodes are disconnected
PART 3: BUILDING KEY-VALUE STORE
Important Considerations
• Functional
  – Simple API (Get, Put, Delete, like a Hashtable)
  – Primary-key lookups only (limits expressive power)
    • No joins, no non-key lookups, no grouping functions, no constraints …
  – Data model: key-value, where the value is JSON
  – Read-your-writes consistency
• SLA
  – Highly available and horizontally scalable
  – Performance (low, predictable latencies)
    • Uniform cost of queries
• Operational
  – Symmetric, rolling upgrades, backups, alerts, monitoring
• Total cost of ownership (TCO)
  – Hardware, software, licenses …
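A minimal sketch of the functional API, in Java because the store's own logic is Java code (Part 4). KeyValueStore and InMemoryKeyValueStore are hypothetical names; this single-node stand-in omits replication and sharding and only shows the narrow Get/Put/Delete surface with JSON values kept as opaque strings.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Hashtable-like API: primary-key operations only.
// Values are JSON documents stored as opaque strings; the store never parses them,
// so joins, non-key lookups and constraints are impossible by construction.
interface KeyValueStore {
    Optional<String> get(String key);
    void put(String key, String jsonValue);
    void delete(String key);
}

// Single-node, in-memory stand-in. A real store would route each call to a shard
// and to N replicas; because this one is a single map, a client trivially reads its own writes.
final class InMemoryKeyValueStore implements KeyValueStore {
    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();

    @Override public Optional<String> get(String key) {
        return Optional.ofNullable(map.get(key));
    }

    @Override public void put(String key, String jsonValue) {
        map.put(key, jsonValue); // a single-key write is the unit of atomicity
    }

    @Override public void delete(String key) {
        map.remove(key);
    }
}
```

Keeping the interface this small is what makes the cost of every query uniform: each call touches exactly one key.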
Making it Available
• Keep N replicas of the data
• Use a quorum (R + W > N)
• Overlapping R and W makes reads strongly consistent
• Handle faults (partition tolerance and self-healing)
  – Real faults
    • Machine down, NIC down …
    • Eventual consistency (of data)
    • Prefer A and P in CAP
    • If a node is down during a write, let the write proceed and repair the node on read later (read repair)
  – Intermittent faults
    • X and Y are clients of Z; X can talk to Z but Y can’t
    • Server does not respond within the timeout, stress scenarios …
    • An inconsistent view of the cluster can cause inconsistencies in the data
    • If two concurrently writing clients have different views of the cluster, let the writes proceed and resolve the inconsistency later
Quorum rule: R + W > N, e.g., 2 + 2 > 3
N = number of replicas
R = number of replicas that must respond to a read
W = number of replicas that must acknowledge a write
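The overlap argument can be checked by brute force for small clusters: with N replicas, every read set of size R shares at least one replica with every write set of size W exactly when R + W > N (by the pigeonhole principle). QuorumCheck is a hypothetical helper written only for this illustration.

```java
// Brute-force check of the quorum rule for small N (replica sets as bitmasks).
final class QuorumCheck {
    static boolean rule(int n, int r, int w) {
        return r + w > n;
    }

    // True if every R-subset of the n replicas intersects every W-subset,
    // i.e. any read quorum contains at least one replica holding the latest write.
    static boolean everyReadSeesEveryWrite(int n, int r, int w) {
        for (int readSet = 0; readSet < (1 << n); readSet++) {
            if (Integer.bitCount(readSet) != r) continue;
            for (int writeSet = 0; writeSet < (1 << n); writeSet++) {
                if (Integer.bitCount(writeSet) != w) continue;
                if ((readSet & writeSet) == 0) return false; // disjoint: a stale read is possible
            }
        }
        return true;
    }
}
```

For N/R/W = 3/2/2 both checks agree that reads overlap writes; for 3/1/1 a read can miss the only replica that took the write, which is the eventually consistent setting.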
Making it Scalable
• Sharding (partitioning data) works well for capacity
• A large number of logical shards distributed over a few physical machines
• How to shard
  – Consistent hashing
    • Hash(key), say to “2”, then move clockwise to find the nearest node, “B”
  – Hash to create a uniform distribution
  – Allows adding machines
    • Without redesigning the hashing
    • Without massive data movement
  – Load balancing
    • If a machine can’t handle its load, move shards to another machine
[Figure: consistent hashing ring; A, B, C = nodes; 1, 2, 3, 4 = keys]
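Consistent hashing can be sketched with a sorted map: nodes sit at hash positions on a ring, and a key belongs to the first node found moving clockwise from hash(key). This is a minimal sketch, not the system's code; ConsistentHashRing is a hypothetical name, and the MD5-derived 64-bit hash and the "virtual node" count (several ring points per machine, in the spirit of many logical shards on few machines) are illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hash ring with virtual nodes.
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // First 8 bytes of MD5 as a long, for a roughly uniform distribution.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("empty ring");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key)); // move clockwise
        return (e != null ? e : ring.firstEntry()).getValue();   // wrap around the ring
    }
}
```

Adding a machine only claims the arcs just before its own points, so the keys that move all move to the new machine; nothing is reshuffled among the existing ones.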
Vector Clocks
• Determine the history of a key-value pair; think of it as data versioning
• Need to interpret the causality of events to identify missed updates and detect inconsistencies
• Timestamps are one solution, but they rely on synchronized clocks and don't capture causality
• Vector clocks are an alternative way of capturing order in a distributed system
• The node (i) coordinating a write generates a clock at the time of the write
• Construct: (nodeid : logical sequence)
  – Node A receives the first insert: (A:1)
  – Node B receives an update: (A:1, B:1)
  – Node A instead receives a different update: (A:2)
• Clock (A:1) is a predecessor of (A:1, B:1)
• Clocks (A:2) and (A:1, B:1) are concurrent
• Reads use vector clocks to detect missed updates and inconsistencies
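A minimal vector clock sketch that reproduces the comparisons above. VectorClock is a hypothetical class: one counter per coordinating node, a missing entry counts as 0, and two clocks are concurrent when each has an entry greater than the other's.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal immutable vector clock, e.g. (A:1, B:1).
final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters;

    VectorClock() { this(new HashMap<>()); }
    private VectorClock(Map<String, Integer> counters) { this.counters = counters; }

    // New clock with nodeId's counter incremented: nodeId coordinated a write.
    VectorClock increment(String nodeId) {
        Map<String, Integer> next = new HashMap<>(counters);
        next.merge(nodeId, 1, Integer::sum);
        return new VectorClock(next);
    }

    // Entry-by-entry comparison over the union of node ids.
    Order compare(VectorClock other) {
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        boolean less = false, greater = false;
        for (String n : nodes) {
            int a = counters.getOrDefault(n, 0);
            int b = other.counters.getOrDefault(n, 0);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT; // neither saw the other's update
        if (less) return Order.BEFORE;                // this clock is a predecessor
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    @Override public String toString() { return counters.toString(); }
}
```

Starting from (A:1), incrementing B gives (A:1, B:1), which compares AFTER (A:1); incrementing A instead gives (A:2), which compares CONCURRENT with (A:1, B:1).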
Eventual Consistency Example (3 replicas, N/R/W = 3/2/2)
[Diagram: a client issues PUT and GET requests through an application server to server nodes A, B, and C. V1 (A:1) is written to the replicas. Node 1 goes down (outage or upgrade) and misses the write of V2 (A:1,B:1). After node 1 comes back up, a read detects the missing update and read repair adds the missed update.]
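The example can be sketched as a quorum read with read repair. This is an illustration under stated assumptions, not the system's code: ReadRepairSketch, Replica, and Versioned are hypothetical names, clocks are plain node-to-counter maps, replicas live in memory, and concurrent versions (which would need application-level resolution) are left untouched rather than overwritten.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Quorum read with read repair over in-memory replicas (hypothetical names).
final class ReadRepairSketch {
    static final class Versioned {
        final String value;
        final Map<String, Integer> clock; // node id -> logical counter
        Versioned(String value, Map<String, Integer> clock) { this.value = value; this.clock = clock; }
    }

    static final class Replica {
        boolean up = true;
        final Map<String, Versioned> data = new HashMap<>();
    }

    // True if clock a has seen everything clock b has (a >= b entry by entry).
    static boolean descends(Map<String, Integer> a, Map<String, Integer> b) {
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            if (a.getOrDefault(e.getKey(), 0) < e.getValue()) return false;
        }
        return true;
    }

    // Reads from the first r live replicas, returns the newest version, and
    // writes it back to contacted replicas holding a strictly older version (or none).
    static Versioned readWithRepair(List<Replica> replicas, String key, int r) {
        List<Replica> contacted = new ArrayList<>();
        for (Replica rep : replicas) {
            if (rep.up && contacted.size() < r) contacted.add(rep);
        }
        if (contacted.size() < r) throw new IllegalStateException("read quorum not met");
        Versioned newest = null;
        for (Replica rep : contacted) {
            Versioned v = rep.data.get(key);
            if (v != null && (newest == null || descends(v.clock, newest.clock))) newest = v;
        }
        if (newest == null) return null;
        for (Replica rep : contacted) {
            Versioned v = rep.data.get(key);
            boolean stale = v == null
                || (descends(newest.clock, v.clock) && !descends(v.clock, newest.clock));
            if (stale) rep.data.put(key, newest); // read repair: push the missed update
        }
        return newest;
    }
}
```

In the diagram's terms: V1 (A:1) is on all three replicas, node 1 misses V2 (A:1,B:1) while it is down, and the first read with R = 2 that contacts node 1 and an up-to-date replica returns V2 and writes it back to node 1.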
PART 4: NOSQL WITH MYSQL
NoSQL with MySQL
• Key-value store implemented on top of MySQL/InnoDB
• MySQL serves as the persistence layer; Java code handles the rest
• Atomicity is limited to one operation (autocommit); consistency and isolation are limited to one machine (recall the issues with 2PC)
• Why MySQL?
  – Easy to install, tune, and restore
  – Fast, reliable, proven
  – Performance:
    • MVCC, row-level locking
    • Buffer pool, insert buffering, direct I/O
  – Data protection and integrity:
    • Automatic crash recovery
    • Backup and recovery (full and incremental)
  – Leverage expertise across ACID and BASE systems
Implementation
• InnoDB engine (innodb_file_per_table)
• Simple schema
  – Auto-increment PK
  – App key, value
  – Metadata
  – Hash of app key (indexed)
• Clustered index and secondary index
• The KV store uses simple queries: no joins or grouping functions
• Each database is a shard (pick the right number of shards)
• Backup and recovery
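One way the simple schema might look, sketched as Java constants plus the key-hash helper. Everything here is an assumption for illustration: the table and column names (kv, id, app_key, app_key_hash, app_value, metadata), the column types, the MD5-based 64-bit hash, and the modulo mapping onto a fixed number of logical shards are hypothetical, not taken from the system.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical schema and key-hash helpers for a KV table on InnoDB.
final class KvSchema {
    // DDL issued once per shard database (names and types are illustrative).
    static final String CREATE_TABLE =
          "CREATE TABLE kv ("
        + " id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,"  // clustered index (PK)
        + " app_key VARBINARY(255) NOT NULL,"
        + " app_key_hash BIGINT NOT NULL,"               // compact secondary index
        + " app_value MEDIUMBLOB,"                       // JSON document
        + " metadata VARBINARY(255),"                    // e.g. a serialized vector clock
        + " PRIMARY KEY (id),"
        + " KEY idx_app_key_hash (app_key_hash)"
        + ") ENGINE=InnoDB";

    // Look up via the indexed hash, then compare app_key to rule out hash collisions.
    static final String SELECT_BY_KEY =
        "SELECT app_value, metadata FROM kv WHERE app_key_hash = ? AND app_key = ?";

    // 64-bit hash of the application key: first 8 bytes of MD5.
    static long appKeyHash(String appKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(appKey.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Each database is one logical shard. The shard count is fixed, so this mapping
    // never changes; rebalancing moves whole shards between machines instead.
    static int shardFor(String appKey, int numLogicalShards) {
        return (int) Math.floorMod(appKeyHash(appKey), (long) numLogicalShards);
    }
}
```

A caller would bind the statement with `PreparedStatement.setLong(1, appKeyHash(key))` and the raw key as the second parameter, against the database chosen by `shardFor`.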
Tuning
• Queries are so simple that tuning is focused on IOPS, memory, TCP, sockets, timeouts, connection pool, JDBC, JVM, queuing, etc.
  – Memory-to-disk ratio
  – Connector/J
    useServerPrepStmts=true
    useLocalSessionState=true
    cachePrepStmts=true
    cacheResultSetMetadata=true
    tcpKeepAlive=true
    tcpTrafficClass=24
    autoReconnect=true
    maxReconnects=...
    allowMultiQueries=true
    prepStmtCacheSize=…
    blobSendChunkSize=…
    largeRowSizeThreshold=…
    locatorFetchBufferSize=…
• CentOS 5.5, XFS file system
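The Connector/J settings are usually passed as URL parameters. A minimal sketch that builds such a URL from the properties above whose values are given; the host, port, and database name are placeholders, ConnectorJUrl is a hypothetical name, and the properties whose values are elided on the slide are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a jdbc:mysql URL carrying the explicitly valued Connector/J properties.
final class ConnectorJUrl {
    static String build(String host, int port, String database) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("useServerPrepStmts", "true");
        props.put("useLocalSessionState", "true");
        props.put("cachePrepStmts", "true");
        props.put("cacheResultSetMetadata", "true");
        props.put("tcpKeepAlive", "true");
        props.put("tcpTrafficClass", "24"); // 0x18 = IPTOS_LOWDELAY (0x10) | IPTOS_THROUGHPUT (0x08)
        props.put("autoReconnect", "true");
        props.put("allowMultiQueries", "true");
        StringJoiner query = new StringJoiner("&", "?", "");
        props.forEach((k, v) -> query.add(k + "=" + v));
        return "jdbc:mysql://" + host + ":" + port + "/" + database + query;
    }
}
```

The resulting string is what would be handed to `DriverManager.getConnection(url, user, password)` or to a connection pool, with the MySQL driver on the classpath.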
Scalability Test (TPS/IOPS) on EC2
[Chart: TPS, 50%/90%/99% latency, and server IOPS vs. client threads]
Characteristics
• m1.xlarge image
• User dataset = 256GB (240M 1K values)
• Machines = 5
• Spindles = 4 per box
• XFS file system
• Buffer pool = 12GB/machine
• Memory:Disk = 1:4
• Workload R/W = 100/0
• Random access pattern
• N/R/W = 1/1/1
• Incremental load to 100 threads
Client Threads  TPS      50% LAT  90% LAT  99% LAT  99.5% LAT  99.9% LAT  Single Server IOPS
1               114.4    8        16       24       28         52         26
2               226.88   8        16       26       30         63         48
4               437.37   9        17       29       34         63         96
5               600.1    8        14       27       32         64         107
10              1061.45  9        18       35       44         77         199
20              1730.69  10       23       52       66         118        307
50              2775.85  12       40       98       123        202        482
100             3207.18  15       78       224      274        407        571
(LAT = latency in ms)
Know Your IOPS
Disk              IOPS (random)
7.2k rpm SATA     90
10k rpm SAS       140
15k rpm SAS       180
Value SSD         400
Intel X25-M       1500
Intel X25-E       5000
DDR Drive x1      300000
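The table supports quick capacity arithmetic. A minimal sketch using the per-device figures above; IopsBudget is a hypothetical helper, and the linear model (random IOPS add up across spindles, as in an idealized RAID0) is an optimistic assumption that ignores controller, RAID, and file-system overheads.

```java
// Back-of-the-envelope random-IOPS budget from the per-device table.
final class IopsBudget {
    enum Device {
        SATA_7_2K(90), SAS_10K(140), SAS_15K(180), VALUE_SSD(400),
        INTEL_X25_M(1500), INTEL_X25_E(5000), DDR_DRIVE_X1(300000);

        final int randomIops;
        Device(int randomIops) { this.randomIops = randomIops; }
    }

    // Raw random-IOPS capacity of one machine, assuming linear scaling.
    static int aggregateIops(int spindles, Device device) {
        return spindles * device.randomIops;
    }

    // Spindles needed to sustain a target random-IOPS rate, rounded up.
    static int spindlesNeeded(int targetIops, Device device) {
        return (targetIops + device.randomIops - 1) / device.randomIops;
    }
}
```

Under this model, 8 × 15k rpm SAS gives 8 × 180 = 1,440 random IOPS per machine, and sustaining the 571 single-server IOPS measured at 100 client threads would take 7 spindles of 7.2k rpm SATA or 4 of 15k rpm SAS.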
Performance Test
[Chart: 50%, 90%, and 99% latency (seconds) vs. TPS]
Characteristics
• User dataset = 900GB
• Machines = 9
• Spindles = 8 (15K)/machine
• XFS file system (RAID0)
• Buffer pool = 16GB/machine
• Memory:Disk = 1:6
• Cache = 16%
• Workload R/W = 100/0
• Recency skew, i.e., recent access = 80/20
• N/R/W = 3/2/2
• Incremental load to 10K TPS
Tips by Peter Zaitsev, Percona
Why XFS
File System  Test    Threads(1)  Threads(16)
EXT3         RNDRD   243         582
EXT3         RNDWR   219         218
EXT3         Total   462         800
XFS          RNDRD   239         552
XFS          RNDWR   205         408
XFS          Total   444         960
MySQL for NoSQL
• Tune for simple queries
• Memory is the most important
  – innodb_buffer_pool_size
• Are we looking at intensive writes?
  – innodb_log_file_size
  – RAID with BBU (battery-backed write cache)
  – XFS + O_DIRECT (avoid per-inode locking)
• Restrict concurrency on the database
  – Connection pool size or innodb_thread_concurrency
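Restricting concurrency can also be done in the application tier. A minimal sketch, assuming all database access goes through Java code: a fair `java.util.concurrent.Semaphore` caps in-flight calls so that excess requests queue in the JVM instead of contending inside InnoDB. BoundedDbExecutor is a hypothetical name, and the right limit has to be found by measurement.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Caps the number of concurrent database calls (application-side analogue
// of a small connection pool or innodb_thread_concurrency).
final class BoundedDbExecutor {
    private final Semaphore permits;

    BoundedDbExecutor(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent, true); // fair: FIFO queuing of waiters
    }

    <T> T call(Callable<T> dbCall) throws Exception {
        permits.acquire(); // wait here rather than piling more threads onto the database
        try {
            return dbCall.call();
        } finally {
            permits.release();
        }
    }
}
```

The same effect falls out of sizing the connection pool; an explicit limiter is useful when several pools or services share one MySQL instance.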
More on Scalability
• Is contention the problem?
  – Use MySQL 5.5 or Percona Server
  – Tune innodb_buffer_pool_instances
• Index update contention
  – Use multiple tables
  – Or MySQL partitions if you can't
• Eliminate query parsing overhead
  – Consider HandlerSocket, a NoSQL interface to the InnoDB storage engine
• Work on result stability
  – Ensure you're getting stable results
  – InnoDB flushing can cause non-uniform performance