web scale mysql at facebook (domas mituzas)
Web scale MySQL @ Facebook
Domas Mituzas, 2011-10-03
Agenda
1 Intro
2 Current
3 Future
Facebook
▪ 800M monthly active users
▪ 500M daily active users
▪ 350M mobile users
▪ 7M apps and websites integrated via platform
Current
1 Setup
2 Performance overview
3 Stalls
4 Efficiency
5 Projects
Setup
▪ Software
▪ MySQL 5.1
▪ Custom Facebook patch
▪ Published on Launchpad (mysqlatfacebook)
▪ Extra resiliency
▪ Reduced operations effort
▪ Hardware
▪ Variety of generations
▪ Many-core
▪ Local storage
▪ Some flash storage
UDB performance numbers (from Sep. 2011)
▪ Query response time
▪ 4ms reads, 5ms writes
▪ Network bytes sent per second
▪ 90GB peak
▪ Queries per second
▪ 60M peak
▪ Rows read per second
▪ 1450M peak
▪ Rows changed per second
▪ 3.5M peak
▪ InnoDB page IO per second
▪ 8.1M peak
Performance focus
▪ Focus on reliable throughput in production
▪ Avoid performance stalls
▪ Make sure hardware is used
▪ 99th percentile rather than average or median
▪ Worst offender analysis – topN & histograms instead of tier averages
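To make the gap between a tier average and a per-host 99th percentile concrete, here is a minimal Python sketch. The host names and latency samples are invented for illustration, and the nearest-rank percentile is one of several common definitions, not necessarily the one used in the talk's tooling.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def worst_offenders(samples_by_host, n=3, p=99):
    """Rank hosts by their own p-th percentile latency, worst first."""
    scored = [(percentile(samples, p), host) for host, samples in samples_by_host.items()]
    return sorted(scored, reverse=True)[:n]


# Hypothetical per-host query latencies in ms: db-b stalls on 2% of its queries.
latencies = {
    "db-a": [4.0] * 100,
    "db-b": [4.0] * 98 + [120.0, 150.0],
    "db-c": [5.0] * 100,
}
all_samples = [x for samples in latencies.values() for x in samples]
tier_mean = statistics.mean(all_samples)
tier_median = statistics.median(all_samples)
print(f"tier mean {tier_mean:.1f} ms, median {tier_median} ms")
print(worst_offenders(latencies, n=2))
```

The tier mean (about 5.2 ms) and median (4 ms) both look healthy, while ranking hosts by their own p99 puts db-b, the host with occasional stalls, at the top.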
Stalls
▪ “Dogpiles”
▪ Temporary slowdown – even 0.1 s is huge
Stall tools
▪ Dogpiled (in-house)
▪ Snapshot aggregation of server state at distress
▪ “time machine” view into logs before the event too
▪ Aspersa (stalk, collect)
▪ Poor man’s profiler (.org)
▪ Later iterations – apmp, hpmp, tpmp
▪ GDB
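The poor man's profiler works by dumping every thread's backtrace with gdb and counting identical stacks, so a stall shows up as many threads parked in the same place. The original is a shell pipeline (gdb plus awk, sort and uniq); the Python sketch below reproduces only the aggregation step on gdb-style text. The sample backtrace and its function names are hypothetical, not captured from a real mysqld.

```python
import re
from collections import Counter

# Matches "#0  0x... in func (...)" and "#0  func (...) at file.c:10"; captures func.
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def aggregate_stacks(gdb_output):
    """Collapse each thread's backtrace to 'f0,f1,...' and count identical stacks."""
    stacks = Counter()
    current = []
    for line in gdb_output.splitlines():
        if line.startswith("Thread"):
            if current:
                stacks[",".join(current)] += 1
            current = []
        else:
            frame = FRAME.match(line)
            if frame:
                current.append(frame.group(1))
    if current:
        stacks[",".join(current)] += 1
    return stacks.most_common()


# Hypothetical output of: gdb -batch -ex "thread apply all bt" -p <pid>
SAMPLE = """\
Thread 3 (Thread 0x7f01 (LWP 103)):
#0  0x00007f0000000001 in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000000400001 in mutex_wait ()
#2  0x0000000000400002 in handle_query ()
Thread 2 (Thread 0x7f02 (LWP 102)):
#0  0x00007f0000000001 in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000000400001 in mutex_wait ()
#2  0x0000000000400002 in handle_query ()
Thread 1 (Thread 0x7f03 (LWP 101)):
#0  extend_tablespace (n=1) at fil.c:10
#1  0x0000000000400002 in handle_query ()
"""

for stack, count in aggregate_stacks(SAMPLE):
    print(count, stack)
```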
Stalls found
▪ Table file extension – global I/O mutex held
▪ Drop table – both SQL layer and InnoDB global mutexes held
▪ Purge contention – unnecessary dictionary lock held
▪ Binlog reads – no commits can happen while old events are being read
▪ Kernel mutex – O(N) and O(N^2) operations
▪ Transaction creation
▪ Lock creation/removal, deadlock detection
▪ Background page flushing not really done in the background
▪ Many more
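Several of these stalls come from work done while one global mutex is held. As an illustration of how such work can grow quadratically, here is a hedged Python sketch of deadlock detection on a wait-for graph (each transaction maps to the transactions it waits on). It is a generic depth-first search, not InnoDB's actual lock-queue walk: if every new waiter searches the whole chain of existing waiters, N waiters cost N(N+1)/2 node visits in total.

```python
def check_deadlock(wait_for, start):
    """Search the wait-for graph from `start`; return (deadlock_found, nodes_visited).

    wait_for maps a transaction id to the set of ids it is waiting on.
    """
    stack = list(wait_for.get(start, ()))
    seen = set()
    while stack:
        trx = stack.pop()
        if trx == start:
            return True, len(seen)      # edges lead back to ourselves: a cycle
        if trx in seen:
            continue
        seen.add(trx)
        stack.extend(wait_for.get(trx, ()))
    return False, len(seen)


def total_work_for_chain(n):
    """Transaction k waits on k - 1 and runs a full check when it starts waiting."""
    wait_for = {}
    total = 0
    for k in range(1, n + 1):
        wait_for[k] = {k - 1}
        _, visited = check_deadlock(wait_for, k)
        total += visited
    return total


cycle = {1: {2}, 2: {3}, 3: {1}}
print(check_deadlock(cycle, 1))         # (True, 2)
for n in (100, 200, 400):
    print(n, total_work_for_chain(n))   # n * (n + 1) / 2 visits
```

Doubling the number of waiters roughly quadruples the total search work, which matters when all of it runs under a single mutex.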
Efficiency
▪ Increasing utilization of hardware
▪ Memory-to-disk ratio
▪ Finding bottlenecks
▪ Normally disk-bound
▪ Sometimes network
▪ Application or server software chokepoints
▪ Rarely CPU/memory bandwidth
▪ Application design
▪ Biggest wins are in optimizing the workload
Disk efficiency
▪ Normally disk IOPS bound
▪ Allowing higher queue lengths
▪ Can operate at more than 8 pending operations per disk
▪ InnoDB page size
▪ Needs to be adjustable per table or index for real gains
▪ XFS/deadline
▪ Parallelism at MySQL layer
▪ >300 IOPS on 166 rps (10,000 RPM) disks
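The numbers above imply close to two I/O operations per platter revolution (300 / 166 ≈ 1.8), which is only reachable when the drive and the I/O scheduler can reorder a deep queue instead of serving requests strictly in arrival order. The Python sketch below compares total head travel for first-in-first-out service against one elevator (SCAN) sweep over the same pending requests. The block numbers are made up, and real drives also optimize for rotational position, which this model ignores.

```python
def fifo_distance(head, requests):
    """Total head travel (in blocks) when requests are served in arrival order."""
    total = 0
    for block in requests:
        total += abs(block - head)
        head = block
    return total


def elevator_distance(head, requests):
    """Total head travel for one SCAN sweep: upward first, then back down."""
    upward = sorted(b for b in requests if b >= head)
    downward = sorted((b for b in requests if b < head), reverse=True)
    return fifo_distance(head, upward + downward)


pending = [90, 10, 80, 20, 70, 30]      # hypothetical block numbers in arrival order
print(fifo_distance(50, pending), elevator_distance(50, pending))   # 340 120
```

With more requests pending, the sweep has more chances to shorten the path, which is why allowing longer queues per disk pays off.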
Memory efficiency
▪ Compact records – Thrift compaction for objects, etc.
▪ Clustered and covering index planning
▪ FORCE INDEX – avoid unnecessary I/O and needlessly cached pages
▪ Historical data access is costly
▪ Full table scans
▪ ETL-type queries, mysqldump, …
▪ Tune midpoint insertion LRU for InnoDB
▪ Incremental updating, incremental binary backups
▪ O_DIRECT for data and log file access
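InnoDB's buffer pool LRU inserts newly read pages at a midpoint (by default about 3/8 of the list from the tail, tunable via innodb_old_blocks_pct and innodb_old_blocks_time), so a one-off full scan or mysqldump churns only the "old" sublist. The Python sketch below models that idea with two ordered sublists; it promotes on any second access, omits the time-based promotion delay, and is not InnoDB's implementation.

```python
from collections import OrderedDict


class MidpointLRU:
    """LRU split into a 'young' head and an 'old' tail; new pages enter the old part."""

    def __init__(self, capacity, old_pct=37):
        self.capacity = capacity
        self.old_cap = max(1, capacity * old_pct // 100)
        self.young = OrderedDict()      # hot pages; least recently used first
        self.old = OrderedDict()        # newly read or demoted pages

    def access(self, page):
        if page in self.young:
            self.young.move_to_end(page)
        elif page in self.old:
            del self.old[page]          # second touch: promote to young
            self.young[page] = True
        else:
            self.old[page] = True       # first touch: midpoint insertion
        self._rebalance()

    def _rebalance(self):
        # The young sublist may not take the old sublist's share of the pool.
        while len(self.young) > self.capacity - self.old_cap:
            page, _ = self.young.popitem(last=False)
            self.old[page] = True       # demote to the head of the old sublist
        # Evict from the tail of the old sublist when the pool is full.
        while len(self.old) + len(self.young) > self.capacity:
            self.old.popitem(last=False)


cache = MidpointLRU(capacity=8)
for page in ["h1", "h2", "h3", "h4"]:
    cache.access(page)                  # first touch lands in the old sublist
    cache.access(page)                  # second touch promotes to young
for i in range(20):
    cache.access(f"scan{i}")            # a full table scan touches each page once
print(sorted(cache.young), list(cache.old))
```

After a 20-page scan through an 8-page pool, the four hot pages are still resident; only scan pages were evicted.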
Pure flash (cheating)
▪ Data stored directly on flash
▪ Limited data size
▪ Not utilizing flash card fully
▪ Still used in some cases
Flashcache
▪ Flash in front of disks
▪ Can use slower disks
▪ Write-back cache
▪ Much more data storage
▪ Able to utilize much more of flash card
▪ Very long warmup time
▪ Open source (github/facebook/flashcache)
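Flashcache itself is a Linux device-mapper module; the toy Python class below only illustrates the write-back policy the slide mentions. Writes are acknowledged once they reach the cache, marked dirty, and written to the slower device later, on eviction or an explicit flush, so repeated writes to a hot block cost a single disk write. The dict-based backing store and all names are hypothetical.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy LRU write-back cache in front of a slower backing store."""

    def __init__(self, backing, capacity):
        self.backing = backing          # dict standing in for the slow disk
        self.capacity = capacity
        self.blocks = OrderedDict()     # block -> (data, dirty); LRU first
        self.disk_writes = 0

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return self.blocks[block][0]
        data = self.backing[block]      # miss: fill from the slow device
        self._insert(block, data, dirty=False)
        return data

    def write(self, block, data):
        # Acknowledged once cached; the disk write is deferred.
        self._insert(block, data, dirty=True)

    def flush(self):
        for block, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self._write_back(block, data)
                self.blocks[block] = (data, False)

    def _insert(self, block, data, dirty):
        self.blocks[block] = (data, dirty)
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            old_block, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:               # dirty victims must reach the disk first
                self._write_back(old_block, old_data)

    def _write_back(self, block, data):
        self.backing[block] = data
        self.disk_writes += 1


disk = {block: f"v{block}" for block in range(10)}
cache = WriteBackCache(disk, capacity=4)
for version in range(5):
    cache.write(1, f"new{version}")     # five writes to one hot block
before_flush = (disk[1], cache.disk_writes)
cache.flush()
after_flush = (disk[1], cache.disk_writes)
print(before_flush, after_flush)        # ('v1', 0) ('new4', 1)
```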
MySQL 2x
▪ Flash allows for large loads
▪ Large performance difference from pure disk servers
▪ Many older servers still being used
▪ Solution?
▪ Run multiple MySQL instances per server
▪ Use ports 3307, 3308, 3309, etc.
▪ Replication prevents direct consolidation
▪ Had to redo many port assumptions in code
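Running several instances per host means code can no longer treat a hostname alone as a server identity or assume port 3306. Here is a minimal sketch of the kind of change involved, assuming host:port strings and a hypothetical numbering scheme where instance n listens on 3306 + n (matching the 3307, 3308, 3309 ports above). IPv6 literals are out of scope.

```python
DEFAULT_MYSQL_PORT = 3306


def parse_endpoint(spec):
    """Split 'host' or 'host:port' into (host, port); use 3306 only if no port is given."""
    host, sep, port = spec.rpartition(":")
    if not sep:
        return spec, DEFAULT_MYSQL_PORT
    return host, int(port)


def instance_port(n):
    """Hypothetical convention: instance n (1, 2, 3, ...) listens on 3306 + n."""
    if n < 1:
        raise ValueError("instances are numbered from 1")
    return DEFAULT_MYSQL_PORT + n


# Identify servers by (host, port), never by host alone.
replicas = {parse_endpoint(s) for s in ["db101:3307", "db101:3308", "db102"]}
print(sorted(replicas))
```

The main point is keying connection pools, topology maps, and monitoring by the (host, port) pair; the numbering helper is only illustrative.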
Application caching
▪ Old: memcached
▪ Cache invalidation stampedes, refetching full dataset on refresh, many copies
▪ New: write-through caching
▪ Incremental cache updates
▪ Cache hierarchies for datacenter local copies
▪ Efficient operations for association set
▪ Common API for all use cases
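The contrast between invalidation and write-through caching can be sketched for a single association list (for example, the set of objects linked to one user). In this hypothetical Python class, a write updates the store first and then patches the cached set in place, so a popular key is never deleted and refetched in full. The dict store stands in for MySQL; class and key names are invented, and concurrent writers and datacenter-local cache hierarchies are out of scope.

```python
class WriteThroughAssocCache:
    """Toy write-through cache for association sets (key -> set of linked ids)."""

    def __init__(self, store):
        self.store = store              # stand-in for the database: key -> set
        self.cache = {}
        self.db_reads = 0

    def get(self, key):
        if key not in self.cache:
            self.db_reads += 1          # only the first read hits the store
            self.cache[key] = set(self.store.get(key, ()))
        return frozenset(self.cache[key])

    def add(self, key, member):
        self.store.setdefault(key, set()).add(member)   # write to the store first
        if key in self.cache:
            self.cache[key].add(member)                 # incremental cache update

    def remove(self, key, member):
        self.store.get(key, set()).discard(member)
        if key in self.cache:
            self.cache[key].discard(member)


store = {"user:1": {"page:a", "page:b"}}
assoc = WriteThroughAssocCache(store)
assoc.get("user:1")
assoc.add("user:1", "page:c")
assoc.remove("user:1", "page:a")
result = assoc.get("user:1")
print(sorted(result), assoc.db_reads)
```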
Group commit
▪ Some OLTP workloads too busy even for modern RAID cards
▪ High I/O pressure increases response times
▪ Durability compromises increase operational overhead
▪ Dead batteries are extremely painful otherwise
▪ Now in 5.1.52-fb
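Group commit amortizes one durable sync across many concurrently committing transactions. The Python sketch below shows a common leader/follower pattern with a simulated fsync; it illustrates the general technique and is not the 5.1.52-fb implementation, whose details the slide does not give. A failed sync is not handled beyond releasing leadership.

```python
import threading
import time


class GroupCommitLog:
    """Toy group commit: concurrent committers share one (simulated) fsync."""

    def __init__(self, fsync):
        self.fsync = fsync              # callable(batch); stands in for write + fsync
        self.cond = threading.Condition()
        self.pending = []               # records not yet handed to a flush
        self.next_seq = 0
        self.durable_seq = -1           # highest sequence number known to be durable
        self.flushing = False
        self.fsync_calls = 0

    def commit(self, record):
        with self.cond:
            seq = self.next_seq
            self.next_seq += 1
            self.pending.append(record)
            while self.durable_seq < seq:
                if self.flushing:
                    self.cond.wait()    # a leader is syncing; wait for it to finish
                    continue
                # Become the leader for everything queued up to now.
                batch, self.pending = self.pending, []
                last = self.next_seq - 1
                self.flushing = True
                self.cond.release()     # let new committers queue up during the sync
                try:
                    self.fsync(batch)   # one sync covers the whole group
                finally:
                    self.cond.acquire()
                    self.flushing = False
                    self.cond.notify_all()
                # Reached only if fsync succeeded; failures are not handled here.
                self.fsync_calls += 1
                self.durable_seq = last
        return seq


batches = []


def slow_fsync(batch):
    time.sleep(0.01)                    # pretend a sync costs 10 ms
    batches.append(list(batch))


log = GroupCommitLog(slow_fsync)
threads = [threading.Thread(target=log.commit, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{sum(map(len, batches))} commits, {log.fsync_calls} fsync calls")
```

Under load, commits that arrive while a sync is in flight are batched into the next one, so the number of syncs is typically well below the number of commits.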
Admission control
▪ Server resources are limited
▪ Per account thread concurrency
▪ Reduces O(N^2) blowup chance
▪ max_connections no longer impacts server load
▪ Per-application resource throttling
▪ Now in 5.1.52-fb
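A minimal sketch of per-account concurrency limiting, assuming a fixed limit per account and immediate rejection once the limit is reached. The slide does not say whether the real patch queues or rejects excess queries, and the class and exception names are hypothetical.

```python
import threading
from contextlib import contextmanager


class TooManyQueries(Exception):
    """Raised when an account is already running its maximum number of queries."""


class AdmissionControl:
    """Per-account cap on concurrently running queries (hypothetical sketch)."""

    def __init__(self, limit):
        self.limit = limit
        self._lock = threading.Lock()
        self._slots = {}                # account -> BoundedSemaphore(limit)

    def _semaphore(self, account):
        with self._lock:
            if account not in self._slots:
                self._slots[account] = threading.BoundedSemaphore(self.limit)
            return self._slots[account]

    @contextmanager
    def admit(self, account):
        sem = self._semaphore(account)
        if not sem.acquire(blocking=False):     # reject instead of piling up threads
            raise TooManyQueries(account)
        try:
            yield
        finally:
            sem.release()


gate = AdmissionControl(limit=2)
rejected = False
with gate.admit("app1"), gate.admit("app1"):    # two queries running for app1
    try:
        with gate.admit("app1"):                # a third would exceed the limit
            pass
    except TooManyQueries:
        rejected = True
    with gate.admit("app2"):                    # other accounts are unaffected
        other_ok = True
with gate.admit("app1"):                        # slots are returned on exit
    after_ok = True
```

Because the cap applies per account, a burst from one application cannot occupy every server thread, regardless of how high max_connections is set.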
Online Schema Change
▪ External PHP script, open source
▪ Utilizes triggers for change tracking
▪ Used on 100 GB+ tables
▪ Dump/reload + fast index creation
▪ Extendable class, may allow:
▪ PK composition changes with conflict resolution
▪ Indexing previously unindexed datasets
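The trigger-based approach can be shown end to end with Python's built-in sqlite3 module standing in for MySQL: capture changed primary keys with triggers, copy rows into a shadow table with the new schema, replay the captured keys, then swap table names. This is a single-connection sketch that simulates concurrent traffic by issuing writes between the copy and the replay; the real tool's dump/reload, locking, and final catch-up under MySQL are not reproduced, and the users table is hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
""")

# 1. Change capture: triggers log every primary key that is touched.
db.executescript("""
    CREATE TABLE users_deltas (seq INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER);
    CREATE TRIGGER users_ins AFTER INSERT ON users
        BEGIN INSERT INTO users_deltas(id) VALUES (NEW.id); END;
    CREATE TRIGGER users_upd AFTER UPDATE ON users
        BEGIN
            INSERT INTO users_deltas(id) VALUES (OLD.id);
            INSERT INTO users_deltas(id) VALUES (NEW.id);
        END;
    CREATE TRIGGER users_del AFTER DELETE ON users
        BEGIN INSERT INTO users_deltas(id) VALUES (OLD.id); END;
""")

# 2. Shadow table with the new schema (an added column here).
db.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT DEFAULT '')")

# 3. Bulk copy of existing rows.
db.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users")

# Writes that land after the copy, as live traffic would.
db.execute("UPDATE users SET name = 'bobby' WHERE id = 2")
db.execute("INSERT INTO users VALUES (3, 'cy')")
db.execute("DELETE FROM users WHERE id = 1")

# 4. Replay: make each touched row in the shadow table match the current row.
for (row_id,) in db.execute("SELECT DISTINCT id FROM users_deltas").fetchall():
    db.execute("DELETE FROM users_new WHERE id = ?", (row_id,))
    db.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users WHERE id = ?",
               (row_id,))

# 5. Swap names in one transaction; keep the old table around for rollback.
db.executescript("""
    BEGIN;
    DROP TRIGGER users_ins;
    DROP TRIGGER users_upd;
    DROP TRIGGER users_del;
    ALTER TABLE users RENAME TO users_old;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
""")
print(db.execute("SELECT * FROM users ORDER BY id").fetchall())
```

Replaying by primary key (delete, then re-copy the current row) makes the replay idempotent, so a key logged several times is handled correctly.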
Tools
▪ Table and user statistics
▪ Shadows
▪ Slocket
▪ pmysql
▪ Replication sampling
▪ Client log aggregation
▪ Query comments
▪ Indigo (Query monitor)
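Query comments let tools that see the raw statement text (for example the process list) attribute a query to the code that issued it. The helpers below are a hypothetical sketch: the /* key=value */ format is an assumption, and values are sanitized so that a caller-supplied string cannot close the comment early.

```python
import re

TAG = re.compile(r"^/\* (.*?) \*/ ")


def tag_query(sql, **context):
    """Prefix SQL with a /* key=value ... */ comment describing its origin."""
    if not context:
        return sql
    items = []
    for key, value in sorted(context.items()):
        # Keep each value a single token and make sure it cannot end the comment early.
        clean = re.sub(r"\s", "_", str(value).replace("*/", "*_/"))
        items.append(f"{key}={clean}")
    return f"/* {' '.join(items)} */ {sql}"


def parse_tag(sql):
    """Recover the key=value pairs from a tagged statement."""
    m = TAG.match(sql)
    if not m:
        return {}
    return dict(item.split("=", 1) for item in m.group(1).split(" "))


q = tag_query("SELECT name FROM users WHERE id = 42", endpoint="/profile.php", host="web123")
print(q)
print(parse_tag(q))
```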
Future
1 Visibility
2 Replication
3 Compression
Future
▪ MySQL is never a solved problem
▪ Always investigating better/new solutions
▪ New hardware types
▪ New datacenters and topologies
▪ New use cases and clients
▪ New neighbors to share data with
Visibility
▪ Never assume
▪ Use metrics to measure
▪ When metrics aren’t available, add them
▪ Full stack
▪ More InnoDB info
▪ More application info
Replication
▪ Lag used to be a big problem and is still a bottleneck
▪ Possible solutions:
▪ “Better” slave prefetch
▪ Maatkit version has problems
▪ Our own version being used on some tiers successfully
▪ May be possible with InnoDB cooperation
▪ Continuent parallel slave
▪ Oracle parallel slave in 5.6
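A slave prefetcher reads ahead in the relay log and runs read-only versions of upcoming writes, so the pages they need are already in the buffer pool when the single replication SQL thread applies them. The Python sketch below shows only the rewrite step for simple single-table UPDATE and DELETE statements. It is a regex-based illustration, not the Maatkit or Facebook implementation, and it mishandles multi-table statements and WHERE appearing inside string literals; the sample statements are hypothetical.

```python
import re

# Simple single-table statements only; a sketch, not a SQL parser.
UPDATE_RE = re.compile(r"^\s*UPDATE\s+(\S+)\s+SET\s+.+?\s+(WHERE\s+.+)$",
                       re.IGNORECASE | re.DOTALL)
DELETE_RE = re.compile(r"^\s*DELETE\s+FROM\s+(\S+)\s+(WHERE\s+.+)$",
                       re.IGNORECASE | re.DOTALL)


def to_prefetch_select(statement):
    """Turn an UPDATE/DELETE into a SELECT that reads the same rows, or return None."""
    for pattern in (UPDATE_RE, DELETE_RE):
        m = pattern.match(statement)
        if m:
            table, where = m.groups()
            return f"SELECT * FROM {table} {where}"
    return None


relay_events = [    # hypothetical upcoming statements from the relay log
    "UPDATE users SET name = 'x' WHERE id = 42",
    "delete from likes where uid = 7 and oid = 9",
    "INSERT INTO users VALUES (43, 'y')",
]
for event in relay_events:
    print(to_prefetch_select(event))
```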
InnoDB compression
▪ Originally planned as part of the 5.1 upgrade
▪ Problems
▪ Replication stream cost
▪ Increased log writes
▪ Performance in some cases
▪ Stability, monitoring, etc
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0