make it scale - optimise and scale mysql
DESCRIPTION
www.makeitscale.co.uk A quarterly event for startups that are part of http://uk.sun.com/startups to learn how to scale their company technically and commercially, with UK startups talking about their challenges and what they have overcome.
TRANSCRIPT
Optimise and Scale MySQL
Mike Griffiths
Proven Scaling
Copyright 2008 Proven Scaling Ltd / Proven Scaling LLC
About me and us
• Me:
  Used & administered MySQL databases for 10 years
  Consultant with Proven Scaling
  Used to be a DBA and Service Architect at Yahoo!
  I don’t always follow the party line
• Us:
  Founded in 2006
  Specialise in MySQL, but work with whole stack
  Primarily consult on architecture, design and optimisation for large scalable systems
  We also do training, DBA work, audits, coding…
Mike’s Mantras
• Scalability + Availability = Service Quality
• Be as lazy as possible
• Be paranoid
• You’re only as strong as your weakest link
Common Scenario #1
• “I don’t think scalability will ever be a problem for me.”
  Denial
  Overconfidence
  No-one uses my product
Common Scenario #2
• “It’s all melting down! Help!”
• Find the cause
  External Factors
    Slashdot, Digg, Blogs
  Internal Factors
    New features
    Bad code
    Marketing
Pre-requisites
• Solid development process
• Version & Release control
• Test / Staging Environment
• Load testing strategy
• Good cross-functional communication
Common Scenario #3
• “I might have problems with scaling up in the future…
  … I’ll fix them if they happen.”
  … I’ll fix them now. All of them.”
Extreme Approaches
• Wait & See
  Faster, Cheaper
  Quality of service compromised
  Slower to react to changes in workload
  Sometimes chosen because of lack of skill or knowledge
  Prototypes are useful, but they are what they are!
• Fix All Now
  Slower, Expensive
  Time to market increased
  Risk of wasted time
  Increased quality of service
  Better prepared for unexpected
Sensible Approach
• Somewhere between the two extremes
• Architect for what you might conceivably face in the future
• Implement now what you know you will face
• Always avoid dead ends and short cuts
  Even if it means much more effort now
MySQL Scalability
• Query Optimisation
• Functional Separation
• Replication
• Sharding
• Archiving
• Isolation
• Error Handling
• Caching
• Configuration
• Hardware
• Capacity Planning
Query Optimisation
• Ensure queries are correctly indexed
• Use the slow query log
• Learn EXPLAIN
• Look carefully at queries which can’t be optimised
  Rewrite queries
  Change schema
  Denormalise
  Split into multiple queries
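A minimal sketch of why indexing and EXPLAIN matter. SQLite (bundled with Python) stands in for MySQL here so that the example runs anywhere; its `EXPLAIN QUERY PLAN` plays the role of MySQL's `EXPLAIN`. The `users` table and the index name are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch is self-contained;
# against MySQL the equivalent check would be EXPLAIN SELECT ...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"


def query_plan(connection, sql, params):
    """Return the plan detail strings the database reports for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the description


plan_before = query_plan(conn, QUERY, ("user42@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, QUERY, ("user42@example.com",))

print("before index:", plan_before)  # reports a full scan of users
print("after index: ", plan_after)   # reports a search via idx_users_email
```

The same habit applies to anything that turns up in the slow query log: read the plan first, then decide whether an index, a rewrite or a schema change is the right fix.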
Query Optimisation
• Big is generally slower
  Longer to recover, query, maintain
• Partitioning in MySQL 5.1 can sometimes help query performance
• Is your server optimally configured?
Functional Separation
• Common prototype/start-up scenario is to have multiple functions on one database host
  Blog
  Forum
  Monitoring
  Application
• Split into different databases
• Move different databases to different hosts
• Can provide short-term breathing space
Replication
• Can help scalability problems with reads
• Doesn’t help with scaling writes
• Asynchronous nature adds complexity to applications
• Can be used to increase availability
  Actually, replace “can” with “should” here
• Part of a scalability solution
  … but not the whole solution
  Be aware of its strengths and weaknesses
  Know what doesn’t work with replication
Replication: Read Scalability
• Single master (write point) replicates to multiple read-only slaves
• Too many slaves can overload a master
  Use “relay” slaves to build a replication tree
• Inefficient (cost, storage, memory) strategy for scaling reads in isolation
  Load balancing strategy is key
  Consider having different roles for slaves
  Combine with sensible caching elsewhere in stack for best results
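One possible shape for the single-write-point idea, as a sketch only: a small router that sends writes to the master and rotates reads across the slaves. The class name and host names are hypothetical, and round-robin is just the simplest load balancing strategy to show.

```python
import itertools


class ReplicatedPool:
    """Route writes to the master and spread reads across slaves (hypothetical helper)."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads simply go to the master.
        self._readers = itertools.cycle(slaves or [master])

    def host_for(self, sql):
        """Pick a host for a statement based on its leading keyword."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in ("SELECT", "SHOW"):
            return next(self._readers)  # round-robin over the read-only slaves
        return self.master              # everything else is treated as a write


pool = ReplicatedPool("db-master", ["db-slave1", "db-slave2"])
for statement in ("SELECT * FROM posts", "UPDATE posts SET hits = hits + 1", "SELECT 1"):
    print(statement, "->", pool.host_for(statement))
```

Giving slaves different roles would mean keeping one such rotation per role (for example, one for user-facing reads and one for reporting) rather than a single shared one.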
Replication: Write Scalability
• Replication doesn’t help with scaling writes
• Write-saturated master normally means write-saturated slaves
  Row-based replication in MySQL 5.1 can help sometimes
• Updates to slaves done in single thread
  Overall write capacity on a slave is less than the master
• Read performance on slaves drops rapidly as write load increases
Replication: Asynchronous
• Applications need to be aware of replication delays
• Worst case: write followed by a read
• How to handle?
  Promotion of read-only database handles to read/write when required
  Use binary log positions to work out if a slave has new enough data
  Can other users be shown out-of-date data?
• Can be very difficult to add handling for replication delays into existing applications
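A sketch of the binary log position check, under these assumptions: the application records the master's `File` and `Position` (from `SHOW MASTER STATUS`) straight after a write, and later compares them with the slave's `Relay_Master_Log_File` and `Exec_Master_Log_Pos` (from `SHOW SLAVE STATUS`). The function names are illustrative, and the status values are passed in as plain tuples rather than fetched from a server.

```python
def binlog_coordinates(log_file, position):
    """Turn ('mysql-bin.000042', 1337) into a sortable (42, 1337) tuple."""
    sequence = int(log_file.rsplit(".", 1)[1])  # numeric suffix of the log file name
    return (sequence, int(position))


def slave_is_fresh_enough(master_coords, slave_coords):
    """True if the slave has executed at least up to the recorded master position."""
    return binlog_coordinates(*slave_coords) >= binlog_coordinates(*master_coords)


def choose_handle(master_coords, slave_coords):
    """Read from the slave once it has caught up, otherwise promote the read to the master."""
    return "slave" if slave_is_fresh_enough(master_coords, slave_coords) else "master"


# Position recorded straight after the user's write: (File, Position).
after_write = ("mysql-bin.000042", 1337)
print(choose_handle(after_write, ("mysql-bin.000042", 900)))  # slave is still behind the write
print(choose_handle(after_write, ("mysql-bin.000043", 4)))    # slave has moved past the write
```

This only protects the user who made the write; whether other users may see out-of-date data remains a product decision.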
Replication: Availability
• Deploy servers in pairs using master-master replication
• Never use master-master as a way of scaling writes
• Use virtual IP addresses to control access - see MMM, Flipper, Linux HA
• Use “inactive” machine for maintenance, backups, slow reads for reporting
Sharding
• Splitting database into smaller chunks
• Only solution for scaling writes
• Needs careful planning
• Can be difficult to implement
• Even more difficult, if not impossible, to retro-fit to an existing system
• Architect your system with sharding in mind from Day One
  … even if not immediately implemented or used
Sharding
• How to split the data?
  Hashing on user-supplied data
    Username, email address
  Splitting on other data
    Time
  Data dictionary
  Any combination of the above
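The three ways of splitting listed above might look roughly like this. It is a sketch: the shard names, the per-year naming scheme and the `DIRECTORY` contents are all invented for illustration.

```python
import hashlib
from datetime import date

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names


def shard_by_hash(username):
    """Hash user-supplied data; SHA-256 gives the same answer in every process."""
    digest = hashlib.sha256(username.lower().encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def shard_by_time(created):
    """Split on time: one (hypothetical) shard per calendar year."""
    return f"events_{created.year}"


# Data dictionary: an explicit lookup that can be edited to move a single user.
DIRECTORY = {"alice": "shard2", "bob": "shard0"}


def shard_by_directory(username):
    """Combination: consult the dictionary first, fall back to the hash for unknown users."""
    return DIRECTORY.get(username, shard_by_hash(username))


print(shard_by_hash("carol@example.com"))
print(shard_by_time(date(2008, 5, 1)))
print(shard_by_directory("alice"))
```

Python's built-in `hash()` is deliberately avoided here because string hashes are randomised per process, which would send the same user to different shards on different web servers.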
Sharding
• What data to split?
  Cross-shard queries are not impossible, but logistically more difficult
  Use your application design to decide
• Popular choice: Primary/Secondary split
  Primary dataset
    Frequently used information (highly cacheable)
    Relationships
    Pointers to secondary data
  Secondary dataset(s)
    Vertically split data
    Less frequently used information
Sharding
• Keep shards to the right size
  Small enough to be manageable
  Large enough so you don’t need thousands
  Consistent size per shard
• Architectures should allow for adding shards
  Hash-based sharding often falls down here
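To illustrate why hash-based sharding struggles when shards are added, the sketch below assumes the simplest scheme, hash modulo shard count, and counts how many keys would land on a different shard after growing from 4 to 5 shards. With a uniform hash, only keys whose hash gives the same remainder under both counts stay put, so roughly four in five keys have to move. The key names are made up.

```python
import hashlib


def shard_index(key, shard_count):
    """Naive placement: stable hash of the key, modulo the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def fraction_moved(keys, old_count, new_count):
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_index(key, old_count) != shard_index(key, new_count)
    )
    return moved / len(keys)


keys = [f"user{i}" for i in range(10000)]
share = fraction_moved(keys, 4, 5)
print(f"{share:.0%} of keys change shard when going from 4 to 5 shards")
```

A data dictionary avoids this, because adding a shard changes nothing for existing entries until they are moved deliberately.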
Sharding
• Accessing data across shards is more difficult, but not impossible
  Handle JOINs in your application
  Replicate primary data to secondary shards
  Maintain summary tables outside of shards
  Parallelisation of execution across shards can give significant performance boost
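A sketch of running one query on every shard in parallel and merging the results in the application. The per-shard data is an in-memory stand-in and `query_shard` is a hypothetical placeholder for running the same SELECT on each shard.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for per-shard result sets: (username, post_count).
SHARD_DATA = {
    "shard0": [("alice", 3), ("bob", 7)],
    "shard1": [("carol", 5)],
    "shard2": [("dave", 9), ("erin", 1)],
}


def query_shard(shard_name):
    """Stand-in for running the same SELECT against one shard."""
    return SHARD_DATA[shard_name]


def top_posters(limit):
    """Query every shard in parallel, then merge and sort in the application."""
    with ThreadPoolExecutor(max_workers=len(SHARD_DATA)) as pool:
        per_shard = list(pool.map(query_shard, SHARD_DATA))
    merged = [row for rows in per_shard for row in rows]
    return sorted(merged, key=lambda row: row[1], reverse=True)[:limit]


print(top_posters(2))
```

Threads suit this pattern because each worker spends its time waiting on a database, so total latency tends towards that of the slowest shard rather than the sum of all of them.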
Archiving
• Big is generally slower
• Keep your data size as small as possible
• Move older & less frequently accessed data
  To slower and/or cheaper infrastructure
  To infrastructure with real or imagined lower service level
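A sketch of an archiving job, with SQLite standing in for MySQL and a second table standing in for the cheaper infrastructure. The table names and cutoff date are invented. On separate servers the copy and the delete could not share one transaction, so a real job would need to copy, verify, then delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created TEXT, body TEXT)")
conn.execute("CREATE TABLE messages_archive (id INTEGER PRIMARY KEY, created TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages (created, body) VALUES (?, ?)",
    [("2006-03-01", "old"), ("2007-11-15", "older"), ("2008-06-01", "recent")],
)


def archive_before(connection, cutoff):
    """Copy rows older than the cutoff into the archive table, then delete them."""
    with connection:  # commits on success, rolls back if either statement fails
        connection.execute(
            "INSERT INTO messages_archive SELECT * FROM messages WHERE created < ?",
            (cutoff,),
        )
        cursor = connection.execute("DELETE FROM messages WHERE created < ?", (cutoff,))
    return cursor.rowcount


moved = archive_before(conn, "2008-01-01")
live = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM messages_archive").fetchone()[0]
print(f"moved {moved} rows; {live} live, {archived} archived")
```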
Caching
• Caching closer to the user is more efficient
  Use a content delivery network
  Use squid
  Use memcached
  Maybe use MySQL’s query cache
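A sketch of the read path usually paired with memcached, often called cache-aside. `TTLCache` is a tiny in-process stand-in written for this example, not the memcached client API, and `load_from_database` is a hypothetical callback.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcached: values expire after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or entry[1] <= self._clock():
            self._items.pop(key, None)  # drop missing or expired entries
            return None
        return entry[0]

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)


def cached_fetch(cache, key, load_from_database, ttl_seconds=60):
    """Cache-aside read: try the cache first and only query the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_database(key)
        cache.set(key, value, ttl_seconds)
    return value


database_calls = []


def load_profile(user_id):
    database_calls.append(user_id)  # records each trip to the database
    return {"id": user_id, "name": f"User {user_id}"}


cache = TTLCache()
cached_fetch(cache, 42, load_profile)
cached_fetch(cache, 42, load_profile)
print("database queried", len(database_calls), "time(s) for two reads")
```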
Isolation & Error Handling
• Isolate your application from MySQL
• Design your application to run without the MySQL server (as much as possible)
• Pre-produce full and/or partial web pages
• No need to process web access logs in real-time
• Cope (more) gracefully when there’s an outage
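A sketch of coping gracefully with an outage: build the page from live data when possible and fall back to a pre-produced copy when the database cannot be reached. `DatabaseUnavailable`, `render_homepage` and the callbacks are hypothetical names for this example.

```python
class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) data layer when MySQL cannot be reached."""


def render_homepage(fetch_latest_posts, prebuilt_html):
    """Build the page from live data, or serve the pre-produced copy during an outage."""
    try:
        posts = fetch_latest_posts()
    except DatabaseUnavailable:
        return prebuilt_html
    items = "".join(f"<li>{title}</li>" for title in posts)
    return f"<ul>{items}</ul>"


def healthy_fetch():
    return ["Scaling MySQL", "Sharding basics"]


def failing_fetch():
    raise DatabaseUnavailable("connection refused")


PREBUILT = "<ul><li>Scaling MySQL</li></ul>"  # written out earlier by a batch job
print(render_homepage(healthy_fetch, PREBUILT))
print(render_homepage(failing_fetch, PREBUILT))
```

Only the specific outage exception is caught, so genuine bugs in the page code still surface instead of being hidden behind stale content.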
Configuration
• Ensure your servers are configured correctly
  Use LVM or ZFS for snapshots for backup
  Ensure BBWC is on and working
    Enables InnoDB to commit without disk head movement
  Make sure MySQL’s configured right
    Use InnoDB. MyISAM rarely the right choice.
    Give as much memory as possible to InnoDB
    Aim: Get all data in InnoDB Buffer Pool
• Ensure everything’s monitored
• Automate whatever you can
Hardware for MySQL
• Should I scale up or scale out?
• Each approach has benefits
  Don’t listen to sales & marketing people
• Each approach has problems
  Don’t listen to sales & marketing people
• Should I consider cloud computing?
Scale up
• Can be cheaper
  Smaller power and space usage
  Fewer machines to administer
• Many eggs in one basket
• MySQL’s own scaling problems add complexity
  Need to run multiple instances to take advantage of massively parallel machines
Scale up
• Storage doesn’t scale up cheaply
  Significant storage infrastructure might be required
    Cost per I/O operation and terabyte likely to be significantly higher
    … but, with a SAN, you only have one “live” copy of your data
  Power & space savings could be cancelled out
  Single copy of data reduces maintainability
  I/O latency with SAN normally higher than with local disk
Scale out
• Can be cheaper
  Lower initial cost
  Local storage cheaper
    … offset by duplication of data
• Can be more expensive
  Power, space, administration
• More frequent failures
  … but potentially less damaging
Cloud computing
• More appropriate in other parts of the application stack
• Worth considering if you have peaky, CPU-intensive workload
  Cater for your baseline yourself
  Use on-demand services for peaks
• Otherwise, avoid!
  Poor I/O performance will hurt you
Hardware
• Hardware strategy depends on:
  Application Architecture
  Predicted Growth Rates
  Funding
Capacity Planning
• Working out what you need to provide your product(s) to your users within an acceptable timeframe
• Constant process
• Use all the data available
  Extrapolate from monitoring data
  Use load testing data
  Use data gained from prototype testing
• Allow for hardware failure
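A back-of-the-envelope sketch of the arithmetic involved, assuming compound monthly growth, a per-server throughput figure taken from load testing, and one spare server to allow for hardware failure. Every number and name below is illustrative.

```python
import math


def projected_peak(current_peak_qps, monthly_growth, months):
    """Extrapolate peak queries per second assuming compound monthly growth."""
    return current_peak_qps * (1 + monthly_growth) ** months


def servers_needed(peak_qps, qps_per_server, spare_servers=1):
    """Servers required to carry the peak, plus spares to survive hardware failure."""
    return math.ceil(peak_qps / qps_per_server) + spare_servers


# Illustrative inputs: peak from monitoring, per-server capacity from load testing.
current_peak = 1000.0
capacity_per_server = 1000.0

for months in (0, 6, 12):
    peak = projected_peak(current_peak, 0.10, months)
    count = servers_needed(peak, capacity_per_server)
    print(f"in {months:2d} months: peak {peak:.0f} qps, {count} servers")
```

Because planning is a constant process, the inputs should be refreshed from monitoring regularly rather than computed once.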
Capacity Planning
[Diagram labels: Past traffic data; Future traffic predictions; Traffic profile; Performance data; Service Level Requirements; 42… and a new load balancing strategy]
Summary
• Sharding for scaling writes
• Replication and caching for scaling reads
• Big is bad. Small is super. Manageable is magic.
• Get the right hardware for the job
• Use your kit as efficiently as possible
  Optimise your queries
  Configure things correctly
• Monitor as much as possible
• Think about scaling now, not later
Thanks!
Want help to make it scale? Get in touch: [email protected]