MySQL Scaling and High Availability Architectures

Jeremy Cole <[email protected]>
Eric Bergen <[email protected]>

TRANSCRIPT

Page 1: MySQL Scaling and High Availability Architectures

MySQL Scaling and High Availability Architectures

Jeremy Cole <[email protected]>

Eric Bergen <[email protected]>

Page 2: MySQL Scaling and High Availability Architectures

Who are we?
• Proven Scaling is a consulting company founded in 2006 by Eric and Jeremy, specializing in MySQL
• We primarily deal with architecture and design for large, scalable systems
• We also do training, DBA work, custom MySQL features, etc.
• Jeremy: optimization, architecture, performance
• Eric: operations, administration, monitoring

Page 3: MySQL Scaling and High Availability Architectures

Overview
• What’s the problem?
• Basic Tenets of Scaling and High Availability
• Lifetime of a Scalable System
• Approaches to Scaling
• Approaches to High Availability
• Tools and Components

Page 4: MySQL Scaling and High Availability Architectures

What’s the problem?
• Internet-age systems can grow (or be forced to choose between growth and death) very quickly
• No matter what you plan for or predict, users will always surprise you
• Mobs, err, valued users can be very annoying sometimes (e.g. “biggest group ever” logic)
• Users may have vastly different usage patterns
• Web 2.0™ (blechhhh!) sites have changed the world of scaling; it’s much harder now
• Everyone (your VCs included) expects you to be Web 2.0® compliant™

Page 5: MySQL Scaling and High Availability Architectures

Basic Tenets
• Don’t design scalable or highly available systems:
  - Using components you do not control or that have loose tolerances (e.g. DNS)
  - Using processes with potentially ugly side effects (e.g. code changes to add a new server) [Yes, configuration files are very often “code”]
• If a user doesn’t think/notice something is down, it’s not really “down”
• Eliminate (or limit) single points of failure -- if you have only one of any component, examine why
• Cache everything

Page 6: MySQL Scaling and High Availability Architectures

Lifetime of a Scalable System

Page 7: MySQL Scaling and High Availability Architectures

Newborn
• Shared hosting
• Might start worrying (a little bit) about query optimization at this point
• Don’t have much control over configuration
• Overall performance may be poor
• Traffic picks up, and performance is bad... What do we do about it?

Page 8: MySQL Scaling and High Availability Architectures

Toddler
• A single (dedicated) server for everything
• MySQL and Apache etc., competing for resources
• MySQL needs memory for caching data
• Apache (and especially PHP etc.) needs lots of memory for handling requests
• Memory contention will be the first major bottleneck

Page 9: MySQL Scaling and High Availability Architectures

Child
• Separate web servers and database server
• Usually go ahead and get multiple web servers now, since it’s easy
• Get a single database server, since it’s hard -- maybe better hardware?
• Now we need to do session management across web servers… hmm, we have this nice database…
• Other load same as before, but now with added network overhead
• Single database server becomes your biggest bottleneck

Page 10: MySQL Scaling and High Availability Architectures

Teenager
• “Simple” division of load by moving tables or processes
• Use replication to move reporting off production
• Move individual tables or databases to lighten load
• Use replication to move reads to slaves
• Modify code to know where everything is
• Still too many writes in some parts of the system
• Replication synchronization problems mean either annoying users or writing lots of code to work around the problem
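The read-to-slave split above can be sketched as a tiny query router. This is a minimal illustration, not a real driver: `QueryRouter` is a hypothetical helper, and the node names stand in for actual database connections.

```python
import random

class QueryRouter:
    """Route writes to the master and ordinary reads to a slave.

    Hypothetical sketch of read/write splitting; `master` and
    `slaves` are stand-in node names, not real connections.
    """

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "ALTER"}

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def route(self, sql, critical_read=False):
        verb = sql.lstrip().split(None, 1)[0].upper()
        # Writes -- and "critical" reads that must not see stale data,
        # since replication is asynchronous -- go to the master.
        if verb in self.WRITE_VERBS or critical_read:
            return self.master
        # Ordinary reads are spread across the slaves.
        return random.choice(self.slaves)
```

The `critical_read` flag is the escape hatch for the synchronization problem the slide mentions: any read that cannot tolerate replication delay is forced back to the master.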

Page 11: MySQL Scaling and High Availability Architectures

Late teens to 20s
• The “awkward” stage
• This is where many applications (and sometimes entire companies) die by making bad decisions
• Death can be slow (becoming irrelevant due to speed or lack of scalability) or quick (a massive meltdown losing user confidence)
• Managing the move from teenager into adulthood is often the first real project requiring specs and real processes to do it right
• Downtime at this point is hard to swallow due to the size of the userbase

Page 12: MySQL Scaling and High Availability Architectures

Adult
• Scalable system that can grow for a long time, generally based on data partitioning
• Most improvements now are incremental
• System is built to allow incremental improvements without downtime
• A lot has been learned from the successful transition to adulthood

Page 13: MySQL Scaling and High Availability Architectures

Data Partitioning: The only game in town

Page 14: MySQL Scaling and High Availability Architectures

What is partitioning?
• Distributing data on a record-by-record basis
• Usually a single basis for distributing records in each data set is chosen: a “partition key”
• An application may have multiple partition keys
• Each node has all related tables, but only a portion of the data

Page 15: MySQL Scaling and High Availability Architectures

Partitioning Models
• Fixed “hash key” partitioning
• Dynamic “directory” partitioning
• Partition by “group”
• Partition by “user”

Page 16: MySQL Scaling and High Availability Architectures

Partitioning Difficulties
• Inter-partition interactions are a lot more difficult
• Example: Partitioning by user, where do we store a message sent from one user to another? How about a friend list?
• Overall reporting becomes more difficult
• Example: Find the average number of friends a user has, by state…
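A report like “average friends per user, by state” that was one query on a single server becomes a scatter-gather job across partitions. A minimal sketch, assuming each partition’s rows are already fetched into plain dicts (the per-shard query layer is elided):

```python
from collections import defaultdict

def average_friends_by_state(partitions):
    """Scatter-gather aggregation across partitions.

    `partitions` is a list of per-shard result sets, each a list of
    {"state": ..., "friend_count": ...} rows -- stand-ins for what
    each node's query would return.
    """
    totals = defaultdict(lambda: [0, 0])  # state -> [friend_sum, user_count]
    # Scatter: accumulate partial sums from each partition.
    for rows in partitions:
        for user in rows:
            t = totals[user["state"]]
            t[0] += user["friend_count"]
            t[1] += 1
    # Gather: merge partials into final per-state averages.
    return {state: friend_sum / users
            for state, (friend_sum, users) in totals.items()}
```

The point is that the merge step now lives in application code (or an aggregation layer), and every partition must be queried even for a small report.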

Page 17: MySQL Scaling and High Availability Architectures

Partition by …
• Partitioning by user (or equivalent) allows for the most flexibility in most applications
• In many cases it may make sense to partition by group, if most (or all) interactions between users are within that group
• You could also get most of the same benefits of partitioning by group by partitioning by user with an affinity based on group

Page 18: MySQL Scaling and High Availability Architectures

Fixed Hash Key
• Divide the data into B buckets
• Divide the B buckets over M machines
• Example: Define 1024 user buckets 0..1023 based on (user_id % 1024) for 4 physical servers, so each server gets 256 of the buckets by range: 0-255, 256-511, 512-767, 768-1023
• Problem: Moving entire buckets means affecting 1/B of your users at a time in the best case… in simple implementations you may have to affect 1/M or 2/M of your users
• Problem: The bucket-to-machine mapping must be stored somewhere (usually in code) and updated atomically
• Problem: You have no control over which bucket (and thus machine) a given user is assigned to

Page 19: MySQL Scaling and High Availability Architectures

Dynamic Directory
• A “directory” server maintains a database of mappings between users and partitions
• A user is assigned (often randomly) to one partition and that mapping is stored
• Any user may be moved later by locking the user, moving their data, and updating their mapping in the directory
• Solution: Only single users are affected by any repartitioning that must be done
• Solution: Partitions may be rebalanced user-by-user at any time

Page 20: MySQL Scaling and High Availability Architectures

Custom Solutions

Page 21: MySQL Scaling and High Availability Architectures

Custom Solutions
• It’s very easy to implement simple hash key partitioning to get data distributed
• It’s much more difficult to be able to re-partition
• It’s difficult to grow

Page 22: MySQL Scaling and High Availability Architectures

Hibernate Shards

Page 23: MySQL Scaling and High Availability Architectures

Hibernate Shards
• Sort of a merge between fixed key partitioning and directory-based partitioning
• “Virtual Shards” abstract the mapping of objects to shards, but simplistically
• It’s still painful to repartition
• It doesn’t handle rebalancing at all currently
• It doesn’t handle aggregation at all

Page 24: MySQL Scaling and High Availability Architectures

HiveDB

Page 25: MySQL Scaling and High Availability Architectures

HiveDB Project
• HiveDB is an Open Source project to design and implement the entire “standard” partition-by-key MySQL system in Java
• Originally envisioned by Jeremy while working with several customers
• Implemented by Fortress Consulting and CafePress along with help and guidance from Proven Scaling
• Many companies have built somewhat similar systems, but nobody has really open sourced them

Page 26: MySQL Scaling and High Availability Architectures

Why HiveDB?
• Many solutions that exist only solve the easy part: storing and retrieving data across many machines
• Nobody really touches the hard part: being able to rebalance and move users on the fly

Page 27: MySQL Scaling and High Availability Architectures

Server Architecture
• Hive Metadata
  - Partition definition
• Directory
  - Partition Key -> Partition mapping
  - Secondary Key -> Partition Key mapping
• Hive Queen - makes management and rebalancing decisions
• Job Server (Quartz) - actually executes tasks
• Aggregation Layer (future)

Page 28: MySQL Scaling and High Availability Architectures

Client Architecture
• Client uses the Hive API to request a connection for a certain partition key
• Client uses those direct connections to do work
• The Hive API should be written in each development language as necessary

Page 29: MySQL Scaling and High Availability Architectures

High Availability

Page 30: MySQL Scaling and High Availability Architectures

Goals
• Avoid downtime due to failures
• No single point of failure
• Extremely fast failover
• No dependency on DNS changes
• No dependency on code changes
• Allow for painless, worry-free “casual failovers” to upgrade, change hardware, etc.
• Fail-back must be just as painless

Page 31: MySQL Scaling and High Availability Architectures

MySQL Replication

Page 32: MySQL Scaling and High Availability Architectures

Basics
• MySQL replication is master-slave, one-way, asynchronous replication
• “Master” keeps logs of all changes – called “binary logs” or “binlogs”
• “Slave” connects to the master through the normal MySQL protocol on TCP port 3306
• Slave requests binary logs from the last position
• Master sends binary logs up to the current time
• Master keeps sending binary logs in real-time

Page 33: MySQL Scaling and High Availability Architectures

More Basics
• Replication works with all table types and (mostly) all features
• Any “critical” reads -- ones that cannot be allowed to return stale data -- must be done on the master; replication is asynchronous, so there may be a delay at any time

Page 34: MySQL Scaling and High Availability Architectures

Typical Setup
• One master (single source of truth)
• Any number of slaves
• Slaves are used for reads only
• All writes go to the master
• There are many other possibilities…

Page 35: MySQL Scaling and High Availability Architectures

Replication Topologies

Page 36: MySQL Scaling and High Availability Architectures

Master with One Slave

[Diagram: Master -> Slave]

Page 37: MySQL Scaling and High Availability Architectures

Master with Many Slaves

[Diagram: Master -> five Slaves]

Page 38: MySQL Scaling and High Availability Architectures

Master with Relay Slave

[Diagram: Master -> Relay Slave -> Slave]

Page 39: MySQL Scaling and High Availability Architectures

Master with Relay and Many Slaves

[Diagram: Master -> Relay Slave -> five Slaves]

Page 40: MySQL Scaling and High Availability Architectures

Master with Many Relays

[Diagram: Master -> several Relay Slaves, each feeding its own set of Slaves]

Page 41: MySQL Scaling and High Availability Architectures

Dual Masters

[Diagram: Master <-> Master]

Page 42: MySQL Scaling and High Availability Architectures

Dual Masters with Slaves

[Diagram: Master <-> Master, each side feeding Slaves, including a Relay Slave with its own Slaves]

Page 43: MySQL Scaling and High Availability Architectures

Ring (Don’t Use)

[Diagram: three Masters replicating in a ring]

Page 44: MySQL Scaling and High Availability Architectures

High Availability Options

Page 45: MySQL Scaling and High Availability Architectures

Dual Master
• Two machines with independent storage configured as master and slave of each other
• Optionally: any number of slaves, for reads only
• Manual (scripted) or automatic (heartbeat-based) failover is possible

Page 46: MySQL Scaling and High Availability Architectures

Dual Master Pros
• Very simple configuration
• Simple to understand = simple to maintain
• Very similar to the basic master-slave configuration that many are familiar with
• Allows easy failover in either direction without reconfiguration or rebuilding
• Allows for easy and reliable failover for non-emergency situations: upgrades, schema changes, etc.
• Allows for quick failover in an emergency
• Can work between distant sites fairly easily

Page 47: MySQL Scaling and High Availability Architectures

Dual Master Cons
• Does not help scale writes (no, not at all)
• Limited to two sites; replication does not allow multiple masters, so three or more is not possible
• Replication is asynchronous and may get behind -- there is always a chance of data loss (albeit small)

Page 48: MySQL Scaling and High Availability Architectures

SAN
• Shared storage of a single set of disks by two MySQL servers, with a single copy of the data on a FibreChannel or IP/iSCSI SAN
• Automatic (heartbeat) failover by fencing and mounting the SAN on the other machine

Page 49: MySQL Scaling and High Availability Architectures

SAN Pros
• A single copy of the data means lower storage cost for extremely large databases
• No worries about replication getting behind
• SAN systems can achieve very high performance for the same or lower cost as two very large RAIDs

Page 50: MySQL Scaling and High Availability Architectures

SAN Cons
• A single copy of the data means corruption is possible, and could be very damaging
• For medium or small databases, cost can be prohibitive
• FibreChannel requires additional infrastructure often not present in typical MySQL systems; iSCSI can be very helpful in this regard
• Single copy of the data -- no schema change tricks are possible

Page 51: MySQL Scaling and High Availability Architectures

DRBD
• Block device-level replication between two machines with their own independent storage (mirrors of the same data)
• Automatic (heartbeat-based) failover by fencing and mounting the local copy of the filesystem is typical

Page 52: MySQL Scaling and High Availability Architectures

DRBD Pros
• Simple hardware and infrastructure using locally-attached RAID
• No expensive hardware or network

Page 53: MySQL Scaling and High Availability Architectures

DRBD Cons
• Complex configuration and maintenance
• May cause performance problems, especially if poorly configured
• Failure of, or problems with, the mirror can cause problems in production
• From the software perspective, there is still a single copy of the data, which may get corrupted
• Single copy of the data -- no schema change tricks are possible

Page 54: MySQL Scaling and High Availability Architectures

Putting It All Together

Page 55: MySQL Scaling and High Availability Architectures

Partitioning + HA
• No partitioning solutions really address HA… they treat the “shards” or “partitions” as single MySQL servers
• In reality, you would implement an HA solution for each partition
• There are many possibilities

Page 56: MySQL Scaling and High Availability Architectures

HiveDB + Dual Master
• We recommend HiveDB plus Dual Master for most installations
• While not technically perfect, and with a chance of data loss, administrative tasks are very simple
• Additionally, LVM for volume management gives the ability to take snapshot backups easily

Page 57: MySQL Scaling and High Availability Architectures

Any questions? Discussion!