scaling out mysql - hardware today and tomorrow

Post on 12-Nov-2014


Scaling out MySQL: Hardware today and tomorrow

Jeremy Cole, Eric Bergen
{jeremy,eric}@provenscaling.com

Overview
• A look at hardware out there today
• What’s important for MySQL?

The big questions

What about 64-bit?
• Make absolutely everything 64-bit
• Every server you buy now will have 64-bit CPUs
• Except in a few corner cases, it won’t hurt, but may not help
• For MySQL servers, it will absolutely make your life easier
• Caveat: If you use third-party software, it may not work properly due to library issues etc.

How many cores?
• MySQL has problems scaling on many-core CPUs
• Peter Zaitsev and Mark Callaghan have addressed the issues many times in blog posts and conference sessions
• We normally recommend dual dual-core or dual quad-core
• Unless you are highly concurrent and CPU-bound, dual dual-core at a faster clock speed should perform better than dual quad-core at a slower clock speed

How much memory?
• As much as you can!
• Memory is quite cheap these days, as 4GB DIMMs have come down in price by about 80% or more
• Typical servers can hold up to 32GB, go for it!

Shared storage?
• This is usually the biggest question: Should I buy this big expensive SAN, or should I put some disks in RAID in each server?
• Shared storage places a lot of trust in a single system
• Reliability can be more difficult to achieve when a single system failure affects multiple other systems
• Storage shared across many tasks will make it very difficult to provide reliable service to MySQL
• I/O latency is much higher on SAN or NAS systems

Which vendor?
• Major server vendors: Dell, HP, IBM, Sun
• Smaller server vendors: SuperMicro, Rackable, Silicon Mechanics, iX Systems, etc.
• Bigger vendors can generally provide equipment much faster in a pinch
• Bigger vendors will have an easier time providing the same type of machines over a longer period of time
• Smaller vendors may be more willing to work with you on custom configurations or special needs

Acronym Soup

RAID
• “Redundant Array of Inexpensive Disks”
• Different RAID levels: 0, 1, 5, 10 are common
• For databases, 5 and 10 are the most common
• Can be connected via IDE, SATA, SCSI, SAS
• Can be internal or external (“shelf”)
• Can be implemented in hardware (LSI, 3ware, Adaptec, etc.) or software (Linux kernel, etc.)

RAID: Common Levels
• RAID 0 - Striping
• RAID 1 - Mirroring
• RAID 10 (1+0) - Mirroring + Striping
• RAID 0+1 - Striping + Mirroring
• RAID 5 - Distributed Parity
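
To make the trade-off between these levels concrete, here is a minimal sketch of usable capacity per level. It assumes equal-sized disks and the textbook layouts (one disk's worth of parity for RAID 5, half the disks as mirror copies for RAID 10 and 0+1). The function name `usable_capacity_gb` is made up for illustration, and the 8 x 73GB figures are borrowed from the example machine later in the deck.

```python
def usable_capacity_gb(level, disks, disk_gb):
    """Return usable capacity in GB for an array of equal-sized disks.

    Textbook layouts only; real controllers may reserve extra space.
    """
    if level == "0":               # striping: every disk holds unique data
        minimum, data_disks = 2, disks
    elif level == "1":             # mirroring: every disk holds the same data
        minimum, data_disks = 2, 1
    elif level in ("10", "0+1"):   # half of the disks are mirror copies
        if disks % 2:
            raise ValueError("RAID %s needs an even number of disks" % level)
        minimum, data_disks = 4, disks // 2
    elif level == "5":             # one disk's worth of space holds parity
        minimum, data_disks = 3, disks - 1
    else:
        raise ValueError("unsupported RAID level: %r" % level)
    if disks < minimum:
        raise ValueError("RAID %s needs at least %d disks" % (level, minimum))
    return data_disks * disk_gb


if __name__ == "__main__":
    for level in ("0", "10", "0+1", "5"):
        print("RAID %-3s 8 x 73GB -> %d GB usable"
              % (level, usable_capacity_gb(level, 8, 73)))
```

With eight 73GB disks this gives 584GB for RAID 0, 292GB for RAID 10 or 0+1, and 511GB for RAID 5. Capacity is only one axis; the sketch says nothing about write performance or rebuild behavior.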

DAS
• “Direct-Attached Storage”
• Usually refers to a set of many RAIDed disks
• RAID isn’t necessarily a prerequisite to being DAS, you could have a JBOD DAS
• “Direct-Attached” because it’s attached to the host that will use the disks, not to a “headend” or other interim host

JBOD
• “Just a Bunch of Disks”
• Disks that are not RAIDed or part of a SAN or NAS system
• The OS will see each individual disk and is responsible for combining them if necessary (using e.g. software RAID or LVM)

BBWC, TB[B]U
• “Battery-Backed Write Cache”
• “Transportable Battery [Backup] Unit”
• A cache to hold writes while queuing them to be written to the actual disks
• Usually present in RAID cards
• Almost always present in SAN or other solutions

BBWC: Write Back vs. Through
• A BBWC can be in “write back” or “write through” mode
• “Write Back” uses the cache without writing the data to physical disk immediately (very dangerous without working battery) -- but drastically increases performance on sequential, individually committed writes (such as binary logs, InnoDB logs)
• “Write Through” requires data to be written to the physical disk before acknowledging writes -- but is slow
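
A rough way to see why write through is slow for individually committed log writes is the back-of-envelope below. It assumes a simplified model that is not from the slides: each commit must wait about one full platter rotation before the next log block can be written, and group commit and the drive's own cache are ignored. The function names are illustrative only.

```python
def rotation_time_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60000.0 / rpm


def write_through_commit_ceiling(rpm):
    """Rough ceiling on serialized commits per second in write-through mode.

    Assumption: every commit waits roughly one full rotation, so the
    ceiling is simply the number of rotations per second.
    """
    return rpm / 60.0


if __name__ == "__main__":
    for rpm in (7200, 10000, 15000):
        print("%5d RPM: %.1f ms per rotation, ~%.0f commits/s ceiling"
              % (rpm, rotation_time_ms(rpm), write_through_commit_ceiling(rpm)))
```

Under this model a 15k RPM disk tops out at around 250 individually committed writes per second, whereas a write-back cache can acknowledge the write as soon as it is in battery-backed memory.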

SAN
• “Storage Area Network”
• Generally either FC (Fibre Channel) or iSCSI (SCSI over IP, often via Gigabit Ethernet)
• Provides a volume to the host as a block device
• SANs are typically shared by many machines, but each volume on a SAN is normally only used by one host (“initiator”) at a time
• SANs may provide the ability to take copy-on-write snapshots to the host

NAS
• “Network Attached Storage”
• Generally NFS and/or CIFS
• Provides the host a view of files via a high-level export protocol
• NASes are typically shared by many machines, and a single volume may be shared by many hosts
• NASes coordinate access to files

Out with the old: PATA, SCSI
• “Parallel ATA”
• Older host interface, primarily used in desktop machines

• “SCSI” :) (ok, technically “Small Computer System Interface”)
• Older host interface, primarily used in servers
• Allows for hot swapping
• High pin count, requires terminators, etc.

In with the new: SATA, SAS
• “Serial ATA”
• New version of ATA using a serial protocol at 1.5 Gbps and 3.0 Gbps
• Very low pin count, simple cables, hot swappable

• “Serial Attached SCSI”
• Same basic host interface as SATA
• SAS hosts can connect to SATA disks seamlessly
• SAS has additional features, such as multiple attachment, and a richer command set

SSD
• “Solid State Disk”
• Uses flash memory to store data
• Capable of very low latency for random “seek”
• Commercially available versions are much better suited to high random read environments than random writes
• Kevin Burton did lots of research on available SSDs; conclusions: not fast enough for high random write environments yet, and InnoDB needs work to really take advantage

MySQL Stuff

Typical MySQL Requirements
• Assuming high write needs, fairly large database
• BBWC to allow InnoDB to commit without disk head movement
• Lots of memory to allow for a large InnoDB buffer pool
• Storage with low latency and high random write throughput
• Decent (but not awesome) CPUs

Memory Allocation
• Assuming an InnoDB-only system
• We normally recommend allocating system memory minus perhaps 2GB to the InnoDB buffer pool
• Very little memory needed for anything else - really!
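
The rule of thumb above is simple arithmetic; a minimal sketch follows. It assumes whole gigabytes and an InnoDB-only server, as the slide does. The helper names are made up for illustration, while `innodb_buffer_pool_size` is the actual MySQL option and accepts a `G` suffix.

```python
def innodb_buffer_pool_gb(system_memory_gb, reserved_gb=2):
    """Buffer pool size under the 'system memory minus ~2GB' rule of thumb."""
    if system_memory_gb <= reserved_gb:
        raise ValueError("not enough memory to reserve %dGB" % reserved_gb)
    return system_memory_gb - reserved_gb


def my_cnf_line(system_memory_gb, reserved_gb=2):
    """Format the corresponding my.cnf setting."""
    size = innodb_buffer_pool_gb(system_memory_gb, reserved_gb)
    return "innodb_buffer_pool_size = %dG" % size


if __name__ == "__main__":
    # A 32GB server, as in the "How much memory?" slide.
    print(my_cnf_line(32))
```

For a 32GB server this yields `innodb_buffer_pool_size = 30G`. Treat the 2GB reserve as a starting point: a server that also runs other daemons or uses MyISAM tables will need more headroom.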

Shared vs. Independent
• Shared storage systems can be used in combination with Linux HA to achieve failover
• Independent storage can be used in combination with MySQL replication to achieve failover
• On shared storage systems, failover will require a recovery of MySQL databases
• On independent storage, failover can be nearly instantaneous

An Example Machine
• Dell PowerEdge 2950
• Dual Quad Core E5430 @ 2.66GHz
• 32GB 667MHz RAM
• 8 x 73GB 15k RPM 2.5” SAS
• Dual power supplies
• Rack mount kit

• List price: $8400
• Real price: ~$6k
• Power consumption: typical 440W (3.83A @ 115V)
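
The current figure follows directly from the wattage (amps = watts / volts), which is handy when planning rack power. Below is a small sketch of that arithmetic. The circuit size and the 80% continuous-load derating in `machines_per_circuit` are hypothetical planning inputs, not figures from the slides.

```python
def amps(watts, volts):
    """Current draw in amperes for a given power draw and supply voltage."""
    return watts / float(volts)


def machines_per_circuit(circuit_amps, watts, volts, derate=0.8):
    """How many machines fit on one circuit.

    circuit_amps and derate are hypothetical inputs; check the real
    limits with your data center.
    """
    return int((circuit_amps * derate) // amps(watts, volts))


if __name__ == "__main__":
    print("%.2f A per machine" % amps(440, 115))
    print("%d machines on a 20A circuit" % machines_per_circuit(20, 440, 115))
```

At 440W and 115V this reproduces the 3.83A on the slide, and under the assumed 20A circuit with 80% derating, four such machines fit on one circuit.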

Special Hardware

Kickfire
• Execute queries on a SQL-processing custom chip
• Massive access to large memories
• Very cool tech!

Violin Memory
• Half a TB of DRAM in a 2U
• Accessible as a block device
• Very cool tech as well!

High-speed Interconnects
• InfiniBand
• Dolphin Interconnect
• Both very interesting for clustered systems, providing low-latency, high-throughput network access
• Software has to be written specifically to use either
