Scaling Data in Magento
DESCRIPTION
In this technical session, we will look at how you can scale the database horizontally behind Magento. We will discuss the reasons for scaling through replication and how this may affect your infrastructure, deployment and Magento implementation. Replication brings many benefits but also some pitfalls, such as potential problems with high rates of data change. We will present simple solutions using standard MySQL replication, and also using Tungsten by Continuent, to bring high availability and high performance to MySQL. This will be backed by real data and metrics from one of the highest volume Magento stores in the UK, showing how Magento can be deployed at scale with high availability to serve the UK, USA and Australia from a single implementation generating over $100 million in revenue.
TRANSCRIPT
Scaling Data in Magento
Alistair Stead, CTO
@alistairstead
Focused seasonal trends such as Cyber Monday have the capacity to melt tin
Server capacity can of course be expanded: hardware is inexpensive...
Back pressure
from lower systems will block PHP
When the DB is the bottleneck
adding more servers will only make it worse
I'm not a DBA
When your store is popular you have to learn
Don't just copy what is on Stack Overflow
identify your exact problem.
You have to learn to question the current state
You have to find people that can help!
After all, engineering is not a solitary affair
So you have your store up and running...
All web servers are scaled and running nicely...
You have Magento configured for optimal
running...
Now you have more traffic and things are slowing
down...
What do you do?
Well... ..?
You can't do anything without instrumentation
Development instrumentation: identify problems early
Production instrumentation
will show the real problems
We have metrics that say the DB is slowing us
down
We need to take some actions...
But first...a brief interlude
Scaling, high availability and redundancy
All related but separate things
Scaling:
The ability to function within acceptable limits as the number of users increases
High availability:
The ability to continue functioning during and after a failure
Redundancy:
Duplication of critical systems so as to have no single point of failure
In a mission critical application
all these need to be balanced
In commerce, conversion rates are directly affected by
the decisions made in these areas
Technology should facilitate conversion!
So where should we start?
The Apache / Nginx process is blocking
waiting on PHP...
PHP is waiting on the Database...
Step 1: Make your queries FASTer
MySQL indexes: identify missing indexes for a query
and speed up the result
Re-design queries: not recommended for core queries, but
sometimes you have to...
However, send a patch back to Magento for inclusion in the next release
Step 2: Cache as much as you can
Expand the query cache as much as you can
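As a rough illustration of spotting a missing index, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption made only so the example runs anywhere; on MySQL you would read the output of EXPLAIN instead. The table, column and index names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def query_plan(sql, params):
    """Return the planner's description of how the query will be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


# Without an index the planner has to scan the whole table.
plan_before = query_plan(QUERY, (7,))

# Add the missing index on the filtered column and check the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(QUERY, (7,))

print(plan_before)
print(plan_after)
```

The first plan reports a full scan, the second names the new index. The same before-and-after comparison applies to slow queries found in the MySQL slow query log.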
Can you fit your entire DB into memory?
Use Full Page Cache
Stating the obvious, but it protects the database at peak loads
Use proxy or edge caches
If you don't need to execute PHP, don't
At some point your cache MUST expire
On highly merchandised sites, caching is simply not as effective
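To make the expiry trade-off concrete, here is a minimal sketch of a cache with a time-to-live, written in Python. It is not how Magento's cache backends are implemented; the class, the loader function and the product data are all hypothetical. It only shows that every expiry sends one more read back to the database.

```python
import time


class TTLCache:
    """Tiny time-to-live cache: entries are reloaded once they expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the example can fake time
        self.store = {}             # key -> (value, expires_at)

    def get(self, key, load):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]         # cache hit: the database is not touched
        value = load()              # cache miss or expired: hit the database
        self.store[key] = (value, now + self.ttl)
        return value


db_hits = []


def load_product():
    """Hypothetical database read; records each call."""
    db_hits.append(1)
    return {"sku": "ABC-1", "price": 10}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

cache.get("product:1", load_product)   # miss: loads from the database
cache.get("product:1", load_product)   # hit: served from the cache
fake_now[0] = 61.0                     # the entry has now expired
cache.get("product:1", load_product)   # miss again: reloads

print(len(db_hits))
```

The shorter the lifetime needed to keep merchandising changes fresh, the more often the reload path runs, which is why caching helps less on highly merchandised sites.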
But this is all for read operations...
What about writing data?
Lock wait timeout...
Have you seen this in your exception log?
Step 4: Ensure all tables are InnoDB
Some legacy code will have created MyISAM tables
Increase lock_wait_timeout
Don't! This is an anti-pattern
Step 5: Transaction isolation level
Use READ COMMITTED
instead of the MySQL default of REPEATABLE READ
Step 6: Reduce transaction size
Your transaction is not committed?
You're waiting for external service calls or non-critical writes...
Step 7: Reduce non-critical write operations
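The idea of keeping a transaction small can be sketched as follows. This is an illustrative Python example using the built-in sqlite3 module in place of MySQL, not Magento code. The function names, tables and the "shipping quote" service are hypothetical. The slow external call happens before the transaction opens, and the non-critical write happens after it commits.

```python
import sqlite3


def fetch_shipping_quote(order_total):
    """Hypothetical slow external service call."""
    return round(order_total * 0.1, 2)


# isolation_level=None puts sqlite3 in autocommit mode, so transactions
# are opened and closed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, shipping REAL)"
)
conn.execute("CREATE TABLE audit_log (message TEXT)")


def place_order(total):
    # 1. Do the slow external work BEFORE opening the transaction,
    #    so no locks are held while waiting on the network.
    shipping = fetch_shipping_quote(total)

    # 2. The transaction contains only the critical write.
    conn.execute("BEGIN")
    try:
        cur = conn.execute(
            "INSERT INTO orders (total, shipping) VALUES (?, ?)",
            (total, shipping),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

    # 3. The non-critical write happens after the commit.
    conn.execute(
        "INSERT INTO audit_log (message) VALUES (?)",
        ("order %d placed" % cur.lastrowid,),
    )
    return cur.lastrowid


order_id = place_order(50.0)
print(order_id)
```

The shorter the window between BEGIN and COMMIT, the less time other requests spend waiting on the locks that transaction holds.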
Logging can be done somewhere else
<?xml version="1.0" encoding="UTF-8"?>
<frontend>
  <events>
    <controller_action_predispatch>
      <observers><log><type>disabled</type></log></observers>
    </controller_action_predispatch>
    <controller_action_postdispatch>
      <observers><log><type>disabled</type></log></observers>
    </controller_action_postdispatch>
    <customer_login>
      <observers><log><type>disabled</type></log></observers>
    </customer_login>
    <customer_logout>
      <observers><log><type>disabled</type></log></observers>
    </customer_logout>
    <sales_quote_save_after>
      <observers><log><type>disabled</type></log></observers>
    </sales_quote_save_after>
    <checkout_quote_destroy>
      <observers><log><type>disabled</type></log></observers>
    </checkout_quote_destroy>
  </events>
</frontend>
HTTP 101: Only modify state on HTTP POST
#TIP 1. This simple rule can help so many aspects of scaling
Off-load functionality to third parties
Logging and tracking can be handled elsewhere
Move data and logic to the client
If state has not changed then the client should know all it needs to know
Step 8: Asynchronous write operations
Use job queues
Non-critical write operations can be pushed to the queue
You then have to work with eventual consistency
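A job queue for non-critical writes might look roughly like the following Python sketch. It uses an in-process queue and a worker thread purely for illustration. A production store would more likely use a message broker and separate worker processes, and every name here is hypothetical.

```python
import queue
import threading

# In-memory stand-in for a log table; a real system would write to a database.
visitor_log = []
jobs = queue.Queue()


def worker():
    """Drain the queue and perform the deferred writes."""
    while True:
        job = jobs.get()
        if job is None:              # sentinel value: stop the worker
            jobs.task_done()
            break
        visitor_log.append(job)      # the slow, non-critical write happens here
        jobs.task_done()


def handle_request(customer_id, url):
    """Serve the request; enqueue the log write instead of doing it inline."""
    jobs.put({"customer_id": customer_id, "url": url})
    return "200 OK"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = handle_request(42, "/checkout/cart")
# Here the response is already available, but the log entry may or may not
# have been written yet: that is the eventual consistency trade-off.

jobs.join()       # wait until everything queued so far has been processed
jobs.put(None)    # ask the worker to stop
thread.join()

print(status, len(visitor_log))
```

The request returns as soon as the job is queued, so anything that reads the log must tolerate entries arriving slightly late.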
Step 9: Clustering & replication
Introduce a slave database: replicate data to the slave database
Use standard MySQL replication
Enable binary logging
Ensure you have compression enabled!
Or you will flood your internal network
Use MIXED Binary logging format
for quicker replication
STATEMENT Binary logging
Can cause PK clashes... in our experience...
Single threaded replication
Prior to MySQL 5.6 you only have one thread
Split read from write operations
Across the cluster
Write, read consistency
Can be resolved with module level connections
Module config.xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <global>
    <resources>
      <module_read>
        <connection>
          <use>core_write</use>
        </connection>
      </module_read>
    </resources>
  </global>
</config>
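The effect of pointing a module's read connection at the write connection can be modelled with a small Python sketch. This is a conceptual illustration of following `<use>` references, not Magento's actual resource resolution code. The host names are hypothetical.

```python
# Hypothetical connection definitions mirroring the XML resources above.
RESOURCES = {
    "core_write": {"host": "master.db.example"},
    "core_read": {"host": "replica.db.example"},
    # The module reads from the master so it always sees its own writes.
    "module_read": {"use": "core_write"},
}


def resolve_connection(name, resources=RESOURCES):
    """Follow 'use' references until a concrete connection is found."""
    seen = set()
    while "use" in resources[name]:
        if name in seen:
            raise ValueError("circular 'use' reference at %r" % name)
        seen.add(name)
        name = resources[name]["use"]
    return resources[name]


general_read_host = resolve_connection("core_read")["host"]
module_read_host = resolve_connection("module_read")["host"]

print(general_read_host)
print(module_read_host)
```

General reads still go to the replica, while the one module that cannot tolerate replication lag reads from the master. The cost is that this module's reads no longer scale out with the replicas.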
Cluster & replication options
Tungsten
Replicator
A multi-threaded replication process over MySQL
Connection manager
A smart connection manager that can filter based on query content
High availability
Connection manager provides active service discovery
Connection Manager
runs on every server and allows the master to float around the cluster
Hot production upgrades
MySQL can be configured or upgraded with zero downtime
The Master Database
Can be moved to any node without config changes or downtime
Service discovery
All servers connect to their own Connection Manager
Next steps...
One setting does not rule them all
Use many tuned connections for specific operation types
Alternate replication architecture
Fan-in, for example, allowing multiple masters
Sharding
The smart connector can re-write the query on the fly
Gotchas...
Turn off automatic security updates, or your cluster will FAIL
Ensure enough RAM for the transaction size
Do you have enough file descriptors?
These will be limited; ensure you have enough
Thank you!
Questions?
http://bit.ly/sdinmage