Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database
Tim Vaillancourt
Sr. Technical Operations Architect @ Percona
Agenda
● Synchronous versus Asynchronous Applications
● Scaling a Synchronous/Latency-sensitive Application
● Scaling an Asynchronous Application
● Efficient Usage of Data at Scale
  ○ Secondary/Slave Hosts
  ○ Caching
  ○ Queuing
● Efficient Usage of Data at Scale
  ○ Moving Expensive Work
  ○ Caching Techniques
  ○ Counters and In-memory Stores
  ○ Connection Pooling
Agenda
● Scaling Out (Horizontal) Tricks
  ○ Pre-Sharding
  ○ Kill Switches
  ○ Limits and Graphs
● Scaling with Hardware (Vertical Scaling)
● Testing Performance and Capacity
● Knowing Your Application and Questions to Ask at Development Time
● Questions
About Me
● Started at Percona in January 2016
● Experience
  ○ Web Publishing
    ■ Big-scale LAMP-based Websites
  ○ Ecommerce
    ■ Large Inventory SaaS
  ○ Gaming
    ■ DevOps
      ● 50-100 Microservices
      ● 5-7+ x Massive Launches / Year
      ● Design, launch and maintain apps
About Me
    ■ DBA at EA DICE
      ● 2 x New Titles
      ● 5+ x Legacy Titles
  ○ Technologies
    ■ MySQL
    ■ MongoDB
    ■ Cassandra
    ■ Redis and Memcached
    ■ RabbitMQ, Kafka and ActiveMQ
    ■ Solr and Elasticsearch
    ■ (Sort of) AWS, HDFS, HBase, Postgres, etc…
Services
Monolith
● One application that does everything
● Example: Chrome, MySQL, huge Python app
Microservice
● Apps with different purposes, pain points and SLAs are discrete services
● Often easier to scale/troubleshoot
● Reduces risk of outage
● Example: frontend PHP app, messaging app, encoding app, etc
In Practice
● Both can be scaled up and down with the right features
● Microservices offer more flexibility
● Monolith services bring problems at scale
Application Operations
Synchronous
● Blocking operation until success or failure
● Slower requests
● Example: a file uploading app
Asynchronous
● Request and response are separated
● Fast response time back to user/application
● Example: a social media site
Slow Operations
● Can cause pileups in a tiered system
Applications
Synchronous
● Pros: less code, always the right answer
● Cons: blocking operations and poorer efficiency
● Example: a file uploading app
Latency/Integrity Sensitive
● Pros: always the right answer
● Cons: fewer scalability tricks available
● Example: a stock trading app that cannot accept “slave lag”
Asynchronous
● Pros: light operations and more scalability
● Cons: eventual consistency (and sometimes more code)
● Example: a social media site
Types of Data Designs
Decentralised
● Data is duplicated in several places
● Pros: lighter to read, decreased locking, easy to shard
● Cons: increased storage space, extra duplication effort
Centralised
● Data is kept in one (or few) places and referenced
● Pros: less storage, one source-of-truth
● Cons: locking, inefficiencies, sharding issues
Balancing Request Impact
Read-focused Apps
● Benefit from
  ○ Values pre-computed at write/change-time
  ○ Indices and/or few “scans” for data
  ○ No/few JOINs/operations to get result
Write-focused Apps
● Benefit from
  ○ No pre-computing of values (compute at read-time)
  ○ No/few indices to update
  ○ Insert/Append > Update
  ○ Reads: compute read summaries with replicas, add indices to secondaries only, etc
Event Metadata
● Example: “UserX has the new top score!”
● Without Queue example
  ○ Update Top Score in Database(s)
  ○ Send Email to Friends
  ○ Post to Facebook Page
  ○ Update cache
  ○ ...
● With Queue example
  ○ Add event to queue ‘topscore’
  ○ Apps read queue
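The "With Queue" flow above can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: Python's in-process `queue.Queue` stands in for a real broker (RabbitMQ, Kafka, etc.), and the function names `publish_top_score`, `update_database`, `email_friends`, `update_cache` and `drain` are hypothetical.

```python
import queue

# Hypothetical in-process stand-in for a real message broker.
topscore_queue = queue.Queue()

def publish_top_score(user, score):
    """Request path: enqueue one small event and return immediately."""
    topscore_queue.put({"user": user, "score": score})

# Hypothetical consumers; in practice each would be a separate app.
def update_database(event):
    return f"db: top score = {event['score']} ({event['user']})"

def email_friends(event):
    return f"email: {event['user']} has the new top score!"

def update_cache(event):
    return f"cache: topscore -> {event['score']}"

CONSUMERS = [update_database, email_friends, update_cache]

def drain(q):
    """Worker loop: read events and hand each one to every consumer."""
    results = []
    while not q.empty():
        event = q.get()
        for consumer in CONSUMERS:
            results.append(consumer(event))
    return results

publish_top_score("UserX", 9001)
log = drain(topscore_queue)
```

The user-facing request does one cheap enqueue; the database, email and cache work happens later in the worker. With a real broker, each consuming app would normally read the event through its own subscription rather than through a shared loop.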
Queuing Updates
Update Buffering
● Scenario: there is a high rate of updates to buffer
● Queue-based example
  ○ App adds to update buffer (queue)
  ○ Worker app works from the bottom of the buffer
● Queue Operational Benefits
  ○ Absorbs spikes in traffic
  ○ Tolerates backend downtime
  ○ Acts as a communication bus
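A rough sketch of update buffering, under some assumptions not stated on the slide: the buffer is a plain in-memory `deque` (a real deployment would want a durable queue), the worker drains oldest-first, and repeated updates to the same key are collapsed so that only the latest value is written. `buffer_update` and `flush` are hypothetical names.

```python
from collections import deque

# Hypothetical update buffer; not durable, for illustration only.
update_buffer = deque()

def buffer_update(key, value):
    """App side: append the update and return immediately."""
    update_buffer.append((key, value))

def flush(store, max_batch=100):
    """Worker side: take the oldest updates first and apply them as one batch."""
    batch = {}
    while update_buffer and len(batch) < max_batch:
        key, value = update_buffer.popleft()
        batch[key] = value  # a later update to the same key overwrites the earlier one
    store.update(batch)     # one backend write instead of many
    return len(batch)

database = {}  # stand-in for the backend
for i in range(5):
    buffer_update("score:UserX", i)
buffer_update("score:UserY", 42)
applied = flush(database)
```

Six buffered updates reach the backend as a single write covering two keys. If the backend is down, the app can keep appending and the worker simply flushes later.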
Scaling Sync./Latency-Sensitive Apps
● Rethink the Flow Using Async
● Use lots of database RAM
● Shard the database
● Reduce impact of request flow
● Apache Cassandra
  ○ Synchronous
  ○ Very write optimized
● Percona XtraDB Cluster, NDB
● Use memory-based storage
  ○ Queue persistence to database
Expensive DB Work
● Focus on lightweight user-facing operations
● Move aggregations/summaries/reporting to background
● Use replicas for expensive jobs
● Avoid or reduce (maybe cache) “JOINs”
● Enable and monitor metrics
  ○ MySQL
    ■ log_queries_not_using_indexes
  ○ MongoDB
    ■ Enable operationProfiling
  ○ Review metrics and improve!
  ○ Percona Monitoring and Management
Efficient Usage of Data at Scale
Caching / In-Memory Stores
● Alleviates load from database
● Very fast lookups
● Low connection overhead
  ○ MySQL connection buffers: ~1MB+
  ○ MongoDB connection buffers: ~1MB
  ○ Redis or Memcache connection buffers: 0-limit/infinity**
● Server-Side
  ○ Hit/Miss Caching
    ■ If something is not in the cache: find + add it. TTL expiry
  ○ Inline/Preemptive Caching
    ■ Update/Delete cache data at change time/preemptively
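The two server-side styles above might look like this in code. This is only a sketch: `TTLCache` is a tiny in-process stand-in for Memcached or Redis, and `db`, `read_item` and `write_item` are hypothetical names invented for the example.

```python
import time

class TTLCache:
    """Tiny in-process stand-in for Memcached/Redis (assumption for the sketch)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # TTL expiry
            return None
        return value

    def set(self, key, value, ttl=60.0):
        self._data[key] = (value, time.monotonic() + ttl)

db = {"item:1": "blue widget"}  # hypothetical backing database
db_reads = 0
cache = TTLCache()

def read_item(key):
    """Hit/miss caching: on a miss, find the value and add it to the cache."""
    global db_reads
    value = cache.get(key)
    if value is None:
        db_reads += 1
        value = db[key]
        cache.set(key, value)
    return value

def write_item(key, value):
    """Inline caching: update the cache at change time, before anyone reads."""
    db[key] = value
    cache.set(key, value)

read_item("item:1")  # miss: reads the database
read_item("item:1")  # hit: served from the cache
write_item("item:1", "red widget")
```

Hit/miss caching pays one database read per expired key; inline caching avoids even that, at the cost of extra work on every write.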
Efficient Usage of Data at Scale
Caching / In-Memory Stores (continued)
  ○ Client-Side
    ■ Cache client data in the client app/browser/etc
  ○ In-memory Stores
    ■ Memcached
    ■ Redis
    ■ Percona Server for MongoDB with Memory Engine :)
  ○ Use TTLs to trim data
Efficient Usage of Data at Scale
Storing Numerical Counters and Stats
● Offload to in-memory stores
  ○ Incremented/decremented counters
  ○ Aggregations, summaries, counts
● Count-style Queries to Counters
  ○ Increment counter at request/change time
  ○ Read counter value at read-request time
  ○ Or, try to use an index
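As an illustration of turning a count-style query into a counter, here is a minimal sketch. A `collections.Counter` stands in for the in-memory store (with Redis the equivalent commands would be INCR, DECR and GET); the `friendships` list and the function names are hypothetical.

```python
from collections import Counter

# Stand-in for an in-memory store such as Redis or Memcached.
counters = Counter()
friendships = []  # hypothetical "database table"

def add_friend(user, friend):
    friendships.append((user, friend))
    counters[f"friends:{user}"] += 1  # increment at change time

def remove_friend(user, friend):
    friendships.remove((user, friend))
    counters[f"friends:{user}"] -= 1  # decrement at change time

def friend_count(user):
    """Read the counter instead of running a COUNT-style query."""
    return counters[f"friends:{user}"]

add_friend("alice", "bob")
add_friend("alice", "carol")
remove_friend("alice", "bob")
```

The read path becomes a single key lookup no matter how many rows the underlying table holds.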
Efficient Usage of Data at Scale
Connection Pooling
● Removes 3-way TCP “handshake” from request (more w/SSL)
● Reduces threading overhead on databases
● Proxies on App server localhost/loopback
  ○ Reduces 1 x TCP ‘hop’, ie: faster connect time
  ○ Can create a LOT of DB connections with many app servers
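The idea behind pooling can be sketched in a few lines. This is an illustration under stated assumptions, not a production pool: `ConnectionPool` is a hypothetical class, and in-memory SQLite connections stand in for real MySQL/MongoDB connections (so there is no actual TCP handshake here, only the "open once, reuse many times" pattern).

```python
import queue
import sqlite3

class ConnectionPool:
    """Opens `size` connections once and lends them out on demand."""
    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self):
        return self._idle.get()  # blocks if every connection is in use

    def release(self, conn):
        self._idle.put(conn)

opened = 0

def connect():
    """Counts how many connections are actually opened."""
    global opened
    opened += 1
    return sqlite3.connect(":memory:")

pool = ConnectionPool(size=2, connect=connect)
results = []
for _ in range(10):
    conn = pool.acquire()
    try:
        results.append(conn.execute("SELECT 1").fetchone()[0])
    finally:
        pool.release(conn)
```

Ten requests are served by two connections. Note that the pool size applies per app server, so total database connections grow as the pool size multiplied by the number of app servers.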
Efficient Usage of Data at Scale
Connection Pooling (continued)
● MySQL Proxies
  ○ ProxySQL
  ○ HAProxy
  ○ MaxScale
  ○ Others…
● MongoDB Proxies
  ○ mongos (sharding) process
● Proxy-on-Localhost or direct is fastest
Virtualization
● Pretends to be a real computer from BIOS up
● OS + Software run under a hypervisor layer
● Pros
  ○ Full hardware-level emulation, eg: CentOS, Red Hat, Win 10
  ○ Automation of platform (sometimes)
● Cons
  ○ Emulation overhead
  ○ Slow boot-up time
  ○ Lots of OSs to update
Virtualization, Containers, etc
Containers (cgroups, jails)
● Several can run inside a single operating system and kernel
● Offers controls to limit resources like RAM, CPU time, etc
● Pros
  ○ Low overhead
  ○ Container creation is very fast
Virtualization, Containers, etc
Mesos, Kubernetes, etc
● Make a lot of servers distribute work, containers, etc
● Apache Mesos: “Distributed systems kernel”
  ○ Agent on every host and manager servers give out work
● Kubernetes
Virtualization, Containers, etc
Many Processes per Host
● Run un-related processes on hosts
● Add/remove from load balancers
● Not advised for disk-bound or high-bandwidth apps
Scaling Out Tricks
Sharding
● Techniques
  ○ Modulus
    ■ Even distribution of keys
    ■ Hard to reshape data
  ○ Map-based
    ■ 1-to-1 shard mapping using another table, config, etc
    ■ Easy to reshape data
● Launch with many shards in advance
  ○ 1-4 MySQL/MongoDB Instance/host
  ○ 1 MySQL/MongoDB Instance/host, 4 x databases as shards
  ○ 1 MySQL/MongoDB Instance/host, small hardware
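A small sketch of the two techniques, to show why modulus sharding is hard to reshape while a map is easy. The shard names, user IDs and function names are made up for illustration.

```python
NUM_SHARDS = 4

def modulus_shard(user_id, num_shards=NUM_SHARDS):
    """Modulus: even spread, but changing num_shards remaps most keys."""
    return user_id % num_shards

# Map-based: an explicit lookup (another table, config, etc.).
shard_map = {1001: "shard-a", 1002: "shard-b", 1003: "shard-a"}

def mapped_shard(user_id):
    return shard_map[user_id]

def move_user(user_id, new_shard):
    """Reshaping is a one-entry change in the map (after copying the data)."""
    shard_map[user_id] = new_shard

# How many of 1000 keys land on a different shard when going from 4 to 5 shards?
ids = range(1000)
moved = sum(1 for i in ids if modulus_shard(i, 4) != modulus_shard(i, 5))

move_user(1003, "shard-c")
```

In this example 800 of the 1000 keys change shard when a fifth shard is added under modulus, whereas the map-based move touches exactly one entry. Launching with many shards in advance avoids having to change the modulus later.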
Scaling Out Tricks
Sharding
(Diagrams: modulus-based sharding and map-based sharding)
Scaling Out Tricks
Hardware
● Have a strategy to add/remove capacity quickly
  ○ Cloud Instances
  ○ Mesos/Kubernetes
  ○ Automation
● Use cheap application servers for in-memory stores and apps
● Launch with lots of RAM, scale down post-launch
Scaling Out Tricks
Elasticity
● Ensure there is a way to add/remove hosts, examples:
  ○ Load Balancers
    ■ Good health-checks are important
  ○ Application Configs
    ■ File
    ■ Database
    ■ Zookeeper
Scaling Out Tricks
At Launch...
● Scale-out
  ○ Keep spare servers online, partially configured
  ○ Launch with extra database replicas (slave/secondary)
  ○ Monitor usage and remove extra hardware post-launch
  ○ Monitor and adjust capacity
● Scale-up
  ○ Launch with lots of RAM
● Traffic Control
  ○ Launch one region at a time
  ○ Launch with rate limits
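The slides do not say how to implement rate limits; one common approach is a token bucket, sketched here as an assumption rather than the presenter's method. The `TokenBucket` class is hypothetical, and the clock is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(5)]  # 5 requests at the same instant
later = bucket.allow(now=1.0)                      # one request a second later
```

The first three requests of the burst pass and the next two are rejected; after a second the bucket has refilled enough to admit traffic again. Raising `rate` gradually after launch is one way to open up capacity in a controlled fashion.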
Scaling Out Tricks
Application “Kill switches”
● A switch to disable certain app features/functions
● Useful when there is:
  ○ Too much traffic/scale-up
  ○ DDoS
  ○ Maintenance
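A kill switch can be as simple as a flag checked on the request path. The sketch below assumes a plain dictionary as the flag store (in practice this might be a config file, database row or ZooKeeper node); the feature names and `render_product_page` are hypothetical.

```python
# Hypothetical flag store: True means the feature has been switched off.
kill_switches = {"recommendations": False, "friend_feed": False}

def feature_enabled(name):
    return not kill_switches.get(name, False)

def render_product_page(product):
    """Always serves the core page; optional features respect their switch."""
    page = {"product": product}
    if feature_enabled("recommendations"):
        page["recommendations"] = ["expensive", "query", "results"]
    return page

normal = render_product_page("widget")
kill_switches["recommendations"] = True  # operator flips the switch under load
degraded = render_product_page("widget")
```

The core page keeps working while the expensive feature is off, so the switch trades functionality for capacity without a deploy.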
Scaling Out Tricks
Limiting Graph Structures
● “Friends” / “Followers” features are often graphs
● If Katy Perry or Barack Obama used your “friends” feature…
● Limit the size of graphs, or queue events for fan-out updating
Scaling Out Tricks
Batching and Parallel Work
● Do large queries in parallel
  ○ Modern CPUs have many cores (2, 4, 8+)
  ○ 1 connection = 1 thread = 1 CPU core
● Batch inserts/updates
  ○ 1 x update with 1000 items > 1000 x updates with 1 item
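The batching point can be illustrated with Python's built-in `sqlite3` module, used here only as a convenient stand-in for MySQL; the `orders` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
rows = [(i, i * 1.5) for i in range(1000)]

# One batched call and one commit, rather than 1000 single-row round trips.
with conn:
    conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Against a networked database the gain is usually larger than with SQLite, since each separate statement also pays a network round trip and often its own commit.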
Scaling Up Tricks
● Test provider turn-around time on hardware upgrading
● Test application performance on improved hardware in advance
● Scale up only resources needed
Databases
General
● Monitoring/reviewing slow queries reduces most inefficiencies
● More memory will reduce disk requests
● SSDs will reduce disk request time
● Proper database and kernel tunings will help further
  ○ Linux has very inefficient defaults!
● Try to use real local-disks, not EBS, NFS, etc
Queries
● Don’t try to make MySQL/MongoDB a queue or search engine!
● Decentralizing data and pre-computing answers for reads will take you far
● The best query is no query (cache)
Databases
Sharding
Testing Performance and Capacity
General
● Try to emulate the real user traffic
● Add micro-pauses to simulate reality
● Cloud-based providers are great for running load generation
Applications
● Component testing
  ○ Test the max volume of each component on a single host
  ○ Test the max volume of each component on many hosts
  ○ Calculate host scalability, ie: “+1 host = +80% more traffic”
● Feature capacity
  ○ Test the impact of each feature if not separate
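The host-scalability figure can be worked out from two load-test measurements. The sketch below assumes a simple linear model (every added host contributes the same fraction of one host's capacity) and uses made-up throughput numbers; real systems often scale less evenly.

```python
def scaling_efficiency(single_host_rps, n_hosts, measured_rps):
    """Average extra throughput per added host, as a fraction of one host."""
    return (measured_rps - single_host_rps) / ((n_hosts - 1) * single_host_rps)

def hosts_needed(target_rps, single_host_rps, efficiency):
    """Smallest host count whose projected capacity meets target_rps."""
    hosts, capacity = 1, single_host_rps
    while capacity < target_rps:
        hosts += 1
        capacity += single_host_rps * efficiency
    return hosts

# Hypothetical test results: 1 host handles 1000 req/s, 4 hosts handle 3400 req/s.
eff = scaling_efficiency(single_host_rps=1000, n_hosts=4, measured_rps=3400)
needed = hosts_needed(target_rps=10000, single_host_rps=1000, efficiency=eff)
```

Here each extra host adds about 80% of a single host's throughput, so reaching 10000 req/s takes 13 hosts rather than the 10 a naive estimate would suggest.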
Testing Performance and Capacity
Databases
● Replay real user traffic on real backups
● Load test tools: Linkbench, Sysbench, TPCC, JMeter, etc
● Single feature/query testing
  ○ Understand host capacity per feature, eg: “2000 user login queries/sec per db replica”
● Know your slowest query!
Development-time Questions
General
● What does the app do?
● If I break X, what happens?
● Are connections to data stores “pooled”?
Replicas
● Can the app use replicas (with possible lag)?
  ○ Tip: start early, deploy replication from the start
● Can we Add/Remove replicas without disruption?
Sharding
● Can the app understand shards/partitions?
● How is data balanced post-sharding?
● Are there cross-shard references?
Development-time Questions
Caching
● What data can be cached?
● Will a change be read immediately?
  ○ Can we pre-cache this change?
● When should the cache delete an item?
  ○ Can we set TTLs on our keys?
● How do we add/remove cache servers easily?
Knowing Your App
If you see…
● The app is write heavy
  ○ Remove overhead from immediate write path
  ○ Batch writes if possible
● The app is read heavy
  ○ Reduce scans/operations from the read path (index, etc)
  ○ Add as many replicas (slave/secondary) as needed
● The app queries for counts often, ie: # of items, friends, etc
  ○ Move count-queries to incremented in-memory counters
  ○ Or, create an index for the count query
● The app uses references or joins often
  ○ Consider decentralising the data (with fan-out updates)
Themes
● Make all features, apps, databases elastic
● Request Flow
  ○ Make the heavy workload easy / make the light workload hard
  ○ Move graph updates to background (queues, async, etc)
  ○ Move ‘counts’ to counters
● Caching
  ○ Cheaper/faster to access than DB
  ○ Try to cache before anyone reads data
● Queues
  ○ Great for replicating events while simplifying update
  ○ Great for batching changes
● Monitor everything! Try Percona Monitoring and Management!
Join us at Percona Live Europe
When: October 3-5, 2016
Where: Amsterdam, Netherlands
The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.
● Get briefed on the hottest topics
● Learn about building and maintaining high-performing deployments
● Listen to technical experts and top industry leaders
Use promo code “WebinarPLAM16” and receive €15 off the current registration price!
Sponsorship opportunities available as well here.
Questions?
Thanks for joining! Be sure to check out the Percona Blog for more technical blogs and topics!