advanced deployment
DESCRIPTION
Advanced Deployment by Jonathan Weiss, presented at Scotland on Rails 2009 in Edinburgh. Deployment and scaling best practices. See more at http://scotlandonrails.com/schedule/28-march/advanced-deployment/
TRANSCRIPT
Advanced Deployment Scotland on Rails 2009
Jonathan Weiss, 28 March 2009
Peritor GmbH
2
Who am I?
Jonathan Weiss
• Consultant for Peritor GmbH in Berlin
• Specialized in Rails, Scaling, Deployment, and Code Review
• Webistrano - Rails deployment tool
• FreeBSD RubyGems and Ruby on Rails maintainer
http://www.peritor.com
http://blog.innerewut.de
3
Deployment
• Architecture
• Process
4
Deployment Process Requirements
• Automatic
• Reproducible
• Accountable
• Notifications
5
Deployment Tools
Several tools available
• Capistrano
• Webistrano
• Vlad
• Puppet
• Chef
The deployment process is usually not that complicated
6
Architecture
7
How deployment starts out …
8
… and how it ends
9
Agenda
Search
Background Processing
Scaling the database
Multiple Client Installations
Cloud Infrastructure
10
General Advice
Simple is better than complex
11
Search
12
Search
Full text search
Can become very slow on big data sets
13
Full Text Search Engine
Separate Service
• Creates full text index
• Application queries search daemon
• Index update through application or database
Possible Engines
• Ferret
• Sphinx
• Solr
• Lucene
• …
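The "separate service" idea can be illustrated with a toy inverted index. This is a minimal sketch, not how Ferret, Sphinx, Solr, or Lucene work internally: the class name `SearchIndex` and its methods are hypothetical, and a real engine adds tokenizing, stemming, ranking, and persistence.

```python
from collections import defaultdict


class SearchIndex:
    """Toy in-memory full-text index standing in for a separate search daemon."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of documents containing it

    def update(self, doc_id, text):
        """Index (or re-index) one document; called by the application on save."""
        for ids in self._postings.values():
            ids.discard(doc_id)  # drop stale postings from an earlier version
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def query(self, terms):
        """Return the ids of documents containing every term (AND semantics)."""
        sets = [self._postings.get(t.lower(), set()) for t in terms.split()]
        return sorted(set.intersection(*sets)) if sets else []
```

The application calls `update` whenever a record changes and `query` instead of running a slow `LIKE` scan against the main database.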
14
Search Slave
Database replication slave
• Has complete dataset
• Takes slow search queries off the master
• Can use different database table engine
15
Database Index
PostgreSQL Tsearch2
• Core since 8.3
• Allows creating a full-text index on multiple columns or arbitrary SQL expressions
MySQL MyISAM FULLTEXT index
• Only works with MyISAM tables (InnoDB gained FULLTEXT support only in MySQL 5.6)
• Full text index on multiple columns
16
What to use?
Different characteristics
• Real-time updates and stale data
• Lost updates
• Performance
• Document content and format
• Complexity
17
Background Processing
18
Problem
Long running tasks
• Resizing uploaded images
• Mailing
• Computing an expensive operation
• Accessing slow back-ends
When run inside the request-response cycle
• Blocks user
• Blocks Rails instance
• Hard to monitor and debug
19
Solution
Asynchronous processing in the background
Message/Queue Scheduler
20
Background Processing
21
Options
Options for message bus:
• Database
• Amazon SQS
• DRb
• Memcache
• ActiveMQ
• …
Options for background process:
• (Ruby) Daemon
• Cron job with script/runner
• Forked process
• Delayed Job / BJ / (BackgrounDRb)
• run_later
• …
22
Database/Ruby daemon example
23
Scaling the database
24
Scaling the database
One database for everything
• All domain data in one place
• The simplest solution
Problems at some point
• Number of read and write requests
• Data size
25
Scaling the database
Read Slave
• Slave replays each SQL statement executed on the master
• Increase read performance by reading from replicating slave
• Stale read problem
• Better used explicitly, although that forces you to think about every read
Often better: use memcached
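The stale read problem and the "use explicitly" advice can be shown with a toy model. This is only a sketch: `ReplicatedStore` is a hypothetical class that uses plain dictionaries instead of real databases, and `sync` stands in for replication lag.

```python
class ReplicatedStore:
    """Toy master/slave pair; replication is applied only when sync() runs."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self._log = []  # statements not yet replayed on the slave

    def write(self, key, value):
        self.master[key] = value
        self._log.append((key, value))

    def read(self, key, from_slave=False):
        """Read from the master unless the caller explicitly accepts stale data."""
        source = self.slave if from_slave else self.master
        return source.get(key)

    def sync(self):
        """Replay the pending statements on the slave, in order."""
        for key, value in self._log:
            self.slave[key] = value
        self._log.clear()
```

Because `from_slave` defaults to `False`, a read only goes to the slave where the caller has decided that slightly old data is acceptable.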
26
Scaling the database
Master-Master
• Increase write and read performance
• Each server is a slave of the other
• Synchronization can be tricky
• Limited by database size
Better for HA than for write performance
27
Data Partitioning
Partition on domain models
• Separate users and products
• Makes sense if JOINs are rare
• Scales reads/writes
• Reduces data size per database
• Depends on separate domains
Simple and effective
28
Data Partitioning
Sharding
• Split data into shards: either all tables, or only big ones like users
• Partition by id, hash function or lookup
• Complex and makes JOINs complicated
• Scales reads/writes
• Reduces data size per database
Last resort
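The three partitioning strategies named above (id, hash function, lookup) could look roughly like this. The shard names and function names are hypothetical, and the sketch ignores resharding and cross-shard queries, which is where most of the complexity lies.

```python
import zlib

SHARDS = ["users_0", "users_1", "users_2", "users_3"]  # hypothetical database names


def shard_by_id(user_id):
    """Modulo on a numeric id: trivial, but changing the shard count moves most rows."""
    return SHARDS[user_id % len(SHARDS)]


def shard_by_hash(key):
    """Stable hash for non-numeric keys such as e-mail addresses."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


LOOKUP = {}  # explicit key -> shard table; in practice this lives in a database


def shard_by_lookup(key, default="users_0"):
    """Lookup table: single keys can be moved, at the cost of one extra query."""
    return LOOKUP.get(key, default)
```

`zlib.crc32` is used instead of Python's built-in `hash` because the built-in is randomized per process for strings, so two application servers would disagree on the shard.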
30
Alternatives
Data size is often the bigger problem
• Reduce data size
• Archiving
31
Archiving
Get rid of (historical) data
• Delete old data
• Aggregate old data
• Partition old data
Have an archiving policy from the start
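An "aggregate, then delete" policy might be sketched as follows. The `page_views` and `daily_views` tables are invented for the example. The point is only that the aggregate and the delete run in one transaction, so no rows are lost in between.

```python
import sqlite3

# Hypothetical schema: raw events plus a per-day summary table.
SCHEMA = """
CREATE TABLE page_views (id INTEGER PRIMARY KEY, day TEXT, url TEXT);
CREATE TABLE daily_views (day TEXT, views INTEGER);
"""


def archive_before(conn, cutoff):
    """Aggregate page views older than `cutoff` (ISO date) per day, then delete them.

    Returns the number of raw rows removed.
    """
    with conn:  # one transaction: aggregate and delete succeed or fail together
        conn.execute(
            "INSERT INTO daily_views (day, views) "
            "SELECT day, COUNT(*) FROM page_views WHERE day < ? GROUP BY day",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM page_views WHERE day < ?", (cutoff,)
        ).rowcount
    return deleted
```

Run from a nightly cron job, this keeps the raw table bounded while the summary table grows by only one row per day.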
32
Reduce data size
Avoid exponential data growth
• Do not store data in the database; move it to the file system, S3, or SimpleDB
• Do not normalize data
• Duplicate data in order to remove JOINs (and JOIN tables)
• Combine indices
33
Multiple clients
34
Multiple Clients
NOT the same as multiple users
A client is more like a separate domain, e.g. an expansion into another country
• Different settings
• Different themes
• Different features enabled
• Different language
• Different audience
How to combine in one app?
35
Multiple Clients
Questions to ask
• How many different clients?
• Is there shared state (users, settings, posts, …)?
• What is the expected data size and growth of each client?
36
Multiple Clients
The easy way to maintenance hell
• Fork the code
• One branch per client
• One install per client
37
Multiple Clients
Same code – same database
• Move different behavior into configuration
• Move configuration into database
• Scope data by DB-column
• Scope all data requests in the code
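Scoping by a database column works best when every query passes through one place that adds the filter. A minimal sketch, assuming a hypothetical `posts` table with a `client_id` column and a hypothetical `ClientScope` helper:

```python
import sqlite3

# Hypothetical schema: every row carries the client it belongs to.
SCHEMA = "CREATE TABLE posts (id INTEGER PRIMARY KEY, client_id INTEGER, title TEXT);"


class ClientScope:
    """Funnels every query for one client through a single client_id filter."""

    def __init__(self, conn, client_id):
        self.conn = conn
        self.client_id = client_id

    def posts(self):
        rows = self.conn.execute(
            "SELECT title FROM posts WHERE client_id = ? ORDER BY id",
            (self.client_id,),
        )
        return [title for (title,) in rows]

    def add_post(self, title):
        self.conn.execute(
            "INSERT INTO posts (client_id, title) VALUES (?, ?)",
            (self.client_id, title),
        )
        self.conn.commit()
```

The risk with this approach is a single unscoped query leaking data between clients, which is why the filter lives in the helper rather than in each caller.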
38
Multiple Clients
Same code – partition the data
• Move different behavior into configuration
• Partition data by database
Hardcode database while booting
39
Multiple Clients
Same code – partition the data
• Move different behavior into configuration
• Partition data by database
Choose database dynamically
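Choosing the database per request usually comes down to mapping something in the request to a connection. A sketch under the assumption that clients are distinguished by host name; the host names and database names below are made up:

```python
DATABASES = {  # hypothetical per-client databases
    "example.co.uk": "app_uk",
    "example.de": "app_de",
}


def database_for(host):
    """Pick the client database from the request's Host header."""
    hostname = host.split(":")[0].lower()  # strip an optional port
    if hostname.startswith("www."):
        hostname = hostname[4:]
    try:
        return DATABASES[hostname]
    except KeyError:
        raise LookupError(f"no client configured for host {hostname!r}") from None
```

Failing loudly on an unknown host is deliberate: silently falling back to a default database would mix one client's data into another's.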
40
Multiple Clients
Generate local databases
• Import global content into master DB
• Push shared content in the correct format to app DBs
• Build reverse channel if needed
41
Cloud Infrastructure
42
Cloud Infrastructure
Servers come and go
• You do not know your servers before deploying
• Restarting is the same as introducing a new machine
You can’t hardcode IPs (e.g. in database.yml)
43
Solution #1
Query and manually adjust
• Servers do not change that often
• New nodes probably need manual intervention
• Use AWS Elastic IPs to ease the pain
Options: set servers dynamically, or use an AWS Elastic IP
44
Solution #2
Use a central directory service
• A central place to manage your running instances
• Instances query the directory and react
46
Central Directory
Different Implementations
• File on S3
• SimpleDB
• A complete service, capable of monitoring and controlling your instances
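With the simplest implementation, a file on S3, an instance could fetch a small document at boot and derive its configuration from it. The JSON layout, the role names, and the `database_config` function below are assumptions for illustration; fetching the file is left out.

```python
import json


def database_config(directory_json, environment="production"):
    """Build a database.yml-like mapping from the directory's current state."""
    instances = json.loads(directory_json)["instances"]
    masters = [i for i in instances if i["role"] == "db_master" and i["running"]]
    if len(masters) != 1:
        raise RuntimeError(f"expected one running db_master, found {len(masters)}")
    # The address comes from the directory, so nothing is hardcoded.
    return {environment: {"adapter": "mysql", "host": masters[0]["address"]}}
```

An instance would write the result out as its `database.yml` before the application starts, and repeat the step whenever the directory changes.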
47
Summary
Simple is better than complex
Carefully evaluate the different solutions
Only introduce a new component if you really need to
Everything has strings attached
Solving the data size problem often solves others too
48
Questions?
49
Peritor GmbH
Teutonenstraße 16 14129 Berlin
Phone: +49 (0)30 69 20 09 84 0 Fax: +49 (0)30 69 20 09 84 9
Web: www.peritor.com Email: [email protected]
Peritor GmbH - All rights reserved