Advanced Deployment

Scotland on Rails 2009

Jonathan Weiss, 28 March 2009

Peritor GmbH

Who am I?

Jonathan Weiss

• Consultant for Peritor GmbH in Berlin

• Specialized in Rails, Scaling, Deployment, and Code Review

• Webistrano - Rails deployment tool

• FreeBSD RubyGems and Ruby on Rails maintainer

http://www.peritor.com

http://blog.innerewut.de

Deployment

Architecture and Process

Deployment

Deployment Process Requirements

• Automatic

• Reproducible

• Accountable

• Notifications

Deployment Tools

Several tools available

• Capistrano

• Webistrano

• Vlad

• Puppet

• Chef

The deployment process is usually not that complicated

Architecture

How deployment starts out …

… and how it ends

Agenda

Search

Background Processing

Scaling the database

Multiple Client Installations

Cloud Infrastructure

General Advice

Simple is better than complex

Search

Full text search

Can become very slow on big data sets

Full Text Search Engine

Separate Service

• Creates full text index

• Application queries search daemon

• Index update through application or database
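A search daemon answers queries from an index it has built in advance, so it does not have to scan every row. The toy inverted index below illustrates the idea in Python. `InvertedIndex` is a hypothetical name. Real engines such as Sphinx or Solr add stemming, ranking, and persistence on top of this basic structure.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full text index: maps each lowercase term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        # Index update: called by the application whenever a record changes.
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: ids of documents that contain every query term.
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Deploying Rails with Capistrano")
index.add(2, "Scaling Rails databases")
index.add(3, "Capistrano recipes for Rails deployment")
print(sorted(index.search("rails capistrano")))  # [1, 3]
```

A lookup costs one dictionary access per query term, however many documents are indexed.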

Possible Engines

• Ferret

• Sphinx

• Solr

• Lucene

• …

Search Slave

Database replication slave

• Has the complete dataset

• Moves slow search queries off the master

• Can use a different table engine than the master (e.g. MyISAM for FULLTEXT indexes)

Database Index

PostgreSQL Tsearch2

• Core since 8.3

• Can create a full text index on multiple columns or on arbitrary SQL expressions

MySQL MyISAM FULLTEXT index

• Only works with MyISAM tables (InnoDB has no FULLTEXT support in MySQL 5.0 and 5.1)

• Full text index on multiple columns

What to use?

Different characteristics

• Real-time updates and stale data

• Lost updates

• Performance

• Document content and format

• Complexity

Background Processing

Problem

Long running tasks

• Resizing uploaded images

• Mailing

• Expensive computations

• Accessing slow back-ends

When running inside the request-response cycle

• Blocks user

• Blocks Rails instance

• Hard to monitor and debug

Solution

Asynchronous processing in the background

Diagram: message queue and scheduler

Options

Options for message bus:

• Database

• Amazon SQS

• DRb

• Memcache

• ActiveMQ

• …

Options for background process:

• (Ruby) Daemon

• Cron job with script/runner

• Forked process

• Delayed Job / BJ / (BackgrounDRb)

• run_later

• …

Database/Ruby daemon example

Scaling the Database

One database for everything

• All domain data in one place

• The simplest solution

Problems at some point

• Number of read and write requests

• Data size

Read Slave

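• Slave replays each SQL statement executed on the master

• Increase read performance by reading from the replicating slave

• Stale reads: the slave lags behind the master, so a record that was just written may not be there yet

• Best used explicitly, but then you have to decide for every query where it should go

Better: use memcached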
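The stale read problem follows from how the slave works: it replays the master's statements some time after they happen. In the toy model below, plain Python dicts stand in for the databases and a hypothetical `replicate()` step stands in for the replication thread. It shows why a read immediately after a write should go to the master.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered statements for slaves to replay

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


class Slave:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.position = 0  # how far into the master's binlog this slave has replayed

    def replicate(self):
        # Replay every statement the master logged since the last call.
        for key, value in self.master.binlog[self.position:]:
            self.data[key] = value
        self.position = len(self.master.binlog)

    def read(self, key):
        return self.data.get(key)


master = Master()
slave = Slave(master)
master.write("user:1:name", "Alice")
print(slave.read("user:1:name"))  # None: the slave has not caught up yet
slave.replicate()
print(slave.read("user:1:name"))  # Alice
```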

Master-Master

• Increase write and read performance

• Each server is a slave of the other

• Synchronization can be tricky

• Limited by database size

Better for HA than for write performance
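One well-known source of trouble in MySQL master-master setups is that both servers hand out auto-increment ids independently. The usual remedy is MySQL's `auto_increment_increment` and `auto_increment_offset` settings. The Python below only simulates the resulting id sequences (assuming empty tables) to show why they cannot collide. It does not address other conflicts, such as both masters updating the same row.

```python
from itertools import count, islice


def id_sequence(offset, increment):
    """Ids a server hands out with auto_increment_offset=offset and auto_increment_increment=increment."""
    return count(start=offset, step=increment)


# Two masters: increment 2, offsets 1 and 2.
server_a = list(islice(id_sequence(1, 2), 5))
server_b = list(islice(id_sequence(2, 2), 5))
print(server_a)  # [1, 3, 5, 7, 9]
print(server_b)  # [2, 4, 6, 8, 10]
print(set(server_a) & set(server_b))  # set()
```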

Data Partitioning

Partition on domain models

• Separate users and products

• Makes sense if JOINs are rare

• Scales reads/writes

• Reduces data size per database

• Depends on separate domains

Simple and effective
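Partitioning on domain models boils down to a fixed mapping from model to database. In the sketch below, in-memory SQLite databases stand in for separate database servers. The `MODEL_DATABASE` mapping and the `connection_for` helper are hypothetical. Once `users` and `products` live in different databases, any query that used to JOIN them has to be split into two queries in application code.

```python
import sqlite3

# Hypothetical layout: which database owns which model.
MODEL_DATABASE = {"User": "accounts", "Session": "accounts", "Product": "catalog"}

_connections = {}


def connection_for(model):
    """Return the shared connection of the database that owns `model`."""
    db_name = MODEL_DATABASE[model]
    if db_name not in _connections:
        # One in-memory SQLite database per name stands in for one database server.
        _connections[db_name] = sqlite3.connect(":memory:")
    return _connections[db_name]


users = connection_for("User")
users.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users.execute("INSERT INTO users (name) VALUES ('alice')")
products = connection_for("Product")
products.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
print(connection_for("Session") is users)  # True: same database as User
print(connection_for("Product") is users)  # False: separate database
```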

Data Partitioning

Sharding

• Split data into shards: either all tables or only big ones like users

• Partition by id, hash function or lookup

• Complex and makes JOINs complicated

Last resort
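Here are the three routing strategies from the slide, sketched in Python. The shard count, the range boundaries, and the lookup table are made-up values. In practice the lookup table itself would live in a small central database. The hash uses `zlib.crc32` rather than Python's built-in `hash()`, because string hashing in Python is randomized per process and would route the same key differently after a restart.

```python
import zlib

HASH_SHARDS = 4  # number of shards used by hash partitioning

# By id: contiguous id ranges per shard (hypothetical boundaries).
ID_RANGES = [(0, 1_000_000, 0), (1_000_000, 2_000_000, 1), (2_000_000, 3_000_000, 2)]


def shard_by_id(user_id):
    """Range partitioning: the shard whose [low, high) range contains the id."""
    for low, high, shard in ID_RANGES:
        if low <= user_id < high:
            return shard
    raise ValueError(f"no shard configured for id {user_id}")


def shard_by_hash(key):
    """Hash partitioning: CRC32 is stable across processes, unlike built-in hash()."""
    return zlib.crc32(str(key).encode("utf-8")) % HASH_SHARDS


# By lookup: an explicit directory, so single users can be moved between shards.
SHARD_LOOKUP = {"alice": 2, "bob": 0}


def shard_by_lookup(username):
    return SHARD_LOOKUP[username]


print(shard_by_id(1_500_000))    # 1
print(shard_by_lookup("alice"))  # 2
```

Changing `HASH_SHARDS` remaps most keys under modulo hashing, so data has to be moved. This is one reason sharding is a last resort.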

Alternatives

Data size is often the bigger problem

Reduce data size and archive old data

Archiving

Get rid of (historical) data

• Delete old data

• Aggregate old data

• Partition old data

Have an archiving policy from the start
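An archiving policy can be as simple as a periodic job that aggregates rows older than a cutoff and then deletes them. The sketch below does this with SQLite. The `page_views` and `daily_page_views` tables and the 90-day retention are assumptions chosen for illustration. Both statements run in one transaction, so the aggregate and the deletion either both happen or neither does.

```python
import sqlite3
from datetime import date, timedelta


def archive_page_views(conn, today, keep_days=90):
    """Aggregate raw rows older than the cutoff into daily totals, then delete them."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    with conn:  # one transaction: aggregate and delete together, or not at all
        conn.execute(
            "INSERT INTO daily_page_views (day, page, views)"
            " SELECT day, page, COUNT(*) FROM page_views WHERE day < ? GROUP BY day, page",
            (cutoff,),
        )
        deleted = conn.execute("DELETE FROM page_views WHERE day < ?", (cutoff,)).rowcount
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (day TEXT, page TEXT)")
conn.execute("CREATE TABLE daily_page_views (day TEXT, page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("2008-11-01", "/"), ("2008-11-01", "/"), ("2009-03-27", "/")],
)
print(archive_page_views(conn, today=date(2009, 3, 28)))  # 2
print(conn.execute("SELECT * FROM daily_page_views").fetchall())  # [('2008-11-01', '/', 2)]
```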

Reduce data size

Avoid exponential data growth

• Do not store everything in the database; move data to the file system, S3, or SimpleDB

• Denormalize data

• Duplicate data in order to remove JOINs (and JOIN tables)

• Combine indices
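Moving large blobs out of the database mostly means storing a reference instead of the bytes. The sketch below writes uploads to a local directory, naming each file by its SHA-256 digest, and keeps only that key in SQLite. With S3, the same key would become the object name. The storage directory, the `uploads` table, and the `store_upload`/`read_upload` names are assumptions.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

STORAGE = Path(tempfile.mkdtemp())  # stands in for a file server or an S3 bucket


def store_upload(conn, filename, data):
    """Write the bytes to storage; the database row only records where they are."""
    key = hashlib.sha256(data).hexdigest()
    path = STORAGE / key
    if not path.exists():  # identical uploads are stored only once
        path.write_bytes(data)
    conn.execute(
        "INSERT INTO uploads (filename, storage_key, size) VALUES (?, ?, ?)",
        (filename, key, len(data)),
    )
    conn.commit()
    return key


def read_upload(key):
    return (STORAGE / key).read_bytes()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads (id INTEGER PRIMARY KEY, filename TEXT, storage_key TEXT, size INTEGER)"
)
key = store_upload(conn, "logo.png", b"\x89PNG fake image bytes")
print(read_upload(key)[:4])  # b'\x89PNG'
```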

Multiple clients

Multiple Clients

NOT the same as multiple users

Client is more like a separate domain, e.g. an expansion to another country

• Different settings

• Different themes

• Different features enabled

• Different language

• Different audience

How to combine in one app?

Multiple Clients

Questions to ask

• How many different clients?

• Is there shared state (users, settings, posts, …)?

• What is the expected data size and growth of each client?

Multiple Clients

The easy way to maintenance hell

• Fork the code

• One branch per client

• One install per client

Multiple Clients

Same code – same database

• Move different behavior into configuration

• Move configuration into database

• Scope data by DB-column

• Scope all data requests in the code
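Here is a minimal sketch of "same code, same database". Per-client configuration lives in a table, every domain table carries a `client_id` column, and all queries go through one object that adds the scope. The schema, the `ClientScope` class, and the sample data are hypothetical.

```python
import sqlite3


class ClientScope:
    """All data access for one client; every query is filtered by client_id."""

    def __init__(self, conn, client_id):
        self.conn = conn
        self.client_id = client_id

    def setting(self, name):
        # Different behavior is configuration, and the configuration is data.
        row = self.conn.execute(
            "SELECT value FROM settings WHERE client_id = ? AND name = ?",
            (self.client_id, name),
        ).fetchone()
        return row[0] if row else None

    def posts(self):
        rows = self.conn.execute(
            "SELECT title FROM posts WHERE client_id = ? ORDER BY id", (self.client_id,)
        )
        return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE settings (client_id INTEGER, name TEXT, value TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, client_id INTEGER, title TEXT);
    INSERT INTO settings VALUES (1, 'theme', 'blue'), (2, 'theme', 'green');
    INSERT INTO posts (client_id, title) VALUES (1, 'UK launch'), (2, 'German launch');
""")
germany = ClientScope(conn, 2)
print(germany.setting("theme"), germany.posts())  # green ['German launch']
```

The main risk of this approach is a single forgotten `WHERE client_id = ?` leaking another client's data. Routing every query through one scoped object reduces that risk.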

Multiple Clients

Same code – partition the data

• Partition data by database

Hardcode database while booting
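Hardcoding the database at boot means every install runs the same code. Each install is started with a different client name, and the database is picked once from that name. In this Python sketch, the `APP_CLIENT` environment variable, the `DATABASES` settings, and the host names are all assumptions.

```python
import os

# Hypothetical per-client database settings; the code base is the same for every client.
DATABASES = {
    "uk": {"host": "db1.example.com", "database": "app_uk"},
    "de": {"host": "db1.example.com", "database": "app_de"},
}


def database_config(environ=os.environ):
    """Pick the client's database once, at boot, from the process environment."""
    client = environ.get("APP_CLIENT")
    if client not in DATABASES:
        raise RuntimeError(f"APP_CLIENT must be one of {sorted(DATABASES)}, got {client!r}")
    return DATABASES[client]


print(database_config({"APP_CLIENT": "de"})["database"])  # app_de
```

The process cannot serve another client without a restart. This keeps every request simple, but each client needs its own set of application processes.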

Multiple Clients

Same code – partition the data

• Partition data by database

Choose database dynamically
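Choosing the database dynamically moves that decision into each request. A single process serves all clients and selects the database, for example, from the host name the request arrived on. The host names, the `HOST_TO_DATABASE` mapping, and the `database_for_request` helper below are hypothetical. The important property is that the selection happens before any query in the request runs.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from the request host to the client's database.
HOST_TO_DATABASE = {
    "www.example.co.uk": "app_uk",
    "www.example.de": "app_de",
}


def database_for_request(url):
    """Select the database for one request based on its host name."""
    host = urlsplit(url).hostname  # lowercased, without the port
    try:
        return HOST_TO_DATABASE[host]
    except KeyError:
        raise LookupError(f"unknown client host: {host}") from None


print(database_for_request("http://www.example.de/posts/1"))  # app_de
```

Compared with choosing at boot, one process can serve every client. The cost is that each process may need connections to all client databases.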

Multiple Clients

Generate local databases

• Import global content into master DB

• Push shared content in the correct format to app DBs

• Build reverse channel if needed
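The push step might look like the sketch below. Shared content is maintained once in the master database and copied into every app database in that client's own table layout. In-memory SQLite databases stand in for the servers. The table names and the per-language title columns are assumptions.

```python
import sqlite3


def push_shared_content(master, client_dbs):
    """Copy shared articles from the master DB into each client DB, in that client's format."""
    articles = master.execute("SELECT id, title_en, title_de FROM shared_articles").fetchall()
    for language, conn in client_dbs.items():
        with conn:  # each client DB is refreshed in its own transaction
            conn.execute("DELETE FROM articles")  # full refresh keeps the sketch simple
            conn.executemany(
                "INSERT INTO articles (id, title) VALUES (?, ?)",
                [(aid, en if language == "en" else de) for aid, en, de in articles],
            )


master = sqlite3.connect(":memory:")
master.execute("CREATE TABLE shared_articles (id INTEGER PRIMARY KEY, title_en TEXT, title_de TEXT)")
master.execute("INSERT INTO shared_articles VALUES (1, 'Terms of service', 'Nutzungsbedingungen')")

clients = {lang: sqlite3.connect(":memory:") for lang in ("en", "de")}
for client_conn in clients.values():
    client_conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")

push_shared_content(master, clients)
print(clients["de"].execute("SELECT title FROM articles").fetchone()[0])  # Nutzungsbedingungen
```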

Cloud Infrastructure

Servers come and go

• You do not know your servers before deploying

• Restarting is the same as introducing a new machine

You can’t hardcode IPs

database.yml

Solution #1

Query and manually adjust

• Servers do not change that often

• New nodes probably need manual intervention

• Use AWS Elastic IPs to ease the pain

Set servers dynamically

AWS Elastic IP
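Setting servers dynamically means asking the cloud API for the running instances at deploy time instead of listing IPs in the recipe. The function below covers only the parsing half. It takes a response shaped like the EC2 DescribeInstances result (`Reservations`, each holding `Instances`) and groups the private IPs of running instances by a `role` tag. The tag name and the role values are assumptions. Fetching the response, for example with an AWS SDK, is left out so the sketch runs offline.

```python
from collections import defaultdict


def servers_by_role(describe_instances_response, tag_key="role"):
    """Group private IPs of running instances by the value of their `role` tag."""
    roles = defaultdict(list)
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") != "running":
                continue  # stopped or terminating nodes must not receive a deploy
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tag_key in tags and "PrivateIpAddress" in instance:
                roles[tags[tag_key]].append(instance["PrivateIpAddress"])
    return dict(roles)


sample = {
    "Reservations": [
        {"Instances": [
            {"PrivateIpAddress": "10.0.0.11", "State": {"Name": "running"},
             "Tags": [{"Key": "role", "Value": "app"}]},
            {"PrivateIpAddress": "10.0.0.12", "State": {"Name": "stopped"},
             "Tags": [{"Key": "role", "Value": "app"}]},
            {"PrivateIpAddress": "10.0.0.21", "State": {"Name": "running"},
             "Tags": [{"Key": "role", "Value": "db"}]},
        ]}
    ]
}
print(servers_by_role(sample))  # {'app': ['10.0.0.11'], 'db': ['10.0.0.21']}
```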

Solution #2

Use a central directory service

• A central place to manage your running instances

• Instances query the directory and react

Central Directory

Different Implementations

• File on S3

• SimpleDB

• A complete service, capable of monitoring and controlling your instances
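The core of a central directory is small. Instances register with a role and an address and refresh that registration periodically, and clients ask for the live instances of a role. The in-memory `Directory` class below is a hypothetical sketch. With the implementations listed above, its state would live in a file on S3 or in SimpleDB rather than in a Python dict. Expiring entries that lack a recent heartbeat is one way to let servers come and go without manual cleanup.

```python
import time


class Directory:
    """Registry of running instances; entries expire if not refreshed within `ttl` seconds."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # instance_id -> (role, address, last_seen)

    def register(self, instance_id, role, address):
        # Called at boot and then periodically as a heartbeat.
        self.entries[instance_id] = (role, address, self.clock())

    def deregister(self, instance_id):
        self.entries.pop(instance_id, None)

    def lookup(self, role):
        """Addresses of instances with `role` whose last heartbeat is recent enough."""
        now = self.clock()
        return sorted(
            address
            for r, address, seen in self.entries.values()
            if r == role and now - seen <= self.ttl
        )


# A fake clock makes the expiry visible without sleeping.
fake_now = [0.0]
directory = Directory(ttl=60, clock=lambda: fake_now[0])
directory.register("i-1", "app", "10.0.0.11")
directory.register("i-2", "db", "10.0.0.21")
fake_now[0] = 30
directory.register("i-3", "app", "10.0.0.12")
fake_now[0] = 70  # i-1 was last seen at 0, so it has expired
print(directory.lookup("app"))  # ['10.0.0.12']
```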

Summary

Simple is better than complex

Carefully evaluate the different solutions

Only introduce a new component if you really need to

Everything has strings attached

Solving the data size problem often solves others too

Questions?

Peritor GmbH

Teutonenstraße 16, 14129 Berlin

Phone: +49 (0)30 69 20 09 84 0

Fax: +49 (0)30 69 20 09 84 9

Web: www.peritor.com

Email: kontakt@peritor.com
