Advanced Deployment

Posted on 15-Jan-2015

DESCRIPTION

Advanced Deployment by Jonathan Weiss, presented at Scotland on Rails 2009 in Edinburgh. Deployment and scaling best practices. See more at http://scotlandonrails.com/schedule/28-march/advanced-deployment/

TRANSCRIPT

Advanced Deployment, Scotland on Rails 2009

Jonathan Weiss, 28 March 2009

Peritor GmbH

Who am I?

Jonathan Weiss

•  Consultant for Peritor GmbH in Berlin

•  Specialized in Rails, scaling, deployment, and code review

•  Webistrano - Rails deployment tool

•  FreeBSD RubyGems and Ruby on Rails maintainer

http://www.peritor.com

http://blog.innerewut.de

Deployment

Two aspects: Architecture and Process

Deployment Process Requirements

•  Automatic

•  Reproducible

•  Accountable

•  Notifications

Deployment Tools

Several tools available

•  Capistrano

•  Webistrano

•  Vlad

•  Puppet

•  Chef

The deployment process is usually not that complicated

Architecture

How deployment starts out …

… and how it ends

Agenda

Search

Background Processing

Scaling the database

Multiple Client Installations

Cloud Infrastructure

General Advice

Simple is better than complex

Search

Search

Full text search

Can become very slow on big data sets

Full Text Search Engine

Separate Service

•  Creates full text index

•  Application queries search daemon

•  Index update through application or database

Possible Engines

•  Ferret

•  Sphinx

•  Solr

•  Lucene

•  …
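The core idea behind a separate full text engine is an inverted index: instead of scanning every row for a substring at query time, the engine maps each term to the documents containing it when the index is updated, and answers queries from that map. The following is a minimal Python sketch of that idea only, not how Ferret, Sphinx, Solr, or Lucene are implemented; the naive lowercase tokenizer and the AND semantics for multi-word queries are assumptions made for brevity.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive analyzer)."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index update: triggered by the application (or the database) when a record changes.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Deploying Rails with Capistrano")
index.add(2, "Scaling the database with a read slave")
index.add(3, "Rails search with Sphinx")
print(sorted(index.search("rails")))          # [1, 3]
print(sorted(index.search("rails sphinx")))   # [3]
```

A query now touches only the posting sets for its terms, which is why the cost stays low as the data set grows; the price is keeping the index up to date, which is where the "stale data" and "lost updates" trade-offs come from.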

Search Slave

Database replication slave

•  Has the complete dataset

•  Takes slow search queries off the master

•  Can use a different database table engine (e.g. MyISAM for FULLTEXT indexes)

Database Index

PostgreSQL Tsearch2

•  Part of the core since PostgreSQL 8.3

•  Allows creating full text indexes on multiple columns or arbitrary SQL expressions

MySQL MyISAM FULLTEXT index

•  Only works with MyISAM tables (InnoDB has no FULLTEXT support in MySQL 5.0 or 5.1)

•  Full text index on multiple columns

What to use?

Different characteristics

•  Real-time updates and stale data

•  Lost updates

•  Performance

•  Document content and format

•  Complexity

Background Processing

Problem

Long running tasks

•  Resizing uploaded images

•  Mailing

•  Computing an expensive operation

•  Accessing slow back-ends

When running inside the request-response cycle

•  Blocks user

•  Blocks Rails instance

•  Hard to monitor and debug

Solution

Asynchronous processing in the background

•  Message/Queue

•  Scheduler
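The producer/consumer shape of asynchronous background processing can be sketched with the Python standard library. Here a `queue.Queue` stands in for the message bus and a thread stands in for the background process; in a real Rails deployment these would be separate processes (for example a daemon reading a database table or Amazon SQS), so treat this only as an illustration of the decoupling. The `resize_image` job and `handle_upload` handler are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()      # stands in for the message bus
results = []              # collects what the worker did


def resize_image(path):
    # Hypothetical long-running task; a real one would call an image library.
    return f"resized:{path}"


def worker():
    """Background consumer: takes jobs off the queue until it sees the None sentinel."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        func, arg = job
        results.append(func(arg))
        jobs.task_done()


def handle_upload(path):
    """Request handler: enqueue the slow work and return to the user immediately."""
    jobs.put((resize_image, path))
    return "202 Accepted"


t = threading.Thread(target=worker)
t.start()
print(handle_upload("a.png"))   # 202 Accepted
handle_upload("b.png")
jobs.put(None)                  # ask the worker to stop after draining the queue
t.join()
print(results)                  # ['resized:a.png', 'resized:b.png']
```

The request handler never waits for `resize_image`, so neither the user nor the Rails instance is blocked, and the worker can be monitored and restarted on its own.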

Background Processing

Options

Options for message bus:

•  Database

•  Amazon SQS

•  DRb

•  Memcache

•  ActiveMQ

•  …

Options for background process:

•  (Ruby) Daemon

•  Cron job with script/runner

•  Forked process

•  Delayed Job / BJ / (BackgrounDRb)

•  run_later

•  …

Database/Ruby daemon example
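A minimal sketch of the database-plus-daemon pattern, written in Python as an illustration rather than the code from the talk. It assumes a `jobs` table in the application database that the web process inserts into and a long-running daemon polls; an in-memory SQLite database replaces the production database, and the table layout, column names, and `send_mail` handler are hypothetical. A single daemon is assumed; running several would need row locking or an atomic "claim" step.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending')""")


def enqueue(kind, payload):
    """Called from the web request: store the job and return immediately."""
    db.execute("INSERT INTO jobs (kind, payload) VALUES (?, ?)", (kind, payload))
    db.commit()


def send_mail(address):
    # Hypothetical slow task.
    return f"mailed {address}"


HANDLERS = {"mail": send_mail}


def work_off(limit=10):
    """One polling pass of the daemon: run pending jobs oldest-first and record the outcome."""
    rows = db.execute(
        "SELECT id, kind, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    done = []
    for job_id, kind, payload in rows:
        try:
            done.append(HANDLERS[kind](payload))
            status = "done"
        except Exception:
            status = "failed"   # keep the row for inspection instead of losing it
        db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        db.commit()
    return done


def run_daemon(poll_interval=5.0, passes=None):
    """Poll forever (or `passes` times), sleeping whenever a pass found nothing to do."""
    count = 0
    while passes is None or count < passes:
        if not work_off():
            time.sleep(poll_interval)
        count += 1


enqueue("mail", "alice@example.com")
enqueue("mail", "bob@example.com")
print(work_off())  # ['mailed alice@example.com', 'mailed bob@example.com']
```

In production, `run_daemon()` would run in its own process; because the queue is just a table, jobs survive restarts and failed ones can be inspected with plain SQL.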

Scaling the database

Scaling the database

One database for everything

•  All domain data in one place

•  The simplest solution

Problems at some point

•  Number of read and write requests

•  Data size

Scaling the database

Read Slave

•  The slave replays every SQL statement executed on the master

•  Increase read performance by reading from the replicating slave

•  Stale read problem: the slave can lag behind the master

•  Best used explicitly, but then you have to think about every read

Often better: use memcached
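Reading from the slave explicitly could look like the following Python sketch: writes always go to the master, reads go to the slave unless the caller asks for fresh data. The `Router` class and its `fresh` flag are hypothetical, and two independent in-memory SQLite databases stand in for master and slave so that replication lag (and therefore a stale read) can be shown without a real replication setup.

```python
import sqlite3


class Router:
    """Sends writes to the master and reads to a replicating slave."""

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def connection_for(self, sql, fresh=False):
        is_write = sql.lstrip().upper().startswith(self.WRITE_PREFIXES)
        # Writes must hit the master. Reads that cannot tolerate replication lag
        # (e.g. showing users the record they just saved) also go to the master.
        if is_write or fresh:
            return self.master
        return self.slave

    def execute(self, sql, params=(), fresh=False):
        return self.connection_for(sql, fresh).execute(sql, params)


master = sqlite3.connect(":memory:")
slave = sqlite3.connect(":memory:")
for conn in (master, slave):
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

router = Router(master, slave)
router.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
# Replication has not caught up yet, so the slave returns a stale (empty) result.
print(router.execute("SELECT name FROM users").fetchall())              # []
print(router.execute("SELECT name FROM users", fresh=True).fetchall())  # [('alice',)]
```

The `fresh=True` decision has to be made at every call site, which is exactly the "makes you think" cost of using a read slave explicitly.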

Scaling the database

Master-Master

•  Increase write and read performance

•  Each server is a slave of the other

•  Synchronization can be tricky

•  Limited by database size

Better for HA than for write performance

Data Partitioning

Partition on domain models

•  Separate users and products

•  Makes sense if JOINs are rare

•  Scales reads/writes

•  Reduces data size per database

•  Depends on separate domains

Simple and effective
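Partitioning on domain models boils down to a fixed mapping from model to database. A minimal Python sketch with hypothetical model, table, and database names (in-memory SQLite stands in for the real servers): the former cross-database JOIN turns into two queries stitched together in application code, which is why this only pays off when such JOINs are rare.

```python
import sqlite3

# Hypothetical layout: user data and the product catalog live in separate databases.
DATABASES = {
    "users": sqlite3.connect(":memory:"),
    "catalog": sqlite3.connect(":memory:"),
}
MODEL_TO_DB = {"User": "users", "Order": "users", "Product": "catalog"}


def connection_for(model):
    """Every query for a model goes to the database that owns that model."""
    return DATABASES[MODEL_TO_DB[model]]


connection_for("User").execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
connection_for("Order").execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, product_id INTEGER)")
connection_for("Product").execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")

connection_for("User").execute("INSERT INTO users VALUES (1, 'alice')")
connection_for("Order").execute("INSERT INTO orders VALUES (1, 1, 42)")
connection_for("Product").execute("INSERT INTO products VALUES (42, 'Book')")


def products_ordered_by(user_id):
    """What used to be one JOIN becomes two queries against two databases."""
    ids = [row[0] for row in connection_for("Order").execute(
        "SELECT product_id FROM orders WHERE user_id = ?", (user_id,))]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return [row[0] for row in connection_for("Product").execute(
        f"SELECT title FROM products WHERE id IN ({placeholders}) ORDER BY id", ids)]


print(products_ordered_by(1))  # ['Book']
```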

Data Partitioning

Sharding

•  Split data into shards: either all tables or only the big ones, like users

•  Partition by id, hash function or lookup

•  Complex and makes JOINs complicated

•  Scales reads/writes

•  Reduces data size per database

Last resort
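The three ways of picking a shard listed above (by id, by hash function, or by lookup) can be sketched in Python as follows. The shard names, the range size, and the lookup table are hypothetical; CRC32 is used as the hash because, unlike Python's built-in `hash()` for strings, it gives the same value in every process.

```python
import zlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical database names


def shard_by_id_range(user_id, per_shard=1_000_000):
    """Range partitioning: ids 0..999999 on shard0, the next million on shard1, ...
    Raises IndexError once ids outgrow the configured shards."""
    return SHARDS[user_id // per_shard]


def shard_by_hash(key):
    """Hash partitioning: spreads keys evenly, but changing the shard count remaps most keys."""
    return SHARDS[zlib.crc32(str(key).encode("utf-8")) % len(SHARDS)]


# Lookup partitioning: an explicit directory (itself usually kept in a small central
# database) lets individual users be moved between shards.
LOOKUP = {"alice": "shard2", "bob": "shard0"}


def shard_by_lookup(username):
    return LOOKUP[username]


print(shard_by_id_range(1_500_000))   # shard1
print(shard_by_lookup("alice"))       # shard2
```

Each function answers only "which database?"; everything that spans shards (JOINs, global counts, unique constraints) still has to be handled in the application, which is what makes sharding the last resort.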

Alternatives

Data size is often the bigger problem

•  Reduce data size

•  Archiving

Archiving

Get rid of (historical) data

•  Delete old data

•  Aggregate old data

•  Partition old data

Have an archiving policy from the start
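An "aggregate old data" policy could look like this Python sketch: raw rows older than a cutoff are rolled up into one summary row per page and day and then deleted, both in one transaction. The `page_views` and `daily_page_views` tables, the sample rows, and the cutoff are hypothetical, and SQLite stands in for the production database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE page_views (id INTEGER PRIMARY KEY, page TEXT, viewed_on TEXT);
CREATE TABLE daily_page_views (page TEXT, day TEXT, views INTEGER,
                               PRIMARY KEY (page, day));
""")
db.executemany(
    "INSERT INTO page_views (page, viewed_on) VALUES (?, ?)",
    [("/", "2009-01-01"), ("/", "2009-01-01"), ("/about", "2009-01-02"),
     ("/", "2009-03-27")],
)


def archive_before(cutoff):
    """Aggregate raw rows older than `cutoff` (ISO date) into daily counts, then delete them."""
    with db:  # commits both statements together, or rolls both back on error
        db.execute(
            """INSERT INTO daily_page_views (page, day, views)
               SELECT page, viewed_on, COUNT(*) FROM page_views
               WHERE viewed_on < ? GROUP BY page, viewed_on""",
            (cutoff,),
        )
        db.execute("DELETE FROM page_views WHERE viewed_on < ?", (cutoff,))


archive_before("2009-03-01")
print(db.execute("SELECT COUNT(*) FROM page_views").fetchone()[0])  # 1
print(db.execute("SELECT * FROM daily_page_views ORDER BY day, page").fetchall())
# [('/', '2009-01-01', 2), ('/about', '2009-01-02', 1)]
```

Run periodically (for example from cron), this keeps the raw table bounded by the retention window instead of growing forever.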

Reduce data size

Avoid exponential data growth

•  Do not store everything in the database; move data to the file system, S3, or SimpleDB

•  Do not normalize data: duplicate data in order to remove JOINs (and JOIN tables)

•  Combine indices

Multiple clients

Multiple Clients

NOT the same as multiple users

A client is more like a separate domain, e.g. an expansion to another country

•  Different settings

•  Different themes

•  Different features enabled

•  Different language

•  Different audience

How to combine in one app?

Multiple Clients

Questions to ask

•  How many different clients?

•  Is there shared state (users, settings, posts, …)?

•  What is the expected data size and growth of each client?

Multiple Clients

The easy way to maintenance hell

•  Fork the code

•  One branch per client

•  One install per client

Multiple Clients

Same code – same database

•  Move different behavior into configuration

•  Move configuration into database

•  Scope data by DB-column

•  Scope all data requests in the code
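Scoping every data request is easy to forget in a single place, so it helps to funnel queries through one helper that always adds the client condition. A minimal Python sketch, assuming a `client_id` column on each scoped table; the `ClientScope` class, the `posts` table, and the sample rows are hypothetical (in a Rails app the same idea would live in the models).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, client_id INTEGER NOT NULL, title TEXT)")
db.executemany("INSERT INTO posts (client_id, title) VALUES (?, ?)",
               [(1, "Hello Germany"), (2, "Hello Scotland"), (1, "Second German post")])


class ClientScope:
    """All reads and writes for one client go through here, so the filter cannot be forgotten."""

    def __init__(self, conn, client_id):
        self.conn = conn
        self.client_id = client_id

    def select(self, table, where="1=1", params=()):
        # `table` and `where` must come from code, never from user input.
        sql = f"SELECT * FROM {table} WHERE client_id = ? AND ({where}) ORDER BY id"
        return self.conn.execute(sql, (self.client_id, *params)).fetchall()

    def insert(self, table, **values):
        values["client_id"] = self.client_id  # the scope sets the client, not the caller
        cols = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        self.conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                          tuple(values.values()))


germany = ClientScope(db, client_id=1)
print([row[2] for row in germany.select("posts")])
# ['Hello Germany', 'Second German post']
```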

Multiple Clients

Same code – partition the data

•  Move different behavior into configuration

•  Partition data by database

Hardcode database while booting

Multiple Clients

Same code – partition the data

•  Move different behavior into configuration

•  Partition data by database

Choose database dynamically
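Choosing the database dynamically usually means deriving the client from something in the request, such as the host name, and picking that client's connection settings before any query runs. A Python sketch under that assumption; the host names, database names, and the `database_for_host` and `handle_request` helpers are hypothetical, and `connect` stands for whatever opens (or fetches from a pool) a connection.

```python
# Hypothetical per-client settings; in practice these would live in a config file or a database.
CLIENTS = {
    "www.example.de": {"client": "germany", "database": "app_germany"},
    "www.example.co.uk": {"client": "uk", "database": "app_uk"},
}


def database_for_host(host):
    """Pick the client's database from the request's Host header (port stripped)."""
    hostname = host.split(":", 1)[0].lower()
    try:
        return CLIENTS[hostname]["database"]
    except KeyError:
        raise LookupError(f"unknown client host: {hostname}") from None


def handle_request(headers, connect):
    """Resolve the client's database first, then hand back a connection to it."""
    db_name = database_for_host(headers["Host"])
    return connect(db_name)


print(database_for_host("www.example.de:8080"))  # app_germany
```

Compared with hardcoding the database while booting, one set of application servers can serve every client, at the cost of switching connections per request.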

Multiple Clients

Generate local databases

•  Import global content into master DB

•  Push shared content in the correct format to app DBs

•  Build reverse channel if needed

Cloud Infrastructure

Cloud Infrastructure

Servers come and go

•  You do not know your servers before deploying

•  Restarting is the same as introducing a new machine

You can’t hardcode IPs, for example in database.yml

Solution #1

Query and manually adjust

•  Servers do not change that often

•  New nodes probably need manual intervention

•  Use AWS Elastic IPs to ease the pain

Set servers dynamically, or pin them to an AWS Elastic IP

Solution #2

Use a central directory service

•  A central place to manage your running instances

•  Instances query the directory and react

Central Directory

Different Implementations

•  File on S3

•  SimpleDB

•  A complete service, capable of monitoring and controlling your instances
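With a central directory in place, the otherwise hardcoded `database.yml` can be generated at boot time from whatever the directory currently reports. A Python sketch, assuming the directory is a small JSON document (for example a file on S3) that lists instances by role; the document format, instance ids, role names, and the `hosts_for` and `render_database_config` helpers are hypothetical, and fetching the file is left out.

```python
import json

# Hypothetical directory document, e.g. fetched from S3 or SimpleDB at boot time.
DIRECTORY = json.loads("""
{
  "instances": [
    {"id": "i-1a2b3c", "role": "db_master", "private_ip": "10.0.0.12", "state": "running"},
    {"id": "i-4d5e6f", "role": "app",       "private_ip": "10.0.0.20", "state": "running"},
    {"id": "i-7a8b9c", "role": "db_master", "private_ip": "10.0.0.13", "state": "terminated"}
  ]
}
""")


def hosts_for(directory, role):
    """Private IPs of running instances with the given role."""
    return [i["private_ip"] for i in directory["instances"]
            if i["role"] == role and i["state"] == "running"]


def render_database_config(directory, environment="production", database="app_production"):
    """Build database.yml text pointing at the current database master."""
    masters = hosts_for(directory, "db_master")
    if len(masters) != 1:
        raise RuntimeError(f"expected exactly one running db_master, found {len(masters)}")
    return (
        f"{environment}:\n"
        f"  adapter: mysql\n"
        f"  database: {database}\n"
        f"  host: {masters[0]}\n"
    )


print(render_database_config(DIRECTORY))
```

"Reacting" to changes would then mean re-rendering the file and restarting the application servers whenever the directory reports a different master.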

Summary

Simple is better than complex

Carefully evaluate the different solutions

Only introduce a new component if you really need to

Everything has strings attached

Solving the data size problem often solves others too

Questions?

Peritor GmbH

Teutonenstraße 16, 14129 Berlin

Phone: +49 (0)30 69 20 09 84 0, Fax: +49 (0)30 69 20 09 84 9

Web: www.peritor.com, Email: kontakt@peritor.com

Peritor GmbH - All rights reserved