
The Gearman Cookbook
OSCON 2010

Eric Day
http://oddments.org/

Senior Software Engineer @ Rackspace

Thanks for being here!

Ask questions!

Grab a mic for long questions.

Use the source...

Source: 00

What is Gearman?

It is not German.

(well, not entirely at least)

A protocol with multiple implementations.

A message queue.

A job coordinator.

MANAGER

GEARMAN

“A massively distributed, massively fault tolerant fork mechanism.”
- Joe Stump, SimpleGeo

A building block for distributed architectures.

Features

● Open Source
● Simple & Fast
● Multi-language
● Flexible application design
● Embeddable
● No single point of failure

How does Gearman work?

While large-scale architectures work well, you can start off simple.

Source: 01

Foreground (synchronous)
or
Background (asynchronous)

Source: 02
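
The difference between the two modes can be sketched without a job server. This is a minimal illustration using only the Python standard library, not Gearman's API: the thread pool stands in for the job server and its workers, and `reverse`, `do_foreground`, and `do_background` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def reverse(text: str) -> str:
    """Hypothetical job function that a worker would register."""
    return text[::-1]


# Assumption: the pool stands in for a job server plus its workers.
pool = ThreadPoolExecutor(max_workers=2)


def do_foreground(func, payload):
    """Submit a job and block until the worker returns its result."""
    return pool.submit(func, payload).result()


def do_background(func, payload):
    """Submit a job and return immediately with a handle, not a result."""
    return pool.submit(func, payload)


result = do_foreground(reverse, "Hello")  # the caller waits for the answer
handle = do_background(reverse, "Hello")  # the caller continues right away
```

In this stand-in the handle is a future that can still deliver a result later. A background submission in the deck's own PHP example (`doBackground`) is fire-and-forget from the client's point of view, so treat the future as a convenience of the sketch only.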

Questions?

Let's get cooking!

Required Ingredients:

Job Server

● Perl Server (Gearman::Server in CPAN)
  ● The original implementation
  ● Actively maintained by folks at SixApart
● C Server (https://launchpad.net/gearmand)
  ● Rewrite for performance and threading
  ● Added new features like persistent queues
  ● Different port (IANA assigned 4730)
  ● Now moving to C++

Client API

● Available for most common languages
● Command line tool
● User defined functions in SQL databases
  ● MySQL
  ● PostgreSQL
  ● Drizzle

Worker API

● Available for most common languages
  ● Usually in the same packages as the client API
● Command line tool

Optional Ingredients

● Databases
● Shared or distributed file systems
● Other network protocols
  ● HTTP
  ● E-Mail
● Domain specific libraries
  ● Image manipulation
  ● Full-text indexing

Recipes

● Scatter/Gather
● Map/Reduce
● Asynchronous Queues
● Pipeline Processing

Scatter/Gather

● Perform a number of tasks concurrently
● Great way to speed up web applications
● Tasks don't need to be related
● Allocate dedicated resources for different tasks
● Push logic down to where data exists

Scatter/Gather

[Diagram: Client dispatching concurrent tasks: DB Query, DB Query, Image Resize, Location Search, Full-text Search]

Scatter/Gather

● Start simple with a single task
● Multiple tasks
● Concurrent tasks

Source: 03

Scatter/Gather

● Concurrent tasks with different workers
● All tasks finish in the time of the longest-running one
● Must have enough workers available

Source: 04
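
The scatter/gather shape can be sketched with the Python standard library. This is an illustration of the pattern, not the talk's source code or Gearman's API: the thread pool stands in for the job server and workers, and the three task functions are hypothetical placeholders that only sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical worker functions; each sleeps to simulate real work.
def db_query(sql):
    time.sleep(0.2)
    return f"rows for {sql}"


def resize_image(name):
    time.sleep(0.3)
    return f"{name} resized"


def full_text_search(term):
    time.sleep(0.1)
    return f"hits for {term}"


def scatter_gather(tasks, workers):
    """Start every task at once (scatter), then collect all results (gather)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn, arg) for name, (fn, arg) in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


tasks = {
    "query": (db_query, "SELECT 1"),
    "resize": (resize_image, "photo.jpg"),
    "search": (full_text_search, "gearman"),
}
# One worker per task, so no task waits for a free worker.
results = scatter_gather(tasks, workers=len(tasks))
```

With one worker per task the wall-clock time is close to the slowest task (about 0.3 s here) rather than the sum. With fewer workers than tasks, some tasks queue behind others, which is why the slide asks for enough workers.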

Note on Resize Worker

Web Applications

● Reduce page load time with concurrency
● Don't tie up web server resources
● Improve time to first byte
  ● Start non-blocking requests
  ● Send first part of response
  ● Block when you need one of the results
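
The three steps for improving time to first byte can be sketched as a generator that streams a page. This is a standard-library illustration under stated assumptions: `fetch_profile` and `fetch_recommendations` are hypothetical back-end calls, and the thread pool stands in for Gearman workers.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical back-end calls that would normally run on workers.
def fetch_profile(user_id):
    return {"id": user_id, "name": "example"}


def fetch_recommendations(user_id):
    return ["a", "b"]


def render_page(user_id, pool):
    """Yield the response in chunks so the first bytes leave early."""
    # 1. Start non-blocking requests.
    profile = pool.submit(fetch_profile, user_id)
    recs = pool.submit(fetch_recommendations, user_id)
    # 2. Send the first part of the response while the jobs run.
    yield "<html><head><title>Home</title></head><body>"
    # 3. Block only at the point where a result is needed.
    yield f"<h1>{profile.result()['name']}</h1>"
    yield "<ul>" + "".join(f"<li>{r}</li>" for r in recs.result()) + "</ul>"
    yield "</body></html>"
```

A web framework that supports streaming responses could send each yielded chunk as it becomes available; how that is wired up depends on the framework and is outside this sketch.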

Questions?

Map/Reduce

● Similar to scatter/gather, but split up one task
● Push logic to where data exists (map)
● Report aggregates or other summary (reduce)
● Can be multi-tier
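
A minimal map/reduce sketch, assuming a word-count task chosen only for illustration. It uses the Python standard library rather than Gearman: the pool plays the role of the workers, and `map_count`, `reduce_counts`, and `split` are hypothetical helper names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_count(lines):
    """Map step: count words in one slice of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the partial counts into one summary."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def split(items, parts):
    """Split one task into `parts` roughly equal sub-tasks."""
    return [items[i::parts] for i in range(parts)]


def map_reduce(lines, parts=4):
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(map_count, split(lines, parts)))
    return reduce_counts(partials)
```

For example, `map_reduce(["a b a", "b c", "a"], parts=2)` counts `a` three times, `b` twice, and `c` once. A multi-tier variant would have a map worker split its own slice again and run the same map and reduce steps one level down.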

Map/Reduce

[Diagram: Client submits Task T, which is split into Task T0, Task T1, Task T2, Task T3]

Map/Reduce

[Diagram: Client submits Task T, which is split into Task T0, Task T1, Task T2, Task T3; Task T0 is split again into Task T00, Task T01, Task T02]

Log Service

● Push all log entries to log_collect queue
  ● tail -f access_log | gearman -n -f log_collect
● Natural spreading between workers when busy
● Can shut down workers to help balance

● Worker for each operation per log server
● Push operations to where data resides
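
How entries spread across workers can be sketched with a shared queue. This is a standard-library stand-in, not the talk's log service: the `Queue` plays the `log_collect` function queue, each thread plays a log worker, and `collect_logs` is a hypothetical name.

```python
import queue
import threading


def collect_logs(entries, worker_count=3):
    """Feed entries to a shared queue; whichever worker is free takes the next one."""
    log_collect = queue.Queue()
    handled = {i: [] for i in range(worker_count)}

    def worker(worker_id):
        while True:
            entry = log_collect.get()
            if entry is None:  # sentinel: this worker is being shut down
                break
            handled[worker_id].append(entry)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for entry in entries:
        log_collect.put(entry)
    for _ in threads:
        log_collect.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return handled
```

Each entry is handled by exactly one worker, and which worker gets it depends on who is idle at that moment. The split between workers is therefore not deterministic, which is the "natural spreading" the slide refers to.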

Log Service

Source: 05

Questions?

Asynchronous Queues

● They help you scale
● Not everything needs immediate processing
  ● Sending e-mail, tweets, …
  ● Log entries and other notifications
  ● Data insertion and indexing
● Allows for batch operations

Delayed E-Mail

● Replace:

    # Send email right now
    mail($to_address, $subject, $body, $headers);

● With:

    # Put email in queue to send
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $client->doBackground('send_email', serialize($email_options));
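
The worker side of such a queue, including the batching mentioned on the previous slide, can be sketched as follows. This is a standard-library illustration and not the talk's `send_email` worker: the in-process `Queue` stands in for the job server, JSON stands in for PHP's `serialize`, and `queue_email`, `drain_batch`, and `send_batch` are hypothetical names.

```python
import json
import queue

# Assumption: an in-process queue stands in for the send_email job queue.
email_queue = queue.Queue()


def queue_email(to, subject, body):
    """Client side: serialize the message, enqueue it, and return immediately."""
    email_queue.put(json.dumps({"to": to, "subject": subject, "body": body}))


def drain_batch(q, max_batch):
    """Worker side: take up to max_batch queued messages without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(json.loads(q.get_nowait()))
        except queue.Empty:
            break
    return batch


def send_batch(batch, send):
    """Deliver a whole batch through one `send` callable (e.g. one SMTP session)."""
    for email in batch:
        send(email)
    return len(batch)
```

The request path only pays for `queue_email`. Delivery cost moves to the worker, which can reuse one connection for a whole batch.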

Source: 06

Database Updates

● Also useful as a database trigger
● Start background jobs on database changes
● Requires MySQL UDF package

    CREATE TRIGGER tweet_blog
    BEFORE INSERT ON blog_entries
    FOR EACH ROW SET @ret=gman_do_background('send_tweet', CONCAT(NEW.title, " - ", NEW.url));

Questions?

Pipeline Processing

● Some tasks need a series of transformations
● Chain workers to send data for the next step

[Diagram: Client, Task T -> Worker, Operation 1 -> Worker, Operation 2 -> Worker, Operation 3 -> Output]
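
Chained workers can be sketched with one queue between each pair of stages. This is a standard-library illustration of the pattern, not Gearman's API: each thread stands in for a worker that performs one operation and submits its output as the next stage's job, and `stage`, `run_pipeline`, and `STOP` are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel that flows through the pipeline to end each stage


def stage(func, inbox, outbox):
    """Start one worker: apply func to each item and pass it to the next stage."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # tell the next stage to finish too
                return
            outbox.put(func(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(items, operations):
    """Push items through the operations in order and collect the final output."""
    queues = [queue.Queue() for _ in range(len(operations) + 1)]
    threads = [stage(op, queues[i], queues[i + 1]) for i, op in enumerate(operations)]
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results
```

With one worker per stage, items leave the pipeline in the order they entered. Running several workers on one stage would raise throughput but give up that ordering.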

Search Engine

● Insert URLs, track duplicates
● Fetch contents of URLs
● Store URLs with title and body
● Search stored URLs

Search Engine

[Diagram: Insert, Fetch, Store/Search; Insert, Search]

Source: 07

Questions?

Persistent Queues

● By default, jobs are only stored in memory
● Various contributions from community
  ● MySQL/Drizzle
  ● PostgreSQL
  ● SQLite
  ● Tokyo Cabinet
  ● memcached (not always “persistent”)

Persistent Queues

● Use at your own risk, test in your environment!
● Configure back-end to meet your performance and durability needs
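
What a persistent queue adds over an in-memory one can be sketched with SQLite from the Python standard library. This is an illustration of the idea only. It is not gearmand's queue module, and the `jobs` table layout and the `SQLiteQueue` class are assumptions made for the example.

```python
import sqlite3


class SQLiteQueue:
    """Toy job queue whose contents survive a restart of the process."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "function TEXT NOT NULL, "
            "payload TEXT NOT NULL)"
        )
        self.db.commit()

    def add(self, function, payload):
        """Store the job durably before acknowledging it to the client."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO jobs (function, payload) VALUES (?, ?)",
                (function, payload),
            )

    def next_job(self):
        """Return the oldest stored job as (id, function, payload), or None."""
        return self.db.execute(
            "SELECT id, function, payload FROM jobs ORDER BY id LIMIT 1"
        ).fetchone()

    def done(self, job_id):
        """Remove a job only after a worker has finished it."""
        with self.db:
            self.db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))

    def close(self):
        self.db.close()
```

Every `add` and `done` costs a committed write. That is the performance versus durability trade-off the slide asks you to tune for your own environment.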

Source: 08

Timeouts

● By default, operations block forever
● Clients may want a timeout on foreground jobs
● Workers may need to periodically run other code besides job callback
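
Both uses of a timeout can be sketched with the Python standard library. This is not the Gearman client or worker API: the pool and queue are stand-ins, and `do_with_timeout`, `worker_loop`, and the `max_idle` stop condition are hypothetical and exist only to keep the example finite.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_job(seconds):
    """Hypothetical job that takes `seconds` to finish."""
    time.sleep(seconds)
    return "done"


def do_with_timeout(pool, func, arg, timeout):
    """Client side: wait at most `timeout` seconds for a foreground job."""
    future = pool.submit(func, arg)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return None  # the job keeps running; the client just stops waiting


def worker_loop(jobs, handle, housekeeping, wait, max_idle):
    """Worker side: wake up every `wait` seconds to run other code."""
    idle = 0
    while idle < max_idle:
        try:
            job = jobs.get(timeout=wait)
        except queue.Empty:
            idle += 1
            housekeeping()  # e.g. reload configuration, check a shutdown flag
            continue
        idle = 0
        handle(job)
```

Note that a client-side timeout in this sketch does not cancel the job. Whether a timed-out job should be cancelled, ignored, or retried is a design decision the slide leaves open.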

Source: 09

gearmand --help

● --job-retries - Prevent poisonous jobs
● --worker-wakeup - Don't wake up all workers for every job
● --threads - Run multiple I/O threads (C only)
● --protocol - Load pluggable protocols (C only)
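
The idea behind a retry limit can be sketched in a few lines. This is an illustration of the concept, not gearmand's implementation. It assumes that a "poisonous" job is one that fails on every attempt, that a failure shows up as an exception, and that the limit counts total attempts; the slide does not specify how gearmand counts.

```python
def run_with_retries(job, handler, max_attempts):
    """Try a job up to max_attempts times, then drop it instead of looping forever."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return ("ok", handler(job), attempt)
        except Exception as exc:  # assumption: a failed attempt raises
            last_error = str(exc)
    return ("dropped", last_error, max_attempts)
```

Without such a limit, a job that always fails would be handed to worker after worker indefinitely and could starve the healthy jobs behind it.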

New Distributed Applications

● Think of scalable cloud architectures
● Not just LAMP on a virtual machine
● Elastic servers and services (workers)
● New data models
  ● Use eventual consistency whenever possible
    ● Blogs, wikis, and other web apps powered by EC and queues, not a single logical database

Get involved!

● http://gearman.org/
  ● Mailing list, documentation, related projects
● #gearman on irc.freenode.net
● Contact me at: http://oddments.org/
● Stickers!