Celery - A Distributed Task Queue

Duy Do (@duydo)

1


Post on 16-Aug-2015



Outline

1. About

2. What is Celery?

3. Celery Architecture

4. Broker, Task, Worker

5. Monitoring

6. Coding

7. Q & A

2

About

A father, a husband and a software engineer

Passionate about distributed systems, real-time data processing and search engines

Work @sentifi as a backend engineer

Follow me @duydo

3

What is Celery?

Distributed Task Queue written in Python

Simple, fast, flexible, highly available, scalable

Mature, feature rich

Open source, BSD License

Large community

4

What is a Task Queue?

A task queue is a system for the parallel execution of tasks

5

[Diagram: a Client sends tasks to the Broker; the Broker distributes tasks to Workers]
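The flow in the diagram can be sketched in plain Python, with threads standing in for Celery workers and a stdlib queue standing in for the broker. This is an illustration of the idea only, not Celery's implementation:

```python
import queue
import threading

tasks = queue.Queue()   # plays the role of the broker
results = []
lock = threading.Lock()

def worker():
    # pull tasks off the queue and execute them until a sentinel arrives
    while True:
        item = tasks.get()
        if item is None:
            break
        func, args = item
        out = func(*args)
        with lock:
            results.append(out)

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

def square(x):
    return x * x

for i in range(5):       # the "client" sends tasks
    tasks.put((square, (i,)))

for _ in workers:        # one sentinel per worker to shut them down
    tasks.put(None)
for w in workers:
    w.join()

print(sorted(results))   # [0, 1, 4, 9, 16]
```

Real Celery replaces the in-process queue with a message broker, so clients and workers can live on different machines.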

Celery Architecture

6

[Diagram: Clients 1..2 send tasks to the Broker, which holds Task Queues 1..N and distributes tasks to Workers 1..2; Workers store task results in the Task Result Storage, and clients get task results from it]

Broker

The middleman that holds the tasks (messages)

Celery supports:

• RabbitMQ, Redis

• MongoDB, CouchDB

• ZeroMQ, Amazon SQS, IronMQ

7

Task

A unit of work

Exists until it has been acknowledged

The result of a task can be stored or ignored

States: PENDING, STARTED, SUCCESS, FAILURE, RETRY, REVOKED

Periodic tasks (cron jobs)

8

Define Tasks

# function style
@app.task
def add(x, y):
    return x + y

# class style
class AddTask(app.Task):
    def run(self, x, y):
        return x + y

9
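What @app.task roughly does can be sketched with a hypothetical minimal App class (not Celery's actual implementation): the decorator registers the function in the app's task registry under its name, so a worker can look tasks up by name when messages arrive.

```python
class App:
    """Hypothetical stand-in for a Celery app, for illustration only."""

    def __init__(self):
        self.tasks = {}

    def task(self, func):
        # register under the function's name, like Celery's task registry
        self.tasks[func.__name__] = func
        return func

app = App()

@app.task
def add(x, y):
    return x + y

# a worker receiving the message ('add', (1, 2)) would look the task up and run it:
result = app.tasks['add'](1, 2)
print(result)  # 3
```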

Calling Tasks

apply_async(args[, kwargs[, …]])

delay(*args, **kwargs)

calling (__call__)

e.g.:

• result = add.delay(1, 2)

• result = add.apply_async((1, 2), countdown=10)

10

Calling Task Options

• eta: the earliest date/time at which the task will be executed

• countdown: set eta by seconds into the future

• expires: set the task’s expiry time

• serializer: pickle (default), json, yaml or msgpack

• compression: compress the messages using gzip or bzip2

• queue: route the task to a different queue

11

Task Result

result.ready(): True if the task has been executed

result.successful(): True if the task executed successfully

result.result: the return value of the task, or the exception

result.get(): blocks until the task is complete; returns the result or raises the exception

12
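The AsyncResult API above behaves much like a stdlib Future, which makes for a runnable analogy (an analogy only, not Celery's code): future.done() plays the role of result.ready() and future.result() the role of result.get().

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(add, 1, 2)  # like add.delay(1, 2)
    value = future.result()          # like result.get(): blocks, returns value or raises
    done = future.done()             # like result.ready()

print(value, done)  # 3 True
```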

Task Workflows

Signatures: Partials, Immutability, Callbacks

The Primitives: Chains, Groups, Chords, Map & Starmap, Chunks

13

Signatures

signature() wraps args, kwargs, options of a single task invocation in a way such that it can be:

• passed to functions

• serialized and sent across the wire

(signatures are also known as subtasks)

14

Create Signatures

# ws.tasks.add(1, 2)
s = signature('ws.tasks.add', args=(1, 2), countdown=10)
s = add.subtask((1, 2), countdown=10)
s = add.s(1, 2)
s = add.s(1, 2, debug=True)

# inspect fields
s.args     # (1, 2)
s.kwargs   # {'debug': True}
s.options  # {'countdown': 10}

# execute as task
s.delay()
s.apply_async()
s()

15
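The idea of bundling a function with (some of) its arguments so the bundle can be passed around and invoked later is close to functools.partial in the stdlib; a runnable analogy, not Celery's API:

```python
from functools import partial

def add(x, y):
    return x + y

s = partial(add, 1, 2)  # like add.s(1, 2): the arguments are frozen with the function
print(s())              # 3: invoked later, possibly somewhere else

p = partial(add, 1)     # like the partial add.s(1)
print(p(2))             # 3: the remaining argument is supplied at call time
```

Unlike a partial, a Celery signature can also be serialized and sent across the wire to a worker.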

Partial Signatures

16

Specifying additional args, kwargs or options in apply_async/delay creates a partial

• partial = add.s(1)

• partial.delay(2)  # 1 + 2

• partial.apply_async((2,))  # 1 + 2

Immutable Signatures

17

An immutable signature’s args and kwargs cannot be modified; only its options can be set

Use si() to create an immutable signature

• add.si(1, 2)

Callback Signatures

18

Use the link arg of apply_async to add callbacks

add.apply_async((1, 2), link=add.s(3))

Group

19

A signature that takes a list of tasks to be applied in parallel

s = group(add.s(i, i) for i in range(5))

s().get()  =>  [0, 2, 4, 6, 8]
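The behavior of a group, applying a list of invocations in parallel and collecting an ordered list of results, can be sketched with a stdlib thread pool instead of Celery workers:

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

# apply add(i, i) for i in 0..4 in parallel, keeping the order of results
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda i: add(i, i), range(5)))

print(results)  # [0, 2, 4, 6, 8]
```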

Chain

20

A chain of callbacks; think of a pipeline

c = chain(add.s(1, 2), add.s(3), add.s(4))

c = add.s(1, 2) | add.s(3) | add.s(4)

c().get()  =>  ((1 + 2) + 3) + 4  =>  10
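The pipeline shape of a chain, where each task's result is fed to the next task as its first argument, is a left fold; a plain-Python sketch of that shape (not Celery's implementation):

```python
def add(x, y):
    return x + y

# each step receives the previous result as its first argument
steps = [(add, (3,)), (add, (4,))]   # like add.s(3), add.s(4)
result = add(1, 2)                   # like add.s(1, 2), the head of the chain
for func, args in steps:
    result = func(result, *args)

print(result)  # ((1 + 2) + 3) + 4 == 10
```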

Chord

21

Like a group but with a callback

c = chord((add.s(i, i) for i in range(5)), xsum.s())

c().get()  =>  20

c = chord(add.s(i, i) for i in range(5))(xsum.s())

c.get()  =>  20
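A chord is a group plus a callback: run the header tasks in parallel, then feed the list of their results to the body task. A stdlib sketch of that shape, with plain sum() standing in for the xsum task from the slide:

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

# header: a group of tasks run in parallel
with ThreadPoolExecutor() as pool:
    header = list(pool.map(lambda i: add(i, i), range(5)))

# body: the callback receives the list of header results (xsum in the slide)
result = sum(header)
print(result)  # 0 + 2 + 4 + 6 + 8 == 20
```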

Map

22

Like the built-in map function

c = task.map([1, 2, 3])

c()  =>  [task(1), task(2), task(3)]

Starmap

23

Same as map, except the args are applied as *args

c = add.starmap([(1, 2), (3, 4)])

c()  =>  [add(1, 2), add(3, 4)]
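The calling convention is the same as the stdlib's itertools.starmap, which runs locally and makes the semantics easy to check:

```python
from itertools import starmap

def add(x, y):
    return x + y

# each tuple is unpacked into positional arguments, as in add(1, 2)
out = list(starmap(add, [(1, 2), (3, 4)]))
print(out)  # [3, 7]
```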

Chunks

24

Chunking splits a long list of args into parts

items = list(zip(range(10), range(10)))

c = add.chunks(items, 5)

c()  =>  [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
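The splitting step can be sketched with plain slicing, where each chunk would become one task processing its part of the list (a sketch of the idea, not Celery's code):

```python
def add(x, y):
    return x + y

items = list(zip(range(10), range(10)))  # [(0, 0), (1, 1), ..., (9, 9)]
n = 5                                    # items per chunk
chunks = [items[i:i + n] for i in range(0, len(items), n)]

# each chunk would become one task; here we apply add to each pair locally
results = [[add(x, y) for x, y in chunk] for chunk in chunks]
print(results)  # [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
```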

Worker

Auto reloading

Auto scaling

Time & Rate Limits

Resource Leak Protection

Scheduling

User Components

25

Autoreloading

Automatically reload the worker source code as it changes

celery worker --autoreload

26

Autoscaling

Dynamically resizing the worker pool depending on load or custom metrics defined by user

celery worker --autoscale=8,2

=>  max processes: 8, min processes: 2

27

Time & Rate Limits

Rate limits: the number of tasks per second/minute/hour

Time limits: how long a task is allowed to run

28

Resource Leak Protection

Limit number of tasks a pool worker process can execute before it’s replaced by a new one

celery worker --maxtasksperchild=10

29

Scheduling

Specify the time to run a task

in seconds, date time

periodic tasks (interval, crontab expressions)

30
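Periodic tasks are configured as a schedule consumed by celery beat. A hedged config sketch (the broker URL and schedule names are examples; ws.tasks.add is the task from the earlier slides; current Celery versions use beat_schedule, older ones the CELERYBEAT_SCHEDULE setting):

```python
from celery import Celery
from celery.schedules import crontab

app = Celery('ws', broker='redis://localhost:6379/0')  # example broker URL

app.conf.beat_schedule = {
    'add-every-morning': {
        'task': 'ws.tasks.add',
        'schedule': crontab(hour=7, minute=30),  # crontab expression: 07:30 daily
        'args': (1, 2),
    },
    'add-every-30-seconds': {
        'task': 'ws.tasks.add',
        'schedule': 30.0,                        # plain interval in seconds
        'args': (3, 4),
    },
}
```

Run `celery beat` alongside the workers to have the schedule enqueued.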

User Components

Celery internally uses a dependency graph called “bootsteps”, enabling fine-grained control of the worker

Customize the worker components, e.g. ConsumerStep

Add new components

Bootsteps http://celery.readthedocs.org/en/latest/userguide/extending.html

31

Monitoring

Flower - a real-time Celery web monitor

• Task progress and history

• Show task details (arguments, start time, runtime, and more)

• Graphs and statistics

• Shutdown, restart worker instances

• Control worker pool size, autoscaling settings

• …

32

Coding…

Get your hands dirty…

33

–Duy Do (@duydo)

Thank you

34