Celery: The Distributed Task Queue


Upload: rich-leland

Post on 13-May-2015


DESCRIPTION

An introduction to Celery from the April 6, 2010 meetup of ZPUGDC.

TRANSCRIPT

Page 1: Celery: The Distributed Task Queue

Celery

An introduction to the distributed task queue.

Rich Leland

ZPUGDC // April 6, 2010

@richleland


http://creative.discovery.com

Page 2

What is Celery?

A task queue based on distributed message passing.

Page 3

What is Celery?

An asynchronous, concurrent, distributed, super-awesome task queue.

Page 4

A brief history

• First commit in April 2009 as "crunchy"

• Originally built for use with Django

• Django is still a requirement

• Don't be scurred! No Django app required!

• It's for the ORM, caching, and signaling

• The future: Celery using SQLAlchemy and louie instead of Django

Page 5

Why should I use Celery?

Page 6

User perspective

• Minimize request/response cycle

• Smoother user experience

• Can be the difference between a pleasant and an unpleasant experience

Page 7

Developer perspective

• Offload time- and CPU-intensive processes

• Scalability - add workers as needed

• Flexibility - many points of customization

• About to turn 1 year old (April 24)

• Actively developed

• Great documentation

• Lots of tutorials

Page 8

LATENCY == DELAY == NOT GOOD!

Page 9

Business perspective

• Latency == $$$

• Every 100ms of latency cost Amazon 1% in sales

• Google found that an extra 0.5 seconds of search page generation time dropped traffic by 20%

• For an electronic trading platform, 5 ms of latency could mean $4 million in lost revenue per millisecond

http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it

Page 10

Example Uses

• Image processing

• Calculate points and award badges

• Upload files to a CDN

• Regenerate static files

• Generate graphs for enormous data sets periodically

• Send blog comments through a spam filter

• Transcode audio and video

Page 11

What do I need?

Page 12

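Diagram: Users send requests to the Application and receive responses. The Application places tasks on the Message Queue, which feeds Worker 1, Worker 2, Worker 3 ... Worker N. A Result Store holds task results.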
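The architecture on this slide can be imitated inside a single Python process. The sketch below is a toy built only on the standard library: `queue.Queue` stands in for the message queue, threads stand in for celeryd workers, and a plain dict stands in for the result store. `send_task`, `worker`, and the `TASKS` registry are hypothetical names; nothing here reflects how Celery is actually implemented.

```python
import queue
import threading
import uuid

# Toy stand-ins (assumptions, not Celery APIs):
# message_queue ~ the broker, result_store ~ the result backend,
# worker threads ~ celeryd processes.
message_queue = queue.Queue()
result_store = {}
result_lock = threading.Lock()

TASKS = {"add": lambda x, y: x + y}  # hypothetical task registry


def send_task(name, *args):
    """Application side: enqueue a task message and return its id."""
    task_id = str(uuid.uuid4())
    message_queue.put({"id": task_id, "task": name, "args": args})
    return task_id


def worker():
    """Worker side: pull messages, run the task, store the result."""
    while True:
        message = message_queue.get()
        if message is None:  # sentinel: shut this worker down
            message_queue.task_done()
            break
        value = TASKS[message["task"]](*message["args"])
        with result_lock:
            result_store[message["id"]] = value
        message_queue.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

ids = [send_task("add", i, i) for i in range(10)]
message_queue.join()  # block until every queued message has been processed
for _ in workers:
    message_queue.put(None)  # one shutdown sentinel per worker
for w in workers:
    w.join()

print([result_store[i] for i in ids])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Scaling out is a matter of starting more consumers on the same queue: here, more threads; in a real deployment, more celeryd processes, possibly on other machines.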

Page 13

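Diagram: the same architecture with concrete components. Users send requests to the Application and receive responses. The Application places tasks on RabbitMQ (the message queue), which feeds several celeryd workers. MongoDB serves as the result store.

Other result store options: Database, memcached, Redis, Tokyo Tyrant, AMQP

Other message queue options: Stomp, Redis, Database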

Page 14

USE RABBITMQ!

Page 15

Installation

Page 16

Installation

1. Install a message queue from source or with a package manager

2. pip install celery

3. pip install -r http://github.com/ask/celery/blob/v1.0.2/contrib/requirements/default.txt?raw=true

4. Configure your application

5. Launch services (app server, RabbitMQ, celeryd, etc.)

Page 17

Usage

Page 18

Configure

• celeryconfig.py for plain Python projects

• settings.py within a Django project
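A minimal celeryconfig.py for the plain-Python case might look like the sketch below. The setting names follow the Celery 1.0-era "first steps" documentation, with RabbitMQ as the broker and the database result backend; the user, password, vhost, and database name are placeholders. In a Django project the same settings would go in settings.py instead. Setting names changed in later Celery releases, so check the documentation for the version you install.

```python
# celeryconfig.py: a sketch of a Celery 1.0-era configuration module.
# All values below are placeholders; adjust them for your own broker and database.

# Broker (RabbitMQ) connection settings.
BROKER_HOST = "localhost"
BROKER_PORT = 5672               # RabbitMQ's default AMQP port
BROKER_USER = "myuser"           # hypothetical user
BROKER_PASSWORD = "mypassword"   # hypothetical password
BROKER_VHOST = "myvhost"         # hypothetical virtual host

# Where task results are stored (one of the result store options above).
CELERY_RESULT_BACKEND = "database"
DATABASE_ENGINE = "sqlite3"
DATABASE_NAME = "mydatabase.db"  # hypothetical database file

# Modules the worker imports at startup so their tasks get registered.
CELERY_IMPORTS = ("tasks",)
```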

Page 19

Define a task

from celery.decorators import task

@task
def add(x, y):
    return x + y

Page 20

Execute the task

>>> from tasks import add
>>> add.delay(4, 4)
<AsyncResult: 889143a6-39a2-4e52-837b-d80d33efb22d>
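`delay()` does not run `add` locally; it serializes a message describing the call and publishes it to the broker, and a worker later executes it. The sketch below builds a message of roughly that shape with the standard library's `json` and `uuid` modules. The field names (`task`, `id`, `args`, `kwargs`) mirror Celery's early message format, but `build_task_message` is a hypothetical helper and this is an illustration, not the exact wire format; Celery 1.0 also serialized with pickle by default rather than JSON.

```python
import json
import uuid


def build_task_message(task_name, args=(), kwargs=None):
    """Build an illustrative task message (not Celery's exact wire format)."""
    return {
        "task": task_name,          # registered task name, e.g. "tasks.add"
        "id": str(uuid.uuid4()),    # the id an AsyncResult would refer to
        "args": list(args),
        "kwargs": kwargs or {},
    }


message = build_task_message("tasks.add", (4, 4))
body = json.dumps(message)   # what would travel over the broker
decoded = json.loads(body)   # what a worker would read back
print(decoded["task"], decoded["args"])  # tasks.add [4, 4]
```

The random `id` is what ties the caller's AsyncResult to the eventual entry in the result store.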

Page 21

Analyze the results

>>> result = add.delay(4, 4)
>>> result.ready() # has the task finished processing?
False
>>> result.result # None: the task is not ready, so there is no return value yet.
>>> result.get() # wait until the task is done and get the return value.
8
>>> result.result # access the result
8
>>> result.successful()
True
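If the AsyncResult methods are unfamiliar, Python's standard `concurrent.futures` module offers a rough local analogue: `Future.done()` plays the role of `ready()`, `Future.result()` blocks like `get()`, and a `None` from `Future.exception()` loosely corresponds to `successful()`. This is an analogy only: a Future lives inside one process, whereas an AsyncResult refers to a result kept in the result store.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    time.sleep(0.1)  # pretend this is slow work
    return x + y


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(add, 4, 4)    # roughly like add.delay(4, 4)
    print(future.done())               # like result.ready(); most likely False here
    print(future.result())             # like result.get(): blocks, then prints 8
    print(future.done())               # True once the value is available
    print(future.exception() is None)  # loosely like result.successful(): True
```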

Page 22

The Task class

from datetime import date

from celery.task import Task

from myapp.models import Person  # hypothetical app path; the slide doesn't show it


class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """
    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)
        person = Person.objects.get(pk=person_id)
        now = date.today()
        diff = now - person.date_of_birth
        # i know, i know, this doesn't account for leap year
        age = diff.days // 365  # floor division keeps whole years
        person.can_drink = age >= 21
        person.save()
        return True
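The slide's own comment admits that dividing the day count by 365 ignores leap years, so the computed age can flip to 21 a few days before the actual birthday. One common fix, sketched here as a hypothetical helper `age_on` (not part of the original code), compares calendar dates directly: subtract the birth years, then subtract one more if this year's birthday hasn't happened yet. With this rule, someone born on February 29 turns a year older on March 1 in non-leap years; that is a policy choice, and some jurisdictions use February 28 instead.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Return the age in whole years on `today`, counting leap years correctly."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


print(age_on(date(1989, 4, 6), date(2010, 4, 6)))  # 21: the birthday is today
print(age_on(date(1989, 4, 7), date(2010, 4, 6)))  # 20: the birthday is tomorrow
```

Inside `run`, `person.can_drink = age_on(person.date_of_birth, date.today()) >= 21` could then replace the day-count arithmetic.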

Page 23

Task retries

class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """
    default_retry_delay = 5 * 60  # retry in 5 minutes
    max_retries = 5

    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)
        ...
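With `default_retry_delay = 5 * 60` and `max_retries = 5`, a failing task gets its original attempt plus up to five retries, each scheduled 300 seconds after the previous failure. In Celery a retry only happens when the task's code asks for one (the part elided with "..." on the slide), and the retried task goes back through the broker. The stdlib sketch below models only the policy's arithmetic, using a hypothetical `run_with_retries` helper that records delays instead of sleeping; it is not how Celery reschedules work.

```python
class MaxRetriesExceeded(Exception):
    """Raised when the task still fails after max_retries retries."""


def run_with_retries(func, max_retries=5, retry_delay=5 * 60):
    """Call func until it succeeds; return (value, delays_waited).

    Delays are recorded rather than slept, so the simulation runs instantly.
    """
    delays = []
    for attempt in range(max_retries + 1):  # 1 initial try + max_retries retries
        try:
            return func(), delays
        except Exception as exc:
            if attempt == max_retries:
                raise MaxRetriesExceeded(f"gave up after {max_retries} retries") from exc
            delays.append(retry_delay)


calls = {"n": 0}


def flaky():
    """Hypothetical task body that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("database unavailable")
    return True


value, delays = run_with_retries(flaky)
print(value, delays)  # True [300, 300]
```

With these settings, a task that never succeeds is abandoned after 5 × 300 = 1,500 seconds (25 minutes) of waiting, not counting its own run time.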

Page 24

The PeriodicTask class

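from datetime import timedelta

from celery.task import PeriodicTask

from myapp.models import Person  # hypothetical app path; the slide doesn't show it


class FullNameTask(PeriodicTask):
    """
    A periodic task that concatenates fields to form a person's full name.
    """
    run_every = timedelta(seconds=60)

    def run(self, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running full name task.")
        for person in Person.objects.all():
            parts = [person.prefix, person.first_name, person.middle_name,
                     person.last_name, person.suffix]
            # Skip empty or missing parts so the name has no doubled spaces.
            person.full_name = " ".join(filter(None, parts))
            person.save()
        return True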
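`run_every = timedelta(seconds=60)` means the task should run whenever at least 60 seconds have passed since its last run. The hypothetical `is_due` helper below sketches that check with the standard library, returning whether the task is due and how many seconds until the next check. It is an illustration of a fixed-interval schedule, not Celery's scheduler code, which also has to keep track of last-run times for every periodic task.

```python
from datetime import datetime, timedelta


def is_due(last_run_at, now, run_every):
    """Return (due, seconds_until_next_check) for a fixed-interval schedule."""
    next_run_at = last_run_at + run_every
    if now >= next_run_at:
        # Run now; the following run is a full interval away.
        return True, run_every.total_seconds()
    return False, (next_run_at - now).total_seconds()


run_every = timedelta(seconds=60)
last = datetime(2010, 4, 6, 19, 0, 0)
print(is_due(last, last + timedelta(seconds=45), run_every))  # (False, 15.0)
print(is_due(last, last + timedelta(seconds=60), run_every))  # (True, 60.0)
```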

Page 25

Holy chock-full of features, Batman!

• Messaging

• Distribution

• Concurrency

• Scheduling

• Performance

• Return values

• Result stores

• Webhooks

• Rate limiting

• Routing

• Remote-control

• Monitoring

• Serialization

• Tracebacks

• Retries

• Task sets

• Web views

• Error reporting

• Supervising

• init scripts
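Of the features listed, rate limiting is easy to picture in code. Celery lets a task declare a limit as a string such as "100/m" (tasks per second, minute, or hour). The sketch below, with hypothetical `parse_rate` and `TokenBucket` names, parses that kind of string and enforces it with a token bucket, one common rate-limiting technique. It makes no claim about how Celery implements the feature internally.

```python
import time

SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600}


def parse_rate(rate):
    """Convert a string like "100/m" into tasks per second (no unit means /s)."""
    count, _, unit = rate.partition("/")
    return float(count) / SECONDS_PER_UNIT[unit or "s"]


class TokenBucket:
    """Allow `rate` events per second, with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]  # fake clock so the example is deterministic
bucket = TokenBucket(parse_rate("60/m"), capacity=2, clock=lambda: fake_now[0])
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
fake_now[0] += 1.0  # one second later, one token has been refilled
print(bucket.try_acquire())  # True
```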

Page 26

Resources

Page 27

Community

• Friendly core dev: Ask Solem Hoel

• IRC: #celery

• Mailing lists: celery-users

• Twitter: @ask

Page 28

Docs and articles

Celery

• http://celeryproject.org

• http://ask.github.com/celery/

• http://ask.github.com/celery/tutorials/external.html

Message Queues

• http://amqp.org

• http://bit.ly/amqp_intro

• http://rabbitmq.com/faq.html

Page 29

Thank you!

Rich Leland

Discovery Creative

@richleland


http://creative.discovery.com