celery: the distributed task queue
An introduction to Celery from the April 6, 2010 meetup of ZPUGDC.
Celery
An introduction to the distributed task queue.
Rich Leland
ZPUGDC // April 6, 2010
@richleland
http://creative.discovery.com
What is Celery?
A task queue based on distributed message passing.
What is Celery?
An asynchronous, concurrent, distributed,
super-awesome task queue.
A brief history
• First commit in April 2009 as "crunchy"
• Originally built for use with Django
• Django is still a requirement
• Don't be scurred! No Django app required!
• It's for the ORM, caching, and signaling
• In the future, Celery is planned to use SQLAlchemy and louie instead
Why should I use Celery?
User perspective
• Minimize request/response cycle
• Smoother user experience
• Can be the difference between a pleasant and an unpleasant experience
Developer perspective
• Offload time/cpu intensive processes
• Scalability - add workers as needed
• Flexibility - many points of customization
• About to turn 1 (April 24)
• Actively developed
• Great documentation
• Lots of tutorials
LATENCY == DELAY == NOT GOOD!
Business perspective
• Latency == $$$
• Every 100ms of latency cost Amazon 1% in sales
• Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%
• A broker whose electronic trading platform is 5 ms behind the competition could lose $4 million in revenue per millisecond
http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
Example Uses
• Image processing
• Calculate points and award badges
• Upload files to a CDN
• Re-generate static files
• Generate graphs for enormous data sets periodically
• Send blog comments through a spam filter
• Transcoding of audio and video
What do I need?
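Diagram: Users send requests to the Application and receive responses. The Application sends tasks to a Message Queue, which hands them to Worker 1, Worker 2, Worker 3 ... Worker N. A Result Store holds task results.
Diagram (concrete stack): MongoDB as the result store, RabbitMQ as the message queue, and celeryd processes as the workers.
Result store options: Database, memcached, Redis, Tokyo Tyrant, AMQP
Message queue options: Stomp, Redis, Database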
USE RABBITMQ!
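To make the flow in the diagrams concrete, here is a toy, in-process model of that architecture using only the Python standard library. It is not how Celery works internally: a `queue.Queue` stands in for RabbitMQ, threads stand in for celeryd workers, and a dict stands in for the result store. `run_toy_pipeline` is a hypothetical name used only for this illustration.

```python
import queue
import threading


def run_toy_pipeline(jobs, num_workers=3):
    """Toy, in-process model of the Celery architecture (not Celery itself).

    jobs: list of (task_id, func, args) tuples the "application" publishes.
    Returns a dict acting as the result store: task_id -> return value.
    """
    message_queue = queue.Queue()   # stands in for RabbitMQ
    result_store = {}               # stands in for MongoDB, Redis, etc.
    store_lock = threading.Lock()

    def worker():                   # stands in for one celeryd process
        while True:
            message = message_queue.get()
            if message is None:     # sentinel: no more work for this worker
                break
            task_id, func, args = message
            value = func(*args)
            with store_lock:
                result_store[task_id] = value

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # The "application" only publishes messages; it never runs the work itself.
    for job in jobs:
        message_queue.put(job)
    for _ in workers:
        message_queue.put(None)     # one shutdown sentinel per worker

    for w in workers:
        w.join()
    return result_store
```

For example, `run_toy_pipeline([("a", pow, (2, 10))])` returns `{"a": 1024}`. A real deployment puts a network broker between these pieces, which is what lets workers be added on other machines as load grows.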
Installation
1. Install a message queue from source or with a package manager
2. pip install celery
3. pip install -r http://github.com/ask/celery/blob/v1.0.2/contrib/requirements/default.txt?raw=true
4. Configure the application
5. Launch services (app server, RabbitMQ, celeryd, etc.)
Usage
Configure
• celeryconfig.py for pure Python (non-Django) projects
• settings.py within a Django project
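A minimal celeryconfig.py for a non-Django project might look like the sketch below. The setting names are taken from Celery 1.0-era documentation and should be checked against the installed version (later releases renamed CELERY_BACKEND to CELERY_RESULT_BACKEND); the host, credentials, and database file name are placeholder assumptions, and "tasks" assumes your tasks live in a module named tasks.py.

```python
# celeryconfig.py: a minimal sketch for a non-Django project.
# Setting names follow Celery 1.0-era docs; all values are placeholders.

# Message queue (broker): a local RabbitMQ using its default "guest" account.
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"

# Result store: the database backend, which in this era relied on
# Django's ORM settings under the hood.
CELERY_BACKEND = "database"
DATABASE_ENGINE = "sqlite3"
DATABASE_NAME = "celery.db"

# Modules celeryd imports at startup so it can find your tasks.
CELERY_IMPORTS = ("tasks",)
```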
Define a task
# tasks.py
from celery.decorators import task

@task
def add(x, y):
    return x + y
Execute the task
>>> from tasks import add
>>> add.delay(4, 4)
<AsyncResult: 889143a6-39a2-4e52-837b-d80d33efb22d>
Analyze the results
>>> result = add.delay(4, 4)
>>> result.ready() # has the task finished processing?
False
>>> result.result # task is not ready, so no return value yet.
None
>>> result.get() # wait until the task is done and get retval.
8
>>> result.result # access result
8
>>> result.successful()
True
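The delay/ready/get pattern has a close cousin in Python's standard `concurrent.futures` module, which can help build intuition without a broker. This is only a rough analogy: a `Future` runs the work in a local thread pool, while Celery's AsyncResult tracks a task that may run on another machine. The mapping in the comments (submit ~ delay, done ~ ready, result ~ get) is an informal comparison, not an equivalence.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def add(x, y):
    time.sleep(0.1)  # pretend this is slow work
    return x + y


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(add, 4, 4)        # like add.delay(4, 4): returns at once
    print(future.done())                   # like result.ready(); usually False this early
    print(future.result())                 # like result.get(): blocks, then prints 8
    print(future.exception() is None)      # like result.successful(): no error raised
```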
The Task class
from datetime import date

from celery.task import Task

from myapp.models import Person  # hypothetical Django app; adjust to your project


class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """
    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)

        person = Person.objects.get(pk=person_id)
        today = date.today()
        dob = person.date_of_birth
        # Whole calendar years elapsed, minus one if this year's birthday
        # hasn't happened yet; unlike days / 365, this handles leap years.
        age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
        person.can_drink = age >= 21
        person.save()
        return True
Task retries
class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """
    default_retry_delay = 5 * 60  # retry in 5 minutes
    max_retries = 5

    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)
        ...
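To illustrate what default_retry_delay and max_retries mean, here is a toy model in plain Python. It is not Celery's code: in Celery the task requests a retry from inside run() (the part elided on the slide), whereas this sketch simply retries on any exception. `run_with_retries` and `RetryLimitExceeded` are hypothetical names, and `sleep` is injectable so the behavior can be checked without actually waiting five minutes.

```python
import time


class RetryLimitExceeded(Exception):
    """Hypothetical error raised when all retries are used up."""


def run_with_retries(func, max_retries=5, retry_delay=5 * 60, sleep=time.sleep):
    """Toy model of default_retry_delay / max_retries semantics.

    Calls func(); on an exception, waits retry_delay seconds and tries again,
    up to max_retries retries (so at most max_retries + 1 calls in total).
    """
    attempts = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if attempts >= max_retries:
                raise RetryLimitExceeded(
                    "gave up after %d retries" % attempts) from exc
            attempts += 1
            sleep(retry_delay)  # wait before the next attempt
```

Note the counting: max_retries = 5 allows five retries after the first attempt, so a task that keeps failing is called six times before giving up.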
The PeriodicTask class
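from datetime import timedelta

from celery.task import PeriodicTask

from myapp.models import Person  # hypothetical Django app; adjust to your project


class FullNameTask(PeriodicTask):
    """
    A periodic task that concatenates fields to form a person's full name.
    """
    run_every = timedelta(seconds=60)

    def run(self, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running full name task.")
        for person in Person.objects.all():
            parts = [person.prefix, person.first_name, person.middle_name,
                     person.last_name, person.suffix]
            # Skip empty or None parts: a blank middle name would otherwise
            # leave a double space, which strip() (ends only) can't remove.
            person.full_name = " ".join(part for part in parts if part)
            person.save()
        return True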
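With run_every = timedelta(seconds=60), a scheduler (celerybeat in this era) periodically checks whether the task is due. The sketch below is a loose, hypothetical model of that check, not Celery's scheduler code: `is_due` returns whether the task should run now and how many seconds remain until it is worth checking again.

```python
from datetime import datetime, timedelta


def is_due(last_run_at, run_every, now):
    """Toy model of a run_every schedule (not Celery's scheduler code).

    Returns (due, seconds_until_next_check).
    """
    next_run_at = last_run_at + run_every
    if now >= next_run_at:
        # Due now; after running, the next run is a full interval away.
        return True, run_every.total_seconds()
    return False, (next_run_at - now).total_seconds()


last = datetime(2010, 4, 6, 19, 0, 0)
print(is_due(last, timedelta(seconds=60), last + timedelta(seconds=30)))
```

Here the task last ran 30 seconds ago, so it is not due yet and the call reports 30 seconds until the next check.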
Holy chock-full-of-features, Batman!
• Messaging
• Distribution
• Concurrency
• Scheduling
• Performance
• Return values
• Result stores
• Webhooks
• Rate limiting
• Routing
• Remote-control
• Monitoring
• Serialization
• Tracebacks
• Retries
• Task sets
• Web views
• Error reporting
• Supervising
• init scripts
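As one example from the list, rate limiting in task queues is commonly built on a token bucket. The sketch below shows that generic technique, not Celery's implementation; the class, `parse_rate`, and the "10/m"-style rate string are assumptions for illustration. The clock is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Toy token bucket: allows `rate` operations per second, bursting up
    to `capacity`. Time is passed in explicitly so it is testable."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)   # start full
        self.last = now

    def can_consume(self, now, tokens=1):
        # Refill for the time elapsed since the last check, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


def parse_rate(rate_string):
    """Turn an assumed "10/m"-style limit into operations per second."""
    count, _, unit = rate_string.partition("/")
    seconds = {"s": 1, "m": 60, "h": 3600}[unit or "s"]
    return float(count) / seconds
```

A worker using such a bucket would hold back a task whenever can_consume() returns False and try again once enough time has passed to refill a token.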
Resources
Community
• Friendly core dev: Ask Solem Hoel
• IRC: #celery
• Mailing lists: celery-users
• Twitter: @ask
Docs and articles
Celery
• http://celeryproject.org
• http://ask.github.com/celery/
• http://ask.github.com/celery/tutorials/external.html
Message Queues
• http://amqp.org
• http://bit.ly/amqp_intro
• http://rabbitmq.com/faq.html
Thank you!
Rich Leland
Discovery Creative
@richleland
http://creative.discovery.com