job queues overview
● Allows for asynchronous computation of jobs (or tasks)
● Uses consumers (or workers) to complete the job in the background
● Results are available when the job is complete
Queue
● enqueue ➜ adds an item to the end of the queue
● dequeue ➜ pulls the oldest item off the queue
● isEmpty ➜ boolean
● length ➜ integer (number of items in queue)
Queue Operations
For an unbounded queue, we choose a singly linked list with head and tail pointers as the data structure.
Queue Data Structure
All O(1) operations!
● enqueue - sets the current tail's next pointer, then the tail pointer, to the new item
● dequeue - returns the current head and advances the head pointer to the head's next pointer
● isEmpty - head (and tail) is null
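A minimal sketch of the linked-list queue described above, assuming Python. The class and method names (`Node`, `LinkedQueue`, `is_empty`) are illustrative choices, not taken from the slides.

```python
from typing import Any, Optional


class Node:
    """One item in the singly linked list."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.next: Optional["Node"] = None


class LinkedQueue:
    """Unbounded FIFO queue backed by head and tail pointers."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None
        self.length = 0

    def is_empty(self) -> bool:
        return self.head is None

    def enqueue(self, value: Any) -> None:
        node = Node(value)
        if self.tail is None:
            # Empty queue: the new node is both head and tail.
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.length += 1

    def dequeue(self) -> Any:
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            # Removed the last item, so the tail must be cleared too.
            self.tail = None
        self.length -= 1
        return node.value
```

No operation walks the list, which is why each one runs in constant time.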
Producers push jobs onto the job queue
Examples:
● Web servers - A typical HTTP response must return within a short timeframe (200ms - 2000ms)
● Humans phoning into tech support
Producers
Consumers pop jobs off of the queue and complete them
Example use cases (any long running process):
● Map / reduce calls on large datasets
● Media conversion, manipulation and rendering
● Image resize
● Downloading remote resources
● CPU intensive tasks (calculations)
Consumers
Producers and Consumers can be part of the same process!
Example: a web crawler (breadth first search)
1. Push a base URL to the queue (e.g. http://yahoo.com/)
2. Pop a URL from the queue and parse it
3. For each link on the page, push it onto the queue
4. Go to step 2
Producers and Consumers
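The crawler loop can be sketched as follows, assuming Python. The `LINKS` dictionary is a hypothetical stand-in for fetching and parsing real pages, and the `seen` set is an addition not on the slide, included so that the loop terminates when pages link back to each other.

```python
from collections import deque

# Hypothetical link graph standing in for real pages (no network access).
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def crawl(base_url, get_links):
    """Breadth-first crawl: the same loop produces and consumes jobs."""
    queue = deque([base_url])      # step 1: push the base URL
    seen = {base_url}
    visited_order = []
    while queue:
        url = queue.popleft()      # step 2: pop a URL and "parse" it
        visited_order.append(url)
        for link in get_links(url):
            if link not in seen:   # step 3: push each newly found link
                seen.add(link)
                queue.append(link)
    return visited_order


order = crawl("http://example.com/", lambda url: LINKS.get(url, []))
```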
Each job exists in one of the following states:
● Queued
● Processing (in progress)
● Completed
● Failed
Jobs may also output:
● Logs
● Progress (% complete)
Job States
Consumers are functional. The only input they receive comes from the job, which comes from the producer.
Job data should include:
● Type
● Any information needed to complete the job
Job Data
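One possible shape for a job record and a consumer that depends only on the job's data, assuming Python. The names `Job`, `JobState`, `HANDLERS`, `run_job` and the `image_resize` handler are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List


class JobState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Job:
    type: str                           # selects the handler
    data: Dict[str, Any]                # everything needed to complete the job
    state: JobState = JobState.QUEUED
    result: Any = None
    logs: List[str] = field(default_factory=list)


def resize_image(data: Dict[str, Any]) -> tuple:
    # Hypothetical handler: halves the requested dimensions.
    return (data["width"] // 2, data["height"] // 2)


HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "image_resize": resize_image,
}


def run_job(job: Job) -> Job:
    """Consumer step: the only input is the job itself."""
    job.state = JobState.PROCESSING
    try:
        job.result = HANDLERS[job.type](job.data)
        job.state = JobState.COMPLETED
    except Exception as exc:
        job.logs.append(f"failed: {exc!r}")
        job.state = JobState.FAILED
    return job
```

An unknown job type or a handler error ends in the Failed state with a log entry instead of crashing the consumer.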
...states that the speedup a concurrent algorithm can achieve is limited by the serial path.
Locks and serial parts limit the maximum performance of a concurrent system.
Amdahl’s law...
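For illustration, the usual statement of Amdahl's law is speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. A small sketch, assuming Python; the function name is an illustrative choice.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 10% serial work, adding workers can never reach a 10x speedup.
few_workers = amdahl_speedup(0.9, 10)
many_workers = amdahl_speedup(0.9, 1_000_000)
```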
● Priority ordered Queue data structure
● Highest priority jobs are dequeued first
● On the same priority level, oldest jobs are dequeued first
Priority Queue
● enqueue ➜ adds a job to the end of the queue with a priority
● dequeue ➜ pulls the highest priority, oldest job off the queue
● isEmpty ➜ boolean
● length ➜ integer (number of items in queue)
Priority Queue Operations
● Data structure (max heap)
● Binary tree with the max heap property (each parent node is greater than or equal to its children)
● For a priority queue, each item in the tree would be a pointer to a regular queue for that priority
Priority Queue Data Structure
Enqueue and dequeue O(log n) operations!
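A sketch of a heap of priority levels, each pointing to a regular FIFO queue, assuming Python. The standard `heapq` module is a min heap, so priorities are stored negated to get max-heap behaviour; `PriorityJobQueue` and its method names are illustrative.

```python
import heapq
from collections import deque


class PriorityJobQueue:
    """Heap of priority levels; each level holds a FIFO queue of jobs."""

    def __init__(self):
        self._heap = []      # negated priorities, one per non-empty level
        self._levels = {}    # priority -> deque of jobs, oldest first
        self._length = 0

    def enqueue(self, job, priority):
        if priority not in self._levels:
            # New level: O(log k) push, where k is the number of levels.
            self._levels[priority] = deque()
            heapq.heappush(self._heap, -priority)
        self._levels[priority].append(job)
        self._length += 1

    def dequeue(self):
        if not self._heap:
            raise IndexError("dequeue from empty queue")
        priority = -self._heap[0]          # highest priority level
        level = self._levels[priority]
        job = level.popleft()              # oldest job on that level
        if not level:
            # Level drained: remove it from the heap.
            heapq.heappop(self._heap)
            del self._levels[priority]
        self._length -= 1
        return job

    def is_empty(self):
        return self._length == 0

    def __len__(self):
        return self._length


pq = PriorityJobQueue()
pq.enqueue("low", 1)
pq.enqueue("high-1", 5)
pq.enqueue("high-2", 5)
pq.enqueue("mid", 3)
drained = [pq.dequeue() for _ in range(len(pq))]
```

In this layout the logarithmic cost applies only when a priority level is created or emptied; other operations touch just the front or back of one level's queue.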
● Average wait time per job type
● Number of queued jobs
● Jobs processed / time
● Jobs pushed / time
Jobs processed / time ≥ Jobs pushed / time
Otherwise a backlog forms!
Priority Queue Metrics
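A small sketch of how the two rates determine the backlog, assuming Python and constant rates over the interval. The function name and the sample figures are hypothetical.

```python
def backlog_after(queued: int, pushed_per_sec: float,
                  processed_per_sec: float, seconds: float) -> float:
    """Estimated number of queued jobs after `seconds` at constant rates."""
    growth = (pushed_per_sec - processed_per_sec) * seconds
    # A queue cannot hold fewer than zero jobs.
    return max(0.0, queued + growth)


# Pushing 120 jobs/s while processing 100 jobs/s: the backlog grows.
growing = backlog_after(0, 120, 100, 60)
# Processing faster than pushing: an existing backlog drains.
draining = backlog_after(500, 100, 120, 10)
```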
In sophisticated job systems, a job scheduler exists to:
● Maximize use of computing power
● Minimize wait time
● Provide an interface to job tasks
They can use a combination of priority, estimated (historical) job time and available computing power to determine how jobs are run. Sophisticated job scheduling algorithms exist.
Job Scheduler
Case Study: Grocery Lines
4 consumers, 4 queues, 12 jobs of varying durations
Average wait time = (10 + 13 + 4 + 6 + 1 + 9 + 6 + 13) / 12 = 5.1666...
Case Study: Grocery Lines
4 consumers, 1 queue, 12 jobs of varying durations
Order (wait time in parentheses): 6, 1, 4, 10, 7 (1), 8 (4), 2 (6), 3 (6), 11 (8), 5 (8), 12 (9), 9 (10)
Average wait time = (1 + 4 + 6 + 6 + 8 + 8 + 9 + 10) / 12 = 4.3333...
Case Study: Grocery Lines
4 consumers, 1 queue, 12 jobs of varying durations intelligently ordered to minimize wait time:
Order (wait time in parentheses): 1, 2, 3, 4, 5 (1), 6 (2), 7 (3), 8 (4), 9 (5), 10 (6), 11 (8), 12 (9)
Average wait time = (1 + 2 + 3 + 4 + 5 + 6 + 8 + 9) / 12 = 3.1666...
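The single-queue cases can be simulated with a short sketch, assuming Python. The slides do not list the job durations, so the values below are hypothetical and the results will not match the figures above; sorting shortest job first is one simple ordering used here for illustration, not necessarily the ordering used in the case study.

```python
import heapq


def average_wait(durations, consumers=4):
    """Average wait when all jobs arrive at time 0 and share one queue."""
    free_at = [0] * consumers   # time at which each consumer becomes free
    heapq.heapify(free_at)
    total_wait = 0
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest available consumer
        total_wait += start              # wait = start time - arrival (0)
        heapq.heappush(free_at, start + duration)
    return total_wait / len(durations)


# Hypothetical durations for 12 jobs, in arrival order.
arrival_order = [5, 1, 4, 8, 3, 6, 2, 7, 2, 4, 6, 3]

fifo_wait = average_wait(arrival_order)              # as they arrived
shortest_first_wait = average_wait(sorted(arrival_order))
```

With these sample durations, running the shortest jobs first gives a lower average wait than arrival order.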
● Beanstalkd (C) http://kr.github.io/beanstalkd/
● Celery (Python + many backends) http://www.celeryproject.org/
● Delayed::Job (Ruby + DB) https://github.com/collectiveidea/delayed_job
● Gearman (C++) http://gearman.org/
● Kue (Node + Redis) https://github.com/learnboost/kue
● Resque (Ruby + Redis) http://resquework.org/
● RQ (Python + Redis) http://python-rq.org/
● Sidekiq (Ruby) http://sidekiq.org/
● SQS by Amazon (managed) http://aws.amazon.com/sqs/
More links and information at http://queues.io/
Job Queue Software