Faster Drupal sites using Queue API
Frédéric G. MARAND (fgm), OSInet
Yuriy GERASIMOV (ygerasimov)
Faster web sites with Queue API
Frédéric G. Marand
● OSInet: performance/architecture consulting for internal teams at larger accounts
● Core contributor 4.7 to 8.2.x, MongoDB + XMLRPC maintainer + others
● Already 7 D8 customer projects, 4 before 8.0.0
● Customer D8 in production since 07/2015
● Frequently adds queueing to larger Drupal projects: Beanstalkd, RabbitMQ, Apache Kafka...
fgm
Yuriy Gerasimov
● FFW
● Drupal architect & developer
● Contrib 7 modules: services, draggableviews
● Founder at Backtrac.io
ygerasimov
Why use queues?
To have websites which are:
● Faster for visitors
● Snappier for editors
● More scalable
To process time-consuming jobs:
● Video encoding
● High-resolution gallery uploads and processing
Actual use cases
● Prepare content for non-Drupal front-ends
● Anticipate content generation
● Deferred submits, e.g. comments handling
● Slow operations: node saves, previews, image processing
● External data sources: pull, push
● Multi-step operations: batch
Cooking content for front-ends
Frontend
Anticipated content generation
● Blocks
● Ctools content types
● Controllers
● etc.
Contrib: http://github.com/FGM/lazy
[Figure: timeline for usual Drupal. States over time: Fresh (t0), Stale (t1), Expired (t2). Content is created, then served from cache; after expiry the cache is regenerated.]
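[Figure: timeline for anticipated content generation. States over time: Fresh (t0), Stale (t1), Fresh (t2). Content is created, then served from cache; once stale, it is still served from cache while an update is requested and stored; it is then served from cache again.]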
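The anticipated-generation timeline can be sketched in a few lines. This is a minimal in-memory model written in Python, not code from the lazy module or from Drupal; the durations, the `build()` renderer and all other names are hypothetical.

```python
from collections import deque

FRESH_FOR = 60    # seconds an entry counts as fresh (hypothetical value)
STALE_FOR = 300   # extra seconds a stale entry may still be served (hypothetical)

cache = {}               # key -> (value, created_at)
rebuild_queue = deque()  # keys waiting for regeneration by a worker
pending = set()          # keys already queued, so each key is queued once


def build(key):
    """Stand-in for an expensive render such as a block or controller."""
    return f"rendered:{key}"


def get(key, now):
    """Serve from cache when possible; queue a rebuild when the entry is stale."""
    entry = cache.get(key)
    if entry is not None:
        value, created = entry
        age = now - created
        if age < FRESH_FOR:
            return value                      # fresh: plain cache hit
        if age < FRESH_FOR + STALE_FOR:
            if key not in pending:            # stale: serve it, request an update
                pending.add(key)
                rebuild_queue.append(key)
            return value
    value = build(key)                        # missing or expired: build inline
    cache[key] = (value, now)
    return value


def run_worker(now):
    """Regenerate every queued key outside of the page request."""
    while rebuild_queue:
        key = rebuild_queue.popleft()
        cache[key] = (build(key), now)
        pending.discard(key)
```

The visitor only pays the build cost when an entry is missing or fully expired; in the stale window the cost moves to the worker.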
Comments handling
“Pull” data sources (aggregator)
“Push” data sources
Image processing
Job servers
● How to get results
● Rerun failed jobs
● Separate queue for failed jobs
● Monitoring queues, workers
● Supervisor
Some implementations
Queue                    D6     D7     D8
Memory                   -      core   core
Database                 OK     core   core
AdvancedQueue            -      OK     Not yet
Amazon SQS (aws_sqs)     -      OK     Not yet
Beanstalkd               -      OK     OK
Apache Kafka             -      OK     Started
Gearman                  OK     OK     Not yet
MongoDB                  -      OK     Started
PHPResque                -      OK     Not yet
RabbitMQ                 -      OK     OK
Redis (redis[_queue])    OK     OK     Alpha
evQueue: Private
Queue API: concepts
Queue: a minimally-featured FIFO
Worker: the code actually doing the work
Item: a piece of workload submitted to the queue
Runner: the process triggering/monitoring workers
Batch subsystem: a high-level API on top of Queue API
D8: Manager, Plugins
D6/D7 Queue API
D7: core
D6: drupal_queue module
Declaring queues:
hook_cron_queue_info[_alter]()
● “Skip on cron”: enables decoupling from cron runs
● Time: max lifetime allocated to process items during a cron run, useless with skip on cron = TRUE
● Worker callback: an implementation of callback_queue_worker(mixed queue_item): void
API usable without cron
Default Runner:
● In the cron subsystem
● Pokemon exception handling (catches every exception)
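The declaration keys and the default runner above can be modeled as follows. This is a Python sketch of the behavior, not Drupal's PHP code: the queue names and worker functions are hypothetical, and the 15-second fallback is what I recall from Drupal 7 core, so treat it as an assumption.

```python
import time
from collections import deque

processed, errors = [], []


def make_thumbnail(data):
    """Hypothetical worker callback: fails on one specific file."""
    if data == "broken.png":
        raise ValueError("cannot decode " + data)
    processed.append(data)


def encode_video(data):
    """Hypothetical worker callback for a queue handled outside cron."""
    processed.append(data)


def cron_queue_info():
    """Model of a hook_cron_queue_info() return value."""
    return {
        "thumbnails": {"worker callback": make_thumbnail, "time": 30},
        "video": {"worker callback": encode_video, "skip on cron": True},
    }


queues = {
    "thumbnails": deque(["a.png", "broken.png", "b.png"]),
    "video": deque(["talk.mp4"]),
}


def cron_run():
    """Process each declared queue within its time budget."""
    for name, info in cron_queue_info().items():
        if info.get("skip on cron"):
            continue  # left to a dedicated runner
        deadline = time.monotonic() + info.get("time", 15)  # assumed default
        queue = queues[name]
        while queue and time.monotonic() < deadline:
            data = queue.popleft()
            try:
                info["worker callback"](data)
            except Exception as exc:  # "Pokemon" handling: catch them all
                # This model drops the failing item after logging it; a real
                # backend would typically keep it claimed until the lease ends.
                errors.append((name, data, str(exc)))
```

Note that the time budget is only checked between items, so a single slow item can still overrun it.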
D8 Queue API
API usable without cron
Declaring queue workers:
Service: plugin.manager.queue_worker
Instantiates QueueWorker plugins
Definition:
● Cron, not enabled by default
○ Time: max lifetime allocated to process items during a cron run
● Core examples: AggregatorRefresh, LocaleTranslation
● hook_queue_info_alter()
Default Runner:
In the cron subsystem: Drupal\Core\Cron::processQueues()
SuspendQueueException: $q->releaseItem()
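The role of SuspendQueueException in the runner loop can be illustrated with a small model. The Python below is a sketch of the control flow only, not the body of Cron::processQueues(); `ListQueue`, `process_queue` and `push_to_remote` are hypothetical names.

```python
from collections import deque


class SuspendQueueException(Exception):
    """Python stand-in for Drupal's exception of the same name."""


class ListQueue:
    """Tiny in-memory queue supporting claim, release and delete."""

    def __init__(self, items):
        self.items = deque(items)

    def claim_item(self):
        return self.items.popleft() if self.items else None

    def release_item(self, item):
        self.items.appendleft(item)  # put it back at the head

    def delete_item(self, item):
        pass  # already removed from the deque when claimed


def process_queue(queue, process_item, log):
    """Run items until the queue is empty or the worker suspends it."""
    done = 0
    while (item := queue.claim_item()) is not None:
        try:
            process_item(item)
            queue.delete_item(item)
            done += 1
        except SuspendQueueException as exc:
            queue.release_item(item)   # give the item back untouched
            log.append(f"suspended: {exc}")
            break                      # stop working on this queue for now
        except Exception as exc:
            # Any other failure is logged; this model drops the item.
            log.append(f"failed: {exc}")
    return done


def push_to_remote(item):
    """Hypothetical worker: the remote service goes down at item "c"."""
    if item == "c":
        raise SuspendQueueException("remote API is down")


q = ListQueue(["a", "b", "c", "d"])
log = []
count = process_queue(q, push_to_remote, log)
```

A worker raises the exception when it knows the remaining items would fail too, for instance when an external service is unreachable.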
Queue API methods: Queue
QueueInterface
● Q::createItem(mixed $data): void
● Q::claimItem($lease_time = 3600): mixed $item
○ FALSE | stdClass + [item_id => int, data => mixed, created => timestamp]
○ $lease_time → Assumptions for runner, currently not used
● Q::deleteItem($item): void -> work done
● Q::releaseItem($item): bool
● Q::numberOfItems(): int → best guess, unreliable
● Q::createQueue() / Q::deleteQueue()
ReliableQueueInterface: ordering, single execution
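To make the claim / delete / release cycle concrete, here is a minimal in-memory model of the methods listed above. It is a Python sketch, not Drupal's Memory or Database queue: method names are snake_case equivalents, the `now` parameter exists only to make the example deterministic, and reclaiming an item whose lease has expired is a modeling choice rather than documented core behavior.

```python
import itertools
import time
from types import SimpleNamespace


class MemoryQueue:
    """FIFO with leases: a claimed item stays hidden until released or expired."""

    def __init__(self):
        self._items = {}                 # item_id -> item, in insertion order
        self._ids = itertools.count(1)

    def create_item(self, data):
        item_id = next(self._ids)
        self._items[item_id] = SimpleNamespace(
            item_id=item_id, data=data, created=time.time(), expire=0
        )

    def claim_item(self, lease_time=3600, now=None):
        now = time.time() if now is None else now
        for item in self._items.values():
            if item.expire <= now:       # unclaimed, or lease has run out
                item.expire = now + lease_time
                return item
        return False                     # nothing claimable right now

    def delete_item(self, item):
        self._items.pop(item.item_id, None)   # work done

    def release_item(self, item):
        if item.item_id in self._items:
            self._items[item.item_id].expire = 0
            return True
        return False

    def number_of_items(self):
        return len(self._items)          # includes items currently claimed
```

A worker that crashes between `claim_item()` and `delete_item()` leaves the item claimed; in this model it becomes claimable again once the lease expires.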
Queue API methods: others
Queue service → QueueFactory::get($name, $reliable)
QueueManager: a vanilla plugin manager
● In charge of hook_queue_info_alter()
● createInstance($plugin_id, $configuration)
QueueWorkerInterface:
● processItem(mixed data): void @throws SuspendQueueException
Queue Runners
Core / Contrib
● Core Cron / Elysia Cron / Queue_Runner
● Drush: queue-list / queue-run
● Similar limitations:
○ Default on in D6 / D7, default off in D8
○ Limited timeout support: non preemptive
○ Single threaded, single process across queues
Custom runners
● Provided by queue modules or per-project one-offs
● Preemption, parallel execution...
Queue API limitations
Limited FIFO paradigm
● D8: non-Reliable QueueInterface: datagram
No monitoring
No queue disciplines
● Priority management
● Tagging
● Delay, burying ...
Implementations may provide more
● Item structure is free-form: add richer interfaces
No Peek(), no LIFO, no deduplication: hacks
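One such hack, deduplication, can be layered on top of a plain FIFO by exploiting the free-form item structure. The Python sketch below is only an illustration of the idea; `DedupQueue` is a hypothetical class, and it assumes item data is JSON-serializable.

```python
import hashlib
import json
from collections import deque


class DedupQueue:
    """FIFO that refuses an item while an identical one is still queued or claimed."""

    def __init__(self):
        self._items = deque()
        self._keys = set()

    @staticmethod
    def _key(data):
        # Sorted keys make the fingerprint independent of dict ordering.
        payload = json.dumps(data, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def create_item(self, data):
        key = self._key(data)
        if key in self._keys:
            return False                  # duplicate: not queued again
        self._keys.add(key)
        self._items.append({"key": key, "data": data})
        return True

    def claim_item(self):
        return self._items.popleft() if self._items else None

    def delete_item(self, item):
        self._keys.discard(item["key"])   # the same work may be queued again
```

The fingerprint is only forgotten on `delete_item()`, so a job that is still being processed is not queued a second time.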
Performance edge
Runners:
● Avoid active polling à la core DB
● Use a blocking layer + select()
● Parallel handling of multiple queues → multiple runners, scheduling
Workers: read after write
● Write in the queue → cache invalidated
● Read again → cache primed
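The difference between polling and blocking can be shown with Python's standard library. This sketch swaps in a blocking `queue.Queue.get()` for the select() call mentioned above, and it is not tied to any Drupal queue backend; the `runner` function and the `None` stop marker are illustrative choices.

```python
import queue
import threading


def runner(jobs, handle, results):
    """Block until an item arrives instead of polling in a sleep loop."""
    while True:
        item = jobs.get()          # sleeps inside get() until work is available
        try:
            if item is None:       # hypothetical stop marker
                break
            results.append(handle(item))
        finally:
            jobs.task_done()


jobs = queue.Queue()
results = []
thread = threading.Thread(
    target=runner, args=(jobs, str.upper, results), daemon=True
)
thread.start()

for name in ("a", "b"):
    jobs.put(name)
jobs.put(None)                     # ask the runner to stop

jobs.join()                        # wait until every item has been handled
thread.join(timeout=5)
```

Starting one such thread or process per queue is one way to get the parallel handling listed above, at the cost of having to schedule and supervise the runners.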
Sprints: all week
https://www.flickr.com/photos/amazeelabs/9965814443/in/faves-38914559@N03/
Sprint with the Community until Sunday
We have tasks for every skillset.
Mentors are available for new contributors.
Follow @drupalmentoring.