Vadym Abramchuk — Big Drupal: Issues We Met


Post on 29-Jan-2018
  1. Big Drupal: Issues We Met. Vadym Abramchuk, InternetDevels
  2. Project overview: a realty analytics platform used by professional brokers; new data is imported every few hours.
  3. The technologies: PHP, MySQL, Drupal 7.
  4. The technologies: MySQL as the primary storage; Solr 5.2 as the search engine; PostgreSQL + PostGIS for spatial calculations.
  5. I wanted to say "one million nodes"... but I didn't.
  6. The facts: 958,769 nodes; 7,004,555 ECK entities in 7 types; 375,874 custom entity type rows; MySQL database size: 4 GB gzipped, 66 GB disk space; Solr indexes: 91 GB.
  7. Importing data into Drupal
  8. Bulk data import: know your data; prepare your data; avoid direct insertions, or insert directly if you're brave enough (or don't have a choice); watch your step.
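If you do decide to insert directly, batching rows into multi-row statements is the usual way to keep it fast. This is not from the slides — a minimal sketch of the pattern using PDO with an in-memory SQLite database and a hypothetical `staging` table standing in for the real import target:

```php
<?php
// Sketch: batched multi-row INSERT instead of one INSERT per row.
// The staging table and row shape are illustrative assumptions.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE staging (id INTEGER PRIMARY KEY, title TEXT)');

$rows = [];
for ($i = 1; $i <= 1000; $i++) {
    $rows[] = ['id' => $i, 'title' => "Listing $i"];
}

// Insert in chunks of 200 rows: one prepared statement per chunk
// keeps round trips low without building one enormous SQL string.
foreach (array_chunk($rows, 200) as $chunk) {
    $placeholders = rtrim(str_repeat('(?, ?),', count($chunk)), ',');
    $stmt = $pdo->prepare("INSERT INTO staging (id, title) VALUES $placeholders");
    $params = [];
    foreach ($chunk as $row) {
        $params[] = $row['id'];
        $params[] = $row['title'];
    }
    $stmt->execute($params);
}

echo $pdo->query('SELECT COUNT(*) FROM staging')->fetchColumn(), "\n"; // 1000
```

The "watch your step" caveat applies: direct inserts bypass hooks, field storage, and cache invalidation, which is exactly why the slides warn against them.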
  9. OK, so you don't batch? Drupal has memory leaks; entity metadata wrapper has memory leaks (https://www.drupal.org/node/1343196); everything has memory leaks. Avoid long-running processes; offload processing with queues and batches.
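The practical workaround for those leaks is to process work in bounded chunks so references are released between chunks. A minimal sketch in plain PHP, where `process_item()` is a hypothetical stand-in for real per-node work:

```php
<?php
// Sketch: chunked processing so leaked references are dropped
// between chunks instead of accumulating in one long loop.
// process_item() is a placeholder for real per-item work.
function process_item(array $item): int {
    return strlen($item['title']);
}

$items = [];
for ($i = 0; $i < 500; $i++) {
    $items[] = ['id' => $i, 'title' => str_repeat('x', $i % 40)];
}

$processed = 0;
foreach (array_chunk($items, 100) as $chunk) {
    foreach ($chunk as $item) {
        process_item($item);
        $processed++;
    }
    // Drop the chunk and sweep cyclic garbage before the next one.
    // In Drupal 7 you would also reset static entity caches here,
    // e.g. entity_get_controller('node')->resetCache().
    unset($chunk);
    gc_collect_cycles();
}
echo $processed, "\n"; // 500
```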
  10. Offloading: put your data into a separate table, add a queue, and let it run in the background; queues are made to be independent. Having a small amount of data? Use batches. Need to update really fast? Think about MySQL cursors.
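The producer/worker split above can be sketched in plain PHP. Here `SplQueue` stands in for Drupal 7's `DrupalQueue` (where the producer would call `DrupalQueue::get('my_queue')->createItem($data)` and a worker would claim items); the queue name and item shape are illustrative:

```php
<?php
// Sketch of the offloading pattern: a cheap producer enqueues raw
// rows, a background worker claims and processes them independently.
// SplQueue is a stand-in for a real persistent queue backend.
$queue = new SplQueue();

// Producer: the import dumps raw data fast, no entity API involved.
foreach (range(1, 50) as $id) {
    $queue->enqueue(['listing_id' => $id]);
}

// Worker: claims items one at a time; safe to run as a separate
// long-lived process because every item is independent.
$done = 0;
while (!$queue->isEmpty()) {
    $item = $queue->dequeue();
    // ... create or update the entity for $item['listing_id'] here ...
    $done++;
}
echo $done, "\n"; // 50
```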
  11. Tuning the garbage collector
  12. Tuning the garbage collector
  13. Playing with the garbage collector: gc_disable(), process your batch, then gc_collect_cycles() and gc_enable(). Not a Holy Grail; it may open Pandora's box.
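The disable/process/collect/enable sequence from the slide, wrapped into a small helper (the helper name and worker callback are illustrative, not from the talk):

```php
<?php
// Sketch: suppress automatic cycle collection during a tight batch,
// then collect once at the end — the gc_disable()/gc_collect_cycles()
// /gc_enable() sequence from the slide.
function run_batch(array $items, callable $worker): int {
    gc_disable();                     // no automatic GC runs mid-batch
    $count = 0;
    foreach ($items as $item) {
        $worker($item);
        $count++;
    }
    gc_collect_cycles();              // one explicit sweep afterwards
    gc_enable();                      // restore normal behaviour
    return $count;
}

$n = run_batch(range(1, 100), function ($i) { return $i * 2; });
echo $n, "\n";                        // 100
echo var_export(gc_enabled(), true), "\n"; // true
```

The "Pandora's box" caveat is real: with the collector off, cyclic garbage accumulates until the explicit sweep, so peak memory goes up even though CPU time goes down.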
  14. Living with Solr
  15. Drupal and Solr: Solr is a search engine, fast and scalable; data is denormalized, takes lots of space, and needs indexation.
  16. Data indexing flow
  17. Why indexing is slow: Search API indexation is single-threaded and uses entity metadata wrappers. You can run multiple indexation processes, one per index, but items are not locked, so multiple workers on the same index will do the same work. Indexation via drush does not use batches and is slow due to memory leaks.
  18. Faster indexing: index in parallel. One daemon to rule them all, with a pool of workers; the daemon maintains the list of items to index and fully utilizes the hardware. Needs a special Solr configuration due to autocommit.
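The slides don't show the daemon's internals, but one piece is easy to sketch: since Search API items are not locked, the dispatcher must hand each pending item to exactly one worker. A minimal round-robin partitioner (function name and worker count are illustrative assumptions; forking, claiming, and Solr commits are out of scope):

```php
<?php
// Sketch: split pending item IDs across N workers so no two workers
// ever index the same item. Round-robin keeps buckets balanced.
function partition_items(array $ids, int $workers): array {
    $buckets = array_fill(0, $workers, []);
    foreach (array_values($ids) as $i => $id) {
        $buckets[$i % $workers][] = $id;
    }
    return $buckets;
}

$pending = range(1, 10);
$buckets = partition_items($pending, 3);

// Every ID lands in exactly one bucket:
echo count($buckets), "\n";                         // 3
echo array_sum(array_map('count', $buckets)), "\n"; // 10
```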
  19. Faster indexing: what it looks like
  20. Search API Views integration: make a view like you always do, but for Search API items. Solr returns the result set, and Views field handlers load the corresponding entities with entity_load (sic!) — see https://www.drupal.org/node/2028337. The database is queried anyway, but the load is much lower. Entity properties are rendered as entity properties; use Search API processors to avoid loading the entity and running getters.
  21. Caching
  22. Cache backends: the database is slow, Memcache is faster, Redis is even faster. The Redis caching module locks up Redis by running a Lua script inside the server:

     local keys = redis.call("KEYS", ARGV[1])
     for i, k in ipairs(keys) do
       redis.call("DEL", k)
     end
     return 1
  23. So useful, useless cache: field cache litters the cache storage; use the entitycache module with memcached. A large number of cache entries kills Redis.
  24. Solr-oriented caching: Solr has an index version that changes on each Solr commit; it may be used as part of the cache key.
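The trick is that folding the index version into the cache ID makes every Solr commit invalidate cached results implicitly — old entries are simply never looked up again. A minimal sketch; the function name is illustrative, and in practice the version would be read from Solr rather than hard-coded:

```php
<?php
// Sketch: build the cache ID from the Solr index version, so a new
// commit (new version) means a fresh cache namespace automatically.
function search_cache_cid(string $query, int $indexVersion): string {
    return 'search:' . $indexVersion . ':' . md5($query);
}

$version = 1042; // pretend this came from Solr
$cid = search_cache_cid('bedrooms:3 AND city:kyiv', $version);
echo $cid, "\n";

// After a commit the version changes; the old entry just goes cold,
// no explicit cache clear needed.
$cidAfterCommit = search_cache_cid('bedrooms:3 AND city:kyiv', $version + 1);
echo var_export($cid === $cidAfterCommit, true), "\n"; // false
```

This pairs well with the previous slide's warning: stale entries still occupy memory until evicted, so an LRU-style backend is a better fit for this scheme than the database cache.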
  25. That's all, folks! Thanks for your time, everyone. Reach me if you have some questions; reach me right now if you have some answers ;)