HPPG - high performance photo gallery

Upload: remigijus-kiminas

Post on 20-May-2015

Category: Technology

DESCRIPTION

New version of the presentation; a lot has changed since last time.

TRANSCRIPT

  • 1. Introducing high performance photo gallery Remigijus Kiminas 2010-11-29 v5

2. Who am I?
    - Author of http://livehelperchat.com/
    - http://redmine.remdex.info - my projects :)
    - Currently working at http://www.coralsolutions.com/
    - Freelancing and building open-source software in my free time

3. Purpose of the presentation 1
    - Present some architecture decisions which were applied while building the image gallery

4. What's new since last presentation
    - Mobile devices are now supported
    - Image gallery can be used as a shopping CMS
    - Credit-based buying
    - Checkout using the PayPal service
    - Uncached pages got a speed improvement after a bug in the paginator was found
    - Official nginx support

5. What's new since last presentation 2
    - Extensions
    - Kernel modules override
    - Kernel classes override
    - CSS compile
    - Most popular images in 24 hours
    - Photo approval functionality
    - Image filtering by resolution

6. What's new since last presentation 3
    - Thumbnails recreation script
    - 100% duplicates management accuracy
    - More configurable system aspects, such as:
        - Max upload photo size
        - Max archive size
        - Max file queue size
    - Animated GIF support

7. What's new since last presentation 4
    - Animated GIF support
    - Completely fixed AJAX navigation usability: no more confusion about which images are available to the left or to the right
    - Front-end design remake, thanks to http://pauliusc.lt
    - HTML output compression
    - HTML 5 front-end changes, which save bandwidth

8. What's new since last presentation 5
    - Some performance improvements regarding user permission settings
    - More things moved to the Memcached service

9. What's new since last presentation 5 V4
    - Sort by relevance was introduced
    - AddQuery is now used in search
    - Refactored search page. One query less now.
    - Paginator updates
    - Sphinx wildcard support
    - Script for deleting images that have no original
    - SEO enhancement related to resolution and the user's current page

10. What's new since last presentation 5 V5
    - Refactored captcha: it is now AJAX/JavaScript based, performs well, and saves one request on the image preview window
    - Image preview full window cache!!!
    - Cached windows are as fast as cached pagination, around 5 ms
    - Image counter is taken from the log file, which avoids an insert on each image preview window

11. What's new since last presentation 5 V5
    - MySQL query hint for album pagination, because the MySQL planner chose the wrong indexes
    - Smart selects in the image preview window
    - Full multilanguage support including translatable module URLs!!! No gallery/CMS I know of has this feature. E.g. gallery/search (English) or gallerie/recherche (French)
    - Full InnoDB support. Performs as well as MyISAM.
    - Top process is PHP, not MySQL :)

12. Future work
    - Pagination sharding with an index filter shard table. It should speed up pagination over large sets by around 100% or more and keep the speed constant with millions of photos. http://remdex.info/Optimising-mysql-limit-performan
    - Backend redesign

13. Issues I had with previous image galleries
    - A lot of users = a lot of problems
    - No caching support
    - Unoptimized SQL queries
    - Resource hungry
    - No framework used (well, perhaps this is not a problem, but most of the time they just duplicate framework functionality, reinventing the wheel...)
    - No ETag based caching, which is a bandwidth saver...

14. Requirements
    - Optimized SQL queries
    - Fulltext search engine
    - ETag based caching
    - SQL query caching
    - Full-page caching
    - Low resource requirements

15. Adopted software
    - APC: opcode cache for PHP
    - Sphinx: free open-source SQL full-text search engine (http://sphinxsearch.com/)
    - Memcached: free & open source, high-performance, distributed memory object caching system (http://memcached.org/)
    - eZ Components: an enterprise-ready, general-purpose PHP library of components used independently or together for PHP application development (http://ez.no/ezcomponents)
    - jQuery: a fast and concise JavaScript library that simplifies HTML document traversing, event handling, animating, and Ajax interactions for rapid web development (http://jquery.com/)
    - Lighttpd: lightweight open-source web server (http://www.lighttpd.net/)
    - MySQL: database engine (http://www.mysql.com)

16. Adopted software
    - Nginx: an HTTP and mail proxy server licensed under a 2-clause BSD-like license (http://nginx.org/)
    - Fully working nginx config provided, for e-shop requirements and for the standard setup

17. Building process core
    - Gallery core is based on eZ Components.
    - Used components: Authentication, Configuration, Database, Feed, ImageAnalysis, ImageConversion, PersistentObject, Translation, Cache, Url, UserInput

18. Fulltext search implementation
    - Why Sphinx? Very very fast :)
    - Used features of 0.9.9:
        - SetSelect: this feature was introduced in version 0.9.9 and allows fancy filtering. Example on the next slide.

19. Image full mode: problem with previous and next image
    - The search condition, in words: I need to find the 2 previous images based on the current image position, including the search keyword and sorting mode.
    - The URL consists of:
        - Current image ID (16679)
        - Keyword (haposai)
        - Sort mode (popular)
    - How do I find out what to display in the first two thumbnails (the middle image is our current image)?

20. Solution
    - Use a SetSelect query:

        $cl->SetSelect("*, (hits > " . $Image->hits . " OR (hits = " . $Image->hits . " AND pid > " . $Image->pid . ")) AS myfilter");
        $cl->SetFilter("myfilter", array(1));

    - Something I did not know how to do until now: if sorting is based on relevance, how to find the previous two images. I know now, but:
        - SetSelect does not work with the @weight attribute in it, so I had to use two queries.
        - SetFilter() works with @weight.
        - AddQuery helps here with performance.
    - Much more relevant images now.

21. Some search statistics
    - Around 190 K queries each day. There would be more if the search result page were not cached :)

22. MySQL performance tweaking
    - Just optimise queries (EXPLAIN is your friend)
    - Not a single slow query
    - Some tips: with large data sets use

        SELECT * FROM `lh_gallery_images`
        INNER JOIN (
            SELECT pid FROM lh_gallery_images
            ORDER BY comtime DESC, pid DESC
            LIMIT 20 OFFSET 20
        ) AS items ON lh_gallery_images.pid = items.pid

    - This query is at least 5x faster than a normal select. Tested with 150 K records.
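
The pagination tip above can be wrapped in a small helper. What follows is only a minimal sketch, not the gallery's actual code: the function name buildDeferredJoinPageQuery and its parameters are hypothetical, and only the table and column names come from the slide. Two deliberate differences from the slide are assumptions of this sketch: it selects lh_gallery_images.* so that pid is not returned twice, and it repeats the ORDER BY on the outer query because a join by itself does not guarantee row order. The speed-up presumably relies on an index covering comtime and pid, so that the inner query never touches full rows.

```php
<?php
// Hypothetical helper (not part of HPPG): builds the deferred-join
// pagination query from slide 22 for a given 1-based page number.
function buildDeferredJoinPageQuery(int $page, int $perPage = 20): string
{
    if ($page < 1 || $perPage < 1) {
        throw new InvalidArgumentException('page and perPage must be >= 1');
    }

    // Number of rows to skip before the requested page starts.
    $offset = ($page - 1) * $perPage;

    // The inner query sorts and limits on pid/comtime only; full rows are
    // fetched afterwards, and only for the pids of the requested page.
    return 'SELECT lh_gallery_images.* FROM `lh_gallery_images` '
         . 'INNER JOIN ('
         . ' SELECT pid FROM lh_gallery_images'
         . ' ORDER BY comtime DESC, pid DESC'
         . sprintf(' LIMIT %d OFFSET %d', $perPage, $offset)
         . ' ) AS items ON lh_gallery_images.pid = items.pid '
         . 'ORDER BY lh_gallery_images.comtime DESC, lh_gallery_images.pid DESC';
}

// Page 2 with 20 images per page gives the LIMIT 20 OFFSET 20 used on the slide.
echo buildDeferredJoinPageQuery(2), PHP_EOL;
```

Both values are formatted with %d, so only integers can reach the SQL string; any user-supplied filter values would still need to go through prepared statements.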
    - See http://www.mysqlperformanceblog.com

23. Supported HTTP servers
    - Lighttpd
    - Apache
    - Nginx
    - With nginx I managed to reach around 1200 Q/S on a cached page. That is 30% more than with Lighttpd.

24. Caching objects
    - Version caching
        - http://www.bestechvideos.com/2009/03/21/railslab-scaling-rails-episode-8-memcached
        - http://www.infoq.com/presentations/lutke-rockstar-memcaching
    - Version cache is used in:
        - Album pages
        - Last uploaded
        - Last hits
        - Popular images and so on
        - The most popular images in 24 hours
    - When is the cache cleared? It is not. Only the version number is increased, and the cache expires by itself, because the new cache key does not exist yet.

25. Some code with version cache
    - Cache key calculation in Album:

        $cache = CSCacheAPC::getMem();
        $cacheKey = md5('version_'.$cache->getCacheVersion('album_'.(int)$Params['user_parameters']['album_id']).$mode.'album_view_url'.(int)$Params['user_parameters']['album_id'].'_page_'.$Params['user_parameters_unordered']['page']);

    - Includes:
        - Album version
        - $mode: sorting mode (e.g. Popular)
        - Page
    - This combination gives a unique cache version for each page.
    - The same logic applies to all listing pages.

26. Some benchmarks

        [root@ks310613 ~]# ab -n 500 -c 10 http://animeonly.org/Fantasy/Mix-16a.html
        This is ApacheBench, Version 2.0.40-dev apache-2.0
        Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
        Copyright 2006 The Apache Software Foundation, http://www.apache.org/

        Benchmarking animeonly.org (be patient)
        Completed 100 requests
        Completed 200 requests
        Completed 300 requests
        Completed 400 requests
        Finished 500 requests

        Server Software:        lighttpd
        Server Hostname:        animeonly.org
        Server Port:            80

        Document Path:          /Fantasy/Mix-16a.html
        Document Length:        26883 bytes

        Concurrency Level:      10
        Time taken for tests:   0.545137 seconds
        Complete requests:      500
        Failed requests:        0
        Write errors:           0
        Total transferred:      13593092 bytes
        HTML transferred:       13441500 bytes
        Requests per second:    917.20 [#/sec] (mean)
        Time per request:       10.903 [ms] (mean)
        Time per request:       1.090 [ms] (mean, across all concurrent requests)
        Transfer rate:          24349.84 [Kbytes/sec] received

        Connection Times (ms)
                      min  mean[+/-sd] median   max
        Connect:        0    0   0.0      0       0
        Processing:     5   10   2.9      9      23
        Waiting:        4    9   3.1      9      23
        Total:          5   10   2.9      9      23

        Percentage of the requests served within a certain time (ms)
          50%      9
          66%     12
          75%     13
          80%     13
          90%     13
          95%     13
          98%     20

27. ETag based caching
    - What is it? An ETag (entity tag) is part of HTTP, the protocol for the World Wide Web. It is a response header that may be returned by an HTTP/1.1 compliant web server and is used to determine change in content at a given URL (http://en.wikipedia.org/wiki/HTTP_ETag)

28. How to use it?

        $ExpireTime = 3600;
        $currentKeyEtag = md5($cacheKey.'user_id_'.erLhcoreClassUser::instance()->getUserID());
        header('Cache-Control: max-age=' . $ExpireTime); // must-revalidate
        header('Expires: '.gmdate('D, d M Y H:i:s', time()+$ExpireTime).' GMT');
        header('ETag: ' . $currentKeyEtag);
        $iftag = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] == $currentKeyEtag : null;
        if ($iftag === true) {
            header("HTTP/1.0 304 Not Modified");
            header('Content-Length: 0');
            exit;
        }

    - $cacheKey is the cache key from the previous example
    - The user ID is needed if the user is logged in
    - Can be used for custom pages that do not change
    - When an image is uploaded or deleted, we just increase the cache version, and the ETag expires automatically as well

29. Some MRTG screen shots 1
    - Hits per hour
    - MySQL queries

30. Some MRTG screen shots 2
    - Memcached status
    - Traffic stats

31. Conclusions
    - A single server with Sphinx, Memcached, MySQL and nginx handles around 180 K pageviews per day.
    - No performance issues at this time.
    - Gallery home page: http://code.google.com/p/hppg/

32. Thank you for your attention :)
    - Questions etc: [email protected]
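
As a closing illustration of slides 24 to 28, the sketch below shows how bumping a version number can invalidate both the cached page and the ETag derived from its key, without deleting anything. It is a minimal sketch under stated assumptions, not HPPG's implementation: the VersionedCache class, its methods, and the helpers etagFor and isNotModified are hypothetical names, and a plain PHP array stands in for Memcached/APC. Only the key layout and the md5 of cache key plus user ID follow the slides.

```php
<?php
// Hypothetical in-memory stand-in for the Memcached/APC-backed cache;
// class and method names are illustrative, not HPPG's API.
final class VersionedCache
{
    /** @var array<string,int> current version number per namespace */
    private $versions = [];

    /** @var array<string,string> cached values by key */
    private $store = [];

    public function getVersion(string $namespace): int
    {
        return $this->versions[$namespace] ?? 1;
    }

    // "Clearing" a namespace only bumps its version; old entries stay
    // in the store but are never asked for again.
    public function bumpVersion(string $namespace): void
    {
        $this->versions[$namespace] = $this->getVersion($namespace) + 1;
    }

    // Key layout modelled on slide 25: album version, sort mode, album ID, page.
    public function albumPageKey(int $albumId, string $mode, int $page): string
    {
        $version = $this->getVersion('album_' . $albumId);
        return md5('version_' . $version . $mode . 'album_view_url' . $albumId . '_page_' . $page);
    }

    public function get(string $key): ?string
    {
        return $this->store[$key] ?? null;
    }

    public function set(string $key, string $value): void
    {
        $this->store[$key] = $value;
    }
}

// ETag derived from the cache key plus the user ID, as on slide 28.
function etagFor(string $cacheKey, int $userId): string
{
    return md5($cacheKey . 'user_id_' . $userId);
}

// True when the client's If-None-Match value matches, i.e. a 304 could be sent.
function isNotModified(?string $ifNoneMatch, string $etag): bool
{
    return $ifNoneMatch !== null && $ifNoneMatch === $etag;
}

$cache = new VersionedCache();
$keyBefore = $cache->albumPageKey(16, 'popular', 1);
$cache->set($keyBefore, '<html>album page</html>');
$etagBefore = etagFor($keyBefore, 0);

$cache->bumpVersion('album_16'); // e.g. an image was uploaded to album 16

$keyAfter = $cache->albumPageKey(16, 'popular', 1);
var_dump($cache->get($keyAfter));                            // NULL: page must be rebuilt
var_dump(isNotModified($etagBefore, etagFor($keyAfter, 0))); // bool(false): no 304
```

With a real Memcached backend the stale entries are never removed explicitly; they would simply age out through expiry or eviction, which is what makes the version bump cheap.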