hppg r819 gallery presentation, search by color introduced
Post on 16-Jul-2015
Introducing high performance photo gallery
Remigijus Kiminas, 2010-12-25, v6
Who am I?
Author of
http://livehelperchat.com/
http://redmine.remdex.info my projects :)
Currently working at
http://www.coralsolutions.com/
Freelancing and building open-source software in free time
Purpose of the presentation 1
Present some architecture decisions which were applied when building the image gallery
What's new since last presentation
Mobile device support
Image gallery can be used as a shopping CMS
Credit-based buying
Checkout using the PayPal service
Uncached pages are faster after a paginator bug was found and fixed.
Official nginx support
What's new since last presentation 2
Extensions
Kernel modules override
Kernel classes override
CSS compile
Most popular images in 24 hours
Photo approval functionality
Image filtering by resolution
What's new since last presentation 3
Thumbnails recreation script
100% duplicates management accuracy
More configurable system aspects, such as:
Max upload photo size
Max archive size
Max file queue size
Animated GIF support
What's new since last presentation 4
AJAX navigation usability completely fixed: no more confusion about which images are available to the left or to the right.
Front end design remake, thanks to http://pauliusc.lt
HTML output compression
HTML5 front-end changes that save bandwidth
What's new since last presentation 5
Some performance improvements in user permission settings
More things moved to Memcached service
What's new since last presentation 5 V4
Sort by relevance was introduced
AddQuery usage implementation in search
Refactored search page. One query less now.
Paginator updates
Sphinx wildcard support
Images without original deletion script
SEO enhancements related to resolution and the user's current page
What's new since last presentation 5 V5
Refactored captcha: it is now AJAX/JavaScript based, performs well, and saves one request on the image preview window
Full-window cache for image preview! Cached windows are as fast as cached pagination, around 5 ms
Image view counter is computed from the log file, which avoids an INSERT on each image preview window
What's new since last presentation 5 V5
Last rated functionality
Cache status window
Recently top rated, in 24 hours
APC support as cache engine.
HTML5, SWF, FLV files support
Search suggest feature
What's new since last presentation 5 V5
MySQL index hint for album pagination, because the MySQL planner chose the wrong indexes
Smart selects in image preview window
Full multilanguage support, including translatable module URLs! None of the galleries/CMSs I know has this feature. E.g. gallery/search (English) or gallerie/recherche (French)
Full InnoDB support. Performs as well as MyISAM. The top process is PHP, not MySQL :)
What's new since last presentation V6
Search by color, multicolor, and keyword at the same time.
For best performance this feature uses MySQL partitions. The biggest table has around 8M records for 270 000 images.
Multicolor search uses self inner joins. To improve performance, a MEMORY table can be activated.
http://code.google.com/p/hppg/wiki/SearchByColor
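A minimal sketch of what a multicolor query with self inner joins could look like. The table and column names are taken from the statistics table shown later in this presentation. The helper name buildMulticolorQuery(), the ranking by summed `count` and the LIMIT are assumptions for illustration, not the gallery's actual query.

```php
<?php
// Hypothetical helper: one self join of the statistics table per extra color.
// Ranking by the summed pixel count is an assumption, not the gallery's real ordering.
function buildMulticolorQuery(array $paletteIds, int $limit = 20): string
{
    $paletteIds = array_values(array_unique(array_map('intval', $paletteIds)));
    if ($paletteIds === []) {
        throw new InvalidArgumentException('At least one palette id is required.');
    }

    $joins = [];
    $where = [];
    $score = [];
    foreach ($paletteIds as $i => $id) {
        $alias = 'c' . $i;
        if ($i > 0) {
            // Every additional color joins the table to itself on the image id.
            $joins[] = "INNER JOIN lh_gallery_pallete_images AS {$alias} ON {$alias}.pid = c0.pid";
        }
        $where[] = "{$alias}.pallete_id = {$id}";
        $score[] = "{$alias}.`count`";
    }

    $parts = array_merge(
        ['SELECT c0.pid, ' . implode(' + ', $score) . ' AS matched_pixels'],
        ['FROM lh_gallery_pallete_images AS c0'],
        $joins,
        ['WHERE ' . implode(' AND ', $where)],
        ['ORDER BY matched_pixels DESC LIMIT ' . max(1, $limit)]
    );

    return implode(' ', $parts);
}

// Two colors produce exactly one self join.
echo buildMulticolorQuery([12, 57]), PHP_EOL;
```

Each extra color adds one more join, which is consistent with the later remark that the MySQL layer slows down when more than one color filter is used.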
What's new since last presentation V6
Sphinx can also be used as the search by color handler.
$cl->SetMatchMode(SPH_MATCH_EXTENDED2);
$cl->SetRankingMode(SPH_RANK_WORDCOUNT);
Much faster than the MySQL layer. It also takes keyword density into account. Results are almost the same as with the MySQL layer.
What's new since last presentation V6
A custom color_indexer was written using the OpenCV library. Yes, I know a little bit of C :)
Gives a 24x performance boost compared to the standard method using PHP and MySQL.
http://code.google.com/p/hppg/wiki/ColorIndexer
How does search by color work?
Some references first
http://opencv.willowgarage.com/wiki/
This library was used for writing color_indexer application
http://mattmueller.me/blog/creating-piximilar-image-search-by-color
This is where I got my inspiration and the basic concept, although the database structure is completely different.
http://www.compuphase.com/cmetric.htm Formula for finding the palette color most similar to a given color
http://en.wikipedia.org/wiki/Tag_cloud Formula for representing color density of an image in the Sphinx table
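As an illustration of the color metric reference, here is a sketch of the low-cost weighted distance described in the compuphase article, used to pick the nearest palette row for a pixel. Whether color_indexer uses exactly this variant is an assumption, and nearestPaletteId() is a hypothetical helper. The array keys mirror the palette table columns.

```php
<?php
// Low-cost weighted RGB distance from the compuphase "colour metric" article.
// Assumption: the gallery uses this variant; the slides only link to the article.
function colorDistance(array $a, array $b): float
{
    $rmean = intdiv($a['red'] + $b['red'], 2);
    $r = $a['red'] - $b['red'];
    $g = $a['green'] - $b['green'];
    $bl = $a['blue'] - $b['blue'];

    return sqrt(
        (((512 + $rmean) * $r * $r) >> 8)
        + 4 * $g * $g
        + (((767 - $rmean) * $bl * $bl) >> 8)
    );
}

// Hypothetical helper: id of the palette row closest to the pixel, or -1 for an empty palette.
function nearestPaletteId(array $pixel, array $palette): int
{
    $bestId = -1;
    $best = INF;
    foreach ($palette as $row) {
        $distance = colorDistance($pixel, $row);
        if ($distance < $best) {
            $best = $distance;
            $bestId = (int) $row['id'];
        }
    }
    return $bestId;
}

$palette = [
    ['id' => 1, 'red' => 255, 'green' => 0, 'blue' => 0],
    ['id' => 2, 'red' => 0, 'green' => 255, 'blue' => 0],
    ['id' => 3, 'red' => 0, 'green' => 0, 'blue' => 255],
];
// A slightly off-red pixel maps to the red palette entry.
echo nearestPaletteId(['red' => 250, 'green' => 10, 'blue' => 10], $palette), PHP_EOL;
```

Counting, per image, how many thumbnail pixels map to each palette id would give the kind of per-color statistic stored in the `count` column.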
Database structure 1
Two tables
Palette table
CREATE TABLE IF NOT EXISTS `lh_gallery_pallete` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `red` int(11) NOT NULL DEFAULT '0',
  `green` int(11) NOT NULL DEFAULT '0',
  `blue` int(11) NOT NULL DEFAULT '0',
  `position` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `position` (`position`)
) ENGINE=MyISAM;
Image statistics table
CREATE TABLE IF NOT EXISTS `lh_gallery_pallete_images` (
  `pid` int(11) NOT NULL,
  `pallete_id` smallint(3) NOT NULL,
  `count` smallint(5) NOT NULL,
  PRIMARY KEY (`pallete_id`,`pid`),
  KEY `pid` (`pallete_id`,`count`,`pid`),
  KEY `pallete_id` (`pallete_id`),
  KEY `pid_2` (`pid`)
) ENGINE=MyISAM;
Database structure 2
Table for quick fetch of an image's top colors:
CREATE TABLE IF NOT EXISTS `lh_gallery_pallete_images_stats` (
  `pid` int(11) NOT NULL,
  `colors` varchar(100) NOT NULL,
  PRIMARY KEY (`pid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Filling sphinx index
The Sphinx index table has a dedicated field “colors”, which is filled in the following way. 15400 is the maximum number of color matches for one thumbnail (the thumbnail size is given as 120x130).
/**
 * This part was changed based on the tag cloud formula.
 * It fits here better than just log.
 * http://en.wikipedia.org/wiki/Tag_cloud
 */
$max = 15400 / 2; // A little better distribution of color
$min = 25;
$rmax = 50;
$rmin = 1;
$colorIndex = array();
foreach ($colorsMaximumImage as $color) {
    // Repeat the palette keyword in proportion to how many pixels matched this color.
    $repeats = (int) round((($rmin * ($color['count'] - $min)) / ($max - $min)) * 100);
    // Colors with fewer than $min matches get no keyword; str_repeat() rejects negative counts.
    $colorIndexString = trim(str_repeat(' pld' . $color['pallete_id'], max(0, $repeats)));
    if ($colorIndexString != '') {
        $colorIndex[] = $colorIndexString;
    }
}
Searching by color
Two options, as I wrote earlier:
Use MySQL as search engine
Advantages – activated by default, faster than Sphinx for single color search
Disadvantages – slow when more than one color filter is used
Sphinx as search engine
Advantages – performance stays the same with one or multiple colors
Disadvantages – Sphinx needs to be installed, a little slower than MySQL with one color filter.
Recommendation?
Definitely use Sphinx for color search.
Future works V5 (implemented)
Pagination sharding with an index filter shard table.
It should speed up pagination of large sets by around 100% or more and keep speed constant with millions of photos.
http://remdex.info/Optimising-mysql-limit-performance-99a.html
Backend redesign
Issues I had with previous image galleries
A lot of users = a lot of problems
No caching support
Unoptimized SQL queries
Resource hungry
No framework used (well, perhaps this is not a problem, but most of the time they just duplicate framework functionality, reinventing the wheel...)
No ETag-based caching, which saves bandwidth...
Requirements
Optimized SQL queries
Fulltext search engine
ETag-based caching
SQL query caching
Full-page caching
Low resource requirements
Adopted software
APC – opcode cache for PHP
Sphinx – free open-source SQL full-text search engine (http://sphinxsearch.com/)
Memcached – free & open source, high-performance, distributed memory object caching system (http://memcached.org/)
eZ Components – an enterprise-ready, general-purpose PHP library of components used independently or together for PHP application development.(http://ez.no/ezcomponents)
jQuery – a fast and concise JavaScript library that simplifies HTML document traversing, event handling, animating, and Ajax interactions for rapid web development. (http://jquery.com/)
Lighttpd – lightweight open-source web server.(http://www.lighttpd.net/)
MySQL – database engine (http://www.mysql.com)
Adopted software
nginx – an HTTP and mail proxy server licensed under a 2-clause BSD-like license. (http://nginx.org/)
Fully working nginx config provided. For eshop requirements and standard
Building process – core
Gallery core is based on eZ Components. Used components:
Authentication
Configuration
Database
Feed
ImageAnalysis
ImageConversion
PersistentObject
Translation
Cache
Url
UserInput
Fulltext search implementation
Why sphinx?
Very very fast :)
Features used from 0.9.9
SetSelect – this feature was introduced in version 0.9.9 and allows fancy filtering.
Example in next slide
Image full mode problem with previous and next image
The search condition, stated literally: I need to find the 2 previous images based on the current image position, taking the search keyword and sorting mode into account.
URL consists of
Current image ID (16679)
Keyword (haposai)
Sort mode (popular)
How do I find out what to display in the first two thumbnails (the middle image is our current image)?
Solution
Use SetSelect query
$cl->SetSelect('*, (hits > ' . (int) $Image->hits . ' OR (hits = ' . (int) $Image->hits . ' AND pid > ' . (int) $Image->pid . ')) AS myfilter');
$cl->SetFilter('myfilter', array(1));
Something I did not know how to do until now: if sorting is based on relevance, how do I find the previous two images?
I know now. But:
SetSelect does not work with @weight attributes in it.
I had to use two queries. SetFilter() works with @weight
AddQuery helps with performance here. Images are much more relevant now.
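The SetSelect expression for the previous images can be mirrored to get the images on the other side of the current one. The sketch below only builds the expression string; buildNeighbourSelect() is a hypothetical helper, and it assumes the "popular" sort order is hits DESC, pid DESC.

```php
<?php
// Hypothetical helper: builds the SetSelect() expression for the images on one
// side of the current image. Assumes the list is sorted by hits DESC, pid DESC.
function buildNeighbourSelect(int $hits, int $pid, string $direction): string
{
    // "previous" images rank above the current one, "next" images below it.
    $op = $direction === 'previous' ? '>' : '<';

    return sprintf(
        '*, (hits %1$s %2$d OR (hits = %2$d AND pid %1$s %3$d)) AS myfilter',
        $op,
        $hits,
        $pid
    );
}

echo buildNeighbourSelect(120, 16679, 'previous'), PHP_EOL;
echo buildNeighbourSelect(120, 16679, 'next'), PHP_EOL;

// Intended use with the Sphinx client from the slide:
// $cl->SetSelect(buildNeighbourSelect((int) $Image->hits, (int) $Image->pid, 'previous'));
// $cl->SetFilter('myfilter', array(1));
```

The tie-break on pid matters: without it, images with the same number of hits as the current one would be either all included or all excluded.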
Some search statistic
Around 190 K queries each day. There would be more if search result pages were not cached :)
Mysql performance tweaking
Just optimise queries (EXPLAIN is your friend)
Not a single slow query
Some tips:
With large data sets use
SELECT * FROM `lh_gallery_images`
INNER JOIN (
  SELECT pid FROM lh_gallery_images ORDER BY comtime DESC, pid DESC LIMIT 20 OFFSET 20
) AS items
ON lh_gallery_images.pid = items.pid
This query is at least 5 times faster than a normal select (tested with 150 K records).
See - http://www.mysqlperformanceblog.com
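A small sketch of how the query above could be generated for an arbitrary page. buildPageQuery() is a hypothetical helper. The outer ORDER BY is an addition of this sketch, because a join on its own does not guarantee that rows come back in the order of the inner select.

```php
<?php
// Hypothetical helper: builds the "deferred join" pagination query for one page.
// The inner select walks only the index to find the pids of the page; the outer
// select then fetches full rows for just those pids.
function buildPageQuery(int $page, int $perPage = 20): string
{
    $page = max(1, $page);
    $perPage = max(1, $perPage);
    $offset = ($page - 1) * $perPage;

    return 'SELECT lh_gallery_images.* FROM lh_gallery_images'
        . ' INNER JOIN (SELECT pid FROM lh_gallery_images'
        . ' ORDER BY comtime DESC, pid DESC'
        . " LIMIT {$perPage} OFFSET {$offset}) AS items"
        . ' ON lh_gallery_images.pid = items.pid'
        // Added in this sketch: keep the page in the same order as the inner select.
        . ' ORDER BY lh_gallery_images.comtime DESC, lh_gallery_images.pid DESC';
}

// Page 2 with 20 images per page reproduces the LIMIT 20 OFFSET 20 from the slide.
echo buildPageQuery(2), PHP_EOL;
```

The gain comes from the inner select reading only narrow index entries while skipping the offset, so this presumably needs an index covering (comtime, pid) to pay off.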
Supported HTTP servers
Lighttpd
Apache
nginx
With nginx I managed to get around 1200 Q/S on a cached page. That is 30% more than with Lighttpd.
Caching objects
Version caching
http://www.bestechvideos.com/2009/03/21/railslab-scaling-rails-episode-8-memcached
http://www.infoq.com/presentations/lutke-rockstar-memcaching
Version cache is used in
Album pages
Last uploaded
Last hits
Popular images and so on.
The most popular images in 24 hours
When is the cache cleared?
It is not. Only the version number is increased. The new cache key does not exist yet, so the page is regenerated, and the old entries expire on their own.
Some code with version cache
Cache key calculation in Album
$cache = CSCacheAPC::getMem();
$cacheKey = md5(
    'version_' . $cache->getCacheVersion('album_' . (int) $Params['user_parameters']['album_id'])
    . $mode . 'album_view_url' . (int) $Params['user_parameters']['album_id']
    . '_page_' . $Params['user_parameters_unordered']['page']
);
Includes:
Album version
$mode – sorting mode (Ex. Popular)
Page
This combination gives a unique cache key for each page.
Same logic applies to all listing pages
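To make the version cache idea concrete, here is a self-contained sketch of the whole cycle: build a key from the album version, store a page, bump the version, and observe that the old page is no longer found. VersionedCache and its methods are hypothetical stand-ins for the gallery's cache class, backed by a plain array instead of Memcached or APC.

```php
<?php
// Hypothetical in-memory stand-in for the Memcached/APC cache used by the gallery.
final class VersionedCache
{
    /** @var array<string, mixed> */
    private $store = [];

    public function getVersion(string $name): int
    {
        return $this->store['version_' . $name] ?? 1;
    }

    // Called when an image is uploaded or deleted: nothing is removed,
    // only the version number that takes part in every page key grows.
    public function bumpVersion(string $name): void
    {
        $this->store['version_' . $name] = $this->getVersion($name) + 1;
    }

    // Same ingredients as on the slide: album version, sort mode, album id, page.
    public function pageKey(int $albumId, string $mode, int $page): string
    {
        return md5(
            'version_' . $this->getVersion('album_' . $albumId)
            . $mode . 'album_view_url' . $albumId . '_page_' . $page
        );
    }

    public function get(string $key)
    {
        return $this->store[$key] ?? null;
    }

    public function set(string $key, string $value): void
    {
        $this->store[$key] = $value;
    }
}

$cache = new VersionedCache();
$oldKey = $cache->pageKey(16, 'popular', 1);
$cache->set($oldKey, '<html>cached album page</html>');

$cache->bumpVersion('album_16');            // an image was uploaded or deleted
$newKey = $cache->pageKey(16, 'popular', 1);

var_dump($cache->get($newKey));             // NULL: the page has to be rebuilt
```

With a real Memcached or APC backend the stale entries under the old keys are never requested again and simply fall out of the cache through their TTL or eviction.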
Some benchmarks
[root@ks310613 ~]# ab -n 500 -c 10 http://animeonly.org/Fantasy/Mix-16a.html
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking animeonly.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Finished 500 requests

Server Software:        lighttpd
Server Hostname:        animeonly.org
Server Port:            80

Document Path:          /Fantasy/Mix-16a.html
Document Length:        26883 bytes

Concurrency Level:      10
Time taken for tests:   0.545137 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      13593092 bytes
HTML transferred:       13441500 bytes
Requests per second:    917.20 [#/sec] (mean)
Time per request:       10.903 [ms] (mean)
Time per request:       1.090 [ms] (mean, across all concurrent requests)
Transfer rate:          24349.84 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5   10   2.9      9      23
Waiting:        4    9   3.1      9      23
Total:          5   10   2.9      9      23

Percentage of the requests served within a certain time (ms)
  50%      9
  66%     12
  75%     13
  80%     13
  90%     13
  95%     13
  98%     20
  99%     22
 100%     23 (longest request)
ETag-based caching
What is it?
An ETag (entity tag) is part of HTTP, the protocol for the World Wide Web. It is a response header that may be returned by an HTTP/1.1 compliant web server and is used to determine change in content at a given URL (http://en.wikipedia.org/wiki/HTTP_ETag)
How to use it?
$ExpireTime = 3600;
$currentKeyEtag = md5($cacheKey . 'user_id_' . erLhcoreClassUser::instance()->getUserID());
header('Cache-Control: max-age=' . $ExpireTime); // must-revalidate
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ExpireTime) . ' GMT');
header('ETag: ' . $currentKeyEtag);
$iftag = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] == $currentKeyEtag : null;
if ($iftag === true) {
    header('HTTP/1.0 304 Not Modified');
    header('Content-Length: 0');
    exit;
}
$cacheKey is the cache key from the previous example.
The user ID is needed if the user is logged in.
Can be used for custom pages that do not change.
When an image is uploaded or deleted, we just increase the cache version and the ETag expires automatically as well.
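The comparison step can be made a little more tolerant. In HTTP, ETag values are quoted strings, weak validators carry a W/ prefix, and If-None-Match may hold several comma-separated values. The sketch below normalises those cases before comparing. etagMatches() is a hypothetical helper, not part of the gallery code.

```php
<?php
// Hypothetical helper: returns true when the If-None-Match request header
// matches the current ETag, so that a 304 response can be sent.
function etagMatches(?string $ifNoneMatch, string $etag): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }

    $wanted = trim($etag, '"');
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        // Weak validators look like W/"value"; drop the prefix before comparing.
        if (strncmp($candidate, 'W/', 2) === 0) {
            $candidate = substr($candidate, 2);
        }
        if ($candidate === '*' || trim($candidate, '"') === $wanted) {
            return true;
        }
    }

    return false;
}

// Intended use in a page controller, with $cacheKey built as in the example above:
// $etag = '"' . md5($cacheKey . 'user_id_' . $userId) . '"';
// if (etagMatches($_SERVER['HTTP_IF_NONE_MATCH'] ?? null, $etag)) {
//     header('HTTP/1.1 304 Not Modified');
//     exit;
// }
// header('ETag: ' . $etag);
```

Because the ETag is derived from the versioned cache key, bumping the album version changes the ETag too, so browsers stop getting 304 responses for stale pages without any extra bookkeeping.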
Some MRTG screen shots 1
Hits per hour
Mysql queries
Some MRTG screen shots 2
Memcached status
Traffic stats
Conclusions
A single server with Sphinx, Memcached, MySQL and nginx handles around 180 K pageviews per day.
No performance issues at this time.
Gallery home page
http://code.google.com/p/hppg/
Thank you for your attention :)
Questions etc:
remdex@gmail.com