search@hyves

Search@Hyves Anuj Ahuja | [email protected] | anujahuja.hyves.nl | #anujmca female single amsterdam 20

Upload: anuj-ahuja

Post on 15-Jan-2015


TRANSCRIPT

Page 1: Search@Hyves

Search@Hyves

Anuj Ahuja | [email protected] | anujahuja.hyves.nl | #anujmca

female single amsterdam 20

Page 2: Search@Hyves

Old Search

• MySQL full-text indexes
• Hash match - combinations of the search terms are stored

Limitations
• Indexing is very slow - takes ~5h to index
• Fragile state management - coordinated by daemons
• Scalability - not transparent to the application
• No support for indexing data from distributed databases

Page 3: Search@Hyves

Scale@Hyves?

• MySQL master-slave architecture
  o 40 masters, 284 slaves

• Store
  o ~64 clusters, ~256 MySQL hosts

• How big is the dataset [Jan 2011]?
  o ~400G of indexable data
  o Includes reactions, photo titles, WWW, etc.

Page 4: Search@Hyves

Search Architecture - Ideas

Function
• Enable search for everything on Hyves
• Apply social relevance/weight to content
• Make new data available for search within an hour

Tech
• Combine data from multiple data sources
• Attribute-based filtering - for example, geo location
• Abstract state management from data import jobs
• Scaling should be transparent to the application layer
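
One way to read "abstract state management from data import jobs" is that each import job asks a small state store for its last checkpoint instead of tracking progress itself. The following is a minimal sketch under that assumption. The `import_state` table, the function names, and the integer `id` watermark are all hypothetical, and SQLite stands in for the MySQL state store only so the example is self-contained.

```python
import sqlite3

def open_state_store(path=":memory:"):
    """Create the (hypothetical) checkpoint table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_state ("
        " job_name TEXT PRIMARY KEY,"
        " last_id INTEGER NOT NULL)"
    )
    return conn

def get_checkpoint(conn, job_name):
    """Return the last imported row id for a job, or 0 if it never ran."""
    row = conn.execute(
        "SELECT last_id FROM import_state WHERE job_name = ?", (job_name,)
    ).fetchone()
    return row[0] if row else 0

def save_checkpoint(conn, job_name, last_id):
    """Record progress in one transaction so a crashed job can resume."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO import_state (job_name, last_id) VALUES (?, ?)",
            (job_name, last_id),
        )

def run_import(conn, job_name, source_rows, index_batch):
    """Index only rows newer than the checkpoint, then advance it."""
    since = get_checkpoint(conn, job_name)
    new_rows = [r for r in source_rows if r["id"] > since]
    if new_rows:
        index_batch(new_rows)
        save_checkpoint(conn, job_name, max(r["id"] for r in new_rows))
    return len(new_rows)
```

With this shape, an import job stays stateless: rerunning it after a failure simply picks up from the stored checkpoint.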

Page 5: Search@Hyves

Search Architecture - Decisions 

• Pure data jobs vs. leveraging the Hyves application stack (PHP)
• Listeners vs. iterator
• Handling deletes - real-time updates vs. ignore on select
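
The "ignore on select" option for deletes can be pictured as keeping deleted document ids in a separate set and filtering them out at query time, rather than rewriting the index on every delete. The sketch below illustrates that idea with a toy in-memory index. The class and method names are hypothetical, and the slides do not say which option was chosen or how it was implemented.

```python
class IgnoreOnSelectIndex:
    """Toy inverted index that hides deleted documents at query time."""

    def __init__(self):
        self.postings = {}      # term -> set of document ids
        self.deleted = set()    # ids deleted since the last rebuild

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def delete(self, doc_id):
        # Cheap: the index itself is untouched until the next rebuild.
        self.deleted.add(doc_id)

    def search(self, term):
        hits = self.postings.get(term.lower(), set())
        return sorted(hits - self.deleted)

    def rebuild(self):
        """Periodic batch job: physically drop deleted ids from postings."""
        for term in list(self.postings):
            self.postings[term] -= self.deleted
            if not self.postings[term]:
                del self.postings[term]
        self.deleted.clear()
```

The trade-off is that deletes become visible immediately at low cost, while the index keeps stale entries until the next batch rebuild.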

Page 6: Search@Hyves

Search Architecture - Technology 

• Search backend - Sphinx
• Data importers - PHP and Hadoop jobs
• Pre-indexing database - MySQL on temp fs
• State management - MySQL (InnoDB)
• Job orchestration - Jenkins
• Deploy - Puppet, Hyves deploy script
• Monitoring - Ganglia, real-time stats, Google Analytics

Page 7: Search@Hyves

Search Architecture - SearchTube

Page 8: Search@Hyves

Sphinx?

• Sphinx is a full-text search server written in C++
• Easy distribution
• Attribute-based filtering
• Supports querying multiple indexes
• Ranking - (BM25 + phrase proximity) + social relevance
• Utilizes multi-core machines via distributed indexes
• Benchmarking results
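
The slides give the ranking only as "(BM25 + phrase proximity) + social relevance", without a formula. As an illustration, the sketch below boosts a text relevance score by a damped popularity signal such as a friend count. The multiplicative form, the logarithm, and the `weight` value are assumptions made for this example, not the formula used at Hyves.

```python
import math

def social_score(text_score, friend_count, weight=0.1):
    """Boost a text relevance score by a damped popularity signal.

    text_score   -- relevance from the search engine (e.g. BM25 + proximity)
    friend_count -- number of friends (or hub members) of the result
    weight       -- hypothetical tuning knob, not a value from the slides
    """
    return text_score * (1.0 + weight * math.log1p(friend_count))

def rank(results):
    """Sort (name, text_score, friend_count) tuples by combined score."""
    return sorted(results, key=lambda r: social_score(r[1], r[2]), reverse=True)
```

The logarithm keeps very popular profiles from drowning out textually better matches: a result with no friends keeps its text score unchanged, and each tenfold increase in friends adds roughly the same increment.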

Page 9: Search@Hyves

Search Tube - Job Orchestration 

• Responsible for executing and synchronizing various jobs
• Jenkins plugins
  o Join Plugin - job synchronization
  o Plot Plugin - reporting
  o Dependency Graph View Plugin - visualization

• Other servers are added as labeled nodes
  o slow slaves, Hadoop node, search slaves, etc.

• Puppetized and Jenkins API
• https://github.com/salimfadhley/jenkinsapi
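
The synchronization described above amounts to running jobs in dependency order, with a "join" job that waits for all of its upstream jobs. The sketch below shows that ordering with Python's standard `graphlib` module (Python 3.9+). The job names are hypothetical, and the code does not call Jenkins or the jenkinsapi library.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key lists the jobs it must wait for.
JOBS = {
    "import_members": set(),
    "import_hubs": set(),
    "build_index": {"import_members", "import_hubs"},  # the "join" point
    "rotate_search_slaves": {"build_index"},
}

def execution_waves(jobs):
    """Group jobs into waves; jobs in one wave can run in parallel."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For the pipeline above, the two import jobs form the first wave, the index build runs alone once both have finished, and the slave rotation runs last.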

Page 10: Search@Hyves

Search Tube - Jenkins

Page 11: Search@Hyves

Search Tube - Reporting

Page 12: Search@Hyves

Search - Failover Scenarios

[Diagram: failover scenarios, with failure points labeled Failed 1 through Failed 8]

Page 13: Search@Hyves

Search - What is new?

• Simplified user interface - single search field for search
  o “ivo utrecht 26” [first name + city + age]
  o “amsterdam female 20” [city + gender + age]
  o “ram* van alte* ams” [partial search]
  o “milea marius” [last name + first name]
  o “coumans amst” [last name + city]
  o “hyves hq” [hub name]

• Improved ranking
  o Member results are influenced by number of friends
  o Hub results are influenced by number of hub members

• Snappy search
  o Server side it takes ~20ms
  o Enabled search on every keystroke

• Refining results
  o Results can be further refined by type, for example member, hub, etc.

• New content is indexed every hour
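
A single search field means the backend has to decide which tokens are attribute filters (city, gender, age) and which are free-text terms. The sketch below shows one simple way to do that classification. The lookup sets, the age bounds, and the function name are hypothetical, and the slides do not describe how Hyves actually parsed queries.

```python
KNOWN_CITIES = {"amsterdam", "utrecht"}   # hypothetical lookup data
GENDERS = {"female", "male"}

def parse_query(query):
    """Split a free-text query into attribute filters and text terms."""
    filters, terms = {}, []
    for token in query.lower().split():
        if token.isdecimal() and 0 < int(token) < 120:
            filters["age"] = int(token)
        elif token in GENDERS:
            filters["gender"] = token
        elif token in KNOWN_CITIES:
            filters["city"] = token
        else:
            terms.append(token)  # names, partial words such as "ram*"
    return filters, terms
```

Under these assumptions, `parse_query("amsterdam female 20")` yields three filters and no text terms, while abbreviated cities such as "amst" fall through to the text terms and would need prefix matching in the index itself.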

Page 14: Search@Hyves

Search Result [December]

• Page views - 8,599,572
• Ajax search queries - 28,742,425
• Search slaves (2x3 slaves, 2 search masters)
  o During peak hours: 120 searches/sec
  o Average query: ~20ms

• Google Analytics shows click-through and relevance

* Only 1% of traffic is measured by Google Analytics
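
The figures above can be put in proportion with a little arithmetic. This sketch assumes that both counts cover a full 31-day December and come from the same traffic; the slide states neither.

```python
AJAX_QUERIES = 28_742_425
PAGE_VIEWS = 8_599_572
PEAK_QPS = 120
AVG_QUERY_SECONDS = 0.020
SECONDS_IN_MONTH = 31 * 24 * 60 * 60  # assumes a full 31-day December

avg_qps = AJAX_QUERIES / SECONDS_IN_MONTH
queries_per_page_view = AJAX_QUERIES / PAGE_VIEWS
peak_to_average = PEAK_QPS / avg_qps
# Little's law: average in-flight queries = arrival rate x time per query
in_flight_at_peak = PEAK_QPS * AVG_QUERY_SECONDS

print(f"average load:          {avg_qps:.1f} queries/sec")
print(f"queries per page view: {queries_per_page_view:.1f}")
print(f"peak vs. average:      {peak_to_average:.1f}x")
print(f"in flight at peak:     {in_flight_at_peak:.1f} queries")
```

Under these assumptions the average load is roughly 11 queries per second, so the quoted peak is about eleven times the average. Each search page view triggers a little over three Ajax queries, which fits searching on every keystroke. At ~20ms per query, the peak keeps only two to three queries in flight at a time across the cluster.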

Page 15: Search@Hyves

Questions?

Anuj Ahuja | [email protected] | anujahuja.hyves.nl | #anujmca