Elasticsearch meetup, November 13, 2014
ElasticsearchFR meetup #11: Orange French search engines
Orange search engine
Jean-Pierre Paris
Orange France
November 13th, 2014
Orange French search engines and Elasticsearch
agenda
part 1 Orange French search engine
part 2 why Elasticsearch?
part 3 conclusion
Orange search engine
4 million ~1 million
8bn docs FR
80 people ~1000 servers 3 datacenters
search engine results page
§ one results page…
§ with a lot of data sources
§ and a lot of engines
vertical search engines
6 Orange French search engines and Elastisearch
web search and web graph
image taken from Wikipedia
volume
§ vertical search engines
– 10m documents
– 5 engines in 2014
– 10GB
§ web graph
– 8bn urls
– 2bn internal vertices
– 6bn leaf vertices
– 100bn edges
– 13TB
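The web graph figures on this slide are consistent with each other. Assuming that leaf vertices are URLs with no outgoing links (so every edge starts at an internal vertex), they also imply an average of about 50 outgoing links per internal vertex:

$$
2\,\text{bn (internal)} + 6\,\text{bn (leaf)} = 8\,\text{bn URLs}, \qquad
\bar{d}_{\text{out}} = \frac{100\,\text{bn edges}}{2\,\text{bn internal vertices}} = 50
$$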
part 2 why Elasticsearch?
our needs
§ vertical search engines
– adopt one common technology
– lower maintenance costs
– prepare for future needs
§ web graph
– gain insight into a large dataset
– build analysis and visualization
– test new technology at large volume
how Elasticsearch answers our needs
§ REST interface
§ near-real-time distributed indexing and distributed search
§ native full-text search
– with many different query types and wildcards
§ facets… oops! aggregations!
– value distribution on a specific criterion
§ interactive mode while exploring a dataset
– short query response time
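A minimal sketch of the kind of request this slide refers to: a wildcard query combined with a terms aggregation (the "value distribution on a specific criterion"), sent as JSON to the REST `_search` endpoint. The index name, the `url` and `host` fields and the local URL are hypothetical; aggregations replaced facets starting with Elasticsearch 1.0. Running the script only builds the request body; `search()` needs a live cluster.

```python
import json
import urllib.request


def build_distribution_query(field, pattern, text_field="url", top_n=10):
    """Search body: wildcard match on text_field plus a terms aggregation on field.

    size=0 asks for the aggregation buckets only, without the matching hits.
    """
    return {
        "size": 0,
        "query": {"wildcard": {text_field: pattern}},
        "aggs": {"distribution": {"terms": {"field": field, "size": top_n}}},
    }


def search(base_url, index, body):
    """POST the body to <base_url>/<index>/_search and return the decoded JSON."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical fields: which hosts appear most among URLs matching *orange*?
body = build_distribution_query(field="host", pattern="*orange*")
print(json.dumps(body, sort_keys=True))
```

Against a running cluster, `search("http://localhost:9200", "webgraph", body)` would return the buckets under `aggregations.distribution.buckets`, each with a `key` and a `doc_count`.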
hardware architecture
[diagram: Elasticsearch cluster made of store nodes; two groups marked x30]
indexing with ES v0.90
§ performance
– starting at 160 doc/s (1 injector, 4 ES nodes: 2 CPUs, 4GB)
– with bulk 1000: 920 doc/s (1 injector, 4 ES nodes: 2 CPUs, 4GB)
– 3 injectors: 570 doc/s × 3 ≈ 1,700 doc/s
– 1 injector, 30 ES nodes (8 CPUs, 16GB): 1,700 doc/s
– 30 injectors, 30 ES nodes (8 CPUs, 16GB): 32,000 doc/s
– 30 injectors, 60 ES nodes (http-data) (8 CPUs, 16GB): 36,000 doc/s
– 240 injectors, 60 ES nodes (http-data) (8 CPUs, 16GB): 75,000 doc/s, then 43,000 doc/s
– 1bn docs in 5h (~55,000 doc/s)
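On this slide, the jump from 160 to 920 doc/s came with bulk requests of 1000 documents. A minimal sketch of what one injector could do, assuming documents arrive as (id, source) pairs: group them into batches of `bulk_size` and build the newline-delimited body of the bulk API (one action line, then one source line per document, with a trailing newline). Each body would then be POSTed to `/<index>/<type>/_bulk`, the URL form of the 0.90/1.x era. The `url-N` ids, the `url` field and the 2,500-document corpus are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

Doc = Tuple[str, Dict]


def chunked(docs: Iterable[Doc], bulk_size: int) -> Iterator[List[Doc]]:
    """Group (doc_id, source) pairs into lists of at most bulk_size items."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == bulk_size:
            yield batch
            batch = []
    if batch:  # last, partial batch
        yield batch


def bulk_body(batch: List[Doc]) -> str:
    """Bulk API body: an action line then a source line per document, newline-terminated."""
    lines = []
    for doc_id, source in batch:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical corpus of 2,500 documents, sent with the slide's bulk size of 1000.
docs = ((f"url-{i}", {"url": f"http://example.org/{i}"}) for i in range(2500))
bodies = [bulk_body(batch) for batch in chunked(docs, bulk_size=1000)]
print(len(bodies))  # 3 requests: 1000 + 1000 + 500 documents
```

Scaling to many injectors, as the later lines of the slide do, amounts to running several such loops in parallel, each on its own slice of the corpus.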
hardware architecture
[diagram: Elasticsearch cluster with nodes labeled store, http and data; two groups marked x30]
number of shards
[chart: time (sec, 0 to 1200) vs. number of shards (0 to 30); annotated: 321 sec for 12 shards]
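Because the number of primary shards is set when an index is created, comparing shard counts means creating one fresh index per value. A minimal sketch of the settings body for `PUT /<index>`; the `bench-shards-N` index names and the list of shard counts are hypothetical, not the values measured on the slide.

```python
import json


def create_index_body(shards: int) -> dict:
    """Body for PUT /<index>: the primary shard count is set at creation time."""
    return {"settings": {"number_of_shards": shards}}


# Hypothetical sweep: one fresh test index per shard count.
shard_counts = [1, 6, 12, 18, 24, 30]
indices = {f"bench-shards-{n}": create_index_body(n) for n in shard_counts}
print(json.dumps(indices["bench-shards-12"]))  # {"settings": {"number_of_shards": 12}}
```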
bulk size
[chart: time (sec, 0 to 600) vs. bulk size (0 to 8000); annotated: 278 sec for bulk size 1700]
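A chart like this one comes from repeating the same indexing run with different bulk sizes and timing each run. A minimal, hypothetical harness: `index_corpus` is a placeholder for a real injector run, and the list of sizes is an assumption, not the one behind the slide.

```python
import time
from typing import Callable, Dict, Iterable


def sweep(run: Callable[[int], None], values: Iterable[int]) -> Dict[int, float]:
    """Time one full indexing run per parameter value; return {value: seconds}."""
    timings = {}
    for value in values:
        start = time.perf_counter()
        run(value)
        timings[value] = time.perf_counter() - start
    return timings


def best(timings: Dict[int, float]) -> int:
    """Parameter value with the smallest elapsed time."""
    return min(timings, key=timings.get)


def index_corpus(bulk_size: int) -> None:
    """Hypothetical placeholder: replace with a real injector run using bulk_size."""
    pass


timings = sweep(index_corpus, [250, 500, 1000, 2000, 4000, 8000])
print(sorted(timings))
```

With a real cluster, each run would need to start from the same empty index for the timings to be comparable; `best(timings)` then picks the bulk size with the shortest run.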
searching
§ performance
– 2 req/s out of the box with a 6.5TB index
– OS cache is mandatory
– 130 req/s once in cache
– many requests needed to load the cache
§ relevance
– good for vertical engines
– not significant in the web graph experimentation
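The gap between 2 req/s and 130 req/s reflects whether the index is already in the OS cache. A minimal sketch for measuring it, assuming a `search` callable that sends one query: send the same query list twice and compare the rates of the first (cold, if the cache started empty) and second (warm) passes. Replaying identical queries is a simplification; the `search` callable and the query list are hypothetical.

```python
import time
from typing import Callable, Sequence, Tuple


def requests_per_second(search: Callable[[str], object], queries: Sequence[str]) -> float:
    """Send every query once, sequentially, and return the achieved request rate."""
    start = time.perf_counter()
    for query in queries:
        search(query)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


def cold_then_warm(search: Callable[[str], object], queries: Sequence[str]) -> Tuple[float, float]:
    """First pass loads the OS cache (cold if it started empty); second pass reads from it."""
    cold = requests_per_second(search, queries)
    warm = requests_per_second(search, queries)
    return cold, warm


# Stand-in for a real client call, so the sketch runs without a cluster.
sent = []
cold, warm = cold_then_warm(sent.append, ["orange", "meetup", "elasticsearch"])
print(len(sent))  # 6: each query sent once per pass
```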
why Elasticsearch AND hadoop?
§ simply use the existing bridge
– open-sourced by Elasticsearch
§ ability to choose the best technology
– performance
– expressive power
§ examples
– compute back links and re-inject them
– distribute Elasticsearch injections
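Back links are the web graph edges read in reverse: for each target URL, the set of pages linking to it. A single-process sketch of the logic, assuming edges arrive as (source, target) URL pairs; on a graph of 100bn edges the same inversion would run as a hadoop job (map emits (target, source), reduce groups by target). The result is re-injected as bulk `update` actions carrying a partial `doc`; the `backlinks` field name and the sample edges are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def back_links(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Invert (source, target) edges: target URL -> sorted, de-duplicated source URLs."""
    incoming = defaultdict(set)
    for source, target in edges:
        # Key each edge by its target (the map step), then collect sources (the reduce step).
        incoming[target].add(source)
    return {target: sorted(sources) for target, sources in incoming.items()}


def update_body(backlinks: Dict[str, List[str]]) -> str:
    """Bulk body of partial updates that re-inject the back links into existing documents."""
    lines = []
    for url, sources in sorted(backlinks.items()):
        lines.append(json.dumps({"update": {"_id": url}}))
        lines.append(json.dumps({"doc": {"backlinks": sources}}))  # hypothetical field name
    return "\n".join(lines) + "\n"


edges = [("a.fr", "b.fr"), ("c.fr", "b.fr"), ("b.fr", "a.fr"), ("a.fr", "b.fr")]
links = back_links(edges)
print(links)  # {'b.fr': ['a.fr', 'c.fr'], 'a.fr': ['b.fr']}
```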
hardware architecture
[diagram: Elasticsearch cluster (http and data nodes, groups marked x30) next to a hadoop cluster (a master, marked x1, and data nodes running hdfs, hadoop, hive and pig)]
part 3 conclusion
conclusion
§ vertical engines
– migration decided 09/13
– first set in production 01/14
§ web graph
– experimentation decided 09/13
– 1bn docs indexed 12/13, significant queries 03/14
§ professional community
§ connectors to other technologies
§ flexibility
– production and experimentation
– high volume
thanks! more info: http://blog.lemoteur.fr
or @lemoteur