Real-time search in Drupal. Meet Elasticsearch By Alexei Gorobets asgorobets

Upload: alexei-gorobets

Post on 11-May-2015


Page 2: Real-time search in Drupal with Elasticsearch @Moldcamp

Elasticsearch

Flexible and powerful open source, distributed real-time search and analytics engine for the cloud


Why use Elasticsearch?

● RESTful API
● Open Source
● JSON over HTTP
● based on Lucene
● distributed
● highly available
● schema free
● massively scalable

Setup in 2 steps:

1. Extract the archive
2. > bin/elasticsearch


How to use it?

> curl -XGET localhost:9200/?pretty

{
  "ok" : true,
  "status" : 200,
  "name" : "Infinity",
  "version" : {
    "number" : "0.90.1",
    "snapshot_build" : false,
    "lucene_version" : "4.3"
  },
  "tagline" : "You Know, for Search"
}

Parts of the request:

-XGET = action (verb)
localhost:9200 = node + port
/ = path
?pretty = query string
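The same breakdown can be reproduced with Python's standard library. This is only a sketch: the http:// scheme is written out explicitly because curl adds it implicitly, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# curl assumes http:// when no scheme is given; urlsplit needs it spelled out.
url = "http://localhost:9200/?pretty"
parts = urlsplit(url)

verb = "GET"            # action (verb), from -XGET
node = parts.hostname   # node
port = parts.port       # port
path = parts.path       # path
# keep_blank_values keeps flags such as ?pretty that carry no value
query = parse_qs(parts.query, keep_blank_values=True)

print(verb, node, port, path, query)
```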


Let's index some data

> PUT /index/type/id

index = Where? It's very similar to a database in SQL.
type = What? A table: content type, entity type, any kind of type you decide.
id = Which? Node ID, entity ID, any kind of serial ID.

> PUT /mysite/node/1 -d

{
  "nid": "1",
  "status": "1",
  "title": "Hello elasticsearch",
  "body": "First elasticsearch document"
}

Response:

{
  "ok": true,
  "_index": "mysite",
  "_type": "node",
  "_id": "1",
  "_version": 1
}
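A minimal sketch of building the same index request from Python, using only the standard library. The helper name build_index_request, the base URL and the Content-Type header are assumptions for illustration; the request is constructed but not sent.

```python
import json
from urllib.request import Request

def build_index_request(index, doc_type, doc_id, document,
                        base_url="http://localhost:9200"):
    """Build (but do not send) PUT /index/type/id with a JSON body."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    # The Content-Type header is an assumption; it is harmless if ignored.
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

request = build_index_request("mysite", "node", 1, {
    "nid": "1",
    "status": "1",
    "title": "Hello elasticsearch",
    "body": "First elasticsearch document",
})
print(request.get_method(), request.full_url)
```

To send it to a running node, pass the request to urllib.request.urlopen.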


Let's GET some data

> GET /mysite/node/1

{
  "_index" : "mysite",
  "_type" : "node",
  "_id" : "1",
  "_version" : 1,
  "exists" : true,
  "_source" : {
    "nid" : "1",
    "status" : "1",
    "title" : "Hello elasticsearch",
    "body" : "First elasticsearch document"
  }
}


> GET /mysite/node/1?fields=title,body

Get specific fields

> GET /mysite/node/1/_source

Get source only


Let's UPDATE some data

> PUT /mysite/node/1 -d

{
  "status": "0"
}

Response:

{
  "ok": true,
  "_index": "mysite",
  "_type": "node",
  "_id": "1",
  "_version": 2
}


UPDATE = DELETE + PUT
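In other words, a PUT to an existing ID replaces the whole document instead of merging fields into it. A toy in-memory sketch of that behaviour (no Elasticsearch involved; the store dict and the put helper are hypothetical):

```python
store = {}  # hypothetical stand-in for one index/type: id -> (version, source)

def put(doc_id, source):
    """Index a document; an existing document is replaced as a whole."""
    version = store[doc_id][0] + 1 if doc_id in store else 1
    store[doc_id] = (version, dict(source))
    return version

put("1", {"nid": "1", "status": "1", "title": "Hello elasticsearch"})
version = put("1", {"status": "0"})
current = store["1"][1]

# The old title is gone: only the fields sent in the second PUT remain.
print(version, current)
```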


Let's DELETE some data

> DELETE /mysite/node/1

{
  "ok": true,
  "found": true,
  "_index": "mysite",
  "_type": "node",
  "_id": "1",
  "_version": 3
}


Distributed, Highly Available


> PUT /new_index -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 2 }}'
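A quick piece of arithmetic on these settings, as a sketch: every primary shard gets number_of_replicas copies, and a replica is not placed on the same node as its primary. The function names are illustrative.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies in the index: each primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)

def nodes_for_full_allocation(number_of_replicas):
    """Fewest nodes on which every copy of a shard can sit on its own node."""
    return number_of_replicas + 1

total = shard_copies(3, 2)
nodes = nodes_for_full_allocation(2)
print(total, nodes)
```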


Concurrency, Version control

> PUT /myapp/node/1?version=1
{ "title": "hi girl" }

# 200
{ "_index": "myapp", "_type": "node", "_id": "1", "_version": 2, "created": false }

> PUT /myapp/node/1?version=1
{ "title": "hey boy" }

# 409
> version conflict, current [2], provided [1]
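The version check is optimistic concurrency control: a write succeeds only if the version it was based on is still the current one. A toy in-memory model of that rule (the class and exception names are hypothetical, not part of any Elasticsearch client):

```python
class VersionConflict(Exception):
    """Raised when the provided version is not the current one."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def put(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if version is not None and version != current:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]")
        self._docs[doc_id] = (current + 1, source)
        return current + 1

store = VersionedStore()
store.put("1", {"title": "hello"})                             # creates version 1
new_version = store.put("1", {"title": "hi girl"}, version=1)  # accepted
try:
    store.put("1", {"title": "hey boy"}, version=1)            # stale version
    conflict = None
except VersionConflict as error:
    conflict = str(error)

print(new_version, conflict)
```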


Let's SEARCH for something

> GET /_search

{
  "took" : 32,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : { results... }
}


Let's SEARCH in multiple indices and types


> GET /index/_search

> GET /index/type/_search

> GET /index1,index2/_search

> GET /myapp_*/type,entity_*/_search
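The paths follow one pattern: comma-separated indices, then optional comma-separated types, then _search. A small sketch of a path builder (search_path is a hypothetical helper; it uses the _all index name when only types are given):

```python
def search_path(indices=(), types=()):
    """Build a search path from optional index and type names."""
    segments = []
    if indices:
        segments.append(",".join(indices))
    elif types:
        segments.append("_all")  # types need an index segment in front of them
    if types:
        segments.append(",".join(types))
    segments.append("_search")
    return "/" + "/".join(segments)

print(search_path())
print(search_path(["index"]))
print(search_path(["index"], ["type"]))
print(search_path(["index1", "index2"]))
```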


Let's PAGINATE results

> GET /_search?size=10&from=20

size = results per page
from = number of results to skip (starting from)
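A pager usually thinks in page numbers, so the two parameters have to be derived. A minimal sketch, assuming zero-based page numbers (page_params is an illustrative name):

```python
def page_params(page, per_page=10):
    """Translate a zero-based page number into size/from parameters."""
    return {"size": per_page, "from": page * per_page}

params = page_params(2)  # third page, 10 results per page
query_string = "&".join(f"{key}={value}" for key, value in params.items())
print("/_search?" + query_string)
```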


Let's search oldschool


> GET /_search?q=title:elasticsearch

> GET /_search?q=nid:60

+title:awesome +status:1 +created:[1369917354 TO *]

?q=%2Btitle:awesome%20%2Bstatus:1%20%2Bcreated:[1369917354%20TO%20*]

The ugly encoding =)
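In practice the encoding is best left to a library. A sketch with Python's urlencode, which escapes more characters than strictly necessary (colons and brackets too) but decodes back to the same query:

```python
from urllib.parse import urlencode, parse_qs

query = "+title:awesome +status:1 +created:[1369917354 TO *]"

# urlencode percent-escapes "+" and turns spaces into "+"
encoded = urlencode({"q": query})
decoded = parse_qs(encoded)["q"][0]

print("/_search?" + encoded)
```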


Query DSL style

> GET /_search -d

{
  "query": {
    "match": { "title": "awesome" }
  }
}

> GET /_search -d

{
  "query": {
    "match" : {
      "title" : {
        "query" : "+awesome -poor",
        "boost" : 2.0
      }
    }
  }
}


Mappings and types

Core types
* string
* number
* date
* boolean

Complex types
* array type
* object type
* nested type

Others:
* ip type
* geo point
* geo shape
* attachments


Define type mapping

> PUT /myapp/node/_mapping -d

{
  "node" : {
    "properties" : {
      "message" : {
        "type" : "string",
        "store" : true
      }
    }
  }
}


Indexed fields

Full text
analyzed == is split into terms

Term
not analyzed == is stored as is
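A rough illustration of the difference, with lowercasing plus whitespace splitting standing in for a real analyzer (index_terms is a hypothetical helper, not an Elasticsearch API):

```python
def index_terms(value, analyzed=True):
    """Return the terms that would be indexed for a string field (simplified)."""
    if analyzed:
        return value.lower().split()  # crude stand-in for a real analyzer
    return [value]                    # not analyzed: one term, stored as is

title_terms = index_terms("Hello Elasticsearch")
name_terms = index_terms("Hello Elasticsearch", analyzed=False)

# A search for the term "hello" can match the analyzed field only.
print(title_terms, name_terms)
```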

> PUT /myapp/node/_mapping -d

{
  "node" : {
    "properties" : {
      "name" : {
        "type" : "string",
        "store" : true,
        "index" : "not_analyzed"
      }
    }
  }
}


Dynamic mapping


Analysis and indexing

Inverted index

1. “The quick brown fox jumped over the lazy dog”
2. “Quick brown foxes leap over lazy dogs in summer”

Term      Doc_1  Doc_2
-------------------------
Quick   |       |  X
The     |   X   |
brown   |   X   |  X
dog     |   X   |
dogs    |       |  X
fox     |   X   |
foxes   |       |  X
in      |       |  X
jumped  |   X   |
lazy    |   X   |  X
leap    |       |  X
over    |   X   |  X
quick   |   X   |
summer  |       |  X
the     |   X   |
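The table can be rebuilt in a few lines. This sketch splits on whitespace only and does no lowercasing or stemming, which is why "Quick" and "quick", or "dog" and "dogs", end up as separate terms:

```python
from collections import defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# term -> set of document ids containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

for term in sorted(index):
    marks = ["X" if doc_id in index[term] else " " for doc_id in sorted(docs)]
    print(f"{term:<8}|   {marks[0]}   |  {marks[1]}")
```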

Analyzer

Tokenizers

● standard
● keyword
● whitespace
● ngram

TokenFilters

● standard
● lowercase
● stop
● truncate
● snowball

> GET /_analyze?analyzer=standard -d 'this is a test baby'

{
  "tokens" : [
    { "token" : "test", "start_offset" : 10, "end_offset" : 14, "type" : "<ALPHANUM>", "position" : 4 },
    { "token" : "baby", "start_offset" : 15, "end_offset" : 19, "type" : "<ALPHANUM>", "position" : 5 }
  ]
}
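Why only two tokens come back: the analyzer tokenizes, lowercases and then drops stop words, while positions still count the dropped words. A simplified imitation, assuming a tiny stop word list and a plain word regex rather than the real standard tokenizer:

```python
import re

STOP_WORDS = {"a", "is", "this"}  # tiny illustrative subset, an assumption

def analyze(text):
    """Tokenize on word characters, lowercase, and drop stop words."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        token = match.group().lower()
        if token in STOP_WORDS:
            continue  # dropped, but its position is still consumed
        tokens.append({
            "token": token,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

tokens = analyze("this is a test baby")
print(tokens)
```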


Autocomplete fields

Queries & Filters

Queries: full text search, relevance score, heavy, not cacheable

Filters: exact match, show or hide, lightning fast, cacheable


Combine Filters & Queries

> GET /_search -d

{
  "query": {
    "filtered": {
      "query": {
        "match": { "title": "awesome" }
      },
      "filter": {
        "term": { "type": "article" }
      }
    }
  }
}


and Sorting

> GET /_search -d

{
  "query": {
    "filtered": {
      "query": {
        "match": { "title": "awesome" }
      },
      "filter": {
        "term": { "type": "article" }
      }
    }
  },
  "sort": { "date": "desc" }
}
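Request bodies like this are easier to assemble as data than as hand-written JSON. A sketch that builds the same structure as the slide (filtered_search and its parameters are hypothetical names; the body mirrors the 0.90-era syntax shown above):

```python
import json

def filtered_search(field, text, filter_field, filter_value,
                    sort_field=None, order="desc"):
    """Combine a scored match query with an exact-match term filter."""
    body = {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }
    if sort_field is not None:
        body["sort"] = {sort_field: order}
    return body

body = filtered_search("title", "awesome", "type", "article", sort_field="date")
print(json.dumps(body, indent=2))
```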


Relevance. Explain API

Term frequency

How often does the term appear in the field? The more often, the more relevant.

Inverse document frequency

How often does each term appear in the index? The more often, the less relevant.

Field norm

How long is the field? The longer it is, the less likely it is that words in the field will be relevant.
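The three factors as numbers, assuming the classic Lucene TF/IDF similarity formulas (square root of the frequency, a log-damped inverse document frequency, one over the square root of the field length). The real score also folds in boosts and query normalization, which this sketch leaves out:

```python
import math

def term_frequency(freq):
    """More occurrences in the field -> higher weight."""
    return math.sqrt(freq)

def inverse_document_frequency(num_docs, doc_freq):
    """More documents containing the term -> lower weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms):
    """Longer field -> lower weight."""
    return 1.0 / math.sqrt(num_terms)

print(term_frequency(4))
print(inverse_document_frequency(100, 1), inverse_document_frequency(100, 50))
print(field_norm(4), field_norm(100))
```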


and Facets


Facets on Amazon

> GET /_search -d

{
  "facets": {
    "home_team": {
      "terms": {
        "field": "field_home_team"
      }
    }
  }
}

Give your facet a name ("home_team" above).

Your facet type can be:

● Terms
● Range
● Histogram
● Date Histogram
● Filter
● Query
● Statistical
● Terms Stats
● Geo Distance

"facets" : {
  "home_team" : {
    "_type" : "terms",
    "missing" : 203,
    "total" : 100,
    "other" : 42,
    "terms" : [
      { "term" : "hou", "count" : 8 },
      { "term" : "sln", "count" : 6 },
      ...
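How the numbers in a terms facet relate to each other, as a toy sketch: missing counts documents without a value, total counts the values seen, and other is whatever the returned top terms do not cover. It assumes one value per document, and the sample data is made up:

```python
from collections import Counter

def terms_facet(values, size=10):
    """Count field values the way a terms facet reports them (simplified)."""
    present = [value for value in values if value is not None]
    top = Counter(present).most_common(size)
    shown = sum(count for _, count in top)
    return {
        "_type": "terms",
        "missing": len(values) - len(present),  # documents with no value
        "total": len(present),                  # values counted
        "other": len(present) - shown,          # values outside the top terms
        "terms": [{"term": term, "count": count} for term, count in top],
    }

# One entry per document; None means the field is empty (made-up sample data).
facet = terms_facet(["hou", "sln", "hou", None, "nyn", "hou", "sln"], size=2)
print(facet)
```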

STOP! I want this in Drupal?

Development directions:

1. Search API implementation
2. Field Storage API
3. Alternative backends

Available modules:

Elasticsearch
Elasticsearch Connector
Search API elasticsearch

Field Storage API implementation

Elasticsearch field storage sandbox by Damien Tournoud
Started in July 2011

Elasticsearch EntityFieldQuery sandbox https://drupal.org/sandbox/asgorobets/2073151


Let's DEMO


Let the Search be with you
