elasticsearch - guide to search

Elasticsearch Guide to search #1 Antoni Orfin [email protected]


Post on 21-Jan-2017


USE CASES

1. Intelligent search engines

…learning from users' behaviour
"Search a 3M-item database for cats that I would love"

…forgiving spelling mistakes
"Search for Mihael Jakson photos and show Michael Jackson photos"

2. Autocomplete
"Show the most relevant suggestions that start with 'search…'"

3. Geo-search (geospatial)
"Search for restaurants that are nearest to [location]"

4. Search by colors (ColorSearch)
"Search for flowers that are [color]"

OLD SCHOOL Searching in MySQL

SELECT * FROM photos WHERE title LIKE '%cat%'
SELECT * FROM photos WHERE title LIKE '%cats%'

id [PK]      title
1            Cute cat and dog
2            Cat plays with a dog
3            Cats playing piano
…            …
3 000 000    Hidden cat
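The two queries above can be tried on the sample rows. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the slide targets MySQL, and the table layout is inferred from the slide). It shows that the two patterns return different rows, so the user has to guess the exact word form, and every row has to be compared against the pattern.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO photos (id, title) VALUES (?, ?)",
    [
        (1, "Cute cat and dog"),
        (2, "Cat plays with a dog"),
        (3, "Cats playing piano"),
        (3000000, "Hidden cat"),
    ],
)


def search(pattern):
    """Return the ids of photos whose title matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT id FROM photos WHERE title LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


cat_ids = search("%cat%")    # substring "cat" anywhere in the title
cats_ids = search("%cats%")  # substring "cats" misses the singular titles
print(cat_ids)
print(cats_ids)
```

Searching for the plural drops three of the four relevant photos, which is the problem the text analysis slides address.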

SEARCH THEORY Building Inverted Index

#1 Cute cat and dog
#2 Cat plays with a dog
#3 Cats playing piano

Term [PK]    Ids
cute         1
cat          1, 2, 3
dog          1, 2
play         2, 3
…            …
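The table above can be reproduced with a few lines of Python. This is a rough sketch, not how Elasticsearch or Lucene builds its index: the stopword list and the suffix-stripping `normalize` helper are assumptions made up for this example so that "Cats" and "plays" land on the terms shown on the slide.

```python
from collections import defaultdict

STOPWORDS = {"a", "and", "with"}  # assumed stopword list for this toy example


def normalize(token):
    """Lowercase a token and strip a few suffixes (a toy stemmer, not a real one)."""
    token = token.lower()
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_inverted_index(docs):
    """Map each normalized term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            term = normalize(token)
            if term not in STOPWORDS:
                postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {1: "Cute cat and dog", 2: "Cat plays with a dog", 3: "Cats playing piano"}
index = build_inverted_index(docs)
for term in ("cute", "cat", "dog", "play"):
    print(term, index[term])
```

A search for "cat" then becomes a single dictionary lookup instead of a scan over every title.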

SEARCH THEORY Text Analysis

Puppy and kitten with guinea pig

1. Tokenization

[Puppy] [and] [kitten] [with] [guinea] [pig]

2. Filtering tokens

[dog] [cat] [guinea] [pig]

Two separate tokens? :(

ASCII Folding: róża -> roza
Lowercase: Cat -> cat
Synonyms: kitten -> cat, puppy -> dog
Stopwords (common words to remove): and, what, with, or
Stemming (reducing inflected words to their base form): cats -> cat; fishing, fisher, fished -> fish
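The two analysis steps can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not an Elasticsearch analyzer: the synonym and stopword tables are copied from the slide, and the `stem` helper is a made-up suffix stripper that only covers the slide's examples.

```python
import re
import unicodedata

SYNONYMS = {"kitten": "cat", "puppy": "dog"}
STOPWORDS = {"and", "what", "with", "or"}


def ascii_fold(token):
    """Drop diacritics by decomposing characters and removing combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(token):
    """Toy stemmer: strip one common English suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, then lowercase, fold, drop stopwords, map synonyms and stem."""
    terms = []
    for token in re.findall(r"\w+", text):  # 1. tokenization
        token = ascii_fold(token.lower())   # 2. token filters
        if token in STOPWORDS:
            continue
        token = SYNONYMS.get(token, token)
        terms.append(stem(token))
    return terms


print(analyze("Puppy and kitten with guinea pig"))
print(analyze("róża"))
print(analyze("Cats fishing fisher fished"))
```

Note that "guinea pig" still comes out as two tokens, which is the issue the slide points at; treating it as one term would need a multi-word synonym or phrase handling that this sketch leaves out.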

SEARCH THEORY Text Analysis

Lekarz Chorób Wewnętrznych (Polish: "doctor of internal diseases")

-> stemming: Lekarz Choroba Wewnętrzny
-> asciifolding, lowercase: lekarz choroba wewnetrzny
-> synonyms: internista ("internist")


TECHNOLOGIES Search Engines Overview

SOLUTION

Elasticsearch is a flexible and powerful open-source, distributed, real-time search and analytics engine.

ELASTICSEARCH Architecture

Node 1: Shard 1, Shard 2, Replica 3, Replica 4
Node 2: Shard 3, Shard 4, Replica 1, Replica 2

4 shards, 1 replica
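The layout above corresponds to an index created with four primary shards and one replica of each. The sketch below shows the matching index settings (`number_of_shards` and `number_of_replicas` are the actual setting names) plus a toy `allocate` function. The function is hypothetical and only mimics the picture; in a real cluster Elasticsearch decides the placement itself, with the guarantee that a replica never sits on the same node as its primary.

```python
import json

# Index settings with the values from the slide.
settings = {"settings": {"number_of_shards": 4, "number_of_replicas": 1}}


def allocate(num_shards, nodes):
    """Toy allocation: spread primaries evenly, put each replica on the next node."""
    layout = {node: [] for node in nodes}
    for shard in range(1, num_shards + 1):
        primary = (shard - 1) * len(nodes) // num_shards
        replica = (primary + 1) % len(nodes)
        layout[nodes[primary]].append(f"Shard {shard}")
        layout[nodes[replica]].append(f"Replica {shard}")
    return layout


layout = allocate(4, ["Node 1", "Node 2"])
print(json.dumps(settings))
for node, shards in layout.items():
    print(node, sorted(shards))
```

With this placement either node can be lost and every shard is still available on the other one.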

ELASTICSEARCH Nomenclature

Elasticsearch    MySQL
Node             Instance
Index            Database
Type             Table
Document         Row
Attribute        Column

ELASTICSEARCH Mapping

PUT [localhost:9200]/pixers/photos/_mapping
{
  "photos" : {
    "properties" : {
      "title" : {"type" : "string", "analyzer" : "pl"},
      "categories" : {"type" : "nested", ...}
    }
  }
}

Types: string, float, double, byte, short, integer, long, date, nested, geo_point, geo_shape, … etc.

ELASTICSEARCH REST API

localhost:9200/{index}/{type}/{document id}

PUT [localhost:9200]/pixers/photos/1
{
  "title" : "Cute cat and dog sitting on books",
  "keywords": ["cat", "dog"]
}

GET [localhost:9200]/pixers/photos/1

DELETE [localhost:9200]/pixers/photos/1
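The three document calls can be prepared from Python with the standard library alone. This sketch only builds the requests and does not send them; `BASE_URL`, `document_url` and `build_request` are names invented for this example, and the address assumes a node running locally on the default port.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed address of a local node


def document_url(index, doc_type, doc_id):
    """Build the {index}/{type}/{document id} URL from the slide."""
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}"


def build_request(method, url, body=None):
    """Create an HTTP request with an optional JSON body (nothing is sent here)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)


url = document_url("pixers", "photos", 1)
put_req = build_request(
    "PUT",
    url,
    {"title": "Cute cat and dog sitting on books", "keywords": ["cat", "dog"]},
)
get_req = build_request("GET", url)
delete_req = build_request("DELETE", url)

for req in (put_req, get_req, delete_req):
    print(req.get_method(), req.full_url)
# Sending needs a running node, e.g. urllib.request.urlopen(put_req).
```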

ELASTICSEARCH REST API

Searching

GET /pixers/photos/_search
{
  "query" : {
    "match" : { "title" : "cat" }
  }
}

[Figure: real-life query]

ELASTICSEARCH Searching

Query vs Filter

Query String: "likes:[10 TO *] AND title:(+cat -dog)"

Match: "funny cat"

Fuzzy: "funy cad"

More Like This
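Fuzzy matching rests on edit distance: a misspelled term matches an indexed term when only a few single-character edits separate them. The sketch below implements plain Levenshtein distance and a hypothetical `fuzzy_match` helper over a made-up vocabulary; it illustrates the idea and is not how Elasticsearch evaluates a fuzzy query internally.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_edits=1):
    """Return the vocabulary terms within max_edits edits of the given term."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


vocabulary = ["funny", "cat", "dog", "piano"]  # assumed indexed terms
print(fuzzy_match("funy", vocabulary))
print(fuzzy_match("cad", vocabulary))
```

Both misspelled words from the slide are one edit away from the intended term, so "funy cad" still finds "funny cat".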

ELASTICSEARCH Searching

Query vs Filter

Terms: [some, tags]

Range: likes > 10

Geo Distance: lat = 50, lon = 20, distance = 200 m
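Unlike scored queries, these three conditions are yes/no checks on each document. A minimal in-memory sketch of the same conditions follows; the sample photos and their field names are invented for illustration, and the distance uses the haversine formula on a spherical Earth, which is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical documents; only the filter values come from the slide.
photos = [
    {"id": 1, "tags": ["some"], "likes": 25, "lat": 50.001, "lon": 20.0},
    {"id": 2, "tags": ["other"], "likes": 40, "lat": 50.0, "lon": 20.0},
    {"id": 3, "tags": ["tags"], "likes": 5, "lat": 50.0, "lon": 20.0},
    {"id": 4, "tags": ["some", "tags"], "likes": 12, "lat": 50.01, "lon": 20.0},
]


def passes(photo):
    """Apply the terms, range and geo distance conditions to one document."""
    has_tag = any(tag in ("some", "tags") for tag in photo["tags"])       # terms
    enough_likes = photo["likes"] > 10                                    # range
    nearby = haversine_m(50.0, 20.0, photo["lat"], photo["lon"]) <= 200   # geo distance
    return has_tag and enough_likes and nearby


matching_ids = [photo["id"] for photo in photos if passes(photo)]
print(matching_ids)
```

Because a filter only answers whether a document passes, its result carries no relevance score.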

ELASTICSEARCH Searching

Query vs Filter

Nested

Bool: must / must_not / should

Function Score
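A bool query combines other queries under `must`, `must_not` and `should` clauses. The sketch below builds such a request body as a Python dictionary; the `bool_query` helper and the chosen field values are assumptions for illustration, while `bool`, `match` and `range` are query names Elasticsearch provides.

```python
import json


def bool_query(must=(), must_not=(), should=()):
    """Assemble a bool query, leaving out clauses that are empty."""
    clauses = {"must": list(must), "must_not": list(must_not), "should": list(should)}
    return {"bool": {name: items for name, items in clauses.items() if items}}


body = {
    "query": bool_query(
        must=[{"match": {"title": "cat"}}],         # has to match
        must_not=[{"match": {"title": "dog"}}],     # excludes documents
        should=[{"range": {"likes": {"gt": 10}}}],  # optional, raises the score
    )
}
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent as the body of a `_search` request like the one shown earlier.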

ELASTICSEARCH Analytics

Aggregations

Get likes stats and a histogram of the created_at date, grouped by category.

terms: category
  - stats: likes
  - histogram: created_at
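What this nested aggregation computes can be mimicked in plain Python. The sketch below groups made-up documents by category, then derives likes statistics (count, min, max, avg, sum) and a per-month count of `created_at` inside each group. The sample data, the monthly interval and the result key names are assumptions; a real cluster would return these values from a single `_search` request.

```python
from collections import defaultdict

# Hypothetical documents; the field names follow the slide.
photos = [
    {"category": "animals", "likes": 10, "created_at": "2016-01-15"},
    {"category": "animals", "likes": 30, "created_at": "2016-01-20"},
    {"category": "animals", "likes": 20, "created_at": "2016-02-03"},
    {"category": "flowers", "likes": 5, "created_at": "2016-02-10"},
]


def aggregate(docs):
    """Group by category, then compute likes stats and a monthly histogram."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["category"]].append(doc)  # terms: category

    result = {}
    for category, items in buckets.items():
        likes = [doc["likes"] for doc in items]
        histogram = defaultdict(int)
        for doc in items:
            histogram[doc["created_at"][:7]] += 1  # bucket key is YYYY-MM
        result[category] = {
            "doc_count": len(items),
            "likes_stats": {  # stats: likes
                "count": len(likes),
                "min": min(likes),
                "max": max(likes),
                "avg": sum(likes) / len(likes),
                "sum": sum(likes),
            },
            "created_at_histogram": dict(sorted(histogram.items())),
        }
    return result


stats = aggregate(photos)
for category, values in stats.items():
    print(category, values)
```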

Contact me at:

[email protected]

linkedin.com/in/antoniorfin twitter.com/antoniorfin

www.pixersize.com

Thank you! Questions & Answers