elastic search: beyond ordinary fulltext search (webexpo 2011 prague)

ElasticSearch: Beyond Ordinary Fulltext Search, Karel Minařík

Upload: karel-minarik

Post on 10-May-2015


DESCRIPTION

Talk at the Webexpo 2011 Conference in Prague (http://webexpo.net/)

TRANSCRIPT

Page 1: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch: Beyond Ordinary Fulltext Search

Karel Minařík

Page 2: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

http://karmi.cz

Page 3: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

AUDIENCE POLL

Does your application have a search feature?

Page 4: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

AUDIENCE POLL

What do you use for search?

1. SELECT ... LIKE %foo%
2. Sphinx
3. Apache Solr
4. ElasticSearch

Page 5: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

Search is the primary interface for getting information today.

Page 6: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Page 9: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

???

Page 10: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

???

Page 11: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Page 12: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

#uxfail???

Page 13: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

Y U NO ALIGN???

Page 14: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Page 15: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

???

Page 16: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

???

Page 17: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Page 18: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

Search is hard. Let's go write SQL queries!

Page 19: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

How do you implement search?
WHY SEARCH SUCKS?

def search
  @results = MyModel.search params[:q]
  respond_with @results
end

Page 20: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

How do you implement search?
WHY SEARCH SUCKS?

def search
  @results = MyModel.search params[:q]
  respond_with @results
end

Query / Results / Result

MAGIC

Page 21: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

How do you implement search?
WHY SEARCH SUCKS?

def search
  @results = MyModel.search params[:q]
  respond_with @results
end

Query / Results / Result

MAGIC + /

Page 22: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

A personal story...

670px

23px

Page 23: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

Compare your search library with your ORM library
WHY SEARCH SUCKS?

MyModel.search "(this OR that) AND NOT whatever"

articles = Arel::Table.new(:articles)
comments = Arel::Table.new(:comments)

articles.
  where(articles[:title].eq('On Search')).
  where(["published_on => ?", Time.now]).
  join(comments).
  on(articles[:id].eq(comments[:article_id])).
  take(5).
  skip(4).
  to_sql

Page 24: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

How does search work?

Page 25: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

A collection of documents
HOW DOES SEARCH WORK?

file_1.txt: The ruby is a pink to blood-red colored gemstone ...

file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...

file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...

Page 26: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

How do you search documents?
HOW DOES SEARCH WORK?

File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...

Page 27: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

The inverted index
HOW DOES SEARCH WORK?

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

TOKENS POSTINGS

ruby file_1.txt file_2.txt file_3.txt

pink file_1.txt

gemstone file_1.txt

dynamic file_2.txt

reflective file_2.txt

programming file_2.txt

song file_3.txt

english file_3.txt

rock file_3.txt

Page 28: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

The inverted index
HOW DOES SEARCH WORK?

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

ruby file_1.txt file_2.txt file_3.txt

pink file_1.txt

gemstone file_1.txt

dynamic file_2.txt

reflective file_2.txt

programming file_2.txt

song file_3.txt

english file_3.txt

rock file_3.txt

MySearchLib.search "ruby"

Page 29: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

The inverted index
HOW DOES SEARCH WORK?

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

pink file_1.txt

gemstone file_1.txt

dynamic file_2.txt

reflective file_2.txt

programming file_2.txt

song file_3.txt

english file_3.txt

rock file_3.txt

MySearchLib.search "song"

ruby file_1.txt file_2.txt file_3.txt

Page 30: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

The inverted index
HOW DOES SEARCH WORK?

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

ruby file_1.txt file_2.txt file_3.txt

pink file_1.txt

gemstone file_1.txt

dynamic file_2.txt

reflective file_2.txt

programming file_2.txt

english file_3.txt

rock file_3.txt

MySearchLib.search "ruby AND song"

song file_3.txt
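A Boolean AND query like the one on this slide comes down to intersecting posting lists. The following is a minimal Ruby sketch of that idea, assuming the index is held in a plain Hash; the `POSTINGS` constant and the `search_and` helper are illustrative names, not part of any search library.

```ruby
# Part of the inverted index from the slides: token => posting list.
POSTINGS = {
  'ruby' => %w[file_1.txt file_2.txt file_3.txt],
  'pink' => %w[file_1.txt],
  'song' => %w[file_3.txt],
  'rock' => %w[file_3.txt]
}.freeze

# Hypothetical helper: documents that contain every one of the given tokens.
# Array#& keeps only the elements present in both posting lists.
def search_and(*tokens)
  tokens.map { |token| POSTINGS.fetch(token, []) }.reduce(:&) || []
end

p search_and('ruby', 'song')
p search_and('ruby', 'pink', 'song')
```

The first call returns only file_3.txt, the one document present in both posting lists; the second returns an empty list because no document contains all three tokens.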

Page 31: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

A naïve Ruby implementation

module SimpleSearch

  def index document, content
    tokens = analyze content
    store document, tokens
    puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
  end

  def analyze content
    # >>> Split content by words into "tokens"
    content.split(/\W/).
    # >>> Downcase every word
    map    { |word| word.downcase }.
    # >>> Reject stop words, digits and whitespace
    reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
  end

  def store document_id, tokens
    tokens.each do |token|
      # >>> Save the "posting"
      ( (INDEX[token] ||= []) << document_id ).uniq!
    end
  end

  def search token
    puts "Results for token '#{token}':"
    # >>> Print documents stored in index for this token (none if the token is unknown)
    (INDEX[token] || []).each { |document| puts "  * #{document}" }
  end

  INDEX = {}
  STOPWORDS = %w|a an and are as at but by for if in is it no not of on or that the then there these this to with|

  extend self

end

Page 32: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

Indexing documents
HOW DOES SEARCH WORK?

SimpleSearch.index "file1", "Ruby is a language. Java is also a language."
SimpleSearch.index "file2", "Ruby is a song."
SimpleSearch.index "file3", "Ruby is a stone."
SimpleSearch.index "file4", "Java is a language."

Indexed document file1 with tokens:
["ruby", "language", "java", "also", "language"]

Indexed document file2 with tokens:
["ruby", "song"]

Indexed document file3 with tokens:
["ruby", "stone"]

Indexed document file4 with tokens:
["java", "language"]

Words downcased, stopwords removed.

Page 33: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

The index
HOW DOES SEARCH WORK?

puts "What's in our index?"

p SimpleSearch::INDEX

{
  "ruby"     => ["file1", "file2", "file3"],
  "language" => ["file1", "file4"],
  "java"     => ["file1", "file4"],
  "also"     => ["file1"],
  "stone"    => ["file3"],
  "song"     => ["file2"]
}

Page 34: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

Search the index
HOW DOES SEARCH WORK?

SimpleSearch.search "ruby"

Results for token 'ruby':
  * file1
  * file2
  * file3

Page 35: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

The inverted index
HOW DOES SEARCH WORK?

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

TOKENS POSTINGS

ruby file_1.txt file_2.txt file_3.txt

pink file_1.txt

gemstone file_1.txt

dynamic file_2.txt

reflective file_2.txt

programming file_2.txt

song file_3.txt

english file_3.txt

rock file_3.txt


Page 36: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

It is very practical to know how search works.

For instance, now you know that the analysis step is very important.

It's more important than the “search” step.

Page 37: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

A naïve Ruby implementation (the SimpleSearch module from Page 31, shown again)

Page 38: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

http://search-engines-book.com

The Search Engine Textbook
HOW DOES SEARCH WORK?

Search Engines: Information Retrieval in Practice
Bruce Croft, Donald Metzler and Trevor Strohman
Addison Wesley, 2009

Page 39: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

The Baseline Information Retrieval Implementation
SEARCH IMPLEMENTATIONS

Lucene in Action
Michael McCandless, Erik Hatcher and Otis Gospodnetic
July, 2010

http://manning.com/hatcher3

Page 40: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

http://elasticsearch.org

Page 41: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.

Page 42: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)
Page 43: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ElasticSearch

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

{ }

Page 44: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ELASTICSEARCH FEATURES

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

# Add a document
curl -X POST \
  "http://localhost:9200/articles/article/1" \
  -d '{ "title" : "One" }'

URL path: /articles (INDEX) /article (TYPE) /1 (ID). Request body: DOCUMENT.

Page 45: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

ELASTICSEARCH FEATURES

# Add a document
curl -X POST "http://localhost:9200/articles/article/1" -d '{ "title" : "One" }'

# Perform query
curl -X GET  "http://localhost:9200/articles/_search?q=One"
curl -X POST "http://localhost:9200/articles/_search" -d '{
  "query" : { "terms" : { "tags" : ["ruby", "python"], "minimum_match" : 2 } }
}'

# Delete index
curl -X DELETE "http://localhost:9200/articles"

# Create index with settings and mapping
curl -X PUT "http://localhost:9200/articles" -d '{
  "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 2 } },
  "mappings" : {
    "document" : {
      "properties" : {
        "body" : { "type" : "string", "analyzer" : "snowball" }
      }
    }
  }
}'

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

Page 46: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

http {

  server {

    listen       8080;
    server_name  search.example.com;

    error_log   elasticsearch-errors.log;
    access_log  elasticsearch.log;

    location / {

      # Deny access to Cluster API
      if ($request_filename ~ "_cluster") {
        return 403;
        break;
      }

      # Pass requests to ElasticSearch
      proxy_pass http://localhost:9200;
      proxy_redirect off;

      proxy_set_header  X-Real-IP  $remote_addr;
      proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header  Host $http_host;

      # Authorize access
      auth_basic           "ElasticSearch";
      auth_basic_user_file passwords;

      # Route all requests to authorized user's own index
      rewrite  ^(.*)$  /$remote_user$1  break;
      rewrite_log on;

      return 403;

    }

  }

}

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

GET  http://user:password@localhost:8080/_search?q=*  =>  http://localhost:9200/user/_search?q=*

https://gist.github.com/986390

#664 Add HTTPS and basic authentication support NO.

Page 47: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

{
  "id"    : "abc123",
  "title" : "ElasticSearch Understands JSON!",
  "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2011/05/27 10:00:00",
  "tags"  : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name"  : "Rice",
    "email"      : "[email protected]"
  }
}

JSON

Page 48: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

curl -X DELETE "http://localhost:9200/articles"; sleep 1
curl -X POST   "http://localhost:9200/articles/article" -d '
{
  "id"    : "abc123",
  "title" : "ElasticSearch Understands JSON!",
  "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2011/05/27 10:00:00",
  "tags"  : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name"  : "Rice",
    "email"      : "[email protected]"
  }
}'
curl -X POST   "http://localhost:9200/articles/_refresh"

curl -X GET \
  "http://localhost:9200/articles/article/_search?q=author.first_name:clara"

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Page 49: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

curl -X GET "http://localhost:9200/articles/_mapping?pretty=true"

{
  "articles" : {
    "article" : {
      "properties" : {
        "title" : {
          "type" : "string"
        },
        // ...
        "author" : {
          "dynamic" : "true",
          "properties" : {
            "first_name" : {
              "type" : "string"
            },
            // ...
          }
        },
        "published_on" : {
          "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd",
          "type" : "date"
        }
      }
    }
  }
}

HTTP / JSON / Schema-free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

curl -X POST "http://localhost:9200/articles/article" -d '
...
"published_on" : "2011/05/27 10:00:00",
...

Page 50: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

curl -X POST "http://localhost:9200/articles/comment" -d '
{
  "body" : "Wow! Really nice JSON support.",
  "published_on" : "2011/05/27 10:05:00",
  "author" : {
    "first_name" : "John",
    "last_name"  : "Pear",
    "email"      : "[email protected]"
  }
}'
curl -X POST "http://localhost:9200/articles/_refresh"

curl -X GET \
  "http://localhost:9200/articles/comment/_search?q=author.first_name:john"

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

DIFFERENT TYPE

Page 51: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

# Search single type
curl -X GET "http://localhost:9200/articles/comment/_search?q=body:json"

# Search whole index
curl -X GET "http://localhost:9200/articles/_search?q=body:json"

# Search multiple indices
curl -X GET "http://localhost:9200/articles,users/_search?q=body:json"

# Search all indices
curl -X GET "http://localhost:9200/_search?q=body:json"
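The four URLs above follow one pattern: optional index names, optional type names, then `_search`. A small Ruby sketch of that pattern follows; `search_path` is a hypothetical helper written for this illustration, not an ElasticSearch or Tire API.

```ruby
# Hypothetical helper: build the _search path for any combination of
# indices and types, mirroring the URLs on the slide.
def search_path(indices: [], types: [])
  indices = Array(indices)
  types   = Array(types)
  # A type segment only makes sense after an index segment.
  raise ArgumentError, 'types require at least one index' if indices.empty? && !types.empty?

  segments = []
  segments << indices.join(',') unless indices.empty?
  segments << types.join(',')   unless types.empty?
  '/' + (segments + ['_search']).join('/')
end

puts search_path(indices: 'articles', types: 'comment') # single type
puts search_path(indices: 'articles')                   # whole index
puts search_path(indices: %w[articles users])           # multiple indices
puts search_path                                        # all indices
```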

Page 52: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

curl -X DELETE "http://localhost:9200/articles"; sleep 1

curl -X POST   "http://localhost:9200/articles/article/abc123" -d '
{
  "id"    : "abc123",
  "title" : "ElasticSearch Understands JSON!",
  "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2011/05/27 10:00:00",
  "tags"  : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name"  : "Rice",
    "email"      : "[email protected]"
  }
}'
curl -X POST   "http://localhost:9200/articles/_refresh"

curl -X GET "http://localhost:9200/articles/article/abc123"

Page 53: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

{"_index":"articles","_type":"article","_id":"1","_version":1, "_source" : {
  "id"    : "1",
  "title" : "ElasticSearch Understands JSON!",
  "body"  : "ElasticSearch not only “works” with JSON, it understands it! Let’s first ...",
  "published_on" : "2011/05/27 10:00:00",
  "tags"  : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name"  : "Rice",
    "email"      : "[email protected]"
  }
}}

“The Index Is Your Database”

Page 54: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html

my_alias

index_A

index_B

curl -X POST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "add" : {
        "index" : "index_1",
        "alias" : "myalias"
      }
    },
    { "add" : {
        "index" : "index_2",
        "alias" : "myalias"
      }
    }
  ]
}'

Index Aliases

Page 55: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

logs

The “Sliding Window” problem

logs_2010_02

logs_2010_03

logs_2010_04

curl -X DELETE http://localhost:9200/logs_2010_01

“We can really store only three months worth of data.”
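With one index per month, the sliding window reduces to dropping the oldest indices. A rough Ruby sketch follows, assuming index names use the zero-padded `logs_YYYY_MM` scheme from the slide; `expired_indices` is a hypothetical helper, and the script only prints the DELETE commands rather than running them.

```ruby
# Hypothetical helper: given monthly index names such as "logs_2010_02",
# return the ones that fall outside the retention window.
def expired_indices(index_names, keep: 3)
  sorted = index_names.sort # zero-padded YYYY_MM sorts chronologically
  sorted.first([sorted.size - keep, 0].max)
end

indices = %w[logs_2010_03 logs_2010_01 logs_2010_04 logs_2010_02]

expired_indices(indices).each do |name|
  puts "curl -X DELETE http://localhost:9200/#{name}"
end
```

Deleting a whole index is one cheap operation, which is why the per-month layout suits this kind of retention policy.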

Page 56: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

curl -X PUT localhost:9200/_template/bookmarks_template -d '
{
  "template" : "users_*",

  "settings" : {
    "index" : {
      "number_of_shards"   : 1,
      "number_of_replicas" : 3
    }
  },

  "mappings": {
    "url": {
      "properties": {
        "url":   { "type": "string", "analyzer": "url_ngram", "boost": 10 },
        "title": { "type": "string", "analyzer": "snowball",  "boost": 5 }
        // ...
      }
    }
  }
}'

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Apply this configuration for every matching index being created

http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html

Index Templates
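To get a feel for which index names a pattern such as "users_*" would cover, the wildcard can be approximated with Ruby's glob matching. This is only an illustration: `template_applies?` is a hypothetical helper, and treating the template pattern like a shell glob is an assumption, not a statement about how ElasticSearch matches names internally.

```ruby
# Hypothetical helper: does an index name match a template pattern such as
# "users_*"? File.fnmatch provides simple glob matching on plain strings.
def template_applies?(pattern, index_name)
  File.fnmatch(pattern, index_name)
end

%w[users_john users_mary articles].each do |name|
  puts "#{name}: #{template_applies?('users_*', name)}"
end
```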

Page 57: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Node 1   Node 2   Node 3   Node 4   MASTER

Automatic Discovery Protocol

http://www.elasticsearch.org/guide/reference/modules/discovery/

$ cat elasticsearch.yml

cluster:
  name: <YOUR APPLICATION>

Page 58: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Index A

Shards      A1    A2    A3
Replicas    A1'   A2'   A3'
            A1''  A2''  A3''

curl -XPUT 'http://localhost:9200/A/' -d '
{
  "settings" : {
    "index" : {
      "number_of_shards"   : 3,
      "number_of_replicas" : 2
    }
  }
}'

Index is split into 3 shards, and duplicated in 2 replicas.
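The arithmetic behind the diagram: each of the 3 shards exists once as a primary and once per replica, so 3 × (1 + 2) = 9 shard copies are spread over the cluster. A short Ruby sketch that enumerates them with the diagram's labels; `shard_copies` is a hypothetical helper for illustration only.

```ruby
# Hypothetical helper: label every shard copy the way the diagram does
# (A1, A1', A1'', ...), given the index settings from the slide.
def shard_copies(index_name, number_of_shards, number_of_replicas)
  (1..number_of_shards).flat_map do |shard|
    # replica 0 is the primary; each further copy gets one more apostrophe
    (0..number_of_replicas).map { |replica| "#{index_name}#{shard}#{"'" * replica}" }
  end
end

copies = shard_copies('A', 3, 2)
puts copies.join(' ')
puts "#{copies.size} shard copies in total"
```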

Page 59: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

SHARDS: Improve indexing performance

REPLICAS: Improve search performance

Page 60: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Y U NO ASK FIRST???

Page 61: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Index as Resource / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Indexing 100 000 documents (~ 56MB), one shard, no replicas, MacBookAir SSD 2GB

# Index all at once
time curl -s -X POST "http://localhost:9200/_bulk" \
  --data-binary @data/bulk_all.json > /dev/null

real   2m1.142s

# Index in batches of 1000
for file in data/bulk_*.json; do
  time curl -s -X POST "http://localhost:9200/_bulk" \
    --data-binary @$file > /dev/null
done

real   1m36.697s (-25sec, 80%)

# Do not refresh during indexing in batches
"settings" : { "refresh_interval" : "-1" }

for file in data/bulk_*.json; do
...

real   0m38.859s (-82sec, 32%)
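The batch files used above can be produced in many ways; one possible approach in Ruby is sketched here. The Bulk API body is newline-delimited JSON: an action line followed by the document source, each terminated by a newline. The `bulk_payloads` helper, the sample documents and the index and type names are assumptions made for this illustration.

```ruby
require 'json'

# Hypothetical helper: turn documents into Bulk API payloads of at most
# batch_size documents each. Every document becomes two JSON lines:
# an action line and the document source.
def bulk_payloads(documents, index:, type:, batch_size: 1000)
  documents.each_slice(batch_size).map do |batch|
    batch.flat_map do |doc|
      [{ index: { _index: index, _type: type } }.to_json, doc.to_json]
    end.join("\n") + "\n"
  end
end

documents = (1..2500).map { |i| { title: "Document #{i}" } }
payloads  = bulk_payloads(documents, index: 'articles', type: 'article')

puts "#{payloads.size} batches"
puts payloads.first.lines.first(2)
```

Each payload could then be written to a file such as data/bulk_1.json and posted with `--data-binary`, which preserves the newlines the API depends on.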

Page 62: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Terms       apple
            apple iphone

Phrases     "apple iphone"

Proximity   "apple safari"~5

Fuzzy       apple~0.8

Wildcards   app*
            *pp*

Boosting    apple^10 safari

Range       [2011/05/01 TO 2011/05/31]
            [java TO json]

Boolean     apple AND NOT iphone
            +apple -iphone
            (apple OR iphone) AND NOT review

Fields      title:iphone^15 OR body:iphone
            published_on:[2011/05/01 TO "2011/05/27 10:00:00"]

http://lucene.apache.org/java/3_1_0/queryparsersyntax.html

$ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"
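Queries that use operators such as `+`, `^`, quotes or spaces need URL-encoding before they go into the `q` parameter; an unencoded `+` would arrive at the server as a space. A minimal Ruby sketch using the standard library; `search_url` is a hypothetical helper and the host is the local default used throughout the slides.

```ruby
require 'uri'

# Hypothetical helper: build a URI-search URL for a Lucene query string.
# URI.encode_www_form takes care of spaces and reserved characters.
def search_url(query, host: 'http://localhost:9200')
  "#{host}/_search?#{URI.encode_www_form(q: query)}"
end

puts search_url('apple AND NOT iphone')
puts search_url('+apple -iphone')
puts search_url('title:iphone^15 OR body:iphone')
```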

Page 63: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "query" : {
    "terms" : {
      "tags" : [ "ruby", "python" ],
      "minimum_match" : 2
    }
  }
}'

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Query DSL

http://www.elasticsearch.org/guide/reference/query-dsl/

JSON

Page 64: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

curl -X POST "http://localhost:9200/venues/venue" -d '
{
  "name": "Pizzeria",
  "pin": {
    "location": {
      "lat": 50.071712,
      "lon": 14.386832
    }
  }
}'

curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '
{
  "query" : {
    "filtered" : {
      "query" : { "query_string" : { "query" : "pizzeria" } },
      "filter" : {
        "geo_distance" : {
          "distance" : "0.5km",
          "pin.location" : { "lat" : 50.071481, "lon" : 14.387284 }
        }
      }
    }
  }
}'

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Accepted  formats  for  Geo:

[lon, lat] # Array

"lat,lon" # String

drm3btev3e86 # Geohash

Geo Search

http://www.elasticsearch.org/guide/reference/query-dsl/geo-distance-filter.html
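As a sanity check on the example, the distance between the indexed venue and the point used in the filter can be estimated with the haversine formula. This Ruby sketch is independent of ElasticSearch, whose own distance calculation may differ in detail; `distance_km` and the Earth radius constant are assumptions of this illustration.

```ruby
# Mean Earth radius, used for a great-circle (haversine) estimate.
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in kilometres between two lat/lon points.
def distance_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

venue  = [50.071712, 14.386832] # the indexed "Pizzeria"
origin = [50.071481, 14.387284] # the point used in the geo_distance filter

d = distance_km(*origin, *venue)
puts format('%.3f km', d)
puts "within 0.5km: #{d <= 0.5}"
```

The two points are only a few dozen metres apart, so the venue falls well inside the 0.5km radius.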

Page 65: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/

Query

Facets

Page 66: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "query" : {
    "query_string" : { "query" : "title:T*" }
  },
  "filter" : {
    "terms" : { "tags" : ["ruby"] }
  },
  "facets" : {
    "tags" : {
      "terms" : {
        "field" : "tags",
        "size" : 10
      }
    }
  }
}'

# "facets" : {
#   "tags" : {
#     "terms" : [ {
#       "term" : "ruby",
#       "count" : 2
#     }, {
#       "term" : "python",
#       "count" : 1
#     }, {
#       "term" : "java",
#       "count" : 1
#     } ]
#   }
# }

User query

“Checkboxes”

Facets

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html

Page 67: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "facets" : {
    "published_on" : {
      "date_histogram" : {
        "field"    : "published_on",
        "interval" : "day"
      }
    }
  }
}'

Page 68: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

Geo Facets

curl -X POST "http://localhost:9200/venues/_search?pretty=true" -d '
{
  "query" : { "query_string" : { "query" : "pizzeria" } },
  "facets" : {
    "distance_count" : {
      "geo_distance" : {
        "pin.location" : {
          "lat" : 50.071712,
          "lon" : 14.386832
        },
        "ranges" : [
          { "to" : 1 },
          { "from" : 1, "to" : 5 },
          { "from" : 5, "to" : 10 }
        ]
      }
    }
  }
}'

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

http://www.elasticsearch.org/guide/reference/api/search/facets/geo-distance-facet.html

Page 69: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

  def analyze content
    # >>> Split content by words into "tokens"
    content.split(/\W/).
    # >>> Downcase every word
    map { |word| word.downcase }.
    # ...
  end

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Remember?

curl -X DELETE "http://localhost:9200/articles"
curl -X POST   "http://localhost:9200/articles" -d '
{
  "mappings": {
    "article": {
      "properties": {
        "tags": {
          "type": "string",
          "analyzer": "keyword"
        },
        "content": {
          "type": "string",
          "analyzer": "snowball"
        },
        "title": {
          "type": "string",
          "analyzer": "snowball",
          "boost": 10.0
        }
      }
    }
  }
}'

curl  -­‐X  GET        'http://localhost:9200/articles/_mapping?pretty=true'

http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html

Page 70: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

curl -X DELETE "http://localhost:9200/urls"
curl -X POST   "http://localhost:9200/urls" -d '
{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "url_analyzer" : {
            "type" : "custom",
            "tokenizer" : "lowercase",
            "filter" : ["stop", "url_stop", "url_ngram"]
          }
        },
        "filter" : {
          "url_stop" : {
            "type" : "stop",
            "stopwords" : ["http", "https", "www"]
          },
          "url_ngram" : {
            "type" : "nGram",
            "min_gram" : 3,
            "max_gram" : 5
          }
        }
      }
    }
  }
}'

https://gist.github.com/988923
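What such an analyzer chain does to a URL can be approximated in a few lines of Ruby. This is a rough stand-in, not the Lucene implementation: it lowercases, splits on non-letters, drops the URL stopwords and emits 3 to 5 character n-grams. The default English "stop" filter is left out, the order of the emitted grams is not meant to match ElasticSearch, and the `ngrams` and `analyze_url` names are made up for the example.

```ruby
URL_STOPWORDS = %w[http https www].freeze

# All substrings of a token that are between min and max characters long.
def ngrams(token, min: 3, max: 5)
  (min..max).flat_map do |size|
    token.chars.each_cons(size).map(&:join)
  end
end

# Rough stand-in for the analyzer: lowercase, split on non-letters,
# drop the URL stopwords, then emit n-grams for each remaining token.
def analyze_url(url)
  url.downcase.split(/[^a-z]+/).
    reject { |token| token.empty? || URL_STOPWORDS.include?(token) }.
    flat_map { |token| ngrams(token) }
end

p analyze_url('http://www.ruby-lang.org')
```

Because fragments such as "rub" and "lang" end up in the index, a partial query can match the URL without any wildcard.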

Page 71: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Tire.index 'articles' do
  delete
  create

  store :title => 'One',   :tags => ['ruby'],           :published_on => '2011-01-01'
  store :title => 'Two',   :tags => ['ruby', 'python'], :published_on => '2011-01-02'
  store :title => 'Three', :tags => ['java'],           :published_on => '2011-01-02'
  store :title => 'Four',  :tags => ['ruby', 'php'],    :published_on => '2011-01-03'

  refresh
end

s = Tire.search 'articles' do
  query { string 'title:T*' }

  filter :terms, :tags => ['ruby']

  sort { title 'desc' }

  facet 'global-tags'  { terms :tags, :global => true }

  facet 'current-tags' { terms :tags }
end

http://github.com/karmi/tire
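The difference between the two facets can be illustrated without a running server. The sketch below counts tags over the four sample articles in plain Ruby, assuming that the global facet is counted over every document while the other is counted over the documents matching the query, and that the `filter` narrows the hits but not the facet counts. `terms_facet` is a hypothetical helper, not Tire's API.

```ruby
ARTICLES = [
  { title: 'One',   tags: %w[ruby] },
  { title: 'Two',   tags: %w[ruby python] },
  { title: 'Three', tags: %w[java] },
  { title: 'Four',  tags: %w[ruby php] }
].freeze

# A terms facet is essentially a count of each term over a set of documents.
def terms_facet(documents, field)
  documents.flat_map { |doc| doc[field] }.
    each_with_object(Hash.new(0)) { |term, counts| counts[term] += 1 }
end

# Plain-Ruby stand-in for the query 'title:T*'.
matching = ARTICLES.select { |doc| doc[:title].start_with?('T') }

p terms_facet(ARTICLES, :tags) # like 'global-tags': all documents
p terms_facet(matching, :tags) # like 'current-tags': query results only
```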

Page 72: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
end

http://github.com/karmi/tire

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Article.search do
  query { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort  { published_on 'desc' }
end

$ rake environment tire:import CLASS='Article'

Page 73: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

class Article
  include Whatever::ORM

  include Tire::Model::Search
  include Tire::Model::Callbacks
end

http://github.com/karmi/tire

HTTP / JSON / Schema Free / Distributed / Queries / Facets / Mapping / Ruby

ELASTICSEARCH FEATURES

Article.search do
  query { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort  { published_on 'desc' }
end

$ rake environment tire:import CLASS='Article'

Page 74: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

$ rails new tired -m "https://gist.github.com/raw/951343/tired.rb"

A “batteries included” installation. Downloads and launches ElasticSearch. Sets up a Rails application and launches it. When you're tired of it, just delete the folder.

Try ElasticSearch in a Ruby On Rails application with a one-line command

Page 75: Elastic Search: Beyond Ordinary Fulltext Search (Webexpo 2011 Prague)

Thanks!