Elasticsearch - Inlogiq · 2017-01-11


Post on 04-Aug-2020


Elasticsearch

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements.

Here are a few sample use-cases that Elasticsearch could be used for:

You want to collect log or transaction data and you want to analyze and mine this data to look for trends, statistics, summarizations, or anomalies. In this case, you can use Logstash (part of the Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse your data, and then have Logstash feed this data into Elasticsearch. Once the data is in Elasticsearch, you can run searches and aggregations to mine any information that is of interest to you.

You have analytics/business-intelligence needs and want to quickly investigate, analyze, visualize, and ask ad-hoc questions on a lot of data (think millions or billions of records). In this case, you can use Elasticsearch to store your data and then use Kibana (part of the Elasticsearch/Logstash/Kibana stack) to build custom dashboards that can visualize aspects of your data that are important to you. Additionally, you can use the Elasticsearch aggregations functionality to perform complex business intelligence queries against your data.

Requisites:

Linux-based OS
Java 8+

Downloading and running Elasticsearch (use this link to install and run Elasticsearch)

Using the REST API in console

Once you have an instance of Elasticsearch up and running you can talk to it using its JSON-based REST API residing at localhost port 9200. You can use any HTTP client to talk to it. In Elasticsearch's own documentation all examples use curl, which makes for concise examples. However, when playing with the API you may find a graphical client such as Fiddler or RESTClient more convenient.

Once Elasticsearch is installed and running, copy and paste this command into a terminal and execute it:

Simple REST request

curl -XPOST "http://localhost:9200/_search" -d'{ "query": { "match_all" :{} }}'

The above request will perform the simplest of search queries, matching all documents in all indexes on the server. Running it against a vanilla installation of Elasticsearch produces the following response, as there aren't any indexes.

Response

{ "took":1, "timed_out":false, "_shards":{ "total":0, "successful":0, "failed":0 }, "hits":{ "total":0, "max_score":0.0, "hits":[] }}

CRUD

While we may want to use Elasticsearch primarily for searching, the first step is to populate an index with some data, meaning the "Create" of CRUD, or rather, "indexing". While we're at it we'll also look at how to update, read and delete individual documents.

Indexing

In Elasticsearch, indexing corresponds to both "Create" and "Update" in CRUD: if we index a document with a given type and ID that doesn't already exist, it's inserted. If a document with the same type and ID already exists, it's overwritten.

In order to index a first JSON object we make a PUT request to the REST API, to a URL made up of the index name, type name and ID. That is: http://localhost:9200/<index>/<type>/[<id>].

Index and type are required while the ID part is optional. If we don't specify an ID, Elasticsearch will generate one for us. However, if we don't specify an ID we should use POST instead of PUT.
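The URL pattern and the PUT-versus-POST rule can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name index_request and the ES_URL variable are hypothetical (they are not part of Elasticsearch), and the helper prints the method and URL rather than sending anything.

```shell
# Base address of the local node used throughout this tutorial.
ES_URL="http://localhost:9200"

# index_request INDEX TYPE [ID]
# Hypothetical helper: prints the HTTP method and URL for an indexing request.
# With an ID we PUT to <index>/<type>/<id>; without one we POST to
# <index>/<type> and let Elasticsearch generate the ID.
index_request() {
    if [ -n "${3:-}" ]; then
        printf 'PUT %s/%s/%s/%s\n' "$ES_URL" "$1" "$2" "$3"
    else
        printf 'POST %s/%s/%s\n' "$ES_URL" "$1" "$2"
    fi
}

index_request movies movie 1
index_request movies movie
```

The first call prints the method and URL for a document with a chosen ID, the second the variant where the ID is left to the server.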

The index name is arbitrary. If there isn't an index with that name on the server already one will be created using default configuration.

As for the type name it too is arbitrary. It serves several purposes, including:

Each type has its own ID space.
Different types can have different mappings ("schema" that defines how properties/fields should be indexed).
Although it's possible, and common, to search over multiple types, it's easy to search only for one or more specific type(s).

Let's index something! We can put just about anything into our index as long as it can be represented as a single JSON object. In this tutorial we'll be indexing and searching for movies. Here's a classic one:

Indexing

{ "title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}

To index that we decide on an index name ("movies"), a type name ("movie") and an ID ("1") and make a request following the pattern described above with the JSON object in the body.

Indexing request

curl -XPUT "http://localhost:9200/movies/movie/1" -d'{ "title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}'

After executing the request we receive a response from Elasticsearch in the form of a JSON object. [Screenshot taken from RESTClient]

The response object contains information about the indexing operation, such as whether it was successful ("ok") and the document's ID, which can be of interest if we don't specify that ourselves.

If we now run the default search request which we used before creating the index, we'll see a different result.

Instead of no result, we're seeing a search result. We'll get to searching later, but for now let's rejoice in the fact that we've indexed something!

Updating the index

Now that we've got a movie in our index let's look at how we can update it, adding a list of genres to it. In order to do that we simply index it again using the same ID. In other words, we make the exact same indexing request as before but with an extended JSON object containing genres.

Curl to update the index

curl -XPUT "http://localhost:9200/movies/movie/1" -d'{ "title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972, "genres": ["Crime", "Drama"]}'

The response from Elasticsearch is the same as before with one difference: the _version property in the result object has the value two instead of one.

The version number can be used to track how many times a document has been indexed. Its primary purpose, however, is to allow for optimistic concurrency control, as we can supply a version in indexing requests as well. With external versioning, Elasticsearch will then only overwrite the document if the supplied version is higher than what's in the index.
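To make the version check concrete, here is a sketch of the comparison described above, together with a URL that would carry the version. It assumes external versioning, where the version and version_type=external URL parameters are supplied with the indexing request. The helper names would_overwrite and versioned_url are hypothetical, and nothing is sent to a server.

```shell
# would_overwrite STORED SUPPLIED
# Mirrors the rule described above: the write is accepted only when the
# supplied version is higher than the one already in the index.
would_overwrite() {
    [ "$2" -gt "$1" ]
}

# versioned_url ID VERSION
# Builds the indexing URL for our movie with an explicit external version.
versioned_url() {
    printf 'http://localhost:9200/movies/movie/%s?version=%s&version_type=external\n' "$1" "$2"
}

# The index holds version 2: supplying 3 is accepted, supplying 2 is not.
if would_overwrite 2 3; then
    echo "accepted: $(versioned_url 1 3)"
fi
if ! would_overwrite 2 2; then
    echo "rejected: version 2 is not higher than 2"
fi
```

In a real request the printed URL would be the target of the same curl -XPUT call used for indexing, with the document in the body.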

Getting by ID

We've so far covered indexing new documents as well as updating existing ones. We've also seen an example of a simple search request and that our indexed movie appeared in that.

While it's possible to search for documents in the index, that's overkill if we only want to retrieve a single one with a known ID. A simpler and faster approach would be to retrieve it by ID, using GET.

In order to do that we make a GET request to the same URL as when we indexed it, only this time the ID part of the URL is mandatory.

In other words, in order to retrieve a document by ID from Elasticsearch we make a GET request to http://localhost:9200/<index>/<type>/<id>.

Let's try it with our movie using the following request:

Getting by ID

curl -XGET "http://localhost:9200/movies/movie/1" -d''

As you can see, the result object contains similar metadata as we saw when indexing, such as index, type and version information. Last but not least it has a property named "_source" which contains the actual document.

There's not much more to say about GET as it's pretty straightforward. Let's move on to the final CRUD operation.

Deleting documents

In order to remove a single document from the index by ID we again use the same URL as for indexing and getting it, only this time we change the HTTP method to DELETE.

Deleting a document

curl -XDELETE "http://localhost:9200/movies/movie/1" -d''

The response object contains some of the usual suspects in terms of metadata, along with a property named "_found" indicating that the document was indeed found and that the operation was successful.

If we, after executing the DELETE call, switch back to GET we can verify that the document has indeed been deleted.
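One way to script that verification is to look only at the HTTP status code of the GET request. The sketch below is an illustration, not the tutorial's own method: the helper names doc_status and is_missing are hypothetical, and doc_status needs a running node, so it is defined here but only called in the commented example.

```shell
# doc_status INDEX TYPE ID
# Hypothetical helper: GET a document by ID and print only the HTTP status
# code (-s silences progress output, -o discards the body, -w prints the code).
doc_status() {
    curl -s -o /dev/null -w '%{http_code}' "http://localhost:9200/$1/$2/$3"
}

# is_missing STATUS
# Elasticsearch answers a GET for a document that does not exist with 404.
is_missing() {
    [ "$1" = "404" ]
}

# Example (requires a running node):
#   is_missing "$(doc_status movies movie 1)" && echo "document is gone"
if is_missing 404; then
    echo "a 404 status means the document is gone"
fi
```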

Searching

So, we've covered the basics of working with data in an Elasticsearch index and it's time to move on to more exciting things - searching. However, considering the last thing we did was to delete the only document we had from our index, we'll first need some sample data. Below is a number of indexing requests that we'll use.

Indexing requests

curl -XPUT "http://localhost:9200/movies/movie/1" -d'{ "title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972, "genres": ["Crime", "Drama"]}'

curl -XPUT "http://localhost:9200/movies/movie/2" -d'{ "title": "Lawrence of Arabia", "director": "David Lean", "year": 1962, "genres": ["Adventure", "Biography", "Drama"]}'

curl -XPUT "http://localhost:9200/movies/movie/3" -d'{ "title": "To Kill a Mockingbird", "director": "Robert Mulligan", "year": 1962, "genres": ["Crime", "Drama", "Mystery"]}'

curl -XPUT "http://localhost:9200/movies/movie/4" -d'{ "title": "Apocalypse Now", "director": "Francis Ford Coppola", "year": 1979, "genres": ["Drama", "War"]}'

curl -XPUT "http://localhost:9200/movies/movie/5" -d'{ "title": "Kill Bill: Vol. 1", "director": "Quentin Tarantino", "year": 2003, "genres": ["Action", "Crime", "Thriller"]}'

curl -XPUT "http://localhost:9200/movies/movie/6" -d'{ "title": "The Assassination of Jesse James by the Coward Robert Ford", "director": "Andrew Dominik", "year": 2007, "genres": ["Biography", "Crime", "Drama"]}'

The _search endpoint

Now that we have put some movies into our index, let's see if we can find them again by searching. In order to search with Elasticsearch we use the _search endpoint, optionally with an index and type. That is, we make requests to a URL following this pattern:

<index>/<type>/_search where index and type are both optional.

In other words, in order to search for our movies we can make POST requests to either of the following URLs:

http://localhost:9200/_search - Search across all indexes and all types.
http://localhost:9200/movies/_search - Search across all types in the movies index.
http://localhost:9200/movies/movie/_search - Search explicitly for documents of type movie within the movies index.

As we only have a single index and a single type, it doesn't matter which one we use. We'll use the first URL for the sake of brevity.
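The three URL variants follow mechanically from the pattern, which a short shell sketch can show. The helper name search_url and the ES_URL variable are hypothetical, chosen for illustration only, and the helper just prints the URL.

```shell
# Base address of the local node used throughout this tutorial.
ES_URL="http://localhost:9200"

# search_url [INDEX] [TYPE]
# Hypothetical helper: prints the _search URL, adding the index and type
# segments only when they are given.
search_url() {
    url="$ES_URL"
    if [ -n "${1:-}" ]; then url="$url/$1"; fi
    if [ -n "${2:-}" ]; then url="$url/$2"; fi
    printf '%s/_search\n' "$url"
}

search_url                # all indexes, all types
search_url movies         # all types in the movies index
search_url movies movie   # only type movie in the movies index
```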

Search request body and Elasticsearch's query DSL

If we simply send a request to one of the above URLs we'll get all of our movies back. In order to make a more useful search request we also need to supply a request body with a query. The request body should be a JSON object which, among other things, can contain a property named "query" in which we can use Elasticsearch's query DSL.

Query DSL

{
    "query": {
        //Query DSL here
    }
}

Basic free text search

The query DSL features a long list of different types of queries that we can use. For "ordinary" free text search we'll most likely want to use one called "query string query".

A query string query is an advanced query with a lot of different options that Elasticsearch will parse and transform into a tree of simpler queries. Still, it can be very easy to use if we ignore all of its optional parameters and simply feed it a string to search for.

Let's try a search for the word "kill" which is present in the title of two of our movies:
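The request for this search is not reproduced in the text. Judging from the later description of it as a query string query with only a single property, "query", it presumably looked like the sketch below. The helper name search_command is hypothetical, and the Content-Type header is an addition that is not in the tutorial's own commands: Elasticsearch 6.0 and later reject request bodies sent without it. The helper prints the curl command instead of running it, so the sketch works without a server.

```shell
# Request body: a query string query with only the "query" property set.
body='{ "query": { "query_string": { "query": "kill" } } }'

# search_command BODY
# Hypothetical helper: prints the curl command that would send BODY to the
# _search endpoint. Paste the printed line into a terminal to execute it.
search_command() {
    printf "curl -XPOST 'http://localhost:9200/_search' -H 'Content-Type: application/json' -d '%s'\n" "$1"
}

search_command "$body"
```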

As expected we're getting two hits, one for each of the movies with the word "kill" in the title. Let's look at another scenario, searching in specific fields.

Specifying fields to search in

In the previous example we used a very simple query, a query string query with only a single property, "query". As mentioned before, the query string query has a number of settings that we can specify, and if we don't it will use sensible default values.

One such setting is called "fields" and can be used to specify a list of fields to search in. If we don't use that, the query will default to searching in a special field called "_all" that Elasticsearch automatically generates based on all of the individual fields in a document.

Let's try to search for movies only by title. That is, if we search for "ford" we want to get a hit for "The Assassination of Jesse James by the Coward Robert Ford" but not for either of the movies directed by Francis Ford Coppola.

In order to do that we modify the previous search request body so that the query string query has a fields property with an array of fields we want to search in:

Searching with fields

curl -XPOST "http://localhost:9200/_search" -d'{ "query": { "query_string": { "query": "ford", "fields": ["title"] } }}'

Filtering

We've covered a couple of simple free text search queries above. Let's look at another one where we search for "drama" without explicitly specifying fields:

Querying without specifying fields

curl -XPOST "http://localhost:9200/_search" -d'{ "query": { "query_string": { "query": "drama" } }}'

As we have five movies in our index containing the word "drama" in the _all field (from the genres field), we get five hits for the above query.

Now, imagine that we want to limit the hits to movies of genre "Drama" released in 1962. In order to do that we need to apply a filter requiring the "year" field to equal 1962.
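The request itself is missing at this point. A sketch of what it might look like follows, assuming the same bool query with a filter clause that the match_all example further down uses, with the query string query for "drama" in the must clause. The helper name search_command is hypothetical, the Content-Type header is an addition needed by Elasticsearch 6.0 and later, and the command is printed rather than executed.

```shell
# Request body: free text search for "drama", restricted by a term filter
# that requires the "year" field to equal 1962.
body='{
    "query": {
        "bool": {
            "must": { "query_string": { "query": "drama" } },
            "filter": { "term": { "year": 1962 } }
        }
    }
}'

# search_command BODY
# Hypothetical helper: prints the curl command that would send BODY to the
# _search endpoint. Paste the printed command into a terminal to execute it.
search_command() {
    printf "curl -XPOST 'http://localhost:9200/_search' -H 'Content-Type: application/json' -d '%s'\n" "$1"
}

search_command "$body"
```

With the sample data indexed earlier, this should leave only the two dramas from 1962, "Lawrence of Arabia" and "To Kill a Mockingbird".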

Filtering without a query

In the above example we limit the results of a query string query using a filter. What if all we want to do is apply a filter? That is, we want all movies matching certain criteria.

In such cases we still use the "query" property in the search request body, which expects a query. In other words, we can't just add a filter; we need to wrap it in some sort of query.

One solution for doing this is to modify our current search request, replacing the query string query in the bool query with a match_all query, which is a query that simply matches everything. Like this:

Filter with match_all

curl -XPOST "http://localhost:9200/_search" -d'{ "query": { "bool": { "must": { "match_all": {} }, "filter": { "term": { "year": 1962 } } } }}'

Another, simpler option is to use a constant score query:

Constant score

curl -XPOST "http://localhost:9200/_search" -d'{ "query": { "constant_score": { "filter": { "term": { "year": 1962 } } } }}'