Page 1: Elasticsearch Introduction at BigData meetup

Introduction to Elasticsearch

27th May 2014 - BigData Meetup

Eric Rodriguez @wavyx

Page 2

About Me

Eric Rodriguez, founder of data.be

• Web entrepreneur

• Data addict

• Multi-language: PHP, Java/Groovy/Grails, .Net, …

be.linkedin.com/in/erodriguez - github.com/wavyx - @wavyx

Page 3

Elasticsearch - Company

• Founded in 2012 => http://www.elasticsearch.com

• Professional services

• Training

• Consultancy / Development support

• Production support subscription (3 levels of SLAs)

Page 4

Enterprises using Elasticsearch

Page 5

(M)ELK Stack

• Elasticsearch - Search server based on Lucene

• Logstash - Tool for managing events and logs

• Kibana - Visualize logs and time-stamped data

• Marvel - Monitor your cluster’s heartbeat

You Know, for Search…

Page 6

Logstash

• Collect, parse, index, and search logs

Page 7

Kibana

• A versatile dashboard to see and interact with your data

Page 8

Marvel

• Monitor the health of your cluster: cluster-wide metrics, an overview of all nodes and indices, and events (master election, new nodes)

Page 9

Real-time search and analytics engine

open-source, Lucene, JSON, schema free, document store, RESTful API, documentation, scalability, high availability, distributed, multi-tenancy, per-operation persistence

Page 10

Use Cases

• Full-Text Search

• Data Store

• Analytics

• Alerts

• Ads

• …

Pages 11 to 17

Copyright 2014 Elasticsearch Inc / Elasticsearch BV. All rights reserved. Content used with permission from Elasticsearch.

Page 18

Elasticsearch core

• Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java

• Elasticsearch added value: “Simple is best”

  • Simple API (with documentation)

  • JSON & RESTful

  • Sharding & Replication

  • Extensibility: plugins and scripts

  • Interoperability: clients and integrations

Page 19

Terms for DBAs (Elasticsearch => RDBMS)

• Index => Database

• Type => Table

• Document => Row

• Field => Column

• Mapping => Schema

Page 20

Plug & Play

• Zero configuration

• 4 LoC to get started ;)

Page 21

Alive!

=> http://localhost:9200/?pretty
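The same check can be scripted. Below is a minimal sketch using only Python's standard library, assuming a node on the default port 9200 as in the URL above. `fetch_root` and `node_info` are hypothetical helper names, and the sample payload is hand-written in the shape of a 1.x root reply, not captured from a real node.

```python
import json
import urllib.request


def fetch_root(base_url="http://localhost:9200"):
    """GET the root endpoint and decode the JSON reply (needs a running node)."""
    with urllib.request.urlopen(base_url + "/?pretty") as resp:
        return json.loads(resp.read().decode("utf-8"))


def node_info(root):
    """Extract the node name, version number and tagline from a root-endpoint reply."""
    return root.get("name"), root.get("version", {}).get("number"), root.get("tagline")


# Offline example: a hand-written payload shaped like a 1.x reply (values are made up).
sample = json.loads(
    '{"status": 200, "name": "node-1",'
    ' "version": {"number": "1.2.0"}, "tagline": "You Know, for Search"}'
)
print(node_info(sample))
```

Against a live node, `node_info(fetch_root())` returns the same triple for that node.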

Page 22

REST

• Check your cluster, node, and index health, status, and statistics

• Administer your cluster, node, and index data and metadata

• Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes

• Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
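Each of these tasks is an HTTP verb plus a path, with an optional JSON body. A minimal sketch with Python's standard library that builds (but does not send) one request per category; `es_request`, the `blog` index, the `post` type and the local URL are assumptions for illustration.

```python
import json
import urllib.request


def es_request(method, path, body=None, base_url="http://localhost:9200"):
    """Build (without sending) a REST call; a dict body is serialized to JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


health = es_request("GET", "/_cluster/health?pretty")                       # health and status
settings = es_request("PUT", "/blog/_settings",
                      {"index": {"number_of_replicas": 1}})                 # administration
get_doc = es_request("GET", "/blog/post/1")                                 # CRUD: read one document
search = es_request("POST", "/blog/_search",
                    {"query": {"match_all": {}}, "from": 0, "size": 10})    # search with paging

for req in (health, settings, get_doc, search):
    print(req.get_method(), req.full_url)
```

Sending is then `urllib.request.urlopen(health)`, which raises `HTTPError` for 4xx/5xx replies.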

Page 23

Basic Operations 1/3

• Add a document

• Create index

Page 24

Basic Operations 2/3

• Modify/Replace a document

• Delete a document

• Delete index

Page 25

Basic Operations 3/3

• Update a document
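The three basic-operation slides boil down to a few REST calls. Below is a sketch that prints them as curl commands, assuming 1.x URL conventions (`/index/type/id`) and a local node; the `customer` index, `external` type, document fields and the `to_curl` helper are hypothetical.

```python
import json

BASE_URL = "http://localhost:9200"   # assumed local node


def to_curl(method, path, body=None):
    """Render one REST call as a curl command line (JSON body passed with -d)."""
    line = f"curl -X{method} '{BASE_URL}{path}'"
    if body is not None:
        line += f" -d '{json.dumps(body)}'"
    return line


operations = [
    ("PUT", "/customer", None),                                # create the index
    ("PUT", "/customer/external/1", {"name": "John Doe"}),     # add a document with id 1
    ("PUT", "/customer/external/1", {"name": "Jane Doe"}),     # same id again: replaces the whole document
    ("POST", "/customer/external/1/_update",
     {"doc": {"age": 20}}),                                    # update: merge "doc" into the stored document
    ("DELETE", "/customer/external/1", None),                  # delete the document
    ("DELETE", "/customer", None),                             # delete the index and all its documents
]

for op in operations:
    print(to_curl(*op))
```

The update call saves the client a round trip, but Elasticsearch still retrieves the stored document, merges the change and reindexes it as a whole.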

Page 26

Mapping 1/2

• Defines how a document is mapped (similar to a schema): searchable fields, tokenization, storage, …

• Explicit mapping is defined at the index/type level

• If none is given, a default mapping is created automatically

Page 27

Mapping 2/2

• Core types: string, integer/long, float/double, boolean, and null

• Other types: Array, Object, Nested, IP, GeoPoint, GeoShape, Attachment

• Example
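As an illustration of the types above, here is a mapping for a hypothetical `tweet` type in a hypothetical `twitter` index, written as a Python dict in 1.x syntax (string fields are analyzed unless marked `not_analyzed`). It would be the body of `PUT /twitter/_mapping/tweet`; treat it as a sketch, not a recommended schema. `field_types` is a hypothetical helper.

```python
import json

# Mapping for a hypothetical "tweet" type (1.x syntax), body of PUT /twitter/_mapping/tweet.
tweet_mapping = {
    "tweet": {
        "properties": {
            "message":   {"type": "string", "analyzer": "english"},     # analyzed full text
            "user":      {"type": "string", "index": "not_analyzed"},   # kept as one exact value
            "tags":      {"type": "string"},                            # one string or an array of strings
            "retweets":  {"type": "integer"},
            "posted":    {"type": "date"},
            "location":  {"type": "geo_point"},
            "client_ip": {"type": "ip"},
        }
    }
}


def field_types(mapping, doc_type):
    """Return {field: type} for the top-level properties of one type."""
    return {name: spec["type"] for name, spec in mapping[doc_type]["properties"].items()}


print(json.dumps(tweet_mapping, indent=2))
print(field_types(tweet_mapping, "tweet"))
```

Arrays need no special mapping: the `tags` field above accepts either a single string or a list of strings.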

Page 28

Search API 1/2

• Multi-index, Multi-type

• URI search: Google-like operators (AND/OR), fields, sort, paging, wildcards, …
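A URI search puts everything in the query string. The sketch below builds such a URL with Python's `urllib.parse.urlencode`; `uri_search`, the index names and the fields are hypothetical, and `q` takes Lucene query-string syntax. Several indices are searched at once by separating their names with commas.

```python
from urllib.parse import urlencode


def uri_search(indices, q, sort=None, from_=0, size=10, base_url="http://localhost:9200"):
    """Build a URI-search URL; `indices` may list several indices separated by commas."""
    params = {"q": q, "from": from_, "size": size}
    if sort:
        params["sort"] = sort          # "field:asc" or "field:desc"
    return f"{base_url}/{indices}/_search?{urlencode(params)}"


# Boolean operator, field-scoped terms and a wildcard, newest first, second page of 10.
url = uri_search("twitter,blog", "user:kimchy AND message:elast*", sort="posted:desc", from_=10)
print(url)
```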

Page 29

Search API 2/2

• Paging & Sort

• Fields: selection, scripts

• Post filter

• Highlighting

• Rescoring

• Explain

• …
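The request-body form exposes these options as JSON. A sketch of one body combining several of them, in 1.x syntax; the `twitter` index, the `message`, `user` and `posted` fields, and the `page_offset` helper are hypothetical.

```python
import json


def page_offset(page, size):
    """`from` is a zero-based hit offset, so page N (1-based) starts at (N - 1) * size."""
    return (page - 1) * size


body = {
    "from": page_offset(3, 10), "size": 10,            # paging: hits 21 to 30
    "sort": [{"posted": "desc"}, "_score"],             # newest first, then by relevance
    "query": {"match": {"message": "elasticsearch"}},
    "post_filter": {"term": {"user": "kimchy"}},        # filters hits only, after aggregations are computed
    "fields": ["user", "posted"],                       # return only these fields
    "highlight": {"fields": {"message": {}}},           # snippets with matching terms marked up
    "explain": True,                                    # include how each score was computed
}

print(json.dumps(body, indent=2))   # body for POST /twitter/_search
```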

Page 30

Query DSL

• The “SQL” of Elasticsearch

• Queries should be used

  • for full-text search

  • where the result depends on a relevance score

• Filters should be used

  • for binary yes/no searches

  • for queries on exact values
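In 1.x the two are combined with the `filtered` query: the full-text part is scored, while exact-value conditions go into a filter that only answers yes or no (and can be cached). A sketch with a hypothetical helper and hypothetical field names; later versions replaced `filtered` with the `filter` clause of a `bool` query.

```python
import json


def filtered_search(text, field, exact):
    """Scored full-text query plus yes/no exact-value filters (1.x "filtered" query)."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},      # contributes to _score
                "filter": {                             # match or no match, no score
                    "bool": {"must": [{"term": {k: v}} for k, v in exact.items()]}
                },
            }
        }
    }


body = filtered_search("quick brown fox", "title", {"status": "published", "lang": "en"})
print(json.dumps(body, indent=2))
```

Term filters compare exact values, so fields like `status` would normally be mapped as `not_analyzed`.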

Page 31

Basic Queries

Page 32

Basic Filters

Page 33

Analysis 1/2

• Analysis extracts “terms” from a given text

• It processes natural language to make it searchable by a computer

• A configurable registry of Analyzers is used

  • to break indexed (analyzed) fields into terms when a document is indexed

  • to process query strings
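Why the same analysis runs on both sides: terms are looked up exactly, so a query only matches if it is analyzed the way the documents were. A toy inverted index in Python, assuming a trivial lowercase-and-split analyzer (not one of Elasticsearch's), shows the effect; all names are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase, then split on every run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


inverted_index = defaultdict(set)   # term -> ids of the documents containing it


def index_doc(doc_id, text):
    for term in analyze(text):      # index time: analyze the field value
        inverted_index[term].add(doc_id)


def search(query):
    terms = analyze(query)          # query time: the same analyzer runs on the query string
    postings = [inverted_index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()   # every term must match (AND)


index_doc(1, "The Quick Brown Fox")
index_doc(2, "quick-thinking foxes")
print(search("QUICK fox"))
```

Here `fox` does not find document 2, because the toy analyzer does no stemming and `foxes` stays a separate term.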

Page 34

Analysis 2/2

• Analyzers are composed of

  • a single Tokenizer (may be preceded by one or more CharFilters)

  • zero or more TokenFilters

• Built-in analyzers: standard, pattern, whitespace, language, snowball
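The composition can be modeled as plain function chaining. A sketch in Python; the stage functions are simplified stand-ins named after Elasticsearch's `html_strip` char filter, `whitespace` tokenizer and `lowercase`/`stop` token filters, not their actual implementations, and `make_analyzer` is hypothetical.

```python
import re


def html_strip(text):                       # char filter: rewrites the raw character stream
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):             # tokenizer: exactly one per analyzer
    return text.split()


def lowercase(tokens):                      # token filter
    return [t.lower() for t in tokens]


def stop(tokens, stopwords=frozenset({"a", "an", "the", "of"})):   # token filter
    return [t for t in tokens if t not in stopwords]


def make_analyzer(tokenizer, char_filters=(), token_filters=()):
    """Char filters, then the tokenizer, then token filters, in that order."""
    def analyze(text):
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


custom = make_analyzer(whitespace_tokenizer, [html_strip], [lowercase, stop])
print(custom("<p>The Art of <b>Search</b></p>"))
```

In Elasticsearch the same chain would be declared in the index settings as a `custom` analyzer with `char_filter`, `tokenizer` and `filter` entries.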

Page 36

Analytics

• Aggregation of information: similar to “group by”

• Facets

  • Aggregated data based on a search query

  • One-dimensional results

  • Ex: “terms facets” return facet counts for the various values of a specific field (think color, tag, category, …)

• Aggregations (ES 1.0+)

  • Nested facets

  • Basic stats: mean, min, max, std dev, term counts

  • Significant terms, percentiles, cardinality estimations

Page 37

Facets

• Not yet deprecated, but use aggregations!

• Various facets: terms, range, histogram, date, statistical, geo distance, …

Page 38

Aggregations

• A generic, powerful framework that can be divided into 2 main families:

  • Bucketing: each bucket is associated with a key and a document criterion. The aggregation process provides a list of buckets, each one with a set of documents that "belong" to it.

  • Metric: aggregations that keep track of and compute metrics over a set of documents.

• Aggregations can be nested!
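To make the bucket/metric split concrete, here is what a `terms` bucket aggregation with a nested `avg` metric computes, done by hand in Python over a few made-up documents, next to the corresponding 1.x request body. Field names and the helper are hypothetical; ties in doc count are ordered by key only to keep the sketch deterministic.

```python
from collections import defaultdict

docs = [  # made-up documents
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "green", "price": 30000},
    {"color": "blue", "price": 15000},
    {"color": "red", "price": 30000},
]

# Equivalent 1.x request body: a terms bucket with an avg metric nested inside it.
request = {
    "size": 0,   # only the aggregation results, no hits
    "aggs": {"colors": {"terms": {"field": "color"},
                        "aggs": {"avg_price": {"avg": {"field": "price"}}}}},
}


def terms_with_avg(docs, bucket_field, metric_field):
    """Group docs by the value of bucket_field, then average metric_field inside each bucket."""
    buckets = defaultdict(list)                  # bucketing: key -> documents that "belong" to it
    for d in docs:
        buckets[d[bucket_field]].append(d)
    result = [
        {"key": key, "doc_count": len(members),
         "avg": sum(m[metric_field] for m in members) / len(members)}   # metric computed per bucket
        for key, members in buckets.items()
    ]
    return sorted(result, key=lambda b: (-b["doc_count"], b["key"]))   # largest buckets first


print(terms_with_avg(docs, "color", "price"))
```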

Page 39

Bucket Aggregators

• global

• filter

• missing

• terms

• range

• date range

• ip range

• histogram

• date histogram

• geo distance

• geohash grid

• nested

• reverse nested

• top hits (version 1.3)
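The `histogram` bucket, for instance, rounds each value down to a multiple of the interval and counts documents per key. A Python sketch for integer values; `histogram` is a hypothetical helper, and its `min_doc_count` argument mirrors the option of the same name (0 keeps the empty buckets between the lowest and highest key).

```python
from collections import Counter


def histogram(values, interval, min_doc_count=0):
    """Count integer values per bucket, where key = floor(value / interval) * interval."""
    counts = Counter((v // interval) * interval for v in values)   # // floors, also for negatives
    if not counts:
        return []
    keys = range(min(counts), max(counts) + interval, interval)   # every key from lowest to highest
    return [(k, counts[k]) for k in keys if counts[k] >= min_doc_count]


prices = [3, 12, 17, 41, 44, 48]
print(histogram(prices, 10))                    # empty buckets kept
print(histogram(prices, 10, min_doc_count=1))   # empty buckets dropped
```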

Page 40

Metrics Aggregators

• count

• stats

• extended stats

• cardinality

• percentiles

• min

• max

• sum

• avg
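`stats` and `extended_stats` return a fixed set of figures over a numeric field. A Python sketch of those figures for a plain list of numbers (variance is the population variance); the helper name is hypothetical, and the first five keys are the ones `stats` reports.

```python
import math


def extended_stats(values):
    """Figures returned by stats / extended_stats for a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    total = sum(values)
    avg = total / n
    sum_of_squares = sum(v * v for v in values)
    variance = max(sum_of_squares / n - avg * avg, 0.0)   # population variance; guard rounding below 0
    return {
        "count": n, "min": min(values), "max": max(values), "avg": avg, "sum": total,
        "sum_of_squares": sum_of_squares, "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```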

Page 41

Search for end users

• Suggesters: “Did you mean?” (Terms, Phrases, Completion, Context)

• “More like this”: find documents that are "like" the provided text by running it against one or more fields
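The term suggester proposes indexed terms within a small edit distance of the input (its `max_edits` option accepts 1 or 2). A simplified Python sketch using plain Levenshtein distance over a hypothetical term list; Elasticsearch also takes document frequency into account when ranking suggestions, which this sketch ignores.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = cur
    return prev[-1]


def did_you_mean(word, vocabulary, max_edits=2):
    """Closest indexed term within max_edits, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda t: (levenshtein(word, t), t), default=None)
    if best is None or levenshtein(word, best) > max_edits:
        return None
    return best


terms = ["elasticsearch", "lucene", "logstash", "kibana"]
print(did_you_mean("elasticsaerch", terms))
```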

Page 42

Percolator

• Classic ES

1. Add & Index documents

2. Search with queries

3. Retrieve matching documents

• Percolator

1. Add & Index queries

2. Percolate documents

3. Retrieve matching queries
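The inversion can be modeled locally: store the queries, then run each new document against all of them. In 1.x the queries are registered in an index under the special `.percolator` type and documents are sent to the `_percolate` endpoint; the Python sketch below only models the matching step, with hypothetical "all these terms must appear" rules standing in for real queries.

```python
# Registered "queries": each is a set of terms that must all appear in a document (hypothetical rules).
registered = {
    "bonsai-alert": {"bonsai", "tree"},
    "price-watch": {"laptop", "discount"},
    "any-tree": {"tree"},
}


def tokens(text):
    """Lowercase whitespace tokenization, enough for this sketch."""
    return set(text.lower().split())


def percolate(doc_text, queries):
    """Classic search runs one query over many docs; percolation runs one doc over many stored queries."""
    doc_terms = tokens(doc_text)
    return sorted(qid for qid, required in queries.items() if required <= doc_terms)


print(percolate("A new bonsai tree in the office", registered))
```

For the enrichment use case, the returned ids could be stored back on the document as tags.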

Page 43

Why Percolate?!

• Alerts: social media mentions, weather forecast, news alerts

• Automatic Monitoring: price monitoring, stock alerts, logs

• Ads: display targeted ads based on user’s search queries

• Enrich: percolate new documents, then add query matches as document tags

Page 44

High Availability 1/2

• Sharding - Write Scalability

  • Split logical data over multiple machines & control data flows

  • Each index has a fixed number of primary shards, set when the index is created

  • Improves indexing performance

• Replication - Read Scalability

  • Each shard can have zero or more replicas (adjustable at any time)

  • Removes the SPOF (Single Point Of Failure)

  • Improves search performance
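Why the shard count is fixed while replicas are not: a document's primary shard is chosen as hash(routing value) modulo the number of primary shards, with the document id as the default routing value. A Python sketch with `zlib.crc32` standing in for Elasticsearch's own hash function (an assumption); changing the modulus moves most documents to other shards, which is why changing it would mean reindexing.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """Primary shard = hash(routing) % number of primaries (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(primaries, replicas):
    """Each primary gets `replicas` extra copies, placed on nodes other than its primary's."""
    return primaries * (1 + replicas)


ids = [f"doc-{i}" for i in range(1000)]
placement = {i: shard_for(i, 5) for i in ids}
moved = sum(shard_for(i, 6) != placement[i] for i in ids)   # documents that would change shard
print(total_shard_copies(5, 1), moved)
```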

Page 45

High Availability 2/2

• Zen Discovery

  • Automatic discovery of the nodes within a cluster and election of a master node

  • Useful for failover and replication

  • Specific modules: Amazon EC2, Microsoft Azure, Google Compute Engine

• Snapshot & Restore module
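Master election is only safe against a "split brain" when a node needs to see a majority of the master-eligible nodes. In 1.x this is the `discovery.zen.minimum_master_nodes` setting, commonly set to (master-eligible nodes / 2) + 1. A small Python sketch of that arithmetic, with hypothetical helper names:

```python
def minimum_master_nodes(master_eligible):
    """Quorum size: more than half of the master-eligible nodes."""
    return master_eligible // 2 + 1


def can_elect_master(visible, master_eligible):
    """A group of nodes may elect a master only if it sees at least a quorum of master-eligible nodes."""
    return visible >= minimum_master_nodes(master_eligible)


# A 3-node cluster split into partitions of 2 and 1: only the larger side can elect a master.
print(minimum_master_nodes(3), can_elect_master(2, 3), can_elect_master(1, 3))
```

With only 2 master-eligible nodes the quorum is 2, so losing either node blocks election; this is why 3 is the usual minimum.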

Page 46

Cluster Management

• Marvel - http://www.elasticsearch.org/overview/marvel/

• BigDesk - http://bigdesk.org/

• Paramedic - https://github.com/karmi/elasticsearch-paramedic

• KOPF - https://github.com/lmenezes/elasticsearch-kopf/

• Elastic HQ - http://www.elastichq.org/

Page 47

Clients & Integration

• Ecosystem: Kibana, Logstash, Marvel, Hadoop integration

• API Clients: Java, Javascript, Groovy, PHP, Perl, Python, .Net, Ruby, Scala, Clojure, Go, Erlang, …

• Integrations: Grails, Django, Play!, Symfony2, Carrot2, Spring, Drupal, Wordpress, …

• Rivers: CouchDB, JDBC, MongoDB, Neo4j, Redis, RabbitMQ, ActiveMQ, Amazon SQS, File System, Twitter, Wikipedia, RSS, …

Page 48

Fast & Furious Evolution

Version 1.0 - Feb 12, 2014

• Aggregations

• Snapshot & Restore

• Distributed Percolator

• Cat API

• Federated search

• Doc values

• Circuit breaker

Version 1.1 - March 25, 2014

• Cardinality Agg

• Percentiles Agg

• Significant Terms Agg

• Search Templates

• Cross fields search

• Alias for indices & templates

Version 1.2 - May 22, 2014

• Java 7

• Indexing & Merging performance

• Aggregations performance

• Context suggester

• Deep scrolling

• Field value factor

Benchmark API coming in 1.3

Page 49

Resources

• http://www.elasticsearch.org/guide/

• http://www.elasticsearch.org/videos/

• http://www.elasticsearchtutorial.com/

• http://exploringelasticsearch.com/

• http://joelabrahamsson.com/elasticsearch-101/

• http://belczyk.com/2014/01/elasticsearch-recomended-learning-materials/

• http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html

Page 50

Books

• Elasticsearch Server - http://www.packtpub.com/elasticsearch-server-2e/book

• Elasticsearch in Action - http://www.manning.com/hinman/

Page 51

Books

• Elasticsearch Cookbook - http://www.packtpub.com/elasticsearch-cookbook/book

• Mastering Elasticsearch - http://www.packtpub.com/mastering-elasticsearch-querying-and-data-handling/book

Page 52

Books

• Elasticsearch: The Definitive Guide - http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/

Page 53

Thank you!

[email protected] - @wavyx

be.linkedin.com/in/erodriguez - github.com/wavyx

http://www.meetup.com/ElasticSearch-User-Group-Belux-Belgium-Luxembourg/

