search and analyze your data with elasticsearch

Post on 13-Apr-2017


SEARCH AND ANALYZE YOUR DATA WITH ELASTICSEARCH

Anton Udovychenko

JEEConf May 20, 2016

ABOUT ME

• Software Architect @ Levi9
• 8+ years of Java experience
• Passionate about agile methodology and clean code

http://ua.linkedin.com/in/antonudovychenko

http://www.slideshare.net/antonudovychenko

AGENDA

• Why does search matter to you
• Why Elasticsearch
• Basic Concepts
• Comparison with SQL
• Elasticsearch usage
• Elasticsearch and Java
• Q&A

WHY DOES SEARCH MATTER TO YOU

WHAT IS IT ABOUT

Elasticsearch is a distributed, open source, document-oriented, schema-free, RESTful, full-text search and analytics engine, designed for horizontal scalability and high availability.

WHY ELASTICSEARCH

• Open source: Apache 2.0 License
• Document-oriented: { "title": "My blogpost", "body": "Having a lot of text...", "user": "es_user", "postDate": "2016-01-01 15:03:32" }
• RESTful: REST API

WHY ELASTICSEARCH - ALTERNATIVES

Lucene
+ More fine-grained control
– Complex logic (no additional level of abstraction)
= Elasticsearch is based on Lucene

Sphinx
+ Faster on a cold start
+ Occupies less memory
– Proprietary protocol
– Real-time caveats
– Difficult to go to cloud
– More difficult to start using
– Smaller community
= Not Java-based (C++)

Solr
+ Truly open-source
+ Primary support of Hadoop distributors
+ ZooKeeper is more mature than Zen
– More difficult to start using
– SolrCloud (vs ES out of the box)
– ZooKeeper is harder to use than Zen
– Worse operational tools
– Worse monitoring tools
– Worse analytical abilities
= Near real-time search
= Similar performance

BASIC CONCEPTS

• Near realtime
• Cluster
• Node
• Index
• Type
• Document
• Shards and replicas

BASIC CONCEPTS

[Diagram, built up over four slides: Cluster, three Nodes, eight Shards, Index]

BASIC CONCEPTS

[Diagram: a Shard is a Lucene index made up of Segments (four shown)]

BASIC CONCEPTS
Segment core

Inverted index
Term | Freq | DocIds
brown | 2 | 0,1
dog | 2 | 0,1
fox | 2 | 0,1
in | 1 | 1
jump | 2 | 0,1
lazy | 2 | 0,1
over | 2 | 0,1
quick | 2 | 0,1
summer | 1 | 1
the | 2 | 0,1

Document store
DocId | Fields
0 | Text: The quick brown fox jumped over the lazy dog; Author: Bob
1 | Text: Quick brown foxes leap over lazy dogs in summer; Author: Bill

Column store
DocId | Shared | Likes
0 | 210 | 59
1 | 90 | 23

BASIC CONCEPTS
Segment core

Search term: Leaping brown Fox
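The segment-core slides can be illustrated with a small sketch. The Python below is a toy model, not how Lucene actually stores segments: it builds an inverted index over the two example documents and looks up the search term from the slide. The `NORMALIZE` table (a stand-in for stemming and synonym handling), the function names, and the AND semantics of `search` are all assumptions made for this illustration.

```python
from collections import defaultdict

# Toy normalization table (an assumption for this sketch): real analyzers
# use tokenizers, stemmers and synonym filters instead of a fixed dict.
NORMALIZE = {
    "jumped": "jump", "leap": "jump", "leaping": "jump",
    "foxes": "fox", "dogs": "dog",
}


def analyze(text):
    """Lowercase, split on whitespace, and map each token to its normalized term."""
    return [NORMALIZE.get(token, token) for token in text.lower().split()]


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents containing every analyzed query term (AND semantics)."""
    result = None
    for term in analyze(query):
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# The two example documents from the slide (text field only).
DOCS = [
    "The quick brown fox jumped over the lazy dog",
    "Quick brown foxes leap over lazy dogs in summer",
]
INDEX = build_inverted_index(DOCS)
```

Under these assumptions, `search(INDEX, "Leaping brown Fox")` matches both documents, because "Leaping", "leap" and "jumped" are all normalized to the same term "jump" before the lookup.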

COMPARISON WITH SQL

SQL | Elasticsearch
Database | Index
Table | Type
Row | Document with properties
Column | Field

COMPARISON WITH SQL

id | title | body | user | postDate
1 | My first blogpost | Having a lot of text... | es_user | 2016-01-01 15:03:32
2 | About search | The search data sometimes has a peculiar property… | es_user | 2016-01-01 19:22:03
3 | Introduction to Elasticsearch | Once I have stumbled upon this idea… | es_user | 2016-01-03 11:55:41

COMPARISON WITH SQL

SQL:
  CREATE DATABASE blog;
  USE blog;
  CREATE TABLE post(
    id bigint(20) AUTO_INCREMENT,
    title varchar(250),
    body text,
    user varchar(50),
    postDate timestamp,
    PRIMARY KEY(id)
  );

Elasticsearch (the mapping is not obligatory):
  POST http://localhost:9200/blog
  {
    "mappings": {
      "post": {
        "properties": {
          "title": { "type": "string" },
          "body": { "type": "string" },
          "user": { "type": "string" },
          "postDate": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
        }
      }
    }
  }

COMPARISON WITH SQL (CREATE)

SQL:
  INSERT INTO post(title, body, user, postDate)
  VALUES('My blogpost', 'Having a lot of text...', 'es_user', '2016-01-01 15:03:32');

Elasticsearch:
  POST http://localhost:9200/blog/post
  { "title": "My blogpost", "body": "Having a lot of text...", "user": "es_user", "postDate": "2016-01-01 15:03:32" }

COMPARISON WITH SQL (UPDATE)

SQL:
  UPDATE post SET title='My blogpost' WHERE id=1;

Elasticsearch:
  POST http://localhost:9200/blog/post/1/_update
  { "doc": { "title": "My blogpost" } }

COMPARISON WITH SQL (DELETE)

SQL:
  DELETE FROM post WHERE id=1

Elasticsearch:
  DELETE http://localhost:9200/blog/post/1

COMPARISON WITH SQL (READ)

SQL: SELECT * FROM post WHERE id=1
Elasticsearch: GET http://localhost:9200/blog/post/1

SQL: SELECT * FROM post
Elasticsearch: GET http://localhost:9200/blog/post/_search

SQL: SELECT * FROM post WHERE user='es_user'
Elasticsearch: GET http://localhost:9200/blog/post/_search?q=user:es_user

COMPARISON WITH SQL (READ)

SQL:
  SELECT * FROM post WHERE body LIKE '%Having %';

Elasticsearch:
  POST http://localhost:9200/blog/post/_search
  { "query": { "match": { "body": "Having" } } }
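The REST calls from the comparison slides can be put together in a few lines of code. The sketch below uses only Python's standard library and builds the requests without sending them; the helper name `es_request` is hypothetical, and the URLs and bodies are the ones shown on the slides (a node at `localhost:9200` with a `blog` index and `post` type is assumed).

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # node address used on the slides


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body.

    Hypothetical helper for this sketch, not part of any Elasticsearch client.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# CREATE: INSERT INTO post(...) VALUES(...)
create = es_request("POST", "/blog/post", {
    "title": "My blogpost",
    "body": "Having a lot of text...",
    "user": "es_user",
    "postDate": "2016-01-01 15:03:32",
})

# UPDATE: UPDATE post SET title='My blogpost' WHERE id=1
update = es_request("POST", "/blog/post/1/_update",
                    {"doc": {"title": "My blogpost"}})

# READ: SELECT * FROM post WHERE id=1
read = es_request("GET", "/blog/post/1")

# DELETE: DELETE FROM post WHERE id=1
delete = es_request("DELETE", "/blog/post/1")

# READ: SELECT * FROM post WHERE body LIKE '%Having %'
search = es_request("POST", "/blog/post/_search",
                    {"query": {"match": {"body": "Having"}}})
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running node. Keeping construction separate from sending makes the mapping from each SQL statement to an HTTP method, URL and JSON body easy to inspect.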

DEMO TIME

ELASTICSEARCH AND JAVA

• Native Java client
• Spring Data Elasticsearch
• REST endpoints
• Jest (https://github.com/searchbox-io/Jest)

https://github.com/terrafant/es-feeder

DEMO TIME

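ELASTICSEARCH USAGE

[Diagram: the Application sends SQL requests to the DB through JDBC, and talks to the Elasticsearch cluster through an ES client, either over REST (JSON) or over the native protocol (binary)]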

ELASTICSEARCH USAGE (DETAILS)

[Diagram: a load balancer, three client nodes, one master node, two master-eligible nodes, and twelve data nodes in an Elasticsearch cluster]

ELASTICSEARCH USAGE (ELK)

[Diagram labels: Frontend (Browser), Backend, DB, Logstash (three instances), Broker, Elasticsearch, Kibana]

TOP 10 PRODUCTION RECOMMENDATIONS

1. Take care of security
2. Avoid split-brain
3. Use dedicated master nodes
4. Use unicast (not multicast)
5. Configure recovery settings
6. Number of replicas is not less than 2
7. Allocate enough physical memory
8. Configure OS user
9. Use monitoring tools
10. Use Oracle JDKs
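Recommendation 2 (avoid split-brain) comes down to simple arithmetic. With Zen discovery, the usual guidance is to set `discovery.zen.minimum_master_nodes` to a strict majority of the master-eligible nodes, so that two halves of a partitioned cluster cannot both elect a master. The sketch below only computes that quorum; the function name is hypothetical, and the setting applies to Elasticsearch versions that use Zen discovery.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Return the quorum: a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Example: the cluster diagram shows three master-eligible nodes
# (one elected master plus two master-eligible nodes).
quorum = minimum_master_nodes(3)
```

For three master-eligible nodes the quorum is 2, so the cluster tolerates the loss of one of them. With only two master-eligible nodes the quorum is also 2, which means losing either one leaves the cluster without a master; this is one reason an odd number of dedicated master nodes is usually preferred.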

THANK YOU!

• Get social: @elastic
• Explore the docs: elastic.co/guide
• Give it a try: elastic.co/downloads/elasticsearch
• Join the community: discuss.elastic.com
• Check ELK stack: demo.elastic.co
