search engine on your server2 agenda 1 terms 3 mappings 4 analyzers and aggregations 5 capacity...

33
1 Aravind Putrevu Developer | Evangelist @aravindputrevu | aravindputrevu.in elastic.co/community Elasticsearch Search Engine on your server

Upload: others

Post on 14-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

1

Aravind PutrevuDeveloper | Evangelist@aravindputrevu | aravindputrevu.inelastic.co/community

Elasticsearch Search Engine on your server

Page 2: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

22

Agenda

Terms1

Mappings 3

Analyzers and Aggregations4

Capacity Planning5

Talking to Elasticsearch2

Page 3: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

33

Agenda

Terms1

Mappings3

Analyzers and Aggregations4

Capacity Planning5

Talking to Elasticsearch2

Page 4: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

44

Agenda

Terms1

Mappings3

Analyzers and Aggregations4

Capacity Planning5

Talking to Elasticsearch2

Page 5: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

55

Agenda

Terms1

Mappings3

Analyzers and Aggregations4

Capacity Planning5

Talking to Elasticsearch2

Page 6: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

66

Agenda

Terms1

Mappings3

Analyzers and Aggregations4

Capacity Planning5

Talking to Elasticsearch2

Page 7: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

7

Elastic StackNo enterprise edition

All new versions with 6.2

X-Pack

Security

Alerting

Monitoring

Reporting

Machine Learning

Graph

Page 8: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

8

Why it is Popular?

Speed Scale Relevance

Page 9: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

9

Terms

NodeCluster Index

Type Document

Shard Replica

https://www.elastic.co/guide/en/elasticsearch/reference/current/glossary.html

A cluster is a collection of one or more nodes (servers)A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities

An index is a collection of documents that have somewhat similar characteristics

*Deprecated in 6.0.0*A type used to be a logical category/partition of your index to allow you to store different

types of documents in the same index

A document is a basic unit of information that can be indexed. This document is expressed in

JSON (JavaScript Object Notation) which is a ubiquitous internet data interchange format.

Elasticsearch provides the ability to subdivide your index into multiple pieces called shards

To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for

short

Page 10: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

10 All product names, logos, and brands are property of their respective owners and are used only for identification purposes. This is not an endorsement.

Elasticsearch Node Types

Elasticsearch

X-Pack

Master (3)

Ingest (X)

Machine Learning (2+)

Data – Warm (X)

Coordinating (X)

Data – Hot (X)

• Master Nodes– Control the cluster, requires a minimum of 3, one is active at any given time

• Data Nodes– Hold indexed data and perform data related operations– Differentiated Hot and Warm Data nodes can be used

• Coordinating Nodes– Route requests, handle search reduce phase, distribute bulk indexing– All nodes function as coordinating nodes

• Ingest Nodes– Use ingest pipelines to transform and enrich before indexing

• Machine Learning Nodes– Run machine learning jobs

Nodes can play one or more roles, for workload isolation and scaling

Page 11: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

11

What powers Elasticsearch?

https://www.elastic.co/blog/found-elasticsearch-top-down

● A Java library

● Great for full-text search

But

● Challenging to use

● Not designed for scale

Page 12: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

12

Talking to Elasticsearch

https://www.elastic.co/guide/en/elasticsearch/client/index.html

Page 13: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

13

Indexing a document

https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

Page 14: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

14

Inserting data _bulk

Page 15: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

15

Where will my data go?

The default value used for _routing is the document’s _id.

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html

0 < shard < number_of_primary_shards - 1

Page 16: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

16

Mappings

Page 17: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

17

Full Text Analysis

Inverted Index

Page 18: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

18

AnalyzerHelps in converting text into tokens for better search capability

Character filters

1 2 3

Tokenizer Token Filters

Page 19: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

19

Aggregations● Metrics

● Bucket

● Pipeline ● and so on...

Page 20: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

20

Querying Data

● Full Text Queries

● Term Level Queries

● Compound Queries

● Geo Queries

Page 21: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

21

Query DSL

Match Query

Page 22: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

22

Query DSL

Term Queries

Page 23: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

23

Query DSL

Nested queries

Page 24: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

24

Query DSL

Geo queries

Page 25: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

25

Beats

Log Files Metrics

Wire Data

Datastore Web APIs

Social Sensors

Kafka

Redis

MessagingQueue

ES-Hadoop

Elasticsearch

Kibana

Master Nodes (3)

Ingest Nodes (X)

Data Nodes – Hot (X)

Data Notes – Warm (X)

Instances (X)

your{beat}

X-Pack X-Pack

Logstash

Nodes (X)

Custom UI

LDAP

Authentication

AD

Notification

SSO

Hadoop Ecosystem

Page 26: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

26

Capacity Planning

It depends...

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Page 27: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

27

Capacity PlanningWhat is your use case?

● Full text search

● Logging/Metrics

● Complex Aggregations with lot of users

Each use case needs a different cluster configuration.

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Page 28: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

28

Capacity PlanningLet us take Logging..● Inflow of data per day

○ Per day : 10GB○ Per Month : 300GB○ Per Year: 3600GB

● Data Retention ○ 15 days

● High Availability (Replication factor)○ 1 i.e., 7200GB Per Year

● Type of Queries

Master Node : X

Data Node : X

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Page 29: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

29

Capacity PlanningHardware Recommendations

● SSD’s are the best

● Local Disk is king!

● Prefer Medium size machine’s over Large size machine’s

● Only 50% of your RAM to Elasticsearch

● Don’t Cross 32GB Java Heap Space

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Page 30: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

30

Beats

Log Files Metrics

Wire Data

Datastore Web APIs

Social Sensors

Kafka

Redis

MessagingQueue

Logstash

ES-Hadoop

Elasticsearch

Kibana

Nodes (X)

Master Nodes (3)

Ingest Nodes (X)

Data Nodes – Hot (X)

Data Notes – Warm (X)

Instances (X)

your{beat}

X-Pack X-Pack

Custom UI

LDAP

Authentication

AD

Notification

SSO

Hadoop Ecosystem

https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x

Page 31: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

31

training.elastic.co

Page 32: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

Resources

• https://www.elastic.co/learn• https://www.elastic.co/blog/category/engineering• https://discuss.elastic.co/• https://fb.com/groups/ElasticIndiaUserGroup• https://elastic.co/community

32

Page 33: Search Engine on your server2 Agenda 1 Terms 3 Mappings 4 Analyzers and Aggregations 5 Capacity Planning 2 Talking to Elasticsearch

33

Fin!

discuss.elastic.co | [email protected] | @aravindputrevu