Elasticsearch: Distributed search & analytics on BigData made easy

Posted on 15-Jul-2015


Elasticsearch Hands-on Tutorial

Itamar Syn-Hershko
http://code972.com
@synhershko

Elasticsearch
Distributed search & analytics on BigData made easy

Me?
Itamar Syn-Hershko / @synhershko
Lucene.NET PMC and lead committer
Freelance consultant and developer
Elasticsearch consulting partner
Microsoft MVP
RavenDB core developer
RavenDB in Action author

Consulting Partner

An index

Elasticsearch
Powered by Apache Lucene
Open-source
Rapid growth
High-profile users worldwide

REST API
Indexes
Types
IDs
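In the REST API the URL path names the index, the type and the document ID. The following minimal Python sketch builds the same request as the curl call below but does not send it. It uses only the standard library and assumes a node on localhost:9200, with types in the URL as in the Elasticsearch versions this talk targets. The helper name `index_request` is made up for illustration.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port, as in the curl example.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT that stores `doc` at /{index}/{type}/{id}."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        # 1.x-era nodes accept the body without this header; later versions require it.
        headers={"Content-Type": "application/json"},
    )


tweet = {
    "user": "synhershko",
    "post_date": "2013-05-30T14:12:12",
    "message": "trying out Elastic Search",
    "followers": 3,
    "registered": True,
}
req = index_request("twitter", "tweet", "1", tweet)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/twitter/tweet/1
# urllib.request.urlopen(req) would actually send it to a running node.
```

Because the index, type and ID all live in the path, PUTting again to the same URL replaces document 1 rather than creating a second one.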

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "synhershko",
    "post_date" : "2013-05-30T14:12:12",
    "message" : "trying out Elastic Search",
    "followers" : 3,
    "registered" : true
}'

Full-Text Search

Full-text Search 101: The inverted index
[Figure: the index as a dictionary of terms, each with a posting list of the documents containing it. Dictionary: and, big, dark, did, gown, had, house, in, keep, keeper, keeps, light, never, night, old, sleep, sleeps, the, town, where]
Example from: Justin Zobel, Alistair Moffat, "Inverted files for text search engines", ACM Computing Surveys (CSUR) v.38 n.2, p.6-es, 2006

6 documents to index:
1. The old night keeper keeps the keep in the town
2. In the big old house in the big old gown.
3. The house in the town had the big old keep
4. Where the old night keeper never did sleep.
5. The night keeper keeps the keep in the night
6. And keeps in the dark and sleeps in the light.
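The following Python sketch builds the dictionary and posting lists from the six documents above. Its only analysis is lowercasing and splitting on non-letters, a simplification of a real analyzer, and that happens to reproduce the 20-term dictionary in the figure. Single-term lookups then become dictionary reads, and an AND query becomes an intersection of posting lists. The function names are illustrative, not Lucene's.

```python
import re
from collections import defaultdict

# The six example documents, keyed by document ID.
DOCS = {
    1: "The old night keeper keeps the keep in the town",
    2: "In the big old house in the big old gown.",
    3: "The house in the town had the big old keep",
    4: "Where the old night keeper never did sleep.",
    5: "The night keeper keeps the keep in the night",
    6: "And keeps in the dark and sleeps in the light.",
}


def tokenize(text):
    """Lowercase and keep runs of letters only (a deliberately tiny analyzer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Map each term (the dictionary) to a sorted list of doc IDs (its posting list)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search_all(index, terms):
    """AND query: intersect the posting lists of every query term."""
    postings = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(DOCS)
print(len(index))                           # 20
print(index["keeper"])                      # [1, 4, 5]
print(search_all(index, ["old", "night"]))  # [1, 4]
```

Nothing in the query step scans the documents again. Once the index exists, answering "keeper" is a single lookup, which is why the index is built once at write time.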

User queries for keeper

Term Normalization

Lowercasing
Stop words (grey): not best practice anymore
Stemming: Porter stemmer, s-stemmer
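The three normalization steps listed above can be chained as follows. This is a minimal sketch: the five-word stop list is hypothetical, and the stemmer follows the plural-stripping "S-stemmer" rules commonly attributed to Donna Harman, not Porter's algorithm. Only the first matching rule is applied.

```python
import re

# Hypothetical stop-word list for illustration; real analyzers ship their own.
STOP_WORDS = {"the", "in", "and", "a", "of"}


def s_stem(word):
    """Strip English plural endings with the three S-stemmer rules."""
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"          # queries -> query
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]                # houses -> house
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]                # keeps -> keep
    return word


def normalize(text, remove_stop_words=True):
    terms = re.findall(r"[a-z]+", text.lower())                 # lowercasing
    if remove_stop_words:
        terms = [t for t in terms if t not in STOP_WORDS]       # stop words
    return [s_stem(t) for t in terms]                           # stemming


print(normalize("The night keeper keeps the keep"))  # ['night', 'keeper', 'keep', 'keep']
```

Applied at both index and query time, "keeps" and "keep" collapse into one dictionary entry. A query for one now matches the other, and the dictionary shrinks. That is the "Relevance++ / Size on disk--" trade. The stemmer leaves "keeper" alone, so whether it should also match "keep" remains a separate relevance decision.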

Relevance++
Size on disk--

Full-Text Search
Your data store

How hard is it to get search right, anyway?

Relevance

Precision: the fraction of the retrieved documents that are relevant
Recall: the fraction of the relevant documents that are retrieved

Order of results
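Precision and recall describe the result set. Precision at k (P@k) also captures the order of results, since it only looks at the top k. The example below is hypothetical: a user searching for "keeper" treats documents 1, 4 and 5 as relevant, and a stemming-happy engine returns 1, 3, 4, 5 and 6 in two different orders.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / |retrieved|, recall = hits / |relevant| (sets of doc IDs)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant, so result order matters."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


relevant = {1, 4, 5}
ranking_a = [3, 6, 1, 4, 5]
ranking_b = [1, 4, 5, 3, 6]
print(precision_recall(ranking_a, relevant))  # (0.6, 1.0)
print(precision_at_k(ranking_a, relevant, 2), precision_at_k(ranking_b, relevant, 2))  # 0.0 1.0
```

Both rankings have identical precision and recall. Only the ordering-aware metric tells them apart, and it matches what a user looking at the first page would notice.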

Challenges with search
Relevance
Getting the tokens right: tokenization, stemming
Multi-lingual content
Or other cross-cutting search concerns
Tolerance

Real-time Analytics

I want full-text search. Oh, aggregations!
Kibana 4
Usually log analysis => Logstash
Aggregations framework
Live data analysis and reporting, logs as an example
Kibana as an ES dashboard
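The aggregations framework buckets documents and computes metrics per bucket. Below, a terms-aggregation request body is written in Elasticsearch's query DSL. The aggregation name `by_status` and the field `status` are hypothetical. A local stand-in function then reproduces the shape of the response buckets over a few made-up log documents, so the example runs without a cluster.

```python
import json
from collections import Counter

# Search body asking for a terms aggregation; "by_status" is a name we chose and
# "status" is a hypothetical field in the indexed log documents.
AGG_REQUEST = {
    "size": 0,  # only the aggregation is wanted, not the hits
    "aggs": {"by_status": {"terms": {"field": "status"}}},
}


def terms_agg(docs, field, size=10):
    """Local stand-in for a terms aggregation: the most frequent values of `field`."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return {"buckets": [{"key": k, "doc_count": n} for k, n in counts.most_common(size)]}


logs = [{"status": s} for s in (200, 200, 404, 200, 500, 404)]
print(json.dumps(AGG_REQUEST))
print(terms_agg(logs, "status"))
```

The point of running this inside the search engine is that the counts come from the same index as full-text queries. Adding a "query" clause to the request restricts the buckets to matching documents.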

Queue (Redis)

Shippers

Indexer
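The shippers -> queue -> indexer pattern in the diagram labels above can be sketched in a single process. Python's thread-safe `queue.Queue` stands in for Redis, which is an assumption made so the example is self-contained. The indexer groups events into request bodies for Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: an action line, then the document. The index and type names and the batch size are made up.

```python
import json
import queue
import threading

# Hypothetical index/type names and batch size, for illustration only.
INDEX, DOC_TYPE = "logs", "event"
BATCH_SIZE = 3
SENTINEL = None  # tells the indexer that all shippers are done


def shipper(name, lines, q):
    """Push raw log lines into the queue (Redis plays this role in the diagram)."""
    for line in lines:
        q.put({"source": name, "message": line})


def bulk_body(events):
    """Serialize events as a _bulk body: action line + document line, newline-terminated."""
    out = []
    for event in events:
        out.append(json.dumps({"index": {"_index": INDEX, "_type": DOC_TYPE}}))
        out.append(json.dumps(event))
    return "\n".join(out) + "\n"


def indexer(q, bodies):
    """Drain the queue and group events into fixed-size bulk batches."""
    batch = []
    while True:
        event = q.get()
        if event is SENTINEL:
            break
        batch.append(event)
        if len(batch) == BATCH_SIZE:
            bodies.append(bulk_body(batch))
            batch = []
    if batch:
        bodies.append(bulk_body(batch))


q = queue.Queue()
bodies = []
shippers = [
    threading.Thread(target=shipper, args=("web-1", ["GET /a", "GET /b"], q)),
    threading.Thread(target=shipper, args=("web-2", ["GET /c", "POST /d", "GET /e"], q)),
]
for t in shippers:
    t.start()
for t in shippers:
    t.join()
q.put(SENTINEL)
indexer(q, bodies)
# In a real deployment each body would be POSTed to http://localhost:9200/_bulk
```

The queue decouples bursty shippers from the indexer. Batching trades a little latency for far fewer HTTP round trips, which is usually the bottleneck when indexing logs.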

You could use the same approach as before
Or this Logstash pattern
Roll your own indexers

Scaling out

Round robin

Moar use cases!

#1: Real-Time Alerting System

As data comes in, fire alerts when (and if) it meets certain criteria

Percolation
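Percolation turns search around: the queries are stored, and each new document is checked against all of them. The toy in-memory version below shows the alerting loop. Here a "query" is just a set of required terms, and the query IDs are hypothetical. Real percolator queries are full query DSL and are matched by Elasticsearch itself.

```python
import re

# Hypothetical stored queries: each requires all of its terms to appear in a document.
STORED_QUERIES = {
    "disk-alert": {"disk", "full"},
    "payment-errors": {"payment", "error"},
    "login": {"login"},
}


def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def percolate(doc_text, queries):
    """Run one incoming document against every stored query; return matching IDs."""
    doc_terms = terms(doc_text)
    return sorted(qid for qid, required in queries.items() if required <= doc_terms)


def on_new_document(doc_text, queries, alert):
    """Fire `alert` once per stored query the new document satisfies."""
    for qid in percolate(doc_text, queries):
        alert(qid, doc_text)


fired = []
on_new_document("ERROR: payment gateway timeout", STORED_QUERIES, lambda q, d: fired.append(q))
on_new_document("disk is full on node-3; login failed", STORED_QUERIES, lambda q, d: fired.append(q))
print(fired)  # ['payment-errors', 'disk-alert', 'login']
```

Counting how often each stored query fires is then just another aggregation, this time over the matching queries.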

Can do aggregations on matching queries

#2: Smarter query parsing
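The fuzzy matching listed below under "Matching inexact queries" rests on edit distance between a query term and the terms in the dictionary. Plain Levenshtein distance counts the swapped letters in "ditsance" as two edits. The optimal-string-alignment variant counts an adjacent transposition as one, and Lucene's fuzzy matching can be configured the same way. The sketch below implements both; the dictionary passed to `fuzzy_matches` is a made-up sample.

```python
def levenshtein(a, b):
    """Insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Levenshtein plus adjacent transpositions (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def fuzzy_matches(term, dictionary, max_edits=1):
    """Dictionary terms within max_edits of the (misspelled) query term."""
    return sorted(w for w in dictionary if osa_distance(term, w) <= max_edits)


print(levenshtein("ditsance", "distance"), osa_distance("ditsance", "distance"))  # 2 1
print(fuzzy_matches("ditsance", ["distance", "instance", "dance"]))  # ['distance']
```

Comparing against every term is fine for a sketch. Lucene instead walks the term dictionary with an automaton, which is what keeps fuzzy queries affordable on large indexes.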

Build your own DSL. Recognize entities.
As-you-type feedback
Span queries. Match phrase prefix.
Suggesters
Significant terms facet for query expansion / suggestions

Matching inexact queries
Phrase slop: Bridge of London -> London Bridge
Word-level edit distance with fuzzy queries: ditsance -> distance, color -> colour
There are some challenges
Built-in suggesters

#3: Offline Classification
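The bag-of-words model and More Like This functionality listed for this use case compare documents by the words they contain, ignoring order. The sketch below uses cosine similarity over raw term counts as a simple stand-in. Elasticsearch's More Like This works differently: it picks a document's most characteristic terms and runs them as a query. The candidate texts and function names are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Term -> count, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine of the angle between two term-count vectors (0 = nothing shared)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def more_like_this(target, candidates, top_n=2):
    """Rank candidate texts by bag-of-words similarity to the target."""
    tv = bag_of_words(target)
    scored = [(cosine(tv, bag_of_words(c)), c) for c in candidates]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_n]]


candidates = ["sizing an elasticsearch cluster", "baking sourdough bread", "lucene index internals"]
print(more_like_this("elasticsearch cluster sizing guide", candidates, top_n=1))
```

Because it can run offline over the whole corpus, this kind of similarity is a natural fit for precomputed "people who liked this also liked" lists and for record linkage.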

Offline can also be real-time; it just doesn't happen at search time
Pre-processing of data to allow for faster, more exact search later
Using the percolator
Significant terms aggregation

Structuring the unstructured
Record linkage
Bag of words model
More Like This functionality
NLP
Entity extraction
A process that often happens offline
People who liked this also liked
Significant terms

#4: Everything is searchable

Geo-spatial search
Distance
Shape interactions
Multiple algorithms
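A distance filter needs a great-circle distance. The haversine formula below is one standard choice, assuming a spherical Earth. Elasticsearch lets you pick between more than one distance calculation, in line with the "Multiple algorithms" bullet, and this sketch does not claim to reproduce any of them exactly. The place names and coordinates are approximate and purely illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(places, center, radius_km):
    """Names of places no farther than radius_km from center, nearest first."""
    lat0, lon0 = center
    hits = [(haversine_km(lat0, lon0, lat, lon), name) for name, (lat, lon) in places.items()]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


# Approximate coordinates, for illustration only.
TIMES_SQUARE = (40.7580, -73.9855)
PLACES = {
    "Empire State Building": (40.7484, -73.9857),
    "JFK Airport": (40.6413, -73.7781),
}
print(within(PLACES, TIMES_SQUARE, radius_km=2))  # ['Empire State Building']
```

Computing this for every document does not scale. That is why the next step is to represent locations as indexable terms, so that most candidates can be discarded before any distance is computed.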

Midtown NYC, level 5 cell
It is just a matter of figuring out how to represent the piece of data in an index
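One well-known way to "represent the piece of data in an index" is a geohash. It alternates longitude and latitude bisection bits and encodes every five bits as a base-32 character. Each prefix names a rectangular cell, so nearby points share prefixes, and indexing every prefix turns "inside this cell" into an exact term lookup. The slide does not say which grid its "level 5 cell" comes from; geohash is used here only as an example scheme, and the helper names are hypothetical.

```python
# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=5):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, nbits = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def cell_terms(geohash):
    """Every prefix is a coarser enclosing cell; indexing all of them enables cell filters."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]


gh = geohash_encode(40.7580, -73.9855)  # approximate Times Square coordinates
print(gh)              # dr5ru
print(cell_terms(gh))  # ['d', 'dr', 'dr5', 'dr5r', 'dr5ru']
```

A five-character cell is a few kilometres across, so a query can first select documents whose "dr5ru" term matches and only then compute exact distances for that small set.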

Image search

http://colors.qbox.io/
http://cs.stanford.edu/people/karpathy/deepimagesent

Deep Visual-Semantic Alignments for Generating Image Descriptions

#5: Anomaly detection

What is an anomaly?

The Significant Terms Aggregation

Uncommonly common
Mark Harwood's talk at

http://www.infoq.com/presentations/elasticsearch-revealing-uncommonly-common
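"Uncommonly common" means a term is far more frequent in a foreground set, such as the documents from a suspicious time window, than in the background corpus. The sketch below is modelled on the JLH heuristic that the significant_terms aggregation uses by default: the absolute change in frequency multiplied by the relative change. It is not Elasticsearch's code, and the counts are invented for illustration.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """Absolute change in frequency times relative change (JLH-style)."""
    fg = fg_count / fg_total
    bg = bg_count / bg_total
    if fg <= 0 or bg <= 0:
        return 0.0
    change = fg - bg
    if change <= 0:
        return 0.0  # not more common in the foreground: not interesting
    return change * (fg / bg)


def significant_terms(fg_counts, fg_total, bg_counts, bg_total, size=10):
    """Foreground terms ranked by score, dropping those that are not over-represented."""
    scores = {t: jlh_score(c, fg_total, bg_counts.get(t, 0), bg_total) for t, c in fg_counts.items()}
    ranked = sorted(((s, t) for t, s in scores.items() if s > 0), reverse=True)
    return [t for _, t in ranked][:size]


# Invented counts: 100 docs in the suspicious window, 1000 docs overall.
FG = {"the": 90, "error": 40, "timeout": 30}
BG = {"the": 900, "error": 200, "timeout": 50}
print(significant_terms(FG, 100, BG, 1000))  # ['timeout', 'error']
```

"the" is extremely common in the window yet scores zero, because it is just as common everywhere. "timeout" is rarer but six times over-represented, which is exactly the anomaly signal.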

#6: Debugging a distributed system

Queue (Redis)
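Grokking the access-log line and stack trace that follow means turning free text into named fields before indexing, so that "all 500s from this client" or "all NullReferenceExceptions thrown from Dictionary.Insert" become structured queries. The sketch below uses a hand-written regex approximation, not the grok pattern itself. Its group names are modelled on the fields of Logstash's COMBINEDAPACHELOG grok pattern, and the stack-trace splitter assumes the .NET "   at Frame" layout.

```python
import re

# Approximates the Apache "combined" log format; group names follow the
# fields Logstash's COMBINEDAPACHELOG pattern produces.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log(line):
    """Turn one access-log line into a dict of fields, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    event = m.groupdict()
    event["response"] = int(event["response"])
    event["bytes"] = None if event["bytes"] == "-" else int(event["bytes"])
    return event


def parse_exception(trace):
    """Split a .NET-style stack trace into exception type, message and frames."""
    lines = [ln.strip() for ln in trace.strip().splitlines() if ln.strip()]
    exc_type, _, message = lines[0].partition(": ")
    frames = [ln[len("at "):] for ln in lines[1:] if ln.startswith("at ")]
    return {"type": exc_type, "message": message, "frames": frames}


LINE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
        '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
TRACE = """System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at System.Web.UI.Page.OnPreRenderComplete(EventArgs e)"""

print(parse_access_log(LINE)["request"])   # /apache_pb.gif
print(parse_exception(TRACE)["type"])      # System.NullReferenceException
```

Once each exception is a document with `type` and `frames` fields, a terms aggregation on the top frame groups thousands of log lines into a handful of distinct bugs across all machines.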

Logs can be exceptions
Exceptions can be parsed (grokked)

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at AjaxControlToolkit.ToolkitScriptManager.GetScriptCombineAttributes(Assembly assembly)
   at AjaxControlToolkit.ToolkitScriptManager.IsScriptCombinable(ScriptEntry scriptEntry)
   at AjaxControlToolkit.ToolkitScriptManager.OnResolveScriptReference(ScriptReferenceEventArgs e)
   at System.Web.UI.ScriptManager.RegisterScripts()
   at System.Web.UI.ScriptManager.OnPagePreRenderComplete(Object sender, EventArgs e)
   at System.Web.UI.Page.OnPreRenderComplete(EventArgs e)
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)

#7: Distributed git storage
PoC in C# using libgit2sharp
https://github.com/synhershko/libgit2sharp.Elasticsearch
Kudos @nulltoken

Thank you.
Questions?
