elasticsearch server - third edition · the use cases limiting results to given tags searching for...

756
www.EBooksWorld.ir

Upload: others

Post on 02-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 2: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 3: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchServerThirdEdition

www.EBooksWorld.ir

Page 4: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TableofContents

ElasticsearchServerThirdEdition

Credits

AbouttheAuthors

AbouttheReviewer

www.PacktPub.com

eBooks,discountoffers,andmore

Whysubscribe?

Preface

Whatthisbookcovers

Whatyouneedforthisbook

Whothisbookisfor

Conventions

Readerfeedback

Customersupport

Downloadingtheexamplecode

Downloadingthecolorimagesofthisbook

Errata

Piracy

Questions

1.GettingStartedwithElasticsearchCluster

Fulltextsearching

TheLuceneglossaryandarchitecture

Inputdataanalysis

Indexingandquerying

Scoringandqueryrelevance

ThebasicsofElasticsearch

KeyconceptsofElasticsearch

Index

Document

www.EBooksWorld.ir

Page 5: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Documenttype

Mapping

KeyconceptsoftheElasticsearchinfrastructure

Nodesandclusters

Shards

Replicas

Gateway

Indexingandsearching

Installingandconfiguringyourcluster

InstallingJava

InstallingElasticsearch

RunningElasticsearch

ShuttingdownElasticsearch

Thedirectorylayout

ConfiguringElasticsearch

Thesystem-specificinstallationandconfiguration

InstallingElasticsearchonLinux

InstallingElasticsearchusingRPMpackages

InstallingElasticsearchusingtheDEBpackage

Elasticsearchconfigurationfilelocalization

ConfiguringElasticsearchasasystemserviceonLinux

ElasticsearchasasystemserviceonWindows

ManipulatingdatawiththeRESTAPI

UnderstandingtheRESTAPI

StoringdatainElasticsearch

Creatinganewdocument

Automaticidentifiercreation

Retrievingdocuments

Updatingdocuments

Dealingwithnon-existingdocuments

Addingpartialdocuments

www.EBooksWorld.ir

Page 6: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Deletingdocuments

Versioning

Usageexample

Versioningfromexternalsystems

SearchingwiththeURIrequestquery

Sampledata

URIsearch

Elasticsearchqueryresponse

Queryanalysis

URIquerystringparameters

Thequery

Thedefaultsearchfield

Analyzer

Thedefaultoperatorproperty

Queryexplanation

Thefieldsreturned

Sortingtheresults

Thesearchtimeout

Theresultswindow

Limitingper-shardresults

Ignoringunavailableindices

Thesearchtype

Lowercasingtermexpansion

Wildcardandprefixanalysis

Lucenequerysyntax

Summary

2.IndexingYourData

Elasticsearchindexing

Shardsandreplicas

Writeconsistency

Creatingindices

www.EBooksWorld.ir

Page 7: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Alteringautomaticindexcreation

Settingsforanewlycreatedindex

Indexdeletion

Mappingsconfiguration

Typedeterminingmechanism

Disablingthetypedeterminingmechanism

Tuningthetypedeterminingmechanismfornumerictypes

Tuningthetypedeterminingmechanismfordates

Indexstructuremapping

Typeandtypesdefinition

Fields

Coretypes

Commonattributes

String

Number

Boolean

Binary

Date

Multifields

TheIPaddresstype

Tokencounttype

Usinganalyzers

Out-of-the-boxanalyzers

Definingyourownanalyzers

Defaultanalyzers

Differentsimilaritymodels

Settingper-fieldsimilarity

Availablesimilaritymodels

Configuringdefaultsimilarity

ConfiguringBM25similarity

ConfiguringDFRsimilarity

www.EBooksWorld.ir

Page 8: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ConfiguringIBsimilarity

Batchindexingtospeedupyourindexingprocess

Preparingdataforbulkindexing

Indexingthedata

The_allfield

The_sourcefield

Additionalinternalfields

Introductiontosegmentmerging

Segmentmerging

Theneedforsegmentmerging

Themergepolicy

Themergescheduler

Throttling

Introductiontorouting

Defaultindexing

Defaultsearching

Routing

Theroutingparameters

Routingfields

Summary

3.SearchingYourData

QueryingElasticsearch

Theexampledata

Asimplequery

Pagingandresultsize

Returningtheversionvalue

Limitingthescore

Choosingthefieldsthatwewanttoreturn

Sourcefiltering

Usingthescriptfields

Passingparameterstothescriptfields

www.EBooksWorld.ir

Page 9: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Understandingthequeryingprocess

Querylogic

Searchtype

Searchexecutionpreference

SearchshardsAPI

Basicqueries

Thetermquery

Thetermsquery

Thematchallquery

Thetypequery

Theexistsquery

Themissingquery

Thecommontermsquery

Thematchquery

TheBooleanmatchquery

Thephrasematchquery

Thematchphraseprefixquery

Themultimatchquery

Thequerystringquery

Runningthequerystringqueryagainstmultiplefields

Thesimplequerystringquery

Theidentifiersquery

Theprefixquery

Thefuzzyquery

Thewildcardquery

Therangequery

Regularexpressionquery

Themorelikethisquery

Compoundqueries

Theboolquery

Thedis_maxquery

www.EBooksWorld.ir

Page 10: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Theboostingquery

Theconstant_scorequery

Theindicesquery

Usingspanqueries

Aspan

Spantermquery

Spanfirstquery

Spannearquery

Spanorquery

Spannotquery

Spanwithinquery

Spancontainingquery

Spanmultiquery

Performanceconsiderations

Choosingtherightquery

Theusecases

Limitingresultstogiventags

Searchingforvaluesinarange

Boostingsomeofthematcheddocuments

Ignoringlowerscoringpartialqueries

UsingLucenequerysyntaxinqueries

Handlinguserquerieswithouterrors

Autocompleteusingprefixes

Findingtermssimilartoagivenone

Matchingphrases

Spans,spanseverywhere

Summary

4.ExtendingYourQueryingKnowledge

Filteringyourresults

Thecontextisthekey

Explicitfilteringwithboolquery

www.EBooksWorld.ir

Page 11: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Highlighting

Gettingstartedwithhighlighting

Fieldconfiguration

Underthehood

Forcinghighlightertype

ConfiguringHTMLtags

Controllinghighlightedfragments

Globalandlocalsettings

Requirematching

Customhighlightingquery

ThePostingshighlighter

Validatingyourqueries

UsingtheValidateAPI

Sortingdata

Defaultsorting

Selectingfieldsusedforsorting

Sortingmode

Specifyingbehaviorformissingfields

Dynamiccriteria

Calculatescoringwhensorting

Queryrewrite

Prefixqueryasanexample

GettingbacktoApacheLucene

Queryrewriteproperties

Summary

5.ExtendingYourIndexStructure

Indexingtree-likestructures

Datastructure

Analysis

Indexingdatathatisnotflat

Data

www.EBooksWorld.ir

Page 12: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Objects

Arrays

Mappings

Finalmappings

SendingthemappingstoElasticsearch

Tobeornottobedynamic

Disablingobjectindexing

Usingnestedobjects

Scoringandnestedqueries

Usingtheparent-childrelationship

Indexstructureanddataindexing

Childmappings

Parentmappings

Theparentdocument

Childdocuments

Querying

Queryingdatainthechilddocuments

Queryingdataintheparentdocuments

Performanceconsiderations

ModifyingyourindexstructurewiththeupdateAPI

Themappings

Addinganewfieldtotheexistingindex

Modifyingfieldsofanexistingindex

Summary

6.MakeYourSearchBetter

IntroductiontoApacheLucenescoring

Whenadocumentismatched

Defaultscoringformula

Relevancymatters

ScriptingcapabilitiesofElasticsearch

Objectsavailableduringscriptexecution

www.EBooksWorld.ir

Page 13: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Scripttypes

Infilescripts

Inlinescripts

Indexedscripts

Queryingwithscripts

Scriptingwithparameters

Scriptlanguages

Usingotherthanembeddedlanguages

Usingnativecode

Thefactoryimplementation

Implementingthenativescript

Theplugindefinition

Installingtheplugin

Runningthescript

Searchingcontentindifferentlanguages

Handlinglanguagesdifferently

Handlingmultiplelanguages

Detectingthelanguageofthedocument

Sampledocument

Themappings

Querying

Querieswithanidentifiedlanguage

Querieswithanunknownlanguage

Combiningqueries

Influencingscoreswithqueryboosts

Theboost

Addingtheboosttoqueries

Modifyingthescore

Constantscorequery

Boostingquery

Thefunctionscorequery

www.EBooksWorld.ir

Page 14: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Structureofthefunctionquery

Theweightfactorfunction

Fieldvaluefactorfunction

Thescriptscorefunction

Therandomscorefunction

Decayfunctions

Whendoesindex-timeboostingmakesense?

Definingboostinginthemappings

Wordswiththesamemeaning

Synonymfilter

Synonymsinthemappings

Synonymsstoredonthefilesystem

Definingsynonymrules

UsingApacheSolrsynonyms

Explicitsynonyms

Equivalentsynonyms

Expandingsynonyms

UsingWordNetsynonyms

Queryorindex-timesynonymexpansion

Understandingtheexplaininformation

Understandingfieldanalysis

Explainingthequery

Summary

7.AggregationsforDataAnalysis

Aggregations

Generalquerystructure

Insidetheaggregationsengine

Aggregationtypes

Metricsaggregations

Minimum,maximum,average,andsum

Missingvalues

www.EBooksWorld.ir

Page 15: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Usingscripts

Fieldvaluestatisticsandextendedstatistics

Valuecount

Fieldcardinality

Percentiles

Percentileranks

Tophitsaggregation

Additionalparameters

Geoboundsaggregation

Scriptedmetricsaggregation

Bucketsaggregations

Filteraggregation

Filtersaggregation

Termsaggregation

Countsareapproximate

Minimumdocumentcount

Rangeaggregation

Keyedbuckets

Daterangeaggregation

IPv4rangeaggregation

Missingaggregation

Histogramaggregation

Datehistogramaggregation

Timezones

Geodistanceaggregations

Geohashgridaggregation

Globalaggregation

Significanttermsaggregation

Choosingsignificantterms

Multiplevalueanalysis

Sampleraggregation

www.EBooksWorld.ir

Page 16: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Childrenaggregation

Nestedaggregation

Reversenestedaggregation

Nestingaggregationsandorderingbuckets

Bucketsordering

Pipelineaggregations

Availabletypes

Referencingotheraggregations

Gapsinthedata

Pipelineaggregationtypes

Min,max,sum,andaveragebucketaggregations

Cumulativesumaggregation

Bucketselectoraggregation

Bucketscriptaggregation

Serialdifferencingaggregation

Derivativeaggregation

Movingavgaggregation

Predictingfuturebuckets

Themodels

Summary

8.BeyondFull-textSearching

Percolator

Theindex

Percolatorpreparation

Gettingdeeper

Controllingthesizeofreturnedresults

Percolatorandscorecalculation

Combiningpercolatorswithotherfunctionalities

Gettingthenumberofmatchingqueries

Indexeddocumentpercolation

Elasticsearchspatialcapabilities

www.EBooksWorld.ir

Page 17: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Mappingpreparationforspatialsearches

Exampledata

Additionalgeo_fieldproperties

Samplequeries

Distance-basedsorting

Boundingboxfiltering

Limitingthedistance

Arbitrarygeoshapes

Point

Envelope

Polygon

Multipolygon

Anexampleusage

Storingshapesintheindex

Usingsuggesters

Availablesuggestertypes

Includingsuggestions

Suggesterresponse

Termsuggester

Termsuggesterconfigurationoptions

Additionaltermsuggesteroptions

Phrasesuggester

Configuration

Completionsuggester

Indexingdata

Queryingindexedcompletionsuggesterdata

Customweights

Contextsuggester

Contexttypes

Usingcontext

Usingthegeolocationcontext

www.EBooksWorld.ir

Page 18: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheScrollAPI

Problemdefinition

Scrollingtotherescue

Summary

9.ElasticsearchClusterinDetail

Understandingnodediscovery

Discoverytypes

Noderoles

Masternode

Datanode

Clientnode

Configuringnoderoles

Settingthecluster’sname

Zendiscovery

Masterelectionconfiguration

Configuringunicast

Faultdetectionpingsettings

Clusterstateupdatescontrol

Dealingwithmasterunavailability

AdjustingHTTPtransportsettings

DisablingHTTP

HTTPport

HTTPhost

Thegatewayandrecoverymodules

Thegateway

Recoverycontrol

Additionalgatewayrecoveryoptions

IndicesrecoveryAPI

Delayedallocation

Indexrecoveryprioritization

Templatesanddynamictemplates

www.EBooksWorld.ir

Page 19: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Templates

Anexampleofatemplate

Dynamictemplates

Thematchingpattern

Fielddefinitions

Elasticsearchplugins

Thebasics

Installingplugins

Removingplugins

Elasticsearchcaches

Fielddatacache

Fielddatasize

Circuitbreakers

Fielddataanddocvalues

Shardrequestcache

Enablingandconfiguringtheshardrequestcache

Perrequestshardrequestcachedisabling

Shardrequestcacheusagemonitoring

Nodequerycache

Indexingbuffers

Whencachesshouldbeavoided

TheupdatesettingsAPI

TheclustersettingsAPI

TheindicessettingsAPI

Summary

10.AdministratingYourCluster

Elasticsearchtimemachine

Creatingasnapshotrepository

Creatingsnapshots

Additionalparameters

Restoringasnapshot

www.EBooksWorld.ir

Page 20: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Cleaningup–deletingoldsnapshots

Monitoringyourcluster’sstateandhealth

ClusterhealthAPI

Controllinginformationdetails

Additionalparameters

IndicesstatsAPI

Docs

Store

Indexing,get,andsearch

Additionalinformation

NodesinfoAPI

Returnedinformation

NodesstatsAPI

ClusterstateAPI

ClusterstatsAPI

PendingtasksAPI

IndicesrecoveryAPI

IndicesshardstoresAPI

IndicessegmentsAPI

Controllingtheshardandreplicaallocation

Explicitlycontrollingallocation

Specifyingnodeparameters

Configuration

Indexcreation

Excludingnodesfromallocation

Requiringnodeattributes

UsingtheIPaddressforshardallocation

Disk-basedshardallocation

Configuringdiskbasedshardallocation

Disablingdiskbasedshardallocation

Thenumberofshardsandreplicaspernode

www.EBooksWorld.ir

Page 21: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Allocationthrottling

Cluster-wideallocation

Allocationawareness

Forcingallocationawareness

Filtering

Whatdoinclude,exclude,andrequiremean

Manuallymovingshardsandreplicas

Movingshards

Cancelingshardallocation

Forcingshardallocation

MultiplecommandsperHTTPrequest

Allowingoperationsonprimaryshards

Handlingrollingrestarts

Controllingclusterrebalancing

Understandingrebalance

Clusterbeingready

Theclusterrebalancesettings

Controllingwhenrebalancingwillbeallowed

Controllingthenumberofshardsbeingmovedbetweennodesconcurrently

Controllingwhichshardsmayberebalanced

TheCatAPI

Thebasics

UsingCatAPI

Commonarguments

Theexamples

Gettinginformationaboutthemasternode

Gettinginformationaboutthenodes

Retrievingrecoveryinformationforanindex

Warmingup

Defininganewwarmingquery

Retrievingthedefinedwarmingqueries

www.EBooksWorld.ir

Page 22: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Deletingawarmingquery

Disablingthewarmingupfunctionality

Choosingqueriesforwarming

Indexaliasingandusingittosimplifyyoureverydaywork

Analias

Creatinganalias

Modifyingaliases

Combiningcommands

Retrievingaliases

Removingaliases

Filteringaliases

Aliasesandrouting

Zerodowntimereindexingandaliases

Summary

11.ScalingbyExample

Hardware

Physicalserversoracloud

CPU

RAMmemory

Massstorage

Thenetwork

Howmanyservers

Costcutting

PreparingasingleElasticsearchnode

Thegeneralpreparations

Avoidingswapping

Filedescriptors

Virtualmemory

Thememory

Fielddatacacheandbreakingthecircuit

Usedocvalues

www.EBooksWorld.ir

Page 23: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RAMbufferforindexing

Indexrefreshrate

Threadpools

Horizontalexpansion

Automaticallycreatingthereplicas

Redundancyandhighavailability

Costandperformanceflexibility

Continuousupgrades

MultipleElasticsearchinstancesonasinglephysicalmachine

Preventingashardanditsreplicasfrombeingonthesamenode

Designatednoderolesforlargerclusters

Queryaggregatornodes

Datanodes

Mastereligiblenodes

Preparingtheclusterforhighindexingandqueryingthroughput

Indexingrelatedadvice

Indexrefreshrate

Threadpoolstuning

Automaticstorethrottling

Handlingtime-baseddata

Multipledatapaths

Datadistribution

Bulkindexing

RAMbufferforindexing

Adviceforhighqueryratescenarios

Shardrequestcache

Thinkaboutthequeries

Parallelizeyourqueries

Fielddatacacheandbreakingthecircuit

Keepsizeandshardsizeundercontrol

Monitoring

www.EBooksWorld.ir

Page 24: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchHQ

Marvel

SPMforElasticsearch

Summary

Index

www.EBooksWorld.ir

Page 25: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 26: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchServerThirdEdition

www.EBooksWorld.ir

Page 27: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 28: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchServerThirdEditionCopyright©2016PacktPublishing

Allrightsreserved.Nopartofthisbookmaybereproduced,storedinaretrievalsystem,ortransmittedinanyformorbyanymeans,withoutthepriorwrittenpermissionofthepublisher,exceptinthecaseofbriefquotationsembeddedincriticalarticlesorreviews.

Everyefforthasbeenmadeinthepreparationofthisbooktoensuretheaccuracyoftheinformationpresented.However,theinformationcontainedinthisbookissoldwithoutwarranty,eitherexpressorimplied.Neithertheauthors,norPacktPublishing,anditsdealersanddistributorswillbeheldliableforanydamagescausedorallegedtobecauseddirectlyorindirectlybythisbook.

PacktPublishinghasendeavoredtoprovidetrademarkinformationaboutallofthecompaniesandproductsmentionedinthisbookbytheappropriateuseofcapitals.However,PacktPublishingcannotguaranteetheaccuracyofthisinformation.

Firstpublished:October2013

Secondedition:February2015

Thirdedition:February2016

Productionreference:1230216

PublishedbyPacktPublishingLtd.

LiveryPlace

35LiveryStreet

BirminghamB32PB,UK.

ISBN978-1-78588-881-6

www.packtpub.com

www.EBooksWorld.ir

Page 29: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 30: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CreditsAuthors

RafałKuć

MarekRogoziński

Reviewer

PaigeCook

CommissioningEditor

NadeemBagban

AcquisitionEditor

DivyaPoojari

ContentDevelopmentEditor

KirtiPatil

TechnicalEditor

UtkarshaS.Kadam

CopyEditor

AlphaSingh

ProjectCoordinator

NidhiJoshi

Proofreader

SafisEditing

Indexer

RekhaNair

Graphics

JasonMonteiro

ProductionCoordinator

ManuJoseph

CoverWork

ManuJoseph

www.EBooksWorld.ir

Page 31: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 32: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AbouttheAuthorsRafałKućisasoftwareengineer,trainer,speakerandconsultant.HeisworkingasaconsultantandsoftwareengineeratSematextGroupInc.whereheconcentratesonopensourcetechnologiessuchasApacheLucene,Solr,andElasticsearch.Hehasmorethan14yearsofexperienceinvarioussoftwaredomains—frombankingsoftwaretoe–commerceproducts.HeismainlyfocusedonJava;however,heisopentoeverytoolandprogramminglanguagethatmighthelphimtoachievehisgoalseasilyandquickly.Rafałisalsooneofthefoundersofthesolr.plsite,wherehetriestosharehisknowledgeandhelppeoplesolvetheirSolrandLuceneproblems.HeisalsoaspeakeratvariousconferencesaroundtheworldsuchasLuceneEurocon,BerlinBuzzwords,ApacheCon,Lucene/SolrRevolution,Velocity,andDevOpsDays.

RafałbeganhisjourneywithLucenein2002;however,itwasn’tloveatfirstsight.WhenhecamebacktoLuceneinlate2003,herevisedhisthoughtsabouttheframeworkandsawthepotentialinsearchtechnologies.ThenSolrcameandthatwasit.HestartedworkingwithElasticsearchinthemiddleof2010.Atpresent,Lucene,Solr,Elasticsearch,andinformationretrievalarehismainareasofinterest.

RafałisalsotheauthoroftheSolrCookbookseries,ElasticSearchServeranditssecondedition,andthefirstandsecondeditionsofMasteringElasticSearch,allpublishedbyPacktPublishing.

MarekRogozińskiisasoftwarearchitectandconsultantwithmorethan10yearsofexperience.Hisspecializationconcernssolutionsbasedonopensourcesearchengines,suchasSolrandElasticsearch,andthesoftwarestackforbigdataanalyticsincludingHadoop,Hbase,andTwitterStorm.

Heisalsoacofounderofthesolr.plsite,whichpublishesinformationandtutorialsaboutSolrandLucenelibraries.HeisthecoauthorofElasticSearchServeranditssecondedition,andthefirstandsecondeditionsofMasteringElasticSearch,allpublishedbyPacktPublishing.

HeiscurrentlythechieftechnologyofficerandleadarchitectatZenCard,acompanythatprocessesandanalyzeslargequantitiesofpaymenttransactionsinrealtime,allowingautomaticandanonymousidentificationofretailcustomersonallretailerchannels(m-commerce/e-commerce/brick&mortar)andgivingretailersacustomerretentionandloyaltytool.

www.EBooksWorld.ir

Page 33: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 34: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AbouttheReviewerPaigeCookworksasasoftwarearchitectforVidea,partoftheCoxFamilyofCompanies,andlivesnearAtlanta,Georgia.Hehastwentyyearsofexperienceinsoftwaredevelopment,primarilywiththeMicrosoft.NETFramework.Hiscareerhasbeenlargelyfocusedonbuildingenterprisesolutionsforthemediaandentertainmentindustry.HeisespeciallyinterestedinsearchtechnologiesusingtheApacheLucenesearchengineandhasexperiencewithbothElasticsearchandApacheSolr.Apartfromhiswork,heenjoysDIYhomeprojectsandspendingtimewithhiswifeandtwodaughters.

www.EBooksWorld.ir

Page 35: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 36: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.PacktPub.com

www.EBooksWorld.ir

Page 37: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

eBooks,discountoffers,andmoreDidyouknowthatPacktofferseBookversionsofeverybookpublished,withPDFandePubfilesavailable?YoucanupgradetotheeBookversionatwww.PacktPub.comandasaprintbookcustomer,youareentitledtoadiscountontheeBookcopy.Getintouchwithusat<[email protected]>formoredetails.

Atwww.PacktPub.com,youcanalsoreadacollectionoffreetechnicalarticles,signupforarangeoffreenewslettersandreceiveexclusivediscountsandoffersonPacktbooksandeBooks.

https://www2.packtpub.com/books/subscription/packtlib

DoyouneedinstantsolutionstoyourITquestions?PacktLibisPackt’sonlinedigitalbooklibrary.Here,youcansearch,access,andreadPackt’sentirelibraryofbooks.

www.EBooksWorld.ir

Page 38: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Whysubscribe?FullysearchableacrosseverybookpublishedbyPacktCopyandpaste,print,andbookmarkcontentOndemandandaccessibleviaawebbrowser

www.EBooksWorld.ir

Page 39: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 40: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PrefaceWelcometoElasticsearchServer,ThirdEdition.ThisisthethirdinstalmentofthebookdedicatedtoyetanothermajorreleaseofElasticsearch—thistimeversion2.2.Inthethirdedition,wehavedecidedtogoonasimilarroutethatwetookwhenwewrotethesecondeditionofthebook.WenotonlyupdatedthecontenttomatchthenewversionofElasticsearch,butalsorestructuredthebookbyremovingandaddingnewsectionsandchapters.Wereadthesuggestionswegotfromyou—thereadersofthebook,andwecarefullytriedtoincorporatethesuggestionsandcommentsreceivedsincethereleaseofthefirstandsecondeditions.

Whilereadingthisbook,youwillbetakenonajourneytothewonderfulworldoffull-textsearchprovidedbytheElasticsearchserver.WewillstartwithageneralintroductiontoElasticsearch,whichcovershowtostartandrunElasticsearch,itsbasicconcepts,andhowtoindexandsearchyourdatainthemostbasicway.Thisbookwillalsodiscussthequerylanguage,socalledQueryDSL,thatallowsyoutocreatecomplicatedqueriesandfilterreturnedresults.Inadditiontoallofthis,you’llseehowyoucanusetheaggregationframeworktocalculateaggregateddatabasedontheresultsreturnedbyyourqueries.WewillimplementtheautocompletefunctionalitytogetherandlearnhowtouseElasticsearchspatialcapabilitiesandprospectivesearch.

Finally,thisbookwillshowyouElasticsearch’sadministrationAPIcapabilitieswithfeaturessuchasshardplacementcontrol,clusterhandling,andmore,endingwithadedicatedchapterthatwilldiscussElasticsearch’spreparationforsmallandlargedeployments—bothonesthatconcentrateonindexingandalsoonesthatconcentrateonindexing.

www.EBooksWorld.ir

Page 41: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WhatthisbookcoversChapter1,GettingStartedwithElasticsearchCluster,coverswhatfull-textsearchingis,whatApacheLuceneis,whattextanalysisis,howtorunandconfigureElasticsearch,andfinally,howtoindexandsearchyourdatainthemostbasicway.

Chapter2,IndexingYourData,showshowindexingworks,howtoprepareindexstructure,whatdatatypesweareallowedtouse,howtospeedupindexing,whatsegmentsare,howmergingworks,andwhatroutingis.

Chapter3,SearchingYourData,introducesthefull-textsearchcapabilitiesofElasticsearchbydiscussinghowtoqueryit,howthequeryingprocessworks,andwhattypesofbasicandcompoundqueriesareavailable.Inadditiontothis,wewillshowhowtouseposition-awarequeriesinElasticsearch.

Chapter4,ExtendingYourQueryKnowledge,showshowtoefficientlynarrowdownyoursearchresultsbyusingfilters,howhighlightingworks,howtosortyourresults,andhowqueryrewriteworks.

Chapter5,ExtendingYourIndexStructure,showshowtoindexmorecomplexdatastructures.Welearnhowtoindextree-likedatatypes,howtoindexdatawithrelationshipsbetweendocuments,andhowtomodifyindexstructure.

Chapter6,MakeYourSearchBetter,coversApacheLucenescoringandhowtoinfluenceitinElasticsearch,thescriptingcapabilitiesofElasticsearch,anditslanguageanalysiscapabilities.

Chapter7,AggregationsforDataAnalysis,introducesyoutothegreatworldofdataanalysisbyshowingyouhowtousetheElasticsearchaggregationframework.Wewilldiscussalltypesofaggregations—metrics,buckets,andthenewpipelineaggregationsthathavebeenintroducedinElasticsearch.

Chapter8,BeyondFull-textSearching,discussesnonfull-textsearch-relatedfunctionalitiessuchaspercolator—reversedsearch,andthegeo-spatialcapabilitiesofElasticsearch.Thischapteralsodiscussessuggesters,whichallowustobuildaspellcheckingfunctionalityandanefficientautocompletemechanism,andwewillshowhowtohandledeep-pagingefficiently.

Chapter9,ElasticsearchClusterinDetail,discussesnodesdiscoverymechanism,recoveryandgatewayElasticsearchmodules,templates,caches,andsettingsupdateAPI.

Chapter10,AdministratingYourCluster,coverstheElasticsearchbackupfunctionality,rebalancing,andshardsmoving.Inadditiontothis,youwilllearnhowtousethewarmupfunctionality,usetheCatAPI,andworkwithaliases.

Chapter11,ScalingbyExample,isdedicatedtoscalingandtuning.WewillstartwithhardwarepreparationsandconsiderationsandasingleElasticsearchnode-relatedtuning.Wewillgothroughclustersetupandverticalscaling,endingthechapterwithhighqueryingandindexingusecasesandclustermonitoring.

www.EBooksWorld.ir

Page 42: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 43: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WhatyouneedforthisbookThisbookwaswrittenusingElasticsearchserver2.2andalltheexamplesandfunctionsshouldworkwiththis.Inadditiontothis,you’llneedacommandthatallowsyoutosendHTTPrequestsuchascurl,whichisavailableformostoperatingsystems.Pleasenotethatalltheexamplesinthisbookusethepreviouslymentionedcurltool.Ifyouwanttouseanothertool,pleaseremembertoformattherequestinanappropriatewaythatisunderstoodbythetoolofyourchoice.

Inadditiontothis,somechaptersmayrequireadditionalsoftware,suchasElasticsearchplugins,butwhenneededithasbeenexplicitlymentioned.

www.EBooksWorld.ir

Page 44: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 45: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WhothisbookisforIfyouareabeginnertotheworldoffull-textsearchandElasticsearch,thenthisbookisespeciallyforyou.YouwillbeguidedthroughthebasicsofElasticsearchandyouwilllearnhowtousesomeoftheadvancedfunctionalities.

IfyouknowElasticsearchandyouworkedwithit,thenyoumayfindthisbookinterestingasitprovidesaniceoverviewofallthefunctionalitieswithexamplesanddescriptions.However,youmayencountersectionsthatyoualreadyknow.

IfyouknowtheApacheSolrsearchengine,thisbookcanalsobeusedtocomparesomefunctionalitiesofApacheSolrandElasticsearch.Thismaygiveyoutheknowledgeaboutwhichtoolismoreappropriateforyourusecase.

IfyouknowallthedetailsaboutElasticsearchandyouknowhoweachoftheconfigurationparameterswork,thenthisisdefinitelynotthebookyouarelookingfor.

www.EBooksWorld.ir

Page 46: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 47: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ConventionsInthisbook,youwillfindanumberoftextstylesthatdistinguishbetweendifferentkindsofinformation.Herearesomeexamplesofthesestylesandanexplanationoftheirmeaning.

Codewordsintext,databasetablenames,foldernames,filenames,fileextensions,pathnames,dummyURLs,userinput,andTwitterhandlesareshownasfollows:“IfyouusetheLinuxorOSXcommand,thecURLpackageshouldalreadybeavailable.”

Ablockofcodeissetasfollows:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"published":{"type":"date"},

"contents":{"type":"string"}

}

}

}

}

Whenwewishtodrawyourattentiontoaparticularpartofacodeblock,therelevantlinesoritemsaresetinbold:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"published":{"type":"date"},

"contents":{"type":"string"}

}

}

}

}

Anycommand-lineinputoroutputiswrittenasfollows:

curl-XPUThttp://localhost:9200/users/?pretty-d'{

"mappings":{

"user":{

"numeric_detection":true

}

}

}'

NoteWarningsorimportantnotesappearinaboxlikethis.

www.EBooksWorld.ir

Page 48: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 49: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ReaderfeedbackFeedbackfromourreadersisalwayswelcome.Letusknowwhatyouthinkaboutthisbook—whatyoulikedordisliked.Readerfeedbackisimportantforusasithelpsusdeveloptitlesthatyouwillreallygetthemostoutof.

Tosendusgeneralfeedback,simplye-mail<[email protected]>,andmentionthebook’stitleinthesubjectofyourmessage.

Ifthereisatopicthatyouhaveexpertiseinandyouareinterestedineitherwritingorcontributingtoabook,seeourauthorguideatwww.packtpub.com/authors.

www.EBooksWorld.ir

Page 50: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 51: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CustomersupportNowthatyouaretheproudownerofaPacktbook,wehaveanumberofthingstohelpyoutogetthemostfromyourpurchase.

www.EBooksWorld.ir

Page 52: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DownloadingtheexamplecodeYoucandownloadtheexamplecodefilesforthisbookfromyouraccountathttp://www.packtpub.com.Ifyoupurchasedthisbookelsewhere,youcanvisithttp://www.packtpub.com/supportandregistertohavethefilese-maileddirectlytoyou.

Youcandownloadthecodefilesbyfollowingthesesteps:

1. Loginorregistertoourwebsiteusingyoure-mailaddressandpassword.2. HoverthemousepointerontheSUPPORTtabatthetop.3. ClickonCodeDownloads&Errata.4. EnterthenameofthebookintheSearchbox.5. Selectthebookforwhichyou’relookingtodownloadthecodefiles.6. Choosefromthedrop-downmenuwhereyoupurchasedthisbookfrom.7. ClickonCodeDownload.

Oncethefileisdownloaded,pleasemakesurethatyouunziporextractthefolderusingthelatestversionof:

WinRAR/7-ZipforWindowsZipeg/iZip/UnRarXforMac7-Zip/PeaZipforLinux

www.EBooksWorld.ir

Page 53: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DownloadingthecolorimagesofthisbookWealsoprovideyouwithaPDFfilethathascolorimagesofthescreenshots/diagramsusedinthisbook.Thecolorimageswillhelpyoubetterunderstandthechangesintheoutput.Youcandownloadthisfilefromhttps://www.packtpub.com/sites/default/files/downloads/ElasticsearchServerThirdEdition_ColorImages.pdf

www.EBooksWorld.ir

Page 54: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ErrataAlthoughwehavetakeneverycaretoensuretheaccuracyofourcontent,mistakesdohappen.Ifyoufindamistakeinoneofourbooks—maybeamistakeinthetextorthecode—wewouldbegratefulifyoucouldreportthistous.Bydoingso,youcansaveotherreadersfromfrustrationandhelpusimprovesubsequentversionsofthisbook.Ifyoufindanyerrata,pleasereportthembyvisitinghttp://www.packtpub.com/submit-errata,selectingyourbook,clickingontheErrataSubmissionFormlink,andenteringthedetailsofyourerrata.Onceyourerrataareverified,yoursubmissionwillbeacceptedandtheerratawillbeuploadedtoourwebsiteoraddedtoanylistofexistingerrataundertheErratasectionofthattitle.

Toviewthepreviouslysubmittederrata,gotohttps://www.packtpub.com/books/content/supportandenterthenameofthebookinthesearchfield.TherequiredinformationwillappearundertheErratasection.

www.EBooksWorld.ir

Page 55: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PiracyPiracyofcopyrightedmaterialontheInternetisanongoingproblemacrossallmedia.AtPackt,wetaketheprotectionofourcopyrightandlicensesveryseriously.IfyoucomeacrossanyillegalcopiesofourworksinanyformontheInternet,pleaseprovideuswiththelocationaddressorwebsitenameimmediatelysothatwecanpursuearemedy.

Pleasecontactusat<[email protected]>withalinktothesuspectedpiratedmaterial.

Weappreciateyourhelpinprotectingourauthorsandourabilitytobringyouvaluablecontent.

www.EBooksWorld.ir

Page 56: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QuestionsIfyouhaveaproblemwithanyaspectofthisbook,youcancontactusat<[email protected]>,andwewilldoourbesttoaddresstheproblem.

www.EBooksWorld.ir

Page 57: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 58: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter1.GettingStartedwithElasticsearchClusterWelcometothewonderfulworldofElasticsearch—agreatfulltextsearchandanalyticsengine.Itdoesn’tmatterifyouarenewtoElasticsearchandfulltextsearchesingeneral,orifyoualreadyhavesomeexperienceinthis.Wehopethat,byreadingthisbook,you’llbeabletolearnandextendyourknowledgeofElasticsearch.Asthisbookisalsodedicatedtobeginners,wedecidedtostartwithashortintroductiontofulltextsearchesingeneral,andafterthat,abriefoverviewofElasticsearch.

PleaserememberthatElasticsearchisarapidlychangingofsoftware.Notonlyarefeaturesadded,buttheElasticsearchcorefunctionalityisalsoconstantlyevolvingandchanging.Wetrytokeepupwiththesechanges,andbecauseofthiswearegivingyouthethirdeditionofthebookdedicatedtoElasticsearch2.x.

ThefirstthingweneedtodowithElasticsearchisinstallandconfigureit.Withmanyapplications,youstartwiththeinstallationandconfigurationandusuallyforgettheimportanceofthesesteps.Wewilltrytoguideyouthroughthesestepssothatitbecomeseasiertoremember.Inadditiontothis,wewillshowyouthesimplestwaytoindexandretrievedatawithoutgoingintotoomuchdetail.ThefirstchapterwilltakeyouonaquickridethroughElasticsearchandthefulltextsearchworld.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

FulltextsearchingThebasicsofApacheLucenePerformingtextanalysisThebasicconceptsofElasticsearchInstallingandconfiguringElasticsearchUsingtheElasticsearchRESTAPItomanipulatedataSearchingusingbasicURIrequests

www.EBooksWorld.ir

Page 59: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

FulltextsearchingBackinthedayswhenfulltextsearchingwasatermknowntoasmallpercentageofengineers,mostofususedSQLdatabasestoperformsearchoperations.UsingSQLdatabasestosearchforthedatastoredinthemwasokaytosomeextent.Suchasearchwasn’tfast,especiallyonlargeamountsofdata.Evennow,smallapplicationsareusuallygoodwithastandardLIKE%phrase%searchinaSQLdatabase.However,aswegodeeperanddeeper,westarttoseethelimitsofsuchanapproach—alackofscalability,notenoughflexibility,andalackoflanguageanalysis.Ofcourse,thereareadditionalmodulesthatextendSQLdatabaseswithfulltextsearchcapabilities,buttheyarestilllimitedcomparedtodedicatedfulltextsearchlibrariesandsearchenginessuchasElasticsearch.SomeofthosereasonsledtothecreationofApacheLucene(http://lucene.apache.org/),alibrarywrittencompletelyinJava(http://java.com/en/),whichisveryfast,light,andprovideslanguageanalysisforalargenumberoflanguagesspokenthroughouttheworld.

www.EBooksWorld.ir

Page 60: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheLuceneglossaryandarchitectureBeforegoingintothedetailsoftheanalysisprocess,wewouldliketointroduceyoutotheglossaryandoverallarchitectureofApacheLucene.WedecidedthatthisinformationiscrucialforunderstandinghowElasticsearchworks,andeventhoughthebookisnotaboutApacheLucene,knowingthefoundationoftheElasticsearchanalyticsandindexingengineisvitaltofullyunderstandhowthisgreatsearchengineworks.

Thebasicconceptsofthementionedlibraryareasfollows:

Document:Thisisthemaindatacarrierusedduringindexingandsearching,comprisingoneormorefieldsthatcontainthedataweputinandgetfromLucene.Field:Thisasectionofthedocument,whichisbuiltoftwoparts:thenameandthevalue.Term:Thisisaunitofsearchrepresentingawordfromthetext.Token:Thisisanoccurrenceofaterminthetextofthefield.Itconsistsofthetermtext,startandendoffsets,andatype.

ApacheLucenewritesalltheinformationtoastructurecalledtheinvertedindex.Itisadatastructurethatmapsthetermsintheindextothedocumentsandnottheotherwayaroundasarelationaldatabasedoesinitstables.Youcanthinkofaninvertedindexasadatastructurewheredataisterm-orientedratherthandocument-oriented.Let’sseehowasimpleinvertedindexwilllook.Forexample,let’sassumethatwehavedocumentswithonlyasinglefieldcalledtitletobeindexed,andthevaluesofthatfieldareasfollows:

ElasticsearchServer(document1)MasteringElasticsearchSecondEdition(document2)ApacheSolrCookbookThirdEdition(document3)

AverysimplifiedvisualizationoftheLuceneinvertedindexcouldlookasfollows:

Eachtermpointstothenumberofdocumentsitispresentin.Forexample,theterm

www.EBooksWorld.ir

Page 61: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

editionispresenttwiceinthesecondandthirddocuments.Suchastructureallowsforveryefficientandfastsearchoperationsinterm-basedqueries(butnotexclusively).Becausetheoccurrencesofthetermareconnectedtothetermsthemselves,Lucenecanuseinformationaboutthetermoccurrencestoperformfastandprecisescoringinformationbygivingeachdocumentavaluethatrepresentshowwelleachofthereturneddocumentsmatchedthequery.

Ofcourse,theactualindexcreatedbyLuceneismuchmorecomplicatedandadvancedbecauseofadditionalfilesthatincludeinformationsuchastermvectors(perdocumentinvertedindex),docvalues(columnorientedfieldinformation),storedfields(theoriginalandnottheanalyzedvalueofthefield),andsoon.However,allyouneedtoknowfornowishowthedataisorganizedandnotwhatexactlyisstored.

Eachindexisdividedintomultiplewrite-onceandread-many-timestructurescalledsegments.EachsegmentisaminiatureApacheLuceneindexonitsown.Whenindexing,afterasinglesegmentiswrittentothediskitcan’tbeupdated,orweshouldrathersayitcan’tbefullyupdated;documentscan’tberemovedfromit,theycanonlybemarkedasdeletedinaseparatefile.ThereasonthatLucenedoesn’tallowsegmentstobeupdatedisthenatureoftheinvertedindex.Afterthefieldsareanalyzedandputintotheinvertedindex,thereisnoeasywayofbuildingtheoriginaldocumentstructure.Whendeleting,Lucenewouldhavetodeletetheinformationfromthesegment,whichtranslatestoupdatingalltheinformationwithintheinvertedindexitself.

Becauseofthefactthatsegmentsarewrite-oncestructuresLuceneisabletomergesegmentstogetherinaprocesscalledsegmentmerging.Duringindexing,ifLucenethinksthattherearetoomanysegmentsfallingintothesamecriterion,anewandbiggersegmentwillbecreated—onethatwillhavedatafromtheothersegments.Duringthatprocess,Lucenewilltrytoremovedeleteddataandgetbackthespaceneededtoholdinformationaboutthosedocuments.SegmentmergingisademandingoperationbothintermsoftheI/OandCPU.Whatwehavetorememberfornowisthatsearchingwithonelargesegmentisfasterthansearchingwithmultiplesmalleronesholdingthesamedata.That’sbecause,ingeneral,searchingtranslatestojustmatchingthequerytermstotheonesthatareindexed.Youcanimaginehowsearchingthroughmultiplesmallsegmentsandmergingthoseresultswillbeslowerthanhavingasinglesegmentpreparingtheresults.

www.EBooksWorld.ir

Page 62: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

InputdataanalysisThetransformationofadocumentthatcomestoLuceneandisprocessedandputintotheinvertedindexformatiscalledindexation.OneofthethingsLucenehastododuringthisisdataanalysis.Youmaywantsomeofyourfieldstobeprocessedbyalanguageanalyzersothatwordssuchascarandcarsaretreatedasthesamebeyourindex.Ontheotherhand,youmaywantotherfieldstobedividedonlyonthewhitespacecharacterorbeonlylowercased.

Analysisisdonebytheanalyzer,whichisbuiltofatokenizerandzeroormoretokenfilters,anditcanalsohavezeroormorecharactermappers.

AtokenizerinLuceneisusedtosplitthetextintotokens,whicharebasicallythetermswithadditionalinformationsuchasitspositionintheoriginaltextanditslength.Theresultsofthetokenizer’sworkiscalledatokenstream,wherethetokensareputonebyoneandarereadytobeprocessedbythefilters.

Apartfromthetokenizer,theLuceneanalyzerisbuiltofzeroormoretokenfiltersthatareusedtoprocesstokensinthetokenstream.Someexamplesoffiltersareasfollows:

Lowercasefilter:MakesallthetokenslowercasedSynonymsfilter:ChangesonetokentoanotheronthebasisofsynonymrulesLanguagestemmingfilters:Responsibleforreducingtokens(actually,thetextpartthattheyprovide)intotheirrootorbaseformscalledthestem(https://en.wikipedia.org/wiki/Word_stem)

Filtersareprocessedoneafteranother,sowehavealmostunlimitedanalyticalpossibilitieswiththeadditionofmultiplefilters,oneafteranother.

Finally,thecharactermappersoperateonnon-analyzedtext—theyareusedbeforethetokenizer.Therefore,wecaneasilyremoveHTMLtagsfromwholepartsoftextwithoutworryingabouttokenization.

www.EBooksWorld.ir

Page 63: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexingandqueryingYoumaywonderhowalltheinformationwe’vedescribedsofaraffectsindexingandqueryingwhenusingLuceneandallthesoftwarethatisbuiltontopofit.Duringindexing,Lucenewilluseananalyzerofyourchoicetoprocessthecontentsofyourdocument;ofcourse,differentanalyzerscanbeusedfordifferentfields,sothenamefieldofyourdocumentcanbeanalyzeddifferentlycomparedtothesummaryfield.Forexample,thenamefieldmayonlybetokenizedonwhitespacesandlowercased,sothatexactmatchesaredoneandthesummaryfieldisstemmedinadditiontothat.Wecanalsodecidetonotanalyzethefieldsatall—wehavefullcontrolovertheanalysisprocess.

Duringaquery,yourquerytextcanbeanalyzedaswell.However,youcanalsochoosenottoanalyzeyourqueries.ThisiscrucialtorememberbecausesomeElasticsearchqueriesareanalyzedandsomearenot.Forexample,prefixandtermqueriesarenotanalyzed,andmatchqueriesareanalyzed(wewillgettothatinChapter3,SearchingYourData).Havingqueriesthatareanalyzedandnotanalyzedisveryuseful;sometimes,youmaywanttoqueryafieldthatisnotanalyzed,whilesometimesyoumaywanttohaveafulltextsearchanalysis.Forexample,ifwesearchfortheLightRedtermandthequeryisbeinganalyzedbythestandardanalyzer,thenthetermsthatwouldbesearchedarelightandred.Ifweuseaquerytypethathasnotbeenanalyzed,thenwewillexplicitlysearchfortheLightRedterm.Wemaynotwanttoanalyzethecontentofthequeryifweareonlyinterestedinexactmatches.

Whatyoushouldrememberaboutindexingandqueryinganalysisisthattheindexshouldmatchthequeryterm.Iftheydon’tmatch,Lucenewon’treturnthedesireddocuments.Forexample,ifyouusestemmingandlowercasingduringindexing,youneedtoensurethatthetermsinthequeryarealsolowercasedandstemmed,oryourquerieswon’treturnanyresultsatall.Forexample,let’sgetbacktoourLightRedtermthatweanalyzedduringindexing;wehaveitastwotermsintheindex:lightandred.IfwerunaLightRedqueryagainstthatdataanddon’tanalyzeit,wewon’tgetthedocumentintheresults—thequerytermdoesnotmatchtheindexedterms.Itisimportanttokeepthetokenfiltersinthesameorderduringindexingandquerytimeanalysissothatthetermsresultingfromsuchananalysisarethesame.

www.EBooksWorld.ir

Page 64: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ScoringandqueryrelevanceThereisoneadditionalthingthatweonlymentionedoncetillnow—scoring.Whatisthescoreofadocument?Thescoreisaresultofascoringformulathatdescribeshowwellthedocumentmatchesthequery.Bydefault,ApacheLuceneusestheTF/IDF(termfrequency/inversedocumentfrequency)scoringmechanism,whichisanalgorithmthatcalculateshowrelevantthedocumentisinthecontextofourquery.Ofcourse,itisnottheonlyalgorithmavailable,andwewillmentionotheralgorithmsintheMappingsconfigurationsectionofChapter2,IndexingYourData.

NoteIfyouwanttoreadmoreabouttheApacheLuceneTF/IDFscoringformula,pleasevisitApacheLuceneJavadocsfortheTFIDF.Thesimilarityclassisavailableathttp://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

www.EBooksWorld.ir

Page 65: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 66: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThebasicsofElasticsearchElasticsearchisanopensourcesearchserverprojectstartedbyShayBanonandpublishedinFebruary2010.Duringthistime,theprojectgrewintoamajorplayerinthefieldofsearchanddataanalysissolutionsandiswidelyusedinmanycommonorlesser-knownsearchanddataanalysisplatforms.Inaddition,duetoitsdistributednatureandreal-timesearchandanalyticscapabilities,manyorganizationsuseitasadocumentstore.

www.EBooksWorld.ir

Page 67: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

KeyconceptsofElasticsearchInthenextfewpages,wewillgetyouthroughthebasicconceptsofElasticsearch.YoucanskipthissectionifyouarealreadyfamiliarwithElasticsearcharchitecture.However,ifyouarenotfamiliarwithElasticsearch,westronglyadviseyoutoreadthissection.Wewillrefertothekeywordsusedinthissectionintherestofthebook,andunderstandingthoseconceptsiscrucialtofullyutilizeElasticsearch.

IndexAnindexisthelogicalplacewhereElasticsearchstoresthedata.EachindexcanbespreadontomultipleElasticsearchnodesandisdividedintooneormoresmallerpiecescalledshardsthatarephysicallyplacedontheharddrives.Ifyouarecomingfromtherelationaldatabaseworld,youcanthinkofanindexlikeatable.However,theindexstructureispreparedforfastandefficientfulltextsearchingand,inparticular,doesnotstoreoriginalvalues.Thatstructureiscalledaninvertedindex(https://en.wikipedia.org/wiki/Inverted_index).

IfyouknowMongoDB,youcanthinkoftheElasticsearchindexasacollectioninMongoDB.IfyouarefamiliarwithCouchDB,youcanthinkaboutanindexasyouwouldabouttheCouchDBdatabase.Elasticsearchcanholdmanyindiceslocatedononemachineorspreadthemovermultipleservers.Aswehavealreadysaid,everyindexisbuiltofoneormoreshards,andeachshardcanhavemanyreplicas.

DocumentThemainentitystoredinElasticsearchisadocument.Adocumentcanhavemultiplefields,eachhavingitsowntypeandtreateddifferently.Usingtheanalogytorelationaldatabases,adocumentisarowofdatainadatabasetable.WhenyoucompareanElasticsearchdocumenttoaMongoDBdocument,youwillseethatbothcanhavedifferentstructures.ThethingtokeepinmindwhenitcomestoElasticsearchisthatfieldsthatarecommontomultipletypesinthesameindexneedtohavethesametype.Thismeansthatallthedocumentswithafieldcalledtitleneedtohavethesamedatatypeforit,forexample,string.

Documentsconsistoffields,andeachfieldmayoccurseveraltimesinasingledocument(suchafieldiscalledmultivalued).Eachfieldhasatype(text,number,date,andsoon).Thefieldtypescanalsobecomplex—afieldcancontainothersubdocumentsorarrays.ThefieldtypeisimportanttoElasticsearchbecausetypedetermineshowvariousoperationssuchasanalysisorsortingareperformed.Fortunately,thiscanbedeterminedautomatically(however,westillsuggestusingmappings;takealookatwhatfollows).

Unliketherelationaldatabases,documentsdon’tneedtohaveafixedstructure—everydocumentmayhaveadifferentsetoffields,andinadditiontothis,fieldsdon’thavetobeknownduringapplicationdevelopment.Ofcourse,onecanforceadocumentstructurewiththeuseofschema.Fromtheclient’spointofview,adocumentisaJSONobject(seemoreabouttheJSONformatathttps://en.wikipedia.org/wiki/JSON).Eachdocumentisstoredinoneindexandhasitsownuniqueidentifier,whichcanbegenerated

www.EBooksWorld.ir

Page 68: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

automaticallybyElasticsearch,anddocumenttype.Thethingtorememberisthatthedocumentidentifierneedstobeuniqueinsideanindexandshouldbeforagiventype.Thismeansthat,inasingleindex,twodocumentscanhavethesameuniqueidentifieriftheyarenotofthesametype.

DocumenttypeInElasticsearch,oneindexcanstoremanyobjectsservingdifferentpurposes.Forexample,ablogapplicationcanstorearticlesandcomments.Thedocumenttypeletsuseasilydifferentiatebetweentheobjectsinasingleindex.Everydocumentcanhaveadifferentstructure,butinreal-worlddeployments,dividingdocumentsintotypessignificantlyhelpsindatamanipulation.Ofcourse,oneneedstokeepthelimitationsinmind.Thatis,differentdocumenttypescan’tsetdifferenttypesforthesameproperty.Forexample,afieldcalledtitlemusthavethesametypeacrossalldocumenttypesinagivenindex.

MappingInthesectionaboutthebasicsoffulltextsearching(theFulltextsearchingsection),wewroteabouttheprocessofanalysis—thepreparationoftheinputtextforindexingandsearchingdonebytheunderlyingApacheLucenelibrary.Everyfieldofthedocumentmustbeproperlyanalyzeddependingonitstype.Forexample,adifferentanalysischainisrequiredforthenumericfields(numbersshouldn’tbesortedalphabetically)andforthetextfetchedfromwebpages(forexample,thefirststepwouldrequireyoutoomittheHTMLtagsasitisuselessinformation).Tobeabletoproperlyanalyzeatindexingandqueryingtime,Elasticsearchstorestheinformationaboutthefieldsofthedocumentsinso-calledmappings.Everydocumenttypehasitsownmapping,evenifwedon’texplicitlydefineit.

www.EBooksWorld.ir

Page 69: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

KeyconceptsoftheElasticsearchinfrastructureNow,wealreadyknowthatElasticsearchstoresitsdatainoneormoreindicesandeveryindexcancontaindocumentsofvarioustypes.WealsoknowthateachdocumenthasmanyfieldsandhowElasticsearchtreatsthesefieldsisdefinedbythemappings.Butthereismore.Fromthebeginning,Elasticsearchwascreatedasadistributedsolutionthatcanhandlebillionsofdocumentsandhundredsofsearchrequestspersecond.Thisisduetoseveralimportantkeyfeaturesandconceptsthatwearegoingtodescribeinmoredetailnow.

NodesandclustersElasticsearchcanworkasastandalone,single-searchserver.Nevertheless,tobeabletoprocesslargesetsofdataandtoachievefaulttoleranceandhighavailability,Elasticsearchcanberunonmanycooperatingservers.Collectively,theseserversconnectedtogetherarecalledaclusterandeachserverformingaclusteriscalledanode.

ShardsWhenwehavealargenumberofdocuments,wemaycometoapointwhereasinglenodemaynotbeenough—forexample,becauseofRAMlimitations,harddiskcapacity,insufficientprocessingpower,andaninabilitytorespondtoclientrequestsfastenough.Insuchcases,anindex(andthedatainit)canbedividedintosmallerpartscalledshards(whereeachshardisaseparateApacheLuceneindex).Eachshardcanbeplacedonadifferentserver,andthusyourdatacanbespreadamongtheclusternodes.Whenyouqueryanindexthatisbuiltfrommultipleshards,Elasticsearchsendsthequerytoeachrelevantshardandmergestheresultinsuchawaythatyourapplicationdoesn’tknowabouttheshards.Inadditiontothis,havingmultipleshardscanspeedupindexing,becausedocumentsendupindifferentshardsandthustheindexingoperationisparallelized.

ReplicasInordertoincreasequerythroughputorachievehighavailability,shardreplicascanbeused.Areplicaisjustanexactcopyoftheshard,andeachshardcanhavezeroormorereplicas.Inotherwords,Elasticsearchcanhavemanyidenticalshardsandoneofthemisautomaticallychosenasaplacewheretheoperationsthatchangetheindexaredirected.Thisspecialshardiscalledaprimaryshard,andtheothersarecalledreplicashards.Whentheprimaryshardislost(forexample,aserverholdingthesharddataisunavailable),theclusterwillpromotethereplicatobethenewprimaryshard.

GatewayTheclusterstateisheldbythegateway,whichstorestheclusterstateandindexeddataacrossfullclusterrestarts.Bydefault,everynodehasthisinformationstoredlocally;itissynchronizedamongnodes.WewilldiscussthegatewaymoduleinThegatewayandrecoverymodulessectionofChapter9,ElasticsearchCluster,indetail.

www.EBooksWorld.ir

Page 70: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexingandsearchingYoumaywonderhowyoucantiealltheindices,shards,andreplicastogetherinasingleenvironment.Theoretically,itwouldbeverydifficulttofetchdatafromtheclusterwhenyouhavetoknowwhereyourdocumentis:onwhichserver,andinwhichshard.Evenmoredifficultwouldbesearchingwhenonequerycanreturndocumentsfromdifferentshardsplacedondifferentnodesinthewholecluster.Infact,thisisacomplicatedproblem;fortunately,wedon’thavetocareaboutthisatall—itishandledautomaticallybyElasticsearch.Let’slookatthefollowingdiagram:

Whenyousendanewdocumenttothecluster,youspecifyatargetindexandsendittoanyofthenodes.Thenodeknowshowmanyshardsthetargetindexhasandisabletodeterminewhichshardshouldbeusedtostoreyourdocument.Elasticsearchcanalterthisbehavior;wewilltalkaboutthisintheIntroductiontoroutingsectioninChapter2,IndexingYourData.TheimportantinformationthatyouhavetorememberfornowisthatElasticsearchcalculatestheshardinwhichthedocumentshouldbeplacedusingtheuniqueidentifierofthedocument—thisisoneofthereasonseachdocumentneedsauniqueidentifier.Aftertheindexingrequestissenttoanode,thatnodeforwardsthedocumenttothetargetnode,whichhoststherelevantshard.

Now,let’slookatthefollowingdiagramonsearchingrequestexecution:

www.EBooksWorld.ir

Page 71: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Whenyoutrytofetchadocumentbyitsidentifier,thenodeyousendthequerytousesthesameroutingalgorithmtodeterminetheshardandthenodeholdingthedocumentandagainforwardstherequest,fetchestheresult,andsendstheresulttoyou.Ontheotherhand,thequeryingprocessisamorecomplicatedone.Thenodereceivingthequeryforwardsittoallthenodesholdingtheshardsthatbelongtoagivenindexandasksforminimuminformationaboutthedocumentsthatmatchthequery(theidentifierandscorearematchedbydefault),unlessroutingisused,whenthequerywillgodirectlytoasingleshardonly.Thisiscalledthescatterphase.Afterreceivingthisinformation,theaggregatornode(thenodethatreceivestheclientrequest)sortstheresultsandsendsasecondrequesttogetthedocumentsthatareneededtobuildtheresultslist(alltheotherinformationapartfromthedocumentidentifierandscore).Thisiscalledthegatherphase.Afterthisphaseisexecuted,theresultsarereturnedtotheclient.

Nowthequestionarises:whatisthereplica’sroleinthepreviouslydescribedprocess?Whileindexing,replicasareonlyusedasanadditionalplacetostorethedata.Whenexecutingaquery,bydefault,Elasticsearchwilltrytobalancetheloadamongtheshardanditsreplicassothattheyareevenlystressed.Also,rememberthatwecanchangethisbehavior;wewilldiscussthisintheUnderstandingthequeryingprocesssectioninChapter3,SearchingYourData.

www.EBooksWorld.ir

Page 72: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 73: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

InstallingandconfiguringyourclusterInstallingandrunningElasticsearcheveninproductionenvironmentsisveryeasynowadays,comparedtohowitwasinthedaysofElasticsearch0.20.x.FromasystemthatisnotreadytoonewithElasticsearch,thereareonlyafewstepsthatoneneedstogo.Wewillexplorethesestepsinthefollowingsection:

www.EBooksWorld.ir

Page 74: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

InstallingJavaElasticsearchisaJavaapplicationandtouseitweneedtomakesurethattheJavaSEenvironmentisinstalledproperly.ElasticsearchrequiresJavaVersion7orlatertorun.Youcandownloaditfromhttp://www.oracle.com/technetwork/java/javase/downloads/index.html.YoucanalsouseOpenJDK(http://openjdk.java.net/)ifyouwish.Youcan,ofcourse,useJavaVersion7,butitisnotsupportedbyOracleanymore,atleastwithoutcommercialsupport.Forexample,youcan’texpectnew,patchedversionsofJava7tobereleased.Becauseofthis,westronglysuggestthatyouinstallJava8,especiallygiventhatJava9seemstoberightaroundthecornerwiththegeneralavailabilityplannedtobereleasedinSeptember2016.

www.EBooksWorld.ir

Page 75: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

InstallingElasticsearchToinstallElasticsearchyoujustneedtogotohttps://www.elastic.co/downloads/elasticsearch,choosethelaststableversionofElasticsearch,downloadit,andunpackit.That’sit!Theinstallationiscomplete.

NoteAtthetimeofwriting,weusedasnapshotofElasticsearch2.2.Thismeansthatwe’veskippeddescribingsomepropertiesthatweremarkedasdeprecatedandareorwillberemovedinthefutureversionsofElasticsearch.

ThemaininterfacetocommunicatewithElasticsearchisbasedontheHTTPprotocolandREST.Thismeansthatyoucanevenuseawebbrowserforsomebasicqueriesandrequests,butforanythingmoresophisticatedyou’llneedtouseadditionalsoftware,suchasthecURLcommand.IfyouusetheLinuxorOSXcommand,thecURLpackageshouldalreadybeavailable.IfyouuseWindows,youcandownloadthepackagefromhttp://curl.haxx.se/download.html.

www.EBooksWorld.ir

Page 76: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RunningElasticsearchLet’srunourfirstinstancethatwejustdownloadedastheZIParchiveandunpacked.GotothebindirectoryandrunthefollowingcommandsdependingontheOS:

LinuxorOSX:./elasticsearchWindows:elasticsearch.bat

Congratulations!Now,youhaveyourElasticsearchinstanceup-and-running.Duringitswork,theserverusuallyusestwoportnumbers:thefirstoneforcommunicationwiththeRESTAPIusingtheHTTPprotocol,andthesecondoneforthetransportmoduleusedforcommunicationinaclusterandbetweenthenativeJavaclientandthecluster.ThedefaultportusedfortheHTTPAPIis9200,sowecanchecksearchreadinessbypointingthewebbrowsertohttp://127.0.0.1:9200/.Thebrowsershouldshowacodesnippetsimilartothefollowing:

{

"name":"Blob",

"cluster_name":"elasticsearch",

"version":{

"number":"2.2.0",

"build_hash":"5b1dd1cf5a1957682d84228a569e124fedf8e325",

"build_timestamp":"2016-01-13T18:12:26Z",

"build_snapshot":true,

"lucene_version":"5.4.0"

},

"tagline":"YouKnow,forSearch"

}

TheoutputisstructuredasaJavaScriptObjectNotation(JSON)object.IfyouarenotfamiliarwithJSON,pleasetakeaminuteandreadthearticleavailableathttps://en.wikipedia.org/wiki/JSON.

NoteElasticsearchissmart.Ifthedefaultportisnotavailable,theenginebindstothenextfreeport.Youcanfindinformationaboutthisontheconsoleduringbootingasfollows:

[2016-01-1320:04:49,953][INFO][http][Blob]publish_address

{127.0.0.1:9201},bound_addresses{[fe80::1]:9200},{[::1]:9200},

{127.0.0.1:9201}

Notethefragmentwith[http].Elasticsearchusesafewportsforvarioustasks.TheinterfacethatweareusingishandledbytheHTTPmodule.

Now,wewillusethecURLprogramtocommunicatewithElasticsearch.Forexample,tochecktheclusterhealth,wewillusethefollowingcommand:

curl-XGEThttp://127.0.0.1:9200/_cluster/health?pretty

The-XparameterisadefinitionoftheHTTPrequestmethod.ThedefaultvalueisGET(sointhisexample,wecanomitthisparameter).Fornow,donotworryabouttheGETvalue;wewilldescribeitinmoredetaillaterinthischapter.

www.EBooksWorld.ir

Page 77: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Asastandard,theAPIreturnsinformationinaJSONobjectinwhichnewlinecharactersareomitted.TheprettyparameteraddedtoourrequestsforcesElasticsearchtoaddanewlinecharactertotheresponse,makingtheresponsemoreuser-friendly.Youcantryrunningtheprecedingquerywithandwithoutthe?prettyparametertoseethedifference.

Elasticsearchisusefulinsmallandmedium-sizedapplications,butithasbeenbuiltwithlargeclustersinmind.So,nowwewillsetupourbigtwo-nodecluster.UnpacktheElasticsearcharchiveinadifferentdirectoryandrunthesecondinstance.Ifwelookatthelog,wewillseethefollowing:

[2016-01-1320:07:58,561][INFO][cluster.service][BigMan]

detected_master{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300},

added{{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300},},reason:

zen-disco-receive(frommaster[{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}

{127.0.0.1:9300}])

Thismeansthatoursecondinstance(namedBigMan)discoveredthepreviouslyrunninginstance(namedBlob).Here,Elasticsearchautomaticallyformedanewtwo-nodecluster.StartingfromElasticsearch2.0,thiswillonlyworkwithnodesrunningonthesamephysicalmachine—becauseElasticsearch2.0nolongersupportsmulticast.Toallowyourclustertoform,youneedtoinformElasticsearchaboutthenodesthatshouldbecontactedinitiallyusingthediscovery.zen.ping.unicast.hostsarrayinelasticsearch.yml.Forexample,likethis:

discovery.zen.ping.unicast.hosts:["192.168.2.1","192.168.2.2"]

www.EBooksWorld.ir

Page 78: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ShuttingdownElasticsearchEventhoughweexpectourcluster(ornode)torunflawlesslyforalifetime,wemayneedtorestartitorshutitdownproperly(forexample,formaintenance).ThefollowingarethetwowaysinwhichwecanshutdownElasticsearch:

Ifyournodeisattachedtotheconsole,justpressCtrl+CThesecondoptionistokilltheserverprocessbysendingtheTERMsignal(seethekillcommandontheLinuxboxesandProgramManageronWindows)

NoteThepreviousversionsofElasticsearchexposedadedicatedshutdownAPIbut,in2.0,thisoptionhasbeenremovedbecauseofsecurityreasons.

www.EBooksWorld.ir

Page 79: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThedirectorylayoutNow,let’sgotothenewlycreateddirectory.Weshouldseethefollowingdirectorystructure:

Directory Description

Bin ThescriptsneededtorunElasticsearchinstancesandforpluginmanagement

Config Thedirectorywhereconfigurationfilesarelocated

Lib ThelibrariesusedbyElasticsearch

Modules ThepluginsbundledwithElasticsearch

AfterElasticsearchstarts,itwillcreatethefollowingdirectories(iftheydon’texist):

Directory Description

Data ThedirectoryusedbyElasticsearchtostoreallthedata

Logs Thefileswithinformationabouteventsanderrors

Plugins Thelocationtostoretheinstalledplugins

Work ThetemporaryfilesusedbyElasticsearch

www.EBooksWorld.ir

Page 80: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ConfiguringElasticsearchOneofthereasons—ofcourse,nottheonlyone—whyElasticsearchisgainingmoreandmorepopularityisthatgettingstartedwithElasticsearchisquiteeasy.Becauseofthereasonabledefaultvaluesandautomaticsettingsforsimpleenvironments,wecanskiptheconfigurationandgostraighttoindexingandquerying(ortothenextchapterofthebook).Wecandoallthiswithoutchangingasinglelineinourconfigurationfiles.However,inordertotrulyunderstandElasticsearch,itisworthunderstandingsomeoftheavailablesettings.

WewillnowexplorethedefaultdirectoriesandthelayoutofthefilesprovidedwiththeElasticsearchtar.gzarchive.Theentireconfigurationislocatedintheconfigdirectory.Wecanseetwofileshere:elasticsearch.yml(orelasticsearch.json,whichwillbeusedifpresent)andlogging.yml.Thefirstfileisresponsibleforsettingthedefaultconfigurationvaluesfortheserver.Thisisimportantbecausesomeofthesevaluescanbechangedatruntimeandcanbekeptasapartoftheclusterstate,sothevaluesinthisfilemaynotbeaccurate.Thetwovaluesthatwecannotchangeatruntimearecluster.nameandnode.name.

Thecluster.namepropertyisresponsibleforholdingthenameofourcluster.Theclusternameseparatesdifferentclustersfromeachother.Nodesconfiguredwiththesameclusternamewilltrytoformacluster.

Thesecondvalueistheinstance(thenode.nameproperty)name.Wecanleavethisparameterundefined.Inthiscase,Elasticsearchautomaticallychoosesauniquenameforitself.Notethatthisnameischosenduringeachstartup,sothenamecanbedifferentoneachrestart.DefiningthenamecanhelpfulwhenreferringtoconcreteinstancesbytheAPIorwhenusingmonitoringtoolstoseewhatishappeningtoanodeduringlongperiodsoftimeandbetweenrestarts.Thinkaboutgivingdescriptivenamestoyournodes.

Otherparametersarecommentedwellinthefile,soweadviseyoutolookthroughit;don’tworryifyoudonotunderstandtheexplanation.Wehopethateverythingwillbecomeclearerafterreadingthenextfewchapters.

NoteRememberthatmostoftheparametersthathavebeensetintheelasticsearch.ymlfilecanbeoverwrittenwiththeuseoftheElasticsearchRESTAPI.WewilltalkaboutthisAPIinTheupdatesettingsAPIsectionofChapter9,ElasticsearchClusterinDetail.

Thesecondfile(logging.yml)defineshowmuchinformationiswrittentosystemlogs,definesthelogfiles,andcreatesnewfilesperiodically.Changesinthisfileareusuallyrequiredonlywhenyouneedtoadapttomonitoringorbackupsolutionsorduringsystemdebugging;however,ifyouwanttohaveamoredetailedlogging,youneedtoadjustitaccordingly.

Let’sleavetheconfigurationfilesfornowandlookatthebaseforalltheapplications—theoperatingsystem.TuningyouroperatingsystemisoneofthekeypointstoensurethatyourElasticsearchinstancewillworkwell.Duringindexing,especiallywhenhaving

www.EBooksWorld.ir

Page 81: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

manyshardsandreplicas,Elasticsearchwillcreatemanyfiles;so,thesystemcannotlimittheopenfiledescriptorstolessthan32,000.ForLinuxservers,thiscanusuallybechangedin/etc/security/limits.confandthecurrentvaluecanbedisplayedusingtheulimitcommand.Ifyouendupreachingthelimit,Elasticsearchwillnotbeabletocreatenewfiles;somergingwillfail,indexingmayfail,andnewindiceswillnotbecreated.

NoteOnMicrosoftWindowsplatforms,thedefaultlimitismorethan16millionhandlesperprocess,whichshouldbemorethanenough.YoucanreadmoreaboutfilehandlesontheMicrosoftWindowsplatformathttps://blogs.technet.microsoft.com/markrussinovich/2009/09/29/pushing-the-limits-of-windows-handles/.

ThenextsetofsettingsisconnectedtotheJavaVirtualMachine(JVM)heapmemorylimitforasingleElasticsearchinstance.Forsmalldeployments,thedefaultmemorylimit(1,024MB)willbesufficient,butforlargeonesitwillnotbeenough.IfyouspotentriesthatindicateOutOfMemoryErrorexceptionsinalogfile,settheES_HEAP_SIZEvariabletoavaluegreaterthan1024.WhenchoosingtherightamountofmemorysizetobegiventotheJVM,rememberthat,ingeneral,nomorethan50percentofyourtotalsystemmemoryshouldbegiven.However,aswithalltherules,thereareexceptions.Wewilldiscussthisingreaterdetaillater,butyoushouldalwaysmonitoryourJVMheapusageandadjustitwhenneeded.

www.EBooksWorld.ir

Page 82: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thesystem-specificinstallationandconfigurationAlthoughdownloadinganarchivewithElasticsearchandunpackingitworksandisconvenientfortesting,therearededicatedmethodsforLinuxoperatingsystemsthatgiveyouseveraladvantageswhenyoudoproductiondeployment.Inproductiondeployments,theElasticsearchserviceshouldberunautomaticallywithasystemboot;weshouldhavededicatedstartandstopscripts,unifiedpaths,andsoon.ElasticsearchsupportsinstallationpackagesforvariousLinuxdistributionsthatwecanuse.Let’sseehowthisworks.

InstallingElasticsearchonLinuxTheotherwaytoinstallElasticsearchonaLinuxoperatingsystemistousepackagessuchasRPMorDEB,dependingonyourLinuxdistributionandthesupportedpackagetype.Thiswaywecanautomaticallyadapttosystemdirectorylayout;forexample,configurationandlogswillgointotheirstandardplacesinthe/etc/or/var/logdirectories.Butthisisnottheonlything.Whenusingpackages,Elasticsearchwillalsoinstallstartupscriptsandmakeourlifeeasier.What’smore,wewillbeabletoupgradeElasticsearcheasilybyrunningasinglecommandfromthecommandline.Ofcourse,thementionedpackagescanbefoundatthesameURLaddressaswementionedpreviouslywhenwetalkedaboutinstallingElasticsearchfromziportar.gzpackages:https://www.elastic.co/downloads/elasticsearch.Elasticsearchcanalsobeinstalledfromremoterepositoriesviastandarddistributiontoolssuchasapt-getoryum.

NoteBeforeinstallingElasticsearch,makesurethatyouhaveaproperversionofJavaVirtualMachineinstalled.

InstallingElasticsearchusingRPMpackages

WhenusingaLinuxdistributionthatsupportsRPMpackagessuchasFedoraLinux,(https://getfedora.org/)Elasticsearchinstallationisveryeasy.AfterdownloadingtheRPMpackage,wejustneedtorunthefollowingcommandasroot:

yumelasticsearch-2.2.0.noarch.rpm

Alternatively,youcanaddtheremoterepositoryandinstallElasticsearchfromit(thiscommandneedstoberunasrootaswell):

rpm--importhttps://packages.elastic.co/GPG-KEY-elasticsearch

ThiscommandaddstheGPGkeyandallowsthesystemtoverifythatthefetchedpackagereallycomesfromElasticsearchdevelopers.Inthesecondstep,weneedtocreatetherepositorydefinitioninthe/etc/yum.repos.d/elasticsearch.repofile.Weneedtoaddthefollowingentriestothisfile:

[elasticsearch-2.2]

name=Elasticsearchrepositoryfor2.2.xpackages

baseurl=http://packages.elastic.co/elasticsearch/2.x/centos

gpgcheck=1

www.EBooksWorld.ir

Page 83: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch

enabled=1

Nowit’stimetoinstalltheElasticsearchserver,whichisassimpleasrunningthefollowingcommand(again,don’tforgettorunitasroot):

yuminstallelasticsearch

Elasticsearchwillbeautomaticallydownloaded,verified,andinstalled.

InstallingElasticsearchusingtheDEBpackage

WhenusingaLinuxdistributionthatsupportsDEBpackages(suchasDebian),installingElasticsearchisagainveryeasy.AfterdownloadingtheDEBpackage,allyouneedtodoisrunthefollowingcommand:

sudodpkg-ielasticsearch-2.2.0.deb

Itisassimpleasthat.Anotherway,whichissimilartowhatwedidwithRPMpackages,isbycreatinganewpackagessourceandinstallingElasticsearchfromtheremoterepository.ThefirststepistoaddthepublicGPGkeyusedforpackageverification.Wecandothatusingthefollowingcommand:

wget-qO-https://packages.elastic.co/GPG-KEY-elasticsearch|sudoapt-key

add-

ThesecondstepisbyaddingtheDEBpackagelocation.Weneedtoaddthefollowinglinetothe/etc/apt/sources.listfile:

debhttp://packages.elastic.co/elasticsearch/2.2/debianstablemain

ThisdefinesthesourcefortheElasticsearchpackages.ThelaststepisupdatingthelistofremotepackagesandinstallingElasticsearchusingthefollowingcommand:

sudoapt-getupdate&&sudoapt-getinstallelasticsearch

Elasticsearchconfigurationfilelocalization

WhenusingpackagestoinstallElasticsearch,theconfigurationfilesareinslightlydifferentdirectoriesthanthedefaultconfdirectory.Aftertheinstallation,theconfigurationfilesshouldbestoredinthefollowinglocation:

/etc/sysconfig/elasticsearchor/etc/default/elasticsearch:AfilewiththeconfigurationoftheElasticsearchprocessasausertorunas,directoriesforlogs,dataandmemorysettings/etc/elasticsearch/:AdirectoryfortheElasticsearchconfigurationfiles,suchastheelasticsearch.ymlfile

ConfiguringElasticsearchasasystemserviceonLinuxIfeverythinggoeswell,youcanrunElasticsearchusingthefollowingcommand:

/bin/systemctlstartelasticsearch.service

IfyouwantElasticsearchtostartautomaticallyeverytimetheoperatingsystemstarts,you

www.EBooksWorld.ir

Page 84: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

cansetupElasticsearchasasystemservicebyrunningthefollowingcommand:

/bin/systemctlenableelasticsearch.service

ElasticsearchasasystemserviceonWindowsInstallingElasticsearchasasystemserviceonWindowsisalsoveryeasy.YoujustneedtogotoyourElasticsearchinstallationdirectory,thengotothebinsubdirectory,andrunthefollowingcommand:

service.batinstall

You’llbeaskedforpermissiontodoso.Ifyouallowthescripttorun,ElasticsearchwillbeinstalledasaWindowsservice.

Ifyouwouldliketoseeallthecommandsexposedbytheservice.batscriptfile,justrunthefollowingcommandinthesamedirectoryasearlier:

service.bat

Forexample,tostartElasticsearch,wewilljustrunthefollowingcommand:

service.batstart

www.EBooksWorld.ir

Page 85: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 86: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ManipulatingdatawiththeRESTAPIElasticsearchexposesaveryrichRESTAPIthatcanbeusedtosearchthroughthedata,indexthedata,andcontrolElasticsearchbehavior.YoucanimaginethatusingtheRESTAPIallowsyoutogetasingledocument,indexorupdateadocument,gettheinformationonElasticsearchcurrentstate,createordeleteindices,orforceElasticsearchtomovearoundshardsofyourindices.Ofcourse,theseareonlyexamplesthatshowwhatyoucanexpectfromtheElasticsearchRESTAPI.Fornow,wewillconcentrateonusingthecreate,retrieve,update,delete(CRUD)partoftheElasticsearchAPI(https://en.wikipedia.org/wiki/Create,_read,_update_and_delete),whichallowsustouseElasticsearchinafashionsimilartohowwewoulduseanyotherNoSQL(https://en.wikipedia.org/wiki/NoSQL)datastore.

www.EBooksWorld.ir

Page 87: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UnderstandingtheRESTAPIIfyou’veneverusedanapplicationexposingtheRESTAPI,youmaybesurprisedhoweasyitistousesuchapplicationsandrememberhowtousethem.InREST-likearchitectures,everyrequestisdirectedtoaconcreteobjectindicatedbyapathintheaddress.Forexample,let’sassumethatourhypotheticalapplicationexposesthe/booksRESTend-pointasareferencetothelistofbooks.Insuchcase,acallto/books/1couldbeareferencetoaconcretebookwiththeidentifier1.Youcanthinkofitasadata-orientedmodelofanAPI.Ofcourse,wecannestthepaths—forexample,apathsuchas/books/1/chapterscouldreturnthelistofchaptersofourbookwithidentifier1andapathsuchas/books/1/chapters/6couldbeareferencetothesixthchapterinthatparticularbook.

Wetalkedaboutpaths,butwhenusingtheHTTPprotocol,(https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)wehavesomeadditionalverbs(suchasPOST,GET,PUT,andsoon.)thatwecanusetodefinesystembehaviorinadditiontopaths.Soifwewouldliketoretrievethebookwithidentifier1,wewouldusetheGETrequestmethodwiththe/books/1path.However,wewouldusethePUTrequestmethodwiththesamepathtocreateabookrecordwiththeidentifierorone,thePOSTrequestmethodtoaltertherecord,DELETEtoremovethatentry,andtheHEADrequestmethodtogetbasicinformationaboutthedatareferencedbythepath.

Now,let’slookatexampleHTTPrequeststhataresenttorealElasticsearchRESTAPIendpoints,sotheprecedinghypotheticalinformationwillbeturnedintosomethingreal:

GEThttp://localhost:9200/:ThisretrievesbasicinformationaboutElasticsearch,suchastheversion,thenameofthenodethatthecommandhasbeensentto,thenameoftheclusterthatnodeisconnectedto,theApacheLuceneversion,andsoon.

GEThttp://localhost:9200/_cluster/state/nodes/Thisretrievesinformationaboutallthenodesinthecluster,suchastheiridentifiers,names,transportaddresseswithports,andadditionalnodeattributesforeachnode.

DELETEhttp://localhost:9200/books/book/123:Thisdeletesadocumentthatisindexedinthebooksindex,withthebooktypeandanidentifierof123.

WenowknowwhatRESTmeansandwecanstartconcentratingonElasticsearchtoseehowwecanstore,retrieve,alter,anddeletethedatafromitsindices.IfyouwouldliketoreadmoreaboutREST,pleaserefertohttp://en.wikipedia.org/wiki/Representational_state_transfer.

www.EBooksWorld.ir

Page 88: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

StoringdatainElasticsearchInElasticsearch,everydocumentisrepresentedbythreeattributes—theindex,thetype,andtheidentifier.Eachdocumentmustbeindexedintoasingleindex,needstohaveitstypecorrespondtothedocumentstructure,andisdescribedbytheidentifier.ThesethreeattributesallowsustoidentifyanydocumentinElasticsearchandneedstobeprovidedwhenthedocumentisphysicallywrittentotheunderlyingApacheLuceneindex.Havingtheknowledge,wearenowreadytocreateourfirstElasticsearchdocument.

CreatinganewdocumentWewillstartlearningtheElasticsearchRESTAPIbyindexingonedocument.Let’simaginethatwearebuildingaCMSsystem(http://en.wikipedia.org/wiki/Content_management_system)thatwillprovidethefunctionalityofabloggingplatformforourinternalusers.Wewillhavedifferenttypesofdocumentsinourindices,butthemostimportantonesarethearticlesthatwillbepublishedandarereadablebyusers.

BecausewetalktoElasticsearchusingJSONnotationandElasticsearchrespondstousagainusingJSON,ourexampledocumentcouldlookasfollows:

{

"id":"1",

"title":"NewversionofElasticsearchreleased!",

"content":"Version2.2releasedtoday!",

"priority":10,

"tags":["announce","elasticsearch","release"]

}

Asyoucanseeintheprecedingcodesnippet,theJSONdocumentisbuiltwithasetoffields,whereeachfieldcanhaveadifferentformat.Inourexample,wehaveasetoftextfields(id,title,andcontent),wehaveanumber(thepriorityfield),andanarrayoftextvalues(thetagsfield).Wewillshowdocumentsthataremorecomplicatedinthenextexamples.

NoteOneofthechangesintroducedinElasticsearch2.0hasbeenthatfieldnamescan’tcontainthedotcharacter.SuchfieldnameswerepossibleinolderversionsofElasticsearch,butcouldresultinserializationerrorsincertaincasesandthusElasticsearchcreatorsdecidedtoremovethatpossibility.

OnethingtorememberisthatbydefaultElasticsearchworksasaschema-lessdatastore.ThismeansthatitcantrytoguessthetypeofthefieldinadocumentsenttoElasticsearch.Itwilltrytousenumerictypesforthevaluesthatarenotenclosedinquotationmarksandstringsfordataenclosedinquotationmarks.Itwilltrytoguessthedateandindexthemindedicatedfieldsandsoon.ThisispossiblebecausetheJSONformatissemi-typed.Internally,whenthefirstdocumentwithanewfieldissenttoElasticsearch,itwillbeprocessedandmappingswillbewritten(wewilltalkmoreaboutmappingsintheMappingsconfigurationsectionofChapter2,IndexingYourData).

www.EBooksWorld.ir

Page 89: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NoteAschema-lessapproachanddynamicmappingscanbeproblematicwhendocumentscomewithaslightlydifferentstructure—forexample,thefirstdocumentwouldcontainthevalueofthepriorityfieldwithoutquotationmarks(liketheoneshowninthediscussedexample),whiletheseconddocumentwouldhavequotationmarksforthevalueinthepriorityfield.ThiswillresultinanerrorbecauseElasticsearchwilltrytoputatextvalueinthenumericfieldandthisisnotpossibleinLucene.Becauseofthis,itisadvisabletodefineyourownmappings,whichyouwilllearnintheMappingsconfigurationsectionofChapter2,IndexingYourData.

Let’snowindexourdocumentandmakeitavailableforretrievalandsearching.Wewillindexourarticlestoanindexcalledblogunderatypenamedarticle.Wewillalsogiveourdocumentanidentifierof1,asthisisourfirstdocument.Toindexourexampledocument,wewillexecutethefollowingcommand:

curl-XPUT'http://localhost:9200/blog/article/1'-d'{"title":"New

versionofElasticsearchreleased!","content":"Version2.2released

today!","priority":10,"tags":["announce","elasticsearch","release"]

}'

Noteanewoptiontothecurlcommand,the-dparameter.Thevalueofthisoptionisthetextthatwillbeusedasarequestpayload—arequestbody.Thisway,wecansendadditionalinformationsuchasthedocumentdefinition.Also,notethattheuniqueidentifierisplacedintheURLandnotinthebody.Ifyouomitthisidentifier(whileusingtheHTTPPUTrequest),theindexingrequestwillreturnthefollowingerror:

Nohandlerfoundforuri[/blog/article]andmethod[PUT]

Ifeverythingworkedcorrectly,ElasticsearchwillreturnaJSONresponseinformingusaboutthestatusoftheindexingoperation.Thisresponseshouldbesimilartothefollowingone:

{

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0},

"created":true

}

Intheprecedingresponse,Elasticsearchincludedinformationaboutthestatusoftheoperation,index,type,identifier,andversion.Wecanalsoseeinformationabouttheshardsthattookpartintheoperation—allofthem,theonesthatweresuccessfulandtheonesthatfailed.

Automaticidentifiercreation

www.EBooksWorld.ir

Page 90: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Inthepreviousexample,wespecifiedthedocumentidentifiermanuallywhenweweresendingthedocumenttoElasticsearch.However,thereareusecaseswhenwedon’thaveanidentifierforourdocuments—forexample,whenhandlinglogsasourdata.Insuchcases,wewouldlikesomeapplicationtocreatetheidentifierforusandElasticsearchcanbesuchanapplication.Ofcourse,generatingdocumentidentifiersdoesn’tmakesensewhenyourdocumentalreadyhasthem,suchasdatainarelationaldatabase.Insuchcases,youmaywanttoupdatethedocuments;inthiscase,automaticidentifiergenerationisnotthebestidea.However,whenweareinneedofsuchfunctionality,insteadofusingtheHTTPPUTmethodwecanusePOSTandomittheidentifierintheRESTAPIpath.SoifwewouldlikeElasticsearchtogeneratetheidentifierinthepreviousexample,wewouldsendacommandlikethis:

curl-XPOST'http://localhost:9200/blog/article/'-d'{"title":"New

versionofElasticsearchreleased!","content":"Version2.2released

today!","priority":10,"tags":["announce","elasticsearch","release"]

}'

We’veusedtheHTTPPOSTmethodinsteadofPUTandwe’veomittedtheidentifier.TheresponseproducedbyElasticsearchinsuchacasewouldbeasfollows:

{

"_index":"blog",

"_type":"article",

"_id":"AU1y-s6w2WzST_RhTvCJ",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0},

"created":true

}

Asyoucansee,theresponsereturnedbyElasticsearchisalmostthesameasinthepreviousexample,withaminordifference—the_idfieldisreturned.Now,insteadofthe1value,wehaveavalueofAU1y-s6w2WzST_RhTvCJ,whichistheidentifierElasticsearchgeneratedforourdocument.

www.EBooksWorld.ir

Page 91: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RetrievingdocumentsWenowhavetwodocumentsindexedintoourElasticsearchinstance—oneusingaexplicitidentifierandoneusingageneratedidentifier.Let’snowtrytoretrieveoneofthedocumentsusingitsuniqueidentifier.Todothis,wewillneedinformationabouttheindexthedocumentisindexedin,whattypeithas,andofcoursewhatidentifierithas.Forexample,togetthedocumentfromtheblogindexwiththearticletypeandtheidentifierof1,wewouldrunthefollowingHTTPGETrequest:

curl-XGET'localhost:9200/blog/article/1?pretty'

NoteTheadditionalURIpropertycalledprettytellsElasticsearchtoincludenewlinecharactersandadditionalwhitespacesinresponsetomaketheoutputeasiertoreadforusers.

Elasticsearchwillreturnaresponsesimilartothefollowing:

{

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":1,

"found":true,

"_source":{

"title":"NewversionofElasticsearchreleased!",

"content":"Version2.2releasedtoday!",

"priority":10,

"tags":["announce","elasticsearch","release"]

}

}

Asyoucanseeintheprecedingresponse,Elasticsearchreturnedthe_sourcefield,whichistheoriginaldocumentsenttoElasticsearchandafewadditionalfieldsthattellusaboutthedocument,suchastheindex,type,identifier,documentversion,andofcourseinformationastowhetherthedocumentwasfoundornot(thefoundproperty).

Ifwetrytoretrieveadocumentthatisnotpresentintheindex,suchastheonewiththe12345identifier,wegetaresponselikethis:

{

"_index":"blog",

"_type":"article",

"_id":"12345",

"found":false

}

Asyoucansee,thistimethevalueofthefoundpropertywassettofalseandtherewasno_sourcefieldbecausethedocumenthasnotbeenretrieved.

www.EBooksWorld.ir

Page 92: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UpdatingdocumentsUpdatingdocumentsintheindexisamorecomplicatedtaskcomparedtoindexing.WhenthedocumentisindexedandElasticsearchflushesthedocumenttoadisk,itcreatessegments—animmutablestructurethatiswrittenonceandreadmanytimes.ThisisdonebecausetheinvertedindexcreatedbyApacheLuceneiscurrentlyimpossibletoupdate(atleastmostofitsparts).Toupdateadocument,ElasticsearchinternallyfirstfetchesthedocumentusingtheGETrequest,modifiesits_sourcefield,removestheolddocument,andindexesanewdocumentusingtheupdatedcontent.ThecontentupdateisdoneusingscriptsinElasticsearch(wewilltalkmoreaboutscriptinginElasticsearchintheScriptingcapabilitiesofElasticsearchsectioninChapter6,MakeYourSearchBetter).

NotePleasenotethatthefollowingdocumentupdateexamplesrequireyoutoputthescript.inline:onpropertyintoyourelasticsearch.ymlconfigurationfile.ThisisneededbecauseinlinescriptingisdisabledinElasticsearchforsecurityreasons.TheotherwaytohandleupdatesistostorethescriptcontentinthefileintheElasticsearchconfigurationdirectory,butwewilltalkaboutthatintheScriptingcapabilitiesofElasticsearchsectioninChapter6,MakeYourSearchBetter.

Let’snowtrytoupdateourdocumentwithidentifier1bymodifyingitscontentfieldtocontaintheThisistheupdateddocumentsentence.Todothis,weneedtorunaPOSTHTTPrequestonthedocumentpathusingthe_updateRESTend-point.Ourrequesttomodifythedocumentwouldlookasfollows:

curl-XPOST'http://localhost:9200/blog/article/1/_update'-d'{

"script":"ctx._source.content=new_content",

"params":{

"new_content":"Thisistheupdateddocument"

}

}'

Asyoucansee,we’vesenttherequesttothe/blog/article/1/_updateRESTend-point.Intherequestbody,we’veprovidedtwoparameters—theupdatescriptinthescriptpropertyandtheparametersofthescript.Thescriptisverysimple;ittakesthe_sourcefieldandmodifiesthecontentfieldbysettingitsvaluetothevalueofthenew_contentparameter.Theparamspropertycontainsallthescriptparameters.

Fortheprecedingupdatecommandexecution,Elasticsearchwouldreturnthefollowingresponse:

{"_index":"blog","_type":"article","_id":"1","_version":2,"_shards":

{"total":2,"successful":1,"failed":0}}

Thethingtolookatintheprecedingresponseisthe_versionfield.Rightnow,theversionis2,whichmeansthatthedocumenthasbeenupdated(orre-indexed)once.Basically,eachupdatemakesElasticsearchupdatethe_versionfield.

Wecouldalsoupdatethedocumentusingthedocsectionandprovidingthechangedfield,

www.EBooksWorld.ir

Page 93: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

forexample:

curl-XPOST'http://localhost:9200/blog/article/1/_update'-d'{

"doc":{

"content":"Thisistheupdateddocument"

}

}'

Wenowretrievethedocumentusingthefollowingcommand:

curl-XGET'http://localhost:9200/blog/article/1?pretty'

AndwegetthefollowingresponsefromElasticsearch:

{

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":2,

"found":true,

"_source":{

"title":"NewversionofElasticsearchreleased!",

"content":"Thisistheupdateddocument",

"priority":10,

"tags":["announce","elasticsearch","release"]

}

}

Asyoucansee,thedocumenthasbeenupdatedproperly.

NoteThethingtorememberwhenusingtheupdateAPIofElasticsearchisthatthe_sourcefieldneedstobepresentbecausethisisthefieldthatElasticsearchusestoretrievetheoriginaldocumentcontentfromtheindex.Bydefault,thatfieldisenabledandElasticsearchusesittostoretheoriginaldocument.

Dealingwithnon-existingdocumentsThenicethingwhenitcomestodocumentupdates,whichwewouldliketomentionasitcancomeinhandywhenusingElasticsearchUpdateAPI,isthatwecandefinewhatElasticsearchshoulddowhenthedocumentwetrytoupdateisnotpresent.

Forexample,let’stryincrementingthepriorityfieldvalueforanon-existingdocumentwithidentifier2:

curl-XPOST'http://localhost:9200/blog/article/2/_update'-d'{

"script":"ctx._source.priority+=1"

}'

TheresponsereturnedbyElasticsearchwouldlookmoreorlessasfollows:

{"error":{"root_cause":[{"type":"document_missing_exception","reason":"

[article][2]:document

missing","shard":"2","index":"blog"}],"type":"document_missing_exception","

reason":"[article][2]:document

www.EBooksWorld.ir

Page 94: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

missing","shard":"2","index":"blog"},"status":404}

Asyoucanimagine,thedocumenthasnotbeenupdatedbecauseitdoesn’texist.Sonow,let’smodifyourrequesttoincludetheupsertsectioninourrequestbodythatwilltellElasticsearchwhattodowhenthedocumentisnotpresent.Thenewcommandwouldlookasfollows:

curl-XPOST'http://localhost:9200/blog/article/2/_update'-d'{

"script":"ctx._source.priority+=1",

"upsert":{

"title":"Emptydocument",

"priority":0,

"tags":["empty"]

}

}'

Withthemodifiedrequest,anewdocumentwouldbeindexed;ifweretrieveitusingtheGETAPI,itwilllookasfollows:

{

"_index":"blog",

"_type":"article",

"_id":"2",

"_version":1,

"found":true,

"_source":{

"title":"Emptydocument",

"priority":0,

"tags":["empty"]

}

}

Asyoucansee,thefieldsfromtheupsertsectionofourupdaterequestweretakenbyElasticsearchandusedasdocumentfields.

AddingpartialdocumentsInadditiontowhatwealreadywroteabouttheupdateAPI,Elasticsearchisalsocapableofmergingpartialdocumentsfromtheupdaterequesttoalreadyexistingdocumentsorindexingnewdocumentsusinginformationabouttherequest,similartowhatwesawseenwiththeupsertsection.

Let’simaginethatwewouldliketoupdateourinitialdocumentandaddanewfieldcalledcounttoit(settingitto1initially).Wewouldalsoliketoindexthedocumentunderthespecifiedidentifierifthedocumentisnotpresent.Wecandothisbyrunningthefollowingcommand:

curl-XPOST'http://localhost:9200/blog/article/1/_update'-d'{

"doc":{

"count":1

},

"doc_as_upsert":true

}

Wespecifiedthenewfieldinthedocsectionandwesaidthatwewantthedocsectionto

www.EBooksWorld.ir

Page 95: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

betreatedastheupsertsectionwhenthedocumentisnotpresent(withthedoc_as_upsertpropertysettotrue).

Ifwenowretrievethatdocument,weseethefollowingresponse:

{

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":3,

"found":true,

"_source":{

"title":"NewversionofElasticsearchreleased!",

"content":"Thisistheupdateddocument",

"priority":10,

"tags":["announce","elasticsearch","release"],

"count":1

}

}

NoteForafullreferenceondocumentupdates,pleaserefertotheofficialElasticsearchdocumentationontheUpdateAPI,whichisavailableathttps://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html.

www.EBooksWorld.ir

Page 96: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DeletingdocumentsNowthatweknowhowtoindexdocuments,updatethem,andretrievethem,itistimetolearnabouthowwecandeletethem.DeletingadocumentfromanElasticsearchindexisverysimilartoretrievingit,butwithonemajordifference—insteadofusingtheHTTPGETmethod,wehavetouseHTTPDELETEone.

Forexample,ifwewouldliketodeletethedocumentindexedintheblogindexunderthearticletypeandwithanidentifierof1,wewouldrunthefollowingcommand:

curl-XDELETE'localhost:9200/blog/article/1'

TheresponsefromElasticsearchindicatesthatthedocumenthasbeendeletedandshouldlookasfollows:

{

"found":true,

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":4,

"_shards":{

"total":2,

"successful":1,

"failed":0

}

}

Ofcourse,thisisnottheonlythingwhenitcomestodeleting.Wecanalsoremoveallthedocumentsofagiventype.Forexample,ifwewouldliketodeletetheentireblogindex,weshouldjustomittheidentifierandthetype,sothecommandwouldlooklikethis:

curl-XDELETE'localhost:9200/blog'

Theprecedingcommandwouldresultinthedeletionoftheblogindex.

www.EBooksWorld.ir

Page 97: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

VersioningFinally,thereisonelastthingthatwewouldliketotalkaboutwhenitcomestodatamanipulationinElasticsearch—thegreatfeatureofversioning.Asyoumayhavealreadynoticed,Elasticsearchincrementsthedocumentversionwhenitdoesupdatestoit.Wecanleveragethisfunctionalityanduseoptimisticlocking(http://en.wikipedia.org/wiki/Optimistic_concurrency_control),andavoidconflictsandoverwriteswhenmultipleprocessesorthreadsaccessthesamedocumentconcurrently.Youcanassumethatyourindexingapplicationmaywanttotrytoupdatethedocument,whiletheuserwouldliketoupdatethedocumentwhiledoingsomemanualwork.Thequestionthatarisesis:Whichdocumentshouldbethecorrectone—theoneupdatedbytheindexingapplication,theoneupdatedbytheuser,orthemergeddocumentofthechanges?Whatifthechangesareconflicting?Tohandlesuchcases,wecanuseversioning.

UsageexampleLet’sindexanewdocumenttoourblogindex—onewithanidentifierof10,andlet’sindexitssecondversionsoonafterwedothat.Thecommandsthatdothislookasfollows:

curl-XPUT'localhost:9200/blog/article/10'-d'{"title":"Testdocument"}'

curl-XPUT'localhost:9200/blog/article/10'-d'{"title":"Updatedtest

document"}'

Becausewe’veindexedthedocumentwiththesameidentifier,itshouldhaveaversion2(youcancheckitusingtheGETrequest).

Now,let’strydeletingthedocumentwe’vejustindexedbutlet’sspecifyaversionpropertyequalto1.Bydoingthis,wetellElasticsearchthatweareinterestedindeletingthedocumentwiththeprovidedversion.Becausethedocumentisadifferentversionnow,Elasticsearchshouldn’tallowindexingwithversion1.Let’scheckifwhatwesayistrue.Thecommandwewillusetosendthedeleterequestlooksasfollows:

curl-XDELETE'localhost:9200/blog/article/10?version=1'

TheresponsegeneratedbyElasticsearchshouldbesimilartothefollowingone:

{

"error":{

"root_cause":[{

"type":"version_conflict_engine_exception",

"reason":"[article][10]:versionconflict,current[2],provided

[1]",

"shard":1,

"index":"blog"

}],

"type":"version_conflict_engine_exception",

"reason":"[article][10]:versionconflict,current[2],provided

[1]",

"shard":1,

"index":"blog"

},

"status":409

www.EBooksWorld.ir

Page 98: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

Asyoucansee,thedeleteoperationwasnotsuccessful—theversionsdidn’tmatch.Ifwesettheversionpropertyto2,thedeleteoperationwouldbesuccessful:

curl-XDELETE'localhost:9200/blog/article/10?version=2&pretty'

Theresponsethistimewilllookasfollows:

{

"found":true,

"_index":"blog",

"_type":"article",

"_id":"10",

"_version":3,

"_shards":{

"total":2,

"successful":1,

"failed":0

}

}

Thistimethedeleteoperationhasbeensuccessfulbecausetheprovidedversionwasproper.

VersioningfromexternalsystemsTheverygoodthingaboutElasticsearchversioningcapabilitiesisthatwecanprovidetheversionofthedocumentthatwewouldlikeElasticsearchtouse.Thisallowsustoprovideversionsfromexternaldatasystemsthatareourprimarydatastores.Todothis,weneedtoprovideanadditionalparameterduringindexing—version_type=externaland,ofcourse,theversionitself.Forexample,ifwewouldlikeourdocumenttohavethe12345version,wecouldsendarequestlikethis:

curl-XPUT'localhost:9200/blog/article/20?

version=12345&version_type=external'-d'{"title":"Testdocument"}'

TheresponsereturnedbyElasticsearchisasfollows:

{

"_index":"blog",

"_type":"article",

"_id":"20",

"_version":12345,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"created":true

}

Wejustneedtorememberthat,whenusingversion_type=external,weneedtoprovidetheversionincaseswhereweindexthedocument.Incaseswherewewouldliketochangethedocumentanduseoptimisticlocking,weneedtoprovideaversionparameter

www.EBooksWorld.ir

Page 99: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

equalto,orhigherthan,theversionpresentinthedocument.

www.EBooksWorld.ir

Page 100: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 101: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SearchingwiththeURIrequestqueryBeforegettingintothewonderfulworldoftheElasticsearchquerylanguage,wewouldliketointroduceyoutothesimplebutprettyflexibleURIrequestsearch,whichallowsustouseasimpleElasticsearchquerycombinedwiththeLucenequerylanguage.Ofcourse,wewillextendoursearchknowledgeusingElasticsearchinChapter3,SearchingYourData,butfornowwewillsticktothesimplestapproach.

www.EBooksWorld.ir

Page 102: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SampledataForthepurposeofthissectionofthebook,wewillcreateasimpleindexwithtwodocumenttypes.Todothis,wewillrunthefollowingsixcommands:

curl-XPOST'localhost:9200/books/es/1'-d'{"title":"Elasticsearch

Server","published":2013}'

curl-XPOST'localhost:9200/books/es/2'-d'{"title":"ElasticsearchServer

SecondEdition","published":2014}'

curl-XPOST'localhost:9200/books/es/3'-d'{"title":"Mastering

Elasticsearch","published":2013}'

curl-XPOST'localhost:9200/books/es/4'-d'{"title":"Mastering

ElasticsearchSecondEdition","published":2015}'

curl-XPOST'localhost:9200/books/solr/1'-d'{"title":"ApacheSolr4

Cookbook","published":2012}'

curl-XPOST'localhost:9200/books/solr/2'-d'{"title":"SolrCookbookThird

Edition","published":2015}'

Runningtheprecedingcommandswillcreatethebook’sindexwithtwotypes:esandsolr.Thetitleandpublishedfieldswillbeindexedandthus,searchable.

www.EBooksWorld.ir

Page 103: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

URIsearchAllqueriesinElasticsearcharesenttothe_searchendpoint.Youcansearchasingleindexormultipleindices,andyoucanrestrictyoursearchtoagivendocumenttypeormultipletypes.Forexample,inordertosearchourbook’sindex,wewillrunthefollowingcommand:

curl-XGET'localhost:9200/books/_search?pretty'

TheresultsreturnedbyElasticsearchwillincludeallthedocumentsfromourbook’sindex(becausenoqueryhasbeenspecified)andshouldlooksimilartothefollowing:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":6,

"max_score":1.0,

"hits":[{

"_index":"books",

"_type":"es",

"_id":"2",

"_score":1.0,

"_source":{

"title":"ElasticsearchServerSecondEdition",

"published":2014

}

},{

"_index":"books",

"_type":"es",

"_id":"4",

"_score":1.0,

"_source":{

"title":"MasteringElasticsearchSecondEdition",

"published":2015

}

},{

"_index":"books",

"_type":"solr",

"_id":"2",

"_score":1.0,

"_source":{

"title":"SolrCookbookThirdEdition",

"published":2015

}

},{

"_index":"books",

"_type":"es",

"_id":"1",

"_score":1.0,

www.EBooksWorld.ir

Page 104: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_source":{

"title":"ElasticsearchServer",

"published":2013

}

},{

"_index":"books",

"_type":"solr",

"_id":"1",

"_score":1.0,

"_source":{

"title":"ApacheSolr4Cookbook",

"published":2012

}

},{

"_index":"books",

"_type":"es",

"_id":"3",

"_score":1.0,

"_source":{

"title":"MasteringElasticsearch",

"published":2013

}

}]

}

}

Asyoucansee,theresponsehasaheaderthattellsyouthetotaltimeofthequeryandtheshardsusedinthequeryprocess.Inadditiontothis,wehavedocumentsmatchingthequery—thetop10documentsbydefault.Eachdocumentisdescribedbytheindex,type,identifier,score,andthesourceofthedocument,whichistheoriginaldocumentsenttoElasticsearch.

Wecanalsorunqueriesagainstmanyindices.Forexample,ifwehadanotherindexcalledclients,wecouldalsorunasinglequeryagainstthesetwoindicesasfollows:

curl-XGET'localhost:9200/books,clients/_search?pretty'

WecanalsorunqueriesagainstallthedatainElasticsearchbyomittingtheindexnamescompletelyorsettingthequeriesto_all:

curl-XGET'localhost:9200/_search?pretty'

curl-XGET'localhost:9200/_all/_search?pretty'

Inasimilarmanner,wecanalsochoosethetypeswewanttouseduringsearching.Forexample,ifwewanttosearchonlyintheestypeinthebook’sindex,werunacommandasfollows:

curl-XGET'localhost:9200/books/es/_search?pretty'

Pleaserememberthat,inordertosearchforagiventype,weneedtospecifytheindexormultipleindices.Elasticsearchallowsustohavequitearichsemanticswhenitcomestochoosingindexnames.Ifyouareinterested,pleaserefertohttps://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html;however,thereisonethingwewouldliketopointout.Whenrunningaqueryagainstmultiple

www.EBooksWorld.ir

Page 105: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

indices,itmayhappenthatsomeofthemdonotexistorareclosed.Insuchcases,theignore_unavailablepropertycomesinhandy.Whensettotrue,ittellsElasticsearchtoignoreunavailableorclosedindices.

Forexample,let’stryrunningthefollowingquery:

curl-XGET'localhost:9200/books,non_existing/_search?pretty'

Theresponsewouldbesimilartothefollowingone:

{

"error":{

"root_cause":[{

"type":"index_missing_exception",

"reason":"nosuchindex",

"index":"non_existing"

}],

"type":"index_missing_exception",

"reason":"nosuchindex",

"index":"non_existing"

},

"status":404

}

Nowlet’scheckwhatwillhappenifweaddtheignore_unavailable=truetoourrequestandexecutethefollowingcommand:

curl-XGET'localhost:9200/books,non_existing/_search?

pretty&ignore_unavailable=true'

Inthiscase,Elasticsearchwouldreturntheresultswithoutanyerror.

ElasticsearchqueryresponseLet’sassumethatwewanttofindallthedocumentsinourbook’sindexthatcontaintheelasticsearchterminthetitlefield.Wecandothisbyrunningthefollowingquery:

curl-XGET'localhost:9200/books/_search?pretty&q=title:elasticsearch'

TheresponsereturnedbyElasticsearchfortheprecedingrequestwillbeasfollows:

{

"took":37,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.625,

"hits":[{

"_index":"books",

"_type":"es",

"_id":"1",

"_score":0.625,

www.EBooksWorld.ir

Page 106: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_source":{

"title":"ElasticsearchServer",

"published":2013

}

},{

"_index":"books",

"_type":"es",

"_id":"2",

"_score":0.5,

"_source":{

"title":"ElasticsearchServerSecondEdition",

"published":2014

}

},{

"_index":"books",

"_type":"es",

"_id":"4",

"_score":0.5,

"_source":{

"title":"MasteringElasticsearchSecondEdition",

"published":2015

}

},{

"_index":"books",

"_type":"es",

"_id":"3",

"_score":0.19178301,

"_source":{

"title":"MasteringElasticsearch",

"published":2013

}

}]

}

}

Thefirstsectionoftheresponsegivesusinformationabouthowmuchtimetherequesttook(thetookpropertyisspecifiedinmilliseconds),whetheritwastimedout(thetimed_outproperty),andinformationabouttheshardsthatwerequeriedduringtherequestexecution—thenumberofqueriedshards(thetotalpropertyofthe_shardsobject),thenumberofshardsthatreturnedtheresultssuccessfully(thesuccessfulpropertyofthe_shardsobject),andthenumberoffailedshards(thefailedpropertyofthe_shardsobject).Thequerymayalsotimeoutifitisexecutedforalongerperiodthanwewant.(Wecanspecifythemaximumqueryexecutiontimeusingthetimeoutparameter.)Thefailedshardmeansthatsomethingwentwrongwiththatshardoritwasnotavailableduringthesearchexecution.

Ofcourse,thementionedinformationcanbeuseful,butusually,weareinterestedintheresultsthatarereturnedinthehitsobject.Wehavethetotalnumberofdocumentsreturnedbythequery(inthetotalproperty)andthemaximumscorecalculated(inthemax_scoreproperty).Finally,wehavethehitsarraythatcontainsthereturneddocuments.Inourcase,eachreturneddocumentcontainsitsindexname(the_indexproperty),thetype(the_typeproperty),theidentifier(the_idproperty),thescore(the_scoreproperty),andthe

www.EBooksWorld.ir

Page 107: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

_sourcefield(usually,thisistheJSONobjectsentforindexing.

www.EBooksWorld.ir

Page 108: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryanalysisYoumaywonderwhythequerywe’verunintheprevioussectionworked.WeindexedtheElasticsearchtermandranaqueryforElasticsearchandeventhoughtheydiffer(capitalization),therelevantdocumentswerefound.Thereasonforthisistheanalysis.Duringindexing,theunderlyingLucenelibraryanalyzesthedocumentsandindexesthedataaccordingtotheElasticsearchconfiguration.Bydefault,ElasticsearchwilltellLucenetoindexandanalyzebothstring-baseddataaswellasnumbers.ThesamehappensduringqueryingbecausetheURIrequestquerymapstothequery_stringquery(whichwillbediscussedinChapter3,SearchingYourData),andthisqueryisanalyzedbyElasticsearch.

Let’susetheindices-analyzeAPI(https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html).Itallowsustoseehowtheanalysisprocessisdone.Withthis,wecanseewhathappenedtooneofthedocumentsduringindexingandwhathappenedtoourqueryphraseduringquerying.

InordertoseewhatwasindexedinthetitlefieldoftheElasticsearchserverphrase,wewillrunthefollowingcommand:

curl-XGET'localhost:9200/books/_analyze?pretty&field=title'-d

'ElasticsearchServer'

Theresponsewillbeasfollows:

{

"tokens":[{

"token":"elasticsearch",

"start_offset":0,

"end_offset":13,

"type":"<ALPHANUM>",

"position":0

},{

"token":"server",

"start_offset":14,

"end_offset":20,

"type":"<ALPHANUM>",

"position":1

}]

}

YoucanseethatElasticsearchhasdividedthetextintotwoterms—thefirstonehasatokenvalueofelasticsearchandthesecondonehasatokenvalueoftheserver.

Nowlet’slookathowthequerytextwasanalyzed.Wecandothisbyrunningthefollowingcommand:

curl-XGET'localhost:9200/books/_analyze?pretty&field=title'-d

'elasticsearch'

Theresponseoftherequestwilllookasfollows:

www.EBooksWorld.ir

Page 109: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

{

"tokens":[{

"token":"elasticsearch",

"start_offset":0,

"end_offset":13,

"type":"<ALPHANUM>",

"position":0

}]

}

Wecanseethatthewordisthesameastheoriginalonethatwepassedtothequery.Wewon’tgetintotheLucenequerydetailsandhowthequeryparserconstructedthequery,butingeneraltheindexedtermaftertheanalysiswasthesameastheoneinthequeryaftertheanalysis;so,thedocumentmatchedthequeryandtheresultwasreturned.

www.EBooksWorld.ir

Page 110: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

URIquerystringparametersThereareafewparametersthatwecanusetocontrolURIquerybehavior,whichwewilldiscussnow.Thethingtorememberisthateachparameterinthequeryshouldbeconcatenatedwiththe&character,asshowninthefollowingexample:

curl-XGET'localhost:9200/books/_search?

pretty&q=published:2013&df=title&explain=true&default_operator=AND'

PleaseremembertoenclosetheURLoftherequestusingthe'charactersbecause,onLinux-basedsystems,the&characterwillbeanalyzedbytheLinuxshell.

ThequeryTheqparameterallowsustospecifythequerythatwewantourdocumentstomatch.ItallowsustospecifythequeryusingtheLucenequerysyntaxdescribedintheLucenequerysyntaxsectionlaterinthischapter.Forexample,asimplequerywouldlooklikethis:q=title:elasticsearch.

ThedefaultsearchfieldUsingthedfparameter,wecanspecifythedefaultsearchfieldthatshouldbeusedwhennofieldindicatorisusedintheqparameter.Bydefault,the_allfieldwillbeused.(ThisisthefieldthatElasticsearchusestocopythecontentofalltheotherfields.WewilldiscussthisingreaterdepthinChapter2,IndexingYourData).Anexampleofthedfparametervaluecanbedf=title.

AnalyzerTheanalyzerpropertyallowsustodefinethenameoftheanalyzerthatshouldbeusedtoanalyzeourquery.Bydefault,ourquerywillbeanalyzedbythesameanalyzerthatwasusedtoanalyzethefieldcontentsduringindexing.

ThedefaultoperatorpropertyThedefault_operatorpropertythatcanbesettoORorAND,allowsustospecifythedefaultBooleanoperatorusedforourquery(http://en.wikipedia.org/wiki/Boolean_algebra).Bydefault,itissettoOR,whichmeansthatasinglequerytermmatchwillbeenoughforadocumenttobereturned.SettingthisparametertoANDforaquerywillresultinreturningthedocumentsthatmatchallthequeryterms.

QueryexplanationIfwesettheexplainparametertotrue,Elasticsearchwillincludeadditionalexplaininformationwitheachdocumentintheresult—suchastheshardfromwhichthedocumentwasfetchedandthedetailedinformationaboutthescoringcalculation(wewilltalkmoreaboutitintheUnderstandingtheexplaininformationsectioninChapter6,MakeYourSearchBetter).Alsoremembernottofetchtheexplaininformationduringnormalsearchqueriesbecauseitrequiresadditionalresourcesandaddsperformancedegradationtothequeries.Forexample,aquerythatincludesexplaininformationcouldlookasfollows:

www.EBooksWorld.ir

Page 111: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

curl-XGET'localhost:9200/books/_search?pretty&explain=true&q=title:solr'

TheresultsreturnedbyElasticsearchfortheprecedingquerywouldbeasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":0.70273256,

"hits":[{

"_shard":2,

"_node":"v5iRsht9SOWVzu-GY-YHlA",

"_index":"books",

"_type":"solr",

"_id":"2",

"_score":0.70273256,

"_source":{

"title":"SolrCookbookThirdEdition",

"published":2015

},

"_explanation":{

"value":0.70273256,

"description":"weight(title:solrin0)[PerFieldSimilarity],

resultof:",

"details":[{

"value":0.70273256,

"description":"fieldWeightin0,productof:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0),withfreqof:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":1.4054651,

"description":"idf(docFreq=1,maxDocs=3)",

"details":[]

},{

"value":0.5,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

}

},{

"_shard":3,

"_node":"v5iRsht9SOWVzu-GY-YHlA",

"_index":"books",

www.EBooksWorld.ir

Page 112: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_type":"solr",

"_id":"1",

"_score":0.5,

"_source":{

"title":"ApacheSolr4Cookbook",

"published":2012

},

"_explanation":{

"value":0.5,

"description":"weight(title:solrin1)[PerFieldSimilarity],

resultof:",

"details":[{

"value":0.5,

"description":"fieldWeightin1,productof:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0),withfreqof:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":1.0,

"description":"idf(docFreq=1,maxDocs=2)",

"details":[]

},{

"value":0.5,

"description":"fieldNorm(doc=1)",

"details":[]

}]

}]

}

}]

}

}

ThefieldsreturnedBydefault,foreachdocumentreturned,Elasticsearchwillincludetheindexname,thetypename,thedocumentidentifier,score,andthe_sourcefield.Wecanmodifythisbehaviorbyaddingthefieldsparameterandspecifyingacomma-separatedlistoffieldnames.Thefieldwillberetrievedfromthestoredfields(iftheyexist;wewilldiscusstheminChapter2,IndexingYourData)orfromtheinternal_sourcefield.Bydefault,thevalueofthefieldsparameteris_source.Anexampleis:fields=title,priority.

Wecanalsodisablethefetchingofthe_sourcefieldbyaddingthe_sourceparameterwithitsvaluesettofalse.

SortingtheresultsUsingthesortparameter,wecanspecifycustomsorting.ThedefaultbehaviorofElasticsearchistosortthereturneddocumentsindescendingorderofthevalueofthe_scorefield.Ifwewanttosortourdocumentsdifferently,weneedtospecifythesort

www.EBooksWorld.ir

Page 113: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

parameter.Forexample,addingsort=published:descwillsortthedocumentsindescendingorderofpublishedfield.Byaddingthesort=published:ascparameter,wewilltellElasticsearchtosortthedocumentsonthebasisofthepublishedfieldinascendingorder.

Ifwespecifycustomsorting,Elasticsearchwillomitthe_scorefieldcalculationforthedocuments.Thismaynotbethedesiredbehaviorinyourcase.Ifyouwanttostillkeepatrackofthescoresforeachdocumentwhenusingacustomsort,youshouldaddthetrack_scores=truepropertytoyourquery.Pleasenotethattrackingthescoreswhendoingcustomsortingwillmakethequeryalittlebitslower(youmaynotevennoticethedifference)duetotheprocessingpowerneededtocalculatethescore.

ThesearchtimeoutBydefault,Elasticsearchdoesn’thavetimeoutforqueries,butyoumaywantyourqueriestotimeoutafteracertainamountoftime(forexample,5seconds).Elasticsearchallowsyoutodothisbyexposingthetimeoutparameter.Whenthetimeoutparameterisspecified,thequerywillbeexecuteduptoagiventimeoutvalueandtheresultsthatweregathereduptothatpointwillbereturned.Tospecifyatimeoutof5seconds,youwillhavetoaddthetimeout=5sparametertoyourquery.

TheresultswindowElasticsearchallowsyoutospecifytheresultswindow(therangeofdocumentsintheresultslistthatshouldbereturned).Wehavetwoparametersthatallowustospecifytheresultswindowsize:sizeandfrom.Thesizeparameterdefaultsto10anddefinesthemaximumnumberofresultsreturned.Thefromparameterdefaultsto0andspecifiesfromwhichdocumenttheresultsshouldbereturned.Inordertoreturnfivedocumentsstartingfromthe11thone,wewilladdthefollowingparameterstothequery:size=5&from=10.

Limitingper-shardresultsElasticsearchallowsustospecifythemaximumnumberofdocumentsthatshouldbefetchedfromeachshardusingterminate_afterpropertyandspecifyingthemaximumnumberofdocuments.Forexample,ifwewanttogetnomorethan100documentsfromeachshard,wecanaddterminate_after=100toourURIrequest.

IgnoringunavailableindicesWhenrunningqueriesagainstmultipleindices,itishandytotellElasticsearchthatwedon’tcareabouttheindicesthatarenotavailable.Bydefault,Elasticsearchwillthrowanerrorifoneoftheindicesisnotavailable,butwecanchangethisbysimplyaddingtheignore_unavailable=trueparametertoourURIrequest.

ThesearchtypeTheURIqueryallowsustospecifythesearchtypeusingthesearch_typeparameter,whichdefaultstoquery_then_fetch.Twovaluesthatwecanusehereare:dfs_query_then_fetchandquery_then_fetch.TherestofthesearchtypesavailableinolderElasticsearchversionsarenowdeprecatedorremoved.We’lllearnmoreabout

www.EBooksWorld.ir

Page 114: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

searchtypesintheUnderstandingthequeryingprocesssectionofChapter3,SearchingYourData.

LowercasingtermexpansionSomequeries,suchastheprefixquery,usequeryexpansion.WewilldiscussthisintheQueryrewritesectioninChapter4,ExtendingYourQueryingKnowledge.Weareallowedtodefinewhethertheexpandedtermsshouldbelowercasedornotusingthelowercase_expanded_termsproperty.Bydefault,thelowercase_expanded_termspropertyissettotrue,whichmeansthattheexpandedtermswillbelowercased.

WildcardandprefixanalysisBydefault,wildcardqueriesandprefixqueriesarenotanalyzed.Ifwewanttochangethisbehavior,wecansettheanalyze_wildcardpropertytotrue.

NoteIfyouwanttoseealltheparametersexposedbyElasticsearchastheURIrequestparameters,pleaserefertotheofficialdocumentationavailableat:https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html.

www.EBooksWorld.ir

Page 115: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

LucenequerysyntaxWethoughtthatitwouldbegoodtoknowabitmoreaboutwhatsyntaxcanbeusedintheqparameterpassedintheURIquery.SomeofthequeriesinElasticsearch(suchastheonecurrentlybeingdiscussed)supporttheLucenequeryparsersyntax—thelanguagethatallowsyoutoconstructqueries.Let’stakealookatitanddiscusssomebasicfeatures.

AquerythatwepasstoLuceneisdividedintotermsandoperatorsbythequeryparser.Let’sstartwiththeterms;youcandistinguishthemintotwotypes—singletermsandphrases.Forexample,toqueryforabookterminthetitlefield,wewillpassthefollowingquery:

title:book

Toqueryfortheelasticsearchbookphraseinthetitlefield,wewillpassthefollowingquery:

title:"elasticsearchbook"

Youmayhavenoticedthenameofthefieldinthebeginningandinthetermorthephraselater.

Aswealreadysaid,theLucenequerysyntaxsupportsoperators.Forexample,the+operatortellsLucenethatthegivenpartmustbematchedinthedocument,meaningthatthetermwearesearchingformustpresentinthefieldinthedocument.The-operatoristheopposite,whichmeansthatsuchapartofthequerycan’tbepresentinthedocument.Apartofthequerywithoutthe+or-operatorwillbetreatedasthegivenpartofthequerythatcanbematchedbutitisnotmandatory.So,ifwewanttofindadocumentwiththebookterminthetitlefieldandwithoutthecatterminthedescriptionfield,wesendthefollowingquery:

+title:book-description:cat

Wecanalsogroupmultipletermswithparentheses,asshowninthefollowingquery:

title:(crimepunishment)

Wecanalsoboostpartsofthequery(thisincreasestheirimportanceforthescoringalgorithm—thehighertheboost,themoreimportantthequerypartis)withthe^operatorandtheboostvalueafterit,asshowninthefollowingquery:

title:book^4

ThesearethebasicsoftheLucenequerylanguageandshouldallowyoutouseElasticsearchandconstructquerieswithoutanyproblems.However,ifyouareinterestedintheLucenequerysyntaxandyouwouldliketoexplorethatindepth,pleaserefertotheofficialdocumentationofthequeryparseravailableathttp://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html.

www.EBooksWorld.ir

Page 116: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 117: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryInthischapter,welearnedwhatfulltextsearchisandthecontributionApacheLucenemakestothis.Inadditiontothis,wearenowfamiliarwiththebasicconceptsofElasticsearchanditstop-levelarchitecture.WeusedtheElasticsearchRESTAPInotonlytoindexdata,butalsotoupdate,retrieve,andfinallydeleteit.We’velearnedwhatversioningisandhowwecanuseitforoptimisticlockinginElasticsearch.Finally,wesearchedourdatausingthesimpleURIquery.

Inthenextchapter,we’llfocusonindexingourdata.WewillseehowElasticsearchindexingworksandwhattheroleofprimaryshardsandreplicasis.We’llseehowElasticsearchhandlesdatathatitdoesn’tknowandhowtocreateourownmappings—theJSONstructurethatdescribesthestructureofourindex.We’llalsolearnhowtousebatchindexingtospeeduptheindexingprocessandwhatadditionalinformationcanbestoredalongwithourindextohelpusachieveourgoal.Inaddition,wewilldiscusswhatanindexsegmentis,whatsegmentmergingis,andhowtotuneasegment.Finally,we’llseehowroutingworksinElasticsearchandwhatoptionswehavewhenitcomestobothindexingandqueryingrouting.

www.EBooksWorld.ir

Page 118: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 119: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter2.IndexingYourDataInthepreviouschapter,welearnedwhatfulltextsearchisandhowApacheLucenefitsthere.WewereintroducedtothebasicconceptsofElasticsearchandwearenowfamiliarwithitstop-levelarchitecture,soweknowhowitworks.WeusedtheRESTAPItoindexdata,toupdateit,todeleteit,andofcoursetoretrieveit.WesearchedourdatawiththesimpleURIqueryandweusedversioningthatallowedustouseoptimisticlockingfunctionality.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

BasicinformationaboutElasticsearchindexingAdjustingElasticsearchschema-lessbehaviorCreatingyourownmappingsUsingoutoftheboxanalyzersConfiguringyourownanalyzersIndexdatainbatchesAddingadditionalinternalinformationtoindicesSegmentmergingRouting

www.EBooksWorld.ir

Page 120: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchindexingSofarwehaveourElasticsearchclusterupandrunning.WealsoknowhowtouseElasticsearchRESTAPItoindexourdata,weknowhowtoretrieveit,andwealsoknowhowtoremovethedatathatwenolongerneed.We’vealsolearnedhowtosearchinourdatabyusingtheURIrequestsearchandApacheLucenequerylanguage.However,untilnowwe’veusedElasticsearchfunctionalitythatallowsusnottocareaboutindices,shards,anddatastructure.ThisisnotsomethingthatyoumaybeusedtowhenyouarecomingfromtheworldofSQLdatabases,whereyouneedthedatabaseandthetableswithallthecolumnscreatedupfront.Ingeneral,youneededtodescribethedatastructuretobeabletoputdataintothedatabase.Elasticsearchisschema-lessandbydefaultcreatesindicesautomaticallyandbecauseofthatwecanjustinstallitandindexdatawithouttheneedofanypreparations.However,thisisusuallynotthebestsituationwhenitcomestoproductionenvironmentswhereyouwanttocontroltheanalysisofyourdata.BecauseofthatwewillstartwithshowingyouhowtomanageyourindicesandthenwewillgetyouthroughtheworldofmappingsinElasticsearch.

www.EBooksWorld.ir

Page 121: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ShardsandreplicasInChapter1,GettingStartedwithElasticsearchCluster,wetoldyouthatindicesinElasticsearcharebuiltfromoneormoreshards.EachofthoseshardscontainspartofthedocumentsetandeachshardisaseparateLuceneindex.Inadditiontothat,eachshardcanhavereplicas–physicalcopiesoftheprimarysharditself.Whenwecreateanindex,wecantellElasticsearchhowmanyshardsitshouldbebuiltfrom.

NoteThedefaultnumberofshardsthatElasticsearchusesis5andeachindexwillalsocontainasinglereplica.Thedefaultconfigurationcanbechangedbysettingtheindex.number_of_shardsandindex.number_of_replicaspropertiesintheelasticsearch.ymlconfigurationfile.

Whendefaultsareused,wewillendupwithfiveApacheLuceneindicesthatourElasticsearchindexisbuiltofandonereplicaforeachofthose.So,withfiveshardsandonereplica,wewouldactuallyget10shards.Thisisbecauseeachshardwouldgetitsowncopy,sothetotalnumberofshardsintheclusterwouldbe10.

Dividingindicesinsuchawayallowsustospreadtheshardsacrossthecluster.Thenicethingaboutthatisthatalltheshardswillbeautomaticallyspreadthroughoutthecluster.Ifwehaveasinglenode,Elasticsearchwillputthefiveprimaryshardsonthatnodeandwillleavethereplicasunassigned,becauseElasticsearchdoesn’tassignshardsandtheirreplicastothesamenode.Thereasonforthatissimple–ifanodewouldcrash,wewouldloseboththeprimarysourceofthedataandallthecopies.So,ifyouhaveoneElasticsearchnode,don’tworryaboutreplicasnotbeingassigned–itissomethingtobeexpected.OfcoursewhenyouhaveenoughnodesforElasticsearchtoassignallthereplicas(inadditiontoshards),itisnotgoodtonothavethemassignedandyoushouldlookfortheprobablecausesofthatsituation.

Thethingtorememberisthathavingshardsandreplicasisnotfree.Firstofall,eachreplicaneedsadditionaldiskspace,exactlythesameamountofspacethattheoriginalshardneeds.Soifwehave3replicasforourindex,wewillactuallyneed4timesmorespace.Ifourprimaryshardweighs100GBintotal,with3replicaswewouldneed400GB–100GBforeachreplica.However,thisisnottheonlycost.EachreplicaisaLuceneindexonitsownandElasticsearchneedssomememorytohandlethat.Themoreshardsinthecluster,themorememoryisbeingused.Andfinally,havingreplicasmeansthatwewillhavetodoindexationoneachofthereplica,inadditiontotheindexationontheprimaryshard.Thereisanotionofshadowreplicaswhichcancopythewholebinaryindex,but,inmostcases,eachreplicawilldoitsownindexation.ThegoodthingaboutreplicasisthatElasticsearchwilltrytospreadthequeryandgetrequestsevenlybetweentheshardsandtheirreplicas,whichmeansthatwecanscaleourclusterhorizontallybyusingthem.

Sotosumuptheconclusions:

Havingmoreshardsintheindexallowsustospreadtheindexbetweenmoreservers

www.EBooksWorld.ir

Page 122: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

andparallelizetheindexingoperationsandthushavebetterindexingthroughput.Dependingonyourdeployment,havingmoreshardsmayincreasequerythroughputandlowerquerieslatency–especiallyinenvironmentsthatdon’thavealargenumberofqueriespersecond.Havingmoreshardsmaybeslowercomparedtoasingleshardquery,becauseElasticsearchneedstoretrievethedatafrommultipleserversandcombinethemtogetherinmemory,beforereturningthefinalqueryresults.Havingmorereplicasresultsinamoreresilientcluster,becausewhentheprimaryshardisnotavailable,itscopywilltakethatrole.Basically,havingasinglereplicaallowsustoloseonecopyofashardandstillservethewholedata.Havingtworeplicasallowsustolosetwocopiesoftheshardandstillseethewholedata.Thehigherthereplicacount,thehigherqueriesthroughputtheclusterwillhave.That’sbecauseeachreplicacanservethedataithasindependentlyfromalltheothers.Thehighernumberofshards(bothprimaryandreplicas)willresultinmorememoryneededbyElasticsearch.

Ofcourse,thesearenottheonlyrelationshipsbetweenthenumberofshardsandreplicasinElasticsearch.Wewilltalkaboutmostofthemlaterinthebook.

So,howmanyshardsandreplicasshouldwehaveforourindices?Thatdepends.Webelievethatthedefaultsarequitegoodbutnothingcanreplaceagoodtest.Notethatthenumberofreplicasisnotveryimportantbecauseyoucanadjustitonaliveclusterafterindexcreation.Youcanremoveandaddthemifyouwantandhavetheresourcestorunthem.Unfortunately,thisisnottruewhenitcomestothenumberofshards.Onceyouhaveyourindexcreated,theonlywaytochangethenumberofshardsistocreateanotherindexandre-indexyourdata.

WriteconsistencyElasticsearchallowsustocontrolthewriteconsistencytopreventwriteshappeningwhentheyshouldnot.Bydefault,Elasticsearchindexingoperationissuccessfulwhenthewriteissuccessfulonthequorumonactiveshards–meaning50%oftheactiveshardsplusone.Wecancontrolthisbehaviorbyaddingaction.write_consitencytoourelasticsearch.ymlfileorbyaddingtheconsistencyparametertoourindexrequest.Thementionedpropertiescantakethefollowingvalues:

quorum:Thedefaultvalue,requiring50%plus1activeshardstobesuccessfulfortheindexoperationtosucceedone:Requiresonlyasingleactiveshardtobesuccessfulfortheindexoperationtosucceedall:Requiresalltheactiveshardstobesuccessfulfortheindexoperationtosucceed

www.EBooksWorld.ir

Page 123: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CreatingindicesWhenwewereindexingourdocumentsinChapter1,GettingStartedwithElasticsearchCluster,wedidn’tcareaboutindexcreationatall.WeassumedthatElasticsearchwilldoeverythingforusandactuallyitwastrue;wejustusedthefollowingcommand:

curl-XPUT'http://localhost:9200/blog/article/1'-d'{"title":"New

versionofElasticsearchreleased!","content":"Version1.0released

today!","tags":["announce","elasticsearch","release"]}'

Thisisjustfine.Ifsuchanindexdoesnotexist,Elasticsearchautomaticallycreatestheindexforus.However,therearetimeswhenwewanttocreateindicesourselvesforvariousreasons.Maybewewouldliketohavecontroloverwhichindicesarecreatedtoavoiderrorsormaybewehavesomenondefaultsettingsthatwewouldliketousewhencreatingaparticularindex.Thereasonsmaydiffer,butit’sgoodtoknowthatwecancreateindiceswithoutindexingdocuments.

ThesimplestwaytocreateanindexistorunaPUTHTTPrequestwiththenameoftheindexwewanttocreate.Forexample,tocreateanindexcalledblog,wecouldusethefollowingcommand:

curl-XPUThttp://localhost:9200/blog/

WejusttoldElasticsearchthatwewanttocreatetheindexwiththenameblog.Ifeverythinggoesright,youwillseethefollowingresponsefromElasticsearch:

{"acknowledged":true}

AlteringautomaticindexcreationWealreadymentionedthatautomaticindexcreationisnotthebestideainsomecases.Forexample,asimpletypoduringindexcreationcanleadtocreatinghundredsofunusedindicesandmakeclusterstateinformationlargerthanitshouldbe,puttingmorepressureonElasticsearchandtheunderlyingJVM.Becauseofthat,wecanturnoffautomaticindexcreationbyaddingasimplepropertytotheelasticsearch.ymlconfigurationfile:

action.auto_create_index:false

Let’sstopforawhileanddiscusstheaction.auto_create_indexproperty,becauseitallowsustodomorecomplicatedthingsthanjustallowing(settingittotrue)anddisabling(settingittofalse)automaticindexcreation.Thementionedpropertyallowsustousepatternsthatspecifytheindexnameswhichshouldbeallowedtobeautomaticallycreatedandwhichshouldbedisallowed.Forexample,let’sassumethatwewouldliketoallowautomaticindexcreationforindicesstartingwithlogsandwewouldliketodisallowalltheothers.Todosomethinglikethis,wewouldsettheaction.auto_create_indexpropertytosomethingasfollows:

action.auto_create_index:+logs*,-*

Nowifwewouldliketocreateanindexcalledlogs_2015-10-01,wewouldsucceed.Tocreatesuchanindex,wewouldusethefollowingcommand:

www.EBooksWorld.ir

Page 124: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

curl-XPUThttp://localhost:9200/logs_2015-10-01/log/1-d'{"message":

"Testlogmessage"}'

Elasticsearchwouldrespondwith:

{

"_index":"logs_2015-10-01",

"_type":"log",

"_id":"1",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"created":true

}

However,supposewenowtrytocreatetheblogusingthefollowingcommand:

curl-XPUThttp://localhost:9200/blog/article/1-d'{"title":"Testarticle

title"}'

Elasticsearchwouldrespondwithanerrorsimilartothefollowingone:

{

"error":{

"root_cause":[{

"type":"index_not_found_exception",

"reason":"nosuchindex",

"resource.type":"index_expression",

"resource.id":"blog",

"index":"blog"

}],

"type":"index_not_found_exception",

"reason":"nosuchindex",

"resource.type":"index_expression",

"resource.id":"blog",

"index":"blog"

},

"status":404

}

Onethingtorememberisthattheorderofpatterndefinitionsmatters.Elasticsearchchecksthepatternsuptothefirstpatternthatmatches,soifwemove-*asthefirstpattern,the+logs*patternwon’tbeusedatall.

SettingsforanewlycreatedindexManualindexcreationisalsonecessarywhenwewanttopassnondefaultconfigurationoptionsduringindexcreation;forexample,initialnumberofshardsandreplicas.WecandothatbyincludingJSONpayloadwithsettingsasthePUTHTTPrequestbody.Forexample,ifwewouldliketotellElasticsearchthatourblogindexshouldonlyhaveasingleshardandtworeplicasinitially,thefollowingcommandcouldbeused:

curl-XPUThttp://localhost:9200/blog/-d'{

www.EBooksWorld.ir

Page 125: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"settings":{

"number_of_shards":1,

"number_of_replicas":2

}

}'

Theprecedingcommandwillresultinthecreationoftheblogindexwithoneshardandtworeplicas,makingatotalofthreephysicalLuceneindices–calledshardsaswealreadyknow.Ofcoursetherearealotmoresettingsthatwecanuse,butwhatwedidisenoughfornowandwewilllearnabouttherestthroughoutthebook.

IndexdeletionOfcourse,similartohowwehandleddocuments,Elasticsearchallowsustodeleteindicesaswell.Deletinganindexisverysimilartocreatingit,butinsteadofusingthePUTHTTPmethod,weusetheDELETEone.Forexample,ifwewouldliketodeleteourpreviouslycreatedblogindex,wewouldrunthefollowingcommand:

curl-XDELETEhttp://localhost:9200/blog

Theresponsewillbethesameastheonewesawearlierwhenwecreatedanindexandshouldlookasfollows:

{"acknowledged":true}

Nowthatweknowwhatanindexis,howtocreateit,andhowtodeleteit,wearereadytocreateindiceswiththemappingswehavedefined.EventhoughElasticsearchisschema–less,therearealotofsituationswherewewouldliketomanuallycreatetheschema,toavoidanyproblemswiththeindexstructure.

www.EBooksWorld.ir

Page 126: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 127: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MappingsconfigurationIfyouareusedtoSQLdatabases,youmayknowthatbeforeyoucanstartinsertingthedatainthedatabase,youneedtocreateaschema,whichwilldescribewhatyourdatalookslike.AlthoughElasticsearchisaschema-less(werathercallitdatadrivenschema)searchengineandcanfigureoutthedatastructureonthefly,wethinkthatcontrollingthestructureandthusdefiningitourselvesisabetterway.Thefieldtypedeterminingmechanismisnotgoingtoguessthefuture.Forexample,ifyoufirstsendanintegervalue,suchas60,andyousendafloatvaluesuchas70.23forthesamefield,anerrorcanhappenorElasticsearchwilljustcutoffthedecimalpartofthefloatvalue(whichisactuallywhathappens).ThisisbecauseElasticsearchwillfirstsetthefieldtypetointegerandwilltrytoindexthefloatvaluetotheintegerfieldwhichwillcausecuttingofthedecimalpointinthefloatingpointnumber.Inthenextfewpagesyou’llseehowtocreatemappingsthatsuityourneedsandmatchyourdatastructure.

NoteNotethatwedidn’tincludealltheinformationabouttheavailabletypesinthischapterandsomefeaturesofElasticsearch,suchasnestedtype,parent-childhandling,storinggeographicalpoints,andsearch,aredescribedinthefollowingchaptersofthisbook.

www.EBooksWorld.ir

Page 128: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TypedeterminingmechanismBeforewestartdescribinghowtocreatemappingsmanually,wewanttogetbacktotheautomatictypedeterminingalgorithmusedinElasticsearch.Aswealreadysaid,ElasticsearchcantryguessingtheschemaforourdocumentsbylookingattheJSONthatthedocumentisbuiltfrom.BecauseJSONisstructured,thatseemseasytodo.Forexample,stringsaresurroundedbyquotationmarks,Booleansaredefinedusingspecificwords,andnumbersarejustafewdigits.Thisisasimpletrick,butitusuallyworks.Forexample,let’slookatthefollowingdocument:

{

"field1":10,

"field2":"10"

}

Theprecedingdocumenthastwofields.Thefield1fieldwillbegivenatypenumber(tobeprecise,thatfieldwillbegivenalongtype).Thesecondfield,calledfield2willbegivenastringtype,becauseitissurroundedbyquotationmarks.Ofcourse,forsomeusecasesthiscanbethedesiredbehavior.However,ifsomehowwewouldsurroundallthedatausingquotationmark(whichisnotthebestideaanyway)ourindexstructurewouldcontainonlystringtypefields.

NoteDon’tworryaboutthefactthatyouarenotfamiliarwithwhatarethenumerictypes,thestringtypes,andsoon.WewilldescribethemafterweshowyouwhatyoucandototunetheautomatictypedeterminingmechanisminElasticsearch.

DisablingthetypedeterminingmechanismThefirstsolutionistocompletelydisabletheschema-lessbehaviorinElasticsearch.Wecandothatbyaddingtheindex.mapper.dynamicpropertytoourindexpropertiesandsettingittofalse.Wecandothatbyrunningthefollowingcommandtocreatetheindex:

curl-XPUT'localhost:9200/sites'-d'{

"index.mapper.dynamic":false

}'

BydoingthatwetoldElasticsearchthatwedon’twantittoguessthetypeofourdocumentsinthesite’sindexandthatwewillprovidethemappingsourselves.Ifwewilltryindexingsomeexampledocumenttothesite’sindex,wewillgetthefollowingerror:

{

"error":{

"root_cause":[{

"type":"type_missing_exception",

"reason":"type[[doc,tryingtoautocreatemapping,butdynamic

mappingisdisabled]]missing",

"index":"sites"

}],

"type":"type_missing_exception",

"reason":"type[[doc,tryingtoautocreatemapping,butdynamic

www.EBooksWorld.ir

Page 129: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

mappingisdisabled]]missing",

"index":"sites"

},

"status":404

}

Thisisbecausewedidn’tcreateanymappings–noschemafordocumentswascreated.Elasticsearchcouldn’tcreateoneforusbecausewedidn’tallowitandtheindexationcommandfailed.

Ofcoursethisisnottheonlythingwecandowhenitcomestoconfiguringhowthetypedeterminingmechanismworks.Wecanalsotuneitordisableitforagiventypeontheobjectlevel.WewilltalkaboutthesecondcaseinChapter5,ExtendingYourIndexStructure.Fornow,let’slookatthepossibilitiesoftuningtypedeterminingmechanisminElasticsearch.

TuningthetypedeterminingmechanismfornumerictypesOneofthesolutionstotheproblemswithJSONdocumentsandtypeguessingisthatwearenotalwaysincontrolofthedata.Thedocumentsthatweareindexingcancomefrommultipleplacesandsomesystemsinourenvironmentmayincludequotationmarksforallthefieldsinthedocument.Thiscanleadtoproblemsandbadguesses.Becauseofthat,Elasticsearchallowsustoenablemoreaggressivefieldsvaluecheckingfornumericfieldsbysettingthenumeric_detectionpropertytotrueinthemappingsdefinition.Forexample,let’sassumethatwewanttocreateanindexcalledusersandwewantittohavetheusertypeonwhichwewillwantmoreaggressivenumericfieldsparsing.Todothat,wewillusethefollowingcommand:

curl-XPUThttp://localhost:9200/users/?pretty-d'{

"mappings":{

"user":{

"numeric_detection":true

}

}

}'

Nowlet’srunthefollowingcommandtoindexasingledocumenttotheusersindex:

curl-XPOSThttp://localhost:9200/users/user/1-d'{"name":"User1",

"age":"20"}'

Earlier,withthedefaultsettings,theagefieldwouldbesettostringtype.Withthenumeric_detectionpropertysettotrue,thetypeoftheagefieldwillbesettolong.Wecancheckthatbyrunningthefollowingcommand(itwillretrievethemappingsforallthetypesintheusersindex):

curl-XGET'localhost:9200/users/_mapping?pretty'

TheprecedingcommandshouldresultinthefollowingresponsereturnedbyElasticsearch:

{

"users":{

"mappings":{

www.EBooksWorld.ir

Page 130: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"user":{

"numeric_detection":true,

"properties":{

"age":{

"type":"long"

},

"name":{

"type":"string"

}

}

}

}

}

}

Aswecansee,theagefieldwasreallysettobeoftypelong.

TuningthetypedeterminingmechanismfordatesAnothertypeofdatathatcausestroublearefieldswithdates.Datescancomeindifferentflavors,forexample,2015-10-0111:22:33isaproperdateandsois2015-10-01T11:22:33+00.Becauseofthat,Elasticsearchtriestomatchthefieldstotimestampsorstringsthatmatchsomegivendateformat.Ifthatmatchingoperationissuccessful,thefieldistreatedasadatebasedone.Ifweknowhowourdatefieldslook,wecanhelpElasticsearchbyprovidingalistofrecognizeddateformatsusingthedynamic_date_formatsproperty,whichallowsustospecifytheformatsarray.Let’slookatthefollowingcommandforcreatinganindex:

curl-XPUT'http://localhost:9200/blog/'-d'{

"mappings":{

"article":{

"dynamic_date_formats":["yyyy-MM-ddhh:mm"]

}

}

}'

Theprecedingcommandwillresultinthecreationofanindexcalledblogwiththesingletypecalledarticle.We’vealsousedthedynamic_date_formatspropertywithasingledateformatthatwillresultinElasticsearchusingthedatecoretype(refertotheCoretypessectioninthischapterformoreinformationaboutfieldtypes)forfieldsmatchingthedefinedformat.Elasticsearchusesthejoda-timelibrarytodefinethedateformats,sovisithttp://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.htmlifyouareinterestedinknowingaboutthem.

NoteRememberthatthedynamic_date_formatpropertyacceptsanarrayofvalues.Thatmeansthatwecanhandleseveraldateformatssimultaneously.

Withtheprecedingindex,wecannowtryindexinganewdocumentusingthefollowingcommand:

www.EBooksWorld.ir

Page 131: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

curl-XPUTlocalhost:9200/blog/article/1-d'{"name":"Test",

"test_field":"2015-10-0111:22"}'

Elasticsearchwillofcourseindexthatdocument,butlet’slookatthemappingscreatedforourindex:

curl-XGET'localhost:9200/blog/_mapping?pretty'

Theresponsefortheprecedingcommandwillbeasfollows:

{

"blog":{

"mappings":{

"article":{

"dynamic_date_formats":["yyyy-MM-ddhh:mm"],

"properties":{

"name":{

"type":"string"

},

"test_field":{

"type":"date",

"format":"yyyy-MM-ddhh:mm"

}

}

}

}

}

}

Aswecansee,thetest_fieldfieldwasgivenadatetype,soourtuningworks.

Unfortunately,theproblemstillexistsifwewanttheBooleantypetobeguessed.ThereisnooptiontoforcetheguessingofBooleantypesfromthetext.Insuchcases,whenachangeofsourceformatisimpossible,wecanonlydefinethefielddirectlyinthemappingsdefinition.

www.EBooksWorld.ir

Page 132: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexstructuremappingEachdatahasitsownstructure–someareverysimple,andsomeincludecomplicatedobjectrelations,childrendocuments,andnestedproperties.Ineachcase,weneedtohaveaschemainElasticsearchcalledmappingsthatdefinehowthedatalooks.Ofcourse,wecanusetheschema-lessnatureofElasticsearch,butwecanandweusuallywanttopreparethemappingsupfront,soweknowhowthedataishandled.

Forthepurposesofthischapter,wewilluseasingletypeintheindex.Ofcourse,Elasticsearchasamultitenantsystemallowsustohavemultipletypesinasingleindex,butwewanttosimplifytheexample,tomakeiteasiertounderstand.So,forthepurposeofthenextfewpages,wewillcreateanindexcalledpoststhatwillholddatafordocumentsinaposttype.Wealsoassumethattheindexwillholdthefollowinginformation:

UniqueidentifieroftheblogpostNameoftheblogpostPublicationdateContents–textofthepostitself

InElasticsearch,mappings,aswithalmostallcommunication,aresentasJSONobjectsintherequestbody.So,ifwewanttocreatethesimplestmappingsthatmatchesourneed,itwilllookasfollows(westoredthemappingsintheposts.jsonfile,sowecaneasilysendit):

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"published":{"type":"date"},

"contents":{"type":"string"}

}

}

}

}

Tocreateourpostsindexwiththeprecedingmappingsfile,wewilljustrunthefollowingcommand:

curl-XPOST'http://localhost:9200/posts'[email protected]

NoteNotethatyoucanstoreyourmappingsandsetafilenametowhatevernameyoulike.Thecurlcommandwilljusttakethecontentsofit.

Andagain,ifeverythinggoeswell,weseethefollowingresponse:

{"acknowledged":true}

Elasticsearchreportedthatourindexhasbeencreated.IfwelookattheElasticsearchnodewww.EBooksWorld.ir

Page 133: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

–onthecurrentmaster,wewillseesomethingasfollows:

[2015-10-1415:02:12,840][INFO][cluster.metadata][Shalla-Bal]

[posts]creatingindex,cause[api],templates[],shards[5]/[1],mappings

[post]

Wecanseethatthepostsindexhasbeencreated,with5shardsand1replica(shards[5]/[1])andwithmappingsforasingleposttype(mappings[post]).Let’snowdiscussthecontentsoftheposts.jsonfileandthepossibilitieswhenitcomestomappings.

TypeandtypesdefinitionThemappingsdefinitioninElasticsearchisjustanotherJSONobject,soitneedstobeproperlystartedandendedwithcurlybrackets.Allthemappingsdefinitionsarenestedinsideasinglemappingsobject.Inourexample,wehadasingleposttype,butwecanhavemultipleofthem.Forexample,ifwewouldliketohavemorethanasingletypeinourmappings,wejustneedtoseparatethemwithacommacharacter.Let’sassumethatwewouldliketohaveanadditionalusertypeinourpostsindex.Themappingsdefinitioninsuchcasewilllookasfollows(westoreditintheposts_with_user.jsonfile):

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"published":{"type":"date"},

"contents":{"type":"string"}

}

},

"user":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"}

}

}

}

}

Asyoucansee,wecannamethetypeswiththenameswewant.Undereachtypewehavethepropertiesobjectinwhichwestoretheactualnameofthefieldsandtheirdefinition.

FieldsEachfieldinthemappingsdefinitionisjustanameandanobjectdescribingthepropertiesofthefield.Forexample,wecanhaveafielddefinedasthefollowing:

"body":{"type":"string","store":"yes","index":"analyzed"}

Theprecedingfielddefinitionstartswithaname–body.Afterthatwehaveanobjectwiththreeproperties–thetypeofthefield(thetypeproperty),iftheoriginalfieldvalueshouldbestored(thestoreproperty),andifthefieldshouldbeindexedandhow(theindexproperty).And,ofcourse,multiplefielddefinitionsareseparatedfromeachotherusingthecommacharacter,justlikeotherJSONobjects.

www.EBooksWorld.ir

Page 134: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CoretypesEachfieldtypeinElasticsearchcanbegivenoneoftheprovidedcoretypes.ThecoretypesinElasticsearchareasfollows:

StringNumber(integer,long,float,double)DateBooleanBinary

Inadditiontothecoretypes,Elasticsearchprovidesadditionaltypesthatcanhandlemorecomplicateddata–suchasnesteddocuments,object,andsoon.WewilltalkabouttheminChapter5,ExtendingYourIndexStructure.

Commonattributes

Beforecontinuingwithallthecoretypedescriptions,wewouldliketodiscusssomecommonattributesthatyoucanusetodescribeallthetypes(exceptforthebinaryone):

index_name:Thisattributedefinesthenameofthefieldthatwillbestoredintheindex.Ifthisisnotdefined,thenamewillbesettothenameoftheobjectthatthefieldisdefinedwith.Usually,youdon’tneedtosetthisproperty,butitmaybeusefulinsomecases;forexample,whenyoudon’thavecontroloverthenameofthefieldsintheJSONdocumentsthataresenttoElasticsearch.index:Thisattributecantakethevaluesanalyzedandnoand,forstring-basedfields,itcanalsobesettotheadditionalnot_analyzedvalue.Ifsettoanalyzed,thefieldwillbeindexedandthussearchable.Ifsettono,youwon’tbeabletosearchonsuchafield.Thedefaultvalueisanalyzed.Incaseofstring-basedfields,thereisanadditionaloption,not_analyzed.This,whenset,willmeanthatthefieldwillbeindexedbutnotanalyzed.So,thefieldiswrittenintheindexasitwassenttoElasticsearchandonlyaperfectmatchwillbecountedduringasearch–thequerywillhavetoincludeexactlythesamevalueasthevalueintheindex.IfwecompareittotheSQLdatabasesworld,settingtheindexpropertyofafieldtonot_analyzedwouldworkjustlikeusingwherefield=value.Alsorememberthatsettingtheindexpropertytonowillresultinthedisablinginclusionofthatfieldininclude_in_all(theinclude_in_allpropertyisdiscussedasthelastpropertyinthelist).store:Thisattributecantakethevaluesyesandnoandspecifiesiftheoriginalvalueofthefieldshouldbewrittenintotheindex.Thedefaultvalueisno,whichmeansthatElasticsearchwon’tstoretheoriginalvalueofthefieldandwilltrytousethe_sourcefield(theJSONrepresentingtheoriginaldocumentthathasbeensenttoElasticsearch)whenyouwanttoretrievethefieldvalue.Storedfieldsarenotusedforsearching,howevertheycanbeusedforhighlightingifenabled(whichmaybemoreefficientthatloadingthe_sourcefieldincaseitisbig).doc_values:Thisattributecantakethevaluesoftrueandfalse.Whensettotrue,Elasticsearchwillcreateaspecialondiskstructureduringindexationfornottokenizedfields(likenotanalyzedstringfields,numberbasedfields,Booleanfields,

www.EBooksWorld.ir

Page 135: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

anddatefields).ThisstructureishighlyefficientandisusedbyElasticsearchforoperationsthatrequireun-inverteddata,suchasaggregations,sorting,orscripting.StartingwithElasticsearch2.0thedefaultvalueofthisistruefornottokenizedfields.SettingthisvaluetofalsewillresultinElasticsearchusingfielddatacacheinsteadofdocvalues,whichhashighermemorydemand,butmaybefasterinsomeraresituations.boost:Thisattributedefineshowimportantthefieldisinsidethedocument;thehighertheboost,themoreimportantthevaluesinthefieldare.Thedefaultvalueofthisattributeis1,whichmeansaneutralvalue–anythingabove1willmakethefieldmoreimportant,anythinglessthan1willmakeitlessimportant.null_value:Thisattributespecifiesavaluethatshouldbewrittenintotheindexincasethatfieldisnotapartofanindexeddocument.Thedefaultbehaviorwilljustomitthatfield.copy_to:Thisattributespecifiesanarrayoffieldstowhichtheoriginalvaluewillbecopiedto.Thisallowsfordifferentkindofanalysisofthesamedata.Forexample,youcouldimaginehavingtwofields–onecalledtitleandonecalledtitle_sort,eachhavingthesamevaluebutprocesseddifferently.Wecouldusecopy_totocopythetitlefieldvaluetotitle_sort.include_in_all:Thisattributespecifiesifthefieldshouldbeincludedinthe_allfield.The_allfieldisaspecialfieldusedbyElasticsearchtoalloweasysearchinginthecontentsofthewholeindexeddocument.Elasticsearchcreatesthecontentofthe_allfieldbycopyingallthedocumentfieldsthere.Bydefault,ifthe_allfieldisused,allthefieldswillbeincludedinit.

String

Stringisthebasictexttypewhichallowsustostoreoneormorecharactersinsideit.Asampledefinitionofsuchafieldisasfollows:

"body":{"type":"string","store":"yes","index":"analyzed"}

Inadditiontothecommonattributes,thefollowingattributescanalsobesetforthestring-basedfields:

term_vector:Thisattributecantakethevaluesno(thedefaultone),yes,with_offsets,with_positions,andwith_positions_offsets.ItdefineswhetherornottocalculatetheLucenetermvectorsforthatfield.Ifyouareusinghighlighting(distinctionwhichtermswherematchedinadocumentduringthequery),youwillneedtocalculatethetermvectorforthesocalledfastvectorhighlighting–amoreefficienthighlightingversion.analyzer:Thisattributedefinesthenameoftheanalyzerusedforindexingandsearching.Itdefaultstotheglobally-definedanalyzername.search_analyzer:Thisattributedefinesthenameoftheanalyzerusedforprocessingthepartofthequerystringthatissenttoaparticularfield.norms.enabled:Thisattributespecifieswhetherthenormsshouldbeloadedforafield.Bydefault,itissettotrueforanalyzedfields(whichmeansthatthenormswillbeloadedforsuchfields)andtofalsefornon-analyzedfields.Normsarevalues

www.EBooksWorld.ir

Page 136: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

insideofLuceneindexthatareusedwhencalculatingascoreforadocument–usuallynotneededfornotanalyzedfieldsandusedonlyduringquerytime.Anexampleindexcreationcommandthatdisablesnormforasinglefieldpresentwouldlookasfollows:

curl-XPOST'localhost:9200/essb'-d'{

"mappings":{

"book":{

"properties":{

"name":{

"type":"string",

"norms":{

"enabled":false

}

}

}

}

}

}'

norms.loading:ThisattributetakesthevalueseagerandlazyanddefineshowElasticsearchwillloadthenorms.Thefirstvaluemeansthatthenormsforsuchfieldsarealwaysloaded.Thesecondvaluemeansthatthenormswillbeloadedonlywhenneeded.Normsareusefulforscoring,butmayrequireavastamountofmemoryforlargedatasets.Havingnormsloadedeagerly(propertysettoeager)meanslessworkduringquerytime,butwillleadtomorememoryconsumption.Anexampleindexcreationcommandthateagerlyloadnormsforasinglefieldpresentlookasfollows:

curl-XPOST'localhost:9200/essb_eager'-d'{

"mappings":{

"book":{

"properties":{

"name":{

"type":"string",

"norms":{

"loading":"eager"

}

}

}

}

}

}'

position_offset_gap:Thisattributedefaultsto0andspecifiesthegapintheindexbetweeninstancesofthegivenfieldwiththesamename.Settingthistoahighervaluemaybeusefulifyouwantposition-basedqueries(suchasphrasequeries)tomatchonlyinsideasingleinstanceofthefield.index_options:Thisattributedefinestheindexingoptionsforthepostingslist–thestructureholdingtheterms(wetalkmoreaboutitinthePostingsformatsectionofthischapter).Thepossiblevaluesaredocs(onlydocumentnumbersareindexed),freqs(documentnumbersandtermfrequenciesareindexed),positions(documentnumbers,termfrequencies,andtheirpositionsareindexed),andoffsets(document

www.EBooksWorld.ir

Page 137: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

numbers,termfrequencies,theirpositions,andoffsetsareindexed).Thedefaultvalueforthispropertyispositionsforanalyzedfieldsanddocsforfieldsthatareindexedbutnotanalyzed.ignore_above:Thisattributedefinesthemaximumsizeofthefieldincharacters.Afieldwhosesizeisabovethespecifiedvaluewillbeignoredbytheanalyzer.

NoteInoneoftheupcomingElasticsearchversions,thestringtypemaybedeprecatedandmaybereplacedbytwonewtypes,textandkeyword,tobetterindicatewhatthestringbasedfieldisrepresenting.Thetexttypewillbeusedforanalyzedtextfieldsandthekeywordtypewillbeusedfornotanalyzedtextfields.Ifyouareinterestedintheincomingchanges,refertothefollowingGitHubissue:https://github.com/elastic/elasticsearch/issues/12394.

Number

Thisisthecommonnameforafewcoretypesthatgatherallthenumericfieldtypesthatareavailableandwaitingtobeused.ThefollowingtypesareavailableinElasticsearch(wespecifythembyusingthetypeproperty):

byte:Thistypedefinesabytevalue;forexample,1.Itallowsforvaluesbetween-128and127inclusive.short:Thistypedefinesashortvalue;forexample,12.Itallowsforvaluesbetween-32768and32767inclusive.integer:Thistypedefinesanintegervalue;forexample,134.Itallowsforvaluesbetween-231and231-1inclusiveuptoJava7andvaluesbetween0and232-1inJava8.long:Thistypedefinesalongvalue;forexample,123456789.Itallowsforvaluesbetween-263and263-1inclusiveuptoJava7andvaluesbetween0and264-1inJava8.float:Thistypedefinesafloatvalue;forexample,12.23.Forinformationaboutthepossiblevalues,refertohttps://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.double:Thistypedefinesadoublevalue;forexample,123.45.Forinformationaboutthepossiblevalues,refertohttps://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.

NoteYoucanlearnmoreaboutthementionedJavatypesathttp://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html.

Asampledefinitionofafieldbasedononeofthenumerictypesisasfollows:

"price":{"type":"float","precision_step":"4"}

Inadditiontothecommonattributes,thefollowingonescanalsobesetforthenumericfields:

www.EBooksWorld.ir

Page 138: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

precision_step:Thisattributedefinesthenumberoftermsgeneratedforeachvalueinthenumericfield.Thelowerthevalue,thehigherthenumberoftermsgenerated.Forfieldswithahighernumberoftermspervalue,rangequerieswillbefasteratthecostofaslightlylargerindex.Thedefaultvalueis16forlonganddouble,8forinteger,short,andfloat,and2147483647forbyte.coerce:Thisattributedefaultstotrueandcantakethevalueoftrueorfalse.ItdefinesifElasticsearchshouldtrytoconvertthestringvaluestonumbersforagivenfieldandifthedecimalpartsofthefloatvalueshouldbetruncatedfortheintegerbasedfields.ignore_malformed:Thisattributecantakethevaluetrueorfalse(whichisthedefault).Itshouldbesettotrueinordertoomitthebadlyformattedvalues.

Boolean

ThebooleancoretypeisdesignedforindexingtheBooleanvalues(trueorfalse).Asampledefinitionofafieldbasedonthebooleantypeisasfollows:

"allowed":{"type":"boolean","store":"yes"}

Binary

ThebinaryfieldisaBASE64representationofthebinarydatastoredintheindex.Youcanuseittostoredatathatisnormallywritteninbinaryform,suchasimages.Fieldsbasedonthistypearebydefaultstoredandnotindexed,soyoucanonlyretrievethemandnotperformsearchoperationsonthem.Thebinarytypeonlysupportstheindex_name,type,store,anddoc_valuesproperties.Thesamplefielddefinitionbasedonthebinaryfieldmaylooklikethefollowing:

"image":{"type":"binary"}

Date

Thedatecoretypeisdesignedtobeusedfordateindexing.ThedateinthefieldallowsustospecifyaformatthatwillberecognizedbyElasticsearch.ItisworthnotingthatallthedatesareindexedinUTCandareinternallyindexedaslongvalues.Inadditiontothat,forthedatebasedfields,ElasticsearchacceptslongvaluesrepresentingUTCmillisecondssinceepochregardlessoftheformatspecifiedforthedatefield.

ThedefaultdateformatrecognizedbyElasticsearchisquiteuniversalandallowsustoprovidethedateandoptionallythetime;forexample,2012-12-24T12:10:22.Asampledefinitionofafieldbasedonthedatetypeisasfollows:

"published":{"type":"date","format":"YYYY-mm-dd"}

Asampledocumentthatusestheabovedatefieldwiththespecifiedformatisasfollows:

{

"name":"Sampledocument",

"published":"2012-12-22"

}

Inadditiontothecommonattributes,thefollowingonescanalsobesetforthefields

www.EBooksWorld.ir

Page 139: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

basedonthedatetype:

format:Thisattributespecifiestheformatofthedate.ThedefaultvalueisdateOptionalTime.Forafulllistofformats,visithttps://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html.precision_step:Thisattributedefinesthenumberoftermsgeneratedforeachvalueinthenumericfield.Refertothenumericcoretypedescriptionformoreinformationaboutthisparameter.numeric_resolution:ThisattributedefinestheunitoftimethatElasticsearchwillusewhenanumericvalueispassedtothedatebasedfieldinsteadofthedatefollowingaformat.Bydefault,Elasticsearchusesthemillisecondsvalue,whichmeansthatthenumericvaluewillbetreatedasmillisecondssinceepoch.Anothervalueisseconds.ignore_malformed:Thisattributecantakethevaluetrueorfalse.Thedefaultvalueisfalse.Itshouldbesettotrueinordertoomitbadlyformattedvalues.

MultifieldsTherearesituationswhereweneedtohavethesamefieldanalyzeddifferently.Forexample,oneforsorting,oneforsearching,andoneforanalysiswithaggregations,butallusingthesamefieldvalue,justindexeddifferently.Wecouldofcourseusethepreviouslydescribedfieldvaluecopying,butwecanalsousesocalledmultifields.TobeabletousethatfeatureofElasticsearch,weneedtodefineanadditionalpropertyinourfielddefinitioncalledfields.Thefieldsisanobjectthatcancontainoneormoreadditionalfieldsthatwillbepresentinourindexandwillhavethevalueofthefieldthattheyareassignedto.Forexample,ifwewouldliketohaveaggregationsdoneonthenamefieldandinadditiontothatsearchonthatfield,wewoulddefineitasfollows:

"name":{

"type":"string",

"fields":{

"agg":{"type":"string","index":"not_analyzed"}

}

}

Theprecedingdefinitionwillcreatetwofields–onecallednameandthesecondcalledname.agg.Ofcourse,youdon’thavetospecifytwoseparatefieldsinthedatayouaresendingtoElasticsearch–asingleonenamednameisenough.Elasticsearchwilldotherest,whichmeanscopyingthevalueofthefieldtoallthefieldsfromtheprecedingdefinition.

TheIPaddresstypeTheipfieldtypewasaddedtoElasticsearchtosimplifytheuseofIPv4addressesinanumericform.ThisfieldtypeallowsustosearchdatathatisindexedasanIPaddress,sortonsuchdata,anduserangequeriesusingIPvalues.

Asampledefinitionofafieldbasedononeofthenumerictypesisasfollows:

www.EBooksWorld.ir

Page 140: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"address":{"type":"ip"}

Inadditiontothecommonattributes,theprecision_stepattributecanalsobesetfortheiptypebasedfields.Refertothenumerictypedescriptionformoreinformationaboutthatproperty.

Asampledocumentthatusestheipbasedfieldlooksasfollows:

{

"name":"TomPC",

"address":"192.168.2.123"

}

TokencounttypeThetoken_countfieldtypeallowsustostoreandindexinformationabouthowmanytokensthegivenfieldhasinsteadofstoringandindexingthetextprovidedtothefield.Itacceptsthesameconfigurationoptionsasthenumbertype,butinadditiontothat,weneedtospecifytheanalyzerwhichwillbeusedtodividethefieldvalueintotokens.Wedothatbyusingtheanalyzerproperty.

Asampledefinitionofafieldbasedonthetoken_countfieldtypelooksasfollows:

"title_count":{"type":"token_count","analyzer":"standard"}

www.EBooksWorld.ir

Page 141: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsinganalyzersThegreatthingaboutElasticsearchisthatitleveragestheanalysiscapabilitiesofApacheLucene.Thismeansthatforfieldsthatarebasedonthestringtype,wecanspecifywhichanalyzerElasticsearchshoulduse.AsyourememberfromtheFulltextsearchingsectionofChapter1,GettingStartedwithElasticsearchCluster,theanalyzerisafunctionalitythatisusedtoanalyzedataorqueriesinthewaywewant.Forexample,whenwedividewordsonthebasisofwhitespacesandlowercasecharacters,wedon’thavetoworryabouttheuserssendingwordsthatarelowercasedoruppercased.ThismeansthatElasticsearch,elasticsearch,andElAstIcSeaRChwillbetreatedasthesameword.What’smoreisthatElasticsearchallowsustousenotonlytheanalyzersprovidedoutofthebox,butalsocreateourownconfigurations.Wecanalsousedifferentanalyzersatthetimeofindexinganddifferentanalyzersatthetimeofquerying—wecanchoosehowwewantourdatatobeprocessedateachstageofthesearchprocess.Let’snowhavealookattheanalyzersprovidedbyElasticsearchandatElasticsearchanalysisfunctionalityingeneral.

Out-of-the-boxanalyzersElasticsearchallowsustouseoneofthemanyanalyzersdefinedbydefault.Thefollowinganalyzersareavailableoutofthebox:

standard:ThisanalyzerisconvenientformostEuropeanlanguages(refertohttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.htmlforthefulllistofparameters).simple:Thisanalyzersplitstheprovidedvalueonnon-lettercharactersandconvertsthemtolowercase.whitespace:Thisanalyzersplitstheprovidedvalueonthebasisofwhitespacecharacters.stop:Thisissimilartoasimpleanalyzer,butinadditiontothefunctionalityofthesimpleanalyzer,itfiltersthedataonthebasisoftheprovidedsetofstopwords(refertohttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-analyzer.htmlforthefulllistofparameters).keyword:Thisisaverysimpleanalyzerthatjustpassestheprovidedvalue.You’llachievethesamebyspecifyingaparticularfieldasnot_analyzed.pattern:Thisanalyzerallowsflexibletextseparationbytheuseofregularexpressions(refertohttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.htmlforthefulllistofparameters).Thekeypointtorememberwhenitcomestothepatternanalyzeristhattheprovidedpatternshouldmatchtheseparatorsofthewords,notthewordsthemselves.language:Thisanalyzerisdesignedtoworkwithaspecificlanguage.Thefulllistoflanguagessupportedbythisanalyzercanbefoundathttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html.snowball:Thisisananalyzerthatissimilartostandard,butadditionallyprovidesthe

www.EBooksWorld.ir

Page 142: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

stemmingalgorithm(refertohttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.htmlforthefulllistofparameters).

NoteStemmingistheprocessofreducingtheinflectedandderivedwordstotheirstemorbaseform.Suchaprocessallowsforthereductionofwords,forexample,withcarsandcar.Forthementionedwords,stemmer(whichisanimplementationofthestemmingalgorithm)willproduceasinglestem,car.Afterindexing,thedocumentscontainingsuchwordswillbematchedwhileusinganyofthem.Withoutstemming,thedocumentswiththeword“cars”willonlybematchedbyaquerycontainingthesameword.YoucanfindmoreinformationaboutstemmingonWikipediaathttps://en.wikipedia.org/wiki/Stemming.

DefiningyourownanalyzersInadditiontotheanalyzersmentionedpreviously,ElasticsearchallowsustodefinenewoneswithouttheneedforwritingasinglelineofJavacode.Inordertodothat,weneedtoaddanadditionalsectiontoourmappingsfile;thatis,thesettingssection,whichholdsadditionalinformationusedbyElasticsearchduringindexcreation.Thefollowingcodesnippetshowshowwecandefineourcustomsettingssection:

"settings":{

"index":{

"analysis":{

"analyzer":{

"en":{

"tokenizer":"standard",

"filter":[

"asciifolding",

"lowercase",

"ourEnglishFilter"

]

}

},

"filter":{

"ourEnglishFilter":{

"type":"kstem"

}

}

}

}

}

Wespecifiedthatwewantanewanalyzernamedentobepresent.Eachanalyzerisbuiltfromasingletokenizerandmultiplefilters.Acompletelistofthedefaultfiltersandtokenizerscanbefoundathttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html.Ourenanalyzerincludesthestandardtokenizerandthreefilters:asciifoldingandlowercase,whicharetheonesavailablebydefault,andacustomourEnglishFilter,whichisafilterwehavedefined.

www.EBooksWorld.ir

Page 143: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Todefineafilter,weneedtoprovideitsname,itstype(thetypeproperty),andanynumberofadditionalparametersrequiredbythatfiltertype.ThefulllistoffiltertypesavailableinElasticsearchcanbefoundathttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html.Pleasebeaware,thatwewon’tbediscussingeachfilterasthelistoffiltersisconstantlychanging.Ifyouareinterestedinthefullfilterslist,pleaserefertothementionedpageinthedocumentation.

So,thefinalmappingsfilewithourcustomanalyzerdefinedwillbeasfollows:

{

"settings":{

"index":{

"analysis":{

"analyzer":{

"en":{

"tokenizer":"standard",

"filter":[

"asciifolding",

"lowercase",

"ourEnglishFilter"

]

}

},

"filter":{

"ourEnglishFilter":{

"type":"kstem"

}

}

}

}

},

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string","analyzer":"en"}

}

}

}

}

Ifwesavetheprecedingmappingstoafilecalledposts_mappings.json,wecanrunthefollowingcommandtocreatethepostsindex:

curl-XPOST'http://localhost:9200/posts'-d@posts_mappings.json

WecanseehowouranalyzerworksbyusingtheAnalyzeAPI(https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html).Forexample,let’slookatthefollowingcommand:

curl-XGET'localhost:9200/posts/_analyze?pretty&field=name'-d'robots

cars'

www.EBooksWorld.ir

Page 144: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThecommandasksElasticsearchtoshowthecontentoftheanalysisofthegivenphrase(robotscars)withtheuseoftheanalyzerdefinedfortheposttypeanditsnamefield.TheresponsethatwewillgetfromElasticsearchisasfollows:

{

"tokens":[{

"token":"robot",

"start_offset":0,

"end_offset":6,

"type":"<ALPHANUM>",

"position":0

},{

"token":"car",

"start_offset":7,

"end_offset":11,

"type":"<ALPHANUM>",

"position":1

}]

}

Asyoucansee,therobotscarsphrasewasdividedintotwotokens.Inadditiontothat,therobotswordwaschangedtorobotandthecarswordwaschangedtocar.

DefaultanalyzersThereisonemorethingtosayaboutanalyzers.Elasticsearchallowsustospecifytheanalyzerthatshouldbeusedbydefaultifnoanalyzerisdefined.Thisisdoneinthesamewayasweconfiguredacustomanalyzerinthesettingssectionofthemappingsfile,butinsteadofspecifyingacustomnamefortheanalyzer,adefaultkeywordshouldbeused.Sotomakeourpreviouslydefinedanalyzerthedefault,wecanchangetheenanalyzertothefollowing:

{

"settings":{

"index":{

"analysis":{

"analyzer":{

"default":{

"tokenizer":"standard",

"filter":[

"asciifolding",

"lowercase",

"ourEnglishFilter"

]

}

},

"filter":{

"ourEnglishFilter":{

"type":"kstem"

}

}

}

}

}

www.EBooksWorld.ir

Page 145: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

Wecanalsochooseadifferentdefaultanalyzerforsearchingandadifferentoneforindexing.Ifwewouldliketodothatinsteadofusingthedefaultkeywordfortheanalyzername,weshouldusedefault_searchanddefault_indexrespectively.

www.EBooksWorld.ir

Page 146: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DifferentsimilaritymodelsWiththereleaseofApacheLucene4.0in2012,alltheusersofthisgreatfulltextsearchlibraryweregiventheopportunitytoalterthedefaultTF/IDF-basedalgorithmanduseadifferentone(we’vementioneditintheFulltextsearchingsectionofChapter1,GettingStartedwithElasticsearchCluster).BecauseofthatweareabletochooseasimilaritymodelinElasticsearch,whichbasicallyallowsustousedifferentscoringformulasforourdocuments.

NoteNotethatthesimilaritymodelstopicrangesfromintermediatetoadvancedandinmostcasestheTF/IDFbasedalgorithmwillbesufficientforyourusecase.However,wedecidedtohaveitdescribedinthebook,soyouknowthatyouhavethepossibilityofchangingthescoringalgorithmbehaviorifneeded.

Settingper-fieldsimilaritySinceElasticsearch0.90,weareallowedtosetadifferentsimilarityforeachofthefieldsthatwehaveinourmappingsfile.Forexample,let’sassumethatwehavethefollowingsimplemappingsthatweuseinordertoindextheblogposts:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"contents":{"type":"string"}

}

}

}

}

Todothis,wewillusetheBM25similaritymodelforthenamefieldandthecontentsfield.Inordertodothat,weneedtoextendourfielddefinitionsandaddthesimilaritypropertywiththevalueofthechosensimilarityname.Ourchangedmappingswilllooklikethefollowing:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string","similarity":"BM25"},

"contents":{"type":"string","similarity":"BM25"}

}

}

}

}

Andthat’sall,nothingmoreisneeded.Aftertheabovechange,ApacheLucenewillusetheBM25similaritytocalculatethescorefactorforthenameandthecontentsfields.

www.EBooksWorld.ir

Page 147: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AvailablesimilaritymodelsThereareatleastfivenewsimilaritymodelsavailable.Formostoftheusecases,apartfromthedefaultone,youmayfindthefollowingmodelsuseful:

OkapiBM25model:Thissimilaritymodelisbasedonaprobabilisticmodelthatestimatestheprobabilityoffindingadocumentforagivenquery.InordertousethissimilarityinElasticsearch,youneedtousetheBM25name.OkapiBM25similarityissaidperformbestwhendealingwithshorttextdocumentswheretermrepetitionsareespeciallyhurtfultotheoveralldocumentscore.Tousethissimilarity,oneneedstosetthesimilaritypropertyforafieldtoBM25.Thissimilarityisdefinedoutoftheboxanddoesn’tneedadditionalpropertiestobeset.Divergencefromrandomnessmodel:Thissimilaritymodelisbasedontheprobabilisticmodelofthesamename.InordertousethissimilarityinElasticsearch,youneedtousetheDFRname.Itissaidthatthedivergencefromrandomnesssimilaritymodelperformswellontextthatissimilartonaturallanguage.Information-basedmodel:Thisisthelastmodelofthenewlyintroducedsimilaritymodelsandisverysimilartothedivergencefromrandomnessmodel.InordertousethissimilarityinElasticsearch,youneedtousetheIBname.SimilartotheDFRsimilarity,itissaidthattheinformation-basedmodelperformswellondatasimilartonaturallanguagetext.

ThetwoothersimilaritymodelscurrentlyavailableareLMDirichletsimilarity(touseit,setthetypepropertytoLMDirichlet)andLMJelinekMercersimilarity(touseit,setthetypepropertytoLMJelinekMercer).YoucanfindmoreaboutthesesimilaritymodelsinApacheLuceneJavadocs,MasteringElasticsearchSecondEdition,publishedbyPacktPublishingorinofficialdocumentationofElasticsearchavailableathttps://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html.

Configuringdefaultsimilarity

Thedefaultsimilarityallowsustoprovideanadditionaldiscount_overlapsproperty.Itallowsustocontrolifthetokensonthesamepositionsinthetokenstream(withpositionincrementof0)areomittedduringscorecalculation.Bydefault,itissettotrue,whichmeansthatthetokensonthesamepositionsareomitted;ifyouwantthemtobecounted,youcansetthatpropertytofalse.Forexample,thefollowingcommandshowshowtocreateanindexwiththediscount_overlapspropertychangedforthedefaultsimilarity:

curl-XPUT'localhost:9200/test_similarity'-d'{

"settings":{

"similarity":{

"altered_default":{

"type":"default",

"discount_overlaps":false

}

}

},

"mappings":{

"doc":{

www.EBooksWorld.ir

Page 148: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"properties":{

"name":{"type":"string","similarity":"altered_default"}

}

}

}

}'

ConfiguringBM25similarity

Eventhoughwedon’tneedtoconfiguretheBM25similarity,wecanprovidesomeadditionaloptionstotuneitsbehavior.TheBM25similarityallowsustoprovidethediscount_overlapspropertysimilartothedefaultsimilarityandtwoadditionalproperties:k1andb.Thek1propertyspecifiesthetermfrequencynormalizationfactorandthebpropertyvaluedeterminestowhatdegreethedocumentlengthwillnormalizethetermfrequencyvalues.

ConfiguringDFRsimilarity

IncaseoftheDFRsimilarity,wecanconfigurethebasic_modelproperty(whichcantakethevaluebe,d,g,if,in,p,orine),theafter_effectproperty(withvaluesofno,b,orl),andthenormalizationproperty(whichcanbeno,h1,h2,h3,orz).Ifwechooseanormalizationvalueotherthanno,weneedtosetthenormalizationfactor.

Dependingonthechosennormalizationvalue,weshouldusenormalization.h1.c(thefloatvalue)forh1normalization,normalization.h2.c(thefloatvalue)forh2normalization,normalization.h3.c(thefloatvalue)forh3normalization,andnormalization.z.z(thefloatvalue)forznormalization.Forexample,thefollowingishowtheexamplesimilarityconfigurationwilllook(weputthisintothesettingssectionofourmappingsfile):

"similarity":{

"esserverbook_dfr_similarity":{

"type":"DFR",

"basic_model":"g",

"after_effect":"l",

"normalization":"h2",

"normalization.h2.c":"2.0"

}

}

ConfiguringIBsimilarity

IncaseofIBsimilarity,wehavethefollowingparametersthroughwhichwecanconfigurethedistributionproperty(whichcantakethevalueofllorspl)andthelambdaproperty(whichcantakethevalueofdfortff).Inadditiontothat,wecanchoosethenormalizationfactor,whichisthesameasfortheDFRsimilarity,sowe’llomitdescribingitasecondtime.ThefollowingishowtheexampleIBsimilarityconfigurationwilllook(weputthisintothesettingssectionofourmappingsfile):

"similarity":{

"esserverbook_ib_similarity":{

"type":"IB",

"distribution":"ll",

www.EBooksWorld.ir

Page 149: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"lambda":"df",

"normalization":"z",

"normalization.z.z":"0.25"

}

}

www.EBooksWorld.ir

Page 150: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 151: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

BatchindexingtospeedupyourindexingprocessInChapter1,GettingStartedwithElasticsearchCluster,wesawhowtoindexaparticulardocumentintoElasticsearch.ItrequiredopeninganHTTPconnection,sendingthedocument,andclosingtheconnection.Ofcourse,wewerenotresponsibleformostofthatasweusedthecurlcommand,butinthebackgroundthisiswhathappened.However,sendingthedocumentsonebyoneisnotefficient.Becauseofthat,itisnowtimetofindouthowtoindexalargenumberofdocumentsinamoreconvenientandefficientwaythandoingsoonebyone.

www.EBooksWorld.ir

Page 152: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PreparingdataforbulkindexingElasticsearchallowsustomergemanyrequestsintoonepackage.Thispackagecanbesentasasinglerequest.What’smore,wearenotlimitedtohavingasingletypeofrequestinthesocalledbulk–wecanmixdifferenttypesofoperationstogether,whichinclude:

Addingorreplacingtheexistingdocumentsintheindex(index)Removingdocumentsfromtheindex(delete)

Addingnewdocumentsintotheindexwhenthereisnootherdefinitionofthedocumentintheindex(create)Modifyingthedocumentsorcreatingnewonesifthedocumentdoesn’texist(update)

Theformatoftherequestwaschosenforprocessingefficiency.ItassumesthateverylineoftherequestcontainsaJSONobjectwiththedescriptionoftheoperationfollowedbythesecondlinewithadocument–anotherJSONobjectitself.Wecantreatthefirstlineasakindofinformationlineandthesecondasthedataline.Theexceptiontothisruleisthedeleteoperation,whichcontainsonlytheinformationline,becausethedocumentisnotneeded.Let’slookatthefollowingexample:

{"index":{"_index":"addr","_type":"contact","_id":1}}

{"name":"FyodorDostoevsky","country":"RU"}

{"create":{"_index":"addr","_type":"contact","_id":2}}

{"name":"ErichMariaRemarque","country":"DE"}

{"create":{"_index":"addr","_type":"contact","_id":2}}

{"name":"JosephHeller","country":"US"}

{"delete":{"_index":"addr","_type":"contact","_id":4}}

{"delete":{"_index":"addr","_type":"contact","_id":1}}

Itisveryimportantthateverydocumentoractiondescriptionisplacedinoneline(endedbyanewlinecharacter).Thismeansthatthedocumentcannotbepretty-printed.Thereisadefaultlimitationonthesizeofthebulkindexingfile,whichissetto100megabytesandcanbechangedbyspecifyingthehttp.max_content_lengthpropertyintheElasticsearchconfigurationfile.Thisletsusavoidissueswithpossiblerequesttimeoutsandmemoryproblemswhendealingwithrequeststhataretoolarge.

NoteNotethatwithasinglebatchindexingfile,wecanloadthedataintomanyindicesanddocumentsinthebulkrequestcanhavedifferenttypes.

www.EBooksWorld.ir

Page 153: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexingthedataInordertoexecutethebulkrequest,Elasticsearchprovidesthe_bulkendpoint.Thiscanbeusedas/_bulkorwithanindexnameas/index_name/_bulkorevenwithatypeandindexnameas/index_name/type_name/_bulk.Thesecondandthirdformsdefinethedefaultvaluesfortheindexnameandthetypename.WecanomitthesepropertiesintheinformationlineofourrequestandElasticsearchwillusethedefaultvaluesfromtheURI.ItisalsoworthknowingthatthedefaultURIvaluescanbeoverwrittenbythevaluesintheinformationlines.

Assumingwe’vestoredourdatainthedocuments.jsonfile,wecanrunthefollowingcommandtosendthisdatatoElasticsearch:

curl-XPOST'localhost:9200/_bulk?pretty'[email protected]

The?prettyparameterisofcoursenotnecessary.We’veusedthisparameteronlyfortheeaseofanalyzingtheresponseoftheprecedingcommand.Whatisimportant,inthiscase,isusingcurlwiththe--data-binaryparameterinsteadofusing–d.Thisisbecausethestandard–dparameterignoresnewlinecharacters,which,aswesaidearlier,areimportantforparsingthebulkrequestcontentbyElasticsearch.Nowlet’slookattheresponsereturnedbyElasticsearch:

{

"took":469,

"errors":true,

"items":[{

"index":{

"_index":"addr",

"_type":"contact",

"_id":"1",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":201

}

},{

"create":{

"_index":"addr",

"_type":"contact",

"_id":"2",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":201

}

},{

"create":{

www.EBooksWorld.ir

Page 154: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_index":"addr",

"_type":"contact",

"_id":"2",

"status":409,

"error":{

"type":"document_already_exists_exception",

"reason":"[contact][2]:documentalreadyexists",

"shard":"2",

"index":"addr"

}

}

},{

"delete":{

"_index":"addr",

"_type":"contact",

"_id":"4",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":404,

"found":false

}

},{

"delete":{

"_index":"addr",

"_type":"contact",

"_id":"1",

"_version":2,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":200,

"found":true

}

}]

}

Aswecansee,everyresultisapartoftheitemsarray.Let’sbrieflycomparetheseresultswithourinputdata.Thefirsttwocommands,namedindexandcreate,wereexecutedwithoutanyproblems.Thethirdoperationfailedbecausewewantedtocreatearecordwithanidentifierthatalreadyexistedintheindex.Thenexttwooperationsweredeletions.Bothsucceeded.Notethatthefirstofthemtriedtodeleteanonexistentdocument;asyoucansee,thiswasn’taproblemforElasticsearch–thethingworthnotingthoughisthatforthenonexistingdocumentwesawastatusof404,whichintheHTTPresponsecodemeansnotfound(http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html).Asyoucansee,Elasticsearchreturnsinformationabouteachoperation,soforlargebulkrequeststheresponsecanbemassive.

www.EBooksWorld.ir

Page 155: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

The_allfieldThe_allfieldisusedbyElasticsearchtostoredatafromalltheotherfieldsinasinglefieldforeaseofsearching.Thiskindoffieldmaybeusefulwhenwewanttoimplementasimplesearchfeatureandwewanttosearchallthedata(oronlythefieldswecopytothe_allfield),butwedon’twanttothinkaboutthefieldnamesandthingslikethat.Bydefault,the_allfieldisenabledandcontainsallthedatafromallthefieldsfromthedocument.However,thisfieldmakestheindexabitbiggerandthatisnotalwaysneeded.

Forexample,whenyouinputasearchphraseintoasearchboxinthelibrarycatalogsite,youexpectthatyoucansearchusingtheauthor’sname,theISBNnumber,andthewordsthatthebooktitlecontains,butsearchingforthenumberofpagesorthecovertypeusuallydoesnotmakesense.Wecaneitherdisablethe_allfieldcompletelyorexcludethecopyingofcertainfieldstoit.Inordernottoincludeacertainfieldinthe_allfield,weusetheinclude_in_allproperty,whichwasdiscussedearlierinthischapter.Tocompletelyturnoffthe_allfieldfunctionality,wemodifyourmappingsfileasfollows:

{

"book":{

"_all":{

"enabled":false

},

"properties":{

...

}

}

}

Inadditiontotheenabledproperty,the_allfieldsupportsthefollowingones:

store

term_vector

analyzer

Forinformationabouttheprecedingproperties,refertotheMappingsconfigurationsectioninthischapter.

www.EBooksWorld.ir

Page 156: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

The_sourcefieldThe_sourcefieldallowsustostoretheoriginalJSONdocumentthatwassenttoElasticsearchduringindexation.Bydefault,the_sourcefieldisturnedonassomeoftheElasticsearchfunctionalitiesdependonit(forexample,thepartialupdatefeature).Inadditiontothat,the_sourcefieldcanbeusedasthesourceofdataforthehighlightingfunctionalityifafieldisnotstored.However,ifwedon’tneedsuchafunctionality,wecandisablethe_sourcefieldasitcausessomestorageoverhead.Inordertodothat,weneedtosetthe_sourceobject’senabledpropertytofalse,asfollows:

{

"book":{ "_source":{

"enabled":false

},

"properties":{

...

}

}

}

WecanalsotellElasticsearchwhichfieldswewanttoexcludefromthe_sourcefieldandwhichfieldswewanttoinclude.Wedothatbyaddingtheincludesandexcludespropertiestothe_sourcefielddefinition.Forexample,ifwewanttoexcludeallthefieldsintheauthorpathfromthe_sourcefield,ourmappingswilllookasfollows:

{

"book":{

"_source":{

"excludes":["author.*"]

},

"properties":{

...

}

}

}

www.EBooksWorld.ir

Page 157: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AdditionalinternalfieldsThereareadditionalfieldsthatareinternallyusedbyElasticsearch,butwhichwecan’tconfigure.Thosefieldsare:

_id:Thisfieldisusedtoholdtheidentifierofthedocumentinsidetheindexandtype_uid:Thisfieldisusedtoholdtheuniqueidentifierofthedocumentintheindexandisbuiltof_idand_type(thisallowstohavedocumentswiththesameidentifierwithdifferenttypesinsidethesameindex)_type:Thisfieldisthetypenameforthedocument_field_names:Thisfieldisthelistoffieldsexistinginthedocument

www.EBooksWorld.ir

Page 158: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 159: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IntroductiontosegmentmergingIntheFulltextsearchingsectionofChapter1,GettingStartedwithElasticsearchCluster,wementionedsegmentsandtheirimmutability.WewrotethattheLucenelibrary,andthusElasticsearch,writesdatatocertainstructuresthatarewrittenonceandneverchange.Thisallowsforsomesimplification,butalsointroducestheneedforadditionalwork.Onesuchexampleisdeletion.Becausesegment,cannotbealtered,informationaboutdeletionsmustbestoredalongsideanddynamicallyappliedduringsearch.Thisisdonebyfilteringdeleteddocumentsfromthereturnedresultset.Theotherexampleistheinabilitytomodifythedocuments(however,somemodificationsarepossible,suchasmodifyingnumericdocvalues).Ofcourse,onecansaythatElasticsearchsupportsdocumentupdates(refertotheManipulatingdatawiththeRESTAPIsectionofChapter1,GettingStartedwithElasticsearchCluster).However,underthehood,theolddocumentismarkedasdeletedandtheonewiththeupdatedcontentsisindexed.

Astimepassesandyoucontinuetoindexordeleteyourdata,moreandmoresegmentsarecreated.Dependingonhowoftenyoumodifytheindex,Lucenecreatessegmentswithvariousnumbersofdocuments-thus,segmentshavedifferentsizes.Becauseofthat,thesearchperformancemaybelowerandyourindexmaybelargerthanitshouldbe–itstillcontainsthedeleteddocuments.Theequationissimple-themoresegmentsyourindexhas,theslowerthesearchspeedis.Thisiswhensegmentmergingcomesintoplay.Wedon’twanttodescribethisprocessindetail;inthecurrentElasticsearchversion,thispartoftheenginewassimplifiedbutitisstillaratheradvancedtopic.Wedecidedtomentionmergingbecausewethinkthatitishandytoknowwheretolookforthecauseoftroublesconnectedwithtoomanyopenfiles,suspiciousCPUusage,expandingindices,orsearchingandindexingspeeddegradingwithtime.

www.EBooksWorld.ir

Page 160: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SegmentmergingSegmentmergingistheprocessduringwhichtheunderlyingLucenelibrarytakesseveralsegmentsandcreatesanewsegmentbasedontheinformationfoundinthem.Theresultingsegmenthasallthedocumentsstoredintheoriginalsegmentsexcepttheonesthatweremarkedfordeletion.Afterthemergeoperation,thesourcesegmentsaredeletedfromthedisk.BecausesegmentmergingisrathercostlyintermsofCPUandI/Ousage,itiscrucialtoappropriatelycontrolwhenandhowoftenthisprocessisinvoked.

www.EBooksWorld.ir

Page 161: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheneedforsegmentmergingYoumayaskyourselfwhyyouhavetobotherwithsegmentmerging.Firstofall,themoresegmentstheindexisbuiltfrom,theslowerthesearchwillbeandthemorememoryLucenewilluse.Thesecondisthediskspaceandresources,suchasfiledescriptors,usedbytheindex.Ifyoudeletemanydocumentsfromyourindexthen,untilthemergehappens,thosedocumentsareonlymarkedasdeletedandnotdeletedphysically.So,itmayhappenthatmostofthedocumentsthatuseourCPUandmemorydon’texist!Fortunately,Elasticsearchusesreasonabledefaultsforsegmentmerginganditisveryprobablethatnochangesarenecessary.

www.EBooksWorld.ir

Page 162: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThemergepolicyThemergepolicydefineswhenthemergingprocessshouldbeperformed.Elasticsearchmergessegmentsofapproximatelysimilarsizes,takingintoaccountthemaximumnumberofsegmentsallowedpertier.Thealgorithmofmergingcanfindsegmentswiththelowestcostofmergeandthemostimpactontheresultingsegment.

Thebasicpropertiesofthetieredmergepolicyareasfollows:

index.merge.policy.expunge_deletes_allowed:ThispropertytellsElasticsearchtomergesegmentswithpercentageofthedeleteddocumentshigherthanthisvalue,defaultsto10.index.merge.policy.floor_segment:Thispropertydefaultsto2mbandtellsElasticsearchtotreatsmallersegmentsasoneswithsizeequaltothevalueofthisproperty.Itpreventsflushingoftinysegmentstoavoidtheirhighnumber.index.merge.policy.max_merge_at_once:Inthisproperty,themaximumnumberofsegmentstobemergedatoncedefaultsto10.index.merge.policy.max_merge_at_once_explicit:Inthisproperty,themaximumnumberofsegmentsmergedatonceduringexpungedeletesoroptimizeoperationsdefaultsto10.index.merge.policy.max_merged_segment:Inthisproperty,themaximumsizeofsegmentthatcanbeproducedduringnormalmergingdefaultsto5gb.index.merge.policy.segments_per_tier:Thispropertydefaultsto10androughlydefinesthenumberofsegments.Smallervaluesmeanmoremergingbutfewersegments,whichresultsinhighersearchspeedbutlowerindexingspeedandmoreI/Opressure.Highervaluesofthepropertywillresultinhighersegmentscount,thusslowersearchspeedbuthigherindexingspeed.index.merge.policy.reclaim_deletes_weight–ThispropertytellsElasticsearchhowimportantitistochoosesegmentswithmanydeleteddocuments.Itdefaultsto2.0.

Forexample,toupdatemergepolicysettingsofalreadycreatedindexwecouldrunacommandlikethis:

curl-XPUT'localhost:9200/essb/_settings'-d'{

"index.merge.policy.max_merged_segment":"10gb"

}'

Togetdeeperintosegmentmerging,refertoourbookMasteringElasticsearchSecondEdition,publishedbyPacktPublishing.Youcanalsofindmoreinformationaboutthetieredmergepolicyathttps://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html.

NoteUptothe2.0versionofElasticsearch,wewereabletochoosebetweenthreemergepolicies:tiered,log_byte_size,andlog_doc.Thecurrentlyusedmergepolicyisbased

www.EBooksWorld.ir

Page 163: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

onthetieredmergepolicyandweareforcedtouseit.

www.EBooksWorld.ir

Page 164: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThemergeschedulerThemergeschedulertellsElasticsearchhowthemergeprocessshouldoccur.Thecurrentimplementationisbasedonaconcurrentmergeschedulerthatisstartedinaseparatethreadandusesthedefinednumberofthreadsdoingmergesinparallel.Elasticsearchallowsyoutosetthenumberofthreadsthatcanbeusedforsimultaneousmergingbyusingtheindex.merge.scheduler.max_thread_countproperty.

www.EBooksWorld.ir

Page 165: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThrottlingAswehavealreadymentioned,mergingmaybeexpensivewhenitcomestoserverresources.Themergeprocessusuallyworksinparalleltootheroperations,sotheoreticallyitshouldn’thavetoomuchinfluence.Inpractice,thenumberofdiskinput/outputoperationscanbesolargeastosignificantlyaffecttheoverallperformance.Insuchcases,throttlingissomethingthatmayhelp.Infact,thisfeaturecanbeusedforlimitingthespeedofthemerge,butitmayalsobeusedforalltheoperationsusingthedatastore.ThrottlingcanbesetintheElasticsearchconfigurationfile(theelasticsearch.ymlfile)ordynamicallybyusingthesettingsAPI(refertotheTheupdatesettingsAPIsectionofChapter9,ElasticsearchCluster,fordetail).Therearetwosettingsthatadjustthrottling:typeandvalue.

Tosetthethrottlingtype,settheindices.store.throttle.typeproperty,whichallowsustousethefollowingvalues:

none:Thisvaluedefinesthatnothrottlingisonmerge:Thisvaluedefinesthatthrottlingaffectsonlythemergeprocessall:Thisvaluedefinesthatthrottlingisusedforallthedatastoreactivities

Thesecondproperty,indices.store.throttle.max_bytes_per_sec,describeshowmuchthethrottlinglimitstheI/Ooperations.Asitsnamesuggests,ittellsushowmanybytescanbeprocessedpersecond.Forexample,let’slookatthefollowingconfiguration:

indices.store.throttle.type:merge

indices.store.throttle.max_bytes_per_sec:10mb

Inthisexample,welimitthemergeoperationsto10megabytespersecond.Bydefault,Elasticsearchusesthemergethrottlingtypewiththemax_bytes_per_secpropertysetto20mb.Thismeansthatallthemergeoperationsarelimitedto20megabytespersecond.

www.EBooksWorld.ir

Page 166: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 167: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IntroductiontoroutingBydefault,Elasticsearchwilltrytodistributeyourdocumentsevenlyamongalltheshardsoftheindex.However,that’snotalwaysthedesiredsituation.Inordertoretrievethedocuments,Elasticsearchmustqueryalltheshardsandmergetheresults.Whatifwecoulddivideourdataonsomebasis(forexample,theclientidentifier)andusethatinformationtoputdatawiththesamepropertiesinthesameplaceinthecluster.Elasticsearchallowsustodothatbyexposingapowerfuldocumentandquerydistributioncontrolmechanismrouting.Inshort,itallowsustochooseashardtobeusedtoindexorsearchthedata.

www.EBooksWorld.ir

Page 168: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DefaultindexingDuringindexingoperations,whenyousendadocumentforindexing,Elasticsearchlooksatitsidentifiertochoosetheshardinwhichthedocumentshouldbeindexed.Bydefault,Elasticsearchcalculatesthehashvalueofthedocument’sidentifierand,onthebasisofthat,itputsthedocumentinoneoftheavailableprimaryshards.Then,thosedocumentsareredistributedtothereplicas.Thefollowingdiagramshowsasimpleillustrationofhowindexingworksbydefault:

www.EBooksWorld.ir

Page 169: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DefaultsearchingSearchingisabitdifferentfromindexing,becauseinmostsituationsyouneedtoqueryalltheshardstogetthedatayouareinterestedin(wewilltalkaboutthatinChapter3,SearchingYourData),atleastintheinitialscatterphaseofthequery.Imagineasituationwhenyouhavethefollowingmappingsdescribingyourindex:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"contents":{"type":"string"},

"userId":{"type":"long"}

}}

}}

Asyoucansee,ourindexconsistsoffourfields:theidentifier(theidfield),nameofthedocument(thenamefield),contentsofthedocument(thecontentsfield),andtheidentifieroftheusertowhichthedocumentsbelong(theuserIdfield).Togetallthedocumentsforaparticularuser,onewithuserIdequalto12,youcanrunthefollowingquery:

curl–XGET'http://localhost:9200/posts/_search?q=userId:12'

Dependingonthesearchtype(wewilltalkmoreaboutitinChapter3,SearchingYourData),Elasticsearchwillrunyourquery.Itusuallymeansthatitwillfirstqueryallthenodesfortheidentifiersandscoreofthematchingdocumentsandthenitwillsendaninternalqueryagain,butonlytotherelevantshards(theonescontainingtheneededdocuments)togetthedocumentsneededtobuildtheresponse.

Averysimplifiedviewofhowthedefaultsearchingworksduringitsinitialphaseisshowninthefollowingillustration:

www.EBooksWorld.ir

Page 170: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Whatifwecouldputallthedocumentsforasingleuserintoasingleshardandqueryonthatshard?Wouldn’tthatbewiseforperformance?Yes,thatishandyandthatiswhatroutingallowsyoudoto.

www.EBooksWorld.ir

Page 171: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RoutingRoutingcancontrolwhichshardyourdocumentsandquerieswillbeforwardedto.Bynow,youwillprobablyhaveguessedthatwecanspecifytheroutingvaluebothduringindexingandduringqueryingand,infact,ifyoudecidetospecifyexplicitroutingvalues,you’llprobablywanttodothatduringindexingandsearching.

Inourcase,wewillusetheuserIdvaluetosetroutingduringindexingandthesamevaluewillbeusedduringsearching.Becausewewillusethesameroutingvalueforallthedocumentsforasingleuser,thesamehashvaluewillbecalculatedandthusallthedocumentsforthatparticularuserwillbeplacedinthesameshard.Usingthesamevalueduringsearchwillresultinsearchingasingleshardinsteadofthewholeindex.

Thereisonethingyoushouldrememberwhenusingroutingwhensearching.Whensearching,youshouldaddaquerypartthatwilllimitthereturneddocumentstotheonesforthegivenuser.Routingisnotenough.Thisisbecauseyou’llprobablyhavemoredistinctroutingvaluesthanthenumberofshardsyourindexwillbebuiltwith.Forexample,youcanhave10shardsbuildingyourindex,butatthesametimehavehundredsofusers.Itisphysicallyimpossibletodedicateasingleshardtoonlyasingleuser.Itisusuallynotgoodfromascalingpointforviewaswell.Becauseofthat,afewdistinctvaluescanpointtothesameshard–inourcasedataofafewuserswillbeplacedinthesameshard.Becauseofthat,weneedaquerypartthatwilllimitthedatatoaparticularuseridentifier,suchasatermquery.

Thefollowingdiagramshowsaverysimpleillustrationofhowsearchingworkswithaprovidedcustomroutingvalue:

www.EBooksWorld.ir

Page 172: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Asyoucansee,Elasticsearchwillsendourquerytoasingleshard.Nowlet’slookathowwecanspecifytheroutingvalues.

www.EBooksWorld.ir

Page 173: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheroutingparametersTheideaisverysimple.TheendpointusedforalltheoperationsconnectedwithfetchingorstoringdocumentsinElasticsearchallowsustouseadditionalparametercalledrouting.YoucanaddittoyourHTTPorsetitbyusingtheclientlibraryofyourchoice.

So,inordertoindexasampledocumenttothepreviouslyshownindex,wewillusethefollowingcommand:

curl-XPUT'http://localhost:9200/posts/post/1?routing=12'-d'{

"id":"1",

"name":"Testdocument",

"contents":"Testdocument",

"userId":"12"

}'

Ifwenowgetbacktoourpreviousqueryfetchingouruser’sdataandwemodifyittouserouting,itwouldlookasfollows:

curl-XGET'http://localhost:9200/posts/_search?routing=12&q=userId:12'

Asyoucansee,thesameroutingvaluewasusedduringindexingandquerying.Thisispossibleinmostcaseswhenroutingisused.Weknowwhichuserdataweareindexingandwewillprobablyknowwhichuserissearchingforthedata.Inourcase,ourimaginaryuserwasgiventheidentifierof12andweusedthatvalueduringindexingandsearching.

Notethatduringsearchingyoucanspecifymultipleroutingvaluesseparatedbycommas.Forexample,ifwewanttheprecedingquerytobeadditionallyroutedbythevalueofthesectionparameter(ifitexisted)andwealsowanttofilterbythisparameter,ourquerywilllooklikethefollowing:

curl-XGET'http://localhost:9200/posts/_search?

routing=12,6654&q=userId:12+AND+section:6654'

Ofcourse,theprecedingcommandcanmatchmultipleshardsnowasthevaluesgiventoroutingcanpointtomultipleshards.Becauseofthatyouneedtoprovideonlyasingleroutingvalueduringindexation(Elasticsearchneedstobepointedtoasingleshardorindexationwillfail).Youcanofcoursequerymultipleshardsatthesametimeandbecauseofthatmultipleroutingvaluescanbeprovidedduringsearching.

NoteRememberthatroutingisnottheonlythingthatisrequiredtogetresultsforagivenuser.That’sbecauseusuallywehavefewshardsthathaveuniqueroutingvalues.Thismeansthatwewillhavedatafrommultipleusersinasingleshard.So,whenusingrouting,youshouldalsonarrowdownyourresultstotheonesforagivenuser.You’lllearnmoreabouthowyoucandothatinChapter3,SearchingYourData.

www.EBooksWorld.ir

Page 174: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RoutingfieldsSpecifyingtheroutingvaluewitheachrequestiscriticalwhenusinganindexoperation.Withoutit,Elasticsearchusesthedefaultwayofdeterminingwherethedocumentshouldbestored–itusesthehashvalueofthedocumentidentifier.Thismayleadtoasituationwhereonedocumentexistsinmanyversionsondifferentshards.Asimilarsituationmayoccurwhenfetchingthedocument.Whenadocumentisstoredwithagivenroutingvalue,wemayhitthewrongshardandthedocumentmaybenotfound.

Infact,Elasticsearchallowsustochangethedefaultbehaviorandforcesustouseroutingwhenqueryingagivenindex.Todothat,weneedtoaddthefollowingsectiontoourtypedefinition:

"_routing":{

"required":true

}

Theprecedingdefinitionmeansthattheroutingvalueneedstobeprovided(the"required":trueproperty);withoutit,anindexrequestwillfail.

www.EBooksWorld.ir

Page 175: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 176: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryInthischapter,we’velearnedalotwhenitcomestoindexationanddatahandlinginElasticsearch.WestartedwithbasicinformationaboutElasticsearchandweproceededtotuningtheschema-lessbehaviorinElasticsearch.Welearnedhowtoconfigureourmappings,useoutoftheboxlanguageanalysiscapabilitiesofElasticsearch,andcreateourownmappings.Welookedatbatchindexingtospeedupindexationandweaddedadditionalinternalinformationtothedocumentsinourindices.Finally,welookedatsegmentmergingandrouting.

Inthenextchapter,wewillfullyconcentrateonsearchingandtheextensivequerylanguageofElasticsearch.WewillstartwithhowtoqueryElasticsearchandhowtheElasticsearchqueryprocessworks.Wewilllearnaboutallthebasicqueriesandcompoundqueriestobeabletousetheminourapplications.Finally,wewillseewhichqueryshouldbechosenforthegivenusecase.

www.EBooksWorld.ir

Page 177: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 178: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter3.SearchingYourDataInthepreviouschapter,wedivedintoElasticsearchindexing.Welearnedalotwhenitcomestodatahandling.WesawhowtotuneElasticsearchschema-lessmechanismandwenowknowhowtocreateourownmappings.WealsosawthecoretypesofElasticsearchandweusedanalyzers–boththeonethatcomesoutoftheboxwithElasticsearchandtheonewedefinedourselves.Weusedbulkindexingandweaddedadditionalinternalinformationtoourindices.Finally,welearnedwhatsegmentmergingis,howwecanfinetuneit,andhowtouseroutinginElasticsearchandwhatitgivesus.Thischapterisfullydedicatedtoquerying.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

HowtoqueryElasticsearchWhathappensinternallywhenqueriesarerunWhatarethebasicqueriesinElasticsearchWhatarethecompoundqueriesinElasticsearchthatallowustogroupotherqueriesHowtousepositionawarequeries–spanqueriesHowtochoosetherightqueryforthejob

www.EBooksWorld.ir

Page 179: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryingElasticsearchSofar,whenwehavesearchedourdata,weusedtheRESTAPIandasimplequeryortheGETrequest.Similarly,whenwewerechangingtheindex,wealsousedtheRESTAPIandsenttheJSON-structureddatatoElasticsearch.Regardlessofthetypeofoperationwewantedtoperform,whetheritwasamappingchangeordocumentindexation,weusedJSONstructuredrequestbodytoinformElasticsearchabouttheoperationdetails.

AsimilarsituationhappenswhenwewanttosendmorethanasimplequerytoElasticsearch,westructureitusingtheJSONobjectsandsendittoElasticsearchintherequestbody.ThisiscalledthequeryDSL.Inabroaderview,Elasticsearchsupportstwokindsofqueries:basiconesandcompoundones.Basicqueries,suchasthetermquery,areusedforqueryingtheactualdata.WewillcovertheseintheBasicqueriessectionofthischapter.Thesecondtypeofqueryisthecompoundquery,suchastheboolquery,whichcancombinemultiplequeries.WewillcovertheseintheCompoundqueriessectionofthischapter.

However,thisisnotthewholepicture.Inadditiontothesetwotypesofqueries,certainqueriescanhavefiltersthatareusedtonarrowdownyourresultswithcertaincriteria.Filterqueriesdon’taffectscoringandareusuallyveryefficientandeasilycached.

Tomakeitevenmorecomplicated,queriescancontainotherqueries(don’tworry;wewilltrytoexplainallthis!).Furthermore,somequeriescancontainfiltersandotherscancontainbothqueriesandfilters.Althoughthisisnoteverything,wewillstickwiththisworkingexplanationfornow.WewillgooverthisingreaterdetailintheCompoundqueriessectioninthischapterandtheFilteringyourresultssectioninChapter4,ExtendingYourQueryingKnowledge.

www.EBooksWorld.ir

Page 180: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheexampledataIfnotstatedotherwise,thefollowingmappingswillbeusedfortherestofthechapter:

{

"book":{

"properties":{

"author":{

"type":"string"

},

"characters":{

"type":"string"

},

"copies":{

"type":"long",

"ignore_malformed":false

},

"otitle":{

"type":"string"

},

"tags":{

"type":"string",

"index":"not_analyzed"

},

"title":{

"type":"string"

},

"year":{

"type":"long",

"ignore_malformed":false,

"index":"analyzed"

},

"available":{

"type":"boolean"

}

}

}

}

Theprecedingmappingsrepresentasimplelibraryandwereusedtocreatethelibraryindex.OnethingtorememberisthatElasticsearchwillanalyzethestringbasedfieldsifwedon’tconfigureitdifferently.

Theprecedingmappingswerestoredinthemapping.jsonfileand,inordertocreatethementionedlibraryindex,wecanusethefollowingcommands:

curl-XPOST'localhost:9200/library'

curl-XPUT'localhost:9200/library/book/_mapping'[email protected]

Wealsousedthefollowingsampledataastheexampleonesforthischapter:

{"index":{"_index":"library","_type":"book","_id":"1"}}

{"title":"AllQuietontheWesternFront","otitle":"ImWestennichts

Neues","author":"ErichMariaRemarque","year":1929,"characters":["Paul

Bäumer","AlbertKropp","HaieWesthus","FredrichMüller","Stanislaus

www.EBooksWorld.ir

Page 181: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Katczinsky","Tjaden"],"tags":["novel"],"copies":1,"available":true,

"section":3}

{"index":{"_index":"library","_type":"book","_id":"2"}}

{"title":"Catch-22","author":"JosephHeller","year":1961,"characters":

["JohnYossarian","CaptainAardvark","ChaplainTappman","Colonel

Cathcart","DoctorDaneeka"],"tags":["novel"],"copies":6,"available":

false,"section":1}

{"index":{"_index":"library","_type":"book","_id":"3"}}

{"title":"TheCompleteSherlockHolmes","author":"ArthurConan

Doyle","year":1936,"characters":["SherlockHolmes","Dr.Watson","G.

Lestrade"],"tags":[],"copies":0,"available":false,"section":12}

{"index":{"_index":"library","_type":"book","_id":"4"}}

{"title":"CrimeandPunishment","otitle":"Преступлéниеи

наказáние","author":"FyodorDostoevsky","year":1886,"characters":

["Raskolnikov","SofiaSemyonovnaMarmeladova"],"tags":[],"copies":0,

"available":true}

Westoredoursampledatainthedocuments.jsonfileandweusethefollowingcommandtoindexit:

curl-s-XPOST'localhost:9200/_bulk'[email protected]

Thiscommandrunsbulkindexing.YoucanlearnmoreaboutitintheBatchindexingtospeedupyourindexingprocesssectioninChapter2,IndexingYourData.

www.EBooksWorld.ir

Page 182: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AsimplequeryThesimplestwaytoqueryElasticsearchistousetheURIrequestquery.WealreadydiscusseditintheSearchingwiththeURIrequestquerysectionofChapter1,GettingStartedwithElasticsearchCluster.Forexample,tosearchforthewordcrimeinthetitlefield,youcouldsendaqueryusingthefollowingcommand:

curl-XGET'localhost:9200/library/book/_search?q=title:crime&pretty'

Thisisaverysimple,butlimited,wayofsubmittingqueriestoElasticsearch.IfwelookfromthepointofviewoftheElasticsearchqueryDSL,theprecedingqueryisaquery_stringquery.Itsearchesforthedocumentsthathavethetermcrimeinthetitlefieldandcanberewrittenasfollows:

{

"query":{

"query_string":{"query":"title:crime"}

}

}

SendingaqueryusingthequeryDSLisabitdifferent,butstillnotrocketscience.WesendtheGET(POSTisalsoacceptedincaseyourtoolorlibrarydoesn’tallowsendingrequestbodyinHTTPGETrequests)HTTPrequesttothe_searchRESTendpointasearlierandincludethequeryintherequestbody.Let’stakealookatthefollowingcommand:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"query_string":{"query":"title:crime"}

}

}'

Asyoucansee,weusedtherequestbody(the-dswitch)tosendthewholeJSON-structuredquerytoElasticsearch.TheprettyrequestparametertellsElasticsearchtostructuretheresponseinsuchawaythatwehumanscanreaditmoreeasily.Inresponsetotheprecedingcommand,wegetthefollowingoutput:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

www.EBooksWorld.ir

Page 183: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

}

}]

}

}

Nice!WegotourfirstsearchresultswiththequeryDSL.

www.EBooksWorld.ir

Page 184: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PagingandresultsizeElasticsearchallowsustocontrolhowmanyresultswewanttoget(atmost)andfromwhichresultwewanttostart.Thefollowingarethetwoadditionalpropertiesthatcanbesetintherequestbody:

from:Thispropertyspecifiesthedocumentthatwewanttohaveourresultsfrom.Itsdefaultvalueis0,whichmeansthatwewanttogetourresultsfromthefirstdocument.size:Thispropertyspecifiesthemaximumnumberofdocumentswewantastheresultofasinglequery(whichdefaultsto10).Forexample,ifweareonlyinterestedinaggregationsresultsanddon’tcareaboutthedocumentsreturnedbythequery,wecansetthisparameterto0.

Ifwewantourquerytogetdocumentsstartingfromthetenthitemonthelistandfetch20documents,wesendthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"from":9,

"size":20,

"query":{

"query_string":{"query":"title:crime"}

}

}'

TipDownloadingtheexamplecode

Youcandownloadtheexamplecodefilesforthisbookfromyouraccountathttp://www.packtpub.com.Ifyoupurchasedthisbookelsewhere,youcanvisithttp://www.packtpub.com/supportandregistertohavethefilese-maileddirectlytoyou.

Youcandownloadthecodefilesbyfollowingthesesteps:

Loginorregistertoourwebsiteusingyoure-mailaddressandpasswordHoverthemousepointerontheSUPPORTtabatthetopClickonCodeDownloads&ErrataEnterthenameofthebookintheSearchboxSelectthebookforwhichyou’relookingtodownloadthecodefilesChoosefromthedrop-downmenuwhereyoupurchasedthisbookfromClickonCodeDownload

Oncethefileisdownloaded,makesurethatyouunziporextractthefolderusingthelatestversionof:

WinRAR/7-ZipforWindowsZipeg/iZip/UnRarXforMac7-Zip/PeaZipforLinux

www.EBooksWorld.ir

Page 185: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ReturningtheversionvalueInadditiontoalltheinformationreturned,Elasticsearchcanreturntheversionofthedocument(wementionedaboutversioninginChapter1,GettingStartedwithElasticsearchCluster.Todothis,weneedtoaddtheversionpropertywiththevalueoftruetothetoplevelofourJSONobject.So,thefinalquery,whichrequeststheversioninformation,willlookasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"version":true,

"query":{

"query_string":{"query":"title:crime"}

}

}'

Afterrunningtheprecedingquery,wegetthefollowingresults:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_version":1,

"_score":0.5,

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

}

}]

}

}

Asyoucansee,the_versionsectionispresentforthesinglehitwegot.

www.EBooksWorld.ir

Page 186: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

LimitingthescoreFornonstandardusecases,Elasticsearchprovidesafeaturethatletsusfiltertheresultsonthebasisofaminimumscorevaluethatthedocumentmusthavetobeconsideredamatch.Inordertousethisfeature,wemustprovidethemin_scorevalueatthetoplevelofourJSONobjectwiththevalueoftheminimumscore.Forexample,ifwewantourquerytoonlyreturndocumentswithascorehigherthan0.75,wesendthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"min_score":0.75,

"query":{

"query_string":{"query":"title:crime"}

}

}'

Wegetthefollowingresponseafterrunningtheprecedingquery:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

Ifyoulookatthepreviousexamples,thescoreofourdocumentwas0.5,whichislowerthan0.75,andthuswedidn’tgetanydocumentsinresponse.

Limitingthescoreusuallydoesn’tmakemuchsensebecausecomparingscoresbetweenthequeriesisquitehard.However,maybeinyourcase,thisfunctionalitywillbeneeded.

www.EBooksWorld.ir

Page 187: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ChoosingthefieldsthatwewanttoreturnWiththeuseofthefieldsarrayintherequestbody,Elasticsearchallowsustodefinewhichfieldstoincludeintheresponse.Rememberthatyoucanonlyreturnthesefieldsiftheyaremarkedasstoredinthemappingsusedtocreatetheindex,orifthe_sourcefieldwasused(Elasticsearchusesthe_sourcefieldtoprovidethestoredvaluesandthe_sourcefieldisturnedonbydefault).

So,forexample,toreturnonlythetitleandtheyearfieldsintheresults(foreachdocument),sendthefollowingquerytoElasticsearch:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"fields":["title","year"],

"query":{

"query_string":{"query":"title:crime"}

}

}'

Inresponse,wegetthefollowingoutput:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"fields":{

"title":["CrimeandPunishment"],

"year":[1886]

}

}]

}

}

Asyoucansee,everythingworkedaswewantedto.Therearefourthingswewouldliketosharewithyouatthispoint,whichareasfollows:

Ifwedon’tdefinethefieldsarray,itwillusethedefaultvalueandreturnthe_sourcefieldifavailable.Ifweusethe_sourcefieldandrequestafieldthatisnotstored,thenthatfieldwillbeextractedfromthe_sourcefield(however,thisrequiresadditionalprocessing).Ifwewanttoreturnallthestoredfields,wejustpassanasterisk(*)asthefieldname.Fromaperformancepointofview,it’sbettertoreturnthe_sourcefieldinsteadof

www.EBooksWorld.ir

Page 188: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

multiplestoredfields.Thisisbecausegettingmultiplestoredfieldsmaybeslowercomparedtoretrievingasingle_sourcefield.

www.EBooksWorld.ir

Page 189: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SourcefilteringInadditiontochoosingwhichfieldsarereturned,Elasticsearchallowsustouseso-calledsourcefiltering.Thisfunctionalityallowsustocontrolwhichfieldsarereturnedfromthe_sourcefield.Elasticsearchexposesseveralwaystodothis.Thesimplestsourcefilteringallowsustodecidewhetheradocumentshouldbereturnedornot.Considerthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"_source":false,

"query":{

"query_string":{"query":"title:crime"}

}

}'

TheresultretunedbyElasticsearchshouldbesimilartothefollowingone:

{

"took":12,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5

}]

}

}

Notethattheresponseislimitedtobaseinformationaboutadocumentandthe_sourcefieldwasnotincluded.IfyouuseElasticsearchasasecondsourceofdataandcontentofthedocumentisservedfromSQLdatabaseorcache,thedocumentidentifierisallyouneed.

Thesecondwayissimilartothatdescribedintheprecedingfields,althoughwedefinewhichfieldsshouldbereturnedinthedocumentsourceitself.Let’sseethatusingthefollowingexamplequery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"_source":["title","otitle"],

"query":{

"query_string":{"query":"title:crime"}

}

}'

Wewantedtogetthetitleandtheotitledocumentfieldsinthereturned_sourcefield.

www.EBooksWorld.ir

Page 190: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Elasticsearchextractedthosevaluesfromtheoriginal_sourcevalueandincludedthe_sourcefieldonlywiththerequestedfields.ThewholeresponsereturnedbyElasticsearchlookedasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"otitle":"Преступлéниеинаказáние",

"title":"CrimeandPunishment"

}

}]

}

}

Wecanalsouseanasterisktoselectwhichfieldsshouldbereturnedinthe_sourcefield;forexample,title*willreturnvaluesforthetitlefieldandfortitle10(ifwehavesuchfieldinourdata).Ifwehavedocumentswithnestedparts,wecanusenotationwithadot;forexample,title.*toselectallthefieldsnestedunderthetitleobject.

Finally,wecanalsospecifyexplicitlywhichfieldswewanttoincludeandwhichtoexcludefromthe_sourcefield.Wecanincludefieldsusingtheincludepropertyandwecanexcludefieldsusingtheexcludeproperty(bothofthemarearraysofvalues).Forexample,ifwewantthereturned_sourcefieldtoincludeallthefieldsstartingwiththelettertbutnotthetitlefield,wewillrunthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"_source":{

"include":["t*"],

"exclude":["title"]

},

"query":{

"query_string":{"query":"title:crime"}

}

}'

www.EBooksWorld.ir

Page 191: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingthescriptfieldsElasticsearchallowsustousescript-evaluatedvaluesthatwillbereturnedwiththeresultdocuments(wewilldiscussElasticsearchscriptingcapabilitiesingreaterdetailintheScriptingcapabilitiesofElasticsearchsectioninChapter6,MakeYourSearchBetter).Tousethescriptfieldsfunctionality,weaddthescript_fieldssectiontoourJSONqueryobjectandanobjectwithanameofourchoiceforeachscriptedvaluethatwewanttoreturn.Forexample,toreturnavaluenamedcorrectYear,whichiscalculatedastheyearfieldminus1800,werunthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"script_fields":{

"correctYear":{

"script":"doc[\"year\"].value-1800"

}

},

"query":{

"query_string":{"query":"title:crime"}

}

}'

NoteBydefault,Elasticsearchdoesn’tallowustousedynamicscripting.Ifyoutriedtheprecedingquery,youprobablygotanerrorwithinformationstatingthatthescriptsoftype[inline]withoperation[search]andlanguage[groovy]aredisabled.Tomakethisexamplework,youshouldaddthescript.inline:onpropertytotheelasticsearch.ymlfile.However,thisexposesasecuritythreat.MakesuretoreadtheScriptingcapabilitiesofElasticsearchsectioninChapter6,MakeYourSearchBetter,tolearnabouttheconsequences.

Usingthedocnotation,likewedidintheprecedingexample,allowsustocatchtheresultsreturnedandspeedupscriptexecutionatthecostofhighermemoryconsumption.Wealsogetlimitedtosingle-valuedandsingletermfields.Ifwecareaboutmemoryusage,orifweareusingmorecomplicatedfieldvalues,wecanalwaysusethe_sourcefield.Thesamequeryusingthe_sourcefieldlooksasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"script_fields":{

"correctYear":{

"script":"_source.year-1800"

}

},

"query":{

"query_string":{"query":"title:crime"}

}

}'

ThefollowingresponseisreturnedbyElasticsearchwithdynamicscriptingenabled:

{

"took":76,

www.EBooksWorld.ir

Page 192: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"fields":{

"correctYear":[86]

}

}]

}

}

Asyoucansee,wegotthecalculatedcorrectYearfieldinresponse.

PassingparameterstothescriptfieldsLet’stakealookatonemorefeatureofthescriptfields-thepassingofadditionalparameters.Insteadofhavingthevalue1800intheequation,wecanuseavariablenameandpassitsvalueintheparamssection.Ifwedothis,ourquerywilllookasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"script_fields":{

"correctYear":{

"script":"_source.year-paramYear",

"params":{

"paramYear":1800

}

}

},

"query":{

"query_string":{"query":"title:crime"}

}

}'

Asyoucansee,weaddedtheparamYearvariableaspartofthescriptedequationandprovideditsvalueintheparamssection.ThisallowsElasticsearchtoexecutethesamescriptwithdifferentparametervaluesinaslightlymoreefficientway.

www.EBooksWorld.ir

Page 193: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 194: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UnderstandingthequeryingprocessAfterreadingtheprevioussection,wenowknowhowqueryingworksinElasticsearch.YouknowthatElasticsearch,inmostcases,needstoscatterthequeryacrossmultiplenodes,gettheresults,mergethem,fetchtherelevantdocumentsfromoneormoreshards,andreturnthefinalresultstotheclientrequestingthedocuments.Whatwedidn’ttalkaboutaretwoadditionalthingsthatdefinehowqueriesbehave:searchtypeandqueryexecutionpreference.WewillnowconcentrateonthesefunctionalitiesofElasticsearch.

www.EBooksWorld.ir

Page 195: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QuerylogicElasticsearchisadistributedsearchengineandsoallfunctionalityprovidedmustbedistributedinitsnature.Itisexactlythesamewithquerying.Becausewewouldliketodiscusssomemoreadvancedtopicsonhowtocontrolthequeryprocess,wefirstneedtoknowhowitworks.

Let’snowgetbacktohowqueryingworks.Westartedthetheoryinthefirstchapterandwewouldliketogetbacktoit.Bydefault,ifwedon’talteranything,thequeryprocesswillconsistoftwophases:thescatterandthegatherphase.Theaggregatornode(theonethatreceivestherequest)willrunthescatterphasefirst.Duringthatphase,thequeryisdistributedtoalltheshardsthatourindexisbuiltfrom(ofcourseifroutingisnotused).Forexample,ifitisbuiltof5shardsand1replicathen5physicalshardswillbequeried(wedon’tneedtoqueryashardanditsreplicaastheycontainthesamedata).Eachofthequeriedshardswillonlyreturnthedocumentidentifierandthescoreofthedocument.Thenodethatsentthescatterquerywillwaitforalltheshardstocompletetheirtask,gathertheresults,andsortthemappropriately(inthiscase,fromtopscoringtothelowestscoringones).

Afterthat,anewrequestwillbesenttobuildthesearchresults.However,nowonlytothoseshardsthatheldthedocumentstobuildtheresponse.Inmostcases,Elasticsearchwon’tsendtherequesttoalltheshardsbuttoitssubset.That’sbecauseweusuallydon’tgetthecompleteresultofthequerybutonlyaportionofit.Thisphaseiscalledthegatherphase.Afterallthedocumentsaregathered,thefinalresponseisbuiltandreturnedasthequeryresult.ThisisthebasicanddefaultElasticsearchbehaviorbutwecanchangeit.

www.EBooksWorld.ir

Page 196: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SearchtypeElasticsearchallowsustochoosehowwewantourquerytobeprocessedinternally.Wecandothatbyspecifyingthesearchtype.Therearedifferentsituationswheredifferentsearchtypesareappropriate:sometimesonecancareonlyabouttheperformancewhilesometimesqueryrelevanceisthemostimportantfactor.YoushouldrememberthateachshardisasmallLuceneindexand,inordertoreturnmorerelevantresults,someinformation,suchasfrequencies,needstobetransferredbetweentheshards.Tocontrolhowthequeriesareexecuted,wecanpassthesearch_typerequestparameterandsetittooneofthefollowingvalues:

query_then_fetch:Inthefirststep,thequeryisexecutedtogettheinformationneededtosortandrankthedocuments.Thisstepisexecutedagainstalltheshards.Thenonlytherelevantshardsarequeriedfortheactualcontentofthedocuments.Thisisthesearchtypeusedbydefaultifnosearchtypeisprovidedwiththequeryandthisisthequerytypewedescribedpreviously.dfs_query_then_fetch:Thisissimilartoquery_then_fetch.However,itcontainsanadditionalqueryphasecomparingtoquery_then_fetchwhichcalculatesdistributedtermfrequencies.

Therearealsotwodeprecatedsearchtypes:countandscan.ThefirstoneisdeprecatedstartingfromElasticsearch2.0andthesecondonestartingwithElasticsearch2.1.Thefirstsearchtypeusedtoprovidebenefitswhereonlyaggregationsorthenumberofdocumentswasrelevant,butnowitisenoughtoaddsizeequalto0toyourqueries.Thescanrequestwasusedforscrollingfunctionality.

Soifwewouldliketousethesimplestsearchtype,wewouldrunthefollowingcommand:

curl-XGET'localhost:9200/library/book/_search?

pretty&search_type=query_then_fetch'-d'{

"query":{

"term":{"title":"crime"}

}

}'

www.EBooksWorld.ir

Page 197: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SearchexecutionpreferenceInadditiontothepossibilityofcontrollinghowthequeryisexecuted,wecanalsocontrolonwhichshardstoexecutethequery.Bydefault,Elasticsearchusesshardsandreplicasonanynodeinaroundrobinmanner–sothateachshardisqueriedasimilarnumberoftimes.Thedefaultbehavioristhepropermethodofshardexecutionpreferenceformostusecases.Buttheremaybetimeswhenwewanttochangethedefaultbehavior.Forexample,youmaywantthesearchtoonlybeexecutedontheprimaryshards.Todothat,wecansetthepreferencerequestparametertooneofthefollowingvalues:

_primary:Theoperationwillbeonlyexecutedontheprimaryshards,sothereplicaswon’tbeused.Thiscanbeusefulwhenweneedtousethelatestinformationfromtheindexbutourdataisnotreplicatedrightaway._primary_first:Theoperationwillbeexecutedontheprimaryshardsiftheyareavailable.Ifnot,itwillbeexecutedontheothershards._replica:Theoperationwillbeexecutedonlyonthereplicashards._replica_first:Thisoperationissimilarto_primary_first,butusesreplicashards.Theoperationwillbeexecutedonthereplicashardsifpossibleandontheprimaryshardsifthereplicasarenotavailable._local:Theoperationwillbeexecutedontheshardsavailableonthenodewhichtherequestwassentfromand,ifsuchshardsarenotpresent,therequestwillbeforwardedtotheappropriatenodes._only_node:node_id:Thisoperationwillbeexecutedonthenodewiththeprovidednodeidentifier._only_nodes:nodes_spec:Thisoperationwillbeexecutedonthenodesthataredefinedinnodes_spec.ThiscanbeanIPaddress,aname,anameorIPaddressusingwildcards,andsoon.Forexample,ifnodes_specissetto192.168.1.*,theoperationwillberunonthenodeswithIPaddressesstartingwith192.168.1._prefer_node:node_id:Elasticsearchwilltrytoexecutetheoperationonthenodewiththeprovidedidentifier.However,ifthenodeisnotavailable,itwillbeexecutedonthenodesthatareavailable._shards:1,2:Elasticsearchwillexecutetheoperationontheshardswiththegivenidentifiers;inthiscase,onshardswithidentifiers1and2.The_shardsparametercanbecombinedwithotherpreferences,buttheshardsidentifiersneedtobeprovidedfirst.Forexample,_shards:1,2;_local.Customvalue:Anycustom,stringvaluemaybepassed.Requestswiththesamevaluesprovidedwillbeexecutedonthesameshards.

Forexample,ifwewouldliketoexecuteaqueryonlyonthelocalshards,wewouldrunthefollowingcommand:

curl-XGET'localhost:9200/library/_search?pretty&preference=_local'-d'{

"query":{

"term":{"title":"crime"}

}

}'

www.EBooksWorld.ir

Page 198: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SearchshardsAPIWhendiscussingthesearchpreference,wewouldalsoliketomentionthesearchshardsAPIexposedbyElasticsearch.ThisAPIallowsustocheckwhichshardsthequerywillbeexecutedon.InordertousethisAPI,runarequestagainstthesearch_shardsrestendpoint.Forexample,toseehowthequerywillbeexecuted,werunthefollowingcommand:

curl-XGET'localhost:9200/library/_search_shards?pretty'-d

'{"query":"match_all":{}}'

Theresponsetotheprecedingcommandwillbeasfollows:

{

"nodes":{

"my0DcA_MTImm4NE3cG3ZIg":{

"name":"Cloud9",

"transport_address":"127.0.0.1:9300",

"attributes":{}

}

},

"shards":[[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":0,

"index":"library",

"version":4,

"allocation_id":{

"id":"9ayLDbL1RVSyJRYIJkuAxg"

}

}],[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":1,

"index":"library",

"version":4,

"allocation_id":{

"id":"wfpvtaLER-KVyOsuD46Yqg"

}

}],[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":2,

"index":"library",

"version":4,

"allocation_id":{

"id":"zrLPWhCOSTmjlb8TY5rYQA"

}

}],[{

www.EBooksWorld.ir

Page 199: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":3,

"index":"library",

"version":4,

"allocation_id":{

"id":"efnvY7YcSz6X8X8USacA7g"

}

}],[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":4,

"index":"library",

"version":4,

"allocation_id":{

"id":"XJHW2J63QUKdh3bK3T2nzA"

}

}]]

}

Asyoucansee,intheresponsereturnedbyElasticsearch,wehavetheinformationabouttheshardsthatwillbeusedduringthequeryprocess.Ofcourse,withthesearchshardsAPI,wecanuseadditionalparametersthatcontrolthequeryingprocess.Thesepropertiesarerouting,preference,andlocal.Wearealreadyfamiliarwiththefirsttwo.ThelocalparameterisaBoolean(valuestrueorfalse),onethatallowsustotellElasticsearchtousetheclusterstateinformationstoredonthelocalnode(settinglocaltotrue)insteadoftheonefromthemasternode(settinglocaltofalse).Thisallowsustodiagnoseproblemswithclusterstatesynchronization.

www.EBooksWorld.ir

Page 200: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 201: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

BasicqueriesElasticsearchhasextensivesearchanddataanalysiscapabilitiesthatareexposedinformsofdifferentqueries,filters,aggregates,andsoon.Inthissection,wewillconcentrateonthebasicqueriesprovidedbyElasticsearch.Bybasicquerieswemeantheonesthatdon’tcombinetheotherqueriestogetherbutrunontheirown.

www.EBooksWorld.ir

Page 202: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThetermqueryThetermqueryisoneofthesimplestqueriesinElasticsearch.Itjustmatchesthedocumentthathasaterminagivenfield-theexact,notanalyzedterm.Thesimplesttermqueryisasfollows:

{

"query":{

"term":{

"title":"crime"

}

}

}

Itwillmatchthedocumentsthathavethetermcrimeinthetitlefield.Rememberthatthetermqueryisnotanalyzed,soyouneedtoprovidetheexacttermthatwillmatchthetermintheindexeddocument.Notethatinourinputdata,wehavethetitlefieldwiththevalueofCrimeandPunishment(uppercased),butwearesearchingforcrime,becausetheCrimetermsbecomescrimeafteranalysisduringindexing.

Inadditiontothetermwewanttofind,wecanalsoincludetheboostattributetoourtermquery,whichwillaffecttheimportanceofthegiventerm.WewilltalkmoreaboutboostsintheIntroductiontoApacheLucenescoringsectionofChapter6,MakeYourSearchBetter.Fornow,wejustneedtorememberthatitchangestheimportanceofthegivenpartofthequery.

Forexample,tochangeourpreviousqueryandgiveourtermqueryaboostof10.0,sendthefollowingquery:

{

"query":{

"term":{

"title":{

"value":"crime",

"boost":10.0

}

}

}

}

Asyoucansee,thequerychangedabit.Insteadofasimpletermvalue,wenestedanewJSONobjectwhichcontainsthevaluepropertyandtheboostproperty.Thevalueofthevaluepropertyshouldcontainthetermweareinterestedinandtheboostpropertyistheboostvaluewewanttouse.

www.EBooksWorld.ir

Page 203: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThetermsqueryThetermsqueryisanextensiontothetermquery.Itallowsustomatchdocumentsthathavecertaintermsintheircontentsinsteadofasingleterm.Thetermqueryallowedustomatchasingle,notanalyzedtermandthetermsqueryallowsustomatchmultipleofthose.Forexample,let’ssaythatwewanttogetallthedocumentsthathavethetermsnovelorbookinthetagsfield.Toachievethis,wewillrunthefollowingquery:

{

"query":{

"terms":{

"tags":["novel","book"]

}

}

}

Theprecedingqueryreturnsallthedocumentsthathaveoneorbothofthesearchedtermsinthetagsfield.Thisisakeypointtoremember–thetermsquerywillfinddocumentshavinganyoftheprovidedterms.

www.EBooksWorld.ir

Page 204: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThematchallqueryThematchallqueryisoneofthesimplestqueriesavailableinElasticsearch.Itallowsustomatchallofthedocumentsintheindex.Ifwewanttogetallthedocumentsfromourindex,wejustrunthefollowingquery:

{

"query":{

"match_all":{}

}

}

Wecanalsoincludeboostinthequery,whichwillbegiventoallthedocumentsmatchedbyit.Forexample,ifwewanttoaddaboostof2.0toallthedocumentsinourmatchallquery,wewillsendthefollowingquerytoElasticsearch:

{

"query":{

"match_all":{

"boost":2.0

}

}

}

www.EBooksWorld.ir

Page 205: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThetypequeryAverysimplequerythatallowsustofindallthedocumentswithacertaintype.Forexample,ifwewouldliketosearchforallthedocumentswiththebooktypeinourlibraryindex,wewillrunthefollowingquery:

{

"query":{

"type":{

"value":"book"

}

}

}

www.EBooksWorld.ir

Page 206: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheexistsqueryAquerythatallowsustofindallthedocumentsthathaveavalueinthedefinedfield.Forexample,tofindthedocumentsthathaveavalueinthetagsfield,wewillrunthefollowingquery:

{

"query":{

"exists":{

"field":"tags"

}

}

}

www.EBooksWorld.ir

Page 207: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThemissingqueryOppositetotheexistsquery,themissingqueryreturnsthedocumentsthathaveanullvalueornovalueatallinagivenfield.Forexample,tofindallthedocumentsthatdon’thaveavalueinthetagsfield,wewillrunthefollowingquery:

{

"query":{

"missing":{

"field":"tags"

}

}

}

www.EBooksWorld.ir

Page 208: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThecommontermsqueryThecommontermsqueryisamodernElasticsearchsolutionforimprovingqueryrelevanceandprecisionwithcommonwordswhenwearenotusingstopwords(http://en.wikipedia.org/wiki/Stop_words).Forexample,acrimeandpunishmentqueryresultsinthreetermqueriesandeachofthemhaveacostintermsofperformance.However,theandtermisaverycommononeanditsimpactonthedocumentscorewillbeverylow.Thesolutionisthecommontermsquerywhichdividesthequeryintotwogroups.Thefirstgroupistheonewithimportantterms,whicharetheonesthathavelowerfrequency.Thesecondgroupistheonewithlessimportantterms,whicharetheoneswithhighfrequency.ThefirstqueryisexecutedfirstandElasticsearchcalculatesthescoreforallofthetermsfromthefirstgroup.Thiswaythelowfrequencyterms,whichareusuallytheonesthathavemoreimportance,arealwaystakenintoconsideration.ThenElasticsearchexecutesthesecondqueryforthesecondgroupofterms,butcalculatesthescoreonlyforthedocumentsmatchedforthefirstquery.Thiswaythescoreisonlycalculatedfortherelevantdocumentsandthushigherperformancecanbeachieved.

Anexampleofthecommontermsqueryisasfollows:

{

"query":{

"common":{

"title":{

"query":"crimeandpunishment",

"cutoff_frequency":0.001

}

}

}

}

Thequerycantakethefollowingparameters:

query:Theactualquerycontents.cutoff_frequency:Thepercentage(0.001means0.1%)oranabsolutevalue(whenpropertyissettoavalueequaltoorlargerthan1).Highandlowfrequencygroupsareconstructedusingthisvalue.Settingthisparameterto0.001meansthatthelowfrequencytermsgroupwillbeconstructedfortermshavingafrequencyof0.1%andlower.low_freq_operator:Thiscanbesettoororand,butdefaultstoor.ItspecifiestheBooleanoperatorusedforconstructingqueriesinthelowfrequencytermgroup.Ifwewantallthetermstobepresentinadocumentforittobeconsideredamatch,weshouldsetthisparametertoand.high_freq_operator:Thiscanbesettoororand,butdefaultstoor.ItspecifiestheBooleanoperatorusedforconstructingqueriesinthehighfrequencytermgroup.Ifwewantallthetermstobepresentinadocumentforittobeconsideredamatch,weshouldsetthisparametertoand.minimum_should_match:Insteadofusinglow_freq_operatorandhigh_freq_operator,wecanuseminimum_should_match.Justlikewiththeother

www.EBooksWorld.ir

Page 209: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

queries,itallowsustospecifytheminimumnumberoftermsthatshouldbefoundinadocumentforittobeconsideredamatch.Wecanalsospecifyhigh_freqandlow_freqinsidetheminimum_should_matchobject,whichallowsustodefinethedifferentnumberoftermsthatneedtobematchedforthehighandlowfrequencyterms.boost:Theboostgiventothescoreofthedocuments.analyzer:Thenameoftheanalyzerthatwillbeusedtoanalyzethequerytext,whichdefaultstothedefaultanalyzer.disable_coord:Defaultstofalseandallowsustoenableordisablethescorefactorcomputationthatisbasedonthefractionofallthequerytermsthatadocumentcontains.Setittotrueforlessprecisescoring,butslightlyfasterqueries.

NoteUnlikethetermandtermsqueries,thecommontermsqueryisanalyzedbyElasticsearch.

www.EBooksWorld.ir

Page 210: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThematchqueryThematchquerytakesthevaluesgiveninthequeryparameter,analyzesit,andconstructstheappropriatequeryoutofit.Whenusingamatchquery,Elasticsearchwillchoosetheproperanalyzerforthefieldwechoose,soyoucanbesurethatthetermspassedtothematchquerywillbeprocessedbythesameanalyzerthatwasusedduringindexing.Rememberthatthematchquery(andthemulti_matchquery)doesn’tsupportLucenequerysyntax;however,itperfectlyfitsasaqueryhandlerforyoursearchbox.Thesimplestmatch(andthedefault)querywilllooklikethefollowing:

{

"query":{

"match":{

"title":"crimeandpunishment"

}

}

}

Theprecedingquerywillmatchallthedocumentsthathavethetermscrime,and,orpunishmentinthetitlefield.However,thepreviousqueryisonlythesimplestone;therearemultipletypesofmatchquerywhichwewilldiscussnow.

TheBooleanmatchqueryTheBooleanmatchqueryisaquerywhichanalyzestheprovidedtextandmakesaBooleanqueryoutofit.Thisisalsothedefaulttypeforthematchquery.ThereareafewparameterswhichallowustocontrolthebehavioroftheBooleanmatchqueries:

operator:Thisparametercantakethevalueofororand,andcontrolswhichBooleanoperatorisusedtoconnectthecreatedBooleanclauses.Thedefaultvalueisor.Ifwewantallthetermsinourquerytobematched,weshouldusetheandBooleanoperator.analyzer:Thisspecifiesthenameoftheanalyzerthatwillbeusedtoanalyzethequerytextanddefaultstothedefaultanalyzer.fuzziness:Providingthevalueofthisparameterallowsustoconstructfuzzyqueries.Thevalueofthisparametercanvary.Fornumericfields,itshouldbesettonumericvalue;fordatebasedfield,itcanbesettomillisecondortimevalue,suchas2h;andfortextfields,itcanbesetto0,1,or2(theeditdistanceintheLevenshteinalgorithm–https://en.wikipedia.org/wiki/Levenshtein_distance),AUTO(whichallowsElasticsearchtocontrolhowfuzzyqueriesareconstructedandwhichisapreferredvalue).Finally,fortextfields,itcanalsobesettovaluesfrom0.0to1.0,whichresultsineditdistancebeingcalculatedastermlengthminus1.0multipliedbytheprovidedfuzzinessvalue.Ingeneral,thehigherthefuzziness,themoredifferencebetweentermswillbeallowed.prefix_length:Thisallowscontroloverthebehaviorofthefuzzyquery.Formoreinformationonthevalueofthisparameter,refertotheThefuzzyquerysectioninthischapter.max_expansions:Thisallowscontroloverthebehaviorofthefuzzyquery.Formore

www.EBooksWorld.ir

Page 211: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

informationonthevalueofthisparameter,refertotheThefuzzyquerysectioninthischapter.zero_terms_query:Thisallowsustospecifythebehaviorofthequery,whenallthetermsareremovedbytheanalyzer(forexample,becauseofstopwords).Itcanbesettononeorall,withnoneasthedefault.Whensettonone,nodocumentswillbereturnedwhentheanalyzerremovesallthequeryterms.Ifsetittoall,allthedocumentswillbereturned.cutoff_frequency:Itallowsdividingthequeryintotwogroups:onewithhighfrequencytermsandonewithlowfrequencyterms.Refertothedescriptionofthecommontermsquerytoseehowthisparametercanbeused.lenient:Whensettotrue(bydefaultitisfalse),itallowsustoignoretheexceptionscausedbydataincompatibility,suchastryingtoquerynumericfieldsusingstringvalue.

Theparametersshouldbewrappedinthenameofthefieldwearerunningthequeryagainst.SoifwewanttorunasampleBooleanmatchqueryagainstthetitlefield,wesendaqueryasfollows:

{

"query":{

"match":{

"title":{

"query":"crimeandpunishment",

"operator":"and"

}

}

}

}

ThephrasematchqueryAphrasematchqueryissimilartotheBooleanquery,but,insteadofconstructingtheBooleanclausesfromtheanalyzedtext,itconstructsthephrasequery.YoumaywonderwhatphraseiswhenitcomestoLuceneandElasticsearch–well,itistwoormoretermspositionedoneafteranotherinanorder.Thefollowingparametersareavailable:

slop:Anintegervaluethatdefineshowmanyunknownwordscanbeputbetweenthetermsinthetextqueryforamatchtobeconsideredaphrase.Thedefaultvalueofthisparameteris0,whichmeansthatnoadditionalwordsareallowed.analyzer:Thisspecifiesthenameoftheanalyzerthatwillbeusedtoanalyzethequerytextanddefaultstothedefaultanalyzer.

Asamplephrasematchqueryagainstthetitlefieldlookslikethefollowingcode:

{

"query":{

"match_phrase":{

"title":{

"query":"crimepunishment",

"slop":1

}

www.EBooksWorld.ir

Page 212: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

}

Notethatweremovedtheandtermfromourquery,butbecausetheslopissetto1,itwillstillmatchourdocumentbecauseweallowedonetermtobepresentbetweenourterms.

ThematchphraseprefixqueryThelasttypeofthematchqueryisthematchphraseprefixquery.Thisqueryisalmostthesameasthephrasematchquery,butinaddition,itallowsprefixmatchesonthelastterminthequerytext.Also,inadditiontotheparametersexposedbythematchphrasequery,itexposesanadditionalone–themax_expansionsparameter,whichcontrolshowmanyprefixesthelasttermwillberewrittento.Ourexamplequerychangedtothematch_phrase_prefixquerywilllookasfollows:

{

"query":{

"match_phrase_prefix":{

"title":{

"query":"crimepunishm",

"slop":1,

"max_expansions":20

}

}

}

}

Notethatwedidn’tprovidethefullcrimeandpunishmentphrase,butonlycrimepunishmandstillthequerywouldmatchourdocument.Thisisbecauseweusedthematch_phrase_prefixquerycombinedwithslopsetto1.

www.EBooksWorld.ir

Page 213: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThemultimatchqueryItisthesameasthematchquery,butinsteadofrunningagainstasinglefield,itcanberunagainstmultiplefieldswiththeuseofthefieldsparameter.Ofcourse,alltheparametersyouusewiththematchquerycanbeusedwiththemultimatchquery.Soifwewouldliketomodifyourmatchquerytoberunagainstthetitleandotitlefields,wewillrunthefollowingquery:

{

"query":{

"multi_match":{

"query":"crimepunishment",

"fields":["title^10","otitle"]

}

}

}

Asshownintheprecedingexample,thenicethingaboutthemultimatchqueryisthatthefieldsdefinedinitsupportboosting,sowecanincreaseordecreasetheimportanceofmatchesoncertainfields.

However,thisisnottheonlydifferencewhenitcomestocomparisonwiththematchquery.Wecanalsocontrolhowthequeryisruninternallybyusingthetypepropertyandsettingittooneofthefollowingvalues:

best_fields:Thisisthedefaultbehavior,whichfindsdocumentshavingmatchesinanyfieldfromthedefinedones,butsettingthedocumentscoretothescoreofthebestmatchingfield.Themostusefultypewhensearchingformultiplewordsandwantingtoboostdocumentsthathavethosewordsinthesamefield.most_fields:Thisvaluefindsdocumentsthatmatchanyfieldandsetsthescoreofthedocumenttothecombinedscorefromallthematchedfields.cross_fields:Thisvaluetreatsthequeryasifallthetermswereinone,bigfield,thusreturningdocumentsmatchinganyfield.phrase:Thisvalueusesthematch_phrasequeryoneachfieldandsetsthescoreofthedocumenttothescorecombinedfromallthefields.phrase_prefix:Thisvalueusesthematch_phrase_prefixqueryoneachfieldandsetsthescoreofthedocumenttothescorecombinedfromallthefields.

Inadditiontotheparametersmentionedinthematchqueryandtype,themultimatchqueryexposessomeadditionalonesallowingmorecontroloveritsbehavior:

tie_breaker:Thisallowsustospecifythebalancebetweentheminimumandthemaximumscoringqueryitemsandthevaluecanbefrom0.0to1.0.Whenused,thescoreofthedocumentisequaltothebestscoringelementplusthetie_breakermultipliedbythescoreofalltheothermatchingfieldsinthedocument.So,whensetto0.0,Elasticsearchwillonlyusethescoreofthemostscoringmatchingelement.YoucanreadmoreaboutitinThedis_maxquerysectioninthischapter.

www.EBooksWorld.ir

Page 214: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThequerystringqueryIncomparisontotheotherqueriesavailable,thequerystringquerysupportsfullApacheLucenequerysyntax,whichwediscussedearlierintheLucenequerysyntaxsectionofChapter1,GettingStartedwithElasticsearchCluster.Itusesaqueryparsertoconstructanactualqueryusingtheprovidedtext.Anexamplequerystringquerywilllooklikethefollowingcode:

{

"query":{

"query_string":{

"query":"title:crime^10+title:punishment-otitle:cat+author:

(+Fyodor+dostoevsky)",

"default_field":"title"

}

}

}

BecausewearefamiliarwiththebasicsoftheLucenequerysyntax,wecandiscusshowtheprecedingqueryworks.Asyoucansee,wewantedtogetthedocumentsthatmayhavethetermcrimeinthetitlefieldandsuchdocumentsshouldbeboostedwiththevalueof10.Next,wewantedonlythedocumentsthathavethetermpunishmentinthetitlefieldandwedidn’twantdocumentswiththetermcatintheotitlefield.Finally,wetoldLucenethatweonlywantedthedocumentsthathadthefyodoranddostoevskytermsintheauthorfield.

SimilartomostofthequeriesinElasticsearch,thequerystringqueryprovidesquiteafewparametersthatallowustocontrolthequerybehaviorandthelistofparametersforthisqueryisratherextensive:

query:Thisspecifiesthequerytext.default_field:Thisspecifiesthedefaultfieldthequerywillbeexecutedagainst.Itdefaultstotheindex.query.default_fieldproperty,whichisbydefaultsetto_all.default_operator:Thisspecifiesthedefaultlogicaloperator(ororand)usedwhennooperatorisspecified.Thedefaultvalueofthisparameterisor.analyzer:Thisspecifiesthenameoftheanalyzerusedtoanalyzethequeryprovidedinthequeryparameter.allow_leading_wildcard:Thisspecifiesifawildcardcharacterisallowedasthefirstcharacterofaterm.Itdefaultstotrue.lowercase_expand_terms:Thisspecifiesifthetermsthatarearesultofqueryrewriteshouldbelowercased.Itdefaultstotrue,whichmeansthattherewrittentermswillbelowercased.enable_position_increments:Thisspecifiesifpositionincrementsshouldbeturnedonintheresultquery.Itdefaultstotrue.fuzzy_max_expansions:Thisspecifiesthemaximumnumberoftermsintowhichfuzzyquerywillbeexpanded,iffuzzyqueryisused.Itdefaultsto50.fuzzy_prefix_length:Thisspecifiestheprefixlengthforthegeneratedfuzzy

www.EBooksWorld.ir

Page 215: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

queriesanddefaultsto0.Tolearnmoreaboutit,lookatthefuzzyquerydescription.phrase_slop:Thisspecifiesthephraseslopanddefaultsto0.Tolearnmoreaboutit,lookatthephrasematchquerydescription.boost:Thisspecifiestheboostvaluewhichwillbeusedanddefaultsto1.0.analyze_wildcard:Thisspecifiesifthetermsgeneratedbythewildcardqueryshouldbeanalyzed.Itdefaultstofalse,whichmeansthatthosetermswon’tbeanalyzed.auto_generate_phrase_queries:specifiesifthephrasequerieswillbeautomaticallygeneratedfromthequery.Itdefaultstofalse,whichmeansthatthephrasequerieswon’tbeautomaticallygenerated.minimum_should_match:ThiscontrolshowmanyofthegeneratedBooleanshouldclausesshouldbematchedagainstadocumentforthedocumenttobeconsideredahit.Thevaluecanbeprovidedasapercentage;forexample,50%,whichwouldmeanthatatleast50percentofthegiventermsshouldmatch.Itcanalsobeprovidedasanintegervalue,suchas2,whichmeansthatatleast2termsmustmatch.fuzziness:Thiscontrolsthebehaviorofthegeneratedfuzzyquery.Refertothematchquerydescriptionformoreinformation.max_determined_states:Thisdefaultsto10000andsetsthenumberofstatesthattheautomatoncanhaveforhandlingregularexpressionqueries.Itisusedtodisallowveryexpensivequeriesusingregularexpressions.locale:Thissetsthelocalethatshouldbeusedfortheconversionofstringvalues.Bydefault,itissettoROOT.time_zone:Thissetsthetimezonethatshouldbeusedbyrangequeriesthatarerunondatebasedfields.lenient:Thiscantakethevalueoftrueorfalse.Ifsettotrue,format-basedfailureswillbeignored.Bydefault,itissettofalse.

NotethatElasticsearchcanrewritethequerystringqueryand,becauseofthat,Elasticsearchallowsustopassadditionalparametersthatcontroltherewritemethod.However,formoredetailsaboutthisprocess,gototheUnderstandingthequeryingprocesssectioninthischapter.

RunningthequerystringqueryagainstmultiplefieldsItispossibletorunthequerystringqueryagainstmultiplefields.Inordertodothat,oneneedstoprovidethefieldsparameterinthequerybody,whichshouldholdthearrayofthefieldnames.Therearetwomethodsofrunningthequerystringqueryagainstmultiplefields:thedefaultmethodusestheBooleanquerytomakequeriesandtheothermethodcanusethedis_maxquery.

Inordertousethedis_maxquery,oneshouldaddtheuse_dis_maxpropertyinthequerybodyandsetittotrue.Anexamplequerywilllooklikethefollowingcode:

{

"query":{

"query_string":{

"query":"crimepunishment",

"fields":["title","otitle"],

www.EBooksWorld.ir

Page 216: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"use_dis_max":true

}

}

}

www.EBooksWorld.ir

Page 217: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThesimplequerystringqueryThesimplequerystringqueryusesoneofthenewestqueryparsersinLucene-theSimpleQueryParser(https://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.htmlSimilartothequerystringquery,itacceptsLucenequerysyntaxasthequery;however,unlikeit,itneverthrowsanexceptionwhenaparsingerrorhappens.Insteadofthrowinganexception,itdiscardstheinvalidpartsofthequeryandrunstherest.

Anexamplesimplequerystringquerywilllooklikethefollowingcode:

{

"query":{

"simple_query_string":{

"query":"crimepunishment",

"default_operator":"or"

}

}

}

Thequerysupportsparameterssuchasquery,fields,default_operator,analyzer,lowercase_expanded_terms,locale,lenient,andminimum_should_match,andcanalsoberunagainstmultiplefieldsusingthefieldsproperty.

www.EBooksWorld.ir

Page 218: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheidentifiersqueryThisisasimplequerythatfiltersthereturneddocumentstoonlythosewiththeprovidedidentifiers.Itworksontheinternal_uidfield,soitdoesn’trequirethe_idfieldtobeenabled.Thesimplestversionofsuchaquerywilllooklikethefollowing:

{

"query":{

"ids":{

"values":["1","2","3"]

}

}

}

Thisquerywillonlyreturnthosedocumentsthathaveoneoftheidentifierspresentinthevaluesarray.Wecancomplicatetheidentifiersqueryabitandalsolimitthedocumentsonthebasisoftheirtype.Forexample,ifwewanttoonlyincludedocumentsfromthebooktypes,wewillsendthefollowingquery:

{

"query":{

"ids":{

"type":"book",

"values":["1","2","3"]

}

}

}

Asyoucansee,we’veaddedthetypepropertytoourqueryandwe’vesetitsvaluetothetypeweareinterestedin.

www.EBooksWorld.ir

Page 219: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheprefixqueryThisqueryissimilartothetermqueryinitsconfigurationandtothemultitermquerywhenlookingintoitslogic.Theprefixqueryallowsustomatchdocumentsthathavethevalueinacertainfieldthatstartswithagivenprefix.Forexample,ifwewanttofindallthedocumentsthathavevaluesstartingwithcriinthetitlefield,wewillrunthefollowingquery:

{

"query":{

"prefix":{

"title":"cri"

}

}

}

Similartothetermquery,youcanalsoincludetheboostattributetoyourprefixquerywhichwillaffecttheimportanceofthegivenprefix.Forexample,ifwewouldliketochangeourpreviousqueryandgiveourqueryaboostof3.0,wewillsendthefollowingquery:

{

"query":{

"prefix":{

"title":{

"value":"cri",

"boost":3.0

}

}

}

}

NoteNotethattheprefixqueryisrewrittenbyElasticsearchandbecauseofthatElasticsearchallowsustopassanadditionalparameter,thatis,controllingtherewritemethod.However,formoredetailsaboutthatprocess,refertotheUnderstandingthequeryingprocesssectioninthischapter.

www.EBooksWorld.ir

Page 220: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThefuzzyqueryThefuzzyqueryallowsustofinddocumentsthathavevaluessimilartotheoneswe’veprovidedinthequery.Thesimilarityoftermsiscalculatedonthebasisoftheeditdistancealgorithm.Theeditdistanceiscalculatedonthebasisoftermsweprovideinthequeryandagainstthesearcheddocuments.ThisquerycanbeexpensivewhenitcomestoCPUresources,butcanhelpuswhenweneedfuzzymatching;forexample,whenusersmakespellingmistakes.Inourexample,let’sassumethatinsteadofcrime,ouruserentersthecrmewordintothesearchboxandwewouldliketorunthesimplestformoffuzzyquery.Suchaquerywilllooklikethis:

{

"query":{

"fuzzy":{

"title":"crme"

}

}

}

Theresponseforsuchaquerywillbeasfollows:

{

"took":81,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

}

}]

}

}

Eventhoughwemadeatypo,Elasticsearchmanagedtofindthedocumentswewereinterestedin.

www.EBooksWorld.ir

Page 221: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Wecancontrolthefuzzyquerybehaviorbyusingthefollowingparameters:

value:Thisspecifiestheactualquery.boost:Thisspecifiestheboostvalueforthequery.Itdefaultsto1.0.fuzziness:Thiscontrolsthebehaviorofthegeneratedfuzzyquery.Refertothematchquerydescriptionformoreinformation.prefix_length:Thisisthelengthofthecommonprefixofthedifferencingterms.Itdefaultsto0.max_expansions:Thisspecifiesthemaximumnumberoftermsthequerywillbeexpandedto.Thedefaultvalueisunbounded.

Theparametersshouldbewrappedinthenameofthefieldwearerunningthequeryagainst.Soifwewouldliketomodifythepreviousqueryandaddadditionalparameters,thequerywilllooklikethefollowingcode:

{

"query":{

"fuzzy":{

"title":{

"value":"crme",

"fuzziness":2

}

}

}

}

www.EBooksWorld.ir

Page 222: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThewildcardqueryAquerythatallowsustouse*and?wildcardsinthevalueswesearch.Apartfromthat,thewildcardqueryisverysimilartothetermqueryincaseofitsbody.Tosendaquerythatwouldmatchallthedocumentswiththevalueofthecr?meterm(?matchinganycharacter)wewouldsendthefollowingquery:

{

"query":{

"wildcard":{

"title":"cr?me"

}

}

}

Itwillmatchthedocumentsthathaveallthetermsmatchingcr?meinthetitlefield.However,youcanalsoincludetheboostattributetoyourwildcardquerywhichwillaffecttheimportanceofeachtermthatmatchesthegivenvalue.Forexample,ifwewouldliketochangeourpreviousqueryandgiveourtermqueryaboostof20.0,wewillsendthefollowingquery:

{

"query":{

"wildcard":{

"title":{

"value":"cr?me",

"boost":20.0

}

}

}

}

NoteNotethatwildcardqueriesarenotveryperformanceorientedqueriesandshouldbeavoidedifpossible;especiallyavoidleadingwildcards(termsstartingwithwildcards).ThewildcardqueryisrewrittenbyElasticsearchandbecauseofthatElasticsearchallowsustopassanadditionalparameter,thatis,controllingtherewritemethod.Formoredetailsaboutthisprocess,refertotheUnderstandingthequeryingprocesssectioninthischapter.Alsorememberthatthewildcardqueryisnotanalyzed.

www.EBooksWorld.ir

Page 223: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TherangequeryAquerythatallowsustofinddocumentsthathaveafieldvaluewithinacertainrangeandwhichworksfornumericalfieldsaswellasforstring-basedfieldsanddatebasedfields(justmapstoadifferentApacheLucenequery).Therangequeryshouldberunagainstasinglefieldandthequeryparametersshouldbewrappedinthefieldname.Thefollowingparametersaresupported:

gte:Thequerywillmatchdocumentswiththevaluegreaterthanorequaltotheoneprovidedwiththisparametergt:Thequerywillmatchdocumentswiththevaluegreaterthantheoneprovidedwiththisparameterlte:Thequerywillmatchdocumentswiththevaluelowerthanorequaltotheoneprovidedwiththisparameterlt:Thequerywillmatchdocumentswiththevaluelowerthantheoneprovidedwiththisparameter

Soforexample,ifwewanttofindallthebooksthathavethevaluefrom1700to1900intheyearfield,wewillrunthefollowingquery:

{

"query":{

"range":{

"year":{

"gte":1700,

"lte":1900

}

}

}

}

www.EBooksWorld.ir

Page 224: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RegularexpressionqueryRegularexpressionqueryallowsustouseregularexpressionsasthequerytext.Rememberthattheperformanceofsuchqueriesdependsonthechosenregularexpression.Ifourregularexpressionwouldmatchmanyterms,thequerywillbeslow.Thegeneralruleisthatthemoretermsmatchedbytheregularexpression,theslowerthequerywillbe.

Anexampleregularexpressionquerylookslikethis:

{

"query":{

"regexp":{

"title":{

"value":"cr.m[ae]",

"boost":10.0

}

}

}

}

TheprecedingquerywillresultinElasticsearchrewritingthequery.Therewrittenquerywillhavethenumberoftermqueriesdependingonthecontentofourindexmatchingthegivenregularexpression.Theboostparameterseeninthequeryspecifiestheboostvalueforthegeneratedqueries.

ThefullregularexpressionsyntaxacceptedbyElasticsearchcanbefoundathttps://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax.

www.EBooksWorld.ir

Page 225: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThemorelikethisqueryOneofthequeriesthatgotamajorreworkinElasticsearch2.0,themorelikethisqueryallowsustoretrievedocumentsthataresimilar(ornotsimilar)totheprovidedtextortothedocumentsthatwereprovided.

Themorelikethisqueryallowsustogetdocumentsthataresimilartotheprovidedtext.Elasticsearchsupportsafewparameterstodefinehowthemorelikethisqueryshouldwork:

fields:Anarrayoffieldsthatthequeryshouldberunagainst.Itdefaultstothe_allfield.like:Thisparametercomesintwoflavors:itallowsustoprovideatextwhichthereturneddocumentsshouldbesimilartooranarrayofdocumentsthatthereturningdocumentshouldbesimilarto.unlike:Thisissimilartothelikeparameter,butitallowsustodefinetextordocumentsthatourreturningdocumentshouldnotbesimilarto.min_term_freq:Theminimumtermfrequency(forthetermsinthedocuments)belowwhichtermswillbeignored.Itdefaultsto2.max_query_terms:Themaximumnumberoftermsthatwillbeincludedinanygeneratedquery.Itdefaultsto25.Thehighervaluemaymeanhigherprecision,butlowerperformance.stop_words:Anarrayofwordsthatwillbeignoredwhencomparingdocumentsandthequery.Itisemptybydefault.min_doc_freq:Theminimumnumberofdocumentsinwhichthetermhastobepresentinordernottobeignored.Itdefaultsto5,whichmeansthatatermneedstobepresentinatleastfivedocuments.max_doc_freq:Themaximumnumberofdocumentsinwhichthetermmaybepresentinordernottobeignored.Bydefault,itisunbounded(setto0).min_word_len:Theminimumlengthofasinglewordbelowwhichawordwillbeignored.Itdefaultsto0.max_word_len:Themaximumlengthofasinglewordabovewhichitwillbeignored.Itdefaultstounbounded(whichmeanssettingthevalueto0).boost_terms:Theboostvaluethatwillbeusedwhenboostingeachterm.Itdefaultsto0.boost:Theboostvaluethatwillbeusedwhenboostingthequery.Itdefaultsto1.include:Thisspecifiesiftheinputdocumentsshouldbeincludedintheresultsreturnedbythequery.Itdefaultstofalse,whichmeansthattheinputdocumentswon’tbeincluded.minimum_should_match:Thiscontrolsthenumberoftermsthatneedtobematchedintheresultingdocuments.Bydefault,itissetto30%.analyzer:Thenameoftheanalyzerthatwillbeusedtoanalyzethetextweprovided.

Anexampleforamorelikethisquerylookslikethis:

www.EBooksWorld.ir

Page 226: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

{

"query":{

"more_like_this":{

"fields":["title","otitle"],

"like":"crimeandpunishment",

"min_term_freq":1,

"min_doc_freq":1

}

}

}

Aswesaidearlier,thelikepropertycanalsobeusedtoshowwhichdocumentstheresultsshouldbesimilarto.Forexample,thefollowingisthequerythatwillusethelikepropertytopointtoagivendocument(notethatthefollowingquerywon’treturndocumentsonourexampledata):

{

"query":{

"more_like_this":{

"fields":["title","otitle"],

"min_term_freq":1,

"min_doc_freq":1,

"like":[

{

"_index":"library",

"_type":"book",

"_id":"4"

}

]

}

}

}

Wecanalsomixthedocumentsandtexttogether:

{

"query":{

"more_like_this":{

"fields":["title","otitle"],

"min_term_freq":1,

"min_doc_freq":1,

"like":[

{

"_index":"library",

"_type":"book",

"_id":"4"

},

"crimeandpunishment"

]

}

}

}

www.EBooksWorld.ir

Page 227: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 228: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CompoundqueriesIntheBasicqueriessectionofthischapter,wediscussedthesimplestqueriesexposedbyElasticsearch.WealsotalkedaboutthepositionawarequeriescalledspanqueriesintheSpanqueriessection.However,thesimpleonesandthespanqueriesarenottheonlyqueriesthatElasticsearchprovides.Thecompoundqueries,aswecallthem,allowustoconnectmultiplequeriestogetheroralterthebehaviorofotherqueries.Youmaywonderifyouneedsuchfunctionality.Yourdeploymentmaynotneedit,butanythingapartfromasimplequerywillprobablyrequirecompoundqueries.Forexample,combiningasimpletermquerywithamatch_phrasequerytogetbettersearchresultsmaybeagoodcandidateforcompoundqueriesusage.

www.EBooksWorld.ir

Page 229: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheboolqueryTheboolqueryallowsustowrapavirtuallyunboundednumberofqueriesandconnectthemwithalogicalvalueusingoneofthefollowingsections:

should:Thequerywrappedintothissectionmayormaynotmatch.Thenumberofshouldsectionsthathavetomatchiscontrolledbytheminimum_should_matchparametermust:Thequerywrappedintothissectionmustmatchinorderforthedocumenttobereturned.must_not:Thequerywhenwrappedintothissectionmustnotmatchinorderforthedocumenttobereturned.

Eachoftheprecedingmentionedsectionscanbepresentmultipletimesinasingleboolquery.Thisallowsustobuildverycomplexqueriesthathavemultiplelevelsofnesting(youcanincludetheboolqueryinanotherboolquery).Rememberthatthescoreoftheresultingdocumentwillbecalculatedbytakingasumofallthewrappedqueriesthatthedocumentmatched.

Inadditiontotheprecedingsections,wecanaddthefollowingparameterstothequerybodytocontrolitsbehavior:

filter:Thisallowsustospecifythepartofthequerythatshouldbeusedasafilter.YoucanreadmoreaboutfiltersintheFilteringyourresultssectioninChapter4,ExtendingYourQueryingKnowledge.boost:Thisspecifiestheboostusedinthequery,defaultingto1.0.Thehighertheboost,thehigherthescoreofthematchingdocument.minimum_should_match:Thisdescribestheminimumnumberofshouldclausesthathavetomatchinorderforthecheckeddocumenttobecountedasamatch.Forexample,itcanbeanintegervaluesuchas2orapercentagevaluesuchas75%.Formoreinformation,refertohttps://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html.disable_coord:ABooleanparameter(defaultstofalse),whichallowsustoenableordisablethescorefactorcomputationthatisbasedonthefractionofallthequerytermsthatadocumentcontains.Weshouldsetittotrueforlessprecisescoring,butslightlyfasterqueries.

Imaginethatwewanttofindallthedocumentsthathavethetermcrimeinthetitlefield.Inaddition,thedocumentsmayormaynothavearangeof1900to2000intheyearfieldandmaynothavethenothingtermintheotitlefield.Suchaquerymadewiththeboolquerywilllookasfollows:

{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

www.EBooksWorld.ir

Page 230: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

},

"should":{

"range":{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}

NoteNotethatthemust,should,andmust_notsectionscancontainasinglequeryoranarrayofqueries.

www.EBooksWorld.ir

Page 231: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thedis_maxqueryThedis_maxqueryisveryusefulasitgeneratesaunionofdocumentsreturnedbyallthesubqueriesandreturnsitastheresult.Thegoodthingaboutthisqueryisthefactthatwecancontrolhowthelowerscoringsubqueriesaffectthefinalscoreofthedocuments.Forthedis_maxquery,wespecifythequeriesusingthequeriesproperty(queryoranarrayofqueries)andthetiebreaker,withthetie_breakerproperty.Wecanalsoincludeadditionalboostbyspecifyingtheboostparameter.

Thefinaldocumentscoreiscalculatedasthesumofscoresofthemaximumscoringqueryandthesumofscoresreturnedfromtherestofthequeries,multipliedbythevalueofthetieparameter.So,thetie_breakerparameterallowsustocontrolhowthelowerscoringqueriesaffectthefinalscore.Ifwesetthetie_breakerparameterto1.0,wegettheexactsum,whilesettingthetieparameterto0.1resultsinonly10percentofthescores(ofallthescoresapartfromthemaximumscoringquery)beingaddedtothefinalscore.

Anexampleofthedis_maxqueryisasfollows:

{

"query":{

"dis_max":{

"tie_breaker":0.99,

"boost":10.0,

"queries":[

{

"match":{

"title":"crime"

}

},

{

"match":{

"author":"fyodor"

}

}

]

}

}

}

Asyoucansee,weincludedthetie_breakerandboostparameters.Inadditiontothat,wespecifiedthequeriesparameterthatholdsthearrayofqueriesthatwillberunandusedtogeneratetheunionofdocumentsforresults.

www.EBooksWorld.ir

Page 232: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheboostingqueryTheboostingquerywrapsaroundtwoqueriesandlowersthescoreofthedocumentsreturnedbyoneofthequeries.Therearethreesectionsoftheboostingquerythatneedtobedefined:thepositivesectionthatholdsthequerywhosedocumentscorewillbeleftunchanged,thenegativesectionwhoseresultingdocumentswillhavetheirscorelowered,andthenegative_boostsectionthatholdstheboostvaluethatwillbeusedtolowerthesecondsection’squeryscore.Theadvantageoftheboostingqueryisthattheresultsofboththequeries(thenegativeandthepositiveones)willbepresentintheresults,althoughthescoresofsomequerieswillbelowered.Forcomparison,ifweweretousetheboolquerywiththemust_notsection,wewouldn’tgettheresultsforsuchaquery.

Let’sassumethatwewanttohavetheresultsofasimpletermqueryforthetermcrimeinthetitlefieldandwantthescoreofsuchdocumentstonotbechanged.However,wealsowanttohavethedocumentsthatrangefrom1800to1900intheyearfield,andthescoresofdocumentsreturnedbysuchaquerytohaveanadditionalboostof0.5.Suchaquerywilllooklikethefollowing:

{

"query":{

"boosting":{

"positive":{

"term":{

"title":"crime"

}

},

"negative":{

"range":{

"year":{

"from":1800,

"to":1900

}

}

},

"negative_boost":0.5

}

}

}

www.EBooksWorld.ir

Page 233: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Theconstant_scorequeryTheconstant_scorequerywrapsanotherqueryandreturnsaconstantscoreforeachdocumentreturnedbythewrappedquery.Wespecifythescorethatshouldbegiventothedocumentsbyusingtheboostproperty,whichdefaultsto1.0.Itallowsustostrictlycontrolthescorevalueassignedforadocumentmatchedbyaquery.Forexample,ifwewanttohaveascoreof2.0forallthedocumentsthathavethetermcrimeinthetitlefield,wesendthefollowingquerytoElasticsearch:

{

"query":{

"constant_score":{

"query":{

"term":{

"title":"crime"

}

},

"boost":2.0

}

}

}

www.EBooksWorld.ir

Page 234: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheindicesqueryTheindicesqueryisusefulwhenexecutingaqueryagainstmultipleindices.Itallowsustoprovideanarrayofindices(theindicesproperty)andtwoqueries,onethatwillbeexecutedifwequerytheindexfromthelist(thequeryproperty)andthesecondthatwillbeexecutedonalltheotherindices(theno_match_queryproperty).Forexample,assumewehaveanaliasnamedbooks,holdingtwoindices:libraryandusers.Whatwewanttodoisusethisalias.However,wewanttorundifferentqueriesdependingonwhichindexisusedforsearching.Anexamplequeryfollowingthislogicwilllookasfollows:

{

"query":{

"indices":{

"indices":["library"],

"query":{

"term":{

"title":"crime"

}

},

"no_match_query":{

"term":{

"user":"crime"

}

}

}

}

}

Intheprecedingquery,thequerydescribedinthequerypropertywasrunagainstthelibraryindexandthequerydefinedintheno_match_querysectionwasrunagainstalltheotherindicespresentinthecluster,whichforourhypotheticalaliasmeanstheusersindex.

Theno_match_querypropertycanalsohaveastringvalueinsteadofaquery.Thisstringvaluecaneitherbeallornone,butitdefaultstoall.Iftheno_match_querypropertyissettoall,thedocumentsfromtheindicesthatdon’tmatchwillbereturned.Settingtheno_match_querypropertytononewillresultinnodocumentsfromtheindicesthatdon’tmatchthequeryfromthatsection.

www.EBooksWorld.ir

Page 235: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 236: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingspanqueriesElasticsearchleveragesLucenespanqueries,whichallowustomakequerieswhensometokensorphrasesarenearothertokensorphrases.Basically,wecancallthempositionawarequeries.Whenusingthestandardnonspanqueries,wearenotabletomakequeriesthatarepositionaware;tosomeextent,thephrasequeriesallowthat,butonlytosomeextent.So,forElasticsearchandtheunderlyingLucene,itdoesn’tmatterifthetermisinthebeginningofthesentenceorattheendornearanotherterm.Whenusingspanqueries,itdoesmatter.

ThefollowingspanqueriesareexposedinElasticsearch:

spantermqueryspanfirstqueryspannearqueryspanorqueryspannotqueryspanwithinqueryspancontainingqueryspanmultiquery

Beforewecontinuewiththedescription,let’sindexadocumenttoacompletelynewindexthatwewillusetoshowhowspanquerieswork.Todothis,weusethefollowingcommand:

curl-XPUT'localhost:9200/spans/book/1'-d'{

"title":"Testbook",

"author":"Testauthor",

"description":"Theworldbreakseveryone,andafterward,somearestrong

atthebrokenplaces"

}'

www.EBooksWorld.ir

Page 237: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AspanAspan,inourcontext,isastartingandendingtokenpositioninafield.Forexample,inourcase,theworldbreakseveryonecouldbeasinglespan,aworldcanbeasinglespantoo.Asyoumayknow,duringanalysis,Lucene,inadditiontotoken,includessomeadditionalparameters,suchaspositioninthetokenstream.PositioninformationcombinedwiththetermsallowsustoconstructspansusingElasticsearchspanqueries(whicharemappedtoLucenespanqueries).Inthenextfewpages,wewilllearnhowtoconstructspansusingdifferentspanqueriesandhowtocontrolwhichdocumentsarematched.

www.EBooksWorld.ir

Page 238: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpantermqueryThespan_termqueryisabuilderfortheotherspanqueries.Aspan_termqueryisaquerysimilartothealreadydiscussedtermquery.Onitsown,itworksjustlikethementionedtermquery–itmatchesaterm.Itsdefinitionissimpleandlooksasfollows(weomittedsomepartsofthequeriesonpurpose,becausewewilldiscussitlater):

{

"query":{

...

"span_term":{

"description":{

"value":"world",

"boost":5.0

}

}

}

}

Asyoucansee,itisverysimilartothestandardtermquery.Theabovequeryisrunagainstthedescriptionfieldandwewanttohavethedocumentsthathavetheworldtermreturned.Wealsospecifiedtheboost,whichisalsoallowed.

Onethingtorememberisthatthespan_termquery,similartothestandardtermquery,isnotanalyzed.

www.EBooksWorld.ir

Page 239: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpanfirstqueryThespanfirstqueryallowsustomatchdocumentsthathavematchesonlyinthefirstpositionsofthefield.Inordertodefineaspanfirstquery,weneedtonestinsideofitanyotherspanquery;forexample,aspantermquerywealreadyknow.So,let’sfindthedocumentthathasthetermworldinthefirsttwopositionsinthedescriptionfield.Wedothatbysendingthefollowingquery:

{

"query":{

"span_first":{

"match":{

"span_term":{"description":"world"}

},

"end":2

}

}

}

Intheresults,wewillgetthedocumentthatwehadindexedinthebeginningofthissection.Inthematchsectionofthespanfirstquery,weshouldincludeatleastasinglespanquerythatshouldbematchedatthemaximumpositionspecifiedbytheendparameter.

So,tounderstandeverythingwell,ifwesettheendparameterto1,weshouldn’tgetourdocumentwiththepreviousquery.So,let’scheckitbysendingthefollowingquery:

{

"query":{

"span_first":{

"match":{

"span_term":{"description":"world"}

},

"end":1

}

}

}

Theresponsetotheprecedingquerywillbeasfollows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

www.EBooksWorld.ir

Page 240: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Soitisworkingasexpected.Thisisbecausethefirstterminourindexwillbethetermtheandnotthetermworldwhichwesearchedfor.

www.EBooksWorld.ir

Page 241: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpannearqueryThespannearqueryallowsustomatchdocumentsthathaveotherspansneareachotherandwecancallthisqueryacompoundqueryasitwrapsanotherspanquery.Forexample,ifwewanttofinddocumentsthathavethetermworldnearthetermeveryone,wewillrunthefollowingquery:

{

"query":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":0,

"in_order":true

}

}

}

Asyoucansee,wespecifyourqueriesintheclausessectionofthespannearquery.Itisanarrayofotherspanqueries.Theslopparameterdefinestheallowednumberoftermsbetweenthespans.Thein_orderparametercanbeusedtolimitthematchesonlytothosedocumentsthatmatchourqueriesinthesameorderthattheyweredefinedin.So,inourcase,wewillgetdocumentsthathaveworldeveryone,butnoteveryoneworldinthedescriptionfield.

Solet’sgetbacktoourquery,rightnowitwouldreturn0results.Ifyoulookatourexampledocument,youwillnoticethatbetweenthetermsworldandeveryone,anadditionaltermispresentandwesettheslopparameterto0(slopwasdiscussedduringthephrasequerydescription).Ifweincreaseitto1,wewillgetourresult.Totestit,let’ssendthefollowingquery:

{

"query":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":1,

"in_order":true

}

}

}

TheresultsreturnedbyElasticsearchareasfollows:

{

"took":6,

"timed_out":false,

"_shards":{

"total":5,

www.EBooksWorld.ir

Page 242: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.10848885,

"hits":[{

"_index":"spans",

"_type":"book",

"_id":"1",

"_score":0.10848885,

"_source":{

"title":"Testbook",

"author":"Testauthor",

"description":"Theworldbreakseveryone,andafterward,someare

strongatthebrokenplaces"

}

}]

}

}

Aswecansee,thealteredquerysuccessfullyreturnedourindexeddocument.

www.EBooksWorld.ir

Page 243: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpanorqueryThespanorqueryallowsustowrapotherspanqueriesandaggregatematchesofallthosethatwe’vewrapped.Similartothespan_nearquery,thespan_orqueryusesthearrayofclausestospecifyotherspanqueries.Forexample,ifwewanttogetthedocumentsthathavethetermworldinthefirsttwopositionsofthedescriptionfield,ortheonesthathavethetermworldnotfurtherthanasinglepositionfromthetermeveryone,wewillsendthefollowingquerytoElasticsearch:

{

"query":{

"span_or":{

"clauses":[

{

"span_first":{

"match":{

"span_term":{"description":"world"}

},

"end":2

}

},

{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":1,

"in_order":true

}

}

]

}

}

}

Theresultoftheprecedingquerywillreturnourindexeddocument.

www.EBooksWorld.ir

Page 244: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpannotqueryThespannotqueryallowsustospecifytwosectionsofqueries.Thefirstistheincludesectionwhichspecifieswhichspanqueriesshouldbematchedandthesecondsectionistheexcludeonewhichspecifiesthespanquerieswhichshouldn’tbeoverlappingthefirstones.Tokeepitsimple,ifaqueryfromtheexcludeonematchesthesamespan(orapartofit)asthequeryfromtheincludesection,suchadocumentwon’tbereturnedasamatchforsuchaspannotquery.Eachofthesesectionscancontainmultiplespanqueries.

So,toillustratethatquery,let’smakeaquerythatwillreturnallthedocumentsthathavethespanconstructedfromasingletermandwhichhavethetermbreaksinthedescriptionfield.Let’salsoexcludethedocumentsthathaveaspanwhichmatchesthetermsworldandeveryoneatthemaximumofasinglepositionfromeachother,whensuchaspanoverlapstheonedefinedinthefirstspanquery.

{

"query":{

"span_not":{

"include":{

"span_term":{"description":"breaks"}

},

"exclude":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":1

}

}

}

}

}

Thefollowingistheresult:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

Asyouwouldhavenoticed,theresultofthequeryisaswewouldhaveexpected.Ourdocumentwasn’tfoundbecausethespanqueryfromtheexcludesectionwasoverlapping

www.EBooksWorld.ir

Page 245: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

thespanfromtheincludesection.

www.EBooksWorld.ir

Page 246: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpanwithinqueryThespan_withinqueryallowsustofinddocumentsthathaveaspanenclosedinanotherspan.Wedefinetwosectionsinthespan_withinquery:thelittleandthebig.Thelittlesectiondefinesaspanquerythatneedstobeenclosedbythespanquerydefinedusingthebigsection.

Forexample,ifwewouldliketofindadocumentthathasthetermworldnearthetermbreaksandthosetermsshouldbeinsideaspanthatisboundbythetermsworldandafterwardnotmorethan10termsfromeachother,thequerythatdoesthatwilllookasfollows:

{

"query":{

"span_within":{

"little":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"breaks"}}

],

"slop":0,

"in_order":false

}

},

"big":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"afterward"}}

],

"slop":10,

"in_order":false

}

}

}

}

}

www.EBooksWorld.ir

Page 247: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpancontainingqueryThespan_contaningquerycanbeseenastheoppositeofthespan_withinquerywejustdiscussed.Itallowsustomatchspansthatoverlapotherspans.Again,weusetwosectionswiththespanqueries:thelittleandthebig.Thelittlesectiondefinesaspanquerythatneedstobeenclosedbythespanquerydefinedusingthebigsection.

Wecanusethesameexample.Ifwewouldliketofindadocumentthathasthetermworldnearthetermbreaks,andthosetermsshouldbeinsideaspanthatisboundbythetermsworldandafterwardnotmorethan10termsfromeachother,thequerythatdoesthatwilllookasfollows:

{

"query":{

"span_containing":{

"little":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"breaks"}}

],

"slop":0,

"in_order":false

}

},

"big":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"afterward"}}

],

"slop":10,

"in_order":false

}

}

}

}

}

www.EBooksWorld.ir

Page 248: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpanmultiqueryThelasttypeofspanquerythatElasticsearchsupportsisthespan_multiquery.Itallowsustowrapanymultitermquerythatwe’vediscussed(thetermquery,therangequery,thewildcardquery,theregexquery,thefuzzyquery,ortheprefixquery)asaspanquery.

Forexample,ifwewanttofinddocumentsthathavethetermstartingwiththeprefixworinthefirsttwopositionsinthedescriptionfield,wecandothatbysendingthefollowingquery:

{

"query":{

"span_multi":{

"match":{

"prefix":{

"description":{"value":"wor"}

}

}

}

}

}

Thereisonethingtoremember–themultitermquerythatwewanttouseneedstobeenclosedinthematchsectionofthespan_multiquery.

www.EBooksWorld.ir

Page 249: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PerformanceconsiderationsAfewwordsattheendofdiscussingspanqueries.Rememberthattheyarecostlierwhenitcomestoprocessingpower,becausenotonlydothetermshavetobematchedbutalsopositionshavetobecalculatedandchecked.ThismeansthatLuceneandthusElasticsearchwillneedmoreCPUcyclestocalculatealltheneededinformationtofindmatchingdocuments.Youcanexpectspanqueriestobeslowerthanthequeriesthatdon’ttakepositionsintoaccount.

www.EBooksWorld.ir

Page 250: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 251: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ChoosingtherightqueryBynowwe’veseenwhatqueriesareavailableinElasticsearch,boththesimpleonesandtheonesthatcangroupotherqueriesaswell.Beforecontinuingwithmorecomplicatedtopics,wewouldliketodiscusswhichofthequeriesshouldbeusedforwhichusecase.Ofcourse,onecoulddedicatethewholebooktoshowingdifferentqueriesusecases,sowewillonlyshowafewofthemtohelpyouseewhatyoucanexpectandwhichquerytouse.

www.EBooksWorld.ir

Page 252: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheusecasesAsyoualreadyknowwhichqueriescanbeusedtofindwhichdata,whatwewouldliketoshowyouareexampleusecasesusingthedataweindexedinChapter2,IndexingYourData.Todothis,wewillstartwithafewguidinglinesonhowtochosethequeryandthenwewillshowyouexampleusecasesanddiscusswhythosequeriescouldbeused.

LimitingresultstogiventagsOneofthesimplestexamplesofqueryingElasticsearchisthesearchforexactterms.ByexactwemeancharactertocharactercomparisonofatermthatisindexedandwrittenintoLuceneinvertedindex.Torunsuchaquery,wecanusethetermqueryprovidedbyElasticsearch.ThisisbecauseitscontentisnotanalyzedbyElasticsearch.Forexample,let’sassumethatwewouldliketosearchforallthebookswiththevaluenovelinthetagsfield,whichasweknowfromthemappingsisnotanalyzed.Todothat,wewouldrunthefollowingcommand:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"term":{

"tags":"novel"

}

}

}'

www.EBooksWorld.ir

Page 253: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SearchingforvaluesinarangeOneofthesimplestqueriesthatcanberunisaquerymatchingdocumentsinagivenrangeofvalues.Usuallysuchqueriesareapartofalargerqueryorafilter.Forexample,aquerythatwouldreturnbookswiththenumberofcopiesfrom1to3inclusive,wouldlookasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"range":{

"copies":{

"gte":1,

"lte":3

}

}

}

}'

BoostingsomeofthematcheddocumentsTherearemanycommonexamplesofusingtheboolquery.Forexample,verysimpleoneslikefindingdocumentshavingalistofterms.Whatwewouldliketoshowyouishowtousetheboolquerytoboostsomeofthedocuments.Forexample,ifwewanttofindallthedocumentsthathaveoneormorecopyandhavetheonesthatarepublishedafter1950,wewillrunthefollowingquery:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"bool":{

"must":[

{

"range":{

"copies":{

"gte":1

}

}

}

],

"should":[

{

"range":{

"year":{

"gt":1950

}

}

}

]

}

}

}'

IgnoringlowerscoringpartialqueriesThedis_maxquery,aswediscussed,allowsustocontrolhowinfluentialthelowerscoring

www.EBooksWorld.ir

Page 254: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

partialqueriesare.Forexample,ifwewouldonlywanttoassignthescoreofthehighestscoringpartialqueryforthedocumentsmatchingcrimepunishmentinthetitlefieldorraskolnikovinthecharactersfield,wewouldrunthefollowingquery:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"fields":["_id","_score"],

"query":{

"dis_max":{

"tie_breaker":0.0,

"queries":[

{

"match":{

"title":"crimepunishment"

}

},

{

"match":{

"characters":"raskolnikov"

}

}

]

}

}

}'

Theresultfortheprecedingquerywilllookasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.70710677,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.70710677

}]

}

}

Nowlet’sseethescoreofthepartialqueriesalone.Todothat,wewillrunthepartialqueriesusingthefollowingcommands:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"fields":["_id","_score"],

"query":{

"match":{

"title":"crimepunishment"

}

www.EBooksWorld.ir

Page 255: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}'

Theresponsefortheprecedingqueryisasfollows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.70710677,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.70710677

}]

}

}

Thefollowingisthenextcommand:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"fields":["_id","_score"],

"query":{

"match":{

"characters":"raskolnikov"

}

}

}'

Theresponseisasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5

}]

}

}

www.EBooksWorld.ir

Page 256: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Asyoucansee,thescoreofthedocumentreturnedbyourdis_maxqueryisequaltothescoreofthehighestscoringpartialquery(thefirstpartialquery).Thatisbecausewesetthetie_breakerpropertyto0.0.

UsingLucenequerysyntaxinqueriesHavingasimplesearchsyntaxisveryusefulforusersandwealreadyhavesuch–theLucenequerysyntax.Usingthequery_stringqueryisanexamplewherewecanleveragethatbyallowingtheuserstotypeinquerieswithadditionalcontrolcharacters.Forexample,ifwewouldliketofindbookshavingthetermscrimeandpunishmentintheirtitleandthefyodordostoevskyphraseintheauthorfield,andnotbeingpublishedbetween2000(exclusive)and2015(inclusive),wewouldusethefollowingcommand:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"query_string":{

"query":"+title:crime+title:punishment+author:\"fyodordostoevsky\"

-copies:{2000TO2015]"

}

}

}'

Asyoucansee,weusedtheLucenequerysyntaxtopassallthematchingrequirementsandweletthequeryparserconstructtheappropriatequery.

HandlinguserquerieswithouterrorsUsingthequery_stringqueryisveryhandy,butitisnoterrortolerant.IfouruserprovidesincorrectLucenesyntax,thequerywillreturnanerror.Becauseofthat,ElasticsearchexposesasecondquerythatsupportsanalysisandfullLucenequerysyntax–thesimple_query_stringquery.Usingsuchaqueryallowsustoruntheuserqueriesandnotcareabouttheparsingerrorsatall.Forexample,let’slookatthefollowingquery:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"query_string":{

"query":"+crime+punishment\"",

"default_field":"title"

}

}

}'

Theresponsewillcontain:

{

"error":{

"root_cause":[{

"type":"query_parsing_exception",

"reason":"Failedtoparsequery[+crime+punishment\"]",

"index":"library",

"line":6,

"col":3

}],

"type":"search_phase_execution_exception",

www.EBooksWorld.ir

Page 257: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"reason":"allshardsfailed",

"phase":"query",

"grouped":true,

"failed_shards":[{

"shard":0,

"index":"library",

"node":"7jznW07BRrqjG-aJ7iKeaQ",

"reason":{

"type":"query_parsing_exception",

"reason":"Failedtoparsequery[+crime+punishment\"]",

"index":"library",

"line":6,

"col":3,

"caused_by":{

"type":"parse_exception",

"reason":"Cannotparse'+crime+punishment\"':Lexicalerror

atline1,column21.Encountered:<EOF>after:\"\"",

"caused_by":{

"type":"token_mgr_error",

"reason":"Lexicalerroratline1,column21.

Encountered:<EOF>after:\"\""

}

}

}

}]

},

"status":400

}

Thismeansthatthequerywasnotproperlyconstructedandaparseerrorhappened.That’swhythesimple_query_stringquerywasintroduced.Itusesaqueryparserthattriestohandleusermistakesandtriestoguesshowthequeryshouldlook.Ourqueryusingthatparserwilllookasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"simple_query_string":{

"query":"+crime+punishment\"",

"fields":["title"]

}

}

}'

Ifyouruntheprecedingquery,youwillseethattheproperdocumentisreturnedbyElasticsearcheventhoughthequeryisnotproperlyconstructed.

AutocompleteusingprefixesAverycommonusecaseistoprovideautocompletefunctionalityontheindexeddata.Asweknow,theprefixqueryisnotanalyzedandworksonthebasisoftermsindexedinthefield.Sotheactualfunctionalitydependsonwhichtokensareproducedduringindexing.Forexample,let’sassumethatwewouldliketoprovideautocompletefunctionalityonanytokeninthetitlefieldandtheuserprovidedwesprefix.Aquerythatwouldmatchsucharequirementlooksasfollows:

www.EBooksWorld.ir

Page 258: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"prefix":{

"title":"wes"

}

}

}'

FindingtermssimilartoagivenoneAverysimpleexampleisusingthefuzzyquerytofinddocumentshavingatermsimilartoagivenone.Forexample,ifwewanttofindallthedocumentshavingavaluesimilartocrimea,wewillrunthefollowingquery:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"fuzzy":{

"title":{

"value":"crimea",

"fuzziness":2,

"max_expansions":50

}

}

}

}'

MatchingphrasesThesimplestpositionawarequery,thephrasequeryallowsustofinddocumentsnotwithatermbuttermspositionedoneafteranother–onesthatformaphrase.Forexample,aquerythatwouldonlymatchdocumentsthathavethewestennichtsneuesphraseintheotitlefieldwouldlookasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_phrase":{

"otitle":"westennichtsneues"

}

}

}'

Spans,spanseverywhereThelastusecasewewouldliketodiscussisamorecomplicatedexampleofpositionawarequeriescalledspanqueries.Imaginethatwewouldliketorunaquerytofinddocumentsthathavethewesternfrontphrasenotmorethanthreepositionsafterthetermquietandallthatjustaftertheallterm?Thiscanbedonewithspanqueriesandthefollowingcommandshowshowsuchquerywilllook:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"span_near":{

"clauses":[

{

"span_term":{

www.EBooksWorld.ir

Page 259: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"title":"all"

}

},

{

"span_near":{

"clauses":[

{

"span_term":{

"title":"quiet"

}

},

{

"span_near":{

"clauses":[

{

"span_term":{

"title":"western"

}

},

{

"span_term":{

"title":"front"

}

}

],

"slop":0,

"in_order":true

}

}

],

"slop":3,

"in_order":true

}

}

],

"slop":0,

"in_order":true

}

}

}'

Notethatthespanqueriesarenotanalyzed.WecanseethatbylookingattheresponseoftheExplainAPI.Toseethatresponse,weshouldrunthesamerequestbody(ourquery)tothe/library/book/1/_explainRESTend-point.Theinterestingpartoftheoutputlooksasfollows:

"description":"weight(spanNear([title:all,spanNear([title:quiet,

spanNear([title:western,title:front],0,true)],3,true)],0,true)in0)

[PerFieldSimilarity],resultof:",

www.EBooksWorld.ir

Page 260: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 261: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryThischapterhasbeenallaboutthequeryingprocess.WestartedbylookingathowtoqueryElasticsearchandwhatElasticsearchdoeswhenitneedstohandlethequery.Wealsolearnedaboutthebasicandcompoundqueries,sowearenowabletousebothsimplequeriesaswellastheonesthatgroupmultiplesmallqueriestogether.Finally,wediscussedhowtochoosetherightqueryforagivenusecase.

Inthenextchapter,wewillextendourqueryknowledge.WewillstartwithfilteringourqueriesandmovetohighlightingpossibilitiesandawaytovalidateourqueriesusingElasticsearchAPI.WewilldiscusssortingofsearchresultsandqueryrewritewhichwillshowuswhathappenstosomequeriesinElasticsearchinternals.

www.EBooksWorld.ir

Page 262: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 263: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter4.ExtendingYourQueryingKnowledgeInthepreviouschapter,wedivedintoElasticsearchqueryingcapabilities.WediscussedhowtoqueryElasticsearchindetailandwelearnedhowElasticsearchqueryingworks.Wenowknowthebasicandcompoundqueriesofthisgreatsearchengineandwhataretheconfigurationoptionsforeachquerytype.Wealsogottoknowwhentouseourqueriesandwediscussedafewusecasesandwhichqueriescanbeusedtohandlethem.Thischapterisdedicatedtoextendingourqueryingknowledge.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

WhatfilteringisandhowtouseitWhathighlightingisandhowtouseitWhatarethehighlightertypesandwhatbenefitstheybringHowtovalidateyourqueriesHowtosortyourqueryresultsWhatqueryrewriteisandhowtocontrolit

www.EBooksWorld.ir

Page 264: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

FilteringyourresultsInthepreviouschapter,wetalkedaboutvarioustypesofqueries.Thecommonpartwasthatwealwayswantedtogetthebestresultsfirst.Thisisthemaindifferencefromthestandarddatabaseapproachwhereeverydocumentmatchesthequeryornot.Inthedatabaseworld,wedonotaskhowgoodthedocumentis;ouronlyinterestliesintheresultsreturned.Whentalkingaboutfulltextsearchenginesthisisdifferent–weareinterestednotonlyintheresults,wearealsointerestedintheirquality.Thereasonisobvious,wearesearchinginunstructureddata,usingtextfieldsthatuselanguageanalysis,stemming,andsoon.Becauseofthat,theinitialresultsofourqueries,inmostcases,giveresultsthatarefarfromoptimal.Thisiswhywhenwetalkaboutsearching,wetalkaboutprecisionanddocumentrecall.

Ontheotherhand,sometimeswewanttolimitthewholesubsetofdocumentstoachosenpart.Forexample,inalibrary,wemaywanttosearchonlytheavailablebooks,therestbeingunimportant.Sometimesthescore,busilycalculatedforthegivenfields,onlyinterfereswiththeoverallscoreandhasnomeaningintermsofaccuracy.Insuchcases,filtersshouldbeusedtolimittheresultsofthequery,butnotinterferewiththecalculatedscore.

PriortoElasticsearch2.0,filterswereindependententitiesfromqueries.Inpractice,almosteveryqueryhaditsowncounterpartinfilters.Therewasthetermqueryandthetermfilter,theboolqueryandtheboolfilter,therangequeryandtherangefilter,andsoon.Fromtheuserpointofview,themostimportantdifferencebetweenthequeriesandthefilterswasscoring.Thefilterdidn’tcalculatescore,whichresultedinthefilterbeingeasilycachedandmoreefficient.Butthisdifferencewasveryinconvenientforusers.WiththereleaseofElasticsearch2.0anditsusageofLucene5.3,filterqueriesweredeprecatedalongwithsometypesofqueriesthatallowedustousefilters.Let’sdiscusshowfilteringworksnowandwhatwecandotoachievethesameorbetterperformanceasbeforeinElasticsearch2.0.

www.EBooksWorld.ir

Page 265: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThecontextisthekeyInElasticsearch2.0,queriescancalculatescoreoromititbychoosingmoreefficientwayofexecution.Thisbehavior,inmanycases,isdoneautomaticallybasedonthecontextwherethequeryisused.Thisisaboutthequeriesthatincludefiltersections,whichremovethedocumentsbasedonsomecriteria.Thesedocumentsareunnecessaryinthereturnedresultsandshouldbeskippedasquicklyaspossiblewithoutaffectingtheoverallscore.Thankstothis,afterdiscardingsomedocumentswecanfocusonlyontherestofthedocuments,calculatingtheirscores,andsortingthembeforereturning.Theexampleofthiscasecanbethemust_notclauseofaBooleanquery.Thedocumentthatmatchesthemust_notclausewillberemovedfromthereturnedresultset,socalculatingthescoreforthedocumentsmatchedbythispartoftheboolquerywouldbeanadditional,unnecessary,andperformanceineffectivework.

Thebestthingaboutallthechangesisthatwedon’tneedtocareaboutifwewanttousefilteringornot.ElasticsearchandtheunderlyingApacheLucenelibrarytakecareofchoosingtherightexecutionmethodforus.

www.EBooksWorld.ir

Page 266: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ExplicitfilteringwithboolqueryAswementionedintheCompoundqueriessectioninChapter3,SearchingYourData,theboolqueryinElasticsearch2.0allowsustoaddafilterexplicitlybyaddingthefiltersectionandincludingaqueryinthatsection.Thisisveryconvenientifwewanttohaveapartofthequerythatneedstomatch,butwearenotinterestedinthescoreforthosedocuments.

Let’slookatthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"term":{

"available":true

}

}

}'

Weseeasimplequerythatshouldreturnallthebooksinourlibraryavailableforborrowing,whichmeansthedocumentswiththeavailablefieldsettotrue.Nowlet’scompareitwiththefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"bool":{

"must":{

"match_all":{}

},

"filter":{

"term":{

"available":true

}

}

}

}

}'

Thisqueryreturnsallthebooks,butitalsocontainsthefiltersection,whichtellsElasticsearchthatweareonlyinterestedintheavailablebooks.Thequerywillreturnthesameresultsasthepreviousquerywe’veseen,ofcoursewhenlookingonlyatthenumberofdocumentsandwhichdocumentsarereturned.Thedifferenceisthescore.Forourexampledata,boththequeriesreturntwobooks.Theresultsreturnedforthefirstquerylookasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

www.EBooksWorld.ir

Page 267: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

}

},{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":0.30685282,

"_source":{

"title":"AllQuietontheWesternFront",

"otitle":"ImWestennichtsNeues",

"author":"ErichMariaRemarque",

"year":1929,

"characters":["PaulBäumer","AlbertKropp","HaieWesthus",

"FredrichMüller","StanislausKatczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,

"section":3

}

}]

}

}

Theresultsforthesecondquerylookasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

www.EBooksWorld.ir

Page 268: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

}

},{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":1.0,

"_source":{

"title":"AllQuietontheWesternFront",

"otitle":"ImWestennichtsNeues",

"author":"ErichMariaRemarque",

"year":1929,

"characters":["PaulBäumer","AlbertKropp","HaieWesthus",

"FredrichMüller","StanislausKatczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,"section":3}

}]

}

}

Ifyoulookatthescoreforthedocumentsineachquery,you’llnoticethedifference.Inthesimpletermquery,Elasticsearch(theLucenelibrary,infact)hasascoreof1.0forthefirstdocumentandascoreof0.30685282forthesecondone.Thisisnotaperfectsolutionbecausetheavailabilitycheckismoreorlessbinaryandwedon’twantittointerferewiththescore.That’swhythesecondqueryisbetterinthiscase.Withtheboolqueryandfiltering,thescoreforthefilterelementisnotcalculatedandthescoreforboththedocumentsisthesame,thatis1.0.

www.EBooksWorld.ir

Page 269: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 270: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HighlightingYouhaveprobablyheardofhighlightingorseenit.YoumaynotevenknowthatyouareactuallyusinghighlightingwhenyouareusingthebiggerandsmallerpublicsearchenginesontheWorldWideWeb(WWW).Whenwetalkabouthighlightingincontextoffulltextsearch,weusuallymeanshowingwhichwordsorphrasesfromthequerywerematchedintheresultingdocuments.Forexample,ifweuseGoogleandsearchforthewordlucene,wewouldseethatwordboldedinthesearchresults:

ItisevenmorevisibleontheMicrosoftBingsearchengine:

Inthischapter,wewillseehowtouseElasticsearchhighlightingcapabilitiestoenhanceourapplicationwithhighlightedresults.

www.EBooksWorld.ir

Page 271: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GettingstartedwithhighlightingThereisnobetterwayofshowinghowhighlightingworksotherthanmakingaqueryandlookingattheresultsreturnedbyElasticsearch.Solet’sdothat.Weassumethatwewouldliketohighlightthetermsthatarematchedinthetitlefieldofourdocumentstoincreasethesearchexperienceofourusers.Bynowyouknowtheexampledatafromtoptobottom,solet’sagainreusethesamedataset.Wewanttomatchthetermcrimeinthetitlefieldandwewanttogethighlightingresults.Oneofthesimplestqueriesthatcanachievethislooksasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"match":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{}

}

}

}'

Theresponsefortheprecedingqueryisasfollows:

{

"took":16,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

},

"highlight":{

"title":["<em>Crime</em>andPunishment"]

}

www.EBooksWorld.ir

Page 272: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}]

}

}

Asyoucansee,apartfromthestandardinformationaboutthedocumentsthatmatchedthequery,wegotanewsectioncalledhighlight.Elasticsearchusedthe<em>HTMLtagasthebeginningofthehighlightingsectionanditsclosingcounterparttoclosethehighlightedsection.ThisisthedefaultbehaviorofElasticsearch,butwewilllearnhowtochangethat.

www.EBooksWorld.ir

Page 273: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

FieldconfigurationInordertoperformhighlighting,theoriginalcontentofthefieldneedstobepresent.Wehavetosetthefieldswewilluseforhighlighting.Thisisdonebyeithermarkingafieldtobestoredorusingthe_sourcefieldwiththosefieldsincluded.Ifthefieldissettobestoredinthemappings,thestoredversionwillbeused,otherwiseElasticsearchwilltrytousethe_sourcefieldandextractthefieldthatneedstobehighlighted.

www.EBooksWorld.ir

Page 274: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UnderthehoodElasticsearchusesApacheLuceneunderthehoodandhighlightingisoneofthefeaturesofthatlibrary.Luceneprovidesthreetypesofhighlightingimplementation:thestandardone,whichwejustused;thesecondonecalledFastVectorHighlighter(https://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.htmlwhichneedstermvectorsandpositionstobeabletowork;andthethirdonecalledPostingsHighlighter

(http://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/postingshighlight/PostingsHighlighter.htmlElasticsearchchoosestherighthighlighterimplementationautomatically.Ifthefieldisconfiguredwiththeterm_vectorpropertysettowith_positions_offsets,FastVectorHighlighterwillbeused.Ifthefieldisconfiguredwiththeindex_optionspropertysettooffsets,PostingsHighlighterwillbeused.Otherwise,thestandardhighlighterwillbeusedbyElasticsearch.

Whichhighlightertousedependsonyourdata,yourqueries,andtheneededperformance.Thestandardhighlighterisageneralusecaseone.However,ifyouwanttohighlightfieldswithlotsofdata,FastVectorHighlighteristherecommendedone.Thethingtorememberaboutitisthatitrequirestermvectorstobepresentandthatwillmakeyourindexslightlylarger.Finally,thefastesthighlighter,thatisalsorecommendedfornaturallanguagehighlighting,isPostingsHighlighter.However,thethingtorememberisthatPostingsHighlighterdoesn’tsupportcomplexqueriessuchasthematch_phrase_prefixqueryandinsuchcaseshighlightingwon’tbereturned.

ForcinghighlightertypeWhileElasticsearchchoosesthehighlightertypeforus,wecanalsoenforcethehighlightingtypeifwereallywantto.Todothat,weneedtosetthetypepropertytooneofthefollowingvalues:

plain:Whenthisvalueisset,Elasticsearchwillusethestandardhighlighterfvh:Whenthisvalueisset,ElasticsearchwilltryusingFastVectorHighlighter.Itwillrequiretermvectorstobeturnedonforthefieldusedforhighlighting.postings:Whenthisvalueisset,ElasticsearchwilltryusingPostingsHighlighter.Itwillrequireoffsetstobeturnedonforthefieldusedforhighlighting

Forexample,tousethestandardhighlighter,wewillrunthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"type":"plain"}

}

}

}'

www.EBooksWorld.ir

Page 275: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ConfiguringHTMLtagsThedefaultbehaviorofhighlightingmechanismmaynotbesuitedforeveryone–notallofuswouldliketohavethe<em>and</em>tagstobeusedforhighlighting.Becauseofthat,Elasticsearchallowsustochangethedefaultbehaviorandchangethetagsthatareusedforthatpurpose.Todothat,weshouldsetthepre_tagsandpost_tagspropertiestothecodesnippetswewantthehighlightingtostartfromandendat;forexample,by<b>and</b>.Thepre_tagsandpost_tagspropertiesarearraysandbecauseofthatwecanprovidemorethanasingleopeningandclosingtagandElasticsearchwilluseeachofthedefinedtagstohighlightdifferentwords.Forexample,ifwewanttouse<b>astheopeningtagand</b>astheclosingtag,ourquerywilllooklikethis:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"pre_tags":["<b>"],

"post_tags":["</b>"],

"fields":{

"title":{}

}

}

}'

TheresultreturnedbyElasticsearchtotheprecedingquerywillbeasfollows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

www.EBooksWorld.ir

Page 276: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"available":true

},

"highlight":{

"title":["<b>Crime</b>andPunishment"]

}

}]

}

}

Asyoucansee,thetermCrimeinthetitlefieldwassurroundedbythetagsofourchoice.

www.EBooksWorld.ir

Page 277: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ControllinghighlightedfragmentsElasticsearchallowsustocontrolthenumberofhighlightedfragmentsreturnedandtheirsizesbyexposingtwoproperties.Thefirstoneisnumber_of_fragments,whichdefinesthenumberoffragmentsreturnedbyElasticsearch(defaultsto5).Settingthispropertyto0causesthewholefieldtobereturned,whichcanbehandyforshortfieldsbutexpensiveforlongerfields.Thesecondpropertyisfragment_sizewhichletsusspecifythemaximumlengthofthehighlightedfragmentsincharactersanddefaultsto100.

Anexamplequeryusingthesepropertieswilllookasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"fragment_size":200,"number_of_fragments":0}

}

}

}'

www.EBooksWorld.ir

Page 278: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GlobalandlocalsettingsThehighlightingpropertieswediscussedpreviouslycanbesetbothonaglobalbasisandperfieldbasis.Theglobaloneswillbeusedforallthefieldsthatdon’toverwritethemandshouldbeplacedonthesamelevelasthefieldssectionofyourhighlighting,forexample,likethis:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"pre_tags":["<b>"],

"post_tags":["</b>"],

"fields":{

"title":{}

}

}

}'

Youcanalsosetthepropertiesforeachfield.Forexample,ifwewouldliketokeepthedefaultbehaviorforallthefieldsexceptourtitlefield,wewoulddothefollowing:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

Asyoucansee,insteadofplacingthepropertiesonthesamelevelasthefieldssection,weplaceditinsidetheemptyJSONobjectthatspecifiesthetitlefieldbehavior.Ofcourse,eachfieldcanbeconfiguredusingdifferentproperties.

www.EBooksWorld.ir

Page 279: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RequirematchingSometimestheremaybeaneed(especiallywhenusingmultiplehighlightedfields)toshowonlythefieldsthatmatchedourquery.Inordertohavesuchbehavior,weneedtosettherequire_field_matchpropertytotrue.Settingthispropertytofalsewillcauseallthetermstobehighlightedevenifafielddidn’tmatchthequery.

Toseehowthatworks,let’screateanewindexcalledusersandlet’sindexasingledocumentthere.Wewilldothatbysendingthefollowingcommand:

curl-XPUT'http://localhost:9200/users/user/1'-d'{

"name":"Testuser",

"description":"Testdocument"

}'

So,let’sassumewewanttohighlightthehitsinbothoftheprecedingfields.Ourcommandsendingthequerytoournewindexwilllooklikethis:

curl-XGET'localhost:9200/users/_search?pretty'-d'{

"query":{

"term":{

"name":"test"

}

},

"highlight":{

"fields":{

"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},

"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

Theresultoftheprecedingquerywillbeasfollows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.19178301,

"hits":[{

"_index":"users",

"_type":"user",

"_id":"1",

"_score":0.19178301,

"_source":{

"name":"Testuser",

"description":"Testdocument"

},

"highlight":{

www.EBooksWorld.ir

Page 280: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"name":["<b>Test</b>user"]

}

}]

}

}

Notethatweonlygothighlightingonthenamefield.Thisisbecauseourquerymatchedonlythatfield.Let’sseewhatwillhappenifwesettherequire_field_matchpropertytofalseanduseacommandsimilartothefollowingone:

curl-XGET'localhost:9200/users/_search?pretty'-d'{

"query":{

"term":{

"name":"test"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},

"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

Nowlet’slookatthemodifiedqueryresults:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.19178301,

"hits":[{

"_index":"users",

"_type":"user",

"_id":"1",

"_score":0.19178301,

"_source":{

"name":"Testuser",

"description":"Testdocument"

},

"highlight":{

"name":["<b>Test</b>user"],

"description":["<b>Test</b>document"]

}

}]

}

}

Asyoucansee,Elasticsearchreturnedhighlightinginboththefieldsnow.

www.EBooksWorld.ir

Page 281: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CustomhighlightingqueryThereareusecaseswhereyourqueriesarecomplicatedandnotreallysuitableforhighlighting,butyoustillwanttousehighlightingfunctionality.Insuchcases,Elasticsearchallowsustohighlightresultsonthebasisofadifferentqueryprovidedusingthehighlight_queryproperty.Anexampleofusingadifferenthighlightingquerylooksasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{

"highlight_query":{

"term":{

"title":"punishment"

}

}

}

}

}

}'

Theprecedingquerywillresultinhighlightingthetermpunishmentinthetitlefield,insteadofthecrimeone.

www.EBooksWorld.ir

Page 282: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThePostingshighlighterItistimetotalkaboutthethirdavailablehighlighter.ItwasaddedinElasticsearch0.90.6andisslightlydifferentfromthepreviousones.PostingsHighlighterisautomaticallyusedwhenthefielddefinitionhasindex_optionssettooffsets.ToillustratehowPostingsHighlighterworks,wewillcreateasimpleindexwithproperconfigurationthatallowsthathighlightertowork.Wewilldothatbyusingthefollowingcommands:

curl-XPUT'localhost:9200/hl_test'

curl-XPOST'localhost:9200/hl_test/doc/_mapping'-d'{

"doc":{

"properties":{

"contents":{

"type":"string",

"fields":{

"ps":{"type":"string","index_options":"offsets"}

}

}

}

}

}'

Ifeverythinggoeswell,weshouldhaveanewindexandthemappings.Themappingshavetwofieldsdefined:onenamedcontentsandthesecondonenamedcontents.ps.Inthissecondcase,weturnedontheoffsetsbyusingtheindex_optionsproperty.ThismeansthatElasticsearchwillusethestandardhighlighterforthecontentsfieldandthepostingshighlighterforthecontents.psfield.

Toseethedifference,wewillindexasingledocumentwithafragmentfromWikipediadescribingthehistoryofBirmingham.Wedothatbyrunningthefollowingcommand:

curl-XPUTlocalhost:9200/hl_test/doc/1-d'{

"contents":"Birmingham''searlyhistoryisthatofaremoteand

marginalarea.Themaincentresofpopulation,powerandwealthinthepre-

industrialEnglishMidlandslayinthefertileandaccessiblerivervalleys

oftheTrent,theSevernandtheAvon.TheareaofmodernBirminghamlayin

between,ontheuplandBirminghamPlateauandwithinthedenselywoodedand

sparselypopulatedForestofArden."

}'

Thelaststepistosendaqueryusingboththehighlighters.Wecandoitinasinglerequestbyusingthefollowingcommand:

curl'localhost:9200/hl_test/_search?pretty'-d'{

"query":{

"term":{

"contents.ps":"modern"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"contents":{},

"contents.ps":{}

www.EBooksWorld.ir

Page 283: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

}'

Ifeverythinggoeswell,youwillfindthefollowingsnippetintheresponsereturnedbyElasticsearch:

"highlight":{

"contents":["valleysoftheTrent,theSevernandtheAvon.Thearea

of<em>modern</em>Birminghamlayinbetween,ontheupland"],

"contents.ps":["Theareaof<em>modern</em>Birminghamlayinbetween,

ontheuplandBirminghamPlateauandwithinthedenselywoodedandsparsely

populatedForestofArden."]

}

Asyousee,boththehighlightersfoundtheoccurrenceofthedesiredword.Thedifferenceisthatthepostingshighlighterreturnsthesmartersnippet–itchecksforthesentenceboundaries.

Let’stryonemorequery:

curl'localhost:9200/hl_test/_search?pretty'-d'{

"query":{

"match_phrase":{

"contents.ps":"centresof"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"contents":{},

"contents.ps":{}

}

}

}'

Wesearchedforthephrasecentresof.Asyoumayexpect,theresultsforthetwohighlighterswilldiffer.Forthestandardhighlighter,runonthecontentsfield,wewillfindthefollowingphraseintheresponse:

"Birminghamsearlyhistoryisthatofaremoteandmarginalarea.Themain

<em>centres</em><em>of</em>population"

Asyoucanclearlysee,thestandardhighlighterdividedthegivenphraseandhighlightedindividualterms.Also,notalloccurrencesofthetermscentresandofwerehighlighted,butonlytheonesthatformedthephrase.

Ontheotherhand,thePostingsHighlighterreturnedthefollowinghighlightedfragment:

"Birminghamsearlyhistoryisthat<em>of</em>aremoteandmarginal

area.","Themain<em>centres</em><em>of</em>population,powerandwealth

inthepre-industrialEnglishMidlandslayinthefertileandaccessible

rivervalleys<em>of</em>theTrent,theSevernandtheAvon.","Thearea

<em>of</em>modernBirminghamlayinbetween,ontheuplandBirmingham

PlateauandwithinthedenselywoodedandsparselypopulatedForest

www.EBooksWorld.ir

Page 284: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

<em>of</em>Arden."

Thisisthesignificantdifference.ThePostingsHighlighterhighlightedallthetermsmatchingthequeryandnotonlythosethatformedthephrase,andreturnedwholesentences.Thisisaverynicefeature,especiallywhenyouwanttodisplaythehighlightingresultsfortheuserintheUIofyourapplication.

www.EBooksWorld.ir

Page 285: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 286: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ValidatingyourqueriesTherearetimeswhenyouarenotintotalcontrolofthequeriesthatyousendtoElasticsearch.Thequeriescanbegeneratedfrommultiplecriteriamakingthemamonsterorevenworse.Theycanbegeneratedbysomekindofawizardwhichmakesithardtotroubleshootandfindthepartthatisfaultyandmakingthequeryfail.Becauseofsuchusecases,ElasticsearchexposestheValidateAPI,whichhelpsusvalidateourqueriesanddiagnosepotentialproblems.

www.EBooksWorld.ir

Page 287: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingtheValidateAPITheusageoftheValidateAPIisverysimple.Insteadofsendingthequerytothe_searchRESTendpoint,wesendittothe_validate/queryone.Andthat’sit.Let’slookatthefollowingcommand:

curl-XGET'localhost:9200/library/_validate/query?pretty'--data-binary'{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range:{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}'

AsimilarquerywasalreadyusedinthisbookinChapter3,SearchingYourData.TheprecedingcommandwilltellElasticsearchtovalidateitandreturntheinformationaboutitsvalidity.TheresponseofElasticsearchtotheprecedingcommandwillbesimilartothefollowingone:

{

"valid":false,

"_shards":{

"total":1,

"successful":1,

"failed":0

}

}

Lookatthevalidattribute.Itissettofalse.Somethingwentwrong.Let’sexecutethequeryvalidationonceagainwiththeexplainparameteraddedinthequery:

curl-XGET'localhost:9200/library/_validate/query?pretty&explain'--data-

binary'{

"query":{

"bool":{

"must":{

"term":{

www.EBooksWorld.ir

Page 288: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"title":"crime"

}

},

"should":{

"range:{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}'

NowtheresultreturnedfromElasticsearchismoreverbose:

{

"valid":false,

"_shards":{

"total":1,

"successful":1,

"failed":0

},

"explanations":[{

"index":"library",

"valid":false,

"error":"[library]QueryParsingException[Failedtoparse];nested:

JsonParseException[Illegalunquotedcharacter((CTRL-CHAR,code10)):has

tobeescapedusingbackslashtobeincludedinname\nat[Source:

org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090;line:

10,column:18]];;com.fasterxml.jackson.core.JsonParseException:Illegal

unquotedcharacter((CTRL-CHAR,code10)):hastobeescapedusing

backslashtobeincludedinname\nat[Source:

org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090;line:

10,column:18]"

}]

}

Noweverythingisclear.Inourexample,wehaveimproperlyquotedtherangeattribute.

NoteYoumaywonderwhyinourcurlqueryweusedthe--data-binaryparameter.ThisparameterproperlypreservesthenewlinecharacterwhensendingaquerytoElasticsearch.Thismeansthatthelineandthecolumnnumberremainintactandit’seasiertofinderrors.Intheothercases,the–dparameterismoreconvenientbecauseit’sshorter.

TheValidateAPIcanalsodetectothererrors,forexample,incorrectformatofanumberorothermapping-relatedissues.Unfortunately,forourapplication,itisnoteasytodetectwhattheproblemisbecauseofalackofstructureintheerrormessages.

www.EBooksWorld.ir

Page 289: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheValidateAPIsupportsmostoftheparametersthataresupportedbystandardElasticsearchqueries,whichinclude:explain,ignore_unavailable,allow_no_indices,expand_wildcards,operation_threading,analyzer,analyze_wildcard,default_operator,df,lenient,lowercase_expanded_terms,andrewrite.

www.EBooksWorld.ir

Page 290: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 291: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SortingdataSofarwe’verunourqueriesandgottheresultsintheorderdeterminedbythescoreofeachdocument.However,itisnotenoughforalltheusecases.Itisreallyhandytobeabletosortourresultsonthebasisofthefieldvalues.Forexample,whenyouaresearchinglogsortime-baseddataingeneral,youprobablywanttohavethemostrecentdatafirst.Inadditiontothat,Elasticsearchallowsustocontrolhowthedocumentsuchbesortednotonlyusingfieldvalues,butalsousingmoresophisticatedsortinglikeonesthatusescriptsorsortingonfieldsthathavemultiplevalues.Wewillcoverallthatinthissection.

www.EBooksWorld.ir

Page 292: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DefaultsortingLet’slookatthefollowingquerythatreturnsallthebookswithatleastoneofthespecifiedwords:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

}

}'

Underthehood,wecanimaginethatElasticsearchseestheprecedingqueryasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":{"_score":"desc"}

}'

Lookatthehighlightedsectionintheprecedingquery.ThisisthedefaultsortingusedbyElasticsearch.Forbettervisibility,wecanchangetheformattingslightlyandshowthehighlightedfragmentasfollows:

"sort":[

{"_score":"desc"}

]

Theprecedingsectiondefineshowthedocumentsshouldbesortedintheresultslist.Inthiscase,Elasticsearchwillshowthedocumentswiththehighestscoreontopoftheresultslist.Thesimplestmodificationistoreversetheorderingbychangingthesortsectiontothefollowingone:

"sort":[

{"_score":"asc"}

]

www.EBooksWorld.ir

Page 293: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SelectingfieldsusedforsortingDefaultsortingisboring,isn’tit?So,let’schangeittosortonthebasisofthevaluesofthefieldspresentinthedocuments.Let’schoosethetitlefield,whichmeansthatthesortsectionofourquerywilllookasfollows:

"sort":[

{"title":"asc"}

]

Unfortunately,thisdoesn’tworkasexpected.AlthoughElasticsearchsortedthedocuments,theorderingissomewhatstrange.Lookcloselyattheresponse.Witheverydocument,Elasticsearchreturnsinformationaboutthesorting;forexample,fortheCrimeandPunishmentbook,thereturneddocumentlookslikethefollowingcode:

{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":null,

"_source":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

},

"sort":["punishment"]

}

Ifyoucomparethetitlefieldandthereturnedsortinginformation,everythingshouldbeclear.Elasticsearch,duringtheanalysisprocess,splitsthefieldintoseveraltokens.Sincesortingisdoneusingasingletoken,Elasticsearchchoosesoneoftheproducedtokens.Itdoesthebestthatitcanbysortingthesetokensalphabeticallyandchoosingthefirstone.Thisisthereasonwhy,inthesortingvalue,wefindonlyasinglewordinsteadofthewholecontentofthetitlefield.IfyouwouldliketoseehowElasticsearchbehaveswhenusingdifferentfieldsforsorting,youcantryfieldssuchascopies:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":[

{"copies":"asc"}

]

}'

Ingeneral,itisagoodideatohaveanotanalyzedfieldforsorting.Wecanusefieldswith

www.EBooksWorld.ir

Page 294: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

multiplevaluesforsorting,but,inmostcases,itdoesn’tmakemuchsenseandhaslimitedusage.

Asanexampleofusingtwodifferentfields,oneforsortingandanotherforsearching,let’schangeourtitlefield.Thechangedtitlefielddefinitionwilllookasfollows:

"title":{

"type":"string",

"fields":{

"sort":{"type":"string","index":"not_analyzed"}

}

}

Afterchangingthetitlefieldinthemappings(we’veusedthesamemappingsasinChapter3,SearchingYourData)andre-indexingthedata,wecantrysortingthetitle.sortfieldandseewhetheritworks.Todothis,wewillneedtosendthefollowingquery:

{

"query":{

"match_all":{}

},

"sort":[

{"title.sort":"asc"}

]

}

Now,itworksproperly.Asyoucansee,weusedthenewfield,thetitle.sortone.Wesetitasnottobeanalyzed,sothereisasingletokenforthatfieldintheindexofElasticsearch.

SortingmodeIntheresponsefromElasticsearch,everydocumentcontainsinformationaboutthevalueusedforsorting.Forexample,let’slookatoneofthedocumentsreturnedbythequeryinwhichweusedthetitlefieldforsorting:

{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":null,

"_source":{

"title":"AllQuietontheWesternFront",

"otitle":"ImWestennichtsNeues",

"author":"ErichMariaRemarque",

"year":1929,

"characters":["PaulBäumer","AlbertKropp","HaieWesthus",

"FredrichMüller","StanislausKatczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,

"section":3

},

"sort":["all"]

www.EBooksWorld.ir

Page 295: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

Thesortingusedinthequerytogettheprecedingdocument,wasasfollows:

"sort":[

{"title":"asc"}

]

However,becausewearesortingonananalyzedfield,whichcontainsmorethanasinglevalue,thesortingdefinitionisinfactequivalenttothelongerform,whichlooksasfollows:

"sort":[

{"title":{"order":"asc","mode":"min"}

]

modedefineswhichtokenshouldbeusedforcomparisonwhensortingonafieldwhichhasmorethanonevalue.Theavailablevalueswecanchoosefromare:

min:Sortingwillusethelowestvalue(orthefirstalphabeticalvalueonthetextbasedfields)max:Sortingwillusethehighestvalue(orthelastalphabeticalvalueonthetextbasedfields)avg:Sortingwillusetheaveragevaluemedian:Sortingwillusethemedianvaluesum:Sortingwillusethesumofallthevaluesinthefield

NoteThemodessuchasmedian,avg,andsumareusefulfornumericalmultivaluedfields,butdon’tmakemuchsensewhenitcomestotextbasedfields.

Notethatsort,inrequestandresponse,isgivenasanarray.Thissuggeststhatwecanuseseveraldifferentorderings.Elasticsearchwillusethenextelementinthesortingdefinitionlisttodetermineorderingbetweenthedocumentsthathavethesamevalueoftheprevioussortingclause.So,ifwehavethesamevalueinthetitlefield,thedocumentswillbesortedbythenextfieldthatwespecify.Forexample,ifwewouldliketogetthedocumentsthathavethemostcopiesandthensortbythetitle,wewillrunthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":[

{"copies":"desc"},{"title":"asc"}

]

}'

www.EBooksWorld.ir

Page 296: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SpecifyingbehaviorformissingfieldsWhataboutwhensomeofthedocumentsthatmatchthequerydon’thavethefieldwewanttosorton?Bydefault,documentswithoutthegivenfieldarereturnedfirstinthecaseofascendingorderandlastinthecaseofdescendingorder.However,sometimesthisisnotexactlywhatwewanttoachieve.

Whenweusesortingonnumericfields,wecanchangethedefaultElasticsearchbehaviorfordocumentswithmissingfields.Forexample,let’stakealookatthefollowingquery:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":[

{

"section":{

"order":"asc",

"missing":"_last"

}

}

]

}'

Notetheextendedformofthesortsectionofourquery.We’veaddedthemissingparametertoit.Bysettingthemissingparameterto_last,Elasticsearchwillplacethedocumentswithoutthegivenfieldatthebottomoftheresultslist.Settingthemissingparameterto_firstwillresultinElasticsearchplacingdocumentswithoutthegivenfieldatthetopoftheresultslist.Itisworthmentioningthatbesidesthe_lastand_firstvalues,Elasticsearchalsoallowsustouseanynumber.Insuchacase,adocumentwithoutadefinedfieldwillbetreatedasthedocumentwiththisgivenvalue.

www.EBooksWorld.ir

Page 297: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DynamiccriteriaAswementionedintheprevioussection,Elasticsearchallowsustosortusingfieldsthathavemultiplevalues.Wecancontrolhowthecomparisonismadeusingscripts.WedothatbyshowingElasticsearchhowtocalculatethevaluethatshouldbeusedforsorting.Let’sassumethatwewanttosortbythefirstvalueindexedinthetagsfield.Let’stakealookatthefollowingexamplequery(notethatrunningthefollowingqueryrequiresthescript.inlinepropertysettoonintheelasticsearch.ymlfile):

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":"doc[\"tags\"].values.size()>0?doc[\"tags\"].values[0]

:\"\u19999\"",

"type":"string",

"order":"asc"

}

}

}'

Intheprecedingexample,wereplacedeverynonexistentvaluewiththeUnicodecodeofacharacterthatshouldbelowenoughinthelist.Themainideaofthiscodeistocheckifourarraycontainsatleastasingleelement.Ifitdoes,thenthefirstvaluefromthearrayisreturned.Ifthearrayisempty,wereturntheUnicodecharacterthatshouldbeplacedatthebottomoftheresultslist.Besidesthescriptparameter,thisoptionofsortingrequiresustospecifytheorder(ascending,inourcase)andtypeparametersthatwillbeusedforthecomparison(wereturnstringfromourscript).

www.EBooksWorld.ir

Page 298: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CalculatescoringwhensortingBydefault,Elasticsearchassumesthatwhenyouusesorting,thescoreiscompletelyunimportant.Usuallyitisagoodassumption;whydoadditionalcomputationswhentheimportanceofthedocumentsisgivenbythesortingformula.Sometimes,however,youwanttoknowhowgoodthedocumentisinrelationtothecurrentquery,evenifthedocumentsarepresentedinadifferentorder.Thisiswhenthetrack_scoresparametershouldbeusedandsettotrue.Anexamplequeryusingitlooksasfollows:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"match_all":{}

},

"track_scores":true,

"sort":[

{"title":{"order":"asc"}}

]

}'

Theprecedingquerycalculatesthescoreforeverydocument.Infact,inourexample,thescoreisboringandisalwaysequalto1.0becauseofthematch_allquerywhichtreatsallthedocumentsasequal.

www.EBooksWorld.ir

Page 299: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 300: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryrewriteWhendebuggingyourqueries,itisveryvaluabletoknowhowallthequeriesareexecuted.Becauseofthat,wedecidedtoincludethesectiononhowqueryrewriteworksinElasticsearch,whyitisused,andhowtocontrolit.Ifyouhaveeverusedqueries,suchastheprefixqueryandthewildcardquery,basicallyanyquerythatissaidtobemultiterm(aquerythatisbuiltofmultipleterms),you’veusedqueryrewritingeventhoughyoumaynothaveknownaboutit.Elasticsearchdoesrewriteforperformancereasons.Therewriteprocessisaboutchangingtheoriginal,expensivequeryintoasetofqueriesthatarefarlessexpensivefromanApacheLucenepointofview,thusspeedingupthequeryexecution.

www.EBooksWorld.ir

Page 301: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PrefixqueryasanexampleThebestwaytoillustratehowtherewriteprocessisdoneinternallyistolookatanexampleandseewhichtermsareusedinsteadoftheoriginalqueryterm.Wewillindexthreedocumentstoourlibrary_itindexbyusingthefollowingcommands:

curl-XPOST'localhost:9200/library_it/book/1'-d'{"title":"Solr4

Cookbook"}'

curl-XPOST'localhost:9200/library_it/book/2'-d'{"title":"Solr3.1

Cookbook"}'

curl-XPOST'localhost:9200/library_it/book/3'-d'{"title":"Mastering

Elasticsearch"}'

Whatwewouldlikeistofindallthedocumentsthatstartwiththeletters.Simpleasthat,werunthefollowingqueryagainstourlibrary_itindex:

curl-XGET'localhost:9200/library_it/_search?pretty'-d'{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"constant_score_boolean"

}

}

}

}'

We’veusedasimpleprefixquery;we’vesaidthatwewouldliketofindallthedocumentswiththelettersinthetitlefield.We’vealsousedtherewritepropertytospecifythequeryrewritemethod,butlet’sskipitfornowaswewilldiscussthepossiblevaluesofthisparameterinthelaterpartofthissection.

Astheresponsetothepreviousquery,wegetthefollowing:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library_it",

"_type":"book",

"_id":"2",

"_score":1.0,

"_source":{

"title":"Solr3.1Cookbook"

}

},{

"_index":"library_it",

www.EBooksWorld.ir

Page 302: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_type":"book",

"_id":"1",

"_score":1.0,

"_source":{

"title":"Solr4Cookbook"

}

}]

}

}

Asyoucansee,inresponsewegotthetwodocumentsthathadthecontentsofthetitlefieldstartingwiththedesiredcharacter.Wedidn’tspecifythemappingsexplicitly,sowereliedonElasticsearch’sabilitytochoosethemappingtypeforus.Aswealreadyknow,forthetextfield,Elasticsearchusesthedefaultanalyzer.Thismeansthatthetermsinourdocumentswillbelowercasedand,becauseofthat,weusedthelowercasedletterinourprefixquery(rememberthattheprefixqueryisnotanalyzed).

www.EBooksWorld.ir

Page 303: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GettingbacktoApacheLuceneNowlet’stakeastepbackandlookatApacheLuceneagain.IfyourecallwhatLuceneinvertedindexisbuiltfrom,youcantellthatitcontainsaterm,count,anddocumentpointer(ifyoudon’trecall,refertotheFulltextsearchingsectioninChapter1,GettingStartedwithElasticsearchCluster).So,let’sseehowthesimplifiedviewoftheindexmaylookfortheprecedingdatawe’veputtothelibrary_itindex:

WhatyouseeinthecolumnwiththeTermtextisquiteimportant.IfyoulookatElasticsearchandApacheLuceneinternals,youcanseethatourprefixquerywasrewrittentothefollowingLucenequery:

ConstantScore(title:solr)

WecanchecktheportionsoftherewriteusingtheElasticsearchAPI.Firstofall,wecanusetheExplainAPIbyrunningthefollowingcommand:

curl-XGET'localhost:9200/library_it/book/1/_explain?pretty'-d'{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"constant_score_boolean"

}

}

}

}'

Theresultwillbeasfollows:

{

"_index":"library_it",

"_type":"book",

"_id":"1",

"matched":true,

"explanation":{

www.EBooksWorld.ir

Page 304: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"value":1.0,

"description":"sumof:",

"details":[{

"value":1.0,

"description":"ConstantScore(title:solr),productof:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":1.0,

"description":"queryNorm",

"details":[]

}]

},{

"value":0.0,

"description":"matchonrequiredclause,productof:",

"details":[{

"value":0.0,

"description":"#clause",

"details":[]

},{

"value":1.0,

"description":"_type:book,productof:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":1.0,

"description":"queryNorm",

"details":[]

}]

}]

}]

}

}

WecanseethatElasticsearchusedaconstantscorequerywiththetermsolragainstthetitlefield.

www.EBooksWorld.ir

Page 305: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryrewritepropertiesWecancontrolhowthequeriesarerewritteninternally.Todothat,weplacetherewriteparameterinsidetheJSONobjectresponsiblefortheactualquery.Forexample:

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"prefix":{

"title":"s",

"rewrite":"constant_score_boolean"

}

}

}'

Therewritepropertycantakethefollowingvalues:

scoring_boolean:ThisrewritemethodtranslateseachgeneratedtermintoaBooleanshouldclauseintheBooleanquery.Thisrewritemethodcausesthescoretobecalculatedforeachdocument.Becauseofthat,thismethodmaybeCPUdemanding.Pleasealsonotethat,forqueriesthathavemanyterms,itmayexceedtheBooleanquerylimit,whichissetto1024.ThedefaultBooleanquerylimitcanbechangedbysettingtheindex.query.bool.max_clause_countpropertyintheelasticsearch.ymlfile.However,rememberthatthemoreBooleanqueriesproduced,thelowerthequeryperformancemaybe.constant_score:Thisrewritemethodchoosesconstant_score_booleanorconstant_score_filterdependingonthequeryandtakingperformanceintoconsideration.Thisisalsothedefaultbehaviorwhentherewritepropertyisnotsetatall.constant_score_boolean:Thisrewritemethodissimilartothescoring_booleanrewritemethoddescribedpreviously,butlessCPUdemandingbecausethescoringisnotcomputedand,insteadofthat,eachtermreceivesascoreequaltothequeryboost(onebydefault,andwhichcanbesetusingtheboostproperty).BecausethisrewritemethodalsoresultsinBooleanshouldclausesbeingcreated,similartothescoring_booleanrewritemethod,thismethodcanalsohitthemaximumBooleanclauseslimit.top_terms_N:ArewritemethodthattranslateseachgeneratedtermintoaBooleanshouldclauseinaBooleanqueryandkeepsthescoresascomputedbythequery.However,unlikethescoring_booleanrewritemethod,itonlykeepsanNnumberoftopscoringtermstoavoidhittingthemaximumBooleanclauseslimitandincreasethefinalqueryperformance.top_terms_blended_freqs_N:ArewritemethodthattranslateseachtermintoaBooleanqueryandtreatthetermsasiftheyhadthesametermfrequency.top_terms_boost_N:Arewritemethodsimilartothetop_terms_None,butthescoresarenotcomputed.Instead,thedocumentsaregivenascoreequaltothevalueoftheboostproperty(onebydefault).

Forexample,ifwewouldlikeourexamplequerytousetop_terms_NwithNequalto2,ourquerywouldlooklikethis:

www.EBooksWorld.ir

Page 306: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

curl-XGET'localhost:9200/library/book/_search?pretty'-d'{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"top_terms_2"

}

}

}

}'

IfyoulookattheresultsreturnedbyElasticsearch,you’llnoticethat,unlikeourinitialquery,thedocumentsweregivenascoredifferentthanthedefault1.0:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.15342641,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"3",

"_score":0.15342641,

"_source":{

"title":"TheCompleteSherlockHolmes",

"author":"ArthurConanDoyle",

"year":1936,

"characters":["SherlockHolmes","Dr.Watson","G.Lestrade"],

"tags":[],

"copies":0,

"available":false,

"section":12

}

}]

}

}

Thescoreisdifferentthanthedefault1.0becausewe’veusedthetop_terms_NrewritetypeandthistypeofqueryrewritekeepsthescoreforNtopscoringterms.

BeforewefinishtheQueryrewritesectionofthischapter,weshouldaskourselvesonelastquestion:whentousewhichrewritetype?Theanswertothisquestiongreatlydependsonyourusecase,but,tosummarize,ifyoucanlivewithlowerprecisionandrelevancy(buthigherperformance),youcangoforthetopNrewritemethod.Ifyouneedhighprecisionandthusmorerelevantqueries(butlowerperformance),choosetheBooleanapproach.

www.EBooksWorld.ir

Page 307: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 308: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryThechapteryoujustfinishedwasagainfocusedonquerying.Weusedfiltersandsawwhathighlightingisandhowtouseit.Welearnedwhatarethehighlightertypesandhowtheycanhelpus.WevalidatedourqueriesandwelearnedhowElasticsearchcanhelpuswhenitcomestosortingourresults.Finally,wediscussedqueryrewriting,whatthatbringsus,andhowwecancontrolit.

Inthenextchapter,wewillgetbacktoindexationtopic.WewilldiscussindexingcomplexJSONobjectssuchastree-likestructuresandindexingdatathatisnotflat.WewillprepareElasticsearchtohandlerelationshipsbetweendocumentsandwewillusetheElasticsearchAPItoupdatethestructureofourindices.

www.EBooksWorld.ir

Page 309: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 310: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter5.ExtendingYourIndexStructureWestartedthepreviouschapterbylearninghowtodealwithrevisedfilteringinElasticsearch2.xandwhattoexpectfromitnow.Wealsoexploredhighlightingandhowitcanhelpusinimprovingtheusers’searchexperience.WediscoveredqueryvalidationinElasticsearchandlearnedthewaysofdatasortinginElasticsearch.Finally,wediscussedqueryrewritingandhowthataffectsourqueries.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

Indexingtree-likestructuresIndexingdatathatisnotflatHandlingdocumentrelationshipsbyusingnestedobjectandparent–childfeaturesModifyingindexstructurebyusingElasticsearchAPI

www.EBooksWorld.ir

Page 311: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Indexingtree-likestructuresTreesareeverywhere.Ifyoudevelopane-commerceshopapplication,yourproductswillprobablybedescribedwiththeuseofcategories.Thethingaboutcategoriesisthatinmostcasestheyarehierarchical.Therearetopcategories,suchaselectronics,music,books,andsoon.Eachofthetoplevelcategoriescanhavenumerouschildrencategories,suchasfictionandscience,andthosecangetevendeeperintosciencefiction,romance,andsoon.Ifyoulookatthefilesystem,thefilesanddirectoriesarearrangedintree-likestructuresaswell.Thisbookcanalsoberepresentedasatree:chapterscontaintopicsandtopicsaredividedintosubtopics.Sothedataaroundusisarrangedintotree-likestructuresandasyoucanimagine,Elasticsearchiscapableofindexingtree-likestructuressothatwecanrepresentthedatainaneasiermanner.Let’scheckhowwecannavigatethroughthistypeofdatausingpath_analyzer.

www.EBooksWorld.ir

Page 312: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DatastructureTobeginwith,let’screateasimpleindexstructurebyusingthefollowingcommand:

curl-XPUT'localhost:9200/path?pretty'-d'{

"settings":{

"index":{

"analysis":{

"analyzer":{

"path_analyzer":{"tokenizer":"path_hierarchy"}

}

}

}

},

"mappings":{

"category":{

"properties":{

"category":{

"type":"string",

"fields":{

"name":{"type":"string","index":"not_analyzed"},

"path":{"type":"string","analyzer":"path_analyzer",

"store":true}

}

}

}

}

}

}'

Asyoucansee,wehaveasingletypecreated–thecategorytype.Wewilluseittostoreandindextheinformationaboutthelocationofourdocumentinthetreestructure.Theideaissimple–wecanshowthelocationofthedocumentasapath,intheexactsamemannerasthefilesanddirectoriesarepresentedonyourharddiskdrive.Forexample,inanautomotiveshop,wecanhave/cars/passenger/sport,/cars/passenger/camper,or/cars/delivery_truck/.However,toachievethat,weneedtoindexthispathintwodifferentways.Firstofall,wewilluseannotanalyzedfieldcalledname,tostoreandindexpathsnameinitsoriginalform.Wewillalsouseafieldcalledpath,whichwillusethepath_analyzeranalyzerwhichwe’vedefinedtoprocessthepathsoitiseasiertosearch.

www.EBooksWorld.ir

Page 313: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AnalysisNow,let’sseewhatElasticsearchwilldowiththecategorypathduringtheanalysisprocess.Toseethis,wewillusethefollowingcommandline,whichusestheanalysisAPIdiscussedintheUnderstandingtheexplaininformationsectionofChapter6,MakeYourSearchBetter:

curl-XGET'localhost:9200/path/_analyze?field=category.path&pretty'-d

'/cars/passenger/sport'

ThefollowingresultswillbereturnedbyElasticsearch:

{

"tokens":[{

"token":"/cars",

"start_offset":0,

"end_offset":5,

"type":"word",

"position":0

},{

"token":"/cars/passenger",

"start_offset":0,

"end_offset":15,

"type":"word",

"position":0

},{

"token":"/cars/passenger/sport",

"start_offset":0,

"end_offset":21,

"type":"word",

"position":0

}]

}

Aswecansee,ourcategorypath/cars/passenger/sportwasprocessedbyElasticsearchanddividedintothreetokens.Thankstothis,wecansimplyfindeverydocumentthatbelongstoagivencategoryoritssubcategoriesusingthetermfilter.Fortheexampletobecomplete,let’sindexasimpledocumentbyusingthefollowingcommand:

curl-XPUT'localhost:9200/path/category/1'-d'{"category":

"/cars/passenger/sport"}'

Anexampleofusingfiltersisasfollows:

curl-XGET'localhost:9200/path/_search?pretty'-d'{

"query":{

"bool":{

"filter":{

"term":{

"category.path":"/cars"

}

}

}

}

}'

www.EBooksWorld.ir

Page 314: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Notethatwealsohavetheoriginalvalueindexedinthecategory.namefield.Thisishandywhenwewanttofinddocumentsfromaparticularpath,ignoringthedocumentsthataredeeperinthehierarchy.

www.EBooksWorld.ir

Page 315: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 316: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexingdatathatisnotflatNotalldataisflatliketheexampleswehaveusedinthebookuntilnow.MostofthedatayouwillencounterwillhavesomestructureandnestedobjectsinsidetherootJSONobject.Ofcourse,ifwearebuildingoursystemthatElasticsearchwillbeapartofandweareincontrolofallthepiecesofit,wecancreateastructurethatisconvenientforElasticsearch.Buteveninsuchcases,flatdataisnotalwaysanoption.Thankfully,Elasticsearchallowsustoindexdatathatisnotflatandthissectionwillshowushowtodothat.

www.EBooksWorld.ir

Page 317: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DataLet’sassumethatwehavethefollowingdata(westoreitinthefilecalledstructured_data.json):

{

"author":{

"name":{

"firstName":"Fyodor",

"lastName":"Dostoevsky"

}

},

"isbn":"123456789",

"englishTitle":"CrimeandPunishment",

"year":1886,

"characters":[

{

"name":"Raskolnikov"

},

{

"name":"Sofia"

}

],

"copies":0

}

Asyoucanseethedataisnotflat–itcontainsarraysandnestedobjects.Ifwewanttocreatemappingsandusetheknowledgethatwe’vegotsofar,wewillhavetoflattenthedata.However,aswealreadysaid,Elasticsearchallowssomedegreeofstructureandweshouldbeabletocreatemappingsthatwillworkfortheprecedingexample.

www.EBooksWorld.ir

Page 318: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ObjectsTheprecedingexampledatashowsthestructuredJSONfile.Asyoucanseeintheexample,ourrootobjecthassomeadditional,simpleproperties,suchasenglishTitle,isbn,year,andcopies.Thesewillbeindexedasnormalfieldsintheindexandwealreadyknowhowtodealwiththem(wediscussedthatintheMappingsconfigurationsectionofChapter2,IndexingYourData).Inadditiontothat,ithasthecharactersarraytypeandtheauthorobject.Theauthorobjecthasanotherobjectnestedwithinit–thenameobject,whichhastwoproperties:firstNameandlastName.Soasyoucansee,wecanhavemultiplenestedobjectsinsideeachother.

www.EBooksWorld.ir

Page 319: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ArraysWehavealreadyusedarraytypedata,butwedidn’ttalkaboutit.Bydefault,allthefieldsinLuceneandthusinElasticsearcharemultivalued,whichmeansthattheycanstoremultiplevalues.InordertosendsuchfieldstoindexingtoElasticsearch,weusetheJSONarraytype,whichisnestedwithintheopeningandclosingsquarebrackets[].Asyoucanseeintheprecedingexample,weusedthearraytypeforthecharactersofourbook.

www.EBooksWorld.ir

Page 320: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MappingsLet’snowlookathowourmappingswouldlooklikeforthebookobjectweshowedearlier.Wealreadysaidthattoindexarrayswedon’tneedanythingspecial.So,inourcase,toindexthecharactersdatawewillneedtoaddfieldsdefinitionsimilartothefollowingone:

"characters":{

"properties":{

"name":{"type":"string"}

}

}

Nothingstrange!Wejustnestthepropertiessectioninsidethearraysname(whichischaractersinourcase)andwedefinethefieldsthere.Astheresultoftheprecedingmappings,wewillgetthecharacters.namemultivaluedfieldintheindex.

Wedosimilarthingforourauthorobject.Wecallthesectionwiththesamenameasitispresentinthedata.Wehavetheauthorobject,butitalsohasthenameobjectnestedinit,sowedothesame–wejustnestanotherobjectinsideit.So,ourmappingsfortheauthorfieldwouldlookasfollows:

"author":{

"properties":{

"name":{

"properties":{

"firstName":{"type":"string"},

"lastName":{"type":"string"}

}

}

}

}

ThefirstNameandlastNamefieldsappearintheindexasauthor.name.firstNameandauthor.name.lastName.

Therestofthefieldsaresimplecoretypes,soI’llskipdiscussingthemastheywerealreadydiscussedintheMappingsconfigurationsectionofChapter2,IndexingYourData.

FinalmappingsSoourfinalmappingsfile,thatwe’vecalledstructured_mapping.json,lookslikethefollowing:

{

"book":{

"properties":{

"author":{

"type":"object",

"properties":{

"name":{

"type":"object",

www.EBooksWorld.ir

Page 321: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"properties":{

"firstName":{"type":"string"},

"lastName":{"type":"string"}

}

}

}

},

"isbn":{"type":"string"},

"englishTitle":{"type":"string"},

"year":{"type":"integer"},

"characters":{

"properties":{

"name":{"type":"string"}

}

},

"copies":{"type":"integer"}

}

}

}

SendingthemappingstoElasticsearchNowthatwehaveourmappingsdone,wewouldliketotestifalltheworkwedidactuallyworks.Thistimewewilluseaslightlydifferenttechniqueofcreatinganindexandputtingthemappings.First,let’screatethelibraryindexwiththefollowingcommand(youneedtodeletethelibraryindexifyoualreadyhaveit):

curl-XPUT'localhost:9200/library'

Now,let’ssendourmappingsforthebooktype:

curl-XPUT'localhost:9200/library/book/_mapping'-d

@structured_mapping.json

Nowwecanindexourexampledata:

curl-XPOST'localhost:9200/library/book/1'-d@structured_data.json

www.EBooksWorld.ir

Page 322: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TobeornottobedynamicAswealreadyknow,Elasticsearchisschema-less,whichmeansthatitcanindexdatawithouttheneedofcreatingthemappingsupfront.WhatElasticsearchwilldointhebackgroundwhenanewfieldisencounteredinthedataisamappingupdate–itwilltrytoguessthefieldtypeandaddittothemappings.ThedynamicbehaviorofElasticsearchisturnedonbydefault,buttheremaybesituationswhereyoumaywanttoturnitoffforsomepartsofyourindex.Inordertodothat,oneshouldaddthedynamicpropertytothegivenfieldandsetittofalse.Thisshouldbedoneonthesamelevelofnestingasthetypepropertyfortheobject,whichshouldn’tbedynamic.Forexample,ifwewantourauthorandnameobjectstonotbedynamic,weshouldmodifytherelevantpartofthemappingsfilesothatitlooksasfollows:

"author":{

"type":"object",

"dynamic":false,

"properties":{

"name":{

"type":"object",

"dynamic":false,

"properties":{

"firstName":{"type":"string","index":"analyzed"},

"lastName":{"type":"string","index":"analyzed"}

}

}

}

}

However,rememberthatinordertoaddnewfieldsforsuchobjects,wewouldhavetoupdatethemappings.

NoteYoucanalsoturnoffthedynamicmappingsfunctionalitybyaddingtheindex.mapper.dynamicpropertytoyourelasticsearch.ymlconfigurationfileandsettingittofalse.

www.EBooksWorld.ir

Page 323: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DisablingobjectindexingThereisoneadditionalthingthatwewouldliketomentionwhenitcomestoobjectshandling–wecandisableindexingaparticularobjectbyusingtheenabledpropertyandsettingittofalse.Theremaybevariousreasonsforthat,suchasnotwantingafieldtobeindexedornotwantingawholeJSONobjecttobeindexed.Forexample,ifwewanttoomitanobjectcalledinformationfromourauthorobject,wewillhavetheauthorobjectdefinitionlookasfollows:

"author":{

"type":"object",

"properties":{

"name":{

"type":"object",

"dynamic":false,

"properties":{

"firstName":{"type":"string","index":"analyzed"},

"lastName":{"type":"string","index":"analyzed"},

"information":{"type":"object","enabled":false}

}

}

}

}

Thedynamicparametercanalsobesettostrict.Thismeansthatnewfieldswon’tbeaddedintothedocumentwhentheyappearandtheindexingofsuchdocumentwillfail.

www.EBooksWorld.ir

Page 324: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 325: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingnestedobjectsNestedobjectscancomeinhandyincertainsituations.Basically,withnestedobjectsElasticsearchallowsustoconnectmultipledocumentstogether–onemaindocumentandmultipledependentones.Themaindocumentandthenestedonesareindexedtogetherandtheyareplacedinthesamesegmentoftheindex(actually,inthesameblockinsidethesegment,neareachother),whichguaranteesthebestperformancewecangetforsuchadatastructure.Thesamegoesforchangingthedocument;unlessyouareusingtheupdateAPI,youneedtoindextheparentdocumentandalltheothernestedonesatthesametime.

NoteIfyouwouldliketoreadmoreabouthownestedobjectsworkontheApacheLucenelevel,thereisaverygoodblogpostwrittenbyMikeMcCandlessathttp://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.

Nowlet’sgetonwithourexampleusecase.Imaginethatwehaveashopwithclothesandwestorethesizeandcolorofeacht-shirt.Ourstandard,non-nestedmappingswilllooklikethis(storedincloth.json):

{

"cloth":{

"properties":{

"name":{"type":"string"},

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}

Tocreatetheshopindexwithoutclothmapping,werunthefollowingcommands:

curl-XPOST'localhost:9200/shop'

curl-XPUT'localhost:9200/shop/cloth/_mapping'[email protected]

Nowimaginethatwehaveat-shirtinourshopthatweonlyhaveinXXLsizeinredandinXLsizeinblack.Soourexampledocumentindexationcommandwilllookasfollows:

curl-XPOST'localhost:9200/shop/cloth/1'-d'{

"name":"Testshirt",

"size":["XXL","XL"],

"color":["red","black"]

}'

However,thereisaproblemwithsuchadatastructure.WhatifoneofourclientssearchesourshopinordertofindtheXXLt-shirtinblack?Let’scheckthatbyrunningthefollowingquery(weassumethatwe’veusedourmappingstocreatetheindexandwe’veindexedourexampledocument):

curl-XGET'localhost:9200/shop/cloth/_search?pretty=true'-d'{

"query":{

www.EBooksWorld.ir

Page 326: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"bool":{

"must":[

{

"term":{"size":"XXL"}

},

{

"term":{"color":"black"}

}

]

}

}

}'

Weshouldgetnoresults,right?ButinfactElasticsearchreturnedthefollowingdocument:

{

(…)

"hits":{

"total":1,

"max_score":0.4339554,

"hits":[{

"_index":"shop",

"_type":"cloth",

"_id":"1",

"_score":0.4339554,

"_source":{

"name":"Testshirt",

"size":["XXL","XL"],

"color":["red","black"]

}

}]

}

}

Thisisbecausethedocumentwasmatched–wehavethevalueswearesearchingforinthesizefieldandinthecolorfield.Ofcourse,thisisnotwhatwewouldliketoget.

So,let’smodifyourmappingstousethenestedobjectstoseparatecolorandsizetodifferentnesteddocuments.Thefinalmappinglooksasfollows(westorethesemappingsinthecloth_nested.jsonfile):

{

"cloth":{

"properties":{

"name":{"type":"string","index":"analyzed"},

"variation":{

"type":"nested",

"properties":{

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}

}

}

www.EBooksWorld.ir

Page 327: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Now,wewillcreateasecondindexcalledshop_nestedusingourmodifiedmappingsbyrunningthefollowingcommands:

curl-XPOST'localhost:9200/shop_nested'

curl-XPUT'localhost:9200/shop_nested/cloth/_mapping'-d

@cloth_nested.json

Asyoucansee,we’veintroducedanewobjectinsideourclothtype–variationone,whichisanestedone(thetypepropertysettonested).Itbasicallysaysthatwewillwanttoindexthenesteddocuments.Now,let’smodifyourdocument.Wewilladdthevariationobjecttoitandthatobjectwillstoretheobjectswithtwoproperties–sizeandcolor.Sotheindexcommandforourmodifiedexampleproductwilllooklikethefollowing:

curl-XPOST'localhost:9200/shop_nested/cloth/1'-d'{

"name":"Testshirt",

"variation":[

{"size":"XXL","color":"red"},

{"size":"XL","color":"black"}

]

}'

We’vestructuredthedocumentsothateachsizeanditsmatchingcolorisaseparatedocument.However,ifyourunourpreviousquery,itwon’treturnanydocuments.Thisisbecauseinordertoqueryfornesteddocuments,weneedtouseaspecializedquery.Sonowourquerylooksasfollows:

curl-XGET'localhost:9200/shop_nested/cloth/_search?pretty=true'-d'{

"query":{

"nested":{

"path":"variation",

"query":{

"bool":{

"must":[

{"term":{"variation.size":"XXL"}},

{"term":{"variation.color":"black"}}

]

}

}

}

}

}'

Andnow,theprecedingquerywillnotreturntheindexeddocument,becausewedon’thaveanesteddocumentthathasthesizeequaltoXXLandcolorblack.

Let’sgetbacktothequeryforasecondtodiscussitbriefly.Asyoucansee,weusedthenestedqueryinordertosearchinthenesteddocuments.Thepathpropertyspecifiesthenameofthenestedobject(yes,wecanhavemultipleofthem).Wejustincludedastandardquerysectionunderthenestedtype.Alsonotethatwespecifiedthefullpathforthefieldnamesinthenestedobjects,whichishandywhenyouhavemultilevelnesting,whichisalsopossible.

www.EBooksWorld.ir

Page 328: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ScoringandnestedqueriesThereisoneadditionalpropertywhenitcomestohandlingnesteddocumentsduringquery.Inadditiontothepathproperty,thereisthescore_modeproperty,whichallowsustodefinehowthescoringiscalculatedfromthenestedqueries.Elasticsearchallowsustosetthescore_modepropertytooneofthefollowingvalues:

avg:Thisisthedefaultvalue.Usingitforthescore_modepropertywillresultinElasticsearchtakingtheaveragevaluecalculatedfromthescoresofthedefinednestedqueries.Calculatedaveragewillbeincludedinthescoreofthemainquery.sum:Usingthisvalueforthescore_modepropertywillresultinElasticsearchtakingasumofthescoresforeachnestedqueryandincludingitinthescoreofthemainquery.min:Usingthisvalueforthescore_modepropertywillresultinElasticsearchtakingthescoreoftheminimumscoringnestedqueryandincludingitinthescoreofthemainquery.max:Usingthisvalueforthescore_modepropertywillresultinElasticsearchtakingthescoreofthemaximumscoringnestedqueryandincludingitinthescoreofthemainquery.none:Usingthisvalueforthescore_modepropertywillresultinnoscorebeingtakenfromthenestedquery.

www.EBooksWorld.ir

Page 329: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 330: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Usingtheparent-childrelationshipIntheprevioussection,wediscussedusingElasticsearchtoindexthenesteddocumentsalongwiththeparentone.However,eventhoughthenesteddocumentsareindexedasseparatedocumentsintheindex,wecan’tchangeasinglenesteddocument(unlessweusetheupdateAPI).Elasticsearchallowsustohavearealparent-childrelationshipandwewilllookatitinthefollowingsection.

www.EBooksWorld.ir

Page 331: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexstructureanddataindexingLet’susethesameexamplethatweusedwhendiscussingthenesteddocuments–thehypotheticalclothstore.Whatwewouldliketohaveistheabilitytoupdatethesizesandcolorswithouttheneedtoindexthewholeparentdocumentaftereachchange.WewillseehowtoachievethatusingElasticsearchparent-childfunctionality.

ChildmappingsFirstwehavetocreateachildindexdefinition.Tocreatechildmappings,weneedtoaddthe_parentpropertywiththenameoftheparenttype,whichwillbeclothinourcase.Inthechildrendocuments,wewanttohavethesizeandthecolorofthecloth.So,thecommandthatwillcreatetheshopindexandthevariationtypewilllookasfollows:

curl-XPOST'localhost:9200/shop'

curl-XPUT'localhost:9200/shop/variation/_mapping'-d'{

"variation":{

"_parent":{"type":"cloth"},

"properties":{

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}'

Andthat’sall.Youdon’tneedtospecifywhichfieldwillbeusedtoconnectthechilddocumentstotheparentones.Bydefault,Elasticsearchwillusethedocuments’uniqueidentifierforthat.Ifyourememberfromthepreviouschapters,theinformationaboutauniqueidentifierispresentintheindexbydefault.

ParentmappingsTheonlyfieldweneedtohaveinourparentdocumentisname.Wedon’tneedanythingmorethanthat.So,inordertocreateourclothtypeintheshopindex,wewillrunthefollowingcommands:

curl-XPUT'localhost:9200/shop/cloth/_mapping'-d'{

"cloth":{

"properties":{

"name":{"type":"string"}

}

}

}'

TheparentdocumentNowwearegoingtoindexourparentdocument.Aswewanttostoretheinformationaboutthesizeandthecolorinthechilddocuments,theonlythingweneedtohaveintheparentdocumentsisthename.Ofcourse,thereisonethingtoremember–ourparentdocumentsneedtobeoftypecloth,becauseofthe_parentpropertyvalueinthechildmappings.Theindexingcommandforourparentdocumentisverysimpleandlooksasfollows:

www.EBooksWorld.ir

Page 332: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

curl-XPOST'localhost:9200/shop/cloth/1'-d'{

"name":"Testshirt"

}'

Ifyoulookattheprecedingcommand,you’llnoticethatourdocumentwillbegiventheidentifier1.

ChilddocumentsToindexthechilddocuments,weneedtoprovideinformationabouttheparentdocumentwiththeuseoftheparentrequestparameter.Thevalueoftheparentparametershouldpointtotheidentifieroftheparentdocument.So,toindextwochilddocumentstoourparentdocument,weneedtorunthefollowingcommandlines:

curl-XPOST'localhost:9200/shop/variation/1000?parent=1'-d'{

"color":"red",

"size":"XXL"

}'

curl-XPOST'localhost:9200/shop/variation/1001?parent=1'-d'{

"color":"black",

"size":"XL"

}'

Andthat’sall.We’veindexedtwoadditionaldocuments,whichareofourvariationtype,butwe’vespecifiedthatourdocumentshaveaparent,thedocumentwithanidentifierof1.

www.EBooksWorld.ir

Page 333: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryingWe’veindexedourdataandnowweneedtouseappropriatequeriestomatchthedocumentswiththedatastoredintheirchildren.Thisisbecause,bydefault,Elasticsearchsearchesonthedocumentswithoutlookingattheparent-childrelations.Forexample,thefollowingquerywillmatchallthreedocumentsthatwe’veindexed(twochildrenandoneparent):

curl-XGET'localhost:9200/shop/_search?q=*&pretty'

Thisisnotwhatwewouldliketoachieve,atleastinmostcases.Usually,weareinterestedinparentdocumentsthathavechildrenmatchingthequery.OfcourseElasticsearchprovidessuchfunctionalitieswithspecializedtypesofqueries.

NoteThethingtorememberthoughisthat,whenrunningqueriesagainstparents,thechildrendocumentswon’tbereturned,andviceversa.

QueryingdatainthechilddocumentsImaginethatwewanttogetclothesthatareoftheXXLsizeandarered.Asyourecall,thesizeandthecoloroftheclothareindexedinthechilddocuments,soweneedaspecializedhas_childquery,tocheckwhichparentdocumentshavechildrenwiththedesiredsizeandcolor.Soanexamplequerythatmatchesourrequirementlooksasfollows:

curl-XGET'localhost:9200/shop/_search?pretty'-d'{

"query":{

"has_child":{

"type":"variation",

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

Thequeryisquitesimple;itisofthehas_childtype,whichtellsElasticsearchthatwewanttosearchinthechilddocuments.Inordertospecifywhichtypeofchildrenweareinterestedin,wespecifythetypepropertywiththenameofthechildtype.Thequeryisprovidedusingthequeryproperty.We’veusedastandardboolquery,whichwe’vealreadydiscussed.Theresultofthequerywillcontainonlythoseparentdocumentsthathavechildrenmatchingourboolquery.Inourcase,thesingledocumentreturnedlooksasfollows:

{

www.EBooksWorld.ir

Page 334: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"took":16,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"shop",

"_type":"cloth",

"_id":"1",

"_score":1.0,

"_source":{

"name":"Testshirt"

}

}]

}

}

Thehas_childqueryallowsustoprovideadditionalparameterstocontrolitsbehavior.Everyparentdocumentfoundmaybeconnectedwithoneormorechilddocuments.Thismeansthateverychilddocumentcaninfluencetheresultingscore.Bydefault,thequerydoesn’tcareaboutthechildrendocuments,howmanyofthemmatched,andwhatistheircontent–itonlymattersiftheymatchthequeryornot.Thiscanbechangedbyusingthescore_modeparameter,whichcontrolsthescorecalculationofthehas_childquery.Thevaluesthisparametercantakeare:

none:Thedefaultone,thescoregeneratedbytherelationis1.0min:Thescoreistakenfromthelowestscoredchildmax:Thescoreistakenfromthehighestscoredchildsum:Thescoreiscalculatedasthesumofthechildscoresavg:Thescoreistakenastheaverageofthechildscores

Let’sseeanexample:

curl-XGET'localhost:9200/shop/_search?pretty'-d'{

"query":{

"has_child":{

"type":"variation",

"score_mode":"sum",

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

www.EBooksWorld.ir

Page 335: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Weusedsumasscore_modewhichresultsinchildrencontributingtothefinalscoreoftheparentdocument–thecontributionisthesumofscoresofeverychilddocumentmatchingthequery.

Andfinally,wecanlimitthenumberofchildrendocumentsthatneedtobematched;wecanspecifyboththemaximumnumberofthechildrendocumentsallowedtobematched(themax_childrenproperty)andtheminimumnumberofchildrendocuments(themin_childrenproperty)thatneedtobematched.Thequeryillustratingtheusageoftheseparametersisasfollows:

curl-XGET'localhost:9200/shop/_search?pretty'-d'{

"query":{

"has_child":{

"type":"variation",

"min_children":1,

"max_children":3,

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

QueryingdataintheparentdocumentsSometimes,wearenotinterestedintheparentdocumentsbutinthechildrendocuments.Ifyouwouldliketoreturnthechilddocumentsthatmatchesagivendataintheparentdocument,Elasticsearchhasaqueryforus–thehas_parentquery.Itissimilartothehas_childquery;however,insteadofthetypeproperty,wespecifytheparent_typepropertywiththevalueoftheparentdocumenttype.Forexample,thefollowingquerywillreturnboththechilddocumentsthatwe’veindexed,butnottheparentdocument:

curl-XGET'localhost:9200/shop/_search?pretty'-d'{

"query":{

"has_parent":{

"parent_type":"cloth",

"query":{

"term":{"name":"test"}

}

}

}

}'

TheresponsefromElasticsearchwillbesimilartothefollowingone:

{

"took":3,

"timed_out":false,

"_shards":{

www.EBooksWorld.ir

Page 336: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"shop",

"_type":"variation",

"_id":"1000",

"_score":1.0,

"_routing":"1",

"_parent":"1",

"_source":{

"color":"red",

"size":"XXL"

}

},{

"_index":"shop",

"_type":"variation",

"_id":"1001",

"_score":1.0,

"_routing":"1",

"_parent":"1",

"_source":{

"color":"black",

"size":"XL"

}

}]

}

}

Similartothehas_childquery,thehas_parentqueryalsogivesusthepossibilityoftuningthescorecalculationofthequery.Inthiscase,score_modehasonlytwooptions:none,thedefaultonewherethescorecalculatedbythequeryisequalto1.0,andscore,whichcalculatesthescoreofthedocumentonthebasisoftheparentdocumentcontents.Anexamplethatusesscore_modeinthehas_parentquerylooksasfollows:

curl-XGET'localhost:9200/shop/_search?pretty'-d'{

"query":{

"has_parent":{

"parent_type":"cloth",

"score_mode":"score",

"query":{

"term":{"name":"test"}

}

}

}

}'

Theonedifferencewiththepreviousexampleisscore_mode.Ifyouchecktheresultsofthesequeries,you’llnoticeonlyasingledifference.Thescoreofallthedocumentsfromthefirstexampleis1.0,whilethescorefortheresultsreturnedbytheprecedingqueryisequalto0.8784157.Inthiscase,allthedocumentsfoundhavethesamescore,because

www.EBooksWorld.ir

Page 337: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

theyhaveacommonparentdocument.

www.EBooksWorld.ir

Page 338: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PerformanceconsiderationsWhenusingElasticsearchparent-childfunctionality,youhavetobeawareoftheperformanceimpactthatithas.Thefirstthingyouneedtorememberisthattheparentandthechilddocumentsneedtobestoredinthesameshardinorderforthequeriestowork.Ifyouhappentohaveahighnumberofchildrenforasingleparent,youmayendupwithshardsnothavingasimilarnumberofdocuments.Becauseofthat,yourqueryperformancecanbelowerononeofthenodes,resultinginthewholequerybeingslower.Also,rememberthatparent-childquerieswillbeslowerthanonesthatrunagainstthedocumentsthatdon’thavearelationshipbetweenthem.Thereisawayofspeedingupjoinsfortheparent-childqueriesatthecostofmemorybyeagerlyloadingthesocalledglobalordinals;however,wewilldiscussthatmethodintheElasticsearchcachessectionofChapter9,ElasticsearchClusterinDetail.

Finally,thefirstquerywillpreloadandcachethedocumentidentifiersusingthedocvalues.Thistakestime.Inordertoimprovetheperformanceofinitialqueriesthatusetheparent-childrelationship,WarmerAPIcanbeused.YoucanfindmoreinformationabouthowtoaddwarmingqueriestoElasticsearchintheWarmingupsectionofChapter10,AdministratingYourCluster.

www.EBooksWorld.ir

Page 339: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 340: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ModifyingyourindexstructurewiththeupdateAPIInthepreviouschapters,wediscussedhowtocreateindexmappingsandindexthedata.Butwhatifyoualreadyhavethemappingscreated,anddataindexed,butyouwanttomodifythestructureoftheindex?Ofcourseonecouldsaythatwecouldjustcreateanewindexwithnewmappings,butthatisnotalwaysapossibility,especiallyinaproductionenvironment.Thisispossibletosomeextent.Forexample,bydefault,ifweindexadocumentwithanewfield,Elasticsearchwilladdthatfieldtotheindexstructure.Let’snowlookathowtomodifytheindexstructuremanually.

NoteForsituationswheremappingchangesareneededandtheyarenotpossiblebecauseofconflictswiththecurrentindexstructure,itisverygoodtousealiases–bothreadandwriteones.WewilldiscussaliasingintheIndexaliasingsectionofChapter10,AdministratingYourCluster.

www.EBooksWorld.ir

Page 341: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThemappingsLet’sassumethatwehavethefollowingmappingsforourusersindexstoredintheuser.jsonfile:

{

"user":{

"properties":{

"name":{"type":"string"}

}

}

}

Asyoucansee,itisverysimple.Itjusthasasinglepropertythatwillholdtheusername.Nowlet’screateanindexcalledusersandlet’susetheprecedingmappingstocreateourtype.Todothat,wewillrunthefollowingcommands:

curl-XPOST'localhost:9200/users'

curl-XPUT'localhost:9200/users/user/_mapping'[email protected]

Ifeverythinggoeswell,wewillhaveourindex(calledusers)andtype(calleduser)created.Sonowlet’strytoaddanewfieldtothemappings.

AddinganewfieldtotheexistingindexInordertoillustratehowtoaddanewfieldtoourmappings,weassumethatwewanttoaddaphonenumbertothedatastoredforeachuser.Inordertodothat,weneedtosendanHTTPPUTcommandtothe/index_name/type_name/_mappingRESTendpointwiththeproperbodythatwillincludeournewfield.Forexample,toaddthementionedphonefield,wewillrunthefollowingcommand:

curl-XPUT'http://localhost:9200/users/user/_mapping'-d'{

"user":{

"properties":{

"phone":{"type":"string",index:"not_analyzed"}

}

}

}'

Similartothepreviouscommandweran,ifeverythinggoeswell,weshouldhaveanewfieldaddedtoourindexstructure.

NoteOfcourse,Elasticsearchwon’treindexourdataorpopulatethenewlyaddedfieldautomatically.Itwilljustalterthemappingsheldbythemasternodeandpopulatethemappingstoalltheothernodesintheclusterandthat’sall.Datareindexationmustbedonebyusortheapplicationthatindexesthedatainourenvironment.Untilthen,theolddocumentswon’thavethenewlyaddedfield.Thisiscrucialtoremember.Ifyoudon’thavetheoriginaldocuments,youcanusethe_sourcefieldtogettheoriginaldatafromElasticsearchandindexthemonceagain.

Toensureeverythingisokay,wecanruntheGETHTTPrequesttothe_mappingRESTend

www.EBooksWorld.ir

Page 342: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

pointandElasticsearchwillreturntheappropriatemappings.Anexamplecommandtogetthemappingsforourusertypeintheusersindexwilllookasfollows:

curl-XGET'localhost:9200/users/user/_mapping?pretty'

ModifyingfieldsofanexistingindexOurusersindexstructurecontainstwofields:nameandphone.Let’simaginethatweindexedsomedatabutafterawhilewedecidedthatwewanttosearchonthephonefieldandwewouldliketochangeitsindexpropertyfromnot_analyzedtoanalyzed.Becausewealreadyknowhowtoaltertheindexstructure,wewillrunthefollowingcommand:

curl-XPUT'http://localhost:9200/users/user/_mapping?pretty'-d'{

"user":{

"properties":{

"phone":{"type":"string","store":"yes","index":"analyzed"}

}

}

}'

WhatElasticsearchwillreturnisaresponseindicatinganerror,whichlooksasfollows:

{

"error":{

"root_cause":[{

"type":"illegal_argument_exception",

"reason":"Mapperfor[phone]conflictswithexistingmappingin

othertypes:\n[mapper[phone]hasdifferent[index]values,mapper[phone]

hasdifferent[store]values,mapper[phone]hasdifferent[omit_norms]

values,cannotchangefromdisabletoenabled,mapper[phone]hasdifferent

[analyzer]]"

}],

"type":"illegal_argument_exception",

"reason":"Mapperfor[phone]conflictswithexistingmappinginother

types:\n[mapper[phone]hasdifferent[index]values,mapper[phone]has

different[store]values,mapper[phone]hasdifferent[omit_norms]values,

cannotchangefromdisabletoenabled,mapper[phone]hasdifferent

[analyzer]]"

},

"status":400

}

Thisisbecausewecan’tchangeafieldthatwassettobenot_analyzedtoonethatisanalyzed.Andnotonlythat,inmostcasesyouwon’tbeabletoupdatethefieldsmapping.Thisisagoodthing,becauseifwewouldbeallowedtochangesuchsettings,wewouldconfuseElasticsearchandLucene.Imaginethatwealreadyhavemanydocumentswiththephonefieldsettonot_analyzedandweareallowedtochangethemappingstoanalyzed.Elasticsearchwouldn’tchangethedatathatwasalreadyindexed,butthequeriesthatareanalyzedwouldbeprocessedwithadifferentlogicandthusyouwouldn’tbeabletoproperlyfindyourdata.

However,togiveyousomeexamplesofwhatisprohibitedandwhatisnot,wedecidedtomentionsomeoftheoperationsforboththecases.Forexample,thefollowingmodificationcanbesafelymade:

www.EBooksWorld.ir

Page 343: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AddinganewtypedefinitionAddinganewfieldAddinganewanalyzer

Thefollowingmodificationsareprohibitedorwillnotwork:

EnablingnormsforafieldChangingafieldtobestoredornotstoredChangingthetypeofthefield(forexample,fromtexttonumeric)ChangingastoredfieldtonotstoredandviceversaChangingthevalueofindexedpropertyChangingtheanalyzerofanalreadyindexeddocument

RememberthattheprecedingmentionedexamplesofallowedandnotallowedupdatesdonotmentionallthepossibilitiesofupdateAPIusageandyouhavetotryforyourselfiftheupdateyouaretryingtodowillwork.

www.EBooksWorld.ir

Page 344: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 345: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryThechapteryoujustfinishedreadingconcentratedonindexingoperationsandhandlingdatathatisnotflatorhaverelationshipsbetweenthedocuments.Westartedwithindexingtree-likestructuresandobjectsinElasticsearch.Wealsousednestedobjectsandlearnedwhentheycanbeused.Wealsousedparent-childfunctionalityandwelearnedhowthisapproachisdifferentcomparedtonesteddocuments.Finally,wemodifiedourindicesstructurewithacallofanAPIandlearnedwhenthisispossible.

Inthenextchapter,wewillgetbacktoqueryingrelatedtopics.WewilllearnhowLucenescoringworks,howtousescriptsinElasticsearch,andhowtohandlemultilingualdata.Wewillaffectscoringusingboostsandwewillusesynonymstoimproveusers’searchresults.Finally,wewilllookatwhatwecandotoseehowourdocumentswerescored.

www.EBooksWorld.ir

Page 346: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 347: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter6.MakeYourSearchBetterInthepreviouschapter,wewerefocusedonindexingoperations;welearnedhowtohandlethestructureddata.Westartedwithindexingtree-likestructuresandJSONobjects.Weusednestedobjectsandindexeddocumentsusingparent-childfunctionality.Finally,attheendofthechapter,weusedElasticsearchAPItomodifyourindicesstructures.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

UnderstandinghowApacheLucenescoringworksUsingscriptingHandlingmultilingualdataUsingboostingtoaffectdocumentscoringUsingsynonymsUnderstandinghowyourdocumentswerescored

www.EBooksWorld.ir

Page 348: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IntroductiontoApacheLucenescoringWhentalkingaboutqueriesandtheirrelevance,wecan’tomittheinformationaboutthescoringandwhereitcomesfrom.Butwhatisascore?Thescoreisapropertythatdescribestherelevanceofadocumentinthecontextofaquery.Inthefollowingsection,wewilltalkaboutthedefaultApacheLucenescoringmechanism–theTF/IDFalgorithmandhowitaffectsthereturneddocument.

NoteTheTF/IDFisnottheonlyavailablealgorithmexposedbyElasticsearch.Formoreinformationabouttheavailablemodels,refertotheAvailablesimilaritymodelssectioninChapter2,IndexingYourData.YoucanalsorefertothebooksMasteringElasticsearchandMasteringElasticsearchSecondEditionpublishedbyPacktPublishing.

www.EBooksWorld.ir

Page 349: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WhenadocumentismatchedWhenadocumentisreturnedbyLucene,itmeansthatitmatchedthequerywesenttoit.Inmostcases,eachoftheresultingdocumentsintheresponseisgivenascore.Thehigherthescore,themorerelevantthedocumentisfromthesearchengine’spointofview,ofcourse,inthecontextofagivenquery.Thismeansthatthescorefactorcalculatedforthesamedocumentontwodifferentquerieswillbedifferent.Becauseofthat,comparingscoresbetweenqueriesusuallydoesn’tmakemuchsense.However,let’sgetbacktothescoring.Tocalculatethescorepropertyforadocument,multiplefactorsaretakenintoaccount:

documentboost:Theboostvaluegivenforadocumentduringindexing.fieldboost:Theboostvaluegivenforafieldduringqueryingandindexing.coord:Thecoordinationfactorthatisbasedonthenumberoftermsthedocumenthas.Itisresponsibleforgivingmorevaluetothedocumentsthatcontainmoresearchtermscomparedtotheotherdocuments.inversedocumentfrequency:Thetermbasedfactorthattellsthescoringformulahowrareforscorepropertycalculation:inversedocumentfrequency”thegiventermis.Thehighertheinversedocumentfrequencythelesscommonthetermis.lengthnorm:Thefieldbasedfactorfornormalizationbasedonthenumberoftermsthegivenfieldcontains.Thelongerthefield,thesmallerboostthisfactorwillgive.Itbasicallymeansthattheshorterdocumentswillbefavored.termfrequency:Thetermbasedfactordescribinghowmanytimesthegiventermoccursinadocument.Thehigherthetermfrequency,thehigherthescoreofthedocumentwillbe.querynorm:Thequerybasednormalizationfactorthatiscalculatedasthesumofthesquaredweightofeachofthequeryterms.Querynormisusedtoallowscorecomparisonbetweenqueries,whichwesaidisnotalwayseasyorpossible.

www.EBooksWorld.ir

Page 350: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DefaultscoringformulaThepracticalformulafortheTF/IDFalgorithmlooksasfollows:

Toadjustyourqueryrelevance,youdon’tneedtorememberthedetailsoftheequation,butitisveryimportanttoknowhowitworks–toatleastbeawarethatthereisanequationyoucananalyze.Wecanseethatthescorefactorforthedocumentisafunctionofqueryqanddocumentd.Therearealsotwofactorsthatarenotdependentdirectlyonqueryterms:coordandqueryNorm.Thesetwoelementsoftheformulaaremultipliedbythesumcalculatedforeachterminthequery.Thesumontheotherhandiscalculatedbymultiplyingthetermfrequencyforthegiventerm,itsinversedocumentfrequency,termboost,andthenorm,whichisthelengthnormwediscussedpreviously.

NoteNotethattheprecedingformulaisapracticalone.YoucanfindmoreinformationabouttheconceptualformulainLuceneJavadocsathttp://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Thegoodthingabouttheprecedingrulesisthatyoudon’tneedtorememberallofthat.Whatyoushouldbeawareofiswhatmatterswhenitcomestothedocumentscore.Basically,thereareafewruleswhichcomefromtheprecedingmentionedequation:

Therarerthematchedtermis,thehigherthescorethedocumentwillhaveTheshorterthedocumentfieldsare(thelesstermstheyhave),thehigherthescorethedocumentwillhaveThehighertheboostforthefieldsis,thehigherthescorethedocumentwillhave

Aswecansee,Lucenegivesahigherscoreforthedocumentsthathavemanyquerytermsmatchedandhaveshorterfields(lesstermsindexed)thatwereusedformatching,anditalsofavorsrarertermsinsteadofthecommonones(ofcourse,theonesthatmatched).

www.EBooksWorld.ir

Page 351: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RelevancymattersInmostcases,wewanttogetthebestmatchingdocuments.However,themostrelevantdocumentsdon’talwaysmeanthesameasthebestmatches.Someusecasesdefineverystrictrulesonwhyagivendocumentshouldbehigherontheresultslist.Forexample,onecouldsaythat,inadditiontothedocumentbeingaperfectmatchintermsofTF/IDFsimilarity,wehavepayingcustomerstoconsider.Dependingonthecustomerplan,wewanttogivemoreimportancetosuchdocuments.Insuchcases,wecouldwantthedocumentsforthecustomersthatpaythemosttobeontopofthesearchresults.Ofcourse,thisisnotrelevantinTF/IDF.

Theotherexampleisyellowpages,wherecustomerspayformoreinformationdescribingthedocument.SuchlargedocumentsmaynotbethemostrelevantonesaccordingtoTF/IDF,soyoumaywanttoadjustthescoringifyouareworkingwithsuchdata.

TheseareverysimpleexamplesandElasticsearchqueriescanbecomereallycomplicated.WewilltalkaboutsuchqueriesintheInfluencingscoreswithqueryboostssectioninthischapter.

Whenworkingonsearchrelevance,youshouldalwaysrememberthatitisnotaonetimeprocess.Yourdatawillchangewithtimeandyourquerieswillneedtobeadjusted.Inmostcases,tuningthequeryrelevancywillbeconstantwork.Youwillneedtoreacttoyourbusinessrulesandneeds,tohowtheusersbehave,andsoon.Itisveryimportanttorememberthatthisprocessisnotasingletimeoneaboutwhichyoucanforget.

www.EBooksWorld.ir

Page 352: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 353: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ScriptingcapabilitiesofElasticsearchElasticsearchhasafewfunctionalitieswherescriptscanbeused.You’vealreadyseenexamplessuchasupdatingdocumentsandsearching.WewillalsousethescriptingcapabilitiesofElasticsearchwhenwediscussaggregations.Eventhoughscriptsseemtobearatheradvancedtopic,wewilllookatthepossibilitiesofferedbyElasticsearch.That’sbecausescriptsarepricelessincertainsituations.

Elasticsearchcanuseseverallanguagesforscripting.Whennotexplicitlydeclared,itassumesthatGroovy(www.groovy-lang.org/)isused.OtherlanguagesavailableoutoftheboxareLuceneexpressionlanguageandMustache(https://mustache.github.io/).Ofcoursewecanuseplugins,whichwillmakeElasticsearchunderstandadditionalscriptinglanguages,suchasJavaScript,MVEL,andPython.Thethingworthmentioningisthatindependentfromthescriptinglanguagethatwechoose,Elasticsearchexposesobjectsthatwecanuseinourscripts.Let’sstartbybrieflylookingatwhattypeofinformationweareallowedtouseinourscripts.

www.EBooksWorld.ir

Page 354: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ObjectsavailableduringscriptexecutionDuringdifferentoperations,Elasticsearchallowsustousedifferentobjectsinourscripts.Todevelopascriptthatfitsourusecase,weshouldbefamiliarwiththeseobjects.

Forexample,duringasearchoperation,thefollowingobjectsareavailable:

_doc(alsoavailableasdoc):Thisisaninstanceoftheorg.elasticsearch.search.lookup.LeafDocLookupobject.Itgivesusaccesstothecurrentdocumentfoundwiththecalculatedscoreandfieldvalues._source:Thisisaninstanceoftheorg.elasticsearch.search.lookup.SourceLookupobject.Itprovidesaccesstothesourceofthecurrentdocumentandthevaluesdefinedinthesource._fields:Thisisaninstanceoftheorg.elasticsearch.search.lookup.LeafFieldsLookupobject.Itcanbeusedtoaccessthevaluesofthedocumentfields.

Ontheotherhand,duringadocumentupdateoperation,theprecedingmentionedvariablesarenotaccessible.Elasticsearchexposesonlythectxobjectwiththe_sourceproperty,whichprovidesaccesstothedocumentcurrentlyprocessedintheupdaterequest.

Aswehavepreviouslyseen,severalmethodsarementionedinthecontextofdocumentfieldsandtheirvalues.Let’snowlookatexamplesofhowtogetthevalueforaparticularfieldusingthepreviouslymentionedobjectavailableduringthesearchoperation.Inthebracketsafterthescriptpiece,youcanseewhatElasticsearchwillreturnforoneofourexampledocumentsfromthelibraryindex(wewillusethedocumentwithidentifier4):

_doc.title.value(and)_source.title(crimeandpunishment)_fields.title.value(null)

Abitconfusing,isn’tit?Duringindexing,theoriginaldocumentisbydefaultstoredinthe_sourcefield.Ofcourse,bydefault,allthefieldsarepresentinthat_sourcefield.Inadditiontothat,thedocumentisparsedandeveryfieldmaybestoredinanindexifitismarkedasstored(thatis,ifthestorepropertyissettotrue;otherwise,bydefault,thefieldsarenotstored).Finally,thefieldvaluemaybeconfiguredasindexed.Thismeansthatthefieldvalueisanalyzedandplacedintheindex.Tosumup,onefieldmaylandinElasticsearchindexinthefollowingways:

Asapartofthe_sourcedocumentAsastoredandunparsedoriginalvalueAsanindexedvaluethatisprocessedbyananalyzer

Inscripts,wehaveaccesstoallthesefieldrepresentations.Theonlyexceptionistheupdateoperation,which,aswe’vementionedbefore,givesusonlyaccesstodocument_sourceaspartofthectxvariable.Youmaywonderwhichversionyoushoulduse.Well,ifyouwantaccesstotheprocessedform,theanswerwillbesimple–usethe_docobject.Whatabout_sourceand_fields?Inmostcases,_sourceisagoodchoice.Itisusually

www.EBooksWorld.ir

Page 355: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

fastandneedslessdiskoperationsthanreadingtheoriginalfieldvaluesfromtheindex.Thisisespeciallytruewhenyouneedtoreadthevaluesofmultiplefieldsinyourscripts;fetchingasingle_sourcefieldisfasterthanfetchingmultipleindependentfieldsfromtheindex.

www.EBooksWorld.ir

Page 356: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ScripttypesElasticsearchallowsustousescriptsinthreedifferentways:

Inlinescripts:ThesourceofthescriptisdirectlydefinedinthequeryInfilescripts:ThesourceisdefinedintheexternalfileplacedintheElasticsearchconfig/scriptsdirectoryAsadocumentinthededicatedindex:Thesourceofthescriptisdefinedasadocumentinaspecialindexavailablebyusingthe/_scriptsAPIend-point

Choosingthewaytodefinescriptsdependsonseveralfactors.Ifyouhavescriptswhichyouwilluseinmanydifferentqueries,thefileorthededicatedindexseemtobethebestsolutions.Thescriptsinfileisprobablylessconvenient,butitispreferredfromthesecuritypointofview;theycan’tbeoverwrittenandinjectedintoyourquerycausingasecuritybreach.

InfilescriptsThisistheonlywaytoallowdynamicscriptingifwedon’twanttoenablequerydynamicscriptinginElasticsearch.Theideaisthateveryscriptusedbythequeriesisdefinedinitsownfileplacedintheconfig/scriptsdirectory.Wewillnowlookatthismethodofusingscripts.Let’screateanexamplefilecalledtag_sort.groovyandlet’splaceitintheconfig/scriptsdirectoryofourElasticsearchinstance(orinstancesifwerunacluster).Thecontentofthementionedfileshouldlooklikethis:

_doc.tags.values.size()>0?_doc.tags.values[0]:'\u19999'

Afterfewseconds,Elasticsearchwillautomaticallyloadanewfile.YoushouldseesomethinglikethefollowingintheElasticsearchlogs:

[2015-08-3013:14:33,005][INFO][script][AlexWilder]

compilingscriptfile[/Users/negativ/Developer/ES/es-

current/config/scripts/tag_sort.groovy]

NoteIfyouhavemulti-nodecluster,youhavetomakesurethatthescriptisavailableoneverynode.

Nowwearereadytousethisscriptinourqueries.YoumayrememberthatweusedexactlythesamescriptintheSortingdatasectioninChapter4,ExtendingYourQueryingKnowledge.Nowthemodifiedquerythatusesourscriptstoredinthefilelooksasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"file":"tag_sort"

www.EBooksWorld.ir

Page 357: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

},

"type":"string",

"order":"asc"

}

}

}'

Wewillreturntothis,butfirst,thenextpossiblewayofdefininginlinescripts.

InlinescriptsInlinescriptsareamoreconvenientwayofusingscripts,especiallyforconstantlychangingqueriesandforad-hocqueries.Themaindrawbackofsuchanapproachissecurity.Ifweallowuserstorunanykindofquery,includingscripts,wecanexposeourElasticsearchinstancetoattackers.SuchattackscanexecutearbitrarycodeontheserverrunningElasticsearchwithrightsequaltotheonesgiventotheuserrunningElasticsearch.Intheworstcasescenario,theattackercouldusesecurityholestogainsuperuserrights.Thisisthereasonwhyinlinescriptsaredisabledbydefault.Aftercarefulconsideration,youcanenablethembyadding:

script.inline:on

Addtheprecedingcommandlinetotheelasticsearch.ymlfile.

Afterallowingtheinlinescripttobeexecuted,wecanrunaquerythatlooksasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"inline":"_doc.tags.values.size()>0?_doc.tags.values[0]:

\"\u19999\""

},

"type":"string",

"order":"asc"

}

}

}'

IndexedscriptsThelastoptionfordefiningscriptsisstoringtheminthededicatedElasticsearchindex.Forthesamesecurityreasons,dynamicexecutionoftheindexedscriptsisbydefaultdisabled.Toenabletheindexedscripts,wehavetoaddasimilarconfigurationoptiontotheoneweaddedtobeabletousetheinlinescripts.Weneedtoaddthefollowinglinetotheelasticsearch.ymlfile:

script.indexed:on

Afteraddingtheprecedingpropertytoallthenodesandrestartingthecluster,wewillbereadytostartusingtheindexedscripts.Elasticsearchprovidesanadditional,dedicated

www.EBooksWorld.ir

Page 358: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

endpointforthispurpose.Let’sstoreourscript:

curl-XPOST'localhost:9200/_scripts/groovy/tag_sort'-d'{

"script":"_doc.tags.values.size()>0?_doc.tags.values[0]:

\"\u19999\""

}'

Thescriptisready,butlet’sdiscusswhatwejustdid.WesentanHTTPPOSTrequesttothespecial_scriptsRESTend-point.Wealsospecifiedthelanguageofthescript(groovyinourcase)andthenameofthescript(tag_sort).Thebodyoftherequestisthescriptitself.

Wecannowmoveontothequery,whichlooksasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"id":"tag_sort"

},

"type":"string",

"order":"asc"

}

}

}'

Aswesee,thequeryispracticallyidenticaltothequeryusedwiththescriptdefinedinafile.Theonlydifferenceisthatweprovidedtheidentifierofthescriptusingtheidparameterinsteadofprovidingthefilename.

www.EBooksWorld.ir

Page 359: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryingwithscriptsIfwelookatanyrequestmadetoElasticsearchthatusesscripts,wewillnoticesomesimilarproperties,whichareasfollows:

script:Thispropertywrapsthescriptdefinition.inline:Thispropertyholdsthecodeofthescriptitself.id:Thispropertydefinestheidentifieroftheindexedscript.file:Thefilenameofthescriptwithouttheextension.lang:Thispropertydefinesthelanguageofthescript.Ifitisomitted,Elasticsearchassumesgroovy.params:Thisobjectcontainstheparametersandtheirvalues.Everydefinedparametercanbeusedinsidethescriptbyspecifyingthatparameter’sname.Theparametersallowustowritecleanercodewhichwillbeexecutedinamoreefficientmanner.Scriptsusingtheparametersareexecutedfasterthancodewithembeddedconstantsbecauseofcaching.

www.EBooksWorld.ir

Page 360: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ScriptingwithparametersAsourscriptsbecomemoreandmorecomplicated,theneedforcreatingmultiple,almostidenticalscriptscanappear.Thesescriptsusuallydifferinthevaluesused,withthelogicbehindthembeingexactlythesame.Inoursimpleexample,weusedahardcodedvalueusedtomarkdocumentswithemptytagslist.Let’schangethistoallowdefinitionofthehardcodedvalue.Let’suseinfilescriptdefinitionandcreateatag_sort_with_param.groovyfilewiththefollowingcontents:

_doc.tags.values.size()>0?_doc.tags.values[0]:tvalue

Theonlychangewe’vemadeistheintroductionoftheparameternamedtvalue,whichcanbesetinthequeryinthefollowingway:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"file":"tag_sort_with_param",

"params":{

"tvalue":"000"

}

},

"type":"string",

"order":"asc"

}

}

}'

Theparamssectiondefinesallthescriptparameters.Inoursimpleexample,we’veonlyusedasingleparameter,butofcoursewecanhavemultipleparametersinasinglequery.

www.EBooksWorld.ir

Page 361: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ScriptlanguagesAswealreadysaid,thedefaultlanguageforscriptingisGroovy.However,wearenotlimitedtoonlyasinglescriptinglanguagewhenusingElasticsearch.Infact,ifyouwouldliketo,youcanevenuseJavatowriteyourscripts.Inadditiontothat,thecommunitybehindElasticsearchprovidesadditionallanguagessupportasplugins.Soifyouarewillingtoinstallplugins,youcanextendthelistofscriptinglanguagesthatElasticsearchsupportsevenfurther.YoumaywonderwhyyouwouldevenconsiderusingascriptinglanguageotherthanthedefaultGroovy.Thefirstreasonisyourownpreferences.Ifyouareapythonenthusiast,youareprobablynowthinkingabouthowtousepythonforyourElasticsearchscripts.Theotherreasoncouldbesecurity.Whenwetalkedabouttheinlinescripts,wetoldyouthattheyareturnedoffbydefault.Thisisnotexactlytrueforallthescriptinglanguagesavailableoutofthebox.TheinlinescriptsaredisabledbydefaultwhenusingGroovy,butyoucanuseLuceneexpressionsandMustachewithoutanyissues.Thisisbecausethoselanguagesaresandboxed,whichmeansthatthesecuritysensitivefunctionsareturnedoff.Andofcourse,thelastfactorwhenchoosingalanguageisperformance.Theoretically,thenativescripts(inJava)shouldhavebetterperformancethanothers,butyoushouldrememberthatthedifferencecanbeinsignificant.Youshouldalwaysconsiderthecostofdevelopmentandmeasureperformance.

www.EBooksWorld.ir

Page 362: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingotherthanembeddedlanguagesUsingGroovyforscriptingisasimpleandsufficientsolutionformostusecases.However,youmayhaveadifferentpreferenceandyoumayliketousesomethingdifferent,suchasJavaScript,Python,orMvel.Beforeusingotherlanguages,wemustinstallanappropriateplugin.YoucanreadmoredetailsaboutpluginsintheElasticsearchpluginssectionofChapter9,ElasticsearchCluster.Fornow,we’lljustrunthefollowingcommandfromtheElasticsearchdirectory:

bin/plugininstalllang-javascript

TheprecedingcommandwillinstallapluginthatwillallowtheusageofJavaScriptasthescriptinglanguage.Theonlychangeweshouldmakeintherequestistoaddtheadditionalinformationaboutthelanguageweareusingforscriptingand,ofcourse,modifythescriptitselftocorrectlyusethenewlanguage.Lookatthefollowingexample:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"inline":"_doc.tags.values.length>0?_doc.tags.values[0]

:\"\u19999\";",

"lang":"javascript"

},

"type":"string",

"order":"asc"

}

}

}'

Asyoucansee,we’veusedJavaScriptforscriptinginsteadofthedefaultGroovy.ThelangparameterinformsElasticsearchaboutthelanguagebeingused.

www.EBooksWorld.ir

Page 363: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingnativecodeIncasethescriptsaretoosloworyoudon’tlikescriptinglanguages,ElasticsearchallowsyoutowriteJavaclassesandusetheminsteadofscripts.Therearetwopossiblewaysofaddingnativescripts:addingclassesdefiningscriptstoElasticsearchclasspathoraddingscriptasafunctionalityprovidedbyaplugin.Wewilldescribethissecondsolutionasitismoreelegant.

ThefactoryimplementationWeneedtoimplementatleasttwoclassestocreateanewnativescript.Thefirstoneisafactoryforourscript.Fornow,let’sfocusonit.Thefollowingsamplecodeillustratesthefactoryforourscript:

packagepl.solr.elasticsearch.examples.scripts;

importjava.util.Map;

importorg.elasticsearch.common.Nullable;

importorg.elasticsearch.script.ExecutableScript;

importorg.elasticsearch.script.NativeScriptFactory;

publicclassHashCodeSortNativeScriptFactoryimplementsNativeScriptFactory

{

@Override

publicExecutableScriptnewScript(@NullableMap<String,Object>params)

{

returnnewHashCodeSortScript(params);

}

@Override

publicbooleanneedsScores(){

returnfalse;

}

}

Theessentialpartsarehighlightedinthecodesnippet.Thisclassshouldimplementtheorg.elasticsearch.script.NativeScriptFactoryclass.Theinterfaceforcesustoimplementtwomethods.ThenewScript()methodtakestheparametersdefinedintheAPIcallandreturnsaninstanceofourscript.Finally,needsScores()informsElasticsearchifwewanttousescoringandwhetheritshouldbecalculated.

ImplementingthenativescriptNowlet’slookattheimplementationofourscript.Theideaissimple–ourscriptwillbeusedforsorting.DocumentswillbeorderedbythehashCode()valueofthechosenfield.Thedocumentswithoutavalueinthedefinedfieldwillbefirstontheresultslist.Weknowthelogicdoesn’tmaketoomuchsense,butitisgoodforpresentationasitissimple.Thesourcecodeforournativescriptlooksasfollows:

www.EBooksWorld.ir

Page 364: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

packagepl.solr.elasticsearch.examples.scripts;

importjava.util.Map;

importorg.elasticsearch.script.AbstractSearchScript;

publicclassHashCodeSortScriptextendsAbstractSearchScript{

privateStringfield="name";

publicHashCodeSortScript(Map<String,Object>params){

if(params!=null&&params.containsKey("field")){

this.field=params.get("field").toString();

}

}

@Override

publicObjectrun(){

Objectvalue=source().get(field);

if(value!=null){

returnvalue.hashCode();

}

return0;

}

}

Firstofall,ourclassinheritsfromtheorg.elasticsearch.script.AbstractSearchScriptclassandimplementstherun()method.Thisiswherewegettheappropriatevaluesfromthecurrentdocument,processitaccordingtoourstrangelogic,andreturntheresult.Youmaynoticethesource()call.Itisexactlythesame_sourceparameterthatweusedwhendealingwithnon-nativescripts.Thedoc()andfields()methodsarealsoavailableandtheyfollowthesamelogicwedescribedearlier.

Thethingworthlookingatishowwe’veusedtheparameters.Weassumethatausercanputthefieldparameter,tellinguswhichdocumentfieldwillbeusedformanipulation.Wealsoprovideadefaultvalueforthisparameter.

TheplugindefinitionWesaidthatwewillinstallourscriptasapartofaplugin.Thisiswhyweneedadditionalfiles.ThefirstfileistheplugininitializationclasswherewetellElasticsearchaboutournewscript:

packagepl.solr.elasticsearch.examples.scripts;

importorg.elasticsearch.plugins.Plugin;

importorg.elasticsearch.script.ScriptModule;

publicclassScriptPluginextendsPlugin{

@Override

publicStringdescription(){

return"Theexampleofnativesortscript";

www.EBooksWorld.ir

Page 365: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

@Override

publicStringname(){

return"naive-sort-plugin";

}

publicvoidonModule(finalScriptModulemodule){

module.registerScript("native_sort",

HashCodeSortNativeScriptFactory.class);

}

}

Theimplementationiseasy.Thedescription()andname()methodsareonlyforinformation,solet’sfocusontheonModule()method.Inourcase,weneedaccesstothescriptmodule–Elasticsearchservicewithscriptsandscriptinglanguages.ThisiswhywedefineonModule()withoneScriptModuleargument.ThankstoElasticsearchmagic,wecanusethismoduleandregisterourscriptsoitcanbefoundbytheengine.WehaveusedtheregisterScript()method,whichtakesthescriptnameandthepreviouslydefinedfactoryclass.

Thesecondneededfileisaplugindescriptorfile:plugin-descriptor.properties.ItdefinestheconstantsusedbytheElasticsearchpluginsubsystem.Withoutmorethinking,let’slookatthecontentsofthisfile:

jvm=true

classname=pl.solr.elasticsearch.examples.scripts.ScriptPlugin

elasticsearch.version=2.2.0

version=0.0.1-SNAPSHOT

name=native_script

description=ExampleNativeScripts

java.version=1.7

Theappropriatelineshavethefollowingmeaning:

jvm:tellsElasticsearchthatourfilecontainsJavacodeclassname:describesthemainclasswithplugindefinitionelasticsearch.versionandjava.version:tellsusabouttheElasticsearchversionthatissupportedbythepluginandtheJavaversionthatisneedednameanddescription:Informativenameandshortdescriptionofourplugin

Andthat’sit.Wehaveallthefilesneededtorunourscript.Pleasenotethatyoucanhavemorethanasinglescriptpackedasasingleplugin.

InstallingthepluginNowit’stimetoinstallournativescriptembeddedintheplugin.AfterpackingthecompiledclassesasaJARarchive,weshouldputitintheElasticsearchplugins/native-scriptdirectory.Thenative-scriptpartisarootdirectoryforourpluginandyoumaynameitasyouwish.Inthisdirectoryyoualsoneedthepreparedplugin-descriptor.propertiesfile.ThismakesourpluginvisibletoElasicsearch.

www.EBooksWorld.ir

Page 366: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RunningthescriptAfterrestartingElasticsearch(orthewholeclusterifyourunmorethanasinglenode),wecanstartsendingthequeriesthatuseournativescript.Forexample,wewillsendaquerythatusesourpreviouslyindexeddatafromthelibraryindex.Thisexamplequerylooksasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"script":"native_sort",

"lang":"native",

"params":{

"field":"otitle"

}

},

"type":"string",

"order":"asc"

}

}

}'

Notetheparamspartofthequery.Inthiscall,wewanttosortontheotitlefield.Weprovidethescriptnamenative_sortandthescriptlanguagenative.Thisisrequired.Ifeverythinggoeswell,weshouldseeourresultssortedbyourcustomsortlogic.IfwelookattheresponsefromElasticsearch,wewillseethatthedocumentswithouttheotitlefieldareatthefirstfewpositionsoftheresultslistandtheirsortvalueis0.

www.EBooksWorld.ir

Page 367: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 368: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SearchingcontentindifferentlanguagesUntilnow,whendiscussinglanguageanalysis,we’vetalkedmostlyabouttheory.Wedidn’tseeanexampleregardinglanguageanalysis,handlingmultiplelanguagesthatourdatacanconsistof,andsoon.Nowthiswillchange,asthissectionisdedicatedtoinformationabouthowwecanhandledatainmultiplelanguages.

www.EBooksWorld.ir

Page 369: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HandlinglanguagesdifferentlyAsyoualreadyknow,Elasticsearchallowsustochoosedifferentanalyzersforourdata.Wecanhaveourdatadividedonthebasisofwhitespaces,orhavethemlowercased,andsoon.Thiscanusuallybedoneregardlessofthelanguage–thesametokenizationonthebasisofwhitespaceswillworkforEnglish,German,andPolish,althoughitwon’tworkforChinese.However,whatifyouwanttofinddocumentsthatcontainwordssuchascatandcatsbyonlysendingthewordcattoElasticsearch?Thisiswherelanguageanalysiscomesintoplaywithstemmingalgorithmsfordifferentlanguages,whichallowtheanalyzedwordstobereducedtotheirrootforms.Andnowtheworstpart–wecan’tuseonegeneralstemmingalgorithmforallthelanguagesintheworld;wehavetochooseoneappropriatelanguage.Thefollowingsectionsinthechapterwillhelpyouwithsomepartsofthelanguageanalysisprocess.

www.EBooksWorld.ir

Page 370: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HandlingmultiplelanguagesThereareafewwaysofhandlingmultiplelanguagesinElasticsearchandallofthemhavesomeprosandcons.Wewon’tbediscussingeverything,butjustforthepurposeofgivingyouanidea,afewofthosemethodsareasfollows:

StoringdocumentsindifferentlanguagesasdifferenttypesStoringdocumentsindifferentlanguagesinseparateindicesStoringlanguagedataindifferentfieldsofasingledocument

Forthepurposeofthebook,wewillfocusonasinglemethod–theonethatallowsstoringdocumentsindifferentlanguagesinasingleindex.Wewillfocusonaproblemwherewehaveasingletypeofdocument,buteachdocumentmaycomefromanywhereintheworldandthuscanbewritteninmultiplelanguages.Also,wewouldliketoenableouruserstousealltheanalysiscapabilities,suchasstemmingandstopwordsfordifferentlanguages,notonlyforEnglish.

NoteNotethatthestemmingalgorithmsperformdifferentlyfordifferentlanguages,bothintermsofanalysisperformanceandtheresultingterms.Forexample,Englishstemmersareverygood,butyoucanrunintoissueswithEuropeanlanguages,suchasGerman.

www.EBooksWorld.ir

Page 371: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DetectingthelanguageofthedocumentBeforewecontinuewithshowingyouhowtosolveourproblemwithhandlingmultiplelanguagesinElasticsearch,wewouldliketotellyouaboutoneadditionalthing,thatislanguagedetection.Therearesituationswhereyoujustdon’tknowwhatlanguageyourdocumentorqueryarein.Insuchcases,languagedetectionlibrariesmaybeagoodchoice,especiallywhenusingJavaasyourprogramminglanguageofchoice.Someofthelibrariesareasfollows:

ApacheTika(http://tika.apache.org/)Languagedetection(https://github.com/shuyo/language-detection)

Thelanguagedetectionlibraryclaimstohaveover99percentprecisionfor53languages;that’salotifyouaskus.

Youshouldremember,though,thatdatalanguagedetectionwillbemorepreciseforlongertext.Becausethetextofqueriesisusuallyshort,youcanexpecttohavesomedegreeoferrorduringquerylanguageidentification.

www.EBooksWorld.ir

Page 372: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SampledocumentLet’sstartwithintroducingasampledocument,whichisasfollows:

{

"title":"Firsttestdocument",

"content":"Thisisatestdocument"

}

Asyoucansee,thedocumentisprettysimple;itcontainsthefollowingtwofields:

title:Thisfieldholdsthetitleofthedocumentcontent:Thisfieldholdstheactualcontentofthedocument

Thisdocumentisquitesimple,but,fromthesearchpointofview,theinformationaboutdocumentlanguageismissing.Whatweshoulddoisenrichthedocumentbyaddingtheneededinformation.Wecandothatbyusingoneofthepreviouslymentionedlibraries,whichwilltrytodetectthelanguage.

Afterwehavethelanguagedetected,weinformElasticsearchwhichanalyzershouldbeusedandmodifythedocumenttodirectlyshowthelanguageofeachfield.Eachofthefieldswouldhavetobeanalyzedbyalanguageanalyzerdedicatedtothedetectedlanguage.

NoteAfulllistoftheselanguageanalyzerscanbefoundathttps://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html).

Ifadocumentiswritteninalanguagethatwearenotsupporting,wewilljustfallbacktosomedefaultfieldwiththedefaultanalyzer.Forexample,ourprocessedandpreparedforindexingdocumentcouldlooklikethis:

{

"title_english":"Firsttestdocument",

"content_english":"Thisisatestdocument"

}

Thethingisthatallthisprocessingwe’vementionedwouldhavetobedoneoutsideofElasticsearchorinsomekindofcustompluginthatwouldimplementthementionedlogic.

NoteInthepreviousversionsofElasticsearch,therewasapossibilityofchoosingananalyzerbasedonthevalueofanadditionalfield,whichcontainedtheanalyzername.Thiswasamoreconvenientandelegantwaybutintroducedsomeuncertaintyaboutthefieldcontents.Youalwayshadtodeliveraproperanalyzerwhenusingthegivenfieldorstrangethingshappened.TheElasticsearchteammadethedifficultdecisionandremovedthisfeature.

Thereisalsoasimplerway:wecantakeourfirstdocumentandindexitinseveralwaysindependentlyfrominputlanguage.Let’sfocusonthissolution.

www.EBooksWorld.ir

Page 373: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThemappingsTohandleoursolution,whichwillprocessthedocumentusingseveraldefinedlanguages,weneednewmappings.Let’slookatthemappingswe’vecreatedtoindexourdocuments(we’vestoredtheminthemappings.jsonfile):

{

"mappings":{

"doc":{

"properties":{

"title":{

"type":"string",

"index":"analyzed",

"fields":{

"english":{

"type":"string",

"index":"analyzed",

"analyzer":"english"

},

"russian":{

"type":"string",

"index":"analyzed",

"analyzer":"russian"

},

"german":{

"type":"string",

"index":"analyzed",

"analyzer":"german"

}

}

},

"content":{

"type":"string",

"index":"analyzed",

"fields":{

"english":{

"type":"string",

"index":"analyzed",

"analyzer":"english"

},

"russian":{

"type":"string",

"index":"analyzed",

"analyzer":"russian"

},

"german":{

"type":"string",

"index":"analyzed",

"analyzer":"german"

}

}

}

}

}

}

www.EBooksWorld.ir

Page 374: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

Intheprecedingmappings,we’veshownthedefinitionforthetitleandcontentfields(ifyouarenotfamiliarwithanyaspectofmappingsdefinition,refertotheMappingsconfigurationsectionofChapter2,IndexingYourData).WehaveusedthemultifieldfeatureofElasticsearch:eachfieldcanbeindexedinseveralwaysusingvariouslanguageanalyzers(inourexample,thoseanalyzersare:English,Russian,andGerman).

Inaddition,thebasefieldusesthedefaultanalyzer,whichwemayuseatquerytimewhenthelanguageisunknown.So,eachfieldwillactuallyhavefourfields–thedefaultoneandthreelanguageorientedfields.

Inordertocreateasampleindexcalleddocsthatusesourmappings,wewillusethefollowingcommand:

curl-XPUT'localhost:9200/docs'[email protected]

www.EBooksWorld.ir

Page 375: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryingNowlet’sseehowwecanqueryourdatatousethenewlycreatedlanguagefields.Wecandividethequeryingsituationintotwodifferentcases.Ofcourse,tostartqueryingweneeddocuments.Let’sindexourexampledocumentbyrunningthefollowingcommand:

curl-XPOST'localhost:9200/docs/doc/1'-d'{"title":"Firsttest

document","content":"Thisisatestdocument"}'

QuerieswithanidentifiedlanguageThefirstcaseiswhenwehaveourquerylanguageidentified.Let’sassumethattheidentifiedlanguageisEnglish.Insuchcases,ourqueryisasfollows:

curl'localhost:9200/docs/_search?pretty'-d'{

"query":{

"match":{

"content.english":"documents"

}

}

}'

Thethingtoputemphasisonintheprecedingqueryisthefieldusedforqueryingandthequerytype.Thefieldusediscontent.english,whichalsoindicateswhichanalyzerwewanttouse.Weusedthatfieldbecausewehadidentifiedourlanguagebeforerunningthequery.Thankstothis,theEnglishanalyzercanfindourdocumentevenifwehavethesingularformofthewordinthedocument.TheresponsereturnedbyElasticsearchwillbeasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.19178301,

"hits":[{

"_index":"docs",

"_type":"doc",

"_id":"1",

"_score":0.19178301,

"_source":{

"title":"Firsttestdocument",

"content":"Thisisatestdocument"

}

}]

}

}

Thethingtonoteisalsothequerytype–thematchquery.Weusedthematchquery

www.EBooksWorld.ir

Page 376: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

becauseitanalyzesitsbodywiththeanalyzerusedbythefieldthatitisrunagainst.Weneedthattoproperlymatchthedatainthequeryandthedataintheindex.

QuerieswithanunknownlanguageNowlet’slookatthesecondsituation–handlingquerieswhenwecouldn’tidentifythelanguageofthequery.Insuchcases,wecan’tusethefieldnamepointingtooneofthelanguages,suchascontent.german.Insuchacase,weusethedefaultfieldwhichusesthedefaultanalyzerandwesendthequerytothecontentfieldinstead.Thequerywilllookasfollows:

curl'localhost:9200/docs/_search?pretty'-d'{

"query":{

"match":{

"content":"documents"

}

}

}'

However,wedidn’tgetanyresultsthistimebecausethedefaultanalyzercan’tdealwithasingularformofawordwhenwearesearchingwithapluralform.

www.EBooksWorld.ir

Page 377: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CombiningqueriesToadditionallyboostthedocumentsthatperfectlymatchwithourdefaultanalyzer,wecancombinethetwoprecedingquerieswiththeboolquery.Suchacombinedquerywilllookasfollows:

curl-XGET'localhost:9200/docs/_search?pretty=true'-d'{

"query":{

"bool":{

"minimum_should_match":1,

"should":[

{

"match":{

"content.english":"documents"

}

},

{

"match":{

"content":"documents"

}

}

]

}

}

}'

Forthedocumenttobereturned,atleastoneofthedefinedqueriesmustmatch.Iftheybothmatch,thedocumentwillhaveahigherscorevalueandwillbeplacedhigherintheresults.

Thereisoneadditionaladvantageoftheprecedingcombinedquery.Ifourlanguageanalyzerdoesn’tfindadocument(forexample,whentheanalysisisdifferentfromtheoneusedduringindexing),thesecondqueryhasachancetofindthetermsthataretokenizedonlybywhitespacecharactersandlowercase.

www.EBooksWorld.ir

Page 378: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 379: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

InfluencingscoreswithqueryboostsInthebeginningofthischapter,welearnedwhatscoringisandhowElasticsearchusesthescoringformula.Whenanapplicationgrows,theneedforimprovingthequalityofsearchalsoincreases-wecallitsearchexperience.Weneedtogainknowledgeaboutwhatismoreimportanttotheuserandweseehowtheusersusethesearchesfunctionality.Thisleadstovariousconclusions;forexample,weseethatsomepartsofthedocumentsaremoreimportantthanothersorthatparticularqueriesemphasizeonefieldatthecostofothers.Weneedtoincludesuchinformationinourdataandqueriessothatbothsidesofthescoringequationareclosertoourbusinessneeds.Thisiswhereboostingcanbeused.

www.EBooksWorld.ir

Page 380: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheboostBoostisanadditionalvalueusedintheprocessofscoring.Wealreadyknowitcanbeappliedto:

Query:Whenused,weinformthesearchenginethatthegivenqueryisapartofacomplexqueryandismoresignificantthantheotherparts.Document:Whenusedduringindexing,wetellElasticsearchthatadocumentismoreimportantthantheothersintheindex.Forexample,whenindexingblogposts,weareprobablymoreinterestedinthepoststhemselvesthanpingbacksorcomments.

Valuesassignedbyustoaqueryoradocumentarenottheonlyfactorsusedwhenwecalculatetheresultingscoreandweknowthat.Wewillnowlookatafewexamplesofqueryboosting.

www.EBooksWorld.ir

Page 381: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AddingtheboosttoqueriesLet’simaginethatourindexhastwodocumentsandwe’veusedthefollowingcommandstoindexthem:

curl-XPOST'localhost:9200/messages/email/1'-d'{

"id":1,

"to":"JohnSmith",

"from":"DavidJones",

"subject":"Topsecret!"

}'

curl-XPOST'localhost:9200/messages/email/2'-d'{

"id":2,

"to":"DavidJones",

"from":"JohnSmith",

"subject":"John,readthisdocument"

}'

Thisdataistrivial,butitshoulddescribeourproblemverywell.Nowlet’sassumewehavethefollowingquery:

curl-XGET'localhost:9200/messages/_search?pretty'-d'{

"query":{

"query_string":{

"query":"john",

"use_dis_max":false

}

}

}'

Inthiscase,Elasticsearchwillcreateaquerytothe_allfieldandwillfinddocumentsthatcontainthedesiredwords.Wealsosaidthatwedon’twantthedisjunctionquerytobeusedbyspecifyingtheuse_dis_maxparametertofalse(ifyoudon’trememberwhatadisjunctionqueryis,refertotheThedis_maxquerysectioninChapter3,SearchingYourData).Aswecaneasilyguess,bothourrecordswillbereturned.Therecordwithidentifierequalto2willbefirstbecausethewordJohnoccurstwotimes–onceinthefromfieldandonceinthesubjectfield.Let’scheckthisoutinthefollowingresult:

"hits":{

"total":2,

"max_score":0.13561106,

"hits":[{

"_index":"messages",

"_type":"email",

"_id":"2",

"_score":0.13561106,

"_source":{

"id":2,

"to":"DavidJones",

"from":"JohnSmith",

"subject":"John,readthisdocument"

}

},{

www.EBooksWorld.ir

Page 382: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_index":"messages",

"_type":"email",

"_id":"1",

"_score":0.11506981,

"_source":{

"id":1,

"to":"JohnSmith",

"from":"DavidJones",

"subject":"Topsecret!"

}

}]

}

Iseverythingallright?Technically,yes.Butwethinkthattheseconddocument(theonewithidentifier1)shouldbepositionedasthefirstoneintheresultlist,becausewhensearchingforsomething,themostimportantfactor(inmanycases)ismatchingpeopleratherthanthesubjectofthemessage.Youcandisagree,butthisisexactlywhyfull-textsearchingrelevanceisadifficulttopic;sometimesitishardtotellwhichorderingisbetterforaparticularcase.Whatcanwedo?First,let’srewriteourquerytoimplicitlyinformElasticsearchwhatfieldsshouldbeusedforsearching:

curl-XGET'localhost:9200/messages/_search?pretty'-d'{

"query":{

"query_string":{

"fields":["from","to","subject"],

"query":"john",

"use_dis_max":false

}

}

}'

Thisisnotexactlythesamequeryasthepreviousone.Ifwerunit,wewillgetthesameresults(inourcase).However,ifyoulookcarefully,youwillnoticedifferencesinscoring.Inthepreviousexample,Elasticsearchonlyusedonefield,thatisthedefault_allfield.Thequerythatweareusingnowisusingthreefieldsformatching.Thismeansthatseveralfactors,suchasfieldlengths,arechanged.Anyway,thisisnotsoimportantinourcase.Elasticsearchunderthehoodgeneratesacomplexquerymadeofthreequeries–onetoeachfield.Ofcourse,thescorecontributedbyeachquerydependsonthenumberoftermsfoundinthisfieldandthelengthofthisfield.

Let’sintroducesomedifferencesbetweenthefieldsandtheirimportance.Comparethefollowingquerytothelastone:

curl-XGET'localhost:9200/messages/_search?pretty'-d'{

"query":{

"query_string":{

"fields":["from^5","to^10","subject"],

"query":"john",

"use_dis_max":false

}

}

}'

www.EBooksWorld.ir

Page 383: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Lookatthehighlightedparts(^5and^10).Byusingthatnotation(the^characterfollowedbyanumber),wecaninformElasticsearchhowimportantagivenfieldis.Weseethatthemostimportantfieldisthetofield(becauseofthehighestboostvalue).Nextwehavethefromfield,whichislessimportant.Thesubjectfieldhasthedefaultvalueforboost,whichis1.0andistheleastimportantfieldwhenitcomestoscorecalculation.Alwaysrememberthatthisvalueisonlyoneofthevariousfactors.Youmaybewonderingwhywechoose5andnot1000or1.23.Well,thisvaluedependsontheeffectwewanttoachieve,whatquerywehave,and,mostimportantly,whatdatawehaveinourindex.Typically,whendatachangesinthemeaningfulparts,weshouldprobablycheckandtuneourrelevanceonceagain.

Intheend,let’slookatasimilarexample,butusingtheboolquery:

curl-XGET'localhost:9200/messages/_search?pretty'-d'{

"query":{

"bool":{

"should":[

{"term":{"from":{"value":"john","boost":5}}},

{"term":{"to":{"value":"john","boost":10}}},

{"term":{"subject":{"value":"john"}}}

]

}

}

}'

Theprecedingquerywillyieldthesameresults,whichmeansthatthefirstdocumentontheresultslistwillbetheonewiththeidentifier1,butthescoreswillbeslightlydifferent.ThisisbecausetheLucenequeriesmadefromthelasttwoexamplesareslightlydifferentandthusthescoresaredifferent.

www.EBooksWorld.ir

Page 384: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ModifyingthescoreTheprecedingexampleshowshowtoaffecttheresultlistbyboostingparticularquerycomponents–thefields.Anothertechniqueistorunaqueryandaffectthescoreofthematcheddocuments.Inthefollowingsections,wewillsummarizethepossibilitiesofferedbyElasticsearch.Intheexamples,wewilluseourlibrarydatathatwehavealreadyusedinthepreviouschapters.

ConstantscorequeryAconstant_scorequeryallowsustotakeanyqueryandexplicitlysetthevaluethatshouldbeusedasthescorethatwillbegivenforeachmatchingdocumentbyusingtheboostparameter.

Atfirst,thisquerydoesn’tseemtobepractical.Butwhenwethinkaboutbuildingcomplexqueries,thisqueryallowsustosethowmanydocumentsmatchingthisquerycanaffectthetotalscore.Lookatthefollowingexample:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"constant_score":{

"query":{

"query_string":{

"query":"available:falseauthor:heller"

}

}

}

}

}'

Inourdata,wehavetwodocumentswiththeavailablefieldsettofalse.Oneofthesedocumentshasanadditionalvalueintheauthorfield.Ifweuseadifferentquery,thedocumentwithanadditionalvalueintheauthorfield(abookwithidentifier2)wouldbegivenahigherscore,but,thankstotheconstantscorequery,Elasticsearchwillignorethatinformationduringscoring.Bothdocumentswillbegivenascoreequalto1.0.

BoostingqueryThenexttypeofquerythatcanbeusedwithboostingistheboostingquery.Theideaistoallowustodefineapartofquerywhichwillcausematcheddocumentstohavetheirscoreslowered.Thefollowingexamplereturnsalltheavailablebooks(availablefieldsettotrue),butthebookswrittenbyE.M.Remarquewillhaveanegativeboostof0.1(whichmeansabouttentimeslowerscore):

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"boosting":{

"positive":{

"term":{

"available":true

}

},

www.EBooksWorld.ir

Page 385: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"negative":{

"match":{

"author":"remarque"

}

},

"negative_boost":0.1

}

}

}'

ThefunctionscorequeryTillnowwe’veseentwoexamplesofqueriesthatallowedustoalterthescoreofthereturneddocuments.Thethirdexamplewewantedtotalkabout,thefunction_scorequery,iswaymorecomplicatedthanthepreviouslydiscussedqueries.Thefunction_scorequeryisveryusefulwhenthescorecalculationismorecomplicatedthangivingasingleboosttoallthedocuments;boostingmorerecentdocumentsisanexampleofaperfectusecaseforthefunction_scorequery.

Structureofthefunctionquery

Thestructureofthefunctionqueryisquitesimpleandlooksasfollows:

{

"query":{

"function_score":{

"query":{...},

"functions":[

{

"filter":{...},

"FUNCTION":{...}

}

],

"boost_mode":"...",

"score_mode":"...",

"max_boost":"...",

"min_score":"...",

"boost":"..."

}

}

}

Ingeneral,thefunctionscorequerycanuseaquery,oneofseveralfunctions,andadditionalparameters.Eachfunctioncanhaveafilterdefinedtofiltertheresultsonwhichitwillbeapplied.Ifnofilterisgivenforafunction,itwillbeappliedtoallthedocuments.

Thelogicbehindthefunctionscorequeryisquitesimple.Firstofall,thefunctionsarematchedagainstthedocumentsandthescoreiscalculatedbasedonscore_mode.Afterthat,thequeryscoreforthedocumentiscombinedwiththescorecalculatedforthefunctionsandcombinedtogetheronthebasisofboost_mode.

Let’snowdiscusstheparameters:

Boostmode:Theboost_modeparameterallowsustodefinehowthescorecomputedbythefunctionquerieswillbecombinedwiththescoreofthequery.Thefollowing

www.EBooksWorld.ir

Page 386: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

valuesareallowed:

multiply:Thedefaultbehavior,whichresultsinthequeryscorebeingmultipliedbythescorecomputedfromthefunctionsreplace:Thequeryscorewillbetotallyignoredandthedocumentscorewillbeequaltothescorecalculatedbythefunctionssum:Thedocumentscorewillbecalculatedasthesumofthequeryandthefunctionscoresavg:Thescoreofthedocumentwillbeanaverageofthequeryscoreandthefunctionscoremax:Thedocumentwillbegivenamaximumofqueryscoreandfunctionscoremin:Thedocumentwillbegivenaminimumofqueryscoreandfunctionscore

Scoremode:Thescore_modeparameterdefineshowthescorecomputedbythefunctionsarecombinedtogether.Thefollowingscore_modeparametervaluesaredefined:

multiply:Thedefaultbehaviorwhichresultsinthescoresreturnedbythefunctionsbeingmultipliedsum:Thescoresreturnedbythedefinedfunctionsaresummedavg:Thescorereturnedbythefunctionsisanaverageofallthescoresofthematchingfunctionsfirst:Thescoreofthefirstfunctionwithafiltermatchingthedocumentisreturnedmax:Themaximumscoreofthefunctionsisreturnedmin:Theminimumscoreofthefunctionsisreturned

Thereisonethingtoremember–wecanlimitthemaximumcalculatedscorevaluebyusingthemax_boostparameterinthefunctionscorequery.Bydefault,thatparameterissettoFloat.MAX_VALUE,whichmeansthemaximumfloatvalue.

Theboostparameterallowsustosetaquerywideboostforthedocuments.

Ofcourse,thereisonethingweshouldremember–thescorecalculateddoesn’taffectwhichdocumentsmatchedthequery.Becauseofthat,themin_scorepropertyhasbeenintroduced.Itallowsustodefinetheminimumscoreofthedocuments.Documentsthathaveascorelowerthanthemin_scorepropertywillbeexcludedfromtheresults.

Whatwehaven’ttalkedaboutyetarethefunctionscoresthatwecanincludeinthefunctionssectionofourquery.Thecurrentlyavailablefunctionsare:

weightfactorfieldvaluefactorscriptscorerandomdecay

Theweightfactorfunction

Theweightfactorfunctionallowsustomultiplythescoreofthedocumentbyagiven

www.EBooksWorld.ir

Page 387: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

value.Thevalueoftheweightparameterisnotnormalizedandistakenasis.Anexampleusingtheweightfunction,wherewemultiplythescoreofthedocumentby20,looksasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{"weight":20}

]

}

}

}'

Fieldvaluefactorfunction

Thefield_value_factorfunctionallowsustoinfluencethescoreofthedocumentbyusingavalueofthefieldinthatdocument.Forexample,tomultiplythescoreofthedocumentbythevalueoftheyearfield,werunthefollowingquery:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"field_value_factor":{

"field":"year",

"missing":1

}

}

]

}

}

}'

Inadditiontochoosingthefieldwhosevalueshouldbeused,wecanalsocontrolthebehaviorofthefieldvaluefactorfunctionbyusingthefollowingproperties:

factor:Themultiplicationfactorthatwillbeusedalongwiththefieldvalue.Itdefaultsto1.modifier:Themodifierthatwillbeappliedtothefieldvalue.Itdefaultstonone.Itcantakethevalueoflog,log1p,log2p,ln,ln1p,ln2p,square,sqrt,andreciprocal.missing:Thevaluethatshouldbeusedwhenadocumentdoesn’thaveanyvaluein

www.EBooksWorld.ir

Page 388: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

thefieldspecifiedinthefieldproperty.

Thescriptscorefunction

Thescript_scorefunctionallowsustouseascripttocalculatethescorethatwillbeusedasthescorereturnedbyafunction(andthuswillfallintobehaviordefinedbytheboost_modeparameter).Anexampleofscript_scoreusageisasfollows(forthefollowingexampletowork,inlinescriptingneedstobeallowed,whichmeansaddingthescript.inlinepropertyandsettingittooninelasticsearch.yml):

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"script_score":{

"script":{

"inline":"_score*_source.copies*parameter1",

"params":{

"parameter1":12

}

}

}

}

]

}

}

}'

Therandomscorefunction

Byusingtherandom_scorefunction,wecangenerateapseudorandomscore,byspecifyingaseed.Inordertosimulaterandomness,weshouldspecifyanewseedeverytime.Therandomnumberwillbegeneratedbyusingthe_uidfieldandtheprovidedseed.Ifaseedisnotprovided,thecurrenttimestampwillbeused.Anexampleofusingthisisasfollows:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"random_score":{

"seed":12345

}

www.EBooksWorld.ir

Page 389: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

]

}

}

}'

Decayfunctions

Inadditiontotheearliermentionedscoringfunctions,Elasticsearchexposesadditionalones,calledthedecayfunctions.Thedifferencefromthepreviouslydescribedfunctionsisthatthescoregivenbythosefunctionslowerswithdistance.Thedistanceiscalculatedonthebasisofasinglevaluednumericfield(suchasadate,ageographicalpoint,orastandardnumericfield).Thesimplestexamplethatcomestomindisboostingdocumentsonthebasisofdistancefromagivenpointorboostingonthebasisofdocumentdate.

Forexample,let’sassumethatwehaveapointfieldthatstoresthelocationandwewantourdocument’sscoretobeaffectedbythedistancefromapointwheretheuserstands(forexample,ourusersendsaqueryfromamobiledevice).Assumingtheuserisat52,21,wecouldsendthefollowingquery:

{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"linear":{

"point":{

"origin":"52,21",

"scale":"1km",

"offset":0,

"decay":0.2

}

}

}

]

}

}

}

Intheprecedingexample,thelinearisthenameofthedecayfunction.Thevaluewilldecaylinearlywhenusingit.Theotherpossiblevaluesaregaussandexp.We’vechosenthelineardecayfunctionbecauseofthefactthatitsetsthescoreto0whenthefieldvalueexceedsthegivenoriginvaluetwice.Thisisusefulwhenyouwanttolowerthevalueofthedocumentsthataretoofaraway.

NoteNotethatthegeographicalsearchingcapabilitiesofElasticsearchwillbediscussedintheGeosectionofChapter8,BeyondFull-textSearching.

www.EBooksWorld.ir

Page 390: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Nowlet’sdiscusstherestofthequerystructure.Thepointisthenameofthefieldwewanttouseforscorecalculation.Ifthedocumentdoesn’thaveavalueinthedefinedfield,itwillbegivenavalueof1forthetimeofcalculation.

Inadditiontothat,we’veprovidedadditionalparameters.Theoriginandscalearerequired.Theoriginparameteristhecentralpointfromwhichthecalculationwillbedoneandthescaleistherateofdecay.Bydefault,theoffsetissetto0.Ifdefined,thedecayfunctionwillonlycomputeascoreforthedocumentswithvaluegreaterthanthevalueofthisparameter.ThedecayparametertellsElasticsearchhowmuchthescoreshouldbeloweredandissetto0.5bydefault.Inourcase,we’vesaidthat,atthedistanceof1kilometer,thescoreshouldbereducedby20%(0.2).

NoteWeexpectthefunction_scorequerytobemodifiedandextendedwiththenextversionsofElasticsearch(justasitwaswithElasticsearchversion1.x).Wesuggestfollowingtheofficialdocumentationandthepagededicatedtothefunction_scorequeryathttps://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html.

www.EBooksWorld.ir

Page 391: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 392: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Whendoesindex-timeboostingmakesense?Intheprevioussection,wediscussedboostingqueries.Thiskindofapproachtohandlingdifferencesintheweightofdocumentsisveryhandy,powerful,andeasytouse.Itisalsosufficientinmostsituations.However,therearecaseswhenamoreconvenientwayofdocumentsboostingisindex-timeboosting.Oneofsuchusecaseisthesituationwhenweknowwhichdocumentsareimportantduringtheindexingphase.Insuchacase,wecanpreparethedocumentboostandincludeitaspartofthedocument.Wegainaboostthatisindependentfromaqueryatthecostofreindexingthedocumentswhentheboostvalueischanged(becauseweneedtoapplythechangedboost).Inadditiontothat,theperformancegetsslightlybetterbecausesomepartsneededintheboostingprocessarealreadycalculatedatindextime,whichcanmatterwhenyourindiceshavealargenumberofdocuments.Informationabouttheboostisstoredasapartofthenormalizationfactorandbecauseofthatitisimportanttokeepthenormsturnedon.Thismeansthatwecan’tsetnorms.enabledtofalsebecausewewon’tbeabletouseindextimeboosting.

www.EBooksWorld.ir

Page 393: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DefiningboostinginthemappingsItisalsopossibletodirectlydefinethefield’sboostinourmappings.ThiswillresultinElasticsearchgivingaboostforallthedocumentshavingavalueinsuchafield.Ofcourse,thatwillalsohappenduringindexingtime.Thefollowingexampleillustratesthat:

{

"mappings":{

"book":{

"properties":{

"title":{"type":"string"},

"author":{"type":"string","boost":10.0}

}

}

}

}

Thankstotheprecedingboost,allquerieswillfavorvaluesfoundinthefieldnamedauthor.Thisalsoappliestoqueriesusingthe_allfield,becauseElasticsearchwillapplytheboosttovaluescopiedbetweenthefields.

www.EBooksWorld.ir

Page 394: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 395: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WordswiththesamemeaningYoumayhaveheardaboutsynonyms,wordsthathavethesameorsimilarmeaning.Sometimesyouwouldwanttohavesomewordsmatchedwhenoneofthosewordsisenteredintothesearchbox.Let’srecalloursampledatafromChapter3,SearchingYourData.Therewasabookcalledcrimeandpunishment.Whatifwewantthatbooktonotonlybematchedwhenthewordscrimeorpunishmentareused,butalsowhenusingthewordssuchascriminalityandabuse.Atfirstglance,thismaynotsoundlikegoodbehavior,butsometimesthisisreallyneeded,especiallyinusecaseswheretherearemultiplewordsmeaningthesame(likeinmedicine).Tohandlesuchusecases,wewillusesynonyms.

www.EBooksWorld.ir

Page 396: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SynonymfilterSynonymsinElasticsearcharehandledontheanalysislevel–atbothindexandquerytime,byadedicatedsynonymsfilter.Tousethesynonymfilter,weneedtodefineourownanalyzer.Forexample,let’sdefineananalyzerthatwillbecalledsynonymandwillusethewhitespacetokenizerandasinglefiltercalledsynonym.Ourfilter’stypepropertyneedstobesettosynonym,whichtellsElasticsearchthatthisfilterisasynonymfilter.

Inadditiontothat,wewanttoignorecase,sothattheuppercasedandlowercasedsynonymsaretreatedequally(settheignore_casepropertytotrue).Todefineourcustomsynonymanalyzerthatusesasynonymfilterwhencreatinganewindex,wewouldusethefollowingcommand:

curl-XPOST'localhost:9200/test'-d'{

"index":{

"analysis":{

"analyzer":{

"synonym":{

"tokenizer":"whitespace",

"filter":[

"synonym"

]

}

},

"filter":{

"synonym":{

"type":"synonym",

"ignore_case":true,

"synonyms":[

"crime=>criminality"

]

}

}

}

}

}'

SynonymsinthemappingsInthedefinitionyou’vejustseen,we’vespecifiedthesynonymruleinthemappingswesendtoElasticsearch.Todothat,weneededtoaddthesynonymsproperty,whichisanarrayofsynonymrules.Forexample,thefollowingpartofthemappingsdefinitiondefinesasinglesynonymrule:

"synonyms":[

"crime=>criminality"

]

TheprecedingruletellsElasticsearchtochangethecrimetermtothecriminalitytermwhenthecrimetermisencounteredduringanalysis.

Synonymsstoredonthefilesystem

www.EBooksWorld.ir

Page 397: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Apartfromstoringthesynonymsrulesinthemappings,Elasticsearchallowsustouseafile-basedsynonymsruleset.Touseafile,weneedtospecifythesynonyms_pathpropertyinsteadofthesynonymsone.Thesynonyms_pathpropertyshouldbesettothenameofthefilethatholdsthesynonym’sdefinitionandthespecifiedfilepathisrelativetotheElasticsearchconfigdirectory.So,ifwestoreoursynonymsinthesynonyms.txtfileandwesavethatfileintheconfigdirectory,then,inordertouseit,weshouldsetsynonyms_pathtothevalueofsynonyms.txt.

Forexample,thisishowoursynonymfilterwouldlooklikeifwewanttousethesynonymsstoredinafile:

"filter":{

"synonym":{

"type":"synonym",

"synonyms_path":"synonyms.txt"

}

}

www.EBooksWorld.ir

Page 398: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DefiningsynonymrulesSofarwehavediscussedwhatwehavetodoinordertousesynonymexpansionsinElasticsearch.Nowlet’sseewhatformatsofsynonymsareallowed.

UsingApacheSolrsynonymsThemostcommonsynonymstructureintheApacheLuceneworldisprobablytheoneusedbyApacheSolr(http://lucene.apache.org/solr/),thesearchenginebuiltontopofLucene,justlikeElasticsearchis.ThisisthedefaultwayofhandlingsynonymsinElasticsearchandthepossibilitiesofdefininganewsynonymarediscussedinthefollowingsections.

Explicitsynonyms

Asimplemappingallowsustomapalistofwordsontootherwords.So,inourcase,ifwewantthewordcriminalitytobemappedtocrimeandthewordabusetobemappedtopunishment,weneedtodefinethefollowingentries:

criminality=>crime

abuse=>punishment

Ofcourse,asinglewordcanbemappedintomultipleonesandmultipleonescanbemappedintoasingleone.Forexample:

starwars,wars=>starwars

Theprecedingexamplemeansthatstarwarsandwarswillbechangedtostarwarsbythesynonymfilter.

Equivalentsynonyms

Inadditiontotheexplicitmapping,Elasticsearchallowsustouseequivalentsynonyms.Forexample,thefollowingdefinitionwillmakeallthewordsexchangeablesothatyoucanuseanyofthemtomatchadocumentthathasoneoftheminitscontents:

star,wars,starwars,starwars

Expandingsynonyms

AsynonymfilterallowsustouseoneadditionalpropertywhenitcomestoApacheSolrformatsynonyms–theexpandproperty.Whentheexpandpropertyissettotrue(bydefaultitissettofalse),allsynonymswillbeexpandedbyElasticsearchtoallequivalentforms.Forexample,let’ssaywehavethefollowingfilterconfiguration:

"filter":{

"synonym":{

"type":"synonym",

"expand":false,

"synonyms":[

"one,two,three"

]

}

}

www.EBooksWorld.ir

Page 399: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Elasticsearchwillmaptheprecedingsynonymdefinitiontothefollowing:

one,two,three=>one

Thismeansthatthewordsone,two,andthreewillbechangedtoone.However,ifwesettheexpandpropertytotrue,thesamesynonymdefinitionwillbeinterpretedinthefollowingway:

one,two,three=>one,two,three

Thisbasicallymeansthateachofthewordsfromtheleft-sideofthedefinitionwillbeexpandedtoallthewordsontheright-side.

UsingWordNetsynonymsIfwewanttouseWordNet-structured(tolearnmoreaboutWordNet,visithttp://wordnet.princeton.edu/)synonyms,weneedtoprovideanadditionalpropertyforoursynonymfilter.ThepropertynameisformatandweshouldsetitsvaluetowordnetinorderforElasticsearchtounderstandthatformat.

www.EBooksWorld.ir

Page 400: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Queryorindex-timesynonymexpansionAswithalltheanalyzers,onecanwonderwhentousethesynonymfilter–duringindexing,duringquerying,ormaybeduringindexingandquerying.Ofcourse,itdependsonyourneeds.However,rememberthatusingindex-timesynonymsrequiresdatareindexingaftereachsynonymchange.That’sbecausetheyneedtobereappliedtoallthedocuments.Ifweuseonlythequery-timesynonyms,wecanupdatethesynonym’slistsandhavethemappliedwithoutdatareindexation.

www.EBooksWorld.ir

Page 401: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 402: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UnderstandingtheexplaininformationComparedtodatabases,usingsystemscapableofperformingfull-textsearchcanoftenbeanythingotherthanobvious.Wecansearchinmanyfieldssimultaneouslyandthedataintheindexcanvaryfromtheonesprovidedasthevaluesofthedocumentfields(becauseoftheanalysisprocess,synonyms,abbreviations,andothers).It’sevenworse!Bydefault,searchenginessortdatabyrelevance,whichmeansthateachdocumentisgivenanumberindicatinghowsimilarthedocumentistothequery.Thekeypointhereisunderstandingthehowsimilarphrase.Aswediscussedinthebeginningofthechapter,scoringtakesmanyfactorsintoaccount–howmanysearchedwordswerefoundinthedocument,howfrequentthewordis,howmanytermsareinthefield,andsoon.Thisseemscomplicatedandfindingoutwhyadocumentwasfoundandwhyanotherdocumentisbetterisnoteasy.Fortunately,Elasticsearchprovidesuswithtoolsthatcananswerthesequestionsandwewilllookattheminthissection.

www.EBooksWorld.ir

Page 403: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UnderstandingfieldanalysisOneofthecommonquestionsaskedwhenanalyzingthereturneddocumentsiswhyagivendocumentwasnotfound.Inmanycases,theproblemliesinthemappingsdefinitionandtheanalysisprocessconfiguration.Fordebuggingtheanalysisprocess,ElasticsearchprovidesadedicatedRESTAPIendpoint–the_analyzeone.

Usingitisverysimple.Let’sseehowitisusedbyrunningarequesttoElasticsearchtogiveusinformationonhowthecrimeandpunishmentphraseisanalyzed.Todothat,wewillrunacommandusingHTTPGETtothe_analyzeRESTend-pointandwewillprovidethephraseastherequestbody.Thefollowingcommanddoesthat:

curl-XGET'localhost:9200/_analyze?pretty'-d'CrimeandPunishment'

Inresponse,wegetthefollowingdata:

{

"tokens":[{

"token":"crime",

"start_offset":0,

"end_offset":5,

"type":"<ALPHANUM>",

"position":0

},{

"token":"and",

"start_offset":6,

"end_offset":9,

"type":"<ALPHANUM>",

"position":1

},{

"token":"punishment",

"start_offset":10,

"end_offset":20,

"type":"<ALPHANUM>",

"position":2

}]

}

Aswecansee,Elasticsearchdividedtheinputphraseintothreetokens.Duringprocessing,thephrasewasdividedintotokensonthebasisofwhitespacecharactersandwaslowercased.Thisshowsusexactlywhatwouldbehappeningduringtheanalysisprocess.Wecanalsoprovidethenameoftheanalyzer.Forexample,wecanchangetheprecedingcommandtosomethinglikethis:

curl-XGET'localhost:9200/_analyze?analyzer=standard&pretty'-d'Crimeand

Punishment'

Theprecedingcommandwillallowustocheckhowthestandardanalyzeranalyzesthedata.

ItisworthnotingthatthereisanotherformofanalysisAPIavailable–onewhichallowsustoprovidetokenizersandfilters.Itisveryhandywhenwewanttoexperimentwithconfigurationbeforecreatingthetargetmappings.Insteadofspecifyingtheanalyzer

www.EBooksWorld.ir

Page 404: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

parameterintherequest,weprovidethetokenizerandthefiltersparameters.Wecanprovideasingletokenizerandalistoffilters(separatedbycommacharacter).Forexample,toillustratehowtokenizationusingwhitespacetokenizerworkswithlowercaseandkstemfilterswewouldrunthefollowingrequest:

curl-XGET'localhost:9200/library/_analyze?

tokenizer=whitespace&filters=lowercase,kstem&pretty'-d'JohnSmith'

Aswecansee,ananalysisAPIcanbeveryusefulfortrackingdownbugsinthemappingconfiguration.Itisalsopricelesswhenwewanttosolveproblemswithqueriesandmatching.Itcanshowushowouranalyzerswork,whattermstheyproduce,andwhattheattributesofthosetermsare.Withsuchinformation,analyzingthequeryproblemswillbeeasiertotrackdown.

www.EBooksWorld.ir

Page 405: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ExplainingthequeryInadditiontolookingatwhathappenedduringanalysis,Elasticsearchallowsustoexplainhowthescorewascalculatedforaparticularqueryanddocument.Let’slookatthefollowingexample:

curl-XGET'localhost:9200/library/book/1/_explain?pretty&q=quiet'

Theprecedingrequestspecifiesadocumentandaquerytorun.ThedocumentisspecifiedintheURIandthequeryispassedusingtheqparameter.Usingthe_explainendpoint,weaskElasticsearchforanexplanationabouthowthedocumentwasmatchedbyElasticsearch(ornotmatched).TheresponsereturnedbyElasticsearchfortheprecedingrequestlooksasfollows:

{

"_index":"library",

"_type":"book",

"_id":"1",

"matched":true,

"explanation":{

"value":0.057534903,

"description":"sumof:",

"details":[{

"value":0.057534903,

"description":"weight(_all:quietin0)[PerFieldSimilarity],result

of:",

"details":[{

"value":0.057534903,

"description":"fieldWeightin0,productof:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0),withfreqof:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":0.30685282,

"description":"idf(docFreq=1,maxDocs=1)",

"details":[]

},{

"value":0.1875,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

},{

"value":0.0,

"description":"matchonrequiredclause,productof:",

"details":[{

"value":0.0,

"description":"#clause",

"details":[]

www.EBooksWorld.ir

Page 406: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

},{

"value":3.2588913,

"description":"_type:book,productof:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":3.2588913,

"description":"queryNorm",

"details":[]

}]

}]

}]

}

}

Itcanlookslightlycomplicatedandwell,itiscomplicated.Itisevenworseifwerealizethatthisisonlyasimplequery!Elasticsearch,andmorespecificallytheLucenelibrary,showstheinternalinformationaboutthescoringprocess.Wewillonlyscratchthesurfaceandwillexplainthemostimportantthingsabouttheprecedingresponse.

ThefirstthingthatyoucannoticeisthatfortheparticularqueryElasticsearchprovidedtheinformationifthedocumentwasamatchornot.Ifthematchedpropertyissettotrue,itmeansthatthedocumentwasamatchfortheprovidedquery.

Thenextimportantthingistheexplanationobject.Itcontainsthreeproperties:thevalue,thedescription,andthedetails.Thevalueisthescorecalculatedforthegivenpartofthequery.Thedescriptionisthesimplifiedtextrepresentationoftheinternalscorecalculation,andthedetailsobjectcontainsdetailedinformationaboutthescorecalculation.ThenicethingisthatthedetailsobjectwillagaincontainthesamethreepropertiesandthisishowElasticsearchprovidesuswithinformationonhowthescoreiscalculated.

Forexample,let’sanalyzethefollowingpartoftheresponse:

"value":0.057534903,

"description":"sumof:",

"details":[{

"value":0.057534903,

"description":"weight(_all:quietin0)[PerFieldSimilarity],result

of:",

"details":[{

"value":0.057534903,

"description":"fieldWeightin0,productof:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0),withfreqof:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

www.EBooksWorld.ir

Page 407: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"value":0.30685282,

"description":"idf(docFreq=1,maxDocs=1)",

"details":[]

},{

"value":0.1875,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

Thescoreoftheelementis0.057534903(thevalueproperty)anditisasumof(weseethatinthedescriptionproperty)alltheinnerelements.Inthedescriptiononthefirstlevelofnestingoftheprecedingfragment,wecanseethatPerFieldSimilarityhasbeenusedandthatthescoreofthatelementistheresultoftheinnerelements–thesecondlevelofnesting.

Onthesecondlevelofdetailsnesting,wecanseethreeelements.Thefirstoneshowsusthescoreoftheelement,whichistheproductofthetwoscoresoftheelementsbelowit.Wecanalsoseevariousinternalstatisticsretrievedfromtheindex:thetermfrequencywhichinformsushowcommonthetermis(termFreq=1.0),theinverteddocumentfrequency,whichshowsushowoftenthetermappearsinthedocuments(idf(docFreq=1,maxDocs=1)),andthefieldnormalizationfactor(fieldNorm(doc=0)).

TheExplainAPIsupportsthefollowingparameters:analyze_wildcard,analyzer,default_operator,df,fields,lenient,lowercase_expanded_terms,parent,preference,routing,_source,_source_exclude,and_source_include.Tolearnmoreaboutalltheseparameters,refertotheofficialElasticsearchdocumentationregardingExplainAPI,whichisavailableathttps://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html.

www.EBooksWorld.ir

Page 408: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 409: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryThechapterwejustfinishedwasfocusedonquerying;notaboutthematchingpartofitbutmostlyaboutscoring.WelearnedhowApacheLuceneTF/IDFscoringworks.WesawthescriptingcapabilitiesofElasticsearchandwehandledmultilingualdata.Weusedboostingtoinfluencehowthescoresofthereturneddocumentswerecalculatedandweusedsynonyms.Finally,weusedexplaininformationtoseehowthedocumentscoreswerecalculatedbythequery.

Inthenextchapter,wewillfullyfocusonElasticsearchdataanalysiscapabilities–theaggregations,theirtypes,andhowtheycanbeused.

www.EBooksWorld.ir

Page 410: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 411: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter7.AggregationsforDataAnalysisInthepreviouschapter,wediscussedthequeryingsideofElasticsearchagain.WelearnedhowtheLuceneTF/IDFalgorithmworksandhowtouseElasticsearchscriptingcapabilities.Wehandledmultilingualdataandinfluenceddocumentscoreswithboosts.WeusedsynonymstomatchwordsthathavethesamemeaningandweusedElasticsearchExplainAPItoseehowdocumentscoreswerecalculated.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

WhatareaggregationsHowtheElasticsearchaggregationengineworksHowtousemetricsaggregationsHowtousebucketsaggregationsHowtousepipelineaggregations

www.EBooksWorld.ir

Page 412: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AggregationsIntroducedinElasticsearch1.0,aggregationsaretheheartofdataanalyticsinElasticsearch.Highlyflexibleandperformant,aggregationsbroughtElasticsearch1.0toanewpositionasafull-featuredanalysisengine.ExtendedthroughthelifeofElasticsearch1.x,in2.xtheyareyetmorepowerful,lessmemorydemanding,andfaster.Withthisframework,youcanuseElasticsearchastheanalysisenginefordataextractionandvisualization.Let’sseehowthatfunctionalityworksandwhatwecanachievebyusingit.

www.EBooksWorld.ir

Page 413: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GeneralquerystructureTouseaggregations,weneedtoaddanadditionalsectioninourquery.Ingeneral,ourquerieswithaggregationslooklikethis:

{

"query":{…},

"aggs":{

"aggregation_name":{

"aggregation_type":{

...

}

}

}

}

Intheaggsproperty(youcanuseaggregationsifyouwant;aggsisjustanabbreviation),youcandefineanynumberofaggregations.EachaggregationisdefinedbyitsnameandoneofthetypesofaggregationsthatareprovidedbyElasticsearch.Onethingtorememberthoughisthatthekeydefinesthenameoftheaggregation(youwillneedittodistinguishparticularaggregationsintheserverresponse).Let’stakeourlibraryindexandcreatethefirstqueryusinguseaggregations.Acommandsendingsuchaquerylookslikethis:

curl'localhost:9200/library/_search?

search_type=query_then_fetch&size=0&pretty'-d'{

"aggs":{

"years":{

"stats":{

"field":"year"

}

},

"words":{

"terms":{

"field":"copies"

}

}

}

}'

Thisquerydefinestwoaggregations.Theaggregationnamedyearsshowsstatisticsfortheyearfield.Thewordsaggregationcontainsinformationaboutthetermsusedinagivenfield.

NoteInourexamplesweassumedthatweperformaggregationinadditiontosearching.Ifwedon’tneedfounddocuments,abetterideaistousethesizeparameterandsetitto0.Thisomitssomeunnecessaryworkandismoreefficient.Insuchacase,theendpointshouldbe/library/_search?size=0.YoucanreadmoreaboutsearchtypesinChapter3,UnderstandingtheQueryingProcess.

Let’snowlookattheresponsereturnedbyElasticsearchfortheprecedingquery:

www.EBooksWorld.ir

Page 414: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"words":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"doc_count":2

},{

"key":1,

"doc_count":1

},{

"key":6,

"doc_count":1

}]

},

"years":{

"count":4,

"min":1886.0,

"max":1961.0,

"avg":1928.0,

"sum":7712.0

}

}

}

Asyousee,boththeaggregations(yearsandwords)werereturned.Thefirstaggregationwedefinedinourquery(years)returnedgeneralstatisticsforthegivenfieldgatheredacrossallthedocumentsthatmatchedourquery.Thesecondofthedefinedaggregations(words)wasabitdifferent.Itcreatedseveralsetscalledbucketsthatwerecalculatedonthereturneddocumentsandeachoftheaggregatedvalueswaswithinoneofthesesets.Asyoucansee,therearemultipleaggregationtypesavailableandtheyreturndifferentresults.Wewillseethedifferencesinthelaterpartofthissection.

Thegreatthingabouttheaggregationengineisthatitallowsyoutohavemultipleaggregationsandthataggregationscanbenested.Thismeansthatyoucanhaveindefinitelevelsofnestingandanynumberofaggregationsingeneral.Theextendedstructureofthequeryisshownnext:

{

"query":{…},

"aggs":{

www.EBooksWorld.ir

Page 415: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"first_aggregation_name":{

"aggregation_type":{

...

},

"aggregations":{

"first_nested_aggregation":{

...

},

.

.

.

"nth_nested_aggregation":{

...

}

}

},

.

.

.

"nth_aggregation_name":{

...

}

}

}

www.EBooksWorld.ir

Page 416: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

InsidetheaggregationsengineAggregationsworkonthebasisofresultsreturnedbythequery.Thisisveryhandyaswegettheinformationthatweareinterestedin,bothfromthequeryaswellasthedataanalysisperspective.SowhatdoesElasticsearchdowhenweincludetheaggregationpartofthequeryintherequestthatwesendtoElasticsearch?Firstofall,theaggregationisexecutedoneachrelevantshardandtheresultsarereturnedtothenodethatisresponsibleforrunningthatquery.Thatnodewaitsforthepartialresultstobecalculated;afteritgetsalltheresults,itmergestheresults,producingthefinalresults.

Thisapproachisnothingnewwhenitcomestodistributedsystemsandhowtheyworkandcommunicate,butcancauseissueswhenitcomestotheprecisionoftheresults.Inmostcasesthisisnotaproblem,butyoushouldbeawareaboutwhattoexpect.Let’simaginethefollowingexample:

Theprecedingimageshowsasimplifiedviewofthreeshards,eachcontainingdocumentshavingonlyElasticsearchandSolrtermsinthem.Nowimaginethatweareinterestedinasingletermforourindex.Thetermsaggregationwhenrunusingsize=1wouldreturnasingleterm,thatwouldbetheonethatisthemostfrequent(ofcourselimitedtothequerywe’verun).SoouraggregatornodewouldseepartialresultstellingusthatElasticsearchispresentin19documentsinShard1andtheSolrtermispresentin10documentsinShard2andShard3,whichmeansthatthetoptermisSolr,whichisnottrue.Thisisanextremecase,butthereareusecases(suchasaccounting)whereprecisioniskeyandyoushouldbeawareaboutsuchsituations.

NoteComparedtoqueries,aggregationsareheavierforElasticsearchintermsofbothCPUcyclesandmemoryconsumption.WewilldiscussthisinmoredetailintheCachingAggregationssectionofthischapter.

www.EBooksWorld.ir

Page 417: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 418: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AggregationtypesElasticsearch2.xallowsustousethreetypesofaggregation:metrics,buckets,andpipeline.Themetricsaggregationsreturnametric,justlikethestatsaggregationweusedforthestatsfield.Thebucketaggregationsreturnbuckets,thekeyandthenumberofdocumentssharingthesamevalues,ranges,andsoon,justlikethetermsaggregationweusedforthecopiesfield.Finally,thepipelineaggregationsintroducedinElasticsearch2.0aggregatetheoutputoftheotheraggregationsandtheirmetrics,whichallowsustodoevenmoresophisticateddataanalysis.Knowingallthat,let’snowlookatalltheaggregationswecanuseinElasticsearch2.x.

www.EBooksWorld.ir

Page 419: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MetricsaggregationsWewillstartwiththemetricsaggregations,whichcanaggregatevaluesfromdocumentsintoasinglemetric.Thisisalwaysthecasewithmetricsaggregations–youcanexpectthemtobeasinglemetriconthebasisofthedata.Let’snowtakealookatthemetricsaggregationsavailableinElasticsearch2.x.

Minimum,maximum,average,andsumThefirstgroupofmetricsaggregationsthatwewanttoshowyouistheonethatcalculatesthebasicvaluefromthegivendocuments.Theseaggregationsare:

min:Thiscalculatestheminimumvaluefromthegivennumericfieldinthereturneddocumentsmax:Thiscalculatesthemaximumvaluefromthegivennumericfieldinthereturneddocumentsavg:Thiscalculatesanaveragefromthegivennumericfieldinthereturneddocumentssum:Thiscalculatesthesumfromthegivennumericfieldinthereturneddocuments

Asyoucansee,theprecedingmentionedaggregationsareprettyself-explanatory.So,let’strytocalculatetheaveragevalueonourdata.Forexample,let’sassumethatwewanttocalculatetheaveragenumberofcopiesforourbooks.Thequerytodothatwilllookasfollows:

{

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

TheresultsreturnedbyElasticsearchafterrunningtheprecedingquerywillbeasfollows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"avg_copies":{

"value":1.75

www.EBooksWorld.ir

Page 420: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

}

So,wehaveanaverageof1.75copiesperbook.Itisveryeasytocalculate–(6+0+1+0)/4isequalto1.75.Seemsthatwegotitright.

Missingvalues

ThenicethingaboutthepreviouslymentionedaggregationsisthatwecancontrolwhatvalueElasticsearchcanuseifthefieldswe’vespecifieddon’thaveany.Forexample,ifwewantedElasticsearchtouse0asthevalueforthecopiesfieldinourpreviousexample,wewouldaddthemissingpropertytoourqueryandandsetitto0.Forexample:

{

"aggs":{

"avg_copies":{

"avg":{

"field":"copies",

"missing":0

}

}

}

}

Usingscripts

Theinputvaluescanalsobegeneratedbyascript.Forexample,ifwewanttofindtheminimumvaluefromallthevaluesintheyearfield,butwewanttosubtract1000fromthosevalues,wewillsendanaggregationlikethefollowingone:

{

"aggs":{

"min_year":{

"min":{

"script":"doc['year'].value-1000"

}

}

}

}

NoteNotethattheprecedingqueryrequiresinlinescriptstobeallowed.Thismeansthatthequeryrequiresthescript.inlinepropertysettoonintheelasticsearch.ymlfile.

Inthiscase,thevaluetheaggregationswillusewillbetheoriginalyearfieldvaluereducedby1000.

WecanalsousethevaluescriptcapabilitiesofElasticsearch.Forexample,toachievethesameasthepreviousscript,wecanusethefollowingquery:

{

"aggs":{

"min_year":{

"min":{

www.EBooksWorld.ir

Page 421: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"field":"year",

"script":{

"inline":"_value-factor",

"params":{

"factor":1000

}

}

}

}

}

}

IfyouarenotfamiliarwithElasticsearchscriptingcapabilities,youcanreadmoreaboutitintheScriptingcapabilitiesofElasticsearchsectionofChapter6,MakeYourSearchBetter.

Onethingworthrememberingisthatusingthecommandlinemayrequireproperescapingofthevaluesinthedocarray.Forexample,thecommandthatexecutesthefirstscriptedquerywouldlookasfollows:

curl-XGET'localhost:9200/library/_search?size=0&pretty'-d'{

"aggs":{

"min_year":{

"min":{

"script":"doc[\"year\"].value-1000"

}

}

}

}'

FieldvaluestatisticsandextendedstatisticsThenextaggregationswewilldiscussaretheonesthatprovideuswiththestatisticalinformationaboutthenumericfieldwearerunningtheaggregationon:thestatsandextended_statsaggregations.

Forexample,thefollowingqueryprovidesextendedstatisticsfortheyearfield:

{

"aggs":{

"extended_statistics":{

"extended_stats":{

"field":"year"

}

}

}

}

Theresponsetotheprecedingquerywillbeasfollows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

www.EBooksWorld.ir

Page 422: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"extended_statistics":{

"count":4,

"min":1886.0,

"max":1961.0,

"avg":1928.0,

"sum":7712.0,

"sum_of_squares":1.4871654E7,

"variance":729.5,

"std_deviation":27.00925767213901,

"std_deviation_bounds":{

"upper":1982.018515344278,

"lower":1873.981484655722

}

}

}

}

Asyoucansee,intheresponsewegotinformationaboutthenumberofdocumentswithvalueintheyearfield,theminimumvalue,themaximumvalue,theaverage,andthesum.Thesearethevaluesthatwewillgetifwerunthestatsaggregationinsteadofextended_stats.Theextended_statsaggregationprovidesadditionalinformation,suchasthesumofsquares,variance,andstandarddeviation.Elasticsearchprovidestwotypesofaggregationsbecauseextended_statsisslightlymoreexpensivewhenitcomestoprocessingpower.

NoteThestatsandextended_statsaggregations,similartothemin,max,avg,andsumaggregations,supportscriptingandallowustospecifywhichvalueshouldbeusedforthefieldsthatdon’thavevalueinthespecifiedfield.

ValuecountThevalue_countaggregationisasimpleaggregationwhichallowscountingvaluesinaggregateddocuments.Thisisquiteusefulwhenusedwithnestedaggregations.Wearenotfocusingonthattopicrightnow,butitissomethingtokeepinmind.Forexample,tousethevalue_countaggregationonthecopiedfield,wewillrunthefollowingquery:

{

"aggs":{

"count":{

"value_count":{

"field":"copies"

}

}

}

www.EBooksWorld.ir

Page 423: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

NoteThevalue_countaggregationallowsustousescripts,discussedearlierinthischapterwhenwedescribedthemin,max,avg,andsumaggregations.PleaserefertothebeginningofMetricsaggregationsectionearlierinthecurrentchapterforfurtherreference.

FieldcardinalityOneoftheaggregationthatallowsustocontrolhowresourcehungrytheaggregationwillbebycontrollingitsprecision,thecardinalityaggregationcalculatesthecountofdistinctvaluesinagivenfield.However,onethingneedstoberemembered:thecalculatedcountisanapproximation,nottheexactvalue.ElasticsearchusestheHyperLogLog++algorithm(http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdftocalculatethevalue.

Thisaggregationhasawidevarietyofusecases,suchasshowingthenumberofdistinctvaluesinafieldthatisresponsibleforholdingthestatuscodeforyourindexedApacheaccesslogs.Onequery,andyouknowtheapproximatedcountofthedistinctvaluesinthatfield.

Forexample,wecanrequestthecardinalityforourtitlefield:

{

"aggs":{

"card_title":{

"cardinality":{

"field":"title"

}

}

}

}

Tocontroltheprecisionofthecardinalitycalculation,wecanspecifytheprecision_thresholdproperty–thehigherthevalue,themoreprecisetheaggregationwillbeandthemoreresourcesitwillneed.Thecurrentmaximumprecision_thresholdvalueis40000andthedefaultdependsontheparentaggregation.Anexamplequeryusingtheprecision_thresholdpropertylooksasfollows:

{

"aggs":{

"card_title":{

"cardinality":{

"field":"title",

"precision_threshold":1000

}

}

}

}

Percentiles

www.EBooksWorld.ir

Page 424: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThepercentilesaggregationisanotherexampleofaggregationinElasticsearch.Itusesanalgorithmicapproximationapproachtoprovideuswithresults.ItusestheT-Digestalgorithm(https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)fromTedDunningandOtmarErtlandallowsustocalculatepercentiles:metricsthatshowushowmanyresultsareaboveacertainvalue.Forexample,the99thpercentileshowsusthevaluethatisgreaterthan99percentoftheothervalues.

Let’sgointoanexampleandlookataquerythatwillcalculatepercentilesfortheyearfieldinourdata:

{

"aggs":{

"copies_percentiles":{

"percentiles":{

"field":"year"

}

}

}

}

TheresultsreturnedbyElasticsearchfortheprecedingrequestwilllookasfollows:

{

"took":26,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies_percentiles":{

"values":{

"1.0":1887.2899999999997,

"5.0":1892.4499999999998,

"25.0":1918.25,

"50.0":1932.5,

"75.0":1942.25,

"95.0":1957.25,

"99.0":1960.25

}

}

}

}

Asyoucansee,thevaluethatishigherthan99percentofthevaluesis1960.25.

Youmaywonderwhysuchaggregationisimportant.Itisveryusefulforperformancemetrics;forexample,whereweusuallylookataveragesforsomeperiodoftime.Imaginethattheaverageresponsetimeofourqueriesforthelasthouris50milliseconds,whichis

www.EBooksWorld.ir

Page 425: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

notbad.However,ifthe95thpercentilewouldshow2seconds,thatwouldmeanthatabout5percentoftheusershadtowaittwoormoresecondsforthesearchresults,whichisnotthatgood.

Bydefault,thepercentilesaggregationcalculatessevenpercentiles:1,5,25,50,75,95,and99.Wecancontrolthisbyusingthepercentspropertyandspecifywhichpercentilesweareinterestedin.Forexample,ifwewanttogetonlythe95thandthe99thpercentile,wechangeourquerytothefollowingone:

{

"aggs":{

"copies_percentiles":{

"percentiles":{

"field":"year",

"percents":["95","99"]

}

}

}

}

NoteSimilartothemin,max,avg,andsumaggregations,thepercentilesaggregationsupportsscriptingandallowsustospecifywhichvalueshouldbeusedforthefieldsthatdon’thavevalueinthespecifiedfield.

We’vementionedearlierthatthepercentilesaggregationusesanalgorithmicapproachandisanapproximation.Aswithallapproximations,wecancontroltheprecisionandmemoryusageofthealgorithm.Wedothatbyusingthecompressionproperty,whichdefaultsto100.ItisaninternalpropertyofElasticsearchanditsimplementationdetailsmaychangebetweenversions.Itisworthknowingthatsettingthecompressionvaluetoonehigherthan100canincreasethealgorithmprecisionatthecostofmemoryusage.

PercentileranksThepercentile_ranksaggregationissimilartothepercentilesonethatwejustdiscussed.Itallowsustoshowwhichpercentileagivenvaluehas.Forexample,toshowuswhichpercentileyear1932andyear1960are,werunthefollowingquery:

{

"aggs":{

"copies_percentile_ranks":{

"percentile_ranks":{

"field":"year",

"values":["1932","1960"]

}

}

}

}

TheresponsereturnedbyElasticsearchwillbeasfollows:

{

"took":2,

www.EBooksWorld.ir

Page 426: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies_percentile_ranks":{

"values":{

"1932.0":49.5,

"1960.0":61.5

}

}

}

}

TophitsaggregationThetop_hitsaggregationkeepstrackofthemostrelevantdocumentbeingaggregated.Thisdoesn’tsoundveryappealing,butitallowsustoimplementoneofthemostdesiredfunctionalitiesinElasticsearchcalleddocumentgrouping,fieldcollapsing,ordocumentfolding.Suchfunctionalityisveryusefulinsomeusecases—forexample,whenwewanttoshowabookcatalogbutonlyonefromasinglepublisher.Todothatwithoutthetop_hitsaggregation,wewouldhavetorunmultiplequeries.Withthetop_hitsaggregation,weneedonlyasinglequery.

Thetop_hitsaggregationwasintroducedinElasticsearch1.3.Infact,thementioneddocumentfoldingismoreorlessasideeffectandonlyoneofthepossibleusageexamplesofthetop_hitsaggregation.

Theideabehindthetop_hitsaggregationissimple.Everydocumentthatisassignedtoaparticularbucketcanbealsoremembered.Bydefault,onlythreedocumentsperbucketareremembered.

NoteNotethat,inordertoshowthefullpotentialofthetop_hitsaggregation,wedecidedtouseoneofthebucketingaggregationsaswellandnestthemtoshowthedocumentgroupingfunctionalityimplementation.Thebucketingaggregationsaredescribedindetaillaterinthischapter.

Toshowyouapotentialusecasethatleveragesthetop_hitsaggregation,wehavedecidedtouseasimpleexample.Wewouldliketogetthemostrelevantbookpublishedevery100years.Todothatweusethefollowingquery:

{

"aggs":{

"when":{

www.EBooksWorld.ir

Page 427: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"_source":{

"include":["title","available"]

},

"size":1

}

}

}

}

}

}

Intheprecedingexample,wedidthehistogramaggregationonyearranges.Everybucketwascreatedforeveryonehundredyears.Thenestedtop_hitsaggregationsremembersasingledocumentwiththegreatestscorefromeachbucket(becauseofthesizepropertybeingsetto1).Weaddedtheincludeoptiononlyforsimplerresults,sothatweonlyreturnthetitleandavailablefieldsforeveryaggregateddocument.TheresponsereturnedbyElasticsearchwillbeasfollows:

{

"took":8,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"when":{

"buckets":[{

"key":1800,

"doc_count":1,

"book":{

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"available":true,

"title":"CrimeandPunishment"

www.EBooksWorld.ir

Page 428: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}]

}

}

},{

"key":1900,

"doc_count":3,

"book":{

"hits":{

"total":3,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"2",

"_score":1.0,

"_source":{

"available":false,

"title":"Catch-22"

}

}]

}

}

}]

}

}

}

Wecanseethat,becauseofthetop_hitsaggregation,wehavethemostscoringdocument(fromeachbucket)includedintheresponse.Inourparticularcase,thequerywasthematch_alloneandallthedocumentshadthesamescore,sothetop-scoringdocumentforeverybucketwasmoreorlessrandom.However,youneedtorememberthatthisisthedefaultbehavior.Ifwewanttohavecustomsorting,thisisnotaproblemforElasticsearch.Wejustneedtoaddthesortpropertyforourtop_hitsaggregator.Forexample,wecanreturnthefirstbookfromagivencentury:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"sort":{

"year":"asc"

},

"_source":{

"include":["title","available"]

},

"size":1

}

}

www.EBooksWorld.ir

Page 429: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

}

}

Weaddedsortingtothetop_hitsaggregation,sotheresultsaresortedonthebasisoftheyearfield.Thismeansthatthefirstdocumentwillbetheonewiththelowestvalueinthatfieldandthisisthedocumentthatisgoingtobereturnedforeachbucket.

Additionalparameters

Sortingandfieldinclusionisnoteverythingthatwecanwedoinsidethetop_hitsaggregation.Becausethisaggregationreturnsdocuments,wecanalsousefunctionalitiessuchas:

highlightingexplainscriptingfielddatafield(uninvertedrepresentationofthefields)version

Wejustneedtoincludeanappropriatesectioninthetop_hitsaggregationbody,similartowhatwedowhenweconstructaquery.Forexample:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"highlight":{

"fields":{

"title":{}

}

},

"explain":true,

"version":true,

"_source":{

"include":["title","available"]

},

"fielddata_fields":["title"],

"script_fields":{

"century":{

"script":"(doc[\"year\"].value/100).intValue()"

}

},

"size":1

}

}

}

}

www.EBooksWorld.ir

Page 430: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

NoteNotethattheprecedingqueryrequirestheinlinescriptstobeallowed.Thismeansthatthequeryrequiresthescript.inlinepropertysettoonintheelasticsearch.ymlfile.

GeoboundsaggregationThegeo_boundsaggregationisasimpleaggregationthatallowsustocomputetheboundingboxthatincludesallthegeo_pointtypefieldvaluesfromtheaggregateddocuments.

NoteIfyouareinterestedinspatialsearches,thesectiondedicatedtoitiscalledGeoandisincludedinChapter8,BeyondFull-textSearching.

Weonlyneedtoprovidethefield(byusingthefieldproperty;itneedstobeofthegeo_pointtype).Wecanalsoprovidewrap_longitude(valuestrueorfalse;itdefaultstotrue)iftheboundingboxisallowedtooverlaptheinternationaldateline.Inresponse,wegetthelatitudeandlongitudeofthetop-leftandbottom-rightcornersoftheboundingbox.Anexamplequeryusingthisaggregationlooksasfollows(usingthehypotheticallocationfield):

{

"aggs":{

"box":{

"geo_bounds":{

"field":"location"

}

}

}

}

ScriptedmetricsaggregationThelastmetricaggregationwewanttodiscussisthescripted_metricaggregation,whichallowsustodefineourownaggregationcalculationusingscripts.Forthisaggregation,wecanprovidethefollowingscripts(map_scriptistheonlyrequiredone,therestareoptional):

init_script:Thisscriptisrunduringinitializationandallowsustosetupaninitialstateofthecalculation.map_script:Thisistheonlyrequiredscript.Itisexecutedonceforeverydocumentthatneedstostorethecalculationinanobjectcalled_agg.combine_script:ThisscriptisexecutedonceoneachshardafterElasticsearchfinishesdocumentcollectiononthatshard.reduce_script:Thisscriptisexecutedonceonthenodethatiscoordinatingaparticularqueryexecution.Thisscripthasaccesstothe_aggsvariable,whichisanarrayofthevaluesreturnedbycombine_script.

www.EBooksWorld.ir

Page 431: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Forexample,wecanusethescripted_metricaggregationtocalculateallthecopiesofallthebookswehaveinourlibrarybyrunningthefollowingrequest(weshowthewholerequesttoshowhowthenamesareescaped):

curl-XGET'localhost:9200/library/_search?size=0&pretty'-d'{

"aggs":{

"all_copies":{

"scripted_metric":{

"init_script":"_agg[\"all_copies\"]=0",

"map_script":"_agg.all_copies+=doc.copies.value",

"combine_script":"return_agg.all_copies",

"reduce_script":"sum=0;for(numberin_aggs){sum+=number};

returnsum"

}

}

}

}'

Ofcourse,theprecedingscriptisjustasimplesumandwecouldusesumaggregation,butwejustwantedtoshowyouasimpleexampleofwhatyoucandowiththescripted_metricaggregation.

NoteNotethattheprecedingqueryrequiresinlinescriptstobeallowed.Thismeansthatthequeryrequiresthescript.inlinepropertysettoonintheelasticsearch.ymlfile.

Asyoucansee,theinit_scriptpartoftheaggregationisusedtoinitializetheall_copiesvariable.Next,wehavemap_script,whichisexecutedonceforeverydocumentandwejustaddthevalueofthecopiesfieldtotheearlierinitializedvariable.Thecombine_scriptpart,executedonceoneachshard,tellsElasticsearchtoreturnthecalculatedvariable.Finally,thereduce_scriptpart,executedonceforthewholequeryontheaggregatornode,willrunaforloop,whichwillgothroughallthereturnedvaluesthatarestoredinthe_aggsarrayandreturnthesumofthose.ThefinalresultreturnedbyElasticsearchfortheprecedingquerylooksasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"all_copies":{

"value":7

}

www.EBooksWorld.ir

Page 432: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

www.EBooksWorld.ir

Page 433: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

BucketsaggregationsThesecondtypeofaggregationsthatwewilldiscussarethebucketsaggregations.Incomparisontometricsaggregations,bucketaggregationreturnsdatanotasasinglemetricbutasalistofkeyvaluepairscalledbuckets.Forexample,thetermsaggregationreturnsthenumberofdocumentsassociatedwitheachterminagivenfield.Theverypowerfulthingaboutbucketsaggregationsisthattheycanhavesub-aggregations,whichmeansthatwecannestotheraggregationsinsidetheaggregationsthatreturnbuckets(wewilldiscussthisattheendofthebucketsaggregationdiscussion).Let’slookatthebucketaggregationsthatareprovidedbyElasticsearchnow.

FilteraggregationThefilteraggregationisasimplebucketingaggregationthatallowsustofiltertheresultstoasinglebucket.Forexample,let’sassumethatwewanttogetacountandtheaveragecopiescountofallthebooksthatarenovels,whichmeanstheyhavethetermnovelinthetagsfield.Thequerythatwillreturnsuchresultslooksasfollows:

{

"aggs":{

"novels_count":{

"filter":{

"term":{

"tags":"novel"

}

},

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

}

}

Asyoucansee,wedefinedthefilterinthefiltersectionoftheaggregationdefinitionandwedefinedasecondnestedaggregation.Thenestedaggregationistheonethatwillberunonthefiltereddocuments.

TheresponsereturnedbyElasticsearchlooksasfollows:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

www.EBooksWorld.ir

Page 434: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"max_score":0.0,

"hits":[]

},

"aggregations":{

"novels_count":{

"doc_count":2,

"avg_copies":{

"value":3.5

}

}

}

}

Inthereturnedbucket,wehaveinformationaboutthenumberofdocuments(representedbythedoc_countproperty)andtheaveragenumberofcopies,whichisallwewanted.

FiltersaggregationThesecondbucketaggregationwewanttoshowyouisthefiltersaggregation.Whilethepreviouslydiscussedfilteraggregationresultedinasinglebucket,thefiltersaggregationreturnsmultiplebuckets–oneforeachofthedefinedfilters.Let’sextendourpreviousexampleandassumethat,inadditiontotheaveragenumberofcopiesforthenovels,wealsowanttoknowtheaveragenumberofcopiesforthebooksthatareavailable.Thequerythatwillgetusthisinformationwillusethefiltersaggregationandwilllookasfollows:

{

"aggs":{

"count":{

"filters":{

"filters":{

"novels":{

"term":{

"tags":"novel"

}

},

"available":{

"term":{

"available":true

}

}

}

},

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

}

}

Let’sstophereandlookatthedefinitionoftheaggregation.Asyoucansee,wedefined

www.EBooksWorld.ir

Page 435: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

twofiltersusingthefilterssectionofthefiltersaggregation.EachfilterhasanameandtheactualElasticsearchfilter;thefirstiscallednovelsandthesecondiscalledavailable.Elasticsearchwillusethesenamesinthereturnedresponse.ThethingtorememberisthatElasticsearchwillcreateabucketforeachdefinedfilterandwillcalculatethenestedaggregationthatwedefined–inourcase,theonethatcalculatestheaveragenumberofcopies.

NoteThefiltersaggregationallowsustoreturnonemorebucketinadditiontothedefinedones–abucketwithallthedocumentsthatdidn’tmatchthefilters.Inordertocalculatesuchabucket,weneedtoaddtheother_bucketpropertytothebodyoftheaggregationandsetittotrue.

TheresultsreturnedbyElasticsearchareasfollows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"count":{

"buckets":{

"novels":{

"doc_count":2,

"avg_copies":{

"value":3.5

}

},

"available":{

"doc_count":2,

"avg_copies":{

"value":0.5

}

}

}

}

}

}

Asyoucansee,wegottwobuckets,whichiswhatweexpected.

TermsaggregationOneofthemostcommonlyusedbucketaggregationsisthetermsaggregation.Itallows

www.EBooksWorld.ir

Page 436: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ustogetinformationaboutthetermsandthecountofdocumentshavingthoseterms.Forexample,oneofthesimplestusesisgettingthecountofthebooksthatareavailableandnotavailable.Wecandothatbyrunningthefollowingquery:

{

"aggs":{

"counts":{

"terms":{

"field":"available"

}

}

}

}

Intheresponse,wewillgettwobuckets(becausetheBooleanfieldcanonlyhavetwovalues–trueandfalse).Here,thiswilllookasfollows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"counts":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"key_as_string":"false",

"doc_count":2

},{

"key":1,

"key_as_string":"true",

"doc_count":2

}]

}

}

}

Bydefault,thedataissortedonthebasisofdocumentcount,whichmeansthatthemostcommontermswillbeplacedontopoftheaggregationresults.Ofcourse,wecancontrolthisbehaviorbyspecifyingtheorderpropertyandprovidingtheorderjustlikeweusuallydowhensortingbyarbitraryfieldvalues.Elasticsearchallowsustosortbythedocumentcount(usingthe_countstaticvalue)andbytheterm(usingthe_termstaticvalue).Forexample,ifwewanttosortourprecedingaggregationresultsbydescendingterm,wecanrunthefollowingquery:

www.EBooksWorld.ir

Page 437: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

{

"aggs":{

"counts":{

"terms":{

"field":"available",

"order":{"_term":"desc"}}

}

}

}

However,that’snotallwhenitcomestosorting.Wecanalsosortbytheresultsofthenestedaggregationsthatwereincludedinthequery.

Notetermsaggregation,similartothemin,max,avg,andsumaggregationsdiscussedinthemetricsaggregationsectionofthischapter,supportsscriptingandallowsustospecifywhichvalueshouldbeusedforthefieldsthatdon’thaveavalueinthespecifiedfield.

Countsareapproximate

Thethingtorememberwhendiscussingtermsaggregationisthatthecountsareapproximate.Thisisbecauseeachshardprovidesitsowncountsandreturnsthataggregatedinformationtothecoordinatingnode.Thecoordinatingnodeaggregatestheinformationitgotreturningthefinalinformationtotheclient.Becauseofthat,dependingonthedataandhowitisdistributedbetweentheshards,someinformationaboutthecountsmaybelostandthecountswillnotbeexact.Ofcourse,whendealingwithlowcardinalityfields,theapproximationwillbeclosertoexactnumbers,butstillthisissomethingthatshouldbeconsideredwhenusingthetermsaggregation.

Wecancontrolhowmuchinformationisreturnedfromeachoftheshardstothecoordinatingnode.Wecandothisbyspecifyingthesizeandtheshard_sizeproperties.Thesizepropertyspecifieshowmanybucketswillbereturnedatmost.Thehigherthesizeproperty,themoreaccuratethecalculationwillbe.However,thatwillcostusadditionalmemoryandCPUcycles,whichmeansthatthecalculationwillbemoreexpensiveandwillputmorepressureonthehardware.Thisisbecausetheresultsreturnedtothecoordinatingnodefromeachshardwillbelargerandtheresultmergingprocesswillbeharder.

Theshard_sizepropertycanbeusedtominimizetheworkthatneedstobedonebythecoordinatingnode.Whenset,thecoordinatingnodewillfetch(fromeachshard)thenumberofbucketsdeterminedbytheshard_sizeproperty.Thisallowsustoincreasetheprecisionoftheaggregationwhileavoidingtheadditionaloverheadonthecoordinatingnode.Rememberthattheshard_sizepropertycannotbesmallerthanthesizeproperty.

Finally,thesizepropertycanbesetto0,whichwilltellElasticsearchnottolimitthenumberofreturnedbuckets.Itisusuallynotwisetosetthesizepropertyto0asitcanresultinhighresourceconsumption.Also,avoidsettingthesizepropertyto0forhighcardinalityfieldsasthiswilllikelymakeyourElasticsearchclusterexplode.

Minimumdocumentcount

www.EBooksWorld.ir

Page 438: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Elasticsearchprovidesuswithtwoadditionalproperties,whichcanbeusefulincertainsituations:min_doc_countandshard_min_doc_count.Themin_doc_countpropertydefaultsto1andspecifieshowmanydocumentsmustmatchatermtobeincludedintheaggregationresults.Onethingtorememberisthatsettingthemin_doc_countpropertyto0willresultinreturningalltheterms,nomatteriftheyhaveamatchingdocumentornot.Thiscanresultinaverylargeresultsetforaggregationresults.Forexample,ifwewanttoreturntermsmatchedby5ormoredocuments,wewillrunthefollowingquery:

{

"aggs":{

"counts":{

"terms":{

"field":"available",

"min_doc_count":5}

}

}

}

Theshard_min_doc_countpropertyisverysimilaranddefineshowmanydocumentsmustmatchatermtobeincludedintheaggregation’sresults,butontheshardlevel.

RangeaggregationTherangeaggregationallowsustodefineoneormorerangesandElasticsearchcalculatesbucketsforthem.Forexample,ifwewanttocheckhowmanybookswerepublishedinagivenperiodoftime,wecreatethefollowingquery:

{

"aggs":{

"years":{

"range":{

"field":"year",

"ranges":[

{"to":1850},

{"from":1851,"to":1900},

{"from":1901,"to":1950},

{"from":1951,"to":2000},

{"from":2001}

]

}

}

}

}

Wespecifythefieldwewanttheaggregationtobecalculatedonandthearrayofranges.Eachrangeisdefinedbyoneortwoproperties:thetwoandfromsimilartotherangequerieswhichwealreadydiscussed.

TheresultreturnedbyElasticsearchforourdatalooksasfollows:

{

"took":23,

"timed_out":false,

"_shards":{

www.EBooksWorld.ir

Page 439: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":"*-1850.0",

"to":1850.0,

"to_as_string":"1850.0",

"doc_count":0

},{

"key":"1851.0-1900.0",

"from":1851.0,

"from_as_string":"1851.0",

"to":1900.0,

"to_as_string":"1900.0",

"doc_count":1

},{

"key":"1901.0-1950.0",

"from":1901.0,

"from_as_string":"1901.0",

"to":1950.0,

"to_as_string":"1950.0",

"doc_count":2

},{

"key":"1951.0-2000.0",

"from":1951.0,

"from_as_string":"1951.0",

"to":2000.0,

"to_as_string":"2000.0",

"doc_count":1

},{

"key":"2001.0-*",

"from":2001.0,

"from_as_string":"2001.0",

"doc_count":0

}]

}

}

}

Forexample,between1901and1950wehadtwobooksreleased.

NoteTherangeaggregation,similartothemin,max,avg,andsumaggregationsdiscussedinthemetricsaggregationssectionofthischapter,supportsscriptingandallowsustospecifywhichvalueshouldbeusedforthefieldsthatdon’thaveavalueinthespecifiedfield.

Keyedbuckets

www.EBooksWorld.ir

Page 440: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Onethingthatshouldmentionwhenitcomestotherangeaggregationisthatwecangivethedefinedrangesnames.Forexample,let’sassumethatwewanttousethenamesBefore18thcenturyforthebooksreleasedbefore1799,18thcenturyforthebooksreleasedbetween1800and1900,19thcenturyforthebooksreleasedbetween1900and1999,andAfter19thcenturyforthebooksreleasedafter2000.Wecandothisbyaddingthekeypropertytoeachdefinedrange,givingitthename,andaddingthekeyedpropertysettotrue.Settingthekeyedpropertytotruewillassociateauniquestringvaluetoeachbucketandthekeypropertydefinesthenameforthebucketthatwillbeusedastheuniquename.Aquerythatdoesthatwilllookasfollows:

{

"aggs":{

"years":{

"range":{

"field":"year",

"keyed":true,

"ranges":[

{"key":"Before18thcentury","to":1799},

{"key":"18thcentury","from":1800,"to":1899},

{"key":"19thcentury","from":1900,"to":1999},

{"key":"After19thcentury","from":2000}

]

}

}

}

}

TheresponsereturnedbyElasticsearchinsuchacasewilllookasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":{

"Before18thcentury":{

"to":1799.0,

"to_as_string":"1799.0",

"doc_count":0

},

"18thcentury":{

"from":1800.0,

"from_as_string":"1800.0",

"to":1899.0,

www.EBooksWorld.ir

Page 441: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"to_as_string":"1899.0",

"doc_count":1

},

"19thcentury":{

"from":1900.0,

"from_as_string":"1900.0",

"to":1999.0,

"to_as_string":"1999.0",

"doc_count":3

},

"After19thcentury":{

"from":2000.0,

"from_as_string":"2000.0",

"doc_count":0

}

}

}

}

}

NoteAnimportantandquiteusefulpointabouttherangeaggregationisthatthedefinedrangesneednotbedisjoint.Insuchcases,Elasticsearchwillproperlycountthedocumentformultiplebuckets.

DaterangeaggregationThedate_rangeaggregationissimilartothepreviouslydiscussedrangeaggregationbutitisdesignedforfieldsthatusedate-basedtypes.However,inthelibraryindex,thedocumentshaveyears,butthefieldisanumber,notadate.Forthepurposeofshowinghowthisaggregationworks,let’simaginethatwewanttoextendourlibraryindextosupportnewspapers.Todothiswewillcreateanewindex(calledlibrary2)byusingthefollowingcommand:

curl-XPOSTlocalhost:9200/_bulk--data-binary'{"index":{"_index":

"library2","_type":"book","_id":"1"}}

{"title":"Fishingnews","published":"2010/12/0310:00:00","copies":3,

"available":true}

{"index":{"_index":"library2","_type":"book","_id":"2"}}

{"title":"Knittingmagazine","published":"2010/11/0711:32:00",

"copies":1,"available":true}

{"index":{"_index":"library2","_type":"book","_id":"3"}}

{"title":"Theguardian","published":"2009/07/1304:33:00","copies":0,

"available":false}

{"index":{"_index":"library2","_type":"book","_id":"4"}}

{"title":"HadoopWorld","published":"2012/01/0104:00:00","copies":6,

"available":true}

'

Forthepurposeofthisexample,wewillleavethemappingsdefinitionforElasticsearch–thisissufficientinthiscase.Let’sstartwiththefirstqueryusingthedate_rangeaggregation:

{

www.EBooksWorld.ir

Page 442: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"aggs":{

"years":{

"date_range":{

"field":"published",

"ranges":[

{"to":"2009/12/31"},

{"from":"2010/01/01","to":"2010/12/31"},

{"from":"2011/01/01"}

]

}

}

}

}

Comparedwiththeordinaryrangeaggregation,theonlythingthatchangedistheaggregationtype,whichisnowdate_range.ThedatescanbepassedasastringinaformrecognizedbyElasticsearchorasanumbervalue(numberofmillisecondssince1970-01-01).TheresponsereturnedbyElasticsearchfortheprecedingquerylooksasfollows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":"*-2009/12/3100:00:00",

"to":1.2622176E12,

"to_as_string":"2009/12/3100:00:00",

"doc_count":1

},{

"key":"2010/01/0100:00:00-2010/12/3100:00:00",

"from":1.262304E12,

"from_as_string":"2010/01/0100:00:00",

"to":1.2937536E12,

"to_as_string":"2010/12/3100:00:00",

"doc_count":2

},{

"key":"2011/01/0100:00:00-*",

"from":1.29384E12,

"from_as_string":"2011/01/0100:00:00",

"doc_count":1

}]

}

}

}

www.EBooksWorld.ir

Page 443: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Asyoucansee,theresponseisnodifferentwhencomparedtotheresponsereturnedbytherangeaggregation.Wehavetwoattributesforeachbucket-namedfromandtowhichrepresentthenumberofmillisecondsfrom1970-01-01.Thepropertiesfrom_as_stringandto_as_stringpresentthesameinformationasfromandto,butinahuman-readableform.Ofcoursethekeyedparameterandkeyinthedefinitionofdaterangeworkinthealreadydescribedway.

Elasticsearchalsoallowsustodefinetheformatofpresenteddatesusingtheformatattribute.Inourexample,wepresentedthedateswithyearresolution,sothedayandtimepartswereunnecessary.Ifwewanttoshowthemonthnames,wecansendaquerysuchasthefollowingone:

{

"aggs":{

"years":{

"date_range":{

"field":"published",

"format":"MMMMYYYY",

"ranges":[

{"to":"December2009"},

{"from":"January2010","to":"December2010"},

{"from":"January2011"}

]

}

}

}

}

Notethatthedatesinthetoandfromparametersalsoneedtobeprovidedinthespecifiedformat.Oneofthereturnedrangeslooksasfollows:

{

"key":"January2010-December2010",

"from":1.262304E12,

"from_as_string":"January2010",

"to":1.2911616E12,

"to_as_string":"December2010",

"doc_count":1

}

NoteTheavailableformatswecanuseinformataredefinedintheJodaTimelibrary.Thefulllistisavailableathttp://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.

Thereisonemorethingaboutthedate_rangeaggregationthatwewanttomention.Imaginethatsometimewemaywanttobuildanaggregationthatcanchangewithtime.Forexample,wemaywanttoseehowmanynewspaperswerepublishedinthelast3,6,9,and12months.Thisispossiblewithouttheneedtoadjustthequeryeverytime,aswecanuseconstantssuchasnow-9M.Thefollowingexampleshowsthis:

{

www.EBooksWorld.ir

Page 444: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"aggs":{

"years":{

"date_range":{

"field":"published",

"format":"dd-MM-YYYY",

"ranges":[

{"to":"now-9M/M"},

{"to":"now-9M"},

{"from":"now-6M/M","to":"now-9M/M"},

{"from":"now-3M/M"}

]

}

}

}

}

Thekeyhereisexpressionssuchasnow-9M.Elasticsearchdoesthemathandgeneratestheappropriatevalue.Forexample,youcanusey(year),M(month),w(week),d(day),h(hour),m(minute),ands(second).Forexample,theexpressionnow+3dmeansthreedaysfromnow.The/Minourexampletakesonlythedateroundedtomonths.Thankstosuchnotation,weonlycountfullmonths.Thesecondadvantageisthatthecalculateddateismorecache-friendlywithouttheroundingdatechangeseverymillisecondthatmakeeverycachebasedontherangeirrelevantandbasicallyuselessinmostcases.

IPv4rangeaggregationAveryinterestingaggregationistheip_rangeoneasitworksonInternetaddresses.ItworksonthefieldsdefinedwiththeiptypeandallowsdefiningrangesgivenbytheIPrangeinCIDRnotation(http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing).Anexampleusageoftheip_rangeaggregationlooksasfollows:

{

"aggs":{

"access":{

"ip_range":{

"field":"ip",

"ranges":[

{"from":"192.168.0.1","to":"192.168.0.254"},

{"mask":"192.168.1.0/24"}

]

}

}

}

}

Theresponsetotheprecedingqueryisasfollows:

"access":{

"buckets":[

{

"from":3232235521,

"from_as_string":"192.168.0.1",

"to":3232235774,

"to_as_string":"192.168.0.254",

"doc_count":0

www.EBooksWorld.ir

Page 445: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

},

{

"key":"192.168.1.0/24",

"from":3232235776,

"from_as_string":"192.168.1.0",

"to":3232236032,

"to_as_string":"192.168.2.0",

"doc_count":4

}

]

}

Similartotherangeaggregation,wedefinebothendsofthebracketsandthemask.TherestisdonebyElasticsearchitself.

MissingaggregationThemissingaggregationallowsustocreateabucketandseehowmanydocumentshavenovalueinaspecifiedfield.Forexample,wecancheckhowmanyofourbooksinthelibraryindexdon’thavetheoriginaltitledefined–theotitlefield.Todothis,werunthefollowingquery:

{

"aggs":{

"missing_original_title":{

"missing":{

"field":"otitle"

}

}

}

}

TheresponsereturnedbyElasticsearchinthiscasewilllookasfollows:

{

"took":15,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"missing_original_title":{

"doc_count":2

}

}

}

Aswecansee,wehavetwodocumentswithouttheotitlefield.

www.EBooksWorld.ir

Page 446: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HistogramaggregationThehistogramaggregationisaninterestingonebecauseofitsautomation.Thisaggregationdefinesbucketsitself.Weareonlyresponsiblefordefiningthefieldandtheinterval,andtherestisdoneautomatically.Thesimplestformofaquerythatusesthisaggregationlooksasfollows:

{

"aggs":{

"years":{

"histogram":{

"field":"year",

"interval":100

}

}

}

}

Thenewinformationweneedtoprovideisinterval,whichdefinesthelengthofeveryrangethatwillbeusedtocreateabucket.Wesettheintervalto100,whichinourcasewillresultinbucketsthatare100yearswide.Theaggregationpartoftheresponsetotheprecedingquerythatwassenttoourlibraryindexisasfollows:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":1800,

"doc_count":1

},{

"key":1900,

"doc_count":3

}]

}

}

}

Similartotherangeaggregation,thehistogramaggregationallowsustousethekeyedpropertytodefinenamedbuckets.Theotheravailableoptionismin_doc_count,whichallowsustospecifytheminimumnumberofdocumentsrequiredtocreateabucket.Ifwesetthemin_doc_countpropertytozero,Elasticsearchwillalsoincludebucketswiththedocumentcountofzero.Wecanalsousethemissingpropertytospecifythevalue

www.EBooksWorld.ir

Page 447: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Elasticsearchshouldusewhenadocumentdoesn’thaveavalueinthespecifiedfield.

www.EBooksWorld.ir

Page 448: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DatehistogramaggregationAsadate_rangeaggregationisaspecializedformoftherangeaggregation,date_histogramisanextensionofthehistogramaggregationthatworksondates.Forthepurposeofthisexample,wewillagainusethedataweindexedwhendiscussingthedateaggregation.Thismeansthatwewillrunourqueriesagainsttheindexcalledlibrary2.Anexamplequeryusingthedate_histogramaggregationlooksasfollows:

{

"aggs":{

"years":{

"date_histogram":{

"field":"published",

"format":"yyyy-MM-ddHH:mm",

"interval":"10d",

"min_doc_count":1}

}

}

}

Thedifferencebetweenthehistogramanddate_histogramaggregationsistheintervalproperty.Thevalueofthispropertyisnowastringdescribingthetimeinterval,whichinourcaseis10days.Ofcoursewecansetittoanythingwewant.Itusesthesamesuffixeswediscussedwhiletalkingaboutformatsinthedate_rangeaggregation.Itisworthmentioningthatthenumbercanbeafloatvalue.Forexample,1.5mmeansthatthelengthofthebucketwillbeoneandahalfminutes.Theformatattributeisthesameasinthedate_rangeaggregation.Thankstoit,Elasticsearchcanaddahuman-readabledatetextaccordingtothedefinedformat.Ofcoursetheformatattributeisnotrequiredbutuseful.Inadditiontothat,similartotheotherrangeaggregations,thekeyedandmin_doc_countattributesstillwork.

TimezonesElasticsearchstoresallthedatesintheUTCtimezone.YoucandefinethetimezonetobeusedbyElasticsearchbyusingthetime_zoneattribute.Bysettingthisproperty,webasicallytellElasticsearchwhichtimezoneshouldbeusedtoperformthecalculations.Therearethreenotationswithwhichtosettheseattributes:

Wecansetthehoursoffset;forexample,time_zone:5Wecanusethetimeformat;forexample,time_zone:"-04:30"Wecanusethenameofthetimezone;forexample,time_zone:"Europe\Warsaw"

NoteLookathttp://joda-time.sourceforge.net/timezones.htmltoseetheavailabletimezones.

www.EBooksWorld.ir

Page 449: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GeodistanceaggregationsThenexttwoaggregationsareconnectedwithmapsandspatialsearches.WewilltalkaboutgeotypesandqueriesintheElasticsearchspatialcapabilitiessectionofChapter8,BeyondFull-textSearching,sofeelfreetoskipthesetwotopicsnowandreturntothemlater.

Lookatthefollowingquery:

{

"aggs":{

"neighborhood":{

"geo_distance":{

"field":"location",

"origin":[-0.1275,51.507222],

"ranges":[

{"to":1200},

{"from":1201}

]

}

}

}

}

Youcanseethatthequeryissimilartotherangeaggregation.Theprecedingaggregationwillcalculatethenumberofdocumentsthatfallintotwobuckets:onecloserthan1200kmandthesecondonefurtherthan1200kmfromthegeographicalpointdefinedbytheoriginproperty(intheprecedingcase,theoriginisLondon).TheaggregationsectionoftheresponsereturnedbyElasticsearchlooksasfollows:

"neighborhood":{

"buckets":[

{

"key":"*-1200.0",

"from":0,

"to":1200,

"doc_count":1

},

{

"key":"1201.0-*",

"from":1201,

"doc_count":4

}

]

}

Thekeyedandthekeyattributesworkinthegeo_distanceaggregationaswell,sowecaneasilymodifytheresponsetoourneedsandcreatenamedbuckets.

Thegeo_distanceaggregationsupportsafewadditionalparametersthatareshowninthefollowingquery:

{

"aggs":{

www.EBooksWorld.ir

Page 450: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"neighborhood":{

"geo_distance":{

"field":"location",

"origin":{"lon":-0.1275,"lat":51.507222},

"unit":"m",

"distance_type":"plane",

"ranges":[

{"to":1200},

{"from":1201}

]

}

}

}

}

Wehavehighlightedthreethingsintheprecedingquery.Thefirstchangeishowwedefinedtheoriginpoint.Thistimewespecifiedthelocationbyprovidingthelatitudeandlongitudeexplicitly.

Thesecondchangeistheunitattribute.Itdefinestheunitsusedintherangesarray.Thepossiblevaluesare:km(thedefault,kilometers),mi(miles),in(inches),yd(yards),m(meters),cm(centimeters),andmm(millimeters).

Thelastattribute,distance_type,specifieshowElasticsearchcalculatesthedistance.Thepossiblevaluesare(fromthefastestbutleastaccuratetotheslowestbutthemostaccurate):plane,sloppy_arc(thedefault),andarc.

www.EBooksWorld.ir

Page 451: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GeohashgridaggregationThesecondaggregationrelatedtogeographicalanalysisisbasedongridsandiscalledgeohash_grid.Itorganizesareasintogridsandassignseverylocationtoacellinsuchagrid.Todothisefficiently,ElasticsearchusesGeohash(http://en.wikipedia.org/wiki/Geohash),whichencodesthelocationintoastring.Thelongerthestringis,themoreaccuratethedescriptionofaparticularlocation.Forexample,oneletterissufficienttodeclareaboxofaboutfivethousandsquarekilometersand5lettersareenoughtoincreasetheaccuracytofivesquarekilometers.Let’slookatthefollowingquery:

{

"aggs":{

"neighborhood":{

"geohash_grid":{

"field":"location",

"precision":5

}

}

}

}

Wedefinedthegeohash_gridaggregationwithbucketsthathaveaprecisionoffivesquarekilometers(theprecisionattributedescribesthenumberoflettersusedinthegeohashstringobject).Thetablewithresolutionsversusthelengthofgeohashcanbefoundathttps://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-geohashgrid-aggregation.html.

Ofcourse,themoreaccuratewewanttheaggregationtobe,themoreresourcesElasticsearchwillconsume,becauseofthenumberofbucketsthattheaggregationhastocalculate.Bydefault,Elasticsearchdoesnotgeneratemorethan10,000buckets.Youcanchangethisbehaviorbyusingthesizeattribute,butkeepinmindthattheperformancemaysufferforverywidequeriesconsistingofthousandsofbuckets.

www.EBooksWorld.ir

Page 452: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GlobalaggregationTheglobalaggregationisanaggregationthatdefinesasinglebucketcontainingallthedocumentsfromagivenindexandtype,andnotinfluencedbythequeryitself.Thethingthatdifferentiatestheglobalaggregationfromalltheothersisthattheglobalaggregationhasanemptybody.Forexample,lookatthefollowingquery:

{

"query":{

"term":{

"available":"true"

}

},

"aggs":{

"all_books":{

"global":{}

}

}

}

Inourlibraryindex,weonlyhavetwoavailablebooks,buttheresponsetotheprecedingquerylooksasfollows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":3,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"all_books":{

"doc_count":4

}

}

}

Asyoucansee,theglobalaggregationisnotboundbythequery.Becausetheresultoftheglobalaggregationisasinglebucketcontainingallthedocuments(notnarroweddownbythequeryitself),itisaperfectcandidateforuseasatop-levelparentaggregationfornestingaggregations.

www.EBooksWorld.ir

Page 453: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SignificanttermsaggregationThesignificant_termsaggregationallowsustogetthetermsthatarerelevantandprobablythemostsignificantforagivenquery.Thegoodthingisthatitdoesn’tonlyshowthetoptermsfromtheresultsofthegivenquery,butalsotheonethatseemstobethemostimportantone.

Theusecasesforthisaggregationtypecanvaryfromfindingthemosttroublesomeserverworkinginyourapplicationenvironment,tosuggestingnicknamesfromtext.WheneverElasticsearchseesasignificantchangeinthepopularityofaterm,suchatermisacandidateforbeingsignificant.

NoteRememberthatthesignificant_termsaggregationisveryexpensivewhenitcomestoresourcesandrunningagainstlargeindices.Workisbeingdonetoprovidealightweightversionofthataggregation;asaresult,theAPIforsignificant_termsaggregationmaychangeinthefuture.

Thebestwaytodescribethesignificant_termsaggregationtypeistouseanexample.Let’sstartwithindexing12simpledocuments,whichrepresentreviewsofworkdonebyinterns:

curl-XPOST'localhost:9200/interns/review/1'-d'{"intern":"Richard",

"grade":"bad","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/2'-d'{"intern":"Ralf",

"grade":"perfect","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/3'-d'{"intern":"Richard",

"grade":"bad","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/4'-d'{"intern":"Richard",

"grade":"bad","type":"review"}'

curl-XPOST'localhost:9200/interns/review/5'-d'{"intern":"Richard",

"grade":"good","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/6'-d'{"intern":"Ralf",

"grade":"good","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/7'-d'{"intern":"Ralf",

"grade":"perfect","type":"review"}'

curl-XPOST'localhost:9200/interns/review/8'-d'{"intern":"Richard",

"grade":"medium","type":"review"}'

curl-XPOST'localhost:9200/interns/review/9'-d'{"intern":"Monica",

"grade":"medium","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/10'-d'{"intern":"Monica",

"grade":"medium","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/11'-d'{"intern":"Ralf",

"grade":"good","type":"grade"}'

curl-XPOST'localhost:9200/interns/review/12'-d'{"intern":"Ralf",

"grade":"good","type":"grade"}'

Ofcourse,toshowtherealpowerofthesignificant_termsaggregation,weshoulduseawaylargerdataset.However,forthepurposeofthisbook,wewillconcentrateonthisexample,soitiseasiertoillustratehowthisaggregationworks.

Nowlet’stryfindingthemostsignificantgradeforRichard.Todothiswewillusethe

www.EBooksWorld.ir

Page 454: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

followingquery:

curl-XGET'localhost:9200/interns/_search?size=0&pretty'-d'{

"query":{

"match":{

"intern":"Richard"

}

},

"aggregations":{

"description":{

"significant_terms":{

"field":"grade"

}

}

}

}'

Theresultoftheprecedingquerylooksasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":5,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"description":{

"doc_count":5,

"buckets":[{

"key":"bad",

"doc_count":3,

"score":0.84,

"bg_count":3

}]

}

}

}

Asyoucansee,forourqueryElasticsearchinformedusthatthemostsignificantgradeforRichardisbad.Maybeitwasn’tthebestinternshipforhim;whoknows.

ChoosingsignificanttermsTocalculatesignificantterms,Elasticsearchlooksfordatathatreportsasignificantchangeintheirpopularitybetweentwosetsofdata:theforegroundsetandthebackgroundset.Theforegroundsetisthedatareturnedbyourquery,whilethebackgroundsetisthedatainourindex(orindices,dependingonhowwerunourqueries).Ifatermexistsin10documentsoutofonemillionindexed,butappearsin5documentsfromthe10returned,

www.EBooksWorld.ir

Page 455: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

thensuchatermisdefinitelysignificantandworthconcentratingon.

Let’sgetbacktoourprecedingexamplenowtoanalyzeitabit.Richardgotthreegradesfromthereviewers–badthreetimes,mediumonetime,andgoodonetime.Fromthesethree,thebadvalueappearedinthreeoutofthefivedocumentsmatchingthequery.Ingeneral,thebadgradeappearedinthreedocuments(thebg_countproperty)outofthe12documentsintheindex(thisisourbackgroundset).Thisgivesus25percentoftheindexeddocuments.Ontheotherhand,thebadgradeappearedinthreeoutofthefivedocumentsmatchingthequery(thisisourforegroundset),whichgivesus60percentofthedocuments.Asyoucansee,thechangeinpopularityissignificantforthebadgradeandthat’swhyElasticsearchhasreturneditinthesignificant_termsaggregationresults.

MultiplevalueanalysisThesignificant_termsaggregationcanbenestedandprovideuswithnicedataanalysiscapabilitiesthatconnecttwomultiplesetsofdata.Forexample,let’strytofindasignificantgradeforeachoftheinternsthatwehaveinformationabout.Todothiswewillnestthesignificant_termsaggregationinsidethetermsaggregation.Thequerythatdoesthatlooksasfollows:

curl-XGET'localhost:9200/interns/_search?size=0&pretty'-d'{

"aggregations":{

"grades":{

"terms":{

"field":"intern"

},

"aggregations":{

"significantGrades":{

"significant_terms":{

"field":"grade"

}

}

}

}

}

}'

TheresultsreturnedbyElasticsearchfortheprecedingqueryareasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":12,

"max_score":0.0,

"hits":[]

},

"aggregations":{

www.EBooksWorld.ir

Page 456: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"grades":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"ralf",

"doc_count":5,

"significantGrades":{

"doc_count":5,

"buckets":[{

"key":"good",

"doc_count":3,

"score":0.48,

"bg_count":4

}]

}

},{

"key":"richard",

"doc_count":5,

"significantGrades":{

"doc_count":5,

"buckets":[{

"key":"bad",

"doc_count":3,

"score":0.84,

"bg_count":3

}]

}

},{

"key":"monica",

"doc_count":2,

"significantGrades":{

"doc_count":2,

"buckets":[]

}

}]

}

}

}

www.EBooksWorld.ir

Page 457: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SampleraggregationThesampleraggregationisoneoftheexperimentalaggregationsinElasticsearch.Itallowsustolimitthesubaggregationprocessingtoasampleofdocumentsthataretop-scoringones.Thisallowsfilteringandpotentialremovalofgarbageinthedata.Itisaverynicecandidateasatop-levelaggregationtolimittheamountofdatathesignificant_termsaggregationrunson.Thesimplestexampleofusingthisaggregationisasfollows:

{

"aggs":{

"sampler_example":{

"sampler":{

"field":"tags",

"max_docs_per_value":1,

"shard_size":10

},

"aggs":{

"best_terms":{

"terms":{

"field":"title"

}

}

}

}

}

}

Toseetherealpowerofsampling,wewillhavetoplaywithitonalargerdataset,butfornowwewilldiscusstheprecedingexample.Thesampleraggregationwasdefinedwiththreeproperties:field,max_docs_per_value,andshard_size.Thefirsttwopropertiesallowustocontrolthediversityofthesampling.WetellElasticsearchhowmanydocumentsatmaximum(thevalueofthemax_doc_per_valueproperty)canbecollectedonashardwiththesamevalueinthedefinedfield(thevalueofthefieldproperty).

Theshard_sizepropertytellsElasticsearchhowmanydocuments(atmost)tocollectfromeachshard.

www.EBooksWorld.ir

Page 458: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ChildrenaggregationThechildrenaggregationisasingle-bucketaggregationthatcreatesabucketwithallthechildrenofthespecifiedtype.Let’sgetbacktotheUsingtheparent-childrelationshipsectioninChapter5,ExtendingYourIndexStructure,andlet’susethecreatedshopindex.Tocreateabucketofallchildrendocumentswiththevariationtypeintheshopindex,werunthefollowingquery:

{

"aggs":{

"variation_children":{

"children":{

"type":"variation"

}

}

}

}

TheresponsereturnedbyElasticsearchisasfollows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":3,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variation_children":{

"doc_count":2

}

}

}

NoteBecausethechildrenaggregationusesparent–childfunctionality,itreliesonthe_parentfield,whichneedstobepresent.

www.EBooksWorld.ir

Page 459: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NestedaggregationIntheUsingnestedobjectssectionofChapter5,ExtendingYourIndexStructure,welearnedaboutnesteddocuments.Let’susethatdatatolookintothenexttypeofaggregation–thenestedone.Let’screatethesimplestworkingquery,whichlookslikethis(weusetheshop_nestedindexcreatedinthementionedchapter):

{

"aggs":{

"variations":{

"nested":{

"path":"variation"

}

}

}

}

Theprecedingqueryissimilarinstructuretoanyotheraggregation.However,insteadofprovidingthefieldnameonwhichtheaggregationshouldbecalculated,itcontainsasingleparameterpath,whichpointstothenesteddocument.Intheresponsewegetanumberofnesteddocuments:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variations":{

"doc_count":2

}

}

}

Theprecedingresponsemeansthatwehavetwonesteddocumentsintheindex,withtheprovidedtypevariation.

www.EBooksWorld.ir

Page 460: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ReversenestedaggregationThereverse_nestedaggregationisaspecial,single-bucketaggregationthatallowsaggregationonparentdocumentsfromthenesteddocuments.Thereverse_nestedaggregationdoesn’thaveabodysimilartoglobalaggregation.Soundsquitecomplicated,butitisnot.Let’slookatthefollowingquerythatwerunagainsttheshop_nestedindexcreatedinChapter5,ExtendingYourIndexStructureintheUsingnestedobjectssection:

{

"aggs":{

"variations":{

"nested":{

"path":"variation"

},

"aggs":{

"sizes":{

"terms":{

"field":"variation.size"

},

"aggs":{

"product_name_terms":{

"reverse_nested":{},

"aggs":{

"product_name_terms_per_size":{

"terms":{

"field":"name"

}

}

}

}

}

}

}

}

}

}

Westartwiththetoplevelaggregation,whichisthesamenestedaggregationthatweusedwhendiscussingthenestedaggregation.However,weincludeasub-aggregationthatusesreverse_nestedtobeabletoshowtermsfromthetitleforeachsizereturnedbythetop-levelnestedaggregation.Thisispossiblebecause,whenthereverse_nestedaggregationisused,Elasticsearchcalculatesthedataonthebasisoftheparentdocumentsinsteadofusingthenesteddocuments.

NoteRememberthatthereverse_nestedaggregationmustbeusedinsidethenestedaggregation.

Theresponsetotheprecedingquerywilllookasfollows:

{

"took":7,

www.EBooksWorld.ir

Page 461: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variations":{

"doc_count":2,

"sizes":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"XL",

"doc_count":1,

"product_name_terms":{

"doc_count":1,

"product_name_terms_per_size":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"shirt",

"doc_count":1

},{

"key":"test",

"doc_count":1

}]

}

}

},{

"key":"XXL",

"doc_count":1,

"product_name_terms":{

"doc_count":1,

"product_name_terms_per_size":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"shirt",

"doc_count":1

},{

"key":"test",

"doc_count":1

}]

}

}

}]

}

}

}

}

www.EBooksWorld.ir

Page 462: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NestingaggregationsandorderingbucketsWhentalkingaboutbucketaggregations,wejustneedtogetbacktothetopicofnestingaggregations.Thisisaverypowerfultechnique,becauseitallowsyoutofurtherprocessthedatafordocumentsinthebuckets.Forexample,thetermsaggregationwillreturnabucketforeachtermandthestatsaggregationcanshowusthestatisticsfordocumentsineachbucket.Forexample,let’slookatthefollowingquery:

{

"aggs":{

"copies":{

"terms":{

"field":"copies"

},

"aggs":{

"years":{

"stats":{

"field":"year"

}

}

}

}

}

}

Thisisanexampleofnestedaggregations.Thetermsaggregationwillreturnbucketsforeachtermfromthecopiesfield(threebucketsinthecaseofourdata),andthestatsaggregationwillcalculatestatisticsfortheyearfieldforthedocumentsfallingintoeachbucketreturnedbythetopaggregation.TheresponsefromElasticsearchfortheprecedingquerylooksasfollows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"doc_count":2,

"years":{

"count":2,

www.EBooksWorld.ir

Page 463: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"min":1886.0,

"max":1936.0,

"avg":1911.0,

"sum":3822.0

}

},{

"key":1,

"doc_count":1,

"years":{

"count":1,

"min":1929.0,

"max":1929.0,

"avg":1929.0,

"sum":1929.0

}

},{

"key":6,

"doc_count":1,

"years":{

"count":1,

"min":1961.0,

"max":1961.0,

"avg":1961.0,

"sum":1961.0

}

}]

}

}

}

Thisisapowerfulfeatureandallowsustobuildverycomplexdataprocessingpipelines.Ofcourse,wearenotlimitedtoasinglenestedaggregationandwecannestmultipleofthemandevennestanaggregationinsideanestedaggregation.Forexample:

{

"aggs":{

"popular_tags":{

"terms":{

"field":"copies"

},

"aggs":{

"years":{

"terms":{

"field":"year"

},

"aggs":{

"available_by_year":{

"stats":{

"field":"available"

}

}

}

},

"available":{

"stats":{

"field":"available"

www.EBooksWorld.ir

Page 464: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

}

}

}

}

Asyoucansee,thepossibilitiesarealmostunlimited,ifyouhaveenoughmemoryandCPUpowertohandleverycomplicatedaggregations.

BucketsorderingThereisonemorefeatureaboutnestedaggregationsandtheorderingofaggregationresults.Elasticsearchcanusevaluesfromthenestedaggregationstosorttheparentbuckets.Forexample,let’slookatthefollowingquery:

{

"aggs":{

"availability":{

"terms":{

"field":"copies",

"order":{"numbers.avg":"desc"}

},

"aggs":{

"numbers":{"stats":{}}

}

}

}

}

Inthepreviousexample,theorderintheavailabilityaggregationisbasedontheaveragevaluefromthenumbersaggregation.Thenotationnumbers.avgisrequiredinthiscase,becausestatsisamultivaluedaggregationandprovidesmultipleinformationandwewereinterestedintheaverage.Ifitwerethesumaggregation,thenameoftheaggregationwouldbesufficient.

www.EBooksWorld.ir

Page 465: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 466: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PipelineaggregationsThelasttypeofaggregationwewilldiscussispipelineaggregations.Tillnowwe’velearnedaboutmetricsaggregationsandbucketaggregations.Thefirstonereturnedmetricswhilethesecondtypereturnedbuckets.Andbothmetricsandbucketsaggregationsworkedonthebasisofreturneddocuments.Pipelineaggregationsaredifferent.Theyworkontheoutputoftheotheraggregationsandtheirmetrics,allowingfunctionalitiessuchasmoving-averagecalculations(https://en.wikipedia.org/wiki/Moving_average).

NoteRememberthatpipelineaggregationswereintroducedinElasticsearch2.0andareconsideredexperimental.ThismeansthattheAPIcanchangeinthefuture,breakingbackwards-compatibility.

www.EBooksWorld.ir

Page 467: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AvailabletypesTherearetwotypesofpipelineaggregation.Thesocalledparentaggregationsfamilyworksontheoutputofotheraggregations.Theyareabletoproducenewbucketsornewaggregationstoaddtoexistingbuckets.Thesecondtypeiscalledsiblingaggregationsandtheseaggregationsareabletoproducenewaggregationsonthesamelevel.

www.EBooksWorld.ir

Page 468: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ReferencingotheraggregationsBecauseoftheirnature,thepipelineaggregationsneedtobeabletoaccesstheresultsoftheotheraggregations.Wecandothatviathebuckets_pathproperty,whichisdefinedusingaspecifiedformat.WecanuseafewkeywordsthatallowustotellElasticsearchexactlywhichaggregationandmetricweareinterestedin.The>separatestheaggregationsandthe.characterseparatestheaggregationfromitsmetrics.Forexample,my_sum.summeansthatwetakethesummetricofanaggregationcalledmy_sum.Anotherexampleispopular_tags>my_sum.sum,whichmeansthatweareinterestedinthesummetricofasubaggregationcalledmy_sum,whichisnestedinsidethepopular_tagsaggregation.Inadditiontothis,wecanuseaspecialpathcalled_count.Thiscanbeusedtocalculatethepipelineaggregationsondocumentcountinsteadofspecifiedmetrics.

www.EBooksWorld.ir

Page 469: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GapsinthedataOurdatacancontaingaps–situationswherethedatadoesn’texist.Forsuchusecases,wehavetheabilitytospecifythegap_policypropertyandsetittoskiporinsert_zeros.TheskipvaluetellsElasticsearchtoignorethemissingdataandcontinuefromthenextavailablevalue,whileinsert_zerosreplacesthemissingvalueswithzero.

www.EBooksWorld.ir

Page 470: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PipelineaggregationtypesMostoftheaggregationswewillshowinthissectionareverysimilartotheoneswe’vealreadyseeninthesectionsaboutmetricsandbucketsaggregations.Becauseofthat,wewon’tdiscussthemindepth.Therearealsonew,specificpipelineaggregationsthatwewanttotalkaboutinalittlemoredata.

Min,max,sum,andaveragebucketaggregationsThemin_bucket,max_bucket,sum_bucket,andavg_bucketaggregationsaresiblingaggregations,similarinwhattheyreturntothemin,max,sum,andavgaggregations.However,insteadofworkingonthedatareturnedbythequery,theyworkontheresultsoftheotheraggregations.

Toshowyouasimpleexampleofhowthisaggregationworks,let’scalculatethesumofallthebucketsreturnedbytheotheraggregations.Thequerythatwilldothatlooksasfollows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

}

}

},

"sum_copies":{

"sum_bucket":{

"buckets_path":"periods_histogram>copies_per_100_years"

}

}

}

}

Asyoucansee,weusedthehistogramaggregationandweincludedanestedaggregationthatcalculatesthesumofthecopiesfield.Oursum_bucketsiblingaggregationisusedoutsidethemainaggregationandreferstoitusingthebuckets_pathproperty.IttellsElasticsearchthatweareinterestedinsummingthevaluesofmetricsreturnedbythecopies_per_100_yearsaggregation.TheresultreturnedbyElasticsearchforthisquerylooksasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

www.EBooksWorld.ir

Page 471: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

}

}]

},

"sum_copies":{

"value":7.0

}

}

}

Asyoucansee,Elasticsearchaddedanotherbuckettotheresults,calledsum_copies,whichholdsthevaluewewereinterestedin.

CumulativesumaggregationThecumulative_sumaggregationisaparentpipelineaggregationthatallowsustocalculatethesuminthehistogramordate_histogramaggregation.Asimpleexampleoftheaggregationlooksasfollows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"cumulative_copies_sum":{

"cumulative_sum":{

"buckets_path":"copies_per_100_years"

}

www.EBooksWorld.ir

Page 472: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

}

}

}

Becausethisaggregationisaparentpipelineaggregation,itisdefinedinthesubaggregations.Thereturnedresultlooksasfollows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

},

"cumulative_copies_sum":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"cumulative_copies_sum":{

"value":7.0

}

}]

}

}

}

Thefirstcumulative_copies_sumis0becauseofthesumdefinedinthebucket.Thesecondisthesumofallthepreviousonesandthecurrentbucket,whichmeans7.Thenextwillbethesumofallthepreviousonesandthenextbucket.

BucketselectoraggregationThebucket_selectoraggregationisanothersiblingparentaggregation.Itallowsusingascripttodecideifabucketshouldberetainedintheparentmulti-bucketaggregation.For

www.EBooksWorld.ir

Page 473: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

example,tokeeponlybucketsthathavemorethanonecopyperperiod,wecanrunthefollowingquery(itneedsthescript.inlinepropertytobesettoonintheelasticsearch.ymlfile):

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"remove_empty_buckets":{

"bucket_selector":{

"buckets_path":{

"sum_copies":"copies_per_100_years"

},

"script":"sum_copies>1"

}

}

}

}

}

}

Therearetwoimportantthingshere.Thefirstisthebuckets_pathproperty,whichisdifferenttowhatwe’veusedsofar.Nowitusesakeyandavalue.Thekeyisusedtoreferencethevalueinthescript.Thesecondimportantthingisthescriptproperty,whichdefinesthescriptthatdecidesiftheprocessedbucketshouldberetained.TheresultsreturnedbyElasticsearchinthiscaseareasfollows:

{

"took":330,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

www.EBooksWorld.ir

Page 474: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"value":7.0

}

}]

}

}

}

Aswecansee,thebucketwiththecopies_per_100_yearsvalueequalto0hasbeenremoved.

BucketscriptaggregationThebucket_scriptaggregation(siblingparent)allowsustodefinemultiplebucketpathsandusetheminsideascript.Theusedmetricsmustbethenumerictypeandthereturnedvaluealsoneedstobenumeric.Anexampleofusingthisaggregationfollows(thefollowingqueryneedsthescript.inlinepropertytobesettoonintheelasticsearch.ymlfile):

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"stats_per_100_years":{

"stats":{

"field":"copies"

}

},

"example_bucket_script":{

"bucket_script":{

"buckets_path":{

"sum_copies":"copies_per_100_years",

"count":"stats_per_100_years.count"

},

"script":"sum_copies/count*1000"

}

}

}

}

}

}

Therearetwothingshere.Thefirstthingisthatwe’vedefinedtwoentriesinthebuckets_pathproperty.Weareallowedtodothatinthebucket_scriptaggregation.Eachentryisakeyandavalue.Thekeyisthenameofthevaluethatwecanuseinthescript.Thesecondisthepathtotheaggregationmetricweareinterestedin.Ofcourse,the

www.EBooksWorld.ir

Page 475: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

scriptpropertydefinesthescriptthatreturnsthevalue.

Thereturnedresultsfortheprecedingqueryareasfollows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

},

"stats_per_100_years":{

"count":1,

"min":0.0,

"max":0.0,

"avg":0.0,

"sum":0.0

},

"example_bucket_script":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"stats_per_100_years":{

"count":3,

"min":0.0,

"max":6.0,

"avg":2.3333333333333335,

"sum":7.0

},

"example_bucket_script":{

"value":2333.3333333333335

}

}]

}

}

}

www.EBooksWorld.ir

Page 476: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SerialdifferencingaggregationTheserial_diffaggregationisaparentpipelineaggregationthatimplementsatechniquewherethevaluesintimeseriesdata(suchasahistogramordatehistogram)aresubtractedfromthemselvesatdifferenttimeperiods.Thistechniqueallowsdrawingthedatachangesbetweentimeperiodsinsteadofdrawingthewholevalue.Youknowthatthepopulationofacitygrowswithtime.Ifweusetheserialdifferencingaggregationwiththeperiodofoneday,wecanseethedailygrowth.

Tocalculatetheserial_diffaggregation,weneedtheparentaggregation,whichisahistogramoradate_histogram,andweneedtoprovideitwithbuckets_path,whichpointstothemetricweareinterestedin,andlag(apositive,non-zerointegervalue),whichtellswhichpreviousbuckettosubtractfromthecurrentone.Wecanomitlag,inwhichcaseElasticsearchwillsetitto1.

Let’snowlookatasimplequerythatusesthediscussedaggregation:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"first_difference":{

"serial_diff":{

"buckets_path":"copies_per_100_years",

"lag":1

}

}

}

}

}

}

Theresponsetotheprecedingquerylooksasfollows:

{

"took":68,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

www.EBooksWorld.ir

Page 477: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"first_difference":{

"value":7.0

}

}]

}

}

}

Asyoucansee,withthesecondbucketwegotouraggregation(wewillgetitwitheverybucketafterthataswell).Thecalculatedvalueis7becausethecurrentvalueofcopies_per_100_yearsis7andthepreviousis0.Subtracting0from7givesus7.

DerivativeaggregationThederivativeaggregationisanotherexampleofparentpipelineaggregation.Asitsnamesuggests,itcalculatesaderivative(https://en.wikipedia.org/wiki/Derivative)ofagivenmetricfromahistogramordatehistogram.Theonlythingweneedtoprovideisbuckets_path,whichpointstothemetricweareinterestedin.Anexamplequeryusingthisaggregationlooksasfollows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"derivative_example":{

"derivative":{

"buckets_path":"copies_per_100_years"

}

}

}

www.EBooksWorld.ir

Page 478: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}

}

MovingavgaggregationThelastpipelineaggregationthatwewanttodiscussisthemoving_avgone.Itcalculatesthemovingaveragemetric(https://en.wikipedia.org/wiki/Moving_average)overthebucketsoftheparentaggregation(yes,thisisaparentpipelineaggregation).Similartothefewpreviouslydiscussedaggregations,itneedstoberunontheparenthistogramordatehistogramaggregation.

Whencalculatingthemovingaverage,Elasticsearchwilltakethewindow(specifiedbythewindowpropertyandsetto5bydefault),calculatetheaverageforbucketsinthewindow,movethewindowonebucketfurther,andrepeat.Ofcoursewealsoneedtoprovidebuckets_path,whichpointstothemetricthatthemovingaverageshouldbecalculatedfor.

Anexampleofusingthisaggregationlooksasfollows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10

},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}

},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years"

}

}

}

}

}

}

Wewillomitincludingtheresponsefortheprecedingqueryasitisquitelarge.

Predictingfuturebuckets

Theverynicethingaboutmovingaverageaggregationisthatitsupportspredictions;itcanattempttoextrapolatethedataithasandcreatefuturebuckets.Toforcetheaggregationtopredictbuckets,wejustneedtoaddthepredictpropertytoanymovingaverageaggregationandsetittothenumberofpredictionswewanttoget.Forexample,ifwewanttoaddfivepredictionstotheprecedingquery,wewillchangeittolookasfollows:

www.EBooksWorld.ir

Page 479: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10

},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}

},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years",

"predict":5

}

}

}

}

}

Ifyoulookattheresultsandcomparetheresponsereturnedforthepreviousquerywiththeonewithpredictions,youwillnoticethatthelastbucketinthepreviousqueryendsonthekeypropertyequalto1960,whilethequerywithpredictionsendsonthekeypropertyequalto2010,whichisexactlywhatwewantedtoachieve.

Themodels

Bydefault,Elasticsearchusesthesimplestmodelforcalculatingthemovingaveragesaggregation,butwecancontrolthatbyspecifyingthemodelproperty;thispropertyholdsthenameofthemodelandthesettingsobject,whichwecanusetoprovidemodelproperties.

Thepossiblemodelsare:simple,linear,ewma,holt,andholt_winters.Discussingeachofthemodelsindetailisbeyondthescopeofthebook,soifyouareinterestedindetailsaboutthedifferentmodels,refertotheofficialElasticsearchdocumentationregardingthemovingaveragesaggregationavailableathttps://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-movavg-aggregation.html.

Anexamplequeryusingdifferentmodellooksasfollows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

www.EBooksWorld.ir

Page 480: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years",

"model":"holt",

"settings":{

"alpha":0.6,

"beta":0.4

}

}

}

}

}

}

}

www.EBooksWorld.ir

Page 481: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 482: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryThechapterwejustfinishedwasallaboutdataanalysisinElasticsearch:theaggregationsengine.Welearnedwhattheaggregationsareandhowtheywork.Weusedmetrics,buckets,andnewlyintroducedpipelineaggregations,andlearnedwhatwecandowiththem.

Inthenextchapter,we’llgobeyondfulltextsearching.Wewillusesuggesterstobuildefficientautocompletefunctionalityandcorrecttheusers’spellingmistakes.Wewillseewhatpercolationisandhowtouseitinourapplication.WewillusethegeospatialabilitiesofElasticsearchandwe’lllearnhowtoefficientlyfetchlargeamountofdatafromElasticsearch.

www.EBooksWorld.ir

Page 483: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 484: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter8.BeyondFull-textSearchingThepreviouschapterwasfullydedicatedtodataanalysisandhowwecanperformitwithElasticsearch.Welearnedhowtouseaggregations,whattypesofaggregationareavailable,andwhataggregationsareavailablewithineachtypeandhowtousethem.Inthischapter,wewillgetbacktoqueryrelatedtopics.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

WhatispercolatorandhowtouseitWhatarethegeospatialcapabilitiesofElasticsearchHowtouseandbuildfunctionalitiesusingElasticsearchsuggestersHowtousetheScrollAPItoefficientlyfetchlargenumbersofresults

www.EBooksWorld.ir

Page 485: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PercolatorHaveyoueverwonderedwhatwouldhappenifwereversethetraditionalmodelofusingqueriestofinddocumentsinElasticsearch?Doesitmakesensetohaveadocumentandsearchforqueriesmatchingit?Itisnotsurprisingthatthereisawholerangeofsolutionswherethismodelisveryuseful.Wheneveryouoperateonanunboundedstreamofinputdata,whereyousearchfortheoccurrencesofparticularevents,youcanusethisapproach.Thiscanbeusedforthedetectionoffailuresinamonitoringsystemorforthe“Tellmewhenaproductwiththedefinedcriteriawillbeavailableinthisshop”functionality.Inthissection,wewilllookathowanElasticsearchpercolatorworksandhowwecanuseittoimplementoneoftheaforementionedusecases.

www.EBooksWorld.ir

Page 486: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheindexInalltheexamplestobeusedwhendiscussingpercolatorfunctionality,wewilluseanindexcallednotifier.Thementionedindexiscreatedbyusingthefollowingcommand:

curl-XPOST'localhost:9200/notifier'-d'{

"mappings":{

"book":{

"properties":{

"title":{

"type":"string"

},

"otitle":{

"type":"string"

},

"year":{

"type":"integer"

},

"available":{

"type":"boolean"

},

"tags":{

"type":"string",

"index":"not_analyzed"

}

}

}

}

}'

Itisquitesimple.Itcontainsasingletypeandfivefields,whichwillbeusedduringourjourneythroughtheworld.

www.EBooksWorld.ir

Page 487: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PercolatorpreparationElasticsearchexposesaspecialtypecalled.percolatorthatistreateddifferently.Thismeansthatwecanstoreanydocumentsandalsosearchthemlikeanordinarytypeinanyindex.IfyoulookatanyElasticsearchquery,youwillnoticethateachisavalidJSONdocument,whichmeansthatwecanindexandstoreitasadocumentaswell.Thethingisthatpercolatorallowsustoinversethesearchlogicandsearchforquerieswhichmatchagivendocument.Thisispossiblebecauseofthetwojustdiscussedfeatures:thespecial.percolatortypeandthefactthatqueriesinElasticsearcharevalidJSONdocuments.

Let’sgetbacktothelibraryexamplefromChapter2,IndexingYourData,andtrytoindexoneofthequeriesinthepercolator.Weassumethatourusersneedtobeinformedwhenanybookmatchingthecriteriadefinedbythequeryisavailable.

Lookatthefollowingquery1.jsonfilethatcontainsanexamplequerygeneratedbytheuser:

{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range":{

"year":{

"gt":1900,

"lt":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}

Toenhancetheexample,wealsoassumethatourusersareallowedtodefinefiltersusingourhypotheticaluserinterface.Forexample,ourusermaybeinterestedintheavailablebooksthatwerewrittenbeforetheyear2010.Anexamplequerythatcouldhavebeenconstructedbysuchauserinterfacewouldlookasfollows(thequerywaswrittentothequery2.jsonfile):

{

"query":{

"bool":{

"must":{

"range":{

www.EBooksWorld.ir

Page 488: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"year":{

"lt":2010

}

}

},

"filter":{

"term":{

"available":true

}

}

}

}

}

Now,let’sregisterbothqueriesinthepercolator(notethatweareregisteringthequeriesandhaven’tindexedanydocuments).Inordertodothis,wewillrunthefollowingcommands:

curl-XPUT'localhost:9200/notifier/.percolator/1'[email protected]

curl-XPUT'localhost:9200/notifier/.percolator/old_books'[email protected]

Intheprecedingexamples,weusedtwocompletelydifferentidentifiers.Wedidthatinordertoshowthatwecanuseanidentifierthatbestdescribesthequery.Itisuptoustodecideunderwhichnamewewouldlikethequerytoberegistered.

Wearenowreadytouseourpercolator.Ourapplicationwillprovidedocumentstothepercolatorandcheckifanyofthealreadyregisteredqueriesmatchthedocument.Thisisexactlywhatapercolatorallowsustodo-toreversethesearchlogic.Insteadofindexingthedocumentsandrunningqueriesagainstthem,westorethequeriesandsendthedocumentstofindthematchingqueries.

Let’suseanexampledocumentthatwillmatchbothstoredqueries;itwillhavetherequiredtitleandthereleasedate,andwillmentionwhetheritiscurrentlyavailable.Thecommandtosendsuchadocumenttothepercolatorlooksasfollows:

curl-XGET'localhost:9200/notifier/book/_percolate?pretty'-d'{

"doc":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

}

}'

Asweexpected,bothqueriesmatchedandtheElasticsearchresponseincludestheidentifiersofthematchingqueries.Sucharesponselooksasfollows:

{

"took":36,

"_shards":{

"total":5,

www.EBooksWorld.ir

Page 489: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"successful":5,

"failed":0

},

"total":2,

"matches":[{

"_index":"notifier",

"_id":"old_books"

},{

"_index":"notifier",

"_id":"1"

}]

}

Thisworkslikeacharm.Oneveryimportantthingtonoteistheendpointusedinthisquery:_percolate.Usingthisendpointisrequiredwhenwewanttousethepercolator.Theindexnamecorrespondstotheindexwherethequerieswerestored,andthetypeisequaltothetypedefinedinthemappings.

NoteTheresponseformatcontainsinformationabouttheindexandthequeryidentifier.Thisinformationisincludedforcaseswhenwesearchagainstmultipleindicesatonce.Whenusingasingleindex,addinganadditionalqueryparameter,percolate_format=ids,willchangetheresponseasfollows:

"matches":["old_books","1"]

www.EBooksWorld.ir

Page 490: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GettingdeeperBecausethequeriesregisteredinapercolatorareinfactdocuments,wecanuseanormalquerysenttoElasticsearchinordertochoosewhichqueriesstoredinthe.percolatortypeshouldbeusedinthepercolationprocess.Thismaysoundweird,butitreallygivesalotofpossibilities.Inourlibrary,wecanhaveseveralgroupsofusers.Let’sassumethatsomeofthemhavepermissionstoborrowveryrarebooks,orthatwehaveseveralbranchesinthecityandtheusercandeclarewhereheorshewouldliketogetthebookfrom.

Let’sseehowsuchusecasescanbeimplementedbyusingthepercolator.Todothis,wewillneedtoupdateourmappingandincludethebranchinformation.Wedothatbyrunningthefollowingcommand:

curl-XPOST'localhost:9200/notifier/.percolator/_mapping'-d'{

".percolator":{

"properties":{

"branches":{

"type":"string",

"index":"not_analyzed"

}

}

}

}'

Now,inordertoregisteraquery,weusethefollowingcommand:

curl-XPUT'localhost:9200/notifier/.percolator/3'-d'{

"query":{

"term":{

"title":"crime"

}

},

"branches":["brA","brB","brD"]

}'

Intheprecedingexample,weregisteredaquerythatshowsauser’sinterest.Ourhypotheticaluserisinterestedinanybookwiththetermcrimeinthetitlefield(thetermqueryisresponsibleforthis).Heorshewantstoborrowthisbookfromoneofthethreelistedbranches.Whenspecifyingthemappings,wedefinedthatthebranchesfieldisanon-analyzedstringfield.Wecannowincludeaqueryalongwiththedocumentwesentpreviously.Let’slookathowtodothis.

Ourbooksystemjustgotthebook,anditisreadytoreportthebookandcheckwhetherthebookisofinteresttoanyone.Tocheckthis,wesendthedocumentthatdescribesthebookandaddanadditionalquerytosucharequest-thequerythatwilllimittheuserstoonlytheonesinterestedinthebrBbranch.Sucharequestlooksasfollows:

curl-XGET'localhost:9200/notifier/book/_percolate?pretty'-d'{

"doc":{

"title":"CrimeandPunishment",

"otitle":"

www.EBooksWorld.ir

Page 491: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Преступлéниеинаказáние

",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

"tags":[],

"copies":0,

"available":true

},

"size":10,

"filter":{

"term":{

"branches":"brB"

}

}

}'

Ifeverythingwasexecutedcorrectly,theresponsereturnedbyElasticsearchshouldlookasfollows(weindexedourquerywith3asanidentifier):

{

"took":27,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"total":1,

"matches":[{

"_index":"notifier",

"_id":"3"

}]

}

ControllingthesizeofreturnedresultsThesizeoftheresultswhenitcomestopercolatormakesthedifference.Themorequeriesasingledocumentmatches,themoreresultswillbereturnedandmorememorywillbeneededbyElasticsearch.Becauseofthis,thereisoneadditionalthingtonote-thesizeparameter.Itallowsustolimitthenumberofmatchesreturned.

PercolatorandscorecalculationInthepreviousexamples,wefilteredourqueriesusingasingletermquery,butwedidn’tthinkaboutthescoringprocessatall.Elasticsearchallowsustocalculatethescorewhenusingthepercolator.Let’schangethepreviouslyuseddocumentsenttothepercolatorandadjustitsothatscoringisused:

curl-XGET'localhost:9200/notifier/book/_percolate?pretty'-d'{

"doc":{

"title":"CrimeandPunishment",

"otitle":"Преступлéниеинаказáние",

"author":"FyodorDostoevsky",

"year":1886,

"characters":["Raskolnikov","SofiaSemyonovnaMarmeladova"],

www.EBooksWorld.ir

Page 492: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"tags":[],

"copies":0,

"available":true

},

"size":10,

"query":{

"term":{

"branches":"brB"

}

},

"track_scores":true,

"sort":{

"_score":"desc"

}

}'

Asyoucansee,weusedthequerysectionandincludedanadditionaltrack_scoresattributesettotrue.Thisisneeded,becausebydefaultElasticsearchwon’tcalculatethescoreforthedocumentsbecauseofperformance.Ifweneedscoresinthepercolationprocess,weshouldbeawarethatsuchquerieswillbeslightlymoredemandingwhenitcomestoCPUprocessingpowerthantheonesthatomitcalculatingthescore.

NoteIntheprecedingexample,wetoldElasticsearchtosortourresultonthebasisofthescoreindescendingorder.Thisisthedefaultbehaviorwhentrack_scoresisturnedon,sowecanomitsortdeclaration.Atthetimeofwriting,sortingonscoreindescendingdirectionistheonlyavailableoption.

CombiningpercolatorswithotherfunctionalitiesIfweareallowedtousequeriesalongwiththedocumentssentforpercolation,whycanwenotuseotherElasticsearchfunctionalities?Ofcourse,thisispossible.Forexample,thefollowingdocumentissentalongwithanaggregationandtheresultswillincludetheaggregationcalculation:

curl-XGET'localhost:9200/notifier/book/_percolate?pretty'-d'{

"doc":{

"title":"CrimeandPunishment",

"available":true

},

"aggs":{

"test":{

"terms":{

"field":"branches"

}

}

}

}'

Aswecansee,percolatorallowsustorunbothqueryandaggregations.Lookatthefollowingexampledocument:

curl-XGET'localhost:9200/notifier/book/_percolate?pretty'-d'{

www.EBooksWorld.ir

Page 493: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"doc":{

"title":"CrimeandPunishment",

"year":1886,

"available":true

},

"size":10,

"highlight":{

"fields":{

"title":{}

}

}

}'

Asyoucansee,itcontainsahighlightingsection.AfragmentoftheresponsereturnedbyElasticsearchlooksasfollows:

{

"_index":"notifier",

"_id":"3",

"highlight":{

"title":["<em>Crime</em>andPunishment"]

}

}

NoteNotethattherearesomelimitationswhenitcomestothequerytypessupportedbythepercolatorfunctionality.Inthecurrentimplementation,parent-childrelationsarenotavailableinthepercolator,soyoucan’tusequeriessuchashas_child,top_children,andhas_parent.

www.EBooksWorld.ir

Page 494: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

GettingthenumberofmatchingqueriesSometimesyoudon’tcareaboutthematchedqueriesandyouonlywantthenumberofmatchedqueries.Insuchcases,sendingadocumentagainstthestandardpercolatorendpointisnotefficient.Elasticsearchexposesthe_percolate/countendpointtohandlesuchcasesinanefficientway.Anexampleofsuchacommandfollows:

curl-XGET'localhost:9200/notifier/book/_percolate/count?pretty'-d'{

"doc":{...}

}'

www.EBooksWorld.ir

Page 495: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexeddocumentpercolationInthefinal,closingparagraphofthepercolationsection,wewanttoshowyouonemorething–thepossibilityofpercolatingadocumentthatisalreadyindexed.Todothis,weneedtousetheGEToperationonthedocumentandprovideinformationaboutwhichpercolatorindexshouldbeused.Let’slookatthefollowingcommand:

curl-XGET'localhost:9200/library/book/1/_percolate?

percolate_index=notifier'

Thiscommandchecksthedocumentwiththe1identifierfromourlibraryindexagainstthepercolatorindexdefinedbythepercolate_indexparameter.Rememberthat,bydefault,Elasticsearchwillusethepercolatorinthesameindexasthedocument;that’swhywe’vespecifiedthepercolate_indexparameter.

www.EBooksWorld.ir

Page 496: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 497: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchspatialcapabilitiesThesearchserverssuchasElasticsearchareusuallylookedatfromtheperspectiveoffull-textsearching.Elasticsearch,becauseofitsmarketingasbeingpartofELK(Elasticsearch,Logstash,andKibana),isalsohighlyknownforbeingabletohandlelargeamountoftimeseriesdata.However,thisisonlyapartofthewholeview.Sometimesbothofthementionedusecasesarenotenough.Imaginesearchingforlocalservices.Fortheenduser,themostimportantthingistheaccuracyoftheresults.Byaccuracy,wenotonlymeantheproperresultsofthefull-textsearch,butalsotheresultsbeingasnearastheycanintermsoflocation.Inseveralcases,thisisthesameasatextsearchongeographicalnamessuchascitiesorstreets,butinothercaseswecanfinditveryusefultobeabletosearchonthebasisofthegeographicalcoordinatesofourindexeddocuments.AndthisisalsoafunctionalitythatElasticsearchiscapableofhandling.

WiththereleaseofElasticsearch2.2,thegeo_pointtypereceivedalotofchanges,especiallyinternallywherealltheoptimizationsweredone.Priorto2.2,thegeo_pointtypewasstoredintheindexasatwonotanalyzedstringvaluesandthischanged.WiththereleaseofElasticsearch2.2,thegeo_pointtypegotallthegreatimprovementsfromApacheLucenelibraryandisnowmoreefficient.

www.EBooksWorld.ir

Page 498: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MappingpreparationforspatialsearchesInordertodiscussthespatialsearchfunctionality,let’sprepareanindexwithalistofcities.Thiswillbeaverysimpleindexwithonetypenamedpoi(whichstandsforthepointofinterest),thenameofthecity,anditscoordinates.Themappingsareasfollows:

{

"mappings":{

"poi":{

"properties":{

"name":{"type":"string"},

"location":{"type":"geo_point"}

}

}

}

}

Assumingthatweputthisdefinitionintothemapping1.jsonfile,wecancreateanindexbyrunningthefollowingcommand:

curl-XPUTlocalhost:9200/[email protected]

Theonlynewthingintheprecedingmappingsisthegeo_pointtype,whichisusedforthelocationfield.Byusingit,wecanstorethegeographicalpositionofourcityandusespatial-basedfunctionalities.

www.EBooksWorld.ir

Page 499: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ExampledataOurexampledocuments1.jsonfilewithdocumentslooksasfollows:

{"index":{"_index":"map","_type":"poi","_id":1}}

{"name":"NewYork","location":"40.664167,-73.938611"}

{"index":{"_index":"map","_type":"poi","_id":2}}

{"name":"London","location":[-0.1275,51.507222]}

{"index":{"_index":"map","_type":"poi","_id":3}}

{"name":"Moscow","location":{"lat":55.75,"lon":37.616667}}

{"index":{"_index":"map","_type":"poi","_id":4}}

{"name":"Sydney","location":"-33.859972,151.211111"}

{"index":{"_index":"map","_type":"poi","_id":5}}

{"name":"Lisbon","location":"eycs0p8ukc7v"}

Inordertoperformabulkrequest,weaddedinformationabouttheindexname,type,anduniqueidentifiersofourdocuments;so,wecannoweasilyimportthisdatausingthefollowingcommand:

curl-XPOSTlocalhost:9200/[email protected]

Onethingthatweshouldtakeacloserlookatisthelocationfield.Wecanusevariousnotationsforcoordination.Wecanprovidethelatitudeandlongitudevaluesasastring,asapairofnumbers,orasanobject.Notethatthestringandarraymethodsofprovidingthegeographicallocationhavedifferentordersforthelatitudeandlongitudeparameters.ThelastrecordshowsthatthereisalsoapossibilitytogivecoordinationasaGeohashvalue(thenotationisdescribedindetailathttp://en.wikipedia.org/wiki/Geohash).

Additionalgeo_fieldpropertiesWiththereleaseofElasticsearch2.2,thenumberofparametersthatthegeo_pointtypecanaccepthasbeenreducedandisasfollows:

geohash:BooleanparametertellingElasticsearchwhetherthe.geohashfieldshouldbecreated.Defaultstofalseunlessgeohash_prefixisused.geohash_precision:Maximumsizeofgeohashandgeohash_prefix.geohash_prefix:BooleanparametertellingElasticsearchtoindexthegeohashanditsprefixes.Defaultstofalse.ignore_malformed:BooleanparametertellingElasticsearchtoignoreabadlywrittengeo_fieldpointinsteadofrejectingthewholedocument.Defaultstofalse,whichmeansthatthebadlyformattedgeo_fielddatawillresultinanindexationerrorforthewholedocument.lat_lon:BooleanparametertellingElasticsearchtoindexthespatialdataintwoseparatefieldscalled.latand.lon.Defaultstofalse.precision_step:Parameterallowingcontroloverhowournumericgeographicalpointswillbeindexed.

Keepinmindthatthegeohashfieldrelatedandlat_lonfieldrelatedpropertieswerenotremovedforbackward-compatibilityreasons.Theuserscanstillusethem.However,thequerieswillnotusethembutwillinsteadusethehighlyoptimizeddatastructurethatis

www.EBooksWorld.ir

Page 500: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

builtduringindexingbythegeo_pointtype.

www.EBooksWorld.ir

Page 501: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SamplequeriesNowlet’slookatseveralexamplesofusingcoordinatesandsolvingcommonrequirementsinmodernapplicationsthatrequiregeographicaldatasearchingalongwithfull-textsearching.

NoteIfyouareinterestedinallthegeospatialqueriesthatareavailableforElasticsearchusers,refertotheofficialdocumentationavailableathttps://www.elastic.co/guide/en/elasticsearch/reference/current/geo-queries.html.

Distance-basedsortingLet’sstartwithaverycommonrequirement:sortingthereturnedresultsbydistancefromagivenpoint.Inourexample,wewanttogetallthecitiesandsortthembytheirdistancesfromthecapitalofFrance,Paris.Todothis,wesendthefollowingquerytoElasticsearch:

curl-XGETlocalhost:9200/map/_search?pretty-d'{

"query":{

"match_all":{}

},

"sort":[{

"_geo_distance":{

"location":"48.8567,2.3508",

"unit":"km"

}

}]

}'

IfyouremembertheSortingdatasectionfromChapter4,ExtendingYourQueryingKnowledge,you’llnoticethattheformatisslightlydifferent.Weareusingthe_geo_distancekeytoindicatesortingbydistance.Wemustgivethebaselocation(thelocationattribute,whichholdstheinformationofthelocationofParisinourcase),andweneedtospecifytheunitsthatcanbeusedintheresults.Theavailablevaluesarekmandmi,whichstandforkilometersandmiles,respectively.Theresultofsuchaquerywillbeasfollows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":5,

"max_score":null,

"hits":[{

"_index":"map",

"_type":"poi",

"_id":"2",

www.EBooksWorld.ir

Page 502: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"_score":null,

"_source":{

"name":"London",

"location":[-0.1275,51.507222]

},

"sort":[343.17487356850313]

},{

"_index":"map",

"_type":"poi",

"_id":"5",

"_score":null,

"_source":{

"name":"Lisbon",

"location":"eycs0p8ukc7v"

},

"sort":[1452.9506736367805]

},{

"_index":"map",

"_type":"poi",

"_id":"3",

"_score":null,

"_source":{

"name":"Moscow",

"location":{

"lat":55.75,

"lon":37.616667

}

},

"sort":[2483.837565935267]

},{

"_index":"map",

"_type":"poi",

"_id":"1",

"_score":null,

"_source":{

"name":"NewYork",

"location":"40.664167,-73.938611"

},

"sort":[5832.645958617513]

},{

"_index":"map",

"_type":"poi",

"_id":"4",

"_score":null,

"_source":{

"name":"Sydney",

"location":"-33.859972,151.211111"

},

"sort":[16978.094780773998]

}]

}

}

Aswiththeotherexamplesofsorting,Elasticsearchshowsinformationaboutthevalueusedforsorting.Let’slookatthehighlightedrecord.Aswecansee,thedistancebetweenParisandLondonisabout343km,andifyoucheckatraditionalmap,youwillseethat

www.EBooksWorld.ir

Page 503: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

thisistrue.

BoundingboxfilteringThenextexamplethatwewanttoshowisnarrowingdowntheresultstoaselectedareathatisboundedbyagivenrectangle.Thisisveryhandyifwewanttoshowresultsonthemaporwhenweallowausertomarkthemapareaforsearching.YoualreadyreadaboutfiltersintheFilteringyourresultssectionofChapter4,ExtendingYourQueryingKnowledge,buttherewedidn’tmentionspatialfilters.Thefollowingqueryshowshowwecanfilterbyusingtheboundingbox:

curl-XGETlocalhost:9200/map/_search?pretty-d'{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_bounding_box":{

"location":{

"top_left":"52.4796,-1.903",

"bottom_right":"48.8567,2.3508"

}

}

}

}

}

}'

Intheprecedingexample,weselectedamapfragmentbetweenBirminghamandParisbyprovidingthetop-leftandbottom-rightcornercoordinates.Thesetwocornersareenoughtospecifyanyrectanglewewant,andElasticsearchwilldotherestofthecalculationforus.Thefollowingscreenshotshowsthespecifiedrectangleonthemap:

www.EBooksWorld.ir

Page 504: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Aswecansee,theonlycityfromourdatathatmeetsthecriteriaisLondon.So,let’scheckwhetherElasticsearchknowsthisbyrunningtheprecedingquery.Let’snowlookatthereturnedresults:

{

"took":38,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"map",

"_type":"poi",

"_id":"2",

"_score":1.0,

"_source":{

"name":"London",

"location":[-0.1275,51.507222]

}

}]

}

}

www.EBooksWorld.ir

Page 505: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Asyoucansee,againElasticsearchagreeswiththemap.

LimitingthedistanceThelastexampleshowsthenextcommonrequirement:limitingtheresultstotheplacesthatarelocatednofurtherthanthedefineddistancefromagivenpoint.Forexample,ifwewanttolimitourresultstoallthecitieswithinthe500kmradiusfromParis,wecanusethefollowingquery:

curl-XGETlocalhost:9200/map/_search?pretty-d'{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_distance":{

"location":"48.8567,2.3508",

"distance":"500km"

}

}

}

}

}'

Ifeverythinggoeswell,Elasticsearchshouldonlyreturnasinglerecordfortheprecedingquery,andtherecordshouldbeLondonagain.However,wewillleaveitforyouasareadertocheck.

www.EBooksWorld.ir

Page 506: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ArbitrarygeoshapesSometimes,usingasinglegeographicalpointorasinglerectangleisjustnotenough.Insuchcasessomethingmoresophisticatedisneeded,andElasticsearchaddressesthisbygivingyouthepossibilitytodefineshapes.Inordertoshowyouhowwecanleveragecustomshape-limitinginElasticsearch,weneedtomodifyourindexorcreateanewoneandintroducethegeo_shapetype.Ournewmappinglooksasfollows(wewillusethistocreateanindexcalledmap2):

{

"mappings":{

"poi":{

"properties":{

"name":{"type":"string","index":"not_analyzed"},

"location":{"type":"geo_shape"}

}

}

}

}

Assumingwewrotetheprecedingmappingdefinitiontothemapping2.jsonfile,wecancreateanindexbyusingthefollowingcommand:

curl-XPUTlocalhost:9200/[email protected]

NoteElasticsearchallowsustosetseveralattributesforthegeo_shapetype.Themostcommonlyusedistheprecisionparameter.Duringindexing,theshapeshavetobeconvertedtoasetofterms.Themoreaccuracyrequired,themoretermsshouldbegenerated,whichisdirectlyreflectedintheindexsizeandperformance.Precisioncanbedefinedinthefollowingunits:in,inch,yd,yard,mi,miles,km,kilometers,m,meters,cm,centimeters,ormm,millimeters.Bydefault,theprecisionissetto50m.

Next,let’schangeourexampledatatomatchournewindexstructureandcreatethedocuments2.jsonfilewiththefollowingcontents:

{"index":{"_index":"map2","_type":"poi","_id":1}}

{"name":"NewYork","location":{"type":"point","coordinates":

[-73.938611,40.664167]}}

{"index":{"_index":"map2","_type":"poi","_id":2}}

{"name":"London","location":{"type":"point","coordinates":

[-0.1275,51.507222]}}

{"index":{"_index":"map2","_type":"poi","_id":3}}

{"name":"Moscow","location":{"type":"point","coordinates":[

37.616667,55.75]}}

{"index":{"_index":"map2","_type":"poi","_id":4}}

{"name":"Sydney","location":{"type":"point","coordinates":

[151.211111,-33.865143]}}

{"index":{"_index":"map2","_type":"poi","_id":5}}

{"name":"Lisbon","location":{"type":"point","coordinates":

[-9.142685,38.736946]}}

www.EBooksWorld.ir

Page 507: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thestructureofthefieldofthegeo_shapetypeisdifferentfromgeo_point.ItissyntacticallycalledGeoJSON(http://en.wikipedia.org/wiki/GeoJSON).Itallowsustodefinevariousgeographicaltypes.Nowit’stimetoindexourdata:

curl-XPOSTlocalhost:9200/[email protected]

Let’ssumupthetypesthatwecanuseduringquerying,atleasttheonesthatwethinkarethemostusefulones.

PointApointisdefinedbythetablewhenthefirstelementisthelongitudeandthesecondisthelatitude.Anexampleofsuchashapeisasfollows:

{

"type":"point",

"coordinates":[-0.1275,51.507222]

}

EnvelopeAnenvelopedefinesaboxgivenbythecoordinatesoftheupper-leftandbottom-rightcornersofthebox.Anexampleofsuchashapeisasfollows:

{

"type":"envelope",

"coordinates":[[-0.087890625,51.50874245880332],[2.4169921875,

48.80686346108517]]

}

PolygonApolygondefinesalistofpointsthatareconnectedtocreateourpolygon.Thefirstandthelastpointinthearraymustbethesamesothattheshapeisclosed.Anexampleofsuchashapeisasfollows:

{

"type":"polygon",

"coordinates":[[

[-5.756836,49.991408],

[-7.250977,55.124723],

[1.845703,51.500194],

[-5.756836,49.991408]

]]

}

Ifyoulookcloselyattheshapedefinition,youwillfindasupplementaryleveloftables.Thankstothis,youcandefinemorethanasinglepolygon.Insuchacase,thefirstpolygondefinesthebaseshapeandtherestofthepolygonsaretheshapesthatwillbeexcludedfromthebaseshape.

MultipolygonThemultipolygonshapeallowsustocreateashapethatconsistsofmultiplepolygons.Anexampleofsuchashapeisasfollows:

www.EBooksWorld.ir

Page 508: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

{

"type":"multipolygon",

"coordinates":[

[[

[-5.756836,49.991408],

[-7.250977,55.124723],

[1.845703,51.500194],

[-5.756836,49.991408]

]],[[

[-0.087890625,51.50874245880332],

[2.4169921875,48.80686346108517],

[3.88916015625,51.01375465718826],

[-0.087890625,51.50874245880332]

]]]

}

Themultipolygonshapecontainsmultiplepolygonsandfallsintothesamerulesasthepolygontype.So,wecanhavemultiplepolygonsand,inadditiontothis,wecanincludemultipleexclusionshapes.

AnexampleusageNowthatwehaveourindexwiththegeo_shapefields,wecancheckwhichcitiesarelocatedintheUK.Thequerythatwillallowustodothislooksasfollows:

curl-XGETlocalhost:9200/map2/_search?pretty-d'{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_shape":{

"location":{

"shape":{

"type":"polygon",

"coordinates":[[

[-5.756836,49.991408],[-7.250977,55.124723],

[-3.955078,59.352096],[1.845703,51.500194],

[-5.756836,49.991408]

]]

}

}

}

}

}

}

}'

ThepolygontypedefinestheboundariesoftheUK(inavery,veryimpreciseway),andElasticsearch’sresponseisasfollows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

www.EBooksWorld.ir

Page 509: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"map2",

"_type":"poi",

"_id":"2",

"_score":1.0,

"_source":{

"name":"London",

"location":{

"type":"point",

"coordinates":[-0.1275,51.507222]

}

}

}]

}

}

Asfarasweknow,theresponseiscorrect.

StoringshapesintheindexUsually,shapedefinitionsarecomplex,andthedefinedareasdon’tchangetoooften(forexample,theboundariesoftheUK).Insuchcases,itisconvenienttodefinetheshapesintheindexandusetheminqueries.Thisispossible,andwewillnowdiscusshowtodoit.Asusual,wewillstartwiththeappropriatemapping,whichisasfollows:

{

"mappings":{

"country":{

"properties":{

"name":{"type":"string","index":"not_analyzed"},

"area":{"type":"geo_shape"}

}

}

}

}

Thismappingissimilartothemappingusedpreviously.Wehaveonlychangedthefieldnameandsaveditinthemapping3.jsonfile.Let’screateanewindexbyrunningthefollowingcommand:

curl-XPUTlocalhost:9200/[email protected]

Theexampledatathatwewilluselooksasfollows(storedinthefilecalleddocuments3.json):

{"index":{"_index":"countries","_type":"country","_id":1}}

{"name":"UK","area":{"type":"polygon","coordinates":[[[-5.756836,

49.991408],[-7.250977,55.124723],[-3.955078,59.352096],[1.845703,

51.500194],[-5.756836,49.991408]]]}}

{"index":{"_index":"countries","_type":"country","_id":2}}

{"name":"France","area":{"type":"polygon","coordinates":[[[

www.EBooksWorld.ir

Page 510: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

3.1640625,42.09822241118974],[-1.7578125,43.32517767999296],[

-4.21875,48.22467264956519],[2.4609375,50.90303283111257],[

7.998046875,48.980216985374994],[7.470703125,44.08758502824516],[

3.1640625,42.09822241118974]]]}}

{"index":{"_index":"countries","_type":"country","_id":3}}

{"name":"Spain","area":{"type":"polygon","coordinates":[[[

3.33984375,42.22851735620852],[-1.845703125,43.32517767999296],[

-9.404296875,43.19716728250127],[-6.6796875,41.57436130598913],[

-7.3828125,36.87962060502676],[-2.109375,36.52729481454624],[

3.33984375,42.22851735620852]]]}}

Toindexthedata,wejustneedtorunthefollowingcommand:

curl-XPOSTlocalhost:9200/[email protected]

Asyoucanseeinthedata,eachdocumentcontainsapolygontype.Thepolygonsdefinetheareaofthegivencountries(again,itisfarfrombeingaccurate).Ifyouremember,thefirstpointofashapeneedstobethesameasthelastonesothattheshapeisclosed.Now,let’schangeourquerytoincludetheshapesfromtheindex.Ournewquerylooksasfollows:

curl-XGETlocalhost:9200/map2/_search?pretty-d'{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_shape":{

"location":{

"indexed_shape":{

"index":"countries",

"type":"country",

"path":"area",

"id":"1"

}

}

}

}

}

}

}'

Whencomparingthesetwoqueries,wecannotethattheshapeobjectchangedtoindexed_shape.WeneedtotellElasticsearchwheretolookforthisshape.Wecandothisbydefiningtheindex(theindexproperty,whichdefaultstoshape),thetype(thetypeproperty),andthepath(thepathproperty,whichdefaultstoshape).Theoneitemlackingisanidpropertyoftheshape.Inourcase,thisis1.However,ifyouwanttoindexmoreshapes,weadviseyoutoindextheshapeswiththeirnameastheiridentifier.

www.EBooksWorld.ir

Page 511: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 512: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingsuggestersAlongtimeago,startingfromElasticsearch0.90(whichwasreleasedonApril29,2013),wegottheabilitytouseso-calledsuggesters.Wecandefineasuggesterasafunctionalityallowingustocorrecttheuser’sspellingmistakesandbuildautocompletefunctionalitykeepingperformanceinmind.Thissectionisdedicatedtothesefunctionalitiesandwillhelpyoulearnaboutthem.Wewilldiscusseachavailablesuggestertypeandshowthemostcommonpropertiesthatallowustocontrolthem.However,keepinmindthatthissectionisnotacomprehensiveguidedescribingeachandeveryproperty.Descriptionofallthedetailsaboutsuggestersareaverybroadtopicandisoutofthescopeofthisbook.Ifyouwanttodigintotheirfunctionality,refertotheofficialElasticsearchdocumentation(https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html)ortotheMasteringElasticsearchSecondEditionbookpublishedbyPacktPublishing.

www.EBooksWorld.ir

Page 513: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AvailablesuggestertypesThesehavechangedsincetheinitialintroductionoftheSuggestAPItoElasticsearch.Wearenowabletousefourtypeofsuggesters:

term:Asuggesterreturningcorrectionsforeachwordpassedtoit.Usefulforsuggestionsthatarenotphrases,suchassingletermqueries.phrase:Asuggesterworkingonphrases,returningaproperphrase.completion:Asuggesterdesignedtoprovidefastandefficientautocompleteresults.context:ExtensiontotheSuggestAPIofElasticsearch.Allowsustohandlepartsofthesuggestqueriesinmemoryandthusveryeffectiveintermsofperformance.

www.EBooksWorld.ir

Page 514: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IncludingsuggestionsLet’snowtrygettingsuggestionsalongwiththequeryresults.Forexample,let’suseamatch_allqueryandtrygettingasuggestionforaserlockholnesphrase,whichhastwotermsspelledincorrectly.Todothis,werunthefollowingcommand:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"suggest":{

"first_suggestion":{

"text":"serlockholnes",

"term":{

"field":"_all"

}

}

}

}'

Asyoucansee,we’veintroducedanewsectiontoourquery–thesuggestone.We’vespecifiedthetextwewanttogetthecorrectionforbyusingthetextproperty.We’vespecifiedthesuggesterwewanttouse(thetermone)andconfigureditspecifyingthenameofthefieldthatshouldbeusedforbuildingsuggestionsusingthefieldproperty.first_suggestionisthenamewegivetooursuggester;weneedtodothisbecausetherecanbemultipleonesused.Thisishowyousendarequestforsuggestioningeneral.

Ifwewanttogetmultiplesuggestionsforthesametext,wecanembedoursuggestionsinthesuggestobjectandplacethetextpropertyasthesuggestobjectoption.Forexample,ifwewanttogetsuggestionsfortheserlockholnestextforthetitlefieldandforthe_allfield,werunthefollowingcommand:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"query":{

"match_all":{}

},

"suggest":{

"text":"serlockholnes",

"first_suggestion":{

"term":{

"field":"_all"

}

},

"second_suggestion":{

"term":{

"field":"title"

}

}

}

}'

Suggesterresponse

www.EBooksWorld.ir

Page 515: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Nowlet’slookattheresponseofthefirstquerywesent.Asyoucanguess,theresponseincludesboththequeryresultsandthesuggestions:

{

"took":10,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":1.0,

"hits":[...]

},

"suggest":{

"first_suggestion":[{

"text":"serlock",

"offset":0,

"length":7,

"options":[{

"text":"sherlock",

"score":0.85714287,

"freq":1

}]

},{

"text":"holnes",

"offset":8,

"length":6,

"options":[{

"text":"holmes",

"score":0.8333333,

"freq":1

}]

}]

}

}

Wecanseethatwegotboththesearchresultsandthesuggestions(we’veomittedtheresultstomaketheexamplemorereadable)intheresponse.Thetermsuggesterreturnedalistofpossiblesuggestionsforeachtermthatwaspresentinthetextparameter.Foreachterm,thetermsuggesterreturnsanarrayofpossiblesuggestions.Lookingatthedatareturnedfortheserlockterm,wecanseetheoriginalword(thetextparameter),itsoffsetintheoriginaltextparameter(theoffsetparameter),anditslength(thelengthparameter).

TheoptionsarraycontainssuggestionsforthegivenwordandwillbeemptyifElasticsearchdoesn’tfindanysuggestions.Eachentryinthisarrayisasuggestionanddescribedbythefollowingproperties:

text:Textofthesuggestion.score:Suggestionscore;thehigherthescore,thebetterthesuggestion.freq:Frequencyofthesuggestion.Thefrequencyrepresentshowmanytimesthe

www.EBooksWorld.ir

Page 516: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

wordappearsinthedocumentsintheindexwearerunningthesuggestionqueryagainst.

www.EBooksWorld.ir

Page 517: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TermsuggesterThetermsuggesterworksonthebasisofstringeditdistance.Thismeansthatthesuggestionwiththefewestcharactersthatneedtobechanged,added,orremovedtomakethesuggestionlookastheoriginalword,isthebestone.Forexample,let’stakethewordsworlandwork.Tochangetheworltermtowork,weneedtochangethellettertok,soitmeansadistanceof1.Thetextprovidedtothesuggesterisofcourseanalyzedandthentermsarechosentobesuggested.

TermsuggesterconfigurationoptionsThecommonandmostusedtermsuggesteroptionscanbeusedforallthesuggesterimplementationsthatarebasedonthetermone.Currently,thesearethephrasesuggesterandofcoursethebasetermone.Theavailableoptionsare:

text:Thetextwewanttogetthesuggestionsfor.Thisparameterisrequiredinorderforthesuggestertowork.field:Anotherrequiredparameterthatweneedtoprovide.Thefieldparameterallowsustosetwhichfieldthesuggestionsshouldbegeneratedfor.analyzer:Thenameoftheanalyzerwhichshouldbeusedtoanalyzethetextprovidedinthetextparameter.Ifnotset,Elasticsearchutilizestheanalyzerusedforthefieldprovidedbythefieldparameter.size:Defaultsto5andspecifiesthemaximumnumberofsuggestionsallowedtobereturnedbyeachtermprovidedinthetextparameter.suggest_mode:Controlswhichsuggestionswillbeincludedandforwhattermsthesuggestionswillbereturned.Thepossibleoptionsare:missing–thedefaultbehavior,whichmeansthatthesuggesterwillonlyprovidesuggestionsfortermsthatarenotpresentintheindex;popular–meansthatthesuggestionswillonlybereturnedwhentheyaremorefrequentthantheprovidedterm;andfinallyalwaysmeansthatsuggestionswillbereturnedeverytime.sort:AllowsustospecifyhowthesuggestionsaresortedintheresultreturnedbyElasticsearch.Bydefault,itissettoscore,whichtellsElasticsearchthatthesuggestionsshouldbesortedbythesuggestionscorefirst,thesuggestiondocumentfrequencynext,andfinallybytheterm.Thesecondpossiblevalueisfrequency,whichmeansthattheresultsarefirstsortedbythedocumentfrequency,thenbythescore,andfinallybytheterm.

AdditionaltermsuggesteroptionsInadditiontotheprecedingcommontermsuggestoptions,Elasticsearchallowsustouseadditionalonesthatonlymakesenseforthetermsuggesteritself.Someoftheseoptionsareasfollows:

lowercase_terms:Whensettotrue,ittellsElasticsearchtolowercaseallthetermsthatareproducedfromthetextfieldafteranalysis.max_edits:Itdefaultsto2andspecifiesthemaximumeditdistancethatthesuggestioncanhavetobereturnedasatermsuggestion.Elasticsearchallowsusto

www.EBooksWorld.ir

Page 518: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

setthisvalueto1or2.prefix_len:Bydefault,itissetto1.Ifwearestrugglingwithsuggesterperformance,increasingthisvaluewillimprovetheoverallperformance,becausefewersuggestionswillneedtobeprocessed.min_word_len:Itdefaultsto4andspecifiestheminimumnumberofcharactersasuggestionmusthaveinordertobereturnedonthesuggestionslist.shard_size:Itdefaultstothevaluespecifiedbythesizeparameterandallowsustosetthemaximumnumberofsuggestionsthatshouldbereadfromeachshard.Settingthispropertytovalueshigherthanthesizeparametercanresultinmoreaccuratedocumentfrequencyatthecostofdegradationinsuggesterperformance.

NoteTheprovidedlistofparametersdoesnotcontainalltheoptionsthatareavailableforthetermsuggester.RefertotheofficialElasticsearchdocumentationforreference,athttps://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html.

www.EBooksWorld.ir

Page 519: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PhrasesuggesterThetermsuggesterprovidesagreatwaytocorrectuserspellingmistakesonpertermbasis,butitisnotgreatforphrases.That’swhythephrasesuggesterwasintroduced.Itisbuiltontopofthetermsuggester,butaddsadditionalphrasecalculationlogictoit.

Let’sstartwithanexampleofhowtousethephrasesuggester.Thistimewewillomitthequerysectioninourquery.Wedothatbyrunningthefollowingcommand:

curl-XGET'localhost:9200/library/_search?pretty'-d'{

"suggest":{

"text":"sherlockholnes",

"our_suggestion":{

"phrase":{"field":"_all"}

}

}

}'

Asyoucanseeintheprecedingcommand,itisalmostthesameaswesentwhenusingthetermsuggester,butinsteadofspecifyingthetermsuggestertypewe’vespecifiedthephrasetype.Theresponsetotheprecedingcommandisasfollows:

{

"took":24,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":1.0,

"hits":[...]

},

"suggest":{

"our_suggestion":[{

"text":"sherlockholnes",

"offset":0,

"length":15,

"options":[{

"text":"sherlockholmes",

"score":0.12227806

}]

}]

}

}

Asyoucansee,theresponseisverysimilartotheonereturnedbythetermsuggesterbut,insteadofasinglewordbeingreturned,itisalreadycombinedandreturnedasaphrase.

ConfigurationBecausethephrasesuggesterisbasedonthetermsuggester,itcanalsousesomeofthe

www.EBooksWorld.ir

Page 520: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

configurationoptionsprovidedbyit.Thoseoptionsare:text,size,analyzer,andshard_size.Inadditiontothementionedproperties,thephrasesuggesterexposesadditionaloptions.Someoftheseoptionsare:

max_errors:Specifiesthemaximumnumber(orpercentage)oftermsthatcanbeerroneousinordertocreateacorrectionusingit.Thevalueofthispropertycanbeeitheranintegernumber,suchas1,orafloatbetween0and1whichwillbetreatedasapercentagevalue.Bydefault,itissetto1,whichmeansthatatmostasingletermcanbemisspelledinagivencorrection.separator:Defaultstoawhitespacecharacterandspecifiestheseparatorthatwillbeusedtodividethetermsintheresultingbigramfield.

NoteTheprovidedlistofparametersdoesnotcontainalltheoptionsthatareavailableforthephrasesuggester.Infact,thelistiswaymoreextensivethanwhatwe’veprovided.RefertotheofficialElasticsearchdocumentationforreference,athttps://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html,ortoMasteringElasticsearchSecondEditionpublishedbyPacktPublishing.

www.EBooksWorld.ir

Page 521: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CompletionsuggesterThecompletionsuggesterallowsustocreateautocompletefunctionalityinaveryperformance-effectiveway,becauseofstoringcomplicatedstructuresintheindexinsteadofcalculatingthemduringquerytime.WeneedtoprepareElasticsearchforthatbyusingadedicatedfieldtypecalledcompletion.Let’sassumethatwewanttocreateanautocompletefeaturetoallowustoshowbookauthors.Inadditiontoauthor’snamewewanttoreturntheidentifiersofthebooksshe/hewrote.Westartwithcreatingtheauthorsindexbyrunningthefollowingcommand:

curl-XPOST'localhost:9200/authors'-d'{

"mappings":{

"author":{

"properties":{

"name":{"type":"string"},

"ac":{

"type":"completion",

"payloads":true,

"analyzer":"standard",

"search_analyzer":"standard"

}

}

}

}

}'

Ourindexwillcontainasingletypecalledauthor.Eachdocumentwillhavetwofields:thenameandtheacfield,whichisthefieldwewilluseforautocomplete.We’vedefinedtheacfieldusingthecompletiontype.Inadditiontothat,we’veusedthestandardanalyzerforboththeindexandthequerytime.Thelastthingisthepayload-theadditional,optionalinformationwewillreturnalongwiththesuggestion-inourcaseitwillbeanarrayofbookidentifiers.

IndexingdataToindexthedata,weneedtoprovidesomeadditionalinformationalongwiththeonesweusuallyprovideduringindexing.Let’slookatthefollowingcommandsthatindextwodocumentsdescribingtheauthors:

curl-XPOST'localhost:9200/authors/author/1'-d'{

"name":"FyodorDostoevsky",

"ac":{

"input":["fyodor","dostoevsky"],

"output":"FyodorDostoevsky",

"payload":{"books":["123456","123457"]}

}

}'

curl-XPOST'localhost:9200/authors/author/2'-d'{

"name":"JosephConrad",

"ac":{

"input":["joseph","conrad"],

"output":"JosephConrad",

www.EBooksWorld.ir

Page 522: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"payload":{"books":["121211"]}

}

}'

Notethestructureofthedatafortheacfield.Wehaveprovidedtheinput,output,andpayloadproperties.Theoptionalpayloadpropertyisusedtoprovidetheadditionalinformationthatwillbereturned.Theinputpropertyisusedtoprovidetheinputinformationthatwillbeusedforbuildingthecompletionusedbythesuggester.Itwillbeusedforuserinputmatching.Theoptionaloutputpropertyisusedtotellthesuggesterwhichdatashouldbereturnedforthedocument.

Wecanalsoomittheadditionalparameterssectionandindexdatainthewayweareusedto,justlikeinthefollowingexample:

curl-XPOST'localhost:9200/authors/author/1'-d'{

"name":"FyodorDostoevsky",

"ac":"FyodorDostoevsky"

}'

However,becausethecompletionsuggesterusesFSTunderthehood,wewon’tbeabletofindtheprecedingdocumentbystartingwiththesecondpartoftheacfield.That’swhywethinkthatindexingthedatainthewayweshowedfirstismoreconvenient,becausewecanexplicitlycontrolwhatwewanttomatchandwhatwewanttoshowasanoutput.

QueryingindexedcompletionsuggesterdataIfwewanttofinddocumentsthathaveauthorsstartingwithfyo,werunthefollowingcommand:

curl-XGET'localhost:9200/authors/_suggest?pretty'-d'{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac"

}

}

}'

Beforewelookattheresults,let’sdiscussthequery.Asyoucansee,we’verunthecommandtothe_suggestendpoint,becausewedon’twanttorunastandardquery;wearejustinterestedintheautocompleteresults.Thequeryisquitesimple.WesetitsnametoauthorsAutocomplete,wesetthetextwewanttogetthecompletionfor(thetextproperty),andweaddedthecompletionobjectwiththeconfigurationinit.Theresultoftheprecedingcommandlooksasfollows:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

www.EBooksWorld.ir

Page 523: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"length":3,

"options":[{

"text":"FyodorDostoevsky",

"score":1.0,

"payload":{

"books":["123456","123457"]

}

}]

}]

}

Asyoucanseeintheresponse,wegetthedocumentwewerelookingforalongwiththepayloadinformation,ifitisavailable(fortheprecedingresponse,itisnot).

Wecanalsousefuzzysearches,whichallowustotoleratespellingmistakes.Wedothatbyincludingtheadditionalfuzzysectioninourquery.Forexample,toenablefuzzymatchinginthecompletionsuggesterandsetthemaximumeditdistanceto2(whichmeansthatamaximumoftwoerrorsareallowed),wesendthefollowingquery:

curl-XGET'localhost:9200/authors/_suggest?pretty'-d'{

"authorsAutocomplete":{

"text":"fio",

"completion":{

"field":"ac",

"fuzzy":{

"edit_distance":2

}

}

}

}'

Althoughwe’vemadeaspellingmistake,wewillstillgetthesameresultsaswegotearlier.

CustomweightsBydefault,thetermfrequencyisusedtodeterminetheweightofthedocumentreturnedbytheprefixsuggester.However,thismaynotbethebestsolution.Insuchcases,itisusefultodefinetheweightofthesuggestionbyspecifyingtheweightpropertyforthefielddefinedascompletion.Theweightpropertyshouldbesettoanintegervalue.Thehighertheweightpropertyvalue,themoreimportantthesuggestion.Forexample,ifwewanttospecifyaweightforthefirstdocumentinourexample,werunthefollowingcommand:

curl-XPOST'localhost:9200/authors/author/1'-d'{

"name":"FyodorDostoevsky",

"ac":{

"input":["fyodor","dostoevsky"],

"output":"FyodorDostoevsky",

"payload":{"books":["123456","123457"]},

"weight":30

}

}'

www.EBooksWorld.ir

Page 524: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Nowifwerunourexamplequery,theresultswillbeasfollows:

{

...

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[{

"text":"FyodorDostoevsky",

"score":30.0,

"payload":{

"books":["123456","123457"]

}

}]

}]

}

Lookhowthescoreoftheresultchanged.Inourinitialexample,itwas1.0andnowitis30.0.Thisissobecausewesettheweightparameterto30duringindexing.

ContextsuggesterThecontextsuggesterisanextensiontotheElasticsearchSuggestAPIforElasticsearch2.1andolderversionsthatwejustdiscussed.WhendescribingthecompletionsuggesterforElasticsearch2.1,wementionedthatthissuggesterallowsustohandlesuggester-relatedsearchesentirelyinmemory.Usingthissuggester,wecandefinethesocalledcontextforthequerythatwilllimitthesuggestionstoasubsetofdocuments.Becausewedefinethecontextinthemappings,itiscalculatedduringindexation,whichmakesquerytimecalculationseasierandlessdemandingintermsofperformance.

NoteRememberthatthissectionisrelatedtoElasticsearch2.1.ContextsinElasticsearch2.2arehandleddifferentlyandwerediscussedwhendiscussingthecompletionsuggester.

Contexttypes

Elasticsearch2.1supportstwotypesofcontext:categoryandgeo.Thecategorytypeofcontextallowsustoassignadocumenttooneormorecategoriesduringtheindextime.Later,duringthequerytime,wecantellElasticsearchwhichcategoryweareinterestedinandElasticsearchwilllimitthesuggestionstothosecategories.Thegeocontextallowsustolimitthedocumentsreturnedbythesuggesterstoagivenlocationortoacertaindistancefromapoint.Thenicethingaboutcontextisthatwecanhavemultiplecontexts.Forexample,wecanhaveboththecategorycontextandthegeocontextforthesamedocument.Let’snowseewhatweneedtodotousecontextinsuggestions.

Usingcontext

Usingthegeoandcategorycontextisverysimilar–theyjustdifferinparameters.Wewillshowyouhowtousecontextsinanexampleusingthesimplercategorycontextandlaterwewillgetbacktothegeocontextandshowyouwhatweneedtoprovide.

www.EBooksWorld.ir

Page 525: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thefirststepwhenusingcontextsuggesteriscreatingapropermapping.Let’sgetbacktoourauthormapping,butthistimelet’sassumethateachauthorcanbegivenoneormorecategory–thebrandofbooksshe/heiswriting.Thiswillbeourcontext.Themappingsusingthecontextlookasfollows:

curl-XPOST'localhost:9200/authors_geo_context'-d'{

"mappings":{

"author":{

"properties":{

"name":{"type":"string"},

"ac":{

"type":"completion",

"analyzer":"simple",

"search_analyzer":"simple",

"context":{

"brand":{

"type":"category",

"default":["none"]

}

}

}

}

}

}

}'

We’veintroducedanewsectioninouracfielddefinition:context.Eachcontextisgivenaname,whichisbrandinourcase,andinsidethatobjectweprovideconfiguration.Weneedtoprovidethetypeusingthetypeproperty–wewillbeusingthecategorycontextsuggesternow.Inadditiontothat,we’vesetthedefaultarray,whichprovidesuswiththevalueorvaluesthatshouldbeusedasthedefaultcontext.Ifwewant,wecanalsoprovidethepathproperty,whichwillpointElasticsearchtoafieldinthedocumentsfromwhichthecontextvalueshouldbetaken.

Wecannowindexasingleauthorbymodifyingthecommandsweusedearlier,becauseweneedtoprovidethecontext:

curl-XPOST'localhost:9200/authors_context/author/1'-d'{

"name":"FyodorDostoevsky",

"ac":{

"input":"FyodorDostoevsky",

"context":{

"brand":"drama"

}

}

}'

Asyoucansee,theacfielddefinitionisabitdifferentnow;itisanobject.Theinputpropertyisusedtoprovidethevalueforautocompleteandthecontextobjectisusedtoprovidethevaluesforeachofthecontextsdefinedinthemappings.

Finally,wecanquerythedata.Asyoucouldimagine,wewillagainprovidethecontextweareinterestedin.Thequerythatdoesthatlooksasfollows:

www.EBooksWorld.ir

Page 526: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

curl-XGET'localhost:9200/authors_context/_suggest?pretty'-d'{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"brand":"drama"

}

}

}

}'

Asyoucansee,we’veincludedthecontextobjectinthequeryinsidethecompletionsectionandwe’vesetthecontextweareinterestedinusingthecontextname.TheresponsereturnedbyElasticsearchisasfollows:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[{

"text":"FyodorDostoevsky",

"score":1.0

}]

}]

}

However,ifwechangethebrandcontexttocomedy,forexample,Elasticsearchwillreturnnoresults,becausewedon’thaveauthorswithsuchacontext.Let’stestitbyrunningthefollowingquery:

curl-XGET'localhost:9200/authors_context/_suggest?pretty'-d'{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"brand":"comedy"

}

}

}

}'

ThistimeElasticsearchreturnsthefollowingresponse:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

www.EBooksWorld.ir

Page 527: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[]

}]

}

Thisisbecausenoauthorwiththebrandcontextandthevalueofcomedyispresentintheauthors_contextindex.

Usingthegeolocationcontext

Thegeocontextissimilartothecategorycontextwhenitcomestousingit.However,insteadoffilteringbyterms,wefilterusinggeographicalpointsanddistances.Whenweusethegeocontext,weneedtoprovideprecision,whichdefinestheprecisionofthecalculatedgeohash.Thesecondpropertythatweprovideistheneighborsone,whichcanbesettotrueorfalse.Bydefault,itissettotrue,whichmeansthattheneighboringgeohasheswillbeincludedinthecontext.

Inadditiontothat,similartothecategorycontext,wecanprovidepath,whichspecifieswhichfieldtouseasthelookupforthegeographicalpoint,andthedefaultproperty,specifyingthedefaultgeopointforthedocuments.

Forexample,let’sassumethatwewanttofilteronthebirthplaceofourauthors.Themappingsforsuchasuggesterwilllookasfollows:

curl-XPOST'localhost:9200/authors_geo_context'-d'{

"mappings":{

"author":{

"properties":{

"name":{"type":"string"},

"ac":{

"type":"completion",

"analyzer":"simple",

"search_analyzer":"simple",

"context":{

"birth_location":{

"type":"geo",

"precision":["1000km"],

"neighbors":true,

"default":{

"lat":0.0,

"lon":0.0

}

}

}

}

}

}

}

}'

Nowwecanindexthedocumentsandprovidethebirthlocation.Forourexampleauthor,

www.EBooksWorld.ir

Page 528: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

itwilllookasfollows(thecentreofMoscow):

curl-XPOST'localhost:9200/authors_geo_context/author/1'-d'{

"name":"FyodorDostoevsky",

"ac":{

"input":"FyodorDostoevsky",

"context":{

"birth_location":{

"lat":55.75,

"lon":37.61

}

}

}

}'

Asyoucansee,we’veprovidedthebirth_locationcontextforourauthor.

Nowduringquerytime,weneedtoprovidethecontextthatweareinterestedinandwecan(butwearenotobligatedto)providetheprecisionasthesubsetoftheprecisionvaluesprovidedinthemappings.We’vedefinedtheprecisionto1000km,solet’sfindalltheauthorsstartingwithfyothatwereborninKazan,whichisabout800kmfromMoscow.Weshouldfindourexampleauthor.

Thequerythatdoesthatlooksasfollows:

curl-XGET'localhost:9200/authors_geo_context/_suggest?pretty'-d'{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"birth_location":{

"lat":55.45,

"lon":49.8

}

}

}

}

}'

TheresponsereturnedbyElasticsearchlooksasfollows:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[{

"text":"FyodorDostoevsky",

"score":1.0

}]

www.EBooksWorld.ir

Page 529: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}]

}

However,ifwerunthesamequerybutpointtotheNorthPole,wewillgetnoresults:

curl-XGET'localhost:9200/authors_geo_context/_suggest?pretty'-d'{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"birth_location":{

"lat":0.0,

"lon":0.0

}

}

}

}

}'

ThefollowingistheresponsefromElasticsearchinthiscase:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[]

}]

}

www.EBooksWorld.ir

Page 530: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 531: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheScrollAPILet’simaginethatwehaveanindexwithseveralmilliondocuments.Wealreadyknowhowtobuildourqueryandsoon.However,whentryingtofetchalargenumberofdocuments,youseethatwhengettingfurtherandfurtherwithpagesoftheresults,thequeriesslowdownandfinallytimeoutorresultinmemoryissues.

Thereasonforthisisthatfull-textsearchengines,especiallythosethataredistributed,don’thandlepagingverywell.Ofcourse,gettingafewhundredpagesofresultsisnotaproblemforElasticsearch,butforgoingthroughalltheindexeddocumentsorthroughlargeresultset,aspecializedAPIhasbeenintroduced.

www.EBooksWorld.ir

Page 532: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ProblemdefinitionWhenElasticsearchgeneratesaresponse,itmustdeterminetheorderofthedocumentsthatformtheresult.Ifweareonthefirstpage,thisisnotabigproblem.Elasticsearchjustfindsthesetofdocumentsandcollectsthefirstones;let’ssay,20documents.Butifweareonthetenthpage,Elasticsearchhastotakeallthedocumentsfrompagesonetotenandthendiscardtheonesthatareonpagesonetonine.Thisisevenmorecomplicatedifwehaveadistributedenvironment,becausewedon’tknowfromwhichnodestheresultswillcome.Becauseofthat,eachnodeneedstobuildtheresponseandkeepitinmemoryforsometime.TheproblemisnotElasticsearch-specific;asimilarsituationcanbefoundinthedatabasesystems,forexample,generally,ineverysystemthatusestheso-calledpriorityqueue.

www.EBooksWorld.ir

Page 533: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ScrollingtotherescueThesolutionissimple.SinceElasticsearchhastodosomeoperations(determinethedocumentsforthepreviouspages)foreachrequest,wecanaskElasticsearchtostorethisinformationforsubsequentqueries.Thedrawbackisthatwecannotstorethisinformationforeverduetolimitedresources.Elasticsearchassumesthatwecandeclarehowlongweneedthisinformationtobeavailable.Let’sseehowitworksinpractice.

Firstofall,wequeryElasticsearchasweusuallydo.However,inadditiontoalltheknownparameters,weaddonemore:theparameterwiththeinformationthatwewanttousescrollingwithandhowlongwesuggestthatElasticsearchshouldkeeptheinformationabouttheresults.Wecandothisbysendingaqueryasfollows:

curl'localhost:9200/library/_search?pretty&scroll=5m'-d'{

"size":1,

"query":{

"match_all":{}

}

}'

Thecontentofthisqueryisirrelevant.TheimportantthingishowElasticsearchmodifiestheresponse.LookatthefollowingfirstfewlinesoftheresponsereturnedbyElasticsearch:

{

"_scroll_id":

"cXVlcnlUaGVuRmV0Y2g7NTsxNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzE3OjVEM2tieV9vUl

N5TWxfbTJLQ1FJRnc7MTg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsxOTo1RDNrYnlfb1JTeU1sX

20yS0NRSUZ3OzIwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs=",

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

...

Thenewpartisthe_scroll_idsection.Thisisahandlethatwewilluseinthequeriesthatfollow.Elasticsearchhasaspecialendpointforthis:the_search/scrollendpoint.Let’slookatthefollowingexample:

curl-XGET'localhost:9200/_search/scroll?pretty'-d'{

"scroll":"5m",

"scroll_id":

"cXVlcnlUaGVuRmV0Y2g7NTsyNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzI3OjVEM2tieV9vUl

N5TWxfbTJLQ1FJRnc7Mjg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsyOTo1RDNrYnlfb1JTeU1sX

20yS0NRSUZ3OzMwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs="

}'

Noweverycalltothisendpointwithscroll_idreturnsthenextpageofresults.

www.EBooksWorld.ir

Page 534: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Rememberthatthishandleisonlyvalidforthedefinedtimeofinactivity.

Ofcourse,thissolutionisnotideal,anditisnotveryappropriatewhentherearemanyrequeststorandompagesofvariousresultsorwhenthetimebetweentherequestsisdifficulttodetermine.However,youcanusethissuccessfullyforusecaseswhereyouwanttogetlargerresultsets,suchastransferringdatabetweenseveralsystems.

www.EBooksWorld.ir

Page 535: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 536: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryInthechapterthatwejustfinished,welearnedaboutsomefunctionalitiesofElasticsearchthatwewon’tprobablyuseeverydayoratleastnoteveryoneofuswillusethem.Wediscussedpercolator–anupsidedownsearchfunctionalitythatallowsustoindexqueriesandfindwhichdocumentsmatchthem.WelearnedaboutthespatialcapabilitiesofElasticsearchandweusedsuggesterstocorrectuserspellingmistakesandbuildahighlyefficientautocompletefunctionality.WealsousedtheScrollAPItoefficientlyfetchlargenumberofresultsfromourElasticsearchindices.

Inthenextchapter,wewillfocusonclustersanditsconfiguration.Wewilldiscussnodediscovery,gateway,andrecoverymodules–whattheyareresponsibleforandhowtoconfigurethemtomatchourneeds.Wewillusetemplatesanddynamictemplates,andwewillseehowtoinstallpluginsextendingElasticsearch’sout-of-theboxfunctionalities.WewilllearnwhatarethecachesofElasticsearchcachesareandhowtoconfigurethemefficientlytomakethemostoutofthem.Finally,wewillusetheupdatesettingsAPItoupdateElasticsearchconfigurationonliveandrunningclusters.

www.EBooksWorld.ir

Page 537: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 538: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter9.ElasticsearchClusterinDetailThepreviouschapterwasfullydedicatedtosearchfunctionalitiesthatarenotonlyaboutfulltextsearching.Welearnedhowtousepercolator–aninversedsearchthatallowsustobuildalteringfunctionalitiesontopofElasticsearch.WelearnedtousespatialfunctionalitiesofElasticsearchandweusedthesuggestAPIthatallowedustocorrectuser’sspellingmistakesaswellasbuildveryefficientautocompletefunctionalities.Butlet’snowfocusonrunningandadministeringElasticsearch.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

HowdoesElasticsearchfindnewnodesthatshouldjointheclusterWhatarethegatewayandrecoverymodulesHowdotemplatesworkHowtousedynamictemplatesHowtousetheElasticsearchpluginmechanismWhatarethecachesinElasticsearchandhowtotunethemHowtousetheUpdateSettingsAPItoupdateElasticsearchsettingsonrunningclusters

www.EBooksWorld.ir

Page 539: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UnderstandingnodediscoveryWhenstartingyourElasticsearchnode,oneofthefirstthingsthathappensislookingforamasternodethathasthesameclusternameandisvisible.Ifamasterisfound,thenodegetsjoinedintoanalreadyformedcluster.Ifnomasterisfound,thenthenodeitselfisselectedasamaster(ofcourseiftheconfigurationallowssuchbehavior).Theprocessofformingaclusterandfindingnodesiscalleddiscovery.Themoduleresponsiblefordiscoveryhastwomainpurposes:electingamasteranddiscoveringnewnodeswithinacluster.Inthissection,wewilldiscusshowwecanconfigureandtunethediscoverymodule.

www.EBooksWorld.ir

Page 540: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DiscoverytypesBydefault,withoutinstallingadditionalplugins,ElasticsearchallowsustouseZendiscovery,whichprovidesuswithunicastdiscovery.Unicast(http://en.wikipedia.org/wiki/Unicast)allowstransmissionofasinglemessageoverthenetworktoasinglehostatonce.Elasticsearchnodesendsthemessagetothenodesdefinedintheconfigurationandwaitsforaresponse.Whenthenodeisacceptedintothecluster,therecoverymodulekicksinandstartstherecoveryprocessifneeded,orthemasterelectionprocessifthemasterisstillnotelected.

NotePriortoElasticsearch2.0,theZendiscoverymoduleallowedustousemulticastdiscovery.Onamulticastcapablenetwork,ElasticsearchwasabletoautomaticallydiscovernodeswithoutspecifyinganyIPaddressesofotherElasticsearchserverssharingthesameclustername.Thiswasverymistakeproneandnotadvisedforproductionuseandthusitwasdeprecatedandremovedtoaplugin.

Elasticsearcharchitectureisdesignedtobepeertopeer.Whenrunningoperationssuchasindexingorsearching,themasternodedoesn’ttakepartincommunicationandtherelevantnodescommunicatewitheachotherdirectly.

www.EBooksWorld.ir

Page 541: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NoderolesElasticsearchnodescanbeconfiguredtoworkinoneofthefollowingroles:

Master:Thenoderesponsibleformaintainingtheglobalclusterstate,changingitdependingontheneeds,andhandlingtheadditionandremovalofnodes.Therecanonlybeasinglemasternodeactiveinasinglecluster.Data:Thenoderesponsibleforholdingthedataandexecutingdatarelatedoperations(indexationandsearching)ontheshardsthatarepresentlocallyforthenode.Client:Thenoderesponsibleforhandlingrequests.Fortheindexingrequests,theclientnodeforwardstherequesttotheappropriateprimaryshardand,forthesearchrequests,itsendsittoalltherelevantshardsandaggregatestheresults.

Bydefault,eachnodecanworkasmaster,data,orclient.Itcanbeadataandaclientatthesametimeforexample.Onlargeandhighlyloadedclusters,itisveryimportanttodividetherolesofthenodesintheclusterandhavethenodesdoonlyasingleroleatatime.Whendealingwithsuchclusters,youwilloftenseeatleastthreemasternodes,multipledatanodes,andafewclientonlynodesaspartofthewholecluster.

MasternodeItisthemostimportantnodetypefromElasticsearchcluster’spointofview.Ithandlestheclusterstate,changesit,managesthenodesjoiningandleavingthecluster,checksthehealthoftheothernodesinthecluster(byrunningpingrequests),andmanagestheshardrelocationoperations.Ifthemasterissomehowdisconnectedfromthecluster,theremainingnodeswillselectanewmasterfromeachother.Alltheseprocessesaredoneautomaticallyonthebasisoftheconfigurationvaluesweprovide.YouusuallywantthemasternodestoonlycommunicatewiththeotherElasticsearchnodes,usingtheinternalJavacommunication.Toavoidhittingthemasternodesbymistake,itisadvisedtoturnofftheHTTPmoduleforthemintheconfiguration.

DatanodeThedatanodeisresponsibleforholdingthedataintheindices.Thedatanodesaretheonesthatneedthemostdiskspacebecauseofbeingloadedwithdataindexationrequestsandrunningsearchesonthedatatheyhavelocally.Thedatanodes,similartothemasternodescanhavetheHTTPmoduledisabled.

ClientnodeTheclientnodesareinmostcasesnodesthatdon’thaveanydataandarenotmasternodes.Theclientnodesaretheonesthatcommunicatewiththeoutsideworldandwithallthenodesinthecluster.Theyforwardthedatatotheappropriateshardsandaggregatethesearchandaggregationsresultsfromalltheothernodes.

Keepinmindthatclientnodescanhavedataaswell,butinsuchacasetheywillrunboththeindexingrequestsandthesearchrequestsforthelocaldataandwillaggregatethedatafromtheothernodes,whichinlargeclustersmaybetoomuchworkforasinglenode.

www.EBooksWorld.ir

Page 542: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ConfiguringnoderolesBydefault,Elasticsearchallowseverynodetobeamasternode,adatanode,oraclientnode.However,aswealreadymentioned,incertainsituationsyoumaywanttohavenodesthatonlyholddata,clientnodesthatareonlyusedtoprocessrequests,andmasterhoststomanagethecluster.Onesuchsituationiswhenmassiveamountsofdataneedstobehandled,wherethedatanodesshouldbeasperformantaspossible.TotellElasticsearchwhatroleitshouldtake,weusethreeBooleanpropertiessetintheelasticsearch.ymlconfigurationfile:

node.master:Whensettotrue,wetellElasticsearchthatthenodeismastereligible,whichmeansthatitcantaketheroleofamaster.However,notethatthemasterwillbeautomaticallymarkedasnotmastereligibleassoonasitisassignedaclientrole.node.data:Whensettotrue,wetellElasticsearchthatthenodecanbeusedtoholddata.node.client:Whensettotrue,wetellElasticsearchthatthenodeshouldbeusedasaclient.

So,tosetanodetoonlyholddata,weshouldaddthefollowingpropertiestotheelasticsearch.ymlconfigurationfile:

node.master:false

node.data:true

node.client:false

Tosetthenodetonotholddataandonlybeamasternode,weneedtoinstructElasticsearchthatwedon’twantthenodetoholddata.Inordertodothis,weaddthefollowingpropertiestotheelasticsearch.ymlconfigurationfile:

node.master:true

node.data:false

node.client:false

www.EBooksWorld.ir

Page 543: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Settingthecluster’snameIfwedon’tsetthecluster.namepropertyinourelasticsearch.ymlfile,Elasticsearchusestheelasticsearchdefaultvalue.Thisisnotagoodthing,becauseeachnewElasticsearchnodewillhavethesameclusternameandyoumaywanttohavemultipleclustersinthesamenetwork.Insuchacase,connectingthewrongnodestogetherisjustamatteroftime.Becauseofthat,wesuggestsettingthecluster.namepropertytosomeothervalueofyourchoice.Usually,itisagoodideatoadjustclusternamesbasedonclusterresponsibilities.

www.EBooksWorld.ir

Page 544: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ZendiscoveryThedefaultdiscoverymethodusedbyElasticsearchandonethatiscommonlyusedintheElasticsearchworldiscalledZendiscovery.Itsupportsunicastdiscoveryandallowsadjustingvariouspartsofitsconfiguration.

NoteNotethatthereareadditionaldiscoverytypesavailableasplugins,suchasAmazonEC2discovery,MicrosoftAzurediscovery,andGoogleComputeEnginediscovery.

MasterelectionconfigurationImaginethatyouhaveaclusterthatisbuiltof10nodes.Everythingisworkingfineuntilonedaywhenyournetworkfailsand3ofyournodesaredisconnectedfromthecluster,buttheystillseeeachother.BecauseoftheZendiscoveryandmasterelectionprocess,thenodesthatgotdisconnectedelectanewmasterandyouendupwithtwoclusterswiththesamename,withtwomasternodes.Suchasituationiscalledasplit-brainandyoumustavoiditasmuchaspossible.Whensplit-brainhappens,youendupwithtwo(ormore)clustersthatwon’tjoineachotheruntilthenetwork(oranyother)problemsarefixed.Thethingtorememberisthatsplit-brainmayresultinnotrecoverableerrors,suchasdataconflictsinwhichyouendupwithdatacorruptionorpartialdataloss.That’swhyitisimportanttoavoidsuchsituationsatallcosts.

Inordertopreventsplit-brainsituations,Elasticsearchprovidesadiscovery.zen.minimum_master_nodesproperty.Thispropertydefinestheminimumamountofmastereligiblenodesthatshouldbeconnectedtoeachotherinordertoformacluster.Sonowlet’sgetbacktoourcluster;ifwesetthediscovery.zen.minimum_master_nodespropertyto50percentofthetotalnodesavailable+1(whichis6inourcase),wewillendupwithasinglecluster.Whyisthat?Beforethenetworkfailure,wehad10nodes,whichismorethansixnodes,andthosenodesformedacluster.Afterthedisconnectionofthethreenodes,wewouldstillhavethefirstclusterupandrunning.However,becauseonlythreenodesgotdisconnectedandthreeislessthansix,thesethreenodeswouldn’tbeallowedtoelectanewmasterandtheywouldwaitforreconnectionwiththeoriginalcluster.

Ofcoursethisisalsonotaperfectscenario.Itisadvisedtohaveadedicatedmastereligiblenodesonly,thatdon’tworkasdataorclientnodes.Tohaveaquoruminsuchacase,weneedatleastthreededicatedmastereligiblenodes,becausethatwillallowustohaveasinglemasterofflineandstillkeepthequorum.Thisisusuallyenoughtokeeptheclustersinagoodshapewhenitcomestomasterrelatedfeaturesandtobesplit-brainproof.Ofcourse,insuchacase,thediscovery.zen.minimum_master_nodespropertyshouldbesetto2andweshouldhavethethreemasternodesupandrunning.

Furthermore,ElasticsearchallowsustoadditionallyspecifytwoadditionalBooleanproperties:discover.zen.master_election.filter_clientanddiscover.zen.master_election.filter_data.TheyallowustotellElasticsearchtoignorepingrequestsfromtheclientanddatanodesduringmasterelection.Bydefault,the

www.EBooksWorld.ir

Page 545: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

firstmentionedpropertyissettotrueandthesecondissettofalse.ThisallowsElasticsearchtofocusonthemasterelectionandnotbeoverloadedwithpingrequestsfromthenodesthatarenotmastereligible.

Inadditiontothementionedproperties,Elasticsearchallowsconfiguringtimeoutsrelatedtothemasterelectionprocess.discovery.zen.ping_timeout,whichdefaultsto3s(threeseconds),allowsconfiguringtimeoutforslownetworks–thehigherthevalue,thelesserthechanceoffailure,buttheelectionprocesscantakelonger.Thesecondpropertyiscalleddiscover.zen.join_timeoutandspecifiesthetimeoutforthejoinrequesttothemaster.Itdefaultsto20timesthediscovery.zen.ping_timeoutproperty.

ConfiguringunicastBecauseofthewayunicastworks,weneedtospecifyatleastahostthattheunicastmessageshouldbesentto.Todothis,weshouldaddthediscovery.zen.ping.unicast.hostspropertytoourelasticsearch.ymlconfigurationfile.Basically,weshouldspecifyallthehoststhatformtheclusterinthediscovery.zen.ping.unicast.hostsproperty(wedon’thavetospecifyallthehosts,wejustneedtoprovideenoughsothatwearesurethatasingleonewillwork).Forexample,ifwewantthehosts192.168.2.1,192.168.2.2and192.168.2.3forourhost,weshouldspecifytheprecedingpropertyinthefollowingway:

discovery.zen.ping.unicast.hosts:192.168.2.1:9300,192.168.2.2:9300,

192.168.2.3:9300

OnecanalsodefinearangeoftheportsElasticsearchcanuse.Forexample,tosaythatportsfrom9300to9399canbeused,wespecifythefollowing:

discovery.zen.ping.unicast.hosts:192.168.2.1:[9300-9399],192.168.2.2:

[9300-9399],192.168.2.3:[9300-9399]

Notethatthehostsareseparatedwithacommacharacterandwe’vespecifiedtheportonwhichweexpectunicastmessages.

FaultdetectionpingsettingsInadditiontothesettingsdiscussedpreviously,wecanalsocontroloralterthedefaultpingconfiguration.Pingisasignalsentbetweenthenodestocheckiftheyarerunningandresponsive.Themasternodepingsalltheothernodesintheclusterandeachoftheothernodesintheclusterpingsthemasternode.Thefollowingpropertiescanbeset:

discovery.zen.fd.ping_interval:Thisdefaultsto1s(onesecond)andspecifieshowoftenthenodespingeachotherdiscovery.zen.fd.ping_timeout:Thisdefaultsto30s(30seconds)anddefineshowlonganodewillwaitfortheresponsetoitspingmessagebeforeconsideringanodeasunresponsivediscovery.zen.fd.ping_retries:Thisdefaultsto3andspecifieshowmanyretriesshouldbetakenbeforeconsideringanodeasnotworking

Ifyouexperiencesomeproblemswithyournetwork,oryouknowthatyournodesneedmoretimetoseethepingresponse,youcanadjusttheprecedingvaluestotheonesthat

www.EBooksWorld.ir

Page 546: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

aregoodforyourdeployment.

ClusterstateupdatescontrolAswehavealreadydiscussed,themasternodeistheoneresponsibleforhandlingthechangesoftheclusterstateandElasticsearchallowsustocontrolthatprocess.Formostusecases,thedefaultsettingsaremorethanenough,butyoumayrunintosituationswherechangingthesettingsisrequired.

Themasternodeprocessesasingleclusterstatecommandatatime.Firstthemasternodepropagatesthechangestoothernodesandthenitwaitsforresponse.Eachclusterstatechangeisnotconsideredfinisheduntilenoughnodesrespondtothemasterwithacknoledgment.Thenumberofnodesthatneedtorespondisspecifiedbydiscovery.zen.minimum_master_nodes,whichwearealreadyawareof.ThemaximumtimeanElasticsearchnodewaitsforthenodestorespondis30sbydefaultandisspecifiedbythediscovery.zen.commit_timeoutproperty.Ifnotenoughnodesrespondtothemaster,theclusterstatechangeisrejected.

Onceenoughnodesrespondtothemasterpublishmessage,theclusterstatechangeisacceptedonthemasterandtheclusterstateischanged.Oncethatisdone,themastersendsamessagetoallthenodessayingthatthechangecanbeapplied.Thetimeoutofthismessageisagainsetto30secondsandiscontrolledusingthediscovery.zen.publish_timeoutproperty.

DealingwithmasterunavailabilityIfaclusterhasnomasternode,whateverthereasonmaybe,itisnotfullyoperational.Bydefault,wecan’tchangethemetadata,clusterwidecommandswillnotbeworking,andsoon.Elasticsearchallowsustoconfigurethebehaviorofthenodeswhenthemasternodeisnotelected.Todothat,wecanusethediscovery.zen.no_master_blockpropertywhichthesettingsofallandwrite.Settingthispropertytoallmeansthatalltheoperationsonthenodewillberejected,thatis,thesearchoperations,thewriterelatedoperations,andtheclusterwideoperationssuchashealthormappingsretrieval.Settingthispropertytowritemeansthatonlythewriteoperationwillberejected–thisisthedefaultbehaviorofElasticsearch.

www.EBooksWorld.ir

Page 547: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AdjustingHTTPtransportsettingsWhilediscussingthenodediscoverymoduleandprocess,wementionedtheHTTPmoduleafewtimes.WewouldliketogetbacktothattopicnowanddiscussafewusefulpropertieswhendiscussingandusingElasticsearch.

DisablingHTTPThefirstthingisdisablingtheHTTPcompletely.Thisisusefultoensurethatthemasteranddatanodeswon’tacceptanyqueriesorrequestsingeneralfromusers.TodisabletheHTTPtransportcompletely,wejustneedtoaddthehttp.enabledpropertyandsetittofalseinourelasticsearch.ymlfile.

HTTPportElasticsearchallowsustodefinetheportonwhichitwillbelisteningtoHTTPrequests.Thisisdonebyusingthehttp.portproperty.Itdefaultsto9200-9300,whichmeansthatElasticsearchwillstartfrom9200portandincreaseiftheportisnotavailable(sothenextinstancewilluse9201port,andsoon).Thereisalsohttp.publish_port,whichisveryusefulwhenrunningElasticsearchbehindafirewallandwhentheHTTPportisnotdirectlyaccessible.ItdefineswhichportshouldbeusedbytheclientsconnectingtoElasticsearchanddefaultstothesamevalueasthehttp.portproperty.

HTTPhostWecanalsodefinethehosttowhichElasticsearchwillbind.Tospecifyit,weneedtodefinethehttp.hostproperty.Thedefaultvalueistheonesetbythenetworkmodule.Ifneeded,wecansetthepublishhostandthebindhostseparatelyusingthehttp.publish_hostandhttp.bind_hostproperties.Youusuallydon’thavetospecifythesepropertiesunlessyournodeshavenonstandardhostnamesormultiplenamesandyouwantElasticsearchtobindtoasingleoneonly.

YoucanfindthefulllistofpropertiesallowedfortheHTTPmoduleinElasticsearchofficialdocumentationavailableathttps://www.elastic.co/guide/en/elasticsearch/reference/2.2/modules-http.html.

www.EBooksWorld.ir

Page 548: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 549: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThegatewayandrecoverymodulesApartfromourindicesandthedataindexedinsidethem,Elasticsearchneedstoholdthemetadata,suchasthetypemappings,theindexlevelsettings,andsoon.Thisinformationneedstobepersistedsomewheresoitcanbereadduringclusterrecovery.Ofcourse,itcouldbestoredinmemory,butfullclusterrestartorafatalfailurewouldresultinthisinformationbeinglost,whichisnotsomethingthatwewant.ThisiswhyElasticsearchintroducedthegatewaymodule.Youcanthinkaboutitasasafeheavenforyourclusterdataandmetadata.Eachtimeyoustartyourcluster,alltheneededdataisreadfromthegatewayand,whenyoumakeachangetoyourcluster,itispersistedusingthegatewaymodule.

www.EBooksWorld.ir

Page 550: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThegatewayInordertosetthetypeofgatewaywewanttouse,weneedtoaddthegateway.typepropertytotheelasticsearch.ymlconfigurationfileandsetittothelocalvalue.Currently,Elasticsearchrecommendsusingthelocalgatewaytype(gateway.typesettolocal),whichisthedefaultoneandtheonlyoneavailablewithoutadditionalplugins.

Thedefaultlocalgatewaytypestorestheindicesandtheirmetadatainthelocalfilesystem.Comparedtotheothergateways,thewriteoperationtothisgatewayisnotperformedinanasynchronousway,so,wheneverawritesucceeds,youcanbesurethatthedatawaswrittenintothegateway(sobasicallyindexedorstoredinthetransactionlog).

www.EBooksWorld.ir

Page 551: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RecoverycontrolInadditiontochoosingthegatewaytype,Elasticsearchallowsustoconfigurewhentostarttheinitialrecoveryprocess.Therecoveryisaprocessofinitializingalltheshardsandreplicas,readingallthedatafromthetransactionlog,andapplyingthemontheshards.Basically,it’saprocessneededtostartElasticsearch.

Forexample,let’simaginethatwehaveaclusterthatconsistsof10Elasticsearchnodes.WeshouldinformElasticsearchaboutthenumberofnodesbysettinggateway.expected_nodestothatvalue,so10inourcase.WeinformElasticsearchaboutthenumberofexpectednodesthatareeligibletoholdthedataandeligibletobeselectedasamaster.Elasticsearchwillstarttherecoveryprocessimmediatelyifthenumberofnodesintheclusterisequaltothatproperty.

Wewouldalsoliketostarttherecoveryaftersixnodesaretogether.Todothis,weshouldsetthegateway.recover_after_nodespropertyto6.Thispropertyshouldbesettoavaluethatensuresthatthenewestversionoftheclusterstatesnapshotwillbeavailable,whichusuallymeansthatyoushouldstartrecoverywhenmostofyournodesareavailable.

Thereisalsoonemorething.Wewouldlikethegatewayrecoveryprocesstostart5minutesafterthegateway.recover_after_nodesconditionismet.Todothis,wesetthegateway.recover_after_timepropertyto5m.Thispropertytellsthegatewaymodulehowlongtowaitwiththerecoveryprocessafterthenumberofnodesreachedtheminimumspecifiedbythegateway.recovery_after_nodesproperty.Wemaywanttodothisbecauseweknowthatournetworkisquiteslowandwewantthenodescommunicationtobestable.NotethatElasticsearchwon’tdelaytherecoveryifthenumberofmasteranddataeligiblenodesthatformedtheclusterisequaltothevalueofthegateway.expected_nodesproperty.

Theprecedingpropertyvaluesshouldbesetintheelasticsearch.ymlconfigurationfile.Forexample:ifwewouldliketohavethepreviouslydiscussedvalueinthementionedfile,wewouldendupwiththefollowingsectioninthefile:

gateway.recover_after_nodes:6

gateway.recover_after_time:5m

gateway.expected_nodes:10

AdditionalgatewayrecoveryoptionsInadditiontothementionedoptions,Elasticsearchallowsussomeadditionaldegreeofcontrol.Theseadditionaloptionsare:

gateway.recover_after_master_nodes:Thisissimilartothegateway_recover_after_nodesproperty,butinsteadoftakingintoconsiderationallthenodes,itallowsustospecifyhowmanymastereligiblenodesshouldbepresentintheclusterbeforerecoverystartsgateway.recover_after_data_nodes:Thisisalsosimilartothegateway_recover_after_nodesproperty,butitallowsspecifyinghowmanydata

www.EBooksWorld.ir

Page 552: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

nodesshouldbepresentintheclusterbeforerecoverystartsgateway.expected_master_nodes:Thisissimilartothegateway.expected_nodesproperty,butinsteadofspecifyingthenumberofallthenodesthatweexpectinthecluster,itallowsspecifyinghowmanymastereligiblenodesweexpecttobepresentgateway.expected_master_nodes:Thisissimilartothegateway.expected_nodesproperty,butallowsspecifyinghowmanymasternodesweexpecttobepresentgateway.expected_data_nodes:Thisisalsosimilartothegateway.expected_nodesproperty,butallowsspecifyinghowmanydatanodesweexpecttobepresent

IndicesrecoveryAPIThereisalsooneotherthingwhenitcomestotherecoveryprocess–theindicesrecoveryAPI.Itallowsustoseetheprocessofindexorindicesrecovery.Touseit,wejustneedtospecifytheindicesandusethe_recoveryend-point.Forexample,tochecktherecoveryprocessofthelibraryindex,wewillrunthefollowingcommand:

curl-XGET'localhost:9200/library/_recovery?pretty'

Theresponsefortheprecedingcommandcanbelargeanddependsonthenumberofshardsintheindexandofcoursetheamountofindiceswewanttogetinformationfor.Inourcase,theresponselooksasfollows(weleftinformationaboutasingleshardtomakeitlessextensive):

{

"library":{

"shards":[{

"id":0,

"type":"STORE",

"stage":"DONE",

"primary":true,

"start_time_in_millis":1444030695956,

"stop_time_in_millis":1444030695962,

"total_time_in_millis":5,

"source":{

"id":"Brt5ejEVSVCkIfvY9iDMRQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"PuffAdder"

},

"target":{

"id":"Brt5ejEVSVCkIfvY9iDMRQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"PuffAdder"

},

"index":{

"size":{

"total_in_bytes":157,

"reused_in_bytes":157,

"recovered_in_bytes":0,

www.EBooksWorld.ir

Page 553: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"percent":"100.0%"

},

"files":{

"total":1,

"reused":1,

"recovered":0,

"percent":"100.0%"

},

"total_time_in_millis":1,

"source_throttle_time_in_millis":0,

"target_throttle_time_in_millis":0

},

"translog":{

"recovered":0,

"total":-1,

"percent":"-1.0%",

"total_on_start":-1,

"total_time_in_millis":4

},

"verify_index":{

"check_index_time_in_millis":0,

"total_time_in_millis":0

}

},

...

]

}

}

Asyoucanseeintheresponse,weseetheinformationabouteachshard.Foreachshard,weseethetypeoftheoperation(thetypeproperty),thestage(thestageproperty)describingwhatpartoftherecoveryprocessisinprogress,andwhetheritisaprimaryshard(theprimaryproperty).Inadditiontothis,weseesectionsaboutthesourceshard,thetargetshard,theindextheshardispartof,theinformationaboutthetransactionlog,andfinallyinformationabouttheindexverification.Allofthisallowsustoseewhatisthestatusoftherecoveryofourindices.

DelayedallocationWealreadydiscussedthatbydefaultElasticsearchtriestobalancetheshardsintheclusteraccordinglytothenumberofnodesinthatcluster.Becauseofthat,whenanodedropsoffthecluster(ormultiplenodesdo)orwhennodesjointhecluster,Elasticsearchstartsrebalancingthecluster,movingtheshardsandthereplicasaround.Thisisusuallyveryexpensive–newprimaryshardsmaybepromotedoutoftheavailablereplicas,largeamountofdatamaybecopiedbetweenthenewprimaryanditsreplicas,andsoon.Andthismaybehappeningbecauseasinglenodewasjustrestartedfor30secondsmaintenance.

Toavoidsuchsituations,Elasticsearchprovidesuswiththepossibilitytocontrolhowlongtowaitbeforebeginningallocationofshardsthatareinunassignedstate.Wecancontrolthedelaybyusingtheindex.unassigned.node_left.delayed_timeoutpropertyandsettingitonperindexbasis.Forexample,toconfiguretheallocationtimeoutforthe

www.EBooksWorld.ir

Page 554: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

libraryindexto10minutes,werunthefollowingcommand:

curl-XPUT'localhost:9200/library/_settings'-d'{

"settings":{

"index.unassigned.node_left.delayed_timeout":"10m"

}

}'

Wecanalsoconfiguretheallocationtimeoutforalltheindicesbyrunningthefollowingcommand:

curl-XPUT'localhost:9200/_all/_settings'-d'{

"settings":{

"index.unassigned.node_left.delayed_timeout":"10m"

}

}'

IndexrecoveryprioritizationElasticsearch2.2exposesonemorefeaturewhenitcomestotheindicesrecoveryprocessthatallowsustodefinewhichindicesshouldbeprioritizedwhenitcomestorecovery.Byspecifyingtheindex.prioritypropertyintheindexsettingsandassigningitapositiveintegervalue,wedefinetheorderinwhichElasticsearchshouldrecovertheindices;theoneswiththehigherindex.prioritypropertywillbestartedfirst.

Forexample,let’sassumethatwehavetwoindices,libraryandmap,andwewantthelibraryindextoberecoveredbeforethemapindex.Todothis,wewillrunthefollowingcommands:

curl-XPUT'localhost:9200/library/_settings'-d'{

"settings":{

"index.priority":10

}

}'

curl-XPUT'localhost:9200/map/_settings'-d'{

"settings":{

"index.priority":1

}

}'

Weassignedhigherprioritytothelibraryindexand,becauseofthat,itwillberecoveredfaster.

www.EBooksWorld.ir

Page 555: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 556: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TemplatesanddynamictemplatesIntheMappingsconfigurationsectionofChapter2,IndexingYourData,wediscussedmappings,howtheyarecreated,andhowthetype-determiningmechanismworks.Nowwewillgetintomoreadvancedtopics.Wewillshowyouhowtodynamicallycreatemappingsfornewindicesandhowtoapplysomelogictothetemplates,sothatnewindicesarealreadycreatedwithpredefinedmappings.

www.EBooksWorld.ir

Page 557: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TemplatesInvariouspartsofthebook,whendiscussingindexconfigurationanditsstructure,we’veseenthatthiscanbecomecomplicated,especiallywhenwehavesophisticateddatastructuresthatwewanttoindex,search,andaggregate.Especiallyifyouhavealotofsimilarindices,takingcareofthemappingsineachofthemcanbeaverypainfulprocess–eachnewindexhastobecreatedwithappropriatemappings.Elasticsearchcreatorspredictedthisandimplementedafeaturecalledindextemplates.Eachtemplatedefinesapattern,whichiscomparedtoanewlycreatedindexname.Whenbothofthemmatch,thevaluesdefinedinthetemplatearecopiedtotheindexstructuredefinition.Whenmultipletemplatesmatchthenameofthenewlycreatedindex,allofthemareappliedandthevaluesfromthetemplatesthatareappliedlateroverridethevaluesdefinedinthepreviouslyappliedtemplates.Thisisveryconvenientbecausewecandefineafewcommonsettingsinthegeneraltemplatesandchangetheminthemorespecializedones.Inaddition,thereisanorderparameterthatletsusforcethedesiredtemplateordering.Youcanthinkoftemplatesasdynamicmappingsthatcanbeappliednottothetypesindocumentsbuttotheindices.

AnexampleofatemplateLet’sseearealexampleofatemplate.Imaginethatwewanttocreatemanyindicesinwhichwedon’twanttostorethesourceofthedocumentssothatourindicesaresmaller.Wealsodon’tneedanyreplicas.WecancreateatemplatethatmatchesourneedbyusingtheElasticsearchRESTAPIandthe/_templateendpoint,bysendingthefollowingcommand:

curl-XPUThttp://localhost:9200/_template/main_template?pretty-d'{

"template":"*",

"order":1,

"settings":{

"index.number_of_replicas":0

},

"mappings":{

"_default_":{

"_source":{

"enabled":false

}

}

}

}'

Fromnowon,allthecreatedindiceswillhavenoreplicasandnosourcestored.Thisisbecausethetemplateparametervalueissetto*,whichmatchesallthenamesoftheindices.Notethe_default_typenameinourexample.Thisisaspecialtypenamewhichindicatesthatthecurrentruleshouldbeappliedtoeverydocumenttype.Thesecondinterestingthingistheorderparameter.Let’sdefineasecondtemplatebyusingthefollowingcommand:

curl-XPUThttp://localhost:9200/_template/ha_template?pretty-d'{

"template":"ha_*",

www.EBooksWorld.ir

Page 558: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"order":10,

"settings":{

"index.number_of_replicas":5

}

}'

Afterrunningtheprecedingcommand,allthenewindiceswillbehaveasearlierexcepttheoneswithnamesbeginningwithha_.Incaseoftheseindices,boththetemplatesareapplied.First,thetemplatewiththelowerordervalueisusedandthenthenexttemplateoverwritesthereplica’ssetting.So,theindiceswhosenamesstartwithha_willhavefivereplicasanddisabledsourcesstored.

NoteBeforeversion2.0,Elasticsearchtemplatescouldalsobestoredinfiles.StartingwithElasticsearch2.0,thisfeatureisnolongeravailable.

www.EBooksWorld.ir

Page 559: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DynamictemplatesSometimeswewanttohavethepossibilityofdefiningtypethatisdependentonthefieldnameandthetype.Thisiswheredynamictemplatescanhelp.Dynamictemplatesaresimilartotheusualmappings,buteachtemplatehasitspatterndefined,whichisappliedtoadocument’sfieldname.Ifafieldnamematchesthepattern,thetemplateisused.

Let’shavealookatthefollowingexample:

curl-XPOST'localhost:9200/news'-d'{

"mappings":{

"article":{

"dynamic_templates":[

{

"template_test":{

"match":"*",

"mapping":{

"index":"analyzed",

"fields":{

"str":{

"type":"{dynamic_type}",

"index":"not_analyzed"

}

}

}

}

}

]

}

}

}'

Intheprecedingexample,wedefinedthemappingforthearticletype.Inthismapping,wehaveonlyonedynamictemplatenamedtemplate_test.Thistemplateisappliedforeveryfieldintheinputdocumentbecauseofthesingleasteriskpatterninthematchproperty.Eachfieldwillbetreatedasamultifield,consistingofafieldnamedastheoriginalfield(forexample,title)andthesecondfieldwithanamesuffixedwithstr(forexample,title.str).ThefirstfieldwillhaveitstypedeterminedbyElasticsearch(withthe{dynamic_type}type),andthesecondfieldwillbeastring(becauseofthestringtype).

ThematchingpatternWehavetwowaysofdefiningthematchingpattern.Theyareasfollows:

match:Thistemplateisusedifthenameofthefieldmatchesthepattern(thispatterntypewasusedinourexample)unmatch:Thistemplateisusedifthenameofthefielddoesn’tmatchthepattern

Bydefault,thepatternisverysimpleandusesglobpatterns.Thiscanbechangedbyusingmatch_pattern=regexp.Afteraddingthisproperty,wecanuseallthemagicprovidedbyregularexpressionstomatchandunmatchthepatterns.Therearevariationssuchas

www.EBooksWorld.ir

Page 560: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

path_matchandpath_unmatchthatcanbeusedtomatchthenamesinnesteddocuments(byprovidingpath,similartoqueries).

FielddefinitionsWhenwritingatargetfielddefinition,thefollowingvariablescanbeused:

{name}:Thenameoftheoriginalfieldfoundintheinputdocument{dynamic_type}:Thetypedeterminedfromtheoriginaldocument

NoteNotethatElasticsearchchecksthetemplatesintheorderoftheirdefinitionsandthefirstmatchingtemplateisapplied.Thismeansthatthemostgenerictemplates(forexample,with"match":"*")mustbedefinedattheend.

www.EBooksWorld.ir

Page 561: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 562: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchpluginsAtvariousplacesinthisbook,wehaveuseddifferentpluginsthathavebeenabletoextendthecorefunctionalityofElasticsearch.YouprobablyremembertheadditionalprogramminglanguagesusedinscriptsdescribedintheScriptingcapabilitiesofElasticsearchsectionofChapter6,MakeYourSearchBetter.Inthissection,wewilllookathowthepluginsworkandhowtoinstallthem.

www.EBooksWorld.ir

Page 563: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThebasicsBydefault,Elasticsearchpluginsarelocatedintheirownsubdirectoryinthepluginssubdirectoryofthesearchenginehomedirectory.Ifyouhavedownloadedanewpluginmanually,youcanjustcreateanewdirectorywiththepluginnameandunpackthatpluginarchivetothisdirectory.Thereisalsoamoreconvenientwaytoinstallplugins:byusingthepluginscript.Wehaveuseditseveraltimesinthisbookwithouttalkingaboutit,sothistimelet’stakethetimeanddescribethistool.

Elasticsearchhastwomaintypesofplugins.Thesetwotypescanbecategorizedbasedonthecontentoftheplugin-descriptor.propertiesfile:Javapluginsandsiteplugins.Let’sstartwiththesiteplugins.TheyusuallycontainsetsofHTML,CSS,andJavaScriptfilesandaddadditionalUIcomponentstoElasticsearch.Elasticsearchtreatsthesitepluginsasafilesetthatshouldbeservedbythebuilt-inHTTPserverunderthe/_plugin/plugin_name/URL(forexample,/_plugin/bigdesk/).Thistypeofplugindoesn’tchangeanythingincoreElasticsearchfunctionality.

TheJavapluginsaretheonesthataddormodifythecoreElasticsearchfeatures.TheyusuallycontaintheJARfiles.Theplugin-descriptor.propertiesfilecontainsinformationaboutthemainclassthatshouldbeusedbyElasticsearchasanentrypointtoconfigurepluginsandallowthemtoextendtheElasticsearchfunctionality.ThenicethingabouttheJavapluginsisthattheycancontainthesitepartaswell.Thesitepartofthepluginneedstobeplacedinthe_sitedirectoryifweareunpackingthepluginmanually.

www.EBooksWorld.ir

Page 564: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

InstallingpluginsPluginscanbedownloadedfromthreesourcetypes.Thefirstistheofficialrepositorylocatedathttps://download.elastic.co.Allpluginsfromthissourcecanbeinstalledbyreferringtothepluginname.Forexample:

bin/plugininstalllang-javascript

Theprecedingcommandresultsininstallationofapluginthatallowsustouseanadditionalscriptinglanguage,JavaScript.ElasticsearchautomaticallytriestofindapluginversionthatisthesameastheversionofElasticsearchweareusing.Sometimes,likeinthefollowingexample,apluginmayaskforadditionalpermissionsduringinstallation.

Justsoweknowwhattoexpect,thisisanexampleresultofrunningtheprecedingcommand:

->Installinglang-javascript…

Trying

https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/

lang-javascript/2.2.0/lang-javascript-2.2.0.zip…

Downloading…...............................................................

.........DONE

Verifying

https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/

lang-javascript/2.2.0/lang-javascript-2.2.0.zipchecksumsifavailable…

Downloading.DONE

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

@WARNING:pluginrequiresadditionalpermissions@

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

*java.lang.RuntimePermissioncreateClassLoader

*org.elasticsearch.script.ClassPermission<<STANDARD>>

*org.elasticsearch.script.ClassPermission

org.mozilla.javascript.ContextFactory

*org.elasticsearch.script.ClassPermissionorg.mozilla.javascript.Callable

*org.elasticsearch.script.ClassPermission

org.mozilla.javascript.NativeFunction

*org.elasticsearch.script.ClassPermissionorg.mozilla.javascript.Script

*org.elasticsearch.script.ClassPermission

org.mozilla.javascript.ScriptRuntime

*org.elasticsearch.script.ClassPermissionorg.mozilla.javascript.Undefined

*org.elasticsearch.script.ClassPermission

org.mozilla.javascript.optimizer.OptRuntime

See

http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.

html

fordescriptionsofwhatthesepermissionsallowandtheassociatedrisks.

Continuewithinstallation?[y/N]y

Installedlang-javascriptinto/Users/someplace/elasticsearch-

2.2.0/plugins/lang-javascript

Installedlang-javascriptinto

/Users/negativ/Developer/Elastic/elasticsearch-2.2.0/plugins/lang-

javascript

www.EBooksWorld.ir

Page 565: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Ifthepluginisnotavailableatthefirstlocation,itcanbeplacedinoneoftheApacheMavenrepositories:MavenCentral(https://search.maven.org/)orMavenSonatype(https://oss.sonatype.org/).Inthiscase,thepluginnameforinstallationshouldbeequaltogroupId/artifactId/version,justaseverylibraryforMaven(http://maven.apache.org/).Forexample:

bin/plugininstallorg.elasticsearch/elasticsearch-mapper-attachments/3.0.1

ThethirdsourcearetheGitHub(https://github.com/)repositories.Theplugintoolassumesthatthegivenpluginaddresscontainstheorganizationnamefollowedbythepluginnameand,optionally,theversionnumber.Let’slookatthefollowingcommandexample:

bin/plugininstallmobz/elasticsearch-head

Ifyouwriteyourownpluginandyouhavenoaccesstotheearlier-mentionedsites,thereisnoproblem.Theplugintoolacceptstheurlpropertyfromwherethepluginshouldbedownloaded(insteadofspecifyingthenameoftheplugin).Thisoptionallowsustosetanylocationfortheplugins,includingthelocalfilesystem(usingthefile://prefix)orremotefile(usingthehttp://prefix).Forexample,thefollowingcommandwillresultintheinstallationofapluginarchivedonthelocalfilesysteminthe/tmp/elasticsearch-lang-javascript-3.0.0.RC1.zipdirectory:

bin/plugininstallfile:///tmp/elasticsearch-lang-javascript-3.0.0.RC1.zip

www.EBooksWorld.ir

Page 566: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RemovingpluginsRemovingapluginisassimpleasremovingitsdirectory.Youcanalsodothisbyusingtheplugintool.Forexample,toremovethepreviouslyinstalledJavaScriptplugin,werunacommandasfollows:

bin/pluginremovelang-javascript

Theoutputfromthecommandjustconfirmsthatthepluginwasremoved:

->Removinglang-javascript…

Removedlang-javascript

NoteYouneedtorestarttheElasticsearchnodefortheplugininstallationorremovaltotakeeffect.

www.EBooksWorld.ir

Page 567: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 568: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchcachesUntilnowwehaven’tmentionedElasticsearchcachesmuchinthebook.However,asmostcommonsystemsElasticsearchusersavarietyofcachestoperformmorecomplicatedoperationsortospeedupperformanceofheavydataretrievalfromdiskbasedLuceneindices.Inthissection,wewilllookatthemostcommoncachesofElasticsearch,whattheyareusedfor,whataretheperformanceimplicationsofusingthem,andhowtoconfigurethem.

www.EBooksWorld.ir

Page 569: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

FielddatacacheInthebeginningofthebook,wediscussedthatElasticsearchusesthesocalledinvertedindexdatastructuretoquicklyandefficientlysearchthroughthedocuments.Thisisverygoodwhensearchingandfilteringthedata,butforfeaturessuchasaggregations,sorting,orscriptusage,Elasticsearchneedsanun-inverteddatastructure,becausethesefunctionsrelyonperdocumentdatainformation.

Becauseoftheneedforuninverteddata,whenElasticsearchwasfirstreleaseditcontainedandstillcontainsaninmemorydatastructurecalledfielddata.Fielddataisusedtostoreallthevaluesofagivenfieldtomemorytoprovideveryfastdocumentbasedlookup.However,thecostofusingfielddataismemoryandincreasedgarbagecollection.Becauseofmemoryandperformancecost,startingfromElasticsearch2.0,eachindexed,notanalyzedfieldusesdocvaluesbydefault.Otherfields,suchasanalyzedtextfields,stillusefielddataandbecauseofthatitisgoodtoknowhowtohandlefielddata.

FielddatasizeElasticsearchallowsustocontrolhowmuchmemorythefielddatacacheuses.Bydefault,thecacheisunbounded,whichisverydangerous.Ifyouhavelargeindices,youmayrunintomemoryissues,wherethefielddatacachewilleatmostofthememorygiventoElasticsearchandwillresultinnodefailure.Weareallowedtoconfigurethesizeofthefielddatacachebyusingthestaticindices.fielddata.cache.sizepropertysettoanexplicitvalue(like10GB)ortoapercentageofthewholememorygiventoElasticsearch(like20%).

Rememberthatthefielddatacacheisveryexpensivetobuildasitneedstoloadallthevaluesofagivenfieldtomemory.Thiscantakealotoftimeresultingindegradationintheperformanceofthequeries.Becauseofthis,itisadvisedtohaveenoughmemorytokeeptheneededcachepermanentlyinElasticsearchmemory.However,weunderstandthatthisisnotalwayspossiblebecauseofhardwarecosts.

CircuitbreakersThenicethingaboutElasticsearchisthatitallowsustoachieveasimilarthinginmultiplewaysandwehavethesamesituationwhenitcomestofielddataandlimitingthememoryusage.Elasticsearchallowsustouseafunctionalitycalledcircuitbreakers,whichcanestimatehowmuchmemoryarequestoraquerywilluse,andifitisaboveadefinedthreshold,itwon’tbeexecutedatall,resultinginnomemoryusageandanexceptionthrown.Thisisverynicewhenwedon’twanttolimitthesizeofthefielddatacachebutwealsodon’twantasinglequerytocausememoryissuesandmaketheclusterunstable.Therearetwomaincircuitbreakers:thefielddatacircuitbreakerandtherequestcircuitbreaker.

Thefirstcircuitbreaker,thefielddataone,estimatestheamountofmemorythatwillneedtobeusedtoloaddatatothefielddatacacheforagivenquery.Wecanconfigurethelimitbyusingtheindices.breaker.fielddata.limitproperty,whichisbydefaultsetto60%,whichmeansthatafielddatacacheforasinglequerycan’tusemorethan60percent

www.EBooksWorld.ir

Page 570: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ofthememorygiventoElasticsearch.

Thesecondcircuitbreaker,therequestone,estimatesthememoryusedbyperrequestdatastructuresandpreventsthemfromusingmorethantheamountspecifiedbytheindices.breaker.request.limitproperty.Bydefault,thementionedpropertyissetto40%,whichmeansthatsinglerequestdatastructures,suchastheonesusedforaggregationcalculation,can’tusemorethan40%ofthememorygiventoElasticsearch.

Finally,thereisonemorecircuitbreakerthatisdefinedbytheindices.breaker.limit.totalproperty(bydefaultsetto70%).Thiscircuitbreakerdefinesthetotalamountofmemorythatcanbeusedbyboththeperrequestdatastructuresandfielddata.

Rememberthatthesettingsforcircuitbreakersaredynamicandcanbeupdatedusingclusterupdatesettings.

www.EBooksWorld.ir

Page 571: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

FielddataanddocvaluesAswealreadydiscussed,insteadoffielddatacache,docvaluescanbeused.Ofcourse,thisisonlytruefornotanalyzedfieldsandonesusingnumericdatatypesandnotmultivaluedones.Thiswillsavememoryandshouldbefasterthanthefielddatacacheduringquerytime,atthecostofslightindexingspeeddegradations(verysmall)andaslightlylargerindex.Ifyoucanusedocvalues,dothat–itwillhelpyourElasticsearchclustertomaintainstabilityandrespondtoqueriesquickly.

www.EBooksWorld.ir

Page 572: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ShardrequestcacheThefirstofthecachesthatoperatesonthequeries.Theshardrequestcachecachestheaggregationsandsuggestionsresultedbythequery,but,whenwritingthisbook,itwasnotcachingqueryhits.WhenElasticsearchexecutesthequery,thiscachecansavetheresourceconsumingaggregationsforthequeryandspeedupthesubsequentqueriesbyretrievingtheaggregationsorsuggestionsfrommemory.

NoteDuringthewritingofthisbook,theshardrequestcachewasonlyusedwhenthesize=0parameterwassetforthequery.Thismeansthatonlythetotalnumberofhits,aggregationresults,andsuggestionswillbecached.Rememberthatwhenrunningquerieswithdatesandusingthenowconstant,theshardquerycachewon’talsobeused.

Theshardrequestcache,asitsnamesays,cachestheresultsofthequeriesoneachshard,beforetheyarereturnedtothenodethataggregatestheresults.Thiscanbeverygoodwhenyouraggregationsareheavy,liketheonesthatdoalotofcomputationonthedatareturnedbythequery.Ifyourunalotofaggregationswithyourqueriesandthequeriescanberepeated,thinkaboutusingtheshardrequestcacheasitshouldhelpyouwithquerieslatency.

EnablingandconfiguringtheshardrequestcacheTheshardrequestcacheisdisabledbydefault,butcanbeeasilyenabled.Toenableit,weshouldsettheindex.requests.cache.enablepropertytotruewhencreatingtheindex.Forexample,toenabletheshardrequestcacheforanindexcallednew_library,weusethefollowingcommand:

curl-XPUT'localhost:9200/new_library'-d'{

"settings":{

"index.requests.cache.enable":true

}

}'

Onethingtorememberisthatthementionedsettingisnotdynamicallyupdatable.Weneedtoincludeitintheindexcreationcommandorwecanupdateitwhentheindexisclosed.

Themaximumsizeofthecacheisspecifiedusingtheindices.requests.cache.sizepropertyandissetto1%bydefault(whichmeans1%ofthetotalmemorygiventoElasticsearch).Wecanalsospecifyhowlongeachentryshouldbekeptbyusingtheindices.requests.cache.expireproperty,butitisnotsetbydefault.Also,thecacheisinvalidatedoncetheindexisrefreshed(duringindexsearcherreopening),whichmakesthesettinguselessmostofthetime.

NoteNotethatintheearlierversionsofElasticsearch,forexampleinthe1.xbranch,toenableordisablethiscache,theindex.cache.query.enablepropertywasused.Thismaybe

www.EBooksWorld.ir

Page 573: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

importantwhenmigratingfromolderElasticsearchversions.

PerrequestshardrequestcachedisablingElasticsearchallowsustocontroltherequestshardcacheusedonaperrequestbasis.Ifwehavethementionedcacheenabled,wecanstillforcethesearchenginetoomitcachingforsuchrequests.Thisisdonebyusingtherequest_cacheparameter.Ifsettotrue,therequestwillbecachedand,ifsettofalse,therequestwon’tbecached.Thisisespeciallyusefulwhenwewanttocacheourrequestsingeneralbutomitcachingforsomequeriesthatarerareandnotusedoften.Itisalsowiseforrequeststhatusenon-deterministicscriptsandtimerangestonotbecached.

ShardrequestcacheusagemonitoringIfwedon’tuseanymonitoringsoftwarethatallowsmonitoringthecachesusage,wecanuseElasticsearchAPItocheckthemetricsaroundtheshardrequestcache.Thiscanbedonebothattheindicesleveloratthenodeslevel.

Tocheckthemetricsfortheshardrequestcacheforalltheindices,weshouldusetheindicesstatsAPIandrunthefollowingcommand:

curl'localhost:9200/_stats/request_cache?pretty'

Tochecktherequestcachemetrics,butinpernodeview,werunthefollowingcommand:

curl'localhost:9200/_nodes/stats/indices/request_cache?pretty'

www.EBooksWorld.ir

Page 574: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NodequerycacheThenodequerycacheisresponsibleforholdingtheresultsofqueriesforthewholenode.Itssizeisdefinedusingindices.queries.cache.size,defaultingto10%,andissharableacrossalltheshardspresentonthenode.WecansetitbothtothepercentageoftheheapmemorygiventoElasticsearch,likethedefaultone,ortoanexplicitvalue,like1024mb.Onethingtorememberaboutthecacheisthatitsconfigurationisstatic,itcan’tbeupdateddynamicallyandshouldbesetintheelasticsearch.ymlfile.Thenodequerycacheusestheleastrecentusedevictionpolicy,whichmeansthat,whenfull,itremovesthedatathatwasusedtheleast.

Thiscacheisveryusefulwhenyourunqueriesthatarerepetetiveandheavy,suchastheonesusedtogeneratecategorypagesorthemainpageinane-commerceapplication.

www.EBooksWorld.ir

Page 575: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexingbuffersThelastcachewewanttodiscussistheindexingbufferthatallowsustoimproveindexingthroughput.Theindexingbufferisdividedbetweenalltheshardsonthenodeandisusedtostorenewlyindexeddocuments.Oncethecachefillsup,Elasticsearchflushesthedatafromthecachetodisk,creatinganewLucenesegmentintheindex.

Therearefourstaticpropertiesthatallowustoconfiguretheindexingbuffersize.Theyneedtobesetintheelasticsearch.ymlfileandcan’tbechangeddynamicallyusingtheSettingsAPI.Thesepropertiesare:

indices.memory.index_buffer_size:Thispropertydefinestheamountofmemoryusedbyanodefortheindexingbuffer.Itacceptsbothapercentagevalueaswellasanexplicitvalueinbytes.Itdefaultsto10%,whichmeansthat10%oftheheapmemorygiventoanodewillbeusedastheindexingbuffer.indices.memory.min_index_buffer_size:Thispropertydefaultsto48mbandspecifiestheminimummemorythatwillbeusedbytheindexingbuffer.Itisusefulwhenindices.memory.index_buffer_sizeisdefinedasapercentagevalue,sothattheindexingbufferisneversmallerthanthevaluedefinedbythisproperty.indices.memory.max_index_buffer_size:Thispropertyspecifiesthemaximummemorythatwillbeusedbytheindexingbuffer.Itisusefulwhenindices.memory.index_buffer_sizeisdefinedasapercentagevalue,sothattheindexingbuffernevercrossesacertainamountofmemoryusage.indices.memory.min_shard_index_buffer_size:Thispropertydefaultsto4mbandsetsthehardminimumlimitoftheindexingbufferthatisgiventoeachshardonanode.Theindexingbufferforeachshardwillnotbelowerthanthevaluesetbythisproperty.

Whenitcomestoindexingperformance,ifyouneedhigherindexingthroughput,considersettingtheindexingbuffersizetoavaluehigherthanthedefaultsize.ItwillallowElasticsearchtoflushthedatatodisklessoftenandcreatefewersegments.Thiswillresultinlessmerges,thuslessI/OandCPUintensiveoperations.Becauseofthat,Elasticsearchwillbeabletousemoreresourcesforindexingpurposes.

www.EBooksWorld.ir

Page 576: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WhencachesshouldbeavoidedTheusualquestionthatmaybeaskedbyusersisiftheyshouldreallycachealltheirrequests.Theanswerisobvious–ofcourse,cachesarenotthetoolforeveryone.Usingcachingisnotfree–itrequiresmemoryandadditionaloperationstoputthedatatocacheorgetthedataoutofthere.

What’smore,youshouldrememberthatElasticsearchroundrobinsqueriesbetweenprimaryshardsarereplicas,so,ifyouhavereplicas,noteveryrequestafterthefirstonewillusethecache.Imaginethatyouhaveanindexwhichhasasingleprimaryshardandtworeplicas.Whenthefirstrequestcomes,itwillhitarandomshard,butthenextrequest,evenwiththesamequery,willhitanothershard,notthesameone(unlessroutingisused).Youshouldtakethisintoconsiderationwhenusingcaches,becauseifyourqueriesarenotrepeated,youmayhavethemrunninglongerbecauseofacachebeingused.

Sotoanswerthequestionifyoushouldusecachingornot,wewouldadvisetakingyourdata,takingyourqueries,andrunningperformancetestsusingtoolssuchasJMeter(http://jmeter.apache.org).Thiswillletyouseehowyourclusterbehaveswithrealdataunderatestloadandseeifthequeriesareactuallyfasterwithorwithoutthecaches.

www.EBooksWorld.ir

Page 577: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 578: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheupdatesettingsAPIElasticsearchletsustuneitselfbyspecifyingthevariousparametersintheelasticsearch.ymlfile.ButyoushouldtreatthisfileasthesetofdefaultvaluesthatcanbechangedintheruntimeusingtheElasticsearchRESTAPI.Wecanchangeboththeperindexsettingandtheclusterwidesettings.However,youshouldrememberthatnotallpropertiescanbedynamicallychanged.Ifyoutrytoaltertheseparameters,Elasticsearchwillrespondwithapropererror.

www.EBooksWorld.ir

Page 579: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheclustersettingsAPIInordertosetoneoftheclusterproperties,weneedtousetheHTTPPUTmethodandsendaproperrequesttothe_cluster/settingsURI.However,wehavetwooptions:addingthechangesastransientorpermanent.

Thefirstone,transient,willsetthepropertyonlyuntilthefirstrestart.Inordertodothis,wesendthefollowingcommand:

curl-XPUT'localhost:9200/_cluster/settings'-d'{

"transient":{

"PROPERTY_NAME":"PROPERTY_VALUE"

}

}'

Asyoucansee,intheprecedingcommand,weusedtheobjectnamedtransientandweaddedourpropertydefinitionthere.Thismeansthatthepropertywillbevalidonlyuntiltherestart.Ifwewantourpropertysettingstopersistbetweenrestarts,insteadofusingtheobjectnamedtransient,weusetheonenamedpersistent.

Atanymoment,youcanfetchthesesettingsusingthefollowingcommand:

curl-XGETlocalhost:9200/_cluster/settings

www.EBooksWorld.ir

Page 580: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheindicessettingsAPITochangetheindicesrelatedsettings,Elasticsearchprovidesthe/_settingsendpointforchangingtheparametersforalltheindicesandthe/index_name/_settingsendpointformodifyingthesettingsofasingleindex.Whencomparedtotheclusterwidesettings,allthechangesdonetoindicesusingtheAPIarealwayspersistentandvalidafterElasticsearchrestarts.Tochangethesettingsforalltheindices,wesendthefollowingcommand:

curl-XPUT'localhost:9200/_settings'-d'{

"index":{

"PROPERTY_NAME":"PROPERTY_VALUE"

}

}'

Thecurrentsettingsforalltheindicescanbelistedusingthefollowingcommand:

curl-XGETlocalhost:9200/_settings

Tosetapropertyforasingleindex,werunthefollowingcommand:

curl-XPUT'localhost:9200/index_name/_settings'-d'{

"index":{

"PROPERTY_NAME":"PROPERTY_VALUE"

}

}'

Thegetthesettingsforthelibraryindex,werunthefollowingcommand:

curl-XGETlocalhost:9200/library/_settings

www.EBooksWorld.ir

Page 581: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 582: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryInthechapterwejustfinished,welearnedafewveryimportantthingsaboutElasticsearch.Firstofall,welearnedhowwecanconfigurethenodediscoverymechanism.Inadditiontothat,welearnedtocontrolwhathappensaftertheclusterisinitiallyformedusingtherecoveryandgatewaymodules.Weuseddynamicandnon-dynamictemplatestohandleourindicesmoreeasily,andwelearnedwhattypeofcachesElasticsearchhasandhowtocontrolthem.Finally,weusedtheupdatesettingsAPItoupdatethevariousElasticsearchconfigurationvariablesonanalreadylivecluster.

Inthenextchapter,wewillfocusonclusteradministration.Wewillstartwithlearninghowtobackupourdataandhowtomonitorthekeyclustermetrics.We’llseethewaytocontrolclusterrebalancingandshardallocation,andwewilluseahumanfriendlyCatAPIthatallowsustogetvariedinformationaboutthecluster.Finally,wewilllearnaboutwarmingupourindicesandaliasing.

www.EBooksWorld.ir

Page 583: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 584: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter10.AdministratingYourClusterInthepreviouschapter,wefocusedonElasticsearchnodesandclusterconfiguration.Westartedbydiscussingthenodediscoveryprocess,whatitisandhowtoconfigureit.We’vediscussedgatewayandrecoverymodulesandtunedthemtomatchourneeds.We’veusedtemplatesanddynamictemplatestomanagedatastructureeasilyandlearnedhowtoinstallpluginstoextendthefunctionalitiesofElasticsearch.Finally,we’velearnedaboutthecachesofElasticsearchandhowtoupdateindicesandclustersettingsusingadedicatedAPI.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

BackingupyourindicesinElasticsearchMonitoringyourclustersControllingshardsandrebalancingreplicasControllingshardsandallocatingreplicasUsingCATAPItolearnaboutclusterstateWarmingupAliasing

www.EBooksWorld.ir

Page 585: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchtimemachineAgoodpieceofsoftwareisaonethatcanmanageexceptionalsituationssuchashardwarefailureorhumanerror.Eventhoughaclusterofafewserversislessdependentonhardwareproblems,badthingscanstillhappen.Forexample,let’simaginethatyouneedtorestoreyourindices.OnepossiblesolutionistoreindexallyourdatafromaprimarydatastoresuchasaSQLdatabase.Butwhatwillyoudoifittakestoolongor,evenworse,theonlydatastoreisElasticsearch?BeforeElasticsearch1.0,creatingbackupsofindiceswasnoteasy.Theprocedureincludedstoppingindexation,flushingthedatatodisk,shuttingdownthecluster,and,finally,copyingthedatatoabackupdevice.

Fortunately,nowwecantakesnapshotsandthissectionwillguideyouandshowhowthisfunctionalityworks.

www.EBooksWorld.ir

Page 586: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CreatingasnapshotrepositoryAsnapshotkeepsallthedatarelatedtotheclusterfromthetimethesnapshotcreationstartsanditincludesinformationabouttheclusterstateandindices.Beforewecreatesnapshots,atleastthefirstone,asnapshotrepositorymustbecreated.Eachrepositoryisrecognizedbyitsnameandshoulddefinethefollowingaspects:

name:Auniquenameoftherepository;wewillneeditlater.type:Thetypeoftherepository.Thepossiblevaluesarefs(arepositoryonasharedfilesystem)andurl(aread-onlyrepositoryavailableviaURL)settings:Additionalinformationneededdependingontherepositorytype

Now,let’screateafilesystemrepository.Beforethis,wehavetomakesurethatthedirectoryforourbackupsfulfilstworequirements.Thefirstisrelatedtosecurity.EveryrepositoryhastobeplacedinthepathdefinedintheElasticsearchconfigurationfileaspath.repo.Forexample,ourelasticsearch.ymlincludesalinesimilartothefollowingone:

path.repo:["/tmp/es_backup_folder","/tmp/backup/es"]

Thesecondrequirementsaysthateverynodeintheclustershouldbeabletoaccessthedirectorywesetfortherepository.

Sonow,let’screateanewfilesystemrepositorybyrunningthefollowingcommand:

curl-XPUTlocalhost:9200/_snapshot/backup-d'{

"type":"fs",

"settings":{

"location":"/tmp/es_backup_folder/cluster1"

}

}'

Theprecedingcommandcreatesarepositorynamedbackup,whichstoresthebackupfilesinthedirectorygivenbythelocationattribute.Elasticsearchrespondswiththefollowinginformation:

{"acknowledged":true}

Atthesametime,es_backup_folderonthelocalfilesystemiscreated—withoutanycontentyet.

NoteYoucanalsosetarelativepathwiththelocationparameter.Inthiscase,Elasticsearchdeterminestheabsolutepathbyfirstgettingthedirectorydefinedinpath.repo.

Aswesaid,thesecondrepositorytypeisurl.Itrequiresaurlparameterinsteadofthelocation,whichpointstotheaddresswheretherepositoryresides,forexample,theHTTPaddress.Asinthepreviouscase,theaddressshouldbedefinedintherepositories.url.allowed_urlsparameterintheElasticsearchconfiguration.Theparameterallowstheuseofwildcardsintheaddress.

www.EBooksWorld.ir

Page 587: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NoteNotethatfile://addressesarecheckedagainstthepathsdefinedinthepath.repoparameter.

YoucanalsostoresnapshotsinAmazonS3,HDFS,orAzureusingtheadditionalpluginsavailable.Tolearnaboutthese,pleasevisitthefollowingpages:

https://github.com/elastic/elasticsearch-cloud-aws#s3-repositoryhttps://github.com/elastic/elasticsearch-hadoop/tree/master/repository-hdfshttps://github.com/elastic/elasticsearch-cloud-azure#azure-repository

Nowthatwehaveourfirstrepository,wecanseeitsdefinitionusingthefollowingcommand:

curl-XGETlocalhost:9200/_snapshot/backup?pretty

Wecanalsocheckalltherepositoriesbyrunningacommandlikethefollowing:

curl-XGETlocalhost:9200/_snapshot/_all?pretty

Orsimply,wecanusethis:

curl-XGETlocalhost:9200/_snapshot/_all?pretty

curl-XGETlocalhost:9200/_snapshot/?pretty

Ifyouwanttodeleteasnapshotrepository,thestandardDELETEcommandhelps:

curl-XDELETElocalhost:9200/_snapshot/backup?pretty

www.EBooksWorld.ir

Page 588: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CreatingsnapshotsBydefault,Elasticsearchtakesalltheindicesandclustersettings(exceptthetransientones)whencreatingsnapshots.Youcancreateanynumberofsnapshotsandeachwillholdinformationavailablerightfromthetimewhenthesnapshotwascreated.Thesnapshotsarecreatedinasmartway;onlynewinformationiscopied.ThismeansthatElasticsearchknowswhichsegmentsarealreadystoredintherepositoryanddoesn’thavetosavethemagain.

Tocreateanewsnapshot,weneedtochooseauniquenameandusethefollowingcommand:

curl-XPUT'localhost:9200/_snapshot/backup/bckp1'

Theprecedingcommanddefinesanewsnapshotnamedbckp1(youcanonlyhaveonesnapshotwithagivenname;Elasticsearchwillcheckitsuniqueness)anddataisstoredinthepreviouslydefinedbackuprepository.Thecommandreturnsanimmediateresponse,whichlooksasfollows:

{"accepted":true}

Theprecedingresponsemeansthattheprocessofsnapshot-inghasstartedandcontinuesinthebackground.Ifyouwouldliketheresponsetobereturnedonlywhentheactualsnapshotiscreated,youcanaddthewait_for_completion=trueparameterasshowninthefollowingexample:

curl-XPUT'localhost:9200/_snapshot/backup/bckp2?

wait_for_completion=true&pretty'

Theresponsetotheprecedingcommandshowsthestatusofacreatedsnapshot:

{

"snapshot":{

"snapshot":"bckp2",

"version_id":2000099,

"version":"2.2.0",

"indices":["news"],

"state":"SUCCESS",

"start_time":"2016-01-07T21:21:43.740Z",

"start_time_in_millis":1446931303740,

"end_time":"2016-01-07T21:21:44.750Z",

"end_time_in_millis":1446931304750,

"duration_in_millis":1010,

"failures":[],

"shards":{

"total":5,

"failed":0,

"successful":5

}

}

}

Asyoucansee,Elasticsearchpresentsinformationaboutthetimetakenbythesnapshot-

www.EBooksWorld.ir

Page 589: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ingprocess,itsstatus,andtheindicesaffected.

AdditionalparametersThesnapshotcommandalsoacceptsthefollowingadditionalparameters:

indices:Thenamesoftheindicesofwhichwewanttotakesnapshots.ignore_unavailable:Whenthisissettofalse(thedefault),Elasticsearchwillreturnanerrorifanyindexlistedusingtheindicesparameterismissing.Whensettotrue,Elasticsearchwilljustignorethemissingindicesduringbackup.include_global_state:Whenthisissettotrue(thedefault),theclusterstateisalsowrittentothesnapshot(exceptforthetransientsettings).partial:Thesnapshotoperationsuccessdependsontheavailabilityofalltheshards.Ifanyoftheshardsisnotavailable,thesnapshotoperationwillfail.SettingpartialtotruecausesElasticsearchtosaveonlytheavailableshardsandomitthelostones.

Anexampleofusingadditionalparameterscanlookasfollows:

curl-XPUT'localhost:9200/_snapshot/backup/bckp3?

wait_for_completion=true&pretty'-d'{

"indices":"b*",

"include_global_state":"false"

}'

www.EBooksWorld.ir

Page 590: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RestoringasnapshotNowthatwehaveoursnapshotsdone,wewillalsolearnhowtorestoredatafromagivensnapshot.Aswesaidearlier,asnapshotcanbeaddressedbyitsname.Wecanlistallthesnapshotsusingthefollowingcommand:

curl-XGET'localhost:9200/_snapshot/backup/_all?pretty'

TheresponsereturnedbyElasticsearchtotheprecedingcommandshowsthelistofallavailablebackups.Everylistitemissimilartothefollowing:

{

"snapshot":{

"snapshot":"bckp2",

"version_id":2000099,

"version":"2.2.0",

"indices":["news"],

"state":"SUCCESS",

"start_time":"2016-01-07T21:21:43.740Z",

"start_time_in_millis":1446931303740,

"end_time":"2016-01-07T21:21:44.750Z",

"end_time_in_millis":1446931304750,

"duration_in_millis":1010,

"failures":[],

"shards":{

"total":5,

"failed":0,

"successful":5

}

}

}

Therepositorywecreatedearlieriscalledbackup.Torestoreasnapshotnamedbckp1fromoursnapshotrepository,runthefollowingcommand:

curl-XPOST'localhost:9200/_snapshot/backup/bckp1/_restore'

Duringtheexecutionofthiscommand,Elasticsearchtakestheindicesdefinedinthesnapshotandcreatesthemwiththedatafromthesnapshot.However,iftheindexalreadyexistsandisnotclosed,thecommandwillfail.Inthiscase,youmayfinditconvenienttoonlyrestorecertainindices,forexample:

curl-XPOST'localhost:9200/_snapshot/backup/bckp1/_restore?pretty'-d'{

"indices":"c*"}'

Theprecedingcommandrestoresonlytheindicesthatbeginwiththeletterc.Theotheravailableparametersareasfollows:

ignore_unavailable:Thisparameterwhensettofalse(thedefaultbehavior),willcauseElasticsearchtofailtherestoreprocessifanyoftheexpectedindicesisnotavailable.include_global_state:ThisparameterwhensettotruewillcauseElasticsearchtorestoretheglobalstateincludedinthesnapshot,whichisalsothedefaultbehavior.

www.EBooksWorld.ir

Page 591: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

rename_pattern:Thisparameterallowstherenamingoftheindexduringarestoreoperation.Thankstothis,therestoredindexwillhaveadifferentname.Thevalueofthisparameterisaregularexpressionthatdefinesthesourceindexname.Ifapatternmatchesthenameoftheindex,namesubstitutionwilloccur.Inthepattern,youshouldusegroupslimitedbyparenthesesusedintherename_replacementparameter.rename_replacement:Thisparameteralongwithrename_patterndefinesthetargetindexname.Usingthedollarsignandnumber,youcanrecalltheappropriategroupfromrename_pattern.

Forexample,duetorename_pattern=products_(.*),onlytheindiceswithnamesthatbeginwithproducts_willberestored.Therestoftheindexnamewillbeusedduringreplacement.rename_pattern=products_(.*)togetherwithrename_replacement=items_$1causestheproducts_carsindextoberestoredtoanindexcalleditems_cars.

www.EBooksWorld.ir

Page 592: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Cleaningup–deletingoldsnapshotsElasticsearchleavessnapshotrepositorymanagementuptoyou.Currently,thereisnoautomaticclean-upprocess.Butdon’tworry;thisissimple.Forexample,let’sremoveourpreviouslytakensnapshot:

curl-XDELETE'localhost:9200/_snapshot/backup/bckp1?pretty'

Andthat’sall.Thecommandcausesthesnapshotnamedbckp1fromthebackuprepositorytobedeleted.

www.EBooksWorld.ir

Page 593: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 594: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Monitoringyourcluster’sstateandhealthMonitoringisessentialwhenitcomestohandlingyourclusterandensuringitisinahealthystate.Itallowsadministratorsanddevelopstodetectpossibleproblemsandpreventthembeforetheyoccurortoactassoonastheystartshowing.Intheworstcase,monitoringallowsustodoapostmortemanalysisofwhathappenedtotheapplication—inthiscase,ourElasticsearchclusterandeachofthenodes.

Elasticsearchprovidesverydetailedinformationthatallowsustocheckandmonitorournodesortheclusterasawhole.Thisincludesstatisticsandinformationabouttheservers,nodes,indices,andshards.Ofcourse,wearealsoabletogetinformationabouttheentireclusterstate.BeforewegetintothedetailsaboutthementionedAPI,pleaserememberthattheAPIiscomplexandweareonlydescribingthebasics.Wewilltrytoshowyouwheretostartsoyou’llbeabletoknowwhattolookforwhenyouneedverydetailedinformation.

www.EBooksWorld.ir

Page 595: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ClusterhealthAPIOneofthemostbasicAPIsistheclusterhealthAPI,whichallowsustogetinformationabouttheentireclusterstatewithasingleHTTPcommand.Forexample,let’srunthefollowingcommand:

curl-XGET'localhost:9200/_cluster/health?pretty'

AsampleresponsereturnedbyElasticsearchfortheprecedingcommandlooksasfollows:

{

"cluster_name":"elasticsearch",

"status":"yellow",

"timed_out":false,

"number_of_nodes":1,

"number_of_data_nodes":1,

"active_primary_shards":11,

"active_shards":11,

"relocating_shards":0,

"initializing_shards":0,

"unassigned_shards":11,

"delayed_unassigned_shards":0,

"number_of_pending_tasks":0,

"number_of_in_flight_fetch":0,

"task_max_waiting_in_queue_millis":0,

"active_shards_percent_as_number":50.0

}

Themostimportantinformationisaboutthestatusofthecluster.Inourexample,weseethattheclusterisinyellowstatus.Thismeansthatalltheprimaryshardshavebeenallocatedproperly,butthereplicaswerenot(becauseofasinglenodeinthecluster,butthatdoesn’tmatterfornow).

Ofcourse,apartfromtheclusternameandstatus,wecanseehowtherequestwastimedout,howmanynodesthereare,howmanydatanodes,primaryshards,initializingshards,unassignedones,andsoon.

Let’sstophereandtalkabouttheclusterandwhenthecluster,asawhole,isfullyoperational.ClusterisfullyoperationalwhenElasticsearchisabletoallocatealltheshardsandreplicasaccordingtotheconfiguration.Thisiswhentheclusterisinthegreenstate.Theyellowstatemeansthatwearereadytohandlerequestsbecausetheprimaryshardsareallocated,butsome(orall)replicasarenot.Thelaststate,theredone,meansthatatleastoneprimaryshardwasnotallocatedandbecauseofthis,theclusterisnotreadyyet.Thatmeansthatthequeriesmayreturnerrorsornotcompleteresults.

Theprecedingcommandcanalsobeexecutedtocheckthehealthstateofcertainindices.Forexample,ifwewouldliketocheckthehealthofthelibraryandmapindices,wewouldrunthefollowingcommand:

curl-XGET'localhost:9200/_cluster/health/library,map/?pretty'

Controllinginformationdetails

www.EBooksWorld.ir

Page 596: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Elasticsearchallowsustospecifyaspeciallevelparameter,whichcantakethevalueofcluster(default),indices,orshards.ThisallowsustocontrolthedetailsofinformationreturnedbythehealthAPI.We’vealreadyseenthedefaultbehavior.Whensettingthelevelparametertoindices,apartfromtheclusterinformation,wewillalsogetperindexhealth.SettingthementionedparametertoshardstellsElasticsearchtoreturnpershardinformationinadditiontowhatwe’veseenintheexample.

AdditionalparametersInadditiontothelevelparameter,wehaveafewadditionalparametersthatcancontrolthebehaviorofthehealthAPI.

Thefirstofthementionedparametersistimeoutandallowsustocontrolhowlongatthemost,thecommandexecutionwillwaitwhenoneofthefollowingparametersisused:wait_for_status,wait_for_nodes,wait_for_relocating_shards,andwait_for_active_shards.Bydefault,itissetto30sandmeansthatthehealthcommandwillwait30secondsmaximumandreturntheresponsebythen.

Thewait_for_statusparameterallowsustotellElasticsearchwhichhealthstatustheclustershouldbeattoreturnthecommand.Itcantakethevaluesofgreen,yellow,andred.Forexample,whensettogreen,thehealthAPIcallwillreturntheresultsuntilthegreenstatusortimeoutisreached.

Thewait_for_nodesparameterallowsustosettherequirednumberofnodesavailabletoreturnthehealthcommandresponse(oruntiladefinedtimeoutisreached).Itcanbesettoanintegernumberlike3ortoasimpleequationlike>=3(means,greaterthanorequaltothreenodes)or<=3(meanslessthanorequaltothreenodes).

Thewait_for_active_shardsparametermeansthatElasticsearchwillwaitforaspecifiednumberofactiveshardstobepresentbeforereturningtheresponse.

Thelastparameteristhewait_for_relocating_shard,whichisbydefaultnotspecified.ItallowsustotellElasticsearchhowmanyrelocatingshardsitshouldwaitfor(oruntilthetimeoutisreached).Settingthisparameterto0meansthatElasticsearchshouldwaitforalltherelocatingshards.

Anexampleusageofthehealthcommandwithsomeofthementionedparametersisasfollows:

curl-XGET'localhost:9200/_cluster/health?

wait_for_status=green&wait_for_nodes=>=3&timeout=100s'

www.EBooksWorld.ir

Page 597: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndicesstatsAPIElasticsearchindexistheplacewhereourdatalivesanditisacrucialpartformostdeployments.WiththeuseoftheindicesstatsAPIavailableusingthe_statsendpoint,wecangetalotofinformationabouttheindiceslivinginsideourcluster.Ofcourse,aswithmostoftheAPI’sinElasticsearch,wecansendacommandtogettheinformationaboutalltheindices(usingthepure_statsendpoint),aboutoneparticularindex(forexamplelibrary/_stats)orseveralindicesatthesametime(forexamplelibrary,map/_stats).Forexample,tocheckthestatisticsforthemapandlibraryindiceswe’veusedinthebook,wecouldrunthefollowingcommand:

curl-XGET'localhost:9200/library,map/_stats?pretty'

Theresponsetotheprecedingcommandhasmorethan700lines,soweonlydescribeitsstructureomittingtheresponseitself.Apartfromtheinformationabouttheresponsestatusandtheresponsetime,wecanseethreeobjectsnamedprimaries,total(in_allobject),andindices.Theindicesobjectcontainsinformationaboutthelibraryandmapindices.Theprimariesobjectcontainsinformationabouttheprimaryshardsallocatedtothecurrentnode,andthetotalobjectcontainsinformationaboutalltheshardsincludingreplicas.Alltheseobjectscancontainobjectsdescribingaparticularstatisticsuchasthefollowing:docs,store,indexing,get,search,merges,refresh,flush,warmer,query_cache,fielddata,percolate,completion,segments,translog,suggest,request_cache,andrecovery.

WecanlimittheamountofinformationthatwegetfromtheindicesstatsAPIbyprovidingthetypeofdataweareinterestedinusingthenamesofthestatisticsmentionedpreviously.Forexample,ifwewanttogetinformationaboutindexingandsearching,wecanrunthefollowingcommand:

curl-XGET'localhost:9200/library,map/_stats/indexing,search?pretty'

Let’sdiscusstheinformationstoredinthoseobjects.

DocsThedocssectionoftheresponseshowsinformationaboutindexeddocuments.Forexample,itcouldlookasfollows:

"docs":{

"count":4,

"deleted":0

}

Themaininformationisthecount,indicatingthenumberofdocumentsinthedescribedindex.Whenwedeletedocumentsfromtheindex,Elasticsearchdoesn’tremovethesedocumentsimmediatelyandonlymarksthemasdeleted.Documentsarephysicallydeletedduringthesegmentmergeprocess.Thenumberofdocumentsmarkedasdeletedispresentedbythedeletedattributeandshouldbe0rightafterthemerge.

Store

www.EBooksWorld.ir

Page 598: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thenextstatistic,thestoreone,providesinformationregardingstorage.Forexample,suchasectioncouldlookasfollows:

"store":{

"size_in_bytes":6003,

"throttle_time_in_millis":0

}

Themaininformationisabouttheindex(orindices)size.Wecanalsolookatthrottlingstatistics.ThisinformationisusefulwhenthesystemhasproblemswiththeI/Operformanceandhasconfiguredlimitsonaninternaloperationduringsegmentmerging.

Indexing,get,andsearchTheindexing,get,andsearchsectionsoftheresponseprovideinformationaboutdatamanipulationindexingwithdeleteoperations,usingreal-timegetandsearching.Let’slookatthefollowingexamplereturnedbyElasticsearch:

"indexing":{

"index_total":0,

"index_time_in_millis":0,

"index_current":0,

"delete_total":0,

"delete_time_in_millis":0,

"delete_current":0,

"noop_update_total":0,

"is_throttled":false,

"throttle_time_in_millis":0

},

"get":{

"total":0,

"time_in_millis":0,

"exists_total":0,

"exists_time_in_millis":0,

"missing_total":0,

"missing_time_in_millis":0,

"current":0

},

"search":{

"open_contexts":0,

"query_total":0,

"query_time_in_millis":0,

"query_current":0,

"fetch_total":0,

"fetch_time_in_millis":0,

"fetch_current":0,

"scroll_total":0,

"scroll_time_in_millis":0,

"scroll_current":0

}

Asyoucansee,allofthesestatisticshavesimilarstructures.Wecanreadthetotaltimespentinvariousrequesttypes(inmilliseconds),thenumberofrequests(whichwiththetotaltimeallowsustocalculatetheaveragetimeofasinglequery).Inthecaseofgetrequests,valuableinformationishowmanyfetcheswereunsuccessful(missing

www.EBooksWorld.ir

Page 599: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

documents);anindexingrequesthasinformationaboutthrottling,andsearchincludesinformationregardingscrolling.

AdditionalinformationInadditiontothepreviouslydescribedsection,Elasticsearchprovidesthefollowinginformation:

merges:ThissectioncontainsinformationaboutLucenesegmentmergesrefresh:Thissectioncontainsinformationabouttherefreshoperationflush:Thissectioncontainsinformationaboutflusheswarmer:Thissectioncontainsinformationaboutwarmersandforhowlongtheywereexecutedquery_cache:Thisquerycachesstatisticsfielddata:Thisfielddatacachesstatisticspercolate:Thissectioncontainsinformationaboutthepercolatorusagecompletion:Thissectioncontainsinformationaboutthecompletionsuggestersegments:ThissectioncontainsinformationaboutLucenesegmentstranslog:Thissectioncontainsinformationaboutthetransactionlogscountandsizesuggest:Thissectioncontainssuggesters-relatedstatisticsrequest_cache:Thiscontainsshardrequestcachesstatisticsrecovery:Thiscontainsshardsrecoveryinformation

www.EBooksWorld.ir

Page 600: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NodesinfoAPIThenodesinfoAPIprovidesuswithinformationaboutthenodesinthecluster.TogetinformationfromthisAPI,weneedtosendtherequesttothe_nodesRESTendpoints.ThesimplestcommandtoretrievenodesrelatedinformationfromElasticsearchwouldbeasfollows:

curl-XGET'localhost:9200/_nodes?pretty'

ThisAPIcanbeusedtofetchinformationaboutparticularnodesorasinglenodeusingthefollowing:

Nodename:IfwewouldliketogetinformationaboutthenodenamedPulse,wecouldrunacommandtothefollowingRESTendpoint:_nodes/PulseNodeidentifier:Ifwewouldliketogetinformationaboutthenodewithanidentifierequaltony4hftjNQtuKMyEvpUdQWg,wecouldrunacommandtothefollowingRESTendpoint:_nodes/ny4hftjNQtuKMyEvpUdQWgIPaddress:WecanuseIPaddressestogetinformationaboutthenodes.Forexample,ifwewouldliketogetinformationaboutthenodewithanIPaddressequalto192.168.1.103,wecouldrunacommandtothefollowingRESTendpoint:_nodes/192.168.1.103

ParametersfromtheElasticsearchconfiguration:Ifwewouldliketogetinformationaboutallthenodeswiththenode.rackpropertysetto2,wecouldrunacommandtothefollowingRESTendpoint:/_nodes/rack:2

ThisAPIalsoallowsustogetinformationaboutseveralnodesatonceusingthese:

Patterns,forexample:_nodes/192.168.1.*or_nodes/P*Nodesenumeration,forexample:_nodes/Pulse,SlabBothpatternsandenumerations,forexample:/_nodes/P*,S*

ReturnedinformationBydefault,thenodesAPIwillreturnextensiveinformationabouteachnodealongwiththename,identifier,andaddresses.Thisextensiveinformationincludesthefollowing:

settings:TheElasticsearchconfigurationos:Informationabouttheserversuchasprocessor,RAM,andswapspaceprocess:Processidentifierandrefreshintervaljvm:InformationaboutJavaVirtualMachinesuchasmemorylimits,memorypools,andgarbagecollectorsthread_pool:Theconfigurationofthreadpoolsforvariousoperationstransport:Listeningaddressesforthetransportprotocolhttp:InformationaboutlisteningaddressesforanHTTP-basedAPIplugins:Informationaboutthepluginsinstalledbytheusermodules:Informationaboutthebuilt-inplugins

AnexampleusageofthisAPIcanbeillustratedbythefollowingcommand:

curl'localhost:9200/_nodes/Pulse/os,jvm,plugins'

www.EBooksWorld.ir

Page 601: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheprecedingcommandwillreturnthebasicinformationaboutthenodenamedPulseand,inadditiontothis,itwillincludetheoperatingsysteminformation,javavirtualmachineinformation,andplugins-relatedinformation.

www.EBooksWorld.ir

Page 602: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

NodesstatsAPIThenodesstatsAPIissimilartothenodesinfoAPIdescribedintheprecedingsection.ThemaindifferenceisthatthepreviousAPIprovidedinformationabouttheenvironmentinwhichthenodeisrunning,whiletheonewearecurrentlydiscussingtellsusaboutwhathappenedwiththeclusterduringitswork.TousethenodesstatsAPI,youneedtosendacommandtothe/_nodes/statsRESTendpoint.However,similartothenodesinfoAPI,wecanalsoretrieveinformationaboutspecificnodes(forexample:_nodes/Pulse/stats).

ThesimplestcommandtoretrievenodesrelatedinformationfromElasticsearchwouldbeasfollows:

curl-XGET'localhost:9200/_nodes/stats?pretty'

Bydefault,Elasticsearchreturnsalltheavailablestatisticsbutwecanlimittheonesweareinterestedin.Theavailableoptionsareasfollows:

indices:Informationabouttheindicesincludingsize,documentcount,indexingrelatedstatistics,searchandgettime,caches,segmentmerges,andsoonos:Operatingsystemrelatedinformationsuchasfreediskspace,memory,swapusage,andsoonprocess:Memory,CPU,andfilehandlerusagerelatedtotheElasticsearchprocessjvm:Javavirtualmachinememoryandgarbagecollectorstatisticstransport:Informationaboutdatasentandreceivedbythetransportmodulehttp:Informationabouthttpconnectionsfs:InformationaboutavailablediskspaceandI/Ooperationsstatisticsthread_pool:Informationaboutthestateofthethreadsassignedtovariousoperationsbreakers:Informationaboutcircuitbreakersscript:Scriptingenginerelatedinformation

AnexampleusageofthisAPIcanbeillustratedbythefollowingcommand:

curl'localhost:9200/_nodes/Pulse/stats/os,jvm,breaker'

www.EBooksWorld.ir

Page 603: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ClusterstateAPIAnotherAPIprovidedbyElasticsearchistheclusterstateAPI.Asitsnamesuggests,itallowsustogetinformationabouttheentirecluster(wecanalsolimitthereturnedinformationtoalocalnodebyaddingthelocal=trueparametertotherequest).ThebasiccommandusedtogetalltheinformationreturnedbythisAPIlooksasfollows:

curl-XGET'localhost:9200/_cluster/state?pretty'

Wecanalsolimittheprovidedinformationtothegivenmetricsincomma–separatedform,specifiedafterthe_cluster/statepartoftheRESTcall.Forexample:

curl-XGET'localhost:9200/_cluster/state/version,nodes?pretty'

Wecanalsolimittheinformationtothegivenmetricsandindices.Forexample,ifwewouldliketogetthemetadataforthelibraryindex,wecouldrunthefollowingcommand:

curl-XGET'localhost:9200/_cluster/state/metadata/library?pretty'

Thefollowingmetricsareallowedtobeused:

version:Thisreturnsinformationabouttheclusterstateversion.master_node:Thisreturnsinformationabouttheelectedmasternode.nodes:Thisreturnsnodesinformation.routing_table:Thisreturnsroutingrelatedinformation.metadata:Thisreturnsmetadatarelatedinformation.Whenspecifyingretrievingthemetadatametricwecanalsoincludeanadditionalparametersuchasindex_templates=true,whichwillresultinincludingthedefinedindextemplates.blocks:Thisreturnstheblockspartoftheresponse.

www.EBooksWorld.ir

Page 604: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ClusterstatsAPITheclusterstatsAPIallowsustogetstatisticsabouttheindicesandnodesfromtheclusterwideperspective.TousethisAPI,weneedtoruntheGETrequesttothe/_cluster/statsRESTendpoint,forexample:

curl-XGET'localhost:9200/_cluster/stats?pretty'

Theresponsesizedependsonthenumberofshards,indices,andnodesinthecluster.Itwillincludebasicindicesinformationsuchasshards,theirstate,recoveryinformation,cachesinformation,andnoderelatedinformation.

www.EBooksWorld.ir

Page 605: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PendingtasksAPIOneoftheAPI’sthathelpsusinseeingwhatElasticsearchisdoing;itallowsustocheckwhichtasksarewaitingtobeexecuted.Toretrievethisinformation,weneedtosendarequesttothe/_cluster/pending_tasksRESTendpoint.Inthisresponse,wewillseeanarrayoftaskswithinformationaboutthem,suchastaskpriorityandtimeinqueue.

www.EBooksWorld.ir

Page 606: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndicesrecoveryAPITherecoveryAPIgivesusinsightabouttherecoverystatusoftheshardsthatarebuildingindicesinourcluster(learnmoreaboutrecoveryinThegatewayandrecoverymodulessectionofChapter9,ElasticsearchClusterinDetail).

Thesimplestcommandthatwouldreturntheinformationabouttherecoveryofalltheshardsintheclusterwouldlookasfollows:

curl-XGET'http://localhost:9200/_recovery?pretty'

Wecanalsogetinformationaboutrecoveryforparticularindices,suchasthelibraryindexforexample:

curl-XGET'http://localhost:9200/library/_recovery?pretty'

TheresponsereturnedbyElasticsearchisdividedbyindicesandshards.Aresponseforasingleshardcouldlookasfollows:

{

"id":2,

"type":"STORE",

"stage":"DONE",

"primary":true,

"start_time_in_millis":1446132761730,

"stop_time_in_millis":1446132761734,

"total_time_in_millis":4,

"source":{

"id":"DboTibRlT1KJSQYnDPxwZQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"Plague"

},

"target":{

"id":"DboTibRlT1KJSQYnDPxwZQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"Plague"

},

"index":{

"size":{

"total_in_bytes":156,

"reused_in_bytes":156,

"recovered_in_bytes":0,

"percent":"100.0%"

},

"files":{

"total":1,

"reused":1,

"recovered":0,

"percent":"100.0%"

},

"total_time_in_millis":0,

www.EBooksWorld.ir

Page 607: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

"source_throttle_time_in_millis":0,

"target_throttle_time_in_millis":0

},

"translog":{

"recovered":0,

"total":-1,

"percent":"-1.0%",

"total_on_start":-1,

"total_time_in_millis":3

},

"verify_index":{

"check_index_time_in_millis":0,

"total_time_in_millis":0

}

}

Intheprecedingresponse,wecanseeinformationabouttheshardidentifier,thestageofrecovery,informationwhethertheshardisaprimaryorareplica,thetimestampsofthestartandendofrecovery,andthetotaltimetherecoveryprocesstook.Wecanseethesourcenode,targetnode,andinformationabouttheshard’sphysicalstatistics,suchassize,numberoffiles,transactionlog-relatedstatistics,andindexverificationtime.

Itisworthknowingtheinformationaboutthestagesofrecoveryandtypes.Whenitcomestothetypesofrecovery(thetypeattributeintheresponse),wecanexpectthefollowing:theSTORE,SNAPSHOT,REPLICA,andRELOCATINGvalues.Whenitcomestothestageofrecovery(thestageattributeintheresponse),wecanexpectvaluessuchasINIT(recoveryhasnotstarted),INDEX(Elasticsearchcopiesmetadatainformationanddatafromsourcetodestination),START(Elasticsearchisopeningtheshardforuse),FINALIZE(finalstage,whichcleansupgarbage),andDONE(recoveryhasended).

WecanlimittheresponsereturnedbytheindicesrecoveryAPItoonlytheshardsthatarecurrentlyinactiverecoverybyincludingtheactive_only=trueparameterintherequest.Finally,wecanrequestmoredetailedinformationbyaddingthedetailed=trueparameterintheAPIcall.

www.EBooksWorld.ir

Page 608: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndicesshardstoresAPITheindicesshardstoresAPIgivesusinformationaboutthestorefortheshardsofourindices.WeusethisAPIbyrunningasimplecommandtothe/_shard_storesRESTendpointandprovidingornotprovidingthecomma-separatedindicesnames.

Forexample,togetinformationaboutalltheindices,wewouldrunthefollowingcommand:

curl-XGET'http://localhost:9200/_shard_stores?pretty'

Wecanalsogetinformationaboutparticularindices,suchasthelibraryandmapones:

curl-XGET'http://localhost:9200/library,map/_shard_stores?pretty'

TheresponsereturnedbyElasticsearchcontainsinformationaboutthestoreforeachshard.Forexample,thisiswhatElasticsearchreturnedforoneoftheshardsofthelibraryindex:

"0":{

"stores":[{

"DboTibRlT1KJSQYnDPxwZQ":{

"name":"Plague",

"transport_address":"127.0.0.1:9300",

"attributes":{}

},

"version":6,

"allocation":"primary"

}]

}

Wecanseeinformationaboutthenodeinthestoresarrays.Eachentrycontainsnoderelatedinformation(thenodewheretheshardisphysicallylocated),theversionofthestorecopy,andtheallocation,whichcantakethevaluesofprimary(forprimaryshards),replica(forreplicas),andunused(forunassignedshards).

www.EBooksWorld.ir

Page 609: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndicessegmentsAPIThelastAPIwewanttomentionistheLucenesegmentsAPIthatcanbeavailedbyusingthe/_segmentsendpoint.Wecaneitherrunitfortheentirecluster,forexamplelikethis:

curl-XGET'localhost:9200/_segments?pretty'

Wecanalsorunthecommandforindividualindices.Forexample,ifwewouldliketogetsegmentsrelatedinformationforthemapandlibraryindices,wewouldusethefollowingcommand:

curl-XGET'localhost:9200/library,map/_segments?pretty'

ThisAPIprovidesinformationaboutshards,theirplacements,andinformationaboutsegmentsconnectedwiththephysicalindexmanagedbytheApacheLucenelibrary.

www.EBooksWorld.ir

Page 610: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 611: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ControllingtheshardandreplicaallocationTheindicesthatliveinsideyourElasticsearchclustercanbebuiltfrommanyshardsandeachshardcanhavemanyreplicas.Theabilitytodivideasingleindexintomultipleshardsgivesusthepossibilityofdividingthedataintomultiplephysicalinstances.Thereasonswhywewanttodothismaybedifferent.Wemaywanttoparallelizeindexingtogetmorethroughput,orwemaywanttohavesmallershardssothatourqueriesarefaster.Ofcourse,wemayhavetoomanydocumentstofitthemonasinglemachineandwemaywantashardbecauseofthis.Withreplicas,wecanparallelizethequeryloadbyhavingmultiplephysicalcopiesofeachshard.Wecansaythat,usingshardsandreplicas,wecanscaleoutElasticsearch.However,Elasticsearchhastofigureoutwhereintheclusteritshouldplaceshardsandreplicas.Itneedstofigureoutonwhichserver/nodeseachshardorreplicashouldbeplaced.

www.EBooksWorld.ir

Page 612: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ExplicitlycontrollingallocationOneofthemostcommonusecasesthatuseexplicitcontrollingofshardsandreplicasallocationinElasticsearchistime-baseddata,thatis,logs.Eachlogeventhasatimestampassociatedwithit;however,theamountoflogsinmostorganizationsisjustenormous.Thethingisthatyouneedalotofprocessingpowertoindexthem,butyoudon’tusuallysearchhistoricaldata.Ofcourse,youmaywanttodothat,butitwillbedonelessfrequentlythanthequeriesforthemostrecentdata.

Becauseofthis,wecandividetheclusterintosocalledtwotiers—thecoldandthehottier.Thehottiercontainsmorepowerfulnodes,onesthathaveveryfastdisks,lotsofCPUprocessingpower,andmemory.Thesenodeswillhandlebothalotofindexingaswellasqueriesforrecentdata.Thecoldtier,ontheotherhand,willcontainnodesthathaveverylargedisks,butarenotveryfast.Wewon’tbeindexingintothecoldtier;wewillonlystoreourhistoricalindiceshereandsearchthemfromtimetotime.WiththedefaultElasticsearchbehavior,wecan’tbesurewheretheshardsandreplicaswillbeplaced,butluckilyElasticsearchallowsustocontrolthis.

NoteThemainassumptionwhenitcomestotimeseriesdataisthatoncetheyareindexed,theyarenotbeingupdated.ThisistrueforlogindexingusecasesandweassumewecreateElasticsearchdeploymentforsuchausecase.

Theideaistocreatetheindicesthatindextoday’sdataonthehotnodesand,whenwestopusingit(whenanotherdaystarts),weupdatetheindexsettingssothatitismovedtothetiercalledcold.Let’snowseehowwecandothis.

SpecifyingnodeparametersSolet’sdivideourclusterintotwotiers.Wesaytiers,buttheycanbeanynameyouwant,wejustliketheterm“tier”anditiscommonlyused.Weassumethatwehavesixnodes.Wewantourmorepowerfulnodesnumbered1and2tobeplacedinthetiercalledhotandthenodesnumbered3,4,5,and6,whicharesmallerintermsofCPUandmemory,butverylargeintermsofdiskspace,tobeplacedinatiercalledcold.

ConfigurationToconfigure,weaddthefollowingpropertytotheelasticsearch.ymlconfigurationfileonnodes1and2(theonesthataremorepowerful):

node.tier:hot

Ofcourse,wewilladdasimilarpropertytotheelasticsearch.ymlconfigurationfileonnodes3,4,5,and6(thelesspowerfulones):

node.tier:cold

IndexcreationNowlet’screateourdailyindexfortoday’sdata,onecalledlogs_2015-12-10.Aswesaid

www.EBooksWorld.ir

Page 613: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

earlier,wewantthistobeplacedonthenodesinthehottier.Wedothisbyrunningthefollowingcommands:

curl-XPUT'http://localhost:9200/logs_2015-12-10'-d'{

"settings":{

"index":{

"routing.allocation.include.tier":"hot"

}

}

}'

Theprecedingcommandwillresultinthecreationofthelogs_2015-12-10indexandspecificationoftheindex.routing.allocation.include.tierpropertytoit.Wesetthispropertytothehotvalue,whichmeansthatwewanttoplacethelogs_2015-12-10indexonthenodesthathavethenode.tierpropertysettohot.

Now,whenthedayendsandweneedtocreateanewindex,weagainputitonthehotnodes.Wedothisbyrunningthefollowingcommand:

curl-XPUT'http://localhost:9200/logs_2015-12-11'-d'{

"settings":{

"index":{

"routing.allocation.include.tier":"hot"

}

}

}'

Finally,weneedtotellElasticsearchtomovetheindexholdingthedataforthepreviousdaytothecoldtier.Wedothisbyupdatingtheindexsettingsandsettingtheindex.routing.allocation.include.tierpropertytocold.Thisisdoneusingthefollowingcommand:

curl-XPUT'http://localhost:9200/logs_2015-12-10/_settings'-d'{

"index.routing.allocation.include.tier":"cold"

}'

Afterrunningtheprecedingcommand,Elasticsearchwillstartrelocatingtheindexcalledlogs_2015-12-10tothenodesthathavethenode.tierpropertysettocoldintheelasticsearch.ymlfilewithoutanymanualworkneededfromus.

ExcludingnodesfromallocationInthesamemanneraswespecifiedonwhichnodestheindexshouldbeplaced,wecanalsoexcludenodesfromindexallocation.Referringtothepreviouslyshownexample.ifwewanttheindexcalledlogs_2015-12-10tonotbeplacedonthenodeswiththenode.tierpropertysettocold,wewouldrunthefollowingcommand:

curl-XPUT'localhost:9200/logs_2015-12-10/_settings'-d'{

"index.routing.allocation.exclude.tier":"cold"

}'

Noticethatinsteadoftheindex.routing.allocation.include.tierproperty,we’veusedtheindex.routing.allocation.exclude.tierproperty.

www.EBooksWorld.ir

Page 614: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RequiringnodeattributesInadditiontoinclusionandexclusionrules,wecanalsospecifytherulesthatmustmatchinorderforashardtobeallocatedtoagivennode.Thedifferenceisthatwhenusingtheindex.routing.allocation.includeproperty,theindexwillbeplacedonanynodethatmatchesatleastoneoftheprovidedpropertyvalues.Usingindex.routing.allocation.require,Elasticsearchwillplacetheindexonanodethathasallthedefinedvalues.Forexample,let’sassumethatwe’vesetthefollowingsettingsforthelogs_2015-12-10index:

curl-XPUT'localhost:9200/logs_2015-12-10/_settings'-d'{

"index.routing.allocation.require.tier":"hot",

"index.routing.allocation.require.disk_type":"ssd"

}'

Afterrunningtheprecedingcommand,Elasticsearchwouldonlyplacetheshardsofthelogs_2015-12-10indexonanodewiththenode.tierpropertysettohotandthenode.disk_typepropertysettossd.

UsingtheIPaddressforshardallocationInsteadofaddingaspecialparametertothenodesconfiguration,weareallowedtouseIPaddressestospecifywhichnodeswewanttoincludeorexcludefromtheshardsandreplicasallocation.Inordertodothis,insteadofusingthetierpartoftheindex.routing.allocation.include.tierorindex.routing.allocation.exclude.tierproperties,weshouldusethe_ip.Forexample,ifwewouldlikeourlogs_2015-12-10indextobeplacedonlyonthenodeswiththe10.1.2.10and10.1.2.11IPaddresses,wewouldrunthefollowingcommand:

curl-XPUT'localhost:9200/logs_2015-12-10/_settings'-d'{

"index.routing.allocation.include._ip":"10.1.2.10,10.1.2.11"

}'

NoteInadditionto_ip,Elasticsearchalsoallowsustouse_nametospecifyallocationrulesusingnodenamesand_hosttospecifyallocationrulesusinghostnames.

Disk-basedshardallocationInadditiontothealreadydescribedallocationfilteringmethods,Elasticsearchgivesusdisk-basedshardallocationrules.Itallowsustosetallocationrulesbasedonthenodes’diskusage.

Configuringdiskbasedshardallocation

Therearefourpropertiesthatcontrolthebehaviorofadisk-basedshardallocation.Allofthemcanbeupdateddynamicallyorsetintheelasticsearch.ymlconfigurationfile.

Thefirstoftheseiscluster.info.update.interval,whichisbydefaultsetto30secondsanddefineshowoftenElasticsearchupdatesinformationaboutdiskusageonnodes.

www.EBooksWorld.ir

Page 615: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thesecondpropertyisthecluster.routing.allocation.disk.watermark.low,whichisbydefaultsetto0.85.ThismeansthatElasticsearchwillnotallocatenewshardstoanodethatusesmorethan85%ofitsdiskspace.

Thethirdpropertyisthecluster.routing.allocation.disk.watermark.high,whichcontrolswhenElasticsearchwillstartrelocatingshardsfromagivennode.Itdefaultsto0.90andmeansthatElasticsearchwillstartreallocatingshardswhenthediskusageonagivennodeisequaltoormorethan90%.

Boththecluster.routing.allocation.disk.watermark.lowandcluster.routing.allocation.disk.watermark.highpropertiescanbesettoapercentagevalue(suchas0.60,meaning60%)andtoanabsolutevalue(suchas600mb,meaning600megabytes).

Finally,thelastpropertyiscluster.routing.allocation.disk.include_relocations,whichbydefaultissettotrue.IttellsElasticsearchtotakeintoaccounttheshardsthatarenotyetcopiedtothenodebutElasticsearchisintheprocessofdoingthat.Havingthisbehaviorturnedonbydefaultmeansthatthedisk-basedallocationmechanismwillbemorepessimisticwhenitcomestoavailablediskspaces(whenshardsarerelocating),butwewon’trunintosituationswhereshardscan’tberelocatedbecausetheassumptionsaboutdiskspacewerewrong.

Disablingdiskbasedshardallocation

Thediskbasedshardallocationisenabledbydefault.Wecandisableitbyspecifyingthecluster.routing.allocation.disk.threshold_enabledpropertyandsettingittofalse.Wecandothisintheelasticsearch.ymlfileordynamicallyusingtheclustersettingsAPI:

curl-XPUTlocalhost:9200/_cluster/settings-d'{

"transient":{

"cluster.routing.allocation.disk.threshold_enabled":false

}

}'

www.EBooksWorld.ir

Page 616: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThenumberofshardsandreplicaspernodeInadditiontospecifyingshardsandreplicasallocation,wearealsoallowedtospecifythemaximumnumberofshardsthatcanbeplacedonasinglenodeforasingleindex.Forexample,ifwewouldlikeourlogs_2015-12-10indextohaveonlyasingleshardpernode,wewouldrunthefollowingcommand:

curl-XPUT'localhost:9200/logs_2015-12-10/_settings'-d'{

"index.routing.allocation.total_shards_per_node":1

}'

Thispropertycanbeplacedintheelasticsearch.ymlfileorcanbeupdatedonliveindicesusingtheprecedingcommand.PleaserememberthatyourclustercanstayintheredstateifElasticsearchwon’tbeabletoallocatealltheprimaryshards.

www.EBooksWorld.ir

Page 617: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AllocationthrottlingTheElasticsearchallocationmechanismcanbethrottled,whichmeansthatwecancontrolhowmuchresourcesElasticsearchwilluseduringtheshardallocationandrecoveryprocess.Wearegivenfivepropertiestocontrol,whichareasfollows:

cluster.routing.allocation.node_concurrent_recoveries:Thispropertydefineshowmanyconcurrentshardrecoveriesmaybehappeningatthesametimeonanode.Thisdefaultsto2andshouldbeincreasedifyouwouldlikemoreshardstoberecoveredatthesametimeonasinglenode.However,increasingthisvaluewillresultinmoreresourceconsumptionduringrecovery.Also,pleaserememberthatduringthereplicarecoveryprocess,datawillbecopiedfromtheothernodesoverthenetwork,whichcanbeslow.cluster.routing.allocation.node_initial_primaries_recoveries:Thispropertydefaultsto4anddefineshowmanyprimaryshardsarerecoveredatthesametimeonagivennode.Becauseprimaryshardrecoveryusesdatafromlocaldisks,thisprocessshouldbeveryfast.cluster.routing.allocation.same_shard.host:ABooleanpropertythatdefaultstofalseandisapplicableonlywhenmultipleElasticsearchnodesarestartedonthesamemachine.Whensettotrue,thiswillforceElasticsearchtocheckwhetherphysicalcopiesofthesameshardarepresentonasinglephysicalmachine.Thedefaultfalsevaluemeansnocheckisdone.indices.recovery.concurrent_streams:Thisisthenumberofnetworkstreamsusedtocopydatafromothernodesthatcanbeusedconcurrentlyonasinglenode.Themorethestreams,thefasterthedatawillbecopied,butthiswillresultinmoreresourceconsumption.Thispropertydefaultsto3.indices.recovery.concurrent_small_file_streams:Thisissimilartotheindices.recovery.concurrent_streamsproperty,butdefineshowmanyconcurrentdatastreamsElasticsearchwillusetocopysmallfiles(onesthatareunder5mbinsize).Thispropertydefaultsto2.

Thisallowsustoperformachecktopreventtheallocationofmultipleinstancesofthesameshardonasinglehost,basedonhostnameandhostaddress.Thisdefaultstofalse,meaningthatnocheckisperformedbydefault.Thissettingonlyappliesifmultiplenodesarestartedonthesamemachine.

www.EBooksWorld.ir

Page 618: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Cluster-wideallocationInadditiontotheperindicesallocationsettings,Elasticsearchalsoallowsustocontrolshardandindicesallocationonacluster-widebasis—socalledshardallocationawareness.Thisisespeciallyusefulwhenwehavenodesindifferentphysicalracksandwewouldliketoplaceshardsandreplicasindifferentphysicalnodes.

Let’sstartwithasimpleexample.Weassumethatwehaveaclusterbuiltoffournodes.Eachnodeinadifferentphysicalrack.Thesimplegraphicthatillustratesthisisasfollows:

Asyoucansee,ourclusterisbuiltfromfournodes.EachnodewasboundtoaspecificIPaddressandeachnodewasgiventhetagpropertyandagroupproperty(addedtoelasticsearch.ymlasthenode.tagandnode.groupproperties).Thisclusterwillservethepurposeofshowinghowshardallocationfilteringworks.Thegroupandtagpropertiescanbegivenwhatevernamesyouwant,youjustneedtoprefixyourdesiredpropertynamewiththenodename,forexample,ifyouwouldliketouseapartypropertyname,youwouldjustaddnode.party:party1toyourelasticsearch.yml.

AllocationawarenessAllocationawarenessallowsustoconfigureshardsandtheirreplicasallocationwiththeuseofgenericparameters.Inordertoillustratehowallocationawarenessworks,wewilluseourexamplecluster.Fortheexampletowork,weshouldaddthefollowingpropertytotheelasticsearch.ymlfile:

www.EBooksWorld.ir

Page 619: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

cluster.routing.allocation.awareness.attributes:group

ThiswilltellElasticsearchtousethenode.grouppropertyastheawarenessparameter.

NoteYoucanspecifymultipleattributeswhensettingthecluster.routing.allocation.awareness.attributesproperty.Forexample:cluster.routing.allocation.awareness.attributes:group,node

Afterthis,let’sstartthefirsttwonodes,theoneswiththenode.groupparameterequaltogroupA,andlet’screateanindexbyrunningthefollowingcommand:

curl-XPOST'localhost:9200/awarness'-d'{

"settings":{

"index":{

"number_of_shards":1,"number_of_replicas":1

}

}

}'

Afterthiscommand,ourtwo-nodeclusterwilllookmoreorlesslikethis:

Asyoucansee,theindexwasdividedbetweenthetwonodesevenly.Nowlet’sseewhathappenswhenwelaunchtherestofthenodes(theoneswithnode.groupsettogroupB):

www.EBooksWorld.ir

Page 620: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Noticethedifference—theprimaryshardswerenotmovedfromtheiroriginalallocationnodes,butthereplicashardsweremovedtothenodeswithadifferentnode.groupvalue.That’sexactlyright;whenusingshardallocationawareness,Elasticsearchwon’tallocatetheprimaryshardsandreplicasofthesameindextothenodeswiththesamevalueofthepropertyusedtodeterminetheallocationawareness(whichinourcaseisthenode.group).

NotePleaserememberthatwhenusingallocationawareness,shardswillnotbeallocatedtothenodethatdoesn’thavetheexpectedattributesset.Soinourexample,anodewithoutthenode.grouppropertysetwillnotbetakenintoconsiderationbytheallocationmechanism.

ForcingallocationawarenessForcingallocationawarenesscancomeinhandywhenweknow,inadvance,howmanyvaluesourawarenessattributescantakeandwedon’twantmorereplicasthanneededtobeallocatedinourcluster,forexample,nottooverloadourclusterwithtoomanyreplicas.Forthis,wecanforcetheallocationawarenesstobeactiveonlyforcertainattributes.Wecanspecifythesevaluesusingthecluster.routing.allocation.awareness.force.zone.valuespropertyandprovidingalistofcomma-separatedvaluestoit.Forexample,ifwewouldliketheallocationawarenesstouseonlythegroupAandgroupBvaluesofthenode.groupproperty,wewouldaddthefollowingtotheelasticsearch.ymlfile:

www.EBooksWorld.ir

Page 621: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

cluster.routing.allocation.awareness.attributes:group

cluster.routing.allocation.awareness.force.zone.values:groupA,groupB

FilteringElasticsearchallowsustoconfigureallocationfortheentireclusterorfortheindexlevel.Inthecaseofclusterallocation,wecanusethepropertiesprefixes:

cluster.routing.allocation.include

cluster.routing.allocation.require

cluster.routing.allocation.exclude

Whenitcomestoindex-specificallocation,wecanusethefollowingpropertiesprefixes:

index.routing.allocation.include

index.routing.allocation.require

index.routing.allocation.exclude

Thepreviouslymentionedprefixescanbeusedwiththepropertiesthatwe’vedefinedintheelasticsearch.ymlfile(ourtagandgroupproperties)andwithaspecialpropertycalled_ipthatallowsustomatchorexcludetheuseofthenodes’IPaddresses,forexample,likethis:

cluster.routing.allocation.include._ip:192.168.2.1

IfwewouldliketoincludenodeswithagrouppropertymatchingthegroupAvalue,wewouldsetthefollowingproperty:

cluster.routing.allocation.include.group:groupA

Noticethatwe’veusedthecluster.routing.allocation.includeprefixandwe’veconcatenateditwiththenameoftheproperty,whichisgroupinourcase.

Whatdoinclude,exclude,andrequiremean

Ifyoulookcloselyattheprecedingparameters,youwillnoticethattherearethreekinds:

include:Thistypewillresultinincludingallthenodeswiththisparameterdefined.Ifmultipleincludeconditionsarevisiblethanallthenodesthatmatchatleastaoneoftheseconditionswillbetakenintoconsiderationwhenallocatingshards.Forexample,ifweaddtwocluster.routing.allocation.include.tagparameterstoourconfiguration,onewithapropertywiththevalueofnode1andsecondwiththenode2value,wewouldendupwithindices(actuallytheirshards)beingallocatedtothefirstandsecondnode(countingfromlefttoright).TosumupthenodesthathavetheincludeallocationparametertypewillbetakenintoconsiderationbyElasticsearchwhenchoosingthenodestoplaceshardson,butthisdoesn’tmeanthatElasticsearchwillputshardsinthem.require:Thisparameter,whichwasintroducedintheElasticsearch0.90typeofallocationfilter,requiresallthenodestohaveavaluethatmatchesthevalueofthisproperty.Forexample,ifweaddonecluster.routing.allocation.require.tagparametertoourconfigurationwiththevalueofnode1andacluster.routing.allocation.require.groupparameterwiththevalueofgroupA,

www.EBooksWorld.ir

Page 622: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

wewouldendupwithshardsallocatedonlytothefirstnode(theonewithanIPaddressof192.168.2.1).exclude:Thisparameterallowsustoexcludenodeswithgivenpropertiesfromtheallocationprocess.Forexample,ifwesetcluster.routing.allocation.include.tagtogroupA,wewouldendupwithindicesbeingallocatedonlytothenodeswithIPaddresses192.168.3.1and192.168.3.2(thethirdandfourthnodesinourexample).

NoteThepropertyvaluecanusesimplewildcardcharacters.Forexample,ifwewanttoincludeallthenodesthathavethegroupparametervaluebeginningwithgroup,wecouldsetthecluster.routing.allocation.include.grouppropertytogroup*.Intheexampleclustercase,thiswouldresultinmatchingnodeswiththegroupAandgroupBgroupparametervalues.

www.EBooksWorld.ir

Page 623: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ManuallymovingshardsandreplicasThelastthingwewantedtodiscussistheabilitytomanuallymoveshardsbetweennodes.Elasticsearchexposesthe_cluster/rerouteRESTend-point,whichallowsustocontrolthat.Thefollowingoperationsareavailable:

MovingashardfromnodetonodeCancellingshardallocationForcingshardallocation

Nowlet’slookcloselyatalloftheprecedingoperations.

MovingshardsLet’ssaywehavetwonodescalledes_node_oneandes_node_two,andwehavetwoshardsoftheshopindexplacedbyElasticsearchonthefirstnodeandwewouldliketomovethesecondshardtothesecondnode.Inordertodothis,wecanrunthefollowingcommand:

curl-XPOST'localhost:9200/_cluster/reroute'-d'{

"commands":[{

"move":{

"index":"shop",

"shard":1,

"from_node":"es_node_one",

"to_node":"es_node_two"

}

}]

}'

We’vespecifiedthemovecommand,whichallowsustomoveshards(andreplicas)oftheindexspecifiedbytheindexproperty.Theshardpropertyisthenumberofshardswewanttomove.And,finally,thefrom_nodepropertyspecifiesthenameofthenodewewanttomovetheshardfromandtheto_nodepropertyspecifiesthenameofthenodewewanttheshardtobeplacedon.

CancelingshardallocationIfwewouldliketocancelanon-goingallocationprocess,wecanrunthecancelcommandandspecifytheindex,node,andshardwewanttocanceltheallocationfor.Forexample:

curl-XPOST'localhost:9200/_cluster/reroute'-d'{

"commands":[{

"cancel":{

"index":"shop",

"shard":0,

"node":"es_node_one"

}

}]

}'

Theprecedingcommandwouldcanceltheallocationofshard0oftheshopindexonthe

www.EBooksWorld.ir

Page 624: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

es_node_onenode.

ForcingshardallocationInadditiontocancellingandmovingshardsandreplicas,wearealsoallowedtoallocateanunallocatedshardtoaspecificnode.Forexample,ifwehaveanunallocatedshardnumbered0fortheusersindexandwewouldlikeittobeallocatedtoes_node_twobyElasticsearch,wewouldrunthefollowingcommand:

curl-XPOST'localhost:9200/_cluster/reroute'-d'{

"commands":[{

"allocate":{

"index":"users",

"shard":0,

"node":"es_node_two"

}

}]

}'

MultiplecommandsperHTTPrequestWecan,ofcourse,includemultiplecommandsinasingleHTTPrequest.Forexample:

curl-XPOST'localhost:9200/_cluster/reroute'-d'{

"commands":[

{"move":{"index":"shop","shard":1,"from_node":"es_node_one",

"to_node":"es_node_two"}},

{"cancel":{"index":"shop","shard":0,"node":"es_node_one"}}

]

}'

AllowingoperationsonprimaryshardsThecancelandallocatecommandsacceptanadditionalallow_primaryparameter.Ifsettotrue,ittellsElasticsearchthattheoperationcanbeperformedontheprimaryshard.Pleasebeadvisedthatoperationswiththeallow_primaryparametersettotruemayresultindataloss.

www.EBooksWorld.ir

Page 625: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HandlingrollingrestartsThereisonemorethingthatwewouldliketodiscusswhenitcomestoshardandreplicaallocation—handlingrollingrestarts.WhenElasticsearchisrestarted,itmaytakesometimetogetitbacktothecluster.Duringthistime,therestoftheclustermaydecidetodorebalancingandmoveshardsaround.Whenweknowwearedoingrollingrestarts,forexample,toupdateElasticsearchtoanewversionorinstallaplugin,wemaywanttotellthistoElasticsearch.Theprocedureforrestartingeachnodeshouldbeasfollows:

First,beforeyoudoanymaintenance,youshouldstoptheallocationbysendingthefollowingcommand:

curl-XPUT'localhost:9200/_cluster/settings'-d'{

"transient":{

"cluster.routing.allocation.enable":"none"

}

}'

ThiswilltellElasticsearchtostopallocation.Afterthis,wewillstopthenodewewanttodomaintenanceonandstartitagain.Afteritjoinsthecluster,wecanenabletheallocationagainbyrunningthefollowing:

curl-XPUT'localhost:9200/_cluster/settings'-d'{

"transient":{

"cluster.routing.allocation.enable":"all"

}

}'

Thiswillenabletheallocationagain.Thisprocedureshouldberepeatedforeachnodewewanttoperformmaintenanceon.

www.EBooksWorld.ir

Page 626: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 627: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ControllingclusterrebalancingBydefault,Elasticsearchtriestokeeptheshardsandtheirreplicasevenlybalancedacrossthecluster.Suchbehaviorisgoodinmostcases,buttherearetimeswhenwewanttocontrolthisbehavior—forexample,duringrollingrestarts.Wedon’twanttorebalancetheentireclusterwhenoneortwonodesarerestarted.Inthissection,wewilllookathowtoavoidclusterrebalanceandcontrolthisprocess’behaviorindepth.

Imagineasituationwhereyouknowthatyournetworkcanhandleveryhighamountsoftrafficortheoppositeofthis—yournetworkisusedextensivelyandyouwanttoavoidtoomuchloadonit.TheotherexampleisthatyoumaywanttodecreasethepressurethatisputonyourI/Osubsystemafterafull-clusterrestartandyouwanttohavelessshardsandreplicasbeinginitializedatthesametime.Theseareonlytwoexampleswhererebalancecontrolmaybehandy.

www.EBooksWorld.ir

Page 628: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UnderstandingrebalanceRebalancingistheprocessofmovingshardsbetweendifferentnodesinourcluster.Aswehavealreadymentioned,itisfineinmostsituations,butsometimesyoumaywanttocompletelyavoidthis.Forexample,ifwedefinehowourshardsareplacedandwewanttokeepitthisway,wemaywanttoavoidrebalancing.However,bydefault,ElasticsearchwilltrytorebalancetheclusterwhenevertheclusterstatechangesandElasticsearchthinksarebalanceisneeded(andthedelayedtimeouthaspassedasdiscussedinThegatewayandrecoverymodulessectionofChapter9,ElasticsearchClusterinDetail).

www.EBooksWorld.ir

Page 629: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ClusterbeingreadyWealreadyknowthatourindicesarebuiltfromshardsandreplicas.Primaryshardsorjustshardsaretheonesthatgetthedatafirst.Thereplicasarephysicalcopiesoftheprimariesandgetthedatafromthem.Youcanthinkoftheclusterasbeingreadytobeusedwhenalltheprimaryshardsareassignedtotheirnodesinyourcluster–assoonastheyellowhealthstateisachieved.However,Elasticsearchmaystillinitializeothershards–thereplicas.However,youcanuseyourclusterandbesurethatyoucansearchyourentiredatasetandsendindexchangecommands.Thenthecommandswillbeprocessedproperly.

www.EBooksWorld.ir

Page 630: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheclusterrebalancesettingsElasticsearchletsuscontroltherebalanceprocesswiththeuseofafewpropertiesthatcanbesetintheelasticsearch.ymlfileorbyusingtheElasticsearchRESTAPI(asdescribedinTheupdatesettingsAPIsectionofChapter9,ElasticsearchClusterinDetail).

ControllingwhenrebalancingwillbeallowedThecluster.routing.allocation.allow_rebalancepropertyallowsustospecifywhenrebalancingisallowed.Thispropertycantakethefollowingvalues:

always:Rebalancingwillbeallowedassoonasit’sneededindices_primaries_active:Rebalancingwillbeallowedwhenalltheprimaryshardsareinitializedindices_all_active:Thedefaultone,whichmeansthatrebalancingwillbeallowedwhenalltheshardsandreplicasareinitialized

Thecluster.routing.allocation.allow_rebalancepropertycanbesetintheelasticsearch.ymlconfigurationfileandupdateddynamicallyaswell.

ControllingthenumberofshardsbeingmovedbetweennodesconcurrentlyThecluster.routing.allocation.cluster_concurrent_rebalancepropertyallowsustospecifyhowmanyshardscanbemovedbetweennodesatonceintheentirecluster.Ifyouhaveaclusterthatisbuiltfrommanynodes,youcanincreasethisvalue.Thisvaluedefaultsto2.Youcanincreasethedefaultvalueifyouwouldliketherebalancingtobeperformedfaster,butthiswillputmorepressureonyourclusterresourcesandwillaffectindexingandquerying.Thecluster.routing.allocation.cluster_concurrent_rebalancepropertycanbesetintheelasticsearch.ymlconfigurationfileandupdateddynamicallyaswell.

ControllingwhichshardsmayberebalancedThecluster.routing.allocation.enablepropertyallowsustospecifywhenwhichshardswillbeallowedtoberebalancedbyElasticsearch.Thispropertycantakethefollowingvalues:

all:Thedefaultbehavior,whichtellsElasticsearchtorebalancealltheshardsintheclusterprimaries:Thisvalueallowstherebalancingoftheprimaryshardsonlyreplicas:Thisvalueallowstherebalancingofthereplicashardsonlynone:Thisvaluedisablestherebalancingofalltypeofshardsforallindicesinthecluster

Thecluster.routing.allocation.enablepropertycanbesetintheelasticsearch.ymlconfigurationfileandupdateddynamicallyaswell.

www.EBooksWorld.ir

Page 631: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 632: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheCatAPITheElasticsearchAdminAPIisquiteextensiveandcoversalmosteverypartofElasticsearcharchitecture:fromlow-levelinformationaboutLucenetohigh-levelonesabouttheclusternodesandtheirhealth.AllthisinformationisavailableusingtheElasticsearchJavaAPIaswellastheRESTAPI.However,thereturneddata,eventhoughitisaJSONdocument,isnotveryreadablebyauser,atleastwhenitcomestotheamountofinformationgiven.

Becauseofthis,Elasticsearchprovidesuswithamorehuman-friendlyAPI–theCatAPI.ThespecialCatAPIreturnsdatainasimpletext,tabularformatandwhat’smore–itprovidesaggregateddatathatisusuallyusablewithoutanyfurtherprocessing.

www.EBooksWorld.ir

Page 633: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThebasicsThebaseendpointfortheCatAPIisquiteobvious:itis/_cat.Withoutanyparameters,itshowsalltheavailableendpointsforthisAPI.Wecancheckthisbyrunningthefollowingcommand:

curl-XGET'localhost:9200/_cat'

TheresponsereturnedbyElasticsearchshouldbesimilaroridentical(dependingonyourElasticsearchversion)tothefollowingone:

=^.^=

/_cat/allocation

/_cat/shards

/_cat/shards/{index}

/_cat/master

/_cat/nodes

/_cat/indices

/_cat/indices/{index}

/_cat/segments

/_cat/segments/{index}

/_cat/count

/_cat/count/{index}

/_cat/recovery

/_cat/recovery/{index}

/_cat/health

/_cat/pending_tasks

/_cat/aliases

/_cat/aliases/{alias}

/_cat/thread_pool

/_cat/plugins

/_cat/fielddata

/_cat/fielddata/{fields}

/_cat/nodeattrs

/_cat/repositories

/_cat/snapshots/{repository}

SolookingfromthetopElasticsearchallowsustogetthefollowinginformationusingtheCatAPI:

Shardallocation-relatedinformationAllshards-relatedinformation(alsoonelimitedtoagivenindex)InformationaboutthemasternodeNodesinformationIndicesstatistics(alsoonelimitedtoagivenindex)Segmentsstatistics(alsoonelimitedtoagivenindex)Documentscount(alsoonelimitedtoagivenindex)Recoveryinformation(alsoonelimitedtoagivenindex)ClusterhealthTaskspendingforexecutionIndexaliasesandindicesforagivenaliasThreadpoolconfiguration

www.EBooksWorld.ir

Page 634: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PluginsinstalledoneachnodeFielddatacachesizeandfielddatacachesizesforindividualfieldsNodeattributesinformationDefinedbackuprepositoriesSnapshotscreatedinthebackuprepository

www.EBooksWorld.ir

Page 635: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsingCatAPIUsingtheCatAPIisassimpleasrunningtheGETrequesttotheoneofthepreviouslymentionedRESTend-points.Forexample,togetinformationabouttheclusterstate,wecouldrunthefollowingcommand:

curl-XGET'localhost:9200/_cat/health'

TheresponsereturnedbyElasticsearchfortheprecedingcommandshouldbesimilartothefollowingone,but,ofcourse,willbedependentonyourcluster:

144629204112:47:21elasticsearchyellow11212100210-50.0%

Thisiscleanandnice.Becauseitisintabularformat,itisalsoeasytousetheresponseintoolssuchasgrep,awk,orsed–astandardsetoftoolsforeveryadministrator.Itisalsomorereadableonceyouknowwhatitisallabout.

Toaddaheaderdescribingeachcolumnpurpose,wejustneedtoaddanadditionalvparameter,justlikethis:

curl-XGET'localhost:9200/_cat/health?v'

CommonargumentsEveryCatAPIendpointhasitsownarguments,butthereareafewcommonoptionsthataresharedamongallofthem:

v:Thisaddsaheaderlinetotheresponsewiththenamesofpresenteditems.h:Thisallowsustoshowonlythechosencolumns,forexampleh=status,node.total,shards,pri.help:Thislistsallthepossiblecolumnsthatthisparticularendpointisabletoshow.Thecommandshowsthenameoftheparameter,itsabbreviation,anddescription.bytes:Thisistheformatfortheinformationrepresentingthevaluesinbytes.Aswesaidearlier,theCatAPIisdesignedtobeusedbyhumansandbecauseofthis,bydefault,thesevaluesarerepresentedinhuman-readableform,forexample:3.5kBor40GB.Thebytesoptionallowsthesettingofthesamebaseforallthenumbers,sosortingornumericalcomparisonwillbeeasier.Forexample,bytes=bpresentsallvaluesinbytes,bytes=kinkilobytes,andsoon.

NoteForthefulllistofargumentsforeachCatAPIendpoint,pleaserefertotheofficialElasticsearchdocumentationavailableat:https://www.elastic.co/guide/en/elasticsearch/reference/2.2/cat.html.

www.EBooksWorld.ir

Page 636: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TheexamplesWhenwewrotethisbook,theCatAPIhadtwenty-twoendpoints.Wedon’twanttodescribethemall–itwouldbearepeatofinformationcontainedinthedocumentationanditdoesn’tmakesense.However,wedidn’twanttoleavethissectionwithoutanexampleregardingtheusageoftheCatAPI.Becauseofthis,wedecidedtoshowhoweasilyyoucangetinformationusingtheCatAPIcomparedtothestandardJSONAPIexposedbyElasticsearch.

GettinginformationaboutthemasternodeThefirstexampleshowshoweasyitistogetinformationaboutwhichnodeinourclusteristhemasternode.Bycallingthe/_cat/masterRESTendpointwecangetinformationaboutthenodesandwhichoneofthemiscurrentlybeingelectedasamaster.Forexample,let’srunthefollowingcommand:

curl-XGET'localhost:9200/_cat/master?v'

TheresponsereturnedbyElasticsearchformylocaltwo-nodeclusterlooksasfollows:

idhostipnode

Cfj3tzqpSNi5SZx4g8osAg127.0.0.1127.0.0.1Skin

Asyoucanseeinresponse,we’vegottheinformationaboutwhichnodeiscurrentlyelectedasthemaster:wecanseeitsidentifier,IPaddress,andname.

GettinginformationaboutthenodesThe/_cat/nodesRESTendpointprovidesinformationaboutallthenodesinthecluster.Let’sseewhatElasticsearchwillreturnafterrunningthefollowingcommand:

curl-XGET'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'

Intheprecedingexample,wehaveusedthepossibilityofchoosingwhatinformationwewanttogetfromtheapproximatelyseventyoptionsofthisendpoint.Wehavechosentogetonlythenodename,itsrole—whetherthenodeisadataorclientnode-,nodeload,anditsuptime.

AndtheresponsereturnedbyElasticsearchlooksasfollows:

namenode.roleloaduptime

Skind2.001.3h

Asyoucansee,the/_cat/nodesRESTendpointprovidesalltherequestedinformationaboutthenodesinthecluster.

RetrievingrecoveryinformationforanindexAnotherniceexampleofusingtheCatAPIisgettinginformationabouttherecoveryofasingleindexoralltheindices.Inourcase,wewillretrieverecoveryinformationforasinglelibraryindexbyrunningthefollowingcommand:

curl-XGET'localhost:9200/_cat/recovery/library?

www.EBooksWorld.ir

Page 637: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

v&h=index,shard,time,type,stage,files_percent'

Theresponsefortheprecedingcommandlooksasfollows:

indexshardtimetypestagefiles_percent

library075storedone100.0%

library183storedone100.0%

library288storedone100.0%

library379storedone100.0%

library45storedone100.0%

www.EBooksWorld.ir

Page 638: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 639: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WarmingupSometimes,theremaybeaneedtoprepareElasticsearchtohandleyourqueries.Maybeit’sbecauseyouheavilyrelyonthefielddatacacheandyouwantittobeloadedbeforeyourproductionqueriesarrive,ormaybeyouwanttowarmupyouroperatingsystem’sI/Ocachesothatthedataindicesfilesarereadfromthecache.Whateverthereason,Elasticsearchallowsustousesocalledwarmingqueriesforourtypesandindices.

www.EBooksWorld.ir

Page 640: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DefininganewwarmingqueryAwarmingqueryisnothingmorethantheusualquerystoredinaspecialtypecalled_warmerinElasticsearch.Let’sassumethatwehavethefollowingquerythatwewanttouseforwarmingup:

curl-XGETlocalhost:9200/library/_search?pretty-d'{

"query":{

"match_all":{}

},

"aggs":{

"warming_aggs":{

"terms":{

"field":"tags"

}

}

}

}'

Tostoretheprecedingqueryasawarmingqueryforourlibraryindex,wewillrunthefollowingcommand:

curl-XPUT'localhost:9200/library/_warmer/tags_warming_query'-d'{

"query":{

"match_all":{}

},

"aggs":{

"warming_aggs":{

"terms":{

"field":"tags"

}

}

}

}'

Theprecedingcommandwillregisterourqueryasawarmingquerywiththetags_warming_queryname.Youcanhavemultiplewarmingqueriesforyourindex,buteachofthesequeriesneedstohaveauniquename.

Wecannotonlydefinewarmingqueriesfortheentireindex,butalsoforthespecifictypeinit.Forexample,tostoreourpreviouslyshownqueryasthewarmingqueryonlyforthebooktypeinthelibraryindex,runtheprecedingcommandnottothe/library/_warmerURIbutto/library/book/_warmer.So,theentirecommandwillbeasfollows:

curl-XPUT'localhost:9200/library/book/_warmer/tags_warming_query'-d'{

"query":{

"match_all":{}

},

"aggs":{

"warming_aggs":{

"terms":{

"field":"tags"

}

}

www.EBooksWorld.ir

Page 641: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

}

}'

Afteraddingawarmingquery,beforeElasticsearchallowsanewsegmenttobesearchedon,itwillbewarmedupbyrunningthedefinedwarmingqueriesonthatsegment.ThisallowsElasticsearchandtheoperatingsystemtocachedataand,thus,speedupsearching.

JustaswereadintheFulltextsearchingsectionofChapter1,GettingStartedwithElasticsearchCluster,Lucenedividestheindexintopartscalledsegments,whichoncewrittencan’tbechanged.Everynewcommitoperationcreatesanewsegment(whichiseventuallymergedifthenumberofsegmentsistoohigh),whichLuceneusesforsearching.

NotePleasenotethattheWarmerAPIwillberemovedinthefutureversionsofElasticsearch.

www.EBooksWorld.ir

Page 642: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RetrievingthedefinedwarmingqueriesInordertogetaspecificwarmingqueryforourindex,wejustneedtoknowitsname.Forexample,ifwewanttogetthewarmingquerynamedastags_warming_queryforourlibraryindex,wewillrunthefollowingcommand:

curl-XGET'localhost:9200/library/_warmer/tags_warming_query?pretty'

TheresultreturnedbyElasticsearchwillbeasfollows:

{

"library":{

"warmers":{

"tags_warming_query":{

"types":["book"],

"source":{

"query":{

"match_all":{}

},

"aggs":{

"warming_aggs":{

"terms":{

"field":"tags"

}

}

}

}

}

}

}

}

Wecanalsogetallthewarmingqueriesfortheindexandtypeusingthefollowingcommand:

curl-XGET'localhost:9200/library/_warmer?pretty'

Andfinally,wecanalsogetallthewarmingqueriesthatstartwithagivenprefix.Forexample,ifwewanttogetallthewarmingqueriesforthelibraryindexthatstartwiththetagsprefix,wewillrunthefollowingcommand:

curl-XGET'localhost:9200/library/_warmer/tags*?pretty'

www.EBooksWorld.ir

Page 643: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DeletingawarmingqueryDeletingawarmingqueryisverysimilartogettingone;wejustneedtousetheDELETEHTTPmethod.Todeleteaspecificwarmingqueryfromourindex,wejustneedtoknowitsname.Forexample,ifwewanttodeletethewarmingquerynamedtags_warming_queryforourlibraryindex,wewillrunthefollowingcommand:

curl-XDELETE'localhost:9200/library/_warmer/tags_warming_query'

Wecanalsodeleteallthewarmingqueriesfortheindexusingthefollowingcommand:

curl-XDELETE'localhost:9200/library/_warmer/_all'

Andfinally,wecanalsoremoveallthewarmingqueriesthatstartwithagivenprefix.Forexample,ifwewanttoremoveallthewarmingqueriesforthelibraryindexthatstartwiththetagsprefix,wewillrunthefollowingcommand:

curl-XDELETE'localhost:9200/library/_warmer/tags*'

www.EBooksWorld.ir

Page 644: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DisablingthewarmingupfunctionalityTodisablethewarmingqueriestotallybuttosavetheminthe_warmerindex,youshouldsettheindex.warmer.enabledconfigurationpropertytofalse(settingthispropertytotruewillresultinenablingthewarmingupfunctionality).Thissettingcanbeeitherputintheelasticsearch.ymlfileorjustsetusingtheRESTAPIonalivecluster.

Forexample,ifwewanttodisablethewarmingupfunctionalityforthelibraryindex,wewillrunthefollowingcommand:

curl-XPUT'localhost:9200/library/_settings'-d'{

"index.warmer.enabled":false

}'

www.EBooksWorld.ir

Page 645: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ChoosingqueriesforwarmingFinally,weshouldaskourselvesonequestion:whichqueriesshouldbeconsideredascandidatesforwarming.Typically,you’llwanttochooseonesthatareexpensivetoexecuteandonesthatrequirecachestobepopulated.Soyou’llprobablywanttochoosequeriesthatincludeaggregationsandsortingbasedonthefieldsinyourindex.Thiswillforcetheoperatingsystemtoloadthepartoftheindicesthatholdthedatarelatedtosuchqueriesandimprovetheperformanceofconsecutivequeriesthatarerun.Inadditiontothis,parent-childqueriesandnestedqueriesarealsopotentialcandidatesforwarming.Youmayalsochooseotherqueriesbylookingatthelogs,andfindingwhereyourperformanceisnotasgreatasyouwantittobe.Suchqueriesmayalsobeperfectcandidatesforwarmingup.

Forexample,let’ssaythatwehavethefollowingloggingconfigurationsetintheelasticsearch.ymlfile:

index.search.slowlog.threshold.query.warn:10s

index.search.slowlog.threshold.query.info:5s

index.search.slowlog.threshold.query.debug:2s

index.search.slowlog.threshold.query.trace:1s

Andwehavethefollowinglogginglevelsetinthelogging.ymlconfigurationfile:

logger:

index.search.slowlog:TRACE,index_search_slow_log_file

Noticethattheindex.search.slowlog.threshold.query.tracepropertyissetto1sandtheindex.search.slowloglogginglevelissettoTRACE.Thismeansthatwheneveraqueryisexecutedforlongerthanonesecond(onashard,notintotal),itwillbeloggedintotheslowlogfile(thenameofwhichisspecifiedbytheindex_search_slow_log_fileconfigurationsectionofthelogging.ymlconfigurationfile).Forexample,thefollowingcanbefoundinaslowlogfile:

[2015-11-2519:53:00,248][TRACE][index.search.slowlog.query]

took[340000.2ms],took_millis[3400],types[],stats[],

search_type[QUERY_THEN_FETCH],total_shards[5],source[{"query":

{"match_all":{}},"aggs":{"warming_aggs":{"terms":{"field":"tags"}}}}],

extra_source[],

Asyoucansee,intheprecedinglogline,wehavethequerytime,searchtype,andthequerysource,whichshowsustheexecutedquery.

Ofcourse,thevaluescanbedifferentinyourconfigurationbuttheslowlogcanbeavaluablesourceofthequeriesthathavebeenrunningtoolongandmayneedtohavesomewarmupdefined;maybetheseareparent-childqueriesandneedsomeidentifierstobefetchedtoperformbetter,ormaybeyouareusingafilterthatisexpensivewhenyouexecuteitforthefirsttime.

Thereisonethingyoushouldremember:don’toverloadyourElasticsearchclusterwithtoomanywarmingqueriesbecauseyoumayendupspendingtoomuchtimeinwarmingupinsteadofprocessingyourproductionqueries.

www.EBooksWorld.ir

Page 646: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 647: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 648: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexaliasingandusingittosimplifyyoureverydayworkWhenworkingwithmultipleindicesinElasticsearch,youcansometimeslosetrackofthem.Imagineasituationwhereyoustorelogsinyourindicesortime-baseddataingeneral.Usually,theamountofdatainsuchcasesisquitelargeand,therefore,itisagoodsolutiontohavethedatadividedsomehow.Alogicaldivisionofsuchdataisobtainedbycreatingasingleindexforasingledayoflogs(ifyouareinterestedinanopensourcesolutionusedtomanagelogs,lookattheLogstashfromtheElasticsearchsuiteathttps://www.elastic.co/products/logstash).

However,aftersometime,ifwekeepalltheindices,wewillstarthavingaproblemintakingcareofallthat.Anapplicationneedstotakecareofalltheinformation,suchaswhichindextosenddatato,whichtoquery,andsoon.Withthehelpofaliases,wecanchangethistoworkwithasinglenamejustaswewoulduseasingleindex,butwewillworkwithmultipleindices.

www.EBooksWorld.ir

Page 649: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AnaliasWhatisanindexalias?It’sanadditionalnameforoneormoreindicesthatallowsustousetheseindicesbyreferringtothemwiththoseadditionalnames.Asinglealiascanhavemultipleindicesaswellastheotherwayround;asingleindexcanbeapartofmultiplealiases.

However,pleaserememberthatyoucan’tuseanaliasthathasmultipleindicesforindexingorforreal-timeGEToperations.Elasticsearchwillthrowanexceptionifyoudothis.Wecanstilluseanaliasthatlinkstoonlyasingleindexforindexing,though.ThisisbecauseElasticsearchdoesn’tknowinwhichindexthedatashouldbeindexedorfromwhichindexthedocumentshouldbefetched.

www.EBooksWorld.ir

Page 650: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CreatinganaliasTocreateanindexalias,weneedtoruntheHTTPPOSTmethodtothe_aliasesRESTend-pointwithadefinedaction.Forexample,thefollowingrequestwillcreateanewaliascalledweek12thatwillincludetheindicesnamedday10,day11,andday12(weneedtocreatethoseindicesfirst):

curl-XPOST'localhost:9200/_aliases'-d'{

"actions":[

{"add":{"index":"day10","alias":"week12"}},

{"add":{"index":"day11","alias":"week12"}},

{"add":{"index":"day12","alias":"week12"}}

]

}'

Iftheweek12aliasisn’tpresentinourElasticsearchcluster,theprecedingcommandwillcreateit.Ifitispresent,thecommandwilljustaddthespecifiedindicestoit.

Wewouldrunasearchacrossthethreeindicesasfollows:

curl-XGET'localhost:9200/day10,day11,day12/_search?q=test'

Ifeverythinggoeswell,wecaninsteadrunitasfollows:

curl-XGET'localhost:9200/week12/_search?q=test'

Isn’tthisbetter?

Sometimeswehaveasetofindiceswhereeveryindexservesindependentinformationbutsomequeriesshouldgoacrossallofthem;forexample,wehavededicatedindicesforcountries(country_en,country_us,country_de,andsoon).Inthiscase,wewouldcreatethealiasbygroupingthemall:

curl-XPOST'localhost:9200/_aliases'-d'{

"actions":[

{"add":{"index":"country_*","alias":"countries"}}

]

}'

Thelastcommandcreatedonlyonealias.Elasticsearchallowsyoutorewritethistosomethinglessverbose:

curl-XPUT'localhost:9200/country_*/_alias/countries'

www.EBooksWorld.ir

Page 651: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ModifyingaliasesOfcourse,youcanalsoremoveindicesfromanalias.Wecandothissimilarlytohowweaddindicestoanalias,butinsteadoftheaddcommand,weusetheremoveone.Forexample,toremovetheindexnamedday9fromtheweek12index,wewillrunthefollowingcommand:

curl-XPOST'localhost:9200/_aliases'-d'{

"actions":[

{"remove":{"index":"day9","alias":"week12"}}

]

}'

www.EBooksWorld.ir

Page 652: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CombiningcommandsTheaddandremovecommandscanbesentasasinglerequest.Forexample,ifyouwouldliketocombineallthepreviouslysentcommandsintoasinglerequest,youwillhavetosendthefollowingcommand:

curl-XPOST'localhost:9200/_aliases'-d'{

"actions":[

{"add":{"index":"day10","alias":"week12"}},

{"add":{"index":"day11","alias":"week12"}},

{"add":{"index":"day12","alias":"week12"}},

{"remove":{"index":"day9","alias":"week12"}}

]

}'

www.EBooksWorld.ir

Page 653: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RetrievingaliasesInadditiontoaddingorremovingindicestoorfromaliases,weandourapplicationsthatuseElasticsearchmayneedtoretrieveallthealiasesavailableintheclusterorallthealiasesthatanindexisconnectedto.Toretrievethesealiases,wesendarequestusingtheHTTPGETcommand.Forexample,thefollowingcommandgetsallthealiasesfortheday10indexandthesecondonewillgetalltheavailablealiases:

curl-XGET'localhost:9200/day10/_aliases'

curl-XGET'localhost:9200/_aliases'

Theresponsefromthesecondcommandisasfollows:

{

"day12":{

"aliases":{

"week12":{}

}

},

"library":{

"aliases":{}

},

"day11":{

"aliases":{

"week12":{}

}

},

"day9":{

"aliases":{}

},

"day10":{

"aliases":{

"week12":{}

}

}

}

Youcanalsousethe_aliasendpointtogetallaliasesfromthegivenindex:

curl-XGET'localhost:9200/day10/_alias/*'

Togetaparticularaliasdefinition,youcanusethefollowing:

curl-XGET'localhost:9200/day10/_alias/day12'

www.EBooksWorld.ir

Page 654: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RemovingaliasesYoucanalsoremoveanaliasusingthe_aliasendpoint.Forexample,sendingthefollowingcommandwillremovetheclientaliasfromthedataindex:

curl-XDELETElocalhost:9200/data/_alias/client

www.EBooksWorld.ir

Page 655: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

FilteringaliasesAliasescanbeusedinawaysimilartohowviewsareusedinSQLdatabases.YoucanuseafullQueryDSL(discussedindetailinChapter3,SearchingYourData)andhaveyourfilterappliedtoallcount,search,deletebyquery,andsoon.

Let’slookatanexample.Imaginethatwewanttohavealiasesthatreturndataforacertainclientsowecanuseitinourapplication.Let’ssaythattheclientidentifierweareinterestedinisstoredintheclientIdfieldandweareinterestedinthe12345client.So,let’screatethealiasnamedclientwithourdataindex,whichwillapplyaqueryforclientIdautomatically:

curl-XPOST'localhost:9200/_aliases'-d'{

"actions":[

{

"add":{

"index":"data",

"alias":"client",

"filter":{"term":{"clientId":12345}}

}

}

]

}'

Sowhenusingthedefinedalias,youwillalwaysgetyourrequestfilteredbyatermquerythatensuresthatallthedocumentshavethe12345valueintheclientIdfield.

www.EBooksWorld.ir

Page 656: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AliasesandroutingIntheIntroductiontoroutingsectionofChapter2,IndexingYourData,wetalkedaboutrouting.Similartoaliasesthatusefiltering,wecanaddroutingvaluestothealiases.Imaginethatweareusingroutingonthebasisofuseridentifierandwewanttousethesameroutingvalueswithouraliases.So,forthealiasnamedclient,wewillusetheroutingvaluesof12345,12346,and12347forquerying,andonly12345forindexing.Todothis,wewillcreateanaliasusingthefollowingcommand:

curl-XPOST'localhost:9200/_aliases'-d'{

"actions":[

{

"add":{

"index":"data",

"alias":"client",

"search_routing":"12345,12346,12347",

"index_routing":"12345"

}

}

]

}'

Thisway,whenweindexourdatausingtheclientalias,thevaluesspecifiedbytheindex_routingpropertywillbeused.Atthetimeofquerying,thevaluesspecifiedbythesearch_routingpropertywillbeused.

Thereisonemorething.Pleaselookatthefollowingquerysenttothepreviouslydefinedalias:

curl-XGET'localhost:9200/client/_search?q=test&routing=99999,12345'

Thevalueusedasaroutingvaluewillbe12345.ThisisbecauseElasticsearchwilltakethecommonvaluesofthesearch_routingattributeandthequeryroutingparameter,whichinourcaseis12345.

www.EBooksWorld.ir

Page 657: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ZerodowntimereindexingandaliasesOneofthegreatestadvantagesofusingaliasesistheabilitytore-indexthedatawithoutanydowntimefromthesystemusingElasticsearch.Toachievethis,youwouldneedtointeractwithyourindicesonlythroughaliases—bothforindexingandquerying.Insuchacase,youcanjustcreateanewindex,indexthedatahere,andswitchaliaseswhenneeded.Duringindexing,aliaseswouldstillpointtotheoldindex,sotheapplicationcouldworkasusual.

www.EBooksWorld.ir

Page 658: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 659: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryInthischapter,wediscussedElasticsearchadministration.WestartedbylearninghowtoperformbackupsofourindicesandhowtomonitorourclusterhealthandstateusingitsAPI.Wecontrolledclustershardrebalancingandlearnedhowtoadjustshardallocationaccordingtoourneeds.We’veusedtheCATAPItogetinformationaboutElasticsearchinhuman-readableformandwe’vewarmedupourqueriestomakethemfaster.Finally,we’veusedaliasestoallowabettermanagementofourindicesandtohavemoreflexibility.

Inthenextandfinalchapterofthebook,wewillfocusonahypotheticalonlinelibrarystoretoseehowtomakeElasticsearchworkinpractice.Wewillstartwithabriefintroductionandhardwareconsiderations.WewilltuneasingleinstanceofElasticsearchandproperlyconfigureourclusterbydiscussingeachofitspartsandprovidingaproperarchitecture.Wewillverticallyexpandtheclusterandprepareitforbothhighqueryingandhighindexingload.Finally,wewilllearnhowtomonitorsuchapreparedcluster.

www.EBooksWorld.ir

Page 660: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 661: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Chapter11.ScalingbyExampleInthepreviouschapter,wediscussedElasticsearchadministration.WestartedwithdiscussionaboutbackupsandhowwecandothembyusingavailableAPI.Wemonitoredthehealthandstateofourclustersandnodesandwelearnedhowtocontrolshardrebalancing.WecontrolledtheshardandreplicasallocationandusedhumanfriendlyCatAPItogetinformationabouttheclusterandnodes.Wesawhowtousewarmerstospeeduppotentiallyheavyqueriesandweusedindexaliasingtomanageourindicesmoreeasily.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

HardwarepreparationsforrunningElasticsearchTuningasingleElasticsearchnodePreparinghighlyavailableandfaulttolerantclustersExpandingElasticsearchverticallyPreparingElasticsearchforhighqueryandindexingthroughputMonitoringElasticsearch

www.EBooksWorld.ir

Page 662: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HardwareOneofthefirstdecisionsthatweneedtomakewhenstartingeveryserioussoftwareprojectisasetchoicesrelatedtohardware.Andbelieveus,thisisnotonlyaveryimportantchoice,butalsooneofthemostdifficultones.Oftenthedecisionsaremadeatearlyprojectstages,whenonlythebasicarchitectureisknownandwedon’thavepreciseinformationregardingthequeries,dataload,andsoon.Projectarchitecthastobalanceprecautionandprojectedcostofthewholesolution.Toomanytimesitisanintersectionofexperienceandclairvoyance,whichcanleadtoeithergreatorterribleresults.

www.EBooksWorld.ir

Page 663: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PhysicalserversoracloudLet’sstartwithadecision:acloud,virtual,orphysicalmachines.Nowadays,theseareallvalidoptions,butitwasnotalwaysthecase.Sometimeagotheonlyoptionwastobuynewserversforeachenvironmentpartorshareresourceswiththeotherapplicationsonthesamemachine.Thesecondoptionmakesperfectsenseasitismorecost-effectivebutintroducesrisk.Problemswithoneapplication,especiallywhentheyarehardwarerelated,willresultinproblemsforanotherapplication.YoucanimagineoneofyourapplicationsusingmostoftheI/OsubsystemofthephysicalmachineandalltheotherapplicationsstrugglingwithlotsofI/Owaitsandperformanceproblemsbecauseofthat.Virtualizationpromisesapplicationseparationandamoreconvenientwayofmanagingresources,butyouarestilllimitedbytheunderlyinghardware.Everyunexpectedtrafficcouldbeaproblemandaffectserviceavailability.Imaginethatyourecommercesitesuddenlygainsmassivenumberofcustomers.Insteadofbeinggladthatthespikeappearedandyouhavemorepotentialcustomers,yousearchforaplacewhereyoucanbuyadditionalhardwarethatwillbesuppliedassoonaspossible.

Cloudcomputingontheotherhandmeansamoreflexiblecostmodel.Wecaneasilyaddnewmachineswheneverweneed.Wecanaddthemtemporarilywhenweexpectagreaterload(forexample,beforeChristmasforanecommercesite)andpayonlyfortheactuallyusedprocessingpower.Itisjustafewclicksintheadminpanel.Evenmore,wecanalsosetupautomaticscaling,thatisnewvirtualmachinescanappearautomaticallywhenweneedthem.Cloud-basedsoftwarecanalsoshutthemdownwhenwedonotneedthemanymore.Thecloudhasmanyadvantages,suchaslowerinitialcost,abilitytoeasilygrowyourbusiness,andinsensitivitytotemporalfluctuationsofresourcerequirements,butitalsohasseveralflaws.Thecostsofcloudserversrisefasterthanthatofphysicalmachines.Also,massstorage,althoughpracticallyunlimited,hasworsecharacteristics(numberofoperationsperseconds)thanphysicalservers.Thisissometimesagreatproblemforus,especiallywithdiskbasedstoragesuchasElasticsearch.

Inpractice,asusual,thechoicecanbehardbutgoingthroughafewpointscanhelpyouwithyourdecision:

Businessrequirementsmaydirectlypointforyourownservers;forexample,someproceduresrelatedtofinancialormedicaldataautomaticallyexcludecloudsolutionshostedbythird-partyvendorsForproofofconceptandlow/mediumloadservices,thecloudcanbeagoodchoicebecauseofsimplicity,scalability,andlowcostSolutionswithstrongrequirementsconnectedwithI/OsubsystemswillprobablyworkbetteronbaremetalmachineswhereyouhavegreaterinfluencewhatstoragetypeisavailabletoyouWhenthetrafficcangreatlychangewithinashorttime,thecloudisaperfectplaceforyou

Forthepurposeoffurtherdiscussion,let’sassumethatwewanttobuyourownservers.Weareinthecomputerstorenowandlet’sbuysomething!

www.EBooksWorld.ir

Page 664: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CPUInmostcases,thisistheleastimportantpart.YoucanchooseanymodernCPUmodelbutyoushouldknowthatmorenumberofcoresmeansahighernumberofconcurrentqueriesandindexingthreads.Thatwillleadtobeingabletoindexdatafaster,especiallywithcomplicatedanalysisandlotsofmerges.

www.EBooksWorld.ir

Page 665: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RAMmemoryMoregigabytesofRAMisalwaysbetterthanlessgigabytesofRAM.Memoryisnecessary,especiallyforaggregationandsorting.Itislessofaproblemnow,withElasticsearch2.0anddocvalues,butstillcomplicatedquerieswithlotsofaggregationrequirememorytoprocessthedata.Memoryisalsousedforindexingbuffersandcanleadtoindexingspeedimprovements,becausemoredatacanbebufferedinmemoryandthusdiskswillbeusedlessfrequently.Ifyoutrytousemorememorythanavailable,theoperatingsystemwillusetheharddisksastemporaryspace(itstartsswapping)andyoushouldavoidthisatallcost.NotethatyoushouldnevertrytoforceElasticsearchtouseasmuchaspossiblememory.ThefirstreasonisJavagarbagecollector–lessmemoryismoreGCfriendly.Thesecondreasonisthattheunusedmemoryisactuallyusedbytheoperatingsystemforbuffersanddiskcache.Infact,whenyourindexcanfitinthisspace,alldataisreadfromthesecachesandnotfromthedisksdirectly.Thiscandrasticallyimprovetheperformance.Bydefault,ElasticsearchandtheI/OsubsystemsharethesameI/Ocache,whichgivesanotherreasontoleaveevenmorememoryfortheoperatingsystemitself.

Inpractice,8GBisthelowestrequirementformemory.ItdoesnotmeanthatElasticsearchwillneverworkwithlessmemory,butformostsituationsanddataintensiveapplications,itisthereasonableminimum.Ontheotherhand,morethan64GBisrarelyneeded.Inlieu,thinkaboutscalingthesystemhorizontallyinsteadofassigningsuchamountsofmemorytoasingleElasticsearchnode.

www.EBooksWorld.ir

Page 666: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MassstorageWesaidthatweareinagoodsituationwhenthewholeindexfitsintomemory.Inpracticethiscanbedifficulttoachieve,sogoodandfastdisksareveryimportant.Itisevenmoreimportantifoneoftherequirementsishighindexingthroughput.Insuchacase,youmayconsiderfastSSDdisks.Unfortunately,thesedisksareexpensiveifyourdatavolumeisbig.YoucanimprovethesituationbyavoidingusingRAID(seehttps://en.wikipedia.org/wiki/RAID),exceptRAID0.Inmostcases,whenyouhandlefaulttolerancebyhavingmultipleservers,theadditionallevelofsecurityontheRAIDlevelisunnecessary.Thelastthingistoavoidusingexternalstorage,suchasnetworkattachedstorage(NAS)orNFSvolumes.Thenetworklatencyinsuchcasesalwayskillsalltheadvantagesofthesesolutions.

www.EBooksWorld.ir

Page 667: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThenetworkWhenyouuseElasticsearchcluster,eachnodeopensseveralconnectionstoothernodesforvarioususes.Whenyouindex,thedataisforwardedtodifferentshardsandreplicas.Whenyouqueryfordata,thenodeusedforqueryingcanrunmultiplepartialqueriestotheothernodesandcomposereplyfromthedatafetchedfromtheothernodes.Thisiswhyyoushouldmakesurethatyournetworkisnotthebottleneck.Inpractice,useonenetworkforalltheserversintheclusterandavoidsolutionsinwhichthenodesintheclusterarespreadbetweendatacenters.

www.EBooksWorld.ir

Page 668: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HowmanyserversTheanswerisalwaysthesame,asitdepends.Itdependsonmanyfactors:thenumberofrequestperseconds,thedatavolume,thelevelofthequery’scomplexity,theaggregationsandsortingusage,thenumberofnewdocumentsperunitoftime,howfastnewdatashouldbeavailableforsearching(therefreshtime),theaveragedocumentsize,andtheanalyzersused.Inpractice,thehandiestansweris-testitandapproximate.

Theonethingthatisoftenunderestimatedisdatasecurity.Whenyouthinkaboutfaulttoleranceandavailability,youshouldstartfromthreeservers.Why?WetalkedaboutthesplitbrainsituationintheMasterelectionconfigurationsectionofChapter9,ElasticsearchClusterinDetail.StartingfromthreeserversweareabletohandleasingleElasticsearchnodefailurewithouttakingdownthewholecluster.

www.EBooksWorld.ir

Page 669: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CostcuttingYoudidsometests,consideredcarefullyplannedfunctionalities,estimatedvolumesandload,andwenttotheprojectownerwithanarchitecturedraft.“Itstooexpensive”,hesaidandaskedyoutothinkaboutserversonceagain.Whatcanwedo?

Let’sthinkaboutserverrolesandtrytointroducesomedifferencesbetweenthem.Ifoneoftherequirementsisindexingmassiveamountsofdataconnectedwithtime(maybelogs),thepossiblewayishavingtwogroupsofservers:hotnodes,whennewdataarrives,andcoldnodes,whenolddataismoved.Thankstothisapproach,hotnodesmayhavefasterbutsmallerdisks(thatis,solidstatedrives)inoppositetothecoldnodes,whenfastdisksarenotsoimportantbutspaceis.Youcanalsodivideyourarchitectureintoseveralgroupsasmasterservers(lesspowerful,withrelativlysmalldisks),datanodes(biggerdisks),andqueryaggregatornodes(moreRAM).Wewilltalkaboutthisinthefollowingsections.

www.EBooksWorld.ir

Page 670: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 671: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PreparingasingleElasticsearchnodeWhenwetalkaboutverticalscaling,weoftenmeanaddingmoreresourcestotheserverElasticsearchisrunningon.WecanaddmemoryorwecanswitchtoamachinewithabetterCPUorfasterdiskstorage.Ofcourse,withbettermachineswecanexpectanincreaseinperformance;dependingonourdeploymentanditsbottlenecks,itcanbeasmallorlargeimprovement.However,therearelimitationswhenitcomestoverticalscaling.Forexample,oneofthelimitationsisthemaximumamountofphysicalmemoryavailableforyourserversorthetotalmemoryrequiredbytheJVMtooperate.Whenhavinglargedataandcomplicatedqueries,youcanverysoonrunintomemoryissuesandaddingnewmemorymaynothelpatall.Inthissection,wewilltrytogiveyougeneraladviceonwheretolookandwhattotunewhenitcomestoasingleElasticsearchnode.

Thethingtorememberwhentuningyoursystemisperformancetests,onesthatcanberepeatedunderthesamecircumstances.Onceyoumakeachange,youneedtobeabletoseehowitaffectstheoverallperformance.Inadditiontothat,Elasticsearchscalesgreat.Usingthatknowledge,wecanrunperformancetestsonasinglemachine(orafewofthem)andextrapolatetheresults.Suchobservationsmaybeagoodstartingpointforfurthertuning.

Alsokeepinmindthatthissectiondoesn’tcontainadeepdiveintoallperformancerelatedtopics,butisdedicatedtoshowingyouthemostcommonthings.

www.EBooksWorld.ir

Page 672: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThegeneralpreparationsApartfromallthethingswewilldiscussinthissection,therearethreemajor,operatingsystemrelatedthingsyouneedtoremember:thenumberofallowedfiledescriptors,thevirtualmemory,andavoidingswapping.

NotethatthefollowingsectioncontainsinformationforLinuxoperatingsystems,butyoucanalsoachievesimilaroptionsonMicrosoftWindows.

AvoidingswappingLet’sstartwiththethirdone.ElasticsearchandJavaVirtualMachinebasedapplications,ingeneral,don’tliketobeswapped.Thismeansthattheseapplicationsworkbestiftheoperatingsystemdoesn’tputthememorythattheyuseintheswapspace.Thisisverysimple,because,toaccesstheswappedmemory,theoperatingsystemwillhavetoreaditfromthedisk,whichisslowandwhichwouldaffecttheperformanceinaverybadway.

Ifwehaveenoughmemory,andweshouldhaveifwewantourElasticsearchinstancetoperformwell,wecanconfigureElasticsearchtoavoidswapping.Todothat,wejustneedtomodifytheelasticsearch.ymlfileandincludethefollowingproperty:

bootstrap.mlockall:true

Thisisoneoftheoptions.Thesecondoneistosetthepropertyvm.swappinessinthe/etc/sysctl.conffileto0(forcompleteswapdisabling)or1forswappingonlyinemergency(forKernelversions3.5andabove).

Thethirdoptionistodisableswappingbyediting/etc/fstabandremovingthelinesthatcontaintheswapword.Thefollowingisanexample/etc/fstabcontent:

LABEL=cloudimg-rootfs/ext4defaults,discard00

/dev/xvdbswapswapdefaults00

Todisableswappingwewouldjustremovethesecondlinefromtheabovecontents.Wecouldalsorunthefollowingcommandtodisableswapping:

sudoswapoff-a

However,rememberthatthiseffectwon’tpersistbetweenloggingoffandbackintothesystem,sothisisonlyatemporarysolution.

Also,rememberthatifyoudon’thaveenoughmemorytorunElasticsearch,theoperatingsystemwilljustkilltheprocesswhenswappingisdisabled.

FiledescriptorsMakesureyouhaveenoughlimitsrelatedtofiledescriptorsfortheuserrunningElasticsearch(wheninstallingfromofficialpackages,thatuserwillbecalledelasticsearch).Ifyoudon’t,youmayendupwithproblemswhenElasticsearchtriestoflushthedataandcreatenewsegmentsormergesegmentstogether,whichcanresultinindexcorruption.

Toadjustthenumberofallowedfiledescriptors,youwillneedtoadjustthewww.EBooksWorld.ir

Page 673: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

/etc/security/limits.conffile(atleastonmostcommonLinuxsystems)andadjustoraddanentryrelatedtoagivenuser(forbothsoftandhardlimits).Forexample:

elasticsearchsoftnofile65536

elasticsearchhardnofile65536

Itisadvisedtosetthenumberofallowedfiledescriptorstoatleast65536,butevenmorecanbeneeded,dependingonyourindexsize.

OnsomeLinuxsystems,youmayalsoneedtoloadanappropriatelimitsmodulefortheprecedingsettingtotakeeffect.Toloadthatmodule,youneedtoadjustthe/etc/pam.d/loginfileandaddoruncommentthefollowingline:

sessionrequiredpam_limits.so

ThereisalsoapossibilitytodisplaythenumberoffiledescriptorsavailableforElasticsearchbyaddingthe-Des.max-open-files=trueparametertoElasticsearchstartupparameters.Forexample,likethis:

bin/elasticsearch-Des.max-open-files=true

Whendoingthat,Elasticsearchwillincludeinformationaboutthefiledescriptorsinthelogs:

[2015-12-2000:22:19,869][INFO][bootstrap]max_open_files

[10240]

VirtualmemoryElasticsearch2.2useshybriddirectoryimplementation,whichisacombinationofmmapfsandniofsdirectories.Becauseofthat,especiallywhenyourindicesarelarge,youmayneedalotofvirtualmemoryonyoursystem.Bydefault,theoperatingsystemlimitstheamountofmemorymappedfilesandthatcancauseerrorswhenrunningElasticsearch.Becauseofthat,werecommendincreasingthedefaultvalues.Todothat,youjustneedtoeditthe/etc/sysctl.conffileandsetthevm.max_map_countproperty;forexample,toavalueequalto262144.

Youcanalsochangethevaluetemporarilybyrunningthefollowingcommand:

sysctl-wvm.max_map_count=262144

www.EBooksWorld.ir

Page 674: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThememoryBeforethinkingaboutElasticsearchconfigurationrelatedthings,weshouldrememberaboutgivingenoughmemorytoElasticsearch.Ingeneral,weshouldn’tgivemorethan50-60percentofthetotalavailablememorytotheJVMprocessrunningElasticsearch.WedothatbecausewewanttoleavememoryfortheoperatingsystemandfortheoperatingsystemI/Ocache.However,weneedtorememberthatthe50-60percentfigureisnotalwaystrue.Youcanimaginehavingnodeswith256GBofRAMandhavingindicesof30GBintotalonsuchanode.Insuchcircumstances,evenassigningmorethan60percentofphysicalRAMtoElasticsearchwouldleaveplentyofRAMfortheoperatingsystem.ItisalsoagoodideatosettheXmxandXmspropertiestothesamevaluestoavoidJVMheapsizeresizing.

Anotherthingtorememberarethesocalledcompressedoops(http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#compressedOop),theordinaryobjectpointers.Javavirtualmachinecanbetoldtousethembyaddingthe-XX:+UseCompressedOopsswitch.ThisallowsJavavirtualmachinetouselessmemorytoaddresstheobjectsontheheap.However,thisisonlytrueforheapsizeslessthanorequalto31GB.Goingforalargerheapmeansnocompressedoopsandhighermemoryusageforaddressingtheobjectsontheheap.

www.EBooksWorld.ir

Page 675: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

FielddatacacheandbreakingthecircuitAsweknow,bydefaultthefielddatacacheinElasticsearchisunbounded.Thiscanbeverydangerous,especiallywhenyouareusingaggregationsandsortingonmanyfieldsthatareanalysed,becausetheydon’tusedocvaluesbydefault.Ifthosefieldsarehighcardinalityones,thenyoucanrunintoevenmoretrouble.Bytroublewemeanrunningoutofmemory.

Wehavetwodifferentfactorswecantunetobesurethatwedon’trunintooutofmemoryerrors.Firstofall,wecanlimitthesizeofthefielddatacacheandweshoulddothat.Thesecondthingisthecircuitbreaker,whichwecaneasilyconfiguretojustthrowexceptionsinsteadofloadingtoomuchdata.Combiningthesetwothingstogetherwillensurethatwedon’trunintomemoryissues.

However,weshouldalsorememberthatElasticsearchwillevictdatafromthefielddatacacheifitssizeisnotenoughtohandleaggregationrequestsorsorting.Thiswillaffectthequeryperformancebecauseloadingthefielddatainformationisnotveryefficientandisresourceintensive.However,inouropinion,itisbettertohaveourqueriesslowerthanhavingourclusterblownupbecauseofoutofmemoryerrors.

ThefielddatacacheandcachesingeneralwerediscussedintheElasticsearchcachessectionofChapter9,ElasticsearchClusterinDetail.

www.EBooksWorld.ir

Page 676: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UsedocvaluesWheneveryouplantousesorting,aggregations,orscriptingheavily,youshouldusedocvalueswheneveryoucan.Thiswillnotonlysaveyouthememoryneededforthefielddatacache,becauseoffewerobjectsproduced,itwillalsomaketheJavavirtualmachineworkbetterwithlowergarbagecollectortime.DocvalueswerediscussedintheMappingsConfigurationsectionofChapter2,IndexingYourData.

www.EBooksWorld.ir

Page 677: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RAMbufferforindexingIntheElasticsearchcachessectionofChapter9,ElasticsearchClusterinDetail,wealsodiscussed.Thereareafewthingswewouldliketomention.Firstofall,themoreRAMfortheindexingbuffer,themoredocumentsElasticsearchwillbeabletoholdinmemory.Sothemorememorywehaveforindexing,thelessoftentheflushtodiskwillhappenandfewersegmentswillbecreated.Becauseofthat,yourindexingwillbefaster.Butofcourse,wedon’twantElasticsearchtooccupy100percentoftheavailablememory.KeepinmindthattheRAMbuffersaresetpershard,sotheamountofmemorythatwillbeuseddependsonthenumberofshardsandreplicasthatareassignedonthegivennodeandonthenumberofdocumentsyouindex.Youshouldsettheupperlimitssoyournodedoesn’tblowupwhenithasmultipleshardsassigned.

www.EBooksWorld.ir

Page 678: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexrefreshrateElasticsearchusesLuceneandweknowitbynow.ThethingwithLuceneisthattheviewoftheindexisnotrefreshedwhennewdataisindexedorsegmentsarecreated.Toseethenewlyindexeddata,weneedtorefreshtheindex.Bydefault,Elasticsearchdoesthatonceeverysecondandtheperiodofrefreshiscontrolledbyusingtheindex.refresh_intervalproperty,specifiedperindex.Thelowertherefreshrate,thesoonerthedocumentswillbevisibleforsearchoperations.However,thatalsomeansthatElasticsearchwillneedtoputmoreresourcesintorefreshingtheindexview,meaningthattheindexingandsearchingoperationswillbeslower.Thehighertherefreshrate,themoretimeyouwillhavetowaitbeforebeingabletoseethedatainthesearchresults,butyourindexingandqueryingwillbefaster.

www.EBooksWorld.ir

Page 679: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ThreadpoolsWehaven’ttalkedaboutthreadpoolsuntilnow,butwewouldliketomentionthemnow.EachElasticsearchnodeholdsseveralthreadpoolsthatcontroltheexecutionqueuesforoperationssuchasindexingorquerying.Elasticsearchusesseveralpoolstoallowcontroloverhowthethreadsarehandledandmuchthememoryconsumptionisallowedforuserrequests.

NoteJavavirtualmachineallowsapplicationstousemultiplethreads-concurrentlyrunningmultipleapplicationtasks.FormoreinformationaboutJavathreads,refertohttp://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html.

Therearemanythreadpools(wecanspecifythetypeweareconfiguringbyspecifyingthetypeproperty).However,forperformance,themostimportantare:

generic:Thisisthethreadpoolforgenericoperations,suchasnodediscovery.Bydefault,thegenericthreadpoolisoftypecached.index:Thisisthethreadpoolusedforindexinganddeletingoperations.Itstypedefaultstofixed,itssizetothenumberofavailableprocessors,andthesizeofthequeueto200.search:Thisisthethreadpoolusedforsearchandcountrequests.Itstypedefaultstofixedanditssizetothenumberofavailableprocessorsmultipliedby3anddividedby2,withthesizeofthequeuedefaultingto1000.suggest:Thisisthethreadpoolusedforsuggestrequests.Itstypedefaultstofixed,itssizetothenumberofavailableprocessors,andthesizeofthequeueto1000.get:Thisisthethreadpoolusedforrealtimegetrequests.Itstypedefaultstofixed,itssizetothenumberofavailableprocessors,andthesizeofthequeueto1000.bulk:Asyoucanguess,thisisthethreadpoolusedforbulkoperations.Itstypedefaultstofixed,itssizetothenumberofavailableprocessors,andthesizeofthequeueto50.percolate:Thisisthethreadpoolforpercolationrequests.Itstypedefaultstofixed,itssizetothenumberofavailableprocessors,andthesizeofthequeueto1000.

NoteBeforeElasticsearch2.1,wecouldcontrolthetypeofthethreadpool.StartingwithElasticsearch2.1wecannolongerdothat.Formoreinformationpleaserefertotheofficialdocumentation-https://www.elastic.co/guide/en/elasticsearch/reference/2.1/breaking_21_removed_features.html

Forexample,ifwewanttoconfigurethethreadpoolforindexingoperationstohaveasizeof100andaqueueof500,wewillsetthefollowingintheelasticsearch.ymlconfigurationfile:

threadpool.index.size:100

threadpool.index.queue_size:500

www.EBooksWorld.ir

Page 680: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AlsorememberthatthethreadpoolconfigurationcanbeupdatedusingtheclusterupdateAPI.Forexample,likethis:

curl-XPUT'localhost:9200/_cluster/settings'-d'{

"transient":{

"threadpool.index.size":100,

"threadpool.index.queue_size":500

}

}'

Ingeneral,youdon’tneedtoworkwiththethreadpoolsandtheirconfiguration.However,whenconfiguringyourcluster,youmaywanttoputmoreemphasisonindexingorqueryingand,insuchcases,givingmorethreadsorlargerqueuestotheprioritizedoperationmayresultinmoreresourcesbeingusedforsuchoperations.

www.EBooksWorld.ir

Page 681: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 682: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

HorizontalexpansionElasticsearchisahighlyscalablesearchandanalyticsplatform.Wecanscaleitbothhorizontallyandvertically.WediscussedhowtotuneasinglenodeinthePreparingasingleElasticsearchnodesectionearlierinthischapterandwewouldliketofocusonhorizontalscalingnow;howtohandlemultiplenodesinthesamecluster,whatrolesshouldtheyhave,andhowtotunetheconfigurationtohaveahighlyreliable,available,andfaulttolerantcluster.

Youcanimagineverticalscalinglikebuildingaskyscrapper–wehavelimitedspaceavailableandweneedtogoashighaswecan.Ofcourse,thatisexpensiveandrequiresalotofengineeringdoneright.Ontheotherhand,wehavehorizontalscaling,whichislikehavingmanyhousesinaresidentialarea.Insteadofinvestingintohardwareandhavingpowerfulmachines,wechoosetohavemultiplemachinesandourdatasplitbetweenthem.Horizontalscalinggivesusvirtuallyunlimitedscalingpossibilities.Evenwiththemostpowerfulhardware,thetimecomeswhenasinglemachineisnotenoughtohandlethedata,thequeries,orbothofthem.Insuchcases,spreadingthedataamongmultipleserversiswhatsavesusandallowsustohaveterabytesofdatainmultipleindicesspreadacrossthewholecluster,justliketheoneinthefollowingimage:

Wehaveour4nodesclusterwiththelibraryindexcreatedandbuiltoffourshards.

Ifwewanttoincreasethequeryingcapabilitiesofourcluster,wecanjustaddadditionalnodes,forexample,fourofthem.Afteraddingnewnodestothecluster,wecaneithercreatenewindicesthatwillbebuiltofmoreshardstospreadtheloadmoreevenlyoraddreplicastothealreadyexistingshards.Bothoptionsareviable.Thisisbecausewedon’thavethepossibilityofsplittingshardsoraddingmoreprimaryshardstoanexistingindex.Weshouldgoforhavingmoreprimaryshardswhenourhardwareisnotenoughtohandletheamountofdataitholds.Insuchcases,weusuallyrunintooutofmemorysituations,longshardqueryexecutiontime,swapping,orhighI/Owaits.Thesecondoption,thatishavingreplicas,isthewaytogowhenourhardwareishappilyhandlingthedatawehavebutthetrafficissohighthatthenodesjustcan’tkeepup.

www.EBooksWorld.ir

Page 683: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thefirstoptionissimple,butlet’slooksatthesecondcase-havingmorereplicas.Sowithfouradditionalnodes,ourclusterwouldlookasfollows:

Now,let’srunthefollowingcommandtoaddasinglereplica:

curl-XPUT'localhost:9200/library/_settings'-d'{

"index":{

"number_of_replicas":1

}

}'

Ourclusterviewwouldlookmoreorlessasfollows:

Asyoucansee,eachoftheinitialshardsbuildingthelibraryindexhasasinglereplicastoredonanothernode.ThenicethingaboutshardsandtheirreplicasisthatElasticsearchissmartenoughtobalancetheshardsinasingleindexandputthemonseparatenodes.Forexample,youwon’teverendupinasituationwhereyouhaveashardanditsreplicasonthesamenode.Also,Elasticsearchisabletoroundrobinthequeriesbetweentheshardsandtheirreplicas,whichmeansthatallthenodeswillbehitbythequeriesandwedon’thavetocareaboutthat.Becauseofthat,weareabletohandlealmostdoublethequeryloadcomparedtoourinitialdeployment.

www.EBooksWorld.ir

Page 684: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AutomaticallycreatingthereplicasLet’sstayabitlongeraroundreplicas.Elasticsearchallowsustoautomaticallyexpandreplicaswhentheclusterisbigenough.Thismeansthatthereplicascanbecreatedautomaticallywhennewnodesareaddedtothecluster.Youcanwonderwheresuchfunctionalitycanbeuseful.Imagineasituationwhereyouhaveasmallindexthatyouwouldliketobepresentoneverynodesothatyourpluginsdon’thavetorundistributedqueriesjusttogetthedatafromit.Inadditiontothat,yourclusterisdynamicallychanging,thatisyouaddandremovenodesfromit.ThesimplestwaytoachievesuchfunctionalityistoallowElasticsearchtoautomaticallyexpandthereplicas.Todothat,weneedtosetindex.auto_expand_replicasto0-all,whichmeansthattheindexcanhave0replicasorbepresentonallthenodes.SoifoursmallindexiscalledshopsandwewouldlikeElasticsearchtoautomaticallyexpanditsreplicastoallthenodesinthecluster,wewouldusethefollowingcommandtocreatetheindex:

curl-XPOST'localhost:9200/shops/'-d'{

"settings":{

"index":{

"auto_expand_replicas":"0-all"

}

}

}'

Wecanalsoupdatethesettingsofthatindexifitisalreadycreatedbyrunningthefollowingcommand:

curl-XPUT'localhost:9200/shops/_settings'-d'{

"index":{

"auto_expand_replicas":"0-all"

}

}'

www.EBooksWorld.ir

Page 685: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RedundancyandhighavailabilityTheElasticsearchreplicationmechanismnotonlygivesusabilitytohandlehigherquerythroughput,butalsogivesusredundancyandhighavailability.ImagineanElasticsearchclusterhostingasingleindexcalledlibrarythatisbuiltof2shardsand0replicas.Suchaclusterwouldlookasfollows:

Nowwhathappenswhenoneofthenodesfail?Thesimplestansweristhatweloseabout50percentofthedataand,ifthefailureisfatal,welosethatdataforever.Evenwhenhavingbackups,wewouldneedtospinupanothernodeandrestorethebackupandthattakestime.Duringthattime,yourapplication,orpartsofitthatarebasedonElasticsearch,can’tworkcorrectly.IfyourbusinessreliesonElasticsearch,downtimemeansmoneyloss.Ofcourse,wecanusereplicastocreatemorereliableclustersthatcanhandlethehardwareandsoftwarefailures.Andonethingtorememberisthateverythingwillfaileventually–ifthesoftwarewon’t,hardwarewill.Forexample,sometimeagoGooglesaidthatineachoftheirclusters,duringthefirstyearatleast1000machineswillfail(youcanreadmoreonthattopicathttp://www.cnet.com/news/google-spotlights-data-center-inner-workings/).Becauseofthat,weneedtobereadytohandlesuchcases.

Let’slookatthesameclusterbutwithonereplica:

NowlosingasingleElasticsearchnodemeansthatwestillhavethewholedataavailableandwecanworkonrestoringthefullclusterstructurewithoutdowntime.Ofcourse,thisisonlyaverysmallclusterbuiltoftwoElasticsearchnodesclusters.Thelargerthecluster,themorereplicas,themorefailureyouwillbeabletohandlewithoutworryingaboutthedataloss.Ofcourseyouwillhavelowerperformance,dependingonthepercentageofnodesthatfail,butthedatawillstillbethereandtheclusterwillbeoperational.

That’swhy,whendesigningyourarchitectureanddecidingonthenumberofnodesandindicesandtheirarchitecture,youshouldtakeintoconsiderationhowmanynodes,failure

www.EBooksWorld.ir

Page 686: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

youwanttolivewith.Ofcourse,youcan’tforgetabouttheperformancepartoftheequation,butredundancyandhighavailabilityshouldbeoneofthefactorsofthescalingequation.

www.EBooksWorld.ir

Page 687: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

CostandperformanceflexibilityThedefaultdistributednatureofElasticsearchanditsabilitytoscalehorizontallyallowsustobeflexiblewhenitcomestoperformanceandcoststhatwehavewhenrunningourenvironment.Firstofall,highendserverswithhighperformancedisks,numerousCPUcores,andalotofRAMarestillexpensive.Inadditiontothat,cloudcomputingisgettingmoreandmorepopularandifyouneedalotofflexibilityanddon’twanttohaveyourownhardware,youcanchoosesolutionssuchasAmazon(http://aws.amazon.com/),Rackspace(http://www.rackspace.com/),DigitalOcean(https://www.digitalocean.com/),andsoon.Theydonotonlyallowustorunoursoftwareonrentedmachines,butalsoallowustoscaleondemand.Wejustneedtoaddmoremachineswhichisafewclicksawayorcanevenbeautomatedwithsomedegreeofwork.

Usingahostedsolutionwithoneclickmachinerentingallowshavingatrulyhorizontallyscalablesolution.Ofcourse,that’snotcheap–youpayfortheflexibility.Butwecaneasilysacrificeperformanceifcostsarethemostcrucialfactorinourbusinessplan.Ofcourse,wecanalsogotheotherway.Ifwecanaffordlargebaremetalmachines,Elasticsearchclusterscanbepushedtohundredsofterabytesofdatastoredintheindicesandstillgetdecentperformance(ofcoursewithaproperhardwareandpropertydistributed).

www.EBooksWorld.ir

Page 688: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ContinuousupgradesHighavailability,costandperformanceflexibility,andvirtuallyendlessgrowtharenottheonlythingsworthtalkingaboutwhendiscussingthescalabilitysideofElasticsearch.Atsomepointintime,youwillwanttohaveyourElasticsearchclusterupgradedtoanewversion.Itcanbebecauseofbugfixes,performanceimprovements,newfeatures,oranythingthatyoucanthinkof.Thethingisthatwhenyouhaveasingleinstanceofeachshard,withoutreplicas,anupgrademeansunavailabilityofElasticsearch(oratleastitsparts)andthatmaymeandowntimeoftheapplicationsthatuseElasticsearch.Thisisanotherreasonwhyhorizontalscalingissoimportant;youcanperformupgrades,atleasttothepointwheresoftwaresuchasElasticsearchsupports.Forexample,youcantakeElasticsearch2.0andupgradetoElasticsearch2.1withonlyrollingrestarts(gettingonenodeoutofthecluster,upgradingit,bringingitback,andcontinuingwiththenextnodeuntilallthenodesaredone),thushavingallthedatastillavailableforsearchingandindexinghappeningatthesametime.

www.EBooksWorld.ir

Page 689: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MultipleElasticsearchinstancesonasinglephysicalmachineHavingalargephysicalmachinewithlotofmemoryandCPUcoreshasadvantagesandsomechallenges.Firstofall,ifyoudecidetorunasingleElasticsearchnodeonthatmachine,youwillsoonerorlaterrunintogarbagecollectionissues,youwillhavelotsofshardsonasinglenodewhichwillrequireahighnumberofI/OoperationsfortheinternalElasticsearchcommunication(retrievingclusterstatistics),andsoso.What’smore,youusuallyshouldn’tgoabove31GBofheapmemoryforasingleJVMprocessbecauseyoucan’tusecompressedordinaryobjectpointers(https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html).

Insuchcases,youcaneitherrunmultipleElasticsearchinstancesonthesamebaremetalmachine,runmultiplevirtualmachinesandasingleElasticsearchinsideeachone,orrunElasticsearchinacontainer,suchasDocker(http://www.docker.com/).Thisisoutofthescopeofthebook,but,becausewearetalkingaboutscaling,wethoughtitmaybeagoodthingtomentionwhatcanbedoneinsuchcases.

NoteThereisalsothepossibilityofrunningmultipleElasticsearchserversonasinglephysicalmachinewithoutrunningmultiplevirtualmachines.Whichroadtotake-virtualmachinesormultipleinstances-isreallyyourchoice.However,weliketokeepthingsseparateandbecauseofthatweusuallygofordividinganylargeserverintomultiplevirtualmachines.Whendividingonelargeserverintomultiplesmallervirtualmachines,rememberthattheI/Osubsystemwillbesharedacrossthosesmallervirtualmachines.Becauseofthat,itmaybegoodtowiselydividethedisksbetweenthevirtualmachines.

PreventingashardanditsreplicasfrombeingonthesamenodeThereisoneadditionalthingworthmentioning.Whenyouhavemultiplephysicalserversdividedintovirtualmachines,itiscrucialtoensurethattheshardanditsreplicadon’tenduponthesamephysicalmachine.Bydefault,ElasticsearchissmartenoughtonotputtheshardanditsreplicaonthesameElasticsearchinstance,butitdoesn’tknowanythingaboutbaremetalmachines,soweneedtotellit.WecantellElasticsearchtoseparatetheshardsandreplicasbyusingclusterallocationawareness.Inourpreviouscase,wehadthreephysicalservers.Let’scallthem:server1,server2,andserver3.

NowforeachElasticsearchonaphysicalserver,wedefinethenode.server_namepropertyandwesetittotheidentifieroftheserver(thenameofthepropertycanbeanythingwewant).Soforexample,forallElasticsearchnodesonthefirstphysicalserver,wewouldsetthefollowingpropertyintheelasticsearch.ymlconfigurationfile:

node.server_name:server1

Inadditiontothat,eachElasticsearchnode(nomatteronwhichphysicalserver)needstohavethefollowingpropertyaddedtotheelasticsearch.ymlconfigurationfile:

www.EBooksWorld.ir

Page 690: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

cluster.routing.allocation.awareness.attributes:server_name

IttellsElasticsearchnottoputtheprimaryshardanditsreplicasonthenodeswiththesamevalueinthenode.server_nameproperty.ThisisenoughforusandElasticsearchwilltakecareoftherest.

www.EBooksWorld.ir

Page 691: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

DesignatednoderolesforlargerclustersThereisonemorethingthatwewanttodiscussandemphasise.Whenitcomestolargeclusters,itisimportanttoassignrolestoallthenodesinthecluster.ThisallowsforatrulyfullyfaulttolerantandhighlyavailableElasticsearchcluster.TheroleswecanassigntoeachElasticsearchnodeareasfollows:

MastereligiblenodeDatanodeQueryaggregatornode

Bydefault,eachElasticsearchnodeisbothmastereligible(itcanserveasamasternode),canholddata,andworkasaqueryaggregatornode.Youmaywonderwhythatisneeded.Letusgiveyouasimpleexample:ifthemasternodeisunderalotofstress,itmaynotbeabletohandletheclusterstaterelatedcommandfastenoughandtheclustercouldbecomeunstable.Thisisonlyasingle,simpleexampleandyoucanthinkofnumerousothers.

Becauseofthat,mostElasticsearchclustersthatarelargerthanafewnodes,usuallylookliketheonepresentedinthefollowingpicture:

Asyoucansee,ourhypotheticalclustercontainsthreeclientnodes(becauseweknowthattherewillbealotofqueries),alargenumberofdatanodesbecausetheamountofdatawillbelarge,andatleastthreemastereligiblenodesthatshouldn’tbedoinganythingelse.WhythreemasternodeswhenElasticsearchwillonlyuseasingleoneatanygiventime?Again,becauseofredundancyandtobeabletopreventsplitbrainsituationsbysettingdiscovery.zen.minimum_master_nodesto2,whichwouldallowustoeasilyhandlethefailureofasinglemastereligiblenodeinthecluster.

Letusnowgiveyousnippetsoftheconfigurationforeachtypeofnodeinourcluster.WealreadytalkedaboutthatintheUnderstandingnodediscoverysectioninChapter9,ElasticsearchClusterinDetail,butwewouldliketomentionthatonceagain.

www.EBooksWorld.ir

Page 692: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

QueryaggregatornodesThequeryaggregatornodesconfigurationisquitesimple.Toconfigurethose,wejustneedtotellElasticsearchthatwedon’twantthosenodestobemastereligibleortoholddata.Thiscorrespondstothefollowingconfigurationsnippetsintheelasticsearch.ymlfile:

node.master:false

node.data:false

DatanodesDatanodesarealsoverysimpletoconfigure.Wejustneedtotellthattheyshouldnotbemastereligible.However,wearenotbigfansofdefaultconfigurations(becausetheytendtochange)andthusourElasticsearchdatanodesconfigurationlooksasfollows:

node.master:false

node.data:true

MastereligiblenodesWe’veleftthemastereligiblenodestotheendofthegeneralscalingsection.Ofcourse,suchElasticsearchnodesshouldn’tbeallowedtoholddata,but,inadditiontothat,itisagoodpracticetodisableHTTPprotocolonsuchnodes.Thisisdonetoavoidaccidentallyqueryingthosenodes.Mastereligiblenodescanuselessresourcesthandataandqueryaggregatornodesandbecauseofthatweshouldensurethattheyareonlyusedformasterrelatedpurpose.Soourconfigurationformastereligiblenodeslooksmoreorlessasfollows:

node.master:true

node.data:false

http.enabled:false

www.EBooksWorld.ir

Page 693: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 694: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

PreparingtheclusterforhighindexingandqueryingthroughputUntilthischapter,wemostlytalkedaboutdifferentfunctionalitiesofElasticsearch,bothintermsofhandlingqueries,indexingdata,andtuning.However,runningaclusterinproductionisnotonlyaboutusingthisgreatsearchengine,butalsoaboutpreparingtheclustertohandleboththeindexingandqueryingload.Let’snowsummarizetheknowledgewehaveandseewhatarethethingsweneedtocareaboutwhenitcomestopreparingtheclusterforhighindexingandqueryingthroughput.

www.EBooksWorld.ir

Page 695: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexingrelatedadviceInthissection,wewilllookattheindexingrelatedadvicearoundtuningElasticsearch.Eachproductionenvironmentdataisdifferent,indexrateisdifferent,anduser’sbehaviorisdifferent.Takethatintoconsiderationandrunperformancetestsonyourenvironment.Thiswillgiveyouthebestideaaboutwhattoexpectandwhatworksthebestinthecaseofyoursystem.

IndexrefreshrateOneofthegeneralthingsyoushouldpayattentiontoistheindexrefreshrate.Weknowthatrefreshratespecifieshowfastthedocumentswillbevisibleforsearchoperations.Theequationisquitesimple-thefastertherefreshrate,theslowerthequerieswillbeandthelowertheindexingthroughput.Ifwecanallowourselvestohaveaslowerrefreshrate,suchas10sor30s,goforit.ItwillputlesspressureonElasticsearch,Lucene,andhardwareingeneral.Rememberthatbydefaulttherefreshrateissetto1s,whichbasicallymeansthattheindexsearcherobjectisreopenedeverysecond.

Togiveyouabitofinsightintowhatperformancegainswearetalkingabout,wedidsomeperformancetestsincludingElasticsearchanddifferentrefreshrates.Withtherefreshrateof1swewereabletoindexabout1000documentspersecondusingasingleElasticsearchnode.Increasingtherefreshrateto5sgaveusincreaseinindexingthroughputofmorethan25percentandwewereabletoindexabout1250documentspersecond.Settingtherefreshrateto25sgaveusabout70percentofmorethroughputascomparedto1srefreshrate,whichwasabout1700documentspersecondonthesameinfrastructure.Itisalsoworthrememberingthatincreasingthetimeindefinitelydoesn’tmakemuchsense,becauseafteracertainpoint(dependingonyourdataloadandtheamountofdatayouhave)theincreaseofperformanceisnegligible.

Someperformancecomparisonsrelatedtoindexingthroughputandindexrefreshratecanbefoundintheblogpostathttp://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/.

ThreadpoolstuningBydefault,Elasticsearchcomeswithverygooddefaultswhenitcomestoallthreadpoolsconfiguration.Youshouldrememberthattuningthedefaultthreadpoolsconfigurationshouldbedoneonlywhenyoureallyseethatyournodesarefillingupthequeuesandtheyhavestillprocessingpowerleftthatcouldbedesignatedtotheprocessingofthewaitingoperationsorwhenyouwanttoincreasethepriorityofoneormoreoperations.

Forexample,ifyoudidyourperformancetestsandyousawyourElasticsearchinstancesnotbeingsaturated100percent,butontheotherhandyouexperiencedarejectedexecutionerror,thenthatisapointwhenyoushouldstartadjustingthethreadpools.Youcaneitherincreasetheamountofthreadsthatareallowedtobeexecutedatthesametimeorincreasethequeue.Ofcourse,youshouldalsorememberthatincreasingthenumberofconcurrentlyrunningthreadstoveryhighnumberswillleadtomanyCPUcontextswitches(http://en.wikipedia.org/wiki/Context_switch)whichwillresultinaperformance

www.EBooksWorld.ir

Page 696: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

drop.

AutomaticstorethrottlingBeforeElasticsearch2.0,wehadtocareabouthowoursegmentprocesswasconfiguredandhowmuchdiskI/Omergingcoulduseingeneral,butthatchanged.RightnowElasticsearchlooksathowI/Osubsystembehavesandadjuststhethrottlingandmergingprocessifthemergesarefallingbehindtheindexing.So,wenolongerneedtoautomaticallyadjustthrottlingfordiskbasedoperations.YoucanreadmoreabouttherelatedchangesonGitHubathttps://github.com/elastic/elasticsearch/pull/9243.

Handlingtime-baseddataWhenyouhavetime-baseddata,suchaslogsforexample,thearchitectureofyourindicesplaysaveryimportantrole.Let’sassumethatwehavelogsindexedintoElasticsearch.Theseusuallycomeinlargenumbers,areconstantlyindexed,andaretimerelated(aneventthatisloggedhappenedatacertainpointintime).TheassumptionisthatyouhaveacertainretentiontoyourdataandatimethatyouwouldlikethedatatobepresentandsearchableinElasticsearch.Afterthattime,youjustdeletethedataandforgetaboutit.

Withsuchassumptionsinmind,youcouldjustcreateasingleindexwithlotofshardsandtrytoindexlargeamountsoflogsthere.However,that’snottheperfectsolution.Firstofall,becauseofmerges–thelargertheindexgets,themoreexpensivethemergesare.ElasticsearchneedstomergelargerandlargersegmentsandmoreI/OandCPUisrequiredtohandlethem.Thismeansslowdowns.Inadditiontothat,deleteswillbeexpensivebecauseyouwillhavetodeletethedataeitherbyusingTTLorbyusingdeletebyqueryplugin–bothexpensivetouseintermsofperformanceandwillcauseevenmoremerging.Andthisisnoteverything–duringqueryingyouwillhavetorunthroughthewholeindextogeteventhesmallestsliceofthedata.So,aretherebetterindexarchitecturesfortime-baseddata?

Yes,oneofthemostcommonandbestsolutionsistousetimebasedindices.Dependingonthedatavolume,youcanhavedaily,weekly,monthly,orevenhourlyindices.Thedownsideisthenumberofshardsyouwillhavewhenthenumberofindicesgrow,butapartfromthatthereareonlypros:youcancontroleachindex,changethenumberofshardsifthatisneeded,andhavefastermergingbecausetheindiceswillbesmallercomparedtoonlyonebigindex.What’smore,deletingdatawon’tbepainfulatall–theideaistodeletethewholeindices;forexample,adayworthofdataincaseofdailyindices.Querieswillalsobenefit–youcanjustrunthequeryonasingletimebasedindextonarrowdownthesearchresults.Finally,Elasticsearch,bydefault,willcreatetheindicesforus.Forexample,whenusingdailyindices,wecanhavenamessuchaslogs_2016-01-01,logs_2016-01-02,andsoon.

TheonlythingweneedtocareaboutisprovidingtheindexnameonthebasisofthedateandcreatingtemplatestoconfigureeachnewlycreatedindexandElasticsearchwilldotherest.

Multipledatapaths

www.EBooksWorld.ir

Page 697: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

WiththereleaseofElasticsearch2.0,weweregiventheabilitytospecifymultiplepath.datapropertiesinourelasticsearch.ymlpointingtodifferentdirectoriesondifferentphysicaldevices.Elasticsearchcannowleveragethatbyputtingdifferentshardsondifferentdevicesandusingthemultiplepathsinthemostefficientway.Becauseofthat,wecanparallelizewritingtodisksifwehavemorethanasingledisk.Thisisespeciallyusefulforhighindexingusecaseswhereyouindexalotofdata.

DatadistributionAsweknow,eachindexintheElasticsearchworldcanbedividedintomultipleshardsandeachshardcanhavemultiplereplicas.IncaseswhenyouhavemultipleElasticsearchnodes(andyouwillprobablyhaveinproduction),youshouldthinkaboutthenumberofshardsandreplicasandhowthatwillaffectyournodes.Datadistributionmaybecrucialtoeventheloadontheclusterandnothavesomenodesdoingmoreworkthantheotherones.

Let’stakethefollowingexample.Imaginewehaveaclusterthatisbuiltof4nodesandithasasingleindexcalledbookbuiltof3shardsandonereplica.Suchadeploymentwilllookasfollows:

Asyoucansee,thefirsttwonodeshavetwophysicalshardsallocatedtothem,whilethelasttwonodeshaveonlyoneshardallocatedeach.Theactualdataallocationisnoteven.Whensendingthequeriesandindexingdata,wewillhavethefirsttwonodesdomoreworkthantheothertwo-thisiswhatwewanttoavoid.Oneoptionistohavethebookindexhavetwoshardsandonereplica,soitlooksasfollows:

www.EBooksWorld.ir

Page 698: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Thisarchitecturewillworkanditisperfectlyfine.Wedon’thavetohaveprimaryshardsonallournodes,wecanhavereplicas,dependingonwhatbottleneckweexpect.Forqueryingwemaywanttohavemorereplicas,forindexingmoreprimaries.

Wecanalsohaveourprimaryshardssplitevenly,likeinthefollowingimage:

ThethingtorememberthoughisthatinbothcaseswewillendupwithevendistributionofshardsandreplicasandElasticsearchwilldosimilaramountofworkonallthenodes.Ofcourse,withmoreindices(likehavingdailyindices)itmaybetrickiertogetthedataevenlydistributedanditmaynotbepossibletohaveevenlydistributedshards,butweshouldtrytogettosuchpoint.

Onemorethingtorememberwhenitcomestodatadistributionandshardsandreplicasisthatwhendesigningyourindexarchitecture,youshouldrememberwhatyouwanttoachieve.Ifyouaregoingforaveryhighindexingusecase,youmaywanttospreadtheindexintomultipleshardstolowerthepressurethatisputontheCPUandtheI/Osubsystemoftheserver.Thisisalsotrueforrunningexpensivequeries,becausewithmoreshardsyoucanlowertheloadonasingleserver.However,withthequeriesthereis

www.EBooksWorld.ir

Page 699: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

onemorething-ifyournodescan’tkeepupwiththeloadcausedbyqueries,youcanaddmoreElasticsearchnodesandincreasethenumberofreplicassothatthephysicalcopiesoftheprimaryshardsareplacedonthosenodes.Thatwillmakeindexingabitslowerbutwillgiveyouthecapacitytohandlemorequeriesatthesametime.

BulkindexingThisisveryobviousadvice,butyouwouldbesurprisedhowmanyElasticsearchusersforgetaboutindexingdatainbulksinsteadofsendingthedocumentsonebyone.Sotheadvicehereistodobulksinsteadofonebyoneindexingwheneverpossible.ThethingtorememberthoughisnottooverloadElasticsearchwithtoomanybulkrequestsandtokeepthemunderareasonablesize(donotpushmillionsofdocumentsinasinglerequest).Rememberaboutthebulkthreadpoolanditssizeandtrytoadjustyourindexersnottogobeyonditoryouwillfirststarttoqueuetheserequestsand,ifElasticsearchwillnotbeabletoprocessthem,youwillquicklystartseeingrejectedexecutionexceptionsandyourdatawon’tbeindexed.

Justasanexample,wewouldliketoshowresultsoftestswedidsometimeagoforthetwotypesofindexing:onebyoneandbulks.Inthefollowingimage,wehavetheindexingthroughputwhenrunningindexationonedocumentbyone:

Inthisnextimage,wedothesame,butinsteadofindexingdocumentsonebyone,weindextheminbatchesof10documents(whichisstillarelativelylownumberofdocumentsinabulk):

www.EBooksWorld.ir

Page 700: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Asyoucansee,whenindexingdocumentsonebyone,wewereabletoindexabout30documentspersecondanditwasstable.Thesituationchangedwithbulkindexingandbatchesof10documents;wewereabletoindexslightlymorethan200documentspersecond.Sothedifferencecanbeclearlyseen.

Ofcoursethisisaverybasiccomparisonofindexingspeed.Toshowtherealdifference,weshouldusedozensofthreadsandpushElasticsearchtoitslimits.However,theprecedingcomparisonshouldgiveyouabasicviewoftheindexingthroughputgainswhenusingbulkindexing.

RAMbufferforindexingRemember,themoreavailableRAMfortheindexingbuffer(theindices.memory.index_buffer_sizeproperty),themoredocumentsElasticsearchcanholdinmemory.However,wedon’twanttohaveElasticsearchoccupy100percentoftheavailablememory.Theindexingbuffercanhelpuswithdelayingtheflushtodisk,whichwillmeanlessI/Opressureandlessmerges.YoucanreadmoreaboutindexingbufferconfigurationinChapter9,ElasticsearchClusterinDetail.

www.EBooksWorld.ir

Page 701: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

AdviceforhighqueryratescenariosOneofthegreatfeaturesofElasticsearchisitsabilitytosearchandanalyzethedatathatwasindexed.However,sometimesitisnecessarytoadjustElasticsearchandourqueriestonotonlygettheresultsofthequery,butalsogetthemfast(orinareasonableamountoftime).Inthissection,wewilllookatthepossibilitiesofpreparingElasticsearchforhighquerythroughputusecases,butnotjustthat.Wewillalsolookatgeneralperformancetipswhenitcomestoquerying.

ShardrequestcacheThepurposeoftheshardrequestcacheistocacheaggregations,suggesterresults,andnumbersofhits(itwillnotcachethereturneddocumentsandthusonlyworkswithsize=0).Whenyourqueriesuseaggregationsorsuggestions,itmaybeagoodideatoenablethiscache(itisdisabledbydefault)sothatElasticsearchcanre-usethedatastoredthere.Thebestthingaboutthecacheisthatitpromisesthesamenearreal-timesearchasasearchthatisnotcached.YoucanreadmoreaboutcachesandtheshardrequestcacheinparticularinChapter9,ElasticsearchClusterinDetail.

ThinkaboutthequeriesThisisthemostgeneraladvicewecanactuallygive–youshouldalwaysthinkaboutoptimalquerystructure,filterusage,andsoon.Forexample,let’slookatthefollowingquery:

{

"query":{

"bool":{

"must":[

{

"query_string":{

"query":"masteringANDdepartment:itANDcategory:book",

"default_field":"name"

}

},

{

"term":{

"tag":"popular"

}

},

{

"term":{

"tag":"2014"

}

}

]

}

}

}

Itreturnsthebookmatchingafewconditions.However,thereareafewthingswecanimproveintheprecedingquery.Forexample,wecanmovethestaticthingssuchasthe

www.EBooksWorld.ir

Page 702: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

tag,department,andcategoryfieldrelatedconditionstothefiltersectionoftheBooleanquery,sothatthenexttimeweusesomepartsofthequerywesaveCPUcyclesandre-usetheinformationstoredincache.Thatstaticfilteringinformationisalsonotrelevantwhenitcomestoscoring.Becauseofthatwecanmovethosestaticelementstothefiltersectionandomitscoringcalculationforthem.Forexample,thisishowtheoptimizedquerywilllooklike:

{

"query":{

"bool":{

"must":[

{

"query_string":{

"query":"mastering",

"default_field":"name"

}

}

],

"filter":[

{

"term":{

"tag":"popular"

}

},

{

"term":{

"tag":"2014"

}

},

{

"term":{

"department":"it"

}

},

{

"term":{

"category":"book"

}

}

]

}

}

}

Asyoucansee,thereareafewthingsthatwedid.Westillusedtheboolquery,butweintroducedtheuseofthefiltersection.Weusedfilteringforthestatic,non-analyzedfields.Thisallowsustoeasilyre-usethefiltersinthenextqueriesthatweexecute.Becauseofsuchqueryrestructuring,wewereabletosimplifythemainquery.Thisisexactlywhatyoushouldbedoingwhenoptimizingyourqueriesordesigningthem-haveoptimizationandperformanceinmindandtrytokeepthemasoptimalastheycanbe.Thiswillresultinfasterexecutionofthequeries,lowerresourceconsumption,andbetterhealthofthewholeElasticsearchcluster.

www.EBooksWorld.ir

Page 703: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ParallelizeyourqueriesOnethingthatisusuallyforgottenistheneedofparallelizingqueries.Imaginethatyouhaveadozennodesinyourclusterbutyourindexisbuiltofasingleshard.Iftheindexislarge,yourquerieswillperformworsethanyouexpect.Ofcourseyoucanincreasethenumberofreplicas,butthatwon’thelp.Asinglequerywillstillgotoasingleshardinthatindex,becausereplicasarenotmorethanthecopiesoftheprimaryshardandtheycontainthesamedata(oratleasttheyshould).Thisisalsotruenotonlyforindiceshavingoneshardbutalsoifyouhavemorethanoneshard,buttheyareverylarge,youcanstillhaveperformancerelatedproblems.Itissaidthatthequeryisonlyasfastastheslowestpartialqueryresponse.

Ofcourse,theparallelizationalsodependsontheusecase.IfyourunalotofqueriestoElasticsearch,youmaynotneedtoparallelizethequeries,especiallywhentheshardsaresmallenoughandyoudon’tseeproblemsatshardlevel.Ingeneral,lookatyourElasticsearchnodesandseeiftheyhaveunusedCPUcoresand,ifthat’sthecase,youmayhaveroomforimprovementandparallelization.

FielddatacacheandbreakingthecircuitWehavetwodifferentfactorswecantunetobesurethatwedon’trunintooutofmemoryerrors.Firstofall,wecanlimitthesizeofthefielddatacache.Thesecondthingisthecircuitbreaker,whichwecaneasilyconfiguretojustthrowanexceptioninsteadofloadingtoomuchdata.Combiningthesetwothingswillensurethatwedon’trunintomemoryissues.Evenifyouareusingdocvaluesalot,youmaystillrunintooutofmemoryissues.Forexample,foranalysedfields,whichcan’tusedocvaluesandwilluse,fielddatacache–configurethefielddatacacheandcircuitbreakerscorrectly.YoucanreadmoreabouthowtoconfiguretheminChapter9,ElasticsearchClusterinDetail.

KeepsizeandshardsizeundercontrolWhendealingwithsomeofthequeriesthatuseaggregations,wehavethepossibilityofusingtwoproperties:sizeandshard_size.Thesizeparameterdefineshowmanybucketsshouldbereturnedbythefinalaggregationresults;thenodethataggregatesthefinalresultswillgetthetopbucketsfromeachshardthatreturnstheresultandwillonlyreturnthetopsizeofthemtotheclient.Theshard_sizeparametertellsElasticsearchaboutthesamebutattheshardlevel.Increasingthevalueoftheshard_sizeparameterwillleadtomoreaccurateaggregations(likeinthecaseofsignificanttermsaggregation)atthecostofnetworktrafficandmemoryusage.Loweringthatparameterwillcauseaggregationresultstobelessprecise,butwewillbenefitfromlowermemoryconsumptionandlowernetworktraffic.Ifweseethatthememoryusageistoolarge,wecanlowerthesizeandshard_sizepropertiesforproblematicqueriesandseeifthequalityoftheresultsisstillacceptable.

www.EBooksWorld.ir

Page 704: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 705: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MonitoringElasticsearchmonitoringAPIsexposealotofinformation,bothaboutthesearchengineitselfaswellasabouttheenvironment,suchastheoperatingsystem.WesawthatinChapter10,AdministratingYourCluster.Becauseofthisandtheeaseofretrievingthisinformation,numerousapplicationswerebuilt–onesthatallowustodomonitoringandbeyond.Someoftheseapplicationsaresimpleandjustreadthedatainrealtimewithoutanypersistentstorage,whileothersallowustoreadhistoricaldataaboutourclusterbehavior.Inthischapter,wewillonlyslightlytouchthetopofthepileofinformationaboutsuchapplications,butwestronglyadviseyoutogetfamiliarwithsomeofthemastheycanmakeyoureverydayworkwithElasticsearcheasier.

WechosethreeexamplesofmonitoringsolutionswhichtakeadifferentapproachofintegrationwithElasticsearch.ThefirsttwotoolsareavailableasElasticsearchpluginsandthethirdtakesadifferentapproachtointegration.

www.EBooksWorld.ir

Page 706: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ElasticsearchHQThistoolisavailableasanElasticsearchpluginbutcanalsobedownloadedseparatelyasaJavaScriptapplicationruninabrowser.

ElasticsearchHQusesJavaScriptandAJAXtechniqueswheredataisfetchedperiodicallyfromthecluster,preparedforvisualizationonthebrowserside,andshowntotheuser.

Thetoolallowsustotrackstatisticsonaparticularnode.Thebrowsercanpresentvitalinformationabouttheclusterandparticularnodes.ThefollowingscreenshotshowsthegraphicaluserinterfacefromElasticsearchHQ:

Wehavethebasicinformationaboutthecluster,thenumberofnodes,andElasticsearchhealth.Wecanalsoseewhichnodewearelookingatandsomestatisticsaboutthenode,whichincludethememoryusage(bothheapandnon-heap),thenumberofthreads,Javavirtualmachinegarbagecollectorwork,andsoon.Thepluginalsopresentssimplifiedinformationaboutschemaandshardsandallowsexecutionofsimplequeries.

InordertoinstallElasticsearchHQ,oneshouldjustrunthefollowingcommand:

bin/plugininstallroyrusso/elasticsearch-HQ

Afterthat,theGUIwillbeavailableathttp://localhost:9200/_plugin/hq/.

OnethingtorememberisthatElasticsearchHQdoesn’tpersistthefetcheddataanywhere,sothedataisonlyfetchedwhenyourbrowserisrunningandhasElasticsearchHQopened.Ifsomethinghashappenedinthepast,youwon’tbeabletodiagnoseit.

www.EBooksWorld.ir

Page 707: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

MarvelMarvelisthetoolcreatedbytheElasticsearchteam.Inthecurrentversion,itisbuiltasapluginforavisualizationplatformcalledKibana(https://www.elastic.co/products/kibana).

NoteKibanaisoutofthescopeofthisbook.YoucanfindmoreaboutKibanaonofficialproductpageavailableat

https://www.elastic.co/.

Marvelalsovisualizesbasicinformationaboutclustersandnodesbydrawingnicegraphsthataredynamicallyupdatedovertime.ThemaindifferencefromElasticsearchHQisthattheperformancedataisstoredontheserverside(inthesameorexternalElasticsearchcluster),sohistoricaldataisavailable.Theexamplescreenshotispresentednext:

TheinstallationprocedureforMarvelcontainsthreesteps:

bin/plugininstalllicense

bin/plugininstallmarvel-agent

Andfinally,thethirdstepistoinstalltheMarvelplugininKibanabyrunningthefollowingcommand:

bin/kibanaplugin--installelasticsearch/marvel/latest

www.EBooksWorld.ir

Page 708: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SPMforElasticsearchThistoolpresentsadifferentapproachthanthepreviouslymentionedtools.SPMisaSoftwareasaService(SaaS)solutioncreatedformonitoringElasticsearchinstallationsofanysizeandallowsmonitoringseveralclustersanddifferenttechnologies.ThoughitsrootsareSaaS-based,itisalsoavailableonpremises,whichmeansthatyoucanrunSPMonyourownmachineswithouttheneedforsendingyourmetricstocloud.

InformationissentbysimpleclientsoftwareinstalledontheElasticsearchmachinetotheSPMservers.Themainadvantageisthepossibilityofstoringinformationforawiderrangeoftimeandseeingwhatwashappeninginthepast.Youcancreateyourowndashboardsandcorrelatemetricswithlogsbetweenmultipleapplications(SPMallowsyoutomonitorawidevarietyofapplications).

ThefollowingscreenshotshowsthedashboardofSPMforElasticsearch:

Theoverviewdashboardshownintheprecedingscreenshotprovidesinformationabouttheclusternodes,therequestrateandlatency,thenumberofdocumentsintheindices,CPUusage,load,memorydetails,Javavirtualmachinememory,thediskspaceusage,andfinallynetworktraffic.Youcangetdetailedinformationabouteachoftheseelementsbygoingintothetabdedicatedtoit.

YoucanfindadditionalinformationaboutSPMinstallationandavailableoptionsathttp://sematext.com/spm/index.html.

www.EBooksWorld.ir

Page 709: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

www.EBooksWorld.ir

Page 710: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

SummaryInthischapter,wefocusedonscalingandtuningElasticsearch.Westartedwiththehardwarepreparationsanddecisionsweneedtomake.Next,wetunedasingleElasticsearchnodeasmuchaswecouldandafterthatweconfiguredthewholeclustertoworkaswellasitcould.Wediscussedverticalexpansionpossibilitiesandwelearnedhowtomonitorourclusteronceithitstheproductionenvironment.

Sonowwehavereachedtheendofthebook.Wehopethatitwasanicereadingexperienceandthatyoufoundthebookinteresting.Sincethepreviouseditionofthebook,Elasticsearchhaschangedalot.Notonlywhenitcomestoversions,butalsowhenitcomestofunctionalities.Someofthefeaturesarenolongerthere,someofthemweremovedtoplugins,andofcoursenewfeatureswereadded.WereallyhopethatyouhavelearnedsomethingfromthisbookandnowyouwillfinditeasiertouseElasticsearcheveryday–nomatterifyouareabeginnerinthisworldorasemi–experiencedElasticsearchuser.Astheauthorsofthisbook,butalsoasElasticsearchusersourselves,wetriedtobringyou,ourreaders,thebestreadingexperiencewecould.OfcourseElasticsearchismorethanwedescribedinthebook,especiallywhenitcomestomonitoringandadministrationcapabilitiesandAPI.However,thenumberofpagesislimitedandifweweretodescribeeverythingingreatdetailswewouldhaveendedupwithabookonethousandpageslong.WeneedtorememberthatElasticsearchisnotonlyuserfriendlybutalsoprovidesalargeamountofconfigurationoptions,queryingpossibilities,andsoon.Duetothat,wehadtochoosewhichfunctionalitiestodescribeingreaterdetails,whichhadtobeonlymentioned,andwhichhadtobetotallyskipped.Aswiththetwopreviouseditionsofthebookyouareholding,wehopethatwemadetherightchoiceandthatyouarehappyaboutwhatyou’veread.

WewouldalsoliketosaythatitisworthrememberingthatElasticsearchisconstantlyevolving.Whenwritingthisbook,wewentthroughafewstableversionsfinallymakingittothereleaseofElasticsearch2.2.Evenbackthenweknewthatnewfeaturesandimprovementswerecoming,likesomeofthechangesmentionedinthebookthatwillbepartofthenextrelease,oratleasttheyareplannedtobe.BesuretochecktheofficialdocumentationofElasticsearchperiodicallyforthereleasenotesfornewversionsofElasticsearch,ifyouwanttobeuptodatewiththenewfeaturesbeingadded.Wewillalsobewritingaboutnewfeaturesthatwethinkareworthmentioningonwww.elasticsearchserverbook.com.Soifyouareinterested,visitthesitefromtimetotime.

Onceagainthankyouforthetimeyou’vespentwiththebook.

www.EBooksWorld.ir

Page 711: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

IndexA

advices,forhighqueryratescenariosabout/Adviceforhighqueryratescenariosshardrequestcache/Shardrequestcachequeries/Thinkaboutthequeriesqueries,parallelizing/Parallelizeyourqueriesfielddatacache/Fielddatacacheandbreakingthecircuitcircuit,breaking/Fielddatacacheandbreakingthecircuitsize,controlling/Keepsizeandshardsizeundercontrolshardsize,controlling/Keepsizeandshardsizeundercontrol

aggregationengineworking/Insidetheaggregationsengine

aggregationsabout/Aggregationsgeneralquerystructure/Generalquerystructuretypes/Aggregationtypesdate_histogram/Datehistogramaggregationgeodistanceaggregations/Geodistanceaggregationsgeohashgridaggregation/Geohashgridaggregationglobalaggregation/Globalaggregationsignificant_termsaggregation/Significanttermsaggregationsampleraggregation/Sampleraggregationchildrenaggregation/Childrenaggregationnestedaggregation/Nestedaggregationreverse_nestedaggregation/Reversenestedaggregationnestingaggregations/Nestingaggregationsandorderingbuckets

aggregations,typesmetrics/Aggregationtypes,Metricsaggregationsbuckets/Aggregationtypes,Bucketsaggregationspipeline/Aggregationtypes

AmazonURL/Costandperformanceflexibility

AmazonS3URL/Creatingasnapshotrepository

AnalyzeAPIURL/Definingyourownanalyzers

analyzersusing/Usinganalyzersout-of-the-boxanalyzers/Out-of-the-boxanalyzersdefining/Definingyourownanalyzersdefaultanalyzers/Defaultanalyzers

www.EBooksWorld.ir

Page 712: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ApacheLucene/GettingbacktoApacheLuceneURL/Fulltextsearchingglossary/TheLuceneglossaryandarchitecturearchitecture/TheLuceneglossaryandarchitecturedocument/TheLuceneglossaryandarchitecturefield/TheLuceneglossaryandarchitectureterm/TheLuceneglossaryandarchitecturetoken/TheLuceneglossaryandarchitecturetokenizer/Inputdataanalysisscoring/IntroductiontoApacheLucenescoring

ApacheLuceneJavadocsfortheTFIDFURL/Scoringandqueryrelevance

ApacheLucenescoringabout/IntroductiontoApacheLucenescoringdocumentmatching,factors/Whenadocumentismatcheddefaultscoringformula/Defaultscoringformularelevantdocuments/Relevancymatters

ApacheSolrURL/UsingApacheSolrsynonyms

ApacheSolrsynonymsusing/UsingApacheSolrsynonymsexplicitsynonyms/Explicitsynonymsequivalentsynonyms/Equivalentsynonymsexpandproperty/Expandingsynonyms

ApacheTikaURL/Detectingthelanguageofthedocument

arbitrarygeoshapesabout/Arbitrarygeoshapespoint/Pointenvelope/Envelopepolygon/Polygonmultipolygon/Multipolygonexampleusage/Anexampleusagestoring,inindex/Storingshapesintheindex

arguments,CatAPIURL/Commonarguments

attributes,indexstructuremappingindex_name/Commonattributesindex/Commonattributesstore/Commonattributesdoc_values/Commonattributesboost/Commonattributesnull_value/Commonattributescopy_to/Commonattributes

www.EBooksWorld.ir

Page 713: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

include_in_all/Commonattributesprecision_step/Number,Datecoerce/Numberignore_malformed/Number,Dateformat/Dateformat,referencelink/Datenumeric_resolution/Date

availableobjects,scriptexecution_doc/Objectsavailableduringscriptexecution_source/Objectsavailableduringscriptexecution_fields/Objectsavailableduringscriptexecution

AzureURL/Creatingasnapshotrepository

www.EBooksWorld.ir

Page 714: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Bbasicqueries

about/Basicqueriestermquery/Thetermquerytermsquery/Thetermsquerymatchallquery/Thematchallquerytypequery/Thetypequeryexistsquery/Theexistsquerymissingquery/Themissingquerycommontermsquery/Thecommontermsquerymatchquery/Thematchquerymultimatchquery/Themultimatchqueryquerystringquery/Thequerystringquerysimplequerystringquery/Thesimplequerystringqueryidentifiersquery/Theidentifiersqueryprefixquery/Theprefixqueryfuzzyquery/Thefuzzyquerywildcardquery/Thewildcardqueryrangequery/Therangequeryregularexpressionquery/Regularexpressionquerymorelikethisquery/Themorelikethisquery

batchindexingused,forspeedingupindexingprocess/Batchindexingtospeedupyourindexingprocess

Booleanpropertiessetnode.master/Configuringnoderolesnode.data/Configuringnoderolesnode.client/Configuringnoderoles

boolqueryabout/Theboolqueryshouldsection/Theboolquerymustsection/Theboolquerymust_notsection/Theboolqueryfilterparameter/Theboolqueryboostparameter/Theboolqueryminimum_should_matchparameter/Theboolquerydisable_coordparameter/Theboolqueryused,forexplicitfiltering/Explicitfilteringwithboolquery

boostingquery/Theboostingqueryboost_modeparameter

multiplyvalue/Structureofthefunctionqueryreplacevalue/Structureofthefunctionquerysumvalue/Structureofthefunctionquery

www.EBooksWorld.ir

Page 715: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

avgvalue/Structureofthefunctionquerymaxvalue/Structureofthefunctionqueryminvalue/Structureofthefunctionquery

bucketaggregationsordering/Nestingaggregationsandorderingbuckets,Bucketsordering

buckets/Generalquerystructurebucketsaggregations

about/Bucketsaggregationsfilteraggregation/Filteraggregationfiltersaggregation/Filtersaggregationtermsaggregation/Termsaggregationrangeaggregation/Rangeaggregationdate_rangeaggregation/Daterangeaggregationip_rangeaggregation/IPv4rangeaggregationmissingaggregation/Missingaggregationhistogramaggregation/Histogramaggregation

bulkindexingdata,preparing/Preparingdataforbulkindexing

www.EBooksWorld.ir

Page 716: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Ccaches

about/Elasticsearchcachesfielddatacache/Fielddatacachefielddata,usingwithdocvalues/Fielddataanddocvaluesshardrequestcache/Shardrequestcachenodequerycache/Nodequerycacheindexingbuffers/Indexingbuffersavoiding,scenarios/Whencachesshouldbeavoided

CatAPIabout/TheCatAPIdefining/Thebasicsusing/UsingCatAPIcommonarguments/Commonargumentsexamples/Theexamples,Gettinginformationaboutthenodes

childrenaggregationabout/Childrenaggregation

CIDRnotationURL/IPv4rangeaggregation

ClassDateTimeFormatURL/Tuningthetypedeterminingmechanismfordates

clientnodeabout/Noderoles,Clientnode

clusterabout/Nodesandclustersinstalling/Installingandconfiguringyourclusterconfiguring/Installingandconfiguringyourclusterdirectorylayout/Thedirectorylayoutsystem-specificinstallationandconfiguration/Thesystem-specificinstallationandconfiguration

clusterhealthAPIabout/ClusterhealthAPIinformationdetails,controlling/Controllinginformationdetailsadditionalparameters/Additionalparameters

clusterrebalancingcontrolling/Controllingclusterrebalancingdefining/Understandingrebalanceimplementing/Clusterbeingreadysettings/Theclusterrebalancesettings,Controllingthenumberofshardsbeingmovedbetweennodesconcurrently

clustersettingsAPI/TheclustersettingsAPIclusterwideallocation

about/Cluster-wideallocation

www.EBooksWorld.ir

Page 717: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

allocationawareness/Allocationawarenessallocationawareness,forcing/Forcingallocationawarenessfiltering/Filtering

CMSsystemURL/Creatinganewdocument

commontermsquery/Thecommontermsquerycompletionsuggester

about/CompletionsuggesterinElasticsearch2.2/Completionsuggester

completionsuggester,Elasticsearch2.1data,indexing/Indexingdataindexeddata,querying/Queryingindexedcompletionsuggesterdatacustomweights/Customweights

completionsuggester,Elasticsearch2.2about/Completionsuggester

compoundqueriesabout/Compoundqueriesboolquery/Theboolquerydis_maxquery/Thedis_maxqueryboostingquery/Theboostingqueryconstant_scorequery/Theconstant_scorequeryindicesquery/Theindicesquery

compressedoopsURL/Thememory

compressedordinaryobjectpointersreferencelink/MultipleElasticsearchinstancesonasinglephysicalmachine

configurationoptions,phrasesuggestermax_errors/Configurationseparator/Configuration

configurationoptions,termsuggestertext/Termsuggesterconfigurationoptionsfield/Termsuggesterconfigurationoptionsanalyzer/Termsuggesterconfigurationoptionssize/Termsuggesterconfigurationoptionssuggest_mode/Termsuggesterconfigurationoptionssort/Termsuggesterconfigurationoptions

constant_scorequery/Theconstant_scorequerycontent

searching,indifferentlanguages/Searchingcontentindifferentlanguagescontent,searchingindifferentlanguages

about/Searchingcontentindifferentlanguageslanguages,handling/Handlinglanguagesdifferentlymultiplelanguages,handling/Handlingmultiplelanguagesdocumentlanguage,detecting/Detectingthelanguageofthedocument

www.EBooksWorld.ir

Page 718: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

sampledocument/Sampledocumentmappings/Themappingsdata,querying/Queryingqueries,combining/Combiningqueries

contextsuggesterabout/Contextsuggestertypes/Contexttypesusing/Usingcontextgeolocationcontext,using/Usingthegeolocationcontext

contextswitchesreferencelink/Threadpoolstuning

coretypes,indexstructuremappingabout/Coretypescommonattributes/Commonattributesstring/Stringnumber/Numberboolean/Booleanbinary/Binarydate/Date

counttoitfield/Addingpartialdocumentscreate,retrieve,update,delete(CRUD)

URL/ManipulatingdatawiththeRESTAPIcURLcommand

URL/InstallingElasticsearch

www.EBooksWorld.ir

Page 719: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Ddata

manipulating,withRESTAPI/ManipulatingdatawiththeRESTAPIstoring,inElasticsearch/StoringdatainElasticsearchpreparing,forbulkindexing/Preparingdataforbulkindexingindexing/Indexingthedata_allfield/The_allfield_sourcefield/The_sourcefieldinternalfields/Additionalinternalfieldssorting/Sortingdatadefaultsorting/Defaultsortingquerying,inchilddocuments/Queryingdatainthechilddocumentsquerying,inparentdocuments/Queryingdataintheparentdocuments

data,thatisnotflatindexing/Indexingdatathatisnotflatdata/Dataobjects/Objectsarrays/Arraysmappings/Mappingsdynamicbehavior/Tobeornottobedynamicobjectindexing,disabling/Disablingobjectindexing

datanodeabout/Noderoles

dataquerying,casesidentifiedlanguage,using/Querieswithanidentifiedlanguageunknownlanguage,using/Querieswithanunknownlanguage

datasetsforegroundsets/Choosingsignificanttermsbackgroundsets/Choosingsignificantterms

datasortingabout/Sortingdatadefaultsorting/Defaultsortingfields,selecting/Selectingfieldsusedforsortingmode/Sortingmodebehaviorformissingfields,specifying/Specifyingbehaviorformissingfieldsdynamiccriteria/Dynamiccriteriascoring,calculating/Calculatescoringwhensorting

date_histogramaggregationsabout/Datehistogramaggregationtimezones/Timezones

DEBpackageused,forinstallingElasticsearch/InstallingElasticsearchusingtheDEBpackage

www.EBooksWorld.ir

Page 720: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

defaultindexing/Defaultindexingderivativeaggregation

URL/Derivativeaggregationdesignatednodesrolesforlargerclusters

about/Designatednoderolesforlargerclustersqueryaggregatornodes/Queryaggregatornodesdatanodes/Datanodesmastereligiblenodes/Mastereligiblenodes

DigitalOceanURL/Costandperformanceflexibility

directorylayout,clusterbin/Thedirectorylayoutconfig/Thedirectorylayoutlib/Thedirectorylayoutmodules/Thedirectorylayoutdata/Thedirectorylayoutlogs/Thedirectorylayoutplugins/Thedirectorylayoutwork/Thedirectorylayout

disk-basedshardallocationabout/Disk-basedshardallocationconfiguring/Configuringdiskbasedshardallocationdisabling/Disablingdiskbasedshardallocation

dis_maxquery/Thedis_maxqueryDocker

referencelink/MultipleElasticsearchinstancesonasinglephysicalmachinedocument

about/Documentcreating/Creatinganewdocumentautomaticidentifiercreation,creating/Automaticidentifiercreationretrieving/Retrievingdocumentsupdating/Updatingdocumentsnon-existingdocuments,dealingwith/Dealingwithnon-existingdocumentspartialdocuments,adding/Addingpartialdocumentsdeleting/Deletingdocuments

documenttype/Documenttypedoubletype

URL/Numberdynamictemplates

about/Templatesanddynamictemplates,Dynamictemplatesmatchingpattern/Thematchingpatterntargetfielddefinition,writing/Fielddefinitions

www.EBooksWorld.ir

Page 721: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

EElasticsearch

about/ThebasicsofElasticsearchkeyconcepts/KeyconceptsofElasticsearchindex/Indexdocument/Documentdocumenttype/Documenttypemapping/Mappingindexing/Indexingandsearching,Elasticsearchindexingsearching/IndexingandsearchingURL/InstallingElasticsearch,Availablesimilaritymodelsinstalling/InstallingElasticsearchrunning/RunningElasticsearchshuttingdown/ShuttingdownElasticsearchconfiguring/ConfiguringElasticsearchinstalling,withRPMpackage/InstallingElasticsearchusingRPMpackagesinstalling,withDEBpackage/InstallingElasticsearchusingtheDEBpackageconfigurationfiles,localization/Elasticsearchconfigurationfilelocalizationquerying/QueryingElasticsearch,Asimplequeryexampledata/Theexampledatapaging/Pagingandresultsizeresultsize,controlling/Pagingandresultsizeversionvalue,returning/Returningtheversionvaluescore,limiting/Limitingthescorereturnfields,selecting/Choosingthefieldsthatwewanttoreturnsourcefiltering/Sourcefilteringscriptfields,using/Usingthescriptfieldsparameters,passingtoscriptfields/Passingparameterstothescriptfieldsparametrs,passingtoscriptfields/Passingparameterstothescriptfieldsscriptingcapabilities/ScriptingcapabilitiesofElasticsearchspatialcapabilities/Elasticsearchspatialcapabilitiesreferencedocumentation,URL/Configurationplugins/Elasticsearchpluginscaches/Elasticsearchcacheshardwarepreparations/Hardwaremonitoring/MonitoringKibana,URL/Marvel

Elasticsearch2.1URL/Threadpools

Elasticsearch2.2completionsuggester/Completionsuggester

Elasticsearchclusterpreparing,forhighindexing/Preparingtheclusterforhighindexingand

www.EBooksWorld.ir

Page 722: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

queryingthroughputpreparing,forhighquerying/Preparingtheclusterforhighindexingandqueryingthroughput

ElasticsearchHQtoolusing/ElasticsearchHQ

Elasticsearchindexingabout/Elasticsearchindexingshards/Shardsandreplicasreplicas/Shardsandreplicasindices,creating/Creatingindices

Elasticsearchinfrastructurekeyconcepts/KeyconceptsoftheElasticsearchinfrastructurenode/Nodesandclusterscluster/Nodesandclustersshard/Shardsreplica/Replicasgateway/Gateway

Elasticsearchmonitoringabout/MonitoringElasticsearchHQtool,using/ElasticsearchHQMarveltool,using/MarvelSPMtool,using/SPMforElasticsearch

Elasticsearchtimemachineabout/Elasticsearchtimemachinesnapshotrepository,creating/Creatingasnapshotrepositorysnapshots,creating/Creatingsnapshotssnapshot,restoring/Restoringasnapshotparameters/Restoringasnapshotoldsnapshots,deleting/Cleaningup–deletingoldsnapshots

existsquery/TheexistsqueryExplainAPI

URL/Explainingthequeryexplaininformation

about/Understandingtheexplaininformationfieldanalysis/Understandingfieldanalysisquery,explaining/Explainingthequery

www.EBooksWorld.ir

Page 723: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Ffactors,forscorepropertycalculation

documentboost/Whenadocumentismatchedfieldboost/Whenadocumentismatchedcoord/Whenadocumentismatchedinversedocumentfrequency/Whenadocumentismatchedlengthnorm/Whenadocumentismatchedtermfrequency/Whenadocumentismatchedquerynorm/Whenadocumentismatched

FastVectorHighlighterURL/Underthehood

FedoraLinuxURL/InstallingElasticsearchusingRPMpackages

fielddatacacheabout/Fielddatacachesize,controlling/Fielddatasizecircuitbreakers/Circuitbreakers

fielddefinitionvariables,dynamictemplates{name}/Fielddefinitions{dynamic_type}/Fielddefinitions

filteringabout/Filteringinclude/Whatdoinclude,exclude,andrequiremeanrequire/Whatdoinclude,exclude,andrequiremeanexclude/Whatdoinclude,exclude,andrequiremean

filterslowercasefilter/Inputdataanalysissynonymsfilter/Inputdataanalysislanguagestemmingfilters/Inputdataanalysis

filtersandtokenizersURL/Definingyourownanalyzers

filtertypesURL/Definingyourownanalyzers

fulltextsearchingabout/FulltextsearchingApacheLucene,glossary/TheLuceneglossaryandarchitectureApacheLucene,architecture/TheLuceneglossaryandarchitectureinputdataanalysis/Inputdataanalysisindexing/Indexingandqueryingquerying/Indexingandqueryingscoring/Scoringandqueryrelevancequeryrelevance/Scoringandqueryrelevance

functionscorequery

www.EBooksWorld.ir

Page 724: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

about/Thefunctionscorequerystructure/Structureofthefunctionqueryweightfactorfunction/Theweightfactorfunctionfield_value_factorfunction/Fieldvaluefactorfunctionscript_scorefunction/Thescriptscorefunctionrandom_scorefunction/Therandomscorefunctiondecayfunctions/Decayfunctions

function_scorequeryURL/Decayfunctions

fuzzyqueryabout/Thefuzzyquery

www.EBooksWorld.ir

Page 725: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Ggateway/Gatewaygatewaymodule

about/Thegatewayandrecoverymodules,Thegatewaygatewayrecoveryoptions

gateway.recover_after_master_nodes/Additionalgatewayrecoveryoptionsgateway.recover_after_data_nodes/Additionalgatewayrecoveryoptionsgateway.expected_master_nodes/Additionalgatewayrecoveryoptionsgateway.expected_data_nodes/Additionalgatewayrecoveryoptions

generalpreparations,singleElasticsearchnodeabout/Thegeneralpreparationsswapping,avoiding/Avoidingswappingfiledescriptors/Filedescriptorsvirtualmemory/Virtualmemory,Thememory

Geo/Geoboundsaggregationgeodistanceaggregations

about/GeodistanceaggregationsGeohash

URL/Geohashgridaggregationgeohashgridaggregation

about/GeohashgridaggregationURL/Geohashgridaggregation

GeohashvalueURL/Exampledata

GeoJSONURL/Arbitrarygeoshapes

geospatialqueriesURL/Samplequeries

geo_fieldpropertiesgeohash/Additionalgeo_fieldpropertiesgeohash_precision/Additionalgeo_fieldpropertiesgeohash_prefix/Additionalgeo_fieldpropertiesignore_malformed/Additionalgeo_fieldpropertieslat_lon/Additionalgeo_fieldpropertiesprecision_step/Additionalgeo_fieldproperties

GitHubURL/Installingpluginsautomaticstorethrottling,URL/Automaticstorethrottling

Githubissue,URL/String

globalaggregationabout/Globalaggregation

Groovy

www.EBooksWorld.ir

Page 726: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

URL/ScriptingcapabilitiesofElasticsearch

www.EBooksWorld.ir

Page 727: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Hhardwarepreparations,forrunningElasticsearch

about/Hardwarephysicalservers/Physicalserversoracloudcloud/PhysicalserversoracloudCPU/CPURAMmemory/RAMmemorymassstorage/Massstoragenetwork/Thenetworkserverscounting/Howmanyserverscostcutting/Costcutting

HDFSURL/Creatingasnapshotrepository

highlightedfragmentscontrolling/Controllinghighlightedfragments

highlightertypeselecting/Forcinghighlightertype

highlightingabout/Highlightingusing/Gettingstartedwithhighlightingfieldconfiguration/FieldconfigurationApacheLucene,using/Underthehoodhighlightertype,selecting/ForcinghighlightertypeHTMLtags,,configuring/ConfiguringHTMLtagsglobalsettings/Globalandlocalsettingslocalsettings/Globalandlocalsettingsmatchingneed/Requirematchingcustomquery/CustomhighlightingqueryPostingshighlighter/ThePostingshighlighter,Validatingyourqueries

horizontalexpansionabout/Horizontalexpansionreplicas,automaticcreation/Automaticallycreatingthereplicasredundancy/Redundancyandhighavailabilityhighavailability/Redundancyandhighavailabilityreferencelinks/Redundancyandhighavailabilitycostandperformanceflexibility/Costandperformanceflexibilitycontinuesupgrades/ContinuousupgradesmultipleElasticsearchinstances,onsinglephysicalmachine/MultipleElasticsearchinstancesonasinglephysicalmachinedesignatednodesrolesforlargerclusters/Designatednoderolesforlargerclusters

howsimilarphrase/UnderstandingtheexplaininformationHTTPmodule

www.EBooksWorld.ir

Page 728: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

properties,URL/HTTPhostHTTPprotocol

URL/UnderstandingtheRESTAPIHTTPtransportsettings,adjusting

node/AdjustingHTTPtransportsettingsHTTP,disabling/DisablingHTTPHTTPport/HTTPportHTTPhost/HTTPhost

HyperLogLog++algorithmURL/Fieldcardinality

www.EBooksWorld.ir

Page 729: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Iidentifiersquery

about/Theidentifiersqueryindex

segments/TheLuceneglossaryandarchitectureabout/Index

index-timeboostingusing/Whendoesindex-timeboostingmakesense?defining,inmappings/Definingboostinginthemappings

indexaliasabout/Indexaliasingandusingittosimplifyyoureverydayworkdefining/Analiascreating/Creatinganaliasmodifying/Modifyingaliasescommands,combining/Combiningcommandsretrieving/Retrievingaliasesremoving/Removingaliasesfiltering/Filteringaliasesandrouting/Aliasesandroutingandzerodowntimereindexing/Zerodowntimereindexingandaliases

indexation/Inputdataanalysisindexingprocess

speedingup,batchindexingused/Batchindexingtospeedupyourindexingprocess

indexingrelatedadvicesabout/Indexingrelatedadviceindexrefreshrate/Indexrefreshratethreadpools,tuning/Threadpoolstuningautomaticstorethrottling/Automaticstorethrottlingtime-baseddata,handling/Handlingtime-baseddatamultipledatapaths/Multipledatapathsdatadistribution/Datadistributionbulkindexing/BulkindexingRAMbuffer,usedforindexing/RAMbufferforindexing

indexrefreshratereferencelink/Indexrefreshrate

indexstructuremodifying,withupdateAPI/ModifyingyourindexstructurewiththeupdateAPI

indexstructure,modifyingmappings/Themappingsnewfield,adding/Addinganewfieldtotheexistingindexexistingindexfields,modifying/Modifyingfieldsofanexistingindex

www.EBooksWorld.ir

Page 730: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

indexstructure,parent-childrelationshipabout/Indexstructureanddataindexingchildmappings/Childmappingsparentmappings/Parentmappingsparentdocument/Theparentdocumentchildrendocuments/Childdocuments

indexstructuremappingabout/Indexstructuremappingtypes/Typeandtypesdefinitiontypesdefinition/Typeandtypesdefinitionfields/Fieldscoretypes/Coretypesmultifields/MultifieldsIPaddresstype/TheIPaddresstypetokencounttype/Tokencounttype

indices,Elasticsearchindexingcreating/Creatingindicesautomaticcreation,altering/Alteringautomaticindexcreationnewlycreatedindex,settings/Settingsforanewlycreatedindexdeleting/Indexdeletion

indicesanalyzeAPIURL/Queryanalysis

indicesquery/TheindicesqueryindicessettingsAPI/TheindicessettingsAPIindicesstatsAPI

about/IndicesstatsAPIdocs/Docsstore/Storeindexing/Indexing,get,andsearchget/Indexing,get,andsearchsearch/Indexing,get,andsearchdefining/Additionalinformation

internalfields_id/Additionalinternalfields_uid/Additionalinternalfields_type/Additionalinternalfields_field_names/Additionalinternalfields

invertedindexabout/TheLuceneglossaryandarchitectureURL/Index

www.EBooksWorld.ir

Page 731: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

JJava

URL/Fulltextsearchinginstalling/InstallingJava

JavaScriptObjectNotation(JSON)URL/RunningElasticsearch

JavathreadsURL/Threadpools

JavatypesURL/Number

JavaVersion7URL/InstallingJava

JavaVirtualMachine(JVM)/ConfiguringElasticsearchJMeter

URL/WhencachesshouldbeavoidedJodaTimelibrary

URL/DaterangeaggregationJSON

URL/Document

www.EBooksWorld.ir

Page 732: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

KKibana

URL/Marvel

www.EBooksWorld.ir

Page 733: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Llanguageanalyzer

URL/Out-of-the-boxanalyzerslanguageanalyzers

URL/Sampledocumentlanguagedetection

URL/DetectingthelanguageofthedocumentLevenshteinalgorithm

URL/TheBooleanmatchqueryLinux

Elasticsearch,installing/InstallingElasticsearchonLinuxElasticsearch,configuringassystemservice/ConfiguringElasticsearchasasystemserviceonLinux

LogstashURL/Indexaliasingandusingittosimplifyyoureverydaywork

LuceneJavadocsURL/Defaultscoringformula

Lucenequerysyntaxabout/Lucenequerysyntax

www.EBooksWorld.ir

Page 734: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Mmapping/Mappingmappings

configuration/Mappingsconfigurationtypedeterminingmechanism/Typedeterminingmechanismindexstructuremapping/Indexstructuremappinganalyzers,using/Usinganalyzerssimilaritymodels/Differentsimilaritymodelsabout/Mappingsfinalmappings/Finalmappingssending,toElasticsearch/SendingthemappingstoElasticsearchnewfield,addingtoexistingindex/Addinganewfieldtotheexistingindexfieldofexistingindex,modifying/Modifyingfieldsofanexistingindex

Marveltoolusing/Marvel

masternodeabout/Noderoles,Masternode

matchallquery/Thematchallquerymatchingpattern,dynamictemplates

match/Thematchingpatternunmatch/Thematchingpattern

matchqueryabout/ThematchqueryBooleanmatchquery/TheBooleanmatchqueryphrasematchquery/Thephrasematchquerymatchphraseprefixquery/Thematchphraseprefixquery

MavenURL/Installingplugins

MavenCentralURL/Installingplugins

MavenSonatypeURL/Installingplugins

mergepolicyabout/Themergepolicyproperties/Themergepolicy

mergescheduler/Themergeschedulermetricsaggregations

about/Metricsaggregationsmin/Minimum,maximum,average,andsummax/Minimum,maximum,average,andsumavg/Minimum,maximum,average,andsumsum/Minimum,maximum,average,andsummissingvalues/Missingvalues

www.EBooksWorld.ir

Page 735: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

scripts,using/Usingscriptsfieldvaluestatistics/Fieldvaluestatisticsandextendedstatisticsextended_statistics/Fieldvaluestatisticsandextendedstatisticsvalue_countaggregation/Valuecountfieldcardinalityaggregation/Fieldcardinalitypercentilesaggregation/Percentilespercentile_ranksaggregation/Percentilerankstop_hitsaggregation/Tophitsaggregationtop_hitsaggregation,additionalparameters/Additionalparametersgeo_boundsaggregation/Geoboundsaggregationscriptedmetricsaggregation/Scriptedmetricsaggregation

MicrosoftWindowsplatformfilehandles,URL/ConfiguringElasticsearch

minimum_should_matchparameterURL/Theboolquery

missingquery/Themissingquerymorelikethisquery

about/Themorelikethisquerymovingaveragescalculation

URL/Pipelineaggregationsmoving_avgaggregation

URL/Movingavgaggregationabout/Movingavgaggregationfuturebuckets,predicting/Predictingfuturebucketsmodels/Themodelsmodels,URL/Themodels

multimatchquery/ThemultimatchquerymultipleElasticsearchinstances,onsinglephysicalmachine

about/MultipleElasticsearchinstancesonasinglephysicalmachineshard,preventingonsamenode/Preventingashardanditsreplicasfrombeingonthesamenodereplicas,preventingonsamenode/Preventingashardanditsreplicasfrombeingonthesamenode

multipleindicesURL/URIsearch

multiterm/Queryrewritemultivaluedfield/DocumentMustache

URL/ScriptingcapabilitiesofElasticsearch

www.EBooksWorld.ir

Page 736: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Nnativecode,using

factoryimplementation/Thefactoryimplementationnativescriptimplementation/Implementingthenativescriptplugindefinition/Theplugindefinitionplugin,installing/Installingthepluginscript,running/Runningthescript

nestedaggregationabout/Nestedaggregation

nestedobjectsusing/UsingnestedobjectsURL/Usingnestedobjectsnestedqueries/Scoringandnestedqueriesscore_modeproperty,setting/Scoringandnestedqueries

nestingaggregationsabout/Nestingaggregationsandorderingbuckets

networkattachedstorage(NAS)/Massstoragenode/Nodesandclusters

discoveryTopicnabout/Understandingnodediscoverydiscoverytypes/Understandingnodediscovery,Discoverytypesroles/Noderolesclustername,setting/Settingthecluster’snameZendiscovery/ZendiscoveryHTTPtransportsettings,adjusting/AdjustingHTTPtransportsettings

noderolesmasternode/Noderoles,Masternodedatanode/Noderoles,Datanodeclientnode/Noderoles,Clientnodeconfiguring/Configuringnoderoles

nodesinfoAPIabout/NodesinfoAPIrequisites/NodesinfoAPIextensiveinformation,returning/Returnedinformation

NoSQLURL/ManipulatingdatawiththeRESTAPI

number,indexstructuremappingbyte/Numbershort/Numberinteger/Numberlong/Numberfloat,URL/Numberfloat/Numberdouble/Number

www.EBooksWorld.ir

Page 737: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

double,URL/Number

www.EBooksWorld.ir

Page 738: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Oobjectindexing

disabling/Disablingobjectindexingofficialrepository

URL/InstallingpluginsOpenJDK

URL/InstallingJavaoptimisticlocking

URL/Versioningoptions,termsuggester

lowercase_terms/Additionaltermsuggesteroptionsmax_edits/Additionaltermsuggesteroptionsprefix_len/Additionaltermsuggesteroptionsmin_word_len/Additionaltermsuggesteroptionsshard_size/Additionaltermsuggesteroptions

out-of-the-boxanalyzersstandard/Out-of-the-boxanalyzerssimple/Out-of-the-boxanalyzerswhitespace/Out-of-the-boxanalyzersstop/Out-of-the-boxanalyzerskeyword/Out-of-the-boxanalyzerspattern/Out-of-the-boxanalyzerslanguage/Out-of-the-boxanalyzerssnowball/Out-of-the-boxanalyzers

www.EBooksWorld.ir

Page 739: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Pparameters,Booleanmatchquery

operator/TheBooleanmatchqueryanalyzer/TheBooleanmatchqueryfuzziness/TheBooleanmatchqueryprefix_length/TheBooleanmatchquerymax_expansions/TheBooleanmatchqueryzero_terms_query/TheBooleanmatchquerycero_terms_query/TheBooleanmatchquerylenient/TheBooleanmatchquery

parameters,fuzzyqueryvalue/Thefuzzyqueryboost/Thefuzzyqueryfuzziness/Thefuzzyqueryprefix_length/Thefuzzyquerymax_expansions/Thefuzzyquery

parameters,morelikethisqueryfields/Themorelikethisquerylike/Themorelikethisqueryunlike/Themorelikethisqueryin_term_freq/Themorelikethisquerymax_query_terms/Themorelikethisquerystop_words/Themorelikethisquerymin_doc_freq/Themorelikethisquerymin_word_len/Themorelikethisquerymax_word_len/Themorelikethisqueryboost_terms/Themorelikethisqueryboost/Themorelikethisqueryinclude/Themorelikethisqueryminimum_should_match/Themorelikethisqueryanalyzer/Themorelikethisquery

parameters,querystringqueryquery/Thequerystringquerydefault_field/Thequerystringquerydefault_operator/Thequerystringqueryanalyzer/Thequerystringqueryallow_leading_wildcard/Thequerystringquerylowercase_expand_terms/Thequerystringqueryenable_position_increments/Thequerystringqueryfuzzy_max_expansions/Thequerystringqueryfuzzy_prefix_length/Thequerystringqueryphrase_slop/Thequerystringqueryboost/Thequerystringquery

www.EBooksWorld.ir

Page 740: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

analyze_wildcard/Thequerystringqueryauto_generate_phrase_queries/Thequerystringqueryminimum_should_match/Thequerystringqueryfuzziness/Thequerystringquerymax_determined_states/Thequerystringquerylocale/Thequerystringquerytime_zone/Thequerystringquerylenient/Thequerystringquery

parameters,rangequerygte/Therangequerygt/Therangequerylte/Therangequerylt/Therangequery

parent-childrelationshipusing/Usingtheparent-childrelationshipindexstructure/Indexstructureanddataindexingdataindexing/Indexstructureanddataindexingquerying/Queryingperformanceconsiderations/Performanceconsiderations

parentaggregations/Availabletypespatternanalyzer

URL/Out-of-the-boxanalyzerspercolator

about/Percolatorindex/Theindexpreparing/Percolatorpreparationexploring/Gettingdeeperreturnedresultssize,controlling/Controllingthesizeofreturnedresultsusing,forandscorecalculation/Percolatorandscorecalculationcombining,withotherfunctionalities/Combiningpercolatorswithotherfunctionalitiesmatchingqueriescount,obtaining/Gettingthenumberofmatchingqueriesindexeddocumentspercolation/Indexeddocumentpercolation

phrasematchqueryslop/Thephrasematchqueryanalyzer/Thephrasematchquery

phrasesuggesterabout/Phrasesuggesterconfiguration/Configuration

pipelineaggregationsabout/PipelineaggregationsURL/Pipelineaggregationsparentaggregationfamily/Availabletypessiblingaggregationfamily/Availabletypes

www.EBooksWorld.ir

Page 741: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

types/Availabletypes,Pipelineaggregationtypesotheraggregations,referencing/Referencingotheraggregationsdata,gaps/Gapsinthedata

pipelineaggregations,typessum_bucket/Min,max,sum,andaveragebucketaggregationsmin_bucket/Min,max,sum,andaveragebucketaggregationsmax_bucket/Min,max,sum,andaveragebucketaggregationsavg_bucket/Min,max,sum,andaveragebucketaggregationscumulative_sumaggregation/Cumulativesumaggregationbucket_selectoraggregation/Bucketselectoraggregationbucket_scriptaggregation/Bucketscriptaggregationserial_diffaggregation/Serialdifferencingaggregationderivativeaggregation/Derivativeaggregationmoving_avgaggregation/Movingavgaggregation

pluginsabout/Elasticsearchpluginsbasics/Thebasicsinstalling/Installingpluginsremoving/Removingplugins

PostingsHighlighterURL/Underthehoodabout/ThePostingshighlighter

prefixquery/Theprefixqueryproperties,faultdetectionpingsettings

discovery.zen.fd.ping_interval/Faultdetectionpingsettingsdiscovery.zen.fd.ping_timeout/Faultdetectionpingsettingsdiscovery.zen.fd.ping_retries/Faultdetectionpingsettings

properties,mergepolicyindex.merge.policy.expunge_deletes_allowed/Themergepolicyindex.merge.policy.max_merge_at_once/Themergepolicyindex.merge.policy.max_merge_at_once_explicit/Themergepolicyindex.merge.policy.max_merged_segment/Themergepolicyindex.merge.policy.segments_per_tier/Themergepolicyindex.merge.policy.reclaim_deletes_weight/Themergepolicy

www.EBooksWorld.ir

Page 742: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Qqueries

selecting,forwarming/Choosingqueriesforwarmingqueryboost

applying,todocument/Theboostqueryboosts

used,forinfluencingscores/Influencingscoreswithqueryboostsabout/Theboostadding,toqueries/Theboost,Addingtheboosttoqueriesscore,modifying/Modifyingthescore

queryingdata,inchilddocuments/Queryingdatainthechilddocumentsdata,inparentdocuments/Queryingdataintheparentdocuments

queryingprocessabout/Understandingthequeryingprocessquerylogic/Querylogicsearchtype,specifying/Searchtypesearchexecutionpreference,specifying/SearchexecutionpreferencesearchshardsAPI,specifying/SearchshardsAPI

queryparserURL/Lucenequerysyntax

queryrewriteabout/Queryrewriteprefixquery,example/PrefixqueryasanexampleApacheLucene,using/GettingbacktoApacheLuceneproperties/Queryrewriteproperties

querystringqueryabout/Thequerystringqueryrunning,againstmultiplefields/Runningthequerystringqueryagainstmultiplefields

www.EBooksWorld.ir

Page 743: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

RRackspace

URL/CostandperformanceflexibilityRAID

URL/Massstoragerangeaggregation

about/Rangeaggregationkeyedbuckets/Keyedbuckets

rangequery/Therangequeryrecoverymodules

about/Thegatewayandrecoverymodulesrecoveryprocess

about/Recoverycontrolgatewayrecoveryoptions/AdditionalgatewayrecoveryoptionsindicesrecoveryAPI/IndicesrecoveryAPIdelayedallocation/Delayedallocationindexrecoveryprioritization/Indexrecoveryprioritization

regularexpressionqueryabout/RegularexpressionqueryURL/Regularexpressionquery

replica/Replicasreplicas,Elasticsearchindexing

about/Shardsandreplicaswriteconsistency,controlling/Writeconsistency

RESTAPIused,fordatamanipulation/ManipulatingdatawiththeRESTAPIabout/UnderstandingtheRESTAPIURL/UnderstandingtheRESTAPIdata,storinginElasticsearch/StoringdatainElasticsearchdocuments,retrieving/Retrievingdocumentsdocuments,updating/Updatingdocumentsdocuments,deleting/Deletingdocumentsversioning/Versioning

resultsfiltering/Filteringyourresultsquerycontext/Thecontextisthekeyexplicitfiltering,boolqueryused/Explicitfilteringwithboolquery

reverse_nestedaggregationabout/Reversenestedaggregation

rewriteproperty,valuesscoring_boolean/Queryrewritepropertiesconstant_score/Queryrewritepropertiesconstant_score_boolean/Queryrewriteproperties

www.EBooksWorld.ir

Page 744: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

top_terms/Queryrewritepropertiestop_terms_blendedfreqs/Queryrewritepropertiestop_terms_boost_N/Queryrewriteproperties

rightqueryselecting/Choosingtherightqueryusecases/Theusecasesresults,limitingtogiventags/Limitingresultstogiventagsvaluesinrange,searching/Searchingforvaluesinarange

routingabout/Introductiontorouting,Routingdefaultindexing/Defaultindexingdefaultsearching/Defaultsearchingparameters/Theroutingparametersfields/Routingfields

RPMpackageused,forinstallingElasticsearch/InstallingElasticsearchusingRPMpackages

www.EBooksWorld.ir

Page 745: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Ssample

distance-basedsorting/Distance-basedsortingboundingboxfiltering/Boundingboxfilteringdistance,limiting/Limitingthedistance

samplequeriesabout/Samplequeries

sampleraggregationabout/Sampleraggregation

scoreabout/IntroductiontoApacheLucenescoringinfluencing,withqueryboosts/Influencingscoreswithqueryboostsmodifying/Modifyingthescore

score,modifyingabout/Modifyingthescoreconstant_scorequery/Constantscorequeryboostingquery/Boostingqueryfunctionscorequery/Thefunctionscorequery

score_modeparameterabout/Structureofthefunctionquerymultiplevalue/Structureofthefunctionquerysumvalue/Structureofthefunctionqueryavgvalue/Structureofthefunctionqueryfirstvalue/Structureofthefunctionquerymaxvalue/Structureofthefunctionqueryminvalue/Structureofthefunctionquery

scriptfieldsselecting/Usingthescriptfieldsparameters,passingto/Passingparameterstothescriptfields

scriptingcapabilitiesabout/ScriptingcapabilitiesofElasticsearchscriptexecution,availableobjects/Objectsavailableduringscriptexecutionscript,types/Scripttypesquerying,scriptsused/Queryingwithscriptsparameters,using/Scriptingwithparameterslanguages,Groovy/Scriptlanguagesotherthanembeddedlanguages,using/Usingotherthanembeddedlanguagesnativecode,using/Usingnativecode

scriptpropertiesscript/Queryingwithscriptsinline/Queryingwithscriptsid/Queryingwithscriptsfile/Queryingwithscripts

www.EBooksWorld.ir

Page 746: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

lang/Queryingwithscriptsparams/Queryingwithscripts

scripts,scripted_metricaggregationinit_script/Scriptedmetricsaggregationmap_script/Scriptedmetricsaggregationcombine_script/Scriptedmetricsaggregationreduce_script/Scriptedmetricsaggregation

scripttypesabout/Scripttypesinlinescripts/Scripttypes,Inlinescriptsinfilescripts/Infilescriptsindexedscripts/Indexedscripts

ScrollAPIabout/TheScrollAPIproblemdefinition/Problemdefinitionproblemdefinition,solution/Scrollingtotherescue

searching/Defaultsearchingsearchingrequestexecution/Indexingandsearchingsegmentmerging

about/Introductiontosegmentmerging,Segmentmergingneedfor/Theneedforsegmentmergingmergepolicy/Themergepolicymergepolicy,basicproperties/Themergepolicymergescheduler/Themergeschedulerthrottling/Throttling

shardallocationIPaddress,usingfor/UsingtheIPaddressforshardallocationcancelling/Cancelingshardallocationforcing/ForcingshardallocationmultiplecommandsperHTTPrequest/MultiplecommandsperHTTPrequestoperations,allowingonprimaryshards/Allowingoperationsonprimaryshards

shardandreplicaallocationcontrolling/Controllingtheshardandreplicaallocationcontrolling,explicitly/Explicitlycontrollingallocationnodeparameters,specifying/Specifyingnodeparametersconfiguration/Configurationindex,creating/Indexcreationnodes,excluding/Excludingnodesfromallocationnodeattributes,requiring/Requiringnodeattributesnumberofshardsandreplicaspernode/Thenumberofshardsandreplicaspernodeallocationthrottling/Allocationthrottlingclusterwideallocation/Cluster-wideallocationshardsandreplicas,movingmanually/Manuallymovingshardsandreplicas

www.EBooksWorld.ir

Page 747: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

rollingrestarts,handling/Handlingrollingrestartsshardrequestcache

about/Shardrequestcacheenabling/Enablingandconfiguringtheshardrequestcacheconfiguring/Enablingandconfiguringtheshardrequestcacheperrequestshardrequestcache,disabling/Perrequestshardrequestcachedisablingusagemonitoring/Shardrequestcacheusagemonitoring

shards/Index,Shardsmoving/Movingshards

shards,Elasticsearchindexingabout/Shardsandreplicaswriteconsistency,controlling/Writeconsistency

siblingaggregations/Availabletypessignificant_termsaggregation

about/Significanttermsaggregationsignificantterms,selecting/Choosingsignificanttermsmultiplevalue,analyzing/Multiplevalueanalysis

similaritymodelsabout/Differentsimilaritymodelsper-fieldsimilarity,setting/Settingper-fieldsimilarityOkapiBM25model/Availablesimilaritymodelsrandomnessmodel,divergence/Availablesimilaritymodelsinformation-basedmodel/Availablesimilaritymodelsdefaultsimilarity,configuring/ConfiguringdefaultsimilarityBM25similarity,configuring/ConfiguringBM25similarityDFRsimilarity,configuring/ConfiguringDFRsimilarityIBsimilarity,configuring/ConfiguringIBsimilarity

simplequerystringqueryabout/ThesimplequerystringqueryURL/Thesimplequerystringquery

singleElasticsearchnodetuning/PreparingasingleElasticsearchnodegeneralpreparations/Thegeneralpreparationsfielddatacache/Fielddatacacheandbreakingthecircuitcircuit,breaking/Fielddatacacheandbreakingthecircuitdocvalues,using/UsedocvaluesRAMbuffer,usedforindexing/RAMbufferforindexingindexrefreshrate/Indexrefreshratethreadpools/Threadpools

snapshotscreating/Creatingsnapshotsadditionalparameters/Additionalparameters

snowballanalyzer

www.EBooksWorld.ir

Page 748: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

URL/Out-of-the-boxanalyzersSoftwareasaService(SaaS)/SPMforElasticsearchsourcefiltering/Sourcefilteringspan/Aspanspanfirstquery/Spanfirstqueryspannearquery/Spannearqueryspannotquery/Spannotqueryspanorquery/Spanorqueryspanqueries

using/Usingspanqueriesspan/Aspanspan_termquery/Spantermqueryspanfirstquery/Spanfirstqueryspannearquery/Spannearqueryspanorquery/Spanorqueryspannotquery/Spannotqueryspan_withinquery/Spanwithinqueryspan_containingquery/Spancontainingqueryspan_multiquery/Spanmultiqueryperformanceconsiderations/Performanceconsiderations

span_contaningquery/Spancontainingqueryspan_multiquery/Spanmultiqueryspan_termquery/Spantermqueryspan_withinquery/Spanwithinqueryspatialcapabilities

about/Elasticsearchspatialcapabilitiesmappingspreparation/Mappingpreparationforspatialsearchesexampledata/Exampledatageo_fieldproperties/Additionalgeo_fieldproperties

SPMtoolURL/SPMforElasticsearch

standardanalyzerURL/Out-of-the-boxanalyzers

stateandhealth,clustermonitoring/Monitoringyourcluster’sstateandhealthclusterhealthAPI/ClusterhealthAPIindicesstatsAPI/IndicesstatsAPInodesinfoAPI/NodesinfoAPInodesstatsAPI/NodesstatsAPIclusterstateAPI/ClusterstateAPIclusterstatsAPI/ClusterstatsAPIpendingtasksAPI/PendingtasksAPIindicesrecoveryAPI/IndicesrecoveryAPIindicesshardstoresAPI/IndicesshardstoresAPI

www.EBooksWorld.ir

Page 749: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

indicessegmentsAPI/IndicessegmentsAPIstaticproperties,forindexingbuffersizeconfiguration

indices.memory.index_buffer_size/Indexingbuffersindices.memory.min_index_buffer_size/Indexingbuffersindices.memory.max_index_buffer_size/Indexingbuffersindices.memory.min_shard_index_buffer_size/Indexingbuffers

statuscodedefinitionURL/Indexingthedata

stemmingURL/Out-of-the-boxanalyzers

stopanalyzerURL/Out-of-the-boxanalyzers

stopwordsURL/Thecommontermsquery

string,indexstructuremappingterm_vector/Stringanalyzer/Stringsearch_analyzer/Stringnorms.enabled/Stringnorms.loading/Stringposition_offset_gap/Stringindex_options/Stringignore_above/String

suggestersusing/UsingsuggestersURL/Usingsuggesters,Additionaltermsuggesteroptionstypes/Availablesuggestertypessuggestions,including/Includingsuggestionsresponse/Suggesterresponsetextproperty/Suggesterresponsescoreproperty/Suggesterresponsefreqproperty/Suggesterresponse

synonymrulesdefining/DefiningsynonymrulesApacheSolrsynonyms,using/UsingApacheSolrsynonymsWordNetsynonyms,using/UsingWordNetsynonyms

synonymsabout/Wordswiththesamemeaningfiltering/Synonymfilterinmappings/Synonymsinthemappingsstoring,infilesystem/Synonymsstoredonthefilesystemrules,defining/Definingsynonymrulesindex-timesynonymsexpansion/Queryorindex-timesynonymexpansionquery-timesynonymexpansion/Queryorindex-timesynonymexpansion

www.EBooksWorld.ir

Page 750: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

synonymsfilterusing/Synonymfilter

system-specificinstallationandconfigurationabout/Thesystem-specificinstallationandconfigurationElasticsearch,installingonLinux/InstallingElasticsearchonLinuxElasticsearch,configuringassystemserviceonLinux/ConfiguringElasticsearchasasystemserviceonLinuxElasticsearch,usingassystemserviceonWindows/ElasticsearchasasystemserviceonWindows

www.EBooksWorld.ir

Page 751: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

TT-Digestalgorithm

URL/Percentilestemplates

about/Templatesexample/Anexampleofatemplate

termquery/Thetermquerytermsaggregation

about/Termsaggregationapproximatecounts/Countsareapproximateminimumdocumentcount/Minimumdocumentcount

termsquery/Thetermsquerytermsuggester

about/Termsuggesterconfigurationoptions/Termsuggesterconfigurationoptionsoptions/Additionaltermsuggesteroptions

threadpoolsabout/Threadpoolsgeneric/Threadpoolsindex/Threadpoolssearch/Threadpoolssuggest/Threadpoolsget/Threadpoolsbulk/Threadpoolspercolate/Threadpools

throttling,adjustingtypesetting/Throttlingvalue/Throttlingnonevalue/Throttlingmergevalue/Throttlingallvalue/Throttling

timezonesURL/Timezones

tree-likestructuresindexing/Indexingtree-likestructuresdatastructure/Datastructureanalysis/Analysis

typedeterminingmechanismabout/Typedeterminingmechanismdisabling/Disablingthetypedeterminingmechanismtuning,fornumerictypes/Tuningthetypedeterminingmechanismfornumerictypestuning,fordates/Tuningthetypedeterminingmechanismfordates

www.EBooksWorld.ir

Page 752: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

typeproperty,valuesplain/Forcinghighlightertypefvh/Forcinghighlightertypepostins/Forcinghighlightertype

typequery/Thetypequerytypes,suggesters

term/Availablesuggestertypes,Termsuggesterphrase/Availablesuggestertypes,Phrasesuggestercompletion/Availablesuggestertypes,Completionsuggestercontext/Availablesuggestertypes,Contextsuggester

www.EBooksWorld.ir

Page 753: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

UUnicast

URL/DiscoverytypesupdateAPI

used,formodifyingindexstructure/ModifyingyourindexstructurewiththeupdateAPI

UpdateAPIURL/Addingpartialdocuments

updatesettingsAPIabout/TheupdatesettingsAPIclustersettingsAPI/TheclustersettingsAPIindicessettingsAPI/TheindicessettingsAPI

URIquerystringparametersabout/URIquerystringparametersquery/Thequerydefaultsearchfield/Thedefaultsearchfieldanalyzerproperty/Analyzerdefaultoperator/Thedefaultoperatorpropertyexplainparameter/Queryexplanationfieldsreturned/Thefieldsreturnedresults,sorting/Sortingtheresultssearchtimeout/Thesearchtimeoutresultswindow/Theresultswindowpershardresults,limiting/Limitingper-shardresultsunavailableindices,ignoring/Ignoringunavailableindicessearchtype/Thesearchtypelowercasingtermsexpansion/Lowercasingtermexpansionwildcardqueriesanalysis/Wildcardandprefixanalysisanalyze_wildcardproperty/Wildcardandprefixanalysisprefixqueriesanalysis/Wildcardandprefixanalysis

URIrequestqueryused,forsearching/SearchingwiththeURIrequestquerysampledata/SampledataURIsearch/URIsearchanalyzing/Queryanalysisparameters/URIquerystringparametersLucenequerysyntax/Lucenequerysyntax

URIsearchabout/URIsearchElasticsearchqueryresponse/ElasticsearchqueryresponseURL/Wildcardandprefixanalysis

www.EBooksWorld.ir

Page 754: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

VValidateAPI

using/UsingtheValidateAPIvalues,has_childqueryparameter

none/Queryingdatainthechilddocumentsmin/Queryingdatainthechilddocumentsmax/Queryingdatainthechilddocumentssum/Queryingdatainthechilddocumentsavg/Queryingdatainthechilddocuments

values,inrangesearching/Searchingforvaluesinarangematcheddocuments,boosting/Boostingsomeofthematcheddocumentslowerscoringpartialqueries,ignoring/IgnoringlowerscoringpartialqueriesLucenequerysyntax,usinginqueries/UsingLucenequerysyntaxinqueriesuserquerieswithouterrors,handling/Handlinguserquerieswithouterrorsprefixes,usedforprovidingautocompletefunctionality/Autocompleteusingprefixessimilarterms,finding/Findingtermssimilartoagivenonespans/Spans,spanseverywhere

values,score_modepropertyavg/Scoringandnestedqueriessum/Scoringandnestedqueriesmin/Scoringandnestedqueriesmax/Scoringandnestedqueriesnone/Scoringandnestedqueries

versioningabout/Versioningusageexample/Usageexamplefromexternalsystem/Versioningfromexternalsystems

verticalscaling/PreparingasingleElasticsearchnode

www.EBooksWorld.ir

Page 755: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

Wwarmingquery

about/Warmingupdefining/Defininganewwarmingquerydefinedwarmingqueries,retrieving/Retrievingthedefinedwarmingqueriesdeleting/Deletingawarmingquerywarmingupfunctionality,disabling/Disablingthewarmingupfunctionality

wildcardquery/ThewildcardqueryWindows

Elasticsearch,configuringassystemservice/ElasticsearchasasystemserviceonWindows

WordNetURL/UsingWordNetsynonyms

www.EBooksWorld.ir

Page 756: Elasticsearch Server - Third Edition · The use cases Limiting results to given tags Searching for values in a range ... Percolator The index Percolator preparation Getting deeper

ZZendiscovery

about/Zendiscoverymasterelectionconfiguration/Masterelectionconfigurationunicast,configuring/Configuringunicastfaultdetectionpingsettings/Faultdetectionpingsettingsclusterstateupdatescontrol/Clusterstateupdatescontrolmasterunavailability,dealingwith/Dealingwithmasterunavailability

www.EBooksWorld.ir