www.EBooksWorld.ir
Elasticsearch Server Third Edition
Table of Contents
Elasticsearch Server Third Edition
Credits
About the Authors
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with Elasticsearch Cluster
Full text searching
The Lucene glossary and architecture
Input data analysis
Indexing and querying
Scoring and query relevance
The basics of Elasticsearch
Key concepts of Elasticsearch
Index
Document
Document type
Mapping
Key concepts of the Elasticsearch infrastructure
Nodes and clusters
Shards
Replicas
Gateway
Indexing and searching
Installing and configuring your cluster
Installing Java
Installing Elasticsearch
Running Elasticsearch
Shutting down Elasticsearch
The directory layout
Configuring Elasticsearch
The system-specific installation and configuration
Installing Elasticsearch on Linux
Installing Elasticsearch using RPM packages
Installing Elasticsearch using the DEB package
Elasticsearch configuration file localization
Configuring Elasticsearch as a system service on Linux
Elasticsearch as a system service on Windows
Manipulating data with the REST API
Understanding the REST API
Storing data in Elasticsearch
Creating a new document
Automatic identifier creation
Retrieving documents
Updating documents
Dealing with non-existing documents
Adding partial documents
Deleting documents
Versioning
Usage example
Versioning from external systems
Searching with the URI request query
Sample data
URI search
Elasticsearch query response
Query analysis
URI query string parameters
The query
The default search field
Analyzer
The default operator property
Query explanation
The fields returned
Sorting the results
The search timeout
The results window
Limiting per-shard results
Ignoring unavailable indices
The search type
Lowercasing term expansion
Wildcard and prefix analysis
Lucene query syntax
Summary
2. Indexing Your Data
Elasticsearch indexing
Shards and replicas
Write consistency
Creating indices
Altering automatic index creation
Settings for a newly created index
Index deletion
Mappings configuration
Type determining mechanism
Disabling the type determining mechanism
Tuning the type determining mechanism for numeric types
Tuning the type determining mechanism for dates
Index structure mapping
Type and types definition
Fields
Core types
Common attributes
String
Number
Boolean
Binary
Date
Multi fields
The IP address type
Token count type
Using analyzers
Out-of-the-box analyzers
Defining your own analyzers
Default analyzers
Different similarity models
Setting per-field similarity
Available similarity models
Configuring default similarity
Configuring BM25 similarity
Configuring DFR similarity
Configuring IB similarity
Batch indexing to speed up your indexing process
Preparing data for bulk indexing
Indexing the data
The _all field
The _source field
Additional internal fields
Introduction to segment merging
Segment merging
The need for segment merging
The merge policy
The merge scheduler
Throttling
Introduction to routing
Default indexing
Default searching
Routing
The routing parameters
Routing fields
Summary
3. Searching Your Data
Querying Elasticsearch
The example data
A simple query
Paging and result size
Returning the version value
Limiting the score
Choosing the fields that we want to return
Source filtering
Using the script fields
Passing parameters to the script fields
Understanding the querying process
Query logic
Search type
Search execution preference
Search shards API
Basic queries
The term query
The terms query
The match all query
The type query
The exists query
The missing query
The common terms query
The match query
The Boolean match query
The phrase match query
The match phrase prefix query
The multi match query
The query string query
Running the query string query against multiple fields
The simple query string query
The identifiers query
The prefix query
The fuzzy query
The wildcard query
The range query
Regular expression query
The more like this query
Compound queries
The bool query
The dis_max query
The boosting query
The constant_score query
The indices query
Using span queries
A span
Span term query
Span first query
Span near query
Span or query
Span not query
Span within query
Span containing query
Span multi query
Performance considerations
Choosing the right query
The use cases
Limiting results to given tags
Searching for values in a range
Boosting some of the matched documents
Ignoring lower scoring partial queries
Using Lucene query syntax in queries
Handling user queries without errors
Autocomplete using prefixes
Finding terms similar to a given one
Matching phrases
Spans, spans everywhere
Summary
4. Extending Your Querying Knowledge
Filtering your results
The context is the key
Explicit filtering with bool query
Highlighting
Getting started with highlighting
Field configuration
Under the hood
Forcing highlighter type
Configuring HTML tags
Controlling highlighted fragments
Global and local settings
Require matching
Custom highlighting query
The Postings highlighter
Validating your queries
Using the Validate API
Sorting data
Default sorting
Selecting fields used for sorting
Sorting mode
Specifying behavior for missing fields
Dynamic criteria
Calculate scoring when sorting
Query rewrite
Prefix query as an example
Getting back to Apache Lucene
Query rewrite properties
Summary
5. Extending Your Index Structure
Indexing tree-like structures
Data structure
Analysis
Indexing data that is not flat
Data
Objects
Arrays
Mappings
Final mappings
Sending the mappings to Elasticsearch
To be or not to be dynamic
Disabling object indexing
Using nested objects
Scoring and nested queries
Using the parent-child relationship
Index structure and data indexing
Child mappings
Parent mappings
The parent document
Child documents
Querying
Querying data in the child documents
Querying data in the parent documents
Performance considerations
Modifying your index structure with the update API
The mappings
Adding a new field to the existing index
Modifying fields of an existing index
Summary
6. Make Your Search Better
Introduction to Apache Lucene scoring
When a document is matched
Default scoring formula
Relevancy matters
Scripting capabilities of Elasticsearch
Objects available during script execution
Script types
In file scripts
Inline scripts
Indexed scripts
Querying with scripts
Scripting with parameters
Script languages
Using other than embedded languages
Using native code
The factory implementation
Implementing the native script
The plugin definition
Installing the plugin
Running the script
Searching content in different languages
Handling languages differently
Handling multiple languages
Detecting the language of the document
Sample document
The mappings
Querying
Queries with an identified language
Queries with an unknown language
Combining queries
Influencing scores with query boosts
The boost
Adding the boost to queries
Modifying the score
Constant score query
Boosting query
The function score query
Structure of the function query
The weight factor function
Field value factor function
The script score function
The random score function
Decay functions
When does index-time boosting make sense?
Defining boosting in the mappings
Words with the same meaning
Synonym filter
Synonyms in the mappings
Synonyms stored on the file system
Defining synonym rules
Using Apache Solr synonyms
Explicit synonyms
Equivalent synonyms
Expanding synonyms
Using WordNet synonyms
Query or index-time synonym expansion
Understanding the explain information
Understanding field analysis
Explaining the query
Summary
7. Aggregations for Data Analysis
Aggregations
General query structure
Inside the aggregations engine
Aggregation types
Metrics aggregations
Minimum, maximum, average, and sum
Missing values
Using scripts
Field value statistics and extended statistics
Value count
Field cardinality
Percentiles
Percentile ranks
Top hits aggregation
Additional parameters
Geo bounds aggregation
Scripted metrics aggregation
Buckets aggregations
Filter aggregation
Filters aggregation
Terms aggregation
Counts are approximate
Minimum document count
Range aggregation
Keyed buckets
Date range aggregation
IPv4 range aggregation
Missing aggregation
Histogram aggregation
Date histogram aggregation
Time zones
Geo distance aggregations
Geohash grid aggregation
Global aggregation
Significant terms aggregation
Choosing significant terms
Multiple value analysis
Sampler aggregation
Children aggregation
Nested aggregation
Reverse nested aggregation
Nesting aggregations and ordering buckets
Buckets ordering
Pipeline aggregations
Available types
Referencing other aggregations
Gaps in the data
Pipeline aggregation types
Min, max, sum, and average bucket aggregations
Cumulative sum aggregation
Bucket selector aggregation
Bucket script aggregation
Serial differencing aggregation
Derivative aggregation
Moving avg aggregation
Predicting future buckets
The models
Summary
8. Beyond Full-text Searching
Percolator
The index
Percolator preparation
Getting deeper
Controlling the size of returned results
Percolator and score calculation
Combining percolators with other functionalities
Getting the number of matching queries
Indexed document percolation
Elasticsearch spatial capabilities
Mapping preparation for spatial searches
Example data
Additional geo_field properties
Sample queries
Distance-based sorting
Bounding box filtering
Limiting the distance
Arbitrary geo shapes
Point
Envelope
Polygon
Multipolygon
An example usage
Storing shapes in the index
Using suggesters
Available suggester types
Including suggestions
Suggester response
Term suggester
Term suggester configuration options
Additional term suggester options
Phrase suggester
Configuration
Completion suggester
Indexing data
Querying indexed completion suggester data
Custom weights
Context suggester
Context types
Using context
Using the geo location context
The Scroll API
Problem definition
Scrolling to the rescue
Summary
9. Elasticsearch Cluster in Detail
Understanding node discovery
Discovery types
Node roles
Master node
Data node
Client node
Configuring node roles
Setting the cluster’s name
Zen discovery
Master election configuration
Configuring unicast
Fault detection ping settings
Cluster state updates control
Dealing with master unavailability
Adjusting HTTP transport settings
Disabling HTTP
HTTP port
HTTP host
The gateway and recovery modules
The gateway
Recovery control
Additional gateway recovery options
Indices recovery API
Delayed allocation
Index recovery prioritization
Templates and dynamic templates
Templates
An example of a template
Dynamic templates
The matching pattern
Field definitions
Elasticsearch plugins
The basics
Installing plugins
Removing plugins
Elasticsearch caches
Fielddata cache
Fielddata size
Circuit breakers
Fielddata and doc values
Shard request cache
Enabling and configuring the shard request cache
Per request shard request cache disabling
Shard request cache usage monitoring
Node query cache
Indexing buffers
When caches should be avoided
The update settings API
The cluster settings API
The indices settings API
Summary
10. Administrating Your Cluster
Elasticsearch time machine
Creating a snapshot repository
Creating snapshots
Additional parameters
Restoring a snapshot
Cleaning up – deleting old snapshots
Monitoring your cluster’s state and health
Cluster health API
Controlling information details
Additional parameters
Indices stats API
Docs
Store
Indexing, get, and search
Additional information
Nodes info API
Returned information
Nodes stats API
Cluster state API
Cluster stats API
Pending tasks API
Indices recovery API
Indices shard stores API
Indices segments API
Controlling the shard and replica allocation
Explicitly controlling allocation
Specifying node parameters
Configuration
Index creation
Excluding nodes from allocation
Requiring node attributes
Using the IP address for shard allocation
Disk-based shard allocation
Configuring disk based shard allocation
Disabling disk based shard allocation
The number of shards and replicas per node
Allocation throttling
Cluster-wide allocation
Allocation awareness
Forcing allocation awareness
Filtering
What do include, exclude, and require mean
Manually moving shards and replicas
Moving shards
Canceling shard allocation
Forcing shard allocation
Multiple commands per HTTP request
Allowing operations on primary shards
Handling rolling restarts
Controlling cluster rebalancing
Understanding rebalance
Cluster being ready
The cluster rebalance settings
Controlling when rebalancing will be allowed
Controlling the number of shards being moved between nodes concurrently
Controlling which shards may be rebalanced
The Cat API
The basics
Using Cat API
Common arguments
The examples
Getting information about the master node
Getting information about the nodes
Retrieving recovery information for an index
Warming up
Defining a new warming query
Retrieving the defined warming queries
Deleting a warming query
Disabling the warming up functionality
Choosing queries for warming
Index aliasing and using it to simplify your everyday work
An alias
Creating an alias
Modifying aliases
Combining commands
Retrieving aliases
Removing aliases
Filtering aliases
Aliases and routing
Zero downtime reindexing and aliases
Summary
11. Scaling by Example
Hardware
Physical servers or a cloud
CPU
RAM memory
Mass storage
The network
How many servers
Cost cutting
Preparing a single Elasticsearch node
The general preparations
Avoiding swapping
File descriptors
Virtual memory
The memory
Field data cache and breaking the circuit
Use doc values
RAM buffer for indexing
Index refresh rate
Thread pools
Horizontal expansion
Automatically creating the replicas
Redundancy and high availability
Cost and performance flexibility
Continuous upgrades
Multiple Elasticsearch instances on a single physical machine
Preventing a shard and its replicas from being on the same node
Designated node roles for larger clusters
Query aggregator nodes
Data nodes
Master eligible nodes
Preparing the cluster for high indexing and querying throughput
Indexing related advice
Index refresh rate
Thread pools tuning
Automatic store throttling
Handling time-based data
Multiple data paths
Data distribution
Bulk indexing
RAM buffer for indexing
Advice for high query rate scenarios
Shard request cache
Think about the queries
Parallelize your queries
Field data cache and breaking the circuit
Keep size and shard size under control
Monitoring
Elasticsearch HQ
Marvel
SPM for Elasticsearch
Summary
Index
Elasticsearch Server Third Edition
Elasticsearch Server Third Edition
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2013
Second edition: February 2015
Third edition: February 2016
Production reference: 1230216
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-881-6
www.packtpub.com
Credits
Authors
Rafał Kuć
Marek Rogoziński
Reviewer
Paige Cook
Commissioning Editor
Nadeem Bagban
Acquisition Editor
Divya Poojari
Content Development Editor
Kirti Patil
Technical Editor
Utkarsha S. Kadam
Copy Editor
Alpha Singh
Project Coordinator
Nidhi Joshi
Proofreader
Safis Editing
Indexer
Rekha Nair
Graphics
Jason Monteiro
Production Coordinator
Manu Joseph
Cover Work
Manu Joseph
About the Authors
Rafał Kuć is a software engineer, trainer, speaker and consultant. He is working as a consultant and software engineer at Sematext Group Inc. where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains, from banking software to e-commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him to achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days.
Rafał began his journey with Lucene in 2002; however, it wasn’t love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest.
Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.
Marek Rogoziński is a software architect and consultant with more than 10 years of experience. His specialization concerns solutions based on open source search engines, such as Solr and Elasticsearch, and the software stack for big data analytics including Hadoop, Hbase, and Twitter Storm.
He is also a cofounder of the solr.pl site, which publishes information and tutorials about Solr and Lucene libraries. He is the coauthor of ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.
He is currently the chief technology officer and lead architect at ZenCard, a company that processes and analyzes large quantities of payment transactions in real time, allowing automatic and anonymous identification of retail customers on all retailer channels (m-commerce/e-commerce/brick & mortar) and giving retailers a customer retention and loyalty tool.
About the Reviewer
Paige Cook works as a software architect for Videa, part of the Cox Family of Companies, and lives near Atlanta, Georgia. He has twenty years of experience in software development, primarily with the Microsoft .NET Framework. His career has been largely focused on building enterprise solutions for the media and entertainment industry. He is especially interested in search technologies using the Apache Lucene search engine and has experience with both Elasticsearch and Apache Solr. Apart from his work, he enjoys DIY home projects and spending time with his wife and two daughters.
www.PacktPub.com
eBooks, discount offers, and more
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Preface
Welcome to Elasticsearch Server, Third Edition. This is the third instalment of the book dedicated to yet another major release of Elasticsearch, this time version 2.2. In the third edition, we have decided to go on a similar route that we took when we wrote the second edition of the book. We not only updated the content to match the new version of Elasticsearch, but also restructured the book by removing and adding new sections and chapters. We read the suggestions we got from you, the readers of the book, and we carefully tried to incorporate the suggestions and comments received since the release of the first and second editions.
While reading this book, you will be taken on a journey to the wonderful world of full-text search provided by the Elasticsearch server. We will start with a general introduction to Elasticsearch, which covers how to start and run Elasticsearch, its basic concepts, and how to index and search your data in the most basic way. This book will also discuss the query language, the so-called Query DSL, that allows you to create complicated queries and filter returned results. In addition to all of this, you’ll see how you can use the aggregation framework to calculate aggregated data based on the results returned by your queries. We will implement the autocomplete functionality together and learn how to use Elasticsearch spatial capabilities and prospective search.
Finally, this book will show you Elasticsearch’s administration API capabilities with features such as shard placement control, cluster handling, and more, ending with a dedicated chapter that will discuss Elasticsearch’s preparation for small and large deployments, both ones that concentrate on indexing and ones that concentrate on querying.
What this book covers
Chapter 1, Getting Started with Elasticsearch Cluster, covers what full-text searching is, what Apache Lucene is, what text analysis is, how to run and configure Elasticsearch, and finally, how to index and search your data in the most basic way.
Chapter 2, Indexing Your Data, shows how indexing works, how to prepare index structure, what data types we are allowed to use, how to speed up indexing, what segments are, how merging works, and what routing is.
Chapter 3, Searching Your Data, introduces the full-text search capabilities of Elasticsearch by discussing how to query it, how the querying process works, and what types of basic and compound queries are available. In addition to this, we will show how to use position-aware queries in Elasticsearch.
Chapter 4, Extending Your Querying Knowledge, shows how to efficiently narrow down your search results by using filters, how highlighting works, how to sort your results, and how query rewrite works.
Chapter 5, Extending Your Index Structure, shows how to index more complex data structures. We learn how to index tree-like data types, how to index data with relationships between documents, and how to modify index structure.
Chapter 6, Make Your Search Better, covers Apache Lucene scoring and how to influence it in Elasticsearch, the scripting capabilities of Elasticsearch, and its language analysis capabilities.
Chapter 7, Aggregations for Data Analysis, introduces you to the great world of data analysis by showing you how to use the Elasticsearch aggregation framework. We will discuss all types of aggregations: metrics, buckets, and the new pipeline aggregations that have been introduced in Elasticsearch.
Chapter 8, Beyond Full-text Searching, discusses functionalities that are not related to full-text search, such as the percolator (reversed search) and the geo-spatial capabilities of Elasticsearch. This chapter also discusses suggesters, which allow us to build a spellchecking functionality and an efficient autocomplete mechanism, and we will show how to handle deep-paging efficiently.
Chapter 9, Elasticsearch Cluster in Detail, discusses the node discovery mechanism, the recovery and gateway Elasticsearch modules, templates, caches, and the settings update API.
Chapter 10, Administrating Your Cluster, covers the Elasticsearch backup functionality, rebalancing, and shard moving. In addition to this, you will learn how to use the warm up functionality, use the Cat API, and work with aliases.
Chapter 11, Scaling by Example, is dedicated to scaling and tuning. We will start with hardware preparations and considerations and a single Elasticsearch node-related tuning. We will go through cluster setup and vertical scaling, ending the chapter with high querying and indexing use cases and cluster monitoring.
What you need for this book
This book was written using Elasticsearch server 2.2 and all the examples and functions should work with this. In addition to this, you’ll need a command that allows you to send HTTP requests, such as curl, which is available for most operating systems. Please note that all the examples in this book use the previously mentioned curl tool. If you want to use another tool, please remember to format the request in an appropriate way that is understood by the tool of your choice.
In addition to this, some chapters may require additional software, such as Elasticsearch plugins; this is explicitly mentioned wherever it is needed.
Who this book is for
If you are a beginner to the world of full-text search and Elasticsearch, then this book is especially for you. You will be guided through the basics of Elasticsearch and you will learn how to use some of the advanced functionalities.
If you know Elasticsearch and you have worked with it, then you may find this book interesting as it provides a nice overview of all the functionalities with examples and descriptions. However, you may encounter sections that you already know.
If you know the Apache Solr search engine, this book can also be used to compare some functionalities of Apache Solr and Elasticsearch. This may help you decide which tool is more appropriate for your use case.
If you know all the details about Elasticsearch and you know how each of the configuration parameters works, then this is definitely not the book you are looking for.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “If you use the Linux or OS X command, the cURL package should already be available.”
A block of code is set as follows:
{
  "mappings": {
    "post": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"},
        "published": {"type": "date"},
        "contents": {"type": "string"}
      }
    }
  }
}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
{
  "mappings": {
    "post": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"},
        "published": {"type": "date"},
        "contents": {"type": "string"}
      }
    }
  }
}
Any command-line input or output is written as follows:
curl -XPUT http://localhost:9200/users/?pretty -d '{
  "mappings": {
    "user": {
      "numeric_detection": true
    }
  }
}'
Note
Warnings or important notes appear in a box like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book’s title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you’re looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/ElasticsearchServerThirdEdition_ColorImages.pdf
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. Getting Started with Elasticsearch Cluster
Welcome to the wonderful world of Elasticsearch, a great full text search and analytics engine. It doesn’t matter if you are new to Elasticsearch and full text searches in general, or if you already have some experience in this. We hope that, by reading this book, you’ll be able to learn and extend your knowledge of Elasticsearch. As this book is also dedicated to beginners, we decided to start with a short introduction to full text searches in general, and after that, a brief overview of Elasticsearch.
Please remember that Elasticsearch is a rapidly changing piece of software. Not only are features added, but the Elasticsearch core functionality is also constantly evolving and changing. We try to keep up with these changes, and because of this we are giving you the third edition of the book dedicated to Elasticsearch 2.x.
The first thing we need to do with Elasticsearch is install and configure it. With many applications, you start with the installation and configuration and usually forget the importance of these steps. We will try to guide you through these steps so that it becomes easier to remember. In addition to this, we will show you the simplest way to index and retrieve data without going into too much detail. The first chapter will take you on a quick ride through Elasticsearch and the full text search world. By the end of this chapter, you will have learned the following topics:
Full text searching
The basics of Apache Lucene
Performing text analysis
The basic concepts of Elasticsearch
Installing and configuring Elasticsearch
Using the Elasticsearch REST API to manipulate data
Searching using basic URI requests
Full text searching
Back in the days when full text searching was a term known to a small percentage of engineers, most of us used SQL databases to perform search operations. Using SQL databases to search for the data stored in them was okay to some extent. Such a search wasn’t fast, especially on large amounts of data. Even now, small applications are usually good with a standard LIKE %phrase% search in a SQL database. However, as we go deeper and deeper, we start to see the limits of such an approach: a lack of scalability, not enough flexibility, and a lack of language analysis. Of course, there are additional modules that extend SQL databases with full text search capabilities, but they are still limited compared to dedicated full text search libraries and search engines such as Elasticsearch. Some of those reasons led to the creation of Apache Lucene (http://lucene.apache.org/), a library written completely in Java (http://java.com/en/), which is very fast, light, and provides language analysis for a large number of languages spoken throughout the world.
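To make the "lack of language analysis" limitation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column name, and sample titles are made up for illustration; the point is only that a LIKE %phrase% search matches raw substrings and knows nothing about word forms.

```python
import sqlite3

def like_search(titles, phrase):
    """Return the titles matched by a plain SQL LIKE %phrase% search."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE books (title TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO books (title) VALUES (?)",
                     [(t,) for t in titles])
    rows = conn.execute(
        "SELECT title FROM books WHERE title LIKE ? ORDER BY rowid",
        ("%" + phrase + "%",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

# Hypothetical sample data, not taken from the book.
titles = ["Searching your data", "Full text search basics", "Indexing your data"]

# A substring match finds every title that contains the characters "search".
print(like_search(titles, "search"))
# A different form of the same word matches nothing, because LIKE compares
# characters and performs no language analysis such as stemming.
print(like_search(titles, "searches"))
```

A dedicated full text engine would typically reduce "searches", "searching", and "search" to a common form during analysis, which is the gap this sketch is meant to show.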
The Lucene glossary and architecture
Before going into the details of the analysis process, we would like to introduce you to the glossary and overall architecture of Apache Lucene. We decided that this information is crucial for understanding how Elasticsearch works, and even though the book is not about Apache Lucene, knowing the foundation of the Elasticsearch analytics and indexing engine is vital to fully understand how this great search engine works.
The basic concepts of the mentioned library are as follows:
Document: This is the main data carrier used during indexing and searching, comprising one or more fields that contain the data we put in and get from Lucene.
Field: This is a section of the document, which is built of two parts: the name and the value.
Term: This is a unit of search representing a word from the text.
Token: This is an occurrence of a term in the text of the field. It consists of the term text, start and end offsets, and a type.
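The four concepts above can be sketched in a few lines of Python. This is an illustration only, not Lucene's API: the `Token` class, the `tokenize` helper, the "word" type label, and the choice to lowercase terms are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

@dataclass
class Token:
    """One occurrence of a term in a field's text (illustrative only)."""
    term: str          # the term text
    start_offset: int  # position of the first character in the field value
    end_offset: int    # position just past the last character
    type: str = "word" # assumed type label for this sketch

def tokenize(text):
    """Split text on word boundaries; lowercasing is an assumption here."""
    return [Token(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]

# A document is a set of fields; each field has a name and a value.
document = {"title": "Elasticsearch Server"}

tokens = tokenize(document["title"])
for token in tokens:
    print(token)
```

Each printed token carries the term text plus the offsets that locate it in the original field value, which is the information the glossary attributes to a token.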
Apache Lucene writes all the information to a structure called the inverted index. It is a data structure that maps the terms in the index to the documents and not the other way around as a relational database does in its tables. You can think of an inverted index as a data structure where data is term-oriented rather than document-oriented. Let’s see how a simple inverted index will look. For example, let’s assume that we have documents with only a single field called title to be indexed, and the values of that field are as follows:
Elasticsearch Server (document 1)
Mastering Elasticsearch Second Edition (document 2)
Apache Solr Cookbook Third Edition (document 3)
A very simplified visualization of the Lucene inverted index could look as follows:
Each term points to the number of documents it is present in. For example, the term edition is present twice, in the second and third documents. Such a structure allows for very efficient and fast search operations in term-based queries (but not exclusively). Because the occurrences of the term are connected to the terms themselves, Lucene can use information about the term occurrences to perform fast and precise scoring, giving each document a value that represents how well each of the returned documents matched the query.
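The term-to-documents mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene stores its index; the function name build_inverted_index and the lowercase-and-split-on-whitespace analysis are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for term in title.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


# The three example titles from the text.
titles = {
    1: "Elasticsearch Server",
    2: "Mastering Elasticsearch Second Edition",
    3: "Apache Solr Cookbook Third Edition",
}

inverted = build_inverted_index(titles)
for term, doc_ids in inverted.items():
    # Columns: term, number of documents it occurs in, the documents themselves.
    print(f"{term:<15}{len(doc_ids):<3}{doc_ids}")
```

Looking up a term is now a single dictionary access, which is why term-based queries are cheap with this layout.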
Of course, the actual index created by Lucene is much more complicated and advanced because of additional files that include information such as term vectors (per document inverted index), doc values (column oriented field information), stored fields (the original and not the analyzed value of the field), and so on. However, all you need to know for now is how the data is organized and not what exactly is stored.
Each index is divided into multiple write-once and read-many-time structures called segments. Each segment is a miniature Apache Lucene index on its own. When indexing, after a single segment is written to the disk it can’t be updated, or we should rather say it can’t be fully updated; documents can’t be removed from it, they can only be marked as deleted in a separate file. The reason that Lucene doesn’t allow segments to be updated is the nature of the inverted index. After the fields are analyzed and put into the inverted index, there is no easy way of building the original document structure. When deleting, Lucene would have to delete the information from the segment, which translates to updating all the information within the inverted index itself.
Because of the fact that segments are write-once structures, Lucene is able to merge segments together in a process called segment merging. During indexing, if Lucene thinks that there are too many segments falling into the same criterion, a new and bigger segment will be created, one that will have data from the other segments. During that process, Lucene will try to remove deleted data and get back the space needed to hold information about those documents. Segment merging is a demanding operation both in terms of the I/O and CPU. What we have to remember for now is that searching with one large segment is faster than searching with multiple smaller ones holding the same data. That’s because, in general, searching translates to just matching the query terms to the ones that are indexed. You can imagine how searching through multiple small segments and merging those results will be slower than having a single segment preparing the results.
Input data analysis
The transformation of a document that comes to Lucene and is processed and put into the inverted index format is called indexation. One of the things Lucene has to do during this is data analysis. You may want some of your fields to be processed by a language analyzer so that words such as car and cars are treated as the same by your index. On the other hand, you may want other fields to be divided only on the whitespace character or be only lowercased.
Analysis is done by the analyzer, which is built of a tokenizer and zero or more token filters, and it can also have zero or more character mappers.
A tokenizer in Lucene is used to split the text into tokens, which are basically the terms with additional information such as their position in the original text and their length. The result of the tokenizer’s work is called a token stream, where the tokens are put one by one and are ready to be processed by the filters.
Apart from the tokenizer, the Lucene analyzer is built of zero or more token filters that are used to process tokens in the token stream. Some examples of filters are as follows:
Lowercase filter: Makes all the tokens lowercased
Synonyms filter: Changes one token to another on the basis of synonym rules
Language stemming filters: Responsible for reducing tokens (actually, the text part that they provide) into their root or base forms called the stem (https://en.wikipedia.org/wiki/Word_stem)
Filters are processed one after another, so we have almost unlimited analytical possibilities with the addition of multiple filters, one after another.
Finally, the character mappers operate on non-analyzed text; they are used before the tokenizer. Therefore, we can easily remove HTML tags from whole parts of text without worrying about tokenization.
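To make the order of the three stages concrete, here is a minimal Python sketch of an analysis chain: character mappers first, then the tokenizer, then the token filters one after another. All function names are hypothetical, and the plural-stripping filter is a toy stand-in for a real stemmer; none of this is Lucene’s API.

```python
import re


def strip_html(text):
    """Character mapper: runs on the raw text, before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: returns (term, position, start_offset, end_offset) tuples."""
    return [
        (m.group(), pos, m.start(), m.end())
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


def lowercase_filter(tokens):
    """Token filter: lowercases the term text, keeps position and offsets."""
    return [(term.lower(), pos, start, end) for term, pos, start, end in tokens]


def plural_stem_filter(tokens):
    """Toy stemmer: strips a trailing 's' only; real stemmers are far more careful."""
    return [
        (term[:-1] if len(term) > 3 and term.endswith("s") else term, pos, start, end)
        for term, pos, start, end in tokens
    ]


def analyze(text, char_mappers, tokenizer, token_filters):
    """Apply the character mappers, then the tokenizer, then each filter in order."""
    for mapper in char_mappers:
        text = mapper(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<b>Fast Cars</b> and a car",
    char_mappers=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter, plural_stem_filter],
)
terms = [term for term, _, _, _ in tokens]
print(terms)
```

Note that in this sketch the offsets refer to the text after the character mapper has run, and that car and cars end up as the same term, which is the behavior described at the start of this section.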
Indexing and querying
You may wonder how all the information we’ve described so far affects indexing and querying when using Lucene and all the software that is built on top of it. During indexing, Lucene will use an analyzer of your choice to process the contents of your document; of course, different analyzers can be used for different fields, so the name field of your document can be analyzed differently compared to the summary field. For example, the name field may only be tokenized on whitespaces and lowercased, so that exact matches are done, and the summary field is stemmed in addition to that. We can also decide to not analyze the fields at all; we have full control over the analysis process.
During a query, your query text can be analyzed as well. However, you can also choose not to analyze your queries. This is crucial to remember because some Elasticsearch queries are analyzed and some are not. For example, prefix and term queries are not analyzed, and match queries are analyzed (we will get to that in Chapter 3, Searching Your Data). Having queries that are analyzed and not analyzed is very useful; sometimes, you may want to query a field that is not analyzed, while sometimes you may want to have a full text search analysis. For example, if we search for the Light Red term and the query is being analyzed by the standard analyzer, then the terms that would be searched are light and red. If we use a query type that is not analyzed, then we will explicitly search for the Light Red term. We may not want to analyze the content of the query if we are only interested in exact matches.
What you should remember about indexing and querying analysis is that the terms in the index should match the query terms. If they don’t match, Lucene won’t return the desired documents. For example, if you use stemming and lowercasing during indexing, you need to ensure that the terms in the query are also lowercased and stemmed, or your queries won’t return any results at all. For example, let’s get back to our Light Red term that we analyzed during indexing; we have it as two terms in the index: light and red. If we run a Light Red query against that data and don’t analyze it, we won’t get the document in the results, because the query term does not match the indexed terms. It is important to keep the token filters in the same order during indexing and query time analysis so that the terms resulting from such an analysis are the same.
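The Light Red mismatch can be reproduced with a tiny Python sketch. The function standard_like_analyze is a hypothetical, rough stand-in for a standard-style analyzer (split on whitespace, lowercase); it is not the real standard analyzer.

```python
def standard_like_analyze(text):
    """Rough stand-in for a standard-style analyzer: split on whitespace, lowercase."""
    return text.lower().split()


# Index time: the field value is analyzed, so two terms are stored.
indexed_terms = set(standard_like_analyze("Light Red"))

# Query time, not analyzed: the whole input is looked up as a single term.
unanalyzed_hit = "Light Red" in indexed_terms

# Query time, analyzed the same way as at index time: every term is found.
analyzed_hit = all(t in indexed_terms for t in standard_like_analyze("Light Red"))

print(indexed_terms, unanalyzed_hit, analyzed_hit)
```

The unanalyzed lookup fails because the index never contained the single term Light Red, only light and red.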
Scoring and query relevance
There is one additional thing that we only mentioned once till now: scoring. What is the score of a document? The score is a result of a scoring formula that describes how well the document matches the query. By default, Apache Lucene uses the TF/IDF (term frequency/inverse document frequency) scoring mechanism, which is an algorithm that calculates how relevant the document is in the context of our query. Of course, it is not the only algorithm available, and we will mention other algorithms in the Mappings configuration section of Chapter 2, Indexing Your Data.
Note
If you want to read more about the Apache Lucene TF/IDF scoring formula, please visit the Apache Lucene Javadocs for the TFIDFSimilarity class, available at http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
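As a rough intuition for TF/IDF, the following Python sketch scores documents with a simplified formula: for each query term, the square root of its frequency in the document multiplied by a logarithmic inverse document frequency. This is an assumption-laden simplification for illustration only; Lucene’s actual formula (see the Javadocs above) includes further factors such as normalization, and the function name tf_idf_score is hypothetical.

```python
import math


def tf_idf_score(query_terms, doc_terms, all_docs):
    """Sum of tf * idf over the query terms for one document (simplified)."""
    score = 0.0
    for term in query_terms:
        freq = doc_terms.count(term)
        if freq == 0:
            continue
        # Number of documents in the collection that contain the term.
        doc_freq = sum(1 for d in all_docs if term in d)
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(len(all_docs) / (doc_freq + 1))
        score += tf * idf
    return score


docs = [
    ["elasticsearch", "server"],
    ["mastering", "elasticsearch", "second", "edition"],
    ["apache", "solr", "cookbook", "third", "edition"],
]
scores = [tf_idf_score(["elasticsearch", "server"], d, docs) for d in docs]
print(scores)
```

The first document matches both query terms, and the rarer term server contributes more than the common term elasticsearch, so it receives the highest score; the third document matches nothing and scores zero.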
The basics of Elasticsearch
Elasticsearch is an open source search server project started by Shay Banon and published in February 2010. Since then, the project has grown into a major player in the field of search and data analysis solutions and is widely used in many common or lesser-known search and data analysis platforms. In addition, due to its distributed nature and real-time search and analytics capabilities, many organizations use it as a document store.
Key concepts of Elasticsearch
In the next few pages, we will get you through the basic concepts of Elasticsearch. You can skip this section if you are already familiar with Elasticsearch architecture. However, if you are not familiar with Elasticsearch, we strongly advise you to read this section. We will refer to the keywords used in this section in the rest of the book, and understanding those concepts is crucial to fully utilize Elasticsearch.
Index
An index is the logical place where Elasticsearch stores the data. Each index can be spread onto multiple Elasticsearch nodes and is divided into one or more smaller pieces called shards that are physically placed on the hard drives. If you are coming from the relational database world, you can think of an index like a table. However, the index structure is prepared for fast and efficient full text searching and, in particular, does not store original values. That structure is called an inverted index (https://en.wikipedia.org/wiki/Inverted_index).
If you know MongoDB, you can think of the Elasticsearch index as a collection in MongoDB. If you are familiar with CouchDB, you can think about an index as you would about the CouchDB database. Elasticsearch can hold many indices located on one machine or spread them over multiple servers. As we have already said, every index is built of one or more shards, and each shard can have many replicas.
Document
The main entity stored in Elasticsearch is a document. A document can have multiple fields, each having its own type and treated differently. Using the analogy to relational databases, a document is a row of data in a database table. When you compare an Elasticsearch document to a MongoDB document, you will see that both can have different structures. The thing to keep in mind when it comes to Elasticsearch is that fields that are common to multiple types in the same index need to have the same type. This means that all the documents with a field called title need to have the same data type for it, for example, string.
Documents consist of fields, and each field may occur several times in a single document (such a field is called multivalued). Each field has a type (text, number, date, and so on). The field types can also be complex: a field can contain other subdocuments or arrays. The field type is important to Elasticsearch because the type determines how various operations such as analysis or sorting are performed. Fortunately, this can be determined automatically (however, we still suggest using mappings; take a look at what follows).
Unlike the relational databases, documents don’t need to have a fixed structure: every document may have a different set of fields, and in addition to this, fields don’t have to be known during application development. Of course, one can force a document structure with the use of a schema. From the client’s point of view, a document is a JSON object (see more about the JSON format at https://en.wikipedia.org/wiki/JSON). Each document is stored in one index and has a document type and its own unique identifier, which can be generated automatically by Elasticsearch. The thing to remember is that the document identifier needs to be unique within a given type inside an index. This means that, in a single index, two documents can have the same identifier if they are not of the same type.
Document type
In Elasticsearch, one index can store many objects serving different purposes. For example, a blog application can store articles and comments. The document type lets us easily differentiate between the objects in a single index. Every document can have a different structure, but in real-world deployments, dividing documents into types significantly helps in data manipulation. Of course, one needs to keep the limitations in mind. That is, different document types can’t set different types for the same property. For example, a field called title must have the same type across all document types in a given index.
Mapping
In the section about the basics of full text searching (the Full text searching section), we wrote about the process of analysis: the preparation of the input text for indexing and searching done by the underlying Apache Lucene library. Every field of the document must be properly analyzed depending on its type. For example, a different analysis chain is required for the numeric fields (numbers shouldn’t be sorted alphabetically) and for the text fetched from web pages (for example, the first step would require you to omit the HTML tags as they are useless information). To be able to properly analyze at indexing and querying time, Elasticsearch stores the information about the fields of the documents in so-called mappings. Every document type has its own mapping, even if we don’t explicitly define it.
Key concepts of the Elasticsearch infrastructure
Now, we already know that Elasticsearch stores its data in one or more indices and every index can contain documents of various types. We also know that each document has many fields and how Elasticsearch treats these fields is defined by the mappings. But there is more. From the beginning, Elasticsearch was created as a distributed solution that can handle billions of documents and hundreds of search requests per second. This is due to several important key features and concepts that we are going to describe in more detail now.
Nodes and clusters
Elasticsearch can work as a standalone, single-search server. Nevertheless, to be able to process large sets of data and to achieve fault tolerance and high availability, Elasticsearch can be run on many cooperating servers. Collectively, these servers connected together are called a cluster and each server forming a cluster is called a node.
Shards
When we have a large number of documents, we may come to a point where a single node may not be enough, for example, because of RAM limitations, hard disk capacity, insufficient processing power, or an inability to respond to client requests fast enough. In such cases, an index (and the data in it) can be divided into smaller parts called shards (where each shard is a separate Apache Lucene index). Each shard can be placed on a different server, and thus your data can be spread among the cluster nodes. When you query an index that is built from multiple shards, Elasticsearch sends the query to each relevant shard and merges the results in such a way that your application doesn’t know about the shards. In addition to this, having multiple shards can speed up indexing, because documents end up in different shards and thus the indexing operation is parallelized.
Replicas
In order to increase query throughput or achieve high availability, shard replicas can be used. A replica is just an exact copy of the shard, and each shard can have zero or more replicas. In other words, Elasticsearch can have many identical shards and one of them is automatically chosen as a place where the operations that change the index are directed. This special shard is called a primary shard, and the others are called replica shards. When the primary shard is lost (for example, a server holding the shard data is unavailable), the cluster will promote a replica to be the new primary shard.
Gateway
The cluster state is held by the gateway, which stores the cluster state and indexed data across full cluster restarts. By default, every node has this information stored locally; it is synchronized among nodes. We will discuss the gateway module in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail.
Indexing and searching
You may wonder how you can tie all the indices, shards, and replicas together in a single environment. Theoretically, it would be very difficult to fetch data from the cluster when you have to know where your document is: on which server, and in which shard. Even more difficult would be searching when one query can return documents from different shards placed on different nodes in the whole cluster. In fact, this is a complicated problem; fortunately, we don’t have to care about this at all, because it is handled automatically by Elasticsearch. Let’s look at the following diagram:
When you send a new document to the cluster, you specify a target index and send it to any of the nodes. The node knows how many shards the target index has and is able to determine which shard should be used to store your document. This behavior can be altered; we will talk about this in the Introduction to routing section in Chapter 2, Indexing Your Data. The important information that you have to remember for now is that Elasticsearch calculates the shard in which the document should be placed using the unique identifier of the document; this is one of the reasons each document needs a unique identifier. After the indexing request is sent to a node, that node forwards the document to the target node, which hosts the relevant shard.
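The idea of deriving the shard from the document identifier can be sketched as a hash of the identifier taken modulo the number of shards. The function name pick_shard is hypothetical, and zlib.crc32 is used only as a convenient stand-in hash for this sketch; it is not the hash function Elasticsearch uses.

```python
import zlib


def pick_shard(doc_id, number_of_shards):
    """Deterministically map a document identifier to a shard number."""
    # zlib.crc32 is only a stand-in hash for this sketch.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


shards = 5
for doc_id in ["1", "2", "article-42"]:
    print(doc_id, "->", pick_shard(doc_id, shards))
```

Because the mapping is deterministic, any node can compute where a document lives from its identifier alone, both when indexing it and when fetching it later.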
Now, let’s look at the following diagram on searching request execution:
When you try to fetch a document by its identifier, the node you send the query to uses the same routing algorithm to determine the shard and the node holding the document and again forwards the request, fetches the result, and sends the result to you. On the other hand, the querying process is a more complicated one. The node receiving the query forwards it to all the nodes holding the shards that belong to a given index and asks for minimum information about the documents that match the query (the identifier and score are returned by default), unless routing is used, in which case the query will go directly to a single shard only. This is called the scatter phase. After receiving this information, the aggregator node (the node that receives the client request) sorts the results and sends a second request to get the documents that are needed to build the results list (all the other information apart from the document identifier and score). This is called the gather phase. After this phase is executed, the results are returned to the client.
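The two phases described above can be sketched in Python with shards modeled as plain dictionaries. Everything here is hypothetical and simplified: the scores are precomputed, there is no networking, and the function names scatter and gather simply follow the phase names used in the text.

```python
import heapq


def scatter(shards, term, size):
    """Phase one: each shard returns only (score, doc_id) pairs for its top hits."""
    partial = []
    for shard in shards:
        hits = [
            (doc["score"], doc_id)
            for doc_id, doc in shard.items()
            if term in doc["terms"]
        ]
        partial.extend(heapq.nlargest(size, hits))
    return partial


def gather(shards, partial, size):
    """Phase two: sort globally, then fetch full documents for the winners only."""
    top = heapq.nlargest(size, partial)
    by_id = {doc_id: doc for shard in shards for doc_id, doc in shard.items()}
    return [(doc_id, score, by_id[doc_id]["source"]) for score, doc_id in top]


# Two shards holding three documents with precomputed, made-up scores.
shards = [
    {
        "a": {"terms": {"red"}, "score": 0.4, "source": {"title": "Red"}},
        "b": {"terms": {"blue"}, "score": 0.9, "source": {"title": "Blue"}},
    },
    {
        "c": {"terms": {"red", "light"}, "score": 0.7, "source": {"title": "Light Red"}},
    },
]

results = gather(shards, scatter(shards, "red", size=2), size=2)
print(results)
```

The point of the split is that only identifiers and scores travel in the first phase; full documents are fetched just for the entries that make it into the final results list.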
Now the question arises: what is the replica’s role in the previously described process? While indexing, replicas are only used as an additional place to store the data. When executing a query, by default, Elasticsearch will try to balance the load among the shard and its replicas so that they are evenly stressed. Also, remember that we can change this behavior; we will discuss this in the Understanding the querying process section in Chapter 3, Searching Your Data.
Installing and configuring your cluster
Installing and running Elasticsearch, even in production environments, is very easy nowadays compared to how it was in the days of Elasticsearch 0.20.x. Getting from a system that is not ready to one running Elasticsearch takes only a few steps. We will explore these steps in the following sections.
Installing Java
Elasticsearch is a Java application and to use it we need to make sure that the Java SE environment is installed properly. Elasticsearch requires Java Version 7 or later to run. You can download it from http://www.oracle.com/technetwork/java/javase/downloads/index.html. You can also use OpenJDK (http://openjdk.java.net/) if you wish. You can, of course, use Java Version 7, but it is not supported by Oracle anymore, at least without commercial support. For example, you can’t expect new, patched versions of Java 7 to be released. Because of this, we strongly suggest that you install Java 8, especially given that Java 9 seems to be right around the corner with the general availability planned to be released in September 2016.
Installing Elasticsearch
To install Elasticsearch you just need to go to https://www.elastic.co/downloads/elasticsearch, choose the latest stable version of Elasticsearch, download it, and unpack it. That’s it! The installation is complete.
Note
At the time of writing, we used a snapshot of Elasticsearch 2.2. This means that we’ve skipped describing some properties that were marked as deprecated and are or will be removed in the future versions of Elasticsearch.
The main interface to communicate with Elasticsearch is based on the HTTP protocol and REST. This means that you can even use a web browser for some basic queries and requests, but for anything more sophisticated you’ll need to use additional software, such as the cURL command. If you use Linux or OS X, the cURL package should already be available. If you use Windows, you can download the package from http://curl.haxx.se/download.html.
Running Elasticsearch
Let’s run our first instance that we just downloaded as the ZIP archive and unpacked. Go to the bin directory and run the following commands depending on the OS:
Linux or OS X: ./elasticsearch
Windows: elasticsearch.bat
Congratulations! Now, you have your Elasticsearch instance up-and-running. During its work, the server usually uses two port numbers: the first one for communication with the REST API using the HTTP protocol, and the second one for the transport module used for communication in a cluster and between the native Java client and the cluster. The default port used for the HTTP API is 9200, so we can check search readiness by pointing the web browser to http://127.0.0.1:9200/. The browser should show a code snippet similar to the following:
{
  "name" : "Blob",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "5b1dd1cf5a1957682d84228a569e124fedf8e325",
    "build_timestamp" : "2016-01-13T18:12:26Z",
    "build_snapshot" : true,
    "lucene_version" : "5.4.0"
  },
  "tagline" : "You Know, for Search"
}
The output is structured as a JavaScript Object Notation (JSON) object. If you are not familiar with JSON, please take a minute and read the article available at https://en.wikipedia.org/wiki/JSON.
Note
Elasticsearch is smart. If the default port is not available, the engine binds to the next free port. You can find information about this on the console during booting as follows:
[2016-01-13 20:04:49,953][INFO][http] [Blob] publish_address
{127.0.0.1:9201}, bound_addresses {[fe80::1]:9200}, {[::1]:9200},
{127.0.0.1:9201}
Note the fragment with [http]. Elasticsearch uses a few ports for various tasks. The interface that we are using is handled by the HTTP module.
Now, we will use the cURL program to communicate with Elasticsearch. For example, to check the cluster health, we will use the following command:
curl -XGET http://127.0.0.1:9200/_cluster/health?pretty
The -X parameter is a definition of the HTTP request method. The default value is GET (so in this example, we can omit this parameter). For now, do not worry about the GET value; we will describe it in more detail later in this chapter.
As a standard, the API returns information in a JSON object in which new line characters are omitted. The pretty parameter added to our request forces Elasticsearch to add new line characters to the response, making the response more user-friendly. You can try running the preceding query with and without the ?pretty parameter to see the difference.
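If you prefer a script to the command line, the same cluster health request can be prepared with Python’s standard library. This is only a sketch: the helper name cluster_health_request is hypothetical, and the code builds the request without sending it, so it runs even when no node is listening on the assumed default address.

```python
from urllib.request import Request


def cluster_health_request(host="127.0.0.1", port=9200, pretty=True):
    """Build (but do not send) a GET request for the cluster health endpoint."""
    query = "?pretty" if pretty else ""
    return Request(f"http://{host}:{port}/_cluster/health{query}", method="GET")


req = cluster_health_request()
print(req.get_method(), req.full_url)

# To actually send it against a running node, uncomment the following lines:
# from urllib.request import urlopen
# print(urlopen(req).read().decode("utf-8"))
```

Passing pretty=False builds the same request without the ?pretty parameter, which makes it easy to compare the two response formats.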
Elasticsearch is useful in small and medium-sized applications, but it has been built with large clusters in mind. So, now we will set up our big two-node cluster. Unpack the Elasticsearch archive in a different directory and run the second instance. If we look at the log, we will see the following:
[2016-01-13 20:07:58,561][INFO][cluster.service] [BigMan]
detected_master {Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300},
added {{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300},}, reason:
zen-disco-receive(from master [{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}
{127.0.0.1:9300}])
This means that our second instance (named BigMan) discovered the previously running instance (named Blob). Here, Elasticsearch automatically formed a new two-node cluster. Starting from Elasticsearch 2.0, this will only work with nodes running on the same physical machine, because Elasticsearch 2.0 no longer supports multicast. To allow your cluster to form, you need to inform Elasticsearch about the nodes that should be contacted initially using the discovery.zen.ping.unicast.hosts array in elasticsearch.yml. For example, like this:
discovery.zen.ping.unicast.hosts: ["192.168.2.1", "192.168.2.2"]
Shutting down Elasticsearch
Even though we expect our cluster (or node) to run flawlessly for a lifetime, we may need to restart it or shut it down properly (for example, for maintenance). The following are the two ways in which we can shut down Elasticsearch:
If your node is attached to the console, just press Ctrl + C
The second option is to kill the server process by sending the TERM signal (see the kill command on the Linux boxes and Task Manager on Windows)
Note
The previous versions of Elasticsearch exposed a dedicated shutdown API but, in 2.0, this option has been removed because of security reasons.
The directory layout
Now, let’s go to the newly created directory. We should see the following directory structure:
Directory    Description
bin          The scripts needed to run Elasticsearch instances and for plugin management
config       The directory where configuration files are located
lib          The libraries used by Elasticsearch
modules      The plugins bundled with Elasticsearch
After Elasticsearch starts, it will create the following directories (if they don’t exist):
Directory    Description
data         The directory used by Elasticsearch to store all the data
logs         The files with information about events and errors
plugins      The location to store the installed plugins
work         The temporary files used by Elasticsearch
Configuring Elasticsearch
One of the reasons (of course, not the only one) why Elasticsearch is gaining more and more popularity is that getting started with Elasticsearch is quite easy. Because of the reasonable default values and automatic settings for simple environments, we can skip the configuration and go straight to indexing and querying (or to the next chapter of the book). We can do all this without changing a single line in our configuration files. However, in order to truly understand Elasticsearch, it is worth understanding some of the available settings.
We will now explore the default directories and the layout of the files provided with the Elasticsearch tar.gz archive. The entire configuration is located in the config directory. We can see two files here: elasticsearch.yml (or elasticsearch.json, which will be used if present) and logging.yml. The first file is responsible for setting the default configuration values for the server. This is important because some of these values can be changed at runtime and can be kept as a part of the cluster state, so the values in this file may not be accurate. The two values that we cannot change at runtime are cluster.name and node.name.
The cluster.name property is responsible for holding the name of our cluster. The cluster name separates different clusters from each other. Nodes configured with the same cluster name will try to form a cluster.
The second value is the instance name (the node.name property). We can leave this parameter undefined. In this case, Elasticsearch automatically chooses a unique name for itself. Note that this name is chosen during each startup, so the name can be different on each restart. Defining the name can be helpful when referring to concrete instances by the API or when using monitoring tools to see what is happening to a node during long periods of time and between restarts. Think about giving descriptive names to your nodes.
Other parameters are commented well in the file, so we advise you to look through it; don’t worry if you do not understand the explanation. We hope that everything will become clearer after reading the next few chapters.
Note
Remember that most of the parameters that have been set in the elasticsearch.yml file can be overwritten with the use of the Elasticsearch REST API. We will talk about this API in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail.
The second file (logging.yml) defines how much information is written to system logs, defines the log files, and creates new files periodically. Changes in this file are usually required only when you need to adapt to monitoring or backup solutions or during system debugging; however, if you want to have a more detailed logging, you need to adjust it accordingly.
Let’s leave the configuration files for now and look at the base for all the applications: the operating system. Tuning your operating system is one of the key points to ensure that your Elasticsearch instance will work well. During indexing, especially when having many shards and replicas, Elasticsearch will create many files; so, the system should not limit the open file descriptors to less than 32,000. For Linux servers, this can usually be changed in /etc/security/limits.conf and the current value can be displayed using the ulimit command. If you end up reaching the limit, Elasticsearch will not be able to create new files; so merging will fail, indexing may fail, and new indices will not be created.
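As a quick sanity check on a Unix-like system, the open file limit can also be read from a script. The sketch below uses Python’s standard resource module; the helper name open_file_limit_ok is hypothetical, and note that it reports the limit of the Python process that runs it, which only approximates the Elasticsearch process if both are started by the same user under the same settings.

```python
import resource

RECOMMENDED_MIN_OPEN_FILES = 32000  # threshold taken from the text above


def open_file_limit_ok(minimum=RECOMMENDED_MIN_OPEN_FILES):
    """Return (ok, soft_limit) for the current process on a Unix-like system."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = soft == resource.RLIM_INFINITY
    return (unlimited or soft >= minimum), soft


ok, soft = open_file_limit_ok()
print(f"soft open-file limit: {soft}, sufficient: {ok}")
```

If the check reports an insufficient limit, the value would be raised in /etc/security/limits.conf as described above.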
Note
On Microsoft Windows platforms, the default limit is more than 16 million handles per process, which should be more than enough. You can read more about file handles on the Microsoft Windows platform at https://blogs.technet.microsoft.com/markrussinovich/2009/09/29/pushing-the-limits-of-windows-handles/.
The next set of settings is connected to the Java Virtual Machine (JVM) heap memory limit for a single Elasticsearch instance. For small deployments, the default memory limit (1,024 MB) will be sufficient, but for large ones it will not be enough. If you spot entries that indicate OutOfMemoryError exceptions in a log file, set the ES_HEAP_SIZE variable to a value greater than 1024. When choosing the right amount of memory size to be given to the JVM, remember that, in general, no more than 50 percent of your total system memory should be given. However, as with all the rules, there are exceptions. We will discuss this in greater detail later, but you should always monitor your JVM heap usage and adjust it when needed.
The system-specific installation and configuration
Although downloading an archive with Elasticsearch and unpacking it works and is convenient for testing, there are dedicated methods for Linux operating systems that give you several advantages when you do production deployment. In production deployments, the Elasticsearch service should be run automatically with a system boot; we should have dedicated start and stop scripts, unified paths, and so on. Elasticsearch supports installation packages for various Linux distributions that we can use. Let’s see how this works.
Installing Elasticsearch on Linux
The other way to install Elasticsearch on a Linux operating system is to use packages such as RPM or DEB, depending on your Linux distribution and the supported package type. This way we can automatically adapt to system directory layout; for example, configuration and logs will go into their standard places in the /etc/ or /var/log directories. But this is not the only thing. When using packages, Elasticsearch will also install startup scripts and make our life easier. What’s more, we will be able to upgrade Elasticsearch easily by running a single command from the command line. Of course, the mentioned packages can be found at the same URL address as we mentioned previously when we talked about installing Elasticsearch from zip or tar.gz packages: https://www.elastic.co/downloads/elasticsearch. Elasticsearch can also be installed from remote repositories via standard distribution tools such as apt-get or yum.
Note
Before installing Elasticsearch, make sure that you have a proper version of Java Virtual Machine installed.
Installing Elasticsearch using RPM packages
When using a Linux distribution that supports RPM packages such as Fedora Linux (https://getfedora.org/), Elasticsearch installation is very easy. After downloading the RPM package, we just need to run the following command as root:
yum install elasticsearch-2.2.0.noarch.rpm
Alternatively, you can add the remote repository and install Elasticsearch from it (this command needs to be run as root as well):
rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
This command adds the GPG key and allows the system to verify that the fetched package really comes from Elasticsearch developers. In the second step, we need to create the repository definition in the /etc/yum.repos.d/elasticsearch.repo file. We need to add the following entries to this file:
[elasticsearch-2.2]
name=Elasticsearch repository for 2.2.x packages
baseurl=http://packages.elastic.co/elasticsearch/2.x/centos
gpgcheck=1
gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
Now it’s time to install the Elasticsearch server, which is as simple as running the following command (again, don’t forget to run it as root):
yum install elasticsearch
Elasticsearch will be automatically downloaded, verified, and installed.
Installing Elasticsearch using the DEB package
When using a Linux distribution that supports DEB packages (such as Debian), installing Elasticsearch is again very easy. After downloading the DEB package, all you need to do is run the following command:
sudo dpkg -i elasticsearch-2.2.0.deb
It is as simple as that. Another way, which is similar to what we did with RPM packages, is to create a new packages source and install Elasticsearch from the remote repository. The first step is to add the public GPG key used for package verification. We can do that using the following command:
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
The second step is adding the DEB package location. We need to add the following line to the /etc/apt/sources.list file:
deb http://packages.elastic.co/elasticsearch/2.2/debian stable main
This defines the source for the Elasticsearch packages. The last step is updating the list of remote packages and installing Elasticsearch using the following command:
sudo apt-get update && sudo apt-get install elasticsearch
Elasticsearch configuration file localization
When using packages to install Elasticsearch, the configuration files are in slightly different directories than the default config directory. After the installation, the configuration files should be stored in the following locations:
/etc/sysconfig/elasticsearch or /etc/default/elasticsearch: A file with the configuration of the Elasticsearch process, such as the user to run as, the directories for logs and data, and memory settings
/etc/elasticsearch/: A directory for the Elasticsearch configuration files, such as the elasticsearch.yml file
Configuring Elasticsearch as a system service on Linux
If everything goes well, you can run Elasticsearch using the following command:
/bin/systemctl start elasticsearch.service
If you want Elasticsearch to start automatically every time the operating system starts, you can set up Elasticsearch as a system service by running the following command:
/bin/systemctl enable elasticsearch.service
Elasticsearch as a system service on Windows
Installing Elasticsearch as a system service on Windows is also very easy. You just need to go to your Elasticsearch installation directory, then go to the bin subdirectory, and run the following command:
service.bat install
You’ll be asked for permission to do so. If you allow the script to run, Elasticsearch will be installed as a Windows service.
If you would like to see all the commands exposed by the service.bat script file, just run the following command in the same directory as earlier:
service.bat
For example, to start Elasticsearch, we will just run the following command:
service.bat start
Manipulating data with the REST API
Elasticsearch exposes a very rich REST API that can be used to search through the data, index the data, and control Elasticsearch behavior. You can imagine that using the REST API allows you to get a single document, index or update a document, get the information on Elasticsearch current state, create or delete indices, or force Elasticsearch to move around shards of your indices. Of course, these are only examples that show what you can expect from the Elasticsearch REST API. For now, we will concentrate on using the create, retrieve, update, delete (CRUD) part of the Elasticsearch API (https://en.wikipedia.org/wiki/Create,_read,_update_and_delete), which allows us to use Elasticsearch in a fashion similar to how we would use any other NoSQL (https://en.wikipedia.org/wiki/NoSQL) data store.
Understanding the REST API
If you’ve never used an application exposing the REST API, you may be surprised how easy it is to use such applications and remember how to use them. In REST-like architectures, every request is directed to a concrete object indicated by a path in the address. For example, let’s assume that our hypothetical application exposes the /books REST end-point as a reference to the list of books. In such case, a call to /books/1 could be a reference to a concrete book with the identifier 1. You can think of it as a data-oriented model of an API. Of course, we can nest the paths; for example, a path such as /books/1/chapters could return the list of chapters of our book with identifier 1 and a path such as /books/1/chapters/6 could be a reference to the sixth chapter in that particular book.
We talked about paths, but when using the HTTP protocol (https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol), we have some additional verbs (such as POST, GET, PUT, and so on) that we can use to define system behavior in addition to paths. So if we would like to retrieve the book with identifier 1, we would use the GET request method with the /books/1 path. However, we would use the PUT request method with the same path to create a book record with the identifier of 1, the POST request method to alter the record, DELETE to remove that entry, and the HEAD request method to get basic information about the data referenced by the path.
Now,let’slookatexampleHTTPrequeststhataresenttorealElasticsearchRESTAPIendpoints,sotheprecedinghypotheticalinformationwillbeturnedintosomethingreal:
GEThttp://localhost:9200/:ThisretrievesbasicinformationaboutElasticsearch,suchastheversion,thenameofthenodethatthecommandhasbeensentto,thenameoftheclusterthatnodeisconnectedto,theApacheLuceneversion,andsoon.
GEThttp://localhost:9200/_cluster/state/nodes/Thisretrievesinformationaboutallthenodesinthecluster,suchastheiridentifiers,names,transportaddresseswithports,andadditionalnodeattributesforeachnode.
DELETEhttp://localhost:9200/books/book/123:Thisdeletesadocumentthatisindexedinthebooksindex,withthebooktypeandanidentifierof123.
WenowknowwhatRESTmeansandwecanstartconcentratingonElasticsearchtoseehowwecanstore,retrieve,alter,anddeletethedatafromitsindices.IfyouwouldliketoreadmoreaboutREST,pleaserefertohttp://en.wikipedia.org/wiki/Representational_state_transfer.
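The pairing of a verb with a path can be sketched in a few lines of Python. This is only an illustration: the action names, the helper functions, and the BASE_URL constant are assumptions made for this sketch and are not part of Elasticsearch or of any client library.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def document_url(index, doc_type, doc_id=None, base_url=BASE_URL):
    """Build the path-style address of a document (or of a type when doc_id is None)."""
    parts = [index, doc_type] + ([str(doc_id)] if doc_id is not None else [])
    return base_url + "/" + "/".join(quote(part, safe="") for part in parts)


def crud_request(action, index, doc_type, doc_id):
    """Map a hypothetical action name to the (HTTP method, URL) pair described above."""
    methods = {"create": "PUT", "retrieve": "GET", "delete": "DELETE", "info": "HEAD"}
    return methods[action], document_url(index, doc_type, doc_id)


print(crud_request("delete", "books", "book", 123))
print(crud_request("retrieve", "books", "book", 123))
```

The same path appears in both results; only the verb changes, which is the data-oriented model described above.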
Storing data in Elasticsearch
In Elasticsearch, every document is represented by three attributes: the index, the type, and the identifier. Each document must be indexed into a single index, needs to have a type that corresponds to the document structure, and is described by the identifier. These three attributes allow us to identify any document in Elasticsearch, and they need to be provided when the document is physically written to the underlying Apache Lucene index. Having this knowledge, we are now ready to create our first Elasticsearch document.
Creating a new document
We will start learning the Elasticsearch REST API by indexing one document. Let’s imagine that we are building a CMS system (http://en.wikipedia.org/wiki/Content_management_system) that will provide the functionality of a blogging platform for our internal users. We will have different types of documents in our indices, but the most important ones are the articles that will be published and are readable by users.
Because we talk to Elasticsearch using JSON notation and Elasticsearch responds to us again using JSON, our example document could look as follows:
{
"id":"1",
"title":"New version of Elasticsearch released!",
"content":"Version 2.2 released today!",
"priority":10,
"tags":["announce","elasticsearch","release"]
}
As you can see in the preceding code snippet, the JSON document is built with a set of fields, where each field can have a different format. In our example, we have a set of text fields (id, title, and content), we have a number (the priority field), and an array of text values (the tags field). We will show documents that are more complicated in the next examples.
Note
One of the changes introduced in Elasticsearch 2.0 has been that field names can’t contain the dot character. Such field names were possible in older versions of Elasticsearch, but could result in serialization errors in certain cases and thus Elasticsearch creators decided to remove that possibility.
One thing to remember is that by default Elasticsearch works as a schema-less data store. This means that it can try to guess the type of the field in a document sent to Elasticsearch. It will try to use numeric types for the values that are not enclosed in quotation marks and strings for data enclosed in quotation marks. It will also try to detect dates and index them in dedicated fields, and so on. This is possible because the JSON format is semi-typed. Internally, when the first document with a new field is sent to Elasticsearch, it will be processed and mappings will be written (we will talk more about mappings in the Mappings configuration section of Chapter 2, Indexing Your Data).
Note
A schema-less approach and dynamic mappings can be problematic when documents come with a slightly different structure. For example, the first document would contain the value of the priority field without quotation marks (like the one shown in the discussed example), while the second document would have quotation marks for the value in the priority field. This will result in an error because Elasticsearch will try to put a text value in the numeric field and this is not possible in Lucene. Because of this, it is advisable to define your own mappings, which you will learn in the Mappings configuration section of Chapter 2, Indexing Your Data.
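The conflict described in the note can be illustrated with a toy model in Python. It is a deliberately simplified sketch, not the algorithm Elasticsearch uses: the function names and the reduced set of type names are assumptions, and date detection and value coercion are left out.

```python
def guess_field_type(value):
    """Rough stand-in for dynamic mapping: derive a field type from a JSON value."""
    if isinstance(value, bool):  # check bool first, because bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError("unsupported value in this sketch: %r" % (value,))


def check_against_mapping(mapping, document):
    """Record types for new fields and report fields that contradict the mapping."""
    conflicts = []
    for field, value in document.items():
        guessed = guess_field_type(value)
        known = mapping.setdefault(field, guessed)  # the first document fixes the type
        if known != guessed:
            conflicts.append((field, known, guessed))
    return conflicts


mapping = {}
print(check_against_mapping(mapping, {"title": "First", "priority": 10}))
print(check_against_mapping(mapping, {"title": "Second", "priority": "high"}))
```

The first call fills the mapping and reports nothing; the second reports the priority field, because the type was already fixed as numeric by the first document.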
Let’s now index our document and make it available for retrieval and searching. We will index our articles to an index called blog under a type named article. We will also give our document an identifier of 1, as this is our first document. To index our example document, we will execute the following command:
curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "New version of Elasticsearch released!", "content": "Version 2.2 released today!", "priority": 10, "tags": ["announce", "elasticsearch", "release"]}'
Note a new option to the curl command, the -d parameter. The value of this option is the text that will be used as a request payload, that is, a request body. This way, we can send additional information such as the document definition. Also, note that the unique identifier is placed in the URL and not in the body. If you omit this identifier (while using the HTTP PUT request), the indexing request will return the following error:
No handler found for uri [/blog/article] and method [PUT]
If everything worked correctly, Elasticsearch will return a JSON response informing us about the status of the indexing operation. This response should be similar to the following one:
{
"_index":"blog",
"_type":"article",
"_id":"1",
"_version":1,
"_shards":{
"total":2,
"successful":1,
"failed":0},
"created":true
}
In the preceding response, Elasticsearch included information about the status of the operation, index, type, identifier, and version. We can also see information about the shards that took part in the operation: all of them, the ones that were successful, and the ones that failed.
Automatic identifier creation
In the previous example, we specified the document identifier manually when we were sending the document to Elasticsearch. However, there are use cases when we don’t have an identifier for our documents, for example, when handling logs as our data. In such cases, we would like some application to create the identifier for us and Elasticsearch can be such an application. Of course, generating document identifiers doesn’t make sense when your documents already have them, such as data coming from a relational database. You will probably want to update such documents later using their own identifiers, so automatic identifier generation is not the best idea there. However, when we are in need of such functionality, instead of using the HTTP PUT method we can use POST and omit the identifier in the REST API path. So if we would like Elasticsearch to generate the identifier in the previous example, we would send a command like this:
curl -XPOST 'http://localhost:9200/blog/article/' -d '{"title": "New version of Elasticsearch released!", "content": "Version 2.2 released today!", "priority": 10, "tags": ["announce", "elasticsearch", "release"]}'
We’ve used the HTTP POST method instead of PUT and we’ve omitted the identifier. The response produced by Elasticsearch in such a case would be as follows:
{
"_index":"blog",
"_type":"article",
"_id":"AU1y-s6w2WzST_RhTvCJ",
"_version":1,
"_shards":{
"total":2,
"successful":1,
"failed":0},
"created":true
}
As you can see, the response returned by Elasticsearch is almost the same as in the previous example, with a minor difference in the value of the _id field. Now, instead of the 1 value, we have a value of AU1y-s6w2WzST_RhTvCJ, which is the identifier Elasticsearch generated for our document.
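The choice between PUT with an identifier and POST without one could be wrapped in a small helper. The following Python sketch uses only the standard library; the index_request function and its parameters are hypothetical names introduced for illustration, and the request is only prepared, not sent.

```python
import json
import urllib.request


def index_request(index, doc_type, document, doc_id=None,
                  base_url="http://localhost:9200"):
    """Prepare (but do not send) an indexing request.

    With doc_id the request is PUT /index/type/id; without it the request is
    POST /index/type/ so that Elasticsearch generates the identifier.
    """
    if doc_id is None:
        method, url = "POST", "%s/%s/%s/" % (base_url, index, doc_type)
    else:
        method, url = "PUT", "%s/%s/%s/%s" % (base_url, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method=method)


article = {"title": "New version of Elasticsearch released!", "priority": 10}
explicit = index_request("blog", "article", article, doc_id=1)
generated = index_request("blog", "article", article)
print(explicit.get_method(), explicit.full_url)
print(generated.get_method(), generated.full_url)
```

Assuming a node is listening on localhost:9200, either request object could then be passed to urllib.request.urlopen to actually send it.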
Retrieving documents
We now have two documents indexed into our Elasticsearch instance: one using an explicit identifier and one using a generated identifier. Let’s now try to retrieve one of the documents using its unique identifier. To do this, we will need information about the index the document is indexed in, what type it has, and of course what identifier it has. For example, to get the document from the blog index with the article type and the identifier of 1, we would run the following HTTP GET request:
curl -XGET 'localhost:9200/blog/article/1?pretty'
Note
The additional URI property called pretty tells Elasticsearch to include new line characters and additional white spaces in the response to make the output easier to read for users.
Elasticsearch will return a response similar to the following:
{
"_index":"blog",
"_type":"article",
"_id":"1",
"_version":1,
"found":true,
"_source":{
"title":"New version of Elasticsearch released!",
"content":"Version 2.2 released today!",
"priority":10,
"tags":["announce","elasticsearch","release"]
}
}
As you can see in the preceding response, Elasticsearch returned the _source field, which is the original document sent to Elasticsearch, and a few additional fields that tell us about the document, such as the index, type, identifier, document version, and of course information as to whether the document was found or not (the found property).
If we try to retrieve a document that is not present in the index, such as the one with the 12345 identifier, we get a response like this:
{
"_index":"blog",
"_type":"article",
"_id":"12345",
"found":false
}
As you can see, this time the value of the found property was set to false and there was no _source field because the document has not been retrieved.
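Client code usually has to handle both shapes of the response. A minimal Python sketch, assuming the response body has already been read into a string; the extract_source helper is a hypothetical name and the two sample bodies are shortened versions of the responses shown above.

```python
import json


def extract_source(raw_response):
    """Return the _source of a GET response, or None when found is false."""
    response = json.loads(raw_response)
    if not response.get("found", False):
        return None
    return response["_source"]


hit = ('{"_index":"blog","_type":"article","_id":"1","_version":1,"found":true,'
       '"_source":{"title":"New version of Elasticsearch released!","priority":10}}')
miss = '{"_index":"blog","_type":"article","_id":"12345","found":false}'

print(extract_source(hit)["title"])
print(extract_source(miss))
```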
Updating documents
Updating documents in the index is a more complicated task compared to indexing. When the document is indexed and Elasticsearch flushes the document to disk, it creates segments, which are immutable structures that are written once and read many times. This is done because the inverted index created by Apache Lucene is currently impossible to update (at least most of its parts). To update a document, Elasticsearch internally first fetches the document using the GET request, modifies its _source field, removes the old document, and indexes a new document using the updated content. The content update is done using scripts in Elasticsearch (we will talk more about scripting in Elasticsearch in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better).
Note
Please note that the following document update examples require you to put the script.inline: on property into your elasticsearch.yml configuration file. This is needed because inline scripting is disabled in Elasticsearch for security reasons. The other way to handle updates is to store the script content in a file in the Elasticsearch configuration directory, but we will talk about that in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better.
Let’s now try to update our document with identifier 1 by modifying its content field to contain the sentence "This is the updated document". To do this, we need to run a POST HTTP request on the document path using the _update REST end-point. Our request to modify the document would look as follows:
curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
"script":"ctx._source.content=new_content",
"params":{
"new_content":"This is the updated document"
}
}'
As you can see, we’ve sent the request to the /blog/article/1/_update REST end-point. In the request body, we’ve provided two parameters: the update script in the script property and the parameters of the script. The script is very simple; it takes the _source field and modifies the content field by setting its value to the value of the new_content parameter. The params property contains all the script parameters.
For the preceding update command execution, Elasticsearch would return the following response:
{"_index":"blog","_type":"article","_id":"1","_version":2,"_shards":
{"total":2,"successful":1,"failed":0}}
The thing to look at in the preceding response is the _version field. Right now, the version is 2, which means that the document has been updated (or re-indexed) once. Basically, each update makes Elasticsearch update the _version field.
We could also update the document using the doc section and providing the changed field, for example:
curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
"doc":{
"content":"This is the updated document"
}
}'
We now retrieve the document using the following command:
curl -XGET 'http://localhost:9200/blog/article/1?pretty'
And we get the following response from Elasticsearch:
{
"_index":"blog",
"_type":"article",
"_id":"1",
"_version":2,
"found":true,
"_source":{
"title":"New version of Elasticsearch released!",
"content":"This is the updated document",
"priority":10,
"tags":["announce","elasticsearch","release"]
}
}
As you can see, the document has been updated properly.
Note
The thing to remember when using the update API of Elasticsearch is that the _source field needs to be present because this is the field that Elasticsearch uses to retrieve the original document content from the index. By default, that field is enabled and Elasticsearch uses it to store the original document.
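The fetch, modify, and reindex cycle behind an update can be mimicked with a toy in-memory model. This Python sketch only illustrates the sequence of steps and the version increment described in this section; the ToyIndex class is hypothetical and says nothing about how Elasticsearch or Lucene store data.

```python
import copy


class ToyIndex:
    """In-memory stand-in for one index and type: a source and a version per id."""

    def __init__(self):
        self.docs = {}  # doc_id -> {"_version": int, "_source": dict}

    def index(self, doc_id, source):
        """Store a whole document; every (re)indexing bumps the version by one."""
        version = self.docs.get(doc_id, {"_version": 0})["_version"] + 1
        self.docs[doc_id] = {"_version": version, "_source": copy.deepcopy(source)}
        return version

    def update(self, doc_id, change):
        """Fetch the current source, apply `change` to a copy, and reindex it."""
        if doc_id not in self.docs:
            raise KeyError("document missing: %s" % doc_id)
        source = copy.deepcopy(self.docs[doc_id]["_source"])  # 1. fetch
        change(source)                                        # 2. modify
        return self.index(doc_id, source)                     # 3. replace and reindex


def set_content(source):
    source["content"] = "This is the updated document"


blog = ToyIndex()
blog.index("1", {"title": "New version of Elasticsearch released!",
                 "content": "Version 2.2 released today!"})
print(blog.update("1", set_content))
print(blog.docs["1"]["_source"]["content"])
```

The update needs the stored source to work on, which is why the model keeps a copy of every document, in the same way the update API relies on the _source field.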
Dealing with non-existing documents
A nice thing about document updates, which can come in handy when using the Elasticsearch Update API, is that we can define what Elasticsearch should do when the document we try to update is not present.
For example, let’s try incrementing the priority field value for a non-existing document with identifier 2:
curl -XPOST 'http://localhost:9200/blog/article/2/_update' -d '{
"script":"ctx._source.priority+=1"
}'
The response returned by Elasticsearch would look more or less as follows:
{"error":{"root_cause":[{"type":"document_missing_exception","reason":"[article][2]: document missing","shard":"2","index":"blog"}],"type":"document_missing_exception","reason":"[article][2]: document missing","shard":"2","index":"blog"},"status":404}
As you can imagine, the document has not been updated because it doesn’t exist. So now, let’s modify our request to include the upsert section in our request body that will tell Elasticsearch what to do when the document is not present. The new command would look as follows:
curl -XPOST 'http://localhost:9200/blog/article/2/_update' -d '{
"script":"ctx._source.priority+=1",
"upsert":{
"title":"Empty document",
"priority":0,
"tags":["empty"]
}
}'
With the modified request, a new document would be indexed; if we retrieve it using the GET API, it will look as follows:
{
"_index":"blog",
"_type":"article",
"_id":"2",
"_version":1,
"found":true,
"_source":{
"title":"Empty document",
"priority":0,
"tags":["empty"]
}
}
As you can see, the fields from the upsert section of our update request were taken by Elasticsearch and used as document fields.
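The decision Elasticsearch makes here can be written down as a toy function. In this Python sketch the store is a plain dictionary and the script is a Python callable; both are assumptions made for illustration. The sketch mirrors the behavior shown above: when the document is missing, the upsert content is stored as it is and the script is not applied to it, which is why the new document has a priority of 0 and not 1.

```python
def update_with_upsert(store, doc_id, script, upsert=None):
    """Apply `script` to an existing document, or index `upsert` if it is missing.

    `store` maps identifiers to source dictionaries.
    """
    if doc_id in store:
        script(store[doc_id])
        return "updated"
    if upsert is None:
        raise KeyError("document missing: %s" % doc_id)
    store[doc_id] = dict(upsert)  # stored as given, the script is not run
    return "created"


def increment_priority(source):
    source["priority"] += 1


store = {}
empty = {"title": "Empty document", "priority": 0, "tags": ["empty"]}
print(update_with_upsert(store, "2", increment_priority, upsert=empty))
print(update_with_upsert(store, "2", increment_priority, upsert=empty))
print(store["2"]["priority"])
```

Only the second call runs the script, because by then the document exists.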
Adding partial documents
In addition to what we already wrote about the update API, Elasticsearch is also capable of merging partial documents from the update request into already existing documents, or of indexing new documents using information from the request, similar to what we saw with the upsert section.
Let’s imagine that we would like to update our initial document and add a new field called count to it (setting it to 1 initially). We would also like to index the document under the specified identifier if the document is not present. We can do this by running the following command:
curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
"doc":{
"count":1
},
"doc_as_upsert":true
}'
We specified the new field in the doc section and we said that we want the doc section to be treated as the upsert section when the document is not present (with the doc_as_upsert property set to true).
If we now retrieve that document, we see the following response:
{
"_index":"blog",
"_type":"article",
"_id":"1",
"_version":3,
"found":true,
"_source":{
"title":"New version of Elasticsearch released!",
"content":"This is the updated document",
"priority":10,
"tags":["announce","elasticsearch","release"],
"count":1
}
}
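The merge of a partial document can be pictured as a recursive dictionary merge. The following Python sketch is a toy model under stated assumptions: merge_partial and update_doc are hypothetical helpers, nested objects are merged key by key, and every other value (including arrays) is simply replaced.

```python
def merge_partial(source, partial):
    """Merge `partial` into `source` in place: objects recurse, other values replace."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(source.get(key), dict):
            merge_partial(source[key], value)
        else:
            source[key] = value
    return source


def update_doc(store, doc_id, doc, doc_as_upsert=False):
    """Merge `doc` into a stored document, or index it when doc_as_upsert is set."""
    if doc_id in store:
        return merge_partial(store[doc_id], doc)
    if not doc_as_upsert:
        raise KeyError("document missing: %s" % doc_id)
    store[doc_id] = dict(doc)
    return store[doc_id]


store = {"1": {"title": "New version of Elasticsearch released!", "priority": 10}}
print(update_doc(store, "1", {"count": 1}, doc_as_upsert=True))
print(update_doc(store, "7", {"count": 1}, doc_as_upsert=True))
```

The first call adds the count field to the existing document and keeps its other fields; the second call creates a new document that contains only count.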
Note
For a full reference on document updates, please refer to the official Elasticsearch documentation on the Update API, which is available at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html.
Deleting documents
Now that we know how to index documents, update them, and retrieve them, it is time to learn about how we can delete them. Deleting a document from an Elasticsearch index is very similar to retrieving it, but with one major difference: instead of using the HTTP GET method, we have to use the HTTP DELETE one.
For example, if we would like to delete the document indexed in the blog index under the article type and with an identifier of 1, we would run the following command:
curl -XDELETE 'localhost:9200/blog/article/1'
The response from Elasticsearch indicates that the document has been deleted and should look as follows:
{
"found":true,
"_index":"blog",
"_type":"article",
"_id":"1",
"_version":4,
"_shards":{
"total":2,
"successful":1,
"failed":0
}
}
Of course, deleting a single document is not the only option. We can also remove a whole index together with all the documents it holds. For example, if we would like to delete the entire blog index, we should just omit the identifier and the type, so the command would look like this:
curl -XDELETE 'localhost:9200/blog'
The preceding command would result in the deletion of the blog index.
Versioning
Finally, there is one last thing that we would like to talk about when it comes to data manipulation in Elasticsearch: the great feature of versioning. As you may have already noticed, Elasticsearch increments the document version when it does updates to it. We can leverage this functionality and use optimistic locking (http://en.wikipedia.org/wiki/Optimistic_concurrency_control), and avoid conflicts and overwrites when multiple processes or threads access the same document concurrently. For instance, your indexing application may try to update a document at the same moment as a user updates it while doing some manual work. The question that arises is: Which document should be the correct one? The one updated by the indexing application, the one updated by the user, or the merged document of the changes? What if the changes are conflicting? To handle such cases, we can use versioning.
Usage example
Let’s index a new document to our blog index, one with an identifier of 10, and let’s index its second version soon after we do that. The commands that do this look as follows:
curl -XPUT 'localhost:9200/blog/article/10' -d '{"title":"Test document"}'
curl -XPUT 'localhost:9200/blog/article/10' -d '{"title":"Updated test document"}'
Because we’ve indexed the document with the same identifier twice, it should have a version of 2 (you can check it using the GET request).
Now, let’s try deleting the document we’ve just indexed but let’s specify a version property equal to 1. By doing this, we tell Elasticsearch that we are interested in deleting the document with the provided version. Because the document is a different version now, Elasticsearch shouldn’t allow the deletion with version 1. Let’s check if what we say is true. The command we will use to send the delete request looks as follows:
curl -XDELETE 'localhost:9200/blog/article/10?version=1'
The response generated by Elasticsearch should be similar to the following one:
{
"error":{
"root_cause":[{
"type":"version_conflict_engine_exception",
"reason":"[article][10]: version conflict, current [2], provided [1]",
"shard":1,
"index":"blog"
}],
"type":"version_conflict_engine_exception",
"reason":"[article][10]: version conflict, current [2], provided [1]",
"shard":1,
"index":"blog"
},
"status":409
}
As you can see, the delete operation was not successful because the versions didn’t match. If we set the version property to 2, the delete operation would be successful:
curl -XDELETE 'localhost:9200/blog/article/10?version=2&pretty'
The response this time will look as follows:
{
"found":true,
"_index":"blog",
"_type":"article",
"_id":"10",
"_version":3,
"_shards":{
"total":2,
"successful":1,
"failed":0
}
}
This time the delete operation has been successful because the provided version matched the current version of the document.
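The optimistic locking check itself is small. The following Python sketch models it with an in-memory store; the VersionedStore and VersionConflict names are hypothetical, and the model only covers the default (internal) versioning shown in this example, where a provided version must equal the current one.

```python
class VersionConflict(Exception):
    """Raised when the provided version differs from the current one."""


class VersionedStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    @staticmethod
    def _check(current, provided):
        if provided is not None and provided != current:
            raise VersionConflict(
                "version conflict, current [%d], provided [%d]" % (current, provided))

    def index(self, doc_id, source, version=None):
        current = self.docs.get(doc_id, (0, None))[0]
        self._check(current, version)
        self.docs[doc_id] = (current + 1, source)
        return current + 1

    def delete(self, doc_id, version=None):
        current = self.docs[doc_id][0]
        self._check(current, version)
        del self.docs[doc_id]
        return current + 1  # the delete operation also increments the version


store = VersionedStore()
store.index("10", {"title": "Test document"})
store.index("10", {"title": "Updated test document"})
try:
    store.delete("10", version=1)
except VersionConflict as error:
    print("rejected:", error)
print("deleted, version", store.delete("10", version=2))
```

A writer that loses the race gets the conflict error and can re-read the document and decide what to do, instead of silently overwriting someone else's change.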
Versioning from external systems
The very good thing about Elasticsearch versioning capabilities is that we can provide the version of the document that we would like Elasticsearch to use. This allows us to provide versions from external data systems that are our primary data stores. To do this, we need to provide an additional parameter during indexing, version_type=external, and, of course, the version itself. For example, if we would like our document to have the 12345 version, we could send a request like this:
curl -XPUT 'localhost:9200/blog/article/20?version=12345&version_type=external' -d '{"title":"Test document"}'
The response returned by Elasticsearch is as follows:
{
"_index":"blog",
"_type":"article",
"_id":"20",
"_version":12345,
"_shards":{
"total":2,
"successful":1,
"failed":0
},
"created":true
}
We just need to remember that, when using version_type=external, we need to provide the version every time we index the document. In cases where we would like to change the document and use optimistic locking, we need to provide a version parameter higher than the version present in the document (a version equal to the stored one is accepted only with the external_gte version type).
Searching with the URI request query
Before getting into the wonderful world of the Elasticsearch query language, we would like to introduce you to the simple but pretty flexible URI request search, which allows us to use a simple Elasticsearch query combined with the Lucene query language. Of course, we will extend our search knowledge using Elasticsearch in Chapter 3, Searching Your Data, but for now we will stick to the simplest approach.
Sample data
For the purpose of this section of the book, we will create a simple index with two document types. To do this, we will run the following six commands:
curl -XPOST 'localhost:9200/books/es/1' -d '{"title":"Elasticsearch Server", "published": 2013}'
curl -XPOST 'localhost:9200/books/es/2' -d '{"title":"Elasticsearch Server Second Edition", "published": 2014}'
curl -XPOST 'localhost:9200/books/es/3' -d '{"title":"Mastering Elasticsearch", "published": 2013}'
curl -XPOST 'localhost:9200/books/es/4' -d '{"title":"Mastering Elasticsearch Second Edition", "published": 2015}'
curl -XPOST 'localhost:9200/books/solr/1' -d '{"title":"Apache Solr 4 Cookbook", "published": 2012}'
curl -XPOST 'localhost:9200/books/solr/2' -d '{"title":"Solr Cookbook Third Edition", "published": 2015}'
Running the preceding commands will create the books index with two types: es and solr. The title and published fields will be indexed and thus, searchable.
URI search
All queries in Elasticsearch are sent to the _search endpoint. You can search a single index or multiple indices, and you can restrict your search to a given document type or multiple types. For example, in order to search our books index, we will run the following command:
curl -XGET 'localhost:9200/books/_search?pretty'
The results returned by Elasticsearch will include all the documents from our books index (because no query has been specified) and should look similar to the following:
{
"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":6,
"max_score":1.0,
"hits":[{
"_index":"books",
"_type":"es",
"_id":"2",
"_score":1.0,
"_source":{
"title":"Elasticsearch Server Second Edition",
"published":2014
}
},{
"_index":"books",
"_type":"es",
"_id":"4",
"_score":1.0,
"_source":{
"title":"Mastering Elasticsearch Second Edition",
"published":2015
}
},{
"_index":"books",
"_type":"solr",
"_id":"2",
"_score":1.0,
"_source":{
"title":"Solr Cookbook Third Edition",
"published":2015
}
},{
"_index":"books",
"_type":"es",
"_id":"1",
"_score":1.0,
"_source":{
"title":"Elasticsearch Server",
"published":2013
}
},{
"_index":"books",
"_type":"solr",
"_id":"1",
"_score":1.0,
"_source":{
"title":"Apache Solr 4 Cookbook",
"published":2012
}
},{
"_index":"books",
"_type":"es",
"_id":"3",
"_score":1.0,
"_source":{
"title":"Mastering Elasticsearch",
"published":2013
}
}]
}
}
As you can see, the response has a header that tells you the total time of the query and the shards used in the query process. In addition to this, we have documents matching the query (the top 10 documents by default). Each document is described by the index, type, identifier, score, and the source of the document, which is the original document sent to Elasticsearch.
We can also run queries against many indices. For example, if we had another index called clients, we could also run a single query against these two indices as follows:
curl -XGET 'localhost:9200/books,clients/_search?pretty'
We can also run queries against all the data in Elasticsearch by omitting the index names completely or by setting the index name to _all:
curl -XGET 'localhost:9200/_search?pretty'
curl -XGET 'localhost:9200/_all/_search?pretty'
In a similar manner, we can also choose the types we want to use during searching. For example, if we want to search only in the es type in the books index, we run a command as follows:
curl -XGET 'localhost:9200/books/es/_search?pretty'
Please remember that, in order to search for a given type, we need to specify the index or multiple indices. Elasticsearch allows us to have quite rich semantics when it comes to choosing index names. If you are interested, please refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html; however, there is one thing we would like to point out. When running a query against multiple indices, it may happen that some of them do not exist or are closed. In such cases, the ignore_unavailable property comes in handy. When set to true, it tells Elasticsearch to ignore unavailable or closed indices.
For example, let’s try running the following query:
curl -XGET 'localhost:9200/books,non_existing/_search?pretty'
The response would be similar to the following one:
{
"error":{
"root_cause":[{
"type":"index_missing_exception",
"reason":"no such index",
"index":"non_existing"
}],
"type":"index_missing_exception",
"reason":"no such index",
"index":"non_existing"
},
"status":404
}
Now let’s check what will happen if we add the ignore_unavailable=true parameter to our request and execute the following command:
curl -XGET 'localhost:9200/books,non_existing/_search?pretty&ignore_unavailable=true'
In this case, Elasticsearch would return the results without any error.
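The URL patterns used in this section (no index, one or many indices, optional types, and extra URI parameters) can be generated by one small helper. This Python sketch is only an illustration; the search_url function and its arguments are hypothetical, and the URLs are printed rather than requested.

```python
from urllib.parse import urlencode


def search_url(indices=(), types=(), host="localhost:9200", **params):
    """Build a _search URL for any number of indices and types."""
    if types and not indices:
        raise ValueError("types can only be given together with at least one index")
    path = [",".join(indices)] if indices else []
    if types:
        path.append(",".join(types))
    path.append("_search")
    url = "http://%s/%s" % (host, "/".join(path))
    if params:
        url += "?" + urlencode(params)  # joins the parameters with the & character
    return url


print(search_url())
print(search_url(["books", "clients"]))
print(search_url(["books"], ["es"]))
print(search_url(["books", "non_existing"], ignore_unavailable="true"))
```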
Elasticsearch query response
Let’s assume that we want to find all the documents in our books index that contain the elasticsearch term in the title field. We can do this by running the following query:
curl -XGET 'localhost:9200/books/_search?pretty&q=title:elasticsearch'
The response returned by Elasticsearch for the preceding request will be as follows:
{
"took":37,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.625,
"hits":[{
"_index":"books",
"_type":"es",
"_id":"1",
"_score":0.625,
"_source":{
"title":"Elasticsearch Server",
"published":2013
}
},{
"_index":"books",
"_type":"es",
"_id":"2",
"_score":0.5,
"_source":{
"title":"Elasticsearch Server Second Edition",
"published":2014
}
},{
"_index":"books",
"_type":"es",
"_id":"4",
"_score":0.5,
"_source":{
"title":"Mastering Elasticsearch Second Edition",
"published":2015
}
},{
"_index":"books",
"_type":"es",
"_id":"3",
"_score":0.19178301,
"_source":{
"title":"Mastering Elasticsearch",
"published":2013
}
}]
}
}
The first section of the response gives us information about how much time the request took (the took property is specified in milliseconds), whether it was timed out (the timed_out property), and information about the shards that were queried during the request execution: the number of queried shards (the total property of the _shards object), the number of shards that returned the results successfully (the successful property of the _shards object), and the number of failed shards (the failed property of the _shards object). The query may also time out if it is executed for a longer period than we want. (We can specify the maximum query execution time using the timeout parameter.) A failed shard means that something went wrong with that shard or it was not available during the search execution.
Of course, the mentioned information can be useful, but usually, we are interested in the results that are returned in the hits object. We have the total number of documents returned by the query (in the total property) and the maximum score calculated (in the max_score property). Finally, we have the hits array that contains the returned documents. In our case, each returned document contains its index name (the _index property), the type (the _type property), the identifier (the _id property), the score (the _score property), and the _source field (usually, this is the JSON object sent for indexing).
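Reading these properties in client code is mostly dictionary access. A minimal Python sketch, assuming the response has already been parsed from JSON; the summarize helper and the keys of its result are hypothetical, and the sample response is a shortened version of the one shown above.

```python
def summarize(response):
    """Collect the header information and an (id, score, title) tuple per hit."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "complete": not response["timed_out"] and shards["failed"] == 0,
        "total": hits["total"],
        "rows": [(hit["_id"], hit["_score"], hit["_source"]["title"])
                 for hit in hits["hits"]],
    }


sample = {
    "took": 37,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 4,
        "max_score": 0.625,
        "hits": [
            {"_index": "books", "_type": "es", "_id": "1", "_score": 0.625,
             "_source": {"title": "Elasticsearch Server", "published": 2013}},
            {"_index": "books", "_type": "es", "_id": "2", "_score": 0.5,
             "_source": {"title": "Elasticsearch Server Second Edition",
                         "published": 2014}},
        ],
    },
}

summary = summarize(sample)
print(summary["total"], summary["complete"])
for row in summary["rows"]:
    print(row)
```

Note that total counts all matching documents, while the hits array holds only the documents that were actually returned.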
Query analysis
You may wonder why the query we’ve run in the previous section worked. We indexed the Elasticsearch term and ran a query for elasticsearch, and even though they differ (capitalization), the relevant documents were found. The reason for this is the analysis. During indexing, the underlying Lucene library analyzes the documents and indexes the data according to the Elasticsearch configuration. By default, Elasticsearch will tell Lucene to index and analyze both string-based data as well as numbers. The same happens during querying because the URI request query maps to the query_string query (which will be discussed in Chapter 3, Searching Your Data), and this query is analyzed by Elasticsearch.
Let’s use the indices-analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html). It allows us to see how the analysis process is done. With this, we can see what happened to one of the documents during indexing and what happened to our query phrase during querying.
In order to see how the Elasticsearch Server phrase was indexed in the title field, we will run the following command:
curl -XGET 'localhost:9200/books/_analyze?pretty&field=title' -d 'Elasticsearch Server'
The response will be as follows:
{
"tokens":[{
"token":"elasticsearch",
"start_offset":0,
"end_offset":13,
"type":"<ALPHANUM>",
"position":0
},{
"token":"server",
"start_offset":14,
"end_offset":20,
"type":"<ALPHANUM>",
"position":1
}]
}
You can see that Elasticsearch has divided the text into two terms: the first one has a token value of elasticsearch and the second one has a token value of server.
Now let’s look at how the query text was analyzed. We can do this by running the following command:
curl -XGET 'localhost:9200/books/_analyze?pretty&field=title' -d 'elasticsearch'
The response of the request will look as follows:
{
"tokens":[{
"token":"elasticsearch",
"start_offset":0,
"end_offset":13,
"type":"<ALPHANUM>",
"position":0
}]
}
We can see that the word is the same as the original one that we passed to the query. We won’t get into the Lucene query details and how the query parser constructed the query, but in general the indexed term after the analysis was the same as the one in the query after the analysis; so, the document matched the query and the result was returned.
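The effect of the analysis can be imitated with a few lines of Python. This is a rough approximation written for illustration, not the analyzer Elasticsearch uses: the analyze function below is hypothetical, handles ASCII letters and digits only, and simply splits the text and lowercases every token.

```python
import re


def analyze(text):
    """Split on non-alphanumeric characters and lowercase each token."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


indexed = analyze("Elasticsearch Server")
queried = analyze("elasticsearch")
for token in indexed:
    print(token)

# A document matches when a query term equals one of the indexed terms.
indexed_terms = {token["token"] for token in indexed}
print(all(token["token"] in indexed_terms for token in queried))
```

Because both sides go through the same processing, the capitalized title and the lowercase query end up with the same term, which is what makes the match possible.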
URI query string parameters
There are a few parameters that we can use to control URI query behavior, which we will discuss now. The thing to remember is that the parameters in the query should be joined with the & character, as shown in the following example:
curl -XGET 'localhost:9200/books/_search?pretty&q=published:2013&df=title&explain=true&default_operator=AND'
Please remember to enclose the URL of the request using the ' characters because, on Linux-based systems, the & character would otherwise be interpreted by the shell.
The query
The q parameter allows us to specify the query that we want our documents to match. It allows us to specify the query using the Lucene query syntax described in the Lucene query syntax section later in this chapter. For example, a simple query would look like this: q=title:elasticsearch.
The default search field
Using the df parameter, we can specify the default search field that should be used when no field indicator is used in the q parameter. By default, the _all field will be used. (This is the field that Elasticsearch uses to copy the content of all the other fields. We will discuss this in greater depth in Chapter 2, Indexing Your Data.) An example of the df parameter value can be df=title.
Analyzer
The analyzer property allows us to define the name of the analyzer that should be used to analyze our query. By default, our query will be analyzed by the same analyzer that was used to analyze the field contents during indexing.
The default operator property
The default_operator property, which can be set to OR or AND, allows us to specify the default Boolean operator used for our query (http://en.wikipedia.org/wiki/Boolean_algebra). By default, it is set to OR, which means that a single query term match will be enough for a document to be returned. Setting this parameter to AND for a query will result in returning the documents that match all the query terms.
Query explanation
If we set the explain parameter to true, Elasticsearch will include additional explain information with each document in the result, such as the shard from which the document was fetched and the detailed information about the scoring calculation (we will talk more about it in the Understanding the explain information section in Chapter 6, Make Your Search Better). Also remember not to fetch the explain information during normal search queries because it requires additional resources and degrades query performance. For example, a query that includes explain information could look as follows:
curl -XGET 'localhost:9200/books/_search?pretty&explain=true&q=title:solr'
The results returned by Elasticsearch for the preceding query would be as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":2,
"max_score":0.70273256,
"hits":[{
"_shard":2,
"_node":"v5iRsht9SOWVzu-GY-YHlA",
"_index":"books",
"_type":"solr",
"_id":"2",
"_score":0.70273256,
"_source":{
"title":"Solr Cookbook Third Edition",
"published":2015
},
"_explanation":{
"value":0.70273256,
"description":"weight(title:solr in 0) [PerFieldSimilarity], result of:",
"details":[{
"value":0.70273256,
"description":"fieldWeight in 0, product of:",
"details":[{
"value":1.0,
"description":"tf(freq=1.0), with freq of:",
"details":[{
"value":1.0,
"description":"termFreq=1.0",
"details":[]
}]
},{
"value":1.4054651,
"description":"idf(docFreq=1, maxDocs=3)",
"details":[]
},{
"value":0.5,
"description":"fieldNorm(doc=0)",
"details":[]
}]
}]
}
},{
"_shard":3,
"_node":"v5iRsht9SOWVzu-GY-YHlA",
"_index":"books",
"_type":"solr",
"_id":"1",
"_score":0.5,
"_source":{
"title":"Apache Solr 4 Cookbook",
"published":2012
},
"_explanation":{
"value":0.5,
"description":"weight(title:solr in 1) [PerFieldSimilarity], result of:",
"details":[{
"value":0.5,
"description":"fieldWeight in 1, product of:",
"details":[{
"value":1.0,
"description":"tf(freq=1.0), with freq of:",
"details":[{
"value":1.0,
"description":"termFreq=1.0",
"details":[]
}]
},{
"value":1.0,
"description":"idf(docFreq=1, maxDocs=2)",
"details":[]
},{
"value":0.5,
"description":"fieldNorm(doc=1)",
"details":[]
}]
}]
}
}]
}
}
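The numbers in the _explanation section can be checked by hand. The following Python sketch recomputes both scores, assuming the classic Lucene TF/IDF formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are ours and are not part of any Elasticsearch API:

```python
import math


def tf(freq):
    # Assumed classic Lucene similarity: square root of the term frequency.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Assumed classic Lucene similarity: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight is reported as the product of tf, idf, and fieldNorm.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Input values copied from the explain output of the two hits.
first_hit = field_weight(freq=1.0, doc_freq=1, max_docs=3, field_norm=0.5)
second_hit = field_weight(freq=1.0, doc_freq=1, max_docs=2, field_norm=0.5)
print(first_hit, second_hit)
```

The two hits come from different shards (2 and 3), and each shard reports its own maxDocs, which is why the same term yields different idf values here.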
The fields returned
By default, for each document returned, Elasticsearch will include the index name, the type name, the document identifier, score, and the _source field. We can modify this behavior by adding the fields parameter and specifying a comma-separated list of field names. The field will be retrieved from the stored fields (if they exist; we will discuss them in Chapter 2, Indexing Your Data) or from the internal _source field. By default, the value of the fields parameter is _source. An example is: fields=title,priority.
We can also disable the fetching of the _source field by adding the _source parameter with its value set to false.
Sorting the results
Using the sort parameter, we can specify custom sorting. The default behavior of Elasticsearch is to sort the returned documents in descending order of the value of the _score field. If we want to sort our documents differently, we need to specify the sort parameter. For example, adding sort=published:desc will sort the documents in descending order of the published field. By adding the sort=published:asc parameter, we will tell Elasticsearch to sort the documents on the basis of the published field in ascending order.
If we specify custom sorting, Elasticsearch will omit the _score field calculation for the documents. This may not be the desired behavior in your case. If you still want to keep track of the scores for each document when using a custom sort, you should add the track_scores=true property to your query. Please note that tracking the scores when doing custom sorting will make the query a little bit slower (you may not even notice the difference) due to the processing power needed to calculate the score.
The search timeout
By default, Elasticsearch doesn’t have a timeout for queries, but you may want your queries to time out after a certain amount of time (for example, 5 seconds). Elasticsearch allows you to do this by exposing the timeout parameter. When the timeout parameter is specified, the query will be executed up to the given timeout value and the results that were gathered up to that point will be returned. To specify a timeout of 5 seconds, you will have to add the timeout=5s parameter to your query.
The results window
Elasticsearch allows you to specify the results window (the range of documents in the results list that should be returned). We have two parameters that allow us to specify the results window size: size and from. The size parameter defaults to 10 and defines the maximum number of results returned. The from parameter defaults to 0 and specifies from which document the results should be returned. In order to return five documents starting from the 11th one, we will add the following parameters to the query: size=5&from=10.
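When paging through results, the from value is usually derived from a page number. The following Python sketch shows that arithmetic. The results_window helper and its 1-based page numbering are our own illustration, not something Elasticsearch provides:

```python
def results_window(page, page_size=10):
    """Return the from and size values for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Page 1 starts at offset 0, page 2 at offset page_size, and so on.
    return {"from": (page - 1) * page_size, "size": page_size}


# Five documents starting from the 11th one is the third page of size 5.
window = results_window(page=3, page_size=5)
print(f"size={window['size']}&from={window['from']}")
```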
Limiting per-shard results
Elasticsearch allows us to specify the maximum number of documents that should be fetched from each shard using the terminate_after property and specifying the maximum number of documents. For example, if we want to get no more than 100 documents from each shard, we can add terminate_after=100 to our URI request.
Ignoring unavailable indices
When running queries against multiple indices, it is handy to tell Elasticsearch that we don’t care about the indices that are not available. By default, Elasticsearch will throw an error if one of the indices is not available, but we can change this by simply adding the ignore_unavailable=true parameter to our URI request.
The search type
The URI query allows us to specify the search type using the search_type parameter, which defaults to query_then_fetch. Two values that we can use here are: dfs_query_then_fetch and query_then_fetch. The rest of the search types available in older Elasticsearch versions are now deprecated or removed. We’ll learn more about search types in the Understanding the querying process section of Chapter 3, Searching Your Data.
Lowercasing term expansion
Some queries, such as the prefix query, use query expansion. We will discuss this in the Query rewrite section in Chapter 4, Extending Your Querying Knowledge. We are allowed to define whether the expanded terms should be lowercased or not using the lowercase_expanded_terms property. By default, the lowercase_expanded_terms property is set to true, which means that the expanded terms will be lowercased.
Wildcard and prefix analysis
By default, wildcard queries and prefix queries are not analyzed. If we want to change this behavior, we can set the analyze_wildcard property to true.
Note
If you want to see all the parameters exposed by Elasticsearch as the URI request parameters, please refer to the official documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html.
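Several of the URI parameters discussed in this section are often combined in one request. The following Python sketch assembles such a URL with the standard library only. The build_uri_search helper is hypothetical, and the host and the books index are taken from the earlier examples. The sketch only builds the string and does not contact a cluster:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_uri_search(index, query, host="http://localhost:9200", **params):
    """Build a URI search URL; parameters set to None are skipped."""
    all_params = {"q": query}
    all_params.update({k: v for k, v in params.items() if v is not None})
    # urlencode escapes reserved characters such as ':' in the values.
    return f"{host}/{index}/_search?{urlencode(all_params)}"


url = build_uri_search(
    "books",
    "title:solr",
    sort="published:desc",
    track_scores="true",
    timeout="5s",
    size=5,
    # 'from' is a Python keyword, so it is passed through a dict.
    **{"from": 10},
)
print(url)
```

Running the printed URL through curl -XGET against a node that holds the books index should behave like the hand-written requests shown earlier.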
Lucene query syntax
We thought that it would be good to know a bit more about what syntax can be used in the q parameter passed in the URI query. Some of the queries in Elasticsearch (such as the one currently being discussed) support the Lucene query parser syntax, the language that allows you to construct queries. Let’s take a look at it and discuss some basic features.
A query that we pass to Lucene is divided into terms and operators by the query parser. Let’s start with the terms; they come in two types: single terms and phrases. For example, to query for the book term in the title field, we will pass the following query:
title:book
To query for the elasticsearch book phrase in the title field, we will pass the following query:
title:"elasticsearch book"
You may have noticed that the name of the field comes first, followed by the term or the phrase.
As we already said, the Lucene query syntax supports operators. For example, the + operator tells Lucene that the given part must be matched in the document, meaning that the term we are searching for must be present in the field in the document. The - operator is the opposite, which means that such a part of the query can’t be present in the document. A part of the query without the + or - operator is treated as optional: it can be matched, but it is not mandatory. So, if we want to find a document with the book term in the title field and without the cat term in the description field, we send the following query:
+title:book -description:cat
We can also group multiple terms with parentheses, as shown in the following query:
title:(crime punishment)
We can also boost parts of the query (this increases their importance for the scoring algorithm; the higher the boost, the more important the query part is) with the ^ operator and the boost value after it, as shown in the following query:
title:book^4
These are the basics of the Lucene query language and should allow you to use Elasticsearch and construct queries without any problems. However, if you are interested in the Lucene query syntax and you would like to explore that in depth, please refer to the official documentation of the query parser available at http://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html.
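One practical caveat when placing Lucene syntax in the q parameter: a literal + in a URL query string is commonly decoded as a space, so the operator can be lost unless it is percent-encoded. The following Python sketch, with a hypothetical encode_q helper, shows one way to encode a query before appending it to the URL:

```python
from urllib.parse import quote, unquote


def encode_q(lucene_query):
    # safe="" escapes every reserved character, including '+' and ':'.
    return quote(lucene_query, safe="")


raw = "+title:book -description:cat"
encoded = encode_q(raw)
# The books index name is reused from the earlier examples.
print(f"localhost:9200/books/_search?q={encoded}")
```

Decoding the result with unquote gives back the original query, so the receiving side sees the + operator rather than a space.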
Summary
In this chapter, we learned what full text search is and the contribution Apache Lucene makes to this. In addition to this, we are now familiar with the basic concepts of Elasticsearch and its top-level architecture. We used the Elasticsearch REST API not only to index data, but also to update, retrieve, and finally delete it. We’ve learned what versioning is and how we can use it for optimistic locking in Elasticsearch. Finally, we searched our data using the simple URI query.
In the next chapter, we’ll focus on indexing our data. We will see how Elasticsearch indexing works and what the role of primary shards and replicas is. We’ll see how Elasticsearch handles data that it doesn’t know and how to create our own mappings, the JSON structure that describes the structure of our index. We’ll also learn how to use batch indexing to speed up the indexing process and what additional information can be stored along with our index to help us achieve our goal. In addition, we will discuss what an index segment is, what segment merging is, and how to tune a segment. Finally, we’ll see how routing works in Elasticsearch and what options we have when it comes to both indexing and querying routing.
Chapter 2. Indexing Your Data
In the previous chapter, we learned what full text search is and how Apache Lucene fits there. We were introduced to the basic concepts of Elasticsearch and we are now familiar with its top-level architecture, so we know how it works. We used the REST API to index data, to update it, to delete it, and of course to retrieve it. We searched our data with the simple URI query and we used versioning that allowed us to use optimistic locking functionality. By the end of this chapter, you will have learned the following topics:
Basic information about Elasticsearch indexing
Adjusting Elasticsearch schema-less behavior
Creating your own mappings
Using out of the box analyzers
Configuring your own analyzers
Indexing data in batches
Adding additional internal information to indices
Segment merging
Routing
Elasticsearch indexing
So far we have our Elasticsearch cluster up and running. We also know how to use the Elasticsearch REST API to index our data, we know how to retrieve it, and we also know how to remove the data that we no longer need. We’ve also learned how to search in our data by using the URI request search and the Apache Lucene query language. However, until now we’ve used Elasticsearch functionality that allows us not to care about indices, shards, and data structure. This is not something that you may be used to when you are coming from the world of SQL databases, where you need the database and the tables with all the columns created upfront. In general, you needed to describe the data structure to be able to put data into the database. Elasticsearch is schema-less and by default creates indices automatically, and because of that we can just install it and index data without the need of any preparations. However, this is usually not the best situation when it comes to production environments where you want to control the analysis of your data. Because of that we will start with showing you how to manage your indices and then we will get you through the world of mappings in Elasticsearch.
Shards and replicas
In Chapter 1, Getting Started with Elasticsearch Cluster, we told you that indices in Elasticsearch are built from one or more shards. Each of those shards contains part of the document set and each shard is a separate Lucene index. In addition to that, each shard can have replicas – physical copies of the primary shard itself. When we create an index, we can tell Elasticsearch how many shards it should be built from.
Note
The default number of shards that Elasticsearch uses is 5 and each index will also contain a single replica. The default configuration can be changed by setting the index.number_of_shards and index.number_of_replicas properties in the elasticsearch.yml configuration file.
When defaults are used, we will end up with five Apache Lucene indices that our Elasticsearch index is built of and one replica for each of those. So, with five shards and one replica, we would actually get 10 shards. This is because each shard would get its own copy, so the total number of shards in the cluster would be 10.
Dividing indices in such a way allows us to spread the shards across the cluster. The nice thing about that is that all the shards will be automatically spread throughout the cluster. If we have a single node, Elasticsearch will put the five primary shards on that node and will leave the replicas unassigned, because Elasticsearch doesn’t assign shards and their replicas to the same node. The reason for that is simple – if a node would crash, we would lose both the primary source of the data and all the copies. So, if you have one Elasticsearch node, don’t worry about replicas not being assigned – it is something to be expected. Of course, when you have enough nodes for Elasticsearch to assign all the replicas (in addition to shards), it is not good to not have them assigned and you should look for the probable causes of that situation.
The thing to remember is that having shards and replicas is not free. First of all, each replica needs additional disk space, exactly the same amount of space that the original shard needs. So if we have 3 replicas for our index, we will actually need 4 times as much space. If our primary shards weigh 100GB in total, with 3 replicas we would need 400GB: 100GB for the primaries and 100GB for each replica. However, this is not the only cost. Each replica is a Lucene index on its own and Elasticsearch needs some memory to handle that. The more shards in the cluster, the more memory is being used. And finally, having replicas means that we will have to do indexation on each of the replicas, in addition to the indexation on the primary shard. There is a notion of shadow replicas which can copy the whole binary index, but, in most cases, each replica will do its own indexation. The good thing about replicas is that Elasticsearch will try to spread the query and get requests evenly between the shards and their replicas, which means that we can scale our cluster horizontally by using them.
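The shard and disk arithmetic from the last few paragraphs can be written down in a few lines. This is a minimal Python sketch with function names of our own choosing. It assumes, as the text does, that every replica takes the same space as its primary:

```python
def total_shard_copies(primaries, replicas):
    # Every primary shard exists once, plus once per configured replica.
    return primaries * (1 + replicas)


def total_disk_gb(primary_size_gb, replicas):
    # Assumes each replica needs exactly as much space as the primaries.
    return primary_size_gb * (1 + replicas)


# Default settings: five primary shards with one replica each.
print(total_shard_copies(primaries=5, replicas=1))
# 100GB of primary data with three replicas.
print(total_disk_gb(primary_size_gb=100, replicas=3))
```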
So to sum up the conclusions:
Having more shards in the index allows us to spread the index between more servers and parallelize the indexing operations and thus have better indexing throughput.
Depending on your deployment, having more shards may increase query throughput and lower query latency – especially in environments that don’t have a large number of queries per second.
Having more shards may be slower compared to a single shard query, because Elasticsearch needs to retrieve the data from multiple servers and combine them together in memory, before returning the final query results.
Having more replicas results in a more resilient cluster, because when the primary shard is not available, its copy will take that role. Basically, having a single replica allows us to lose one copy of a shard and still serve the whole data. Having two replicas allows us to lose two copies of the shard and still serve the whole data.
The higher the replica count, the higher query throughput the cluster will have. That’s because each replica can serve the data it has independently from all the others.
A higher number of shards (both primary and replicas) will result in more memory needed by Elasticsearch.
Of course, these are not the only relationships between the number of shards and replicas in Elasticsearch. We will talk about most of them later in the book.
So, how many shards and replicas should we have for our indices? That depends. We believe that the defaults are quite good but nothing can replace a good test. Note that the number of replicas is not very important because you can adjust it on a live cluster after index creation. You can remove and add them if you want and have the resources to run them. Unfortunately, this is not true when it comes to the number of shards. Once you have your index created, the only way to change the number of shards is to create another index and re-index your data.
Write consistency
Elasticsearch allows us to control the write consistency to prevent writes happening when they should not. By default, an Elasticsearch indexing operation is successful when the write is successful on a quorum of the active shards – meaning 50% of the active shards plus one. We can control this behavior by adding action.write_consistency to our elasticsearch.yml file or by adding the consistency parameter to our index request. The mentioned properties can take the following values:
quorum: The default value, requiring 50% plus 1 active shards to be successful for the index operation to succeed
one: Requires only a single active shard to be successful for the index operation to succeed
all: Requires all the active shards to be successful for the index operation to succeed
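The three consistency values can be read as a rule for how many shard copies must be active. The following Python sketch implements the rule exactly as stated above (half of the copies, rounded down, plus one). It is an illustration only: the function name is ours, and the real implementation may treat very small copy counts specially.

```python
def required_active_copies(consistency, shard_copies):
    """Number of active copies needed, following the rule in the text.

    shard_copies is the primary plus its replicas for one shard.
    """
    if shard_copies < 1:
        raise ValueError("a shard has at least one copy (the primary)")
    if consistency == "one":
        return 1
    if consistency == "all":
        return shard_copies
    if consistency == "quorum":
        # 50% of the copies (rounded down) plus one.
        return shard_copies // 2 + 1
    raise ValueError(f"unknown consistency level: {consistency}")


# One primary with two replicas gives three copies of each shard.
for level in ("one", "quorum", "all"):
    print(level, required_active_copies(level, shard_copies=3))
```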
Creating indices
When we were indexing our documents in Chapter 1, Getting Started with Elasticsearch Cluster, we didn’t care about index creation at all. We assumed that Elasticsearch will do everything for us and actually it was true; we just used the following command:
curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "tags": ["announce", "elasticsearch", "release"]}'
This is just fine. If such an index does not exist, Elasticsearch automatically creates the index for us. However, there are times when we want to create indices ourselves for various reasons. Maybe we would like to have control over which indices are created to avoid errors, or maybe we have some nondefault settings that we would like to use when creating a particular index. The reasons may differ, but it’s good to know that we can create indices without indexing documents.
The simplest way to create an index is to run a PUT HTTP request with the name of the index we want to create. For example, to create an index called blog, we could use the following command:
curl -XPUT http://localhost:9200/blog/
We just told Elasticsearch that we want to create the index with the name blog. If everything goes right, you will see the following response from Elasticsearch:
{"acknowledged":true}
Altering automatic index creation
We already mentioned that automatic index creation is not the best idea in some cases. For example, a simple typo during index creation can lead to creating hundreds of unused indices and make cluster state information larger than it should be, putting more pressure on Elasticsearch and the underlying JVM. Because of that, we can turn off automatic index creation by adding a simple property to the elasticsearch.yml configuration file:
action.auto_create_index: false
Let’s stop for a while and discuss the action.auto_create_index property, because it allows us to do more complicated things than just allowing (setting it to true) and disabling (setting it to false) automatic index creation. The mentioned property allows us to use patterns that specify the index names which should be allowed to be automatically created and which should be disallowed. For example, let’s assume that we would like to allow automatic index creation for indices starting with logs and we would like to disallow all the others. To do something like this, we would set the action.auto_create_index property to something as follows:
action.auto_create_index: +logs*,-*
Now if we would like to create an index called logs_2015-10-01, we would succeed. To create such an index, we would use the following command:
curl -XPUT http://localhost:9200/logs_2015-10-01/log/1 -d '{"message": "Test log message"}'
Elasticsearch would respond with:
{
  "_index": "logs_2015-10-01",
  "_type": "log",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
However, suppose we now try to create the blog index using the following command:
curl -XPUT http://localhost:9200/blog/article/1 -d '{"title": "Test article title"}'
Elasticsearch would respond with an error similar to the following one:
{
  "error": {
    "root_cause": [{
      "type": "index_not_found_exception",
      "reason": "no such index",
      "resource.type": "index_expression",
      "resource.id": "blog",
      "index": "blog"
    }],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_expression",
    "resource.id": "blog",
    "index": "blog"
  },
  "status": 404
}
One thing to remember is that the order of pattern definitions matters. Elasticsearch checks the patterns up to the first pattern that matches, so if we move -* as the first pattern, the +logs* pattern won’t be used at all.
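The first-match rule is easy to get wrong, so it may help to model it. The following Python sketch imitates the described behavior with the standard fnmatch module. It is not Elasticsearch code: the function name is ours, and returning False when no pattern matches is an assumption.

```python
from fnmatch import fnmatchcase


def auto_create_allowed(index_name, setting):
    """Imitate action.auto_create_index: the first matching pattern decides."""
    if setting in ("true", "false"):
        return setting == "true"
    for raw_pattern in setting.split(","):
        pattern = raw_pattern.strip()
        if not pattern:
            continue
        # A leading '-' disallows; '+' or no prefix allows.
        allowed = not pattern.startswith("-")
        if pattern[0] in "+-":
            pattern = pattern[1:]
        if fnmatchcase(index_name, pattern):
            return allowed
    # Assumption: a name that matches no pattern is not auto-created.
    return False


print(auto_create_allowed("logs_2015-10-01", "+logs*,-*"))
print(auto_create_allowed("blog", "+logs*,-*"))
# With -* first, the +logs* pattern is never reached.
print(auto_create_allowed("logs_2015-10-01", "-*,+logs*"))
```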
Settings for a newly created index
Manual index creation is also necessary when we want to pass nondefault configuration options during index creation; for example, the initial number of shards and replicas. We can do that by including a JSON payload with settings as the PUT HTTP request body. For example, if we would like to tell Elasticsearch that our blog index should only have a single shard and two replicas initially, the following command could be used:
curl -XPUT http://localhost:9200/blog/ -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}'
The preceding command will result in the creation of the blog index with one shard and two replicas, making a total of three physical Lucene indices – called shards, as we already know. Of course there are a lot more settings that we can use, but what we did is enough for now and we will learn about the rest throughout the book.
Index deletion
Of course, similar to how we handled documents, Elasticsearch allows us to delete indices as well. Deleting an index is very similar to creating it, but instead of using the PUT HTTP method, we use the DELETE one. For example, if we would like to delete our previously created blog index, we would run the following command:
curl -XDELETE http://localhost:9200/blog
The response will be the same as the one we saw earlier when we created an index and should look as follows:
{"acknowledged":true}
Now that we know what an index is, how to create it, and how to delete it, we are ready to create indices with the mappings we have defined. Even though Elasticsearch is schema-less, there are a lot of situations where we would like to manually create the schema, to avoid any problems with the index structure.
Mappings configuration
If you are used to SQL databases, you may know that before you can start inserting the data in the database, you need to create a schema, which will describe what your data looks like. Although Elasticsearch is a schema-less (we would rather call it data-driven schema) search engine and can figure out the data structure on the fly, we think that controlling the structure and thus defining it ourselves is a better way. The field type determining mechanism is not going to guess the future. For example, if you first send an integer value, such as 60, and then send a float value such as 70.23 for the same field, an error can happen or Elasticsearch will just cut off the decimal part of the float value (which is actually what happens). This is because Elasticsearch will first set the field type to integer and will then try to index the float value into the integer field, which cuts off the decimal part of the floating point number. In the next few pages you’ll see how to create mappings that suit your needs and match your data structure.
Note
Note that we didn’t include all the information about the available types in this chapter, and some features of Elasticsearch, such as nested type, parent-child handling, storing geographical points, and search, are described in the following chapters of this book.
Type determining mechanism
Before we start describing how to create mappings manually, we want to get back to the automatic type determining algorithm used in Elasticsearch. As we already said, Elasticsearch can try guessing the schema for our documents by looking at the JSON that the document is built from. Because JSON is structured, that seems easy to do. For example, strings are surrounded by quotation marks, Booleans are defined using specific words, and numbers are just a few digits. This is a simple trick, but it usually works. For example, let’s look at the following document:
{
  "field1": 10,
  "field2": "10"
}
The preceding document has two fields. The field1 field will be given a numeric type (to be precise, that field will be given a long type). The second field, called field2, will be given a string type, because it is surrounded by quotation marks. Of course, for some use cases this can be the desired behavior. However, if somehow we would surround all the data using quotation marks (which is not the best idea anyway), our index structure would contain only string type fields.
Note
Don’t worry about the fact that you are not familiar with what the numeric types, the string types, and so on are. We will describe them after we show you what you can do to tune the automatic type determining mechanism in Elasticsearch.
Disabling the type determining mechanism
The first solution is to completely disable the schema-less behavior in Elasticsearch. We can do that by adding the index.mapper.dynamic property to our index properties and setting it to false. We can do that by running the following command to create the index:
curl -XPUT 'localhost:9200/sites' -d '{
  "index.mapper.dynamic": false
}'
By doing that we told Elasticsearch that we don’t want it to guess the type of our documents in the sites index and that we will provide the mappings ourselves. If we try indexing some example document to the sites index, we will get the following error:
{
  "error": {
    "root_cause": [{
      "type": "type_missing_exception",
      "reason": "type[[doc, trying to auto create mapping, but dynamic mapping is disabled]] missing",
      "index": "sites"
    }],
    "type": "type_missing_exception",
    "reason": "type[[doc, trying to auto create mapping, but dynamic mapping is disabled]] missing",
    "index": "sites"
  },
  "status": 404
}
This is because we didn’t create any mappings – no schema for documents was created. Elasticsearch couldn’t create one for us because we didn’t allow it and the indexation command failed.
Of course this is not the only thing we can do when it comes to configuring how the type determining mechanism works. We can also tune it or disable it for a given type on the object level. We will talk about the second case in Chapter 5, Extending Your Index Structure. For now, let’s look at the possibilities of tuning the type determining mechanism in Elasticsearch.
Tuning the type determining mechanism for numeric types
One of the problems with JSON documents and type guessing is that we are not always in control of the data. The documents that we are indexing can come from multiple places and some systems in our environment may include quotation marks for all the fields in the document. This can lead to problems and bad guesses. Because of that, Elasticsearch allows us to enable more aggressive field value checking for numeric fields by setting the numeric_detection property to true in the mappings definition. For example, let’s assume that we want to create an index called users and we want it to have the user type on which we will want more aggressive numeric fields parsing. To do that, we will use the following command:
curl -XPUT http://localhost:9200/users/?pretty -d '{
  "mappings": {
    "user": {
      "numeric_detection": true
    }
  }
}'
Now let’s run the following command to index a single document to the users index:
curl -XPOST http://localhost:9200/users/user/1 -d '{"name": "User 1", "age": "20"}'
Earlier, with the default settings, the age field would be set to string type. With the numeric_detection property set to true, the type of the age field will be set to long. We can check that by running the following command (it will retrieve the mappings for all the types in the users index):
curl -XGET 'localhost:9200/users/_mapping?pretty'
The preceding command should result in the following response returned by Elasticsearch:
{
  "users": {
    "mappings": {
      "user": {
        "numeric_detection": true,
        "properties": {
          "age": {
            "type": "long"
          },
          "name": {
            "type": "string"
          }
        }
      }
    }
  }
}
As we can see, the age field was really set to be of type long.
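To make the guessing rules concrete, the following Python sketch imitates them for flat documents. It is a simplified model, not the Elasticsearch implementation. The function names are ours, mapping JSON floats to double is an assumption, and the numeric check for quoted values simply reuses Python's int and float parsers, which accept some inputs Elasticsearch may reject.

```python
import json


def guess_type(value, numeric_detection=False):
    """Guess a field type from a parsed JSON value (simplified)."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"  # assumption, the text only mentions long and string
    if isinstance(value, str):
        if numeric_detection:
            # Try the narrower numeric type first, then the wider one.
            for parser, type_name in ((int, "long"), (float, "double")):
                try:
                    parser(value)
                    return type_name
                except ValueError:
                    pass
        return "string"
    return None  # objects, arrays, and null are not handled in this sketch


def guess_mapping(document_json, numeric_detection=False):
    document = json.loads(document_json)
    return {name: guess_type(value, numeric_detection)
            for name, value in document.items()}


print(guess_mapping('{"field1": 10, "field2": "10"}'))
print(guess_mapping('{"name": "User 1", "age": "20"}', numeric_detection=True))
```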
Tuning the type determining mechanism for dates
Another type of data that causes trouble is fields with dates. Dates can come in different flavors, for example, 2015-10-01 11:22:33 is a proper date and so is 2015-10-01T11:22:33+00. Because of that, Elasticsearch tries to match the fields to timestamps or strings that match some given date format. If that matching operation is successful, the field is treated as a date-based one. If we know how our date fields look, we can help Elasticsearch by providing a list of recognized date formats using the dynamic_date_formats property, which allows us to specify the formats array. Let’s look at the following command for creating an index:
curl -XPUT 'http://localhost:9200/blog/' -d '{
  "mappings": {
    "article": {
      "dynamic_date_formats": ["yyyy-MM-dd hh:mm"]
    }
  }
}'
The preceding command will result in the creation of an index called blog with the single type called article. We’ve also used the dynamic_date_formats property with a single date format that will result in Elasticsearch using the date core type (refer to the Core types section in this chapter for more information about field types) for fields matching the defined format. Elasticsearch uses the joda-time library to define the date formats, so visit http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html if you are interested in knowing about them.
Note
Remember that the dynamic_date_formats property accepts an array of values. That means that we can handle several date formats simultaneously.
With the preceding index, we can now try indexing a new document using the following command:
curl -XPUT localhost:9200/blog/article/1 -d '{"name": "Test", "test_field": "2015-10-01 11:22"}'
Elasticsearch will of course index that document, but let’s look at the mappings created for our index:
curl -XGET 'localhost:9200/blog/_mapping?pretty'
The response for the preceding command will be as follows:
{
  "blog": {
    "mappings": {
      "article": {
        "dynamic_date_formats": ["yyyy-MM-dd hh:mm"],
        "properties": {
          "name": {
            "type": "string"
          },
          "test_field": {
            "type": "date",
            "format": "yyyy-MM-dd hh:mm"
          }
        }
      }
    }
  }
}
As we can see, the test_field field was given a date type, so our tuning works.
Unfortunately, the problem still exists if we want the Boolean type to be guessed. There is no option to force the guessing of Boolean types from the text. In such cases, when a change of source format is impossible, we can only define the field directly in the mappings definition.
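The date detection described above boils down to trying each configured format against a string value. The following Python sketch shows that idea with datetime.strptime. The strptime pattern is our rough translation of the Joda pattern yyyy-MM-dd hh:mm (we assume hh means the 12-hour clock, hence %I), and the function name is hypothetical:

```python
from datetime import datetime

# Assumed strptime equivalent of the Joda pattern "yyyy-MM-dd hh:mm".
DYNAMIC_DATE_FORMATS = ["%Y-%m-%d %I:%M"]


def looks_like_date(value, formats=DYNAMIC_DATE_FORMATS):
    """Return True if the string matches any of the configured formats."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue  # this format did not match, try the next one
    return False


document = {"name": "Test", "test_field": "2015-10-01 11:22"}
for field, value in document.items():
    print(field, "date" if looks_like_date(value) else "string")
```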
Index structure mapping
Each kind of data has its own structure – some are very simple, and some include complicated object relations, children documents, and nested properties. In each case, we need to have a schema in Elasticsearch, called mappings, that defines how the data looks. Of course, we can use the schema-less nature of Elasticsearch, but we can and we usually want to prepare the mappings upfront, so we know how the data is handled.
For the purposes of this chapter, we will use a single type in the index. Of course, Elasticsearch as a multitenant system allows us to have multiple types in a single index, but we want to simplify the example, to make it easier to understand. So, for the purpose of the next few pages, we will create an index called posts that will hold data for documents in a post type. We also assume that the index will hold the following information:
Unique identifier of the blog post
Name of the blog post
Publication date
Contents – text of the post itself
In Elasticsearch, mappings, as with almost all communication, are sent as JSON objects in the request body. So, if we want to create the simplest mappings that match our needs, they will look as follows (we stored the mappings in the posts.json file, so we can easily send it):
{
  "mappings": {
    "post": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"},
        "published": {"type": "date"},
        "contents": {"type": "string"}
      }
    }
  }
}
To create our posts index with the preceding mappings file, we will just run the following command:
curl -XPOST 'http://localhost:9200/posts' -d @posts.json
Note
Note that you can store your mappings in a file with whatever name you like. The curl command will just take the contents of it.
And again, if everything goes well, we see the following response:
{"acknowledged":true}
Elasticsearch reported that our index has been created. If we look at the logs of the Elasticsearch node (on the current master), we will see something as follows:
[2015-10-14 15:02:12,840][INFO ][cluster.metadata ] [Shalla-Bal] [posts] creating index, cause [api], templates [], shards [5]/[1], mappings [post]
We can see that the posts index has been created, with 5 shards and 1 replica (shards [5]/[1]) and with mappings for a single post type (mappings [post]). Let’s now discuss the contents of the posts.json file and the possibilities when it comes to mappings.
Type and types definition
The mappings definition in Elasticsearch is just another JSON object, so it needs to be properly started and ended with curly brackets. All the mappings definitions are nested inside a single mappings object. In our example, we had a single post type, but we can have multiple of them. For example, if we would like to have more than a single type in our mappings, we just need to separate them with a comma character. Let’s assume that we would like to have an additional user type in our posts index. The mappings definition in such a case will look as follows (we stored it in the posts_with_user.json file):
{
  "mappings": {
    "post": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"},
        "published": {"type": "date"},
        "contents": {"type": "string"}
      }
    },
    "user": {
      "properties": {
        "id": {"type": "long"},
        "name": {"type": "string"}
      }
    }
  }
}
As you can see, we can name the types with the names we want. Under each type we have the properties object in which we store the actual name of the fields and their definition.
Fields
Each field in the mappings definition is just a name and an object describing the properties of the field. For example, we can have a field defined as the following:
"body": {"type": "string", "store": "yes", "index": "analyzed"}
The preceding field definition starts with a name – body. After that we have an object with three properties – the type of the field (the type property), if the original field value should be stored (the store property), and if the field should be indexed and how (the index property). And, of course, multiple field definitions are separated from each other using the comma character, just like other JSON objects.
Core types
Each field in Elasticsearch can be given one of the provided core types. The core types in Elasticsearch are as follows:
String
Number (integer, long, float, double)
Date
Boolean
Binary
In addition to the core types, Elasticsearch provides additional types that can handle more complicated data – such as nested documents, object, and so on. We will talk about them in Chapter 5, Extending Your Index Structure.
Common attributes
Before continuing with all the core type descriptions, we would like to discuss some common attributes that you can use to describe all the types (except for the binary one):
index_name: This attribute defines the name of the field that will be stored in the index. If this is not defined, the name will be set to the name of the object that the field is defined with. Usually, you don’t need to set this property, but it may be useful in some cases; for example, when you don’t have control over the name of the fields in the JSON documents that are sent to Elasticsearch.
index: This attribute can take the values analyzed and no and, for string-based fields, it can also be set to the additional not_analyzed value. If set to analyzed, the field will be indexed and thus searchable. If set to no, you won’t be able to search on such a field. The default value is analyzed. In the case of string-based fields, there is an additional option, not_analyzed. When set, it means that the field will be indexed but not analyzed. So, the field is written in the index as it was sent to Elasticsearch and only a perfect match will be counted during a search – the query will have to include exactly the same value as the value in the index. If we compare it to the SQL databases world, setting the index property of a field to not_analyzed would work just like using where field = value. Also remember that setting the index property to no will exclude that field from the _all field (the include_in_all property is discussed as the last property in this list).
store: This attribute can take the values yes and no and specifies if the original value of the field should be written into the index. The default value is no, which means that Elasticsearch won’t store the original value of the field and will try to use the _source field (the JSON representing the original document that has been sent to Elasticsearch) when you want to retrieve the field value. Stored fields are not used for searching; however, they can be used for highlighting if enabled (which may be more efficient than loading the _source field in case it is big).
doc_values: This attribute can take the values true and false. When set to true, Elasticsearch will create a special on-disk structure during indexation for not tokenized fields (like not analyzed string fields, number-based fields, Boolean fields, and date fields). This structure is highly efficient and is used by Elasticsearch for operations that require un-inverted data, such as aggregations, sorting, or scripting. Starting with Elasticsearch 2.0, the default value of this is true for not tokenized fields. Setting this value to false will result in Elasticsearch using the field data cache instead of doc values, which has higher memory demand, but may be faster in some rare situations.
boost: This attribute defines how important the field is inside the document; the higher the boost, the more important the values in the field are. The default value of this attribute is 1, which means a neutral value – anything above 1 will make the field more important, anything less than 1 will make it less important.
null_value: This attribute specifies a value that should be written into the index in case that field is not a part of an indexed document. The default behavior will just omit that field.
copy_to: This attribute specifies an array of fields to which the original value will be copied. This allows for different kinds of analysis of the same data. For example, you could imagine having two fields – one called title and one called title_sort, each having the same value but processed differently. We could use copy_to to copy the title field value to title_sort.
include_in_all: This attribute specifies if the field should be included in the _all field. The _all field is a special field used by Elasticsearch to allow easy searching in the contents of the whole indexed document. Elasticsearch creates the content of the _all field by copying all the document fields there. By default, if the _all field is used, all the fields will be included in it.
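As an illustration of how several of these attributes fit together, the following Python sketch builds a mappings body that uses copy_to, index, store, and null_value, and serializes it to JSON. The post type and all field names are hypothetical examples of ours; the attribute names and values are the ones described in the list above.

```python
import json

# Hypothetical mappings for a "post" type that combine common attributes.
mappings_body = {
    "mappings": {
        "post": {
            "properties": {
                # Analyzed for searching; the value is also copied to title_sort.
                "title": {"type": "string", "copy_to": ["title_sort"]},
                # Indexed as a single, unanalyzed value.
                "title_sort": {"type": "string", "index": "not_analyzed"},
                # Stored separately and indexed as 0 when the field is missing.
                "views": {"type": "long", "store": "yes", "null_value": 0},
            }
        }
    }
}

body = json.dumps(mappings_body, indent=2)
print(body)
```

The printed JSON could be saved to a file and sent as the request body when creating an index, in the same way as the posts.json example.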
String
String is the basic text type, which allows us to store one or more characters inside it. A sample definition of such a field is as follows:
"body":{"type":"string","store":"yes","index":"analyzed"}
In addition to the common attributes, the following attributes can also be set for the string-based fields:
term_vector: This attribute can take the values no (the default one), yes, with_offsets, with_positions, and with_positions_offsets. It defines whether or not to calculate the Lucene term vectors for that field. If you are using highlighting (marking which terms were matched in a document during the query), you will need to calculate the term vector for the so-called fast vector highlighting, a more efficient highlighting version.
analyzer: This attribute defines the name of the analyzer used for indexing and searching. It defaults to the globally-defined analyzer name.
search_analyzer: This attribute defines the name of the analyzer used for processing the part of the query string that is sent to a particular field.
norms.enabled: This attribute specifies whether the norms should be loaded for a field. By default, it is set to true for analyzed fields (which means that the norms will be loaded for such fields) and to false for non-analyzed fields. Norms are values inside the Lucene index that are used when calculating a score for a document; they are used only during query time and are usually not needed for not analyzed fields. An example index creation command that disables norms for a single field would look as follows:
curl -XPOST 'localhost:9200/essb' -d '{
"mappings":{
"book":{
"properties":{
"name":{
"type":"string",
"norms":{
"enabled":false
}
}
}
}
}
}'
norms.loading: This attribute takes the values eager and lazy and defines how Elasticsearch will load the norms. The first value means that the norms for such fields are always loaded. The second value means that the norms will be loaded only when needed. Norms are useful for scoring, but may require a vast amount of memory for large data sets. Having norms loaded eagerly (property set to eager) means less work during query time, but will lead to more memory consumption. An example index creation command that eagerly loads norms for a single field looks as follows:
curl -XPOST 'localhost:9200/essb_eager' -d '{
"mappings":{
"book":{
"properties":{
"name":{
"type":"string",
"norms":{
"loading":"eager"
}
}
}
}
}
}'
position_offset_gap: This attribute defaults to 0 and specifies the gap in the index between instances of the given field with the same name. Setting this to a higher value may be useful if you want position-based queries (such as phrase queries) to match only inside a single instance of the field.
index_options: This attribute defines the indexing options for the postings list, the structure holding the terms (we talk more about it in the Postings format section of this chapter). The possible values are docs (only document numbers are indexed), freqs (document numbers and term frequencies are indexed), positions (document numbers, term frequencies, and their positions are indexed), and offsets (document numbers, term frequencies, their positions, and offsets are indexed). The default value for this property is positions for analyzed fields and docs for fields that are indexed but not analyzed.
ignore_above: This attribute defines the maximum size of the field in characters. A field whose size is above the specified value will be ignored by the analyzer.
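The effect of position_offset_gap can be pictured with a small Python sketch. It is only a toy model, not Elasticsearch code: tokens come from a plain whitespace split, and the function names assign_positions and phrase_matches are made up for this illustration.

```python
def assign_positions(values, gap=0):
    """Assign token positions across several instances of one field.

    values: the instances of the field (a list of strings).
    gap: extra distance inserted between consecutive instances.
    """
    positions = []
    next_pos = 0
    for value in values:
        for token in value.lower().split():
            positions.append((token, next_pos))
            next_pos += 1
        next_pos += gap  # jump ahead before the next instance starts
    return positions


def phrase_matches(positions, first, second):
    """Return True if `second` sits directly after `first`."""
    first_positions = {pos for token, pos in positions if token == first}
    return any(token == second and pos - 1 in first_positions
               for token, pos in positions)


names = ["John Smith", "Anna Brown"]  # two instances of the same field
no_gap = assign_positions(names, gap=0)
with_gap = assign_positions(names, gap=100)
```

With no gap, the phrase "smith anna" matches even though the two words come from different instances of the field; with a gap of 100 it no longer does, while "anna brown" still matches.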
Note
In one of the upcoming Elasticsearch versions, the string type may be deprecated and replaced by two new types, text and keyword, to better indicate what the string-based field is representing. The text type will be used for analyzed text fields and the keyword type will be used for not analyzed text fields. If you are interested in the incoming changes, refer to the following GitHub issue: https://github.com/elastic/elasticsearch/issues/12394.
Number
This is the common name for a few core types that gather all the available numeric field types. The following types are available in Elasticsearch (we specify them by using the type property):
byte: This type defines a byte value; for example, 1. It allows for values between -128 and 127 inclusive.
short: This type defines a short value; for example, 12. It allows for values between -32768 and 32767 inclusive.
integer: This type defines an integer value; for example, 134. It allows for values between -2^31 and 2^31-1 inclusive.
long: This type defines a long value; for example, 123456789. It allows for values between -2^63 and 2^63-1 inclusive.
float: This type defines a float value; for example, 12.23. For information about the possible values, refer to https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.
double: This type defines a double value; for example, 123.45. For information about the possible values, refer to https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.
Note
You can learn more about the mentioned Java types at http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html.
A sample definition of a field based on one of the numeric types is as follows:
"price":{"type":"float","precision_step":"4"}
In addition to the common attributes, the following ones can also be set for the numeric fields:
precision_step: This attribute defines the number of terms generated for each value in the numeric field. The lower the value, the higher the number of terms generated. For fields with a higher number of terms per value, range queries will be faster at the cost of a slightly larger index. The default value is 16 for long and double, 8 for integer, short, and float, and 2147483647 for byte.
coerce: This attribute defaults to true and can take the value of true or false. It defines if Elasticsearch should try to convert the string values to numbers for a given field and if the decimal parts of the float value should be truncated for the integer-based fields.
ignore_malformed: This attribute can take the value true or false (which is the default). It should be set to true in order to omit the badly formatted values.
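To get a feel for how precision_step relates to the number of terms, here is a rough Python sketch. It assumes the trie-style encoding used by Lucene's numeric fields, in which one term is produced for every shift of 0, step, 2*step, and so on below the bit width of the value; treating byte as a 32-bit value is also an assumption made for this illustration, as is the helper name terms_per_value.

```python
import math


def terms_per_value(bits, precision_step):
    """Terms indexed for one numeric value of the given bit width.

    One term is emitted per shift 0, step, 2*step, ... while shift < bits,
    which is ceil(bits / step) terms in total.
    """
    return math.ceil(bits / precision_step)


term_counts = {
    "long, default step 16": terms_per_value(64, 16),
    "integer, default step 8": terms_per_value(32, 8),
    "integer, step 4": terms_per_value(32, 4),
    "byte, default step 2147483647": terms_per_value(32, 2147483647),
}
```

Halving the step for an integer field doubles the number of terms per value, and the very large default for byte results in a single term.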
Boolean
The boolean core type is designed for indexing the Boolean values (true or false). A sample definition of a field based on the boolean type is as follows:
"allowed":{"type":"boolean","store":"yes"}
Binary
The binary field is a BASE64 representation of the binary data stored in the index. You can use it to store data that is normally written in binary form, such as images. Fields based on this type are by default stored and not indexed, so you can only retrieve them and not perform search operations on them. The binary type only supports the index_name, type, store, and doc_values properties. A sample field definition based on the binary type may look like the following:
"image":{"type":"binary"}
Date
The date core type is designed to be used for date indexing. It allows us to specify a format that will be recognized by Elasticsearch. It is worth noting that all the dates are indexed in UTC and are internally indexed as long values. In addition to that, for the date-based fields, Elasticsearch accepts long values representing UTC milliseconds since epoch regardless of the format specified for the date field.
The default date format recognized by Elasticsearch is quite universal and allows us to provide the date and optionally the time; for example, 2012-12-24T12:10:22. A sample definition of a field based on the date type is as follows (note that in the format pattern, MM stands for the month, while lowercase mm would mean minutes):
"published":{"type":"date","format":"yyyy-MM-dd"}
A sample document that uses the above date field with the specified format is as follows:
{
"name":"Sample document",
"published":"2012-12-22"
}
In addition to the common attributes, the following ones can also be set for the fields based on the date type:
format: This attribute specifies the format of the date. The default value is dateOptionalTime. For a full list of formats, visit https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html.
precision_step: This attribute defines the number of terms generated for each value in the numeric field. Refer to the numeric core type description for more information about this parameter.
numeric_resolution: This attribute defines the unit of time that Elasticsearch will use when a numeric value is passed to the date-based field instead of the date following a format. By default, Elasticsearch uses the milliseconds value, which means that the numeric value will be treated as milliseconds since epoch. Another value is seconds.
ignore_malformed: This attribute can take the value true or false. The default value is false. It should be set to true in order to omit badly formatted values.
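The relation between a formatted date and the long value mentioned above can be checked with a few lines of Python. This is only an illustration using Python's standard library; the helper names are made up here, and the strptime pattern is Python's own syntax, not the Elasticsearch format syntax.

```python
from datetime import datetime, timezone


def to_epoch_millis(date_string, pattern="%Y-%m-%d"):
    """Parse a date string as UTC and return milliseconds since epoch."""
    parsed = datetime.strptime(date_string, pattern)
    parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def from_epoch_millis(millis):
    """Turn milliseconds since epoch back into a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


published_millis = to_epoch_millis("2012-12-22")
# the same instant expressed in seconds, as with numeric_resolution
# set to seconds
published_seconds = published_millis // 1000
```

Sending either the formatted string or the millisecond value for the published field refers to the same point in time.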
Multi fields
There are situations where we need to have the same field analyzed differently; for example, one for sorting, one for searching, and one for analysis with aggregations, all using the same field value, just indexed differently. We could of course use the previously described field value copying, but we can also use so-called multi fields. To be able to use that feature of Elasticsearch, we need to define an additional property in our field definition called fields. The fields property is an object that can contain one or more additional fields that will be present in our index and will have the value of the field that they are assigned to. For example, if we would like to have aggregations done on the name field and in addition to that search on that field, we would define it as follows:
"name":{
"type":"string",
"fields":{
"agg":{"type":"string","index":"not_analyzed"}
}
}
The preceding definition will create two fields: one called name and the second called name.agg. Of course, you don't have to specify two separate fields in the data you are sending to Elasticsearch; a single one named name is enough. Elasticsearch will do the rest, which means copying the value of the field to all the fields from the preceding definition.
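The naming rule for multi fields can be sketched in Python. The helper below is hypothetical (it is not an Elasticsearch API); it only walks a properties object and lists the field names that the definition would produce.

```python
def indexed_field_names(properties, prefix=""):
    """List the field names produced by a mapping's properties,
    including the sub-fields declared under `fields`."""
    names = []
    for name, definition in properties.items():
        full_name = prefix + name
        names.append(full_name)
        sub_fields = definition.get("fields", {})
        # sub-fields are addressed as <parent>.<sub-field>
        names.extend(indexed_field_names(sub_fields, prefix=full_name + "."))
    return names


properties = {
    "name": {
        "type": "string",
        "fields": {
            "agg": {"type": "string", "index": "not_analyzed"}
        }
    }
}
field_names = indexed_field_names(properties)
```

For the definition above, field_names contains name and name.agg, even though a document only ever carries the single name field.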
The IP address type
The ip field type was added to Elasticsearch to simplify the use of IPv4 addresses in a numeric form. This field type allows us to search data that is indexed as an IP address, sort on such data, and use range queries using IP values.
A sample definition of a field based on the ip type is as follows:
"address":{"type":"ip"}
In addition to the common attributes, the precision_step attribute can also be set for the ip type based fields. Refer to the numeric type description for more information about that property.
A sample document that uses the ip-based field looks as follows:
{
"name":"Tom PC",
"address":"192.168.2.123"
}
Token count type
The token_count field type allows us to store and index information about how many tokens the given field has instead of storing and indexing the text provided to the field. It accepts the same configuration options as the number type, but in addition to that, we need to specify the analyzer which will be used to divide the field value into tokens. We do that by using the analyzer property.
A sample definition of a field based on the token_count field type looks as follows:
"title_count":{"type":"token_count","analyzer":"standard"}
Using analyzers
The great thing about Elasticsearch is that it leverages the analysis capabilities of Apache Lucene. This means that for fields that are based on the string type, we can specify which analyzer Elasticsearch should use. As you remember from the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, the analyzer is a functionality that is used to analyze data or queries in the way we want. For example, when we divide words on the basis of whitespaces and lowercase the characters, we don't have to worry about the users sending words that are lowercased or uppercased. This means that Elasticsearch, elasticsearch, and ElAstIcSeaRCh will be treated as the same word. What's more, Elasticsearch allows us to use not only the analyzers provided out of the box, but also to create our own configurations. We can also use different analyzers at the time of indexing and at the time of querying; we can choose how we want our data to be processed at each stage of the search process. Let's now have a look at the analyzers provided by Elasticsearch and at Elasticsearch analysis functionality in general.
Out-of-the-box analyzers
Elasticsearch allows us to use one of the many analyzers defined by default. The following analyzers are available out of the box:
standard: This analyzer is convenient for most European languages (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html for the full list of parameters).
simple: This analyzer splits the provided value on non-letter characters and converts them to lowercase.
whitespace: This analyzer splits the provided value on the basis of whitespace characters.
stop: This is similar to the simple analyzer, but in addition to the functionality of the simple analyzer, it filters the data on the basis of the provided set of stop words (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-analyzer.html for the full list of parameters).
keyword: This is a very simple analyzer that just passes the provided value. You'll achieve the same by specifying a particular field as not_analyzed.
pattern: This analyzer allows flexible text separation by the use of regular expressions (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html for the full list of parameters). The key point to remember when it comes to the pattern analyzer is that the provided pattern should match the separators of the words, not the words themselves.
language: This analyzer is designed to work with a specific language. The full list of languages supported by this analyzer can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html.
snowball: This is an analyzer that is similar to standard, but additionally provides the stemming algorithm (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html for the full list of parameters).
Note
Stemming is the process of reducing the inflected and derived words to their stem or base form. Such a process allows for the reduction of words such as cars and car. For the mentioned words, a stemmer (which is an implementation of the stemming algorithm) will produce a single stem, car. After indexing, the documents containing such words will be matched while using any of them. Without stemming, the documents with the word "cars" will only be matched by a query containing the same word. You can find more information about stemming on Wikipedia at https://en.wikipedia.org/wiki/Stemming.
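The differences between a few of these analyzers are easier to see on a concrete input. The following Python functions are deliberately naive imitations written for this illustration (ASCII letters only, no stop words, no stemming); they are not the Lucene implementations and only model how the text is split.

```python
import re


def whitespace_analyzer(text):
    # split on whitespace only; case and punctuation are left untouched
    return text.split()


def simple_analyzer(text):
    # split on anything that is not a letter, then lowercase each token
    return [token.lower() for token in re.split(r"[^A-Za-z]+", text) if token]


def keyword_analyzer(text):
    # pass the whole value through as a single token
    return [text]


def pattern_analyzer(text, pattern):
    # the pattern describes the separators, not the words
    return [token for token in re.split(pattern, text) if token]


sample = "Quick-Brown fox_2 jumps"
by_whitespace = whitespace_analyzer(sample)
by_simple = simple_analyzer(sample)
by_keyword = keyword_analyzer(sample)
by_pattern = pattern_analyzer("a, b,c", r",\s*")
```

Here by_whitespace keeps Quick-Brown and fox_2 intact, by_simple yields quick, brown, fox, and jumps, and by_keyword returns the input as one token. Because simple_analyzer lowercases its tokens, Elasticsearch and ElAstIcSeaRCh end up as the same term.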
Defining your own analyzers
In addition to the analyzers mentioned previously, Elasticsearch allows us to define new ones without the need for writing a single line of Java code. In order to do that, we need to add an additional section to our mappings file; that is, the settings section, which holds additional information used by Elasticsearch during index creation. The following code snippet shows how we can define our custom settings section:
"settings":{
"index":{
"analysis":{
"analyzer":{
"en":{
"tokenizer":"standard",
"filter":[
"asciifolding",
"lowercase",
"ourEnglishFilter"
]
}
},
"filter":{
"ourEnglishFilter":{
"type":"kstem"
}
}
}
}
}
We specified that we want a new analyzer named en to be present. Each analyzer is built from a single tokenizer and multiple filters. A complete list of the default filters and tokenizers can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html. Our en analyzer includes the standard tokenizer and three filters: asciifolding and lowercase, which are the ones available by default, and a custom ourEnglishFilter, which is a filter we have defined.
To define a filter, we need to provide its name, its type (the type property), and any number of additional parameters required by that filter type. The full list of filter types available in Elasticsearch can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html. Please be aware that we won't be discussing each filter, as the list of filters is constantly changing. If you are interested in the full filters list, please refer to the mentioned page in the documentation.
So, the final mappings file with our custom analyzer defined will be as follows:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"en":{
"tokenizer":"standard",
"filter":[
"asciifolding",
"lowercase",
"ourEnglishFilter"
]
}
},
"filter":{
"ourEnglishFilter":{
"type":"kstem"
}
}
}
}
},
"mappings":{
"post":{
"properties":{
"id":{"type":"long"},
"name":{"type":"string","analyzer":"en"}
}
}
}
}
If we save the preceding mappings to a file called posts_mappings.json, we can run the following command to create the posts index:
curl -XPOST 'http://localhost:9200/posts' -d @posts_mappings.json
We can see how our analyzer works by using the Analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html). For example, let's look at the following command:
curl -XGET 'localhost:9200/posts/_analyze?pretty&field=name' -d 'robots cars'
The command asks Elasticsearch to show the result of analyzing the given phrase (robots cars) with the analyzer defined for the post type and its name field. The response that we will get from Elasticsearch is as follows:
{
"tokens":[{
"token":"robot",
"start_offset":0,
"end_offset":6,
"type":"<ALPHANUM>",
"position":0
},{
"token":"car",
"start_offset":7,
"end_offset":11,
"type":"<ALPHANUM>",
"position":1
}]
}
As you can see, the robots cars phrase was divided into two tokens. In addition to that, the robots word was changed to robot and the cars word was changed to car.
Default analyzers
There is one more thing to say about analyzers. Elasticsearch allows us to specify the analyzer that should be used by default if no analyzer is defined. This is done in the same way as we configured a custom analyzer in the settings section of the mappings file, but instead of specifying a custom name for the analyzer, the default keyword should be used. So to make our previously defined analyzer the default, we can change the en analyzer to the following:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"default":{
"tokenizer":"standard",
"filter":[
"asciifolding",
"lowercase",
"ourEnglishFilter"
]
}
},
"filter":{
"ourEnglishFilter":{
"type":"kstem"
}
}
}
}
}
}
We can also choose a different default analyzer for searching and a different one for indexing. If we would like to do that, instead of using the default keyword for the analyzer name, we should use default_search and default_index respectively.
Different similarity models
With the release of Apache Lucene 4.0 in 2012, all the users of this great full text search library were given the opportunity to alter the default TF/IDF-based algorithm and use a different one (we've mentioned it in the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster). Because of that, we are able to choose a similarity model in Elasticsearch, which basically allows us to use different scoring formulas for our documents.
Note
Note that the similarity models topic ranges from intermediate to advanced and in most cases the TF/IDF-based algorithm will be sufficient for your use case. However, we decided to have it described in the book, so you know that you have the possibility of changing the scoring algorithm behavior if needed.
Setting per-field similarity
Since Elasticsearch 0.90, we are allowed to set a different similarity for each of the fields that we have in our mappings file. For example, let's assume that we have the following simple mappings that we use in order to index the blog posts:
{
"mappings":{
"post":{
"properties":{
"id":{"type":"long"},
"name":{"type":"string"},
"contents":{"type":"string"}
}
}
}
}
Let's say we want to use the BM25 similarity model for the name field and the contents field. In order to do that, we need to extend our field definitions and add the similarity property with the value of the chosen similarity name. Our changed mappings will look like the following:
{
"mappings":{
"post":{
"properties":{
"id":{"type":"long"},
"name":{"type":"string","similarity":"BM25"},
"contents":{"type":"string","similarity":"BM25"}
}
}
}
}
And that's all, nothing more is needed. After the above change, Apache Lucene will use the BM25 similarity to calculate the score factor for the name and the contents fields.
Available similarity models
There are at least five new similarity models available. For most of the use cases, apart from the default one, you may find the following models useful:
Okapi BM25 model: This similarity model is based on a probabilistic model that estimates the probability of finding a document for a given query. The Okapi BM25 similarity is said to perform best when dealing with short text documents where term repetitions are especially hurtful to the overall document score. To use this similarity in Elasticsearch, set the similarity property for a field to BM25. This similarity is defined out of the box and doesn't need additional properties to be set.
Divergence from randomness model: This similarity model is based on the probabilistic model of the same name. In order to use this similarity in Elasticsearch, you need to use the DFR name. It is said that the divergence from randomness similarity model performs well on text that is similar to natural language.
Information-based model: This is the last of the newly introduced similarity models and is very similar to the divergence from randomness model. In order to use this similarity in Elasticsearch, you need to use the IB name. Similar to the DFR similarity, it is said that the information-based model performs well on data similar to natural language text.
The two other similarity models currently available are the LM Dirichlet similarity (to use it, set the type property to LMDirichlet) and the LM Jelinek Mercer similarity (to use it, set the type property to LMJelinekMercer). You can find more about these similarity models in the Apache Lucene Javadocs, in Mastering Elasticsearch Second Edition, published by Packt Publishing, or in the official documentation of Elasticsearch available at https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html.
Configuringdefaultsimilarity
The default similarity allows us to provide an additional discount_overlaps property. It allows us to control if the tokens on the same positions in the token stream (with position increment of 0) are omitted during score calculation. By default, it is set to true, which means that the tokens on the same positions are omitted; if you want them to be counted, you can set that property to false. For example, the following command shows how to create an index with the discount_overlaps property changed for the default similarity:
curl -XPUT 'localhost:9200/test_similarity' -d '{
"settings":{
"similarity":{
"altered_default":{
"type":"default",
"discount_overlaps":false
}
}
},
"mappings":{
"doc":{
"properties":{
"name":{"type":"string","similarity":"altered_default"}
}
}
}
}'
ConfiguringBM25similarity
Even though we don't need to configure the BM25 similarity, we can provide some additional options to tune its behavior. The BM25 similarity allows us to provide the discount_overlaps property similar to the default similarity and two additional properties: k1 and b. The k1 property specifies the term frequency normalization factor and the b property value determines to what degree the document length will normalize the term frequency values.
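To see what k1 and b do, it can help to compute a single term's score by hand. The function below is a sketch of the textbook Okapi BM25 formula, not the Lucene code (which, among other things, stores document lengths in a compressed form); the default values k1=1.2 and b=0.75 and all parameter names are assumptions made for this illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Score of one query term in one document (textbook BM25).

    tf: occurrences of the term in the document
    doc_len / avg_doc_len: document length and average length, in tokens
    doc_count: documents in the index; doc_freq: documents with the term
    """
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly the document length affects the score
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding score
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


stats = dict(avg_doc_len=100, doc_count=1000, doc_freq=10)
average_doc = bm25_term_score(tf=1, doc_len=100, **stats)
repeated_term = bm25_term_score(tf=5, doc_len=100, **stats)
long_doc = bm25_term_score(tf=1, doc_len=400, **stats)
```

Repeating the term raises the score, but with diminishing returns, and the same term in a document four times longer than average scores lower. Setting b to 0 removes the length effect entirely.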
ConfiguringDFRsimilarity
In case of the DFR similarity, we can configure the basic_model property (which can take the value be, d, g, if, in, p, or ine), the after_effect property (with values of no, b, or l), and the normalization property (which can be no, h1, h2, h3, or z). If we choose a normalization value other than no, we need to set the normalization factor.
Depending on the chosen normalization value, we should use normalization.h1.c (the float value) for h1 normalization, normalization.h2.c (the float value) for h2 normalization, normalization.h3.c (the float value) for h3 normalization, and normalization.z.z (the float value) for z normalization. For example, the following is how the example similarity configuration will look (we put this into the settings section of our mappings file):
"similarity":{
"esserverbook_dfr_similarity":{
"type":"DFR",
"basic_model":"g",
"after_effect":"l",
"normalization":"h2",
"normalization.h2.c":"2.0"
}
}
ConfiguringIBsimilarity
In case of the IB similarity, we can configure the distribution property (which can take the value of ll or spl) and the lambda property (which can take the value of df or ttf). In addition to that, we can choose the normalization factor, which is the same as for the DFR similarity, so we'll omit describing it a second time. The following is how the example IB similarity configuration will look (we put this into the settings section of our mappings file):
"similarity":{
"esserverbook_ib_similarity":{
"type":"IB",
"distribution":"ll",
"lambda":"df",
"normalization":"z",
"normalization.z.z":"0.25"
}
}
Batch indexing to speed up your indexing process
In Chapter 1, Getting Started with Elasticsearch Cluster, we saw how to index a particular document into Elasticsearch. It required opening an HTTP connection, sending the document, and closing the connection. Of course, we were not responsible for most of that as we used the curl command, but in the background this is what happened. However, sending the documents one by one is not efficient. Because of that, it is now time to find out how to index a large number of documents in a more convenient and efficient way than doing so one by one.
Preparing data for bulk indexing
Elasticsearch allows us to merge many requests into one package. This package can be sent as a single request. What's more, we are not limited to having a single type of request in the so-called bulk; we can mix different types of operations together, which include:
Adding or replacing the existing documents in the index (index)
Removing documents from the index (delete)
Adding new documents into the index when there is no other definition of the document in the index (create)
Modifying the documents or creating new ones if the document doesn't exist (update)
The format of the request was chosen for processing efficiency. It assumes that each operation is described by one line containing a JSON object with the description of the operation, followed by a second line with the document, which is another JSON object. We can treat the first line as a kind of information line and the second as the data line. The exception to this rule is the delete operation, which contains only the information line, because the document is not needed. Let's look at the following example:
{"index":{"_index":"addr","_type":"contact","_id":1}}
{"name":"Fyodor Dostoevsky","country":"RU"}
{"create":{"_index":"addr","_type":"contact","_id":2}}
{"name":"Erich Maria Remarque","country":"DE"}
{"create":{"_index":"addr","_type":"contact","_id":2}}
{"name":"Joseph Heller","country":"US"}
{"delete":{"_index":"addr","_type":"contact","_id":4}}
{"delete":{"_index":"addr","_type":"contact","_id":1}}
It is very important that every document or action description is placed in one line (ended by a newline character). This means that the document cannot be pretty-printed. There is a default limitation on the size of the bulk indexing file, which is set to 100 megabytes and can be changed by specifying the http.max_content_length property in the Elasticsearch configuration file. This lets us avoid issues with possible request timeouts and memory problems when dealing with requests that are too large.
Note
Note that with a single batch indexing file, we can load the data into many indices, and documents in the bulk request can have different types.
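When the bulk file is generated by a program rather than written by hand, the one-line-per-object rule is easy to satisfy. The following Python sketch shows one way to do it; the function name build_bulk_body and the tuple layout of the operations are choices made for this illustration only.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, document) tuples into a bulk body.

    Every JSON object goes on its own line and the body ends with a
    newline. The delete action has no data line.
    """
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))   # information line
        if action != "delete":
            lines.append(json.dumps(document))         # data line
    return "\n".join(lines) + "\n"


operations = [
    ("index", {"_index": "addr", "_type": "contact", "_id": 1},
     {"name": "Fyodor Dostoevsky", "country": "RU"}),
    ("create", {"_index": "addr", "_type": "contact", "_id": 2},
     {"name": "Erich Maria Remarque", "country": "DE"}),
    ("delete", {"_index": "addr", "_type": "contact", "_id": 4}, None),
]
bulk_body = build_bulk_body(operations)
```

Because json.dumps is called without an indent argument, it never pretty-prints, so each object stays on a single line. The resulting string can be written to a file such as documents.json.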
Indexing the data
In order to execute the bulk request, Elasticsearch provides the _bulk endpoint. This can be used as /_bulk, or with an index name as /index_name/_bulk, or even with a type and index name as /index_name/type_name/_bulk. The second and third forms define the default values for the index name and the type name. We can omit these properties in the information line of our request and Elasticsearch will use the default values from the URI. It is also worth knowing that the default URI values can be overwritten by the values in the information lines.
Assuming we've stored our data in the documents.json file, we can run the following command to send this data to Elasticsearch:
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @documents.json
The ?pretty parameter is of course not necessary. We've used this parameter only for the ease of analyzing the response of the preceding command. What is important, in this case, is using curl with the --data-binary parameter instead of using -d. This is because the standard -d parameter strips newline characters, which, as we said earlier, are important for parsing the bulk request content by Elasticsearch. Now let's look at the response returned by Elasticsearch:
{
"took":469,
"errors":true,
"items":[{
"index":{
"_index":"addr",
"_type":"contact",
"_id":"1",
"_version":1,
"_shards":{
"total":2,
"successful":1,
"failed":0
},
"status":201
}
},{
"create":{
"_index":"addr",
"_type":"contact",
"_id":"2",
"_version":1,
"_shards":{
"total":2,
"successful":1,
"failed":0
},
"status":201
}
},{
"create":{
"_index":"addr",
"_type":"contact",
"_id":"2",
"status":409,
"error":{
"type":"document_already_exists_exception",
"reason":"[contact][2]: document already exists",
"shard":"2",
"index":"addr"
}
}
},{
"delete":{
"_index":"addr",
"_type":"contact",
"_id":"4",
"_version":1,
"_shards":{
"total":2,
"successful":1,
"failed":0
},
"status":404,
"found":false
}
},{
"delete":{
"_index":"addr",
"_type":"contact",
"_id":"1",
"_version":2,
"_shards":{
"total":2,
"successful":1,
"failed":0
},
"status":200,
"found":true
}
}]
}
As we can see, every result is a part of the items array. Let's briefly compare these results with our input data. The first two commands, named index and create, were executed without any problems. The third operation failed because we wanted to create a record with an identifier that already existed in the index. The next two operations were deletions. Both succeeded. Note that the first of them tried to delete a nonexistent document; as you can see, this wasn't a problem for Elasticsearch. The thing worth noting, though, is that for the nonexistent document we saw a status of 404, which as an HTTP response code means not found (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html). As you can see, Elasticsearch returns information about each operation, so for large bulk requests the response can be massive.
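Since the response carries one entry per operation, a client usually wants to pick out only the failed ones. The Python sketch below does that for a response shaped like the one shown above. The function name failed_items is made up for this illustration, and the sample response is trimmed to the fields the function reads. Treating an item as failed only when it carries an error object is a choice made here, so that the 404 for the nonexistent document is not reported.

```python
def failed_items(bulk_response):
    """Return (position, operation, id, status) for every failed item."""
    failures = []
    for position, item in enumerate(bulk_response["items"]):
        # each item has a single key: the name of the operation
        operation, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                (position, operation, result["_id"], result["status"]))
    return failures


response = {
    "took": 469,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"create": {"_id": "2", "status": 201}},
        {"create": {"_id": "2", "status": 409,
                    "error": {"type": "document_already_exists_exception"}}},
        {"delete": {"_id": "4", "status": 404, "found": False}},
        {"delete": {"_id": "1", "status": 200, "found": True}},
    ],
}
failures = failed_items(response)
```

For the sample response, only the second create operation (position 2 in the items array) is reported. The top-level errors flag can be checked first to skip the scan when everything succeeded.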
The _all field
The _all field is used by Elasticsearch to store data from all the other fields in a single field for ease of searching. This kind of field may be useful when we want to implement a simple search feature and we want to search all the data (or only the fields we copy to the _all field), but we don't want to think about the field names and things like that. By default, the _all field is enabled and contains all the data from all the fields from the document. However, this field makes the index a bit bigger and that is not always needed.
For example, when you input a search phrase into a search box in the library catalog site, you expect that you can search using the author's name, the ISBN number, and the words that the book title contains, but searching for the number of pages or the cover type usually does not make sense. We can either disable the _all field completely or exclude the copying of certain fields to it. In order not to include a certain field in the _all field, we use the include_in_all property, which was discussed earlier in this chapter. To completely turn off the _all field functionality, we modify our mappings file as follows:
{
"book":{
"_all":{
"enabled":false
},
"properties":{
...
}
}
}
In addition to the enabled property, the _all field supports the following ones:
store
term_vector
analyzer
For information about the preceding properties, refer to the Mappings configuration section in this chapter.
The _source field
The _source field allows us to store the original JSON document that was sent to Elasticsearch during indexation. By default, the _source field is turned on as some of the Elasticsearch functionalities depend on it (for example, the partial update feature). In addition to that, the _source field can be used as the source of data for the highlighting functionality if a field is not stored. However, if we don't need such a functionality, we can disable the _source field as it causes some storage overhead. In order to do that, we need to set the _source object's enabled property to false, as follows:
{
"book":{
"_source":{
"enabled":false
},
"properties":{
...
}
}
}
We can also tell Elasticsearch which fields we want to exclude from the _source field and which fields we want to include. We do that by adding the includes and excludes properties to the _source field definition. For example, if we want to exclude all the fields in the author path from the _source field, our mappings will look as follows:
{
"book":{
"_source":{
"excludes":["author.*"]
},
"properties":{
...
}
}
}
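The effect of includes and excludes patterns such as author.* can be approximated with a short Python sketch. It is an illustration only: the helpers flatten and filter_source are invented here, the document is made-up sample data, and matching dotted paths with shell-style wildcards is an assumption about how such patterns behave.

```python
from fnmatch import fnmatchcase


def flatten(document, prefix=""):
    """Turn nested objects into a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in document.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def filter_source(document, includes=None, excludes=None):
    """Keep the paths matching includes (if given) and not matching excludes."""
    kept = {}
    for path, value in flatten(document).items():
        if includes and not any(fnmatchcase(path, p) for p in includes):
            continue
        if excludes and any(fnmatchcase(path, p) for p in excludes):
            continue
        kept[path] = value
    return kept


book = {
    "title": "Sample title",
    "author": {"name": "Sample author", "country": "US"},
    "pages": 100,
}
without_author = filter_source(book, excludes=["author.*"])
```

With the author.* exclusion, only title and pages remain in the filtered result.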
Additional internal fields
There are additional fields that are internally used by Elasticsearch, but which we can't configure. Those fields are:
_id: This field is used to hold the identifier of the document inside the index and type
_uid: This field is used to hold the unique identifier of the document in the index and is built of _id and _type (this allows documents with the same identifier but different types to exist inside the same index)
_type: This field is the type name for the document
_field_names: This field is the list of fields existing in the document
Introduction to segment merging
In the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, we mentioned segments and their immutability. We wrote that the Lucene library, and thus Elasticsearch, writes data to certain structures that are written once and never change. This allows for some simplification, but also introduces the need for additional work. One such example is deletion. Because segments cannot be altered, information about deletions must be stored alongside and dynamically applied during search. This is done by filtering deleted documents from the returned result set. The other example is the inability to modify the documents (however, some modifications are possible, such as modifying numeric doc values). Of course, one can say that Elasticsearch supports document updates (refer to the Manipulating data with the REST API section of Chapter 1, Getting Started with Elasticsearch Cluster). However, under the hood, the old document is marked as deleted and the one with the updated contents is indexed.
As time passes and you continue to index or delete your data, more and more segments are created. Depending on how often you modify the index, Lucene creates segments with various numbers of documents; thus, segments have different sizes. Because of that, the search performance may be lower and your index may be larger than it should be, as it still contains the deleted documents. The equation is simple: the more segments your index has, the slower the search speed is. This is when segment merging comes into play. We don't want to describe this process in detail; in the current Elasticsearch version, this part of the engine was simplified, but it is still a rather advanced topic. We decided to mention merging because we think that it is handy to know where to look for the cause of troubles connected with too many open files, suspicious CPU usage, expanding indices, or searching and indexing speed degrading with time.
Segment merging
Segment merging is the process during which the underlying Lucene library takes several segments and creates a new segment based on the information found in them. The resulting segment has all the documents stored in the original segments except the ones that were marked for deletion. After the merge operation, the source segments are deleted from the disk. Because segment merging is rather costly in terms of CPU and I/O usage, it is crucial to appropriately control when and how often this process is invoked.
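The idea can be modeled in a few lines of Python. This is a conceptual sketch, not how Lucene stores data: a segment is represented as a list of (document identifier, deleted flag) pairs, and the function name merge_segments is made up for this illustration.

```python
def merge_segments(segments):
    """Build one new segment from several source segments.

    Documents marked as deleted are dropped. The source segments are
    left untouched, mirroring their immutability.
    """
    return [(doc_id, False)
            for segment in segments
            for doc_id, deleted in segment
            if not deleted]


segment_a = [(1, False), (2, True)]               # document 2 was deleted
segment_b = [(3, False), (4, True), (5, False)]   # document 4 was deleted
merged = merge_segments([segment_a, segment_b])

docs_before = len(segment_a) + len(segment_b)
docs_after = len(merged)
```

Before the merge, the two segments hold five entries, two of which are only marked as deleted. The merged segment holds the three live documents, and the space taken by the deleted ones is reclaimed once the source segments are removed.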
The need for segment merging
You may ask yourself why you have to bother with segment merging. First of all, the more segments the index is built from, the slower the search will be and the more memory Lucene will use. The second reason is the disk space and resources, such as file descriptors, used by the index. If you delete many documents from your index then, until the merge happens, those documents are only marked as deleted and not deleted physically. So, it may happen that most of the documents that use our CPU and memory don't exist! Fortunately, Elasticsearch uses reasonable defaults for segment merging and it is very probable that no changes are necessary.
The merge policy
The merge policy defines when the merging process should be performed. Elasticsearch merges segments of approximately similar sizes, taking into account the maximum number of segments allowed per tier. The merging algorithm looks for the segments with the lowest cost of merge and the most impact on the resulting segment.
The basic properties of the tiered merge policy are as follows:
index.merge.policy.expunge_deletes_allowed: This property tells Elasticsearch to merge segments with a percentage of deleted documents higher than this value. It defaults to 10.
index.merge.policy.floor_segment: This property defaults to 2mb and tells Elasticsearch to treat smaller segments as ones with size equal to the value of this property. It prevents flushing of tiny segments to avoid their high number.
index.merge.policy.max_merge_at_once: The maximum number of segments to be merged at once. It defaults to 10.
index.merge.policy.max_merge_at_once_explicit: The maximum number of segments merged at once during expunge deletes or optimize operations. It defaults to 30.
index.merge.policy.max_merged_segment: The maximum size of a segment that can be produced during normal merging. It defaults to 5gb.
index.merge.policy.segments_per_tier: This property defaults to 10 and roughly defines the number of segments per tier. Smaller values mean more merging but fewer segments, which results in higher search speed but lower indexing speed and more I/O pressure. Higher values of the property will result in a higher segment count, thus slower search speed but higher indexing speed.
index.merge.policy.reclaim_deletes_weight: This property tells Elasticsearch how important it is to choose segments with many deleted documents. It defaults to 2.0.
For example, to update the merge policy settings of an already created index, we could run a command like this:
curl -XPUT 'localhost:9200/essb/_settings' -d '{
"index.merge.policy.max_merged_segment":"10gb"
}'
To get deeper into segment merging, refer to our book Mastering Elasticsearch Second Edition, published by Packt Publishing. You can also find more information about the tiered merge policy at https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html.
Note
Up to the 2.0 version of Elasticsearch, we were able to choose between three merge policies: tiered, log_byte_size, and log_doc. The currently used merge policy is based on the tiered merge policy and we are forced to use it.
The merge scheduler
The merge scheduler tells Elasticsearch how the merge process should occur. The current implementation is based on a concurrent merge scheduler that is started in a separate thread and uses the defined number of threads doing merges in parallel. Elasticsearch allows you to set the number of threads that can be used for simultaneous merging by using the index.merge.scheduler.max_thread_count property.
Throttling
As we have already mentioned, merging may be expensive when it comes to server resources. The merge process usually works in parallel to other operations, so theoretically it shouldn't have too much influence. In practice, the number of disk input/output operations can be so large as to significantly affect the overall performance. In such cases, throttling is something that may help. In fact, this feature can be used for limiting the speed of the merge, but it may also be used for all the operations using the data store. Throttling can be set in the Elasticsearch configuration file (the elasticsearch.yml file) or dynamically by using the settings API (refer to The update settings API section of Chapter 9, Elasticsearch Cluster, for details). There are two settings that adjust throttling: type and value.
To set the throttling type, set the indices.store.throttle.type property, which allows us to use the following values:
none: This value defines that no throttling is on
merge: This value defines that throttling affects only the merge process
all: This value defines that throttling is used for all the data store activities
The second property, indices.store.throttle.max_bytes_per_sec, describes how much the throttling limits the I/O operations. As its name suggests, it tells us how many bytes can be processed per second. For example, let's look at the following configuration:
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 10mb
In this example, we limit the merge operations to 10 megabytes per second. By default, Elasticsearch uses the merge throttling type with the max_bytes_per_sec property set to 20mb. This means that all the merge operations are limited to 20 megabytes per second.
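To get a feeling for what such a limit means, we can do a back-of-the-envelope calculation in Python. The helper names below are hypothetical, and the sketch assumes binary units (1mb = 1024 * 1024 bytes) and that the limit applies to the bytes written for the merged segment. It gives a lower bound on the time needed to write a segment of the maximum default size (5gb) under both limits mentioned above.

```python
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(size):
    """Convert a size such as '20mb' to bytes (binary units are an assumption)."""
    number, unit = size[:-2], size[-2:].lower()
    return int(number) * UNITS[unit]


def min_merge_seconds(segment_size, max_bytes_per_sec):
    """Lower bound on the time needed to write a merged segment under throttling."""
    return to_bytes(segment_size) / to_bytes(max_bytes_per_sec)


print(min_merge_seconds("5gb", "20mb"))   # 256.0 seconds with the default limit
print(min_merge_seconds("5gb", "10mb"))   # 512.0 seconds with the limit from our example
```

A stricter limit protects search and indexing from I/O spikes, but large merges take proportionally longer to finish.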
Introduction to routing
By default, Elasticsearch will try to distribute your documents evenly among all the shards of the index. However, that's not always the desired situation. In order to retrieve the documents, Elasticsearch must query all the shards and merge the results. What if we could divide our data on some basis (for example, the client identifier) and use that information to put data with the same properties in the same place in the cluster? Elasticsearch allows us to do that by exposing a powerful document and query distribution control mechanism: routing. In short, it allows us to choose a shard to be used to index or search the data.
Default indexing
During indexing operations, when you send a document for indexing, Elasticsearch looks at its identifier to choose the shard in which the document should be indexed. By default, Elasticsearch calculates the hash value of the document's identifier and, on the basis of that, it puts the document in one of the available primary shards. Then, those documents are redistributed to the replicas. The following diagram shows a simple illustration of how indexing works by default:
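Independently of the diagram, the hashing step described above can be sketched in a few lines of Python. This is only an illustration: pick_shard is a hypothetical name and zlib.crc32 is a stand-in hash, because Elasticsearch uses its own hash function internally.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default, the document identifier) to a primary shard."""
    # crc32 is a stand-in hash used for illustration only.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


for doc_id in ["1", "2", "3", "4"]:
    print("document", doc_id, "-> shard", pick_shard(doc_id, 5))
```

The same identifier always ends up in the same shard, which is also why the number of primary shards cannot be changed without reindexing: a different divisor would send existing identifiers to different shards.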
Default searching
Searching is a bit different from indexing, because in most situations you need to query all the shards to get the data you are interested in (we will talk about that in Chapter 3, Searching Your Data), at least in the initial scatter phase of the query. Imagine a situation when you have the following mappings describing your index:
{
"mappings":{
"post":{
"properties":{
"id":{"type":"long"},
"name":{"type":"string"},
"contents":{"type":"string"},
"userId":{"type":"long"}
}}
}}
As you can see, our index consists of four fields: the identifier (the id field), name of the document (the name field), contents of the document (the contents field), and the identifier of the user to which the documents belong (the userId field). To get all the documents for a particular user, one with userId equal to 12, you can run the following query:
curl -XGET 'http://localhost:9200/posts/_search?q=userId:12'
Depending on the search type (we will talk more about it in Chapter 3, Searching Your Data), Elasticsearch will run your query. It usually means that it will first query all the nodes for the identifiers and score of the matching documents and then it will send an internal query again, but only to the relevant shards (the ones containing the needed documents) to get the documents needed to build the response.
A very simplified view of how the default searching works during its initial phase is shown in the following illustration:
What if we could put all the documents for a single user into a single shard and query only that shard? Wouldn't that be wise for performance? Yes, that is handy and that is what routing allows you to do.
Routing
Routing can control which shard your documents and queries will be forwarded to. By now, you will probably have guessed that we can specify the routing value both during indexing and during querying and, in fact, if you decide to specify explicit routing values, you'll probably want to do that during indexing and searching.
In our case, we will use the userId value to set routing during indexing and the same value will be used during searching. Because we will use the same routing value for all the documents for a single user, the same hash value will be calculated and thus all the documents for that particular user will be placed in the same shard. Using the same value during search will result in searching a single shard instead of the whole index.
There is one thing you should remember when using routing during searching: you should add a query part that will limit the returned documents to the ones for the given user. Routing is not enough. This is because you'll probably have more distinct routing values than the number of shards your index will be built with. For example, you can have 10 shards building your index, but at the same time have hundreds of users. It is physically impossible to dedicate a single shard to only a single user. It is usually not good from a scaling point of view as well. Because of that, a few distinct values can point to the same shard; in our case, the data of a few users will be placed in the same shard. This is why we need a query part that will limit the data to a particular user identifier, such as a term query.
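The reasoning above can be checked with a small simulation. The following Python sketch is a toy model, not Elasticsearch code: pick_shard and search are hypothetical names and zlib.crc32 is a stand-in for the real hash function. It routes the documents of 200 users into 10 shards and shows that the shard chosen for a given user also contains the documents of other users, so the search still needs a part that filters by userId.

```python
import zlib
from collections import defaultdict

NUMBER_OF_SHARDS = 10


def pick_shard(routing_value):
    # crc32 is only a stand-in for the real hash function.
    return zlib.crc32(str(routing_value).encode("utf-8")) % NUMBER_OF_SHARDS


# Index one document per user, routed by the user identifier.
shards = defaultdict(list)
for user_id in range(1, 201):
    shards[pick_shard(user_id)].append({"userId": user_id})


def search(routing, user_id):
    candidates = shards[pick_shard(routing)]                   # routing: only one shard is searched
    return [d for d in candidates if d["userId"] == user_id]   # query part: only one user is returned


print(len(shards[pick_shard(12)]), "documents live in the shard chosen for user 12")
print(search(routing=12, user_id=12))
```

With 200 users and 10 shards, at least one shard must hold the documents of 20 or more users, so routing on its own can never guarantee that the results belong to a single user.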
The following diagram shows a very simple illustration of how searching works with a provided custom routing value:
As you can see, Elasticsearch will send our query to a single shard. Now let's look at how we can specify the routing values.
The routing parameters
The idea is very simple. The endpoint used for all the operations connected with fetching or storing documents in Elasticsearch allows us to use an additional parameter called routing. You can add it to your HTTP request or set it by using the client library of your choice.
So, in order to index a sample document to the previously shown index, we will use the following command:
curl -XPUT 'http://localhost:9200/posts/post/1?routing=12' -d '{
"id":"1",
"name":"Test document",
"contents":"Test document",
"userId":"12"
}'
If we now get back to our previous query fetching our user's data and we modify it to use routing, it would look as follows:
curl -XGET 'http://localhost:9200/posts/_search?routing=12&q=userId:12'
As you can see, the same routing value was used during indexing and querying. This is possible in most cases when routing is used. We know which user data we are indexing and we will probably know which user is searching for the data. In our case, our imaginary user was given the identifier of 12 and we used that value during indexing and searching.
Note that during searching you can specify multiple routing values separated by commas. For example, if we want the preceding query to be additionally routed by the value of the section parameter (if it existed) and we also want to filter by this parameter, our query will look like the following:
curl -XGET 'http://localhost:9200/posts/_search?routing=12,6654&q=userId:12+AND+section:6654'
Of course, the preceding command can match multiple shards now as the values given to routing can point to multiple shards. Because of that you need to provide only a single routing value during indexation (Elasticsearch needs to be pointed to a single shard or indexation will fail). You can of course query multiple shards at the same time and because of that multiple routing values can be provided during searching.
Note
Remember that routing is not the only thing that is required to get results for a given user. That's because we usually have few shards that hold only a single routing value. This means that we will have data from multiple users in a single shard. So, when using routing, you should also narrow down your results to the ones for a given user. You'll learn more about how you can do that in Chapter 3, Searching Your Data.
Routing fields
Specifying the routing value with each request is critical when using an index operation. Without it, Elasticsearch uses the default way of determining where the document should be stored: it uses the hash value of the document identifier. This may lead to a situation where one document exists in many versions on different shards. A similar situation may occur when fetching the document. When a document is stored with a given routing value and we fetch it without that value, we may hit the wrong shard and the document may not be found.
In fact, Elasticsearch allows us to change the default behavior and force the use of routing when working with a given index. To do that, we need to add the following section to our type definition:
"_routing":{
"required":true
}
The preceding definition means that the routing value needs to be provided (the "required":true property); without it, an index request will fail.
Summary
In this chapter, we've learned a lot when it comes to indexation and data handling in Elasticsearch. We started with basic information about Elasticsearch and we proceeded to tuning the schema-less behavior in Elasticsearch. We learned how to configure our mappings, use out of the box language analysis capabilities of Elasticsearch, and create our own mappings. We looked at batch indexing to speed up indexation and we added additional internal information to the documents in our indices. Finally, we looked at segment merging and routing.
In the next chapter, we will fully concentrate on searching and the extensive query language of Elasticsearch. We will start with how to query Elasticsearch and how the Elasticsearch query process works. We will learn about all the basic queries and compound queries to be able to use them in our applications. Finally, we will see which query should be chosen for the given use case.
Chapter 3. Searching Your Data
In the previous chapter, we dived into Elasticsearch indexing. We learned a lot when it comes to data handling. We saw how to tune the Elasticsearch schema-less mechanism and we now know how to create our own mappings. We also saw the core types of Elasticsearch and we used analyzers, both the ones that come out of the box with Elasticsearch and the one we defined ourselves. We used bulk indexing and we added additional internal information to our indices. Finally, we learned what segment merging is, how we can fine tune it, and how to use routing in Elasticsearch and what it gives us. This chapter is fully dedicated to querying. By the end of this chapter, you will have learned the following topics:
How to query Elasticsearch
What happens internally when queries are run
What are the basic queries in Elasticsearch
What are the compound queries in Elasticsearch that allow us to group other queries
How to use position aware queries (span queries)
How to choose the right query for the job
Querying Elasticsearch
So far, when we have searched our data, we used the REST API and a simple query or the GET request. Similarly, when we were changing the index, we also used the REST API and sent the JSON-structured data to Elasticsearch. Regardless of the type of operation we wanted to perform, whether it was a mapping change or document indexation, we used a JSON structured request body to inform Elasticsearch about the operation details.
A similar situation happens when we want to send more than a simple query to Elasticsearch: we structure it using JSON objects and send it to Elasticsearch in the request body. This is called the query DSL. In a broader view, Elasticsearch supports two kinds of queries: basic ones and compound ones. Basic queries, such as the term query, are used for querying the actual data. We will cover these in the Basic queries section of this chapter. The second type of query is the compound query, such as the bool query, which can combine multiple queries. We will cover these in the Compound queries section of this chapter.
However, this is not the whole picture. In addition to these two types of queries, certain queries can have filters that are used to narrow down your results with certain criteria. Filter queries don't affect scoring and are usually very efficient and easily cached.
To make it even more complicated, queries can contain other queries (don't worry; we will try to explain all this!). Furthermore, some queries can contain filters and others can contain both queries and filters. Although this is not everything, we will stick with this working explanation for now. We will go over this in greater detail in the Compound queries section in this chapter and the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.
The example data
If not stated otherwise, the following mappings will be used for the rest of the chapter:
{
"book":{
"properties":{
"author":{
"type":"string"
},
"characters":{
"type":"string"
},
"copies":{
"type":"long",
"ignore_malformed":false
},
"otitle":{
"type":"string"
},
"tags":{
"type":"string",
"index":"not_analyzed"
},
"title":{
"type":"string"
},
"year":{
"type":"long",
"ignore_malformed":false,
"index":"analyzed"
},
"available":{
"type":"boolean"
}
}
}
}
The preceding mappings represent a simple library and were used to create the library index. One thing to remember is that Elasticsearch will analyze the string based fields if we don't configure it differently.
The preceding mappings were stored in the mapping.json file and, in order to create the mentioned library index, we can use the following commands:
curl -XPOST 'localhost:9200/library'
curl -XPUT 'localhost:9200/library/book/_mapping' -d @mapping.json
We also used the following sample data as the example for this chapter:
{"index":{"_index":"library","_type":"book","_id":"1"}}
{"title":"All Quiet on the Western Front","otitle":"Im Westen nichts Neues","author":"Erich Maria Remarque","year":1929,"characters":["Paul Bäumer","Albert Kropp","Haie Westhus","Fredrich Müller","Stanislaus Katczinsky","Tjaden"],"tags":["novel"],"copies":1,"available":true,"section":3}
{"index":{"_index":"library","_type":"book","_id":"2"}}
{"title":"Catch-22","author":"Joseph Heller","year":1961,"characters":["John Yossarian","Captain Aardvark","Chaplain Tappman","Colonel Cathcart","Doctor Daneeka"],"tags":["novel"],"copies":6,"available":false,"section":1}
{"index":{"_index":"library","_type":"book","_id":"3"}}
{"title":"The Complete Sherlock Holmes","author":"Arthur Conan Doyle","year":1936,"characters":["Sherlock Holmes","Dr. Watson","G. Lestrade"],"tags":[],"copies":0,"available":false,"section":12}
{"index":{"_index":"library","_type":"book","_id":"4"}}
{"title":"Crime and Punishment","otitle":"Преступлéние и наказáние","author":"Fyodor Dostoevsky","year":1886,"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],"tags":[],"copies":0,"available":true}
We stored our sample data in the documents.json file and we use the following command to index it:
curl -s -XPOST 'localhost:9200/_bulk' --data-binary @documents.json
This command runs bulk indexing. You can learn more about it in the Batch indexing to speed up your indexing process section in Chapter 2, Indexing Your Data.
A simple query
The simplest way to query Elasticsearch is to use the URI request query. We already discussed it in the Searching with the URI request query section of Chapter 1, Getting Started with Elasticsearch Cluster. For example, to search for the word crime in the title field, you could send a query using the following command:
curl -XGET 'localhost:9200/library/book/_search?q=title:crime&pretty'
This is a very simple, but limited, way of submitting queries to Elasticsearch. If we look from the point of view of the Elasticsearch query DSL, the preceding query is a query_string query. It searches for the documents that have the term crime in the title field and can be rewritten as follows:
{
"query":{
"query_string":{"query":"title:crime"}
}
}
Sending a query using the query DSL is a bit different, but still not rocket science. We send the GET HTTP request (POST is also accepted in case your tool or library doesn't allow sending a request body in HTTP GET requests) to the _search REST endpoint as earlier and include the query in the request body. Let's take a look at the following command:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"query_string":{"query":"title:crime"}
}
}'
As you can see, we used the request body (the -d switch) to send the whole JSON-structured query to Elasticsearch. The pretty request parameter tells Elasticsearch to structure the response in such a way that we humans can read it more easily. In response to the preceding command, we get the following output:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5,
"_source":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
}
}]
}
}
Nice! We got our first search results with the query DSL.
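If you prefer to send such requests from code instead of curl, the same request can be prepared with the Python standard library. The following is a minimal sketch under a few assumptions: build_search_request is a hypothetical helper, the node listens on localhost:9200 as in our curl examples, and POST is used because it is accepted for clients that cannot send a body with GET. The request is only built and printed here; sending it requires a running node.

```python
import json
import urllib.request


def build_search_request(body, host="localhost:9200", index="library", doc_type="book"):
    """Build (but do not send) a query DSL search request with a JSON body."""
    url = "http://%s/%s/%s/_search?pretty" % (host, index, doc_type)
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_search_request({"query": {"query_string": {"query": "title:crime"}}})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To run it against a live node: urllib.request.urlopen(request).read()
```

Keeping the query as a regular dictionary makes it easy to add the top-level properties described in the following sections before the body is serialized.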
Paging and result size
Elasticsearch allows us to control how many results we want to get (at most) and from which result we want to start. The following are the two additional properties that can be set in the request body:
from: This property specifies the document that we want to have our results from. Its default value is 0, which means that we want to get our results from the first document.
size: This property specifies the maximum number of documents we want as the result of a single query (which defaults to 10). For example, if we are only interested in aggregations results and don't care about the documents returned by the query, we can set this parameter to 0.
If we want our query to get documents starting from the tenth item on the list and fetch 20 documents, we send the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"from":9,
"size":20,
"query":{
"query_string":{"query":"title:crime"}
}
}'
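Because from is zero-based, the tenth item has the offset 9. The window selected by the two properties behaves like a list slice, which the following Python sketch illustrates (the page function and the list of 50 hits are hypothetical and used only for illustration):

```python
def page(sorted_hits, from_=0, size=10):
    """Return the window of hits selected by from and size (defaults as in the text)."""
    return sorted_hits[from_:from_ + size]


hits = ["doc%d" % n for n in range(1, 51)]   # 50 hypothetical hits, best first
window = page(hits, from_=9, size=20)
print(window[0], window[-1], len(window))    # doc10 doc29 20
```

If fewer documents match than from plus size, the window is simply shorter, just as a slice of a short list would be.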
Returning the version value
In addition to all the information returned, Elasticsearch can return the version of the document (we mentioned versioning in Chapter 1, Getting Started with Elasticsearch Cluster). To do this, we need to add the version property with the value of true to the top level of our JSON object. So, the final query, which requests the version information, will look as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"version":true,
"query":{
"query_string":{"query":"title:crime"}
}
}'
After running the preceding query, we get the following results:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_version":1,
"_score":0.5,
"_source":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
}
}]
}
}
As you can see, the _version section is present for the single hit we got.
Limiting the score
For nonstandard use cases, Elasticsearch provides a feature that lets us filter the results on the basis of a minimum score value that the document must have to be considered a match. In order to use this feature, we must provide the min_score value at the top level of our JSON object with the value of the minimum score. For example, if we want our query to only return documents with a score higher than 0.75, we send the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"min_score":0.75,
"query":{
"query_string":{"query":"title:crime"}
}
}'
We get the following response after running the preceding query:
{
"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[]
}
}
If you look at the previous examples, the score of our document was 0.5, which is lower than 0.75, and thus we didn't get any documents in response.
Limiting the score usually doesn't make much sense because comparing scores between the queries is quite hard. However, maybe in your case, this functionality will be needed.
Choosing the fields that we want to return
With the use of the fields array in the request body, Elasticsearch allows us to define which fields to include in the response. Remember that you can only return these fields if they are marked as stored in the mappings used to create the index, or if the _source field was used (Elasticsearch uses the _source field to provide the stored values and the _source field is turned on by default).
So, for example, to return only the title and the year fields in the results (for each document), send the following query to Elasticsearch:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"fields":["title","year"],
"query":{
"query_string":{"query":"title:crime"}
}
}'
In response, we get the following output:
{
"took":5,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5,
"fields":{
"title":["Crime and Punishment"],
"year":[1886]
}
}]
}
}
As you can see, everything worked as we wanted it to. There are four things we would like to share with you at this point, which are as follows:
If we don't define the fields array, it will use the default value and return the _source field if available.
If we use the _source field and request a field that is not stored, then that field will be extracted from the _source field (however, this requires additional processing).
If we want to return all the stored fields, we just pass an asterisk (*) as the field name.
From a performance point of view, it's better to return the _source field instead of multiple stored fields. This is because getting multiple stored fields may be slower compared to retrieving a single _source field.
Source filtering
In addition to choosing which fields are returned, Elasticsearch allows us to use so-called source filtering. This functionality allows us to control which fields are returned from the _source field. Elasticsearch exposes several ways to do this. The simplest source filtering allows us to decide whether a document should be returned or not. Consider the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"_source":false,
"query":{
"query_string":{"query":"title:crime"}
}
}'
The result returned by Elasticsearch should be similar to the following one:
{
"took":12,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5
}]
}
}
Note that the response is limited to base information about a document and the _source field was not included. If you use Elasticsearch as a second source of data and the content of the document is served from an SQL database or cache, the document identifier is all you need.
The second way is similar to the fields array described earlier, although we define which fields should be returned in the document source itself. Let's see that using the following example query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"_source":["title","otitle"],
"query":{
"query_string":{"query":"title:crime"}
}
}'
We wanted to get the title and the otitle document fields in the returned _source field. Elasticsearch extracted those values from the original _source value and included the _source field only with the requested fields. The whole response returned by Elasticsearch looked as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5,
"_source":{
"otitle":"Преступлéние и наказáние",
"title":"Crime and Punishment"
}
}]
}
}
We can also use an asterisk to select which fields should be returned in the _source field; for example, title* will return values for the title field and for title10 (if we have such a field in our data). If we have documents with nested parts, we can use notation with a dot; for example, title.* to select all the fields nested under the title object.
Finally, we can also specify explicitly which fields we want to include and which to exclude from the _source field. We can include fields using the include property and we can exclude fields using the exclude property (both of them are arrays of values). For example, if we want the returned _source field to include all the fields starting with the letter t but not the title field, we will run the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"_source":{
"include":["t*"],
"exclude":["title"]
},
"query":{
"query_string":{"query":"title:crime"}
}
}'
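To see which fields survive such a combination of include and exclude patterns, we can emulate the matching in Python. This is only a sketch of the idea for flat, top-level fields: filter_source is a hypothetical helper built on the standard fnmatch module and it does not handle the dotted notation for nested objects.

```python
from fnmatch import fnmatchcase


def filter_source(source, include=("*",), exclude=()):
    """Keep top-level fields matching any include pattern and no exclude pattern."""
    return {
        name: value
        for name, value in source.items()
        if any(fnmatchcase(name, pattern) for pattern in include)
        and not any(fnmatchcase(name, pattern) for pattern in exclude)
    }


source = {
    "title": "Crime and Punishment",
    "otitle": "Преступлéние и наказáние",
    "year": 1886,
    "tags": [],
}
print(filter_source(source, include=["t*"], exclude=["title"]))   # {'tags': []}
```

For our sample document, only the tags field starts with the letter t and is not excluded, so it is the only field left in the filtered source.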
Using the script fields
Elasticsearch allows us to use script-evaluated values that will be returned with the result documents (we will discuss Elasticsearch scripting capabilities in greater detail in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better). To use the script fields functionality, we add the script_fields section to our JSON query object and an object with a name of our choice for each scripted value that we want to return. For example, to return a value named correctYear, which is calculated as the year field minus 1800, we run the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"script_fields":{
"correctYear":{
"script":"doc[\"year\"].value-1800"
}
},
"query":{
"query_string":{"query":"title:crime"}
}
}'
Note
By default, Elasticsearch doesn't allow us to use dynamic scripting. If you tried the preceding query, you probably got an error with information stating that the scripts of type [inline] with operation [search] and language [groovy] are disabled. To make this example work, you should add the script.inline: on property to the elasticsearch.yml file. However, this exposes a security threat. Make sure to read the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better, to learn about the consequences.
Using the doc notation, like we did in the preceding example, allows us to cache the results returned and speed up script execution at the cost of higher memory consumption. We are also limited to single-valued and single-term fields. If we care about memory usage, or if we are using more complicated field values, we can always use the _source field. The same query using the _source field looks as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"script_fields":{
"correctYear":{
"script":"_source.year-1800"
}
},
"query":{
"query_string":{"query":"title:crime"}
}
}'
The following response is returned by Elasticsearch with dynamic scripting enabled:
{
"took":76,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5,
"fields":{
"correctYear":[86]
}
}]
}
}
As you can see, we got the calculated correctYear field in response.
Passing parameters to the script fields
Let's take a look at one more feature of the script fields: the passing of additional parameters. Instead of having the value 1800 in the equation, we can use a variable name and pass its value in the params section. If we do this, our query will look as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"script_fields":{
"correctYear":{
"script":"_source.year-paramYear",
"params":{
"paramYear":1800
}
}
},
"query":{
"query_string":{"query":"title:crime"}
}
}'
As you can see, we added the paramYear variable as part of the scripted equation and provided its value in the params section. This allows Elasticsearch to execute the same script with different parameter values in a slightly more efficient way.
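The reason why parameters help can be shown with a loose analogy in Python. This is not how Elasticsearch executes Groovy scripts; it only illustrates the idea that the script text stays the same, so it can be compiled once and reused, while only the parameter values change. The names script and run_script are hypothetical.

```python
# The script text is constant, so it is compiled only once.
script = compile("year - paramYear", "<script>", "eval")


def run_script(doc, params):
    """Evaluate the precompiled script for one document with the given parameters."""
    variables = {"year": doc["year"], **params}
    return eval(script, {"__builtins__": {}}, variables)


doc = {"year": 1886}
print(run_script(doc, {"paramYear": 1800}))   # 86
print(run_script(doc, {"paramYear": 1900}))   # -14
```

If the value was embedded in the script text instead, every new value would produce a different script that has to be compiled again.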
Understanding the querying process
After reading the previous section, we now know how querying works in Elasticsearch. You know that Elasticsearch, in most cases, needs to scatter the query across multiple nodes, get the results, merge them, fetch the relevant documents from one or more shards, and return the final results to the client requesting the documents. What we didn't talk about are two additional things that define how queries behave: search type and query execution preference. We will now concentrate on these functionalities of Elasticsearch.
Query logic
Elasticsearch is a distributed search engine and so all functionality provided must be distributed in its nature. It is exactly the same with querying. Because we would like to discuss some more advanced topics on how to control the query process, we first need to know how it works.
Let's now get back to how querying works. We started the theory in the first chapter and we would like to get back to it. By default, if we don't alter anything, the query process will consist of two phases: the scatter and the gather phase. The aggregator node (the one that receives the request) will run the scatter phase first. During that phase, the query is distributed to all the shards that our index is built from (of course if routing is not used). For example, if it is built of 5 shards and 1 replica then 5 physical shards will be queried (we don't need to query a shard and its replica as they contain the same data). Each of the queried shards will only return the document identifier and the score of the document. The node that sent the scatter query will wait for all the shards to complete their task, gather the results, and sort them appropriately (in this case, from top scoring to the lowest scoring ones).
After that, a new request will be sent to build the search results, but this time only to those shards that hold the documents needed to build the response. In most cases, Elasticsearch won't send the request to all the shards but to a subset of them. That's because we usually don't get the complete result of the query but only a portion of it. This phase is called the gather phase. After all the documents are gathered, the final response is built and returned as the query result. This is the basic and default Elasticsearch behavior but we can change it.
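The two phases can be modeled with a short Python sketch. It is a toy, in-memory model under stated assumptions: the SHARDS data, the scores, and the scatter and gather functions are all hypothetical and only mirror the description above, where shards first return identifiers with scores and only the shards holding the winning documents are asked for their contents.

```python
import heapq

# Three hypothetical shards; each maps a document id to (score, stored document).
SHARDS = [
    {"1": (0.2, {"title": "A"}), "4": (0.9, {"title": "D"})},
    {"2": (0.5, {"title": "B"})},
    {"3": (0.7, {"title": "C"}), "5": (0.1, {"title": "E"})},
]


def scatter(size):
    """Scatter phase: every shard returns only (score, shard number, id) of its top hits."""
    partial = []
    for shard_no, shard in enumerate(SHARDS):
        hits = ((score, shard_no, doc_id) for doc_id, (score, _) in shard.items())
        partial.extend(heapq.nlargest(size, hits))
    # The node that received the request sorts the partial results, best first.
    return heapq.nlargest(size, partial)


def gather(top_hits):
    """Gather phase: contact only the shards that hold the winning documents."""
    contacted = sorted({shard_no for _, shard_no, _ in top_hits})
    documents = [
        {"_id": doc_id, "_score": score, "_source": SHARDS[shard_no][doc_id][1]}
        for score, shard_no, doc_id in top_hits
    ]
    return contacted, documents


contacted, results = gather(scatter(size=2))
print(contacted)                      # [0, 2]
print([r["_id"] for r in results])    # ['4', '3']
```

All three shards take part in the scatter phase, but the shard with number 1 is skipped in the gather phase because none of its documents made it into the top two.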
Search type
Elasticsearch allows us to choose how we want our query to be processed internally. We can do that by specifying the search type. There are different situations where different search types are appropriate: sometimes one can care only about the performance while sometimes query relevance is the most important factor. You should remember that each shard is a small Lucene index and, in order to return more relevant results, some information, such as frequencies, needs to be transferred between the shards. To control how the queries are executed, we can pass the search_type request parameter and set it to one of the following values:
query_then_fetch: In the first step, the query is executed to get the information needed to sort and rank the documents. This step is executed against all the shards. Then only the relevant shards are queried for the actual content of the documents. This is the search type used by default if no search type is provided with the query and this is the query type we described previously.
dfs_query_then_fetch: This is similar to query_then_fetch. However, it contains an additional query phase compared to query_then_fetch, which calculates distributed term frequencies.
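Why distributed frequencies matter can be shown with a small calculation. The sketch below is only an illustration: the per-shard statistics are made up, and the idf function is a simplified inverse document frequency modeled on the classic TF-IDF formula, so the real scoring differs in detail. It compares the value each shard would compute on its own with the value computed from the combined statistics.

```python
import math

# Hypothetical statistics for one term on two shards:
# (number of documents containing the term, number of documents in the shard)
shard_stats = [(1, 10), (40, 50)]


def idf(doc_freq, doc_count):
    """A simplified inverse document frequency, used for illustration only."""
    return 1.0 + math.log(doc_count / (doc_freq + 1.0))


# query_then_fetch: every shard uses only its own statistics.
local_idfs = [idf(doc_freq, doc_count) for doc_freq, doc_count in shard_stats]

# dfs_query_then_fetch: statistics are collected from all the shards first.
global_idf = idf(sum(df for df, _ in shard_stats), sum(n for _, n in shard_stats))

print(local_idfs)
print(global_idf)
```

With local statistics, the same term is treated as rare on the first shard and as common on the second one, so scores coming from different shards are not directly comparable. The additional phase removes this difference at the cost of one more round trip.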
Therearealsotwodeprecatedsearchtypes:countandscan.ThefirstoneisdeprecatedstartingfromElasticsearch2.0andthesecondonestartingwithElasticsearch2.1.Thefirstsearchtypeusedtoprovidebenefitswhereonlyaggregationsorthenumberofdocumentswasrelevant,butnowitisenoughtoaddsizeequalto0toyourqueries.Thescanrequestwasusedforscrollingfunctionality.
Soifwewouldliketousethesimplestsearchtype,wewouldrunthefollowingcommand:
curl -XGET 'localhost:9200/library/book/_search?pretty&search_type=query_then_fetch' -d '{
  "query": {
    "term": {"title": "crime"}
  }
}'
Search execution preference
In addition to the possibility of controlling how the query is executed, we can also control on which shards to execute the query. By default, Elasticsearch uses shards and replicas on any node in a round robin manner, so that each shard is queried a similar number of times. The default behavior is the proper method of shard execution preference for most use cases. But there may be times when we want to change the default behavior. For example, you may want the search to only be executed on the primary shards. To do that, we can set the preference request parameter to one of the following values:
_primary: The operation will be only executed on the primary shards, so the replicas won't be used. This can be useful when we need to use the latest information from the index but our data is not replicated right away.
_primary_first: The operation will be executed on the primary shards if they are available. If not, it will be executed on the other shards.
_replica: The operation will be executed only on the replica shards.
_replica_first: This operation is similar to _primary_first, but uses replica shards. The operation will be executed on the replica shards if possible and on the primary shards if the replicas are not available.
_local: The operation will be executed on the shards available on the node which the request was sent from and, if such shards are not present, the request will be forwarded to the appropriate nodes.
_only_node:node_id: This operation will be executed on the node with the provided node identifier.
_only_nodes:nodes_spec: This operation will be executed on the nodes that are defined in nodes_spec. This can be an IP address, a name, a name or IP address using wildcards, and so on. For example, if nodes_spec is set to 192.168.1.*, the operation will be run on the nodes with IP addresses starting with 192.168.1.
_prefer_node:node_id: Elasticsearch will try to execute the operation on the node with the provided identifier. However, if the node is not available, it will be executed on the nodes that are available.
_shards:1,2: Elasticsearch will execute the operation on the shards with the given identifiers; in this case, on shards with identifiers 1 and 2. The _shards parameter can be combined with other preferences, but the shard identifiers need to be provided first. For example, _shards:1,2;_local.
Custom value: Any custom string value may be passed. Requests with the same values provided will be executed on the same shards.
For example, if we would like to execute a query only on the local shards, we would run the following command:
curl -XGET 'localhost:9200/library/_search?pretty&preference=_local' -d '{
  "query": {
    "term": {"title": "crime"}
  }
}'
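When the preference value is assembled in application code, the rule that shard identifiers come first is easy to get wrong. The following Python sketch shows one way to compose the value and place it in a search URL; the build_preference and search_url helpers are hypothetical names, and the separator follows the _shards:1,2;_local example given in the list above.

```python
from urllib.parse import urlencode

def build_preference(shards=None, other=None):
    """Compose a preference value; shard identifiers must come first."""
    parts = []
    if shards:
        parts.append("_shards:" + ",".join(str(s) for s in shards))
    if other:
        parts.append(other)
    return ";".join(parts)

def search_url(index, preference, host="localhost:9200"):
    # urlencode percent-escapes characters such as ':' and ';' for use in a URL.
    query = urlencode({"pretty": "true", "preference": preference})
    return "http://%s/%s/_search?%s" % (host, index, query)

print(build_preference([1, 2], "_local"))
print(search_url("library", "_local"))
```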
Search shards API
When discussing the search preference, we would also like to mention the search shards API exposed by Elasticsearch. This API allows us to check which shards the query will be executed on. In order to use this API, run a request against the _search_shards REST endpoint. For example, to see how the query will be executed, we run the following command:
curl -XGET 'localhost:9200/library/_search_shards?pretty' -d '{"query":{"match_all":{}}}'
The response to the preceding command will be as follows:
{
"nodes":{
"my0DcA_MTImm4NE3cG3ZIg":{
"name":"Cloud9",
"transport_address":"127.0.0.1:9300",
"attributes":{}
}
},
"shards":[[{
"state":"STARTED",
"primary":true,
"node":"my0DcA_MTImm4NE3cG3ZIg",
"relocating_node":null,
"shard":0,
"index":"library",
"version":4,
"allocation_id":{
"id":"9ayLDbL1RVSyJRYIJkuAxg"
}
}],[{
"state":"STARTED",
"primary":true,
"node":"my0DcA_MTImm4NE3cG3ZIg",
"relocating_node":null,
"shard":1,
"index":"library",
"version":4,
"allocation_id":{
"id":"wfpvtaLER-KVyOsuD46Yqg"
}
}],[{
"state":"STARTED",
"primary":true,
"node":"my0DcA_MTImm4NE3cG3ZIg",
"relocating_node":null,
"shard":2,
"index":"library",
"version":4,
"allocation_id":{
"id":"zrLPWhCOSTmjlb8TY5rYQA"
}
}],[{
"state":"STARTED",
"primary":true,
"node":"my0DcA_MTImm4NE3cG3ZIg",
"relocating_node":null,
"shard":3,
"index":"library",
"version":4,
"allocation_id":{
"id":"efnvY7YcSz6X8X8USacA7g"
}
}],[{
"state":"STARTED",
"primary":true,
"node":"my0DcA_MTImm4NE3cG3ZIg",
"relocating_node":null,
"shard":4,
"index":"library",
"version":4,
"allocation_id":{
"id":"XJHW2J63QUKdh3bK3T2nzA"
}
}]]
}
As you can see, in the response returned by Elasticsearch, we have the information about the shards that will be used during the query process. Of course, with the search shards API, we can use additional parameters that control the querying process. These parameters are routing, preference, and local. We are already familiar with the first two. The local parameter is a Boolean (true or false) that allows us to tell Elasticsearch to use the cluster state information stored on the local node (setting local to true) instead of the one from the master node (setting local to false). This allows us to diagnose problems with cluster state synchronization.
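A response of this shape is easier to read once it is reduced to a shard-to-node map. The Python sketch below does that for an already parsed response; the shard_allocation function is a hypothetical helper, and the sample dictionary is a trimmed copy of the response shown above, not the output of a live cluster.

```python
def shard_allocation(response):
    """Map shard number -> list of (node name, 'primary' or 'replica')."""
    allocation = {}
    for copies in response["shards"]:   # one inner list per shard
        for copy in copies:             # each listed copy of that shard
            node_name = response["nodes"][copy["node"]]["name"]
            role = "primary" if copy["primary"] else "replica"
            allocation.setdefault(copy["shard"], []).append((node_name, role))
    return allocation

# Trimmed sample with only the keys the helper reads.
sample = {
    "nodes": {"my0DcA_MTImm4NE3cG3ZIg": {"name": "Cloud9"}},
    "shards": [
        [{"primary": True, "node": "my0DcA_MTImm4NE3cG3ZIg", "shard": 0}],
        [{"primary": True, "node": "my0DcA_MTImm4NE3cG3ZIg", "shard": 1}],
    ],
}

allocation = shard_allocation(sample)
for shard_no in sorted(allocation):
    print(shard_no, allocation[shard_no])
```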
Basic queries
Elasticsearch has extensive search and data analysis capabilities that are exposed in the form of different queries, filters, aggregations, and so on. In this section, we will concentrate on the basic queries provided by Elasticsearch. By basic queries we mean the ones that don't combine other queries together but run on their own.
The term query
The term query is one of the simplest queries in Elasticsearch. It just matches the documents that have a term in a given field: the exact, not analyzed term. The simplest term query is as follows:
{
"query":{
"term":{
"title":"crime"
}
}
}
It will match the documents that have the term crime in the title field. Remember that the term query is not analyzed, so you need to provide the exact term that will match the term in the indexed document. Note that in our input data, we have the title field with the value of Crime and Punishment (uppercased), but we are searching for crime, because the Crime term becomes crime after analysis during indexing.
In addition to the term we want to find, we can also include the boost attribute in our term query, which will affect the importance of the given term. We will talk more about boosts in the Introduction to Apache Lucene scoring section of Chapter 6, Make Your Search Better. For now, we just need to remember that it changes the importance of the given part of the query.
For example, to change our previous query and give our term query a boost of 10.0, send the following query:
{
"query":{
"term":{
"title":{
"value":"crime",
"boost":10.0
}
}
}
}
As you can see, the query changed a bit. Instead of a simple term value, we nested a new JSON object which contains the value property and the boost property. The value property should contain the term we are interested in, and the boost property is the boost value we want to use.
The terms query
The terms query is an extension to the term query. It allows us to match documents that have certain terms in their contents instead of a single term. The term query allowed us to match a single, not analyzed term, and the terms query allows us to match multiple of those. For example, let's say that we want to get all the documents that have the terms novel or book in the tags field. To achieve this, we will run the following query:
{
"query":{
"terms":{
"tags":["novel","book"]
}
}
}
The preceding query returns all the documents that have one or both of the searched terms in the tags field. This is a key point to remember: the terms query will find documents having any of the provided terms.
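The behavior of the term and terms queries is easier to see on a toy inverted index. The following Python sketch is an illustration only: the analyze function is a rough stand-in for an analyzer (split on non-word characters and lowercase), and the two documents are made-up sample data.

```python
import re

def analyze(text):
    """Rough stand-in for index-time analysis: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def build_index(docs, field):
    """Build term -> set of document identifiers for one field."""
    index = {}
    for doc_id, doc in docs.items():
        for term in analyze(doc[field]):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, term):
    """No analysis at query time: the term is looked up exactly as given."""
    return sorted(index.get(term, set()))

def terms_query(index, terms):
    """A document matches when it contains any of the given terms."""
    matched = set()
    for term in terms:
        matched |= index.get(term, set())
    return sorted(matched)

docs = {
    "1": {"title": "Crime and Punishment"},
    "2": {"title": "The Complete Sherlock Holmes"},
}
index = build_index(docs, "title")
print(term_query(index, "crime"))               # lowercased term is in the index
print(term_query(index, "Crime"))               # uppercased term is not
print(terms_query(index, ["crime", "holmes"]))  # any of the terms is enough
```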
The match all query
The match all query is one of the simplest queries available in Elasticsearch. It allows us to match all of the documents in the index. If we want to get all the documents from our index, we just run the following query:
{
"query":{
"match_all":{}
}
}
We can also include boost in the query, which will be given to all the documents matched by it. For example, if we want to add a boost of 2.0 to all the documents in our match all query, we will send the following query to Elasticsearch:
{
"query":{
"match_all":{
"boost":2.0
}
}
}
The type query
A very simple query that allows us to find all the documents with a certain type. For example, if we would like to search for all the documents with the book type in our library index, we will run the following query:
{
"query":{
"type":{
"value":"book"
}
}
}
The exists query
A query that allows us to find all the documents that have a value in the defined field. For example, to find the documents that have a value in the tags field, we will run the following query:
{
"query":{
"exists":{
"field":"tags"
}
}
}
The missing query
Opposite to the exists query, the missing query returns the documents that have a null value or no value at all in a given field. For example, to find all the documents that don't have a value in the tags field, we will run the following query:
{
"query":{
"missing":{
"field":"tags"
}
}
}
The common terms query
The common terms query is a modern Elasticsearch solution for improving query relevance and precision with common words when we are not using stop words (http://en.wikipedia.org/wiki/Stop_words). For example, a crime and punishment query results in three term queries and each of them has a cost in terms of performance. However, the and term is a very common one and its impact on the document score will be very low. The solution is the common terms query, which divides the query into two groups. The first group is the one with important terms, which are the ones that have lower frequency. The second group is the one with less important terms, which are the ones with high frequency. The first query is executed first and Elasticsearch calculates the score for all of the terms from the first group. This way the low frequency terms, which are usually the ones that have more importance, are always taken into consideration. Then Elasticsearch executes the second query for the second group of terms, but calculates the score only for the documents matched by the first query. This way the score is only calculated for the relevant documents and thus higher performance can be achieved.
An example of the common terms query is as follows:
{
  "query": {
    "common": {
      "title": {
        "query": "crime and punishment",
        "cutoff_frequency": 0.001
      }
    }
  }
}
The query can take the following parameters:
query: The actual query contents.
cutoff_frequency: The percentage (0.001 means 0.1%) or an absolute value (when the property is set to a value equal to or larger than 1). High and low frequency groups are constructed using this value. Setting this parameter to 0.001 means that the low frequency terms group will be constructed for terms having a frequency of 0.1% and lower.
low_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the low frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.
high_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the high frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.
minimum_should_match: Instead of using low_freq_operator and high_freq_operator, we can use minimum_should_match. Just like with the other queries, it allows us to specify the minimum number of terms that should be found in a document for it to be considered a match. We can also specify high_freq and low_freq inside the minimum_should_match object, which allows us to define the different number of terms that need to be matched for the high and low frequency terms.
boost: The boost given to the score of the documents.
analyzer: The name of the analyzer that will be used to analyze the query text, which defaults to the default analyzer.
disable_coord: Defaults to false and allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. Set it to true for less precise scoring, but slightly faster queries.
Note
Unlike the term and terms queries, the common terms query is analyzed by Elasticsearch.
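The way cutoff_frequency divides the query terms into two groups can be sketched in Python. The document frequencies and the total document count below are invented numbers, and split_by_frequency is a hypothetical helper that only mirrors the rule described above (a value below 1 is treated as a fraction of all documents, a value of 1 or more as an absolute count).

```python
def split_by_frequency(terms, doc_freq, total_docs, cutoff_frequency):
    """Return (low_frequency_terms, high_frequency_terms)."""
    if cutoff_frequency >= 1:
        threshold = cutoff_frequency                # absolute document count
    else:
        threshold = cutoff_frequency * total_docs   # fraction of all documents
    low = [t for t in terms if doc_freq.get(t, 0) <= threshold]
    high = [t for t in terms if doc_freq.get(t, 0) > threshold]
    return low, high

# Invented statistics for an index of 100,000 documents.
doc_freq = {"crime": 40, "and": 90000, "punishment": 25}
low, high = split_by_frequency(
    ["crime", "and", "punishment"], doc_freq, total_docs=100000,
    cutoff_frequency=0.001,
)
print(low)   # scored first
print(high)  # scored only for documents matched by the first group
```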
The match query
The match query takes the value given in the query parameter, analyzes it, and constructs the appropriate query out of it. When using a match query, Elasticsearch will choose the proper analyzer for the field we choose, so you can be sure that the terms passed to the match query will be processed by the same analyzer that was used during indexing. Remember that the match query (and the multi_match query) doesn't support Lucene query syntax; however, it perfectly fits as a query handler for your search box. The simplest match (and the default) query will look like the following:
{
  "query": {
    "match": {
      "title": "crime and punishment"
    }
  }
}
The preceding query will match all the documents that have the terms crime, and, or punishment in the title field. However, the previous query is only the simplest one; there are multiple types of match query which we will discuss now.
The Boolean match query
The Boolean match query is a query which analyzes the provided text and makes a Boolean query out of it. This is also the default type for the match query. There are a few parameters which allow us to control the behavior of the Boolean match queries:
operator: This parameter can take the value of or or and, and controls which Boolean operator is used to connect the created Boolean clauses. The default value is or. If we want all the terms in our query to be matched, we should use the and Boolean operator.
analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.
fuzziness: Providing the value of this parameter allows us to construct fuzzy queries. The value of this parameter can vary. For numeric fields, it should be set to a numeric value; for date-based fields, it can be set to a millisecond or time value, such as 2h; and for text fields, it can be set to 0, 1, or 2 (the edit distance in the Levenshtein algorithm, https://en.wikipedia.org/wiki/Levenshtein_distance) or AUTO (which allows Elasticsearch to control how fuzzy queries are constructed and which is the preferred value). Finally, for text fields, it can also be set to values from 0.0 to 1.0, which results in the edit distance being calculated as the term length minus 1.0 multiplied by the provided fuzziness value. In general, the higher the fuzziness, the more difference between terms will be allowed.
prefix_length: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to The fuzzy query section in this chapter.
max_expansions: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to The fuzzy query section in this chapter.
zero_terms_query: This allows us to specify the behavior of the query when all the terms are removed by the analyzer (for example, because of stop words). It can be set to none or all, with none as the default. When set to none, no documents will be returned when the analyzer removes all the query terms. If set to all, all the documents will be returned.
cutoff_frequency: It allows dividing the query into two groups: one with high frequency terms and one with low frequency terms. Refer to the description of the common terms query to see how this parameter can be used.
lenient: When set to true (by default it is false), it allows us to ignore the exceptions caused by data incompatibility, such as trying to query numeric fields using a string value.
The parameters should be wrapped in the name of the field we are running the query against. So if we want to run a sample Boolean match query against the title field, we send a query as follows:
{
  "query": {
    "match": {
      "title": {
        "query": "crime and punishment",
        "operator": "and"
      }
    }
  }
}
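The effect of the operator parameter can be shown with a small Python sketch. It is a simplification under stated assumptions: the analyze function only splits on non-word characters and lowercases, and match_terms is a hypothetical helper that checks term presence without any scoring.

```python
import re

def analyze(text):
    """Rough stand-in for analysis: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def match_terms(field_value, query_text, operator="or"):
    """Check whether the analyzed query matches the analyzed field value."""
    doc_terms = set(analyze(field_value))
    present = [term in doc_terms for term in analyze(query_text)]
    if operator == "and":
        return all(present)   # every query term must be found
    return any(present)       # one query term is enough

title = "Crime and Punishment"
print(match_terms(title, "crime and fiction", operator="or"))
print(match_terms(title, "crime and fiction", operator="and"))
print(match_terms(title, "Crime AND punishment", operator="and"))
```

Because the query text goes through the same analysis as the field value, the uppercase characters in the last call do not prevent a match.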
The phrase match query
A phrase match query is similar to the Boolean query, but, instead of constructing the Boolean clauses from the analyzed text, it constructs the phrase query. You may wonder what a phrase is when it comes to Lucene and Elasticsearch: it is two or more terms positioned one after another in an order. The following parameters are available:
slop: An integer value that defines how many unknown words can be put between the terms in the text query for a match to be considered a phrase. The default value of this parameter is 0, which means that no additional words are allowed.
analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.
A sample phrase match query against the title field looks like the following code:
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "crime punishment",
        "slop": 1
      }
    }
  }
}
Note that we removed the and term from our query, but because the slop is set to 1, it will still match our document because we allowed one term to be present between our terms.
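The role of slop can be illustrated with a simplified phrase matcher in Python. This sketch covers only the case described above, where the query terms appear in order and slop counts the extra words allowed between them; it is not how Lucene implements phrase matching, and phrase_match is a hypothetical name.

```python
def phrase_match(doc_tokens, query_tokens, slop=0):
    """In-order phrase match; slop is the total number of extra tokens allowed."""
    positions = {}
    for pos, token in enumerate(doc_tokens):
        positions.setdefault(token, []).append(pos)

    def search(idx, prev_pos, used):
        if idx == len(query_tokens):
            return True
        for pos in positions.get(query_tokens[idx], []):
            if prev_pos is None:
                # The first query term may start anywhere in the document.
                if search(idx + 1, pos, used):
                    return True
            elif pos > prev_pos:
                gap = pos - prev_pos - 1   # tokens skipped between the two terms
                if used + gap <= slop and search(idx + 1, pos, used + gap):
                    return True
        return False

    return search(0, None, 0)

title = ["crime", "and", "punishment"]
print(phrase_match(title, ["crime", "punishment"], slop=0))
print(phrase_match(title, ["crime", "punishment"], slop=1))
```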
The match phrase prefix query
The last type of the match query is the match phrase prefix query. This query is almost the same as the phrase match query, but in addition, it allows prefix matches on the last term in the query text. Also, in addition to the parameters exposed by the match phrase query, it exposes an additional one: the max_expansions parameter, which controls how many prefixes the last term will be rewritten to. Our example query changed to the match_phrase_prefix query will look as follows:
{
  "query": {
    "match_phrase_prefix": {
      "title": {
        "query": "crime punishm",
        "slop": 1,
        "max_expansions": 20
      }
    }
  }
}
Note that we didn't provide the full crime and punishment phrase, but only crime punishm, and still the query would match our document. This is because we used the match_phrase_prefix query combined with slop set to 1.
The multi match query
It is the same as the match query, but instead of running against a single field, it can be run against multiple fields with the use of the fields parameter. Of course, all the parameters you use with the match query can be used with the multi match query. So if we would like to modify our match query to be run against the title and otitle fields, we will run the following query:
{
  "query": {
    "multi_match": {
      "query": "crime punishment",
      "fields": ["title^10", "otitle"]
    }
  }
}
As shown in the preceding example, the nice thing about the multi match query is that the fields defined in it support boosting, so we can increase or decrease the importance of matches on certain fields.
However, this is not the only difference when it comes to comparison with the match query. We can also control how the query is run internally by using the type property and setting it to one of the following values:
best_fields: This is the default behavior, which finds documents having matches in any field from the defined ones, but sets the document score to the score of the best matching field. It is the most useful type when searching for multiple words and wanting to boost documents that have those words in the same field.
most_fields: This value finds documents that match any field and sets the score of the document to the combined score from all the matched fields.
cross_fields: This value treats the query as if all the terms were in one big field, thus returning documents matching any field.
phrase: This value uses the match_phrase query on each field and sets the score of the document to the score combined from all the fields.
phrase_prefix: This value uses the match_phrase_prefix query on each field and sets the score of the document to the score combined from all the fields.
In addition to the parameters mentioned in the match query and type, the multi match query exposes some additional ones allowing more control over its behavior:
tie_breaker: This allows us to specify the balance between the minimum and the maximum scoring query items, and the value can be from 0.0 to 1.0. When used, the score of the document is equal to the score of the best scoring element plus the tie_breaker multiplied by the score of all the other matching fields in the document. So, when set to 0.0, Elasticsearch will only use the score of the highest scoring matching element. You can read more about it in The dis_max query section in this chapter.
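The tie_breaker formula described above is short enough to write down directly. The following Python sketch applies it to made-up per-field scores; combined_score is a hypothetical helper, not part of any Elasticsearch client.

```python
def combined_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the sum of the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others

# Made-up scores for a document matching in the title and otitle fields.
scores = [2.0, 1.0]
print(combined_score(scores, tie_breaker=0.0))  # only the best field counts
print(combined_score(scores, tie_breaker=0.5))
print(combined_score(scores, tie_breaker=1.0))  # all fields count fully
```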
The query string query
In comparison to the other queries available, the query string query supports full Apache Lucene query syntax, which we discussed earlier in the Lucene query syntax section of Chapter 1, Getting Started with Elasticsearch Cluster. It uses a query parser to construct an actual query using the provided text. An example query string query will look like the following code:
{
  "query": {
    "query_string": {
      "query": "title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)",
      "default_field": "title"
    }
  }
}
Because we are familiar with the basics of the Lucene query syntax, we can discuss how the preceding query works. As you can see, we wanted to get the documents that may have the term crime in the title field, and such documents should be boosted with the value of 10. Next, we wanted only the documents that have the term punishment in the title field, and we didn't want documents with the term cat in the otitle field. Finally, we told Lucene that we only wanted the documents that had the fyodor and dostoevsky terms in the author field.
Similar to most of the queries in Elasticsearch, the query string query provides quite a few parameters that allow us to control the query behavior, and the list of parameters for this query is rather extensive:
query: This specifies the query text.
default_field: This specifies the default field the query will be executed against. It defaults to the index.query.default_field property, which is by default set to _all.
default_operator: This specifies the default logical operator (or or and) used when no operator is specified. The default value of this parameter is or.
analyzer: This specifies the name of the analyzer used to analyze the query provided in the query parameter.
allow_leading_wildcard: This specifies if a wildcard character is allowed as the first character of a term. It defaults to true.
lowercase_expanded_terms: This specifies if the terms that are a result of query rewrite should be lowercased. It defaults to true, which means that the rewritten terms will be lowercased.
enable_position_increments: This specifies if position increments should be turned on in the result query. It defaults to true.
fuzzy_max_expansions: This specifies the maximum number of terms into which a fuzzy query will be expanded, if a fuzzy query is used. It defaults to 50.
fuzzy_prefix_length: This specifies the prefix length for the generated fuzzy queries and defaults to 0. To learn more about it, look at the fuzzy query description.
phrase_slop: This specifies the phrase slop and defaults to 0. To learn more about it, look at the phrase match query description.
boost: This specifies the boost value which will be used and defaults to 1.0.
analyze_wildcard: This specifies if the terms generated by the wildcard query should be analyzed. It defaults to false, which means that those terms won't be analyzed.
auto_generate_phrase_queries: This specifies if the phrase queries will be automatically generated from the query. It defaults to false, which means that the phrase queries won't be automatically generated.
minimum_should_match: This controls how many of the generated Boolean should clauses should be matched against a document for the document to be considered a hit. The value can be provided as a percentage; for example, 50%, which would mean that at least 50 percent of the given terms should match. It can also be provided as an integer value, such as 2, which means that at least 2 terms must match.
fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.
max_determinized_states: This defaults to 10000 and sets the number of states that the automaton can have for handling regular expression queries. It is used to disallow very expensive queries using regular expressions.
locale: This sets the locale that should be used for the conversion of string values. By default, it is set to ROOT.
time_zone: This sets the time zone that should be used by range queries that are run on date-based fields.
lenient: This can take the value of true or false. If set to true, format-based failures will be ignored. By default, it is set to false.
Note that Elasticsearch can rewrite the query string query and, because of that, Elasticsearch allows us to pass additional parameters that control the rewrite method. For more details about this process, go to the Understanding the querying process section in this chapter.
Running the query string query against multiple fields
It is possible to run the query string query against multiple fields. In order to do that, one needs to provide the fields parameter in the query body, which should hold the array of the field names. There are two methods of running the query string query against multiple fields: the default method uses the Boolean query to make queries and the other method can use the dis_max query.
In order to use the dis_max query, one should add the use_dis_max property in the query body and set it to true. An example query will look like the following code:
{
  "query": {
    "query_string": {
      "query": "crime punishment",
      "fields": ["title", "otitle"],
      "use_dis_max": true
    }
  }
}
The simple query string query
The simple query string query uses one of the newest query parsers in Lucene, the SimpleQueryParser (https://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html). Similar to the query string query, it accepts Lucene query syntax as the query; however, unlike it, it never throws an exception when a parsing error happens. Instead of throwing an exception, it discards the invalid parts of the query and runs the rest.
An example simple query string query will look like the following code:
{
  "query": {
    "simple_query_string": {
      "query": "crime punishment",
      "default_operator": "or"
    }
  }
}
The query supports parameters such as query, fields, default_operator, analyzer, lowercase_expanded_terms, locale, lenient, and minimum_should_match, and can also be run against multiple fields using the fields property.
The identifiers query
This is a simple query that filters the returned documents to only those with the provided identifiers. It works on the internal _uid field, so it doesn't require the _id field to be enabled. The simplest version of such a query will look like the following:
{
"query":{
"ids":{
"values":["1","2","3"]
}
}
}
This query will only return those documents that have one of the identifiers present in the values array. We can complicate the identifiers query a bit and also limit the documents on the basis of their type. For example, if we want to only include documents from the book type, we will send the following query:
{
"query":{
"ids":{
"type":"book",
"values":["1","2","3"]
}
}
}
As you can see, we've added the type property to our query and we've set its value to the type we are interested in.
The prefix query
This query is similar to the term query in its configuration and to the multi term query when looking into its logic. The prefix query allows us to match documents that have a value in a certain field that starts with a given prefix. For example, if we want to find all the documents that have values starting with cri in the title field, we will run the following query:
{
"query":{
"prefix":{
"title":"cri"
}
}
}
Similar to the term query, you can also include the boost attribute in your prefix query, which will affect the importance of the given prefix. For example, if we would like to change our previous query and give our query a boost of 3.0, we will send the following query:
{
"query":{
"prefix":{
"title":{
"value":"cri",
"boost":3.0
}
}
}
}
Note
The prefix query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter that controls the rewrite method. For more details about that process, refer to the Understanding the querying process section in this chapter.
The fuzzy query
The fuzzy query allows us to find documents that have values similar to the ones we've provided in the query. The similarity of terms is calculated on the basis of the edit distance algorithm. The edit distance is calculated between the terms we provide in the query and the terms in the searched documents. This query can be expensive when it comes to CPU resources, but can help us when we need fuzzy matching; for example, when users make spelling mistakes. In our example, let's assume that instead of crime, our user enters the word crme into the search box and we would like to run the simplest form of the fuzzy query. Such a query will look like this:
{
"query":{
"fuzzy":{
"title":"crme"
}
}
}
The response for such a query will be as follows:
{
  "took": 81,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5,
    "hits": [{
      "_index": "library",
      "_type": "book",
      "_id": "4",
      "_score": 0.5,
      "_source": {
        "title": "Crime and Punishment",
        "otitle": "Преступлéние и наказáние",
        "author": "Fyodor Dostoevsky",
        "year": 1886,
        "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
        "tags": [],
        "copies": 0,
        "available": true
      }
    }]
  }
}
Even though we made a typo, Elasticsearch managed to find the document we were interested in.
We can control the fuzzy query behavior by using the following parameters:
value: This specifies the actual query.
boost: This specifies the boost value for the query. It defaults to 1.0.
fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.
prefix_length: This is the length of the common prefix of the differencing terms. It defaults to 0.
max_expansions: This specifies the maximum number of terms the query will be expanded to. The default value is unbounded.
The parameters should be wrapped in the name of the field we are running the query against. So if we would like to modify the previous query and add additional parameters, the query will look like the following code:
{
"query":{
"fuzzy":{
"title":{
"value":"crme",
"fuzziness":2
}
}
}
}
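To make the notion of edit distance concrete, the following Python sketch computes the plain Levenshtein distance and uses it to pick candidate terms within a given fuzziness. It is an illustration under assumptions: the term list is made up, fuzzy_terms is a hypothetical helper, and the way Elasticsearch and Lucene actually expand a fuzzy query is more involved than this loop.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = cur
    return prev[-1]

def fuzzy_terms(index_terms, value, fuzziness, prefix_length=0):
    """Terms sharing the required prefix and lying within the edit distance."""
    prefix = value[:prefix_length]
    return [
        term for term in index_terms
        if term.startswith(prefix) and edit_distance(value, term) <= fuzziness
    ]

terms = ["crime", "cream", "punishment"]
print(edit_distance("crme", "crime"))
print(fuzzy_terms(terms, "crme", fuzziness=1))
```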
The wildcard query
A query that allows us to use the * and ? wildcards in the values we search. Apart from that, the wildcard query is very similar to the term query in terms of its body. To send a query that would match all the documents with the value of the cr?me term (? matching any character), we would send the following query:
{
"query":{
"wildcard":{
"title":"cr?me"
}
}
}
It will match the documents that have terms matching cr?me in the title field. However, you can also include the boost attribute in your wildcard query, which will affect the importance of each term that matches the given value. For example, if we would like to change our previous query and give our wildcard query a boost of 20.0, we will send the following query:
{
"query":{
"wildcard":{
"title":{
"value":"cr?me",
"boost":20.0
}
}
}
}
Note
Wildcard queries are not very performance-oriented and should be avoided if possible; especially avoid leading wildcards (terms starting with wildcards). The wildcard query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter that controls the rewrite method. For more details about this process, refer to the Understanding the querying process section in this chapter. Also remember that the wildcard query is not analyzed.
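The meaning of the two wildcard characters can be shown by translating a pattern into a Python regular expression. This is a sketch of the matching rule only (? stands for exactly one character, * for any run of characters, and the whole term must match); wildcard_match is a hypothetical helper, and it says nothing about how Elasticsearch rewrites the query internally.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * and ? into regex syntax and escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def wildcard_match(pattern, term):
    """The pattern has to cover the whole term, not just a part of it."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

print(wildcard_match("cr?me", "crime"))
print(wildcard_match("cr?me", "crme"))   # ? needs exactly one character
print(wildcard_match("cr?me", "Crime"))  # no analysis, so case matters
```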
The range query
A query that allows us to find documents that have a field value within a certain range and which works for numerical fields as well as for string-based fields and date-based fields (it just maps to a different Apache Lucene query). The range query should be run against a single field and the query parameters should be wrapped in the field name. The following parameters are supported:
gte: The query will match documents with the value greater than or equal to the one provided with this parameter
gt: The query will match documents with the value greater than the one provided with this parameter
lte: The query will match documents with the value lower than or equal to the one provided with this parameter
lt: The query will match documents with the value lower than the one provided with this parameter
So for example, if we want to find all the books that have the value from 1700 to 1900 in the year field, we will run the following query:
{
"query":{
"range":{
"year":{
"gte":1700,
"lte":1900
}
}
}
}
Regular expression query
The regular expression query allows us to use regular expressions as the query text. Remember that the performance of such queries depends on the chosen regular expression. If our regular expression matches many terms, the query will be slow. The general rule is that the more terms matched by the regular expression, the slower the query will be.
An example regular expression query looks like this:
{
"query":{
"regexp":{
"title":{
"value":"cr.m[ae]",
"boost":10.0
}
}
}
}
The preceding query will result in Elasticsearch rewriting the query. The rewritten query will have a number of term queries that depends on how many terms in our index match the given regular expression. The boost parameter seen in the query specifies the boost value for the generated queries.
The full regular expression syntax accepted by Elasticsearch can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax.
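The rewrite into term queries can be pictured as a scan over the terms of the field that keeps the ones matching the expression. The Python sketch below uses the re module for that; note that Python's regular expression syntax is not the same as the syntax accepted by Elasticsearch, the term list is made up, and expand_regexp is a hypothetical helper. The whole term has to match, which is why fullmatch is used.

```python
import re

def expand_regexp(index_terms, expression):
    """Return the indexed terms that the expression matches in full."""
    compiled = re.compile(expression)
    return [term for term in index_terms if compiled.fullmatch(term)]

# Made-up terms of a title field.
terms = ["crime", "crimes", "cream", "crema", "punishment"]
matching = expand_regexp(terms, "cr.m[ae]")
print(matching)  # each of these would become one term query
```

The more terms such a scan returns, the more term queries the rewritten query contains, which is the cost the section warns about.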
The more like this query
One of the queries that got a major rework in Elasticsearch 2.0, the more like this query allows us to retrieve documents that are similar (or not similar) to the provided text or to the documents that were provided.
Elasticsearch supports a few parameters to define how the more like this query should work:
fields: An array of fields that the query should be run against. It defaults to the _all field.
like: This parameter comes in two flavors: it allows us to provide a text which the returned documents should be similar to, or an array of documents that the returned documents should be similar to.
unlike: This is similar to the like parameter, but it allows us to define text or documents that our returned documents should not be similar to.
min_term_freq: The minimum term frequency (for the terms in the documents) below which terms will be ignored. It defaults to 2.
max_query_terms: The maximum number of terms that will be included in any generated query. It defaults to 25. A higher value may mean higher precision, but lower performance.
stop_words: An array of words that will be ignored when comparing documents and the query. It is empty by default.
min_doc_freq: The minimum number of documents in which the term has to be present in order not to be ignored. It defaults to 5, which means that a term needs to be present in at least five documents.
max_doc_freq: The maximum number of documents in which the term may be present in order not to be ignored. By default, it is unbounded (set to 0).
min_word_len: The minimum length of a single word below which a word will be ignored. It defaults to 0.
max_word_len: The maximum length of a single word above which it will be ignored. It defaults to unbounded (which means setting the value to 0).
boost_terms: The boost value that will be used when boosting each term. It defaults to 0.
boost: The boost value that will be used when boosting the query. It defaults to 1.
include: This specifies if the input documents should be included in the results returned by the query. It defaults to false, which means that the input documents won't be included.
minimum_should_match: This controls the number of terms that need to be matched in the resulting documents. By default, it is set to 30%.
analyzer: The name of the analyzer that will be used to analyze the text we provided.
An example for a more like this query looks like this:
{
  "query": {
    "more_like_this": {
      "fields": ["title", "otitle"],
      "like": "crime and punishment",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
As we said earlier, the like property can also be used to show which documents the results should be similar to. For example, the following is the query that will use the like property to point to a given document (note that the following query won't return documents on our example data):
{
"query":{
"more_like_this":{
"fields":["title","otitle"],
"min_term_freq":1,
"min_doc_freq":1,
"like":[
{
"_index":"library",
"_type":"book",
"_id":"4"
}
]
}
}
}
We can also mix the documents and text together:
{
"query":{
"more_like_this":{
"fields":["title","otitle"],
"min_term_freq":1,
"min_doc_freq":1,
"like":[
{
"_index":"library",
"_type":"book",
"_id":"4"
},
"crime and punishment"
]
}
}
}
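The request bodies above can also be assembled programmatically. The following is a minimal Python sketch, not part of any Elasticsearch client; the helper name more_like_this_body and its arguments are assumptions made only for this illustration. It builds the same kind of body with document references followed by free text in the like array:

```python
import json


def more_like_this_body(fields, like_texts=(), like_docs=(),
                        min_term_freq=1, min_doc_freq=1):
    """Build a more_like_this request body (hypothetical helper).

    like_docs is a sequence of (index, type, id) tuples; like_texts is a
    sequence of plain strings. Both end up in the same "like" array.
    """
    like = [
        {"_index": index, "_type": doc_type, "_id": doc_id}
        for (index, doc_type, doc_id) in like_docs
    ]
    like.extend(like_texts)  # plain strings follow the document references
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "min_term_freq": min_term_freq,
                "min_doc_freq": min_doc_freq,
                "like": like,
            }
        }
    }


body = more_like_this_body(
    ["title", "otitle"],
    like_texts=["crime and punishment"],
    like_docs=[("library", "book", "4")],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request in the same way as the handwritten examples.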
Compound queries
In the Basic queries section of this chapter, we discussed the simplest queries exposed by Elasticsearch. We also talked about the position aware queries called span queries in the Span queries section. However, the simple ones and the span queries are not the only queries that Elasticsearch provides. The compound queries, as we call them, allow us to connect multiple queries together or alter the behavior of other queries. You may wonder if you need such functionality. Your deployment may not need it, but anything apart from a simple query will probably require compound queries. For example, combining a simple term query with a match_phrase query to get better search results may be a good candidate for compound queries usage.
The bool query
The bool query allows us to wrap a virtually unbounded number of queries and connect them with a logical value using one of the following sections:
should: The query wrapped into this section may or may not match. The number of should sections that have to match is controlled by the minimum_should_match parameter.
must: The query wrapped into this section must match in order for the document to be returned.
must_not: The query wrapped into this section must not match in order for the document to be returned.
Each of the preceding sections can be present multiple times in a single bool query. This allows us to build very complex queries that have multiple levels of nesting (you can include the bool query in another bool query). Remember that the score of the resulting document will be calculated by taking the sum of the scores of all the wrapped queries that the document matched.
In addition to the preceding sections, we can add the following parameters to the query body to control its behavior:
filter: This allows us to specify the part of the query that should be used as a filter. You can read more about filters in the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.
boost: This specifies the boost used in the query, defaulting to 1.0. The higher the boost, the higher the score of the matching document.
minimum_should_match: This describes the minimum number of should clauses that have to match in order for the checked document to be counted as a match. For example, it can be an integer value such as 2 or a percentage value such as 75%. For more information, refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html.
disable_coord: A Boolean parameter (defaults to false), which allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. We should set it to true for less precise scoring, but slightly faster queries.
Imagine that we want to find all the documents that have the term crime in the title field. In addition, the documents may or may not have a value in the range of 1900 to 2000 in the year field and must not have the nothing term in the otitle field. Such a query made with the bool query will look as follows:
{
"query":{
"bool":{
"must":{
"term":{
"title":"crime"
}
},
"should":{
"range":{
"year":{
"from":1900,
"to":2000
}
}
},
"must_not":{
"term":{
"otitle":"nothing"
}
}
}
}
}
Note
Note that the must, should, and must_not sections can contain a single query or an array of queries.
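The way the three sections combine can be modeled in a few lines of Python. This is a simplified sketch, not Lucene's implementation: it ignores the coordination factor, query normalization, and percentage values of minimum_should_match, and the function name bool_score and its input format are assumptions made for the illustration.

```python
def bool_score(clauses, minimum_should_match=None):
    """Return the summed score of a document, or None if it does not match.

    clauses maps "must", "should" and "must_not" to lists; each list entry
    is the score a wrapped query gave the document, or None if that query
    did not match it.
    """
    must = clauses.get("must", [])
    should = clauses.get("should", [])
    must_not = clauses.get("must_not", [])

    if any(score is not None for score in must_not):
        return None  # a must_not clause matched: the document is rejected
    if any(score is None for score in must):
        return None  # a must clause failed: the document is rejected

    matched_should = [score for score in should if score is not None]
    if minimum_should_match is None:
        # Assumption of this sketch: without must clauses, at least one
        # should clause has to match.
        minimum_should_match = 1 if (should and not must) else 0
    if len(matched_should) < minimum_should_match:
        return None

    # The final score is the sum of the scores of all matched clauses.
    return sum(must) + sum(matched_should)


# title:crime matched (1.5), the year range matched (0.5), otitle:nothing did not.
print(bool_score({"must": [1.5], "should": [0.5], "must_not": [None]}))
```

A document that matches the must_not clause is rejected regardless of how well the other sections score.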
The dis_max query
The dis_max query is very useful as it generates a union of documents returned by all the subqueries and returns it as the result. The good thing about this query is the fact that we can control how the lower scoring subqueries affect the final score of the documents. For the dis_max query, we specify the queries using the queries property (a query or an array of queries) and the tie breaker with the tie_breaker property. We can also include additional boost by specifying the boost parameter.
The final document score is calculated as the score of the maximum scoring query plus the sum of the scores returned from the rest of the queries multiplied by the value of the tie_breaker parameter. So, the tie_breaker parameter allows us to control how the lower scoring queries affect the final score. If we set the tie_breaker parameter to 1.0, we get the exact sum, while setting the tie_breaker parameter to 0.1 results in only 10 percent of the scores (of all the scores apart from the maximum scoring query) being added to the final score.
An example of the dis_max query is as follows:
{
"query":{
"dis_max":{
"tie_breaker":0.99,
"boost":10.0,
"queries":[
{
"match":{
"title":"crime"
}
},
{
"match":{
"author":"fyodor"
}
}
]
}
}
}
As you can see, we included the tie_breaker and boost parameters. In addition to that, we specified the queries parameter that holds the array of queries that will be run and used to generate the union of documents for results.
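The scoring rule described for the dis_max query can be written down as a short Python function. This is only a sketch of the arithmetic under the assumption that the subquery scores are already known; the function name dis_max_score is hypothetical, and the boost parameter and any normalization applied by Lucene are left out.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine subquery scores the way the dis_max description states.

    sub_scores holds one entry per subquery: a score, or None when the
    subquery did not match the document.
    """
    matched = sorted((s for s in sub_scores if s is not None), reverse=True)
    if not matched:
        return None  # the document is not in the union at all
    best, others = matched[0], matched[1:]
    # Best score, plus the remaining scores scaled by tie_breaker.
    return best + tie_breaker * sum(others)


print(dis_max_score([2.0, 1.0], tie_breaker=0.0))  # only the best subquery counts
print(dis_max_score([2.0, 1.0], tie_breaker=1.0))  # plain sum of both scores
print(dis_max_score([2.0, 1.0], tie_breaker=0.5))  # half of the lower score is added
```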
The boosting query
The boosting query wraps around two queries and lowers the score of the documents returned by one of the queries. There are three sections of the boosting query that need to be defined: the positive section that holds the query whose document score will be left unchanged, the negative section whose resulting documents will have their score lowered, and the negative_boost section that holds the boost value that will be used to lower the second section's query score. The advantage of the boosting query is that documents matched by the positive query stay in the results even when they also match the negative query; only their scores are lowered. For comparison, if we were to use the bool query with the must_not section, such documents would be removed from the results.
Let's assume that we want to have the results of a simple term query for the term crime in the title field and want the score of such documents to not be changed. However, we also want the documents that have a value from 1800 to 1900 in the year field to have their scores lowered by multiplying them by a negative boost of 0.5. Such a query will look like the following:
{
"query":{
"boosting":{
"positive":{
"term":{
"title":"crime"
}
},
"negative":{
"range":{
"year":{
"from":1800,
"to":1900
}
}
},
"negative_boost":0.5
}
}
}
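The effect of negative_boost on a single document can be sketched as follows. This is an illustrative model only; the function name boosting_score is an assumption, and the positive score is taken as given rather than computed.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Model of the boosting query for one document.

    positive_score is the score from the positive section (None if the
    document does not match it); matches_negative tells whether the
    negative section matched the same document.
    """
    if positive_score is None:
        return None  # only documents matching the positive query are returned
    if matches_negative:
        return positive_score * negative_boost  # demoted, but still returned
    return positive_score


print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))
print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))
```

With a negative_boost of 0.5, a book titled with the term crime and published between 1800 and 1900 keeps half of its original score instead of being dropped.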
The constant_score query
The constant_score query wraps another query and returns a constant score for each document returned by the wrapped query. We specify the score that should be given to the documents by using the boost property, which defaults to 1.0. It allows us to strictly control the score value assigned for a document matched by a query. For example, if we want to have a score of 2.0 for all the documents that have the term crime in the title field, we send the following query to Elasticsearch:
{
"query":{
"constant_score":{
"query":{
"term":{
"title":"crime"
}
},
"boost":2.0
}
}
}
The indices query
The indices query is useful when executing a query against multiple indices. It allows us to provide an array of indices (the indices property) and two queries, one that will be executed if we query the index from the list (the query property) and the second that will be executed on all the other indices (the no_match_query property). For example, assume we have an alias named books, holding two indices: library and users. What we want to do is use this alias. However, we want to run different queries depending on which index is used for searching. An example query following this logic will look as follows:
{
"query":{
"indices":{
"indices":["library"],
"query":{
"term":{
"title":"crime"
}
},
"no_match_query":{
"term":{
"user":"crime"
}
}
}
}
}
In the preceding query, the query described in the query property was run against the library index and the query defined in the no_match_query section was run against all the other indices present in the cluster, which for our hypothetical alias means the users index.
The no_match_query property can also have a string value instead of a query. This string value can either be all or none, and it defaults to all. If the no_match_query property is set to all, all the documents from the indices that are not on the list will be returned. Setting the no_match_query property to none will result in no documents being returned from the indices that are not on the list.
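The selection logic of the indices query can be summarized with a small Python sketch. The function name query_for_index is hypothetical, and representing the all value as a match_all query and the none value as None are simplifications chosen for this illustration.

```python
def query_for_index(index, indices, query, no_match_query="all"):
    """Pick the query that would effectively run against one index."""
    if index in indices:
        return query               # the index is on the list
    if no_match_query == "all":
        return {"match_all": {}}   # every document of the other index matches
    if no_match_query == "none":
        return None                # nothing is returned from the other index
    return no_match_query          # an explicit query for the other indices


library_query = {"term": {"title": "crime"}}
users_query = {"term": {"user": "crime"}}

print(query_for_index("library", ["library"], library_query, users_query))
print(query_for_index("users", ["library"], library_query, users_query))
print(query_for_index("users", ["library"], library_query))
```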
Using span queries
Elasticsearch leverages Lucene span queries, which allow us to make queries when some tokens or phrases are near other tokens or phrases. Basically, we can call them position aware queries. When using the standard non span queries, we are not able to make queries that are position aware; the phrase queries allow that, but only to some extent. So, for the standard queries in Elasticsearch and the underlying Lucene, it doesn't matter if the term is in the beginning of the sentence or at the end or near another term. When using span queries, it does matter.
The following span queries are exposed in Elasticsearch:
span term query
span first query
span near query
span or query
span not query
span within query
span containing query
span multi query
Before we continue with the description, let's index a document to a completely new index that we will use to show how span queries work. To do this, we use the following command:
curl -XPUT 'localhost:9200/spans/book/1' -d '{
"title":"Test book",
"author":"Test author",
"description":"The world breaks everyone, and afterward, some are strong at the broken places"
}'
A span
A span, in our context, is a starting and ending token position in a field. For example, in our case, the world breaks everyone could be a single span, and world can be a single span too. As you may know, during analysis, Lucene, in addition to the token, includes some additional parameters, such as the position in the token stream. Position information combined with the terms allows us to construct spans using Elasticsearch span queries (which are mapped to Lucene span queries). In the next few pages, we will learn how to construct spans using different span queries and how to control which documents are matched.
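To make token positions concrete, the following Python sketch lists the positions of the terms in our example description. It is a rough stand-in for analysis, under the assumption that lowercasing and splitting on non-word characters is close enough to what the analyzer produces for this sentence; the function name token_positions is hypothetical.

```python
import re

DESCRIPTION = ("The world breaks everyone, and afterward, "
               "some are strong at the broken places")


def token_positions(text):
    """Return (position, token) pairs; positions start at 0."""
    return list(enumerate(re.findall(r"\w+", text.lower())))


for position, token in token_positions(DESCRIPTION):
    print(position, token)
```

In this model, the term world sits at position 1 and everyone at position 3, so a span covering the world breaks everyone starts at position 0 and ends after position 3.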
Span term query
The span_term query is a building block for the other span queries. A span_term query is a query similar to the already discussed term query. On its own, it works just like the mentioned term query: it matches a term. Its definition is simple and looks as follows (we omitted some parts of the queries on purpose, because we will discuss them later):
{
"query":{
...
"span_term":{
"description":{
"value":"world",
"boost":5.0
}
}
}
}
As you can see, it is very similar to the standard term query. The above query is run against the description field and we want to have the documents that have the world term returned. We also specified the boost, which is also allowed.
One thing to remember is that the span_term query, similar to the standard term query, is not analyzed.
Span first query
The span first query allows us to match documents that have matches only in the first positions of the field. In order to define a span first query, we need to nest any other span query inside of it; for example, a span term query we already know. So, let's find the document that has the term world in the first two positions in the description field. We do that by sending the following query:
{
"query":{
"span_first":{
"match":{
"span_term":{"description":"world"}
},
"end":2
}
}
}
In the results, we will get the document that we had indexed in the beginning of this section. In the match section of the span first query, we should include at least a single span query that should be matched at the maximum position specified by the end parameter.
So, to understand everything well, if we set the end parameter to 1, we shouldn't get our document with the previous query. So, let's check it by sending the following query:
{
"query":{
"span_first":{
"match":{
"span_term":{"description":"world"}
},
"end":1
}
}
}
The response to the preceding query will be as follows:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[]
}
}
So it is working as expected. This is because the first term in our index will be the term the and not the term world which we searched for.
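The behavior of the end parameter can be reproduced with a small Python model. It is a sketch under stated assumptions: tokens are produced by simple lowercasing and splitting, a single-term span at position p is treated as covering positions p up to (but not including) p + 1, and the names analyze and span_first_matches are hypothetical.

```python
import re

DESCRIPTION = ("The world breaks everyone, and afterward, "
               "some are strong at the broken places")


def analyze(text):
    """Rough stand-in for analysis; the list index is the token position."""
    return re.findall(r"\w+", text.lower())


def span_first_matches(tokens, term, end):
    """True if a one-term span for term ends at or before position end."""
    return any(tok == term and pos + 1 <= end
               for pos, tok in enumerate(tokens))


tokens = analyze(DESCRIPTION)
print(span_first_matches(tokens, "world", end=2))  # world is at position 1
print(span_first_matches(tokens, "world", end=1))  # only the term the fits
```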
Span near query
The span near query allows us to match documents that have other spans near each other, and we can call this query a compound query as it wraps other span queries. For example, if we want to find documents that have the term world near the term everyone, we will run the following query:
{
"query":{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"everyone"}}
],
"slop":0,
"in_order":true
}
}
}
As you can see, we specify our queries in the clauses section of the span near query. It is an array of other span queries. The slop parameter defines the allowed number of terms between the spans. The in_order parameter can be used to limit the matches only to those documents that match our queries in the same order that they were defined in. So, in our case, we will get documents that have world everyone, but not everyone world in the description field.
So let's get back to our query; right now it would return 0 results. If you look at our example document, you will notice that between the terms world and everyone, an additional term is present and we set the slop parameter to 0 (slop was discussed during the phrase query description). If we increase it to 1, we will get our result. To test it, let's send the following query:
{
"query":{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"everyone"}}
],
"slop":1,
"in_order":true
}
}
}
The results returned by Elasticsearch are as follows:
{
"took":6,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.10848885,
"hits":[{
"_index":"spans",
"_type":"book",
"_id":"1",
"_score":0.10848885,
"_source":{
"title":"Test book",
"author":"Test author",
"description":"The world breaks everyone, and afterward, some are strong at the broken places"
}
}]
}
}
As we can see, the altered query successfully returned our indexed document.
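The role of slop for two ordered term spans can be checked with the following Python sketch. It is a simplified model that handles only two single-term clauses with in_order set to true; the tokenization and the function name near_in_order are assumptions made for the illustration.

```python
import re

DESCRIPTION = ("The world breaks everyone, and afterward, "
               "some are strong at the broken places")


def near_in_order(tokens, first, second, slop):
    """True if second follows first with at most slop tokens in between."""
    first_positions = [p for p, t in enumerate(tokens) if t == first]
    second_positions = [p for p, t in enumerate(tokens) if t == second]
    return any(0 <= q - p - 1 <= slop
               for p in first_positions
               for q in second_positions)


tokens = re.findall(r"\w+", DESCRIPTION.lower())
print(near_in_order(tokens, "world", "everyone", slop=0))  # breaks is in between
print(near_in_order(tokens, "world", "everyone", slop=1))
```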
Span or query
The span or query allows us to wrap other span queries and aggregate matches of all those that we've wrapped. Similar to the span_near query, the span_or query uses the array of clauses to specify other span queries. For example, if we want to get the documents that have the term world in the first two positions of the description field, or the ones that have the term world not further than a single position from the term everyone, we will send the following query to Elasticsearch:
{
"query":{
"span_or":{
"clauses":[
{
"span_first":{
"match":{
"span_term":{"description":"world"}
},
"end":2
}
},
{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"everyone"}}
],
"slop":1,
"in_order":true
}
}
]
}
}
}
The preceding query will return our indexed document.
Span not query
The span not query allows us to specify two sections of queries. The first is the include section, which specifies which span queries should be matched, and the second is the exclude section, which specifies the span queries that shouldn't overlap the first ones. To keep it simple, if a query from the exclude section matches the same span (or a part of it) as the query from the include section, such a document won't be returned as a match for such a span not query. Each of these sections can contain multiple span queries.
So, to illustrate that query, let's make a query that will return all the documents that have the span constructed from a single term and which have the term breaks in the description field. Let's also exclude the documents that have a span which matches the terms world and everyone at the maximum of a single position from each other, when such a span overlaps the one defined in the first span query.
{
"query":{
"span_not":{
"include":{
"span_term":{"description":"breaks"}
},
"exclude":{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"everyone"}}
],
"slop":1
}
}
}
}
}
The following is the result:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[]
}
}
As you would have noticed, the result of the query is as we would have expected. Our document wasn't found because the span query from the exclude section was overlapping the span from the include section.
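The overlap check behind the span not query can be sketched in Python with spans written as (start, end) position pairs, where end is exclusive. The positions used here assume that the description is tokenized as the, world, breaks, everyone, and so on, starting at position 0; the function names overlaps and span_not are hypothetical.

```python
def overlaps(a, b):
    """True if two (start, end) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def span_not(include_spans, exclude_spans):
    """Keep the include spans that no exclude span overlaps."""
    return [span for span in include_spans
            if not any(overlaps(span, excluded) for excluded in exclude_spans)]


include_spans = [(2, 3)]  # the single term breaks at position 2
exclude_spans = [(1, 4)]  # world ... everyone, positions 1 to 3

print(span_not(include_spans, exclude_spans))  # the breaks span is filtered out
```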
Span within query
The span_within query allows us to find documents that have a span enclosed in another span. We define two sections in the span_within query: the little and the big. The little section defines a span query that needs to be enclosed by the span query defined using the big section.
For example, if we would like to find a document that has the term world near the term breaks and those terms should be inside a span that is bound by the terms world and afterward not more than 10 terms from each other, the query that does that will look as follows:
{
"query":{
"span_within":{
"little":{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"breaks"}}
],
"slop":0,
"in_order":false
}
},
"big":{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"afterward"}}
],
"slop":10,
"in_order":false
}
}
}
}
}
Span containing query
The span_containing query can be seen as the opposite of the span_within query we just discussed. It allows us to match spans that enclose other spans. Again, we use two sections with the span queries: the little and the big. The little section defines a span query that needs to be enclosed by the span query defined using the big section; this time, the matching spans of the big query are the ones that are returned.
We can use the same example. If we would like to find a document that has the term world near the term breaks, and those terms should be inside a span that is bound by the terms world and afterward not more than 10 terms from each other, the query that does that will look as follows:
{
"query":{
"span_containing":{
"little":{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"breaks"}}
],
"slop":0,
"in_order":false
}
},
"big":{
"span_near":{
"clauses":[
{"span_term":{"description":"world"}},
{"span_term":{"description":"afterward"}}
],
"slop":10,
"in_order":false
}
}
}
}
}
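The difference between span_within and span_containing comes down to which side of the enclosure is returned. The following Python sketch models this with (start, end) position pairs, end exclusive. The positions assume the description is tokenized with the at position 0, world at 1, breaks at 2, and afterward at 5; the function names are hypothetical.

```python
def encloses(big, little):
    """True if the big span fully covers the little span."""
    return big[0] <= little[0] and little[1] <= big[1]


def span_within(little_spans, big_spans):
    """Return the little spans that are enclosed by some big span."""
    return [l for l in little_spans if any(encloses(b, l) for b in big_spans)]


def span_containing(little_spans, big_spans):
    """Return the big spans that enclose some little span."""
    return [b for b in big_spans if any(encloses(b, l) for l in little_spans)]


little_spans = [(1, 3)]  # world breaks
big_spans = [(1, 6)]     # world ... afterward

print(span_within(little_spans, big_spans))      # the little span is returned
print(span_containing(little_spans, big_spans))  # the big span is returned
```

Both queries match the same document here; they differ in which span is reported as the match, which matters when they are nested inside other span queries.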
Span multi query
The last type of span query that Elasticsearch supports is the span_multi query. It allows us to wrap any multi term query that we've discussed (the term query, the range query, the wildcard query, the regexp query, the fuzzy query, or the prefix query) as a span query.
For example, if we want to find documents that have a term starting with the prefix wor in the description field, we can do that by sending the following query (to limit the match to the first two positions, the span_multi query would additionally have to be nested in the match section of a span_first query):
{
"query":{
"span_multi":{
"match":{
"prefix":{
"description":{"value":"wor"}
}
}
}
}
}
There is one thing to remember: the multi term query that we want to use needs to be enclosed in the match section of the span_multi query.
Performance considerations
A few words at the end of discussing span queries. Remember that they are costlier when it comes to processing power, because not only do the terms have to be matched but also positions have to be calculated and checked. This means that Lucene and thus Elasticsearch will need more CPU cycles to calculate all the needed information to find matching documents. You can expect span queries to be slower than the queries that don't take positions into account.
Choosing the right query
By now we've seen what queries are available in Elasticsearch, both the simple ones and the ones that can group other queries as well. Before continuing with more complicated topics, we would like to discuss which of the queries should be used for which use case. Of course, one could dedicate a whole book to showing different query use cases, so we will only show a few of them to help you see what you can expect and which query to use.
The use cases
As you already know which queries can be used to find which data, what we would like to show you are example use cases using the data we indexed in Chapter 2, Indexing Your Data. To do this, we will start with a few guidelines on how to choose the query and then we will show you example use cases and discuss why those queries could be used.
Limiting results to given tags
One of the simplest examples of querying Elasticsearch is the search for exact terms. By exact we mean a character to character comparison of a term that is indexed and written into the Lucene inverted index. To run such a query, we can use the term query provided by Elasticsearch. This is because its content is not analyzed by Elasticsearch. For example, let's assume that we would like to search for all the books with the value novel in the tags field, which as we know from the mappings is not analyzed. To do that, we would run the following command:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"term":{
"tags":"novel"
}
}
}'
Searching for values in a range
One of the simplest queries that can be run is a query matching documents in a given range of values. Usually such queries are a part of a larger query or a filter. For example, a query that would return books with the number of copies from 1 to 3 inclusive would look as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"range":{
"copies":{
"gte":1,
"lte":3
}
}
}
}'
Boosting some of the matched documents
There are many common examples of using the bool query. For example, very simple ones like finding documents having a list of terms. What we would like to show you is how to use the bool query to boost some of the documents. For example, if we want to find all the documents that have one or more copies and boost the ones that were published after 1950, we will run the following query:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"bool":{
"must":[
{
"range":{
"copies":{
"gte":1
}
}
}
],
"should":[
{
"range":{
"year":{
"gt":1950
}
}
}
]
}
}
}'
Ignoring lower scoring partial queries
The dis_max query, as we discussed, allows us to control how influential the lower scoring partial queries are. For example, if we would only want to assign the score of the highest scoring partial query for the documents matching crime punishment in the title field or raskolnikov in the characters field, we would run the following query:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"fields":["_id","_score"],
"query":{
"dis_max":{
"tie_breaker":0.0,
"queries":[
{
"match":{
"title":"crime punishment"
}
},
{
"match":{
"characters":"raskolnikov"
}
}
]
}
}
}'
The result for the preceding query will look as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.70710677,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.70710677
}]
}
}
Now let's see the score of the partial queries alone. To do that, we will run the partial queries using the following commands:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"fields":["_id","_score"],
"query":{
"match":{
"title":"crime punishment"
}
}
}'
The response for the preceding query is as follows:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.70710677,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.70710677
}]
}
}
The following is the next command:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"fields":["_id","_score"],
"query":{
"match":{
"characters":"raskolnikov"
}
}
}'
The response is as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5
}]
}
}
As you can see, the score of the document returned by our dis_max query is equal to the score of the highest scoring partial query (the first partial query). That is because we set the tie_breaker property to 0.0.
Using Lucene query syntax in queries
Having a simple search syntax is very useful for users and we already have one: the Lucene query syntax. Using the query_string query is an example where we can leverage that by allowing the users to type in queries with additional control characters. For example, if we would like to find books having the terms crime and punishment in their title and the fyodor dostoevsky phrase in the author field, and not being published between 2000 (exclusive) and 2015 (inclusive), we would use the following command:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"query_string":{
"query":"+title:crime +title:punishment +author:\"fyodor dostoevsky\" -year:{2000 TO 2015]"
}
}
}'
As you can see, we used the Lucene query syntax to pass all the matching requirements and we let the query parser construct the appropriate query.
Handling user queries without errors
Using the query_string query is very handy, but it is not error tolerant. If our user provides incorrect Lucene syntax, the query will return an error. Because of that, Elasticsearch exposes a second query that supports analysis and a simplified version of the Lucene query syntax: the simple_query_string query. Using such a query allows us to run the user queries and not care about the parsing errors at all. For example, let's look at the following query:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"query_string":{
"query":"+crime +punishment \"",
"default_field":"title"
}
}
}'
The response will contain:
{
"error":{
"root_cause":[{
"type":"query_parsing_exception",
"reason":"Failed to parse query [+crime +punishment \"]",
"index":"library",
"line":6,
"col":3
}],
"type":"search_phase_execution_exception",
"reason":"all shards failed",
"phase":"query",
"grouped":true,
"failed_shards":[{
"shard":0,
"index":"library",
"node":"7jznW07BRrqjG-aJ7iKeaQ",
"reason":{
"type":"query_parsing_exception",
"reason":"Failed to parse query [+crime +punishment \"]",
"index":"library",
"line":6,
"col":3,
"caused_by":{
"type":"parse_exception",
"reason":"Cannot parse '+crime +punishment \"': Lexical error at line 1, column 21. Encountered: <EOF> after : \"\"",
"caused_by":{
"type":"token_mgr_error",
"reason":"Lexical error at line 1, column 21. Encountered: <EOF> after : \"\""
}
}
}
}]
},
"status":400
}
This means that the query was not properly constructed and a parse error happened. That's why the simple_query_string query was introduced. It uses a query parser that tries to handle user mistakes and tries to guess how the query should look. Our query using that parser will look as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"simple_query_string":{
"query":"+crime +punishment \"",
"fields":["title"]
}
}
}'
If you run the preceding query, you will see that the proper document is returned by Elasticsearch even though the query is not properly constructed.
Autocomplete using prefixes
A very common use case is to provide autocomplete functionality on the indexed data. As we know, the prefix query is not analyzed and works on the basis of terms indexed in the field. So the actual functionality depends on which tokens are produced during indexing. For example, let's assume that we would like to provide autocomplete functionality on any token in the title field and the user provided the wes prefix. A query that would match such a requirement looks as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"prefix":{
"title":"wes"
}
}
}'
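Why the prefix has to line up with the indexed tokens can be seen in a small Python model. It is a sketch only: the indexed terms are approximated by lowercasing and splitting the titles, which is an assumption about the analysis chain, and the function name prefix_matches is hypothetical.

```python
import re

TITLES = ["All Quiet on the Western Front", "Crime and Punishment"]


def prefix_matches(titles, prefix):
    """Map each title to the indexed terms that start with the raw prefix."""
    matches = {}
    for title in titles:
        terms = re.findall(r"\w+", title.lower())  # what indexing would store
        hits = [term for term in terms if term.startswith(prefix)]
        if hits:
            matches[title] = hits
    return matches


print(prefix_matches(TITLES, "wes"))  # matches the indexed term western
print(prefix_matches(TITLES, "Wes"))  # no match: the prefix is not lowercased
```

Because the prefix is compared with the terms as they were indexed, an application would typically lowercase the user input itself before sending it.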
Finding terms similar to a given one
A very simple example is using the fuzzy query to find documents having a term similar to a given one. For example, if we want to find all the documents having a value similar to crimea, we will run the following query:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"fuzzy":{
"title":{
"value":"crimea",
"fuzziness":2,
"max_expansions":50
}
}
}
}'
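The notion of similarity used here is an edit distance between terms. The following Python sketch computes the plain Levenshtein distance, which is a simplification: it does not count a transposition of two adjacent characters as a single edit, and it says nothing about how Lucene enumerates candidate terms. The function name levenshtein is chosen for the illustration.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(levenshtein("crimea", "crime"))  # one deletion away
print(levenshtein("crimea", "quiet"))  # too far for a fuzziness of 2
```

With the fuzziness set to 2, the indexed term crime is within range of the value crimea, while quiet is not.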
Matching phrases
The simplest position aware query, the phrase query allows us to find documents not with a single term but with terms positioned one after another, ones that form a phrase. For example, a query that would only match documents that have the westen nichts neues phrase in the otitle field would look as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_phrase":{
"otitle":"westen nichts neues"
}
}
}'
Spans, spans everywhere
The last use case we would like to discuss is a more complicated example of position aware queries called span queries. Imagine that we would like to run a query to find documents that have the western front phrase not more than three positions after the term quiet, and all that just after the all term. This can be done with span queries and the following command shows how such a query will look:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"span_near":{
"clauses":[
{
"span_term":{
"title":"all"
}
},
{
"span_near":{
"clauses":[
{
"span_term":{
"title":"quiet"
}
},
{
"span_near":{
"clauses":[
{
"span_term":{
"title":"western"
}
},
{
"span_term":{
"title":"front"
}
}
],
"slop":0,
"in_order":true
}
}
],
"slop":3,
"in_order":true
}
}
],
"slop":0,
"in_order":true
}
}
}'
Note that the span queries are not analyzed. We can see that by looking at the response of the Explain API. To see that response, we should send the same request body (our query) to the /library/book/1/_explain REST endpoint. The interesting part of the output looks as follows:
"description":"weight(spanNear([title:all, spanNear([title:quiet, spanNear([title:western, title:front], 0, true)], 3, true)], 0, true) in 0) [PerFieldSimilarity], result of:",
Summary
This chapter has been all about the querying process. We started by looking at how to query Elasticsearch and what Elasticsearch does when it needs to handle the query. We also learned about the basic and compound queries, so we are now able to use both simple queries as well as the ones that group multiple small queries together. Finally, we discussed how to choose the right query for a given use case.
In the next chapter, we will extend our query knowledge. We will start with filtering our queries and move to highlighting possibilities and a way to validate our queries using the Elasticsearch API. We will discuss sorting of search results and query rewrite, which will show us what happens to some queries in Elasticsearch internals.
Chapter 4. Extending Your Querying Knowledge
In the previous chapter, we dived into Elasticsearch querying capabilities. We discussed how to query Elasticsearch in detail and we learned how Elasticsearch querying works. We now know the basic and compound queries of this great search engine and what the configuration options are for each query type. We also got to know when to use our queries and we discussed a few use cases and which queries can be used to handle them. This chapter is dedicated to extending our querying knowledge. By the end of this chapter, you will have learned the following topics:
What filtering is and how to use it
What highlighting is and how to use it
What the highlighter types are and what benefits they bring
How to validate your queries
How to sort your query results
What query rewrite is and how to control it
Filtering your results
In the previous chapter, we talked about various types of queries. The common part was that we always wanted to get the best results first. This is the main difference from the standard database approach where every document matches the query or not. In the database world, we do not ask how good the document is; our only interest lies in the results returned. When talking about full text search engines this is different: we are interested not only in the results, we are also interested in their quality. The reason is obvious: we are searching in unstructured data, using text fields that use language analysis, stemming, and so on. Because of that, the initial results of our queries are, in most cases, far from optimal. This is why when we talk about searching, we talk about precision and document recall.
On the other hand, sometimes we want to limit the whole subset of documents to a chosen part. For example, in a library, we may want to search only the available books, the rest being unimportant. Sometimes the score, busily calculated for the given fields, only interferes with the overall score and has no meaning in terms of accuracy. In such cases, filters should be used to limit the results of the query, but not interfere with the calculated score.
Prior to Elasticsearch 2.0, filters were independent entities from queries. In practice, almost every query had its own counterpart in filters. There was the term query and the term filter, the bool query and the bool filter, the range query and the range filter, and so on. From the user point of view, the most important difference between the queries and the filters was scoring. The filter didn't calculate score, which resulted in the filter being easily cached and more efficient. But this difference was very inconvenient for users. With the release of Elasticsearch 2.0 and its usage of Lucene 5.3, filter queries were deprecated along with some types of queries that allowed us to use filters. Let's discuss how filtering works now and what we can do to achieve the same or better performance as before in Elasticsearch 2.0.
The context is the key
In Elasticsearch 2.0, queries can calculate the score or omit it by choosing a more efficient way of execution. This behavior, in many cases, is chosen automatically based on the context where the query is used. This applies to the queries that include filter sections, which remove documents based on some criteria. These documents are unnecessary in the returned results and should be skipped as quickly as possible without affecting the overall score. Thanks to this, after discarding some documents we can focus only on the rest of the documents, calculating their scores, and sorting them before returning. An example of this case is the must_not clause of a Boolean query. The document that matches the must_not clause will be removed from the returned result set, so calculating the score for the documents matched by this part of the bool query would be additional, unnecessary, and inefficient work.
The best thing about all the changes is that we don't need to care whether we want to use filtering or not. Elasticsearch and the underlying Apache Lucene library take care of choosing the right execution method for us.
Explicit filtering with bool query
As we mentioned in the Compound queries section in Chapter 3, Searching Your Data, the bool query in Elasticsearch 2.0 allows us to add a filter explicitly by adding the filter section and including a query in that section. This is very convenient if we want to have a part of the query that needs to match, but we are not interested in the score for those documents.
Let's look at the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"term":{
"available":true
}
}
}'
We see a simple query that should return all the books in our library available for borrowing, which means the documents with the available field set to true. Now let's compare it with the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"bool":{
"must":{
"match_all":{}
},
"filter":{
"term":{
"available":true
}
}
}
}
}'
This query returns all the books, but it also contains the filter section, which tells Elasticsearch that we are only interested in the available books. The query will return the same results as the previous query we've seen, at least when looking only at the number of documents and which documents are returned. The difference is the score. For our example data, both the queries return two books. The results returned for the first query look as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":2,
"max_score":1.0,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":1.0,
"_source":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
}
},{
"_index":"library",
"_type":"book",
"_id":"1",
"_score":0.30685282,
"_source":{
"title":"All Quiet on the Western Front",
"otitle":"Im Westen nichts Neues",
"author":"Erich Maria Remarque",
"year":1929,
"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",
"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],
"tags":["novel"],
"copies":1,
"available":true,
"section":3
}
}]
}
}
The results for the second query look as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":2,
"max_score":1.0,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":1.0,
"_source":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
}
},{
"_index":"library",
"_type":"book",
"_id":"1",
"_score":1.0,
"_source":{
"title":"All Quiet on the Western Front",
"otitle":"Im Westen nichts Neues",
"author":"Erich Maria Remarque",
"year":1929,
"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",
"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],
"tags":["novel"],
"copies":1,
"available":true,
"section":3
}
}]
}
}
If you look at the score for the documents in each query, you’ll notice the difference. In the simple term query, Elasticsearch (the Lucene library, in fact) has a score of 1.0 for the first document and a score of 0.30685282 for the second one. This is not a perfect solution because the availability check is more or less binary and we don’t want it to interfere with the score. That’s why the second query is better in this case. With the bool query and filtering, the score for the filter element is not calculated and the score for both the documents is the same, that is 1.0.
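The pattern of the second query can be captured in a small helper. The following Python sketch only builds the request body; the function name with_filters is hypothetical and not part of any Elasticsearch client, and it assumes the filter section accepts a list of clauses:

```python
import json


def with_filters(scoring_query, *filters):
    """Wrap a scoring query in a bool query.

    The clauses passed as filters only decide whether a document matches;
    they are placed in the filter section, so they do not affect the score.
    """
    return {"query": {"bool": {"must": scoring_query, "filter": list(filters)}}}


# Same idea as the second query above: match everything, keep available books only.
body = with_filters({"match_all": {}}, {"term": {"available": True}})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request in the same way as the curl examples above.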
Highlighting
You have probably heard of highlighting or seen it. You may not even know that you are actually using highlighting when you are using the bigger and smaller public search engines on the World Wide Web (WWW). When we talk about highlighting in the context of full text search, we usually mean showing which words or phrases from the query were matched in the resulting documents. For example, if we use Google and search for the word lucene, we would see that word bolded in the search results:
It is even more visible on the Microsoft Bing search engine:
In this chapter, we will see how to use Elasticsearch highlighting capabilities to enhance our application with highlighted results.
Getting started with highlighting
There is no better way of showing how highlighting works than making a query and looking at the results returned by Elasticsearch. So let’s do that. We assume that we would like to highlight the terms that are matched in the title field of our documents to improve the search experience of our users. By now you know the example data from top to bottom, so let’s again reuse the same dataset. We want to match the term crime in the title field and we want to get highlighting results. One of the simplest queries that can achieve this looks as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"match":{
"title":"crime"
}
},
"highlight":{
"fields":{
"title":{}
}
}
}'
The response for the preceding query is as follows:
{
"took":16,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5,
"_source":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
},
"highlight":{
"title":["<em>Crime</em> and Punishment"]
}
}]
}
}
As you can see, apart from the standard information about the documents that matched the query, we got a new section called highlight. Elasticsearch used the <em> HTML tag as the beginning of the highlighting section and its closing counterpart to close the highlighted section. This is the default behavior of Elasticsearch, but we will learn how to change that.
Field configuration
In order to perform highlighting, the original content of the field needs to be present. We have to set the fields we will use for highlighting. This is done by either marking a field to be stored or using the _source field with those fields included. If the field is set to be stored in the mappings, the stored version will be used, otherwise Elasticsearch will try to use the _source field and extract the field that needs to be highlighted.
Under the hood
Elasticsearch uses Apache Lucene under the hood and highlighting is one of the features of that library. Lucene provides three types of highlighting implementation: the standard one, which we just used; the second one called FastVectorHighlighter (https://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html), which needs term vectors and positions to be able to work; and the third one called PostingsHighlighter (http://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/postingshighlight/PostingsHighlighter.html). Elasticsearch chooses the right highlighter implementation automatically. If the field is configured with the term_vector property set to with_positions_offsets, FastVectorHighlighter will be used. If the field is configured with the index_options property set to offsets, PostingsHighlighter will be used. Otherwise, the standard highlighter will be used by Elasticsearch.
Which highlighter to use depends on your data, your queries, and the needed performance. The standard highlighter is a general use case one. However, if you want to highlight fields with lots of data, FastVectorHighlighter is the recommended one. The thing to remember about it is that it requires term vectors to be present and that will make your index slightly larger. Finally, the fastest highlighter, which is also recommended for natural language highlighting, is PostingsHighlighter. However, the thing to remember is that PostingsHighlighter doesn’t support complex queries such as the match_phrase_prefix query and in such cases highlighting won’t be returned.
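The selection rules described above can be written down as a short decision function. This is only a sketch of the rules as stated in the text, not the actual Elasticsearch implementation; the function name choose_highlighter and the example mappings are hypothetical:

```python
def choose_highlighter(field_mapping):
    """Return the highlighter type implied by a field mapping.

    Mirrors the rules from the text: term vectors with positions and offsets
    select the fast vector highlighter, offsets in the postings select the
    postings highlighter, and everything else falls back to the standard one.
    """
    if field_mapping.get("term_vector") == "with_positions_offsets":
        return "fvh"
    if field_mapping.get("index_options") == "offsets":
        return "postings"
    return "plain"


# Hypothetical field mappings used only for illustration.
print(choose_highlighter({"type": "string"}))
print(choose_highlighter({"type": "string", "term_vector": "with_positions_offsets"}))
print(choose_highlighter({"type": "string", "index_options": "offsets"}))
```

The returned names are the same values that the type property accepts when a highlighter is forced explicitly.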
Forcing highlighter type
While Elasticsearch chooses the highlighter type for us, we can also enforce the highlighting type if we really want to. To do that, we need to set the type property to one of the following values:
plain: When this value is set, Elasticsearch will use the standard highlighter.
fvh: When this value is set, Elasticsearch will try using FastVectorHighlighter. It will require term vectors to be turned on for the field used for highlighting.
postings: When this value is set, Elasticsearch will try using PostingsHighlighter. It will require offsets to be turned on for the field used for highlighting.
For example, to use the standard highlighter, we will run the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"term":{
"title":"crime"
}
},
"highlight":{
"fields":{
"title":{"type":"plain"}
}
}
}'
Configuring HTML tags
The default behavior of the highlighting mechanism may not suit everyone; not all of us would like the <em> and </em> tags to be used for highlighting. Because of that, Elasticsearch allows us to change the tags that are used for that purpose. To do that, we should set the pre_tags and post_tags properties to the code snippets we want the highlighting to start from and end at; for example, <b> and </b>. The pre_tags and post_tags properties are arrays and because of that we can provide more than a single opening and closing tag and Elasticsearch will use each of the defined tags to highlight different words. For example, if we want to use <b> as the opening tag and </b> as the closing tag, our query will look like this:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"term":{
"title":"crime"
}
},
"highlight":{
"pre_tags":["<b>"],
"post_tags":["</b>"],
"fields":{
"title":{}
}
}
}'
The result returned by Elasticsearch to the preceding query will be as follows:
{
"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.5,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":0.5,
"_source":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
},
"highlight":{
"title":["<b>Crime</b> and Punishment"]
}
}]
}
}
As you can see, the term Crime in the title field was surrounded by the tags of our choice.
Controlling highlighted fragments
Elasticsearch allows us to control the number of highlighted fragments returned and their sizes by exposing two properties. The first one is number_of_fragments, which defines the number of fragments returned by Elasticsearch (defaults to 5). Setting this property to 0 causes the whole field to be returned, which can be handy for short fields but expensive for longer fields. The second property is fragment_size, which lets us specify the maximum length of the highlighted fragments in characters and defaults to 100.
An example query using these properties will look as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"term":{
"title":"crime"
}
},
"highlight":{
"fields":{
"title":{"fragment_size":200,"number_of_fragments":0}
}
}
}'
Global and local settings
The highlighting properties we discussed previously can be set both on a global basis and on a per-field basis. The global ones will be used for all the fields that don’t overwrite them and should be placed on the same level as the fields section of your highlighting, for example, like this:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"term":{
"title":"crime"
}
},
"highlight":{
"pre_tags":["<b>"],
"post_tags":["</b>"],
"fields":{
"title":{}
}
}
}'
You can also set the properties for each field. For example, if we would like to keep the default behavior for all the fields except our title field, we would do the following:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"term":{
"title":"crime"
}
},
"highlight":{
"fields":{
"title":{"pre_tags":["<b>"],"post_tags":["</b>"]}
}
}
}'
As you can see, instead of placing the properties on the same level as the fields section, we placed them inside the empty JSON object that specifies the title field behavior. Of course, each field can be configured using different properties.
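The relationship between global and per-field settings can be illustrated with a small sketch that computes the settings that effectively apply to a field. It models only the override rule described above; the helper effective_settings is hypothetical, and the per-field <i> tags on the author field are made up for the example:

```python
def effective_settings(highlight, field):
    """Merge the global highlight settings with the overrides of one field."""
    # Everything next to the fields section is a global setting.
    merged = {key: value for key, value in highlight.items() if key != "fields"}
    # Properties placed inside the field object overwrite the global ones.
    merged.update(highlight["fields"].get(field, {}))
    return merged


highlight = {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "fields": {
        "title": {},
        "author": {"pre_tags": ["<i>"], "post_tags": ["</i>"]},
    },
}

print(effective_settings(highlight, "title"))
print(effective_settings(highlight, "author"))
```

Here the title field inherits the global <b> tags, while the author field uses its own tags.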
Require matching
Sometimes there may be a need (especially when using multiple highlighted fields) to show only the fields that matched our query. In order to have such behavior, we need to set the require_field_match property to true. Setting this property to false will cause all the terms to be highlighted even if a field didn’t match the query.
To see how that works, let’s create a new index called users and let’s index a single document there. We will do that by sending the following command:
curl -XPUT 'http://localhost:9200/users/user/1' -d '{
"name":"Test user",
"description":"Test document"
}'
So, let’s assume we want to highlight the hits in both of the preceding fields. Our command sending the query to our new index will look like this:
curl -XGET 'localhost:9200/users/_search?pretty' -d '{
"query":{
"term":{
"name":"test"
}
},
"highlight":{
"fields":{
"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},
"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}
}
}
}'
The result of the preceding query will be as follows:
{
"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.19178301,
"hits":[{
"_index":"users",
"_type":"user",
"_id":"1",
"_score":0.19178301,
"_source":{
"name":"Test user",
"description":"Test document"
},
"highlight":{
"name":["<b>Test</b> user"]
}
}]
}
}
Note that we only got highlighting on the name field. This is because our query matched only that field. Let’s see what will happen if we set the require_field_match property to false and use a command similar to the following one:
curl -XGET 'localhost:9200/users/_search?pretty' -d '{
"query":{
"term":{
"name":"test"
}
},
"highlight":{
"require_field_match":false,
"fields":{
"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},
"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}
}
}
}'
Now let’s look at the modified query results:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.19178301,
"hits":[{
"_index":"users",
"_type":"user",
"_id":"1",
"_score":0.19178301,
"_source":{
"name":"Test user",
"description":"Test document"
},
"highlight":{
"name":["<b>Test</b> user"],
"description":["<b>Test</b> document"]
}
}]
}
}
As you can see, Elasticsearch returned highlighting in both the fields now.
Custom highlighting query
There are use cases where your queries are complicated and not really suitable for highlighting, but you still want to use highlighting functionality. In such cases, Elasticsearch allows us to highlight results on the basis of a different query provided using the highlight_query property. An example of using a different highlighting query looks as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"term":{
"title":"crime"
}
},
"highlight":{
"fields":{
"title":{
"highlight_query":{
"term":{
"title":"punishment"
}
}
}
}
}
}'
The preceding query will result in highlighting the term punishment in the title field, instead of the crime one.
The Postings highlighter
It is time to talk about the third available highlighter. It was added in Elasticsearch 0.90.6 and is slightly different from the previous ones. PostingsHighlighter is automatically used when the field definition has index_options set to offsets. To illustrate how PostingsHighlighter works, we will create a simple index with proper configuration that allows that highlighter to work. We will do that by using the following commands:
curl -XPUT 'localhost:9200/hl_test'
curl -XPOST 'localhost:9200/hl_test/doc/_mapping' -d '{
"doc":{
"properties":{
"contents":{
"type":"string",
"fields":{
"ps":{"type":"string","index_options":"offsets"}
}
}
}
}
}'
If everything goes well, we should have a new index and the mappings. The mappings have two fields defined: one named contents and the second one named contents.ps. In this second case, we turned on the offsets by using the index_options property. This means that Elasticsearch will use the standard highlighter for the contents field and the postings highlighter for the contents.ps field.
To see the difference, we will index a single document with a fragment from Wikipedia describing the history of Birmingham. We do that by running the following command:
curl -XPUT 'localhost:9200/hl_test/doc/1' -d '{
"contents":"Birmingham''s early history is that of a remote and marginal area. The main centres of population, power and wealth in the pre-industrial English Midlands lay in the fertile and accessible river valleys of the Trent, the Severn and the Avon. The area of modern Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest of Arden."
}'
The last step is to send a query using both the highlighters. We can do it in a single request by using the following command:
curl 'localhost:9200/hl_test/_search?pretty' -d '{
"query":{
"term":{
"contents.ps":"modern"
}
},
"highlight":{
"require_field_match":false,
"fields":{
"contents":{},
"contents.ps":{}
}
}
}'
If everything goes well, you will find the following snippet in the response returned by Elasticsearch:
"highlight":{
"contents":["valleys of the Trent, the Severn and the Avon. The area of <em>modern</em> Birmingham lay in between, on the upland"],
"contents.ps":["The area of <em>modern</em> Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest of Arden."]
}
As you see, both the highlighters found the occurrence of the desired word. The difference is that the postings highlighter returns the smarter snippet; it checks for the sentence boundaries.
Let’s try one more query:
curl 'localhost:9200/hl_test/_search?pretty' -d '{
"query":{
"match_phrase":{
"contents.ps":"centresof"
}
},
"highlight":{
"require_field_match":false,
"fields":{
"contents":{},
"contents.ps":{}
}
}
}'
We searched for the phrase centres of. As you may expect, the results for the two highlighters will differ. For the standard highlighter, run on the contents field, we will find the following phrase in the response:
"Birminghams early history is that of a remote and marginal area. The main <em>centres</em> <em>of</em> population"
As you can clearly see, the standard highlighter divided the given phrase and highlighted individual terms. Also, not all occurrences of the terms centres and of were highlighted, but only the ones that formed the phrase.
On the other hand, the PostingsHighlighter returned the following highlighted fragment:
"Birminghams early history is that <em>of</em> a remote and marginal area.",
"The main <em>centres</em> <em>of</em> population, power and wealth in the pre-industrial English Midlands lay in the fertile and accessible river valleys <em>of</em> the Trent, the Severn and the Avon.",
"The area <em>of</em> modern Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest <em>of</em> Arden."
This is the significant difference. The PostingsHighlighter highlighted all the terms matching the query and not only those that formed the phrase, and returned whole sentences. This is a very nice feature, especially when you want to display the highlighting results for the user in the UI of your application.
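To get an intuition for what returning whole sentences means, the following Python sketch imitates that behavior on a shortened version of the example text. It is a rough approximation only: it splits sentences with a simple regular expression, whereas the real highlighter relies on Lucene and its own boundary detection, and the function name sentence_snippets is hypothetical:

```python
import re


def sentence_snippets(text, term, pre="<em>", post="</em>"):
    """Return every sentence containing the term, with the term wrapped in tags."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
    return [
        pattern.sub(lambda match: pre + match.group(0) + post, sentence)
        for sentence in sentences
        if pattern.search(sentence)
    ]


text = (
    "Birminghams early history is that of a remote and marginal area. "
    "The area of modern Birmingham lay in between."
)
print(sentence_snippets(text, "modern"))
```

Only the sentence that contains the searched word is returned, and it is returned as a whole rather than being cut at a fixed fragment size.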
Validating your queries
There are times when you are not in total control of the queries that you send to Elasticsearch. The queries can be generated from multiple criteria, which turns them into monsters, or even worse, they can be generated by some kind of wizard, which makes it hard to troubleshoot and find the faulty part that makes the query fail. Because of such use cases, Elasticsearch exposes the Validate API, which helps us validate our queries and diagnose potential problems.
Using the Validate API
The usage of the Validate API is very simple. Instead of sending the query to the _search REST endpoint, we send it to the _validate/query one. And that’s it. Let’s look at the following command:
curl -XGET 'localhost:9200/library/_validate/query?pretty' --data-binary '{
"query":{
"bool":{
"must":{
"term":{
"title":"crime"
}
},
"should":{
"range:{
"year":{
"from":1900,
"to":2000
}
}
},
"must_not":{
"term":{
"otitle":"nothing"
}
}
}
}
}'
A similar query was already used in this book in Chapter 3, Searching Your Data. The preceding command will tell Elasticsearch to validate it and return the information about its validity. The response of Elasticsearch to the preceding command will be similar to the following one:
{
"valid":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
}
}
Look at the valid attribute. It is set to false. Something went wrong. Let’s execute the query validation once again with the explain parameter added in the query:
curl -XGET 'localhost:9200/library/_validate/query?pretty&explain' --data-binary '{
"query":{
"bool":{
"must":{
"term":{
"title":"crime"
}
},
"should":{
"range:{
"year":{
"from":1900,
"to":2000
}
}
},
"must_not":{
"term":{
"otitle":"nothing"
}
}
}
}
}'
Now the result returned from Elasticsearch is more verbose:
{
"valid":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
},
"explanations":[{
"index":"library",
"valid":false,
"error":"[library] QueryParsingException[Failed to parse]; nested: JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]];; com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]"
}]
}
Now everything is clear. In our example, we have improperly quoted the range attribute.
Note
You may wonder why in our curl query we used the --data-binary parameter. This parameter properly preserves the newline character when sending a query to Elasticsearch. This means that the line and the column number remain intact and it’s easier to find errors. In the other cases, the -d parameter is more convenient because it’s shorter.
The Validate API can also detect other errors, for example, incorrect format of a number or other mapping-related issues. Unfortunately, for our application, it is not easy to detect what the problem is because of a lack of structure in the error messages.
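Pure syntax problems, such as the unterminated range name in our example, can also be caught on the client side before the query is sent. The following sketch is not part of the Validate API; it uses only the json module from the Python standard library, the helper name find_json_error is hypothetical, and the exact message and position reported depend on the JSON parser used:

```python
import json


def find_json_error(body):
    """Return (line, column, message) for the first JSON syntax error, or None."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# A shortened version of the faulty query: the closing quote of "range" is missing.
broken = """{
  "query": {
    "bool": {
      "should": {
        "range: {
          "year": {"from": 1900, "to": 2000}
        }
      }
    }
  }
}"""

problem = find_json_error(broken)
if problem is not None:
    print("line %d, column %d: %s" % problem)
```

Such a check only tells us whether the body is well-formed JSON. Mapping-related problems, such as an incorrectly formatted number, still need the Validate API.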
The Validate API supports most of the parameters that are supported by standard Elasticsearch queries, which include: explain, ignore_unavailable, allow_no_indices, expand_wildcards, operation_threading, analyzer, analyze_wildcard, default_operator, df, lenient, lowercase_expanded_terms, and rewrite.
Sorting data
So far we’ve run our queries and got the results in the order determined by the score of each document. However, it is not enough for all the use cases. It is really handy to be able to sort our results on the basis of the field values. For example, when you are searching logs or time-based data in general, you probably want to have the most recent data first. In addition to that, Elasticsearch allows us to control how the documents should be sorted not only using field values, but also using more sophisticated sorting, such as sorting that uses scripts or sorting on fields that have multiple values. We will cover all that in this section.
Default sorting
Let’s look at the following query that returns all the books with at least one of the specified words:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"terms":{
"title":["crime","front","punishment"]
}
}
}'
Under the hood, we can imagine that Elasticsearch sees the preceding query as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"terms":{
"title":["crime","front","punishment"]
}
},
"sort":{"_score":"desc"}
}'
Look at the sort section in the preceding query. This is the default sorting used by Elasticsearch. For better visibility, we can change the formatting slightly and show that fragment as follows:
"sort":[
{"_score":"desc"}
]
The preceding section defines how the documents should be sorted in the results list. In this case, Elasticsearch will show the documents with the highest score on top of the results list. The simplest modification is to reverse the ordering by changing the sort section to the following one:
"sort":[
{"_score":"asc"}
]
Selecting fields used for sorting
Default sorting is boring, isn’t it? So, let’s change it to sort on the basis of the values of the fields present in the documents. Let’s choose the title field, which means that the sort section of our query will look as follows:
"sort":[
{"title":"asc"}
]
Unfortunately, this doesn’t work as expected. Although Elasticsearch sorted the documents, the ordering is somewhat strange. Look closely at the response. With every document, Elasticsearch returns information about the sorting; for example, for the Crime and Punishment book, the returned document looks like the following code:
{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":null,
"_source":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
},
"sort":["punishment"]
}
If you compare the title field and the returned sorting information, everything should be clear. Elasticsearch, during the analysis process, splits the field into several tokens. Since sorting is done using a single token, Elasticsearch chooses one of the produced tokens. It does the best that it can by sorting these tokens alphabetically and choosing the first one. This is the reason why, in the sorting value, we find only a single word instead of the whole content of the title field. If you would like to see how Elasticsearch behaves when using different fields for sorting, you can try fields such as copies:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"terms":{
"title":["crime","front","punishment"]
}
},
"sort":[
{"copies":"asc"}
]
}'
In general, it is a good idea to have a not analyzed field for sorting. We can use fields with multiple values for sorting, but, in most cases, it doesn’t make much sense and has limited usage.
As an example of using two different fields, one for sorting and another for searching, let’s change our title field. The changed title field definition will look as follows:
"title":{
"type":"string",
"fields":{
"sort":{"type":"string","index":"not_analyzed"}
}
}
After changing the title field in the mappings (we’ve used the same mappings as in Chapter 3, Searching Your Data) and re-indexing the data, we can try sorting on the title.sort field and see whether it works. To do this, we will need to send the following query:
{
"query":{
"match_all":{}
},
"sort":[
{"title.sort":"asc"}
]
}
Now, it works properly. As you can see, we used the new field, the title.sort one. We set it as not to be analyzed, so there is a single token for that field in the index of Elasticsearch.
Sorting mode
In the response from Elasticsearch, every document contains information about the value used for sorting. For example, let’s look at one of the documents returned by the query in which we used the title field for sorting:
{
"_index":"library",
"_type":"book",
"_id":"1",
"_score":null,
"_source":{
"title":"All Quiet on the Western Front",
"otitle":"Im Westen nichts Neues",
"author":"Erich Maria Remarque",
"year":1929,
"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",
"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],
"tags":["novel"],
"copies":1,
"available":true,
"section":3
},
"sort":["all"]
}
The sorting used in the query to get the preceding document was as follows:
"sort":[
{"title":"asc"}
]
However, because we are sorting on an analyzed field, which contains more than a single value, the sorting definition is in fact equivalent to the longer form, which looks as follows:
"sort":[
{"title":{"order":"asc","mode":"min"}}
]
mode defines which token should be used for comparison when sorting on a field which has more than one value. The available values we can choose from are:
min: Sorting will use the lowest value (or the first alphabetical value on the text based fields)
max: Sorting will use the highest value (or the last alphabetical value on the text based fields)
avg: Sorting will use the average value
median: Sorting will use the median value
sum: Sorting will use the sum of all the values in the field
Note
The modes such as median, avg, and sum are useful for numerical multivalued fields, but don’t make much sense when it comes to text based fields.
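The effect of each mode on a numerical multivalued field can be imitated in plain Python. This is only an illustration of the idea and makes no claim about how Elasticsearch rounds or represents these values internally; the function sort_value and the ratings list are hypothetical:

```python
from statistics import median


def sort_value(values, mode="min"):
    """Reduce the values of a multivalued field to the single value used for sorting."""
    modes = {
        "min": min,
        "max": max,
        "sum": sum,
        "avg": lambda items: sum(items) / len(items),
        "median": median,
    }
    return modes[mode](values)


ratings = [4, 1, 7]  # hypothetical multivalued numeric field
for mode in ("min", "max", "avg", "median", "sum"):
    print(mode, sort_value(ratings, mode))
```

Whichever mode is chosen, every document is reduced to a single comparison value, and the documents are then ordered by that value.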
Note that sort, in request and response, is given as an array. This suggests that we can use several different orderings. Elasticsearch will use the next element in the sorting definition list to determine ordering between the documents that have the same value of the previous sorting clause. So, if we have the same value in the title field, the documents will be sorted by the next field that we specify. For example, if we would like to get the documents that have the most copies and then sort by the title, we will run the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"terms":{
"title":["crime","front","punishment"]
}
},
"sort":[
{"copies":"desc"},{"title":"asc"}
]
}'
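The tie-breaking behavior of several sort clauses corresponds to sorting by a composite key. The sketch below shows the same ordering rule in Python on made-up data; the third book is hypothetical, and the whole title string is compared here, which is a simplification compared to sorting on an analyzed field:

```python
books = [
    {"title": "Crime and Punishment", "copies": 0},
    {"title": "All Quiet on the Western Front", "copies": 1},
    {"title": "A Hypothetical Book", "copies": 1},  # made-up entry to create a tie
]

# First key: copies, descending (hence the minus sign).
# Second key: title, ascending, used only when the number of copies is equal.
ordered = sorted(books, key=lambda book: (-book["copies"], book["title"]))

for book in ordered:
    print(book["copies"], book["title"])
```

The two books with a single copy come first and are ordered by title between themselves; the book with no copies comes last.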
Specifying behavior for missing fields
What about when some of the documents that match the query don’t have the field we want to sort on? By default, documents without the given field are returned first in the case of ascending order and last in the case of descending order. However, sometimes this is not exactly what we want to achieve.
When we use sorting on numeric fields, we can change the default Elasticsearch behavior for documents with missing fields. For example, let’s take a look at the following query:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":[
{
"section":{
"order":"asc",
"missing":"_last"
}
}
]
}'
Note the extended form of the sort section of our query. We’ve added the missing parameter to it. By setting the missing parameter to _last, Elasticsearch will place the documents without the given field at the bottom of the results list. Setting the missing parameter to _first will result in Elasticsearch placing documents without the given field at the top of the results list. It is worth mentioning that besides the _last and _first values, Elasticsearch also allows us to use any number. In such a case, a document without a defined field will be treated as the document with this given value.
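The three possibilities for the missing parameter can be imitated with a short Python function. It is a sketch of the described semantics only, run on made-up documents; sort_with_missing is a hypothetical name:

```python
def sort_with_missing(docs, field, order="asc", missing="_last"):
    """Sort documents on a numeric field, handling documents that lack it."""
    reverse = order == "desc"
    if missing in ("_first", "_last"):
        present = sorted(
            (doc for doc in docs if field in doc),
            key=lambda doc: doc[field],
            reverse=reverse,
        )
        absent = [doc for doc in docs if field not in doc]
        return absent + present if missing == "_first" else present + absent
    # Any other value is treated as the field value of documents without the field.
    return sorted(docs, key=lambda doc: doc.get(field, missing), reverse=reverse)


docs = [{"id": 1, "section": 3}, {"id": 4}, {"id": 2, "section": 1}]
print([doc["id"] for doc in sort_with_missing(docs, "section", missing="_last")])
print([doc["id"] for doc in sort_with_missing(docs, "section", missing="_first")])
print([doc["id"] for doc in sort_with_missing(docs, "section", missing=2)])
```

With a number given as the missing value, the document without the section field is simply ordered as if its section were 2.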
Dynamic criteria
As we mentioned in the previous section, Elasticsearch allows us to sort using fields that have multiple values. We can control how the comparison is made using scripts. We do that by showing Elasticsearch how to calculate the value that should be used for sorting. Let’s assume that we want to sort by the first value indexed in the tags field. Let’s take a look at the following example query (note that running the following query requires the script.inline property set to on in the elasticsearch.yml file):
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":{
"_script":{
"script":"doc[\"tags\"].values.size() > 0 ? doc[\"tags\"].values[0] : \"\u19999\"",
"type":"string",
"order":"asc"
}
}
}'
In the preceding example, we replaced every nonexistent value with the Unicode code of a character that should be low enough in the list. The main idea of this code is to check if our array contains at least a single element. If it does, then the first value from the array is returned. If the array is empty, we return the Unicode character that should be placed at the bottom of the results list. Besides the script parameter, this option of sorting requires us to specify the order (ascending, in our case) and type parameters that will be used for the comparison (we return string from our script).
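The logic of the script can be reproduced outside Elasticsearch to see why documents without tags end up at the bottom. The following Python sketch assumes two documents shaped like our example data and simply takes the first element of the tags list, which may differ from the order in which Elasticsearch exposes the values to a script:

```python
SENTINEL = "\u19999"  # the same characters as in the script: U+1999 followed by "9"


def first_tag_or_sentinel(doc):
    """Return the first tag, or a high sentinel string when there are no tags."""
    tags = doc.get("tags", [])
    return tags[0] if len(tags) > 0 else SENTINEL


docs = [
    {"_id": "4", "tags": []},
    {"_id": "1", "tags": ["novel"]},
]

ordered = sorted(docs, key=first_tag_or_sentinel)
print([doc["_id"] for doc in ordered])
```

Because the sentinel starts with a character whose code point is far above the letters used in the tags, it compares as greater than any ordinary tag and lands last in ascending order.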
Calculate scoring when sorting
By default, Elasticsearch assumes that when you use sorting, the score is completely unimportant. Usually it is a good assumption; why do additional computations when the importance of the documents is given by the sorting formula? Sometimes, however, you want to know how good the document is in relation to the current query, even if the documents are presented in a different order. This is when the track_scores parameter should be used and set to true. An example query using it looks as follows:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"match_all":{}
},
"track_scores":true,
"sort":[
{"title":{"order":"asc"}}
]
}'
The preceding query calculates the score for every document. In fact, in our example, the score is boring and is always equal to 1.0 because of the match_all query which treats all the documents as equal.
Query rewrite
When debugging your queries, it is very valuable to know how all the queries are executed. Because of that, we decided to include the section on how query rewrite works in Elasticsearch, why it is used, and how to control it. If you have ever used queries such as the prefix query and the wildcard query, basically any query that is said to be multiterm (a query that is built of multiple terms), you’ve used query rewriting even though you may not have known about it. Elasticsearch does rewrite for performance reasons. The rewrite process is about changing the original, expensive query into a set of queries that are far less expensive from an Apache Lucene point of view, thus speeding up the query execution.
Prefix query as an example
The best way to illustrate how the rewrite process is done internally is to look at an example and see which terms are used instead of the original query term. We will index three documents to our library_it index by using the following commands:
curl -XPOST 'localhost:9200/library_it/book/1' -d '{"title":"Solr 4 Cookbook"}'
curl -XPOST 'localhost:9200/library_it/book/2' -d '{"title":"Solr 3.1 Cookbook"}'
curl -XPOST 'localhost:9200/library_it/book/3' -d '{"title":"Mastering Elasticsearch"}'
What we would like is to find all the documents that start with the letter s. Simple as that, we run the following query against our library_it index:
curl -XGET 'localhost:9200/library_it/_search?pretty' -d '{
"query":{
"prefix":{
"title":{
"prefix":"s",
"rewrite":"constant_score_boolean"
}
}
}
}'
We’ve used a simple prefix query; we’ve said that we would like to find all the documents with the letter s in the title field. We’ve also used the rewrite property to specify the query rewrite method, but let’s skip it for now as we will discuss the possible values of this parameter in the later part of this section.
As the response to the previous query, we get the following:
{
"took":13,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":2,
"max_score":1.0,
"hits":[{
"_index":"library_it",
"_type":"book",
"_id":"2",
"_score":1.0,
"_source":{
"title":"Solr 3.1 Cookbook"
}
},{
"_index":"library_it",
"_type":"book",
"_id":"1",
"_score":1.0,
"_source":{
"title":"Solr 4 Cookbook"
}
}]
}
}
As you can see, in response we got the two documents that had the contents of the title field starting with the desired character. We didn’t specify the mappings explicitly, so we relied on Elasticsearch’s ability to choose the mapping type for us. As we already know, for the text field, Elasticsearch uses the default analyzer. This means that the terms in our documents will be lowercased and, because of that, we used the lowercased letter in our prefix query (remember that the prefix query is not analyzed).
Getting back to Apache Lucene
Now let’s take a step back and look at Apache Lucene again. If you recall what the Lucene inverted index is built from, you can tell that it contains a term, count, and document pointer (if you don’t recall, refer to the Full text searching section in Chapter 1, Getting Started with Elasticsearch Cluster). So, let’s see how the simplified view of the index may look for the preceding data we’ve put to the library_it index:
What you see in the column with the Term text is quite important. If you look at Elasticsearch and Apache Lucene internals, you can see that our prefix query was rewritten to the following Lucene query:
ConstantScore(title:solr)
We can check the portions of the rewrite using the Elasticsearch API. First of all, we can use the Explain API by running the following command:
curl -XGET 'localhost:9200/library_it/book/1/_explain?pretty' -d '{
"query":{
"prefix":{
"title":{
"prefix":"s",
"rewrite":"constant_score_boolean"
}
}
}
}'
The result will be as follows:
{
"_index":"library_it",
"_type":"book",
"_id":"1",
"matched":true,
"explanation":{
"value":1.0,
"description":"sum of:",
"details":[{
"value":1.0,
"description":"ConstantScore(title:solr), product of:",
"details":[{
"value":1.0,
"description":"boost",
"details":[]
},{
"value":1.0,
"description":"queryNorm",
"details":[]
}]
},{
"value":0.0,
"description":"match on required clause, product of:",
"details":[{
"value":0.0,
"description":"# clause",
"details":[]
},{
"value":1.0,
"description":"_type:book, product of:",
"details":[{
"value":1.0,
"description":"boost",
"details":[]
},{
"value":1.0,
"description":"queryNorm",
"details":[]
}]
}]
}]
}
}
We can see that Elasticsearch used a constant score query with the term solr against the title field.
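The rewrite described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: TITLE_TERMS is a hypothetical, hand-written stand-in for the term dictionary of the title field (not data read from Lucene), and the function names are invented for this example. It shows why the prefix s expands to the single term solr and why an uppercase S would expand to nothing.

```python
# Hypothetical stand-in for the term dictionary of the title field:
# each lowercased term maps to the ids of the documents containing it.
TITLE_TERMS = {
    "3.1": {"2"},
    "4": {"1"},
    "cookbook": {"1", "2"},
    "solr": {"1", "2"},
}


def rewrite_prefix(prefix, terms):
    """Expand a prefix into the sorted list of indexed terms starting with it."""
    return sorted(term for term in terms if term.startswith(prefix))


def constant_score_boolean(prefix, terms, boost=1.0):
    """OR the expanded terms together; each matching document scores `boost`."""
    scores = {}
    for term in rewrite_prefix(prefix, terms):
        for doc_id in terms[term]:
            scores[doc_id] = boost
    return scores


print(rewrite_prefix("s", TITLE_TERMS))  # ['solr']
print(rewrite_prefix("S", TITLE_TERMS))  # [] because the prefix is not analyzed
print(constant_score_boolean("s", TITLE_TERMS))
```

Both documents end up with the same score because the constant score variant never looks at term statistics.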
Query rewrite properties
We can control how the queries are rewritten internally. To do that, we place the rewrite parameter inside the JSON object responsible for the actual query. For example:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"prefix":{
"title":{
"prefix":"s",
"rewrite":"constant_score_boolean"
}
}
}
}'
The rewrite property can take the following values:
scoring_boolean: This rewrite method translates each generated term into a Boolean should clause in the Boolean query. This rewrite method causes the score to be calculated for each document. Because of that, this method may be CPU demanding. Please also note that, for queries that have many terms, it may exceed the Boolean query limit, which is set to 1024. The default Boolean query limit can be changed by setting the index.query.bool.max_clause_count property in the elasticsearch.yml file. However, remember that the more Boolean queries produced, the lower the query performance may be.
constant_score: This rewrite method chooses constant_score_boolean or constant_score_filter depending on the query and taking performance into consideration. This is also the default behavior when the rewrite property is not set at all.
constant_score_boolean: This rewrite method is similar to the scoring_boolean rewrite method described previously, but less CPU demanding because the scoring is not computed and, instead of that, each term receives a score equal to the query boost (one by default, and which can be set using the boost property). Because this rewrite method also results in Boolean should clauses being created, similar to the scoring_boolean rewrite method, this method can also hit the maximum Boolean clauses limit.
top_terms_N: A rewrite method that translates each generated term into a Boolean should clause in a Boolean query and keeps the scores as computed by the query. However, unlike the scoring_boolean rewrite method, it only keeps the N top scoring terms to avoid hitting the maximum Boolean clauses limit and to increase the final query performance.
top_terms_blended_freqs_N: A rewrite method that translates each term into a Boolean query and treats the terms as if they had the same term frequency.
top_terms_boost_N: A rewrite method similar to the top_terms_N one, but the scores are not computed. Instead, the documents are given a score equal to the value of the boost property (one by default).
For example, if we would like our example query to use top_terms_N with N equal to 2, our query would look like this:
curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{
"query":{
"prefix":{
"title":{
"prefix":"s",
"rewrite":"top_terms_2"
}
}
}
}'
If you look at the results returned by Elasticsearch, you’ll notice that, unlike our initial query, the documents were given a score different than the default 1.0:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.15342641,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"3",
"_score":0.15342641,
"_source":{
"title":"The Complete Sherlock Holmes",
"author":"Arthur Conan Doyle",
"year":1936,
"characters":["Sherlock Holmes","Dr. Watson","G. Lestrade"],
"tags":[],
"copies":0,
"available":false,
"section":12
}
}]
}
}
The score is different than the default 1.0 because we’ve used the top_terms_N rewrite type and this type of query rewrite keeps the score for the N top scoring terms.
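To make the top N idea more concrete, here is a rough Python sketch. It is not how Lucene implements the rewrite; the per-term scores are made-up numbers and the function names are hypothetical. It only shows the selection step: of all the terms a prefix expands to, keep the N best, either with their scores (as top_terms_N does) or with a constant boost (as top_terms_boost_N does).

```python
import heapq


def top_terms(term_scores, n):
    """Keep the n highest-scoring expanded terms together with their scores."""
    return heapq.nlargest(n, term_scores.items(), key=lambda item: item[1])


def top_terms_boost(term_scores, n, boost=1.0):
    """Keep the same n terms, but replace every score with the boost."""
    return [(term, boost) for term, _ in top_terms(term_scores, n)]


# Made-up scores for terms that a prefix such as "s" could expand to.
expanded = {"sherlock": 0.30, "solr": 0.20, "server": 0.10}

print(top_terms(expanded, 2))        # [('sherlock', 0.3), ('solr', 0.2)]
print(top_terms_boost(expanded, 2))  # [('sherlock', 1.0), ('solr', 1.0)]
```

Limiting the expansion to N terms is what keeps such a query below the Boolean clause limit, at the price of ignoring the lower scoring terms.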
Before we finish the Query rewrite section of this chapter, we should ask ourselves one last question: when to use which rewrite type? The answer to this question greatly depends on your use case, but, to summarize, if you can live with lower precision and relevancy (but higher performance), you can go for the top N rewrite method. If you need high precision and thus more relevant queries (but lower performance), choose the Boolean approach.
Summary
The chapter you just finished was again focused on querying. We used filters and saw what highlighting is and how to use it. We learned what the highlighter types are and how they can help us. We validated our queries and we learned how Elasticsearch can help us when it comes to sorting our results. Finally, we discussed query rewriting, what it brings us, and how we can control it.
In the next chapter, we will get back to the topic of indexing. We will discuss indexing complex JSON objects such as tree-like structures and indexing data that is not flat. We will prepare Elasticsearch to handle relationships between documents and we will use the Elasticsearch API to update the structure of our indices.
Chapter 5. Extending Your Index Structure
We started the previous chapter by learning how to deal with revised filtering in Elasticsearch 2.x and what to expect from it now. We also explored highlighting and how it can help us in improving the users’ search experience. We discovered query validation in Elasticsearch and learned the ways of data sorting in Elasticsearch. Finally, we discussed query rewriting and how that affects our queries. By the end of this chapter, you will have learned the following topics:
Indexing tree-like structures
Indexing data that is not flat
Handling document relationships by using nested object and parent–child features
Modifying index structure by using Elasticsearch API
Indexing tree-like structures
Trees are everywhere. If you develop an e-commerce shop application, your products will probably be described with the use of categories. The thing about categories is that in most cases they are hierarchical. There are top categories, such as electronics, music, books, and so on. Each of the top level categories can have numerous children categories, such as fiction and science, and those can get even deeper into science fiction, romance, and so on. If you look at the file system, the files and directories are arranged in tree-like structures as well. This book can also be represented as a tree: chapters contain topics and topics are divided into subtopics. So the data around us is arranged into tree-like structures and, as you can imagine, Elasticsearch is capable of indexing tree-like structures so that we can represent the data in an easier manner. Let’s check how we can navigate through this type of data using path_analyzer.
Data structure
To begin with, let’s create a simple index structure by using the following command:
curl -XPUT 'localhost:9200/path?pretty' -d '{
"settings":{
"index":{
"analysis":{
"analyzer":{
"path_analyzer":{"tokenizer":"path_hierarchy"}
}
}
}
},
"mappings":{
"category":{
"properties":{
"category":{
"type":"string",
"fields":{
"name":{"type":"string","index":"not_analyzed"},
"path":{"type":"string","analyzer":"path_analyzer",
"store":true}
}
}
}
}
}
}'
As you can see, we have a single type created: the category type. We will use it to store and index the information about the location of our document in the tree structure. The idea is simple: we can show the location of the document as a path, in the exact same manner as the files and directories are presented on your hard disk drive. For example, in an automotive shop, we can have /cars/passenger/sport, /cars/passenger/camper, or /cars/delivery_truck/. However, to achieve that, we need to index this path in two different ways. First of all, we will use a not analyzed field called name to store and index the path in its original form. We will also use a field called path, which will use the path_analyzer analyzer that we’ve defined to process the path so it is easier to search.
Analysis
Now, let’s see what Elasticsearch will do with the category path during the analysis process. To see this, we will use the following command line, which uses the analysis API discussed in the Understanding the explain information section of Chapter 6, Make Your Search Better:
curl -XGET 'localhost:9200/path/_analyze?field=category.path&pretty' -d '/cars/passenger/sport'
The following results will be returned by Elasticsearch:
{
"tokens":[{
"token":"/cars",
"start_offset":0,
"end_offset":5,
"type":"word",
"position":0
},{
"token":"/cars/passenger",
"start_offset":0,
"end_offset":15,
"type":"word",
"position":0
},{
"token":"/cars/passenger/sport",
"start_offset":0,
"end_offset":21,
"type":"word",
"position":0
}]
}
As we can see, our category path /cars/passenger/sport was processed by Elasticsearch and divided into three tokens. Thanks to this, we can simply find every document that belongs to a given category or its subcategories using the term filter. For the example to be complete, let’s index a simple document by using the following command:
curl -XPUT 'localhost:9200/path/category/1' -d '{"category": "/cars/passenger/sport"}'
An example of using filters is as follows:
curl -XGET 'localhost:9200/path/_search?pretty' -d '{
"query":{
"bool":{
"filter":{
"term":{
"category.path":"/cars"
}
}
}
}
}'
Note that we also have the original value indexed in the category.name field. This is handy when we want to find documents from a particular path, ignoring the documents that are deeper in the hierarchy.
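The tokenization shown in this section can be imitated in a few lines of Python. This is a simplified sketch of the behavior seen in the _analyze output, not the real path_hierarchy tokenizer; the function name and the handling of empty path segments are assumptions made for this example.

```python
def path_hierarchy(path, delimiter="/"):
    """Return every ancestor prefix of a path, shortest first."""
    tokens = []
    current = ""
    for index, part in enumerate(path.split(delimiter)):
        if index > 0:
            current += delimiter
        current += part
        if part:  # skip the empty segment in front of a leading delimiter
            tokens.append(current)
    return tokens


document_path = "/cars/passenger/sport"
path_tokens = path_hierarchy(document_path)
print(path_tokens)
# ['/cars', '/cars/passenger', '/cars/passenger/sport']

# A term filter on category.path matches any ancestor category...
print("/cars" in path_tokens)      # True
# ...while the not analyzed category.name only matches the exact path.
print(document_path == "/cars")    # False
```

Because every ancestor path is indexed as its own term, a plain term filter is enough to select a whole subtree.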
Indexing data that is not flat
Not all data is flat like the examples we have used in the book until now. Most of the data you will encounter will have some structure and nested objects inside the root JSON object. Of course, if we are building our system that Elasticsearch will be a part of and we are in control of all the pieces of it, we can create a structure that is convenient for Elasticsearch. But even in such cases, flat data is not always an option. Thankfully, Elasticsearch allows us to index data that is not flat and this section will show us how to do that.
Data
Let’s assume that we have the following data (we store it in the file called structured_data.json):
{
"author":{
"name":{
"firstName":"Fyodor",
"lastName":"Dostoevsky"
}
},
"isbn":"123456789",
"englishTitle":"Crime and Punishment",
"year":1886,
"characters":[
{
"name":"Raskolnikov"
},
{
"name":"Sofia"
}
],
"copies":0
}
As you can see, the data is not flat: it contains arrays and nested objects. If we want to create mappings and use the knowledge that we’ve got so far, we will have to flatten the data. However, as we already said, Elasticsearch allows some degree of structure and we should be able to create mappings that will work for the preceding example.
Objects
The preceding example data shows the structured JSON file. As you can see in the example, our root object has some additional, simple properties, such as englishTitle, isbn, year, and copies. These will be indexed as normal fields in the index and we already know how to deal with them (we discussed that in the Mappings configuration section of Chapter 2, Indexing Your Data). In addition to that, it has the characters array type and the author object. The author object has another object nested within it: the name object, which has two properties, firstName and lastName. So as you can see, we can have multiple objects nested inside each other.
Arrays
We have already used array type data, but we didn’t talk about it. By default, all the fields in Lucene and thus in Elasticsearch are multivalued, which means that they can store multiple values. In order to send such fields for indexing to Elasticsearch, we use the JSON array type, which is enclosed within opening and closing square brackets []. As you can see in the preceding example, we used the array type for the characters of our book.
Mappings
Let’s now look at what our mappings would look like for the book object we showed earlier. We already said that to index arrays we don’t need anything special. So, in our case, to index the characters data we will need to add a field definition similar to the following one:
"characters":{
"properties":{
"name":{"type":"string"}
}
}
Nothing strange! We just nest the properties section inside the array’s name (which is characters in our case) and we define the fields there. As the result of the preceding mappings, we will get the characters.name multivalued field in the index.
We do a similar thing for our author object. We call the section with the same name as it is present in the data. We have the author object, but it also has the name object nested in it, so we do the same: we just nest another object inside it. So, our mappings for the author field would look as follows:
"author":{
"properties":{
"name":{
"properties":{
"firstName":{"type":"string"},
"lastName":{"type":"string"}
}
}
}
}
The firstName and lastName fields appear in the index as author.name.firstName and author.name.lastName.
The rest of the fields are simple core types, so I’ll skip discussing them as they were already discussed in the Mappings configuration section of Chapter 2, Indexing Your Data.
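The way nested objects and arrays end up as dotted, multivalued field names can be sketched in Python. This is only a conceptual illustration of the naming described above; the flatten function is a hypothetical helper written for this example and does not reflect how Elasticsearch parses documents internally.

```python
def flatten(value, prefix="", fields=None):
    """Collect leaf values under dotted field names; arrays add more values."""
    if fields is None:
        fields = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, fields)
    elif isinstance(value, list):
        for item in value:
            flatten(item, prefix, fields)
    else:
        fields.setdefault(prefix, []).append(value)
    return fields


book = {
    "author": {"name": {"firstName": "Fyodor", "lastName": "Dostoevsky"}},
    "isbn": "123456789",
    "characters": [{"name": "Raskolnikov"}, {"name": "Sofia"}],
    "copies": 0,
}

for field_name, values in flatten(book).items():
    print(field_name, values)
```

Running it lists author.name.firstName and author.name.lastName as single-valued fields and characters.name as a field holding both character names.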
Final mappings
So our final mappings file, which we’ve called structured_mapping.json, looks like the following:
{
"book":{
"properties":{
"author":{
"type":"object",
"properties":{
"name":{
"type":"object",
"properties":{
"firstName":{"type":"string"},
"lastName":{"type":"string"}
}
}
}
},
"isbn":{"type":"string"},
"englishTitle":{"type":"string"},
"year":{"type":"integer"},
"characters":{
"properties":{
"name":{"type":"string"}
}
},
"copies":{"type":"integer"}
}
}
}
Sending the mappings to Elasticsearch
Now that we have our mappings done, we would like to test if all the work we did actually works. This time we will use a slightly different technique of creating an index and putting the mappings. First, let’s create the library index with the following command (you need to delete the library index if you already have it):
curl -XPUT 'localhost:9200/library'
Now, let’s send our mappings for the book type:
curl -XPUT 'localhost:9200/library/book/_mapping' -d @structured_mapping.json
Now we can index our example data:
curl -XPOST 'localhost:9200/library/book/1' -d @structured_data.json
To be or not to be dynamic
As we already know, Elasticsearch is schema-less, which means that it can index data without the need of creating the mappings upfront. What Elasticsearch will do in the background when a new field is encountered in the data is a mapping update: it will try to guess the field type and add it to the mappings. The dynamic behavior of Elasticsearch is turned on by default, but there may be situations where you may want to turn it off for some parts of your index. In order to do that, one should add the dynamic property to the given object and set it to false. This should be done on the same level of nesting as the type property for the object that shouldn’t be dynamic. For example, if we want our author and name objects to not be dynamic, we should modify the relevant part of the mappings file so that it looks as follows:
"author":{
"type":"object",
"dynamic":false,
"properties":{
"name":{
"type":"object",
"dynamic":false,
"properties":{
"firstName":{"type":"string","index":"analyzed"},
"lastName":{"type":"string","index":"analyzed"}
}
}
}
}
However, remember that in order to add new fields for such objects, we would have to update the mappings.
Note
You can also turn off the dynamic mappings functionality by adding the index.mapper.dynamic property to your elasticsearch.yml configuration file and setting it to false.
Disabling object indexing
There is one additional thing that we would like to mention when it comes to objects handling: we can disable indexing a particular object by using the enabled property and setting it to false. There may be various reasons for that, such as not wanting a field to be indexed or not wanting a whole JSON object to be indexed. For example, if we want to omit an object called information from our author object, we will have the author object definition look as follows:
"author":{
"type":"object",
"properties":{
"name":{
"type":"object",
"dynamic":false,
"properties":{
"firstName":{"type":"string","index":"analyzed"},
"lastName":{"type":"string","index":"analyzed"},
"information":{"type":"object","enabled":false}
}
}
}
}
The dynamic parameter can also be set to strict. This means that new fields won’t be added to the mappings when they appear, and the indexing of a document that contains such fields will fail.
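The three values of the dynamic property (true, false, and strict) can be compared with a small Python simulation. This is a toy model rather than Elasticsearch code: the index_fields function, the exception class, and the sample field names are all hypothetical, and the model only tracks which fields would be indexed.

```python
class StrictDynamicMappingError(Exception):
    """Raised when dynamic is "strict" and an unmapped field shows up."""


def index_fields(mapped_fields, document, dynamic=True):
    """Return the fields of `document` that would be indexed.

    dynamic=True     unmapped fields are added to the mappings and indexed
    dynamic=False    unmapped fields are skipped; the mappings stay unchanged
    dynamic="strict" an unmapped field makes the whole request fail
    """
    indexed = {}
    for field, value in document.items():
        if field in mapped_fields:
            indexed[field] = value
        elif dynamic == "strict":
            raise StrictDynamicMappingError(f"unmapped field: {field}")
        elif dynamic is True:
            mapped_fields.add(field)  # the simulated mapping update
            indexed[field] = value
    return indexed


name_object = {"firstName": "Fyodor", "nickname": "Fedya"}
print(index_fields({"firstName", "lastName"}, name_object, dynamic=False))
# {'firstName': 'Fyodor'}
```

With dynamic set to false the extra field is silently left out of the index, which is why new fields for such objects require an explicit mapping update.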
Using nested objects
Nested objects can come in handy in certain situations. Basically, with nested objects Elasticsearch allows us to connect multiple documents together: one main document and multiple dependent ones. The main document and the nested ones are indexed together and they are placed in the same segment of the index (actually, in the same block inside the segment, near each other), which guarantees the best performance we can get for such a data structure. The same goes for changing the document; unless you are using the update API, you need to index the parent document and all the other nested ones at the same time.
Note
If you would like to read more about how nested objects work on the Apache Lucene level, there is a very good blog post written by Mike McCandless at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.
Now let’s get on with our example use case. Imagine that we have a shop with clothes and we store the size and color of each t-shirt. Our standard, non-nested mappings will look like this (stored in cloth.json):
{
"cloth":{
"properties":{
"name":{"type":"string"},
"size":{"type":"string","index":"not_analyzed"},
"color":{"type":"string","index":"not_analyzed"}
}
}
}
To create the shop index with our cloth mapping, we run the following commands:
curl -XPOST 'localhost:9200/shop'
curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d @cloth.json
Now imagine that we have a t-shirt in our shop that we only have in XXL size in red and in XL size in black. So our example document indexation command will look as follows:
curl -XPOST 'localhost:9200/shop/cloth/1' -d '{
"name":"Test shirt",
"size":["XXL","XL"],
"color":["red","black"]
}'
However, there is a problem with such a data structure. What if one of our clients searches our shop in order to find the XXL t-shirt in black? Let’s check that by running the following query (we assume that we’ve used our mappings to create the index and we’ve indexed our example document):
curl -XGET 'localhost:9200/shop/cloth/_search?pretty=true' -d '{
"query":{
"bool":{
"must":[
{
"term":{"size":"XXL"}
},
{
"term":{"color":"black"}
}
]
}
}
}'
We should get no results, right? But in fact Elasticsearch returned the following document:
{
(…)
"hits":{
"total":1,
"max_score":0.4339554,
"hits":[{
"_index":"shop",
"_type":"cloth",
"_id":"1",
"_score":0.4339554,
"_source":{
"name":"Test shirt",
"size":["XXL","XL"],
"color":["red","black"]
}
}]
}
}
This is because the document was matched: we have the values we are searching for in the size field and in the color field, even though they don’t belong to the same variation. Of course, this is not what we would like to get.
So, let’s modify our mappings to use the nested objects to separate color and size into different nested documents. The final mapping looks as follows (we store these mappings in the cloth_nested.json file):
{
"cloth":{
"properties":{
"name":{"type":"string","index":"analyzed"},
"variation":{
"type":"nested",
"properties":{
"size":{"type":"string","index":"not_analyzed"},
"color":{"type":"string","index":"not_analyzed"}
}
}
}
}
}
Now, we will create a second index called shop_nested using our modified mappings by running the following commands:
curl -XPOST 'localhost:9200/shop_nested'
curl -XPUT 'localhost:9200/shop_nested/cloth/_mapping' -d @cloth_nested.json
As you can see, we’ve introduced a new object inside our cloth type: the variation one, which is a nested one (the type property is set to nested). It basically says that we will want to index the nested documents. Now, let’s modify our document. We will add the variation object to it and that object will store the objects with two properties, size and color. So the index command for our modified example product will look like the following:
curl -XPOST 'localhost:9200/shop_nested/cloth/1' -d '{
"name":"Test shirt",
"variation":[
{"size":"XXL","color":"red"},
{"size":"XL","color":"black"}
]
}'
We’ve structured the document so that each size and its matching color is a separate document. However, if you run our previous query, it won’t return any documents. This is because in order to query for nested documents, we need to use a specialized query. So now our query looks as follows:
curl -XGET 'localhost:9200/shop_nested/cloth/_search?pretty=true' -d '{
"query":{
"nested":{
"path":"variation",
"query":{
"bool":{
"must":[
{"term":{"variation.size":"XXL"}},
{"term":{"variation.color":"black"}}
]
}
}
}
}
}'
And now, the preceding query will not return the indexed document, because we don’t have a nested document that has the size equal to XXL and the color black.
Let’s get back to the query for a second to discuss it briefly. As you can see, we used the nested query in order to search in the nested documents. The path property specifies the name of the nested object (yes, we can have multiple of them). We just included a standard query section under the nested type. Also note that we specified the full path for the field names in the nested objects, which is handy when you have multilevel nesting, which is also possible.
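The difference between the flat and the nested document can be reproduced with plain Python data structures. The sketch below is only a model of the matching logic discussed in this section; the two helper functions are hypothetical and have nothing to do with how Elasticsearch executes queries.

```python
# The flat document: sizes and colors lose their pairing.
flat_doc = {"size": ["XXL", "XL"], "color": ["red", "black"]}

# The nested document: every variation keeps its own size and color.
nested_doc = {
    "variation": [
        {"size": "XXL", "color": "red"},
        {"size": "XL", "color": "black"},
    ]
}


def flat_match(doc, size, color):
    """Both values only need to appear somewhere in the document."""
    return size in doc["size"] and color in doc["color"]


def nested_match(doc, size, color):
    """Both values must appear together in a single variation."""
    return any(
        variation["size"] == size and variation["color"] == color
        for variation in doc["variation"]
    )


print(flat_match(flat_doc, "XXL", "black"))      # True, the unwanted hit
print(nested_match(nested_doc, "XXL", "black"))  # False
print(nested_match(nested_doc, "XXL", "red"))    # True
```

The nested version only succeeds when a single variation satisfies every clause, which is exactly the guarantee the nested query gives us.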
Scoring and nested queries
There is one additional property when it comes to handling nested documents during query. In addition to the path property, there is the score_mode property, which allows us to define how the scoring is calculated from the nested queries. Elasticsearch allows us to set the score_mode property to one of the following values:
avg: This is the default value. Using it for the score_mode property will result in Elasticsearch taking the average value calculated from the scores of the defined nested queries. The calculated average will be included in the score of the main query.
sum: Using this value for the score_mode property will result in Elasticsearch taking a sum of the scores for each nested query and including it in the score of the main query.
min: Using this value for the score_mode property will result in Elasticsearch taking the score of the minimum scoring nested query and including it in the score of the main query.
max: Using this value for the score_mode property will result in Elasticsearch taking the score of the maximum scoring nested query and including it in the score of the main query.
none: Using this value for the score_mode property will result in no score being taken from the nested query.
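As a rough illustration of the score_mode values listed above, the following Python sketch combines a list of scores from matching nested documents into a single value. The function name and the sample scores are made up, and returning None for the none mode is simply this sketch's way of saying that no score is taken from the nested query.

```python
def combine_scores(nested_scores, score_mode="avg"):
    """Combine the scores of matching nested documents into one value."""
    if not nested_scores:
        raise ValueError("at least one matching nested document is required")
    if score_mode == "avg":
        return sum(nested_scores) / len(nested_scores)
    if score_mode == "sum":
        return sum(nested_scores)
    if score_mode == "min":
        return min(nested_scores)
    if score_mode == "max":
        return max(nested_scores)
    if score_mode == "none":
        return None  # no score is taken from the nested query
    raise ValueError(f"unknown score_mode: {score_mode}")


scores = [1.0, 3.0]  # made-up scores of two matching nested documents
for mode in ("avg", "sum", "min", "max", "none"):
    print(mode, combine_scores(scores, mode))
```

The same set of modes, with a different default, appears again for the has_child query later in this chapter.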
Using the parent-child relationship
In the previous section, we discussed using Elasticsearch to index the nested documents along with the parent one. However, even though the nested documents are indexed as separate documents in the index, we can’t change a single nested document (unless we use the update API). Elasticsearch allows us to have a real parent-child relationship and we will look at it in the following section.
Index structure and data indexing
Let’s use the same example that we used when discussing the nested documents: the hypothetical cloth store. What we would like to have is the ability to update the sizes and colors without the need to index the whole parent document after each change. We will see how to achieve that using Elasticsearch parent-child functionality.
Child mappings
First we have to create a child index definition. To create child mappings, we need to add the _parent property with the name of the parent type, which will be cloth in our case. In the children documents, we want to have the size and the color of the cloth. So, the commands that will create the shop index and the variation type will look as follows:
curl -XPOST 'localhost:9200/shop'
curl -XPUT 'localhost:9200/shop/variation/_mapping' -d '{
"variation":{
"_parent":{"type":"cloth"},
"properties":{
"size":{"type":"string","index":"not_analyzed"},
"color":{"type":"string","index":"not_analyzed"}
}
}
}'
And that’s all. You don’t need to specify which field will be used to connect the child documents to the parent ones. By default, Elasticsearch will use the documents’ unique identifier for that. If you remember from the previous chapters, the information about a unique identifier is present in the index by default.
Parent mappings
The only field we need to have in our parent document is name. We don’t need anything more than that. So, in order to create our cloth type in the shop index, we will run the following command:
curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d '{
"cloth":{
"properties":{
"name":{"type":"string"}
}
}
}'
The parent document
Now we are going to index our parent document. As we want to store the information about the size and the color in the child documents, the only thing we need to have in the parent documents is the name. Of course, there is one thing to remember: our parent documents need to be of type cloth, because of the _parent property value in the child mappings. The indexing command for our parent document is very simple and looks as follows:
curl -XPOST 'localhost:9200/shop/cloth/1' -d '{
"name":"Test shirt"
}'
If you look at the preceding command, you’ll notice that our document will be given the identifier 1.
Child documents
To index the child documents, we need to provide information about the parent document with the use of the parent request parameter. The value of the parent parameter should point to the identifier of the parent document. So, to index two child documents to our parent document, we need to run the following command lines:
curl -XPOST 'localhost:9200/shop/variation/1000?parent=1' -d '{
"color":"red",
"size":"XXL"
}'
curl -XPOST 'localhost:9200/shop/variation/1001?parent=1' -d '{
"color":"black",
"size":"XL"
}'
And that’s all. We’ve indexed two additional documents, which are of our variation type, but we’ve specified that our documents have a parent, the document with an identifier of 1.
Querying
We’ve indexed our data and now we need to use appropriate queries to match the documents with the data stored in their children. This is because, by default, Elasticsearch searches on the documents without looking at the parent-child relations. For example, the following query will match all three documents that we’ve indexed (two children and one parent):
curl -XGET 'localhost:9200/shop/_search?q=*&pretty'
This is not what we would like to achieve, at least in most cases. Usually, we are interested in parent documents that have children matching the query. Of course Elasticsearch provides such functionalities with specialized types of queries.
Note
The thing to remember though is that, when running queries against parents, the children documents won’t be returned, and vice versa.
Querying data in the child documents
Imagine that we want to get clothes that are of the XXL size and are red. As you recall, the size and the color of the cloth are indexed in the child documents, so we need a specialized has_child query to check which parent documents have children with the desired size and color. So an example query that matches our requirement looks as follows:
curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
"query":{
"has_child":{
"type":"variation",
"query":{
"bool":{
"must":[
{"term":{"size":"XXL"}},
{"term":{"color":"red"}}
]
}
}
}
}
}'
The query is quite simple; it is of the has_child type, which tells Elasticsearch that we want to search in the child documents. In order to specify which type of children we are interested in, we specify the type property with the name of the child type. The query is provided using the query property. We’ve used a standard bool query, which we’ve already discussed. The result of the query will contain only those parent documents that have children matching our bool query. In our case, the single document returned looks as follows:
{
"took":16,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":1.0,
"hits":[{
"_index":"shop",
"_type":"cloth",
"_id":"1",
"_score":1.0,
"_source":{
"name":"Test shirt"
}
}]
}
}
The has_child query allows us to provide additional parameters to control its behavior. Every parent document found may be connected with one or more child documents. This means that every child document can influence the resulting score. By default, the query doesn’t care about the children documents, how many of them matched, and what their content is; it only matters whether they match the query or not. This can be changed by using the score_mode parameter, which controls the score calculation of the has_child query. The values this parameter can take are:
none: The default one, the score generated by the relation is 1.0
min: The score is taken from the lowest scored child
max: The score is taken from the highest scored child
sum: The score is calculated as the sum of the child scores
avg: The score is taken as the average of the child scores
Let’s see an example:
curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
"query":{
"has_child":{
"type":"variation",
"score_mode":"sum",
"query":{
"bool":{
"must":[
{"term":{"size":"XXL"}},
{"term":{"color":"red"}}
]
}
}
}
}
}'
We used sum as score_mode, which results in children contributing to the final score of the parent document: the contribution is the sum of scores of every child document matching the query.
And finally, we can limit the number of children documents that need to be matched; we can specify both the maximum number of the children documents allowed to be matched (the max_children property) and the minimum number of children documents (the min_children property) that need to be matched. The query illustrating the usage of these parameters is as follows:
curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
"query":{
"has_child":{
"type":"variation",
"min_children":1,
"max_children":3,
"query":{
"bool":{
"must":[
{"term":{"size":"XXL"}},
{"term":{"color":"red"}}
]
}
}
}
}
}'
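The selection logic of has_child, including the min_children and max_children limits, can be sketched in Python on top of the two child documents indexed earlier. This is a conceptual model only; the has_child function below is a hypothetical helper, and the children list is a hand-written copy of the example data with the parent identifier stored under a _parent key.

```python
children = [
    {"_id": "1000", "_parent": "1", "size": "XXL", "color": "red"},
    {"_id": "1001", "_parent": "1", "size": "XL", "color": "black"},
]


def has_child(children, predicate, min_children=1, max_children=None):
    """Return ids of parents whose number of matching children is in range."""
    matches_per_parent = {}
    for child in children:
        if predicate(child):
            parent_id = child["_parent"]
            matches_per_parent[parent_id] = matches_per_parent.get(parent_id, 0) + 1
    return sorted(
        parent_id
        for parent_id, count in matches_per_parent.items()
        if count >= min_children
        and (max_children is None or count <= max_children)
    )


def is_xxl_red(child):
    """The bool query from the example: size XXL and color red."""
    return child["size"] == "XXL" and child["color"] == "red"


print(has_child(children, is_xxl_red, min_children=1, max_children=3))  # ['1']
print(has_child(children, is_xxl_red, min_children=2))                  # []
```

With min_children set to 2 the parent drops out, because only one of its children is both XXL and red.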
Querying data in the parent documents
Sometimes, we are not interested in the parent documents but in the children documents. If you would like to return the child documents that match given data in the parent document, Elasticsearch has a query for us: the has_parent query. It is similar to the has_child query; however, instead of the type property, we specify the parent_type property with the value of the parent document type. For example, the following query will return both the child documents that we’ve indexed, but not the parent document:
curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
"query":{
"has_parent":{
"parent_type":"cloth",
"query":{
"term":{"name":"test"}
}
}
}
}'
The response from Elasticsearch will be similar to the following one:
{
"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":2,
"max_score":1.0,
"hits":[{
"_index":"shop",
"_type":"variation",
"_id":"1000",
"_score":1.0,
"_routing":"1",
"_parent":"1",
"_source":{
"color":"red",
"size":"XXL"
}
},{
"_index":"shop",
"_type":"variation",
"_id":"1001",
"_score":1.0,
"_routing":"1",
"_parent":"1",
"_source":{
"color":"black",
"size":"XL"
}
}]
}
}
Similar to the has_child query, the has_parent query also gives us the possibility of tuning the score calculation of the query. In this case, score_mode has only two options: none, the default one where the score calculated by the query is equal to 1.0, and score, which calculates the score of the document on the basis of the parent document contents. An example that uses score_mode in the has_parent query looks as follows:
curl -XGET 'localhost:9200/shop/_search?pretty' -d '{
"query":{
"has_parent":{
"parent_type":"cloth",
"score_mode":"score",
"query":{
"term":{"name":"test"}
}
}
}
}'
The one difference with the previous example is score_mode. If you check the results of these queries, you’ll notice only a single difference. The score of all the documents from the first example is 1.0, while the score for the results returned by the preceding query is equal to 0.8784157. In this case, all the documents found have the same score, because they have a common parent document.
Performance considerations
When using Elasticsearch parent-child functionality, you have to be aware of the performance impact that it has. The first thing you need to remember is that the parent and the child documents need to be stored in the same shard in order for the queries to work. If you happen to have a high number of children for a single parent, you may end up with shards not having a similar number of documents. Because of that, your query performance can be lower on one of the nodes, resulting in the whole query being slower. Also, remember that parent-child queries will be slower than ones that run against the documents that don’t have a relationship between them. There is a way of speeding up joins for the parent-child queries at the cost of memory by eagerly loading the so called global ordinals; however, we will discuss that method in the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail.
Finally, the first query will preload and cache the document identifiers using the doc values. This takes time. In order to improve the performance of initial queries that use the parent-child relationship, the Warmer API can be used. You can find more information about how to add warming queries to Elasticsearch in the Warming up section of Chapter 10, Administrating Your Cluster.
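The requirement that parents and children live in the same shard can be illustrated with a simplified Python model of routing. Everything here is an assumption made for the sake of the example: zlib.crc32 is only a stand-in for whatever hash function Elasticsearch really uses, and shard_for and routing_for are hypothetical helpers. The only point is that a child routed by its parent's identifier always lands where the parent does, as the _routing values in the earlier has_parent response suggest.

```python
import zlib


def routing_for(doc_id, parent=None):
    """A child is routed by its parent's id; any other document by its own id."""
    return parent if parent is not None else doc_id


def shard_for(routing, number_of_shards=5):
    """Pick a shard from the routing value (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


parent_shard = shard_for(routing_for("1"))
child_shards = [
    shard_for(routing_for("1000", parent="1")),
    shard_for(routing_for("1001", parent="1")),
]
print(parent_shard, child_shards)
```

The same model also shows where the skew mentioned above comes from: a parent with a very large number of children puts all of them on a single shard.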
Modifying your index structure with the update API
In the previous chapters, we discussed how to create index mappings and index the data. But what if you already have the mappings created, and data indexed, but you want to modify the structure of the index? Of course one could say that we could just create a new index with new mappings, but that is not always a possibility, especially in a production environment. Modifying the structure of an existing index is possible to some extent. For example, by default, if we index a document with a new field, Elasticsearch will add that field to the index structure. Let’s now look at how to modify the index structure manually.
Note
For situations where mapping changes are needed and they are not possible because of conflicts with the current index structure, it is very good to use aliases, both read and write ones. We will discuss aliasing in the Index aliasing section of Chapter 10, Administrating Your Cluster.
The mappings
Let’s assume that we have the following mappings for our users index stored in the user.json file:
{
"user":{
"properties":{
"name":{"type":"string"}
}
}
}
As you can see, it is very simple. It just has a single property that will hold the user name. Now let’s create an index called users and let’s use the preceding mappings to create our type. To do that, we will run the following commands:
curl -XPOST 'localhost:9200/users'
curl -XPUT 'localhost:9200/users/user/_mapping' -d @user.json
If everything goes well, we will have our index (called users) and type (called user) created. So now let’s try to add a new field to the mappings.
Adding a new field to the existing index
In order to illustrate how to add a new field to our mappings, we assume that we want to add a phone number to the data stored for each user. In order to do that, we need to send an HTTP PUT command to the /index_name/type_name/_mapping REST endpoint with the proper body that will include our new field. For example, to add the mentioned phone field, we will run the following command:
curl -XPUT 'http://localhost:9200/users/user/_mapping' -d '{
"user":{
"properties":{
"phone":{"type":"string","index":"not_analyzed"}
}
}
}'
Similar to the previous command we ran, if everything goes well, we should have a new field added to our index structure.
Note
Of course, Elasticsearch won’t reindex our data or populate the newly added field automatically. It will just alter the mappings held by the master node and propagate them to all the other nodes in the cluster, and that’s all. Reindexing the data must be done by us or by the application that indexes the data in our environment. Until then, the old documents won’t have the newly added field. This is crucial to remember. If you don’t have the original documents, you can use the _source field to get the original data from Elasticsearch and index them once again.
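The reindexing step mentioned in the note can be sketched roughly as follows. This is a minimal illustration, not a complete reindexing tool: it assumes the documents were already fetched with a search request, and it only builds the newline-delimited body that would be sent to the _bulk endpoint. The function name build_bulk_body and the sample hits are hypothetical.

```python
import json

def build_bulk_body(hits, index, doc_type):
    """Turn search hits into a newline-delimited _bulk request body."""
    lines = []
    for hit in hits:
        # Action line: index the document again under its original identifier.
        action = {"index": {"_index": index, "_type": doc_type, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the original document as returned in _source.
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects every line, including the last one, to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""

# Hypothetical hits, shaped like the "hits.hits" array of a search response.
sample_hits = [
    {"_id": "1", "_source": {"name": "Test user"}},
    {"_id": "2", "_source": {"name": "Another user"}},
]
body = build_bulk_body(sample_hits, "users", "user")
print(body)
```

The resulting text could then be posted to the _bulk endpoint; for large indices, the hits would have to be fetched and sent in batches.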
To ensure everything is okay, we can send an HTTP GET request to the _mapping REST endpoint and Elasticsearch will return the appropriate mappings. An example command to get the mappings for our user type in the users index will look as follows:
curl -XGET 'localhost:9200/users/user/_mapping?pretty'
Modifying fields of an existing index
Our users index structure contains two fields: name and phone. Let’s imagine that we indexed some data, but after a while we decided that we want to search on the phone field and would like to change its index property from not_analyzed to analyzed. Because we already know how to alter the index structure, we will run the following command:
curl -XPUT 'http://localhost:9200/users/user/_mapping?pretty' -d '{
"user":{
"properties":{
"phone":{"type":"string","store":"yes","index":"analyzed"}
}
}
}'
What Elasticsearch will return is a response indicating an error, which looks as follows:
{
"error":{
"root_cause":[{
"type":"illegal_argument_exception",
"reason":"Mapper for [phone] conflicts with existing mapping in other types:\n[mapper [phone] has different [index] values, mapper [phone] has different [store] values, mapper [phone] has different [omit_norms] values, cannot change from disable to enabled, mapper [phone] has different [analyzer]]"
}],
"type":"illegal_argument_exception",
"reason":"Mapper for [phone] conflicts with existing mapping in other types:\n[mapper [phone] has different [index] values, mapper [phone] has different [store] values, mapper [phone] has different [omit_norms] values, cannot change from disable to enabled, mapper [phone] has different [analyzer]]"
},
"status":400
}
This is because we can’t change a field that was set to be not_analyzed to one that is analyzed. And not only that: in most cases, you won’t be able to update the mapping of an existing field. This is a good thing, because if we were allowed to change such settings, we would confuse Elasticsearch and Lucene. Imagine that we already have many documents with the phone field set to not_analyzed and we are allowed to change the mappings to analyzed. Elasticsearch wouldn’t change the data that was already indexed, but the queries that are analyzed would be processed with a different logic, and thus you wouldn’t be able to properly find your data.
However, to give you some examples of what is prohibited and what is not, we decided to mention some of the operations for both cases. For example, the following modifications can be safely made:
Adding a new type definition
Adding a new field
Adding a new analyzer
The following modifications are prohibited or will not work:
Enabling norms for a field
Changing a field to be stored or not stored
Changing the type of the field (for example, from text to numeric)
Changing the value of the indexed property
Changing the analyzer of an already indexed document
Remember that the preceding examples of allowed and not allowed updates do not cover all the possibilities of update API usage, and you have to try for yourself whether the update you are trying to do will work.
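Before sending a mapping update, it can help to compare it with the current mappings on the client side. The following sketch is only an illustration of the additive versus conflicting distinction described above; it is not how Elasticsearch validates updates, it is deliberately conservative (any setting whose value differs or was not set explicitly before is reported), and the function name conflicting_changes is hypothetical. Elasticsearch itself remains the authority on what is accepted.

```python
def conflicting_changes(current_props, update_props):
    """Return sorted (field, setting) pairs whose values would change."""
    conflicts = []
    for field, new_def in update_props.items():
        old_def = current_props.get(field)
        if old_def is None:
            continue  # a brand new field is an additive change
        for setting, new_value in new_def.items():
            if old_def.get(setting) != new_value:
                conflicts.append((field, setting))
    return sorted(conflicts)

# Current properties of the user type, as built earlier in this section.
current = {
    "name": {"type": "string"},
    "phone": {"type": "string", "index": "not_analyzed"},
}
add_field = {"email": {"type": "string"}}  # hypothetical new field
change_field = {"phone": {"type": "string", "store": "yes", "index": "analyzed"}}

print(conflicting_changes(current, add_field))     # no conflicts expected
print(conflicting_changes(current, change_field))  # index and store differ
```

An empty result only means that the update adds new fields; a non-empty one is a hint that the request is likely to be rejected, as in the phone example.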
Summary
The chapter you just finished reading concentrated on indexing operations and on handling data that is not flat or that has relationships between documents. We started with indexing tree-like structures and objects in Elasticsearch. We also used nested objects and learned when they can be used. We then used the parent-child functionality and learned how this approach differs from nested documents. Finally, we modified our index structure with an API call and learned when this is possible.
In the next chapter, we will get back to query-related topics. We will learn how Lucene scoring works, how to use scripts in Elasticsearch, and how to handle multilingual data. We will affect scoring using boosts and we will use synonyms to improve users’ search results. Finally, we will look at what we can do to see how our documents were scored.
Chapter 6. Make Your Search Better
In the previous chapter, we focused on indexing operations; we learned how to handle structured data. We started with indexing tree-like structures and JSON objects. We used nested objects and indexed documents using the parent-child functionality. Finally, at the end of the chapter, we used the Elasticsearch API to modify our index structures. By the end of this chapter, you will have learned the following topics:
Understanding how Apache Lucene scoring works
Using scripting
Handling multilingual data
Using boosting to affect document scoring
Using synonyms
Understanding how your documents were scored
Introduction to Apache Lucene scoring
When talking about queries and their relevance, we can’t omit information about the scoring and where it comes from. But what is a score? The score is a property that describes the relevance of a document in the context of a query. In the following section, we will talk about the default Apache Lucene scoring mechanism, the TF/IDF algorithm, and how it affects the returned documents.
Note
TF/IDF is not the only available algorithm exposed by Elasticsearch. For more information about the available models, refer to the Available similarity models section in Chapter 2, Indexing Your Data. You can also refer to the books Mastering Elasticsearch and Mastering Elasticsearch Second Edition published by Packt Publishing.
When a document is matched
When a document is returned by Lucene, it means that it matched the query we sent to it. In most cases, each of the resulting documents in the response is given a score. The higher the score, the more relevant the document is from the search engine’s point of view, of course in the context of a given query. This means that the score calculated for the same document for two different queries will be different. Because of that, comparing scores between queries usually doesn’t make much sense. However, let’s get back to the scoring. To calculate the score property for a document, multiple factors are taken into account:
document boost: The boost value given for a document during indexing.
field boost: The boost value given for a field during querying and indexing.
coord: The coordination factor that is based on the number of query terms the document has. It is responsible for giving more value to the documents that contain more search terms compared to the other documents.
inverse document frequency: The term-based factor that tells the scoring formula how rare the given term is. The higher the inverse document frequency, the less common the term is.
length norm: The field-based factor for normalization based on the number of terms the given field contains. The longer the field, the smaller the boost this factor will give. It basically means that the shorter documents will be favored.
term frequency: The term-based factor describing how many times the given term occurs in a document. The higher the term frequency, the higher the score of the document will be.
query norm: The query-based normalization factor that is calculated from the sum of the squared weights of the query terms. Query norm is used to allow score comparison between queries, which, as we said, is not always easy or possible.
Default scoring formula
The practical formula for the TF/IDF algorithm looks as follows:
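For reference, the practical scoring function as documented in the Lucene TFIDFSimilarity Javadocs can be written as shown next. The notation is a transcription of that documentation (boost(t) stands for the query-time term boost); note that the inverse document frequency enters the sum squared.

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\,\cdot\,\mathrm{queryNorm}(q)\,\cdot
\sum_{t \in q} \Big( \mathrm{tf}(t \in d)\,\cdot\,\mathrm{idf}(t)^{2}\,\cdot\,\mathrm{boost}(t)\,\cdot\,\mathrm{norm}(t,d) \Big)
```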
To adjust your query relevance, you don’t need to remember the details of the equation, but it is very important to know how it works, or at least to be aware that there is an equation you can analyze. We can see that the score for the document is a function of query q and document d. There are also two factors that are not directly dependent on the query terms: coord and queryNorm. These two elements of the formula are multiplied by the sum calculated for each term in the query. The sum, on the other hand, is calculated by multiplying the term frequency for the given term, its inverse document frequency, the term boost, and the norm, which is the length norm we discussed previously.
Note
Note that the preceding formula is a practical one. You can find more information about the conceptual formula in the Lucene Javadocs at http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
The good thing about the preceding rules is that you don’t need to remember all of that. What you should be aware of is what matters when it comes to the document score. Basically, there are a few rules which come from the preceding equation:
The rarer the matched term is, the higher the score the document will have
The shorter the document fields are (the fewer terms they have), the higher the score the document will have
The higher the boost for the fields is, the higher the score the document will have
As we can see, Lucene gives a higher score to documents that have many query terms matched and that have shorter fields (fewer terms indexed) used for matching, and it also favors rarer terms over common ones (of course, only the ones that matched).
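The three rules can be checked with a small calculation. The sketch below uses the component functions of Lucene's classic TF/IDF similarity as they are commonly documented (square root of the term frequency, logarithmic inverse document frequency, inverse square root of the field length); it ignores coord, queryNorm, and the lossy encoding of norms, and all the numbers are made up for illustration.

```python
import math

def tf(freq):
    # Term frequency factor: square root of the raw occurrence count.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms get a larger value.
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def length_norm(num_terms):
    # Shorter fields get a larger norm.
    return 1.0 / math.sqrt(num_terms)

def term_score(freq, doc_freq, num_docs, field_length, boost=1.0):
    # One summand of the practical formula: tf * idf^2 * boost * norm.
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * boost * length_norm(field_length)

# Hypothetical index with 1000 documents.
rare = term_score(freq=1, doc_freq=2, num_docs=1000, field_length=10)
common = term_score(freq=1, doc_freq=500, num_docs=1000, field_length=10)
short_field = term_score(freq=1, doc_freq=2, num_docs=1000, field_length=5)
long_field = term_score(freq=1, doc_freq=2, num_docs=1000, field_length=50)
boosted = term_score(freq=1, doc_freq=2, num_docs=1000, field_length=10, boost=2.0)

print(rare > common, short_field > long_field, boosted > rare)
```

Each comparison mirrors one of the rules: a rarer term, a shorter field, and a higher boost all increase the contribution of a matched term.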
Relevancy matters
In most cases, we want to get the best matching documents. However, the most relevant documents are not always the same as the best matches. Some use cases define very strict rules on why a given document should be higher on the results list. For example, one could say that, in addition to the document being a perfect match in terms of TF/IDF similarity, we have paying customers to consider. Depending on the customer plan, we want to give more importance to such documents. In such cases, we could want the documents for the customers that pay the most to be on top of the search results. Of course, this is not something TF/IDF takes into account.
The other example is yellow pages, where customers pay for more information describing the document. Such large documents may not be the most relevant ones according to TF/IDF, so you may want to adjust the scoring if you are working with such data.
These are very simple examples, and Elasticsearch queries can become really complicated. We will talk about such queries in the Influencing scores with query boosts section in this chapter.
When working on search relevance, you should always remember that it is not a one-time process. Your data will change with time and your queries will need to be adjusted. In most cases, tuning the query relevancy will be constant work. You will need to react to your business rules and needs, to how the users behave, and so on.
Scripting capabilities of Elasticsearch
Elasticsearch has a few functionalities where scripts can be used. You’ve already seen examples such as updating documents and searching. We will also use the scripting capabilities of Elasticsearch when we discuss aggregations. Even though scripts seem to be a rather advanced topic, we will look at the possibilities offered by Elasticsearch. That’s because scripts are priceless in certain situations.
Elasticsearch can use several languages for scripting. When not explicitly declared, it assumes that Groovy (www.groovy-lang.org/) is used. Other languages available out of the box are the Lucene expression language and Mustache (https://mustache.github.io/). Of course, we can use plugins, which will make Elasticsearch understand additional scripting languages, such as JavaScript, MVEL, and Python. The thing worth mentioning is that, independently of the scripting language that we choose, Elasticsearch exposes objects that we can use in our scripts. Let’s start by briefly looking at what type of information we are allowed to use in our scripts.
Objects available during script execution
During different operations, Elasticsearch allows us to use different objects in our scripts. To develop a script that fits our use case, we should be familiar with these objects.
For example, during a search operation, the following objects are available:
_doc (also available as doc): This is an instance of the org.elasticsearch.search.lookup.LeafDocLookup object. It gives us access to the current document found with the calculated score and field values.
_source: This is an instance of the org.elasticsearch.search.lookup.SourceLookup object. It provides access to the source of the current document and the values defined in the source.
_fields: This is an instance of the org.elasticsearch.search.lookup.LeafFieldsLookup object. It can be used to access the values of the document fields.
On the other hand, during a document update operation, the preceding variables are not accessible. Elasticsearch exposes only the ctx object with the _source property, which provides access to the document currently processed in the update request.
As we have previously seen, several methods are mentioned in the context of document fields and their values. Let’s now look at examples of how to get the value for a particular field using the previously mentioned objects available during the search operation. In the brackets after the script piece, you can see what Elasticsearch will return for one of our example documents from the library index (we will use the document with identifier 4):
_doc.title.value (and)
_source.title (crime and punishment)
_fields.title.value (null)
A bit confusing, isn’t it? During indexing, the original document is by default stored in the _source field. Of course, by default, all the fields are present in that _source field. In addition to that, the document is parsed and every field may be stored in an index if it is marked as stored (that is, if the store property is set to true; otherwise, by default, the fields are not stored). Finally, the field value may be configured as indexed. This means that the field value is analyzed and placed in the index. To sum up, one field may land in the Elasticsearch index in the following ways:
As a part of the _source document
As a stored and unparsed original value
As an indexed value that is processed by an analyzer
In scripts, we have access to all these field representations. The only exception is the update operation, which, as we’ve mentioned before, gives us access only to the document _source as part of the ctx variable. You may wonder which version you should use. Well, if you want access to the processed form, the answer is simple: use the _doc object. What about _source and _fields? In most cases, _source is a good choice. It is usually fast and needs fewer disk operations than reading the original field values from the index. This is especially true when you need to read the values of multiple fields in your scripts; fetching a single _source field is faster than fetching multiple independent fields from the index.
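To make the three results shown earlier less surprising, here is a toy model of the three representations. It is only one plausible reading of the behavior, not Elasticsearch code: lowercasing and splitting on whitespace stands in for a real analyzer, the assumption is that the analyzed view returns the first term in sorted order, and the function names and the sample document are hypothetical.

```python
def source_view(document, field):
    # _source: the value exactly as it was sent for indexing.
    return document.get(field)

def doc_view(document, field):
    # _doc: terms produced by analysis; the single value is taken to be
    # the first term in sorted order (an assumption of this sketch).
    terms = sorted(set(str(document[field]).lower().split()))
    return terms[0] if terms else None

def fields_view(document, field, stored_fields=()):
    # _fields: only fields explicitly marked as stored are available.
    return document[field] if field in stored_fields else None

book = {"title": "Crime and Punishment"}  # hypothetical sample document

print(source_view(book, "title"))
print(doc_view(book, "title"))
print(fields_view(book, "title"))
```

Under these assumptions the analyzed view yields a single term of the title, the source view yields the whole original value, and the stored view yields nothing because the field was not marked as stored.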
Script types
Elasticsearch allows us to use scripts in three different ways:
Inline scripts: The source of the script is directly defined in the query
In file scripts: The source is defined in an external file placed in the Elasticsearch config/scripts directory
As a document in the dedicated index: The source of the script is defined as a document in a special index available by using the /_scripts API endpoint
Choosing the way to define scripts depends on several factors. If you have scripts which you will use in many different queries, the file or the dedicated index seem to be the best solutions. Keeping scripts in files is probably less convenient, but it is preferred from the security point of view; such scripts can’t be overwritten or injected into your query, causing a security breach.
In file scripts
This is the only way to use scripts if we don’t want to enable dynamic scripting in Elasticsearch. The idea is that every script used by the queries is defined in its own file placed in the config/scripts directory. We will now look at this method of using scripts. Let’s create an example file called tag_sort.groovy and place it in the config/scripts directory of our Elasticsearch instance (or instances if we run a cluster). The content of the mentioned file should look like this:
_doc.tags.values.size() > 0 ? _doc.tags.values[0] : '\u19999'
After a few seconds, Elasticsearch will automatically load the new file. You should see something like the following in the Elasticsearch logs:
[2015-08-30 13:14:33,005][INFO][script][Alex Wilder] compiling script file [/Users/negativ/Developer/ES/es-current/config/scripts/tag_sort.groovy]
Note
If you have a multi-node cluster, you have to make sure that the script is available on every node.
Now we are ready to use this script in our queries. You may remember that we used exactly the same script in the Sorting data section in Chapter 4, Extending Your Querying Knowledge. Now the modified query that uses our script stored in the file looks as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":{
"_script":{
"script":{
"file":"tag_sort"
},
"type":"string",
"order":"asc"
}
}
}'
We will return to this, but first let’s look at the next possible way of defining scripts: inline scripts.
Inline scripts
Inline scripts are a more convenient way of using scripts, especially for constantly changing queries and for ad-hoc queries. The main drawback of such an approach is security. If we allow users to run any kind of query, including scripts, we can expose our Elasticsearch instance to attackers. Such attacks can execute arbitrary code on the server running Elasticsearch with rights equal to the ones given to the user running Elasticsearch. In the worst case scenario, the attacker could use security holes to gain superuser rights. This is the reason why inline scripts are disabled by default. After careful consideration, you can enable them by adding the following line to the elasticsearch.yml file:
script.inline: on
After allowing the inline script to be executed, we can run a query that looks as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":{
"_script":{
"script":{
"inline":"_doc.tags.values.size() > 0 ? _doc.tags.values[0] : \"\u19999\""
},
"type":"string",
"order":"asc"
}
}
}'
Indexed scripts
The last option for defining scripts is storing them in the dedicated Elasticsearch index. For the same security reasons, dynamic execution of the indexed scripts is disabled by default. To enable the indexed scripts, we have to add a configuration option similar to the one we added to be able to use the inline scripts. We need to add the following line to the elasticsearch.yml file:
script.indexed: on
After adding the preceding property to all the nodes and restarting the cluster, we will be ready to start using the indexed scripts. Elasticsearch provides an additional, dedicated endpoint for this purpose. Let’s store our script:
curl -XPOST 'localhost:9200/_scripts/groovy/tag_sort' -d '{
"script":"_doc.tags.values.size() > 0 ? _doc.tags.values[0] : \"\u19999\""
}'
The script is ready, but let’s discuss what we just did. We sent an HTTP POST request to the special _scripts REST endpoint. We also specified the language of the script (groovy in our case) and the name of the script (tag_sort). The body of the request is the script itself.
We can now move on to the query, which looks as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":{
"_script":{
"script":{
"id":"tag_sort"
},
"type":"string",
"order":"asc"
}
}
}'
As we can see, the query is practically identical to the query used with the script defined in a file. The only difference is that we provided the identifier of the script using the id parameter instead of providing the file name.
Querying with scripts
If we look at any request made to Elasticsearch that uses scripts, we will notice some similar properties, which are as follows:
script: This property wraps the script definition.
inline: This property holds the code of the script itself.
id: This property defines the identifier of the indexed script.
file: The filename of the script without the extension.
lang: This property defines the language of the script. If it is omitted, Elasticsearch assumes groovy.
params: This object contains the parameters and their values. Every defined parameter can be used inside the script by specifying that parameter’s name. The parameters allow us to write cleaner code which will be executed in a more efficient manner. Scripts using the parameters are executed faster than code with embedded constants because of caching.
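The caching remark about params can be illustrated with a toy compile cache. This is not Elasticsearch's implementation; it only assumes that compiled scripts are cached by their source text, which is why a constant embedded in the source forces a new compilation for every distinct value. The ScriptCache class and the price and factor names are hypothetical, and Python's own compile() stands in for the script engine.

```python
class ScriptCache:
    """Toy compile cache keyed by the script source text."""

    def __init__(self):
        self.compiled = {}
        self.compilations = 0

    def get(self, source):
        if source not in self.compiled:
            self.compilations += 1
            # compile() stands in for the engine's real compilation step.
            self.compiled[source] = compile(source, "<script>", "eval")
        return self.compiled[source]

    def run(self, source, params=None):
        # Parameters are exposed to the script as local variables.
        return eval(self.get(source), {}, dict(params or {}))

embedded = ScriptCache()
for value in range(100):
    # The constant is part of the source: 100 distinct scripts.
    embedded.run("price * %d" % value, {"price": 2})

parameterized = ScriptCache()
for value in range(100):
    # The constant is passed as a parameter: one script, compiled once.
    parameterized.run("price * factor", {"price": 2, "factor": value})

print(embedded.compilations, parameterized.compilations)
```

With the value passed as a parameter, the same compiled script is reused for every request, which is the effect the params description refers to.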
Scripting with parameters
As our scripts become more and more complicated, the need for creating multiple, almost identical scripts can appear. These scripts usually differ in the values used, with the logic behind them being exactly the same. In our simple example, we used a hardcoded value to mark documents with an empty tags list. Let’s change this to allow the value to be passed in. Let’s use the in file script definition and create a tag_sort_with_param.groovy file with the following contents:
_doc.tags.values.size() > 0 ? _doc.tags.values[0] : tvalue
The only change we’ve made is the introduction of the parameter named tvalue, which can be set in the query in the following way:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":{
"_script":{
"script":{
"file":"tag_sort_with_param",
"params":{
"tvalue":"000"
}
},
"type":"string",
"order":"asc"
}
}
}'
The params section defines all the script parameters. In our simple example, we’ve only used a single parameter, but of course we can have multiple parameters in a single query.
Script languages
As we already said, the default language for scripting is Groovy. However, we are not limited to only a single scripting language when using Elasticsearch. In fact, if you would like to, you can even use Java to write your scripts. In addition to that, the community behind Elasticsearch provides support for additional languages as plugins. So if you are willing to install plugins, you can extend the list of scripting languages that Elasticsearch supports even further. You may wonder why you would even consider using a scripting language other than the default Groovy. The first reason is your own preferences. If you are a Python enthusiast, you are probably now thinking about how to use Python for your Elasticsearch scripts. The other reason could be security. When we talked about the inline scripts, we told you that they are turned off by default. This is not exactly true for all the scripting languages available out of the box. The inline scripts are disabled by default when using Groovy, but you can use Lucene expressions and Mustache without any issues. This is because those languages are sandboxed, which means that the security-sensitive functions are turned off. And of course, the last factor when choosing a language is performance. Theoretically, the native scripts (in Java) should have better performance than others, but you should remember that the difference can be insignificant. You should always consider the cost of development and measure performance.
Using other than embedded languages
Using Groovy for scripting is a simple and sufficient solution for most use cases. However, you may have a different preference and you may like to use something different, such as JavaScript, Python, or MVEL. Before using other languages, we must install an appropriate plugin. You can read more details about plugins in the Elasticsearch plugins section of Chapter 9, Elasticsearch Cluster. For now, we’ll just run the following command from the Elasticsearch directory:
bin/plugin install lang-javascript
The preceding command will install a plugin that will allow the usage of JavaScript as the scripting language. The only change we should make in the request is to add the additional information about the language we are using for scripting and, of course, modify the script itself to correctly use the new language. Look at the following example:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":{
"_script":{
"script":{
"inline":"_doc.tags.values.length > 0 ? _doc.tags.values[0] : \"\u19999\";",
"lang":"javascript"
},
"type":"string",
"order":"asc"
}
}
}'
As you can see, we’ve used JavaScript for scripting instead of the default Groovy. The lang parameter informs Elasticsearch about the language being used.
Using native code
In case the scripts are too slow or you don’t like scripting languages, Elasticsearch allows you to write Java classes and use them instead of scripts. There are two possible ways of adding native scripts: adding classes defining scripts to the Elasticsearch classpath, or adding a script as a functionality provided by a plugin. We will describe the second solution as it is more elegant.
The factory implementation
We need to implement at least two classes to create a new native script. The first one is a factory for our script. For now, let’s focus on it. The following sample code illustrates the factory for our script:
package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class HashCodeSortNativeScriptFactory implements NativeScriptFactory {
    @Override
    public ExecutableScript newScript(@Nullable Map<String, Object> params) {
        return new HashCodeSortScript(params);
    }

    @Override
    public boolean needsScores() {
        return false;
    }
}
This class should implement the org.elasticsearch.script.NativeScriptFactory interface. The interface forces us to implement two methods. The newScript() method takes the parameters defined in the API call and returns an instance of our script. Finally, needsScores() informs Elasticsearch whether we want to use scoring and whether it should be calculated.
Implementing the native script
Now let’s look at the implementation of our script. The idea is simple: our script will be used for sorting. Documents will be ordered by the hashCode() value of the chosen field. The documents without a value in the defined field will be first on the results list. We know the logic doesn’t make too much sense, but it is good for presentation as it is simple. The source code for our native script looks as follows:
package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;
import org.elasticsearch.script.AbstractSearchScript;

public class HashCodeSortScript extends AbstractSearchScript {
    // Field used for sorting; "name" is the default when no parameter is given.
    private String field = "name";

    public HashCodeSortScript(Map<String, Object> params) {
        if (params != null && params.containsKey("field")) {
            this.field = params.get("field").toString();
        }
    }

    @Override
    public Object run() {
        Object value = source().get(field);
        if (value != null) {
            return value.hashCode();
        }
        return 0;
    }
}
First of all, our class inherits from the org.elasticsearch.script.AbstractSearchScript class and implements the run() method. This is where we get the appropriate values from the current document, process them according to our strange logic, and return the result. You may notice the source() call. It gives access to exactly the same _source object that we used when dealing with non-native scripts. The doc() and fields() methods are also available and they follow the same logic we described earlier.
The thing worth looking at is how we’ve used the parameters. We assume that a user can pass the field parameter, telling us which document field will be used for manipulation. We also provide a default value for this parameter.
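To get a feeling for the ordering this script produces without building the plugin, the logic of run() can be mimicked outside Elasticsearch. The sketch below reimplements Java's String.hashCode() for strings made of Basic Multilingual Plane characters; it assumes the field value is a string (other value types hash differently in Java) and it sorts the values numerically. The sample documents are hypothetical.

```python
def java_string_hash(text):
    """Java's String.hashCode() for strings of BMP characters."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like a Java int
    # Reinterpret the 32-bit pattern as a signed value.
    return h - 0x100000000 if h >= 0x80000000 else h

def sort_value(source, field="name"):
    # Mirrors run(): hash of the field value, or 0 when the field is missing.
    value = source.get(field)
    return java_string_hash(str(value)) if value is not None else 0

docs = [
    {"otitle": "hello"},
    {"title": "no original title"},  # lacks the otitle field
    {"otitle": "a"},
]
ordered = sorted(docs, key=lambda d: sort_value(d, "otitle"))
print([sort_value(d, "otitle") for d in ordered])
```

Hash codes can also be negative, so documents without the field are not guaranteed to come first for every data set.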
The plugin definition
We said that we will install our script as a part of a plugin. This is why we need additional files. The first file is the plugin initialization class where we tell Elasticsearch about our new script:
package pl.solr.elasticsearch.examples.scripts;

import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.script.ScriptModule;

public class ScriptPlugin extends Plugin {
    @Override
    public String description() {
        return "The example of native sort script";
    }

    @Override
    public String name() {
        return "naive-sort-plugin";
    }

    public void onModule(final ScriptModule module) {
        module.registerScript("native_sort",
            HashCodeSortNativeScriptFactory.class);
    }
}
The implementation is easy. The description() and name() methods are only for information, so let’s focus on the onModule() method. In our case, we need access to the script module, the Elasticsearch service with scripts and scripting languages. This is why we define onModule() with one ScriptModule argument. Thanks to Elasticsearch magic, we can use this module and register our script so it can be found by the engine. We have used the registerScript() method, which takes the script name and the previously defined factory class.
The second needed file is a plugin descriptor file: plugin-descriptor.properties. It defines the constants used by the Elasticsearch plugin subsystem. Without further ado, let’s look at the contents of this file:
jvm=true
classname=pl.solr.elasticsearch.examples.scripts.ScriptPlugin
elasticsearch.version=2.2.0
version=0.0.1-SNAPSHOT
name=native_script
description=ExampleNativeScripts
java.version=1.7
The appropriate lines have the following meaning:
jvm: tells Elasticsearch that our file contains Java code
classname: describes the main class with the plugin definition
elasticsearch.version and java.version: tell us about the Elasticsearch version that is supported by the plugin and the Java version that is needed
name and description: informative name and short description of our plugin
And that’s it. We have all the files needed to run our script. Please note that you can have more than a single script packed as a single plugin.
Installing the plugin
Now it’s time to install our native script embedded in the plugin. After packing the compiled classes as a JAR archive, we should put it in the Elasticsearch plugins/native-script directory. The native-script part is a root directory for our plugin and you may name it as you wish. In this directory you also need the prepared plugin-descriptor.properties file. This makes our plugin visible to Elasticsearch.
Running the script
After restarting Elasticsearch (or the whole cluster if you run more than a single node), we can start sending the queries that use our native script. For example, we will send a query that uses our previously indexed data from the library index. This example query looks as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"sort":{
"_script":{
"script":{
"script":"native_sort",
"lang":"native",
"params":{
"field":"otitle"
}
},
"type":"string",
"order":"asc"
}
}
}'
Note the params part of the query. In this call, we want to sort on the otitle field. We provide the script name native_sort and the script language native. This is required. If everything goes well, we should see our results sorted by our custom sort logic. If we look at the response from Elasticsearch, we will see that the documents without the otitle field are at the first few positions of the results list and their sort value is 0.
Searching content in different languages
Until now, when discussing language analysis, we’ve talked mostly about theory. We didn’t see an example regarding language analysis, handling multiple languages that our data can consist of, and so on. Now this will change, as this section is dedicated to information about how we can handle data in multiple languages.
Handling languages differently
As you already know, Elasticsearch allows us to choose different analyzers for our data. We can have our data divided on the basis of whitespaces, or have them lowercased, and so on. This can usually be done regardless of the language: the same tokenization on the basis of whitespaces will work for English, German, and Polish, although it won’t work for Chinese. However, what if you want to find documents that contain words such as cat and cats by only sending the word cat to Elasticsearch? This is where language analysis comes into play with stemming algorithms for different languages, which allow the analyzed words to be reduced to their root forms. And now the worst part: we can’t use one general stemming algorithm for all the languages in the world; we have to choose one appropriate for the language. The following sections in the chapter will help you with some parts of the language analysis process.
Handling multiple languages
There are a few ways of handling multiple languages in Elasticsearch and all of them have some pros and cons. We won’t be discussing everything, but just for the purpose of giving you an idea, a few of those methods are as follows:
Storing documents in different languages as different types
Storing documents in different languages in separate indices
Storing language data in different fields of a single document
For the purpose of the book, we will focus on a single method, the one that allows storing documents in different languages in a single index. We will focus on a problem where we have a single type of document, but each document may come from anywhere in the world and thus can be written in multiple languages. Also, we would like to enable our users to use all the analysis capabilities, such as stemming and stop words for different languages, not only for English.
Note
Note that the stemming algorithms perform differently for different languages, both in terms of analysis performance and the resulting terms. For example, English stemmers are very good, but you can run into issues with European languages, such as German.
Detecting the language of the document
Before we continue with showing you how to solve our problem with handling multiple languages in Elasticsearch, we would like to tell you about one additional thing, that is, language detection. There are situations where you just don’t know what language your document or query is in. In such cases, language detection libraries may be a good choice, especially when using Java as your programming language of choice. Some of the libraries are as follows:
Apache Tika (http://tika.apache.org/)
Language detection (https://github.com/shuyo/language-detection)
The language detection library claims to have over 99 percent precision for 53 languages; that’s a lot if you ask us.
You should remember, though, that data language detection will be more precise for longer text. Because the text of queries is usually short, you can expect to have some degree of error during query language identification.
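Why short text is harder to classify can be seen even with a deliberately naive detector. The following sketch is not how the mentioned libraries work (they use much richer statistical models); it only counts hits against tiny, hypothetical stop word lists, so a short query often contains no evidence at all.

```python
# Tiny, hypothetical stop word lists used only for this illustration.
STOPWORDS = {
    "english": {"the", "is", "and", "of", "a"},
    "german": {"der", "die", "und", "ist", "das"},
}

def detect(text):
    """Return the language with the most stop word hits, or None if there are none."""
    words = text.lower().split()
    scores = {
        language: sum(word in stopwords for word in words)
        for language, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(detect("This is a test document"))
print(detect("Das ist ein Dokument und der Test"))
print(detect("elasticsearch"))  # a one-word query gives the detector nothing to work with
```

A full sentence contains several function words, while a one-word query contains none, so the detector has to give up or guess.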
Sample document
Let’s start with introducing a sample document, which is as follows:
{
"title":"Firsttestdocument",
"content":"Thisisatestdocument"
}
As you can see, the document is pretty simple; it contains the following two fields:
title: This field holds the title of the document
content: This field holds the actual content of the document
This document is quite simple, but, from the search point of view, the information about the document language is missing. What we should do is enrich the document by adding the needed information. We can do that by using one of the previously mentioned libraries, which will try to detect the language.
After we have the language detected, we inform Elasticsearch which analyzer should be used and modify the document to directly show the language of each field. Each of the fields would have to be analyzed by a language analyzer dedicated to the detected language.
Note
A full list of these language analyzers can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html.
If a document is written in a language that we are not supporting, we will just fall back to some default field with the default analyzer. For example, our document, processed and prepared for indexing, could look like this:
{
"title_english":"Firsttestdocument",
"content_english":"Thisisatestdocument"
}
The thing is that all the processing we’ve mentioned would have to be done outside of Elasticsearch or in some kind of custom plugin that would implement the mentioned logic.
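The client-side part of that processing can be sketched in a few lines. This is only an illustration under stated assumptions: the language is assumed to be already detected, the set of supported languages and the function name tag_fields are hypothetical, and unsupported languages simply keep the original field names so that the default analyzer is used.

```python
SUPPORTED = {"english", "russian", "german"}  # hypothetical set of handled languages

def tag_fields(document, language):
    """Rename every field to <field>_<language>; keep names for unsupported languages."""
    if language not in SUPPORTED:
        return dict(document)  # fall back to the default fields and analyzer
    return {"%s_%s" % (name, language): value for name, value in document.items()}

doc = {"title": "First test document", "content": "This is a test document"}
print(tag_fields(doc, "english"))
print(tag_fields(doc, "klingon"))
```

The first call produces the title_english and content_english fields shown above, while the second leaves the document unchanged.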
NoteInthepreviousversionsofElasticsearch,therewasapossibilityofchoosingananalyzerbasedonthevalueofanadditionalfield,whichcontainedtheanalyzername.Thiswasamoreconvenientandelegantwaybutintroducedsomeuncertaintyaboutthefieldcontents.Youalwayshadtodeliveraproperanalyzerwhenusingthegivenfieldorstrangethingshappened.TheElasticsearchteammadethedifficultdecisionandremovedthisfeature.
Thereisalsoasimplerway:wecantakeourfirstdocumentandindexitinseveralwaysindependentlyfrominputlanguage.Let’sfocusonthissolution.
The mappings
To handle our solution, which will process the document using several defined languages, we need new mappings. Let’s look at the mappings we’ve created to index our documents (we’ve stored them in the mappings.json file):
{
"mappings":{
"doc":{
"properties":{
"title":{
"type":"string",
"index":"analyzed",
"fields":{
"english":{
"type":"string",
"index":"analyzed",
"analyzer":"english"
},
"russian":{
"type":"string",
"index":"analyzed",
"analyzer":"russian"
},
"german":{
"type":"string",
"index":"analyzed",
"analyzer":"german"
}
}
},
"content":{
"type":"string",
"index":"analyzed",
"fields":{
"english":{
"type":"string",
"index":"analyzed",
"analyzer":"english"
},
"russian":{
"type":"string",
"index":"analyzed",
"analyzer":"russian"
},
"german":{
"type":"string",
"index":"analyzed",
"analyzer":"german"
}
}
}
}
}
}
}
In the preceding mappings, we’ve shown the definition for the title and content fields (if you are not familiar with any aspect of mappings definition, refer to the Mappings configuration section of Chapter 2, Indexing Your Data). We have used the multifield feature of Elasticsearch: each field can be indexed in several ways using various language analyzers (in our example, those analyzers are English, Russian, and German).
In addition, the base field uses the default analyzer, which we may use at query time when the language is unknown. So, each field will actually have four fields: the default one and three language-oriented fields.
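Because the mappings repeat the same structure for every field and language, they can also be generated. The following Python sketch builds the same structure as the mappings shown above; the helper names multilingual_field and build_mappings are made up for this illustration.

```python
import json

LANGUAGES = ["english", "russian", "german"]


def multilingual_field(languages):
    """Build one analyzed string field with a sub-field per language analyzer."""
    return {
        "type": "string",
        "index": "analyzed",
        "fields": {
            lang: {"type": "string", "index": "analyzed", "analyzer": lang}
            for lang in languages
        },
    }


def build_mappings(field_names, languages):
    """Build mappings for the doc type with the given multilingual fields."""
    properties = {name: multilingual_field(languages) for name in field_names}
    return {"mappings": {"doc": {"properties": properties}}}


mappings = build_mappings(["title", "content"], LANGUAGES)
print(json.dumps(mappings, indent=2))
```

Adding another language then only means extending the LANGUAGES list and recreating the index.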
In order to create a sample index called docs that uses our mappings, we will use the following command:
curl -XPUT 'localhost:9200/docs' -d @mappings.json
Querying
Now let’s see how we can query our data to use the newly created language fields. We can divide the querying situation into two different cases. Of course, to start querying we need documents. Let’s index our example document by running the following command:
curl -XPOST 'localhost:9200/docs/doc/1' -d '{"title":"First test document","content":"This is a test document"}'
Queries with an identified language
The first case is when we have our query language identified. Let’s assume that the identified language is English. In such cases, our query is as follows:
curl 'localhost:9200/docs/_search?pretty' -d '{
"query":{
"match":{
"content.english":"documents"
}
}
}'
The thing to put emphasis on in the preceding query is the field used for querying and the query type. The field used is content.english, which also indicates which analyzer we want to use. We used that field because we had identified our language before running the query. Thanks to this, the English analyzer can find our document even if we have the singular form of the word in the document. The response returned by Elasticsearch will be as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.19178301,
"hits":[{
"_index":"docs",
"_type":"doc",
"_id":"1",
"_score":0.19178301,
"_source":{
"title":"First test document",
"content":"This is a test document"
}
}]
}
}
The thing to note is also the query type: the match query. We used the match query because it analyzes its body with the analyzer used by the field that it is run against. We need that to properly match the data in the query and the data in the index.
Queries with an unknown language
Now let’s look at the second situation: handling queries when we couldn’t identify the language of the query. In such cases, we can’t use the field name pointing to one of the languages, such as content.german. Instead, we send the query to the content field, which uses the default analyzer. The query will look as follows:
curl 'localhost:9200/docs/_search?pretty' -d '{
"query":{
"match":{
"content":"documents"
}
}
}'
However, we didn’t get any results this time, because the default analyzer doesn’t reduce words to their base form: the plural form used in the query can’t match the singular form stored in the index.
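The choice between the two cases comes down to picking the field name. A minimal Python sketch of that decision follows; the content_query function and its languages argument are assumptions made for this illustration, not part of Elasticsearch.

```python
def content_query(text, language=None, languages=("english", "russian", "german")):
    """Build a match query against the language field, or the base field.

    When the language is unknown or not covered by the mappings, the query
    goes to the base content field, which uses the default analyzer.
    """
    field = f"content.{language}" if language in languages else "content"
    return {"query": {"match": {field: text}}}


print(content_query("documents", "english"))
print(content_query("documents"))
```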
Combining queries
To additionally boost the documents that perfectly match with our default analyzer, we can combine the two preceding queries with the bool query. Such a combined query will look as follows:
curl -XGET 'localhost:9200/docs/_search?pretty=true' -d '{
"query":{
"bool":{
"minimum_should_match":1,
"should":[
{
"match":{
"content.english":"documents"
}
},
{
"match":{
"content":"documents"
}
}
]
}
}
}'
For the document to be returned, at least one of the defined queries must match. If they both match, the document will have a higher score value and will be placed higher in the results.
There is one additional advantage of the preceding combined query. If our language analyzer doesn’t find a document (for example, when the analysis is different from the one used during indexing), the second query has a chance to find the terms that are only split on whitespace characters and lowercased.
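The combined query can be assembled programmatically as well. The sketch below is one possible way to do it in Python; build_combined_query is a hypothetical helper name, and the request body it returns mirrors the bool query shown above.

```python
def build_combined_query(text, language):
    """Combine the language-specific match with a match on the base field."""
    return {
        "query": {
            "bool": {
                # At least one of the two clauses has to match.
                "minimum_should_match": 1,
                "should": [
                    {"match": {f"content.{language}": text}},
                    {"match": {"content": text}},
                ],
            }
        }
    }


combined = build_combined_query("documents", "english")
print(combined)
```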
Influencing scores with query boosts
In the beginning of this chapter, we learned what scoring is and how Elasticsearch uses the scoring formula. When an application grows, the need for improving the quality of search, which we call the search experience, also increases. We need to gain knowledge about what is more important to the user and see how the users use the search functionality. This leads to various conclusions; for example, we see that some parts of the documents are more important than others or that particular queries emphasize one field at the cost of others. We need to include such information in our data and queries so that both sides of the scoring equation are closer to our business needs. This is where boosting can be used.
The boost
Boost is an additional value used in the process of scoring. We already know it can be applied to:
Query: When used, we inform the search engine that the given query is a part of a complex query and is more significant than the other parts.
Document: When used during indexing, we tell Elasticsearch that a document is more important than the others in the index. For example, when indexing blog posts, we are probably more interested in the posts themselves than pingbacks or comments.
Values assigned by us to a query or a document are not the only factors used when the resulting score is calculated. We will now look at a few examples of query boosting.
Adding the boost to queries
Let’s imagine that our index has two documents and we’ve used the following commands to index them:
curl -XPOST 'localhost:9200/messages/email/1' -d '{
"id":1,
"to":"John Smith",
"from":"David Jones",
"subject":"Top secret!"
}'
curl -XPOST 'localhost:9200/messages/email/2' -d '{
"id":2,
"to":"David Jones",
"from":"John Smith",
"subject":"John, read this document"
}'
This data is trivial, but it should describe our problem very well. Now let’s assume we have the following query:
curl -XGET 'localhost:9200/messages/_search?pretty' -d '{
"query":{
"query_string":{
"query":"john",
"use_dis_max":false
}
}
}'
In this case, Elasticsearch will create a query to the _all field and will find documents that contain the desired words. We also said that we don’t want the disjunction query to be used by setting the use_dis_max parameter to false (if you don’t remember what a disjunction query is, refer to The dis_max query section in Chapter 3, Searching Your Data). As we can easily guess, both our records will be returned. The record with identifier equal to 2 will be first because the word John occurs two times: once in the from field and once in the subject field. Let’s check this out in the following result:
"hits":{
"total":2,
"max_score":0.13561106,
"hits":[{
"_index":"messages",
"_type":"email",
"_id":"2",
"_score":0.13561106,
"_source":{
"id":2,
"to":"David Jones",
"from":"John Smith",
"subject":"John, read this document"
}
},{
"_index":"messages",
"_type":"email",
"_id":"1",
"_score":0.11506981,
"_source":{
"id":1,
"to":"John Smith",
"from":"David Jones",
"subject":"Top secret!"
}
}]
}
Is everything all right? Technically, yes. But we think that the second document (the one with identifier 1) should be positioned as the first one in the result list, because when searching for something, the most important factor (in many cases) is matching people rather than the subject of the message. You can disagree, but this is exactly why full-text searching relevance is a difficult topic; sometimes it is hard to tell which ordering is better for a particular case. What can we do? First, let’s rewrite our query to explicitly inform Elasticsearch what fields should be used for searching:
curl -XGET 'localhost:9200/messages/_search?pretty' -d '{
"query":{
"query_string":{
"fields":["from","to","subject"],
"query":"john",
"use_dis_max":false
}
}
}'
This is not exactly the same query as the previous one. If we run it, we will get the same results (in our case). However, if you look carefully, you will notice differences in scoring. In the previous example, Elasticsearch only used one field, that is the default _all field. The query that we are using now is using three fields for matching. This means that several factors, such as field lengths, are changed. Anyway, this is not so important in our case. Elasticsearch under the hood generates a complex query made of three queries, one to each field. Of course, the score contributed by each query depends on the number of terms found in this field and the length of this field.
Let’s introduce some differences between the fields and their importance. Compare the following query to the last one:
curl -XGET 'localhost:9200/messages/_search?pretty' -d '{
"query":{
"query_string":{
"fields":["from^5","to^10","subject"],
"query":"john",
"use_dis_max":false
}
}
}'
Look at the ^5 and ^10 parts. By using that notation (the ^ character followed by a number), we can inform Elasticsearch how important a given field is. We see that the most important field is the to field (because of the highest boost value). Next we have the from field, which is less important. The subject field has the default value for boost, which is 1.0, and is the least important field when it comes to score calculation. Always remember that this value is only one of the various factors. You may be wondering why we chose 5 and not 1000 or 1.23. Well, this value depends on the effect we want to achieve, what query we have, and, most importantly, what data we have in our index. Typically, when the data changes in its meaningful parts, we should probably check and tune our relevance once again.
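When boosts are tuned often, it can help to keep them in a data structure and generate the field^boost notation from it. The following Python sketch assumes hypothetical helper names (boosted_fields and boosted_query_string); only the notation itself comes from the query above.

```python
def boosted_fields(boosts):
    """Turn {"field": boost} pairs into the field^boost notation.

    A boost of 1 is the default, so such fields are emitted without a suffix.
    """
    return [name if boost == 1 else f"{name}^{boost}" for name, boost in boosts.items()]


def boosted_query_string(text, boosts):
    """Build a query_string request body using the boosted field list."""
    return {
        "query": {
            "query_string": {
                "fields": boosted_fields(boosts),
                "query": text,
                "use_dis_max": False,
            }
        }
    }


print(boosted_query_string("john", {"from": 5, "to": 10, "subject": 1}))
```

Keeping the boosts in one place makes it easier to revisit them when the data or the relevance requirements change.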
In the end, let’s look at a similar example, but using the bool query:
curl -XGET 'localhost:9200/messages/_search?pretty' -d '{
"query":{
"bool":{
"should":[
{"term":{"from":{"value":"john","boost":5}}},
{"term":{"to":{"value":"john","boost":10}}},
{"term":{"subject":{"value":"john"}}}
]
}
}
}'
The preceding query will yield the same results, which means that the first document on the results list will be the one with the identifier 1, but the scores will be slightly different. This is because the Lucene queries made from the last two examples are slightly different and thus the scores are different.
Modifying the score
The preceding example shows how to affect the result list by boosting particular query components: the fields. Another technique is to run a query and affect the score of the matched documents. In the following sections, we will summarize the possibilities offered by Elasticsearch. In the examples, we will use our library data that we have already used in the previous chapters.
Constant score query
A constant_score query allows us to take any query and explicitly set the value that should be used as the score that will be given for each matching document by using the boost parameter.
At first, this query doesn’t seem to be practical. But when we think about building complex queries, this query allows us to control how much the documents matching this query can affect the total score. Look at the following example:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"constant_score":{
"query":{
"query_string":{
"query":"available:false author:heller"
}
}
}
}
}'
In our data, we have two documents with the available field set to false. One of these documents has an additional value in the author field. If we use a different query, the document with an additional value in the author field (a book with identifier 2) would be given a higher score, but, thanks to the constant score query, Elasticsearch will ignore that information during scoring. Both documents will be given a score equal to 1.0.
Boosting query
The next type of query that can be used with boosting is the boosting query. The idea is to allow us to define a part of the query which will cause matched documents to have their scores lowered. The following example returns all the available books (available field set to true), but the books written by E. M. Remarque will have a negative boost of 0.1 (which means about ten times lower score):
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"boosting":{
"positive":{
"term":{
"available":true
}
},
"negative":{
"match":{
"author":"remarque"
}
},
"negative_boost":0.1
}
}
}'
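The effect of negative_boost can be pictured with a small piece of arithmetic. The Python sketch below is a simplified model, not the actual Lucene implementation: the boosting_score function and the sample scores are made up, and the only thing taken from the text is that documents matching the negative part have their score multiplied by negative_boost.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Lower the score of documents that also match the negative query."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Two hypothetical available books with the same score from the positive query.
books = [
    ("book by Remarque", 1.0, True),
    ("other available book", 1.0, False),
]
ranked = sorted(books, key=lambda b: boosting_score(b[1], b[2], 0.1), reverse=True)
for title, score, negative in ranked:
    print(title, boosting_score(score, negative, 0.1))
```

Note that the demoted book is still returned; it is only moved down the results list.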
The function score query
Till now we’ve seen two examples of queries that allowed us to alter the score of the returned documents. The third example we wanted to talk about, the function_score query, is way more complicated than the previously discussed queries. The function_score query is very useful when the score calculation is more complicated than giving a single boost to all the documents; boosting more recent documents is an example of a perfect use case for the function_score query.
Structure of the function query
The structure of the function query is quite simple and looks as follows:
{
"query":{
"function_score":{
"query":{...},
"functions":[
{
"filter":{...},
"FUNCTION":{...}
}
],
"boost_mode":"...",
"score_mode":"...",
"max_boost":"...",
"min_score":"...",
"boost":"..."
}
}
}
In general, the function score query can use a query, one or several functions, and additional parameters. Each function can have a filter defined to filter the results on which it will be applied. If no filter is given for a function, it will be applied to all the documents.
The logic behind the function score query is quite simple. First of all, the functions are matched against the documents and the score is calculated based on score_mode. After that, the query score for the document is combined with the score calculated for the functions on the basis of boost_mode.
Let’s now discuss the parameters:
Boost mode: The boost_mode parameter allows us to define how the score computed by the function queries will be combined with the score of the query. The following values are allowed:
multiply: The default behavior, which results in the query score being multiplied by the score computed from the functions
replace: The query score will be totally ignored and the document score will be equal to the score calculated by the functions
sum: The document score will be calculated as the sum of the query and the function scores
avg: The score of the document will be an average of the query score and the function score
max: The document will be given a maximum of query score and function score
min: The document will be given a minimum of query score and function score
Score mode: The score_mode parameter defines how the scores computed by the functions are combined together. The following score_mode parameter values are defined:
multiply: The default behavior, which results in the scores returned by the functions being multiplied
sum: The scores returned by the defined functions are summed
avg: The score returned by the functions is an average of all the scores of the matching functions
first: The score of the first function with a filter matching the document is returned
max: The maximum score of the functions is returned
min: The minimum score of the functions is returned
There is one thing to remember: we can limit the maximum calculated score value by using the max_boost parameter in the function score query. By default, that parameter is set to Float.MAX_VALUE, which means the maximum float value.
The boost parameter allows us to set a query wide boost for the documents.
Of course, there is one thing we should remember: the score calculated doesn’t affect which documents matched the query. Because of that, the min_score property has been introduced. It allows us to define the minimum score of the documents. Documents that have a score lower than the min_score property will be excluded from the results.
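The interplay of score_mode, boost_mode, max_boost, and min_score described above can be summarized as a simplified model in Python. This is a sketch under stated assumptions, not the Elasticsearch implementation: the function names are invented, avg is modeled as a plain arithmetic mean, and a document without any matching function is assumed to get a function score of 1.

```python
from functools import reduce
import operator


def combine(values, mode):
    """Combine a list of scores according to a score_mode-style name."""
    if mode == "multiply":
        return reduce(operator.mul, values, 1.0)
    if mode == "sum":
        return sum(values)
    if mode == "avg":
        return sum(values) / len(values)
    if mode == "first":
        return values[0]
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    raise ValueError(f"unknown mode: {mode}")


def function_score(query_score, function_scores, score_mode="multiply",
                   boost_mode="multiply", max_boost=float("inf"), min_score=None):
    """Return the final score, or None when min_score excludes the document."""
    if boost_mode not in {"multiply", "replace", "sum", "avg", "max", "min"}:
        raise ValueError(f"unknown boost_mode: {boost_mode}")
    # Step 1: combine the scores of the matching functions and cap the result.
    func = combine(function_scores, score_mode) if function_scores else 1.0
    func = min(func, max_boost)
    # Step 2: combine the function score with the query score.
    final = func if boost_mode == "replace" else combine([query_score, func], boost_mode)
    # Step 3: drop documents scoring below min_score.
    if min_score is not None and final < min_score:
        return None
    return final


print(function_score(2.0, [3.0, 4.0]))
print(function_score(2.0, [3.0, 4.0], score_mode="sum", boost_mode="sum"))
```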
What we haven’t talked about yet are the function scores that we can include in the functions section of our query. The currently available functions are:
weight factor
field value factor
script score
random
decay
The weight factor function
The weight factor function allows us to multiply the score of the document by a given value. The value of the weight parameter is not normalized and is taken as is. An example using the weight function, where we multiply the score of the document by 20, looks as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"function_score":{
"query":{
"term":{
"available":true
}
},
"functions":[
{"weight":20}
]
}
}
}'
Field value factor function
The field_value_factor function allows us to influence the score of the document by using a value of the field in that document. For example, to multiply the score of the document by the value of the year field, we run the following query:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"function_score":{
"query":{
"term":{
"available":true
}
},
"functions":[
{
"field_value_factor":{
"field":"year",
"missing":1
}
}
]
}
}
}'
In addition to choosing the field whose value should be used, we can also control the behavior of the field value factor function by using the following properties:
factor: The multiplication factor that will be used along with the field value. It defaults to 1.
modifier: The modifier that will be applied to the field value. It defaults to none. It can take the value of log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, and reciprocal.
missing: The value that should be used when a document doesn’t have any value in the field specified in the field property.
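To make the properties above more tangible, here is a Python sketch of how the function value could be computed. It is an approximation written for illustration: the field_value_factor function name mirrors the query element, but the code is ours, and it assumes that the modifier is applied to the field value already multiplied by factor, with log meaning the base-10 logarithm and ln the natural one.

```python
import math

# Modifier names from the list above, mapped to plain Python functions.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,
    "ln1p": math.log1p,
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}


def field_value_factor(doc, field, factor=1.0, modifier="none", missing=None):
    """Compute modifier(factor * field value), using missing as a fallback."""
    value = doc.get(field, missing)
    if value is None:
        raise ValueError(f"document has no value in {field!r} and no missing value is set")
    return MODIFIERS[modifier](factor * value)


print(field_value_factor({"year": 1961}, "year"))
print(field_value_factor({}, "year", missing=1))
print(field_value_factor({"copies": 4}, "copies", factor=4, modifier="sqrt"))
```

Modifiers such as log1p or sqrt are a common way of keeping large field values, like a year, from dominating the final score.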
The script score function
The script_score function allows us to use a script to calculate the score that will be used as the score returned by a function (and thus will fall into behavior defined by the boost_mode parameter). An example of script_score usage is as follows (for the following example to work, inline scripting needs to be allowed, which means adding the script.inline property and setting it to on in elasticsearch.yml):
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"function_score":{
"query":{
"term":{
"available":true
}
},
"functions":[
{
"script_score":{
"script":{
"inline":"_score*_source.copies*parameter1",
"params":{
"parameter1":12
}
}
}
}
]
}
}
}'
The random score function
By using the random_score function, we can generate a pseudo random score, by specifying a seed. In order to simulate randomness, we should specify a new seed every time. The random number will be generated by using the _uid field and the provided seed. If a seed is not provided, the current timestamp will be used. An example of using this is as follows:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"function_score":{
"query":{
"term":{
"available":true
}
},
"functions":[
{
"random_score":{
"seed":12345
}
}
]
}
}
}'
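The important property of random_score is that the same seed and the same document always give the same score. The following Python sketch illustrates only that idea; it hashes the seed together with a document identifier using SHA-256, which is our own choice for the illustration and not the algorithm used by Elasticsearch.

```python
import hashlib


def seeded_score(seed, uid):
    """Return a repeatable pseudo random number in [0, 1) for a seed and uid."""
    digest = hashlib.sha256(f"{seed}:{uid}".encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big") / 2 ** 64


for uid in ("book#1", "book#2", "book#3"):
    print(uid, seeded_score(12345, uid))
```

Running the loop twice with the same seed gives the same ordering, which is why a new seed is needed whenever a fresh ordering is wanted.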
Decay functions
In addition to the earlier mentioned scoring functions, Elasticsearch exposes additional ones, called the decay functions. The difference from the previously described functions is that the score given by those functions lowers with distance. The distance is calculated on the basis of a single valued numeric field (such as a date, a geographical point, or a standard numeric field). The simplest example that comes to mind is boosting documents on the basis of distance from a given point or boosting on the basis of document date.
For example, let’s assume that we have a point field that stores the location and we want our document’s score to be affected by the distance from a point where the user stands (for example, our user sends a query from a mobile device). Assuming the user is at 52,21, we could send the following query:
{
"query":{
"function_score":{
"query":{
"term":{
"available":true
}
},
"functions":[
{
"linear":{
"point":{
"origin":"52,21",
"scale":"1km",
"offset":0,
"decay":0.2
}
}
}
]
}
}
}
In the preceding example, linear is the name of the decay function. The value will decay linearly when using it. The other possible values are gauss and exp. We’ve chosen the linear decay function because it sets the score to 0 once the field value is far enough from the origin (with the default decay of 0.5, that happens at twice the scale value). This is useful when you want to lower the value of the documents that are too far away.
Note
Note that the geographical searching capabilities of Elasticsearch will be discussed in the Geo section of Chapter 8, Beyond Full-text Searching.
Now let’s discuss the rest of the query structure. The point is the name of the field we want to use for score calculation. If the document doesn’t have a value in the defined field, the decay function will return 1 for that document.
In addition to that, we’ve provided additional parameters. The origin and scale are required. The origin parameter is the central point from which the calculation will be done and the scale is the rate of decay. By default, the offset is set to 0. If defined, the decay function will only be computed for the documents whose distance from the origin is greater than the value of this parameter. The decay parameter tells Elasticsearch what score a document should receive at the distance given by scale and is set to 0.5 by default. In our case, we’ve said that, at the distance of 1 kilometer, the score should be reduced to 0.2 of its original value (that is, by 80%).
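The roles of origin, scale, offset, and decay are easier to see in code. The Python sketch below follows the decay formulas given in the official function_score documentation, as far as we can tell, but simplifies the distance to a plain numeric difference (for a geographical point the distance would be a geo distance); the function names are ours.

```python
import math


def _distance(value, origin, offset):
    """Distance from the origin, ignoring everything inside the offset."""
    return max(0.0, abs(value - origin) - offset)


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Score falls linearly, reaching decay at scale and 0 shortly after."""
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(value, origin, offset)) / s)


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Score falls exponentially and never reaches 0."""
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Score follows a bell curve centered on the origin."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(_distance(value, origin, offset) ** 2) / (2.0 * sigma_sq))


# With decay set to 0.2, every function returns 0.2 at a distance equal to scale.
for fn in (linear_decay, exp_decay, gauss_decay):
    print(fn.__name__, round(fn(1.0, 0.0, 1.0, decay=0.2), 6))
```

In this model, the linear function with scale 1 and decay 0.2 reaches 0 at a distance of 1.25, while with the default decay of 0.5 it reaches 0 at twice the scale.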
Note
We expect the function_score query to be modified and extended with the next versions of Elasticsearch (just as it was with Elasticsearch version 1.x). We suggest following the official documentation and the page dedicated to the function_score query at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html.
When does index-time boosting make sense?
In the previous section, we discussed boosting queries. This kind of approach to handling differences in the weight of documents is very handy, powerful, and easy to use. It is also sufficient in most situations. However, there are cases when index-time boosting is a more convenient way of boosting documents. One such use case is the situation when we know which documents are important during the indexing phase. In such a case, we can prepare the document boost and include it as part of the document. We gain a boost that is independent from a query at the cost of reindexing the documents when the boost value is changed (because we need to apply the changed boost). In addition to that, the performance gets slightly better because some parts needed in the boosting process are already calculated at index time, which can matter when your indices have a large number of documents. Information about the boost is stored as a part of the normalization factor and because of that it is important to keep the norms turned on. This means that we can’t set norms.enabled to false, because then we won’t be able to use index-time boosting.
Defining boosting in the mappings
It is also possible to directly define the field’s boost in our mappings. This will result in Elasticsearch giving a boost for all the documents having a value in such a field. Of course, that will also happen during indexing time. The following example illustrates that:
{
"mappings":{
"book":{
"properties":{
"title":{"type":"string"},
"author":{"type":"string","boost":10.0}
}
}
}
}
Thanks to the preceding boost, all queries will favor values found in the field named author. This also applies to queries using the _all field, because Elasticsearch will apply the boost to values copied between the fields.
Words with the same meaning
You may have heard about synonyms, words that have the same or similar meaning. Sometimes you would want to have some words matched when one of those words is entered into the search box. Let’s recall our sample data from Chapter 3, Searching Your Data. There was a book called Crime and Punishment. What if we want that book to be matched not only when the words crime or punishment are used, but also when using words such as criminality and abuse? At first glance, this may not sound like good behavior, but sometimes this is really needed, especially in use cases where there are multiple words meaning the same (like in medicine). To handle such use cases, we will use synonyms.
Synonym filter
Synonyms in Elasticsearch are handled on the analysis level, at both index and query time, by a dedicated synonyms filter. To use the synonym filter, we need to define our own analyzer. For example, let’s define an analyzer that will be called synonym and will use the whitespace tokenizer and a single filter called synonym. Our filter’s type property needs to be set to synonym, which tells Elasticsearch that this filter is a synonym filter.
In addition to that, we want to ignore case, so that the uppercased and lowercased synonyms are treated equally (set the ignore_case property to true). To define our custom synonym analyzer that uses a synonym filter when creating a new index, we would use the following command:
curl -XPOST 'localhost:9200/test' -d '{
"index":{
"analysis":{
"analyzer":{
"synonym":{
"tokenizer":"whitespace",
"filter":[
"synonym"
]
}
},
"filter":{
"synonym":{
"type":"synonym",
"ignore_case":true,
"synonyms":[
"crime=>criminality"
]
}
}
}
}
}'
Synonyms in the mappings
In the definition you’ve just seen, we’ve specified the synonym rule in the mappings we send to Elasticsearch. To do that, we needed to add the synonyms property, which is an array of synonym rules. For example, the following part of the mappings definition defines a single synonym rule:
"synonyms":[
"crime=>criminality"
]
The preceding rule tells Elasticsearch to change the crime term to the criminality term when the crime term is encountered during analysis.
Synonyms stored on the file system
Apart from storing the synonyms rules in the mappings, Elasticsearch allows us to use a file-based synonyms rule set. To use a file, we need to specify the synonyms_path property instead of the synonyms one. The synonyms_path property should be set to the name of the file that holds the synonym definitions, and the specified file path is relative to the Elasticsearch config directory. So, if we store our synonyms in the synonyms.txt file and we save that file in the config directory, then, in order to use it, we should set synonyms_path to the value of synonyms.txt.
For example, this is what our synonym filter would look like if we wanted to use the synonyms stored in a file:
"filter":{
"synonym":{
"type":"synonym",
"synonyms_path":"synonyms.txt"
}
}
Defining synonym rules
So far we have discussed what we have to do in order to use synonym expansions in Elasticsearch. Now let’s see what formats of synonyms are allowed.
Using Apache Solr synonyms
The most common synonym structure in the Apache Lucene world is probably the one used by Apache Solr (http://lucene.apache.org/solr/), the search engine built on top of Lucene, just like Elasticsearch is. This is the default way of handling synonyms in Elasticsearch and the possibilities of defining a new synonym are discussed in the following sections.
Explicit synonyms
A simple mapping allows us to map a list of words onto other words. So, in our case, if we want the word criminality to be mapped to crime and the word abuse to be mapped to punishment, we need to define the following entries:
criminality => crime
abuse => punishment
Of course, a single word can be mapped into multiple ones and multiple ones can be mapped into a single one. For example:
star wars, wars => starwars
The preceding example means that star wars and wars will be changed to starwars by the synonym filter.
Equivalent synonyms
In addition to the explicit mapping, Elasticsearch allows us to use equivalent synonyms. For example, the following definition will make all the words exchangeable so that you can use any of them to match a document that has one of them in its contents:
star, wars, star wars, starwars
Expanding synonyms
A synonym filter allows us to use one additional property when it comes to Apache Solr format synonyms: the expand property. When the expand property is set to true (which is the default), all synonyms will be expanded by Elasticsearch to all equivalent forms. For example, let’s say we have the following filter configuration:
"filter":{
"synonym":{
"type":"synonym",
"expand":false,
"synonyms":[
"one,two,three"
]
}
}
Elasticsearch will map the preceding synonym definition to the following:
one, two, three => one
This means that the words one, two, and three will be changed to one. However, if we set the expand property to true, the same synonym definition will be interpreted in the following way:
one, two, three => one, two, three
This basically means that each of the words from the left side of the definition will be expanded to all the words on the right side.
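The rule formats discussed above can be captured in a short parser. The following Python sketch is a simplified model of how a single Solr-style rule could be interpreted; parse_synonym_rule is a hypothetical function, and details such as comments, escaping, and ignore_case are deliberately left out.

```python
def parse_synonym_rule(rule, expand):
    """Map every source term of a rule to the list of terms replacing it.

    Explicit rules (with =>) map the left side to the right side. Equivalent
    rules (a plain comma-separated list) map every term to all terms when
    expand is true, and to the first term only when expand is false.
    """
    if "=>" in rule:
        left, right = rule.split("=>", 1)
        sources = [term.strip() for term in left.split(",")]
        targets = [term.strip() for term in right.split(",")]
    else:
        sources = [term.strip() for term in rule.split(",")]
        targets = sources if expand else sources[:1]
    return {source: list(targets) for source in sources}


print(parse_synonym_rule("criminality => crime", expand=False))
print(parse_synonym_rule("one,two,three", expand=False))
print(parse_synonym_rule("one,two,three", expand=True))
```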
Using WordNet synonyms
If we want to use WordNet-structured synonyms (to learn more about WordNet, visit http://wordnet.princeton.edu/), we need to provide an additional property for our synonym filter. The property name is format and we should set its value to wordnet in order for Elasticsearch to understand that format.
Query or index-time synonym expansion
As with all the analyzers, one can wonder when to use the synonym filter: during indexing, during querying, or maybe during indexing and querying. Of course, it depends on your needs. However, remember that using index-time synonyms requires data reindexing after each synonym change. That’s because they need to be reapplied to all the documents. If we use only the query-time synonyms, we can update the synonym lists and have them applied without data reindexation.
Understanding the explain information
Compared to databases, using systems capable of performing full-text search can often be anything other than obvious. We can search in many fields simultaneously and the data in the index can vary from the ones provided as the values of the document fields (because of the analysis process, synonyms, abbreviations, and others). It’s even worse! By default, search engines sort data by relevance, which means that each document is given a number indicating how similar the document is to the query. The key point here is understanding the how similar phrase. As we discussed in the beginning of the chapter, scoring takes many factors into account: how many searched words were found in the document, how frequent the word is, how many terms are in the field, and so on. This seems complicated and finding out why a document was found and why another document is better is not easy. Fortunately, Elasticsearch provides us with tools that can answer these questions and we will look at them in this section.
Understanding field analysis
One of the common questions asked when analyzing the returned documents is why a given document was not found. In many cases, the problem lies in the mappings definition and the analysis process configuration. For debugging the analysis process, Elasticsearch provides a dedicated REST API endpoint: the _analyze one.
Using it is very simple. Let’s see how it is used by running a request to Elasticsearch to give us information on how the crime and punishment phrase is analyzed. To do that, we will run a command using HTTP GET to the _analyze REST endpoint and we will provide the phrase as the request body. The following command does that:
curl -XGET 'localhost:9200/_analyze?pretty' -d 'Crime and Punishment'
In response, we get the following data:
{
"tokens":[{
"token":"crime",
"start_offset":0,
"end_offset":5,
"type":"<ALPHANUM>",
"position":0
},{
"token":"and",
"start_offset":6,
"end_offset":9,
"type":"<ALPHANUM>",
"position":1
},{
"token":"punishment",
"start_offset":10,
"end_offset":20,
"type":"<ALPHANUM>",
"position":2
}]
}
As we can see, Elasticsearch divided the input phrase into three tokens. During processing, the phrase was divided into tokens on the basis of whitespace characters and was lowercased. This shows us exactly what would be happening during the analysis process. We can also provide the name of the analyzer. For example, we can change the preceding command to something like this:
curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'Crime and Punishment'
The preceding command will allow us to check how the standard analyzer analyzes the data.
It is worth noting that there is another form of analysis API available: one which allows us to provide tokenizers and filters. It is very handy when we want to experiment with configuration before creating the target mappings. Instead of specifying the analyzer parameter in the request, we provide the tokenizer and the filters parameters. We can provide a single tokenizer and a list of filters (separated by a comma character). For example, to illustrate how tokenization using the whitespace tokenizer works with the lowercase and kstem filters, we would run the following request:
curl -XGET 'localhost:9200/library/_analyze?tokenizer=whitespace&filters=lowercase,kstem&pretty' -d 'John Smith'
As we can see, an analysis API can be very useful for tracking down bugs in the mapping configuration. It is also priceless when we want to solve problems with queries and matching. It can show us how our analyzers work, what terms they produce, and what the attributes of those terms are. With such information, query problems will be easier to track down.
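To get a feeling for what the token attributes in the _analyze response mean, we can mimic a very simple analysis chain locally. The Python sketch below is a toy model (split on whitespace, then lowercase), not the standard analyzer; the analyze function is our own and only reproduces the token, offset, and position attributes.

```python
import re


def analyze(text):
    """Split text on whitespace and lowercase it, keeping offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for token in analyze("Crime and Punishment"):
    print(token)
```

For this phrase, the tokens, offsets, and positions are the same as in the response shown earlier; for input containing punctuation, a real analyzer would behave differently.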
Explaining the query
In addition to looking at what happened during analysis, Elasticsearch allows us to explain how the score was calculated for a particular query and document. Let’s look at the following example:
curl -XGET 'localhost:9200/library/book/1/_explain?pretty&q=quiet'
The preceding request specifies a document and a query to run. The document is specified in the URI and the query is passed using the q parameter. Using the _explain endpoint, we ask Elasticsearch for an explanation about how the document was matched by Elasticsearch (or not matched). The response returned by Elasticsearch for the preceding request looks as follows:
{
"_index":"library",
"_type":"book",
"_id":"1",
"matched":true,
"explanation":{
"value":0.057534903,
"description":"sum of:",
"details":[{
"value":0.057534903,
"description":"weight(_all:quiet in 0) [PerFieldSimilarity], result of:",
"details":[{
"value":0.057534903,
"description":"fieldWeight in 0, product of:",
"details":[{
"value":1.0,
"description":"tf(freq=1.0), with freq of:",
"details":[{
"value":1.0,
"description":"termFreq=1.0",
"details":[]
}]
},{
"value":0.30685282,
"description":"idf(docFreq=1,maxDocs=1)",
"details":[]
},{
"value":0.1875,
"description":"fieldNorm(doc=0)",
"details":[]
}]
}]
},{
"value":0.0,
"description":"match on required clause, product of:",
"details":[{
"value":0.0,
"description":"# clause",
"details":[]
},{
"value":3.2588913,
"description":"_type:book, product of:",
"details":[{
"value":1.0,
"description":"boost",
"details":[]
},{
"value":3.2588913,
"description":"queryNorm",
"details":[]
}]
}]
}]
}
}
It can look slightly complicated and, well, it is complicated. It is even worse if we realize that this is only a simple query! Elasticsearch, and more specifically the Lucene library, shows the internal information about the scoring process. We will only scratch the surface and will explain the most important things about the preceding response.
The first thing that you can notice is that for the particular query Elasticsearch provided the information if the document was a match or not. If the matched property is set to true, it means that the document was a match for the provided query.
The next important thing is the explanation object. It contains three properties: the value, the description, and the details. The value is the score calculated for the given part of the query. The description is the simplified text representation of the internal score calculation, and the details object contains detailed information about the score calculation. The nice thing is that the details object will again contain the same three properties and this is how Elasticsearch provides us with information on how the score is calculated.
Forexample,let’sanalyzethefollowingpartoftheresponse:
"value":0.057534903,
"description":"sumof:",
"details":[{
"value":0.057534903,
"description":"weight(_all:quietin0)[PerFieldSimilarity],result
of:",
"details":[{
"value":0.057534903,
"description":"fieldWeightin0,productof:",
"details":[{
"value":1.0,
"description":"tf(freq=1.0),withfreqof:",
"details":[{
"value":1.0,
"description":"termFreq=1.0",
"details":[]
}]
},{
"value":0.30685282,
"description":"idf(docFreq=1, maxDocs=1)",
"details":[]
},{
"value":0.1875,
"description":"fieldNorm(doc=0)",
"details":[]
}]
}]
The score of the element is 0.057534903 (the value property) and it is a sum of (we see that in the description property) all the inner elements. In the description on the first level of nesting of the preceding fragment, we can see that PerFieldSimilarity has been used and that the score of that element is the result of the inner elements, the second level of nesting.
On the second level of details nesting, we can see the fieldWeight element, whose score is the product of the three elements below it. These are internal statistics retrieved from the index: the term frequency, which tells us how often the term occurs in the field of the document (termFreq=1.0); the inverse document frequency, which reflects how rare the term is across the documents in the index (idf(docFreq=1, maxDocs=1)); and the field normalization factor (fieldNorm(doc=0)).
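The arithmetic behind the fieldWeight element can be checked by hand. The following is a minimal Python sketch, assuming the classic Lucene TF/IDF formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function names are ours and are not part of any Elasticsearch API, and the fieldNorm value is simply copied from the response.

```python
import math

def classic_tf(freq):
    # Assumption: classic Lucene similarity uses sqrt(termFreq).
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumption: classic Lucene similarity uses 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

tf = classic_tf(1.0)                       # termFreq=1.0 in the response
idf = classic_idf(doc_freq=1, max_docs=1)  # idf(docFreq=1, maxDocs=1)
field_norm = 0.1875                        # fieldNorm(doc=0), copied from the response

# fieldWeight is the product of the three inner elements.
field_weight = tf * idf * field_norm
print(f"tf={tf} idf={idf:.8f} fieldWeight={field_weight:.9f}")
```

Under these assumptions the product agrees with the 0.057534903 reported by Elasticsearch up to single-precision rounding.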
The Explain API supports the following parameters: analyze_wildcard, analyzer, default_operator, df, fields, lenient, lowercase_expanded_terms, parent, preference, routing, _source, _source_exclude, and _source_include. To learn more about all these parameters, refer to the official Elasticsearch documentation regarding the Explain API, which is available at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html.
Summary
The chapter we just finished focused on querying; not on the matching part of it but mostly on scoring. We learned how Apache Lucene TF/IDF scoring works. We saw the scripting capabilities of Elasticsearch and we handled multilingual data. We used boosting to influence how the scores of the returned documents were calculated and we used synonyms. Finally, we used explain information to see how the document scores were calculated by the query.
In the next chapter, we will fully focus on Elasticsearch data analysis capabilities: the aggregations, their types, and how they can be used.
Chapter 7. Aggregations for Data Analysis
In the previous chapter, we discussed the querying side of Elasticsearch again. We learned how the Lucene TF/IDF algorithm works and how to use Elasticsearch scripting capabilities. We handled multilingual data and influenced document scores with boosts. We used synonyms to match words that have the same meaning and we used the Elasticsearch Explain API to see how document scores were calculated. By the end of this chapter, you will have learned the following topics:
What are aggregations
How the Elasticsearch aggregation engine works
How to use metrics aggregations
How to use buckets aggregations
How to use pipeline aggregations
Aggregations
Introduced in Elasticsearch 1.0, aggregations are the heart of data analytics in Elasticsearch. Highly flexible and performant, aggregations brought Elasticsearch 1.0 to a new position as a full-featured analysis engine. Extended through the life of Elasticsearch 1.x, in 2.x they are yet more powerful, less memory demanding, and faster. With this framework, you can use Elasticsearch as the analysis engine for data extraction and visualization. Let's see how that functionality works and what we can achieve by using it.
General query structure
To use aggregations, we need to add an additional section in our query. In general, our queries with aggregations look like this:
{
"query":{…},
"aggs":{
"aggregation_name":{
"aggregation_type":{
...
}
}
}
}
In the aggs property (you can use aggregations if you want; aggs is just an abbreviation), you can define any number of aggregations. Each aggregation is defined by its name and one of the aggregation types provided by Elasticsearch. One thing to remember, though, is that the key defines the name of the aggregation (you will need it to distinguish particular aggregations in the server response). Let's take our library index and create the first query using aggregations. A command sending such a query looks like this:
curl 'localhost:9200/library/_search?search_type=query_then_fetch&size=0&pretty' -d '{
"aggs":{
"years":{
"stats":{
"field":"year"
}
},
"words":{
"terms":{
"field":"copies"
}
}
}
}'
This query defines two aggregations. The aggregation named years shows statistics for the year field. The words aggregation contains information about the terms used in a given field (copies, in this case).
Note
In our examples we assumed that we perform aggregation in addition to searching. If we don't need the found documents, a better idea is to use the size parameter and set it to 0. This omits some unnecessary work and is more efficient. In such a case, the endpoint should be /library/_search?size=0. You can read more about search types in Chapter 3, Understanding the Querying Process.
Let's now look at the response returned by Elasticsearch for the preceding query:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"words":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[{
"key":0,
"doc_count":2
},{
"key":1,
"doc_count":1
},{
"key":6,
"doc_count":1
}]
},
"years":{
"count":4,
"min":1886.0,
"max":1961.0,
"avg":1928.0,
"sum":7712.0
}
}
}
As you see, both aggregations (years and words) were returned. The first aggregation we defined in our query (years) returned general statistics for the given field, gathered across all the documents that matched our query. The second of the defined aggregations (words) was a bit different. It created several sets called buckets that were calculated on the returned documents, and each of the aggregated values fell within one of these sets. As you can see, there are multiple aggregation types available and they return different results. We will see the differences in the later part of this section.
The great thing about the aggregation engine is that it allows you to have multiple aggregations and that aggregations can be nested. This means that you can have any number of nesting levels and any number of aggregations in general. The extended structure of the query is shown next:
{
"query":{…},
"aggs":{
"first_aggregation_name":{
"aggregation_type":{
...
},
"aggregations":{
"first_nested_aggregation":{
...
},
.
.
.
"nth_nested_aggregation":{
...
}
}
},
.
.
.
"nth_aggregation_name":{
...
}
}
}
Inside the aggregations engine
Aggregations work on the basis of the results returned by the query. This is very handy as we get the information that we are interested in, both from the query and from the data analysis perspective. So what does Elasticsearch do when we include the aggregation part in the request that we send to it? First of all, the aggregation is executed on each relevant shard and the results are returned to the node that is responsible for running that query. That node waits for the partial results to be calculated; after it gets all of them, it merges them, producing the final results.
This approach is nothing new when it comes to distributed systems and how they work and communicate, but it can cause issues when it comes to the precision of the results. In most cases this is not a problem, but you should be aware of what to expect. Let's imagine the following example:
The preceding image shows a simplified view of three shards, each containing documents having only the Elasticsearch and Solr terms in them. Now imagine that we are interested in a single term for our index. The terms aggregation, when run using size=1, would return a single term, the one that is the most frequent (of course limited to the query we've run). So our aggregator node would see partial results telling us that Elasticsearch is present in 19 documents in Shard 1 and that the Solr term is present in 10 documents in each of Shard 2 and Shard 3, which means that the top term is Solr, which is not true. This is an extreme case, but there are use cases (such as accounting) where precision is key and you should be aware of such situations.
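The three-shard scenario can be reproduced with a small simulation. The following Python sketch is only an illustration: it assumes every shard returns exactly its top size terms, and the counts of the non-winning term on each shard (5, 9, and 9) are hypothetical values chosen so that Elasticsearch is the true top term overall. The function names are ours.

```python
from collections import Counter

def merged_top_terms(shard_counts, size):
    """Merge only the top `size` terms of every shard, as a coordinating node would."""
    merged = Counter()
    for counts in shard_counts:
        for term, count in Counter(counts).most_common(size):
            merged[term] += count
    return merged.most_common(size)

def exact_top_terms(shard_counts, size):
    """Reference result computed from the complete per-shard counts."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return total.most_common(size)

# Winning counts (19, 10, 10) follow the text; the other counts are hypothetical.
shards = [
    {"Elasticsearch": 19, "Solr": 5},
    {"Elasticsearch": 9, "Solr": 10},
    {"Elasticsearch": 9, "Solr": 10},
]

approx = merged_top_terms(shards, size=1)
exact = exact_top_terms(shards, size=1)
print("merged from shard top-1:", approx)
print("exact:", exact)
```

With these numbers the merged result reports Solr, while the exact count favors Elasticsearch, which is the precision problem described above.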
Note
Compared to queries, aggregations are heavier for Elasticsearch in terms of both CPU cycles and memory consumption. We will discuss this in more detail in the Caching Aggregations section of this chapter.
Aggregation types
Elasticsearch 2.x allows us to use three types of aggregation: metrics, buckets, and pipeline. The metrics aggregations return a metric, just like the stats aggregation we used for the year field. The bucket aggregations return buckets, the key and the number of documents sharing the same values, ranges, and so on, just like the terms aggregation we used for the copies field. Finally, the pipeline aggregations introduced in Elasticsearch 2.0 aggregate the output of the other aggregations and their metrics, which allows us to do even more sophisticated data analysis. Knowing all that, let's now look at all the aggregations we can use in Elasticsearch 2.x.
Metrics aggregations
We will start with the metrics aggregations, which aggregate values from documents into a single metric. This is always the case with metrics aggregations: you can expect them to produce a single metric on the basis of the data. Let's now take a look at the metrics aggregations available in Elasticsearch 2.x.
Minimum, maximum, average, and sum
The first group of metrics aggregations that we want to show you is the one that calculates basic values from the given documents. These aggregations are:
min: This calculates the minimum value from the given numeric field in the returned documents
max: This calculates the maximum value from the given numeric field in the returned documents
avg: This calculates an average from the given numeric field in the returned documents
sum: This calculates the sum from the given numeric field in the returned documents
As you can see, the preceding aggregations are pretty self-explanatory. So, let's try to calculate the average value on our data. For example, let's assume that we want to calculate the average number of copies for our books. The query to do that will look as follows:
{
"aggs":{
"avg_copies":{
"avg":{
"field":"copies"
}
}
}
}
The results returned by Elasticsearch after running the preceding query will be as follows:
{
"took":5,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"avg_copies":{
"value":1.75
}
}
}
So, we have an average of 1.75 copies per book. It is very easy to verify: (6 + 0 + 1 + 0) / 4 is equal to 1.75. It seems that we got it right.
Missing values
The nice thing about the previously mentioned aggregations is that we can control what value Elasticsearch should use if the field we've specified has no value in a document. For example, if we wanted Elasticsearch to use 0 as the value for the copies field in our previous example, we would add the missing property to our query and set it to 0. For example:
{
"aggs":{
"avg_copies":{
"avg":{
"field":"copies",
"missing":0
}
}
}
}
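The effect of the missing property can be illustrated outside Elasticsearch. The following Python sketch is a simplified model, assuming that documents without the field are skipped unless a missing value is given; the average function and the use of None to mark an absent field are our own conventions.

```python
def average(values, missing=None):
    """Average numeric values; None marks a document that has no value in the field."""
    if missing is not None:
        # Substitute the configured default for documents without a value.
        values = [missing if v is None else v for v in values]
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

copies = [6, 0, 1, 0]            # the four values from the example above
with_gap = [6, None, 1, 0]       # hypothetical: one document lacks the copies field

all_present = average(copies)
skipped = average(with_gap)
defaulted = average(with_gap, missing=0)
print(all_present, skipped, defaulted)
```

In this model, skipping the document without a value changes the divisor, while supplying missing=0 keeps all four documents in the calculation.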
Using scripts
The input values can also be generated by a script. For example, if we want to find the minimum value from all the values in the year field, but we want to subtract 1000 from those values, we will send an aggregation like the following one:
{
"aggs":{
"min_year":{
"min":{
"script":"doc['year'].value-1000"
}
}
}
}
Note
Note that the preceding query requires inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.
In this case, the value the aggregation will use is the original year field value reduced by 1000.
We can also use the value script capabilities of Elasticsearch. For example, to achieve the same as the previous script, we can use the following query:
{
"aggs":{
"min_year":{
"min":{
"field":"year",
"script":{
"inline":"_value-factor",
"params":{
"factor":1000
}
}
}
}
}
}
If you are not familiar with Elasticsearch scripting capabilities, you can read more about them in the Scripting capabilities of Elasticsearch section of Chapter 6, Make Your Search Better.
One thing worth remembering is that using the command line may require proper escaping of the values in the doc array. For example, the command that executes the first scripted query would look as follows:
curl -XGET 'localhost:9200/library/_search?size=0&pretty' -d '{
"aggs":{
"min_year":{
"min":{
"script":"doc[\"year\"].value-1000"
}
}
}
}'
Field value statistics and extended statistics
The next aggregations we will discuss are the ones that provide us with statistical information about the numeric field we are running the aggregation on: the stats and extended_stats aggregations.
For example, the following query provides extended statistics for the year field:
{
"aggs":{
"extended_statistics":{
"extended_stats":{
"field":"year"
}
}
}
}
The response to the preceding query will be as follows:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"extended_statistics":{
"count":4,
"min":1886.0,
"max":1961.0,
"avg":1928.0,
"sum":7712.0,
"sum_of_squares":1.4871654E7,
"variance":729.5,
"std_deviation":27.00925767213901,
"std_deviation_bounds":{
"upper":1982.018515344278,
"lower":1873.981484655722
}
}
}
}
As you can see, in the response we got information about the number of documents with a value in the year field, the minimum value, the maximum value, the average, and the sum. These are the values that we would also get if we ran the stats aggregation instead of extended_stats. The extended_stats aggregation provides additional information, such as the sum of squares, variance, and standard deviation. Elasticsearch provides two types of aggregations because extended_stats is slightly more expensive when it comes to processing power.
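The numbers in the response can be recomputed by hand. The following Python sketch is an illustration only: the four year values (1886, 1929, 1936, 1961) are not listed in this section and are inferred from the aggregation responses shown in this chapter, and the bounds are assumed to be the average plus and minus two standard deviations, which is consistent with the std_deviation_bounds values above.

```python
import math

def extended_stats(values, sigma=2.0):
    """Recompute the fields of the extended_stats response from raw values."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Population variance: mean of squares minus square of the mean.
    variance = sum_of_squares / count - avg * avg
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "upper": avg + sigma * std_deviation,
        "lower": avg - sigma * std_deviation,
    }

years = [1886, 1929, 1936, 1961]  # assumption: inferred from the responses, not given here
stats = extended_stats(years)
for name, value in stats.items():
    print(name, value)
```

The variance here is the population variance (divided by the count, not by count minus one), which is what makes the result match the 729.5 in the response.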
Note
The stats and extended_stats aggregations, similar to the min, max, avg, and sum aggregations, support scripting and allow us to specify which value should be used for documents that don't have a value in the specified field.
Value count
The value_count aggregation is a simple aggregation which allows counting values in aggregated documents. This is quite useful when used with nested aggregations. We are not focusing on that topic right now, but it is something to keep in mind. For example, to use the value_count aggregation on the copies field, we will run the following query:
{
"aggs":{
"count":{
"value_count":{
"field":"copies"
}
}
}
}
Note
The value_count aggregation allows us to use scripts, discussed earlier in this chapter when we described the min, max, avg, and sum aggregations. Please refer to the beginning of the Metrics aggregations section earlier in the current chapter for further reference.
Field cardinality
The cardinality aggregation calculates the count of distinct values in a given field, and it is one of the aggregations that let us control how resource hungry it will be by controlling its precision. However, one thing needs to be remembered: the calculated count is an approximation, not the exact value. Elasticsearch uses the HyperLogLog++ algorithm (http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) to calculate the value.
This aggregation has a wide variety of use cases, such as showing the number of distinct values in a field that holds the status code for your indexed Apache access logs. One query, and you know the approximate count of the distinct values in that field.
For example, we can request the cardinality for our title field:
{
"aggs":{
"card_title":{
"cardinality":{
"field":"title"
}
}
}
}
To control the precision of the cardinality calculation, we can specify the precision_threshold property: the higher the value, the more precise the aggregation will be and the more resources it will need. The current maximum precision_threshold value is 40000 and the default depends on the parent aggregation. An example query using the precision_threshold property looks as follows:
{
"aggs":{
"card_title":{
"cardinality":{
"field":"title",
"precision_threshold":1000
}
}
}
}
Percentiles
The percentiles aggregation is another example of an aggregation in Elasticsearch that uses an algorithmic approximation approach to provide us with results. It uses the T-Digest algorithm (https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) from Ted Dunning and Otmar Ertl and allows us to calculate percentiles: metrics that show the value below which a given percentage of the observed values fall. For example, the 99th percentile shows us the value that is greater than 99 percent of the other values.
Let's go into an example and look at a query that will calculate percentiles for the year field in our data:
{
"aggs":{
"copies_percentiles":{
"percentiles":{
"field":"year"
}
}
}
}
The results returned by Elasticsearch for the preceding request will look as follows:
{
"took":26,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"copies_percentiles":{
"values":{
"1.0":1887.2899999999997,
"5.0":1892.4499999999998,
"25.0":1918.25,
"50.0":1932.5,
"75.0":1942.25,
"95.0":1957.25,
"99.0":1960.25
}
}
}
}
As you can see, the value that is higher than 99 percent of the values is 1960.25.
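To get a feeling for where such numbers come from, here is a minimal Python sketch of an exact percentile calculation using linear interpolation between sorted values. This is not the T-Digest algorithm that Elasticsearch uses; it is a plain textbook method that happens to agree with the response for this tiny dataset. The year values are an assumption inferred from the responses in this chapter.

```python
def percentile(values, p):
    """Exact percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    position = (p / 100.0) * (len(ordered) - 1)   # fractional index into the sorted list
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

years = [1886, 1929, 1936, 1961]  # assumption: inferred from the responses, not given here
results = {p: percentile(years, p) for p in (1, 5, 25, 50, 75, 95, 99)}
for p, value in results.items():
    print(p, round(value, 2))
```

On large datasets an approximate algorithm such as T-Digest avoids sorting and keeping all the values in memory, which is why its results may differ slightly from an exact calculation like this one.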
You may wonder why such an aggregation is important. It is very useful for performance metrics, for example, where we usually look at averages for some period of time. Imagine that the average response time of our queries for the last hour is 50 milliseconds, which is not bad. However, if the 95th percentile showed 2 seconds, that would mean that about 5 percent of the users had to wait two or more seconds for the search results, which is not that good.
By default, the percentiles aggregation calculates seven percentiles: 1, 5, 25, 50, 75, 95, and 99. We can control this by using the percents property and specifying which percentiles we are interested in. For example, if we want to get only the 95th and the 99th percentile, we change our query to the following one:
{
"aggs":{
"copies_percentiles":{
"percentiles":{
"field":"year",
"percents":["95","99"]
}
}
}
}
Note
Similar to the min, max, avg, and sum aggregations, the percentiles aggregation supports scripting and allows us to specify which value should be used for documents that don't have a value in the specified field.
We've mentioned earlier that the percentiles aggregation uses an algorithmic approach and is an approximation. As with all approximations, we can control the precision and memory usage of the algorithm. We do that by using the compression property, which defaults to 100. It is an internal property of Elasticsearch and its implementation details may change between versions. It is worth knowing that setting the compression to a value higher than 100 can increase the algorithm precision at the cost of memory usage.
Percentile ranks
The percentile_ranks aggregation is similar to the percentiles one that we just discussed. It allows us to show which percentile a given value has. For example, to show which percentile the years 1932 and 1960 are, we run the following query:
{
"aggs":{
"copies_percentile_ranks":{
"percentile_ranks":{
"field":"year",
"values":["1932","1960"]
}
}
}
}
The response returned by Elasticsearch will be as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"copies_percentile_ranks":{
"values":{
"1932.0":49.5,
"1960.0":61.5
}
}
}
}
Top hits aggregation
The top_hits aggregation keeps track of the most relevant documents being aggregated. This doesn't sound very appealing, but it allows us to implement one of the most desired functionalities in Elasticsearch, called document grouping, field collapsing, or document folding. Such functionality is very useful in some use cases; for example, when we want to show a book catalog but only one book per publisher. To do that without the top_hits aggregation, we would have to run multiple queries. With the top_hits aggregation, we need only a single query.
The top_hits aggregation was introduced in Elasticsearch 1.3. In fact, the mentioned document folding is more or less a side effect and only one of the possible usage examples of the top_hits aggregation.
The idea behind the top_hits aggregation is simple. Every document that is assigned to a particular bucket can also be remembered. By default, only three documents per bucket are remembered.
Note
Note that, in order to show the full potential of the top_hits aggregation, we decided to use one of the bucketing aggregations as well and nest them to show the document grouping functionality implementation. The bucketing aggregations are described in detail later in this chapter.
To show you a potential use case that leverages the top_hits aggregation, we have decided to use a simple example. We would like to get the most relevant book published in every 100-year period. To do that we use the following query:
{
"aggs":{
"when":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"book":{
"top_hits":{
"_source":{
"include":["title","available"]
},
"size":1
}
}
}
}
}
}
In the preceding example, we ran the histogram aggregation on year ranges. A bucket was created for every one hundred years. The nested top_hits aggregation remembers a single document with the greatest score from each bucket (because the size property is set to 1). We added the include option only for simpler results, so that we only return the title and available fields for every aggregated document. The response returned by Elasticsearch will be as follows:
{
"took":8,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"when":{
"buckets":[{
"key":1800,
"doc_count":1,
"book":{
"hits":{
"total":1,
"max_score":1.0,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"4",
"_score":1.0,
"_source":{
"available":true,
"title":"Crime and Punishment"
}
}]
}
}
},{
"key":1900,
"doc_count":3,
"book":{
"hits":{
"total":3,
"max_score":1.0,
"hits":[{
"_index":"library",
"_type":"book",
"_id":"2",
"_score":1.0,
"_source":{
"available":false,
"title":"Catch-22"
}
}]
}
}
}]
}
}
}
We can see that, because of the top_hits aggregation, we have the top-scoring document (from each bucket) included in the response. In our particular case, the query was the match_all one and all the documents had the same score, so the top-scoring document for every bucket was more or less random. However, you need to remember that this is the default behavior. If we want custom sorting, this is not a problem for Elasticsearch. We just need to add the sort property to our top_hits aggregation. For example, we can return the first book from a given century:
{
"aggs":{
"when":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"book":{
"top_hits":{
"sort":{
"year":"asc"
},
"_source":{
"include":["title","available"]
},
"size":1
}
}
}
}
}
}
We added sorting to the top_hits aggregation, so the results are sorted on the basis of the year field. This means that the first document will be the one with the lowest value in that field and this is the document that is going to be returned for each bucket.
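The combination of a histogram with a nested, sorted top_hits aggregation can be modeled in a few lines. The following Python sketch is a simplified illustration, not how Elasticsearch implements it: the bucket key is assumed to be the field value rounded down to a multiple of the interval, the function name is ours, and the years and the two placeholder titles are assumptions, since this section does not list the documents.

```python
from collections import defaultdict

def histogram_top_hits(docs, field, interval, size=1, sort_key=None):
    """Group docs into fixed-width buckets and keep the first `size` docs of each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        # Bucket key: the value rounded down to a multiple of the interval.
        key = (doc[field] // interval) * interval
        buckets[key].append(doc)
    result = {}
    for key in sorted(buckets):
        hits = sorted(buckets[key], key=sort_key) if sort_key else buckets[key]
        result[key] = hits[:size]
    return result

# Hypothetical sample data; years and placeholder titles are assumptions.
books = [
    {"title": "Crime and Punishment", "year": 1886},
    {"title": "Catch-22", "year": 1961},
    {"title": "Hypothetical book A", "year": 1929},
    {"title": "Hypothetical book B", "year": 1936},
]

first_per_century = histogram_top_hits(
    books, "year", 100, size=1, sort_key=lambda doc: doc["year"]
)
for key, hits in first_per_century.items():
    print(key, [hit["title"] for hit in hits])
```

Sorting inside each bucket before truncating to size is the part that corresponds to the sort property of top_hits.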
Additional parameters
Sorting and field inclusion are not everything that we can do inside the top_hits aggregation. Because this aggregation returns documents, we can also use functionalities such as:
highlighting
explain
scripting
fielddata field (uninverted representation of the fields)
version
We just need to include an appropriate section in the top_hits aggregation body, similar to what we do when we construct a query. For example:
{
"aggs":{
"when":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"book":{
"top_hits":{
"highlight":{
"fields":{
"title":{}
}
},
"explain":true,
"version":true,
"_source":{
"include":["title","available"]
},
"fielddata_fields":["title"],
"script_fields":{
"century":{
"script":"(doc[\"year\"].value/100).intValue()"
}
},
"size":1
}
}
}
}
}
}
Note
Note that the preceding query requires inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.
Geo bounds aggregation
The geo_bounds aggregation is a simple aggregation that allows us to compute the bounding box that includes all the geo_point type field values from the aggregated documents.
Note
If you are interested in spatial searches, the section dedicated to them is called Geo and is included in Chapter 8, Beyond Full-text Searching.
We only need to provide the field (by using the field property; it needs to be of the geo_point type). We can also provide wrap_longitude (values true or false; it defaults to true) if the bounding box is allowed to overlap the international date line. In response, we get the latitude and longitude of the top-left and bottom-right corners of the bounding box. An example query using this aggregation looks as follows (using the hypothetical location field):
{
"aggs":{
"box":{
"geo_bounds":{
"field":"location"
}
}
}
}
Scripted metrics aggregation
The last metric aggregation we want to discuss is the scripted_metric aggregation, which allows us to define our own aggregation calculation using scripts. For this aggregation, we can provide the following scripts (map_script is the only required one, the rest are optional):
init_script: This script is run during initialization and allows us to set up an initial state of the calculation.
map_script: This is the only required script. It is executed once for every collected document and stores its calculation in an object called _agg.
combine_script: This script is executed once on each shard after Elasticsearch finishes document collection on that shard.
reduce_script: This script is executed once on the node that is coordinating a particular query execution. This script has access to the _aggs variable, which is an array of the values returned by combine_script.
For example, we can use the scripted_metric aggregation to calculate all the copies of all the books we have in our library by running the following request (we show the whole request to show how the names are escaped):
curl -XGET 'localhost:9200/library/_search?size=0&pretty' -d '{
"aggs":{
"all_copies":{
"scripted_metric":{
"init_script":"_agg[\"all_copies\"]=0",
"map_script":"_agg.all_copies+=doc.copies.value",
"combine_script":"return _agg.all_copies",
"reduce_script":"sum=0;for(number in _aggs){sum+=number};return sum"
}
}
}
}'
Of course, the preceding script is just a simple sum and we could use the sum aggregation, but we just wanted to show you a simple example of what you can do with the scripted_metric aggregation.
Note
Note that the preceding query requires inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.
As you can see, the init_script part of the aggregation is used to initialize the all_copies variable. Next, we have map_script, which is executed once for every document and just adds the value of the copies field to the earlier initialized variable. The combine_script part, executed once on each shard, tells Elasticsearch to return the calculated variable. Finally, the reduce_script part, executed once for the whole query on the aggregator node, will run a for loop, which will go through all the returned values that are stored in the _aggs array and return the sum of those. The final result returned by Elasticsearch for the preceding query looks as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"all_copies":{
"value":7
}
}
}
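The four phases of scripted_metric (init, map, combine, reduce) can be simulated in plain Python to make the data flow explicit. This is only a sketch of the control flow: the function names are ours, and the way the four documents are spread over the shards is hypothetical.

```python
def scripted_metric(shards, init, map_doc, combine, reduce):
    """Run init/map/combine per shard, then reduce the per-shard values once."""
    partials = []
    for docs in shards:
        agg = {}                  # plays the role of the per-shard _agg object
        init(agg)
        for doc in docs:
            map_doc(agg, doc)
        partials.append(combine(agg))
    return reduce(partials)       # partials plays the role of the _aggs array

def init(agg):
    agg["all_copies"] = 0

def map_doc(agg, doc):
    agg["all_copies"] += doc["copies"]

def combine(agg):
    return agg["all_copies"]

def reduce(aggs):
    return sum(aggs)

# Hypothetical distribution of the four books (copies 6, 0, 1, 0) over three shards.
shards = [[{"copies": 6}, {"copies": 0}], [{"copies": 1}], [{"copies": 0}]]
total = scripted_metric(shards, init, map_doc, combine, reduce)
print(total)
```

Only the small per-shard values travel to the coordinating step, which is the reason for splitting the work into combine and reduce.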
Buckets aggregations
The second type of aggregations that we will discuss are the buckets aggregations. In comparison to metrics aggregations, a bucket aggregation returns data not as a single metric but as a list of key-value pairs called buckets. For example, the terms aggregation returns the number of documents associated with each term in a given field. The very powerful thing about buckets aggregations is that they can have sub-aggregations, which means that we can nest other aggregations inside the aggregations that return buckets (we will discuss this at the end of the buckets aggregation discussion). Let's look at the bucket aggregations that are provided by Elasticsearch now.
Filter aggregation
The filter aggregation is a simple bucketing aggregation that allows us to filter the results to a single bucket. For example, let's assume that we want to get a count and the average copies count of all the books that are novels, which means they have the term novel in the tags field. The query that will return such results looks as follows:
{
"aggs":{
"novels_count":{
"filter":{
"term":{
"tags":"novel"
}
},
"aggs":{
"avg_copies":{
"avg":{
"field":"copies"
}
}
}
}
}
}
As you can see, we defined the filter in the filter section of the aggregation definition and we defined a second, nested aggregation. The nested aggregation is the one that will be run on the filtered documents.
The response returned by Elasticsearch looks as follows:
{
"took":13,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"novels_count":{
"doc_count":2,
"avg_copies":{
"value":3.5
}
}
}
}
In the returned bucket, we have information about the number of documents (represented by the doc_count property) and the average number of copies, which is all we wanted.
Filters aggregation
The second bucket aggregation we want to show you is the filters aggregation. While the previously discussed filter aggregation resulted in a single bucket, the filters aggregation returns multiple buckets, one for each of the defined filters. Let's extend our previous example and assume that, in addition to the average number of copies for the novels, we also want to know the average number of copies for the books that are available. The query that will get us this information will use the filters aggregation and will look as follows:
{
"aggs":{
"count":{
"filters":{
"filters":{
"novels":{
"term":{
"tags":"novel"
}
},
"available":{
"term":{
"available":true
}
}
}
},
"aggs":{
"avg_copies":{
"avg":{
"field":"copies"
}
}
}
}
}
}
Let's stop here and look at the definition of the aggregation. As you can see, we defined two filters using the filters section of the filters aggregation. Each filter has a name and the actual Elasticsearch filter; the first is called novels and the second is called available. Elasticsearch will use these names in the returned response. The thing to remember is that Elasticsearch will create a bucket for each defined filter and will calculate the nested aggregation that we defined; in our case, the one that calculates the average number of copies.
Note
The filters aggregation allows us to return one more bucket in addition to the defined ones: a bucket with all the documents that didn't match the filters. In order to calculate such a bucket, we need to add the other_bucket property to the body of the aggregation and set it to true.
The results returned by Elasticsearch are as follows:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"count":{
"buckets":{
"novels":{
"doc_count":2,
"avg_copies":{
"value":3.5
}
},
"available":{
"doc_count":2,
"avg_copies":{
"value":0.5
}
}
}
}
}
}
As you can see, we got two buckets, which is what we expected.
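The behavior of the filters aggregation with a nested avg can be modeled with named predicates. The following Python sketch is an illustration only: the four documents are hypothetical values chosen to be consistent with the responses shown in this chapter, the function names are ours, and the "_other_" key used for the unmatched bucket is an assumption about the default key name.

```python
def summarize(docs):
    """Nested aggregation: document count plus the average of the copies field."""
    avg = sum(d["copies"] for d in docs) / len(docs) if docs else None
    return {"doc_count": len(docs), "avg_copies": avg}

def filters_agg(docs, filters, other_bucket=False):
    """Create one bucket per named filter; a document may land in several buckets."""
    buckets = {name: summarize([d for d in docs if match(d)])
               for name, match in filters.items()}
    if other_bucket:
        unmatched = [d for d in docs if not any(m(d) for m in filters.values())]
        buckets["_other_"] = summarize(unmatched)  # key name is an assumption
    return buckets

# Hypothetical documents, consistent with the doc counts and averages shown above.
books = [
    {"copies": 6, "tags": ["novel"], "available": False},
    {"copies": 1, "tags": ["novel"], "available": True},
    {"copies": 0, "tags": ["classics"], "available": True},
    {"copies": 0, "tags": ["classics"], "available": False},
]

filters = {
    "novels": lambda d: "novel" in d["tags"],
    "available": lambda d: d["available"],
}
buckets = filters_agg(books, filters, other_bucket=True)
for name, bucket in buckets.items():
    print(name, bucket)
```

Note that the buckets are not exclusive: the available novel is counted in both the novels and the available bucket.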
Terms aggregation
One of the most commonly used bucket aggregations is the terms aggregation. It allows us to get information about the terms and the count of documents having those terms. For example, one of the simplest uses is getting the count of the books that are available and not available. We can do that by running the following query:
{
"aggs":{
"counts":{
"terms":{
"field":"available"
}
}
}
}
In the response, we will get two buckets (because the Boolean field can only have two values: true and false). Here, this will look as follows:
{
"took":7,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"counts":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[{
"key":0,
"key_as_string":"false",
"doc_count":2
},{
"key":1,
"key_as_string":"true",
"doc_count":2
}]
}
}
}
By default, the data is sorted on the basis of document count, which means that the most common terms will be placed on top of the aggregation results. Of course, we can control this behavior by specifying the order property and providing the order just like we usually do when sorting by arbitrary field values. Elasticsearch allows us to sort by the document count (using the _count static value) and by the term (using the _term static value). For example, if we want to sort our preceding aggregation results by descending term, we can run the following query:
{
"aggs":{
"counts":{
"terms":{
"field":"available",
"order":{"_term":"desc"}}
}
}
}
However,that’snotallwhenitcomestosorting.Wecanalsosortbytheresultsofthenestedaggregationsthatwereincludedinthequery.
Notetermsaggregation,similartothemin,max,avg,andsumaggregationsdiscussedinthemetricsaggregationsectionofthischapter,supportsscriptingandallowsustospecifywhichvalueshouldbeusedforthefieldsthatdon’thaveavalueinthespecifiedfield.
Counts are approximate
The thing to remember when discussing the terms aggregation is that the counts are approximate. This is because each shard calculates its own counts and returns only its top terms to the coordinating node. The coordinating node merges the information it received and returns the final result to the client. Because of that, depending on the data and how it is distributed between the shards, some information about the counts may be lost and the counts will not be exact. Of course, when dealing with low cardinality fields, the approximation will be closer to exact numbers, but still this is something that should be considered when using the terms aggregation.
We can control how much information is returned from each of the shards to the coordinating node. We can do this by specifying the size and the shard_size properties. The size property specifies how many buckets will be returned at most. The higher the size property, the more accurate the calculation will be. However, that will cost us additional memory and CPU cycles, which means that the calculation will be more expensive and will put more pressure on the hardware. This is because the results returned to the coordinating node from each shard will be larger and the result merging process will be harder.
The shard_size property lets us tune this trade-off separately from the size of the final result. When set, the coordinating node will fetch (from each shard) the number of buckets determined by the shard_size property. This allows us to increase the precision of the aggregation without enlarging the list of buckets returned to the client. Remember that the shard_size property cannot be smaller than the size property.
Finally, the size property can be set to 0, which will tell Elasticsearch not to limit the number of returned buckets. It is usually not wise to set the size property to 0 as it can result in high resource consumption. Also, avoid setting the size property to 0 for high cardinality fields as this will likely make your Elasticsearch cluster explode.
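The effect of shard_size on accuracy can be illustrated with a small simulation. This is only a sketch in plain Python, not Elasticsearch code: the three shards and their term values are made up, and the merge logic is a simplified model of what the coordinating node does.

```python
from collections import Counter

def shard_top_terms(docs, shard_size):
    """Return the shard_size most frequent terms on one shard with their counts."""
    return dict(Counter(docs).most_common(shard_size))

def merged_terms(shards, size, shard_size):
    """Merge the per-shard partial results and keep the top `size` terms."""
    merged = Counter()
    for docs in shards:
        merged.update(shard_top_terms(docs, shard_size))
    return merged.most_common(size)

# Hypothetical term values stored on three shards.
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["a"] * 5 + ["c"] * 4 + ["b"] * 3,
    ["b"] * 5 + ["c"] * 4 + ["a"] * 1,
]

exact = Counter(term for docs in shards for term in docs)
print("exact       :", exact.most_common(2))
print("shard_size=2:", merged_terms(shards, size=2, shard_size=2))
print("shard_size=3:", merged_terms(shards, size=2, shard_size=3))
```

With shard_size equal to 2, every shard drops its least frequent term, so the merged counts are too low and even the order of the top terms is wrong. With shard_size equal to 3, every shard reports all of its terms and the merged result matches the exact counts.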
Minimum document count
Elasticsearch provides us with two additional properties, which can be useful in certain situations: min_doc_count and shard_min_doc_count. The min_doc_count property defaults to 1 and specifies how many documents must match a term to be included in the aggregation results. One thing to remember is that setting the min_doc_count property to 0 will result in returning all the terms, no matter if they have a matching document or not. This can result in a very large result set for aggregation results. For example, if we want to return terms matched by 5 or more documents, we will run the following query:
{
"aggs":{
"counts":{
"terms":{
"field":"available",
"min_doc_count":5}
}
}
}
The shard_min_doc_count property is very similar and defines how many documents must match a term to be included in the aggregation’s results, but on the shard level.
Range aggregation
The range aggregation allows us to define one or more ranges and Elasticsearch calculates buckets for them. For example, if we want to check how many books were published in a given period of time, we create the following query:
{
"aggs":{
"years":{
"range":{
"field":"year",
"ranges":[
{"to":1850},
{"from":1851,"to":1900},
{"from":1901,"to":1950},
{"from":1951,"to":2000},
{"from":2001}
]
}
}
}
}
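We specify the field we want the aggregation to be calculated on and the array of ranges. Each range is defined by one or two properties: to and from, similar to the range queries which we already discussed. The from value is included in the range, while the to value is excluded from it.
The result returned by Elasticsearch for our data looks as follows: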
{
"took":23,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"years":{
"buckets":[{
"key":"*-1850.0",
"to":1850.0,
"to_as_string":"1850.0",
"doc_count":0
},{
"key":"1851.0-1900.0",
"from":1851.0,
"from_as_string":"1851.0",
"to":1900.0,
"to_as_string":"1900.0",
"doc_count":1
},{
"key":"1901.0-1950.0",
"from":1901.0,
"from_as_string":"1901.0",
"to":1950.0,
"to_as_string":"1950.0",
"doc_count":2
},{
"key":"1951.0-2000.0",
"from":1951.0,
"from_as_string":"1951.0",
"to":2000.0,
"to_as_string":"2000.0",
"doc_count":1
},{
"key":"2001.0-*",
"from":2001.0,
"from_as_string":"2001.0",
"doc_count":0
}]
}
}
}
For example, between 1901 and 1950 we had two books released.
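The bucketing rule can be sketched in a few lines of Python. This is an illustration only, not how Elasticsearch implements it; the four publication years are inferred from the statistics shown later in this chapter and should be treated as an assumption about the library index.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for spec in ranges:
        low, high = spec.get("from"), spec.get("to")
        doc_count = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        low_label = "*" if low is None else str(float(low))
        high_label = "*" if high is None else str(float(high))
        buckets.append({"key": f"{low_label}-{high_label}", "doc_count": doc_count})
    return buckets

# Assumed publication years of the four books in the library index.
years = [1886, 1929, 1936, 1961]
ranges = [
    {"to": 1850},
    {"from": 1851, "to": 1900},
    {"from": 1901, "to": 1950},
    {"from": 1951, "to": 2000},
    {"from": 2001},
]
for bucket in range_buckets(years, ranges):
    print(bucket)

# Ranges may overlap; a document is then counted in every matching bucket.
print(range_buckets(years, [{"to": 1950}, {"from": 1900}]))
```

The first call reproduces the document counts from the response above (0, 1, 2, 1, 0). The second call uses two overlapping ranges, and the books from 1929 and 1936 are counted in both buckets.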
Note
The range aggregation, similar to the min, max, avg, and sum aggregations discussed in the metrics aggregations section of this chapter, supports scripting and allows us to specify which value should be used for the documents that don’t have a value in the specified field.
Keyed buckets
One thing that should be mentioned when it comes to the range aggregation is that we can give the defined ranges names. For example, let’s assume that we want to use the names Before 18th century for the books released before 1799, 18th century for the books released between 1800 and 1899, 19th century for the books released between 1900 and 1999, and After 19th century for the books released from 2000 onward. We can do this by adding the key property to each defined range, giving it the name, and adding the keyed property set to true. Setting the keyed property to true makes Elasticsearch return the buckets as an object whose entries are identified by unique strings, and the key property defines the name that will be used as that unique string. A query that does that will look as follows:
{
"aggs":{
"years":{
"range":{
"field":"year",
"keyed":true,
"ranges":[
{"key":"Before 18th century","to":1799},
{"key":"18th century","from":1800,"to":1899},
{"key":"19th century","from":1900,"to":1999},
{"key":"After 19th century","from":2000}
]
}
}
}
}
The response returned by Elasticsearch in such a case will look as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"years":{
"buckets":{
"Before 18th century":{
"to":1799.0,
"to_as_string":"1799.0",
"doc_count":0
},
"18th century":{
"from":1800.0,
"from_as_string":"1800.0",
"to":1899.0,
"to_as_string":"1899.0",
"doc_count":1
},
"19th century":{
"from":1900.0,
"from_as_string":"1900.0",
"to":1999.0,
"to_as_string":"1999.0",
"doc_count":3
},
"After 19th century":{
"from":2000.0,
"from_as_string":"2000.0",
"doc_count":0
}
}
}
}
}
Note
An important and quite useful point about the range aggregation is that the defined ranges need not be disjoint. In such cases, Elasticsearch will properly count the document for multiple buckets.
Date range aggregation
The date_range aggregation is similar to the previously discussed range aggregation but it is designed for fields that use date-based types. However, in the library index, the documents have years, but the field is a number, not a date. For the purpose of showing how this aggregation works, let’s imagine that we want to extend our library index to support newspapers. To do this we will create a new index (called library2) by using the following command:
curl -XPOST localhost:9200/_bulk --data-binary '{"index":{"_index":"library2","_type":"book","_id":"1"}}
{"title":"Fishing news","published":"2010/12/03 10:00:00","copies":3,"available":true}
{"index":{"_index":"library2","_type":"book","_id":"2"}}
{"title":"Knitting magazine","published":"2010/11/07 11:32:00","copies":1,"available":true}
{"index":{"_index":"library2","_type":"book","_id":"3"}}
{"title":"The guardian","published":"2009/07/13 04:33:00","copies":0,"available":false}
{"index":{"_index":"library2","_type":"book","_id":"4"}}
{"title":"Hadoop World","published":"2012/01/01 04:00:00","copies":6,"available":true}
'
For the purpose of this example, we will leave the mappings definition to Elasticsearch, which is sufficient in this case. Let’s start with the first query using the date_range aggregation:
{
"aggs":{
"years":{
"date_range":{
"field":"published",
"ranges":[
{"to":"2009/12/31"},
{"from":"2010/01/01","to":"2010/12/31"},
{"from":"2011/01/01"}
]
}
}
}
}
Compared with the ordinary range aggregation, the only thing that changed is the aggregation type, which is now date_range. The dates can be passed as a string in a form recognized by Elasticsearch or as a number value (number of milliseconds since 1970-01-01). The response returned by Elasticsearch for the preceding query looks as follows:
{
"took":5,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"years":{
"buckets":[{
"key":"*-2009/12/31 00:00:00",
"to":1.2622176E12,
"to_as_string":"2009/12/31 00:00:00",
"doc_count":1
},{
"key":"2010/01/01 00:00:00-2010/12/31 00:00:00",
"from":1.262304E12,
"from_as_string":"2010/01/01 00:00:00",
"to":1.2937536E12,
"to_as_string":"2010/12/31 00:00:00",
"doc_count":2
},{
"key":"2011/01/01 00:00:00-*",
"from":1.29384E12,
"from_as_string":"2011/01/01 00:00:00",
"doc_count":1
}]
}
}
}
As you can see, the response is no different when compared to the response returned by the range aggregation. We have two attributes for each bucket, named from and to, which represent the number of milliseconds since 1970-01-01. The properties from_as_string and to_as_string present the same information as from and to, but in a human-readable form. Of course, the keyed parameter and key in the definition of the date range work in the already described way.
Elasticsearch also allows us to define the format of presented dates using the format attribute. In our example, we presented the dates with year resolution, so the day and time parts were unnecessary. If we want to show the month names, we can send a query such as the following one:
{
"aggs":{
"years":{
"date_range":{
"field":"published",
"format":"MMMM YYYY",
"ranges":[
{"to":"December 2009"},
{"from":"January 2010","to":"December 2010"},
{"from":"January 2011"}
]
}
}
}
}
Note that the dates in the to and from parameters also need to be provided in the specified format. One of the returned ranges looks as follows:
{
"key":"January 2010-December 2010",
"from":1.262304E12,
"from_as_string":"January 2010",
"to":1.2911616E12,
"to_as_string":"December 2010",
"doc_count":1
}
Note
The available formats we can use in format are defined in the Joda Time library. The full list is available at http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.
There is one more thing about the date_range aggregation that we want to mention. Imagine that some time we may want to build an aggregation that can change with time. For example, we may want to see how many newspapers were published in the last 3, 6, 9, and 12 months. This is possible without the need to adjust the query every time, as we can use constants such as now-9M. The following example shows this:
{
"aggs":{
"years":{
"date_range":{
"field":"published",
"format":"dd-MM-YYYY",
"ranges":[
{"to":"now-9M/M"},
{"to":"now-9M"},
{"from":"now-9M/M","to":"now-6M/M"},
{"from":"now-3M/M"}
]
}
}
}
}
The key here is expressions such as now-9M. Elasticsearch does the math and generates the appropriate value. You can use y (year), M (month), w (week), d (day), h (hour), m (minute), and s (second). For example, the expression now+3d means three days from now. The /M in our example rounds the date down to full months. Thanks to such notation, we only count full months. The second advantage is that the calculated date is more cache-friendly. Without the rounding, the date changes every millisecond, which makes every cache based on the range irrelevant and basically useless in most cases.
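To make the rounding concrete, here is a rough Python sketch of what an expression such as now-9M/M evaluates to. The helper names are made up for this illustration, it covers only month arithmetic, and it is not the date math parser used by Elasticsearch.

```python
import calendar
from datetime import datetime

def minus_months(moment, months):
    """Go back a number of calendar months, clamping the day if needed."""
    index = moment.year * 12 + (moment.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)

def round_to_month(moment):
    """Round down to the first instant of the month (the /M suffix)."""
    return moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

now = datetime(2016, 2, 15, 10, 30, 12)      # an arbitrary "now"
later = datetime(2016, 2, 15, 10, 30, 47)    # the same query sent a bit later

print(minus_months(now, 9))                    # now-9M
print(round_to_month(minus_months(now, 9)))    # now-9M/M
print(round_to_month(minus_months(later, 9)))  # same rounded value
```

The unrounded value differs between the two requests, while the rounded value stays the same for the whole month, which is what makes the rounded form easier to cache.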
IPv4 range aggregation
A very interesting aggregation is the ip_range one as it works on Internet addresses. It works on the fields defined with the ip type and allows defining ranges either by their first and last addresses or by a mask in CIDR notation (http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing). An example usage of the ip_range aggregation looks as follows:
{
"aggs":{
"access":{
"ip_range":{
"field":"ip",
"ranges":[
{"from":"192.168.0.1","to":"192.168.0.254"},
{"mask":"192.168.1.0/24"}
]
}
}
}
}
The response to the preceding query is as follows:
"access":{
"buckets":[
{
"from":3232235521,
"from_as_string":"192.168.0.1",
"to":3232235774,
"to_as_string":"192.168.0.254",
"doc_count":0
},
{
"key":"192.168.1.0/24",
"from":3232235776,
"from_as_string":"192.168.1.0",
"to":3232236032,
"to_as_string":"192.168.2.0",
"doc_count":4
}
]
}
Similar to the range aggregation, we define a range either by both of its ends or by a mask. The rest is done by Elasticsearch itself.
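The numeric from and to values in the response are simply the IPv4 addresses written as 32-bit integers. The following sketch uses Python's standard ipaddress module to show how a CIDR mask maps to such a range; the function name is made up for this illustration.

```python
import ipaddress

def mask_to_range(cidr):
    """Return (from, to) as integers; `to` is the first address after the block."""
    network = ipaddress.ip_network(cidr)
    start = int(network.network_address)
    end = start + network.num_addresses
    return start, end

start, end = mask_to_range("192.168.1.0/24")
print(start, ipaddress.ip_address(start))
print(end, ipaddress.ip_address(end))
print(int(ipaddress.ip_address("192.168.0.1")))
```

For the 192.168.1.0/24 mask this gives the same from and to values as the second bucket in the response, with the upper bound pointing at 192.168.2.0, the first address outside the block.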
Missing aggregation
The missing aggregation allows us to create a bucket and see how many documents have no value in a specified field. For example, we can check how many of our books in the library index don’t have the original title defined (the otitle field). To do this, we run the following query:
{
"aggs":{
"missing_original_title":{
"missing":{
"field":"otitle"
}
}
}
}
The response returned by Elasticsearch in this case will look as follows:
{
"took":15,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"missing_original_title":{
"doc_count":2
}
}
}
As we can see, we have two documents without the otitle field.
Histogram aggregation
The histogram aggregation is an interesting one because of its automation. This aggregation defines buckets itself. We are only responsible for defining the field and the interval, and the rest is done automatically. The simplest form of a query that uses this aggregation looks as follows:
{
"aggs":{
"years":{
"histogram":{
"field":"year",
"interval":100
}
}
}
}
The new information we need to provide is interval, which defines the length of every range that will be used to create a bucket. We set the interval to 100, which in our case will result in buckets that are 100 years wide. The response to the preceding query that was sent to our library index is as follows:
{
"took":13,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"years":{
"buckets":[{
"key":1800,
"doc_count":1
},{
"key":1900,
"doc_count":3
}]
}
}
}
Similar to the range aggregation, the histogram aggregation allows us to use the keyed property to define named buckets. The other available option is min_doc_count, which allows us to specify the minimum number of documents required to create a bucket. If we set the min_doc_count property to zero, Elasticsearch will also include buckets with the document count of zero. We can also use the missing property to specify the value Elasticsearch should use when a document doesn’t have a value in the specified field.
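The way the histogram derives its bucket keys can be sketched as follows. This is a simplified Python model rather than the actual implementation: each value is rounded down to a multiple of the interval, and with min_doc_count set to 0 the empty buckets between the lowest and the highest key are filled in. The publication years are an assumption about the library index, inferred from the statistics shown later in this chapter.

```python
import math
from collections import Counter

def histogram(values, interval, min_doc_count=1):
    """Group values into fixed-width buckets keyed by their lower bound."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    if counts and min_doc_count == 0:
        key = min(counts)
        while key <= max(counts):
            counts.setdefault(key, 0)  # add empty buckets inside the data range
            key += interval
    return [
        {"key": key, "doc_count": counts[key]}
        for key in sorted(counts)
        if counts[key] >= min_doc_count
    ]

years = [1886, 1929, 1936, 1961]  # assumed publication years
print(histogram(years, 100))
print(histogram(years, 25, min_doc_count=0))
```

The first call returns the two buckets shown in the response above (1800 with one document and 1900 with three). The second call uses a narrower interval and also returns the empty 1900 bucket.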
Date histogram aggregation
Just as the date_range aggregation is a specialized form of the range aggregation, date_histogram is an extension of the histogram aggregation that works on dates. For the purpose of this example, we will again use the data we indexed when discussing the date range aggregation. This means that we will run our queries against the index called library2. An example query using the date_histogram aggregation looks as follows:
{
"aggs":{
"years":{
"date_histogram":{
"field":"published",
"format":"yyyy-MM-dd HH:mm",
"interval":"10d",
"min_doc_count":1}
}
}
}
The difference between the histogram and date_histogram aggregations is the interval property. The value of this property is now a string describing the time interval, which in our case is 10 days. Of course, we can set it to anything we want. It uses the same suffixes we discussed while talking about date expressions in the date_range aggregation. It is worth mentioning that the number can be a float value. For example, 1.5m means that the length of the bucket will be one and a half minutes. The format attribute is the same as in the date_range aggregation. Thanks to it, Elasticsearch can add a human-readable date text according to the defined format. Of course, the format attribute is not required, but it is useful. In addition to that, similar to the other range aggregations, the keyed and min_doc_count attributes still work.
Time zones
Elasticsearch stores all the dates in the UTC time zone. You can define the time zone to be used by Elasticsearch by using the time_zone attribute. By setting this property, we basically tell Elasticsearch which time zone should be used to perform the calculations. There are three notations with which to set this attribute:
We can set the hours offset; for example, time_zone:5
We can use the time format; for example, time_zone:"-04:30"
We can use the name of the time zone; for example, time_zone:"Europe/Warsaw"
Note
Look at http://joda-time.sourceforge.net/timezones.html to see the available time zones.
Geo distance aggregations
The next two aggregations are connected with maps and spatial searches. We will talk about geo types and queries in the Elasticsearch spatial capabilities section of Chapter 8, Beyond Full-text Searching, so feel free to skip these two topics now and return to them later.
Look at the following query:
{
"aggs":{
"neighborhood":{
"geo_distance":{
"field":"location",
"origin":[-0.1275,51.507222],
"ranges":[
{"to":1200},
{"from":1201}
]
}
}
}
}
You can see that the query is similar to the range aggregation. The preceding aggregation will calculate the number of documents that fall into two buckets: one closer than 1200 km and the second one further than 1200 km from the geographical point defined by the origin property (in the preceding case, the origin is London). The aggregation section of the response returned by Elasticsearch looks as follows:
"neighborhood":{
"buckets":[
{
"key":"*-1200.0",
"from":0,
"to":1200,
"doc_count":1
},
{
"key":"1201.0-*",
"from":1201,
"doc_count":4
}
]
}
The keyed and the key attributes work in the geo_distance aggregation as well, so we can easily modify the response to our needs and create named buckets.
The geo_distance aggregation supports a few additional parameters that are shown in the following query:
{
"aggs":{
"neighborhood":{
"geo_distance":{
"field":"location",
"origin":{"lon":-0.1275,"lat":51.507222},
"unit":"m",
"distance_type":"plane",
"ranges":[
{"to":1200},
{"from":1201}
]
}
}
}
}
Three things are new in the preceding query. The first change is how we defined the origin point. This time we specified the location by providing the latitude and longitude explicitly.
The second change is the unit attribute. It defines the units used in the ranges array. The possible values are: km (the default, kilometers), mi (miles), in (inches), yd (yards), m (meters), cm (centimeters), and mm (millimeters).
The last attribute, distance_type, specifies how Elasticsearch calculates the distance. The possible values are (from the fastest but least accurate to the slowest but the most accurate): plane, sloppy_arc (the default), and arc.
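As an illustration of what such an aggregation computes, the following Python sketch buckets points by their great-circle distance from an origin. It uses the textbook haversine formula with a mean Earth radius, which is an assumption of this sketch and not necessarily the exact formula behind the arc distance type. The two sample points are made up, and the coordinates are given as (latitude, longitude) pairs, unlike the array notation in the query, which puts the longitude first.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch

def arc_distance_km(origin, point):
    """Great-circle (haversine) distance between two (lat, lon) pairs in km."""
    lat1, lon1 = (math.radians(x) for x in origin)
    lat2, lon2 = (math.radians(x) for x in point)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_buckets(origin, points, ranges):
    """Count points per distance range; `from` inclusive, `to` exclusive."""
    distances = [arc_distance_km(origin, p) for p in points]
    return [
        sum(1 for d in distances
            if r.get("from", 0) <= d < r.get("to", math.inf))
        for r in ranges
    ]

london = (51.507222, -0.1275)
points = [(48.8566, 2.3522), (40.7128, -74.0060)]  # hypothetical: Paris, New York
print(round(arc_distance_km(london, points[0])))
print(distance_buckets(london, points, [{"to": 1200}, {"from": 1201}]))
```

Paris falls into the first bucket and New York into the second one, so each bucket gets a single document.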
Geohash grid aggregation
The second aggregation related to geographical analysis is based on grids and is called geohash_grid. It organizes areas into grids and assigns every location to a cell in such a grid. To do this efficiently, Elasticsearch uses Geohash (http://en.wikipedia.org/wiki/Geohash), which encodes the location into a string. The longer the string is, the more accurate the description of a particular location. For example, one letter is sufficient to declare a box of roughly 5,000 by 5,000 kilometers, and five letters are enough to increase the accuracy to roughly 5 by 5 kilometers. Let’s look at the following query:
{
"aggs":{
"neighborhood":{
"geohash_grid":{
"field":"location",
"precision":5
}
}
}
}
We defined the geohash_grid aggregation with buckets that are roughly 5 by 5 kilometers in size (the precision attribute describes the number of letters used in the geohash string). The table with resolutions versus the length of geohash can be found at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-geohashgrid-aggregation.html.
Of course, the more accurate we want the aggregation to be, the more resources Elasticsearch will consume, because of the number of buckets that the aggregation has to calculate. By default, Elasticsearch does not generate more than 10,000 buckets. You can change this behavior by using the size attribute, but keep in mind that the performance may suffer for very wide queries consisting of thousands of buckets.
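To see why a longer geohash describes a smaller cell, it helps to look at how the string is built. The following Python function is a sketch of the standard public geohash encoding (alternating longitude and latitude bisection, five bits per character); it is shown for illustration and is not taken from Elasticsearch. The sample coordinates are the ones commonly used to illustrate the algorithm.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision):
    """Encode a location as a geohash string with `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # the encoding starts with a longitude bit
    while len(bits) < precision * 5:
        bounds, value = (lon_range, lon) if use_lon else (lat_range, lat)
        middle = (bounds[0] + bounds[1]) / 2
        if value >= middle:
            bits.append(1)
            bounds[0] = middle  # keep the upper half
        else:
            bits.append(0)
            bounds[1] = middle  # keep the lower half
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = int("".join(str(b) for b in bits[i:i + 5]), 2)
        chars.append(BASE32[index])
    return "".join(chars)

print(geohash(57.64911, 10.40744, 1))
print(geohash(57.64911, 10.40744, 5))
```

Every additional character halves the cell several more times, so all documents that share the same five-character prefix end up in the same bucket when precision is set to 5.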
Global aggregation
The global aggregation is an aggregation that defines a single bucket containing all the documents from a given index and type, and not influenced by the query itself. The thing that differentiates the global aggregation from all the others is that the global aggregation has an empty body. For example, look at the following query:
{
"query":{
"term":{
"available":"true"
}
},
"aggs":{
"all_books":{
"global":{}
}
}
}
In our library index, we only have two available books, but the response to the preceding query looks as follows:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":3,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"all_books":{
"doc_count":4
}
}
}
As you can see, the global aggregation is not bound by the query. Because the result of the global aggregation is a single bucket containing all the documents (not narrowed down by the query itself), it is a perfect candidate for use as a top-level parent aggregation for nesting aggregations.
Significant terms aggregation
The significant_terms aggregation allows us to get the terms that are relevant and probably the most significant for a given query. The good thing is that it doesn’t only show the top terms from the results of the given query, but the ones that seem to be the most important.
The use cases for this aggregation type can vary from finding the most troublesome server working in your application environment, to suggesting nicknames from text. Whenever Elasticsearch sees a significant change in the popularity of a term, such a term is a candidate for being significant.
Note
Remember that the significant_terms aggregation is very expensive when it comes to resources and running against large indices. Work is being done to provide a lightweight version of that aggregation; as a result, the API for significant_terms aggregation may change in the future.
The best way to describe the significant_terms aggregation type is to use an example. Let’s start with indexing 12 simple documents, which represent reviews of work done by interns:
curl -XPOST 'localhost:9200/interns/review/1' -d '{"intern":"Richard","grade":"bad","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/2' -d '{"intern":"Ralf","grade":"perfect","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/3' -d '{"intern":"Richard","grade":"bad","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/4' -d '{"intern":"Richard","grade":"bad","type":"review"}'
curl -XPOST 'localhost:9200/interns/review/5' -d '{"intern":"Richard","grade":"good","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/6' -d '{"intern":"Ralf","grade":"good","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/7' -d '{"intern":"Ralf","grade":"perfect","type":"review"}'
curl -XPOST 'localhost:9200/interns/review/8' -d '{"intern":"Richard","grade":"medium","type":"review"}'
curl -XPOST 'localhost:9200/interns/review/9' -d '{"intern":"Monica","grade":"medium","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/10' -d '{"intern":"Monica","grade":"medium","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/11' -d '{"intern":"Ralf","grade":"good","type":"grade"}'
curl -XPOST 'localhost:9200/interns/review/12' -d '{"intern":"Ralf","grade":"good","type":"grade"}'
Of course, to show the real power of the significant_terms aggregation, we should use a way larger dataset. However, for the purpose of this book, we will concentrate on this example, so it is easier to illustrate how this aggregation works.
Now let’s try finding the most significant grade for Richard. To do this we will use the following query:
curl -XGET 'localhost:9200/interns/_search?size=0&pretty' -d '{
"query":{
"match":{
"intern":"Richard"
}
},
"aggregations":{
"description":{
"significant_terms":{
"field":"grade"
}
}
}
}'
The result of the preceding query looks as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":5,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"description":{
"doc_count":5,
"buckets":[{
"key":"bad",
"doc_count":3,
"score":0.84,
"bg_count":3
}]
}
}
}
As you can see, for our query Elasticsearch informed us that the most significant grade for Richard is bad. Maybe it wasn’t the best internship for him; who knows.
Choosing significant terms
To calculate significant terms, Elasticsearch looks for data that reports a significant change in their popularity between two sets of data: the foreground set and the background set. The foreground set is the data returned by our query, while the background set is the data in our index (or indices, depending on how we run our queries). If a term exists in 10 documents out of one million indexed, but appears in 5 documents from the 10 returned, then such a term is definitely significant and worth concentrating on.
Let’s get back to our preceding example now to analyze it a bit. Richard got three different grades from the reviewers: bad three times, medium one time, and good one time. In general, the bad grade appeared in three documents (the bg_count property) out of the 12 documents in the index (this is our background set). This gives us 25 percent of the indexed documents. On the other hand, the bad grade appeared in three out of the five documents matching the query (this is our foreground set), which gives us 60 percent of the documents. As you can see, the change in popularity is significant for the bad grade and that’s why Elasticsearch has returned it in the significant_terms aggregation results.
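The arithmetic above can be reproduced with a short Python sketch. It assumes that the score comes from the JLH heuristic (the absolute change in popularity multiplied by the relative change) and that terms found in fewer than three foreground documents are dropped; both are assumptions about the defaults of the significant_terms aggregation, and the function names are made up for this illustration.

```python
from collections import Counter

# (intern, grade) pairs taken from the 12 indexed review documents.
reviews = [
    ("Richard", "bad"), ("Ralf", "perfect"), ("Richard", "bad"),
    ("Richard", "bad"), ("Richard", "good"), ("Ralf", "good"),
    ("Ralf", "perfect"), ("Richard", "medium"), ("Monica", "medium"),
    ("Monica", "medium"), ("Ralf", "good"), ("Ralf", "good"),
]

def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """Absolute change in popularity multiplied by the relative change."""
    fg = fg_count / fg_total
    bg = bg_count / bg_total
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)

def significant_grades(docs, intern, min_doc_count=3):
    foreground = [grade for name, grade in docs if name == intern]
    fg_counts = Counter(foreground)
    bg_counts = Counter(grade for _, grade in docs)
    buckets = []
    for grade, fg_count in fg_counts.items():
        if fg_count < min_doc_count:
            continue  # too few foreground documents to be considered
        score = jlh_score(fg_count, len(foreground), bg_counts[grade], len(docs))
        if score > 0:
            buckets.append({"key": grade, "doc_count": fg_count,
                            "score": round(score, 2), "bg_count": bg_counts[grade]})
    return sorted(buckets, key=lambda b: b["score"], reverse=True)

print(significant_grades(reviews, "Richard"))
```

For Richard, the bad grade gets (0.60 - 0.25) * (0.60 / 0.25) = 0.84, which agrees with the score in the response shown earlier, while the good and medium grades are less popular in the foreground set than in the background set and are not returned.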
Multiple value analysis
The significant_terms aggregation can be nested and provide us with nice data analysis capabilities that connect multiple sets of data. For example, let’s try to find a significant grade for each of the interns that we have information about. To do this we will nest the significant_terms aggregation inside the terms aggregation. The query that does that looks as follows:
curl -XGET 'localhost:9200/interns/_search?size=0&pretty' -d '{
"aggregations":{
"grades":{
"terms":{
"field":"intern"
},
"aggregations":{
"significantGrades":{
"significant_terms":{
"field":"grade"
}
}
}
}
}
}'
The results returned by Elasticsearch for the preceding query are as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":12,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"grades":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[{
"key":"ralf",
"doc_count":5,
"significantGrades":{
"doc_count":5,
"buckets":[{
"key":"good",
"doc_count":3,
"score":0.48,
"bg_count":4
}]
}
},{
"key":"richard",
"doc_count":5,
"significantGrades":{
"doc_count":5,
"buckets":[{
"key":"bad",
"doc_count":3,
"score":0.84,
"bg_count":3
}]
}
},{
"key":"monica",
"doc_count":2,
"significantGrades":{
"doc_count":2,
"buckets":[]
}
}]
}
}
}
Sampler aggregation
The sampler aggregation is one of the experimental aggregations in Elasticsearch. It allows us to limit the sub aggregation processing to a sample of the top-scoring documents. This allows filtering and potential removal of garbage in the data. It is a very nice candidate as a top-level aggregation to limit the amount of data the significant_terms aggregation runs on. The simplest example of using this aggregation is as follows:
{
"aggs":{
"sampler_example":{
"sampler":{
"field":"tags",
"max_docs_per_value":1,
"shard_size":10
},
"aggs":{
"best_terms":{
"terms":{
"field":"title"
}
}
}
}
}
}
To see the real power of sampling, we will have to play with it on a larger dataset, but for now we will discuss the preceding example. The sampler aggregation was defined with three properties: field, max_docs_per_value, and shard_size. The first two properties allow us to control the diversity of the sampling. We tell Elasticsearch how many documents at maximum (the value of the max_docs_per_value property) can be collected on a shard with the same value in the defined field (the value of the field property).
The shard_size property tells Elasticsearch how many documents (at most) to collect from each shard.
Children aggregation
The children aggregation is a single-bucket aggregation that creates a bucket with all the children of the specified type. Let’s get back to the Using the parent-child relationship section in Chapter 5, Extending Your Index Structure, and let’s use the created shop index. To create a bucket of all children documents with the variation type in the shop index, we run the following query:
{
"aggs":{
"variation_children":{
"children":{
"type":"variation"
}
}
}
}
The response returned by Elasticsearch is as follows:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":3,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"variation_children":{
"doc_count":2
}
}
}
Note
Because the children aggregation uses parent-child functionality, it relies on the _parent field, which needs to be present.
Nested aggregation
In the Using nested objects section of Chapter 5, Extending Your Index Structure, we learned about nested documents. Let’s use that data to look into the next type of aggregation: the nested one. Let’s create the simplest working query, which looks like this (we use the shop_nested index created in the mentioned chapter):
{
"aggs":{
"variations":{
"nested":{
"path":"variation"
}
}
}
}
The preceding query is similar in structure to any other aggregation. However, instead of providing the field name on which the aggregation should be calculated, it contains a single parameter path, which points to the nested document. In the response we get the number of nested documents:
{
"took":4,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"variations":{
"doc_count":2
}
}
}
The preceding response means that we have two nested documents in the index under the provided path, variation.
Reverse nested aggregation
The reverse_nested aggregation is a special, single-bucket aggregation that allows aggregation on parent documents from the nested documents. Similar to the global aggregation, the reverse_nested aggregation in our example has an empty body. It sounds quite complicated, but it is not. Let’s look at the following query that we run against the shop_nested index created in the Using nested objects section of Chapter 5, Extending Your Index Structure:
{
"aggs":{
"variations":{
"nested":{
"path":"variation"
},
"aggs":{
"sizes":{
"terms":{
"field":"variation.size"
},
"aggs":{
"product_name_terms":{
"reverse_nested":{},
"aggs":{
"product_name_terms_per_size":{
"terms":{
"field":"name"
}
}
}
}
}
}
}
}
}
}
We start with the top-level aggregation, which is the same nested aggregation that we used when discussing the nested aggregation. However, we include a sub-aggregation that uses reverse_nested to be able to show terms from the name field of the parent documents for each size returned by the terms aggregation placed inside the top-level nested aggregation. This is possible because, when the reverse_nested aggregation is used, Elasticsearch calculates the data on the basis of the parent documents instead of using the nested documents.
Note
Remember that the reverse_nested aggregation must be used inside the nested aggregation.
The response to the preceding query will look as follows:
{
"took":7,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"variations":{
"doc_count":2,
"sizes":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[{
"key":"XL",
"doc_count":1,
"product_name_terms":{
"doc_count":1,
"product_name_terms_per_size":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[{
"key":"shirt",
"doc_count":1
},{
"key":"test",
"doc_count":1
}]
}
}
},{
"key":"XXL",
"doc_count":1,
"product_name_terms":{
"doc_count":1,
"product_name_terms_per_size":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[{
"key":"shirt",
"doc_count":1
},{
"key":"test",
"doc_count":1
}]
}
}
}]
}
}
}
}
Nesting aggregations and ordering buckets
When talking about bucket aggregations, we just need to get back to the topic of nesting aggregations. This is a very powerful technique, because it allows you to further process the data for documents in the buckets. For example, the terms aggregation will return a bucket for each term and the stats aggregation can show us the statistics for documents in each bucket. For example, let’s look at the following query:
{
"aggs":{
"copies":{
"terms":{
"field":"copies"
},
"aggs":{
"years":{
"stats":{
"field":"year"
}
}
}
}
}
}
This is an example of nested aggregations. The terms aggregation will return buckets for each term from the copies field (three buckets in the case of our data), and the stats aggregation will calculate statistics for the year field for the documents falling into each bucket returned by the top aggregation. The response from Elasticsearch for the preceding query looks as follows:
{
"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"copies":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[{
"key":0,
"doc_count":2,
"years":{
"count":2,
"min":1886.0,
"max":1936.0,
"avg":1911.0,
"sum":3822.0
}
},{
"key":1,
"doc_count":1,
"years":{
"count":1,
"min":1929.0,
"max":1929.0,
"avg":1929.0,
"sum":1929.0
}
},{
"key":6,
"doc_count":1,
"years":{
"count":1,
"min":1961.0,
"max":1961.0,
"avg":1961.0,
"sum":1961.0
}
}]
}
}
}
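What the two nested aggregations compute together can be mimicked in Python as a group-by followed by per-group statistics. This is just a sketch for illustration: the documents below keep only the copies and year fields, and their values are read back from the preceding response.

```python
from collections import defaultdict

def terms_with_stats(docs, terms_field, stats_field):
    """Group documents by one field and compute stats on another per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[terms_field]].append(doc[stats_field])
    buckets = []
    for key, values in groups.items():
        buckets.append({
            "key": key,
            "doc_count": len(values),
            "stats": {
                "count": len(values),
                "min": min(values),
                "max": max(values),
                "avg": sum(values) / len(values),
                "sum": sum(values),
            },
        })
    # Most common terms first, ties broken by the term itself.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))

books = [
    {"copies": 0, "year": 1886},
    {"copies": 1, "year": 1929},
    {"copies": 0, "year": 1936},
    {"copies": 6, "year": 1961},
]
for bucket in terms_with_stats(books, "copies", "year"):
    print(bucket)
```

The outer grouping plays the role of the terms aggregation and the inner dictionary plays the role of the stats aggregation calculated separately for each bucket.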
This is a powerful feature and allows us to build very complex data processing pipelines. Of course, we are not limited to a single nested aggregation and we can nest multiple of them and even nest an aggregation inside a nested aggregation. For example:
{
"aggs":{
"popular_tags":{
"terms":{
"field":"copies"
},
"aggs":{
"years":{
"terms":{
"field":"year"
},
"aggs":{
"available_by_year":{
"stats":{
"field":"available"
}
}
}
},
"available":{
"stats":{
"field":"available"
}
}
}
}
}
}
As you can see, the possibilities are almost unlimited, if you have enough memory and CPU power to handle very complicated aggregations.
Buckets ordering
There is one more feature about nested aggregations and the ordering of aggregation results. Elasticsearch can use values from the nested aggregations to sort the parent buckets. For example, let’s look at the following query:
{
"aggs":{
"availability":{
"terms":{
"field":"copies",
"order":{"numbers.avg":"desc"}
},
"aggs":{
"numbers":{"stats":{}}
}
}
}
}
In the previous example, the order in the availability aggregation is based on the average value from the numbers aggregation. The notation numbers.avg is required in this case, because stats is a multivalued aggregation that provides several values and we were interested in the average. If it were the sum aggregation, the name of the aggregation would be sufficient.
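The difference between the two notations can be sketched in Python. The bucket contents below are hypothetical: numbers stands for a multivalued aggregation such as stats, and total stands for a single-value aggregation such as sum, whose result is assumed to be exposed under value.

```python
def order_buckets(buckets, path, direction="desc"):
    """Sort buckets by 'agg.metric', or by 'agg' for single-value aggregations."""
    name, _, metric = path.partition(".")

    def sort_value(bucket):
        result = bucket[name]
        return result[metric] if metric else result["value"]

    return sorted(buckets, key=sort_value, reverse=(direction == "desc"))

# Hypothetical buckets with one multivalued and one single-value sub-aggregation.
buckets = [
    {"key": 0, "doc_count": 2, "numbers": {"avg": 1911.0}, "total": {"value": 3822.0}},
    {"key": 1, "doc_count": 1, "numbers": {"avg": 1929.0}, "total": {"value": 1929.0}},
    {"key": 6, "doc_count": 1, "numbers": {"avg": 1961.0}, "total": {"value": 1961.0}},
]

print([b["key"] for b in order_buckets(buckets, "numbers.avg")])
print([b["key"] for b in order_buckets(buckets, "total")])
```

With numbers.avg the metric name tells the sort which of the values to use, while for total the aggregation name alone is enough because there is only one value to choose from.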
Pipeline aggregations
The last type of aggregation we will discuss is pipeline aggregations. Till now we’ve learned about metrics aggregations and bucket aggregations. The first one returned metrics while the second type returned buckets. And both metrics and buckets aggregations worked on the basis of returned documents. Pipeline aggregations are different. They work on the output of the other aggregations and their metrics, allowing functionalities such as moving-average calculations (https://en.wikipedia.org/wiki/Moving_average).
Note
Remember that pipeline aggregations were introduced in Elasticsearch 2.0 and are considered experimental. This means that the API can change in the future, breaking backwards-compatibility.
Available types
There are two types of pipeline aggregation. The so-called parent aggregations family works on the output of other aggregations. They are able to produce new buckets or new aggregations to add to existing buckets. The second type is called sibling aggregations and these aggregations are able to produce new aggregations on the same level.
Referencing other aggregations
Because of their nature, the pipeline aggregations need to be able to access the results of the other aggregations. We can do that via the buckets_path property, which is defined using a specified format. We can use a few keywords that allow us to tell Elasticsearch exactly which aggregation and metric we are interested in. The > character separates the aggregations and the . character separates the aggregation from its metrics. For example, my_sum.sum means that we take the sum metric of an aggregation called my_sum. Another example is popular_tags>my_sum.sum, which means that we are interested in the sum metric of a sub aggregation called my_sum, which is nested inside the popular_tags aggregation. In addition to this, we can use a special path called _count. This can be used to calculate the pipeline aggregations on document count instead of specified metrics.
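The path syntax can be illustrated with a small Python resolver that walks an aggregation response. This is a sketch under simplifying assumptions: it handles only paths with one bucket aggregation followed by one sub aggregation or _count, and the response fragment with its keys and values is invented for the example.

```python
def resolve_buckets_path(aggregations, path):
    """Return one value per bucket for paths like 'agg>sub.metric' or 'agg>_count'."""
    parent, _, rest = path.partition(">")
    values = []
    for bucket in aggregations[parent]["buckets"]:
        if rest == "_count":
            values.append(bucket["doc_count"])
            continue
        name, _, metric = rest.partition(".")
        # Single-value aggregations are assumed to expose their result as "value".
        values.append(bucket[name][metric or "value"])
    return values

# Hypothetical response fragment; my_sum is assumed to expose a sum metric.
aggregations = {
    "popular_tags": {
        "buckets": [
            {"key": "novel", "doc_count": 3, "my_sum": {"sum": 7.0}},
            {"key": "classics", "doc_count": 1, "my_sum": {"sum": 2.0}},
        ]
    }
}

per_bucket = resolve_buckets_path(aggregations, "popular_tags>my_sum.sum")
print(per_bucket, sum(per_bucket))
print(resolve_buckets_path(aggregations, "popular_tags>_count"))
```

A sibling pipeline aggregation then only has to combine the resolved values, for example by summing them.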
Gaps in the data
Our data can contain gaps, that is, situations where the data doesn’t exist. For such use cases, we have the ability to specify the gap_policy property and set it to skip or insert_zeros. The skip value tells Elasticsearch to ignore the missing data and continue from the next available value, while insert_zeros replaces the missing values with zero.
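The difference between the two policies can be sketched in a few lines of Python. This is only an illustration of the behavior described above, assuming a gap is represented by None; the function name apply_gap_policy is hypothetical.

```python
def apply_gap_policy(values, gap_policy="skip"):
    """Apply a gap policy to per-bucket metric values.

    values: list of numbers, with None marking a bucket without data
    (an assumption made for this sketch).
    """
    if gap_policy == "skip":
        # Ignore the missing data and continue with the next value.
        return [v for v in values if v is not None]
    if gap_policy == "insert_zeros":
        # Replace the missing values with zero.
        return [0.0 if v is None else v for v in values]
    raise ValueError("unknown gap_policy: %r" % (gap_policy,))


print(apply_gap_policy([1.0, None, 3.0], "skip"))
print(apply_gap_policy([1.0, None, 3.0], "insert_zeros"))
```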
Pipeline aggregation types
Most of the aggregations we will show in this section are very similar to the ones we’ve already seen in the sections about metrics and buckets aggregations. Because of that, we won’t discuss them in depth. There are also new, specific pipeline aggregations that we want to talk about in a little more detail.
Min, max, sum, and average bucket aggregations
The min_bucket, max_bucket, sum_bucket, and avg_bucket aggregations are sibling aggregations, similar in what they return to the min, max, sum, and avg aggregations. However, instead of working on the data returned by the query, they work on the results of other aggregations.
To show you a simple example of how this aggregation works, let’s calculate the sum of all the buckets returned by another aggregation. The query that will do that looks as follows:
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"copies_per_100_years":{
"sum":{
"field":"copies"
}
}
}
},
"sum_copies":{
"sum_bucket":{
"buckets_path":"periods_histogram>copies_per_100_years"
}
}
}
}
As you can see, we used the histogram aggregation and included a nested aggregation that calculates the sum of the copies field. Our sum_bucket sibling aggregation is defined outside the main aggregation and refers to it using the buckets_path property. It tells Elasticsearch that we are interested in summing the values of the metric returned by the copies_per_100_years aggregation. The result returned by Elasticsearch for this query looks as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"periods_histogram":{
"buckets":[{
"key":1800,
"doc_count":1,
"copies_per_100_years":{
"value":0.0
}
},{
"key":1900,
"doc_count":3,
"copies_per_100_years":{
"value":7.0
}
}]
},
"sum_copies":{
"value":7.0
}
}
}
As you can see, Elasticsearch added another entry to the aggregations section of the results, called sum_copies, which holds the value we were interested in.
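What the four sibling aggregations compute can be reproduced by hand from the buckets in the preceding response. The following Python sketch does that; the helper name sibling_metrics is hypothetical and the bucket list is a trimmed copy of the response shown above.

```python
# Buckets copied (trimmed) from the histogram response shown above.
buckets = [
    {"key": 1800, "copies_per_100_years": {"value": 0.0}},
    {"key": 1900, "copies_per_100_years": {"value": 7.0}},
]


def sibling_metrics(buckets, agg_name):
    """Compute min/max/sum/avg over one metric of every bucket."""
    values = [b[agg_name]["value"] for b in buckets]
    return {
        "min_bucket": min(values),
        "max_bucket": max(values),
        "sum_bucket": sum(values),
        "avg_bucket": sum(values) / len(values),
    }


result = sibling_metrics(buckets, "copies_per_100_years")
print(result["sum_bucket"])
```

The sum_bucket entry equals the sum_copies value returned by Elasticsearch for the same buckets.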
Cumulative sum aggregation
The cumulative_sum aggregation is a parent pipeline aggregation that allows us to calculate the cumulative sum of a metric in a histogram or date_histogram aggregation. A simple example of the aggregation looks as follows:
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"copies_per_100_years":{
"sum":{
"field":"copies"
}
},
"cumulative_copies_sum":{
"cumulative_sum":{
"buckets_path":"copies_per_100_years"
}
}
}
}
}
}
Because this aggregation is a parent pipeline aggregation, it is defined in the sub aggregations. The returned result looks as follows:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"periods_histogram":{
"buckets":[{
"key":1800,
"doc_count":1,
"copies_per_100_years":{
"value":0.0
},
"cumulative_copies_sum":{
"value":0.0
}
},{
"key":1900,
"doc_count":3,
"copies_per_100_years":{
"value":7.0
},
"cumulative_copies_sum":{
"value":7.0
}
}]
}
}
}
The first cumulative_copies_sum is 0 because the sum in the first bucket is 0. The second is the sum of all the previous buckets and the current one, which gives 7. Each following value would be the sum of all the previous buckets plus that bucket.
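The running total described above is the same thing that itertools.accumulate produces in Python. A minimal sketch, with a hypothetical function name, using the per-bucket sums from the response:

```python
from itertools import accumulate


def cumulative_sum(values):
    """Return the running total of the per-bucket metric values."""
    return list(accumulate(values))


# Per-bucket copies_per_100_years values from the response above.
print(cumulative_sum([0.0, 7.0]))
```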
Bucket selector aggregation
The bucket_selector aggregation is another parent pipeline aggregation. It allows using a script to decide whether a bucket should be retained in the parent multi-bucket aggregation. For example, to keep only buckets that have more than one copy per period, we can run the following query (it needs the script.inline property to be set to on in the elasticsearch.yml file):
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"copies_per_100_years":{
"sum":{
"field":"copies"
}
},
"remove_empty_buckets":{
"bucket_selector":{
"buckets_path":{
"sum_copies":"copies_per_100_years"
},
"script":"sum_copies>1"
}
}
}
}
}
}
There are two important things here. The first is the buckets_path property, which is different to what we’ve used so far. Now it uses a key and a value. The key is used to reference the value in the script. The second important thing is the script property, which defines the script that decides if the processed bucket should be retained. The results returned by Elasticsearch in this case are as follows:
{
"took":330,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"periods_histogram":{
"buckets":[{
"key":1900,
"doc_count":3,
"copies_per_100_years":{
"value":7.0
}
}]
}
}
}
As we can see, the bucket with the copies_per_100_years value equal to 0 has been removed.
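The effect of the selector script can be mimicked in Python by filtering the buckets with a predicate. This is a sketch of the idea only; select_buckets is a hypothetical helper and the lambda stands in for the sum_copies>1 script.

```python
def select_buckets(buckets, keep):
    """Retain only the buckets for which the predicate returns True."""
    return [b for b in buckets if keep(b)]


# Buckets as returned by the histogram before the selector is applied.
buckets = [
    {"key": 1800, "copies_per_100_years": {"value": 0.0}},
    {"key": 1900, "copies_per_100_years": {"value": 7.0}},
]

# Stand-in for the script "sum_copies>1".
kept = select_buckets(buckets, lambda b: b["copies_per_100_years"]["value"] > 1)
print([b["key"] for b in kept])
```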
Bucket script aggregation
The bucket_script aggregation (a parent pipeline aggregation) allows us to define multiple bucket paths and use them inside a script. The metrics used must be of a numeric type and the returned value also needs to be numeric. An example of using this aggregation follows (the following query needs the script.inline property to be set to on in the elasticsearch.yml file):
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"copies_per_100_years":{
"sum":{
"field":"copies"
}
},
"stats_per_100_years":{
"stats":{
"field":"copies"
}
},
"example_bucket_script":{
"bucket_script":{
"buckets_path":{
"sum_copies":"copies_per_100_years",
"count":"stats_per_100_years.count"
},
"script":"sum_copies/count*1000"
}
}
}
}
}
}
There are two things here. The first is that we’ve defined two entries in the buckets_path property. We are allowed to do that in the bucket_script aggregation. Each entry is a key and a value. The key is the name that we can use to refer to the value in the script. The value is the path to the aggregation metric we are interested in. The second is the script property, which defines the script that returns the value.
The returned results for the preceding query are as follows:
{
"took":5,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"periods_histogram":{
"buckets":[{
"key":1800,
"doc_count":1,
"copies_per_100_years":{
"value":0.0
},
"stats_per_100_years":{
"count":1,
"min":0.0,
"max":0.0,
"avg":0.0,
"sum":0.0
},
"example_bucket_script":{
"value":0.0
}
},{
"key":1900,
"doc_count":3,
"copies_per_100_years":{
"value":7.0
},
"stats_per_100_years":{
"count":3,
"min":0.0,
"max":6.0,
"avg":2.3333333333333335,
"sum":7.0
},
"example_bucket_script":{
"value":2333.3333333333335
}
}]
}
}
}
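The value of example_bucket_script can be checked by hand. The following Python sketch applies the same arithmetic as the script (sum_copies/count*1000) to the two buckets from the response; the function name is hypothetical.

```python
def example_bucket_script(bucket):
    """Python equivalent of the script sum_copies/count*1000."""
    sum_copies = bucket["copies_per_100_years"]["value"]
    count = bucket["stats_per_100_years"]["count"]
    return sum_copies / count * 1000


# Values copied from the response above.
bucket_1800 = {
    "copies_per_100_years": {"value": 0.0},
    "stats_per_100_years": {"count": 1},
}
bucket_1900 = {
    "copies_per_100_years": {"value": 7.0},
    "stats_per_100_years": {"count": 3},
}

print(example_bucket_script(bucket_1800))
print(example_bucket_script(bucket_1900))
```

For the 1900 bucket this is 7 divided by 3 and multiplied by 1000, which is where the value of roughly 2333.33 in the response comes from.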
Serial differencing aggregation
The serial_diff aggregation is a parent pipeline aggregation that implements a technique where the values in time series data (such as a histogram or date histogram) are subtracted from themselves at different time periods. This technique allows plotting the changes in the data between time periods instead of plotting the whole value. For example, the population of a city grows with time. If we use the serial differencing aggregation with a period of one day, we can see the daily growth.
To calculate the serial_diff aggregation, we need the parent aggregation, which is a histogram or a date_histogram, and we need to provide it with buckets_path, which points to the metric we are interested in, and lag (a positive, non-zero integer value), which tells which previous bucket to subtract from the current one. We can omit lag, in which case Elasticsearch will set it to 1.
Let’s now look at a simple query that uses the discussed aggregation:
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"copies_per_100_years":{
"sum":{
"field":"copies"
}
},
"first_difference":{
"serial_diff":{
"buckets_path":"copies_per_100_years",
"lag":1
}
}
}
}
}
}
The response to the preceding query looks as follows:
{
"took":68,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":0.0,
"hits":[]
},
"aggregations":{
"periods_histogram":{
"buckets":[{
"key":1800,
"doc_count":1,
"copies_per_100_years":{
"value":0.0
}
},{
"key":1900,
"doc_count":3,
"copies_per_100_years":{
"value":7.0
},
"first_difference":{
"value":7.0
}
}]
}
}
}
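As you can see, the first bucket has no first_difference value because there is no earlier bucket to subtract. Starting with the second bucket, we get our aggregation (we would get it with every bucket after that as well). The calculated value is 7 because the current value of copies_per_100_years is 7 and the previous one is 0. Subtracting 0 from 7 gives us 7.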
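The subtraction described above can be sketched in Python as follows. The function name serial_diff simply mirrors the aggregation name; None marks buckets that have no earlier bucket lag positions back, which corresponds to the missing first_difference entry in the first bucket of the response.

```python
def serial_diff(values, lag=1):
    """Subtract from each value the value `lag` buckets earlier."""
    if lag < 1:
        raise ValueError("lag must be a positive, non-zero integer")
    return [
        None if i < lag else values[i] - values[i - lag]
        for i in range(len(values))
    ]


# Per-bucket copies_per_100_years values from the response above.
print(serial_diff([0.0, 7.0], lag=1))
```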
Derivative aggregation
The derivative aggregation is another example of a parent pipeline aggregation. As its name suggests, it calculates a derivative (https://en.wikipedia.org/wiki/Derivative) of a given metric from a histogram or date histogram. The only thing we need to provide is buckets_path, which points to the metric we are interested in. An example query using this aggregation looks as follows:
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":100
},
"aggs":{
"copies_per_100_years":{
"sum":{
"field":"copies"
}
},
"derivative_example":{
"derivative":{
"buckets_path":"copies_per_100_years"
}
}
}
}
}
}
Moving avg aggregation
The last pipeline aggregation that we want to discuss is the moving_avg one. It calculates the moving average metric (https://en.wikipedia.org/wiki/Moving_average) over the buckets of the parent aggregation (yes, this is a parent pipeline aggregation). Similar to the few previously discussed aggregations, it needs to be run on the parent histogram or date histogram aggregation.
When calculating the moving average, Elasticsearch will take the window (specified by the window property and set to 5 by default), calculate the average for buckets in the window, move the window one bucket further, and repeat. Of course we also need to provide buckets_path, which points to the metric that the moving average should be calculated for.
An example of using this aggregation looks as follows:
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":10
},
"aggs":{
"copies_per_10_years":{
"sum":{
"field":"copies"
}
},
"moving_avg_example":{
"moving_avg":{
"buckets_path":"copies_per_10_years"
}
}
}
}
}
}
We will omit the response for the preceding query as it is quite large.
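The sliding-window procedure described above can be sketched in Python as a textbook trailing moving average. This is an illustration under assumptions, not the Elasticsearch implementation: in particular, whether the current bucket is included in its own window, and how the first buckets with a partially filled window are handled, may differ in Elasticsearch.

```python
from collections import deque


def simple_moving_average(values, window=5):
    """Trailing simple moving average.

    Each output is the mean of the last `window` values seen so far,
    including the current one (an assumption made for this sketch).
    """
    recent = deque(maxlen=window)  # drops the oldest value automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


print(simple_moving_average([1, 2, 3, 4], window=2))
```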
Predicting future buckets
A very nice thing about the moving average aggregation is that it supports predictions; it can attempt to extrapolate the data it has and create future buckets. To force the aggregation to predict buckets, we just need to add the predict property to any moving average aggregation and set it to the number of predictions we want to get. For example, if we want to add five predictions to the preceding query, we will change it to look as follows:
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":10
},
"aggs":{
"copies_per_10_years":{
"sum":{
"field":"copies"
}
},
"moving_avg_example":{
"moving_avg":{
"buckets_path":"copies_per_10_years",
"predict":5
}
}
}
}
}
}
If you look at the results and compare the response returned for the previous query with the one with predictions, you will notice that the last bucket in the previous query has the key property equal to 1960, while the last bucket in the query with predictions has the key property equal to 2010. With an interval of 10, that is five additional buckets, which is exactly what we wanted to achieve.
The models
By default, Elasticsearch uses the simplest model for calculating the moving average aggregation, but we can control that by specifying the model property. This property holds the name of the model, and the accompanying settings object lets us provide model-specific properties.
The possible models are: simple, linear, ewma, holt, and holt_winters. Discussing each of the models in detail is beyond the scope of the book, so if you are interested in details about the different models, refer to the official Elasticsearch documentation regarding the moving average aggregation available at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-movavg-aggregation.html.
An example query using a different model looks as follows:
{
"aggs":{
"periods_histogram":{
"histogram":{
"field":"year",
"interval":10},
"aggs":{
"copies_per_10_years":{
"sum":{
"field":"copies"
}},
"moving_avg_example":{
"moving_avg":{
"buckets_path":"copies_per_10_years",
"model":"holt",
"settings":{
"alpha":0.6,
"beta":0.4
}
}
}
}
}
}
}
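To give a feeling for what the alpha and beta settings control, here is a Python sketch of the textbook Holt linear (double exponential) smoothing method. It is an assumption that this matches the general idea behind the holt model; the initialization and details of the Elasticsearch implementation may differ, and the function name holt_forecast is hypothetical.

```python
def holt_forecast(values, alpha=0.6, beta=0.4, steps=1):
    """Textbook Holt linear smoothing.

    alpha smooths the level and beta smooths the trend. Returns
    `steps` extrapolated values after the end of the series.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a trend")
    level = values[0]
    trend = values[1] - values[0]  # simple initial trend estimate
    for x in values[1:]:
        previous_level = level
        level = alpha * x + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]


print(holt_forecast([1.0, 2.0, 3.0, 4.0], steps=2))
```

On a perfectly linear series the forecast simply continues the line, and on a constant series it stays constant, which is a quick sanity check of the level and trend updates.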
Summary
The chapter we just finished was all about data analysis in Elasticsearch: the aggregations engine. We learned what the aggregations are and how they work. We used metrics, buckets, and the newly introduced pipeline aggregations, and learned what we can do with them.
In the next chapter, we’ll go beyond full text searching. We will use suggesters to build efficient autocomplete functionality and correct the users’ spelling mistakes. We will see what percolation is and how to use it in our application. We will use the geospatial abilities of Elasticsearch and we’ll learn how to efficiently fetch large amounts of data from Elasticsearch.
Chapter 8. Beyond Full-text Searching
The previous chapter was fully dedicated to data analysis and how we can perform it with Elasticsearch. We learned how to use aggregations, what types of aggregation are available, and what aggregations are available within each type and how to use them. In this chapter, we will get back to query related topics. By the end of this chapter, you will have learned the following topics:
What is percolator and how to use it
What are the geospatial capabilities of Elasticsearch
How to use and build functionalities using Elasticsearch suggesters
How to use the Scroll API to efficiently fetch large numbers of results
Percolator
Have you ever wondered what would happen if we reverse the traditional model of using queries to find documents in Elasticsearch? Does it make sense to have a document and search for queries matching it? It is not surprising that there is a whole range of solutions where this model is very useful. Whenever you operate on an unbounded stream of input data, where you search for the occurrences of particular events, you can use this approach. This can be used for the detection of failures in a monitoring system or for the “Tell me when a product with the defined criteria will be available in this shop” functionality. In this section, we will look at how an Elasticsearch percolator works and how we can use it to implement one of the aforementioned use cases.
The index
In all the examples to be used when discussing percolator functionality, we will use an index called notifier. The mentioned index is created by using the following command:
curl -XPOST 'localhost:9200/notifier' -d '{
"mappings":{
"book":{
"properties":{
"title":{
"type":"string"
},
"otitle":{
"type":"string"
},
"year":{
"type":"integer"
},
"available":{
"type":"boolean"
},
"tags":{
"type":"string",
"index":"not_analyzed"
}
}
}
}
}'
It is quite simple. It contains a single type and five fields, which will be used during our journey through the world of percolation.
Percolator preparation
Elasticsearch exposes a special type called .percolator that is treated differently. We can still store documents in it and search them like an ordinary type in any index. If you look at any Elasticsearch query, you will notice that each is a valid JSON document, which means that we can index and store it as a document as well. The thing is that percolator allows us to invert the search logic and search for queries which match a given document. This is possible because of the two just discussed features: the special .percolator type and the fact that queries in Elasticsearch are valid JSON documents.
Let’s get back to the library example from Chapter 2, Indexing Your Data, and try to index one of the queries in the percolator. We assume that our users need to be informed when any book matching the criteria defined by the query is available.
Look at the following query1.json file that contains an example query generated by the user:
{
"query":{
"bool":{
"must":{
"term":{
"title":"crime"
}
},
"should":{
"range":{
"year":{
"gt":1900,
"lt":2000
}
}
},
"must_not":{
"term":{
"otitle":"nothing"
}
}
}
}
}
To enhance the example, we also assume that our users are allowed to define filters using our hypothetical user interface. For example, our user may be interested in the available books that were written before the year 2010. An example query that could have been constructed by such a user interface would look as follows (the query was written to the query2.json file):
{
"query":{
"bool":{
"must":{
"range":{
"year":{
"lt":2010
}
}
},
"filter":{
"term":{
"available":true
}
}
}
}
}
Now, let’s register both queries in the percolator (note that we are registering the queries and haven’t indexed any documents). In order to do this, we will run the following commands:
curl -XPUT 'localhost:9200/notifier/.percolator/1' -d @query1.json
curl -XPUT 'localhost:9200/notifier/.percolator/old_books' -d @query2.json
In the preceding examples, we used two completely different identifiers. We did that in order to show that we can use an identifier that best describes the query. It is up to us to decide under which name we would like the query to be registered.
We are now ready to use our percolator. Our application will provide documents to the percolator and check if any of the already registered queries match the document. This is exactly what a percolator allows us to do: to reverse the search logic. Instead of indexing the documents and running queries against them, we store the queries and send the documents to find the matching queries.
Let’s use an example document that will match both stored queries; it will have the required title and the release date, and will mention whether it is currently available. The command to send such a document to the percolator looks as follows:
curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{
"doc":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
}
}'
As we expected, both queries matched and the Elasticsearch response includes the identifiers of the matching queries. Such a response looks as follows:
{
"took":36,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"total":2,
"matches":[{
"_index":"notifier",
"_id":"old_books"
},{
"_index":"notifier",
"_id":"1"
}]
}
This works like a charm. One very important thing to note is the endpoint used in this query: _percolate. Using this endpoint is required when we want to use the percolator. The index name corresponds to the index where the queries were stored, and the type is equal to the type defined in the mappings.
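The reversed search logic can be illustrated with a toy, in-memory model in Python: queries are stored as predicates under an identifier, and a document is then checked against all of them. This is a conceptual sketch only; the names register and percolate are hypothetical, and the predicates are hand-written approximations of the two queries registered earlier (the optional should clause of the first query is left out because it does not affect matching).

```python
registered = {}  # query identifier -> predicate over a document


def register(query_id, predicate):
    """Store a query (here: a Python predicate) under an identifier."""
    registered[query_id] = predicate


def percolate(doc):
    """Return the identifiers of all stored queries matching the document."""
    return sorted(qid for qid, matches in registered.items() if matches(doc))


def terms(doc, field):
    """Very rough stand-in for analysis: lowercase and split on spaces."""
    return str(doc.get(field, "")).lower().split()


# Approximation of query1.json: title must contain "crime",
# otitle must not contain "nothing".
register("1", lambda d: "crime" in terms(d, "title")
         and "nothing" not in terms(d, "otitle"))
# Approximation of query2.json: year below 2010 and available.
register("old_books", lambda d: "year" in d and d["year"] < 2010
         and d.get("available") is True)

book = {"title": "Crime and Punishment", "year": 1886, "available": True}
print(percolate(book))
```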
Note
The response format contains information about the index and the query identifier. This information is included for cases when we search against multiple indices at once. When using a single index, adding an additional query parameter, percolate_format=ids, will change the response as follows:
"matches":["old_books","1"]
Getting deeper
Because the queries registered in a percolator are in fact documents, we can use a normal query sent to Elasticsearch in order to choose which queries stored in the .percolator type should be used in the percolation process. This may sound weird, but it really gives a lot of possibilities. In our library, we can have several groups of users. Let’s assume that some of them have permissions to borrow very rare books, or that we have several branches in the city and the user can declare where he or she would like to get the book from.
Let’s see how such use cases can be implemented by using the percolator. To do this, we will need to update our mapping and include the branch information. We do that by running the following command:
curl -XPOST 'localhost:9200/notifier/.percolator/_mapping' -d '{
".percolator":{
"properties":{
"branches":{
"type":"string",
"index":"not_analyzed"
}
}
}
}'
Now, in order to register a query, we use the following command:
curl -XPUT 'localhost:9200/notifier/.percolator/3' -d '{
"query":{
"term":{
"title":"crime"
}
},
"branches":["brA","brB","brD"]
}'
In the preceding example, we registered a query that shows a user’s interest. Our hypothetical user is interested in any book with the term crime in the title field (the term query is responsible for this). He or she wants to borrow this book from one of the three listed branches. When specifying the mappings, we defined that the branches field is a non-analyzed string field. We can now include a query along with the document we sent previously. Let’s look at how to do this.
Our book system just got the book, and it is ready to report the book and check whether the book is of interest to anyone. To check this, we send the document that describes the book and add an additional query to such a request, namely the query that will limit the users to only the ones interested in the brB branch. Such a request looks as follows:
curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{
"doc":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
},
"size":10,
"filter":{
"term":{
"branches":"brB"
}
}
}'
If everything was executed correctly, the response returned by Elasticsearch should look as follows (we indexed our query with 3 as an identifier):
{
"took":27,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"total":1,
"matches":[{
"_index":"notifier",
"_id":"3"
}]
}
Controlling the size of returned results
The size of the results makes a difference when it comes to the percolator. The more queries a single document matches, the more results will be returned and the more memory will be needed by Elasticsearch. Because of this, there is one additional thing to note: the size parameter. It allows us to limit the number of matches returned.
Percolator and score calculation
In the previous examples, we filtered our queries using a single term query, but we didn’t think about the scoring process at all. Elasticsearch allows us to calculate the score when using the percolator. Let’s change the previously used request sent to the percolator and adjust it so that scoring is used:
curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{
"doc":{
"title":"Crime and Punishment",
"otitle":"Преступлéние и наказáние",
"author":"Fyodor Dostoevsky",
"year":1886,
"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
"tags":[],
"copies":0,
"available":true
},
"size":10,
"query":{
"term":{
"branches":"brB"
}
},
"track_scores":true,
"sort":{
"_score":"desc"
}
}'
As you can see, we used the query section and included an additional track_scores attribute set to true. This is needed because, by default, Elasticsearch won’t calculate the score for the matched queries for performance reasons. If we need scores in the percolation process, we should be aware that such requests will be slightly more demanding when it comes to CPU processing power than the ones that omit calculating the score.
Note
In the preceding example, we told Elasticsearch to sort our results on the basis of the score in descending order. This is the default behavior when track_scores is turned on, so we can omit the sort declaration. At the time of writing, sorting on score in descending direction is the only available option.
Combining percolators with other functionalities
If we are allowed to use queries along with the documents sent for percolation, why can we not use other Elasticsearch functionalities? Of course, this is possible. For example, the following document is sent along with an aggregation and the results will include the aggregation calculation:
curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{
"doc":{
"title":"Crime and Punishment",
"available":true
},
"aggs":{
"test":{
"terms":{
"field":"branches"
}
}
}
}'
As we can see, percolator allows us to run both queries and aggregations. Other features work as well. Look at the following example request:
curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{
"doc":{
"title":"Crime and Punishment",
"year":1886,
"available":true
},
"size":10,
"highlight":{
"fields":{
"title":{}
}
}
}'
As you can see, it contains a highlighting section. A fragment of the response returned by Elasticsearch looks as follows:
{
"_index":"notifier",
"_id":"3",
"highlight":{
"title":["<em>Crime</em> and Punishment"]
}
}
Note
Note that there are some limitations when it comes to the query types supported by the percolator functionality. In the current implementation, parent-child relations are not available in the percolator, so you can’t use queries such as has_child, top_children, and has_parent.
Getting the number of matching queries
Sometimes you don’t care about the matched queries and you only want the number of matched queries. In such cases, sending a document against the standard percolator endpoint is not efficient. Elasticsearch exposes the _percolate/count endpoint to handle such cases in an efficient way. An example of such a command follows:
curl -XGET 'localhost:9200/notifier/book/_percolate/count?pretty' -d '{
"doc":{...}
}'
Indexed document percolation
In the final, closing paragraph of the percolation section, we want to show you one more thing: the possibility of percolating a document that is already indexed. To do this, we need to use the GET operation on the document and provide information about which percolator index should be used. Let’s look at the following command:
curl -XGET 'localhost:9200/library/book/1/_percolate?percolate_index=notifier'
This command checks the document with the 1 identifier from our library index against the percolator index defined by the percolate_index parameter. Remember that, by default, Elasticsearch will use the percolator in the same index as the document; that’s why we’ve specified the percolate_index parameter.
Elasticsearch spatial capabilities
Search servers such as Elasticsearch are usually looked at from the perspective of full-text searching. Elasticsearch, because of its marketing as being part of ELK (Elasticsearch, Logstash, and Kibana), is also well known for being able to handle large amounts of time series data. However, this is only a part of the whole view. Sometimes both of the mentioned use cases are not enough. Imagine searching for local services. For the end user, the most important thing is the accuracy of the results. By accuracy, we not only mean the proper results of the full-text search, but also the results being as near as they can in terms of location. In several cases, this is the same as a text search on geographical names such as cities or streets, but in other cases we can find it very useful to be able to search on the basis of the geographical coordinates of our indexed documents. And this is also a functionality that Elasticsearch is capable of handling.
With the release of Elasticsearch 2.2, the geo_point type received a lot of changes, especially internally, where all the optimizations were done. Prior to 2.2, the geo_point type was stored in the index as two not-analyzed string values, and this has changed. With the release of Elasticsearch 2.2, the geo_point type got all the great improvements from the Apache Lucene library and is now more efficient.
Mapping preparation for spatial searches
In order to discuss the spatial search functionality, let’s prepare an index with a list of cities. This will be a very simple index with one type named poi (which stands for point of interest), holding the name of the city and its coordinates. The mappings are as follows:
{
"mappings":{
"poi":{
"properties":{
"name":{"type":"string"},
"location":{"type":"geo_point"}
}
}
}
}
Assuming that we put this definition into the mapping1.json file, we can create an index by running the following command:
curl -XPUT localhost:9200/map -d @mapping1.json
The only new thing in the preceding mappings is the geo_point type, which is used for the location field. By using it, we can store the geographical position of our city and use spatial-based functionalities.
Example data
Our example documents1.json file with documents looks as follows:
{"index":{"_index":"map","_type":"poi","_id":1}}
{"name":"New York","location":"40.664167,-73.938611"}
{"index":{"_index":"map","_type":"poi","_id":2}}
{"name":"London","location":[-0.1275,51.507222]}
{"index":{"_index":"map","_type":"poi","_id":3}}
{"name":"Moscow","location":{"lat":55.75,"lon":37.616667}}
{"index":{"_index":"map","_type":"poi","_id":4}}
{"name":"Sydney","location":"-33.859972,151.211111"}
{"index":{"_index":"map","_type":"poi","_id":5}}
{"name":"Lisbon","location":"eycs0p8ukc7v"}
In order to perform a bulk request, we added information about the index name, type, and unique identifiers of our documents; so, we can now easily import this data using the following command:
curl -XPOST localhost:9200/_bulk --data-binary @documents1.json
One thing that we should take a closer look at is the location field. We can use various notations for the coordinates. We can provide the latitude and longitude values as a string, as a pair of numbers, or as an object. Note that the string and array methods of providing the geographical location have different orders for the latitude and longitude parameters: the string is written as latitude followed by longitude, while the array holds longitude first and latitude second. The last record shows that there is also a possibility to give the coordinates as a Geohash value (the notation is described in detail at http://en.wikipedia.org/wiki/Geohash).
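Because the ordering difference between the notations is easy to get wrong, here is a small Python sketch that normalizes the first three notations into a (latitude, longitude) pair. The function name to_lat_lon is hypothetical, and Geohash decoding is deliberately left out of this sketch.

```python
def to_lat_lon(location):
    """Normalize a geo_point value into a (lat, lon) tuple of floats.

    Handles the object, array, and "lat,lon" string notations.
    Geohash strings are not decoded in this sketch.
    """
    if isinstance(location, dict):
        return float(location["lat"]), float(location["lon"])
    if isinstance(location, (list, tuple)):
        lon, lat = location  # arrays are [lon, lat]
        return float(lat), float(lon)
    if isinstance(location, str) and "," in location:
        lat, lon = location.split(",")  # strings are "lat,lon"
        return float(lat), float(lon)
    raise ValueError("unsupported location notation: %r" % (location,))


print(to_lat_lon("40.664167,-73.938611"))        # New York
print(to_lat_lon([-0.1275, 51.507222]))          # London
print(to_lat_lon({"lat": 55.75, "lon": 37.616667}))  # Moscow
```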
Additional geo_field properties
With the release of Elasticsearch 2.2, the number of parameters that the geo_point type can accept has been reduced and is as follows:
geohash: Boolean parameter telling Elasticsearch whether the .geohash field should be created. Defaults to false unless geohash_prefix is used.
geohash_precision: Maximum size of geohash and geohash_prefix.
geohash_prefix: Boolean parameter telling Elasticsearch to index the geohash and its prefixes. Defaults to false.
ignore_malformed: Boolean parameter telling Elasticsearch to ignore a badly written geo_field point instead of rejecting the whole document. Defaults to false, which means that the badly formatted geo_field data will result in an indexation error for the whole document.
lat_lon: Boolean parameter telling Elasticsearch to index the spatial data in two separate fields called .lat and .lon. Defaults to false.
precision_step: Parameter allowing control over how our numeric geographical points will be indexed.
Keep in mind that the geohash-related and lat_lon-related properties were not removed, for backward-compatibility reasons. Users can still use them. However, the queries will not use them but will instead use the highly optimized data structure that is built during indexing by the geo_point type.
Sample queries
Now let’s look at several examples of using coordinates and solving common requirements in modern applications that require geographical data searching along with full-text searching.
Note
If you are interested in all the geospatial queries that are available for Elasticsearch users, refer to the official documentation available at https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-queries.html.
Distance-based sorting
Let’s start with a very common requirement: sorting the returned results by distance from a given point. In our example, we want to get all the cities and sort them by their distances from the capital of France, Paris. To do this, we send the following query to Elasticsearch:
curl -XGET localhost:9200/map/_search?pretty -d '{
"query":{
"match_all":{}
},
"sort":[{
"_geo_distance":{
"location":"48.8567,2.3508",
"unit":"km"
}
}]
}'
If you remember the Sorting data section from Chapter 4, Extending Your Querying Knowledge, you’ll notice that the format is slightly different. We are using the _geo_distance key to indicate sorting by distance. We must give the base location (the location attribute, which holds the location of Paris in our case), and we need to specify the unit in which the distances are reported in the results. The available values include km and mi, which stand for kilometers and miles, respectively. The result of such a query will be as follows:
{
"took":5,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":5,
"max_score":null,
"hits":[{
"_index":"map",
"_type":"poi",
"_id":"2",
"_score":null,
"_source":{
"name":"London",
"location":[-0.1275,51.507222]
},
"sort":[343.17487356850313]
},{
"_index":"map",
"_type":"poi",
"_id":"5",
"_score":null,
"_source":{
"name":"Lisbon",
"location":"eycs0p8ukc7v"
},
"sort":[1452.9506736367805]
},{
"_index":"map",
"_type":"poi",
"_id":"3",
"_score":null,
"_source":{
"name":"Moscow",
"location":{
"lat":55.75,
"lon":37.616667
}
},
"sort":[2483.837565935267]
},{
"_index":"map",
"_type":"poi",
"_id":"1",
"_score":null,
"_source":{
"name":"New York",
"location":"40.664167,-73.938611"
},
"sort":[5832.645958617513]
},{
"_index":"map",
"_type":"poi",
"_id":"4",
"_score":null,
"_source":{
"name":"Sydney",
"location":"-33.859972,151.211111"
},
"sort":[16978.094780773998]
}]
}
}
As with the other examples of sorting, Elasticsearch shows information about the value used for sorting. Let’s look at the first record. As we can see, the distance between Paris and London is about 343 km, and if you check a traditional map, you will see that this is true.
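The distance reported by Elasticsearch can be sanity-checked with the well-known haversine great-circle formula. The Python sketch below is an independent approximation, assuming a spherical Earth with a mean radius of 6371 km; Elasticsearch may use a different distance calculation, so the result is expected to be close to, but not identical with, the sort value shown above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


paris = (48.8567, 2.3508)       # "lat,lon" from the sort definition
london = (51.507222, -0.1275)   # London from the example data
distance = haversine_km(paris[0], paris[1], london[0], london[1])
print(round(distance, 1))
```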
Bounding box filtering
The next example that we want to show is narrowing down the results to a selected area that is bounded by a given rectangle. This is very handy if we want to show results on the map or when we allow a user to mark the map area for searching. You already read about filters in the Filtering your results section of Chapter 4, Extending Your Querying Knowledge, but there we didn’t mention spatial filters. The following query shows how we can filter by using the bounding box:
curl -XGET localhost:9200/map/_search?pretty -d '{
"query":{
"bool":{
"must":{"match_all":{}},
"filter":{
"geo_bounding_box":{
"location":{
"top_left":"52.4796,-1.903",
"bottom_right":"48.8567,2.3508"
}
}
}
}
}
}'
In the preceding example, we selected a map fragment between Birmingham and Paris by providing the top-left and bottom-right corner coordinates. These two corners are enough to specify any rectangle we want, and Elasticsearch will do the rest of the calculation for us. The following screenshot shows the specified rectangle on the map:
As we can see, the only city from our data that meets the criteria is London. So, let’s check whether Elasticsearch knows this by running the preceding query. Let’s now look at the returned results:
{
"took":38,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":1.0,
"hits":[{
"_index":"map",
"_type":"poi",
"_id":"2",
"_score":1.0,
"_source":{
"name":"London",
"location":[-0.1275,51.507222]
}
}]
}
}
As you can see, again Elasticsearch agrees with the map.
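The check that a bounding box filter performs can be sketched in Python as two range comparisons. The function name in_bounding_box is hypothetical, and this simplified version assumes the rectangle does not cross the 180th meridian.

```python
def in_bounding_box(point, top_left, bottom_right):
    """Check whether a (lat, lon) point lies inside a rectangle.

    top_left and bottom_right are (lat, lon) pairs. Boxes crossing
    the 180th meridian are not handled in this sketch.
    """
    lat, lon = point
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


top_left = (52.4796, -1.903)      # Birmingham
bottom_right = (48.8567, 2.3508)  # Paris

print(in_bounding_box((51.507222, -0.1275), top_left, bottom_right))  # London
print(in_bounding_box((55.75, 37.616667), top_left, bottom_right))    # Moscow
```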
LimitingthedistanceThelastexampleshowsthenextcommonrequirement:limitingtheresultstotheplacesthatarelocatednofurtherthanthedefineddistancefromagivenpoint.Forexample,ifwewanttolimitourresultstoallthecitieswithinthe500kmradiusfromParis,wecanusethefollowingquery:
curl -XGET 'localhost:9200/map/_search?pretty' -d '{
"query":{
"bool":{
"must":{"match_all":{}},
"filter":{
"geo_distance":{
"location":"48.8567,2.3508",
"distance":"500km"
}
}
}
}
}'
If everything goes well, Elasticsearch should only return a single record for the preceding query, and the record should be London again. However, we will leave it for you as a reader to check.
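If you want to predict the result before running the query, the distances can be estimated with the haversine formula. The following Python sketch is only an approximation under stated assumptions: it treats the Earth as a sphere with a 6371 km radius, uses the sample cities listed later in this chapter, and does not claim to reproduce the exact distance calculation Elasticsearch performs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

# Sample cities as (latitude, longitude); assumed to match the indexed data.
CITIES = {
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.865143, 151.211111),
    "Lisbon": (38.736946, -9.142685),
}
PARIS = (48.8567, 2.3508)


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_km(origin, limit_km):
    """Names of the sample cities no further than limit_km from origin."""
    return sorted(
        name
        for name, point in CITIES.items()
        if haversine_km(origin, point) <= limit_km
    )


nearby = within_km(PARIS, 500)
paris_to_london = haversine_km(PARIS, CITIES["London"])
```

Under these assumptions, only London falls within 500 km of Paris, and the Paris to London distance comes out close to the 343 km value seen in the sorting example.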
Arbitrary geo shapes
Sometimes, using a single geographical point or a single rectangle is just not enough. In such cases something more sophisticated is needed, and Elasticsearch addresses this by giving you the possibility to define shapes. In order to show you how we can leverage custom shape-limiting in Elasticsearch, we need to modify our index or create a new one and introduce the geo_shape type. Our new mapping looks as follows (we will use this to create an index called map2):
{
"mappings":{
"poi":{
"properties":{
"name":{"type":"string","index":"not_analyzed"},
"location":{"type":"geo_shape"}
}
}
}
}
Assuming we wrote the preceding mapping definition to the mapping2.json file, we can create an index by using the following command:
curl -XPUT localhost:9200/map2 -d @mapping2.json
Note
Elasticsearch allows us to set several attributes for the geo_shape type. The most commonly used is the precision parameter. During indexing, the shapes have to be converted to a set of terms. The more accuracy required, the more terms are generated, which is directly reflected in the index size and performance. Precision can be defined in the following units: in, inch, yd, yard, mi, miles, km, kilometers, m, meters, cm, centimeters, or mm, millimeters. By default, the precision is set to 50m.
Next, let’s change our example data to match our new index structure and create the documents2.json file with the following contents:
{"index":{"_index":"map2","_type":"poi","_id":1}}
{"name":"New York","location":{"type":"point","coordinates":[-73.938611,40.664167]}}
{"index":{"_index":"map2","_type":"poi","_id":2}}
{"name":"London","location":{"type":"point","coordinates":[-0.1275,51.507222]}}
{"index":{"_index":"map2","_type":"poi","_id":3}}
{"name":"Moscow","location":{"type":"point","coordinates":[37.616667,55.75]}}
{"index":{"_index":"map2","_type":"poi","_id":4}}
{"name":"Sydney","location":{"type":"point","coordinates":[151.211111,-33.865143]}}
{"index":{"_index":"map2","_type":"poi","_id":5}}
{"name":"Lisbon","location":{"type":"point","coordinates":[-9.142685,38.736946]}}
The structure of the field of the geo_shape type is different from geo_point. It follows the GeoJSON format (http://en.wikipedia.org/wiki/GeoJSON), which allows us to define various geographical types. Now it’s time to index our data:
curl -XPOST localhost:9200/_bulk --data-binary @documents2.json
Let’s sum up the types that we can use during querying, at least the ones that we think are the most useful.
Point
A point is defined by an array in which the first element is the longitude and the second is the latitude. An example of such a shape is as follows:
{
"type":"point",
"coordinates":[-0.1275,51.507222]
}
Envelope
An envelope defines a box given by the coordinates of the upper-left and bottom-right corners of the box. An example of such a shape is as follows:
{
"type":"envelope",
"coordinates":[[-0.087890625,51.50874245880332],[2.4169921875,
48.80686346108517]]
}
Polygon
A polygon is defined by a list of points that are connected to create the shape. The first and the last point in the array must be the same so that the shape is closed. An example of such a shape is as follows:
{
"type":"polygon",
"coordinates":[[
[-5.756836,49.991408],
[-7.250977,55.124723],
[1.845703,51.500194],
[-5.756836,49.991408]
]]
}
If you look closely at the shape definition, you will find an additional level of arrays. Thanks to this, you can define more than a single polygon. In such a case, the first polygon defines the base shape and the rest of the polygons are the shapes that will be excluded from the base shape.
Multipolygon
The multipolygon shape allows us to create a shape that consists of multiple polygons. An example of such a shape is as follows:
{
"type":"multipolygon",
"coordinates":[
[[
[-5.756836,49.991408],
[-7.250977,55.124723],
[1.845703,51.500194],
[-5.756836,49.991408]
]],[[
[-0.087890625,51.50874245880332],
[2.4169921875,48.80686346108517],
[3.88916015625,51.01375465718826],
[-0.087890625,51.50874245880332]
]]]
}
The multipolygon shape contains multiple polygons and follows the same rules as the polygon type. So, we can have multiple polygons and, in addition to this, we can include multiple exclusion shapes.
An example usage
Now that we have our index with the geo_shape fields, we can check which cities are located in the UK. The query that will allow us to do this looks as follows:
curl -XGET 'localhost:9200/map2/_search?pretty' -d '{
"query":{
"bool":{
"must":{"match_all":{}},
"filter":{
"geo_shape":{
"location":{
"shape":{
"type":"polygon",
"coordinates":[[
[-5.756836,49.991408],[-7.250977,55.124723],
[-3.955078,59.352096],[1.845703,51.500194],
[-5.756836,49.991408]
]]
}
}
}
}
}
}
}'
The polygon type defines the boundaries of the UK (in a very, very imprecise way), and Elasticsearch’s response is as follows:
{
"took":7,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":1.0,
"hits":[{
"_index":"map2",
"_type":"poi",
"_id":"2",
"_score":1.0,
"_source":{
"name":"London",
"location":{
"type":"point",
"coordinates":[-0.1275,51.507222]
}
}
}]
}
}
As far as we know, the response is correct.
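We can also double-check this result by hand. The following Python sketch uses the classic ray-casting test on the same rough UK polygon, and it treats any additional rings as exclusion shapes, as described for the polygon type. It is only an illustration: it works on flat longitude/latitude coordinates, whereas Elasticsearch indexes shapes as terms with a configurable precision, so results near a border may differ.

```python
# GeoJSON order is used throughout: (longitude, latitude).
CITIES = {
    "New York": (-73.938611, 40.664167),
    "London": (-0.1275, 51.507222),
    "Moscow": (37.616667, 55.75),
    "Sydney": (151.211111, -33.865143),
    "Lisbon": (-9.142685, 38.736946),
}

# The rough UK outline from the query; one outer ring and no exclusions.
UK = [[
    [-5.756836, 49.991408], [-7.250977, 55.124723],
    [-3.955078, 59.352096], [1.845703, 51.500194],
    [-5.756836, 49.991408],
]]


def in_ring(lon, lat, ring):
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def in_polygon(lon, lat, polygon):
    """First ring is the base shape; any further rings are excluded areas."""
    outer, holes = polygon[0], polygon[1:]
    if not in_ring(lon, lat, outer):
        return False
    return not any(in_ring(lon, lat, hole) for hole in holes)


def cities_in(polygon):
    """Names of the sample cities located inside the polygon, sorted."""
    return sorted(n for n, (lon, lat) in CITIES.items() if in_polygon(lon, lat, polygon))


uk_cities = cities_in(UK)
```

Adding a second ring around London to the polygon (a hypothetical exclusion shape) would remove London from the result, which is the behavior described for the extra level of arrays.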
Storing shapes in the index
Usually, shape definitions are complex, and the defined areas don’t change too often (for example, the boundaries of the UK). In such cases, it is convenient to define the shapes in the index and use them in queries. This is possible, and we will now discuss how to do it. As usual, we will start with the appropriate mapping, which is as follows:
{
"mappings":{
"country":{
"properties":{
"name":{"type":"string","index":"not_analyzed"},
"area":{"type":"geo_shape"}
}
}
}
}
This mapping is similar to the mapping used previously. We have only changed the type and field names and saved it in the mapping3.json file. Let’s create a new index by running the following command:
curl -XPUT localhost:9200/countries -d @mapping3.json
The example data that we will use looks as follows (stored in the file called documents3.json):
{"index":{"_index":"countries","_type":"country","_id":1}}
{"name":"UK","area":{"type":"polygon","coordinates":[[[-5.756836,49.991408],[-7.250977,55.124723],[-3.955078,59.352096],[1.845703,51.500194],[-5.756836,49.991408]]]}}
{"index":{"_index":"countries","_type":"country","_id":2}}
{"name":"France","area":{"type":"polygon","coordinates":[[[3.1640625,42.09822241118974],[-1.7578125,43.32517767999296],[-4.21875,48.22467264956519],[2.4609375,50.90303283111257],[7.998046875,48.980216985374994],[7.470703125,44.08758502824516],[3.1640625,42.09822241118974]]]}}
{"index":{"_index":"countries","_type":"country","_id":3}}
{"name":"Spain","area":{"type":"polygon","coordinates":[[[3.33984375,42.22851735620852],[-1.845703125,43.32517767999296],[-9.404296875,43.19716728250127],[-6.6796875,41.57436130598913],[-7.3828125,36.87962060502676],[-2.109375,36.52729481454624],[3.33984375,42.22851735620852]]]}}
To index the data, we just need to run the following command:
curl -XPOST localhost:9200/_bulk --data-binary @documents3.json
As you can see in the data, each document contains a polygon type. The polygons define the area of the given countries (again, it is far from being accurate). If you remember, the first point of a shape needs to be the same as the last one so that the shape is closed. Now, let’s change our query to include the shapes from the index. Our new query looks as follows:
curl -XGET 'localhost:9200/map2/_search?pretty' -d '{
"query":{
"bool":{
"must":{"match_all":{}},
"filter":{
"geo_shape":{
"location":{
"indexed_shape":{
"index":"countries",
"type":"country",
"path":"area",
"id":"1"
}
}
}
}
}
}
}'
When comparing these two queries, we can note that the shape object changed to indexed_shape. We need to tell Elasticsearch where to look for this shape. We can do this by defining the index (the index property, which defaults to shapes), the type (the type property), and the path (the path property, which defaults to shape). The last required item is the id property of the shape. In our case, this is 1. However, if you want to index more shapes, we advise you to index the shapes with their name as their identifier.
Using suggesters
A long time ago, starting from Elasticsearch 0.90 (which was released on April 29, 2013), we got the ability to use so-called suggesters. We can define a suggester as a functionality allowing us to correct the user’s spelling mistakes and build autocomplete functionality with performance in mind. This section is dedicated to these functionalities and will help you learn about them. We will discuss each available suggester type and show the most common properties that allow us to control them. However, keep in mind that this section is not a comprehensive guide describing each and every property. Describing all the details of suggesters is a very broad topic and is out of the scope of this book. If you want to dig into their functionality, refer to the official Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html) or to the Mastering Elasticsearch Second Edition book published by Packt Publishing.
Available suggester types
These have changed since the initial introduction of the Suggest API to Elasticsearch. We are now able to use four types of suggesters:
term: A suggester returning corrections for each word passed to it. Useful for suggestions that are not phrases, such as single term queries.
phrase: A suggester working on phrases, returning a proper phrase.
completion: A suggester designed to provide fast and efficient autocomplete results.
context: An extension to the Suggest API of Elasticsearch. It allows us to handle parts of the suggest queries in memory and is thus very effective in terms of performance.
Including suggestions
Let’s now try getting suggestions along with the query results. For example, let’s use a match_all query and try getting a suggestion for the serlock holnes phrase, which has two terms spelled incorrectly. To do this, we run the following command:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"suggest":{
"first_suggestion":{
"text":"serlock holnes",
"term":{
"field":"_all"
}
}
}
}'
As you can see, we’ve introduced a new section to our query: the suggest one. We’ve specified the text we want to get the correction for by using the text property. We’ve specified the suggester we want to use (the term one) and configured it by specifying the name of the field that should be used for building suggestions using the field property. first_suggestion is the name we give to our suggester; we need to do this because there can be multiple ones used. This is how you send a request for suggestions in general.
If we want to get multiple suggestions for the same text, we can embed our suggestions in the suggest object and place the text property directly in the suggest object. For example, if we want to get suggestions for the serlock holnes text for the title field and for the _all field, we run the following command:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"query":{
"match_all":{}
},
"suggest":{
"text":"serlock holnes",
"first_suggestion":{
"term":{
"field":"_all"
}
},
"second_suggestion":{
"term":{
"field":"title"
}
}
}
}'
Suggester response
Now let’s look at the response of the first query we sent. As you can guess, the response includes both the query results and the suggestions:
{
"took":10,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":1.0,
"hits":[...]
},
"suggest":{
"first_suggestion":[{
"text":"serlock",
"offset":0,
"length":7,
"options":[{
"text":"sherlock",
"score":0.85714287,
"freq":1
}]
},{
"text":"holnes",
"offset":8,
"length":6,
"options":[{
"text":"holmes",
"score":0.8333333,
"freq":1
}]
}]
}
}
We can see that we got both the search results and the suggestions (we’ve omitted the results to make the example more readable) in the response. The term suggester returned a list of possible suggestions for each term that was present in the text parameter. For each term, the term suggester returns an array of possible suggestions. Looking at the data returned for the serlock term, we can see the original word (the text parameter), its offset in the original text parameter (the offset parameter), and its length (the length parameter).
The options array contains suggestions for the given word and will be empty if Elasticsearch doesn’t find any suggestions. Each entry in this array is a suggestion and is described by the following properties:
text: Text of the suggestion.
score: Suggestion score; the higher the score, the better the suggestion.
freq: Frequency of the suggestion. The frequency represents how many times the word appears in the documents in the index we are running the suggestion query against.
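The offset and length values make it straightforward to build a corrected version of the user’s text on the client side. The following Python sketch is one possible way to do it, not something prescribed by Elasticsearch: the helper name apply_suggestions is ours, it always takes the first option of each entry, and it assumes that offsets are counted in characters.

```python
def apply_suggestions(text, entries):
    """Replace each term that has suggestions with its first option."""
    corrected = []
    cursor = 0
    for entry in sorted(entries, key=lambda e: e["offset"]):
        start = entry["offset"]
        end = start + entry["length"]
        corrected.append(text[cursor:start])  # untouched text before the term
        options = entry["options"]
        # Keep the original term when Elasticsearch found no suggestion.
        corrected.append(options[0]["text"] if options else text[start:end])
        cursor = end
    corrected.append(text[cursor:])
    return "".join(corrected)


# The first_suggestion section of the response shown above.
first_suggestion = [
    {"text": "serlock", "offset": 0, "length": 7,
     "options": [{"text": "sherlock", "score": 0.85714287, "freq": 1}]},
    {"text": "holnes", "offset": 8, "length": 6,
     "options": [{"text": "holmes", "score": 0.8333333, "freq": 1}]},
]

corrected_text = apply_suggestions("serlock holnes", first_suggestion)
```

For the response above, corrected_text is sherlock holmes. A real application would probably also look at the score before replacing anything.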
Term suggester
The term suggester works on the basis of string edit distance. This means that the suggestion with the fewest characters that need to be changed, added, or removed to turn it into the original word is the best one. For example, let’s take the words worl and work. To change the worl term to work, we need to change the l letter to k, so it means a distance of 1. The text provided to the suggester is of course analyzed and then terms are chosen to be suggested.
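To get a feeling for how edit distance behaves, here is a short Python sketch of the classic Levenshtein algorithm. Treat it as an illustration of the concept only: the function name is ours, and the distance implementation used inside Elasticsearch may differ in details (for example, in how swapped neighboring letters are counted).

```python
def edit_distance(a, b):
    """Minimum number of single-character changes, insertions, and deletions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


worl_to_work = edit_distance("worl", "work")
serlock_to_sherlock = edit_distance("serlock", "sherlock")
holnes_to_holmes = edit_distance("holnes", "holmes")
```

All three pairs are one edit apart: one changed letter for worl and holnes, and one added letter for serlock.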
Term suggester configuration options
The common and most used term suggester options can be used for all the suggester implementations that are based on the term one. Currently, these are the phrase suggester and of course the base term one. The available options are:
text: The text we want to get the suggestions for. This parameter is required in order for the suggester to work.
field: Another required parameter that we need to provide. The field parameter allows us to set which field the suggestions should be generated for.
analyzer: The name of the analyzer which should be used to analyze the text provided in the text parameter. If not set, Elasticsearch utilizes the analyzer used for the field provided by the field parameter.
size: Defaults to 5 and specifies the maximum number of suggestions allowed to be returned for each term provided in the text parameter.
suggest_mode: Controls which suggestions will be included and for what terms the suggestions will be returned. The possible options are: missing, the default behavior, which means that the suggester will only provide suggestions for terms that are not present in the index; popular, which means that the suggestions will only be returned when they are more frequent than the provided term; and finally always, which means that suggestions will be returned every time.
sort: Allows us to specify how the suggestions are sorted in the result returned by Elasticsearch. By default, it is set to score, which tells Elasticsearch that the suggestions should be sorted by the suggestion score first, the suggestion document frequency next, and finally by the term. The second possible value is frequency, which means that the results are first sorted by the document frequency, then by the score, and finally by the term.
Additional term suggester options
In addition to the preceding common term suggest options, Elasticsearch allows us to use additional ones that only make sense for the term suggester itself. Some of these options are as follows:
lowercase_terms: When set to true, it tells Elasticsearch to lowercase all the terms that are produced from the text field after analysis.
max_edits: It defaults to 2 and specifies the maximum edit distance that the suggestion can have to be returned as a term suggestion. Elasticsearch allows us to set this value to 1 or 2.
prefix_len: By default, it is set to 1. If we are struggling with suggester performance, increasing this value will improve the overall performance, because fewer suggestions will need to be processed.
min_word_len: It defaults to 4 and specifies the minimum number of characters a suggestion must have in order to be returned on the suggestions list.
shard_size: It defaults to the value specified by the size parameter and allows us to set the maximum number of suggestions that should be read from each shard. Setting this property to values higher than the size parameter can result in more accurate document frequency at the cost of degradation in suggester performance.
Note
The provided list of parameters does not contain all the options that are available for the term suggester. Refer to the official Elasticsearch documentation for reference, at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html.
Phrase suggester
The term suggester provides a great way to correct user spelling mistakes on a per-term basis, but it is not great for phrases. That’s why the phrase suggester was introduced. It is built on top of the term suggester, but adds additional phrase calculation logic to it.
Let’s start with an example of how to use the phrase suggester. This time we will omit the query section in our query. We do that by running the following command:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
"suggest":{
"text":"sherlock holnes",
"our_suggestion":{
"phrase":{"field":"_all"}
}
}
}'
As you can see in the preceding command, it is almost the same as the one we sent when using the term suggester, but instead of specifying the term suggester type we’ve specified the phrase type. The response to the preceding command is as follows:
{
"took":24,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
"max_score":1.0,
"hits":[...]
},
"suggest":{
"our_suggestion":[{
"text":"sherlock holnes",
"offset":0,
"length":15,
"options":[{
"text":"sherlock holmes",
"score":0.12227806
}]
}]
}
}
As you can see, the response is very similar to the one returned by the term suggester but, instead of a suggestion for each single word, the corrected terms are already combined and returned as a phrase.
Configuration
Because the phrase suggester is based on the term suggester, it can also use some of the configuration options provided by it. Those options are: text, size, analyzer, and shard_size. In addition to the mentioned properties, the phrase suggester exposes additional options. Some of these options are:
max_errors: Specifies the maximum number (or percentage) of terms that can be erroneous in order to create a correction using it. The value of this property can be either an integer number, such as 1, or a float between 0 and 1 which will be treated as a percentage value. By default, it is set to 1, which means that at most a single term can be misspelled in a given correction.
separator: Defaults to a whitespace character and specifies the separator that will be used to divide the terms in the resulting bigram field.
Note
The provided list of parameters does not contain all the options that are available for the phrase suggester. In fact, the list is way more extensive than what we’ve provided. Refer to the official Elasticsearch documentation for reference, at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html, or to Mastering Elasticsearch Second Edition published by Packt Publishing.
Completion suggester
The completion suggester allows us to create autocomplete functionality in a very performance-effective way, because it stores complicated structures in the index instead of calculating them during query time. We need to prepare Elasticsearch for that by using a dedicated field type called completion. Let’s assume that we want to create an autocomplete feature to allow us to show book authors. In addition to the author’s name, we want to return the identifiers of the books she or he wrote. We start by creating the authors index by running the following command:
curl -XPOST 'localhost:9200/authors' -d '{
"mappings":{
"author":{
"properties":{
"name":{"type":"string"},
"ac":{
"type":"completion",
"payloads":true,
"analyzer":"standard",
"search_analyzer":"standard"
}
}
}
}
}'
Our index will contain a single type called author. Each document will have two fields: the name and the ac field, which is the field we will use for autocomplete. We’ve defined the ac field using the completion type. In addition to that, we’ve used the standard analyzer for both the index and the query time. The last thing is the payload: the additional, optional information we will return along with the suggestion. In our case, it will be an array of book identifiers.
Indexing data
To index the data, we need to provide some additional information along with what we usually provide during indexing. Let’s look at the following commands that index two documents describing the authors:
curl -XPOST 'localhost:9200/authors/author/1' -d '{
"name":"Fyodor Dostoevsky",
"ac":{
"input":["fyodor","dostoevsky"],
"output":"Fyodor Dostoevsky",
"payload":{"books":["123456","123457"]}
}
}'
curl -XPOST 'localhost:9200/authors/author/2' -d '{
"name":"Joseph Conrad",
"ac":{
"input":["joseph","conrad"],
"output":"Joseph Conrad",
"payload":{"books":["121211"]}
}
}'
Note the structure of the data for the ac field. We have provided the input, output, and payload properties. The optional payload property is used to provide the additional information that will be returned. The input property is used to provide the input information that will be used for building the completion used by the suggester. It will be used for user input matching. The optional output property is used to tell the suggester which data should be returned for the document.
We can also omit the additional parameters section and index data in the way we are used to, just like in the following example:
curl -XPOST 'localhost:9200/authors/author/1' -d '{
"name":"Fyodor Dostoevsky",
"ac":"Fyodor Dostoevsky"
}'
However, because the completion suggester uses an FST (finite state transducer) under the hood, we won’t be able to find the preceding document by starting with the second part of the ac field. That’s why we think that indexing the data in the way we showed first is more convenient, because we can explicitly control what we want to match and what we want to show as an output.
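The practical consequence is easier to see in a toy model. The following Python sketch imitates only the observable behavior (inputs are matched by prefix and the output is what gets returned); it uses a plain list instead of an FST, lowercases the text as a stand-in for analysis, and the PrefixCompleter class is a made-up name for this illustration, not part of any Elasticsearch client.

```python
class PrefixCompleter:
    """Toy prefix matcher: each input string points to an output and a weight."""

    def __init__(self):
        self._entries = []  # tuples of (lowercased input, output, weight)

    def add(self, inputs, output, weight=1):
        for text in inputs:
            self._entries.append((text.lower(), output, weight))

    def suggest(self, prefix):
        """Outputs whose inputs start with prefix, highest weight first."""
        prefix = prefix.lower()
        best = {}
        for text, output, weight in self._entries:
            if text.startswith(prefix):
                best[output] = max(weight, best.get(output, 0))
        ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
        return [output for output, _ in ranked]


# Explicit inputs: both the first name and the surname can be completed.
explicit = PrefixCompleter()
explicit.add(["fyodor", "dostoevsky"], "Fyodor Dostoevsky")
explicit.add(["joseph", "conrad"], "Joseph Conrad")

# Plain string: only the beginning of the whole value can be completed.
plain = PrefixCompleter()
plain.add(["Fyodor Dostoevsky"], "Fyodor Dostoevsky")
```

Here explicit.suggest("dos") finds the author, while plain.suggest("dos") returns nothing, because the only indexed input starts with fyodor.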
Querying indexed completion suggester data
If we want to find documents that have authors starting with fyo, we run the following command:
curl -XGET 'localhost:9200/authors/_suggest?pretty' -d '{
"authorsAutocomplete":{
"text":"fyo",
"completion":{
"field":"ac"
}
}
}'
Before we look at the results, let’s discuss the query. As you can see, we’ve run the command to the _suggest endpoint, because we don’t want to run a standard query; we are just interested in the autocomplete results. The query is quite simple. We set its name to authorsAutocomplete, we set the text we want to get the completion for (the text property), and we added the completion object with the configuration in it. The result of the preceding command looks as follows:
{
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"authorsAutocomplete":[{
"text":"fyo",
"offset":0,
"length":3,
"options":[{
"text":"Fyodor Dostoevsky",
"score":1.0,
"payload":{
"books":["123456","123457"]
}
}]
}]
}
As you can see in the response, we get the document we were looking for along with the payload information, if it is available (in the preceding response, it is: the books array).
We can also use fuzzy searches, which allow us to tolerate spelling mistakes. We do that by including the additional fuzzy section in our query. For example, to enable fuzzy matching in the completion suggester and set the maximum edit distance to 2 (which means that a maximum of two errors are allowed), we send the following query:
curl -XGET 'localhost:9200/authors/_suggest?pretty' -d '{
"authorsAutocomplete":{
"text":"fio",
"completion":{
"field":"ac",
"fuzzy":{
"edit_distance":2
}
}
}
}'
Although we’ve made a spelling mistake, we will still get the same results as we got earlier.
Custom weights
By default, the term frequency is used to determine the weight of the document returned by the prefix suggester. However, this may not be the best solution. In such cases, it is useful to define the weight of the suggestion by specifying the weight property for the field defined as completion. The weight property should be set to an integer value. The higher the weight property value, the more important the suggestion. For example, if we want to specify a weight for the first document in our example, we run the following command:
curl -XPOST 'localhost:9200/authors/author/1' -d '{
"name":"Fyodor Dostoevsky",
"ac":{
"input":["fyodor","dostoevsky"],
"output":"Fyodor Dostoevsky",
"payload":{"books":["123456","123457"]},
"weight":30
}
}'
Now if we run our example query, the results will be as follows:
{
...
"authorsAutocomplete":[{
"text":"fyo",
"offset":0,
"length":3,
"options":[{
"text":"Fyodor Dostoevsky",
"score":30.0,
"payload":{
"books":["123456","123457"]
}
}]
}]
}
Look how the score of the result changed. In our initial example, it was 1.0 and now it is 30.0. This is so because we set the weight parameter to 30 during indexing.
Context suggester
The context suggester is an extension to the Elasticsearch Suggest API for Elasticsearch 2.1 and older versions that we just discussed. When describing the completion suggester for Elasticsearch 2.1, we mentioned that this suggester allows us to handle suggester-related searches entirely in memory. Using this suggester, we can define the so-called context for the query that will limit the suggestions to a subset of documents. Because we define the context in the mappings, it is calculated during indexing, which makes query-time calculations easier and less demanding in terms of performance.
Note
Remember that this section is related to Elasticsearch 2.1. Contexts in Elasticsearch 2.2 are handled differently and were discussed when discussing the completion suggester.
Context types
Elasticsearch 2.1 supports two types of context: category and geo. The category type of context allows us to assign a document to one or more categories during the index time. Later, during the query time, we can tell Elasticsearch which category we are interested in and Elasticsearch will limit the suggestions to those categories. The geo context allows us to limit the documents returned by the suggesters to a given location or to a certain distance from a point. The nice thing about context is that we can have multiple contexts. For example, we can have both the category context and the geo context for the same document. Let’s now see what we need to do to use context in suggestions.
Using context
Using the geo and category context is very similar; they just differ in parameters. We will show you how to use contexts in an example using the simpler category context and later we will get back to the geo context and show you what we need to provide.
The first step when using the context suggester is creating a proper mapping. Let’s get back to our author mapping, but this time let’s assume that each author can be given one or more categories: the brand of books she or he is writing. This will be our context. The mappings using the context look as follows:
curl -XPOST 'localhost:9200/authors_context' -d '{
"mappings":{
"author":{
"properties":{
"name":{"type":"string"},
"ac":{
"type":"completion",
"analyzer":"simple",
"search_analyzer":"simple",
"context":{
"brand":{
"type":"category",
"default":["none"]
}
}
}
}
}
}
}'
We’ve introduced a new section in our ac field definition: context. Each context is given a name, which is brand in our case, and inside that object we provide configuration. We need to provide the type using the type property; we will be using the category context now. In addition to that, we’ve set the default array, which provides the value or values that should be used as the default context. If we want, we can also provide the path property, which will point Elasticsearch to a field in the documents from which the context value should be taken.
We can now index a single author by modifying the commands we used earlier, because we need to provide the context:
curl -XPOST 'localhost:9200/authors_context/author/1' -d '{
"name":"Fyodor Dostoevsky",
"ac":{
"input":"Fyodor Dostoevsky",
"context":{
"brand":"drama"
}
}
}'
As you can see, the ac field definition is a bit different now; it is an object. The input property is used to provide the value for autocomplete and the context object is used to provide the values for each of the contexts defined in the mappings.
Finally, we can query the data. As you could imagine, we will again provide the context we are interested in. The query that does that looks as follows:
curl -XGET 'localhost:9200/authors_context/_suggest?pretty' -d '{
"authorsAutocomplete":{
"text":"fyo",
"completion":{
"field":"ac",
"context":{
"brand":"drama"
}
}
}
}'
As you can see, we’ve included the context object in the query inside the completion section and we’ve set the context we are interested in using the context name. The response returned by Elasticsearch is as follows:
{
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"authorsAutocomplete":[{
"text":"fyo",
"offset":0,
"length":3,
"options":[{
"text":"Fyodor Dostoevsky",
"score":1.0
}]
}]
}
However, if we change the brand context to comedy, for example, Elasticsearch will return no results, because we don’t have authors with such a context. Let’s test it by running the following query:
curl -XGET 'localhost:9200/authors_context/_suggest?pretty' -d '{
"authorsAutocomplete":{
"text":"fyo",
"completion":{
"field":"ac",
"context":{
"brand":"comedy"
}
}
}
}'
This time Elasticsearch returns the following response:
{
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"authorsAutocomplete":[{
"text":"fyo",
"offset":0,
"length":3,
"options":[]
}]
}
This is because no author with the brand context and the value of comedy is present in the authors_context index.
Using the geo location context
The geo context is similar to the category context when it comes to using it. However, instead of filtering by terms, we filter using geographical points and distances. When we use the geo context, we need to provide precision, which defines the precision of the calculated geohash. The second property that we provide is the neighbors one, which can be set to true or false. By default, it is set to true, which means that the neighboring geohashes will be included in the context.
In addition to that, similar to the category context, we can provide path, which specifies which field to use as the lookup for the geographical point, and the default property, specifying the default geopoint for the documents.
For example, let’s assume that we want to filter on the birth place of our authors. The mappings for such a suggester will look as follows:
curl -XPOST 'localhost:9200/authors_geo_context' -d '{
"mappings":{
"author":{
"properties":{
"name":{"type":"string"},
"ac":{
"type":"completion",
"analyzer":"simple",
"search_analyzer":"simple",
"context":{
"birth_location":{
"type":"geo",
"precision":["1000km"],
"neighbors":true,
"default":{
"lat":0.0,
"lon":0.0
}
}
}
}
}
}
}
}'
Now we can index the documents and provide the birth location. For our example author, it will look as follows (the centre of Moscow):
curl -XPOST 'localhost:9200/authors_geo_context/author/1' -d '{
"name":"Fyodor Dostoevsky",
"ac":{
"input":"Fyodor Dostoevsky",
"context":{
"birth_location":{
"lat":55.75,
"lon":37.61
}
}
}
}'
As you can see, we’ve provided the birth_location context for our author.
Now during query time, we need to provide the context that we are interested in and we can (but we are not obligated to) provide the precision as a subset of the precision values provided in the mappings. We’ve defined the precision as 1000km, so let’s find all the authors starting with fyo that were born in Kazan, which is about 800 km from Moscow. We should find our example author.
The query that does that looks as follows:
curl -XGET 'localhost:9200/authors_geo_context/_suggest?pretty' -d '{
"authorsAutocomplete":{
"text":"fyo",
"completion":{
"field":"ac",
"context":{
"birth_location":{
"lat":55.45,
"lon":49.8
}
}
}
}
}'
The response returned by Elasticsearch looks as follows:
{
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"authorsAutocomplete":[{
"text":"fyo",
"offset":0,
"length":3,
"options":[{
"text":"Fyodor Dostoevsky",
"score":1.0
}]
}]
}
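To understand why a query point in Kazan can match an author indexed in Moscow, it helps to look at the geohashes of both points. The following Python sketch implements the standard geohash encoding. The hash length of 2 is chosen only for illustration; which length Elasticsearch derives from the 1000km precision is not stated here, so treat that part as an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash(lat, lon, length):
    """Encode a point as a geohash with the given number of characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the range
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the range
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits form one base32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


moscow = geohash(55.75, 37.61, 2)
kazan = geohash(55.45, 49.8, 2)
```

With two characters, Moscow falls into cell uc and Kazan into cell v1. These two cells share an edge at longitude 45, so when neighbors is set to true and neighboring geohashes are included in the context, a match between the two points is plausible even though their hashes differ.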
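However, if we run the same query but point to a distant location (latitude 0, longitude 0, which lies on the equator in the Gulf of Guinea), we will get no results:
curl -XGET 'localhost:9200/authors_geo_context/_suggest?pretty' -d '{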
"authorsAutocomplete":{
"text":"fyo",
"completion":{
"field":"ac",
"context":{
"birth_location":{
"lat":0.0,
"lon":0.0
}
}
}
}
}'
The following is the response from Elasticsearch in this case:
{
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"authorsAutocomplete":[{
"text":"fyo",
"offset":0,
"length":3,
"options":[]
}]
}
The Scroll API
Let’s imagine that we have an index with several million documents. We already know how to build our query and so on. However, when trying to fetch a large number of documents, you see that when getting further and further into the pages of the results, the queries slow down and finally time out or result in memory issues.
The reason for this is that full-text search engines, especially those that are distributed, don’t handle deep paging very well. Of course, getting a few hundred pages of results is not a problem for Elasticsearch, but for going through all the indexed documents or through a large result set, a specialized API has been introduced.
Problem definition
When Elasticsearch generates a response, it must determine the order of the documents that form the result. If we are on the first page, this is not a big problem. Elasticsearch just finds the set of documents and collects the first ones; let’s say, 20 documents. But if we are on the tenth page, Elasticsearch has to take all the documents from pages one to ten and then discard the ones that are on pages one to nine. This is even more complicated if we have a distributed environment, because we don’t know from which nodes the results will come. Because of that, each node needs to build the response and keep it in memory for some time. The problem is not Elasticsearch-specific; a similar situation can be found in database systems and, generally, in every system that uses a so-called priority queue.
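The cost described above can be simulated in a few lines. The following Python sketch is a simplified model, not Elasticsearch code: every shard is just a list of scores, fetch_page is a hypothetical helper, and the only point is to show that each shard has to offer page × size candidates before the coordinating step can cut out the requested page.

```python
import heapq


def fetch_page(shards, page, size):
    """Return (hits for the 1-based page, number of candidates collected)."""
    needed = page * size  # every shard may own all of the top documents
    candidates = []
    for shard in shards:
        # Each shard keeps a priority queue of its own best documents.
        candidates.extend(heapq.nlargest(needed, shard))
    # Merge the per-shard results and drop everything before the page.
    merged = heapq.nlargest(needed, candidates)
    return merged[needed - size:], len(candidates)


# Five shards with 200 documents each; the score doubles as the document.
shards = [list(range(i, 1000, 5)) for i in range(5)]

first_hits, first_cost = fetch_page(shards, page=1, size=20)
tenth_hits, tenth_cost = fetch_page(shards, page=10, size=20)
```

In this model, the first page is built from 100 candidates, while the tenth page needs 1,000 candidates to return the same 20 documents per request, and the number keeps growing with every further page.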
Scrolling to the rescue
The solution is simple. Since Elasticsearch has to do some operations (determine the documents for the previous pages) for each request, we can ask Elasticsearch to store this information for subsequent queries. The drawback is that we cannot store this information forever due to limited resources. Elasticsearch assumes that we can declare how long we need this information to be available. Let’s see how it works in practice.
First of all, we query Elasticsearch as we usually do. However, in addition to all the known parameters, we add one more: the scroll parameter, which tells Elasticsearch that we want to use scrolling and how long it should keep the information about the results. We can do this by sending a query as follows:
curl 'localhost:9200/library/_search?pretty&scroll=5m' -d '{
"size":1,
"query":{
"match_all":{}
}
}'
The content of this query is irrelevant. The important thing is how Elasticsearch modifies the response. Look at the following first few lines of the response returned by Elasticsearch:
{
"_scroll_id":
"cXVlcnlUaGVuRmV0Y2g7NTsxNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzE3OjVEM2tieV9vUl
N5TWxfbTJLQ1FJRnc7MTg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsxOTo1RDNrYnlfb1JTeU1sX
20yS0NRSUZ3OzIwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs=",
"took":3,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":4,
...
The new part is the _scroll_id section. This is a handle that we will use in the queries that follow. Elasticsearch has a special endpoint for this: the _search/scroll endpoint. Let’s look at the following example:
curl -XGET 'localhost:9200/_search/scroll?pretty' -d '{
  "scroll": "5m",
  "scroll_id": "cXVlcnlUaGVuRmV0Y2g7NTsyNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzI3OjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7Mjg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsyOTo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzMwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs="
}'
Now every call to this endpoint with scroll_id returns the next page of results. Remember that this handle is only valid for the defined time of inactivity.
Of course, this solution is not ideal, and it is not very appropriate when there are many requests to random pages of various results or when the time between the requests is difficult to determine. However, you can use this successfully for use cases where you want to get larger result sets, such as transferring data between several systems.
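The request sequence used with scrolling can be sketched as a simple loop. The following Python example is only an illustration under stated assumptions: scroll_all is a hypothetical helper, and the send(method, path, body) callable stands for whatever HTTP client you use. To keep the example runnable without a cluster, a small in-memory stand-in (make_fake_send) plays the role of Elasticsearch; it is not part of any real client library.

```python
def scroll_all(send, index, keep_alive="5m", size=100):
    """Yield every hit of a match_all query by following the scroll handle.

    `send(method, path, body)` is a hypothetical transport callable that
    performs the request and returns the decoded JSON response as a dict.
    """
    body = {"size": size, "query": {"match_all": {}}}
    response = send("GET", "/%s/_search?scroll=%s" % (index, keep_alive), body)
    # An empty hits array is treated as the end of the result set.
    while response["hits"]["hits"]:
        for hit in response["hits"]["hits"]:
            yield hit
        # Use the most recently returned handle for the next call.
        body = {"scroll": keep_alive, "scroll_id": response["_scroll_id"]}
        response = send("GET", "/_search/scroll", body)


def make_fake_send(docs):
    """In-memory stand-in for a cluster, for demonstration only."""
    state = {"pos": 0, "size": 0}

    def send(method, path, body):
        if "size" in body:  # the initial search request sets the page size
            state["size"] = body["size"]
        start = state["pos"]
        state["pos"] = start + state["size"]
        return {
            "_scroll_id": "handle-%d" % state["pos"],
            "hits": {"total": len(docs), "hits": docs[start:state["pos"]]},
        }

    return send


docs = [{"_id": str(i)} for i in range(1, 5)]
ids = [hit["_id"] for hit in scroll_all(make_fake_send(docs), "library", size=1)]
print(ids)
```

With a real client, the loop must issue each follow-up request before the keep-alive time passes; otherwise the handle expires and the scroll has to be started again.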
Summary
In the chapter that we just finished, we learned about some functionalities of Elasticsearch that we probably won’t use every day, or at least not every one of us will use them. We discussed percolator, an upside-down search functionality that allows us to index queries and find which documents match them. We learned about the spatial capabilities of Elasticsearch, and we used suggesters to correct user spelling mistakes and build a highly efficient autocomplete functionality. We also used the Scroll API to efficiently fetch a large number of results from our Elasticsearch indices.
In the next chapter, we will focus on clusters and their configuration. We will discuss the node discovery, gateway, and recovery modules: what they are responsible for and how to configure them to match our needs. We will use templates and dynamic templates, and we will see how to install plugins extending Elasticsearch’s out-of-the-box functionalities. We will learn what the Elasticsearch caches are and how to configure them efficiently to make the most out of them. Finally, we will use the update settings API to update the Elasticsearch configuration on live and running clusters.
Chapter 9. Elasticsearch Cluster in Detail
The previous chapter was fully dedicated to search functionalities that are not only about full text searching. We learned how to use percolator, an inverted search that allows us to build alerting functionalities on top of Elasticsearch. We learned to use the spatial functionalities of Elasticsearch, and we used the suggest API that allowed us to correct users’ spelling mistakes as well as build very efficient autocomplete functionalities. But let’s now focus on running and administering Elasticsearch. By the end of this chapter, you will have learned the following topics:
How does Elasticsearch find new nodes that should join the cluster
What are the gateway and recovery modules
How do templates work
How to use dynamic templates
How to use the Elasticsearch plugin mechanism
What are the caches in Elasticsearch and how to tune them
How to use the Update Settings API to update Elasticsearch settings on running clusters
Understanding node discovery
When starting your Elasticsearch node, one of the first things that happens is looking for a master node that has the same cluster name and is visible. If a master is found, the node joins the already formed cluster. If no master is found, the node itself is selected as a master (if the configuration allows such behavior, of course). The process of forming a cluster and finding nodes is called discovery. The module responsible for discovery has two main purposes: electing a master and discovering new nodes within a cluster. In this section, we will discuss how we can configure and tune the discovery module.
Discovery types
By default, without installing additional plugins, Elasticsearch allows us to use Zen discovery, which provides us with unicast discovery. Unicast (http://en.wikipedia.org/wiki/Unicast) allows transmission of a single message over the network to a single host at once. An Elasticsearch node sends the message to the nodes defined in the configuration and waits for a response. When the node is accepted into the cluster, the recovery module kicks in and starts the recovery process if needed, or the master election process if the master is still not elected.
Note
Prior to Elasticsearch 2.0, the Zen discovery module allowed us to use multicast discovery. On a multicast-capable network, Elasticsearch was able to automatically discover nodes without specifying any IP addresses of other Elasticsearch servers sharing the same cluster name. This was very error prone and not advised for production use, and thus it was deprecated and moved to a plugin.
The Elasticsearch architecture is designed to be peer to peer. When running operations such as indexing or searching, the master node doesn’t take part in the communication and the relevant nodes communicate with each other directly.
Node roles
Elasticsearch nodes can be configured to work in one of the following roles:
Master: The node responsible for maintaining the global cluster state, changing it depending on the needs, and handling the addition and removal of nodes. There can only be a single master node active in a single cluster.
Data: The node responsible for holding the data and executing data-related operations (indexing and searching) on the shards that are present locally on the node.
Client: The node responsible for handling requests. For indexing requests, the client node forwards the request to the appropriate primary shard and, for search requests, it sends the request to all the relevant shards and aggregates the results.
By default, each node can work as master, data, or client. It can be a data and a client node at the same time, for example. On large and highly loaded clusters, it is very important to divide the roles of the nodes in the cluster and have each node perform only a single role at a time. When dealing with such clusters, you will often see at least three master nodes, multiple data nodes, and a few client-only nodes as part of the whole cluster.
Master node
It is the most important node type from the Elasticsearch cluster’s point of view. It handles the cluster state, changes it, manages the nodes joining and leaving the cluster, checks the health of the other nodes in the cluster (by running ping requests), and manages the shard relocation operations. If the master is somehow disconnected from the cluster, the remaining nodes will select a new master from among themselves. All these processes are done automatically on the basis of the configuration values we provide. You usually want the master nodes to only communicate with the other Elasticsearch nodes, using the internal Java communication. To avoid hitting the master nodes by mistake, it is advised to turn off the HTTP module for them in the configuration.
Data node
The data node is responsible for holding the data in the indices. The data nodes are the ones that need the most disk space, and they carry the load of indexing requests and of the searches run on the data they hold locally. The data nodes, similar to the master nodes, can have the HTTP module disabled.
Client node
The client nodes are in most cases nodes that don’t have any data and are not master nodes. The client nodes are the ones that communicate with the outside world and with all the nodes in the cluster. They forward the data to the appropriate shards and aggregate the search and aggregations results from all the other nodes.
Keep in mind that client nodes can have data as well, but in such a case they will run both the indexing requests and the search requests for the local data and will aggregate the data from the other nodes, which in large clusters may be too much work for a single node.
Configuring node roles
By default, Elasticsearch allows every node to be a master node, a data node, or a client node. However, as we already mentioned, in certain situations you may want to have nodes that only hold data, client nodes that are only used to process requests, and master hosts to manage the cluster. One such situation is when massive amounts of data need to be handled, where the data nodes should be as performant as possible. To tell Elasticsearch what role it should take, we use three Boolean properties set in the elasticsearch.yml configuration file:
node.master: When set to true, we tell Elasticsearch that the node is master eligible, which means that it can take the role of a master. However, note that the node will be automatically marked as not master eligible as soon as it is assigned a client role.
node.data: When set to true, we tell Elasticsearch that the node can be used to hold data.
node.client: When set to true, we tell Elasticsearch that the node should be used as a client.
So, to set a node to only hold data, we should add the following properties to the elasticsearch.yml configuration file:
node.master: false
node.data: true
node.client: false
To set the node to not hold data and only be a master node, we need to instruct Elasticsearch that we don’t want the node to hold data. In order to do this, we add the following properties to the elasticsearch.yml configuration file:
node.master: true
node.data: false
node.client: false
Setting the cluster’s name
If we don’t set the cluster.name property in our elasticsearch.yml file, Elasticsearch uses the elasticsearch default value. This is not a good thing, because each new Elasticsearch node will have the same cluster name and you may want to have multiple clusters in the same network. In such a case, connecting the wrong nodes together is just a matter of time. Because of that, we suggest setting the cluster.name property to some other value of your choice. Usually, it is a good idea to adjust cluster names based on cluster responsibilities.
Zen discovery
The default discovery method used by Elasticsearch, and one that is commonly used in the Elasticsearch world, is called Zen discovery. It supports unicast discovery and allows adjusting various parts of its configuration.
Note
Note that there are additional discovery types available as plugins, such as Amazon EC2 discovery, Microsoft Azure discovery, and Google Compute Engine discovery.
Master election configuration
Imagine that you have a cluster that is built of 10 nodes. Everything is working fine until one day when your network fails and 3 of your nodes are disconnected from the cluster, but they still see each other. Because of the Zen discovery and master election process, the nodes that got disconnected elect a new master and you end up with two clusters with the same name, with two master nodes. Such a situation is called a split-brain and you must avoid it as much as possible. When split-brain happens, you end up with two (or more) clusters that won’t join each other until the network (or any other) problems are fixed. The thing to remember is that split-brain may result in unrecoverable errors, such as data conflicts in which you end up with data corruption or partial data loss. That’s why it is important to avoid such situations at all costs.
In order to prevent split-brain situations, Elasticsearch provides a discovery.zen.minimum_master_nodes property. This property defines the minimum number of master-eligible nodes that should be connected to each other in order to form a cluster. So now let’s get back to our cluster; if we set the discovery.zen.minimum_master_nodes property to 50 percent of the total nodes available + 1 (which is 6 in our case), we will end up with a single cluster. Why is that? Before the network failure, we had 10 nodes, which is more than six nodes, and those nodes formed a cluster. After the disconnection of the three nodes, we would still have the first cluster up and running. However, because only three nodes got disconnected and three is less than six, these three nodes wouldn’t be allowed to elect a new master and they would wait for reconnection with the original cluster.
Of course, this is also not a perfect scenario. It is advised to have dedicated master-eligible nodes that don’t work as data or client nodes. To have a quorum in such a case, we need at least three dedicated master-eligible nodes, because that will allow us to have a single master offline and still keep the quorum. This is usually enough to keep the clusters in good shape when it comes to master-related features and to be split-brain proof. Of course, in such a case, the discovery.zen.minimum_master_nodes property should be set to 2 and we should have the three master nodes up and running.
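The arithmetic behind the quorum can be written down in a few lines. The following Python sketch only mirrors the rule described in the text (half of the master-eligible nodes, rounded down, plus one); the function names are hypothetical, and the election check is a simplification of what the discovery module really does.

```python
def minimum_master_nodes(master_eligible):
    """Quorum: half of the master-eligible nodes (rounded down) plus one."""
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible, minimum):
    """A group of nodes may elect a master only if it reaches the quorum."""
    return visible_master_eligible >= minimum


# The 10-node cluster from the example, split into groups of 7 and 3.
quorum = minimum_master_nodes(10)
print(quorum, can_elect_master(7, quorum), can_elect_master(3, quorum))

# Three dedicated master-eligible nodes tolerate the loss of one of them.
print(minimum_master_nodes(3), can_elect_master(2, minimum_master_nodes(3)))
```

The group of seven nodes keeps working, while the group of three cannot elect its own master, so only one cluster survives the network partition.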
Furthermore, Elasticsearch allows us to specify two additional Boolean properties: discovery.zen.master_election.filter_client and discovery.zen.master_election.filter_data. They allow us to tell Elasticsearch to ignore ping requests from the client and data nodes during master election. By default, the first mentioned property is set to true and the second is set to false. This allows Elasticsearch to focus on the master election and not be overloaded with ping requests from the nodes that are not master eligible.
In addition to the mentioned properties, Elasticsearch allows configuring timeouts related to the master election process. The first property, discovery.zen.ping_timeout, which defaults to 3s (three seconds), allows configuring the timeout for slow networks: the higher the value, the lower the chance of failure, but the election process can take longer. The second property is called discovery.zen.join_timeout and specifies the timeout for the join request to the master. It defaults to 20 times the discovery.zen.ping_timeout property.
Configuring unicast
Because of the way unicast works, we need to specify at least one host that the unicast message should be sent to. To do this, we should add the discovery.zen.ping.unicast.hosts property to our elasticsearch.yml configuration file. Basically, we should specify all the hosts that form the cluster in the discovery.zen.ping.unicast.hosts property (we don’t have to specify all the hosts, we just need to provide enough so that we are sure that a single one will work). For example, if we want to use the hosts 192.168.2.1, 192.168.2.2, and 192.168.2.3, we should specify the preceding property in the following way:
discovery.zen.ping.unicast.hosts: 192.168.2.1:9300, 192.168.2.2:9300, 192.168.2.3:9300
One can also define a range of the ports Elasticsearch can use. For example, to say that ports from 9300 to 9399 can be used, we specify the following:
discovery.zen.ping.unicast.hosts: 192.168.2.1:[9300-9399], 192.168.2.2:[9300-9399], 192.168.2.3:[9300-9399]
Note that the hosts are separated with a comma character and we’ve specified the port on which we expect unicast messages.
Fault detection ping settings
In addition to the settings discussed previously, we can also control or alter the default ping configuration. Ping is a signal sent between the nodes to check if they are running and responsive. The master node pings all the other nodes in the cluster and each of the other nodes in the cluster pings the master node. The following properties can be set:
discovery.zen.fd.ping_interval: This defaults to 1s (one second) and specifies how often the nodes ping each other
discovery.zen.fd.ping_timeout: This defaults to 30s (30 seconds) and defines how long a node will wait for the response to its ping message before considering a node as unresponsive
discovery.zen.fd.ping_retries: This defaults to 3 and specifies how many retries should be taken before considering a node as not working
If you experience some problems with your network, or you know that your nodes need more time to see the ping response, you can adjust the preceding values to the ones that are good for your deployment.
Cluster state updates control
As we have already discussed, the master node is the one responsible for handling the changes of the cluster state, and Elasticsearch allows us to control that process. For most use cases, the default settings are more than enough, but you may run into situations where changing the settings is required.
The master node processes a single cluster state command at a time. First the master node propagates the changes to other nodes and then it waits for a response. A cluster state change is not considered finished until enough nodes respond to the master with an acknowledgment. The number of nodes that need to respond is specified by discovery.zen.minimum_master_nodes, which we are already aware of. The maximum time an Elasticsearch node waits for the nodes to respond is 30s by default and is specified by the discovery.zen.commit_timeout property. If not enough nodes respond to the master, the cluster state change is rejected.
Once enough nodes respond to the master publish message, the cluster state change is accepted on the master and the cluster state is changed. Once that is done, the master sends a message to all the nodes saying that the change can be applied. The timeout of this message is again set to 30 seconds and is controlled using the discovery.zen.publish_timeout property.
Dealing with master unavailability
If a cluster has no master node, whatever the reason may be, it is not fully operational. By default, we can’t change the metadata, cluster-wide commands will not be working, and so on. Elasticsearch allows us to configure the behavior of the nodes when the master node is not elected. To do that, we can use the discovery.zen.no_master_block property, which accepts the values all and write. Setting this property to all means that all the operations on the node will be rejected, that is, the search operations, the write-related operations, and the cluster-wide operations such as health or mappings retrieval. Setting this property to write means that only the write operations will be rejected; this is the default behavior of Elasticsearch.
Adjusting HTTP transport settings
While discussing the node discovery module and process, we mentioned the HTTP module a few times. We would like to get back to that topic now and discuss a few properties that are useful when using Elasticsearch.
Disabling HTTP
The first thing is disabling HTTP completely. This is useful to ensure that the master and data nodes won’t accept any queries or requests in general from users. To disable the HTTP transport completely, we just need to add the http.enabled property and set it to false in our elasticsearch.yml file.
HTTP port
Elasticsearch allows us to define the port on which it will be listening to HTTP requests. This is done by using the http.port property. It defaults to 9200-9300, which means that Elasticsearch will start from port 9200 and increase the number if the port is not available (so the next instance will use port 9201, and so on). There is also http.publish_port, which is very useful when running Elasticsearch behind a firewall and when the HTTP port is not directly accessible. It defines which port should be used by the clients connecting to Elasticsearch and defaults to the same value as the http.port property.
HTTP host
We can also define the host to which Elasticsearch will bind. To specify it, we need to define the http.host property. The default value is the one set by the network module. If needed, we can set the publish host and the bind host separately using the http.publish_host and http.bind_host properties. You usually don’t have to specify these properties unless your nodes have nonstandard host names or multiple names and you want Elasticsearch to bind to a single one only.
You can find the full list of properties allowed for the HTTP module in the Elasticsearch official documentation available at https://www.elastic.co/guide/en/elasticsearch/reference/2.2/modules-http.html.
The gateway and recovery modules
Apart from our indices and the data indexed inside them, Elasticsearch needs to hold the metadata, such as the type mappings, the index-level settings, and so on. This information needs to be persisted somewhere so it can be read during cluster recovery. Of course, it could be stored in memory, but a full cluster restart or a fatal failure would result in this information being lost, which is not something that we want. This is why Elasticsearch introduced the gateway module. You can think about it as a safe haven for your cluster data and metadata. Each time you start your cluster, all the needed data is read from the gateway and, when you make a change to your cluster, it is persisted using the gateway module.
The gateway
In order to set the type of gateway we want to use, we need to add the gateway.type property to the elasticsearch.yml configuration file and set it to the local value. Currently, Elasticsearch recommends using the local gateway type (gateway.type set to local), which is the default one and the only one available without additional plugins.
The default local gateway type stores the indices and their metadata in the local file system. Compared to the other gateways, the write operation to this gateway is not performed in an asynchronous way, so, whenever a write succeeds, you can be sure that the data was written into the gateway (so basically indexed or stored in the transaction log).
Recovery control
In addition to choosing the gateway type, Elasticsearch allows us to configure when to start the initial recovery process. The recovery is a process of initializing all the shards and replicas, reading all the data from the transaction log, and applying it to the shards. Basically, it’s a process needed to start Elasticsearch.
For example, let’s imagine that we have a cluster that consists of 10 Elasticsearch nodes. We should inform Elasticsearch about the number of nodes by setting gateway.expected_nodes to that value, so 10 in our case. This value counts the expected nodes that are eligible to hold data or to be selected as a master. Elasticsearch will start the recovery process immediately if the number of nodes in the cluster is equal to that property.
We would also like to start the recovery after six nodes are together. To do this, we should set the gateway.recover_after_nodes property to 6. This property should be set to a value that ensures that the newest version of the cluster state snapshot will be available, which usually means that you should start recovery when most of your nodes are available.
There is also one more thing. We would like the gateway recovery process to start 5 minutes after the gateway.recover_after_nodes condition is met. To do this, we set the gateway.recover_after_time property to 5m. This property tells the gateway module how long to wait with the recovery process after the number of nodes reaches the minimum specified by the gateway.recover_after_nodes property. We may want to do this because we know that our network is quite slow and we want the communication between the nodes to be stable. Note that Elasticsearch won’t delay the recovery if the number of master and data eligible nodes that formed the cluster is equal to the value of the gateway.expected_nodes property.
The preceding property values should be set in the elasticsearch.yml configuration file. For example, if we would like to have the previously discussed values in the mentioned file, we would end up with the following section in the file:
gateway.recover_after_nodes: 6
gateway.recover_after_time: 5m
gateway.expected_nodes: 10
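The way these three properties interact can be summarized as a small decision function. The following Python sketch is only our reading of the rules described above, not the actual gateway module code; the should_start_recovery name and its parameters are hypothetical, the time is expressed in seconds, and treating "more than expected" the same as "equal to expected" is an assumption.

```python
def should_start_recovery(nodes, seconds_since_minimum_met,
                          recover_after_nodes=6,
                          recover_after_seconds=300,
                          expected_nodes=10):
    """Decide whether the initial recovery may start.

    nodes: number of master and data eligible nodes currently in the cluster.
    seconds_since_minimum_met: time elapsed since recover_after_nodes
    was reached (ignored while it has not been reached).
    """
    if nodes >= expected_nodes:
        # Everyone we expect has joined: recover immediately.
        return True
    if nodes < recover_after_nodes:
        # Too few nodes to trust that the newest cluster state is present.
        return False
    # Enough nodes are present: wait out the configured delay.
    return seconds_since_minimum_met >= recover_after_seconds


print(should_start_recovery(10, 0))     # all expected nodes are present
print(should_start_recovery(5, 1000))   # below recover_after_nodes
print(should_start_recovery(6, 60))     # minimum met, still waiting
print(should_start_recovery(8, 300))    # minimum met and delay elapsed
```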
Additional gateway recovery options
In addition to the mentioned options, Elasticsearch allows us some additional degree of control. These additional options are:
gateway.recover_after_master_nodes: This is similar to the gateway.recover_after_nodes property, but instead of taking into consideration all the nodes, it allows us to specify how many master-eligible nodes should be present in the cluster before recovery starts
gateway.recover_after_data_nodes: This is also similar to the gateway.recover_after_nodes property, but it allows specifying how many data nodes should be present in the cluster before recovery starts
gateway.expected_master_nodes: This is similar to the gateway.expected_nodes property, but instead of specifying the number of all the nodes that we expect in the cluster, it allows specifying how many master-eligible nodes we expect to be present
gateway.expected_data_nodes: This is also similar to the gateway.expected_nodes property, but allows specifying how many data nodes we expect to be present
Indices recovery API
There is also one other thing when it comes to the recovery process: the indices recovery API. It allows us to see the progress of index or indices recovery. To use it, we just need to specify the indices and use the _recovery endpoint. For example, to check the recovery process of the library index, we will run the following command:
curl -XGET 'localhost:9200/library/_recovery?pretty'
The response for the preceding command can be large and depends on the number of shards in the index and, of course, the number of indices we want to get information for. In our case, the response looks as follows (we kept the information about a single shard only to make it less extensive):
{
"library":{
"shards":[{
"id":0,
"type":"STORE",
"stage":"DONE",
"primary":true,
"start_time_in_millis":1444030695956,
"stop_time_in_millis":1444030695962,
"total_time_in_millis":5,
"source":{
"id":"Brt5ejEVSVCkIfvY9iDMRQ",
"host":"127.0.0.1",
"transport_address":"127.0.0.1:9300",
"ip":"127.0.0.1",
"name":"PuffAdder"
},
"target":{
"id":"Brt5ejEVSVCkIfvY9iDMRQ",
"host":"127.0.0.1",
"transport_address":"127.0.0.1:9300",
"ip":"127.0.0.1",
"name":"PuffAdder"
},
"index":{
"size":{
"total_in_bytes":157,
"reused_in_bytes":157,
"recovered_in_bytes":0,
"percent":"100.0%"
},
"files":{
"total":1,
"reused":1,
"recovered":0,
"percent":"100.0%"
},
"total_time_in_millis":1,
"source_throttle_time_in_millis":0,
"target_throttle_time_in_millis":0
},
"translog":{
"recovered":0,
"total":-1,
"percent":"-1.0%",
"total_on_start":-1,
"total_time_in_millis":4
},
"verify_index":{
"check_index_time_in_millis":0,
"total_time_in_millis":0
}
},
...
]
}
}
As you can see, the response contains information about each shard. For each shard, we see the type of the operation (the type property), the stage (the stage property) describing what part of the recovery process is in progress, and whether it is a primary shard (the primary property). In addition to this, we see sections about the source shard, the target shard, the index the shard is part of, the information about the transaction log, and finally information about the index verification. All of this allows us to see the status of the recovery of our indices.
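When the response covers many shards, it is often enough to list the ones that are still recovering. The following Python sketch shows one way to do that for an already decoded response; the unfinished_shards helper is hypothetical, the sample data is shortened and partly made up, and the INDEX stage name used for the second shard is an assumption, as the response shown in this section only contains the DONE stage.

```python
def unfinished_shards(recovery_response):
    """List (index, shard id, stage) for shards whose recovery is not DONE.

    recovery_response: the decoded JSON body returned by the _recovery
    endpoint, shaped like the example response in this section.
    """
    pending = []
    for index_name, index_info in recovery_response.items():
        for shard in index_info["shards"]:
            if shard["stage"] != "DONE":
                pending.append((index_name, shard["id"], shard["stage"]))
    return pending


# Shortened, hypothetical response with two shards of the library index.
sample = {
    "library": {
        "shards": [
            {"id": 0, "type": "STORE", "stage": "DONE", "primary": True},
            {"id": 1, "type": "STORE", "stage": "INDEX", "primary": True},
        ]
    }
}

print(unfinished_shards(sample))
```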
Delayed allocation
We already discussed that by default Elasticsearch tries to balance the shards in the cluster according to the number of nodes in that cluster. Because of that, when a node drops off the cluster (or multiple nodes do) or when nodes join the cluster, Elasticsearch starts rebalancing the cluster, moving the shards and the replicas around. This is usually very expensive: new primary shards may be promoted out of the available replicas, large amounts of data may be copied between the new primary and its replicas, and so on. And this may be happening because a single node was just restarted for 30 seconds of maintenance.
To avoid such situations, Elasticsearch provides us with the possibility to control how long to wait before beginning the allocation of shards that are in the unassigned state. We can control the delay by using the index.unassigned.node_left.delayed_timeout property and setting it on a per-index basis. For example, to configure the allocation timeout for the library index to 10 minutes, we run the following command:
curl -XPUT 'localhost:9200/library/_settings' -d '{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}'
We can also configure the allocation timeout for all the indices by running the following command:
curl -XPUT 'localhost:9200/_all/_settings' -d '{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}'
Index recovery prioritization
Elasticsearch 2.2 exposes one more feature when it comes to the indices recovery process that allows us to define which indices should be prioritized when it comes to recovery. By specifying the index.priority property in the index settings and assigning it a positive integer value, we define the order in which Elasticsearch should recover the indices; the ones with the higher index.priority property will be started first.
For example, let’s assume that we have two indices, library and map, and we want the library index to be recovered before the map index. To do this, we will run the following commands:
curl -XPUT 'localhost:9200/library/_settings' -d '{
  "settings": {
    "index.priority": 10
  }
}'
curl -XPUT 'localhost:9200/map/_settings' -d '{
  "settings": {
    "index.priority": 1
  }
}'
We assigned a higher priority to the library index and, because of that, its recovery will be started first.
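The ordering rule itself is simple enough to express in one line of Python. The sketch below only models the rule stated above (higher index.priority first); the recovery_order function is hypothetical, and how Elasticsearch orders indices with equal priorities is not modeled here.

```python
def recovery_order(priorities):
    """Return index names ordered so that higher index.priority comes first.

    priorities: mapping of index name to its index.priority value.
    """
    return sorted(priorities, key=lambda name: priorities[name], reverse=True)


print(recovery_order({"map": 1, "library": 10}))
```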
Templates and dynamic templates
In the Mappings configuration section of Chapter 2, Indexing Your Data, we discussed mappings, how they are created, and how the type-determining mechanism works. Now we will get into more advanced topics. We will show you how to dynamically create mappings for new indices and how to apply some logic to the templates, so that new indices are already created with predefined mappings.
Templates
In various parts of the book, when discussing index configuration and its structure, we’ve seen that this can become complicated, especially when we have sophisticated data structures that we want to index, search, and aggregate. If you have a lot of similar indices, taking care of the mappings in each of them can be a very painful process, because each new index has to be created with appropriate mappings. The Elasticsearch creators predicted this and implemented a feature called index templates. Each template defines a pattern, which is compared to a newly created index name. When both of them match, the values defined in the template are copied to the index structure definition. When multiple templates match the name of the newly created index, all of them are applied and the values from the templates that are applied later override the values defined in the previously applied templates. This is very convenient because we can define a few common settings in the general templates and change them in the more specialized ones. In addition, there is an order parameter that lets us force the desired template ordering. You can think of templates as dynamic mappings that can be applied not to the types in documents but to the indices.
An example of a template
Let’s see a real example of a template. Imagine that we want to create many indices in which we don’t want to store the source of the documents so that our indices are smaller. We also don’t need any replicas. We can create a template that matches our need by using the Elasticsearch REST API and the /_template endpoint, by sending the following command:
curl -XPUT 'http://localhost:9200/_template/main_template?pretty' -d '{
  "template": "*",
  "order": 1,
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "_default_": {
      "_source": {
        "enabled": false
      }
    }
  }
}'
From now on, all the created indices will have no replicas and no source stored. This is because the template parameter value is set to *, which matches all the names of the indices. Note the _default_ type name in our example. This is a special type name which indicates that the current rule should be applied to every document type. The second interesting thing is the order parameter. Let’s define a second template by using the following command:
curl -XPUT 'http://localhost:9200/_template/ha_template?pretty' -d '{
  "template": "ha_*",
  "order": 10,
  "settings": {
    "index.number_of_replicas": 5
  }
}'
After running the preceding command, all the new indices will behave as earlier except the ones with names beginning with ha_. In the case of these indices, both templates are applied. First, the template with the lower order value is used, and then the next template overwrites the replicas setting. So, the indices whose names start with ha_ will have five replicas and source storing disabled.
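The way matching templates are layered on top of each other can be illustrated with a short Python sketch. It is a simplified model and not the Elasticsearch implementation: the resolve_templates and deep_merge functions are hypothetical, and Python’s fnmatchcase is used as an approximation of the index name pattern matching (it accepts a few more wildcard forms than a plain *).

```python
import copy
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Recursively copy values from override into base and return base."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = copy.deepcopy(value)
    return base


def resolve_templates(index_name, templates):
    """Apply all matching templates, lowest order first."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    result = {}
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        for section in ("settings", "mappings"):
            if section in template:
                deep_merge(result.setdefault(section, {}), template[section])
    return result


# The two templates from this section, in arbitrary registration order.
templates = [
    {"template": "ha_*", "order": 10,
     "settings": {"index.number_of_replicas": 5}},
    {"template": "*", "order": 1,
     "settings": {"index.number_of_replicas": 0},
     "mappings": {"_default_": {"_source": {"enabled": False}}}},
]

print(resolve_templates("ha_logs", templates))
print(resolve_templates("books", templates))
```

For the ha_logs name, both templates match, so the replicas value from the template with the higher order wins while the _source setting from the general template is kept.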
Note
Before version 2.0, Elasticsearch templates could also be stored in files. Starting with Elasticsearch 2.0, this feature is no longer available.
Dynamic templates
Sometimes we want to define a mapping that depends on the field name and the detected type. This is where dynamic templates can help. Dynamic templates are similar to the usual mappings, but each template has its pattern defined, which is applied to a document’s field name. If a field name matches the pattern, the template is used.
Let’s have a look at the following example:
curl -XPOST 'localhost:9200/news' -d '{
  "mappings": {
    "article": {
      "dynamic_templates": [
        {
          "template_test": {
            "match": "*",
            "mapping": {
              "index": "analyzed",
              "fields": {
                "str": {
                  "type": "{dynamic_type}",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}'
In the preceding example, we defined the mapping for the article type. In this mapping, we have only one dynamic template named template_test. This template is applied for every field in the input document because of the single asterisk pattern in the match property. Each field will be treated as a multifield, consisting of a field named as the original field (for example, title) and the second field with a name suffixed with str (for example, title.str). The first field will have its type determined by Elasticsearch (with the {dynamic_type} type), and the second field will be a string (because of the string type).
The matching pattern
We have two ways of defining the matching pattern. They are as follows:
match: This template is used if the name of the field matches the pattern (this pattern type was used in our example)
unmatch: This template is used if the name of the field doesn’t match the pattern
By default, the pattern is very simple and uses glob patterns. This can be changed by setting the match_pattern property to regex. After adding this property, we can use all the magic provided by regular expressions to match and unmatch the patterns. There are variations such as path_match and path_unmatch that can be used to match the names in nested documents (by providing a path, similar to queries).
Field definitions
When writing a target field definition, the following variables can be used:
{name}: The name of the original field found in the input document
{dynamic_type}: The type determined from the original document
Note
Note that Elasticsearch checks the templates in the order of their definitions and the first matching template is applied. This means that the most generic templates (for example, with "match":"*") must be defined at the end.
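The first-match rule and the variable substitution can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not the real mapping code: the pick_mapping function and the two example rules are hypothetical, only the default glob-style match and unmatch are handled, and fnmatchcase is an approximation of the simple pattern matching.

```python
import json
from fnmatch import fnmatchcase


def pick_mapping(field_name, detected_type, dynamic_templates):
    """Return (template name, mapping) of the first template that matches."""
    for entry in dynamic_templates:
        (name, rule), = entry.items()  # each entry holds one named template
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        if "unmatch" in rule and fnmatchcase(field_name, rule["unmatch"]):
            continue
        # Substitute the {name} and {dynamic_type} variables in the mapping.
        text = json.dumps(rule["mapping"])
        text = text.replace("{name}", field_name)
        text = text.replace("{dynamic_type}", detected_type)
        return name, json.loads(text)
    return None, None


# Hypothetical rules: the specific one first, the generic one at the end.
rules = [
    {"ids": {"match": "*_id",
             "mapping": {"type": "string", "index": "not_analyzed"}}},
    {"everything": {"match": "*",
                    "mapping": {"type": "{dynamic_type}"}}},
]

print(pick_mapping("user_id", "string", rules))
print(pick_mapping("year", "long", rules))
# With the generic rule first, the specific one is never reached.
print(pick_mapping("user_id", "string", list(reversed(rules))))
```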
Elasticsearch plugins
At various places in this book, we have used different plugins that have been able to extend the core functionality of Elasticsearch. You probably remember the additional programming languages used in scripts described in the Scripting capabilities of Elasticsearch section of Chapter 6, Make Your Search Better. In this section, we will look at how the plugins work and how to install them.
The basics
By default, Elasticsearch plugins are located in their own subdirectory in the plugins subdirectory of the search engine home directory. If you have downloaded a new plugin manually, you can just create a new directory with the plugin name and unpack that plugin archive to this directory. There is also a more convenient way to install plugins: by using the plugin script. We have used it several times in this book without talking about it, so this time let’s take the time and describe this tool.
Elasticsearch has two main types of plugins. These two types can be categorized based on the content of the plugin-descriptor.properties file: Java plugins and site plugins. Let’s start with the site plugins. They usually contain sets of HTML, CSS, and JavaScript files and add additional UI components to Elasticsearch. Elasticsearch treats the site plugins as a file set that should be served by the built-in HTTP server under the /_plugin/plugin_name/ URL (for example, /_plugin/bigdesk/). This type of plugin doesn’t change anything in the core Elasticsearch functionality.
The Java plugins are the ones that add or modify the core Elasticsearch features. They usually contain the JAR files. The plugin-descriptor.properties file contains information about the main class that should be used by Elasticsearch as an entry point to configure plugins and allow them to extend the Elasticsearch functionality. The nice thing about the Java plugins is that they can contain the site part as well. The site part of the plugin needs to be placed in the _site directory if we are unpacking the plugin manually.
Installing plugins
Plugins can be downloaded from three source types. The first is the official repository located at https://download.elastic.co. All plugins from this source can be installed by referring to the plugin name. For example:
bin/plugin install lang-javascript
The preceding command results in installation of a plugin that allows us to use an additional scripting language, JavaScript. Elasticsearch automatically tries to find a plugin version that is the same as the version of Elasticsearch we are using. Sometimes, like in the following example, a plugin may ask for additional permissions during installation.
Just so we know what to expect, this is an example result of running the preceding command:
-> Installing lang-javascript…
Trying https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/lang-javascript/2.2.0/lang-javascript-2.2.0.zip…
Downloading…........................................................................DONE
Verifying https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/lang-javascript/2.2.0/lang-javascript-2.2.0.zip checksums if available…
Downloading .DONE
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission createClassLoader
* org.elasticsearch.script.ClassPermission <<STANDARD>>
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.ContextFactory
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Callable
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.NativeFunction
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Script
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.ScriptRuntime
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Undefined
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.optimizer.OptRuntime
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
Installed lang-javascript into /Users/someplace/elasticsearch-2.2.0/plugins/lang-javascript
Installed lang-javascript into /Users/negativ/Developer/Elastic/elasticsearch-2.2.0/plugins/lang-javascript
If the plugin is not available at the first location, it can be placed in one of the Apache Maven repositories: Maven Central (https://search.maven.org/) or Maven Sonatype (https://oss.sonatype.org/). In this case, the plugin name for installation should be equal to groupId/artifactId/version, just as for every Maven library (http://maven.apache.org/). For example:
bin/plugin install org.elasticsearch/elasticsearch-mapper-attachments/3.0.1
The third source is GitHub (https://github.com/) repositories. The plugin tool assumes that the given plugin address contains the organization name followed by the plugin name and, optionally, the version number. Let’s look at the following command example:
bin/plugin install mobz/elasticsearch-head
If you write your own plugin and you have no access to the earlier-mentioned sites, there is no problem. The plugin tool also accepts a URL from which the plugin should be downloaded (instead of the name of the plugin). This option allows us to set any location for the plugins, including the local file system (using the file:// prefix) or a remote file (using the http:// prefix). For example, the following command will result in the installation of a plugin archived on the local file system in the /tmp/elasticsearch-lang-javascript-3.0.0.RC1.zip file:
bin/plugin install file:///tmp/elasticsearch-lang-javascript-3.0.0.RC1.zip
Removing plugins
Removing a plugin is as simple as removing its directory. You can also do this by using the plugin tool. For example, to remove the previously installed JavaScript plugin, we run a command as follows:
bin/plugin remove lang-javascript
The output from the command just confirms that the plugin was removed:
-> Removing lang-javascript…
Removed lang-javascript
Note
You need to restart the Elasticsearch node for the plugin installation or removal to take effect.
Elasticsearch caches
Until now we haven’t mentioned Elasticsearch caches much in the book. However, like most common systems, Elasticsearch uses a variety of caches to perform more complicated operations or to speed up heavy data retrieval from disk-based Lucene indices. In this section, we will look at the most common caches of Elasticsearch, what they are used for, what the performance implications of using them are, and how to configure them.
Field data cache
In the beginning of the book, we discussed that Elasticsearch uses the so-called inverted index data structure to quickly and efficiently search through the documents. This is very good when searching and filtering the data, but for features such as aggregations, sorting, or script usage, Elasticsearch needs an un-inverted data structure, because these functions rely on per-document data.
Because of the need for uninverted data, Elasticsearch has contained, since its first release, an in-memory data structure called field data. Field data is used to store all the values of a given field in memory to provide very fast document-based lookup. However, the cost of using field data is memory and increased garbage collection. Because of the memory and performance cost, starting from Elasticsearch 2.0, each indexed, not analyzed field uses doc values by default. Other fields, such as analyzed text fields, still use field data and because of that it is good to know how to handle field data.
Field data size
Elasticsearch allows us to control how much memory the field data cache uses. By default, the cache is unbounded, which is very dangerous. If you have large indices, you may run into memory issues, where the field data cache will eat most of the memory given to Elasticsearch and will result in node failure. We are allowed to configure the size of the field data cache by using the static indices.fielddata.cache.size property set to an explicit value (like 10GB) or to a percentage of the whole memory given to Elasticsearch (like 20%).
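To make the two accepted forms of the size setting concrete, here is a minimal Python sketch that converts either form into a number of bytes. The function name resolve_cache_size, the unit table, and the 4 GB heap are assumptions made for illustration only; Elasticsearch's own parsing may accept more unit spellings.

```python
# Illustrative helper (not an Elasticsearch API): turn a setting such as
# "10GB" or "20%" into a number of bytes for a given heap size.
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

def resolve_cache_size(value, heap_bytes):
    text = value.strip().lower()
    if text.endswith("%"):
        # A percentage of the memory given to Elasticsearch.
        return int(heap_bytes * float(text[:-1]) / 100)
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[:-len(suffix)]) * factor)
    raise ValueError("unsupported size: " + value)

heap = 4 * 1024 ** 3  # assume a 4 GB heap
print(resolve_cache_size("20%", heap))
print(resolve_cache_size("10GB", heap))
```

With a 4 GB heap, the percentage form yields a bound well below the explicit 10GB value, which shows why the percentage form is the safer choice when the heap size may change.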
Remember that the field data cache is very expensive to build as it needs to load all the values of a given field to memory. This can take a lot of time, degrading the performance of the queries. Because of this, it is advised to have enough memory to keep the needed cache permanently in Elasticsearch memory. However, we understand that this is not always possible because of hardware costs.
Circuit breakers
The nice thing about Elasticsearch is that it allows us to achieve a similar thing in multiple ways and we have the same situation when it comes to field data and limiting the memory usage. Elasticsearch allows us to use a functionality called circuit breakers, which can estimate how much memory a request or a query will use, and if it is above a defined threshold, it won’t be executed at all, resulting in no memory usage and an exception thrown. This is very nice when we don’t want to limit the size of the field data cache but we also don’t want a single query to cause memory issues and make the cluster unstable. There are two main circuit breakers: the field data circuit breaker and the request circuit breaker.
The first circuit breaker, the field data one, estimates the amount of memory that will need to be used to load data to the field data cache for a given query. We can configure the limit by using the indices.breaker.fielddata.limit property, which is by default set to 60%, which means that a field data cache for a single query can’t use more than 60 percent of the memory given to Elasticsearch.
The second circuit breaker, the request one, estimates the memory used by per-request data structures and prevents them from using more than the amount specified by the indices.breaker.request.limit property. By default, the mentioned property is set to 40%, which means that single request data structures, such as the ones used for aggregation calculation, can’t use more than 40% of the memory given to Elasticsearch.
Finally, there is one more circuit breaker that is defined by the indices.breaker.total.limit property (by default set to 70%). This circuit breaker defines the total amount of memory that can be used by both the per-request data structures and field data.
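The interplay of the three default limits can be worked through with a small Python sketch. This is a simplified model written for illustration, not Elasticsearch code: the function name breaker_tripped is hypothetical, and the real breakers also account for memory already in use and estimation overhead.

```python
# Default limits quoted in the text, as fractions of the heap.
FIELDDATA_LIMIT = 0.60
REQUEST_LIMIT = 0.40
TOTAL_LIMIT = 0.70

def breaker_tripped(heap, fielddata_estimate, request_estimate):
    """Return the name of the first breaker that would trip, or None."""
    if fielddata_estimate > heap * FIELDDATA_LIMIT:
        return "fielddata"
    if request_estimate > heap * REQUEST_LIMIT:
        return "request"
    if fielddata_estimate + request_estimate > heap * TOTAL_LIMIT:
        return "total"
    return None

heap = 1000  # an abstract heap of 1000 units keeps the arithmetic readable
print(breaker_tripped(heap, 500, 100))  # 600 in total, within every limit
print(breaker_tripped(heap, 500, 300))  # 800 in total exceeds the 700 allowed
```

The second call shows why the combined limit matters: neither estimate crosses its own threshold, yet together they would take 80 percent of the heap.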
Remember that the settings for circuit breakers are dynamic and can be updated using the cluster update settings API.
Field data and doc values
As we already discussed, instead of field data cache, doc values can be used. Of course, this is only true for not analyzed fields and ones using numeric data types and not multivalued ones. This will save memory and should be faster than the field data cache during query time, at the cost of slight indexing speed degradations (very small) and a slightly larger index. If you can use doc values, do that – it will help your Elasticsearch cluster to maintain stability and respond to queries quickly.
Shard request cache
The shard request cache is the first of the caches that operates on queries. It caches the aggregations and suggestions produced by a query but, at the time of writing this book, it was not caching query hits. When Elasticsearch executes the query, this cache can save the resource-consuming aggregations for the query and speed up the subsequent queries by retrieving the aggregations or suggestions from memory.
Note
During the writing of this book, the shard request cache was only used when the size=0 parameter was set for the query. This means that only the total number of hits, aggregation results, and suggestions will be cached. Remember that when running queries with dates and using the now constant, the shard request cache won’t be used either.
The shard request cache, as its name says, caches the results of the queries on each shard, before they are returned to the node that aggregates the results. This can be very good when your aggregations are heavy, like the ones that do a lot of computation on the data returned by the query. If you run a lot of aggregations with your queries and the queries can be repeated, think about using the shard request cache as it should help you with query latency.
Enabling and configuring the shard request cache
The shard request cache is disabled by default, but can be easily enabled. To enable it, we should set the index.requests.cache.enable property to true when creating the index. For example, to enable the shard request cache for an index called new_library, we use the following command:
curl -XPUT 'localhost:9200/new_library' -d '{
  "settings": {
    "index.requests.cache.enable": true
  }
}'
One thing to remember is that the mentioned setting is not dynamically updatable. We need to include it in the index creation command or we can update it when the index is closed.
The maximum size of the cache is specified using the indices.requests.cache.size property and is set to 1% by default (which means 1% of the total memory given to Elasticsearch). We can also specify how long each entry should be kept by using the indices.requests.cache.expire property, but it is not set by default. Also, the cache is invalidated once the index is refreshed (during index searcher reopening), which makes the setting useless most of the time.
Note
Note that in the earlier versions of Elasticsearch, for example in the 1.x branch, to enable or disable this cache, the index.cache.query.enable property was used. This may be important when migrating from older Elasticsearch versions.
Per request shard request cache disabling
Elasticsearch allows us to control the use of the shard request cache on a per-request basis. If we have the mentioned cache enabled, we can still force the search engine to omit caching for particular requests. This is done by using the request_cache parameter. If set to true, the request will be cached and, if set to false, the request won’t be cached. This is especially useful when we want to cache our requests in general but omit caching for queries that are run only rarely. It is also wise not to cache requests that use non-deterministic scripts and time ranges.
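As a small illustration of the request_cache parameter, the following Python sketch builds the search URL for a request that opts out of caching. The helper name search_url and the choice of the new_library index are assumptions for the example; the size=0 parameter reflects the earlier note that only such requests were cacheable at the time of writing.

```python
from urllib.parse import urlencode

def search_url(index, use_cache, host="localhost:9200"):
    # request_cache is passed as a URI parameter of the search request.
    params = {"size": 0, "request_cache": "true" if use_cache else "false"}
    return "http://%s/%s/_search?%s" % (host, index, urlencode(params))

print(search_url("new_library", use_cache=False))
```

The resulting address can be used with curl in the same way as the other commands in this chapter.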
Shard request cache usage monitoring
If we don’t use any monitoring software that allows monitoring the cache usage, we can use the Elasticsearch API to check the metrics around the shard request cache. This can be done both at the indices level and at the nodes level.
To check the metrics for the shard request cache for all the indices, we should use the indices stats API and run the following command:
curl 'localhost:9200/_stats/request_cache?pretty'
To check the request cache metrics, but in a per-node view, we run the following command:
curl 'localhost:9200/_nodes/stats/indices/request_cache?pretty'
Node query cache
The node query cache is responsible for holding the results of queries for the whole node. Its size is defined using indices.queries.cache.size, defaulting to 10%, and is shared across all the shards present on the node. We can set it both to a percentage of the heap memory given to Elasticsearch, like the default one, or to an explicit value, like 1024mb. One thing to remember about the cache is that its configuration is static; it can’t be updated dynamically and should be set in the elasticsearch.yml file. The node query cache uses the least recently used eviction policy, which means that, when full, it removes the data that was used least recently.
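The least recently used policy is easy to picture with a tiny Python model. The LRUCache class below is a generic sketch written for this illustration and is not how Elasticsearch implements its cache; the keys and values are made-up placeholders.

```python
from collections import OrderedDict

class LRUCache:
    """A tiny least-recently-used cache; not Elasticsearch code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("q1", "hits-1")
cache.put("q2", "hits-2")
cache.get("q1")            # q1 is now more recent than q2
cache.put("q3", "hits-3")  # the cache is full, so q2 is evicted
print(list(cache.entries))
```

Note that q2 is evicted even though it was added after q1, because q1 was read in the meantime.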
This cache is very useful when you run queries that are repetitive and heavy, such as the ones used to generate category pages or the main page in an e-commerce application.
Indexing buffers
The last cache we want to discuss is the indexing buffer that allows us to improve indexing throughput. The indexing buffer is divided between all the shards on the node and is used to store newly indexed documents. Once the cache fills up, Elasticsearch flushes the data from the cache to disk, creating a new Lucene segment in the index.
There are four static properties that allow us to configure the indexing buffer size. They need to be set in the elasticsearch.yml file and can’t be changed dynamically using the Settings API. These properties are:
indices.memory.index_buffer_size: This property defines the amount of memory used by a node for the indexing buffer. It accepts both a percentage value as well as an explicit value in bytes. It defaults to 10%, which means that 10% of the heap memory given to a node will be used as the indexing buffer.
indices.memory.min_index_buffer_size: This property defaults to 48mb and specifies the minimum memory that will be used by the indexing buffer. It is useful when indices.memory.index_buffer_size is defined as a percentage value, so that the indexing buffer is never smaller than the value defined by this property.
indices.memory.max_index_buffer_size: This property specifies the maximum memory that will be used by the indexing buffer. It is useful when indices.memory.index_buffer_size is defined as a percentage value, so that the indexing buffer never crosses a certain amount of memory usage.
indices.memory.min_shard_index_buffer_size: This property defaults to 4mb and sets the hard minimum limit of the indexing buffer that is given to each shard on a node. The indexing buffer for each shard will not be lower than the value set by this property.
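The way a percentage value is bounded by the minimum and maximum properties can be sketched as simple arithmetic. The Python function below is an assumption-based illustration: its name and the order in which the bounds are applied are not taken from Elasticsearch, and the per-shard minimum is not modeled.

```python
MB = 1024 ** 2

def indexing_buffer_bytes(heap_bytes, percent=10, min_bytes=48 * MB, max_bytes=None):
    size = heap_bytes * percent // 100
    size = max(size, min_bytes)          # never below the minimum
    if max_bytes is not None:
        size = min(size, max_bytes)      # never above the maximum, if one is set
    return size

print(indexing_buffer_bytes(256 * MB) // MB)                        # small heap
print(indexing_buffer_bytes(4096 * MB) // MB)                       # larger heap
print(indexing_buffer_bytes(4096 * MB, max_bytes=200 * MB) // MB)   # capped
```

For the small heap, 10% would be about 25 MB, so the 48mb minimum applies instead; for the larger heap the percentage wins unless a maximum is configured.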
When it comes to indexing performance, if you need higher indexing throughput, consider setting the indexing buffer size to a value higher than the default size. It will allow Elasticsearch to flush the data to disk less often and create fewer segments. This will result in fewer merges, thus fewer I/O and CPU intensive operations. Because of that, Elasticsearch will be able to use more resources for indexing purposes.
When caches should be avoided
The usual question that may be asked by users is if they should really cache all their requests. The answer is obvious – caches are not the tool for everyone. Using caching is not free – it requires memory and additional operations to put the data to cache or get the data out of there.
What’s more, you should remember that Elasticsearch round robins queries between primary shards and replicas, so, if you have replicas, not every request after the first one will use the cache. Imagine that you have an index which has a single primary shard and two replicas. When the first request comes, it will hit a random shard copy, but the next request, even with the same query, will hit another copy, not the same one (unless routing is used). You should take this into consideration when using caches, because if your queries are not repeated, you may have them running longer because of a cache being used.
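The effect of rotating identical requests across shard copies can be simulated in a few lines of Python. This is a deliberately simplified model: it assumes strict rotation over one primary and two replicas, ignores cache invalidation on refresh, and uses the hypothetical function name simulate.

```python
from itertools import cycle

def simulate(copies, requests):
    """Count cache hits when identical requests rotate across shard copies."""
    warmed = set()          # copies that already hold the cached result
    hits = 0
    rotation = cycle(copies)
    for _ in range(requests):
        copy = next(rotation)
        if copy in warmed:
            hits += 1
        else:
            warmed.add(copy)  # first visit: compute the result and cache it
    return hits

copies = ["primary", "replica-1", "replica-2"]
print(simulate(copies, 3))  # every copy is visited for the first time
print(simulate(copies, 6))  # the second round is served from the caches
```

In this model a query has to be repeated at least as many times as there are shard copies before any request benefits from the cache.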
So to answer the question if you should use caching or not, we would advise taking your data, taking your queries, and running performance tests using tools such as JMeter (http://jmeter.apache.org). This will let you see how your cluster behaves with real data under a test load and see if the queries are actually faster with or without the caches.
The update settings API
Elasticsearch lets us tune itself by specifying the various parameters in the elasticsearch.yml file. But you should treat this file as the set of default values that can be changed in the runtime using the Elasticsearch REST API. We can change both the per-index setting and the cluster-wide settings. However, you should remember that not all properties can be dynamically changed. If you try to alter these parameters, Elasticsearch will respond with a proper error.
The cluster settings API
In order to set one of the cluster properties, we need to use the HTTP PUT method and send a proper request to the _cluster/settings URI. However, we have two options: adding the changes as transient or persistent.
The first one, transient, will set the property only until the first restart. In order to do this, we send the following command:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'
As you can see, in the preceding command, we used the object named transient and we added our property definition there. This means that the property will be valid only until the restart. If we want our property settings to persist between restarts, instead of using the object named transient, we use the one named persistent.
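The difference between the two variants comes down to the name of the top-level object in the request body. The following Python sketch builds both bodies; the helper name cluster_settings_body is hypothetical, and the 45% value for the request circuit breaker is only an example of a dynamically updatable property.

```python
import json

def cluster_settings_body(name, value, survive_restart=False):
    # "persistent" settings are kept across restarts; "transient" ones are not.
    scope = "persistent" if survive_restart else "transient"
    return json.dumps({scope: {name: value}})

print(cluster_settings_body("indices.breaker.request.limit", "45%"))
print(cluster_settings_body("indices.breaker.request.limit", "45%", survive_restart=True))
```

Either string can be sent as the body of the PUT request to the _cluster/settings URI.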
At any moment, you can fetch these settings using the following command:
curl -XGET localhost:9200/_cluster/settings
The indices settings API
To change the indices-related settings, Elasticsearch provides the /_settings endpoint for changing the parameters for all the indices and the /index_name/_settings endpoint for modifying the settings of a single index. When compared to the cluster-wide settings, all the changes done to indices using the API are always persistent and valid after Elasticsearch restarts. To change the settings for all the indices, we send the following command:
curl -XPUT 'localhost:9200/_settings' -d '{
  "index": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'
The current settings for all the indices can be listed using the following command:
curl -XGET localhost:9200/_settings
To set a property for a single index, we run the following command:
curl -XPUT 'localhost:9200/index_name/_settings' -d '{
  "index": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'
To get the settings for the library index, we run the following command:
curl -XGET localhost:9200/library/_settings
Summary
In the chapter we just finished, we learned a few very important things about Elasticsearch. First of all, we learned how we can configure the node discovery mechanism. In addition to that, we learned to control what happens after the cluster is initially formed using the recovery and gateway modules. We used dynamic and non-dynamic templates to handle our indices more easily, and we learned what type of caches Elasticsearch has and how to control them. Finally, we used the update settings API to update the various Elasticsearch configuration variables on an already live cluster.
In the next chapter, we will focus on cluster administration. We will start with learning how to back up our data and how to monitor the key cluster metrics. We’ll see the way to control cluster rebalancing and shard allocation, and we will use a human-friendly Cat API that allows us to get varied information about the cluster. Finally, we will learn about warming up our indices and aliasing.
Chapter 10. Administrating Your Cluster
In the previous chapter, we focused on Elasticsearch nodes and cluster configuration. We started by discussing the node discovery process, what it is and how to configure it. We’ve discussed gateway and recovery modules and tuned them to match our needs. We’ve used templates and dynamic templates to manage data structure easily and learned how to install plugins to extend the functionalities of Elasticsearch. Finally, we’ve learned about the caches of Elasticsearch and how to update indices and cluster settings using a dedicated API. By the end of this chapter, you will have learned the following topics:
Backing up your indices in Elasticsearch
Monitoring your clusters
Controlling shards and rebalancing replicas
Controlling shards and allocating replicas
Using CAT API to learn about cluster state
Warming up
Aliasing
Elasticsearch time machine
A good piece of software is one that can manage exceptional situations such as hardware failure or human error. Even though a cluster of a few servers is less dependent on hardware problems, bad things can still happen. For example, let’s imagine that you need to restore your indices. One possible solution is to reindex all your data from a primary data store such as a SQL database. But what will you do if it takes too long or, even worse, the only data store is Elasticsearch? Before Elasticsearch 1.0, creating backups of indices was not easy. The procedure included stopping indexing, flushing the data to disk, shutting down the cluster, and, finally, copying the data to a backup device.
Fortunately, now we can take snapshots and this section will guide you and show how this functionality works.
Creating a snapshot repository
A snapshot keeps all the data related to the cluster from the time the snapshot creation starts and it includes information about the cluster state and indices. Before we create snapshots, at least the first one, a snapshot repository must be created. Each repository is recognized by its name and should define the following aspects:
name: A unique name of the repository; we will need it later.
type: The type of the repository. The possible values are fs (a repository on a shared file system) and url (a read-only repository available via URL).
settings: Additional information needed depending on the repository type.
Now, let’s create a file system repository. Before this, we have to make sure that the directory for our backups fulfils two requirements. The first is related to security. Every repository has to be placed in the path defined in the Elasticsearch configuration file as path.repo. For example, our elasticsearch.yml includes a line similar to the following one:
path.repo: ["/tmp/es_backup_folder", "/tmp/backup/es"]
The second requirement says that every node in the cluster should be able to access the directory we set for the repository.
So now, let’s create a new file system repository by running the following command:
curl -XPUT localhost:9200/_snapshot/backup -d '{
  "type": "fs",
  "settings": {
    "location": "/tmp/es_backup_folder/cluster1"
  }
}'
The preceding command creates a repository named backup, which stores the backup files in the directory given by the location attribute. Elasticsearch responds with the following information:
{"acknowledged":true}
At the same time, es_backup_folder on the local file system is created, without any content yet.
Note
You can also set a relative path with the location parameter. In this case, Elasticsearch determines the absolute path by first getting the directory defined in path.repo.
As we said, the second repository type is url. It requires a url parameter instead of the location, which points to the address where the repository resides, for example, the HTTP address. As in the previous case, the address should be defined in the repositories.url.allowed_urls parameter in the Elasticsearch configuration. The parameter allows the use of wildcards in the address.
Note
Note that file:// addresses are checked against the paths defined in the path.repo parameter.
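The two repository types differ only in the type value and in the key used inside settings. As a minimal sketch, the Python helpers below produce both definitions; the function names are hypothetical and the example.com address is a placeholder, not a real repository.

```python
import json

def fs_repository(location):
    # A repository on a shared file system, identified by its location.
    return {"type": "fs", "settings": {"location": location}}

def url_repository(url):
    # A read-only repository reachable under the given address.
    return {"type": "url", "settings": {"url": url}}

print(json.dumps(fs_repository("/tmp/es_backup_folder/cluster1")))
print(json.dumps(url_repository("http://example.com/es_backups/")))
```

Each printed document can serve as the body of a PUT request to the _snapshot endpoint followed by the repository name.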
You can also store snapshots in Amazon S3, HDFS, or Azure using the additional plugins available. To learn about these, please visit the following pages:
https://github.com/elastic/elasticsearch-cloud-aws#s3-repository
https://github.com/elastic/elasticsearch-hadoop/tree/master/repository-hdfs
https://github.com/elastic/elasticsearch-cloud-azure#azure-repository
Now that we have our first repository, we can see its definition using the following command:
curl -XGET localhost:9200/_snapshot/backup?pretty
We can also check all the repositories by running a command like the following:
curl -XGET localhost:9200/_snapshot/_all?pretty
Or simply, we can use this:
curl -XGET localhost:9200/_snapshot/?pretty
If you want to delete a snapshot repository, the standard DELETE command helps:
curl -XDELETE localhost:9200/_snapshot/backup?pretty
Creating snapshots
By default, Elasticsearch takes all the indices and cluster settings (except the transient ones) when creating snapshots. You can create any number of snapshots and each will hold information available right from the time when the snapshot was created. The snapshots are created in a smart way; only new information is copied. This means that Elasticsearch knows which segments are already stored in the repository and doesn’t have to save them again.
To create a new snapshot, we need to choose a unique name and use the following command:
curl -XPUT 'localhost:9200/_snapshot/backup/bckp1'
The preceding command defines a new snapshot named bckp1 (you can only have one snapshot with a given name; Elasticsearch will check its uniqueness) and data is stored in the previously defined backup repository. The command returns an immediate response, which looks as follows:
{"accepted":true}
The preceding response means that the process of snapshotting has started and continues in the background. If you would like the response to be returned only when the actual snapshot is created, you can add the wait_for_completion=true parameter as shown in the following example:
curl -XPUT 'localhost:9200/_snapshot/backup/bckp2?wait_for_completion=true&pretty'
The response to the preceding command shows the status of a created snapshot:
{
"snapshot":{
"snapshot":"bckp2",
"version_id":2000099,
"version":"2.2.0",
"indices":["news"],
"state":"SUCCESS",
"start_time":"2016-01-07T21:21:43.740Z",
"start_time_in_millis":1446931303740,
"end_time":"2016-01-07T21:21:44.750Z",
"end_time_in_millis":1446931304750,
"duration_in_millis":1010,
"failures":[],
"shards":{
"total":5,
"failed":0,
"successful":5
}
}
}
As you can see, Elasticsearch presents information about the time taken by the snapshotting process, its status, and the indices affected.
Additional parameters
The snapshot command also accepts the following additional parameters:
indices: The names of the indices of which we want to take snapshots.
ignore_unavailable: When this is set to false (the default), Elasticsearch will return an error if any index listed using the indices parameter is missing. When set to true, Elasticsearch will just ignore the missing indices during backup.
include_global_state: When this is set to true (the default), the cluster state is also written to the snapshot (except for the transient settings).
partial: The snapshot operation success depends on the availability of all the shards. If any of the shards is not available, the snapshot operation will fail. Setting partial to true causes Elasticsearch to save only the available shards and omit the lost ones.
An example of using additional parameters can look as follows:
curl -XPUT 'localhost:9200/_snapshot/backup/bckp3?wait_for_completion=true&pretty' -d '{
  "indices": "b*",
  "include_global_state": "false"
}'
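To show how the URI parameter and the body parameters fit together, here is a small Python sketch that assembles the pieces of a snapshot creation call. The function name snapshot_request and its keyword arguments are illustrative choices; only the parameter names sent to Elasticsearch come from the list above.

```python
import json
from urllib.parse import urlencode

def snapshot_request(repository, snapshot, indices=None, partial=False,
                     include_global_state=True, wait=False):
    """Return (method, path, body) for a snapshot creation call."""
    path = "/_snapshot/%s/%s" % (repository, snapshot)
    if wait:
        # wait_for_completion is a URI parameter, not part of the body.
        path += "?" + urlencode({"wait_for_completion": "true"})
    body = {"include_global_state": include_global_state, "partial": partial}
    if indices is not None:
        body["indices"] = indices
    return "PUT", path, json.dumps(body)

method, path, body = snapshot_request("backup", "bckp3", indices="b*",
                                      include_global_state=False, wait=True)
print(method, path)
print(body)
```

Keeping the request construction separate from the HTTP call makes it easy to review what will be sent before a backup job is scheduled.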
Restoring a snapshot
Now that we have our snapshots done, we will also learn how to restore data from a given snapshot. As we said earlier, a snapshot can be addressed by its name. We can list all the snapshots using the following command:
curl -XGET 'localhost:9200/_snapshot/backup/_all?pretty'
The response returned by Elasticsearch to the preceding command shows the list of all available backups. Every list item is similar to the following:
{
"snapshot":{
"snapshot":"bckp2",
"version_id":2000099,
"version":"2.2.0",
"indices":["news"],
"state":"SUCCESS",
"start_time":"2016-01-07T21:21:43.740Z",
"start_time_in_millis":1446931303740,
"end_time":"2016-01-07T21:21:44.750Z",
"end_time_in_millis":1446931304750,
"duration_in_millis":1010,
"failures":[],
"shards":{
"total":5,
"failed":0,
"successful":5
}
}
}
The repository we created earlier is called backup. To restore a snapshot named bckp1 from our snapshot repository, run the following command:
curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore'
During the execution of this command, Elasticsearch takes the indices defined in the snapshot and creates them with the data from the snapshot. However, if the index already exists and is not closed, the command will fail. In this case, you may find it convenient to only restore certain indices, for example:
curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore?pretty' -d '{
  "indices": "c*"
}'
The preceding command restores only the indices that begin with the letter c. The other available parameters are as follows:
ignore_unavailable: This parameter, when set to false (the default behavior), will cause Elasticsearch to fail the restore process if any of the expected indices is not available.
include_global_state: This parameter, when set to true, will cause Elasticsearch to restore the global state included in the snapshot, which is also the default behavior.
rename_pattern: This parameter allows the renaming of the index during a restore operation. Thanks to this, the restored index will have a different name. The value of this parameter is a regular expression that defines the source index name. If a pattern matches the name of the index, name substitution will occur. In the pattern, you should use groups delimited by parentheses; they can be referenced in the rename_replacement parameter.
rename_replacement: This parameter, along with rename_pattern, defines the target index name. Using the dollar sign and number, you can recall the appropriate group from rename_pattern.
For example, due to rename_pattern=products_(.*), only the indices with names that begin with products_ will be renamed. The rest of the index name will be used during replacement. rename_pattern=products_(.*) together with rename_replacement=items_$1 causes the products_cars index to be restored to an index called items_cars.
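The substitution itself can be tried out locally before running a restore. The Python sketch below models only the name substitution for a single index name; the function restored_name is hypothetical, and because Python's re module writes group references differently from the $1 form used by Elasticsearch, the replacement string is translated first.

```python
import re

def restored_name(index, rename_pattern, rename_replacement):
    """Apply the rename substitution to one index name."""
    # Translate "$1"-style group references into Python's "\g<1>" form.
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, replacement, index)

print(restored_name("products_cars", "products_(.*)", "items_$1"))
```

Checking a pattern this way helps avoid restoring data under an unexpected index name.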
Cleaning up – deleting old snapshots
Elasticsearch leaves snapshot repository management up to you. Currently, there is no automatic clean-up process. But don’t worry; this is simple. For example, let’s remove our previously taken snapshot:
curl -XDELETE 'localhost:9200/_snapshot/backup/bckp1?pretty'
And that’s all. The command causes the snapshot named bckp1 from the backup repository to be deleted.
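Since clean-up is left to us, a simple retention rule can be expressed as a pure function over the snapshot listing. The following Python sketch is one possible approach, not a built-in feature: the function name snapshots_to_delete is hypothetical, the field names follow the snapshot listing shown earlier, and the timestamps are made-up values for illustration.

```python
def snapshots_to_delete(snapshots, keep):
    """Return names of all but the `keep` most recent successful snapshots."""
    done = [s for s in snapshots if s["state"] == "SUCCESS"]
    done.sort(key=lambda s: s["end_time_in_millis"], reverse=True)
    return [s["snapshot"] for s in done[keep:]]

# Made-up listing; real entries come from the _snapshot/backup/_all call.
listing = [
    {"snapshot": "bckp1", "state": "SUCCESS", "end_time_in_millis": 1000},
    {"snapshot": "bckp2", "state": "SUCCESS", "end_time_in_millis": 2000},
    {"snapshot": "bckp3", "state": "SUCCESS", "end_time_in_millis": 3000},
]
print(snapshots_to_delete(listing, keep=2))
```

Each returned name can then be passed to the DELETE command shown above.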
Monitoring your cluster’s state and health
Monitoring is essential when it comes to handling your cluster and ensuring it is in a healthy state. It allows administrators and developers to detect possible problems and prevent them before they occur or to act as soon as they start showing. In the worst case, monitoring allows us to do a post mortem analysis of what happened to the application, in this case our Elasticsearch cluster and each of the nodes.
Elasticsearch provides very detailed information that allows us to check and monitor our nodes or the cluster as a whole. This includes statistics and information about the servers, nodes, indices, and shards. Of course, we are also able to get information about the entire cluster state. Before we get into the details about the mentioned API, please remember that the API is complex and we are only describing the basics. We will try to show you where to start so you’ll be able to know what to look for when you need very detailed information.
Cluster health API
One of the most basic APIs is the cluster health API, which allows us to get information about the entire cluster state with a single HTTP command. For example, let’s run the following command:
curl -XGET 'localhost:9200/_cluster/health?pretty'
A sample response returned by Elasticsearch for the preceding command looks as follows:
{
"cluster_name":"elasticsearch",
"status":"yellow",
"timed_out":false,
"number_of_nodes":1,
"number_of_data_nodes":1,
"active_primary_shards":11,
"active_shards":11,
"relocating_shards":0,
"initializing_shards":0,
"unassigned_shards":11,
"delayed_unassigned_shards":0,
"number_of_pending_tasks":0,
"number_of_in_flight_fetch":0,
"task_max_waiting_in_queue_millis":0,
"active_shards_percent_as_number":50.0
}
The most important information is about the status of the cluster. In our example, we see that the cluster is in yellow status. This means that all the primary shards have been allocated properly, but the replicas were not (because of a single node in the cluster, but that doesn’t matter for now).
Of course, apart from the cluster name and status, we can see whether the request timed out, how many nodes there are, how many data nodes, primary shards, initializing shards, unassigned ones, and so on.
Let’s stop here and talk about the cluster and when the cluster, as a whole, is fully operational. A cluster is fully operational when Elasticsearch is able to allocate all the shards and replicas according to the configuration. This is when the cluster is in the green state. The yellow state means that we are ready to handle requests because the primary shards are allocated, but some (or all) replicas are not. The last state, the red one, means that at least one primary shard was not allocated and because of this, the cluster is not ready yet. That means that the queries may return errors or incomplete results.
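The three states can be summarized as a short decision rule. The Python sketch below is a simplified model written for illustration: the function names are hypothetical, and the percentage calculation ignores initializing and relocating shards.

```python
def cluster_status(unassigned_primaries, unassigned_replicas):
    if unassigned_primaries > 0:
        return "red"      # at least one primary shard is not allocated
    if unassigned_replicas > 0:
        return "yellow"   # all primaries allocated, some replicas are not
    return "green"        # every shard and replica is allocated

def active_percent(active_shards, unassigned_shards):
    return 100.0 * active_shards / (active_shards + unassigned_shards)

# Values taken from the sample health response above.
print(cluster_status(0, 11))
print(active_percent(11, 11))
```

With 11 active primaries and 11 unassigned replicas, the model reproduces the yellow status and the 50.0 value of active_shards_percent_as_number seen in the sample response.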
The preceding command can also be executed to check the health state of certain indices. For example, if we would like to check the health of the library and map indices, we would run the following command:
curl -XGET 'localhost:9200/_cluster/health/library,map/?pretty'
Controlling information details
Elasticsearch allows us to specify a special level parameter, which can take the value of cluster (default), indices, or shards. This allows us to control the details of information returned by the health API. We’ve already seen the default behavior. When setting the level parameter to indices, apart from the cluster information, we will also get per-index health. Setting the mentioned parameter to shards tells Elasticsearch to return per-shard information in addition to what we’ve seen in the example.
Additional parameters
In addition to the level parameter, we have a few additional parameters that can control the behavior of the health API.
The first of the mentioned parameters is timeout and allows us to control how long, at the most, the command execution will wait when one of the following parameters is used: wait_for_status, wait_for_nodes, wait_for_relocating_shards, and wait_for_active_shards. By default, it is set to 30s and means that the health command will wait 30 seconds maximum and return the response by then.
The wait_for_status parameter allows us to tell Elasticsearch which health status the cluster should be at to return the command. It can take the values of green, yellow, and red. For example, when set to green, the health API call will not return until the green status is reached or the timeout expires.
The wait_for_nodes parameter allows us to set the required number of nodes available to return the health command response (or until a defined timeout is reached). It can be set to an integer number like 3 or to a simple expression like >=3 (meaning greater than or equal to three nodes) or <=3 (meaning less than or equal to three nodes).
The wait_for_active_shards parameter means that Elasticsearch will wait for a specified number of active shards to be present before returning the response.
The last parameter is wait_for_relocating_shards, which is by default not specified. It allows us to tell Elasticsearch how many relocating shards it should wait for (or until the timeout is reached). Setting this parameter to 0 means that Elasticsearch should wait until all the relocating shards have finished relocating.
Anexampleusageofthehealthcommandwithsomeofthementionedparametersisasfollows:
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&wait_for_nodes=>=3&timeout=100s'
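When such a request is built from a program rather than typed by hand, it is safer to URL-encode the parameter values, because characters such as > and = in >=3 have a special meaning in query strings. The following is a minimal sketch using only the Python standard library; the health_url helper and its arguments are hypothetical names chosen for this illustration.

```python
from urllib.parse import urlencode


def health_url(host="localhost:9200", indices=(), **params):
    """Build a cluster health URL (hypothetical helper, not part of any client)."""
    path = "/_cluster/health"
    if indices:
        # Optional comma-separated index names, as in /_cluster/health/library,map
        path += "/" + ",".join(indices)
    query = urlencode(params)  # percent-encodes values such as '>=3'
    return "http://" + host + path + ("?" + query if query else "")


url = health_url(wait_for_status="green", wait_for_nodes=">=3", timeout="100s")
print(url)
```

The resulting string can be passed to curl or to any HTTP client.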
Indices stats API
An Elasticsearch index is the place where our data lives and it is a crucial part of most deployments. With the use of the indices stats API, available using the _stats endpoint, we can get a lot of information about the indices living inside our cluster. Of course, as with most of the APIs in Elasticsearch, we can send a command to get the information about all the indices (using the pure _stats endpoint), about one particular index (for example library/_stats), or about several indices at the same time (for example library,map/_stats). For example, to check the statistics for the map and library indices we've used in the book, we could run the following command:
curl -XGET 'localhost:9200/library,map/_stats?pretty'
The response to the preceding command has more than 700 lines, so we only describe its structure, omitting the response itself. Apart from the information about the response status and the response time, we can see three objects named primaries, total (in the _all object), and indices. The indices object contains information about the library and map indices. The primaries object contains information about the primary shards only, and the total object contains information about all the shards including replicas. All these objects can contain objects describing a particular statistic such as the following: docs, store, indexing, get, search, merges, refresh, flush, warmer, query_cache, fielddata, percolate, completion, segments, translog, suggest, request_cache, and recovery.
We can limit the amount of information that we get from the indices stats API by providing the type of data we are interested in, using the names of the statistics mentioned previously. For example, if we want to get information about indexing and searching, we can run the following command:
curl -XGET 'localhost:9200/library,map/_stats/indexing,search?pretty'
Let's discuss the information stored in those objects.
Docs
The docs section of the response shows information about indexed documents. For example, it could look as follows:
"docs":{
"count":4,
"deleted":0
}
The main information is the count, indicating the number of documents in the described index. When we delete documents from the index, Elasticsearch doesn't remove these documents immediately and only marks them as deleted. Documents are physically deleted during the segment merge process. The number of documents marked as deleted is presented by the deleted attribute and should be 0 right after the merge.
Store
The next statistic, the store one, provides information regarding storage. For example, such a section could look as follows:
"store":{
"size_in_bytes":6003,
"throttle_time_in_millis":0
}
The main information is about the index (or indices) size. We can also look at throttling statistics. This information is useful when the system has problems with I/O performance and has configured limits on internal operations during segment merging.
Indexing, get, and search
The indexing, get, and search sections of the response provide information about data manipulation: indexing (with delete operations), real-time get, and searching. Let's look at the following example returned by Elasticsearch:
"indexing":{
"index_total":0,
"index_time_in_millis":0,
"index_current":0,
"delete_total":0,
"delete_time_in_millis":0,
"delete_current":0,
"noop_update_total":0,
"is_throttled":false,
"throttle_time_in_millis":0
},
"get":{
"total":0,
"time_in_millis":0,
"exists_total":0,
"exists_time_in_millis":0,
"missing_total":0,
"missing_time_in_millis":0,
"current":0
},
"search":{
"open_contexts":0,
"query_total":0,
"query_time_in_millis":0,
"query_current":0,
"fetch_total":0,
"fetch_time_in_millis":0,
"fetch_current":0,
"scroll_total":0,
"scroll_time_in_millis":0,
"scroll_current":0
}
As you can see, all of these statistics have similar structures. We can read the total time spent in various request types (in milliseconds) and the number of requests (which, together with the total time, allows us to calculate the average time of a single request). In the case of get requests, valuable information is how many fetches were unsuccessful (missing documents); an indexing request has information about throttling, and search includes information regarding scrolling.
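The average time of a single request mentioned above is simply the total time divided by the number of requests. A minimal sketch follows; the helper names are hypothetical, and the numbers in the sample are invented for illustration (the response shown earlier contains only zeros, which is why the zero-request case has to be handled).

```python
def average_millis(total_time_in_millis, total_count):
    """Average time per request, or None when no request was recorded."""
    if total_count == 0:
        return None  # avoid dividing by zero on an idle index
    return total_time_in_millis / total_count


def search_averages(search_stats):
    """Average query and fetch times computed from a 'search' stats section."""
    return {
        "query": average_millis(search_stats["query_time_in_millis"],
                                search_stats["query_total"]),
        "fetch": average_millis(search_stats["fetch_time_in_millis"],
                                search_stats["fetch_total"]),
    }


# Invented sample values using the field names from the response above.
sample = {"query_total": 40, "query_time_in_millis": 120,
          "fetch_total": 0, "fetch_time_in_millis": 0}
print(search_averages(sample))
```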
Additional information
In addition to the previously described sections, Elasticsearch provides the following information:
merges: This section contains information about Lucene segment merges
refresh: This section contains information about the refresh operation
flush: This section contains information about flushes
warmer: This section contains information about warmers and for how long they were executed
query_cache: This contains query cache statistics
fielddata: This contains field data cache statistics
percolate: This section contains information about the percolator usage
completion: This section contains information about the completion suggester
segments: This section contains information about Lucene segments
translog: This section contains information about the transaction logs count and size
suggest: This section contains suggester-related statistics
request_cache: This contains shard request cache statistics
recovery: This contains shard recovery information
Nodes info API
The nodes info API provides us with information about the nodes in the cluster. To get information from this API, we need to send the request to the _nodes REST endpoint. The simplest command to retrieve node-related information from Elasticsearch would be as follows:
curl -XGET 'localhost:9200/_nodes?pretty'
This API can be used to fetch information about particular nodes or a single node using the following:
Node name: If we would like to get information about the node named Pulse, we could run a command to the following REST endpoint: _nodes/Pulse
Node identifier: If we would like to get information about the node with an identifier equal to ny4hftjNQtuKMyEvpUdQWg, we could run a command to the following REST endpoint: _nodes/ny4hftjNQtuKMyEvpUdQWg
IP address: We can use IP addresses to get information about the nodes. For example, if we would like to get information about the node with an IP address equal to 192.168.1.103, we could run a command to the following REST endpoint: _nodes/192.168.1.103
Parameters from the Elasticsearch configuration: If we would like to get information about all the nodes with the node.rack property set to 2, we could run a command to the following REST endpoint: /_nodes/rack:2
This API also allows us to get information about several nodes at once using these:
Patterns, for example: _nodes/192.168.1.* or _nodes/P*
Nodes enumeration, for example: _nodes/Pulse,Slab
Both patterns and enumerations, for example: /_nodes/P*,S*
Returned information
By default, the nodes API will return extensive information about each node along with the name, identifier, and addresses. This extensive information includes the following:
settings: The Elasticsearch configuration
os: Information about the server such as processor, RAM, and swap space
process: Process identifier and refresh interval
jvm: Information about the Java Virtual Machine such as memory limits, memory pools, and garbage collectors
thread_pool: The configuration of thread pools for various operations
transport: Listening addresses for the transport protocol
http: Information about listening addresses for an HTTP-based API
plugins: Information about the plugins installed by the user
modules: Information about the built-in plugins
An example usage of this API can be illustrated by the following command:
curl 'localhost:9200/_nodes/Pulse/os,jvm,plugins'
The preceding command will return the basic information about the node named Pulse and, in addition to this, it will include the operating system information, Java virtual machine information, and plugin-related information.
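The way node selectors and information sections combine into a request path can be sketched with a small helper. The function name is hypothetical; when sections are requested for all nodes, the sketch assumes the _all selector is used in place of a node list.

```python
def nodes_info_path(nodes=(), sections=()):
    """Compose a nodes info path from node selectors and section names."""
    parts = ["_nodes"]
    if nodes:
        # Names, identifiers, IP addresses, or patterns, separated by commas
        parts.append(",".join(nodes))
    elif sections:
        parts.append("_all")  # assumption: '_all' selects every node
    if sections:
        parts.append(",".join(sections))
    return "/" + "/".join(parts)


print(nodes_info_path(["Pulse"], ["os", "jvm", "plugins"]))
print(nodes_info_path(["P*", "S*"]))
```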
Nodes stats API
The nodes stats API is similar to the nodes info API described in the preceding section. The main difference is that the previous API provided information about the environment in which the node is running, while the one we are currently discussing tells us about what happened with the cluster during its work. To use the nodes stats API, you need to send a command to the /_nodes/stats REST endpoint. However, similar to the nodes info API, we can also retrieve information about specific nodes (for example: _nodes/Pulse/stats).
The simplest command to retrieve node statistics from Elasticsearch would be as follows:
curl -XGET 'localhost:9200/_nodes/stats?pretty'
By default, Elasticsearch returns all the available statistics, but we can limit them to the ones we are interested in. The available options are as follows:
indices: Information about the indices including size, document count, indexing-related statistics, search and get time, caches, segment merges, and so on
os: Operating system related information such as free disk space, memory, swap usage, and so on
process: Memory, CPU, and file handler usage related to the Elasticsearch process
jvm: Java virtual machine memory and garbage collector statistics
transport: Information about data sent and received by the transport module
http: Information about HTTP connections
fs: Information about available disk space and I/O operations statistics
thread_pool: Information about the state of the threads assigned to various operations
breakers: Information about circuit breakers
script: Scripting engine related information
An example usage of this API can be illustrated by the following command:
curl 'localhost:9200/_nodes/Pulse/stats/os,jvm,breakers'
Cluster state API
Another API provided by Elasticsearch is the cluster state API. As its name suggests, it allows us to get information about the entire cluster (we can also limit the returned information to a local node by adding the local=true parameter to the request). The basic command used to get all the information returned by this API looks as follows:
curl -XGET 'localhost:9200/_cluster/state?pretty'
We can also limit the provided information to the given metrics in comma-separated form, specified after the _cluster/state part of the REST call. For example:
curl -XGET 'localhost:9200/_cluster/state/version,nodes?pretty'
We can also limit the information to the given metrics and indices. For example, if we would like to get the metadata for the library index, we could run the following command:
curl -XGET 'localhost:9200/_cluster/state/metadata/library?pretty'
The following metrics are allowed to be used:
version: This returns information about the cluster state version.
master_node: This returns information about the elected master node.
nodes: This returns nodes information.
routing_table: This returns routing-related information.
metadata: This returns metadata-related information. When retrieving the metadata metric, we can also include an additional parameter such as index_templates=true, which will result in including the defined index templates.
blocks: This returns the blocks part of the response.
Cluster stats API
The cluster stats API allows us to get statistics about the indices and nodes from the cluster-wide perspective. To use this API, we need to run a GET request to the /_cluster/stats REST endpoint, for example:
curl -XGET 'localhost:9200/_cluster/stats?pretty'
The response size depends on the number of shards, indices, and nodes in the cluster. It will include basic indices information such as shards, their state, recovery information, caches information, and node-related information.
Pending tasks API
The pending tasks API is one of the APIs that help us see what Elasticsearch is doing; it allows us to check which tasks are waiting to be executed. To retrieve this information, we need to send a request to the /_cluster/pending_tasks REST endpoint. In the response, we will see an array of tasks with information about them, such as task priority and time in queue.
Indices recovery API
The recovery API gives us insight into the recovery status of the shards that are building indices in our cluster (learn more about recovery in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail).
The simplest command that would return the information about the recovery of all the shards in the cluster would look as follows:
curl -XGET 'http://localhost:9200/_recovery?pretty'
We can also get information about recovery for particular indices, such as the library index for example:
curl -XGET 'http://localhost:9200/library/_recovery?pretty'
The response returned by Elasticsearch is divided by indices and shards. A response for a single shard could look as follows:
{
"id":2,
"type":"STORE",
"stage":"DONE",
"primary":true,
"start_time_in_millis":1446132761730,
"stop_time_in_millis":1446132761734,
"total_time_in_millis":4,
"source":{
"id":"DboTibRlT1KJSQYnDPxwZQ",
"host":"127.0.0.1",
"transport_address":"127.0.0.1:9300",
"ip":"127.0.0.1",
"name":"Plague"
},
"target":{
"id":"DboTibRlT1KJSQYnDPxwZQ",
"host":"127.0.0.1",
"transport_address":"127.0.0.1:9300",
"ip":"127.0.0.1",
"name":"Plague"
},
"index":{
"size":{
"total_in_bytes":156,
"reused_in_bytes":156,
"recovered_in_bytes":0,
"percent":"100.0%"
},
"files":{
"total":1,
"reused":1,
"recovered":0,
"percent":"100.0%"
},
"total_time_in_millis":0,
"source_throttle_time_in_millis":0,
"target_throttle_time_in_millis":0
},
"translog":{
"recovered":0,
"total":-1,
"percent":"-1.0%",
"total_on_start":-1,
"total_time_in_millis":3
},
"verify_index":{
"check_index_time_in_millis":0,
"total_time_in_millis":0
}
}
In the preceding response, we can see information about the shard identifier, the stage of recovery, information whether the shard is a primary or a replica, the timestamps of the start and end of recovery, and the total time the recovery process took. We can see the source node, target node, and information about the shard's physical statistics, such as size, number of files, transaction log-related statistics, and index verification time.
It is worth knowing the information about the stages of recovery and types. When it comes to the types of recovery (the type attribute in the response), we can expect the following: the STORE, SNAPSHOT, REPLICA, and RELOCATING values. When it comes to the stage of recovery (the stage attribute in the response), we can expect values such as INIT (recovery has not started), INDEX (Elasticsearch copies metadata information and data from source to destination), START (Elasticsearch is opening the shard for use), FINALIZE (final stage, which cleans up garbage), and DONE (recovery has ended).
We can limit the response returned by the indices recovery API to only the shards that are currently in active recovery by including the active_only=true parameter in the request. Finally, we can request more detailed information by adding the detailed=true parameter in the API call.
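A parsed recovery response can also be filtered on the client side using the stage attribute. The sketch below assumes the full response is an object keyed by index name, each holding a shards array of entries like the one shown above; that wrapper layout is an assumption made for this example, and the sample data is invented.

```python
def shards_in_recovery(recovery_response):
    """Return (index, shard id, type, stage) for shards whose stage is not DONE."""
    active = []
    for index_name, index_info in recovery_response.items():
        for shard in index_info.get("shards", []):
            if shard["stage"] != "DONE":
                active.append((index_name, shard["id"],
                               shard["type"], shard["stage"]))
    return active


# Invented sample: one finished STORE recovery and one replica still copying.
sample = {
    "library": {
        "shards": [
            {"id": 2, "type": "STORE", "stage": "DONE"},
            {"id": 0, "type": "REPLICA", "stage": "INDEX"},
        ]
    }
}
print(shards_in_recovery(sample))
```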
Indices shard stores API
The indices shard stores API gives us information about the store for the shards of our indices. We use this API by running a simple command to the /_shard_stores REST endpoint, optionally providing comma-separated index names.
For example, to get information about all the indices, we would run the following command:
curl -XGET 'http://localhost:9200/_shard_stores?pretty'
We can also get information about particular indices, such as the library and map ones:
curl -XGET 'http://localhost:9200/library,map/_shard_stores?pretty'
The response returned by Elasticsearch contains information about the store for each shard. For example, this is what Elasticsearch returned for one of the shards of the library index:
"0":{
"stores":[{
"DboTibRlT1KJSQYnDPxwZQ":{
"name":"Plague",
"transport_address":"127.0.0.1:9300",
"attributes":{}
},
"version":6,
"allocation":"primary"
}]
}
We can see information about the node in the stores array. Each entry contains node-related information (the node where the shard is physically located), the version of the store copy, and the allocation, which can take the values of primary (for primary shards), replica (for replicas), and unused (for unassigned shards).
Indices segments API
The last API we want to mention is the Lucene segments API, which is available through the /_segments endpoint. We can run it for the entire cluster, for example like this:
curl -XGET 'localhost:9200/_segments?pretty'
We can also run the command for individual indices. For example, if we would like to get segment-related information for the map and library indices, we would use the following command:
curl -XGET 'localhost:9200/library,map/_segments?pretty'
This API provides information about shards, their placements, and information about segments connected with the physical index managed by the Apache Lucene library.
Controlling the shard and replica allocation
The indices that live inside your Elasticsearch cluster can be built from many shards and each shard can have many replicas. The ability to divide a single index into multiple shards gives us the possibility of dividing the data into multiple physical instances. The reasons why we want to do this may be different. We may want to parallelize indexing to get more throughput, or we may want to have smaller shards so that our queries are faster. Of course, we may have too many documents to fit them on a single machine and we may want to shard the index because of this. With replicas, we can parallelize the query load by having multiple physical copies of each shard. We can say that, using shards and replicas, we can scale out Elasticsearch. However, Elasticsearch has to figure out where in the cluster it should place shards and replicas. It needs to figure out on which servers (nodes) each shard or replica should be placed.
Explicitly controlling allocation
One of the most common use cases for explicitly controlling shard and replica allocation in Elasticsearch is time-based data, that is, logs. Each log event has a timestamp associated with it; however, the amount of logs in most organizations is just enormous. The thing is that you need a lot of processing power to index them, but you don't usually search historical data. Of course, you may want to do that, but it will be done less frequently than the queries for the most recent data.
Because of this, we can divide the cluster into two so-called tiers: the cold and the hot tier. The hot tier contains more powerful nodes, ones that have very fast disks, lots of CPU processing power, and memory. These nodes will handle both a lot of indexing as well as queries for recent data. The cold tier, on the other hand, will contain nodes that have very large disks, but are not very fast. We won't be indexing into the cold tier; we will only store our historical indices here and search them from time to time. With the default Elasticsearch behavior, we can't be sure where the shards and replicas will be placed, but luckily Elasticsearch allows us to control this.
Note
The main assumption when it comes to time series data is that once they are indexed, they are not updated. This is true for log indexing use cases and we assume we create an Elasticsearch deployment for such a use case.
The idea is to create the index that holds today's data on the hot nodes and, when we stop using it (when another day starts), to update the index settings so that it is moved to the tier called cold. Let's now see how we can do this.
Specifying node parameters
So let's divide our cluster into two tiers. We say tiers, but they can be given any name you want; we just like the term "tier" and it is commonly used. We assume that we have six nodes. We want our more powerful nodes numbered 1 and 2 to be placed in the tier called hot and the nodes numbered 3, 4, 5, and 6, which are smaller in terms of CPU and memory, but very large in terms of disk space, to be placed in a tier called cold.
Configuration
To configure, we add the following property to the elasticsearch.yml configuration file on nodes 1 and 2 (the ones that are more powerful):
node.tier: hot
Of course, we will add a similar property to the elasticsearch.yml configuration file on nodes 3, 4, 5, and 6 (the less powerful ones):
node.tier: cold
Index creation
Now let's create our daily index for today's data, one called logs_2015-12-10. As we said earlier, we want this to be placed on the nodes in the hot tier. We do this by running the following command:
curl -XPUT 'http://localhost:9200/logs_2015-12-10' -d '{
"settings":{
"index":{
"routing.allocation.include.tier":"hot"
}
}
}'
The preceding command will result in the creation of the logs_2015-12-10 index and specification of the index.routing.allocation.include.tier property for it. We set this property to the hot value, which means that we want to place the logs_2015-12-10 index on the nodes that have the node.tier property set to hot.
Now, when the day ends and we need to create a new index, we again put it on the hot nodes. We do this by running the following command:
curl -XPUT 'http://localhost:9200/logs_2015-12-11' -d '{
"settings":{
"index":{
"routing.allocation.include.tier":"hot"
}
}
}'
Finally, we need to tell Elasticsearch to move the index holding the data for the previous day to the cold tier. We do this by updating the index settings and setting the index.routing.allocation.include.tier property to cold. This is done using the following command:
curl -XPUT 'http://localhost:9200/logs_2015-12-10/_settings' -d '{
"index.routing.allocation.include.tier":"cold"
}'
After running the preceding command, Elasticsearch will start relocating the index called logs_2015-12-10 to the nodes that have the node.tier property set to cold in the elasticsearch.yml file without any manual work needed from us.
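The daily routine described above (create today's index on the hot tier, then move yesterday's index to the cold tier) is easy to script. The following sketch only prepares the two requests and does not send them; the function names are hypothetical, while the index naming scheme and the settings are the ones used in this section.

```python
import json
from datetime import date, timedelta


def index_name(day):
    """Daily index name in the logs_YYYY-MM-DD form used in this section."""
    return "logs_" + day.isoformat()


def daily_rollover(today):
    """Return (method, path, body) tuples for the two daily requests."""
    yesterday = today - timedelta(days=1)
    create_body = {"settings": {"index": {
        "routing.allocation.include.tier": "hot"}}}
    move_body = {"index.routing.allocation.include.tier": "cold"}
    return [
        # 1. Create today's index on the hot nodes.
        ("PUT", "/" + index_name(today), json.dumps(create_body)),
        # 2. Move yesterday's index to the cold nodes.
        ("PUT", "/" + index_name(yesterday) + "/_settings",
         json.dumps(move_body)),
    ]


for method, path, body in daily_rollover(date(2015, 12, 11)):
    print(method, path, body)
```

Each tuple maps directly onto one of the curl commands shown earlier, so the output can be fed to any HTTP client or to a scheduled job.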
Excluding nodes from allocation
In the same manner as we specified on which nodes the index should be placed, we can also exclude nodes from index allocation. Referring to the previously shown example, if we want the index called logs_2015-12-10 to not be placed on the nodes with the node.tier property set to cold, we would run the following command:
curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
"index.routing.allocation.exclude.tier":"cold"
}'
Notice that instead of the index.routing.allocation.include.tier property, we've used the index.routing.allocation.exclude.tier property.
Requiring node attributes
In addition to inclusion and exclusion rules, we can also specify rules that must all match in order for a shard to be allocated to a given node. The difference is that when using the index.routing.allocation.include property, the index will be placed on any node that matches at least one of the provided property values. Using index.routing.allocation.require, Elasticsearch will place the index only on a node that has all the defined values. For example, let's assume that we've set the following settings for the logs_2015-12-10 index:
curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
"index.routing.allocation.require.tier":"hot",
"index.routing.allocation.require.disk_type":"ssd"
}'
After running the preceding command, Elasticsearch would only place the shards of the logs_2015-12-10 index on a node with the node.tier property set to hot and the node.disk_type property set to ssd.
Using the IP address for shard allocation
Instead of adding a special parameter to the nodes configuration, we are allowed to use IP addresses to specify which nodes we want to include or exclude from the shards and replicas allocation. In order to do this, instead of using the tier part of the index.routing.allocation.include.tier or index.routing.allocation.exclude.tier properties, we should use _ip. For example, if we would like our logs_2015-12-10 index to be placed only on the nodes with the 10.1.2.10 and 10.1.2.11 IP addresses, we would run the following command:
curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
"index.routing.allocation.include._ip":"10.1.2.10,10.1.2.11"
}'
Note
In addition to _ip, Elasticsearch also allows us to use _name to specify allocation rules using node names and _host to specify allocation rules using host names.
Disk-based shard allocation
In addition to the already described allocation filtering methods, Elasticsearch gives us disk-based shard allocation rules. It allows us to set allocation rules based on the nodes' disk usage.
Configuring disk-based shard allocation
There are four properties that control the behavior of disk-based shard allocation. All of them can be updated dynamically or set in the elasticsearch.yml configuration file.
The first of these is cluster.info.update.interval, which is by default set to 30 seconds and defines how often Elasticsearch updates information about disk usage on nodes.
The second property is cluster.routing.allocation.disk.watermark.low, which is by default set to 0.85. This means that Elasticsearch will not allocate new shards to a node that uses more than 85% of its disk space.
The third property is cluster.routing.allocation.disk.watermark.high, which controls when Elasticsearch will start relocating shards from a given node. It defaults to 0.90 and means that Elasticsearch will start relocating shards when the disk usage on a given node is equal to or more than 90%.
Both the cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high properties can be set to a percentage value (such as 0.60, meaning 60%) or to an absolute value (such as 600mb, meaning 600 megabytes).
Finally, the last property is cluster.routing.allocation.disk.include_relocations, which by default is set to true. It tells Elasticsearch to take into account the shards that are not yet copied to the node but that Elasticsearch is in the process of copying. Having this behavior turned on by default means that the disk-based allocation mechanism will be more pessimistic when it comes to available disk space (when shards are relocating), but we won't run into situations where shards can't be relocated because the assumptions about disk space were wrong.
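The effect of the two watermarks can be modeled with a short function. This is a simplified illustration of the rules as stated in the text, handling only percentage-style values given as fractions; the function name and the returned labels are made up for this sketch.

```python
def disk_decision(used_fraction, low=0.85, high=0.90):
    """Classify a node by disk usage using the default watermarks."""
    if used_fraction >= high:
        return "relocate shards away"   # high watermark reached
    if used_fraction > low:
        return "no new shards"          # above the low watermark
    return "may receive shards"


for usage in (0.50, 0.87, 0.93):
    print(usage, disk_decision(usage))
```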
Disabling disk-based shard allocation
Disk-based shard allocation is enabled by default. We can disable it by specifying the cluster.routing.allocation.disk.threshold_enabled property and setting it to false. We can do this in the elasticsearch.yml file or dynamically using the cluster settings API:
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient":{
"cluster.routing.allocation.disk.threshold_enabled":false
}
}'
The number of shards and replicas per node
In addition to specifying shards and replicas allocation, we are also allowed to specify the maximum number of shards that can be placed on a single node for a single index. For example, if we would like our logs_2015-12-10 index to have only a single shard per node, we would run the following command:
curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
"index.routing.allocation.total_shards_per_node":1
}'
This property can be placed in the elasticsearch.yml file or can be updated on live indices using the preceding command. Please remember that your cluster can stay in the red state if Elasticsearch is not able to allocate all the primary shards.
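Before applying such a limit, it is worth checking whether the cluster has enough room for every shard copy of the index. The following back-of-the-envelope sketch counts only the per-node limit and ignores every other allocation rule; the function and parameter names are hypothetical.

```python
def unallocated_copies(number_of_shards, number_of_replicas,
                       data_nodes, total_shards_per_node):
    """Number of shard copies that cannot be placed under the per-node limit."""
    copies = number_of_shards * (1 + number_of_replicas)  # primaries + replicas
    capacity = data_nodes * total_shards_per_node
    return max(0, copies - capacity)


# 5 primaries with 1 replica each on 6 nodes, limited to 1 shard per node:
print(unallocated_copies(5, 1, data_nodes=6, total_shards_per_node=1))
```

A result greater than zero means that some copies would stay unassigned, so the cluster could not reach the green state for that index.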
Allocation throttling
The Elasticsearch allocation mechanism can be throttled, which means that we can control how many resources Elasticsearch will use during the shard allocation and recovery process. We are given five properties to control this, which are as follows:
cluster.routing.allocation.node_concurrent_recoveries: This property defines how many concurrent shard recoveries may be happening at the same time on a node. This defaults to 2 and should be increased if you would like more shards to be recovered at the same time on a single node. However, increasing this value will result in more resource consumption during recovery. Also, please remember that during the replica recovery process, data will be copied from the other nodes over the network, which can be slow.
cluster.routing.allocation.node_initial_primaries_recoveries: This property defaults to 4 and defines how many primary shards are recovered at the same time on a given node. Because primary shard recovery uses data from local disks, this process should be very fast.
cluster.routing.allocation.same_shard.host: A Boolean property that defaults to false and is applicable only when multiple Elasticsearch nodes are started on the same machine. When set to true, this will force Elasticsearch to check, based on host name and host address, whether copies of the same shard would end up on a single physical machine, and to prevent such an allocation. The default false value means no check is done.
indices.recovery.concurrent_streams: This is the number of network streams used to copy data from other nodes that can be used concurrently on a single node. The more streams there are, the faster the data will be copied, but this will result in more resource consumption. This property defaults to 3.
indices.recovery.concurrent_small_file_streams: This is similar to the indices.recovery.concurrent_streams property, but defines how many concurrent data streams Elasticsearch will use to copy small files (ones that are under 5mb in size). This property defaults to 2.
Cluster-wide allocation
In addition to the per-index allocation settings, Elasticsearch also allows us to control shard and index allocation on a cluster-wide basis, using so-called shard allocation awareness. This is especially useful when we have nodes in different physical racks and we would like to place shards and replicas on different physical nodes.
Let's start with a simple example. We assume that we have a cluster built of four nodes. Each node is in a different physical rack. The simple graphic that illustrates this is as follows:
As you can see, our cluster is built from four nodes. Each node was bound to a specific IP address and each node was given the tag property and a group property (added to elasticsearch.yml as the node.tag and node.group properties). This cluster will serve the purpose of showing how shard allocation filtering works. The group and tag properties can be given whatever names you want; you just need to prefix your desired property name with node. For example, if you would like to use a party property name, you would just add node.party: party1 to your elasticsearch.yml.
Allocation awareness
Allocation awareness allows us to configure shards and their replicas allocation with the use of generic parameters. In order to illustrate how allocation awareness works, we will use our example cluster. For the example to work, we should add the following property to the elasticsearch.yml file:
cluster.routing.allocation.awareness.attributes: group
This will tell Elasticsearch to use the node.group property as the awareness parameter.
Note
You can specify multiple attributes when setting the cluster.routing.allocation.awareness.attributes property. For example:
cluster.routing.allocation.awareness.attributes: group,node
After this, let's start the first two nodes, the ones with the node.group parameter equal to groupA, and let's create an index by running the following command:
curl -XPOST 'localhost:9200/awarness' -d '{
"settings":{
"index":{
"number_of_shards":1,"number_of_replicas":1
}
}
}'
After this command, our two-node cluster will look more or less like this:
As you can see, the index was divided between the two nodes evenly. Now let's see what happens when we launch the rest of the nodes (the ones with node.group set to groupB):
Notice the difference: the primary shards were not moved from their original allocation nodes, but the replica shards were moved to the nodes with a different node.group value. That's exactly right; when using shard allocation awareness, Elasticsearch won't allocate the primary shards and replicas of the same index to the nodes with the same value of the property used to determine the allocation awareness (which in our case is node.group).
Note
Please remember that when using allocation awareness, shards will not be allocated to the node that doesn't have the expected attributes set. So in our example, a node without the node.group property set will not be taken into consideration by the allocation mechanism.
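The rule described above (copies of the same shard should not share the same value of the awareness attribute) can be checked against a shard layout with a few lines of code. This is a toy model for illustration only; the node names node1 to node4 and the layout data are invented, and the function is not part of Elasticsearch.

```python
from collections import defaultdict


def awareness_violations(copies, node_groups):
    """Return shard numbers that have two copies in the same group.

    copies: list of (shard_number, node_name) pairs, one per shard copy.
    node_groups: mapping of node name to its node.group value.
    """
    seen = defaultdict(set)
    violations = set()
    for shard, node in copies:
        group = node_groups[node]
        if group in seen[shard]:
            violations.add(shard)   # a second copy landed in the same group
        seen[shard].add(group)
    return sorted(violations)


groups = {"node1": "groupA", "node2": "groupA",
          "node3": "groupB", "node4": "groupB"}
print(awareness_violations([(0, "node1"), (0, "node3")], groups))
print(awareness_violations([(0, "node1"), (0, "node2")], groups))
```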
Forcing allocation awareness
Forcing allocation awareness can come in handy when we know, in advance, how many values our awareness attributes can take and we don't want more replicas than needed to be allocated in our cluster, for example, so as not to overload our cluster with too many replicas. For this, we can force the allocation awareness to be active only for certain attribute values. We can specify these values using the cluster.routing.allocation.awareness.force.group.values property (the name of the awareness attribute, group in our case, is part of the property name) and providing a list of comma-separated values to it. For example, if we would like the allocation awareness to use only the groupA and groupB values of the node.group property, we would add the following to the elasticsearch.yml file:
cluster.routing.allocation.awareness.attributes: group
cluster.routing.allocation.awareness.force.group.values: groupA,groupB
Filtering
Elasticsearch allows us to configure allocation for the entire cluster or at the index level. In the case of cluster allocation, we can use the following property prefixes:
cluster.routing.allocation.include
cluster.routing.allocation.require
cluster.routing.allocation.exclude
When it comes to index-specific allocation, we can use the following property prefixes:
index.routing.allocation.include
index.routing.allocation.require
index.routing.allocation.exclude
The previously mentioned prefixes can be used with the properties that we've defined in the elasticsearch.yml file (our tag and group properties) and with a special property called _ip that allows us to match or exclude nodes using their IP addresses, for example, like this:
cluster.routing.allocation.include._ip: 192.168.2.1
If we would like to include nodes with a group property matching the groupA value, we would set the following property:
cluster.routing.allocation.include.group: groupA
Notice that we've used the cluster.routing.allocation.include prefix and we've concatenated it with the name of the property, which is group in our case.
What do include, exclude, and require mean
If you look closely at the preceding parameters, you will notice that there are three kinds:
include:Thistypewillresultinincludingallthenodeswiththisparameterdefined.Ifmultipleincludeconditionsarevisiblethanallthenodesthatmatchatleastaoneoftheseconditionswillbetakenintoconsiderationwhenallocatingshards.Forexample,ifweaddtwocluster.routing.allocation.include.tagparameterstoourconfiguration,onewithapropertywiththevalueofnode1andsecondwiththenode2value,wewouldendupwithindices(actuallytheirshards)beingallocatedtothefirstandsecondnode(countingfromlefttoright).TosumupthenodesthathavetheincludeallocationparametertypewillbetakenintoconsiderationbyElasticsearchwhenchoosingthenodestoplaceshardson,butthisdoesn’tmeanthatElasticsearchwillputshardsinthem.require:Thisparameter,whichwasintroducedintheElasticsearch0.90typeofallocationfilter,requiresallthenodestohaveavaluethatmatchesthevalueofthisproperty.Forexample,ifweaddonecluster.routing.allocation.require.tagparametertoourconfigurationwiththevalueofnode1andacluster.routing.allocation.require.groupparameterwiththevalueofgroupA,
www.EBooksWorld.ir
wewouldendupwithshardsallocatedonlytothefirstnode(theonewithanIPaddressof192.168.2.1).exclude:Thisparameterallowsustoexcludenodeswithgivenpropertiesfromtheallocationprocess.Forexample,ifwesetcluster.routing.allocation.include.tagtogroupA,wewouldendupwithindicesbeingallocatedonlytothenodeswithIPaddresses192.168.3.1and192.168.3.2(thethirdandfourthnodesinourexample).
Note
The property value can use simple wildcard characters. For example, if we want to include all the nodes that have the group parameter value beginning with group, we could set the cluster.routing.allocation.include.group property to group*. In the example cluster case, this would result in matching nodes with the groupA and groupB group parameter values.
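The three filter kinds can be modeled with a few lines of Python. This is only a simplified sketch of the selection rules described above, not how Elasticsearch implements them: the NODES table is a hypothetical stand-in for the example cluster (the second node's IP address is an assumption), comma-separated values stand in for multiple conditions on one attribute, and fnmatchcase accepts more pattern syntax than the simple wildcards Elasticsearch supports.

```python
from fnmatch import fnmatchcase

# Hypothetical attributes modeled on the four-node example cluster.
# The IP address of the second node is an assumption.
NODES = {
    "node1": {"tag": "node1", "group": "groupA", "_ip": "192.168.2.1"},
    "node2": {"tag": "node2", "group": "groupA", "_ip": "192.168.2.2"},
    "node3": {"tag": "node3", "group": "groupB", "_ip": "192.168.3.1"},
    "node4": {"tag": "node4", "group": "groupB", "_ip": "192.168.3.2"},
}


def matches(attrs, attribute, patterns):
    """Return True if the node attribute matches any comma-separated pattern."""
    value = attrs.get(attribute, "")
    return any(fnmatchcase(value, p.strip()) for p in patterns.split(","))


def eligible_nodes(nodes, include=None, require=None, exclude=None):
    """Return the names of nodes that pass all three filter kinds."""
    include, require, exclude = include or {}, require or {}, exclude or {}
    result = []
    for name, attrs in nodes.items():
        # exclude: matching any condition removes the node.
        if any(matches(attrs, a, p) for a, p in exclude.items()):
            continue
        # require: the node has to match every condition.
        if not all(matches(attrs, a, p) for a, p in require.items()):
            continue
        # include: if present, the node has to match at least one condition.
        if include and not any(matches(attrs, a, p) for a, p in include.items()):
            continue
        result.append(name)
    return result


print(eligible_nodes(NODES, exclude={"group": "groupA"}))
```

Passing a node through this function only means it is a candidate for allocation; as noted above, Elasticsearch still decides where shards actually go.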
Manually moving shards and replicas
The last thing we wanted to discuss is the ability to manually move shards between nodes. Elasticsearch exposes the _cluster/reroute REST end-point, which allows us to control that. The following operations are available:
Moving a shard from node to node
Cancelling shard allocation
Forcing shard allocation
Now let’s look closely at all of the preceding operations.
Moving shards
Let’s say we have two nodes called es_node_one and es_node_two, and we have two shards of the shop index placed by Elasticsearch on the first node and we would like to move the second shard to the second node. In order to do this, we can run the following command:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [{
    "move": {
      "index": "shop",
      "shard": 1,
      "from_node": "es_node_one",
      "to_node": "es_node_two"
    }
  }]
}'
We’ve specified the move command, which allows us to move shards (and replicas) of the index specified by the index property. The shard property is the number of the shard we want to move (shards are numbered from 0, so 1 is the second shard). And, finally, the from_node property specifies the name of the node we want to move the shard from and the to_node property specifies the name of the node we want the shard to be placed on.
Canceling shard allocation
If we would like to cancel an on-going allocation process, we can run the cancel command and specify the index, node, and shard we want to cancel the allocation for. For example:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [{
    "cancel": {
      "index": "shop",
      "shard": 0,
      "node": "es_node_one"
    }
  }]
}'
The preceding command would cancel the allocation of shard 0 of the shop index on the es_node_one node.
Forcing shard allocation
In addition to cancelling and moving shards and replicas, we are also allowed to allocate an unallocated shard to a specific node. For example, if we have an unallocated shard numbered 0 for the users index and we would like it to be allocated to es_node_two by Elasticsearch, we would run the following command:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [{
    "allocate": {
      "index": "users",
      "shard": 0,
      "node": "es_node_two"
    }
  }]
}'
Multiple commands per HTTP request
We can, of course, include multiple commands in a single HTTP request. For example:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {"move": {"index": "shop", "shard": 1, "from_node": "es_node_one", "to_node": "es_node_two"}},
    {"cancel": {"index": "shop", "shard": 0, "node": "es_node_one"}}
  ]
}'
Allowing operations on primary shards
The cancel and allocate commands accept an additional allow_primary parameter. If set to true, it tells Elasticsearch that the operation can be performed on the primary shard. Please be advised that operations with the allow_primary parameter set to true may result in data loss.
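When reroute requests are generated by a script rather than typed by hand, it can help to build the request body programmatically. The following is a minimal sketch that only assembles the JSON shown in the preceding examples; the helper names (move, cancel, allocate, reroute_body) are hypothetical and nothing is sent to a cluster.

```python
import json


def move(index, shard, from_node, to_node):
    """Build a move command for one shard."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def _on_node(kind, index, shard, node, allow_primary):
    """Shared shape of the cancel and allocate commands."""
    command = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        # Operations on primaries may result in data loss; opt in explicitly.
        command["allow_primary"] = True
    return {kind: command}


def cancel(index, shard, node, allow_primary=False):
    return _on_node("cancel", index, shard, node, allow_primary)


def allocate(index, shard, node, allow_primary=False):
    return _on_node("allocate", index, shard, node, allow_primary)


def reroute_body(*commands):
    """Serialize any number of commands into one _cluster/reroute body."""
    return json.dumps({"commands": list(commands)})


body = reroute_body(
    move("shop", 1, "es_node_one", "es_node_two"),
    cancel("shop", 0, "es_node_one"),
)
print(body)
```

The resulting string can be passed as the request body of a POST to the _cluster/reroute end-point, for example with curl's -d option.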
Handling rolling restarts
There is one more thing that we would like to discuss when it comes to shard and replica allocation: handling rolling restarts. When an Elasticsearch node is restarted, it may take some time for it to rejoin the cluster. During this time, the rest of the cluster may decide to do rebalancing and move shards around. When we know we are doing rolling restarts, for example, to update Elasticsearch to a new version or install a plugin, we may want to tell Elasticsearch about it. The procedure for restarting each node should be as follows:
First, before you do any maintenance, you should stop the allocation by sending the following command:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}'
This will tell Elasticsearch to stop allocation. After this, we will stop the node we want to do maintenance on and start it again. After it joins the cluster, we can enable the allocation again by running the following:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'
This will enable the allocation again. This procedure should be repeated for each node we want to perform maintenance on.
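The per-node procedure can be written down as an ordered plan. The sketch below only produces the list of steps described above; the step names "restart" and "wait_for_join" are hypothetical placeholders for whatever tooling actually restarts the node and checks that it has rejoined, and no request is sent.

```python
import json

SETTINGS_PATH = "/_cluster/settings"


def allocation_setting(value):
    """Body for the transient cluster.routing.allocation.enable setting."""
    if value not in ("all", "none"):
        raise ValueError("this sketch only uses 'all' and 'none'")
    return json.dumps({"transient": {"cluster.routing.allocation.enable": value}})


def rolling_restart_plan(nodes):
    """Return the ordered steps for restarting the given nodes one by one."""
    steps = []
    for node in nodes:
        steps.append(("PUT " + SETTINGS_PATH, allocation_setting("none")))
        steps.append(("restart", node))          # placeholder step
        steps.append(("wait_for_join", node))    # placeholder step
        steps.append(("PUT " + SETTINGS_PATH, allocation_setting("all")))
    return steps


for action, detail in rolling_restart_plan(["es_node_one", "es_node_two"]):
    print(action, detail)
```

Keeping the enable step inside the loop mirrors the text: allocation is switched back on after each node rejoins, before the next node is touched.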
Controlling cluster rebalancing
By default, Elasticsearch tries to keep the shards and their replicas evenly balanced across the cluster. Such behavior is good in most cases, but there are times when we want to control this behavior, for example, during rolling restarts. We don’t want to rebalance the entire cluster when one or two nodes are restarted. In this section, we will look at how to avoid cluster rebalance and control this process’s behavior in depth.
Imagine a situation where you know that your network can handle very high amounts of traffic or the opposite of this: your network is used extensively and you want to avoid too much load on it. The other example is that you may want to decrease the pressure that is put on your I/O subsystem after a full-cluster restart and you want to have fewer shards and replicas being initialized at the same time. These are only two examples where rebalance control may be handy.
Understanding rebalance
Rebalancing is the process of moving shards between different nodes in our cluster. As we have already mentioned, it is fine in most situations, but sometimes you may want to completely avoid this. For example, if we define how our shards are placed and we want to keep it this way, we may want to avoid rebalancing. However, by default, Elasticsearch will try to rebalance the cluster whenever the cluster state changes and Elasticsearch thinks a rebalance is needed (and the delayed timeout has passed as discussed in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail).
Cluster being ready
We already know that our indices are built from shards and replicas. Primary shards, or just shards, are the ones that get the data first. The replicas are physical copies of the primaries and get the data from them. You can think of the cluster as being ready to be used when all the primary shards are assigned to their nodes in your cluster, that is, as soon as the yellow health state is achieved. Elasticsearch may still be initializing other shards (the replicas) at that point, but you can already use your cluster and be sure that you can search your entire data set and send index change commands, which will be processed properly.
The cluster rebalance settings
Elasticsearch lets us control the rebalance process with the use of a few properties that can be set in the elasticsearch.yml file or by using the Elasticsearch REST API (as described in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail).
Controlling when rebalancing will be allowed
The cluster.routing.allocation.allow_rebalance property allows us to specify when rebalancing is allowed. This property can take the following values:
always: Rebalancing will be allowed as soon as it’s needed
indices_primaries_active: Rebalancing will be allowed when all the primary shards are initialized
indices_all_active: The default one, which means that rebalancing will be allowed when all the shards and replicas are initialized
The cluster.routing.allocation.allow_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.
Controlling the number of shards being moved between nodes concurrently
The cluster.routing.allocation.cluster_concurrent_rebalance property allows us to specify how many shards can be moved between nodes at once in the entire cluster. This value defaults to 2. If you have a cluster that is built from many nodes, you can increase it so that rebalancing is performed faster, but this will put more pressure on your cluster resources and will affect indexing and querying. The cluster.routing.allocation.cluster_concurrent_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.
Controlling which shards may be rebalanced
The cluster.routing.rebalance.enable property allows us to specify which shards will be allowed to be rebalanced by Elasticsearch. (It should not be confused with cluster.routing.allocation.enable, which controls shard allocation.) This property can take the following values:
all: The default behavior, which tells Elasticsearch to rebalance all the shards in the cluster
primaries: This value allows the rebalancing of the primary shards only
replicas: This value allows the rebalancing of the replica shards only
none: This value disables the rebalancing of all types of shards for all indices in the cluster
The cluster.routing.rebalance.enable property can be set in the elasticsearch.yml configuration file and updated dynamically as well.
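Because these properties can be updated dynamically, a small helper that validates the values before building the request body can catch typos early. This is a sketch under assumptions: the function name rebalance_settings is hypothetical, the settings are written as transient ones, and the defaults are simply the defaults quoted above.

```python
import json

# Values listed in the text for cluster.routing.allocation.allow_rebalance.
ALLOW_REBALANCE = ("always", "indices_primaries_active", "indices_all_active")


def rebalance_settings(allow_rebalance="indices_all_active", concurrent_rebalance=2):
    """Build a body for a PUT to /_cluster/settings."""
    if allow_rebalance not in ALLOW_REBALANCE:
        raise ValueError("unknown allow_rebalance value: %r" % (allow_rebalance,))
    return json.dumps({
        "transient": {
            "cluster.routing.allocation.allow_rebalance": allow_rebalance,
            "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent_rebalance,
        }
    })


print(rebalance_settings("indices_primaries_active", 4))
```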
The Cat API
The Elasticsearch Admin API is quite extensive and covers almost every part of Elasticsearch architecture: from low-level information about Lucene to high-level ones about the cluster nodes and their health. All this information is available using the Elasticsearch Java API as well as the REST API. However, the returned data, even though it is a JSON document, is not very readable by a user, at least when it comes to the amount of information given.
Because of this, Elasticsearch provides us with a more human-friendly API: the Cat API. The special Cat API returns data in a simple text, tabular format and, what’s more, it provides aggregated data that is usually usable without any further processing.
The basics
The base endpoint for the Cat API is quite obvious: it is /_cat. Without any parameters, it shows all the available endpoints for this API. We can check this by running the following command:
curl -XGET 'localhost:9200/_cat'
The response returned by Elasticsearch should be similar or identical (depending on your Elasticsearch version) to the following one:
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
So, looking from the top, Elasticsearch allows us to get the following information using the Cat API:
Shard allocation-related information
All shards-related information (also one limited to a given index)
Information about the master node
Nodes information
Indices statistics (also one limited to a given index)
Segments statistics (also one limited to a given index)
Documents count (also one limited to a given index)
Recovery information (also one limited to a given index)
Cluster health
Tasks pending for execution
Index aliases and indices for a given alias
Thread pool configuration
Plugins installed on each node
Field data cache size and field data cache sizes for individual fields
Node attributes information
Defined backup repositories
Snapshots created in the backup repository
Using Cat API
Using the Cat API is as simple as running a GET request to one of the previously mentioned REST end-points. For example, to get information about the cluster state, we could run the following command:
curl -XGET 'localhost:9200/_cat/health'
The response returned by Elasticsearch for the preceding command should be similar to the following one, but, of course, will be dependent on your cluster:
1446292041 12:47:21 elasticsearch yellow 1 1 21 21 0 0 21 0 - 50.0%
This is clean and nice. Because it is in tabular format, it is also easy to use the response in tools such as grep, awk, or sed, a standard set of tools for every administrator. It is also more readable once you know what it is all about.
To add a header describing each column’s purpose, we just need to add an additional v parameter, just like this:
curl -XGET 'localhost:9200/_cat/health?v'
Common arguments
Every Cat API endpoint has its own arguments, but there are a few common options that are shared among all of them:
v: This adds a header line to the response with the names of presented items.
h: This allows us to show only the chosen columns, for example h=status,node.total,shards,pri.
help: This lists all the possible columns that this particular endpoint is able to show. The command shows the name of the parameter, its abbreviation, and description.
bytes: This is the format for the information representing the values in bytes. As we said earlier, the Cat API is designed to be used by humans and because of this, by default, these values are represented in human-readable form, for example: 3.5kB or 40GB. The bytes option allows the setting of the same base for all the numbers, so sorting or numerical comparison will be easier. For example, bytes=b presents all values in bytes, bytes=k in kilobytes, and so on.
Note
For the full list of arguments for each Cat API endpoint, please refer to the official Elasticsearch documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/cat.html.
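The common arguments combine into an ordinary query string. The following sketch shows one way to assemble a Cat API URL from them; the cat_url function and its parameter names are hypothetical, and the host defaults to the localhost:9200 address used in the examples.

```python
from urllib.parse import urlencode


def cat_url(endpoint, host="localhost:9200", verbose=False,
            columns=None, bytes_unit=None, show_help=False):
    """Build a Cat API URL from the common arguments v, h, help, and bytes."""
    parts = []
    if verbose:
        parts.append("v")
    if show_help:
        parts.append("help")
    params = {}
    if columns:
        params["h"] = ",".join(columns)
    if bytes_unit:
        params["bytes"] = bytes_unit
    if params:
        # Keep commas readable instead of percent-encoding them.
        parts.append(urlencode(params, safe=","))
    url = "http://%s/_cat/%s" % (host, endpoint)
    return url + "?" + "&".join(parts) if parts else url


print(cat_url("health", verbose=True, columns=["status", "node.total", "shards", "pri"]))
```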
The examples
When we wrote this book, the Cat API had twenty-two endpoints. We don’t want to describe them all: it would be a repeat of information contained in the documentation and it doesn’t make sense. However, we didn’t want to leave this section without an example regarding the usage of the Cat API. Because of this, we decided to show how easily you can get information using the Cat API compared to the standard JSON API exposed by Elasticsearch.
Getting information about the master node
The first example shows how easy it is to get information about which node in our cluster is the master node. By calling the /_cat/master REST endpoint we can get information about the nodes and which one of them is currently being elected as a master. For example, let’s run the following command:
curl -XGET 'localhost:9200/_cat/master?v'
The response returned by Elasticsearch for my local two-node cluster looks as follows:
id                     host      ip        node
Cfj3tzqpSNi5SZx4g8osAg 127.0.0.1 127.0.0.1 Skin
As you can see in the response, we’ve got the information about which node is currently elected as the master: we can see its identifier, IP address, and name.
Getting information about the nodes
The /_cat/nodes REST endpoint provides information about all the nodes in the cluster. Let’s see what Elasticsearch will return after running the following command:
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'
In the preceding example, we have used the possibility of choosing what information we want to get from the approximately seventy options of this endpoint. We have chosen to get only the node name, its role (whether the node is a data or client node), node load, and its uptime.
And the response returned by Elasticsearch looks as follows:
name node.role load uptime
Skin d         2.00 1.3h
As you can see, the /_cat/nodes REST endpoint provides all the requested information about the nodes in the cluster.
Retrieving recovery information for an index
Another nice example of using the Cat API is getting information about the recovery of a single index or all the indices. In our case, we will retrieve recovery information for a single library index by running the following command:
curl -XGET 'localhost:9200/_cat/recovery/library?v&h=index,shard,time,type,stage,files_percent'
The response for the preceding command looks as follows:
index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
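Because the Cat API output is a whitespace-separated table, it is also easy to consume from a script. The sketch below parses a response that was requested with the v parameter (so the first line is the header) into a list of dictionaries. It assumes that no column value contains spaces, which holds for the recovery output but is not guaranteed for every endpoint; the parse_cat name and the embedded sample are only for illustration.

```python
SAMPLE = """\
index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
"""


def parse_cat(text):
    """Turn a Cat API response with a header line into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumes that individual column values never contain whitespace.
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat(SAMPLE)
slowest = max(rows, key=lambda row: int(row["time"]))
print("slowest shard:", slowest["shard"], "time:", slowest["time"])
```

All values stay strings; convert them (as done for time above) only where a numeric comparison is needed.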
Warming up
Sometimes, there may be a need to prepare Elasticsearch to handle your queries. Maybe it’s because you heavily rely on the field data cache and you want it to be loaded before your production queries arrive, or maybe you want to warm up your operating system’s I/O cache so that the index data files are read from the cache. Whatever the reason, Elasticsearch allows us to use so-called warming queries for our types and indices.
Defining a new warming query
A warming query is nothing more than the usual query stored in a special type called _warmer in Elasticsearch. Let’s assume that we have the following query that we want to use for warming up:
curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "warming_aggs": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'
To store the preceding query as a warming query for our library index, we will run the following command:
curl -XPUT 'localhost:9200/library/_warmer/tags_warming_query' -d '{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "warming_aggs": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'
The preceding command will register our query as a warming query with the tags_warming_query name. You can have multiple warming queries for your index, but each of these queries needs to have a unique name.
We can define warming queries not only for the entire index, but also for a specific type in it. For example, to store our previously shown query as the warming query only for the book type in the library index, run the preceding command not against the /library/_warmer URI but against /library/book/_warmer. So, the entire command will be as follows:
curl -XPUT 'localhost:9200/library/book/_warmer/tags_warming_query' -d '{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "warming_aggs": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'
After a warming query is added, Elasticsearch will warm up each new segment by running the defined warming queries on it before allowing that segment to be searched. This allows Elasticsearch and the operating system to cache data and, thus, speed up searching.
Just as we read in the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, Lucene divides the index into parts called segments, which once written can’t be changed. Every new commit operation creates a new segment (which is eventually merged if the number of segments is too high), which Lucene uses for searching.
Note
Please note that the Warmer API will be removed in future versions of Elasticsearch.
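The only difference between an index-wide and a type-specific warmer is the request path. The following sketch builds the method, path, and body for registering a warmer in the Elasticsearch version discussed here; warmer_request is a hypothetical helper and nothing is sent to a cluster.

```python
import json

# The warming query used in the preceding examples.
TAGS_QUERY = {
    "query": {"match_all": {}},
    "aggs": {"warming_aggs": {"terms": {"field": "tags"}}},
}


def warmer_request(index, name, query, doc_type=None):
    """Return (method, path, body) for registering a warming query."""
    # The type segment is only present for type-specific warmers.
    segments = [index] + ([doc_type] if doc_type else []) + ["_warmer", name]
    return "PUT", "/" + "/".join(segments), json.dumps(query)


print(warmer_request("library", "tags_warming_query", TAGS_QUERY)[1])
print(warmer_request("library", "tags_warming_query", TAGS_QUERY, doc_type="book")[1])
```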
Retrieving the defined warming queries
In order to get a specific warming query for our index, we just need to know its name. For example, if we want to get the warming query named tags_warming_query for our library index, we will run the following command:
curl -XGET 'localhost:9200/library/_warmer/tags_warming_query?pretty'
The result returned by Elasticsearch will be as follows:
{
  "library": {
    "warmers": {
      "tags_warming_query": {
        "types": ["book"],
        "source": {
          "query": {
            "match_all": {}
          },
          "aggs": {
            "warming_aggs": {
              "terms": {
                "field": "tags"
              }
            }
          }
        }
      }
    }
  }
}
We can also get all the warming queries for the index using the following command:
curl -XGET 'localhost:9200/library/_warmer?pretty'
And finally, we can also get all the warming queries that start with a given prefix. For example, if we want to get all the warming queries for the library index that start with the tags prefix, we will run the following command:
curl -XGET 'localhost:9200/library/_warmer/tags*?pretty'
Deleting a warming query
Deleting a warming query is very similar to getting one; we just need to use the DELETE HTTP method. To delete a specific warming query from our index, we just need to know its name. For example, if we want to delete the warming query named tags_warming_query for our library index, we will run the following command:
curl -XDELETE 'localhost:9200/library/_warmer/tags_warming_query'
We can also delete all the warming queries for the index using the following command:
curl -XDELETE 'localhost:9200/library/_warmer/_all'
And finally, we can also remove all the warming queries that start with a given prefix. For example, if we want to remove all the warming queries for the library index that start with the tags prefix, we will run the following command:
curl -XDELETE 'localhost:9200/library/_warmer/tags*'
Disabling the warming up functionality
To disable the warming queries totally while keeping their definitions stored, you should set the index.warmer.enabled configuration property to false (setting this property to true will result in enabling the warming up functionality). This setting can be either put in the elasticsearch.yml file or just set using the REST API on a live cluster.
For example, if we want to disable the warming up functionality for the library index, we will run the following command:
curl -XPUT 'localhost:9200/library/_settings' -d '{
  "index.warmer.enabled": false
}'
Choosing queries for warming
Finally, we should ask ourselves one question: which queries should be considered as candidates for warming? Typically, you’ll want to choose ones that are expensive to execute and ones that require caches to be populated. So you’ll probably want to choose queries that include aggregations and sorting based on the fields in your index. This will force the operating system to load the part of the indices that hold the data related to such queries and improve the performance of consecutive queries that are run. In addition to this, parent-child queries and nested queries are also potential candidates for warming. You may also choose other queries by looking at the logs, and finding where your performance is not as great as you want it to be. Such queries may also be perfect candidates for warming up.
For example, let’s say that we have the following logging configuration set in the elasticsearch.yml file:
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 1s
And we have the following logging level set in the logging.yml configuration file:
logger:
  index.search.slowlog: TRACE, index_search_slow_log_file
Notice that the index.search.slowlog.threshold.query.trace property is set to 1s and the index.search.slowlog logging level is set to TRACE. This means that whenever a query is executed for longer than one second (on a shard, not in total), it will be logged into the slow log file (the name of which is specified by the index_search_slow_log_file configuration section of the logging.yml configuration file). For example, the following can be found in a slow log file:
[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query]
took[340000.2ms], took_millis[3400], types[], stats[],
search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":
{"match_all":{}},"aggs":{"warming_aggs":{"terms":{"field":"tags"}}}}],
extra_source[],
As you can see, in the preceding log line, we have the query time, search type, and the query source, which shows us the executed query.
Of course, the values can be different in your configuration but the slow log can be a valuable source of the queries that have been running too long and may need to have some warm up defined; maybe these are parent-child queries and need some identifiers to be fetched to perform better, or maybe you are using a filter that is expensive when you execute it for the first time.
There is one thing you should remember: don’t overload your Elasticsearch cluster with too many warming queries because you may end up spending too much time in warming up instead of processing your production queries.
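Picking warming candidates out of a slow log can be partly automated. The sketch below extracts the took_millis value and the query source from one slow log entry. It assumes the entry has been joined into a single string and follows the field layout of the sample shown above; the regular expressions and the parse_slowlog_entry name are illustrative, not an official log format specification.

```python
import json
import re

TOOK_MILLIS = re.compile(r"took_millis\[(\d+)\]")
# Match "source[...]" but not "extra_source[...]"; the query source is taken
# to end right before the extra_source field.
SOURCE = re.compile(r"(?<![a-z_])source\[(.*?)\],\s*extra_source\[", re.DOTALL)


def parse_slowlog_entry(entry):
    """Return the query time and parsed query of one slow log entry, or None."""
    took = TOOK_MILLIS.search(entry)
    source = SOURCE.search(entry)
    if not took or not source:
        return None
    return {"took_millis": int(took.group(1)),
            "query": json.loads(source.group(1))}


entry = (
    '[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query] '
    'took[340000.2ms], took_millis[3400], types[], stats[], '
    'search_type[QUERY_THEN_FETCH], total_shards[5], '
    'source[{"query":{"match_all":{}},"aggs":{"warming_aggs":'
    '{"terms":{"field":"tags"}}}}], extra_source[],'
)
print(parse_slowlog_entry(entry))
```

Queries collected this way can then be sorted by took_millis and the most expensive ones reviewed as warming candidates.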
Index aliasing and using it to simplify your everyday work
When working with multiple indices in Elasticsearch, you can sometimes lose track of them. Imagine a situation where you store logs in your indices or time-based data in general. Usually, the amount of data in such cases is quite large and, therefore, it is a good solution to have the data divided somehow. A logical division of such data is obtained by creating a single index for a single day of logs (if you are interested in an open source solution used to manage logs, look at Logstash from the Elasticsearch suite at https://www.elastic.co/products/logstash).
However, after some time, if we keep all the indices, we will start having a problem in taking care of all that. An application needs to keep track of all the information, such as which index to send data to, which to query, and so on. With the help of aliases, we can change this to work with a single name just as we would use a single index, but we will work with multiple indices.
An alias
What is an index alias? It’s an additional name for one or more indices that allows us to use these indices by referring to them with those additional names. A single alias can have multiple indices as well as the other way round; a single index can be a part of multiple aliases.
However, please remember that you can’t use an alias that has multiple indices for indexing or for real-time GET operations. Elasticsearch will throw an exception if you do this, because it doesn’t know in which index the data should be indexed or from which index the document should be fetched. We can still use an alias that links to only a single index for indexing, though.
Creating an alias
To create an index alias, we need to send an HTTP POST request to the _aliases REST end-point with a defined action. For example, the following request will create a new alias called week12 that will include the indices named day10, day11, and day12 (we need to create those indices first):
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"add": {"index": "day10", "alias": "week12"}},
    {"add": {"index": "day11", "alias": "week12"}},
    {"add": {"index": "day12", "alias": "week12"}}
  ]
}'
If the week12 alias isn’t present in our Elasticsearch cluster, the preceding command will create it. If it is present, the command will just add the specified indices to it.
Without the alias, we would run a search across the three indices as follows:
curl -XGET 'localhost:9200/day10,day11,day12/_search?q=test'
Once the alias has been created, we can instead run it as follows:
curl -XGET 'localhost:9200/week12/_search?q=test'
Isn’t this better?
Sometimes we have a set of indices where every index serves independent information but some queries should go across all of them; for example, we have dedicated indices for countries (country_en, country_us, country_de, and so on). In this case, we would create the alias by grouping them all:
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"add": {"index": "country_*", "alias": "countries"}}
  ]
}'
The last command created only one alias. Elasticsearch allows you to rewrite this to something less verbose:
curl -XPUT 'localhost:9200/country_*/_alias/countries'
Modifying aliases
Of course, you can also remove indices from an alias. We can do this similarly to how we add indices to an alias, but instead of the add command, we use the remove one. For example, to remove the index named day9 from the week12 alias, we will run the following command:
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"remove": {"index": "day9", "alias": "week12"}}
  ]
}'
Combining commands
The add and remove commands can be sent as a single request. For example, if you would like to combine all the previously sent commands into a single request, you will have to send the following command:
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {"add": {"index": "day10", "alias": "week12"}},
    {"add": {"index": "day11", "alias": "week12"}},
    {"add": {"index": "day12", "alias": "week12"}},
    {"remove": {"index": "day9", "alias": "week12"}}
  ]
}'
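For time-based indices, the list of actions is usually generated rather than written by hand. Here is a minimal sketch that builds an _aliases request body from add and remove actions; the helper names are hypothetical, and the body is only printed, not sent.

```python
import json


def add(index, alias, **options):
    """Build an add action; extra options (e.g. filter) are passed through."""
    action = {"index": index, "alias": alias}
    action.update(options)
    return {"add": action}


def remove(index, alias):
    """Build a remove action."""
    return {"remove": {"index": index, "alias": alias}}


def aliases_body(*actions):
    """Serialize the actions into one body for POST /_aliases."""
    return json.dumps({"actions": list(actions)})


# Recreate the combined request: add day10..day12 and remove day9.
actions = [add("day%d" % day, "week12") for day in (10, 11, 12)]
actions.append(remove("day9", "week12"))
print(aliases_body(*actions))
```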
Retrieving aliases
In addition to adding or removing indices to or from aliases, we and our applications that use Elasticsearch may need to retrieve all the aliases available in the cluster or all the aliases that an index is connected to. To retrieve these aliases, we send a request using the HTTP GET command. For example, the first of the following commands gets all the aliases for the day10 index and the second one gets all the available aliases:
curl -XGET 'localhost:9200/day10/_aliases'
curl -XGET 'localhost:9200/_aliases'
The response from the second command is as follows:
{
  "day12": {
    "aliases": {
      "week12": {}
    }
  },
  "library": {
    "aliases": {}
  },
  "day11": {
    "aliases": {
      "week12": {}
    }
  },
  "day9": {
    "aliases": {}
  },
  "day10": {
    "aliases": {
      "week12": {}
    }
  }
}
You can also use the _alias endpoint to get all aliases from the given index:
curl -XGET 'localhost:9200/day10/_alias/*'
To get a particular alias definition, you can put the alias name at the end of the URI. For example, for the week12 alias of the day10 index:
curl -XGET 'localhost:9200/day10/_alias/week12'
Removing aliases
You can also remove an alias using the _alias endpoint. For example, sending the following command will remove the client alias from the data index:
curl -XDELETE 'localhost:9200/data/_alias/client'
Filtering aliases
Aliases can be used in a way similar to how views are used in SQL databases. You can use the full Query DSL (discussed in detail in Chapter 3, Searching Your Data) and have your filter applied to all count, search, delete by query, and similar operations.
Let’s look at an example. Imagine that we want to have aliases that return data for a certain client so we can use it in our application. Let’s say that the client identifier we are interested in is stored in the clientId field and we are interested in the 12345 client. So, let’s create the alias named client with our data index, which will apply a query for clientId automatically:
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {
      "add": {
        "index": "data",
        "alias": "client",
        "filter": {"term": {"clientId": 12345}}
      }
    }
  ]
}'
So when using the defined alias, you will always get your request filtered by a term query that ensures that all the returned documents have the 12345 value in the clientId field.
Aliases and routing
In the Introduction to routing section of Chapter 2, Indexing Your Data, we talked about routing. Similar to aliases that use filtering, we can add routing values to the aliases. Imagine that we are using routing on the basis of user identifier and we want to use the same routing values with our aliases. So, for the alias named client, we will use the routing values of 12345, 12346, and 12347 for querying, and only 12345 for indexing. To do this, we will create an alias using the following command:
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {
      "add": {
        "index": "data",
        "alias": "client",
        "search_routing": "12345,12346,12347",
        "index_routing": "12345"
      }
    }
  ]
}'
This way, when we index our data using the client alias, the values specified by the index_routing property will be used. At the time of querying, the values specified by the search_routing property will be used.
There is one more thing. Please look at the following query sent to the previously defined alias:
curl -XGET 'localhost:9200/client/_search?q=test&routing=99999,12345'
The value used as a routing value will be 12345. This is because Elasticsearch will take the common values of the search_routing attribute and the query routing parameter, which in our case is 12345.
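The "common values" rule is a plain set intersection. The following sketch reproduces the calculation for the example above; it models only the behavior described in the text, and effective_routing is a hypothetical name, not an Elasticsearch function.

```python
def effective_routing(alias_search_routing, request_routing=None):
    """Return the routing values used for a search sent through an alias.

    Both arguments are comma-separated strings, as in the alias definition
    and the routing URI parameter.
    """
    alias_values = [v.strip() for v in alias_search_routing.split(",") if v.strip()]
    if request_routing is None:
        # No routing parameter on the request: the alias values apply as-is.
        return alias_values
    requested = {v.strip() for v in request_routing.split(",")}
    # Only values present on both sides are used.
    return [v for v in alias_values if v in requested]


print(effective_routing("12345,12346,12347", "99999,12345"))
```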
Zero downtime reindexing and aliases
One of the greatest advantages of using aliases is the ability to re-index the data without any downtime for the system using Elasticsearch. To achieve this, you would need to interact with your indices only through aliases, both for indexing and querying. In such a case, you can just create a new index, index the data there, and switch aliases when needed. During indexing, aliases would still point to the old index, so the application could work as usual.
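The alias switch itself can be a single _aliases request containing a remove and an add action, so the alias moves from the old index to the new one in one step. The sketch below builds such a body; the index names library_v1 and library_v2 and the alias name library_alias are hypothetical, chosen only for illustration.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Build an _aliases body that moves an alias from one index to another."""
    return json.dumps({
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    })


# Hypothetical names: the application always talks to "library_alias".
print(swap_alias_body("library_alias", "library_v1", "library_v2"))
```

Sending both actions in one request, rather than as two separate calls, avoids a window in which the alias points to no index at all.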
Summary
In this chapter, we discussed Elasticsearch administration. We started by learning how to perform backups of our indices and how to monitor our cluster health and state using its API. We controlled cluster shard rebalancing and learned how to adjust shard allocation according to our needs. We’ve used the Cat API to get information about Elasticsearch in human-readable form and we’ve warmed up our queries to make them faster. Finally, we’ve used aliases to allow a better management of our indices and to have more flexibility.
In the next and final chapter of the book, we will focus on a hypothetical online library store to see how to make Elasticsearch work in practice. We will start with a brief introduction and hardware considerations. We will tune a single instance of Elasticsearch and properly configure our cluster by discussing each of its parts and providing a proper architecture. We will vertically expand the cluster and prepare it for both high querying and high indexing load. Finally, we will learn how to monitor such a prepared cluster.
Chapter 11. Scaling by Example
In the previous chapter, we discussed Elasticsearch administration. We started with a discussion about backups and how we can do them by using the available API. We monitored the health and state of our clusters and nodes and we learned how to control shard rebalancing. We controlled the shard and replica allocation and used the human-friendly Cat API to get information about the cluster and nodes. We saw how to use warmers to speed up potentially heavy queries and we used index aliasing to manage our indices more easily. By the end of this chapter, you will have learned the following topics:
Hardware preparations for running Elasticsearch
Tuning a single Elasticsearch node
Preparing highly available and fault tolerant clusters
Expanding Elasticsearch vertically
Preparing Elasticsearch for high query and indexing throughput
Monitoring Elasticsearch
Hardware
One of the first decisions that we need to make when starting every serious software project is a set of choices related to hardware. And believe us, this is not only a very important choice, but also one of the most difficult ones. Often the decisions are made at early project stages, when only the basic architecture is known and we don’t have precise information regarding the queries, data load, and so on. The project architect has to balance caution against the projected cost of the whole solution. Too many times it is an intersection of experience and clairvoyance, which can lead to either great or terrible results.
Physical servers or a cloud
Let’s start with a decision: a cloud, virtual, or physical machines. Nowadays, these are all valid options, but it was not always the case. Some time ago the only option was to buy new servers for each environment part or share resources with the other applications on the same machine. The second option makes perfect sense as it is more cost-effective but introduces risk. Problems with one application, especially when they are hardware related, will result in problems for another application. You can imagine one of your applications using most of the I/O subsystem of the physical machine and all the other applications struggling with lots of I/O waits and performance problems because of that. Virtualization promises application separation and a more convenient way of managing resources, but you are still limited by the underlying hardware. Any unexpected traffic could be a problem and affect service availability. Imagine that your ecommerce site suddenly gains a massive number of customers. Instead of being glad that the spike appeared and you have more potential customers, you search for a place where you can buy additional hardware that will be supplied as soon as possible.
Cloud computing on the other hand means a more flexible cost model. We can easily add new machines whenever we need. We can add them temporarily when we expect a greater load (for example, before Christmas for an ecommerce site) and pay only for the actually used processing power. It is just a few clicks in the admin panel. Even more, we can also set up automatic scaling, that is, new virtual machines can appear automatically when we need them. Cloud-based software can also shut them down when we do not need them anymore. The cloud has many advantages, such as lower initial cost, ability to easily grow your business, and insensitivity to temporal fluctuations of resource requirements, but it also has several flaws. The costs of cloud servers rise faster than that of physical machines. Also, mass storage, although practically unlimited, has worse characteristics (number of operations per second) than physical servers. This is sometimes a great problem for us, especially with disk-based storage such as Elasticsearch.
In practice, as usual, the choice can be hard but going through a few points can help you with your decision:
Business requirements may directly point to your own servers; for example, some procedures related to financial or medical data automatically exclude cloud solutions hosted by third-party vendors
For proof of concept and low/medium load services, the cloud can be a good choice because of simplicity, scalability, and low cost
Solutions with strong requirements connected with I/O subsystems will probably work better on bare metal machines where you have greater influence over what storage type is available to you
When the traffic can greatly change within a short time, the cloud is a perfect place for you
For the purpose of further discussion, let’s assume that we want to buy our own servers. We are in the computer store now and let’s buy something!
www.EBooksWorld.ir
CPU
In most cases, this is the least important part. You can choose any modern CPU model, but you should know that a higher number of cores means a higher number of concurrent queries and indexing threads. That will let you index data faster, especially with complicated analysis and lots of merges.
RAM memory
More gigabytes of RAM is always better than fewer gigabytes of RAM. Memory is necessary, especially for aggregations and sorting. It is less of a problem now, with Elasticsearch 2.0 and doc values, but complicated queries with lots of aggregations still require memory to process the data. Memory is also used for indexing buffers and can lead to indexing speed improvements, because more data can be buffered in memory and thus disks will be used less frequently. If you try to use more memory than is available, the operating system will use the hard disks as temporary space (it starts swapping) and you should avoid this at all costs. Note that you should never try to force Elasticsearch to use as much memory as possible. The first reason is the Java garbage collector: less memory is more GC friendly. The second reason is that the unused memory is actually used by the operating system for buffers and disk cache. In fact, when your index can fit in this space, all data is read from these caches and not from the disks directly. This can drastically improve the performance. By default, Elasticsearch and the I/O subsystem share the same I/O cache, which gives another reason to leave even more memory for the operating system itself.
In practice, 8 GB is the lowest requirement for memory. It does not mean that Elasticsearch will never work with less memory, but for most situations and data-intensive applications, it is the reasonable minimum. On the other hand, more than 64 GB is rarely needed. Instead of assigning such amounts of memory to a single Elasticsearch node, think about scaling the system horizontally.
Mass storage
We said that we are in a good situation when the whole index fits into memory. In practice, this can be difficult to achieve, so good and fast disks are very important. It is even more important if one of the requirements is high indexing throughput. In such a case, you may consider fast SSD disks. Unfortunately, these disks are expensive if your data volume is big. You can improve the situation by avoiding RAID (see https://en.wikipedia.org/wiki/RAID), except RAID 0. In most cases, when you handle fault tolerance by having multiple servers, the additional level of safety on the RAID level is unnecessary. The last thing is to avoid using external storage, such as network attached storage (NAS) or NFS volumes. The network latency in such cases usually cancels out the advantages of these solutions.
The network
When you use an Elasticsearch cluster, each node opens several connections to other nodes for various uses. When you index, the data is forwarded to different shards and replicas. When you query for data, the node used for querying can run multiple partial queries on the other nodes and compose the reply from the data fetched from them. This is why you should make sure that your network is not the bottleneck. In practice, use one network for all the servers in the cluster and avoid solutions in which the nodes in the cluster are spread between data centers.
How many servers
The answer is always the same: it depends. It depends on many factors: the number of requests per second, the data volume, the level of the queries’ complexity, the aggregations and sorting usage, the number of new documents per unit of time, how fast new data should be available for searching (the refresh time), the average document size, and the analyzers used. In practice, the handiest answer is: test it and approximate.
The one thing that is often underestimated is data safety. When you think about fault tolerance and availability, you should start from three servers. Why? We talked about the split brain situation in the Master election configuration section of Chapter 9, Elasticsearch Cluster in Detail. Starting from three servers, we are able to handle a single Elasticsearch node failure without taking down the whole cluster.
Cost cutting
You did some tests, carefully considered the planned functionalities, estimated volumes and load, and went to the project owner with an architecture draft. “It’s too expensive”, he said, and asked you to think about the servers once again. What can we do?
Let’s think about server roles and try to introduce some differences between them. If one of the requirements is indexing massive amounts of data connected with time (maybe logs), one possible way is to have two groups of servers: hot nodes, where new data arrives, and cold nodes, where old data is moved. Thanks to this approach, hot nodes may have faster but smaller disks (that is, solid state drives), as opposed to the cold nodes, where fast disks are not so important but space is. You can also divide your architecture into several groups, such as master servers (less powerful, with relatively small disks), data nodes (bigger disks), and query aggregator nodes (more RAM). We will talk about this in the following sections.
Preparing a single Elasticsearch node
When we talk about vertical scaling, we often mean adding more resources to the server Elasticsearch is running on. We can add memory or we can switch to a machine with a better CPU or faster disk storage. Of course, with better machines we can expect an increase in performance; depending on our deployment and its bottlenecks, it can be a small or large improvement. However, there are limitations when it comes to vertical scaling. For example, one of the limitations is the maximum amount of physical memory available for your servers or the total memory required by the JVM to operate. With large data and complicated queries, you can very soon run into memory issues, and adding new memory may not help at all. In this section, we will try to give you general advice on where to look and what to tune when it comes to a single Elasticsearch node.
The key thing to remember when tuning your system is to run performance tests that can be repeated under the same circumstances. Once you make a change, you need to be able to see how it affects the overall performance. In addition to that, Elasticsearch scales well. Using that knowledge, we can run performance tests on a single machine (or a few of them) and extrapolate the results. Such observations may be a good starting point for further tuning.
Also keep in mind that this section doesn’t contain a deep dive into all performance related topics, but is dedicated to showing you the most common things.
The general preparations
Apart from all the things we will discuss in this section, there are three major, operating system related things you need to remember: the number of allowed file descriptors, the virtual memory, and avoiding swapping.
Note that the following section contains information for Linux operating systems, but you can also achieve similar options on Microsoft Windows.
Avoiding swapping
Let’s start with the third one. Elasticsearch and Java Virtual Machine based applications, in general, don’t like to be swapped. This means that these applications work best if the operating system doesn’t put the memory that they use in the swap space. The reason is simple: to access swapped memory, the operating system has to read it from the disk, which is slow and would hurt performance badly.
If we have enough memory, and we should have if we want our Elasticsearch instance to perform well, we can configure Elasticsearch to avoid swapping. To do that, we just need to modify the elasticsearch.yml file and include the following property:
bootstrap.mlockall: true
This is one of the options. The second one is to set the property vm.swappiness in the /etc/sysctl.conf file to 0 (for complete swap disabling) or 1 for swapping only in emergency (for kernel versions 3.5 and above).
The third option is to disable swapping by editing /etc/fstab and removing the lines that contain the swap word. The following is an example /etc/fstab content:
LABEL=cloudimg-rootfs / ext4 defaults,discard 0 0
/dev/xvdb swap swap defaults 0 0
To disable swapping, we would just remove the second line from the above contents. We could also run the following command to disable swapping:
sudo swapoff -a
However, remember that this effect won’t persist across a reboot of the system, so this is only a temporary solution.
Also, remember that if you don’t have enough memory to run Elasticsearch, the operating system will just kill the process when swapping is disabled.
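The third option can be previewed before touching the real file. The following is a minimal sketch, assuming a standard fstab layout in which the third column holds the file system type; the function name comment_out_swap is ours and is not part of any tool. It comments the swap entries out instead of deleting them and only prints the result.

```shell
# Print an fstab-style file with every active swap entry commented out.
# The file itself is not modified; review the output before replacing anything.
# Assumption: the third whitespace-separated column is the file system type.
comment_out_swap() {
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Usage (not run here): comment_out_swap /etc/fstab
```

Commenting out rather than removing the line keeps the original entry visible in case swap needs to be re-enabled later.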
File descriptors
Make sure you have enough limits related to file descriptors for the user running Elasticsearch (when installing from official packages, that user will be called elasticsearch). If you don’t, you may end up with problems when Elasticsearch tries to flush the data and create new segments or merge segments together, which can result in index corruption.
To adjust the number of allowed file descriptors, you will need to adjust the /etc/security/limits.conf file (at least on most common Linux systems) and adjust or add an entry related to a given user (for both soft and hard limits). For example:
elasticsearch soft nofile 65536
elasticsearch hard nofile 65536
It is advised to set the number of allowed file descriptors to at least 65536, but even more can be needed, depending on your index size.
On some Linux systems, you may also need to load an appropriate limits module for the preceding setting to take effect. To load that module, you need to adjust the /etc/pam.d/login file and add or uncomment the following line:
session required pam_limits.so
There is also a possibility to display the number of file descriptors available for Elasticsearch by adding the -Des.max-open-files=true parameter to Elasticsearch startup parameters. For example, like this:
bin/elasticsearch -Des.max-open-files=true
When doing that, Elasticsearch will include information about the file descriptors in the logs:
[2015-12-20 00:22:19,869][INFO ][bootstrap] max_open_files [10240]
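A quick way to compare a limit with the advised minimum is sketched below. The helper name check_fd_limit is hypothetical, and the 65536 threshold is simply the figure quoted in this section; the limit reported in the example log line (10240) would be flagged as too low.

```shell
# Compare a file descriptor limit against the 65536 minimum advised above.
# Prints "ok" or "too low"; the value "unlimited" is treated as ok.
check_fd_limit() {
  limit="$1"
  minimum=65536
  if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$minimum" ]; then
    echo "ok"
  else
    echo "too low"
  fi
}

# Check the soft limit of the current shell. Run this as the user that
# starts Elasticsearch, otherwise the result describes a different account.
check_fd_limit "$(ulimit -n)"
```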
Virtual memory
Elasticsearch 2.2 uses a hybrid directory implementation, which is a combination of mmapfs and niofs directories. Because of that, especially when your indices are large, you may need a lot of virtual memory on your system. By default, the operating system limits the amount of memory mapped files and that can cause errors when running Elasticsearch. Because of that, we recommend increasing the default values. To do that, you just need to edit the /etc/sysctl.conf file and set the vm.max_map_count property; for example, to a value equal to 262144.
You can also change the value temporarily by running the following command:
sysctl -w vm.max_map_count=262144
The memory
Before thinking about Elasticsearch configuration related things, we should remember to give enough memory to Elasticsearch. In general, we shouldn’t give more than 50-60 percent of the total available memory to the JVM process running Elasticsearch. We do that because we want to leave memory for the operating system and for the operating system I/O cache. However, we need to remember that the 50-60 percent figure is not always right. You can imagine having nodes with 256 GB of RAM and having indices of 30 GB in total on such a node. In such circumstances, even assigning more than 60 percent of physical RAM to Elasticsearch would leave plenty of RAM for the operating system. It is also a good idea to set the Xmx and Xms properties to the same values to avoid JVM heap resizing.
Another thing to remember is the so called compressed oops (http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#compressedOop), the ordinary object pointers. The Java virtual machine can be told to use them by adding the -XX:+UseCompressedOops switch. This allows the Java virtual machine to use less memory to address the objects on the heap. However, this is only true for heap sizes less than or equal to 31 GB. Going for a larger heap means no compressed oops and higher memory usage for addressing the objects on the heap.
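The two rules above (about half of the RAM, and no more than 31 GB so that compressed oops remain usable) can be combined into a simple starting value. This is only a sketch under those assumptions; the function name recommended_heap_mb is ours, and the result should be adjusted for nodes like the 256 GB example, where a different split makes sense. ES_HEAP_SIZE is the environment variable that the Elasticsearch 2.x startup scripts use to set both Xms and Xmx to the same value.

```shell
# Suggest a JVM heap size in megabytes for a dedicated Elasticsearch node.
# Assumption: half of the physical RAM, capped at 31 GB (31744 MB) so that
# compressed oops stay enabled.
recommended_heap_mb() {
  total_mb="$1"
  half=$(( total_mb / 2 ))
  cap=$(( 31 * 1024 ))
  if [ "$half" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$half"
  fi
}

# Example: a machine with 64 GB of RAM
heap=$(recommended_heap_mb 65536)
echo "ES_HEAP_SIZE=${heap}m"
```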
Field data cache and breaking the circuit
As we know, by default the field data cache in Elasticsearch is unbounded. This can be very dangerous, especially when you are using aggregations and sorting on many fields that are analyzed, because they don’t use doc values by default. If those fields are high cardinality ones, then you can run into even more trouble. By trouble we mean running out of memory.
We have two different factors we can tune to be sure that we don’t run into out of memory errors. First of all, we can limit the size of the field data cache, and we should do that. The second thing is the circuit breaker, which we can easily configure to just throw exceptions instead of loading too much data. Combining these two things will ensure that we don’t run into memory issues.
However, we should also remember that Elasticsearch will evict data from the field data cache if its size is not enough to handle aggregation requests or sorting. This will affect the query performance, because loading the field data information is not very efficient and is resource intensive. However, in our opinion, it is better to have slower queries than to have our cluster blown up because of out of memory errors.
The field data cache and caches in general were discussed in the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail.
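As a reminder of which properties are involved, the sketch below prints the two elasticsearch.yml lines that bound the field data cache (indices.fielddata.cache.size) and set the field data circuit breaker (indices.breaker.fielddata.limit). The helper name fielddata_limits and the percentages are illustrative assumptions, not recommendations; the breaker limit is normally kept above the cache size so that eviction has a chance to happen before requests are rejected.

```shell
# Print elasticsearch.yml lines that bound the field data cache and set the
# field data circuit breaker. Both values are passed in by the caller.
fielddata_limits() {
  cache_size="$1"
  breaker_limit="$2"
  echo "indices.fielddata.cache.size: ${cache_size}"
  echo "indices.breaker.fielddata.limit: ${breaker_limit}"
}

# Example values only; tune them with repeatable performance tests.
fielddata_limits "30%" "50%"
```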
Use doc values
Whenever you plan to use sorting, aggregations, or scripting heavily, you should use doc values wherever you can. This will not only save you the memory needed for the field data cache, because fewer objects are produced, it will also make the Java virtual machine work better, with less time spent in garbage collection. Doc values were discussed in the Mappings Configuration section of Chapter 2, Indexing Your Data.
RAM buffer for indexing
In the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail, we also discussed the indexing buffer. There are a few things we would like to mention. First of all, the more RAM for the indexing buffer, the more documents Elasticsearch will be able to hold in memory. So the more memory we have for indexing, the less often the flush to disk will happen and fewer segments will be created. Because of that, your indexing will be faster. But of course, we don’t want Elasticsearch to occupy 100 percent of the available memory. Keep in mind that the RAM buffers are set per shard, so the amount of memory that will be used depends on the number of shards and replicas that are assigned on the given node and on the number of documents you index. You should set the upper limits so your node doesn’t blow up when it has multiple shards assigned.
Index refresh rate
Elasticsearch uses Lucene, as we know by now. The thing with Lucene is that the view of the index is not refreshed when new data is indexed or segments are created. To see the newly indexed data, we need to refresh the index. By default, Elasticsearch does that once every second, and the period of refresh is controlled by the index.refresh_interval property, specified per index. The shorter the refresh interval, the sooner the documents will be visible for search operations. However, that also means that Elasticsearch will need to put more resources into refreshing the index view, meaning that the indexing and searching operations will be slower. The longer the refresh interval, the more time you will have to wait before being able to see the data in the search results, but your indexing and querying will be faster.
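Because index.refresh_interval is specified per index, it can be changed on a live index through the update settings API. The following is a sketch only: the helper names refresh_body and set_refresh_interval are ours, and the host, port, and index name are assumptions carried over from the other examples in this chapter.

```shell
# Build the JSON body that sets index.refresh_interval.
refresh_body() {
  printf '{"index":{"refresh_interval":"%s"}}\n' "$1"
}

# Send it to a given index through the update settings API.
# Not called here, because it needs a running node on localhost:9200.
set_refresh_interval() {
  curl -XPUT "localhost:9200/$1/_settings" -d "$(refresh_body "$2")"
}

# Show the body that would be sent for a 30 second interval.
refresh_body 30s
```

Calling set_refresh_interval library 30s would then apply the interval to the library index.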
Thread pools
We haven’t talked about thread pools until now, but we would like to mention them here. Each Elasticsearch node holds several thread pools that control the execution queues for operations such as indexing or querying. Elasticsearch uses several pools to allow control over how the threads are handled and how much memory consumption is allowed for user requests.
Note
The Java virtual machine allows applications to use multiple threads, that is, to run multiple application tasks concurrently. For more information about Java threads, refer to http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html.
There are many thread pools (we can specify the type we are configuring by specifying the type property). However, for performance, the most important are:
generic: This is the thread pool for generic operations, such as node discovery. By default, the generic thread pool is of type cached.
index: This is the thread pool used for indexing and deleting operations. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 200.
search: This is the thread pool used for search and count requests. Its type defaults to fixed and its size to the number of available processors multiplied by 3 and divided by 2, with the size of the queue defaulting to 1000.
suggest: This is the thread pool used for suggest requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.
get: This is the thread pool used for real time get requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.
bulk: As you can guess, this is the thread pool used for bulk operations. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 50.
percolate: This is the thread pool for percolation requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.
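To get a feeling for what these defaults mean on a concrete machine, the list above can be turned into a small lookup. This is a sketch that only restates the figures quoted in the list; the function names are ours, and the exact rounding used by a given Elasticsearch version should be checked against its documentation.

```shell
# Default number of threads for a fixed pool, following the list above.
default_pool_size() {
  pool="$1"
  cores="$2"
  case "$pool" in
    search) echo $(( cores * 3 / 2 )) ;;
    index|suggest|get|bulk|percolate) echo "$cores" ;;
    *) echo "unknown pool: $pool" >&2; return 1 ;;
  esac
}

# Default queue size for a pool, following the list above.
default_queue_size() {
  case "$1" in
    index) echo 200 ;;
    bulk) echo 50 ;;
    search|suggest|get|percolate) echo 1000 ;;
    *) echo "unknown pool: $1" >&2; return 1 ;;
  esac
}

# Example: an 8 core node
echo "search threads: $(default_pool_size search 8), queue: $(default_queue_size search)"
echo "bulk threads: $(default_pool_size bulk 8), queue: $(default_queue_size bulk)"
```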
Note
Before Elasticsearch 2.1, we could control the type of the thread pool. Starting with Elasticsearch 2.1, we can no longer do that. For more information, please refer to the official documentation at https://www.elastic.co/guide/en/elasticsearch/reference/2.1/breaking_21_removed_features.html
For example, if we want to configure the thread pool for indexing operations to have a size of 100 and a queue of 500, we will set the following in the elasticsearch.yml configuration file:
threadpool.index.size: 100
threadpool.index.queue_size: 500
Also remember that the thread pool configuration can be updated using the cluster update API. For example, like this:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "threadpool.index.size" : 100,
    "threadpool.index.queue_size" : 500
  }
}'
In general, you don’t need to work with the thread pools and their configuration. However, when configuring your cluster, you may want to put more emphasis on indexing or querying and, in such cases, giving more threads or larger queues to the prioritized operation may result in more resources being used for such operations.
Horizontal expansion
Elasticsearch is a highly scalable search and analytics platform. We can scale it both horizontally and vertically. We discussed how to tune a single node in the Preparing a single Elasticsearch node section earlier in this chapter, and we would like to focus on horizontal scaling now: how to handle multiple nodes in the same cluster, what roles they should have, and how to tune the configuration to have a highly reliable, available, and fault tolerant cluster.
You can imagine vertical scaling as building a skyscraper: we have limited space available and we need to go as high as we can. Of course, that is expensive and requires a lot of engineering done right. On the other hand, we have horizontal scaling, which is like having many houses in a residential area. Instead of investing in hardware and having powerful machines, we choose to have multiple machines and our data split between them. Horizontal scaling gives us virtually unlimited scaling possibilities. Even with the most powerful hardware, the time comes when a single machine is not enough to handle the data, the queries, or both of them. In such cases, spreading the data among multiple servers is what saves us and allows us to have terabytes of data in multiple indices spread across the whole cluster, just like the one in the following image:
We have our four-node cluster with the library index created and built of four shards.
If we want to increase the querying capabilities of our cluster, we can just add additional nodes, for example, four of them. After adding new nodes to the cluster, we can either create new indices that will be built of more shards to spread the load more evenly or add replicas to the already existing shards. Both options are viable. We have to choose between them because we don’t have the possibility of splitting shards or adding more primary shards to an existing index. We should go for having more primary shards when our hardware is not enough to handle the amount of data it holds. In such cases, we usually run into out of memory situations, long shard query execution time, swapping, or high I/O waits. The second option, that is, having replicas, is the way to go when our hardware is happily handling the data we have but the traffic is so high that the nodes just can’t keep up.
The first option is simple, but let’s look at the second case: having more replicas. So with four additional nodes, our cluster would look as follows:
Now, let’s run the following command to add a single replica:
curl -XPUT 'localhost:9200/library/_settings' -d '{
  "index" : {
    "number_of_replicas" : 1
  }
}'
Our cluster view would look more or less as follows:
As you can see, each of the initial shards building the library index has a single replica stored on another node. The nice thing about shards and their replicas is that Elasticsearch is smart enough to balance the shards in a single index and put them on separate nodes. For example, you won’t ever end up in a situation where you have a shard and its replicas on the same node. Also, Elasticsearch is able to round robin the queries between the shards and their replicas, which means that all the nodes will be hit by the queries and we don’t have to care about that. Because of that, we are able to handle almost double the query load compared to our initial deployment.
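The arithmetic behind this layout is simple enough to script when planning a cluster. The following is a minimal sketch with function names of our own choosing; it counts shard copies (primaries plus replicas) and how many of them the busiest node holds, assuming Elasticsearch spreads them evenly.

```shell
# Total number of shard copies for an index: primaries plus all replicas.
total_shard_copies() {
  primaries="$1"
  replicas="$2"
  echo $(( primaries * (1 + replicas) ))
}

# Highest number of copies a single node holds, assuming an even spread.
max_copies_per_node() {
  copies="$1"
  nodes="$2"
  echo $(( (copies + nodes - 1) / nodes ))
}

# The library index from the text: four primary shards, one replica, eight nodes.
copies=$(total_shard_copies 4 1)
echo "shard copies: $copies"
echo "copies per node on 8 nodes: $(max_copies_per_node "$copies" 8)"
```

With eight copies on eight nodes, every node serves exactly one shard copy, which is why the query capacity roughly doubles compared to the four-node deployment.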
Automatically creating the replicas
Let’s stay a bit longer around replicas. Elasticsearch allows us to automatically expand replicas when the cluster is big enough. This means that the replicas can be created automatically when new nodes are added to the cluster. You may wonder where such functionality can be useful. Imagine a situation where you have a small index that you would like to be present on every node so that your plugins don’t have to run distributed queries just to get the data from it. In addition to that, your cluster is dynamically changing, that is, you add and remove nodes from it. The simplest way to achieve such functionality is to allow Elasticsearch to automatically expand the replicas. To do that, we need to set index.auto_expand_replicas to 0-all, which means that the index can have 0 replicas or be present on all the nodes. So if our small index is called shops and we would like Elasticsearch to automatically expand its replicas to all the nodes in the cluster, we would use the following command to create the index:
curl -XPOST 'localhost:9200/shops/' -d '{
  "settings" : {
    "index" : {
      "auto_expand_replicas" : "0-all"
    }
  }
}'
We can also update the settings of that index if it is already created by running the following command:
curl -XPUT 'localhost:9200/shops/_settings' -d '{
  "index" : {
    "auto_expand_replicas" : "0-all"
  }
}'
Redundancy and high availability
The Elasticsearch replication mechanism not only gives us the ability to handle higher query throughput, but also gives us redundancy and high availability. Imagine an Elasticsearch cluster hosting a single index called library that is built of 2 shards and 0 replicas. Such a cluster would look as follows:
Now what happens when one of the nodes fails? The simplest answer is that we lose about 50 percent of the data and, if the failure is fatal, we lose that data forever. Even when having backups, we would need to spin up another node and restore the backup, and that takes time. During that time, your application, or the parts of it that are based on Elasticsearch, can’t work correctly. If your business relies on Elasticsearch, downtime means money loss. Of course, we can use replicas to create more reliable clusters that can handle hardware and software failures. And one thing to remember is that everything will fail eventually: if the software won’t, the hardware will. For example, some time ago Google said that in each of their clusters, during the first year at least 1000 machines will fail (you can read more on that topic at http://www.cnet.com/news/google-spotlights-data-center-inner-workings/). Because of that, we need to be ready to handle such cases.
Let’s look at the same cluster but with one replica:
Now losing a single Elasticsearch node means that we still have the whole data available and we can work on restoring the full cluster structure without downtime. Of course, this is only a very small cluster built of two Elasticsearch nodes. The larger the cluster and the more replicas, the more failures you will be able to handle without worrying about data loss. Of course, you will have lower performance, depending on the percentage of nodes that fail, but the data will still be there and the cluster will be operational.
That’s why, when designing your architecture and deciding on the number of nodes and indices and their architecture, you should take into consideration how many node failures you want to live with. Of course, you can’t forget about the performance part of the equation, but redundancy and high availability should be one of the factors of the scaling equation.
Cost and performance flexibility
The default distributed nature of Elasticsearch and its ability to scale horizontally allow us to be flexible when it comes to the performance and costs that we have when running our environment. First of all, high end servers with high performance disks, numerous CPU cores, and a lot of RAM are still expensive. In addition to that, cloud computing is getting more and more popular, and if you need a lot of flexibility and don’t want to have your own hardware, you can choose solutions such as Amazon (http://aws.amazon.com/), Rackspace (http://www.rackspace.com/), DigitalOcean (https://www.digitalocean.com/), and so on. They not only allow us to run our software on rented machines, but also allow us to scale on demand. We just need to add more machines, which is a few clicks away or can even be automated with some degree of work.
Using a hosted solution with one-click machine renting allows us to have a truly horizontally scalable solution. Of course, that’s not cheap: you pay for the flexibility. But we can easily sacrifice performance if costs are the most crucial factor in our business plan. Of course, we can also go the other way. If we can afford large bare metal machines, Elasticsearch clusters can be pushed to hundreds of terabytes of data stored in the indices and still get decent performance (of course, with proper hardware and properly distributed).
Continuous upgrades
High availability, cost and performance flexibility, and virtually endless growth are not the only things worth talking about when discussing the scalability side of Elasticsearch. At some point in time, you will want to have your Elasticsearch cluster upgraded to a new version. It can be because of bug fixes, performance improvements, new features, or anything that you can think of. The thing is that when you have a single instance of each shard, without replicas, an upgrade means unavailability of Elasticsearch (or at least parts of it), and that may mean downtime of the applications that use Elasticsearch. This is another reason why horizontal scaling is so important; you can perform upgrades without downtime, at least to the extent that software such as Elasticsearch supports them. For example, you can take Elasticsearch 2.0 and upgrade to Elasticsearch 2.1 with only rolling restarts (getting one node out of the cluster, upgrading it, bringing it back, and continuing with the next node until all the nodes are done), thus having all the data still available for searching and indexing happening at the same time.
Multiple Elasticsearch instances on a single physical machine
Having a large physical machine with a lot of memory and CPU cores has advantages and some challenges. First of all, if you decide to run a single Elasticsearch node on that machine, you will sooner or later run into garbage collection issues, you will have lots of shards on a single node, which will require a high number of I/O operations for the internal Elasticsearch communication (retrieving cluster statistics), and so on. What’s more, you usually shouldn’t go above 31 GB of heap memory for a single JVM process because you can’t use compressed ordinary object pointers (https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html).
In such cases, you can either run multiple Elasticsearch instances on the same bare metal machine, run multiple virtual machines and a single Elasticsearch inside each one, or run Elasticsearch in a container, such as Docker (http://www.docker.com/). This is out of the scope of the book, but, because we are talking about scaling, we thought it may be a good thing to mention what can be done in such cases.
Note
There is also the possibility of running multiple Elasticsearch servers on a single physical machine without running multiple virtual machines. Which road to take, virtual machines or multiple instances, is really your choice. However, we like to keep things separate, and because of that we usually go for dividing any large server into multiple virtual machines. When dividing one large server into multiple smaller virtual machines, remember that the I/O subsystem will be shared across those smaller virtual machines. Because of that, it may be good to wisely divide the disks between the virtual machines.
Preventing a shard and its replicas from being on the same node
There is one additional thing worth mentioning. When you have multiple physical servers divided into virtual machines, it is crucial to ensure that the shard and its replica don’t end up on the same physical machine. By default, Elasticsearch is smart enough to not put the shard and its replica on the same Elasticsearch instance, but it doesn’t know anything about bare metal machines, so we need to tell it. We can tell Elasticsearch to separate the shards and replicas by using cluster allocation awareness. In our previous case, we had three physical servers. Let’s call them: server1, server2, and server3.
Now for each Elasticsearch on a physical server, we define the node.server_name property and we set it to the identifier of the server (the name of the property can be anything we want). So for example, for all Elasticsearch nodes on the first physical server, we would set the following property in the elasticsearch.yml configuration file:
node.server_name: server1
In addition to that, each Elasticsearch node (no matter on which physical server) needs to have the following property added to the elasticsearch.yml configuration file:
cluster.routing.allocation.awareness.attributes: server_name
It tells Elasticsearch not to put the primary shard and its replicas on the nodes with the same value in the node.server_name property. This is enough for us and Elasticsearch will take care of the rest.
Designated node roles for larger clusters
There is one more thing that we want to discuss and emphasise. When it comes to large clusters, it is important to assign roles to all the nodes in the cluster. This allows for a truly fault tolerant and highly available Elasticsearch cluster. The roles we can assign to each Elasticsearch node are as follows:
Master eligible node
Data node
Query aggregator node
By default, each Elasticsearch node is master eligible (it can serve as a master node), can hold data, and can work as a query aggregator node. You may wonder why separate roles are needed. Let us give you a simple example: if the master node is under a lot of stress, it may not be able to handle the cluster state related commands fast enough and the cluster could become unstable. This is only a single, simple example and you can think of numerous others.
Because of that, most Elasticsearch clusters that are larger than a few nodes usually look like the one presented in the following picture:
As you can see, our hypothetical cluster contains three client (query aggregator) nodes (because we know that there will be a lot of queries), a large number of data nodes because the amount of data will be large, and at least three master eligible nodes that shouldn’t be doing anything else. Why three master nodes when Elasticsearch will only use a single one at any given time? Again, because of redundancy and to be able to prevent split brain situations by setting discovery.zen.minimum_master_nodes to 2, which would allow us to easily handle the failure of a single master eligible node in the cluster.
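The value of 2 for three master eligible nodes follows the usual majority rule: more than half of the master eligible nodes must be visible before a master can be elected. A minimal sketch of that rule follows; the function name min_master_nodes is ours, and the node counts are examples.

```shell
# Majority quorum for discovery.zen.minimum_master_nodes:
# (number of master eligible nodes / 2) + 1, using integer division.
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

# Three master eligible nodes, as in the hypothetical cluster above.
echo "discovery.zen.minimum_master_nodes: $(min_master_nodes 3)"
```

Note that with only two master eligible nodes the rule also gives 2, which means that losing either of them leaves the cluster without a master; this is one more reason to start from three.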
Let us now give you snippets of the configuration for each type of node in our cluster. We already talked about that in the Understanding node discovery section in Chapter 9, Elasticsearch Cluster in Detail, but we would like to mention that once again.
Query aggregator nodes
The query aggregator nodes configuration is quite simple. To configure those, we just need to tell Elasticsearch that we don’t want those nodes to be master eligible or to hold data. This corresponds to the following configuration snippet in the elasticsearch.yml file:
node.master: false
node.data: false
Data nodes
Data nodes are also very simple to configure. We just need to say that they should not be master eligible. However, we are not big fans of default configurations (because they tend to change) and thus our Elasticsearch data nodes configuration looks as follows:
node.master: false
node.data: true
Master eligible nodes
We’ve left the master eligible nodes to the end of the general scaling section. Of course, such Elasticsearch nodes shouldn’t be allowed to hold data, but, in addition to that, it is a good practice to disable the HTTP protocol on such nodes. This is done to avoid accidentally querying those nodes. Master eligible nodes can use fewer resources than data and query aggregator nodes, and because of that we should ensure that they are only used for master related purposes. So our configuration for master eligible nodes looks more or less as follows:
node.master: true
node.data: false
http.enabled: false
Preparing the cluster for high indexing and querying throughput
Until this chapter, we mostly talked about different functionalities of Elasticsearch in terms of handling queries, indexing data, and tuning. However, running a cluster in production is not only about using this great search engine, but also about preparing the cluster to handle both the indexing and querying load. Let’s now summarize the knowledge we have and see what we need to care about when it comes to preparing the cluster for high indexing and querying throughput.
Indexing related advice
In this section, we will look at the indexing related advice around tuning Elasticsearch. In each production environment the data is different, the indexing rate is different, and the users’ behavior is different. Take that into consideration and run performance tests on your environment. This will give you the best idea about what to expect and what works best in the case of your system.
Index refresh rate
One of the general things you should pay attention to is the index refresh rate. We know that the refresh rate specifies how fast the documents will be visible for search operations. The equation is quite simple: the more frequent the refresh, the slower the queries will be and the lower the indexing throughput. If we can allow ourselves to have a longer refresh interval, such as 10s or 30s, we should go for it. It will put less pressure on Elasticsearch, Lucene, and the hardware in general. Remember that by default the refresh interval is set to 1s, which basically means that the index searcher object is reopened every second.
To give you a bit of insight into what performance gains we are talking about, we did some performance tests with Elasticsearch and different refresh intervals. With a refresh interval of 1s, we were able to index about 1000 documents per second using a single Elasticsearch node. Increasing the refresh interval to 5s gave us an increase in indexing throughput of about 25 percent, and we were able to index about 1250 documents per second. Setting the refresh interval to 25s gave us about 70 percent more throughput compared to the 1s refresh interval, which was about 1700 documents per second on the same infrastructure. It is also worth remembering that increasing the time indefinitely doesn’t make much sense, because after a certain point (depending on your data load and the amount of data you have) the increase in performance is negligible.
Some performance comparisons related to indexing throughput and index refresh rate can be found in the blog post at http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/.
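When repeating such tests on your own hardware, it helps to express each run as a gain over the baseline. The sketch below does that with integer arithmetic; the function name gain_percent is ours, and the inputs are the approximate figures quoted above.

```shell
# Percentage gain of a measured throughput over a baseline throughput
# (both in documents per second), rounded down to a whole percent.
gain_percent() {
  baseline="$1"
  measured="$2"
  echo $(( (measured - baseline) * 100 / baseline ))
}

# Figures from the tests described above (1s baseline, then 5s and 25s).
echo "5s refresh interval: +$(gain_percent 1000 1250)%"
echo "25s refresh interval: +$(gain_percent 1000 1700)%"
```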
Thread pools tuning
By default, Elasticsearch comes with very good defaults when it comes to all thread pools configuration. You should remember that tuning the default thread pools configuration should be done only when you really see that your nodes are filling up the queues and they still have processing power left that could be designated to the processing of the waiting operations, or when you want to increase the priority of one or more operations.
For example, if you did your performance tests and you saw your Elasticsearch instances not being saturated 100 percent, but on the other hand you experienced a rejected execution error, then that is the point when you should start adjusting the thread pools. You can either increase the number of threads that are allowed to be executed at the same time or increase the queue. Of course, you should also remember that increasing the number of concurrently running threads to very high numbers will lead to many CPU context switches (http://en.wikipedia.org/wiki/Context_switch) which will result in a performance drop.
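To make the "queues filling up and operations being rejected" signal concrete, here is a small Python sketch that scans per-pool statistics and reports the pools that have rejected work. The input shape (a dictionary of pool name to counters such as threads, queue, active, and rejected) is an assumption modeled on what the node statistics expose, and the sample numbers are invented for illustration.

```python
def pools_with_rejections(thread_pool_stats: dict) -> dict:
    """Return {pool_name: rejected_count} for pools that rejected at least one task."""
    return {
        name: stats.get("rejected", 0)
        for name, stats in thread_pool_stats.items()
        if stats.get("rejected", 0) > 0
    }


# Hypothetical statistics for two pools on one node.
sample_stats = {
    "bulk": {"threads": 8, "queue": 50, "active": 8, "rejected": 12},
    "search": {"threads": 13, "queue": 0, "active": 2, "rejected": 0},
}

if __name__ == "__main__":
    print(pools_with_rejections(sample_stats))
```

If a pool shows rejections while the node still has spare CPU, that is the situation in which adjusting the pool size or its queue is worth considering.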
Automatic store throttling
Before Elasticsearch 2.0, we had to care about how our segment merge process was configured and how much disk I/O merging could use in general, but that changed. Right now Elasticsearch looks at how the I/O subsystem behaves and adjusts the throttling and merging process if the merges are falling behind the indexing. So, we no longer need to manually adjust throttling for disk based operations. You can read more about the related changes on GitHub at https://github.com/elastic/elasticsearch/pull/9243.
Handling time-based data
When you have time-based data, such as logs for example, the architecture of your indices plays a very important role. Let’s assume that we have logs indexed into Elasticsearch. These usually come in large numbers, are constantly indexed, and are time related (an event that is logged happened at a certain point in time). The assumption is that you have a certain retention period for your data: a time during which you would like the data to be present and searchable in Elasticsearch. After that time, you just delete the data and forget about it.
With such assumptions in mind, you could just create a single index with a lot of shards and try to index large amounts of logs there. However, that’s not the perfect solution. First of all, because of merges: the larger the index gets, the more expensive the merges are. Elasticsearch needs to merge larger and larger segments, and more I/O and CPU is required to handle them. This means slowdowns. In addition to that, deletes will be expensive because you will have to delete the data either by using TTL or by using the delete by query plugin; both are expensive in terms of performance and will cause even more merging. And this is not everything: during querying you will have to run through the whole index to get even the smallest slice of the data. So, are there better index architectures for time-based data?
Yes, one of the most common and best solutions is to use time based indices. Depending on the data volume, you can have daily, weekly, monthly, or even hourly indices. The downside is the number of shards you will have when the number of indices grows, but apart from that there are only pros: you can control each index, change the number of shards if that is needed, and have faster merging because the indices will be smaller compared to only one big index. What’s more, deleting data won’t be painful at all: the idea is to delete whole indices; for example, a day’s worth of data in the case of daily indices. Queries will also benefit: you can just run the query on a single time based index to narrow down the search results. Finally, Elasticsearch, by default, will create the indices for us. For example, when using daily indices, we can have names such as logs_2016-01-01, logs_2016-01-02, and so on.
The only thing we need to care about is providing the index name on the basis of the date and creating templates to configure each newly created index; Elasticsearch will do the rest.
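The naming and retention logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are invented, the template dictionary uses hypothetical shard and replica counts, and the "template" key reflects the index template format as we understand it for the 2.x line.

```python
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Name of the daily index for `day`, for example logs_2016-01-01."""
    return f"{prefix}_{day.isoformat()}"


def expired_indices(prefix: str, today: date, retention_days: int, existing: list) -> list:
    """Return the daily indices whose date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "_"):
            continue  # not one of our time based indices
        try:
            day = date.fromisoformat(name[len(prefix) + 1:])
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


# Hypothetical template body; shard and replica counts are illustrative only.
LOGS_TEMPLATE = {
    "template": "logs_*",
    "settings": {"number_of_shards": 2, "number_of_replicas": 1},
}
```

An indexer would call daily_index_name for each incoming event, while a periodic job could pass the list of existing index names to expired_indices and delete whatever it returns as whole indices.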
Multiple data paths
With the release of Elasticsearch 2.0, we were given the ability to specify multiple path.data properties in our elasticsearch.yml pointing to different directories on different physical devices. Elasticsearch can now leverage that by putting different shards on different devices and using the multiple paths in the most efficient way. Because of that, we can parallelize writing to disks if we have more than a single disk. This is especially useful for high indexing use cases where you index a lot of data.
Data distribution
As we know, each index in the Elasticsearch world can be divided into multiple shards and each shard can have multiple replicas. In cases when you have multiple Elasticsearch nodes (and you probably will have in production), you should think about the number of shards and replicas and how that will affect your nodes. Data distribution may be crucial to even out the load on the cluster and not have some nodes doing more work than the other ones.
Let’s take the following example. Imagine we have a cluster that is built of 4 nodes and it has a single index called book built of 3 shards and one replica. Such a deployment will look as follows:
As you can see, the first two nodes have two physical shards allocated to them, while the last two nodes have only one shard allocated each. The actual data allocation is not even. When sending the queries and indexing data, we will have the first two nodes do more work than the other two; this is what we want to avoid. One option is to have the book index have two shards and one replica, so it looks as follows:
This architecture will work and it is perfectly fine. We don’t have to have primary shards on all our nodes, we can have replicas, depending on what bottleneck we expect. For querying we may want to have more replicas, for indexing more primaries.
We can also have our primary shards split evenly, like in the following image:
The thing to remember though is that in both cases we will end up with an even distribution of shards and replicas and Elasticsearch will do a similar amount of work on all the nodes. Of course, with more indices (like having daily indices) it may be trickier to get the data evenly distributed and it may not be possible to have evenly distributed shards, but we should try to get to such a point.
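The arithmetic behind the examples in this section can be checked with a short Python sketch. It assumes the simplest possible model, in which all physical shard copies (primaries plus replicas) are spread over the nodes as evenly as possible; the real allocation is decided by Elasticsearch and may differ, and the function name is made up for this illustration.

```python
def copies_per_node(primary_shards: int, replicas: int, nodes: int) -> list:
    """Spread all physical shard copies over the nodes as evenly as possible."""
    total_copies = primary_shards * (1 + replicas)
    base, extra = divmod(total_copies, nodes)
    # The first `extra` nodes carry one more copy than the rest.
    return [base + 1 if i < extra else base for i in range(nodes)]


if __name__ == "__main__":
    # 3 shards with 1 replica on 4 nodes: two nodes hold two copies each.
    print(copies_per_node(3, 1, 4))
    # 2 shards with 1 replica on 4 nodes: one copy on every node.
    print(copies_per_node(2, 1, 4))
```

The distribution is even only when the total number of copies is a multiple of the number of nodes, which is a quick check to run when choosing shard and replica counts.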
One more thing to remember when it comes to data distribution and shards and replicas is that when designing your index architecture, you should remember what you want to achieve. If you are going for a very high indexing use case, you may want to spread the index into multiple shards to lower the pressure that is put on the CPU and the I/O subsystem of the server. This is also true for running expensive queries, because with more shards you can lower the load on a single server. However, with the queries there is one more thing: if your nodes can’t keep up with the load caused by queries, you can add more Elasticsearch nodes and increase the number of replicas so that the physical copies of the primary shards are placed on those nodes. That will make indexing a bit slower but will give you the capacity to handle more queries at the same time.
Bulk indexing
This is very obvious advice, but you would be surprised how many Elasticsearch users forget about indexing data in bulks instead of sending the documents one by one. So the advice here is to do bulks instead of one by one indexing whenever possible. The thing to remember though is not to overload Elasticsearch with too many bulk requests and to keep them under a reasonable size (do not push millions of documents in a single request). Remember about the bulk thread pool and its size and try to adjust your indexers not to go beyond it, or you will first start to queue these requests and, if Elasticsearch is not able to process them, you will quickly start seeing rejected execution exceptions and your data won’t be indexed.
Just as an example, we would like to show the results of tests we did some time ago for the two types of indexing: one by one and bulks. In the following image, we have the indexing throughput when indexing documents one by one:
In this next image, we do the same, but instead of indexing documents one by one, we index them in batches of 10 documents (which is still a relatively low number of documents in a bulk):
As you can see, when indexing documents one by one, we were able to index about 30 documents per second and it was stable. The situation changed with bulk indexing and batches of 10 documents; we were able to index slightly more than 200 documents per second. So the difference can be clearly seen.
Of course this is a very basic comparison of indexing speed. To show the real difference, we should use dozens of threads and push Elasticsearch to its limits. However, the preceding comparison should give you a basic view of the indexing throughput gains when using bulk indexing.
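For readers who want to see what batching looks like on the client side, the following Python sketch splits a list of documents into fixed-size batches and renders each batch in the newline-delimited format used by the bulk endpoint (an action line followed by the document source). The index name, type name, and helper names are hypothetical, and sending the body over HTTP is left out on purpose.

```python
import json


def chunked(items: list, size: int):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(index: str, doc_type: str, docs: list) -> str:
    """Render documents as a bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk format expects every line, including the last one, to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    documents = [{"title": f"Book {i}"} for i in range(25)]
    for batch in chunked(documents, 10):
        body = bulk_body("library", "book", batch)
        print(len(batch), "documents,", len(body), "bytes")
```

Keeping the batch size as a parameter makes it easy to experiment with different values and watch the throughput and the rejected execution counters while doing so.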
RAM buffer for indexing
Remember, the more available RAM for the indexing buffer (the indices.memory.index_buffer_size property), the more documents Elasticsearch can hold in memory. However, we don’t want to have Elasticsearch occupy 100 percent of the available memory. The indexing buffer can help us with delaying the flush to disk, which will mean less I/O pressure and fewer merges. You can read more about indexing buffer configuration in Chapter 9, Elasticsearch Cluster in Detail.
Advice for high query rate scenarios
One of the great features of Elasticsearch is its ability to search and analyze the data that was indexed. However, sometimes it is necessary to adjust Elasticsearch and our queries to not only get the results of the query, but also get them fast (or in a reasonable amount of time). In this section, we will look at the possibilities of preparing Elasticsearch for high query throughput use cases, but not just that. We will also look at general performance tips when it comes to querying.
Shard request cache
The purpose of the shard request cache is to cache aggregations, suggester results, and numbers of hits (it will not cache the returned documents and thus only works with size=0). When your queries use aggregations or suggestions, it may be a good idea to enable this cache (it is disabled by default) so that Elasticsearch can re-use the data stored there. The best thing about the cache is that it promises the same near real-time search as a search that is not cached. You can read more about caches and the shard request cache in particular in Chapter 9, Elasticsearch Cluster in Detail.
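The following Python sketch shows the shape of a request that can benefit from the shard request cache: it asks for aggregation buckets only and sets size to 0. The index.requests.cache.enable setting name is our understanding of how the cache is switched on for an index in the 2.x line, so treat it as an assumption and verify it against the documentation of your version; the helper names and the field name are invented for the example.

```python
def cacheable_aggregation_request(field: str, buckets: int) -> dict:
    """Build a search body that returns only aggregation results (no hits)."""
    return {
        "size": 0,  # no documents returned, which is what the cache requires
        "aggs": {
            "by_" + field: {"terms": {"field": field, "size": buckets}}
        },
    }


def returns_no_hits(body: dict) -> bool:
    """True when the request asks for zero documents and is therefore a cache candidate."""
    return body.get("size") == 0


# Assumed per-index setting that enables the cache (check your version's documentation).
ENABLE_CACHE_SETTINGS = {"index.requests.cache.enable": True}

if __name__ == "__main__":
    request = cacheable_aggregation_request("tag", 10)
    print(request, returns_no_hits(request))
```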
Think about the queries
This is the most general advice we can actually give: you should always think about optimal query structure, filter usage, and so on. For example, let’s look at the following query:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mastering AND department:it AND category:book",
            "default_field": "name"
          }
        },
        {
          "term": {
            "tag": "popular"
          }
        },
        {
          "term": {
            "tag": "2014"
          }
        }
      ]
    }
  }
}
It returns the book matching a few conditions. However, there are a few things we can improve in the preceding query. For example, we can move the static things such as the tag, department, and category field related conditions to the filter section of the Boolean query, so that the next time we use some parts of the query we save CPU cycles and re-use the information stored in cache. That static filtering information is also not relevant when it comes to scoring. Because of that we can move those static elements to the filter section and omit scoring calculation for them. For example, this is what the optimized query will look like:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mastering",
            "default_field": "name"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "tag": "popular"
          }
        },
        {
          "term": {
            "tag": "2014"
          }
        },
        {
          "term": {
            "department": "it"
          }
        },
        {
          "term": {
            "category": "book"
          }
        }
      ]
    }
  }
}
As you can see, there are a few things that we did. We still used the bool query, but we introduced the use of the filter section. We used filtering for the static, non-analyzed fields. This allows us to easily re-use the filters in the next queries that we execute. Because of such query restructuring, we were able to simplify the main query. This is exactly what you should be doing when optimizing your queries or designing them: have optimization and performance in mind and try to keep them as optimal as they can be. This will result in faster execution of the queries, lower resource consumption, and better health of the whole Elasticsearch cluster.
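Part of this restructuring can be automated. The Python sketch below takes a bool query and moves every term clause from the must section to the filter section, leaving scored clauses such as query_string where they are. It is a simplified illustration with an invented function name: it does not split conditions embedded in a query_string expression (such as department:it), which still has to be done by hand.

```python
import copy


def move_terms_to_filter(search_body: dict) -> dict:
    """Return a copy of the body with `term` clauses moved from must to filter."""
    result = copy.deepcopy(search_body)
    bool_query = result["query"]["bool"]
    scored = []
    filters = list(bool_query.get("filter", []))
    for clause in bool_query.get("must", []):
        if "term" in clause:
            filters.append(clause)  # static condition, no scoring needed
        else:
            scored.append(clause)   # keep clauses that contribute to the score
    bool_query["must"] = scored
    bool_query["filter"] = filters
    return result


original = {
    "query": {
        "bool": {
            "must": [
                {"query_string": {"query": "mastering", "default_field": "name"}},
                {"term": {"tag": "popular"}},
                {"term": {"tag": "2014"}},
            ]
        }
    }
}

if __name__ == "__main__":
    print(move_terms_to_filter(original))
```

The function works on a deep copy, so the original request stays untouched and both versions can be compared or benchmarked side by side.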
Parallelize your queries
One thing that is usually forgotten is the need for parallelizing queries. Imagine that you have a dozen nodes in your cluster but your index is built of a single shard. If the index is large, your queries will perform worse than you expect. Of course you can increase the number of replicas, but that won’t help. A single query will still go to a single shard in that index, because replicas are nothing more than copies of the primary shard and they contain the same data (or at least they should). This is true not only for indices having one shard: if you have more than one shard but they are very large, you can still have performance related problems. It is said that the query is only as fast as the slowest partial query response.
Of course, the parallelization also depends on the use case. If you run a lot of queries to Elasticsearch, you may not need to parallelize the queries, especially when the shards are small enough and you don’t see problems at shard level. In general, look at your Elasticsearch nodes and see if they have unused CPU cores and, if that’s the case, you may have room for improvement and parallelization.
Field data cache and breaking the circuit
We have two different factors we can tune to be sure that we don’t run into out of memory errors. First of all, we can limit the size of the field data cache. The second thing is the circuit breaker, which we can easily configure to just throw an exception instead of loading too much data. Combining these two things will ensure that we don’t run into memory issues. Even if you are using doc values a lot, you may still run into out of memory issues. For example, analyzed fields can’t use doc values and will use the field data cache, so configure the field data cache and circuit breakers correctly. You can read more about how to configure them in Chapter 9, Elasticsearch Cluster in Detail.
Keep size and shard size under control
When dealing with some of the queries that use aggregations, we have the possibility of using two properties: size and shard_size. The size parameter defines how many buckets should be returned by the final aggregation results; the node that aggregates the final results will get the top buckets from each shard that returns the result and will only return the top size of them to the client. The shard_size parameter tells Elasticsearch about the same but at the shard level. Increasing the value of the shard_size parameter will lead to more accurate aggregations (like in the case of significant terms aggregation) at the cost of network traffic and memory usage. Lowering that parameter will cause aggregation results to be less precise, but we will benefit from lower memory consumption and lower network traffic. If we see that the memory usage is too large, we can lower the size and shard_size properties for problematic queries and see if the quality of the results is still acceptable.
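The trade-off between size and shard_size is easier to see with a toy simulation. The Python sketch below is not how Elasticsearch is implemented; it only mimics the idea that each shard reports its top shard_size terms and the coordinating node merges them and keeps the top size. The shard contents and the function name are made up for the illustration.

```python
from collections import Counter


def top_terms(shard_counts: list, size: int, shard_size: int) -> list:
    """Merge the top `shard_size` terms of every shard and return the overall top `size`."""
    merged = Counter()
    for counts in shard_counts:
        for term, count in Counter(counts).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Two hypothetical shards: "y" is the most frequent term overall (4 + 4 = 8),
# but it is not the top term on either shard.
shards = [
    {"x": 6, "y": 4},
    {"z": 5, "y": 4},
]

if __name__ == "__main__":
    print(top_terms(shards, size=1, shard_size=1))  # "y" is missed
    print(top_terms(shards, size=1, shard_size=2))  # "y" is found
```

With shard_size equal to 1 each shard sends a single term and the globally most frequent one is lost, while a shard_size of 2 recovers it at the cost of shipping twice as many buckets from every shard.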
Monitoring
Elasticsearch monitoring APIs expose a lot of information, both about the search engine itself as well as about the environment, such as the operating system. We saw that in Chapter 10, Administrating Your Cluster. Because of this and the ease of retrieving this information, numerous applications were built: ones that allow us to do monitoring and beyond. Some of these applications are simple and just read the data in real time without any persistent storage, while others allow us to read historical data about our cluster behavior. In this chapter, we will only scratch the surface of the information about such applications, but we strongly advise you to get familiar with some of them as they can make your everyday work with Elasticsearch easier.
We chose three examples of monitoring solutions which take different approaches to integration with Elasticsearch. The first two tools are available as Elasticsearch plugins and the third takes a different approach to integration.
Elasticsearch HQ
This tool is available as an Elasticsearch plugin but can also be downloaded separately as a JavaScript application run in a browser.
Elasticsearch HQ uses JavaScript and AJAX techniques where data is fetched periodically from the cluster, prepared for visualization on the browser side, and shown to the user.
The tool allows us to track statistics on a particular node. The browser can present vital information about the cluster and particular nodes. The following screenshot shows the graphical user interface from Elasticsearch HQ:
We have the basic information about the cluster, the number of nodes, and Elasticsearch health. We can also see which node we are looking at and some statistics about the node, which include the memory usage (both heap and non-heap), the number of threads, Java virtual machine garbage collector work, and so on. The plugin also presents simplified information about schema and shards and allows execution of simple queries.
In order to install Elasticsearch HQ, one should just run the following command:
bin/plugin install royrusso/elasticsearch-HQ
After that, the GUI will be available at http://localhost:9200/_plugin/hq/.
One thing to remember is that Elasticsearch HQ doesn’t persist the fetched data anywhere, so the data is only fetched when your browser is running and has Elasticsearch HQ opened. If something has happened in the past, you won’t be able to diagnose it.
Marvel
Marvel is the tool created by the Elasticsearch team. In the current version, it is built as a plugin for a visualization platform called Kibana (https://www.elastic.co/products/kibana).
Note
Kibana is out of the scope of this book. You can find more about Kibana on the official product page available at https://www.elastic.co/.
Marvel also visualizes basic information about clusters and nodes by drawing nice graphs that are dynamically updated over time. The main difference from Elasticsearch HQ is that the performance data is stored on the server side (in the same or external Elasticsearch cluster), so historical data is available. The example screenshot is presented next:
The installation procedure for Marvel contains three steps. The first two install the license and Marvel agent plugins in Elasticsearch:
bin/plugin install license
bin/plugin install marvel-agent
And finally, the third step is to install the Marvel plugin in Kibana by running the following command:
bin/kibana plugin --install elasticsearch/marvel/latest
SPM for Elasticsearch
This tool presents a different approach than the previously mentioned tools. SPM is a Software as a Service (SaaS) solution created for monitoring Elasticsearch installations of any size, and it allows monitoring several clusters and different technologies. Though its roots are SaaS-based, it is also available on premises, which means that you can run SPM on your own machines without the need for sending your metrics to the cloud.
Information is sent by simple client software installed on the Elasticsearch machine to the SPM servers. The main advantage is the possibility of storing information for a wider range of time and seeing what was happening in the past. You can create your own dashboards and correlate metrics with logs between multiple applications (SPM allows you to monitor a wide variety of applications).
The following screenshot shows the dashboard of SPM for Elasticsearch:
The overview dashboard shown in the preceding screenshot provides information about the cluster nodes, the request rate and latency, the number of documents in the indices, CPU usage, load, memory details, Java virtual machine memory, the disk space usage, and finally network traffic. You can get detailed information about each of these elements by going into the tab dedicated to it.
You can find additional information about SPM installation and available options at http://sematext.com/spm/index.html.
Summary
In this chapter, we focused on scaling and tuning Elasticsearch. We started with the hardware preparations and decisions we need to make. Next, we tuned a single Elasticsearch node as much as we could and after that we configured the whole cluster to work as well as it could. We discussed vertical expansion possibilities and we learned how to monitor our cluster once it hits the production environment.
So now we have reached the end of the book. We hope that it was a nice reading experience and that you found the book interesting. Since the previous edition of the book, Elasticsearch has changed a lot, not only when it comes to versions, but also when it comes to functionalities. Some of the features are no longer there, some of them were moved to plugins, and of course new features were added. We really hope that you have learned something from this book and now you will find it easier to use Elasticsearch every day, no matter if you are a beginner in this world or a semi-experienced Elasticsearch user. As the authors of this book, but also as Elasticsearch users ourselves, we tried to bring you, our readers, the best reading experience we could. Of course Elasticsearch is more than we described in the book, especially when it comes to monitoring and administration capabilities and API. However, the number of pages is limited and if we were to describe everything in great detail we would have ended up with a book one thousand pages long. We need to remember that Elasticsearch is not only user friendly but also provides a large amount of configuration options, querying possibilities, and so on. Due to that, we had to choose which functionalities to describe in greater detail, which could only be mentioned, and which had to be skipped entirely. As with the two previous editions of the book you are holding, we hope that we made the right choice and that you are happy about what you’ve read.
We would also like to say that it is worth remembering that Elasticsearch is constantly evolving. When writing this book, we went through a few stable versions, finally making it to the release of Elasticsearch 2.2. Even back then we knew that new features and improvements were coming, like some of the changes mentioned in the book that will be part of the next release, or at least they are planned to be. Be sure to check the official documentation of Elasticsearch periodically for the release notes for new versions of Elasticsearch, if you want to be up to date with the new features being added. We will also be writing about new features that we think are worth mentioning on www.elasticsearchserverbook.com. So if you are interested, visit the site from time to time.
Once again thank you for the time you’ve spent with the book.
IndexA
advices,forhighqueryratescenariosabout/Adviceforhighqueryratescenariosshardrequestcache/Shardrequestcachequeries/Thinkaboutthequeriesqueries,parallelizing/Parallelizeyourqueriesfielddatacache/Fielddatacacheandbreakingthecircuitcircuit,breaking/Fielddatacacheandbreakingthecircuitsize,controlling/Keepsizeandshardsizeundercontrolshardsize,controlling/Keepsizeandshardsizeundercontrol
aggregationengineworking/Insidetheaggregationsengine
aggregationsabout/Aggregationsgeneralquerystructure/Generalquerystructuretypes/Aggregationtypesdate_histogram/Datehistogramaggregationgeodistanceaggregations/Geodistanceaggregationsgeohashgridaggregation/Geohashgridaggregationglobalaggregation/Globalaggregationsignificant_termsaggregation/Significanttermsaggregationsampleraggregation/Sampleraggregationchildrenaggregation/Childrenaggregationnestedaggregation/Nestedaggregationreverse_nestedaggregation/Reversenestedaggregationnestingaggregations/Nestingaggregationsandorderingbuckets
aggregations,typesmetrics/Aggregationtypes,Metricsaggregationsbuckets/Aggregationtypes,Bucketsaggregationspipeline/Aggregationtypes
AmazonURL/Costandperformanceflexibility
AmazonS3URL/Creatingasnapshotrepository
AnalyzeAPIURL/Definingyourownanalyzers
analyzersusing/Usinganalyzersout-of-the-boxanalyzers/Out-of-the-boxanalyzersdefining/Definingyourownanalyzersdefaultanalyzers/Defaultanalyzers
ApacheLucene/GettingbacktoApacheLuceneURL/Fulltextsearchingglossary/TheLuceneglossaryandarchitecturearchitecture/TheLuceneglossaryandarchitecturedocument/TheLuceneglossaryandarchitecturefield/TheLuceneglossaryandarchitectureterm/TheLuceneglossaryandarchitecturetoken/TheLuceneglossaryandarchitecturetokenizer/Inputdataanalysisscoring/IntroductiontoApacheLucenescoring
ApacheLuceneJavadocsfortheTFIDFURL/Scoringandqueryrelevance
ApacheLucenescoringabout/IntroductiontoApacheLucenescoringdocumentmatching,factors/Whenadocumentismatcheddefaultscoringformula/Defaultscoringformularelevantdocuments/Relevancymatters
ApacheSolrURL/UsingApacheSolrsynonyms
ApacheSolrsynonymsusing/UsingApacheSolrsynonymsexplicitsynonyms/Explicitsynonymsequivalentsynonyms/Equivalentsynonymsexpandproperty/Expandingsynonyms
ApacheTikaURL/Detectingthelanguageofthedocument
arbitrarygeoshapesabout/Arbitrarygeoshapespoint/Pointenvelope/Envelopepolygon/Polygonmultipolygon/Multipolygonexampleusage/Anexampleusagestoring,inindex/Storingshapesintheindex
arguments,CatAPIURL/Commonarguments
attributes,indexstructuremappingindex_name/Commonattributesindex/Commonattributesstore/Commonattributesdoc_values/Commonattributesboost/Commonattributesnull_value/Commonattributescopy_to/Commonattributes
include_in_all/Commonattributesprecision_step/Number,Datecoerce/Numberignore_malformed/Number,Dateformat/Dateformat,referencelink/Datenumeric_resolution/Date
availableobjects,scriptexecution_doc/Objectsavailableduringscriptexecution_source/Objectsavailableduringscriptexecution_fields/Objectsavailableduringscriptexecution
AzureURL/Creatingasnapshotrepository
Bbasicqueries
about/Basicqueriestermquery/Thetermquerytermsquery/Thetermsquerymatchallquery/Thematchallquerytypequery/Thetypequeryexistsquery/Theexistsquerymissingquery/Themissingquerycommontermsquery/Thecommontermsquerymatchquery/Thematchquerymultimatchquery/Themultimatchqueryquerystringquery/Thequerystringquerysimplequerystringquery/Thesimplequerystringqueryidentifiersquery/Theidentifiersqueryprefixquery/Theprefixqueryfuzzyquery/Thefuzzyquerywildcardquery/Thewildcardqueryrangequery/Therangequeryregularexpressionquery/Regularexpressionquerymorelikethisquery/Themorelikethisquery
batchindexingused,forspeedingupindexingprocess/Batchindexingtospeedupyourindexingprocess
Booleanpropertiessetnode.master/Configuringnoderolesnode.data/Configuringnoderolesnode.client/Configuringnoderoles
boolqueryabout/Theboolqueryshouldsection/Theboolquerymustsection/Theboolquerymust_notsection/Theboolqueryfilterparameter/Theboolqueryboostparameter/Theboolqueryminimum_should_matchparameter/Theboolquerydisable_coordparameter/Theboolqueryused,forexplicitfiltering/Explicitfilteringwithboolquery
boostingquery/Theboostingqueryboost_modeparameter
multiplyvalue/Structureofthefunctionqueryreplacevalue/Structureofthefunctionquerysumvalue/Structureofthefunctionquery
avgvalue/Structureofthefunctionquerymaxvalue/Structureofthefunctionqueryminvalue/Structureofthefunctionquery
bucketaggregationsordering/Nestingaggregationsandorderingbuckets,Bucketsordering
buckets/Generalquerystructurebucketsaggregations
about/Bucketsaggregationsfilteraggregation/Filteraggregationfiltersaggregation/Filtersaggregationtermsaggregation/Termsaggregationrangeaggregation/Rangeaggregationdate_rangeaggregation/Daterangeaggregationip_rangeaggregation/IPv4rangeaggregationmissingaggregation/Missingaggregationhistogramaggregation/Histogramaggregation
bulkindexingdata,preparing/Preparingdataforbulkindexing
Ccaches
about/Elasticsearchcachesfielddatacache/Fielddatacachefielddata,usingwithdocvalues/Fielddataanddocvaluesshardrequestcache/Shardrequestcachenodequerycache/Nodequerycacheindexingbuffers/Indexingbuffersavoiding,scenarios/Whencachesshouldbeavoided
CatAPIabout/TheCatAPIdefining/Thebasicsusing/UsingCatAPIcommonarguments/Commonargumentsexamples/Theexamples,Gettinginformationaboutthenodes
childrenaggregationabout/Childrenaggregation
CIDRnotationURL/IPv4rangeaggregation
ClassDateTimeFormatURL/Tuningthetypedeterminingmechanismfordates
clientnodeabout/Noderoles,Clientnode
clusterabout/Nodesandclustersinstalling/Installingandconfiguringyourclusterconfiguring/Installingandconfiguringyourclusterdirectorylayout/Thedirectorylayoutsystem-specificinstallationandconfiguration/Thesystem-specificinstallationandconfiguration
clusterhealthAPIabout/ClusterhealthAPIinformationdetails,controlling/Controllinginformationdetailsadditionalparameters/Additionalparameters
clusterrebalancingcontrolling/Controllingclusterrebalancingdefining/Understandingrebalanceimplementing/Clusterbeingreadysettings/Theclusterrebalancesettings,Controllingthenumberofshardsbeingmovedbetweennodesconcurrently
clustersettingsAPI/TheclustersettingsAPIclusterwideallocation
about/Cluster-wideallocation
allocationawareness/Allocationawarenessallocationawareness,forcing/Forcingallocationawarenessfiltering/Filtering
CMSsystemURL/Creatinganewdocument
commontermsquery/Thecommontermsquerycompletionsuggester
about/CompletionsuggesterinElasticsearch2.2/Completionsuggester
completionsuggester,Elasticsearch2.1data,indexing/Indexingdataindexeddata,querying/Queryingindexedcompletionsuggesterdatacustomweights/Customweights
completionsuggester,Elasticsearch2.2about/Completionsuggester
compoundqueriesabout/Compoundqueriesboolquery/Theboolquerydis_maxquery/Thedis_maxqueryboostingquery/Theboostingqueryconstant_scorequery/Theconstant_scorequeryindicesquery/Theindicesquery
compressedoopsURL/Thememory
compressedordinaryobjectpointersreferencelink/MultipleElasticsearchinstancesonasinglephysicalmachine
configurationoptions,phrasesuggestermax_errors/Configurationseparator/Configuration
configurationoptions,termsuggestertext/Termsuggesterconfigurationoptionsfield/Termsuggesterconfigurationoptionsanalyzer/Termsuggesterconfigurationoptionssize/Termsuggesterconfigurationoptionssuggest_mode/Termsuggesterconfigurationoptionssort/Termsuggesterconfigurationoptions
constant_scorequery/Theconstant_scorequerycontent
searching,indifferentlanguages/Searchingcontentindifferentlanguagescontent,searchingindifferentlanguages
about/Searchingcontentindifferentlanguageslanguages,handling/Handlinglanguagesdifferentlymultiplelanguages,handling/Handlingmultiplelanguagesdocumentlanguage,detecting/Detectingthelanguageofthedocument
sampledocument/Sampledocumentmappings/Themappingsdata,querying/Queryingqueries,combining/Combiningqueries
contextsuggesterabout/Contextsuggestertypes/Contexttypesusing/Usingcontextgeolocationcontext,using/Usingthegeolocationcontext
contextswitchesreferencelink/Threadpoolstuning
coretypes,indexstructuremappingabout/Coretypescommonattributes/Commonattributesstring/Stringnumber/Numberboolean/Booleanbinary/Binarydate/Date
counttoitfield/Addingpartialdocumentscreate,retrieve,update,delete(CRUD)
URL/ManipulatingdatawiththeRESTAPIcURLcommand
URL/InstallingElasticsearch
Ddata
manipulating,withRESTAPI/ManipulatingdatawiththeRESTAPIstoring,inElasticsearch/StoringdatainElasticsearchpreparing,forbulkindexing/Preparingdataforbulkindexingindexing/Indexingthedata_allfield/The_allfield_sourcefield/The_sourcefieldinternalfields/Additionalinternalfieldssorting/Sortingdatadefaultsorting/Defaultsortingquerying,inchilddocuments/Queryingdatainthechilddocumentsquerying,inparentdocuments/Queryingdataintheparentdocuments
data,thatisnotflatindexing/Indexingdatathatisnotflatdata/Dataobjects/Objectsarrays/Arraysmappings/Mappingsdynamicbehavior/Tobeornottobedynamicobjectindexing,disabling/Disablingobjectindexing
datanodeabout/Noderoles
dataquerying,casesidentifiedlanguage,using/Querieswithanidentifiedlanguageunknownlanguage,using/Querieswithanunknownlanguage
datasetsforegroundsets/Choosingsignificanttermsbackgroundsets/Choosingsignificantterms
datasortingabout/Sortingdatadefaultsorting/Defaultsortingfields,selecting/Selectingfieldsusedforsortingmode/Sortingmodebehaviorformissingfields,specifying/Specifyingbehaviorformissingfieldsdynamiccriteria/Dynamiccriteriascoring,calculating/Calculatescoringwhensorting
date_histogramaggregationsabout/Datehistogramaggregationtimezones/Timezones
DEBpackageused,forinstallingElasticsearch/InstallingElasticsearchusingtheDEBpackage
defaultindexing/Defaultindexingderivativeaggregation
URL/Derivativeaggregationdesignatednodesrolesforlargerclusters
about/Designatednoderolesforlargerclustersqueryaggregatornodes/Queryaggregatornodesdatanodes/Datanodesmastereligiblenodes/Mastereligiblenodes
DigitalOceanURL/Costandperformanceflexibility
directorylayout,clusterbin/Thedirectorylayoutconfig/Thedirectorylayoutlib/Thedirectorylayoutmodules/Thedirectorylayoutdata/Thedirectorylayoutlogs/Thedirectorylayoutplugins/Thedirectorylayoutwork/Thedirectorylayout
disk-basedshardallocationabout/Disk-basedshardallocationconfiguring/Configuringdiskbasedshardallocationdisabling/Disablingdiskbasedshardallocation
dis_maxquery/Thedis_maxqueryDocker
referencelink/MultipleElasticsearchinstancesonasinglephysicalmachinedocument
about/Documentcreating/Creatinganewdocumentautomaticidentifiercreation,creating/Automaticidentifiercreationretrieving/Retrievingdocumentsupdating/Updatingdocumentsnon-existingdocuments,dealingwith/Dealingwithnon-existingdocumentspartialdocuments,adding/Addingpartialdocumentsdeleting/Deletingdocuments
documenttype/Documenttypedoubletype
URL/Numberdynamictemplates
about/Templatesanddynamictemplates,Dynamictemplatesmatchingpattern/Thematchingpatterntargetfielddefinition,writing/Fielddefinitions
EElasticsearch
about/ThebasicsofElasticsearchkeyconcepts/KeyconceptsofElasticsearchindex/Indexdocument/Documentdocumenttype/Documenttypemapping/Mappingindexing/Indexingandsearching,Elasticsearchindexingsearching/IndexingandsearchingURL/InstallingElasticsearch,Availablesimilaritymodelsinstalling/InstallingElasticsearchrunning/RunningElasticsearchshuttingdown/ShuttingdownElasticsearchconfiguring/ConfiguringElasticsearchinstalling,withRPMpackage/InstallingElasticsearchusingRPMpackagesinstalling,withDEBpackage/InstallingElasticsearchusingtheDEBpackageconfigurationfiles,localization/Elasticsearchconfigurationfilelocalizationquerying/QueryingElasticsearch,Asimplequeryexampledata/Theexampledatapaging/Pagingandresultsizeresultsize,controlling/Pagingandresultsizeversionvalue,returning/Returningtheversionvaluescore,limiting/Limitingthescorereturnfields,selecting/Choosingthefieldsthatwewanttoreturnsourcefiltering/Sourcefilteringscriptfields,using/Usingthescriptfieldsparameters,passingtoscriptfields/Passingparameterstothescriptfieldsparametrs,passingtoscriptfields/Passingparameterstothescriptfieldsscriptingcapabilities/ScriptingcapabilitiesofElasticsearchspatialcapabilities/Elasticsearchspatialcapabilitiesreferencedocumentation,URL/Configurationplugins/Elasticsearchpluginscaches/Elasticsearchcacheshardwarepreparations/Hardwaremonitoring/MonitoringKibana,URL/Marvel
Elasticsearch2.1URL/Threadpools
Elasticsearch2.2completionsuggester/Completionsuggester
Elasticsearch cluster
    preparing, for high indexing / Preparing the cluster for high indexing and querying throughput
    preparing, for high querying / Preparing the cluster for high indexing and querying throughput
ElasticsearchHQtoolusing/ElasticsearchHQ
Elasticsearchindexingabout/Elasticsearchindexingshards/Shardsandreplicasreplicas/Shardsandreplicasindices,creating/Creatingindices
Elasticsearchinfrastructurekeyconcepts/KeyconceptsoftheElasticsearchinfrastructurenode/Nodesandclusterscluster/Nodesandclustersshard/Shardsreplica/Replicasgateway/Gateway
Elasticsearchmonitoringabout/MonitoringElasticsearchHQtool,using/ElasticsearchHQMarveltool,using/MarvelSPMtool,using/SPMforElasticsearch
Elasticsearchtimemachineabout/Elasticsearchtimemachinesnapshotrepository,creating/Creatingasnapshotrepositorysnapshots,creating/Creatingsnapshotssnapshot,restoring/Restoringasnapshotparameters/Restoringasnapshotoldsnapshots,deleting/Cleaningup–deletingoldsnapshots
existsquery/TheexistsqueryExplainAPI
URL/Explainingthequeryexplaininformation
about/Understandingtheexplaininformationfieldanalysis/Understandingfieldanalysisquery,explaining/Explainingthequery
Ffactors,forscorepropertycalculation
documentboost/Whenadocumentismatchedfieldboost/Whenadocumentismatchedcoord/Whenadocumentismatchedinversedocumentfrequency/Whenadocumentismatchedlengthnorm/Whenadocumentismatchedtermfrequency/Whenadocumentismatchedquerynorm/Whenadocumentismatched
FastVectorHighlighterURL/Underthehood
FedoraLinuxURL/InstallingElasticsearchusingRPMpackages
fielddatacacheabout/Fielddatacachesize,controlling/Fielddatasizecircuitbreakers/Circuitbreakers
fielddefinitionvariables,dynamictemplates{name}/Fielddefinitions{dynamic_type}/Fielddefinitions
filteringabout/Filteringinclude/Whatdoinclude,exclude,andrequiremeanrequire/Whatdoinclude,exclude,andrequiremeanexclude/Whatdoinclude,exclude,andrequiremean
filterslowercasefilter/Inputdataanalysissynonymsfilter/Inputdataanalysislanguagestemmingfilters/Inputdataanalysis
filtersandtokenizersURL/Definingyourownanalyzers
filtertypesURL/Definingyourownanalyzers
fulltextsearchingabout/FulltextsearchingApacheLucene,glossary/TheLuceneglossaryandarchitectureApacheLucene,architecture/TheLuceneglossaryandarchitectureinputdataanalysis/Inputdataanalysisindexing/Indexingandqueryingquerying/Indexingandqueryingscoring/Scoringandqueryrelevancequeryrelevance/Scoringandqueryrelevance
functionscorequery
about/Thefunctionscorequerystructure/Structureofthefunctionqueryweightfactorfunction/Theweightfactorfunctionfield_value_factorfunction/Fieldvaluefactorfunctionscript_scorefunction/Thescriptscorefunctionrandom_scorefunction/Therandomscorefunctiondecayfunctions/Decayfunctions
function_scorequeryURL/Decayfunctions
fuzzyqueryabout/Thefuzzyquery
Ggateway/Gatewaygatewaymodule
about/Thegatewayandrecoverymodules,Thegatewaygatewayrecoveryoptions
gateway.recover_after_master_nodes/Additionalgatewayrecoveryoptionsgateway.recover_after_data_nodes/Additionalgatewayrecoveryoptionsgateway.expected_master_nodes/Additionalgatewayrecoveryoptionsgateway.expected_data_nodes/Additionalgatewayrecoveryoptions
generalpreparations,singleElasticsearchnodeabout/Thegeneralpreparationsswapping,avoiding/Avoidingswappingfiledescriptors/Filedescriptorsvirtualmemory/Virtualmemory,Thememory
Geo/Geoboundsaggregationgeodistanceaggregations
about/GeodistanceaggregationsGeohash
URL/Geohashgridaggregationgeohashgridaggregation
about/GeohashgridaggregationURL/Geohashgridaggregation
GeohashvalueURL/Exampledata
GeoJSONURL/Arbitrarygeoshapes
geospatialqueriesURL/Samplequeries
geo_fieldpropertiesgeohash/Additionalgeo_fieldpropertiesgeohash_precision/Additionalgeo_fieldpropertiesgeohash_prefix/Additionalgeo_fieldpropertiesignore_malformed/Additionalgeo_fieldpropertieslat_lon/Additionalgeo_fieldpropertiesprecision_step/Additionalgeo_fieldproperties
GitHub
    URL / Installing plugins
    automatic store throttling, URL / Automatic store throttling
    issue, URL / String
globalaggregationabout/Globalaggregation
Groovy
URL/ScriptingcapabilitiesofElasticsearch
Hhardwarepreparations,forrunningElasticsearch
about/Hardwarephysicalservers/Physicalserversoracloudcloud/PhysicalserversoracloudCPU/CPURAMmemory/RAMmemorymassstorage/Massstoragenetwork/Thenetworkserverscounting/Howmanyserverscostcutting/Costcutting
HDFSURL/Creatingasnapshotrepository
highlightedfragmentscontrolling/Controllinghighlightedfragments
highlightertypeselecting/Forcinghighlightertype
highlighting
    about / Highlighting
    using / Getting started with highlighting
    field configuration / Field configuration
    Apache Lucene, using / Under the hood
    highlighter type, selecting / Forcing highlighter type
    HTML tags, configuring / Configuring HTML tags
    global settings / Global and local settings
    local settings / Global and local settings
    matching need / Require matching
    custom query / Custom highlighting query
    Postings highlighter / The Postings highlighter, Validating your queries
horizontal expansion
    about / Horizontal expansion
    replicas, automatic creation / Automatically creating the replicas
    redundancy / Redundancy and high availability
    high availability / Redundancy and high availability
    reference links / Redundancy and high availability
    cost and performance flexibility / Cost and performance flexibility
    continuous upgrades / Continuous upgrades
    multiple Elasticsearch instances, on single physical machine / Multiple Elasticsearch instances on a single physical machine
    designated node roles for larger clusters / Designated node roles for larger clusters
howsimilarphrase/UnderstandingtheexplaininformationHTTPmodule
properties,URL/HTTPhostHTTPprotocol
URL/UnderstandingtheRESTAPIHTTPtransportsettings,adjusting
node/AdjustingHTTPtransportsettingsHTTP,disabling/DisablingHTTPHTTPport/HTTPportHTTPhost/HTTPhost
HyperLogLog++algorithmURL/Fieldcardinality
Iidentifiersquery
about/Theidentifiersqueryindex
segments/TheLuceneglossaryandarchitectureabout/Index
index-timeboostingusing/Whendoesindex-timeboostingmakesense?defining,inmappings/Definingboostinginthemappings
indexaliasabout/Indexaliasingandusingittosimplifyyoureverydayworkdefining/Analiascreating/Creatinganaliasmodifying/Modifyingaliasescommands,combining/Combiningcommandsretrieving/Retrievingaliasesremoving/Removingaliasesfiltering/Filteringaliasesandrouting/Aliasesandroutingandzerodowntimereindexing/Zerodowntimereindexingandaliases
indexation/Inputdataanalysisindexingprocess
speedingup,batchindexingused/Batchindexingtospeedupyourindexingprocess
indexingrelatedadvicesabout/Indexingrelatedadviceindexrefreshrate/Indexrefreshratethreadpools,tuning/Threadpoolstuningautomaticstorethrottling/Automaticstorethrottlingtime-baseddata,handling/Handlingtime-baseddatamultipledatapaths/Multipledatapathsdatadistribution/Datadistributionbulkindexing/BulkindexingRAMbuffer,usedforindexing/RAMbufferforindexing
indexrefreshratereferencelink/Indexrefreshrate
indexstructuremodifying,withupdateAPI/ModifyingyourindexstructurewiththeupdateAPI
indexstructure,modifyingmappings/Themappingsnewfield,adding/Addinganewfieldtotheexistingindexexistingindexfields,modifying/Modifyingfieldsofanexistingindex
indexstructure,parent-childrelationshipabout/Indexstructureanddataindexingchildmappings/Childmappingsparentmappings/Parentmappingsparentdocument/Theparentdocumentchildrendocuments/Childdocuments
indexstructuremappingabout/Indexstructuremappingtypes/Typeandtypesdefinitiontypesdefinition/Typeandtypesdefinitionfields/Fieldscoretypes/Coretypesmultifields/MultifieldsIPaddresstype/TheIPaddresstypetokencounttype/Tokencounttype
indices,Elasticsearchindexingcreating/Creatingindicesautomaticcreation,altering/Alteringautomaticindexcreationnewlycreatedindex,settings/Settingsforanewlycreatedindexdeleting/Indexdeletion
indicesanalyzeAPIURL/Queryanalysis
indicesquery/TheindicesqueryindicessettingsAPI/TheindicessettingsAPIindicesstatsAPI
about/IndicesstatsAPIdocs/Docsstore/Storeindexing/Indexing,get,andsearchget/Indexing,get,andsearchsearch/Indexing,get,andsearchdefining/Additionalinformation
internalfields_id/Additionalinternalfields_uid/Additionalinternalfields_type/Additionalinternalfields_field_names/Additionalinternalfields
invertedindexabout/TheLuceneglossaryandarchitectureURL/Index
JJava
URL/Fulltextsearchinginstalling/InstallingJava
JavaScriptObjectNotation(JSON)URL/RunningElasticsearch
JavathreadsURL/Threadpools
JavatypesURL/Number
JavaVersion7URL/InstallingJava
JavaVirtualMachine(JVM)/ConfiguringElasticsearchJMeter
URL/WhencachesshouldbeavoidedJodaTimelibrary
URL/DaterangeaggregationJSON
URL/Document
KKibana
URL/Marvel
Llanguageanalyzer
URL/Out-of-the-boxanalyzerslanguageanalyzers
URL/Sampledocumentlanguagedetection
URL/DetectingthelanguageofthedocumentLevenshteinalgorithm
URL/TheBooleanmatchqueryLinux
Elasticsearch,installing/InstallingElasticsearchonLinuxElasticsearch,configuringassystemservice/ConfiguringElasticsearchasasystemserviceonLinux
LogstashURL/Indexaliasingandusingittosimplifyyoureverydaywork
LuceneJavadocsURL/Defaultscoringformula
Lucenequerysyntaxabout/Lucenequerysyntax
Mmapping/Mappingmappings
configuration/Mappingsconfigurationtypedeterminingmechanism/Typedeterminingmechanismindexstructuremapping/Indexstructuremappinganalyzers,using/Usinganalyzerssimilaritymodels/Differentsimilaritymodelsabout/Mappingsfinalmappings/Finalmappingssending,toElasticsearch/SendingthemappingstoElasticsearchnewfield,addingtoexistingindex/Addinganewfieldtotheexistingindexfieldofexistingindex,modifying/Modifyingfieldsofanexistingindex
Marveltoolusing/Marvel
masternodeabout/Noderoles,Masternode
matchallquery/Thematchallquerymatchingpattern,dynamictemplates
match/Thematchingpatternunmatch/Thematchingpattern
matchqueryabout/ThematchqueryBooleanmatchquery/TheBooleanmatchqueryphrasematchquery/Thephrasematchquerymatchphraseprefixquery/Thematchphraseprefixquery
MavenURL/Installingplugins
MavenCentralURL/Installingplugins
MavenSonatypeURL/Installingplugins
mergepolicyabout/Themergepolicyproperties/Themergepolicy
mergescheduler/Themergeschedulermetricsaggregations
about/Metricsaggregationsmin/Minimum,maximum,average,andsummax/Minimum,maximum,average,andsumavg/Minimum,maximum,average,andsumsum/Minimum,maximum,average,andsummissingvalues/Missingvalues
scripts,using/Usingscriptsfieldvaluestatistics/Fieldvaluestatisticsandextendedstatisticsextended_statistics/Fieldvaluestatisticsandextendedstatisticsvalue_countaggregation/Valuecountfieldcardinalityaggregation/Fieldcardinalitypercentilesaggregation/Percentilespercentile_ranksaggregation/Percentilerankstop_hitsaggregation/Tophitsaggregationtop_hitsaggregation,additionalparameters/Additionalparametersgeo_boundsaggregation/Geoboundsaggregationscriptedmetricsaggregation/Scriptedmetricsaggregation
MicrosoftWindowsplatformfilehandles,URL/ConfiguringElasticsearch
minimum_should_matchparameterURL/Theboolquery
missingquery/Themissingquerymorelikethisquery
about/Themorelikethisquerymovingaveragescalculation
URL/Pipelineaggregationsmoving_avgaggregation
URL/Movingavgaggregationabout/Movingavgaggregationfuturebuckets,predicting/Predictingfuturebucketsmodels/Themodelsmodels,URL/Themodels
multimatchquery/ThemultimatchquerymultipleElasticsearchinstances,onsinglephysicalmachine
about/MultipleElasticsearchinstancesonasinglephysicalmachineshard,preventingonsamenode/Preventingashardanditsreplicasfrombeingonthesamenodereplicas,preventingonsamenode/Preventingashardanditsreplicasfrombeingonthesamenode
multipleindicesURL/URIsearch
multiterm/Queryrewritemultivaluedfield/DocumentMustache
URL/ScriptingcapabilitiesofElasticsearch
Nnativecode,using
factoryimplementation/Thefactoryimplementationnativescriptimplementation/Implementingthenativescriptplugindefinition/Theplugindefinitionplugin,installing/Installingthepluginscript,running/Runningthescript
nestedaggregationabout/Nestedaggregation
nestedobjectsusing/UsingnestedobjectsURL/Usingnestedobjectsnestedqueries/Scoringandnestedqueriesscore_modeproperty,setting/Scoringandnestedqueries
nestingaggregationsabout/Nestingaggregationsandorderingbuckets
network attached storage (NAS) / Mass storage
node / Nodes and clusters
discovery
    about / Understanding node discovery
    discovery types / Understanding node discovery, Discovery types
    roles / Node roles
    cluster name, setting / Setting the cluster's name
    Zen discovery / Zen discovery
    HTTP transport settings, adjusting / Adjusting HTTP transport settings
noderolesmasternode/Noderoles,Masternodedatanode/Noderoles,Datanodeclientnode/Noderoles,Clientnodeconfiguring/Configuringnoderoles
nodesinfoAPIabout/NodesinfoAPIrequisites/NodesinfoAPIextensiveinformation,returning/Returnedinformation
NoSQLURL/ManipulatingdatawiththeRESTAPI
number,indexstructuremappingbyte/Numbershort/Numberinteger/Numberlong/Numberfloat,URL/Numberfloat/Numberdouble/Number
double,URL/Number
Oobjectindexing
disabling/Disablingobjectindexingofficialrepository
URL/InstallingpluginsOpenJDK
URL/InstallingJavaoptimisticlocking
URL/Versioningoptions,termsuggester
lowercase_terms/Additionaltermsuggesteroptionsmax_edits/Additionaltermsuggesteroptionsprefix_len/Additionaltermsuggesteroptionsmin_word_len/Additionaltermsuggesteroptionsshard_size/Additionaltermsuggesteroptions
out-of-the-boxanalyzersstandard/Out-of-the-boxanalyzerssimple/Out-of-the-boxanalyzerswhitespace/Out-of-the-boxanalyzersstop/Out-of-the-boxanalyzerskeyword/Out-of-the-boxanalyzerspattern/Out-of-the-boxanalyzerslanguage/Out-of-the-boxanalyzerssnowball/Out-of-the-boxanalyzers
P
parameters, Boolean match query
    operator / The Boolean match query
    analyzer / The Boolean match query
    fuzziness / The Boolean match query
    prefix_length / The Boolean match query
    max_expansions / The Boolean match query
    zero_terms_query / The Boolean match query
    lenient / The Boolean match query
parameters,fuzzyqueryvalue/Thefuzzyqueryboost/Thefuzzyqueryfuzziness/Thefuzzyqueryprefix_length/Thefuzzyquerymax_expansions/Thefuzzyquery
parameters, more like this query
    fields / The more like this query
    like / The more like this query
    unlike / The more like this query
    min_term_freq / The more like this query
    max_query_terms / The more like this query
    stop_words / The more like this query
    min_doc_freq / The more like this query
    min_word_len / The more like this query
    max_word_len / The more like this query
    boost_terms / The more like this query
    boost / The more like this query
    include / The more like this query
    minimum_should_match / The more like this query
    analyzer / The more like this query
parameters,querystringqueryquery/Thequerystringquerydefault_field/Thequerystringquerydefault_operator/Thequerystringqueryanalyzer/Thequerystringqueryallow_leading_wildcard/Thequerystringquerylowercase_expand_terms/Thequerystringqueryenable_position_increments/Thequerystringqueryfuzzy_max_expansions/Thequerystringqueryfuzzy_prefix_length/Thequerystringqueryphrase_slop/Thequerystringqueryboost/Thequerystringquery
analyze_wildcard/Thequerystringqueryauto_generate_phrase_queries/Thequerystringqueryminimum_should_match/Thequerystringqueryfuzziness/Thequerystringquerymax_determined_states/Thequerystringquerylocale/Thequerystringquerytime_zone/Thequerystringquerylenient/Thequerystringquery
parameters,rangequerygte/Therangequerygt/Therangequerylte/Therangequerylt/Therangequery
parent-childrelationshipusing/Usingtheparent-childrelationshipindexstructure/Indexstructureanddataindexingdataindexing/Indexstructureanddataindexingquerying/Queryingperformanceconsiderations/Performanceconsiderations
parentaggregations/Availabletypespatternanalyzer
URL/Out-of-the-boxanalyzerspercolator
about/Percolatorindex/Theindexpreparing/Percolatorpreparationexploring/Gettingdeeperreturnedresultssize,controlling/Controllingthesizeofreturnedresultsusing,forandscorecalculation/Percolatorandscorecalculationcombining,withotherfunctionalities/Combiningpercolatorswithotherfunctionalitiesmatchingqueriescount,obtaining/Gettingthenumberofmatchingqueriesindexeddocumentspercolation/Indexeddocumentpercolation
phrasematchqueryslop/Thephrasematchqueryanalyzer/Thephrasematchquery
phrasesuggesterabout/Phrasesuggesterconfiguration/Configuration
pipelineaggregationsabout/PipelineaggregationsURL/Pipelineaggregationsparentaggregationfamily/Availabletypessiblingaggregationfamily/Availabletypes
types/Availabletypes,Pipelineaggregationtypesotheraggregations,referencing/Referencingotheraggregationsdata,gaps/Gapsinthedata
pipelineaggregations,typessum_bucket/Min,max,sum,andaveragebucketaggregationsmin_bucket/Min,max,sum,andaveragebucketaggregationsmax_bucket/Min,max,sum,andaveragebucketaggregationsavg_bucket/Min,max,sum,andaveragebucketaggregationscumulative_sumaggregation/Cumulativesumaggregationbucket_selectoraggregation/Bucketselectoraggregationbucket_scriptaggregation/Bucketscriptaggregationserial_diffaggregation/Serialdifferencingaggregationderivativeaggregation/Derivativeaggregationmoving_avgaggregation/Movingavgaggregation
pluginsabout/Elasticsearchpluginsbasics/Thebasicsinstalling/Installingpluginsremoving/Removingplugins
PostingsHighlighterURL/Underthehoodabout/ThePostingshighlighter
prefixquery/Theprefixqueryproperties,faultdetectionpingsettings
discovery.zen.fd.ping_interval/Faultdetectionpingsettingsdiscovery.zen.fd.ping_timeout/Faultdetectionpingsettingsdiscovery.zen.fd.ping_retries/Faultdetectionpingsettings
properties,mergepolicyindex.merge.policy.expunge_deletes_allowed/Themergepolicyindex.merge.policy.max_merge_at_once/Themergepolicyindex.merge.policy.max_merge_at_once_explicit/Themergepolicyindex.merge.policy.max_merged_segment/Themergepolicyindex.merge.policy.segments_per_tier/Themergepolicyindex.merge.policy.reclaim_deletes_weight/Themergepolicy
Qqueries
selecting,forwarming/Choosingqueriesforwarmingqueryboost
applying,todocument/Theboostqueryboosts
used,forinfluencingscores/Influencingscoreswithqueryboostsabout/Theboostadding,toqueries/Theboost,Addingtheboosttoqueriesscore,modifying/Modifyingthescore
queryingdata,inchilddocuments/Queryingdatainthechilddocumentsdata,inparentdocuments/Queryingdataintheparentdocuments
queryingprocessabout/Understandingthequeryingprocessquerylogic/Querylogicsearchtype,specifying/Searchtypesearchexecutionpreference,specifying/SearchexecutionpreferencesearchshardsAPI,specifying/SearchshardsAPI
queryparserURL/Lucenequerysyntax
queryrewriteabout/Queryrewriteprefixquery,example/PrefixqueryasanexampleApacheLucene,using/GettingbacktoApacheLuceneproperties/Queryrewriteproperties
querystringqueryabout/Thequerystringqueryrunning,againstmultiplefields/Runningthequerystringqueryagainstmultiplefields
RRackspace
URL/CostandperformanceflexibilityRAID
URL/Massstoragerangeaggregation
about/Rangeaggregationkeyedbuckets/Keyedbuckets
rangequery/Therangequeryrecoverymodules
about/Thegatewayandrecoverymodulesrecoveryprocess
about/Recoverycontrolgatewayrecoveryoptions/AdditionalgatewayrecoveryoptionsindicesrecoveryAPI/IndicesrecoveryAPIdelayedallocation/Delayedallocationindexrecoveryprioritization/Indexrecoveryprioritization
regularexpressionqueryabout/RegularexpressionqueryURL/Regularexpressionquery
replica/Replicasreplicas,Elasticsearchindexing
about/Shardsandreplicaswriteconsistency,controlling/Writeconsistency
RESTAPIused,fordatamanipulation/ManipulatingdatawiththeRESTAPIabout/UnderstandingtheRESTAPIURL/UnderstandingtheRESTAPIdata,storinginElasticsearch/StoringdatainElasticsearchdocuments,retrieving/Retrievingdocumentsdocuments,updating/Updatingdocumentsdocuments,deleting/Deletingdocumentsversioning/Versioning
resultsfiltering/Filteringyourresultsquerycontext/Thecontextisthekeyexplicitfiltering,boolqueryused/Explicitfilteringwithboolquery
reverse_nestedaggregationabout/Reversenestedaggregation
rewriteproperty,valuesscoring_boolean/Queryrewritepropertiesconstant_score/Queryrewritepropertiesconstant_score_boolean/Queryrewriteproperties
top_terms/Queryrewritepropertiestop_terms_blendedfreqs/Queryrewritepropertiestop_terms_boost_N/Queryrewriteproperties
rightqueryselecting/Choosingtherightqueryusecases/Theusecasesresults,limitingtogiventags/Limitingresultstogiventagsvaluesinrange,searching/Searchingforvaluesinarange
routingabout/Introductiontorouting,Routingdefaultindexing/Defaultindexingdefaultsearching/Defaultsearchingparameters/Theroutingparametersfields/Routingfields
RPMpackageused,forinstallingElasticsearch/InstallingElasticsearchusingRPMpackages
Ssample
distance-basedsorting/Distance-basedsortingboundingboxfiltering/Boundingboxfilteringdistance,limiting/Limitingthedistance
samplequeriesabout/Samplequeries
sampleraggregationabout/Sampleraggregation
scoreabout/IntroductiontoApacheLucenescoringinfluencing,withqueryboosts/Influencingscoreswithqueryboostsmodifying/Modifyingthescore
score,modifyingabout/Modifyingthescoreconstant_scorequery/Constantscorequeryboostingquery/Boostingqueryfunctionscorequery/Thefunctionscorequery
score_modeparameterabout/Structureofthefunctionquerymultiplevalue/Structureofthefunctionquerysumvalue/Structureofthefunctionqueryavgvalue/Structureofthefunctionqueryfirstvalue/Structureofthefunctionquerymaxvalue/Structureofthefunctionqueryminvalue/Structureofthefunctionquery
scriptfieldsselecting/Usingthescriptfieldsparameters,passingto/Passingparameterstothescriptfields
scriptingcapabilitiesabout/ScriptingcapabilitiesofElasticsearchscriptexecution,availableobjects/Objectsavailableduringscriptexecutionscript,types/Scripttypesquerying,scriptsused/Queryingwithscriptsparameters,using/Scriptingwithparameterslanguages,Groovy/Scriptlanguagesotherthanembeddedlanguages,using/Usingotherthanembeddedlanguagesnativecode,using/Usingnativecode
scriptpropertiesscript/Queryingwithscriptsinline/Queryingwithscriptsid/Queryingwithscriptsfile/Queryingwithscripts
lang/Queryingwithscriptsparams/Queryingwithscripts
scripts,scripted_metricaggregationinit_script/Scriptedmetricsaggregationmap_script/Scriptedmetricsaggregationcombine_script/Scriptedmetricsaggregationreduce_script/Scriptedmetricsaggregation
scripttypesabout/Scripttypesinlinescripts/Scripttypes,Inlinescriptsinfilescripts/Infilescriptsindexedscripts/Indexedscripts
ScrollAPIabout/TheScrollAPIproblemdefinition/Problemdefinitionproblemdefinition,solution/Scrollingtotherescue
searching/Defaultsearchingsearchingrequestexecution/Indexingandsearchingsegmentmerging
about/Introductiontosegmentmerging,Segmentmergingneedfor/Theneedforsegmentmergingmergepolicy/Themergepolicymergepolicy,basicproperties/Themergepolicymergescheduler/Themergeschedulerthrottling/Throttling
shardallocationIPaddress,usingfor/UsingtheIPaddressforshardallocationcancelling/Cancelingshardallocationforcing/ForcingshardallocationmultiplecommandsperHTTPrequest/MultiplecommandsperHTTPrequestoperations,allowingonprimaryshards/Allowingoperationsonprimaryshards
shardandreplicaallocationcontrolling/Controllingtheshardandreplicaallocationcontrolling,explicitly/Explicitlycontrollingallocationnodeparameters,specifying/Specifyingnodeparametersconfiguration/Configurationindex,creating/Indexcreationnodes,excluding/Excludingnodesfromallocationnodeattributes,requiring/Requiringnodeattributesnumberofshardsandreplicaspernode/Thenumberofshardsandreplicaspernodeallocationthrottling/Allocationthrottlingclusterwideallocation/Cluster-wideallocationshardsandreplicas,movingmanually/Manuallymovingshardsandreplicas
rollingrestarts,handling/Handlingrollingrestartsshardrequestcache
about/Shardrequestcacheenabling/Enablingandconfiguringtheshardrequestcacheconfiguring/Enablingandconfiguringtheshardrequestcacheperrequestshardrequestcache,disabling/Perrequestshardrequestcachedisablingusagemonitoring/Shardrequestcacheusagemonitoring
shards/Index,Shardsmoving/Movingshards
shards,Elasticsearchindexingabout/Shardsandreplicaswriteconsistency,controlling/Writeconsistency
siblingaggregations/Availabletypessignificant_termsaggregation
about/Significanttermsaggregationsignificantterms,selecting/Choosingsignificanttermsmultiplevalue,analyzing/Multiplevalueanalysis
similaritymodelsabout/Differentsimilaritymodelsper-fieldsimilarity,setting/Settingper-fieldsimilarityOkapiBM25model/Availablesimilaritymodelsrandomnessmodel,divergence/Availablesimilaritymodelsinformation-basedmodel/Availablesimilaritymodelsdefaultsimilarity,configuring/ConfiguringdefaultsimilarityBM25similarity,configuring/ConfiguringBM25similarityDFRsimilarity,configuring/ConfiguringDFRsimilarityIBsimilarity,configuring/ConfiguringIBsimilarity
simplequerystringqueryabout/ThesimplequerystringqueryURL/Thesimplequerystringquery
singleElasticsearchnodetuning/PreparingasingleElasticsearchnodegeneralpreparations/Thegeneralpreparationsfielddatacache/Fielddatacacheandbreakingthecircuitcircuit,breaking/Fielddatacacheandbreakingthecircuitdocvalues,using/UsedocvaluesRAMbuffer,usedforindexing/RAMbufferforindexingindexrefreshrate/Indexrefreshratethreadpools/Threadpools
snapshotscreating/Creatingsnapshotsadditionalparameters/Additionalparameters
snowballanalyzer
URL/Out-of-the-boxanalyzersSoftwareasaService(SaaS)/SPMforElasticsearchsourcefiltering/Sourcefilteringspan/Aspanspanfirstquery/Spanfirstqueryspannearquery/Spannearqueryspannotquery/Spannotqueryspanorquery/Spanorqueryspanqueries
using/Usingspanqueriesspan/Aspanspan_termquery/Spantermqueryspanfirstquery/Spanfirstqueryspannearquery/Spannearqueryspanorquery/Spanorqueryspannotquery/Spannotqueryspan_withinquery/Spanwithinqueryspan_containingquery/Spancontainingqueryspan_multiquery/Spanmultiqueryperformanceconsiderations/Performanceconsiderations
span_containing query / Span containing query
span_multi query / Span multi query
span_term query / Span term query
span_within query / Span within query
spatial capabilities
    about / Elasticsearch spatial capabilities
    mappings preparation / Mapping preparation for spatial searches
    example data / Example data
    geo_field properties / Additional geo_field properties
SPMtoolURL/SPMforElasticsearch
standardanalyzerURL/Out-of-the-boxanalyzers
stateandhealth,clustermonitoring/Monitoringyourcluster’sstateandhealthclusterhealthAPI/ClusterhealthAPIindicesstatsAPI/IndicesstatsAPInodesinfoAPI/NodesinfoAPInodesstatsAPI/NodesstatsAPIclusterstateAPI/ClusterstateAPIclusterstatsAPI/ClusterstatsAPIpendingtasksAPI/PendingtasksAPIindicesrecoveryAPI/IndicesrecoveryAPIindicesshardstoresAPI/IndicesshardstoresAPI
indicessegmentsAPI/IndicessegmentsAPIstaticproperties,forindexingbuffersizeconfiguration
indices.memory.index_buffer_size/Indexingbuffersindices.memory.min_index_buffer_size/Indexingbuffersindices.memory.max_index_buffer_size/Indexingbuffersindices.memory.min_shard_index_buffer_size/Indexingbuffers
statuscodedefinitionURL/Indexingthedata
stemmingURL/Out-of-the-boxanalyzers
stopanalyzerURL/Out-of-the-boxanalyzers
stopwordsURL/Thecommontermsquery
string,indexstructuremappingterm_vector/Stringanalyzer/Stringsearch_analyzer/Stringnorms.enabled/Stringnorms.loading/Stringposition_offset_gap/Stringindex_options/Stringignore_above/String
suggestersusing/UsingsuggestersURL/Usingsuggesters,Additionaltermsuggesteroptionstypes/Availablesuggestertypessuggestions,including/Includingsuggestionsresponse/Suggesterresponsetextproperty/Suggesterresponsescoreproperty/Suggesterresponsefreqproperty/Suggesterresponse
synonymrulesdefining/DefiningsynonymrulesApacheSolrsynonyms,using/UsingApacheSolrsynonymsWordNetsynonyms,using/UsingWordNetsynonyms
synonymsabout/Wordswiththesamemeaningfiltering/Synonymfilterinmappings/Synonymsinthemappingsstoring,infilesystem/Synonymsstoredonthefilesystemrules,defining/Definingsynonymrulesindex-timesynonymsexpansion/Queryorindex-timesynonymexpansionquery-timesynonymexpansion/Queryorindex-timesynonymexpansion
synonymsfilterusing/Synonymfilter
system-specificinstallationandconfigurationabout/Thesystem-specificinstallationandconfigurationElasticsearch,installingonLinux/InstallingElasticsearchonLinuxElasticsearch,configuringassystemserviceonLinux/ConfiguringElasticsearchasasystemserviceonLinuxElasticsearch,usingassystemserviceonWindows/ElasticsearchasasystemserviceonWindows
TT-Digestalgorithm
URL/Percentilestemplates
about/Templatesexample/Anexampleofatemplate
termquery/Thetermquerytermsaggregation
about/Termsaggregationapproximatecounts/Countsareapproximateminimumdocumentcount/Minimumdocumentcount
termsquery/Thetermsquerytermsuggester
about/Termsuggesterconfigurationoptions/Termsuggesterconfigurationoptionsoptions/Additionaltermsuggesteroptions
threadpoolsabout/Threadpoolsgeneric/Threadpoolsindex/Threadpoolssearch/Threadpoolssuggest/Threadpoolsget/Threadpoolsbulk/Threadpoolspercolate/Threadpools
throttling,adjustingtypesetting/Throttlingvalue/Throttlingnonevalue/Throttlingmergevalue/Throttlingallvalue/Throttling
timezonesURL/Timezones
tree-likestructuresindexing/Indexingtree-likestructuresdatastructure/Datastructureanalysis/Analysis
typedeterminingmechanismabout/Typedeterminingmechanismdisabling/Disablingthetypedeterminingmechanismtuning,fornumerictypes/Tuningthetypedeterminingmechanismfornumerictypestuning,fordates/Tuningthetypedeterminingmechanismfordates
type property, values
    plain / Forcing highlighter type
    fvh / Forcing highlighter type
    postings / Forcing highlighter type
typequery/Thetypequerytypes,suggesters
term/Availablesuggestertypes,Termsuggesterphrase/Availablesuggestertypes,Phrasesuggestercompletion/Availablesuggestertypes,Completionsuggestercontext/Availablesuggestertypes,Contextsuggester
UUnicast
URL/DiscoverytypesupdateAPI
used,formodifyingindexstructure/ModifyingyourindexstructurewiththeupdateAPI
UpdateAPIURL/Addingpartialdocuments
updatesettingsAPIabout/TheupdatesettingsAPIclustersettingsAPI/TheclustersettingsAPIindicessettingsAPI/TheindicessettingsAPI
URIquerystringparametersabout/URIquerystringparametersquery/Thequerydefaultsearchfield/Thedefaultsearchfieldanalyzerproperty/Analyzerdefaultoperator/Thedefaultoperatorpropertyexplainparameter/Queryexplanationfieldsreturned/Thefieldsreturnedresults,sorting/Sortingtheresultssearchtimeout/Thesearchtimeoutresultswindow/Theresultswindowpershardresults,limiting/Limitingper-shardresultsunavailableindices,ignoring/Ignoringunavailableindicessearchtype/Thesearchtypelowercasingtermsexpansion/Lowercasingtermexpansionwildcardqueriesanalysis/Wildcardandprefixanalysisanalyze_wildcardproperty/Wildcardandprefixanalysisprefixqueriesanalysis/Wildcardandprefixanalysis
URIrequestqueryused,forsearching/SearchingwiththeURIrequestquerysampledata/SampledataURIsearch/URIsearchanalyzing/Queryanalysisparameters/URIquerystringparametersLucenequerysyntax/Lucenequerysyntax
URIsearchabout/URIsearchElasticsearchqueryresponse/ElasticsearchqueryresponseURL/Wildcardandprefixanalysis
VValidateAPI
using/UsingtheValidateAPIvalues,has_childqueryparameter
none/Queryingdatainthechilddocumentsmin/Queryingdatainthechilddocumentsmax/Queryingdatainthechilddocumentssum/Queryingdatainthechilddocumentsavg/Queryingdatainthechilddocuments
values,inrangesearching/Searchingforvaluesinarangematcheddocuments,boosting/Boostingsomeofthematcheddocumentslowerscoringpartialqueries,ignoring/IgnoringlowerscoringpartialqueriesLucenequerysyntax,usinginqueries/UsingLucenequerysyntaxinqueriesuserquerieswithouterrors,handling/Handlinguserquerieswithouterrorsprefixes,usedforprovidingautocompletefunctionality/Autocompleteusingprefixessimilarterms,finding/Findingtermssimilartoagivenonespans/Spans,spanseverywhere
values,score_modepropertyavg/Scoringandnestedqueriessum/Scoringandnestedqueriesmin/Scoringandnestedqueriesmax/Scoringandnestedqueriesnone/Scoringandnestedqueries
versioningabout/Versioningusageexample/Usageexamplefromexternalsystem/Versioningfromexternalsystems
verticalscaling/PreparingasingleElasticsearchnode
Wwarmingquery
about/Warmingupdefining/Defininganewwarmingquerydefinedwarmingqueries,retrieving/Retrievingthedefinedwarmingqueriesdeleting/Deletingawarmingquerywarmingupfunctionality,disabling/Disablingthewarmingupfunctionality
wildcardquery/ThewildcardqueryWindows
Elasticsearch,configuringassystemservice/ElasticsearchasasystemserviceonWindows
WordNetURL/UsingWordNetsynonyms
ZZendiscovery
about/Zendiscoverymasterelectionconfiguration/Masterelectionconfigurationunicast,configuring/Configuringunicastfaultdetectionpingsettings/Faultdetectionpingsettingsclusterstateupdatescontrol/Clusterstateupdatescontrolmasterunavailability,dealingwith/Dealingwithmasterunavailability