conf2014_searchheadclustering
Post on 14-Jul-2015
Copyright © 2014 Splunk Inc.
Mustafa Ahamed Director, Product Management
Eric Woo Senior Software Engineer
Anirban Rahut Senior Software Engineer
What’s New In Search Head Clustering
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Agenda
• What is Search Head Clustering?
• Business Benefits of Search Head Clustering
• SHC Configuration / Replication
• App Deployment
• Tips and Tricks
  – Migration
• Q&A
Search Head Clustering
Ability to group search heads into a cluster in order to provide Highly Available and Scalable search services
MISSION CRITICAL ENTERPRISE
Horizontal Scaling
Always-on Search Services
Consistent User Experience
Easy to add / manage premium contents (apps)
Business Benefits of SHC
SHP vs. SHC
SHP
• Available since v4.2
• Sharing configurations through NFS
• Single point of failure
• Performance issues
SHC
• No NFS
• Replication using local storage
• Commodity hardware
NFS
Design Goals
1. No Single Point of Failure
2. “One Configuration” across SH
3. Horizontal Scaling
Implementation
1. Dynamic Captain
2. Automatic Config replication across SH
3. Ability to add/remove nodes on running cluster
SHC – How Does it Work?
[Diagram callout: Search Head gets the peer list from Cluster Master]
1. Group search heads into a cluster
2. A captain gets elected dynamically
3. User-created reports/dashboards are automatically replicated to other search heads
Anatomy & Cluster Bring up
Search Head Cluster Bring Up
captain
config-log {s1,s2, ..., sn}
• Bootstrap captain
• Bring up members
• Captain establishes authority
• Members join/register
• CLI-based cluster scale/shrink
...
members
Job Scheduling
Use Case
• Scale search capacity
• Enable more reports, dashboards, alerts
• Load balance user sessions (onboarding)
• Captain is job scheduler
• Eliminates job-server need
• Load-based heuristic
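The load-based heuristic is not spelled out on the slide. The following is only a minimal sketch of one possible reading, in which the captain sends each scheduled job to the member with the lowest load ratio. The `Member` class, its `running_jobs` and `capacity` fields, and both function names are hypothetical and are not Splunk APIs.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    running_jobs: int  # hypothetical load metric reported to the captain
    capacity: int      # hypothetical limit on concurrent jobs


def pick_member(members):
    """Return the member with the lowest load ratio that still has room, or None."""
    candidates = [m for m in members if m.running_jobs < m.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.running_jobs / m.capacity)


def dispatch(members):
    """Assign one job to the least-loaded member and return its name (None if all are full)."""
    target = pick_member(members)
    if target is None:
        return None
    target.running_jobs += 1
    return target.name
```

With three members at loads 3/4, 1/4 and 4/4, `dispatch` picks the second one and bumps its job count; if every member is at capacity it returns `None`, which a real scheduler would presumably treat as "defer the job".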
Job Scheduling Orchestration
captain
scheduler
... search 1
search 2
LOAD
SUCC
FAIL
load balancer
search 3
Details
• Captain updates RA/DM summaries on indexers
• Scheduler limits honored across the cluster
• Real-time scheduled searches run one instance across the cluster
• Auto-failover: new captain becomes scheduler
• captain_is_adhoc_searchhead knob to reduce captain load
Alerts & Suppression
• Alerts fired when results of search meet alerting criteria
  – Historical searches: within 10 seconds after job completes
  – Real-time searches: ongoing basis
• Captain merges and maintains global view of alerts
• Suppression information centralized by the captain
• Merged alerts/suppression sent back to members
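To illustrate why centralizing suppression matters, here is a small sketch of a single suppression table of the kind a captain could hold. The class name, the alert key, and the time-window rule are assumptions for illustration only; the slide does not describe the actual data structure.

```python
class SuppressionTable:
    """Hypothetical cluster-wide record of when each alert last fired."""

    def __init__(self):
        self._last_fired = {}  # alert key -> timestamp (seconds) of the last fire

    def should_fire(self, alert_key, now, suppress_seconds):
        """Return True and record the fire unless the alert is still suppressed."""
        last = self._last_fired.get(alert_key)
        if last is not None and now - last < suppress_seconds:
            return False
        self._last_fired[alert_key] = now
        return True
```

Because every member consults the same table, an alert that fired on one search head at t=100 with a 60-second window is suppressed on any other search head until t=160.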
High Availability of Search Results
Search Results primer
[Diagram: search head (reduce) and indexers (map); indexers stream results]
Other Names
1. search results
2. search artifact
3. dispatch directory
4. SID
$SPLUNK_HOME/var/run/splunk/dispatch/scheduler__admin__search__mysearch_at_1410708600_345
sourcetype = access_combined | stats count by clientip
Dispatch dir needs to be replicated to multiple nodes to tolerate node failures
Artifact Replication
• Captain orchestrates replication
• Default replication_factor=3
• Success/failure ACK'd to captain
• Async replicate on proxy
• Replication policy enforced by fixups
[Diagram: captain, orig, replica-1, replica-2; labels "replicate" and "succ"]
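The fixup idea above can be sketched as follows: given which members hold an artifact and which members are alive, work out where extra copies are needed to get back to the replication factor. This is a simplified illustration, not Splunk's algorithm; the function name and the "first available member" selection rule are assumptions.

```python
def fixup_targets(holders, live_members, replication_factor=3):
    """Return the live members that should receive a new copy of an artifact.

    holders: members recorded as holding a copy (some may be down).
    live_members: ordered list of members currently alive.
    """
    live_holders = [m for m in holders if m in live_members]
    missing = replication_factor - len(live_holders)
    if missing <= 0:
        return []
    # Assumed policy: take the first live members that do not hold a copy yet.
    candidates = [m for m in live_members if m not in live_holders]
    return candidates[:missing]
```

For example, if copies live on s1, s2 and s3 and s2 goes down, `fixup_targets(["s1", "s2", "s3"], ["s1", "s3", "s4", "s5"])` returns `["s4"]`. When fewer live members exist than the replication factor, the function simply returns as many targets as it can.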
Artifact Proxying
• Replication guarantees HA & DR, but...
• SID not available on all nodes *locally*
• Real-time searches are not replicated
• We use proxying to fill these gaps
• Proxying on REST request
Authentication is cluster aware!!
HB = Heartbeat
[Diagram: captain, location log, "over HB"; nodes orig, r1, r2, proxy; label "async replicate"]
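A rough sketch of the proxying decision described above: a node that receives a request for a SID checks a location log and either serves the artifact locally or forwards the request to a node that holds it. The dictionary-based location log and the function name are hypothetical; the real lookup and REST forwarding are not shown on the slide.

```python
def resolve_artifact(sid, local_node, location_log):
    """Decide how a request for `sid` arriving at `local_node` would be served.

    location_log: dict mapping SID -> list of nodes holding the artifact
    (a stand-in for the location information the captain maintains).
    Returns a (decision, node) tuple.
    """
    holders = location_log.get(sid, [])
    if not holders:
        return ("not_found", None)
    if local_node in holders:
        return ("local", local_node)
    # Assumed policy: proxy to the first known holder.
    return ("proxy", holders[0])
```

From the user's point of view the result is the same either way, which is what gives the consistent view across search heads.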
Adhoc Search Management
• Adhoc search: interactive search run from a user session
• Adhoc searches are not replicated
• Captain, however, will have global knowledge of all searches
• GET services/search/jobs will return the global list of searches
• You can proxy and access adhoc searches from any node
Reaping of Search Artifacts
• Reaping: deletion of search results when TTL (time to live) expires
• Search artifacts reaped from the origin node
• Captain orchestrates reaps of the replicas
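As a small illustration of TTL-based reaping, the sketch below lists the SIDs whose time to live has run out. The tuple layout `(finished_at, ttl_seconds)` and the function name are assumptions made for this example; actual TTL bookkeeping in Splunk may differ.

```python
def expired_sids(artifacts, now):
    """Return the SIDs whose TTL has expired, in sorted order.

    artifacts: dict mapping SID -> (finished_at, ttl_seconds), both in seconds.
    """
    return sorted(
        sid
        for sid, (finished_at, ttl) in artifacts.items()
        if now >= finished_at + ttl
    )
```

In this picture the origin node would delete its own expired artifact, and the captain would then tell the members holding replicas of the same SID to delete theirs.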
Auto Failover
HA & Auto-Failover
Design Goals
1. No Single Point of Failure
2. Continuous Uptime
3. Consistent User Experience
Implementation
1. Dynamic Captain election
2. Auto Failover
3. Proxying for consistent view
Dynamic Captain
• Raft Consensus Protocol from Stanford
  – Diego Ongaro & John Ousterhout
  – Acknowledge Diego Ongaro for help!
• SHC uses Raft for LE and Auto Failover
RV = Request Vote
LE = Leader Election
SHC = Search Head Clustering
[Diagram: search heads S1 to S5, with labels "captain" and "new captain"]
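The core rule behind Raft leader election is that a candidate becomes leader only with votes from a strict majority of the cluster, and each member votes at most once per term. The sketch below shows just that rule. It deliberately leaves out log comparison, randomized election timeouts and heartbeats, and the function names are made up for this example.

```python
def collect_votes(candidate_term, reachable_voters):
    """Count votes for a candidate, including its own.

    reachable_voters: dict mapping member name -> highest term it has voted in.
    A member grants its vote only if it has not yet voted in this term.
    """
    granted = 1  # the candidate votes for itself
    for name, voted_term in reachable_voters.items():
        if candidate_term > voted_term:
            reachable_voters[name] = candidate_term
            granted += 1
    return granted


def wins_election(votes_granted, cluster_size):
    """A candidate wins only with a strict majority of the full cluster."""
    return votes_granted > cluster_size // 2
```

In a five-member cluster where the old captain is down and one other member is unreachable, a candidate that reaches two voters collects three votes and wins. A second candidate asking the same voters in the same term gets only its own vote and loses, so at most one captain is elected per term.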
Auto-Failover
[Diagram: old captain, new captain, members, scheduler; "Fixups" covering artifacts, running jobs, alerts, etc., and search load]
Configuration Management
Configuration Files
• Custom user content
  – Reports
  – Dashboards
• Search-time knowledge
  – Field extractions
  – Event types
  – Macros
• System configurations
  – Inputs, forwarding, authentication
Goal
• Consistent user experience across all search heads
• Changes made on one member are reflected on all members
Configuration Changes
• Users customize search and UI configurations via UI/CLI/REST
  – Save report
  – Add panel to dashboards
  – Create field extraction
• Administrators modify system configurations
  – Configure forwarding
  – Deploy centralized authentication (e.g. LDAP)
  – Install entirely new app or hand-edited configuration
Search and UI Configurations
• Changes to search and UI configurations are replicated across the search head cluster automatically
• Goal: eventual consistency
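One common way to reach eventual consistency is for every member to replay the same ordered log of changes. The sketch below shows that idea with a plain list as the log. The tuple format, the function name and the log itself are assumptions for illustration and are not a description of Splunk's replication protocol.

```python
def apply_changes(state, log, from_index):
    """Apply log entries from `from_index` onward to `state`; return the new position.

    Each entry is a tuple (op, key, value) where op is "set" or "delete".
    """
    for op, key, value in log[from_index:]:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return len(log)
```

Two members that start at different positions in the log can hold different states for a while, but once both have replayed every entry in the same order they hold identical configuration.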
Configuration Replication
my_dashboard.xml
C
Concurrent Changes
C
Custom App Content
• App devs may "opt in" their custom configurations for replication under search head clustering
• Example server.conf from an app would look like:
[shclustering]
conf_replication_include.my_custom_file = true
System Configurations
• Recall: only changes to search and UI configurations are replicated across the search head cluster automatically
• Changes to system configurations are not replicated automatically because of their high potential impact
• How are system configurations kept consistent then?
• Deployer: a single, well-controlled instance outside of the cluster
• Configurations should be tested on dev/QA instances prior to deploy
D
Configuration Deployment
UI
Tips & Tricks
Best Practices
• Deployer Instance
  – Can piggyback on the Cluster Master or Deployment Server
  – Recommendation is to run the Deployer on a separate instance
• Run CLI to get status about SHC
  – ./splunk show shcluster-status
Summary
Key Benefits of SHC
Horizontal Scaling
Always-on Search Services
Consistent User Experience
Easy to add / manage premium contents (apps)
Q & A
THANK YOU