Hunk 6.1

Ledion Bitincka, Principal Architect, Splunk

Copyright © 2014 Splunk Inc.


Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

About Me

• Principal Architect
• 7+ years at Splunk
• Mainly involved in search-time stuff:
  – Hunk
  – Key-value pair extraction
  – Scheduler & Alerting
  – Transactions, eventtypes, tags, etc.
  – MySQLConnect, HadoopConnect
• @ledbit

Agenda

• The problem
• Hunk architecture
• Virtual indexes
• Computation models
• What’s new in 6.1

Got Problem?

The Problem

• Easy to get data into Hadoop
• Large amounts of data already in Hadoop
• Hard to get value out

Data → Value (Today)

[Figure: three stages: Collect, Prepare, Ask]

Data → Value (Ideally)

[Figure: three stages: Collect, Prepare, Ask]

What If?

Hadoop + Splunk = Hunk

Solution Goals

• A viable solution must:
  – Process the data in place
  – Maintain support for the Search Processing Language (SPL)
  – Provide true schema on read
  – Offer query previews
  – Be easy to set up & use

Support SPL

• Naturally suitable for MapReduce
• Reduces adoption time
• Challenge: Hadoop “apps” are written in Java & all SPL code is in C++
• Porting SPL to Java would be a daunting task
• Reuse the C++ code somehow
  – Use “splunkd” (the binary) to process the data
  – JNI is neither easy nor stable
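The idea of reusing a native binary instead of binding to it through JNI can be sketched as a task that pipes its input split through a child process. This is only an illustration: the talk does not describe how Hunk invokes splunkd or what the wire format is, so `CHILD_PROGRAM` and `process_split` are hypothetical names, and a tiny Python script stands in for the binary.

```python
import subprocess
import sys

# Stand-in for the native binary: upper-cases each line it reads from stdin.
# In the talk the real worker is splunkd; its command line and wire format
# are not described, so this child program is purely hypothetical.
CHILD_PROGRAM = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def process_split(records):
    """Feed one input split to a child process and return its output lines."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_PROGRAM],
        input="".join(record + "\n" for record in records),
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return proc.stdout.splitlines()
```

The design point is that the two sides share only a byte stream, so the calling task never has to load the native code into its own process.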

Schema on Read

• Apply Splunk’s index-time schema at search time
  – Event breaking, time stamping, etc.
• Anything else would be brittle & a maintenance nightmare
• Extremely flexible
• Runtime overhead (manpower cost >> computation cost)
• Challenge: Hadoop “apps” are written in Java & all index-time schema logic is implemented in C++

Intermediate Results

• No one likes to stare at a blank screen!
• Challenge: Hadoop is designed for batch-like jobs

Ease of Setup & Use

• Users should just specify:
  – Hadoop cluster they want to use
  – Data within the cluster they want to process
• They should then immediately be able to explore & analyze their data

Architecture

Hunk Server

[Figure: Hunk server stack on a 64-bit Linux OS]
• Interfaces: REST API, command line, ODBC (beta)
• Capabilities: Explore, Analyze, Visualize, Dashboards, Share
• splunkweb: web and application server (Python, AJAX, CSS, XSLT, XML)
• splunkd: search head, virtual indexes (C++, web services)
• Hadoop interface: Hadoop client libraries (Java)

Connecting to Hadoop

Connect to Apache HDFS and MapReduce or your choice of Hadoop distribution.

[Figure: the Hunk server connected to Hadoop Cluster 1]

Multiple Hadoop Clusters

Connect Hunk to multiple Hadoop clusters.

[Figure: the Hunk server connected to Hadoop Clusters 1, 2 and 3]

Deployment Overview (Advanced)

• Load balance users across search heads
• Hunk Search Head pooling/cluster
• Multiple Hadoop clusters

[Figure: a load balancer (LB) in front of Hunk search heads 1 … n, connected to Cluster 1, Cluster 2 and Cluster 3]

Virtual Indexes

SPL Overview

search index=main | top user | fields - percent

• Search Processing Language = SPL
• Motivated by Unix shell pipes
• First command is always responsible for event retrieval
  – Generally, events are retrieved from Splunk’s native indexes
• Follow-on commands transform events to final results
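The pipe model can be illustrated by mimicking the example search `search index=main | top user | fields - percent` with plain Python functions. This is a rough sketch, not how SPL is implemented: the function names, the in-memory `EVENTS` list, and the event layout are all assumptions made for illustration.

```python
from collections import Counter


def search(events, index):
    """Retrieval stage: yield the events that belong to the given index."""
    return (event for event in events if event.get("index") == index)


def top(events, field, limit=10):
    """Transforming stage: most common values of a field, with count and percent."""
    counts = Counter(event[field] for event in events if field in event)
    total = sum(counts.values())
    return [
        {field: value, "count": n, "percent": 100.0 * n / total}
        for value, n in counts.most_common(limit)
    ]


def fields_minus(rows, *names):
    """Drop the named fields from every row, like `fields - <name>`."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


# Hypothetical sample events.
EVENTS = [
    {"index": "main", "user": "alice"},
    {"index": "main", "user": "bob"},
    {"index": "main", "user": "alice"},
    {"index": "other", "user": "carol"},
]


def run_pipeline(events):
    """Chain the three stages in the same order as the SPL pipeline."""
    return fields_minus(top(search(events, "main"), "user"), "percent")
```

Only the first stage touches the event source; every later stage sees just the output of the stage before it, which is what lets the retrieval stage be swapped for a different backend.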

Native Indexes vs. Virtual Indexes

Native                               Virtual
Serve as data containers             Serve as data containers
Access control                       Access control
Read/writes                          Read only
Data retention policies              –
Optimized for keyword searches       –
Optimized for time range searches    Available via regex/pruning
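Time-range searching "via regex/pruning" can be pictured as extracting a time span from each file path and skipping paths that cannot overlap the search window. The sketch below assumes a hypothetical one-directory-per-day layout; the regex, the function names, and the choice to keep undated paths are illustrative assumptions, not Hunk's actual configuration.

```python
import re
from datetime import datetime, timedelta

# Hypothetical layout: one directory per day, e.g. /data/weblogs/2014/05/17/...
DAY_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def path_time_range(path):
    """Return the (start, end) of the day encoded in the path, or None."""
    match = DAY_IN_PATH.search(path)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    start = datetime(year, month, day)
    return start, start + timedelta(days=1)


def prune(paths, earliest, latest):
    """Keep paths whose day overlaps [earliest, latest).

    Paths with no recognizable date are kept, because they cannot be
    ruled out without reading them.
    """
    kept = []
    for path in paths:
        span = path_time_range(path)
        if span is None or (span[0] < latest and span[1] > earliest):
            kept.append(path)
    return kept
```

For a search over 17 May 2014, a path under `/2014/05/16/` is skipped without being opened, while a path under `/2014/05/17/` is read.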

Hunk’s Core Technology

• Virtual Indexes (VIX)
• External Result Providers (ERPs)

External Result Providers

• Search-time helper process responsible for:
  – Accessing the external system (e.g. Hadoop, Cassandra, RDBMSs, etc.)
  – Translating/interpreting the search request
  – Pushing computation to the external system

External Result Providers (ERPs)

[Figure: on the Hunk search head, the search process spawns one ERP process each for Cluster 1, Cluster 2 and Cluster 3]

For each Hadoop cluster (or external system), the search process spawns an ERP process, which is responsible for executing the (remote part of the) search on that system.
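The one-helper-per-cluster fan-out can be sketched as follows. Threads stand in for the separate ERP processes to keep the example short, and the cluster names, the in-memory `CLUSTERS` data, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-cluster data; a real ERP would talk to an actual cluster.
CLUSTERS = {
    "cluster1": ["alice", "bob", "alice"],
    "cluster2": ["bob"],
    "cluster3": ["carol", "alice"],
}


def run_erp(cluster_name):
    """Stand-in for one ERP: run the remote part of the search on one
    cluster and return partial results (here, per-user counts)."""
    partial = {}
    for user in CLUSTERS[cluster_name]:
        partial[user] = partial.get(user, 0) + 1
    return partial


def search_all_clusters(cluster_names):
    """Stand-in for the search process: one helper per cluster, then merge."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(cluster_names)) as pool:
        for partial in pool.map(run_erp, cluster_names):
            for user, count in partial.items():
                merged[user] = merged.get(user, 0) + count
    return merged
```

The search process only merges partial results; everything specific to an external system stays inside its helper.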

Computation Models

Move Data to Computation (Streaming)

• Move data from HDFS to Search Head
• Process it in a streaming fashion
• Visualize the results
• Problem?

Move Computation to Data (Reporting)

• Create and start a MapReduce job to do the processing
• Monitor the MR job & collect its results
• Merge the results and visualize
• Problem?

Search Modes

Mode         Latency   Throughput   How it works
Streaming    Low       Low          Pull data from HDFS to the search head (SH) for processing
Reporting    High      High         Push compute down to the DataNodes/TaskTrackers (DN/TT) and consume results
Mixed Mode   Low       High         Start both Streaming and Reporting modes; show Streaming results until Reporting starts to complete

Low latency = interactivity = VALUE
High throughput = process larger datasets = VALUE

Mixed Mode

• Use both computation models concurrently

[Figure: timeline with two lanes, Stream and MR, against Time. Events in order: MR job submitted, MR job starts, MR tasks start to complete, switch-over time. Stream previews are shown until the switch-over; MR previews follow, then results.]
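The switch-over can be sketched as a merge of two preview timelines. This is a simplified model under stated assumptions: previews are `(time, result)` pairs, the switch-over happens at the first MapReduce preview, and the function name is hypothetical. The talk does not specify how Hunk actually decides the switch-over.

```python
def mixed_mode_previews(stream_events, mr_events):
    """Merge two timelines of (time, result) previews.

    Streaming previews are shown until the first MapReduce preview arrives
    (the switch-over time); after that only MapReduce previews are shown.
    """
    switch_over = min((t for t, _ in mr_events), default=float("inf"))
    shown = [(t, "stream", r) for t, r in stream_events if t < switch_over]
    shown += [(t, "mr", r) for t, r in mr_events]
    return sorted(shown)
```

With streaming previews at times 1, 2 and 6 and MapReduce previews at times 5 and 8, the user sees the two early streaming previews and then both MapReduce previews; the streaming preview at time 6 is dropped because it arrives after the switch-over.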

New in 6.1

More Data …

• Wider support for Hadoop native data formats

Format     Description                              Support
Sequence   Key-value store                          Yes
Avro       Complex objects, with embedded schema    Yes
RC / ORC   Columnar, commonly used by Hive          Yes
Parquet    Columnar, commonly used by Impala        Yes
Custom     Any other Hadoop file format             Yes

Faster …

Report Acceleration

• Accelerate searches on virtual indexes served by the Hadoop results provider by reusing Mapper results
• This allows Hunk to accelerate saved searches rather than re-computing the same search
• This feature is identical to Report Acceleration on Splunk Enterprise.
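Reusing Mapper results can be pictured as a cache of per-split map output keyed by the saved search. The sketch below is only a mental model: the class, its in-memory dictionary, and the split identifiers are hypothetical, and the talk does not say where or how Hunk stores these results.

```python
class MapperResultCache:
    """Reuse per-split map output across runs of the same saved search."""

    def __init__(self, map_fn):
        self._map_fn = map_fn
        self._cache = {}     # (search, split_id) -> map output
        self.map_calls = 0   # how many splits were actually (re)computed

    def run(self, search, splits):
        """splits maps split_id -> list of records; returns merged counts."""
        merged = {}
        for split_id, records in splits.items():
            key = (search, split_id)
            if key not in self._cache:
                self._cache[key] = self._map_fn(records)
                self.map_calls += 1
            for value, count in self._cache[key].items():
                merged[value] = merged.get(value, 0) + count
        return merged


def count_users(records):
    """Example map function: count occurrences of each user in one split."""
    out = {}
    for user in records:
        out[user] = out.get(user, 0) + 1
    return out
```

On a second run of the same search, only splits that were not seen before are mapped again; the rest are served from the cache.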

Secure …

Pass-through authentication

• Use LDAP/AD or stand-alone authentication
• Provide role-based security for Hadoop clusters
• Access Hadoop resources under security and compliance
• Integrates with Kerberos for Hadoop security

Open …

Streaming Resource Libraries

• Developers stream data for rapid exploration and visualization
• Accumulo/Sqrrl and MongoDB are available on apps.splunk.com

Summary of 6.1

More data … Faster … Secure … Open …


Coming  Up  in  6.2  

Helpful resources

• Download
  – http://www.splunk.com/hunk
• Help & Docs
  – http://docs.splunk.com/Documentation/Hunk/latest/Hunk/MeetHunk
• Community resource
  – http://answers.splunk.com