Splunk talk at the AWS Big Data Meetup in Palo Alto on Nov 17 2015

Uploaded by stevemcpherson on 09-Jan-2017

Page 1: Splunk talk at the AWS Big Data Meetup in Palo Alto on Nov 17 2015

Data Through Splunk

Ledion Bitincka ([email protected]), Alex Batsakis ([email protected])

Architects

Page 2

Spelunking: to explore underground caves

Splunking: to explore machine data

Splunk: Make machine data accessible, usable and valuable to everyone.

Page 3

What Does Machine Data Look Like?

Sources: Twitter, Care IVR, Middleware Error, Order Processing

Page 4

Machine Data Contains Critical Insights

Sources: Twitter, Care IVR, Middleware Error, Order Processing

Highlighted fields: Customer ID, Order ID, Customer’s Tweet, Time Waiting On Hold, Twitter ID, Product ID, Company’s Twitter ID

Page 5

Machine Data Contains Critical Insights

Sources: Twitter, Care IVR, Middleware Error, Order Processing

Highlighted fields: Order ID, Customer ID, Twitter ID, Customer’s Tweet, Time Waiting On Hold, Product ID, Company’s Twitter ID

Page 6

Search, Investigate and Explore Your Data

Find and fix issues and incidents dramatically faster across your organization.

Figure labels: Energy, Manufacturing, Shipping, RFID, Web Services, Developers, App Support, Telecoms, Networking, Desktops, Servers, Security, Databases/DWH, Storage, Messaging, Online Shopping Carts, Clickstream, GPS/Cellular, Social Media

Page 7

Turning Machine Data into Operational Intelligence

From reactive to proactive:
- Search and Investigate
- Proactive Monitoring and Alerting
- Operational Visibility
- Real-time Business Insight

Page 8

Let’s drill down...

Page 9

Massive Linear Scalability to 100s of TBs/Day

- Auto load-balanced forwarding to as many Splunk Indexers as you need to index TBs/day
- Offload search load to Splunk Search Heads
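As a rough illustration of auto load-balanced forwarding, here is a minimal Python sketch of a forwarder that rotates through a list of indexers. The class name AutoLBForwarder, the indexer names and the "switch after N events" policy are hypothetical; this is not Splunk's forwarder configuration, and the real forwarder's switching policy may differ.

```python
from collections import Counter
from itertools import cycle


class AutoLBForwarder:
    """Hypothetical forwarder: sends `switch_every` events to one indexer,
    then moves on to the next indexer in round-robin order."""

    def __init__(self, indexers, switch_every=100):
        self._targets = cycle(indexers)
        self._current = next(self._targets)
        self._switch_every = switch_every
        self._sent_to_current = 0
        self.sent = Counter()  # events delivered per indexer

    def send(self, event):
        if self._sent_to_current == self._switch_every:
            self._current = next(self._targets)  # rotate to the next indexer
            self._sent_to_current = 0
        # A real forwarder would write `event` to a network connection here.
        self.sent[self._current] += 1
        self._sent_to_current += 1
        return self._current


fwd = AutoLBForwarder(["idx1", "idx2", "idx3"], switch_every=100)
for i in range(3000):
    fwd.send(f"event {i}")
print(dict(fwd.sent))  # {'idx1': 1000, 'idx2': 1000, 'idx3': 1000}
```

With a fourth indexer in the list, each indexer receives roughly a quarter of the events instead of a third, which is the linear-scaling idea on the slide.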

Page 10

How data moves through Splunk

Page 11

Consider this chunk of data from a log file, /var/log/secure.log:

    ...
    2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa
    2013/07/01T14:31:24.234-0400 Sean is originally Canadian
    2013/07/01T14:30:50.234-0400 Brian spends his time in:
     - Kentucky with phone number 345.567.3456
     - New Jersey
    2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:
     - Tijuana: 345 Main St.
     - Saskatchewan: 3 One Lane
     - Colombia: 567 White line Dr. Bogota
    2013/07/01T14:33:24.234-0400 Cesar prefers Burbon Manhattans over beer
    2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers
    2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them
    ...

Page 12

Pipeline Data

    Host     my_host
    Index    my_index
    _raw     2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa 2013/07/01T14:31:24.234-0400 Sean is originally Canadian 2013/07/01T14:30:50.234-0400 Brian spends his time in: ...
    _conf    <key here>

UTF-8, Line Broken

Page 13

Pipelines/Processors

Inputs: TCP/UDP pipeline, Tailing, FIFO pipeline, FSChange, Exec pipeline
  → Parsing Queue → Parsing Pipeline: utf8, linebreaker, header
  → Agg Queue → Merging Pipeline: aggregator
  → Typing Queue → Typing Pipeline: regex replacement, annotator
  → Index Queue → Index Pipeline: tcp out, syslog out, indexer
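To make the processor chain concrete, here is a minimal Python sketch of three of the processors named above (utf8, linebreaker, aggregator) plus a toy annotator, applied to the sample log from Page 11. The PipelineData class, its fields and the merge rule "a line without a leading timestamp belongs to the previous event" are simplifying assumptions, and the queues between pipelines are left out; it is not Splunk's implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineData:
    """Hypothetical stand-in for pData; field names mirror the Pipeline Data slide."""
    raw: str
    host: str = "my_host"
    index: str = "my_index"
    meta: Dict[str, str] = field(default_factory=dict)


# Assumed timestamp shape of the sample log, e.g. 2013/07/01T14:30:24
TIMESTAMP = re.compile(r"^\d{4}/\d{2}/\d{2}T\d{2}:\d{2}:\d{2}")


def utf8(chunk: bytes) -> PipelineData:
    """Parsing pipeline: decode the incoming bytes as UTF-8."""
    return PipelineData(raw=chunk.decode("utf-8", errors="replace"))


def linebreaker(p: PipelineData) -> List[PipelineData]:
    """Parsing pipeline: break the stream into non-empty lines."""
    return [PipelineData(line, p.host, p.index, dict(p.meta))
            for line in p.raw.splitlines() if line.strip()]


def aggregator(lines: List[PipelineData]) -> List[PipelineData]:
    """Merging pipeline: a line without a leading timestamp joins the previous event."""
    events: List[PipelineData] = []
    for line in lines:
        if events and not TIMESTAMP.match(line.raw):
            events[-1].raw += "\n" + line.raw
        else:
            events.append(line)
    return events


def annotator(events: List[PipelineData]) -> List[PipelineData]:
    """Typing pipeline (toy version): record how many lines each event spans."""
    for e in events:
        e.meta["linecount"] = str(e.raw.count("\n") + 1)
    return events


SAMPLE = (
    "2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa\n"
    "2013/07/01T14:31:24.234-0400 Sean is originally Canadian\n"
    "2013/07/01T14:30:50.234-0400 Brian spends his time in:\n"
    " - Kentucky with phone number 345.567.3456\n"
    " - New Jersey\n"
    "2013/07/01T14:32:24.234-0400 Matty has lived in the following cities:\n"
    " - Tijuana: 345 Main St.\n"
    " - Saskatchewan: 3 One Lane\n"
    " - Colombia: 567 White line Dr. Bogota\n"
    "2013/07/01T14:33:24.234-0400 Cesar prefers Burbon Manhattans over beer\n"
    "2013/07/01T14:33:24.234-0400 Matty loves GiGi Mellow Burgers\n"
    "2013/07/01T14:33:24.234-0400 Sean is not the only one to not like them\n"
).encode("utf-8")

events = annotator(aggregator(linebreaker(utf8(SAMPLE))))
print(len(events))  # 7
```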

Page 14

Queue

Figure: pData items wait in a Queue between threads; each thread removes pData from its input queue, processes it, and inserts it into the next queue.

- Queue size bounded by memory
- Variable size Pipeline Data
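A minimal Python sketch of the two properties listed above: the queue's capacity is a byte budget rather than an item count, so variable-size pData is charged by its size, and each pipeline thread removes an item from one queue, processes it and inserts it into the next. The class and function names, the 16-byte budget and the empty-bytes end-of-stream marker are assumptions made for illustration.

```python
import threading
from collections import deque


class ByteBoundedQueue:
    """FIFO whose capacity is a byte budget, so variable-size items are
    charged by their size rather than counted as one slot each."""

    def __init__(self, max_bytes: int):
        self._max_bytes = max_bytes
        self._items = deque()
        self._used = 0
        self._cond = threading.Condition()

    def put(self, data: bytes, timeout=None) -> bool:
        """Insert: block until `data` fits; return False if `timeout` expires."""
        size = len(data)
        if size > self._max_bytes:
            raise ValueError("item is larger than the whole queue")
        with self._cond:
            if not self._cond.wait_for(lambda: self._used + size <= self._max_bytes, timeout):
                return False
            self._items.append(data)
            self._used += size
            self._cond.notify_all()  # wake consumers waiting for data
            return True

    def get(self, timeout=None):
        """Remove: block until an item is available; return None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout):
                return None
            data = self._items.popleft()
            self._used -= len(data)
            self._cond.notify_all()  # wake producers waiting for space
            return data

    @property
    def used_bytes(self) -> int:
        with self._cond:
            return self._used


def run_stage(src: ByteBoundedQueue, dst: ByteBoundedQueue, process, end=b""):
    """One pipeline thread: remove from src, process, insert into dst, until `end`."""
    while True:
        item = src.get()
        if item == end:
            dst.put(end)  # pass the end-of-stream marker downstream
            return
        dst.put(process(item))


src, dst = ByteBoundedQueue(16), ByteBoundedQueue(16)
worker = threading.Thread(target=run_stage, args=(src, dst, bytes.upper))
worker.start()
for chunk in (b"abc", b"de", b""):
    src.put(chunk)
worker.join()
print([dst.get() for _ in range(3)])  # [b'ABC', b'DE', b'']
```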

Page 15

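Persistent Queue

Figure: a Splunk host receives data from the network into its Input Q and sends it on to the network through its Tcpout Q. When the host's internal queues are full, incoming pData goes to a Persistent Q, a much bigger queue.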
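The spill-over idea can be sketched in Python as a FIFO that keeps up to max_in_memory items in memory and appends anything beyond that to a file. The class name, the length-prefixed record layout and the temporary file are assumptions; Splunk's persistent queue has its own on-disk format and configuration, which the slide does not show.

```python
import struct
import tempfile
from collections import deque


class PersistentQueue:
    """FIFO that holds up to `max_in_memory` items in memory and spills the rest to disk."""

    def __init__(self, max_in_memory: int):
        self._mem = deque()
        self._max_in_memory = max_in_memory
        self._disk = tempfile.TemporaryFile()  # opened in binary read/write mode
        self._read_pos = 0  # file offset of the oldest record still on disk
        self._on_disk = 0   # number of records on disk

    def put(self, data: bytes) -> None:
        # Once anything is on disk, newer items must queue behind it to keep FIFO order.
        if self._on_disk or len(self._mem) >= self._max_in_memory:
            self._disk.seek(0, 2)  # whence=2: append at the end of the file
            self._disk.write(struct.pack(">I", len(data)) + data)  # 4-byte length prefix
            self._on_disk += 1
        else:
            self._mem.append(data)

    def get(self) -> bytes:
        if self._mem:  # in-memory items are always older than on-disk ones
            return self._mem.popleft()
        if not self._on_disk:
            raise IndexError("get from an empty queue")
        self._disk.seek(self._read_pos)
        (size,) = struct.unpack(">I", self._disk.read(4))
        data = self._disk.read(size)
        self._read_pos += 4 + size
        self._on_disk -= 1
        if not self._on_disk:  # fully drained: reclaim the file space
            self._disk.seek(0)
            self._disk.truncate()
            self._read_pos = 0
        return data

    def __len__(self) -> int:
        return len(self._mem) + self._on_disk


q = PersistentQueue(max_in_memory=2)
for i in range(5):
    q.put(f"pData {i}".encode())
print(len(q), [q.get() for _ in range(5)])
# 5 [b'pData 0', b'pData 1', b'pData 2', b'pData 3', b'pData 4']
```

Once anything has spilled, new items also go to disk until the file drains; that keeps every in-memory item older than every on-disk item, which is what makes the simple "memory first, then disk" read order correct.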

Page 16

Indexing

(Same Pipelines/Processors diagram as Page 13.)

Page 17

What’s an index?

A collective term for rawdata and the associated tsidx and metadata files.

Page 18

Inside an index

    [09:31:39] [1065]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/
    $ ls -l
    total 0
    drwx------   2 lbitincka  admin   68 Feb  6 12:57 colddb
    drwx------  17 lbitincka  admin  578 Jul  1 09:31 db
    drwx------  13 lbitincka  admin  442 Jun 27 16:36 summary
    drwx------   2 lbitincka  admin   68 Aug 24  2012 thaweddb

Callouts: Index name (_internaldb); Bucket locations (db, colddb, thaweddb)

Page 19

Inside hot & warm path

    [10:20:00] [1074]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/
    $ ll
    -rw-------   1 lbitincka  admin  1.3K Jun 27 13:50 .bucketManifest
    drwx------  17 lbitincka  admin  578B Jul  1 10:19 .
    drwx--x--x  17 lbitincka  admin  578B Jun 26 12:45 db_1372264972_1371998026_159
    drwx--x--x  16 lbitincka  admin  544B Jun 18 08:20 db_1371225002_1370897127_156
    drwx--x--x  16 lbitincka  admin  544B Jun 26 12:50 db_1371998025_1371214200_158
    drwx--x--x  14 lbitincka  admin  476B Jun 26 12:50 db_1372265194_1372264972_160
    drwx--x--x  14 lbitincka  admin  476B Jul  1 10:19 hot_v1_161
    drwx------   6 lbitincka  admin  204B Nov 12  2012 ..
    drwx--x--x   2 lbitincka  admin   68B Aug 24  2012 GlobalMetaData
    -rw-------   1 lbitincka  admin   10B Aug 24  2012 CreationTime
    -rw-------   1 lbitincka  admin    0B Dec 21  2012 .db_1356066789_1355865285_43.rbsentinel

Notice the hot and warm buckets. Bucket names: db_<lt>_<et>_<id>, where lt and et are the latest and earliest event times in the bucket (epoch seconds) and id is the bucket ID.
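Because each warm bucket's directory name carries its time range, a search can skip buckets without opening them. A minimal Python sketch, assuming the db_<lt>_<et>_<id> layout above; the helper names are hypothetical. Hot buckets (hot_v1_<id>) carry no times in their names, so this sketch simply leaves them out, whereas a real search would still have to consider them.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

BUCKET_NAME = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


class BucketRange(NamedTuple):
    latest: int     # <lt>: latest event time in the bucket (epoch seconds)
    earliest: int   # <et>: earliest event time in the bucket (epoch seconds)
    bucket_id: int  # <id>


def parse_bucket(name: str) -> Optional[BucketRange]:
    """Parse db_<lt>_<et>_<id>; return None for anything else (hot buckets, sentinels)."""
    m = BUCKET_NAME.match(name)
    if m is None:
        return None
    lt, et, bid = (int(g) for g in m.groups())
    return BucketRange(lt, et, bid)


def buckets_overlapping(names: Iterable[str], earliest: int, latest: int) -> List[str]:
    """Keep only buckets whose [et, lt] range overlaps the search window."""
    hits = []
    for name in names:
        b = parse_bucket(name)
        if b is not None and b.earliest <= latest and b.latest >= earliest:
            hits.append(name)
    return hits


listing = [
    "db_1372264972_1371998026_159",
    "db_1371225002_1370897127_156",
    "db_1371998025_1371214200_158",
    "db_1372265194_1372264972_160",
    "hot_v1_161",
]
print(buckets_overlapping(listing, 1372000000, 1372100000))
# ['db_1372264972_1371998026_159']
```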

Page 20

Inside a bucket

    [10:31:32] [1092]:: lbitincka@lbitincka: /opt/splunk/var/lib/splunk/_internaldb/db/db_1371998025_1371214200_158/
    $ ll
    -rw-------   1 lbitincka  admin   27M Jun 21 16:49 1371847782-1371214200-1941140693112088843.tsidx
    -rw-------   1 lbitincka  admin  7.1M Jun 26 12:43 1371998025-1371847783-907852835360656754.tsidx
    -rw-------   1 lbitincka  admin  2.5M Jun 26 12:43 merged_lexicon.lex
    -rw-------   1 lbitincka  admin  459K Jun 26 12:43 bloomfilter
    -rw-------   1 lbitincka  admin  1.3K Jun 23 10:33 Sources.data
    -rw-------   1 lbitincka  admin  615B Jun 23 10:33 SourceTypes.data
    drwx------  17 lbitincka  admin  578B Jul  1 10:31 ..
    drwx--x--x  16 lbitincka  admin  544B Jun 26 12:50 .
    -rw-------   1 lbitincka  admin  451B Jun 23 10:31 Strings.data
    drwx------   4 lbitincka  admin  136B Jun 26 12:42 rawdata
    -rw-------   1 lbitincka  admin  116B Jun 23 10:33 Hosts.data
    -rw-------   1 lbitincka  admin   76B Jun 23 10:33 splunk-autogen-params.dat
    -rw-------   1 lbitincka  admin   52B Jun 26 12:50 bucket_info.csv
    -rw-------   1 lbitincka  admin   49B Jun 26 12:43 optimize.result
    -rw-------   1 lbitincka  admin   10B Jun 26 12:43 .rawSize
    -rw-------   1 lbitincka  admin    8B Jun 26 12:43 .sizeManifest4.1

Page 21

Metadata & Bloomfilters
- *.data: metadata about the sources, sourcetypes and hosts of the events contained in each bucket
- Bloomfilters: an efficient data structure that authoritatively rules out buckets
  - i.e. tells you with 100% certainty that a query term is NOT present in a bucket
  - Consulted by every search by default
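A minimal Python sketch of the "rule out a bucket" check: one Bloom filter per bucket, built from that bucket's terms. A negative answer is definitive, while a positive answer only means the bucket might contain the term and still has to be searched. The bit-array size, the number of hashes and the salted SHA-256 hashing are arbitrary choices for this sketch, not Splunk's bloomfilter file format.

```python
import hashlib


class BloomFilter:
    """m-bit array with k hash positions per term."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term: str):
        # k positions from salted SHA-256 digests (an arbitrary choice for this sketch)
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term: str) -> None:
        for p in self._positions(term):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, term: str) -> bool:
        """False: the term is definitely absent. True: it may be present."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(term))


def buckets_to_search(filters: dict, term: str) -> list:
    """Drop every bucket whose Bloom filter rules the term out."""
    return [name for name, bf in filters.items() if bf.might_contain(term)]


filters = {"bucket_a": BloomFilter(), "bucket_b": BloomFilter()}
for term in ("brian", "tijuana"):
    filters["bucket_a"].add(term)
print(buckets_to_search(filters, "brian"))  # ['bucket_a']
```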

Page 22

Rawdata (not raw data)
- A collection of compressed (gzipped) blocks, called slices
  - Concatenated together in rawdata/journal.gz
  - Think "cat chunkA.gz chunkB.gz ... chunkN.gz > journal.gz"
- Slices contain the actual raw events
- The pool of concatenated slices can be seeked into
  - Location offsets are pointed to by the values array pointers in tsidx
- This organization lets us zoom in on the right slice
  - Reduces decompression time and volume compared to having a single, massive rawdata file
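The "cat chunkA.gz chunkB.gz ..." trick works because a file made of concatenated gzip members is still valid gzip. A minimal Python sketch, assuming each slice is compressed on its own and its starting byte offset is remembered (the function names are hypothetical): reading one event then means decompressing only the slice that holds it, while an ordinary gzip reader can still decompress the whole journal.

```python
import gzip
import zlib


def build_journal(slices):
    """Gzip each slice on its own and concatenate the results
    (like cat chunkA.gz chunkB.gz ... > journal.gz).
    Returns the journal bytes and each slice's starting byte offset."""
    journal = bytearray()
    offsets = []
    for raw in slices:
        offsets.append(len(journal))
        journal += gzip.compress(raw)
    return bytes(journal), offsets


def read_slice(journal: bytes, offset: int) -> bytes:
    """Seek to one slice and decompress only that gzip member."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect a gzip header
    # Decompression stops at the end of this member; the bytes of later
    # slices are left untouched in d.unused_data.
    return d.decompress(journal[offset:])


journal, offsets = build_journal([b"event A\n", b"event B\n", b"event C\n"])
print(read_slice(journal, offsets[1]))  # b'event B\n'
print(gzip.decompress(journal))         # b'event A\nevent B\nevent C\n'
```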

Page 23

TSIDX
- Time-series index (an inverted index optimized for time)
- Lexicon:
  - Keywords within the specified time range
  - Postings list array
- Values array:
  - Structure that contains posting values, seek address, _time, etc.
  - Seek address points to offsets in rawdata
- Time is of transcendent importance in Splunk
  - tsidx filenames expose et and lt
  - Values arrays are arranged in time order as well
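Two consequences of the time ordering, sketched in Python under stated assumptions: a tsidx file whose name (assumed here to follow the <lt>-<et>-<hash>.tsidx layout suggested by the listing on Page 20) misses the search window can be skipped unopened, and within a file a values array sorted by _time lets two binary searches find the events in a time window. ValueEntry and the helper names are hypothetical.

```python
import bisect
from typing import List, NamedTuple


class ValueEntry(NamedTuple):
    time: int       # _time, epoch seconds (sort key)
    posting: int    # event number used in the lexicon's posting lists
    seek_addr: int  # offset into rawdata


def tsidx_overlaps(filename: str, earliest: int, latest: int) -> bool:
    """Assumes names of the form <lt>-<et>-<hash>.tsidx."""
    stem = filename[: -len(".tsidx")]
    lt, et, _hash = stem.split("-")
    return int(et) <= latest and int(lt) >= earliest


def postings_in_window(values: List[ValueEntry], earliest: int, latest: int) -> List[int]:
    """`values` must be sorted by time. Tuples compare field by field, so the
    1-tuple (t,) sorts just before every entry whose time is t."""
    lo = bisect.bisect_left(values, (earliest,))
    hi = bisect.bisect_left(values, (latest + 1,))
    return [v.posting for v in values[lo:hi]]


values = [ValueEntry(100, 0, 130), ValueEntry(105, 2, 190),
          ValueEntry(110, 1, 150), ValueEntry(120, 3, 389)]
print(postings_in_window(values, 105, 110))  # [2, 1]
print(tsidx_overlaps("1371847782-1371214200-1941140693112088843.tsidx",
                     1371900000, 1371950000))  # False
```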

Page 24

Lexicon

Sample events: the /var/log/secure.log chunk shown on Page 11.

    Term      Posting List
    3         4
    345       3,4
    …         …
    Africa    0
    Brian     0,2
    Bogota    4
    …         …
    Matty     5,6
    Tijuana   4

Page 25

Values Array

Sample events: the /var/log/secure.log chunk shown on Page 11.

    Posting  Seek addr  _time       host     …
    0        130        1372689024  my_host  …
    1        150        1372689084  my_host  …
    2        190        1372689050  my_host  …
    3        389        1372689050  my_host  …
    4        589        1372689050  my_host  …
    5        800        1372689050  my_host  …
    6        1399       1372689050  my_host  …
    …        …

*All values are for illustration purposes only and not necessarily accurate.
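A minimal Python sketch of how the two structures could be built from events: every term in an event's text is appended to that term's posting list, and the event's values-array entry records where it starts in rawdata and its _time. Lowercasing, the alphanumeric tokenizer, skipping the timestamp text and using a character offset into an uncompressed string as the "seek address" are all simplifying assumptions; unlike the illustrative tables above, the numbers it produces are computed, so they do not match the slides.

```python
import re
from collections import defaultdict
from datetime import datetime

TS_FORMAT = "%Y/%m/%dT%H:%M:%S.%f%z"  # e.g. 2013/07/01T14:30:24.234-0400


def build_tsidx(events, host="my_host"):
    """Return (lexicon, values, rawdata) for a list of raw event strings."""
    lexicon = defaultdict(list)  # term -> posting list (event numbers)
    values = []                  # posting -> {seek_addr, _time, host}
    rawdata = ""
    for posting, raw in enumerate(events):
        seek_addr = len(rawdata)
        rawdata += raw + "\n"
        ts_text, text = raw.split(" ", 1)
        _time = int(datetime.strptime(ts_text, TS_FORMAT).timestamp())
        values.append({"seek_addr": seek_addr, "_time": _time, "host": host})
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            lexicon[term].append(posting)
    return dict(lexicon), values, rawdata


EVENTS = [
    "2013/07/01T14:30:24.234-0400 Brian pretends to be from South Africa",
    "2013/07/01T14:31:24.234-0400 Sean is originally Canadian",
    "2013/07/01T14:30:50.234-0400 Brian spends his time in:\n"
    " - Kentucky with phone number 345.567.3456\n - New Jersey",
]
lexicon, values, rawdata = build_tsidx(EVENTS)
print(lexicon["brian"])  # [0, 2]
print(values[1])         # {'seek_addr': 68, '_time': 1372703484, 'host': 'my_host'}
```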

Page 26

Tsidx merging
- Streaming data produces many small tsidx files
- Searching is inefficient when it has to go against many tsidx files
- splunk-optimize:
  - Merges small tsidx files into larger ones
  - Consolidates lexicons and posting lists
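A minimal Python sketch of the consolidation step, assuming each small tsidx is represented as a (lexicon, values array) pair: the values arrays are concatenated, each part's postings are shifted by the number of entries that precede it, and every term ends up with a single combined posting list. merge_tsidx is a hypothetical name; it is not splunk-optimize, and it ignores re-sorting by time and the on-disk layout.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[int]]
Values = List[dict]


def merge_tsidx(parts: List[Tuple[Lexicon, Values]]) -> Tuple[Lexicon, Values]:
    merged_lexicon: Lexicon = {}
    merged_values: Values = []
    for lexicon, values in parts:
        base = len(merged_values)  # this part's postings now start at `base`
        merged_values.extend(values)
        for term, postings in lexicon.items():
            # One consolidated posting list per term; it stays sorted because base only grows.
            merged_lexicon.setdefault(term, []).extend(base + p for p in postings)
    return merged_lexicon, merged_values


small_a = ({"brian": [0], "africa": [0]}, [{"_time": 1}])
small_b = ({"brian": [1], "sean": [0]}, [{"_time": 2}, {"_time": 3}])
lexicon, values = merge_tsidx([small_a, small_b])
print(lexicon)  # {'brian': [0, 2], 'africa': [0], 'sean': [1]}
```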

Page 27

Putting it together

Figure: three indexers (IDX 1, IDX 2, IDX 3). An index's buckets live under a Home Path, a Cold Path and a Thawed Path; bucket directories shown include hot_v1_100, hot_v1_101, db_lt_et_70, db_lt_et_80 and db_lt_et_101. Each bucket holds *.data files (Source/Sourcetype/Host metadata, e.g. "1 source::/my/log", "2 source::/blah"), *.tsidx files (a LEXICON of terms such as apple, beer, coke, cream, ice and java, each with a POSTING list) and rawdata (events such as "apple pie and ice cream is delicious" and "an apple a day keeps doctor away").

Page 28

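Bucket Lifecycle

- Events are written to hot buckets in the Home Path.
- [Hot Bucket is Full] → the hot bucket rolls to warm (still in the Home Path).
- [Too Many Warms] → the oldest warm buckets roll to the Cold Path [Cheaper Storage].
- [Out of Space or Bucket is Old] → cold buckets roll to the Frozen Path or are deleted.
- [Explicit User Action] → frozen buckets are restored into the Thawed Path.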
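The transitions can be sketched in Python as a small state machine. The thresholds (max_hot_events, max_warm, max_age) are hypothetical stand-ins for the "Hot Bucket is Full", "Too Many Warms" and "Bucket is Old" conditions on the slide; the "Out of Space" trigger and the physical moves between paths are not modeled, and the names are not Splunk configuration settings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bucket:
    bucket_id: int
    events: int = 0
    latest_time: int = 0
    state: str = "hot"  # hot -> warm -> cold -> frozen; frozen -> thawed


@dataclass
class Index:
    max_hot_events: int  # hypothetical "hot bucket is full" threshold
    max_warm: int        # hypothetical "too many warms" threshold
    max_age: int         # hypothetical "bucket is old" threshold, in seconds
    buckets: List[Bucket] = field(default_factory=list)

    def _in_state(self, state: str) -> List[Bucket]:
        return [b for b in self.buckets if b.state == state]  # oldest first

    def add_event(self, event_time: int) -> None:
        hot = self._in_state("hot")
        if not hot or hot[0].events >= self.max_hot_events:
            if hot:
                hot[0].state = "warm"  # [Hot Bucket is Full]
            hot = [Bucket(bucket_id=len(self.buckets))]
            self.buckets.append(hot[0])
        hot[0].events += 1
        hot[0].latest_time = max(hot[0].latest_time, event_time)
        self._roll(now=event_time)

    def _roll(self, now: int) -> None:
        warm = self._in_state("warm")
        for b in warm[: max(0, len(warm) - self.max_warm)]:
            b.state = "cold"  # [Too Many Warms]: oldest warm buckets go to cheaper storage
        for b in self._in_state("cold"):
            if now - b.latest_time > self.max_age:
                b.state = "frozen"  # [Bucket is Old]: archive or delete

    def thaw(self, bucket_id: int) -> None:
        bucket = self.buckets[bucket_id]
        if bucket.state != "frozen":
            raise ValueError("only frozen buckets can be thawed")
        bucket.state = "thawed"  # [Explicit User Action]


idx = Index(max_hot_events=2, max_warm=1, max_age=100)
for t in range(0, 100, 10):
    idx.add_event(t)
print([b.state for b in idx.buckets])  # ['cold', 'cold', 'cold', 'warm', 'hot']
```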

Page 29

How do we search?
- Consult the lexicon and combine the posting lists
  - brian OR tijuana => (0, 2) OR (4) = (0, 2, 4)
- Use the values array to get the seek address, _time, source and sourcetype for (0, 2, 4)
- Use the seek addresses to read rawdata at offsets (130, 190, 589)
- Send the "results" to the search
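The first three steps can be sketched in Python against the illustrative lexicon and values-array entries from the Lexicon and Values Array slides. Case-insensitive lookup (so that brian matches Brian, stored here in lowercase) and one event per line in rawdata are assumptions of this sketch, and the helper names are hypothetical.

```python
from typing import Dict, List

# Illustrative entries copied from the Lexicon and Values Array slides (terms lowercased).
LEXICON: Dict[str, List[int]] = {"brian": [0, 2], "tijuana": [4]}
SEEK_ADDR: Dict[int, int] = {0: 130, 1: 150, 2: 190, 3: 389, 4: 589, 5: 800, 6: 1399}


def or_postings(lexicon: Dict[str, List[int]], terms: List[str]) -> List[int]:
    """Step 1: union the posting lists of all terms (an OR search)."""
    postings = set()
    for term in terms:
        postings.update(lexicon.get(term.lower(), []))
    return sorted(postings)


def seek_addresses(seek_addr: Dict[int, int], postings: List[int]) -> List[int]:
    """Step 2: look each posting up in the values array."""
    return [seek_addr[p] for p in postings]


def read_event(rawdata: str, offset: int) -> str:
    """Step 3: read one event starting at `offset` (assumes one event per line)."""
    end = rawdata.find("\n", offset)
    return rawdata[offset:] if end == -1 else rawdata[offset:end]


postings = or_postings(LEXICON, ["brian", "tijuana"])
print(postings, seek_addresses(SEEK_ADDR, postings))  # [0, 2, 4] [130, 190, 589]
```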

Page 30

Search Model Example

sourcetype=syslog ERROR | top user | fields - percent

Disk
  → sourcetype=syslog ERROR: fetch events from disk, apply schema → intermediate results table
  → top user: summarize into a table of the top 10 users → intermediate results table
  → fields - percent: remove the column showing the percentage → final results table
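To show the pipe model, here is a Python sketch that mimics the three stages of sourcetype=syslog ERROR | top user | fields - percent on in-memory events: each stage takes a results table (a list of dicts) and returns a new one. The user=<name> extraction regex stands in for search-time schema, the case-sensitive keyword match is a simplification, and the function names are hypothetical; this is not how Splunk executes SPL.

```python
import re
from collections import Counter


def search(events, sourcetype, keyword):
    """Stage 1 (sourcetype=syslog ERROR): fetch matching events and apply a schema
    at read time by extracting a hypothetical user=<name> field."""
    table = []
    for ev in events:
        if ev["sourcetype"] == sourcetype and keyword in ev["_raw"]:
            m = re.search(r"user=(\w+)", ev["_raw"])
            table.append({"_raw": ev["_raw"], "user": m.group(1) if m else None})
    return table


def top(table, field, limit=10):
    """Stage 2 (top user): most common values of `field` with count and percent."""
    values = [row[field] for row in table if row.get(field) is not None]
    total = len(values)
    return [{field: value, "count": count, "percent": 100.0 * count / total}
            for value, count in Counter(values).most_common(limit)]


def fields_minus(table, *columns):
    """Stage 3 (fields - percent): drop the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in table]


EVENTS = [
    {"sourcetype": "syslog", "_raw": "ERROR login failed user=alice"},
    {"sourcetype": "syslog", "_raw": "ERROR disk full user=bob"},
    {"sourcetype": "syslog", "_raw": "ERROR timeout user=alice"},
    {"sourcetype": "syslog", "_raw": "INFO login ok user=carol"},
    {"sourcetype": "access", "_raw": "ERROR 500 user=dave"},
]
result = fields_minus(top(search(EVENTS, "syslog", "ERROR"), "user"), "percent")
print(result)  # [{'user': 'alice', 'count': 2}, {'user': 'bob', 'count': 1}]
```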

Page 31

What can we do with events? It’s not just search...
- SPL = Search Processing Language
  - Inspired by *nix pipes
  - Schema on read
  - 130+ search commands for slicing through data
- Versatile visualization library
- Scheduling and alerting
- ...

Page 32

Roles: LOB Owners/Executives, System Administrator, Operations Teams, Security Analysts, IT Executives, Application Developers, Auditors, Website/Business Analysts, Customer Support

Solution areas: IT Operations Management, Web Intelligence, Business Analytics, Application Management, Security and Compliance

Page 33

Take it for a spin...

http://www.splunk.com/download/
- Download
- Try Splunk Cloud – AWS

WE’RE HIRING!! (in SF & the Valley)