October 15, 2015 Optimize your IT Infrastructure with Scalar, EMC and Splunk

Upload: scalar-decisions

Post on 14-Apr-2017


Page 1: Optimize IT Infrastructure

October 15, 2015

Optimize your IT Infrastructure with Scalar, EMC and Splunk

Page 2: Optimize IT Infrastructure

Scalar leads Canadian Business to the Next Generation of IT through Innovation, Expertise & Service

Page 3: Optimize IT Infrastructure

DAVID WIEDASECK, Sr. Partner Sales Engineer

JEFFREY WIGGINS, ETD SE Manager

MICHAEL TRAVES, Solutions Architect

Page 4: Optimize IT Infrastructure

© 2015 Scalar Decisions Inc. Not for distribution outside of intended audience.

Scalar Client Solutions

§  Security: Context-Based Enterprise Security

§  Infrastructure: Integration of Emerging Technologies

§  Cloud: Hybrid Cloud Solutions

Page 5: Optimize IT Infrastructure


Splunk Analytics – Use Cases

Operational Intelligence

§  IT Operations: Utilization, Capacity Growth

§  Security: Fraud Detection, Real-time Detection of Threats, Forensics

§  Internet of Things (IoT): Sensor Data, Machine-to-Machine, Machine-Human Interactions

Page 6: Optimize IT Infrastructure


Consulting – Solution Design

§  Business Drivers
    –  Alignment with IT
    –  Stakeholders and Big Data Teams (Data Scientists, Business Analysts, Marketing, IT, CxO, Dir.)

§  Sizing
    –  Ingest Performance and Scalability, Search & Index

§  Infrastructure – Scale Out
    –  Compute (Virtual, Physical)
    –  Network (1/10/40GbE)
    –  Storage (Hot/Warm and Cold/Frozen Tiers)
    –  Data Security and Protection (Distributed or Consolidated)

Page 7: Optimize IT Infrastructure


Consulting – Deployment

§  Build
    –  Pilot and Pre-production
    –  Proof of Value
    –  Integration with Big Data and Data Lake Initiatives

§  Validate
    –  Performance and Scalability
    –  Availability

§  Customize
    –  Dashboards
    –  Reporting and Alerting

Page 8: Optimize IT Infrastructure


Page 9: Optimize IT Infrastructure

We want to work with YOU


Page 10: Optimize IT Infrastructure

But why should you work with US?


Page 11: Optimize IT Infrastructure


Top Tier Technical Talent

§  Engineers average 15 years of experience

§  World-class experts from some of the leading organizations in the industry

§  Dedicated PMO, finance, sales and operations teams

Page 12: Optimize IT Infrastructure

Copyright  ©  2013  Splunk,  Inc.  

Splunk Big Data Analytics

Page 13: Optimize IT Infrastructure

Machine  Data  OR  Big  Data?  

Page 14: Optimize IT Infrastructure

SPLUNK - MAKE MACHINE DATA ACCESSIBLE, USABLE AND VALUABLE TO EVERYONE

What is Machine Data: https://youtu.be/3YEE3RfXVVA

Page 15: Optimize IT Infrastructure

The Power of Splunk

•  COLLECT DATA FROM ANYWHERE
•  SEARCH AND ANALYZE EVERYTHING
•  GAIN REAL-TIME DATA INTELLIGENCE

Page 16: Optimize IT Infrastructure

Turning Machine Data Into Business Value

Index Untapped Data: Any Source, Type, Volume
•  Sources: Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID
•  Deployment: On-Premises, Private Cloud, Public Cloud

Ask Any Question
•  Application Delivery
•  Security, Compliance and Fraud
•  IT Operations
•  Business Analytics
•  Industrial Data and the Internet of Things

Page 17: Optimize IT Infrastructure

What Does Machine Data Look Like?

Sources: Order Processing, Twitter, Care IVR, Middleware Error

Page 18: Optimize IT Infrastructure

Machine Data Contains Critical Insights

Sources: Order Processing, Twitter, Care IVR, Middleware Error

Fields highlighted in the sample events: Customer ID, Order ID, Product ID, Customer’s Tweet, Twitter ID, Company’s Twitter ID, Time Waiting On Hold

Page 19: Optimize IT Infrastructure

Machine Data Contains Critical Insights

Sources: Order Processing, Twitter, Care IVR, Middleware Error

Fields highlighted in the sample events: Order ID, Customer ID, Product ID, Customer’s Tweet, Twitter ID, Company’s Twitter ID, Time Waiting On Hold

Page 20: Optimize IT Infrastructure

SPLUNK TODAY: Platform for Machine Data

Diagram labels: Mainframe Data; VMware; Exchange; PCI; Security; DB Connect; Mobile; Forwarders; Syslog, TCP, Other; Sensors, Control Systems; Stream; 600+ Ecosystem of Apps

Page 21: Optimize IT Infrastructure

Splunk  Use  Cases  

Page 22: Optimize IT Infrastructure

IT Operations

Diagram labels: API; SDKs; UI; Server, Storage, Network; Server Virtualization; Operating Systems; Custom Applications; Business Applications; Cloud Services; App Performance Monitoring; Ticketing/Other; Web Intelligence; Mobile Applications

Page 23: Optimize IT Infrastructure

Security

Data sources: Servers; Storage; Desktops; Email; Web; Transaction Records; Network Flows; DHCP/DNS; Hypervisor; Custom Apps; Physical Access; Badges; Threat Intelligence; Mobile; CMDB

Traditional SIEM: Intrusion Detection; Firewall; Data Loss Prevention; Anti-Malware; Vulnerability Scans; Authentication

Page 24: Optimize IT Infrastructure

Business Intelligence: Soda Company Use Case

•  Soda Company extracts data from vending machines, social media, and loyalty programs
    –  Distribution
    –  New product development
    –  Insight into consumer buying patterns
•  "Without data you're just a person with an opinion."
•  Customers face challenges with “data cartels” within their organization
•  Need to “free the data lake” from rigid structured data warehouse applications

Page 25: Optimize IT Infrastructure

Analytics

•  What we are looking for, and why, will depend on who we ask
    –  What are the normal characteristics for a dog?
        ·  Dog Show: height, weight, coat, gait, posture
        ·  Veterinarian: immunizations, history of illness, injuries, diet
        ·  Parent: suitability for children, temperament, allergies
        ·  Data Scientist: mean +/- standard deviation

Chart labels: mean + std. dev; mean; mean – std. dev
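The data scientist's view of "normal" (mean +/- standard deviation) can be sketched in a few lines of Python. This is only an illustration: the function names, the sample values, and the one-standard-deviation default are assumptions, not something taken from the slides.

```python
from statistics import mean, stdev


def normal_band(samples, k=1.0):
    """Return (mean - k*std, mean, mean + k*std) for the samples."""
    center = mean(samples)
    spread = stdev(samples)  # sample standard deviation
    return center - k * spread, center, center + k * spread


def outliers(samples, k=1.0):
    """Return the samples that fall outside the mean +/- k*std band."""
    lower, _, upper = normal_band(samples, k)
    return [x for x in samples if x < lower or x > upper]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds (made-up values).
    response_ms = [100, 102, 98, 101, 99, 180]
    print(normal_band(response_ms))
    print(outliers(response_ms))
```

A wider band (for example k=2 or k=3) flags fewer points; which width is appropriate depends on who is asking, which is the point the slide makes.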

Page 26: Optimize IT Infrastructure

Internet of Things

Search (Machine Data)
–  Search Results (Application Logs)
–  Device ID (MAC Address)
–  Time of Search

Billing (Structured Data)
–  Content Purchased (IDA #)
–  Device (MAC Address)
–  Time of Search
–  Amount of Purchase ($)

Correlation Criteria
•  Same MAC address
•  Content in Search Results
•  Purchase time
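The correlation criteria above (same MAC address, purchased content present in the search results, purchase time close to the search) can be illustrated with a small Python sketch. The record layout, the field names (`mac`, `results`, `content_id`, `amount`) and the 30-minute window are hypothetical choices made for this example only.

```python
from datetime import datetime, timedelta


def correlate(searches, purchases, window=timedelta(minutes=30)):
    """Pair each purchase with the searches that plausibly led to it."""
    matches = []
    for p in purchases:
        for s in searches:
            same_device = s["mac"] == p["mac"]
            content_seen = p["content_id"] in s["results"]
            delay = p["time"] - s["time"]
            in_window = timedelta(0) <= delay <= window
            if same_device and content_seen and in_window:
                matches.append((s, p))
    return matches


def revenue_driven_by_search(matches):
    """Sum purchase amounts, counting each purchase only once."""
    amounts = {}
    for _, p in matches:
        amounts[id(p)] = p["amount"]
    return sum(amounts.values())


if __name__ == "__main__":
    searches = [{"mac": "aa:bb", "time": datetime(2015, 10, 15, 20, 0),
                 "results": ["m1", "m2"]}]
    purchases = [{"mac": "aa:bb", "time": datetime(2015, 10, 15, 20, 10),
                  "content_id": "m2", "amount": 4.99}]
    print(revenue_driven_by_search(correlate(searches, purchases)))
```

The nested loop is fine for a sketch; at real volumes the join would be done by the search platform rather than in application code.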

Business Value
•  Revenues driven by Search
•  Improving local content mix
•  Better search results
•  Tailor content promotion

Page 27: Optimize IT Infrastructure

How  Splunk  Stores  Data  

Page 28: Optimize IT Infrastructure

How Splunk Stores Data

•  As Splunk indexes your data, it creates a number of files
    –  Raw data in compressed form (rawdata)
    –  Indexes that point to the raw data, plus some metadata files (index files)
•  The index files reside in directories known as “buckets”
•  A bucket moves through several stages as it ages
    –  Hot & Warm: $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*
    –  Cold: $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/
    –  Frozen: archive (can be thawed and then searched again)
•  Bucket name format: db_<newest_time>_<oldest_time>_<localid>_<guid>
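To make the bucket name format concrete, here is a minimal Python sketch that splits a name of the form db_<newest_time>_<oldest_time>_<localid>_<guid> into its parts. It assumes the two time fields are Unix epoch seconds and that the GUID consists of hexadecimal digits and hyphens; the example bucket name is made up.

```python
import re
from datetime import datetime, timezone

# Assumed layout: db_<newest_time>_<oldest_time>_<localid>_<guid>
BUCKET_NAME = re.compile(
    r"^db_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<localid>\d+)_(?P<guid>[0-9A-Fa-f-]+)$"
)


def parse_bucket_name(name):
    """Split a bucket directory name into its time range, local id and GUID."""
    m = BUCKET_NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised bucket name: {name!r}")
    return {
        # Assumption: both time fields are epoch seconds in UTC.
        "newest": datetime.fromtimestamp(int(m["newest"]), tz=timezone.utc),
        "oldest": datetime.fromtimestamp(int(m["oldest"]), tz=timezone.utc),
        "localid": int(m["localid"]),
        "guid": m["guid"],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    info = parse_bucket_name(
        "db_1444867200_1444780800_12_A1B2C3D4-0000-4000-8000-123456789ABC"
    )
    print(info["oldest"], "to", info["newest"])
```

Because the time range is encoded in the name, a search over a given time window can skip buckets whose range does not overlap it.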

Page 29: Optimize IT Infrastructure

Splunk Index Buckets

Bucket Stage | Description | Searchable?
Hot | Newly indexed data. One or more hot buckets per index. | Yes
Warm | Data rolled from hot. There are many warm buckets. | Yes
Cold | Data rolled from warm. There are many cold buckets. | Yes
Frozen | Data rolled from cold. Splunk deletes frozen data by default, but it can also be archived. Archived data can later be thawed. | Can be (after thawing)

Page 30: Optimize IT Infrastructure

Storage Considerations

•  Storage requirements != Index Volume (GB/day)
    –  Search profile and number of searches are just as important
    –  Data retention must also be considered
•  Splunk utilizes I/O to perform both searching AND indexing
    –  Load = Search Volume + Indexing Volume
    –  Index load is write intensive
    –  Search load is read intensive against the data searched (current vs. recent vs. old)
    –  SSDs generally provide higher performance than HDDs, but at a cost
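As a rough illustration of why storage is not simply the daily index volume, the sketch below multiplies daily volume by retention and by an on-disk ratio. The function name and the 0.5 ratio are assumptions for the example only; the real ratio depends on the data and should be measured, and the slides do not give one.

```python
def retained_storage_gb(daily_index_gb, retention_days, on_disk_ratio=0.5):
    """Estimate disk needed for one copy of the retained data.

    on_disk_ratio is a hypothetical placeholder for how much disk
    (compressed raw data plus index files) one GB of ingested data uses.
    """
    if daily_index_gb < 0 or retention_days < 0 or on_disk_ratio <= 0:
        raise ValueError("inputs must be non-negative, ratio must be positive")
    return daily_index_gb * retention_days * on_disk_ratio


if __name__ == "__main__":
    # Hypothetical deployment: 100 GB/day kept for 90 days.
    print(retained_storage_gb(100, 90))
```

This covers capacity only. The I/O side (Load = Search Volume + Indexing Volume) depends on the search profile and is not captured by a capacity formula.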

Page 31: Optimize IT Infrastructure

Storage Considerations

•  What is the use case?
    –  IT Operations use cases typically search against recent data (e.g. 0 to 14 days)
    –  Security and Analytics use cases typically search all data (e.g. days to months to years)
•  What is the typical time span of the data searched?
    –  Most ad-hoc searches are against current or recent data
    –  Analytics may span a very large time frame
    –  Security forensics typically search all data
    –  Reports or alerting searches might cover the past day or week
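The link between search time span and storage tier can be sketched as follows. This is a simplified model that assumes buckets move between tiers purely by age; the function name and the day counts are illustrative assumptions, not values from the slides.

```python
def tiers_touched(search_span_days, hot_warm_days, cold_days):
    """Return the storage tiers read by a search over the last N days.

    Simplification: data younger than hot_warm_days is on the hot/warm
    tier, the next cold_days are on the cold tier, anything older is frozen.
    """
    tiers = []
    if search_span_days > 0:
        tiers.append("hot/warm")
    if search_span_days > hot_warm_days:
        tiers.append("cold")
    if search_span_days > hot_warm_days + cold_days:
        tiers.append("frozen (must be thawed first)")
    return tiers


if __name__ == "__main__":
    # Hypothetical tiering: 14 days hot/warm, then 76 days cold.
    print(tiers_touched(7, 14, 76))    # an IT operations style search
    print(tiers_touched(365, 14, 76))  # a forensics style search
```

Under this model, a workload dominated by short-span searches mostly exercises the fast tier, while long-span security or analytics searches make the performance of the cold tier matter as well.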

Page 32: Optimize IT Infrastructure

Splunk Index Replication – High Availability

1.  Master auto-detects that a peer is down
2.  Master asks the redundant peer to act as primary
3.  Peers copy the search files / index files / raw data

•  Default is 3X replication
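The three failover steps can be illustrated with a toy model in Python. This is not Splunk's implementation: the data structure (a bucket mapped to the list of peers holding a copy, with the first entry treated as primary) and the function name are assumptions made for the sketch.

```python
def handle_peer_failure(copies, failed_peer, all_peers, replication_factor=3):
    """Toy model of the failover steps for one failed peer.

    copies maps a bucket id to the peers holding a copy; the first
    entry is treated as the primary copy.
    """
    live = [p for p in all_peers if p != failed_peer]  # step 1: peer is down
    result = {}
    for bucket, holders in copies.items():
        # Step 2: drop the failed peer; the next copy in line becomes primary.
        remaining = [p for p in holders if p != failed_peer]
        if not remaining:
            result[bucket] = []  # no surviving copy to replicate from
            continue
        # Step 3: copy to other live peers until the factor is met again.
        for candidate in live:
            if len(remaining) >= replication_factor:
                break
            if candidate not in remaining:
                remaining.append(candidate)
        result[bucket] = remaining
    return result


if __name__ == "__main__":
    copies = {"b1": ["p1", "p2", "p3"], "b2": ["p2", "p3", "p4"]}
    print(handle_peer_failure(copies, "p1", ["p1", "p2", "p3", "p4"]))
```

With three copies of each bucket, two peers holding a given bucket can fail before that bucket's data is lost, at the price of roughly three times the raw storage.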

Page 33: Optimize IT Infrastructure

Scalable Cluster Base Architecture

Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols

Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day

Offload search load to Splunk Search Heads
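The division of labour between forwarders, indexers and search heads can be sketched in Python. This is a conceptual model only: the per-event round-robin distribution, the `status` field and the function names are assumptions, and real forwarders balance load differently in detail.

```python
from collections import Counter
from itertools import cycle


def forward(events, indexers):
    """Spread events across indexers round-robin (simplified load balancing)."""
    shards = {name: [] for name in indexers}
    for event, name in zip(events, cycle(indexers)):
        shards[name].append(event)
    return shards


def map_count_by_status(shard):
    """Map phase: each indexer counts events by status over its own shard."""
    return Counter(event["status"] for event in shard)


def search_head_reduce(partials):
    """Reduce phase: the search head merges the partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    events = [{"status": s} for s in ["200", "200", "404", "200", "500", "404"]]
    shards = forward(events, ["idx1", "idx2", "idx3"])
    print(search_head_reduce(map_count_by_status(s) for s in shards.values()))
```

Because each indexer only scans its own shard, adding indexers shortens the map phase, while the search head only has to merge small partial results.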

•  Automatic load balancing linearly scales indexing
•  Distributed search and MapReduce linearly scale search and reporting

Page 34: Optimize IT Infrastructure

Splunk Real-Time Analytics

Data flow: Monitor Input, TCP/UDP Input, Scripted Input → Parsing Queue → Parsing Pipeline → Index Queue → Indexing Pipeline → Splunk Index (Raw data, Index Files)

Parsing Pipeline
•  Source, event typing
•  Character set normalization
•  Line breaking
•  Timestamp identification
•  Regex transforms

Real-time path: Indexing Pipeline → Real-time Buffer → Real-time Search Process
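The parsing pipeline stages listed above can be illustrated with a small Python sketch. It is a simplified stand-in, not Splunk's parser: it assumes UTF-8 input, one event per line, a fixed "YYYY-MM-DD HH:MM:SS" timestamp layout, and a hypothetical masking rule as the regex transform.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# Hypothetical regex transform: mask all but the last 4 of 16 digits.
MASK_CARD = re.compile(r"\b\d{12}(\d{4})\b")


def parse(raw_bytes, source):
    """Turn a raw byte stream into a list of event dictionaries."""
    # Character set normalization: decode everything to text.
    text = raw_bytes.decode("utf-8", errors="replace")
    events = []
    # Line breaking: one event per non-empty line (simplification).
    for line in text.splitlines():
        if not line.strip():
            continue
        # Timestamp identification.
        m = TIMESTAMP.search(line)
        ts = datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S") if m else None
        # Regex transform.
        masked = MASK_CARD.sub(r"XXXXXXXXXXXX\1", line)
        events.append({"source": source, "time": ts, "raw": masked})
    return events


if __name__ == "__main__":
    raw = b"2015-10-15 09:00:01 order=1 card=1234567890123456\n"
    print(parse(raw, "orders.log"))
```

In this model the resulting events would then be placed on the index queue; the sketch stops before indexing.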

Page 35: Optimize IT Infrastructure

Splunk - Big Data Technologies

•  Relational Database (highly structured): SQL & MapReduce; RDBMS (Oracle, MySQL, IBM DB2, Teradata)
•  Key/Value, Columnar or Other (semi-structured): NoSQL (Cassandra, Accumulo, MongoDB)
•  Distributed File System (semi-structured): MapReduce; Hadoop (HDFS Storage + MapReduce)
•  Temporal, Unstructured, Heterogeneous: Real-Time Indexing

Page 36: Optimize IT Infrastructure

Hunk - Hadoop

Page 37: Optimize IT Infrastructure

Image Search with Hunk
http://blogs.splunk.com/2013/10/18/images-search-with-splunk-and-hunk/

•  Image search on HDFS using Splunk
•  Select images based on ranges of color
•  3 parts
    –  The preprocessor, using a Hadoop record reader in Java
    –  Splunk Search
    –  Splunk UI
•  search index=images | eval score=color1+color2+…+colorN | sort -score by image
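The logic of the search above (sum the selected color fields into a score, then sort by descending score) can be mirrored in Python to make it explicit. The event layout and the sample values are hypothetical; only the `color1` ... `colorN` naming follows the slide.

```python
def rank_images(events, color_fields):
    """Score each image event by summing the chosen color fields, best first."""
    scored = []
    for event in events:
        # Missing color fields count as 0 in this sketch.
        score = sum(event.get(field, 0) for field in color_fields)
        scored.append({**event, "score": score})
    return sorted(scored, key=lambda e: e["score"], reverse=True)


if __name__ == "__main__":
    # Made-up events: each color field holds the share of pixels in that range.
    events = [
        {"image": "beach.jpg", "color1": 0.10, "color2": 0.60},
        {"image": "forest.jpg", "color1": 0.70, "color2": 0.20},
        {"image": "night.jpg", "color1": 0.05},
    ]
    for e in rank_images(events, ["color1", "color2"]):
        print(e["image"], e["score"])
```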

 

Page 38: Optimize IT Infrastructure

Why Splunk & Hunk

•  Schema on the Fly – fast, flexible, interactive analytics experience.
•  Interactive Search – you don’t need to know anything about the data in advance. Hunk automatically adds structure and identifies fields of interest, keywords, top values, and patterns over time.
•  Results Preview – query results are streamed back in real time. Pause and refine queries without having to wait for jobs to finish.
•  Drag and Drop Analytics – quickly create charts, visuals, and dashboards using pivot.
•  Rich app ecosystem for popular applications and data types
•  Hunk – search and report on native HDFS without ingesting the data

Page 39: Optimize IT Infrastructure

Challenges With Open Source Analytics

•  Open source software such as Hadoop and Cassandra requires significant services effort: as much as 20X higher personnel costs relative to software purchases.
•  Challenges Getting Value from Data in Hadoop
    –  Easy storage but hard analytics: difficult for non-specialists to explore, analyze and visualize data
    –  Complex technology: wide range of open source projects
    –  Hard-to-staff skills: must write MapReduce jobs or pre-define schemas for Hive
•  Hadoop was designed to be a batch job processing system, i.e. you start a job and see results in anywhere from tens of minutes to days.

Source: Gartner, “Big Data Drives Rapid Changes in Infrastructure and US$232 Billion in IT Spending Through 2016”, October 17, 2012

Page 40: Optimize IT Infrastructure

Splunk and Hadoop

•  Hunk
    –  Main use case = Analyze Hadoop data using Hadoop processing
•  Splunk Hadoop Connect
    –  Main use case = Real-time export of data from Splunk to Hadoop
•  Hunk Archive
    –  Main use case = Archive Splunk indexers to Hadoop
•  Splunk HadoopOps
    –  Main use case = Monitor Hadoop

Page 41: Optimize IT Infrastructure

Integrated Analytics Platform for Diverse Data Stores

•  Full-featured, Integrated Product
•  Insights for Everyone
•  Works with What You Have Today

Diagram labels: Explore; Analyze; Visualize; Dashboards; Share; Hadoop Client Libraries; Hadoop Clusters; NoSQL, EMR, S3 Buckets

Page 42: Optimize IT Infrastructure

Hunk – Unique

1.  Runs Natively in Hadoop
    –  Uses Hadoop MapReduce
2.  Mixed Mode
    –  Allows for data preview
3.  Auto-deploys splunkd to DataNodes
    –  On-the-fly indexing
4.  Access Control
    –  Allows for many users / many Hadoop directories / supports Kerberos
5.  Schema on the Fly

Page 43: Optimize IT Infrastructure

Mixed-mode Search

Diagram labels: Time; Splunk Stream; Switch-over time; Hadoop MR / Splunk Index; preview

•  Data Preview
•  Allows users to search interactively by pausing and refining queries

Page 44: Optimize IT Infrastructure

Role-based Security for Shared Clusters

Pass-through Authentication
•  Provide role-based security for Hadoop clusters
•  Access Hadoop resources under security and compliance
•  Integrates with Kerberos for Hadoop security

Users: Business Analyst, Marketing Analyst, Sys Admin
•  Business Analyst → Queue: Biz Analytics
•  Marketing Analyst → Queue: Marketing
•  Sys Admin2 → Queue: Prod

Page 45: Optimize IT Infrastructure

Hadoop  as  a  Self  Service  


Page 46: Optimize IT Infrastructure


Thank  you  

Page 47: Optimize IT Infrastructure

Copyright  ©  2015  Splunk  Inc.  

Jeff Wiggins, Systems Engineer Manager, Emerging Technologies @ EMC

Splunk…so Big and Flashy: Building Massive and Efficient Indexer Storage Environments for Splunk

Page 48: Optimize IT Infrastructure

Architecture Matters…

Scale-up Scale-Out

Page 49: Optimize IT Infrastructure

SPLUNK STORAGE REQUIREMENTS
ENTERPRISE PERFORMANCE AND DATA SERVICES

•  High-Performance Storage – Rare & Sparse Searches
•  High-Capacity Storage – Long-Term Retention
•  Scale-Out Infrastructure – Indexer & Search Heads
•  De-dupe & Compression – Clustered Indexer Deployments
•  Backup & Security – Data Protection & Compliance

Diagram labels: Indexers; Search Heads; HOT; WARM; COLD; Capacity Triggered

Page 50: Optimize IT Infrastructure

DAS PRESENTS CHALLENGES
SPLUNK DAS ENVIRONMENT

1  Dedicated Storage Infrastructure
    •  Silo that only runs Splunk
2  Compromised Availability
    •  SSDs & servers fail
    •  Index rebuilds can take hours to days
3  Lack of Enterprise Data Protection
    •  No Snapshots or Compliance
    •  DR limited to Multisite Clustering
4  Poor Storage Efficiency
    •  Multiple copies of data
    •  Multisite Clustering Increases Overhead
5  Non-Optimized Growth
    •  Fixed compute to storage ratio
    •  Servers must maintain storage symmetry
6  Management Complexity
    •  Multiple management points

Diagram labels: data copies marked 1x, 2x, 3x

Page 51: Optimize IT Infrastructure

WHY EMC FOR SPLUNK
OPTIMIZED INFRASTRUCTURE FOR BIG & FAST DATA

Optimized Shared Storage & Tiering
•  Hot & Warm Data Deployed On XtremIO or ScaleIO
•  Cold & Frozen Data Deployed On Isilon

Powerful Data Services
•  Encryption & Security
•  Index File Compression
•  Deduplication Of Clustered Indexes
•  Snapshots For Backups

Cost-Effective & Flexible Scale-Out
•  Scale-Out Capacity & Compute Independently Or As Converged Platform

Page 52: Optimize IT Infrastructure

Why Flash?!?

Economic Influences
•  Consumer Demand
•  Data Services Reducing Impact of Application Data Copies
•  Flash technology has improved at a faster rate than Moore’s Law

Diagram labels: Intelligent Scale-out Flash; HDD

Page 53: Optimize IT Infrastructure

XTREMIO DATA SERVICES
ALWAYS-ON, INLINE, ZERO PENALTY, FREE

•  AGILE WRITEABLE SNAPSHOTS
•  INLINE DATA AT REST ENCRYPTION
•  XTREMIO DATA PROTECTION
•  INLINE DEDUPLICATION
•  INLINE COMPRESSION
•  ALWAYS-ON THIN PROVISIONING

 

Page 54: Optimize IT Infrastructure

EMC XTREMIO & SPLUNK
ALL-FLASH INFRASTRUCTURE FOR HOT & WARM DATA

Data Services For Hot & Warm Data
•  Self-Encrypting Flash Drives
•  Index File Compression
•  Dedupe Clustered Index Copies
•  In-Memory Data Copy Services

Scale-Out Flash For I/O-Bound Data
•  >1M IOPS & <1ms Latencies

High-Speed Search
•  Accelerate SuperSparse & Rare Searches

Diagram labels: Indexers; Search Heads

Page 55: Optimize IT Infrastructure

EMC SCALEIO & SPLUNK
CONVERGED ARCHITECTURE FOR HOT & WARM DATA

Converged Splunk Architecture
•  Leveraging Existing Hardware Investments
•  Diagram labels: Indexers; Search Heads; Servers; Network; Storage

Shared Capacity & Performance
•  Remove Silos & Increase ROI On DAS Capacity & No Single Point Of Failure
•  5 nodes × (5K IOPS, 1 TB) pooled into 25K IOPS & 5TB
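The pooling arithmetic on this slide (five nodes of 5K IOPS and 1 TB each presented as 25K IOPS and 5 TB) can be written out as a short Python sketch. It assumes ideal linear aggregation, as the slide's figures do; the function name and dictionary keys are illustrative.

```python
def pooled_resources(nodes):
    """Sum per-node IOPS and capacity into one shared pool (ideal case)."""
    return {
        "iops": sum(node["iops"] for node in nodes),
        "capacity_tb": sum(node["capacity_tb"] for node in nodes),
    }


if __name__ == "__main__":
    # Five servers contributing 5K IOPS and 1 TB each, as on the slide.
    nodes = [{"iops": 5000, "capacity_tb": 1} for _ in range(5)]
    print(pooled_resources(nodes))
```

In practice, data protection and network overhead reduce the usable figures, so the totals are best read as an upper bound.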

Page 56: Optimize IT Infrastructure

EMC Isilon – Deep and WIDE Storage

OneFS: Single Volume / File System
•  Policy-based Tiering
•  Simplicity & Ease of Use
•  Linear Scalability
•  Multi-protocol support
•  High Performance
•  Unmatched Efficiency
•  Easy Growth

Page 57: Optimize IT Infrastructure

EMC ISILON & SPLUNK
LOW-COST & SECURE SCALE-OUT FOR COLD DATA

Consolidate, Protect & Secure Cold Data
•  SmartLock Protects Cold & Frozen Data
•  SmartDedupe For Clustered Indexes
•  SnapshotIQ For Backups
•  Self-Encrypting Drives

High-Speed Ingest & Long-Term Retention
•  With Native HDFS Integration

Scale-Out Capacity
•  Up To 50PB Of Highly Available Capacity

Diagram labels: Indexers; Search Heads

Page 58: Optimize IT Infrastructure


For more information:

§  Read more about Scalar’s infrastructure practice model:

§  https://www.scalar.ca/en/what-we-do/#/services/pillar/infrastructure-en

Page 59: Optimize IT Infrastructure


Connect with us!

§  @scalardecisions

§  Scalar Decisions

§  Facebook.com/ScalarDecisions