TRANSCRIPT
October 15, 2015
Optimize your IT Infrastructure with Scalar, EMC and Splunk
Scalar leads Canadian Business to the Next Generation of IT through
Innovation, Expertise & Service
DAVID WIEDASECK SR. Partner Sales Engineer
JEFFREY WIGGINS ETD SE Manager
MICHAEL TRAVES Solutions Architect
© 2015 Scalar Decisions Inc. Not for distribution outside of intended audience.
Scalar Client Solutions
Security
Context-Based Enterprise Security
Infrastructure
Integration of Emerging Technologies
Cloud
Hybrid Cloud Solutions
Splunk Analytics – Use Cases
Operational Intelligence
§ IT Operations: Utilization, Capacity Growth
§ Security: Fraud Detection, Real-time Detection of Threats, Forensics
§ Internet of Things (IoT): Sensor Data, Machine-to-Machine, Machine-Human Interactions
Consulting – Solution Design
§ Business Drivers
§ Alignment with IT
§ Stakeholders and Big Data Teams (Data Scientists, Business Analysts, Marketing, IT, CxO, Dir.)
§ Sizing
§ Ingest Performance and Scalability, Search & Index
§ Infrastructure – Scale Out
§ Compute (Virtual, Physical)
§ Network (1/10/40GbE)
§ Storage (Hot/Warm and Cold/Frozen Tiers)
§ Data Security and Protection (Distributed or Consolidated)
Consulting – Deployment
§ Build
§ Pilot and Pre-production
§ Proof of Value
§ Integration with Big Data and Data Lake Initiatives
§ Validate
§ Performance and Scalability
§ Availability
§ Customize
§ Dashboards
§ Reporting and Alerting
We want to work with YOU
But why should you work with US?
Top Tier Technical Talent
§ Engineers average 15 years of experience
§ World-class experts from some of the leading organizations in the industry
§ Dedicated PMO, finance, sales and operations teams
Copyright © 2013 Splunk, Inc.
Splunk Big Data Analytics
Machine Data OR Big Data?
SPLUNK - MAKE MACHINE DATA ACCESSIBLE, USABLE AND VALUABLE TO EVERYONE
What is Machine Data: https://youtu.be/3YEE3RfXVVA
COLLECT DATA FROM ANYWHERE
SEARCH AND ANALYZE EVERYTHING
GAIN REAL-TIME DATA INTELLIGENCE
The Power of Splunk
Turning Machine Data Into Business Value
Index Untapped Data: Any Source, Type, Volume
Sources: Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID
Deployment: On-Premises, Private Cloud, Public Cloud
Ask Any Question
• Application Delivery
• Security, Compliance and Fraud
• IT Operations
• Business Analytics
• Industrial Data and the Internet of Things
What Does Machine Data Look Like?
Sources: Order Processing, Twitter, Care IVR, Middleware Error
Machine Data Contains Critical Insights
Fields highlighted across the sources: Customer ID, Order ID, Product ID, Time Waiting On Hold, Customer's Tweet, Twitter ID, Company's Twitter ID
SPLUNK TODAY
Platform for Machine Data
Inputs and add-ons: Mainframe Data, VMware, Exchange, PCI, Security, DB Connect, Mobile, Forwarders, Syslog, TCP, Other, Sensors, Control Systems, Stream
Ecosystem of 600+ Apps
Interfaces: API, SDKs, UI
Splunk Use Cases
IT Operations
• Server, Storage, Network
• Server Virtualization
• Operating Systems
• Custom Applications
• Business Applications
• Cloud Services
• App Performance Monitoring
• Ticketing/Other
• Web Intelligence
• Mobile Applications
Data sources: Servers, Storage, Desktops, Email, Web, Transaction Records, Network Flows, DHCP/DNS, Hypervisor, Custom Apps, Physical Access, Badges, Threat Intelligence, Mobile, CMDB
Security
• Intrusion Detection
• Firewall
• Data Loss Prevention
• Anti-Malware
• Vulnerability Scans
• Authentication
• Traditional SIEM
Business Intelligence: Soda Company Use Case
• Soda Company extracts data from vending machines, social media, and loyalty programs
– Distribution
– New product development
– Insight into consumer buying patterns
• "Without data you're just a person with an opinion."
• Customers face challenges with “data cartels” within their organization
• Need to “free the data lake” from rigid structured data warehouse applications
Analytics
• What we are looking for, and why, will depend on who we ask
– What are the normal characteristics for a dog?
  - Dog Show: height, weight, coat, gait, posture
  - Veterinarian: immunizations, history of illness, injuries, diet
  - Parent: suitability for children, temperament, allergies
  - Data Scientist: mean +/- standard deviation
[Chart labels: Mean + std. dev, Mean, Mean – std. dev]
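The data scientist's view of "normal" on the slide above can be sketched in a few lines of Python. This is only an illustration: the dog weights are made-up sample values, and the one-standard-deviation band (`k=1.0`) is an assumed default, not something the presentation prescribes.

```python
from statistics import mean, stdev


def normal_band(values, k=1.0):
    """Return (low, centre, high), where the band is mean +/- k sample std devs."""
    centre = mean(values)
    spread = stdev(values)
    return centre - k * spread, centre, centre + k * spread


def outside_band(values, k=1.0):
    """Return the values that fall outside the mean +/- k std dev band."""
    low, _, high = normal_band(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical dog weights in kg (illustrative data only).
weights = [20.0, 22.0, 21.0, 19.0, 23.0, 35.0]
low, centre, high = normal_band(weights)
print(f"band: {low:.2f} .. {high:.2f} (mean {centre:.2f})")
print("outside the band:", outside_band(weights))
```

A wider band (for example `k=2.0`) flags fewer values; which width counts as "abnormal" depends on who is asking, which is the point of the slide.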
Internet of Things
Correlation Criteria
• MAC address same
• Content in Search Results
• Purchase time
Search (Machine Data): Search Results (Application Logs), Device ID (MAC Address), Time of Search
Billing (Structured Data): Content Purchased (IDA #), Device (MAC Address), Time of Search, Amount of Purchase ($)
Business Value
• Revenues driven by Search
• Improving local content mix
• Better search results
• Tailor content promotion
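The correlation described on this slide can be sketched as a join between search events and billing records. The following Python sketch is illustrative only: the record layouts, field names (`mac`, `results`, `content_id`, `amount`) and the 30-minute window are assumptions made for the example, not details from the presentation.

```python
from datetime import datetime, timedelta


def correlate(searches, purchases, window=timedelta(minutes=30)):
    """Pair purchases with earlier searches using the three slide criteria:
    same MAC address, purchased content present in the search results,
    and a purchase time shortly after the search (assumed window)."""
    matches = []
    for purchase in purchases:
        for search in searches:
            same_device = search["mac"] == purchase["mac"]
            content_shown = purchase["content_id"] in search["results"]
            delay = purchase["time"] - search["time"]
            in_window = timedelta(0) <= delay <= window
            if same_device and content_shown and in_window:
                matches.append((search, purchase))
    return matches


def revenue_driven_by_search(matches):
    """Sum purchase amounts, counting each purchase once."""
    amounts = {id(purchase): purchase["amount"] for _, purchase in matches}
    return sum(amounts.values())


# Hypothetical sample data.
searches = [
    {"mac": "AA:BB:CC:00:00:01", "time": datetime(2015, 10, 15, 20, 0),
     "results": ["IDA-100", "IDA-200"]},
    {"mac": "AA:BB:CC:00:00:02", "time": datetime(2015, 10, 15, 20, 5),
     "results": ["IDA-300"]},
]
purchases = [
    {"mac": "AA:BB:CC:00:00:01", "time": datetime(2015, 10, 15, 20, 10),
     "content_id": "IDA-200", "amount": 4.99},
    {"mac": "AA:BB:CC:00:00:02", "time": datetime(2015, 10, 15, 23, 0),
     "content_id": "IDA-300", "amount": 2.99},
]

matches = correlate(searches, purchases)
print(len(matches), "purchase(s) attributed to search")
print("revenue driven by search:", revenue_driven_by_search(matches))
```

The second purchase is not attributed to search because it falls outside the assumed time window, even though the device and content match.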
How Splunk Stores Data
• As Splunk indexes your data it creates a number of files
– Raw data in compressed form (rawdata)
– Indexes that point to the raw data, plus some metadata files (index files)
• These files reside in directories known as “buckets”
• A bucket moves through several stages as it ages
– Hot & Warm: $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*
– Cold: $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/
– Frozen: Archive (can still be searched once thawed)
• File name format: db_<newest_time>_<oldest_time>_<localid>_<guid>
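As a rough illustration of the bucket naming format, the sketch below parses a directory name of the form `db_<newest_time>_<oldest_time>_<localid>_<guid>`. It assumes the two times are Unix epoch seconds and treats the GUID suffix as optional; the sample name is hypothetical and the regular expression is an approximation, not Splunk's own parser.

```python
import re
from datetime import datetime, timezone

# Assumed layout: db_<newest_time>_<oldest_time>_<localid>[_<guid>]
BUCKET_RE = re.compile(
    r"^db_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<localid>\d+)"
    r"(?:_(?P<guid>[0-9A-Fa-f-]+))?$"
)


def parse_bucket_name(name):
    """Split a bucket directory name into its time range, local id and GUID."""
    match = BUCKET_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised bucket name: {name!r}")
    return {
        # Assumption: times are epoch seconds, interpreted as UTC.
        "newest": datetime.fromtimestamp(int(match["newest"]), tz=timezone.utc),
        "oldest": datetime.fromtimestamp(int(match["oldest"]), tz=timezone.utc),
        "localid": int(match["localid"]),
        "guid": match["guid"],
    }


# Hypothetical bucket name covering one day of data.
bucket = parse_bucket_name(
    "db_1444953600_1444867200_12_A1B2C3D4-0000-0000-0000-000000000000"
)
print(bucket["oldest"].date(), "->", bucket["newest"].date(), "id", bucket["localid"])
```

Because the time range is encoded in the name, a search over a given period can skip buckets whose range does not overlap it without opening them.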
Splunk Index Buckets

Bucket Stage | Description | Searchable?
Hot | Newly indexed data. One or more hot buckets per index | Yes
Warm | Data rolled from hot. There are many warm buckets | Yes
Cold | Data rolled from warm. There are many cold buckets | Yes
Frozen | Data rolled from cold. Splunk deletes frozen data by default, but it can also be archived. Archived data can later be thawed | Can be
Storage Considerations
• Storage requirements != Index Volume (GB/day)
– Search profile and number of searches are just as important
– Data retention must also be considered
• Splunk utilizes I/O to perform both Searching AND Indexing
– Load = Search Volume + Indexing Volume
– Index load is write intensive
– Search load is read intensive against the data searched (current vs. recent vs. old)
– SSDs generally provide higher performance than HDDs, but at a higher cost
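To make the point that storage is more than daily index volume, here is a back-of-the-envelope capacity sketch in Python. Everything numeric in it is an assumption for illustration: `on_disk_ratio` (how much disk one GB of raw data occupies after compression and indexing) is a placeholder to be measured per data source, and the tier retention periods are hypothetical. It also treats every replica as a full copy, which is a simplification, and it covers capacity only, not the I/O load discussed above.

```python
def tier_capacity_gb(daily_ingest_gb, hot_warm_days, cold_days,
                     replication_factor=3, on_disk_ratio=0.5):
    """Estimate capacity per storage tier.

    replication_factor: number of copies kept in an indexer cluster.
    on_disk_ratio: ASSUMED fraction of raw volume that lands on disk.
    """
    per_day = daily_ingest_gb * on_disk_ratio * replication_factor
    return {
        "hot_warm": per_day * hot_warm_days,
        "cold": per_day * cold_days,
    }


# Hypothetical example: 100 GB/day, 14 days hot/warm, 76 days cold.
estimate = tier_capacity_gb(100, hot_warm_days=14, cold_days=76)
for tier, gb in estimate.items():
    print(f"{tier}: {gb:,.0f} GB")
```

The same daily volume produces very different totals depending on retention and replication, which is why GB/day alone is not a sizing figure.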
Storage Considerations
• What is the use case?
– IT Operations use cases typically search against recent data (e.g. 0 to 14 days)
– Security and Analytics use cases typically search all data (e.g. days to months to years)
• What is the typical time span of the data searched?
– Most ad-hoc searches are against current or recent data
– Analytics may span a very large time frame
– Security forensics typically search all data
– Reports or Alerting Searches might be over the past day or week
Splunk Index Replication – High Availability
• Default is 3X Replication
1 Master auto-detects that a peer is down
2 Master asks the redundant peer to act as primary
3 Peers copy the search files / index files / raw data
Scalable Cluster Base Architecture
Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
Offload search load to Splunk Search Heads
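The division of labour between indexers and search heads can be pictured as a small map/reduce: each indexer computes a partial result over its own events, and the search head merges the partials. The Python sketch below is a conceptual illustration only; the function names and event dictionaries are hypothetical and do not reflect Splunk's internal implementation.

```python
from collections import Counter


def map_on_indexer(events, field):
    """Map step: one indexer counts values of `field` in its local events."""
    return Counter(event[field] for event in events if field in event)


def reduce_on_search_head(partials):
    """Reduce step: the search head merges partial counts from all indexers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical events held by two indexers.
indexer_a = [{"status": "200"}, {"status": "500"}]
indexer_b = [{"status": "500"}, {"status": "404"}, {"host": "web01"}]

partials = [map_on_indexer(events, "status") for events in (indexer_a, indexer_b)]
merged = reduce_on_search_head(partials)
print(merged.most_common())
```

Adding an indexer adds another independent map step, while the search head only merges small summaries, which is the intuition behind scaling search by adding nodes.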
• Automatic load balancing linearly scales indexing
• Distributed search and MapReduce linearly scales search and reporting
Splunk Real-Time Analytics
Inputs: Monitor Input, TCP/UDP Input, Scripted Input
Data → Parsing Queue → Parsing Pipeline
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification
• Regex transforms
Index Queue → Indexing Pipeline → Splunk Index (Raw data, Index Files)
Real-time Buffer → Real-time Search Process
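Three of the parsing pipeline steps (line breaking, timestamp identification, regex transforms) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: events are assumed to start with a `YYYY-MM-DD HH:MM:SS` timestamp, the masking rule is a hypothetical transform, and none of this is Splunk's actual parsing code.

```python
import re
from datetime import datetime

# Assumption: each event begins with a timestamp in this format.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# Hypothetical regex transform: mask password values before indexing.
MASK_RE = re.compile(r"password=\S+")


def break_lines(raw):
    """Line breaking: a line starting with a timestamp opens a new event;
    other lines (e.g. stack traces) are appended to the current event."""
    events = []
    for line in raw.splitlines():
        if TIMESTAMP_RE.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def parse_event(text, source):
    """Timestamp identification plus a regex transform on the raw text."""
    found = TIMESTAMP_RE.search(text)
    when = datetime.strptime(found.group(0), "%Y-%m-%d %H:%M:%S") if found else None
    return {"_time": when, "source": source,
            "_raw": MASK_RE.sub("password=****", text)}


raw = (
    "2015-10-15 09:00:01 ERROR order failed\n"
    "  at com.example.Order.process\n"
    "2015-10-15 09:00:02 INFO login user=bob password=hunter2"
)
events = [parse_event(text, source="app.log") for text in break_lines(raw)]
for event in events:
    print(event["_time"], "|", event["_raw"].splitlines()[0])
```

The multi-line error stays together as one event, and the masked value never reaches the index, which is why these steps sit in front of the indexing pipeline.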
Splunk - Big Data Technologies
• Relational Database (highly structured): RDBMS, SQL & MapReduce. Examples: Oracle, MySQL, IBM DB2, Teradata
• Key/Value, Columnar or Other (semi-structured): NoSQL. Examples: Cassandra, Accumulo, MongoDB
• Distributed File System (semi-structured): Hadoop, HDFS Storage + MapReduce
• Temporal, Unstructured, Heterogeneous: Splunk, Real-Time Indexing
Hunk - Hadoop
Image Search with Hunk
http://blogs.splunk.com/2013/10/18/images-search-with-splunk-and-hunk/
• Image search on HDFS using Splunk
• Select images based on ranges of color
• 3 parts
– The Preprocessor using Hadoop Record reader in Java
– Splunk Search
– Splunk UI
• search index=images | eval score=color1+color2+…+colorN | sort -score by image
Why Splunk & Hunk
• Schema on the Fly – fast, flexible, interactive analytics experience
• Interactive Search – you don’t need to know anything about the data in advance. Hunk automatically adds structure and identifies fields of interest, keywords, top values, and patterns over time
• Results Preview – query results are streamed back in real time. Pause and refine queries without having to wait for jobs to finish
• Drag and Drop Analytics – quickly create charts, visuals, and dashboards using pivot
• Rich App ecosystem for popular applications and data types
• Hunk – Search and Report on native HDFS without ingesting the data
Challenges With Open Source Analytics
• Open source software such as Hadoop and Cassandra requires significant services effort: as much as 20X higher personnel costs relative to software purchases
• Challenges Getting Value from Data in Hadoop
– Easy storage but hard analytics: difficult for non-specialists to explore, analyze and visualize data
– Complex technology: wide range of open source projects
– Hard-to-staff skills: must write MapReduce jobs or pre-define schemas for Hive
• Hadoop was designed to be a batch job processing system, i.e. you start a job and see results in a range from tens of minutes to days
Gartner, “Big Data Drives Rapid Changes in Infrastructure and US$232 Billion in IT Spending Through 2016”, October 17, 2012
Splunk and Hadoop
• Hunk:
– Main use case = Analyze Hadoop Data using Hadoop Processing
• Splunk Hadoop Connect:
– Main use case = Real-time export of data from Splunk to Hadoop
• Hunk Archive:
– Main use case = Archive Splunk indexers to Hadoop
• Splunk HadoopOps:
– Main use case = Monitor Hadoop
Integrated Analytics Platform
• Full-featured, Integrated Product
• Insights for Everyone
• Works with What You Have Today
Explore, Analyze, Visualize, Dashboards, Share
Hadoop Client Libraries for Diverse Data Stores: Hadoop Clusters, NoSQL, EMR, S3 Buckets
Hunk – Unique
1. Run Natively in Hadoop:
– Use Hadoop MapReduce
2. Mixed Mode:
– Allows for data Preview
3. Auto deploy SplunkD to DataNodes:
– On the fly Indexing
4. Access Control:
– Allows for many users / many Hadoop directories / support Kerberos
5. Schema On the Fly
Mixed-mode Search
[Figure: timeline showing Splunk Stream previews, the switch-over time, and Hadoop MR / Splunk Index]
• Data Preview
• Allows users to search interactively by pausing and refining queries
Role-based Security for Shared Clusters
Pass-through Authentication
• Provide role-based security for Hadoop clusters
• Access Hadoop resources under security and compliance
• Integrates with Kerberos for Hadoop security
Hadoop as a Self Service
Users: Business Analyst, Marketing Analyst, Sys Admin
• Business Analyst Queue: Biz Analytics
• Marketing Analyst Queue: Marketing
• Sys Admin2 Queue: Prod
Thank you
Copyright © 2015 Splunk Inc.
Jeff Wiggins Systems Engineer Manager, Emerging Technologies @ EMC
Splunk…so Big and Flashy Building Massive and Efficient Indexer Storage Environments for Splunk
Architecture Matters…
Scale-Up, Scale-Out
SPLUNK STORAGE REQUIREMENTS
• High-Performance Storage – Rare & Sparse Searches
• High-Capacity Storage – Long-Term Retention
• Scale-Out Infrastructure – Indexer & Search Heads
• De-dupe & Compression – Clustered Indexer Deployments
• Backup & Security – Data Protection & Compliance
ENTERPRISE PERFORMANCE AND DATA SERVICES
Indexers
Search Heads
Capacity Triggered
HOT
WARM
COLD
DAS PRESENTS CHALLENGES
SPLUNK DAS ENVIRONMENT
1 Dedicated Storage Infrastructure
• Silo that only runs Splunk
2 Compromised Availability
• SSDs & servers fail
• Index rebuilds can take hours to days
3 Lack of Enterprise Data Protection
• No Snapshots or Compliance
• DR limited to Multisite Clustering
4 Poor Storage Efficiency
• Multiple copies of data
• Multisite Clustering Increases Overhead
5 Non-Optimized Growth
• Fixed compute to storage ratio
• Servers must maintain storage symmetry
6 Management Complexity
• Multiple management points
[Figure: 1x, 2x and 3x data copies shown across the DAS servers]
WHY EMC FOR SPLUNK
OPTIMIZED INFRASTRUCTURE FOR BIG & FAST DATA
Optimized Shared Storage & Tiering
• Hot & Warm Data Deployed On XtremIO or ScaleIO
• Cold & Frozen Data Deployed On Isilon
Powerful Data Services
• Encryption & Security
• Index File Compression
• Deduplication Of Clustered Indexes
• Snapshots For Backups
Cost-Effective & Flexible Scale-Out
• Scale-Out Capacity & Compute Independently Or As Converged Platform
Why Flash?!? Economic Influences
• Consumer Demand
• Data Services Reducing Impact of Application Data Copies
• Flash technology has improved at a faster rate than Moore’s Law
[Figure labels: Intelligent Scale-out Flash, HDD]
XTREMIO DATA SERVICES: ALWAYS-ON, INLINE, ZERO PENALTY, FREE
• AGILE WRITEABLE SNAPSHOTS
• INLINE DATA AT REST ENCRYPTION
• XTREMIO DATA PROTECTION
• INLINE DEDUPLICATION
• INLINE COMPRESSION
• ALWAYS-ON THIN PROVISIONING
Data Services For Hot & Warm Data
Self-Encrypting Flash Drives
Index File Compression
Dedupe Clustered Index Copies
In-Memory Data Copy Services
EMC XTREMIO & SPLUNK: ALL-FLASH INFRASTRUCTURE FOR HOT & WARM DATA
Scale-Out Flash For I/O-Bound Data >1M IOPS & <1ms Latencies
High-Speed Search Accelerate SuperSparse & Rare Searches
Indexers
Search Heads
EMC SCALEIO & SPLUNK CONVERGED ARCHITECTURE FOR HOT & WARM DATA
Indexers
Search Heads
Servers
Network
Storage
Converged Splunk Architecture
Leveraging Existing Hardware Investments
5 servers, each contributing 5K IOPS and 1 TB
Shared Capacity & Performance
Remove Silos & Increase ROI On DAS Capacity & No Single Point Of Failure
Pooled total: 25K IOPS & 5 TB
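The pooling arithmetic on this slide (five servers at 5K IOPS and 1 TB each giving 25K IOPS and 5 TB) can be written out as a short Python sketch. It assumes simple linear aggregation with no protection or rebuild overhead, which is an idealisation for illustration rather than a sizing rule.

```python
def pool(nodes):
    """Aggregate per-server IOPS and capacity into one shared pool.

    Assumes ideal linear scaling; real usable figures depend on
    protection overhead and workload.
    """
    return {
        "iops": sum(node["iops"] for node in nodes),
        "capacity_tb": sum(node["capacity_tb"] for node in nodes),
    }


# Five servers as drawn on the slide.
servers = [{"iops": 5_000, "capacity_tb": 1} for _ in range(5)]
shared = pool(servers)
print(f"{shared['iops']:,} IOPS, {shared['capacity_tb']} TB shared")
```

Under the same assumption, adding a sixth identical server would raise the pool to 30K IOPS and 6 TB, which is what makes the converged model attractive for incremental growth.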
OneFS
EMC Isilon – Deep and WIDE Storage
Single Volume / File System
Policy-based Tiering
Simplicity & Ease of Use
Linear Scalability
Multi-protocol Support
High Performance
Unmatched Efficiency
Easy Growth
Consolidate, Protect & Secure Cold Data
SmartLock Protects Cold & Frozen Data
SmartDedupe For Clustered Indexes
SnapshotIQ For Backups
EMC ISILON & SPLUNK: LOW-COST & SECURE SCALE-OUT FOR COLD DATA
High-Speed Ingest & Long-Term Retention With Native HDFS Integration
Indexers
Search Heads
Scale-Out Capacity: Up To 50PB Of Highly Available Capacity
Self-Encrypting Drives
For more information:
§ Read more about Scalar’s infrastructure practice model:
§ https://www.scalar.ca/en/what-we-do/#/services/pillar/infrastructure-en
Connect with us!
§ @scalardecisions
§ Scalar Decisions
§ Facebook.com/ScalarDecisions