Copyright © 2015 Splunk Inc.
Pierre Pellissier Leela Kesireddy Performance Management PayPal, Inc.
Splunk for Akamai Cloud Monitor
Disclaimer
2
During the course of this presentaEon, we may make forward looking statements regarding future events or the expected performance of the company. We cauEon you that such statements reflect our current expectaEons and esEmates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-‐looking statements, please review our filings with the SEC. The forward-‐looking statements made in the this presentaEon are being made as of the Eme and date of its live presentaEon. If reviewed aRer its live presentaEon, this presentaEon may not contain current or
accurate informaEon. We do not assume any obligaEon to update any forward looking statements we may make.
In addiEon, any informaEon about our roadmap outlines our general product direcEon and is subject to change at any Eme without noEce. It is for informaEonal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obligaEon either to develop the features
or funcEonality described or to include any such feature or funcEonality in a future release.
Agenda
3
! Why we need Log Analysis ! What we had was too slow ! Real Time logging OpEon ! GeXng to Real Time ! Benefits & Challenges ! ConfiguraEon Details ! Akamai Splunk ApplicaEon ! Final Analysis and Q & A
Why we need log analysis
4
• Customer interacEons on Akamai exceed 1 BN/day
• It is business criEcal that we : – Know the state of the site – Track the impact of changes – Analyze the customer experience
What we had was too slow
5
! Akamai for content delivery, web acceleraEon and security
! Akamai log collecEon takes hours to get logs into NetStorage.
! Downloading logs hourly and indexing them in Splunk makes them available for analysis.
! But this is not a real Eme soluEon. DC #1 DC #2
WAF
Hourly download
Log Gathering
Akamai Network
NetStorage
Real Eme logging opEon
6
! Akamai Cloud Monitor delivers logs in < 60 sec
! Faster delivery enables real Eme operaEonal monitoring and easier analyEcs by internal users.
! Lots of configuraEon decisions. ! The results are worth the effort. ! UlEmately easier than expected.
DC #1 DC #2
WAF
Aggregation
Real Time Delivery
CM Log Streaming
Akamai Network
GeXng to real Eme
7
! OperaEonal goals required: – Real Eme visibility into site – Improved usability of logs
! Lots of design decisions needed to be made – How will we receive the logs? – How do we configure CM on Akamai? – How will this change the indexed volume? – How do we ensure we get the logs? – How will the logs be used?
! The iniEal benefits were achieved quickly – MulEple “bonus” benefits were also realized – Some challenges needed to be dealt with
DC #2
Aggregation
Real Time Delivery
CM Log Streaming
Which data?
What volume?
Redundancy?
What aggregaEon?
How to receive?
Dashboards views?
Which Sites?
IniEal benefits – site control
8
! Receiving the logs in <60 sec. makes them usable for real Eme analysis – The NOC can immediately see the effects of site changes – The CDN team can see the effects of Akamai configuraEon changes in real Eme
Monitoring of site ramp acEvity.
IniEal benefits – log usability
9
! NetStorage log parsing needed improving
! Akamai CM logs use JSON formaXng
! Splunk TA for CM logs put them into CIM format for ES consumpEon
! Internal customers can work with the JSON formaned logs more intuiEvely
AddiEonal benefits – opEonal CM data
10
! OpEonal data fields enable rich analyEcs for: – OperaEonal Teams – Performance – Security – Fraud
! Adding more fields results in higher indexed volume
AddiEonal benefits – custom fields
11
! Data from the connecEon, HTTP header or payload can be inserted into CM
! Enables powerful insight for mulEple teams: – Crypto use analysis – Customer Experience – Referer AnalyEcs – TroubleshooEng
AddiEonal benefits – more complete data
12
! For real Eme analyEcs we chose to receive 100% of the CM logs – Some use cases may only need 10%
! NetStorage log collecEon does not capture 100% of the logs
! The CM log count is ~5% greater than the NetStorage log count – This increases our confidence that we
are seeing the enEre picture DC #1 DC #2
WAF
Real Time Delivery
Sample percentage
= 100%
Akamai Network
Challenges
13
! Greater log volume with CM – ~50% more data indexed – Partly due to JSON formaXng – Mostly due to addiEonal CM data
! IniEal configuraEon of log receiver did not provide redundancy – CM Logs were lost when it was
unavailable ! Log forwarding logic took several iteraEons to get it right – A POC got us close and then we
adjusted aRer we were in producEon DC #1 DC #2
WAF
Delivery Failure
Akamai Network
ConfiguraEon Details
CM & splunk checklist
15
Ini$al configura$ons: q Add Cloud Monitor to the Akamai contract q Set up receiver VIP, DNS, SSL Cert and redundant servers q Choose the property and CM data sets to log q Build the iniEal CM and Splunk configuraEons
Trial Run: q Configure for a short duraEon of logging at 10% to verify receiver is working q Verify the CM data is being mapped to Splunk CIM q Verify that the receiver and Splunk can support the expected load q Verify the data is what you need and expect
Plan on doing several rounds of configura$ons, tes$ng and tuning q It is easy to iterate and change the configuraEons as needed
CM data delivery
16
! Select Data Sets to Include ! Configure Delivery Endpoint ! Configure the AggregaEon opEons:
– Time: 60 seconds max – Line Count: Max 3000 records – Message Size: Max 900 KB of data
! Configure DistribuEon & Failover OpEons: – Primary receiver gets 100% – Secondary receiver gets 100% if
Primary receiver is unavailable – NetStorage gets 100% if Primary &
Secondary are unavailable ê With scheduled FTP download hourly
CM configuraEons
17
! Default data sets: – CP, Format, Message, Type, Version
! OpEonal data sets: – Akadebug, network, netPerf, Geo, WAF, PPCustomData
ê Can not opt out of data elements within an opEonal data set ê Results in duplicates of some data or unwanted data
! Custom Data field: – Selected HTTP Header data is included – Other header info is excluded-‐ like large cookies
CM receiver
18
! Receiver build – Akamai Posts CM data to receiver VIP – SSL cert required for secure connecEon by Akamai – MulEple Linux servers in pool running nginx & node.js – Writes logs to a local data file – Splunk Universal Forwarder monitors logs and forwards them to Indexers
CM receiver (future state)
19
Splunk VIP
CM VIP1
CM VIP2
CM VIP1
CM VIP2
! Plan for receiver redundancy in the iniEal design
NetStorage
Splunk configuraEon
20
! Index configured with 100 day retenEon ! Built Add-‐on’s with Field renames &
EvaluaEons ! Normalized data to be CIM Complaint ! LDAP group configured for accessing CM data ! Apps set up for internal teams to build and
share searches ! Common Akamai CM Dashboard for all teams
– Performance & Availability Dashboard for each property
– StarEng point for internal teams to view data but with click through access to granular data and search bar
New-‐ Splunk 6.3 HTTP event collector ! HTTP endpoint can securely receive high-‐
volume JSON-‐based applicaEon and IOT data
! Create and mange receiver configuraEons using the HTTP event collector configuraEon
! Token based authenEcaEon Model ! Supports to both hnp & hnps ! Replaces the nginx & node.js receiver soluEon
21
Akamai ApplicaEon
Akamai App
! Monitor the Real Time Traffic ! Insights by Property ! Monitor Health & Performance of Origin ! Monitor Heath & Performance of Akamai Edge ! Observe User’s Pain Points ! Inspect App / URL level Issues & Performance ! InvesEgate Issues
23
24
! Performance Issue – Searches Were Slow – Dashboards taking longer to load – Greater than 25K events/sec
! Summary Index – Loses the Rich informaEon in data
! Report AcceleraEon – AcceleraEon is suspended as the summary reaches 10% of its total size – No Control on Summary Schedule
Hurdles
25
Data Models & AcceleraEon Came to Rescue
Akamai ApplicaEon
26
Data Model’s facilitated to format, evaluate fields and work with necessary event data for each data set and also accelerate the model
! Data models are made up of object types like lookups, transacEons, fields extracEons & Calculated fields.
! Data models encodes only necessary knowledge objects.
! Data Model for each Property or group of ProperEes.
! Objects by Success or Failure (HTTP Status) ! Add addiEonal fields using regular expressions, eval expressions and lookups.
Data Models
Dashboards
27
! Real Time Availability & Performance
! Origin Insights ! Edge Overview ! Security ! Release Monitoring
Availability & Performance
28
Availability • Volume of Hits. • Success & Failure Volume by HTTP
Status. • Success & Failure by GEO
Performance • Track total Eme for transacEons. • Origin Latency • Last Mile Round Trip Time
Origin & Edge
29
Origin • Monitor traffic distribuEon.
• Error Monitoring. • Monitor Latency.
• Edge Errors • Edge Performance stats • Last Mile RTT
Edge
Security
30
• Monitor the WAF denies and warnings. • Monitor Top Deny Rules. • Report on the WAF warning triggered. • Top deny & warning URL’s. • Deny & warn tracking by Geo • Top Denied Client IP’s
Release Monitoring
31
• Monitor Traffic by Origin / DC • Monitor Issue’s & Performance by
Property • Performance by Origin • Last Mile Time by Edge • Performance by GEO DC 01 DC 02 DC 03 DC 04 DC 05
DC 01 DC 02 DC 03 DC 04 DC 05
Final Analysis and Q & A
Final analysis
33
! The iniEal business needs for Real Time logging have been met: – Real Eme monitoring enables us to know the current state of the site – Dashboards allow us to track the effect of site changes – JSON data formaXng makes it easier to do analyEcs on the customer
experience ! The addiEonal benefits are extremely valuable:
– Custom fields to help troubleshoot site and code issues – Header informaEon providing customer experience analyEcs data – Rich data set about customer crypto to help with SHA256 migraEon – Performance data on a regional and network level
! ConEnuing to find new ways to leverage the CM data in Splunk ! Q & A
THANK YOU